When adding more than one ER field from node to term with the taxonomy_index behaviour, entries in {taxonomy_index} are incorrectly inserted in more than one occurrence. Apart from the instrinsic error, this causes problems in views using this index, which receive duplicate results.

The problems comes from EntityReferenceBehavior_TaxonomyIndex::buildbuildNodeIndex() being invoked multiple times from entityreference_entity_crud() and inserting the correct data every time instead of merging them. I'm writing a test case to demonstrate the problem and prove the fix once someone writes it.

Files: 
CommentFileSizeAuthor
#12 duplicate-entries-in-taxo_index.patch3.91 KBFrando
PASSED: [[SimpleTest]]: [MySQL] 123 pass(es).
[ View ]
#7 0002-Issue-1924444-duplicate-entries-in-taxo_index.patch3.68 KBfgm
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es).
[ View ]
#1 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch2.35 KBfgm
FAILED: [[SimpleTest]]: [MySQL] 120 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

Comments

fgm’s picture

StatusFileSize
new2.35 KB
FAILED: [[SimpleTest]]: [MySQL] 120 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

And here is the test showing the problem.

fgm’s picture

Status:Active» Needs review

Pinging bot to show RED.

Status:Needs review» Needs work

The last submitted patch, 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch, failed testing.

Damien Tournoud’s picture

Is this code in 8.x now? If so, please reassign to core.

fgm’s picture

Alas no: 8.x only included the selection plugins, not the behaviors. I just tested and this index feature is not present at all in 8.x.

fgm’s picture

Possible patch. Green ?

Note that, although this appears to work, it switches node saves from #fields multi-row Insert queries to #fields * #tids single-row Merge queries. There might be a more efficient way to do this.

fgm’s picture

Status:Needs work» Needs review
StatusFileSize
new3.68 KB
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es).
[ View ]

Better with the patch.

Damien Tournoud’s picture

Alternatively, we could add a entityPostInsertMultiple() and entityPostUpdateMultiple() methods, that get called once per operation per entity, with a list of (field, instance) tuples.

fgm’s picture

Not sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.

Not worth it, IMO. Or at least not worth delaying a bug fix. We can always refactor later, especially when the time comes to kill taxonomy_term_reference in D8.

Damien Tournoud’s picture

Not sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.

No, I suggest one call per operation, per entity. The "multiple" part is that it's called only once per behavior, even if there are multiple fields in the entity that uses the same behavior.

fgm’s picture

That's very likely to be a good idea, especially when looking at our current implementations, where we have probably more repeated loops than needed. I just won't have any time to work on it in the short term, so unless there is a problem with this fix, I'd rather see it merged and the performance issue addressed separately.

Frando’s picture

Issue summary:View changes
StatusFileSize
new3.91 KB
PASSED: [[SimpleTest]]: [MySQL] 123 pass(es).
[ View ]

Attached patch is the same as #7 but adds an update function to remove existing duplicates.

This can take a while for large sites, I am not sure how batch API handles very long queries. Large sites will likely run upates via drush updatedb anyway.

hawkeye217’s picture

Was running into this issue and I can confirm the patch in #12 works great.

eelkeblok’s picture

Status:Needs review» Reviewed & tested by the community

Same here. Pure coincidence we both needed this patch today, I suppose. I'm putting this on RTBC (glancing over the patch I don't see anything weird code-wise either).

jayhawkfan75’s picture

I just tried patch #12, and it worked for me up until the job of clearing duplicate rows. This was the error I received:

entityreference module
Update #7003

Failed: PDOException: SQLSTATE[42000]: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'LIKE'.: CREATE TABLE {tmp_taxonomy_index} LIKE {taxonomy_index}; Array ( ) in entityreference_update_7003() (line 170 of sites\all\modules\entityreference\entityreference.install).

Will the SQL query work only in mySQL? I'm using SQL Server, so I don't know if the syntax is what's causing the issue for me.

eelkeblok’s picture

It would appear that "CREATE TABLE ... LIKE" is indeed not supported by SQL Server (e.g. http://stackoverflow.com/questions/616104/what-is-the-equivalent-of-crea...). Maybe you can provide an alternative patch that also works on SQL Server? I'm not sure if this is reason enough to reset the issue status back to needs work.

jayhawkfan75’s picture

Thanks for the quick reply. I can definitely look into doing that. It would be my first attempt at a real patch. If I'm having issues with it with SQL Server, then Postgres users are probably having the same issue as well.

In the meantime, I ran those 5 query jobs manually without an issue.

fgm’s picture

I think that would have to be a patch against the MS SQL driver rather than EntityReference ? This EntityReference behavior relies on the sql_field_storage, but should not be incuding engine-specific code.

jelo’s picture

I just ran into this as well. Will the patch be committed anytime soon? To me this sounds like a major issue as it seems rather common to use multiple entity references in views. I upgraded my site from D6 to D7 and every second view was suddenly showing duplicates because of this after I completed the field migration from nodereference to entityreference.

jelo’s picture

I can confirm that the patch in #12 applied nicely and fixed the issue.

jelo’s picture

Well, since I applied the patch any nodes being added with a single taxonomy term do not get indexed in taxonomy_index table. The values are stored in the field_data_taxonomy_vocabulary_x table, but not indexed which removes the entries from views as soon as one filters on it...

jelo’s picture

I did some more digging and discovered the following. Not sure if this affects a clean install as my site is an upgraded site from D6. On content types with a single entity reference field the taxonomy index was not created if the field was NOT required (and hence not shown in views that used taxonomy index), i.e. this problem seems not to be caused by this patch. As soon as I switched the field to a required field, the taxonomy index entry was created correctly (with and without the patch being applied). If someone can confirm that it happens to them as well with the described setup, I suggest we create a separate issue...