When adding more than one ER field from node to term with the taxonomy_index behaviour, entries in {taxonomy_index} are incorrectly inserted in more than one occurrence. Apart from the instrinsic error, this causes problems in views using this index, which receive duplicate results.

The problems comes from EntityReferenceBehavior_TaxonomyIndex::buildbuildNodeIndex() being invoked multiple times from entityreference_entity_crud() and inserting the correct data every time instead of merging them. I'm writing a test case to demonstrate the problem and prove the fix once someone writes it.

Files: 
CommentFileSizeAuthor
#38 duplicate_entries_with-1924444-38.patch3.91 KBzaporylie
#28 duplicate_entries_with-1924444-28.patch3.85 KBzaporylie
#28 duplicate_entries_with-TEST_ONLY-1924444-28.patch1.95 KBzaporylie
#25 entityreference-tax_index_dup-1924444-25.patch3.91 KBindytechcook
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entityreference-tax_index_dup-1924444-25.patch. Unable to apply patch. See the log in the details link for more information. View
#12 duplicate-entries-in-taxo_index.patch3.91 KBFrando
PASSED: [[SimpleTest]]: [MySQL] 123 pass(es). View
#7 0002-Issue-1924444-duplicate-entries-in-taxo_index.patch3.68 KBfgm
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es). View
#1 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch2.35 KBfgm
FAILED: [[SimpleTest]]: [MySQL] 120 pass(es), 1 fail(s), and 0 exception(s). View

Comments

fgm’s picture

FileSize
2.35 KB
FAILED: [[SimpleTest]]: [MySQL] 120 pass(es), 1 fail(s), and 0 exception(s). View

And here is the test showing the problem.

fgm’s picture

Status: Active » Needs review

Pinging bot to show RED.

Status: Needs review » Needs work

The last submitted patch, 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch, failed testing.

Damien Tournoud’s picture

Is this code in 8.x now? If so, please reassign to core.

fgm’s picture

Alas no: 8.x only included the selection plugins, not the behaviors. I just tested and this index feature is not present at all in 8.x.

fgm’s picture

Possible patch. Green ?

Note that, although this appears to work, it switches node saves from #fields multi-row Insert queries to #fields * #tids single-row Merge queries. There might be a more efficient way to do this.

fgm’s picture

Status: Needs work » Needs review
FileSize
3.68 KB
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es). View

Better with the patch.

Damien Tournoud’s picture

Alternatively, we could add a entityPostInsertMultiple() and entityPostUpdateMultiple() methods, that get called once per operation per entity, with a list of (field, instance) tuples.

fgm’s picture

Not sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.

Not worth it, IMO. Or at least not worth delaying a bug fix. We can always refactor later, especially when the time comes to kill taxonomy_term_reference in D8.

Damien Tournoud’s picture

Not sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.

No, I suggest one call per operation, per entity. The "multiple" part is that it's called only once per behavior, even if there are multiple fields in the entity that uses the same behavior.

fgm’s picture

That's very likely to be a good idea, especially when looking at our current implementations, where we have probably more repeated loops than needed. I just won't have any time to work on it in the short term, so unless there is a problem with this fix, I'd rather see it merged and the performance issue addressed separately.

Frando’s picture

Issue summary: View changes
FileSize
3.91 KB
PASSED: [[SimpleTest]]: [MySQL] 123 pass(es). View

Attached patch is the same as #7 but adds an update function to remove existing duplicates.

This can take a while for large sites, I am not sure how batch API handles very long queries. Large sites will likely run upates via drush updatedb anyway.

hawkeye217’s picture

Was running into this issue and I can confirm the patch in #12 works great.

eelkeblok’s picture

Status: Needs review » Reviewed & tested by the community

Same here. Pure coincidence we both needed this patch today, I suppose. I'm putting this on RTBC (glancing over the patch I don't see anything weird code-wise either).

jayhawkfan75’s picture

I just tried patch #12, and it worked for me up until the job of clearing duplicate rows. This was the error I received:

entityreference module
Update #7003

Failed: PDOException: SQLSTATE[42000]: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'LIKE'.: CREATE TABLE {tmp_taxonomy_index} LIKE {taxonomy_index}; Array ( ) in entityreference_update_7003() (line 170 of sites\all\modules\entityreference\entityreference.install).

Will the SQL query work only in mySQL? I'm using SQL Server, so I don't know if the syntax is what's causing the issue for me.

eelkeblok’s picture

It would appear that "CREATE TABLE ... LIKE" is indeed not supported by SQL Server (e.g. http://stackoverflow.com/questions/616104/what-is-the-equivalent-of-crea...). Maybe you can provide an alternative patch that also works on SQL Server? I'm not sure if this is reason enough to reset the issue status back to needs work.

jayhawkfan75’s picture

Thanks for the quick reply. I can definitely look into doing that. It would be my first attempt at a real patch. If I'm having issues with it with SQL Server, then Postgres users are probably having the same issue as well.

In the meantime, I ran those 5 query jobs manually without an issue.

fgm’s picture

I think that would have to be a patch against the MS SQL driver rather than EntityReference ? This EntityReference behavior relies on the sql_field_storage, but should not be incuding engine-specific code.

jelo’s picture

I just ran into this as well. Will the patch be committed anytime soon? To me this sounds like a major issue as it seems rather common to use multiple entity references in views. I upgraded my site from D6 to D7 and every second view was suddenly showing duplicates because of this after I completed the field migration from nodereference to entityreference.

jelo’s picture

I can confirm that the patch in #12 applied nicely and fixed the issue.

jelo’s picture

Well, since I applied the patch any nodes being added with a single taxonomy term do not get indexed in taxonomy_index table. The values are stored in the field_data_taxonomy_vocabulary_x table, but not indexed which removes the entries from views as soon as one filters on it...

jelo’s picture

I did some more digging and discovered the following. Not sure if this affects a clean install as my site is an upgraded site from D6. On content types with a single entity reference field the taxonomy index was not created if the field was NOT required (and hence not shown in views that used taxonomy index), i.e. this problem seems not to be caused by this patch. As soon as I switched the field to a required field, the taxonomy index entry was created correctly (with and without the patch being applied). If someone can confirm that it happens to them as well with the described setup, I suggest we create a separate issue...

Anybody’s picture

This issue is now quite old but still seems to have cross-SQL-engine problems.
Could we place clear up, if the patch needs work or if the cross-sql-engine problems are outside of the patch and take this into the next .dev version of entity reference soon?

As said above some month ago this issue is really problematic and should not wait for a fix that long, I think. Is there someone with the required know-how and probably test environments to have a look at this?

Anybody’s picture

Isn't there a further little typo in patch #12?
I think there are closing brackets missing (eventhough the patch seems to work fine under MySQL):

In

  db_query("CREATE TABLE {tmp_taxonomy_index} LIKE {taxonomy_index}");
  db_query("INSERT INTO {tmp_taxonomy_index} SELECT * FROM {taxonomy_index} GROUP BY nid, tid");
  db_query("DELETE FROM {taxonomy_index}");
  db_query("INSERT INTO {taxonomy_index} SELECT * FROM {tmp_taxonomy_index");
  db_query("DROP TABLE {tmp_taxonomy_index");

See:
db_query("INSERT INTO {taxonomy_index} SELECT * FROM {tmp_taxonomy_index}");
db_query("DROP TABLE {tmp_taxonomy_index}");

--------

For MSSQL Server something like this will be needed:
http://stackoverflow.com/questions/15428168/sql-server-create-a-copy-of-...

SELECT *
INTO {tmp_taxonomy_index}
FROM {taxonomy_index};

Does somebody know if all other engines support that?
MySQL: https://dev.mysql.com/doc/refman/5.0/en/select-into.html

indytechcook’s picture

FileSize
3.91 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entityreference-tax_index_dup-1924444-25.patch. Unable to apply patch. See the log in the details link for more information. View

@anybody, nice catch. Here is a patch attached with the correct formatting.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 25: entityreference-tax_index_dup-1924444-25.patch, failed testing.

indigoxela’s picture

Hi,
unfortunately the latest attached patch (#25) failed testing.
And #12 has typos.

@indytechcook: would you please have a look at this "unable to apply patch" issue?

regards,
indigoxela

zaporylie’s picture

I just encountered this issue so here's rerolled version of #25 and test-only version to prove it works.

The last submitted patch, 28: duplicate_entries_with-TEST_ONLY-1924444-28.patch, failed testing.

indigoxela’s picture

@ zaporylie: many thanks for the rerolled patch.

Tested on a fresh install with entityreference-7.x-1.x-dev on a node type with two different entityreference fields to taxonomy terms..
First without patch to reproduce the faulty behaviour. As expected: duplicates in database table taxonomy_index occured.
(Adding a third field to this node type resulted in three records per node.)

Then I applied the patch and ran "drush updb".
This removed the duplicates from database table.

Adding more terms to nodes now works as it should.
No more duplicates.

zaporylie’s picture

In that case RTBC?

indigoxela’s picture

In that case RTBC?

Good question...
Is entityreference module supposed to support database servers besides MySQL?

I just tested it on MySQL.

indigoxela’s picture

Well, I guess, we should make the update compatible to other database servers.

Does any of the followers see a chance to adapt function entityreference_update_7003() to only use Drupal's db abstraction layer?
https://api.drupal.org/api/drupal/includes!database!database.inc/7.x

I'm not familiar with Postgres or MSSQL and couldn't even test it.

indigoxela’s picture

This could be a starting point for the update function without direct MySQL syntax usage.
Tested to work on MySQL.

<?php
function entityreference_update_7003() {
  $tx_schema = drupal_get_schema('taxonomy_index');
  $fields = array_keys($tx_schema['fields']);
  db_create_table('taxonomy_index_tmp', $tx_schema);
  $select = db_select('taxonomy_index', 'tx');
  $select->fields('tx');
  $select->groupBy('tx.nid');
  $select->groupBy('tx.tid');
  $result = $select->execute()->fetchAll(PDO::FETCH_ASSOC);
  $insert = db_insert('taxonomy_index_tmp')->fields($fields);
  foreach ($result as $record) {
    $insert->values($record);   
  }
  $insert->execute();
  db_drop_table('taxonomy_index');
  db_rename_table('taxonomy_index_tmp', 'taxonomy_index');
}
?>

People with better pdo skills: please have a look and improve the code.
Anyone willing to test on different database servers is welcome.

eelkeblok’s picture

Thanks for that. I would suggest to define a batch size and do this in several goes when there are a lot of entries. I'm a little worried this query could spin out of control if there are a lot of entries in the index table.

indigoxela’s picture

this query could spin out of control if there are a lot of entries in the index table.

Good point.

I just saw, that it is possible to directly feed a select to db_insert (didn't realize that yesterday). This query happens in db, so there's no danger, that php runs out of memory or something like that.

<?php
function entityreference_update_7003() {
  $tx_schema = drupal_get_schema('taxonomy_index');
  db_create_table('taxonomy_index_tmp', $tx_schema);
  $select = db_select('taxonomy_index', 'tx');
  $select->fields('tx');
  $select->groupBy('tx.nid');
  $select->groupBy('tx.tid');
  db_insert('taxonomy_index_tmp')->from($select)->execute(); 
  db_drop_table('taxonomy_index');
  db_rename_table('taxonomy_index_tmp', 'taxonomy_index');
}
?>
eelkeblok’s picture

Good stuff. I've actually done a quick scan of how the various database drivers (including the non-core support for MS SQL, https://www.drupal.org/project/sqlsrv) and they all seem to build a construct of the form

'INSERT INTO {' . $this->table . '}' . $insert_fields_string . $this->fromQuery;

At least, if it doesn't work, we've used the appropriate APIs and there is an opportunity to fix it in the database driver, where it belongs.

zaporylie’s picture

In that case I updated #28 with changes suggested by @indigoxela in #36

Status: Needs review » Needs work

The last submitted patch, 38: duplicate_entries_with-1924444-38.patch, failed testing.

The last submitted patch, 38: duplicate_entries_with-1924444-38.patch, failed testing.

The last submitted patch, 38: duplicate_entries_with-1924444-38.patch, failed testing.

indigoxela’s picture

Looks like we are stuck a bit here.

entityreference.admin.test keeps failing. This is not the test introduced with our patch.
Any ideas about how to solve this are welcome.

eelkeblok’s picture

Edit: Obviously, first make sure the tests also fail without the patch applied.

I believe the correct procedure would be to identify the problem and create a critical issue for it (or raise an existing one to critical if it doesn't exist yet). See https://www.drupal.org/node/45111:

Critical bugs include those that:

  • ...
  • Cause tests to fail in HEAD on the automated testing platform for any supported environment (including random failures), since this blocks all other work.

The last submitted patch, 28: duplicate_entries_with-1924444-28.patch, failed testing.

indigoxela’s picture

Whatever the problem is - it is not reproducible on my local dev server.
All entityreference tests pass here - with or without this patch.

My setup:
PHP 5.4
MySQL 5.5
Drupal 7.43
Testing module on (obviously)

The test ran for 44 seconds.
Result: 126 passes, 0 fails, 0 exceptions and 12 debug messages

So unfortunately I'm still clueless.

indigoxela’s picture

Ahhh... reading my own post, I just realized the difference: Drupal core version.

Running the same tests on my local dev server right after the update to Drupal 7.x-dev fails.
Result: 102 passes, 31 fails, 6 exceptions und 6 debug messages
All fails are in the tests for the administrative UI - the same as online.

So very likely it is a problem with a drupal core code change and not with our test.

eelkeblok’s picture

Thanks for that. I did some git bisecting on the most recent changes to core. This is the result:

c3bf7484b38a1821c1631033217c4290a591c0de is the first bad commit
commit c3bf7484b38a1821c1631033217c4290a591c0de
Author: David Rothstein <drothstein@gmail.com>
Date:   Sat May 28 14:43:25 2016 -0400

    Issue #611294 by swentel, klausi, David_Rothstein, tsphethean, jenlampton, attiks, yched, sun, benjy, dcam: Added a new "administer fields" permission for trusted users to use the field UI.

:100644 100644 e6e6e19fd0018f74d26777fac2bd181c9b0553e2 3ededdac75803b056384f9dab0f787867b280c56 M	CHANGELOG.txt
:040000 040000 0efb6b34b8d1de001d32c65a0a42f49eec420831 546c62b5ab10e462141264d1726a6d4a3d038111 M	modules

I'll create an issue for this and try my hand at a patch.

eelkeblok’s picture

eelkeblok’s picture