Posted by xjm

Problem/Motivation

(From #1050466: The taxonomy index should be maintained in a node hook, not a field hook.)

  • If a node has multiple taxonomy autocomplete fields that use the same vocabulary and the user enters the same value in two fields, the term will be duplicated.

Example scenario

This is not just a weird edge case. Consider:

  1. You have a "University" vocabulary.
  2. You allow users to enter their undergraduate school in one field, and their graduate school in another.
  3. The user enters "University of Texas" in both fields.
  4. When the node is saved, there will be two different terms named "University of Texas."

Proposed resolution

  • Check for existing records with the same name in the same vocabulary before creating new terms.

Remaining tasks

  • It's been suggested that the query used in taxonomy_field_presave()to check for existing terms could lead to race conditions:
    +++ b/core/modules/taxonomy/taxonomy.moduleundefined
    @@ -1710,6 +1710,15 @@ function taxonomy_rdf_mapping() {
    +      // Avoid duplicating tags within the same vocabulary.
    +      $tid = db_query_range("SELECT tid FROM {taxonomy_term_data} WHERE name = :name AND vid = :vid", 0, 1, array(
    +        ':name' => trim($item['name']),
    +        ':vid' => $item['vid'],
    +      ))->fetchField();
    +      if (!empty($tid)) {
    +        $items[$delta]['tid'] = $tid;
    +        continue;
    +      }
    

    See #1050466-34: The taxonomy index should be maintained in a node hook, not a field hook for some proposed solutions.

User interface changes

  • None.

API changes

  • None.
Files: 
CommentFileSizeAuthor
#27 add_duplicate_term-1343822-27.patch10.43 KBchr.fritsch
#25 interdiff-1343822-22-25.txt3.64 KBchr.fritsch
#25 add_duplicate_term-1343822-25.patch10.44 KBchr.fritsch
#22 add_duplicate_term-1343822-22.patch7.17 KBchr.fritsch
#20 add_duplicate_term-1343822-20.patch5.64 KBchr.fritsch
#9 D7-autocreate-1343822-9-notests-do-not-test.patch886 bytesjames.williams
#1 autocreate-1343822-tests.patch4.29 KBxjm
FAILED: [[SimpleTest]]: [MySQL] 34,005 pass(es), 2 fail(s), and 0 exception(es). View
#1 autocreate-1343822-combined.patch5.18 KBxjm
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch autocreate-1343822-combined.patch. Unable to apply patch. See the log in the details link for more information. View

Comments

xjm’s picture

FileSize
5.18 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch autocreate-1343822-combined.patch. Unable to apply patch. See the log in the details link for more information. View
4.29 KB
FAILED: [[SimpleTest]]: [MySQL] 34,005 pass(es), 2 fail(s), and 0 exception(es). View
xjm’s picture

Status: Active » Needs review
xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Marked #1146744: The tagging widget doesn't check if the same term already exists as duplicate. From that issue:

mr.baileys: Isn't the problem more that the exist-check is done by taxonomy_autocomplete_validate, but the term-creation is done by taxonomy_field_presave, not that the results from taxonomy_term_load_multiple is cached?

So first both autocomplete-fields are element-validated, which sets them as "need to be created", and when the field is then saved, no checking is done to see if the term exists, so both terms are created.

catch’s picture

I think there's another issue for this somewhere as well, but can't find it at the moment.

fietserwin’s picture

#4; #610076: Add a primary key to the {taxonomy_index} table and prevent duplicates being inserted?

I think that the race condition can be avoided by using a merge query (aka upsert), That is, if that would be a real merge query at database level OR if the 2 separate queries (that are issued directly after each other) are placed in a transaction.

As further work on #610076: Add a primary key to the {taxonomy_index} table and prevent duplicates being inserted is going on, I think this one can be closed as a duplicate. Perhaps the test can be added to the mentioned issue.

kscheirer’s picture

#1: autocreate-1343822-combined.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, autocreate-1343822-combined.patch, failed testing.

xjm’s picture

I'd suggest postponing work on taxonomy autocomplete bugs at least a week or so until we see what direction #1847596: Remove Taxonomy term reference field in favor of Entity reference is going to take.

james.williams’s picture

Here's a port of the patch for Drupal 7, without the accompanying tests. Applies cleanly & fixes the bug for me.

james.williams’s picture

Issue summary: View changes

Updated summary.

xjm’s picture

Issue summary: View changes

Removing myself from the author field so I can unfollow. --xjm

jibran’s picture

Component: taxonomy.module » entity_reference.module

All the taxonomy field related matters are handled by entityreference field after #1847596: Remove Taxonomy term reference field in favor of Entity reference

amateescu’s picture

Component: entity_reference.module » taxonomy.module

Drupal\taxonomy\Plugin\EntityReferenceSelection\TermSelection is provided by the Taxonomy module, moving back to the right queue.

anrikun’s picture

@james.williams
Thanks for patch at #9!

michaelmallett’s picture

Patch worked great for me, thanks. Is there any way to get this in core? Just wondering why it's been around for 3 years.

amateescu’s picture

Title: Autocreated taxonomy terms duplicated if they are added to multiple fields. » Add duplicate term names prevention as a vocabulary option
Version: 8.0.x-dev » 8.1.x-dev
Category: Bug report » Feature request

Since there is no more dedicated taxonomy reference field, just the generic entity reference one, this can no longer be fixed at the field level, we need to move the fix higher up the chain to taxonomy term entity presave.

But, as far as I see, there is no mechanism that prevents adding terms with the same name in a certain vocabulary, so maybe this should be added as a vocabulary configuration option?

With that in mind, I'm moving the issue to 8.1.x and reclassifying as a feature request.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

ABaier’s picture

I am wondering if there is any movement within this issue or maybe even a patch for 8.2.2?

My editors can add multiple event dates to a node using a paragraphs type which also includes a taxonomy reference field for the type of each individual event (for example "concert"). Field settings are autocomplete tagging and "Create referenced entities if they don't already exist".

If they add multiple instances of this paragraph on node creation, all containing a similar, non existent tag of "event type", this term will be created multiple times with the same name, but varying TIDs, after saving the node.

Unfortunately I cannot help with the integration, but I would think the existence of the entered terms has to be validated, maybe for each individual instance of the taxonomy field, one after another, if it appears multiple times inside the node form.

ABaier’s picture

Another case where a taxonomy field could exist multiple times in a node edit form, besides paragraphs, would be the use of media entity. Maybe many more, if inline entity form is used to create other entities within a node.

I ran in both issues, paragraphs and media entity. The paragraph is used to tag for example the location of an event and the media entity image/video can also be tagged with this location vocabulary. This is all for categorization purposes and granulary filtering of content. To mention is, that the node itself also contains some taxonomy fields including the location field.

What I want to say is, that the chance of having more instances of one taxonomy field is quite big. Especially with the upcoming node creation workflows using the mentioned modules.

Now that media entity will be ported to core this issue really should be considered.

Any ideas?

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
5.64 KB

Here is a first patch. I implemented the check as a Validator.

Status: Needs review » Needs work

The last submitted patch, 20: add_duplicate_term-1343822-20.patch, failed testing.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
7.17 KB

Fix some tests

chr.fritsch’s picture

Issue tags: +Needs tests

I know, tests are still missing.

I want first to get a sign-off, if this approach is feasible and we want to continue into this direction.

webflo’s picture

  1. +++ b/core/modules/taxonomy/src/Entity/Vocabulary.php
    @@ -120,6 +128,13 @@ public function getDescription() {
    +  public function preventDuplicates() {
    

    This method should be on the interface. I think its not possible to add methods to an existing interface. Not sure about the BC implications. I think we have to add a new one.

  2. +++ b/core/modules/taxonomy/src/Plugin/Validation/Constraint/TermNameConstraintValidator.php
    @@ -0,0 +1,57 @@
    +    if (!Vocabulary::load($entity->bundle())->preventDuplicates()) {
    

    Should check for the new interface first.

  3. +++ b/core/modules/taxonomy/src/Plugin/Validation/Constraint/TermNameConstraintValidator.php
    @@ -0,0 +1,57 @@
    +      ->condition($id_key, (int) $items->getEntity()->id(), '<>')
    

    Access checks are applied to entity queries by default. Maybe we should opt out to make sure its really unique?

I agree on the overall approach.

chr.fritsch’s picture

1+2. I discussed #24-1 with @dawehner and we think because of https://www.drupal.org/node/2562903 it should be possible to add a new method to the class and the interface.
3. Opted out for accessCheck

Additionally i added a test to the TermTest.php

mtodor’s picture

This looks good, I have tested it (works fine) and checked code. Only few nitpicks.

  1. +++ b/core/modules/taxonomy/src/Plugin/Validation/Constraint/TermNameConstraintValidator.php
    @@ -0,0 +1,58 @@
    +    $field_name = $items->getFieldDefinition()->getName();
    

    This expression can go after if (!Vocabulary::load($entity->bundle())->preventDuplicates()) { statement. And also it would be better to get $items->getFieldDefinition() in some variable and then use it in 2 places, because $field_name is used only once, so there is no need for variable. There is also repeating for $items->getEntity() and $entity->getEntityType() - they could be assigned to variable.

  2. +++ b/core/modules/taxonomy/src/Tests/TermTest.php
    @@ -585,4 +586,60 @@ public function testTermBreadcrumbs() {
    +    $this->assertNotNull($term, 'Term found in database.');
    

    If we already have check $this->assertText('Created new term ' . $term->label()); at end of test, maybe it can be added here too.

  3. +++ b/core/modules/taxonomy/src/Tests/TermTest.php
    @@ -585,4 +586,60 @@ public function testTermBreadcrumbs() {
    +    $termName = $this->randomMachineName(14);
    +    $edit = [
    +      'name[0][value]' => $termName,
    +      'description[0][value]' => $this->randomMachineName(100),
    +      'parent[]' => [0],
    +    ];
    

    Instead of making new random name, it would be easier to just make something like this: $edit['name[0][value]'] .= '_2'; - to avoid 1 in 64509974703297150976 cases when we will get two same random generated names. ;)

  4. +++ b/core/modules/taxonomy/src/VocabularyInterface.php
    @@ -54,4 +54,12 @@ public function setHierarchy($hierarchy);
    +   * Indicates if multiple terms with same name are allowed in the vocabulary.
    

    Comment is a bit misleading. I think it should be: "Indicates if multiple terms with same name are forbidden in the vocabulary." - because that's behaviour when you get TRUE.

Overall -> VERY NICE!
Now we will not get errors like: Multiple entities match this reference; "My Tag (3)", "My Tag (7)". - when I just want to use "My Tag". :)

chr.fritsch’s picture

I adjusted all the comments from #26. Unfortunately it was not possible to create an interdiff.