Problem/Motivation

Migrate has the concept of de-duplication, where it looks for things with identical machine names, and creates a copy if there's a match.

This is actually creating duplicates (or avoiding overwrites), but it's not really de-duplication.

Proposed resolution

Document how de-duplication works.

Remaining tasks

User interface changes

API changes

Data model changes

CommentFileSizeAuthor
#67 deprecatedMigrateModule.patch2.79 KBDrupalMattS
#58 interdiff.txt1.22 KBohthehugemanatee
#58 2824610-57.patch25.29 KBohthehugemanatee
#54 interdiff.txt1.88 KBquietone
#54 2824610-54.patch25.61 KBquietone
#52 interdiff.txt2.46 KBquietone
#52 2824610-52.patch25.61 KBquietone
#48 interdiff.txt14.97 KBquietone
#48 2824610-48.patch24.16 KBquietone
#44 2824610-44.patch22.3 KBjofitz
#41 2824610-41.patch22.37 KBquietone
#39 interdiff.txt505 bytesquietone
#39 2824610-39.patch22.35 KBquietone
#37 2824610-37.patch22.17 KBjofitz
#30 interdiff.txt499 bytesquietone
#30 2824610-30.patch22.31 KBquietone
#28 interdiff.txt628 bytesquietone
#28 2824610-28.patch22.3 KBquietone
#26 interdiff.txt23.28 KBquietone
#26 2824610-26.patch22.31 KBquietone
#22 interdiff-9-22.txt23.18 KBquietone
#22 2824610-22.patch20.53 KBquietone
#20 interdiff.txt16.88 KBquietone
#20 2824610-20.patch17.52 KBquietone
#10 2824610-9.patch5.38 KBquietone
#9 interdiff.txt2.2 KBquietone
#5 2824610-5.patch5.24 KBquietone
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch created an issue. See original summary.

alexpott’s picture

RenameOnClash?

mikeryan’s picture

Well, it's actually the *IDs* getting deduplicated, not the entities. And entities are not (necessarily) duplicated... If you have an existing entity whose name is 'foobar', and an incoming entity also happens to have the name 'foobar', the presumption is that the incoming entity is a different entity, and to have it co-exist with the existing entity it needs to be renamed (its ID needs to be deduped).

mikeryan’s picture

Title: De-duplication is confusing » Add documentation to DedupeBase/DedupeEntity process plugins
Parent issue: » #2776179: [meta] Add process plugin documentation to the codebase

Let's just make this issue the child of #2776179: [meta] Add process plugin documentation to the codebase for the dedupe plugins.

quietone’s picture

Status: Active » Needs review
FileSize
5.24 KB

And here is a patch made from copying the DedupeEntity and DedupeBase changes from #2776179: [meta] Add process plugin documentation to the codebase.

heddn’s picture

Assigned: Unassigned » heddn

Assigning to myself for review this week.

heddn’s picture

Status: Needs review » Needs work
  1. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
    @@ -9,19 +9,40 @@
    + * Provides deduplication logic.
    

    dedupe plugin, no?

  2. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
    @@ -9,19 +9,40 @@
    + *   - start: (optional) The returned string will start at the start'th position in string.
    

    - 80 characters.
    - 'The position at which to start reading.' is what substr uses. I think a variation of that would make more sense.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
    @@ -9,19 +9,40 @@
    + *   - length: (optional) The maximum number of characters in the returned string.
    

    - 80 characters
    - This isn't strictly accurate. The length limits the size of the substr call, not the returned value. The returned value also gets the postfix and the incremental digit added.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
    @@ -9,19 +9,40 @@
    +   *   The imploded value.
    

    This doesn't implode.

  5. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeEntity.php
    @@ -8,12 +8,72 @@
    + * Ensures the source value is not duplicated against an entity field.
    

    Nit: Or property. It doesn't just support fields. Or do we call them the same thing in D8? Maybe we do...

  6. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeEntity.php
    @@ -8,12 +8,72 @@
    + *   - postfix: (optional) A string to insert before the numeric postfix.
    

    If we are going to include this, then we should also include all the other config available from the base, not just this one.

heddn’s picture

Assigned: heddn » Unassigned
quietone’s picture

FileSize
2.2 KB

thx for the review,

1. Fixed
2 & 3. Changed to use the same text as in substr. It is good enough for that so why not?
4. Fixed.
5. Hmm not sure. No change since the configuration key is 'fields' it seems simpler to just stick with that?
6. Done.

quietone’s picture

Status: Needs work » Needs review
FileSize
5.38 KB

And uploading the patch helps.

heddn’s picture

Status: Needs review » Reviewed & tested by the community

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 10: 2824610-9.patch, failed testing.

MaskyS’s picture

Status: Needs work » Reviewed & tested by the community
alexpott’s picture

Issue summary: View changes

I've updated the issue summary to document the proposed solution. I still think we should consider the naming also I don't think the documentation really explains that this can result in things that feel exactly like duplicates.

catch’s picture

Status: Reviewed & tested by the community » Needs review
+++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
@@ -9,19 +9,40 @@
  *

Again this is talking about avoiding overwrites not duplication, doesn't really address the issue summary and I really think we should consider a re-name and deprecating the old classes for clarity.

mikeryan’s picture

@catch: What would we rename it to? What it does is dedupe IDs (or potentially other unique fields), I'm not sure how better to briefly describe that (other than dedupe_entity_id?).

mikeryan’s picture

uniquify?

quietone’s picture

Wow, uniquify. It isn't in the dictionary. While I understand what it means, it is probably not helpful to people who aren't fluent in English.

And this adds some information:
http://english.stackexchange.com/questions/61939/uniquate-eliminate-or-r...

But I tend to the simple and direct. Since this is making any field unique how about 'Disambiguate' or the more wordy
'Make_unique'?

heddn’s picture

Assigned: Unassigned » quietone

We discussed in the migrate weekly meeting and we will rename to 'make_unique' and update docs. Assigning to quietone for follow-up.

quietone’s picture

we should consider a re-name and deprecating the old classes for clarity.

The name is changed to MakeUnique, documentation updated and the old classes deprecated (though not sure that I've done that correctly).

heddn’s picture

Status: Needs review » Needs work
+++ /dev/null
@@ -1,86 +0,0 @@
-class DedupeEntity extends DedupeBase implements ContainerFactoryPluginInterface {

Instead of deleting, I think we should mark as @deprecated. See #2187735: Add removal information to docblock of all @deprecated functions. And maybe you are already, but can you run the diff using -M? git diff -M I think it will more cleanly demonstrate the renames.

quietone’s picture

Status: Needs work » Needs review
FileSize
20.53 KB
23.18 KB

@heddn, thanks for the review. Seems I uploaded the wrong patch. Let's try again.

Ad it looks I changed all the instance of dedupe:

ag dedupe core
core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
6: * This abstract base contains the dedupe logic.
13: * @link https://www.drupal.org/node/2345929 Online handbook documentation for dedupebase process plugin @endlink
18:abstract class DedupeBase extends MakeUniqueBase {

core/modules/migrate/src/Plugin/migrate/process/DedupeEntity.php
11: * @link https://www.drupal.org/node/2135325 Online handbook documentation for dedupe_entity process plugin @endlink
14: *   id = "dedupe_entity"
20:abstract class DedupeEntity extends MakeUniqueEntity {
heddn’s picture

Assigned: quietone » Unassigned
Status: Needs review » Needs work

A couple of (hopefully) small nits.

  1. +++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeEntity.php
    --- /dev/null
    +++ b/core/modules/migrate/src/Plugin/migrate/process/MakeUnique.php
    

    Bikeshedding, but maybe MakeEntityUnique or MakeUniqueEntity to clarify this is for entities.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/process/MakeUnique.php
    @@ -0,0 +1,148 @@
    + * Ensures the source value is not duplicated against an entity field.
    

    Instead of duplicated, let's state that this makes the entity field value unique. Here and elsewhere, where we use the phrase, "duplicate".

quietone’s picture

1. I thought about that. But since the field name that is to be made unique still has to be provided it could be any field and not the entity id. Why assume it is the entity id?

2. Yea, that phrasing works for me.

heddn’s picture

What about 'make_unique_field_value'?

Or (another line of logic entirely) collapse the abstract & concrete classes into a single class. Someone can always extend a concrete class, why have the additional level of complexity if we aren't going to have some distinguishing aspect of MakeUnique. I think we either need to call it either MakeUnqiue or define what we are making unique.

Also, I cannot tell on point #2. Are you agreeing that we should change it or disagreeing and saying the current language is fine. I don't feel strongly enough to push the point. But if we are going to change the name of the thing, then I feel we should utilize language that emphasizes the new name, rather than vestiges of the old name.

quietone’s picture

Status: Needs work » Needs review
FileSize
22.31 KB
23.28 KB

What about 'make_unique_field_value'?

Since the exists method used is for entities, make_unique_entity_field_value., would be more accurate. But 'value' isn't used in other process plugin names, so maybe 'make_unique_entity_field'?

2. I agree with you.

This patch uses make_unique_entity_field and remove 'deduplicate' from the comments.

Sadly, the interdiff is larger than the patch, but attached for your reading pleasure.

Status: Needs review » Needs work

The last submitted patch, 26: 2824610-26.patch, failed testing.

quietone’s picture

Status: Needs work » Needs review
FileSize
22.3 KB
628 bytes

Fixed typo.

Status: Needs review » Needs work

The last submitted patch, 28: 2824610-28.patch, failed testing.

quietone’s picture

Status: Needs work » Needs review
FileSize
22.31 KB
499 bytes

Seems to be my day for silly errors....

heddn’s picture

Status: Needs review » Reviewed & tested by the community

I think this is good now. All feedback is now addressed.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

quietone’s picture

Title: Add documentation to DedupeBase/DedupeEntity process plugins » Add documentation to DedupeBase/DedupeEntity process plugins and rename to MakeUnique

Adding rename to the title since this is also doing a rename.

xjm’s picture

Version: 8.4.x-dev » 8.3.x-dev

Straightforward code documentation improvements can always go into any patch release, alpha, beta, or RC, so please always file them against the production branch (currently 8.3.x). Thanks!

catch’s picture

Title: Add documentation to DedupeBase/DedupeEntity process plugins and rename to MakeUnique » Rename DedupeBase/DedupeEntity process plugins to MakeUnique and add documentation
Version: 8.3.x-dev » 8.4.x-dev

This isn't a straightforward docs patch, it's also a rename. Moving back to 8.4.x but also switching the order of the title around to make that clearer.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 30: 2824610-30.patch, failed testing.

jofitz’s picture

Status: Needs work » Needs review
FileSize
22.17 KB

Re-roll.

Status: Needs review » Needs work

The last submitted patch, 37: 2824610-37.patch, failed testing.

quietone’s picture

Status: Needs work » Needs review
FileSize
22.35 KB
505 bytes

Changed filename from DedupeEntityTest.php to MakeUniqueEntityTest.php. And get an entity_type.manager instead of entity.query.

Status: Needs review » Needs work

The last submitted patch, 39: 2824610-39.patch, failed testing.

quietone’s picture

Status: Needs work » Needs review
FileSize
22.37 KB

That filename change should have been to MakeUniqueEntityFieldTest.php.

No interdiff since the filename change didn't show up in it.

heddn’s picture

Status: Needs review » Reviewed & tested by the community

Ready again.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 41: 2824610-41.patch, failed testing.

jofitz’s picture

Status: Needs work » Needs review
FileSize
22.3 KB

Re-rolled.

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

Was RTBC, let's set it back. I compared the two patches and all the changes are related to the recent coding standards for short array syntax patch.

#2776975: March 3, 2017: Convert core to array syntax coding standards for Drupal 8.3.x RC phase

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 44: 2824610-44.patch, failed testing.

heddn’s picture

Issue tags: +Needs reroll

Legitimate failures. Tagging for reroll.

quietone’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
FileSize
24.16 KB
14.97 KB

DedupeEntity was incorrectly made an abstract class, which in combination to the addition of dedupe_entity in d7_taxonomy_vocabulary.yml, caused the errors.

d7_taxonomy_vocabulary.yml is now using make_unique_entity_field. DedupeEntity is no longer abstract.

Plus, I have restored the original DedupeEntity unit test, which would have caught the error.

heddn’s picture

Status: Needs review » Needs work

I don't see DedupeEntityTest in the patch, only the interdiff.

quietone’s picture

Previous patches moved DedupeEntityTest to MakeUniqueEntityFieldTest and then modified it, leaving only one test file. The latest patch copeid DedupeEntityTest to a new file MakeUniqueEntityFieldTest, leaving the original untouched and adding the new one.

quietone’s picture

Status: Needs work » Needs review

But, I just realized that the original test needs to have deprecation added to it.

quietone’s picture

Now with deprecation notice on the dedupe test. At least I hope, experiencing a bit of stress with canceled flights and delayed flights.

heddn’s picture

Status: Needs review » Needs work
+++ b/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
@@ -2,6 +2,9 @@
+Drupal 8.4.0 and will be removed before Drupal 9.0.0. Instead, use '. __NAMESPACE__ . '\MakeUniqueEntityFieldBase', E_USER_DEPRECATED);

Not a blocker per se, but I do have question/nit. Can we use sprintf and MakeUniqueEntityFieldBase::class to make this easier to read? Is there a pattern set that should be used for these trigger_error's?

quietone’s picture

Status: Needs work » Needs review
FileSize
25.61 KB
1.88 KB

@heddn, I think I implemented that as specified in the policy, https://www.drupal.org/node/2856615#how-class.

Tidied all the deprecation notices to be 8.4.x.

heddn’s picture

Status: Needs review » Reviewed & tested by the community

Thanks for the link. Then this is good to go.

phenaproxima’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/modules/migrate/tests/src/Unit/process/DedupeEntityTest.php
    @@ -2,6 +2,9 @@
    +@trigger_error('The '. __NAMESPACE__ . '\DedupeEntityTest is deprecated in
    

    Coding standards violation -- the . (concat operator) needs to be surrounded by a space on each side.

  2. +++ b/core/modules/migrate/tests/src/Unit/process/DedupeEntityTest.php
    @@ -2,6 +2,9 @@
    +Drupal 8.4.0 and will be removed before Drupal 9.0.0. Instead, use '. __NAMESPACE__ . '\MakeUniqueEntityFieldTest', E_USER_DEPRECATED);
    

    Again, the concat operator must have a space on either side of it.

  3. +++ b/core/modules/migrate/tests/src/Unit/process/DedupeEntityTest.php
    @@ -11,6 +14,9 @@
    + *
    + * @deprecated in Drupal 8.4.x and will be removed in Drupal 9.0.x. Use
    +     \Drupal\migrate\Plugin\migrate\process\MakeUniqueEntityField instead.
    

    I don't think we need to have a deprecation notice on a test...?

quietone’s picture

Status: Needs work » Needs review
FileSize
25.24 KB
3.44 KB

Fixed items in #56. And removed an unused use statement.

Beware if you copy the trigger error format from Drupal core deprecation policy that you need to correct the spacing.

ohthehugemanatee’s picture

Reviewed the changes, they make sense to me. Validated that dedupe is now only mentioned in BC stuff. Made the changes phenaproxima suggested.

ohthehugemanatee’s picture

Ah ffs - sorry to duplicate your work @quietone! The patches are effectively identical, except you caught another deprecation notice missing space, and the use statement. My review of the patch therefore stands... and we should use quietone's version. Please mark the issue as RTBC once tests come back green.

ohthehugemanatee’s picture

I can't seem to show @quietone's patch. But that's the one we should accept. So, er... just look at that comment 57.

  • catch committed eef585f on 8.4.x
    Issue #2824610 by quietone, Jo Fitzgerald, ohthehugemanatee, heddn,...

  • catch committed 9590bbf on 8.3.x
    Issue #2824610 by quietone, Jo Fitzgerald, ohthehugemanatee, heddn,...
catch’s picture

Status: Needs review » Fixed

Fixed these on commit:

FILE: ...x/core/modules/migrate/src/Plugin/migrate/process/DedupeBase.php
----------------------------------------------------------------------
FOUND 2 ERRORS AFFECTING 2 LINES
----------------------------------------------------------------------
 5 | ERROR | [x] Concat operator must be surrounded by a single space
 6 | ERROR | [x] Concat operator must be surrounded by a single space
----------------------------------------------------------------------
PHPCBF CAN FIX THE 2 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------

Time: 38ms; Memory: 4Mb


FILE: ...core/modules/migrate/src/Plugin/migrate/process/DedupeEntity.php
----------------------------------------------------------------------
FOUND 2 ERRORS AFFECTING 2 LINES
----------------------------------------------------------------------
 5 | ERROR | [x] Concat operator must be surrounded by a single space
 6 | ERROR | [x] Concat operator must be surrounded by a single space
----------------------------------------------------------------------
PHPCBF CAN FIX THE 2 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------

Time: 18ms; Memory: 4Mb

PHPCS: core/modules/migrate/src/Plugin/migrate/process/MakeUniqueBase.php passed
PHPCS: core/modules/migrate/src/Plugin/migrate/process/MakeUniqueEntityField.php passed
PHPCS: core/modules/migrate/tests/src/Unit/process/DedupeEntityTest.php passed

FILE: ...les/migrate/tests/src/Unit/process/MakeUniqueEntityFieldTest.php
----------------------------------------------------------------------
FOUND 0 ERRORS AND 1 WARNING AFFECTING 1 LINE
----------------------------------------------------------------------
 8 | WARNING | [x] Unused use statement
----------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------

Committed/pushed to 8.4.x and cherry-picked to 8.3.x, thanks!

quietone’s picture

@catch, thanks.
I did fix those errors in patch #57. But then I noticed there are 2 patches named 2824610-57.patch, one in comment #57 and one in #58. And only the later one appears in the file listing.

catch’s picture

Somehow I managed to commit this from Code Needs Review yesterday (and missed the #58 patch with the coding style fixes). Since it was already RTBC a couple of comments before, and the changes were the same as I ended up making on commit anyway, going to leave status as is. Just wanted to leave a comment in case anyone gets confused. This might be my first ever direct from CNR -> Fixed commit, will try to make it the last one too.

DrupalMattS’s picture

Added @see records for:
/core/modules/migrate/src/Plugin/process/DedupeBase.php
/core/modules/migrate/src/Plugin/process/DedupeEntity.php
/core/modules/migrate/src/Plugin/process/MakeUniqueBase.php
/core/modules/migrate/src/Plugin/process/MakeUniqueEntityField.php

joelpittet’s picture

@DrupalMattS If that is something we do (which I've yet to see, but doesn't mean we don't), it would be best to create a follow-up issue because this one is fixed. Click 'clone issue' in the top right of this page.

Mile23’s picture

Probably un-deprecating DedupeEntityTest here, because tests can't really be deprecated and should just be replaced: #2488860: Bring phpunit bridge into drupal and use it for unit tests and simpletest to handle Deprecation

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.