Problem/Motivation
There needs to be a way to adjust the source key before it gets used. Currently, the setting of the source IDs is embedded inside next(). This should be externalized so it can be overridden.
Proposed resolution
- Hash the keys in the mapping table. This fixes the problem for non-ASCII keys.
- Add the unhashed values to non-indexed (non-PK) columns. This makes the DX of hashing less bad, since the original values would still be available; they are just no longer stored as a PK (with all its limitations).
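Outside Drupal, the hashing approach can be sketched in a few lines. The function name, the 64-character width, and the choice of SHA-256 are illustrative assumptions, not the patch's actual implementation:

```python
import hashlib

def hash_source_ids(source_id_values):
    """Reduce arbitrary source ID values to a fixed-length key."""
    # Serialize the ID values in a deterministic order, then hash.
    # The hex digest has a constant length no matter how long or how
    # non-ASCII the original values are, so it is always safe to store
    # in an indexed VARCHAR column.
    serialized = repr(sorted(source_id_values.items()))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# A long, non-ASCII key that would overflow a 255-character ASCII PK.
row_ids = {"title": "Ett långt svenskt namn " * 20}
print(len(hash_source_ids(row_ids)))  # 64
```

The unhashed values can then live in ordinary non-indexed columns, purely for debugging and readability.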
Remaining tasks
Provide a patch with the new direction.
User interface changes
API changes
This adds a public setter on SourcePluginBase & MigrateSourceInterface.
Data model changes
Original summary by @miiimooo.
This is a follow-up to #2613332: Support for non-ascii collations in SQL migration map.
My problem is that I have Swedish characters in the only column used to create terms in a taxonomy. The code in Sql.php modifies the sourceidX field definition to ASCII so it can store keys up to 255 characters. Removing this breaks a number of tests.
At the moment, the CSV source returns type "string" for the identifiers field. But maybe this could be something different so I don't run into this problem with non-ASCII characters?
Comment | File | Size | Author
---|---|---|---
#69 | interdiff.txt | 696 bytes | edysmp
#69 | 2613878-hash_source_keys-69.patch | 25.47 KB | edysmp
#35 | interdiff-33-35.txt | 588 bytes | edysmp
#31 | 2613878-hash_source_keys-31.patch | 23.47 KB | edysmp
#31 | interdiff-29-31.txt | 16.35 KB | edysmp
Comments
Comment #2
heddn

Comment #3
heddn

Comment #4
heddn

Before I get distracted, let's throw out a patch to start the conversation. BTW, I'm classifying this as a bug in the API.
Comment #5
mikeryan

So, next() is just filled with chickens and eggs... Optimally, prepareRow() is the place to take the raw incoming data and transform it into your canonical source for the pipeline, so ideally that would be the place to manipulate source keys as needed. But one of the inputs into prepareRow() (via the Row object) is the existing idmap data (if any), so to retrieve that we have to have the already-fixed-up keys. Without rearranging the chickens and eggs, yes, adding the public setter is fine as far as it goes. Since it requires extending the source plugin, I'd also ideally like to see an event so anyone can get in there.
But is there any deeper refactoring we can do here? The source is tightly coupled to the idmap here; should it be? Ideally the source should just be delivering source data, but it needs to know what rows have previously been processed and thus (usually) skipped. Or it could spew forth events and a listening idmap could tell it when to ignore a row... Anyway, that's all pie in the (9.x) sky...
Comment #6
chx (CreditAttribution: chx)

That setter is missing because it shouldn't exist. IMO this is a won't-fix, and the other issue should focus on documenting getIds instead of prepareRow.
Comment #7
heddn

@chx, sometimes the only value available for the source key is a really long string. Sometimes it includes strange non-UTF-8 characters that aren't supported by the DB as a key. Sometimes the key is a number fabricated at run time. Lots of reasons exist why one might want to modify the source key. Currently, without overriding next(), there isn't any way to perform that override. In every 10 migrations on D7, I end up having to override source keys once or twice.
Why shouldn't it exist? What are some possible alternatives?
Comment #8
miiimooo

Just to add: in my case, if I wanted to handle this in prepareRow() instead, it would mean rewriting the source values once for this migration and then again for any other migrations that might refer to it.
I guess that is an option, but it's conceivable that the rewriting could be complex enough to warrant creating a further mapping table.
Comment #9
heddn

To summarize a discussion in IRC between chx, phenaproxima, neclimdul and myself:

Comment #10
heddn

Another related issue: #2543282: Migrate source CSV dies on long text. Also, I updated the IS given the current direction to solve this. Since this is the 4th issue on this in the last month, I'm going to bump it to major; it seems to be a very common request.
Comment #11
heddn

Comment #12
edysmp

Working on the new direction.

Comment #13
edysmp

Initial work in progress; still needs work on tests.
Comment #15
chx (CreditAttribution: chx)

Thanks for the great work! Overall it looks really good.
Please do not use a separate setSourceIdsHash() method, especially not a public one (if needed for testing, use a protected method). Currently it's not used anywhere but the constructor, and that's the right thing to do; as with the source IDs, the source hash should never change.
Comment #16
Lord_of_Codes (CreditAttribution: Lord_of_Codes)

Changed the function signature of setSourceIdsHash and turned it into a protected method.

Comment #17
jian he (CreditAttribution: jian he)

Comment #19
jian he (CreditAttribution: jian he)

Re-roll.
Comment #21
heddn

The problem is that the source ID set in next() uses the value before it is transformed. Then it freezes the row. Transform changes the values. #15 above (rightly) points out that the setter for the hash should be protected, but then we cannot update it when we transform the value. And getDestination() uses the hash of the old value from PRIOR to the transform.

Comment #22
heddn

This is where I think we need to solve the problem. Instead of passing in the original source values (which can change in transform) or passing in a hash (which is based on the original value), we need to pass in the transformed values and then hash them. Otherwise we will end up creating duplicate entries.
Test case:
- "Spanish" hashed = abc
- "Español" hashed = def

Why?
1. The 1st record (Spanish, which was transformed to Español) doesn't exist in the mapping. Whether we pass in abc or Spanish, it isn't found, so a record is created for Español.
2. The 2nd record (Español) also doesn't exist in the mapping. Passing def or Español, it isn't found either, so a 2nd entry is created for Español.
We need to pass the post-transformation values to lookupDestinationId(). But now we run into another problem: there isn't a way to extract the transformed values of the source IDs from Row. getSourceIdValues() returns the pre-transform values. Transform changes the destination values in the row, not the source values, and the keys for the destination are not the same as for the source.
Possible solution:
Update getSourceIdValues() to merge in any transformed values, and rename the function to getIdValues(), because it isn't the source values any more; it is just the values of the IDs. But I don't see a way to make this happen: there isn't any mapping inside of Row. That's outside of it.
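The duplicate-entry failure mode described above can be reproduced with a toy ID map; the hash helper and the map structure below are illustrative stand-ins, not the real migrate API:

```python
import hashlib

def h(value):
    # Stand-in for the ID map's source hash.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def transform(value):
    # Stand-in for a process pipeline with a static map.
    return {"Spanish": "Español"}.get(value, value)

id_map = {}  # source hash -> destination value

for source in ["Spanish", "Español"]:
    key = h(source)  # hashing the PRE-transform value
    if key not in id_map:  # the lookup misses for both rows
        id_map[key] = transform(source)

# Two map entries were created for the same destination: a duplicate.
print(sorted(id_map.values()))  # ['Español', 'Español']
```

Hashing the value after the transform would make both rows collapse onto one map entry.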
Comment #23
chx (CreditAttribution: chx)

I am confused as to what's happening here. I thought this change would be internal to the SQL idmap.
Comment #24
chx (CreditAttribution: chx)

As for #22, which is a very different issue, without hashing: I think extending the example with source IDs and destination IDs will make it easier to understand.
We have a static map with bypass TRUE mapping Spanish to Español.
Row 1: sourceid1 is Spanish; destinationid1 becomes Español.
Row 2: sourceid1 is Español; destinationid1 becomes Español.
These are two different rows as identified by the source ID. If you want to avoid this, then add a process plugin which skips the row if the destination already exists. See DedupeEntity::exists for the very code you need.
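The suggested workaround, a process step that skips a row when its destination already exists, can be sketched as follows. This is only an analogy to the DedupeEntity::exists approach; the function and parameter names are hypothetical:

```python
def dedupe(rows, transform, existing):
    """Skip a row when its transformed destination already exists,
    analogous to a process plugin built around DedupeEntity::exists."""
    for source in rows:
        dest = transform(source)
        if dest in existing:
            continue  # destination exists: skip the row entirely
        existing.add(dest)
        yield source, dest

static_map = {"Spanish": "Español"}
pairs = list(dedupe(["Spanish", "Español"],
                    lambda s: static_map.get(s, s),
                    set()))
print(pairs)  # [('Spanish', 'Español')]
```

Only the first row produces a destination; the second row maps to an existing destination and is dropped.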
Comment #25
edysmp

I think this is enough; I made changes only in the SQL idmap.
Please review.
Comment #26
edysmp

Comment #28
chx (CreditAttribution: chx)

Thanks for the patch! A few observations:
$this:: is valid syntax. Can we change it to static::?

Comment #29
Ada Hernandez (CreditAttribution: Ada Hernandez at MTech, LLC)

I made the changes referenced in comment #28.
Comment #31
edysmp

Worked on the tests and refined the SQL idmap.

Comment #33
edysmp

Updated test.

Comment #35
edysmp

Updated the getSourceIDsHash function.
Comment #36
chx (CreditAttribution: chx)

This looks incredibly promising; thanks for the persistent hard work!

Comment #38
Ada Hernandez (CreditAttribution: Ada Hernandez at MTech, LLC)

I updated testMessageSave() for the unit test.

Comment #40
edysmp

Final test... for now.

Comment #41
edysmp

Comment #42
chx (CreditAttribution: chx)

Awesome!
Huh. What is this? Why is source_id_values sometimes a list and sometimes an associative array? Is this a testing artifact? What's going on here?
Comment #43
edysmp

Comment #44
chx (CreditAttribution: chx)

This is really close!

+ // source key and value, e.g. ['nid' => 41]. In this case, $source_id_values need to be ordered the same
This comment needs to be 80 columns max.

+ * It is public only for testing purposes.
This needs a line break before it.

But the most serious problems are in saveIdMapping(): previously, if there were no sourceIdFields, we didn't save anything. Whether that makes any sense is not for this patch to decide, so this behavior needs to be kept. Before
+ $fields += array(
add a
if (!$fields) { return; }
to keep the previous behavior. Also, in
$this->eventDispatcher->dispatch(MigrateEvents::MAP_SAVE, new MigrateMapSaveEvent($this, $keys + $fields));
let's remove $keys from here; it's just the hashed source ID values, which should never be leaked to the outside world.

Comment #45
edysmp

Thanks for the review and comments.

Comment #46
chx (CreditAttribution: chx at Smartsheet)

Looks great! Thanks!
Comment #48
alexpott

This is looking like a good solution to a hard problem.
This does not order the keys in the same order as $this->sourceIdFields(). See https://3v4l.org/BiD8Y. Also, the fact that the tests are green implies that we're missing test coverage of this.
This is a bit concerning: why has the order in which files are migrated changed due to this?
Comment #49
chx (CreditAttribution: chx at Smartsheet)

Regarding order: defeat snatched from the jaws of victory. Check https://3v4l.org/ag2u7; it does the ordering according to the first array.
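The ordering idea (reorder looked-up values to match a canonical field order, as in the 3v4l example) can be sketched like this; the field names are hypothetical:

```python
def order_by_fields(field_order, values):
    """Reorder an ID-value mapping to match a canonical field order,
    mirroring the order-by-first-array behavior from the 3v4l example."""
    return {field: values[field] for field in field_order if field in values}

fields = ["source_id1", "source_id2"]            # canonical order
values = {"source_id2": "b", "source_id1": "a"}  # lookup order
print(list(order_by_fields(fields, values)))  # ['source_id1', 'source_id2']
```

Ordering by the canonical field list guarantees the hash input is the same no matter what order the caller supplied the values in.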
Comment #50
chx (CreditAttribution: chx at Smartsheet)

> This is a bit concerning - why has the order that files are migrated changed due to this?

Because previously we ordered on the serial source ID and now we order on the hash of that ID. The order is still deterministic, just different.
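The point can be illustrated directly: sorting by a hash of the key generally produces a different sequence than sorting by the serial key itself, but each sequence is fully deterministic. The helper below is a sketch, not the ID map's code:

```python
import hashlib

def h(value):
    # Illustrative stand-in for the ID map's source hash.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

ids = [1, 2, 3, 10]
by_source = sorted(ids)                    # old: order by serial source ID
by_hash = sorted(ids, key=lambda i: h(i))  # new: order by the ID's hash

# Re-running either sort yields the same sequence every time.
print(by_source)
print(by_hash)
```

Since the import is stateless between rows, either deterministic order is acceptable for the migration itself; only tests that assumed the old order notice the change.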
Comment #51
benjy (CreditAttribution: benjy at PreviousNext)

I'm not sure if we added explicit coverage for the key ordering, but we had implicit coverage from the pg driver; just added a test run for that.
Original issue: #2571499: idMap source and destination id filtering requires keys
Comment #52
heddn

PG tests passed. I think that means this is RTBC again?

Comment #53
alexpott

@heddn, nope, the sorting issue needs to be fixed, as explained in #48.1.

Comment #54
benjy (CreditAttribution: benjy at PreviousNext)

@heddn, the PG tests failed? https://www.drupal.org/pift-ci-job/155478

Comment #55
chx (CreditAttribution: chx at Smartsheet)

I discussed this with edysmp yesterday at length, and we came to an agreement on how to fix the sorting; it's not hard; use a loop. I expect they will post a patch soon, test pending.
Comment #56
edysmp

For sorting.

Comment #58
edysmp

And one less problem...

Comment #60
edysmp

Final test.

Comment #61
heddn

Given the blocking nature this causes for contrib migrate, I'm marking this as migrate critical. This really needs to get in before 8.1, or there will be a lot of BC issues.
Comment #62
chx (CreditAttribution: chx at Smartsheet)

This is quite close to ready. My only concern here is:
// Postgress sorts results by order inserted, MySQL sorts by hash.
There are two problems: a) it's "PostgreSQL"; b) while the comment states two facts, it is not clear at all why these facts need to be stated here. So something like: "As PostgreSQL sorts results by order inserted and MySQL sorts by hash, create a consistent order for easier testing".
Comment #63
alexpott

Doesn't this sort of imply that the fix should be in getIdMap(), so that it returns a consistent order?
I thought the whole point was that the order of the arguments does not matter?

Comment #64
heddn

Re #63:
The sort order is always consistent/deterministic. Typically the order doesn't matter, except when running tests. No need to add extra overhead to getIdMap(); just put a deterministic order into the test and it is fine.
The order being passed to lookupDestinationID() is important. See:
Comment #65
chx (CreditAttribution: chx at Smartsheet)

> The sort order is always consistent/deterministic.

This is not true. http://www.postgresql.org/docs/9.1/static/queries-order.html says "The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on", but we do not care:

> Typically, the order usually doesn't matter, except when running tests.

Correct. The actual import is stateless between rows, so whatever order we get the rows in, that's all fine. The test uses an order for simpler code, but the actual import does not.
Comment #67
heddn

Comment #69
edysmp

Comment #70
chx (CreditAttribution: chx at Smartsheet)

Thanks.

Comment #71
alexpott

Committed da55f60 and pushed to 8.0.x and 8.1.x. Thanks! I committed this to both branches as this is a bug, and although there is an API change here, it is small and migrate is experimental.

Comment #74
davidwbarratt (CreditAttribution: davidwbarratt at Golf Channel)

This issue introduced a critical issue. :(
#2679797: Migration migrate_update_8009 for source hash