Problem/Motivation
Node authors don't always migrate properly.
Proposed resolution
Make node and comment migrations dependent on user migrations.
Remaining tasks
Updated node and comment configuration files, manually test.
User interface changes
None.
API changes
None.
Original report by @eliza411
Migrate documentation says that it doesn't matter what order you put migration in in the manifest file: https://www.drupal.org/node/2257723
Using a manifest created from the drush listing, node authors weren't migrating. I moved the user migrations to the top of the manifest and the node authors migrated as expected.
Either the documentation needs to be updated or something needs to be adjusted so that user migrations are run first. If it's a question of docs, I can make those updates, but I want to verify that it's indeed the case.
Comment | File | Size | Author |
---|---|---|---|
#29 | interdiff.txt | 888 bytes | benjy |
#29 | 2376141-29.patch | 8.24 KB | benjy |
#22 | interdiff.txt | 901 bytes | benjy |
#22 | 2376141-22.patch | 8 KB | benjy |
#18 | interdiff.txt | 2.74 KB | chx |
Comments
Comment #1
eliza411 CreditAttribution: eliza411 commentedAfter verifying that the order should not matter, I'm setting this to major and retitling.
What I'm seeing is that nodes aren't associated their authors when the user migrations are at the bottom of the file (except for user1), but they ARE associated when the user migrations are at the top.
Using the drush order (alphabetical)
Moving user migrations to the top
I tried an experiment: I just moved the user migrations to the top of the file so that nodes would migrate later, did a
drush si
, and ran the migration. (See the attached user.yml manifest which has the same commented out migrations as the first) In this case:Here's a comparison of the actual order they ran in:
https://docs.google.com/spreadsheets/d/1BtLemKzDWUQiIYELpowVU26qYzholMyz...
Comment #2
eliza411 CreditAttribution: eliza411 commentedChanging to a bug report
Comment #3
ultimikechx and I talked about this issue earlier this week during the weekly hangout, from what I gathered, we need to make all migrations that involve "entities with authors" dependent on users because "entities do not want to store invalid data" (chx mentioned this was related to a core bug). This includes nodes, node revisions, and comments (any others?)
I'll go ahead and update the issue summary (chx - what is the core bug that you referred to in the hangout?) and then do some manual testing to confirm that adding the dependencies solve this issue.
Thanks,
-mike
Comment #4
chx CreditAttribution: chx commentedWe agreed on starting with adding a user dependency to node , node_revision, and comments.
Comment #5
ultimikeI've banged on this one for a few hours and I'm not convinced this is a dependency issue.
I have a simple D6 site with a handful of users, nodes, revisions, and comments that I've been testing with. I first tried to duplicated eliza411's issue of things not getting their proper authors using her default.yml manifest. I was able to reproduce the issue she was having with nodes not getting the proper authors.
I then tried using eliza411's author.yml manifest, but that didn't fix the missing node authors for me.
I then tried adding d6_user as a dependency on d6_node, d6_node_revision, and d6_node_comment and that didn't fix the missing node authors either.
After some digging, I found that the d6 node source plugin was pulling the user id for node from the node revision table, not the node table. On my d6 site, the user id of the node revision is set at the user actually making the revision - not necessarily the user listed as the author of the revision.
I've attached a patch that solves this issue for me. I'd like to get weal's feedback on this because I seem to recall that he worked a similar issue
Thanks,
-mike
Comment #6
ultimikeComment #8
ultimikeI dug into this one a bit more. I've been unable to find time to chat with weal about this yet.
DB structure differences
In D6:
In D8::
So, in D6, there is no information about who the listed author is at each revision, only the current revision. In D8, we know who the listed author is at each revision (node_field_revision table).
Manual testing
Starting with a simple D6 site with a single node with 4 revision, where the current author is uid=2, I ran the migration to Drupal 8 and while the node appears to migrate fine, the author is incorrect. When attempting to view the list of revisions (node/2/revisions), I get a PHP fatal error: http://note.io/1sHbWZm
I'm pretty sure the author is incorrect because the node source plugin is pulling the uid from the node_revision table, not the node table. In fact, I think that we need to be pulling both node.uid (content author) and node_revisions.uid (revision author) and mapping those to the proper D8 fields.
-mike
Comment #9
ultimikePatch attached that takes into account both the node author uid and the revision author uid.
I was surprised that the d6_node migration wasn't dependent on the d6_user migration, so I added that as well.
Based on what I found, I'm still not convinced that there is an dependency/ordering issue. I've been unable to reproduce the behavior that eliza411 reported.
I've run MigrateNodeTest, MigrateNodeRevisionTest, MigrateDrupal6Test, and a manual test locally - all passed.
-mike
Comment #11
chx CreditAttribution: chx commentedThanks. It seems you got almost there; only tests failing for test reasons; I fixed them.
Comment #12
chx CreditAttribution: chx commentedHrm. That wanted to be the interdiff. Sorry!
Comment #13
chx CreditAttribution: chx commentedHere's some test coverage.
Comment #14
chx CreditAttribution: chx commentedSince d6_user has
uid: uid
and in general migrate_drupal (but not migrate) keeps numeric entity ids I have moved the migration lookup of uids into a comment keeping it because it's quite likely people will start at node to learn about migration (thanks benjy for the suggestion!).Comment #15
benjy CreditAttribution: benjy commentedPatch fixes the author id problems which I think should be the main focus of this issue. Test coverage has been added, RTBC.
Comment #17
benjy CreditAttribution: benjy commentedStraight re-roll
Comment #18
chx CreditAttribution: chx commentedThe test I wrote, that was testing bupkis.
Comment #19
benjy CreditAttribution: benjy commentedBack to RTBC.
Comment #20
alexpottI don't understand how this works in one place but not the other. Testing with the suggested structure in the comments passes all the migrate_drupal tests.
Comment #21
benjy CreditAttribution: benjy commentedThat should just be removed from node_revision or commented out the same as node. It works with or without the code because we map the uid in D6/D7 to the uid in D8 therefore the ids are the same in both the source and destination sites. This makes the lookup superfluous.
Same for revision_uid.
Comment #22
benjy CreditAttribution: benjy commentedRemoved as mentioned in #21
Comment #23
chx CreditAttribution: chx commentedLooks like Alex's concerns are addressed.
Comment #24
webchickHm. Not sure Alex's concerns were totally addressed. He seemed to also be asking why we comment that stuff out, since it seems to work fine either way. Moving back to him for further comment. (The commented out stuff also gave me pause, since we don't do that anywhere else that I'm aware of?)
Comment #25
chx CreditAttribution: chx commentedIt's left there as an explanation and with a comment. Yes we don't typically do it but we felt that nodes will be the place people look first for uid migrations and custom migrations will need the lookup.
Comment #26
benjy CreditAttribution: benjy commentedYeah, I do think it's worthwhile leaving, core serves as a great example for developers and the node migration is likely to be a popular example.
Comment #27
alexpottHonestly, I have no idea what to do with this information. At the moment it just makes me think where does migrate_drupal keep these ids. To actually be helpful to people writing migrations the comment needs to be more verbose.
Comment #28
benjy CreditAttribution: benjy commentedProposed improved comment.
"Core migrations are designed for replacing the upgrade path and therefore all node and user ids are preserved. For that reason we do not need to look-up the user id for the node. If you're writing a custom migration, user ids will vary from the source site and a lookup as shown below will be required."
Comment #29
benjy CreditAttribution: benjy commentedRe-roll with proposed comment in 28.
Comment #30
benjy CreditAttribution: benjy commented@alexpott and @chx approved the comment in IRC, setting back to RTBC.
Comment #31
alexpottMigrate is excluded from beta evaluation. Committed 498bcc7 and pushed to 8.0.x. Thanks!