This is an alternative for the derivation strategy in #2911244-141: Field collections deriver and base migration.
Problem/Motivation
Bug (A)
If you try to migrate nested paragraphs with the current paragraphs HEAD, you will get a lot of orphaned paragraphs entities after the migration, with broken references in the parent entities.
But even for having an easier and more straightforward (nested) field collections and paragraphs migrations, it would be nice to derive paragraph and field collection entity migration not just based on their own bundle (this equals to the parent field in case of field collections), but based on their parent field, parent entity type ID and (maybe) parent entity bundle ID.
If paragraphs derives these migrations with this strategy, it will be very easy to represent the migration dependencies in the corresponding field migration plugins' defineValueProcessPipeline
method, the references will be kept, and you won't have any orphaned paragraphs.
Work around data integrity issues caused by Drupal 7 paragraphs (B)
There might be other data integrity issues as well: I've seen a client site where paragraphs entities (and almost all of their revisions) were deleted (because user deleted a paragraph in a newer node revision for example). In this casewe cannot migrate anything for some previous revisisions if we have no data, but the revision of the paragraph reference field is kept, so the value process pipeline creates non-updatable stubs in these cases (and most of the cases in the wrong paragraph migrations).
I except that in such cases, the non-resolvable references should be gone on the destination site as well (you don't have this data on the source D7 site) instead of creating invalid stubs.
Proposed resolution
Solution for bug (A):
- Apply a new, parent-based paragraphs derivation strategy, including new, or extended tests.
- Add new test content, e.g: new content with paragraphs referencing another paragraphs AND even field collections and vice versa.
Solution for (B):
- Don't migrate paragraph revisions if the active revision does not exist
- Prevent stub creation in value process pipelines.
Remaining tasks
- Don't migrate paragraph revisions if the active revision does not exist
- Prevent stub creation in value process pipelines.
- Apply a new, parent-based paragraphs derivation strategy, including new, or extended tests.
- Add new test content, e.g: new content with paragraphs referencing another paragraphs AND even field collections and vice versa.
API changes
Different paragraph and field collection migration derivatives are provided by paragraphs.
Data model changes
Data integrity is kept.
Comment | File | Size | Author |
---|---|---|---|
#45 | paragraphs-orphaned_log_instead_of_message-3145755-45.txt | 8.38 KB | huzooka |
#43 | paragraphs-bug_derive_by_parent_and_orphans-3145755-43.diff | 498.61 KB | huzooka |
Issue fork paragraphs-3145755
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3145755-migrate-paragraphs-per-parent-entity changes, plain diff MR !5
Comments
Comment #2
huzookaPatch
paragraphs-derive_by_parent-3145755-2--on-top-of-2911244-141--do-not-test.patch
requires #2911244-141: Field collections deriver and base migration applied, the combined patch contains both of them.Comment #3
Wim LeersNit: meaningless comment.
Nit: "strategy" vs "method".
I think "strategy" is preferable. I was going to say that that is the term that https://www.drupal.org/node/3105503 uses, but that's unfortunately not true. That uses "type" 🤷♂️
🤓 Weird formatting IMHO. I think this is what core uses:
Übernit: s/field collection/field_collection/
Nit: IMHO it's always preferable to do
assert($something instanceof FQCN)
instead of/** @var FQCN $something */
. That gives you an extra assurance and is not IDE-specific.Übernit: s/paragraph/paragraphs/
🤔 What about
node_revision
andnode_complete
? These seem to only work for non-Node
entity types?🤔 What are "entity properties"?
🤔 What is a "configuration item"?
Übernit: why mix single and double quotes in one function call?
Comment #4
huzookaComment #5
huzookaAddressed everything from #3, except of #10:
In the third argument (in the join conditions) I need variable substitution, while the first and the second args are simple string literals.
Comment #6
huzookaUsed wrong strategy values in the tests. And they're failing with the new derivation strategy. 😕
Comment #7
huzookaI also added a new test content with nested paragraphs. Its migration ends in orphaned paragraphs (and missing paragraph reference values) with the original derivation strategy – so I just skip testing this new content in
ParagraphContentMigrationTest
and also inMigrateUiParagraphsTest
.The combined and also the on top of #2911244-146: Field collections deriver and base migration patches contain the database fixture update as well. It still needs some cleanup, but it takes too much time, I have given up.
Forgot to address #3.7:
@Wim Leers, I think that you mix the migration (base) plugin ID with the migration destination (base) plugin ID.
There is no such thing as
node_complete
imho (fixme). There ared6_node_complete
ord7_node_complete
that are the migrations of complete nodes, but their destination plugin's base ID isentity_complete
.Comment #8
huzookaI just noticed that the acceptance criteria (migrate even nested paragraphs and field collections successfully) isn't met here.
Right now, the new derivation strategy creates these kind of derivatives:
d7_paragraphs:paragraph_reference_field:node:article:paragraph_bundle
, ord7_paragraphs:paragraph_reference_field_in_nested_para:paragraphs_item:paragraph_bundle:paragraph_bundle
So the pattern is:
d7_paragraphs:<parent-field-name>:<parent-entity-type>:<parent-entity-bundle>:<paragraph-bundle>
.The only problem with this is that you may have a nested paragraph entity that contains an another nested paragraph entity. In this situation, the current patch cannot properly set the required migration dependencies, and it can lead to circular dependencies (nested bundle 1 contains nested bundle 2 and vice versa), or to a case where a paragraphs migration depends on itself.
I think that we can handle this by introducing a yet-another derivation level that represents the nesting level of the paragraph entity migration.
So my proposal is: every paragraphs migration which parent entity type is a paragraph will be derived with an extra nesting level bit. Example:
d7_paragraphs:paragraph_reference_field:node:article:paragraph_bundle
, but:d7_paragraphs:paragraph_reference_field_in_nested_para:paragraphs_item:paragraph_bundle:paragraph_bundle:1
,d7_paragraphs:paragraph_reference_field_in_nested_para:paragraphs_item:paragraph_bundle:paragraph_bundle:2
etc, so the pattern will be:
d7_paragraphs:<parent-field-name>:<parent-entity-type>:<parent-entity-bundle>:<paragraph-bundle>:<how-many-nesting-levels-you-have-to-go-back-to-get-a-non-paragraph-parent-entity>
.Back to needs work.
Comment #9
Wim LeersRE #3.7 (I wrote a reply to that but apparently failed to post it 😬) — yep, you're right, oops! 🙈😅
#8: thanks for that detailed write-up! 🙏 I am really curious to @heddn's thoughts about this!
Comment #10
huzookaComment #11
ccjjmartin CreditAttribution: ccjjmartin at Four Kitchens commented@huzooka It looks like your patches have a bunch of extra files in them. An example I am seeing is file paths committed that show sites/default/files. Can you look at uploading your patches again? The ones closer to 1MB in size seem to be the ones affected.
Comment #12
huzooka@ccjjmartin, the "big patches" contain database fixture updates and also add the missing files for being able to test this patch with Drupal core's Migrate UI.
This is exactly the same that I already shared in #7:
Comment #13
huzookaWe don't need the approach described in #8 – the
paragraphs_lookup
migration process plugin will create the required stubs.Comment #15
huzookaSad, but true: since the migration plugin IDs of the derived migrations of this patch are longer than the usual, this depends on (and is blocked by) #2845340: migrate mapping & messages table names are truncated, can lead to incorrect mapping lookups.
Comment #16
huzookaAnother bug discovered when migrating from PostgreSQL source: #3164520: FieldableEntity::getFieldValues() does not guarantee that the returned field values are sorted by their delta.
I will (and I can) work around this issue.
Comment #17
huzookaComment #19
huzookaFixing coding standard.
Comment #20
huzookaAdding a "fix-only" patch as well.
Comment #21
Wim Leers#13:
🤔 "at this point not accurate" — fascinating!
This looks like it could/should be in a helper method.
"finalized" is a less ambiguous word 👍 Let's use "unfinalized" and "finalized".
Does "raw" here mean "unfinalized"?
Why is this order changing? (Not that it matters, I would just like to understand.)
#17:
👏
🤔 Why do we need the
array_filter(…, 'is_int')
here?Comment #22
huzookaComment #23
huzookaStill needs some work for the field_collection migrations.
Comment #24
huzookaAddressing #21 and #23.
Re #21:
[ #13 ]
[ #17 ]
I will try to reduce the database fixture changes as much as possible.
Comment #26
huzookaD9.1 test failures are unrelated to this patch afaict.
This patch contains the db fixture with manually reduced size.
Only one single CS error was fixed in a test, no interdiff.
Comment #27
Wim LeersAmazing work!!! 🤩🚀
Comment #28
huzookaComment #29
Wim LeersThis is the essential change here — everything else is just infrastructure changes that make this code a lot simpler (and hence easier to understand).
What this is doing now is using required dependencies whenever possible. Only when the source value of the
process
pipeline is dynamic, we add both possible dependnecies, and we mark both as optional, since we cannot be certain that it will be required.👍
Comment #31
Wim LeersThe two fails are for coding standards:
Comment #32
iancawthorne CreditAttribution: iancawthorne commentedDeleted.
Comment #33
Wim LeersConfirmation over at #3178043-4: Experience report: migrating D7 field collections to D8 paragraphs that this patch is helping making Paragraphs migrations feasible.
Comment #34
iancawthorne CreditAttribution: iancawthorne commentedAs per my post mentioned on #33 above, I was initially struggling with this with a migration involving a site with both field collections and paragraphs plus some of the paragraphs being nested.
Some essential steps I found to make it all work were:
I had to run the field and field instance migrations twice, eg:
d7_field and d7_field_instance were required to get the d7 field collection originating paragraph type fields added while;
upgrade_d7_field and upgrade_d7_field_instance got the d7 paragraph originating paragraph type fields.
There seemed to be no side effects of doing this, but it seemed odd that it was necessary.
then for the data...
The field collections worked absolutely fine, as did the paragraphs which were not nested. For the paragraphs which were nested (to be a bit more clear, by nested I mean paragraph type 1 > paragraph type 2 > content type hierarchy) it was essential to run the first child items first, followed by the "middle" paragraphs, finally the content. Without doing this (which was not protected with migration order dependency - I had to work out which order it needed to be and do them one by one), the paragraphs would get very confused with the wrong "paragraph type" being assigned to the content with empty fields.
So the summary was; you can't just let it run through the migration on auto. It needs some manual working out of the order in which things need to happen.
Comment #35
Wim Leers#34:
Thanks for sharing that — hopefully this means @huzooka and I can convince the migration system maintainers (and the Paragraphs maintainers) to use more
['migration_dependencies']['required']
than we've been able to convince them to use in Drupal core 🤞Comment #36
huzookaThis is the current MR as diff.
Comment #37
huzookaComment #38
huzookaComment #39
huzookaThe legacy derivation strategy was completely removed, and rebased on top of #3226658: Guarantee the uniqueness of the migrated paragraph types.
Comment #40
huzookaComment #41
huzookaAdding patch #40 as diff.
Comment #43
huzookaComment #44
Wim Leers#39 interdiff:
Well … that's a huge simplification! 🤩
#40:
If I understand it correctly, this simple addition is wholly responsible for ensuring no orphaned paragraphs are migrated — wow! 🤩
Übernit: s/restora/restore/
#43:
This looks much more robust indeed, and simpler.
🤔 When can this occur?
Comment #45
huzookaThis is for logging data integrity issues of paragraphs (revisions) – instead of saving migrate messages. Works if you have the logger process plugin from here: #3232075: Create a logger migrate process plugin
I don't want to rebase my patches on top of the current paragraphs HEAD (lack of time), so I attach an interdiff only.
Comment #46
Wim Leers#45:
… and it is not something that can solved! At least not automatically. And manually it'd also be really hard.
IOW: this is a data integrity issue affecting only past revisions: the default (live) revision MUST work correctly, otherwise the migration will fail.
And that is why this "log to inform but do not complain" solution is deemed acceptable: only if you go digging through the past history (the revisions) of these paragraphs will you run into problems.
Comment #47
pub497 CreditAttribution: pub497 commentedI was only able to get this to work as #34 did, thanks for that -- side effect was my entity reference fields that used taxonomies got reset so those content types needed to have the taxonomy re-selected and content had to be re-imported
Comment #48
natefollmer CreditAttribution: natefollmer commentedI can't get the patch to apply cleanly to the latest dev branch. I'm having some issues with nested paragraphs coming in with data-less fields and I'm hoping this patch may be the fix.