Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
For we need to source data from D8. We started building something ourselves in #2912348: Handle entity_references related to Drupal 6 and 7 node translations with different IDs. Then found out that https://www.drupal.org/project/migrate_drupal_d8 already has much of that functionality. Let's move it into core instead of building something new.
Proposed resolution
Discussed where this should land and it was decided in the weekly migrate meeting to land all of the code directly into migrate_drupal.
Remaining tasks
Do it.- Review it.
User interface changes
None.
API changes
Addition: A new D8 content entity source plugin and its deriver.
Data model changes
None.
Comment | File | Size | Author |
---|---|---|---|
#77 | interdiff_74-77.txt | 4.16 KB | heddn |
#77 | 2935951-77.patch | 29.25 KB | heddn |
Comments
Comment #2
heddnAlso this is critical because it blocks finishing a critical.
Comment #3
heddnTerm test will fail. We also need to support translatable fields. But I want to see what else fails.
Comment #4
heddnWe need to reload the entity to fix the term tests.
Comment #5
heddnUnassigning for now.
Comment #7
maxocub CreditAttribution: maxocub commentedThis should add support for translations.
Needs tests for translations, so back to work.
Comment #8
heddnComment #9
heddnComment #10
maxocub CreditAttribution: maxocub commentedThis should fix the failing test.
Also, here's a start for testing translation support. The addtranslation() doesn't seems to have the desired effect. The node_field_data table only ends up with one row (the translated one) instead of two (the source and the translation).
Comment #12
heddnHopefully fixes the last failure here.
Comment #13
quietone CreditAttribution: quietone as a volunteer commentedJust two small things
s/contenty/content/
$modules?
Comment #14
phenaproximaThe entity type should not be configuration. I think this should be a derived plugin, because that would also allow us to set the source_provider more accurately.
I *really* do not think we should be directly reading the database tables. This module exists in Drupal 8, running on Drupal 8's API. Why not just take advantage of Drupal 8's APIs??
And policy dictates that API is going to stay stable for the foreseeable future, with plenty of time and notice to deal with deprecations. It seems pretty safe to rely on.
Same here. I am not a fan of this. For example -- this approach offers no support for dealing with computed fields. Just take advantage of the APIs!
Comment #15
heddnLinking to #2543726: Make $term->parent behave like any other entity reference field, to fix REST and Migrate support and de-customize its Views integration, since that blocks full taxonomy term support.
No interdiff as there's enough changes it wouldn't make sense. This still doesn't support computed fields or custom fields, but you can easily grok the code and add these via a prepareRow. This also doesn't have any deriver yet.
Comment #17
heddnOK, this passes tests locally. It has a deriver and should be ready for adding tests for i18n. Leaving the tests tag so we don't forget to do that. Also tagging for a change record so we don't forget to do that.
Comment #18
heddnComment #19
phenaproximaThanks for taking advantage of the entity API! This is pretty straightforward now; it's just refinement from here.
This can go away; the derivative ID is the entity type ID. Once parent::__construct() is called, $entity_type == $this->getDerivativeId().
Call the parent constructor first, then this can be $this->getDerivativeId().
I think we can remove this. It seems overzealous to me.
This returns a generator (\Generator), not an array, so we should adjust the doc comment as such.
IDs
Micro-optimization: We should call getStorage() before the foreach, so we don't have to keep doing it.
We should be very careful with toArray(). It returns entity data in a specific style, where every item is an array of values that would each need to be processed with something like the sub_process plugin. Not sure that will be ideal for most process pipelines.
@return and @throws need descriptions.
Should this return something?
Nit: $fieldDefinitions should be $field_definitions.
$fieldName should be $field_name. Same thing everywhere else in this plugin.
I'm not seeing where this is actually used?
Nit: getDefinitions() is keyed by entity type ID, so we could do $id => $definition and lose the repeated calls to $definition->id().
This is redundant and can be removed. The plugin's derivative ID is already the entity type ID.
Comment #20
heddnSo, this adds i18n export by default. And addresses most of the feedback in #19.
1, 2, 13, 14 => this is intentional. We want this source to be usable outside of a deriver where someone extends it and add their own custom queries. Using the deriver id would make that more troublesome.
3 => edge case handling is helpful. I suggest we leave it.
4, 5, 6, 8, 9, 10, 11, 13 => fixed
7 => It might not be ideal, but its the only way I know of to return the data in a valid manner. I'm open to suggestions though.
9 => How can we calculate a valid count that includes translations? I'm not sure.
Comment #22
heddnThis will probably still fail. But I couldn't get xdebug to pause so I could figure out what was going on. It seems to me like the entity schema isn't getting completely installed. But that doesn't make sense since I'm now explicitly installing it. Putting this aside for now. Maybe someone else will figure out what is wrong.
Comment #24
quietone CreditAttribution: quietone as a volunteer commentedThis worked for me. Action needs system module, book needs user module., setup() method isn't needed.
Comment #25
maxocub CreditAttribution: maxocub commentedThere's mode change on run-test.sh and it's accidentally included in the patch since #20.
Comment #26
quietone CreditAttribution: quietone as a volunteer commented@maxocub, thanks. I was just working too late.
Comment #27
maxocub CreditAttribution: maxocub commentedI think this should be 'source_module', right? When trying to use this plugin I have an exception thrown that it is missing the 'source_module' property.
Comment #28
heddnComment #29
alexpottHow come #27 wasn't picked up by the tests?
Comment #30
heddnI've opened #2937045: source plugin source_module testing seems incomplete as a follow-up for why #27 wasn't caught in tests.
Comment #31
alexpottOriginally I thought we probably ought to mention the possibility of needing to install additional modules in tests as a result of this change in the change record, but I've had a think and this probably is only try for core sources where the modules that provide the entities are somewhere else. Ie. the action entity is provided by system.
It's great to see
@group migrate_drupal_8
- these sources are important to Drupal's major version release plans. And obviously is it good we have test coverage of the sources here but it would also be great to see a full migration test like we have for Drupal 6 and 7. Basically a Drupal 8 equivalent of\Drupal\Tests\migrate_drupal_ui\Functional\d7\MigrateUpgrade7Test
. Not in this issue but as a follow-up.One thing that surprised me is that we have one gigantic test and not many new tests all extending
\Drupal\Tests\migrate\Kernel\MigrateSqlSourceTestBase
. I guess that's because the code can be generic. This leads to the question about whether the approach taken here is correct. Ie. using D8 APIs in derivative discovery. Not sure.Comment #32
phenaproximaGetting there...
This shouldn't be a string. Why not simply let this be the result of $this->entityTypeManager->getDefinition(), and then if we need the entity type ID, we can call $this->entityType->id()?
We don't need this. As long as we set $this->entityType in the constructor, we can call $this->entityType->getKey('bundle').
We should throw an exception if the given entity type is a config entity type.
These two if statements can be combined into one.
I don't think we should call this method 'yield' -- it's vague. How about 'loadEntities' or 'streamEntities'?
We should not be doing this unless $entity instanceof TranslatableInterface.
Can we expand this doc comment to explain how, exactly, an entity is converted to an array, and why we are doing this extra massaging?
This seems pointless. Can't we just do $return = $entity->toArray()?
IDs, ID
What if $return[$id] is empty? We should probably throw some sort of exception.
@return is missing a descrption.
Variable casing is inconsistent; $idKey should be $id_key.
Comment #33
alexpottHad a quick discussion with @catch.
So one thought about the D8 API usage is that these are not actually D8 specific source plugins. They are current drupal version plugins. When we open D9 this will be the D9 source plugin deriver not the D8 source plugin deriver.
Comment #35
heddnI fixed all the comments from #31 and #34.
For #32:
1-5 fixed
6: How would we know at this point if translations are turned on for an entity? I'd like to know, because then we could do that same logic and still call count when the entity is not translatable. See #2937166: Improve count() for ContentEntity source plugin. However, I think translations is at a bundle level, not a entity-wide setting. So we don't really know since the generic support must handle both. I'd be willing to punt this to a follow-up. The only concern is with BC if we do that, since it could mean that the mapping table would need to be rebuilt to /not/ include a langcode. What is the hurt if we do include langcode? Then we naturally support multi-lingual if someone ever turns it on...
7-12 fixed
I also opened up #2937166: Improve count() for ContentEntity source plugin as a follow-up to better calculate counts.
Comment #36
alexpottI'm not sure that the patch in #35 was the intended patch as it seems to be the same as the patch in #28 and not have the interdiff applied.
On reviewing the interdiff I would have expected this to change as there's no way we'd find this and change it when we tag Drupal 9.
Comment #37
jofitz CreditAttribution: jofitz at ComputerMinds commentedAttempt to create the patch that was intended in #35.
Comment #38
heddnHere's the patch that was intended in #35. Th lone diff I'm seeing is a response to #36, where it uses 'migrate_drupal' for the test group. Sorry for the snafu.
Comment #39
alexpottI think we should explain that this is the for the current version of Drupal.
Missing class doc.
Comment #40
phenaproximaThis is getting really close.
It's not really from the database...
Should we rename the plugin to content_entity, so it's clearer? (Also future-proof, in case we support config entities at some point too.)
Nit: We can use hasKey() instead of getKey() here.
I wonder if we shouldn't just use $this->entityType->getPluralLabel() here.
Should be "...yields entities, one at a time."
Should be "Converts an entity to an array".
"Makes all IDs into flat values"
Does EntityMalformedException still exist? If so, that might be more appropriate here. Also, 'id' should be 'ID', and we should be calling $entity->getEntityTypeId() here. (getEntityType() returns EntityTypeInterface, not a string; the fact that the tests didn't catch this leads me to think that we need a test of this exception :)
Comment #41
heddn#39 and #40 addressed. Except for adding tests for 40.8.
I'm not sure how to test that. I did some investigation into typed data and I really don't think it is possible for that exception to ever get reached. I'm almost inclined to remove the check and exception. @phenaproxima, you asked for it originally. Is it possible for TypedData to not return a double array nested response?
Comment #42
heddnI did some more digging into situations where the exception might occur. I don't think that if the backend is sql-based, it is very realistic. The ids are going to be required to be either primary key or unique keys by SqlContentEntityStorageSchema. This means that it would be impossible to store the data in the database for later use by this source plugin without some value.
I've also now switched to type hint on
ItemList->first()->getString()
. Which makes it more clear we are returning a single value from the first element in the entity's id property. And if that should fail for some reason, then we'll throw the appropriate entity API errors/warnings/exceptions; which will just naturally get bubbled up.Comment #43
phenaproximaSorry, more shtuff...
This doesn't appear to be used anywhere.
Nit: Can this say "The entity type definition"?
I think we should simply return
(string) $this->entityType->getPluralLabel()
and leave it at that. The word "source" adds nothing.Maybe we should hide this behind a configuration option (include_translations), which is TRUE by default.
Should be "Converts an entity to an array."
Should be "The entity, represented as an array."
Why is the unset() necessary if we're overwriting $return[$id] anyway?
Rather than have two foreach loops, let's build a full array of $field_definitions first, by combining the base and bundle field definition arrays.
This code might be useful to destinations as well. Perhaps it should go into a trait? I'm OK with discussing that in a follow-up, though.
This comment either needs to be expanded or removed; nothing else in the method makes reference to content_translation.
I think we can remove this comment. "ContentEntityDeriver" pretty much sums it up.
This should be dual-type hinted as both a mock object and MigrationInterface.
Is there any particular reason we don't use MigrationPluginManager::createStubMigration()? That would be a more "realistic" test than setting up mock objects.
The Media module has a trait that we can use for this: MediaFunctionalTestCreateMediaTypeTrait. It will work in kernel tests.
Comment #44
heddnI've added #2937782: Create trait for getDefinitionFromEntity to address #43.9. The rest of the points are addressed. I also added a parent check on terms since[#2543726] is now committed. That makes this a 8.6 thing only thing right now. We'll have to do a special backport patch to remove those test checks on 8.5.
Tests pass locally.
Comment #46
heddnSeems it wanted to test 8.5, even though I switched to 8.6. So, back to NR.
Comment #47
maxocub CreditAttribution: maxocub commentedInstead of creating a new issue for a 8.5.x backport, I'll just upload the two patches for the two versions here, with the only difference being how we test the taxonomy term parents. I think it's easier to follow the work on these patches if they are in the same issue.
Comment #49
maxocub CreditAttribution: maxocub as a volunteer commentedA few minor improvements. I think this is ready for another review. Meanwhile, I'll write a change record, probably tomorrow.
Comment #50
maxocub CreditAttribution: maxocub as a volunteer commentedI've been trying to write a change record for this issue and provide some usage examples, but there's something I don't understand in this patch, and here's a few questions to help me understand.
What does a source plugin deriver does? I know that, for example, a migration plugin deriver will generate multiple migrations plugins, and run them separately, but a deriver for a source plugin I don't understand the usage. Source plugins are used in a migration plugin, so how does one migration would use multiple derived source plugins?
And in fact, the deriver in this patch does not even seem to be working. If I create a migration with the
content_entity
source plugin, it will fail saying that it's missing theentity_type
property, which is supposed to be the thing that is being derived.Do we really need the deriver here? And if so, can someone clarify how it would be used?
Comment #51
phenaproximaSource plugin derivers do the same thing as migration plugin derivers, for the same purpose. As with any derived plugin, it lets us have several variants of a single plugin, each of which is only slightly different from the others. Derivatives are a core feature of the plugin system in general, to help with code reuse.
A migration would not use multiple source plugins, period. It doesn't matter whether or not they were derived; a migration has a single source plugin and that's that. The source plugin in use could be a variant created by a deriver, but it's still only a single variant that will be used in any given migration.
Comment #52
heddnTo use it, you have to use it like the destination plugin and its deriver. For the user entity, it would look like:
content_entity:user
, where 'user' is the entity type id. Now, why does it complain about a missing entity_type? That's because the plugin is built with extensibility in mind. Not everyone who uses the source plugin will use it by passing in the entity_type. Because they will want to extend from this class and include additional fields or data. So, for that purpose, we automatically map the entity_type in the deriver, but give it as an argument/configuration if you decided to extend and then cannot take advantage of the deriver.Comment #53
maxocub CreditAttribution: maxocub as a volunteer commented@phenaproxima, @heddn: Thanks, that makes sense.
Discussed in the weekly migrate meeting. We should add doxygen documentation with usage examples and we should remove the term parent tests in the 8.5.x backport. We should also create a follow up for the 8.5.x backport to do add a check in the source to see if it's taxonomy terms and load the parent.
Comment #54
maxocub CreditAttribution: maxocub as a volunteer commentedHere's a first draft for the plugin documentation.
I also created the follow up #2940198: Load taxonomy term parents in ContentEntity source plugin.
Comment #55
heddnNodes are bundleable and translatable, so these phrases aren't needed. Patch with fixes attached. Otherwise, I think all the feedback here is addressed and we're ready for a final review.
Only a single interdiff uploaded, because the changes between 8.5 and 8.6 are identical. Remember, the only difference between 8.5 and 8.6 is in test coverage and the interdiff of those changes is in #54.
Comment #56
heddnI also wonder if we should add details about the entity_type config (required) config option. I stumbled on that question while writing the CR.
Comment #57
maxocub CreditAttribution: maxocub as a volunteer commentedReading your CR, I guess it wouldn't hurt to add the entity_type config in the doc block, in fact, I think we need it.
Comment #58
heddnI've added more details about setting the entity_type and why you ever might want to extend and build your own custom source plugin. This should address my question in #56. /Now/ we are ready for a final review.
Comment #59
heddnWhile reviewing #2630732-36: Implement Entity::fields() for migration destinations, I stumbled upon the fact that you can create a content entity that is non-fieldable. This would be very hard to do, because ContentEntityBase is /the/ default class to extend when working with entities. All core entities extend from it. And it implements ContentEntityInterface, which implements FieldableEntityInterface. And its actually pretty mind-boggling to consider creating content entities that aren't fieldable. But for completeness sake, I've added a check.
Comment #62
phenaproximaOnly found a few things. And only one them is not a documentation nitpick.
This patch is looking fantastic.
Typo: "calculated" should be "calculate".
"id" should be "ID".
Missing commas. Should be "...utilized, i.e., a custom..."
We can lose "String,"
We can lose the "Boolean,", I think. "Default" should be "defaults". Personally, I'd prefer if exclude_translations was a positive switch -- include_translations, defaulting to TRUE, but that's just a preference.
We should mention that this does NOT load all revisions, just the default one.
We need to look at $configuration['bundle'], not $this->configuration['bundle'], because AFAIK $this->configuration is set by the parent::__construct() call. The fact that we didn't catch this indicates that we should probably have a couple of unit tests of this constructor.
Lovely!
Supernit: "return" should be "raise" or "throw" :)
"id" --> "ID"
Comment #63
maxocub CreditAttribution: maxocub as a volunteer commentedRe #62.5: I would also prefer include_translations over exclude_translations, I don't like the double negative.
Comment #64
heddnI've dropped the 8.5 patch in this post. Once we get closer to RTBC, I'll re-add it. It's just too hard keeping all of everything in sync for 8.5.
All feedback in #62 should be addressed.
Comment #66
heddnI think we ran into https://github.com/doctrine/common/issues/744 for those failures.
This should fix things.
Comment #67
heddnThis was fixed upstream already, but we're still using 2.6.2 of doctrine/common, which doesn't have the fixes.
Comment #68
phenaproximaThis is ready. Thanks, everyone!
Comment #69
heddnLet's leave this open after the 8.6.x commit for the follow-up commit to 8.5.x? Its only a couple lines and will be way easier to re-roll in here than opening a follow-up.
Comment #70
larowlanFrom #31
I can't see where that follow up was created?
should we also validate that the bundle is valid for the given entity type?
nice
is there an issue with the amount of memory the static cache of entities on the storage handler will use?
should we be reseting cache periodically?
aside:could use array_map here
What is this all about?
Comment #71
maxocub CreditAttribution: maxocub as a volunteer commentedFollow-up created for the full migration test: #2940806: Add a full migration test for the ContentEntity source plugin
Comment #72
heddn#70.2: The migrate executable has
memoryExceeded()
. If we consume more than 85% of available memory, it will stop the migration cleanly. So, I think that is covered already. One can always re-run the migration from that point and pick up from where it stopped off.If this come back green, this should be good to go again.
Comment #73
catch#70.2 there's #1596472: Replace hard coded static cache of entities with cache backends and #1199866: Add an in-memory LRU cache open to improve entity static caching and memory.
One thing I spotted. It might be early in the morning, but I'm struggling a bit with exactly how translations are or are not included, could maybe use some high level documentation somewhere?
This should have an accessCheck(FALSE).
Comment #74
heddnre: translations
There is a flag
include_translations
. If true (the default),yieldEntities
callsgetTranslationLanguages
and returns all non-default languages. The default language will always be the first returned, followed by any additional languages.There's docs on
include_translations
in the doxygen. Is more needed?Attached is the requested accessCheck logic change.
Comment #75
heddnAlso re 71: and the full test... I thought the full migration tests would be all the follow-up issues for i18n this issue is blocking. Of course, we can always add more tests, but those might be enough to give us full coverage.
Comment #76
phenaproximaJust for readability, can we expand this long if statement into a couple of lines? Also, the in_array() call should have TRUE passed as the third argument.
We should not throw InvalidPluginDefinitionException here, because the definition is valid; it's the configuration that isn't. InvalidArgumentException should be fine in this case.
Comment #77
heddn#77 addressed. Although I'm not as sure we should do a strict match. The entity query won't be that strict itself on /most/ sql implementations.
Comment #78
maxocub CreditAttribution: maxocub as a volunteer commentedI think all feedback has been addressed, let's get this back to RTBC so we can un-postpone the other criticals.
Comment #79
phenaproximaI agree. +1 for RTBC.
Comment #81
catchCommitted/pushed to 8.6.x and cherry-picked to 8.5.x. Thanks!
Comment #84
Wim LeersFYI: the changes made in #71 to throw an exception when an invalid bundle is specified breaks during rollbacks. See #3186449: Rolling back a migration implementing MigrationWithFollowUpInterface does not clear the generated follow up migrations from the cache.. Feedback welcome!