Problem/Motivation
Query if an entity exists based on various properties and convert to its entity id.
Proposed resolution
Create a base class that lookups entities and creates a stub if it doesn't exist. Add an ignore_case config option. Because for files and terms and even nodes, we might want to care about that.
Remaining tasks
- The property configuration option should be an array.
- Convert this to a base class.
- Inject EntityTypeManagerInterface instead of QueryFactory. This way we can also use the entity manager to create stubs. See #2633242: Process plugin for importing/creating terms by name as an example.
- Add an ignore_case config option.
- If not set, 'entity_type' config option should be set in implementation classes. So accomodate that in the code.
Original summary by @sime
I have this process plugin running as part of a migration. I'm wondering whether to put it in a new module, but checking first if you want something like this in migrate_plus. https://gist.github.com/simesy/755897d43b24e06afeed
Background: we have a taxonomy that is installed as part of an installation profile. When the migration runs, it looks up terms based on term title. Our working process config looks like this:
field_subject:
plugin: entity_lookup
source: subject
entity_type: taxonomy_term
property: name
Comments
Comment #2
mikeryanYes, that would be an excellent addition to migrate_plus, I'd love to see a patch here.
Thanks.
Comment #3
simeGreat! This is just an initial copy/paste job from my custom module. I've changed the namespaces but I'm uncertain about ideal directory/namespace structures. It should run, but I haven't checked it.
Comment #4
simeI just worked out that I shouldn't call MigrationException in my plugin construct. Not sure what I should be calling, if anything, if the configuration for the plugin is not correct. Core never validates the config of migration process plugins it seems.
Comment #5
simeLooks like MigrateSkipProcessException is what I want, but it would be nice to roll back the migration so that it's not sitting in an "Importing" state.
Comment #6
mikeryanLooks good in general. Validating that required configuration is present is not a current core pattern, and if done I don't think the constructor is the place to do it anyway.
I don't think injecting the entity query is valuable - and won't keeping it around cause conditions to pile up for each invocation?
Test code should be in the tests namespace.
Comment #7
mikeryanAlso, considering this in the context of #2633242: Process plugin for importing/creating terms by name, I think bundle should be an optional parameter (taxonomy term lookups, at the very least, would normally be done in the context of a specific bundle).
Comment #8
heddnThere's some interesting ideas in here. I've been working on something similar in #2633242: Process plugin for importing/creating terms by name. I wonder if we could combine the work. I like the more general approach used here, but I'd also like the ability to create a stub entry. Which is different than a default value. Perhaps a flag to indicate what we want to do if we cannot locate a record? And maybe the conversation in #2639254: Make it possible to skip empty migration destinations about the creation of a stub is also good to consider.
Some additional thoughts, the property configuration option should be an array. It needs to be be for taxonomy terms as you'll want to search by name and vid.
If you look at this from the perspective of #2635622: Process plugin for importing/creating files by URL, we'll also need to make accommodations for converting URLs into constructed files. Perhaps a general base class and specific instances would be an easier approach? Then we wouldn't need to pass the 'entity_type' as a configuration parameter.
Comment #9
heddnThere's already a default_value plugin. I think it can be used instead, no?
Comment #10
heddnTo bad we cannot introspect the destination entity_type by looking at the destination field. But that doesn't seem possible. Nor if you could hack things enough, is the value from it of any value. Since ER fields can reference one or many bundles.
Comment #11
sime@mikeryan said:
In my own project I switched it to raise the MigrateSkipProcessException, but it still feels hacky. I'd love to see some patterns in core I can mimic, I guess it's because migrate in core is experimental.
I got guidance from chx in #drupal-migrate that led to this solution... my understanding is that you have to inject any object that you would want to mock in a test?
So I should unset the condition after each execution? Makes sense. I'll have a look at my migration, that would mean it would fail to bring in any values after the first matched lookup worked.
@heddn said:
I thought stubbing entries assumes that the entity would be coming in in another migration - wouldn't you just use the migration process plugin for that? In my case the entities already exist outside the migration.
I agree!
Can you chain process plugins like that? I always wondered.
Comment #12
mikeryanComment #13
heddnRe: #11
I'm switching this issue around to create a base abstract class, with implementations being handled by concrete classes.
Comment #14
ada hernandez commentedComment #15
sime@heddn If a plugin is not properly configured, it should (ideally speaking) be possible to throw the exception before any processing begins.
Comment #16
ada hernandez commentedIt's changed to abstract base class that does only query.
Comment #17
heddnSome cleanup. Interdiff wouldn't make sense.
Comment #18
mikeryanLet's not use the word "stub" for the created entity here, it's very confusing because its purpose is different from the stub concept we already have in migrate. A stub is a minimal temporary holding place until real data can be filled in - here we're creating the real entity we want to create, so it's not really a stub.
Is there a reason we can't just have one general plugin that will work with any entity type? I really don't want to have to create a new process plugin for every single entity type out there...
Comment #19
heddnReason for calling it stub. It creates the basic required info for the entity. But it doesn't provide all the possible data. If there are fields on a term or file, those aren't populated. The term description isn't populated. The file image's alt and title text isn't populated. Which is why I went with stub. But I can respect that it seems confusing to use that language.
Having a single general plugin would be nice. I'll see if we can do it. There are different required pieces of data for each entity. We'd have to make some assumptions and add a few more configuration defaults. One issue is when you run into an entity that has some required field that isn't provided. But maybe that can be introspected from the entity name and (optional) bundle and an exception thrown. And maybe, we can even figure out if the entity is a bundle-able entity and throw an exception if the bundle name isn't provided. Lots of possibilities.
Comment #20
heddnre #18/#19, that won't work. Files need the partial basename without the hostname. Other entities, custom or not will have similar one-off scenarios that creating a base class is worth it. Thoughts?
Comment #21
mikeryanI'd like to see this modeled after the entity destination plugin - there's the general plugin which will work with any entity, but it can be overridden for entity types that require special handling. See user/src/Plugin/migrate/destination/EntityUser.php for an example of an override.
Comment #22
heddnLinking to #2412569: Allow setting the auto-create bundle on entity reference fields with multiple target bundles and pondering what this does for migrations and specifically this process plugin. If the autocreate bit is flipped and a default bundle is defined, then we are golden, right? We still run into the problem where it is nigh impossible to reverse engineer the destination field config by only the field name. We'd still need the destination bundle and entity type on the process plugin.
Comment #23
heddnComment #24
heddnComment #25
heddnComment #26
heddnAnd the right patch now.
Comment #27
heddnComment #28
vasi commentedComment #29
vasi commentedNone of this depends on $value. Maybe it should be factored out, and cached?
Could this be optional? Often I want to look something up, and it's acceptable if I find nothing.
The migration author should be able to specify 'ignore_case: false'.
If the condition on valueKey was case-insensitive, this section of code does nothing. If it was case-sensitive, then it already excluded the items that don't match case.
Why not allow returning multiple matches?
The code in destination\EntityContentBase::processStubRow() is much more thorough. Do you think we could re-use that?
Comment #30
heddnTests are next.
re #29
#1, process plugins are stateless. If someone wants to optimize things, just provide the values as config parameters to the plugin.
#2, addressed
#3, this isn't how I see it done for other boolean config values. see as an example #2639254: Make it possible to skip empty migration destinations
#4, addressed
#5, that could be in a follow-up. It is a bit of an edge case given the typical use case of importing taxonomy terms.
#6, this only builds out a very basic stub and EntityContentBase::processStubRow() builds out a lot more fields because it can. But we don't have any other values in the process plugin except the passed in single value. That said, the stub() method is externalized intentionally so extending the process plugin is trivial on a case-by-case basis to add in additional values for edge cases.
Comment #31
vasi commentedOk, most of these I'm happy with. Except:
#3, That one uses
!empty($this->configuration['no_stub']), which means 'no_stub: false' means FALSE. This doesisset($this->configuration['ignore_case']) ?: FALSE, which means 'ignore_case: false' means TRUE.#6, Do you think we could add a 'values' property to specify default values for the new entity? Otherwise it gets really hard to create stubs with required fields. But this could be a follow up.
We need docs for what this process does, and what config it accepts. Also, should it go back to entity_lookup instead of entity_generate, now that it allows just looking things up?
Comment #32
heddnre #31
3. I've gotten rid of the no stub logic. I'll think you'll be happy with the results. The ignore_case logic makes sense and is identical to what is used in D7.
6. Added as per your request.
Still needs tests, but I'm happy to say it works with a real, live migration of taxonomy terms. No interdiff since enough moved around I don't think it would be that helpful.
Comment #33
heddnFixed a couple small typos.
Comment #34
vasi commentedOoh, I really like where this is at! The two related plugins are a good idea.
This should be stubEntityType.
This plugin doesn't generate stubs, need a better description.
I like that this is in its own method now, thanks.
This should be $this->stubEntityType, we already check for stubValueKey above
Comment #35
heddnAdded some docs and addressed #34, 1-4.
Comment #36
heddnOne additional docs tweak.
Comment #37
heddnComment #38
heddnComment #39
edysmpComment #40
edysmpComment #41
edysmpComment #42
mikeryanBeen discussing this with heddn, I really don't like to use "stub" here, where it has a different meaning from elsewhere in the migration system. I'm not sure what's being created here actually needs it's own name, since they're just entities - for example, this comment could be "This plugin generates entities dynamically".
Oops! 8.1.x core is dealing with migration plugins in the process plugins, not entities any longer.
I added a README.txt to migrate_plus this week, please add these plugins there in the next patch.
Extra credit - it would be *really awesome* to add an example of this. Either conjure up a new example migration in migrate_example_advanced, or perhaps modify migrate_example to make, say, field_migrate_example_country a term reference to a new Country vocabulary which uses these plugins to populate the terms.
I'm going to release beta1 without this, but I'd really love to see in the the 8.x-2.0 release.
Thanks!
Comment #43
mikeryanAs it happens, I needed this functionality (at least the lookup, so far) for my new project, so here's a patch that fixes the entity_lookup plugin for 8.1.x. Since I didn't need generation (so far), I didn't touch that plugin here.
Note that besides fixing the use statement and a couple of trivial cleanups, I changed it to return NULL when the entity query returns nothing, which avoids the problem in #2692373: Value should be null when is produced skip process.
Comment #44
mikeryanOh, one observation - I really want to log to the message table if the entity query fails to find a match (controlled by a config option log_message or somesuch). But... it's challenging to log from a process plugin (e.g., how do we get the source ID?). Just putting it out there, this needs help from core (#2558047: [meta] Dealing with conditions during migration (messaging/exceptions) is confusing and inconsistent).
Comment #45
heddnComment #46
nvahalik commentedI needed a version which worked with multiple targets... so here goes.
Comment #47
esclapes commentedI am using this plugin (both lookup & generate) on a d6 -> d8 custom migration with csv source.
I am hitting the issue #2705925: ImageItem presave() fails when pointing to a non-existing file entity because the plugin adds an empty string as a
target_idfor myfield_imageand then the presave validation fails with no real image entity.Will try to fix the issue and bring the patch, but I am not really familiar on how the
field_imageshould be set so it is considered empty of any value, so any hint would be appreciated.Comment #48
esclapes commentedonly generate if not empty
third OR also needs
empty()Comment #49
vasi commentedTalking with @heddn at NOLA, we had an idea for providing field values of the generated entity. Instead of providing a static 'default_values' array, you could instead provide a 'values' key, that's just a key into the destination row.
For example:
Comment #50
heddnThe last ternary should be '=', not NULL.
Comment #51
generalconsensus commentedFixed module file reference so users won't get confused about integrating this core migrate, etc
Comment #52
heddnre #51: can we get an interdiff? See https://www.drupal.org/documentation/git/interdiff
Comment #53
generalredneckThere is a problem when assigning values to fields... where "properties" are working like a charm.
So given the example above. I'm migrating the body field into a Text (Long) field on a paragraph entity. So to import a field, you have to import the entire field... That means value and format.
In this case the body field looks like this coming over:
When run through the code in EntityLookup::query on line 207, this is going to search field_text:value for the values in "value", "summary" and "format"... like so:
I'm not sure what the solution here would be or if you plan for there to be one, but it's possible that I retrieve an entirely different entity than I intended due to this.
Comment #54
generalconsensus commentedHere is the patch and interdiff for #51
Comment #55
heddnI think that patch is bad. It has too many folders in the path. And therefore the interdiff is also a little wonky.
Comment #56
killes@www.drop.org commentedYeah, the patch needs to be applied with -p2.
Comment #57
mikeryanWe're not done yet!
Comment #58
killes@www.drop.org commentedOoops! Must have been a bad case of wishful thinking! :D
Comment #59
killes@www.drop.org commentedI played a bit with the entity_generate option.
Some remarks:
1) seems to generally work
2) requires entity references, does not work with field collection fields
3) I wasn't able to get extra values from my migration to show up in the auto-created dependend nodes, seems that only the title field is populated.
4) when I rolled back the migration, the auto-created dependend nodes remained behind, IMO they should be deleted.
Comment #60
generalconsensus commented@heddn The reason for the additional folder was to avoid confusion with the similar core paths. Ultimately this is fixed when this patch reintegrates with migrate_plus. I will remove my patch and interdiff
Comment #61
generalconsensus commentedComment #62
ressaAs @mikeryan writes
I couldn't agree more. From the top of my head, examples of these would also be great:
Comment #64
mikeryanThe basic functionality here has been stable for sometime and used by many people - I'm on my second project now using it. I regularly point people with use cases needing it to this patch - much easier if it's actually there no? So, I've committed it - we can further refine with individual issues (I've added followups for tests and examples).
Thanks for all the work on this!
Comment #65
mikeryanNote the follow-up #2765543: Deprecate "stub" language in entity_generate - I just can't abide overloading "stub" for this use case.
Comment #67
chx commentedFor posterity: A) entity_lookup should have been a 1-to-1 encoding of entity query, at least a feasible subset of. A lot of it is feasible. The current plugin is buggy as it doesn't guarantee that the result of the lookup between two runs will be the same nor that the result is the desired one as the lookup is not restricted enough. B) entity_generate shouldn't exist. Just write another migration. But, I won't fixed the issue I filed for the latter. This is just a notice.
Comment #68
heddnBased on #49 above, I've added #2821266: Make it easier to provide values into the entity being generated.