Problem/Motivation

Query if an entity exists based on various properties and convert to its entity id.

Proposed resolution

Create a base class that lookups entities and creates a stub if it doesn't exist. Add an ignore_case config option. Because for files and terms and even nodes, we might want to care about that.

Remaining tasks

  1. The property configuration option should be an array.
  2. Convert this to a base class.
  3. Inject EntityTypeManagerInterface instead of QueryFactory. This way we can also use the entity manager to create stubs. See #2633242: Process plugin for importing/creating terms by name as an example.
  4. Add an ignore_case config option.
  5. If not set, 'entity_type' config option should be set in implementation classes. So accomodate that in the code.

Original summary by @sime

I have this process plugin running as part of a migration. I'm wondering whether to put it in a new module, but checking first if you want something like this in migrate_plus. https://gist.github.com/simesy/755897d43b24e06afeed

Background: we have a taxonomy that is installed as part of an installation profile. When the migration runs, it looks up terms based on term title. Our working process config looks like this:

  field_subject:
    plugin: entity_lookup
    source: subject
    entity_type: taxonomy_term
    property: name
Files: 

Comments

sime created an issue. See original summary.

mikeryan’s picture

Yes, that would be an excellent addition to migrate_plus, I'd love to see a patch here.

Thanks.

sime’s picture

Great! This is just an initial copy/paste job from my custom module. I've changed the namespaces but I'm uncertain about ideal directory/namespace structures. It should run, but I haven't checked it.

sime’s picture

Status: Active » Needs work

I just worked out that I shouldn't call MigrationException in my plugin construct. Not sure what I should be calling, if anything, if the configuration for the plugin is not correct. Core never validates the config of migration process plugins it seems.

sime’s picture

Looks like MigrateSkipProcessException is what I want, but it would be nice to roll back the migration so that it's not sitting in an "Importing" state.

mikeryan’s picture

Looks good in general. Validating that required configuration is present is not a current core pattern, and if done I don't think the constructor is the place to do it anyway.

  1. +++ b/src/Plugin/migrate/process/EntityLookup.php
    @@ -0,0 +1,91 @@
    +    $this->query = $entityQuery->get($configuration['entity_type']);
    ...
    +      $container->get('entity.query')
    

    I don't think injecting the entity query is valuable - and won't keeping it around cause conditions to pile up for each invocation?

  2. +++ b/src/Plugin/migrate/process/TestEntityLookup.php
    @@ -0,0 +1,30 @@
    +namespace Drupal\migrate_plus\Plugin\migrate\process;
    

    Test code should be in the tests namespace.

mikeryan’s picture

Also, considering this in the context of #2633242: Process plugin for importing/creating terms by name, I think bundle should be an optional parameter (taxonomy term lookups, at the very least, would normally be done in the context of a specific bundle).

heddn’s picture

There's some interesting ideas in here. I've been working on something similar in #2633242: Process plugin for importing/creating terms by name. I wonder if we could combine the work. I like the more general approach used here, but I'd also like the ability to create a stub entry. Which is different than a default value. Perhaps a flag to indicate what we want to do if we cannot locate a record? And maybe the conversation in #2639254: Make it possible to skip empty migration destinations about the creation of a stub is also good to consider.

Some additional thoughts, the property configuration option should be an array. It needs to be be for taxonomy terms as you'll want to search by name and vid.

If you look at this from the perspective of #2635622: Process plugin for importing/creating files by URL, we'll also need to make accommodations for converting URLs into constructed files. Perhaps a general base class and specific instances would be an easier approach? Then we wouldn't need to pass the 'entity_type' as a configuration parameter.

heddn’s picture

+++ b/src/Plugin/migrate/process/EntityLookup.php
@@ -0,0 +1,91 @@
+    // Return the default value if no value was passed or if the passed value returned no results from the query.
+    return $this->configuration['default_value'];

There's already a default_value plugin. I think it can be used instead, no?

heddn’s picture

To bad we cannot introspect the destination entity_type by looking at the destination field. But that doesn't seem possible. Nor if you could hack things enough, is the value from it of any value. Since ER fields can reference one or many bundles.

sime’s picture

@mikeryan said:

Validating that required configuration is present is not a current core pattern, and if done I don't think the constructor is the place to do it anyway.

In my own project I switched it to raise the MigrateSkipProcessException, but it still feels hacky. I'd love to see some patterns in core I can mimic, I guess it's because migrate in core is experimental.

I don't think injecting the entity query is valuable

I got guidance from chx in #drupal-migrate that led to this solution... my understanding is that you have to inject any object that you would want to mock in a test?

- and won't keeping it around cause conditions to pile up for each invocation?

So I should unset the condition after each execution? Makes sense. I'll have a look at my migration, that would mean it would fail to bring in any values after the first matched lookup worked.

@heddn said:

I wonder if we could combine the work. I like the more general approach used here, but I'd also like the ability to create a stub entry

I thought stubbing entries assumes that the entity would be coming in in another migration - wouldn't you just use the migration process plugin for that? In my case the entities already exist outside the migration.

Some additional thoughts, the property configuration option should be an array. It needs to be be for taxonomy terms as you'll want to search by name and vid.

I agree!

There's already a default_value plugin. I think it can be used instead, no?

Can you chain process plugins like that? I always wondered.

mikeryan’s picture

Can you chain process plugins like that? I always wondered.

Yes, see the full pipeline doc. Using the default_value plugin at the end of a chain is (or will be when there are more real-world migrations out there) a common pattern.
heddn’s picture

Title: Entity query process plugin » Create Entity query process Base plugin
Issue summary: View changes

Re: #11

  1. Throwing MigrateException is fine, just do it in a transform. See DedupeBase as an example.
  2. Stubbing from a previous migration assumes there is a previous migration. I'm loading a node from scratch and there are several term fields. I don't want to create several migrations just to insert the terms. This is similar to D7: https://www.drupal.org/node/1224042

I'm switching this issue around to create a base abstract class, with implementations being handled by concrete classes.

Adita’s picture

Issue summary: View changes
sime’s picture

Throwing MigrateException is fine, just do it in a transform. See DedupeBase as an example.

@heddn If a plugin is not properly configured, it should (ideally speaking) be possible to throw the exception before any processing begins.

Adita’s picture

FileSize
4.15 KB

It's changed to abstract base class that does only query.

heddn’s picture

Status: Needs work » Needs review
FileSize
4.81 KB

Some cleanup. Interdiff wouldn't make sense.

mikeryan’s picture

Status: Needs review » Needs work

Let's not use the word "stub" for the created entity here, it's very confusing because its purpose is different from the stub concept we already have in migrate. A stub is a minimal temporary holding place until real data can be filled in - here we're creating the real entity we want to create, so it's not really a stub.

Is there a reason we can't just have one general plugin that will work with any entity type? I really don't want to have to create a new process plugin for every single entity type out there...

heddn’s picture

Reason for calling it stub. It creates the basic required info for the entity. But it doesn't provide all the possible data. If there are fields on a term or file, those aren't populated. The term description isn't populated. The file image's alt and title text isn't populated. Which is why I went with stub. But I can respect that it seems confusing to use that language.

Having a single general plugin would be nice. I'll see if we can do it. There are different required pieces of data for each entity. We'd have to make some assumptions and add a few more configuration defaults. One issue is when you run into an entity that has some required field that isn't provided. But maybe that can be introspected from the entity name and (optional) bundle and an exception thrown. And maybe, we can even figure out if the entity is a bundle-able entity and throw an exception if the bundle name isn't provided. Lots of possibilities.

heddn’s picture

re #18/#19, that won't work. Files need the partial basename without the hostname. Other entities, custom or not will have similar one-off scenarios that creating a base class is worth it. Thoughts?

mikeryan’s picture

I'd like to see this modeled after the entity destination plugin - there's the general plugin which will work with any entity, but it can be overridden for entity types that require special handling. See user/src/Plugin/migrate/destination/EntityUser.php for an example of an override.

heddn’s picture

Linking to #2412569: Allow setting the auto-create bundle on entity reference fields with multiple target bundles and pondering what this does for migrations and specifically this process plugin. If the autocreate bit is flipped and a default bundle is defined, then we are golden, right? We still run into the problem where it is nigh impossible to reverse engineer the destination field config by only the field name. We'd still need the destination bundle and entity type on the process plugin.

heddn’s picture

Status: Needs work » Needs review
FileSize
5.26 KB
heddn’s picture

heddn’s picture

heddn’s picture

And the right patch now.

heddn’s picture

Title: Create Entity query process Base plugin » Create Entity Lookup process Base plugin
Issue summary: View changes
vasi’s picture

Issue tags: +Needs tests
vasi’s picture

Status: Needs review » Needs work
  1. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,187 @@
    +    if (!empty($this->configuration['value_key'])) {
    +      $this->valueKey = $this->configuration['value_key'];
    +    }
    +    elseif (!empty($valueKey) && !empty($this->entityManager->getBundleInfo($this->entityType)['label'])) {
    +      $this->valueKey = $this->entityManager->getBundleInfo($this->entityType)['label'];
    +    }
    +    if (empty($this->valueKey)) {
    +      throw new MigrateException('The entity_generate plugin requires a value_key, none located.');
    +    }
    +
    +    if (!empty($this->configuration['bundle'])) {
    +      $bundle = $this->configuration['bundle'];
    +    }
    +    elseif (!empty($bundle) && !empty($this->migration->getProcess()[$this->bundleKey]['default_value'])) {
    +      $destinationEntityBundle = $this->migration->getProcess()[$this->bundleKey]['default_value'];
    +      $fieldConfig = $this->entityManager->getFieldDefinitions($this->entityType, $destinationEntityBundle)[$destinationProperty]->getConfig($destinationEntityBundle);
    +      if ('entity_reference' != $fieldConfig->getType()) {
    +        throw new MigrateException('The entity_generate plugin found no entity reference field.');
    +      }
    +      if ($fieldConfig->getSetting('auto_create')) {
    +        $bundle = reset($fieldConfig->getSetting('auto_create_bundle'));
    +      }
    +    }
    +    if ($this->bundleKey && empty($bundle)) {
    +      throw new MigrateException('The entity_generate plugin found no bundle but destination entity requires one.');
    +    }
    

    None of this depends on $value. Maybe it should be factored out, and cached?

  2. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,187 @@
    +      return $this->generateEntity($value, $bundle);
    

    Could this be optional? Often I want to look something up, and it's acceptable if I find nothing.

  3. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,187 @@
    +    $ignoreCase = isset($this->configuration['ignore_case']) ?: FALSE;
    

    The migration author should be able to specify 'ignore_case: false'.

  4. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,187 @@
    +    if ($ignoreCase) {
    +      // Returns the entity's identifier.
    +      $valueLower = strtolower($value);
    +      $identifiers = $this->entityManager->getStorage($this->entityType)->loadMultiple($results);
    +
    +      foreach ($identifiers as $identifier  => $entity) {
    +        if ($valueLower == strtolower($entity->{$this->valueKey}->value())) {
    +          return $identifier;
    +        }
    +      }
    +    }
    

    If the condition on valueKey was case-insensitive, this section of code does nothing. If it was case-sensitive, then it already excluded the items that don't match case.

  5. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,187 @@
    +    return reset($results);
    

    Why not allow returning multiple matches?

  6. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,187 @@
    +  protected function stub($value, $bundle) {
    

    The code in destination\EntityContentBase::processStubRow() is much more thorough. Do you think we could re-use that?

heddn’s picture

Version: 8.x-1.x-dev » 8.x-2.x-dev
FileSize
5.84 KB
1.45 KB

Tests are next.

re #29
#1, process plugins are stateless. If someone wants to optimize things, just provide the values as config parameters to the plugin.
#2, addressed
#3, this isn't how I see it done for other boolean config values. see as an example #2639254: Make it possible to skip empty migration destinations
#4, addressed
#5, that could be in a follow-up. It is a bit of an edge case given the typical use case of importing taxonomy terms.
#6, this only builds out a very basic stub and EntityContentBase::processStubRow() builds out a lot more fields because it can. But we don't have any other values in the process plugin except the passed in single value. That said, the stub() method is externalized intentionally so extending the process plugin is trivial on a case-by-case basis to add in additional values for edge cases.

vasi’s picture

Ok, most of these I'm happy with. Except:

#3, That one uses !empty($this->configuration['no_stub']), which means 'no_stub: false' means FALSE. This does isset($this->configuration['ignore_case']) ?: FALSE, which means 'ignore_case: false' means TRUE.

#6, Do you think we could add a 'values' property to specify default values for the new entity? Otherwise it gets really hard to create stubs with required fields. But this could be a follow up.

We need docs for what this process does, and what config it accepts. Also, should it go back to entity_lookup instead of entity_generate, now that it allows just looking things up?

heddn’s picture

Status: Needs work » Needs review
FileSize
9.33 KB

re #31
3. I've gotten rid of the no stub logic. I'll think you'll be happy with the results. The ignore_case logic makes sense and is identical to what is used in D7.
6. Added as per your request.

Still needs tests, but I'm happy to say it works with a real, live migration of taxonomy terms. No interdiff since enough moved around I don't think it would be that helpful.

heddn’s picture

FileSize
9.33 KB

Fixed a couple small typos.

vasi’s picture

Status: Needs review » Needs work

Ooh, I really like where this is at! The two related plugins are a good idea.

  1. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,77 @@
    +      ->getStorage($this->destinationEntityType)
    

    This should be stubEntityType.

  2. +++ b/src/Plugin/migrate/process/EntityLookup.php
    @@ -0,0 +1,186 @@
    + * This plugin generates entity stubs.
    + *
    + * @MigrateProcessPlugin(
    + *   id = "entity_lookup"
    

    This plugin doesn't generate stubs, need a better description.

  3. +++ b/src/Plugin/migrate/process/EntityLookup.php
    @@ -0,0 +1,186 @@
    +  protected function collectStubProperties($destinationProperty) {
    

    I like that this is in its own method now, thanks.

  4. +++ b/src/Plugin/migrate/process/EntityLookup.php
    @@ -0,0 +1,186 @@
    +    if (empty($this->stubValueKey)) {
    +      throw new MigrateException('The entity_lookup plugin requires a entity_type, none located.');
    

    This should be $this->stubEntityType, we already check for stubValueKey above

heddn’s picture

Status: Needs work » Needs review
FileSize
11.1 KB
10.51 KB

Added some docs and addressed #34, 1-4.

heddn’s picture

One additional docs tweak.

heddn’s picture

FileSize
526 bytes
heddn’s picture

Title: Create Entity Lookup process Base plugin » Create Entity Lookup & Generate process Base plugins
edysmp’s picture

Status: Needs review » Needs work
edysmp’s picture

Status: Needs work » Needs review
FileSize
775 bytes
11.24 KB
edysmp’s picture

mikeryan’s picture

Component: Code » API
Status: Needs review » Needs work
  1. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,103 @@
    + * This plugin generates entity stubs.
    

    Been discussing this with heddn, I really don't like to use "stub" here, where it has a different meaning from elsewhere in the migration system. I'm not sure what's being created here actually needs it's own name, since they're just entities - for example, this comment could be "This plugin generates entities dynamically".

  2. +++ b/src/Plugin/migrate/process/EntityLookup.php
    @@ -0,0 +1,220 @@
    +use Drupal\migrate\Entity\MigrationInterface;
    

    Oops! 8.1.x core is dealing with migration plugins in the process plugins, not entities any longer.

I added a README.txt to migrate_plus this week, please add these plugins there in the next patch.

Extra credit - it would be *really awesome* to add an example of this. Either conjure up a new example migration in migrate_example_advanced, or perhaps modify migrate_example to make, say, field_migrate_example_country a term reference to a new Country vocabulary which uses these plugins to populate the terms.

I'm going to release beta1 without this, but I'd really love to see in the the 8.x-2.0 release.

Thanks!

mikeryan’s picture

Component: API » Plugins
FileSize
11.4 KB
2.01 KB

As it happens, I needed this functionality (at least the lookup, so far) for my new project, so here's a patch that fixes the entity_lookup plugin for 8.1.x. Since I didn't need generation (so far), I didn't touch that plugin here.

Note that besides fixing the use statement and a couple of trivial cleanups, I changed it to return NULL when the entity query returns nothing, which avoids the problem in #2692373: Value should be null when is produced skip process.

mikeryan’s picture

Oh, one observation - I really want to log to the message table if the entity query fails to find a match (controlled by a config option log_message or somesuch). But... it's challenging to log from a process plugin (e.g., how do we get the source ID?). Just putting it out there, this needs help from core (#2558047: [meta] Dealing with conditions during migration (messaging/exceptions) is confusing and inconsistent).

heddn’s picture

nvahalik’s picture

I needed a version which worked with multiple targets... so here goes.

esclapes’s picture

I am using this plugin (both lookup & generate) on a d6 -> d8 custom migration with csv source.

I am hitting the issue #2705925: ImageItem presave() fails when pointing to a non-existing file entity because the plugin adds an empty string as a target_id for my field_image and then the presave validation fails with no real image entity.

Will try to fix the issue and bring the patch, but I am not really familiar on how the field_image should be set so it is considered empty of any value, so any hint would be appreciated.

esclapes’s picture

  1. +++ b/src/Plugin/migrate/process/EntityGenerate.php
    @@ -0,0 +1,103 @@
    +  protected function generateEntity($value) {
    +    $entity = $this->entityManager
    +        ->getStorage($this->lookupEntityType)
    +        ->create($this->stub($value));
    +    $entity->save();
    +    return $entity->id();
    +  }
    

    only generate if not empty

  2. +++ b/src/Plugin/migrate/process/EntityLookup.php
    @@ -0,0 +1,241 @@
    +    if (empty($this->lookupValueKey) || empty($this->lookupBundleKey) || $this->lookupBundle || empty($this->lookupEntityType)) {
    

    third OR also needs empty()

vasi’s picture

Talking with @heddn at NOLA, we had an idea for providing field values of the generated entity. Instead of providing a static 'default_values' array, you could instead provide a 'values' key, that's just a key into the destination row.

For example:

process:
  field_tags:
    plugin: entity_generate
    source: tags
    values: tag_fields
  tag_fields/description: tag_description
  tag_fields/foo: bar
heddn’s picture

+++ b/src/Plugin/migrate/process/EntityLookup.php
@@ -196,9 +202,12 @@ class EntityLookup extends ProcessPluginBase implements ContainerFactoryPluginIn
+      ->condition($this->lookupValueKey, $value, $multiple ? 'IN' : NULL);

The last ternary should be '=', not NULL.

generalconsensus’s picture

Fixed module file reference so users won't get confused about integrating this core migrate, etc

heddn’s picture

generalredneck’s picture

There is a problem when assigning values to fields... where "properties" are working like a charm.

id: upgrade_d7_node_page
migration_tags:
  - 'Drupal 7'
migration_group: migrate_drupal_7
label: 'Nodes (Page 1211)'
source:
  plugin: d7_node
  node_type: page
process:
  field_body_paragraphs: <-- This is a Paragraph Entity Reference Revisions field.
    plugin: entity_generate
    source: body <-- this is your standard run of the mill D7 body field
    value_key: field_text <-- this is a Standard Text (Long) field with wysiwyg. Single value.
    bundle_key: type
    bundle: free_form_editor <-- Paragraph bundle. just has the "field_text" field
    entity_type: paragraph
destination:
  plugin: 'entity:node'

So given the example above. I'm migrating the body field into a Text (Long) field on a paragraph entity. So to import a field, you have to import the entire field... That means value and format.

In this case the body field looks like this coming over:

array(3) {
  ["value"]=>
  string(454) "<h2>Pellentesque</h2>
<p>Vestibulum ullamcorper mauris at ligula. Morbi ac felis. Donec id justo.</p>"
  ["summary"]=>
  string(0) ""
  ["format"]=>
  string(9) "full_html"
}

When run through the code in EntityLookup::query on line 207, this is going to search field_text:value for the values in "value", "summary" and "format"... like so:

SELECT base_table.revision_id AS revision_id, base_table.id AS id
FROM
{paragraphs_item} base_table
INNER JOIN {paragraph__field_text} paragraph__field_text ON paragraph__field_text.entity_id = base_table.id
INNER JOIN {paragraphs_item_field_data} paragraphs_item_field_data ON paragraphs_item_field_data.id = base_table.id
WHERE  (paragraph__field_text.field_text_value IN  (:db_condition_placeholder_0, :db_condition_placeholder_1, :db_condition_placeholder_2)) AND (paragraphs_item_field_data.type = :db_condition_placeholder_3)

I'm not sure what the solution here would be or if you plan for there to be one, but it's possible that I retrieve an entirely different entity than I intended due to this.

generalconsensus’s picture

Here is the patch and interdiff for #51

heddn’s picture

I think that patch is bad. It has too many folders in the path. And therefore the interdiff is also a little wonky.

killes@www.drop.org’s picture

Status: Needs work » Closed (fixed)

Yeah, the patch needs to be applied with -p2.

mikeryan’s picture

Status: Closed (fixed) » Needs work

We're not done yet!

killes@www.drop.org’s picture

Ooops! Must have been a bad case of wishful thinking! :D

killes@www.drop.org’s picture

I played a bit with the entity_generate option.

Some remarks:

1) seems to generally work

2) requires entity references, does not work with field collection fields

3) I wasn't able to get extra values from my migration to show up in the auto-created dependend nodes, seems that only the title field is populated.

4) when I rolled back the migration, the auto-created dependend nodes remained behind, IMO they should be deleted.

generalconsensus’s picture

@heddn The reason for the additional folder was to avoid confusion with the similar core paths. Ultimately this is fixed when this patch reintegrates with migrate_plus. I will remove my patch and interdiff

generalconsensus’s picture

ressa’s picture

As @mikeryan writes

Extra credit - it would be *really awesome* to add an example of this. Either conjure up a new example migration in migrate_example_advanced, or perhaps modify migrate_example to make, say, field_migrate_example_country a term reference to a new Country vocabulary which uses these plugins to populate the terms.

I couldn't agree more. From the top of my head, examples of these would also be great:

  • How to define an alias during import
  • How to make a reference to an already existing node by alias
  • How to import an image from a folder

mikeryan’s picture

Status: Needs work » Fixed

The basic functionality here has been stable for sometime and used by many people - I'm on my second project now using it. I regularly point people with use cases needing it to this patch - much easier if it's actually there no? So, I've committed it - we can further refine with individual issues (I've added followups for tests and examples).

Thanks for all the work on this!

mikeryan’s picture

Note the follow-up #2765543: Deprecate "stub" language in entity_generate - I just can't abide overloading "stub" for this use case.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

chx’s picture

For posterity: A) entity_lookup should have been a 1-to-1 encoding of entity query, at least a feasible subset of. A lot of it is feasible. The current plugin is buggy as it doesn't guarantee that the result of the lookup between two runs will be the same nor that the result is the desired one as the lookup is not restricted enough. B) entity_generate shouldn't exist. Just write another migration. But, I won't fixed the issue I filed for the latter. This is just a notice.

heddn’s picture