Considering how to easily provide specific source data for testing (discussion starting at https://www.drupal.org/node/2535458#comment-10175438), an ArraySource that was just like EmptySource except permitting injection of source data (defaulting to the single empty row EmptySource now has) would be useful. It may even be useful in real migration scenarios, for importing data that can be determined at configuration time (as opposed to coming from a dynamic source such as a SQL database.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

phenaproxima’s picture

I'm not yet certain how I feel about using such a source in real migrations, but for testing, I can imagine this being quite useful. +1.

mikeryan’s picture

I'm thinking "array" isn't right here - actually, the simplest way to specify hard-coded data for testing would be directly in the source schema. E.g.

id: node_test
label: Testing nodes
source:
  plugin: embedded_data
  embedded_data:
    -
      id: 1
      subject: dummy value
    -
      id: 2
      subject: dummy value2
process:
  nid: id
  title: subject
destination:
  plugin: entity:node

Thoughts on what the best name for a source plugin that works this way would be? Using 'embedded_data' for the moment, as you can see above...

mikeryan’s picture

Title: Replace EmptySource with new ArraySource » Create new source plugin taking data from plugin config
mikeryan’s picture

Status: Active » Needs review
FileSize
7.23 KB

Here we go - the EmbeddedSourcePlugin lets you put source data directly into the plugin configuration. I've switched MigrateEventsTest to use that instead of defining a custom source plugin just to deliver test data.

mikeryan’s picture

Priority: Normal » Major
Issue tags: +blocker, +Migrate critical

Blocks #2522012: Remove broken idlist handling, replace with more robust exception handling , which should use this for tests. Since that is Major and Migrate-critical, this should be as well.

benjy’s picture

Don't mind the name happy to go with that. The only other suggestion i had was something just like ConfigurableTestSource

Can we also remove EmptySource now?

+++ b/core/modules/migrate/src/Tests/MigrateEmbeddedDataTest.php
@@ -0,0 +1,72 @@
+  public function testEmbeddedData() {

Interesting test, had me confused at first, looked like something from the ban module.

mikeryan’s picture

Can we also remove EmptySource now?

Interesting question - I ran across at least one non-test usage of migrate's EmptySource that I didn't quite understand (which naturally I can't find at the moment). migrate_drupal's EmptySource extends that one as md_empty, and is easier to find usages for. I'm not sure it's that simple, but I could look at replacing the empty stuff (and extending the md_empty version from EmbeddedDataSource) tomorrow.

Interesting test, had me confused at first, looked like something from the ban module

Yes, borrowing the ban module's destination to test here, but I'm having second thoughts about that, it would be better to not have a dependency on an unrelated module.

Thanks!

mikeryan’s picture

So, with this all that's left of migrate_events_test module is the DummyDestination, which would also be useful here instead of borrowing the ban module. Perhaps we should add the destination-side equivalent of EmbeddedDataSource, a simple destination that just copies the incoming data somewhere we can easily validate its arrival.

phenaproxima’s picture

Status: Needs review » Needs work

Let's create a destination plugin that simply slaps $row->getSource() into somewhere temporary (cache, state, key-value storage, /dev/null, temp file...?) Then we can have a useful source and destination pair for testing purposes.

But maybe we should do that in a follow-up issue to prevent this one from getting too big and confusing. MigrateEmbeddedDataTest can use the null destination for now; in fact, I wonder why we're even bothering to test the destination. EmbeddedData is a source plugin, so why aren't we testing that functionality only?

A few other thoughts:

+++ b/core/modules/migrate/config/schema/migrate.source.schema.yml
+migrate.source.embedded_data:
+  type: migrate_source
+  label: 'Embedded data source'
+  mapping:
+    embedded_data:
+      type: sequence
+      label: 'Embedded data'
+      sequence:
+        type: sequence
+        label: 'Data row'
+        sequence:
+          type: string

I was thinking that this should be a sequence of mappings so that each data row is an array. As far as I can tell, this defines each row as a string, but that's not a very accurate approximation of how normal Migrate sources work. (If I'm misinterpreting the config schema here, disregard my blathering.)

+++ b/core/modules/migrate/src/Plugin/migrate/source/EmbeddedDataSource.php
+  public function __construct(array $configuration, $plugin_id, $plugin_definition, MigrationInterface $migration) {
+    parent::__construct($configuration, $plugin_id, $plugin_definition, $migration);
+    $this->embeddedData = $configuration['embedded_data'];

Can we call the config key data or rows or something, rather than embedded_data? I find that name confusing. (It's OK for the plugin ID, but it's a strange config key.) If we rename it, let's also change the $embeddedData property to match.

+++ b/core/modules/migrate/src/Plugin/migrate/source/EmbeddedDataSource.php
+  public function getIds() {
+    if ($this->count() > 0) {
+      // We take the first field as the key.
+      $first_row = reset($this->embeddedData);
+      $keys = array_keys($first_row);
+      $id_field_name = reset($keys);
+      $ids[$id_field_name]['type'] = 'string';

Can we use gettype() here to get the field type? It returns 'string' for strings and 'integer' for integers, which matches what other sources return. Alternatively, it might be handy to allow the plugin config to specify the ID keys directly, and have this auto-detection code as a fallback. Something like:

plugin: embedded_data
rows:
  -
    foo
  -
    bar
  -
    baz
ids:
  my_id:
    type: string
mikeryan’s picture

I'll reroll the test to just grab the source plugin and verify it delivers the expected data. Let's deal with a new destination in a separate issue.

+        type: sequence
+        label: 'Data row'
+        sequence:
+          type: string

I was thinking that this should be a sequence of mappings so that each data row is an array. As far as I can tell, this defines each row as a string,

That's defining each row as a sequence of strings.

Can we call the config key data or rows or something, rather than embedded_data?

OK, I'll go with rows.

Can we use gettype() here to get the field type?

gettype('1') == 'string'

Alternatively, it might be handy to allow the plugin config to specify the ID keys directly

I had that pegged for a follow-up once we had a use case, but I could do that now.

Thanks!

mikeryan’s picture

Status: Needs work » Needs review
FileSize
7.58 KB
6.86 KB

OK, here it is with a much nicer test. Interdiff might be too much to be useful, but here it is anyway...

mikeryan’s picture

With requested comment change.

phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me.

alexpott’s picture

Status: Reviewed & tested by the community » Needs review

It kind of scares me that embedded data is an regular migration source plugin - do we have use-cases outside of tests?

phenaproxima’s picture

I do not believe we have a use case outside of tests -- yet -- but there are a few places it might be useful (it's more straightforward, for instance, than using the EmptySource and a bunch of constants, which IIRC is done by a few real migrations).

As for it being a regular source plugin, I'm not sure what else it would be? We already have a "null" destination plugin which sends incoming rows to the Big DIMM in the Sky, and that is a "regular" destination plugin which is a) part of Migrate API and b) so far unused outside of testing. What are you envisioning?

mikeryan’s picture

I think this will be useful outside of tests - I've seen plenty of cases where some small amount of fixed (for a given project) data is imported using migration, so that it can be referenced by other migrations (in D8 parlance, with the 'migration' process plugin). A "Channels" vocabulary is one such use case I've seen multiple times. With this plugin, it'd be simple to have a UI tool to enter a few rows of data that can be baked into a migration configuration.

mikeryan’s picture

E.g., channels.yml:

id: channels
source:
  plugin: embedded_data
  data_rows:
    -
      channel_machine_name: music
      channel_description: Music
    -
      channel_machine_name: movies
      channel_description: Movies
  ids:
    channel_machine_name:
      type: string
process:
  name: channel_machine_name
  description: channel_description
destination:
  plugin: entity:taxonomy_term
phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Good call @mikeryan. It's quite possible that you might want, in some cases, to define arbitrary source data in a migration. It's an edge case, but combined with the usefulness for testing purposes I see no reason this should not go ahead.

webchick’s picture

Assigned: Unassigned » alexpott

Back to alexpott.

phenaproxima’s picture

Issue tags: -Migrate critical

With the advent of the event system, the idlist functionality (#2522012: Remove broken idlist handling, replace with more robust exception handling ) is something we'll soon be able to implement in contrib, which is where it rightfully belongs. So I no longer think this qualifies as Migrate-critical.

phenaproxima’s picture

Priority: Major » Normal

De-escalating.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 12: create_new_source-2544690-12.patch, failed testing.

phenaproxima’s picture

Issue tags: +Needs reroll
mikeryan’s picture

Status: Needs work » Needs review
FileSize
10.46 KB

The test for the interruption patch was leveraging the source plugin provided for the events test (and modified it to provide more data), which rather confused things. So, with this version we're have both the events and interruption tests leveraging this new source plugin rather than having to define a custom one to provide them with test data.

Note that at the moment #2522012: Remove broken idlist handling, replace with more robust exception handling is creating its own source plugin to provide it with test data - if we can get this patch it, then I'll update that to make use of the embedded_data source plugin. If the idlist patch gets in first, we'll want to update this patch to remove the custom source plugin in favor of this one.

phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Look good to me, this does.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Thanks for answering the question. I can see the use-case and think I've done something similar myself. Committed 427d080 and pushed to 8.0.x. Thanks!

  • alexpott committed 427d080 on 8.0.x
    Issue #2544690 by mikeryan, phenaproxima: Create new source plugin...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.