Problem/Motivation
We often need to copy/paste YAML config across several migrations.
If we want to apply a specific suite of process plugins to all fulltext fields (e.g: dom process plugins), all you can do ATM is copy/paste the suite of yaml config everywhere.
Same if you have a specific process plugin with specific/complex config.
If different migrations use the same pipeline for the same source and destination fields, then we can use shared configuration provided by the migrate_tools module. (That option did not exist when this issue was created.) But we still need a way to handle different fields.
Proposed resolution
Add the snippet plugin, which reads a process pipeline from a YAML file in the directory mymodule/migrations/process/.
For example, if the file mymodule/migrations/process/clean_whitespace.yml contains this code
- plugin: callback
callable: htmlentities
- plugin: str_replace
search:
- ' '
- ' '
replace: ' '
- plugin: str_replace
regex: true
search: '@\s+@'
replace: ' '
- plugin: callback
callable: trim
then we can use the snippet plugin in mymodule/migrations/test.yml like this:
id: test
label: Test the snippet process plugin.
source:
plugin: embedded_data
data_rows:
- id: 1
string: 'Taxonomy term 1'
- id: 2
string: ' Taxonomy term 2 '
ids:
id:
type: integer
destination:
plugin: entity:taxonomy_term
default_bundle: tags
process:
name:
- plugin: snippet
module: mymodule
path: clean_whitespace
source: string
Original proposal
Define a "pipeline" process plugin that defines a suite of process plugins to run.
Given the following example in a migration:
process:
'body/value':
- plugin: pipeline
id: my_process_pipeline
source: 'body/0/value'
We would create the file migrate_plus.migration_process_pipeline.my_process_pipeline.yml with config:
langcode: en
status: true
id: my_process_pipeline
label: 'My process pipeline'
description: 'Process pipeline for processing fulltext fields'
process:
-
plugin: dom
method: import
-
plugin: dom_str_replace
mode: attribute
xpath: '//a'
attribute_options:
name: href
search: 'foo'
replace: 'bar'
-
plugin: dom_str_replace
mode: attribute
xpath: '//a'
attribute_options:
name: href
regex: true
search: '/foo/'
replace: 'bar'
-
plugin: dom
method: export
| Comment | File | Size | Author |
|---|---|---|---|
| #19 | interdiff_18-19.txt | 2.84 KB | danflanagan8 |
| #19 | migrate_plus-snippet_process_plugin-3123534-19.patch | 4.22 KB | danflanagan8 |
| #2 | migrate_plus_process_pipeline_plugin-3123534-2.patch | 7.17 KB | herved |
Issue fork migrate_plus-3123534
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
herved commentedI have started a very first draft
Ideas and opinions are most welcome.
Tests of course still needed.
Comment #3
ergonlogicFrom the project description:
Now, admittedly, this is limited, since migrations cannot belong to multiple groups, AFAICT. But it does allow you to significantly cut down on duplicate config.
Comment #4
herved commented@ergonlogic Yes, I know we can share config using the group's
shared_configurationbut it's far from ideal IMO for the process part.The config keys are merged in the migrations in migrate_plus.module and sharing process configuration presents a few issues:
1. If we add a mapping for a field in the group, e.g:
The body will be processed for every migration, even if this field has no use in most migrations.
So it can slow down significantly or even fail migrations for no reason.
2. Extracting a suite of process plugin into their own config allows to reuse it for other sources and destinations.
We can also customize it and inject it in the middle of other process plugins.
E.g:
Comment #5
heddnWould any of the guidance for re-use of yaml found in https://medium.com/@kinghuang/docker-compose-anchors-aliases-extensions-... be helpful? It doesn't let you re-share across yamls, but it does let you share within the same yaml file.
Comment #6
herved commented@heddn Indeed, using YAML anchors would be ideal but it's only working in the same file.
It could be helpful in some cases but in many cases we actually need to share across YAML files.
I also thought about importing a YAML containing anchors and then using the aliases in the migrations but AFAIK this is not possible.
I found this core issue: #2284787: Allow yml files to nominate imports but if I'm not mistaken, this issue only addresses non-config files (router.yml, services.yml, etc) and doesn't address config files which are decoded in another way, as raw in
\Drupal\Core\Config\FileStorage::decode.And even if we would manage imports in config files, I wonder if an alias could be resolved (I'm guessing it won't) though I'm sure there are ways to merge YAML config in PHP but I suppose this won't happen for config files.
So I came up with this plugin.
Comment #7
mikeryanI've often thought it would be great if YAML had a preprocessor like C, with
#includeand#define. But, of course, the C preprocessor can be used with any kind of file - so why reinvent the wheel? Now, reading that, perhaps m4 (which I've never used) would be a better choice.The point being - there's no need to implement something custom here, just someone to write up a tutorial on using
cpporm4with YAML. Maybe develop some conventions and best practices for migration files, a repository of snippets for common use cases, etc.Comment #8
herved commented@mykeryan This is interesting but how would this work (roughly)?
Would it be inside of the PECL yaml extension?
Comment #9
estoyausenteHi, I have test the code and it works like a charm! It's being very useful for me and I think that is a very simple tool that can be shared to the community.
I'm not sure, @mykeryan, if we can resolve this requirement using a C preprocessor, cpp or something like that. Really, I don't know. Maybe we can do it, but I think that developers prefer to use a simple process plugin like this.
BTW, Herve, thank for sharing! My two cents is that incorporate this to migrate_plus is a good idea! :)
Comment #10
quietone commentedI do like not reinventing the wheel so I tested a bit with using m4 as suggested by mikeryan in #7. I created a small migration yml, test.yml, and an m4 file, sample.m4, containing a definition of a pipeline with two destination fields. In that file the destination fields names are set as $1 and $2 so they can be filled in when the command run. Just thought it would be nice to see some flexibility here.
test.yml
sample.m4
Now process test.yml with the sample.m4
That is certainly easy. Is this worth pursuing?
Comment #11
heddnThe great gitlab migration is upon us! This is now a duplicate of https://gitlab.com/drupalspoons/migrate_plus/-/issues/133.
Comment #12
benjifisherNow that this issue queue is active again, I think we should re-open this issue.
We have a choice of approaches to take!
New configuration type
Use the
pipelineprocess plugin from the patch in #2 and pipelines defined inmigrate_plus.migration_process_pipeline.*.yml.I think it will work only with migrations defined as config entities (using the Migrate Plus module), not core migrations (defined using
mymodule/migrations/*.yml). Maybe not. It will certainly require the Migrate Plus module.YAML anchors
See #5, #6.
I think this only works to reuse a pipeline within a single YAML file.
Do we have any working examples? I am not sure that the YAML parser we currently use (Symfony?) supports anchors.
Preprocessor
See #7, #10.
This requires some sort of build process or manually running the preprocessor.
New process plugin and migrations/process/*.yml
The
snippetplugin from #3214186: Scan for snippets of process plugins is similar to thepipelineplugin, but it looks for "snippets" inmymodule/migrations/process/*.ymlinstead of in config entities.This is currently a patch on the Migrate Scanner module, but it could easily be moved here. It will work like any other process plugin. I may be biased, but I think the implementation is simpler than for the
pipelineplugin.Using a specific directory may be unfamiliar for people used to working with Migrate Plus configuration.
New key in the migration YAML
@mikelutz proposed this in #3214258: [meeting] Migrate Meeting 2021-05-20.
The idea is to allow a new key,
process_pipelines, in the migration YAML. Presumably, this would work with either core migrations or migrations defined by config entities.Example:
Using just Drupal core, a pipeline could be reused within a single migration, as in the example.
Using Migrate Plus, the pipelines could be defined in a migration group and shared with multiple migrations.
I think this would have to be implemented as a core patch. Is there any use for it in core? A good place to look is the List of core Migrate process plugins, especially the ones outside the Migrate module.
I think this would be more complicated to implement than the
pipelineorsnippetplugin.Comment #13
donquixote commentedWhat did I miss? I know these issues were on drupalspoons before. Is there any announcement of the issues moving back here?
In the project we are currently working on, we are using patch #2 successfully, with migrations as config entities.
We maintain the migrations in a module config/install dir, and run a selective drush config-import to update them.
However, I think all that we are doing there could have also been achieved with migrations declared as plugins with yml. I don't really know what we gained by using config entities.
So perhaps a pipelines solution should also support a setup where migrations are not config entities.
In most of the cases where we use pipelines, we reuse them across yml files. Being limited to the same yml file would defeat the purpose in our case.
I think I prefer processing plugin definitions rather than mess with the yml language.
Not sure what exactly is being proposed here, but:
We could use a plugin deriver to register "composite processors" as new process plugins. This would achieve the same goal as the pipeline. There could be an option to make this a config entity.
This would restrict reuse of the pipeline to the migrations within the same group. This seems unnecessarily restrictive to me, and would also reduce flexibility for how we can use groups.
I would prefer if this was not tied to migration groups.
In fact I would prefer some kind of inheritance over migration groups shared_configuration.
Comment #14
benjifisher@donquixote:
Thanks for the input!
It was announced on the #migration channel in Drupal Slack. Is there another place where you would be more likely to notice the announcement? Was the move to DrupalSpoons announced somewhere?
I would like to know for sure whether the
pipelineplugin from #2 works with core migrations. Can you do a test?Sample usage for the
snippetplugin (simplified version of the example in #3214186: Scan for snippets of process plugins):mymodule/migrations/process/clean_whitespace.yml:That is used in this migration,
mymodule/migrations/test.yml:To be clear: "I may be biased" means "I wrote this."
I considered a plugin deriver, but I decided that it was simpler to have a single plugin and read
migrations/process/*.ymlfrom that plugin.Comment #15
donquixote commentedI don't know what would be the "correct" place to do this.
One possibility would be to have an issue either here or on drupalspoons, and then a short note on the module page with a link to the issue.
Could also be a wiki page or a change record. I don't know if any of this is the ideal solution, but at least issues are easy to create.
I never used the core migrations, so I would not even know where to start :)
Also, do we need pipelines to work with core migrations?
I don't have a clear opinion whether I prefer a plugin deriver or a single plugin with a setting.
However, if the files are like "$module_dir/migrations/process/*.yml", then I would expect these to be process plugins.
If they are just options within the snippet plugin, I would name them e.g. "$module_dir/migrations/snippets/*.yml".
Comment #16
heddnSeems related to #3250040: Shared configuration for migrations.
Comment #17
danflanagan8I did a quick test of patch #2 using Migrate Sandbox (which doesn't rely on migrate_plus to run migrations) and it worked fine.
My "Migration Process Pipeline" plugin (migrate_plus.migrate_process_pipeline.my_process_pipeline.yml) ended up looking like this:
It obviously doesn't do anything useful, but it's fine for a test!
My process pipeline looked like this:
I also used the patch for migrate_scanner (though it no longer applies to HEAD) and had similar success. I put the following in my_module/migrations/process:
The process pipeline read like so:
These two approaches don't differ that much from from the outside at least. I guess I think the migrate_scanner approach is a little bit simpler for me. It's clumsy to get new config entities discovered whereas the snippets get discovered easily.
I personally find #2 and the migrate_scanner approach easier to grok than the approaches in #10. But I also understand the sentiment of not wanting to reinvent the wheel.
Comment #18
danflanagan8Here's the migrate_plus patch version of @benjifisher's MR against migrate_scanner, creating the Snippet process plugin described in #12 and #14. This is 100% the work of @benjifisher.
Comment #19
danflanagan8I ran into a little bug using the snippet process plugin in the patch in #18, specifically in the case where a process plugin needs the migration passed to it (like migration_lookup). This fixes that problem.
Comment #20
amaisano commentedThis doesn't work with sub_process plugin and/or when sending in an array. The snippet plugin ends up stripping the 1st item out of the source array from the main migration yml rather than the whole array, so the snippet file only gets the 1st item.
Comment #21
danflanagan8Would it work to put the single_value plugin in the pipeline ahead of the snippet plugin in that case?
Comment #22
amaisano commented@danflanagan8 that seems to be working! What a nice workaround. Thank you.
Snippet YML:
--migrate-debug output:
It's interesting how the output values contain the pseudo_field, which usually isn't present in the output for Destinations. It has no effect on the import thankfully.
Comment #23
alisonI just started using #19 to do identical processing on a bunch of rich text fields and I'm in love, this feature is absolutely wonderful, THANK YOU!!
The status is "Needs work" not "Needs review," so I won't change it to RTBC, but do y'all know what's left needing work, or maybe can we move this to RTBC........?
Comment #24
danflanagan8The `snippet` plugin for sure needs test coverage. (Added the tag.) And the IS here describes a different approach. I would be in favor of updating the IS to describe the `snippet` plugin. So I added that tag too.
Comment #25
benjifisherI am updating the title and issue summary, since I think we have consensus on using the
snippetplugin.To do:
I am assigning the issue to myself, but if I do not make any progress in a week, then feel free to change that.
Comment #26
benjifisherComment #27
benjifisherComment #28
benjifisherComment #30
benjifisherI made a MR based on the patch in #19.
I notice that there is a todo comment in the code:
We should handle that as well as add a test.
Comment #31
benjifisherI handled the @todo comment and added a kernel test. This issue is ready for review.
Comment #32
benjifisherComment #33
danflanagan8I've been working on the "T" part of RTBC and almost everything is amazing. I'm getting helpful exception messages when the snippet of my yaml is bad or when the "path" I configured was incorrect. Or when I forget to put the "module" or when I use a module that isn't installed (which is even a different message!). Really awesome DX.
I've also used the plugin successfully on multiple projects.
The only problem I've noticed is that the snippet plugin doesn't attempt to handle stopping a pipeline. (#3245997: Allow process plugins to stop further processing on a pipeline)
What should happen if I use this snippet?
I think I'd want to get
nullbut instead I get that default message when the source is not numeric.If we don't support stopping in a snippet, we should document that. I think the way to support it would be to go back to the foreach loop and do something like:
Comment #34
benjifisher@danflanagan8:
Thanks for all those good ideas! I think the plugin will be a lot more robust thanks to this input. Back to NR.
Since
isPipelineStopped()was introduced in Drupal 10.3.0, andmigrate_plusdeclares compatibility with 9.1, I addedmethod_exists($step, 'isPipelineStopped')to the test. With earlier versions of Drupal core, the code will never reach thebreakline, but aMigrateSkipProcessExceptionwill be handled by executable.I am disappointed: I do not think I have ever used
array_reduce()before, and I thought it was an elegant way to implementtransform(). But I do not know how to break out ofarray_reduce(), except by throwing an exception, so I went back to aforeachloop.Comment #35
benjifisherOn second thought, I need to do more: the
snippetplugin should implementisPipelineStopped(), not just check that method for the process plugins in the pipeline. Maybe this is a hint that I should change the implementation to be more like thesub_processplugin, so that I am not re-implementing the executable. If I do that, then usingsource:in a snippet should work normally.I am setting the status to NW and assigning this issue to myself. I intend to update it within the next few days.
Comment #36
danflanagan8Oh, this is actually a really subtle and interesting point. What IS the snippet? I think I was imagining that stopping within the snippet pipeline would not result in the main pipeline stopping. Does one imagine the snippet as a branch off of the main pipeline (like subprocess)? Or does one imagine the snippet as being inserted into the main pipeline (basically a syntactic trick)? I think there are good uses for both implementations.
It depends on what we're going for. Is the snippet supposed to act more like subprocess? Or are we supposed to imagine the snippet's yaml being inserted into the main pipeline?
That occurred to me too! Your beautiful new bit of documentation would be for naught. :(
Comment #37
benjifisherPersonally, I think of it the second way.
A more important consideration is that the plugin should behave the same way in Drupal 10.2 as it does in 10.3 and 11.0. I added a test: it uses a snippet that includes
skip_on_empty. In older versions of Drupal (and with the version of this plugin that several people have been using) that throws aMigrateSkipProcessExceptionexception, which is caught by the code calling thesnippetplugin. In order to get the same behavior with newer versions, I stop the pipeline and returnNULL.I did not quite get it to work, but I think the constructor could create a stub migration using the
nullsource and destination plugins andThen create a
MigrateExecutableobject with that migration and save that as a class variable instead of the$pipelinearray. Then thetransform()method would just have to setsnippet_valueon theRowobject, callMigrateExecutable::processRow(), and return thesnippet_resultdestination property.I think I will leave well enough alone and set the issue status back to NR.
Comment #38
danflanagan8> Personally, I think of it the second way.
Gotcha. Re-reading the IS, I'm convinced that's what's being asked for anyway.
> A more important consideration is that the plugin should behave the same way in Drupal 10.2 as it does in 10.3 and 11.0.
It should be tested manually, but the latest looks good to me. It's right to return
$valueinstead ofNULL, which you must have realized after #37Leaving at NR for the T part of RTBC, as you say.
Comment #39
alisonFinally back to re-test!
FYI, on my current project, I'm only using snippet to do text format mapping on all my rich text fields (so far) -- so, it's only one snippet, and it's only on one type of field/data/whatever -- and only on one use per migration.
Before today, I was using the 2024-08-01 version of the patch/MR.
--updateflag -- everything worked great."Everything worked great" = node data came in, no errors, proper text formats applied to rich text fields 🎉
Please let me know if there's any other info I should share. Thank you!!
Comment #40
amaisano commentedI was worried when I saw additional notes in the doc comments about needing to explicitly `get` source values inside the snippet, because I have code like this:
Parent migration:
Snippet:
But things APPEAR to still have worked fine, both before (using an earlier version of this patch) and now. Is this comment only for the first step in the pipeline? It is confusing to me.
I was afraid the '@_parent_id' and/or the 'constants/xyz' sources would start to fail.
Unless I'm missing something? The comment in question:
* Normally, any process plugin can specify the source key, which implicitly
* adds the get process plugin to the pipeline. This does NOT WORK inside a
* snippet:
*
* @code
* # The source will be ignored. The source of the snippet plugin is always
* # used.
* - plugin: default_value
* default_value: default value
* source: some_field
* @endcode
*
* Instead, explicitly add the get plugin to the pipeline:
*
* @code
* - plugin: get
* source: some_field
* - plugin: default_value
* default_value: default value
* @endcode
Comment #41
benjifisher@amaisano:
I tested your example with both the current version of MR 101 and the first commit on the feature branch, which is the same code as Comment #19. Both times, your snippet did not work. Do you have any custom code that might be involved?
I created the test module
sw_migrate_contentusingdrush gen module, and I copied your snippet intomodules/sw_migrate_content/migrations/process/text_format.yml. I installed the Migrate Conditions and Migrate Sandbox! modules. I copied theprocesssection of the migration from your comment and entered the following for the data row:For the constants, I entered
When I tried to process the row, I got the error
When I modified the snippet as suggested in the doc block (explicitly adding the
getplugin), it worked as expected:I do not see any changes in the code that would affect this behavior. I added error checking, used constructor promotion, and updated the code to check
isPipelineStopped()when it is defined. None of that seems relevant.Comment #43
heddnThanks for the candid feedback and collaboration on this ticket.
Comment #45
heddnCan we add a change record with docs how to use this?