Problem/Motivation
When importing data using the Migrate module and the concat process plugin (http://cgit.drupalcode.org/drupal/tree/core/modules/migrate/src/Plugin/m...), if the data source has empty values, the delimiter is included anyways.
Example:
| id | value_1 | value_2 |
|---|---|---|
| 1 | 'Value 1' | 'Value 2' |
| 2 | 'Value 1' | |
| 3 | 'Value 2' | |
| 4 |
If I'd wanted to have a migration that used concat to join value_1 and value_2 I'd have these settings:
my_field:
plugin: concat
source:
- value_1
- value_2
delimiter: ' & '
Which will result in the following data:
| id | join value |
|---|---|
| 1 | 'Value 1 & Value 2' |
| 2 | 'Value 1 & ' |
| 3 | ' & Value 2' |
| 4 | ' & ' |
I don't think 2, 3 and 4 are the desired result.
Proposed resolution
At a migrate meeting #3160932: [meeting] Migrate Meeting 2020-07-30 it was decided to not change the concat process plugin because this same result can be achieved with a callback using array_filter followed by concat. Improvements to the callback process plugin may also help, there is an issue in Migrate Plus for that, #2882276: Extended callback process plugin to call functions with multiple parameters.
original proposal
Add an array filter to strip out empty values when concat'ing. I don't see a use case of wanting these empty values, but this could be optional.
| Comment | File | Size | Author |
|---|---|---|---|
| #20 | 2913428-20.patch | 6.21 KB | sylvainm |
| #16 | 2913428-16.patch | 6.17 KB | jofitz |
| #16 | interdiff-11-16.txt | 5.35 KB | jofitz |
| #11 | 2913428-11.patch | 5.56 KB | jofitz |
| #11 | interdiff-7-11.txt | 642 bytes | jofitz |
Comments
Comment #2
pcambraComment #3
pcambraComment #4
jofitzFirstly, this will need a test to demonstrate the problem (and prove that the patch fixes it). Secondly, perhaps we should add "allow_empty" to the plugin configuration (probably set to false by default) for those (rare) cases when I user might want ' & Value 2' (or similar).
Comment #5
pcambraThanks for the tags Jo, obviously this needs tests and a decision on the allow_empty option, I'll wait for a component maintainer to chime in before investing in adding those.
Comment #6
quietone commentedYea, Jo Fitzgerald is on it, as usual. Everything in #4 sounds like a good implementation, with the addition of updating the API documentation will need updating.
Comment #7
jofitzComment #9
heddnLooks good to me. Docs are added and tests too. I don't think we need a CR for the very minor API addition. This can probably make its way into 8.4, since migrate is still beta/experimental. So I've switched back versions. And adding a test for good measure.
Comment #10
quietone commentedReally sorry to set this back to NW. Using the structure 'if set' was pointed out in another migration issue (which of course I can't find) so let's use it here.
It would be better and be consistent with the other process plugin docs to use something like; '(optional) If set, empty values will be included in the output. Defaults to FALSE.'
Comment #11
jofitzUpdated the comment for consistency, as suggested by @quietone.
Comment #12
heddnDocs fixes added in response to #10. Back to RTBC.
Comment #13
quietone commented@Jo Fitzgerald, thanks!
Comment #14
alexpottI wonder if changing the default behaviour is a good idea - someone's migrations could be relying on this? Maybe we should swap the configuration default?
Comment #15
alexpottAlso migrate is beta and this plugin is part of the API so this needs a change record.
Comment #16
jofitzI have attempted to address @alexpott's very valid argument in #14. I have corrected that oversight, but in doing so I have come across a problem: when exclude_empty is TRUE zero values are excluded (when they shouldn't be, imo). Any suggestions?
Comment #17
heddnHonestly, this whole piece of work seems a little like we could either 1) move it to contrib - migrate_plus is a good landing place or 2) leave it to a custom extention. Where in core do we need this functionality? Can we get a convincing argument why filtering empty is needed? we can always run array_filter with a callback process plugin beforehand, no?
Comment #20
sylvainm commented@Jo Fitzgerald
we could use strlen as second paramater of array_filter, as suggested in a comment of php.net: https://secure.php.net/manual/en/function.array-filter.php
I just changed that in attached patch
Comment #21
sylvainm commentedComment #23
quietone commentedAlthough initially my reviews suggested that I agree with this change, after all it seems quit logical. However, I am no longer convinced it should be in core. It would be a nice addition to migrate_plus, giving folks an option and could reduce the number of processes in some pipelines.
First, the concat process plugin implements implode. Once you know that you will expect implode not some variation. Now, the name concat is somewhat unfortunate where implode seems to be the best choice but I don't know the history. And after all, there is explode.
Two, it isn't needed in core. And as heddn points out in #17 there are ways to filter out the empty strings if that is what is needed.
Three, changing the default behavior of concat is likely to cause grief as alluded to by alexpott in #14.
Comment #24
heddnAfter thinking about it some more, if we moved to contrib... What would we name this thing? Process plugins work best when they do single chunks of work. And then chain them. Perhaps that's what should be done here. Just use callback with array filter. And this get closed won't fix?
Comment #25
heddnRe tagging as feature rather than task.
Comment #26
quietone commentedGood point about the name.
We could add documentation showing how to use concat with array_filter somewhere in https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins. Would that be sufficient?
Comment #27
sylvainm commentedWith the array_filter callback solution, we can't choose the second parameter to pass to array_filter.
And we need it to avoid the problem of zero values
Comment #28
heddnThen perhaps we should add an array_filter process plugin to migrate_plus?
Comment #29
sylvainm commentedWith such a plugin, we could solve the problem, think I.
I can provide a patch (and an issue in migrate_plus project) if you want
Comment #33
quietone commentedReviewing the issue there seems to be consensus to move this to Migrate Plus. This came after a comment by alexpott that the patch in #14 would break existing migrations. While a later patch, #20, may have fixed that it was after heddn suggested, #17 that this be moved to Migrate Plus. He pointed out that this isn't needed by core, there is a workaround and no one has presented a argument why this is needed. The following comments support the move. The final discussion was to create an array_filter process plugin instead of changing concat.
An issue needs to be made in Migrate Plus and then this becomes a won't fix.
Comment #34
quietone commentedDiscussed this at the migrate meeting today, #3160932: [meeting] Migrate Meeting 2020-07-30.
The desired result can be achieved by using a pipeline of callback, with array_filter before the concat to obtain the desired result. That kind of processing is what the pipeline is for and it means there isn't a need for change to concat. Creating an array filter process plugin can also be avoided if the callback process plugin is improved. There is an issue in Migrate Plus for that, #2882276: Extended callback process plugin to call functions with multiple parameters.
@pcambra, thank you for suggesting this change to concat and for the patch. Also to jofitx and SylvainM for adding patches and to alexpott for the review that started a rethink of this issue.
Comment #35
byrond commentedHere is an example using the pipeline described above in #34: