Problem/Motivation
Currently the SubProcess migrate process plugin has some issues, when dealing with XMLs as source data (in combination with migrate_plus).
Example XML structure:
<?xml version="1.0" encoding="UTF-8" ?>
<article>
<elements>
<id>element.text.001</id>
</elements>
<elements>
<id>element.text.002</id>
</elements>
</article>
The corresponding migration fetches all elements nodes via the following XPath statement: /article/elements.
As a result, I get an array of SimpleXMLElement objects - one for each elements XML node.
I am currently trying to pass this array of SimpleXMLElement objects to the SubProcess plugin. In its transform method, there is the following line of code that creates the new data sub-row (where $new_value is the currently subprocessed SimpleXMLElement of the passed in array):
$new_row = new Row($new_value + $source);
This code unfortunately errors out, due to the fact that you can't merge a SimpleXMLElement object with an array like this.
Proposed resolution
Ensure that the $new_value variable is an array before trying to merge it with $source. Otherwise throw a MigrateException with helpful debugging message.
Remaining tasks
Create a patch
User interface changes
n/a
API changes
The SubProcess plugin can throw a MigrateException exception.
Data model changes
n/a
| Comment | File | Size | Author |
|---|---|---|---|
| #44 | 3048464-30.patch | 3.63 KB | quietone |
| #31 | interdiff_27-30.txt | 2.49 KB | danflanagan8 |
| #31 | 3048464-30.patch | 3.63 KB | danflanagan8 |
| #27 | 3048464-27.patch | 3.62 KB | danflanagan8 |
| #25 | interdiff_22-25.txt | 1.01 KB | danflanagan8 |
Comments
Comment #2
hctomComment #3
hctomComment #4
hctomHere is a patch with a simple workaround to ensure
$new_valueis an array.I'd appreciate your feedback ;)
Comment #5
heddnLet's move this over to migrate plus. I think its xml processing should be return array perhaps. I doubt this will ever change in core.
Comment #6
hctom...erm, but shouldn't Drupal core catch the case that there might not be an array passed in by whatever preceeding process/source plugin?
Comment #7
heddnGood point. It should complain about that. But that fix (throw skip row exception) and this solution here (xml not passed as array) are different things. Let's fix those things separately.
Comment #8
hctomOkay so let's just throw a MigrateSkipRowException instead of casting the value to an array. Attached is an updated patch. But as this has nothing to do with migrate_plus anymore, I also change the project back to Drupal core.
Comment #12
quietone commentedComment #13
quietone commentedThis needs a test.
Comment #15
joegl commentedIs there a corresponding issue to fix this in Migrate Plus?
We are using the patch in #4 because it actually resolves the issue, whereas #8 only throws an exception which is unhelpful while there are no other solutions or workarounds.
Comment #16
danflanagan8I took a stab at some tests. A fail test is attached. I also combined the patch from #8 with the fail test to show that it makes the test pass.
If the tests look good someone can remove the Needs tests tag.
Comment #17
danflanagan8Just when I think I'm good at git I do something like that...
Here are patches without a bunch of accidental local changes.
Comment #18
danflanagan8Oh boy...let's try that again again.
Comment #20
danflanagan8Finally! Hopefully this looks good to the community. I'm going to take off the needs tests tag. This definitely recreates the situation described in the IS.
Comment #21
quietone commented@danflanagan8, sorry I didn't get to this sooner. This looks great, just a few improvements.
1) I started by looking at and running the test. I got the expected error. Since we are here lets add a few more test cases, say for integers, booleans and NULL and an iterator, that is, other non arrays. I was also wondering why simple values like 1 or TRUE are not tested.
2) Then I looked at the changes to the process plugin and see that this is not testing the value on input but on each element in the array. That makes sense here. Nice to keep this limited to that case.
3) Now, why throw a SkipRow instead of a SkipProcess? I guess because sub_process can't do the work expected of it it is better to skip the whole row, similar to static map throwing a skip row.
4) Then I looked at the exception message. Let's make this like the one in the Extract process plugin because it provides more data. Now that #2976098: MigrateExecutable should add details for the migration & destination property to exceptions that cause a row failure the message will contain the process plugin ID so there will be no confusion on which plugin threw the exception.
Comment #22
danflanagan8Here are some updates, @quietone. Please let me know if I missed anything or did anything extra. I was a little confused about which of you bullet points were actual requests vs. which ones were simply your musings on the issue. :)
Here's a comparison of your bullets and what I did!
Comment #24
danflanagan8Ugh, see I wasn't kidding when I said I went through 10 versions of that language. That was lazy.
Here's a patch I swear passes. I'm not going to trigger tests in an attempt to save resources.Nope. Ignore this patch because I messed it up also.
Comment #25
danflanagan8Ignore the last patch despite my claim that "I swear [it] passes" because I made a git error and didn't inspect the patch manually. This is good advice: don't trust me ever!
So here's a new patch. I'm just going to run the tests again. I'm not clever enough to try to save resources...
Comment #26
quietone commented@danflanagan8, this looks good and I don't think there is a test case missing.
Unfortunately, the patch no longer applies. And I think the title needs an update, 'Change SubProcess to throw an exception on invalid input'?
Comment #27
danflanagan8Thanks @quietone
This was a trivial reroll (thankfully!). It applies to 9.3.x and 9.4.x. I'm never 100% sure what branch we should be targeting. Probably 9.4.x? I'll leave it as is for now.
I updated the title with a slightly different wording than suggested so that it still sounds like a bug rather than a task. I also touched up a couple things in the IS.
Comment #28
mikelutzThis seems to be a duplicate of #2880278: SubProcess should throw an explicit error on bad input.. I'm closing that one because more work seems to have been done here.
As far as the merits of the issue, the original reporter wanted sub_process to work with an array of simplexml elements. It currently doesn't and this works as designed. If you want to feed an array of simplexml elements in you need to convert them to an array of arrays first. The documentation for the plugin clearly states that the input needs to be an array of arrays.
As far as then throwing an exception on bad input, I'm fine with that, but a skip row exception is wrong, we need to throw a MigrateException on bad input. Looking through the process plugins just in the core migrate module, the following test for valid input before processing:
ArrayBuild,
Callback
Concat
Explode
Extract
FileCopy
Flatten
FormatDate
MakeUniqueBase
StaticMap
SubStr
UrlEncode
Of those, Callback throws an InvalidArgumentException and every single other process plugin that checks for valid input throws a MigrateException on invalid input. We have no process plugins now that throw a skip row on invalid input, and for good reasons. If there is invalid input getting into your process plugins, then there is something wrong with your migration, either in your source data, or in your processes. In most of the cases, skipping the row is just going to end up happening over and over anyway due to a bad configuration, so better to dump out of the migration and just fix it. We shouldn't blindly continue processing rows when the migration is broken. We should also do it for consistency, to match what other process plugins do.
Comment #29
mikelutzComment #30
mikelutzComment #31
danflanagan8Thanks @mikelutz
You make a very convincing case! I have changed the exception to
MigrateException. I also updated the proposed solution in the IS to reflect that change.Note: Comment 30 was posted as I was posting these patches. I don't want to rename and reupload so I'm just posting with the wrong comment number. :)
Comment #32
quietone commentedAh, so good that mikelutz was able to chime in here. I completely missed that!
The changes asked for in #28 have been made and this is much better now. We already had tests and now we have the correct exception, MigrateException. This is ready now.
@danflanagan8, thanks!
Time to get this in.
Comment #33
danflanagan8Yeah, this exception is better. And @mikelutz went ahead and created an issue to get
callbackin line with the other plugins: #3247950: Throw consistent exceptions on invalid process plugin configurations.Comment #34
larowlanDiscussed with @quietone, @amjad1233 and @fubarhouse at Drupal South and we agreed that because of the new exception being thrown we should write a change record here.
Comment #35
quietone commentedChange record added.
Comment #36
danflanagan8Change record text looks good to me. Thanks, @quietone!
Comment #39
mikelutzRerunning tests
Comment #41
mikelutzComment #43
quietone commentedRestoring RTBC because the test failure was due to #3255836: Test fails due to Composer 2.2 which has been fixed.
Comment #44
quietone commentedRe-uploading the patch so it will test with 9.4.x every two days. At least, I hope so.
Comment #45
alexpottCommitted and pushed bcb89890b24 to 10.0.x and 200a93269d9 to 9.4.x. Thanks!