In the current iteration of the data parser plugin `Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json`, the ability to use XPath selectors only allows for simple string selectors at each level. It would be helpful to allow for a slightly more advanced form of XPath-based filtering when parsing JSON data. For example, allowing for filtering based on the child attributes of the current selector level.
Testing instructions
It is a little tricky to test this patch, since an important part is that it adds a new package to composer.json.
One approach is to add the patch as usual, to get everything else (the code that uses the added package). Then
composer require softcreatr/jsonpath:0.8.*
This should have the same effect as letting Composer install the new requirement.
The other approach is to add a repository to composer.json for the issue fork for this issue in composer.json:
"repositories": [ { "type": "vcs", "url": "https://git.drupalcode.org/issue/migrate_plus-3007709" }, { "type": "composer", "url": "https://packages.drupal.org/8" } ],
The order matters! Then require the feature branch for this issue. (This will also update composer.json.)
composer require drupal/migrate_plus:dev-3007709-xpath-6.0.x
This will pull in the patch for this issue and install the newly added package.
JSONPath Information
The following links are helpful when trying to create proper JSONPath selectors with queries.
- JSONPath -- XPath for JSON
- JSONPath Expression Tester: Enter sample JSON and create expressions to evaluate.
: Syntax documentation
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | migrate_plus-json-xpath-filtering-3007709-18.patch | 3.67 KB | mstrelan |
| #18 | interdiff_3007709_16_18.txt | 664 bytes | mstrelan |
| #20 | migrate_plus-json-xpath-filtering-3007709-20.patch | 4.02 KB | pcate |
| #24 | migrate_plus-json-xpath-filtering-3007709-24.patch | 4 KB | duaelfr |
| #25 | interdiff-3007709-24-25.txt | 789 bytes | duaelfr |
Issue fork migrate_plus-3007709
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
michael_wojcik commentedInitial patch to allow for filtering by child attributes using the format `SELECTOR[FILTER_KEY=FILTER_VALUE]`
Comment #3
heddnLots of good comments in the patch. Thanks for your contributions. However, we need to be sure we maintain backward compatibility so having some tests that demonstrate that the current xpath approach works and that this new method also are supported would be nice. Setting back to NW for tests.
Comment #4
michael_wojcik commented@heddn Thanks for the feedback! That sounds good. I'm not very familiar with writing tests for contrib modules. Is this a case where I can simply add a test to the existing class at `migrate_plus\Kernel\Plugin\migrate_plus\data_parser\JsonTest`?
Comment #5
heddnYeah, that would work. There's nothing much special to these unit or kernel tests. Just normal unit test style stuff. If you haven't read it, there's some good docs over in https://phpunit.de/manual/4.5/en/writing-tests-for-phpunit.html about general testing and https://www.drupal.org/docs/8/phpunit
Comment #6
ctrladelThoughts on getting off the island that's being built here?
There's 2 very popular packages on packagist that implement JSONPath selectors which allows for much more advanced selectors. flow/jsonpath and mtdowling/jmespath.php
Seems like a better use of time to integrate one of these packages than to cobble together our own selection logic.
#2640514 has one example use case which deals with a product import.
Comment #7
heddnI would much rather see us use an external library. From a quick survey of those two projects, flow/jsonpath seems the better fit for us. However, this also is a very likely break of BC. Shall we add a new json parser, built from scratch, that uses one of these? If we did that, I'd want to see some decent tests out of the gate.
Comment #8
ctrladelTook an first pass at this. Just copied the existing json parser and modified the selection logic. Only part I'm not real happy about is
I might be missing something but I couldn't get a selector that returned just a string. Instead it was returning an array with one element as an example.
selected item:
field selector: label
result:
Example migration & data that's working for me:
Comment #9
ctrladelForgot to attach the patch :/
Comment #10
acbramley commented+1 for this feature, but I agree with #8 that bringing in a 3rd party library to do the heavy lifting for us would be the best approach. Being able to use JSONPath or proper Xpath expressions would be a great feature. E.g, I have a JSON feed I'm consuming, but I need to filter out some entries based on an attribute. Using Xpath like this works:
/data[attributes/parent_product/product_type = "Subscription"]But isn't supported currently in the Json parser plugin.
Comment #11
Hookset Media commentedI was about to open a new ticket, but I think this addresses my problem. You can see full troubleshooting here:
https://www.drupal.org/forum/support/module-development-and-code-questio...
Basically, I am trying to import 1 of 121 objects from an xpath selector. My migration works for all objects and creates 121 entities, but when I set the selector to the xpath of one of them, it errors out because the they are nested instead of in an array, like this:
{
"Abilene": {
"area": 589.4,
"condensed_name": "Abilene",
"conservation_capacity": 7900.0,
"conservation_pool_elevation": 2012.3,
"conservation_storage": 7757.0,
"dead_pool_elevation": 1968.8,
"elevation": 2012.07,
"full_name": "Lake Abilene",
"gauge_location": {
"coordinates": [
-99.88897705,
32.23457718
],
"type": "Point"
},
"percent_full": 98.2,
"short_name": "Abilene",
"tags": [
"climate_low_rolling_plains",
"monitored",
"municipal_abilene",
"region_brazos_g",
"water_supply",
"texas",
"major",
"basin_brazos"
],
"timestamp": "2019-03-13",
"volume": 7757.0,
"volume_under_conservation_pool_elevation": 7900.0
},
"Addicks": {
"area": null,
"condensed_name": "Addicks",
"conservation_capacity": null,
"conservation_pool_elevation": 67.5,
"conservation_storage": null,
"dead_pool_elevation": 67.5,
"elevation": 68.88,
"full_name": "Addicks Reservoir",
"gauge_location": {
"coordinates": [
-95.62355804,
29.79133987
],
"type": "Point"
}}
the parser identifies correct number of rows, but fails. if I select 'Addicks'.
Here is the full JSON feed:
https://waterdatafortexas.org/reservoirs/recent-conditions.json
Seems like this ticket addresses a similar problem, if not, I will be glad to open a different ticket.
Comment #12
naveenvalechaI have applied with above patch to test it with the data below. There's an issue with the library itself when there's a "-" in the field selectors.The below migration template throws error when fetching the display-name from the json.
https://github.com/FlowCommunications/JSONPath/issues/16
Here's the migration template:
Data.
Comment #13
naveenvalechaRe: #12
I made the following change to the selector and it worked fine as mentioned in the github issue.
selector: '"display-name"'Comment #14
flamsens commentedI tested the JsonPath parser on some test data. When a field selector finds no matching field in the json source, the result is an empty array. In that case, the process plugins attached to this field are not executed. If I change fetchNextRow to return an empty string, the process plugins are executed.
Comment #15
duaelfrPath in #8 works well for my simple use case.
It has an issue though if you try to use a SubProcess because the
\Drupal\migrate\Plugin\migrate\process\SubProcess::transform()method tries to merge a JsonPath object with an array here$new_row = new Row($new_value + $source);I have no idea on how to fix this for now.
Comment #16
duaelfrI had to dig a bit around this SubProcess problem. It seems that JSONPath:first() can return JSONPath objects and also that JSONPath:data() converts JSON objects to PHP stdClass objects. The problem is that the SubProcess plugin uses a "+" operator to merge the source so it fails.
The attached patch solves this issue by recursively converting data to nested arrays.
Comment #17
radelson commentedPatch was not applying anymore
Comment #18
mstrelan commentedThe
is_scalar()check from #16 converts NULL values to an array which is causing database exceptions. Updated patch checksis_null()as well.Comment #19
pcate commentedLooks like the patch needs a re-roll for v5.1. Also, looks like
flow/jsonpathlib has a newer 0.5.0 release.Comment #20
pcate commentedComment #21
pcate commentedUpdated patch to work with v5.1.
Comment #22
daveianoDo not forget https://www.drupal.org/project/migrate_source_jsonpath which uses https://github.com/Galbar/JsonPath-PHP under the hood, which is for my use case the better library.
In flow/jsonpath there is no possibility (at least I don't found any, see https://github.com/FlowCommunications/JSONPath/pull/57#issuecomment-6890...) to filter based on sub-properties, like
Galbar/JsonPath-PHP supports this and many more features.
Comment #23
duaelfrflow/jsonpath has been deprecated in favor of softcreatr/jsonpath so this patch needs to be updated. I've planned to do it in a two weeks if no one does it before.
Comment #24
duaelfrBumped to 5.x then made a straight reroll to be able to provide an eventual interdiff after that.
Comment #25
duaelfrIt was easier than expected ;)
I replace the dependency, placed it in the require section and reverted the change on the author URL that is out of scope.
Comment #26
benjifisherI have not tested at all, and I have no opinion on which library to use (#22).
The attached patch is just a reroll of the one in #25, which no longer applies. I also replace
with
Comment #28
benjifisherSame result as the 8.x-5.x branch: the test passes with Drupal 9, not with Drupal 8.9. I think it is the version of PHPUnit that matters.
Back to NR.
Comment #29
pcate commented#26 wasn't applying with the latest dev version so I did a re-roll.
Comment #30
pcate commentedUpdated #29 patch as it wasn't adding the new file in the correct directory for me.
Comment #31
heddnNW for some tests of the source plugin.
Comment #32
daniel.pernold commentedThis updated patch uses JSONPath->getData() instead of JSONPath->data() and ensures compatibility to softcreatr/jsonpath 0.8.0.
Comment #33
radelson commentedThe branch is updated with the patch from #32 + a test case checking the plugin behavior when handling a Json file with null values and missing properties.
The jsonpath expression used in the test for null values is similar to something we used in a real migration.
The test for missing properties is lifted from the regular Json plugin test.
Not sure if the tests are sufficient.
This needs work as the tests are confirming comment #14 behavior.
Comment #34
radelson commentedComment #35
daniel.pernold commentedReroll #32 for 5.3.x and pin dependency to softcreatr/jsonpath 0.8.*
Comment #38
binoli lalani commentedHello,
I applied #35 patch but it throws below errors while running the migration so I've fixed it and added the patch.
Please review.
Thanks!
Comment #39
binoli lalani commentedErrors which I am getting and added above patch to resolve it
Comment #40
pcate commentedAttached is an update of #38 patch and rerolled for the 6.x dev branch.
The main change is I added a test for the parser. I also added some type declarations that the 6.x regular JSON parser had.
An interdiff is included, but there were a lot of whitespace changes so I excluded those from it.
Comment #41
pcate commentedComment #43
benjifisherI am updating the version to 6.0.x. I also created a new MR based on that branch and the patch in #40.
Comment #44
benjifisherComment #45
pixlkat commentedI installed the MR version of migrate_plus using the instructions above on a vanilla D10 site with the minimal profile and created a test content type to use as the destination for the test migration.
I created a module with a simple migration using a sample JSON file source (attached) and was successful in using a selector to limit the items which were migrated. The JSON contains an attribute named "favoriteFruit"; the item_selector in the migration selects those with favoriteFruit=strawberry. My migration only imported the items with the matching selector with no errors.
My simple migration:
Comment #46
pixlkat commentedComment #47
benjifisher@pixlcat:
Thanks for testing this, and for adding JSONPath links to the issue summary.
Since we looked at this issue at DrupalCon earlier this month, I am adding the tag for that.
I reviewed the changes. It mostly looks good, but I would like a little code cleanup. I am setting the status to NW, and I will re-review promptly. Let's try to get this issue done!
One of my review comments will require a bit of research and/or experimentation.
Comment #48
pixlkat commented@benjifisher I guess I didn't set this back to needs review. Doing that here.
Comment #49
jon nunan commentedShould this new class extend the current JSON class instead of DataParserPluginBase? I did some quick testing and it seems this new plugin doesn't support the pager enhancements as it doesn't implement getNextUrls()?
Comment #51
b.lammers commentedAdded a patch file based upon #50
Comment #52
heddnNice idea here. Added some comments on the MR.
Comment #53
xurizaemonLooks like the CI pipeline failure "composer (previous minor)" can be disregarded here - #3565496: pipeline failure - composer (previous minor)
Tested with an existing JSONPath migration locally, dataparser goes away when softcreatr/jsonpath is not a dependency of the current project, is available and works when it is. Further testing welcome!
Comment #54
heddnThe only worry I have with CI test failures is the cspell on dataparser. If we can add that to the allow list, then this looks good to me. The default previous version of drupal core is about to change any day now so I'm not too worried about getting that green.