Follow-up to #2623012: Implement interfaces and base classes for URL-based sources
The stand-alone JSON plugin had some support for following links from one request to the next where a feed is paginated (remnants still exist in comments). Get this working in general in the fetcher plugins.
Remaining tasks
Get tests passing.Add documentation.
| Comment | File | Size | Author |
|---|---|---|---|
| #137 | migrate_plus-support_paging-2640516-137.diff | 33.48 KB | nadavoid |
| #130 | migrate_plus-support-paging-2640516-130.patch | 17.2 KB | jacobbell84 |
| #125 | migrate_plus-support-paging-2640516-125.patch | 17.15 KB | jacobbell84 |
| #124 | interdiff-123-124.txt | 1021 bytes | kekkis |
| #124 | migrate_plus-support-paging-2640516-124.patch | 16.97 KB | kekkis |
Issue fork migrate_plus-2640516
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
mikeryanComment #3
Grayside commentedNext step after #2608610: Add support for list/item pattern.
Comment #4
Grayside commentedHere's my rough game plan going into this. Note that I'm attempting generic approach but I am not going to work on XML/SOAP.
Configuration
next, HAL would be_links/next/href.Links Header
DataParserPluginBase::nextSource()should call$this->getNextLinks()and prepend the resulting URLs to$this->urls. getNextLinks() will be abstract.Links Header: JSON
Json::openSourceUrls()can extract link headers from the data fetcher. Should data fetchers in general have a getStreamMetadata() method or should it check for theHttpplugin specifically?Links Header: XML (Blocked)
XMLReader is not currently using the HTTP data fetcher at all. Link Header support is dependent on switching to it. (E.g., download the XML doc via fetcher then open from temporary file location).
Links Header: SOAP (Blocked)
See XML.
Next Links
Next Links: JSON
Json::openSourceUrlscallsgetSourceData()to retrieve the data for import, but that might exclude data for the paging process.Next Links: XML
TBD, but appears like it can be approximately similar to JSON in that we could parse out the next link based on a selector.
Next Links: SOAP
TBD, but appears like it can be approximately similar to JSON in that we could parse out the next link based on a selector.
Comment #5
Grayside commentedHere we go. This patch will conflict with #2608610: Add support for list/item pattern as both needed some of the same changes to Json::getSourceData().
Comment #6
mikeryanComment #7
mikeryanThanks for this patch - I'm going to need to think on this bit, just want to let you know I'm not ignoring it as I catch up on reviews...
Comment #8
Grayside commentedI suspect the current implementation of this or #2608610: Add support for list/item pattern is disrupting rollbacks. Got this second-hand so I do not have a lot of details, but likely related to item counts.
Comment #9
mikeryanHaven't gotten around to a thorough review and testing, but I'd like to see the interface be more general - getNextLinks(). The implementation could, based on configuration, derive the links from headers, selectors, or query strings (e.g., ?page=1)
Comment #10
Grayside commentedSure. I named it
getNextLinksFromMetadata()because to my mind,getNextLinks()sounded like specifically the Link HTTP header and metadata is more generic.Would like to wait on more feedback before diving back in myself.
There is also this spurious warning in the logs under some configuration. Haven't traced which aspect of my usage is triggering it, as I have 2-3 variations of the pager configuration in place.
Comment #11
hugovk commentedThis would be a very useful addition. What's the current status? Thanks!
Comment #12
drclaw commentedHere's a re-rolled patch for 8.x-4.x from #5 with a few changes and a few additions:
Changes to original patch:
getNextUrlswhich IMO is more technically correct. Also it matches the language we use throughout the fetcher/parser plugins.Example config for the "urls" paging type:
And the payload might look something like this
Additions:
Added support for cursor-based paging
Some APIs (like the one I'm using) doesn't return full urls for the next and previous pages. Instead you get a cursor value that you need to insert into the url yourself. The twitter api docs explains this technique pretty well and has some examples https://dev.twitter.com/overview/api/cursoring.
Here's an example of the pager config for this:
And the payload might look like
Comment #13
drclaw commentedAck, metadata :)
Comment #14
badrange commentedHappy to see that this patch has progressed! I just tried it out in our project and it seemed to do it's job - running for a long time - until it suddenly stopped with this error message:
Comment #15
badrange commentedAnd it is not surprising that it fails; the last page in our datasource does return
meta/next: nullwhen there are no more items. I guess it would make sense for the pager to stop if the data source returns "null" or "false"?https://api.hel.fi/linkedevents/v1/event/?end=2017-09-12&include=locatio...
Comment #16
badrange commentedA simple check to see if the url is empty might do the trick.
Comment #17
heddnWe could use some tests.
Comment #18
drclaw commented@badrange I think you're right, a simple check for data should do it!
@heddn Yeah tests would be great. I'll try and get to those in the next couple weeks.
Comment #19
berenddeboer commentedThis patch really should be committed guys. Works great. Have extended it to handle the case where paging is done by page numbers. I.e. no previous/next, just ?page=1 kind of style.
Comment #21
svendecabooter@berenddeboer your patch does not apply, because it is not diffed from the module root directory, but rather from your Drupal installation root.
Comment #22
heddnI agree this is pretty important. But with the ease of adding a unit or kernel test these days, I'm going to ask we do that. So leaving at NW. Plus there's the need for a re-roll per #21.
Comment #23
mortona2k commentedRerolled the patch.
Comment #24
berenddeboer commentedSaying you need a test is easier said then done, definitely doesn't appear to be easy. I.e. it seems you may need an HTTP source for json.
I made a first attempt at such a test using a file uri, see attached file, and the error I get is:
How on earth do I fix that???
Comment #25
berenddeboer commentedOops, here the files.
Comment #26
berenddeboer commentedOK, let's do this better. Here a full patch including the test. Test will fail. Have added two comments testing this against a local webserver, and it passes then.
Comment #27
berenddeboer commentedAnd another attempt to get a patch that can be applied.
Comment #28
berenddeboer commentedComment #29
dweidner commentedWhat is holding this feature back? Looking for something similar to process a large HAL+JSON endpoint. I would like to help, even though I have no experience in contributing to drupal projects.
Comment #30
ressaI could also use this feature, maybe the patch just needs a review? I am changing status to Needs Review.
Comment #31
heddnWe now have some tests, but those are failing. And I don't see a lot of docs in this patch on how to configure a json migration with paging. Let's beef things up and get the failing test to pass.
Comment #32
ressaI wonder if the patch will also add an option of paging through multiple files? This which works fine currently:
(I got the tip from Mike Ryan in From: Using Migrate API with a multi-page / paginated source.)
But if you have hundreds of files, something like this would be better:
Meaning: Start from 1, get 100 records, repeat four times.
Comment #33
m.lebedev commentedFor test.
Comment #35
m.lebedev commentedRerolled; changed a path of a test file.
Comment #37
m.lebedev commentedlast try. I take an example from data_parser/JsonTest.php
Besides, I have an other problem
Notice: Undefined index: next in Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json->getSourceData()
I get a data from a resource, which may don't have a next page on last page.
Comment #39
jcandan commentedFixed the failing test, changed names to match test performed, and added appropriate doc-blocks.
Added a new pager type:
paginator. This pager type allows one to deal with a paginated data where the next urls cannot be derived from the content. The example configuration below demonstrates how thepage_keyvalue would iteratively increase appropriately based on the configuredpaginator_type(e.g.check api.example.com/v1/?offset=100&limit50).Options:
paginator_typeUse "page_number" when the pager uses page numbers to determine the item to start at, use "starting_item" when the pager uses the item number to start at.default_num_itemsThe first call is made without size specified; one must therefore define the default size. There is a@todonoted to address this.page_keyThe url parameter key that should be used for incrementing pagessize_keyThe url parameter key that specifies the number of items per pageAs I wrote this description, I realized that unless we do add the ability to specify the size on the first run, the
default_num_sizeis unnecessary. I may address this in a follow-up patch.Need help with TESTS
Though I got the tests passing, I do not feel that the test, as supplied by previous patches, tests the available pager options adequately. I think it would be best to inject a mock http client into the parser and iterate over mock endpoints, as this issue addresses. I made a few attempts, but could not figure out how to accomplish this.
This pager type addition was inspired by External Entities, and met my needs perfectly.
Comment #40
jcandan commentedAdditionally, I removed this bit of code which allows one to configure the paginator size type. This was also inspired by the External Entities UI (see image above in #39). However, I had a hard time understanding its use-case. If someone feels it necessary to add to the patch, feel free.
Comment #41
jcandan commentedRe-rolled patch #39. Fixed coding standards in patched files.
Comment #42
achapHi @jcandan. Thanks for the patch. Is it possible to elaborate on this config key? Maybe an example. It's not really clear to me. "paginator_type Use "page_number" when the pager uses page numbers to determine the item to start at, use "starting_item" when the pager uses the item number to start at."
Otherwise, it is working.
Comment #43
jcandan commented@achap, see the url parameters in the example given in #39. There, the parameter keys are
limitandoffset. Now, think in terms of how that is paginated: in this case, theoffsettells the API endpoint how many items to skip, which is the same as listing thestarting item(e.g. offset=0 would show items 0-10, offset=11 would show items 11-20, etc.). In another example, the parameter keys might bepfor page, andpsfor page size. The pagination increment is representative of pages of data (p=1, p=2, etc.), in which casepaginator_typewould be set topage. Examples of both of these are https://librivox.org/api/info and http://rijksmuseum.github.io/ respectively. Hope this helps clarify.Comment #44
achap@jcandan yes makes perfect sense. Thank you!
Comment #45
jcandan commentedRe-rolled patch #41 (a re-roll of #39). Fixed one last missed coding standards item.
Comment #46
jcandan commentedComment #47
jcandan commentedComment #48
sdstyles commentedUse data fetcher plugin to check if URL is valid, this will work also with URLs which require authorization.
Comment #49
sdstyles commentedComment #50
audriusb commentedI am having an issue with latest patches. They do not return correct data.
see this call:
$data = $this->getSourceData($url, $this->configuration['pager']['selector']);it has 2 paramaters but the function is
protected function getSourceData($url)so the second parameter is completely ignored.
the latest working patch with 2nd parameter is #37
nevermind, missed the #39 with instructions. Do initial call require to be without page limit? what if it returns whole set then? I am going back to #37 for now
Comment #51
m.lebedev commentedImport does not work starting from #39
source:
pager:
type: urls
selector: "next"
In function getNextUrls:
$data = $this->getSourceData($url, $this->configuration['pager']['selector']);
but function takes one arg
protected function getSourceData($url)
How to set up for the "http://devel.loc/api/v1/request?page=85"?
{
"self": "http://devel.loc/api/v1/request?page=84",
"first": "http://devel.loc/api/v1/request?page=0",
"last": "http://devel.loc/api/v1/request?page=87",
"prev": "http://devel.loc/api/v1/request?page=83",
"next": "http://devel.loc/api/v1/request?page=85",
"list": [
What type of pager to use?
Comment #52
audriusb commentedyou need to use
type: paginatorinstead oftype: urlsyou currently use to work with latest patch. There are additional params needed to be set in #39.Comment #53
m.lebedev commentedok..
1)
source:
pager:
type: paginator
paginator_type: page_number
default_num_items: 100
page_key: page
size_key: limit
When a parser has try to get next link after last link I get an error:
"Error message: Client error: `GET http://devel.loc/api/v1/request?page=88&limit=100` resulted in a `404 Not Found`"
if this:
change on this:
then it will work.
2) what about other types of pager? They don't work now
Comment #54
cameron prince commentedI've tested the patch against a SharePoint API and it seems to work very well for importing paginated responses. The one thing that bugs me is that it seems to break
drush ms. When I debug the command, I see that it's paging through the API just like the import, trying to get the total number of rows to display.Do you think listing the paginated migrations with maybe "N/A" or "---" for their values in the status report would be a good solution?
Comment #55
weynhamzJust a heads up here, specifically for JSON:API, this issue propose to create a jsonapi data_parser plugin for JSON:API responses, in which also added the support for links/next, will there be any conflict against the work here?
Comment #56
heddnRe #54: there's a
skip_count: trueoption for all source plugins.I think we need more docs on how to use this feature in the doxygen.
I think we need more docs on how to use this feature in the doxygen.
I don't see where we are using the pager. As the test data only has a single page.
Comment #57
cameron prince commentedThanks @heddn!
skip_count: truetook care of the issue with the status report.Comment #58
cameron prince commentedI'm seeing a few bugs with this now that we're actually testing things and comparing counts. For instance, if the row limit is set to 50, we're seeing missing records. If we set the row limit down to 10, we get all results.
Also, when adding debugging to the Http class, I see that the same URL is queried multiple times. It seems to go forward two steps, then back one, such as: startrow=10, 20, 10, 20, 30, 20, 30, 40, etc.
Comment #59
cameron prince commentedAnother thing we've found with this patch is that when a migration is configured with a pager, it will no longer work if you change the data_fetcher_plugin to file. In our case, we were providing a local .json file as the source for a test. It wasn't obvious that the failure was related, but removing the pager resolved the issue.
Comment #60
m.lebedev commentedI created a patch based on the #53 comment
Comment #61
hudriThe last working patch for me is #37. I believe this is because I need to use the
pager: type: urlstogether withitem_selectorconfig key.Given a common data structure like
and a migration config like
then this patch fails on
migrate:importwithURI must be a string or UriInterfacebecause filesrc/Plugin/migrate_plus/data_parser/Json.phpgetSourceData()can't return the next url because it is limited to payload withinitem_selectorComment #62
weekbeforenextThe patch from #60 worked for me, but my API endpoint returns a warning message instead of NULL when there are no more results.
This is a patch that avoids the error:
My configuration for pagination looks like this:
Comment #63
weekbeforenextI had a need for the 'urls' pager type, so then I experienced the errors you all speak of.
I created a new patch where both the 'urls' and 'paginator' pager types work. I've also included interdiffs between patch #37 and patch #62.
Comment #64
weekbeforenextComment #65
duaelfrI found out that there was an issue if the current page number were "0".
Here is the fix.
Comment #66
heddnI wonder if this piggy-backed on the work in #3040427: Allow callback for Url source, and single item Json plugin if we would be better off? Build a default callback for drupal's json that is callable via a callback.
Comment #67
duaelfr@heddn Thanks for your feedback. I believe both features could work together.
In my current case, I need to build 20K+ URLs so the other issue would be perfect but each of these URLs can be paginated and I cannot know if they are before reading them (the page count is in the answer).
Comment #68
duaelfrBumped version to 5.x and added a way to select the max page from the source in "page" mode.
I feel like we should find a cleaner way to handle these use cases. We might need a new plugin type, don't you think?
Comment #69
pixlkat commentedIn reference to the comment in #58 -- while debugging this, I discovered that if you set the `default_num_items` config variable to anything greater than the default number of rows returned by the API, it will skip the number you specified, so it is possible to skip the difference between your requested size and the default size since the *initial* url is not set with the pager values. I got around this by setting this in my initial URL.
The API we are using pagination with returns the number of rows in the response as part of the JSON. I've updated the patch in 68 to use the `selector` configuration option to return that value and use it to decide if I need to add another URL to the array. I realize this may be somewhat of a snowflake, but it eliminated the need to request the URL to decide if it should be added. Our API will return a 200 whether or not there are any results or even an error in the query.
I would also support the idea of using a callback to enable pager support rather than trying to create a patch that is all things to all people.
Comment #70
prudloff commentedThe patch in #69 was very useful to us!
However, I found two remaining issues :
rel="next"in Link headers, the pager behave strangely, becauseJson::getNextUrls()andHttp::getNextUrls()will both want to add the same pages.item_selectorthat returns some empty items, it will cause the pager to go to the next page even if the current page is not over. The attached patch fixed this issue for us.Comment #71
robin.ingelbrecht commentedI found another use case where this patch crashes. I have multiple URL's setup and each of them has multiple pages.
The following line in DataParserPluginBase::addNextUrls() causes the array keys to be unsequential, which results in a NULL url in DataParserPluginBase::nextSource():
Adding the following fixes the problem:
Comment #72
robin.ingelbrecht commentedComment #73
axel80 commentedHi, thanks for the very useful patch.
I made some test on patch #72 with YouTube Data API v3, which use a cursor for pagination.
Cursors available are stored in the following JSON field:
The cursor are NOT returned if the corresponding pages is not available (i.e. last API call doesn't have "nextPageToken" in the response)
I made a test running the following configuration
If I run drush command, the import is succesfully executed with the Info message [info] Undefined index: nextPageToken Json.php:83
The items not imported are duplicated url that are skipped (unique constraint is not satisfied)
The same test executed from UI is failing after the import of the first 2 items. This is what I get on the page
Comment #74
azussman commentedIs anyone else having issues with paginator. I am trying to map a drupal 7 content type to a drupal 8 content type. I am using Views Datasource to help create a JSON endpoint on my drupal 7 CMS. I've created the endpoint and have successfully mapped the fields into my drupal 8 site. This works great with my test data. But I am looking at a massive migration 300K+ of data, because of that I want to utilize the power of pagination. The JSON endpoint provides this snippet to use:
I tried using this data to map onto the paginator as such:
This returns a warning `A non-numeric value encountered Json.php:255` and actually does return page 1, but it does not fetch page 2. If I try and hardcode the values into all those fields. The migration just stops working all together and just sits idle, returning no error message or feedback of what could be hanging up the request.
Any help or information around this would be greatly appreciated.
Comment #75
ruslan piskarovAlso the following code not always work:
in my case I get the following result with status 200:
Need to check if "item_selector" in my case "items" is empty also make
unset($next_urls[$key]);Comment #76
ruslan piskarovI have updated #72 patch for the case if an API returns 200 status code, but with empty 'item_selector'.
For some reason, I can't create an interdiff on my local.
My updated the following:
Comment #79
segi commentedI have fined a bug when I used patch #76.
I fixed in a new patch, plus I pulled the last changes from original repo to MR.
Comment #80
JeremyFrench commentedI ran into an issue using this patch with a target of a D7 restws site. Trying to pull back terms in a vocabulary ended up with an encoding error on the URL.
https://www.example.com/taxonomy_term?vocabulary=5&page=1In the page, this was being returned in the json as a Unicode code (which may be the issue and nothing to do with this patch).
,"next":"https:\/\/www.example.com\/taxonomy_term?vocabulary=5\u0026page=1",The pager config is like this
Trying to call the page with & in the URL gives a 404.
Comment #81
szloredan commentedI have a case where i need the sourceData parsed from URLs so a getter will be very good to not run the fetch again for same URL. I added a patch for this.
Comment #82
jcandan commentedRE: #74, @azussman:
The paginator pager type is meant to specify the parameters passed in the URL, not the item selector path to read pagination metadata. The example given in #39 shows how URL parameters might be used in pagination, e.g.
?offset=100&limit=50puts you at page three. And the pager configuration example works to iterate through that pagination without the next link supplied as seen in these other pager types.Comment #83
ronaldmulero commentedRe-roll of #81 against the latest 8.x-5.x-dev branch.
Comment #85
roam2345 commented@ronaldmulero thank you kindly for that, we are working with this issue actively over this week to consume an endpoint that requires this. Looking through the patch I don't see any documentation on the required YAML definition to utilize the patch. Looking through and trying some of the options put forward in this thread im still battling to get it to function.
Could that be added to the readme? Or an example be added some where in this module.
Thanks.
Lathan
Comment #86
ronaldmulero commented@lathan See #39 for @jcandan's example of how to use the patch.
Rerolled the patch to fix that failing test:
see reroll_diff_83-86.txt
Only 2 changes since #83:
elseblock replaces the entire source array with NULL, which can't be right.Comment #87
ronaldmulero commentedComment #88
scott_euser commentedThanks everyone for the work on this - very helpful!
When testing this with a D7 Restful Web Services site https://www.drupal.org/project/restws I noticed that the first page does not have a 'prev' and the last page does not have a 'next'. With the keys missing the entire JSON is returned instead for the pagination. The entire JSON is of course not a URL so the migration will error with "URI must be a string or UriInterface"
Updated patch to ensure that getSourceData by selector will only return data for that selector and if not found, then returns NULL instead.
Comment #90
scott_euser commentedUpdated patch to not assume that a selector has been provided.
Comment #91
scott_euser commentedComment #93
scott_euser commentedSorry for the spam, obviously need a bit more coffee this morning!
Comment #94
hmdnawaz commentedI have version 5.1 installed but the patch in #93 is not applying cleanly.
So I have rerolled the patch for that version.
My migration configurations are:
Comment #95
peter törnstrand commentedIf the pager selector contains a path with more then one component the pager URL's returned are wrong.
For example, with a pager selector like
_links/next/hrefthe returned value contains all data under_links.Comment #96
KevinVanRansbeeck commentedRe-roll against latest 8.x-5.x branch
Comment #98
sleopold commentedPrevious patch resulted in an `ArgumentCountError`
Comment #99
apmsooner commentedPatch in #98 works pretty well for me except I get an error on the migrations overview page:
Notice: Undefined index: next in Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json->getSourceData() (line 82 of /var/www/web/modules/composer/migrate_plus/src/Plugin/migrate_plus/data_parser/Json.php)Comment #100
rob230 commentedPatch #98 has a few issues arising from parameters being used instead of a class variable - this was not replaced in all areas.
We also have the issue that when there are not multiple pages, the JSON:API doesn't output a links/next section, so I've added a check for this which means it won't try to load a next URL if the xpath for it isn't found. Pagination has to end so eventually there won't be a next URL. This may also fix your undefined index error apmsooner.
Comment #101
rob230 commentedComment #102
rob230 commentedNot sure why the patch didn't upload before, hopefully this time it works.
Comment #104
samerali commentedThanks for the patch @rob230, any chance we can get this applied to 6.x ? i tried applying it and it would not apply.
Comment #105
apmsooner commentedpatch in #102 didn't work for me applied against 5.x version. It returned no results. #98 is still working for me aside from the visual errors.
Comment #106
apmsooner commentedFYI, if anyone is trying to migrate from wordpress rest apis, i finally worked through a solution for that with example. The pager type is simply "page" and the defaults work.
You may also need to set authentication if the rest apis are restricted on wordpress site.
Comment #107
monymirzaRe-roll against latest 6.0.x dev branch
Comment #109
auth commentedAttaching a modification to the patch in #107 where we use is_numeric instead of is_int to determine if the selector should be used to select depth instead of as an xpath-like selector.
Comment #110
ricovandevin commentedTriggering tests.
Comment #111
ricovandevin commentedComment #112
quadbyte commentedReapplied #109 to latest 6.0.x-dev and improved code for the "paginator" pager type to support non scalar selector.
Comment #113
bramdriesenPatch does not apply
Comment #114
quadbyte commentedFixed patch prefix.
Comment #115
bramdriesenSetting to needs review to trigger tests :)
Comment #117
heddnThis really needs some tests. Any chance we can add some?
Comment #118
slayne40 commentedFor pager type urls, verify if the selector return an url.
Comment #119
kekkisI believe the patches starting from #112 contain an error when using the 'paginator' type and when the $selector_data is scalar. The comparison operator between $num_items and $selector_data has shifted from '==' to '!=' for an unknown reason. How this manifests is that only the first page ever gets processed.
In this new patch I try to explain the thinking in the comment a bit better, plus of course fix the operator.
Leaving as Needs work since there still are no tests.
Comment #120
sassafrass commentedI am testing using the latest patch. In my yaml, I am using the paginator type because the urls provided in the JSON endpoints are not valid for my use case.
The paginated urls are generated as expected. However, it never stops generating urls. Urls being generated are valid but have no records. For example see: https://services.baltimorecountymd.gov/api/hub/pets/pets?status=Adoptabl....
My particular use case has non-scalar $selector_data but the JSON is not in format expected by this segment of code, which always evaluates to true.
Comment #121
jcandan commentedRe-roll #119 for latest 6.0.x-dev.
Comment #123
chuyenlv commentedRe-roll #121 for 6.0.x, Drupal 10 and PHP 8.1
Comment #124
kekkisThe return type of
\Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json::getSourceDatacannot be set toarrayreliably since it might returnNULL(per its own code) or any scalar (fromjson_decode). This is my suggestion how to fix it, even though usingmixedmight do as well.Comment #125
jacobbell84 commentedComment 100 introduced a logic change to the getSourceData function in the Json data parser plugin that I believe wasn't intended.
With that change, it's no longer possible to use the Json data parser without setting an item selector. This is because the 'explode' method will return an array with one empty element if an empty item selector is passed to it, which causes the new check to fail 100% of the time and returns null. I've changed that logic to what's below, which I believe solves that problem and keeps the original intended functionality.
Comment #127
nadavoid commentedIs there a way to use this patch for the paging scenario I have? The things in particular that I'm not sure how to solve:
My JSON source has this structure:
The URL of the next page is the same as the current page, with the change of a URL param:
&pageToken=qwk9I'm thinking that I'll need to update the patch or extend it using custom code, but wanted to check here first. If updating the patch is there an existing pager type that would be best to model this on? These are the pager types that I see recognized in the patch so far:
UPDATE
The solution is
cursorwith this configuration:Comment #128
nadavoid commentedUpdating issue summary.
Comment #129
nadavoid commentedCommenting with what I understand to be the purpose of some of the pager options for the JSON plugin.
selector: The path to the JSON item identifying the next page.
type
urls: selector points to one or more URLs.cursor: selector is just a portion of the next page URL. Set `key` to the URL parameter key that should be passed.page: selector is a current page number, and expected to increment: 1, 2, 3, etc.paginator: not surekey: Parameter key in the URL, will be given a value that identifies the page to request. If key is set to
page, then the resulting generated URL will contain?page=[value-from-current-json]More parameters to document:
size_key
page_key
selector_max
default_num_items
paginator_type
Comment #130
jacobbell84 commentedFixing some incorrect variable names (A couple spots were still using $this->itemSelector instead of $item_selector) and fixing another edge case where the item_selector is blank.
Comment #132
guiu.rocafort.ferrerI updated the issue fork with the latest 6.0.x branch, so it is mergeable now. I also made a few changes so the Gitlab CI is available and runs de tests, and also fixed some of the tests.
Some of the changes i made, which might be arguable, are:
My plan is to next write some tests for the pager functionality, and go from there to fix potential issues and make sure it works ok, before merging into the main branch.
Comment #133
guiu.rocafort.ferrerThe last commit adds test cases for the pager types urls, cursor, and page. The tests for paginator are still missing thought.
I also expanded a little bit the documentation on the pager types here: https://www.drupal.org/docs/8/api/migrate-api/migrate-source-plugins/mig...
Comment #134
guiu.rocafort.ferrerI am changing the status to "Needs review", since i wrote tests for all the pager types, and also updated the documentation to reflect the new functionality.
Documentation: https://www.drupal.org/docs/8/api/migrate-api/migrate-source-plugins/mig...
Comment #135
nadavoid commentedThe issue fork has merge conflicts against the main branch, with the recent update of migrate_plus to 6.0.2. The latest patches also no longer apply.
@guiu.rocafort.ferrer Could you merge the latest into your issue fork and resolve conflicts? I'll be happy to test after that, since I'm actively using this, and will continue to for quite a while.
It looks like the conflict is only in .gitlab-ci.yml
Comment #136
guiu.rocafort.ferrerHi @nadavoid, the merge request should now be good to go.
Comment #137
nadavoid commentedThanks for the quick update @guiu.rocafort.ferrer! I really appreciate it. I've successfully applied the patch from the MR on 6.0.2 and confirmed that the cursor pagination still works in my installation.
The code looks good good to me, and the tests seem comprehensive. So I'm marking this RTBC. Others who have been deeper in this issue are of course also welcome to review. I'm strongly in favor of merging soon, and if adjustments are needed or bugs are found, they can be handled in smaller followup issues.
I'm also uploading the current version of the patch so that people have something stable to use in their builds today.
Comment #138
guiu.rocafort.ferrerComment #139
longwaveI'm using this with the
paginatorpager type and while it works well, it is repeating HTTP requests - and my source is relatively slow to respond, so this is slowing down my migrations.If I add logging to
Http::getResponse()I see that requests are made up to three times for the same URL:The extra calls come from two places. Firstly:
where
parent::getNextUrls($url)calls the Http fetcher to look for Link headers; this probably isn't necessary if a pager is configured?Secondly:
This isn't necessary for my source so it would be good to configure this as well.
I hate to say this at this stage but I wonder if the different pagers need to be their own type of plugin? That would mean they could be more easily extended for different cases. As an example my source also returns a count of total rows in each response which should mean I can automatically calculate the final page number without having to look ahead.
Comment #140
guiu.rocafort.ferrerHi @longwave, I understand your concerns about the performance issues, and the redundant http requests, i do believe too that this is not optimal, and there is room for improve.
This issue have been opened since 2015, and it has been difficult to push it forward and make it to merge with the development branch, so i am worried that addressing those issues would delay even more. I personally have a few sites that make use of the patch for a while now, and i believe some other people might be in the same situation.
So i believe it would be better to merge the issue into development, and then create a follow-up issue to improve the performance situation.
What do the other think about this ?
Comment #141
longwaveThat decision is up to the Migrate Plus maintainers, I think it would be fine to defer to followups given this fixes some immediate issues and both removing the additional requests and refactoring to use plugins could be done separately.
Comment #142
googletorp commentedI've used this in the past and compared to the alternative (which is to fetch everything in one go) this is a major improvement. This is also an opt-in feature, so while making a lot of requests can cause problems, it's something the developers using this can determine if it's a problem or not. From what I understand if you're not using the feature we won't have any performance change.
Maybe it's better to get it in, solve a lot of people's problems and get real life feedback on which performance issues would make sense to address.
I think this is RTBC.
Comment #143
heddnThis is likely one of the longest running requests in the migrate plus queue. Let's land it and make incremental improvements on what we have here.
Comment #145
heddnThank you everyone for sticking with this issue.
Comment #147
heddnComment #148
ressaYes, thanks everyone for landing this feature with great tenacity. This is one of the things which makes so Drupal so great -- while an issue may take some time, many will eventually be resolved, and make Drupal even greater.
Comment #149
init90I've created an issue to address one of the points from #139 - #3466499: Omit an extra request when the pager is used