Currently, batch_limit for drush migrate-import is 1/100 of total items to process. It's fine (or doesn't matter) for sources like SQL but for fairly large file-based (xml in my case) sources it seems to slow down the process with each new batch. First 1-3 batches of 25k items import almost instantly, then it starts to slow down considerably. No big deal for a one-time process but in my case it's a recurrent (daily) migration. The source file isn't large so I think increasing batch_limit would help.
Is there a way other than hacking the code?
Comment | File | Size | Author |
---|---|---|---|
#42 | 3067165-reroll.patch | 16.62 KB | aurelianzaha |
#38 | migrate_tools-batch_size-3067165-38.patch | 16.99 KB | gaards |
#36 | 3067165-36.patch | 60 KB | timohuisman |
#30 | migrate_tools-batch_size-3067165-30.patch | 525 bytes | manuel.adan |
Issue fork migrate_tools-3067165
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
scoff CreditAttribution: scoff commentedSame migration, batch_limit 251, 1000 and 2507 (changed in MigrateBatchExecutable.php)
Comment #3
jlscott CreditAttribution: jlscott commentedI have created a patch which allows a new parameter called "batch_size" to be included in the options array provided to the constructor of the MigrateBatchExecutable class. This value is used to set the size of each batch within the limit for each migrate import.
Comment #4
jlscott CreditAttribution: jlscott commentedComment #5
rob_pr CreditAttribution: rob_pr commentedI re-rolled the patch for the current 4.x-dev and added a few minor fixes.
Since drush migrate:import will use MigrateBatchExecutable soon, I added "--batch-size" as an option.
Comment #6
heddnLet's re-roll this for 5x please and seems there is still some CS issues.
Comment #7
jlscott CreditAttribution: jlscott commentedPatch re-rolled for 5.x branch. Note that this patch also applies to the latest 4.x releases.
Comment #8
jlscott CreditAttribution: jlscott commentedPatch updated to add missing @var tag.
Comment #9
jlscott CreditAttribution: jlscott commentedComment #10
amjad1233@jlscott Looks good to me. Just deployed to one of our projects seems working as expected.
Comment #11
heddnThe great gitlab migration is upon us. See https://gitlab.com/drupalspoons/migrate_tools/-/issues/71. The latest patch from here is posted to https://gitlab.com/drupalspoons/migrate_tools/-/merge_requests/3.
Comment #12
zcht CreditAttribution: zcht commentedIt seems that I would need exactly this patch for my migration :) However, it doesn't seem to work for me when I take the patch from the #8 comment and install it. I'm using the current 8.x-5.x-dev version of the module, under Drupal 8.9.3. I wanted to do the migration with an example size:
drush migrate:import my_taxonomy_migration --batch-size=5
I get the following error:
As a test, I have included in MigrateExecutable.php file -> use Drupal\Component\Utility\Bytes; but that doesn't seem to work either. I get the following error:
Any idea how I could fix it? My taxonomy migrations are very small, but I would like to use this patch for my node migrations, keeping the batch size as small as possible, since the node migration is very large (>60k nodes).
Thanks a lot for the great work :)
Comment #15
rob_pr CreditAttribution: rob_pr commentedI updated the patch from #8 for the current 5.x-dev and started a merge request on drupalcode.org.
Comment #16
heddnI like the direction this is going. Can we add test coverage? Either to drush or the UI for adjusting the batch size?
Comment #17
Marios Anagnostopoulos CreditAttribution: Marios Anagnostopoulos as a volunteer commentedWould it be a good addition to allow for a more global settings? For example in the migration group, which will be overridden per migration if the setting in this patch is filled.
To implement that we would need changes in both the migrate tools & migrate plus modules though, and I have not idea how we should handle this in the issue queue.
I believe it is worth it though, and am willing to work on it if you can guide me on how to patches that are dependent to one another. Any thoughts?
Edit: Maybe we could also consider adding setting in the migration configurations or is it overkill?
Comment #18
Marios Anagnostopoulos CreditAttribution: Marios Anagnostopoulos as a volunteer commentedAdditionally the patch in #8 does not apply for some reason, some line numbers in the patch are off (I am using 5.1.0 of migrate_tools and no other patch for the module, also checked it with the 8.x-5.x branch. Maybe I am doing something wrong, but for anyone facing the same issue, here is reroll i guess.
Comment #20
james.williams CreditAttribution: james.williams at ComputerMinds commentedI've just brought across the work from #2701121: New batch started because of not enough reclaimed memory seems to forget which migrations have already run leading to missing migration dependencies , which expands the scope of this issue a bit, but as @heddn suggested in #2701121-53: New batch started because of not enough reclaimed memory seems to forget which migrations have already run leading to missing migration dependencies , combining forces on these two very related issues should be well worth it.
I've stuck with the 'batch-size' option from this issue, rather than using the 'batch' option from the other one. I've then added a single commit of my own work on top, to allow specifying the 'batch-size' option as either a percentage of the total number of items, or an absolute number of items to process in each iteration. This resolves the 'todo' comment in the existing code, which was placed there in the original issue that added batching in any form (see #2470882-16: Implement running migration processes through the UI), so that the limit can be set usefully for both large and small migrations.
Meanwhile, I'm hiding all the patches here, as the merge request is probably the thing that should be used nowadays. (The most recent patch was simply a re-rolled version that had nothing but trivial differences to the merge request.)
Leaving as 'needs work', as comment 16 sensibly asked for more test coverage.
Note that any potential credit for people from #2701121: New batch started because of not enough reclaimed memory seems to forget which migrations have already run leading to missing migration dependencies should be brought across, should this work get accepted.
Comment #21
Radelson CreditAttribution: Radelson at WebstanZ commentedMigrateBatchExecutable is not compatible with the --sync option.
Not directly related with this issue description but it seems apt to mention it here. The --sync option isn't supported from the UI but the patch is introducing support for the Drush command, which advertise the --sync option.
Maybe we shouldn't support the --sync option when used with the --batch-size option and postpone the work to make it compatible ? The command could fail and warn the user that the flags aren't compatible.
We replaced our existing migrations to use batching and we ended up working around that limitation by running them first without batching and with the --limit option. We sync first and then import in a second step.
Comment #23
timohuismanWe use composer patches to apply patches. Merge request patch urls are updated for every commit on the branch, which makes it a moving target. Attached is a patch containing the current state of the merge request.
Comment #26
andycarlberg CreditAttribution: andycarlberg at Bounteous commentedI rerolled this against the 6.x version and opened a new PR. I'm updating the version on this issue accordingly.
I haven't rerolled against a major version before. If I should have opened a new issue, let me know and I will do so.
Comment #28
grifstuf CreditAttribution: grifstuf at Bounteous commentedTo fix a type error, we had to add a boolean to integer type cast in the MigrateBatchExecutable class. This is because drush brings in CLI option flags as booleans but the class requires integers for the
update
andforce
options as opposed to booleans. This type cast did not happen automatically because the MigrateBatchExecutable class hasstrict_types
enabled for that file.While this fixes the type error encountered when running these batch migrations for now, it is fair to say that the MigrateBatchExecutable class should be able to accept booleans as well as integers. The values for the update and force options are used for their falsiness or truthiness, so semantically the class should accept booleans as well as integers for these options. It might be worth having another issue to fix this tech debt for anyone else wishing to interface with this class in the future.
Comment #29
timohuismanThis patch contains the current state of the merge request.
Comment #30
manuel.adanThe new option might take some time to be fully implement. A "security" max limit could be useful in the meantime.
Comment #31
naveenvalechaAssigning myself to review & test the PR.
Comment #33
jardasmahel CreditAttribution: jardasmahel at Morpht for Morpht commentedThe tests are passing on my local however they are failing in CI.
Can someone share some wisdom on it?
lando phpunit --group=migrate_tools
PHPUnit 8.5.31 by Sebastian Bergmann and contributors.
Testing
............................
......
.. 36 / 36 (100%)
Time: 4.84 minutes, Memory: 353.00 MB
OK (36 tests, 328 assertions)
HTML output was generated
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
https://drupal-contributions.lndo.site/sites/simpletest/browser_output/D...
Comment #35
naveenvalechaPR - https://git.drupalcode.org/project/migrate_tools/-/merge_requests/19
Looks good to me.
Comment #36
timohuismanThis patch contains the current state of the merge request. See "Patches from drupal.org merge request URLs are dangerous?" for more information.
Comment #37
gaards CreditAttribution: gaards as a volunteer and at Yle - Finnish Broadcasting Company commentedThe latest patch provided here didn't apply, so I created a new one from the latest diff of merge request 19.
Comment #38
gaards CreditAttribution: gaards as a volunteer and at Yle - Finnish Broadcasting Company commentedUpdated the patch with a small fix to the option name which was incorrect, causing the batch size to not work (thanks to olli for spotting it).
Comment #39
thisisalistairsaccount CreditAttribution: thisisalistairsaccount commentedHoping if things are all good here that we could get this into a future release for the module? This would be beneficial for some people we're working with and would love to see it included :)
Comment #40
Kfootm70 CreditAttribution: Kfootm70 commentedMuch like the above comment, is there a date of release for this module please? Thanks team!
Comment #41
naveenvalechaforgot to unassign from myself
Comment #42
aurelianzaha CreditAttribution: aurelianzaha at jobiqo - job board technology commentedattached is the patched re-rolled against 6.0.x
Comment #43
Kristen PolI'm ignoring the patches (though I understand why some people are adding them).
We will use the MR. Since there are 2 open. we will use the more recent reroll.
But, the latest MR (19) needs a merge conflict resolved, so moving to needs work:
https://git.drupalcode.org/project/migrate_tools/-/merge_requests/19/con...
Comment #44
Kristen PolBack to needs review.
Comment #45
Kristen PolI've compared MRs 7 and 19 to see what has changed other than coding standards and formatting and these are the things I found.
The description is substantially shorter in the latest MR, but then that makes it much easier to read. Perhaps the extra details could be moved into the README or some other documentation.
Comment #46
Kristen PolI got confirmation from @heddn via Slack that these changes above look okay so this can move forward with a round of testing for the *new* MR. Once that happens and this is RTBC with the new MR, then we can see if @heddn or one of the other comaintainers can get this in.
Comment #47
emixaam CreditAttribution: emixaam commentedSucessfully tested MR 19 on a large custom migration (42000 nodes) with --batch-size of 1000 and 2000, with and without the --update flag, in a local environment and on a dev server. migrate-import correctly processes the given amount of items, and the duration for each batch stays similar.
Comment #48
Kristen PolHmm... that's disappointing. Anyone else able to test?