Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
Large migrations can be problematic, from both a query standpoint and a processing standpoint. Adding the ability to batch process a migration would be helpful. With batch support, the ability to limit migrations would be a much easier task as well. See #2282013: Add a row limit option to MigrateExecutable.
Proposed resolution
Add batching support to the core migrate module.
Remaining tasks
All the tasks.
User interface changes
Possibly the ability to limit migrations from the UI.
API changes
TBD
Comment | File | Size | Author |
---|---|---|---|
#43 | 2309695-43.patch | 21.18 KB | alexpott |
#43 | 40-43-interdiff.txt | 3.16 KB | alexpott |
#40 | interdiff.txt | 4.69 KB | quietone |
#40 | 2309695-40.patch | 21.25 KB | quietone |
#37 | interdiff.txt | 2.08 KB | quietone |
Comments
Comment #1
chx CreditAttribution: chx commentedI think #2282013: Add a row limit option to MigrateExecutable will be completely swallowed by this. Specify a batch size, run one batch, done.
Comment #2
mikeryanBatching for the UI, support for row limits, etc. is all being handled now by the contrib tools. The one element here we don't have in D8 is batching of large SQL queries - for very large migrations, the migration's base query may simply overrun memory limits/buffer sizes/etc. This functionality was added to D7 migrate in #2296187: Add batched query support to MigrateSQLSource, we should look at porting it forward (not from the patch there but from the current implementation, there ended up being a couple significant bugs in that patch).
Comment #3
mikeryanLooks like an instance of this issue may have been spotted in the wild: #2625366: Upgrading D6 to D8 will crash with lots of nodes
Comment #4
mikeryanComment #5
mikeryanHere's a straight-forward port of the D7 functionality. Would be great if someone with a huge volume of content could manually test with this path.
Comment #6
mikeryanAnd then, thinking about how to test this (which requires setting a batch_size option in the source plugin configuration), given the lack of tools for configuring migrations... The batching is disabled by default, but maybe we should enable it with some fair-sized value like, I don't know, 50000?
Comment #7
benjy CreditAttribution: benjy at PreviousNext commentedIt seems strange to prefix these with get when they don't actually return anything? Maybe it should just be nextRow() or prepareNextRow(), which then is kind of close to prepareRow(), open to suggestions?
Needs a class comment, can you explain how the batch query works here in relation to the normal query? E.g., explain we clone it upfront and then clone it for each new batch query.
Un-needed, already initialised on the class itself.
I'm not sure how this actually works. Where is $this->result initialised? And how is it used.
Comment #8
catch50k for a default is good.
Comment #9
mikeryans/get/fetch/ ?
As I recall, we cloned it in D7 so that our alterations didn't affect other usages of the main query (like retrieving counts). I'll review and document it more clearly.
Hmm, could initializeIterator() be called more than once (perhaps in tests)? It is redundant, but maybe it's the initialization on the member that should be removed...
Umm... A product of blind copying, $this->result meant something in D7, but here is not defined or used so this is doing nothing useful. Which means this isn't nearly as simple as I initially thought, given that the Sql plugin is based on an IteratorIterator over the initial query...
Comment #10
chx CreditAttribution: chx commentedComment #11
quietone CreditAttribution: quietone commentedUsing this patch and setting batchsize to 2, I get WSOD when navigating to /upgrade. Drush migrate-upgrate .. configure only gives:
It is failing in one of the FieldInstancePer files, which has an initializeIterator method. There are about 26 other migrate files with an initializeIterator method. Can someone explain how these are supposed interact (or not) with the batch aware initializeIterator in SqlBase?
Comment #12
quietone CreditAttribution: quietone commentedline 286 is $query = clone $this->batchQuery.
Comment #13
quietone CreditAttribution: quietone commentedchx kindly pointed out the obvious, that is, to simply check if the clone exists using an isset.
This lets the configuration load. But there is still more work to do. I got 28 migration failures when migrating a small D6 test site.
Comment #14
quietone CreditAttribution: quietone as a volunteer commentedWell, this works locally. It could be way off as I haven't worked with iterators and queries like this before.
Regarding benjy's comments in #7; 1) the method names have been changed from get to fetch, 2-4) this patch is quite different, so they don't really apply anymore.
This needed a reroll, so there is no interdiff.
Comment #15
quietone CreditAttribution: quietone as a volunteer commentedComment #17
mikeryanComment #19
chx CreditAttribution: chx at Smartsheet commentedWhy is this not using PagerSelectExtender?
Comment #20
quietone CreditAttribution: quietone as a volunteer commentedThis isn't using PagerSelectExtender because I didn't know about it. But from reading and looking at existing usages it appears to be for UI use. Is that right? How would that work here?
Comment #21
chx CreditAttribution: chx at Smartsheet commentedOh it's still bound to globals? Nevermind me.
Comment #22
quietone CreditAttribution: quietone as a volunteer commentedchx, np, always good to learn something.
Here's a reroll. I tested with MigrateNodeTypeTest with a batch_size of 2, and it passed that test. But, I guess this still needs tests.
Comment #23
quietone CreditAttribution: quietone as a volunteer commentedComment #25
quietone CreditAttribution: quietone as a volunteer commentedFailure could be #2724871: Random failure in \Drupal\migrate_drupal_ui\Tests\d7\MigrateUpgrade7Test, so retesting.
Comment #26
quietone CreditAttribution: quietone as a volunteer commentedTests passed so setting to Needs Review.
Comment #27
quietone CreditAttribution: quietone as a volunteer commentedSince this needs tests changing to needs work.
Comment #29
quietone CreditAttribution: quietone as a volunteer commentedHope to post the reroll later today. Not sure how to write the tests, any pointers appreciated.
Comment #30
quietone CreditAttribution: quietone as a volunteer commentedReroll.
Comment #31
mikeryanComment #32
quietone CreditAttribution: quietone as a volunteer commentedThe batch_size configuration parameter is now checked to make sure it is an integer >=0. If it is invalid, a MigrateException is thrown in InitializeIterator, as suggested by mikeryan.
A test has been added. The test is a basically a copy/paste from MigrateSourceTestBase and MigrateSqlSourceTestBase with additional code to test that the batch size is set correctly and the final batch count is correct. The tests are run with a source table of 200 rows and batch sizes of 0, 20, 30, 200, and 300. I think that covers the possible cases. Although there is only one test where there is a remainder and an extra batch run is needed. I think that a smaller number of rows is equally valid, but on IRC phenaproxima suggest a few hundred, thus it 200.
Comment #34
quietone CreditAttribution: quietone as a volunteer commentedRTFM, again. "(PHP 7) - intdiv — Integer division"
Comment #35
mikeryanComment #36
mikeryanWe're close here, just one non-nit (first item below).
Why not just have
$batchSize = 0
, and skip directly to the batch_size check?Actually, tests that it throws an exception.
Ditto.
No blank line here.
Comment #37
quietone CreditAttribution: quietone as a volunteer commentedThx, mikeryan. 1) I toyed briefly with the idea of setting batchSize to 0 on errors, and didn't clean up when I realized that didn't make sense. 2,3,4) All fixed. Also corrected a comment in QueryBatchTest
Comment #38
mikeryanAll-righty, thanks!
Comment #39
alexpottNeeds a description.
It'd be nice if this was a bit more descriptive.
Why do we need this as well as $batchSize?
Really nice to see this comprehensive test coverage.
Some coding standards things from the new test.
Comment #40
quietone CreditAttribution: quietone as a volunteer commented1. This is no longer used and is deleted.
2. Comment improved.
3. Methinks you are right, it is not needed. It is removed and batchSize used in the test.
4. Thanks.
5. Fixed.
Comment #41
mikeryanSpace between > and 0 (can be fixed on commit).
Comment #42
quietone CreditAttribution: quietone as a volunteer commentedSigh, I missed one. @mikeryan, thx.
Comment #43
alexpottThese two ifs can be merged so less indentation.
This should be
if ($this->batchSize > 0 && !$this->getIterator()->valid()) {
Doing the cheap check first and less brackets.
We now do
$this->setExpectedException()
in the test... see https://thephp.cc/news/2016/02/questioning-phpunit-best-practices for why.Patch attached does all of this.
Comment #44
mikeryanComment #45
mikeryan@alexpott:
That post says:
So, shouldn't we be using expectException() and related methods here?
Is the change in best practice for Drupal documented anywhere?
Thanks.
Comment #46
mikeryanI answered my own question - we're on phpunit 4.8, which didn't have expectException()...
Comment #47
alexpottCommitted and pushed bf02c1b to 8.3.x and ebe5340 to 8.2.x. Thanks!
Committed to 8.2.x because migrate is still experimental and this is only adding stuff rather than changing.