Problem/Motivation

Large migrations can be problematic, from both a query standpoint and a processing standpoint. Adding the ability to batch process a migration would be helpful. With batch support, the ability to limit migrations would be a much easier task as well. See #2282013: Add a row limit option to MigrateExecutable.

Proposed resolution

Add batching support to the core migrate module.

Remaining tasks

All the tasks.

User interface changes

Possibly the ability to limit migrations from the UI.

API changes

TBD

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

chx’s picture

I think #2282013: Add a row limit option to MigrateExecutable will be completely swallowed by this. Specify a batch size, run one batch, done.

mikeryan’s picture

Title: Add Batching Support » Add query batching to SqlBase

Batching for the UI, support for row limits, etc. is all being handled now by the contrib tools. The one element here we don't have in D8 is batching of large SQL queries - for very large migrations, the migration's base query may simply overrun memory limits/buffer sizes/etc. This functionality was added to D7 migrate in #2296187: Add batched query support to MigrateSQLSource, we should look at porting it forward (not from the patch there but from the current implementation, there ended up being a couple significant bugs in that patch).

mikeryan’s picture

mikeryan’s picture

Priority: Normal » Major
mikeryan’s picture

Status: Active » Needs review
Issue tags: +Needs tests
FileSize
5 KB

Here's a straight-forward port of the D7 functionality. Would be great if someone with a huge volume of content could manually test with this path.

mikeryan’s picture

And then, thinking about how to test this (which requires setting a batch_size option in the source plugin configuration), given the lack of tools for configuring migrations... The batching is disabled by default, but maybe we should enable it with some fair-sized value like, I don't know, 50000?

benjy’s picture

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php
    @@ -284,7 +284,7 @@ public function next() {
    +      $this->getNextRow();
    
    @@ -318,6 +318,13 @@ public function next() {
    +  protected function getNextRow() {
    
    +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -225,10 +249,41 @@ protected function initializeIterator() {
    +  protected function getNextRow() {
    ...
    +  protected function getNextBatch() {
    

    It seems strange to prefix these with get when they don't actually return anything? Maybe it should just be nextRow() or prepareNextRow(), which then is kind of close to prepareRow(), open to suggestions?

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -32,6 +32,11 @@
    +  protected $batchQuery;
    

    Needs a class comment, can you explain how the batch query works here in relation to the normal query? E.g., explain we clone it upfront and then clone it for each new batch query.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -157,6 +180,7 @@ protected function prepareQuery() {
    +    $this->batch = 0;
    

    Un-needed, already initialised on the class itself.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -225,10 +249,41 @@ protected function initializeIterator() {
    +      $this->getNextBatch();
    +      $this->getIterator()->next();
    ...
    +    $this->result = $query->execute();
    

    I'm not sure how this actually works. Where is $this->result initialised? And how is it used.

catch’s picture

50k for a default is good.

mikeryan’s picture

Status: Needs review » Needs work

It seems strange to prefix these with get when they don't actually return anything? Maybe it should just be nextRow() or prepareNextRow(), which then is kind of close to prepareRow(), open to suggestions?

s/get/fetch/ ?

Needs a class comment, can you explain how the batch query works here in relation to the normal query? E.g., explain we clone it upfront and then clone it for each new batch query.

As I recall, we cloned it in D7 so that our alterations didn't affect other usages of the main query (like retrieving counts). I'll review and document it more clearly.

Un-needed, already initialised on the class itself.

Hmm, could initializeIterator() be called more than once (perhaps in tests)? It is redundant, but maybe it's the initialization on the member that should be removed...

I'm not sure how this actually works. Where is $this->result initialised? And how is it used.

Umm... A product of blind copying, $this->result meant something in D7, but here is not defined or used so this is doing nothing useful. Which means this isn't nearly as simple as I initially thought, given that the Sql plugin is based on an IteratorIterator over the initial query...

chx’s picture

Issue tags: +SprintWeekend2016
quietone’s picture

Using this patch and setting batchsize to 2, I get WSOD when navigating to /upgrade. Drush migrate-upgrate .. configure only gives:

PHP Fatal error:  __clone method called on non-object in /opt/sites/md8/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php on line 286
PHP Stack trace:

It is failing in one of the FieldInstancePer files, which has an initializeIterator method. There are about 26 other migrate files with an initializeIterator method. Can someone explain how these are supposed interact (or not) with the batch aware initializeIterator in SqlBase?

quietone’s picture

+++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
@@ -225,10 +249,41 @@ protected function initializeIterator() {
+  protected function getNextBatch() {
+    $this->batch++;
+    $query = clone $this->batchQuery;
+    $query->range($this->batch * $this->batchSize, $this->batchSize);
+    $this->result = $query->execute();
+  }

line 286 is $query = clone $this->batchQuery.

quietone’s picture

chx kindly pointed out the obvious, that is, to simply check if the clone exists using an isset.

$this->batch++;
if (isset($this->batchQuery)) {
   $query = clone $this->batchQuery;
   $query->range($this->batch * $this->batchSize, $this->batchSize);
 }

This lets the configuration load. But there is still more work to do. I got 28 migration failures when migrating a small D6 test site.

quietone’s picture

Well, this works locally. It could be way off as I haven't worked with iterators and queries like this before.

Regarding benjy's comments in #7; 1) the method names have been changed from get to fetch, 2-4) this patch is quite different, so they don't really apply anymore.

This needed a reroll, so there is no interdiff.

quietone’s picture

Status: Needs work » Needs review

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

mikeryan’s picture

Version: 8.1.x-dev » 8.2.x-dev

Status: Needs review » Needs work

The last submitted patch, 14: 2309695-14.patch, failed testing.

chx’s picture

Why is this not using PagerSelectExtender?

quietone’s picture

This isn't using PagerSelectExtender because I didn't know about it. But from reading and looking at existing usages it appears to be for UI use. Is that right? How would that work here?

chx’s picture

Oh it's still bound to globals? Nevermind me.

quietone’s picture

chx, np, always good to learn something.
Here's a reroll. I tested with MigrateNodeTypeTest with a batch_size of 2, and it passed that test. But, I guess this still needs tests.

quietone’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 22: 2309695-22.patch, failed testing.

quietone’s picture

quietone’s picture

Status: Needs work » Needs review

Tests passed so setting to Needs Review.

quietone’s picture

Status: Needs review » Needs work

Since this needs tests changing to needs work.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

quietone’s picture

Assigned: Unassigned » quietone
Issue tags: +Needs reroll

Hope to post the reroll later today. Not sure how to write the tests, any pointers appreciated.

quietone’s picture

Status: Needs work » Needs review
FileSize
10.45 KB

Reroll.

mikeryan’s picture

Issue tags: -Needs reroll
quietone’s picture

The batch_size configuration parameter is now checked to make sure it is an integer >=0. If it is invalid, a MigrateException is thrown in InitializeIterator, as suggested by mikeryan.

A test has been added. The test is a basically a copy/paste from MigrateSourceTestBase and MigrateSqlSourceTestBase with additional code to test that the batch size is set correctly and the final batch count is correct. The tests are run with a source table of 200 rows and batch sizes of 0, 20, 30, 200, and 300. I think that covers the possible cases. Although there is only one test where there is a remainder and an extra batch run is needed. I think that a smaller number of rows is equally valid, but on IRC phenaproxima suggest a few hundred, thus it 200.

Status: Needs review » Needs work

The last submitted patch, 32: 2309695-32.patch, failed testing.

quietone’s picture

Status: Needs work » Needs review
FileSize
21.66 KB
674 bytes

RTFM, again. "(PHP 7) - intdiv — Integer division"

mikeryan’s picture

Assigned: quietone » mikeryan
mikeryan’s picture

Assigned: mikeryan » Unassigned
Status: Needs review » Needs work

We're close here, just one non-nit (first item below).

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -43,6 +49,28 @@
    +  protected $batchSize = NULL;
    
    @@ -160,68 +188,113 @@ protected function prepareQuery() {
    +    if (is_null($this->batchSize)) {
    +      // Set the default batch size to 0..
    +      $this->batchSize = 0;
    

    Why not just have $batchSize = 0, and skip directly to the batch_size check?

  2. +++ b/core/modules/migrate/tests/src/Kernel/QueryBatchTest.php
    @@ -0,0 +1,265 @@
    +   * Tests a negative batch size is set to zero.
    

    Actually, tests that it throws an exception.

  3. +++ b/core/modules/migrate/tests/src/Kernel/QueryBatchTest.php
    @@ -0,0 +1,265 @@
    +   * Tests a non integer batch size is set to zero.
    

    Ditto.

  4. +++ b/core/modules/migrate/tests/src/Kernel/QueryBatchTest.php
    @@ -0,0 +1,265 @@
    +
    

    No blank line here.

quietone’s picture

Status: Needs work » Needs review
FileSize
21.59 KB
2.08 KB

Thx, mikeryan. 1) I toyed briefly with the idea of setting batchSize to 0 on errors, and didn't clean up when I realized that didn't make sense. 2,3,4) All fixed. Also corrected a comment in QueryBatchTest

mikeryan’s picture

Status: Needs review » Reviewed & tested by the community

All-righty, thanks!

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -29,6 +30,11 @@
       /**
    +   * @var \Drupal\Core\Database\Query\SelectInterface
    +   */
    +  protected $batchQuery;
    

    Needs a description.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -43,6 +49,28 @@
       /**
    +   * Current data batch.
    

    It'd be nice if this was a bit more descriptive.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -43,6 +49,28 @@
    +  /**
    +   * Batching has run once.
    +   *
    +   * @var bool
    +   */
    +  protected $batchActive = FALSE;
    

    Why do we need this as well as $batchSize?

  4. +++ b/core/modules/migrate/tests/src/Kernel/QueryBatchTest.php
    @@ -0,0 +1,264 @@
    +    $test_parameters = [
    +      // Test when batch size is 0.
    +      [200, 0],
    +      // Test when rows mod batch size is 0.
    +      [200, 20],
    +      // Test when rows mod batch size is > 0.
    +      [200, 30],
    +      // Test when batch size = row count.
    +      [200, 200],
    +      // Test when batch size > row count.
    +      [200, 300],
    +    ];
    

    Really nice to see this comprehensive test coverage.

  5. FILE: .../drupal/core/modules/migrate/tests/src/Kernel/QueryBatchTest.php
    ----------------------------------------------------------------------
    FOUND 8 ERRORS AND 1 WARNING AFFECTING 9 LINES
    ----------------------------------------------------------------------
      72 | ERROR   | [x] Additional blank lines found at end of doc
         |         |     comment
      83 | WARNING | [ ] Line exceeds 80 characters; contains 81
         |         |     characters
     142 | ERROR   | [x] Expected "int" but found "integer" for parameter
         |         |     type
     146 | ERROR   | [x] Expected "int" but found "integer" for parameter
         |         |     type
     148 | ERROR   | [x] Expected "int" but found "integer" for parameter
         |         |     type
     175 | ERROR   | [x] Line indented incorrectly; expected 8 spaces,
         |         |     found 10
     182 | ERROR   | [x] Inline comments must end in full-stops,
         |         |     exclamation marks, colons, question marks, or
         |         |     closing parentheses
     263 | ERROR   | [x] Expected 1 blank line after function; 0 found
     264 | ERROR   | [x] The closing brace for the class must have an
         |         |     empty line before it
    ----------------------------------------------------------------------
    PHPCBF CAN FIX THE 8 MARKED SNIFF VIOLATIONS AUTOMATICALLY
    ----------------------------------------------------------------------
    

    Some coding standards things from the new test.

quietone’s picture

Status: Needs work » Needs review
FileSize
21.25 KB
4.69 KB

1. This is no longer used and is deleted.
2. Comment improved.
3. Methinks you are right, it is not needed. It is removed and batchSize used in the test.
4. Thanks.
5. Fixed.

mikeryan’s picture

Status: Needs review » Reviewed & tested by the community
+++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
@@ -279,9 +267,9 @@
+    if (!$this->getIterator()->valid() && ($this->batchSize >0)) {

Space between > and 0 (can be fixed on commit).

quietone’s picture

Sigh, I missed one. @mikeryan, thx.

alexpott’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
3.16 KB
21.18 KB
  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -160,68 +177,110 @@ protected function prepareQuery() {
    +    if ($this->batchSize == 0) {
    +      if (isset($this->configuration['batch_size'])) {
    

    These two ifs can be merged so less indentation.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -160,68 +177,110 @@ protected function prepareQuery() {
    +    if (!$this->getIterator()->valid() && ($this->batchSize >0)) {
    

    This should be
    if ($this->batchSize > 0 && !$this->getIterator()->valid()) {
    Doing the cheap check first and less brackets.

  3. +++ b/core/modules/migrate/tests/src/Kernel/QueryBatchTest.php
    @@ -0,0 +1,264 @@
    +   * @expectedException \Drupal\migrate\MigrateException
    +   * @expectedExceptionMessage batch_size must be greater than or equal to zero
    ...
    +   * @expectedException \Drupal\migrate\MigrateException
    +   * @expectedExceptionMessage batch_size must be greater than or equal to zero
    

    We now do $this->setExpectedException() in the test... see https://thephp.cc/news/2016/02/questioning-phpunit-best-practices for why.

Patch attached does all of this.

mikeryan’s picture

Assigned: Unassigned » mikeryan
mikeryan’s picture

@alexpott:

We now do $this->setExpectedException() in the test... see https://thephp.cc/news/2016/02/questioning-phpunit-best-practices for why.

That post says:

There is nothing that you can do with these new methods that you could not have done with the old setExpectedException() method. However, because setExpectedException() can do everything these new methods can its API was convoluted and inconvenient. Separation of concerns, anyone? The setExpectedException() method has been deprecated and will be removed in PHPUnit 6.

So, shouldn't we be using expectException() and related methods here?

Is the change in best practice for Drupal documented anywhere?

Thanks.

mikeryan’s picture

Assigned: mikeryan » Unassigned
Status: Needs review » Reviewed & tested by the community

So, shouldn't we be using expectException() and related methods here?

I answered my own question - we're on phpunit 4.8, which didn't have expectException()...

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed bf02c1b to 8.3.x and ebe5340 to 8.2.x. Thanks!

Committed to 8.2.x because migrate is still experimental and this is only adding stuff rather than changing.

  • alexpott committed bf02c1b on 8.3.x
    Issue #2309695 by quietone, alexpott, mikeryan, benjy: Add query...

  • alexpott committed ebe5340 on 8.2.x
    Issue #2309695 by quietone, alexpott, mikeryan, benjy: Add query...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.