In a high-volume migration into an SQL database, one of the biggest performance boosts you can achieve is by inserting multiple rows in a single statement. This isn't always practical - certainly not when using a Drupal entity destination, where any number of hooks/events are expecting to handle one entity at a time on save - but if you can do it (as in our case), it's a big performance win.

Proposal:

  1. Add a batch_size option to the table destination plugin.
  2. If batch_size is greater than one, save rows in the plugin rather than writing to the DB (returning as if they had been written), flushing them when the number of saved rows reaches the batch size, or a POST_IMPORT event is received.
  3. If the destination is auto-increment, generate new IDs in the plugin rather than the DB, returning them from the plugin.

This feature of course will come with caveats:

  1. A fatal error before a batch-in-progress is flushed will lead to the ID map referencing destination IDs which don't actually exist.
  2. In the auto-increment case, any other user/process writing to the DB during a migration will risk creating collisions.

A pre-requisite for this feature is #2949271: Add feature id_mapping for destination plugin table. - I will add tests there and try to get that over the finish line.

Note that I have implemented this on our project as a bulk_table plugin extending table, but it makes more sense to integrate the functionality. It'll take me a day or two to do that.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mikeryan created an issue. See original summary.

mikeryan’s picture

Issue summary: View changes

Updating attribution.

mikeryan’s picture

Status: Active » Needs review
FileSize
14.14 KB

This incorporates the patch for #2949371 - when that goes in, I'll reroll this.

mikeryan’s picture

Status: Needs review » Needs work

Just uncovered a problem - if a stub is needed and the migration that creates the stub uses the batching, the stub doesn't get created until... a "batch_size" number of stubs are "created"? Which is too late - we don't want to batch on stub creation, but I'm having trouble figuring out how the table plugin knows it's being used in a stub situation.

mikeryan’s picture

Version: 8.x-4.x-dev » 8.x-5.x-dev
Status: Needs work » Needs review
FileSize
14.38 KB

Ah - we don't have to explicitly check for stubbing, the plugin is created locally in MigrationLookup and thus destructed on return, so we can flush in __destruct() (which is a good idea anyway).

heddn’s picture

Status: Needs review » Needs work
Issue tags: +Needs reroll

There's a fair number of changes in Table::import and the patch doesn't apply right now. So I'm going to wait for a more exhaustive review for later. For now, just a few nits.

  1. +++ b/src/Plugin/migrate/destination/Table.php
    @@ -110,23 +134,79 @@ class Table extends DestinationBase implements ContainerFactoryPluginInterface {
    +        if (count($this->rowsToInsert) == 0) {
    

    ===

  2. +++ b/src/Plugin/migrate/destination/Table.php
    @@ -140,4 +220,43 @@ class Table extends DestinationBase implements ContainerFactoryPluginInterface {
    +  ¶
    

    Nit: whitespace.

mikeryan’s picture

Assigned: Unassigned » mikeryan
mikeryan’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
FileSize
14.74 KB

Well, I'm diffing this directly against 8.x-5.x, but composer keeps insisting it can't apply the patch when requiring "drupal/migrate_plus": "^5.0@dev". Let's see how the testbot likes it...

mikeryan’s picture

Status: Needs review » Needs work

Duh, I was sure I had pulled 8.x-5.x freshly but apparently not...

mikeryan’s picture

Status: Needs work » Needs review
FileSize
12.83 KB

Let's try that again...

heddn’s picture

A single question.

  1. +++ b/src/Plugin/migrate/destination/Table.php
    @@ -159,4 +219,43 @@ class Table extends DestinationBase implements ContainerFactoryPluginInterface {
    +  public function preImport(MigrateImportEvent $event) {
    +  }
    

    This doesn't seem necessary.

  2. +++ b/src/Plugin/migrate/destination/Table.php
    @@ -159,4 +219,43 @@ class Table extends DestinationBase implements ContainerFactoryPluginInterface {
    +  public function postImport(MigrateImportEvent $event) {
    +    // At the conclusion of a given migration, make sure batched inserts go in.
    +    $this->flushInserts();
    ...
    +  public function __destruct() {
    +    // At the conclusion of a given migration, make sure batched inserts go in.
    +    $this->flushInserts();
    

    Do we need to make sure we don't try to call flushInserts twice?

mikeryan’s picture

This doesn't seem necessary.

Tell that to ImportAwareInterface!

Do we need to make sure we don't try to call flushInserts twice?

The __destruct() call will indeed be redundant most of the time - but when it is unnecessary, the batch will have been cleared by the postImport() call, so it won't do anything.

heddn’s picture

Status: Needs review » Reviewed & tested by the community

The last thing I can think of is doxygen on the new batch_size config option. But we don't have any comments. So let's bump that to a follow-up and land this.

heddn’s picture

  • heddn committed 1c1bcc4 on 8.x-5.x authored by mikeryan
    Issue #3090853 by mikeryan, heddn: Option for bulk inserting of rows...
heddn’s picture

Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.