I have a fairly basic D6->D7 migration set up using migrate_d2d, and I'm having an issue with my node migrations where the migration runs twice. It completes the migration once, stops (or does something else, I can't tell what), and then re-runs with the same nodes, updating them. The final output from the drush mi command is

Processed 4 (2 created, 2 updated, 0 failed, 0 ignored) in 98.7 sec (2/min) - done with 'CNSArticle'           

when I've run it for two nodes using the --idlist parameter. It didn't do this with the user, taxonomy, and file migrations, and the node migrations are fairly straightforward, with no additions to or overrides of the query.

Here's the code to set up the migrations:

/**
 * Register all D6->D7 migrations.
 */
function cns_migrate_register_migrations() {
  // Define arguments used for all migrations.
  $common_arguments = array(
    'source_connection' => 'legacy',
    'source_version' => 6,
    'format_mappings' => array(
      1 => 'filtered_html',
      2 => 'full_html',
      3 => 'full_html',
      4 => 'full_html',
    ),
  );

  $node_arguments = array(
    array(
      'class_name' => 'CNSArticleMigration',
      'description' => t('Migration of CNS article nodes from Drupal 6'),
      'machine_name' => 'CNSArticle',
      'source_type' => 'article',
      'destination_type' => 'article',
      'dependencies' => array('Category', 'Tags'),
    ),
    array(
      'class_name' => 'APArticleMigration',
      'description' => t('Migration of AP article nodes from Drupal 6'),
      'machine_name' => 'APArticle',
      'source_type' => 'articleap',
      'destination_type' => 'articleap',
      'dependencies' => array('Category', 'Tags'),
    ),
  );
  // Tell the node migrations where the users are coming from, so they can
  // set up the dependency and resolve D6->D7 uids.
  $common_node_arguments = $common_arguments + array(
    'user_migration' => 'CNSUser',
    'group_name' => 'node',
  );
  foreach ($node_arguments as $arguments) {
    $arguments = array_merge_recursive($arguments, $common_node_arguments);
    MigrationBase::registerMigration($arguments['class_name'], $arguments['machine_name'],
      $arguments);
  }
} 

And the classes themselves:

class ArticleMigration extends DrupalNode6Migration {
  public function __construct(array $arguments) {
    parent::__construct($arguments);

    // Mappings for taxonomy terms. migrate_d2d exposes D6 term reference
    // source fields keyed by vocabulary ID: here vid 1 feeds
    // field_category and vid 2 feeds field_tags.
    $this->addFieldMapping('field_category', 1)
      ->sourceMigration('Category');
    $this->addFieldMapping('field_category:source_type')->defaultValue('tid');
    $this->addFieldMapping('field_tags', 2)
      ->sourceMigration('Tags');
    $this->addFieldMapping('field_tags:source_type')->defaultValue('tid');
    // Images multifield
    $this->addFieldMapping('field_images:field_image', 'field_image')
      ->sourceMigration('CNSFile');
    $this->addFieldMapping('field_images:field_image:file_class')
      ->defaultValue('MigrateFileFid');
    $this->addFieldMapping('field_images:field_image:preserve_files')
      ->defaultValue(TRUE);
  }

  public function prepare($entity, stdClass $row) {
    // Add source nid to Disqus field so that comments are mapped to new nodes.
    // Uses patch from https://www.drupal.org/node/1175668#comment-7847119.
    // nid 77900 is when Disqus comments were first used.
    if ($row->nid > 77900) {
      $entity->disqus = array(
        'domain' => variable_get('disqus_domain', ''),
        'url' => $entity->path['alias'],
        'title' => $entity->title,
        'did' => NULL,
        'developer' => 1,
      );

      $entity->disqus_identifier_auto = FALSE;
      $entity->disqus_status = TRUE;
      $entity->disqus_identifier = 'node/' . $row->nid;
    }
    else {
      // Pre Disqus, so we disable Disqus on the node.
      $entity->disqus_status = FALSE;
      $entity->disqus_identifier_auto = FALSE;
    }
  }
}

class CNSArticleMigration extends ArticleMigration {
  public function __construct(array $arguments) {
    parent::__construct($arguments);

    // Author
    $this->addFieldMapping('field_author', 'field_source')
      ->sourceMigration('CNSUser');
  }
}

class APArticleMigration extends ArticleMigration {
  public function __construct(array $arguments) {
    parent::__construct($arguments);

    // Author
    $this->addFieldMapping('field_author_ap', 'field_apauthor');
    // AP tracking fields.
    $this->addSimpleMappings(array('field_ap_mgmt_id', 'field_ap_mgmt_seq_num'));
  }
}

This also happens if I use the CNSArticleMigration and APArticleMigration classes directly instead of deriving them from the ArticleMigration base class. I also thought the multifield migrate handler (something I'm having a separate issue with) might be the cause, but removing it didn't make this problem go away.

I've googled around and haven't been able to find a similar issue. Anyone know what could be causing this? I'm not sure whether it's particular to migrate_d2d or to migrate itself, so I'm starting here.

Thanks.


Comments

wonder95’s picture

Issue summary: View changes

Correcting node migration class names in description.

mikeryan’s picture

Status: Active » Postponed (maintainer needs more info)

I don't see anything obviously wrong here. To be clear, you're doing something like

drush mi CNSArticle --idlist=123,456

to get the status of 2 created, 2 updated?

One thing that causes unexpected processing is stub creation - if there's a reference to an as-yet-unmigrated item, a stub will be created, and then when the target migration runs there's an update. I'm not seeing any such references in the above code, however.
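
For reference, the kind of mapping that triggers stubs looks like this (field and migration names here are made up for illustration; nothing like it appears in your code):

// Hypothetical reference mapping. If 'OtherArticle' hasn't migrated the
// referenced node yet, Migrate creates a stub node to satisfy the
// reference, and the later 'OtherArticle' run updates that stub - which
// shows up in the "updated" count.
$this->addFieldMapping('field_related_article', 'field_related_nid')
  ->sourceMigration('OtherArticle');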

Have you tried debugging to see exactly what's being processed? E.g., dump the incoming nid in prepareRow() to see exactly which source nodes are being processed, and the resulting nid in complete() to see what's being created/updated.
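
A quick sketch of what I mean, in your migration class (drush_print() is just one way to surface the values):

public function prepareRow($row) {
  // Let the parent class do its normal row preparation first.
  if (parent::prepareRow($row) === FALSE) {
    return FALSE;
  }
  drush_print('Processing source nid ' . $row->nid);
}

public function complete($entity, stdClass $row) {
  drush_print('Source nid ' . $row->nid . ' -> destination nid ' . $entity->nid);
}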

jcisio’s picture

Status: Postponed (maintainer needs more info) » Active

The problem is in MigrateSourceSQL::getNextRow(). performRewind() does a lot of work, such as filtering by idlist. But later, when getNextRow() runs out of records, it calls getNextBatch(), which applies none of those alterations, so the same IDs are found again.
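
Roughly, the flow is this (paraphrased, with property names approximated - not the exact code in sql.inc):

protected function getNextRow() {
  $row = $this->result->fetchObject();
  // Out of rows in the current batch: fetch the next batch and retry.
  if ($row === FALSE && $this->batchSize > 0) {
    $this->getNextBatch();
    $row = $this->result->fetchObject();
  }
  return $row;
}

protected function getNextBatch() {
  // Bug: this clones the pristine original query, without the idlist
  // (and other) alterations applied in performRewind(), so rows that
  // were already processed can be returned again.
  $query = clone $this->originalQuery;
  $query->range($this->batch * $this->batchSize, $this->batchSize);
  $this->result = $query->execute();
  $this->batch++;
}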

I think there are two places to fix:

- When there is an idlist, return as soon as num_processed == count(idlist).
- Refactor performRewind() and getNextBatch() to apply the same conditions. Or simply remove getNextBatch(); I don't see why it is necessary.

mikeryan’s picture

All right, looks like a bug in getNextBatch() then - as you've discovered, it's a relatively new feature which helps with very large queries.

mikeryan’s picture

Priority: Normal » Major

The problem (or at least a problem - see below) is that getNextBatch() clones the original query, from before the modifications made in performRewind(). So, given our batch size of 1000 and more than 1000 rows in the source:

1. performRewind() clones the original query (which pulls more than 1000 rows), and adds various stuff, including in this particular case an IN() clause that limits the results to just two rows.
2. The two rows are processed.
3. getNextRow() sees no new rows, and calls getNextBatch().
4. getNextBatch() clones the original query, without doing the alterations from performRewind(), and executes it.
5. We are now pulling from the original query starting with row 1001, and thus processing rows we shouldn't.

So... This is clearly wrong, but it doesn't lead to the same result as originally reported, so I'm not quite sure whether fixing getNextBatch() to use the modified query necessarily addresses the original problem. But I'll submit a patch for this, and you can try it.

Thanks.

mikeryan’s picture

Status: Active » Needs review
Issue tags: +Migrate 2.7

I did replicate the originally reported behavior, where the specified item was not in the first batch, and this patch works for me.
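
In outline (simplified - see the patch for the real thing): performRewind() saves a clone of the fully-altered query, and getNextBatch() then clones that saved query instead of the pristine original, so every batch carries the idlist and highwater alterations:

protected function getNextBatch() {
  // Clone the fixed-up query saved at the end of performRewind(),
  // rather than the original unaltered query.
  $query = clone $this->alteredQuery;
  $query->range($this->batch * $this->batchSize, $this->batchSize);
  $this->result = $query->execute();
  $this->batch++;
}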

jcisio’s picture

Version: 7.x-2.6-rc2 » 7.x-2.x-dev
+++ b/plugins/sources/sql.inc
@@ -356,6 +356,9 @@ class MigrateSourceSQL extends MigrateSource {
+    // Save our fixed-up query so getNextBatch() matches it.
+    $this->alteredQuery = clone $this->query();
+

Wouldn't it be clone $this->query? It works fine and fixes the issue.

In #3 I thought about another optimization: when an idlist is provided, we know exactly the maximum number of rows to be processed, because IDs are unique. When this number is reached, we could end the iteration. E.g. when there are 1,000,000 rows and the idlist contains 3 IDs, it avoids running the batch query 1,000 times. However, as I tested, because the query is filtered by the IDs, the execution time of each query is less than 1 ms, so it might not be worth complicating the logic.
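
For the record, that early exit would look something like this (untested sketch, never committed):

protected function getNextRow() {
  // With an idlist, IDs are unique, so once we have processed as many
  // rows as there are IDs, there is nothing left to find.
  if (!empty($this->idList) && $this->numProcessed >= count($this->idList)) {
    return NULL;
  }
  // ... otherwise fall through to the normal fetch / getNextBatch() logic.
}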

  • mikeryan committed 86a140f on 7.x-2.x authored by jcisio
    Issue #2403593 by mikeryan, jcisio: Migration runs twice, updates...
mikeryan’s picture

Committed, thanks.

mikeryan’s picture

Status: Needs review » Fixed
wonder95’s picture

This patch worked perfectly for me.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.