Problem/Motivation

Migration is very slow when there are lots of rows in the message table. I saw this with file migration (~500K files). I tracked it down to slow queries involving WHERE clause on the table message.

I spent a lot of time to debug this, it's not easy the migration begin with correct ratio and slown down after 1 hour...

At the first of migration i have a ration of import ~ 100 files/s and after ~70K ration slown down to 15 files/s.

You can reproduce this scenario with a migration that write lot of row to message table, in my case i have a lot of message like date not valid or file don't exist. (The source data was not correctly valid.)

Exemple query :

DELETE FROM migrate_message_json_images
WHERE source_ids_hash = 'a03cf05d5026350fa2fd9650fd78d3879aad366235eee7df9af35e268957c90e';

Time Migration without patch : 13 hours.
Time Migration with patch : 1,5 hour.

This is possibly related to #2688297: File migration slows down and eats more and more memory.

Proposed resolution

Add indexes to migrate_message_* table on column source_ids_hash.

Data model changes

No changes to data model but indexe are added. Since this is not done at install, there shouldn't be much of a change.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

focus13 created an issue. See original summary.

focus13’s picture

Title: Add indexes to migrate_message_* tables » Add index to migrate_message_* tables
FileSize
1.38 KB
focus13’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, 2: add_index_to_migrate_message-3085192-1.patch, failed testing. View results

DamienMcKenna’s picture

It also needs an update script to add the index to existing migrate_message_* tables.

mikelutz’s picture

Priority: Major » Normal
Issue tags: -Migrate critical

This makes sense, but needs an update path and tests around the update path. A slightly more detailed profiling analysis would also help show the reasoning here, and prove that we have a net gain with time saved in reading versus extra time writing when we get large tables (Which I would certainly expect to be the case)

Removing the 'migrate critical' tag and setting to normal priority, The migrate critical tag is for items that are tied to preparation for Drupal 9 and a new Major version. Something like this could be put out in any minor release, so while important, it is not 'migrate critical', nor does it meet the standard of Major priority.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Wim Leers’s picture

Issue tags: +Performance
Kris77’s picture

@DamienMcKenna can you describe how to update with the script please?

I want to try this patch because I have to import about 1milion rows and now it takes two days.

Thanks a lot.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Wim Leers’s picture

Priority: Normal » Major

https://www.drupal.org/u/grasmash is working on a migration … where this patch made his migration three times faster.

This is especially important if you're using #2745797: Add option to content entity destinations for validation (CR: https://www.drupal.org/node/3073707), and your source site has some data integrity problems (common: required fields that were added after lots of content already existed, and the pre-existing content did not get updated to have values for those required fields).

I think I want to challenge the need for an update path and update path tests here. The mantra in the migration system is always: do not break existing migrations, but allow making things better by default for new migrations. Well … isn't that exactly the case here?

Wim Leers’s picture

#2 no longer applied to HEAD. Rebased.

Wim Leers’s picture

Wim Leers’s picture

Actually, I think we can even make this a unique key? No two source rows are supposed to have the same source IDs hash — if they do, then they cannot be identified uniquely either.

The "mapping" tables do already make source_ids_hash the primary key, so the uniqueness is already guaranteed.

Wim Leers’s picture

Title: Add index to migrate_message_* tables » Add unique key on source_ids_hash for migrate_message_* tables
Wim Leers’s picture

Here's a do-not-test patch relative to #2845340: migrate mapping & messages table names are truncated, can lead to incorrect mapping lookups, which is why my previous rebased patches were failing :) Also excluded the test changes in this one, to avoid the need to reroll all the time while having both patches applied.

The last submitted patch, 17: 3085192-17.patch, failed testing. View results

Wim Leers’s picture

Title: Add unique key on source_ids_hash for migrate_message_* tables » Add index on source_ids_hash for migrate_message_* tables
FileSize
1.36 KB

I don't know what I was thinking … obviously there can be more than one message per source row!

So #16 is correct, #17 is stupid.

Wim Leers’s picture

Here's a do-not-test patch relative to #2845340: migrate mapping & messages table names are truncated, can lead to incorrect mapping lookups, which is why my previous rebased patches were failing :) Also excluded the test changes in this one, to avoid the need to reroll all the time while having both patches applied.

joachim’s picture

Status: Needs review » Reviewed & tested by the community

LGTM.

alexpott’s picture

Version: 9.3.x-dev » 9.2.x-dev
Category: Task » Bug report
Status: Reviewed & tested by the community » Fixed

This is a major performance bug if it can have so much impact on a migration. Backported to 9.2.x - there's no reason to not get this in asap. I agree that writing an upgrade path is not worth it given how migrations tend to be ephemeral and this is not a breaking bug.

Committed and pushed f1c06e87e0 to 9.3.x and d4e4f7c1d3 to 9.2.x. Thanks!

  • alexpott committed f1c06e8 on 9.3.x
    Issue #3085192 by Wim Leers, focus13, grasmash: Add index on...

  • alexpott committed d4e4f7c on 9.2.x
    Issue #3085192 by Wim Leers, focus13, grasmash: Add index on...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.