Problem/Motivation

Significant delays can be introduced by some of the queries executed from this module. In the comments below we see reports from users experiencing delays of several minutes to an hour before migrations commence.

Proposed resolution

  • Use static caches to avoid repeated lookups
  • Optimise queries where possible
  • Merge partial fixes early, don't wait on perfection

Remaining tasks

  • Review patche(s)
  • Merge!

API changes

Comments

huzooka created an issue. See original summary.

wim leers’s picture

🏎

xurizaemon’s picture

Subscribe!

Drupal\media_migration\Utility\SourceDatabase::getFormatsHavingFileLink() is scanning all of field_revision_body for '%<a %href=_/file%' (repeatedly, I think? perhaps per process definition?) in order to establish the formats to consider.

First thoughts:

- This result could be cached, if I am indeed seeing it happen >1 times in the same drush migrate:reset-status
- The query could be perhaps limited to body revisions within the migration source scope
- The need to do this could be bypassed by providing process plugin configuration?

When I hit this while executing drush migrate:reset-status MIGRATION_ID after enabling Media Migration, the time to reset the migration went from a few seconds to a quarter hour (long enough to find this issue ... after a couple of false start interruptions!).

xurizaemon’s picture

Very crudely, I think a static cache in media_migration_migrate_d7_filter_format_prepare_row() will prevent this being an issue for my setup. Note that this code will not suit if the migrations considers multiple source connections. Just capturing this for the moment.

While this function seems slow for a larger content set, what blew out the time to 15 minutes seems to be that the query was slow and that it was repeated with no change in inputs.

  static $formats_formatting_linkit = null;
  if (is_null($formats_formatting_linkit)) {
    $formats_formatting_linkit = SourceDatabase::getFormatsHavingFileLink($source_connection);
    if (in_array($filter_format_id, $formats_formatting_linkit)) {
      $filters['media_migration_added__linkit'] = [
        'name' => 'linkit',
        'weight' => 0,
        'settings' => [],
      ];
      $row->setSourceProperty('filters', $filters);
    }
  }

Other options seem reasonable too, plan to come back to this - from the issue description, I'm going to guess it's not the only spot with room for optimisation :)

xurizaemon’s picture

Oh look, there's a similar strategy in ::getFormatsUsingTag() already :)

kunalkursija’s picture

+1. I am facing the same issue and https://git.drupalcode.org/project/media_migration/-/blob/8.x-1.0-alpha1... is taking 3+ seconds to execute.

Hence even a single migrate-status command is taking 1+ hour to execute.

kunalkursija’s picture

Adding patch to introduce function level static caching in SourceDatabase::getFormatsHavingFileLink similar to SourceDatabase::getFormatsUsingTag.

kunalkursija’s picture

Removing unnecessary variable declaration from the patch in #7 and adding a new one.

wim leers’s picture

Status: Active » Needs review

#5: was going to say that! :D

Marking Needs review so we get test results :)

kunalkursija’s picture

Ahh, I didn't know tests only run when an issue state is "Needs Review". Thanks! @Wim Leers :)

xurizaemon’s picture

We have a 100K revision migration which is improved by the patch in #8 significantly. For that alone, I give +1 to the patch as provided.

The queries in getFormatsHavingFileLink() and getFormatsUsingTag() still take ~20s each the first call. I suspect we can do much more here, but we shouldn't optimise too far - a drop from one hour to one minute is significant enough to accept.

Looking further, because the minute long waits still bothered me:

MediaWysiwygPluginBase looks for all ContentEntity migrations, then calls SourceDatabase::getFormatsHavingFileLink() and SourceDatabase::getFormatsUsingTag() for each migration discovered. Even when running drush ms some_migration, a lot of unused migrations are likely to be given this treatment. With the static cache that impact is reduced, but I wonder if it could be bypassed entirely.

One idea would be to move the list of formats to an optional configuration to be applied from the filter. The existing behaviour could be retained, but if configuration is passed in then those lookups could be skipped.

Thinking about this I looked at how we end up hitting this code, and it doesn't originate from the process plugin but from hook_migrate_plugins_alter().

The service called there is described as Migration plugin alterer for "fixing" migrations provided by Migrate Tools, and we're not using Migrate Tools at all, so maybe next I'll disable that call in media_migration_migration_plugins_alter() entirely and see if it's doing anything for us.

(None of the migrations which I saw it try to modify dynamically are migrations which we actually execute ... it's possible it's doing something so tricky I'm not seeing it, but it's also possible it's doing nothing at all for us.)

hugronaphor’s picture

Much needed fix for me.

My commands were taking at least 15 minutes to run after enabling LinkIt

xurizaemon’s picture

Issue summary: View changes

Updated issue description. Patch still applying OK to 1.0-alpha16.

very_random_man’s picture

Status: Needs review » Reviewed & tested by the community

This speeds things up dramatically for me. I think Xurizaemon is probably right in that there are still improvements to be made as I'm also seeing delays introduced into unrelated specific migrations but I don't think the prospect of further optimisations should hold this back from being merged as the improvement is huge.

firewaller’s picture

#8 works for me