Problem/Motivation
Significant delays can be introduced by some of the queries executed from this module. In the comments below we see reports from users experiencing delays of several minutes to an hour before migrations commence.
Proposed resolution
- Use static caches to avoid repeated lookups
- Optimise queries where possible
- Merge partial fixes early, don't wait on perfection
Remaining tasks
- Review patche(s)
- Merge!
API changes
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | Static-Cache-GetFormatsHavingFileLink-3209706-8.patch | 1.02 KB | kunalkursija |
Comments
Comment #2
wim leers🏎
Comment #3
xurizaemonSubscribe!
Drupal\media_migration\Utility\SourceDatabase::getFormatsHavingFileLink() is scanning all of field_revision_body for
'%<a %href=_/file%'(repeatedly, I think? perhaps per process definition?) in order to establish the formats to consider.First thoughts:
- This result could be cached, if I am indeed seeing it happen >1 times in the same
drush migrate:reset-status- The query could be perhaps limited to body revisions within the migration source scope
- The need to do this could be bypassed by providing process plugin configuration?
When I hit this while executing
drush migrate:reset-status MIGRATION_IDafter enabling Media Migration, the time to reset the migration went from a few seconds to a quarter hour (long enough to find this issue ... after a couple of false start interruptions!).Comment #4
xurizaemonVery crudely, I think a static cache in
media_migration_migrate_d7_filter_format_prepare_row()will prevent this being an issue for my setup. Note that this code will not suit if the migrations considers multiple source connections. Just capturing this for the moment.While this function seems slow for a larger content set, what blew out the time to 15 minutes seems to be that the query was slow and that it was repeated with no change in inputs.
Other options seem reasonable too, plan to come back to this - from the issue description, I'm going to guess it's not the only spot with room for optimisation :)
Comment #5
xurizaemonOh look, there's a similar strategy in
::getFormatsUsingTag()already :)Comment #6
kunalkursija commented+1. I am facing the same issue and https://git.drupalcode.org/project/media_migration/-/blob/8.x-1.0-alpha1... is taking 3+ seconds to execute.
Hence even a single
migrate-statuscommand is taking 1+ hour to execute.Comment #7
kunalkursija commentedAdding patch to introduce function level static caching in
SourceDatabase::getFormatsHavingFileLinksimilar toSourceDatabase::getFormatsUsingTag.Comment #8
kunalkursija commentedRemoving unnecessary variable declaration from the patch in #7 and adding a new one.
Comment #9
wim leers#5: was going to say that! :D
Marking so we get test results :)
Comment #10
kunalkursija commentedAhh, I didn't know tests only run when an issue state is "Needs Review". Thanks! @Wim Leers :)
Comment #11
xurizaemonWe have a 100K revision migration which is improved by the patch in #8 significantly. For that alone, I give +1 to the patch as provided.
The queries in
getFormatsHavingFileLink()andgetFormatsUsingTag()still take ~20s each the first call. I suspect we can do much more here, but we shouldn't optimise too far - a drop from one hour to one minute is significant enough to accept.Looking further, because the minute long waits still bothered me:
MediaWysiwygPluginBase looks for all ContentEntity migrations, then calls
SourceDatabase::getFormatsHavingFileLink()andSourceDatabase::getFormatsUsingTag()for each migration discovered. Even when runningdrush ms some_migration, a lot of unused migrations are likely to be given this treatment. With the static cache that impact is reduced, but I wonder if it could be bypassed entirely.One idea would be to move the list of formats to an optional configuration to be applied from the filter. The existing behaviour could be retained, but if configuration is passed in then those lookups could be skipped.
Thinking about this I looked at how we end up hitting this code, and it doesn't originate from the process plugin but from hook_migrate_plugins_alter().
The service called there is described as Migration plugin alterer for "fixing" migrations provided by Migrate Tools, and we're not using Migrate Tools at all, so maybe next I'll disable that call in
media_migration_migration_plugins_alter()entirely and see if it's doing anything for us.(None of the migrations which I saw it try to modify dynamically are migrations which we actually execute ... it's possible it's doing something so tricky I'm not seeing it, but it's also possible it's doing nothing at all for us.)
Comment #12
hugronaphor commentedMuch needed fix for me.
My commands were taking at least 15 minutes to run after enabling LinkIt
Comment #13
xurizaemonUpdated issue description. Patch still applying OK to 1.0-alpha16.
Comment #14
very_random_man commentedThis speeds things up dramatically for me. I think Xurizaemon is probably right in that there are still improvements to be made as I'm also seeing delays introduced into unrelated specific migrations but I don't think the prospect of further optimisations should hold this back from being merged as the improvement is huge.
Comment #15
firewaller commented#8 works for me