Problem/Motivation
Migrations use a lot of memory, and the current method of reclaiming memory is often ineffective.
Follow-up from #2503999-32: Large volume entity migrations run out of memory
This issue was to followup on the consequences of invalidating entity caches when reclaiming static memory. Context from catch's comment:
Opened #2558857: Migrations invalidate entity caches when reclaiming memory to track the persistent cache clearing.
There are two cases that can be tricky:
- migrations that need to load entities as part of the migration may slow down a lot - some sites take 10 or 20 hours to migrate so a 10% regression can mean an extra hour or two.
- partial migrations on a live site
And on the subject of the original intent of this issue - the first point isn't really an issue, the cache is only cleared when memory has nearly run out, it will be very infrequent so migrations that load entities will benefit from the cache their entire run except shortly after clearing the cache.
Migrations on a live site (as using migration in a feeds-like way to do regular import) would be the greater concern (although, unless the periodic imports are very large they shouldn't hit the cache clearing code).
Proposed resolution
The current code for reclaiming memory is
// Entity storage can blow up with caches so clear them out.
$entity_type_manager = \Drupal::entityTypeManager();
foreach ($entity_type_manager->getDefinitions() as $id => $definition) {
$entity_type_manager->getStorage($id)->resetCache();
}
This is complicated (looping through all entity types). Worse, resetCache() only invalidates cache items, it does not reclaim any memory. Instead, delete the entity-storage cache with
\Drupal::service('entity.memory_cache')->deleteAll();
Remaining tasks
User interface changes
None
API changes
None
Data model changes
None
| Comment | File | Size | Author |
|---|---|---|---|
| #34 | 2558857-34.patch | 829 bytes | heddn |
| #34 | interdiff_30-34.txt | 718 bytes | heddn |
Comments
Comment #2
andypostComment #3
andriyun commentedComment #4
andypostGreat!
Comment #5
berdirThat fixes #32 but I don't think that's why @catch opened this issue? This is about finding a way to only invalidate static caches but not the persistent ones.
Comment #6
mikeryanUmm... This issue was not about the little entityManager() cleanup, that should have been a new, separate minor issue.
This issue was to followup on the consequences of invalidating entity caches when reclaiming static memory. Context from catch's comment:
And on the subject of the original intent of this issue - the first point isn't really an issue, the cache is only cleared when memory has nearly run out, it will be very infrequent so migrations that load entities will benefit from the cache their entire run except shortly after clearing the cache.
Migrations on a live site (as using migration in a feeds-like way to do regular import) would be the greater concern (although, unless the periodic imports are very large they shouldn't hit the cache clearing code).
Comment #8
andriyun commentedPatch for quick fix cleanup moved to issue #2559191: Clean-up migrate executable
Comment #9
andypostLet's keep the issue as clean-up as #8 said, and continue in #2545632: [PP1] Move memory reclamation out of migrate executable
Comment #11
damiankloip commentedAfaict, that issue won't resolve this one.
Everyone is trying to find workarounds for the simple fact that the ContentEntityStorageBase::resetCache method completely and utterly abuses the API.
Comment #14
heddnupdated IS
Comment #15
catchI think we should mark this duplicate of #1596472: Replace hard coded static cache of entities with cache backends.
Comment #18
nevergoneRelated code: http://cgit.drupalcode.org/drupal/commit/?id=2d5dc23546cab86d27f944a4fd0...
Comment #19
andypostIssue summary was updated in #14 and it's exactly about static caches of entity storage handlers
@nevergone that's why related issue was decaupled
Comment #20
andypostComment #21
pounardDid you just mark this issue as being related to itself ?
Comment #22
andypostsorry, proper one
Comment #23
catchPostponing this on those issues (which I'm re-rolling). The first will make reclamation safe to do without affecting the persistent cache, the second may mean we can drop memory reclamation altogether.
Comment #26
grahlI've been trying to run several large migrations (500k-5m records) with incremental updates and was constantly running into memory issues. I had to set the memory limit quite high so that it reflected a significant portion of overall memory. When Migrate then started reclaiming memory it was not unusual for the process to be killed due to the limited remaining resources.
It turns out, that the high memory usage is entirely avoidable when we just clear the MemoryCache. A hacky workaround for this is attached. Obviously the referenced #1199866: Add an in-memory LRU cache would be the better approach. This workaround avoids the issues outlined above with restarting migrations and starting from scratch, as well as providing nearly constant and reasonable memory usage.
(I think batch_size for SQL sources does make a difference there per migration, too, but haven't rerun the numbers.)
Comment #27
kevinquillen commentedI had similar issues for migrations that used files on disk. JSON files, containing anywhere from 100 to up to 28,000 items, depending. The larger ones definitely had issues, especially if they generated Paragraphs. The memory reclamation part in migrate seems to fail or do nothing, at least on Pantheon. It will hit 85% used, reclaim a little bit, do a few more rows and exit.
The only way around this for me (Pantheon) was writing a script that broke down JSON files from one large file to many smaller files, and a new Drush command that could migrate and override the source file URL with one provided via argument and loop over that. We no longer had memory issues, but isn't the most ideal solution.
Oddly enough, this never happens to anyone on the dev team when doing a migration in the local Docker environment, it only happened on Pantheon. I am not totally sure why.
Comment #30
heddnI've been doing some profiling of a migration recently. And what I find is that the biggest issue we are facing is that entity cache is "reset". Which just invalidates the memory. It doesn't actually clear out the memory. There is #1199866: Add an in-memory LRU cache, and that seems like a much better permanent solution, but for now, let actually reclaim memory. Patch attached. No interdiff, as this is a distinctly different approach.
Comment #31
benjifisherThis will be a sort of rambling review, as I try to understand how this all works. Please confirm if you can. Perhaps some of these comments will be helpful in future reviews.
This is one of the smaller patches I have looked at recently:
Nit: as long as we are updating the comment, can we add a comma before "so"?
Nit: since we never use the
$memory_cachevariable again, can we eliminate it?The service resolves to
Drupal\Core\Cache\MemoryCache\MemoryCache. This class does not know anything about entities, so I guess the only thing that ties it to the entity system is that only entity-related code should be using this service. I searched the codebase forentity.memory_cacheand am reviewing the results.The code comment for
reset()in the parent class says, "This is only used by tests", but this is not enforced andEntityTypeManager::useCaches()breaks that promise.Here is the code for
EntityStorageBase::resetCache(), the method we are removing in this patch:As @heddn says in #30, this "… just invalidates the memory. It doesn’t actually clear out the memory", at least for entity types that cannot be statically cached. I do not see a mechanism for deleting invalidated items from the cache. Looking at the code for
invalidateTags(), I see that it just sets theexpireproperty to "one second before the request time".in
SqlFieldableEntityTypeListenerTrait::copyData(), I see a similar usage of\Drupal::service('entity.memory_cache')->deleteAll();, so there is precedent for this strategy.Conclusion
This change looks like a good idea. It actually removes the cache instead of invalidating it. As a bonus, it does not spend a lot of effort looping through entity types to figure out which tags to invalidate. I declare the patch reviewed but not tested.
How do we test this? The question applies to both manual and automated testing.
Comment #32
benjifisherI am updating the issue summary with some of the notes from my previous comment.
Comment #33
catch#30 looks good, and I think we should go ahead here. I'm not sure it's possible to add an automated test, memory cache itself does have test coverage and we're just calling one method, so possibly that's enough. For manual testing, would probably need to run a migration and do tens of thousands of items in one go, or artificially lower the memory limit with a smaller number of items - but need to break things to reproduce the out of memory first.
If/when we make #3190992: Add a WeakReference memory cache implementation the default entity cache backend, we should be able to remove the remaining line here.
Comment #34
heddnIf you need profiling, I've used this patch on a pantheon hosted site that has very limited memory allocated to it. Using this patch, the migrations run and finish successfully. Without it, the migration continually dies a horrible death against memory limits. It doesn't solve all the problems. I still have to wrap the drush migration command in bash loop
drush ms --fields="Migration ID" --tag=Content |grep upgrade_d6 | while read line ; do echo $line; drush mim --continue-on-failure $line </dev/null ; done. But this patch and that loop makes things possible on Pantheon that which wasn't possible previously.Also, address feedback from #31.
Comment #35
heddnComment #36
benjifisher@heddn, thanks for fixing the two nits. I consider this reviewed but not tested (RBNT?). I pinged @kevinquillen (Comment #27) on Slack, and I hope he can do some testing to confirm.
@catch, thanks for the guidance.
Comment #37
joachim commentedPatch LGTM, fixes the problem described in the IS.
Working well for me on a Commerce Order rollback which was running out of memory (probably due to Commerce Migrate module loading all the order items for each order).
Comment #40
catchCommitted/pushed to 9.2.x, and cherry-picked to 9.1.x, thanks!
Comment #42
kevinquillen commentedThumbs up all - seems to work for me too. Sorry for the late response!