Problem/Motivation

In large migrations, memory can get locked up in the entity system and not recovered. This is because the entity system stores a cache of loaded entities as a property on the storage class.

Proposed resolution

Iterate over entity storage classes and call their static reset method.

Remaining tasks

none

User interface changes

none

API changes

none

Data model changes

none

Original Report

When running a large migration, something internal structure is not getting reclaimed leading to a giant memory leak.

running an import with about 180,000 nodes I got this error:

nodes Running d6_node                                                             [ok]
Memory usage is 870.48 MB (85% of limit 1 GB), reclaiming memory.      [warning]
Memory usage is now 870.47 MB (85% of limit 1 GB), not enough          [warning]
reclaimed, starting new batch

at which point it fails and moves on to and fails importing revisions and other imports.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mikeryan’s picture

Component: migration system » entity system

We've had some discussion of this in IRC. Migrate (in 7 & 8) monitors memory usage and when reaching a threshold (defaulting to 85% of memory_limit) issues a drupal_static_reset() and checks to see if "enough" memory was released. This works just fine with Drupal 7 core - node caches and such use drupal_static() and get cleared, migrations generally only run into memory problems with the odd contrib or custom module doing local static caching.

In Drupal 8, however, EntityStorageBase::$entities contains an in-memory cache of entities. To clean them up, migration would need to find all the EntityStorageBase instances and call resetCache() on each of them. First, though, a question - is there any reason EntityStorageBase shouldn't be using drupal_static() to cache the entities?

neclimdul’s picture

Status: Active » Needs review
FileSize
899 bytes

I talked to larowlan and I didn't get the impression he was interested in fixing this in the entity system. I don't know if there is would be support from other people but for the time being here is the patch I'm using to work around the leak as far as migration is concerned.

andypost’s picture

Would be great to limit that clearing with only target entities from executed migrations

+++ b/core/modules/migrate/src/MigrateExecutable.php
@@ -568,7 +568,17 @@ protected function attemptMemoryReclaim() {
+    foreach ($manager->getDefinitions() as $id => $definition) {
+      $manager->getStorage($id)->resetCache();

would be great to skip caches that have static disabled

neclimdul’s picture

Would be great to limit that clearing with only target entities from executed migrations

I'm not sure I agree. This isn't run on every pass, it is resolving an excessively high memory event and we should try to clear every thing in Drupal. Furthermore, the cost of iterating and clearing is cheap in the context of what we are doing.

would be great to skip caches that have static disabled

Wouldn't they just be a no-op method call? Why would we bother skipping them?

A little more information, the patch running on our migration looks like this:

...
Running d6_node                                                             [ok]
Memory usage is 435.29 MB (85% of limit 512 MB), reclaiming memory.    [warning]
Memory usage is now 155.21 MB (30% of limit 512 MB), reclaimed         [warning]
enough, continuing
Memory usage is 435.34 MB (85% of limit 512 MB), reclaiming memory.    [warning]
Memory usage is now 153.69 MB (30% of limit 512 MB), reclaimed         [warning]
enough, continuing
Memory usage is 435.29 MB (85% of limit 512 MB), reclaiming memory.    [warning]
Memory usage is now 153.31 MB (30% of limit 512 MB), reclaimed         [warning]
enough, continuing
Invalid argument supplied for foreach() Iterator.php:31                [warning]
Memory usage is 435.26 MB (85% of limit 512 MB), reclaiming memory.    [warning]
Memory usage is now 154.51 MB (30% of limit 512 MB), reclaimed         [warning]
enough, continuing
Memory usage is 435.29 MB (85% of limit 512 MB), reclaiming memory.    [warning]
Memory usage is now 159.62 MB (31% of limit 512 MB), reclaimed         [warning]
enough, continuing
Memory usage is 435.2 MB (85% of limit 512 MB), reclaiming memory.     [warning]
Memory usage is now 151.57 MB (30% of limit 512 MB), reclaimed         [warning]
enough, continuing
Running d6_node_revision 
...
andypost’s picture

Issue tags: +Entity Field API

it is resolving an excessively high memory event and we should try to clear every thing in Drupal

So why that does not fire event?
there's no way to other sub-systems to get involved here.

And \Drupal\Core\Entity\Sql\SqlContentEntityStorage::resetCache() implementation causes cache_tag invalidation, which could be very expensive.
Suppose we need follow-up here or at least document that operation could cause a chained invalidations

  1. +++ b/core/modules/migrate/src/MigrateExecutable.php
    @@ -568,7 +568,17 @@ protected function attemptMemoryReclaim() {
    +      $manager->getStorage($id)->resetCache();
    

    I'm mostly worried because this calls Cache::invalidateTags(array($this->entityTypeId . '_values'));

  2. +++ b/core/modules/migrate/src/MigrateExecutable.php
    @@ -568,7 +568,17 @@ protected function attemptMemoryReclaim() {
         // @TODO: explore resetting the container.
    

    Let's remove that todo or add issue to explore that

neclimdul’s picture

I'm honestly not comfortable moving this quick fix into the realm of new API/event/hooks. We have a bit of a mess with local variable caches like this in the container and long running tasks but that's no simple fix.

  1. It is literally the only API to do this. If its expensive, I think we just deal with it. We don't have a way of knowing what caches where populated leading up running out of memory. Let me pose a couple scenario.
    1. The user migration takes say 70% of our memory leaving us at 80%. We then start the node migration and it hits the 85% cap. What happens? We're in the node migration so we clear the node entity cache and regain 5% of our memory when users are holding up that huge chunk? We'll thrash clearing that 5% over and over rather then reclaiming 75% and running longer before hitting the threshold.
    2. A field or custom migration runs and starts loading users and nodes and custom entities to resolve references. We hit our cap and have to clear memory.
    3. Insert arbitrary set of arbitrarily complex migrations running for several hours
  2. I'm ok with just removing it. Rebuilding the container mid process is going to be very dangerous.
mikeryan’s picture

Is there a particular reason not to have the in-memory entity cache use drupal_static()? It seems that resetCache() is too heavy-handed, and using drupal_static() would make that in-memory cache work just like all other in-memory caches without undesirable side-effects.

neclimdul’s picture

Catch brought to light some related issues to this in IRC.

neclimdul’s picture

mikeryan’s picture

I do feel that the entity cache needs proper static caching (specifically a means of clearing the static cache without invalidating external caches), and until #2260187: [PP-1] Deprecate drupal_static() and drupal_static_reset() drupal_static() is it.

All functions requiring a static variable to persist or cache data within a single page request are encouraged to use this function unless it is absolutely certain that the static variable will not need to be reset during the page request. By centralizing static variable storage through this function, other functions can rely on a consistent API for resetting any other function's static variables.

Entity caching certainly qualifies, and is (unless I'm missing something big somewhere else) the biggest static memory user.

So, attached is a patch to use drupal_static() for the static entity cache. It's pretty simple and should be perfectly transparent - the one known issue is it'll break unit tests, which don't include bootstrap.inc, so we'd need to "mock" drupal_static/drupal_static_reset there. Knowing that, I'll let the testbot take a crack just to be sure there aren't any simpletests broken by this.

Status: Needs review » Needs work

The last submitted patch, 10: node_migrations_runs-2503999-10.patch, failed testing.

neclimdul’s picture

Status: Needs work » Postponed
neclimdul’s picture

Status: Postponed » Needs work

Sorry, didn't see the patch. I'm still in favor if fixing this in migrate with an updated version of the previous patch with we've been running in our development environment for 2 months. That approach is blocked on the other issue.

catch’s picture

So #2501117: Add static caching for PermissionsHashGenerator::generate() adds a cache.static bin using the memory backend. That addition is just a couple of lines if for some reason that patch doesn't get committed.

#1596472: Replace hard coded static cache of entities with cache backends is very outdated but I was trying to replace the static cache for entities with a cache backend - would need to start from scratch but that could be done with cache.static or cache.entity_static similar to the above. Scope could be a lot narrower than the issue was originally.

Then #1199866: Add an in-memory LRU cache is also outdated but was an LRU version of the memory backend with a limit on the number of items. Could either use a numeric limit on number of items, or just check peak usage in the backend each time there's a set and empty it at 60% like migrate does, or some kind of combination.

Then migrate could just ensure that the LRU / memory checking version of the cache backend is used in the container and might not need to do anything special at all otherwise. Or it can call MemoryBackend::reset() manually.

mikeryan’s picture

So, those are all good issues, but are any of them going to make 8.0? I have to admit the drupal_static() approach is also a longshot for 8.0 at this stage, but with no API changes seems like a possibility.

We need some way for migrations to make space for themselves today - of the various issues that involve improving entity caching, which has the best chance to get into 8.0? Whichever one that is, I'm happy to contribute to pushing it forward.

catch’s picture

o, those are all good issues, but are any of them going to make 8.0?

They're all performance/scaling improvements - and in the case of OOM straight bug fixes, so they can go in up to and potentially during release candidate if someone works on them. There's no API change in any of these, only API additions which is fine during beta as long as there's a concrete use case that needs it.

The LRU cache backend I'd personally like available in core, but that could be done in contrib once entities are using the static cache backend/service instead of a raw object property - and migrate can then depend on that either way. Or as mentioned use the reset() method to clear the entire backend manually instead of LRU.

webchick’s picture

Priority: Normal » Major

This sounds pretty bad, so escalating to major.

Berdir’s picture

Definitely -1 on introducing drupal_static() in there. Using a memory backend with cache tag support (would also remove 95% of the cases where we need to reset static caches manually in tests) or other OO replacement, maybe, but not that.

Patch in #2 looks good to me as a quick fix for this issue. There are also long-standing issues about introducing a limit of how many nodes are kept in the static cache, that would be another approach on how this could be solved long-term.

neclimdul’s picture

maybe the path through is
1) use the approach in #2
2) open a follow up that this introduces a bug into migrate where it can clear the site cache on a live site
3) Resolve the entity cache issue and implement any change needed in the follow from 2)
Side note: The EntityStorageInterface only allows reseting of static caches according to the documentation. The current cache invalidation, while probably pretty darn important, breaks the contract.

Berdir’s picture

Cache tag invalidation on a long running migration (or any sort of long running process) is flawed anyway due of the static cache that prevents repeated invalidations, so only the first node would actually invalidate the list cache tag. And fixing that to do it all the time wouldn't be fun either as that would most definitely bring a live site down ;)

The only thing that might work is doing the actual invalidations after the migration completed, but that has problems too.

Want to re-upload your patch so it's clear which one should be committed? I can RTBC then.

Berdir’s picture

Also, issue summary should explain the solution/workaround here.

catch’s picture

Cache tag invalidation on a long running migration (or any sort of long running process) is flawed anyway due of the static cache that prevents repeated invalidations, so only the first node would actually invalidate the list cache tag. And fixing that to do it all the time wouldn't be fun either as that would most definitely bring a live site down ;)

We could look at rate-limiting that rather than either once or every time - enforce once per configurable-amount-of-seconds per-request. Since it's just a static cache that's not too much extra work to implement.

mikeryan’s picture

Title: Node migrations runs out of memory » Large volume entity migrations run out of memory

Let me summarize how I understand the issue:

  • Migration creates lots of entities (and other stuff) in a single PHP process (we'll talk about drush here, since a single batch through the UI isn't usually going to run into memory problems).
  • Entities that are created are statically cached in the current process memory, in the $entities arrays in EntityStorageBase and ConfigEntityStorage.
  • The migration process monitors memory usage, and when it gets to a certain threshold (default 85% of PHP's memory_limit) attempts to make room for itself by calling drupal_static_reset(). In D7, this was very effective - in most cases this would reclaim plenty of memory and we would seldom have to fallback to spawning a new drush process to continue the migration.
  • Let's refer to the drupal_static() docs: "All functions requiring a static variable to persist or cache data within a single page request are encouraged to use this function unless it is absolutely certain that the static variable will not need to be reset during the page request. By centralizing static variable storage through this function, other functions can rely on a consistent API for resetting any other function's static variables."
  • Nothing uses anywhere near the static memory that the entity static storage does, at least in the migration context, yet it does not use drupal_static() for the static memory cache.

Yes, there are a number of other things that can be done to improve static entity caching, but I still don't understand why whatever static caching is done for entities should be exempt from the documented purpose of drupal_static(). It's certainly poor DX to have a documented "consistent API for resetting any other function's static variables" and yet have to remember that entities are special and you have to call a different API to reset them...

Berdir’s picture

It's certainly poor DX to have a documented "consistent API
for resetting any other function's static variables" and yet have to remember
that entities are special and you have to call a different API to reset
them...

The entity system is in no way special. In 7.x, we had 180 calls to drupal_static() according to api.drupal.org while in 8.x, we only have 50 left. And looking at the remaining calls, they are almost exclusively in old code.

That it frees memory is a side effect IMHO, the main use case for it was being able to reset the environment for tests. The way we do that in 8.x now is the container and dependency injection.

Resetting this could in some cases even have undesired effects, for example https://api.drupal.org/api/drupal/core%21includes%21bootstrap.inc/functi..., resetting it might actually result in dropped information that can't be recovered. Just an example, deprecated and very likely unrelevant for processed that migrate, but it also was the very first one that I looked at when searching for such an example, there are likely others.

mikeryan’s picture

The ability to free unnecessary memory is a critical use case for migration, and drupal_static() is the only existing general mechanism for that.

As for resetting resulting in dropped information, anywhere that happens is a case of misusing drupal_static() - "...use this function unless it is absolutely certain that the static variable will not need to be reset during the page request".

Must migration have a laundry list of module- or class-specific API calls to make for reclaiming memory? And what about contrib modules - if drupal_static() is being (at least informally) deprecated, will we go back to the D6 days of static caches that can't be reset?

catch’s picture

Issue tags: +Migrate critical

Tagging this as migrate critical. We could untag it later, but don't want to forget this one.

phenaproxima’s picture

I haven't been involved in this issue, but reading over it I support using @neclimdul's patch in #2 as a quick fix, because this is a big deal. The larger discussion about how static caching is done for entities is very much worth having, but it's bigger than Migrate and should not block this fix. Once a more consistent static caching system is in core, this can and should be revisited. But until then, we need to be able to reclaim memory effectively.

phenaproxima’s picture

Issue tags: +Needs reroll
neclimdul’s picture

Status: Needs work » Needs review
FileSize
899 bytes

Same patch still applies. Re-upload.

neclimdul’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update, -Needs reroll
phenaproxima’s picture

I'm OK with this patch, so I vote RTBC. However, I'd like someone more familiar with the intricacies of the Entity API to review and RTBC it for real.

andypost’s picture

+++ b/core/modules/migrate/src/MigrateExecutable.php
@@ -568,7 +568,17 @@ protected function attemptMemoryReclaim() {
+    // Entity storage can blow up with caches so clear them out.
+    $container = \Drupal::getContainer();
+    /** @var \Drupal\Core\Entity\EntityManagerInterface $manager */
+    $manager = $container->get('entity.manager');

There's \Drupal::entityManager() for that

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Note that this will also invalidate the persistent cache, but running migrations is going to be a severe hit on your performance anyway so this is probably acceptable, at least as a temporary solution.

I had a quick look into just resetting the static caches, we don't really have an API for that, we could reset the handlers property in the entity manager with useCaches(FALSE); useCaches(TRUE), but they might still be injected somewhere and not really freed.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Seems like everyone is on board with this being a reasonable quick fix, and it's unclear whether more expansive fixes are possible. That's a good argument for getting this in, possibly with a follow-up issue to dig in more.

I inquired about test coverage but it's not easy to test memory consumption.

Committed and pushed to 8.0.x. Thanks!

#32 sounds like a good novice issue, but don't want to hold up a migrate critical on that.

  • webchick committed 66e453a on 8.0.x
    Issue #2503999 by neclimdul, mikeryan, Berdir, catch, andypost: Large...
catch’s picture

Opened #2558857: Migrations invalidate entity caches when trying to reclaim memory, should flush to track the persistent cache clearing.

There are two cases that can be tricky:

- migrations that need to load entities as part of the migration may slow down a lot - some sites take 10 or 20 hours to migrate so a 10% regression can mean an extra hour or two.
- partial migrations on a live site

andypost’s picture

Follow-up (quickfix) for #32
#2559191: Clean-up migrate executable

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

davidwbarratt’s picture