Problem/Motivation
Core has a MemoryCache implementation, which provides the same API as cache backends but with everything held in PHP memory.
This is already used for the entity static cache after #1596472: Replace hard coded static cache of entities with cache backends.
The long term goal is to solve #2558857: Migrations invalidate entity caches when trying to reclaim memory, should flush 'automatically' - i.e. that core would statically cache things, but that our implementation would prevent memory leaks without intervention. This will solve a lot of issues with processes that need to load lots of entities, with migrate as the main core example #2558857: Migrations invalidate entity caches when trying to reclaim memory, should flush.
Back in 2011, @catch suggested an implementation using weakreferences - at that time only a PECL extension #1237636-7: Entity lazy multiple front loading.
There's an issue open to add a basic LRU cache #1199866: Add an in-memory LRU cache implementation (also from 2011), but that has limitations in that there has to be a hard-coded numeric limit on the entities added, which could still lead to out of memory issues in some situations, or entities being ejected from the cache when there's plenty of memory left.
WeakReferences are now in PHP 7.4: https://www.php.net/manual/en/class.weakreference.php
WeakMap is now in PHP 8: https://www.php.net/manual/en/class.weakmap.php / https://wiki.php.net/rfc/weak_maps (can't find actual PHP 8 docs for it yet). With a PHP 7.4 polyfill https://github.com/BenMorel/weakmap-polyfill
Since Drupal 10 will require PHP 8 or higher, we could add an implementation to core in a 9.x minor release, then Drupal 10 could switch the default for entity caching to it.
Steps to reproduce
Load thousands of entities in one go.
Proposed resolution
Start with a WeakReference implementation for now.
WeakMaps may end up being useful for entity or other caching, but this will require further investigation.
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#39 | reroll_diff_3190992_33-39.txt | 2.02 KB | ankithashetty |
#39 | 3190992-39.patch | 10.21 KB | ankithashetty |
Comments
Comment #2
heddnNote: the memory issues in #2558857: Migrations invalidate entity caches when trying to reclaim memory, should flush are not 100% resolved with clearing entity cache. Drush and Drupal slowly seem to eat up and not release more memory over time with migrations. But once we land that issue, it will be much easier to see any other places for improvement. Baby steps.
Comment #3
catchHere's an (untested) start.
While this will help with memory, it doesn't solve it completely, because even when references to the entities are cleaned up, we're still storing an array of stdClass pointing to empty WeakReferences.
Pretty sure this is what WeakMap is supposed to help with, but I don't see a way to make that work with our caching and entity systems yet.
Comment #4
catchBritish English...
Comment #5
catchCan't use the constructor.
Comment #6
catchComment #7
catchAdding some very basic unit testing. Can't get gc_collect_cycles() to clear the WeakReferences yet - not sure if it's me, or if it's not really testable.
Comment #8
catchMinor clean-up.
Comment #9
catchCspell didn't like the test content.
Comment #10
ranjith_kumar_k_u CreditAttribution: ranjith_kumar_k_u at Zyxware Technologies commentedUpdated the test content
Comment #11
alexpottWrong class name
Shouldn't this extend \Drupal\system\Tests\Cache\GenericCacheBackendUnitTestBase? (And be a kernel test)
Comment #12
anmolgoyal74 CreditAttribution: anmolgoyal74 at OpenSense Labs for DrupalFit commentedAddressed #11
Comment #13
catch@anmolgoyal74 #11.2 recommends extending GenericCacheBackendUnitTestBase, but that change isn't in the most recent patch - there should be other cache backend tests extending this same one to compare against. Also still needs the class name in #11.1 updating.
Comment #14
ravi.shankar CreditAttribution: ravi.shankar at OpenSense Labs commentedHere I have tried to address comment #13.
Comment #15
neclimdulHere's a more functional version of the test that implements the required abstract method.
The base test catches one bug that I fixed. The current patch assumes the data to be an object but its mixed so only passing objects to weak reference seems to be the answer.
There are still some failures that I didn't really look too closely at yet. It looks like maybe the generic test is asserting some PHP pass by value or return by value behaviors that might be a bit different for weak references.
Comment #16
Suresh Prabhu Parkala CreditAttribution: Suresh Prabhu Parkala at Specbee commentedTried to fix custom command failures. Please review.
Comment #17
catchSo the remaining failure is due to the following, as hinted at by @neclimdul.
Regular cache backends serialize and unserialize objects, meaning if you request an object, change something on it, then request it again from cache, you should get the 'same' (or rather, a copy of the...) object back without the interim modification.
For a 'static cache' backend, we want the opposite behaviour, since we're replacing 'chuck an object on a property of a class' as opposed to 'store an object in the database for a later request'.
I think we can probably still use the base class, but need to adapt things for our needs a bit.
Comment #18
catchComment #19
catchThe testing for invalidate tags relies on the cache invalidator service (i.e. Cache::invalidateTags()), but that isn't set up for memory cache backends so overriding the method to test it directly. This isn't necessarily the best way to deal with this problem, but also not convinced this is the issue to refactor the cache testing base class.
Comment #20
catchBetter way to deal with the cache invalidator via @berdir.
Comment #21
catchWhitespace.
Comment #22
catchComment #23
catchWhitespace again...
Comment #24
bradjones1I spent a good amount of time tonight reviewing this manually and also running the new test methods locally. Only nit I have from a DX standpoint here is that the test would be more helpful, perhaps, to people new to the WeakReference concept if the docblock on
::testSetGetNoReference()
explained more about why the test is skipped; that is, the object in the cache *will* change if you alter its properties after retrieving it.Might be a stupid question, but is it possible that someone might be relying on the existing behaviour, which guards against this via serialization? Does this need a change record to note that this is truly a reference to an object in a static cache, vs. a repository that stores the object as it was when set?
Comment #25
catchAdded a note to the test class. On the second point that is covered already by MemoryCacheInterface, but added an @see to that.
Comment #26
catch@berdir asked a question in slack - i.e. whether weakreferences are destroyed only by garbage collection, or immediately when unset/they got out of scope.
The rfc suggests garbage collection: https://wiki.php.net/rfc/weakreferences - this is the behaviour we want.
The php docs say when they're unset/out of scope: https://www.php.net/manual/en/class.weakreference.php
Added test coverage for this, and it looks like bad news to me :/
Comment #27
Wim LeersThis will be great to have in Drupal core! I for sure ran into problems related to this while working on migrations!
Comment #28
alexpottI guess given #26 we're back to LRU as often an entity will not remain in scope so the entity memory cache wouldn't really be helped by this at all.
Comment #29
alexpottI posted https://bugs.php.net/bug.php?id=80951 - let's see if that get's any traction.
Comment #30
alexpottI posted https://bugs.php.net/bug.php?id=80951 - let's see if that get's any traction.
Comment #31
catch'not a bug' on https://bugs.php.net/bug.php?id=80951.
This makes them pretty useless for our purposes. An individual method that's creating thousands of objects could use them, but how many of those are there really?
Moving to 'needs work' so we can discuss a bit more, but I think we're back to #1199866: Add an in-memory LRU cache (which I was hoping to leapfrog entirely with this this issue, turns out not).
Comment #32
KapilV CreditAttribution: KapilV as a volunteer and at Innoraft for Drupal Care, Drupal Association commentedComment #33
KapilV CreditAttribution: KapilV as a volunteer and at Innoraft for Drupal Care, Drupal Association commentedFixed Custom Commands Failed.
Comment #35
geek-merlin> This makes [weakrefs] pretty useless for our purposes.
Interesting problem. What if we use refcycle black magick to save the weakref from the refcounter. PHP has a hardcoded limit of 10k refcycles before auto-gc, so doing that for cache items is pretty useless, but if we weakref the cache service itself, the limit should be OK for now. "10k cache services are enough for everyone".
Comment #36
geek-merlinHere's a Proof Of Concept that and how #34 can be done:
https://3v4l.org/MG8tb
Comment #37
rgpublicHm, geek-merlin's idea is certainly cool. Although I came to wonder why we are relying on PHP's garbage collection anyway. With its hardcoded 10k value it seems somewhat arbitrary anyway when things get purged.
We could also have 2 different bins. In one bin go normal/non-weak/"strong" references in the other bin go the weak references. At first if I set sth., it goes to the strong bin together with a value. It could be a timestamp or it could even be the number of garbage cycles (gc_status) to emulate PHP's garbage collection behavior. And then if sth. new goes in, we always compare whether some stuff in the strong bin should go to the weak bin (if some time has passed, some garbage cycles took place, whatever). This way, the conditions WHEN that actually happens could be a bit more predictable and configurable...
Just thinking out loud - ignore me if it sounds too crazy ;-)
Comment #38
mstrelan CreditAttribution: mstrelan at PreviousNext commentedI tested #33 on a batched post update hook that was running out of memory loading ~1500 nodes in batches of 50. Before the patch memory usage was increasing a few MB with every batch, after the patch the memory usage remains pretty much constant. The patch no longer applies to 9.2.x or 9.3.x.
Comment #39
ankithashettyRerolled the patch in #33 as requested in #38, thanks!
Comment #40
Wim Leers@mstrelan: thanks for the super valuable real-world testing!
Comment #41
mstrelan CreditAttribution: mstrelan commentedI should add that the update was executed via drush in case that makes a difference.
Comment #43
andypostAs 8.0 a minimum for core 10 I bet it safe to make it real
Comment #44
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commentedWe were running into memory issues on our hosted environments (the dreaded
The process has been signaled with signal "9".
error) when migrating 3.5k entities (for some reason locally drush would correctly create a new thread).With patch #39 we no longer had new threads created locally and the upgrade ran flawlessly on our hosted environment!
Comment #45
longwaveComment #46
catchHuh #44 suggests this might actually work for our purposes when the PHP docs were discouraging at best, that would be extremely good news if so!
Comment #47
longwaveOr, it was effectively acting as a null cache backend, which solved the memory problem as it wasn't actually caching anything?
Comment #48
catchOK I just checked the test failures in #39 and it's the same problem as
#26-31 (i. the test is OK and the code it's testing OK, but weakreferences aren't doing what we want). The weakreferences don't persist when the object goes out of scope, which means there's no persistence across a request, which is not useful for entity caching. So the patch is likely fixing @acbramley's memory issues because it's essentially disabling the entity static cache on the cli.
Comment #49
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commented-_- damn
Comment #51
andypostMeantime Symfony reverted weak references for DI because of https://github.com/symfony/symfony/issues/50439