Something else that is really important, but potentially can be solved in core itself for memcache's behalf:
Cache Tags can be invalidated whenever - even within transactions and that means we make old data valid again permanently - which means we serve outdated data.
e.g. Simple example:
Entity Cache:
[node:1] is written within a transaction, node tag is invalidated, but transaction is not yet committed.
Someone loads [node:1], the old node is loaded from the database and stored within memcache permanently.
Transaction is committed and from now on the new node should be shown. However it is not.
This can already happen with deletes (yes) and even with full bin flushes (yes, too).
But cache tags are everywhere in Drupal 8 and this makes the potential for race conditions much higher. And given that people get deadlocks on the DB cache tags implementation it means the race condition potential is real.
I think an after transaction commit that you can subscribe to would solve that as we could:
Invalidate the tags, and the queued deletions and the queued bin clears all at once.
Because we do not support strong DB isolation anyway (READ_COMMITTED is preferred) we do not even need to invalidate anything before transactions are committed [but we could of course do so as a best-effort for same-process reads of the data - though I am not a fan -- another possibility would be to set a different prefix for the cache during transaction (e.g. a random ID), so that the process has it's own cache key space to work with, invalidation are then propagated at the end of the transaction as usual ].
So unless someone disagrees, I think we should escalate #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock to a contributed project blocker and block a stable release on some solution getting into core for this.
Yes, Redis has the same problem, but I don't think it is good to ship something as stable, which has known race condition problems.
Especially if we have a solution that is not as complicated as cache_consistent.
| Comment | File | Size | Author |
|---|---|---|---|
| #79 | memcache-2996615-mr-22-f003a844-2.7.patch | 16.1 KB | rosk0 |
Issue fork memcache-2996615
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
catchIf we wait until the end of/after the transaction, we can still run into a race condition I think:
1. Process A saves node 1.
2. Process B loads node 1, gets old version.
3. Process A completes the transaction, invalidates cache tags.
4. Process B renders node 1, and it writes back to the render cache using the cache tag status from #3. Because the node was loaded in step 2, and is statically cached for the duration of the request, the render cache representation is stale, but the cache tags are up to date, so it's treated as valid.
And there's another one which is even harder to account for.
1. Process B loads node 1.
2. Process A saves node 1, completes the transaction, tags invalidated.
3. Process B renders node 1 - cached with stale data.
So as long as we have static caching of entities, there's always going to be this potential for a race condition, within the lifetime of one request. I don't think it really matters whether this is due to static caching or transactions given static entity caching actually creates a longer potential window for the race condition. Therefore I don't think this issue is critical (or if we decide it is, it won't be solved by the core issue).
The only thing I can think of which would solve most of these, would be forward dating the cache tags by a second or more - meaning there would be a delay clearing the cached items, but that any created around the actual moment of the tag being cleared would be more likely to get cleared since it'll sweep things up a second or two afterwards.
Comment #3
fabianx commentedDiscussed in chat: While this will allow to avoid certain race conditions we are still open to others, which are pre-existing in core.
So demoting to major.
Comment #4
m4oliveiCame here from a couple of other issues (linked as related issues). Will work on a patch for memcache modeled on the redis implementation (#3018203: Support delayed cache tag invalidation).
Comment #5
m4oliveiHere is the first piece that does direct cache item invalidation. This piece, together with the core patch in #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock fixes the race condition behavior reported in #1679344: Race condition in node_save() when not using DB for cache_field. I'll still also look at adding the delayed cache tag invalidation for memcache here like the redis patch has. I still need to study that a bit more and convince myself it's needed for memcache and how it works.
Comment #6
m4oliveiQuick and dirty update, haven't tested the memcache timestamp cache tag check sum yet.
Comment #7
m4oliveiCouple more updates:
Ready for review. We don't use the memcache cache tags service on our site, so I'm not the best person to confirm that behavior. However I will be testing the direct cache item invalidation to fix the race condition reported in #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock.
Comment #8
wim leers#2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock just landed!
I think this can probably look at #3018203: Support delayed cache tag invalidation's latest iteration for inspiration.
Comment #9
m4oliveiThanks Wim. I wrote this patch against the most recent patch that got committed to core, so hopefully the patch here is good still, but I'll make a note to check it over again in a couple days. Thanks!
Comment #10
wim leersOh, of course! Thanks :)
I think this would ideally be manually tested by multiple people.
Comment #11
m4oliveiDouble checked and this is working for me. Looking forward to hearing back from others manually testing this.
Comment #12
mh.marouan commentedI have Error:
Call to undefined method Drupal\Core\Database\Driver\mysql\Connection::addRootTransactionEndCallback() in Drupal\memcache\MemcacheBackend->deleteMultiple() (line 303 of modules/contrib/memcache/src/MemcacheBackend.php)
with Drupal 8.7.11
Comment #13
wim leers@mh.marouan: That's because #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock landed in Drupal 8.8. Update to Drupal 8.8.0 (released two weeks ago) and then this patch will work.
Comment #14
wim leers@catch, @Fabianx Any chance Tag1 could also manually test this? 🙏🤞
Comment #15
mh.marouan commentedThanks @Wim .
Comment #16
jedgar1mx commented#7 seems to fix the errors. Thanks @m4olivei
Comment #17
daffie commented@mixologic has said that he is going to work on #2949210: Drupalci Roadmap. This will make it possible to actually test Drupal with a Memcache storage. We can then see what works and doen't work with a memcache storage.
Comment #19
japerryTested as well, and looks good. I pushed the requirements for memcache to Drupal 8.8 since that is the lowest supported version of Drupal now.
Comment #21
trevorbradley commentedOK, I know posting this against a closed issue is not the best idea, and my apologies in advance for this being exceptionally vague, but I thought I should document something here in case it comes up for anyone else.
We deployed this patch against memcache 8.x-2.0 on a D8.8 production site to help with heavy server load. It appears to have wreaked havoc on the commerce system, throwing seemingly unrelated errors in profiles/commerce_recurring (reproducable), and possibly screwing up dates on recurring orders (verifying now). When we pulled the patch from the production servers, these issues appear to have gone away.
This was a multi-server environment. It felt almost like Drupal storage wasn't storing entity saves properly.
I wish I could be more specific than that. Maybe this is a commerce_recurring+memcache specific bug. Maybe this was unsafe to patch against memcache-8.x-2.0. Basically I want a place for people doing a google search to come back to in case this rears its head again.
Again, apologies - I know this is a vague report, but the severity of the bugs resolved by pulling the patch seemed worthy to report.
Was there anything else fixed since commerce 8.x-2.0 that this patch relies on to work safely?
Comment #22
nigelcunningham commentedI think I've located the issue, @TrevorBradley:
If an order is from the previous day when Drupal Commerce loads it or other criteria apply, it will seek to 'refresh' the order, resaving it from the entity load (that alone sounds bogus to me but...). The presave hook invokes loadUnchanged to allow hooks to compare the modified version of an entity with the original. As part of that, the persistentCache is reset. Or it should be. With this patch applied, memcache backend's deleteMultiple implementation looks at whether we're in a transaction and doesn't remove the data if that's the case, so the original entity is then returned when later calls seek to load the updated entity, resulting in ... (return to the start of the paragraph)
Comment #23
japerryMarking needs work and reverting this issue. Its causing massive issues and clearly the tests aren't sufficient.
Comment #25
guptahemant commentedComment #26
josh waihi commentedComment #27
josh waihi commentedI haven't tested this yet but wanted to post it here incase it helps anyone. I've written a transaction-aware memcache backend that essentially:
This could further be enhanced by using the MemoryBackend for get and set calls instead of just missing all the time.
Comment #28
andypostRe #27 it looks promising and could save some carbon
Comment #29
josh waihi commentedUpdates:
Comment #30
tobiberlinWe run into several issues which I think are related to this. One fox eample: when an answer to a question is published on our platform we calcluate all questions and answers of the politician on hook_entity_update. But when all published answers are loaded during hook_entity_update with EntityQuery memcache does not take the just published answer into account. The result is that the last pubslished answer is never counted. I used the patch from #7 but it does not change this. I did not apply the patch from #29 as I do not know how to tell Drupal to use the class from this patch?!
Comment #31
tobiberlinSince I applied the patch from #7 I find many files in file_managed with status = 0 although they should have status = 1. I am not sure if it is really connected to that patch but I am quite sure... this new issue just appeared since the patch was deployed... just in case it is helpful.
Comment #32
azinck commented@tobiberlin: do not use #7, it is known to cause massive issues and it absolutely corrupts data. That’s why it was backed out of the module after having been committed.
Comment #33
tobiberlin@azinck thank you I already removed it again.
Comment #34
wim leers#27: 🤩 Wow!
I'm really curious what @Berdir thinks about this 🤔 🤞
#30: Did you try #29? I have very high expectations of this implementation :)
Übernit: two newlines instead of one.
Übernit: s/false/FALSE/
Nit:
string $bin(i.e. add type hint).Very elegant! 🤩
Nit: Could use a
boolreturn type hint.Übernit: s/true/TRUE/
Übernit:
string $method.Nice! This is combined with my "elegant!" remark form the heart of this approach.
Since I only have nits, and this really just needs manual testing in the wild, I'm marking this 👍
Comment #35
josh waihi commentedThanks Wim, I'll update with fixes but I wanted to update the latest patch which actually has some testing done to it and uses the transaction hook from the DB connection instead of detecting it manually.
Comment #36
omar alahmed@Josh I had this issue after updating from 2.2 to 2.3 so I applied the patch in #35 on core 8.9.20 and Memcache 8.x-2.3 and then flushed all the Memcache servers but it didn't work (after updating any node, the old revision is displayed even for the admins).
So I had to downgrade to 8.x-2.2 and the issue has been resolved. However, I'm not sure if the patch needs work!
Comment #37
twod@Omar Did you also change the cache backend(s) to
cache.backend.memcache_transaction_aware?Comment #38
megha_kundar commentedHi @TwoD We had same issue we have also changed cache backend to cache.backend.memcache_transaction_aware and applied patch #35 It didn't work for me.
Comment #39
omar alahmed@TwoD I don't think so. Our drupal 8 application is hosted on Acquia and we followed their docs here: https://docs.acquia.com/cloud-platform/performance/memcached/enable/
I checked it and I could not find
memcache_transaction_aware.What do you suggest?
Comment #40
kekkisRerolled #35 against the latest released version (8.x-2.5). The services.yml had gotten an additional argument for the
cache.backend.memcacheservice which was however not (at least no longer) used at all in the implementation ofDrupal\memcache\MemcacheBackendFactory. Based on that fact alone, I decided not to add the same argument in thecache.backend.memcache_transaction_awareservice. Not to mention how different the backend setup actually is. :-)Ah, sorry for the missing interdiff. Adding this as HTML here instead.
Comment #41
skylord commentedTried #40 with "cache.backend.memcache_transaction_aware" as backend on site with huge traffic and slow DB (so the chance for race conditions is bigger) - no way, still the same behaviour. Node saving process started, cache is invalidated first, someone requests page, cache is populated with old data, then node actually saves in DB and we got stale cache until i manually invalidate appropriate cache tag using drush... With DB or Redis backend all is OK on that site!
Comment #42
arthur.baghdasar commentedWe are seeing same behaviour as per #41
Comment #43
weekbeforenextThe patch in comment #40 worked for me. I had reported an issue with the Diff module (#3377875: Stale cache when latest revision is published (content moderation)) which ended up being related to Memcached. I'm not sure of the overall implications of this patch and it's affects on performance, but it does resolve what I described in the related issue.
Edited: Spoke too soon. Seeing some slowness in node editing and I was able to reproduce my problem issue. :(
Comment #44
weekbeforenextComment #45
ericgsmith commentedWe have been encountering what I suspect to be this issue. I was able to reproduce the same issue using the patch in 40 locally, and tracked it down to be stale data in the entity cache. When reset cache is called by entity storage, it is calling deleteMultiple which appeared to be happening immediately and not buffering until the transaction was complete.
Looking through it, it looks like implementation of deleteMultiple was just missing from the transaction aware backend - added it and can no longer reproduce what I was encountering. Still doing some testing but it looks really good - thank you Josh!
Would be interested if the deleteMultiple / entity cache issue is what was causing the continued issues in 36, 38, 41 and 43.
Comment #47
ericgsmith commentedMove the MR and made an attempt at adding some tests.
Unfortunately the existing MemcacheBackendTest is failing - and this test will have the exact same failures. I posted here https://www.drupal.org/project/memcache/issues/3397864#comment-15438674 but could not find an open issue where the tests are still being worked on.
Ignoring the pre failing tests - the 2 new test methods I've added are very rough, they pass locally but I'm not sure what the state of CI will be. They need work, so leaving as needs work.
Comment #48
ericgsmith commentedTest failures for both MemcacheBackendTest and TransactionAwareMemcacheBackendTest both reduced by changing the tolerance to 0 for the timestamp invalidator.
Setting back to needs review as I'm drifting into changing existing test failures here. The additional tests cover the get set and delete during transaction and with a rollback. This isn't full test coverage but I think with the existing failures its hard to press through without full confidence in them. I think getting the existing tests all passing in CI in a different issue will make this easier to review.
I'm still very interested in if the fix in #45 will resolve the errors reported in #36, #38, #41 and #43.
Comment #49
ericgsmith commentedAdding #2996055: Test coverage as a related issue to track existing issues with test failures.
Comment #50
btully commentedI can confirm that patch #45 appears to have solved the issue of stale content for a site using considerable nested entities (paragraphs), using cache.backend.memcache as their default backend and where a node edit form might take 10 seconds or more to submit while the node being edited is receiving traffic.
To reproduce the issue I developed a simple JMeter test to request the test page as an anonymous user with a cachebusing query string. The test runs with 10 concurrent users.
While the load test is running, I manually edit the test page, which contains paragraphs. I edit one of the paragraph components and update the title and kicker fields. I submit the node edit form.
Prior to the patch, the node edit form would submit, but the embedded paragraph form fields that were updated do not appear on page load AND/OR when reloading the node edit page. It seems as though the race condition of a simultaneous anonymouse user request for the page has memcache get the old/stale values before they can be deleted and populated with the new values.
After applying the patch and changing the default cache backend from cache.backend.memcache to cache.backend.memcache_transaction_aware, and clearing Drupal and Varnish cache, I repeated my reproducible steps.
With the patch applied, I immediately saw the new values for both fields as both an authenticated and anonymous user. Reloading the node edit form also showed the updated values.
I went through the reproducible steps 3 times, and all 3 times the new values were displaying, which leads me to believe this patch (#45) is working as expected!
Comment #51
btully commentedI take my last comment back. While the patch appeared to be working for about a week or so, I am suddenly seeing the following error even though the service and FactoryInterface are present in the codebase:
You have requested a non-existent service "cache.backend.memcache_transaction_aware".
Is there a specific place the `cache.backend.memcache_transaction_aware` needs to be defined? I have the following in my in docroot/sites/settings/global.settings.php:
$settings['cache']['default'] = 'cache.backend.memcache_transaction_aware';
Like I said, it was working for about a week, but with a recent deployment that changed the code to a branch without the patch, now when we switch the code back to the branch with the patch (and settings) we are getting the above error.
Is there someplace else the new service needs to be explicitly defined/injected?
Memcache module is enabled and I can see the new cache.backend.memcache_transaction_aware service in memcache.services.yml
But for some reason, even after a cache rebuild, Drupal is saying the service isn't defined.
Could opcache be to blame?
Comment #52
suresh7787 commentedHi All,
I have a quick question regarding this. I applied the patch to my site, which is highly transactional, and it successfully resolved the stale content issue. However, I occasionally encounter cache corruption (cause unknown), and I suspect it might be due to a race condition or something similar.
When I reviewed the cache bins and their associated classes, I noticed that critical bins like `bootstrap`, `discovery`, and `config` are still using the default Memcache backend instead of the transaction-aware backend.
The cache corruption predominantly affects `config` and `discovery`, and clearing the cache resolves the issue temporarily. However, the problem reoccurs over time.
Could you please advise if moving these critical bins to the transaction-aware backend would help prevent this cache corruption?
Thanks!
Comment #53
sachin.bansal27 commentedDo We need to make any change in bootstrap container?
$memcache_exists = class_exists('Memcache', FALSE);
$memcached_exists = class_exists('Memcached', FALSE);
if ($memcache_exists || $memcached_exists) {
$class_loader->addPsr4('Drupal\\memcache\\', $app_root . '/modules/composer/memcache/src');
// Define custom bootstrap container definition to use Memcache for cache.container.
$settings['bootstrap_container_definition'] = [
'parameters' => [],
'services' => [
'database' => [
'class' => 'Drupal\Core\Database\Connection',
'factory' => 'Drupal\Core\Database\Database::getConnection',
'arguments' => ['default'],
],
'settings' => [
'class' => 'Drupal\Core\Site\Settings',
'factory' => 'Drupal\Core\Site\Settings::getInstance',
],
'memcache.settings' => [
'class' => 'Drupal\memcache\MemcacheSettings',
'arguments' => ['@settings'],
],
'memcache.factory' => [
'class' => 'Drupal\memcache\Driver\MemcacheDriverFactory',
'arguments' => ['@memcache.settings'],
],
'memcache.timestamp.invalidator.bin' => [
'class' => 'Drupal\memcache\Invalidator\MemcacheTimestampInvalidator',
# Adjust tolerance factor as appropriate when not running memcache on localhost.
'arguments' => ['@memcache.factory', 'memcache_bin_timestamps', 0.001],
],
'memcache.backend.cache.container' => [
'class' => 'Drupal\memcache\DrupalMemcacheInterface',
'factory' => ['@memcache.factory', 'get'],
'arguments' => ['container'],
],
'cache_tags_provider.container' => [
'class' => 'Drupal\Core\Cache\DatabaseCacheTagsChecksum',
'arguments' => ['@database'],
],
'cache.container' => [
'class' => 'Drupal\memcache\MemcacheBackend',
'arguments' => ['container', '@memcache.backend.cache.container', '@cache_tags_provider.container', '@memcache.timestamp.invalidator.bin'],
],
],
];
}
Comment #54
deepakrmklm commentedHi All,
Patch from #45 works well, if we override all bins `default`, `bootstrap`, `discovery`, and `config` to Memcache trasaction aware backend.
Thanks.
Comment #55
rich.3po commentedFollowing up on the case referenced in #54 and #51 (which is the same application).
We believe comment #51 can be disregarded, as we only observed this at one stage, and we believe this was down to a deployment or cache flushing issue of some kind.
So in summary, the patch in #45 seems to be working well, so long as cache bins which have a
default_backendset are explicitly overridden in settings.php.See https://www.drupal.org/node/2754947.
e.g for the
discoverybin:Can the module maintainers please comment on the overall approach used here, and advise if this patch is likely to make it back into the main project?
Comment #56
janusman commentedCan report a client used the patch successfully.
The overrides that were added into settings.php to use the new backend added by the patch was this section:
Comment #57
chris burge commentedFollowing up from #50/51 here. (@btully - Thank for your those testing steps!)
Steps to reproduce:
cache.backend.memcacheforbootstrap,discovery,config, anddefaultcache bins.hook_entity_update()function withsleep(2);to emulate real-world entity save.Here's the
hook_entity_update()code:Next, I configured JMeter with a simple load test - 10 concurrent requests with random query strings appended to the request URL.
Next, I deployed the MR code and retested:
cache.backend.memcache_transaction_awarecache backend.This testing indicates that the MR is effective.
Comment #58
chris burge commentedSetting to 'Needs work' due to random test failures:
Pass: https://git.drupalcode.org/issue/memcache-2996615/-/jobs/5137611
Fail: https://git.drupalcode.org/issue/memcache-2996615/-/jobs/5137778
Comment #59
mgam737 commentedWe also have applied the patch and can confirm that it addressed the issue as well.
Comment #61
idebr commentedTests are failing on Drupal 11.2 regardless, see #3538514: Fix tests on 11.2
Comment #62
chris burge commented@idebr - Thanks for sorting out those tests! Setting to RTBC (although, I assume we'll need passing tests being merging).
I can also confirm the patch works. A customer of mine was experiencing an issue where content authors would need to double save content in order for content updates to be reflected. The race condition was triggered by steady traffic to the site + content updates. With a patch from this MR deployed, the issue stopped presenting.
Comment #63
himanshu5050 commentedConnection::addRootTransactionEndCallback is deprecated and removed in Drupal 11.x, so re-applied the patch for D11 compatibility.
See here - https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Database%...
Comment #64
himanshu5050 commentedComment #65
himanshu5050 commentedComment #66
chris burge commented@himanshu5050 - Can you update MR22? With the move to GitLab, only MRs are being committed now.
Comment #68
minoroffense commentedI added the changes from https://www.drupal.org/project/memcache/issues/2996615#comment-16232688 into the MR as per https://www.drupal.org/project/memcache/issues/2996615#comment-16232773
Comment #69
ericgsmith commentedRestoring the previous RTBC status - #63 was providing comment / fix to patch 45 which was more than 2 years older than the MR - it has been already fixed a long time ago. 7cbb399fbcd6a589a9665c17ed57bfdc697bd876 just introduces formatting change. Acknowledging that tests currently fail as previously mentioned in (#62).
We have been using this patch for several years now and now on multiple projects where we have encountered similar issues to the many described in this thread, it would be awesome to see this committed.
Hiding all patches to hopefully prevent further confusion on the issue.
Comment #71
rosk0I've noticed that core compatibility constraint wasn't changed while using transaction manager added in 10.2.
Updated accordingly. No functional changes.
Comment #72
rosk0Addressing #55 and #56 about the correct configuration - README is correct and the only change in the
$settings.phprequired to send all cache bins to the Memecached is$settings['cache']['default'] = 'cache.backend.memcache_transaction_aware';. That is because the three services (cache.bootstrap,cache.config,cache.discovery) that currently declaring adefault_backend, are usingcache.backend.chainedfastbackend which uses the$settings['cache']['default']as consistent service and APCu, if available, as a fast cache backend.If you declare cache backend for those bins explicitly, you are adding a network request to a Memcached server for every
$cache->get()without any good reason, defying the purpose of thecache.backend.chainedfastbackend.See http://api.drupal.org/ChainedFastBackend for details.
Comment #73
rosk0#3538514: Fix tests on 11.2 has been merged rendering MR unmergeable - rebased.
Noticed that the service introduced here uses deprecated in 11.3 dependency
@cache.backend.memory- fixed that .See https://www.drupal.org/node/3546856 .Comment #74
rosk0Crosslinking and file cleanup.
Comment #75
rosk0Patch generated directly from the merge request could not be applied to the latest 2.7 release so here is the one that could be. It is generated from the latest commit 31d673eb -
chore: Update compatible versions.Comment #76
ericgsmith commentedComment #77
rosk0Tests are obviously flaky - from three jobs one succeeded and two failed with different results:
Comment #78
andypostCurious why I get
The current user is not authorized to access the job log.Comment #79
rosk0New patch version.
Comment #80
ericgsmith commentedLatest changes look good.
I have retested to verify I can still reproduce the issue without the patch, and that the patch still resolves the issue.
Test failures do look random - I reran the pipeline a few times and the unchanged MemcacheBackend tests occasionally failed alongside the new TransactionAwareMemcacheBackend with inconsistent results. I think there is still an unreliable nature to the tests - setting up a scheduled pipeline for the project would be good so that we can see more of a pattern - I note currently there doesn't not appear to be any regular pipelines running.
Good notes on the open MR comment - I'll leave it open in case somebody else can weigh in, but looks resolved from my perspective.
Comment #81
damienmckennaWould it be worth updating README.md to note the recommended usage of the new cache backend?
Comment #82
chris burge commented@damienmckenna - That's something I've gone back and forth on. On one hand, a transaction-aware backend fixes a pretty significant bug. On the other hand, should we merge this and let it live in a wild before making it the default recommendation? Arguably, the non-transaction-aware backend should be replaced entirely.
I do have a customer who had deployed this MR on a fairly major Drupal application, which runs on 10x r5a.4xlarge AWS instances. They handle a significant volume of traffic (60,000-70,000 requests requests per hour), and we haven't observed any adverse affects from the MR.
Comment #83
damienmckennaShould we update the MR to replace the existing plugin?
Comment #84
chris burge commentedThat makes sense to me. The MR has been battle tested (i.e. we have sites using it successfully, including large sites). And that would eliminate technical debt that would otherwise be created for site owners (i.e. needing to switch the backend in settings.php).
Comment #85
berdirI have not reviewed the implementation and have no experience using memcache on D8+, what I can say as redis maintainer is that both redis and the core changes never considered this as an optional feature but as a bugfix and the only correct thing to do implementation.
That said, the redis implementation is significantly different from this. Redis only considers specific operations, namely cache tag invalidations and deletions as a problem during transactions and not all implementations. That is because only those are known to cause issues when invalidating/deleting during an active transaction before the updated data might be visible to others. That has been in use since 2019/2020 and I've never had reports of any inconsistencies since then.
It's also worth noting that the redis implementation explicitly avoids injecting the database connection because it is not possible to access the database connection object without automatically opening a connection to the database. That means the connection is opened for example for a page cache hit.
And a last note on memory cache. I didn't review how that is being used here, but note that only the service and not \Drupal\Core\Cache\MemoryBackend has been deprecated. They are on purpose different. \Drupal\Core\Cache\MemoryBackend behaves like a regular backend in that items in it are serialized and unserialized and is meant as a transparent replacement for a real cache, for example during kernel tests. While MemoryCache does *not* serialize entries and is meant to be used as an in-memory cache and replacement for things like drupal_static() or a property on a service. I don't know which implementation is appropriate here. The service has been deprecated to discourage using MemoryBackend as a fast in-memory cache.
Comment #86
pesonenv commentedUsing this patch led into Redirect module's "Automatically create redirects when URL aliases are changed." feature to break. Next idea was to set
$settings['cache']['bins']['entity'] = 'cache.backend.database';as that seemed to fix the issue with automatic creation of redirects. Realised setting that probably doesn't help with the problem of sometimes having to double save content as this might be happening with core as well according to https://www.drupal.org/project/drupal/issues/3474843.Comment #87
jonhattanProblem reported by @pesonenv is generated by the use of the `cache.backend.memory.memory`. This error is not present when using `cache.backend.memory`, so I guess we should use \Drupal\Core\Cache\MemoryBackend somehow.
Comment #88
ericgsmith commentedSeeing a first side effect using the transaction aware backend.
Upgrade site we have been using this on to Drupal 11.3. Seemed fine but later discovered issues when deleting entities we were seeing a fatal error:
Fatal error: Cannot switch fibers in current execution contextWhat was happening was
- inside a hook delete, another set of entities was being loaded (in order to be deleted)
- loading the entities calls cache set for the entity cache during the transaction
- one of the fields on the entity was a computed field (metatag)
- when the buffer is cleared / actual call to memcache happens setting the cache, that computed field was being computed which was calling render->executeInRenderContext which was trying to start a fiber and threw that exception.
Not sure why exactly (fibers 🤷 😕), but this exception is not thrown if we don't buffer sets (as how Redis works, mentioned in comment #85).
Not sure exactly what is happening there - but a heads up for anybody that is also using this patch with metatag - in this case it was things like redirect and menu link content which I didn't even expect to have a metatag computed field on them (raised #3587766: Only add base field to entity types that need it for that)
Comment #89
ericgsmith commentedAfter doing a bit more reading on Fibers and reproducing the bug, I can see why its happening.
The post transaction callback is executed from within the destructor - https://git.drupalcode.org/project/drupal/-/blob/11.3.x/core/lib/Drupal/...
Were are using php8.3 which notes:
Source: https://www.php.net/manual/en/language.fibers.php
The metatag's computed field is calling executeInRenderContext which is what tries to start a fibre - https://git.drupalcode.org/project/metatag/-/blob/2.2.x/src/Plugin/Field...
So when the entity is set to the backend during a transaction its buffered, then when set to memcached its serialized which tries to compute the metatag field / start the Fibre and 💥
So - this is more of a warning for anybody in that same set of circumstances (this patch, metatag and php8.3)
Stacktrace for anybody interested:
Comment #90
damienmckenna@ericgsmith: If you upgrade to PHP 8.4 does the problem go away?
Comment #91
ericgsmith commentedI strongly assume it would yes, have not tested as not an option for this client at this point in time.
Comment #92
ericgsmith commentedSecond side effect I have noticed using this patch.
I noticed the cached value of a taxonomy term entity had a value for
->_referringItemas a node object.Tracked it down to when
auto_entitylabel_entity_insertis saving a node the tokens loaded the taxonomy term - which queued the term to be set to cache. Later during the transaction the field tokens are generated for the node, one of which was rendering the the taxonomy term using the default label formatter, which sets this magic property to the taxonomy term object. Now when the transactions completes the taxonomy term object that was queued in the transaction buffer has the _referringItem value on it.I haven't noticed a negative side effect, I'm sure there could be one , but does not seem right that this value is cached as part of the entity.
I wonder if since core is adding it, core should remove in as part of
__sleepOr pehaps when buffering a set call, we could clone the data arg if its an object - not sure how efficient that is but it also makes me wonder what else might interact with the object during the transaction that we don't expected to end up on the cached object.
Comment #93
ericgsmith commentedOk - so looking into this today deeper regarding comments on #86, #87 and #85.
The differences between MemoryBackend and MemoryCache are causing the side effects noted in the issue with redirect. Debugging I can see that when redirect runs that the
originalEntityproperty of the alias entity is the new entity - e.g with the new alias set. This causes the check to fail and no redirect created. We do not see this with MemoryBackend as the entity is serialised when set.I think the key part that is breaking with memory cache is the assumption in the entity storage loadUnchanged method - which says:
Optimize this for content entities by trying to load them directly from the persistent cache again, as in contrast to the static cache the persistent one will never be changed until the entity is saved.With memory cache this can be changed prior to saving as we are returning the object that was originally set, which could have been changed at this point.
I think the change to use memory cache must be reverted, but we are going against the grain of the CR which states this about memory backend (#3546856: New cache.memory cache bin, replaces cache.static, MemoryCacheInterface alias deprecated
But in this case, I think we are after a drop in replacement for regular runtime operations.
With this change reverted the redirect bug is resolved - but the issues I outlined above around mutation of the object / referringItem remain - as while the memory backend serialises at the time we call it, the original entity still hangs about in the buffer until flushed and so can still end up persisting data that has changed after it was set to cache.
Comment #94
slejnej commentedTested the patch on a big site and we've encountered loop problems reported by xdebug when calling
MemcacheBackend->getMultiple()Updating with following resolved the issue and caching split between "the 3":
Comment #95
ericgsmith commentedA colleague also noted that with the latest version of pathauto (8.x-1.15) there were receiving a 404 after creating a new node The change in path auto to add a message on save is somehow (with this patch) poisoning the route cache.
Comment #96
bohemier commentedData loss on nested entity saves when patch is applied to entity/data bins
We hit a reproducible data-loss regression that I want to add as a data point. Symptoms match comments #21, #22, #30, #31 (commerce havoc, hook_entity_update not seeing fresh data, silent file_managed corruption), but with a clean repro pattern that might help narrow the root cause.
Setup
cache.backend.memcache_transaction_awareservice.Reproducible failure
We have a parent–child webform pattern: a child submission's
postSave()hook attaches the child's ID to the parent submission'sinvoices(entity-reference, multiple) field, then calls$parent->save(). With the patch applied to entity/data bins:$parent->setData([...'invoices' => ['child_id']])— set in memory, verified via direct property dump.$parent->save()— runs, returns no error.webform_submission_dataforinvoicesis empty afterwards.Adding logging inside a webform handler's
preSave()on the parent, we see$webform_submission->getData()['invoices']returns[]even thoughsetData()was called with['child_id']immediately prior. The in-flight value is lost somewhere in the save pipeline — somewhere betweensetData()andWebformSubmissionStorage::saveData()'s call togetData(). A direct DB write (bypassing$entity->save()entirely) persists correctly, confirming this is pipeline-only.The failure is systematic, not intermittent. The same code works without the patch.
Workaround
Scope the patch to only the bins it was designed for. We now use:
Critically:
cache.defaultmust NOT be the transaction-aware backend, otherwise entity/data (and any other unlisted bin) inherit it. This is a footgun worth documenting — the guidance in #72 implicitly hits this trap.With this scoping, the original cim/cr fix this patch is designed for still works, and the parent–child save pattern persists correctly.
Suggestion
entity/databins (commerce, webform, anywhere$entity->save()is called from inside another entity's save hooks) would save several days of debugging.deleteMultiplesemantics are fully resolved, the safe scope appears to be config / render / page caches only.