Problem/Motivation
cache.config uses the chained_fast backend by default, which leverages APCu. When multiple web servers run in parallel, each of them has its own APCu. If a config gets updated on one server, the outdated cache entries of that config remain valid on the other servers.
Steps to reproduce
Set up Drupal with multiple web servers.
The Webform contrib module stores webforms as config entities. Create a webform and make sure to open it via every server.
Now modify the webform on one server and save it. The other servers will still deliver the old invalid webform until all caches get cleared.
Proposed resolution
The problem is that ConfigEntityStorage and CachedStorage store all configs to cache.config without cache tags. But APCu relies on such cache tags to calculate the invalidation checksum. Without these tags, the checksum is always 0.
The config cache entries require a self-reference as cache tag.
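The checksum argument above can be illustrated with a minimal sketch. It assumes a simplified model in which each tag has a shared invalidation counter and an item stays valid while the sum of its tags' counters matches the value stored at write time. The helper names and the tag `config:webform.example` are hypothetical and only serve the illustration; this is not Drupal's actual code.

```python
# Shared invalidation counters: tag -> number of invalidations so far.
invalidations = {}


def checksum(tags):
    """Sum of the invalidation counters of the given tags (0 for no tags)."""
    return sum(invalidations.get(tag, 0) for tag in tags)


def invalidate(tags):
    """Bump the counter of every given tag."""
    for tag in tags:
        invalidations[tag] = invalidations.get(tag, 0) + 1


def is_valid(item):
    """An item is valid while its stored checksum matches the current one."""
    return item["checksum"] == checksum(item["tags"])


# Hypothetical tag name, used only for this example.
tag = "config:webform.example"
untagged = {"tags": [], "checksum": checksum([])}
tagged = {"tags": [tag], "checksum": checksum([tag])}

invalidate([tag])

untagged_still_valid = is_valid(untagged)  # no tags: checksum stays 0
tagged_still_valid = is_valid(tagged)      # counter changed: item is stale
```

In this model an entry without tags can never be invalidated through tags, which is the behaviour the report attributes to the config cache entries.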
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #5 | 3402161_5.patch | 1.99 KB | mkalkbrenner |
| #2 | 3402161.patch | 1.99 KB | mkalkbrenner |
Comments
Comment #2
mkalkbrenner
Comment #3
mkalkbrenner
Comment #4
needs-review-queue-bot commented
The Needs Review Queue Bot tested this issue.
While you are making the above changes, we recommend that you convert this patch to a merge request. Merge requests are preferred over patches. Be sure to hide the old patch files as well. (Converting an issue to a merge request without other contributions to the issue will not receive credit.)
Comment #5
mkalkbrenner
Fixed whitespace.
Comment #6
smustgrave commented
It is recommended to use MRs now, as patches are being phased out.
As a bug, this will also need a test case showing the problem.
Thanks
Comment #7
longwave
As this involves multiple web servers with individual APCu caches, it will likely be hard or impossible to write an automated test for.
But I'm amazed this hasn't been spotted before if this is a bug on all config objects?
Comment #8
luke.leber
We've seen all manner of random, inexplicable weirdness with multi-web-server setups in Acquia Cloud Enterprise. We've blamed Memcache primarily, but this could be equally as likely to toss monkey wrenches around if it can be reproduced.
Comment #9
berdir
This was discussed quite a bit in Slack.
The fix is definitely not correct, and a) *should* not do anything on 10.1 and lower as cache tags are stripped from the fast backend and b) causes a severe performance regression on 10.2 where cache tags are kept, and then each fast lookup would need an extra lookup against the cache tag invalidation service.
We don't do 1:1 cache tags that are identical to the cache key, just like entity storage caches don't use the cache tag either.
Please do _not_ use this patch :)
My only idea is that something is wrong with the setup that causes the fast chained backend to not work as expected.
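For reference, a minimal sketch of how the chained fast backend is expected to keep per-server caches in sync without cache tags. It assumes a simplified model: every write records a "last write" timestamp in the shared consistent backend, and a server only trusts a local fast entry created at or after that timestamp. The class names, the tick-based clock and the cache ID `webform.example` are hypothetical; the real ChainedFastBackend differs in detail.

```python
import itertools


class ConsistentBackend:
    """Shared store that every web server talks to (e.g. the database)."""

    def __init__(self):
        self.items = {}
        self.last_write = 0


class ChainedFastSketch:
    """One instance per web server; `fast` stands in for that server's APCu."""

    def __init__(self, consistent, clock):
        self.consistent = consistent
        self.clock = clock
        self.fast = {}  # cid -> (created, data)

    def get(self, cid):
        hit = self.fast.get(cid)
        # A local entry only counts if it is not older than the last shared write.
        if hit is not None and hit[0] >= self.consistent.last_write:
            return hit[1]
        data = self.consistent.items.get(cid)
        if data is not None:
            self.fast[cid] = (self.clock(), data)
        return data

    def set(self, cid, data):
        self.consistent.items[cid] = data
        # Marks the fast entries on every server as outdated.
        self.consistent.last_write = self.clock()
        self.fast.pop(cid, None)


# Deterministic stand-in for synchronized server clocks.
ticks = itertools.count(1)
clock = lambda: next(ticks)

shared = ConsistentBackend()
server_a = ChainedFastSketch(shared, clock)
server_b = ChainedFastSketch(shared, clock)

server_a.set("webform.example", "v1")
seen_by_b_before = server_b.get("webform.example")  # B caches "v1" locally
server_a.set("webform.example", "v2")               # written on A only
seen_by_b_after = server_b.get("webform.example")   # B's local copy is rejected
```

Under these assumptions server B picks up the new value on its next read, which is why a stale webform on the other servers points to a setup problem rather than to missing cache tags.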
Memcache: AFAIK the race condition that was fixed in core/database and redis around cache tag invalidation during database transactions was never fixed in Memcache, so I'd absolutely expect random issues there.
Comment #11
smustgrave commented
So I think this one should be closed?
Comment #12
catch
If the consistent backend is Memcache, it is more likely to be #2996615: Transaction support for cache (tags) invalidation.
The only other possibility I can think of would be significant clock drift between the servers so that the fast backend timestamp doesn't work, but that's not covered by this approach, we'd need to change the timestamp to some kind of checksum/counter.
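The clock drift scenario can be sketched as follows, assuming a fast entry counts as valid when its creation timestamp is not older than the shared last-write timestamp. The drift of 60 seconds, the timestamps and all names are hypothetical. The second half shows one possible shape of the counter idea; it is an illustration only, not an implemented or proposed core change.

```python
def fast_hit_is_valid(created, last_write):
    """Timestamp rule: the local entry must not predate the last shared write."""
    return created >= last_write


# Hypothetical numbers: server B's clock runs 60 seconds ahead of server A's.
DRIFT = 60.0
created_on_b = 90.0 + DRIFT   # B cached the item at real time 90, stamped 150
last_write_from_a = 100.0     # A saved the config at real time 100

# B's entry is really older than the write, yet it passes the check.
stale_entry_accepted = fast_hit_is_valid(created_on_b, last_write_from_a)


class VersionedBin:
    """Counter variant: every write bumps a shared version number."""

    def __init__(self):
        self.version = 0

    def mark_outdated(self):
        self.version += 1


def fast_hit_is_current(cached_version, bin_state):
    """A local entry is valid only for the version it was cached under."""
    return cached_version == bin_state.version


bin_state = VersionedBin()
version_cached_on_b = bin_state.version  # B caches the item under version 0
bin_state.mark_outdated()                # A saves the config
stale_entry_accepted_by_counter = fast_hit_is_current(version_cached_on_b, bin_state)
```

Because the counter comparison does not involve any server clock, drift between the machines cannot make an outdated entry look fresh.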
Closing this as cannot reproduce.