Problem/Motivation

cache.config uses the chained_fast backend by default, which leverages APCu. When multiple web servers run in parallel, each of them has its own APCu cache. If a config object is updated on one server, the outdated cache entries for that config remain valid on the other servers.

Steps to reproduce

Set up Drupal with multiple web servers.
The Webform contrib module stores webforms as config entities. Create a webform and make sure to open it via every server.
Now modify the webform on one server and save it. The other servers will still deliver the old, outdated webform until all caches are cleared.

Proposed resolution

The problem is that ConfigEntityStorage and CachedStorage write all config to cache.config without cache tags, but APCu relies on such cache tags to calculate the invalidation checksum. Without these tags, the checksum is always 0.
The config cache entries therefore need a self-reference as a cache tag.
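
The checksum argument above can be sketched with a toy model. This is only an illustration of the reasoning in this summary, not Drupal's implementation: the class, the function, and the tag name `config:webform.webform.contact` are all made up for the example.

```python
from collections import defaultdict


class TagChecksum:
    """Toy stand-in for a cache tag invalidation counter store (hypothetical)."""

    def __init__(self):
        self.invalidations = defaultdict(int)

    def invalidate(self, tags):
        # Each invalidation bumps the counter of every affected tag.
        for tag in tags:
            self.invalidations[tag] += 1

    def current(self, tags):
        # The checksum is the sum of the counters; with no tags it is always 0.
        return sum(self.invalidations[tag] for tag in tags)


def is_valid(item, checksums):
    # An item counts as valid while its stored checksum matches the current one.
    return item["checksum"] == checksums.current(item["tags"])


checksums = TagChecksum()
tag = "config:webform.webform.contact"  # hypothetical tag name

untagged = {"data": "old", "tags": [], "checksum": checksums.current([])}
tagged = {"data": "old", "tags": [tag], "checksum": checksums.current([tag])}

checksums.invalidate([tag])

untagged_still_valid = is_valid(untagged, checksums)  # checksum stays 0
tagged_still_valid = is_valid(tagged, checksums)      # checksum moved from 0 to 1
```

In this model the untagged entry can never be invalidated through its tags, which is the behavior the summary describes.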

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Comment  File  Size  Author
#5  3402161_5.patch  1.99 KB  mkalkbrenner
#2  3402161.patch  1.99 KB  mkalkbrenner

Comments

mkalkbrenner created an issue. See original summary.

mkalkbrenner commented:

Issue summary: View changes
Attached file: new, 1.99 KB
mkalkbrenner commented:

Status: Active » Needs review
needs-review-queue-bot commented:

Status: Needs review » Needs work

The Needs Review Queue Bot tested this issue.

While you are making the above changes, we recommend that you convert this patch to a merge request. Merge requests are preferred over patches. Be sure to hide the old patch files as well. (Converting an issue to a merge request without other contributions to the issue will not receive credit.)

mkalkbrenner commented:

Status: Needs work » Needs review
Attached file: new, 1.99 KB

Fixed whitespace.

smustgrave commented:

Status: Needs review » Needs work

It is recommended to use MRs now, as patches are being phased out.

As this is a bug, it will also need a test case showing the problem.

Thanks

longwave commented:

As this involves multiple web servers with individual APCu caches, it will likely be hard or impossible to write an automated test for it.

But I'm amazed this hasn't been spotted before if this is a bug on all config objects?

luke.leber commented:

We've seen all manner of random, inexplicable weirdness with multi-web-server setups in Acquia Cloud Enterprise. We've blamed Memcache primarily, but this could be equally as likely to toss monkey wrenches around if it can be reproduced.

berdir commented:

Status: Needs work » Postponed (maintainer needs more info)

This was discussed quite a bit in Slack.

The fix is definitely not correct: a) it *should* not do anything on 10.1 and lower, as cache tags are stripped from the fast backend, and b) it causes a severe performance regression on 10.2, where cache tags are kept, because each fast lookup would then need an extra lookup against the cache tag invalidation service.

We don't do 1:1 cache tags that are identical to the cache key, just like entity storage caches don't use the cache tag either.

Please do _not_ use this patch :)

My only idea is that something is wrong with the setup that causes the chained fast backend to not work as expected.
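
For readers unfamiliar with the mechanism, here is a rough sketch of how a chained fast backend is expected to keep per-server caches in sync without cache tags, as I understand it: a last-write marker lives in the shared consistent backend, and fast-backend entries created before that marker are discarded. This is a heavily simplified model with made-up class names and a logical clock, not the actual core code.

```python
import itertools


class ConsistentBackend:
    """Shared by all web servers, e.g. the database (toy model)."""

    def __init__(self):
        self.items = {}
        self.last_write = 0


class ChainedFast:
    """One instance per web server; `fast` stands in for that server's APCu."""

    def __init__(self, consistent, clock):
        self.consistent = consistent
        self.clock = clock
        self.fast = {}

    def get(self, key):
        last_write = self.consistent.last_write
        hit = self.fast.get(key)
        # A fast entry is only trusted if it was created after the last write.
        if hit is not None and hit["created"] > last_write:
            return hit["data"]
        data = self.consistent.items.get(key)
        if data is not None:
            self.fast[key] = {"data": data, "created": self.clock()}
        return data

    def set(self, key, data):
        self.consistent.items[key] = data
        # Any write marks every server's fast entries as outdated.
        self.consistent.last_write = self.clock()
        self.fast.pop(key, None)


clock = itertools.count(1).__next__  # logical clock shared by both servers
shared = ConsistentBackend()
server_a = ChainedFast(shared, clock)
server_b = ChainedFast(shared, clock)

server_a.set("webform.contact", "v1")
first_read = server_b.get("webform.contact")   # cached in B's fast backend
server_a.set("webform.contact", "v2")          # written on server A only
second_read = server_b.get("webform.contact")  # B's entry predates the write
```

In this model server B picks up the new value even though the write happened on server A, which is why a working setup should not show the reported behavior.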

Memcache: AFAIK the race condition that was fixed in core/database and redis around cache tag invalidation during database transactions was never fixed in Memcache, so I'd absolutely expect random issues there.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave commented:

Issue tags: +Bug Smash Initiative

So I think this one should be closed?

catch commented:

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)

If the consistent backend is memcache, it is more likely to be #2996615: Transaction support for cache (tags) invalidation.

The only other possibility I can think of would be significant clock drift between the servers, so that the fast backend timestamp doesn't work. That's not covered by this approach, though; we'd need to change the timestamp to some kind of checksum/counter.
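
To illustrate the clock drift scenario, here is a minimal sketch comparing a timestamp-based freshness check with a counter-based one. The function names, the drift of 100 time units, and the version numbers are all assumptions made for the example; this is not core code.

```python
def fresh_by_timestamp(item_created, last_write):
    # Timestamp check: trust the fast entry if it was created after the last write.
    return item_created > last_write


def fresh_by_counter(item_version, current_version):
    # Counter check: trust the fast entry only if no write happened since it was cached.
    return item_version == current_version


# Assumed scenario: server B's clock runs 100 time units ahead of server A's.
drift = 100
real_time_cached = 2    # B caches the entry at real time 2
real_time_written = 3   # A saves new config at real time 3

b_created = real_time_cached + drift  # stamped with B's drifting clock
a_last_write = real_time_written      # stamped with A's clock

# The entry is older than the write, but B's timestamp makes it look newer.
stale_served_with_timestamp = fresh_by_timestamp(b_created, a_last_write)

# With a shared counter the comparison does not depend on any server clock.
version_when_cached = 7
version_after_write = version_when_cached + 1
stale_served_with_counter = fresh_by_counter(version_when_cached, version_after_write)
```

In this sketch the drifting timestamp lets the stale entry through, while the counter comparison rejects it.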

Closing this as cannot reproduce.