Problem/Motivation
Discovered in #1650930: Use READ COMMITTED by default for MySQL transactions, it looks like we unnecessarily invalidate all bins in ChainedFastBackend when cache tags are invalidated.
Because we have the checksum service and all backends (including fast ones) are responsible for checking that, or for invalidating tags themselves, we shouldn't need to.
Steps to reproduce
Proposed resolution
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #23 | 3334489-22-interdiff.txt | 959 bytes | berdir |
| #23 | 3334489-22.patch | 2.86 KB | berdir |
Comments
Comment #2
catchUntested, but this would allow a fast backend to clear tags themselves if they implement the interface, or for nothing to happen if they don't (relying on the cache tags checksum service).
Comment #4
andypostInteresting that most of failed tests are jsonapi and rest modules
Comment #5
berdirNot really because there are tons of very similar tests in both modules, we do the same thing for many different entity types, and almost always if they fail they all fail. I think I've discussed before that these tests are very expensive to run.
There are plenty of other test fails that show that we do need this, because the fast backends don't actually support cache tags.
There might be ways to improve it, but it won't be this easy :)
Comment #6
berdirvery vague idea, we track which kind of cache tags are actually used in a given bin, e.g. by checking the first few characters of them or something, then we can safely ignore most of them, because all the entity tokens for example will not be used in fast chained bins like bootstrap/config/..
Comment #7
longwavesetMultiple()has this, which won't help with #2, as none of the tags will actually be set there.Comment #8
berdiryes exactly, we optimized them away because we thought that this is more efficient for regular reads, but probably did not consider the number and frequency of cache tag invalidations that are happening on busy sites, so that's where my idea in #6 is coming from, that we extend that logic there to know which tags might affect this or not and the vast majority should not IMHO.
Comment #9
catchAhh OK #7 is the fatal flaw :)
I think this might have just been a not-great choice when we originally implemented this. The chained fast backend is generally used for quite stable caches that rarely get invalidated, whereas cache tags are mostly for very volatile caches that store loads of items. I think we only stripped the cache tags because we'd already decided on 'always invalidate all the time' and that made cache tags redundant, but definitely the space saving didn't drive the implementation.
I wrote something like this for the Drupal 7 memcache module and cache prefixes (to replace a previous implementation that stored an ever-growing array and/or blew away the whole bin every prefix clear). It worked but it was a bit of a nightmare to be honest.
This could be more over-optimism, but... can we just not strip the cache tags?
edited to add: Not storing the cache tags means that the cache tags checksum logic is skipped on get, but... also not sure this makes a huge difference in practice. Items in the chained bins will have a few cache tags, but these cache tags are very likely to be used for other cache items in non-chained bins, so we'd need to consult the checksum for them anyway. Conversely, non-chained bins will have literally thousands or millions of cache tags that aren't in the chained bins, that are currently causing invalidations.
Comment #10
longwave@catch yep this whole discussion reminds me of #911232-20: A high percentage of getMulti operations are failing to hit cache ;)
Comment #11
catchOK #9 is green, so I think this is just about the trade-offs now.
Comment #12
longwaveNot so fast, #9 is missing #2 - this patch combines them.
Comment #13
berdir2am, somewhere in Switzerland:
me: trying to sleep.
my brain: Hey you! You forgot the reason why cache tag handling works the way it does in the ChainedFastBackend. It's there to support multiple servers, as each server has his own fast cache backend and invalidations that happen on one server must be shared across all of them.
30min later...
me: still trying to sleep.
my brain: You know what would be funny? Adding that information to the issue as a story where you talk to your brain!
me: please let me sleep now.
---
That was before I saw the updates here though, I was a bit surprised to see #12 pass, but that's because the ApcuBackend also uses the checksum service, forgot about that. If you check the explanation on the ChainedFastBackend class, then that's exactly what it assumes:
That doesn't mention cache tags specifically, but IMHO also implies that. We should probably at least update the documentation, might even need a change record if someone uses a fast backend that really doesn't support tag checksums. Maybe even means that we should in fact not send tag invalidations to the fast backend as a local invalidation is not expected to work.
I'd also like to do some analysis on this, how many different cache tags we have on chained bins on a real site and how many queries to the checksum service this ads on a regular request.
Yes, I also remember the fun we had in memcache in D7, was involved in writing and testing that prefix stuff as well, so it's quite possibly over-optimizing especially if this works well.
Comment #14
berdirSome quick notes on cache tag usage:
bootstrap: just seeing two cache tags, theme_registry and config:user.role.anonymous, barely any cache items too.
config: no cache tags
discovery:
A bit more, some probably unnecessary or could be aggregated into one (workflow tags?):
Comment #15
catchYes this seems like a good idea. I think it's unlikely there's a custom fast backend that doesn't use checksums out there, but we should document it.
I went round in circles on this a couple of times:
- if you're on a single webserver, it would work, and not sending it would be introducing a bug to prevent another (possible) bug
- writing a fast backend that accepts cache tag invalidations directly (instead of using the checksum service) seems unlikely since it's very hard to implement that, so probably no-one will ever notice either way.
- ?!?! a local-only cache backend like apcu that accepts direct cache tag invalidations would be arguably a bug just by existing, because that's never going to work across multiple web heads anyway, chained or not.
So overall, I think we should keep the instanceof check, but add inline docs explaining that this should never be relied on. Another option would be to deprecate having a fast backend that does direct invalidation and eventually throw an exception instead, but could be heavy handed.
| workflow_type_pluginsPretty sure this one can just go, opened #3335145: Unnecessary workflow plugin manager cache tag.
Can't actually see where those are coming from yet. Do you have group module installed? and/or could you point to which cids they're attached to?
Comment #16
longwaveShould cache backends that support checksums indicate this by providing an interface? Then we can detect this in
invalidateTags()and fall back to invalidating the whole backend?Comment #17
berdir> Can't actually see where those are coming from yet. Do you have group module installed? and/or could you point to which cids they're attached to?
Looking more closely, both are from the state_machine contrib module. not exactly great namespacing, should be fine to remove I think.
Comment #18
catchThis is a tricky one I think.
We have CacheBackendInterface - which requires accepting cache tags when a cache item is set.
There's then an assumption that a cache backend does one of the following:
1. Also implements CacheTagsInvalidatorInterface
2. Relies on a different service that implements CacheTagsInvalidatorInterface such as the checksum.
If it doesn't do either of these, then it would be broken if you use it for nearly any core bin.
If you have a custom cache bin that never sees a cache tag and create a backend just for that bin, then you'd be fine, but... does anyone do that.
If we add an interface, it'd need to be something like CacheBackendUsingExternalTagsInvalidatorInterface (and it'd be empty) - we can't really use the word 'checksum' because that's an implementation detail of what core ships with. Then every cache backend would use one or the other interface. But it'd essentially be confirming that the backend does something we already assume it has to. It would be a place to add documentation for how the checksum stuff works, but we could probably add more high-level docs in CacheBackendInterface and CacheTagsInvalidatorInterface that do that as well (not necessarily here).
Comment #19
berdirYeah, I don't think we need an interface for that, I think the change record is basically "ChainedFastBackend now requires that the fast backend supports distributed cache tag invalidations (for example by relying on the checksum service)" and a similar sentence on the class docblock and then we're OK here. As @catch wrote, we pretty much require that already for a regular persisting backend (exception is for example the memory backends, but the expectation there is that they are short/request-lived).
Comment #20
berdirTesting the patch on default umami installation frontend as admin with disabled render caching, I'm seeing 6 extra queries to the cachetags table (142 vs 148 in total). Time difference of that is too small is way too small to reliably profile, but it's still something to consider.
The specific tags are, in order of appearance:
* routes (non admin routes)
* entity_types (also entity_bundles and lots of entity_field_info later, but all of them would be needed anyway sooner or later, so doesn't really count)
* element_info_build
* theme_registry
* breakpoints
* local_tasks
* library_info
That's 7 with entity_types. I think we had discussions before about whether or not we should use cache tags for some kind of plugins/caches. I'd suggest opening a follow-up that we remove cache tags for finite and known variations (langcodes, themes), that would remove library_info, element_info_build and theme_registry. breakpoints is a bit trickier areay, groups there are more dynamic, but we also have an API to fetch them (\Drupal\breakpoint\BreakpointManager::getGroups). Alternatively, we could merge these into one, it's rare that they are invalidated and invalidating a few extra thingss might be worth saving 2-3 queries on every request. Not sure how much cache tags are an API though.
That said, IMHO it's fine to move ahead with this, setting to needs work for the documentation updates.
Comment #21
catchI couldn't find an existing issue for removing cache tags from plugin managers, although definitely remember talking about it somewhere, so opened a new one for that #3335768: Manually clear cache keys from plugin managers with finite variations instead of using cache tags.
Comment #22
berdirI just found this old issue, where most of us had similar discussions 6+ years ago :)
#2611400: [meta] ChainedFastBackend should not invalidate the whole fastBackend when doing a Set()
Comment #23
berdirI seem to have derailed this with the request for documentation, here's a proposed sentence for that. I also created https://www.drupal.org/node/3344524.
Comment #24
andypostI think it should throw deprecation if no interface support for backend implemented
Comment #25
berdirThere is no interface that we could check, there is no explicit API or anything, we always pass in cache tags, it's up to the implementation to decide how to handle that. We discussed that above, introducing something would IMHO just cause existing implementations to break, I'm not aware of a single cache implementation that would be affected by this. This is only about the fast backend, I'm not sure if there even is an alternative to APCu out there, I've certainly never used something else.
Redis module now supports Relay and that comes with a built-in APCu-like cache, but then you want to replace ChainedFastBackend completely, not just the fast backend.
Comment #26
longwaveedit: deleted
Comment #27
longwaveThis whole discussion made me wonder if we were reinventing the wheel, and we do seem to be: Symfony supports database, APCu, filesystem, memory, Redis, Memcache and other backends, and allows chaining too: https://symfony.com/doc/current/components/cache/adapters/chain_adapter....
Symfony also supports cache tags in an apparently cleverer way than we do; you store tags in a cache, but it can be a different storage backend to the one that the actual items are in: https://symfony.com/doc/current/components/cache/cache_invalidation.html
I couldn't find an issue for this, worth opening one to explore?
Comment #28
berdirI think their chain adapter is our more generic BackendChain, I don't see anything there that specifically supports the sync feature of the ChainedFastBackend, but I only had a quick look at it.
The cache tag implementation seems somewhat comparable with cache tag versions that seem similar our checksums. I guess if you use the same tag pool for different adapters, you should get the same behavior as we have, but it will fetch cache tags from the adapter for every bin/pool. For the record, I'm fairly certain we invented that particular wheel quite some time before Symfony did, but that doesn't really matter.
I'm open to exploring to use their cache system, but that seems a like a _big_ issue and I'm sure we'll hit a fair share of roadblocks on the way there. (Like the database transaction topic) And even if we do that, we'd either need to contribute this back or re-implement it as an Adapter, so this is an improvement we want to do either way.
Comment #29
catchI feel it's necessary to point out that it's Symfony that re-invented the wheel, we've had cache tags support since 2012, theirs was added in about 2016. We tried to get cache tag support into the PSR cache spec when it was in progress, and that was rejected at the time - looks like they've figured out a way around that though.
Comment #30
smustgrave commentedQuestion before reviewing (attempting) will this require tests?
Comment #31
berdirWe didn't have tests before about the explicit internal behavior of ChainedFastBackend, or we would have needed to adapt them. We could add them, but I'm not sure if there is a benefit to that.
We have plenty of test coverage that test that ChainedFastBackend overall works as expected (blackbox testing) as basically every functional/browser test is using it. As evident by the 100 fails in the first inconsistent patch.
Comment #32
catchI think this is ready - should be 10.1-only with a change notice, but shouldn't be disruptive and should be a good performance improvement on real sites.
Comment #34
smustgrave commentedRandom failure
Comment #35
geek-merlin> Various plugin manager related cache tags removed
Why was this mixed in here?
With all due honor for the work done here, setting NW for the reasons elaborated in #2241377-124: [meta] Profile/rationalise cache tags
Comment #36
berdirThat is not mixed in here, that is a related draft change record that just mentions this issue as related, as it is the reason for doing it.
Comment #38
catchRandom test failure.
Comment #41
berdirAnother random test fail I assume, in 1) Drupal\FunctionalJavascriptTests\Ajax\ThrobberTest::testProgressThrobberPosition.
Comment #43
andypostrandom test failure
Comment #44
longwaveTechnically as a bug fix this is eligible for 10.1.x but I'm kinda wary about doing this now we are past rc just in case we are breaking something subtle; I still find it hard to understand all layers of the cache subsystem. So let's get this into 10.2.x and it can have six months of testing before it gets into real sites.
Committed and pushed 0448b4bcc77 to 11.x (10.2.x). Thanks!
Comment #46
berdirYeah, makes perfect sense to not backport this. This will also become a clearer improvement if we do #3335768: Manually clear cache keys from plugin managers with finite variations instead of using cache tags as well and that definitely won't go into 10.1.
Comment #48
quietone commentedPublished CR.