Problem/Motivation
Whenever a cache entry is retrieved from the database, we check what tags were added to it and retrieve the invalidation count for said tags from the cachetags table. This leads to multiple queries per request where several tags are often repeated across (almost) all requests.
It has even led to other DX issues being deemed inadvisable because of this drawback: #3001284: Allow plugin derivers to specify cache tags for their definitions
Steps to reproduce
Run the StandardPerformanceTest, witness the high counts in the following assertion:
$this->assertSame(COUNT_GOES_HERE, $performance_data->getCacheTagIsValidCount());
Every call in that count has the potential to trigger a query in DatabaseCacheTagsChecksum::getTagInvalidationCounts()
Proposed resolution
From the other issue:
What if we build a simple system that allows us to either specify or identify commonly used cache tags and, at the very first call to ::calculateChecksum(), we load ALL of them with one query. This would both improve the performance of Drupal as it stands and make the current issue less of a hassle because we could (but don't have to) add whatever cache tag our deriver specifies to the "common" list.
I feel like this wouldn't necessarily have to be a smart system that tries to automagically deduce which tags belong to the list, but could rather be a very rudimentary system where both core and contrib can specify tags to add to the list. We could then even have the entity type manager make sure that all entity type's list tags are added to said list, for instance, but that's up for discussion.
- Look at Grafana to see all of the queries like
SELECT [tag], [invalidations] FROM {cachetags} WHERE [tag] IN ( :tags[] ) - Create an event dispatcher for "common cache tags", write an event subscriber that adds the common cache tags to the list.
- Optionally create another subscriber that depends on the entity type manager and adds all list cache tags
- Get the result of this in DatabaseCacheTagsChecksum::getTagInvalidationCounts() and on the first call merge these into the ones we're taking to the DB
Remaining tasks
We might need to have another look at PerformanceTestTrait::isDatabaseCache() because the call amount to ::getTagInvalidationCounts() will not be reduced, but the query count should. So we might want to start tracking these queries again somehow.
User interface changes
None
API changes
New event to declare common cache tags
Data model changes
None
Release notes snippet
None?
Issue fork drupal-3436146
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #3
kristiaanvandeneyndeThis already proves that our isValid count doesn't translate 1:1 to database queries. Perhaps if this goes green it should already be committed so that the work being done here can more easily prove the count difference.
Even if we don't commit this in its current form, we can still compare results later on to the following.
Marking as NW because it doesn't contain the actual fix yet.
Comment #4
kristiaanvandeneyndeHere are the common stand-alone cache tags I found in Grafana:
Then there were some entity type related tags we should put in a separate subscriber:
Then we had some tags related to the following that could be added dynamically in follow-up issues for new event subscribers:
IMO we should start with the common ones and the entity type related ones as those do not need to be cleared at all unless a cache clearing event takes place (new module, new entity type, ...) The event subscribers based on config or entities (menus, blocks, formats, image styles, ...) need to somehow clear the "common cache tag" list whenever the list changes.
The cached list of common cache tags should be manually cleared by these event subscribers when necessary rather than use cache tags itself because that will get really nasty really quickly :D
Comment #5
kristiaanvandeneyndeLatest commit does not introduce the event stuff yet, but shows a proof of concept where we hard-code the common list from above. The results are amazing:
That's -67%, -39%, -35%, -33%, -79% and -38% amount of queries or 56 total queries fewer across our current tests.
Comment #6
kristiaanvandeneyndeSwitched to an event subscriber. left a bunch of @todo comments.
Reason I'm using:
Is because if I tried to add it as a proper argument Drupal wouldn't even boot because DrupalKernel::$defaultBootstrapContainerDefinition contains cache_tags_provider.container and at that point we do not have a container yet, so event_dispatcher doesn't resolve. Trying to add 'event_dispatcher' to that array just caused more errors where the autowired 'session' closure wasn't being passed in. Using a setter made all of these issues go away.
Comment #7
catchI opened #3438070: Add JSON:API performance tests based on the discussion in the MR.
Comment #8
kristiaanvandeneyndeOkay so a local attempt at getting this to work with entity types dynamically was running into eternal loops, but I just had a chat with @catch and we think we may have a solution for that:
We fire an event early on that asks for this list of cache tags, when/if we find this list in the bootstrap cache we set it on the container (or wherever). In the CacheTagsChecksumTrait, we check for said list and don't do anything special if it's not there. This will prevent eternal loops.
Then, we can have the entity type manager load all the info it needs and spit out a list of common cache tags. In the end, we cache said list without cache tags (because here be monsters), but with a max-age in the bootstrap cache. Because it's just an optimization, we don't need to worry that much if this cache becomes stale for a short while; next cache clear or after the list expired it will be recalculated anew.
Comment #9
berdirWorking on #3497341: Add caching to ConfigurableLanguageManager::getLanguages() reminded me about this, as that issue adds a very early cache tag. The first in fact if page cache isn't a hit.
I still think this is interesting, but I think it's not quite so straight-forward, especially the timing of when we do this load.
Examples below are all against umami /en/recipes/super-easy-vegetarian-pasta-bake
The best case scenario is that we have an internal page cache hit. That means exactly one cache tag lookup.
One single lookup with all the tags. We neither can nor want to preload them all, because they will be different on every page. In other words, trying to preload common ones before this will actually expand it from one to two queries.
Second best is a dynamic page cache hit:
Multiple queries that are very stable and easy to optimize, but the dynamic page cache one itself is not, same as above. And the access policies might *look* easy, but it contains the tags of all roles of the current user. See below point 3.
Umami has history, so also triggers a second request:
Same page, but disabled dynamic page cache, with render cache hits.
However, a scenario with (dynamic) page cache miss due to an invalidated cache tag is I guess also very common, where that will basically act as a bulk load for many of the following cache tags that will be checked again.
To me, that means two things (the off-by-one error was actually not on purpose but it fits perfectly on cache invalidation):
1. I think it should not be an internal thing in the checksum trait but an external request subscriber that runs very early but after page cache. It doesn't matter too much if it's on a page cache miss because then most will already be filtered out from the static cache.
2. I think we should avoid making a first iteration too complicated. We can introduce the concept, with a few examples that remove a handful of fixed cache tags that the vast majority of sites use. A contrib module could always experiment with more, I also still think supporting extra cache tags through Settings::get() would be an easy way to optimize specific sites.
3. I still see a lot of potential to optimize away overlapping and unnecessary dynamic cache tags. Why do we have routes and route_match, do we really need both? I have an issue open about the block ones, that turned out to be not so trivial but it has a lot of potential. For access policies, what if we just invalidate the access_policies cache tag if any role is saved, instead of adding a dynamic amount of cache tags? It's always a tradeoff between performance and simplicity, and invalidations vs. lookups.
This also reminded me about the current redis issue with the last write, which is quite inefficient, created #3498940: Optimize bin cache tags and last write timetamp. Which in turn made me wonder if fast chained backends should use cache tags instead of a regular cache get on the fast backend? They now require cache tag support anyway and it would mostly avoid the redis problem and could be combined with the preloading done here.
Performance stuff is always _such_ a rabbit hole ;)
Comment #10
berdirSuggestions, how about I pull the performance test parts to assert the behavior into #3500739: Aggregate cache operations per bin in performance tests to allow for more specific asserts, then we can document the current baseline and it makes it easier to compare improvements?
I did a merge of 11.x (since previous updates also did a merge), didn't review yet if the tests still pass.
I did have a quick look at the current event based idea, but it currently breaks when using redis. The problem is that redis uses cache tags to handle invalidateAll(), which means that even the initial container cache load does a cache tag check. This is being optimized away in both core (by deprecating invalidateAll()) and optionally just in redis for now, but it's there and the current default behavior.
Ignoring that, the next thing is then that this would trigger an event subscriber on internal page cache. Even with caching and what not, that's a ton of extra complexity at that point that we don't want or need as mentioned above.
So I really think this should be externalized into a separate and very early request subscriber.
Comment #11
catchYes that sounds good - easier to show the diff in this issue that way.
Could we do something with attributes here? Like look for a PreloadCacheTags attribute on services in a compiler pass, and add those to a list in the container? It would not help with dynamic tags that depend on entities, but a service could still list those manually and we already don't have a way to do that properly dynamically due to the infinite recursion issues above.
Comment #13
berdirRunning into some awkward edge cases with this on non-installed sites, installer tests are failing that this results in creating the cachetags table. Might need to rethink this a bit. Possibly a new method to register preload cache tags on the checksum service but it will only run when doing an actual lookup query. not sure.
Comment #14
kristiaanvandeneyndeOne thing I do in Group when no services are available yet but I need to polyfill some services based on plugins is using the discovery directly:
Maybe we could do the same to find entity type definitions and get their cache tags that way? Could do the same for certain config entity types perhaps. Downside is we need to do this on every request. So we'd have to cache the outcome with a max age but no cache tags, for the reasons discussed above.
This does lead to potential cache duplication, but because the information stored in the "common cache tags" cache is non-critical I don't see how it could hurt if said cache were to contain too much or too little information for a small amount of time.
Comment #15
catchI ran into the same installer tests issue with #3486503: Add a persistent cache for file-based discovery based on FileCache, but nothing useful applicable from there because I think it needs a different approach anyway.
This seems worth a try and would have the advantage that with a theoretical request that has no cache tags to look up, nothing would run.
Comment #16
berdirAdded that now, means alternative implementations will need to implement the new interface (I went with that because alternative implementations might not use the trait and there's also decorators.
The checksum count is reset to previous values, but that's kind of moot anyway as it doesn't really have a performance impact, as discussed, we might want to remove/deprecate those.
One aspect of this is that there can easily be multiple subscribes that all add their own tags. So we don't need to trigger its own event or something to collect the tags.
Comment #17
berdirSomehow only one test caught a really fun bug with the preloaded cache tags, calculateChecksum() then returned the checksum for all tags including the preloaded ones, unsure why that only showed up in one case.
Comment #18
wim leersFound this via @berdir's blog post at https://www.md-systems.ch/en/blog/2025-01-26/redis-startup-performance-i... — fascinating stuff! 😄
I remember that back in 2013–2015 when I helped stabilize Drupal 8's caching layers with @catch, @berdir and others, that we discussed (basically dreamed) of one day having enough information/insight to do things like y'all are doing here.
I specifically remember the trade-offs of Drupal shipping with default assumptions that would not be true once contrib/custom modules are added. Those tensions are very apparent here. It'd be very cool to have Drupal self-monitor and self-optimize, but the overhead/cost of that is likely prohibitive.
~2 years ago I was thinking about this too in the context of Acquia's hosting platform. I remember having some ideas of how to solve this generically, but then never found the time to follow through. Unfortunately, those ideas evaporated since. But IIRC they centered on the use of a bloom filter. I forget how exactly. But reading this, I think I was thinking of using it to automatically determine the most commonly used cache tags.
</braindump>Comment #19
berdirRebased, performance tests had quite a bit of a back and forth, so I squashed the commits together.
I did a braindump of my own recently in #3500648: [meta] Improve reports and cache performance investigation features of ideas on how to expose more issues and optimization possibilities. Also not sure if https://www.drupal.org/project/webprofiler already supports displaying loaded cache tags for a page, if not then it definitely should.
As you said, it's a tradeoff. A presumably tiny cost of loading a few extra cache tags that the current page doesn't need but that cost increases if we introduce dynamic logic to guess those tags. See #3503843: Remove automatic preloading of all "public" routes, cache routes in fast chained bin for an extreme case of this, there we preload way too much which uses a good chunk of extra memory and I'm trying to reduce that now.
The approach that we now have allows for multiple implementations to add tags without resulting in multiple queries, so a contrib module could try to do something fancier, but I think in practice, most sites have a pretty stable baseline of cache tags that are really frequently used. My blog post has the full list of cache tags I added in our project.
Some of the more common things that will show on specific projects are *_values tags from entity storage cache (after some deliberation, I added user_values now as it's a core module and shows up in most lists but not node_values). But plenty of content entity types are also not regularly used when rendering pages, so I'm not sure that we need to dynamically collect and add them (and then cache that information again).
And I'd prefer to limit this issue to just introduce this capability and a set of baseline tags that most sites will use. Decoupled sites likely have quite a different situation for example, but many tags of this list should still be relevant there. One reason to have a JSON:API performance test, so we can better see the impact of issues like this on that. It won't need libraries and local tasks for example, but it still needs routes, access, entity types and fields.
Another thing worth mentioning here is that there is only really a benefit if we can eliminate a lookup query entirely.
Take this section as an example
['rendered', 'user:0', 'user_view'],
['config:filter.format.restricted_html', 'node:1', 'node_view'],
['block_view', 'config:block.block.stark_site_branding', 'config:system.site'],
There's little benefit to add rendered or user view since it's almost always combined with a specific user id, same for formats and node_view, as long as the node:1 lookup is still needed we gain nothing. And again, same for block_view and config:system:site, since the block id is configuration. That said, I did open #3341042: Use only the list cache tag for block config entities quite a while ago to eliminate all those block cache tags, will try to get back to that. that will change things around and we could consider to preload certain cache tags.
Also, page cache and dynamic page cache misses essentially also act as preloaders for a given page. If a page cache can be loaded but is no longer valid because of a cache tag invalidation such as node_list, we've already preloaded 90%+ of the cache tags for all the entities, blocks and configs that we'll see on that page.
Comment #20
catchHead exploding emoji.
Comment #21
berdirYeah, that is pretty interesting, and I think many sites will more often than not have a page/dynamic page cache miss due to cache tags instead of it not being in the cache at all.
I added a test to demonstrate that situation, it's interesting, but I think it's not in scope to be committed here and in its current form is IMHO too fragile. But as always with these performance issues, various interesting bits in there: The level of insight into performance metrics that we have now is really powerful and allows us to explicitly and fairly easily assert the exact behavior around caching.
* There is a bug in argument replacement, if there are more than 10 numbered placeholders, then it will replace "foo_10,foo_11,..." with "foo_1". reversed the args array to fix this, should probably be its own issue.
* I was really surprised to see those route queries, I currently don't understand why creating a new node invalidates route load caches (route match maybe, but specific raw route data? why would that change? somehow there's also a query before the current preload happens. The whole query is I think also too fragile and would be a pain to maintain, as every time we add or remove a module from standard or add/remove any of those routes we'd have to manually update that very, very long list. Good thing is that many of these problems will go away if we move forward with #3503843: Remove automatic preloading of all "public" routes, cache routes in fast chained bin, except maybe the part why they are invalidated in the first place.
* the 7 caches being written are: data (2 route load caches, 1 views data cache), render (views render cache, including a redirect due to url:site cache context being added by something), dynamic page cache and page are obvious. I tried adding an explicit assert on those, but they contain absolute URL's and hashes, we would need to normalize those similar to queries.
* cache tag preloading is working well, as expected. the only thing not covered yet with the preload list is views_data and the core extension config (we still don't delete all caches on a module install, only uninstall) and node_values.
Comment #22
catchI just manually tested and the 'routes' cache tag wasn't invalidated, and I can't think how else that could get invalidated. It might be worth an extra page request prior to creating the node just to make sure that cache is definitely warmed. Worth splitting that test to its own issue then we can open spin-offs from there.
Might be worth an issue to discuss adding that to required cache contexts?
Left a couple of related comments in the URL for user_values and node_values - I think we could implement event subscribers in the respective modules for these.
Comment #23
berdirI moved the new test to #3504516: Add node:list cache invalidation performance test, will reply there on that part of the discussion.
Comment #24
berdirAdded a node cache tag preloader as an example, still not exactly sure if we want to do this.
Comment #25
catchShould we look at removing the _values cache tags altogether? In a separate issue of course but +1 to splitting that out if we have other options.
Comment #26
berdirCreated #3505059: Remove entity_type_values cache tags, how about I remove the node cache tag preloader then?
Comment #27
catchYes round in circles but I think it's better to try to get rid of it altogether if we can.
Comment #28
kristiaanvandeneyndeMR approach looks good to me, I'm just a bit sad we weren't able to crack the dynamically generated cache tags nut yet. Re the _values cache tag, I commented on the other issue but I'm a fan of its removal.
I'd RTBC and say we try to tackle dynamic cache tags another day, but I spotted two nitpicks that can be accepted from the web IDE.
Comment #29
catchOne trivial thing that's been bothering me - common cache tags makes it sound like these are common to lots of cache items, whereas the reason we're trying to preload them is because they're frequent, even if they only apply to one or two cache items. This concept isn't in the MR so just changing the title.
It would be easy in contrib or custom code to register another event listener, dymanically generate the tags, cache in apcu (or chained fast) but it's introduce an extra query on sites without apcu so not sure it's worth trying to make it work in core - we might come up with something though.
Comment #30
berdirThanks for the review. Accepted the docblock change, also added a @see to resolve a review from @catch. I will add a change record for this tonight or so about this new feature and the setting that allows to customize it.
> I'm just a bit sad we weren't able to crack the dynamically generated cache tags nut yet
Happy to create a follow-up to discuss this further. I know you initially created this based on my response in #3001284: Allow plugin derivers to specify cache tags for their definitions but with the current usage + planned optimizations, I'm not sure I even see a need for that. See #19/21, on cache-tag-triggered page cache misses, we already have powerful and highly dynamic tag preloading systems in place, #3504516: Add node:list cache invalidation performance test is aiming to add a test to demonstrate that.
Comment #31
kristiaanvandeneyndeTo me this is a good enough start. I changed a doc line that got reverted by accident.
It could also unblock #3001284: Allow plugin derivers to specify cache tags for their definitions in a way where we support plugin derivers specifying cache tags but document that they also need to preload their (list) cache tag using an event subscriber. That way we don't introduce extra cache tag lookup queries, but still allow people to not have to write dozens of FOO_insert/update/delete hooks. It also takes care of the dynamic part because now it's up to the module that knows which list cache tag they're using to declare said list cache tag.
The drawback is we'd end up with a few extra event subscribers being called on every request vs having to figure out how to load all entity type's list cache tags without talking to any cache (could use discovery here).
Tests went green and I expect them to remain that way after the docs change, so RTBC. Also adding and applying a doc change re #29 to rephrase "common cache tags" as "frequently used cache tags".
Comment #32
catchNeeds a rebase.
Comment #33
catchAlso a change record.
Tagging for the 11.2 release highlights because while it's a low-level change we can include it in a general caching/performance improvements section, which looks like it could be quite big for 11.2
Comment #34
berdirDid a rebase, didn't get any conflicts.
Started a change record: https://www.drupal.org/node/3505248, it's a lot of information and I plan to move/copy some of that to the cache tags documentation page.
Comment #35
catchChange record looks great, applied one small wording suggestion in the class summary.
Comment #37
catchCommitted/pushed to 11.x, thanks!
Comment #39
benjifisherI just noticed this issue, and I had a look at the commit:
Should we add
array_unique()to that?Second point:
I think we should add something to
core/assets/scaffold/files/default.settings.phpdescribing the new setting.Comment #40
berdirThanks for the review.
1. I'm doing a array_unique() when these cache tags are merged with the ones that were requested to be loaded. This method is not expected to be called many times nor with many tags. I don't think there is a benefit of running array_unique() every time.
2. IMHO, not every setting needs to be documented in settings.php. People shouldn't really mess with this unless they follow the docs on it (which I plan to move/copy from the change record).
Comment #42
catch