Cache tags = data dependencies
Cache tags describe dependencies on data managed by Drupal
Cache tags provide a declarative way to track which cache items depend on some data managed by Drupal.
This is essential for a content management system/framework like Drupal, because the same content can be reused in many ways. In other words, it is impossible to know ahead of time where a piece of content is going to be used, and in any of the places where it is used, it may be cached. This means the same content could be cached in dozens of places, which brings us to the famous quote "There are only two hard problems in Computer Science: cache invalidation and naming things." How are you going to invalidate all cache items in which the content is being used?
Note: Drupal 7 offered three ways of invalidating cache items: invalidate a specific CID, invalidate using a CID prefix, or invalidate everything in a cache bin. None of those three methods allows us to invalidate exactly the cache items that contain a modified entity, because there was no way to know which cache items those were!
A cache tag is a string.
Cache tags are passed around in sets (order doesn't matter) of strings, so they are typehinted as string[]. They're sets because a single cache item can depend on (be invalidated by) many cache tags.
By convention, they are of the form thing:identifier, and when there's no concept of multiple instances of a thing, of the form thing. There is no strict syntax: the only rule is that a cache tag cannot contain spaces.
Examples:
- node:5 is the cache tag for Node entity 5 (invalidated whenever it changes)
- user:3 is the cache tag for User entity 3 (invalidated whenever it changes)
- node_list is the list cache tag for Node entities (invalidated whenever any Node entity is updated, deleted or created, i.e. when a listing of nodes may need to change)
- config:system.performance is the cache tag for the system.performance configuration
- library_info is the cache tag for asset libraries
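To make the naming convention concrete, here is a minimal PHP sketch. The helper functions make_cache_tag() and is_valid_cache_tag() are hypothetical (they are not Drupal API); they only build strings of the form thing:identifier or thing and enforce the one hard rule stated above: no spaces.

```php
<?php
// Hypothetical helpers (NOT Drupal API) illustrating the cache tag convention:
// "thing:identifier", or just "thing" when there is only one instance.

// The only hard rule: a cache tag must not contain spaces.
function is_valid_cache_tag(string $tag): bool
{
    return strpos($tag, ' ') === false;
}

function make_cache_tag(string $thing, ?string $identifier = null): string
{
    $tag = $identifier === null ? $thing : $thing . ':' . $identifier;
    if (!is_valid_cache_tag($tag)) {
        throw new InvalidArgumentException("Invalid cache tag: '$tag'");
    }
    return $tag;
}

$tags = [
    make_cache_tag('node', '5'),                    // "node:5"
    make_cache_tag('user', '3'),                    // "user:3"
    make_cache_tag('node_list'),                    // "node_list"
    make_cache_tag('config', 'system.performance'), // "config:system.performance"
    make_cache_tag('library_info'),                 // "library_info"
];
echo implode("\n", $tags), "\n";
```

In real Drupal code you rarely build entity or configuration tags by hand; as explained below, core generates them for you.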
Drupal 8 core's cache tags
The data that Drupal manages falls into three categories:
- entities: these have cache tags of the form <entity type ID>:<entity ID>
- configuration: these have cache tags of the form config:<configuration name>
- custom (for example library_info)
Drupal provides cache tags for entities and configuration automatically; see the Entity base class and the ConfigBase base class. (All specific entity types and configuration objects inherit from those.)
Although some entity types follow a predictable cache tag format of <entity type ID>:<entity ID>, third-party code shouldn't rely on this. Instead, it should retrieve the cache tags to invalidate for a single entity using the entity's ::getCacheTags() method.
In addition, it may be necessary to invalidate listing-based caches that depend on data from the entity in question (e.g. refreshing the rendered HTML for a listing when an entity no longer appears in it). This can be done by calling EntityTypeInterface::getListCacheTags() and invalidating the tags it returns along with the entity's own tag(s).
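A minimal sketch of collecting the tags to invalidate when an entity changes, assuming the entity's own tags and its entity type's list cache tags are already available as plain arrays. The literal tag values and the tags_to_invalidate() helper are hypothetical; Drupal core has its own helper for merging tag sets, Cache::mergeTags(), and in a real module the result would be handed to the cache tag invalidator.

```php
<?php
// Sketch: build the set of tags to invalidate when an entity changes.
// $entityTags stands in for what $entity->getCacheTags() might return and
// $listTags for what getListCacheTags() might return; the literal values are
// illustrative assumptions, not guaranteed Drupal output.

function tags_to_invalidate(array ...$tagSets): array
{
    // Cache tags are sets: merge them, drop duplicates, and reindex.
    return array_values(array_unique(array_merge(...$tagSets)));
}

$entityTags = ['node:5'];
$listTags = ['node_list'];
$extraTags = ['node:5']; // a duplicate, e.g. collected from another source

$tags = tags_to_invalidate($entityTags, $listTags, $extraTags);
echo implode(' ', $tags), "\n"; // prints "node:5 node_list"
```

Treating the tags as a set means collecting them from several sources never causes the same tag to be invalidated twice.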
$cache_backend->set($cid, $data, Cache::PERMANENT, ['node:5', 'user:7']);
This stores a cache item with ID $cid permanently (i.e. indefinitely), but makes it susceptible to invalidation through either the node:5 or the user:7 cache tag.
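To illustrate what the set() call above means, here is a toy, in-memory PHP model of a tag-aware cache bin. It is not Drupal's cache back-end: the class ToyTaggedCache and the CACHE_PERMANENT constant are stand-ins made up for this sketch, and expiry is stored but never enforced. It only shows that an item tagged ['node:5', 'user:7'] disappears when either tag is invalidated.

```php
<?php
// Toy, in-memory model of a tag-aware cache bin. This is NOT Drupal's
// cache back-end; it only illustrates the tagging semantics.

const CACHE_PERMANENT = -1; // stand-in for Cache::PERMANENT

final class ToyTaggedCache
{
    /** @var array<string, array{data: mixed, expire: int, tags: string[]}> */
    private array $items = [];

    // Expiry is recorded but not enforced in this toy.
    public function set(string $cid, $data, int $expire = CACHE_PERMANENT, array $tags = []): void
    {
        $this->items[$cid] = ['data' => $data, 'expire' => $expire, 'tags' => $tags];
    }

    /** Returns the cached data, or null on a miss. */
    public function get(string $cid)
    {
        return $this->items[$cid]['data'] ?? null;
    }

    /** Drops every item tagged with at least one of the given tags. */
    public function invalidateTags(array $tags): void
    {
        foreach ($this->items as $cid => $item) {
            if (array_intersect($item['tags'], $tags)) {
                unset($this->items[$cid]);
            }
        }
    }
}

$cache = new ToyTaggedCache();
$cache->set('teaser:5', '<article>...</article>', CACHE_PERMANENT, ['node:5', 'user:7']);
$cache->set('profile:7', '<div>...</div>', CACHE_PERMANENT, ['user:7']);
$cache->set('teaser:6', '<article>...</article>', CACHE_PERMANENT, ['node:6', 'user:8']);

$cache->invalidateTags(['user:7']); // user 7 changed, e.g. a new display name

var_dump($cache->get('teaser:5'));          // NULL
var_dump($cache->get('profile:7'));         // NULL
var_dump($cache->get('teaser:6') !== null); // bool(true)
```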
Tagged cache items are invalidated via their tags, using cache_tags.invalidator:invalidateTags() (or, when you cannot inject the cache_tags.invalidator service, Cache::invalidateTags()), which accepts a set of cache tags (string[]).
Note: this invalidates items tagged with the given tags across all cache bins. It doesn't make sense to invalidate cache tags on individual bins, because cache items in any bin may depend on the modified data whose cache tags are being invalidated.
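Why invalidation has to be global can be sketched with two toy bins that share a single tag-invalidation registry. Each item records a checksum of its tags' invalidation counters when it is written and is treated as stale once that checksum changes. All class names are hypothetical, and the counter/checksum design is an assumption chosen for illustration, not a description of Drupal's internals.

```php
<?php
// Toy sketch: every bin consults ONE shared registry of per-tag invalidation
// counters, so invalidating a tag once makes matching items stale in all bins.

final class TagInvalidationRegistry
{
    /** @var array<string, int> tag => number of times it was invalidated */
    private array $counts = [];

    public function invalidateTags(array $tags): void
    {
        foreach ($tags as $tag) {
            $this->counts[$tag] = ($this->counts[$tag] ?? 0) + 1;
        }
    }

    /** Sum of the counters; it only grows, so any invalidation changes it. */
    public function checksum(array $tags): int
    {
        $sum = 0;
        foreach ($tags as $tag) {
            $sum += $this->counts[$tag] ?? 0;
        }
        return $sum;
    }
}

final class ToyBin
{
    private TagInvalidationRegistry $registry;
    private array $items = [];

    public function __construct(TagInvalidationRegistry $registry)
    {
        $this->registry = $registry;
    }

    public function set(string $cid, $data, array $tags): void
    {
        // Remember the tags' invalidation state at write time.
        $this->items[$cid] = ['data' => $data, 'tags' => $tags, 'checksum' => $this->registry->checksum($tags)];
    }

    public function get(string $cid)
    {
        $item = $this->items[$cid] ?? null;
        // Stale if any of its tags was invalidated since the item was written.
        if ($item === null || $item['checksum'] !== $this->registry->checksum($item['tags'])) {
            return null;
        }
        return $item['data'];
    }
}

$registry = new TagInvalidationRegistry();
$renderBin = new ToyBin($registry);
$dataBin = new ToyBin($registry);

$renderBin->set('node:5:full', '<article>...</article>', ['node:5']);
$dataBin->set('node_titles', [5 => 'Hello'], ['node:5', 'node_list']);
$dataBin->set('user_names', [7 => 'Ada'], ['user:7']);

// Node 5 is saved: one call invalidates the tag for every bin at once.
$registry->invalidateTags(['node:5']);

var_dump($renderBin->get('node:5:full'));         // NULL
var_dump($dataBin->get('node_titles'));           // NULL
var_dump($dataBin->get('user_names') !== null);   // bool(true)
```

Because the bins never need to know about each other, code that saves node 5 does not have to know which bins cached something derived from it.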
All of the above is helpful information when debugging something that is being cached. But there's one more thing: let's say something is being cached with the cache tags ['foo', 'bar']. Then the corresponding cache item will have a tags column (assuming the database cache back-end for a moment) with the value "bar foo".
In other words:
- cache tags are separated by space
- cache tags are sorted alphabetically
That should make it much easier to analyze & debug caches!
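The stored format can be reproduced with a few lines of PHP. tags_column_value() is a hypothetical helper that mimics the format described above (a de-duplicated, alphabetically sorted, space-separated string); it is not the code Drupal's database back-end actually runs.

```php
<?php
// Sketch of the normalized "tags" column format: sorted, space-separated.
// The function name is hypothetical; it mimics the stored format only.

function tags_column_value(array $tags): string
{
    $tags = array_values(array_unique($tags)); // tags are a set
    sort($tags, SORT_STRING);                  // alphabetical order
    return implode(' ', $tags);                // space-separated
}

echo tags_column_value(['foo', 'bar']), "\n"; // prints "bar foo"
```

Since the order is deterministic, two cache items with the same tags always show the same column value, which makes them easy to compare or search for while debugging.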
Finally, it is easy to see which cache tags a certain response depends on (and thus is invalidated by): just look at its X-Drupal-Cache-Tags header.
(This is also why spaces are forbidden: the X-Drupal-Cache-Tags header, like many HTTP headers, uses spaces to separate values.)
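A short sketch of why spaces are forbidden: if the tags are joined with spaces for the header and split on spaces when read back, a tag containing a space comes back as two tags. Both helper functions are hypothetical.

```php
<?php
// Sketch: building and reading a space-separated cache tags header value.
// Function names are hypothetical.

function tags_to_header_value(array $tags): string
{
    return implode(' ', $tags);
}

function header_value_to_tags(string $value): array
{
    // Split on runs of spaces and ignore empty pieces.
    return preg_split('/ +/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
}

$ok = ['config:system.performance', 'node:5', 'user:3'];
var_dump(header_value_to_tags(tags_to_header_value($ok)) === $ok);   // bool(true)

$bad = ['my tag']; // invalid: contains a space
var_dump(header_value_to_tags(tags_to_header_value($bad)) === $bad); // bool(false): split into "my" and "tag"
```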
Note: If you're not seeing those headers, you will want to set up your Drupal instance for development.
Integration with reverse proxies
Rather than caching responses in Drupal and invalidating them with cache tags, you could also cache responses in reverse proxies (Varnish, CDN …) and then invalidate responses they have cached using cache tags associated with those responses. To allow those reverse proxies to know which cache tags are associated with each response, you can send the cache tags along in a header.
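The idea can be sketched with a toy reverse proxy that caches responses by URL, records the tags sent with each response, and purges by tag. This is not Varnish's or any CDN's API; the class name is hypothetical, and it assumes the tags arrive space-separated, as in the X-Drupal-Cache-Tags header.

```php
<?php
// Toy model of a tag-aware reverse proxy (NOT Varnish or any CDN's API):
// it caches responses by URL, records the tags from a response header,
// and purges every cached response carrying a given tag.

final class ToyTagAwareProxy
{
    /** @var array<string, array{body: string, tags: string[]}> */
    private array $responses = [];

    public function store(string $url, string $body, string $tagHeaderValue): void
    {
        // Assumes space-separated tags in the header value.
        $tags = preg_split('/ +/', trim($tagHeaderValue), -1, PREG_SPLIT_NO_EMPTY);
        $this->responses[$url] = ['body' => $body, 'tags' => $tags];
    }

    public function lookup(string $url): ?string
    {
        return $this->responses[$url]['body'] ?? null;
    }

    /** Purges all cached responses tagged with $tag; returns how many. */
    public function purgeTag(string $tag): int
    {
        $purged = 0;
        foreach ($this->responses as $url => $response) {
            if (in_array($tag, $response['tags'], true)) {
                unset($this->responses[$url]);
                $purged++;
            }
        }
        return $purged;
    }
}

$proxy = new ToyTagAwareProxy();
$proxy->store('/node/5', '<html>node 5</html>', 'node:5 user:7');
$proxy->store('/node', '<html>front page</html>', 'node:5 node:6 node_list');
$proxy->store('/user/7', '<html>user 7</html>', 'user:7');

echo $proxy->purgeTag('node:5'), "\n";        // prints 2
var_dump($proxy->lookup('/user/7') !== null); // bool(true)
```

Drupal never has to know which URLs the proxy cached; it only has to tell the proxy which tags were invalidated.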
Just like Drupal 8 can send an X-Drupal-Cache-Tags header for debugging, it can also send a Surrogate-Keys header with space-separated values, as expected by some CDNs, or a Cache-Tag header with comma-separated values, as expected by other CDNs. The receiving end could also be a reverse proxy you run yourself, rather than a commercial CDN service.
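The only difference between the two CDN-oriented formats mentioned above is the separator, so serializing the same tag set for either one is straightforward. cdn_tag_header_value() and its style names are hypothetical; check your CDN's documentation for the exact header it expects.

```php
<?php
// Sketch: serializing one tag set for the two header styles described above.
// The separators come from the text (space vs. comma); the function and the
// style names are hypothetical.

function cdn_tag_header_value(array $tags, string $style): string
{
    switch ($style) {
        case 'surrogate-keys': // space-separated values
            return implode(' ', $tags);
        case 'cache-tag':      // comma-separated values
            return implode(',', $tags);
        default:
            throw new InvalidArgumentException("Unknown style: $style");
    }
}

$tags = ['node:5', 'node_list', 'user:3'];
echo 'Surrogate-Keys: ', cdn_tag_header_value($tags, 'surrogate-keys'), "\n";
echo 'Cache-Tag: ', cdn_tag_header_value($tags, 'cache-tag'), "\n";
// Surrogate-Keys: node:5 node_list user:3
// Cache-Tag: node:5,node_list,user:3
```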
As a rule of thumb, it's recommended that both your web server and your reverse proxy support 16 KB headers.
- HTTP is text-based, so cache tags sent in headers are also text-based; reverse proxies are free to represent cache tags in a different data structure internally. The 16 KB figure was chosen based on two factors: A) ensuring it works for the 99% case, and B) what is practically achievable. Typical web servers (Apache) and typical CDNs (Fastly) support 16 KB headers. This corresponds to roughly 1000 cache tags, which is enough for the 99% case.
- The number of cache tags varies widely by site and the specific response. If it's a response that depends on many other things, there will be many cache tags. More than 1000 cache tags on a response will be rare.
- But, of course, this guideline (~1000 tags per response is sufficient) can and will evolve over time, as we A) see more real-world applications use it, and B) see systems specifically leverage or build on top of this capability.
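The arithmetic behind the rule of thumb: 16 KB is 16 * 1024 = 16384 bytes, so 1000 tags leave roughly 16 bytes per tag, separator included. The sketch below, with a hypothetical tag_header_bytes() helper, measures only the header value for 1000 short node tags; the header name and line overhead are not counted, and real tags such as config:system.performance are often longer.

```php
<?php
// Back-of-the-envelope check for the 16 KB rule of thumb.
// The helper is hypothetical and measures only the header value.

const HEADER_BUDGET_BYTES = 16 * 1024; // 16384

function tag_header_bytes(array $tags): int
{
    return strlen(implode(' ', $tags)); // byte length of the space-separated value
}

$tags = [];
for ($i = 1; $i <= 1000; $i++) {
    $tags[] = "node:$i";
}

$bytes = tag_header_bytes($tags);
printf("%d tags -> %d bytes (budget %d)\n", count($tags), $bytes, HEADER_BUDGET_BYTES);
// prints "1000 tags -> 8892 bytes (budget 16384)"
```

With tags this short there is plenty of headroom; longer tags eat into the budget quickly, which is why ~1000 is a rough guideline rather than a hard limit.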
Finally, anything beyond 1000 cache tags probably indicates a deeper problem: the response is overly complex and should be split up. Nothing prevents you from going beyond that number in Drupal, but it may require manual fine-tuning, which is acceptable for such extremely complex use cases. Arguably, that's the case even for far fewer than 1000 cache tags.
Read the documentation on using Varnish with cache tags.
CDNs known to support tag-based invalidation/purging:
Internal Page Cache
Comprehensive use of cache tags across Drupal 8 allows Drupal 8 to ship with its Internal Page Cache enabled by default. This is nothing more than a built-in reverse proxy.