Cache tags

Last updated on
11 May 2017

Cache tags = data dependencies

Cache tags describe dependencies on data managed by Drupal

Why?

Cache tags provide a declarative way to track which cache items depend on some data managed by Drupal.

This is essential for a content management system/framework like Drupal because the same content can be reused in many ways. In other words, it is impossible to know ahead of time where some content is going to be used. Wherever the content is used, it may be cached, so the same content could be cached in dozens of places. That brings us to the famous quote: "There are only two hard problems in Computer Science: cache invalidation and naming things." That is, how are you going to invalidate all cache items where the content is being used?

Note: Drupal 7 offered 3 ways of invalidating cache items: invalidate a specific CID, invalidate using a CID prefix, or invalidate everything in a cache bin. None of those 3 methods allows us to invalidate the cache items that contain an entity that was modified, because it was impossible to know which cache items those were!

What?

A cache tag is a string.

Cache tags are passed around in sets (order doesn't matter) of strings, so they are typehinted to string[]. They're sets because a single cache item can depend on (be invalidated by) many cache tags.

Syntax

By convention, cache tags are of the form thing:identifier; when there's no concept of multiple instances of a thing, the tag is of the form thing. There is no strict syntax: the only rule is that a cache tag cannot contain spaces.

Examples:

  • node:5 — cache tag for Node entity 5 (invalidated whenever it changes)
  • user:3 — cache tag for User entity 3 (invalidated whenever it changes)
  • node_list — list cache tag for Node entities (invalidated whenever any Node entity is updated, deleted or created, i.e. when a listing of nodes may need to change)
  • config:system.performance — cache tag for the system.performance configuration
  • library_info — cache tag for asset libraries
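The convention above can be captured in a couple of small helpers. This is a minimal sketch in plain PHP, not part of Drupal's API: the function names build_cache_tag() and is_valid_cache_tag() are hypothetical, and rejecting the empty string is an extra assumption on top of the "no spaces" rule.

```php
<?php

/**
 * Builds a tag of the form "thing" or "thing:identifier".
 * Hypothetical helper for illustration; not a Drupal API.
 */
function build_cache_tag(string $thing, $identifier = NULL): string {
  return $identifier === NULL ? $thing : $thing . ':' . $identifier;
}

/**
 * Checks the one stated rule (no spaces), plus the assumption that a tag
 * should not be empty.
 */
function is_valid_cache_tag(string $tag): bool {
  return $tag !== '' && strpos($tag, ' ') === FALSE;
}

// Yields ['node:5', 'node_list'].
$tags = [build_cache_tag('node', 5), build_cache_tag('node_list')];
```

For real entities and configuration you would normally not build tags by hand; the helpers only illustrate the naming convention.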

Drupal 8 core's cache tags

The data that Drupal manages falls into 3 categories:

  1. entities — these have cache tags of the form <entity type ID>:<entity ID>
  2. configuration — these have cache tags of the form config:<configuration name>
  3. custom (for example library_info)

Drupal provides cache tags for entities & configuration automatically — see the Entity base class and the ConfigBase base class. (All specific entity types and configuration objects inherit from those.)

Although some entity types follow a predictable cache tag format of <entity type ID>:<entity ID>, third-party code shouldn't rely on this. Instead, it should retrieve the cache tags to invalidate for a single entity using its ::getCacheTags() method, e.g. $node->getCacheTags(), $user->getCacheTags(), $view->getCacheTags() etc.

In addition, it may be necessary to invalidate listing-based caches that depend on data from the entity in question (e.g. refreshing the rendered HTML for a listing when an entity no longer exists in it). This can be done by calling EntityTypeInterface::getListCacheTags() and invalidating the tags it returns along with the entity's own tag(s).

How

Setting

Any cache backend should implement CacheBackendInterface, so when you set a cache item with the ::set() method, provide the third and fourth arguments (expiration and cache tags), e.g.:

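use Drupal\Core\Cache\Cache;

// Store $data under $cid with no expiration, tagged with node:5 and user:7.
$cache_backend->set(
  $cid, $data, Cache::PERMANENT, ['node:5', 'user:7']
);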

This stores a cache item with ID $cid permanently (i.e. stored indefinitely), but makes it susceptible to invalidation through either the node:5 or user:7 cache tags.

Invalidating

Tagged cache items are invalidated via their tags, using cache_tags.invalidator:invalidateTags() (or, when you cannot inject the cache_tags.invalidator service: Cache::invalidateTags()), which accepts a set of cache tags (string[]).

Note: this invalidates items tagged with the given tags across all cache bins. Invalidating cache tags on individual bins would not make sense, because cache items in any bin may depend on the data that has been modified and whose cache tags are being invalidated.
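To make the mechanism concrete, here is a toy in-memory model of setting tagged items and invalidating them by tag across bins. It is a sketch under stated assumptions: the class ToyTaggedCache, its method signatures, and the bin and cache ID names are hypothetical. It does not reflect how Drupal's cache backends are implemented, and unlike Drupal's ::get() it returns the data directly.

```php
<?php

/**
 * Toy in-memory model of tag-based invalidation across bins.
 * Illustration only; not Drupal's implementation.
 */
class ToyTaggedCache {

  /** @var array Items keyed by bin name, then by cache ID. */
  private $bins = [];

  public function set(string $bin, string $cid, $data, array $tags = []): void {
    $this->bins[$bin][$cid] = ['data' => $data, 'tags' => $tags];
  }

  /** Returns the cached data, or FALSE on a cache miss. */
  public function get(string $bin, string $cid) {
    return isset($this->bins[$bin][$cid]) ? $this->bins[$bin][$cid]['data'] : FALSE;
  }

  /** Drops every item, in every bin, that carries at least one of the tags. */
  public function invalidateTags(array $tags): void {
    foreach ($this->bins as $bin => $items) {
      foreach ($items as $cid => $item) {
        if (array_intersect($item['tags'], $tags)) {
          unset($this->bins[$bin][$cid]);
        }
      }
    }
  }

}

$cache = new ToyTaggedCache();
$cache->set('render', 'teaser:5', '<article>5</article>', ['node:5', 'user:7']);
$cache->set('page', '/blog', '<html>blog</html>', ['node:5', 'node_list']);
$cache->set('render', 'teaser:6', '<article>6</article>', ['node:6']);

// Node 5 changed: every item that depends on it is dropped, in both bins.
$cache->invalidateTags(['node:5']);
```

After the call, only the teaser:6 item remains, because it is the only one that does not carry the node:5 tag.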

Debugging

All of the above is helpful information when debugging something that is being cached. But there's one more thing: let's say something is being cached with the cache tags ['foo', 'bar']. Then the corresponding cache item will have a tags column (assuming the database cache backend for a moment) with the following value:

bar foo

In other words:

  • cache tags are separated by space
  • cache tags are sorted alphabetically

That should make it much easier to analyze & debug caches!
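The stored format is easy to reproduce when you want to predict what a tags column should contain. The following is a minimal sketch in plain PHP; the function name tags_column_value() is hypothetical, and removing duplicates before sorting is an assumption rather than something stated above.

```php
<?php

/**
 * Mimics the stored format described above: tags sorted alphabetically and
 * separated by a single space. Hypothetical helper; not a Drupal API.
 */
function tags_column_value(array $tags): string {
  // Assumption: duplicate tags are collapsed before storing.
  $tags = array_unique($tags);
  sort($tags, SORT_STRING);
  return implode(' ', $tags);
}

echo tags_column_value(['foo', 'bar']), "\n"; // prints "bar foo"
```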

Headers (debugging)

Finally, it is easy to see which cache tags a certain response depends on (and thus is invalidated by): you only need to look at the X-Drupal-Cache-Tags header!

(This is also why spaces are forbidden: because the X-Drupal-Cache-Tags header, just like many HTTP headers, uses spaces to separate values.)

Note: If you're not seeing those headers, you will want to set up your Drupal instance for development.

Integration with reverse proxies

Rather than caching responses in Drupal and invalidating them with cache tags, you could also cache responses in reverse proxies (Varnish, CDN …) and then invalidate responses they have cached using cache tags associated with those responses. To allow those reverse proxies to know which cache tags are associated with each response, you can send the cache tags along with a header.

Just like Drupal 8 can send an X-Drupal-Cache-Tags header for debugging, it can also send a Surrogate-Key header with space-separated values as expected by some CDNs, or a Cache-Tag header with comma-separated values as expected by other CDNs. The recipient could also be a reverse proxy you run yourself, rather than a commercial CDN service.
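The two header value formats differ only in their separator, so converting between them is straightforward. The sketch below is plain PHP with hypothetical function names. It shows how a space-separated value (the X-Drupal-Cache-Tags style) could be split into tags and re-joined in either format. It says nothing about which module or service emits the headers on a real site.

```php
<?php

/** Splits a space-separated header value into individual cache tags. */
function parse_space_separated_tags(string $header_value): array {
  return preg_split('/\s+/', trim($header_value), -1, PREG_SPLIT_NO_EMPTY);
}

/** Joins tags with spaces, the format some CDNs expect. */
function format_space_separated(array $tags): string {
  return implode(' ', $tags);
}

/** Joins tags with commas, the format other CDNs expect. */
function format_comma_separated(array $tags): string {
  return implode(',', $tags);
}

// Example value as it might appear in an X-Drupal-Cache-Tags header.
$tags = parse_space_separated_tags('config:system.performance node:5 node_list');
$for_comma_cdn = format_comma_separated($tags);
```

Because cache tags cannot contain spaces, splitting on whitespace is unambiguous.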

As a rule of thumb, it's recommended that both your web server and your reverse proxy support response headers with values of up to 16 KB.

  1. HTTP is text-based. Cache tags are therefore also text-based. Reverse proxies are free to represent cache tags in a different data structure internally. The 16 KB response header value limit was selected based on 2 factors: A) to ensure it works for the 99% case, B) what is practically achievable. Typical web servers (Apache) and typical CDNs (Fastly) support 16 KB response header values. This means roughly 1000 cache tags, which is enough for the 99% case.
  2. The number of cache tags varies widely by site and the specific response. If it's a response that depends on many other things, there will be many cache tags. More than 1000 cache tags on a response will be rare.
  3. But, of course, this guideline (~1000 tags/response is sufficient) may and will evolve over time, as we A) see more real-world applications use it, B) see systems specifically leverage/build on top of this capability.

Finally, anything beyond 1000 cache tags probably indicates a deeper problem: the response is overly complex and should be split up. Nothing prevents you from going beyond that number in Drupal, but it may require manual fine-tuning, which is acceptable for such extremely complex use cases. Arguably, that's the case even for far fewer than 1000 cache tags.
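The arithmetic behind the rule of thumb is simple: 16 KB is 16384 bytes, so 1000 tags leave roughly 16 bytes per tag, separator included. A minimal sketch for checking a given set of tags against that budget follows. The constant and function names are hypothetical, and a one-byte separator is assumed.

```php
<?php

const HEADER_VALUE_LIMIT_BYTES = 16 * 1024; // The 16 KB rule of thumb.

/** Returns the byte length of the tags joined by a one-byte separator. */
function cache_tags_header_bytes(array $tags): int {
  return strlen(implode(' ', $tags));
}

/** Tells whether the joined tags fit within the 16 KB rule of thumb. */
function cache_tags_fit_in_header(array $tags): bool {
  return cache_tags_header_bytes($tags) <= HEADER_VALUE_LIMIT_BYTES;
}

// 1000 tags "node:1" .. "node:1000": 7893 bytes of tags plus 999 separators.
$tags = array_map(function ($id) {
  return 'node:' . $id;
}, range(1, 1000));

echo cache_tags_header_bytes($tags), "\n"; // prints 8892
```

Short entity tags like these fit comfortably. Longer tags, such as configuration names, use up the budget faster, so the number of tags that fits depends on their average length.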

Read documentation for using Varnish with cache tags.

CDNs known to support tag-based invalidation/purging:

Internal Page Cache

Comprehensive use of cache tags across Drupal 8 allows Drupal 8 to ship with its Internal Page Cache enabled by default. This is nothing more than a built-in reverse proxy.

See also