Problem/Motivation

#2158003: Remove Block Cache API in favor of blocks returning #cache with cache tags introduced the concept of cache contexts. #2167039: Regression: page cache tags broken, also: some routes incorrectly use _controller -> No page object in template variables -> Fatal in seven_preprocess_page() introduced CacheableInterface.

CacheableInterface only had getCacheKeys(), because that's what was relevant to the scope of that issue. About cache keys, from CacheContexts::convertTokensToKeys():

   * Cache keys may either be static (just strings) or tokens (placeholders
   * that are converted to static keys by the @cache_contexts service, depending
   * depending on the request). This is the default cache contexts service.

But, by now we're using CacheableInterface for use cases where we don't want to return any static keys, but only tokens, because the object that implements CacheableInterface is not itself cacheable, but affects the cacheability of a containing object.

In other words, we now have situations where it's confusing to implement getCacheKeys() because we want implementations to not return actual keys, but only to return tokens (cache contexts) by which the cached thing should be varied.

Proposed resolution

Add CacheableInterface::getCacheContexts().

This will allow both BlockBase() and AccessResult to be easier to understand.

When building a CID for a cacheable object, we'll just have to do array_merge($o->getCacheKeys(), $o->getCacheContexts()).

Remaining tasks

Discuss.

User interface changes

None.

API changes

CacheableInterface::getCacheContexts()

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Wim Leers’s picture

Status: Active » Postponed
Wim Leers’s picture

Discussed this with msonnabaum at DrupalCon Amsterdam — he strongly agreed, and was even very much surprised it wasn't already the case.

Patch attached.


I was just opening a new issue for this, but my browser's autocomplete functionality suggested this title, hence I knew I'd already created this >1 month ago — nice :)

msonnabaum’s picture

Status: Needs review » Reviewed & tested by the community

I remember now that I designed it the way it was with the idea that we'd never treat contexts as anything other than lazy keys, but now that we know we want to bubble the contexts, I agree that they need their own method.

Patch looks sensible to me.

catch’s picture

Status: Reviewed & tested by the community » Needs review

I know we came up with a special case for access checks, but that requires executing the access checkers to get their granularity/contexts prior to generating the cache key for the parent. Also it took me a couple of iterations to get my head around the chicken/egg situation there.

For other cases like say an entity rendering, how do I get the cache contexts of the children? Surely the parent has to know about the implementation of CacheableInterface to call downwards (for example whether an entity has field formatters that require access checks on entities, or none), but that's different to automatically bubbling up.

Fabianx’s picture

FWIW, I just designed the same for contrib then saw that CacheableInterface does not support getCacheContexts().

Ultimately we want #cache to match CacheableInterface, too.

So ['#cache']['contexts']?

There is merits to both approaches. One issue I ran into (that does not apply to D8) is the order and where to put the cache contexts section into the in the end generated Cache ID, having the Cache Contexts as part of the keys is nice for that, as your Cache IDs still make sense.

Playings devils advocate here:

- Isn't it enough to return getCacheKeys() and use isCacheable() == FALSE for things where only the context matters?

Without needing that API change?

Fabianx’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update

The solution Wim and me came up in IRC was to:

- Enforce usage of @cache_context.user as placeholder - this matches symfony syntax and it _is_ a service after all
- Create Cache::filterContexts(), which just uses array_filer($keys, function($key) { return strpos($key, '@') === 0; } );

That makes it both performant and container independent to filter the dynamic cache contexts, also improves DX as a dynamic placeholder can easily be distinguished from a static one.

The reason to be able to find the Cache Contexts is that a child might want to know the cache context of its parent and render a placeholder when the cache context is not matching. This needs to be added to the issue summary.

Wim Leers’s picture

Let's finally finish this one, since it has repercussions for at least #2099137: Entity/field access and node grants not taken into account with core cache contexts and #2396333: BlockContentBlock ignores cache contexts required by the block_content entity.


#4:

For other cases like say an entity rendering, how do I get the cache contexts of the children? Surely the parent has to know about the implementation of CacheableInterface to call downwards (for example whether an entity has field formatters that require access checks on entities, or none), but that's different to automatically bubbling up.

For entities + fields, we have #2099137: Entity/field access and node grants not taken into account with core cache contexts to solve that. And yes, the parent has to know that CacheableInterface is implemented by the children. But no matter how it ends up calling those children, it still needs a reliable way of getting only the cache contexts, not also the cache keys. And it's perfectly possible that they return both.
I don't see why this is a reason to not do #2?


#5: the first half of the comment actually is an endorsement of the patch in #2. The second half is playing devils advocate, but the answer is: no, that is not enough. It's perfectly possible for something to be cacheable (isCacheable() === TRUE) and still return cache contexts. Something may be individually render cacheable (and hence return both keys and contexts in its getCacheKeys() method), but also need to affect a parent when that is cached.


#6 feels like a painful solution, one that is pretty much equally problematic as what we have today, because it still relies on a specifically formatted string, which is not a strong guarantee (nothing forbids cache keys from beginning with an @).

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
8.22 KB

Quite incredibly, #2 applies cleanly after 4 months! (With the sole exception of the HtmlFragment hunk, which is obsolete now since that class is gone.)

Wim Leers’s picture

FabianX and I talked on IRC. We came to the following conclusion + plan:

  1. We agreed that both in HEAD and in the #6 proposal, the DX leaves a lot to be desired. The mixing of "static cache keys" and "dynamic cache context token placeholder cache keys" is very confusing. Splitting things up makes it a lot clearer.
  2. HEAD uses a magic name, #6 is still using a magic name. One that is less fragile, but fragile nonetheless, so marginally better at best.
  3. Q: If we split out the "contexts" (::getCacheContexts() + #cache['contexts']) part out from "keys" (::getCacheContexts() + #cache['contexts']), how do we ensure that they're always combined into a CID in the same way? If one place first uses the contexts, the other first uses the keys, then they'd have different CIDs.
    A: By convention, we generate a CID by first using the keys, then the contexts. This also aids debuggability, because it ensures that the things unique about the thing are listed first, and the contexts it must be varied by are listed last.
  4. Q: Doesn't this break CID-prefix-based invalidation?
    A: D8 doesn't support that: the API doesn't list it anymore, and the hashing we do in the DB cache back-end to ensure we don't exceed the limit of 255 characters for the CID already prevents using prefix-based invalidation anyway.
  5. We'd need to change the render array syntax too: not only have #cache['keys'], but also #cache['contexts']. This would only be a minor disruption, since very little code in core uses cache contexts anyway (it's almost all in the block and entity view builders, which generate almost all renderables). And it'd make things much clearer.
  6. This also allows us to make the CacheContexts service be more strict: rather than only converting the placeholders it is given that are actually cache contexts, we can now throw an exception whenever it is given a non-existing cache context, because the calling code knows exactly which things are contexts and which aren't. This further helps DX: if there's a typo, the developer will instantly know why things aren't working as expected.
  7. Next: the whole reason we used 'cache_context.user' as the cache context token/placeholder is because using just 'user' had a 99% probability of collisions (i.e. a static cache key accidentally being 'user' too and hence end up being treated as a cache context unintentionally). The use of 'cache_context.user' made that a 1% collision probability.
    By separating the two (keys vs. contexts), and introducing the above exception, we make it a 0% probability. Which is quite important for Drupal 8 in the long run, because somebody will probably end up causing a collision even in HEAD's code…
  8. Consequently, we can now actually can use 'user' instead of 'cache_context.user', precisely because we won't have to parse the cache keys and try to identify the cache contexts among them; we'll already know which strings are cache contexts. So this will again make the DX better: rather than having to type the pretty awkward, implementation detail-leaking 'cache_context.language', they'll just be able to type 'language' — much, much better.
  9. Finally: we can implement this ('cache_context.language' -> 'language') by keeping the service declarations identical, but changing the compiler pass, to only pass the service ID without the 'cache_context.' prefix, in combination with requiring cache context services to have that prefix (and throwing an exception otherwise). That way, we leverage the container's preexisting requirement of all services being required to have a unique ID to ensure that all cache contexts and associated cache context services have a unique ID.

The many DX benefits listed above should make the cache APIs less intimidating, and hence they will end up being used more often.

Wim Leers’s picture

FileSize
36.84 KB
31.87 KB

Turns out we need less than 30 net additions to make this happen! :)

Fabianx’s picture

  1. +++ b/core/modules/block/src/Tests/BlockViewBuilderTest.php
    @@ -214,7 +214,7 @@ public function testBlockViewBuilderAlter() {
    -    $default_keys = array('entity_view', 'block', 'test_block', 'en', 'cache_context.theme');
    +    $default_keys = array('entity_view', 'block', 'test_block', 'en');
    

    Is this intentional?

    And should we not define default_cache_contexts as well?

  2. +++ b/core/modules/block/src/Tests/BlockViewBuilderTest.php
    @@ -223,7 +223,7 @@ public function testBlockViewBuilderAlter() {
    -    $cid = 'entity_view:block:test_block:en:core:' . $alter_add_key;
    +    $cid = 'entity_view:block:test_block:en:' . $alter_add_key . ':core';
    

    This reminds me:

    Should we order cache contexts alphabetically when creating the CID to avoid fragmentation?

Overall looks really good, almost RTBC (just feedback to those two points).

Wim Leers’s picture

  1. Yes, because the latter value in the array is not a cache key.
    Yes, we could define $default_cache_contexts as well, but the current solution just handles that single context in a hardcoded manner (it resolves to 'core') because the theme is always going to be the same. I can change it to $default_cache_contexts + use the cache contexts service though.
  2. See the explanation in point 1 for why this was flipped around :)
    RE: sorting: It would definitely be harmless. But I can't think of a single reason why it would be necessary though? As long as the cache contexts are passed around without changing the order, and the code that actually specifies cache contexts is deterministic (i.e. always returns things in the same order), we're fine. If there was a problem in this area, we would have encountered it a long time ago already?
Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

I was thinking of cases where we pass cache contexts upstream in the future, but we can do the sorting and deduplication indeed then do during the merging itself.

RTBC

Wim Leers’s picture

FileSize
36.06 KB

#2318437: Replace the hardcoded langcode key on blocks with the 'language' cache context landed, which conflicts in several places. Straight reroll.

The last submitted patch, 10: cacheable_keys_contexts-2329101-10.patch, failed testing.

Wim Leers’s picture

#10 failed only because testbot is so ridiculously backlogged that a whole bunch of commits happened before the patch even got to being tested… #14 is the reroll to fix that. Hopefully it comes back green.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 14: cacheable_keys_contexts-2329101-13.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
35.23 KB
1.55 KB

Failure due to #2415441: Automate finding @covers errors which also landed after #10 was posted; exception due to an incorrect change in a test. Keeping at RTBC, because there are no actual changes.

effulgentsia’s picture

+++ b/core/lib/Drupal/Core/Cache/CacheableInterface.php
@@ -9,6 +9,15 @@
+ * The only cacheability metadata that must not be bubbled, are the cache keys:
+ * they're explicitly intended to be used to generate the cache item ID when
+ * caching the object they're on.

I don't get this. What's an example where we want the implementor of this interface to decide what should go into its cache ID, but that isn't something that needs to bubble up? Seems to me that implementors of this interface should only implement getCacheContexts() and never implement getCacheKeys(), in which case, should we remove getCacheKeys() from the interface? As an example, BookNavigationBlock::getCacheKeys() currently returns the book's active trail, but why is it we wouldn't want that to bubble up to whatever larger element contains that block?

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs review

Sorry, I hate un-RTBC'ing issues, especially for a question that might have an answer that doesn't require any change to the patch, but I'd feel better having an answer to #19 before this gets committed.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

Let's look at a generic example to explain this.

Every block must be varied by language and theme. Because a block config entity has a label that is displayed to the user and can be different for every language. And because a block is rendered into HTML markup, and the templates may differ per theme. Hence ::getCacheContexts() === ['language', 'theme'].
But just those cache contexts don't make up the cache ID: there is nothing that uniquely identifies the block. So the cache keys we use (by default) are ['entity_view', 'block', $entity->id()]. The first two of those keys are still generic: they apply to all blocks. The last key is what uniquely identifies the block. But the first two keys are important parts too: they already help narrow down what thing it is that is being cached: a rendered entity, that is, a placed block entity, that is, the block with that specific ID. Omitting the first two keys would make it easy to have collisions with something else. So: the cache keys allow us to uniquely identify the object.
Now, when we render a block, the things it must be varied by (language & theme) "infect" the parents/containers of the block too: regions in the page, and the page itself. In other words: if a block must be varied by theme and language, then so must the region that contains it, and so must the page.
However, looking at the cache keys, it doesn't make sense to cache the page using a cache ID that includes the keys ['entity_view', 'block', $entity->id()], because that triple uniquely identifies the block, not the page. Imagine if we did include in the page's cache ID: we'd have to include the cache keys of every single block, every single node, user, term, etc. That doesn't make sense.


I hope that was clear.


Looking at the example you pointed at — BookNavigationBlock::getCacheKeys()/SystemMenuBlock::getCacheKeys() — you pull out the one example in all of core that is confusing :P

Cache contexts allow you to say "my thing depends on this bit of the request context, vary by it when caching". Examples: user, user's roles, URL, theme, language — all of those are clearly "request contexts". Of each of those, there is one and only one of them per request. Which is why we can use them as placeholders: we know that the placeholder/token can be (efficiently) replaced with the effective value for the current request… precisely because there is only one value for them for every request.

Now, the problem with the active trail (in menus/books — they're the same concept, same implementation, same trickiness) is that there is no THE active trail, there are multiple active trails! That is, there is an active trail per menu, an active trail per book!


The only way we could possibly turn that into a cache context, is by allowing a cache context to receive parameters: we could then have an "active menu trail" cache context that receives the menu name as a parameter, in which case "one and only one per request" is true again, given that parameter to select a menu to get the active trail for.

Until now, we've chosen not to go that way. We could. But IMO that would be a follow-up issue.

OTOH, as soon as a parameter needs to be passed, it indicates that you're no longer dealing with mere request context. You're also dealing with a fair bit of application logic. In which case, you could argue — as we've done so far — that you're no longer dealing with "vary by a request context", but "object uniqueness". And uniquely identifying an object is the responsibility of the cache keys again.

I'd generally be in favor of having "parametrized cache contexts", but IMO that'd belong in a follow-up issue.

Wim Leers’s picture

Fabianx’s picture

This is just an idea I had when reading #21, but thinking about this some more, maybe we should add a cache tag whenever we add a cache context, so that its no longer the one adding the cache contexts to also add the cache tags, which might not even be known at that time or double the logic.

Just food for thought and not in this issue for sure.

Wim Leers’s picture

webchick’s picture

Assigned: Wim Leers » catch

This smells catch-ish.

Wim Leers’s picture

catch’s picture

I'm holding off committing this because I'm not sure about #2429257: Bubble cache contexts, and I think that's the only issue blocked by this.

Wim Leers’s picture

Fabianx’s picture

I think this patch is a necessity for Drupal regardless of the other parts, here is why:

Cache Keys - Identify the thing, based on its configuration (e.g. view_mode, language, ...) - its the unique address
Cache Tags - Allow invalidation of the thing
Cache Contexts - show the variation of the thing - based on some state that is _outside_ of the thing.

Already in render_cache I learned that e.g. modified time is something that is usually not a true variation (later versions overwrite earlier versions) and view_mode is something that is configured on the thing, but 'phase of the moon' - being external - is a true variation where the same configuration has different outputs.

It is important to distinguish both cases as it makes talking about this things way simpler and also makes the whole system simpler to understand.

This also much improves the DX of being able to write just $build['#cache]['context'][] = 'user';

Wim Leers’s picture

#27: Even if we don't do bubbling of cache contexts, then we still want this to happen. Because we still want e.g. the entity view builder to gather the cache contexts of the field formatters that it's about to use.

Plus, as #29 indicates, this makes the distinction between keys & contexts much clearer. This will give us a clear cache metadata trifecta:

  • cache keys: "the thing" — could be "a specific configuration of the thing" — like an entity (thing) view mode (configuration)
  • cache contexts: "state of the viewer to which the representation of the thing is adapted"
  • cache tags: captures the dependencies used for the representation, hence indicating when it should be invalidated

Technically, we already have those 3 things. But we mix the keys & contexts, which makes it significantly less clear.

  • catch committed 460af4c on 8.0.x
    Issue #2329101 by Wim Leers: CacheableInterface only has a getCacheKeys...
catch’s picture

Status: Reviewed & tested by the community » Fixed

OK found and read the google doc which covers all the points I raised on #2429257: Bubble cache contexts. If we're going to do some combination of cid caching, placeholdering and cache context comparison/hierarchy then this should not only allow for much better performance but also help to resolve intractable issues like the entity/field access ones.

One typo fixed on commit.

+++ b/core/lib/Drupal/Core/Cache/CacheContexts.php
@@ -112,16 +108,17 @@ protected function getContext($context) {
+   * Retrieves a cache contet service from the container.

context.

Committed/pushed to 8.0.x, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

YesCT’s picture

Issue tags: -Amsterdam 2014 +Amsterdam2014

changing to the more commonly used tag, so I can delete the one lesser used