Proposed commit message

Issue #2432837 by Wim Leers, Fabianx: Make cache contexts hierarchical (e.g. 'user' is more specific than 'user.roles')

Problem/Motivation

  1. Cache Contexts offer great granularity, but e.g. a cache_context like menu_trail that is bubbled up to page (response) level is needlessly increasing the complexity of that cache ID. After all, the menu trail cache context really is a subset of the URL cache context.
    Also an advanced HttpMiddleware page cache would likely try to transfer as less cache contexts over the wire as possible.
  2. The 2nd reason is to be able to compare cache contexts.

    e.g. An entity that is displayed per role, is displayed within a block that is configured per user.

    Without the hierarchy, the role context would be propagated up and a sensible policy might have decided to create a placeholder for the 'role' entity as its not compatible, but because "per user" is a more specific cache context (a subset of) than "per role", this is wrong, because no placeholder is needed in reality.

    Proposed resolution

    Remaining tasks

    None.

    User interface changes

    None.

    API changes

    Only API additions.

    Cache context services have been renamed, which practically amounts to an API change as well. See the CR (https://www.drupal.org/node/2451661) for details.

    ---

    The global definition would ideally live in the container, so its:

    a) fairly static
    b) very reliable and usable at page middle ware level pre-boot and pre-routing.

    In other words, an advanced page cache sitting at the middle ware level as explained in the doc, would probably make heavy use of the cache context hierarchy (issue needed), to propagate the mix of:

    [ menu_trail:tools, book.navigation, users.roles ] to e.g. just [ 'page', 'users.roles' ] and only take the most upper level cache contexts into account that can be executed with very low overhead.

    Beta phase evaluation

    Reference: https://www.drupal.org/core/beta-changes
    Issue category Task because not a bug, but necessary to ensure good cache hit ratios.
    Issue priority Major because this is crucial for getting good cache hit ratios: when varying by cache contexts that are implied by other cache contexts, needless variations are created that are identical.
    Prioritized changes The main goal of this issue is performance.
    Disruption Minimal disruption: existing contrib/custom code that uses cache contexts will have to update to use the updated cache contexts IDs. There is a huge DX benefit though: the new cache contexts communicate unambiguously what their parent (more granular) cache contexts are by which they are implied. E.g. url.pagers:0 is implied by url.pagers which in turn is implied by url, and knowing that is easy: you only have to look at the cache context specified! It also clearly communicates to the developer where the value originates from. i.e. it takes away some of the magic, leading to a deeper understanding.
CommentFileSizeAuthor
#105 cache_context_hierarchy-2432837-105.patch67.64 KBWim Leers
#101 interdiff.txt1016 bytesWim Leers
#101 cache_context_hierarchy-2432837-101.patch67.65 KBWim Leers
#95 interdiff.txt848 bytesWim Leers
#95 cache_context_hierarchy-2432837-94.patch67.65 KBWim Leers
#92 interdiff.txt3.63 KBWim Leers
#92 cache_context_hierarchy-2432837-92.patch67.78 KBWim Leers
#90 interdiff.txt2.09 KBWim Leers
#90 cache_context_hierarchy-2432837-90.patch67.82 KBWim Leers
#84 interdiff.txt993 bytesWim Leers
#84 cache_context_hierarchy-2432837-84.patch67.76 KBWim Leers
#80 interdiff.txt4.8 KBWim Leers
#80 cache_context_hierarchy-2432837-80.patch67.77 KBWim Leers
#77 interdiff.txt5.18 KBWim Leers
#77 cache_context_hierarchy-2432837-76.patch70.65 KBWim Leers
#66 interdiff.txt3.07 KBWim Leers
#66 cache_context_hierarchy-2432837-66.patch70.66 KBWim Leers
#65 cache_context_hierarchy-2432837-65.patch70.64 KBWim Leers
#61 interdiff.txt1.45 KBWim Leers
#61 cache_context_hierarchy-2432837-61.patch70.55 KBWim Leers
#58 interdiff.txt10.95 KBWim Leers
#58 cache_context_hierarchy-2432837-58.patch70.02 KBWim Leers
#50 interdiff.txt5.54 KBWim Leers
#50 cache_context_hierarchy-2432837-50.patch60.06 KBWim Leers
#46 interdiff.txt9.19 KBWim Leers
#46 cache_context_hierarchy-2432837-46.patch58.69 KBWim Leers
#45 interdiff.txt996 bytesWim Leers
#45 interdiff.txt996 bytesWim Leers
#45 cache_context_hierarchy-2432837-45.patch50.11 KBWim Leers
#42 interdiff.txt9.12 KBWim Leers
#42 cache_context_hierarchy-2432837-42.patch49.2 KBWim Leers
#41 cache_context_hierarchy-2432837-39_rebased.patch41.13 KBWim Leers
#39 interdiff.txt1.07 KBWim Leers
#39 cache_context_hierarchy-2432837-39.patch41.3 KBWim Leers
#36 interdiff.txt5.44 KBWim Leers
#36 cache_context_hierarchy-2432837-36.patch40.73 KBWim Leers
#35 interdiff.txt9.64 KBWim Leers
#35 cache_context_hierarchy-2432837-35.patch39.45 KBWim Leers
#33 interdiff.txt5.07 KBWim Leers
#33 cache_context_hierarchy-2432837-33.patch32.83 KBWim Leers
#31 interdiff.txt8.87 KBWim Leers
#31 cache_context_hierarchy-2432837-31.patch30.02 KBWim Leers
#30 interdiff.txt6.41 KBWim Leers
#30 cache_context_hierarchy-2432837-30.patch21.83 KBWim Leers
#24 interdiff.txt5.5 KBWim Leers
#24 cache_context_hierarchy-2432837-24.patch16.75 KBWim Leers
#22 interdiff.txt5.5 KBWim Leers
#22 cache_context_hierarchy-2432837-22.patch16.14 KBWim Leers
#20 interdiff.txt5.2 KBWim Leers
#20 cache_context_hierarchy-2432837-20.patch12.63 KBWim Leers
#17 cache_context_hierarchy-2432837-17.patch8.03 KBWim Leers
#6 cache_context_hierarchy-2432837-6.patch1.84 KBWim Leers
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Fabianx’s picture

Issue summary: View changes
Wim Leers’s picture

Issue summary: View changes

Just adding some formatting/minor rewording to the IS.

yched’s picture

Regarding hierarchy, it's a bit counter-intuitive that 'user' is more granular than 'user.roles'. From the naming you'd expect the opposite (foo.bar looks more specific than foo)

catch’s picture

Yes that's been in the back of my mind as well, I think it goes both ways though.

Specifically in terms of user roles, it's easy to see that 'user' includes 'user.roles' - i.e. that roles is a property of user. Therefore if you vary on user you're also varying by role.

I'm wondering if that works for enough other things though - say page and page.active_trail? Then timezone - iirc it's possible to prevent users from selecting a timezone so that this would always be the timezone of the site.

Another thing here, I had a brief irc discussion with Fabianx about (longer term) exposing some of this logic to reverse proxies (i.e. for authenticated user caching in Varnish).

If we're able to group cache contexts into things like page vs. user, then that ought to help reverse proxy implementations figure out which are handled by the path and which aren't.

Fabianx’s picture

Priority: Normal » Major

Bumping to major as that is a big enabler of:

a) Caching at HttpMiddleware level (authenticated user)
b) ESI / auto-AJAX-replacement -- The less cache contexts, the easier the request logic of those.
c) Cache context comparison.

So authenticated user caching in Varnish in reality also solves most of the problems around ESI (at least those around the cache granularity).

The same lazy-two-tier-approach works for Varnish, too! :)

An ESI request is just a request that does not have 'cache_context.page', which is req.url in 'varnish speak'.

Wim Leers’s picture

Status: Active » Needs review
Issue tags: +Needs tests
FileSize
1.84 KB

I'd like to start talking about the implementation.

Cache contexts aren't plugins. They're tagged services.

So we can't use annotations. What we can use, is the ability to set purpose-specific attributes for tagged services. What about a parent attribute? E.g.:

  cache_context.pager:
    …
    tags:
      - { name: cache.context, parent: cache_context.url}

… but I wonder if that is truly sufficient. I think there are use cases where contrib/custom cache contexts may sit between existing levels.

E.g. when varied per user, there's no need to vary per user.roles. So the parent of user.roles is user. But what if we add an user.is_admin context? That's actually in between: it behaves like user for UID 1, but kind of like user.roles for every other user.


#3/#4: we could rename the user cache context to user.id, if you think that helps.

Wim Leers’s picture

Actually, I think parent is too vague. What about is_subset_of? But that too is not very clear, because, that means reading it as the pager cache context is a subset of the url cache context because it is less specific, but "less specific" actually would mean superset, not subset.

Perhaps I'm being misled because I'm not a native speaker, but I do think it's very important we get the wording right.


So, let me let me explain the situation using the cache context hierarchies we have in core, so anybody can chime in:

url
  - pager
  - book.navigation
  - menu.active_trail
user
  - user.roles

That is the hierarchy. If any of the lower level cache contexts is specified, as well as any of the higher level, then the lower level one can be omitted, since it's already implied by the (more specific) higher level.

Basically: What is the best word/verb to describe this relationship?

Fabianx’s picture

I think first of all we should name them like that, because in the end this is like class chaining:

Possible core examples

- user
- user.roles
- user.roles.has_role:'administrator'
- user.access_grants
- user.timezone (TZ is configured by a user in Drupal even if its the same always)
- user.language (the users preferred language, when e.g. user language neg is used on a content piece)

- url
- url.pager:0
- url.menu.active_trail:'tools'
- url.query_parameter:'campaign_code'
- url.book_navigation (While other schemes are possible the core scheme is to variate the book navigation by url.)

- url.main_request (need to call the main route for page assembling)

- url.route (hash of the route with parameters, used e.g. by SmartCache)
- url.route.format (negotiated format after routing, e.g. 'json', 'dialog', etc.)
- url.route.language (negotiated language derived from the url after routing)

( Here route is a sub-set of url based on path based language and content negotiation with the exception of allowing uncacheable pages to be everything. )

Possible Contrib examples

- user.organic_groups.has_og:'some-group'

- request (Hash of whole incoming request, not that useful)
- request.cookie:'device'
- request.header:'X-Drupal-Device'

Conclusion

Naming things appropriately is probably the easiest to show the relationships and then 'parent' is probably fine.

larowlan’s picture

Yes, the . based hierarchy makes sense to me, and we have similar patterns elsewhere, eg in validation message paths.
So given this varies from the pure service tags approach, where do we keep the logic - will you just use simple strpos === 0 style comparisons or do you need more complex helper methods?

Wim Leers’s picture

Wim Leers’s picture

It looks like #8 & #9 have established that we should declare the hierarchy through the use of periods. Let's go with that for now.


For completeness, this is the entire cache context hierarchy that we (Fabianx, catch, I) discussed in IRC:

request
    ip_address 
        country
    headers
        header:<parameter, e.g. 'X-Device-Detected'>
    cookies
        cookie:<parameter, e.g. 'device', 'cart-count'>
        session -- e.g. CSRF tokens
            user
                roles
                    has_role:<parameter, e.g. 'admin', 'anonymous'>
                organic_groups
                    has_organic_group:<parameter>
                is_super_user
                node grants
                timezone
                language
    accept (content negotiation)
        >> Not supported (period!)
    accept_language (language negotiation)
        >> 302 redirect, uncacheable — see https://www.drupal.org/node/2430335
    url
        host
        query_args
            query_arg:<parameter, e.g. 'q'>
            pagers
                pager:<parameter, e.g. 0>
        menu.active_trails
            menu.active_trail:<parameter, e.g. 'tools'>
        book.navigation
        route
            format -- the format is derived from the extension
            language -- the language is derived from the path
                language:<type, e.g. 'interface', 'content'>
            route_parameters
            path
            theme -- the theme is by default per route
server / global state
    request_time
    deployment_id
    configured state

Observations:

  1. There are two trees: request and server are their roots. This indicates that a cache context either comes from the request (almost always) but might also come from the server in rare cases.
  2. But "request" is the 99% use case, so we can omit request from the cache context IDs (and thus also omit it from that period-based naming scheme).
  3. So let's look at the request tree, that's the most interesting one. We see 4 children there: ip_address, headers, cookies and url. A HTTP request roughly consists of 2 things: a URL + headers. We special-case the cookie header because of its special treatment on the client-side. And each request originates from an IP address. Hence those 4. (Based on the discussion in #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use, we can explicitly not support the Accept & Accept-Language headers as cache contexts.)
  4. Now the important bit: every other cache context (including the ones in HEAD) has either of those 4 (ip_address/headers/cookies/url) as its ancestor! This is of vital importance because it means that if something varies by either of those 4 contexts, then any of its descendants are implicitly varied by also, and thus we can safely omit them.
  5. This reasoning applies to all levels in the tree. For example, if anything varies by the language request, then that implies varying by the language:interface cache context, meaning we can omit the latter.
  6. … but that points to an interesting consequence of the period-based naming scheme: do we really want to write url.route.language and url.route.language:interface? It has the benefit of removing all ambiguity, at the cost of being quite long.
catch’s picture

The longer names seem OK.

In irc Wim suggested:

- rename existing cache contexts to match the tree
- follow-up issues to add missing contexts

Also seems OK.

Fabianx’s picture

#11.6: I am all for long names, because while discoverability is not as good as for long classnames (where you have auto completion - "What IDE, I don't need an IDE. I have ex!'), we are generally moving towards:

- explicit is better than implicit

in core.

Also for language e.g. we have a user.language cache context and a url.route.language cache context, where this makes it also more clear.

yched’s picture

But the session doesn't necessarily come from a cookie ? (er, or does it ?)

yched’s picture

Also, discoverability is a concern. How do I find the correct full tree name for "cache by roles" ?

Fabianx’s picture

#14: Yes, it does in Drupal :).

#15: You look into either the services tree like for any other service or read the cache context documentation that we start in a sister issue.

Also user.roles is not more magical, than what we currently have, which is, uhm, 'user.roles'.

So my proposal was to document the whole tree, but don't have:

request.cookies.session.user.roles

But start a new tree with 'user' as user is a really special concept.

So the two root 'aliases' would then be:

- url (special, aliased from request.url)
- user (special, aliased from request.cookies.session.user)

Both are special concepts for a CMS like Drupal.

The two natural root aliases are:

- server (like server.value:HTTP_HOST)
- request (like request.header)

--

So my long names proposal is actually a:

- as long as needed for practical purposes proposal

and to shorten the hierarchy, where we don't have something meaningful as parents that can be cached.
(and while session might be valid at times, I don't feel it deserves it's own cache context, because it gives horrible cacheability).

--

TL;DR:

Have 4 root elements:

- server
- request

- user
- url

for the cache context hierarchy for practical purposes, have child elements of those use the '.' notation.

Also see #8 for concrete examples.

Note that this roughly matches what varnish has internally as variables (this comes naturally as both describe the hierarchy of headers and values):

e.g.

request == req
server == req.backend.value (IIRC)
url == req.url
user == // There is no concept of a user in varnish.
request.cookie == req.http.cookie (with regexp to filter out)
request.header:X == req.http.X

Wim Leers’s picture

Assigned: Unassigned » Wim Leers
FileSize
8.03 KB

Initial patch. Only doing url.host and url.query.args in this one, to see what it's starting to look like. Also introduces validation of the cache context ID, to ensure the parent cache contexts are present.

This patch will fail because the menu.active_trail cache context uses the period for something that isn't hierarchical, there's no menu cache context. Just showing that it works and what exception you'll get.

(No interdiff, completely new patch, following the agreed upon approach.)

Now pushing this further.

Status: Needs review » Needs work

The last submitted patch, 17: cache_context_hierarchy-2432837-17.patch, failed testing.

Wim Leers’s picture

Failed as expected, with this message:

 Output: [PHP Fatal error:  Uncaught exception 'InvalidArgumentException' with message 'The service "menu.active_trail" has an invalid service ID: the period indicates the hierarchy of cache contexts, therefore "menu" is considered the parent cache context, but no cache context service with that name was found.' in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Cache/CacheContextsPass.php:39
Wim Leers’s picture

Status: Needs work » Needs review
FileSize
12.63 KB
5.2 KB

Changes:

  1. pager -> url.pager
  2. menu.active_trail -> url.menu_active_trail
  3. API addition: MenuActiveTrailInterface::getActiveTrailIds() now accepts NULL as the menu name, just like its sibling MenuActiveTrailInterface::getActiveLink() — this then allows us to build the active trail when no menu is specified. This is just fixing an oversight.

Status: Needs review » Needs work

The last submitted patch, 20: cache_context_hierarchy-2432837-20.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
16.14 KB
5.5 KB

While tackling the next one (book.navigation), I realized that it's actually got the url.route cache context as its parent, and in fact, so does url.menu_active_trail/menu.active_trail!

Changes:

  1. New: url.route
  2. url.menu_active_trail -> url.route.menu_active_trail
  3. book.navigation -> url.route.book_navigation
  4. Fixed the pager cache context, to comply with its interface. Should cause testbot to finally actually run without fataling.

Status: Needs review » Needs work

The last submitted patch, 22: cache_context_hierarchy-2432837-22.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
16.75 KB
5.5 KB

Should be green.

catch’s picture

Is there any chance we need a cache context for the actual route as a thing as opposed to route + params?

Wim Leers’s picture

#25: interesting! I don't think we need that, because a URL is bound to a route + parameters, not to just a route. Just a route can live at many URLs, so it's not like that could be a useful level between "url" and "route", AFAICT.

Fabianx’s picture

#25: If you mean the route name, the context for that would be url.route.name instead.

Wim Leers’s picture

Indeed, that's implied by what I said. Thanks for making that much clearer :)

catch’s picture

url.route.name is what I was thinking of. And indeed url.route is the parent of that so that works fine.

Wim Leers’s picture

FileSize
21.83 KB
6.41 KB

This interdiff:

  1. New: url.route.format
  2. New: url.route.name
  3. With that, the url-subtree is finished IMO, caching per url.route.route_parameters or per url.path don't make sense.
  4. New: ip
  5. New: cookies (which allows for cookies:<name>)
  6. New: headers (which allows for headers:<name>)

Next: user.

Wim Leers’s picture

FileSize
30.02 KB
8.87 KB

This interdiff:

  1. user and user.roles moved from the user module into \Drupal\Core, since none of it actually depends on user module logic — or at least, not anymore.
  2. Added user.is_super_user
  3. Changed user.roles into a calculated cache context; it behaves just like it does today, but it now allows one to specify a specific role to check the presence of, e.g.: user.roles:admin, to vary something by whether the current user has the admin role or not (or e.g. anonymous or not, or authenticated or not)
  4. node_view_grants -> user.node_view_grants

Next: dealing with the last 3 cache contexts in core that haven't been touched yet: language + theme + timezone.

Status: Needs review » Needs work

The last submitted patch, 31: cache_context_hierarchy-2432837-31.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
32.83 KB
5.07 KB

This interdiff:

  1. Fixed #31, which was broken due to a c/p error for the user.roles cache context.
  2. As discussed in IRC yesterday with catch & Fabian: user, theme and language should be top-level cache contexts. The conclusion, verbatim from IRC:
    13:57:18 WimLeers: catch: Fabianx-screen and I agree with you, the basic rule is: fit cache contexts into the tree unless there's a use case for altering the behavior/logic, and hence the parent cache context. In that case, make it a top-level cache context. Applies to user, theme, language.
  3. #31 did that for user. This now also does that for theme and language
  4. timezone -> user.timezone

Next: clean-up.

Wim Leers’s picture

What the current patch gives us

So we've established a clear pattern to define the tree:

  • periods separate parents from children
  • everywhere where cache contexts are used, that entire hierarchy is used, which has 3 benefits:
    1. no ambiguity: it's clear what parent cache context is based on wherever it is used
    2. comparing (and folding) cache contexts becomes simpler: if both a.b.c and a.b are present, it's obvious that a.b encompasses a.b.c, and thus it's clear why the a.b can be omitted, why it can be "folded" into the parent
    3. no need to deal with ensuring each level in a tree is unique in the entire tree

So let's look at the tree we've ended up with, the one the last patch actually allows you to use. Sorted alphabetically. Generated by this code:

    foreach ($this->contexts_service->getAll() as $context_id) {
      $accepts_parameter = $this->container->get('cache_context.' . $context_id) instanceof CalculatedCacheContextInterface;
      NestedArray::setValue($tree, explode('.', $context_id), $accepts_parameter, TRUE);
    }
    var_dump($tree);

That gives us:

cookies
headers
ip
language
theme
url
  .host
  .pager
  .query_args
  .route
    .book_navigation
    .format
    .menu_active_trail
    .name
user
  .is_super_user
  .node_view_grants
  .roles
  .timezone

Note that the deeper you go in the tree, the better for caching, because the less specific your cache context becomes, and hence the more reusable it becomes.

Looking at the cache contexts that accept a parameter

But now let's clarify that a bit by showing which ones accept a parameter, signified with a leading colon (:), just like the syntax allows you:

cookies
  :name
headers
  :name
ip
language
theme
url
  .host
  .pager
    :pager_id
  .query_args
    :key
  .route
    .book_navigation
    .format
    .menu_active_trail
      :menu_name
    .name
user
  .is_super_user
  .node_view_grants
  .roles
    :role
  .timezone

Now we see an interesting pattern. Most of the ones accepting a parameter are plural, but not all.

I'd argue that every calculated cache context (i.e. the ones that accept a parameter) should be plural, because when no parameter is given, they're required to return a value representing all of the possible parameters. This then automatically gives us a clean, logical hierarchy. And it communicates to the user that whenever a plural is encountered, a parameter can be specified.

Then the pattern becomes:

  • periods separate parents from children
  • everywhere where cache contexts are used, that entire hierarchy is used, which has 3 benefits: […]
  • a plurally named cache context indicates a parameter may be specified; to use: append a colon, then specify the desired parameter

Proposed next steps

So if we rename url.pager to url.pagers and url.route.menu_active_trail to url.route.menu_active_trails, we're wholly consistent there. (They've already been updated in the above patches to support no parameter being passed and in that case returning results that apply to all values they accept as parameters.)

Except for user.node_view_grants. That's still plural. But why is it plural? Because there are many operations for which there are grants. But it currently only supports the "view" operation. Renaming it to user.node_grants and then allowing a parameter containing the operation to be specified, would make this too wholly consistent: user.node_grants:view.

Finally, I'd like to propose we turn language into a calculated cache context and turn it into languages. Then you can do languages:content or languages:interface. That also better captures the fact that the current language actually covers all the different language types.

With those proposals implemented, we'd end up with:

cookies
  :name
headers
  :name
ip
languages
  :type
theme
url
  .host
  .pagers
    :pager_id
  .query_args
    :key
  .route
    .book_navigation
    .format
    .menu_active_trails
      :menu_name
    .name
user
  .is_super_user
  .node_view_grants
    :operation
  .roles
    :role
  .timezone
Wim Leers’s picture

FileSize
39.45 KB
9.64 KB

In this interdiff: user.node_view_grants -> user.node_grants.

Wim Leers’s picture

FileSize
40.73 KB
5.44 KB

In this interdiff:

  • url.pager -> url.pagers
  • url.menu_active_trail -> url.menu_active_trails

The last submitted patch, 35: cache_context_hierarchy-2432837-35.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 36: cache_context_hierarchy-2432837-36.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
41.3 KB
1.07 KB

In #35, there are failures, because in between posting #33 and #35, #2381217: Views should set cache tags on its render arrays, and bubble the output's cache tags to the cache items written to the Views output cache landed, which introduced more uses of cache contexts, which I therefore hadn't yet updated.

In #36, I stupidly forgot to rename the actual cache context. The consequences are clear :)

Status: Needs review » Needs work

The last submitted patch, 39: cache_context_hierarchy-2432837-39.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
41.13 KB

Straight rebase of #39, patch didn't apply because #2433599: Ensure every (non-views) pager automatically associates a matching cache context was committed in the mean time.

Wim Leers’s picture

FileSize
49.2 KB
9.12 KB

In this interdiff: language -> languages.

The cache context hierarchy is now all done! Now we just need to start using it.

Next: folding of cache contexts + tests.

The last submitted patch, 41: cache_context_hierarchy-2432837-39_rebased.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 42: cache_context_hierarchy-2432837-42.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
50.11 KB
996 bytes
996 bytes

Missed one place. This should make it green again.

Wim Leers’s picture

Assigned: Wim Leers » Unassigned
Issue tags: -Needs tests
FileSize
58.69 KB
9.19 KB

Folding implemented: CacheContexts::optimizeTokens().

All done now; now blocked on reviews.

Fabianx’s picture

+++ b/core/lib/Drupal/Core/Render/MainContent/HtmlRenderer.php
@@ -147,7 +158,7 @@ public function renderResponse(array $main_content, Request $request, RouteMatch
-      'X-Drupal-Cache-Contexts' => implode(' ', $cache_contexts),
+      'X-Drupal-Cache-Contexts' => implode(' ', $this->cacheContexts->optimizeTokens($cache_contexts)),

Can be follow-up, but someone / something should set the 'url' cache context, so that things get optimized out that only depend on url.

Also means the page cache has the 'url' cache context.

--

Continuing review later ...

Fabianx’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/IsSuperUserCacheContext.php
    @@ -0,0 +1,29 @@
    +  public function getContext() {
    +    return $this->user->id() === 1;
    +  }
    

    I think better to do:

    return 's.' . $this->user->id() === 1 ? 1 : 0;

    Else you end up with:

    '::' potentially, which is confusing.

  2. +++ b/core/lib/Drupal/Core/Cache/LanguagesCacheContext.php
    @@ -41,17 +41,22 @@ public static function getLabel() {
    +    if ($type === NULL) {
    ...
    -      $context_parts[] = $this->languageManager->getCurrentLanguage()->getId();
    +      return $this->languageManager->getCurrentLanguage($type)->getId();
    

    I think it would be great to type-hint somewhere that this is of one of the types defined in:

    Drupal\Core\Language\LanguageInterface

  3. +++ b/core/lib/Drupal/Core/Cache/LanguagesCacheContext.php
    @@ -41,17 +41,22 @@ public static function getLabel() {
    +      else {
    +        $context_parts[] = $this->languageManager->getCurrentLanguage()->getId();
    +      }
    

    So if its not multi-lingual, then we use LanguageInterface::TYPE_INTERFACE, which is the default.

  4. +++ b/core/lib/Drupal/Core/Cache/LanguagesCacheContext.php
    @@ -41,17 +41,22 @@ public static function getLabel() {
    +      return implode(':', $context_parts);
    

    I think we should delimite with '.' or ',' here as its not a new CID part.

  5. +++ b/core/lib/Drupal/Core/Cache/PagersCacheContext.php
    @@ -22,7 +22,11 @@ public static function getLabel() {
    +    if ($pager_id === NULL) {
    +      return 'pager' . $this->requestStack->getCurrentRequest()->query->get('page', '');
    +    }
    

    This could use a comment that 'page' contains all pagers on a page separated by comma, or an @see pager_find_page() or such.

    It confused me first.

  6. +++ b/core/lib/Drupal/Core/Cache/QueryArgsCacheContext.php
    @@ -0,0 +1,40 @@
    +      parse_str($this->requestStack->getCurrentRequest()->getQueryString(), $query_args);
    +      return $query_args[$query_arg];
    

    This needs an isset() check.

  7. +++ b/core/modules/block/src/BlockViewBuilder.php
    @@ -85,7 +85,7 @@ public function viewMultiple(array $entities = array(), $view_mode = 'full', $la
    -          'language',
    +          'languages',
    

    Are we sure the block depends on all language types?

    Not know too much about this, just wondering.

  8. +++ b/core/modules/filter/src/Tests/FilterAPITest.php
    @@ -259,7 +259,7 @@ function testProcessedTextElement() {
    -      'language',
    +      'languages:content',
    

    According to https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Language%... this needs to be language_content.

Fabianx’s picture

Issue summary: View changes
Wim Leers’s picture

FileSize
60.06 KB
5.54 KB
  1. Done.
  2. PHP doesn't allow implementations of an interface to change the typehint. Added docs and an exception.
  3. Yes, as in HEAD, and as happens elsewhere in core.
  4. Went with comma, just like we do in UserRolesCacheContext. (Note though that the colon was already there before this patch.)
  5. Fixed.
  6. No, we're not sure about that. But I'm trying to not correct (granularize) the usage of HEAD's 'language' cache context in this patch. I only did it in the filter API test coverage because there the impact is self-contained. Let's fix the existing usages in #2448823: Add languages:<type> cache contexts.
  7. Fixed.

Also rebased; fixed a tiny conflict.

Wim Leers’s picture

Title: Consider making cache contexts hierarchical (e.g. 'user' is more specific than 'user.roles') » Make cache contexts hierarchical (e.g. 'user' is more specific than 'user.roles')
Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Could find nothing more, RTBC if tests pass :).

We need another issue for e.g. user roles to consider throwing an exception, too if the argument is invalid to prevent:

user.roles:auhtenticated

or other typos :).

Great work!

Fabianx’s picture

Issue summary: View changes
Fabianx’s picture

Wim Leers’s picture

This should fail because #2445743: Allow views base tables and entity types to define additional cache contexts has landed, which introduced more occurrences of some cache contexts.

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 50: cache_context_hierarchy-2432837-50.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
70.02 KB
10.95 KB

Failed as expected.

Some Views plugins were associating the non-existing cache.context.url cache context instead of just url. Also noticed at #606840-40: Enable internal page cache by default.

Opened an issue to validate cache contexts, so that that is not possible anymore: #2451679: Validate cache contexts (+ cache contexts in some views plugins wrong).

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 58: cache_context_hierarchy-2432837-58.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
70.55 KB
1.45 KB

Trivial fix, hence back to RTBC.

Fabianx’s picture

RTBC + 1

larowlan’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/CacheContexts.php
    @@ -118,6 +119,69 @@ public function convertTokensToKeys(array $context_tokens) {
    +   * - ['a', 'a.b', 'a.b.c'] -> ['a.b']
    

    Why is this a.b and not a? I could be plain wrong here.

  2. +++ b/core/lib/Drupal/Core/Cache/CacheContexts.php
    @@ -118,6 +119,69 @@ public function convertTokensToKeys(array $context_tokens) {
    +      // - a period mean they don't have a parent
    +      // - a colon mean they're not a specific value of a cache context
    

    nitpick, should this be means - not mean

  3. +++ b/core/lib/Drupal/Core/Cache/CacheContexts.php
    @@ -118,6 +119,69 @@ public function convertTokensToKeys(array $context_tokens) {
    +      // hence no  optimizations are possible.
    

    nitpick two spaces

  4. +++ b/core/lib/Drupal/Core/Cache/IsSuperUserCacheContext.php
    @@ -0,0 +1,29 @@
    +    return $this->user->id() === 1 ? '1' : '0';
    

    I know in comment we had issues with $comment->getOwnerId() returning '1' as in a string, we had to add a cast to int in that method.
    We might need this here too.

  5. +++ b/core/lib/Drupal/Core/Cache/RouteCacheContext.php
    @@ -0,0 +1,48 @@
    +    return $this->routeMatch->getRouteName() . serialize($this->routeMatch->getRawParameters()->all());
    

    could we hash this to reduce the size of the context? Just checking we definitely don't unserialize this, as that'd be a sechole.

  6. +++ b/core/lib/Drupal/Core/Cache/RouteNameCacheContext.php
    @@ -0,0 +1,28 @@
    +class RouteNameCacheContext extends RouteCacheContext {
    +  /**
    

    Nitpick missing blank line

  7. +++ b/core/tests/Drupal/Tests/Core/Cache/CacheContextsTest.php
    @@ -20,6 +20,63 @@
    +      [['a', 'a.b', 'a.b.c'], ['a']],
    

    This would indicate the comment earlier (see point 1) is incorrect.

webchick’s picture

Status: Reviewed & tested by the community » Needs work
Wim Leers’s picture

Wim Leers’s picture

FileSize
70.66 KB
3.07 KB

Second: #63.

  1. Nope, you're plain right.
  2. LOL, yes, thanks. C/P + modifying fail.
  3. See 2.
  4. Cache context values are never exposed to end users. But yes, let's do this anyway, better safe than sorry.
  5. Fixed.
  6. Indeed!

Excellent review, @larowlan++

Fabianx’s picture

Assigned: Unassigned » catch
Status: Needs review » Reviewed & tested by the community

Great review, back to RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 66: cache_context_hierarchy-2432837-66.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community

Testbot was flailing in its first test run (nonsensical fail).

Wim Leers’s picture

Also pinged @effulgentsia for a review. And posted a link to this in #2443323: New convention: CacheContextInterface implementations should mention their ID in their class-level docblock to get more feedback.

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs work

Needs work for:

  1. +++ b/core/lib/Drupal/Core/Cache/QueryArgsCacheContext.php
    @@ -0,0 +1,40 @@
    + * Defines the QueryArgsCacheContext service, for "per host" caching.
    + *
    + * A "host" is defined as the combination of URI scheme, domain name and port.
    + *
    + * @see Symfony\Component\HttpFoundation::getSchemeAndHttpHost()
    

    What does "host" have to do with QueryArgs?

  2. +++ b/core/lib/Drupal/Core/Cache/RequestStackCacheContextBase.php
    @@ -0,0 +1,38 @@
    + * Defines the HostCacheContext service, for "per host" caching.
    + *
    + * A "host" is defined as the combination of URI scheme, domain name and port.
    + *
    + * @see Symfony\Component\HttpFoundation::getSchemeAndHttpHost()
    + */
    +abstract class RequestStackCacheContextBase implements CacheContextInterface {
    

    Wrong docs for this class.

effulgentsia’s picture

Additionally, but could be follow-up:

  1. +++ b/core/core.services.yml
    @@ -6,26 +6,88 @@ parameters:
    +  # Simple cache contexts, directly derived from the request context.
    

    url.route and its children aren't exactly "simple" and "directly derived", so I think could be given a separate section with their own comment.

  2. +++ b/core/core.services.yml
    @@ -6,26 +6,88 @@ parameters:
    +  cache_context.url.pagers:
    

    Why is 'pagers' at this level and not under 'query_args'? I suppose this allows an alternate implementation that uses some other portion of the URL, but if that's the intent, or if there's a different intent, let's add a comment about that.

effulgentsia’s picture

And on a more substantial note, overall I like the naming approach a lot. But, for things like cache_context.url.route.menu_active_trails and cache_context.url.route.format, it means we have an implicit requirement that implementations of those services must only depend on the URL. So, no ability for FancyMenuActiveTrails to depend on user or phase of moon, and no ability for contrib to implement header-based format negotiation. In other words, what we're sacrificing by using service ID to have hierarchy meaning is the ability to override the service implementation with something that would put it in a different part of the hierarchy. Is that something we're willing to accept? Or, should we address it via something like using an optional service tag or a service alias? For example, if contrib implements a FancyMenuActiveTrail override of core's MenuActiveTrail, then it can also tag or alias the cache_context.url.route.menu_active_trails service in a way that makes the system (e.g., CacheContexts::optimizeTokens()) aware that its parent is no longer url.route.

[Edit: BTW, I'm fine to punt this to a follow-up.]

effulgentsia’s picture

How about this to address #74: we embed the hierarchy in the context name / service ID only when that hierarchy is intrinsic to the meaning of the context, not merely an implementation choice.

What I mean by that is url.host is a good name, because the semantics of "host" makes it intrinsically a child of "url". Same for url.query_args and user.roles.

However, I suggest that timezone, format, and pagers each move to the root level, because those concepts are not intrinsically bound to users, routes, and URLs, respectively.

Additionally, I think route should move to the root level as well. Routing could in principle depend on IP and headers, even if we don't want core's to do so.

I'm unsure about book_navigation and menu_active_trails. I think they might make sense to keep under route (i.e., route.book_navigation and route.menu_active_trails), because I don't think it makes semantic sense for those things to vary on anything other than the route. However, I suspect there are sites that customize breadcrumbs and the rendered structure of some menus based on things like role or organic group, but what I wonder is whether it ever makes sense to implement such customizations via the return value of getActiveTrailIds() or whether such customizations should only ever be done in the display layer (which would then also be responsible for adding the additional contexts there). If there are legitimate use cases for the former, then those context names probably need to move to the root level as well (or we can group them under something like navigation).

For this issue, I think it would be sufficient to simply rename accordingly.

In a followup, I think we can add a service tag to convey an implementation-driven parent relationship. For example:

  cache_context.pagers:
   ...
    tags:
      - { name: cache.context, parent: cache_context.url.query_args:page }
  cache_context.route:
    ...
    tags:
      - { name: cache.context, parent: cache_context.url }

And thereby allow CacheContexts::optimizeTokens() to optimize for implementations where pagers are solely determined by a single URL query argument and routing is done exclusively by URL.

dawehner’s picture

While I can understand the point of not making everything alterable, I think encoding hierarchy into the service IDs is wrong,
because its a new pattern we came up with. ... What about things like module.installer, there is no hierarchy, its simply a group of things.

Tags are the way how you add additional metainformation to the services.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
70.65 KB
5.18 KB

#72: fixed both; both are copy/paste remnants.

#73: Fixed both. Your hunch on the second point was correct. But I think you're right; we should put it under query_args. Talked about it with Fabian, and he's fine with it too. More on the "why" below.

#74 + #75:

  • RE: menu active trail. We discussed that one for a long time. I also was always convinced of fancy alternative implementations. E.g. a breadcrumb trail that's kept in the session. But after talking about it for a long time, and actually looking at the interface, we came to the conclusion that that doesn't really make sense:
    /**
     * Defines an interface for the active menu trail service.
     *
     * The active trail of a given menu is the trail from the current page to the
     * root of that menu's tree.
     */
    interface MenuActiveTrailInterface {
    

    So the active menu trail is always going to depend on routing. It may choose a different algorithm to select parents, but it'll depend on the current route always.

    But! That doesn't mean that the breadcrumb trail cannot use some fancy alternative, like remembering last-visited pages in the session. In other words, like you, I was confusing "active menu trail" with "breadcrumb trail". People can still override the breadcrumb manager or add more breadcrumb_builder-tagged services. It's up to those services to associate whatever cache context they want. (And currently, they're not associating any cacheability metadata at all btw, that's up to #2232609: Cacheable breadcrumbs block, and fix breadcrumb builders to fix.)

  • Generally speaking, we have two types of usages of cache contexts:
    1. Cache contexts that are internal to the thing associating them (implementation-bound): e.g. #type => pager has a default behavior that is based on a query argument. But using element_info_alter(), you could change that to use the URL (think /page:1 in the URL). When making that implementation change, you can also define another cache context and associate that one instead. This will work fine, because the cache context is internal; it is bubbled, it isn't set by the consumer of #type => page.
    2. Cache contexts that are external to the thing associating them: e.g. an access result associates the "per role" cache context because it checked permissions.

    The internal ones we can and should be specific. When the implementation is swapped out, the cache context can be swapped too, and will bubble automatically, so everything continues to work.

    The external ones, there we have to be careful. This is why languages, user and theme are all top-level: because they are necessary all over the place and the way the corresponding value for a request is selected can be highly dynamic: many types of language negotiation (URL-based, header-based, user-preference-based etc.), many types of authentication, many ways of selecting a theme (URL, user preference, defined on the route…): impossible to choose a clear parent.

  • RE: IP-based routing. You're right in theory. But it'd also mean that responses would need to very per IP address. This is similar/related to the #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use discussion, where the conclusion is that Accept-Language and Accept headers cannot be supported because it's impossible to make work with external caches, and therefore, with the internet in general. In an ideal world, it's possible, but in the practical actual world, it isn't. (Besides, IP-based routing goes even further than that; it's less reasonable, if you will.)
    We are choosing to standardize on the URL as being the only thing that determines the route. It's a uniform resource locator, after all. For browser-based language negotiation (which is also conceptually part of routing), we're introducing a redirect to the corresponding URL: #2430335: Browser language detection is not cache aware. For IP-based routing, we'd also use a redirect. For country-based routing (country derived from IP), we'd also use a redirect. In other words: for any advanced/fancy form of routing, we'd use redirects.
  • RE: timezone, format, pagers. I've explained pagers and route above. The same applies to timezone and route. Basically, we need to make some assumptions, otherwise we end up with an uncacheable site. format will be implied by url once #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use lands. timezone could also be IP-specific (because region-specific), but we usually see per-country websites, so once again a redirect is the answer.
    If we don't make any decisions, which are indeed judgement calls, but well-reasoned ones, then we'd not end up with a hierarchy, but a list.
  • RE: implementation-driven parent relationship. I think that's a very interesting proposal. There are a few downsides to it though, from #34:
    1. everywhere where cache contexts are used, that entire hierarchy is used, which has 3 benefits:
      1. no ambiguity: it's clear what parent cache context is based on wherever it is used
      2. comparing (and folding) cache contexts becomes simpler: if both a.b.c and a.b are present, it's obvious that a.b encompasses a.b.c, and thus it's clear why the a.b can be omitted, why it can be "folded" into the parent
      3. no need to deal with ensuring each level in a tree is unique in the entire tree

    Your proposal would make the hierarchy implicit instead of explicit, hence removing those advantages. It'd introduce more magic.

Don't get me wrong, I agree with #75 in principle. But #75 is about optimizing for the "what if it's altered in advanced ways" use case, the current patch optimizes for the "I'm using it like >98% does" use case.

Looking forward to your thoughts :)

Wim Leers’s picture

I think encoding hierarchy into the service IDs is wrong, because its a new pattern we came up with. ... What about things like module.installer, there is no hierarchy, its simply a group of things.

We only apply this special meaning to cache context services. It is validated in CacheContextsPass. I don't see why that's necessarily harmful.

(Another advantage of doing it this way is that people are already familiar with the concept that a service ID must be unique. Hence the cache context service ID must be unique.)

Besides, service IDs in general are a wild west. Singular, plural, periods, underscores, "manager" suffixes in some cases and not in others, etc. Even within a same area there are inconsistencies (@theme_handler, but @theme.manager). I don't think it's bad to have some enforced consistency for tagged services of a certain type.

That being said: I see your point. What you're saying is actually exactly what I originally did. But that lead to a lack of clarity. See #6 & #7.

Fabianx’s picture

To #77 for the route argument:

If a site chooses to alter the meaning of url.route and exchange the cache context, so its dependent on more than the url naturally they will need to exchange the 'url' cache context as well.

Even if there was no hierarchy and it was not simplified, as soon as something uses 'url' instead of route, but this url depends on something else in the request it is wrong.

That is why core enforces an url based hierarchy on things that is true for core.

Because with this we give the guarantee of cacheability in external proxies.

Therefore it is good that we choose to have url.route as dependent on the url, because it gives clarity that you need to exchange both to get correct and proper caching.

--

Let's go through the list again:

cookies -- dependent on request, straightforward OK
headers -- dependent on the request headers, OK
ip -- a property of the request, OK
languages -- a top level drupal concept, needs special casing, OK
  :type
theme -- a top level drupal concept, can be added by anything that renders, it is dependent on the route in Core
url -- dependent on request, OK
  .host -- dependent on request, OK
  .query_args
    :key
    .pagers -- Core's implementation is bound to query_args, the implementation has the cache contexts, OK
      :pager_id
  .route -- Core's route is determined out of the URL for cacheability reasons in external proxies, redirects are not cached, OK (see above).
    .book_navigation -- Interfacing on the route, if you need another navigation, switch out the navigation AND the context, OK
    .format -- determined by the route negotiation, OK
    .menu_active_trails -- Interfaced on route, if you switch out the implementation, switch out the implementation and the contexts, OK
      :menu_name
    .name -- determined by the route, OK
user -- Arbitrary Drupal concept, OK
  .is_super_user -- Arbitrary Drupal concept, OK
  .node_grants -- Grants are per user in core, other things use other contexts, OK
    :operation
  .roles -- A user has one or more roles, OK
    :role
  .timezone -- Timezone is in Core a property of the user, but timezone being a global thing, it might be tricky, SO SO

So therefore timezone from this list remains as its added by e.g. a DateFormatter and is understood as a global concept.

And maybe explicit parent relationship as a tag is the right thing for timezone and theme, so it can still be simplified in core. (though that could indeed be follow-up).

Because usually timezone is user and given:

[timezone, user] can be simplified to user, but that might not always be true.

Similarly theme can easily be simplified to url.route in Core by default given the interface of the theme negotiator and we don't do this right now.

Will discuss timezone with Wim again.

Wim Leers’s picture

FileSize
67.77 KB
4.8 KB

Timezone

In #77, I gave I think a sound justification for why route, format and pager are already in the right place.
For timezone, I wasn't able to formulate an equally sound justification. Combined with #79, I'm fine with moving timezone to a top-level position again, so that we can do #75's bottom proposal in a follow-up.

Then patch == #75

Then the patch basically is exactly what you suggested in #75:

How about this to address #74: we embed the hierarchy in the context name / service ID only when that hierarchy is intrinsic to the meaning of the context, not merely an implementation choice.

What I mean by that is url.host is a good name, because the semantics of "host" makes it intrinsically a child of "url". Same for url.query_args and user.roles.

However, I suggest that timezone, format, and pagers each move to the root level, because those concepts are not intrinsically bound to users, routes, and URLs, respectively.

Additionally, I think route should move to the root level as well. Routing could in principle depend on IP and headers, even if we don't want core's to do so.

[…]

For this issue, I think it would be sufficient to simply rename accordingly.

In a followup, I think we can add a service tag to convey an implementation-driven parent relationship. […]

And thereby allow CacheContexts::optimizeTokens() to optimize for implementations where pagers are solely determined by a single URL query argument and routing is done exclusively by URL.

i.e.:

  1. the hierarchy is kept as it was in #66 for cache contexts where the hierarchy is intrinsic.
  2. all hierarchy is removed for any cache context where the hierarchy is non-intrinsic (depends on a swappable implementation).
    1. This already was the case for most such cache contexts (user, theme …), but not for timezone. pagers does not fall in this category, because for 99% of developers, pager == query arg.
    2. For the routing-related cache contexts (route, format), we do enforce an intrinsic meaning, because that's also what #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use will do; i.e. we just follow the decision there.
    3. For fancy alternative pagers, they can just override the Pager element and set a different cache context.
  3. we have a follow-up to allow those non-intrinsic (implementation-dependent) hierarchies to be defined: #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents()

CR updated.

Fabianx’s picture

To write #80 again from a different light (also for my own understanding).

1. Intrinsic, it is a property of the parent - always, mostly request based things.

+Main property: Not negotiated, but taken mostly raw.
+Swappability: It can be swapped, but there is no use case without also having to swap the parents in the hierarchy.

2. Implementation specific, the cache context is bound to a specific implementation and by swapping out the implementation the cache contexts are swapped out as well.

Those are cache contexts that are not negotiated, but hard coded to their specific implementation. (Core's pager do use ?page)

+Main property: You do not use this cache context outside of the things it is on. It is bubbled up.
+Swappability: You do not swap the cache context, but the implementation and use a different cache context.

3. Dependent on calculated state, the cache context is negotiated based on other properties that can change. Because it is a cache context that others specify it MUST be swappable.

+Main property: Complex logic to calculate, can depend on several other cache contexts itself.
+Swappability: MUST be swappable, because it is specified by something outside, e.g. a date formatter specifies a timezone CC.

These are the top level contexts and what #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() is for.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

All feedback has been addressed, #76 was additionally discussed in IRC.

Therefore back to RTBC.

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs review

If a site chooses to alter the meaning of url.route and exchange the cache context, so its dependent on more than the url naturally they will need to exchange the 'url' cache context as well.

I don't follow this reasoning. If I want to write a contrib/custom module that determines a route parameter from a header (because I choose to not care about reverse proxy caching, or can configure my reverse proxy to vary by that header), why would that require changing the meaning or implementation of the 'url' context? Vary-by-URL still means exactly that. It's only that in that case, vary-by-route can't be treated as a strict subset of vary-by-url, so how to deal with that? I suppose #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() will allow that even if we preserve the context name as url.route?

Not un-RTBC'ing for above, since that can be debated in a followup. But, setting to NR for:
Should url.route.format be changed to url.format? Because a single route can serve multiple formats AND a single format can be used by many routes, so I don't see in which sense format is a child of route.

Wim Leers’s picture

FileSize
67.76 KB
993 bytes

Because a single route can serve multiple formats

And @effulgentsia strikes again with irrefutable simplicity & clarity!

Of course! Incredible that we didn't realize this before. All of your awesome feedback so far is exactly why I wanted to get your thoughts before this went in :)

Updated the CR too.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Yes, #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() will allow to set a parent explicitly even if the semantic meaning is changed then from our recommendation / choice in core.

Naming ain't easy :).

I also wondered before on url.route.format, but atm. format neg is done in middle ware (or at least soon), so yes, that is not a property of the route itself.

effulgentsia’s picture

Thanks.

We are choosing to standardize on the URL as being the only thing that determines the route. It's a uniform resource locator, after all.

I promised not to un-RTBC this on this, so I won't, but I'd just like to make a final plea to reconsider putting format and route at the root levels, not inside url. Here's why: URL stands for uniform resource locator, not uniform representation locator. So I don't believe there is anything semantically that justifies thinking about format and route as children of url. I don't think '#contexts' => ['url.format'] is better DX over '#contexts' => ['format'], especially when 'languages' and 'theme' are at the root level, and at least superficially, format is kind of like those things.

I can appreciate the desire to optimize format and route away when we're caching on url, but we'll be able to do that in #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents(), so I don't see why we need to make that implementation decision leak into the names.

Fabianx’s picture

Status: Reviewed & tested by the community » Needs review

#86: Thanks, I think our understanding of cache contexts has started to grow and as its a new concept it takes some tries to get right.

What I don't want is: All things are top-level, hence just a list as that is problematic for discoverability and DX.

I am pretty happy with my latest table / decision chart from #81 and I think thats a good addition to the developer docs on cache contexts.

Lets judge 'format' and 'route' based on that:

1. format is not intrisic itself as its not a simple property on the url, it is a calculated property of the request and in Drupal's implementation of the url, but we choose to omit 'request.' prefix due to shortening. It certainly fails 'not negotiated', but raw.

2. format would definitely be set by something else, like a request / response listener, hence it is not specific to the implementation of its thing. (though the cache contexts could live on the request object, but that is still != url).

3. format is negotiated and must be swappable for REST support in contrib.

=> Top level

route

- 1. Derived from the request, not a simple property, multiple routes per url
- 2. Smart Cache is after routing and wants to use this, so set by something else.
- 3. Definitely a negotiated context

=> Top level

A non-shortened tree would probably look like this:

languages -- a top level drupal concept, needs special casing, OK | Core Parent: url (???)
request -- the hash of the full incoming request
  .cookies -- dependent on request, straightforward OK
  .format -- property of the request ($request->getFormat()), OK
  .headers -- dependent on the request headers, OK
  .ip -- a property of the request, OK
  .url -- dependent on request, OK
    .host -- dependent on url, OK
    .query_args
      :key
      .pagers -- Core's implementation is bound to query_args, the implementation has the cache contexts, OK
        :pager_id
route -- Drupal concept, can also be negotiated theoretically by phase-of-the-moon | Core Parent: 'url'
  .book_navigation -- Interfacing on the route, if you need another navigation, switch out the navigation AND the context, OK
  .menu_active_trails -- Interfaced on route, if you switch out the implementation, switch out the implementation and the contexts, OK
    :menu_name
  .name -- determined by the route, OK
theme -- a top level drupal concept, can be added by anything that renders, it is dependent on the route in Core, | Core Parent: 'route'
timezone -- top level negotiated concept | Core Parent: 'user'
user -- Arbitrary Drupal concept, OK
  .is_super_user -- Arbitrary Drupal concept of a user, OK
  .node_grants -- Grants are per user in core, other things use other contexts, OK
    :operation
  .roles -- A user has one or more roles, OK
    :role

If we omit request, we should probably still use request_format instead of just 'format' as it is a little too common.

Without explanations:

languages
  :type
request
  .cookies
    :name
  .format
  .headers
    :name
  .ip
  .url
    .host
    .query_args
      :key
      .pagers
        :pager_id
route
  .book_navigation
  .menu_active_trails
    :menu_name
  .name
theme
timezone
user
  .is_super_user
  .node_grants
    :operation
  .roles
    :role

@Wim: Thoughts?

effulgentsia’s picture

+1 to #87. Either option (with or without request as its own context / prefix) is fine with me. If we omit that prefix, I'm fine with either format or request_format.

Fabianx’s picture

Final decision, 1,2,3:

cookies
  :name
headers
  :name
ip
languages
  :type
request_format
route
  .book_navigation
  .menu_active_trails
    :menu_name
  .name
theme
timezone
url
  .host
  .query_args
    :key
    .pagers
      :pager_id
user
  .is_super_user
  .node_grants
    :operation
  .roles
    :role
Wim Leers’s picture

FileSize
67.82 KB
2.09 KB

Talked it through some more with Fabianx, to try to find remaining problems, but it looks like this is now completely sound.

Went with request_format because — in order of importance — 1) Symfony has Request::getRequestFormat(), which is exactly what this cache context uses, 2) prevents clashes, because "format" could indeed apply to many things, 3) Fabianx preferred that.

CR updated.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

And finally back to RTBC, see you in the follow-ups! (hopefully)

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
67.78 KB
3.63 KB

Oops…

[17/03/15 22:25:36] Wim Leers: https://www.drupal.org/node/2432837#comment-9731261
[17/03/15 22:26:53] Alex Bronstein: looks good, but what about url.route -> route?
[17/03/15 22:34:37] Wim Leers: DOH
[17/03/15 22:34:51] Wim Leers: so focused on discussing the contentious one that I simply forgot about that one lol

Fixed that. Going AFK now — tired :)

Let's let @effulgentsia RTBC this — it's important that we have his sign-off IMO.

effulgentsia’s picture

Status: Needs review » Reviewed & tested by the community

Looks great. Thanks!

dawehner’s picture

Can we have a follow up to add test coverage for all the new cache contexts?

  1. +++ b/core/lib/Drupal/Core/Cache/IsSuperUserCacheContext.php
    @@ -0,0 +1,29 @@
    +/**
    + * Defines the IsSuperUserCacheContext service, for "super user or not" caching.
    + */
    +class IsSuperUserCacheContext extends UserCacheContext {
    ...
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function getContext() {
    +    return ((int) $this->user->id()) === 1 ? '1' : '0';
    +  }
    +
    

    I'm curios which behaviour relies on the admin user still? don't we rely on admin roles somehow already?

  2. +++ b/core/lib/Drupal/Core/Cache/PagersCacheContext.php
    @@ -21,8 +21,16 @@ public static function getLabel() {
    +  public function getContext($pager_id = NULL) {
    +    // The value of the 'page' query argument contains the information that
    +    // controls *all* pagers.
    +    if ($pager_id === NULL) {
    +      return 'pager' . $this->requestStack->getCurrentRequest()->query->get('page', '');
    +    }
    

    Should we not instead fallback to the pager with element 0? I think this is more what is actually intended. On top we then would need 'all', but I think the usecases for that are really rare.

  3. +++ b/core/lib/Drupal/Core/Cache/QueryArgsCacheContext.php
    @@ -0,0 +1,40 @@
    +      $query_args = [];
    +      parse_str($this->requestStack->getCurrentRequest()->getQueryString(), $query_args);
    +      return $query_arg . '.' . (isset($query_args[$query_arg]) ? $query_args[$query_arg] : '');
    

    Huch? Why do you not just use $request->query->get($query_arg)?

  4. +++ b/core/modules/views/src/Plugin/views/argument/ArgumentPluginBase.php
    @@ -1208,7 +1208,7 @@ public function getCacheContexts() {
         //   the information from there.
    -    $contexts[] = 'cache.context.url';
    +    $contexts[] = 'url';
     
         // Asks all subplugins (argument defaults, argument validator and styles).
         if (($plugin = $this->getPlugin('argument_default')) && $plugin instanceof CacheablePluginInterface) {
    diff --git a/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php b/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    
    diff --git a/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php b/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    index 337b722..feccd21 100644
    
    index 337b722..feccd21 100644
    --- a/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    
    --- a/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    +++ b/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    
    +++ b/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    +++ b/core/modules/views/src/Plugin/views/argument_default/QueryParameter.php
    @@ -95,7 +95,7 @@ public function isCacheable() {
    
    @@ -95,7 +95,7 @@ public function isCacheable() {
        * {@inheritdoc}
        */
       public function getCacheContexts() {
    -    return ['cache.context.url'];
    +    return ['url'];
       }
     
     }
    diff --git a/core/modules/views/src/Plugin/views/argument_default/Raw.php b/core/modules/views/src/Plugin/views/argument_default/Raw.php
    
    diff --git a/core/modules/views/src/Plugin/views/argument_default/Raw.php b/core/modules/views/src/Plugin/views/argument_default/Raw.php
    index 4239c50..40bddf3 100644
    
    index 4239c50..40bddf3 100644
    --- a/core/modules/views/src/Plugin/views/argument_default/Raw.php
    
    --- a/core/modules/views/src/Plugin/views/argument_default/Raw.php
    +++ b/core/modules/views/src/Plugin/views/argument_default/Raw.php
    
    +++ b/core/modules/views/src/Plugin/views/argument_default/Raw.php
    +++ b/core/modules/views/src/Plugin/views/argument_default/Raw.php
    @@ -124,7 +124,7 @@ public function isCacheable() {
    
    @@ -124,7 +124,7 @@ public function isCacheable() {
        * {@inheritdoc}
        */
       public function getCacheContexts() {
    -    return ['cache.context.url'];
    +    return ['url'];
       }
     
     }
    diff --git a/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php b/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    
    diff --git a/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php b/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    index b5f8bf8..9abf63e 100644
    
    index b5f8bf8..9abf63e 100644
    --- a/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    
    --- a/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    +++ b/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    
    +++ b/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    +++ b/core/modules/views/src/Plugin/views/filter/FilterPluginBase.php
    @@ -1477,7 +1477,7 @@ public function getCacheContexts() {
    
    @@ -1477,7 +1477,7 @@ public function getCacheContexts() {
         // input from GET parameters, which are part of the URL. Hence a view with
         // an exposed filter is cacheable per URL.
         if ($this->isExposed()) {
    -      $cache_contexts[] = 'cache.context.url';
    +      $cache_contexts[] = 'url';
         }
         return $cache_contexts;
       }
    diff --git a/core/modules/views/src/Plugin/views/sort/SortPluginBase.php b/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    
    diff --git a/core/modules/views/src/Plugin/views/sort/SortPluginBase.php b/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    index 45b86be..8fe2b43 100644
    
    index 45b86be..8fe2b43 100644
    --- a/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    
    --- a/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    +++ b/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    
    +++ b/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    +++ b/core/modules/views/src/Plugin/views/sort/SortPluginBase.php
    @@ -241,7 +241,7 @@ public function getCacheContexts() {
    
    @@ -241,7 +241,7 @@ public function getCacheContexts() {
         $cache_contexts = [];
         // Exposed sorts use GET parameters, so it depends on the current URL.
         if ($this->isExposed()) {
    -      $cache_contexts[] = 'cache.context.url';
    +      $cache_contexts[] = 'url';
         }
         return $cache_contexts;
       }
    diff --git a/core/modules/views/src/Tests/GlossaryTest.php b/core/modules/views/src/Tests/GlossaryTest.php
    
    diff --git a/core/modules/views/src/Tests/GlossaryTest.php b/core/modules/views/src/Tests/GlossaryTest.php
    index fd97978..1df4701 100644
    
    index fd97978..1df4701 100644
    --- a/core/modules/views/src/Tests/GlossaryTest.php
    
    --- a/core/modules/views/src/Tests/GlossaryTest.php
    +++ b/core/modules/views/src/Tests/GlossaryTest.php
    
    +++ b/core/modules/views/src/Tests/GlossaryTest.php
    +++ b/core/modules/views/src/Tests/GlossaryTest.php
    @@ -87,7 +87,7 @@ public function testGlossaryView() {
    
    @@ -87,7 +87,7 @@ public function testGlossaryView() {
     
         // Verify cache tags.
         $this->enablePageCaching();
    -    $this->assertPageCacheContextsAndTags(Url::fromRoute('view.glossary.page_1'), ['cache.context.url', 'node_view_grants', 'language', 'user'], [
    +    $this->assertPageCacheContextsAndTags(Url::fromRoute('view.glossary.page_1'), ['languages', 'url', 'user'], [
           'config:views.view.glossary',
           'node:' . $nodes_by_char['a'][0]->id(), 'node:' . $nodes_by_char['a'][1]->id(), 'node:' . $nodes_by_char['a'][2]->id(),
           'node_list',
    diff --git a/core/modules/views/src/Tests/RenderCacheIntegrationTest.php b/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    
    diff --git a/core/modules/views/src/Tests/RenderCacheIntegrationTest.php b/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    index da94ab4..4a9276f 100644
    
    index da94ab4..4a9276f 100644
    --- a/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    
    --- a/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    +++ b/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    
    +++ b/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    +++ b/core/modules/views/src/Tests/RenderCacheIntegrationTest.php
    @@ -221,7 +221,7 @@ public function testViewAddCacheMetadata() {
    
    @@ -221,7 +221,7 @@ public function testViewAddCacheMetadata() {
         $view = View::load('test_display');
         $view->save();
     
    -    $this->assertEqual(['node_view_grants', 'user', 'language'], $view->getDisplay('default')['cache_metadata']['contexts']);
    +    $this->assertEqual(['languages', 'user', 'user.node_grants:view'], $view->getDisplay('default')['cache_metadata']['contexts']);
       }
    

    Isn't that part of the other patch?

Wim Leers’s picture

FileSize
67.65 KB
848 bytes

#94:

Yes, we can add tests in a follow-up. But many of these are too trivial and aren't worth having test coverage at all. E.g. the host, IP, headers, cookies cache contexts are all purely calling Symfony Request object methods, which already have their own test coverage. That's also why no cache context in core currently has any test coverage except for the node grants one; because only in that case there is significant complexity.

  1. Well, there are valid use cases. E.g. maintenance mode can only be configured by the root user. Core fortunately doesn't have many of these checks remaining (grep for id() == 1 and id() === 1), but still special-cases UID 1 when checking permissions, for example. This issue is also about looking at the entire "cache context landscape", and "is super user" is a sensible part of that.
  2. No! That's exactly not what's intended. The pager uses the page query arg. It encodes the data for pagers 0, 1, 2, 3 … N into that query arg's value, separated by commas. Therefore, "all possible pager IDs" are contained by that query arg, therefore this is correct.
    Using url.query_args.pagers means "all". It surely is very rare to use that, but that's the semantics we've assigned: plural means all possible parameter values. See #34 and subsequent comments and the CR.
  3. Because I didn't know that was possible. Fixed. Thanks! :)
  4. Yes, it's also in #2451679: Validate cache contexts (+ cache contexts in some views plugins wrong), but I fixed it here independently, I needed this to make tests pass.
dawehner’s picture

No! That's exactly not what's intended. The pager uses the page query arg. It encodes the data for pagers 0, 1, 2, 3 … N into that query arg's value, separated by commas. Therefore, "all possible pager IDs" are contained by that query arg, therefore this is correct.

You probably did not understood my point at all. I doubt people keep in their head that there are more than one pager on a page (tell me about it, I've been lurking in the views support queue for years). Based upon that, people will always just use 'pager', even they technically actually should have used pager:0. Based upon that,
I'd argued before, that 0 should be the default, and you explicitly opt IN to show all.

Wim Leers’s picture

  1. Since #2433599: Ensure every (non-views) pager automatically associates a matching cache context, every pager gets the cache context assigned automatically, so there's not even a potential for confusion.
  2. Only developers will ever deal with the pager cache context. And even then, 99% of them won't, thanks to point 1. But when they are dealing with this, because they're using the Pager API directly, if they don't know that there can be multiple pagers, then they also cannot use the Pager API correctly… so I don't see how this is in any way a new problem.
  3. The point of the hierarchy is that we have the ability to "fold", "collapse", "simplify", "optimize" a set of cache contexts, by omitting more granular cache contexts. That's why we cannot assign a special behavior to pagers: making pagers === pagers:0 would break the hierarchy.
dawehner’s picture

Only developers will ever deal with the pager cache context.

Shock, so users will have to deal with cache contexts ? ;)
Right, if the render array already propagates that, there are just a few people which have to set it.

The point of the hierarchy is that we have the ability to "fold", "collapse", "simplify", "optimize" a set of cache contexts, by omitting more granular cache contexts. That's why we cannot assign a special behavior to pagers: making pagers === pagers:0 would break the hierarchy.

Alright, well, I just tried to think about it, and how to improve stuff.

So yeah, there is a reason I did not touched the RTBC state of this issue.

Wim Leers’s picture

s/ever/never/ :)

Thanks for sharing your thinking, it's much appreciated! I hope I was able to fully convince you of the rationale behind the current patch.

catch’s picture

Few questions, not changing status yet.

  1. +++ b/core/core.services.yml
    @@ -6,26 +6,86 @@ parameters:
    +  cache_context.ip:
    

    In terms of figuring out the list/hierarchy in the issue IP makes complete sense, but I'm not sure about actually providing it. Also why not a child of headers?

  2. +++ b/core/core.services.yml
    @@ -6,26 +6,86 @@ parameters:
    +  cache_context.cookies:
    

    Why isn't cookies a child of headers?

  3. +++ b/core/core.services.yml
    @@ -6,26 +6,86 @@ parameters:
    +  cache_context.user.is_super_user:
    

    So conceptually this is correct. But I wonder whether the logic should be contained in the roles cache context. Wherever you add per-role, you should also be adding this, and not adding it could lead to information disclosure. Role cache context is in 99.9% of cases "includes a permission check".

  4. +++ b/core/lib/Drupal/Core/Cache/CacheContexts.php
    @@ -118,6 +119,69 @@ public function convertTokensToKeys(array $context_tokens) {
    +   * given set of cache context tokens that is a more granular context of
    

    Do we want to say more granular, or 'property of'?

  5. +++ b/core/lib/Drupal/Core/Cache/CacheContexts.php
    @@ -118,6 +119,69 @@ public function convertTokensToKeys(array $context_tokens) {
    +   *  another cache context token in the set, is removed.
    

    Nit, extra space.

  6. +++ b/core/tests/Drupal/Tests/Core/Cache/CacheContextsTest.php
    @@ -20,6 +20,63 @@
    +  public function providerTestOptimizeTokens() {
    

    Nice.

Wim Leers’s picture

FileSize
67.65 KB
1016 bytes
  1. Because the IP address is not actually part of the header, except when using a reverse proxy that sets the X-Forwarded-For header.
  2. Good point. Technically, it's indeed part of the headers. But in practice, all web software considers it to be something on the same level as headers: PHP, Varnish, etc.
  3. Some things are only available to the super user. In that case, it makes more sense to distinguish between "is the super user" or "is NOT the super user". It should be rarely necessary, but IMO it makes sense.
  4. Sounds good, done.
  5. Fixed.
  6. :)
catch’s picture

1. OK yes that's $_SERVER + headers so not a true child of header.

2. I could go either on this. I guess it's something we don't need to optimize so OK.

3. that feels more like user.role:super_user too for that usage as well. uid 1 special casing is a pain. At least needs a follow-up even if we don't finalize that here.

Wim Leers’s picture

  1. … but it's not a true role. So the problem with going with user.roles:super_user is that somebody could create a role called super_user, which would then effectively be a conflict. If we would disallow a role config entity with that ID, that'd be possible. But I don't think it's possible to do that?

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 101: cache_context_hierarchy-2432837-101.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
67.64 KB

Rebased; 3-way merging FTW!

catch’s picture

The new contexts are handy for reviewing the patch, but there's several in here (IP address, header) that I can't ever see people using. If we end up listing all available contexts somewhere, does the extra choice help people pick the right one(s) to use? Answer to that might be yes.

Wim Leers’s picture

I think those cache contexts (IP address, headers) mostly just helps with people's intuition. If they see that all request properties are available as cache contexts, and that some have child levels… I think that helps with understanding the concept.

Plus, the "Countries API" module would/should provide a "country" cache context that would be a child of the "IP" cache context.
And it's entirely plausible for REST-heavy/headless Drupal sites to add cache contexts for certain headers, to vary by those header values
So I think that *core* indeed can't do anything with those cache contexts, but I think they're useful building blocks for contrib.

It ensures contrib will create the ip.country cache context, rather than a top-level country cache context. And it ensures that different contrib modules don't have to add their own headers cache context implementations.

  • catch committed 11f5549 on 8.0.x
    Issue #2432837 by Wim Leers, Fabianx: Make cache contexts hierarchical (...
catch’s picture

Assigned: catch » Unassigned
Status: Reviewed & tested by the community » Fixed

OK that's a good answer - having the roots of the hierarchy means contrib and custom code can fit into it better.

Happy with everything else here now, so committed/pushed to 8.0.x, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Wim Leers’s picture