Proposed commit message
Issue #2432837 by Wim Leers, Fabianx: Make cache contexts hierarchical (e.g. 'user' is more specific than 'user.roles')
Problem/Motivation
-
Cache Contexts offer great granularity, but e.g. a cache_context like
menu_trail
that is bubbled up to page (response) level is needlessly increasing the complexity of that cache ID. After all, the menu trail cache context really is a subset of the URL cache context.
Also an advancedHttpMiddleware
page cache would likely try to transfer as less cache contexts over the wire as possible. - The 2nd reason is to be able to compare cache contexts.
e.g. An entity that is displayed per role, is displayed within a block that is configured per user.
Without the hierarchy, the role context would be propagated up and a sensible policy might have decided to create a placeholder for the 'role' entity as its not compatible, but because "per user" is a more specific cache context (a subset of) than "per role", this is wrong, because no placeholder is needed in reality.
Proposed resolution
Remaining tasks
None.
User interface changes
None.
API changes
Only API additions.
Cache context services have been renamed, which practically amounts to an API change as well. See the CR (https://www.drupal.org/node/2451661) for details.
---
The global definition would ideally live in the container, so its:
a) fairly static
b) very reliable and usable at page middle ware level pre-boot and pre-routing.In other words, an advanced page cache sitting at the middle ware level as explained in the doc, would probably make heavy use of the cache context hierarchy (issue needed), to propagate the mix of:
[ menu_trail:tools, book.navigation, users.roles ] to e.g. just [ 'page', 'users.roles' ] and only take the most upper level cache contexts into account that can be executed with very low overhead.
Beta phase evaluation
Reference: https://www.drupal.org/core/beta-changes Issue category Task because not a bug, but necessary to ensure good cache hit ratios. Issue priority Major because this is crucial for getting good cache hit ratios: when varying by cache contexts that are implied by other cache contexts, needless variations are created that are identical. Prioritized changes The main goal of this issue is performance. Disruption Minimal disruption: existing contrib/custom code that uses cache contexts will have to update to use the updated cache contexts IDs. There is a huge DX benefit though: the new cache contexts communicate unambiguously what their parent (more granular) cache contexts are by which they are implied. E.g. url.pagers:0
is implied byurl.pagers
which in turn is implied byurl
, and knowing that is easy: you only have to look at the cache context specified! It also clearly communicates to the developer where the value originates from. i.e. it takes away some of the magic, leading to a deeper understanding.
Comment | File | Size | Author |
---|---|---|---|
#105 | cache_context_hierarchy-2432837-105.patch | 67.64 KB | Wim Leers |
Comments
Comment #1
Fabianx CreditAttribution: Fabianx commentedComment #2
Wim LeersJust adding some formatting/minor rewording to the IS.
Comment #3
yched CreditAttribution: yched commentedRegarding hierarchy, it's a bit counter-intuitive that 'user' is more granular than 'user.roles'. From the naming you'd expect the opposite (foo.bar looks more specific than foo)
Comment #4
catchYes that's been in the back of my mind as well, I think it goes both ways though.
Specifically in terms of user roles, it's easy to see that 'user' includes 'user.roles' - i.e. that roles is a property of user. Therefore if you vary on user you're also varying by role.
I'm wondering if that works for enough other things though - say page and page.active_trail? Then timezone - iirc it's possible to prevent users from selecting a timezone so that this would always be the timezone of the site.
Another thing here, I had a brief irc discussion with Fabianx about (longer term) exposing some of this logic to reverse proxies (i.e. for authenticated user caching in Varnish).
If we're able to group cache contexts into things like page vs. user, then that ought to help reverse proxy implementations figure out which are handled by the path and which aren't.
Comment #5
Fabianx CreditAttribution: Fabianx commentedBumping to major as that is a big enabler of:
a) Caching at HttpMiddleware level (authenticated user)
b) ESI / auto-AJAX-replacement -- The less cache contexts, the easier the request logic of those.
c) Cache context comparison.
So authenticated user caching in Varnish in reality also solves most of the problems around ESI (at least those around the cache granularity).
The same lazy-two-tier-approach works for Varnish, too! :)
An ESI request is just a request that does not have 'cache_context.page', which is req.url in 'varnish speak'.
Comment #6
Wim LeersI'd like to start talking about the implementation.
Cache contexts aren't plugins. They're tagged services.
So we can't use annotations. What we can use, is the ability to set purpose-specific attributes for tagged services. What about a
parent
attribute? E.g.:… but I wonder if that is truly sufficient. I think there are use cases where contrib/custom cache contexts may sit between existing levels.
E.g. when varied per
user
, there's no need to vary peruser.roles
. So the parent ofuser.roles
isuser
. But what if we add anuser.is_admin
context? That's actually in between: it behaves likeuser
for UID 1, but kind of likeuser.roles
for every other user.#3/#4: we could rename the
user
cache context touser.id
, if you think that helps.Comment #7
Wim LeersActually, I think
parent
is too vague. What aboutis_subset_of
? But that too is not very clear, because, that means reading it as , but "less specific" actually would mean superset, not subset.Perhaps I'm being misled because I'm not a native speaker, but I do think it's very important we get the wording right.
So, let me let me explain the situation using the cache context hierarchies we have in core, so anybody can chime in:
That is the hierarchy. If any of the lower level cache contexts is specified, as well as any of the higher level, then the lower level one can be omitted, since it's already implied by the (more specific) higher level.
Basically: What is the best word/verb to describe this relationship?
Comment #8
Fabianx CreditAttribution: Fabianx commentedI think first of all we should name them like that, because in the end this is like class chaining:
Possible core examples
- user
- user.roles
- user.roles.has_role:'administrator'
- user.access_grants
- user.timezone (TZ is configured by a user in Drupal even if its the same always)
- user.language (the users preferred language, when e.g. user language neg is used on a content piece)
- url
- url.pager:0
- url.menu.active_trail:'tools'
- url.query_parameter:'campaign_code'
- url.book_navigation (While other schemes are possible the core scheme is to variate the book navigation by url.)
- url.main_request (need to call the main route for page assembling)
- url.route (hash of the route with parameters, used e.g. by SmartCache)
- url.route.format (negotiated format after routing, e.g. 'json', 'dialog', etc.)
- url.route.language (negotiated language derived from the url after routing)
( Here route is a sub-set of url based on path based language and content negotiation with the exception of allowing uncacheable pages to be everything. )
Possible Contrib examples
- user.organic_groups.has_og:'some-group'
- request (Hash of whole incoming request, not that useful)
- request.cookie:'device'
- request.header:'X-Drupal-Device'
Conclusion
Naming things appropriately is probably the easiest to show the relationships and then 'parent' is probably fine.
Comment #9
larowlanYes, the . based hierarchy makes sense to me, and we have similar patterns elsewhere, eg in validation message paths.
So given this varies from the pure service tags approach, where do we keep the logic - will you just use simple strpos === 0 style comparisons or do you need more complex helper methods?
Comment #10
Wim LeersThis discussion also blocks #2448823: Add languages:<type> cache contexts.
Comment #11
Wim LeersIt looks like #8 & #9 have established that we should declare the hierarchy through the use of periods. Let's go with that for now.
For completeness, this is the entire cache context hierarchy that we (Fabianx, catch, I) discussed in IRC:
Observations:
request
andserver
are their roots. This indicates that a cache context either comes from the request (almost always) but might also come from the server in rare cases.request
from the cache context IDs (and thus also omit it from that period-based naming scheme).request
tree, that's the most interesting one. We see 4 children there:ip_address
,headers
,cookies
andurl
. A HTTP request roughly consists of 2 things: a URL + headers. We special-case the cookie header because of its special treatment on the client-side. And each request originates from an IP address. Hence those 4. (Based on the discussion in #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use, we can explicitly not support theAccept
&Accept-Language
headers as cache contexts.)language
request, then that implies varying by thelanguage:interface
cache context, meaning we can omit the latter.url.route.language
andurl.route.language:interface
? It has the benefit of removing all ambiguity, at the cost of being quite long.Comment #12
catchThe longer names seem OK.
In irc Wim suggested:
- rename existing cache contexts to match the tree
- follow-up issues to add missing contexts
Also seems OK.
Comment #13
Fabianx CreditAttribution: Fabianx commented#11.6: I am all for long names, because while discoverability is not as good as for long classnames (where you have auto completion - "What IDE, I don't need an IDE. I have ex!'), we are generally moving towards:
- explicit is better than implicit
in core.
Also for language e.g. we have a user.language cache context and a url.route.language cache context, where this makes it also more clear.
Comment #14
yched CreditAttribution: yched commentedBut the session doesn't necessarily come from a cookie ? (er, or does it ?)
Comment #15
yched CreditAttribution: yched commentedAlso, discoverability is a concern. How do I find the correct full tree name for "cache by roles" ?
Comment #16
Fabianx CreditAttribution: Fabianx commented#14: Yes, it does in Drupal :).
#15: You look into either the services tree like for any other service or read the cache context documentation that we start in a sister issue.
Also user.roles is not more magical, than what we currently have, which is, uhm, 'user.roles'.
So my proposal was to document the whole tree, but don't have:
request.cookies.session.user.roles
But start a new tree with 'user' as user is a really special concept.
So the two root 'aliases' would then be:
- url (special, aliased from request.url)
- user (special, aliased from request.cookies.session.user)
Both are special concepts for a CMS like Drupal.
The two natural root aliases are:
- server (like server.value:HTTP_HOST)
- request (like request.header)
--
So my long names proposal is actually a:
- as long as needed for practical purposes proposal
and to shorten the hierarchy, where we don't have something meaningful as parents that can be cached.
(and while session might be valid at times, I don't feel it deserves it's own cache context, because it gives horrible cacheability).
--
TL;DR:
Have 4 root elements:
- server
- request
- user
- url
for the cache context hierarchy for practical purposes, have child elements of those use the '.' notation.
Also see #8 for concrete examples.
Note that this roughly matches what varnish has internally as variables (this comes naturally as both describe the hierarchy of headers and values):
e.g.
request == req
server == req.backend.value (IIRC)
url == req.url
user == // There is no concept of a user in varnish.
request.cookie == req.http.cookie (with regexp to filter out)
request.header:X == req.http.X
Comment #17
Wim LeersInitial patch. Only doing
url.host
andurl.query.args
in this one, to see what it's starting to look like. Also introduces validation of the cache context ID, to ensure the parent cache contexts are present.This patch will fail because the
menu.active_trail
cache context uses the period for something that isn't hierarchical, there's nomenu
cache context. Just showing that it works and what exception you'll get.(No interdiff, completely new patch, following the agreed upon approach.)
Now pushing this further.
Comment #19
Wim LeersFailed as expected, with this message:
Comment #20
Wim LeersChanges:
pager
->url.pager
menu.active_trail
->url.menu_active_trail
MenuActiveTrailInterface::getActiveTrailIds()
now accepts NULL as the menu name, just like its siblingMenuActiveTrailInterface::getActiveLink()
— this then allows us to build the active trail when no menu is specified. This is just fixing an oversight.Comment #22
Wim LeersWhile tackling the next one (
book.navigation
), I realized that it's actually got theurl.route
cache context as its parent, and in fact, so doesurl.menu_active_trail
/menu.active_trail
!Changes:
url.route
url.menu_active_trail
->url.route.menu_active_trail
book.navigation
->url.route.book_navigation
Comment #24
Wim LeersShould be green.
Comment #25
catchIs there any chance we need a cache context for the actual route as a thing as opposed to route + params?
Comment #26
Wim Leers#25: interesting! I don't think we need that, because a URL is bound to a route + parameters, not to just a route. Just a route can live at many URLs, so it's not like that could be a useful level between "url" and "route", AFAICT.
Comment #27
Fabianx CreditAttribution: Fabianx commented#25: If you mean the route name, the context for that would be url.route.name instead.
Comment #28
Wim LeersIndeed, that's implied by what I said. Thanks for making that much clearer :)
Comment #29
catchurl.route.name is what I was thinking of. And indeed url.route is the parent of that so that works fine.
Comment #30
Wim LeersThis interdiff:
url.route.format
url.route.name
url
-subtree is finished IMO, caching perurl.route.route_parameters
or perurl.path
don't make sense.ip
cookies
(which allows forcookies:<name>
)headers
(which allows forheaders:<name>
)Next:
user
.Comment #31
Wim LeersThis interdiff:
user
anduser.roles
moved from the user module into\Drupal\Core
, since none of it actually depends on user module logic — or at least, not anymore.user.is_super_user
user.roles
into a calculated cache context; it behaves just like it does today, but it now allows one to specify a specific role to check the presence of, e.g.:user.roles:admin
, to vary something by whether the current user has the admin role or not (or e.g. anonymous or not, or authenticated or not)node_view_grants
->user.node_view_grants
Next: dealing with the last 3 cache contexts in core that haven't been touched yet:
language
+theme
+timezone
.Comment #33
Wim LeersThis interdiff:
user.roles
cache context.user
,theme
andlanguage
should be top-level cache contexts. The conclusion, verbatim from IRC:13:57:18 WimLeers: catch: Fabianx-screen and I agree with you, the basic rule is: fit cache contexts into the tree unless there's a use case for altering the behavior/logic, and hence the parent cache context. In that case, make it a top-level cache context. Applies to user, theme, language.
user
. This now also does that fortheme
andlanguage
timezone
->user.timezone
Next: clean-up.
Comment #34
Wim LeersWhat the current patch gives us
So we've established a clear pattern to define the tree:
a.b.c
anda.b
are present, it's obvious thata.b
encompassesa.b.c
, and thus it's clear why thea.b
can be omitted, why it can be "folded" into the parentSo let's look at the tree we've ended up with, the one the last patch actually allows you to use. Sorted alphabetically. Generated by this code:
That gives us:
Note that the deeper you go in the tree, the better for caching, because the less specific your cache context becomes, and hence the more reusable it becomes.
Looking at the cache contexts that accept a parameter
But now let's clarify that a bit by showing which ones accept a parameter, signified with a leading colon (
:
), just like the syntax allows you:Now we see an interesting pattern. Most of the ones accepting a parameter are plural, but not all.
I'd argue that every calculated cache context (i.e. the ones that accept a parameter) should be plural, because when no parameter is given, they're required to return a value representing all of the possible parameters. This then automatically gives us a clean, logical hierarchy. And it communicates to the user that whenever a plural is encountered, a parameter can be specified.
Then the pattern becomes:
Proposed next steps
So if we rename
url.pager
tourl.pagers
andurl.route.menu_active_trail
tourl.route.menu_active_trails
, we're wholly consistent there. (They've already been updated in the above patches to support no parameter being passed and in that case returning results that apply to all values they accept as parameters.)Except for
user.node_view_grants
. That's still plural. But why is it plural? Because there are many operations for which there are grants. But it currently only supports the "view" operation. Renaming it touser.node_grants
and then allowing a parameter containing the operation to be specified, would make this too wholly consistent:user.node_grants:view
.Finally, I'd like to propose we turn
language
into a calculated cache context and turn it intolanguages
. Then you can dolanguages:content
orlanguages:interface
. That also better captures the fact that the current language actually covers all the different language types.With those proposals implemented, we'd end up with:
Comment #35
Wim LeersIn this interdiff:
user.node_view_grants
->user.node_grants
.Comment #36
Wim LeersIn this interdiff:
url.pager
->url.pagers
url.menu_active_trail
->url.menu_active_trails
Comment #39
Wim LeersIn #35, there are failures, because in between posting #33 and #35, #2381217: Views should set cache tags on its render arrays, and bubble the output's cache tags to the cache items written to the Views output cache landed, which introduced more uses of cache contexts, which I therefore hadn't yet updated.
In #36, I stupidly forgot to rename the actual cache context. The consequences are clear :)
Comment #41
Wim LeersStraight rebase of #39, patch didn't apply because #2433599: Ensure every (non-views) pager automatically associates a matching cache context was committed in the mean time.
Comment #42
Wim LeersIn this interdiff:
language
->languages
.The cache context hierarchy is now all done! Now we just need to start using it.
Next: folding of cache contexts + tests.
Comment #45
Wim LeersMissed one place. This should make it green again.
Comment #46
Wim LeersFolding implemented:
CacheContexts::optimizeTokens()
.All done now; now blocked on reviews.
Comment #47
Fabianx CreditAttribution: Fabianx commentedCan be follow-up, but someone / something should set the 'url' cache context, so that things get optimized out that only depend on url.
Also means the page cache has the 'url' cache context.
--
Continuing review later ...
Comment #48
Fabianx CreditAttribution: Fabianx for Drupal Association commentedI think better to do:
return 's.' . $this->user->id() === 1 ? 1 : 0;
Else you end up with:
'::' potentially, which is confusing.
I think it would be great to type-hint somewhere that this is of one of the types defined in:
Drupal\Core\Language\LanguageInterface
So if its not multi-lingual, then we use LanguageInterface::TYPE_INTERFACE, which is the default.
I think we should delimite with '.' or ',' here as its not a new CID part.
This could use a comment that 'page' contains all pagers on a page separated by comma, or an @see pager_find_page() or such.
It confused me first.
This needs an isset() check.
Are we sure the block depends on all language types?
Not know too much about this, just wondering.
According to https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Language%... this needs to be language_content.
Comment #49
Fabianx CreditAttribution: Fabianx for Drupal Association commentedComment #50
Wim LeersUserRolesCacheContext
. (Note though that the colon was already there before this patch.)'language'
cache context in this patch. I only did it in the filter API test coverage because there the impact is self-contained. Let's fix the existing usages in #2448823: Add languages:<type> cache contexts.Also rebased; fixed a tiny conflict.
Comment #51
Wim LeersComment #52
Fabianx CreditAttribution: Fabianx for Drupal Association commentedCould find nothing more, RTBC if tests pass :).
We need another issue for e.g. user roles to consider throwing an exception, too if the argument is invalid to prevent:
user.roles:auhtenticated
or other typos :).
Great work!
Comment #53
Fabianx CreditAttribution: Fabianx for Drupal Association commentedComment #54
Fabianx CreditAttribution: Fabianx for Drupal Association commentedComment #55
Wim LeersThis should fail because #2445743: Allow views base tables and entity types to define additional cache contexts has landed, which introduced more occurrences of some cache contexts.
Comment #56
Wim LeersCR created: https://www.drupal.org/node/2451661.
Beta evaluation added.
Comment #58
Wim LeersFailed as expected.
Some Views plugins were associating the non-existing
cache.context.url
cache context instead of justurl
. Also noticed at #606840-40: Enable internal page cache by default.Opened an issue to validate cache contexts, so that that is not possible anymore: #2451679: Validate cache contexts (+ cache contexts in some views plugins wrong).
Comment #59
Fabianx CreditAttribution: Fabianx for Drupal Association commentedBack to RTBC
Comment #61
Wim LeersTrivial fix, hence back to RTBC.
Comment #62
Fabianx CreditAttribution: Fabianx for Drupal Association commentedRTBC + 1
Comment #63
larowlanWhy is this a.b and not a? I could be plain wrong here.
nitpick, should this be means - not mean
nitpick two spaces
I know in comment we had issues with $comment->getOwnerId() returning '1' as in a string, we had to add a cast to int in that method.
We might need this here too.
could we hash this to reduce the size of the context? Just checking we definitely don't unserialize this, as that'd be a sechole.
Nitpick missing blank line
This would indicate the comment earlier (see point 1) is incorrect.
Comment #64
webchickComment #65
Wim LeersFirst: chase HEAD. #2342045: Standard views base fields need to use same rendering as Field UI fields, for formatting, access checking, and translation consistency conflicted with this.
Comment #66
Wim LeersSecond: #63.
Excellent review, @larowlan++
Comment #67
Fabianx CreditAttribution: Fabianx for Drupal Association commentedGreat review, back to RTBC.
Comment #70
Wim LeersTestbot was flailing in its first test run (nonsensical fail).
Comment #71
Wim LeersAlso pinged @effulgentsia for a review. And posted a link to this in #2443323: New convention: CacheContextInterface implementations should mention their ID in their class-level docblock to get more feedback.
Comment #72
effulgentsia CreditAttribution: effulgentsia at Acquia commentedNeeds work for:
What does "host" have to do with QueryArgs?
Wrong docs for this class.
Comment #73
effulgentsia CreditAttribution: effulgentsia at Acquia commentedAdditionally, but could be follow-up:
url.route and its children aren't exactly "simple" and "directly derived", so I think could be given a separate section with their own comment.
Why is 'pagers' at this level and not under 'query_args'? I suppose this allows an alternate implementation that uses some other portion of the URL, but if that's the intent, or if there's a different intent, let's add a comment about that.
Comment #74
effulgentsia CreditAttribution: effulgentsia at Acquia commentedAnd on a more substantial note, overall I like the naming approach a lot. But, for things like
cache_context.url.route.menu_active_trails
andcache_context.url.route.format
, it means we have an implicit requirement that implementations of those services must only depend on the URL. So, no ability for FancyMenuActiveTrails to depend on user or phase of moon, and no ability for contrib to implement header-based format negotiation. In other words, what we're sacrificing by using service ID to have hierarchy meaning is the ability to override the service implementation with something that would put it in a different part of the hierarchy. Is that something we're willing to accept? Or, should we address it via something like using an optional service tag or a service alias? For example, if contrib implements a FancyMenuActiveTrail override of core's MenuActiveTrail, then it can also tag or alias the cache_context.url.route.menu_active_trails service in a way that makes the system (e.g., CacheContexts::optimizeTokens()) aware that its parent is no longer url.route.[Edit: BTW, I'm fine to punt this to a follow-up.]
Comment #75
effulgentsia CreditAttribution: effulgentsia at Acquia commentedHow about this to address #74: we embed the hierarchy in the context name / service ID only when that hierarchy is intrinsic to the meaning of the context, not merely an implementation choice.
What I mean by that is
url.host
is a good name, because the semantics of "host" makes it intrinsically a child of "url". Same forurl.query_args
anduser.roles
.However, I suggest that
timezone
,format
, andpagers
each move to the root level, because those concepts are not intrinsically bound to users, routes, and URLs, respectively.Additionally, I think
route
should move to the root level as well. Routing could in principle depend on IP and headers, even if we don't want core's to do so.I'm unsure about
book_navigation
andmenu_active_trails
. I think they might make sense to keep underroute
(i.e.,route.book_navigation
androute.menu_active_trails
), because I don't think it makes semantic sense for those things to vary on anything other than the route. However, I suspect there are sites that customize breadcrumbs and the rendered structure of some menus based on things like role or organic group, but what I wonder is whether it ever makes sense to implement such customizations via the return value of getActiveTrailIds() or whether such customizations should only ever be done in the display layer (which would then also be responsible for adding the additional contexts there). If there are legitimate use cases for the former, then those context names probably need to move to the root level as well (or we can group them under something likenavigation
).For this issue, I think it would be sufficient to simply rename accordingly.
In a followup, I think we can add a service tag to convey an implementation-driven parent relationship. For example:
And thereby allow CacheContexts::optimizeTokens() to optimize for implementations where pagers are solely determined by a single URL query argument and routing is done exclusively by URL.
Comment #76
dawehnerWhile I can understand the point of not making everything alterable, I think encoding hierarchy into the service IDs is wrong,
because its a new pattern we came up with. ... What about things like module.installer, there is no hierarchy, its simply a group of things.
Tags are the way how you add additional metainformation to the services.
Comment #77
Wim Leers#72: fixed both; both are copy/paste remnants.
#73: Fixed both. Your hunch on the second point was correct. But I think you're right; we should put it under
query_args
. Talked about it with Fabian, and he's fine with it too. More on the "why" below.#74 + #75:
So the active menu trail is always going to depend on routing. It may choose a different algorithm to select parents, but it'll depend on the current route always.
But! That doesn't mean that the breadcrumb trail cannot use some fancy alternative, like remembering last-visited pages in the session. In other words, like you, I was confusing "active menu trail" with "breadcrumb trail". People can still override the breadcrumb manager or add more
breadcrumb_builder
-tagged services. It's up to those services to associate whatever cache context they want. (And currently, they're not associating any cacheability metadata at all btw, that's up to #2232609: Cacheable breadcrumbs block, and fix breadcrumb builders to fix.)#type => pager
has a default behavior that is based on a query argument. But usingelement_info_alter()
, you could change that to use the URL (think/page:1
in the URL). When making that implementation change, you can also define another cache context and associate that one instead. This will work fine, because the cache context is internal; it is bubbled, it isn't set by the consumer of#type => page
.The internal ones we can and should be specific. When the implementation is swapped out, the cache context can be swapped too, and will bubble automatically, so everything continues to work.
The external ones, there we have to be careful. This is why
languages
,user
andtheme
are all top-level: because they are necessary all over the place and the way the corresponding value for a request is selected can be highly dynamic: many types of language negotiation (URL-based, header-based, user-preference-based etc.), many types of authentication, many ways of selecting a theme (URL, user preference, defined on the route…): impossible to choose a clear parent.Accept-Language
andAccept
headers cannot be supported because it's impossible to make work with external caches, and therefore, with the internet in general. In an ideal world, it's possible, but in the practical actual world, it isn't. (Besides, IP-based routing goes even further than that; it's less reasonable, if you will.)We are choosing to standardize on the URL as being the only thing that determines the route. It's a uniform resource locator, after all. For browser-based language negotiation (which is also conceptually part of routing), we're introducing a redirect to the corresponding URL: #2430335: Browser language detection is not cache aware. For IP-based routing, we'd also use a redirect. For country-based routing (country derived from IP), we'd also use a redirect. In other words: for any advanced/fancy form of routing, we'd use redirects.
timezone
,format
,pagers
. I've explainedpagers
androute
above. The same applies totimezone
androute
. Basically, we need to make some assumptions, otherwise we end up with an uncacheable site.format
will be implied byurl
once #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use lands.timezone
could also be IP-specific (because region-specific), but we usually see per-country websites, so once again a redirect is the answer.If we don't make any decisions, which are indeed judgement calls, but well-reasoned ones, then we'd not end up with a hierarchy, but a list.
Your proposal would make the hierarchy implicit instead of explicit, hence removing those advantages. It'd introduce more magic.
Don't get me wrong, I agree with #75 in principle. But #75 is about optimizing for the "what if it's altered in advanced ways" use case, the current patch optimizes for the "I'm using it like >98% does" use case.
Looking forward to your thoughts :)
Comment #78
Wim LeersWe only apply this special meaning to cache context services. It is validated in
CacheContextsPass
. I don't see why that's necessarily harmful.(Another advantage of doing it this way is that people are already familiar with the concept that a service ID must be unique. Hence the cache context service ID must be unique.)
Besides, service IDs in general are a wild west. Singular, plural, periods, underscores, "manager" suffixes in some cases and not in others, etc. Even within a same area there are inconsistencies (
@theme_handler
, but@theme.manager
). I don't think it's bad to have some enforced consistency for tagged services of a certain type.That being said: I see your point. What you're saying is actually exactly what I originally did. But that lead to a lack of clarity. See #6 & #7.
Comment #79
Fabianx CreditAttribution: Fabianx for Drupal Association commentedTo #77 for the route argument:
If a site chooses to alter the meaning of url.route and exchange the cache context, so its dependent on more than the url naturally they will need to exchange the 'url' cache context as well.
Even if there was no hierarchy and it was not simplified, as soon as something uses 'url' instead of route, but this url depends on something else in the request it is wrong.
That is why core enforces an url based hierarchy on things that is true for core.
Because with this we give the guarantee of cacheability in external proxies.
Therefore it is good that we choose to have url.route as dependent on the url, because it gives clarity that you need to exchange both to get correct and proper caching.
--
Let's go through the list again:
So therefore timezone from this list remains as its added by e.g. a DateFormatter and is understood as a global concept.
And maybe explicit parent relationship as a tag is the right thing for timezone and theme, so it can still be simplified in core. (though that could indeed be follow-up).
Because usually timezone is user and given:
[timezone, user] can be simplified to user, but that might not always be true.
Similarly theme can easily be simplified to url.route in Core by default given the interface of the theme negotiator and we don't do this right now.
Will discuss timezone with Wim again.
Comment #80
Wim LeersTimezone
In #77, I gave I think a sound justification for why
route
,format
andpager
are already in the right place.For
timezone
, I wasn't able to formulate an equally sound justification. Combined with #79, I'm fine with movingtimezone
to a top-level position again, so that we can do #75's bottom proposal in a follow-up.Then patch == #75
Then the patch basically is exactly what you suggested in #75:
i.e.:
user
,theme
…), but not fortimezone
.pagers
does not fall in this category, because for 99% of developers, pager == query arg.route
,format
), we do enforce an intrinsic meaning, because that's also what #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use will do; i.e. we just follow the decision there.Pager
element and set a different cache context.CR updated.
Comment #81
Fabianx CreditAttribution: Fabianx for Drupal Association commentedTo write #80 again from a different light (also for my own understanding).
1. Intrinsic, it is a property of the parent - always, mostly request based things.
+Main property: Not negotiated, but taken mostly raw.
+Swappability: It can be swapped, but there is no use case without also having to swap the parents in the hierarchy.
2. Implementation specific, the cache context is bound to a specific implementation and by swapping out the implementation the cache contexts are swapped out as well.
Those are cache contexts that are not negotiated, but hard coded to their specific implementation. (Core's pager do use ?page)
+Main property: You do not use this cache context outside of the things it is on. It is bubbled up.
+Swappability: You do not swap the cache context, but the implementation and use a different cache context.
3. Dependent on calculated state, the cache context is negotiated based on other properties that can change. Because it is a cache context that others specify it MUST be swappable.
+Main property: Complex logic to calculate, can depend on several other cache contexts itself.
+Swappability: MUST be swappable, because it is specified by something outside, e.g. a date formatter specifies a timezone CC.
These are the top level contexts and what #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() is for.
Comment #82
Fabianx CreditAttribution: Fabianx for Drupal Association commentedAll feedback has been addressed, #76 was additionally discussed in IRC.
Therefore back to RTBC.
Comment #83
effulgentsia CreditAttribution: effulgentsia at Acquia commentedI don't follow this reasoning. If I want to write a contrib/custom module that determines a route parameter from a header (because I choose to not care about reverse proxy caching, or can configure my reverse proxy to vary by that header), why would that require changing the meaning or implementation of the 'url' context? Vary-by-URL still means exactly that. It's only that in that case, vary-by-route can't be treated as a strict subset of vary-by-url, so how to deal with that? I suppose #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() will allow that even if we preserve the context name as
url.route
?Not un-RTBC'ing for above, since that can be debated in a followup. But, setting to NR for:
Should
url.route.format
be changed tourl.format
? Because a single route can serve multiple formats AND a single format can be used by many routes, so I don't see in which sense format is a child of route.Comment #84
Wim LeersAnd @effulgentsia strikes again with irrefutable simplicity & clarity!
Of course! Incredible that we didn't realize this before. All of your awesome feedback so far is exactly why I wanted to get your thoughts before this went in :)
Updated the CR too.
Comment #85
Fabianx CreditAttribution: Fabianx for Drupal Association commentedYes, #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() will allow to set a parent explicitly even if the semantic meaning is changed then from our recommendation / choice in core.
Naming ain't easy :).
I also wondered before on url.route.format, but atm. format neg is done in middle ware (or at least soon), so yes, that is not a property of the route itself.
Comment #86
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThanks.
I promised not to un-RTBC this on this, so I won't, but I'd just like to make a final plea to reconsider putting
format
androute
at the root levels, not insideurl
. Here's why: URL stands for uniform resource locator, not uniform representation locator. So I don't believe there is anything semantically that justifies thinking about format and route as children of url. I don't think'#contexts' => ['url.format']
is better DX over'#contexts' => ['format']
, especially when 'languages' and 'theme' are at the root level, and at least superficially, format is kind of like those things.I can appreciate the desire to optimize format and route away when we're caching on url, but we'll be able to do that in #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents(), so I don't see why we need to make that implementation decision leak into the names.
Comment #87
Fabianx CreditAttribution: Fabianx for Drupal Association commented#86: Thanks, I think our understanding of cache contexts has started to grow and as its a new concept it takes some tries to get right.
What I don't want is: All things are top-level, hence just a list as that is problematic for discoverability and DX.
I am pretty happy with my latest table / decision chart from #81 and I think thats a good addition to the developer docs on cache contexts.
Lets judge 'format' and 'route' based on that:
1. format is not intrisic itself as its not a simple property on the url, it is a calculated property of the request and in Drupal's implementation of the url, but we choose to omit 'request.' prefix due to shortening. It certainly fails 'not negotiated', but raw.
2. format would definitely be set by something else, like a request / response listener, hence it is not specific to the implementation of its thing. (though the cache contexts could live on the request object, but that is still != url).
3. format is negotiated and must be swappable for REST support in contrib.
=> Top level
route
- 1. Derived from the request, not a simple property, multiple routes per url
- 2. Smart Cache is after routing and wants to use this, so set by something else.
- 3. Definitely a negotiated context
=> Top level
A non-shortened tree would probably look like this:
If we omit request, we should probably still use request_format instead of just 'format' as it is a little too common.
Without explanations:
@Wim: Thoughts?
Comment #88
effulgentsia CreditAttribution: effulgentsia at Acquia commented+1 to #87. Either option (with or without
request
as its own context / prefix) is fine with me. If we omit that prefix, I'm fine with eitherformat
orrequest_format
.Comment #89
Fabianx CreditAttribution: Fabianx for Drupal Association commentedFinal decision, 1,2,3:
Comment #90
Wim LeersTalked it through some more with Fabianx, to try to find remaining problems, but it looks like this is now completely sound.
Went with
request_format
because — in order of importance — 1) Symfony hasRequest::getRequestFormat()
, which is exactly what this cache context uses, 2) prevents clashes, because "format" could indeed apply to many things, 3) Fabianx preferred that.CR updated.
Comment #91
Fabianx CreditAttribution: Fabianx for Drupal Association commentedAnd finally back to RTBC, see you in the follow-ups! (hopefully)
Comment #92
Wim LeersOops…
Fixed that. Going AFK now — tired :)
Let's let @effulgentsia RTBC this — it's important that we have his sign-off IMO.
Comment #93
effulgentsia CreditAttribution: effulgentsia at Acquia commentedLooks great. Thanks!
Comment #94
dawehnerCan we have a follow up to add test coverage for all the new cache contexts?
I'm curios which behaviour relies on the admin user still? don't we rely on admin roles somehow already?
Should we not instead fallback to the pager with element 0? I think this is more what is actually intended. On top we then would need 'all', but I think the usecases for that are really rare.
Huch? Why do you not just use $request->query->get($query_arg)?
Isn't that part of the other patch?
Comment #95
Wim Leers#94:
Yes, we can add tests in a follow-up. But many of these are too trivial and aren't worth having test coverage at all. E.g. the host, IP, headers, cookies cache contexts are all purely calling Symfony Request object methods, which already have their own test coverage. That's also why no cache context in core currently has any test coverage except for the node grants one; because only in that case there is significant complexity.
id() == 1
andid() === 1
), but still special-cases UID 1 when checking permissions, for example. This issue is also about looking at the entire "cache context landscape", and "is super user" is a sensible part of that.page
query arg. It encodes the data for pagers 0, 1, 2, 3 … N into that query arg's value, separated by commas. Therefore, "all possible pager IDs" are contained by that query arg, therefore this is correct.Using
url.query_args.pagers
means "all". It surely is very rare to use that, but that's the semantics we've assigned: plural means all possible parameter values. See #34 and subsequent comments and the CR.Comment #96
dawehnerYou probably did not understood my point at all. I doubt people keep in their head that there are more than one pager on a page (tell me about it, I've been lurking in the views support queue for years). Based upon that, people will always just use 'pager', even they technically actually should have used pager:0. Based upon that,
I'd argued before, that 0 should be the default, and you explicitly opt IN to show all.
Comment #97
Wim Leerspagers
: makingpagers
===pagers:0
would break the hierarchy.Comment #98
dawehnerShock, so users will have to deal with cache contexts ? ;)
Right, if the render array already propagates that, there are just a few people which have to set it.
Alright, well, I just tried to think about it, and how to improve stuff.
So yeah, there is a reason I did not touched the RTBC state of this issue.
Comment #99
Wim Leerss/ever/never/ :)
Thanks for sharing your thinking, it's much appreciated! I hope I was able to fully convince you of the rationale behind the current patch.
Comment #100
catchFew questions, not changing status yet.
In terms of figuring out the list/hierarchy in the issue IP makes complete sense, but I'm not sure about actually providing it. Also why not a child of headers?
Why isn't cookies a child of headers?
So conceptually this is correct. But I wonder whether the logic should be contained in the roles cache context. Wherever you add per-role, you should also be adding this, and not adding it could lead to information disclosure. Role cache context is in 99.9% of cases "includes a permission check".
Do we want to say more granular, or 'property of'?
Nit, extra space.
Nice.
Comment #101
Wim LeersX-Forwarded-For
header.Comment #102
catch1. OK yes that's $_SERVER + headers so not a true child of header.
2. I could go either on this. I guess it's something we don't need to optimize so OK.
3. that feels more like user.role:super_user too for that usage as well. uid 1 special casing is a pain. At least needs a follow-up even if we don't finalize that here.
Comment #103
Wim Leersuser.roles:super_user
is that somebody could create a role calledsuper_user
, which would then effectively be a conflict. If we would disallow a role config entity with that ID, that'd be possible. But I don't think it's possible to do that?Comment #105
Wim LeersRebased; 3-way merging FTW!
Comment #106
catchThe new contexts are handy for reviewing the patch, but there's several in here (IP address, header) that I can't ever see people using. If we end up listing all available contexts somewhere, does the extra choice help people pick the right one(s) to use? Answer to that might be yes.
Comment #107
Wim LeersI think those cache contexts (IP address, headers) mostly just helps with people's intuition. If they see that all request properties are available as cache contexts, and that some have child levels… I think that helps with understanding the concept.
Plus, the "Countries API" module would/should provide a "country" cache context that would be a child of the "IP" cache context.
And it's entirely plausible for REST-heavy/headless Drupal sites to add cache contexts for certain headers, to vary by those header values
So I think that *core* indeed can't do anything with those cache contexts, but I think they're useful building blocks for contrib.
It ensures contrib will create the
ip.country
cache context, rather than a top-levelcountry
cache context. And it ensures that different contrib modules don't have to add their ownheaders
cache context implementations.Comment #109
catchOK that's a good answer - having the roots of the hierarchy means contrib and custom code can fit into it better.
Happy with everything else here now, so committed/pushed to 8.0.x, thanks!
Comment #110
Wim LeersYAY! Now #2448823: Add languages:<type> cache contexts and #2453059: Set default render cache contexts: 'theme' + 'languages:' . LanguageInterface::TYPE_INTERFACE are unblocked, but also #2443323: New convention: CacheContextInterface implementations should mention their ID in their class-level docblock.
Comment #112
Wim LeersDiscovered a potential security problem caused by this: #2512866: CacheContextsManager::optimizeTokens() optimizes ['user', 'user.permissions'] to ['user'] without adding cache tags to invalidate that when the user's roles are modified.