Problem/Motivation

>3 years ago, #2557815: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter() introduced automatic cacheability bubbling for node_query_node_access_alter(). But ideally, this wouldn't need to bubble (onto the currently active render context if any) at all, ideally this cacheability would just be associated with the query itself.

We encountered this in the API-First initiative too, in #3005826: Follow-up for #2984964: JSON API + hook_node_grants() implementations: count queries still result in cacheability metadata leak.

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Comments

Wim Leers created an issue. See original summary.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

hchonov’s picture

I've just discussed this a little bit with @kfritsche and actually an entity query returns an array and not an object.

He suggested that we might need a service, where we could add a cacheable metadata during a request at any place and then before the response is returned to apply this cacheable metadata to the response.

geek-merlin’s picture

> He suggested that we might need a service, where we could add a cacheable metadata during a request at any place and then before the response is returned to apply this cacheable metadata to the response.

Which assumes that everything done is about collecting the response. Unfortunately this is wrong sometimes. Think of a query that assembles some search api index items which get stored, totally unrelated to the response. A lot of other cases can be imagined. (Apart of that, re-adding globals is dirty dirty dirty ;)

We might build on $query->get|setMetaData. Like in \Drupal\node\NodeGrantDatabaseStorage::alterQuery, where it is used for $langcode = $query->getMetaData('langcode').

If we settle on (and document) $query->getMetaData('cacheability'), we should be done. In an alter hook, we can then do something like $query->getMetaData('cacheability')->addCacheContext('whatever').

Of course, superior would be to go all the way and have $query->cacheability().

hchonov’s picture

How do you want then for that query cacheable metadata to propagate if everything a function is returning is a primitive value and not an object? Function A might be invoking function B, which then executes an entity query and based on the user permissions adds a cache context, because the query result is dependent on them and as such the value returned by function B. In this case there is no way to let the cacheable metadata propagate even if the entity query supports cacheable metadata in any way.

And the renderer itself is a global service, which is called in node_query_node_access_alter() in order to propagate the cache context, so basically adding a dedicated service for collecting cacheable metadata without having to invoke the renderer is just a better separation of concerns, but basically does not change the way we will be letting cacheable metadata bubble.

geek-merlin’s picture

What i see is, that we have the choice of wrapping each and every result in a metadata carrying wrapper ("monading it" in haskell speak) or add a global service. And yes, wrapping everything might be cleaner, but unfeasably tedious. And yes, if the cacheability service allows swapping out or pushing/popping the cacheability context, the use case i brought up can be done with it.

So what stays is a gut feeling that wrappers are cleaner, but i guess the service is (in php ;-) the only thing we can afford. So given the above (it can swap context), the cacheability service makes sense to me.

geek-merlin’s picture

Hmm, i see another point +wrappers/-service:

The service global variable works correctly under the assumption that for each and every render array, the cacheability is added while that render array is rendered. If the cacheability is calculated earlier, by priciple it can not be assigned to the right render array fragment. (IIRC we have an exception for too-early cacheability, but can we prove this guarantees us right cacheability attribution in the general case?)
Even more, in #2551419: Abstract RenderCache into a separate service that is capable of cache redirects in a non-render array-specific way, use cases were mentioned of cacheability stored for completely different objects than render arrays.

So in the end, the "simple" solution might shoot us in the knees and via wrong cacheability attribution open wormcans of security issues.

(Not taking sides yet, just thinking aloud.)

hchonov’s picture

If the cacheability is calculated earlier, by priciple it can not be assigned to the right render array fragment.

Such cacheable metadata should be assigned to the response, not a part of it.

So in the end, the "simple" solution might shoot us in the knees and via wrong cacheability attribution open wormcans of security issues.

Couple you please describe how this might happen?

geek-merlin’s picture

I was just throuth some other issues and got reminded, how many intermediate results have to be saved / cached and there are lots of issues about them lacking cacheability metadata. (Did not find it now but broup permission caching is one instance, kristian even does an immense effort to abstract the cacheability aware render cache for this: #2551419: Abstract RenderCache into a separate service that is capable of cache redirects in a non-render array-specific way)

Also i remember there had been a long list of problems in the issue where the render context was introduced (couold not find it today, my brain seems stuck on wrong search terms). From that issue it was clear to me that that render context was a pragmatic solution that solved the typical problem of early rendering, but worsens other cacheability use cases.

(So much for today, i'll follow up with some issue links when i find them. Sorry i have to defer a better answer.)

geek-merlin’s picture

Also this is related or dup.

hchonov’s picture

It would be nice if you could answer to my comment :)

geek-merlin’s picture

Thanks for your patience ;-), now i'm back on this. As of #8:

> If the cacheability is calculated earlier, by priciple it can not be assigned to the right render array fragment.
Such cacheable metadata should be assigned to the response, not a part of it.

No. See below.

> So in the end, the "simple" solution might shoot us in the knees and via wrong cacheability attribution open wormcans of security issues.
Couple you please describe how this might happen?

The claim is that for each and every intermediate result (render array, acces result, ...) that we cache, we must track the correct cacheability. Not doing this has various bad consequences which range from performance (see a list in #2046881: [meta] Avoid "early rendered" strings where beneficial to do so, build structured data to pass to drupal_render() once instead) to security. I'll only elaborate security here.

Consider we have an access result of something. That access result is cached with unsufficient cacheability data, when admin is logged in. Tomorrow nonadmin comes along and the cached access result of admin is used. Boom.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

kristiaanvandeneynde’s picture

Just stumbled upon this issue, but my 2c is also that we should make the bubbling of cacheability a separate service. Right now it's deeply baked into the renderer, like VariationCache was once baked into the render cache. If we abstract the bubbling into a separate service, then other non-rendering pieces of logic could also call the service to add cacheability to the whole. We'd be able to get rid of methods like Renderer::executeInRenderContext().

Having said that, the challenge would be scoping this (like we currently have render contexts) so that unrelated access checks on the whole request do not poison the cacheability of smaller render arrays on the page, leading to everything being nigh uncacheable because there are too many tags and contexts involved.

Perhaps we could keep the render context concept for the time being, but introduce this one global context as a service where everyone can add to and then have the renderer also add its bubbled cacheability to said global context. Kind of like what it does already, but then abstracted out for the top level.

geek-merlin’s picture

@kristiaanvandeneynde: Good thoughts!

AND: You mentioned the core problem:
> Having said that, the challenge would be scoping this...

I once read the whole issue and story about why we have that render contexts. It was a dirty hack pragmatic solution because adding cacheability to some strings would have had a too big impact. (I remember some render array attributes that do not eat renderables.)

So we should take the opposite direction: Allow every value to carry cacheability. E.g. by adding a stringable renderable, and fix the remaining places where this causes cough.

That said, what are the use cases you got in mind where things that are not renderable should impact the response cacheability? And why not simply add those cacheabilities to the redner array?

kristiaanvandeneynde’s picture

That said, what are the use cases you got in mind where things that are not renderable should impact the response cacheability? And why not simply add those cacheabilities to the redner array?

Any sort of access check on the route or simply at any level before the renderer. Keep in mind access checks don't always rely merely on user.permissions which gets added anyway, but can also rely on domain, country of origin, an external system, etc.

kristiaanvandeneynde’s picture

Example: In the footer of an astrophysicist's blog we calculate today's distance between Earth and the Sun because why not. This calculation takes a lot of resources and we do not want to recalculate that for every page and every variation of said page, all we need is a cache context representing the day of the year.

The good news is that because of cache keys, we can cache smaller pieces of the page like these separately in the render cache. The bad news is that the render always adds the required_cache_contexts even to smaller pieces. So even if our distance widget does not vary by permissions, we still add 'user.permissions', 'languages:language_interface' and 'theme' to the cache contexts. See (coming from Renderer):

    if ($is_root_call || isset($elements['#cache']['keys'])) {
      $required_cache_contexts = $this->rendererConfig['required_cache_contexts'];
      if (isset($elements['#cache']['contexts'])) {
        $elements['#cache']['contexts'] = Cache::mergeContexts($elements['#cache']['contexts'], $required_cache_contexts);
      }
      else {
        $elements['#cache']['contexts'] = $required_cache_contexts;
      }
    }

Now, there's a good reason we have these required cache contexts because if we relied on people properly adding them, we'd have security reports all across contrib and perhaps even core.

But here's the thing, if we want to redo how cacheability is added to a response, we need to be careful that we don't poison these small unchanging blocks even more. Adding more (global) cache contexts to any render array in a response would mean that our block that in theory only varies by day of the year in practice varies by interface language, permissions, theme, day of the year and whatever cache context you added on a higher level.

TL;DR: It's really important that whatever solution we come up with does not get tacked onto render arrays, but rather that the total cacheability from renderRoot() makes it into a new service that at the end of a response puts all bubbled/gathered cacheability on the response object instead.

berdir’s picture

I'm quite confused about the discussion in this issue and how it's related to entity queries.

The issue title is currently "Enable an entity query's *return value* to carry cacheability".

Which as being discussed here is tricky, it's not even always an array, it can be a count query then it's just a number, or the aggregate version has it's own return structure.

However, I think we could consider to just make the entity query object itself implement CacheableDependencyInterface, then you can use that and add it to your cacheability metadata. There are still some tricky things, like how the inner SQL query object and alter hooks then attach their stuff there, but I think we can find solutions for that.

Whether or not that metadata is then automatically/force-added to the global render context is IMHO a different question and should be its own issue. Doing that is in general just a workaround and hack if we don't have a way to pass things through properly.

kristiaanvandeneynde’s picture

I'm quite confused about the discussion in this issue and how it's related to entity queries. The issue title is currently "Enable an entity query's *return value* to carry cacheability".

Adding cacheability to an array of IDs (the return value) seems hard to pull off without breaking BC. Then there's the question of what to do with said cacheability as, currently, nothing would consume it. Finally, I'm not sure what the benefit would be of a cacheable return value as we'd somehow have to reliably convert the entire entity query into a cache ID with contexts (because the return value may change based on several factors).

Would it not be better to not cache entity queries and instead cache the thing you used them for? Be that a render array, an access check or whatever.

There are still some tricky things, like how the inner SQL query object and alter hooks then attach their stuff there, but I think we can find solutions for that.

This might be really hard to pull off, I'm genuinely curious as to how you'd approach this.

Whether or not that metadata is then automatically/force-added to the global render context is IMHO a different question and should be its own issue. Doing that is in general just a workaround and hack if we don't have a way to pass things through properly.

Fully agree, from the early comments on the discussion here went in that direction and I must admit I also simply rolled with it :)

Maybe we should ping Wim and ask what the goal of this issue was? It's a bit muddy to me to say the least.

berdir’s picture

> Adding cacheability to an array of IDs (the return value) seems hard to pull off without breaking BC.

Yes, which is why I'm saying we should use the whole query (builder) object for that purpose, then you can do CacheableMetadata::fromArray($my_build_array)->merge($query);. You can't 100% chain-call it then so you still have access to the the object, but that should work.

> Then there's the question of what to do with said cacheability as, currently, nothing would consume it.

See above, consuming it is the responsibility of the specific implementations that run an entity query and then put the results in a render array or something as long as we don't force-bubble it, which imho we shouldn't.

FWIW, there are actually very few cases that qualify for that in core, as most listings are views or methods/functions that get things from something else than an entity query. Or the horrible node_title_list() that should have been removed a long, long time ago ;)

> This might be really hard to pull off, I'm genuinely curious as to how you'd approach this.

Maybe not that hard, query alters have contexts, see what I'm doing with https://www.drupal.org/project/drupal/issues/3188914 for example. If we pass in the query builder object down as context, alter hooks can add cache contexts or tags to it.

pcambra’s picture

Just adding this comment for discoverability, when you add accessCheck(TRUE) to an entity query, as per #2287071: Add cacheability metadata to access checks, there's cache metadata and that can cause the dreaded LogicException: The controller result claims to be providing relevant cache metadata, but leaked metadata was detected error.

berdir’s picture

Yes, I suppose that's kind of expected, what you need to do is do your thing in a render context and then collect the cacheability metadata from that. It's an unfortunate workaround due to this issue not being implemented and then you need to handle that when not already working in a render context.

The idea here is that we could avoid that by attaching and collecting that on the $query object and then you still need to do the same thing, but in a nicer way.

godotislate’s picture

#3516034: Add cacheable metadata to SelectInterface and entity QueryInterface objects is possibly a duplicate, but there's an MR up there to add cacheable metadata to SelectInterface and entity QueryInterface objects.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.