Problem/Motivation
The node module has very old code to add a cache context for the current user's node access grants inside a query alter.
This was added in #2557815: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter()
The goal is obviously to not cache and re-use a query result for a user with different node access grants to avoid information disclosures.
However, the way it's accomplished is using the renderer to render an array with just cache metadata based on wether the current request represents a cachable page. Per @catch, this was "a kludge 10 years ago (along with one or two others) to unblock dynamic_page_cache and big_pipe".
A further reflection of the problem is that for REST plugins you have to execute all your node, etc. queries in a render context so that your REST response data has the correct cache metadata.
Proposed resolution
In core there should be a mechanism to add cache metadata (e.g. cache contexts or tags) to Select and Entity queries and core should handle adding that cache metadata as part of the cache context for the current response if needed.
Possibly this should take the form of adding a callback instead of the actual data since in the current node module implementation we avoid calculating the user's node access grants if the request is not going to be cached.
This is a little tricky since we don't know what happens to the data once the query is executed, so possibly you'd have to handle this in the execute() method. That also seems like a kludge.
The solution in MR 12535 is to have \Drupal\Core\Database\Query\Select,\Drupal\Core\Database\Query\SelectExtender, and \Drupal\Core\Entity\Query\QueryBase implement \Drupal\Core\Cache\RefinableCacheableDependencyInterface and use \Drupal\Core\Cache\RefinableCacheableDependencyTrait. Objects of these types can then addCacheableDependency(), and get contexts, tags, and max-age. The query objects can also be merged as a cacheable dependency to any other cacheable dependency object.
Remaining tasks
Decide on the architecture including how this data is added to a query and where core is going to collect the data
Define on a mechanism other than calling the renderer to add cache data. This need to handle
API changes
\Drupal\Core\Database\Query\Select,\Drupal\Core\Database\Query\SelectExtender, and \Drupal\Core\Entity\Query\QueryBase will implement \Drupal\Core\Cache\RefinableCacheableDependencyInterface and use \Drupal\Core\Cache\RefinableCacheableDependencyTrait
| Comment | File | Size | Author |
|---|
Issue fork drupal-3516034
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
quietone commentedChanges are made on on 11.x (our main development branch) first, and are then back ported as needed according to the Core change policies.
Comment #3
catchComment #4
catchIf we add a cache context to a query, then only variation cache / render cache will actually cause that context to be calculated (once it's bubbled up to something that going to be written to cache), until that point, it's just a string in an array like
['contexts' => 'user'], so we should be fine to add contexts and tags directly and not need a callback system.Comment #5
catchI think we need something like ::addCacheableMetadata() and ::getCacheableMetadata() on the query objects (entity and database), and then the code that executes the query is responsible for getting the cacheablemetadata and merging it into the render array.
Comment #6
godotislateDoes it make sense for
\Drupal\Core\Database\Query\Select(and maybe\Drupal\Core\Database\Query\SelectExtender) and\Drupal\Core\Entity\Query\QueryBaseto implement\Drupal\Core\Cache\RefinableCacheableDependencyInterfaceand use\Drupal\Core\Cache\RefinableCacheableDependencyTrait? That providesaddCacheableDependency(). There's nogetCacheableMetadata(), but there are getter methods for context, tags, and max-age, and you can also then merge the query object as a cacheable dependency to any other cacheable dependency object.Comment #7
catchThat sounds promising.
Comment #9
godotislatePut up draft MR 12535 implementing #6. Added a kernel test which essentially tests what
Drupal\Tests\node\Functional\NodeAccessCacheabilityTest::testNodeAccessCacheabilitySafeguard()does, but without rendering. There probably should be additional test coverage for more generic usage.A couple questions came to mind:
Drupal\Core\Database\Connection\SelectInterfaceandDrupal\Core\Entity\Query\QueryInterfaceextendRefinableCacheableDependencyInterface? It would eliminate the need to doinstanceofchecks, but there are BC considerationsEntityStorageInterface::loadByProperties()use entity queries under the hood. Should there be an optional second parameter onloadByProperties()to capture cacheable metadata?Comment #10
godotislateThen there's the EntityRepository method
loadEntityByUuidmethod that calls EntityStorage::loadByProperties, and possible other rabbit holes, so I don't know if it makes sense to add an additional parameter everywhere, at least not in scope for this issue.Comment #11
catch#9.1 does the
Selectclass count as a base class here? It doesn't have theBasekeyword at the end but that's probably because it was originally added in c.2008. If it does, then I think we can add the interface to SelectInterface directly because Select will then implement the methods for any subclasses.#9.2 based on #10 that sounds like good material for a follow-up but we should open one :)
Comment #12
godotislateYes, it looks like
Selectis essentially a base class.SelectExtenderalso implementsSelectInterface, but it is essentially a base class as well. And for entity queries,QueryBaseis the base class forQueryInterface. Updated the MR to have the respective interfaces extendRefinableCacheableDependencyInterfaceand added a CR https://www.drupal.org/project/drupal/issues/3533260.The SQL entity query class merges the cache metadata from the underlying Select object, so this works for content entities. Config entity queries or other entity types using key value storage don't capture any cacheable metadata from SQL key value queries. Doing so would add complexity to key value and this seems like an edge case.
Follow up per #11: #3533260: [PP-1] Make it possible to capture cacheable metadata from underlying entity queries in EntityStorageInterface::loadByProperties().
Probably still needs some more test, so leaving in NW and the Needs tests tag.
Comment #13
godotislateOK, added more general tests. I also removed the NodeAccessCacheability kernel test I'd added before, because I think it's duplicating other tests mostly at this point, and instead added asserts in the functional test to test that the query tags get added to the render array when the query is not running in a render context.
Additionally, I found dead/broken test code #2878483: loadEntityByUuid() should skip access checks in query alter hook in EntityTestHooks.php, and I removed all that and reused the hook for tests needed here.
One last thing:
It occurred to me that maybe entity query objects should include the corresponding entity type list cache tags and entity type list cache contexts. However, when I tried that the list cache contexts for the Node entity type are
['user.node_grants:view'], which would kinda defeat the point of all the logic in the query alter hook only to add those contexts conditionally.Separately I was also wondering if the page cache contexts should be added to
PagerSelectExtender, but I haven't thought that through completely.I think the MR is ready for review regardless.
Comment #14
godotislateUpdated IS for solution in MR 12535.
Comment #15
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #16
godotislateRebased to resolve merge conflict.
Comment #17
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #18
godotislateRebased.
Comment #19
godotislateHad a thought today about how to get the query cache metadata into the response without the early render:
Doing this would make sure that the cache metadata is applied to the response correctly, but it would not bubble through the render array, so there are pros and cons to this.
Comment #20
catchI think we'd need to be able to retrieve it from the query to then apply it to a render array or similar - because the cache metadata needs to apply to render cache items at a lower level than the full response.
Still need to review the current MR here.
Comment #21
kristiaanvandeneyndeI like #19, but we indeed can't wait until the very top level to add that cacheable metadata back in like @catch said in #20.
What if we add a service like in #19, but call and flush it whenever we're about to attempt to write to a cache in Renderer? This means that we would be able to add the metadata to the very render element that triggered the query in the first place. It could then bubble up or not just like it would for any normal render scenario.
E.g.: We have a page with a region with a block with a view with some nodes. The view triggers a query, so when we start rendering that the service knows the current cache key is the view's and when Renderer gets to trying to store the view with said cache key, our service returns metadata which we add to the render array.
It then gets written to the cache (or not) and bubbles up (or not) based on normal Renderer flow. Only thing to look out for is that we do not poison this system by adding cacheable metadata to db cache lookup queries as that could get really ugly.
Comment #22
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #23
godotislateI think there might be an issue with this approach, since queries often happen during the render array build and before the actual render.
Take an example of using an entity query during the build to create a list of entity teasers:
So the render cache entry for the first rendered entity might then have cache metadata that belongs only to the list.
There could also be more complications if there are multiple queries executed during the build step, each contributing cache metadata.
Comment #24
kristiaanvandeneyndeSorry, you're right. I was living in a perfect world where (almost) everything is a placeholder or pre_render callback. Something I'd like to see one day :)
But that could still work, though. If we know we're building something renderable, we could call the service in advance with the cache keys that we are setting on the very thing we're building. And then everything I said in #21 could still happen. I.e.: We'd be able to know exactly which metadata came from which render array and attach it back when going through the render process or even on the render array itself.
Something like:
Very rough idea, but I think there are things we can do here to get the "collateral cacheable metadata" from queries being fired with or without our knowledge and add it to the right render array.
Comment #25
catchThe other issue with #21 is fibers.
If we start rendering one render array, then start rendering a different render array in a different fiber before the first has finished, then cache the first render array with all the metadata added in between, it will include some for the second render array. Then when we cache the second render array, some would be missing.
Comment #26
godotislateMy idea behind #19 was to make it possible to remove the weird
render()call inDrupal\node\Hook\NodeDatabaseHooks::queryNodeAccessAlter(), which is needed since otherwise cache considerations in node_access queries that occur don't get captured at all. For queries that are executed outside the rendering of arrays, it's the only the render context provided by EarlyRenderingControllerWrapperSubscriber that captures the metadata, so using the proposed service instead would work somewhat similarly. This would be a stopgap until all usages of entity queries (or loadByProperties()) have their cache metadata properly applied to render elements.Though as mentioned, removing the render() in the query alter hook would mean that node_access queries executed while rendering a render element would not be captured to that element's cache metadata, so I agree that the idea doesn't really work.
Comment #27
godotislateRebased the MR.
Also updated the issue title, since we're likely not removing that render() call here after all.
Comment #28
catchCode here looks good to me - didn't do a very in-depth review yet.
I think to remove the render() call we'd need to do something like:
1. Use the API everywhere applicable in core.
2. A CR to say that transferring metadata from queries to render arrays will be required (probably in 13.x)
3. Probably remove the actual ::render() call only once we've done some related issues like the ones linked from #3507959: Refactor the render context stack to avoid a static property.
Comment #29
godotislateAdding #3028976: Enable an entity query's return value to carry cacheability a related issue because it's a possibly a dupe.
Comment #31
smustgrave commentedAll feedback appears to be addressed, any objections to moving up?
Comment #32
godotislateShould be ready, and @catch has already looked at it a little per #28.
Comment #33
smustgrave commentedOkay going to lean on that.
Comment #34
mxr576Moving back to needs work until my previous comment is answered or resolved.
PS.: I so glad that this issue is moving forward \o/
Comment #35
godotislateRefactored the test and addressed the comment.
Comment #36
godotislate@mxr576 Do the latest changes address your concerns?
Comment #37
smustgrave commentedWanted to bump 1 more time if @ mxr576 you could take a look at the feedback.
Comment #38
godotislateActually, since #3390193: Add a drupalGet() method to KernelTestBase got in, I'm in the middle of trying to refactor to kernel tests. Moving back to NW.
Comment #39
godotislateOK, refactoring to kernel tests is done.
I found one bug with kernel test alter hook implementations, which I opened #3585498: Alter hook implementations in Kernel test classes are not executed for. Technically that blocks this now, but I added that fix here so the tests would pass. In the meantime it'd be great to get this looked at again.
Comment #40
godotislateComment #41
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #42
godotislateRebased.
Comment #43
godotislateReplying to @berdir's MR comment here for more visibility:
This is a good question that I don't have a solid answer for, but here's one idea
Somewhere in query execution, add the cache metadata to the current render context, if it exists
preventCacheBubble()on SelectInterface and entity QueryInterface objects, and trigger a deprecation warning that it needs to be called before executing the query, with the warning message being that early render is going away and that handling the cache metadata by applying it to the render element.preventCacheBubble()would also be removed (or maybe it would be deprecated and do nothing until the next major after that?)One issue is that the DX would be pretty bad: having to through everywhere there's a select query or entity query and adding a method call, when especially for select queries, there probably are a lot that don't affect caching at all.
This is a good idea, and I had a thought that we wouldn't need to do this conditionally. We could just do something like in
Select:::execute():This blew up a bunch of tests, though, especially because there are times where queries are executed before there is a container.
The last idea I had was to use property hooks for the tags, contexts, and max-age properties, and set a boolean property to true whenever one of those properties is set. This would limit us to PHP 8.5 and 12.0+ though, on top of not being able to use the trait anymore because of conflict in the property definitions.
Comment #44
berdir> This blew up a bunch of tests, though, especially because there are times where queries are executed before there is a container.
We could add a hasContainer(), not nice, but for a BC layer maybe acceptable. FWIW, the whole thing ties the database layer, which at least at some point only had limited dependencies on the rest of Drupal closer to other components. A slightly less hard coupling might be doing it through extra arguments and not directly on the query object. Haven't thought this through.
And yes, if we want to ensure that this is correctly used we'll have to look at lots and lots of queries. At least for direct database queries, we only need to do it for queries with alter hooks I think. entity queries have those by default. Maybe we can also limit to currently having a render context, as other cases wouldn't have worked now anyway? That should exclude most of test code and stuff like that.
Comment #45
godotislateNew thought: if we can land #3481903: Support hooks (Hook attribute) in any registered service, then I can add a query alter hook in the early rendering subscriber itself. The hook would add the cache metadata from the query to the render context, and I think I can introduce a mechanism to see whether there was any cache metadata set on the query object. (I'll also have to familiarize myself with the hook ordering API to see if can get that hook to run last.)
That other issue has had some blockers that are hard to resolve, but I think I might have an idea on a way forward. Doing it this way would mean I don't think either can get in for 11.4 at this point, but it is what it is.
Comment #46
godotislateThink this should go back into the shop for a bit for #43-45.