Problem/Motivation
The render system has \Drupal\Core\Render\RenderCache(Interface), which is capable of doing cache redirects based on bubbled cache contexts. I.e. compare the originally known cache contexts with those that the data to be cached ultimately depends upon, and if that set is different, cache it as a cache redirect. (See RenderCache's docs for details.)
This is useful outside of the render system too.
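For readers unfamiliar with the mechanism, here is a minimal, self-contained PHP sketch of the idea. All names are hypothetical (not Drupal's actual API) and a plain array stands in for the cache backend: when the bubbled cache contexts differ from the initially known ones, a redirect entry is stored under the initial cache ID, pointing at the full context set.

```php
<?php
// Hypothetical sketch of a redirecting cache; none of these names are
// Drupal's actual API. The "backend" is a plain array.

// Context values for the current "request".
$contexts = ['theme' => 'olivero', 'user.roles' => 'authenticated'];

// Build a cache ID from keys plus the values of the given contexts.
function build_cid(array $keys, array $context_names, array $contexts): string {
  sort($context_names);
  $parts = $keys;
  foreach ($context_names as $name) {
    $parts[] = "[$name]=" . $contexts[$name];
  }
  return implode(':', $parts);
}

// set(): if the bubbled contexts differ from the initially known ones,
// store a redirect under the initial CID pointing at the final set.
function redirecting_set(array &$cache, array $keys, array $initial, array $bubbled, $data, array $contexts): void {
  sort($initial);
  sort($bubbled);
  if ($bubbled !== $initial) {
    $cache[build_cid($keys, $initial, $contexts)] = ['redirect' => $bubbled];
  }
  $cache[build_cid($keys, $bubbled, $contexts)] = ['data' => $data];
}

// get(): follow redirects until an actual entry (or a miss) is found.
function redirecting_get(array $cache, array $keys, array $initial, array $contexts) {
  $cid = build_cid($keys, $initial, $contexts);
  while (isset($cache[$cid]['redirect'])) {
    $cid = build_cid($keys, $cache[$cid]['redirect'], $contexts);
  }
  return $cache[$cid]['data'] ?? NULL;
}

$cache = [];
// The caller only knows about 'theme'; rendering bubbled up 'user.roles'.
redirecting_set($cache, ['my_block'], ['theme'], ['theme', 'user.roles'], 'rendered output', $contexts);
// A later get that only knows the initial contexts still finds the entry.
var_dump(redirecting_get($cache, ['my_block'], ['theme'], $contexts));
```

The point of the issue is that nothing in this logic is render-specific: the same keys-plus-contexts lookup works for any cacheable data.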
Proposed resolution
Abstract RenderCache into a service (name TBD; initial quick thought: RedirectingCache) that RenderCache can use, so that RenderCache is just one of many things able to use this generic concept.
Remaining tasks
TBD
User interface changes
None.
API changes
None, pure API addition.
Data model changes
None.
Release note snippet
A new VariationCache API has been added. Users of the Variation cache contributed module should review whether it is still required.
Comments
Comment #2
Berdir: #2479459: BlockRepository::getVisibleBlocksPerRegion() does an uncached entity query on every request would be a possible use case for this.
Comment #3
Wim Leers: #2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!) is the reason this issue was created; that's another use case.
Comment #4
dawehner: So maybe something like this?
Comment #5
joelpittet: Bumping to 8.1.x due to the feature request nature. Triaging majors.
Comment #10
Wim Leers: #4 looks like a good start. I think it'd be great to refactor \Drupal\Core\Render\RenderCache to use whatever new class we introduce, along with:
+ DynamicPageCacheSubscriber::responseToRenderArray()
+ DynamicPageCacheSubscriber::renderArrayToResponse()
+ DynamicPageCacheSubscriber::$dynamicPageCacheRedirectRenderArray
That'd prove the benefit.
Comment #11
dawehner: I agree, this is by no means major.
Comment #13
amateescu (for Pfizer, Inc.): The Workspace module would also benefit from this if we'd be able to make the entity storage derive its persistent cache keys from the cache contexts of the request.
Just for reference, this was discussed in #2784921-164: Add Workspaces experimental module, point 12.
Comment #16
kristiaanvandeneynde: Going to take this PoC and clean it up a little to add to Group. If it works well there, I'll let you know.
My current plan is to create a ContextCacheFactory service which has a get method that takes the name of a cache backend service. That would then return a polished version of ContextCache from Daniel's proof of concept, wrapping said cache backend so it works with cache contexts and redirects.
Or does anyone have a better suggestion?
Comment #17
Wim Leers: First: THANK YOU! 🙏
I think ContextCache is a confusing name. We already have "cache contexts" and "Context API", and probably even more uses of "contexts". I think we need a better name, but I'm not sure what that is yet. Thinking out loud:
VariantCache
VariationCache
AutoVaryingCache
BubbleawareCache
Comment #18
kristiaanvandeneynde: Okay, so naming things is always hard :) Will start off with VariationCache or CacheContextAwareCache, as I recall a core test recently being renamed from BubblingSomething to CacheContextSomething. The factory I had in mind would be declared as such:
Then you could "create" a cache context aware version of any cache backend like so:
VariationCacheFactory::get($bin) would simply invoke CacheFactory::get($bin) and wrap its return value in a VariationCache object.
Please note: for this to work nicely with the recently added entity.memory_cache service, it will need to be properly declared as a cache backend. See how I handled this for now in Group:
But there are other issues that (sort of) handle that:
Shall I create a dedicated issue that handles the above?
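Roughly, the factory idea described above could be sketched like this. This is a hypothetical, self-contained stand-in: the interfaces are simplified placeholders, not Drupal's real CacheFactoryInterface or CacheBackendInterface.

```php
<?php
// Sketch of the proposed factory; names and interfaces are simplified
// stand-ins, not Drupal's actual classes.

interface SimpleCacheBackend {
  public function get(string $cid);
  public function set(string $cid, $data): void;
}

// Trivial array-based backend standing in for a real cache bin.
class MemoryBackend implements SimpleCacheBackend {
  private array $items = [];
  public function get(string $cid) { return $this->items[$cid] ?? NULL; }
  public function set(string $cid, $data): void { $this->items[$cid] = $data; }
}

// The wrapper that would add cache-context/redirect awareness.
class VariationCache {
  public function __construct(private SimpleCacheBackend $backend) {}
  // A real implementation would resolve contexts and follow redirects here.
  public function getBackend(): SimpleCacheBackend { return $this->backend; }
}

// The factory: resolve the bin via the decorated factory, wrap the result.
class VariationCacheFactory {
  private array $bins = [];
  public function __construct(private \Closure $cacheFactory) {}
  public function get(string $bin): VariationCache {
    // Each bin gets wrapped once; subsequent calls reuse the wrapper.
    return $this->bins[$bin] ??= new VariationCache(($this->cacheFactory)($bin));
  }
}

$factory = new VariationCacheFactory(fn (string $bin) => new MemoryBackend());
$render = $factory->get('render');
assert($factory->get('render') === $render); // Same wrapper per bin.
```

The design choice worth noting is decoration: the variation cache does not replace any backend, it wraps whichever backend the inner factory resolves, so any bin (database, memory, Redis) can become context-aware.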
Comment #19
catch: Just reflecting that there's some code here; I keep thinking this is a duplicate of the issue with a patch on it.
Comment #20
kristiaanvandeneynde: Here's the relevant part of the huge patch found here: #3041087: Update the new permission layer to be alterable.
Ignoring the obvious fact that the classes are namespaced for Group right now, it should give you a general idea of the approach I've taken. Oh and the best part is: It actually seems to be working :)
Comment #21
kristiaanvandeneynde: If you're looking for a concrete example, scour the massive patch in the other issue for the ChainGroupPermissionCalculator class.
Comment #22
kristiaanvandeneynde: Let's see what this does while I'm away for lunch. It should largely work, but I need to investigate further whether I got all bases covered, and the following line in ::set() needs love, as the old CID comparison didn't take cache tags into account:
Comment #23
kristiaanvandeneynde: Forgot to actually inject the new cache factory.
Comment #24
kristiaanvandeneynde: For working on such a complex issue, I sure am making a lot of rookie mistakes here...
Comment #25
kristiaanvandeneynde: Oh, for the love of...
Sorry for all the noise.
Comment #27
kristiaanvandeneynde: 518 test failures :o
Well, at least we have something to work with now :)
Comment #28
tstoeckler: Really nice work. The patch itself looks pretty good. Didn't play around with it yet, though.
Noticed one thing:
It seems the callers of this method do not account for this case.
Comment #29
tstoeckler: Wow, I guess today is understatement day. What I meant to say instead: really impressive (!) and surprisingly elegant for such a complex topic.
Sorry for the glibness...
Comment #30
kristiaanvandeneynde: Re #28: Which return value would that be? It's not entirely obvious from the context.
Comment #31
kristiaanvandeneynde: This should fix some tests. I also type-hinted the cache keys and cleaned up the cache key comparison logic to do exactly the same as before through a public method on the variation cache.
Comment #33
tstoeckler: Re #30: I was talking about VariationCache::createCacheId(). Here it is with some more context:
Comment #34
kristiaanvandeneynde: I've refactored parts of RenderCache to account for the changed approach to how it should call the cache. It was using the same cache keys for both the redirect and the actual cache item.
Also made sure the test case calls the cache correctly and no longer tries to test a method that was removed, plus some other minor test fixes. Hopefully fewer tests will fail now.
Comment #35
kristiaanvandeneynde: Re #33: ah, you're right. That was for a short-circuit at the top of RenderCache::get() and ::set(). It seems to be unused now, but maybe it served a purpose: if we can't create a cache ID, we shouldn't try to get or set the cache entry.
I should update VariationCache to check whether the cache ID is not FALSE so the behavior remains unchanged. The only difference is that this won't short-circuit RenderCache as early as it used to, because it now first needs to call the cache.
Comment #37
kristiaanvandeneynde: Over 100 failures fewer 🎉
Comment #38
kristiaanvandeneynde: @tstoeckler Looking at the test failures, it seems what you were on about in #28 is causing a lot of them. So we definitely need to refactor that part too, but I'm out of time this week.
Attached is a patch that should fix some more misuse of the new cache.
Comment #40
kristiaanvandeneynde: Fixed a few more test cases and fixed the render cache calling the backend when it didn't use to.
The VariationCache::createCacheId() method now always expects valid keys and no longer returns FALSE as that was behavior unique to the RenderCache (keys might be missing) and cache backends should be able to rely on the fact that they're being called correctly.
Introduced RenderCache::isElementCacheable() to reintroduce the short-circuiting behavior to RenderCache.
Comment #42
kristiaanvandeneynde: Added some more test fixes and added ::delete() and ::validate() to the variation cache, while also changing ::set() to only set a redirect if need be.
It does seem like an awful lot of core tests call the render cache bin directly rather than going through RenderCache, making this a tough nut to crack.
Comment #44
kristiaanvandeneynde: Well, if we're going to make silly typos, of course tests are gonna fail. Let's see what happens next.
Comment #46
kristiaanvandeneynde: Wait, so I reran the tests without the change from the interdiff? What a nice way to spend my time.
Comment #48
kristiaanvandeneynde: Found another culprit! We used to store the cache redirect by the pre-bubbling CID, and now I simply passed the keys and the cacheability of the post-bubbling element, effectively caching it under the post-bubbling CID.
I realized we can actually store the redirect indefinitely, so I did exactly that but did ensure we use the pre-bubbling contexts (and thus CID).
Comment #50
kristiaanvandeneynde: Hmm, only 5 fixes :/ Back to the drawing board, I guess.
Comment #51
kristiaanvandeneynde: So the things we were caching were influenced by the cache context folding, which is no longer the case. To fix that, I made sure the potentially updated cache tags and max-age from the actual variation cache get merged back into the element's #cache data when retrieving a render array from the cache.
Comment #53
kristiaanvandeneynde: 🎉🎉🎉
I still have some ideas as to what might be wrong with the last few tests, but this is some amazing progress already.
Comment #54
kristiaanvandeneynde: Fixed a regression regarding the 'rendered' cache tag and a bug where I would only update the cache keys for the redirect if the redirect got updated.
Comment #56
kristiaanvandeneynde: Made sure the cache tags we retrieve from the cache have proper incremental indexes after filtering out the 'rendered' tag, as some tests were doing exact matches. You could argue that PluginTest should have used assertEquals() instead of assertEqual(), as I believe the former ignores numeric indexes and would thus not have caused a test failure.
Also set all of the combined cache tags and max-age on the cache redirect like the old system used to. I first figured we could cache the redirect indefinitely, but it seems the tests disagree. I can think of one edge case where this would make sense, so trying things out with the old logic.
Comment #58
kristiaanvandeneynde: I think I know what broke TrackerTest (and potentially others). RenderCache::set() used to update the cache tags of $elements because it folded the cache contexts and received $elements by reference. The new version doesn't, but properly folds the cache contexts only when setting the cache item.
So the X-Drupal-Cache-Tags header no longer contains cache tags that were only added after cache contexts were folded. The X-Drupal-Cache-Contexts header remains unaffected.
I don't know about you, but I find it rather dirty that a cache's set method affects the thing it's caching for the rest of the runtime. That said, the real culprit here is the whole #cache property needing to be a live copy of the cache metadata, even though the actual cache should be responsible for anything cache-related.
I'm thinking of not allowing ::set() to affect the object it's caching, but to add the cache tags it finds during cache context folding directly to the response. That should allow the tests to pass, without needing every variation cache invocation to account for what might happen when they call ::set().
It should also fix most remaining test fails. The only thing it won't do is affect the #cache metadata during the current runtime. I don't know how much code depends on that being updated as you go, but I'm hoping none as it's a rather nasty behavior to expect in the first place.
Edit: The question then becomes whether we should also add the tags from ::get() to the response rather than back to the render array. Will investigate that last (after fixing ::set()).
Comment #59
kristiaanvandeneynde: I've removed the render cache (ab)use from DynamicPageCacheSubscriber to see what that does. I've also fixed a test case.
The idea here is that we should not set the cache's optimized tags in RenderCache during ::get() or ::set(). So if this subscriber thingy works out well, the last move would be to update FinishResponseSubscriber to consistently set X-Drupal-Cache-Tags and X-Drupal-Cache-Contexts.
I say consistently because right now it shows you the contexts after optimization, but the cache tags before optimization, unless the cache tags came from optimization in RenderCache and bubbled up to the response. That makes no sense at all, because anything coming from other places, such as route access checks, is not properly represented.
So my next patch will aim to set X-Drupal-Cache-Tags to the response's tags plus the ones that come out of cache context optimization and the X-Drupal-Cache-Contexts to the optimized contexts (remains unchanged).
Setting the cache's tags on the render array during ::get() did fix 285 test failures but I'm hoping the above will do the same.
Comment #60
kristiaanvandeneynde
Comment #62
kristiaanvandeneynde: This should remove the clutter from the changes in the last patch.
Comment #64
kristiaanvandeneynde: This should fix the other failures, although we might need to revisit the tests for EntityResourceBase (jsonapi) in a follow-up, because its logic wasn't updated to only account for one cache entry, unlike rest's EntityResourceBase.
Comment #66
tstoeckler: Great work here! I had totally forgotten that Dynamic Page Cache currently (ab)uses the render cache redirect system; that alone makes this issue such a great cleanup, regardless of further possibilities. The subscriber looks so much more readable with the patch, really nice.
Comment #67
kristiaanvandeneynde: This line used to cause redirects to be stored, because the cache contexts would almost always differ. We don't have that now, so the variation cache tries to find stuff varying by the contexts of the last response that was saved.
So it used to benefit not only from RenderCache's cache context resolution, but also from its redirect resolution, meaning everything stored under the 'response' cache key would eventually vary by the maximum set of cache contexts that ever bubbled up to the response.
So because the response doesn't always vary by the same contexts, we need a new type of caching wrapper that also handles redirects the way RenderCache does (and then feed RenderCache that cache wrapper).
Comment #68
kristiaanvandeneynde: I've removed the self-healing strategy from RenderCache and put it in VariationCache, as other systems relied on it as well (see #67). This makes RenderCache a really clean wrapper around the variation cache.
I expect this to break a lot more things and there are some todos here:
Edit: I also need to remove sharesCacheKey(), as it no longer serves a purpose.
Comment #70
kristiaanvandeneynde: Wow, that went far better than expected. These small fixes should reduce the failure noise even more.
Comment #72
kristiaanvandeneynde: Some weird errors here:
I did notice DynamicPageCacheIntegrationTest is failing because of the self-healing being moved into the variation cache. It used to HIT because RenderCache didn't need a first redirect to find the cache ID; i.e. it would construct it itself, because render arrays know more about their cacheability than they should during cache gets. Now the self-healing has moved to an earlier phase.
Old scenario:
New scenario:
So, as you can see, the only reason this worked (and was more efficient) under RenderCache was that we would tell RenderCache what contexts to use during ::get().
In order to maintain this more efficient approach of only storing redirects for those variations that deviate from the original, we would have to rewrite VariationCache to always ask for the "original cacheability" during ::get() and ::set(), which would make the solution less elegant.
So it's either:
I'll try the second option to see how many remaining test failures are solved that way, but it might be less elegant :(
Comment #73
kristiaanvandeneynde: Come to think of it, RenderCache was only more efficient on the first tier of caching.
Suppose you have a render array that outputs your "housing situation".
Output would be one of the following scenarios:
Any other possible combination still exists, but just wouldn't affect the output.
In the current variation cache, we'd eventually see the following reachable cache entries:
Not really efficient, right? There would also be some cache entries that can no longer be reached (e.g. keys:apartment) depending on which entries were stored first.
In the old RenderCache, we'd eventually see the following cache entries:
As you can see, any house without a garden is still stored inefficiently and would also possibly have some "dead" cache entries.
So what we're trying to reach is this:
This is actually possible if we store the minimal shared contexts between redirects at every step of the way rather than the total sum of contexts up front.
So in the above scenario we'd end up with the following cache redirects:
Meaning that the retrieval of an apartment only needs to follow one redirect, a house without a garden needs to follow two and a house with a garden three. This could prove to be a performance hit. The upside is that your cache is now far more efficient and will contain no dead entries whatsoever!
Any thoughts on this?
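To make the chained-redirect idea above concrete, here's a self-contained PHP toy model. All names are made up (this is not the patch's code), a plain array is the backend, and each redirect only adds the contexts that the next level varies by:

```php
<?php
// Toy model of the chained-redirect idea from #73. All names are made up;
// the "backend" is a plain array mapping cache IDs to entries.

function toy_cid(array $keys, array $context_names, array $values): string {
  sort($context_names);
  $parts = $keys;
  foreach ($context_names as $name) {
    $parts[] = "[$name]=" . $values[$name];
  }
  return implode(':', $parts);
}

// get(): follow redirects, each of which only *adds* contexts to the set
// used so far, counting the lookups we had to perform.
function toy_get(array $cache, array $keys, array $values): array {
  $context_names = [];
  $lookups = 0;
  while (TRUE) {
    $lookups++;
    $entry = $cache[toy_cid($keys, $context_names, $values)] ?? NULL;
    if (is_array($entry) && isset($entry['redirect'])) {
      $context_names = array_merge($context_names, $entry['redirect']);
      continue;
    }
    return [$entry['data'] ?? NULL, $lookups];
  }
}

// Pre-populated cache for the housing example: apartments vary only by
// housing type, houses additionally by garden, gardens by garden type.
$cache = [
  'housing' => ['redirect' => ['housing.type']],
  'housing:[housing.type]=apartment' => ['data' => 'apartment'],
  'housing:[housing.type]=house' => ['redirect' => ['garden.has']],
  'housing:[garden.has]=0:[housing.type]=house' => ['data' => 'house'],
  'housing:[garden.has]=1:[housing.type]=house' => ['redirect' => ['garden.type']],
  'housing:[garden.has]=1:[garden.type]=perennial:[housing.type]=house' => ['data' => 'house with perennial garden'],
];

// An apartment follows one redirect (2 lookups); a house with a garden
// follows three redirects (4 lookups).
[$data, $lookups] = toy_get($cache, ['housing'], ['housing.type' => 'apartment']);
assert($data === 'apartment' && $lookups === 2);
[$data, $lookups] = toy_get($cache, ['housing'], ['housing.type' => 'house', 'garden.has' => '1', 'garden.type' => 'perennial']);
assert($data === 'house with perennial garden' && $lookups === 4);
```

Note how no entry is ever unreachable: every stored variation sits at the end of a redirect chain, at the cost of one extra lookup per chain level.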
Comment #74
kristiaanvandeneynde: This should fix the DynamicPageCacheIntegrationTest failure mentioned in #72. It also takes a new approach to variation caching, as outlined in #73, by storing several redirects for complex cache entries.
If everything works out great, we're down a test failure. If this breaks something I did not foresee (ResourceTestBase jumps to mind), we'll see how badly it breaks :o
I probably still need to further dumb down RenderCache and expand FinishResponseSubscriber as mentioned in #68.
Comment #76
kristiaanvandeneynde: Because getting it right the first time is too much to ask for :D
Comment #78
kristiaanvandeneynde: This adjusts FinishResponseSubscriber, which should fix RendererBubblingTest and TrackerTest, but will probably break other stuff. I also already rewrote the self-healing tests in preparation for removing them entirely in favor of VariationCache tests (which will also check for self-healing).
Comment #80
kristiaanvandeneynde: Great, so they got fixed!
The new REST/JSONAPI fails are because the user.permissions cache context is being optimized away and X-Drupal-Cache-Tags and X-Drupal-Cache-Contexts now properly reflect that. So those tests need updating.
This leaves only a few failures, which I'm hoping to fix by moving the FinishResponseSubscriber rework to DynamicPageCacheSubscriber so it happens earlier and is available to all subscribers that should have the updated contexts/tags.
Comment #81
kristiaanvandeneynde: This should fix a few more tests (and again might break thousands). I'm expecting the rest/jsonapi stuff from #80 to still be broken, because I haven't put in a fix for those yet.
Comment #83
kristiaanvandeneynde: Aha! It turns out X-Drupal-Cache-Contexts is set too late!
Cache contexts only apply to the dynamic page cache, which runs at $events[KernelEvents::RESPONSE][] = ['onResponse', 100];. However, we do have RESPONSE subscribers that add cacheability after that. This cacheability is still used by the page cache but, since that system only leverages cache tags, any cache context added after DynamicPageCacheSubscriber is irrelevant and should thus not show up in the header.
So what I suggest is this:
Please note: RouteAccessResponseSubscriber will also still be capable of adding cache tags, but any cache contexts added there (by running $response->addCacheableDependency($access_result);) will be disregarded in the header, because it is too late at that point to actually do anything.
Maybe in the future we should have three headers?
The above would be the most accurate way to represent what is actually being used in the caching layer.
Comment #84
kristiaanvandeneynde: I've moved the contexts debug header to the more logical place. A lot of tests extending ContentTranslationUITestBase were failing, but it seems many of them were specifying incorrect cache contexts.
Comment #86
kristiaanvandeneynde: Sigh, the service definition fell through the cracks of patch creation...
The problem we're facing here is that the debug headers contained incorrect data but a lot of tests actually queried for (and found!) the right data. The only reason this worked was because RenderCache bled context-folding-affected cacheability up to the header level, even though it shouldn't have. I've since exposed (and fixed) this, but ideally we should have caught and fixed that bug first in a separate issue.
Can still split up the final patch to tackle both issues separately, I suppose.
Comment #88
kristiaanvandeneynde: Okay, so take Drupal\Tests\node\Functional\NodeAccessCacheabilityTest for instance: it fails because of these lines:
'user' is no longer in the header because the header is now set in DynamicPageCacheSubscriber, given how cache contexts have zero impact on caching after that point. Yet, RouteAccessResponseSubscriber adds the user cache context to the response, after DynamicPageCacheSubscriber.
So even though the user cache context will be set by the time the page reaches PageCache, PageCache does not care about cache contexts (and shouldn't!) which begs the question: Why is NodeAccessCacheabilityTest even checking for the user cache context in the first place? It won't do anything at all.
The only time that 'user' could be set on the actual build is if it were added manually or an access result with the context was added to '#access', which makes Renderer::doRender() add it as a dependency to the build.
Footnote:
The only exception to the above that I could find is AnonymousUserResponseSubscriber, which checks the contexts after DynamicPageCacheSubscriber so that it could optionally add a cache tag to the PageCache entry. But that one checks for 'user.permissions', not 'user'.
Comment #89
kristiaanvandeneynde: Right, so we are definitely going to want to "fix" the header issue we have going on here in a separate issue. For that reason, I have restored the original headers to their original location (FinishResponseSubscriber), but already added two new ones that properly reflect the actual cache contexts being used. I might need to revert a few test adjustments left and right now, or make them use the new headers.
For the record, we now have:
['user', 'user.permissions'] is a possible outcome.
Comment #91
catch: Just to say #73 is really useful, but I'm having trouble thinking through what the real-world implications are in terms of hit rates vs. cache redirects (it would probably depend a lot on the actual variations and traffic patterns).
Comment #92
kristiaanvandeneynde: This whole header thing is really getting in the way. There are so many unrelated test failures in this issue because:
Created a spinoff issue here: #3052395: Our cacheability debug headers show completely wrong information. If we can commit that to 8.7.x and 8.8.x already we might have a better shot at fixing this issue without so many random test failures.
Comment #93
kristiaanvandeneynde: Re #91: that's where the performance experts should come in :) Every get will be slightly more expensive for complicated cases only. Only if you have three layers of cache contexts bubbling up will the retrieval have to go through extra steps.
Old render cache
New variation cache
So, as you can see, we already take an inherent hit by dropping the expectation of #cache being present so that the variation cache can be used for storing items other than render arrays. Any hits beyond that only happen if there are multiple layers of redirects possible and the active context values point in that direction.
Comment #94
catch: Hmm, I think we should stick with the current method, to be honest. That could add up to a lot of cache gets on a single page, and while theoretically the number of variations could be a lot higher, it's very dependent on traffic.
Comment #95
kristiaanvandeneynde: Okay, so given how I've been sidetracked by this whole header thingy, here's what I want to do:
Comment #96
kristiaanvandeneynde: @catch in #94: We can't stick with the old system 100%, because that one used to cheat by having #cache available to it. So we will see an increase to at least 2 gets when there are contexts (2nd bullet for the variation cache in #93).
We can try and stick to the self-healing mechanism from RenderCache from that point on, but that would increase the second bullet point from #93 to 3 gets as well:
So by sticking to the old self-healing system, we need 3 GETs for every cache retrieval in a self-healing scenario. With the new multiple-redirect system we also have 3 at minimum.
So the question is: Do we opt for a system with a far better hit ratio and more efficient storage that might need to do more retrievals as soon as we have 2 or more layers of bubbling contexts? Or do we stick to the old, less efficient system but still accept that from now on everything will need 3 lookups?
Either way we can't get around the extra lookup. What we can make sure is that, with the new system, an object with only top-level contexts will still only need 2 lookups instead of 3.
Comment #97
kristiaanvandeneynde: No interdiff, because this basically undoes a lot of trial-and-error bug hunting with regard to the test failures. I know now that, for whatever failures we get regarding missing tags, we need to adjust the tests to no longer expect tags that bled through from RenderCache.
Comment #99
kristiaanvandeneynde: A lot of similar failures in Hal, jsonapi and Rest, and no more weird Umami failure. I call that a win.
Comment #100
kristiaanvandeneynde: Well, this was fucking stupid of me.
Also moved the context folding after the max-age check, because max-age cannot be affected by it and we thus save a performance hit in max-age 0 scenarios.
Comment #102
kristiaanvandeneynde: The last one is easy.
TrackerTest tests code that never adds the user cache context to its content. What it does do, however, is add the user cache context to the local tasks. But those get properly placeholdered now, and thus the optimized user.permissions cache context's tags never make it to the response level (and thus the cacheability headers).
Expecting fully green tests now. Adding "Needs tests" tag for adding unit/kernel tests to the new VariationCache.
Comment #103
kristiaanvandeneynde: 🎉🎉🎉 FINALLY 🎉🎉🎉
Now that I can prove this works, I can already use it in Group and get a release out the door at long last.
Comment #104
kristiaanvandeneynde: This adds a performance boost and unit tests showing that the variation cache works (which we already knew). Nevertheless, it makes it easier for people to understand what's going on inside the variation cache.
This should still go green and, as far as I'm concerned, is finished. I will be using this version in Group while waiting for it to be committed to core. Unless all of a sudden tests start failing, that is. Which they shouldn't :o
Comment #105
borisson_: Overall this is looking very good; if there are remarks in here that have been resolved earlier, I'm sorry about that. Most of these are nitpicks about documentation.
I'm not sure about this comment; we should probably explain why, or remove the comment, because all it does is confuse me.
Needs docblock.
I'm not sure if I like just calling this operation get(), since that is not very telling, especially when doing VariationCache::get($keys) somewhere else. Should this be getRedirect(), or is that not what this does?
Same remark here:
This method is also not super clear. We're deleting only the last item from the chain, but because we are passing in an entire array of keys, it sounds like the entire chain will be deleted rather than one item. I think a clearer method name would help here.
I don't really like assignments in if statements, but that's just personal preference, I guess. Just wanted to mention that I think splitting this into 2 lines could make it somewhat easier to read.
Needs docblock.
Same remark as earlier: getBin()?
Is this something you'd want to tackle here as well? I think we should probably open a follow-up already and link to it in the code?
Comment #106
kristiaanvandeneynde: Re #105: Thanks for your review! Answers below.
Comment #107
kristiaanvandeneynde: @catch in #94: I just thought of another amazing benefit of the new redirect system. In the current system, you are not allowed to vary a local action by a cache context with a parameter that has many options. In the new system, that would not be a problem.
An easy example is a fictional "Node Bookmarks" module with the context user.has_bookmarked_node:<NODE_ID> returning '0' or '1'. If you want to show a local action saying "Add bookmark" or "Edit bookmark" on every node page, you'd effectively kill caching altogether, because the local tasks block would keep adding the context for every node to the ever-growing RenderCache redirect and start varying every page by said contexts.
In the new system, we no longer have this notion of ever-growing cache contexts; instead, we'd have to perform only one extra ::get() on every node page that uses the bookmark module. Nodes of node types that don't use the module wouldn't even have to perform said extra ::get()!
This opens up so many options as we can finally use any cache context with a potentially high amount of parameter options without having to worry where it'll end up in the render tree and whether it will kill caching.
So to sum up the benefits of the new system:
I'd really like to get rid of the old system in favor of this new one as it:
TL;DR: I think a performance hit of sometimes having to perform one extra get is far more acceptable than the current performance hit where we completely recalculate whole render trees that were already cached, simply because we couldn't find them in the cache.
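The bookmark scenario above can be put in back-of-the-envelope numbers. This is purely hypothetical accounting, not code from the patch, contrasting one shared ever-growing redirect with per-page redirects:

```php
<?php
// Back-of-the-envelope illustration of #107: a shared, ever-growing
// redirect vs. per-page redirects. Purely hypothetical accounting.

// Old system: one redirect shared by the 'response' cache key. Every node
// page that bubbles user.has_bookmarked_node:<id> grows the shared set.
$shared_contexts = [];
$visited_nodes = [1, 2, 3, 4, 5];
foreach ($visited_nodes as $nid) {
  $shared_contexts["user.has_bookmarked_node:$nid"] = TRUE;
}
// Each context is binary ('0' or '1'), so every page now varies by
// 2^N combinations, even though only one context is relevant per page.
$old_variations_per_page = 2 ** count($shared_contexts);

// New system: each node page gets its own redirect that adds only the
// one context that page actually varies by, so 2 variations per page.
$new_variations_per_page = 2;

echo "old: $old_variations_per_page variations per page\n"; // old: 32 ...
echo "new: $new_variations_per_page variations per page\n"; // new: 2 ...
```

With only five visited nodes the shared redirect already produces 32 theoretical variations per page; per-page redirects keep it at 2 regardless of how many nodes the site has.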
Comment #108
ndobromirov (at FFW): Whether it's faster or slower, we need to know by how much...
Comment #109
kristiaanvandeneynde: I've never done benchmarks in my life. Any information on how to execute them with Drupal 8?
Comment #110
Berdir: Well, there are two things:
One is a single-request comparison; you can, for example, use Blackfire for that, on cache hits (with and without the dynamic page cache, I'd say) and also on misses.
The other is how it behaves across multiple requests, which will be much more challenging, but at least as important.
Comment #111
ndobromirov (at FFW): I am eager to benchmark this, but due to lack of time I cannot at the moment...
Benchmarking, however, is fairly straightforward: have a baseline (no patch) and then test the patched version.
For Drupal 8 there are currently 2 ways to test a single request (as @Berdir mentioned).
- One is locally (no service registrations) with Tideways + XHProf (DrupalVM has that prepared). Then install Drupal's xhprof module and you are good to go.
- The other is Blackfire, which does essentially the same, but you need an account on their site. I have not used that yet, though.
In the end it will expose information about CPU time and RAM usage.
Once you have profiled the no-patch warm and cold caches and the same with the patch, you can compare and decide are the changes OK or not.
Note that any benchmark tool will add overhead due to the data collection routines.
Some PHP extensions have a high overhead as well, for example Xdebug. When benchmarking something, it's a good idea to fully remove the Xdebug extension from PHP (not just disable all breakpoints).
So in the end you can test the warm cases with the simplest ab tool that will just bombard a URL with some number of requests.
This will hit example.com a hundred times with a single client thread and present you with nicely aggregated results.
That's pretty much all that I am using when in need to do benchmarks.
Comment #112
kristiaanvandeneynde
Thanks for the thorough explanation! I just got Blackfire working and will create a few pages that mimic the things we should definitely be testing. For making sure the caches are cold, I can use an event subscriber that clears certain cache tags on request.
Comment #113
kristiaanvandeneynde
Result 1: Warm caches, logged in as an admin, getting a simple node page.
WITHOUT patch:
WITH patch:
As you can see, the new cache does not affect load time for simple requests. What I did find was that you can clearly see how the case without the patch puts more stress on the renderer, whereas the profile with the patch makes the cache work harder.
If we can optimize CacheContextsManager::convertTokensToKeys() and CacheContextsManager::optimizeTokens() to statically cache their results for the duration of the request, then we should definitely see the new approach beat the old one.
Comment #114
kristiaanvandeneynde
Seeing similar results for other "simple pages": same results or the new system being ever so slightly slower (1-5ms on a total of 1000-1200ms).
Keep in mind that this was to be expected. The new cache should definitely outperform the old when it comes to not having to render things we still have in cache (better hit ratio) and not having to call cache contexts that don't apply. These things are harder to test with blackfire because they usually span multiple requests, not just a single one.
I'd have to develop a custom page for this. Going to create one that works like #73
Comment #115
kristiaanvandeneynde
These results are from the complex case from #73, where I included a 20ms usleep() for every render call. This is very generous depending on what you're rendering.
Because of the higher cache hit ratio, the profile without the patch called usleep() 36 times whereas the one with the patch called it only 13 times! This clearly indicates that the volatility of our current redirect system (i.e.: it clears too much, too often) will cause reduced performance on a site that introduces new content on a regular basis.
It also shows a much higher cache hit ratio as:
During this entire process the old cache didn't get a single hit, whereas the new one got 23 out of 24 hits!
If you want, I can upload the module I used to test this. But I don't want to attach even more files here unless necessary.
Comment #116
catch
For #115 do you have numbers for a fully warmed cache?
Comment #117
ndobromirov (CreditAttribution: ndobromirov at FFW)
As there are now profiles available, I am dropping the "needs benchmarks" tag.
Added more related tags.
Comment #118
kristiaanvandeneynde
@catch in #115: Will try and get those numbers too. Will probably be early next week.
Comment #119
kristiaanvandeneynde
@catch re #116
Performance with a fully warmed cache, only retrieving:
Again shows a marginal increase (+1%) in time spent (1.56s vs 1.58s) on a page with multiple complicated retrievals and perfectly warmed caches. Given how rare that actually is with the old caching system (as a single invalidate would "soft invalidate" the whole lot), I'd say the new system is worth the absolutely minor performance hit in favor of the increased hit ratio and possibility to now really use cache contexts with parameters (as detailed in #107)?
Comment #120
catch
Looks like the 'without' case is doing 48 merge queries, so might not have been a fully warmed cache?
Comment #121
kristiaanvandeneynde
I added usleep() calls in the Renderer. So if something was not retrieved, it needed to be rendered and as such would call usleep(). What you're seeing is my way of changing cache context values during runtime:
And every context looks like this:
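The code snippets attached to the original comment are not preserved here; a minimal sketch of such a test-only context (class and property names hypothetical) might look like:

```php
<?php

use Drupal\Core\Cache\CacheableMetadata;
use Drupal\Core\Cache\Context\CacheContextInterface;

/**
 * Hypothetical test context whose value can be flipped mid-request,
 * forcing the cache to store and retrieve new variations.
 */
class TestDynamicCacheContext implements CacheContextInterface {

  /**
   * The current context value; tests mutate this between renders.
   */
  public static $value = 'initial';

  public static function getLabel() {
    return t('Test dynamic');
  }

  public function getContext() {
    return static::$value;
  }

  public function getCacheableMetadata() {
    return new CacheableMetadata();
  }

}
```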
Comment #122
kristiaanvandeneynde
Rerolled with changes from #105/#106.
Please note that I addressed point 7 by adding a docblock to VariationCacheFactory but almost all implementations of CacheFactoryInterface do not have one. So even though core is at fault for not having them, adding one kind of breaks the habit.
Comment #123
kristiaanvandeneynde
I should probably get rid of this soon, as the variation cache test now checks for self-healing, but let's see what people think of the patch now instead of focusing on changing one test.
Comment #124
kristiaanvandeneynde
I thought of one edge case and one problematic scenario that needed resolving. The tests in the interdiff speak volumes, so read that :)
Comment #126
kristiaanvandeneynde
I see what's going on. My changes to ResourceTestBase are now irrelevant because the latest patch properly handles overlap scenarios, whereas the original one only dealt with supersets and subsets, but not partial matches. This is actually a good thing if my assumptions are correct, because it means the patch will become even smaller in size, yet more robust.
Glad I thought of, caught and tested the edge cases in today's patch. :)
Comment #127
kristiaanvandeneynde
This basically reverts the changes to ResourceTestBase/EntityResourceTestBase and thus makes the patch even smaller.
Comment #129
borisson_
The patch got bigger though, so I don't understand this comment?
Comment #130
kristiaanvandeneynde
Okay so both ResourceTestBase and EntityResourceTestBase should not be checking for cache redirects as it's:
The old render cache would instantly look for keys:[foo]=bar and then maybe get redirected to keys:[foo]=bar:[cat]=dog, whereas the new system actually first checks for keys and then could instantly look for keys:[foo]=bar:[cat]=dog, but could also have redirect steps in between. So the number of CIDs matching LIKE %[foo]=bar% is no longer predictable.
The solution is to simply stop checking for redirects and make sure there is at least one entry in the DB.
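Sketched as lookup chains (the foo and cat contexts come from the comment above; the values are illustrative):

```
Old render cache:
  keys:[foo]=bar           -> redirect (also vary by cat)
  keys:[foo]=bar:[cat]=dog -> cached data

New variation cache:
  keys                     -> redirect (vary by foo; possibly more steps)
  ...
  keys:[foo]=bar:[cat]=dog -> cached data
```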
Comment #131
kristiaanvandeneynde
@borisson_ Fewer changes, but further apart. So more "gray lines", meaning a slightly bigger patch in size. But definitely smaller in amount of changes (red/green lines) :)
Comment #132
kristiaanvandeneynde
Haha, this is what I like about having worked on this for so long. I can understand why tests are failing without having to Xdebug them. Back to green we go! :D
Comment #133
borisson_
Aha, cool!
Ok, that makes sense.
In general, this patch looks very good. It has excellent test coverage, and I couldn't find any more nits to pick either that I didn't already bring up.
I feel very confident about the quality of the code, I am however not sure about the additional complexity this brings to an already complex part of the system.
My gut feeling about this patch is that it is done and can go in as it is in #130.
Comment #134
kristiaanvandeneynde
Cool, thanks for the review! I made sure the test coverage was very clean and easy to read to facilitate the reviewing of this patch.
90% of the test changes are converting the use of a CID string into cache keys. The other changes are documented in the comments here somewhere (such as #130).
Comment #135
kristiaanvandeneynde
Reroll
Comment #136
Wim Leers
I finally caught up on this. Very sorry for the very long wait, @kristiaanvandeneynde! More than 100 comments to catch up on takes a long time 😨 Especially for a topic as complex as this one!
The good news is that #2819335: Resource (entity) normalization should use partial caching landed, so now this patch can update two independent use cases, which give us even more confidence that this is a sensible abstraction 👍
#18: I hadn't even thought about cache factories in any of my previous comments here :)
#25: Looks like a nice first iteration passing most tests! This iteration still needs to port quite a bit from RenderCache.
#34: I'm seeing significant changes in existing tests. We should avoid this if at all possible. It signals a potential BC break.
#42: RE "awful lot of core tests call the render cache bin directly rather than going through RenderCache": Well … in the end it's all stored in a cache bin, so it should be okay to not go through RenderCache. Besides, many tests predate RenderCache by years! We shouldn't break them. That'd be a BC break. See previous point.
#51: Nice leap forward, down to two-digit failures! 👏
#56: Given this comment of yours and your line of thinking, I think you're starting to grok some of the trickiest bits there, which took @Fabianx and me a lot of time to get right! 🤓
#58: Really interesting observation! I think this is all largely caused by the Render API. The render array is both the input and the output, and so by necessity things are being modified by reference.
#60: Yay for refactoring Dynamic Page Cache's code!
#62: That looks like a very big change in behavior for Dynamic Page Cache. Why isn't there a "redirect" cache entry anymore? #42 mentions only setting a redirect when necessary, but that already is only created when necessary in RenderCache? I think #67 wants to explain this, but I don't understand it :) Why do we no longer need a cache redirect?
#68: Yay!
#72 + #73: Epic debugging! :D The new scenario doing anything redirect related for route foo seems wrong: why is it storing a redirect at all, which is then later even reused when accessing route bar? Extrapolated, this means that every Dynamic Page Cache entry would vary by the superset of cache contexts used by all routes seen so far. I see you reach the same conclusion towards the end of #72. It's essential to how RenderCache and Dynamic Page Cache work that the "original cacheability" (actually, original cache ID, i.e. cache keys + cache contexts) is taken into account. That's what the "pre-bubbling CID" was about. That's what the "long-ass comment" is explaining :)
Correct, in HEAD, RenderCache will accumulate all cache contexts, including conditional ones, and then vary by them unconditionally, because @Fabianx and I did not see a way to both vary correctly and not introduce multiple levels of redirects. Multiple levels of redirects would cause a very high I/O cost. So we chose the trade-off to have fewer round trips (lower latency) and store unnecessary variations (consume more cache space). I agree with your ideal scenario. I'd love to see how you arrive there without choosing other trade-offs, i.e. how you can ensure we don't get more I/O round trips. Choosing a different trade-off would change application-level behavior across all existing Drupal 8 deployments, and would hence be … very undesirable :)
And … based on what I read in #73, introducing multiple levels of redirects is exactly what you're proposing to do. As much as I like optimally using cache space, I don't think we can change this anymore. At least not in Drupal 8. And definitely not in this patch. Perhaps we can allow for multiple variation cache strategies, and then individual modules can choose whichever makes sense for them (given they know their data patterns)? And sites could even choose to change the default. I'm relieved to see @catch also raising concerns about this in #91 and #94.
(#73 is a superb comment by the way, crystal clear!)
#83: RE: "any cache context added after DynamicPageCacheSubscriber is irrelevant and should thus not show up in the header." -> no, this header is not meant to reflect the cache contexts respected by either Page Cache or Dynamic Page Cache — it's meant to inform developers (and tests) about the final, total set of cache contexts that the response varies by. For the similar remark you make about cache tags: Page Cache is unaffected by late KernelEvents::RESPONSE subscribers because it is not implemented as an event subscriber, but as a StackMiddleware. I suspect you assumed that Page Cache & Dynamic Page Cache need those headers to function at all, but that's not the case: they function by inspecting the CacheableMetadata stored in the Response (which implements CacheableResponseInterface). It's only external caches that function based on such headers, such as Varnish or CDNs. And even then, those headers need to be named and formatted specifically for those external caches.
#84: We can't have Page Cache set the X-Drupal-Cache-Tags header for the same reason I mentioned before: this header is unrelated to Page Cache.
#88: See my response to #83 — that explains both why it makes sense for the user cache context to show up and for a test to assert its presence.
#89: This is going to negatively impact front-end performance significantly. Many sites have responses that are invalidated by hundreds of cache tags, resulting in multi-kilobyte headers. See https://www.drupal.org/node/2592471 for why X-Drupal-Cache-Tags was made opt-in.
#96: Why would we suddenly need to go from 2 to 3 cache reads if we stick to the old system? Yes, it did rely on a render array being manipulated in place. But that's what render arrays do all over the place. Why can't we continue to do that? I don't see why we absolutely need one extra lookup.
#102 + #103: Impressive work, thanks for being persistent enough to see this through to a green patch 😊 Very few people would have kept going at it!
#104 says cache redirects only care about cache contexts. This is not true. They also need cache tags to be associated, like HEAD's render cache already does. It's possible for a conditional cache context to no longer appear (and hence for a redirect to disappear) based on changed configuration, which we wouldn't know about without the cache tag.
#107: There are a few bullets that are inaccurate. I've explained above why. One of those explains why increasing I/O calls for a single cache retrieval is a trade-off change that we probably don't want to make at this point, at least not without an opt-in. Aside from the changed app performance characteristics, I want to stress again that I think it's dangerous to gloss over the fact that cache backend I/O latency can be high.
#113 + #114: Thanks for doing profiling! It's important to also simulate different cache backend I/O latencies, to simulate the different environments Drupal operates in. For many Drupal setups, the cache lives in a DB (or Redis or memcache) that requires network I/O. I suspect you're testing with a DB cache backend that only does local I/O.
#115: Excellent job profiling the extreme edge case where the proposed patch gains the maximum benefit. Can you please do the same for the other extreme?
#130: +1 — this was done merely to harden tests — see #2877778: Harden test EntityResource + Dynamic Page Cache test coverage. I'm glad to see that complexity gone. Although I am still concerned about us changing so many tests so significantly in this issue.
Overall: absolutely incredible effort, @kristiaanvandeneynde! 🤩👏🤯
I agree with @borisson_ in #133 where he says that he feels unsure due to the additional complexity in an already complex part of the system.
My top three key concerns:
- \Drupal\Core\Render\RenderCache, \Drupal\dynamic_page_cache\EventSubscriber\DynamicPageCacheSubscriber and \Drupal\jsonapi\EventSubscriber\ResourceObjectNormalizationCacher should be able to adopt this without changing a single test. Changing a few tests would be okay, especially those that were specifically testing implementation details of RenderCache. But this is changing a lot of other tests.
- If this issue does more than extract RenderCache's logic into a separate service without changing the approach, then I think we should first change RenderCache's logic in a separate issue. That'd make this issue far simpler.
It's totally possible that reviewing the patch in detail will reduce my concerns :) I hope so!
But before doing that patch review (which would probably take several dozens of hours) I'd love to hear feedback on this comment from you, @kristiaanvandeneynde. I'd especially like @catch and @Fabianx to chime in on the changed trade-offs.
I hope it's clear I'm not at all saying that things must stay the way they are because I worked on them — I'd love for all this to become better, and I'm super glad that Drupal now has a lower bus factor on this part of the caching system 😄 I'm concerned about the risks.
Once again: epic work, @kristiaanvandeneynde. I can't believe you managed to change significant implementation aspects and got the patch to green!
Comment #137
kristiaanvandeneynde
I'll start with your comment-specific reviews, focusing on the more recent ones as the oldest ones are hardly reflected in the patch anymore.
Because with render arrays we used to say: DPC always varies by route and method; the cache doesn't need to figure that out, we tell it so. With the new system we can't know that (unless we add an extra parameter asking for such a thing). So the old system would immediately start storing data or redirects at response:<route>:<method>, whereas the new system stores an initial redirect with those exact directions at response.
This extra necessary get is explained in #93 and #96 and is a necessary evil for the new system to work. But as further explained in #107 et al, the fact that we now have a "clean" API that is truly agnostic means we can finally use the caching layer at will without having to worry about edge cases that are only broken because of the way we implemented the render cache (e.g.: highly dynamic contexts on the local actions block).
Re replies to #83, #84 and #85: I'd like to not go into detail about that one again as it seriously sidetracked this issue before :) The only thing I want to reiterate is that the header is set to cache metadata that does not accurately represent what the very layer that sets the header uses. Whether that is intentional, confusing or both I'll leave for another discussion. With all due respect, though!
We wouldn't if we were only refactoring RenderCache. But because the whole goal of this issue is to abstract the logic in said cache into a layer that can be used by anything, we need to think about what we want our API to look like. Render arrays are the only thing that can currently cheat by saying what cache contexts are always there beforehand (pre-bubbling).
So either we accept the extra hit of the initial redirect or we change VariationCache to accept an extra parameter $minimum_cacheability. That would also work, but again defeat the purpose of trying to have a clean API by needing to specify cacheability twice (full vs bare minimum). I chose the self-healing API for the full 100% because it adds less complexity for the implementer at the cost of an extra redirect behind the scenes.
They actually don't care when it comes to doing their job: pointing to the right CID. That's what that comment is about.
As for cache tags: they are now applied in a more clever way :P Instead of tagging the redirect, we store the redirects indefinitely. The actual cache item would also be tagged with the cache tag that might change the contexts upon its next invalidation, right? So what happens now is that when said thing happens and we do not change contexts, the old redirects still work and would lead to a stale cache item because we invalidated the tag. At this point, the new cache entry still has the same contexts and is stored using the already existing redirects.
However, when we do change contexts after invalidating the cache tag, the old redirect route would still no longer find the cache entry (because the tag also cleared it) and then while trying to store its new entry figure out that the old redirect route no longer holds true and overwrite it with the new redirect route.
I find this system ingenious because the old render cache would clear everything all the time because the single massive redirect would be stored with the sum of all tags and thus continuously be cleared, leading to a lot of cache misses. The new system leaves redirect paths untouched unless something really did change. This means we do not break redirect routes to unaffected cache entries every time we invalidate one item.
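As a timeline (tag and context names are illustrative, not taken from the patch):

```
t0  set():   redirect "keys" -> vary by [foo]
             item "keys:[foo]=bar" tagged with node:1
t1  invalidate node:1: the tagged item is gone; the untagged redirect survives
t2a set() with unchanged contexts: reuses the surviving redirect (one write)
t2b set() with changed contexts: the stale redirect no longer matches and is
    overwritten with the new redirect path (self-healing)
```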
Re increased I/O (reply to #107, #113, #114): I did test on a Docker set-up where the DB was in its own container. But network traffic would indeed be fast because it's all run locally.
I wouldn't say any of the profiling was extreme. The ones from #113 and #114 show simple page loads are hardly affected (given local I/O) whereas #115/#119 weren't intended to show extreme cases, but rather profile a multi-page request, which Blackfire doesn't seem to do. The whole difficulty here was to profile what happens when requesting multiple variations of the same cache keys.
If you want, I could try and whip up an extreme case where the new cache wins by miles and try to find another extreme case with the same result for the old cache, but those scenarios would be so farfetched, I doubt it would do any good. The focus really lies on what happens across multiple requests because my research shows the old cache does a lot of invalidating of items it shouldn't invalidate.
I would agree if the tests truly changed. But if you filter out all of the conversions from cache IDs to cache keys and the removed usage of #cache_redirect, you'll notice not that many tests have changed. Only the changes from #130 remain (which you like) and some changes to scenarios in RenderBubblingTest, which should arguably no longer test for any cache entries as that is now taken care of by the dedicated VariationCacheTest. I even left a @todo there specifying how we might want to remove the cache testing from that class altogether.
Comment #138
kristiaanvandeneynde
Will comment on your top three concerns as soon as possible.
Comment #139
kristiaanvandeneynde
Getting to your concerns now:
We could re-use RenderCache's logic and then build around it, but it would just not be desirable. We'd have to adjust the API for get() et al to not only ask for keys but also for $permanently_present_cacheability. This is something we could certainly do, but if we do end up refactoring the whole thing to be self-healing, then we end up with a dead parameter. That said, the current "downside" of introducing at least one extra redirect with the new system would be alleviated by the above, because we too could now predict the first cache ID where we need to look rather than store a redirect at the cache ID comprised only of the cache keys.
The beauty of the new system, however, is that it could eventually replace all cache implementations. If you look at the code in VariationCache::get(), you'll notice that it falls back to a regular cache if there are no contexts. This means that in a future release of Drupal, we could simply have one API for all cache backends and it would only introduce redirects if it actually needs to. So if any alter were to change your cache object's cacheability to suddenly have contexts, you wouldn't need to worry about it because you're already calling a cache that knows how to deal with it. Nope, can't do this: if we first store something at keys and then try to GET anything with the same keys but also some contexts, we're gonna get the wrong cache item.
Another downside of keeping the old system is that it doesn't properly support caching non-render-arrays. As shown in the old DynamicPageCacheSubscriber, you need to create some funky render arrays containing the actual object to allow the render cache to work. I'd rather not create a new API whose first instruction is: "Wrap your object in a render array".
So what would you suggest here? How can I keep the old logic for now (with the ever-growing redirect that always gets flushed) while allowing for the new system to eventually replace it? So that perhaps we can introduce both systems at once and
To summarize: we could keep the old logic and even adjust the new logic to have one redirect fewer by adding a $permanently_present_cacheability parameter, but that would also inhibit further improvements down the line because we'd be stuck with a parameter that we ideally want to get rid of ASAP.
Also, I cannot stress this enough: while the old system works fine at the first layer of redirects, its redirect is designed in such a way that it will be flushed almost all the time (because it's tagged with everything). So even if we introduce more redirects from the second level onward, it would still lead to a cache HIT rather than an almost guaranteed cache MISS.
I suppose an extra round-trip to the cache backend is cheaper than having to recalculate the whole thing, right?
Comment #140
kristiaanvandeneynde
Come to think of it, $permanently_present_cacheability might not be such a bad idea (see crossed out paragraph in previous comment). If we initially vary something by foo and someone else thinks it's funny to remove foo and vary it by bar, we could cross-check it against the $permanently_present_cacheability and throw an exception.
And if you ever need someone else's code's cache to support variations, then it's up to you to swap out that service with a decorated version that uses the variation cache.
By introducing this, we can at the very least remove the extra redirect you're all worried about :) Then the only remaining issue would be to keep the old self-healing caching that hardly caches at all or the new self-healing caching that might introduce an extra redirect, but not necessarily.
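Such a cross-check could be sketched as follows (hypothetical code; the $initial_cacheability / $cacheability names are illustrative and the final VariationCache API may differ):

```php
<?php

// $initial_cacheability: contexts the caller promises are always present.
// $cacheability: the final cacheability of the item being stored.
// If an initially promised context was dropped, fail loudly instead of
// silently writing an unreachable variation.
$missing = array_diff(
  $initial_cacheability->getCacheContexts(),
  $cacheability->getCacheContexts()
);
if (!empty($missing)) {
  throw new \LogicException(sprintf(
    'The initial cache contexts must be a subset of the final ones; missing: %s.',
    implode(', ', $missing)
  ));
}
```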
Comment #141
kristiaanvandeneynde
I expect to see a huge amount of red tests.
This patch addresses the fact that the old render cache could benefit from "knowing too much" and the new cache couldn't. After careful consideration (#139/#140) I now realize we ought to keep that insider knowledge.
Because of this, the "extra redirect step" that you all feared so much is a thing of the past. If contexts are provided to the $initial_cacheability, the first cache ID in the chain will immediately use those contexts. This should boost performance even more and with the improved hit rate of the new system, I'm cautiously confident that the new system might actually outperform the old.
Comment #143
kristiaanvandeneynde
Not too bad, expected worse for such a huge change. Will have a look into these as soon as possible.
Comment #144
kristiaanvandeneynde
I fixed a few inconsistencies where I did not revert some changes from a WIP alternative approach.
Comment #146
kristiaanvandeneynde
A bit worried about the JavaScript errors. But maybe I need to see why some tests fail on a 500, and maybe that will tell me more. Attached patch adds the required cache contexts to a render cache GET and should net us better results.
Comment #147
kristiaanvandeneynde
Okay, I debugged the 500 errors. It's because of the LogicException being thrown in VariationCache, because the initial cache contexts contain the required ones (language_interface, theme, user.permissions) and the element's actual cache contexts don't contain those. Investigating.
Comment #149
kristiaanvandeneynde
Okay, turns out ToolbarController::preRenderGetRenderedSubtrees() completely overwrites the original element's cacheability. This should not happen and causes various caching issues (it can't be cached properly), but it never surfaced until my new code started throwing a LogicException when $pre_bubbling_elements has more cache contexts than the eventual $elements. This should fix some of the status code 500 failures.
Also reverting last patch because that seemed to have broken more than it fixed. I did manage to set up a testing site locally, so can now submit more error-proof patches :)
Comment #151
kristiaanvandeneynde
LOL, that's 37 test fails fewer by simply fixing a bug in Toolbar. Taking care of the final two failing classes later this week.
Comment #152
kristiaanvandeneynde
This should only leave UserBlocksTest::testUserLoginBlock as a fail. I'm trying to figure out why it's failing, because when I reproduce the same scenario in the browser, it seems to work :/
Comment #154
kristiaanvandeneynde
Hah, found it! And it's yet another example of why we shouldn't test for cache hits in non-cache-related tests. Basically what happened was the following:
GET /filter/tips
User is anonymous, DPC stores a redirect chain of:
GET /filter/tips
User logs in and tries to store a redirect chain of:
Variation cache sees this and optimizes the redirect chain, "temporarily breaking the bridge" as shown in VariationCacheTest.
GET /filter/tips?foo=bar now tries to use the above chain but won't find anything, therefore calculating it once more and restoring the bridge as follows:
During this final step, DPC will still output a MISS where the test expects a HIT. Because of this, we can fix the test by adding a 2nd anonymous request, albeit with different query arguments (because PageCache will otherwise trigger and return the stored DPC MISS header). But the real fix would be to simply remove the DPC header checking from this test as it does not belong here.
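A rough reconstruction of the chains involved (the actual contexts differ; this only shows the shape of the "broken bridge" and its restoration):

```
Anonymous GET:        response -> [contexts A]        -> page (anon)
Logged-in GET:        response -> [contexts A + B]    -> page (auth)
                      (optimizing the chain breaks the anonymous "bridge")
Anonymous GET ?cat=dog: MISS, recalculates, restores:
                      response -> [contexts A + B] -> [A] -> page (anon)
```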
Attached is a patch proving my research. Although we should ideally remove the header checks instead.
Comment #156
kristiaanvandeneynde
Of all the query args I could have chosen, I ended up with foo=baz, which is also used further down the test 🤦
This replaces it with cat=dog and definitely goes green.
Comment #157
kristiaanvandeneynde
Splitting off multiple smaller, unrelated failures into their own issues. Will update this comment as I make them:
Comment #158
kristiaanvandeneynde
This leaves me to separate out the changes to the following:
All other tests changes are related to this issue's patch as they all have (sometimes hardly) valid reasons to test the render cache. Because the render cache backend changed, these tests need to reflect those changes. It would be a perfect world if they didn't test the backend directly, but rather the render cache itself, but here we are... It's not always that simple :)
Comment #159
kristiaanvandeneynde
I've created the following two, according to #158:
#3085475: ItemCacheTagsTest and ShortcutCacheTagsTest needlessly test the render cache.
#3085487: TrackerTest looks for a cache context it does not add.
From the latter:
So the TrackerTest changes need to remain in this patch.
For review, I have added the patch that does not contain any of the separated issues. This is obviously a review-only patch, but it might help filter out the noise.
Comment #161
stBorchert
I have to admit that I didn't read the complete issue, but do you have the Vary header in mind?
While playing around with JSON:API (especially Consumers) I stumbled across #2972483: Page Cache must respect Vary, otherwise it might serve mismatched responses. and #3023104: Introduce "Vary" page cache response policy.
Unfortunately it isn't possible at the moment, to use the "X-Consumer-ID" header implemented by Consumers because the page cache does not care about the Vary header (or any cache context at all).
Is it possible that your patch would fix this also?
Comment #162
gabesullice
@stBorchert, I don't think this issue will solve your problem. If you need Vary support, you can't use the page_cache module. You can only use the dynamic_page_cache module. To get anonymous response caching with Vary, you'll have to use something like Varnish. If you have further questions about that, feel free to open a support request for Drupal core using the jsonapi.module or page_cache.module component. You might also consider opening a support request in the Consumers module queue.
Comment #163
stBorchert
Thanks for your quick reply.
I'm trying to fix a problem in Consumers (#3001043: Cacheability issue when using _consumer_id query in JsonApi responses.) where it isn't possible to get different results per consumer when using the header introduced there, but I'm sure I'll find a solution :)
Comment #164
bradjones1
Pretty sure this is solidly 9.1.x now.
Comment #165
kristiaanvandeneynde
Is there still interest in moving this forward?
FWIW, recent versions of Group rely on VariationCache (https://www.drupal.org/project/variationcache) and with over 2.2k installs I have yet to see a bug report or performance complaint.
Comment #166
bradjones1
I was blown away by this when you presented it in Amsterdam and was honored to work with you on the contrib implementation.
I imagine there may be some hesitance insofar as it is a "big" change, but then again Drupal has such good code coverage, both for this specifically and in integration tests (which are passing here), that it mitigates such a concern in large part.
This probably got lost a bit in the D8 to D9 "new features" freeze that was effectively in place before 9.1 got opened, but now that it is - let's loop in core maintainers?
Comment #168
kristiaanvandeneynde: https://www.drupal.org/project/variationcache
5.5k installs, zero bug reports.
Comment #169
Wim Leers: 👏👏👏🥳
Comment #170
kristiaanvandeneynde: I'd love to see this in core, though :) One module fewer to maintain and more possibilities when it comes to adding parameter-based cache contexts (see #107).
Comment #171
Wim Leers: What's holding me back from RTBC'ing this are the changes in performance characteristics as described in #107, #113, #115, #116 and #119. Unfortunately, none of the blackfire.io URLs work anymore… 😬

I did a thorough review in #136 but do not currently have the capacity to follow up on what I wrote then and your responses to it 😞 For that, I apologize.
However, if core maintainers like @catch, @FabianX or @Berdir approve of this, then you don't need my approval at all!
@catch, you're listed as the sole maintainer of the cache system right now — do you have updated thoughts since your last comment here (#120)?
Comment #172
bradjones1: Is there any agreement on a script/configuration/etc. for profiling this? Perhaps the cache was profiled when initially implemented and there's something there? Or is there any automation that was behind the earlier blackfire profiling efforts on this ticket?
Comment #174
bradjones1: To perhaps provide an additional bump on this, I'd note that core itself has an open optimization on variation caches in JSON:API; e.g., the ResourceObjectNormalizationCacher.

Comment #175

kristiaanvandeneynde: Yeah, that's just another part of core that (ab)uses the render cache to get cache redirects working. Which is exactly what this issue is trying to fix/get rid of by introducing a dedicated variation cache (i.e. a cache with redirects).
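The cache-redirect idea behind this issue can be sketched in plain PHP. Everything below is a simplified, illustrative model (the class name, the in-memory backend and the context values are made up): core's actual VariationCache works against real cache backends and cache context services, but the redirect mechanics are the same.

```php
<?php

// Simplified, self-contained model of a cache that supports redirects.
final class MiniVariationCache {

  /** @var array In-memory stand-in for a real cache backend. */
  private array $backend = [];

  public function __construct(private array $activeContextValues) {}

  /** Builds a cache ID from keys plus the current value of each context. */
  private function cid(array $keys, array $contexts): string {
    sort($contexts);
    $parts = $keys;
    foreach ($contexts as $context) {
      $parts[] = $context . '=' . ($this->activeContextValues[$context] ?? '');
    }
    return implode(':', $parts);
  }

  /** Looks up $keys, following redirects to more specific variations. */
  public function get(array $keys, array $initialContexts) {
    $contexts = $initialContexts;
    for ($hops = 0; $hops < 3; $hops++) {
      $item = $this->backend[$this->cid($keys, $contexts)] ?? NULL;
      if ($item === NULL) {
        return NULL;
      }
      if (!empty($item['redirect'])) {
        // A redirect only says which contexts the data varies by; retry the
        // lookup with that fuller set.
        $contexts = $item['contexts'];
        continue;
      }
      return $item['data'];
    }
    return NULL;
  }

  /** Stores $data, adding a redirect if extra contexts bubbled up. */
  public function set(array $keys, $data, array $finalContexts, array $initialContexts): void {
    if (array_diff($finalContexts, $initialContexts)) {
      // The data depends on more contexts than initially known: store a
      // redirect at the "initial" address pointing at the fuller set.
      $this->backend[$this->cid($keys, $initialContexts)] = [
        'redirect' => TRUE,
        'contexts' => $finalContexts,
      ];
    }
    $this->backend[$this->cid($keys, $finalContexts)] = ['data' => $data];
  }

}

// The caller only knows about the 'theme' context up front, but while
// building the data it turned out to also vary by 'user.roles'.
$cache = new MiniVariationCache(['theme' => 'olivero', 'user.roles' => 'editor']);
$cache->set(['my_block'], 'editor markup', ['theme', 'user.roles'], ['theme']);

// A later lookup with only the initial contexts still finds the right entry.
var_dump($cache->get(['my_block'], ['theme'])); // string(13) "editor markup"
```

A request from a different user (say, one whose 'user.roles' context resolves to 'anonymous') would follow the same redirect but land on a different concrete cache ID, which is the behaviour RenderCache implements for render arrays and this issue generalizes.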
Comment #178
borisson_: We were waiting for a couple of issues in #157; we're currently only waiting for these:
- #3082032: ToolbarController::preRenderGetRenderedSubtrees() deletes parent's cacheability.
- #3082059: ResourceTestBase needlessly tests the caching layer
Comment #180
SocialNicheGuru CreditAttribution: SocialNicheGuru commented

Comment #181
kristiaanvandeneynde: Update of #168 from almost 2 years ago:
https://www.drupal.org/project/variationcache - 11.6k installs, zero bug reports.
Group can be quite resource-intensive depending on how big your site gets and how you configured it. I have yet to see a single report where someone's site was slow because of VC; it's usually some ungodly queries (which I've recently optimized a lot) that are causing issues.
To me, it feels like how BigPipe feels to Wim: I wrote it, it works, no-one complains about it. The fact that its issue queue has remained empty for so long, bar some D10 compatibility issues, speaks volumes about how well it works.
Comment #183
kristiaanvandeneynde: Tried to reroll into an MR because that's easier to review than a huge patch. Contains the fix from #3082059: ResourceTestBase needlessly tests the caching layer until that one gets accepted.
Comment #184
kristiaanvandeneynde: Okay, seeing a few VariationCache test failures, which could make sense as I have not updated the tests for this issue over the years. Copied the ones from the standalone module now.
Also seeing "Failed asserting that 5 is equal to 4 or is less than 4." type errors, which was to be expected after the changes made to #3082059: ResourceTestBase needlessly tests the caching layer. The maximum is 5, though, meaning that the maximum number of extra redirects across all of core is 1.
Also seeing various "The text "SOME_TEXT" was not found anywhere in the text of the current page." failures, which I'd need to check as I'd be surprised if they were related.
Then also seeing a few "Failed asserting that 500 is identical to 200.", which again I'd need to check if that's even related.
Anyways, updated VariationCache and VariationCacheTest code to match standalone module and upped redirect limit to 5 to reduce some noise.
Comment #185
kristiaanvandeneynde: The final failing test is ShortcutCacheTagsTest::testToolbar, added in #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable.
It tries to detect config:user.role.authenticated and user:1 because it expects the toolbar's user.permissions cache context to be optimized away for the placeholder's user cache context.
Relevant changes that caused this in shortcut_toolbar() are:
Just like #102, the placeholders are working properly now, so there is no optimization that needs to take place and therefore it never makes it to the page's cache tags.
This is due to changes in PlaceholderGenerator, but I'll read up on those again to double-check if anything odd is happening. I don't suppose so, but doesn't hurt to double-check.
Comment #186
kristiaanvandeneynde: Reading back up on the comments in the #70s range, I noticed I added the change to PlaceholderGenerator in #78/#81 because back then I still did not take initial cacheability into account and that was breaking DynamicPageCacheSubscriber and related tests.
Now that, since #141, we do take initial cacheability into account, I'm thinking I might be able to make this patch smaller by removing said changes again. Having said that, though, it's clear from my comments surrounding the issue that the change was actually for the better because the header output was not accurate before.
We'll see if I can make it go green when smaller, but right now it's already amazing that the tests go green after all these years.
Comment #187
kristiaanvandeneynde: Reading the changes in the MR and their associated comments, I'm actually inclined to keep things the way they are. I had good reasons for the changes that were made; I'm not gonna risk delaying everything a few more years because of perfectionism. AFAIC, this MR is "done".
Comment #188
kristiaanvandeneynde: Hiding patches as we now use a merge request.
Comment #189
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commented

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #190
kristiaanvandeneynde

Comment #191
borisson_: Are we still waiting for the change in https://www.drupal.org/project/drupal/issues/3082032, or what are the next steps here? Looks like all of @catch's remarks have been resolved. I now feel more confident about this patch compared to #133 almost 5 years ago.
Comment #192
bradjones1: #3082032: ToolbarController::preRenderGetRenderedSubtrees() deletes parent's cacheability is fixed.
Comment #193
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commented

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #194
kristiaanvandeneynde: Not sure why the MR keeps saying it needs to be rebased, even after I merged in 10.1.x like I did the last two times.
Comment #195
andypost: A few big commits arrived today, so it needs another rebase/merge, but there's a GitLab bug: https://gitlab.com/gitlab-org/gitlab/-/issues/407115
Comment #196
andypost: Left a review.
Comment #197
kristiaanvandeneynde: Added a placeholder CR for that change here: https://www.drupal.org/node/3354596
Will flesh it out a bit more soon, but this way the link already works.
Comment #198
kristiaanvandeneynde: Updated the change record.
Comment #199
borisson_: I have used an ab-like tool to check if there is a difference in page response times. I have done this on my local machine with Umami and a couple of extra blocks enabled on the homepage. The difference that we see here is super small; I think this is more likely to be something else that ran on my machine rather than the patch actually making it slower.
Based on this, and the discussions with @kristiaanvandeneynde, this looks great.
Setting this to RTBC.
Comment #200
borisson_: Actually setting to RTBC.
Comment #201
kristiaanvandeneynde: Woot, thanks for the performance review @borisson_! Ever since Blackfire changed, I haven't found the time to work on another performance test. So thanks a million.
Looks like the patch actually makes it slightly faster too? Could also be down to small differences on your machine as you noted.
Comment #203
catch: Made a couple of small commits to resolve unresolved threads. If I haven't broken anything, I'm hoping to commit this to 11.x/10.2.x later today or tomorrow.
Tagging needs follow-up for using this in workspaces per #2551419-13: Abstract RenderCache into a separate service that is capable of cache redirects in a non-render array-specific way
Comment #205
catch: Committed 339643b and pushed to 11.x. Thanks!
After doing that, I realised that we should have an extra change record for the new API and a release note with some information about it, to hopefully encourage contrib take-up where it's applicable.
Moving to needs work for that, but nice to finally land this one!
Comment #206
catch

Comment #207
longwave: Opened a small follow-up: #3365495: VariationCache deprecations should refer to 10.2.0
Comment #208
kristiaanvandeneynde: Wow, this is huuuuuge 🎉
I'll try to draft a change record ASAP.
Comment #209
kristiaanvandeneynde: Draft change record here: https://www.drupal.org/node/3365546
Also, I flagged it as 10.2.x but did not see a commit for that (yet?); hope I didn't make a mistake with the version on the CR.
Now I just need to figure out a way to seamlessly allow people to transition from contrib VariationCache to core VariationCache. I was thinking the following:
Would that work? AFAICR, having the same service name in contrib as a core service simply replaces the core one with the contrib one, so we should be fine?
If this needs further discussion, I'm definitely open to take it up on Slack or another D.O issue.
Comment #210
catch: See https://www.drupal.org/about/core/blog/new-drupal-core-branching-scheme-... for why there's no 10.2.x commit; don't worry, it'll be in there :)
The plan for the contrib module in #209 sounds like it will work - should probably be an issue against variationcache for that?
Change record looks like a good start. Moving this back to fixed.
I re-read the workspaces discussion linked from #13, and the issue that we wanted to fix got fixed via memory cache, which to me is clean enough, so removing the needs follow-up tag at least for that one.
Added a release notes snippet.
Comment #211
kristiaanvandeneyndeAha, that was a good read. Now I get why it's committed to 11.x
Thanks for the feedback on #209
Comment #212
Berdir: I wouldn't rely on your services automatically replacing the ones from core; that depends on the order in which the files are loaded. I'd do a ServiceProvider and alter them.
One problem is that you can't retroactively change the core compatibility of old releases. If you do a new minor release that is not compatible with 10.2 and a new 2.x major version that requires 10.2, composer will by default just downgrade to the previous 1.x release, as most people will have a version constraint like ^1 in place. They will need to manually update that, assuming they directly require it. If not, then composer will update them to 2.x.
You could see if there's a way to not do a new major version, but support both earlier and newer versions in one release. You won't be able to extend from the core classes then; you'd still need to duplicate everything and either replace or define the services. You might be able to do some class alias trickery to extend from core classes if they exist, the way phpunit/twig and similar BC compatibility sometimes works. If you don't, you'd risk breaking code that relies on the new 10.2 interface.
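The "class alias trickery" mentioned here can be sketched in self-contained PHP. This is purely illustrative: the Sketch namespace and class names are made up, and the class_exists() call stands in for detecting whether core ships the new class; it is not the actual variationcache module code.

```php
<?php

namespace Sketch;

// Fallback implementation shipped with the (hypothetical) contrib module,
// used when core does not provide its own class.
class LegacyVariationCache {

  public function source(): string {
    return 'contrib';
  }

}

// In real code this would check for core's class, e.g.
// class_exists('\Drupal\Core\Cache\VariationCache'). Whichever implementation
// is available gets aliased to a single base name.
$coreClassExists = class_exists('\Drupal\Core\Cache\VariationCache');
class_alias(
  $coreClassExists ? '\Drupal\Core\Cache\VariationCache' : LegacyVariationCache::class,
  __NAMESPACE__ . '\VariationCacheBase'
);

// Contrib code extends the alias, so one release can support both older and
// newer core versions without duplicating the implementation. PHP binds this
// declaration at runtime, after the alias above has been created.
class VariationCache extends VariationCacheBase {}

$cache = new VariationCache();
echo $cache->source(), "\n"; // prints "contrib" when core's class is absent
```

The trade-off Berdir notes still applies: the aliased base class only exposes whatever API both implementations share, so code relying on the new 10.2 interface needs the core-backed branch.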
Comment #213
Wim Leers: Epic! 🤩
Congratulations and thank you, @kristiaanvandeneynde!
Comment #215
andypostIs there reason why no change record published?
Comment #216
kristiaanvandeneynde: Hmm, good point, they're still in draft status.
Comment #217
Chi CreditAttribution: Chi commented

Re #215: That's a common issue on drupal.org.
Filed a ticket: #3384938: Publish change records when issues are fixed
Comment #218
kristiaanvandeneynde: I've gone ahead and published both CRs.