Proposed commit message:
Issue #2429617 by Wim Leers, Fabianx, Berdir, yched, dawehner, effulgentsia, catch, borisson_, jhodgdon, martin107, torgosPizza: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!)
TL;DR
Dynamic Page Cache (formerly known as SmartCache, until comment ~380) in a nutshell:
- cache the
CacheableResponseInterfaceobject (which in case of aHtmlResponsestill has the#attached[placeholders; those are replaced at a later time, byHtmlResponseAttachmentsProcessor) - return that cached response immediately after routing has taken place, hence avoiding the controller service (and its dependencies) being initialized, along with everything else for building the response.
The resulting performance improvement as user 1 on the frontpage (APCu off, OpCache on, PHP 5.5, XDebug off, XHProf on):
| Run #frontpage-before | Run #frontpage-after | Diff | Diff% | |
|---|---|---|---|---|
| Number of Function Calls | 31,794 | 16,749 | -15,045 | -47.3% |
| Incl. Wall Time (microsec) | 89,291 | 57,922 | -31,369 | -35.1% |
| Incl. MemUse (bytes) | 17,050,176 | 13,120,368 | -3,929,808 | -23.0% |
| Incl. PeakMemUse (bytes) | 17,101,080 | 13,153,056 | -3,948,024 | -23.1% |

(See #205 for details.)
Problem/Motivation
- We want D8's authenticated user page loads to be fast.
- Some parts of the page are cacheable across users. To actually cache that across users, we have #2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!).
- Other parts need to be dynamically calculated per user, or are simply uncacheable.
- Those dynamic parts should not prevent us from showing the rest of the page first.
(Drupal 8's anonymous user page loads already are fast since #606840: Enable internal page cache by default.)
We're render caching bits and pieces (blocks & entities), but we're still running expensive controllers to build the main content array. And we're initializing dozens and dozens of services that we barely use. It's getting more difficult to see what to optimize next.
Drupal has historically always generated the majority of the response dynamically for every request.
But we've been doing massive amounts of work to make things more cacheable. Could we make use that work to break the trend, and make Drupal 8 the first release to generate a minority of the response dynamically for every request?
Proposed resolution: the simplified version
Assumption: everything that depends on some (request) context, specifies a cache context. #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance has made sure that this is implemented consistently (and tested) for the 99% use cases.
- On a cache miss
-
- A
KernelEvents::RESPONSEevent subscriber detects whether it's aCacheableResponseInterfaceobject, i.e. a response with cacheability metadata that SmartCache can therefore safely cache. This an object representing the entire response. In case of aHtmlResponse, it still has the placeholders (that need to be replaced on every request, i.e. dynamically, uncacheable). - The result is that we therefore have the response that is cacheable across all users, because we know which cache contexts are associated. We use the
RenderCacheclass that is capable of handling cache redirects (which is in fact based on early versions of the SmartCache patch), and ask it to cache the response per route, and if necessary, perform cache redirection. - The event subscriber ends. Then the response is finished and sent as usual. (The other event subscribers fire, including in case of a
HtmlResponsetheHtmlResponseSubscriber, which callsHtmlResponseAttachmentsProcessor, which does all the final rendering (just like for any other request): it renders the attached placeholders, bubbles the bubbleable metadata from the placeholders, and given that final set of attachments, it is able to render the CSS/JS assets, plus HTML<head>tags + HTTP headers.)
- A
- On a cache hit
-
- We've already done the above, and now we're hitting the same route again.
- Immediately after routing, before even calling the controller (the thing that would otherwise do DB queries and whatnot to build a render array), SmartCache's event subscriber checks if we have a cache item in the
smart_cachecache bin for the current route, following any redirects if necessary. - If such a cache item (including potential cache redirect) exists, then the response for this route is already cached. Hence the controller never needs to be executed, and all that still needs to happen (in case of a response other than
HtmlResponse, otherwise we're done already), is processing the attachments! (Which includes rendering the placeholders.) This is handled automatically; just like for anyHtmlResponse.
How is this related to the BigPipe issue?
The BigPipe issue: #2469431: BigPipe for auth users: first send+render the cheap parts of the page, then the expensive parts.
Basically, SmartCache doesn't care about things that are dynamic (max-age = 0 or cache contexts that we consider "too dynamic to cache", such as "per user"). It's very simple, dumb and stupid. It simply varies by all cache contexts on the page.
So, if e.g.:
/foovaries only by interface language, then SmartCache's entry for that will also vary by interface language/barvaries by interface language, route, permissions, query parameter, user and whatnot, then the SmartCache entry will also vary by all those things
That's where the intersection with BigPipe begins.
BigPipe allows us to NOT have the bits that are "too dynamic" to affect the overall page: it allows us to NOT bubble up the "per user" cache context, for example. Because we replace it with a placeholder, and render it separately.
Proposed resolution
- See the simplified proposed resolution above.
- Read the issue summary at #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance.
Given what you've just read in #2429287, you may now realize that once:
- Cache contexts are defined by everything, as they should be (i.e.
once that meta is completedsince #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance is 99% done, which it now is) - Cache context bubbling is implemented (child of the meta: #2429257: Bubble cache contexts)
- All uncacheable things use placeholders (just like: node links, comment links, the comment form on entities, and more)
… then we can start to put all of that cache metadata to the originally intended powerful use: cache the entire HtmlResponse, varied by the contexts associated with that route!
(And once we have that, we can go even further! We could make it configurable which cache contexts (high-frequency ones, e.g. "per user") we don't actually want to vary by, which we could then automatically replace with placeholders. We could then run replace those placeholders either when rendering a response (like today), replace them using something like Facebook's BigPipe, or replace them using ESI. It would be configurable. We have an issue now for auto-placeholdering: #2499157: [meta] Auto-placeholdering.)
A final note: SmartCache works post-routing and caches per-route. You may wonder: why not pre-routing and caching per-URL?
— see #219 for a detailed answer. But in short: because the SmartCache architecture is based on cache contexts, and several cache contexts are not available pre-routing/pre-bootstrap (user, user.permissions, user.*, route, and likely more). Which means SmartCache cannot work pre-routing.
SmartCache reuses the logic in RenderCache, at the cost of some (isolated!) ugliness because RenderCache only knows how to deal with render arrays. The benefit of this reuse is that we don't duplicate the logic, and can just depend on the existing comprehensive test coverage.
Algorithm, proofs & next steps — comments wanted!
This issue corresponds to step 1 in the Google Doc: https://docs.google.com/document/d/1Gw7ohBOUKu38t4kMbN9zj6cX-4-_2ZNXGotv...
(We will move this onto Drupal.org once it's 100% fleshed out.)
Remaining tasks
None!
When this gets committed, please also credit Fabianx.
User interface changes
None.
API changes
None. This is a pure addition; that doesn't change any APIs.
But, notable changes:
- Forms are now marked as uncacheable (
max-age = 0) by default. - The REQUEST event subscriber
ContentControllerSubscriber::onRequestDeriveFormWrapper()has a different priority, because it was impossible to inject an event subscriber at the desired place (i.e. to inject a new event subscriber in between).
And a few small bugfixes, mostly small remnants of things that were fixed in blocking child issues, but don't really make sense to fix separately because of process overhead:
TestAccessBlock::blockAccess()specifies max-age=0 because it depends on state.PageCacheTestinvalidates therenderedcache tag rather than deleting everything in therendercache bin.PathAliasTestinvalidates therenderedcache tag because it is expecting path alias changes to show up immediately, despite there still being an open issue (#2480077: Using the URL alias UI to change aliases doesn't do necessary invalidations: path aliases don't have cache tags) to make path aliases handle cache invalidation correctly. So, this is a temporary fix, with a @todo added.- The
ConfigCacheTagconfig save event subscriber was only invalidating therenderedcache tag forsystem.theme.global, it also needs to do it forsystem.theme. TestControllers::testEntityLanguage()'s render array was failing to pass on cacheability metadata.
Credit
HUGE HUGE HUGE thanks to Fabianx, who's helped me enormously in verifying that this actually works! Without him, this issue would be much vaguer. Plus, once this is in, he has amazing ideas for further improvements, he even sees a way to make it work without executing a request at all — resulting in 10–20 ms response times! See the aforementioned Google Doc, steps 2 & 3, for details.
| Comment | File | Size | Author |
|---|---|---|---|
| #416 | dynamic_page_cache-2429617-416.patch | 50.55 KB | wim leers |
Comments
Comment #1
fabianx commentedComment #2
fabianx commentedJust a quick note:
The idea that Wim implements at the page level can be applied to all sub levels pretty easily - once the bubbling is in:
https://gist.github.com/LionsAd/fef2568413569c13e952
is a quick PoC / pseudo-code for that.
Reason:
- Currently the bubbling leads to no longer receive cache hits (e.g. an entity_view() adds a 'user' cache context, then the block its part of has a cache ID of block:name:%user).
But when that block is next checked it will check for just block:%user.
The idea is now to store the final used cache contexts (the sum of it), at that location.
So block:name returns a render array that redirects to block:name:%user.
Yes, much more cache entries, but perfect granularity - similar to smart cache.
While it might not make sense to enable that for all renderables, it might make sense to complement the strategy of the smart cache working on the route level.
e.g. a block where an entity is displayed per user can be gotten from cache on other routes.
Comment #3
wim leersThe first PoC (also linked in the Google Doc): just dealt with Response objects. Hence was breaking things, because it was also caching the result of
#post_render_cachecallbacks (e.g. the comment form on a node). (*-poc0.patch)The second PoC, then, fixed that, by only caching the top-level
#type => HTMLrender array. But it was still only a hack ofHtmlRenderer. Which means the controller was still being resolved, the main content render array was still being built (and then discarded), etc. Still, this already yielded a nice performance boost. (*-poc1.patch)Environment: OpCache on, PHP 5.5, XDebug off, XHProf on. When the run say "noapcu", that means APCu and the APC classloader are both off, otherwise both are on. Fresh install, one node, with one term. Viewed as root.
/node/1/node/1Next comment: the latest PoC, which significantly improves upon these numbers still!
Comment #4
wim leersAnd here is the latest PoC (the profiling data of which is already in the IS). This one actually complies with the algorithm that Fabianx & I came up with, as is explained in the IS.
This makes Drupal 8 twice as fast!
The patch that includes the depended patches includes #2329101: CacheableInterface only has a getCacheKeys() method, no getCacheContexts(), leads to awkward implementations (currently RTBC) and #2329101: CacheableInterface only has a getCacheKeys() method, no getCacheContexts(), leads to awkward implementations (which depends on #2329101).
/node/1/node/1Comment #5
dawehnerNot bad results.
Just an idea, you can at any moment replace _controller with a new one, which does, what you want
Comment #7
wim leers#5: Of course! Thank you! :) That allows me to remove a significant portion of the hackiness :) 1 KB smaller patch.
Comment #8
fabianx commentedHere are my initial thoughts:
I think we should explicitly used an internal key, like #cache_rendered_already or something like that.
Checking for html could have side effects.
While this is workable, I prefer to use a hash e.g.:
$this->routeMatch->getRouteName() . ':' . hash256(serialize($this->routeMatch->getRawParameters()->all()));
All this work needed strengthens my belief that this belongs in the Renderer instead and this should just set a #cache flag.
I think all code dealing with the cache contexts and prefixes belongs into a helper class / helper service.
The route itself could then be a cache_context.
That would generalize the two cache tier approach.
Comment #10
wim leersThe depended patches have landed. Re-uploading the #7 patch without a single change, but now without the dependencies.
If this passes, then it's only because the test coverage elsewhere is insufficient. It should fail. Every failure points will point us to a missing cache context.
We are already aware of many of the missing cache contexts, and are fixing that in child issues of #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance. If you want to stay up-to-date on fixing those blockers, follow that issue.
Comment #12
wim leersHuge, huge numbers of failures caused testbot to take a whopping THREE HOURS instead of 20 minutes. Which means the test coverage is catching many problems, which is exactly what we hoped for (per #10). Yay!
Now actively working on the child issues of the meta to make this green.
Comment #13
martin107 commentedComment #14
siva_epari commentedPatch rerolled.
Comment #16
martin107 commentedThis is a comment that apples to #10 - before the reroll
SmartCacheHtmlRenderer::__construct()
There are two parameter variables name - which are too similar and for me cause confusion
The difference between $contexts_cache and $cache_contexts is not immediately obvious!
Regarding the reroll - all the error have a common factor.
It looks like the order of the parameter in SmartCacheHtmlRenderer::__construct needs some attention.
Comment #17
siva_epari commentedThanks for pointing out. Changed the order of parameters. Also, added the missing 6th parameter in parent::__construct()
Comment #18
berdirThis issue is very experimental right now *and* it almost takes a testbot down when it runs, see the earlier broken test results. I'd recommend to postpone it and avoid rerolling it for now.
Comment #19
siva_epari commentedSorry, that the reroll caused trouble. Agree, that rerolling should be avoided when it is obvious that the tesbot takes 3 hours for testing a patch at this stage of this issue.
Comment #21
fabianx commentedPostponing on the meta issue for now per #18
Comment #22
wim leersYes, I'm afraid #13 — #17 is a distraction that actually complicates this issue, rather than moving it forward.
My bad for not making this even more explicit than I already was in #12. Now assigning to myself also to signal that I'll reroll this when it's ready.
Comment #23
wim leersUpdate!
Drupal 8 performance today
In only a few days, it's Drupal Dev Days Montpellier. We'll have a performance sprint there, and we basically want to reduce the number of unknown cold cache & bootstrap performance problems. We already know where we spend a lot of time in rendering, and we already have a plan to reduce that: SmartCache (this issue) and BigPipe (see https://docs.google.com/document/d/1Gw7ohBOUKu38t4kMbN9zj6cX-4-_2ZNXGotv... + https://www.drupal.org/node/2429287).
By doing a lot of profiling work, and analyzing the profiling, writing quick prototypes to see where we can make gains and how many milliseconds we can regain. The difficulty with that in HEAD is that it's often quite difficult how much time is spent where, because we spend a significant amount of time rendering.
Profiling at DDD
That's why we want to apply this patch at DDD and do profiling with it applied. It still shows everything for rendering an entire page, including asset handling, but minus the big amount of time spent actually rendering the contents. This allows us to see the threes in the forest again.
To be able to do that, we want this patch 1) to apply, 2) to be able to install Drupal, 3) to get meaningful test results. Lots of work has already been done in #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance, which should make it possible to get testbot to run the entire test suite and "just" come back with lots of failures :)
Next steps
Comment #24
wim leersThis is #23.1.
Comment #26
wim leersThis is #23.2.
About this reroll/these changes:
#cache[max-age]and thus didn't actually look at that yet. Now it does. The wonderful consequence of addingmax-agesupport to SmartCache and the fact thatmax-ageis bubbled: any page that has anything that's not cacheable will automatically skip SmartCache.Right now, the search block and breadcrumbs block both set
max-age = 0, so almost every integration test using the Standard install profile is effectively bypassing SmartCache. Those blocks already have @todos to stop setting a maximum age of zero.(A strategy for figuring out which pages are still setting inappropriate zero max ages would be for SmartCache to ignore the max-age and always cache the HTML response. Then we'll have test failures much like #606840: Enable internal page cache by default, most of which will be pages that indeed should be cacheable.)
Comment #27
wim leersThis is #23.3.
#23.4 is for tomorrow.
Comment #28
yched commentedSorry, I'm aware it's probably not the right time for a pedantic review at this point, but I got curious, and then I got remarks, so I wrote them down :-)
Just wondering : this means the "old" HtmlRenderer is never used anymore ? Do we still want to keep it around ?
unused :-)
Feels a bit misleading/confusing that a $cache_contexts / CacheContext object is in fact a service that does stuff about contexts, but not "some actual cache contexts" in itself.
onRouteMatch() just a couple lines below has a $cache_context var, which, this time actually holds "cache contexts for some piece of rendered html"
Better if we name things for what they are ?
minor : no separator between the route name and the serialized array ?
I might very well be getting this wrong, but if we have a match in SmartCacheSubscriber::onRouteMatch(), then we'll still run through main_content_renderer.html 's renderResponse(), right ?
So aren't we re-doing here what SmartCacheSubscriber already did ?
More generally, if SmartCacheHtmlRenderer::finish() runs for both cache hits and cache misses, then isn't some of the logic extraneous for cache hits (including re-caching the data) ?
Comment #29
wim leers#28: you're right, this isn't the right time. So I'm not going to address it point by point, but only the big remarks :) Hope that's okay.
class SmartCacheHtmlRenderer extends HtmlRenderer-> it interjects additional logic in a strategically chosen place :) We overrideHtmlRenderer::finish(), then call the parent method to resume as per usual.SmartCacheHtmlRenderer::finish()check if#type === 'html, and if so, immediately skip toHtmlRenderer:finish()I can't stress this enough: the patch was the simplest thing that could possibly work. It can be cleaned up. It can be simplified. That's why I found and fixed so many silly mistakes #26 :) Clearly, it needs to be cleaned up/simplified/optimized more, but what matter is that it proves this is viable! :)
Comment #31
wim leers#28 Sorry for the weird tone. It was very late and I was about to go to bed — I simply shouldn't have written that. Having slept, and being actually awake, I think it actually makes sense to already incorporate all of your feedback :)
#28:
When this patch was first rolled, we didn't have a
routecache context yet. And we also didn't yet have a hierarchy of cache contexts. The hierarchy helps to keep the cache ID simpler. In the case of SmartCache, we know we are caching by route, so any cache context that is implied by theroutecache context can be optimized away. See https://www.drupal.org/node/2451661. For this, that's usually going to be the active menu trail cache contexts.But you're completely right that the current code still goes through everything that
SmartCacheHtmlRendererdoes! It's not going to be a huge amount, but still, we were definitely doing useless work! Which means this reroll should be even faster than the profiling shown in the IS.AWESOME! Thanks, yched! :)
In this reroll, I've also added a fun snippet:
So when you do profiling with this patch applied, you have to uncomment that line.
Comment #32
wim leers#26 has significantly fewer test failures already! (But it still causes testbot to crash at the end, so you actually have to look at the results log for e.g. the patch in #10 and then compare it manually. The difference is quite large.)
Quite a few test failures are happening for admin routes. And since it may be too much work to add the necessary cacheability metadata to all admin routes before 8.0.0, and adding that metadata can easily happen in 8.1.0 without a BC break, I think it makes sense to make SmartCache skip admin routes, even if just for now.
Let's see what testbot thinks about that.
Comment #34
fabianx commentedNice work!
Comment #35
wim leersHurray! 644 failures, 77 exceptions. Finally we are getting actual results from testbot :)
The best part: all of those failures indicate places that are missing some cacheability metadata: either missing cache contexts or a missing
max-age = 0. So we can apply the same strategy as we used for #606840: Enable internal page cache by default: file child issues to fix those specific failures.We already have #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance for that, though, so now postponing again.
Comment #36
wim leersAnd this is #23.4, finally. The promised round of profiling. Profiled D8@
a4e51c82e60fca8f9e46338fd2bf8c080fee019e.Thanks to @yched's remark in #28.5, this became even slightly faster :)
I only profiled the frontpage, with APC on. This can be compared to the first table in #4, which I'm repeating here to simplify comparing.
And finally, here's a preliminary analysis, which we will continue at DDD Montpellier.
Because so much less time is spent on rendering, we can now see problems that we hadn't seen before.
Drupal\Core\Asset\AssetResolver::getJsAssets(). That's inclusive wall time, but still: that seems excessively much, for something that really is quite simple. Similarly,Drupal\Core\Asset\AssetResolver::getCssAssets()takes 4 ms (or 6.1%). If we can get both down to just 2 ms, that'd be another 5 ms saved, or 7% faster (on top of SmartCache).Drupal\Core\Theme\ThemeManager::getActiveTheme()andDrupal\Core\Theme\ThemeManager::initTheme(): each 3 ms, or 4.8% of page load time. Again, inclusive wall time.I now also generated Flame Graphs. Note how is also true here: the very early bootstrap becomes much more visible,
DrupalKernel::boot()goes from 1.49% to 3.17% of total page generation time, and::getHttpKernel()from 1.49% to 4.37%.Drupal.org sadly doesn't allow SVGs to be attached or embedded. So just download the attached zip file to look at them.
Comment #37
effulgentsia commentedFrom what I remember, there's similar overhead in 7.x, and a big source of it is the fetch and unserialization of the theme registry. The fetch could maybe be sped up by storing it in a bin that uses the chainedfast backend. But quite likely it's the unserialization that's most of the cost, so that might not help all that much. With the smartcache though, we ought to be calling way fewer theme hooks per request, so another possibility is to split up the registry into a cache item per theme hook (or more likely to be better, per base theme hook). This should probably all go into a separate issue, but I'm too lazy to file that at the moment.
Comment #38
dawehnerI'm not 100% sure, but last time I had a look at that code initTheme and getActiveTheme have been separated from the registry entirely. They are only to negotiate the active theme, which is probably quite costly, if you have multiple possible ones.
Comment #39
mikeytown2 commentedIn terms of template_preprocess_html() calling AssetResolver::getCssAssets() and AssetResolver::getJsAssets() causing the slowdown due to rendering the scripts; could we use a render cache here? That's exactly what I do in AdvAgg, see _advagg_process_html().
Couple of tricky things to look out for if you do decide to use a render cache for css/js:
- Use getMultiple to get both css & js at the same time to speed things up.
- Drupal.settings will need to be post processed so the current pages Drupal.settings will be replaced with the cached version. Also means the Drupal.settings will need to be excluded from the hash used for the cache. If you don't do this you're cache hit ratio will be very bad on a semi complicated site.
- Caching the alters can cut the time in half on a simple site and even more so on a complicated one. Digging into what the alter does and using that as apart of the hash is key. Make sure 3rd party modules can add code in to allow the altering of the hash. This is sorta tricky as if you do it wrong this is a wash. Looking at system_js_settings_alter() and at this point I would avoid caching the calls to the alter functions; too much going on to easily cache alter calls.
- If using js to render the css (https://github.com/filamentgroup/loadCSS) then if you get a hit on the css but a miss on the js or the other way around you'll need to render both css & js. Also means you'll need to include the setting that turns it on/off in the hash if using functionality like this.
Comment #40
fabianx commented#39: Excellent point!
We removed all runtime library altering, so we can cache all except settings, which we get on runtime via hook_js_settings_alter().
And yes, indeed caching that result is easily possible in the same smartcache bin as long as we synchronize it with the files on the system. (e.g. store both the filenames and the rendered data, then re-verify that the cache is current).
/me excited!
Go Wim!
Comment #41
moshe weitzman commented#36 - I don't see any flame graphs in the .zip. Would love to see. Last time I did flame graphs I exported to PNG somehow and embedded in the issue so folks could get a taste. I also hyperlinked that to the SVG hosted on dropbox. Send me SVG and I will do it for you.
Comment #42
wim leers#41: there are *two* zips. The "FlameGraphs.zip" one is the one you want :)
Comment #43
wim leersChasing HEAD.
Comment #44
wim leersSome rebase cruft snuck into #43.
Comment #45
xjmComment #46
misc commentedTested it out, and it seems to work very well, great job on this one. My profiler is not working for the moment, so I could not do proper testing.
Comment #47
misc commentedNoticed one thing - I guess it is by design - but before the patch I got the cache-tag config:user.role.anonymous, after the patch I dont get it in the header any more.
Comment #48
wim leersMiSc: no further profiling is necessary for now, this is still in the PoC stage.
All of you who would like to help here: please look at the test failures in #32, pick a specific test, file an issue for fixing that test failure, figure out why it's failing, and post your analysis, hopefully along with a patch. Feel free to ping me on IRC if you have questions. Thanks! :)
Comment #49
wim leersJust chasing HEAD.
Comment #50
wim leers… and now properly chasing HEAD. I only fixed the conflict in #49, didn't actually try running it.
Comment #51
wim leersAdd a "how it works" section to the IS.
Comment #52
wim leersMore clarifications, and a "how it relates to the BigPipe issue" section.
Comment #53
wim leersForgot a few bits.
Comment #54
wim leersChasing HEAD.
Comment #55
wim leersOne super simple change to significantly reduce the number of test failures (down from 644 in #32): mark non-GET forms as uncacheable. It seems acceptable/reasonable to defer updating every single form in Drupal core to a major follow-up:
Let's see how many failures we're at now.
Comment #57
wim leersSmartCache was breaking page titles defined in render arrays. Fixed that. This fixes at least 10 test failures.
Comment #59
wim leersNice, it fixed 75 failures :)
I investigated the remaining failures, and already found two fairly simple issues:
Comment #60
wim leers#2484611: Tracker responses don't set cache tags & contexts is already RTBC!
Two more child issues: #2082315: Tracker history markers ("new" and "updated" markers, "x new replies" links) forces render caching to be per user & #2082317: Forum history markers ("new" and "updated" markers, "x new posts" links) forces render caching to be per user.
Comment #61
wim leersClearer "remaining tasks" section.
Comment #62
wim leersWe had a (very long) discussion about the security implications of this issue at DrupalCon LA with Mark Sonnabaum, Wim Leers, Crell, dawehner, webchick, Alex Pott and fgm.
The problem
If a piece of code (an access result, a function building a render array …) forgets to add a certain cache context, it's possible for information disclosure to happen: if something should only be accessible for users with role A, and the
user.rolescache context is missing, then if a user with role A first accesses the content, and then a user with role B accesses it, then the user with role B will be able to see the content.Clearly, that is an important concern. But, the same underlying problem exists in Drupal 7 if the cache keys don't take all dependencies into account (in block cache, views cache and render caching). The key difference is that this affects all content on the page, rather than only fractions of it.
More importantly: any caching has security implications: if the cache ID is not taking some things into account, then there is the potential for security problems. By that reasoning, we should be doing almost no caching at all. The risk is lower for well-structured, well-defined renderables such as entities and blocks, because Drupal core does all the rendering, and is well-tested. Custom controllers usually are not quite as well tested. So we want to reduce the risk for those as well.
The solution
max-age = 0.user.permissions, this seriously diminishes the risk, because permissions are by far the most common factor in determining the visibility (access) of certain content.This diminishes the risks sufficiently to
Comment #63
fabianx commented#62:
In three ways we are then even more secure than Drupal 7:
a) Even if caches are not enabled by default, the first thing that most users do when a site is slow is to enable some kind of cache (views, panels,...). That means that the potential information disclosure happens on live - instead of during development or on a well-tested staging site.
b) By being able to enforce a cache context for everything on the page, we can partition everything by user permissions hash - but someone with the knowledge and 60 roles can still easily opt-out (after doing a security audit). That was not as generically possible in Drupal 7. (e.g. the filter cache would never take into account the user.roles / permissions; any custom filter that did not know there was a transparent never seen cache and that forgot to opt-out of caching, could have introduced a security issue there easily.)
c) In Drupal 8 you can set max-age=0 and you are "done" from a security standpoint [need to check in how this might affect forms], while in Drupal 7 you could opt-out of page caching (drupal_page_set_cacheable(FALSE)), but you would need to check your whole stack that not at some level somewhere you were using any caches that were not respecting that and because that is set by session for any authenticated user very early on, no one checks it in practice ... Being able to say, because I have this little critical information here that I never want to cache, but always re-create dynamically / or make the whole page uncacheable by bubbling that up is a HUGE advantage.
Comment #64
fabianx commentedBtw. I would be happy that controllers need to opt-in to being cached by smart-cache [In the discussion I missed that the main concern have been custom controllers], where entity and other secure core controllers would opt-in automatically.
Could be as simple as allowing controllers to use CacheableDependencyInterface optionally and maybe providing a AlwaysCacheableTrait?
Also anyone who sends their own (non-cacheable) Response will / should be automatically opted out of all caching. (2nd step for CacheableResponseInterface)
That should also again reduce the test-failures significantly of smart-cache. Could even have a run-time / container setting for that (so that we can still fix uncacheable parts).
I believe custom controllers are the 10%, not the 90% case.
Comment #65
mikeytown2 commentedWhat if we had some sort of access callback for complex cases (or whatever is the D8 equivalent)? I do agree with Fabianx that this is a lot better than what we have in D7.
Comment #66
hass commentedI need to ask and notify you about an I think complex example I need a solution for and if I set
max-age = 0and this bubbles to the page and makes all uncached - than I guess you may hate me for doing this as Google Analytics is installed on most Drupal installs.Here is what we need to cover:
I hope you can give me an idea how I can handle this as I feal a bit lost currently.
Comment #67
fabianx commentedThanks for your concerns, hass. This is exactly why we are creating such frameworks as cache contexts for 'user' and 'url'.
It would be great if you could watch https://events.drupal.org/losangeles2015/sessions/making-drupal-fly-fast... however first as there we do explain a lot of the basics.
I also do give an example of a block that has kinda the dynamicness of what you describe above.
Another requirement is for the GA code to be in the html head right so that it is executed as soon as the page starts to load? (Just asking, I am not sure.)
I know it might be frustrating that APIs are not yet finished for the caching especially if you have such a highly dynamic module like GA or recaptcha, but I assure you we are working hard on it and then it will all be simpler and make much more sense. But that dynamicness is exactly the reason why we don't support inline JS anymore and yes - of course we need to support use cases like that.
I pledge to personally help your modules to be ready so that they are ready when D8 comes out - however I have to beg your pardon it might take a while as we are in the process of finishing the last big problems.
-----
edit: some examples:
- JS code may change per user e.g. userID is unique
=> 'user' cache context
- JS code may change on every page as someone may add a dimension / metric that is based on a dynamic token.
=> custom cache context or if its based on query parameters 'query_args:argument'
- JS code will change on every search result page as the page URL is dynamic.
=> 'url' cache context
- JS code may need to be shown per user on some pages, but others not.
=> 'user' cache context OR custom cache context to know what the per 'user' or not deviates on
- JS code need to change if Drupals page status code may be 403 or 404 ("file not found" / "access denied" tracking).
=> Re-use the request.access_result cacheability meta data, likely that is already correctly cached though, so nothing needed to be added
- JS code need to be removed if a user unsubscribe from getting tracked (per user setting).
=> Per 'user' cache context
- JS code need to be added to a page or not if path filters are active e.g. /admin
=> Per 'url' cache context
- JS code may change on several
=> custom cache context
- JS visibility on a page may be fully dynamic with PHP code snippet.
=> PHP snippet will need to define cacheability metadata or be treated as uncacheable
- JS visibility on a page may be based on DNT http header a client may send.
=> "headers:[some header]" cache context
- JS visibility may be based on role membership
=> 'user.roles' cache context
- Plus many other dynamic parameters that just change the JS code
=> Whatever the dynamicness varies by, needs to defined as a cache context
Comment #68
znerol commentedMy gut feeling is that the GA module should use JS settings for the variable parts of the code (like user-id) such that the tracker code itself is cacheable without tons of cache contexts (especially without the
usercontext).Comment #69
fabianx commented#68: Yes, however even for the JS settings you should declare cache contexts, so that a site could potentially fetch those settings via ajax (for e.g. per role cached pages) if it does not need GA to run already in the head.
Comment #70
giorgio79 commented#66
You could also handle some of the logic via pure js, pregenerating the logic once the user saves the GA settings...
Eg, js could easily handle urls,
if url contains this and that
then run this and that
instead of letting the db decide what to output per page...
Comment #71
hass commented@Fabian: Thanks for the video link. BigPipe looks really like a killer feature.
Yep. It need to run async, defer and as soon as possible from head. Loading anything later with ajax is too late.
I still think it is a pure lie that we cannot support inline JS. Inline JS and CSS is a critical feature for mobile first. I noted this in the other case and never received any feedback from Wim who pointed earlier to this. But this is out of scope of this issue here.
We need to get GA up and running. A lot of people are asking for it far before core reaches RTM.
I have also been told be bedir that we need to change a lot of code for smartcache and that "headers:[DNT]" cache context will not work and we can only solve this with JS code that ask the browser for DNT status. You said it will work with the header context. I should give it a try what is really true now.
What is a custom cache context and how will this implemented?
Thanks for the examples. If I combine them all together I will end in a very large list of contexts.
Does this mean that every token need to define it's own cache level? This sounds like highly complicated coding in some cases. Ok a "user" token is user context, but how should I add this if I have no real clue what the users are entering. I leave this to token for now, but I'm sure a lot of things becomes extreme difficult to solve with tokens.
About recaptcha I found a solution by using attached and html_head, see http://cgit.drupalcode.org/recaptcha/tree/recaptcha.module?h=8.x-2.x#n98 . This works stable and since I added addCacheableDependency() all looks good for now. However I still hope to be able to move this into a libraries file just to make it cleaner.
Thanks fabian for your feedback and help and commitment.
Comment #72
hass commented@znerol: DrupalSettings are also cached, aren't they? No idea how this could change anything if I need the userID. It is static for one user, but it is different for all your users :-)
Comment #73
hass commentedSure that is config context. That is really the easiest part.
Comment #74
fabianx commented#73: As GA really only needs the User ID in JS, we could store it in a cookie - readable by JS. GA could then do _one_ ajax request at the very first page load, to populate the cookie / user specific settings - if its not there already. We use this technique a lot.
Just to give some food for thought:
It might be that the client _never_ hits Drupal in the first place, because all pages (yes also authenticated user pages) are cached in e.g. Varnish or Akamai or another CDN. But GA should then still work ... [ granted usually someone logs in if they are authenticated, so just setting the cookie on the first uncached request usually works fine. ]
Comment #75
wim leersI've been working on addressing all feedback. But, before posting that, I want to reupload the patch in #57, because interestingly, many of the failures are now actually gone. In no small part thanks to the many bugfixes that went in in the past 1.5 week that were blocking #2381277: Make Views use render caching and remove Views' own "output caching".
So, uploading a rebased #57. #57 had 124 failures. This should have at least 32 less failures, so 92 at the most.
Comment #76
wim leersIn this reroll:
user.permissionsanother required cache context). This required a whole bunch of test changes.SmartCacheIntegrationTest.Note: if people think that's a good idea, I could move point 1 into a patch of its own, that'd make this patch much easier to review again. (As easy as #57/#75.)
#63:
max-age = 0in D8 opts out of all caching at the level where you set it and at all higher levels. Which maps 1:1 to how HTTP works.#64: I disagree. For three reasons:
CacheableDependencyInterfacedoesn't make sense for controllers, because to know the cacheability of the controller's output, you need to build the render array, and render it… Plus, very often, a single class contains many controller methods. But the interface can obviously only be implemented once.Wow, #66—#74 are quite the distraction. It's good stuff for sure, but it's not at all specific to this issue. Answering to key questions/points succinctly.
CacheContextInterface+ the implementations in core of that interface.<body>, i.e. everything inpage.html.twig. The rest (i.e.html.html.twig) is still generated dynamically. So, it's perfectly possible for the Google Analytics module to define a#post_render_cachecallback that calculates the dynamicdrupalSettingsfor the current request/user. This is also what we do for node links, comment links, the comment form and … the history module. The history module actually is the closest example to what you need: seehistory_attach_timestamp(): it adds a setting with data specific to the current user.Comment #79
webchickSpinning off "Implemented #62 (made user.permissions another required cache context). This required a whole bunch of test changes." sounds like a good idea to me.
It also seems like a good idea to spin off a dedicated support request about using cache contexts. There are some good bits in this issue that are going to get totally lost because no one would ever think about looking at the issue that implemented SmartCache to find more information on how to use cache contexts et al correctly.
Comment #80
fabianx commented#79: Absolutely agree, lets spin off user.permissions to make that another required cache context to another issue.
#79: Yes, we really should move that thread over to there ... Is there any technical way to move things? I guess not so we should just paste a summary? Or maybe add the discussion to a wiki page?
Comment #81
wim leersI've been working on removing as much logic as possible from
SmartCache, and reuseRenderCache(Interface)instead. After all, this is just another implementation of the cache redirection logic. By reusing the existing logic, we don't need to write the extensive, detailed test coverage for that again.However, I'm not yet satisfied with it, so here's a WIP interdiff. Fabianx has ideas on how to make it simpler/cleaner. This should be the bulk of the conversion work though. Interdiff is applied to #76.
Comment #82
wim leers#79.1 + #80.1: child issue created: #2493033: Make 'user.permissions' a required cache context. That allows me to remove that part out of #76, which should bring the number of failures down to 96 again. The interdiff is basically the reverse of #2493033.
#79.2 + #80.2: we already have a meta issue for that: #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance. But, I'll repeat what I said in #76: 99% of what is being asked in #66 has a very simple answer: . (But, note that
#post_render_cacheis being removed in favor of a simpler concept, in #2478483: Introduce placeholders (#lazy_builder) to replace #post_render_cache.)Also filed #2493021: Remove unused & useless services from HtmlRenderer, which should allow this patch to become slightly simpler.
And also filed #2493047: Cache redirects should be stored in the same cache bin, which I discovered as part of #81.
Comment #83
wim leersComment #84
wim leersMaking SmartCache obey the
no_cacheroute option further reduces the number of failures.Removes another >=15 test failures.
Comment #87
wim leersSmartCache is storing rendered output. It should therefore also specify the 'rendered' cache tag. This further reduces the number of test failures.
Comment #89
wim leersThemeSuggestionsAlterTestwas failing due to thesystem.themeconfig not invalidating therenderedcache tag. Fixing that removes another 9 failures.Comment #91
berdirStill haven't reviewed this. I was quite scared by this, but with the limitations that were put in place for a first phase (like excluding all post forms and admin pages), it looks like this actually works, the tests are certainly starting to agree with that :)
+1 on making it an opt-in feature per controller. If we can make this work for entity view and views, then we should already cover a huge amount of requests for most sites, and if they have custom stuff they can always opt-in to benefit from this as well.
Also still wondering if this should be a module, just like page_cache, then it would be easy to turn off...or actually, part of the page_cache module?
Comment #92
wim leersWhen the page cache killswitch is used (e.g. when an unhandled exception occurs), we shouldn't be caching the associated HTML render array in SmartCache. In other words: besides respecting the
no_cacheroute option, and excluding_admin_routeroutes, we should also respect the page cache kill switch, and more. And all of that is already encapsulated in request & response policies!Therefore, let us reuse that same infrastructure/API, but introduce request/response policies for SmartCache that are separate from PageCache. After all, PageCache is about caching responses for anonymous users, whereas this is about caching partial responses for *all* users. Hence the key change in this patch is this:
This also makes it easy for sites/developers to add additional generic exclusion rules if they'd so choose.
This fixes the failures in
ErrorHandlerTest(5) +PagerTest(4).Comment #93
wim leers-1.
See #76's reply to #64 for my rationale. In short: it needs to be opt-out, not opt-in, if we want the D8 ecosystem to actually support it. Otherwise we end up with the same as render caching in D7: core has the API, but nothing actually uses it.
PageCache is conceptually simple. SmartCache is conceptually much more difficult to explain. Therefore I don't think we want to expose this as a module, because it will simply be too intimidating.
I propose we keep it in core, but add the necessary services/service modifications through a compiler pass, and allow a container parameter to prevent that compiler pass from running, hence disabling SmartCache completely if that container parameter is set to
false.Comment #95
wim leersSigh. Re-testing.
Comment #98
wim leersThe first failing test (
BlockTest) is failing because block visibility conditions don't set the necessary cache contexts. There's an existing issue for that: #2375695-42: Condition plugins should provide cache contexts AND cacheability metadata needs to be exposed. That is now another blocker for this.Comment #99
fabianx commented#98: Lets add the easy fix to this issue:
- Use max-age=0 for any added conditions for now with a @todo on the other issue?
Comment #100
fabianx commentedAttaching just a quick interdiff for #99.
Probably will need to use the same trick of InaccessibleBlock wrapping the real block for security reasons and BC ... (e.g. to not have page manager expose inaccessible blocks then) ...
Comment #101
wim leers#99: Excellent point. Except that that is already happening in
BlockAccessControlHandler:Excellent!
But, then
\Drupal\block\BlockRepository::getVisibleBlocksPerRegion()causes a lot of sadness:i.e. access results' cacheability metadata for blocks never makes it into the render arrays of blocks. I did some git history debugging; this was just never tested; this max-age = 0 thing originates in the original "access result cacheability metadata" issue, which just updated all code, and didn't add test coverage for every particular use of it (#2287071: Add cacheability metadata to access checks). Because that was out of scope.
Filed a critical security bug #2495171: Block access results' cacheability metadata is not applied to the render arrays to fix this. Which means this is now blocked on #2495171, not on #2493033. Adjusted the IS.
Here's a straight reroll/rebase now that #2493021: Remove unused & useless services from HtmlRenderer has landed.
Comment #102
wim leersSo #100 is very roughly what we want in #2495171: Block access results' cacheability metadata is not applied to the render arrays.
Comment #103
fabianx commentedComment #104
fabianx commentedComment #105
wim leersI updated the IS incorrectly in #101. Fixing that.
Comment #107
wim leers#2493091: Installing block module should invalidate the 'rendered' cache tag landed, which should remove the failure in
TrustedHostsTest. Re-testing for that. The patch is currently at 59 failures, should go down to 58.With menu render caching having landed, I was also able to get #2254865: toolbar_pre_render() runs on every page and is responsible for ~15ms/17000 function calls going again, which should go a long way in helping to address the 14 test failures in
ToolbarAdminMenuTest.Comment #112
wim leersHah, looks like that actually fixed a bunch of other failures as well. New low: 55 failures :)
Comment #113
hass commentedComment #114
berdirJust posting the interdiff for smartcache from #2375695: Condition plugins should provide cache contexts AND cacheability metadata needs to be exposed.
Comment #115
berdirOpened #2500013: Add cacheability metadata information to translation overview for the content translation test fails.
Comment #116
wim leersAdding #2500013: Add cacheability metadata information to translation overview to the IS. Thanks for that one! That'll push the number of remaining failures down significantly. :)
Comment #117
wim leers#2484611: Tracker responses don't set cache tags & contexts landed!
Re-uploading a rebased #100 patch, without any other changes. This should reduce the number of failures by 6. If no new failures were introduced, then this should bring it down to 49 failures :) Let's find out!
Comment #119
wim leers#2470693: Upgrade to Symfony 2.7.0 caused those many failures and impressively many exceptions. Simple fix :)
Comment #121
martin107 commentedFound only minor nits while looking this patch over.
Comment #123
wim leersThis is the only change I disagree with. And I disagree strongly.
This concatenates the two comment blocks together, as if they logically belong together. They do not.
And the changed indentation — from bizarre to the usual — is also bad, precisely because it was intended to be so bizarre. It makes it stand out. This part is there fore testing purposes only, it is not actually being proposed to be committed, so that's why I want to keep it standing out by being bizarre.
Reverted this, kept the rest. Thanks!
Both #2481453: Implement query parameter based content negotiation as alternative to extensions and #2273925: Ensure #markup is XSS escaped in Renderer::doRender() broke this in tricky ways. The former requires SmartCache's event subscriber to ignore any requests that specify a wrapper format (one-line change), the latter requires us to mark all of SmartCache's cached markup as safe.
By fixing those two things — which basically boil down to chasing HEAD — we should be seeing the effective consequences of #2484611: Tracker responses don't set cache tags & contexts having landed: if I counted correctly, we should be down to 47 or less failures.
Comment #125
wim leersYay, down to 39 failures! :)
Also, with #2484611: Tracker responses don't set cache tags & contexts landed, the issue title needs an update :)
Comment #126
wim leers#2484619: Forum responses don't set cache tags landed, which should fix one more failure! Re-testing.
Comment #129
wim leersComment #130
wim leersPeriodical rebase. Also to check something for the child issue #2500013: Add cacheability metadata information to translation overview.
#2483781: Move cache contexts classes from \Drupal\Core\Cache to \Drupal\Core\Cache\Context, #2480811: Cache incoming path processing and route matching, #2407195: Move attachment processing to services and per-type response subclasses caused quite a few things to change. The latter has allowed for a significant clean-up/simplification. While working on this, I also noticed a small oversight, opened #2512382: Follow-up for #2407195: #attached['http_header'] being added to Response in two places for that.
Comment #132
wim leersFrom 38 (in #132) to 33! A new low :)
The 3 failures in
Drupal\system\Tests\Common\NoJavaScriptAnonymousTesthave disappeared, thanks to #2407195: Move attachment processing to services and per-type response subclasses.And the 2 failures in
Drupal\system\Tests\System\FrontPageTesthave disappeared as well, thanks to #2480811: Cache incoming path processing and route matching.Getting ever closer to zero :)
Comment #133
wim leersFixed the failure in
UpcastingTest, it was a bug inRouteCacheContext(using raw parameters, hence not the upcasted parameters, hence not the upcasted, i.e. translated entities). Fixing it here, because here we have a clear failure showing the bug.This should bring us down to 32.
Comment #135
wim leersOh, wow! I definitely don't like this. It should've been 32, not 15.
This implicitly "fixed" 15 more failures, but it didn't really fix them, they only look fixed thanks to a bug. #133 changed the route cache context to use the non-raw (i.e. upcasted) parameters. But something is modifying the upcasted parameters (node entities) after parameter upcasting. Hence the route cache context has different values just after routing (when SmartCache's event subscriber runs) and at render time (when SmartCache does its caching). Hence there's a mismatch in these cases, and hence there are always SmartCache misses (in case of node route parameters). That causes these tests to no longer fail.
This points to a much deeper problem with our
ParamConverters. Discussed with catch, he opened #2512718: EntityManager::getTranslationFromContext() should add the content language cache context to the entity for that.So, reverting #133, despite it being a correct fix. That fix will now be introduced, along with removing the failing test method of
UpcastingTest(because it tests something that is wrong), in #2512718: EntityManager::getTranslationFromContext() should add the content language cache context to the entity. We found a bug related to that: #2512712: Entities rendered by EntityViewBuilder should vary by content language cache context, which can already be fixed independently.Re-uploading #130, to not have a misleadingly low number of failures. Should be back to 33.
Comment #137
wim leersGiven that we're at 33 failures, and have patches in the works to bring it below 20, I investigated what it'd take to get us to zero, and to check that there are no more further criticals hiding here.
Drupal\block\Tests\BlockTestDrupal\comment\Tests\CommentTranslationUITestDrupal\content_translation\Tests\ContentTestTranslationUITestDrupal\content_translation\Tests\ContentTranslationWorkflowsTestDrupal\node\Tests\NodeEntityViewModeAlterTesDrupal\system\Tests\ParamConverter\UpcastingTesDrupal\system\Tests\Session\SessionTestDrupal\toolbar\Tests\ToolbarAdminMenuTestDrupal\tracker\Tests\TrackerTestOf those, #2512580: NodeEntityViewModeAlterTest uses State to dynamically affect hook_entity_view_mode_alter(), #2512718: EntityManager::getTranslationFromContext() should add the content language cache context to the entity, #2512734: session_test routes/controllers don't specify the appropriate cacheability metadata and #2512866: CacheContextsManager::optimizeTokens() optimizes ['user', 'user.permissions'] to ['user'] without adding cache tags to invalidate that when the user's roles are modified are new issues. (And I think #2512866: CacheContextsManager::optimizeTokens() optimizes ['user', 'user.permissions'] to ['user'] without adding cache tags to invalidate that when the user's roles are modified may be another security critical.)
(Also added to IS, where we can keep it updated.)
Comment #138
wim leersNote that #2512580 and #2512734 have very simple patches. Let's land those ASAP, then we'd be down to 25 fails :)
Comment #139
berdirReroll, the block test fails should be gone now.
Comment #140
berdirAh, need to generate a proxy class
Comment #143
berdirStill one test fail in BlockTest, see #2375695-56: Condition plugins should provide cache contexts AND cacheability metadata needs to be exposed:
I think that should just be added to this issue. Doesn't make sense to me to do that in a separate one.
Comment #144
wim leers#2512580: NodeEntityViewModeAlterTest uses State to dynamically affect hook_entity_view_mode_alter() also landed.
Per #143, the one remaining failure in
BlockTestshould be fixed here.Comment #145
wim leers#2512734: session_test routes/controllers don't specify the appropriate cacheability metadata also landed.
Re-uploading the #140 patch to get the new number of test failures, which should be 22 :)
Comment #147
wim leersPer #62 + #79 + #80 + #82: to do SmartCache, we must also do #2493033: Make 'user.permissions' a required cache context to provide sufficient security. So this issue is also blocked on that, but it isn't responsible for test failures, which is why it wasn't listed before. Updated the IS.
(Thanks @plach for pointing that out!)
Comment #148
wim leersReroll of #145, with the minimal changes necessary to keep things working (#2450993: Rendered Cache Metadata created during the main controller request gets lost and #2512866: CacheContextsManager::optimizeTokens() optimizes ['user', 'user.permissions'] to ['user'] without adding cache tags to invalidate that when the user's roles are modified require changes in SmartCache).
Comment #149
wim leersUpcastingTestfails due to\Drupal\paramconverter_test\TestControllers::testEntityLanguage()being plain wrong/silly. One-line fix for that test is now included in this patch.That only leaves #2082315: Tracker history markers ("new" and "updated" markers, "x new replies" links) forces render caching to be per user, which we still need to get started on.
This should bring it down to 21 failures, assuming #148 still has 22 failures, like #145.
Comment #152
wim leers#149 has one less failure than #148 as expected. The fact that it's 11/10 failures and not 22/21, is solely because we have a new test runner as of yesterday, which reports things differently.
Comment #155
berdir5 new test fails in SmartCacheIntegrationTest but those are trivial to fix.
Comment #156
wim leersThe 5 trivial failures in
SmartCacheIntegrationTestare due to #2505989-58: Controllers render caching at the top level and setting a custom page title lose the title on render cache hits. That is being fixed in #2506581: Remove SafeMarkup::set() from Renderer::doRender, so those 5 failures should disappear again :)Comment #157
wim leersAnd #2500013: Add cacheability metadata information to translation overview has also landed in the mean time! (This is why Berdir re-tested it in #153.)
Updating issue accordingly :) Keeping at PP-4 because of #156.
Comment #158
wim leers#2493033: Make 'user.permissions' a required cache context already was RTBC (and has been for quite a while now), #2217985: Replace the custom menu caching strategy in Toolbar with Core's standard caching. is now RTBC (EDIT: while writing this, it actually got committed! So reducing from PP-4 to PP3.), #2506581: Remove SafeMarkup::set() from Renderer::doRender is getting close to RTBC.
That leaves just #2082315: Tracker history markers ("new" and "updated" markers, "x new replies" links) forces render caching to be per user, which I've now rerolled and is now blocked on reviews!
Comment #159
serg2 commented#2082317: Forum history markers ("new" and "updated" markers, "x new posts" links) forces render caching to be per user is still listed as a child of this. I thought best to mention just in case you have overlooked it. It doesn't appear to be causing test failures which is what I think you are focussing on.
Comment #160
wim leersThis is a rebase of #149, with also #143 / #144 addressed.
Comment #161
wim leers#159: Indeed, technically we need to fix it, but it doesn't cause any test failures (because Forum's test coverage is lacking). That's indeed why it's omitted from the IS.
Thanks to #2407195: Move attachment processing to services and per-type response subclasses, we can slightly simplify SmartCache's logic.
Comment #164
lauriii#2493033: Make 'user.permissions' a required cache context has landed now!
Comment #165
wim leersI investigated the unexpected Toolbar failures. (After #2217985: Replace the custom menu caching strategy in Toolbar with Core's standard caching., we expected zero failures.)
I found the root cause, and I posted it as a comment on the relevant issue: #2512866-132: CacheContextsManager::optimizeTokens() optimizes ['user', 'user.permissions'] to ['user'] without adding cache tags to invalidate that when the user's roles are modified. Awaiting feedback there to figure out the best way to fix this.
Comment #166
wim leersFor #165, we now have a new major issue, that also blocks SmartCache: #2533768: Add the user entity cache tag to user.* cache contexts that need it.
Comment #167
wim leers#2082315: Tracker history markers ("new" and "updated" markers, "x new replies" links) forces render caching to be per user just landed. Again one less blocker! :)
Straight reroll of #161, to track how many remaining failures we have.
Comment #168
wim leersAs @lauriii noted in #164, #2493033: Make 'user.permissions' a required cache context has landed too. I forgot to update the IS — done now.
I will start updating the IS in the next few days to get this issue + patch ready for the final steps.
Comment #169
wim leersAnd now #2506581: Remove SafeMarkup::set() from Renderer::doRender landed too! Down to one blocker :)
Comment #173
cosmicdreams commentedIt appears that even though #2082315: Tracker history markers ("new" and "updated" markers, "x new replies" links) forces render caching to be per user landed the tracker related fails in this issue are not resolved. How should we follow up on this issue?
Comment #174
berdirMaybe it's an invalidation problem again? So not that it's different per user, but that something changes and the caches aren't invalidated by it.
Comment #175
wim leersIndeed. Investigating.
First, here's a fix for the 5 remaining failures (1 on the new testbot) in
SmartCacheIntegrationTest. Those are caused by the fact that the simplification in #161 causes no more'page'key to be cached. I simply had not yet updated the test accordingly.Comment #176
wim leersFound the root cause why #167 (i.e. after #2082315: Tracker history markers ("new" and "updated" markers, "x new replies" links) forces render caching to be per user landed), we still have failures in
TrackerTest(also raised by #173 + #174).comment_listcache tag. Fixing that removes 6 of the 8 failures.user.roles:authenticatedcache context (for only adding thetracker/historyasset library for authenticated users). Fixing that removes the remaining 2 of 8 failures. (Without that cache context, SmartCache thinks the same HTML is valid for both anonymous and authenticated users.)Given that that is going to be the only small bit that technically belongs in a child issue, I think it's fine to do that last bit as part of this patch.
Now rerolling #2533768: Add the user entity cache tag to user.* cache contexts that need it, which is the last child issue, and will fix the last 2 failures.
Comment #178
wim leersIS updated.
Note that I'm working on #2499157: [meta] Auto-placeholdering, which would make this issue effectively cache the majority (i.e. minus the too-dynamic parts, which are automatically placeholdered) of most HTML pages (we don't cache
/adminHTML pages for example)!That, in turn, makes BigPipe (#2469431: BigPipe for auth users: first send+render the cheap parts of the page, then the expensive parts) significantly more effective, because it means BigPipe can just send the initial HTML directly, because SmartCache already has it cached. BigPipe simply chooses to stream the content for the placeholders after the majority of the page is already sent to the browser (i.e. the user doesn't have to wait for the highly dynamic content in the placeholders to be rendered), instead of first replacing all placeholders, and then sending a completely finished response (i.e. the user DOES have to wait for the highly dynamic content in the placeholders to be rendered).
Comment #179
fabianx commentedSo tracker shows both nodes and comments?
Makes sense ...
uhm, the cache context should be added outside of the if - unless I am missing something.
Comment #181
berdirShouldn't #2493033: Make 'user.permissions' a required cache context essentially result in the same because the permissions is going always different between anon and authenticated?
Is it possible that smartcache doesn't respect the required cache contexts yet and only gets the implicitly if there's something being render cached on a page that bubbles up?
Comment #182
wim leers#179:
#181: Well, I think it's theoretically possible that they have the same sets of permissions. So we should fix that bit in the Tracker module anyway. But, nevertheless, you raise an excellent point!
Cha-ching! We've got a winner! Fixing that next.
Comment #184
wim leersHere it is, required cache contexts are now handled by SmartCache. Includes updated test coverage.
(I've manually tested it locally, and can confirm that with this fixed, we technically could remove the
'user.roles:authenticated'cache context that is being added intracker.pagessince #176, but as explained in #182, I think it's better to still specify it anyway. Especially now that it's cleaned up/documented well since #182.)Comment #185
fabianx commented#184: Now that we handle required cache contexts ourselves and cache whatever we get as cacheable render array, we can probably use render cache service now and avoid the logic duplication, right?
( e.g. inject it, which would still work even if it was private. )
Comment #186
wim leersAlso, I just realized that since #2407195: Move attachment processing to services and per-type response subclasses, SmartCache can be simplified even further, and to do even less work on cache hits. Instead of caching the
#type => 'html'render array, we can now simply cache the entireHtmlResponseobject AFAICT.It's getting late here, but I'll look into that tomorrow. I very well may be forgetting something right now!
Comment #187
wim leers#184: no,
RenderCacheis not aware of required cache contexts. See #2483887: Mark RenderCache as internal for that. So, for now, that's a simplification we cannot yet make.Comment #188
berdirSure, we should definitely keep explicit cache contexts.
That said, even if anon and authenticated would have the same permissions, our implementations ensures that the hash is still different because we also include the role names. And I think we could actually optimize user.roles* into user.permissions with #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents().
While that does not itself result in different variations as they will always be consistent with each other, it would likely result in fewer cache redirects, since we have the complete list of necessary cache contexts already available upfront. @catch made a similar point about user.permissions being a required context already in the issue that made it so.
Comment #189
fabianx commented#187 Yes, that is what I meant.
Because we now handle required cache contexts in smartcache ourselves, we can use the RenderCache service to load / store the data instead of duplicating the logic from there.
Comment #191
wim leers#188:
True. Since #2417895: AccountPermissionsCacheContext/PermissionsHashGenerator must special case user 1, since permissions don't apply to it, not before, but true.
Hah, great thinking! (But again, only since #2417895.)
Can you comment on #2453835 so we don't forget it there?@Berdir++EDIT: Berdir already did: #2453835-12: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents() :)
Indeed :)
Comment #192
wim leersBefore embarking on the idea I had in #186 to make SmartCache even faster, I figured I'd do some profiling so we have an idea about the impact of the SmartCache patch after all these months. Is it still worth it? As you'll see below, the answer is .
I did a round of profiling using the patch in #184, with D8 HEAD (commit
cc34748c31e393e25f595f472c1a40bedd9acf12). The sole @todo is uncommented — again: that won't be necessary anymore once #2499157: [meta] Auto-placeholdering lands. But in HEAD, e.g. the breadcrumb block is present on all pages, and hasmax-age = 0, hence every page is made uncacheable by it. Auto-placeholdering will fix that. But for now, uncommenting that @todo fixes it, so we can have meaningful profiling results.The last time was in #36. But in #36, I only profiled the frontpage, to compare @yched's suggestions for improvements with the original patch, that was profiled in #4.
This round of profiling tests more scenarios than before. And I also did benchmarking.
So, without further ado…
Profiling
/node/1/contactWhat's interesting here is that we see that neither
/node/1nor the frontpage show the 55-60% improvement anymore that we saw in earlier profiling. This is simply because we made Drupal 8's bottom line performance better. But, nevertheless, we still see 30-40% improvements for both of those.Note that both will become faster once we no longer placeholder node and comment links no longer explicitly use placeholders, but just bubble cacheability metadata (I'd swear Berdir filed an issue for this, but having gone through his last 50 (!) pages of nodes he participated in, I did not find it). Because that is the main reason the
/node/1and frontpage routes are still relatively slow even with SmartCache: because they have quite a few placeholders to replace, and to render those placeholders, a whole bunch of services need to be initialized.Contrast that with the results for the
/contactroute, which has only a single placeholder: that for rendering messages. It's sped up by 60% by this issue.Profiling evolution
Nevertheless, if we take a look at the evolution of the profiling over the course of this issue, i.e. if we compare the frontpage's results in #4 and #36 with that in #184:
We went from 66K to 75K to 37K function calls in the before ("without patch") situation. I don't know the reason for the 66K to 75K increase in HEAD at the time, but it's likely a regression. I do know the reason for going from 75K to 37K function calls: Views are render cached now.
But note we also went from 26K to 28K to 22K function calls in the after ("with patch") situation. We went up from 26K to 28K because the original patch in #4 was yielding incorrect results. But we did go down from 28K to 22K function calls in recent time. In part thanks to the SmartCache implementation becoming better, but mostly thanks to bootstrapping/routing/route access checking becoming more efficient.
Benchmarking
/node/1/contactComment #193
berdirThe issue you were looking for is #2530846: Fix and improve comment cache tag usage, I tested removing it there, but I'll open a separate issue just for that, so we can get the comment cache tags stuff done.
Comment #194
wim leersNote that with #2469431: BigPipe for auth users: first send+render the cheap parts of the page, then the expensive parts, every HTML page will basically see the ~30 ms response time of
/contact, and everything on top of that (the ~50 and ~70 ms response times for the frontpage and/node/1, so +20 and +40 ms, respectively) will happen after the initial page is sent, those contents would be streamed to the browser by BigPipe!Comment #195
wim leersI now did what I described in #186:
I'm happy to report that this shaves off another 300 function calls per page load, but much more importantly: it makes the SmartCache implementation trivial!
No more careful overriding of the
HtmlRenderer… just an event subscriber. Because this patch defers rendering of placeholders fromHtmlRenderertoHtmlResponseAttachmentsProcessor(to make it easier to review the changes in that area separately, I've attachedsmartcache_placeholder_replacement_in_htmlresponse-2429617-do-not-test.patch), SmartCache can simply be implemented as aKernelEvents::RESPONSEsubscriber. Note that this change is not an API change (it doesn't affect anybody). On top of that, it also simplifies the implementation of #2469431: BigPipe for auth users: first send+render the cheap parts of the page, then the expensive parts, because BigPipe will want to render the placeholders into BigPipe placeholders and not actually render the placeholders yet; the actual rendering of the placeholders happens later in BigPipe, as it wants to stream the HTML for those placeholders (as also explained in #194).In terms of performance, the difference will be negligible because it only saves 300 function calls (for the
/contactexample, we go from 17,166 function calls to 16,838, or 328 fewer function calls). So a new round of profiling is not useful. But what matters most, is that this new implementation is vastly simpler.If everybody is on board with this, I'll update the IS to describe the new implementation.
Comment #196
wim leersI saw a way to slightly simplify things further.
Also: I forgot to update a small bit in
SmartCacheIntegrationTest, that will cause its tests to fail.Comment #197
wim leersUgh, all of the above profiling was done with this in my
settings.local.php:Looks like that has significantly skewed the results. So, I'll have to redo it all. For example,
/contactwould then not take 16838 functionc alls as described in #195, but only 11,604 function calls and significantly fewer milliseconds.Will do that next, after a very, very late lunch.
Comment #200
yched commentedI've fallen a bit behind the last couple weeks, but :
So are the policies injected or pulled ? ;-)
Comment #201
yched commentedAlso, I might very well get that wrong, but : so far HtmlRenderer used to be in charge of rendering all the placeholders in the HTML, and now it's the job of HtmlResponseAttachmentsProcessor ? Doesn't this seem out of scope for a class that is primarily supposed to take care of attached assets ?
If BigPipe wants to swap a different processing for the placeholders, does it really want to swap the "process assets" logic as well ?
Comment #202
fabianx commentedI think we definitely need an HtmlResponseInterface now ...
I thought we had a HtmlResponseInterface?
If not maybe we could use AttachmentsInterface as that should be all we need?
Overall this works for me, but it won't make BigPipe easier, but more difficult overall.
The reason is 2-fold:
a) BigPipe needs arbitrary data in #attached, but between renderPlaceholders() and drupal_process_attached() there is no way to hook in.
So probably the best would be to lift the Exception from drupal_process_attached ...
(I never liked it anyway).
Or at least make the list configurable somehow via a container parameter or such.
b) We need to clone the full response to use the BigPipeResponse class, but that is very error prone at this point as someone could have already replaced the response with a custom class.
So likely a decorator with magic __call, while slow, is probably the only option left here and even then it is a little problematic as it could possibly not define the right interface.
So it would be better to return a BigPipeResponse from the start from HtmlRenderer, but that is only possible if placeholders are replaced (or rather: render strategies run), because only then we know that we want to BigPipe or not.
But I agree that for SmartCache this is way simpler.
Comment #203
wim leersThis should bring us back to 2 failures.
Also fixed #200 — thanks @yched — that's a PoC-hack leftover that I failed to clean up :)
Replying to #201 and #202 in a next comment.
Comment #205
wim leersHere's a new round of profiling, this time profiling #203. (Because, as I said in #197, the previous round of profiling had CSS & JS aggregation disabled, which skewed the results.)
Profiling
/node/1/contactProfiling evolution
Benchmarking
/node/1/contactComment #206
wim leersSo, an additional 10% gain, that was masked in the previous profiling.
EDIT: Moshe Weitzman asked why exactly. Well, CSS/JS aggregation being disabled means many more
<link rel=stylesheet …and<script>…tags need to be generated. And those are generated using the Render system. And that is apparently fairly expensive. We can dig into why that is expensive, but that's out of scope here. What matters is that we're profiling the actual situation that people will be using Drupal in the most: with CSS/JS aggregation turned on. Those numbers look great. But even with CSS/JS aggregation turned off, those numbers look great. So: either way: yay :)Comment #207
wim leers#201: But as of #2478483: Introduce placeholders (#lazy_builder) to replace #post_render_cache, placeholders are attachments. Attachments != only assets, it's also things like HTML in
<head>, HTTP headers… and … placeholders.BigPipe will not want to swap (override) asset handling, but they also won't have to. See below.
#202
We could use
AttachmentsInterface, but I think it's semantically wrong. This is only for HTML responses.Given that Symfony only has
Responseand notResponseInterface. Similarly, we can just haveHtmlResponse, and BigPipe could provideBigPipeResponse extends HtmlResponse.Introducing
HtmlResponseInterfacecan be done in a separate issue.a) Right, but that's just stupidity/brokenness/legacy cruft/clean-up we never got to in
drupal_process_attached()that needs to be fixed anyway.b) That's how the current BigPipe patch works. But I think it could just as easily be changed so that
HtmlRendereralways returns aHtmlResponseobject. That also makes sense conceptually, since it's called the HTML renderer. The current BigPipe patch (#2469431-81: BigPipe for auth users: first send+render the cheap parts of the page, then the expensive parts) is kind of hacky in that respect: it requires changes toHtmlRendererso that it is able to return aBigPipeResponse. That's not really acceptable IMO, because it means contrib cannot add more render strategies. So I think render strategies need to be layered on top of (i.e. run after)HtmlRenderer.Given that, I think this is a very simple, elegant and complete solution (interdiff relative to latest patch):
Comment #208
andypostThe related issue shows that we still use static for assets
Comment #209
yched commentedDoh, right. I even yayed about it in that very issue back then.
Sorry for the noise - awesome simplifications in the latest rounds !
[edit : and the proposed code at the end of #207 makes a ton of sense]
Comment #210
Anonymous (not verified) commentedanother drive by, just reading the code to see if any of it makes sense to someone way out of the loop.
typo, or some side-effect-causing-thing that should have a comment?
OMG my eyes. maybe not for this issue, but at least one object is missing an API method.
i think that comment can go away. it doesn't say anything interesting beyond what we see in the code.
this block seems like it could use some work. i don't think "soft-render" as a concept adds anything. i think it means "don't replace placeholders", which is already stated. also, i guess this is an early / late thing? or something? there's kind of a missing bit of 'why' here. here's a possibly wrong attempt:
Comment #211
wim leers#210
I'm hopeful that if those are your only remarks, that the patch was mostly clear? :)
Comment #212
wim leers#189 (@Fabianx):
In recent patches, that's not possible without hacks that impede legibility, because SmartCache now deals with
HtmlResponseobjects instead of render arrays.Thoughts?
Comment #214
wim leersFurther clean-up/simplification:
$this->routeMatchis unused; removed it.HtmlResponseobjects and ensuresHtmlRendererdoesn't yet render placeholders) makes it so that it is guaranteed that theHtmlResponsethat SmartCache receives always has the required cache contexts. Which means we can simplify SmartCache further, by reverting #184's non-test coverage bits.(Not a smaller patch, but a simpler patch. Although it would have been smaller if we would have less boilerplate.)
Comment #215
fabianx commented#207: I think the best is then to just add the necessary send() method to HtmlResponse itself, and check for $this->getSendService() or something like that and call the service if that exists.
e.g. make it possible to plug the send method in HtmlResponse() directly.
Then we don't need to deal with decorators or any of that, which are error prone and get direct integration and other strategies could use something similar.
Exchanging the response mid-event chain is not the right solution.
--
Anyway: Could we split off the renderPlaceholders() move to subscriber to another task, the both RenderStrategies and BigPipe, etc. can work off of that?
And lets have a dedicated subscriber for the renderPlaceholders() so that someone can hook in between.
--
#212: The logic is easy enough, lets keep as is.
Comment #216
fabianx commented#214 looks fantastic overall! :)
I think once we remove those last 2 test failures we can get this in!
Comment #218
borisson_Since this is almost ready, I read trough this patch and found a few small nitpicks as well.
shouldn't the DenyNonHtmlRequests Request policy already make sure that smartcache only cares about HTML responses?
I didn't really understand this comment until I read the contents of the
elseblock.I think something like:
// SmartCache can only cache HTML responses that have a max agewould more descriptive.I don't think you need to describe in the comment that this is an anonymous function.
80 cols
The period makes this go over 80cols, do we care about that?
Should this comment be in there?
Not sure why this class is in this namespace construct, can this be a normal namespace declaration?
namespace Drupal\Core\ProxyClass\SmartCache;I haven't seen any other files in this patch use this type of namespace declaration.That way this class doesn't need to use FQDNs all over.
Can you merge these two lines?
Should this also check in the response html that the correct
$animalis in the response?Comment #219
wim leersSo, by now I feel pretty good about the SmartCache patch :)
But! There was one more thing that was nagging me. SmartCache still requires routing and routing access to run. I.e. SmartCache caches per
routecache context. But why not make it cache perurlcache context? That'd be similar to the PageCache module & middleware.Reasons I could think of not to do that:
/node/1,/node/1?foo=bar,/node/1?foo=llamaand so on all end up using the same cached response, if/node/1doesn't actually care about thefooquery arg (if it did, it'd have specified theurl.query_args:foocache context), thanks to SmartCache relying on cache contexts to tell it which variants it should cache. In other words: it avoids cache poisoning (not exactly the right term, because I mean it here as in "filling the cache with garbage"), and ensures faster response times for all HTML requests.Acceptheader. That changed only weeks ago: see https://www.drupal.org/node/2501221. I.e. content negotiation happened. That's no longer the case.So, given those facts, I was thinking last night: . That protection could be quite simple — roughly speaking: for every URL sans its query string, cache whether the associated response every specified the
url.query_argscache context or not. This could then be used by both PageCache and SmartCache to automatically use the cached response for/node/1when the actual URL is/node/1?foo=bar! This would also help with the scalability of Drupal: it'd be harder to DDoS Drupal.This is a solution that can be implemented in a separate issue, and would definitely benefit PageCache too.
Given the above, I set out to do a quick experiment: modifying the SmartCache patch as minimally as possible to get an idea of how this would work. Ideally, it'd be converted to a middleware. But as a first step, just making the existing event subscriber run earlier would be sufficient.
In doing so, I ran into my first problem: a response that varies by the
usercache context never results in a cache hit, because… SmartCache now runs so early that we haven't created a User object yet!So, I changed the SmartCache event subscriber to run later (after
authentication_subscriber). This is the earliest possible time that could possibly work — note that I am quite likely to encounter further problems, but let's ignore that for a second. Also note this means it can NOT be converted to a middleware.Let's look at the performance of this incomplete solution. This brings
/contactdown from 11,605 function calls (in #205) to 10,894. 711 fewer function alls, and ~2 ms faster. Maybe worth it, but only a small incremental gain compared to what the current SmartCache patch already offers.A next problem is the
routecache context: that won't work, since this now runs before routing. But we're assuming here that theroutecache context is implied by theurlcache context, because we're assuming we don't haveAcceptheader negotiation. So, once we do #2453835: Allow non-intrinsic (implementation-dependent) cache context services to specify their parents via DynamicParentContextInterface::getParents(), theroutecache context would be optimized away automatically (since theurlcache context is always present for SmartCache). But that means that any contrib module doingAcceptheader negotiation on routes that also happen to serve HTML responses (i.e. the ones cached by SmartCache) would need to override SmartCache to a quite large extent, to … bring it back to the behavior that the current patch (#214) implements.In conclusion:
Acceptheader-based negotiation./node/1?foo=barto/node/1, to increase cache hit ratios and avoid DDoSes by filling the cache with random query strings.Comment #220
wim leers#218
DenyNonHtmlRequestsis, as its name says, about requests, not responses. I clarified the reasons for this if-statement in the comment though — great question!Thanks for the review! :)
Comment #222
Anonymous (not verified) commentedThis sounds really good. But if you have to sacrifice this
then I guess routes are much better approach.
On the other hand
I think changing it to query argument is now a Drupal way and so contrib modules should follow that.
Comment #223
fabianx commentedTL;DR: I am perfectly fine with smart cache to run after routing, but it should probably use the url cache context.
So there is a misunderstanding:
SmartCache fully uses:
a) a two-tier caching strategy
b) allows cache contexts to bubble up
If smart cache is url based, then the request_format cache context could still bubble up.
That only means that request format needs to be negotiated by a middleware before smart cache ($request->setFormat()).
OR if request_format is determined during routing that smart cache is moved by said contrib module after routing.
I don't think neither is a problem.
To have smart cache fully run in a middleware we need to "cache the cache contexts" similar to the ESI use case based on the user session and only if we can't fulfill them fall back to the subscriber after routing / authentication.
However that middleware could be added at any point in the future, so should not block that.
However for forward compatibility it would be good if smartcache used ['url'] and relied on request_format to bubble up - it just needs to make sure to take into account cache contexts set on $request->data('_access_cacheable_metadata') [or what we called that] if it runs earlier, but before the normal FinishResponseSubscriber.
We can always harden DDOS in the future.
Comment #224
wim leers#223: I don't think it's valuable to make SmartCache use the
urlcache context because:if we run SmartCache AFTER routing already, there's no point in doing so
if we change SmartCache to run BEFORE routing, then the problem that arises is described in #219: the
user(anduser.*) cache contexts don't work anymore. And who knows what else.Therefore, I think making SmartCache run before routing should be follow-up material; it can happen at any time in the future (think
SmartCache2, which will only be enabled on new sites).Which bring me to an important point: how do we let users turn off SmartCache? My concern with making it a separate module is that it is too difficult to explain to users what it does. "Page Cache" is simple": "caches HTML for anon users".
But SmartCache works for all users, in all situations, and only for HTML. So… "HTMLResponseCache" may be a better name. But exposing a module with that name is super confusing too.
IMHO, we should ship with SmartCache enabled and not exposed in the UI. Making it configurable using either a container parameter or setting or config, and having
settings.local.phpdisable SmartCache by default (cfr. https://www.drupal.org/node/2259531) seems like a good solution to me.Comment #225
wim leersRevamped IS.
Comment #226
wim leersCan somebody please add Fabianx to the issue credit? Thanks.
Comment #227
cosmicdreams commentedWim: I think calling it "Smart cache" is fine. Because it is applying a good deal of intelligence in how it handles the cache. I can't think of another kind of cache we would throw at this that would be smarter than this smart cache (that we wouldn't roll into smart cache).
To the site admin / site builder that is curious about what this new fancy thing is and only looks at that name, I would imagine they would eventually learn that turning this on is like supercharging their site.
Comment #228
fabianx commented#224: Lets keep it focused and go with route for smartcache 1.
A container parameter is fine and enough, this is like block cache, just for a different part.
Comment #229
wim leersHaving discussed this with @Fabianx, we felt like the clearest way to add SmartCache to D8 core, is by putting it in a
smart_cachemodule.So, this patch:
smart_cachemodule (all code)page_cachemodule's description +hook_help()to make the difference withsmart_cacheclearerhook_help()for SmartCacheNoAdminRoutespolicy toDenyAdminRoutes, for consistencysmart_cacheto be enabled by default in theminimal,standardandtestinginstall profilesexample.settings.local.phpto disable SmartCache by defaultThoughts? :)
Comment #230
berdirYes, clear +1 on making it a module. Was discussed before and was suggested by me as well before, but back then the coupling way too tight for that to really work IIRC. Yay that that's no longer the case :)
Comment #232
berdirDiscussed example.settings.local.php with wim.
I expressed my worries about developers disabling smart, page and render caching locally and then deploy stuff without testing when it's enabled.
We agreed on adding both smart and render cache disabling but commented out by default with a clear comment about the pitfalls.
Comment #233
webflo commentedPlease add smart_cache module to the replace section in
core/composer.jsonComment #234
wim leers#232: Done.
#233: Done. I had no idea about that, so thanks for pointing that out! Ideally that'd have caused a test failure, so created an issue for adding a test so that happens in the future: #2539682: Add test to ensure composer.json is listing all core modules.
Comment #236
wim leersOMG, missing underscore. Wim--
Will fix that in the next reroll.
In the mean time: more feedback/reviews? :)
Comment #237
yched commentedA couple thoughts on the overall organisation around request / response policies between page_cache and smart_cache :
(although, the same could be said about the pre-existing 'page_cache_response_policy' tags I guess...)
- the concepts are defined in Core/PageCache,
- policy implementations are shipped in PageCache folders / namespaces (modules/smart_cache/src/PageCache/RequestPolicy/DefaultRequestPolicy is "The default SmartCache request policy" - feels confusing to have "smart cache" policies shipped in a PageCache folder ?)
Thus, should we rename the concept and folders to ResponseCachePolicy ? (the current Core/PageCache folder is only about Policy stuff)
Comment #238
yched commentedAlso : In case of a cache miss, the request policy is checked twice (once before checking the cache in onRouteMatch(), once after buidling the uncached response in onResponse()).
- that is a different behavior from page_cache, which only checks the request policy once (before handling the request, and not after computing an uncached response), the inconsistency could be confusing ?
- if we do need to re-check the request policy before caching the computed response, it's possibly not a biggie perf-wise since this won't happen on the next cache hit, but then maybe let's skip it if the request policy failed the first time ? Because it seems we'll be checking it twice on each request to an admin route ?
Comment #239
wim leersThanks, great review! :)
page_cachemodule; they're actually for all responses (pages)! They determine a response'sCache-Controlheaders, seeFinishResponseSubscriber::onRespond(). It is simply the case that thepage_cachemodule respects precisely the same request/response policies!PageCachepart in the namespace is wrong; we should rename it toResponseCacheorResponseorResponseCacheControl. Which is similar to what you are suggesting.But, IMO that can be done in a separate issue (before or after this one), because it is an independent, pre-existing issue. (Again, see point 2.)
page_cache-specific policies;page_cachesimply uses those that policies that apply to all responses. (Again, see point 2.)So, with point 3 addressed, this should in theory be faster. In practice, it's *marginally* faster, but only within the margin of error. The distribution of the histogram just looks a bit more favorable. (See attached file.)
Comment #240
wim leers#238: Another good point! :) The only way to avoid doing that though, is by storing the conclusion of the request event subscriber somewhere. Where? On the response itself? IDK what the best practices are around that, except for that it is generally looked down upon. Happy to do whatever people think is the best way!
Comment #242
yched commentedThanks for thé answers in #239, makes sense. Agreed we should rename the PageCache namespace in a separate issue.
#240 :
Can't we put a "smartcache : not cacheable" header or attribute on the request (and then carry over to the Response) ?
Related : might be a good idea to briefly document why smartcache can't be implemented with a middleware like page cache ?
Comment #243
benjy commentedUnrelated i guess but how come this proxy class is not generated with Drupal's coding standards?
We're not actually using the result of the render() call, could we add a comment. It isn't obvious what's happening.
Just pointing this out.
Comment #244
wim leersThank you both for your reviews!
#242
#243:
@todoto that issue.Comment #246
plachI just tried to add @Fabianx to the credit list. Hopefully he will stick :)
Comment #247
yched commentedThanks @wim !
+ $event->getRequest()->attributes->set(self::SMART_CACHE_REQUEST_POLICY_RESULT, $request_policy_result);Minor : should use the $request var that was extracted two lines above :-)
Also, nitpick, not sure how we do this elsewhere, but the const name, being a class const, doesn't really need to be prefixed with the SMART_CACHE_ business domain ? (just like the class member is requestPolicy rather than smartCacheRequestPolicy)
Other than that, I'm amazed at how straightforward the patch has become :-) Amazing work.
Comment #248
yched commentedDo we really still need that proxy ? Can't seem to find a service that uses it ?
Nitpick : reusing the same var for "cache item" and "content of that cache item" is a bit confusing. Plus, this means in the rest of the code, $stored_cache_contexts is either "the cached contexts" or FALSE, rather than NULL or [] as you would more likely expect on a cache miss ?. That's not really explicit and might lead to surprises in later adjustments / refactorings ?
Also, isn't that something we already fetched in onRouteMatch() ? Maybe not, since, although the comment is the same ("Get the contexts by which the current route's response must be varied"), the actual code is slightly different :
- onRouteMatch() fetches $cid = implode(':', cacheContextsManager->convertTokensToKeys(['route'])->getKeys());
- onResponse() fetches $cid = cacheContextsManager->convertTokensToKeys(['route'])->getKeys()[0]
If both actually fetch a different context_cache entry, maybe we could clarify/differentiate the code comments ? ATM it's not clear to me why onResponse() only looks at the first key (especially since, according to CacheContextsManager::convertTokensToKeys(), the order seems arbitrary/alphabetical ?)
If both actually should be fetching the same entry, is that something we want to save in a request attribute as well ? (well, I guess this one only affects real cache misses, and not uncacheable pages, so maybe that's not a big deal)
Comment #249
wim leers#247
/facepalm — fixed :D
#248:
Comment #251
wim leersFixing those new failures.
Comment #252
wim leers#2533768: Add the user entity cache tag to user.* cache contexts that need it just landed! This means SmartCache is now unblocked!
Comment #254
yched commentedPainful nitpicker returns :-p :
Rather than putting the cache_item object in the CACHE_CONTEXTS_FOR_ROUTE attribute, it might be a bit cleaner to first resolve "$stored_contexts = ($cache_item !== FALSE) ? $cache_item->data : NULL" once and for all and store that resolved value in the attribute, so that onRepsonse() doesn't need to duplicate the unwrapping logic ? We probably don't really need to leak to consumers that this came from a cache bin and is still wrapped by the cache API ?
Comment #255
wim leers#2430397: When mapping cache contexts to cache keys, include the cache context ID for easier debugging caused those ten failures in the integration test. This interdiff updates the integration test accordingly.
This … should be green! :) :O :)
Comment #256
lauriiiWOW! Good job Wim & Fabian & all!
Comment #257
fabianx commentedLast changes look awesome! It is fantastic to see this finally being green!
As we are nearing the RTBC stage.
Proposed commit message:
Comment #258
lauriiiNot much to say but few comments:
I was wondering if it would be a good idea to change this comment to include why because it is possible to override this
Double space before HTMLResponse
Hmm?
This class is missing class documentation and all the methods are also undocumented
Comment #259
wim leers#254: hah :P Done! yched-nitpicking++
Also tried getting rid of the REQUEST subscriber priority stuff, just to double-check that it's still necessary. It still is. So, documented why exactly.
Comment #260
borisson_Read trough the entire patch again, found a couple of really small nitpicks.
Nitpick: there's a double space right before HTMLResponse.
Should we return this early when the X-Drupal-SmartCache Header is 'UNCACHEABLE' as well?
Can we specify an example of the wrapper format? (ie, drupal_ajax, drupal_dialog, ...)
Comment #261
wim leers#259:
max-age=0gets bubbled from the block to the overall page). SmartCache won't cache those pages. Think e.g. the breadcrumb/local tasks/local actions blocks; those all still specifymax-age=0. We are working on auto-placeholdering in #2499157: [meta] Auto-placeholdering, which avoids that infection and puts a placeholder in the place of the block, so that the block can be rendered separately. Which means that once that lands, SmartCache will be able to cache even those pages.So far, I've argued to keep this in for profiling purposes. But, since this patch is now green, I'm removing it now, so that it can really undergo final review :)
Comment #262
wim leers#260:
(Whew, it's getting hard to keep up with all the reviews :D)
Comment #263
borisson_#262.2: This makes SmartCacheSubscriber a lot more readable by removing that extra layer of indentation! Awesome.
Comment #264
wim leers#263: agreed :)
Fixed a bunch of nitpicks/confusing language after a self-review.
Comment #265
yched commentedI was wondering whether we should also set the 'X-Drupal-SmartCache: UNCACHEABLE' header on request/response policy deny.
It seems PageCache doesn't - it in fact never sets its X-Drupal-Cache header to UNCACHEABLE, whatever the reason (max age 0, policy deny...). So maybe we should look into unifying that in a followup ?
Comment #266
wim leersBut the request/response policies simply exclude some things; some of them technically are cacheable (e.g. admin routes/responses).
X-Drupal-SmartCache: UNCACHEABLEreally means that the response itself is uncacheable.(In the future, admin routes/responses may become cacheable; e.g.
/admin/contentwould make a lot of sense to cache.)Comment #267
yched commentedSure, maybe UNCACHEABLE is not the right info there, but "some" explicit info might help ? Or we consider that "SmartCache enabled and X-Drupal-SmartCache header absent" implicitly means "excluded because of policy" ?
(again, totally fine for a followup across page_cache and smart_cache - if at all)
Comment #268
wim leersYes; just like
PageCacheonly addsX-Drupal-Cache: MISSif the response didn't already get excluded by the response policy.Comment #269
yched commentedOk - but then do we want to have X-Drupal-Cache output 'UNCACHEABLE' too on relevant cases ? page_cache currently doesn't distinguish between "excluded by policy" or "response says it's not cacheable".
Side note : opened #2540684: Rename Drupal\Core\PageCache namespace for #237.4 / #239.4
Comment #270
wim leersBut
page_cachedoesn't care about uncacheable. It assumes — like it always has — that every page is cacheable. To fix that, we have #2352009: Bubbling of elements' max-age to the page's headers and the page cache.Thanks for opening that other issue!
Comment #271
yched commentedOooh, right, that's why :-) Then it would be for #2352009: Bubbling of elements' max-age to the page's headers and the page cache to spit X-Drupal-Cache UNCACHEABLE when applicable, to be consistent with what smartCache does with its X-Drupal-SmartCache here, right ?
Comment #272
wim leersYes!
Comment #273
jhodgdonI was asked to take a look at the hook_help and I have a few comments:
a) In page_cache.module -- The standard way for one module to say "check out this other module" is to link to that other module's help page, not directly to drupal.org. So please change this sentence:
to say something like "To speed up your site for authenticated users, see the Smart Cache module" [also note wording change, "for speeding up" is not great], and then turn "Smart Cache module" into a link to that module's help page. You'll need to have a module exists check in there too. For an example of how to do this... well they are in other places in hook_help(), like there's a link in Menu about Block, and in Field about Field UI, Views for Views UI, etc.
b) In smart_cache.module help: Although I found the "Nothing needs to be configured — that is why it is smart!" text amusing, I don't think it is great for documentation. Instead of having a "Configuring" heading and then saying "you don't have to configure it", I think it would be better to just add a sentence to the previous paragraph under "Speeding up your site" to say simply "The module requires no configuration.".
c) I also think that the help is confusing and contradictory. The first line of the help says it caches "parts of pages", but then in the Speeding part, it is talking about whole pages. So this needs some attention. It would be nice if by reading this help I could understand what this module does, why it's better than Internal Page Cache alone, etc.
d) In system_help(), there is a link to the Internal Page Cache module. Probably something should be added there to link to the Smart Cache module too.
Comment #274
wim leersa) Done.
b) Done.
c) Good point. I refrained from explaining that in the documentation, because i) it's advanced, ii) it links the d.o handbooks precisely for that reason :)
d) Done.
You'll probably have more nitpicks, but this should be a step in the right direction :)
Comment #275
jhodgdonMUCH better, thanks!
A few small suggestions:
a) "The Smart Cache module caches pages minus the personalized parts in the database" ... I found that a bit hard to parse. How about:
The Smart Cache module caches pages in the database, minus the personalized parts.
b)
I think that these two sentences could be put into the same paragraph, and I don't think the second one needs to be in parentheses.
The rest looks great, thanks!
Comment #276
dawehnerI really like how it is not required to change a lot of existing code! Great work wim.
We still don't have any constant for 0?
Can we do a return parent::setContent($content);?
It would be nie to document what is in there or at least a pointer to the same information in the render cache service
It would be interesting what happens if you don't vary by user. Could you then say route is actually less detailed then URL decide to cache by URL, which then is available at middleware level ...
+1 for documenting the attributes. I still think attributes aren't that bad as other people considered them to be, at least for internal value.
Do we have some test coverage, maybe a test event subscriber which ensures everywhere that the cache contexts don't vary on the same route for different parts of a test function? This could maybe let us find a couple of problematic places.
So I'm curious why would it not work with rest responses with cacheability metadata?
Somewhere it would be great to document why
Let's answer why, sorry.
That sounds like a workaround for me ... its not necessary the admin theme, right? This is just using the bare html renderer. I would rather mark it as not cacheable directly
Comment #277
wim leers#275:
a) Thanks, applied your suggestion, that is so much better! :) Updated the module description in
smart_cache.info.ymlsimilarly.b) Oh, yes, of course!
@jhodgdon++
#276:
:)
sites/default/default.services.yml.usercache context is present means cache poisoning: the cache will simply not be per user, which means I can see things intended only for you and vice versa, depending on who accessed it first.htmlrequest format, but since #2481453, caring just about the request format isn't enough anymore; we also need to care about the wrapper format. However, you're right; this is not necessary. We can fix it by having the logic that determines which wrapper format to use (MainContentViewSubscriber::onViewRenderArray()add theurl.query_args:_wrapper_formatcache context.Note this also means it's in fact quite easy to do a DDoS attack: just specify random
_wrapper_formatquery arguments. This is a consequence of having "ajax" and "dialog" and "modal" not be actual request formats, but just wrapper formats. But that's out of scope to fix here.What is in scope here is to allow ajax/dialog/modal (i.e. HTML + wrapper format) responses to be cached by SmartCache. Except AJAX responses aren't cacheable yet. But, once they are made cacheable — or at least dialog/modal at first — this will be perfectly possible.
no_cache: trueinstead.Comment #278
wim leersThis is addressing #276.7:
HAH! This is an EXCELLENT EXCELLENT question! You're absolutely right! Since #195, SmartCache is no longer deeply intertwined with
HtmlRenderer. Which is why it is now indeed completely possible to do this in a generic way!Fixed!
Comment #279
wim leersThis is addressing #276.6:
First: It is okay for a response to return different cache contexts: it's possible that only if the user does have a certain permission (i.e. for a certain variant of the
user.permissions) cache context, that something else is shown that itself has cache contexts that aren't otherwise present (e.g. render something that depends on a certain URL query argument, so it addsurl.query_args:foo).Second: this is the most important missing part in the current patch IMHO: test coverage of this logic.
Third: This is basically a second implementation of cache context bubbling for this particular use case. See
RenderCachefor the existing implementation and\Drupal\Tests\Core\Render\RendererBubblingTest::testConditionalCacheContextBubblingSelfHealing()for the extensive existing test coverage.Fourth: I tried getting rid of SmartCache's re-implementation in #189/#212, but failed, because SmartCache deals with Response objects and RenderCache deals with render arrays. It seemed like a bad idea there, but… really, it is not. Duplicating the exact same logic and all of its test coverage, that's just a recipe for disaster.
So, in this reroll, SmartCache now uses
RenderCache, so we automatically have all the needed test coverage :)Comment #282
yched commentedWow. Expanding to all CacheableResponses is huge !
Still wrapping my head around the use of RenderCache :-). Meanwhile, a couple more remarks :
I don't see where that bin is used now ?[edit : never mind, found out, it's specified in SmartCacheSubscriber::$smartCacheRedirectRenderArray]
minor : the phpdoc description is copied from the previous method, that one should be "... returning a CacheableResponse object" ?
Side note, not for the patch here, but adding a cache context to a response is kind of convoluted.
Can't we have CacheableResponseInterface extend RefinableCacheableDependencyInterface and gain direct setters ?
Comment #284
yched commentedshouldn't that be "... so that *RenderCache* is able to cache it" ?
Comment #285
dawehnerWell for sure ... but just assume we don't have something per user. Once you have that, you might be able to have all the other cache tokens available on middleware level.
Comment #286
wim leers#277: The addition of the
'url.query_args:' . MainContentViewSubscriber::WRAPPER_FORMATcache context in #277 caused a whole bunch of test failures. Easy fixes though: we need to update the expectations of some tests.Because it was incorrectly split up from the 2 other patches (278/279), it also fails the
SmartCacheIntegrationTest.(This causes the patch size to increase significantly, even though the changes themselves are tiny.)
#278: Here, the
SmartCacheIntegrationTestpasses again. But it has failures inDrupal\rest\Tests\ReadTest. Because some routes' controllers return different responses based on the request format (rather than having a separate route for each request format.) Solved by making SmartCache always vary by therequest_formatcache context too.#282:
CacheableResponseInterface extends Response implements RefinableCacheableDependencyInterface, and usingRefinableCacheableDependencyTrait. Opened #2541272: Improve CacheableResponseInterface for that.#284: Fixed.
Comment #287
wim leersTrue — to some extent. How are you going to get the route cache context in a middleware? And all the cache contexts that depend on the current user?
But, yes, this is super interesting/potentially powerful stuff to investigate, because it may yield additional performance benefits. Let's do that in a follow-up :) Created #2541284: Investigate moving Dynamic Page Cache into a middleware at the cost of doing authentication manually if needed for that.
Comment #288
fabianx commented#287: Yes, lets explore in a follow-up, ESI and other middlewares have the same problems that cache contexts can only be fulfilled once http kernel processing has started and not lazily.
However we can cache the cache contexts per session potentially and then we can move it pretty early in the chain.
Comment #290
wim leers#288: interesting! Can you post that as a comment on #2541284: Investigate moving Dynamic Page Cache into a middleware at the cost of doing authentication manually if needed, or update the IS?
Forgot one bit; should be green again now.
P.S.: #285 had 100,002 assertions, of which 2 failed. I wonder if it's the first time we have exactly 100K assertions passing? :)
Comment #291
wim leersThis patch links to https://www.drupal.org/documentation/modules/smart_cache, but now that URL actually has a handbook page :) It is modeled closely after the already existing https://www.drupal.org/documentation/modules/internal_page_cache handbook page, but that one has been updated too (for language & content errors).
In the process, I've also updated:
(See #2540814: Make the cache documentation even more awesome.)
Comment #292
yannisc commentedI tried #290 through simplytest.me and I get the UNCACHEABLE header. I'm sure I'm missing something here, which would make it much better to understand if it was mentioned in the documentation.
Comment #293
borisson_Reviewed the patch again after the recent changes, found 2 more nitpicks:
can you change
and then are reusedintoand then they are reused. That makes this a little bit more fluent to read.While this is true for all the code that core (and hopefully contrib) provides, should we mention something about adding cache contexts / cache tags for custom code here, or is that something that should go in the docs on d.o?
Comment #294
jhodgdonThanks for taking a look at the patch @borrison_!
I actually disagree with both of your comments though... I think "then are reused" is clear, and regarding item 2, hook_help() is documentation for end users, not for developers. The documentation for developers should be put into (a) topics using @defgroup and (b) pages on drupal.org. I believe that we already have sufficient developer documentation about cache tags and cache contexts in both places.
Comment #296
larowlanOn the criticals call you mention there is only one blocker left - is that #2082317: Forum history markers ("new" and "updated" markers, "x new posts" links) forces render caching to be per user?
Comment #297
jhodgdonI thought there were also some blockers to page cacheing in the Search module? But maybe they're not the blockers you are talking about.
Comment #298
lauriii@yannisc: if any element has max-age set to 0 it will not cache the whole page. I think now there is multiple places setting it to 0 but it can be worked on after this issue has been fixed
Comment #299
yannisc commented@laurii: Thank you for the clarification. Should we add this in the documentation, along with how we check which elements have max-age=0?
Comment #300
wim leers#292 + #298 + #299: See #258.3 + #261.3.
But basically:, as @lauriii already said:
max-age=0causes SmartCache not to cache. As the IS says:What we still need, is the tools to make it easy to see — while developing — what the cacheability of things are. See http://wimleers.com/blog/renderviz-prototype for that. In fact, I just made https://www.drupal.org/project/renderviz compatible (commit
d99477103e80dce3baa0159801ae6f6abb7b4995) with D8 beta 14. Apply the core patch, install the module, and you can see something like the attached screenshot. As you can see, it is the breadcrumb block that is causing every page to become uncacheable. #2483183: Make breadcrumb block cacheable (currently RTBC!) will make that block cacheable and will this effectively make SmartCache work on most pages.rendervizmax-age=0, which means SmartCache won't cache the page.renderviz(Yes, the visualization — and accessibility — of
rendervizis still lacking. If you want to help out with that, ping me, or file issues/patches at https://www.drupal.org/project/renderviz.)#293 + #294: Nothing to do for me then I guess :)
#296: nope, that's not blocking SmartCache (though it should be fixed for sure; there simply isn't any test coverage for that one, which is why SmartCache is green without that issue having been fixed). The last blocker was #2533768: Add the user entity cache tag to user.* cache contexts that need it, which has since been fixed.
#297: Those can be fixed independently of this issue; they're a pre-existing problem, also for Page Cache. (They were only discovered after Page Cache landed. And yes, I still have that issue open in a tab — just haven't gotten around to it yet.)
#298: Indeed, thanks!
Straight reroll, with one bugfix: creating the SmartCache vs. SmartCache + cacheable breadcrumb block screenshots above using
rendervizmade me discover a bug in HEAD:FormBuilderdoes not pass on the method specified in$form['#method']toFormState. So,FormStatestill thought that the search block's form was a POST form, hence it was still settingmax-age=0on that.(The screenshots include this fix.)
Comment #301
wim leersAnd of course I forgot to mention the most important thing in #300: we're working on auto-placeholdering of things that are detected to be uncacheable, over at #2543332: Auto-placeholdering for #lazy_builder without bubbling. That would've automatically placeholdered the uncacheable breadcrumbs block, and thus allowed SmartCache to still work despite the breadcrumbs block! (Just like SmartCache still works despite the comment form, because the comment form is manually placeholdered.) Of course, it's even better to have the breadcrumbs block just be cacheable, then no placeholdering is necessary.
#2543332: Auto-placeholdering for #lazy_builder without bubbling has several child issues that are ready for final review: #2543332: Auto-placeholdering for #lazy_builder without bubbling + #2543340: Convert BlockViewBuilder to use #lazy_builder (but don't yet let context-aware blocks be placeholdered).
Comment #302
wim leersQuick status update!
#2543332: Auto-placeholdering for #lazy_builder without bubbling landed! If #2543340: Convert BlockViewBuilder to use #lazy_builder (but don't yet let context-aware blocks be placeholdered) now also lands (a soft blocker for that was discovered and fixed already: #2543554: Clean up Help & Statistics blocks), then we won't have uncacheable blocks preventing SmartCache from caching pages anymore.
Alternatively, if #2483183: Make breadcrumb block cacheable lands (it was RTBC, but committer Alex Pott found some areas that needed more work), then the breadcrumbs block — which is the main reason why most pages are not cacheable by SmartCache — will become cacheable and hence SmartCache will also be effective on most pages.
Holding off on rerolling this, since there seems little point at the moment — if you want to review this, #300 is the one :)
P.S.: finally, because #2543332: Auto-placeholdering for #lazy_builder without bubbling was committed, #2543334: Auto-placeholdering for #lazy_builder with bubbling of contexts and tags is now unblocked, and has a green patch too. This will ensure that blocks whose contents contain poorly cacheable things will also be automatically placeholdered.
Comment #303
wim leers#2483183: Make breadcrumb block cacheable just landed. And as said before, this is the last block preventing most pages from being cached by SmartCache.
So, as of right now, if you apply the SmartCache patch, you can actually see it in action.
Patch rerolled. Just one small conflict to resolve: the changes in
NodeTranslationUiTestare no longer necessary, so the patch even became slightly smaller.Comment #304
Anonymous (not verified) commentedSo what is the ETA for core?
Comment #306
dawehnerWe will have fun one #144538: User logout is vulnerable to CSRF lands, don't we? Or will we autoplaceholder that entire menu in that case?
Comment #307
wim leers#304: That depends completely on the reviews this issue gets — it needs to be thoroughly questioned, reviewed before it can be committed, even more so than most patches.
I personally hope that the ETA is :) I've been working towards this for almost exactly 6 months (6 days short of 6 months!) — with the help of many others obviously.
So, turns out the uncacheable breadcrumb block was masking several problems. Hence the 7 failures. I also found one small bug/optimization/simplification in the patch.
Changes:
HtmlRendererwas slightly simplified/optimized.CommentNonNodeTestwere becauseentity_test/structureis actually an admin route, but wasn't marked as such because it doesn't have theadmin/prefix. So, changed the system path for that route. This was introduced several years ago, in "the early new routing system days". I checked with Berdir, and he confirms/thinks that was simply an oversight.ForumTestwere due to a missing cache tag. Test coverage added.Will fix the remaining test failures tomorrow.
Comment #308
dawehnerIt feels we are loosing some test coverage with that we might want to have actually
Comment #309
wim leersHrm, not only is that route change (system path rename) quite large already in #307, I clearly also missed several places. This fixes that. But it makes the patch even bigger. I'm going to split it off into a separate issue so we can make this patch smaller again.
#308: Can you be more specific about what we lose?
Comment #310
larowlanWhat is great here is how little code there is in this patch - looks great Wim - just a few questions and comments
side note: I wonder if its worth creating constants somewhere in the routing system - so we could do stuff like
SomeRouteInterface::FORM_WRAPPER_DERIVATION - 1which would make it clear that the subscriber went just before that particular instance.nit: should we single quote placeholders?
%s/to replace/are being replaced
Can we get a comment as to why we render here but don't use the output/return? We have one added later on in the patch in HtmlRenderer::renderResponse, happy for it to be similar.
Can we get a follow up/issue number here?
debug code still?
The 'works well for websites of all size' is a bit bizarre.
sentence needs work
no need for the comma here
Comment #313
wim leersFixing the last fail. And removing debugging changes that I accidentally included in #307.
Comment #314
wim leers#310: Thanks! Glad you like it :)
page_cache.info.yml's — can you make a suggestion on how to phrase it differently?usestatements; fixed that. I haven't changed the wording here though, it's again mirrored afterpage_cache.module's wording: — suggestions?Comment
100π:)Next: moving out the route change as mentioned in #309.
Comment #315
wim leersAnd here's that part moved out into a separate issue: #2551429: FieldUI should accommodate route options in RouteSubscriber. Will make this patch 14 KB smaller again.
Not postponing this issue on that one; it's just to make this patch easier to review.
Comment #316
wim leersChanges:
system.routing.ymlcan be removed.Comment #317
wim leersI split off two more significant parts of this patch:
Comment #318
wim leers#2551429: FieldUI should accommodate route options in RouteSubscriber and #2551989: Move replacing of placeholders from HtmlRenderer to HtmlResponseAttachmentsProcessor landed! That already helps make this patch considerably smaller.
Because #2551989 has landed, #2349011: Support placeholder render strategies so contrib can support BigPipe, ESI… is now also unblocked!
This is a straight reroll/rebase, with only a single change: an early return statement for debugging purposes that I accidentally included.
Next: get #2552013: Follow-up for #2481453: ensure the 'url.query_args': MainContentViewSubscriber::WRAPPER_FORMAT cache context is set to RTBC (it's close), and #2526472: Ensure forms are marked max-age=0, but have a way to opt-in to being cacheable. I'll try to open additional issues later.
Comment #319
andypostIt's really close! only one nit
only 2 todos! yay!
Suppose they should be uncommented
Comment #320
wim leersSince #318, #2552013: Follow-up for #2481453: ensure the 'url.query_args': MainContentViewSubscriber::WRAPPER_FORMAT cache context is set got RTBC'd and #2526472: Ensure forms are marked max-age=0, but have a way to opt-in to being cacheable saw some discussion.
The next step was to try to move as many of the final small bits of cacheability fixes out into separate issues. Only two of those make sense to go in independently of this patch, because it makes sense to have explicit test coverage for them. The others are very small cacheability fixes that really only make sense in combination with SmartCache.
So, I opened #2554579: Forum index response is missing the vocabulary cache tag & #2554585: Tracker responses are missing a cache tag and a cache context for those two fixes.
#319:
Removed two small leftovers in this reroll. Also rebased on HEAD.
Comment #321
wim leersWoot, #2552013: Follow-up for #2481453: ensure the 'url.query_args': MainContentViewSubscriber::WRAPPER_FORMAT cache context is set landed, and #2554579: Forum index response is missing the vocabulary cache tag landed amazingly fast (it was super trivial after all)!
That means this patch can now shed about 25% of its weight! Even if the other patches land in the mean time, they won't make this patch significantly smaller.
IS updated to reflect that the scope of the patch is now quite a bit smaller, now that the recent problems that this patch uncovered have already been fixed in other issues :)
I'll keep this patch up-to-date, applying to HEAD. It's ready. It's just a matter of deciding when/if this lands.
Comment #322
tstoecklerRead through the entire patch and it is really a very good sign, when someone like me - who has no real in-depth knowledge of the new caching world order - can understand the entire patch not only in broad terms but in its completeness. The code is very nicely architected and thoroughly documented, otherwise I would have had no chance at all to grok this. Awesome work!!!
I have some minor comments, probably not worth a re-roll just for that, though.
Super minor, but I think would be more readable to do:
I might be missing something, but I think this is unused.
Also super minor, but it looks kind of strange to have
#attachedbe a string. Shouldn't/Couldn't it be an empty array?As far as I can tell, this "chains" only a single policy, so why not use the one policy in the first place? The chaining seems overkill here, no?
I think we need a separate change notice for this. I realized this when reading the discussion above about adding
_admin_routeto the entity test routes. So far_admin_routewas pretty much cosmetic, i.e. it determined which theme is used and not (much) more. This makes it so that if you provide a non-admin page which is i.e. an entity list, you will have to take great care about caching metadata.Double newline
Comment #323
effulgentsia commentedWim and I had a long conversation about the concerns that committing this patch (once it's ready) would be disruptive to contrib, so I opened a separate policy issue for all of us to discuss that further. See #2556889: [policy, no patch] Decide if SmartCache is still in scope for 8.0 and whether remaining risks require additional mitigation. Feedback there welcome and appreciated.
Comment #324
effulgentsia commented#2470679: [meta] Identify necessary performance optimizations for common profiling scenarios says that a performance issue should be critical if:
This issue has not been prioritized as critical up until now because we thought that even if it didn't make it into 8.0, it could still be eligible for 8.1, and thus not meet that second criterion. That may still be true, but per #2556889-14: [policy, no patch] Decide if SmartCache is still in scope for 8.0 and whether remaining risks require additional mitigation, I'm less confident of that now, so raising this to critical until we sort that out.
Comment #325
xjmDiscussed with @alexpott, @effulgentsia, @catch, and @webchick. We were mostly agreed that it seemed safer overall to do this issue pre-RC than in a minor. (See extended discussion in #2556889: [policy, no patch] Decide if SmartCache is still in scope for 8.0 and whether remaining risks require additional mitigation.) We also discussed strategies to mitigate the risk of delaying the RC, including:
We also plan to discuss the issue with Dries tomorrow given its potential impacts and disruptions for both the release timeline and the product.
Comment #328
wim leers#2554585: Tracker responses are missing a cache tag and a cache context landed, this needed a reroll for that. The patch again became a bit smaller.
Comment #330
wim leersOops, I rebased incorrectly. There's still Tracker stuff in #328. Fixed. Should be green again now. Smaller again too.
Comment #331
dawehnerDo we have a unit test for this particular change?
Why do we use ! as placeholder? @ should work the same way. Interesting we kinda use both in core all over the place, so nevermind.
I mean ideally we would fix all those cases as well, right? The other argumentation we might want to follow here is that its not worth to cache admin pages, right?
You can use one of the new lines below :P
Out of scope of this issue, but just a thought: Could we introduce something like
$build['#cache']['dependencies'][], which takes care of those kind of merging?So do we plan to defer the enabling by default, until #2556889: [policy, no patch] Decide if SmartCache is still in scope for 8.0 and whether remaining risks require additional mitigation has settled but get this in without it? This could have the huge advantage, that contrib/custom code can already deal with it, which is kinda what the other discussion was about, contrib/custom code need awareness.
There is always time for an ubernitpick: Do you mind adding new lines after each assert... call for better readability?
Comment #332
fabianx commentedI agree we can definitely get this in and then agree on when its enabled - if its opt-in (smart cache response policy) or opt-out (smart-cache response policy).
Even the safeguards module could have a (configurable) response policy that makes smart cache opt-in instead of opt-out.
Comment #333
berdirI've started testing this in my install profile. I've got 32 fails out of 351 test scenarios, most of them likely related to our paywall, which is the hook_entity_prepare_view() use case that @dawehner mentioned before. That shouldn't be too hard to fix, I just need a paywall-access:$nid cache context and that should solve that. I'm not too worried about that or the security implications of this.
But I am worried about cache performance and cache hit/miss ratio. We already maintain an impressive amount of caches and cache variations and this is going to add quite a bit more. Posting this here because it's not really related to the discussion in the other issue.
Unsurprisingly, for uid 1, I ended up with the user cache context. I didn't track down which place(s) added that, I guess it's toolbar, among other things. If toolbar is actually not doing that, then just assume that I'm using that as a theoretical example. You might think, well, that's fine, that's just for the admin(s). Well, it is not. Cache contexts are collected for all requests, for all users. So if you have one admin user with toolbar access and 10k normal users, for which the page could be cached perfectly by their same shared user permissions context, you end up with 10'001 cache variations, not 2. And 10000 of those are identical. As Wim said on the EU criticals call, if you cache everything by user then it becomes a performance burden, not an improvement.
Yes, conceptually, we are working on bigppage and we have the concept of auto-placeholdering but I don't think that actually works already, at least not for something like blocks or the toolbar? (Or I wouldn't have spent 1h+ tracking down every max-age: 0 that was set somewhere on one of the 3 forms on my frontpage (which I changed to use GET) or some blocks that weren't cached yet in the default installation.
I don't think we can make the generic render caching/smart caching smart enough to actually understand the use case above and know that only certain users need a per-user cache variation? (Not unless you have a specific cache context that does just that).
So, here's an idea: Can we, by default/configurable, consider pages with the user context as uncacheable in smartcache and just skip them? Possible even in a more generic way in render caching itself too (caching something by-user is fine, by-url too, both might not be worth caching? (again, configurable)
Going back to the scenario above, then we'd skip the cache for the admin user(s) but we have a single cache entry for the 10k normal users. And for pages where all users get the user get the user cache context, well, then we're back to Wim's statement above, and we might be better of not wasting storage/memory for those caches anyway? And if you really want to, you can enable it.
And over time, as we get bigpipe and auto-placeholdering actually working (not just based on max-age but also on certain cache contexts), smartcache will be able to cache those pages as well.
Comment #334
wim leers#331:
page_cache.module.#332: Oh, so we already seem to have agreement about #331.6. Even better. I'll do that in my next reroll.
#333: Excellent point. Auto-placeholdering is indeed going to help address this. (Blocks are indeed not yet placeholdered, but the patch for that is RTBC.) But I also think your proposal makes total sense: the same reasons for something that is rendered to be auto-placeholdered is also a good reason for SmartCache to not cache it. Done!
Comment #335
wim leersAnd this now does #331.6/#332.
Comment #337
catch#334 is a good change. I was thinking about how much of the auto-placeholdering work was actually a blocker for this going in, but that means it's not.
Comment #339
wim leersWell, that was rather dumb. In #335, I kept the integration test ensuring that the Standard install profile has certain pages cached by SmartCache, but I removed Smart Cache from the standard install profile.
But the failures in #334 are the same. Which means that it's the changes I made in response to Berdir's comment at #333 that caused this. And Berdir already alluded to the root cause: a subtree of the toolbar varies by the
usercache context, which means the entire page must vary that cache context, which as of #334 means that the entire page won't be cached by SmartCache. So, opened an issue for that: #2560401: User module's toolbar tab is only cacheable per user: make that tab use placeholders.Comment #342
wim leersBack to NR, because DrupalCI came back green, PIFR seems to have problems. So, this patch AFAICT is actually green.
Comment #343
stefan.r commentedThis looks quite straightforward, very nice work!
s/to let/so that/
so FALSE is a double negation and means it does.. which is correct, but maybe it would be clearer to just say "Whether the response should be cached"?
"is denied either if x, or if y" might be clearer here?
Maybe "(and hence the expected behavior)" could be either left out here, or elaborated on... as is it doesn't seem to add much?
and let
++
s/why pages/why entire pages/
And maybe we can plug smartcache here?
We could clarify whether we are "caching in the database" or caching "pages [that are] in the database"
Even if large sites shouldn't be using page_cache, maybe s/should be installed/should also be installed/ ? Just so it doesn't appear an either/or (which it does a bit with the current wording)
are we missing the test here? :)
Comment #344
wim leersThanks for the review! :)
First list:
\Drupal\Core\PageCache\DefaultRequestPolicy, so keeping the same. Super nitpick anyway :PSecond list:
Comment #345
stefan.r commentedLooks good, didn't notice I had made 2 lists!
That is clearer, but now we have the same question about /users/ [that are] in the database :P
Would merely replacing in with "into" work?
Comment #346
jibranComment #347
wim leers@dawehner remarked in #331.1:
To which I replied in #334:
#2559011 has just landed. Before it landed, I already wanted to make sure that SmartCache (this issue/patch) would be green once #2559011 landed. Unfortunately the answer was that it would not be green: the patch in the testing issue is red (#2560959-4: Testing issue for #2429617).
The reason the current SmartCache patch is green is because it takes the coarser (inferior) approach of #2526472. But because a more granular (better) approach was taken in #2559011, we now need one more step: #2561775: Forms without $form['#action'] set get their action automatically generated based on current path + query args: cacheability metadata is missing. Then forms will have sufficient cacheability metadata, and the SmartCache patch will be green again. That issue is already green.
This rebases SmartCache now that #2559011 has landed. It adds #2561775, plus a single extra hunk that only makes sense in this patch. If all is well, this should be a green patch again. Once #2561775 lands, this patch will be 15.5 KB smaller again, and will be identical to #344, minus the hunk in
FormBuilder, plus the hunk inSessionTestthat you can already see in this interdiff.Comment #348
wim leersAnother day, another update!
So, while #2561775 solved the remaining problem at hand, @catch remarked at #2561775-8: Forms without $form['#action'] set get their action automatically generated based on current path + query args: cacheability metadata is missing that #2504139: Blocks containing a form include the form action in the cache, so they always submit to the first URL the form was viewed at solved the same problem, but better. It'd cause fewer variations, and hence result in a much better cache hit ratio. This made sense. I'm happy to say that #2504139 has just landed :)
(See
interdiff-revert-previous-solution.txt, which removes the parts of the original solution that aren't also in that alternative solution.)But… that also meant a different solution was chosen. I'd only verified that #2561775 actually made SmartCache green again. So, I reopened my testing issue and started testing: #2560959-13: Testing issue for #2429617. 14 remaining failures, that the more optimal #2504139 had that the less optimal #2561775 did not. Those 14 failing assertions were in 3 tests. Three corresponding, tiny fixes solve those failures. Because they're so tiny, I think it'd be okay to fix them as part of this issue.
(See
interdiff-fix-remaining-failures.txt.)Comment #352
wim leersBot 3128 was broken, it's now taken out of rotation. Re-tested. Should come back green, just like DrupalCI already came back green.
Comment #353
effulgentsia commented+1 to not needing a spin-off issue for only those, but reading the whole patch, I see all of the following file changes that I think make sense to open one final spin-off patch for and commit separately from the SmartCache code:
Comment #354
wim leersAlright, will do!
In my next reroll I will also:
MAINTAINERS.txtentry listing Fabian and me :)Comment #355
wim leersDid points one and two of #354. Next: doing #353.
Comment #356
wim leersAnd now #353 is done too: #2562757: Various small cacheability fixes that are blocking SmartCache. That'll make this patch ~10 KB smaller.
@catch just told me that
MAINTAINERS.txtchanges must be committed by Dries. So removing that hunk in the next reroll (that's probably/hopefully after #2562757 lands); we can do that in a follow-up.Comment #357
gokulnk commentedExcuse me if this is a diversion. But I thought it would help others who are following this issue.
Based on #300 I was trying to set up an instance on SimplyTest.me But SimplyTest.me requires the patch to be on drupal.org So I am uploading the patch and removing it from display.
You can visit the link https://simplytest.me/project/drupal/daf9e2c509149441d4d9a4d1964895179a84a12c?patch[]=https://www.drupal.org/files/issues/renderviz-prototype.patch to directly create the required instance, enable RenderViz module and play around with a command like
renderviz('contexts', 'timezone')in the browser console.Comment #358
wim leers#2562757: Various small cacheability fixes that are blocking SmartCache landed! This is a straight reroll. Now this patch is 100% about Smart Cache, not a single remaining hunk can be done outside of this issue.
I kept the
MAINTAINERS.txtentry because @Moshe said in IRC that Dries can just post a comment +1'ing changes to that file. No more need for a silly tiny follow-up then.Comment #361
wim leersFailed due to broken testbot:
Testbot--
Retesting.
Comment #362
dawehnerThis issue should really land rather earlier than later, in order to ensure that regressions caused by it, can be fixed fast enough.
On the other hand I fear that this issue could defer the RC state to beyond barcelona, but let's see what happens.
IMHO pointless comments
I'm confused why this actually works. I thought that public services cannot be resolved on runtime, and this is basically what ChainResponsePolicy requires you ...
Technically speaking, the request is not cacheable, its rather whether the request determines whether the response is cacheable.
I'm curious whether cache contexts or max age are the more probable reason for non-cacheability on the longrun.
Do you mind adding an @todo for the future, when we have a proper cache service which can deal with cache tags/contexts on its own?
Comment #363
wim leerspage_cache_(request|response)_policyuses.max-age. But surely we can adjust this in the future; once we have enough data, we can micro-optimize this.Comment #364
dawehnerYeah I guess that public: false will be resolved internally, if its pointed around on compile time, and well the construction of services are kinda like internal. Yeah ignore me here, I was just thinking loud.
So I'm curious, does that issue technically depends on the safeguards? Would we open up security issues when we won't have safeguards in core? I guess no, because otherwise we would lack test coverage?
Comment #365
effulgentsia commentedSee the proposed resolution of #2556889: [policy, no patch] Decide if SmartCache is still in scope for 8.0 and whether remaining risks require additional mitigation. Neither this issue nor RC are blocked on the safeguards. But I hope the safeguards land before RC anyway.
Comment #366
effulgentsia commentedI'm late to the party here in terms of actual patch review, but I finally did review it today and like it a lot! I have some nits, but I'd prefer to save them for a non-critical followup. The main thing I'd still like settled here before initial commit is the module name. I don't particularly like
smart_cache, because "smart" is pretty vague in terms of informing people of what it does. I'd like to suggest renaming it topersonalized_page_cache, because I think it basically serves the same purpose aspage_cache, but not limited to anonymous users. I preferpersonalizedoverauthenticated, because it can also be used to cache pages that vary by the user's location or other attributes that can be determined without a login. Note that I'm using the term "personalized" here to encompass both per-user and per-user-group variations, per https://en.wikipedia.org/wiki/Personalization.However, that makes the following not quite fit:
So instead of that, I suggest:
Apologies if naming was already discussed and ideas like this shot down. If that's the case, please point me to where I can read up on that discussion.
Other than naming, once folks here feel that all other feedback on this issue has been addressed, I think this can be RTBC'd and committed.
Comment #367
wim leers+1 to finding a better name.
I think is kind of misleading or at least ambiguous, because it is very easy to interpret it as , whereas the opposite is true (hence the quite long description, which I'd also say is too long).
OTOH, the naming rationale is sound ("it also caches pages, and it is also internal, but …"), so I personally cannot think of a better name.
What do other people think?
Comment #368
klausipersonalized_page_cache is certainly better than smart_cache, which has no meaning. So while there are no better suggestions we should plan for a rename to personalized_page_cache (I also don't have a better idea).
Comment #369
dawehner+1 for using page cache in the name, because this is something people know about.
Comment #370
wim leersI propose we give it some time (12–48 hours?) to think of a better name. Unless we think of a better name by then, we'll go with
personalized_page_cache/.Keeping at NR to gather more feedback.
Comment #371
hass commentedI do not think smart cache is so bad. Very general name may be just "cache" or "drupal_internal_cache" or "drupal_cache".
Comment #372
mcrickman commentedJust a suggestion for a name.
Interactive Page Cache
Comment #373
Pls commentedI would consider naming it "smart_page_cache". More info about what is "smart" can be on module description.
Comment #374
dawehnerTo give a little bit more information I thought about using "contextual_page_cache" to make it clear that it various depending on some context, like authorized user, role or whatever.
Comment #375
fabianx commented+1 to #374, contextual page cache is also what I thought when I read the description, which currently says:
"Personalized Page Cache: context-dependent page caching (for *all* users!)"
The original difference was:
- page_cache was for anonymous users and in a middleware for full and complete Responses
- smart_cache was for authenticated users and for the caching the render array composed from the main content + blocks
Now (at this point in time) the difference is different:
- page_cache is for anonymous users and in a middleware for complete Response objects
- smart_cache is also caching HtmlResponse objects, but before placeholders (the uncacheable or hard-to-cache parts) are replaced and before Assets are computed (so aggregates are not dependent on different placeholders being present - except for ESI / BigPipe / AJAX)
Therefore now it would even make sense to potentially cache anonymous responses in smart_cache now (can be follow-up, can be done on custom sites) - especially if placeholdered parts would have a high invalidation ratio, which is probably a good case for auto-placeholdering with tags, in case of a highly invalidated block, smart-cache would be the 2nd base and only the block would need to be re-computed again and again.
It is just a later layer in the caching hierarchy.
Therefore it has nothing more to do with 'authenticated' users vs. anonymous users.
It is mainly a partial_page_cache.
Which would sound kinda okay.
Thinking more about this.
Comment #376
hass commentedI thought this cache is not just for pages...
Comment #377
fabianx commented#376: True, it is a CacheableResponse cache for any CacheableResponse that is not too granular (not limited to HTML responses).
So maybe (just brainstorming - not all are to be taken seriously):
- partial_response_cache
- cacheable_response_cache
- smart_response_cache
- intelligent_response_cache
- auto_response_cache
- placeholdered_response_cache
- customizable_response_cache
- automatic_response_cache
- rendered_response_cache
- contextual_response_cache
- makes_use_of_cache_contexts_response_cache
- generic_response_cache
- general_response_cache
- secret_response_cache
- drupal_response_cache
- customized_response_cache
- powerful_response_cache
Obviously that whole list also works with *_page_cache ...
Comment #378
chx commented> I propose we give it some time (12–48 hours?)
The US is in general out for Labor Day so let's make this ready for Tuesday morning 9AM eastern time. (You know what I think of this issue, let's not make it even worse by holding up the RC.)
Comment #379
plachtldr; RTBC +1
This looks great to me as well, I found nothing worth pointing out :)
About the naming bikeshed: contextual sounds very good to me from a technical POV, if we want something more user-friendly maybe "Universal Page Cache" or "Universal Response Cache" could work?
This deserves its own t-shirt :)
Comment #380
wim leersI think #373's
smart_page_cachecould be another choice, but I also like #374/#375'scontextual_page_cache, for the simple reason given in #374. But I thinkpersonalized_page_cachethen still is more understandable/less technical sounding thancontextual_page_cache, because there is no need to explain/know cache contexts for that name to make sense.However, #376 makes a good point: this does not only support "pages" (i.e. HTML responses), it supports any cacheable response. OTOH, I think it is fine for the name to just say "page" because that's the 99% use case. And similarly, "page" is more understandable/less technical sounding than "response".
#375: the patch already also works for anonymous users, and always has, that's why the title has always said . :)
Comment #381
effulgentsia commentedThanks for the brainstorming in progress. Let's keep the ideas flowing.
Re "personalized" vs "contextual": I think either is fine, but prefer "personalized" for the reasons in #380. Also, we already have a "contextual" module in core, which refers to "Contextual links", which uses the word "context" in an entirely different way, though maybe it's ok for the implied meaning of the word "contextual" to depend on context :)
Another idea is to name it
multivariate_page_cache, in the sense of involving 2 or more variable quantities. So regular page_cache can only vary on URL, whereas multivariate_page_cache can vary on URL, user permissions, timezone, etc. On the one hand, this is a pretty technical word that might be intimidating to some users. On the other hand, multivariate testing is a well-known concept in internet marketing, and in fact, one use case for multivariate_page_cache could be to support caching of pages varied for multivariate testing.Here's a stab at a .info.yml description for such a name:
Re "page" vs "response": both the existing page_cache.module and the module added by this issue can be used to cache HTTP responses that are not traditional "web pages", such as ESI fragments and non-HTML API endpoints. But I think "page" is still an ok name, because that is the more common use case and the friendlier name. Just like I think that taxonomy.module is a perfectly fine name even though the module also supports folksonomies.
I don't think those are the most important differences for naming purposes. I think the most key difference is that smart_cache can handle more variation/context/personalization than page_cache. That in order to handle such variation it needs to run later and deal with placeholders are just implementation details, and potentially ones that might change between 8.0 and 8.1.
Comment #382
torgospizzaIn #366 @effulgentsia mentioned the word "Dynamic" and that word is also sprinkled around this issue thread. To me that word encompasses the crux of all the work that has been done here:
The cache is dynamic (in that it is highly configurable and flexible). And yet the end results/responses can also continue to be dynamic (in that the data/content/etc. returned can be different across many different contexts, individualized and personalized for both anon and authenticated users).
When in doubt, keep it simple, and I think the term "Dynamic" encompasses every aspect of this major achievement. My two cents anyway.
Kudos all!
Comment #383
achtonMy very brief 2 cents (based on comments from @Fabianx and @torgosPizza) - I think it should be one of:
- Dynamic Cache
- Contextual Cache
I agree with keeping it simple here, and I think that either of those will "look nice" next to the Page Cache module, and will convey some basic information about the differences between the two. To fully grok how/when to use either, developers must read the description/documentation anyway -- you should not try to embed that information in the module name.
Comment #384
Anonymous (not verified) commentedContextual cache is IMHO the best choice but because the fact that we have Contextual links already(mentioned by eff) it could be confusing so I'd stick with dynamic. Although that sounds to me like on every page load/request something new is generated so I'm not 100% sure about that one either....I guess personalized is not that bad afterall.
Comment #385
catchPer #383 I'd probably go for 'Dynamic cache':
- we don't use the word 'Dynamic' for another core module
- it both caches things that are dynamic, and handles not caching things that are too dynamic - i.e. it's a cache for dynamic stuff.
Comment #386
wim leersI quite like the rationale behind "dynamic". So /
dynamic_page_cachemakes a ton of sense to me. "Dynamic" is also a superset of "personalized". E.g. showing real-time stock information is hardly "personalized", but it is dynamic.EDIT: cross-post with catch.
Comment #387
almaudoh commented+1 to Dynamic (Page) Cache too :)
Comment #388
fabianx commented+1 to dynamic_page_cache with reasoning of #385.
Comment #389
dawehner+1
Comment #390
borisson_+1 for dynamic_page_cache
Comment #391
oriol_e9gThe cache is based on Cache Tags (in the documentation saids that cache tags = data dependencies), so, why not a name that includes the words "tags" and "cache"? maybe
tags_page_cacheortags_cache?Comment #392
catchIt's not only cache tags, it's also cache contexts (and #attached bubbling), which are part of all render caching as opposed to just page-level render caching.
Comment #393
cosmicdreams commentedI'm confused. I thought we already hashed it out over the module's name.
Smart Cache is an appropriate name, it's a name that doesn't immediately make it clear how it is smart but gives the reader enough curiosity to unpack the definition.
Look, this is just my opinion, to stay with the original name smart_cache, that's how I'll always think of it. I definitely don't want to stall the release of Drupal 8 because of disagreement over this name.
I'm just confused why we're still debating the name after we spent so much time on this discussion in the past.
Comment #394
rainbowarrayNew rule: If we don't settle on a name before comment #400, we replace the word Smart with Johnny and be done with the discussion.
Comment #395
plachLol @ Johnny Cache
I'm tempted to post 5 more comments :)
Comment #396
effulgentsia commentedI'm happy with
dynamic_page_cacheas well. Renaming issue title as that is the idea with the most +1's so far.Sorry about that. I asked in #366 if there were notes from such a discussion written up somewhere. All I could find on the issue was #227, which did not look like it got any +1's on the issue, though maybe I missed something as I skimmed through the subsequent comments?
FWIW, I don't like "smart" in the name of any piece of software, unless it passes the Turing test, and maybe not even then. So, for example, if we ever decide to rename Views, I would prefer "dynamic lists" over "smart lists". But this is just my preference, it should not be up to me alone. I'm happy that "dynamic_page_cache" got a bunch of +1's in the recent comments though.
Comment #397
plachI'm also +1 on it, if we mean "Dynamic Page" Cache and not Dynamic "Page Cache" :)
Comment #398
jrabeemer commented+1 for Dynamic Page Cache. DPC also sounds nice and isn't used elsewhere.
Comment #399
fabianx commentedAfter again carefully reviewing this, I am RTBC'ing this as it works great and delivers on the promise.
The naming discussion and the Dries approval for the maintainers entry are still open for now, but that can go on while it is being reviewed by core committers and others that monitor just the RTBC queues.
So far as I can see all feedback has been addressed and this is really ready now.
Comment #400
dawehner+1 for a principal RTBC, but I would like to not RTBC it before the rename is done, why?, because the renaming could easily miss some bits like documentation strings.
Comment #401
catchDid mostly a docs/help pass per #400.
I think I missed this first time around. 'Small to medium-sized websites' is a bit meaningless (traffic, content and/or budget might all be small or large individually). We should really say 'use when an external page cache is not available'?
This needs more than just a rename, we should say something like 'Caches pages taking into account dynamic content'.
Sometimes the placeholder contents could themselves be render cached. Low max age, high frequency cache tags etc. aren't personalised but are also taken into account.
Still think this is mis-named - route/params is high cardinality but not in this list.
Can we make this strict?
Also what's the difference between these two?
As before size isn't the determining factor here but infra.
Comment #402
effulgentsia commentedRe #401.2, see the code block in #381: I think the description there is a good starting point.
Re #401.1, I agree, and would also like to see more parallelism between that part of page_cache and dynamic_page_cache descriptions. E.g., for page_cache, "Use when an external page cache is not available", and for dynamic_page_cache, "Use when an external page cache that supports the same features is not available". I'd also be ok with not holding this issue up on getting the exact description/help wording perfect and continuing to tweak those in followups.
Comment #403
wim leersAnd voila, renamed all the things! Including the handbook page: https://www.drupal.org/documentation/modules/dynamic_page_cache.
system.themecontains the default plus admin theme.system.theme.globalcontains the global theme settings. (Look at the correspondingly named YML files.)page_cachemodule. It sounds like we need to revise all of that. So, I'll leave it to you, catch — if you provide the exact wording you want, I'm happy to add that, but if you think it needs discussion, it probably is better to do it in a follow-up?Recommended for reviewing the name change while remaining sane: compare the last patch (#363) with this one, by doing:
That's much easier to review than the interdiff presented here!
Comment #404
kim.pepper/me adds an alias for "git diff --color-words --word-diff-regex=[^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+ -M10%"
Comment #405
rickvug commentedAt the end of the day it time to get this patch in and the naming doesn't really matter. That said, who can resist some good bikeshedding?
My preference is for SmartCache or at least a name without the word "page" in it. The reason being that we're trying to move away from a page or HTML centric world into various types of responses. I like SmartCache because there is some intelligence in the design of the system. Sure, "Smart" is pretty meaningless from a developer's point of view but in marketing-speak SmartCache sounds much more significant and exciting IMO. Take my 2 cents for what you will. Excited to see this go in!
Comment #406
dawehnerI don't get why you make it so complex. Just commit one thing to one branch, and the other thing to another branch.
You worked too much on this issue if you deal with a dynamic page ache ...
What about using its own constant for the header name? On top of that I think we should keep similar and use "X-Drupal-Dynamic-PageCache" maybe?
Is it a new thing that we don't use the classname anymore? I easily could have missed that because I don't care and would ideally not add it.
Comment #407
dawehnerBut yeah, obviously nothing of that blocks the commit, but the constant would be nice!
Comment #408
chx commentedTYpofix. Do not credit me.
Comment #409
dries commentedI just wanted to let my co-committers know that I approve of the change to MAINTAINERS.txt that was introduced in comment #358. Feel free to commit that change along with the rest of the patch when it's ready. Thanks!
Comment #410
wim leersGood idea!
I forgot to mention my rationale for this naming.
page_cacheusesX-Drupal-Cache. Therefore, the logical choice fordynamic_page_cacheisX-Drupal-Dynamic-Cache.So I think the final thing we have to do is bikeshed the header name :P catch also wasn't sure about the naming I used here, but also wasn't sure about what to use otherwise.
Comment #411
catch:D
So my only concern about the header names is that
X-Drupal-CacheandX-Drupal-Dynamic-Cachemake it look like the dynamic cache is part of the page cache. I think it might be clearer if we used X-Drupal-Internal-Cache and X-Drupal-Dynamic-Cache to match the module labels rather than machine names - since then it's more clear that they're side by side.Fine with deferring better header names to a follow-up though it's extremely minor.
With hook_help() I think that'd be much easier to get right on an issue without a pager, but it means more disruption for translators that way since we'd add a bunch of strings then change them again, so not sure either way on that.
Comment #412
marcvangendI'm excited to see this is RTBC.
Not sure if this is the right place to comment on the handbook page, but it needs some care. For instance, the sentence "This feature improves performance because it speeds up the site" means absolutely nothing.
Comment #413
wim leersIn this reroll, I introduced the constant that @dawehner asked for, which will help addressing @catch' renaming feedback when we do think of a better name, in this issue or another.
#412: Agreed! That handbook page — you guessed it — is mirrored after
page_cache's handbook page. And I already did a lot of clean-up, to make both handbook pages acceptable.Comment #414
catchCross-post with #413, didn't review the interdiff yet.
Here's a try on some changes.
The Dynamic Page Cache module caches pages for all users in the database, handling dynamic content correctly.
Caches pages for any user, handling dynamic content correctly.
Caches pages for anonymous users. Use when an external page cache is not available.
Pages are cached the first time they're requested if they are suitable from caching, then the cached version is served for all later requests. Dynamic content is handled automatically so that both cache correctness and hit ratio is maintained.
This reads a bit more aggressive than I think it actually is.
If something within the page has been auto-placeholdered, then the cacheability metadata that caused it to be auto-placeholdered won't be present in the response.
So this is only a fallback in the case that something in the page should be auto-placeholdered, but for whatever reason cannot be. When it has been placeholdered, then while it wasn't worth caching as part of the overall response, it 1. doesn't make the response itself not caching (because we're caching it with a placeholder instead of the content itself 2. might be cached independently per-user.
Or to put it differently, the problem here is not with the cache contexts themselves usually, but the fact that they've bubbled all the way up to the response level without being placeholdered. I think that'd be useful information for the person who ends up reading this comment when they arrive here wondering why their custom controller isn't getting cached by dynamic page cache ;)
Thanks for the @todo, I nearly opened a duplicate of that issue before I spotted the link :P
I had to read this three or four times. The first time I read it, I thought it was saying it's pointless to run before ContentControllerSubscriber. What it's actually saying is ContentControllerSubscriber is pointless when Dynamic page cache runs. Maybe 'not effective' or 'a no-op' instead of 'pointless'?
I think the better reason for excluding admin routes is that they'll have a very low cache hit ration due to low traffic and form submissions. Not sure the cacheability of most admin responses is really lacking - most admin pages are config forms and views at this point.
SmDynamic artCache sounds like a club night.
Comment #415
cosmicdreams commented@#396 That's fair. I guess it would be too bold / assuming for us to declare the cache is smart and that they meaning would be to heavy on cultural meanings. I still like calling it smart cache as it peaks the curiosity into knowing how it's smart or if the reader is too impatient, garners some trust / mistrust in believing that it's actually smart.
Perhaps that's a problem, in that the cache would be become the scapegoat of things it's not really responsible for. I'm a fan of having less magic.
You could have called it Jonny and I would still be happy to see it in.
Comment #416
wim leersAddressed all feedback in #414. Agreed on all counts. (Also, I forgot to commit #408 locally, so that interdiff is included here again. Just a single typo fix.)
:D ROFL!
Comment #417
wim leersDammit, a single stupid space snuck in! Wim--
I think that can be fixed on commit though :)
Comment #418
fabianx commentedUpdated proposed commit message to:
Excluded chx per his own request.
Comment #419
catchUpdated the issue credit per #418 (I think).
Comment #420
wim leersUpdating the proposed commit message, to exclude some people who did not actually contribute, and add some people who did. Reordering by contribution impact.
Comment #421
catchComment #422
catchComment #423
torgospizzaI am thrilled that I was able to contribute even just a little bit. Names are important, and I think Dynamic Page Cache (or "DPC") not only sounds cool and slightly futuristic but also is descriptive enough to provide at least some information as to what it does and how it works. Looking forward to seeing this get committed, and I reaaalllly can't wait to see the benefits! Nice work everyone!
Comment #424
dom. commentedJust saying here, (because I don't know where else), I renamed the link from "Smart Cache" to "Dynamic Page Cache" in the documentation page at /documentation/modules/internal_page_cache.
Comment #425
effulgentsia commentedI read through the patch again, and am about to commit it shortly. Here's some feedback for follow-up material though.
Something is grammatically (and maybe semantically) wrong with that first sentence.
I think page_cache and dynamic_page_cache should use the same cache bin.
"exploits" sounds like something bad and security related. Let's find another word.
I think Fabianx had comments somewhere on this thread to change from caching by 'route' to caching by 'url'. Also, I wonder if 'user' and user.* could be determined without routing (if we can either deal with or ignore the dependencies of authentication providers on routes). In which case, we can get this into an earlier running REQUEST priority and maybe eventually into a middleware.
Exceeds 80 characters.
Trailing space.
AuthenticationSubscriber has 2 request listeners, so this comment is ambiguous as to whether/why it needs to run after the 2nd one. Also, can maintenance mode be dealt with via its already existing page cache kill switch instead of requiring this to run later?
Any reason not to receive $request as a controller parameter instead?
Let's either put something user-specific into the markup or comment why we aren't (i.e., if it's to test this independently of #2558599: Automatically assign user cache contexts/tags when using current_user service or some other important reason to remind future people reading this code of).
Wrong comment for this function.
Shouldn't we have some test coverage for this with the module enabled as well?
Comment #426
effulgentsia commentedI agree. Adding commit credit to @torgosPizza for coming up with the name.
Comment #428
effulgentsia commentedPushed to 8.0.x. Hooray!! Amazing work, everyone! I'm looking forward to seeing this usher in a whole new level of Drupal performance, especially if follow-up and/or contrib work can move this into earlier and earlier running code and eventually into reverse proxies.
Comment #429
jibranCongrats @Wim Leers @Fabianx and all the reviewers. Yay!!!!
Comment #430
torgospizza@effulgentsia: Aw, shucks. You are too kind.
Super exciting. Congrats everyone!
Comment #431
andypostThat needs new change record about new module which should point to updated help page https://www.drupal.org/documentation/modules/dynamic_page_cache
Comment #432
wim leersNext steps, in order of importance:
Change record written: https://www.drupal.org/node/2565453. I did not yet publish this CR so I can work on expanding relevant handbook pages first, so the CR can link to them. I will publish the CR tonight.
#425: filed #2565455: Follow-up for #2429617: small improvements for that, and for your point 4, we already have #2541284: Investigate moving Dynamic Page Cache into a middleware at the cost of doing authentication manually if needed, which I now unpostponed.
Committers: can you add a
dynamic_page_cache.modulecomponent?Comment #433
tr commentedThis commit broke a lot of Ubercart tests in weird ways. I would appreciate some detailed documentation describing the impact and implications of this change in a way that would allow me to determine why my tests are failing, how to fix them, and how to avoid problems in the future. The change records are less than sufficient for this purpose. It seems that this change is awfully fragile in the sense that unless code is written in a very specific (and undocumented) way it will fail to operate correctly. In that sense this is a big break in BC. In my case it broke tests that have been working for more than a year.
Comment #434
fabianx commented#433: As a quick fix (get tests passing again, then fix properly) what you can do is:
- Add no_cache: TRUE to your affected routes in the option section (see https://www.drupal.org/node/2463533 for more information)
- If you return a render array from your controllers, add somewhere where you have dynamic data
$build['#cache']['max-age'] = 0;
This can be close to the return of the controller or anywhere else.
Once you get tests running again, you can gradually make things properly cached with DPC:
See:
- https://www.drupal.org/developing/api/8/cache/tags
- https://www.drupal.org/developing/api/8/cache/contexts
- https://www.drupal.org/developing/api/8/cache/max-age
You can also watch or talk https://events.drupal.org/losangeles2015/sessions/making-drupal-fly-fast... (later half) or review the slides here:
http://wimleers.com/talk-making-drupal-fly-fastest-drupal-ever-near/#/
That should be enough to get you started!
The short version of all that is:
- You need to tell Drupal on which conditions your content varies on, then Drupal can perfectly cache it or decide to not cache it (max-age=0, cache-context = user).
This is very similar to Vary http header and Cache-Control: max-age=... overall.
Comment #435
wim leers#433: You're absolutely right, I completely failed to explain that in the change record! My sincere apologies.
I added useful information to the CR, to help contrib developers: https://www.drupal.org/node/2565453. I also see that that CR was not yet published. I published it now.
If you're stuck, you can find me on IRC, I'd be happy to help you out! Sorry for the pain right now, but the end result will be that sites using your module will be much faster once you've done this work :)
Comment #436
tr commentedMy tests fail with Dynamic Page Cache uninstalled, is that an expected result?
Caching is an optimization; it shouldn't be breaking code that isn't "cache-aware". Requiring code to include caching directives at all stages of development is effectively premature optimization (the root of all evil...), with all that implies.
It's wonderful that this change enables core to be so much faster and allows cache-aware contributions to be sped up as well, but we're not building static sites here - we shouldn't be assuming that everything can be cached unless it explicitly says otherwise. Just the opposite - the reason to generate content via code is to deliver dynamic content, which by definition is changing.
Comment #437
berdirIf it fails anyway then no, it's not related to this, at least not all your test fails.
Nope, don't agree with this at all. premature optimization has nothing to do with breaking or not. Caching always has downsides, in that you need to think about cache invalidation and cache variation. That's by design. premature optimization is about trying to optimize upfront without verifying/profiling that it is necessary.
We're well aware that we're not building static sites and things change. That's why we have cache tags, contexts and max-age built into render caching. Yes, we deliver dynamic content, but in most cases, we dynamically build and deliver the same dynamic content. That's *exactly* what (render) caching is about.
Anyway, every time we comment here, we notify 100+ people. Please open issues in your module's issue queue if you have test fails related to smart cache/DPC, Wim/Fabianx/me/... are happy to help, just tell us about it.
Comment #439
dawehnerTo be fair, this added a critical issue: #2579847: /node/add is lacking cacheability metadata, causes problems when cached by Dynamic Page Cache and "Use admin theme when editing or creating content" is turned off
Comment #440
yesct commentedcaused #2579847: /node/add is lacking cacheability metadata, causes problems when cached by Dynamic Page Cache and "Use admin theme when editing or creating content" is turned off
Comment #441
wim leersWe now have a
dynamic_page_cache.modulecomponent, so let's move it there.