Problem/Motivation

  1. #2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!) looks to offer a very nice performance improvement to authenticated users thanks to more caching.
  2. The downside with it is that it requires route controllers to add the cache contexts that they use to the render array that they return. For example, if a module has the following route controller:
    function foo() {
      return [
        '#markup' => \Drupal::currentUser()->getEmail(),
      ];
    }
    

    Then whichever user first visits that route after a cache clear will see his or her email address, but then all other users will see the first one's email address instead of their own. All the module needs to do to fix this is to associate the appropriate cache context as so:

    function foo() {
      return [
        '#markup' => \Drupal::currentUser()->getEmail(),
        '#cache' => [
          'contexts' => ['user'],
        ],
      ];
    }
    

    But, this requires knowing and remembering to do this, and the consequences of not doing this can be bad.

  3. One mitigation that's already in place is that 'user.permissions' is registered as a required cache context, so forgetting to assign that one wouldn't result in information disclosure to less privileged users, but as the above example shows, you can still get into undesirable information disclosure situations among users with the same permissions.
  4. In addition to its own performance benefits, an upside of getting SmartCache in is that it would help module developers discover sooner when their route controllers don't have the correct contexts associated. Thus, additional performance improvements (via BigPipe, ESI, etc.) can be viable in contrib with the benefit of relying on accurate controllers.

Proposed resolution

  1. Add SmartCache to core prior to 8.0.0-RC, and enabled in the Testing profile. Because doing so after RC or in a minor release increases the likelihood of contrib and custom modules failing to provide their cacheability info due to it not being used or tested.
  2. For RC, enable SmartCache in the Standard and Minimal profiles. Because if during RC or during the first few months of 8.0's release, we discover that too many modules or sites have information disclosure bugs with it enabled, we can choose in a patch or minor release to switch to disabled by default, but it's riskier to switch in the other direction.
  3. For RC, have routes use SmartCache by default unless explicitly opted out. Because if during RC or during the first few months of 8.0's release, we discover that too many modules or sites have information disclosure bugs with this default, we can choose in a patch or minor release to switch to opted out by default unless there's explicit opt-in, but it's riskier to switch in the other direction.
  4. Add automagic safeguards for node grants and per-user. To the extent that these safeguards work, they should cover all core contexts that we would consider to be a security bug if missing. Other missing contexts, such as timezone or language, would be bugs but not security bugs. Contrib modules that provide security-sensitive contexts (e.g., an LDAP field) can provide safeguards for them similar to the core safeguards.
  5. Because HEAD is already caching entities, blocks, and Views without the above safeguards, and we haven't yet become aware of widespread problems with that, those issues are not Critical (they will not block an RC) at this time. However, having them in place will increase the likelihood of us being able to retain the enabled-by-default and opted-in-by-default settings.

Remaining tasks

  • Decide if the benefits outweigh the risks.
  • Consider whether there's any ways to further mitigate the risks.

Benefits

  1. Speed: 2-4 times faster page loads
  2. Help surface and fix bugs in contrib
    1. Easily overlooked bugs: Like PageCache, SmartCache has helped uncover lots of hidden bugs, those problems would've bitten D8 sites at some point, so better to discover the remaining ones sooner
    2. Security: Security problems uncovered sooner: if a contrib/custom module has integration tests with SmartCache enabled, it helps ensure security aspects are tested
    3. Best practices: Developers will learn that it makes sense to test things as different users.
  3. More speed
    1. Upcoming BigPipe core patch: SmartCache helps BigPipe have more impact, because it makes BigPipe faster: it allows the initial response to be sent much faster (and so point 5 indirectly helps ensure BigPipe
    2. Unblocks even more speed improvement in contrib via reverse proxies (Like PageCache helped ensure reverse proxy compability for anon users (by forcing module developers to think about cache tags), SmartCache will do so for auth users too (by forcing module developers to think about cache contexts))

Risks

Two choices to make:

  1. When shipping SmartCache, opt-out (module installed by default) or opt-in (module not installed by default)?
  2. Ship SmartCache with 8.0.0 or later (8.1.0)?

Risks in shipping in 8.0 enabled by default:

  1. Uncover more criticals, hence delay release
  2. Broken contrib & custom code (missing or incorrect cacheability metadata): security bugs

But:

  1. Delaying to 8.1 is a way to mitigate risk 1, but it would accentuate the second risk.
  2. Disabling by default is a way to mitigate risk 2 — but with a big caveat: people will read articles and follow advice and in doing so still enable SmartCache, and still be at risk.
  3. Both of the risk mitigation options (delaying or disabling by default) increase the likelihood that module developers won't notice the problem until later if at all, because they don't test with it, because it's not present by default, and from the beginning of the D8 cycle.
  4. Moving SmartCache into contrib suffers all of downsides/risks of either delaying or disabling by default, but even stronger: because it's then completely optional, meaning that an even smaller subset of module developers will test and develop with SmartCache enabled.
  5. If you have a very low tolerance for information disclosure risk, just disable SmartCache (and depending on your level of paranoia, also blocks and/or all of render cache).
  6. The most common way to do access checking, is permissions. To mitigate the risk, we already made the "user.permissions" cache context a required cache context (so that anything that is render cached is varied by this cache context), to ensure that any site whose security depends on permissions, is guaranteed to be secure.
  7. The "delay release" risk can be mitigated by adding the "no_cache: TRUE" option to any route whose controller's cacheability metadata is found to be incorrect.
  8. If SmartCache is optional, any site wanting to use SmartCache needs to audit every single bit of code of every contrib module they use.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

effulgentsia created an issue. See original summary.

Anonymous’s picture

In the context of BigPipe/ESI pushing this out, finally, is a no brainer to me.
Not to mention that Drupal 8 is still in development so how we say developers will have to develop, that's how it will be.
If you think about the "difficulty" of switching to OOP for the most Drupal-bread developers, this is nothing.

Soul88’s picture

I wasn't too active in core issue queues, but as a module developer and constant follower of performnce-related issues I strongly believe this should get in for 8.0.

The reason I think so is that:
- it seems to me that info disclosure won't be critical in 99% cases, even if it's not fixed. And the fix is pretty trivial.
- on the other hand we have clients who don't like Drupal (Drupal 7) because of its slowliness and so now they prefer frameworks like Yii or Symphony for their project. And in direct comparison Drupal 8 is even slower (I know it's not a correct comparison, but not every possible Drupal user is geeky enough to catch the difference). I also think that getting this in, is very important for those future performance comparisons on 10-article sites that show not that much for real-world sites, but will be made and intensively discussed on conferences etc: Drupal 7 vs Drupal 8, Drupal 8 vs Wordpress/...

So I think that disruption is pretty minor as of pre-RC stage, but with this patch it would be much easier to convince non-tech decision-makers to go with Drupal and have one more argument for the "way too many queries on the main page" "experts".

Wim Leers’s picture

Issue summary: View changes

Expanded the IS with a "Benefits" and a "Risks" section.

almaudoh’s picture

I also think that getting this in, is very important for those future performance comparisons on 10-article sites

with this patch it would be much easier to convince non-tech decision-makers to go with Drupal

These are really strong points. Most people are not going to do apples for apples comparisons of D7 -> D8 or WordPress -> D8, etc. Rather, most people will do out-of-the-box comparisons.
I also strongly agree we should have this in 8.0.x

Anonymous’s picture

Uncover more criticals, hence delay release

I see this as a positive thing rather than negative.

bojanz’s picture

Broken contrib & custom code (missing or incorrect cacheability metadata): security bugs

I want my code broken sooner rather than later. "Tomorrow" is a much better time for it than in 6 months when we're all busy building sites.

benjy’s picture

Uncover more criticals, hence delay release

I see this as a positive thing rather than negative.

Agreed.

+1 to getting this in now, the sooner the better for something like this as long as there is no disruption caused directly by SmartCache.

oriol_e9g’s picture

I'm not active in core but +1 for one of the Drupal 8 slogans: fast by default.

SmartCache in and enabled.

swentel’s picture

Uncover more criticals, hence delay release

Seriously, say, another month longer, that's nothing compared to the last few years now :)
And I agree on the security part: I'd rather have this on by default so

  • my custom code can break potentially immediately
  • I can rely on it for my contribs because it's in core, and not on a contrib module
hchonov’s picture

+1 for SmartCache going now in and enabled by default.

I also think now is the perfect time for SmartCache to get in, before all the contrib modules are finally ready ported and released as well the Drupal core. The sooner we find the problems and fix them the better instead of waiting till 8.1 and then having to spend probably a lot of time fixing bugs, which we could do at the moment before Drupal 8 is officialy released, even if it means one or two months release delay.

Drupal 8 brings a lot of changes already and bringing up such a major change in 8.1, when we already have our Drupal 8 sites ready and runing would make our life harder.

webchick’s picture

Everyone seems pretty aligned here, but just for transparency... the core committers have a call this Friday where we plan to discuss this and make a decision, so if you have counter-points, now's the time to get 'em out.

cosmicdreams’s picture

I am channeling my inner Chx here:

Security is the number one concern. It has precedent over every other concern. It has to. We have to.

I get that folks will want to use SmartCache and that by using it will open themselves up to security vulnerabilities if they don't get their cache contexts right.

With that in mind, is there anything more we can do to help contributed module developer mitigate the risk? Specifically can we:

1. Create a standard / copy-and-paste kind of test that checks the results of a route for different kinds of uses. One that contributed developers could modify slightly and include in their contributed module to verify they don't have information disclosures.

2. Can Smartcache prevent information disclosures by having better safeguards.

3. Can we automate tests on contributed modules to detect that they improperly disclose protected information? Warn people that the modules that do are insecure and disable any projects that are insecure over a period of time?

effulgentsia’s picture

#13 is worth thinking about some more. Meanwhile, here's some thoughts on the feasibility of this going into 8.1 vs. 8.0. TL;DR: I'm not very happy with any of them:

  1. We tell contrib authors that the "public API" is that a route controller is responsible for providing accurate cache context information, even if SmartCache isn't in core for 8.0. And that therefore, we can commit this into 8.1 without "breaking the API". But, I think this would violate the spirit of semantic versioning even if not the letter. The reality is that without something that breaks when this condition isn't satisfied, lots of modules won't satisfy it, and then introducing the break into 8.1 would in practice be a large BC break.
  2. We commit something other than this patch before 8.0 that is able to test (whether automated or manual) when a route controller fails to provide the necessary cache contexts. But, I have no idea what this could be. Seems to me that SmartCache itself is the best exerciser of this.
  3. We commit this in 8.1, but with all routes opting out of it by default. And we add some flag for a route to opt in. But, this would mean that many contrib routes would remain opted out, and for those routes, there would be a potentially large performance loss compared to us committing this before 8.0.
  4. We commit this in 8.1, but with some fancy change to or decorator around AccountProxy that automatically pushes a 'user' cache context onto the current render stack whenever one of its per-user methods (such as getEmail()) is called. But, this is just an idea at this point, and I don't know if this idea will be workable or sufficient.

Any other options?

dawehner’s picture

Let me quote myself from some of the email threads:

While I really like smartcache as general idea (really elegant
solution for a problem),
I think, especially at that time of both release and the state of the
project in general, we have to discuss whether the risks
are worth the gains, when we decide to enable it by default. This is
certainly a release manager decision. Something which might work
is something like: Add it to core disabled by default, get experience
in 8.1/8.2, empower our tools and APIs and then enable it by default
in the future.

Block cacheing is a bit different as many designs of sites don't show
that many sensitive information in blocks.

For core, this could lead to another vulcano of criticals over time,
which moves the release further into the future. (note: this is not
about the problem that contrib/custom might not get rid ride from the
beginning).

So yeah I would vote to commit it but not enabled it by default. If we care about Drupal 8 being released sooner than later, I would vote for doing that.
Btw. I give a shit about economical reasons, but I'm worried about the burnout rate of the community. If we loose another month, we risk another level of burnout.

saltednut’s picture

...commit it but not enabled it by default...

+1

This assumes people that use it will understand the implementation details, including potential security risk. I think that is an okay assumption as long as we provide release notes and documentation around implementing SmartCache.

We can keep the current timeline for core release and then work on addressing the points of #13 toward an 8.N release where SmartCaching could become the default for OOTB Drupal in the standard profile.

Block cacheing is a bit different as many designs of sites don't show that many sensitive information in blocks.

I'm not sure if that's a safe assumption. Given that nearly everything on the page is a block in D8...

effulgentsia’s picture

For core, this could lead to another vulcano of criticals over time...So yeah I would vote to commit it but not enabled it by default.

How would the latter help with the former? Because we think core has a bunch of missing cache context associations, and not enabling by default will decrease the rate at which those are found? Or, that when found, we wouldn't prioritize them as critical since it would only affect a minority of sites? I think the latter would only be true if "enabling it" cannot be done via the UI (i.e., if instead of it being a module or configuration option, if enabling it required putting something into settings.php or sites.services.yml).

In either case, I wonder if we can mitigate the release slippage by having a quick fix for when the issues are found (e.g., allow a quick patch in that opts the offending route out of caching (we already have the no_cache route option that does that) with no "Needs tests" requirement) followed by a non-critical issue to fix the issue more properly and add tests. I don't think this mitigates the release delay risk completely, but perhaps by enough?

dawehner’s picture

Having no "needs tests" requirements for security fixes sounds like a pretty shitty idea. Well the thing is that potential every controller, which does not have a problem at the moment, could now cause new issues.

I mean I totally like to fix them rather earlier than later, but I just fear that smartcache will cause issue in both custom/contrib world, not just now, but also later.
I'm curious whether we could provide some wrapper which detects whether at least the common cases are covered with the required cache contexts.

Wim Leers’s picture

Having no "needs tests" requirements for security fixes sounds like a pretty shitty idea.

That's not what the IS nor @effulgentsia are saying.

Making a certain route opt out from SmartCache (by setting the no_cache: TRUE option on that route) also means that any security problems are avoided, precisely because there only is a security problem while that response is being cached.

i.e. any security problems uncovered can be solved in two ways:

  1. Fast fix: Opting out from SmartCache by setting no_cache: TRUE ⇒ impossible to have security problem because of SmartCache ⇒ test coverage for that no_cache: TRUE option is pointless, so is fine to omit.
  2. Proper fix: Fixing the cache contexts ⇒ responses cached by SmartCache now should be varied correctly ⇒ test coverage is required.

I'm curious whether we could provide some wrapper which detects whether at least the common cases are covered with the required cache contexts.

Here's one idea: make WebTestBase::get() automatically do every request twice in case if there's a logged in user. Check that the cache contexts in both responses are the same. That'd at least catch one class of problems.

RainbowArray’s picture

The upside of SmartCache is that Drupal's back-end can be perceived as fast for authenticated traffic as well as anonymous traffic. That's a very big deal and will open the doors for Drupal to be used in ways that would have been difficult before. Personalization of Drupal sites for individual user needs will be feasible, because doing so won't result in the site slowing to a complete crawl. Customizing a site based on authenticated traffic will still have a performance impact for the parts of the page that are customized, but it will at least not break caching on the entire site.

Certainly, the possibility of contrib modules skipping the cache context system and caching things that shouldn't be cached is a problem. There are probably other ways that information can be shared when it shouldn't be through developer error, but since this is a new thing for people to think about, it's understandable the concern for the problems this might cause.

So the question is what is the best way to get the upside without the downside.

Developer education seems like one of the best ways to mitigate risk. In that sense the best time to commit this is any day before yesterday, the further back the better. More time allows for more education and more opportunities to unearth bugs before they are in widespread use.

Conversely, from all of the comments so far, it appears that the best way to increase risk is to commit this further in the future or to commit it now and make its use optional.

If developers do not face the risk of improper caching, it is more likely they won't use cache contexts properly. If SmartCache is in contrib or it is turned off by default, then the likelihood of improper caching showing up goes down, but the likelihood that developers will use cache contexts properly because they need to do so will also go down.

Turning SmartCache on at 8.1 or 8.2 is likely to be more problematic than right now, because there will be more things in the wild that will break.

If we leave this for Drupal 9, and turn this on at the beginning of that cycle, there will still be a risk. Yes, there may be more time for developer education, but in the end there is still a risk some people will do the wrong thing.

I think the upside of using SmartCache is very big. It's a key to Drupal being performant under a wide variety of situations. My vote would be to commit this with SmartCache enabled. Yes, some errors could surface, and that stinks. But the alternative is likely that this won't happen for 3-5 years until Drupal 9.

That too is bad for developer morale. There's a lot of work that's gone into making this possible. It would be great to see that work come to fruition.

cosmicdreams’s picture

I think it's appropriate to look at all the angles with this issue because on one hand Smartcache and everything it enables is super exciting and would be cool to play with. On the other hand it risks security holes and Security is the most important aspect of any kind of software. It must be given precedence over every other concern.

I see a lot of people voting for the idea of including the module but not having it enabled by default. I do not view that as an appropriately secure option. As soon as someone turns that on they open themselves up to a ton of potential information disclosure security holes.

How does one prove that, for their application, they don't have any of these contract-destroying, lawsuit liability threatening security holes? Write a bunch of tests for everything. Does that mean this is the barrier of entry for professional developers who want to use Drupal? Not an impossibly high bar, but a lot of us will have to learn how to pole vault in order to clear it.

Alternative idea: Security first

If there is any chance that a route can be insecure, let's secure it. Make "opt-out" (or in this cache no-cache) be the default for a route's handling of smartcache. That way if a developer naively turns the feature on they won't be compromising their site.

Performance will take a hit, for sure. But as we figure out better ways to ensure the security of routes without compromising the site we can extend the reach of SmartCache and improve performance. With the architecture and APIs in place we could spend a good portion of the 8.x.x development cycle improving performance in this way. We'd learn a lot of things along the way and cut loose older apis whenever Drupal 9 comes around. Maybe even leapfrog what we're doing now. Well, that's the dream anyway.

Questions:

In reading through the documentation for cache contexts, getting them to work for you doesn't seem overly hard. It's proving that everything is going to be OK that will be a barrier to some. How much testing will be enough? How can we be certain every page of the site is secure?

To that end: What is the scope of the security problem we see with SmartCache?
Are we just talking about custom routes?
Are we talking about system driven routes?
Every route?

Final thoughts

I really really want SmartCache, let's figure out a way to have it without risking security of the system / pushing Drupal 8's release to next year.

webchick’s picture

Can drupal_render()/the routing system/$bueller fire an exception if cache contexts are not provided? I think currently they're not a required part of the API, but making them such would force addressing security issues to the forefront.

The downside of that is it would make writing a "Hello World" module require understanding cache contexts, though. :\ OTOH, if this gets committed, regardless if enabled by default or when, that's basically where we're at.

effulgentsia’s picture

I don't know if the following is a good idea or not, but just putting it out there. I'm totally open to either discarding this idea, or tweaking it in whatever way makes sense.

What if as a way to ease the transition for contrib/custom modules, we create an interface (e.g., CacheContextAwareInterface or ReliableCacheContextsInterface) with no methods, but with documentation. Then, a controller class can implement that interface as a way of saying, "all of my controller methods return render arrays whose #cache['contexts'] can be trusted to be complete". Then, we can have something that runs before smart cache writing (e.g., a high priority KernelEvents::VIEW subscriber) that checks if the controller implements that interface, and if not, does something to the render array (e.g., adds 'user' to #cache['contexts'], if that's the only context we think poses a security risk worth safeguarding against, or sets #cache['max-age'] to 0, if we want to be extra conservative). It can potentially also implement a #lazy_builder around that controller in order to still let the overall page structure (i.e., blocks) get cached.

This interface can also provide the appropriate hint to BigPipe, ESI, PageManager, DisplaySuite, etc. for making decisions on how much to trust the cacheability information returned by a controller that those things wrap.

Meanwhile, we can also start out with none (or few) core controllers implementing that interface. And any followup core patch that wants to make a core controller implement that interface needs to demonstrate sufficient test coverage in order to be committed. And we can be committing such patches in both minor and patch releases throughout 8.x.

Then, in 9.x, we deprecate the interface and trust all controllers to provide reliable cache metadata. And in 10.x, we remove the interface.

Sutharsan’s picture

The downside of that is it would make writing a "Hello World" module require understanding cache contexts, though.

A simple Hello World already involves a routing.yml and a controller class. I don't think those extra few lines of code will really harm. I expect many people to copy a working example or use the Drupal console, all it takes is drupal generate:controller.

Berdir’s picture

I was initially quite worried about this, but I've come around and agree that it makes sense to include this.

I'm not sure I see why not enabling it by default helps however. Since when are critical problems in modules that are not enabled by default not critical? If anything, we'd have to explicitly document it as experimental and not officially supported? Which is not something we've ever done I think (except maybe profile.module which we made hidden?) and then we could just as well put it into contrib?

Wondering if enabling it by default and clearly documenting somewhere its implications wouldn't be better? On of the risks is that people will just enable it on production without having properly tested it before, which is a risk that can be mitigated if we have it enabled by default. The biggest risk is very likely custom code, not contrib.

I don't think there are any easy ways to automatically test this. We don't know what the correct cache contexts are, so we can't just build a script that checks them? The best we can do is provide guides on how to write good tests to cover this, which usually means to test the same with multiple users and make sure they all they the correct content. I think we're already doing a lot better for test coverage both in core and also contrib in 8.x than ever before. And as Wim said, page cache enabled by default uncovered a lot of bugs that would have otherwise only been found on actual sites. The situation there is of course a bit different because we already had page cache in core before.

As mentioned multiple times, the default cache contexts include user.permissions. That should cover most of the possible information disclosure problems, What remains is mostly per-user information which can be very problematic as well of course. I'm wondering how we can better protect against this by default. Two things I was thinking about is a) include route access cacheability in smartcache (the fact that it isn't caused one caching problem in content_translation that we could not test/reproduce without it) and b) automatically add the per-user cache context if the current user is passed in as a route parameter (e.g /user/1). But many modules have per-user output (e.g. private message's /messages) that are always by user and also don't need a special check for that in the access check so that doesn't help.

I can offer two things to help understand the impact of this as on contrib/custom project:

a) I can run http://d8status.md-systems.ch/ with the smartcache patch applied.
b) I can try to apply and enable smartcache in our install profile and run our tests, we've found probably dozens of core and contrib bugs over the last year and have a pretty good test coverage. For example, unlike simpletest, I'm running hundreds of test scenarios on the same installation.

(I've written on this a while, I noticed several comments have been posted meanwhile)

Wim Leers’s picture

Can drupal_render()/the routing system/$bueller fire an exception if cache contexts are not provided? I think currently they're not a required part of the API, but making them such would force addressing security issues to the forefront.

That doesn't work, because the mere presence of some cache contexts does not at all guarantee that the developer didn't forget to specify some.

This is exactly why we did #2493033: Make 'user.permissions' a required cache context — after a very long discussion at DrupalCon Los Angeles among many people (Mark Sonnabaum, Wim Leers, Crell, dawehner, webchick, Alex Pott and fgm). This effectively means that any rendered HTML page always has the user.permissions cache context.

This was considered a great compromise that would cover >=90% of potential security problems, because by far the most security-sensitive things are handled by permissions, and hence varying every HTML response (and thus also every response cached by SmartCache) by that cache context would mitigate the vast majority of security problems.

This issue is therefore about A) the remaining <=10%, B) double-checking that we still think that permissions are the >=90% use case.

So, thoughts about how user.permissions always being present is enough or not?

Berdir’s picture

What if as a way to ease the transition for contrib/custom modules, we create an interface (e.g., CacheContextAwareInterface or ReliableCacheContextsInterface) with no methods, but with documentation. Then, a controller class can implement that interface as a way of saying, "all of my controller methods return render arrays whose #cache['contexts'] can be trusted to be complete". Then, we can have something that runs before smart cache writing (e.g., a high priority KernelEvents::VIEW subscriber) that checks if the controller implements that interface, and if not, does something to the render array (e.g., adds 'user' to #cache['contexts'], if that's the only context we think poses a security risk worth safeguarding against, or sets #cache['max-age'] to 0, if we want to be extra conservative). It can potentially also implement a #lazy_builder around that controller in order to still let the overall page structure (i.e., blocks) get cached.

Making it opt-in per controller has already been discussed in the original issue. @Fabianx proposed that in #63 (or maybe that already was a response, didn't check), I +1'd in #91 but Wim was against it in #93.

Note that by controller can cover a lot. For example every entity_view route if we make that controller opt-in, which we really should. Page manager for example also uses an entity view builder to display itself.

Can drupal_render()/the routing system/$bueller fire an exception if cache contexts are not provided? I think currently they're not a required part of the API, but making them such would force addressing security issues to the forefront.

Not sure that makes sense. As mentioned, we already have user.permissions, interface language and theme as default cache contexts, which actually covers quite a lot already. Many controllers probably don't need more or they get it bubbled up through whatever they are displaying.

Wim Leers’s picture

Making it opt-in per controller has already been discussed in the original issue. @Fabianx proposed that in #63 (or maybe that already was a response, didn't check), I +1'd in #91 but Wim was against it in #93.

And I went to check why I said that at #2429617-93: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!), to not force everybody who's following this issue to do so:

See #76's reply to #64 for my rationale. In short: it needs to be opt-out, not opt-in, if we want the D8 ecosystem to actually support it. Otherwise we end up with the same as render caching in D7: core has the API, but nothing actually uses it.

Following that, I listed three reasons, only two of which apply to the proposal made by @effulgentsia here:

  1. The most important reason: opt-out is much better than opt-in, because it forces developers to think about this. Otherwise, it's going to be far too easy for developers to just not think about this and make the Drupal 8 ecosystem slow.

    (Emphasis just copied, but despite this being written 3 months ago, that's exactly the same arguments that have been raised in this issue's summary and in recent comments.)

  2. Hence the only possibility left is to use an empty interface for signaling, but then the "multiple controller methods" problem still exists.

    (And that problem is: what about classes that have multiple controller methods? This empty interface would apply to all controller methods, also when new ones are added.)

rickvug’s picture

We're looking at a 2x speed increase and opening the door to new rendering strategies being viable. Putting on my Product Management hat for a moment, these are massive wins when (arguably) the biggest knock against D8 is its performance profile. My opinion is that the risk/reward calculation is fairly simple at this high level. What needs to be determined is the best approach to take to ensure that D8 is shipped ASAP and contib module authors properly implement cache contexts.

I really like the approach @effulgentsia suggests in #17 to deal with criticals. It would allow Smart Cache to be shipped in 8.x without delaying release. Perhaps Smart Cache would be disabled in RC1 with a message to Contrib to test their modules with Smart Cache before RC2 ships with Smart Cache enabled by default. History shows us that "fast by default" needs to be in place for full adoption by both end users and contrib authors.

dawehner’s picture

because by far the most security-sensitive things are handled by permission

Well you know, this is IMHO a view biased a LOT by working just on core for a long time. For sites where content is actually secret for some users, so those sites,
which actually would have issues with bypass access issues, they use more complex systems, like node access or you know just simple custom code.
So sites which actually have security relevant content protection, would be exactly those sites which could have those issues "caused" by smartcache.

Berdir’s picture

Discussed my route access idea a bit with @WimLeers. To clarify, it doesn't make too much sense, since those two things should be quite separate, the only reason is to catch some additional edge cases where the controller code is not correctly implemented and access might cover that somehow. As I mentioned, we had one case in core where that would have helped but in a way it would have also hidden the actual problem.

effulgentsia’s picture

I'm still not sure if I like or don't like #23, but in response to #28:

what about classes that have multiple controller methods? This empty interface would apply to all controller methods, also when new ones are added

I don't see that as a problem. Controller classes don't usually have a huge number of methods. For example, system module has ~50 routes, but its controller class with the most controller methods, SystemController, only has 4. Which follows a general rule that classes in general should be of a size that can be comprehended. So when you choose to mark a controller class as being responsible for its cache context accuracy, I think it's ok for that responsibility to be carried into new methods you add there later. In the realm of custom modules, assuming any given controller class is maintained by the same person/team over time, this also means that the developer(s) had an opportunity to learn about cache contexts and how/when to associate them, and can then be reasonably expected to apply that knowledge to future methods.

opt-out is much better than opt-in, because it forces developers to think about this

I think that's one of the questions this issue needs to answer: given that Drupal 8 is already in its late beta, almost RC, stage, do we think it's better to force route controller developers to think about cache contexts? I think there's some solid arguments for both sides.

Otherwise, it's going to be far too easy for developers to just not think about this and make the Drupal 8 ecosystem slow

I also raised this as a concern in #14.3. Especially if it only goes in for 8.1, since then many modules will have been written and then sitting around without much attention. But, if per-controller-class opt-in is possible from 8.0-RC onwards, I wonder if that gives the ecosystem sufficient opportunity to opt-in. While I'd like to think that it would, I also realize that it might not, given various research evidence that in many areas, most people tend to stick with whatever is the default choice.

giorgio79’s picture

#13 "Security is the number one concern. It has precedent over every other concern. It has to. We have to."
If security would be the number one concern, no one would be on the web. :) Given leaks happen now daily, the question is not if but when a site will get hacked.

Facebook: “Move fast and break things”
Google's Gospel of Speed: "Research shows that if search results are slowed by even a fraction of a second, people search less (seriously: A 400ms delay leads to a 0.44 percent drop in search volume, data fans). And this impatience isn’t just limited to search: Four out of five internet users will click away if a video stalls while loading. " https://www.thinkwithgoogle.com/articles/the-google-gospel-of-speed-urs-...

So, performance should be the number one concern, and anything that is incompatible with SmartCache due to bugs should be removed from core, not Smartcache :)

Anonymous’s picture

effulgentsia:

What if as a way to ease the transition for contrib/custom modules,...

I think that between D7 and D8 code this statement has no weight to it. And for D8-ready code this change is negligible since updating the module's route response is a few minutes task.

dawehner’s picture

So, performance should be the number one concern, and anything that is incompatible with SmartCache due to bugs should be removed from core, not Smartcache :)

Here is a thing. Recently on a D8 project we had general performance issues on the frontpage. All what was done was to fix client side problems. The 200ms for the non-page cache frontpage has been not needed at all, just saying.
Personally I cannot support a project really which does not have security first, but you know, this is my inner german in me probably. I actually care about data privacy of people, I care about encryption, I want to host my own data. This is maybe though just an ancient view of the internet these days.

Anonymous’s picture

Can't we just autoinject user context in case no tags are provided?

hchonov’s picture

@ivanjaros:
unfortunately no, because this would then cache certain parts only for the current user. But it might be a block, which could otherwise be cached for all the users, because it does not contain any sensitive or user specific / private information.

I mean, enforcing the user context would not cause any problems, but would reduce the performance boost given by SmartCache and putting it automatically in would hide the caching problem from the developers, because then specific parts will be cached for each user.

catch’s picture

I think it'd be useful if we're clearer exactly what class of information disclosure we think are likely via SmartCache vs. the existing entity render and block caching in 8.x, #30 helps with that.

  1. Per role content - because there's no actual access check involved
  2. Listings varied by node grants - because we don't do an explicit entity access check, but instead rely on the results of the query
  3. Per user content - because there's no access check there either, it's just assumed only that user can see it

Per-role content is mitigated by user.permissions being applied everywhere. You'd only run into problems if you have roles used purely for the purpose of displaying different content where permission hashes end up the same. The only idea I have for that would be also applying the user.roles cache context everywhere, but it also seems like the least likely of the three cases for something actually bad to happen.
We already have the user's roles container in the hash - see https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Session%2.... Berdir pointed this out in irc. I was involved in the issue that added this, but forgot we'd done it.

So anything per-role is always fine.

For listings varied by node access, the issue we have is that we always assume that entities have been access checked before rendering - and when listings and node grants are involved, that 'access check' is done in only the query, not by the entity access system itself.

Two possible ways to deal with that:
- check entity access inside rendering too, then render an empty string or throw an exception if access is denied. This is quite a big behaviour change and means double access checking sometimes.
- add the cache context to the current render context from the node grants query alter itself whenever that runs.

If the second option there is viable, that seems not bad to me in general - it's already quite possible for a node listing in a block to forget to add list_cache_contexts, and it's an approach other query altering modules (like a contrib generic entity grants module or CPS) could also use.

That leaves us with per-user content and the example in the issue summary. This is really where there are likely to be risks, since lots of modules add per-user tabs or pages (like https://www.drupal.org/tracker/35733), and unlike per-role content or node grants, they're usually not trying to access control anything, they're just customizing something per-user (which then potentially exposes that user's data).

#14.4 is interesting for that - seems worth investigating at least.

Making routes opt-in doesn't seem as bad to me as others have said. Or at least if the choice is between nothing, off by default, or on by default with route-opt-in, I'd go for the latter.

In terms of implementing that, we could auto-opt-out all routes that don't have something specified, as we already do for admin routes. That then leaves open the possibility for a contrib module or core setting to remove that auto-opt-out again.

Wim Leers’s picture

- add the cache context to the current render context from the node grants query alter itself whenever that runs.

If the second option there is viable, that seems not bad to me in general

This is very interesting. This means that the only thing you have to remember as a module developer is the same thing we already require you to remember in D7: doing ->addTag('node_access') in your query.

I wrote a PoC and verified that it works as expected.

catch’s picture

Wim Leers’s picture

That leaves us with per-user content and the example in the issue summary. […]

#14.4 is interesting for that - seems worth investigating at least.

And here's a PoC for that. (There's a bit of trickiness to avoid a circular dependency.) This is very similar to #2351015: URL generation does not bubble cache contexts.

Manually tested it with the example in the IS, and it works.


Note that both of these PoCs are only possible thanks to #2450993: Rendered Cache Metadata created during the main controller request gets lost.

Wim Leers’s picture

So, if those two PoCs end up in D8 core, then we'll effectively have "automatic cacheability" for:

  • URL generation
  • tokens
  • node access
  • current user-specific stuff being used

And on top of that, we always vary by interface language, theme and permissions.

That sounds like a good compromise to still get the speed gains of SmartCache while reducing DX pain and potential security bugs.

Berdir’s picture

node access makes a lot of sense and should be quite reliable, not so sure yet about the other one. For example, one very common pattern is to do User::load($account->id()) as the account proxy is not a fully loaded user object and even if it would be, we still couldn't access the fields through the proxy.

I think we want to run tests with that, just to see what happens then, for a start. Wouldn't be surprised if some of those methods are called all the time, for example the preferred language methods when language negotiation based on the user is enabled. Which would *not* require per-user caching since we have that covered through the language context. Which I guess would actually work because there is probably no render context available there yet.

cosmicdreams’s picture

It is really great to see progress and great discussion around this issue. Thank you @catch for refining the scope of the information disclosure here.

If we take a step back and think about this from a system-wide architecture perspective, of course we want the system to have secure defaults and handle the access control in a reliable, predictable way. Of course we want to allow developers to build their own thing and in doing so make it possible to override these systems. Of course we should provide detailed documentation about how to safeguard their routes, and verify that what they've made is secure (enough).

We should avoid letting a developer naively stumble into causing security issues. I like the proposals for generating errors to report to the screen whenever a route was created but didn't explicitly specify the cache contexts. If we do that we should point people to the documentation about what cache contexts are and how you should use them. Don't underestimate the power of constant annoyance. It's a great teaching tool.

Questions

  1. Can we make easier for developers to detect they have a security problem? As mentioned above it will be difficult to Unit Test because we don't know what the proper cache contexts should but it should be possible to functionally test this. We just need to know what scenarios we want to test if for, setup those tests and see if we have an improper information disclosure.

    That sounds like a common scenario that developers will want to use to test a lot of different things. So it makes sense to me that we could have some kind of common code that developers can find and repurpose. Might be a good introduction to testing for new testers.

  2. What do you think about making an appropriately annoying (like a message to administrators that is on the screen until the security issue is handled) whenever the security standards has not been met for cache contexts. Alternatively, we could throw this into a D8 version of the Security Audit / Security Kit module.
catch’s picture

I like the proposals for generating errors to report to the screen whenever a route was created but didn't explicitly specify the cache contexts.

We can't do that, because all render arrays automatically get permissions, theme, language etc. applied to them.

We can detect if there are any additional cache contexts defined on top of the defaults, but often there won't and should not be, so warning about the absence of them would not be helpful.

Can we make easier for developers to detect they have a security problem? As mentioned above it will be difficult to Unit Test because we don't know what the proper cache contexts should but it should be possible to functionally test this. We just need to know what scenarios we want to test if for, setup those tests and see if we have an improper information disclosure.
That sounds like a common scenario that developers will want to use to test a lot of different things. So it makes sense to me that we could have some kind of common code that developers can find and repurpose. Might be a good introduction to testing for new testers.

It might be possible to do something like this - it'd be a hack and it'd have to be a development-only thing, since there'll be false positives etc.

- add a listener to render cache sets
- in that listener, get the rendered output for anything sensitive for the current user - user name, e-mail etc.
- if any of those strings are within the rendered string, check if the render cache item has the user cache context
- if not assert() or trigger_error()

But for example, that would fail with this issue comment listing, since it shouldn't have a per-user cache context, and yet my username is shown - just my username being rendered doesn't indicate information disclosure at all. It might work for e-mail addresses though at least more of the time.

RainbowArray’s picture

What if a route controller could set a property, something like cache_contexts_considered = TRUE that would prevent a warning from being triggered if no additional cache contexts are set beyond the defaults. Maybe that particular property name isn't great, but couldn't something like that allow us to annoy developers while giving a way to prevent that annoyance for those who've considered the cache contexts and decided the defaults are fine for that controller?

Soul88’s picture

I wonder if it is possible and make sense to create some kind of status page analog for routes where all the routes with empty cache contexts would be red?

Sutharsan’s picture

Can we make easier for developers to detect they have a security problem?

I'm thinking about exposing cache contexts/tags in the html output to make it visible and easy accessible. Either like Twig debug (in html comments) or like Theme Developer/Firebug (in html attributes + JS + mouse hover). The challenge will be to only show the relevant data, not everything that bubbles up (?). Would this be the right information to determine the (security) holes in the cache configuration?

catch’s picture

@Sutharsan that already exists to some extent in http://wimleers.com/blog/renderviz-prototype

Fabianx’s picture

So in addition to what Wim and catch said please also consider:

Render Caching is already everywhere in Drupal 8.

- Blocks are already cached in that way
- Entities or any content added or attached to an entity is cached that way
- Views are already cached in that way and with that most listings in current core

So there is already pretty much that can go "wrong" in regards to caching - so the more people get educated about the fact they need to deal with that-caching-thing, the better, because then we get less information disclosure in blocks and entities and views as well.

I also agree with bojanz, that break-earlier is better than break-later.

TL;DR: Smart-Cache should go in and be enabled by default. The only remaining question is: Opt-all controllers / routes in by default and opt-out admin + sensitive or opt-in specific controllers / routes for the 99% cases and work with the major core and contrib modules that display pages (panels, display_suite, views, commerce) to enable smart cache on their custom controllers.

Of course controllers are special, so lets examine them:

Train of thought

- Admin routes => opted out of smart cache specifically for now
- "State changing" routes; e.g. the node/1/edit route => should be marked as an admin route as well, so opted-out.
- "Non-state-changing" routes

Usually on a site in my experience 99% of relevant traffic goes to entities or specialized grouping-controllers like display_suite or panels or views.

So with those being on board, smart-cache would already be very might useful.

And usually in my experience people just activate panels caching - without thinking too much about the cacheability problems that might bring.

The big benefit of smart cache is not that the result of the controller is cached, but that the whole HTML + Assets + Placeholders can be cached.

This works because a page usually consists of blocks (within regions) + main content and because the blocks are already render cached.

The big problem we face here is in a nut shell:

We do not know if the main content is cacheable or not.

If the same as for blocks we knew that the controller was cacheable / cached, we would not have a problem, we would just cache what is cacheable.

Already for entity controllers, views controllers and maybe others that is true, because those just return a #pre_render + #cache render array.

In the following I am gonna concentrate on just the 'main content' part of the page, not the rest.


The problem of making the controller opt-in or out is that a controller can have different responsibilities, so the is-this-cacheable consideration needs to be done per route or even per controller callback.


So a possbile opt-in proposal would be:

- Have a smart_cache.controller_default_cacheability very similar to the default renderer config that can be specified for auto-placeholdering which is set to:

max-age = 0, contexts = [], tags = []

With that information and the auto-placeholdering the main content could theoretically be placeholdered (we need to add a placeholder for #title, so that it is calculated after processing placeholders, but that is doable; also would need to preserve the order to have main content first), so we at least save re-creating all the blocks again.

Then we need to decide on a way how to override that default thing. That is the same problem space as the forms issue (Opt out all forms of caching, but allow them to opt-in to caching.)

Maybe the solution is for that returned render array to have a #cache + #pre_render return to override the default cacheability with the #cache data. Or maybe we need a special 'render array' key that it is cacheable, which overrides the default controller cacheability.

e.g.

$build += [
  '#cache' => [
    'max-age' => Cache::PERMANENT,
    'contexts' => ['user'],
    'tags' => ['user:2'],
  ],
];

And that just overrides whatever is set as default controller cacheability, which would still allow someone to opt-in all routes for micro-caching of e.g. 10 seconds and per user.permissions or such.

That is just one possibility.


Another way more simple one is simply to say:

'smart_cache' => TRUE in the route definition, however that means that every module implementing a custom entity on a custom route would need to copy that and not get it automatically - not sure that would be a good or bad thing.

Overall I think the decision should be done per route (as no_cache already does) and in the rare case a mix is needed to return #cache:max-age => 0.


It would be fantastic if we somehow could make use of the information that what the controller returns is already 100% render-cacheable. That _should_ make entities and views automatically work with smart-cache (unless there are some render array levels I can't see currently).

However the second best thing would be if we could rely on max-age being correctly set by all controllers and such knowing all controllers - that don't opt out - are cacheable.


Conclusion:

I am still undecided if I am for opt-in or opt-out.

The pro of opt-in is: Secure by default + 99% of relevant user visited sites are just served by a few controllers. (It would be great if people could back that up with e.g. looking at their New Relic reports, which already for Drupal 7 group things by 'controllers' / the menu callback)

The pro of opt-out is: Fast by default + everyone needs to think about cacheability, hence reducing risk for views / entities / blocks (higher awareness) while slightly increasing risk for custom controllers that display per-user-data on a path that does not contain the users UID.

Fabianx’s picture

+++ b/core/lib/Drupal/Core/Session/CacheabilityBubblingAccountProxy.php
@@ -0,0 +1,171 @@
+  public function id() {
+    return $this->accountProxy->id();
+  }

I think if we bubbled here as well, we would also get berdir's case of loading the user from the account via:

User::load($account->id());

and indeed if you don't want that and need the raw current_user, just use the .non_bubbling version.

I think worth opening an issue for in any case.

Unless there is another way to get to the current user, that should solve the 'user' cache context problem as well.

andypost’s picture

+1 to enabled by default

suppose testing of controllers for cachability metadata is a contrib module task
for example devel could have submodule that iterates over routes and check for #cache* keys in result arrays

PS: related issue should allow us use real User entity so any reason for another proxy?

Wim Leers’s picture

#52: good point! AFAICT that allows us to have full confidence that the user cache context is bubbled automatically when necessary also!


I think #51 is much more complex than just adding clever auto-bubbling solutions for "per-user" (#14.4, decorated AccountProxy, PoC in #42) and "per-node grants" (#39, PoC in #40, green patch with test coverage in #2557815: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter()).

So, let me ask a question:

If we have both auto-bubbling for user.node_grants:$op and user, are the security concerns then fully addressed?

Wim Leers’s picture

Moved #42 in its own issue at #2558599: Automatically assign user cache contexts/tags when using current_user service, now working on addressing @Fabianx' feedback at #53 and test coverage.

xjm’s picture

Crossposting from #2429617-325: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!):

Discussed with @alexpott, @effulgentsia, @catch, and @webchick. We were mostly agreed that it seemed safer overall to do this issue pre-RC than in a minor. (See extended discussion in #2556889: [policy, no patch] Decide if SmartCache is still in scope for 8.0 and whether remaining risks require additional mitigation.) We also discussed strategies to mitigate the risk of delaying the RC, including:

  • Addressing security-sensitive functionality (like node access, users, etc.)
  • Enabling it sooner rather than later to begin flushing out potential bugs.
  • Disabling caching on routes that show cache invalidation bugs.
  • As a fallback, disabling the module in core profiles if there are an unmanageable number of cache invalidation bugs.
  • Not postponing RCs on account of specific/manageable cache invalidation bugs being discovered.

We also plan to discuss the issue with Dries tomorrow given its potential impacts and disruptions for both the release timeline and the product.

dawehner’s picture

I would honestly think that this should be discussed on the security team level as well.

webchick’s picture

Emailed the security team with a pointer to this issue.

pwolanin’s picture

My take is: enabled by default for 8.0.0, opt-in for routes

I think the latter is important to avoid the majority of casual mistakes, while still providing the speed benefit to core.

greggles’s picture

In general Drupal 8 has taken some good steps toward being more secure by default. There's this handy blog post about 10 of them :)

It would be a surprising shift to add this feature and make it's default behavior tend towards being insecure.

What about #21?

If there is any chance that a route can be insecure, let's secure it. Make "opt-out" (or in this cache no-cache) be the default for a route's handling of smartcache.

Is there some reason this can't be done and then patches that add the appropriate cache contexts to the most expensive items can trickle in over time?

Berdir’s picture

It would be a surprising shift to add this feature and make it's default behavior tend towards being insecure.

I don't think that's fair. With the same argument, the improved block, entity, views and render caching in general would have made Drupal 8 insecure. This is just the next step.

To improve security, we a) already always cache everything by users permissions, b) have an issue to automatically cache by node access when invoked and c) have an issue to automatically cache anything by user if personal data of the user is accessed (I'm not convinced yet that part will actually work out as we think, though).

As written above, I've now added the smartcache patch to the daily test run of http://d8status.md-systems.ch/. The result is quite boring, there doesn't seem to be a single new test fail (those 3 changed are afaik random fails that come and go). A few interesting projects haven't been tested yet because they're still broken due to the language config schema change (google analytics, captcha, ds). There's the question of how much test coverage those modules have of course, but there were plenty of test fails when page_cache was enabled by default, when nodes were no longer cached per roles/permissions by default (now changed again), when blocks were cached by default. Not this time. That doesn't mean there are no possible issues but I think they are not as common as you'd think.

Ups, spoke too soon, looks like the patch wasn't applied properly. Will try again tomorrow.

*The thing is this*: Almost all of those modules, and I think that's true for the vast majority of modules are already forced to provide cacheability in many places. Because most of them either provide new or alter existing entity-based output, blocks, views and so on. And custom controllers are often limited to the backend. And invalidation already needs to work for the page cache. This change really only affects modules with *non-admin*, *non-form* controllers that are providing custom output (as opposed to e.g. listing entities) or things that happen outside of controllers (page top/bottom/attachments, for the most part).

If you look at what kind of modules needed specific fixes for cacheablity (not those that we fixed in a generic way like block access and entity/config overrides/alterations), then that list includes modules like forum, tracker, content_translation. Exactly those that have custom controllers that are visible to normal users.

But, as said before, I'm perfectly OK with making it opt-in for now. We can opt in most of core now anyway (including all entity and views output), which should cover most requests that a site receives.

dawehner’s picture

Another thing which we really should not try to forget is to look through all our hooks, alter hooks and events and check whether there are usecases for non statically cached results,
which then results in a need to be able to specify cacheability metadata:

Potential examples:

Berdir’s picture

@dawehner: That's a good point, but both of those hooks aren't a smartcache specific problem. prepare_view() for example can already be a problem if you have a list of entities in a view and change the view mode. But..

a) For smartcache, there is always the fallback to add things directly to the render context. So for that, you always have a way to make it work, if a bit ugly.
b) hook_entity_prepare_view() doesn't get a render array but it gets the entity. I didn't test yet but $entity->addCacheContext(['whatever']) should just work there, thanks to RefinableCacheableDependencyInterface.

dawehner’s picture

Well yeah for sure, just wanted to write it down, before we forget about it.

I do agree there are always sort of workarounds, I guess, but we should try to implement a sort of accessible API.

Wim Leers’s picture

So this was just discussed on the D8 EU criticals call. Key points:

  1. #2557815: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter() fully addresses the per-node access concerns
  2. #2558599: Automatically assign user cache contexts/tags when using current_user service addresses the vast majority of per-user concerns, one that remains is routes that get a User entity injected via a URL argument (e.g. /tracker/UID
  3. moving all of these "safeguards" into a separate, hidden, required module that is enabled on existing D8 sites using system_update_N() was considered the best option. That means all existing sites get it, including those of other distributions, as well as new installs of other distributions.
  4. Also provide a "no safeguards" module that can be used in tests, for those controllers that either A) know that their cacheability metadata is complete, B) want to verify that enabling the safeguards does not cause additional cacheability to appear
  5. MetadataBubblingUrlGenerator should NOT be part of these safeguards, because it is more than a safeguard: it's a work-around to address an incomplete API without huge BC breaking changes.

See #2557815-14: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter() and further comments, where this is happening.

dsnopek’s picture

re @greggles in #60:

It would be a surprising shift to add this feature and make it's default behavior tend towards being insecure.

I also think this is unfair. This isn't really "insecure by default" because the proposal is to add (by default!) cache contexts for a number of things, including permissions (the most common cause of access bypass), user (per #2558599: Automatically assign user cache contexts/tags when using current_user service) and node access (per #2557815: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter()).

So, this will really only affect modules that generate content that varies by something novel which is security related. For those modules, they'll need to carefully consider cache contexts and if they get them wrong could cause an access bypass.

For that reason, I think having routes opt-in to being "smart cacheable" (as mentioned a whole lot of times in this issue) makes sense. If a module author doesn't understand cache contexts, they just don't mess with them. That is until someone is trying to make it performant in production (which is where everyone is going to want to enable caching, right?) and then they can sit down and really figure out what cache contexts they need to use, and hopefully, get it right.

So, from another security team member, personally, I think this proposal sounds acceptable. :-)

Fabianx’s picture

#65.2: Question to that:

#2558599: Automatically assign user cache context & cache tag in current_user service addresses the vast majority of per-user concerns, one that remains is routes that get a User entity injected via a URL argument (e.g. /tracker/UID

How would tracker/UID be a problem? As long as the user ID is in the route it should be cached per smart cache parameter via [route] anyway :).

If we use a container parameter no_safeguards: TRUE, would we then still need to put this into a module?

Overall it makes a lot of sense to be able to test easily with and without the default safe guards.

Berdir’s picture

@Fabianx: The concern is about access. Often with URL's like that, there's some special permission or something that allows you to access all users while normal users can only access themself. But as mentioned in the call, the 95+% use case there is a special access all X permission, and we have that covered with the user.permissions required cache context.

And yes, I'm still not convinced about making it a module myself, especially not the part about making it required and then having yet another module to remove it again. The only advantage I can see is that it would all be together. Either a single parameter to switch it, or providing a services.no_cacheability_safeguards.yml that reverts all the changes still makes more sense to me.

catch’s picture

#67 makes sense, anything in the URL will be covered.

#2540772: Add cacheability metadata to Page Manager's context providers so that blocks and access can be cached accordingly. happened because the context from the URL wasn't automatically applied, so it's actually less likely with SmartCache than a block.

greggles’s picture

I guess it's a good thing I didn't say "insecure by default" ;)

Having automatic protections by permission, user and node access (not entity access?) - seems like a decent solution. I didn't understand that component and I agree that makes this much more palatable.

I still prefer having no caching unless an explicit cache context value is set. We know that drupal_set_title in D6 and below was a huge source of xss. Then we made it filter by default and the problem basically stopped.

catch’s picture

entity_access() is already covered - when you call entity_access() the cacheability metadata is there, since #2287071: Add cacheability metadata to access checks. That isn't a specific cache context, it's the combination of cache contexts that the access implementations rely on - so it could be permissions, or it could be per-user (for edit own nodes) or it could be per node-grants or all three.

Someone could have an entity access implementation that doesn't add the cacheability metadata, but that would already be a information disclosure bug at this point since it'd affect entity reference rendering and other places too (at least for the view $op, for editing we don't cache forms yet anyway).

The problem with node grants is that it doesn't call entity_access() - it's just a query alter. Node grants has its own specific cache context - since different users might have the same node grants - but the spin-off patch from here should handle that automatically nicely. This and the per-user would be the first 'on-demand safeguarding' we've added - required cache contexts are just applied to everything since we assume they won't vary too much.

So I really think the only information disclosure chances are:

- per-user stuff
- node grants query altering without adding the node grants cache context
(hopefully both covered if those patches land)
- custom access/visibility by strange cache contexts like IP address and similar. However even for those, there's the option for a module to add them as a new required cache context (so that every single render cache item varies by IP range or country or whatever) then that's job done for them too.

I still prefer having no caching unless an explicit cache context value is set.

We can auto-opt-out/opt-in at the route level, but we can't detect explicit cache contexts vs. implicit ones - since it's the entire result of the controller that needs to be evaluated, and by the time you get there the implicit/required ones may have already been added (from render cache gets for example).

At one point we were talking about some kind of user.permissions-fallback or similar indicator that the cache context had been added from required contexts vs. explicit ones, but we couldn't figure out a working solution for that.

Fabianx’s picture

#68: However smartcache should be after access as it runs after routing (onRouteMatch event). So I think access for smart cache itself should be fine. (Correct me if I am wrong there, please).

#71: I think we can now do that via the container parameter, then tests can be run with / without required cache contexts enabled easily.

However maybe instead of one MEGA parameter, we should have the possibility to turn single safe_guards off or on.

Like:

cache.safe_guards: TRUE / FALSE # Especially useful for tests, defaults to TRUE

but also allow to selectively disable:

cache.safe_guards.user_permissions: FALSE
cache.safe_guards.node_access: FALSE
cache.safe_guards.user: TRUE

I think we should also have a safe_guards cache context (that is required) so that the cache always varies based on the safe_guards configuration.

Then likely most tests could indeed just drupalGet() twice.

Not sure how viable that is besides the drupalGet, but especially being able to run tests without safe_guards still feels important to me, especially if for the new KernelTestBase(TNG) it is possible to use dataProviders or dependent code to theoretically easily run a test twice - without much performance loss.

Wim Leers’s picture

However smartcache should be after access as it runs after routing (onRouteMatch event). So I think access for smart cache itself should be fine. (Correct me if I am wrong there, please).

Correct.

#71 + #72: RE: container parameter: #2557815-31: Automatically bubble the "user.node_grants:$op" cache context in node_query_node_access_alter() does use a container parameter to disable the behavior. I kinda like the idea of having it in a separate module because it makes it very clear what we want to remove in D9. If not a separate module, then I think a separate namespace Drupal\Core\CacheabilitySafeguards or something like that would also make it sufficiently isolated to group them together.

Crell’s picture

I'm a little confused by some of the earlier comments. Is the proposal to have an optional-but-not-really module that flips ON caching for controllers, or flips OFF caching for controllers...?

Wearing my subsystem maintainer hat (and coming a bit late to the party, I know), adding yet more options to the route definitions makes me uncomfortable. We already have a lot in there, and frankly I think there's flags we've got that I don't even know about. :-)

I am generally staunchly in favor of the "cache on by default" approach, for reasons already mentioned: If it's not in-your-face, no one will think to do it. It sounds like we actually have solutions proposed for most cases, which would allow us to enable smartcache by default with opt-out. The question becomes, then, even if we can't automate finding bugs for people can we make it break in some obvious way before a live disclosure issue happens? And/or, when something breaks can we make it clear what broke so that people can learn what to do right rather than having to guess?

chx’s picture

I wanted to avoid controversy and I tried to sort this out in private. However, since the implementation issue has been raised to critical I can not avoid posting this in public any longer. Sorry. Here's the mail:

While I was really pissed at how Drupal traditions were thrown away during the D8 cycle that's just me fuming. Most people didn't give a damn and past release noone will care. Possibly not even me.

But this "cache all the things". Noone, absolutely noone will be able to write code that caches correctly. Things will show up where/when/for whom they shouldn't and they won't where/when/for whom where they should. People will yell about this everywhere and leave Drupal in droves as a piece of software that's impossible to write bug free code in. I am in a bad position because if I bitch in public about this then it'll swept under the rug "it's chx doing what he always does".

But, here it is and I stand for it: this patch while technologically surely correct is a DX disaster.

pwolanin’s picture

Reiterating my opinion above - if smartcache is going to go into 8.0.0 is should be opt-in (off by default). I agree with chx that this is a DX problem if caching is on where it's not expected.

However, given the time frame we want to ship, I would now lean to saying it should no longer be in scope for 8.0.0.

Crell’s picture

I actually share chx's concern. I am hardly new to pointing out DX issues with render API, and caching is one of them. :-) That's why I've always pushed for coarser-grained caching.

That said, that ship had unfortunately sailed and micro-caching of an AST is what Drupal 8 is going to do. The concern now is how do we make the DX such that developers learn how to cache things properly quickly rather than forget as the always have in the past. Wim and Fabian have put a ton of work into that recently, with the auto-caching of blocks, for instance, and auto-flagging with the permissions context. However, that doesn't mean we're out of the woods.

"Turn it on so that buggy controllers are obviously buggy" sounds great, and is an approach I generally support. However, such an approach requires that making it unbuggy is reasonably self-evident. Even reviewing Wim's patches for the past 4 months I am still not convinced I could write a properly cache-denoted render array without help. We need to provide that help. However, that problem already exists; not just for controllers, but for, say, a field formatter. I sadly do not have an answer for that either way, but we need one regardless of whether we turn on SmartCache.

chx’s picture

> auto-flagging with the permissions context.

I wanted to mention exactly this: it actually makes people even less aware there's caching and context etc. Yes, you can't win. Adding more magic makes it easier but magic , based on my long experience , at the end of the day, always turns around and bites us because people always want to do something the magic did not account for.

dawehner’s picture

I wanted to mention exactly this: it actually makes people even less aware there's caching and context etc. Yes, you can't win. Adding more magic makes it easier but magic , based on my long experience , at the end of the day, always turns around and bites us because people always want to do something the magic did not account for.

Yeah at the end of the day, PEP 20 and its rule of explicit over implicit wins.

What I think we need to do is to clearly communicate (and provide technical tools) that you cannot develop Drupal in production settings (safeguard enabled)
and you cannot run Drupal with development settings (anti safeguard enabled). Gladly for sites which deal with security issues I'd argue that having DEV and LIVE separated is a really common thing.
There was once the try to provide such a boolean flag: #1537198: Add a Production/Development Toggle To Core but for now I think shipping with some form of environment indicator maybe could help, one which is enabled by default, so you need this extra step of going to production. Maybe we could then also provide some things which aren't no longer possible on production mode, which would be needed actually by anyone working on the site (no idea yet, maybe install modules via the UI could be disabled).

Once we have riced the awareness that much the tools like the safeguard and the anti-safeguard would unfold its power/security wins.
Funny enough this means that in dev mode we cache as strict as possible, while in prod mode we would take that back a bit.

Fabianx’s picture

Uhm,

This has gotten a little heaty in here and a lot of controversy / confusion and mixed facts.

Let me try to clean this up a little:

  • a) Viewing a node or a view page currently is perfectly smart-cacheable without any side effects, because all items on the page are already render-cached. => Therefore except for edge-cases like directly changing hook_page_top() / hook_page_bottom() there is no problem there. Also way less probability of information disclosure and the same problem already exists for Varnish / core page cache, which many people can well work with since D7. It is also easier than ever to just remove caching by just issuing a max-age=0 on anything that is displayed and it will propagate up to all levels.
  • b) Admin routes are excepted automatically. That is usually the majority of where sensitive information is displayed.
  • c) Smart cache is after routing and therefore after access checking, therefore any controllers that get their access callbacks right, are not vulnerable.
  • d) user.permissions is already added to every render cache item on the site.
  • e) The most common problematic things will be automatically added as cache contexts, in particular 'node_grants' and 'user', a non-magic version will be always available for those that know what they are doing (and for very edgy edge-cases).

I think in particular c) is very important as it seems people are not aware of that.

So lets examine an example:

/private-messages

is a route that uses the current user, but only has a generic 'can access private messages page'.

If that page uses a view, then it needs to get the current user from somewhere, which will make the view emit the 'user' context already and as the view can be render cached correctly, so can the page.

Okay, so lets take a look at a custom query instead:

"SELECT posts from {table} where uid = :uid"

In that case probably if this is not cached by user it gets apparent very quickly. Case e) would have added 'user' however, so unless they use some magical way to get the current user (and I would not know how), this would work with the safe guards automatically.

Even hook_page_{top|bottom} are by now automatically having a render context as we use a normal render for the first iteration of the HTML page, so that data is correct, too.

Okay, so lets now take a look at a controller that varies on the 'phase-of-the-moon' its output:

Yes, that controllers result would be wrong as it would not be cached varied by the 'moon', but adding a simple #cache:max-age=0 anywhere in the output or adding no_cache to the route would fix it. I don't think that is harder to understand than e.g. dependency injection, plugins, etc.

And if they make a 'phase-of-the-moon' block or view OR if that /moon-phase thing is available to anonymous users, they already have that problem.


The biggest information disclosure problems in my experience happen when caching is turned on because the page is slow some time after development has been finished - not when caching is enabled from day #0.

Because then usually the enabling is rushed, not well tested and using output that is not prepared to be cached, while for day#0 caching usually QA or manual testing catches the bugs.

That is the huge advantage of 'caching-all-the-things' and everyone that tried to add authcache or render_cache or something like that after a site has been build already will know the pain of how difficult it is.


That said: I am perfectly fine with opt-in via a response policy as I think that node/%*, views/% and page_manager/% plus some other front end managers is all that ever realistically need smart_cache style render caching.

It would be great to have it available for all controllers automatically though.


The following is not totally smart-cache related, but I still found it important to answer:

But this "cache all the things". Noone, absolutely noone will be able to write code that caches correctly.

And yet modules like blockcache_alter, authcache, core page caching, etc. thrive in Drupal 7 and there is a huge demand for more. I personally found caching a lot easier to understand than dependency injection, yet we got a ton of contributors who learned OOP, Dependency Injection, Plugins, Entity API, Routing, Twig, etc.

From talking with people at DrupalCon the tenor always was:

"I want a fast site, but how?"

The concepts of render caching introduced here for the first time allow someone that has not yet written 'cacheable' code to use a simple language to express the dependencies and have a page be correct through all layers thanks to bubbling of tags and contexts.

There are 100s of posts on how to enable specialized performance things for Drupal and Wordpress alike; Performance has become a first-class citizen.

I think people that want to write a simple page application will rather use Symfony or any other framework, because it is fast to write and fast in performance as it is not complex.

But the strength of Drupal is its dynamicness _and_ complexity. Both in combination are not compatible with performance as layers of abstraction that make it very powerful, also make it slow (by pure function call counting alone already).

However those layers of abstraction allow to perfectly cache a page and also show like with renderviz how a page is cached.

Those are very powerful concepts that make Drupal ready for the dynamic web.

I agree that we struggled in development to get all those concepts defined:

- "What are cache keys, cache contexts, max-age, tags?"

and yes it took weeks and months to get to "invent" those and specify exactly what they do, but now they are defined and Drupal does the hard work and you only specify the dependencies.

That is not worse than to define a new config entity, or a NodeStorage or a library or a theme inheritance or even a module dependency.

We should maybe take a look at our D7 'controllers' and see which would break if there was such a smart_cache thing for D7 and how to fix it.

I think the only way to find out how complex it would be is to try it out.

Therefore my order of preference is for smart_cache in and enabled by default + cache_safeguards module enabled by default:

  • a) response policy to enable / disable smart_cache is opt-out via no_cache: TRUE or putting a max-age=0 somewhere in the controller output.
  • b) response policy to enable / disable smart_cache is opt-in via smart_cache: TRUE per route; node/%* and views/%* are opted-in, but not for admin routes.

If we get lots of bug reports for a) and lots of problems and people just can't write code anymore, then we can always fallback to b).

While with b) we cannot easily go back to a) as it is a BC break that we can't do.

Therefore my feeling is it would be best to be adventurous, also make use of the on-going security finding bounty and enable it for now with the caveat of making it opt-in before 8.0.0 - if the risk feels too high, there have been too many problems.

That should give us a good idea of what to expect in the future. It will also ensure tests continue to find cacheability bugs for now.

Fabianx’s picture

Some more:

The cacheability safe guards that directly add to the render context ('node_grants' and 'user') can indeed work in a production and development mode:

Because we know what we expect, we can indeed throw a warning / error if what we expected was not in the render context in development and we can automatically add it while in production mode.

Or rather as we have assertions and those need to be turned off for production:

We can always assert the correct contexts are there for those that safe guards support.

That would mean: Automatically adding + error if it was not explicitly added. => Safe + Nagging => Win!

It would also mean less magic, but I don't think user.permissions is a use case for that, because except for tests, where it would be beautiful to turn that off to assert the right cache contexts, it is generically useful.

Especially because the API is:

Renderer::addToRenderContext() and nothing besides safe guards uses that, we can add an optional parameter and then assertions are perfect for that.

$renderer->addToRenderContext($cacheability, TRUE); // Yes, check if whatever was added would have been present without it.

That only technically means we have to have an 'enterRenderContext' and 'leaveRenderContext' "event", which only happens in two cases:

a) A #cache is present in the render AST
b) A new context was opened explicitly.

This is getting technical ...


Anyway TL;DR:

- We can definitely have magic in production and explicit nagging in development (for all render caching).

#77:

All you need to do is in your controller that varies per user:

$build = $this->buildMyRenderArray();
$build['#cache']['contexts'][] = 'user'; // Add this one line and your output varies by the current user.

return $build;

The same works for a block, view, field_formatter, etc.

chx’s picture

> We can definitely have magic in production and explicit nagging in development (for all render caching).

Well, if there's indicators in development... my concerns were mostly blocks but then again probably the kind of blocks I am worried about will be views so... maybe.

catch’s picture

I'm also tending towards the following;

- enabled by default, with the API supporting both explicit opt-in and opt-out for routes (we already have no-cache so it'd be opt-in that needs figuring out).

- add cacheability_safeguards for node access and per-user stuff.

- see how much things break.

- switch to opt-in if things break a lot.

I don't think we can switch from auto-opt-out to auto-opt-in easily, that's much more likely to result in information disclosure on existing sites/contrib that were fine before. If we have auto-opt-in first then those bugs will get fleshed out and auto-opt-out just means some routes don't get cached that might have been OK.

Crell’s picture

I agree that, tactically, reaching further now and falling back to a less aggressive approach later if necessary is the safer approach. To that end, it sounds like SmartCache on, opt-out is the best approach for the time being and we can fall back to opt-in during RC if we really need to. Although I'm still not sold on this "hidden required module" thing. I don't see how that's a good idea at all.

Fabian: For me, the big question is affordance. One reason that I have been a strong proponent of pure functions and statelessness (aside from them being inherently simpler and easier to understand) is that they are inherently memoizable. If you do something wrong to make them not so, it's painfully explicitly obvious, perhaps even a compile error. That makes knowing what you did wrong easy, and knowing what to do to fix it ("make the function properly explicit and pure") straightforward (although not always easy).

Drupal didn't go full pure functions, though, for various reasons (both really good technical ones and some less good ones); instead, we're taking an in/out/track approach. Fine, it is what it is, let's make that work. The problem is that in/out/track is a lot less obvious when you get it wrong, which means it's going to be wrong more often. We've already found that core's getting it wrong in all kinds of places every time we tweak something. It's also a lot less obvious how to make it not-wrong.

Your argument is basically "we can auto-cover the 95% case, and it's easy enough to opt-out in the other 5% if you don't want to bother getting it right". That's a fair argument, although I agree with dawhener that the amount of magic we can allow before it's too much is very small, and the cost of doing so potentially large. (Auto-adding a user context every time we access the current account, for instance, is such tight architectural coupling it makes me ill.)

But that does not preclude "make it way easier to know when you've done something wrong and what to do instead". With anonymous arrays there's an upper bound to how much of that we can do, but it's something that we still need to do. Perhaps this becomes a separate "render array DX" issue, but if we're talking about making the final switch to "if it's not a cache-aware render array, you're doing it wrong!" that's a question we have to seriously consider.

Disclaimer: I'm finally working on a book for D8 development, and how to explain things step by step without a circular dependency of knowledge is so far the biggest challenge. :-) Hence my concern about the height of the "must be this high to ride" bar as it pertains to caching and security.

effulgentsia’s picture

Issue summary: View changes
Status: Active » Reviewed & tested by the community

Thank you, everyone, for lots of thoughtful discussion. Per #12, @Dries, @alexpott, @catch, @webchick, @xjm, and I discussed this issue on Friday, and basically came to the same conclusion as #83. Updating the IS accordingly and marking RTBC, but will wait on marking "Fixed" in case anyone feels that something important is left unaddressed and wants to comment about that. Possibly the most contentious part of the proposed resolution is the opted-in-by-default choice, especially with several security team members requesting the opposite. But I hope the rationale that we can change that choice, but not its inverse, during RC, based on what we learn, alleviates those concerns.

I'll follow up with another comment addressing some of the feedback that's been posted over the weekend.

fgm’s picture

I tried to follow the whole discussion, and maybe missed some of it ; however, at the end I think there's been a duality being discussed all along between default on + opt-out vs default off + opt-in, but there is a third route apparently not considered : neither default to cachable nor non-cacheable, but force cache information to be provided, possibly as no-cache of course, so that there is no defaulting. It seems this gets the benefits of both approaches, at the limited costs of forcing the addition in a myriad of places ; but as we know, this type of many-times-a-tiny-change is something we've always been best at in terms of community output.

One thing which could alleviate such a trouble could be to to further along the automatic generation already in place for permissions and others, by adding the context for any loaded entity to a cacheability object, which would be available as a default anywhere data is being built. That way, developers building data could just add this default (overly conservative because it contextualizes by everything) object instead of having to ponder about actual cache contexts ; and then, going further could slice and dice from this, or even build their own if they take the time to reflect on their actual cacheability.

This combo of forcing cacheability to be explicit (vs implicit one way or another, cf PEP20), plus providing a very simple default which errs on the safer side rather than the faster side, seems more attractive to me than either of the existing discussed branches of the existing alternative.

But then I may very well be missing something.

greggles’s picture

But I hope the rationale that we can change that choice, but not its inverse, during RC, based on what we learn, alleviates those concerns.

That makes some sense, but I think we're more likely to see the problems come in after 8.0.0 is released and contribs get updated. I guess that if that happens we can consider making this change as part of 8.0.x release?

effulgentsia’s picture

Assigned: Unassigned » effulgentsia
Status: Reviewed & tested by the community » Fixed

I guess that if that happens we can consider making this change as part of 8.0.x release?

Yes, the issue summary's proposed resolution items 2 and 3 already mention that.

I just now committed #2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!), so marking this issue "fixed" as well. However, I still want to respond to some of the comments here about DX and other concerns, so assigning to myself to remind myself to do that when I can make the time.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.