Problem/Motivation

In #2158003-62: Remove Block Cache API in favor of blocks returning #cache with cache tags, "Forever" was introduced as an option for the Page cache maximum age setting on the system performance form. If the option is selected, cacheable pages will be delivered with a Cache-Control: public, max-age=31536000, i.e. every browser and intermediary proxy is instructed to cache the response for up to one year.

The following code comment introduced into drupal_serve_page_from_cache() points out the motivation behind this change:

  // RFC 2616, section 14.21 says: 'To mark a response as "never expires," an
  // origin server sends an Expires date approximately one year from the time
  // the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more
  // than one year in the future.'

In another issue which I regrettably cannot find atm, it has been pointed out that Google suggests a maximum of one year for non-expirable content:

Set caching headers aggressively for all static resources. [...] Set Expires to a minimum of one month, and preferably up to one year, in the future. [...] Do not set it to more than one year in the future, as that violates the RFC guidelines. [...]

However the cited RFC-section as well as the Google recommendation are referring to the Expires header, which is a totally different beast than Cache-Control header. It is very important to understand that it is only safe to use the Expires header on static content (i.e. aggregated JavaScript with a fingerprint in the URL). As soon as it is necessary to add a Vary header on the response (e.g. Vary: Cookie), the Expires header needs to be set to a date in the past. This is because Expires is HTTP/1.0 and Vary was only introduced in HTTP/1.1.

I sincerely cannot think of a situation where anybody would like to instruct browsers/proxy caches to store HTML content for more than a couple of hours. It would be insane to deliver any markup with a Cache-Control: max-age of one year. I guess this option was introduced by accident.

Proposed resolution

Remove the option.

Remaining tasks

User interface changes

API changes

Comments

znerol’s picture

Status: Active » Needs review
Parent issue: » #2257695: [meta] Modernize the page cache
StatusFileSize
new1.84 KB
wim leers’s picture

First: AWESOME patch name :D

Blocks are cached inside Drupal, and are hence not governed by HTTP-defined maximum cache durations. That's why Drupal has the notion of "cache permanently", which effectively means "forever". It is logically possible to cache a Drupal block forever — even though that would probably never happen in practice (e.g. when a module is installed, all caches are cleared). That's why "Forever" (corresponding to CacheBackendInterface::PERMANENT) is an option for a block's "maximum age" setting.
The reason pages also received this option is simple: because we wanted consistency in the available choices for the "maximum age" setting across blocks and pages.

You're right that this makes little sense for pages though. The "1 day" maximum age is probably more than sufficient, for the reasons you cite. Even more so, because for pages, Drupal explicitly supports HTTP revalidation (If-Modified-Since and If-None-Match). So I'd be fine with removing this option from the UI.

However, I think that the check to ensure the maximum age never exceeds one year is still necessary, because people might modify their config directly to set values higher than one year, which would then still lead to wrong headers. Thoughts?

znerol’s picture

However, I think that the check to ensure the maximum age never exceeds one year is still necessary, because people might modify their config directly to set values higher than one year, which would then still lead to wrong headers. Thoughts?

The RFC as well as Google page speed document refer to the Expires header, however we're enforcing the max-age directive of the Cache-Control header. This just looks wrong.

Also I cannot think of a use-case where it would be necessary to specify an insanely big default max-age value for rendered markup. This is certainly different for static assets but that piece of code never is run for those.

Therefore I'd prefer to remove this rather confusing check.

wim leers’s picture

Status: Needs review » Reviewed & tested by the community

RE: Expires vs. Cache-Control: great point!

No remaining objections, then. Let's get this in!

catch’s picture

Status: Reviewed & tested by the community » Fixed

I think it's reasonable to set this up to a week - if you have a lot of pages + infrequent updates + cache tag support working how it should with varnish then why not cache for a long time. However yeah, not for a year. Committed/pushed to 8.x, thanks!

  • Commit 2400569 on 8.x by catch:
    Issue #2257807 by znerol: Remove 'Forever' from the Page cache maximum...
znerol’s picture

I'd rather recommend to just set the max-age to the value which should be propagated to the client (e.g. a couple of minutes) and instead increase the beresp.ttl from within vcl_fetch, but only on those responses with the X-Drupal-Cache-Tags header. This approach is robust insofar that responses lacking the cache-tags header for some reason will not be cached indefinitely. It also avoids problems with stale browser caches when users accidentally bypass Varnish (due to misconfiguration or changes in the setup, e.g. when SSL traffic is routed around the reverse proxy).

Set max-age to the maximum you consider acceptable for caches you don't have control over (public proxies, browser cache). Increasing it over that limit is dangerous.

catch’s picture

Well we also allow max-age to be overridden dynamically by different requests.

So for that approach I'd normally set Varnish to respect max-age from Drupal, but then rewrite the value for the responses it serves to be shorter. Otherwise beresp.ttl can't be affected by Drupal at all.

znerol’s picture

Think about the reason why it is currently desirable to enforce a beresp.ttl from within Drupal (7). In my opinion this is due to the lack of an easy-to-use and reliable mechanism capable of selectively purging pages from the gateway cache when content changes (it can be challenging to set up the respective contrib modules on a complex site). However, with cache-tags this restriction is gone and with it also the necessity to specify different beresp.ttl values in Varnish. In fact storing anything with e.g. beresp.ttl = 1y regardless of the max-age value returned by Drupal corresponds exactly to the behavior implemented by the internal page cache since #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

alberto56’s picture

I sincerely cannot think of a situation where anybody would like to instruct browsers/proxy caches to store HTML content for more than a couple of hours.

Here are a two examples:

Right now, in the Acquia article mentioned above, Acquia explicitly suggests setting the max-age to more than one day:

drush @site.env config-set -y system.performance cache.page.max_age 2764800

So Acquia, at least, thinks there is a usecase for this. If you run the above command then visit /admin/config/development/performance, it will show "No Caching" as being selected. If you save, the config will be reverted to zero instead of 2764800.

alberto56’s picture

StatusFileSize
new381.24 KB

Here is another usecase: the Purge module gives a warning if the TTL is less than a month, as in the enclosed image.