Problem: Drupal's page caching interferes with RESTWS because it only takes the URL path as cache identifier. HTTP Accept headers are ignored, so only one response at a path can be cached. Example: page cache is empty, GET request for HTML to /node/1 arrives, HTML is cached. Then a GET request for JSON arrives at /node/1, but the dumb page cache ignores the Accept header and just returns the cached HTML, which makes the JSON expecting client unhappy. This is why we had to recommend the usage of GET /node/1.json in the recent alpha5 security release.

I can see two possible solutions to move forward from here:

Remove support for GET /node/1 and solely rely on /node/1.json

This is the easy solution and could be implemented fast.

Unfortunately this violates the REST principles, format information should never be encoded into the URL. That's what we have a HTTP standard for: Accept headers. The usage of /node/1.json was intended for simple testing purposes in web browsers, since humans can access URLs easily in them.

When using hypermedia APIs it is important that URLs are consistent. We are outputting references to other resources, and currently we just use /node/1. We would have to change that to be /node/1.json as well, so the format must be known when building those links. Clients should be able to use those links as is (they should not have to add ".json" to URLs).

Disable page caching and implement own cache headers

This is more sophisticated to implement, but would allow us to retain GET /node/1.

The obvious fix to still make GET /node/1 work is to just disable page caching for this path, so the Accept headers would always be processed. However, just disabling the page cache for node paths is an absolute no go to me, since it is a very important feature of Drupal 7. On the other hand, big sites are using reverse proxies anyway that handle page caching. So fago had the interesting idea of just disabling page caching and outputting the appropriate HTTP cache headers ourselves in RESTWS. The reverse proxy of course must be configured to take into account HTTP Accept headers, otherwise we have the same situation as with the dumb Drupal page cache. It would be good to know if for example Varnish has that setting in its default configuration.

If we decide to go with this idea I also think it would make sense to introduce configuration about which resources are enabled and whether page caching should be disabled for them. We want to interfere with Drupal's page caching as little as possible, so RESTWS resources that are not used on a site anyway should not influence page caching. Example: I want a REST API for nodes, but not for taxnomy terms, so I want my menu items and page cache variable for terms to be untouched.

I'm still not sure what solution we should go for, so I would like to hear your input. The first one is simple but dirty, the second is clean but complex.

Comments

drumm’s picture

Issue tags: +drupal.org D7, +porting
greg.1.anderson’s picture

Is there a core bug open against the page cache for this issue? My understanding is that Drupal 8 is going to support web services by default, so I suspect this may already be fixed for D8. Anyone know if that is the case?

As far as the Drush Issue Queue commands are concerned, it makes absolutely no difference whether option 1 or option 2 is selected. However, node/1.json is fairly distasteful as a general solution to providing REST services on top of Drupal, as mentioned above. It seems to me that the best solution would be to make a core patch that added the content type to the page cache key, since that is the correct behavior for a page cache. If this was done as a core patch, I don't think it would be necessary to alter the behavior based on the kind of content that is being served.

It is therefore my thought that the first step would be to start by verifying whether D8 has this problem or not, and posting a core patch to start fixing the problem. If we didn't want to wait for the core patch to land, we could go with the slightly distasteful solution in the short term.

The second solution also seems fine, and certainly better than the first option; however, it also seems harder to implement than just fixing the problem in core. If I were the implementer, I don't think I'd want to go to the effort. If someone did, it would be great, though.

dww’s picture

+1 to fixing this in core and not messing with page caching from inside RESTWS.

sreynen’s picture

I think there may be a 3rd option here: disable the page cache only when the Accept: application/json is set. That wouldn't impact normal pages at all, but JSON requests would skip the page cache, optionally using a separate cache.

It looks like both drupal_page_set_cache() and drupal_page_get_cache() check drupal_page_is_cacheable(), which has a return based in part on a &drupal_static() call. So I think that static value could be conditionally set in hook_init() to FALSE when the current request is for JSON.

dww’s picture

Good point. I'd support that as a reasonable compromise if we can't just fix this in core.

Does anyone know if we've already got a core bug report about this? If so, please link it here.

Also, we should probably update the summary with the other options being proposed (e.g. fix core and this proposal from sreynen in #4).

Thanks!
-Derek

klausi’s picture

@sreynen: that 3rd option does not work. Example:

  1. page cache is empty
  2. Request with Accept: application/json to /node/1 arrives
  3. JSON is delivered as expected
  4. Request for usual HTML to /node/1 arrives
  5. page cache is populated with HTML for /node/1
  6. HTML is delivered as expected
  7. Request with Accept: application/json to /node/1 arrives
  8. Page cache matches and HTML is returned unexpectedly

And here is the core issue to fix page caching: #1303010: Page cache only uses URL as cache ID, not HTTP Accept headers or language

fago’s picture

What about adding a hook to core that allows us to customize the page-cache-key via drupal_alter(). RestWS could add the format there then.

klausi’s picture

As catch already said in the core issue: we cannot invoke hooks at that point in bootstrap, because modules are not loaded yet (not loading stuff is the point of a page cache).

IMO the solution would be to take Accept headers into account when building the cache ID.

Anyway, as the core issue seems to be far away even for Drupal 8 I think we have to focus on the possibilities in RESTWS itself.

fago’s picture

As catch already said in the core issue: we cannot invoke hooks at that point in bootstrap, because modules are not loaded yet (not loading stuff is the point of a page cache).

So it would have to be an ugly callback solution? Yeah, obviously the core fix would have to happen quite fast.

dww’s picture

Issue tags: +project

With this impacting the d.o D7 upgrade, I think we have a much better chance of getting this in core quickly than we usually would. Let's use that to our advantage and just put the energy into fixing it properly. Everyone wins that way. If those of us already participating in this issue (plus DamZ who pointed me here) just team up on the core queue with patches, reviews and RTBC, I think it'll be in before you know it. This is a pretty stellar cast we've already assembled here. :) Let's not waste our precious time figuring out ugly partial solutions.

#1303010-19: Page cache only uses URL as cache ID, not HTTP Accept headers or language

Thanks,
-Derek

greg.1.anderson’s picture

Strongly agree with #10. I'm just surprised that #1303010 isn't already marked critical for #wscci. Sent out a tweet to hopefully draw attention to it; hopefully someone working on #wscci can comment.

https://twitter.com/greg_1_anderson/status/324626305545023489?p=v

drumm’s picture

Anonymous’s picture

i'm confused as to why this matters for d.o.?

don't we use an external cache, and turn off the page cache inside Drupal? so don't we do cid generation and header normalisation in varnish?

Fabianx’s picture

#12: Ah, the page cache for D.org D7 project would be pretty simple to "trick",

In #1768556: Context Mobile Detect does not work with page cache enabled we have done that for context_mobile_detect. (Caution: Only works for Apache, but other support _could_ be added.)

The idea is very simple:

* The page cache depends on https://api.drupal.org/api/drupal/includes%21bootstrap.inc/function/requ...
* This setting can be overridden from settings.php
* Add a function mymodule_preboot() to a special file and invoke it like (hook_boot() is too late)
* Include this file from settings.php

Like mymodule.settings.inc:


require_once dirname(__FILE__) . '/mymodule.module';

mymodule_preboot();

In settings.php do:

include DRUPAL_ROOT . '/sites/all/modules/mymodule/mymodule.settings.inc';

In mymodule_preboot() do something like (pseudo-code):

if ('Accept-Header' ~  'json' && !(request_uri() ~ '.json$')) {
  $_SERVER['REQUEST_URI'] = request_uri() . '.json';
}

Yes,yes this is even better than the solution in context_mobile_detect and will work with all webservers (not just Apache).

It is around 5 lines of code and an additional instruction in the README and it all works :-).

Yes, technically it is still a hack, but one that works wonderfully well.

Anonymous’s picture

seriously, wut? are we using drupal's page cache on d.o.? just tell me to go away if we are, but that's a pretty surprising wtf.

drumm’s picture

Untagging since #1710850: Deploy RestWS for D7 project issue JSON should be okay to track this. Tagging with "Drupal.org 7.1" would be okay.

Drupal.org does use the core page cache, backed by MemCache, and additionally Varnish in front. I don't know all the details, but I think the minimum cache lifetime has been particularly helpful for us.

drumm’s picture

Issue tags: -project, -drupal.org D7, -porting

(untagging)

clemens.tolboom’s picture

@Fabianx in #14 will not work for d.o as varnish has a cached version (for anonymous) of node/1 and probably not recognizing the accept header yet.

@helmo and I are working on #1710850: Deploy RestWS for D7 project issue JSON and 'discovered' drush_iq issues a file download to get to node/1 hoping for a json format which will always fails as no accept header is send. (That probably a drush feature request). drush_iq not only needs the issue node but also it's comments and contributing users which seems way bad. (should become a view I guess)

My question: Is the request node/1.json always available or is it just a developer feature? Can we rely on it for d.o?

iamEAP’s picture

There are some fun things that can be done in hook_boot() to get around the page cache issue.

Fabianx’s picture

Assigned: Unassigned » Fabianx

#19: hook_boot() is too late for page cache, but pre-page cache works perfect as outlined in #14.

#18: This is a configuration issue, which d.org webmasters could do quite easily:

vcl_hash() {
  if (req.http.accept ~ 'json') {
    hash += 'json'
  }
}

I'll work on a patch for restws on d.org ... if I get to it.

drumm’s picture

Drupal.org has settled on keeping Drupal's page cache on #2228897: Review cache values on drupal.org.