Problem/Motivation

As of #2019123: Use the same canonical URI paths as for HTML routes, content negotiation is used to determine the response format of REST requests. Since REST clients are served from the same URLs as browsers, this applies to every request.

However, in order to make content negotiation via Accept request header work with external caches, an application needs to add Vary: Accept to affected responses. Otherwise an external cache will mix them up and REST clients / browsers are served with responses in random formats.

One big difference Drupal has from other REST APIs it serves HTML on the same path, which causes problems in browsers.

Accept header use cases

  • Modal dialogs
  • REST API responses

Because of these two completely different use cases, we might want to tackle them differently.

Examples of major REST APIs "in the wild"

APIs using proper accept headers without a format extension (*.json for example)

APIs using format extensions (*.json for example) and ignoring Accept headers

APIs using different URLs (e.g. /api prefix)

  • Facebook: https://developers.facebook.com/docs/graph-api/overview/
    GET graph.facebook.com
      /facebook/picture?
        redirect=false
    

    Facebook does not use any inline hyperlinks for API consumers, they only provide IDs

  • Google API https://developers.google.com/ad-exchange/buyer-rest/v1.3/
    GET https://www.googleapis.com/adexchangebuyer/v1.3/accounts/id
    

    embedded API links do not use a format extension.

  • Amazon: http://docs.aws.amazon.com/AWSEC2/latest/APIReference/Query-Requests.html
    https://ec2.amazonaws.com/?Action=RunInstances
    &ImageId=ami-2bb65342
    &MaxCount=3
    &MinCount=1
    &Placement.AvailabilityZone=us-east-1a
    &Monitoring.Enabled=true
    &Version=2014-10-01
    &X-Amz-Algorithm=AWS4-HMAC-SHA256
    &X-Amz-Credential=AKIAIOSFODNN7EXAMPLEus-east-1%2Fec2%2Faws4_request
    &X-Amz-Date=20130813T150206Z
    &X-Amz-SignedHeaders=content-type%3host%3x-amz-date
    &X-Amz-Signature=ced6826de92d2bdeed8f846f0bf508e8559e98e4b0194b84example54174deb456c
    Content-type: application/json
    host:ec2.amazonaws.com
    

    No links, no format extensions.

Browser support

Vary: Accept in browser caches (e.g. Safari, found via this blog post).

See also: http://www.newmediacampaigns.com/blog/browser-rest-http-accept-headers

CDN support

All CDNs need to have a way of either respecting the Vary headers (and should work out of the box) or customizing the cache key (with configuration).

REST use cases

The following cases should be considered when evaluating any solution as to whether they'd be supported and how cleanly.

Routes

  • rss.xml - single purpose path responding with a non-html format.
  • node/1 - a single purpose path responding with html only format.
  • node/1.json - multipurpose path (node/1) with an extension denoting format.
  • admin/foo/example.plugin_id - An arbitrary path containing some placeholder not related to format but a point)
  • rest/myendpointwithvaryingformats - Accepts only conneg. (Current behaviour in 8.x, not-critical to support and can be in contrib.

Aliases

  • myalias.json - an explicitly added alias with an extension denoting format (myalias.json -> node/1.json) (supported in 7.x)
  • myalias.html - an alias with an extension, pointing to a regular HTML page (node/1)
  • menu.pdf - an alias with a non-HTML extension, pointing to an HTML page (node/1) (supported in 7.x)
  • about.json - a request comes in to an aliased path (about -> node/1) with an extension . The alias-with-extension is implicit. (not supported in 7.x

Research

Other Peoples' Opinions™

The Spring framework prefers extensions (and query arguments) over content negotation and calls content-negotiation "problematic to use".

Roy Fielding on a discussion between extensions and query arguments denies that either "choice has anything to do with REST". This (and his whole mail) really can't be interpreted any other way but that he sees extensions compatible with REST and that he likes extensions better than query arguments.

Proposed resolution

Original proposed resolution was to switch to /node/1.json and so on. Perhaps keep Content-Negotiation as an (advanced) option.

However, research at Drupal Dev Days shows that most of the major web APIs these days are not using extensions. So this would be a non-standard approach. Additionally, the only reverse proxy we can find without a workaround for content negotiation is CloudFlare, and it seems much easier to just say we don't support it until it does.

Therefore, the new proposed resolution is:

  1. For modal/dialog, we should use query parameters instead of accept headers
  2. For REST requests, prefix the path with /api/

However, this may need to be revisited based on research.

Remaining tasks

User interface changes

API changes

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

znerol’s picture

dawehner’s picture

So #2331919: Implement StackNegotiation solves this issue, what is the point to have this issue opened?

In addition to that the proposed solution: Varry: Accept seems sadly not a good idea for HTML routes, we could implement it probably
for REST routes though?

znerol’s picture

So #2331919: Implement StackNegotiation solves this issue,

Nope, not for external caches (Varnish, Squid, etc.).

Varry: Accept seems sadly not a good idea for HTML routes, we could implement it probably for REST routes though?

We need to specify Vary: Accept on all URLs (regardless of routes) where content negotiation can happen. That's why I think that #2019123: Use the same canonical URI paths as for HTML routes was not so smart after all.

znerol’s picture

This might be a starting point for someone. I failed to come up with a good idea on how FinishResponseSubscriber::needsVaryAccept() could be implemented.

catch’s picture

#2331919-25: Implement StackNegotiation and the linked comments explain why Vary: Accept isn't a good option for external caches. I feel like we should go for some variant of the 'different routes' option.

catch’s picture

Priority: Normal » Critical
Issue tags: +Security improvements, +Security Advisory follow-up

This was worthy of an SA for the Drupal 7 contrib REST module (https://www.drupal.org/node/1966780), I don't think it's any less critical for external caches than the internal one.

klausi’s picture

Originally I was very excited about #2019123: Use the same canonical URI paths as for HTML routes, because I thought there was benefit in implementing REST the right way for discoverability of resource paths and with a uniform interface.

Turns out the internet does not give a shit about uniform interfaces or link relations. The term REST has changed its meaning completely and only means some HTTP accessible API now. If you look at all the common APIs out there like Twitter, Github, Facebook etc. they all violate the original REST principles by using some "/api/version/1.2/node/1.json" path instead of using "/node/1" with appropriate HTTP headers.

There are even dubious REST best practice examples out there that promote this non-REST behavior. https://github.com/Gizra/restful is an example for Drupal 7 that is now emerging, and people do not seem to have a problem with it.

As a REST module maintainer in D8 I fear that by following the original REST principles we are inventing a Drupalism here that no one else is doing. And it also makes our own life harder as it might need to be (performance in the routing system and external cachability).

For the last two years I was hoping for the real REST to break through, but apparently it is not happening on the internet. Unfortunately we are now in beta, so not sure what the most sensitive way forward should be.

chx’s picture

> We need to specify Vary: Accept on all URLs (regardless of routes) where content negotiation can happen.

Like every content entity in the system?

Anonymous’s picture

i really like the spirit of #7.

for the love of $deity, can we make decisions based on what will actually work, and not architecture-astronaut hand waving.

chx’s picture

I find it important to note: any rumors of me hacking beejeebus' account and posting #9 myself are completely baseless.

effulgentsia’s picture

Originally I was very excited about #2019123: Use the same canonical URI paths as for HTML routes, because I thought there was benefit in implementing REST the right way...As a REST module maintainer in D8 I fear that by following the original REST principles we are inventing a Drupalism here that no one else is doing. And it also makes our own life harder as it might need to be (performance in the routing system and external cachability).

I think the basic principle of a request processing and routing system that can treat an internal "path" and an internal "format" as separate things and that can support:

  • Single routes at a given internal path, with format-specific customization happening within the logic of a controller, and
  • Single routes at a given internal path, with format-specific customization happening within the logic of a view listener, and
  • Multiple routes at a given internal path, with format-specific differences resulting in entirely different controllers

is very useful. For example, for an internal path of "node/1", it allows us to have:

  • JSON, XML, and HAL+JSON representations implemented at the same route as each other, but with the Serialization system (invoked during the running of the controller) branching by format only where it needs to.
  • HTML, DIALOG, and MODAL representations implemented at the same route as each other, but with a View listener branching by format only where it needs to.
  • The first set above implemented at a different route than the second set, because the entire controller flow and access permissions needs to be handled differently between those sets.

So, IMO, the decoupling of path, route, and format, at the PHP code level, is useful, so I think a lot of the work that went into the routing system and #2019123: Use the same canonical URI paths as for HTML routes has not been in vain.

However, our useful decoupling of these concepts at the PHP level shouldn't force us into suboptimal solutions for user agents. Given the real performance and caching benefits of including the format into the URL, I think we should do that, at least by default.

In other words, I think we should implement inbound/outbound path processors, just like we do as our default language negotiation strategy. So our URLs can look like /en/node/1.json, but the internal path that routes on gets normalized to /node/1. That would give us the best of most worlds: nicely decoupled PHP code, URLs friendly to real-world proxy caches, and a faster internal page cache fetch.

However, for site builders who really want format-agnostic URLs, and are willing to deal with a slower internal page cache fetch and either the bypassing or special configuration of proxy caches, let's provide a setting for switching from URL-based negotiation to header-based negotiation (similar to how we allow language negotiation configuration, though in the case of format negotiation, it probably needs to be either-or rather than a chain). The fact that we already have the concepts of internal paths and URLs decoupled should makes doing this relatively easy.

Wim Leers’s picture

#9 & #10: Thanks for the giggles!

#11: A resounding "YES!" to your paragraphs about the decoupling of path/route/format at the PHP code level being useful.

But … how do you imagine that those outbound path processors would work, given that the whole point is that different formats are supported? E.g.: you create a menu link (menu_link_content) to /node/1. It's available in JSON, XML, HAL+JSON, HTML, DIALOG and MODAL. There are only two ways to make that outbound path processor choose a format: specify it in the menu link (currently not supported) or default to a certain format. I think that in practice, this would effectively mean we'd always default to HTML. Is that a problem?

Fabianx’s picture

#11: I think that is useful.

I also thought about a setting to disable Vary: Accept and move to URL based negotiation.

But still supporting it - when needed. We could keep full REST for BC reasons (with stack neg library) and just rip it out if / when we see that no one uses it ...

But having a settings switch sounds useful, which would default to fast-by-default.

znerol’s picture

It seems to me that people here believe that content negotiation is a prerequisite for REST. I do not think that this is true, see e.g. REST - Tradeoffs between content negotiation via Accept header versus extensions (posted the link previously in the stack negotiation issue, reposting it here).

For example the Django REST framework allows both approaches (see Format suffixes).

I often use the Redmine REST API which uses the extension-approach.

dawehner’s picture

#11 on top being useful in general we should take in mind that these best practices might chance at any point. Maybe people will come up with using query arguments again.
Keeping the underlying structure to be flexible is a good idea, if you would ask me.

But … how do you imagine that those outbound path processors would work, given that the whole point is that different formats are supported? E.g.: you create a menu link (menu_link_content) to /node/1. It's available in JSON, XML, HAL+JSON, HTML, DIALOG and MODAL. There are only two ways to make that outbound path processor choose a format: specify it in the menu link (currently not supported) or default to a certain format. I think that in practice, this would effectively mean we'd always default to HTML. Is that a problem?

I would argue that for menu links we should dedicate try to prefer the HTML only routes, so the routes which don't have a _format specified.
If you would to link to something like .json route we should ensure that inbound processing is also executed when we determine the route.

For other stuff like serialized json, the best strategy would be to always try to pass along the context in which we render stuff and use it.

Crell’s picture

Background: When we discussed this question previously, the answer at the time was "Varnish can be configured to obey accept headers, it's just not the default" and "Akamai doesn't by default, but can also be configured to do so." That is, "yes it means you need to tweak your proxy cache settings but you generally have to do that anyway." So the matter of external caches was considered and we concluded that "well configure them properly" was an acceptable answer.

IF we are going to go back on that, I am still opposed to doing so but Symfony does offer a convention for it that "falls back" gracefully. That is, _format is the request attribute that tracks the negotiated format name. If that's provided in the path pattern (/node/{node}.{_format}) then that wins over whatever negotiated value was there previously.

Supporting that directly would allow people to send requests to /node/5.json_hal, PUT to /node/5.json, etc. It's janky but if you want to do URI-based content negotiation that's the supported way to do so. And it would still transparently allow /node/5 itself to still work with a negotiated header for those who are able and willing to setup the appropriate Varnish rule.

chx’s picture

#16 please link the discussion. Thanks! (#2019123: Use the same canonical URI paths as for HTML routes does not contain the words "Varnish" or "Akamai")

klausi’s picture

effulgentsia’s picture

Re #17: For Akamai: #1855260-28: Page caching broken by accept header-based routing. For Varnish: https://www.varnish-cache.org/docs/3.0/tutorial/vary.html, which gives "Accept-Encoding" as the example, but the same kind of VCL could be written to normalize "Accept" as well.

In #1855260-25: Page caching broken by accept header-based routing, I said:

The annoying elephant in the room is Akamai, which, according to http://www.rimmkaufman.com/blog/vary-user-agent/30112012/, will flat out refuse to cache any page that uses Vary: Accept (or any other header other than Accept-Encoding). My current thought on that though is to leave that to contrib to solve. Such a contrib module would need to implement whatever the D8 equivalents of hook_url_inbound_alter() and hook_url_outbound_alter() will end up being (plus we'll need to expose a client-side equivalent hook in the Drupal.url() JS function) to implement whatever strategy it wants for adding format-specific information to the URL.

But yesterday, when I wrote #11, I changed my mind to making core's default be "fast by default", rather than requiring contrib or individual site owners to figure out how to make it fast.

catch’s picture

I was never happy with the original solution, as you can see from the comment when I committed the patch #1855260-79: Page caching broken by accept header-based routing - that always felt like a fragile workaround for functionality that was flawed in terms of how browsers (which aren't consistent in the headers they send) and reverse proxies (which rightfully don't trust browsers to send the right headers) work.

This isn't just about "fast by default" it's also about "secure by default" - since we don't send Vary: Accept

Crell’s picture

Side note: Klausi, part of the problem is that "REST" has been twisted in recent years. REST as defined by Roy Fielding originally and expressed in the HTTP 1.1 spec does entail content negotiation, links between resources, statelessness, etc. More recently many organizations have taken to using "REST" to refer to any web service that isn't SOAP or XML, even though that's not accurate. As a result, many are now using the term "hypermedia" to refer to "What REST actually meant before people started abusing the term." Whatever the noun is this week, I do believe Drupal should strive to be "correct" with respect to the standards and protocols in question even if many parties on the web fall short of that metric.

Also adding links to prior discussions.

chx’s picture

#21 please read #14 especially the linked discussion there which contains

Here's a brief snippet from Roy Fielding (who defined REST)...

"section 6.2.1 does not say that content negotiation should be used all the time."

#14 says this was raised in previous issues.

Fabianx’s picture

Another quote from the excellent stackoverflow answers:

The proper official RESTful approach is to use Accept: header.

However, you have to be careful not to break cacheability, which is one of REST's requirements. You need to have Vary: Accept header and cache which understands it. In ideal world you'd have it, but in real life your millage may vary. So the second solution isn't as clean, but it might be more practical.

Also, note that some very old browsers used to ignore headers, relying on the extension instead.

If cacheability is one of RESTs requirements, but we can't send Vary: Accept for normal browsers, can't we still have both:

- a node/1 - the Drupal path, only responding to HTML
- rest/node/1 - the "multiplexed" path that will always send Vary: Accept

Route generation might need a parameter, restful => TRUE or something like that, which is off by default.

Yes it is true that 'node/1' is then no longer canonical, but rest/node/1, but is it specified somewhere in REST that a resource MUST NOT BE available at a different location (which is not restful)?

A trouble of this approach might be aliases, but just re-acting to /rest/* the same as pure / might solve that, e.g. /article-1 and /rest/article-1.

In my really humble opinion (this is just an idea) this might be a way we can remain truly restful (just under a different namespace) and still have our fast 90% case for web browsers.

Crell’s picture

klausi: /node/1 and /api/node/1 are different resources by definition, since resource and URI are 1:1. However, there's no requirement that 2 resources can't be backed by the same underlying data or that changing one resource causes a change in another, just that such details are not exposed to the REST client AFAIK.

If we were to leave the current code in place, but then also provide an inbound and outbound path processor that adds/removes /api on incoming/outgoing requests if _format is not html, that would probably accomplish what #23 describes. I don't think that's any better than what #16 proposed, which would also give us both "correct" shared URIs but operational format-distinct URIs. #16 would be more consistent with the tutorials I've seen for Symfony, anyway.

mikl’s picture

I've recently been trying to figure out how to use the REST stuff in Drupal 8, and since I have not had that much exposure to it before I had a fairly fresh take on it. Here's a few observations:

  1. While the idea of using the same URL for REST as the content itself seems nice, in practise, it's very confusing. Nodes are at /node/[nid]. But requesting just /node (without a node ID) does not give me a list of nodes, as I would have expected from any other REST service. As far as I can tell, you have to create custom views if you want lists of entities.
  2. The paths for different kinds of entities are inconsistent. Content entities have /node, /user, /etc., but config entities usually have something like /entity/date_format/{date_format}. And then there's really weird stuff like /admin/config/user-interface/shortcut/link/{shortcut} (I got something similar, when I created my own content entity).
  3. JSON is not enabled by default. I know HAL is the bee's knees if you have a tool that understands it, but I think most people will just want plain JSON.

So in short, yes, I think it would be nicer if we had uniform endpoints. So /rest/user gives a list of users, /rest/user/1 gives you the admin account. It might have to be slightly more complex than that to avoid name collisions between modules – so /rest/taxonomy/term/1, /rest/node/node/1 and the like, so the first part of the URL would be the module name and the next would be the entity type.

Crell’s picture

mikl: The "/node gives a list of nodes" is the usual recommendation for "collections" in REST APIs, but in practice I don't think it's viable; adding /api to the URI wouldn't change that on a site with 100k nodes a resource that just lists through them in some arbitrary order, hopefully paged, would be extremely difficult to use if not impossible. Contextually-relevant collections are far more useful, which is exactly what Views is for.

I think the rest of #25 is off topic for this thread, but is good feedback in general that should be discussed in a separate issue.

kim.pepper’s picture

@mikl Have a look at the REST UI contrib module which might give you more flexibility https://www.drupal.org/project/restui

David Strauss’s picture

Larry and I discussed the worries here at BADCamp. In short, I encourage people to validate their "Accept/Vary" concerns before worrying others or putting work into creating workarounds. I'm pretty optimistic that it's supported properly.

According to Varnish's documentation, it supports Vary headers by default. Someone else at the BADCamp discussion said it also works fine if configured in Akamai (unclear if that's the default). After the conversation, I gave it further thought and realized that supporting different page compression standards requires proper Accept-Encoding/Vary support, and we definitely see a combination of plaintext, gzip, and sdch, and every proxy has to support that.

Still, we should consider shipping Varnish and Apache configuration to normalize the Accept for the content type so we can hit caches efficiently. The values from HTTP clients will probably differ even more than for Accept-Encoding.

znerol’s picture

While trying to work on the patch in #4 I realized that this is a bigger problem than I initially thought. The reason is that content negotiation is not only used in the rest module but also in core (e.g. MainContentViewSubscriber.php).

Any d8 site which is operating behind a caching proxy (no matter whether this is a reverse proxy operated by the owner of the site or an intermediate one) is potentially vulnerable to cache-poisoning for anonymous users.

The easy way to resolve this problem is to unconditionally add Vary: Accept to all responses generated by Drupal. However, this might be still not enough because of bugs in browser caches (e.g. Safari, found via this blog post).

However, I'd prefer if d8 wouldn't ship with Vary: Accept on by default. In this case it will be necessary to make content negotiation optional - not only for the rest module but throughout Drupal core.

Wim Leers’s picture

#29: That doesn't sound good :( Do you have a proposal on how to move forward?

Anonymous’s picture

my preference would be specifying what we want in the url, and don't do Vary: Accept by default.

if we make it possible to do Vary: Accept headers for contrib, but don't do it by default in core, we'll get much better hit rates, and a simpler system.

larowlan’s picture

Whilst I concede that we might have to use URLs instead of accept, one thing accept gave us was progressive enhancement with dialog api. Links to node/4 would open in a dialog if JavaScript was enabled, but would still work without it

znerol’s picture

This should be doable with minor changes. I hope it is as simple as converting from data-accepts="application/vnd.drupal-modal" to data-format="drupal-modal.js" and then just callback to "node/4." + format or something.

dawehner’s picture

FileSize
14.79 KB

Here is a rough idea just to get some momentum on this issue:

  • In case we URL is ".extension" we set the wanted request format, ... in case nothing provides that request format, we just ignore it
  • We use as described in #33, data-format instead of data-accepts and build the URL with it.
  • The content negotiation class now uses $request->getRequestFormat(), if available. ... ideally it would be the other way round though
dawehner’s picture

Status: Active » Needs review

Well, let's see how much breaks, just being curious here.

The last submitted patch, 4: 2364011-vary-accept-INCOMPLETE.diff, failed testing.

Status: Needs review » Needs work

The last submitted patch, 34: 2364011-34.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
2.34 KB
16.2 KB

Just using ".extension" is not as simple as you expect it to be, let's see whether it helps to rename /rss.xml to /rss
as well as changing the order of the path processors.

Status: Needs review » Needs work

The last submitted patch, 38: 2364011-38.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
11.86 KB

Just a reroll ...

Status: Needs review » Needs work

The last submitted patch, 40: 2364011-40.patch, failed testing.

xjm’s picture

Issue tags: +Triaged D8 critical
dawehner’s picture

Given that we use dots in URLs in various places, for example in the config UI, I wonder whether we should better
try to approach the problem using some URL parameter?

dawehner’s picture

@berdir suggested to just accept a whitelist of extensions, which would be reserved like 'json' 'hal_json' and 'xml'.

This would probably make core passing but its will be odd to not allow for example config names for that.

dawehner’s picture

Status: Needs work » Needs review
FileSize
776 bytes
11.86 KB

Let's try out the whitelist idea.

Status: Needs review » Needs work

The last submitted patch, 45: 2364011-45.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
8.86 KB
20.72 KB

Fixed a couple of the usages of "/rss.xml" to "/rss".

neclimdul’s picture

FileSize
20.75 KB

Was going to look at this but it didn't apply. Haven't really looked still but here's a automated re-roll(rebase of last working commit) for anyone else planning on looking. And I guess an updated testbot run too.

podarok’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Core/ContentNegotiation.php
@@ -30,6 +30,10 @@ class ContentNegotiation {
+      return $request->getRequestFormat(NULL);

Let's avoid of double method execution

if ($ret = $request->getRequestFormat(NULL)) {
  return $ret;
}
dawehner’s picture

Status: Needs work » Needs review

Let's change it back to needs review, given that we aren't yet (sadly) in the phase of detailed reviews

neclimdul’s picture

So finally looked through it. Magic like this sort of makes me nervous. First, its a hardcoded list. Second, its magically changing the path. seems like that will come back to bite someone in contrib.

Dave Reid’s picture

I would highly suspect this would come back to bite xmlsitemap.module, among others. I'm completely doubtful that everything requesting /sitemap.xml would use the proper Accept: text/xml header, which could then be incredibly confusing if there is an actual browseable/clickable sitemap like most sites do.

dawehner’s picture

@Dave Reid
For xml sitemap this would be no problem. You don't have different results for XML and not HTML, all you would do is to always return your XML.
It though would become problematic if we don't allow .xml by default. Thanks for bringing this up.

znerol’s picture

I tend to think that the PathProcessorAccept approach suffers from the exact same issues as the current accept header based content negotiation. It is more or less just swapping out the negotiation mechanism from accept header to file extension. This may help with solving the external cache-issue, but it does nothing to fix the architectural problems I've pointed out in #2369681-34: Move ContentNegotiation into a middleware.

znerol’s picture

chx’s picture

Could someone do a quick look at various frameworks as to what they do to this problem? For example, here's Spring, it selects the format checking first an URL suffix (.xml) then a URL parameter (?format=xml) and only third the Accept header noting that

Finally the Accept HTTP header property is checked. This is how HTTP is actually defined to work, but, as previously mentioned, it can be problematic to use.

We need more data like this.

chx’s picture

Another data point (nothing more than that): http://stackoverflow.com/a/385216/308851 while a certain developer in 2009 was insisting on what D8 does at this point and said that foo.json was not how REST supposed to be by 2011 said developer said (s)he was wrong. A much upvoted comment:

Any thing that is useful enough to be addressable individually can be assigned a URI. If you need to make the distinction between two formats, you can "promote" them as resources.

grendzy’s picture

A practical data point... I'm running a production D8 site behind the CloudFront CDN. "Vary: Accept" is not supported by CloudFront, which causes all kinds of breakage with AJAX (e.g. the image button on WYSIWYG).

klausi wrote: If you look at all the common APIs out there like Twitter, Github, Facebook etc. they all violate the original REST principles by using some "/api/version/1.2/node/1.json" path instead of using "/node/1" with appropriate HTTP headers.

Well, I don't have the expertise to make claims about the "original" principles, but even Roy Fielding said That's why I always prefer extensions. Neither choice has anything to do with REST."

I think content negotiation should be optional. Regardless of principles, the concept is objectively a market failure, and if Drupal requires it we are severely limiting the range of compatible CDN and proxy options.

chx’s picture

Issue summary: View changes

If CloudFront doesn't support Vary: Accept --and it doesn't-- then this discussion is pretty much over. Updating the issue summary.

chx’s picture

Issue summary: View changes
chx’s picture

Issue summary: View changes
chx’s picture

Issue summary: View changes
chx’s picture

My work here (and everywhere) is done, so unfollowing this. Have fun!

David Strauss’s picture

I see a note in the description about CloudFlare, which I think is irrelevant given CloudFlare not strongly supporting page caching, anyway. I've talked with their engineers via my company's partnership, and they don't recommend using their service as a Drupal page cache. By default, CloudFlare doesn't try to cache pages.

@chx, your information about CloudFront is out of date: https://aws.amazon.com/blogs/aws/enhanced-cloudfront-customization/

grendzy’s picture

David: CloudFront allows custom headers to be "whitelisted", but you have to understand whitelisting means the entire header value becomes part of the cache key (analogous to vcl_hash() in varnish).

Due to the extreme fragmentation of Accept: in the browser landscape, whitelisting Accept: entirely negates the usefulness of the cache.

webchick’s picture

Issue tags: +Security

Using the more common "security" tag here.

dawehner’s picture

jibran queued 67: 2364011-67.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 67: 2364011-67.patch, failed testing.

swentel’s picture

Status: Needs work » Needs review
FileSize
20.16 KB

rerolled

jibran’s picture

Status: Needs review » Needs work
Issue tags: +JavaScript, +Need tests

Adding JS tag for JS changes. There are no tests for this fix in the patch and I am not seeing an test only patch on the issue.

+++ b/core/lib/Drupal/Core/PathProcessor/PathProcessorAccept.php
@@ -0,0 +1,42 @@
+  public function processInbound($path, Request $request) {

We should add some unittest for this.

dawehner’s picture

FileSize
23.33 KB
4.88 KB

For now this is a test to ensure that things work as expected. It fails at the moment though.

Crell’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 72: 2364011-72.patch, failed testing.

znerol’s picture

Title: External caches mix up response formats on URLs where content negotiation is in use » [meta] External caches mix up response formats on URLs where content negotiation is in use
Status: Needs work » Active

I've already pointed out in #54 that this approach is doomed to fail (as demonstrated by the test failures for rss.xml). We only can properly implement this on the router level. Therefore, let's turn this issue into a meta and continue exploring the options in subissues.

We could solve the problem with JS wishing to open anything in a dialog by simply adding .{_format} to the path of every route which does not have an extension specified explicitly. That is still hacky, but it addresses the problem on the router level (where it actually belongs). PoC is #2409187-3: RouteProvider does not understand extension syntax subsequent optimizations did not work, therefore we should depart from patch 3. Also note that it is possible to use composite extensions to formats (i.e. node/1.drupal-dialog.js, node/1.drupal-modal.js, node/1.hal.json).

dawehner’s picture

We had a plan outlined on the last wscci meeting, I hope crell can post it.
I fear that putting it into the routing it might be really hard to undo it for intranet usecases.

Crell’s picture

What I proposed during the WSCCI meeting last week goes something like this:

As I understand it, there's two cache-based fails that are the cause of our pain:

1) Incoming requests with an Accept header, which browsers screw up and send a dozen variations on that, in practice, result in the same thing. This means we either use that in the cache (and have a ton of duplicates) or don't use it in the cache (and get cache poisoning).

2) Outgoing requests with a Vary: Accept header, which common browsers short-circuit and translate to no-cache. (I hadn't realized this was the case until catch mentioned it in the meeting; the sound you heard was my jaw hitting the floor at just how *censored* broken browsers are in this day and age.)

We've been trying to find ways around the first point. The odds of us finding ways around the second, though, are minimal to non-existent in the general case, because browsers suck. There are also two competing considerations:

1) We want to actually, you know, follow REST, or allow people to follow REST, because we work on the web and HTTP 1.1 is a REST format. That's kind of the point.

2) We want to actually, you know, work with the commonly deployed web browsers, proxies, and so on, even if they're all irrecoverably broken, because people need to be able to actually use our code. That's kind of the point.

As a way to try and achieve both goals, and leverage the architecture we have more or less the way it is designed to be used, we proposed the following:

1) The _format attribute is the canonical source for the request format machine name. Most code should look at that only, and never renegotiate itself. (Or, rather, it should use $request->getRequestFormat() which fronts for it.)

2) Except as noted here, all code should act as though we were doing Accept-based negotiation. That is, always link to /node/5, with no extension.

3) We add a pre-kernel middleware that checks for an extension on the request URI. If one is there, it sets that as _format. If not it assumes a default of html. A container parameter passed to this middleware can disable the default-html behavior. That means, in the default/typical case, content negotiation is done via extension and the Accept header is ignored. (Alternatively, we can default to html only if the Accept header is HTML-ish, ie, strpos('text/html'). That may be a more robust route, actually.) The format extension if any is *removed* from the request before it is passed on. (dawehner argued this could be done in an inbound path processor; I disagree, as we want it done as early as possible so that _format is unset for as little time as possible.)

4) We keep the AcceptHeaderMatcher service. However, we adapt it so that if the request format is already set (which it usually will be) it just uses that as the only legal format. If so, it filters out any non-matching routes and continues. If it is not set, then it looks at the available routes and does a "proper" negotiation based on the Accept header and the ordered list of available routes, then filters accordingly.

5) We should swap out the core ContentNegotiation class used by step 4 with a library that doesn't suck. (I wrote the one in core. It sucks.) The willdurand library we keep trying to use is an option, or if people really don't like that one there's also Aura.Accept which seems to have a larger default set of formats. This choice is independent of the rest of this proposal.

6) The AcceptHeaderMatcher service will in some cases be setting _format on the request. So be it. If it does so, it also needs to set _header_negotiated or some other flag. The Generator service will then check for the _header_negotiated attribute. If set, it will generate urls as now. If not set, and getRequestFormat() != HTML, it will append getRequestFormat() as an extension. That way, the addition and stripping of the extension is entirely transparent to the vast majority of code. Most code can ignore it entirely and everything still works. That includes, in particular, any non-HTML code which doesn't need to think about extensions. (dawehner argued this could be done in an outbound path processor; implementation detail, I have no strong preference.)

7) We add another Response listener that checks for _header_negotiated as well. If set, it does two things. One, it sets a Vary: Accept header for correctness. Two, it disables all caching (no-cache) on the response. Why? Because that's what browsers are going to do anyway, effectively, so we make it explicit. This also gives us a place to document why we're doing it, and explain why browsers are the enemy of a free and open web. </sarcasm>. In case a particular site wants to disable that, because they know they DO control all the servers between client and server and knows that they're not broken, we can provide another container parameter to disable the no-cache logic.

8) We modify core's Javascript to send back requests that have BOTH an Accept header AND an extension. This can probably be automated so that little code needs to change. (Ie, set an Accept header and we add the extension automatically, since there's only about a half dozen that cover the 99.9% case of core. As a follow-up we could even pass a mapping list forward from the server in settings.)

9) One loophole here is that IF there's no extension, and IF the middleware didn't set a default _format, and IF there is an exception thrown from a request listener pre-routing, then the exception listeners may not have a getRequestFormat() available. I think in that case we can probably default to html in the exception handling somewhere, or do something else along those lines. It's a solvable problem best handled by the implementation, not sorted out here.

Upshot of this approach:

1) Core is built to do HTTP/REST "correctly", per spec, including route-sensitive negotiation.
2) Most requests use extension-based handling, which is the only thing broken browsers and proxies will handle correctly.
3) Most code doesn't know the difference; it can think in terms of "correct" REST (single URI), but will transparently use extensions just as well.
4) The impact on existing code not explicitly mentioned here is minimal. I'd argue this is not even an API change for the vast majority of code in core or contrib, if we can automate the Javascript part as well.

Does that seem like a viable approach that will satisfy all parties?

catch’s picture

I had to disappear towards the end of the irc discussion for a phone meeting, also haven't read the whole plan yet.

One quick point:

That means, in the default/typical case, content negotiation is done via extension and the Accept header is ignored. (Alternatively, we can default to html only if the Accept header is HTML-ish, ie, strpos('text/html'). That may be a more robust route, actually.)

We can't sometimes set Vary then not set it on the same path, because that gets back to the situation where the proxy serves a cached HTML response to something asking for JSON (or vice versa), which was the original core bug report by me and the reason for the restws SA.

David Strauss’s picture

I support use of the "Vary: Accept" idiom in our code, even if we have to fuzz it with extensions and then abstract it away.

Crell’s picture

catch: The "sometimes" would be in the case that the extension-based html-default is disabled. That's going to be all-or-nothing on a given site, so it shouldn't cause an issue, no?

Crell’s picture

Also, setting Vary: Accept in this setup also implies setting no-cache, so a browser/proxy would have to be REALLY broken to then cache it anyway.

znerol’s picture

  1. I second #78, the Vary response header needs to be identical for any request to the same uri. This is also why we add Vary: Cookie on responses even if no cookie at all was on the request.
  2. We do not necessary support every format on every uri on a site, therefore we cannot possibly use neither a middleware nor an incoming path processor nor anything that runs before routing. People disagreeing with that statement please should specify exactly how a response should look like for /user/login.hal.json. Rendered HTML page or json? Which status code?
  3. The url generator does not need to be modified at all if we'd support and use the path/to/{resource}.{format} pattern.

In my opinion any attempt to perform routing outside of the router will fall on its face sooner or later, and will add additional legacy code which needs to be cleaned up in the future. Please do not try to take any shortcuts here.

dawehner’s picture

Let's try to start with something which is needed for both approaches of both @znerol as well of @crell.

#2445723: Use the $request format instead of the ContentNegotation.

@znerol
Do you think this is a sane step to start with?

Fabianx’s picture

So I think its fine that contrib can disable path based negotiation by doing something (like removing the middleware or setting a flag or whatever).

And I agree with 7. that we always send:

Vary: Accept whenever the AcceptHeaderMatcher runs and does header based format negotiation. (plus that no-cache thingy)

Just to summarize again this gives us:

a) fast and cacheable by default - as url based neg is the default
b) reliable and less cacheable, but correct responses

I think because the middleware would come after the page cache anyway, it is fine to also put it in routing directly.

I would agree with #83 that this is a good first step.

neclimdul’s picture

I've been working on a plan for this issue and I think there are some questions and discussions about the finer points of some things but #83 is pretty set in all of that so that's definitely the first step we can start on now.

Crell’s picture

znerol: A request to /user/login.hal_json would -> /usr/login, _format: hal_json, which routing will find no route for and throw a 406 Not Acceptable, which is what it does now already.

znerol’s picture

I do not expect to get any 406 from a site which does not do Accept based content negotiation.

catch’s picture

Another problem with extension stripping in a middleware.

node/1 is aliased to my-legacy-path.html

We strip .html

Now the alias system gets my-legacy-path

No match.

effulgentsia’s picture

Isn't the equivalent of #88 a problem in HEAD as well:

node/1 is aliased to en/my-legacy-path

We strip en/

Now the alias system gets my-legacy-path

No match.

catch’s picture

In that case we'd only strip en/ if path-based language negotiation was enabled. Otherwise it'll be treated as-is.

And if path-based language negotiation was enabled, aliasing to just my-legacy-path would work for en/my-legacy-path. So I don't think it's the same, no.

We'd be stripping the extension on every site, and it'd also be a regression for sites upgrading from previous Drupal versions where it's been fine to have aliases with extensions. There's also no good reason to do it - we can check or aliases first, then routing would need to assess both foo.extension and foo as routing parts to account for the rss.xml case.

I can't find the link, but neclimdul started a spreadsheet with other use-cases last night.

Another example we have apart from rss.xml is config admin routes which have the yaml filename as argument, so admin/config/config/whatever/foo.yml.

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
kerby70’s picture

Issue tags: -Need tests +Needs tests

Correcting non standard tag 'Need tests' to 'Needs tests'

Damien Tournoud’s picture

If an external CDN doesn't respect the Vary header *and* doesn't have a way to customize the cache key, it is *broken*. Do we have any evidence of any CDN that fall into this category?

From the top of my head:

  • Varnish supports Vary;
  • All versions of Nginx support customizing the cache key, in addition, recent versions of Nginx support Vary;
  • CloudFront supports customizing the cache key;
  • CloudFlare has really poor caching support, but they use Nginx, so we can assume that they will support the Vary header down the road.
catch’s picture

@Damien did you read the issue? Specifically several things znerol has pointed out are ignored in #94.

catch’s picture

dawehner’s picture

Note: https://github.com/neclimdul/conneg_test has some current experiments

pwolanin’s picture

Issue tags: +D8 Accelerate Dev Days

Talking with dawehner & alexpott about implementation options. I'm going to try a new approach to take out the header handling and keep the path handling fairly simple.

pwolanin’s picture

Title: [meta] External caches mix up response formats on URLs where content negotiation is in use » Remove content negotiation in favor of extensions since external caches mix up response formats on URLs where content negotiation is in use

Changing the title - I think this is also not a meta since we don't have any open child issues.

pwolanin’s picture

Status: Active » Needs work
FileSize
11.46 KB

Starting to work - not functional yet.

pwolanin’s picture

FileSize
13.36 KB
3.13 KB

add some test cases and fix implementation

pwolanin’s picture

FileSize
27.03 KB
16.75 KB

more (slow) progress

pwolanin’s picture

turns out we are seeing page cache interference, so blocked while fixing: #2471473: REST responses should have proper cache tags

patch here test works with the extra hack of disabing page cache, but not what we actually want.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
27.53 KB
1.87 KB

Enough is working - let's see what the bot thinks.

Status: Needs review » Needs work

The last submitted patch, 104: 2364011-104.patch, failed testing.

dawehner’s picture

@pwolanin
We should merge the following tests into this patch:

a) The work we specified in a lot of discussions: https://github.com/neclimdul/conneg_test/blob/master/src/Tests/ContentNe...
It is really helpful to have one place to define what behaviour we expect
b) We should also have a test to ensure that we don't have cache poisoning, https://www.drupal.org/files/issues/2364011-72.patch has that already,
see core/modules/system/src/Tests/Bootstrap/PageCacheTest.php

+++ b/core/lib/Drupal/Core/EventSubscriber/MainContentViewSubscriber.php
@@ -76,7 +76,7 @@ public function onViewRenderArray(GetResponseForControllerResultEvent $event) {
-    $format = $request->getRequestFormat();
+    $format = $request->getRequestFormat(NULL);
 

We should explain why we pass NULL, it might be obvious now, but i won't be in the future.

neclimdul’s picture

Title: Remove content negotiation in favor of extensions since external caches mix up response formats on URLs where content negotiation is in use » [meta] External caches mix up response formats on URLs where content negotiation is in use
FileSize
9.13 KB

I think we should roll this discussion back just a bit. I want a working patch as well but my biggest concern is we rush this and get another weird or broken solution. Also, there where not and child issues but I don't know that we have a solution to warrant those yet.

I've had this mostly ready for over a week but life happened and I didn't get it posted. I've worked through the discussions we've had here and discussions that happened in a couple of the -wscci meetings(transcripts can be provided) and come up with a series of tests that I think outline sort of the expected behaviors we expect routing to have at the end. Some of the tests might be ideal best case scenarios that we have to skip but its a starting point.

I haven't read peter's patches yet, I'll be following up this comment with a review. I actually agree we should rip out the current content negotiation mechanic as part of this though. However, I hope as part of removing that we'll be able to replace it with a better solution rather then removing it. I think it gives service based systems a really killer edge and I hope we can leave a working implementation in that contrib can leverage to great effect. Per discussions, that system has to interact with routes only work on path(all routes that match that path) who have been flagged in some way (so all routes block caching and vary on accept).

znerol’s picture

Issue summary: View changes
znerol’s picture

Issue summary: View changes
webchick’s picture

We just had a big discussion about this at Drupal Dev Days, and decided that this issue summary is definitely out of date. Here's a Google Doc for us to get it fixed up: https://docs.google.com/document/d/1o2_Wnsdw1K7RHDWB7aTAHus0qbfa6WkG8-Ry...

znerol’s picture

Issue summary: View changes
webchick’s picture

Issue summary: View changes

Updating the issue summary so far. Not done yet.

webchick’s picture

Issue summary: View changes

Moar summary!

Basic findings: the web has changed a lot since comment #7. While only GitHub uses content negotiation as we do, only Twitter uses file extensions.

webchick’s picture

Issue summary: View changes

MOAR summary.

New proposed resolution:

  1. For modal/dialog, we should use query parameters instead of accept headers
  2. For REST requests, prefix the path with /api/

Still researching some stuff, but coming to consensus. (pwolanin, klausi, DamZ, dawehner)

pwolanin’s picture

ONe thing we missed was X-request-with header set by jQuery. nod_ says:

12:24 nod_: pwolanin: looking at it doesn't seem to be using it. and when we want to know, this patch just send more data saying it's an ajax request: https://www.drupal.org/node/1305882
12:24 Druplicon: https://www.drupal.org/node/1305882 => drupal_html_id() considered harmful; remove ajax_html_ids to use GET (not POST) AJAX requests #1305882: drupal_html_id() considered harmful; remove ajax_html_ids to use GET (not POST) AJAX requests => 64 comments, 11 IRC mentions
12:24 nod_: maybe contrib relies on it, it's nice because it's built in jQuery and it's the same for everyone
znerol’s picture

Issue summary: View changes
Fabianx’s picture

+1 on the move to /api
+1 on the move to query parameters for dialog/ajax

Thanks @all for discussing this important issue.

pwolanin’s picture

Issue summary: View changes
neclimdul’s picture

For modal/dialog, we should use query parameters instead of accept headers

So re-reading this, I think this might be conflating concerns. The research shows that api's do not sure extensions so we should take that into account when designing a full content negotiation system.

I grabbed a couple of the sites listed about(facebook, twitter, and amazon) and went to their website to see how their XHR requests generally get handled.

Facebook:

Twitter:
twitter is sort of weird. all requests seem to send appropriate request headers.

  • Some requests like typeahead.json log.json
  • Other requests like recommendations, timeline, trends

Amazon.com
Requests all seem to be Accept: */*
no requests supply any sort of extension of query argument that is obvious.

neclimdul’s picture

Status: Needs work » Needs review
FileSize
10.58 KB
21.92 KB

To continue the discussion, updated and fixed some bugs with tests. Obviously if we decide to not use extensions the extension tests should be replaced with what ever other method we implement.

There where concerns about the feasibility of the PoC that was in the patch in my github repo. meta_external_caches-2364011-121.patch contains a more fleshed out implementation for discussion as well.
Done:

  • It fully supports extensions and setting the request format.
  • routes using extensions in this manner bypass content negotiation and can always return the given format.

Not done:

  • Nothing has been converted to use extensions.
  • Content negotiation is moved around but still runs on all non-extension requests. Because of that the main issue here is technically still broken. Accept: application/xml will pollute the cache with a 406 and invalid data. This backwards compatibility was left in until the above conversion is done and to make sure these routing changes don't change any other behaviors.

The last submitted patch, 121: content_negotiation_tests.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 121: meta_external_caches-2364011-121.patch, failed testing.

pwolanin’s picture

@neclimdul - I don't think it's clear that core should ship with any kind of a "full" content negotiation system. We had discussed reducing the current code to as simple a mapping as possible and stopping there.

catch’s picture

I can see that the new proposal is better than the status quo. But I'm not convinced it's better than using extensions - looks like more work to get done, and still requires configuring (at least one) CDN correctly in the case that REST module is providing > 1 format.

webchick’s picture

That is true, though how often do you think that case realistically exists? In looking at the "top 10" REST APIs as part of updating the issue summary, they've all deprecated/removed their XML/SOAP APIs are only are doing JSON, across the board. I think it's nice that we offer the ability to do > 1 format as part of our REST API in the interest of flexibility, but not sure it's a "90%" feature. I think most will just use the one format they get out of the box.

Versus none of the top 10 REST APIs were using extensions anymore, at least as far as I remember.

catch’s picture

I think it's quite likely that a site might want JSON as well as epub representations of their content. Drupal.org still provides 'printer friendly version' for book like https://www.drupal.org/book/export/html/251019 and I know of at least one other site where people actually use that to put long multi-part content into e-readers and similar.

You might not want epub under /api since it's not for a REST api, but that means you'd end up with something like:

node/1 - > HTML
node/1/epub -> epub
api/node/1 -> JSON

vs.

node/1 -> HTML
node/1.epub -> epub
node/1.json -> JSON

Also github lets you do things like https://github.com/md-systems/redis/pull/5.diff (although it redirects, but principle is similar).

neclimdul’s picture

@pwolanin isn't the point of #2472321: Move rest.module routes to /api to have content negotiation for REST.module? I was under the impressions there was full consensus for full content negotiation for a subset of paths.

Fabianx’s picture

#128: That is also how I understood it, but I also don't see the problem with it as long as we distinguish between routes / paths with just one format vs. one with multiple ...

And as long no one puts rest.module on the main content paths (like /node, etc.)

dawehner’s picture

The main problem I have with extensions at that point of view is that we change the router itself, which is compared to changing
paths IMHO a much bigger change at that point.

catch’s picture

#121 doesn't look like that big of a change to me, and it's quite self-contained in terms of impact elsewhere, whereas we already have #2463009: Introduce CacheableResponseInterface: consolidate ways of setting X-Drupal-Cache-Tags/Contexts headers and similar to make content negotiation work with the page cache due to /api/.

Crell’s picture

I really hate all of the options we have available. Just sayin. :-(

#127 to me suggests that we should stick with extensions. The alternative is a seemingly random collection of options; while we can't stop people from putting an arbitrary format at an arbitrary place (nor should we), our "80% automated" mechanisms should be consistent. Having a mix of "sometimes it's /api with Accept, sometimes it's an extension, sometimes it's an extra path element, depending on things" doesn't seem very consistent.

Much as I hate extensions, "if one path has multiple returns, look at the extension" is at least consistent and predictable and more easily documented.

Damien Tournoud’s picture

Note that #121 is really just the beginning of the story: from there we need the related changes inside the rest modules and those sound like where the interesting stuff is. For example, we need to make sure that generated links have the proper format extension, and I'm not sure that the current routing API is able to do that without massively breaking its encapsulation.

effulgentsia’s picture

Since I wasn't at the Drupal Dev Days discussion, thank you for the issue summary updates in #110 - #114. This comment is based purely on the information in that summary, so if there's additional stuff that was discussed that makes this comment stupid, then apologies.

Like #125, I don't understand why the proposal is for rest.module routes within /api to not have extensions. For me, what I find most sad in needing "node 1 as html" and "node 1 as json" to be at different URLs at all is the links issue. If you're a JS application, you might want to get "node 1 as json", but then put the links (e.g., to the node's author, tags, etc.) that are returned within that into some HTML fragment of your application and have those be regular clickable links (i.e., to HTML representations). So regardless of whether the JSON representation is at /node/1.json, /api/node/1, or /api/node/1.json, the client will need to do some link manipulation when inserting content from the response into a different format context than the original request. It would have been awesome if all of the internet (browsers, proxies, CDNs) were standardized to cache efficiently with different representations at the same URL, but oh well, they're not, so we're stuck with passing on some complexity of link management to multiformat client applications.

Once we make the decision to put "node 1 as html" and "node 1 as json" at different URLs, I don't understand what's gained in putting "node 1 as hal" and "node 1 as json" at the same URL. The argument that Facebook, Google, and Amazon have APIs that don't use extensions seems only marginally relevant to me. Those are individual websites with plenty of budget to configure proxies and CDNs for maximum efficiency. Whereas Drupal is a framework that needs to cater to millions of websites (I'm guessing thousands of which will enable REST module for more than one format). Why require all of those site owners to muck with Proxy and CDN configurations? If there's a substantial benefit that justifies it, then ok, but if it's just to mimic Facebook and Google, meh.

According to #131 and #133, we have some potentially difficult code to write for either approach, so I don't have an opinion at this time on which is more expedient in terms of releasing Drupal 8. So for me, it's more a question of what's better for site owners, and I think working on a broader range of proxy/CDN configurations is more compelling than having the URLs be pretty. Especially if there can be a contrib module to implement header-based negotiation at the same URL for those who want it.

Crell’s picture

Issue summary: View changes
Fabianx’s picture

#134: I think we still should be supporting content negotiation inside of core and make the use cases work - even if its not the default and even if path based negotiation is additionally available.

It will be difficult for contrib to add it in a clean way in my opinion, especially if we look at the internal page cache.

catch’s picture

@Damien

For example, we need to make sure that generated links have the proper format extension, and I'm not sure that the current routing API is able to do that without massively breaking its encapsulation.

As effulgentsia pointed out, this isn't any easier for /api/, or if it is, that needs explaining.

If you make a request to node/1.json, then you might want links to point to JSON resources, or you might want the canonical HTML link - depends on the use case of the caller.

If you make a request to api/node/1 for JSON, then you also might want links to point to JSON resources, or you might want the canonical HTML link.

Just node/1 with accept headers doesn't have that problem, but then you have all the other problems.

The other issue for me with /api/ is situations like this:

- Site installs a module that supports a generic Drupal native mobile administration app - this enables json for every entity.

At this point /api/ is only serving one format, so doesn't send a Vary header. So reverse proxy caching/CDNs 'just work'.

- Site then installs a module that provides epub per entity type.

At this point /api/ is serving two formats and sends the Vary header, so varnish/CDNs may do any of 1. stop caching 2. cache OK 3.need configuring.

You can also get to that situation just by clicking around in the UI with rest module, no need to install modules or write code. The behaviour is unpredicatable for end users and requires a lot of knowledge to fix if something goes wrong. As evidenced by this issue, its predecessors and the restws SA.

Whereas with extensions, regardless of how many formats you have, behaviour is the same.

Damien Tournoud’s picture

@catch: No one is pretending that the current solution on the table is perfect. It is really just tuned toward making the 80% use case work properly (i.e. no rest module), which not butchering the 20% use case (the rest module) too much.

If we think we can fix all the complicated problems properly, and accept the likely necessary API changes, I'm fine with moving toward having "extension-based only" content negotiation as the default option.

Damien Tournoud’s picture

My ideal scenario remains to not try to fix this problem. Accept-based negotiation is becoming more and more common, and CDNs are increasingly supporting it. The idea of moving REST to /api is in line with this: it's the forward thinking approach that makes it harder for people to shoot themselves in the foot by blindly enabling the REST module, while still using Accept-based negotiation for the REST part.

As a second best scenario, if we want to move add extension-based negotiation in addition to Accept-based negotiation, we need to do it properly:

  • Add a global switches in the routing system that allows to enable/disable: (1) Accept-based negotiation, (2) Extension-based negotiation;
  • Add a mapping between MIME types and extensions somewhere and give it to the router;
  • Change all the URL generation code to take a MIME type (in the Url class, etc.) as an input, defaulting it to the type of the current request

Contrary to the current patch on the table, I don't think it's ok to force the negotiation of the extension to be a explicit route parameter. This really need to be transparent, so that a given site can chose between the three possible negotiation types transparently (Accept-only, Extension-only, both).

pwolanin’s picture

After having started working on the code, I strong think we should continue on the path of adding a prefix and NOT trying to add or force extension support.

Generic extension support is going to require us hacking up the router and I think should be out of scope for beta, and I think will delay getting to RC.

Fabianx’s picture

I am happy to help with the needed cache work for the page_cache to support /api in an efficient way.

Crell’s picture

Damien: The end result either way is that by the time routing is done, the _format attribute of the $request is populated with a format machine name (html, json, hal_json, etc.) and any code after that should rely on $request->getRequestFormat() and *nothing else*. The request is already available to the generator, too. Also, Request already has a configurable mime type->machine name map built in that we're using, and that's what we'd use for extensions. So the 2nd and 3rd bullet you note is already the case.

I want to avoid "mode switches", as that leads to gross complexity. But yes, whatever we do for the 90% case we do want to make sure we can still support Accept-based matching if someone is in a situation where they know it won't break their site. My biggest concern is keeping it as transparent (as possible) to module developers, as the vast majority of them neither understand nor should need to understand all of this silliness.

dawehner’s picture

Just try to sort things a bit out.
Both approaches /api and .json have a problem with the way how we implement modals/dialogs, because they are done on a global, not per route level.

Given that we should do #2472323: Move modal / dialog to query parameters no matter of the result of this issue.

Damien Tournoud’s picture

@dawehner: one of my points in #139 is that if we go extension-based, we should support that transparently right inside the router. We can't force every route declaring module to have to understand that. In that case modal/dialogs *could* be handled by extensions too.

Damien Tournoud’s picture

Note that there are also interesting side effects of extensions with path aliasing: do we want the path aliasing to also support extensions: you aliased node/123 to about, do we want about.json to do the right thing?

(I would say, of course we want that, and that's just one other unintended consequences of moving to extension-based negotiation.)

eggyal’s picture

New to the party, but here's a thought...

Have /node/1 use content negotiation: responses specify Vary: Accept, but are also merely a 301 (moved permanently) redirection to the relevant content-specific URI e.g. /node/1.html.

Thus when a user agent encounters a node that it has never before seen, a request will have to be made to the origin server—but it can be processed extremely quickly and the response would be very brief indeed. Whenever it encounters that node thereafter, it will use cached versions of the content in its desired format (whatever that might be).

Example HTTP exchange:

GET /node/2364011 HTTP/1.1
Host: www.drupal.org
Accept: text/html

HTTP/1.1 301 Moved Permanently
Location: 2364011.html
Vary: Accept

GET /node/2364011.html HTTP/1.1
Host: www.drupal.org
Accept: text/html

HTTP/1.1 200 OK
Content-Type: text/html

Or

GET /node/2364011 HTTP/1.1
Host: www.drupal.org
Accept: application/json

HTTP/1.1 301 Moved Permanently
Location: 2364011.json
Vary: Accept

GET /node/2364011.json HTTP/1.1
Host: www.drupal.org
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

Etc.

With other request methods—e.g. POST, PUT, DELETE, etc—responses are unlikely to be cacheable in the vast majority of cases anyway. For those cases where they can be cached (and it's desirable to do so), a 301 response may not work as intended (since it will normally result in a GET request to the redirected location); one could instead use 307 or 308 in those cases (the former only indicating a temporary redirection, albeit one for which cache metadata could nevertheless ensure reasonable persistence; and the latter being a more correct permanent status, albeit one that is not yet so widely supported).

znerol’s picture

Re #146 this is what #2430335: Browser language detection is not cache aware tries to do for multilingual sites. Note that this cannot be a general solution because that doubles the HTTP requests. While it might be possible to handle them "very quick", tha still affects client performance significantly.

eggyal’s picture

#147—I agree that it increases the number of requests, which will obviously have an adverse performance impact, but it certainly won't *double* the number of requests due to:

  1. Clients will cache 301 responses and internally rewrite any subsequent references to those nodes they have fetched before, never again needing to make the initial request but instead going directly for the appropriate content-specific URI; and
  2. Caches that can handle Vary: Accept will be able to respond (with the 301 redirect) to the first request without the origin server being hit at all.

Perhaps it's not a "general solution", but it might be an acceptable solution for those who wish to enable content negotiation.

neclimdul’s picture

Sorry I haven't had any free time this week, lets see if I can catch up. There's so much discussion I don't think I can break out all the points

1) I think Damien sort of nailed the reason for moving rest.module to /api in #139, its a forward thinking move to allow support support for more advanced API's to mature by having basic support and a working implementation. Moving it is a way to allow support to mature on top of core and address effulgentsia's 100% valid concern that we can't expect Google style infrastructure knowledge to support it. We turn off caching out of the box and it _should_ be usable.

Let me take a step back though, one of the ways I approach this is in thinking of core is a system but more importantly thinking of it as a toolbox for site developers. These 2 approaches of extensions and Accept negotiations are two tools, not mutually exclusive solutions. Support both in different ways and allow the system designer to decide which meets their requirements.

Extensions:
Built in cache-ability.
Ability to build on top of existing url's like node/1
Easy to understand, we use them in everyday situations.
Accept based:
Allows REST style resources
Industry standard for an "API"
Complicated caching problems (we won't cache by default)

2) Link building I don't know if a solution has really come up. With format being fairly easily resolvable I think the extension should be fairly easy for individual implementations to implement. I hope rest is fairly self contained and will also just make sense going forward.

3) Aliases, #145 yeah in earlier discussions the same sentiment was put forward. The does seem ideal, I'm not sure how to support it. My gut says its ugly. I think maybe more aliases? Is that ideal? how when does it happen? I don't have the answers, so it was left as @todo in the test suite.

4) Redirects. That's pretty novel, it might be worth investigating after this since I don't think its precluded with the current approaches. I have some concerns though because the root of this issue is caching around Vary: Accept. It seems either we're creating a ton of uncached requests to node/1 or we're stumbling into trusting Vary: Accept caching behaviors for the 301.

catch’s picture

Aliases I'd expect us to have to add more aliases to support the REST routes, but also don't see how extensions is better or worse than /api/ on that front.

Not to mention if you had something like epub and wanted to be able to visit it in a browser (there are epub extensions for browsers so this is quite possibe, or you might want to download it as a file to open it with an e-reader), how do you access that? Can't do a format-specific alias for that either - would mean adding an extra route definition in that case probably.

catch’s picture

#149 is a good summary of the pros and cons of each approach, the pros/cons were really missing for me from the dev days google doc hence push back here.

neclimdul's right that the 301 redirect has the same problems with caching as not redirecting, we'd have to issue a 302 anyway since by definition it's not a permanent redirect if the location changes.

#2430335: Browser language detection is not cache aware is talking about redirects explicitly to prevent caching. But it assumes that this will happen once per session if at all - since once you get redirected, all links then get language prefixes applied and browser negotiation won't run. Additionally you can have language negotiation/translation without enabling browser language negotiation at all and this is configurable in the UI. So the cacheability issue is the same, but the assumptions/requirements of the system are quite a bit different.

dawehner’s picture

webchick’s picture

Tagging.

neclimdul’s picture

Sorry for sitting on this, I didn't think I was giving it a very objective write up so distanced my self for a bit. After Thursday's meeting of minds we came up with a plan forward that includes a short term and long term goals for a solution.

Short term:
1) wrap up #2472323: Move modal / dialog to query parameters
There where different views on why but we all agreed this should get out of the way. Personally I think of it as modal/ajax are modifying requests and happening in a generalized way that must be supported on _every_ path. To simplify caching and remove complexity from other solutions we move forward with this issue.

Long term:
There was very little interest for using Accept based negotiation for solving core issues. Only "nice to have" for contrib and site developers. Based on that in the long term we investigate 2 approaches simultaneously.
1) We investigate supporting query arguments for specifying format.
2) We investigate supporting extensions for specifying formats.
And finally there wasn't a lot of consensus on how to support Accept headers. Since it doesn't seem we'll be supporting it in core we discussed moving as much as possible into a module. To use it you would need to opt in in some way to force the correct vary/caching for the URL.

Much of the discussion that lead to this approach hinged around how automated URL generation would work for resources. The example of this is how a link to a reference to a resource would be generated in a HAL formated response. The first concern was around prefix I believe was a resource living in 2 places (/api/node/1 and /node/1). Also, for extensions there was concern about how how would be possible for generating paths with an extension without creating invalid links.

dawehner’s picture

Thank you neclimdul for writing up the summary, I think it contains all the imortant bits, especially written in a neutral way :)

catch’s picture

To expand on link generation discussion, there's issues with all of the approaches (except the one in HEAD which this issue is trying to change).

When viewing a resource in whichever format, we don't necessarily know if we want the canonical URL (i.e. node/1) or the version in the same format that we're viewing (node/1.json, api/node/1, node/1?_format=hal_json). We could assume that we always want to link in the same format, but some routes won't support that format because they only provide HTML or some other hard-coded format (entity forms, sitemap.xml etc.).

So this means either:
- don't try to specify the format - but then you'd have to do link-munging in the client (prepend /api, append an extension).
- always append the format, but end up with 404 or 406 responses from some links.
- in the link generator, try to figure out which formats may or may not be valid - has performance implications.

If we have everything at node/1 with accept headers, we don't have that problem - but we have the various caching problems this issue is trying to solve.

However with query parameters, it's valid for the application to ignore them - so if you link to sitemap.xml?_format=json then returning sitemap.xml isn't the end of the world. This would let us always link with the format, and core will already ignore the query parameter if it doesn't care about it (although the current extension patch also handles this in a similar way for incoming requests).

Additionally, if browsers/proxies/CDNs all started to support accept headers + vary + caching consistently, then 9.x or whichever version could switch to/add accept header content negotiation and node/1?_format=hal_json will work just fine with an accept header - no broken links/redirects to worry about.

However aesthetically/semantically extensions are nicer and some of the complexities (like node/1.xml vs. sitemap.xml are already working with neclimdul's work), so we thought about going a bit further with both of those approaches before trying to make a decision.

dawehner’s picture

A review on #2481453: Implement query parameter based content negotiation as alternative to extensions would be highly welcomed, or at least some critique about that we really should try extensions.

clemens.tolboom’s picture

grendzy’s picture

I wanted to provide some data & context here – the amount of Accept header variation is underestimated. Given the messiness of the browser landscape, I think it's unlikely this situation will improve during the life expectancy of Drupal 8. An "upgrade path" to content-negotiation-everywhere would be nice, but I wouldn't consider it a priority, IMO.

I logged some data on my company's site, and in one weekend observed 95 unqiue Accept headers. Below are some of the most common headers. Under these conditions, Vary: Accept will will have a very poor hit rate, even with caches that understand it.

text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
image/webp,*/*;q=0.8
text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, */*
application/atom+xml,application/rdf+xml,application/rss+xml,application/xml,text/xml,application/json,*/*
image/png,image/*;q=0.8,*/*;q=0.5
text/html,text/plain,text/xml,text/*,application/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/rdf+xml
image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5
text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1
text/html, application/xhtml+xml, */*
text/css,*/*;q=0.1
application/rss+xml, application/xml, */*
text/html,text/plain,text/xml,text/*,application/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/rdf+xml,application/php,application/x-php,application/x-httpd
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,UC/145,plugin/1,alipay/un
image/gif, image/jpeg, image/pjpeg, application/x-shockwave-flash, */*
application/xml, text/xml, */*; q=0.01
image/webp,image/*;q=0.8
*/*,image/webp
application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword,
application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, */*
application/javascript, */*;q=0.8
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,image/webp
text/xml, text/html, application/xhtml+xml, image/png, text/plain, */*;q=0.8
image/webp, image/png;q=0.9, image/jpeg;q=0.9, image/gif;q=0.9, */*;q=0.8
text/html, */*
application/vnd.wap.xhtml+xml,application/xhtml+xml;q=0.9,text/vnd.wap.wml;q=0.8,text/html;q=0.7,*/*;q=0.6
image/png,image/webp,*/*;q=0.8
image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/msword, */*
image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,application/x-shockwave-flash,*/*
application/atom+xml, application/rss+xml, application/rdf+xml;q=0.9, application/xml;q=0.8, text/xml;q=0.8, text/html;q=0.7, unknown/unknown;q=0.1, application/unknown;q=0.1, */*;q=0.1
text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
text/html,application/xhtml+xml,application/xml;q=0.9,application/json;q=0.85,*/*;q=0.8
text/html,*/*;q=0.9
image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Damien Tournoud’s picture

@grendzy: Granted, but for all practical purposes, it probably doesn't matter. Given that the distribution of incoming requests is extremely far from uniform, the variability in the cache key usually has very little influence on the overall cache hit rate.

If you want to continue your though experiment, you should get the log data and simulate a cache key, and recompute what the overall hit rate would be with Accept in the cache key versus without it. I doubt the hit rate with Accept is going to be significantly lower.

Damien Tournoud’s picture

@grendzy: Also, nothing prevents you from normalizing the Accept headers in your caching/distribution layer (like it is usually done for Accept-Encoding). With more and more websites using content-based negotiation, we are going to see CDNs offering that option more and more.

mikl’s picture

As for “variability in the cache key usually has very little influence on the overall cache hit rate”, that is completely wrong as far as I am concerned. Every variation here basically cuts the cache hit rate in half.

So consider the three big desktop browsers:

Firefox 38: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Chrome 42: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Safari 8: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

By using Vary: Accept, you'd basically have a separate cache bin for the desktop version of Chrome. And when you then add mobile browsers and niche platform like Windows to the mix, then the cache hit rates go straight down the tubes.

As for normalising the Accept headers, that's not at all trivial. Unlike Accept-Encoding which has very few (reasonably) possible values, rewriting Accept headers requires deep knowledge of the application and even the specific route. So it will likely never be a standard option that you can enable on your CDN.

Drupal should be engineered for the web have, rather than the one we wish we had. If we ship Drupal 8 with Vary: Accept, CDNs and reverse proxies will have to consider us broken out-of-the-box, and will have to choose between offering bad service or breaking Drupal's REST API.

dawehner’s picture

If I understand the point of damien correctly this is about really high traffic sites, which have a lot of requests from all possible browsers,
but I'm not convinced that we should actually aim for just that particular usecase.

Advertisment: #2481453: Implement query parameter based content negotiation as alternative to extensions is still ready for a review

Damien Tournoud’s picture

@milk#162:

As for “variability in the cache key usually has very little influence on the overall cache hit rate”, that is completely wrong as far as I am concerned. Every variation here basically cuts the cache hit rate in half.

This is not true at all, even in the most trivial cases. Suppose one URL, receiving 1,000 cached requests from 2 different browsers, without Accept as part of the cache key, you would get a hit rate of 999/1000 (one uncached request, followed by 999 cached requests), with Accept as part of the cache key, you would get a hit rate of 998/1000 (two uncached requests, followed by 998 cached requests).

Think about something else: many websites currently leverage a full-site CDN with multiple POPs around the world (Amazon CloudFront, for example, has more than 30 of them). Each of those POPs cache separately, but you still get really high hit rates.

You get a high hit rate when the flow of incoming requests is significantly higher than the rate of change of the underlying page, and that's basically all. This is the case of the extreme majority of Drupal websites, which are basically brochurewares, even if you make the cache key more complex.

As for normalising the Accept headers, that's not at all trivial. Unlike Accept-Encoding which has very few (reasonably) possible values, rewriting Accept headers requires deep knowledge of the application and even the specific route. So it will likely never be a standard option that you can enable on your CDN.

Of course, all the normalization done on the CDN *has* to be application-aware. This is already really easy to do with Fastly, so I anticipate that other CDNs are going to provide the same types of options when needed.

Drupal should be engineered for the web have, rather than the one we wish we had. If we ship Drupal 8 with Vary: Accept, CDNs and reverse proxies will have to consider us broken out-of-the-box, and will have to choose between offering bad service or breaking Drupal's REST API.

There are reasons to move away from the Accept header for this, but all reverse proxies and many CDNs can easily support this use case, so let's not take this decision for the wrong reasons.

Crell’s picture

most websites currently leverage a full-site CDN

No, most websites over a certain size/budget/wealth of owner do. That's not a representative sample.

Also, as discussed far up the thread (or maybe one of the precursor threads, I've lost track), while we could potentially do all sorts of black magic with the incoming request to normalize it and make Accept headers useful, sending Vary: Accept can do unpleasant things on the *browser* vis a vis caching. We have no control there, no matter how much of the infrastructure you own. At least not in the general case.

Damien Tournoud’s picture

No, most websites over a certain size/budget/wealth of owner do. That's not a representative sample.

I obviously meant "many". My point was not about the number of websites that uses a full-site CDN. My point was that full-site CDNs having multiple locations over the world doesn't affect the cache hit ratio much.

Also, as discussed far up the thread (or maybe one of the precursor threads, I've lost track), while we could potentially do all sorts of black magic with the incoming request to normalize it and make Accept headers useful, sending Vary: Accept can do unpleasant things on the *browser* vis a vis caching. We have no control there, no matter how much of the infrastructure you own. At least not in the general case.

I am not arguing against moving away from Accept-based negotiation. I'm arguing against *bad reasons* to move away from Accept-based negotiation. That not many websites today are doing Accept-based negotiation mixing HTML and non-HTML, and that as a consequence there might be some browser bugs in that scenario is indeed a powerful reason not to try to do it.

dawehner’s picture

Priority: Critical » Major

Query parameter based content negotiation is in!

So both CDNs, and potentially more important browsers don't have issues anymore.

Given that the issue should not be critical anymore, maybe some of the accept header based routing discussion could still be done here ...

Wim Leers’s picture

I also want to call out that #2481453: Implement query parameter based content negotiation as alternative to extensions kept Accept header-based negotiation as a test, to ensure that contrib/a future D8 core release can still add it.

dawehner’s picture

It would be really interesting to have someone (@neclimdul) try it actually out and report what kind of problems this results to?

Crell’s picture

Status: Needs work » Postponed (maintainer needs more info)

I don't think postponed is really correct here, but we don't have a "needs more research" status. Now that query params are in, can someone verify if there is still a cache poisoning here, and if so what the details are? (If it's not an issue now, great, we're done, huzzah.)

neclimdul’s picture

Status: Postponed (maintainer needs more info) » Fixed

Lets go ahead and close this out.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

andypost’s picture

There's still 2 todos in core

git grep 2364011
core/lib/Drupal/Core/EventSubscriber/AcceptNegotiation406.php:13: * @todo fix or replace this in https://www.drupal.org/node/2364011
core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php:11: * @todo This is a temporary solution, remove this in https://www.drupal.org/node/2364011

And somehow application/json is not resolved as proper format from Content-Type header

andypost’s picture

effulgentsia’s picture

Folks here may be interested in #3028501: Enable header-based proactive content negotiation with optimizations and opt-outs available.. I don't think the patch there would mess up external caches, but if you think it would, please comment on that issue.