Problem/Motivation
As of #2019123: Use the same canonical URI paths as for HTML routes, content negotiation is used to determine the response format of REST requests. Since REST clients are served from the same URLs as browsers, this applies to every request.
However, in order to make content negotiation via Accept
request header work with external caches, an application needs to add Vary: Accept
to affected responses. Otherwise an external cache will mix them up and REST clients / browsers are served with responses in random formats.
One big difference Drupal has from other REST APIs it serves HTML on the same path, which causes problems in browsers.
Accept header use cases
- Modal dialogs
- REST API responses
Because of these two completely different use cases, we might want to tackle them differently.
Examples of major REST APIs "in the wild"
APIs using proper accept headers without a format extension (*.json for example)
- GItHub: https://developer.github.com/v3/media/ - Defaults to accept headers:
application/vnd.github[.version].param[+json]
APIs using format extensions (*.json for example) and ignoring Accept headers
- Twitter: https://dev.twitter.com/rest/public -
https://api.twitter.com/1.1/statuses/mentions_timeline.json
APIs using different URLs (e.g. /api prefix)
- Facebook: https://developers.facebook.com/docs/graph-api/overview/
GET graph.facebook.com /facebook/picture? redirect=false
Facebook does not use any inline hyperlinks for API consumers, they only provide IDs
- Google API https://developers.google.com/ad-exchange/buyer-rest/v1.3/
GET https://www.googleapis.com/adexchangebuyer/v1.3/accounts/id
embedded API links do not use a format extension.
- Amazon: http://docs.aws.amazon.com/AWSEC2/latest/APIReference/Query-Requests.html
https://ec2.amazonaws.com/?Action=RunInstances &ImageId=ami-2bb65342 &MaxCount=3 &MinCount=1 &Placement.AvailabilityZone=us-east-1a &Monitoring.Enabled=true &Version=2014-10-01 &X-Amz-Algorithm=AWS4-HMAC-SHA256 &X-Amz-Credential=AKIAIOSFODNN7EXAMPLEus-east-1%2Fec2%2Faws4_request &X-Amz-Date=20130813T150206Z &X-Amz-SignedHeaders=content-type%3host%3x-amz-date &X-Amz-Signature=ced6826de92d2bdeed8f846f0bf508e8559e98e4b0194b84example54174deb456c Content-type: application/json host:ec2.amazonaws.com
No links, no format extensions.
Browser support
Vary: Accept
in browser caches (e.g. Safari, found via this blog post).
See also: http://www.newmediacampaigns.com/blog/browser-rest-http-accept-headers
CDN support
All CDNs need to have a way of either respecting the Vary
headers (and should work out of the box) or customizing the cache key (with configuration).
- Varnish supports
Vary
(once we properly support Vary headers) - All versions of Nginx support customizing the cache key; in addition, recent versions of Nginx support
Vary
- Akamai can be configured to support
Vary
headers. https://www.drupal.org/node/2364011#comment-9328651 , but Akamai can't cache pages the emit vary headers http://my.globaldots.com/knowledgebase.php?action=displayarticle&id=32 - For Amazon CloudFront "only acceptable value for the Vary header is Accept-Encoding. CloudFront ignores other values" so
Vary: Accept
doesn't work. However, CloudFront supports customizing the cache key - CloudFlare has really poor caching support, and Anecdotally CloudFlare doesn't seem to support
Vary: Accept
but official answers are scarce. However, they use Nginx, so we could assume that they will support the Vary header down the road. Or, we could say we simply don't support CloudFlare.
REST use cases
The following cases should be considered when evaluating any solution as to whether they'd be supported and how cleanly.
Routes
- rss.xml - single purpose path responding with a non-html format.
- node/1 - a single purpose path responding with html only format.
- node/1.json - multipurpose path (node/1) with an extension denoting format.
- admin/foo/example.plugin_id - An arbitrary path containing some placeholder not related to format but a point)
- rest/myendpointwithvaryingformats - Accepts only conneg. (Current behaviour in 8.x, not-critical to support and can be in contrib.
Aliases
- myalias.json - an explicitly added alias with an extension denoting format (myalias.json -> node/1.json) (supported in 7.x)
- myalias.html - an alias with an extension, pointing to a regular HTML page (node/1)
- menu.pdf - an alias with a non-HTML extension, pointing to an HTML page (node/1) (supported in 7.x)
- about.json - a request comes in to an aliased path (about -> node/1) with an extension . The alias-with-extension is implicit. (not supported in 7.x
Research
- Interaction between content negotiation and page cache, why content negotiation as a (pre-cache) stack middleware is not going to work #2369681-34: Move ContentNegotiation into a middleware
Other Peoples' Opinions™
The Spring framework prefers extensions (and query arguments) over content negotation and calls content-negotiation "problematic to use".
Roy Fielding on a discussion between extensions and query arguments denies that either "choice has anything to do with REST". This (and his whole mail) really can't be interpreted any other way but that he sees extensions compatible with REST and that he likes extensions better than query arguments.
Proposed resolution
Original proposed resolution was to switch to /node/1.json
and so on. Perhaps keep Content-Negotiation as an (advanced) option.
However, research at Drupal Dev Days shows that most of the major web APIs these days are not using extensions. So this would be a non-standard approach. Additionally, the only reverse proxy we can find without a workaround for content negotiation is CloudFlare, and it seems much easier to just say we don't support it until it does.
Therefore, the new proposed resolution is:
- For modal/dialog, we should use query parameters instead of accept headers
- For REST requests, prefix the path with /api/
However, this may need to be revisited based on research.
Remaining tasks
User interface changes
API changes
Comment | File | Size | Author |
---|---|---|---|
#121 | meta_external_caches-2364011-121.patch | 21.92 KB | neclimdul |
#121 | content_negotiation_tests.patch | 10.58 KB | neclimdul |
Comments
Comment #1
znerol CreditAttribution: znerol commentedComment #2
dawehnerSo #2331919: Implement StackNegotiation solves this issue, what is the point to have this issue opened?
In addition to that the proposed solution:
Varry: Accept
seems sadly not a good idea for HTML routes, we could implement it probablyfor REST routes though?
Comment #3
znerol CreditAttribution: znerol commentedNope, not for external caches (Varnish, Squid, etc.).
We need to specify
Vary: Accept
on all URLs (regardless of routes) where content negotiation can happen. That's why I think that #2019123: Use the same canonical URI paths as for HTML routes was not so smart after all.Comment #4
znerol CreditAttribution: znerol commentedThis might be a starting point for someone. I failed to come up with a good idea on how
FinishResponseSubscriber::needsVaryAccept()
could be implemented.Comment #5
catch#2331919-25: Implement StackNegotiation and the linked comments explain why
Vary: Accept
isn't a good option for external caches. I feel like we should go for some variant of the 'different routes' option.Comment #6
catchThis was worthy of an SA for the Drupal 7 contrib REST module (https://www.drupal.org/node/1966780), I don't think it's any less critical for external caches than the internal one.
Comment #7
klausiOriginally I was very excited about #2019123: Use the same canonical URI paths as for HTML routes, because I thought there was benefit in implementing REST the right way for discoverability of resource paths and with a uniform interface.
Turns out the internet does not give a shit about uniform interfaces or link relations. The term REST has changed its meaning completely and only means some HTTP accessible API now. If you look at all the common APIs out there like Twitter, Github, Facebook etc. they all violate the original REST principles by using some "/api/version/1.2/node/1.json" path instead of using "/node/1" with appropriate HTTP headers.
There are even dubious REST best practice examples out there that promote this non-REST behavior. https://github.com/Gizra/restful is an example for Drupal 7 that is now emerging, and people do not seem to have a problem with it.
As a REST module maintainer in D8 I fear that by following the original REST principles we are inventing a Drupalism here that no one else is doing. And it also makes our own life harder as it might need to be (performance in the routing system and external cachability).
For the last two years I was hoping for the real REST to break through, but apparently it is not happening on the internet. Unfortunately we are now in beta, so not sure what the most sensitive way forward should be.
Comment #8
chx CreditAttribution: chx commented> We need to specify Vary: Accept on all URLs (regardless of routes) where content negotiation can happen.
Like every content entity in the system?
Comment #9
Anonymous (not verified) CreditAttribution: Anonymous commentedi really like the spirit of #7.
for the love of $deity, can we make decisions based on what will actually work, and not architecture-astronaut hand waving.
Comment #10
chx CreditAttribution: chx commentedI find it important to note: any rumors of me hacking beejeebus' account and posting #9 myself are completely baseless.
Comment #11
effulgentsia CreditAttribution: effulgentsia commentedI think the basic principle of a request processing and routing system that can treat an internal "path" and an internal "format" as separate things and that can support:
is very useful. For example, for an internal path of "node/1", it allows us to have:
So, IMO, the decoupling of path, route, and format, at the PHP code level, is useful, so I think a lot of the work that went into the routing system and #2019123: Use the same canonical URI paths as for HTML routes has not been in vain.
However, our useful decoupling of these concepts at the PHP level shouldn't force us into suboptimal solutions for user agents. Given the real performance and caching benefits of including the format into the URL, I think we should do that, at least by default.
In other words, I think we should implement inbound/outbound path processors, just like we do as our default language negotiation strategy. So our URLs can look like
/en/node/1.json
, but the internal path that routes on gets normalized to/node/1
. That would give us the best of most worlds: nicely decoupled PHP code, URLs friendly to real-world proxy caches, and a faster internal page cache fetch.However, for site builders who really want format-agnostic URLs, and are willing to deal with a slower internal page cache fetch and either the bypassing or special configuration of proxy caches, let's provide a setting for switching from URL-based negotiation to header-based negotiation (similar to how we allow language negotiation configuration, though in the case of format negotiation, it probably needs to be either-or rather than a chain). The fact that we already have the concepts of internal paths and URLs decoupled should makes doing this relatively easy.
Comment #12
Wim Leers#9 & #10: Thanks for the giggles!
#11: A resounding "YES!" to your paragraphs about the decoupling of path/route/format at the PHP code level being useful.
But … how do you imagine that those outbound path processors would work, given that the whole point is that different formats are supported? E.g.: you create a menu link (
menu_link_content
) to/node/1
. It's available in JSON, XML, HAL+JSON, HTML, DIALOG and MODAL. There are only two ways to make that outbound path processor choose a format: specify it in the menu link (currently not supported) or default to a certain format. I think that in practice, this would effectively mean we'd always default to HTML. Is that a problem?Comment #13
Fabianx CreditAttribution: Fabianx commented#11: I think that is useful.
I also thought about a setting to disable Vary: Accept and move to URL based negotiation.
But still supporting it - when needed. We could keep full REST for BC reasons (with stack neg library) and just rip it out if / when we see that no one uses it ...
But having a settings switch sounds useful, which would default to fast-by-default.
Comment #14
znerol CreditAttribution: znerol commentedIt seems to me that people here believe that content negotiation is a prerequisite for
REST
. I do not think that this is true, see e.g. REST - Tradeoffs between content negotiation via Accept header versus extensions (posted the link previously in the stack negotiation issue, reposting it here).For example the Django REST framework allows both approaches (see Format suffixes).
I often use the Redmine REST API which uses the extension-approach.
Comment #15
dawehner#11 on top being useful in general we should take in mind that these best practices might chance at any point. Maybe people will come up with using query arguments again.
Keeping the underlying structure to be flexible is a good idea, if you would ask me.
I would argue that for menu links we should dedicate try to prefer the HTML only routes, so the routes which don't have a
_format
specified.If you would to link to something like .json route we should ensure that inbound processing is also executed when we determine the route.
For other stuff like serialized json, the best strategy would be to always try to pass along the context in which we render stuff and use it.
Comment #16
Crell CreditAttribution: Crell commentedBackground: When we discussed this question previously, the answer at the time was "Varnish can be configured to obey accept headers, it's just not the default" and "Akamai doesn't by default, but can also be configured to do so." That is, "yes it means you need to tweak your proxy cache settings but you generally have to do that anyway." So the matter of external caches was considered and we concluded that "well configure them properly" was an acceptable answer.
IF we are going to go back on that, I am still opposed to doing so but Symfony does offer a convention for it that "falls back" gracefully. That is, _format is the request attribute that tracks the negotiated format name. If that's provided in the path pattern (/node/{node}.{_format}) then that wins over whatever negotiated value was there previously.
Supporting that directly would allow people to send requests to /node/5.json_hal, PUT to /node/5.json, etc. It's janky but if you want to do URI-based content negotiation that's the supported way to do so. And it would still transparently allow /node/5 itself to still work with a negotiated header for those who are able and willing to setup the appropriate Varnish rule.
Comment #17
chx CreditAttribution: chx commented#16 please link the discussion. Thanks! (#2019123: Use the same canonical URI paths as for HTML routes does not contain the words "Varnish" or "Akamai")
Comment #18
klausiThis was discussed in #1855260: Page caching broken by accept header-based routing.
Comment #19
effulgentsia CreditAttribution: effulgentsia commentedRe #17: For Akamai: #1855260-28: Page caching broken by accept header-based routing. For Varnish: https://www.varnish-cache.org/docs/3.0/tutorial/vary.html, which gives "Accept-Encoding" as the example, but the same kind of VCL could be written to normalize "Accept" as well.
In #1855260-25: Page caching broken by accept header-based routing, I said:
But yesterday, when I wrote #11, I changed my mind to making core's default be "fast by default", rather than requiring contrib or individual site owners to figure out how to make it fast.
Comment #20
catchI was never happy with the original solution, as you can see from the comment when I committed the patch #1855260-79: Page caching broken by accept header-based routing - that always felt like a fragile workaround for functionality that was flawed in terms of how browsers (which aren't consistent in the headers they send) and reverse proxies (which rightfully don't trust browsers to send the right headers) work.
This isn't just about "fast by default" it's also about "secure by default" - since we don't send
Vary: Accept
Comment #21
Crell CreditAttribution: Crell commentedSide note: Klausi, part of the problem is that "REST" has been twisted in recent years. REST as defined by Roy Fielding originally and expressed in the HTTP 1.1 spec does entail content negotiation, links between resources, statelessness, etc. More recently many organizations have taken to using "REST" to refer to any web service that isn't SOAP or XML, even though that's not accurate. As a result, many are now using the term "hypermedia" to refer to "What REST actually meant before people started abusing the term." Whatever the noun is this week, I do believe Drupal should strive to be "correct" with respect to the standards and protocols in question even if many parties on the web fall short of that metric.
Also adding links to prior discussions.
Comment #22
chx CreditAttribution: chx commented#21 please read #14 especially the linked discussion there which contains
#14 says this was raised in previous issues.
Comment #23
Fabianx CreditAttribution: Fabianx commentedAnother quote from the excellent stackoverflow answers:
If cacheability is one of RESTs requirements, but we can't send Vary: Accept for normal browsers, can't we still have both:
- a node/1 - the Drupal path, only responding to HTML
- rest/node/1 - the "multiplexed" path that will always send Vary: Accept
Route generation might need a parameter, restful => TRUE or something like that, which is off by default.
Yes it is true that 'node/1' is then no longer canonical, but rest/node/1, but is it specified somewhere in REST that a resource MUST NOT BE available at a different location (which is not restful)?
A trouble of this approach might be aliases, but just re-acting to /rest/* the same as pure / might solve that, e.g. /article-1 and /rest/article-1.
In my really humble opinion (this is just an idea) this might be a way we can remain truly restful (just under a different namespace) and still have our fast 90% case for web browsers.
Comment #24
Crell CreditAttribution: Crell commentedklausi: /node/1 and /api/node/1 are different resources by definition, since resource and URI are 1:1. However, there's no requirement that 2 resources can't be backed by the same underlying data or that changing one resource causes a change in another, just that such details are not exposed to the REST client AFAIK.
If we were to leave the current code in place, but then also provide an inbound and outbound path processor that adds/removes /api on incoming/outgoing requests if _format is not html, that would probably accomplish what #23 describes. I don't think that's any better than what #16 proposed, which would also give us both "correct" shared URIs but operational format-distinct URIs. #16 would be more consistent with the tutorials I've seen for Symfony, anyway.
Comment #25
mikl CreditAttribution: mikl commentedI've recently been trying to figure out how to use the REST stuff in Drupal 8, and since I have not had that much exposure to it before I had a fairly fresh take on it. Here's a few observations:
So in short, yes, I think it would be nicer if we had uniform endpoints. So /rest/user gives a list of users, /rest/user/1 gives you the admin account. It might have to be slightly more complex than that to avoid name collisions between modules – so /rest/taxonomy/term/1, /rest/node/node/1 and the like, so the first part of the URL would be the module name and the next would be the entity type.
Comment #26
Crell CreditAttribution: Crell commentedmikl: The "/node gives a list of nodes" is the usual recommendation for "collections" in REST APIs, but in practice I don't think it's viable; adding /api to the URI wouldn't change that on a site with 100k nodes a resource that just lists through them in some arbitrary order, hopefully paged, would be extremely difficult to use if not impossible. Contextually-relevant collections are far more useful, which is exactly what Views is for.
I think the rest of #25 is off topic for this thread, but is good feedback in general that should be discussed in a separate issue.
Comment #27
kim.pepper@mikl Have a look at the REST UI contrib module which might give you more flexibility https://www.drupal.org/project/restui
Comment #28
David StraussLarry and I discussed the worries here at BADCamp. In short, I encourage people to validate their "Accept/Vary" concerns before worrying others or putting work into creating workarounds. I'm pretty optimistic that it's supported properly.
According to Varnish's documentation, it supports Vary headers by default. Someone else at the BADCamp discussion said it also works fine if configured in Akamai (unclear if that's the default). After the conversation, I gave it further thought and realized that supporting different page compression standards requires proper Accept-Encoding/Vary support, and we definitely see a combination of plaintext, gzip, and sdch, and every proxy has to support that.
Still, we should consider shipping Varnish and Apache configuration to normalize the Accept for the content type so we can hit caches efficiently. The values from HTTP clients will probably differ even more than for Accept-Encoding.
Comment #29
znerol CreditAttribution: znerol commentedWhile trying to work on the patch in #4 I realized that this is a bigger problem than I initially thought. The reason is that content negotiation is not only used in the rest module but also in core (e.g.
MainContentViewSubscriber.php
).Any d8 site which is operating behind a caching proxy (no matter whether this is a reverse proxy operated by the owner of the site or an intermediate one) is potentially vulnerable to cache-poisoning for anonymous users.
The easy way to resolve this problem is to unconditionally add
Vary: Accept
to all responses generated by Drupal. However, this might be still not enough because of bugs in browser caches (e.g. Safari, found via this blog post).However, I'd prefer if d8 wouldn't ship with
Vary: Accept
on by default. In this case it will be necessary to make content negotiation optional - not only for the rest module but throughout Drupal core.Comment #30
Wim Leers#29: That doesn't sound good :( Do you have a proposal on how to move forward?
Comment #31
Anonymous (not verified) CreditAttribution: Anonymous commentedmy preference would be specifying what we want in the url, and don't do Vary: Accept by default.
if we make it possible to do Vary: Accept headers for contrib, but don't do it by default in core, we'll get much better hit rates, and a simpler system.
Comment #32
larowlanWhilst I concede that we might have to use URLs instead of accept, one thing accept gave us was progressive enhancement with dialog api. Links to node/4 would open in a dialog if JavaScript was enabled, but would still work without it
Comment #33
znerol CreditAttribution: znerol commentedThis should be doable with minor changes. I hope it is as simple as converting from
data-accepts="application/vnd.drupal-modal"
todata-format="drupal-modal.js"
and then just callback to"node/4." + format
or something.Comment #34
dawehnerHere is a rough idea just to get some momentum on this issue:
Comment #35
dawehnerWell, let's see how much breaks, just being curious here.
Comment #38
dawehnerJust using ".extension" is not as simple as you expect it to be, let's see whether it helps to rename /rss.xml to /rss
as well as changing the order of the path processors.
Comment #40
dawehnerJust a reroll ...
Comment #42
xjmComment #43
dawehnerGiven that we use dots in URLs in various places, for example in the config UI, I wonder whether we should better
try to approach the problem using some URL parameter?
Comment #44
dawehner@berdir suggested to just accept a whitelist of extensions, which would be reserved like 'json' 'hal_json' and 'xml'.
This would probably make core passing but its will be odd to not allow for example config names for that.
Comment #45
dawehnerLet's try out the whitelist idea.
Comment #47
dawehnerFixed a couple of the usages of "/rss.xml" to "/rss".
Comment #48
neclimdulWas going to look at this but it didn't apply. Haven't really looked still but here's a automated re-roll(rebase of last working commit) for anyone else planning on looking. And I guess an updated testbot run too.
Comment #49
podarokLet's avoid of double method execution
Comment #50
dawehnerLet's change it back to needs review, given that we aren't yet (sadly) in the phase of detailed reviews
Comment #51
neclimdulSo finally looked through it. Magic like this sort of makes me nervous. First, its a hardcoded list. Second, its magically changing the path. seems like that will come back to bite someone in contrib.
Comment #52
Dave ReidI would highly suspect this would come back to bite xmlsitemap.module, among others. I'm completely doubtful that everything requesting /sitemap.xml would use the proper Accept: text/xml header, which could then be incredibly confusing if there is an actual browseable/clickable sitemap like most sites do.
Comment #53
dawehner@Dave Reid
For xml sitemap this would be no problem. You don't have different results for XML and not HTML, all you would do is to always return your XML.
It though would become problematic if we don't allow .xml by default. Thanks for bringing this up.
Comment #54
znerol CreditAttribution: znerol commentedI tend to think that the
PathProcessorAccept
approach suffers from the exact same issues as the current accept header based content negotiation. It is more or less just swapping out the negotiation mechanism from accept header to file extension. This may help with solving the external cache-issue, but it does nothing to fix the architectural problems I've pointed out in #2369681-34: Move ContentNegotiation into a middleware.Comment #55
znerol CreditAttribution: znerol commentedFiled #2409187: RouteProvider does not understand extension syntax (maybe critical?)
Comment #56
chx CreditAttribution: chx commentedCould someone do a quick look at various frameworks as to what they do to this problem? For example, here's Spring, it selects the format checking first an URL suffix (.xml) then a URL parameter (?format=xml) and only third the Accept header noting that
We need more data like this.
Comment #57
chx CreditAttribution: chx commentedAnother data point (nothing more than that): http://stackoverflow.com/a/385216/308851 while a certain developer in 2009 was insisting on what D8 does at this point and said that foo.json was not how REST supposed to be by 2011 said developer said (s)he was wrong. A much upvoted comment:
Comment #58
grendzy CreditAttribution: grendzy commentedA practical data point... I'm running a production D8 site behind the CloudFront CDN. "Vary: Accept" is not supported by CloudFront, which causes all kinds of breakage with AJAX (e.g. the image button on WYSIWYG).
Well, I don't have the expertise to make claims about the "original" principles, but even Roy Fielding said That's why I always prefer extensions. Neither choice has anything to do with REST."
I think content negotiation should be optional. Regardless of principles, the concept is objectively a market failure, and if Drupal requires it we are severely limiting the range of compatible CDN and proxy options.
Comment #59
chx CreditAttribution: chx commentedIf CloudFront doesn't support Vary: Accept --and it doesn't-- then this discussion is pretty much over. Updating the issue summary.
Comment #60
chx CreditAttribution: chx commentedComment #61
chx CreditAttribution: chx commentedComment #62
chx CreditAttribution: chx commentedComment #63
chx CreditAttribution: chx commentedMy work here (and everywhere) is done, so unfollowing this. Have fun!
Comment #64
David StraussI see a note in the description about CloudFlare, which I think is irrelevant given CloudFlare not strongly supporting page caching, anyway. I've talked with their engineers via my company's partnership, and they don't recommend using their service as a Drupal page cache. By default, CloudFlare doesn't try to cache pages.
@chx, your information about CloudFront is out of date: https://aws.amazon.com/blogs/aws/enhanced-cloudfront-customization/
Comment #65
grendzy CreditAttribution: grendzy commentedDavid: CloudFront allows custom headers to be "whitelisted", but you have to understand whitelisting means the entire header value becomes part of the cache key (analogous to vcl_hash() in varnish).
Due to the extreme fragmentation of
Accept:
in the browser landscape, whitelistingAccept:
entirely negates the usefulness of the cache.Comment #66
webchickUsing the more common "security" tag here.
Comment #67
dawehnerJust another reroll + feedback from #2364011-49: [meta] External caches mix up response formats on URLs where content negotiation is in use
Comment #70
swentel CreditAttribution: swentel commentedrerolled
Comment #71
jibranAdding JS tag for JS changes. There are no tests for this fix in the patch and I am not seeing an test only patch on the issue.
We should add some unittest for this.
Comment #72
dawehnerFor now this is a test to ensure that things work as expected. It fails at the moment though.
Comment #73
Crell CreditAttribution: Crell commentedComment #75
znerol CreditAttribution: znerol commentedI've already pointed out in #54 that this approach is doomed to fail (as demonstrated by the test failures for
rss.xml
). We only can properly implement this on the router level. Therefore, let's turn this issue into a meta and continue exploring the options in subissues.We could solve the problem with JS wishing to open anything in a dialog by simply adding .{_format} to the path of every route which does not have an extension specified explicitly. That is still hacky, but it addresses the problem on the router level (where it actually belongs). PoC is #2409187-3: RouteProvider does not understand extension syntax subsequent optimizations did not work, therefore we should depart from patch 3. Also note that it is possible to use composite extensions to formats (i.e.
node/1.drupal-dialog.js
,node/1.drupal-modal.js
,node/1.hal.json
).Comment #76
dawehnerWe had a plan outlined on the last wscci meeting, I hope crell can post it.
I fear that putting it into the routing it might be really hard to undo it for intranet usecases.
Comment #77
Crell CreditAttribution: Crell commentedWhat I proposed during the WSCCI meeting last week goes something like this:
As I understand it, there's two cache-based fails that are the cause of our pain:
1) Incoming requests with an Accept header, which browsers screw up and send a dozen variations on that, in practice, result in the same thing. This means we either use that in the cache (and have a ton of duplicates) or don't use it in the cache (and get cache poisoning).
2) Outgoing requests with a Vary: Accept header, which common browsers short-circuit and translate to no-cache. (I hadn't realized this was the case until catch mentioned it in the meeting; the sound you heard was my jaw hitting the floor at just how *censored* broken browsers are in this day and age.)
We've been trying to find ways around the first point. The odds of us finding ways around the second, though, are minimal to non-existent in the general case, because browsers suck. There are also two competing considerations:
1) We want to actually, you know, follow REST, or allow people to follow REST, because we work on the web and HTTP 1.1 is a REST format. That's kind of the point.
2) We want to actually, you know, work with the commonly deployed web browsers, proxies, and so on, even if they're all irrecoverably broken, because people need to be able to actually use our code. That's kind of the point.
As a way to try and achieve both goals, and leverage the architecture we have more or less the way it is designed to be used, we proposed the following:
1) The _format attribute is the canonical source for the request format machine name. Most code should look at that only, and never renegotiate itself. (Or, rather, it should use $request->getRequestFormat() which fronts for it.)
2) Except as noted here, all code should act as though we were doing Accept-based negotiation. That is, always link to /node/5, with no extension.
3) We add a pre-kernel middleware that checks for an extension on the request URI. If one is there, it sets that as _format. If not it assumes a default of html. A container parameter passed to this middleware can disable the default-html behavior. That means, in the default/typical case, content negotiation is done via extension and the Accept header is ignored. (Alternatively, we can default to html only if the Accept header is HTML-ish, ie, strpos('text/html'). That may be a more robust route, actually.) The format extension if any is *removed* from the request before it is passed on. (dawehner argued this could be done in an inbound path processor; I disagree, as we want it done as early as possible so that _format is unset for as little time as possible.)
4) We keep the AcceptHeaderMatcher service. However, we adapt it so that if the request format is already set (which it usually will be) it just uses that as the only legal format. If so, it filters out any non-matching routes and continues. If it is not set, then it looks at the available routes and does a "proper" negotiation based on the Accept header and the ordered list of available routes, then filters accordingly.
5) We should swap out the core ContentNegotiation class used by step 4 with a library that doesn't suck. (I wrote the one in core. It sucks.) The willdurand library we keep trying to use is an option, or if people really don't like that one there's also Aura.Accept which seems to have a larger default set of formats. This choice is independent of the rest of this proposal.
6) The AcceptHeaderMatcher service will in some cases be setting _format on the request. So be it. If it does so, it also needs to set _header_negotiated or some other flag. The Generator service will then check for the _header_negotiated attribute. If set, it will generate urls as now. If not set, and getRequestFormat() != HTML, it will append getRequestFormat() as an extension. That way, the addition and stripping of the extension is entirely transparent to the vast majority of code. Most code can ignore it entirely and everything still works. That includes, in particular, any non-HTML code which doesn't need to think about extensions. (dawehner argued this could be done in an outbound path processor; implementation detail, I have no strong preference.)
7) We add another Response listener that checks for _header_negotiated as well. If set, it does two things. One, it sets a Vary: Accept header for correctness. Two, it disables all caching (no-cache) on the response. Why? Because that's what browsers are going to do anyway, effectively, so we make it explicit. This also gives us a place to document why we're doing it, and explain why browsers are the enemy of a free and open web. </sarcasm>. In case a particular site wants to disable that, because they know they DO control all the servers between client and server and knows that they're not broken, we can provide another container parameter to disable the no-cache logic.
8) We modify core's Javascript to send back requests that have BOTH an Accept header AND an extension. This can probably be automated so that little code needs to change. (Ie, set an Accept header and we add the extension automatically, since there's only about a half dozen that cover the 99.9% case of core. As a follow-up we could even pass a mapping list forward from the server in settings.)
9) One loophole here is that IF there's no extension, and IF the middleware didn't set a default _format, and IF there is an exception thrown from a request listener pre-routing, then the exception listeners may not have a getRequestFormat() available. I think in that case we can probably default to html in the exception handling somewhere, or do something else along those lines. It's a solvable problem best handled by the implementation, not sorted out here.
Upshot of this approach:
1) Core is built to do HTTP/REST "correctly", per spec, including route-sensitive negotiation.
2) Most requests use extension-based handling, which is the only thing broken browsers and proxies will handle correctly.
3) Most code doesn't know the difference; it can think in terms of "correct" REST (single URI), but will transparently use extensions just as well.
4) The impact on existing code not explicitly mentioned here is minimal. I'd argue this is not even an API change for the vast majority of code in core or contrib, if we can automate the Javascript part as well.
Does that seem like a viable approach that will satisfy all parties?
Comment #78
catchI had to disappear towards the end of the irc discussion for a phone meeting, also haven't read the whole plan yet.
One quick point:
We can't sometimes set Vary then not set it on the same path, because that gets back to the situation where the proxy serves a cached HTML response to something asking for JSON (or vice versa), which was the original core bug report by me and the reason for the restws SA.
Comment #79
David StraussI support use of the "Vary: Accept" idiom in our code, even if we have to fuzz it with extensions and then abstract it away.
Comment #80
Crell CreditAttribution: Crell commentedcatch: The "sometimes" would be in the case that the extension-based html-default is disabled. That's going to be all-or-nothing on a given site, so it shouldn't cause an issue, no?
Comment #81
Crell CreditAttribution: Crell commentedAlso, setting Vary: Accept in this setup also implies setting no-cache, so a browser/proxy would have to be REALLY broken to then cache it anyway.
Comment #82
znerol CreditAttribution: znerol commentedVary
response header needs to be identical for any request to the same uri. This is also why we addVary: Cookie
on responses even if no cookie at all was on the request./user/login.hal.json
. Rendered HTML page or json? Which status code?path/to/{resource}.{format}
pattern.In my opinion any attempt to perform routing outside of the router will fall on its face sooner or later, and will add additional legacy code which needs to be cleaned up in the future. Please do not try to take any shortcuts here.
Comment #83
dawehnerLet's try to start with something which is needed for both approaches of both @znerol as well of @crell.
#2445723: Use the $request format instead of the ContentNegotation.
@znerol
Do you think this is a sane step to start with?
Comment #84
Fabianx CreditAttribution: Fabianx commentedSo I think its fine that contrib can disable path based negotiation by doing something (like removing the middleware or setting a flag or whatever).
And I agree with 7. that we always send:
Vary: Accept whenever the AcceptHeaderMatcher runs and does header based format negotiation. (plus that no-cache thingy)
Just to summarize again this gives us:
a) fast and cacheable by default - as url based neg is the default
b) reliable and less cacheable, but correct responses
I think because the middleware would come after the page cache anyway, it is fine to also put it in routing directly.
I would agree with #83 that this is a good first step.
Comment #85
neclimdulI've been working on a plan for this issue and I think there are some questions and discussions about the finer points of some things but #83 is pretty set in all of that so that's definitely the first step we can start on now.
Comment #86
Crell CreditAttribution: Crell commentedznerol: A request to /user/login.hal_json would -> /usr/login, _format: hal_json, which routing will find no route for and throw a 406 Not Acceptable, which is what it does now already.
Comment #87
znerol CreditAttribution: znerol commentedI do not expect to get any 406 from a site which does not do
Accept
based content negotiation.Comment #88
catchAnother problem with extension stripping in a middleware.
node/1 is aliased to my-legacy-path.html
We strip .html
Now the alias system gets my-legacy-path
No match.
Comment #89
effulgentsia CreditAttribution: effulgentsia commentedIsn't the equivalent of #88 a problem in HEAD as well:
node/1 is aliased to en/my-legacy-path
We strip en/
Now the alias system gets my-legacy-path
No match.
Comment #90
catchIn that case we'd only strip en/ if path-based language negotiation was enabled. Otherwise it'll be treated as-is.
And if path-based language negotiation was enabled, aliasing to just my-legacy-path would work for en/my-legacy-path. So I don't think it's the same, no.
We'd be stripping the extension on every site, and it'd also be a regression for sites upgrading from previous Drupal versions where it's been fine to have aliases with extensions. There's also no good reason to do it - we can check or aliases first, then routing would need to assess both foo.extension and foo as routing parts to account for the rss.xml case.
I can't find the link, but neclimdul started a spreadsheet with other use-cases last night.
Another example we have apart from rss.xml is config admin routes which have the yaml filename as argument, so admin/config/config/whatever/foo.yml.
Comment #91
catchComment #92
catchComment #93
kerby70 CreditAttribution: kerby70 commentedCorrecting non standard tag 'Need tests' to 'Needs tests'
Comment #94
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commentedIf an external CDN doesn't respect the
Vary
header *and* doesn't have a way to customize the cache key, it is *broken*. Do we have any evidence of any CDN that fall into this category?From the top of my head:
Vary
;Vary
;Vary
header down the road.Comment #95
catch@Damien did you read the issue? Specifically several things znerol has pointed out are ignored in #94.
Comment #96
catchComment #97
dawehnerNote: https://github.com/neclimdul/conneg_test has some current experiments
Comment #98
pwolanin CreditAttribution: pwolanin commentedTalking with dawehner & alexpott about implementation options. I'm going to try a new approach to take out the header handling and keep the path handling fairly simple.
Comment #99
pwolanin CreditAttribution: pwolanin at Acquia commentedChanging the title - I think this is also not a meta since we don't have any open child issues.
Comment #100
pwolanin CreditAttribution: pwolanin at Acquia commentedStarting to work - not functional yet.
Comment #101
pwolanin CreditAttribution: pwolanin at Acquia commentedadd some test cases and fix implementation
Comment #102
pwolanin CreditAttribution: pwolanin at Acquia commentedmore (slow) progress
Comment #103
pwolanin CreditAttribution: pwolanin at Acquia commentedturns out we are seeing page cache interference, so blocked while fixing: #2471473: REST responses should have proper cache tags
patch here test works with the extra hack of disabing page cache, but not what we actually want.
Comment #104
pwolanin CreditAttribution: pwolanin at Acquia commentedEnough is working - let's see what the bot thinks.
Comment #106
dawehner@pwolanin
We should merge the following tests into this patch:
a) The work we specified in a lot of discussions: https://github.com/neclimdul/conneg_test/blob/master/src/Tests/ContentNe...
It is really helpful to have one place to define what behaviour we expect
b) We should also have a test to ensure that we don't have cache poisoning, https://www.drupal.org/files/issues/2364011-72.patch has that already,
see
core/modules/system/src/Tests/Bootstrap/PageCacheTest.php
We should explain why we pass NULL, it might be obvious now, but i won't be in the future.
Comment #107
neclimdulI think we should roll this discussion back just a bit. I want a working patch as well but my biggest concern is we rush this and get another weird or broken solution. Also, there where not and child issues but I don't know that we have a solution to warrant those yet.
I've had this mostly ready for over a week but life happened and I didn't get it posted. I've worked through the discussions we've had here and discussions that happened in a couple of the -wscci meetings(transcripts can be provided) and come up with a series of tests that I think outline sort of the expected behaviors we expect routing to have at the end. Some of the tests might be ideal best case scenarios that we have to skip but its a starting point.
I haven't read peter's patches yet, I'll be following up this comment with a review. I actually agree we should rip out the current content negotiation mechanic as part of this though. However, I hope as part of removing that we'll be able to replace it with a better solution rather then removing it. I think it gives service based systems a really killer edge and I hope we can leave a working implementation in that contrib can leverage to great effect. Per discussions, that system has to interact with routes only work on path(all routes that match that path) who have been flagged in some way (so all routes block caching and vary on accept).
Comment #108
znerol CreditAttribution: znerol commentedComment #109
znerol CreditAttribution: znerol commentedComment #110
webchickWe just had a big discussion about this at Drupal Dev Days, and decided that this issue summary is definitely out of date. Here's a Google Doc for us to get it fixed up: https://docs.google.com/document/d/1o2_Wnsdw1K7RHDWB7aTAHus0qbfa6WkG8-Ry...
Comment #111
znerol CreditAttribution: znerol commentedComment #112
webchickUpdating the issue summary so far. Not done yet.
Comment #113
webchickMoar summary!
Basic findings: the web has changed a lot since comment #7. While only GitHub uses content negotiation as we do, only Twitter uses file extensions.
Comment #114
webchickMOAR summary.
New proposed resolution:
Still researching some stuff, but coming to consensus. (pwolanin, klausi, DamZ, dawehner)
Comment #115
pwolanin CreditAttribution: pwolanin at Acquia commentedONe thing we missed was X-request-with header set by jQuery. nod_ says:
Comment #116
znerol CreditAttribution: znerol commentedComment #117
dawehnerSome subissues for now.
#2472321: Move rest.module routes to /api
#2472323: Move modal / dialog to query parameters
#2472325: Define fallback strategy for missing accept headers
Comment #118
Fabianx CreditAttribution: Fabianx for Acquia commented+1 on the move to /api
+1 on the move to query parameters for dialog/ajax
Thanks @all for discussing this important issue.
Comment #119
pwolanin CreditAttribution: pwolanin at Acquia commentedComment #120
neclimdulSo re-reading this, I think this might be conflating concerns. The research shows that api's do not sure extensions so we should take that into account when designing a full content negotiation system.
I grabbed a couple of the sites listed about(facebook, twitter, and amazon) and went to their website to see how their XHR requests generally get handled.
Facebook:
Twitter:
twitter is sort of weird. all requests seem to send appropriate request headers.
Amazon.com
Requests all seem to be Accept: */*
no requests supply any sort of extension of query argument that is obvious.
Comment #121
neclimdulTo continue the discussion, updated and fixed some bugs with tests. Obviously if we decide to not use extensions the extension tests should be replaced with what ever other method we implement.
There where concerns about the feasibility of the PoC that was in the patch in my github repo. meta_external_caches-2364011-121.patch contains a more fleshed out implementation for discussion as well.
Done:
Not done:
Comment #124
pwolanin CreditAttribution: pwolanin at Acquia commented@neclimdul - I don't think it's clear that core should ship with any kind of a "full" content negotiation system. We had discussed reducing the current code to as simple a mapping as possible and stopping there.
Comment #125
catchI can see that the new proposal is better than the status quo. But I'm not convinced it's better than using extensions - looks like more work to get done, and still requires configuring (at least one) CDN correctly in the case that REST module is providing > 1 format.
Comment #126
webchickThat is true, though how often do you think that case realistically exists? In looking at the "top 10" REST APIs as part of updating the issue summary, they've all deprecated/removed their XML/SOAP APIs are only are doing JSON, across the board. I think it's nice that we offer the ability to do > 1 format as part of our REST API in the interest of flexibility, but not sure it's a "90%" feature. I think most will just use the one format they get out of the box.
Versus none of the top 10 REST APIs were using extensions anymore, at least as far as I remember.
Comment #127
catchI think it's quite likely that a site might want JSON as well as epub representations of their content. Drupal.org still provides 'printer friendly version' for book like https://www.drupal.org/book/export/html/251019 and I know of at least one other site where people actually use that to put long multi-part content into e-readers and similar.
You might not want epub under /api since it's not for a REST api, but that means you'd end up with something like:
node/1 - > HTML
node/1/epub -> epub
api/node/1 -> JSON
vs.
node/1 -> HTML
node/1.epub -> epub
node/1.json -> JSON
Also github lets you do things like https://github.com/md-systems/redis/pull/5.diff (although it redirects, but principle is similar).
Comment #128
neclimdul@pwolanin isn't the point of #2472321: Move rest.module routes to /api to have content negotiation for REST.module? I was under the impressions there was full consensus for full content negotiation for a subset of paths.
Comment #129
Fabianx CreditAttribution: Fabianx for Acquia commented#128: That is also how I understood it, but I also don't see the problem with it as long as we distinguish between routes / paths with just one format vs. one with multiple ...
And as long no one puts rest.module on the main content paths (like /node, etc.)
Comment #130
dawehnerThe main problem I have with extensions at that point of view is that we change the router itself, which is compared to changing
paths IMHO a much bigger change at that point.
Comment #131
catch#121 doesn't look like that big of a change to me, and it's quite self-contained in terms of impact elsewhere, whereas we already have #2463009: Introduce CacheableResponseInterface: consolidate ways of setting X-Drupal-Cache-Tags/Contexts headers and similar to make content negotiation work with the page cache due to /api/.
Comment #132
Crell CreditAttribution: Crell at Palantir.net commentedI really hate all of the options we have available. Just sayin. :-(
#127 to me suggests that we should stick with extensions. The alternative is a seemingly random collection of options; while we can't stop people from putting an arbitrary format at an arbitrary place (nor should we), our "80% automated" mechanisms should be consistent. Having a mix of "sometimes it's /api with Accept, sometimes it's an extension, sometimes it's an extra path element, depending on things" doesn't seem very consistent.
Much as I hate extensions, "if one path has multiple returns, look at the extension" is at least consistent and predictable and more easily documented.
Comment #133
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commentedNote that #121 is really just the beginning of the story: from there we need the related changes inside the
rest
modules and those sound like where the interesting stuff is. For example, we need to make sure that generated links have the proper format extension, and I'm not sure that the current routing API is able to do that without massively breaking its encapsulation.Comment #134
effulgentsia CreditAttribution: effulgentsia at Acquia commentedSince I wasn't at the Drupal Dev Days discussion, thank you for the issue summary updates in #110 - #114. This comment is based purely on the information in that summary, so if there's additional stuff that was discussed that makes this comment stupid, then apologies.
Like #125, I don't understand why the proposal is for rest.module routes within
/api
to not have extensions. For me, what I find most sad in needing "node 1 as html" and "node 1 as json" to be at different URLs at all is the links issue. If you're a JS application, you might want to get "node 1 as json", but then put the links (e.g., to the node's author, tags, etc.) that are returned within that into some HTML fragment of your application and have those be regular clickable links (i.e., to HTML representations). So regardless of whether the JSON representation is at/node/1.json
,/api/node/1
, or/api/node/1.json
, the client will need to do some link manipulation when inserting content from the response into a different format context than the original request. It would have been awesome if all of the internet (browsers, proxies, CDNs) were standardized to cache efficiently with different representations at the same URL, but oh well, they're not, so we're stuck with passing on some complexity of link management to multiformat client applications.Once we make the decision to put "node 1 as html" and "node 1 as json" at different URLs, I don't understand what's gained in putting "node 1 as hal" and "node 1 as json" at the same URL. The argument that Facebook, Google, and Amazon have APIs that don't use extensions seems only marginally relevant to me. Those are individual websites with plenty of budget to configure proxies and CDNs for maximum efficiency. Whereas Drupal is a framework that needs to cater to millions of websites (I'm guessing thousands of which will enable REST module for more than one format). Why require all of those site owners to muck with Proxy and CDN configurations? If there's a substantial benefit that justifies it, then ok, but if it's just to mimic Facebook and Google, meh.
According to #131 and #133, we have some potentially difficult code to write for either approach, so I don't have an opinion at this time on which is more expedient in terms of releasing Drupal 8. So for me, it's more a question of what's better for site owners, and I think working on a broader range of proxy/CDN configurations is more compelling than having the URLs be pretty. Especially if there can be a contrib module to implement header-based negotiation at the same URL for those who want it.
Comment #135
Crell CreditAttribution: Crell at Palantir.net commentedComment #136
Fabianx CreditAttribution: Fabianx commented#134: I think we still should be supporting content negotiation inside of core and make the use cases work - even if its not the default and even if path based negotiation is additionally available.
It will be difficult for contrib to add it in a clean way in my opinion, especially if we look at the internal page cache.
Comment #137
catch@Damien
As effulgentsia pointed out, this isn't any easier for /api/, or if it is, that needs explaining.
If you make a request to node/1.json, then you might want links to point to JSON resources, or you might want the canonical HTML link - depends on the use case of the caller.
If you make a request to api/node/1 for JSON, then you also might want links to point to JSON resources, or you might want the canonical HTML link.
Just node/1 with accept headers doesn't have that problem, but then you have all the other problems.
The other issue for me with /api/ is situations like this:
- Site installs a module that supports a generic Drupal native mobile administration app - this enables json for every entity.
At this point /api/ is only serving one format, so doesn't send a Vary header. So reverse proxy caching/CDNs 'just work'.
- Site then installs a module that provides epub per entity type.
At this point /api/ is serving two formats and sends the Vary header, so varnish/CDNs may do any of 1. stop caching 2. cache OK 3.need configuring.
You can also get to that situation just by clicking around in the UI with rest module, no need to install modules or write code. The behaviour is unpredicatable for end users and requires a lot of knowledge to fix if something goes wrong. As evidenced by this issue, its predecessors and the restws SA.
Whereas with extensions, regardless of how many formats you have, behaviour is the same.
Comment #138
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commented@catch: No one is pretending that the current solution on the table is perfect. It is really just tuned toward making the 80% use case work properly (i.e. no
rest
module), which not butchering the 20% use case (therest
module) too much.If we think we can fix all the complicated problems properly, and accept the likely necessary API changes, I'm fine with moving toward having "extension-based only" content negotiation as the default option.
Comment #139
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commentedMy ideal scenario remains to not try to fix this problem.
Accept
-based negotiation is becoming more and more common, and CDNs are increasingly supporting it. The idea of moving REST to/api
is in line with this: it's the forward thinking approach that makes it harder for people to shoot themselves in the foot by blindly enabling the REST module, while still usingAccept
-based negotiation for the REST part.As a second best scenario, if we want to move add extension-based negotiation in addition to
Accept
-based negotiation, we need to do it properly:Url
class, etc.) as an input, defaulting it to the type of the current requestContrary to the current patch on the table, I don't think it's ok to force the negotiation of the extension to be a explicit route parameter. This really need to be transparent, so that a given site can chose between the three possible negotiation types transparently (Accept-only, Extension-only, both).
Comment #140
pwolanin CreditAttribution: pwolanin at Acquia commentedAfter having started working on the code, I strong think we should continue on the path of adding a prefix and NOT trying to add or force extension support.
Generic extension support is going to require us hacking up the router and I think should be out of scope for beta, and I think will delay getting to RC.
Comment #141
Fabianx CreditAttribution: Fabianx commentedI am happy to help with the needed cache work for the page_cache to support /api in an efficient way.
Comment #142
Crell CreditAttribution: Crell at Palantir.net commentedDamien: The end result either way is that by the time routing is done, the _format attribute of the $request is populated with a format machine name (html, json, hal_json, etc.) and any code after that should rely on $request->getRequestFormat() and *nothing else*. The request is already available to the generator, too. Also, Request already has a configurable mime type->machine name map built in that we're using, and that's what we'd use for extensions. So the 2nd and 3rd bullet you note is already the case.
I want to avoid "mode switches", as that leads to gross complexity. But yes, whatever we do for the 90% case we do want to make sure we can still support Accept-based matching if someone is in a situation where they know it won't break their site. My biggest concern is keeping it as transparent (as possible) to module developers, as the vast majority of them neither understand nor should need to understand all of this silliness.
Comment #143
dawehnerJust try to sort things a bit out.
Both approaches /api and .json have a problem with the way how we implement modals/dialogs, because they are done on a global, not per route level.
Given that we should do #2472323: Move modal / dialog to query parameters no matter of the result of this issue.
Comment #144
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commented@dawehner: one of my points in #139 is that if we go extension-based, we should support that transparently right inside the router. We can't force every route declaring module to have to understand that. In that case modal/dialogs *could* be handled by extensions too.
Comment #145
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commentedNote that there are also interesting side effects of extensions with path aliasing: do we want the path aliasing to also support extensions: you aliased
node/123
toabout
, do we wantabout.json
to do the right thing?(I would say, of course we want that, and that's just one other unintended consequences of moving to extension-based negotiation.)
Comment #146
eggyal CreditAttribution: eggyal commentedNew to the party, but here's a thought...
Have
/node/1
use content negotiation: responses specifyVary: Accept
, but are also merely a301
(moved permanently) redirection to the relevant content-specific URI e.g./node/1.html
.Thus when a user agent encounters a node that it has never before seen, a request will have to be made to the origin server—but it can be processed extremely quickly and the response would be very brief indeed. Whenever it encounters that node thereafter, it will use cached versions of the content in its desired format (whatever that might be).
Example HTTP exchange:
Or
Etc.
With other request methods—e.g.
POST
,PUT
,DELETE
, etc—responses are unlikely to be cacheable in the vast majority of cases anyway. For those cases where they can be cached (and it's desirable to do so), a301
response may not work as intended (since it will normally result in aGET
request to the redirected location); one could instead use307
or308
in those cases (the former only indicating a temporary redirection, albeit one for which cache metadata could nevertheless ensure reasonable persistence; and the latter being a more correct permanent status, albeit one that is not yet so widely supported).Comment #147
znerol CreditAttribution: znerol commentedRe #146 this is what #2430335: Browser language detection is not cache aware tries to do for multilingual sites. Note that this cannot be a general solution because that doubles the HTTP requests. While it might be possible to handle them "very quick", tha still affects client performance significantly.
Comment #148
eggyal CreditAttribution: eggyal commented#147—I agree that it increases the number of requests, which will obviously have an adverse performance impact, but it certainly won't *double* the number of requests due to:
301
responses and internally rewrite any subsequent references to those nodes they have fetched before, never again needing to make the initial request but instead going directly for the appropriate content-specific URI; andVary: Accept
will be able to respond (with the301
redirect) to the first request without the origin server being hit at all.Perhaps it's not a "general solution", but it might be an acceptable solution for those who wish to enable content negotiation.
Comment #149
neclimdulSorry I haven't had any free time this week, lets see if I can catch up. There's so much discussion I don't think I can break out all the points
1) I think Damien sort of nailed the reason for moving rest.module to /api in #139, its a forward thinking move to allow support support for more advanced API's to mature by having basic support and a working implementation. Moving it is a way to allow support to mature on top of core and address effulgentsia's 100% valid concern that we can't expect Google style infrastructure knowledge to support it. We turn off caching out of the box and it _should_ be usable.
Let me take a step back though, one of the ways I approach this is in thinking of core is a system but more importantly thinking of it as a toolbox for site developers. These 2 approaches of extensions and Accept negotiations are two tools, not mutually exclusive solutions. Support both in different ways and allow the system designer to decide which meets their requirements.
Extensions:
Built in cache-ability.
Ability to build on top of existing url's like node/1
Easy to understand, we use them in everyday situations.
Accept based:
Allows REST style resources
Industry standard for an "API"
Complicated caching problems (we won't cache by default)
2) Link building I don't know if a solution has really come up. With format being fairly easily resolvable I think the extension should be fairly easy for individual implementations to implement. I hope rest is fairly self contained and will also just make sense going forward.
3) Aliases, #145 yeah in earlier discussions the same sentiment was put forward. The does seem ideal, I'm not sure how to support it. My gut says its ugly. I think maybe more aliases? Is that ideal? how when does it happen? I don't have the answers, so it was left as @todo in the test suite.
4) Redirects. That's pretty novel, it might be worth investigating after this since I don't think its precluded with the current approaches. I have some concerns though because the root of this issue is caching around
Vary: Accept
. It seems either we're creating a ton of uncached requests to node/1 or we're stumbling into trusting Vary: Accept caching behaviors for the 301.Comment #150
catchAliases I'd expect us to have to add more aliases to support the REST routes, but also don't see how extensions is better or worse than /api/ on that front.
Not to mention if you had something like epub and wanted to be able to visit it in a browser (there are epub extensions for browsers so this is quite possibe, or you might want to download it as a file to open it with an e-reader), how do you access that? Can't do a format-specific alias for that either - would mean adding an extra route definition in that case probably.
Comment #151
catch#149 is a good summary of the pros and cons of each approach, the pros/cons were really missing for me from the dev days google doc hence push back here.
neclimdul's right that the 301 redirect has the same problems with caching as not redirecting, we'd have to issue a 302 anyway since by definition it's not a permanent redirect if the location changes.
#2430335: Browser language detection is not cache aware is talking about redirects explicitly to prevent caching. But it assumes that this will happen once per session if at all - since once you get redirected, all links then get language prefixes applied and browser negotiation won't run. Additionally you can have language negotiation/translation without enabling browser language negotiation at all and this is configurable in the UI. So the cacheability issue is the same, but the assumptions/requirements of the system are quite a bit different.
Comment #152
dawehnerMake an issue for the query based approach, see #2481453: Implement query parameter based content negotiation as alternative to extensions
Comment #153
webchickTagging.
Comment #154
neclimdulSorry for sitting on this, I didn't think I was giving it a very objective write up so distanced my self for a bit. After Thursday's meeting of minds we came up with a plan forward that includes a short term and long term goals for a solution.
Short term:
1) wrap up #2472323: Move modal / dialog to query parameters
There where different views on why but we all agreed this should get out of the way. Personally I think of it as modal/ajax are modifying requests and happening in a generalized way that must be supported on _every_ path. To simplify caching and remove complexity from other solutions we move forward with this issue.
Long term:
There was very little interest for using Accept based negotiation for solving core issues. Only "nice to have" for contrib and site developers. Based on that in the long term we investigate 2 approaches simultaneously.
1) We investigate supporting query arguments for specifying format.
2) We investigate supporting extensions for specifying formats.
And finally there wasn't a lot of consensus on how to support Accept headers. Since it doesn't seem we'll be supporting it in core we discussed moving as much as possible into a module. To use it you would need to opt in in some way to force the correct vary/caching for the URL.
Much of the discussion that lead to this approach hinged around how automated URL generation would work for resources. The example of this is how a link to a reference to a resource would be generated in a HAL formated response. The first concern was around prefix I believe was a resource living in 2 places (/api/node/1 and /node/1). Also, for extensions there was concern about how how would be possible for generating paths with an extension without creating invalid links.
Comment #155
dawehnerThank you neclimdul for writing up the summary, I think it contains all the imortant bits, especially written in a neutral way :)
Comment #156
catchTo expand on link generation discussion, there's issues with all of the approaches (except the one in HEAD which this issue is trying to change).
When viewing a resource in whichever format, we don't necessarily know if we want the canonical URL (i.e. node/1) or the version in the same format that we're viewing (node/1.json, api/node/1, node/1?_format=hal_json). We could assume that we always want to link in the same format, but some routes won't support that format because they only provide HTML or some other hard-coded format (entity forms, sitemap.xml etc.).
So this means either:
- don't try to specify the format - but then you'd have to do link-munging in the client (prepend /api, append an extension).
- always append the format, but end up with 404 or 406 responses from some links.
- in the link generator, try to figure out which formats may or may not be valid - has performance implications.
If we have everything at node/1 with accept headers, we don't have that problem - but we have the various caching problems this issue is trying to solve.
However with query parameters, it's valid for the application to ignore them - so if you link to sitemap.xml?_format=json then returning sitemap.xml isn't the end of the world. This would let us always link with the format, and core will already ignore the query parameter if it doesn't care about it (although the current extension patch also handles this in a similar way for incoming requests).
Additionally, if browsers/proxies/CDNs all started to support accept headers + vary + caching consistently, then 9.x or whichever version could switch to/add accept header content negotiation and node/1?_format=hal_json will work just fine with an accept header - no broken links/redirects to worry about.
However aesthetically/semantically extensions are nicer and some of the complexities (like node/1.xml vs. sitemap.xml are already working with neclimdul's work), so we thought about going a bit further with both of those approaches before trying to make a decision.
Comment #157
dawehnerA review on #2481453: Implement query parameter based content negotiation as alternative to extensions would be highly welcomed, or at least some critique about that we really should try extensions.
Comment #158
clemens.tolboomComment #159
grendzy CreditAttribution: grendzy at Metal Toad commentedI wanted to provide some data & context here – the amount of Accept header variation is underestimated. Given the messiness of the browser landscape, I think it's unlikely this situation will improve during the life expectancy of Drupal 8. An "upgrade path" to content-negotiation-everywhere would be nice, but I wouldn't consider it a priority, IMO.
I logged some data on my company's site, and in one weekend observed 95 unqiue Accept headers. Below are some of the most common headers. Under these conditions,
Vary: Accept
will will have a very poor hit rate, even with caches that understand it.Comment #160
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commented@grendzy: Granted, but for all practical purposes, it probably doesn't matter. Given that the distribution of incoming requests is extremely far from uniform, the variability in the cache key usually has very little influence on the overall cache hit rate.
If you want to continue your though experiment, you should get the log data and simulate a cache key, and recompute what the overall hit rate would be with Accept in the cache key versus without it. I doubt the hit rate with Accept is going to be significantly lower.
Comment #161
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commented@grendzy: Also, nothing prevents you from normalizing the
Accept
headers in your caching/distribution layer (like it is usually done forAccept-Encoding
). With more and more websites using content-based negotiation, we are going to see CDNs offering that option more and more.Comment #162
mikl CreditAttribution: mikl at Liip commentedAs for “variability in the cache key usually has very little influence on the overall cache hit rate”, that is completely wrong as far as I am concerned. Every variation here basically cuts the cache hit rate in half.
So consider the three big desktop browsers:
Firefox 38: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Chrome 42: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Safari 8: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
By using Vary: Accept, you'd basically have a separate cache bin for the desktop version of Chrome. And when you then add mobile browsers and niche platform like Windows to the mix, then the cache hit rates go straight down the tubes.
As for normalising the Accept headers, that's not at all trivial. Unlike Accept-Encoding which has very few (reasonably) possible values, rewriting Accept headers requires deep knowledge of the application and even the specific route. So it will likely never be a standard option that you can enable on your CDN.
Drupal should be engineered for the web have, rather than the one we wish we had. If we ship Drupal 8 with Vary: Accept, CDNs and reverse proxies will have to consider us broken out-of-the-box, and will have to choose between offering bad service or breaking Drupal's REST API.
Comment #163
dawehnerIf I understand the point of damien correctly this is about really high traffic sites, which have a lot of requests from all possible browsers,
but I'm not convinced that we should actually aim for just that particular usecase.
Advertisment: #2481453: Implement query parameter based content negotiation as alternative to extensions is still ready for a review
Comment #164
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commented@milk#162:
This is not true at all, even in the most trivial cases. Suppose one URL, receiving 1,000 cached requests from 2 different browsers, without Accept as part of the cache key, you would get a hit rate of 999/1000 (one uncached request, followed by 999 cached requests), with Accept as part of the cache key, you would get a hit rate of 998/1000 (two uncached requests, followed by 998 cached requests).
Think about something else: many websites currently leverage a full-site CDN with multiple POPs around the world (Amazon CloudFront, for example, has more than 30 of them). Each of those POPs cache separately, but you still get really high hit rates.
You get a high hit rate when the flow of incoming requests is significantly higher than the rate of change of the underlying page, and that's basically all. This is the case of the extreme majority of Drupal websites, which are basically brochurewares, even if you make the cache key more complex.
Of course, all the normalization done on the CDN *has* to be application-aware. This is already really easy to do with Fastly, so I anticipate that other CDNs are going to provide the same types of options when needed.
There are reasons to move away from the
Accept
header for this, but all reverse proxies and many CDNs can easily support this use case, so let's not take this decision for the wrong reasons.Comment #165
Crell CreditAttribution: Crell at Palantir.net commentedNo, most websites over a certain size/budget/wealth of owner do. That's not a representative sample.
Also, as discussed far up the thread (or maybe one of the precursor threads, I've lost track), while we could potentially do all sorts of black magic with the incoming request to normalize it and make Accept headers useful, sending Vary: Accept can do unpleasant things on the *browser* vis a vis caching. We have no control there, no matter how much of the infrastructure you own. At least not in the general case.
Comment #166
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commentedI obviously meant "many". My point was not about the number of websites that uses a full-site CDN. My point was that full-site CDNs having multiple locations over the world doesn't affect the cache hit ratio much.
I am not arguing against moving away from
Accept
-based negotiation. I'm arguing against *bad reasons* to move away fromAccept
-based negotiation. That not many websites today are doingAccept
-based negotiation mixing HTML and non-HTML, and that as a consequence there might be some browser bugs in that scenario is indeed a powerful reason not to try to do it.Comment #167
dawehnerQuery parameter based content negotiation is in!
So both CDNs, and potentially more important browsers don't have issues anymore.
Given that the issue should not be critical anymore, maybe some of the accept header based routing discussion could still be done here ...
Comment #168
Wim LeersI also want to call out that #2481453: Implement query parameter based content negotiation as alternative to extensions kept
Accept
header-based negotiation as a test, to ensure that contrib/a future D8 core release can still add it.Comment #169
dawehnerIt would be really interesting to have someone (@neclimdul) try it actually out and report what kind of problems this results to?
Comment #170
Crell CreditAttribution: Crell at Palantir.net commentedI don't think postponed is really correct here, but we don't have a "needs more research" status. Now that query params are in, can someone verify if there is still a cache poisoning here, and if so what the details are? (If it's not an issue now, great, we're done, huzzah.)
Comment #171
neclimdulLets go ahead and close this out.
Comment #173
andypostThere's still 2 todos in core
And somehow
application/json
is not resolved as proper format from Content-Type headerComment #174
andypostFiled #2725435: Remove outdated @todo pointing to #2364011
Comment #175
effulgentsia CreditAttribution: effulgentsia at Acquia commentedFolks here may be interested in #3028501: Enable header-based proactive content negotiation with optimizations and opt-outs available.. I don't think the patch there would mess up external caches, but if you think it would, please comment on that issue.