Problem/Motivation

In #2445723: Use the $request format instead of the ContentNegotation., we removed Accept header negotiation. That's all well and good for the cases where the client, like a browser, accepts multiple formats, making negotiation difficult. However, in the case of an HTTP API (e.g., JSON:API), a client might specify a single format in the Accept header. In such cases, we know enough to set the request format early in the processing, and should do so to both solve for certain edge-case bugs and optimize retrieval from page cache.

By HEAD not currently honoring an unambiguous Accept header, we have the following bug:
- Perform jQuery.ajax('/node/1', {'headers': {'accept': 'application/json'}}). The client is explicitly asking for JSON and only for JSON. /node/1 is not available in JSON, so the correct response is to return a 406. HEAD currently ignores the Accept header and returns a 200, with the body in HTML format.

Additionally, by HEAD not determining the format from an unambiguous Accept header, modules like JSON:API are adding their own middleware to do it. For example, see #3027980: [upstream] Move FormatSetter from middleware to a route filter, removing conditionality on Accept header. Except, they're doing it wrong, because they're only checking for the presence of the media type in the Accept header, not that it's the only one there.

Proposed resolution

Expand NegotiationMiddleware::getContentType() to return a format that can be unambiguously determined from the Accept header.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#70 3028501-accept-headers-70.patch17.69 KBkim.pepper
#65 3028501-9.4-no-defaults.patch17.62 KBbradjones1
#56 9.3.x-patch-without-settings-file.patch17.58 KBbradjones1
#42 interdiff_40-42.txt1.12 KBraman.b
#42 3028501-42.patch20.03 KBraman.b
#40 interdiff_37-40.txt2.72 KBraman.b
#40 3028501-40.patch20.2 KBraman.b
#37 interdiff.txt10.18 KBbradjones1
#37 3028501-37.patch19.13 KBbradjones1
#29 3028501-29.patch20.17 KBgabesullice
#29 interdiff.txt1.92 KBgabesullice
#27 3028501-26.patch20.16 KBgabesullice
#27 interdiff.txt17.77 KBgabesullice
#25 3028501-negotiation-middleware-24-code-only.patch4.77 KBbradjones1
#24 3028501-negotiation-middleware-24.patch6.06 KBbradjones1
#24 interdiff-21-24.txt1.76 KBbradjones1
#21 3028501-negotiation-middleware-21.patch5.51 KBbradjones1
#21 3028501-negotiation-middleware-21-test-only.patch2.35 KBbradjones1
#21 interdiff-17-21.txt2.73 KBbradjones1
#17 3028501-negotiation-middleware-17-code-only.patch1.87 KBbradjones1
#17 interdiff-6-17.txt1.13 KBbradjones1
#17 3028501-negotiation-middleware-17.patch2.49 KBbradjones1
#6 3028501-negotiation-middleware-6.patch2.45 KBbradjones1

Issue fork drupal-3028501

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

effulgentsia created an issue. See original summary.

effulgentsia’s picture

Status: Needs review » Needs work

The last submitted patch, 2: negotiation-middleware.patch, failed testing. View results

effulgentsia’s picture

Status: Needs work » Closed (works as designed)

Tomorrow, I'll open a new issue with a different approach, and comment here as to why I'm closing this one.

effulgentsia’s picture

The reason I closed this issue is:

  1. Per #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use, Drupal by default does not Vary responses by the Accept header. Modules can be added to do Accept-header negotiation, but it's not the default. Therefore, this patch is incorrect, since it mutates $request in a way that can result in a different response, which goes against Drupal's otherwise default of not varying by the Accept header. The test failures above fortunately caught that.
  2. The client is explicitly [via the Accept header] asking for JSON and only for JSON. /node/1 is not available in JSON, so the correct response is to return a 406.

    When I opened this issue, I believed this to be an HTTP compliance bug. However, https://tools.ietf.org/html/rfc7231#section-5.3.2 says that it's not:

    If the [Accept] header field is
    present in a request and none of the available representations for
    the response have a media type that is listed as acceptable, the
    origin server can either honor the header field by sending a 406 (Not
    Acceptable) response or disregard the header field by treating the
    response as if it is not subject to content negotiation.

    In other words, Drupal's default behavior of ignoring the Accept header is a valid choice.

  3. and optimize retrieval from page cache

    This is still an issue, but I think I found a better solution for it: #3029373: PageCache::getCacheId() sometimes uses different cache IDs during get and set, causing unnecessary cache misses for some requests

bradjones1’s picture

Title: Honor the Accept header when it's unambiguous » Allow opt-in use of Accept header when it's unambiguous
Version: 8.7.x-dev » 8.8.x-dev
Status: Closed (works as designed) » Needs review
Issue tags: +Needs tests
Related issues: +#2725435: Remove outdated @todo pointing to #2364011
FileSize
2.45 KB

First, I appreciate the initial work-up on this and the thoughtful consideration of reasons *not* to do this, however I think I may have uncovered a use case for supporting this header in the limited sense that sometimes you may *need* to do so for technical reasons.

I came to this issue via #3085360: RouteProvider::getRouteCollectionForRequest() can poison query string of next request, which in short says that you get some buggy, unintuitive behaviour from the route collection cache if an error is handled for a json:api request, but before the request format is set to api_json. That is, an authentication exception will be handled with the request format set to html (the default) and the exception route will be cached with some query strings added. Those query strings, when set on a subsequent authorized request will then make that request fail json:api's validation.

That issue would be solved if the request format can be set sooner. In the case of json_api, spec compliance means supporting only a subset of query parameter keys, and _format is not on the list.

As a quick history review, we use _format because UA and reverse proxy support for the Accept header is spotty. The pattern is actually inherited from Symfony, where the Request object can explicitly set its format from the same query param.

But still, in some cases (most notably json:api, but probably others?) we need the ability to use the Accept header. After some consultation with @gabesullice on Slack, we think this can unlock some blocked issues elsewhere. Since it's possible that this will cause unexpected behaviour behind certain infrastructure, this patch goes a bit further than the original in making this opt-in functionality. We also continue to respect the _format query string over the header, and will only use the header when it is unambiguous - that is, it matches exactly a registered mimetype.

No tests yet, but I think they should be easy enough to add, and we shouldn't fail anything at the moment since this is opt-in.

This would also resolve #2725435: Remove outdated @todo pointing to #2364011.

bradjones1’s picture

Rubber-ducking a few questions:

Is there any reason why it might also be necessary to specify a shortlist of headers that will be allowed to get matched from Accept? E.g., make this ONLY match a specified list of headers? That would further narrow this to a specific kind of request. We could make the allowed value of the setting either a boolean OR an array of allowed mimetypes?

Does the issue regarding the poisoning of the route collection cache still need a separate mitigation? That is, it's still broken if you don't implement this.

gabesullice’s picture

@bradjones1++ for an excellent summary in #6.

Also after an initial pass, that solution looks really nice overall.

Is there any reason why it might also be necessary to specify a shortlist of headers that will be allowed to get matched from Accept? E.g., make this ONLY match a specified list of headers?

I don't think so. The only reason that I can think of would be to avoid the risk of cache poisoning by a malicious UA requesting random media types via the Accept header. I don't think that's a risk because this patch is doing $request->getFormat($mimeType), which narrow the potential cache variation down to only the registered list of formats. Am I missing some other reason to do so?

On a similar note, I am wondering if we want the settings variable to be something like:

$settings['use_http_accept_header_negotiation'] = [
  'accept' => TRUE,
  'accept-language' => TRUE,
  'accept-encoding' => FALSE,
]

So that we can more granularly support content negotiation. F.e. I know certain CDNs respect some of the accept-* headers, but not all of them.

Does the issue regarding the poisoning of the route collection cache still need a separate mitigation? That is, it's still broken if you don't implement this.

If this resolves the issue, then no, I don't think so. AFAIU, the issue is specific to JSON:API's error WRT the disallowed query parameter _format. Since all JSON:API UA's MUST send the Accept header, this issue should fix it in all cases as long as the $settings is enabled. Since this is such an edge-case, I think it's okay to leave it there.


  1. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -89,6 +100,12 @@ protected function getContentType(Request $request) {
    +    else if (Settings::get('use_http_accept_header_negotiation', FALSE)) {
    

    Nit: elseif not else if. Though, I think we can just do an extra newline + if since the first condition will has a return in it.

  2. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -89,6 +100,12 @@ protected function getContentType(Request $request) {
    +      if (count($accept) === 1 && $format = $request->getFormat($accept[0])) {
    

    I know that in Slack we talked about only doing this iff there was only one request header, but is that really necessary? I'm not convinced any more.

    Also, super nitty, but maybe just do reset($accept) instead of $accept[0]. I've been bitten by weird, unexpected indices too many times.

  3. +++ b/sites/default/default.settings.php
    @@ -742,6 +742,15 @@
    +/**
    + * Accept header content negotiation.
    + *
    + * This is used to inform the HTTP content type negotiation middleware it should
    + * use the HTTP Accept header as a potential source for setting the request
    + * format.
    + */
    +# $settings['use_http_accept_header_negotiation'] = TRUE;
    

    I wonder if we should immediately add a Drupal 9 followup to make this the default and make FALSE the opt-in behavior. IMO, Drupal should do the right thing with respect to HTTP and provide this as a mechanism for user's with misbehaving infrastructure opt-in to the workaround instead.

Nice work! Leaving as NR for others to review this. I'll ping @Wim Leers and @effulgentsia to have them take a look and weigh in also.

gabesullice’s picture

On a similar note, I am wondering if we want the settings variable to be something like:

Obviously, I think this patch should only add the accept header at first. We can add the other headers later, as necessary.

gabesullice’s picture

Since all JSON:API UA's MUST send the Accept header,

Reread the spec, I guess it's not required. But Drupal's JSON:API module did make this a requirement: https://www.drupal.org/node/3013494

Wim Leers’s picture

Very, very nice work. 👏

I'm impressed with the detective work you did, across multiple issues, across multiple subsystems in core. 😳🙏

I wonder if we should immediately add a Drupal 9 followup to make this the default and make FALSE the opt-in behavior.

I wanted to ask something similar, but different. Why do we even need to make this a setting if it's risk-free? It's gonna be strange that error responses behave differently (and more sensible) on the miniscule percentage of sites that will turn this setting on.

I understand it's for minimizing risk. But … can you articulate what the risk would be, what the consequences would be?

gabesullice’s picture

I wanted to ask something similar, but different. Why do we even need to make this a setting if it's risk-free? It's gonna be strange that error responses behave differently (and more sensible) on the miniscule percentage of sites that will turn this setting on.

I don't think it's risk-free. If you enabled this with REST module on and send a Accept: application/hal+json request to /node/1 then visit /node/1 with your browser, you'll get a JSON response if you have a misbehaving intermediary.

Maybe you're saying new installs, even of Drupal 8, should have this set to TRUE by default, but existing installs should have a BC setting this to FALSE?

Status: Needs review » Needs work

The last submitted patch, 6: 3028501-negotiation-middleware-6.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

bradjones1’s picture

@gabesullice - Thanks for the thoughtful review - I actually had reset() in there originally but didn't know if it was too clever. Your comments are all great.

I do have a question though about the json:api spec requirement vis-a-vis the linked issue that led me here. One, if the client is not compliant (or during testing you forget the Accept header, or whatever) then you get the cache poisoning issue.

More specifically, json:api in Drupal, as currently implemented, does not enforce this requirement if it is indeed a hard and fast validation constraint. As evidenced by the fact we're having this discussion to begin with, the Accept header is nowhere consulted in setting the format to api_json or validating the request. The request validator does other things, but it doesn't check the header. We just assume that if you're hitting a jsonapi path, that's your request format. See Drupal\jsonapi\Routing\EarlyFormatSetter.

More specifically, the spec reads:

Clients MUST send all JSON:API data in request documents

...but to me a request document is a payload; what about a GET? Maybe I'm missing the definition of a "document" elsewhere to mean, any request to json:api, even one without a document payload?

Since you're a spec maintainer, it would be great to get that answer from you. If the answer is yes, header required all the time, then I think we can maybe remove Drupal\jsonapi\Routing\EarlyFormatSetter and perhaps use some earlier-firing middleware or event to validate the request. But I do think the cache poisoning is an issue since we want to catch non-conforming requests and offer a solution (much like the current validator), not potentially poison the cache.

bradjones1’s picture

So digging into this further - per Gabe's note on #10 (which I originally missed) - the json:api Accept header is not required in the spec for GET... but it was added as a format-setting requirement in #3009596: JSON API 2.x responses always result in a Page Cache MISS... but then removed per the commit to core - see this comment and interdiff.

In that light, this is actually a regression from the core integration? In so far as jsonapi 2.x sets the format with early-firing middleware from a property of the request, but it was replaced with a route filter that instead set the request format based on the route.

bradjones1’s picture

bradjones1’s picture

Fresh patch per the feedback from above, also using Symfony\Component\HttpFoundation\AcceptHeader for fetching the values from the header, partly to use existing tools and second, if this does get expanded in Drupal 9 to allow for other types of content-type negotiation, etc., then it's already in here to use the Symfony component. The way the patch was written originally does work for our present requirement (only match if there's only one value and it explicitly matches a known mimetype) but I think this is more semantic than doing a match against the raw Accept header which may included comma-delimited multiple values. So we use AcceptHeader::fromString() to parse it, then see if it's one value, and further if it's known.

The test failure was on known scaffold files (I'm actually going to include a patch without the change to settings.default.php, so I can still bring this with Composer...) so we can address that once the language is agreed-upon.

Further nits and expansions:

Since it looks like the requirement for requests to contain the appropriate Accept header was added in contrib 2.x, then removed in the commit to core, do we want to make a follow-up to respect the spec to require the correct value on document requests but not GET? I'm not sure if this qualifies as BC break since right now Drupal is more lax than the spec, or...?

Also re:

I wonder if we should immediately add a Drupal 9 followup to make this the default and make FALSE the opt-in behavior. IMO, Drupal should do the "right" thing with respect to HTTP and provide this as a mechanism for user's with misbehaving infrastructure opt-in to the workaround instead.

I like the idea of Drupal not twisting into too much of a contortion to address specific reverse proxies, though not all of them are always going to be under the control of the site owner (corporate proxies?) and I wonder if the upgrade path on this is a little confusing?

I also re-opened the cache poisoning issue since it will not entirely be resolved by this patch (that is, if you send a malformed request you still poison, see note above) - though I think this was foreseen in part by Wim during the core merge process:

Some error responses happen before or during routing, and at that time the JSON:API format hasn't been negotiated yet. So … you get a HTML response.

And that was addressed by changing the weight of the format setter, but it still leaves exceptions before routing. But we still need this issue, too :-)

The last submitted patch, 17: 3028501-negotiation-middleware-17.patch, failed testing. View results

bradjones1’s picture

Issue tags: +Needs tests coverage
andypost’s picture

Issue tags: -Needs tests coverage

This one is enough

bradjones1’s picture

The last submitted patch, 21: 3028501-negotiation-middleware-21-test-only.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 21: 3028501-negotiation-middleware-21.patch, failed testing. View results

bradjones1’s picture

So close - Settings is tough to mock since it's a static, so I unset the test settings when we're done; it does however reveal that the mocked Request in ::testHandle() doesn't have a header parameter set, so adding a HeaderBag there for future-proofing.

The test-only patch above is still basically valid, so not re-adding it here.

bradjones1’s picture

Code-only (no settings file changes) patch for use with composer-patches and drupal-project.

gabesullice’s picture

@bradjones1, this looks great! Thanks!

I reviewed and then addressed all my concerns.

I went a step further than your patch by setting the `Vary` header under all the circumstances where it will be necessary that I can think of. I also adjusted everything to work based primarily off of a container param instead of a settings.php value. Now, the value is only used as a temporary BC measure.

Finally, I went ahead and tried to add some sensible optimizations to keep cache variation to a minimum. For example, I only set the vary: accept header on responses when there the serializer.formats container parameter is available (which is basically a test for whether the serialization module is enabled).

I think I thought of everything I could, but I will not feel confident until this gets some careful scrutiny by @Wim Leers.

  1. +++ b/core/assets/scaffold/files/default.settings.php
    @@ -742,6 +742,15 @@
    +# $settings['use_http_accept_header_negotiation'] = TRUE;
    
    +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -2,13 +2,25 @@
    + * utilize the Accept header for content-type negotiation; this behaviour
    

    I added a lot more nuance to this configuration. The gist is that content negotiation is now managed as a container parameter, but this can be opted out of via settings.php.

    The long story is that for existing installs, everything will continue to work as before, because they will not have `$settings['enable_content_negotiation'] = TRUE` defined. New installs will have this defined when the settings file is created during installation. In Drupal 9, we can remove this setting altogether and only respect the values defined in `services.yml`, using defaults when this is not defined (thus making it purely an opt-out/refinement thing).

  2. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -2,13 +2,25 @@
    + * For certain applications (json:api, for instance) developers may wish to
    

    Nit: JSON:API (capitalized)

  3. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -2,13 +2,25 @@
    + * utilize the Accept header for content-type negotiation; this behaviour
    

    Nit: s/Accept/`accept`: HTTP headers should be lower-cased since HTTP/2 and I'm on a mission to fix the internet! :P See https://blog.yaakov.online/http-2-header-casing/

    Nit: s/content-type/content: "Content negotiation" is a broader concept than just the content-type. See https://tools.ietf.org/html/rfc7231#section-5.3.


gabesullice’s picture

gabesullice’s picture

Title: Allow opt-in use of Accept header when it's unambiguous » Enable header-based proactive content negotiation with optimizations and opt-outs available.
gabesullice’s picture

  1. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -102,7 +117,7 @@ protected function getContentType(Request $request) {
    +    if ($this->contentNegotiationConfig['respect_headers']['accept'] ?? FALSE) {
    

    s/respect_headers/headers

  2. +++ b/sites/default/default.services.yml
    @@ -172,3 +172,14 @@ parameters:
    +    # `use_http_content_negotiation_headers` is set to TRUE via settings.php
    

    This should be enable_content_negotation

gabesullice’s picture

gabesullice’s picture

I'd like to get @effulgentsia's opinion as well.

gabesullice’s picture

More self-review:

  1. +++ b/core/lib/Drupal/Core/DependencyInjection/Compiler/ContentNegotiationCompilerPass.php
    @@ -0,0 +1,55 @@
    +    // does not make sense to enable content negotiation via the accept` header.
    

    Missing a backtick before accept`

  2. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -2,13 +2,23 @@
    + * in practice, Drupal suggests the use of a _format query parameter for
    + * expressing a content type that would otherwise be sent in an Accept HTTP
    + * header.
    

    I'm not sure that this should be our suggestion. I'd like for us to be optimistic about proxy support, not pessimistic.

  3. +++ b/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php
    @@ -2,13 +2,23 @@
    + * disabled by configuring the 'http_content_negotiation_headers' setting.
    

    s/http_content_negotiation_headers settings/content_negotiation.config container parameter in services.yml

  4. +++ b/core/modules/serialization/src/SerializationServiceProvider.php
    @@ -15,7 +16,9 @@ class SerializationServiceProvider implements ServiceProviderInterface {
    +    // Set a priority of 1 so that this pass runs before
    +    // see \Drupal\Core\DependencyInjection\Compiler\ContentNegotiationCompilerPass
    +    $container->addCompilerPass(new RegisterSerializationClassesCompilerPass(), PassConfig::TYPE_BEFORE_OPTIMIZATION, 1);
    

    Maybe ContentNegotiationCompilerPass should have a priority of -1 instead. Curious what others think.

Status: Needs review » Needs work

The last submitted patch, 29: 3028501-29.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Wim Leers’s picture

+++ b/core/assets/scaffold/files/default.settings.php
@@ -742,6 +742,15 @@
+/**
+ * Accept header content negotiation.
+ *
+ * This is used to inform the HTTP content type negotiation middleware it should
+ * use the HTTP Accept header as a potential source for setting the request
+ * format.
+ */
+# $settings['use_http_accept_header_negotiation'] = TRUE;

This works fine for reverse proxies.

But those have never been the problem. Reverse proxies are under control of the site/application/Drupal.

The problem are proxies in front of networks at big corporations, universities, hospitals. Those are out of the control of the application/site/Drupal owner. They may not handle this correctly. And hence cause things to break for all other users on that network interacting with this application.

Now, with the rise of TLS, and the slow rise of HTTP/2 (which de facto requires TLS), this is becoming less and less of a problem. Because these responses are not being cached anyway!

So, in reality, it's only a problem anymore when:

  1. there's a crappy proxy
  2. the site is using HTTP/1
  3. the site is not using TLS
  4. the site is serving a HTTP API to anonymous users

All four factors have to apply before problems arise. But it's very hard to quantify this. And without hard data, it's hard to choose to ship this with Drupal core.

Thoughts?

xjm’s picture

Version: 8.9.x-dev » 9.1.x-dev

This would be a minor-only change. Since 8.9.x and 9.0.x are now in beta, I'm moving this to 9.1.x. Thanks!

bradjones1’s picture

Status: Needs work » Needs review
FileSize
19.13 KB
10.18 KB

Re-roll and incorporating some of Gabe's self-nits. Also removing the references to deprecated $settings option since we're now D9.

I think caveat emptor is an OK approach re: Wim's notes above re: the situations in which the content negotiation can go sideways. Do we need to somehow make it "more" obvious to site owners that the conditions listed above could be an issue? I think we're rapidly approaching a situation where HTTP/2, TLS, etc. are the norm and the noncompliant edge case here is less significant than it was even a year ago?

Status: Needs review » Needs work

The last submitted patch, 37: 3028501-37.patch, failed testing. View results

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

raman.b’s picture

Status: Needs work » Needs review
FileSize
20.2 KB
2.72 KB

Status: Needs review » Needs work

The last submitted patch, 40: 3028501-40.patch, failed testing. View results

raman.b’s picture

Status: Needs work » Needs review
FileSize
20.03 KB
1.12 KB
gabesullice’s picture

Now, with the rise of TLS, and the slow rise of HTTP/2 (which de facto requires TLS), this is becoming less and less of a problem. Because these responses are not being cached anyway!

So, in reality, it's only a problem anymore when:

  1. there's a crappy proxy
  2. the site is using HTTP/1
  3. the site is not using TLS
  4. the site is serving a HTTP API to anonymous users

All four factors have to apply before problems arise. But it's very hard to quantify this.

  • HTTP/2 is now deployed to ~1/2 of all websites.
  • As of October 2020, between 79-98% of all pages loaded in Chrome use HTTPS. If Linux clients are excluded, the numbers jump to 89-98%. (source)
  • The Google team "… used private data sources to track the HTTPS state of the top 100 non-Google sites on the Internet. By [their] estimates, these 100 sites account for approximately 25% of all website traffic worldwide." and 97/100 of those sites default to HTTPS. (source)

Therefore, it seems to me that the numbers affected must be vanishingly small.

And without hard data, it's hard to choose to ship this with Drupal core. Thoughts?

I hope the numbers above temper your caution. What this issue proposes is not something radical. It proposes that we permit Drupal to do content negotiation the way that it was designed as far back as 1997! IOW, the burden of proof is not ours. Even so, we can see how rare the scenario you describe must be.

I understand that there is a risk that a very small number of users may see broken sites because of misbehaving caches. That would not be our fault, nor our responsibility. It would be the fault of the IT team managing that network.

This patch permits website owners to opt-out of this behavior if find our default objectionable. With this patch, at least they have a choice. Right now, the only choice is custom code.

Wim Leers’s picture

It seems to me that the numbers affected must be vanishingly small.

I agree!

I wrote

All four factors have to apply before problems arise. But it's very hard to quantify this. And without hard data, it's hard to choose to ship this with Drupal core.

You just did quantify this! 👏👏🙏

Based on this data, I personally feel comfortable.

It proposes that we permit Drupal to do content negotiation the way that it was designed as far back as 1997! IOW, the burden of proof is not ours.

This is the wrong argument to make, even though I very much share your frustration! Theoretical realities don't help anyone in the real world.

Even so, we can see how rare the scenario you describe must be.

This is the right argument to make. 😊

I understand that there is a risk that a very small number of users may see broken sites because of misbehaving caches. That would not be our fault, nor our responsibility. It would be the fault of the IT team managing that network.

This is also the right argument to make, now that we have the numbers on the side of the theoretically/technically preferable approach. Because now that IT team will be pressured not just by one site "being broken on the organization's network", but because if the theoretical broken Drupal site that uses Accept-header based content negotiation gets reports from these unfortunate users, then the site owner can choose to switch over to HTTPS and … be 100% certain it will work. The IT team cannot cannot in good conscience block HTTPS.

The web is a different world than it was at the time of #2445723: Use the $request format instead of the ContentNegotation., #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use and #2481453: Implement query parameter based content negotiation as alternative to extensions:

So, yes, I agree that the four factors that have to apply before problems arise now make for such a low count in the real world that it totally makes sense to move forward with this.

Wim Leers’s picture

Another point in favor of this: https://blog.mozilla.org/security/2020/11/17/firefox-83-introduces-https...

I'm tempted to bump this to Major 🤓

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

longwave’s picture

Priority: Normal » Major
Status: Needs review » Needs work

I'm being bold and following Wim's suggestion in #45 to make this major. We are building mobile apps that use Drupal as a data store and the _format=json requirement and edge case bugs where you still get an HTML response even though you only accept JSON is irritating - I only just found this issue but this patch hopefully goes some way to fixing that, we are planning to try it out shortly.

The MR needs updating as it mentions Drupal 8.8 but I don't think this will get in until 9.3 now. Also regarding the deprecated setting I think we need to add it to Drupal\Core\Site\Settings::$deprecatedSettings?

gabesullice’s picture

Thanks for doing that @longwave. I did a some research into how CDNs treat Vary: Accept and realized that the Accept header isn’t something I want to spend more of my time on since they don’t handle it well. I could see support for true Accept content negotiation as a contrib module. In fact, I think we really should have that as an option for people who can verify that their CDN is doing things correctly. However, proactive content negotiation still isn’t something we should default to in core... which is really disappointing. When I recreated this issue I was very optimistic that the ecosystem had improved since the _format query parameter became a thing.

On the bright side, I found that Drupal core already has a test module somewhere that has a rudimentary proactive content negotiation middleware that could form the basis of a contrib module. If you’re interested in writing that, I’d be more than willing to help!

For your more immediate needs, I recently committed this to the decoupled_menus module. It does content negotiation based on the hostname. For example, one can configure www.example.com to default to HTML and api.example.com to default to the JSON:API format. That means you don’t need the _format query string. It might be helpful for you, but you’ll probably need to patch it to make it refuse to return an HTML response if a JSON one isn’t available. It should fix cases where there is a JSON response and Drupal is incorrectly sending HTML though.

longwave’s picture

Thinking out loud here: in language module we detect the requested language via various plugins; we allow URL prefix, domain name, querystring parameter, Accept-Language header, etc. Should we offer similar options for content negotiation? It feels like most if not all of those options are equally valid for content types as they are for languages, there can be a similar fallback mechanism, etc.

bradjones1’s picture

I'm curious if Gabe can quantify/provide links to his research which leads to the statement, "I did a some research into how CDNs treat Vary: Accept and realized that the Accept header isn’t something I want to spend more of my time on since they don’t handle it well."

It seems to me that the conditions under which header-based content negotiation fails are outlined in this comment led both Gabe and Wim to conclude

What this issue proposes is not something radical. It proposes that we permit Drupal to do content negotiation the way that it was designed as far back as 1997! IOW, the burden of proof is not ours. Even so, we can see how rare the scenario you describe must be.

So, yes, I agree that the four factors that have to apply before problems arise now make for such a low count in the real world that it totally makes sense to move forward with this.

respectively. People have the right to change their mind, but I'm curious what has changed here to basically say this is not a candidate for core and should only be done in contrib. If I understand the conversation thus far, it is the site owner's responsibility to select a well-behaved reverese proxy in front of their application if they wish to use this feature. Further, the concern about forward proxies articulated above is becoming a vanishingly small concern in so far as the historical users of such heavy-handed forward HTTP proxying cannot do so any longer due to the advent of near universal TLS.

As Gabe is doing in decoupled_menus (which reminded me of this issue) it's certainly possible for contrib to accomplish this, and I am willing to split this out into a contrib module if need be. But I think this is definitely something that can/should be addressed in core, even if it has to wait until 10.x, with contrib as a polyfill until then.

longwave’s picture

From what I can see, Cloudflare does not support Vary at all: https://community.cloudflare.com/t/cloudflare-cdn-cache-to-support-http-...

However other popular CDNs do claim to support it:

AWS: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Reque...

GCP: https://cloud.google.com/cdn/docs/caching#vary-headers

Fastly: https://developer.fastly.com/reference/http-headers/Vary/

As #51 says I don't think we should choose not to include this in core just because it won't work behind certain CDNs. We don't have to enable it by default (if there are concerns about caching incorrect content types on problematic CDNs), but it could still be an option.

bradjones1’s picture

Thanks. So that would change this to opt-in instead of opt-out? But still worth going into core.

effulgentsia’s picture

I haven't reviewed recent patches or comments in this issue, so take this with a grain of salt until I do, but off the cuff, I like #50's suggestion as an approach for sites to decide (via core UI, without needing contrib) between core's current approach (query string), #49's experimentation (host name), and this issue's goal (request header). I'd probably recommend against a URL prefix option unless there's a compelling use-case for that.

bradjones1’s picture

Posting a patch for 9.3.x which does not include the changes to the shipped default.* files, for clean application to current projects.

longwave’s picture

@effulgentsia URL prefixes probably don't make much sense but URL suffixes might. /node/1 can return HTML but /node/1.json can return the same content in JSON?

effulgentsia’s picture

Yeah, I agree that URL path suffixes have a good use-case in that they're a common pattern elsewhere, but I'm concerned about the requirements that that places on client-side code to have to manipulate the path when generating URLs to other formats. We already have some ugliness in our JS code to deal with the _format query string parameter, but at least that doesn't require also manipulating the path portion of the URL. My hunch is to perhaps only provide plugins in core for query parameter, host name, and request header, and to leave URL suffix for contrib?

effulgentsia’s picture

My hunch is to perhaps only provide plugins in core for query parameter, host name, and request header, and to leave URL suffix for contrib?

Thinking about this more, host name might present a problem with respect to Drupal's AJAX system and dialogs, since that would then make those requests cross origin. One thing we might want to consider is to when determining the content type of a request to always check the _format parameter first and have that take highest priority, or at least to do that for certain formats such as those used by AJAX and dialogs. Then use the plugin configurations for determining how to generate URLs from Drupal and how to process incoming requests that don't include a _format parameter.

If we do that, then ajax.js and friends can continue to hard-code _format instead of generating different URLs based on how the site is configured. And that then would also make a URL suffix plugin less problematic.

But another thing to consider with a URL suffix plugin is path aliases. We currently don't prevent those from having . in them, so node/1 could be aliased to friendly-url.json, so a plugin that strips off the .json prior to resolving the alias would conflict with that.

I think we can also keep this issue's scope to just a header-based plugin. And for that matter, we could even punt the plugin aspect of it to a followup, though thinking of whether we want to end up with a plugin-based approach could inform some implementation decisions here.

gabesullice’s picture

I'm curious if Gabe can quantify/provide links to his research which leads to the statement, "I did a some research into how CDNs treat Vary: Accept and realized that the Accept header isn’t something I want to spend more of my time on since they don’t handle it well."

@bradjones1, thanks for keeping me accountable ;)

Unfortunately, I lost access to the Google Doc that I used to record my findings when I left Acquia. Perhaps @effulgentsia would be able to recover it, I believe it is owned by @tim.plunkett now. FWIW, it is not a detailed report; it only contains notes that I jotted down as I was testing browser+CDN behavior using this quick-and-dirty tool that I developed: https://github.com/gabesullice/vary-accept-tester (it doesn't have a README, but it is fairly well-commented).

IIRC, the reason I decided that I do not want to spend more time on this issue is because all major browsers and many CDNs do not cache multiple variations of a response and that makes this useless for my needs (i.e. responding with both HTML and JSON:API response from the same URL without forcing an HTTP client to fiddle with linked URLs). However, anyone is free to work on this as they please. "We" can certainly commit some form of support for this to core.

Almost all caches behave "correctly" insofar as their caches will not serve a cached response processed for Accept: A if an incoming request specifies Accept: B. However, they will not simultaneously cache a response for A and B. All browsers that I tested simply overwrite the A-response with the B-response. This has the effect that only one response per-URL can be cached at a time. Many CDNs behave this way as well. This isn't a deal-breaker for single-page apps IMO (since many will only request HTML once and then JSON for all other requests) but it's pretty bad at the CDN layer (see below).

People have the right to change their mind, but I'm curious what has changed here to basically say this is not a candidate for core and should only be done in contrib.

Sorry, I really could have phrased my opinion better. It was not my intention to say that this should never be included it core (which I why I left the issue metadata unchanged). My opinion is that core should not enable proactive content negotiation by default since it can lead to a form of cache poisoning (see below).

Since core already has an example implementation and a test for that example, I think a contrib module wouldn't require much maintenance. The project page would provide a nice place to document when/why it is useful and the infrastructure requirements. It's even possible that content negotiation might become more popular if there's a contrib module for people to blog about than if there is some nondescript configuration option somewhere.

Cache poisoning example

If:

  1. you have a site which serves a large volume of traffic,
  2. you are using a CDN which has a "one cache item per-URL" policy,
  3. and your site returns Vary: Accept by default,

then a malicious actor can easily send a request with Accept: {random-characters} every 10 seconds to continuously invalidate your high-traffic URLs.

Update: Just read the idea to make this configurable like language negotiation. I think that's a very intriguing idea :)

bradjones1’s picture

Thank you! A good encapsulation of where we're at. I think it's also worth mentioning that even sites which _do not_ sit entirely behind a reverse proxy can benefit from this. E.g., being able to craft well-behaved headless clients (SPAs, XHR, native apps) to avoid appending ?format=_json to all URLs. Dare I say that's the primary motivation for this, and the CDN compatibility question is a necessary consideration if it's going to be in core.

Wim Leers’s picture

@gabesullice Thanks for that very illuminating comment!

P.S.: I’ll coordinate with Tim tomorrow :)

gabesullice’s picture

bradjones1’s picture

MR for 9.4.x and attaching a patch that applies cleanly to project sites.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

larowlan’s picture

kim.pepper’s picture

Status: Needs review » Needs work

The last submitted patch, 70: 3028501-accept-headers-70.patch, failed testing. View results

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.