The 'url' cache context is defined as requestStack->getCurrentRequest()->getUri(). However, this results in all 404 pages sharing the same cache context of https://example.org/system/404 rather than using the request URL.
The solution is to use requestStack->getMasterRequest()->getUri() to provide the originally requested URL for the cache context.
It's not unusual for sites to use different cached content (or redirect) for different 404 pages, so it would be useful to provide a finer-grained URL cache context.
We could modify the built-in core 'uri' cache context to use the original request URL, or, assuming the current implementation is desired, we could add a new cache context.
Comments
Comment #2
cilefen commentedCould you restate the question? You know 'url' is a cache context. But I am not sure exactly what you are asking about the url cache context, especially in relation to the 403 and 404 pages.
Comment #3
mfbas far as I can tell, if an anonymous user goes to https://example.org/admin or any other "403" URL, the 'url' cache context will be https://example.org/system/403
and as far as I can tell, if an anonymous user goes to https://example.org/nothing-to-see-here or any other "404" URL, the 'url' cache context will be https://example.org/system/404
This means that many URLs share the same 'url' cache context.
Comment #4
cilefen commentedI think it means that all 404's and 403's have the same respective url cache context because a 404 or 403 is what is returned. Why wouldn't they? But they are special cases.
Picture a render array that creates a block. If the block output may be different on different URLs you would set the 'url' cache context.
You are observing that in the case of 404's and 403's the url cache context varies by response URL and you would like a way to vary the cache based on the request URL. Is that accurate?
Comment #5
cilefen commentedSo, would node/1 which is path-aliased to my/page have the same url cache context is the question.
Comment #6
mfbThat's correct. I was able to write a CacheContext that does this, so that could be the solution if I'm the only developer who needs it.
But arguably it would be supported by core, as it's probably not unusual to offer different content, redirect, or whatever depending on the specific 403 or 404 URL requested?
Comment #7
cilefen commentedThe bottom of the API page explains how to add new cache contexts, so you could offer a patch.
That said, assuming this is the right class, it looks like the UrlCacheContext uses the request URI:
Comment #8
cilefen commentedOh...
Comment #9
mfbwhen you getUri() for the current request, if it's a 403 or 404 request you don't actually get the request URL, you get system/403 or system/404.
To get the actual request URL I had to handle it as an unrouted URL, which you can see here: http://cgit.drupalcode.org/securelogin/tree/src/SecureLoginManager.php?h... - granted there might be a better way :)
Comment #10
cilefen commentedThere could be other contrib modules that may need it. Maybe post a patch, call the context url.unrouted or something similar.
Comment #11
mfbok changing this to feature request.
For the time being, I put some working code in Secure Login module - http://cgit.drupalcode.org/securelogin/tree/src/SecureLoginCacheContext.... - but, if other modules are also caching based on the request URL then it doesn't make sense to live here :)
Comment #12
cilefen commentedI am tagging this "Needs issue summary update" and "Needs title update". After the title and summary are updated with the template and they explain the plan, those should be removed.
Comment #13
mfbComment #14
fabianx commentedI think it is a bug, the cache contexts always are used from the master request - or rather we have not really defined how to deal with sub requests ...
Assigning to Wim for feedback.
Comment #15
wim leersThis is definitely true.
But I'm not yet fully convinced this is a problem. Let's walk through the problem.
system.404route: a subrequest is made to that route and its response is returned. (See\Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber::on404().)Assigning to Crell for feedback.
Comment #16
dawehnerCan you explain why you think this is the case? The subrequest controller should be executed in the context of the subrequest.
Comment #17
wim leersI agree with ! But that is actually what the OP is claiming to be wrong.
We have a 404. Which means no route + controller is defined for this URL. But perhaps you want customize the 404 page based on the URL — a common case is performing a search, to hopefully guide the user to content the user is looking for. But that means that we're actually trying to generate a response for the master request. Which means that the response we're generating should actually be executed in the master request context, because it wants to react to the master request!
Hence using a subrequest is wrong in this case, because a response generated in a subrequest by definition cannot vary by the master request, and we want the 404 response to be able to vary by the master request context.
That's why I concluded
(I know I'm not explaining it great; I'm not finding the right words for this one.)
Comment #18
Crell commentedWe discussed this a bit during the today's WSCCI meeting. We struggled a bit to come up with a concrete example that we could write tests around to verify this issue and a fix for it. What we came up with (suggested by neclimdul) was different types of 403. Eg, admin pages get a generic 403, unpublished nodes get a "this is archived" message, and some published nodes get a paywall "please sign in and give us money" message.
In that sort of case, I wouldn't say the subrequest is the issue. Rather, what you want then is 3 different 403 routes (which a custom module could do) and a custom exception listener that, based on some information, sent a subrequest to one of those 3 routes. Those routes then SHOULD be operating in the context of the subrequest, shouldn't they?
Is there a better concrete example here we can chew on?
Comment #19
wim leersIRC transcript covering #18 and describing why #18's summary/conclusion are not wrong, but missing the point:
Comment #20
catchTagging this for rc target triage.
I personally think we should try to figure this out during RC rather than after if we can, but hard to make a final call without seeing what a patch is like.
Comment #21
dawehnerI'm really not convinced that this is the right idea. For a custom module I has the HTTP kernel to get data from other pages in Drupal. If we would change the cache context here, this would no longer work.
Comment #22
wim leersWorking on this.
Comment #23
wim leersHaving started looking into this, I think I've already found the explanation for why this uses a subrequest in the first place.
The answer can be found here: http://symfony.com/doc/current/components/http_kernel/introduction.html#....
The Symfony request/response pipeline goes through these events:
But when an exception occurs, the
EXCEPTIONevent is triggered… and at that point, whatever the exception controller returns, only theRESPONSEevent is still invoked, NOT theVIEWevent.And… Drupal needs the
VIEWevent to transform a render array into aResponse.Consequently, it is impossible to not use the HTTP kernel to perform a new subrequest.
But! There is still hope:
KernelEvents::EXCEPTIONevent handler for the original master request cannot possibly be without flaws.\Symfony\Component\HttpKernel\EventListener\ExceptionListeneras being the defaultKernelEvents::EXCEPTIONevent handler. That class does something mighty interesting: it duplicates the original request and then just overwrites the_controllerattribute… which is pretty much exactly what we want: the original request, but with "enforced routing", to point to the route that contains the controller that is able to render aResponse(or a value object that theKernelEvents::VIEWsubscriber will turn into aResponse). Note that this default exception event handler requires a hardcoded controller, i.e. it completely avoids routing!So, there you have the answers we are looking for in this issue:
Therefore the solution is simple:
I've added test coverage that renders a 404 page containing the current request's URL. That is what should match the original request's URL.
Two patches:
DefaultExceptionHtmlSubscriber::makeSubRequest().url_aliastable because that's necessary for the test to not throw an exception, and adebug($response->getContent())to dump the response we get back in HEAD. This clearly shows HEAD does not use the original request's URL, but the 404 page's route's URL.P.S.: I have NFI what the destination query arg and
_exception_statuscodequery arg are for, but I figured I'd keep that completely unchanged, since changing that is out of scope here.Comment #26
wim leersDespite not having a passing test (note that the tests specific to this functionality are green, I of course didn't run the entire test suite locally), this still needs reviews.
Comment #29
wim leersOh, FFS. Testbot is a bit too pro-active nowadays :/
Comment #30
catchLooks great to me, tidies things up a lot.
Comment #31
fabianx commentedLooks great to me, too. Much clearer what is happening.
I was thinking if this would affect dynamic_page_cache in a way, but was not sure. I think it would still work correctly, just doing two cache_sets in quick succession.
In theory for an anonymous page there is a race condition, but it depends on the route cache context if this is a problem or not, which warrants testing.
The theoretical (aka same CIDs for dynamic_page_cache) race condition is like:
Not sure this can happen, but just pointing out ...
Comment #32
wim leersGreat, I'll work on fixing the test failures then. I'll also work on an answer to #31 — perhaps test coverage for that would be valuable? Patch welcome :)
Comment #33
Crell commentedGood catch, Wim! We actually were using ExceptionController originally, but dropped it due to excessive complexity. (We had TWO layers of subrequest going on as a result, which was pointless.) Adding back in that piece of logic sounds reasonable.
One thing we ran into with subrequests often was that attributes got lost when cloning. Do we need to clone over the attributes from the parent request before overriding with an extra route result?
Naming it this way seems excessive. It's a url matcher. That's all we need to know. Implicit additional requirements communicated by the variable name seems like a bad idea.
Comment #34
catchDiscussed briefly with @effulgentsia and triaged as rc target.
This is soft blocking other issues. Also conceptually subrequests have been a point of contention in 8.x (at one point we were going to use them for menu link access or maybe did for a while) so cleaning up the spots we do use them clarifies things.
Also not sure about this change in a patch release. It's possibly fine but happier to get it in rc if we can.
Comment #35
wim leersThis should fix some of the failures.
Comment #37
wim leersI can't reproduce any of the failures locally. The only thing I could think of is that testbot installs Drupal in a subdirectory. So I asked @penyaskito to try to reproduce it, since he had a subdirectory install available. That allowed him to reproduce it.
So, something is wrong in this patch wrt subdirectory handling. I think this fixes it.
Comment #39
wim leersWhew. That was tricky to track down. Also: I am now even more frightened of Drupal 8's routing system than I already was. Instead of just confusing/messy Drupal code, we now have confusing/messy Drupal code and confusing/messy Symfony code.
So, here's what happens (in both HEAD and the patch):
DefaultExceptionHtmlSubscriberdoesUrl::fromRoute('system.404')->toString(), which returns the path (note that == ) to that route, relative to base URLCustomPageExceptionHtmlSubscriberjust returns a valid path, i.e. the path configured at/admin/config/system/site-informationIn HEAD, we create a request object and perform full-blown routing. Hence the inconsistency mentioned above is resolved in the routing system
In the patch, we don't do full-blown routing anymore; we reuse the existing request by cloning the master request and essentially overwriting the
_controllerattribute. Because in Drupal, we allow paths to indicate a controller to use for a certain 4xx response (path -> route -> controller). So, the solution is quite simple: makeDefaultExceptionHtmlSubscriberbehave the same asCustomPageExceptionHtmlSubscriber: don't turn it into a URL, just return the route's path. That avoids it being prefixed with the subdirectory in the first place.(This also reverts #37's interdiff.)
Comment #41
wim leersStill 4 failures. Most failures are due to an endless recursion problem. So let's fix that first.
The root cause for this one turns out to be the fact that Drupal uses routing to determine which controller to use for an exception. Let me repeat two things I wrote in #23:
This time, the problem is that we rely on events within our routing; to some extent, we are doing custom and even hierarchical routing: we allow a route to not have a
_controller, but a_form. We use an event subscriber to map_formto_controller: Symfony requires a route to eventually point to a_controller.The actual problem is then that this event subscriber runs only at one time: after routing. And the router performs routing unless:
(from
\Symfony\Component\HttpKernel\EventListener\RouterListener::onKernelRequest())However, as of this patch, it's perfectly possible that we have performed routing, but all we have is
_form, not_controller. I think that strictly speaking, we are therefore violating Symfony's API's. Because note that for Drupal, having_formmeans that routing has happened, but as far as Symfony's concerned, only_controllerwill do.So we have three possible solutions:
_formto_controllerupon parsing Drupal's*.routing.ymlfiles: i.e. compile-time transformation (this would also yield a small performance boost, because it'd allow us to remove an event subscriber!)._controller, but also_form(i.e. override the cited code above)._formto_controllerbefore the router listener, so that when the cited code above runs, the_controllercorresponding to_formis already set.Option 3 is the least disruptive, so that's what I went with.
Comment #43
fabianx commented#41: I think the cleanest version is indeed to compile time transform _form and others into _controller. In fact I had thought we had been doing that already.
Worth a follow-up for sure, even if we only get to that by 8.Y.0 or 9.x.
Comment #44
wim leersThe two remaining failures also fell in two categories:
destinationquery argument comes in. Hence this is an answer to my P.S. in #23, where I wrote:This fixes the first of those two failures.
Comment #45
wim leersAnd this fixes the second of those two failures.
Comment #46
wim leersI don't think any change records need to be updated here.
This is definitely a bugfix.
Comment #47
wim leers#33 Very glad to hear you're +1! :)
I double-checked with a debugger, and cloning the request object also clones the attributes. And the code also confirms this:
I copy/pasted the naming and comments from
\Drupal\Core\Path\PathValidator. I'm not sure it's such a bad idea to keep calling it "access-unaware router" etc. Because we are doing routing. And we don't want to do access checking, because these are just exception responses, and we inherit the access result from the original master request.#34:
Yay!
#43: Done: #2611164: Remove ContentControllerSubscriber: automatically transform _form to _controller during compile time.
AFAICT this is ready for final review.
Comment #49
Crell commentedHm. OK, that's the difference between clone() and Request::duplicate(). I'm not entirely clear on why both still exist. If it's no problem to use clone(), I'm cool with that.
One thing I'm a bit concerned about: Why do we need to pre-check for _form if the use case is for when we're explicitly setting _controller? Why does that matter?
Also note that there are a bunch of other non-_controller properties that we use, such as the _entity_* properties. Do those need to be accounted for?
Comment #50
wim leersRE: your concern: can you quote the code you are referring to? I don't understand what you mean, and I want to avoid answering beside the question :)
RE: others: Those may need to be updated also, yes. But:
… given that, I'm not so sure. Know I am no routing system expert, so I mean that literally: I am not sure what is the right way, I am only sure that what is in head is wrong :)
Comment #51
Crell commentedThe latest patch runs the check for "if _form is set, add a _controller" twice, once pre-routing, once post-routing. However, if I understand it the point here is to fire a subrequest that has _controller set already, thus skipping routing and thus still having the parent request's route context. So... why does _form matter? :-)
Regarding "violating Symfony's API", I don't believe we are. The Router component itself doesn't promise to return a _controller, just that it returns whatever the route attributes are. The RouterListener then merges that onto the Request attributes.
The HttpKernel has an expectation that by the time both the request and controller events have fired, it can hand the Request to the ControllerResolver and get back a callable. It does not require that the ControllerResolver use _controller to determine that callable. I believe some Symfony use cases do even weirder things than we are in terms of how _controller gets there, such as FOSRestBundle.
The particular ControllerResolver we're using just dumbly looks for _controller, as does the one that Symfony fullstack uses. That is not, however, a requirement of any ControllerResolver implementation. The API requirement of our ControllerResolver is simply "there's a _controller attribute by the time you get to me". An alternate implementation might internalize all of ControllerResolverInterface could do its own handling of _controller/_form/_content/_whatever, in fact. My very initial proof of concept for Drupal, when I was making the case for using HttpKernel, did so, if I recall correctly.
Symfony itself doesn't impose hard requirements on the properties, just a few conventions that we've adopted. We could actually reorganize a lot of it were we so inclined, without breaking the "Symfony API".
Comment #52
wim leersBecause
_controllerrequest attribute is already set_form, not_controller1+2+3 = we need to transform
_formto_controllerin order to actually avoid routing.I've already explained all of this in more detail in #41 ;)
Well, there are only two options: either we are violating Symfony's API, or Symfony is violating its own API. Again quoting
onKernelRequest:The
ControllerResolveris not even in the picture here: we don't even get to that point because Symfony will insist doing all routing again if_controlleris not set, which means that it will again conclude this is a 404, so it again tries to render the 404 response, but it will again do routing, again conclude this is a 404, et cetera et cetera.Perhaps it does not on theory, but it does in practice. We could override Symfony's
RouterListenerand whitelist_form,_entity_form, et cetera. But then contrib modules adding more such keys will have to whitelist theirs too. IOW: that's not really workable.The solution in this patch is workable. The solution in this patch requires minimal changes. I'd also like to fix things in an ideal way, and not in a minimal way. But an rc target is not the place for that. Let's do that in #2611164: Remove ContentControllerSubscriber: automatically transform _form to _controller during compile time.
Comment #53
Crell commentedWim: Wait, are we talking about a _form-using route that is the 404, or the one that is the configured 404 handler?
Comment #54
wim leersThe latter.
Comment #55
Crell commentedAhhh... Now it makes more sense.
But that means that we definitely need to account for non-_form cases as well, since I expect an entity to be used as a 404 page more often than a form. That is, we need to make sure to handle arbitrary non-_controller alternatives.
Comment #56
tim.plunkettWould #2611164: Remove ContentControllerSubscriber: automatically transform _form to _controller during compile time fix this?
Comment #57
wim leersThat already works. :) Those are route enhancers. And route enhancers are already run by the access-unaware router that we manually invoke. It's just
_formthat's the special duck here.This expands the test coverage to prove that both
_formand_entity_formalready work. (And if_entity_formworks, then so do_entity_viewandentity_list.)Comment #58
Crell commentedWell let's address that canard, then: #2613034: Use a route enhancer to handle _form, like everything else. Worked for me in quick spot checks, but we'll see what testbot says. If that works, that cleans up _form and the rest of this patch looks good to me. The oddity of _form was my only issue.
Comment #59
effulgentsia commentedComment #60
wim leersTurns out this bug in HEAD is also showing itself in #2595695: 4xx handling using subrequests: no longer able to vary by URL, see comments 41, 44 and 46 there.
Comment #61
wim leers@catch in #30:
@Fabianx in #31:
@Crell in #58:
Given #2613034: Use a route enhancer to handle _form, like everything else landed, I'm going to go out on a limb and consider #30 + #31 + #58 = RTBC.
Comment #63
wim leersFail because patch doesn't apply anymore because conflict with #2603788: HtmlResponseSubscriber does not call HtmlResponseAttachmentsProcessor on subrequests.
Rebased, let's see if this works.
I'm afraid this will have missed the RC window, but AFAICT should be fine to fix in a point release, since this is a pure bugfix.
Comment #64
wim leersBack to RTBC per #61.
Comment #65
dawehnerCould/should we use
$request->duplicate() ?nitpick: you can use isMethod()
Dropping that reads like an API change for me?
Just an out of scope thought: Actually we could pass it along to our other exception handling?
Let's not drop test coverage
Comment #66
wim leers#65:
duplicate()doesn't actually duplicate; it resets just about everything. It's a very confusing method. Look at the source and weep.clone $requestinstead of$request->duplicate()means that we just don't need to do this anymore.Comment #67
dawehnerOH right, its a different kind of duplicating than we need indeed.
Oh gotcha, thank you!
Thank you wim
Comment #69
wim leersRandom fail in
NodeTypeTranslationTest.Comment #71
wim leersTestbot problem.
Comment #73
wim leersAnd back to RTBC again now that testbot seems to have gotten over its hangover.
Comment #74
alexpottI think we also need to fix drupalSettings here too. If you create a user without access to the admin page and visit it the drupalSettings are wrong.
I would argue the currentPath, currentPathIsAdmin and currentQuery are both wrong here.
Comment #75
wim leersAgreed that that should work correctly too. And if the current path is wrong, I'm not surprised everything else is wrong also.
I'll work on this after 8.0.0, unless somebody beats me to it.
Comment #76
wim leersAlso, Alex Pott found #2600652: Back to site links to wrong path in case the last path visited is 404 page, which is another symptom of the problems described in the IS.
Comment #77
alexpottHere's a fix for #74
Comment #78
wim leersLooks splendid!
Comment #81
alexpottAh the currentPath should not have the base path in there - there is a separate thing in drupalSettings for that.
Comment #84
alexpottFixed test.
Comment #85
dawehnerSo how do we deal with the potential API break which is that you could have changed the path of those routes, but now we hardcode the paths after the patch.
Comment #86
wim leersStill passes, back to RTBC per #67.
Comment #87
catch@dawehner so that's a potential API break, but I can't really think of a use case to do it.
However between that and the service constructor changes here, putting this into 8.1.x only.
Committed/pushed to 8.1.x, thanks!
Comment #90
mr.baileysThis change caused a regression, crosslinking #2678674: Access bypass for custom error pages