Problem/Motivation
Drupal core and a number of contributed modules expect to perform operations after the HTTP response is sent to the client, but prior to script termination. Examples are automated cron, storing JSON:API normalizations to the cache, and any operations that are deferred to the "destruction" of a service by implementing DestructableInterface.
In validating my implementation of the latter case (deferring sending push notifications) I noticed that client receipt of response contents were blocked until all code paths from index.php were executed.
I found #2579775: Missing Content-Length header blocks response until all registered shutdown functions execute and the Kernel terminate events run which is related to a similar symptom of blocking the response to the client side, but I think that is more sensitive to certain backends (e.g., FastCGI.) In any event, applying a re-rolled version of the PoC patch on that issue did not resolve the issue of blocked requests.
By way of level-setting, PHP can be set to have output buffering turned on or off by default through a PHP ini setting. Drupal does not explicitly start output buffering, e.g. by calling ob_start(), though such a thing was discussed back in Drupal 5 days but never went in to core.
Response::send(), which is called in Drupal's front matter, does flush the output buffer, if one is present. In my case (PHP run through Apache's mod_php - I'm not looking to debate your love of FPM here) I do not have output buffering on by default, so ob_end_flush() is never called.
Even if I add an explicit ob_start() before $response->send() in index.php, the output buffer is flushed but the PHP "backend" (I'm using the PHP docs terminology here) does not flush its response buffer to the client side. I need to further call flush() to do so.
All of this seems a little "too obvious" to me for this to be an oversight or purposeful omission on Drupal core's part, but this is:
- hard to test through an automated regime/I don't think we have test coverage for kernel termination/
DestructableInterface, and - These features are rather expert-level and the symptoms of this not working as advertised are easy to miss (your request just takes a little longer than it should.
Users of automated cron are probably most affected, though again, power users are less likely to use it.
Steps to reproduce
Perform a request with an implementation of DestructableInterface to do work after the response is sent to the client. Note that the response is blocked by work that is performed downstream of $kernel->terminate().
Proposed resolution
Call flush() in DrupalKernel::terminate(). Or perhaps do this in a stack middleware, though that seems a little heavy-handed. Alternatively, call it direclty in index.php?
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #96 | 3295790-96.patch | 20.07 KB | acbramley |
Issue fork drupal-3295790
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
bradjones1The docs for
flush()seem to indicate that you must both flush any output buffers and call this function to achieve the kind of behavior Drupal seeks to do in termination/DestructableInterface:Comment #3
bradjones1Adding a historical related issue even though this code has since been Symfony-ized.
Comment #4
longwaveIs this an upstream issue? Should Symfony consider calling
flush()inResponse::closeOutputBuffers()?I also note that FastCGI gets different treatment anyway as
fastcgi_finish_request()is called there, which according to docs flushes the buffers and closes the connection.Comment #6
bradjones1Re: #4, could very well be. My track record/experience in raising upstream issues on Symfony doesn't make me too optimistic, but their team always provides good feedback as to why something might not fit. Opened this issue.
I suspect if it's not "fixed" upstream, it would be because they expect "you" (in that case, Drupal) to do the "final flush." But let's see.
Comment #7
bradjones1A little nugget that arose from a conversation in Slack:
For my $0.02, I find the perennial arguments over Apache+mod_php vs. FPM to be unproductive and unnecessarily divisive. Combined with the fact the vast majority of sites will not realize much real performance enhancement in this regard vs. poor caching or performance elsewhere in the stack where developers do have more control. Plus there are lots, lots of other runtimes, e.g. Nginx Unit or Swoole.
I think our goal should be that Drupal's shutdown process handling should work regardless of your runtime, because it should be easy to do.
Comment #8
bradjones1Comment #9
bradjones1The test failure is on the
varyheader now containingaccept-encoding. I imagine this is automatically set somehow? Curious if the test suite is overly strict on this.Comment #10
catchThe assertion on no Vary header was added in #3089143: FinishResponseSubscriber adds empty Vary-headers on non-cacheable responses..
Clearing the Vary header was added in #2257709: Remove the interdependence between the internal page cache and management of the Cache-Control header for external caches /
5a01a26f1512db26f.I think it would be fine to keep the emptying out of the header, and change the assertion in the test to look for the actual header getting sent with this change (i.e. by flush() itself). Then we'd still ensure we're not sending extra superfluous headers ourselves.
Comment #11
bradjones1Thanks for the archaeology in #10.
It seemed initially this is a bit of a quirk, since my perhaps naive expectation would be that the header would be added always or never.
However, this issue actually reveals a gap in existing test coverage. In my experimentation, Apache's
mod_deflatewill not act on responses with acontent-lengthof less than ~69 bytes. If the content is longer, then Apache will add avary: accept-encodingheader and gzip the response if the client expressed its compatibility.The failing test had a
content-lengthof 5, so it appeared as thoughflush()was adding this header per se. I think it's more that this function call forcesmod_deflateto act regardless of this limit (minimum compression ratio?)Anyway, I added a longer fixture text snippet and require the
varyheader either contain onlyaccept-encodingor be absent entirely.Does this issue also need to add coverage for
DestructableInterface, since there is none, currently?Comment #12
bradjones1Comment #13
bradjones1Tests are back green... curious a maintainer's take on whether we need to more broadly add coverage on the termination processes.
Also, been thinking on the question of whether this gets "fixed" upstream or here, I think at the very least we add this
flush()in Drupal, until such time as Symfony does. I will stand by the rating of this being major, because it breaks expectations of current functionality in a pretty significant way.Comment #14
longwave@fabpot considers this a bug in Symfony, so looks like we can just get it fixed there.
https://github.com/symfony/symfony/pull/46931#issuecomment-1185344649
Comment #15
bradjones1Yeah, I was surprised by that one.
I think the test changes/improvements should still go in, though?
What about thoughts on:
Comment #16
bradjones1PP on upstream.
Comment #17
bradjones1Well that was fast. https://github.com/symfony/symfony/pull/46931#event-7006033935
NW for updates to testing.
Comment #18
bradjones1OK so I'm bumping this back to major for reasons as explained below.
Drupal's ability to perform post-response processing is blocked by #2579775: Missing Content-Length header blocks response until all registered shutdown functions execute and the Kernel terminate events run (which I have marked as a duplicate of this issue, since this has more work/tests) as well as the server's having output buffering enabled.
The last part is a "new" consideration as Symfony quickly added a
flush()to its code, but it only gets called if there are output buffers to be closed. Adding anob_start()call to Drupal was considered way back in #80029: ob_start at beginning of index.php might speed up Drupal but since this only affects users of CGI-based (mod_php) installs, I don't think that's a viable option now.The updated MR now includes a few things I think are important:
Does the recommendation for output buffering in non-FPM environments need to be added elsewhere, e.g. in docs?
Comment #19
bradjones1Credit should be given to to contributors to the duplicate/replaced issue.
Note this MR will act as a failing test until Drupal includes at least Symfony 4.5, which includes the upstream bugfix. I'm going to leave this as NR since that's a pretty soft postponement and this can be reviewed now, with a minor manual patch. This could then test against updated dependencies and ship out in short order for both Drupal 9 and 10.
Comment #20
bradjones1Most of the failures are relating to:
transfer closed with x bytes remaining to read... which I suspect is relating to the
content-lengthheader pulled in from the other issue... in local testing it doesn't appear to actually be a truncated response, but clearly there's a mismatch...Comment #21
bradjones1Related Big Pipe issue.
Comment #22
bradjones1Test failures def. look related to calls to
Response::setContent(). Seems to be the reasonable answer is to assess the various weights of subscribers which set the content so we can consolidate the related setting of the content-header header, appropriately (and in a central place.)Perhaps the "best" way to do this would be to do so inside of
::setContent()or as some type of side effect, however we're using the Symfony core response object so that would require the kind of overloading we're unlikely to want to do.Will investigate these various classes of failures and analyze the various calls to
::setContent()to see if I can't arrive at a least-invasive solution.Comment #23
bradjones1This actually seems to be an issue with Guzzle expecting a body on
HEADrequests...which obviously are not forthcoming.Comment #24
bradjones1Well it gets stranger... when testing this locally at the command line, I noted that curl gave me this helpful note when I was specifying
-X HEAD:I thought... huh. There's not much in the curl docs, but I did find this nugget in a blog post from the maintainer:
Warmer! Turns out that if you specify
-Iat the command line, this tells curl you want to do aHEADrequest and omit looking for a body ofcontent-lengthlength. Otherwise internallylibcurltreats the response as if it were made withGET. But what about setting$methodto HEAD when callinglibcurlvia Guzzle?So I thought, OK, this is a bug in Guzzle... but Drupal certainly couldn't be the first implementation of a
HEADrequest with acontent-lengthheader. Sure enough, Guzzle sets the appropriateCURLOPT_NOBODYoption... but only if the body is unspecified.Turns out JSON:API module re-uses a
$request_optionsvariable between various requests, including HEAD. So internally, the HTTP driver was telling curl to handle these as GET.This only occurs now because we're setting a
content-lengthheader, but this regression should be limited to the test suite.Comment #25
catchThis looks great. Since there are a couple of new classes and (fortunately test-only) side-effects moving to 9.5.x.
Adding a related issue that is not really related as such but is the most recent time anyone looked at BigPipe's response handling.
Comment #26
catchComment #27
bradjones1See note at #19 re: reviewing this in light of the upstream fix.
Comment #28
bradjones1@catch thanks for the review and the compliment. Looks like we were editing this at the same time so the version change back to 9.4.x wasn't purposeful. I'm going to let CI run and if we get down to one expected failure, I'll rebase on 9.5.x and then we can PP this on Symfony releases. But getting this to an "RTBC pending upstream release" would be good? Then, this should be able to go in to 10.0.x and get backported... or however you'd want to handle that.
Out of curiosity, I tried to look at how Laravel tests their similar feature... and I don't think they have test coverage. This definitely uncovered some edge cases/test fixes/etc. in Drupal. I think this should also make any refactors to Big Pipe marginally easier, too.
Comment #29
bradjones1Added two draft change records.
Comment #30
bradjones1Comment #31
heddnSome quick feedback/comments internally from fabian:
Comment #32
bradjones1Thanks for the review... I can't speak to whether it's "for sure" broken in all the places all the time, but I think this issue is necessary for a few reasons, even if this currently works in FPM:
So even if it's just to improve test coverage and ensure we don't regress from Symfony (which committed the related fix rather quickly) I think this is the minimum viable fix to address the above points. Happy to address anything specific, though.
Comment #33
bradjones1Additionally,
I think this is because BigPipe is a streamed response, vs. whatever the opposite/default case is called.
Comment #34
catchBigPipe doesn't actually use Symfony's streamed response, it uses a subclass of HtmlResponse. Comments in the response class mention chunked encoding, but it doesn't use that either - opened #3283145: Try to use HTTP trailers or a real streaming response in Big Pipe. a month or so back to look at this.
Comment #35
bradjones1Right, OK. Lots of interesting intersections to keep track of here. To Fabian's point, I think BigPipe does work here in the sense that it does currently call
flush(), which is what we needed Symfony to add (and this is PP'd on new Symfony versions being tagged.) Which is kinda the point. If there are no BigPipe replacements (e.g., json:api responses) then you don't ever get aflush()and we need the Symfony fix, plus test coverage and better detection of broken content-length headers, anyway.Comment #36
idebr commentedThe Symfony upstream issue was released in 4.4.4, see https://github.com/symfony/symfony/releases/tag/v4.4.44
Comment #37
andypostIt needs new issue to bump composer dependencies
Comment #38
bradjones1Opened #3300773: Fix failed test on `symfony/http-foundation` 4.4.44/6.1.3 and later
Comment #39
bradjones1#3300773: Fix failed test on `symfony/http-foundation` 4.4.44/6.1.3 and later is in!
@catch mentioned there that this "should probably only go into 9.5.x" but I'm a bit confused, would this then get into Drupal 10 during a rebase?
NW for rebase since the linked issue contained one of the test updates from this issue.
Comment #40
catch@bradjones1 sorry 'only' as in 'only as far back as' - we'd still commit to 10.1.x first, just not backport to 9.4.x.
Comment #42
bradjones1Added an MR against 10.1.x; there was a merge conflict in `system.install` unique to 10.x.
The 9.5.x MR will need to wait for #3300773-32: Fix failed test on `symfony/http-foundation` 4.4.44/6.1.3 and later.
Comment #43
bradjones1Well... I don't really know what to make of these test failures. They aren't consistent across runs (there is some overlap, but not 100%) and I can't reproduce locally.
I would suspect the update to symfony/http-foundation but that was just committed with passing tests on a different issue. The failing tests all appear to be Functional (likely browser) so there's gotta be some side effect here, but why it's happening now is a mystery to me.
May need to add some debugging to the failing tests, since they are only failing on DrupalCI. When I've had similar issues in the past, it's been down to timing (DrupalCI runs faster than my local) but I don't think most of these are timing related.
Comment #44
bradjones1Seems a common thread is the failing tests intersect with cache bins, which could in theory have to do with some timing/race conditions and would explain some of the inconsistency in which tests fail. Maybe?
Comment #45
bradjones1A few more thoughts before I have to change tack today:
Pretty sure the failure in ShutdownFunctionsTest is sensitive to the test-running environment? There is a conditional in that test which I think is now hit by Apache+mod_php as well, so that is likely to need a refactor. That seems logical enough.
The other test failures either intersect with the render cache or locale/language negotiation or both. I have a sneaking suspicion this is somehow sensitive to the test running environment's HTTP requests, because I can't reproduce locally. Very sticky. But not particularly surprised given the test fixes we've had to make already on this issue? Wouldn't be surprised if there were a few more, though these are more insidious because I can't reproduce.
Comment #46
catchHad a quick look at the jsonapi failure - I wonder if we need a sleep(1) before checking the cache item there, or at least rule that out.
Comment #47
bradjones1Yeah, I have just about whack-a-mole'd all these; most all of this is race conditions with caches that are set after the response is sent. The LocaleLookup is the most insidious.
Comment #48
bradjones1Well this was another adventure in frustration, but I'm glad we are able to improve so much test coverage.
My two earlier hunches were kinda-sorta correct - this had to do with cache write operations done in services that either implement
DestructableInterfaceor subscribe to the kernel termination event. The string translation tests definitely threw me for a loop since I don't do a ton of i18n.Of note, the Path Alias test only failed once; I'm glad I was able to catch it to avoid some WTF type conditions in the future. I definitely think this is worth getting in to core and I think with the number of test runs this has gone through now, it's likely we've caught most/all of these cases. But if there are some "random" failures in the future, we should definitely suspect these types of race conditions.
Marking this back to NR. I only updated the 10.1.x MR but if it passes, it's a very straightforward backport.
Comment #49
kim.pepperFollowing this issue as AWS Cloudfront needs a content length header in order to enable compression by default.
Comment #50
fubarhouse commentedI have tested the latest patch - works exactly as I was hoping.
I'm also following this one now because of how CloudFront requires a Content-Length header for Brotli compression.
Comment #51
bradjones1I'm curious to hear how you two are using CloudFront in front of Drupal... I use CloudFront (about the only AWS service I can stomach using) a ton for controlling access to media with signed URLs/signed cookies, but I haven't yet found much of a use case for content responses/whole sites.
Could be something interesting to put in the content-length header change record if this unlocks functionality that wasn't even on my radar.
Comment #52
kim.pepperWe have been using Cloudfront as a whole-site CDN for a least the last 5 years without issue as part of the Skpr hosting platform https://www.skpr.com.au/ Cloudfront can automatically compress files it caches https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Servi... however, only works if the content-length header is set.
Most static assets (images, css, js) get the content-length header set already and so get automatically compressed. It's only Drupal-served HTML responses that don't at the moment.
Happy to chat further in slack if you are interested.
Comment #53
acbramley commentedTesting this patch on a XXL (i.e very large) client project and all of our tests pass (YAY!). We are looking to use this to achieve the CloudFront compression mentioned above by @kim.pepper and @fubarhouse
However, our Functional test suite has gone from ~21 minutes to ~28 minutes (6-8 minute increase, fluctuating between CI runs). This is just for the part that's running our Functional tests that are backed by a running site using DTT - not including Javascript tests, Kernel tests, setup time, etc.
When trying to implement the header via a contrib module https://www.drupal.org/project/content_length_header we also saw significant speed impacts, however it was far worse with that module (probably related to the bugs fixed in this issue).
Comment #54
bradjones1Thanks for testing.
Re: #53:
I'm not sure what DTT is?
Are you talking about negative performance impacts in your test suite, or you're saying that the change to set the content-length header significantly slows down the site, overall?
I'm certainly open to learning new things (I learned a lot working on this issue) but I find it hard to believe that doing a strlen() on the contents of the response would slow down responses in a meaningful way.
I just checked and core tests seem to be taking about exactly an hour now, and I compared test durations for this issue with the issue that bumped symfony/http-foundation and there's no difference - actually this one runs a minute faster.
Could you elaborate on whether you think there is an outstanding issue to be addressed here before it can be RTBC?
Comment #55
acbramley commentedSorry - Drupal Test Traits
I haven't profiled the site itself, so specifically the test suite at the moment.
Worth pointing out these tests are running in a bit of a monster of a CI image that runs FPM and NGINX in CircleCI so this may be down to our setup.
I'm not sure, I just wanted to add the feedback as it was interesting that both the contrib module and this issue slowed down our tests.
The fact that core tests aren't affected is a good sign :)
Comment #56
acbramley commentedI'm not sure what's changed with our project or something else in the past few weeks but I'm no longer seeing the slower test runs anymore. Good news!
Comment #57
catchThat's encouraging and was the last remaining concern I had here - I think this is ready to go.
Comment #58
alexpottChanges in tests like...
and
... really concern me. Yes you'll have issues like this if you are testing using nginx today but not if you are using apache. Is there anyway we can disable output buffering by default on apache in tests but enable for the new tests?
Comment #59
alexpottMy worry is that this will suddenly cause test suites for projects to fail and people will have a really hard time tracking this down. Also I think the fails might be random which will be even harder to track down and more annoying.
Comment #60
bradjones1Alex, thanks for the review.
Trust me I didn't feel good doing this, either.
It's actually the other way around, if I understand you correctly. Drupal's own test infra (at least DrupalCI, I dunno about the GitLab environment) now "properly" runs Drupal with the shutdown functions. (Also reference my comments about runtime bikeshedding at #7, not that we're doing so here.)
I suppose one could say that the CI environment should disable output buffering, however we are making the recommendation to run with it turned on so I don't think it makes much sense to change the default to off in CI.
Or to look at it another way, waiting for Drupal to finish post-request processing and documenting it in the tests seems safer to me than basically saying Drupal won't test post-response processing. I could even go further and contend these tests are currently broken in the sense that they do not actually test the subject's designed operation but rather rely on the CI environment not supporting it, to pass. (Yes, the work gets done regardless but then what's the point of going to all the trouble to defer?)
This is the catch-22. For core, enabling post-request processing causes the test fragility, which is why this touches on such a disparate number of modules. In effect we're fixing the test fragility by acknowledging that there is a timing component to certain integration tests.
And as a last theory-to-the-alternative push for getting this done, I'd note that there are a not-insignificant number of other
sleep()calls in tests which could in theory be better performed by say, mocking the time service. I don't like that kind of whataboutism but in the spirit of Dries' recent talk about core contribution friction I wonder if that's the kind of perfect-enemy-of-the-good thing we can ticket as a follow-up (or just accept it.)I did spend a good amount of time trying to research how other frameworks test this kind of thing (e.g., Laravel, Symfony) and the answer is, they don't. Which is why this already prompted a fix in Symfony (but no new test coverage, even!) and I believe why it was latent for so long in Drupal.
Comment #61
alexpott@bradjones1 thanks for the thoughtful reply. I still think we need to manage the change somehow and give people a chance to cope with this change. AS far as I understand it currently on Drupal CI the post response tasks are running BUT they are running prior to the response because of our code. This patch is changing that. And making things work like we want them to when Apache/PHP is configured the way we'd like it to be.
I think what we need to do is to default to disabling output buffering in Functional and Webdriver tests - but on core tests enable it. And then trigger a silenced deprecation if it is disabled. Then contrib / custom tests can update this prior to Drupal 11 and handle the fails on their own terms and then in Drupal 11 this is the default behaviour for all tests.
Adding a one second delay to all gets seems wrong. Perhaps we should consider some way in which we can improve drupalGet() or provide an alternate implementation that will be able to signal to the test system when the post request tasks are complete. Sleeps for a defined amount of time would seem to be a sure fire of introducing random fails are some point.
Comment #62
bradjones1I agree in principle.
That said, I think that ransoms this issue to what amounts to a major change in core test-running infra/structure/classes. Personally, I think that's a huge ask for a bugfix. It's also a gigantic PITA to get any changes like a PHP ini change (especially a conditional one) into the existing test infra... which is all being rebuilt for GitLab anyway.
Regardless of that point, I think the "random fails" concern is a bit overblown. End users generally do not run their own tests against Drupal core; they rely on the upstream software pipeline (that is, DrupalCI and the like). In the case of contrib, I think this should break pipelines if it reveals an over-reliance on synchronous execution when they are actually testing asynchronous code. (Which this is, not in a fibers sense, but a "we'll do it later" execution sense.)
I'd venture to say that the affected contrib projects affected by this are small in number, and customer installs with affected tests are smaller still.
Size of impact isn't a strong case in and of itself, but I don't think the risk here is significant and I'd gladly pitch in on a follow-up task... but given this is a Major and contains enough "big" changes already, I'm just concerned this will become another multi-year core fix if we venture into "let's change how we run tests."
Thoughts?
Comment #63
longwave> provide an alternate implementation that will be able to signal to the test system when the post request tasks are complete
Instead of using
sleep()this could just be something like setting a state flag in the test and then resetting it in the last destructable service, and polling for that change in the test, no?Comment #64
alexpottBut that's precisely what this change is doing. We are changing how we run tests in a way that will cause existing tests to fail. When we introduced $defaultTheme in tests we introduced with a BC layer for contrib and custom so they'd get a deprecation notice and were told what to do. I think we need to take the same approach here.
Comment #65
mfbGotta love when the tests depend on the system being broken :) Brainstorming re: #63, flock() could be used to allow tests to keep simulating the old, broken behavior? Server process gets an exclusive lock and only releases it after final shutdown functions have run; a test method such as drupalGet() uses flock() to block until that lock has been released?
Comment #66
mfbSomething I didn't see explicitly mentioned here is that with both mod_php + mod_deflate, the Content-Length header is removed and request blocks until all the post-response tasks complete.
i.e. with mod_php, mod_deflate and text/html configured to be deflated,
curl --compressedwill block for 5 seconds when fetching this script:while
curl --compressedwill fetch this immediately without blocking:Maybe this is well known, but it wasn't immediately obvious to me that I'd need to both set Content-Length and also make sure mod_deflate isn't deflating the response to avoid blocking on post-response tasks.
Comment #68
bradjones1Now needs a rebase for 10.1.x.
Comment #69
bradjones1So an interesting observation here re: BigPipe; when you are using mod_php BigPipe will flush its response to the client as expected, however because it sends no content-length header at the outset (because it doesn't know the content-length) the client doesn't close the connection after everything is sent. So the client appears to continue to "spin" during post-request processing. If you close the client connection, the server side continues to do its post-request tasks because:
This is kinda annoying UX in that the client doesn't immediately close and give the perceived performance of the page being "done" loading. However if the client disconnects, PHP keeps going on its merry way because no further output is attempted to be sent. I had thought perhaps we needed to implement
ignore_user_abort(true)but I think this means, no we don't. This is one annoying feature of big_pipe when you can't callfastcgi_finish_request()but, whatever.Comment #70
catchThat BigPipe behaviour might be changed by #3283145: Try to use HTTP trailers or a real streaming response in Big Pipe. - probably wouldn't affect the post-response tasks but might/should help with the spinny wheel issue.
Comment #71
bradjones1Re: #63:
I thought we could do that, and that was my initial implementation of the test... however because this is a browser/functional test, the service is not accessible from the test runner; it's running in the context of the tested site. Really tough.
Comment #76
bradjones1I think it would be helpful if those who are making commits (and opening new MRs?) could explain their work? Also it would be helpful for the work to stay on the main MR that has been worked on (2568) since its target has been changed to 11.x... no excuse not to stay there.
Comment #77
bradjones1MR 2568 is now rebased to 11.x, hoping to maintain the MR review history in one place. Tests should still be passing as evidenced by the other 11.x MR that was created so I think this should be ready for another pass by @catch.
Comment #78
bradjones1Comment #81
acbramley commentedWe've been using this patch in prod for 6+ months and it's working well. Also seems that all feedback/discussions have been addressed.
This looks good to go now! Great work everyone!
Comment #83
catchThis all looks great.
Committed c469089 and pushed to 11.x. Thanks!
Hopefully see people over on #3283145: Try to use HTTP trailers or a real streaming response in Big Pipe.
Comment #84
mfbFor potential follow-up: Post-response tasks will still be blocking in mod_php if the response is deflated by mod_deflate. The only way I can think to resolve this would be for the page cache to do its own compression of the response, so that mod_deflate doesn't come into play.
Comment #85
andypostThere's follow-up #3375959: Add a way to delay executions in test runner until terminate event completed in the child site
Comment #86
mondrakeSince when this was committed, lots of functional tests started failing with deadlock errors when running some test-groups over Github Actions on my experimental database driver DruDbal.
See here an example - https://github.com/mondrake/drudbal/actions/runs/5653538436
Adding to the test build a patch that reverts this commit makes tests succeed again. See https://github.com/mondrake/drudbal/actions/runs/5753586671.
This is just to say that not necessarily just the tests covered by fixes in #3375959: Add a way to delay executions in test runner until terminate event completed in the child site could fail, but many others and that it seems something that impacts functional testing overall.
Note - I'm not asking for a reversal here, rather looking forward a fix for FunctionalTest overall.
Comment #87
catch@mondrake does your test suite run with
READ_COMMITTED? This seems like the kind of error we get withREPEATABLE_READ.Comment #88
mondrake#87 I also thought READ COMMITTED might have been the root cause, since DruDbal was behind core in adopting it, and therefore enabled it in https://github.com/mondrake/drudbal/commit/7d89e517e870c4f7e71de39f8bb90.... The number of failures reduced after that, but ATM the only way to have a full green run is to somehow patch core with a reversal of this issue.
Anyway, let's not get crazy about DruDbal, it's what it is i.e. an experiment. I will be looking at the follow-up and trying it with DruDbal as well.
Comment #89
andypost@mondrake please try to use
percona:5.7.22ormysql:5.7.22image instead ofmysql:5.7For me this is the last version where the most of custom projects are stuck to be deadlock free
Looks it needs separate issue to explore deeper
Comment #90
mondrake@andypost thanks. Tried that in PR https://github.com/mondrake/drudbal/pull/333, still gets to deadlock errors. Anyway, MySql 5.7.43 was fine before this issue went in, and is fine again with a patch that is reverting it. So for me we should really need to see in the followup what other cases are there where post-response tasks in the SUT are blocking test tasks.
Comment #91
mondrakeI run tests on Github Actions with vanilla Drupal 11.x and MySql 5.7.43 and they pass OK. So it must be something with DruDbal transaction management, that was undercover and this issue just surfaced. It’s on me then. Sorry for the noise.
Comment #92
spokjeThis introduced a new random testfailure #3379199: [random test failure] DisplayFeedTranslationTest::testFeedFieldOutput.
Comment #93
bradjones1Per #3379199-7: [random test failure] DisplayFeedTranslationTest::testFeedFieldOutput the follow-up just committed in #3375959: Add a way to delay executions in test runner until terminate event completed in the child site fixes #92, so I think any regression talk should get followed-up there. FWIW I don't think any of us were particularly in love with the sleep() calls on this issue so it's awesome to see a creative fix to harden this (still new) test coverage.
Comment #95
quietone commentedChange records published.
Comment #96
acbramley commentedReroll for 10.1.x, there were some conflicts with dictionary.txt
Comment #97
phenaproximaThis appears to have broken the batch system, if an operation callback throws an exception. See #3392196-12: Exceptions in batch no longer are shown on the page when Javascript is disabled.
Comment #98
alexpottI think this change needs to be reverted. There are the reasons explained above but also there are other things about the content-length header documented in https://www.rfc-editor.org/rfc/rfc9110.html#name-content-length that are not accounted for. Among these are:
I also think there are some other things we should account for based on the issues we had when Symfony set a content-length header on everything. See https://github.com/symfony/symfony/issues/46449.
Furthermore, I think this change is inherently risky because we're relying on the content-length we set here being correct but as we can see from the issues linked above post commit this is definitely not the case. It is highly likely that this change will be disruptive so I think we need to make a bigger noise about it in whichever release it lands.
Also, reading https://www.drupal.org/node/3298550 makes it sound like Drupal 10.2.0 will ship with a performance penalty in certain configurations due to this change. Which is not the case. This change has the possibility to improvement performance in relation to 10.1 if you enable output buffering.
Comment #99
catchWe need a revert MR here since the original commit doesn't revert cleanly (although it looks like it should be easy to resolve) and there have been test follow-ups since (although mostly those should be no-op changes without this MR).
Generally I think we'd be fine restricting the content length headers to 200, 403 and 404 since that's the only user-facing responses we actually care about.
Comment #100
alexpottAfter discussing with @catch a bit I've opened #3396559: Only set content-length header in specific situations to try to go forward here instead of a revert.
Comment #101
kleinmp commentedI just wanted to note that we ran into an issue with New Relic after adding the Content-Length header. It wouldn't inject the js-agent into the html because there was a Content-Length header (probably because it changes the length by doing so). So, we stopped getting PageView information until we removed the header.
https://docs.newrelic.com/docs/release-notes/agent-release-notes/php-rel...
One work around is to add the script tag manually in code. That way you get the correct Content-Length and you don't rely on it being injected automatically by the php agent.
https://docs.newrelic.com/docs/apm/agents/php-agent/features/browser-mon...
Comment #102
anybodyJust saw this message in some installations and I'm still curious, what the php.ini / ini_set value should look like.
Could we perhaps add an example to the message in the status report or at least in the Change record?
Linking the php.net website is nice, but doesn't help a lot to tell the user what exactly to do and not to introduce even worse misconfigurations. Not all Drupal Users are Server Administration professionals, but they see the message and will try to get it away...
Comment #103
heddnhttps://www.drupal.org/project/tailwindcss_utility inlines the tailwind css via a
http_middlewareservice. But that gets invoked _after_ this call. Leading to errors like, "upstream sent more data than specified in "Content-Length" header while reading upstream" from nginx. Apache works just fine but nginx doesn't like it. Opening up a follow-up to see what we can do. Maybe the contrib module should adjust things, but alternatively, couldn't this be implemented in a middleware as well?Comment #104
heddnAdded #3410022: Regression from #3295790 content-length header set earlier than expected and a patch to start the conversation. Then I found #3396559: Only set content-length header in specific situations, which is going a different direction. Not sure which to suggest at the moment. At the very least, we should expand on implications in the change record. While working on the patch, I found an issue with Cors that on the surface would break anyone using Cors and Nginx right now in 10.x without any use of contrib modules.