Problem/Motivation
Opening this as a replacement issue for #2503429: [PP-*] Allow both AJAX and non-AJAX forms to POST to dedicated URLs since it's a completely new approach to the same problem, will mark that issue postponed on this one. Mostly summarising in-person (and slack) discussions with Fabianx at DrupalCon Lille here.
Render caching is completely disabled on POST requests, because our form detection logic relies on re-rendering the entire page to discover where the submitted form is in the render array.
Steps to reproduce
Forms submissions are matched to a form by rendering every possible form on POST requests, until we find one matching the submission, we then process the form and short-circuit the rest of rendering using EnforcedResponseException
. Because we don't know what the forms on the page are in advance, and they're all render arrays, this means we have to render essentially the entire page to find one part of it, then throw all of that rendering away, because we're just going to redirect anyway. And exception to the redirect is when form validation fails and we actually render the page.
This so far has required completely disabling all render caching on POST requests, because if the form is retrieved from the render cache, it's HTML, and the form detection logic can't run.
On form controllers, because we always render the controller first, this isn't a problem because we immediately render the form anyway before the rest of the content - although it will still make rendering forms with validation errors slower because then the rest of the page gets rendered.
On complex pages with forms in blocks (like the login block) or a comment form at the bottom of 50-300 comments (like d.o issue queues and similar), a lot of content has to be rendered before we get to the submitted form, and the effect will be greater.
Proposed resolution
If we can identify forms in the render cache on POST requests, and ensure that they never get returned from cache, that allows us to ensure that everything else which isn't a form can be cached.
During a POST request, look for special cache tags following the pattern CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:
If on a cache get (during a POST) you encounter such a cache tag, the item is treated as a cache miss.
All other cache items can be returned from the render cache.
We would continue to not SET render cache data during a POST request because it'll be unexpected for conrib/custom code and unlikely to improve performance - we usually POST to the same URL we've just done a GET to, so anything that can be cached already should have been. If the POST requests results in different cache IDs for any reason, then any items set would have a low/zero cache hit rate anyway.
This will allow the current 're-render the forms to find the one that was submitted' logic to work, without massive changes to form submission as would be required by #2503429: [PP-*] Allow both AJAX and non-AJAX forms to POST to dedicated URLs, but still allowing caching to work.
One important thing is that big_pipe and other placeholder render strategies needs to be disabled - so placeholders need to be executed as part of the main page request still. This ensures that any forms in placeholders (which they should be anyway) get rendered quicker than if they were deferred to placeholders.
Once the above is implemented, this will already start working for forms in blocks, so is independently useful.
For other forms in the main page content, if the form has max-age:0 (which is the case for all forms until #2578855: Form tokens are now rendered lazily, allow forms to opt in to be cacheable lands) it will prevent caching of render arrays via bubbling, so there will be no effect on the overall caching situation compared to now.
If the form is in a lazy builder, then other render arrays in the main page content can be cached, because the max-age: 0 won't bubble.
Remaining tasks
User interface changes
API changes
New 'reserved' cache tag format CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:$suffix
, - if something is tagged with this, it will be a cache miss on POST requests.
To take advantage of render caching on POST requests, forms should be rendered in lazy builders, and cache contexts should be added on both GET and POST requests.
Data model changes
Release notes snippet
Drupal now allows render arrays to be returned from cache on POST requests, which should optimize form submissions in various situations.
Comment | File | Size | Author |
---|---|---|---|
#57 | Screenshot from 2024-03-06 23-44-41.png | 148.2 KB | catch |
#57 | Screenshot from 2024-03-06 23-26-23.png | 74 KB | catch |
#57 | Screenshot from 2024-03-06 23-26-17.png | 71.77 KB | catch |
#50 | 3395776-nr-bot.txt | 90 bytes | needs-review-queue-bot |
#23 | 3395776-nr-bot.txt | 90 bytes | needs-review-queue-bot |
Issue fork drupal-3395776
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #3
catchComment #4
catchAssuming this works it will be a significant performance improvement for every form (including AJAX forms), so bumping to major.
Comment #5
Fabianx CreditAttribution: Fabianx at Tag1 Consulting commentedComment #7
catchAdded an MR that does this:
And a surprisingly low number of things exploded. Haven't manually tested yet, would be a good candidate for verfiying #3396196: Separate cache operations from database queries in OpenTelemetry and assertions.
Comment #8
catchThe remaining test failure is a strange one, the error is this (full backtrace is important, see below):
What's happening is that the path alias field item is trying to check for an alias, when the path_alias table doesn't exist.
First of all I thought this might be that the MR had broken the installer somehow, so that the path_alias table never got created. But I added some debug to the content entity SQL schema and confirmed the table was created.
This means that instead, it's requesting the alias after the database table has been removed, which can happen in phpunit teardown, I guess...?
As you can see from the backtrace, what causes this is a call to serialize(), although the backtrace prior to serialize() doesn't tell us what actually caused the serialization of the entity. So... I still haven't worked out what exactly is different between HEAD and the MR that would cause this.
However, since path aliases are a computed field, it is completely pointless to serialize the value, so I added ComputedFieldItemListInterface and had PathAliasItemList implement it.
After that, the test still failed, but it was failing on the cache tag assertion instead.
So then I realised, that this isn't actually a real fatal error caused by the MR, but a fatal error caused by a test failure caused by the MR. When the test failed, it was cleaning up the database, and then the node was sleeping. I don't know if there's something we can do about that in phpunit/our test base classes or not, it's definitely very confusing.
Opened a new issue for the computed field stuff because that seems worthwhile anyway #3397424: Don't serialize computed fields.
Pushed a commit for the actual test fix which is just to add the extra cache tag. This might be green now.
Comment #9
catchIt might be a fluke, but the core test run that usually takes 10+ minutes finished in 8 minutes 59 seconds: https://git.drupalcode.org/project/drupal/-/pipelines/40707
Tagging this with 'needs profiling' although my plan is to add logging to cache get/set to prove specifically that caching is working so we can see actual cache ID sets and gets, and ideally to codify that with #3396196: Separate cache operations from database queries in OpenTelemetry and assertions (although I would add the performance test coverage on that issue, not this one, even if it lands first we could modify the assertions in the MR here to show the improvement).
One thing I didn't realise when writing this issue up but did since, is that we are essentially guaranteed an almost perfect cache hit rate here:
When you view a form, you prime the render cache for your user, permissions, language, path etc on that page, when you submit the form, you submit it to the same URL, that means the POST request will be fetching render cache items with the same context as the previous GET request, so they will all be hits. There will be situations where the whole page was cached a while ago in page cache/dynamic page cache and some of the individual render cache items have expired in the meantime, but short of other requests invalidating caches it should be pretty good.
Comment #10
catchI should mention the original idea here was all Fabianx's, I just agreed with it, wrote it up, and am implementing it :) Amazing how easy this is (so far) compared to the minefield which would have been trying to get #2503429: [PP-*] Allow both AJAX and non-AJAX forms to POST to dedicated URLs done.
One self-review point, we might want to add a constant for render_cache_form somewhere so that we can document it, but... where would that go?
RenderCacheInterface
maybe?Comment #11
larowlanThis is awesome.
We might be able to add https://github.com/previousnext/drupal-test-utils/blob/main/src/Traits/E... to BrowserTestBase once this is done.
It has picked up several regressions in caching in contrib modules.
It would also prevent more instances of #3232018: Trigger an error when using \Drupal\Core\Cache\RefinableCacheableDependencyTrait::addCacheableDependency with a non CacheableDependencyInterface object
Comment #12
catchFor test coverage, I'm hoping to implement #3396196: Separate cache operations from database queries in OpenTelemetry and assertions and add test coverage of HEAD there, and then add a combined branch of that issue + this one with the test coverage adjusted. It's an ideal scenario to validate the approach in that issue since it's not often we get something which should so dramatically change the numbers on one request. The idea would be to submit the user login form (or any form, but that's an easy one), then assert on the count of the cumulative cache hits and misses from the form submission and subsequent page render. We could also go further by doing #3352851: Allow assertions on the number of database queries run during tests to compare database queries too.
We can probably add explicit test coverage for the behaviour in RenderCacheTest too.
Comment #14
catchI added initial database query coverage in #3352851: Allow assertions on the number of database queries run during tests and didn't see any difference in queries executed. This was because PlaceholderingRenderCache also prevented caching on POST requests.
https://git.drupalcode.org/project/drupal/-/merge_requests/5100/diffs?co... fixes that and only introduces unit test failures.
This actually increases the number of database queries executed, but I confirmed at least 20 of them were cache_render tablequeries and HITs so that shows render caching working.
I am seeing the views pager query run despite it being render cacheable, with node grants query cache contexts applied etc. causing a miss, that looks like it might be an actual bug but haven't tracked down exactly what it is yet. Potentially fixing that would change quite a lot.
I've also opened an additional MR that combines this issue with the test coverage.
Comment #15
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #16
catchThis should be green again. I opened #3400596: Views adds cache contexts inconsistently to investigate render cache misses on views pagers on POST which I found working on #3352851: Allow assertions on the number of database queries run during tests.
Comment #17
smustgrave CreditAttribution: smustgrave at Mobomo commented@catch should profiling happen before tests?
Comment #18
catchYou can now see the cache gets increase in the user login block test added in #3352851: Allow assertions on the number of database queries run during tests, and also that we don't set any new cache items (which is expected since we've retained not writing back to caches on POST requests).
The database queries increase too - some of this might be cache tags, some of it might be #3400596: Views adds cache contexts inconsistently - going to try to get something going on that issue to see what the impact is.
Comment #19
smustgrave CreditAttribution: smustgrave at Mobomo commentedNot 100% sure how to review this one but appears to be unmergable now.
Comment #20
catchRebased.
Comment #21
smustgrave CreditAttribution: smustgrave at Mobomo commentedSo don't want this one to stale but am unsure how to test or profile it.
Comment #22
catchRebased again after #3396196: Separate cache operations from database queries in OpenTelemetry and assertions.
Comment #23
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #24
catchRebased again.
Comment #25
smustgrave CreditAttribution: smustgrave at Mobomo commentedThis is one I'm leaning on the expertise of @catch as a committer.
Feel bad this has been endlessly rebased but couldn't find anyone to review so going to give a shot.
So code wise nothing is standing out to me. But am tagging for a CR incase maybe contrib/custom modules want to start leveraging the new cache tags for their own purposes.
Comment #26
catchAdded a change record: https://www.drupal.org/node/3420901
Comment #27
smustgrave CreditAttribution: smustgrave at Mobomo commentedFor me CR looks good.
Comment #28
larowlanNeeds a reroll, there are two MRs can we get one hidden/closed.
Can we also get an issue summary update
Thanks
Comment #30
catchClosed the draft MR, rebased, added the follow-up to match the @todo, and updated the issue summary. No real code changes apart from different numbers in StandardPerformanceTest, so moving back to RTBC.
Comment #31
slashrsm CreditAttribution: slashrsm at Tag1 Consulting commentedComment #32
catchThis needed another rebase after #3421164: Log every individual query in performance tests and related issues, but in the best possible way - it now shows the menu tree query disappearing and the increases in cache gets and cache tag gets are much easier to see.
Comment #33
Wim LeersVery cool to see this happening at last! 😊
Left a suggestion on the MR that would make the follow-up unnecessary.
This is a pretty important limitation that should be explicitly mentioned in the change record too?
Comment #34
kristiaanvandeneyndeTrying to wrap my head around what's going on in here and it's really cool to see the menu query go away. If I understand correctly, it's because we can now get the menu from the render cache where we would before rerender everything that so happened to be on the same page as a form.
Having said that, I'm not 100% clear on why it only works for forms with lazy builders. Is it because the "special cache tag" would otherwise bubble up and therefore cause a chain reaction of us not setting the other stuff on the page in the render cache or am I misreading this?
Fully agree with Wim on finding a cleaner way to indicate which tags should be skipped on POST, even though the current suggestion is rather verbose :)
Comment #35
Wim Leers@kristiaanvandeneynde reviewed too, and confirmed my concerns, so moving back to
.Comment #36
kristiaanvandeneyndeI'm re-reading the Renderer class for the umpteenth time in my career to figure out why lazy builders are involved in this, but would be nice if someone could deny or confirm what I assumed in #34 :)
Comment #37
catchComment #38
catchI've updated the issue summary (not the CR yet, but will do so in a bit) with the explanation of why lazy builders are important, but extra reply here:
All forms are currently max-age:0 until #2578855: Form tokens are now rendered lazily, allow forms to opt in to be cacheable lands, at least unless you really go out of your way to override core behaviour (I haven't tried this since Drupal 7 so not sure how doable it currently is).
Due to bubbling, any render array including a form also becomes max-age:0. This means that something like the comment form, if it's not in a lazy builder, would cause the node and all the comments to be uncacheable too. The comment form is in a lazy builder for this reason: https://api.drupal.org/api/drupal/core%21modules%21comment%21src%21Comme...
When a form is rendered inside a lazy builder (all blocks are rendered inside lazy builders), this short-circuits the bubbling of max-age:0, and any 'parent' render arrays become cacheable again - because for the parent render array/cache item, the form is just a placeholder now due to auto-placeholdering. Same with any other render array, it's just that forms default to uncacheable whereas everything else doesn't.
So to get the full benefit of this issue on POST requests, deeply nested forms in render arrays need to be in lazy builders - however this is also the case for render caching on GET requests too.
When a form is in a block (or lazy builder), even though the form itself might be uncacheable, because the other stuff on the page already is, then this issue 'just works' as can be seen in the performance test changes.
edited to add: when a form in the main content area is a form controller - like the node add page, it's the first render array that gets rendered anyway on the POST request, so this issue makes no difference either way - this is why only the testloginBlock and not the testLogin (which submits the full login page form) see a difference. I was almost disappointed when I realised this, but of course it's good that we don't render blocks on those pages on a form submission and just means this issue will only be a performance improvement for some forms and neutral for others.
Comment #39
kristiaanvandeneyndeThanks so much for #38. Really clear explanation.
P.S.: Do we also need to update Renderer? It still checks for the method being cacheable and references the other issue.
Comment #40
catchUpdated the CR with more information based on the above (and with the new tag name).
Comment #41
kristiaanvandeneyndeRe #39: I know that it makes no sense to placeholder stuff on POST requests if we're not gonna cache it anyways, but maybe we can remove the @todo in the renderer? At this point it doesn't seem we're going to fix the issue it's pointing to. We can probably keep the isMethodCacheable() check now.
Comment #42
catchPushed an update to the comment that explains we do this for form discovery and removes the @todo. We could still try to do the 'submit to dedicated URL' issue, it's just about 20 times harder (maybe even more than that) than this one, but for me @todo usually implies something small, not rewriting significant parts of core.
Comment #43
kristiaanvandeneyndeFor me all of this looks good and I'm willing to RTBC based on what I can see in the MR.
I'm just worried that we might be missing something given that, after this change, there's still about 7-8 places in core that mention "https://www.drupal.org/node/2367555" and I'm not sure if all of these have been checked to see if said comments and/or code still make sense after this change.
Some of them will definitely be about that exception FormBuilder is throwing, but some really looks like we can remove a reference to said issue now. Especially #39 seems to now warrant a comment along the lines of: "We do not set cache items during POST requests so there's no point in creating placeholders."
In #42 you mentioned changing a comment about form discovery but I don't see that in the commits, so maybe you already addressed my concerns but forgot to push it? :)
Comment #44
catchThere's an additional reason to not create placeholders, from a performance perspective possibly the more important one as it relates to this issue.
The idea of placeholders is that we render as much as possible as quickly possible, skipping the placeholders, then at the end (of the normal page response, or via bigpipe replacements), render the placeholders later.
If we've submitted a form, and that form was in a placeholder, we don't want to defer rendering of the form, in fact we want to render the form as soon as possible, so that we can submit the form and redirect. I've updated the comment with that info.
I haven't gone through the other places in core that mention https://www.drupal.org/node/2367555 though, but agreed it's probably a good idea to do that - leaving needs review and will do some grepping later on.
Comment #45
catchWent through the @todos linking to https://www.drupal.org/node/2367555
Some are still valid, the weird architecture of form submissions is still an issue (if not the performance of it as much) once this lands, but I tried to add longer comments to clarify what's going on.
The one in
BlockContentCacheTagsTest
is outdated - didn't explicitly check whether it's because of this issue but we can definitely remove it here anyway, just deletions for that.Removing the needs profiling tag because performance tests have come on enough to show how this improves things without explicit manual profiling, which I have to say I'm extremely happy about - it's always annoyed me that we remember to require profiling on performance improvements and forget it on all the issues that introduce regressions.
Comment #46
catchHad one extra thought here:
BigPipe because it's dealing with sending in chunks definitely needs to be disabled on POST requests still.
However, I think with this approach it's actually fine to enable placeholdering itself, and it might result in a higher render cache hit rate on some pages because they'll be more similar to GET requests. It might mean that forms get rendered later in the request but if we get a higher cache hit rate that would be worth it.
This is a very small change, so went ahead and did it - only had to update the unit test which specifically enforces the old behaviour.
Comment #47
kristiaanvandeneyndeI'm not sure how. When we're on a POST request, we don't write anything to the cache. As far as render cache hits go, we check them with their initial cacheability. So by late-placeholdering things on a request where we won't be caching anything anyways I don't see how that would lead to a higher cache hit ratio: It doesn't affect cache gets at all and we're not writing anything to the cache either.
Tests still go green and MR looks good. Ready to RTBC after this final concern is addressed.
Comment #48
catchI think you're right, so it doesn't break anything but also doesn't improve anything either, reverted those two commits, but tried to update the comment again to explain why there's no benefit.
Comment #49
kristiaanvandeneyndeNo more notes from my end, looks really good.
Comment #50
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #51
catchRebased again - had to resolve some merge conflicts but I think the changes are small enough to re-RTBC.
Comment #52
catchUpdated the issue summary.
Comment #53
Wim LeersLooks great, two small questions :)
Comment #54
catch@alexpott asked if the extra cache gets are really worth it - I think they are because they represent render cache gets vs. actual rendering (as long as they're cache hits) and we know that render caching saves a tonne of CPU time. He suggested using grafana to compare timings, which is a good idea and easy enough to do locally (the standard tests aren't on the public graphs yet, just the umami ones). Looking at that, I realised the user login tests don't have any node at all so we're testing the performance of the empty front page view which is not very realistic, so adding one node in setUp() to make it moderately more realistic. Makes me wonder if we should move this coverage to umami and enable the login block there for this test instead though.
All of that means I haven't actually got proper comparison numbers yet, maybe tomorrow.
Comment #56
catchAdded a draft MR that just creates a node on the front page of standard, so that we can then test the login block with that node instead of empty views text.
As you can see, that massively increases the cache get counts on the front page, so that the diff between the cache gets is pretty much zero compared to the change here (just some more cache tag checks which you'd expect with more render caching), while we drop a query https://git.drupalcode.org/project/drupal/-/merge_requests/6946/diffs
This explains why the MR is adding more cache gets than we expect - the test coverage is doing less cache gets than we'd expect because no nodes are being loaded and rendered on the front page.
Comment #57
catchThis explains why the cache gets were so much higher - not testing quite the right thing. It's OK for the standard login test before this change, since that's only testing the login really, but for this issue we need a slightly more realistic scenario.
I ran the tests three times each between the two branches, and the best of three is 217ms before the change and 198ms after the change. I did this process twice, and got 214ms vs 199ms the first time - but quite a bit of variation above that as you can see, which I put down to doing things on my laptop/ddev.
Before trace:
After trace:
Three runs vs. three runs (with the change is the top three in the list):
This is just rendering a teaser with a single field. If we added a load of different fields, more entities etc. that'd be a lot more rendering and should be a lot more time saved by the MR, but I think that needs Umami rather than standard - it might be worth moving the login test coverage over there in a separate issue.
Comment #58
Wim LeersHeh… 😅 But that's where the automated perf tests that you added are so very valuable: they're a start to help prevent such regressions! (I think we'll need to test many more scenarios to catch more regressions. SQLite and WebKit are infamous for their perf tests in CI for example.)
Hence
is music to my ears 😁
Comment #59
kristiaanvandeneyndeThe recent comments seem to give a big +1 to #54.
Perhaps we should create a follow-up to move the node creation to the set up like you did in the other MR? I'd rather see us make the standard/stark tests more realistic than put everything in demo_umami tests. It would make it easier to rule out if potentially odd or random behavior is coming from Umami or not.
Comment #60
alexpottReviewed this yesterday in chat with @catch. We considered:
node_query_node_access_alter()
- if they don't make similar changes it just means less caching and the same as now. An example of code needing to make a change is\Drupal\entity\QueryAccess\EntityQueryAlter::alter()
from the Entity module. I think there is also something in the group module.PaymentCheckoutController::returnPage()
for offsite payment gateways. Post to this page results in a redirect so this should be fine too.Comment #61
alexpottCommitted and pushed ffa4daa3fb to 11.x and ddfc567d68 to 10.3.x. Thanks!
Comment #64
alexpottCan we get a follow-up to make the login performance test more realistic?
Comment #65
catchComment #66
catchAdded a release note to the 10.3.0 release notes draft and tagging for release highlights as a performance improvement.
Also opened #3426303: Improve the standard performance login test.
Comment #67
Wim LeersEpic work here, @catch & @kristiaanvandeneynde! 👏
What's next for #2503429: [PP-*] Allow both AJAX and non-AJAX forms to POST to dedicated URLs? 🤔