Problem/Motivation
See #3257726-19: [meta] Use Fibers for concurrency in core and downwards for some discussion and @mglaman's blog post: https://mglaman.dev/blog/can-we-use-concurrency-speed-streamed-bigpipe-r...
Fibers allows for entering a callback within a fiber, kicking off something async, 'suspending' the fiber, moving onto something else (usually another Fiber callback, which could do something CPU-intensive like rendering, or another async thing), then returning to the original fiber, and continuing if the async thing has finished.
This means you can do things like:
1. Render some nodes for one block, while waiting for a listing query to run for another block.
2. Run three different listing queries at the same time, and process the results of whichever comes back first, instead of having to query and process three times in turn.
The three obvious use cases are:
- async MySQL queries
- async (guzzle) http requests
- async file operations
What it doesn't let you do is run CPU tasks at the same time, this is only for running things during what would otherwise be waiting on i/o.
A big advantage of Fibers is it's not just about doing multiple async things at the same time, but also interleaving CPU-intensive task (like rendering) with i/o (like querying), this makes it potentially perfect for Drupal's bigpipe implementation.
Steps to reproduce
Proposed resolution
Execute bigpipe placeholders as Fiber callbacks.
Remaining tasks
By itself, this will not be an improvement as such, because just running code in a fiber doesn't do anything special, you need async and Fiber::suspend() to get the benefits, but we can pair it with #3259709: Create the database driver for MySQLi, and support for async queries in views and entity queries.
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #36 | 3377570-36.patch | 22.99 KB | catch |
| #34 | Screenshot 2023-08-08 at 3.48.03 PM.png | 144.91 KB | wim leers |
| #32 | 3377570-30.patch | 22.58 KB | catch |
| #32 | 3377570-30-test-only.patch | 9.47 KB | catch |
| #30 | 3377570-29.patch | 22.46 KB | catch |
Issue fork drupal-3377570
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
catchComment #4
catchThis is @mglaman's blog post code snippets in patch form. BigPipeTest fails for me locally with head, rather than debugging that on a Friday evening (it is probably apache-specific and I'm running nginx with ddev), let's see what the bot says.
Comment #5
catchComment #7
catchThis might be a green patch - at least BigPipeTest now passes, and that was the only failure, so as long as it doesn't cause another test to fail.
I don't think this is 100% yet but it's definitely getting closer.
Things to look at:
1. We need to preserve artificially rendering the messages placeholder last - because it relies on messages added to the session which might be set by other placeholders. This logic is a bit janky already and let's say the way I handled it does not make it less janky by any means.
Also the test passes if you remove the start and stop signals from the array (see below), but really we should just be asserting that the messages placeholder is rendered last, it might need a bit more reworking to test only the thing we actually care about here.
2. We can't check for the existence of only one start and stop signal, because now we're sending one per placeholder instead of one per set of placeholders. While BigPipe is designed to accommodate this (i.e. rendering one group of placeholders, then another group of placeholders later potentially), it makes me wonder if we should try to refactor things a bit so that we continue to render the group of placeholders inside a single start and stop signal. This would probably mean moving some logic in renderPlaceholders() into another helper method, and making that method what we wrap into a Fiber - something like this. However better to start from a working patch before embarking on bigger changes.
3. We can't eat exceptions from fibers, we need to just let them be thrown. BigPipe itself already has code to convert exceptions to E_USER_ERROR and fortunately test coverage to make sure it works.
Tagging for subsystem maintainer review since this could really use a look from Wim and/or Fabian.
Comment #8
catchAs well as views listing queries and entity queries, any other potentially long query that we do during rendering is a potential candidate for async mysql queries + fibers.
Drupal\path_alias\AliasWhiteList::resolveCacheMiss()would be one - we could check if we're in a fiber when we reach that (somewhat likely because we'd be rendering a URL, which can happen in a bigpipe placeholder), if we are, execute the query async, Fiber::suspend() etc.. Since this happens for each unique first part of a path (node/, user/, admin/ etc.) there's a fair bit of potential there.Comment #9
catchI did this, now get a green test run (on BigPipeTest at least) with no test changes, and while the diff is larger due to indentation, the actual functional change is considerably smaller - i.e. we only send one start and stop signal, only create the fake request once, stuff like that.
I wondered if we could skip the placeholder ordering altogether and only detect the messages placeholder ID and render it last, however even though Fibers means that the placeholders can get sent out of order, I think we still want to start them in DOM order - the quicker they start, the quicker they can finish, and might as well render top down if and when we can, so that logic is fine as it is.
Comment #10
catchConsolidating three identical try/catch statements down to one, HEAD currently has two, adding the third made me realise we can just move them a bit higher.
Comment #11
catchThis shouldn't be an elseif.
If the fiber doesn't call suspend() at all, or if it was already suspended earlier and doesn't call suspend() again, it will either throw an exception or get all the way to termination in one go. So either the start() or resume() calls could mean that by the time we get here, it's already done and we should send the response immediately.
With the elseif, we'd need to go all around the foreach loop another time before we send.
Comment #12
catchLet's be optimistic :)
Comment #13
catchGetting rid of one level of indentation.
Comment #14
catchFailed to remove the now-unnecessary test changes from the patch.
Comment #15
catchI think we can make use of this in core even before we have async database queries. Without async queries it would only be an improvement on cold cache requests to high traffic sites (i.e. a stampede situation), but it also shows a way we can optimise things anticipating async queries too.
Ideas are from #3257725: Add a cache prewarm API but including the changes here because I think it's more appropriate to this one, the other issue would then be able to build on top of the stuff added here.
Let's say we have two requests coming in to a site within 200ms of each other, and let's assume for the sake of argument that building the theme registry takes one second and they're both requesting HTML pages.
If both requests get a cache miss for the theme registry during bigpipe placeholder rendering, then they will both try to build it at the same time, taking a second each. An approach we've taken with the router is to add a lock and lock wait to ensure only one request does this at a time, this means all the other requests wait around doing nothing until one has done the work, then they can all proceed.
However, instead of waiting, if we suspend the fiber, then the parent process can move onto something else, in the case of this issue rendering a different bigpipe placeholder - this could involve running a non-async database query, or an async database query, or building some other cache, or loading an entity, or anything which takes a few milliseconds or more.
Let's say request 1 only has one bigpipe placeholder, and it gets a cache miss for the theme registry. It suspends the fiber, but because there's only one fiber, bigpipe immediately resumes it again, with the change in this patch, it will check the cache again just in case, then move onto building and caching the theme registry.
Request 2 has two placeholders, the first one immediately hits the empty theme registry cache too (because it's just showing the current user's name or similar) and calls Fiber::suspend(), bigpipe moves to the next fiber, which is a views block which does a listing query. It runs the query and then eventually needs to render the results, this arrives at the theme registry again, because request 1 has already finished building it, it gets a cache hit. Once the views block is finished, the first (username rendering) placeholder is resumed, and this also gets a cache hit for the theme registry now. Now without either locking or waiting or async, request 2 was able to get on with something else while request 1 was building the theme registry for it and it skips having to do that work itself.
With async queries this will help the single-request case too.
Let's assume the same two-placeholder page with a username and a views block.
The first placeholder renders the username, it gets a cache miss for the theme registry and suspends the fiber.
The second placeholder is rendering the views listing, it executes an async listing query, and suspends the fiber.
Bigpipe now resumes the fiber for the first placeholder, because it got a cache miss before, the registry double checks that's still the case, then goes ahead and builds the theme registry because it's still not there.
Once the first placeholder finishes, bigpipe then resumes the second placeholder, the query has already come back, so it keeps going.
If we didn't call Fiber::suspend() from the theme registry cache miss, then the first placeholder would just continue all the way through until finishing, and then the second placeholder would just have to wait for its async query to come back after being immediately resumed - because there's no other placeholder for bigpipe to execute while it waits.
So suspending when we get to a CPU-expensive cache miss allows us to move on to the next thing, and if the next thing is async, then we return to the CPU-intensive task while the async thing is running, or if we're lucky (if you can call stampedes lucky) don't have to do it at all. This massively increases the likelihood that we're able to take advantage of things happening in parallel, whether it's due to async or different requests coming in.
Also adding similar logic to LockBackendAbstract::wait() for all the same reasons, see the comments in there.
Comment #16
catchCCF errors...
Comment #17
catchAnd we need an early return if the first cache get is successful.
Comment #18
catchComment #19
andypostIs there a way/ideas about how to profile it and get edge cases when this concurrency can break things?
Comment #20
catch@andypost until we have #3259709: Create the database driver for MySQLi and views support for async queries in ::execute(), it won't improve anything at all for the single request case. We could profile to make sure it doesn't make anything worse, which @mglaman already did.
It would be possible to set up a test situation using guzzle async requests though - i.e. have six different blocks that are all placeholdered, and all make an async guzzle request to somewhere, it would then call Fiber::suspend() after the request is made and before waiting for it to come back. If that async request was to a route that adds a sleep(5) call before returning, it would make things more obvious.
We should then be able to see that bigpipe loops through the placeholders, and they all send the async guzzle request, then it loops through them until they start coming back and renders them in whatever order they come in, instead of doing each one by one.
The main edge case I can think of is something like this:
Someone adds a footer block which adds a library and is placeholdered. The library has an implicit dependency on a menu library but the dependency is undeclared because it's always rendered last after the menus at the top of the page anyway. This change means the footer sometimes gets rendered before the header, so its library is added in the wrong order. However this would be exposing an existing bug (the lack of a library dependency) rather than introducing a new one.
Another case would be some code running inside a placeholder sets something on request or in session, and later code relies on this already having been set, but now they're rendered out of order so it's not set - this is what we already have to account for with the messages block. But again this would be a pre-existing bug - the block might be rendered on a page where the other block never appears for example.
It looks me a while to grasp conceptually what Fibers does, but that's partly because it's deceptively simple - i.e. you can start some code, and pause some code, and restart some code from different places, and that's about it. Nothing actually runs at the same time, except non-blocking i/o.
Comment #21
wim leersThis looks epic 🤩🤩🤩🤩👏👏👏👏👏 And I can't believe how simple it is! Well, I sort of can, because this matches BigPipe's architecture exactly, but still, I would never have dreamt this would require so little code!
This is the key thing that Drupal's BigPipe implementation couldn't do that Facebook's could thanks to them using the PHP language fork
Hack, which supports asynchronous operations. But now, with Fibers, we can! 🥳🚀#15 is a very clear description of multiple real-world scenarios this improves.
It's incredible how little needed to be changed!
@catch only needed to:
First, add Fiber support to Drupal's
lockand sibling services, to ensureSecond, also add Fiber support to the theme registry (because it is very I/O intensive, and virtually every request triggers it, so do something useful while another request handles it for ALL requests, including this one)
And now we upgrade Drupal's BigPipe implementation to use this: each placeholder (which must be using a
#lazy_builderand hence can be rendered in isolation — this is a crucial enabler for adopting Fibers!) can be rendered completely independently, and hence can run in its own Fiber.🥳
Remarks
👍 Tricky stuff — see #2712935: Messages are not rendered last, meaning those messages set within placeholders only appear on the next request.
🤓 The comment could be slightly clearer:
[…] the render order of every other placeholder is safe to change.🙏 Could we please rename it from
$messages_keyto$messages_placeholder_id?👍 This ensures the Fiber for rendering the messages placeholder is executed last.
Übernit: we should/could restore the previous indentation level 🤓
(Not that whitespace matters much in HTML, but this is theoretically adding at least 4*4=16 additional spaces to transmit to the client. 🙈)
🙏 AFAICT these lines are crucial! They're what ensure that if a Fiber gets suspended after we started/resumed it, we move on to the next Fiber.
A comment here would be helpful in clarifying that. Being new to Fibers/async PHP, it took me a few times reading this to grok it.
🤓 TIL you can unset the same key multiple times and PHP won't complain!
🤔 I'm not entirely sure this comment tweak is adding something.
There have already been issues in the past to tweak the placeholder rendering order, to optimize for certain use cases.
BigPipe's client and server side logic do not care about the order — with the sole exception that the status messages must be rendered last.
Now with Fiber support, there effectively is even *no* way to control the execution order — previously you had implicit control: the DOM order.
🐛 If anything, I think this comment tweak does not belong here, inside the body of
BigPipe::getPlaceholderOrder(), but on the method docblock. Because the comment there is now wrong:This is no longer true!
fiber(lowercase), sometimesFiber(uppercase). Let's make that consistent.Conclusion
IMHO this is RTBC as soon as we have done
some benchmarking.⚠️ AFAICT there is not going to be any speed-up right now except for when the theme registry is being rebuilt, or something else hits the
lockservice. The real speed-up will only happen once we have #3259709: Create the database driver for MySQLi in place.Once #3259709 is in place, we can do a follow-up to have the Drupal DB layer detect when it's about to execute a query inside a fiber (a
#lazy_builderfiber or even a BigPipe fiber — this could be detected by using the current\Drupal\Core\Render\RenderContext), and upon doing so, suspend the fiber. Which means the next fiber would start executing.Hah, the section already said that! 👍
We can do a benchmark and a test then, and they can be one and the same: by having an artificially slow block's
#lazy_buildercallFiber::suspend()explicitly and then asserting that it gets rendered last despite being NOT the last block in the DOM order.Comment #22
wim leersI just hit the infamous "cross-posting on a node can cause
published = 0to get set" bug on d.o 😳Comment #23
wim leers#20:
Agreed, this would be an existing bug 👍
Blocks are not allowed to rely on global state, so that'd indeed also be an existing bug. (
#lazy_builders' only dependencies are the arguments for the lazy builder callback — they can fetch information, but must not rely on request/global state. If they do, it's up to the developer to work around this. They probably should NOT use a lazy builder in this case.)Same! That last sentence is also exactly why they were able to add this to PHP without breaking anything, and while allowing much PHP code to be refactored in a simple way like we are doing in this very issue 🤩
Comment #24
catchThis is true but we still control the *start* order if not the *finish*/render order, updating the method docs on ::getPlaceholderOrder() instead of this inline comment sounds good. I originally thought we should get rid of getPlaceholderOrder() entirely, but I think it's fine and probably good to start in DOM order, it could help largest contentful paint for example especially if the first placeholders are quick to render and don't call Fiber::suspend().
Yes I think this will work for a test. I was very relieved to eventually not have to touch the BigPipe test at all, but from what I saw when I thought I would have to, it should be relatively straightforward to add a method + extra placeholder that does this.
I think all the other remarks are good - I'll try to address them soon. On fiber vs. Fiber I can't decide, but probably it's easier to use Fiber since its a proper noun in PHP now I guess.
Comment #25
wim leersThis is my one remaining concern about this change: it's possible that this change will cause a worse end-user experience, because due to async rendering it could be that the most visible block is now rendered last 😅
#2632750: Interface previews/skeleton screens through optional "preview" or "placeholder" templates (CR: https://www.drupal.org/node/3338948) helps mitigate that, though, but we may need to do more issues like #2951268: Improve rendering account link in the toolbar.
In fact, I think doing this is a good motivation to actually prioritize doing more like that #2951268! 👍
+1! 😄
Yep :)
Comment #26
catchYeah it's going to depend on the request, with warm render caches there should be zero change. In a cold cache situation it could result in a later LCP and especially more layout shift, but other times it'll be less (say if the LCP is actually the third placeholder and this ends up getting rendered first, which could be the case with main + user menus and similar). Adding more previews will definitely help to mitigate the times when we're unlucky. But also I would think if the overall page renders quicker it should help things like time to interactive more or less across the board.
I've addressed everything (I hope) except the additional test coverage.
Comment #27
catchAlso I left the Drupal 8 comment as is when moving the messages code around, but when I saw it still said Drupal 8, I wondered if anything's changed, and maybe we can use messages command? Render the placeholder, Drupal::messenger()->getMessages() or whatever it's called now, shove the messages into a MessagesCommand. Definitely follow-up material but could be nice. There's a good change messages are the LCP on pages that have them so it could be quite significant (albeit on a tiny proportion of requests).
Comment #28
catchHere's a bit of extra test coverage - adds a Fiber::suspend() to a callback that sends a message, to ensure that messages really are rendered last even if we go over the loop twice.
If I hack out the fiber-specific messages handling like this:
Then I get this test failure:
So that's good - it's testing what we want.
I started to look at BigPipePlaceholderTestCases and the bigpipe_test controller for trying to add an extra case there to assert things are rendered out of order, and didn't get very far at all yet though.
Comment #29
catchHere we go. I'm not sure Wim will approve of the species introduction but all seems to be working.
Test only patch is also the interdiff.
Comment #30
catchAnd the actual patch.
Comment #31
catchFollow-ups:
#3379883: [PP-1] Add async query execution support to views
#3379885: Use MessagesCommand in BigPipe to remove special casing of the messages placeholder
Comment #32
catchComment #34
wim leers#27: Yes, I've been thinking that too for a while now. 😇 But last time I looked into this (while getting CKEditor 5 ready for Drupal core and we needed to deal with AJAX updates and rendering the messages in a particular DOM location ), I was running into problems. Nothing insurmountable probably, but enough of an upstream blocker (it wasn't in core yet) to pursue an alternative solution. Definitely worth looking into though! 👍
#28: 👏
#29: 🤣 I totally approve of piggies! 🐽
Observations while stepping through, including a gotcha
To do the final review for this, I installed the
big_pipe_testmodule and stepped through the entire rendering process. Observations:<span>This little piggy stayed at home.</span>appear on the page 😅 Its placeholder remains unreplaced!$fiberswas constructed based on$placeholder_orderand looked like this:👆 This is the order in which the Fibers will be executed, time and time again, until none are left. Say that every Fiber gets suspended when they are started, except for #2 and #3. Those would only get executed when they are resumed (because of some I/O they're waiting for). That'd mean the execution order would be: #1, #2, #3, #4, #5, #6, #7, #8, #2, #3.
This condition at the start of the Fiber processing loop guarantees that the messages placeholder is ALWAYS replaced last:
callback=%5CDrupal%5Cbig_pipe_test%5CBigPipeTestController%3A%3Aexception&args%5B0%5D=llamas&args%5B1%5D=suck&token=uhKFNfT4eF449_W-kDQX8E5z4yHyt0-nSHUlwaGAQeUplaceholder,is executed and … causes BigPipe's execution to STOP! 😱
does not wrap therenderPlaceholder()logic in atry {…} catch (\Exception) {…}like HEAD does.🙈 NOPE I was wrong, it's because I'm in a core development setup and have configured
system.logging:error_leveltoERROR_REPORTING_DISPLAY_VERBOSE, rather thanERROR_REPORTING_HIDEwhich is what production sites use, and hence also what\Drupal\Tests\big_pipe\Functional\BigPipeTest::testBigPipe()uses 🙈Code review
🙏 It took me longer than I'd like to admit to figure out why the messages placeholder truly would ALWAYS be rendered last, even if a Fiber would keep getting suspended by its own
#lazy_builder.THIS line is why. Obvious in hindsight. But less obvious while I'm still getting used to thinking in terms of Fibers' execution order 😅
A comment here would go a long way 🙏
🤔 Wouldn't
be clearer?
Übernit:
arrayreturn type 😇Nit: this is at the very end of the loop already due to this refactoring, so this is a dead line of code 🪦
This could use a comment 🙏
I feel VERY strongly that this is a missed opportunity for some piggy emojis; this is missing:
At least one, arguably all 😜
🙏 Let's say why, at least with an
@see?Conclusion
As soon as some clarifying comments are added, I will RTBC this! 🥳 It's incredibly satisfying to see how easy it is to adopt new PHP language-level features! 😊
Comment #35
catch#3379885: Use MessagesCommand in BigPipe to remove special casing of the messages placeholder looks like it has legs which would remove this tricky line altogether, but I will add the clarifying comment for #1 in the meantime.
Everything else looks good - will try to address soon, gonna find out if vim will handle porcine emojis.
Comment #36
catchShould address all of #34.
Comment #37
catchComment #38
catchSelf review:
Let's move this back to #3257725: Add a cache prewarm API where is is more closely tied, it's been reviewed and already changed there. Just removed the hunk from the patch.
On the other hand the theme registry change is more relevant here, since that becomes blocking when we're trying to render stuff, slightly improved the comments though.
Comment #41
catchComment #42
wim leers#38 👍
No more concerns. This is well-documented, well-tested, and easy to understand. RTBC!
Comment #43
wim leersThis is a quote I just have to share with y'all:
— @catch in Drupal Slack 😊
Thanks for getting this going, @mglaman! 👍
Also: started by somebody at Acquia, continued by somebody at Third and Grove, then reviewed by somebody else at Acquia — open source can be beautiful and amazing 🤩😄
Comment #44
mglamanWow, I still can't believe this turned into something. Amazing work @catch
Comment #45
marcelovaniLooks like the span is missing the closing bracket. Is this intentional?
Comment #46
wim leersNice catch, @marcelovani! 🦅👁️
That doesn't seem intentional, but a minor bug that the HTML parsers appear to be fixing automatically 😊 Still, we should get that fixed 🤓
Comment #47
andypostClosing inline elements reminds about #1268180: Newline character in HtmlTag causes unwanted whitespace
Comment #48
catchYes that's a mistake I am afk for a bit, but also one character is fixable on commit if no-one else gets to it.
Comment #49
catchAddressing #45.
Comment #50
marcelovaniI can confirm that this is correct now
Comment #52
catchTest failure looks unrelated.
Comment #53
catchOpened #3383449: Add Fibers support to Drupal\Core\Render\Renderer - no dependencies either way with this issue, just applying the same overall pattern.
Comment #54
catchComment #56
catchUpdate via #3383449: Add Fibers support to Drupal\Core\Render\Renderer and discussion both these issues and others with @Kingdutch - once we've gone around the fibers loop once, if they're still waiting, we can allow calling code to do something else if it wants to too. This reduces the likelihood of a spinlock in the case something async is taking a very long time to come back, since we can do something else useful in the meantime.
The core use case would be #3257725: Add a cache prewarm API.
Comment #57
smustgrave commentedHave a Umami 11.x running.
Applied the patch #56
Clicking around the front end no issues
Searching on the front end no issues
Went to modules and content pages and did searches no issues
Updated a view and verified I still get a status message just fine.
Not seeing any issues so think this is good!
Comment #58
catchConverted to an MR and switched from longhand function () use($foo) syntax to an arrow function.
Comment #60
alexpottCommitted 108150c and pushed to 11.x. Thanks!
Comment #64
reinfate commentedI've noticed that fiber loop itself is never suspended due to $iterations being reset at every iteration of the while loop. Pretty sure it was intended to be before the loop. (Fixed in MR!5002)
Also just random thoughts. For example, if we have 5-10 long-running fibers that just waiting for something, each foreach iteration will suspend the current fiber. So it can be very quick suspends in a row, if higher code has nothing to do we will be back here and again quickly suspended 5-10 times. Will it be better to move $iterations check to the while loop to minimize that "spin locking"?
Comment #65
smustgrave commented@ReINFaTe can you please open a follow up. Will also need a failing test case to show this issue.
Comment #66
catchThe $iterations thing is an optimization, it's not testable except perhaps manually but even that would be very difficult. Let's discuss the rest of #64 on the follow-up issue, I think the fix itself looks correct, we might want two follow-ups, one for the quick fix and one for the spinlock discussion.