Problem/Motivation
With the introduction of the use fibers in the render system and entity storage loading, it can be difficult to debug certain issues that result from concurrency, such as race conditions or global state mutations. These issues can be intermittent or require very complex and specific steps to reproduce. In addition, debuggers only show the current stack, so it can be hard to follow even stepping through execution.
Because of this, some way to be able to prevent fibers from being suspended, so that the execution stays in a running fiber's stack till it terminates, could be useful in both debugging, to determine that a bug is not related to concurrency, or to provide a workaround for a known issue related to concurrency until that issue is resolved.
Possible approaches to do this:
- A settings property in settings.php indicating that globally fibers should not be suspended
- A Drupal wrapper class for Fiber with something like allowSuspend(bool $allow) or mechanisms to prevent both individual fibers from being suspended or all fibers from being suspended globally
- A service with a settable flag indicating that fibers should not be suspended
- Some sort of fiber manager/factory service (probably a combination of the previous two ideas) that is used to instantiate and track fibers. (A challenge with this is how to keep the tracking property populated correctly after serialization or container rebuilds)
- From @longwave on Slack: Per service swaps: it might be easier to reason about and test individual services if they don't support fibers initially, and then each service has a subclass that does that can be swapped in (perhaps by default in most cases)
Steps to reproduce
Proposed resolution
Remaining tasks
User interface changes
Introduced terminology
API changes
Data model changes
Release notes snippet
Issue fork drupal-3576789
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
znerol commented#3568380: Propagate opentelemetry context into fibers proposes to add a fiber factory for other reasons.
Comment #3
znerol commentedA possible option would be to generalize fiber related code into async primitives. It's then up to the implementations whether or not fibers are used.
We could start from the placeholder case. This would need something like kotlins async() and awaitAll() function. Plus some deferred abstraction.
Comment #4
znerol commentedLuminova Async is more or less what I have in mind (docs).
Comment #5
godotislate#3/#4 are good ideas, and now I'm wondering about how this issue should be scoped. I think the goal here is to make it easier to provide workarounds (and/or help debug) Fiber-related issues like these until they get resolved:
#3569172: Weird language negotiation behavior inside getLanguageSwitchLinks leading to incorrect languages being used
#3576594: [PP-1] Avoid account switching when calculating permissions in AccessPolicyProcessor::processAccessPolicies()
#3576381: Ensure that URL access checks respect the language in the URL, remove global language state changes in ConfigurableLanguageManager::getLanguageSwitchLinks
#3565020: Set the Drupal\views\ViewsData::$fullyLoaded property to TRUE only when the data is really loaded.
For this purpose, I was thinking that a wrapper or a factory would be a pretty shallow abstraction, with the factory mostly being a object accessible to the debugger to make it possible to view information about all instantiated fibers globally. But there are other good reasons for the factory use case, as you mentioned, that probably warrant more abstraction. Another question is how much of that can be covered by #3394423: Adopt the Revolt event loop for async task orchestration?
Maybe this discussion is better suited in this meta: #3257726: [meta] Use Fibers for concurrency in core.
Comment #7
znerol commentedDraft MR shows a possible way. There is something wrong with BigPipe, but honestly I do not really know how this is supposed to work.
Comment #8
znerol commentedReroll after #3572967: Renderer::executeInRenderContext() and BigPipe should also pass immediate as resume type.
Comment #9
andypostComment #10
smustgrave commentedShould this still be draft?
Comment #11
znerol commentedIt still has a weird test failure in BigPipe. And probably also needs some more discussion on the approach.
Comment #12
znerol commentedI fixed
BigPipe::sendPlaceholders(). I guess the functional js test failing now is random. I'm unable to rerun it though.Comment #13
znerol commentedI guess this at least needs tests for the new
async_fibers_enabledsetting.Comment #14
kingdutchznerol made me attentive of this issue, thanks! :)
As part of #3394423: Adopt the Revolt event loop for async task orchestration, one of the functionalities is that Revolt can select different "Drivers", which translate the async primitives (of a deferred, delayed, or repeating task) into actual execution on the underlying PHP event loop library (there are a few it abstracts over).
One functionality that Revolt has is the ability to select a "TracingDriver", which will return the backtrace of the place the task was scheduled at. If cancelled, it will also include the place the cancellation happened at. This provides traceability for the tasks that are scheduled on the event loop.
It does not exactly address the goal of this issue, which is preventing suspension. It does provide a path to better debugability of async workloads.