Closed (fixed)
Project:
Drupal core
Version:
11.x-dev
Component:
base system
Priority:
Major
Category:
Feature request
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
20 Jul 2023 at 20:57 UTC
Updated:
27 Sep 2023 at 05:18 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
znerol commentedComment #3
znerol commentedComment #4
andypostComment #5
znerol commentedComment #7
mondrakeI've started to see test failures in my experimental database driver DruDbal, due to database locks, since when the parent issue was committed.
Bumping to major, even though I think critical would be appropriate if the testing framework stability decreased.
Comment #8
znerol commentedComment #9
znerol commentedComment #10
znerol commentedRemove calls to sleep() added in #3295790: Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration.
Comment #11
znerol commentedMore test groups (
path,jsonapi,language,locale).Comment #12
znerol commentedLooks like
setWaitForTerminate()needs to run earlier inConfigurableLanguageManagerTestandLocaleLocaleLookupTest.Comment #13
znerol commentedRemove calls to sleep() added in #3375584: [random test failure] Random failure in PathWorkspacesTest.
Comment #14
bradjones1Primary author of the changes in #3295790: Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration, here.
From a quick read of this issue it appears that the goal is to disable post-response processing, by way of essentially undoing the fixes from the other issue, namely setting the content-length header. This might work, however it seems a bit hacky to me and, most importantly, not entirely obvious. For someone who comes along later and wants to maintain or debug this code, it's not clear why the content-length header is related. Even I had to go back and do some issue archaeology (that issue was open for almost a year, so my memory was a bit fuzzy and this stuff is pretty esoteric.)
For my part I would rather see a more explicit short-circuit of this functionality, or (better yet) improve the underlying race conditions and flakiness while allowing the underlying functionality work as expected, so we are actually testing the system as it's expected to operate. There was a lot of discussion in the parent issue about this kind of thing, and it was generally considered to be an improvement in test coverage because we're no longer testing/depending on broken functionality.
I don't love the additions of the sleep() calls, but we had little choice because as you rightly point out it's hard to signal to and from the test runner and the site that's being called. I think this is where the improvement should happen - I'm sure there is a more elegant solution to this aspect of this puzzle.
Comment #15
catchThis is true, but I think it applies equally to sleep(1).
One possible approach that crossed my mind was a test-only post response task, that writes something to state or key/value when it's finished and tries to run last of the post response tasks. We could then poll for that in the parent process, and delete it again as soon as its found ready for the next time, like a ::waitForResponseTasksToFinish() method. This would allow the post response tasks to run actually post-response while having something hopefully more reliable than sleep() to ensure they have.
Comment #16
znerol commentedWe could try to use flock() for that purpose. If the state flag is set, the child site creates a temporary file and
flocks()it (usingLOCK_EX). The path to the file is communicated to the parent site in a response header (X-Drupal-Test-Wait-Terminate: /tmp/xxxx.lock).The test runners HTTP client (most probably stuff in
UIHelperTrait) examines the response headers and extracts the path fromX-Drupal-Test-Wait-Terminate. If it exists, it tries to set an exclusive lock on that file (which blocks until the lock in the child site is released).Some notes:
Comment #18
znerol commentedOpened an alternative MR 4461 implementing the ideas in #16 using the symfony lock component. Attached is a patch which executes the affected test groups 50 times in a row.
Comment #20
znerol commentedRemove X-Drupal-Wait-Terminate response header and only attempt to wait for termination if container has state service.
Comment #21
znerol commentedRetain a reference to the lock, otherwise it will be released prematurely.
Comment #22
catchThis is outdated now. New approach looks very encouraging to me.
Comment #23
znerol commentedYes. And since we are not accessing the
Request/Responseobject anymore it doesn't need to go into a stack middleware. Not sure if there is a better location for the code though.Comment #25
znerol commentedSwitching to
\Drupal:lock(). We know that this is working in other cases, so let's just use that.Comment #26
znerol commentedComment #27
catchJust using lock seems both simpler and more reliable. Haven't done a line by line review yet but also nothing stuck out either, so +1.
Comment #28
bradjones1I did a once-over of the revised patch and I'm a bit confused as to where and how it's actually waiting for termination of the post-response work. From my read of the earlier comments I figured that there would be, say, a service added that would run after all other destructable services that would release the lock, but it appears this is being done within the test HTTP client? Isn't the whole problem here that the client side doesn't know how long the post-response work takes, and has to be polling some state that is asynchronous from the request->response cycle? I could definitely be reading this wrong as I am not an expert on locks (I always get turned around when implementing them) but I'm suspicious that this might not be signaling the end of work so much as adding a few ticks, and as a result the timing works out?
Comment #29
catch@bradjones I think we need a comment as a reminder, but it's fine:
DatabaseLockBackend for example does this:
So acquiring a lock at the beginning of the request, this will then run at the end of the request after all post response tasks have run, clearing out every lock that was acquired.
It's mostly a safety catch so that a request can't hold a lock even after it's finished (say if there's a fatal error before the lock can be released) but happens to do exactly what we need here.
The patch is using the default values for ::acquire() and ::wait() which means it's acquired for 30 seconds. Lock:wait() intelligently handles polling while it's waiting (i.e. it starts with millseconds and ends up at hundreds of milliseconds), so that works for making sure we wait the minimum time necessary.
Comment #31
bradjones1Ahhh, yeah, OK. That makes sense and appreciate the clarification. I think a comment that states how it works is helpful here for maintainability. Thanks.
Comment #32
spokjeThis seems ready for core comitters review, and is IMHO a much more deterministic way then whacking
sleep(1)into tests.Comment #34
catchCommitted 1debb39 and pushed to 11.x. Thanks!