Closed (fixed)
Project:
Drupal core
Version:
11.3.x-dev
Component:
package_manager.module
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
11 Apr 2026 at 16:10 UTC
Updated:
15 Jun 2026 at 08:35 UTC
Jump to comment: Most recent
Comments
Comment #2
dwwYup, hit this at https://git.drupalcode.org/project/drupal/-/jobs/10010465 for example. I've seen it many times.
Comment #5
dwwOpened a branch/MR as the "baseline" with only a comment change. Hopefully if we repeat this test there, we should see N random fails per 1000 runs.
Opened a real branch/MR with a proposed fix. Hopefully if we repeat that, we'll see all green after N runs.
Thanks to #3576458: [regression] Subsystem and Topics maintainers require access to re-run, trigger, or view tests I can't trigger the repeat test class job in either pipeline. 😢
Comment #6
dwwComment #7
dwwComment #8
godotislateDiscussed with @dww on Slack. The guess is that fix works because "the reload returns before the payload is ready, while drupalGet() is more thorough and only returns once the page has actually refreshed."
I think this is good for RTBC, unless someone thinks a higher repeat count is needed.
Comment #9
dww@godotislate asked me in Slack why the fix works. I'm not 💯 sure, but my guess is that the
reload()returns before the payload is ready, whiledrupalGet()is more thorough and only returns once the page has actually refreshed. Thereload()approach is only used in a small # of core tests:Lots of those seem like familiar faces in the random fail world. Not sure if we should bulk update all of them, or deal with each in their own issues (so it's easier to do the repeat test class checks on each one)...
Comment #10
dwwHehe, x-post. But cool. Thanks for the RTBC!
Comment #11
dwwComment #12
dwwSorta related, at least...
Comment #13
xjmI triggered several more batches;
.98^500 == 0.004%chance it somehow didn't fail given the previous baseline, so we can be pretty confident the fix hath fixed.Comment #14
xjmEdit: oops lol, got this on batch 5 of the fix batch job:
Is that what this is supposed to have fixed?
Comment #15
dwwRe: #14: 😢 Yes, bummer. So the answer to my didn't-really-prove-it assertion that
drupalGet()is "more thorough" just means "slower, shrinking the window for the race condition, but not actually closing it".Might still be worth committing this if it improves the failure rate by a lot.
Or maybe we need something like #3582376: Add a reusable page-transition-aware method to WebDriverTestBase for form submissions for Functional tests, too. So we can do a
drupalGet(), followed by awaitForSomethingSomething()that we expect on the page, before we just immediately assert if we see the text we're expecting.Comment #16
xjmI triggered the baseline job for some more batches as well.
Repeat 2: 0 fails
Repeat 3: 1 fail
Repeat 4: 0 fails
Repeat 5: 0 fails
Repeat 6: 0 fails
So unfortunately there isn't actually a clear order-of-magnitude improvement here; we're getting too much noise from the underlying performance whatevers affecting the jobs for there to be any statistically significant meaning in how many times it passes. We need to slow it down artificially or introduce a hard failure when the random fail is present in this kind of scenario.
Comment #17
dwwOkay, actually did a little real investigation here. 😅 I downloaded artifacts for one of the repeat fail jobs on the "fix" MR. The page that was supposed to fail but didn't still has the real info for composer:
That means that this line in the test didn't actually "stick":
That method does this:
So perhaps the random fail is something about the copy of the status report before this state was set is still being cached some of the time? Pushed a commit to call
drupal_flush_all_caches()between setting the state and reloading the page. Probably overkill. Perhaps we could more selectively just clear the page cache and/or the render cache or something. But curious to see if this one ever fails.Comment #18
godotislateAh. State. Again.
#3506148: [meta] Replace all uses of state in tests with direct key value
Comment #19
dwwOkay, reverted all the changes to the test. Pushed a new commit to convert
TestExecutableFinderto useKeyValueStoreinstead ofState. Test passes locally. We need a new round of repeat tests on https://git.drupalcode.org/project/drupal/-/pipelines/831550 and hopefully we never see a failure...Comment #20
godotislateChanges to replace state with KV look good.
https://git.drupalcode.org/project/drupal/-/jobs/10015234 repeat test set to run 2000. All green. +1 for RTBC.
Comment #21
dwwDid some grep investigating. There's a lot of
Stateinpackage_managertests. Instead of trying to do all of that here, opened a follow-up:#3592813: Replace usage of State in package_manager tests with KeyValue
If the scope gets too unwieldy there, we can turn that into a plan and have child issues for parts of it...
Comment #23
catchCommitted/pushed to main, 11.x and 11.4.x, thanks!
Comment #29
dwwCould this be backported to 11.3.x? Cherry-pick is clean.
Thanks!
-Derek
Comment #30
catchWe've stopped committing to 11.3.x from now in preparation for 11.4 and the 11.3 branch going security-only. I guess random test failures fall into something that it doesn't matter if we fix in a security release (actually a good thing) but will discuss the branch management aspects with other committers.
Comment #32
catchYep, let's backport. Cherry-picked to 11.3.x, thanks!