Problem/Motivation

https://git.drupalcode.org/issue/drupal-1797438/-/jobs/9286923#L2495

Composer Requirement (Drupal\Tests\package_manager\Functional\ComposerRequirement)
     ✘ Composer info shown
       ┐
       ├ Behat\Mink\Exception\ResponseTextException: The text "Composer was not found. The error message was: composer is not a thing" was not found anywhere in the text of the current page.
       │
       │ /builds/vendor/behat/mink/src/WebAssert.php:907
       │ /builds/vendor/behat/mink/src/WebAssert.php:293
       │ /builds/core/tests/Drupal/Tests/WebAssert.php:981
       │ /builds/core/modules/package_manager/tests/src/Functional/ComposerRequirementTest.php:53
       ┴

Steps to reproduce

Proposed resolution

Use $this->drupalGet() instead of $this->getSession()->reload().

https://git.drupalcode.org/project/drupal/-/merge_requests/15925 basically has the code from main, with a bogus comment, so we can re-run the random fail job.
https://git.drupalcode.org/project/drupal/-/jobs/10013153 ran 100 times and failed twice.

https://git.drupalcode.org/project/drupal/-/merge_requests/15926 has the fix.
https://git.drupalcode.org/project/drupal/-/jobs/10013205 ran 100 times with 0 failures.

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Issue fork drupal-3584277

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

kentr created an issue. See original summary.

dww’s picture

Yup, hit this at https://git.drupalcode.org/project/drupal/-/jobs/10010465 for example. I've seen it many times.

dww’s picture

Status: Active » Needs review

Opened a branch/MR as the "baseline" with only a comment change. Hopefully if we repeat this test there, we should see N random fails per 1000 runs.

Opened a real branch/MR with a proposed fix. Hopefully if we repeat that, we'll see all green after N runs.

Thanks to #3576458: [regression] Subsystem and Topics maintainers require access to re-run, trigger, or view tests I can't trigger the repeat test class job in either pipeline. 😢

dww’s picture

Issue summary: View changes
dww’s picture

Issue summary: View changes
godotislate’s picture

Status: Needs review » Reviewed & tested by the community

Discussed with @dww on Slack. The guess is that fix works because "the reload returns before the payload is ready, while drupalGet() is more thorough and only returns once the page has actually refreshed."

I think this is good for RTBC, unless someone thinks a higher repeat count is needed.

dww’s picture

@godotislate asked me in Slack why the fix works. I'm not 💯 sure, but my guess is that the reload() returns before the payload is ready, while drupalGet() is more thorough and only returns once the page has actually refreshed. The reload() approach is only used in a small # of core tests:

  1. core/tests/Drupal/FunctionalTests/Installer/InstallerExistingConfigSyncDirectoryProfileHookInstallTest.php
  2. core/modules/ckeditor5/tests/src/FunctionalJavascript/CKEditor5Test.php
  3. core/modules/ckeditor5/tests/src/FunctionalJavascript/MediaPreviewTest.php
  4. core/modules/ckeditor5/tests/src/FunctionalJavascript/MediaTest.php
  5. core/modules/layout_builder/tests/src/Functional/LayoutBuilderBlocksTest.php
  6. core/modules/layout_builder/tests/src/Functional/LayoutBuilderTest.php
  7. core/modules/layout_builder/tests/src/FunctionalJavascript/ContentPreviewToggleTest.php
  8. core/modules/layout_builder/tests/src/FunctionalJavascript/LayoutBuilderUiTest.php
  9. core/modules/media/tests/src/Functional/MediaOverviewPageTest.php
  10. core/modules/node/tests/src/Functional/NodeAccessCacheabilityWithNodeGrantsTest.php
  11. core/modules/package_manager/tests/src/Functional/ComposerRequirementTest.php
  12. core/modules/workflows/tests/src/Functional/WorkflowUiTest.php

Lots of those seem like familiar faces in the random fail world. Not sure if we should bulk update all of them, or deal with each in their own issues (so it's easier to do the repeat test class checks on each one)...

dww’s picture

Hehe, x-post. But cool. Thanks for the RTBC!

dww’s picture

Issue tags: +Bug Smash Initiative
dww’s picture

xjm’s picture

I triggered several more batches; .98^500 == 0.004% chance it somehow didn't fail given the previous baseline, so we can be pretty confident the fix hath fixed.

xjm’s picture

Edit: oops lol, got this on batch 5 of the fix batch job:

       ┐
       ├ Behat\Mink\Exception\ResponseTextException: The text "Composer was not found. The error message was: composer is not a thing" was not found anywhere in the text of the current page.

Is that what this is supposed to have fixed?

dww’s picture

Re: #14: 😢 Yes, bummer. So the answer to my didn't-really-prove-it assertion that drupalGet() is "more thorough" just means "slower, shrinking the window for the race condition, but not actually closing it".

Might still be worth committing this if it improves the failure rate by a lot.

Or maybe we need something like #3582376: Add a reusable page-transition-aware method to WebDriverTestBase for form submissions for Functional tests, too. So we can do a drupalGet(), followed by a waitForSomethingSomething() that we expect on the page, before we just immediately assert if we see the text we're expecting.

xjm’s picture

Status: Reviewed & tested by the community » Needs review

I triggered the baseline job for some more batches as well.

Repeat 2: 0 fails
Repeat 3: 1 fail
Repeat 4: 0 fails
Repeat 5: 0 fails
Repeat 6: 0 fails

So unfortunately there isn't actually a clear order-of-magnitude improvement here; we're getting too much noise from the underlying performance whatevers affecting the jobs for there to be any statistically significant meaning in how many times it passes. We need to slow it down artificially or introduce a hard failure when the random fail is present in this kind of scenario.

dww’s picture

Okay, actually did a little real investigation here. 😅 I downloaded artifacts for one of the repeat fail jobs on the "fix" MR. The page that was supposed to fail but didn't still has the real info for composer:

Composer version
2.9.7 (/usr/local/bin/composer)

That means that this line in the test didn't actually "stick":

    TestExecutableFinder::throwFor('composer');

That method does this:

  public static function throwFor(string $name): void {
    \Drupal::state()->set("throw for $name", TRUE);
  }

So perhaps the random fail is something about the copy of the status report before this state was set is still being cached some of the time? Pushed a commit to call drupal_flush_all_caches() between setting the state and reloading the page. Probably overkill. Perhaps we could more selectively just clear the page cache and/or the render cache or something. But curious to see if this one ever fails.

godotislate’s picture

dww’s picture

Okay, reverted all the changes to the test. Pushed a new commit to convert TestExecutableFinder to use KeyValueStore instead of State. Test passes locally. We need a new round of repeat tests on https://git.drupalcode.org/project/drupal/-/pipelines/831550 and hopefully we never see a failure...

godotislate’s picture

Status: Needs review » Reviewed & tested by the community

Changes to replace state with KV look good.
https://git.drupalcode.org/project/drupal/-/jobs/10015234 repeat test set to run 2000. All green. +1 for RTBC.

dww’s picture

Did some grep investigating. There's a lot of State in package_manager tests. Instead of trying to do all of that here, opened a follow-up:

#3592813: Replace usage of State in package_manager tests with KeyValue

If the scope gets too unwieldy there, we can turn that into a plan and have child issues for parts of it...

  • catch committed 440c6903 on main
    fix: #3584277 [random test failure] ComposerRequirementTest::...
catch’s picture

Version: main » 11.4.x-dev
Status: Reviewed & tested by the community » Fixed

Committed/pushed to main, 11.x and 11.4.x, thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

  • catch committed 9e64804d on 11.4.x
    fix: #3584277 [random test failure] ComposerRequirementTest::...

  • catch committed 610e3dec on 11.x
    fix: #3584277 [random test failure] ComposerRequirementTest::...

dww’s picture

Version: 11.4.x-dev » 11.3.x-dev
Status: Fixed » Reviewed & tested by the community

Could this be backported to 11.3.x? Cherry-pick is clean.

Thanks!
-Derek

catch’s picture

We've stopped committing to 11.3.x from now in preparation for 11.4 and the 11.3 branch going security-only. I guess random test failures fall into something that it doesn't matter if we fix in a security release (actually a good thing) but will discuss the branch management aspects with other committers.

  • catch committed 9efb4795 on 11.3.x
    fix: #3584277 [random test failure] ComposerRequirementTest::...
catch’s picture

Status: Reviewed & tested by the community » Fixed

Yep, let's backport. Cherry-picked to 11.3.x, thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.