Problem/Motivation

This failed a couple of times in a row on gitlab, but eventually passed.

https://git.drupalcode.org/issue/drupal-3572628/-/jobs/8687319#L744

Media Source File (Drupal\Tests\media\FunctionalJavascript\MediaSourceFile)
  ✘ Media file source
    ┐
    ├ Behat\Mink\Exception\ElementHtmlException: The text "text--plain.png" was not found in the attribute "src" of the element matching css "img".
    │
    │ /builds/vendor/behat/mink/src/WebAssert.php:927
    │ /builds/vendor/behat/mink/src/WebAssert.php:683
    │ /builds/core/modules/media/tests/src/FunctionalJavascript/MediaSourceFileTest.php:100

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#7 3576262-wait-for-elements.patch2.96 KBdries

Issue fork drupal-3576262

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

kentr created an issue. See original summary.

kentr’s picture

Title: [random test failure] MediaSourceFile::testMediaFileSource » [random test failure] MediaSourceFileTest::testMediaFileSource

Another instance:

https://git.drupalcode.org/issue/drupal-3574093/-/jobs/9040665#L744

Media Source File (Drupal\Tests\media\FunctionalJavascript\MediaSourceFile)
     ✘ Media file source
       ┐
       ├ Behat\Mink\Exception\ElementHtmlException: The text "text--plain.png" was not found in the attribute "src" of the element matching css "img".
donquixote’s picture

This is actually happening to me locally randomly.
If I run it repeatedly, I get different fails or passes at seemingly random.
Looking into it.

donquixote’s picture

I think I found what is happening.

When we call ->pressButton('Save'), a few microseconds can pass before the page is actually submitted.
Maybe there are some validation scripts happening.

I am working on a solution.

dries’s picture

I ran into this bug in #3579381: Skip unnecessary COUNT query in Views Sql::execute() and started looking at this one. I thought it was odd that this wasn't a solved problem in Core, so I looked at other places where pressButton() is followed by assertions on the new page. I quickly found several places.

One such place, InlineBlockTestBase (line 92-93) handles this with the existing waitFor* methods:

$page->pressButton('Save layout');
$this->assertNotEmpty($assert_session->waitForElement('css', '.messages--status'));

Based on existing patterns, I'm not sure we need a new pressSubmitButton. This might be a 1-line change in two different places?

1. After the pressButton('Save') at line 99, add $assert_session->waitForElement('css', 'img') to wait for the image element on the new page before checking its src attribute.

2. At line 114-118, move the statusMessageContains() call (which already has a built-in wait) before the loadUnchanged() call, so the entity loads after the save has actually completed.

dries’s picture

StatusFileSize
new2.96 KB

Follow-up: I asked Claude Code (full disclosure) to prototype and validate my suggestion on my localhost. The scope was tiny (just a couple lines) so Claude mostly helped automate the running of the tests. Less typing and fiddling for me, and more observing.

Unmodified test: 4 out of 5 runs failed with actual test assertion failures. Super flaky indeed.

The 2-spot fix I suggested above resulted in 4 passes out of 5. I ran the tests a couple more times and the remaining failures exposed a 3rd race condition at line 67 (addressEquals right after pressButton) and a 4th in MediaSourceTestBase line 149.

So I went back to the MR and it correctly identifies 5 pressButton('Save') calls that need fixing (4 in MediaSourceFileTest, 1 in MediaSourceTestBase). That was not clear from the comments, but is now clear from the MR and my testing.

Claude also pointed out that in my proposal waiting for img is rather generic. If there is any image on the page (like a logo), waitForElement('css', 'img') could match immediately. So it recommended we wait for .messages--status instead, which only appears after the save completes.

I think the existing waitFor* methods can do the job without new infrastructure. That said, the MR might be more robust in how it waits (sentinel + readyState + AJAX count).

I attached my patch for comparison. I ran the tests 20 times, all green. It's smaller than the MR, but that does not mean it's better. I think either approach works and the actual difference could be taste/cosmetic, so will let others decide.

If anything, my exploration was good independent verification that the bugs are real and fixable. And I learned several new things along the way. It feels like the MR is RTBC, if we wanted.

donquixote’s picture

Hello Dries!
The motivation for my solution was this:

  • I want this to be reusable, without knowing what specific element to check for.
    E.g. here it might be ok to check for ".messages--status".
    But what if the old page already had status messages? Then it won't work.
  • I want a useful failure message that distinguishes between "Navigation did not happen" vs "Navigation did happen, but the next page is not what you expected.".
    I believe in general our tests tend to be too vague in their failure messages. If a message is not found, that could be because the message text is not exactly as expected, or it could be a total wsod.

I imagine this won't be the last place like this, and there may be other operations besides "Press button" that expect a page reload. And I expect that contrib likes to copy this stuff.

On the other hand: Any "reusable" feature would then have to be supported in the future, so we need to be confident in it.
Whereas a one-off solution can be changed later if needed.

Maybe we can go with the simpler one-off solutions and then do follow-up work.
But I saw this as an opportunity to craft something we could reuse.

dries’s picture

Makes sense, donquixote!

I just did a rough grep to get a sense of how widespread this pattern is:

find core -path "*/FunctionalJavascript*" -name "*.php" \
  | xargs grep -c "pressButton('Save\|pressButton('Submit\|pressButton('Delete\|pressButton('Confirm" \
  | grep -v ":0" | awk -F: '{s+=$2} END {print s}'
172

Of those, how many are followed by a waitFor within 3 lines:

find core -path "*/FunctionalJavascript*" -name "*.php" \
  | xargs grep -A3 "pressButton('Save\|pressButton('Submit\|pressButton('Delete\|pressButton('Confirm" \
  | grep -c "waitFor"
47

So 47 of the 172 have a wait. This is a rough count with obvious flaws, but it suggests the pressSubmitButton() in the MR could be very valuable.

One small naming thought: the distinction between pressButton() and pressSubmitButton() isn't immediately clear or intuitive IMO. Something like pressButtonAndWaitForPageLoad() would better communicate that the method waits for a page transition?

Ideally, we'd be able to add a parameter to the existing method like $page->pressButton('Save', waitForPageLoad: true), but that is not so easy I think.

Let's see what the reviewers or sub-system maintainers say first, and maybe we create a "Evaluate adding a page-load-aware submit method to WebDriverTestBase and migrate existing pressButton calls" issue?

Personally, I'd commit the minimal fix for this critical bug (my patch in #7), and then address the broader pressSubmitButton() / pressButtonAndWaitForPageLoad() infrastructure in a follow-up issue. It could potentially benefit from more investigation and discussion.

donquixote’s picture

One small naming thought: the distinction between pressButton() and pressSubmitButton() isn't immediately clear or intuitive IMO. Something like pressButtonAndWaitForPageLoad() would better communicate that the method waits for a page transition?

I am ok with name changes. I imagine others might also have opinions :)

One difference between the two is one is provided by the test class, the other by the page object.
(EDIT) So the page is controlled by mink, not Drupal, we don't simply add methods to that.

Another option is to be more explicit, and explicitly call this:

    $this->drupalSubmit(fn () => $this->getSession()->getPage()->pressButton($label));

We could also rename this:

    $this->navigate(fn () => $this->getSession()->getPage()->pressButton($label));
donquixote’s picture

One thing to consider is that the button cannot always be safely identified only by its name.
There can be cases with ambiguity, which needs to be resolved by selecting a button from a specific parent element, or using an advanced selector.
This is what the wrapper is meant to solve.

But...

Personally, I'd commit the minimal fix for this critical bug (my patch in #7), and then address the broader pressSubmitButton() / pressButtonAndWaitForPageLoad() infrastructure in a follow-up issue. It could potentially benefit from more investigation and discussion.

Sure!
The advanced solution needs more discussion.

dries’s picture

Status: Active » Needs review

Thanks @donquixote, those are all good points. I created a follow-up issue: #3582376: Add a reusable page-transition-aware method to WebDriverTestBase for form submissions. I tried to capture everything in the summary.

That gives it the space it deserves for naming, API design, and migrating some more pressButton() calls.

For this issue, can we commit the minimal waitFor* fix in comment #7 to unblock things?

dries changed the visibility of the branch 3576262-random-test-failure-MediaSourceFileTest to hidden.

dries’s picture

I created a new MR with the minimal waitFor fix from my earlier comment #7. I used a new branch since the approach is different from @donquixote's MR.

dries’s picture

Status: Needs review » Reviewed & tested by the community

  • amateescu committed 2fce13e9 on 10.6.x
    fix: #3576262 [random test failure] MediaSourceFileTest::...

  • amateescu committed fdfa98a9 on 11.3.x
    fix: #3576262 [random test failure] MediaSourceFileTest::...

  • amateescu committed f698f76d on 11.x
    fix: #3576262 [random test failure] MediaSourceFileTest::...

  • amateescu committed 115e5e56 on main
    fix: #3576262 [random test failure] MediaSourceFileTest::...
amateescu’s picture

Status: Reviewed & tested by the community » Fixed

ommitted and pushed 115e5e56ba5 to main and f698f76d12b to 11.x and fdfa98a92c9 to 11.3.x and 2fce13e903b to 10.6.x. Thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

amateescu’s picture

Version: main » 10.6.x-dev

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.