When a FunctionalJavascript test calls $page->pressButton('Save') followed by assertions on the new page, there is a race condition: the old page may not have navigated yet when the assertion runs.

This came up in #3576262: [random test failure] MediaSourceFileTest::testMediaFileSource where MediaSourceFileTest failed randomly because assertions ran against the old page.

Many individual tests work around this with waitForElement(). But as donquixote pointed out in #3576262-8: [random test failure] MediaSourceFileTest::testMediaFileSource, this approach has limitations:

  • You need to know which specific element to wait for.
  • The old page might already have that element (e.g. status messages), causing a false match.
  • Failure messages don't distinguish "navigation didn't happen" from "page loaded but wrong content."

A rough grep shows this pattern could be widespread:

# Form-submit-style pressButton calls
find core -path "*/FunctionalJavascript*" -name "*.php" \
  | xargs grep -c "pressButton('Save\|pressButton('Submit\|pressButton('Delete\|pressButton('Confirm" \
  | grep -v ":0" | awk -F: '{s+=$2} END {print s}'
172

# Of those, followed by a waitFor within 3 lines
find core -path "*/FunctionalJavascript*" -name "*.php" \
  | xargs grep -A3 "pressButton('Save\|pressButton('Submit\|pressButton('Delete\|pressButton('Confirm" \
  | grep -c "waitFor"
47

172 pressButton() calls, 47 followed by a waitFor. This is a rough count, but it suggests the problem is not isolated.

A potential solution is to add a reusable method to WebDriverTestBase that:

  1. Detects when the old page has been unloaded (e.g. via a JavaScript sentinel variable).
  2. Waits for the new page to finish loading (document.readyState === 'complete').
  3. Waits for AJAX to complete (drupalActiveXhrCount === 0).
  4. Provides clear failure messages for each phase.

An initial implementation was prototyped in #3576262 (branch 3576262-random-test-failure-MediaSourceFileTest) by @donquixote.

  1. Agree on API design and naming
  2. Migrate (some) existing pressButton() calls to validate this is actually an improvement

Issue fork drupal-3582376

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

dries created an issue. See original summary.

donquixote’s picture

Thanks @dries!

dww’s picture

Yes, this would be very helpful for reducing FunctionalJavascript random fails. Although we don't normally use the priority for features, I'm going to bump this to at least 'Major' due to the potential to use this to solve more of #2829040: [meta] Known intermittent, random, and environment-specific test failures (which we generally consider critical bugs).

We typically use the "phpunit" component for stuff specific to the test world, while "javascript" would be for Drupal's front-end JS code.

I wonder if it's possible / desirable to fix the behavior of pressButton() itself so that all the tests would immediately be protected from the race condition, instead of requiring effort to convert them. But the problem would be tests that press buttons that trigger AJAX/HTMX behaviors and don't actually do a full page load. 😢 So yeah, probably we need a way to opt-in to this behavior, likely with a separate function.

I remember seeing a bunch of work already done for this, probably while looking at the MR for #3576262: [random test failure] MediaSourceFileTest::testMediaFileSource. Would be nice to bring that over here for consideration. That'd likely help with the API design / naming task, so we can be clear on what method(s) need better name(s). 😉

pressButtonAndWaitForPageLoad() was mentioned at the other issue. That's pretty verbose, but also very self-documenting. I love self-documenting, and don't mind verbose. So maybe that's a good place to start. Perhaps someone will come up with something more elegant.

Thanks!
-Derek

donquixote’s picture

I just recreated the MR from the other issue here.
I am going to work on some tests to see if all the 3 wait conditions are necessary.

I love self-documenting, and don't mind verbose.

Yeah Captain Obvious gets too much hate!

Currently we have nice and short methods like ->statusMessageContains(), but nobody could guess from just the name that this method would wait for the status message to appear.
This can lead to magical thinking where if one method is so nice to wait, then maybe other methods will do the same. But then they don't.

dww’s picture

#bikeshed: I'm not sure "Load" is needed. pressButtonAndWaitForPage() is 4 chars shorter and IMHO, still expresses what the function would do...

This can lead to magical thinking where if one method is so nice to wait, then maybe other methods will do the same. But then they don't.

This is definitely a problem. 😅 I'm fairly familiar with the guts of our tests, and I still can't always remember what test-related methods have which side effects. 😂 Lots about our test framework invites "magical thinking" -- there's a lot of magic in there! So yeah, I'm all in favor of adding new methods that are more self-documenting to help clarify what's what.

That said, something like waitForStatusMessageAndSeeIfItContains() would be pretty awful. 😉

Maybe the convention could be {thing}{verb}WithWait()? So statusMessageContainsWithWait()?

The pressButton...() case is a little funny, b/c we're not first waiting for the button so we can press it. We're talking about pressing a button (which presumably we're already sure is on the page), and then waiting for the new page to load. Can't think of any other examples where that'd be the automatic waiting behavior. So I think it's fine to have a different naming scheme. Normally it's "wait for thing, then see if it is/contains/whatever"...

donquixote’s picture

Some notes on the proposed solution:

Test method vs assert session method

The current proposal adds the new methods on the test class.
Instead we could add them on JSWebAssert, where we already have wait methods like waitForText() etc.

Old page sentinel value, and operation wrapping

The proposed solution wraps the operation that is expected to cause the page transition.
This allows it to to create the sentinel value in the old page, to later check if that has gone away.

    // (Renamed the method to make it clear...)
    $this->runOperationAndWaitForPageTransition(
      fn () => $this->getSession()->getPage()->pressButton($label),
    );

This is different from existing waitFor() methods in JSWebAssert, these are always called _after_ an operation.

$this->getSession()->getPage()->pressButton($label);
// Method 
$this->waitForElementOnNewPage(...);

That existing pattern won't work if we want to set the sentinel value before the operation.

If we don't like the wrapper, we could do something like this:

$wait_for_new_page = $this->assertSession()->prepareWaitForPageTransition();
$this->getSession()->getPage()->pressButton($label);
$wait_for_new_page();

Or instead we could store that state directly in the session object.

$this->assertSession()->markCurrentPage();
$this->getSession()->getPage()->pressButton($label);
$this->assertSession()->waitForPageTransition();

But it would be too easy to miss that first step, and then we can have flimsy tests that think they are on the next page when they are not.

We could avoid that sentinel step if we could reliably inject a script on every page load that would detect user interaction.
(Generally, we know we are are still on the old page if the user has interacted with it, while on the new page that will not be the case.)
We could add that via Drupal js, but that seems like messing with the system under test.

I also tried navigator.userActivation.hasBeenActive, but that does not help. It is still true on the new page, because it remembers the state. (or that's how it seems to work at least in these tests)

donquixote’s picture

This is definitely a problem. 😅 I'm fairly familiar with the guts of our tests, and I still can't always remember what test-related methods have which side effects. 😂 Lots about our test framework invites "magical thinking" -- there's a lot of magic in there!

If we really want to prevent magical thinking, we could make tests fail if people don't explicitly wait :)

donquixote’s picture

Now thinking about tests.

Ideally I want to "prove" that each wait step is sufficient and necessary.
To do this, I would create a DelayedSubmitForm with different delay types based on a query parameter.

Now my concern is that these tests would be naturally slow, because some operations in this test need to wait out the artificial wait time, and that artificial wait time needs to be longer than the expected range of natural delays in a test - which could be 1 second or maybe even 5 seconds.

EDIT:
One idea I had was to start with a short artificial wait time and then "pump it up" incrementally until the test behaves as expected. But I am not even sure this would solve the problem.