Problem/Motivation

9.2-dev HEAD test failed today in CKEditorIntegrationTest in the media_library module:
https://www.drupal.org/pift-ci-job/1942623

In case that goes away, the error says:

Media_library.Drupal\Tests\media_library\FunctionalJavascript\CKEditorIntegrationTest
	✗	
Drupal\Tests\media_library\FunctionalJavascript\CKEditorIntegrationTest

fail: [Other] Line 0 of sites/default/files/simpletest/phpunit-100.xml:
PHPUnit Test failed to complete; Error: PHPUnit 8.5.13 by Sebastian Bergmann and contributors.

Testing Drupal\Tests\media_library\FunctionalJavascript\CKEditorIntegrationTest
.F.                                                                 3 / 3 (100%)

Time: 1.04 minutes, Memory: 4.00 MB

There was 1 failure:

1) Drupal\Tests\media_library\FunctionalJavascript\CKEditorIntegrationTest::testButton
Failed asserting that a NULL is not empty.

/var/www/html/vendor/phpunit/phpunit/src/Framework/Constraint/Constraint.php:118
/var/www/html/vendor/phpunit/phpunit/src/Framework/Constraint/LogicalNot.php:111
/var/www/html/vendor/phpunit/phpunit/src/Framework/Assert.php:2887
/var/www/html/vendor/phpunit/phpunit/src/Framework/Assert.php:863
/var/www/html/core/modules/media_library/tests/src/FunctionalJavascript/CKEditorIntegrationTest.php:261
/var/www/html/vendor/phpunit/phpunit/src/Framework/TestResult.php:691

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-3192260

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jhodgdon created an issue. See original summary.

dww’s picture

catch’s picture

Title: Random fail in media_library CKEditorIntegrationTest » [random test failure] Random fail in media_library CKEditorIntegrationTest

danflanagan8 made their first commit to this issue’s fork.

danflanagan8’s picture

Adding a (likely) related issue.

danflanagan8’s picture

Status: Active » Needs review
danflanagan8’s picture

Let's start over kind of...

Here's a patch that will simply runs the test that is randomly failing 100 times. Let's see if it fails and then go from there. I've never personally done this before. Fingers crossed.

I was referred by @longwave to #3055648: Frequent random fail in \Drupal\Tests\media_library\FunctionalJavascript\MediaLibraryTest.

danflanagan8’s picture

FileSize
1.92 KB

Oops. Missing some quotes. Trying again.

longwave’s picture

Unfortunately the Jenkins log shows it did run the test 100 times but it succeeded each time - either 100 times is not enough or there is some other trigger for this bug :(

danflanagan8’s picture

And I ran it a second 100 times! I'm going to pump this up. The related issue I added tested 400 times. I'm going to give that a shot.

danflanagan8’s picture

Status: Needs review » Needs work

Well that's 600 consecutive passes of the randomly failing test. This may be a tough one to trigger and then convincingly consider fixed.

If it is indeed randomly failing, it seems like it simply must be due to the relatively short wait being used for certain ajax. That's what the MR changes, for what it's worth.

longwave’s picture

Status: Needs work » Reviewed & tested by the community

Theory: due to parallelism in the test runner, the random fail only happens when certain other heavyweight tests are running at the same time that cause enough CPU load for this particular test to slow down and not react in time. But because this is effectively random due to the the various combinations of JavaScript tests that *could* be running at once, triggering this on demand is going to be difficult.

I don't see how increasing the timeout can hurt, it doesn't mean the test will take longer, it just means that it will consider waiting longer if the element is not there yet - and the existing values seem to be somewhat arbitrary. Therefore I'm going to mark this RTBC even though we couldn't prove that it fixes the issue.

longwave’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.94 KB

In fact let's try bumping concurrency to 30 and repeat to 100 and see what happens - unsure if it will run out of memory doing this though.

dww’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, @danflanagan8 and @longwave for moving this forward!

Drat, #14 didn't trigger this, either.

However, the theory in #13 sounds totally plausible. We haven't made any other real progress tracking this down or conclusively proving where the race condition lies.

The MR indeed only removes the custom timeouts for waitForElementVisible() so they use the default of 10000 milliseconds. Those were added in commit 6280478a1a from #2994699: Create a CKEditor plugin to select and embed a media item from the Media Library when this test was first added to core. Searching that issue for "waitForElementVisible", there are a few hits, but no explanations as to why those shorter timeouts were chosen. Therefore, +1 to @longwave's characterization of them as "arbitrary". Agreed that using the default value won't always cause this test to run longer, just that it'll potentially wait longer if it needs to, to avoid the random fail. Seems like a win to me.

We could potentially try #14 again, but artificially shrink those timeouts to only 10 milliseconds each or something, but I don't think it's worth the trouble. We clearly have race conditions in this test, and letting the test wait longer for the elements it's looking for is a good way to reduce or eliminate those races.

Therefore, I think we should commit this. RTBC.

Thanks again!

  • catch committed e620e75 on 9.2.x
    Issue #3192260 by danflanagan8, longwave, jhodgdon, dww: [random test...

  • catch committed e2d1b73 on 9.1.x
    Issue #3192260 by danflanagan8, longwave, jhodgdon, dww: [random test...

  • catch committed 8186885 on 8.9.x
    Issue #3192260 by danflanagan8, longwave, jhodgdon, dww: [random test...
catch’s picture

Version: 9.2.x-dev » 8.9.x-dev
Status: Reviewed & tested by the community » Fixed

Wondering if this was an increase when it was first committed, then the default was increased later, but see no reason to artificially shorten the timeouts.

Committed/pushed to 9.2.x and cherry-picked to 9.1.x and 8.9.x, thanks!

danflanagan8’s picture

Thanks everyone!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

xjm’s picture