Problem/Motivation

There seem to be numerous random fails in ImageStylesPathAndUrlTest

testImageStylePrivateWithConversion()

https://git.drupalcode.org/issue/drupal-3483501/-/jobs/3348830

There was 1 failure:
1)
Drupal\Tests\image\Functional\ImageStylesPathAndUrlTest::testImageStylePrivateWithConversion
Behat\Mink\Exception\ExpectationException: Current response status code is
403, but 200 expected.
/builds/issue/drupal-3483501/vendor/behat/mink/src/WebAssert.php:888
/builds/issue/drupal-3483501/vendor/behat/mink/src/WebAssert.php:145
/builds/issue/drupal-3483501/core/modules/image/tests/src/Functional/ImageStylesPathAndUrlTest.php:315
/builds/issue/drupal-3483501/core/modules/image/tests/src/Functional/ImageStylesPathAndUrlTest.php:136
FAILURES!
Tests: 10, Assertions: 240, Failures: 1.

testImageStyleUrlAndPathPrivate()

https://git.drupalcode.org/issue/drupal-2630732/-/jobs/3305640

There was 1 failure:
1)
Drupal\Tests\image\Functional\ImageStylesPathAndUrlTest::testImageStyleUrlAndPathPrivate
Behat\Mink\Exception\ExpectationException: Current response status code is
403, but 200 expected.
/builds/issue/drupal-2630732/vendor/behat/mink/src/WebAssert.php:888
/builds/issue/drupal-2630732/vendor/behat/mink/src/WebAssert.php:145
/builds/issue/drupal-2630732/core/modules/image/tests/src/Functional/ImageStylesPathAndUrlTest.php:315
/builds/issue/drupal-2630732/core/modules/image/tests/src/Functional/ImageStylesPathAndUrlTest.php:85
FAILURES!
Tests: 10, Assertions: 240, Failures: 1.

...

Steps to reproduce

Proposed resolution

The root cause of this issue is discussed in #3496257: Race conditions in CacheCollector/State (again)

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Issue fork drupal-3487371

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

dww created an issue. See original summary.

donquixote’s picture

This use of randomString() looks suspicious.

    $this->drupalGet(\Drupal::service('file_url_generator')->generateAbsoluteString($directory . '/' . $this->randomString()));

EDIT: But does not explain why the test would fail further up.

spokje’s picture

Assigned: Unassigned » spokje

spokje changed the visibility of the branch 3487371-debug-failing-test to hidden.

spokje changed the visibility of the branch 3487371-random-test-failure to hidden.

spokje’s picture

MR !10701 (which is Drupal\Tests\image\Functional\ImageStylesPathAndUrlTest and some pipeline changes to make it, and only it, run 1750 times) gives us 13 fails out of the 1750 runs.

These 13 fails breaks down to:
2x testImageStyleUrlAndPathPrivateUnclean()
10x testImageStyleUrlAndPathPrivateLanguage()
1x testImageStyleUrlAndPathPrivate()

MR !10702 shows that changing \Drupal::state()->set()/get() to Drupal::keyValue()->set()/get(); as suggested in the Slack thread passes a 1750 times run without errors.
The only reason this MR shows as failed is the fact that the artifacts are too big to upload:

./sites/simpletest/browser_output: found 182292 matching artifact files and directories 
ERROR: Uploading artifacts as "archive" to coordinator... 413 Request Entity Too Large  id=3825475 responseStatus=413 Request Entity Too Large status=413 token=glcbt-64
FATAL: too large     

spokje’s picture

Status: Active » Needs review

MR!10703 contains the changes that should be committed after review.

spokje’s picture

Assigned: spokje » Unassigned
smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Based on the findings in #9, and I re-ran 10703 a few times and never got these random failures, when before I probably least got 1 each time.

dww’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Bug Smash Initiative

Fantastic work! Great find.

Agree that the changes to ImageStylesPathAndUrlTest look good. That much is indeed RTBC.

However, grep is finding more \Drupal::state() in core/modules/image/tests/src/* and I'm wondering if we want to expand the scope here to fix ImageStyleFlushTest and ImageEffectsTest while we're at it. I don't remember seeing those randomly fail as often, but it seems like we shouldn't just do this conversion when we happen to be nailed by the trouble. I'm in favor of preemptively fixing all the tests to stop using state(). So we should probably fix more than 1 test class per issue.

spokje’s picture

Strongly disagree with bundling more issues together, but feel free to add-n-fix whatever you like.

Problem here IMHO is where the scope-creep ends, also more tests to test 2500x, or at least as-many-times-fit-in-an-hour makes things very slow, distinct tests run lower and fail rate/ success prove drop.

Doubt that it's OK to just create an MR that changes state to keyvalue, without the above test-runs, since the mostly (very) low failure rate will probably mean a single run will pass the tests.

To me that would prove absolutely nothing about the need for the fix, but n=1, YMMV etc, etc :)

Let the powers-that-be decide on this one.
In the mean time I will continue to create single test fix issues, and surely not for the easy core credits, which I really don't need in the first place.

nicxvan’s picture

In general I would agree with @dww's comment to expand, but here I think @spokje's reasoning is sound. These types of test failures require really long running multiple tests so it makes sense to fix one at a time.

Further there is a separate issue to address the race condition and anecdotally I've never seen this other tests fail.

dww’s picture

Status: Needs review » Reviewed & tested by the community

I was thinking that if release managers are saying “we shouldn’t use state in tests, only keyvalue, that we could do bulk.

But yes, strongly appreciate the long time it takes to do even single tests carefully. Wouldn’t dream of accusing you of lots of issues for credit. You’re a prolific contributor!

I was only wondering if we could add more scope without as much effort and rigor if we want to convert entirely. It’s a bit weird having a mix of both, so I’m hoping to minimize the time it takes to convert.

All that said, yes, following the issue where the underlying race condition in State API is hopefully going to be fixed. So maybe it’s moot to do this conversation?

Yeah, I’m torn, I guess we’ll leave it for the core committers to decide. Back to RTBC.

Thanks!
-Derek

donquixote’s picture

I'd say this issue needs an explanation why the problem occurs with state, but not with keyvalue.
From the comments here I understand it is a race condition, and from looking at the code, I assume it is caused by the cache layer in state, because otherwise state is just a wrapper around keyvalue.

as suggested in the Slack thread

So, this should be in the issue summary..

I do find this issue from April 2024, which might be related:
#3438424: [random test failures] Race condition in state when individual keys are set with an empty cache

spokje’s picture

@donquixote There's a problem with multiple issues: When you're working on a lot of them, you forget that other people didn't. :)

Here's the issue where the root cause is attacked, hope that helps?
#3496257: Race conditions in CacheCollector/State (again)

catch’s picture

Generally on the scope:

I would essentially commit any quick fix to ImageStylesPathAndUrlTest to stop it random failing including skipping it, because it is so annoying when it fails all the time.

Fixing it instead of skipping it is much better.

I do think we should switch all test usage from state to key/value because it massively simplifies things, however it's currently useful that we haven't done that because it's the only way I know to reproduce #3496257: Race conditions in CacheCollector/State (again). So would be quite happy to commit individual fixes for the frequently random tests to make the pipelines more reliable, then handle other issues which aren't randomly failing (as often) together in one issue later.

  • catch committed 7031fb42 on 11.x
    Issue #3487371 by spokje, dww: [random test failure]...
catch’s picture

Version: 11.x-dev » 11.1.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed/pushed to 11.x, thanks!

This doesn't apply to any previous branches including 11.1.x. I think it would be worth backporting to 10.5, 11.1, and 10.4 (11.0 and 10.3 if it's an easy cherry pick) so that we get less random failures on branch runs. Probably a backport to one branch would cherry-pick to all the others.

spokje changed the visibility of the branch 3487371-2500x-ImageStylesPathAndUrlTest-as-is to hidden.

spokje changed the visibility of the branch 11.x to hidden.

spokje’s picture

Status: Patch (to be ported) » Needs review

Not sure what either I or this issue has done to deserve waiingt 2:42 minutes for a full spellcheck on all files, but here we are...

  • catch committed 69a509f1 on 10.3.x
    Issue #3487371 by spokje, dww: [random test failure]...
catch’s picture

Version: 11.1.x-dev » 10.3.x-dev
Status: Needs review » Fixed

Cherry-picked back through to 10.3.x, thanks!

  • catch committed f8d2454b on 10.4.x
    Issue #3487371 by spokje, dww: [random test failure]...

  • catch committed 87ab7b3c on 10.5.x
    Issue #3487371 by spokje, dww: [random test failure]...

  • catch committed d5f2b47e on 11.0.x
    Issue #3487371 by spokje, dww: [random test failure]...

  • catch committed d5b6fbbd on 11.1.x
    Issue #3487371 by spokje, dww: [random test failure]...

quietone’s picture

Issue summary: View changes

Updated the proposed resolution with the issue where the root cause was discussed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.