
Problem/Motivation
Package Update (Drupal\Tests\package_manager\Build\PackageUpdate)
✘ Package update
┐
├ Error response:
├
├ Headers: array (
├ 'Host' =>
├ array (
├ 0 => 'localhost:8009',
├ ),
├ 'Date' =>
├ array (
├ 0 => 'Sun, 05 Jan 2025 06:19:54 GMT',
├ ),
├ 'Connection' =>
├ array (
├ 0 => 'close',
├ ),
├ 'X-Powered-By' =>
├ array (
├ 0 => 'PHP/8.3.15',
├ ),
├ 'Content-type' =>
├ array (
├ 0 => 'text/html; charset=UTF-8',
├ ),
├ )
├
├ Server error log: [Sun Jan 5 06:18:24 2025] PHP 8.3.15 Development Server (http://127.0.0.1:8009) started
├ [Sun Jan 5 06:18:24 2025] 127.0.0.1:49894 Accepted
├ [Sun Jan 5 06:18:24 2025] 127.0.0.1:49894 Closed without sending a request; it was probably just an unused speculative preconnection
├ [Sun Jan 5 06:18:24 2025] 127.0.0.1:49894 Closing
├ [Sun Jan 5 06:18:24 2025] 127.0.0.1:49906 Accepted
├ [Sun Jan 5 06:18:30 2025] 127.0.0.1:49906 Closing
├ [Sun Jan 5 06:18:30 2025] 127.0.0.1:41244 Accepted
├ [Sun Jan 5 06:18:30 2025] 127.0.0.1:41244 Closing
├ [Sun Jan 5 06:18:30 2025] 127.0.0.1:41246 Accepted
├ [Sun Jan 5 06:18:33 2025] 127.0.0.1:41246 Closing
├ [Sun Jan 5 06:18:33 2025] 127.0.0.1:41250 Accepted
├ [Sun Jan 5 06:18:36 2025] 127.0.0.1:41250 Closing
├ [Sun Jan 5 06:18:52 2025] 127.0.0.1:48778 Accepted
├ [Sun Jan 5 06:18:53 2025] 127.0.0.1:48778 Closing
├ [Sun Jan 5 06:18:53 2025] 127.0.0.1:48784 Accepted
├ [Sun Jan 5 06:18:53 2025] 127.0.0.1:48784 Closing
├ [Sun Jan 5 06:18:53 2025] 127.0.0.1:48796 Accepted
├ [Sun Jan 5 06:18:54 2025] 127.0.0.1:48796 Closing
├ [Sun Jan 5 06:18:54 2025] 127.0.0.1:48802 Accepted
├ [Sun Jan 5 06:18:56 2025] 127.0.0.1:48802 Closing
├ [Sun Jan 5 06:18:56 2025] 127.0.0.1:48804 Accepted
├ [Sun Jan 5 06:18:56 2025] 127.0.0.1:48804 Closing
├ [Sun Jan 5 06:18:58 2025] 127.0.0.1:48816 Accepted
├ [Sun Jan 5 06:18:59 2025] 127.0.0.1:48816 Closing
├ [Sun Jan 5 06:19:06 2025] 127.0.0.1:33958 Accepted
├ [Sun Jan 5 06:19:10 2025] 127.0.0.1:33958 Closing
├ [Sun Jan 5 06:19:10 2025] 127.0.0.1:39418 Accepted
├ [Sun Jan 5 06:19:13 2025] 127.0.0.1:39418 Closing
├
├ Failed asserting that 500 is identical to 200.
│
│ /builds/project/drupal/core/modules/package_manager/tests/src/Build/TemplateProjectTestBase.php:721
│ /builds/project/drupal/core/modules/package_manager/tests/src/Build/PackageUpdateTest.php:47
┴
FAILURES!
Tests: 1, Assertions: 28, Failures: 1.
Steps to reproduce
Failing jobs:
https://git.drupalcode.org/project/drupal/-/jobs/3897101#L82
https://git.drupalcode.org/issue/drupal-3497701/-/jobs/5081085/viewer
https://git.drupalcode.org/issue/drupal-3497701/-/jobs/4832276
Most recent passing job:
https://git.drupalcode.org/issue/drupal-3497701/-/jobs/5086167
Proposed resolution
There are a few different failures to fix:
- Fix timeout in php-tuf/composer-stager (#414)
- #3521438: Bump php-tuf/composer-stager to 2.0.1
- Increase timeout in BuildTestBase::executeCommand()
- Assert response text in makePackageManagerTestApiRequest, since OOM and timeout can still return a 200
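The last item above can be sketched in miniature. This is illustrative only, with a hypothetical helper name and 'Finish' standing in for whatever marker string the test controller prints last; the point is that a fatal error or timeout striking after output has begun can still reach the client as a 200, so checking for the final marker is a stronger completion signal than the status code alone.

```php
<?php
// Sketch (hypothetical helper): a truncated response can still carry a
// 200 status, so require the controller's final marker string as well.
function responseLooksComplete(int $status, string $body): bool {
    // 'Finish' stands in for whatever the controller prints as its very
    // last action; an OOM or timed-out request will be missing it.
    return $status === 200 && $body === 'Finish';
}

var_dump(responseLooksComplete(200, 'Finish')); // bool(true): completed
var_dump(responseLooksComplete(200, ''));       // bool(false): truncated 200
```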
Remaining tasks
User interface changes
Introduced terminology
API changes
Data model changes
Release notes snippet
Issue fork drupal-3497701
Comments
Comment #3
spokje: So this is a test that doesn't scale well when run multiple times.
With a concurrency of 5 I just managed to squeeze out a 100x run, which is far too low to hit the random failure.
Unsure how to proceed here.
Comment #4
acbramley CreditAttribution: acbramley at PreviousNext commented: It seems like this affects more than just PackageUpdateTest; all package_manager tests failed here: https://git.drupalcode.org/issue/drupal-3513875/-/jobs/4770092
The PHPUnit Build and Nightwatch jobs are now the worst offenders for random failures. I remember @catch mentioning something in Slack about the package_manager tests, but I can't find it.
Comment #5
catch
Comment #6
mstrelan CreditAttribution: mstrelan at PreviousNext commented: Running this test locally always fails for me, but it fails asserting the expected versions. That means makePackageManagerTestApiRequest is passing.
I found it odd that the "Error response:" in the output is empty. I thought maybe an error was thrown and system.logging error_logging was set to hide instead of verbose, but that's not the case. I explicitly put some errors in \Drupal\package_manager_test_api\ApiController::run, but I could always see the error response in the output.
I then thought maybe there is an OOM or timeout. I didn't have much luck down the OOM route, but I did find that if I add this line to makePackageManagerTestApiRequest, it would fail, even after asserting the 200 status code:
$this->assertSame('Finish', $session->getPage()->getContent());
I'm not sure exactly how that helps, but hoping it can spark some ideas.
Comment #7
mstrelan CreditAttribution: mstrelan at PreviousNext commented: FWIW I can replicate the exact error by adding this to the start of \Drupal\package_manager_test_api\ApiController::run
Comment #8
catchFatal errors can often be 200s, I think we have an issue somewhere.
The timeout is not good, probably reflects something that could happen during actual operation.
Do we know where the 20 seconds max execution time is coming from? Bit surprised it's not 30 seconds. We could increase that to see if it helps the random failures and open a separate issue for the timeout.
Comment #9
mstrelan CreditAttribution: mstrelan at PreviousNext commented: AFK now, but there is a constant in one of the test base classes that's setting it to 20 seconds. I'm not sure if there is something else about my local that's causing it to time out, though.
Comment #10
catch: This test fails for me locally with:
which looks unrelated to the random test failure, but it also means I can't get a green test run to then try to make it fail the same way from there. I also tried reducing TemplateProjectTestBase::MAX_EXECUTION_TIME to 1, and that doesn't change anything for me.
Comment #11
mstrelan CreditAttribution: mstrelan at PreviousNext commented: That's the same experience I had. Most likely it is timing out after 1 second but still returning a 200. If you assert the response content, it will probably fail and show the timeout error.
Side note: I wonder if we can get htmlOutput to work here.
Comment #13
catch: Ahh, thanks, this is very confusing.
I changed the max execution time to 30 and it passed locally; pushed an MR for this.
I think we need two spin-off issues:
1. We should try to optimise the test and/or package_manager so we don't need a 30 second timeout.
2. We should try to make sure that fatal errors result in a 500 instead of 200 so we don't get false negatives on 200 assertions.
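Spin-off issue 2 can be sketched as follows. This is a hedged illustration, not Drupal's actual error handler: the function name is hypothetical, and the `headers_sent()` guard shows exactly the trap being discussed, because once output has been flushed the 200 status line is already committed and cannot be rewritten.

```php
<?php
// Sketch (illustrative names): map a PHP fatal error to a 500 at shutdown,
// where the response status can still be changed.
function statusForLastError(?array $error): int {
    // Only the hard-fatal error types warrant a 500 here.
    $fatalTypes = [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR];
    return ($error !== null && in_array($error['type'], $fatalTypes, true))
        ? 500
        : 200;
}

register_shutdown_function(function (): void {
    // If output was already flushed, the 200 status line is committed and
    // cannot be rewritten -- the false-negative trap described above.
    if (!headers_sent()) {
        http_response_code(statusForLastError(error_get_last()));
    }
});

var_dump(statusForLastError(['type' => E_ERROR])); // int(500)
var_dump(statusForLastError(null));                // int(200)
```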
Comment #14
catch: https://git.drupalcode.org/project/drupal/-/jobs/4824652 is the repeat test class job running with that change. It's varying between 400 and 700 seconds to complete, so very slow indeed, but so far each iteration is passing.
Comment #15
mstrelan CreditAttribution: mstrelan at PreviousNext commented: I must admit I haven't tried to grok everything this test is trying to achieve, but it feels like PackageUpdateTest::testPackageUpdate could use a mocked response from the controller, and the controller could be tested independently. But maybe that's losing the end-to-end coverage that we're trying to test for.
Another approach would be to return a success string from the controller and assert that we get that in the response. That would catch the OOM and PHP timeout issues we were seeing locally.
That job passed 96 times and failed 4 times. This time, instead of there being no error response, we have an error to look at, which I've included below (I've replaced HTML-encoded entities for readability). The TL;DR is that the rsync command is sometimes timing out after 2 minutes. This timeout seems to come from \PhpTuf\ComposerStager\API\Process\Service\ProcessInterface::DEFAULT_TIMEOUT, although it looks like \Drupal\package_manager\StageBase::create should be passing its own default, which is 300, so I'm not sure what's going on there.
Comment #16
mstrelan CreditAttribution: mstrelan at PreviousNext commented: Pushed an update to assert the response content, and restored the previous execution time to see if that catches anything.
The 120-second timeout is still curious. I can't get Xdebug working inside the PHP local server, so it's hard to step through. It looks like 300 would be passed all the way down to \PhpTuf\ComposerStager\Internal\FileSyncer\Service\FileSyncer::sync, which calls $this->environment->setTimeLimit($timeout);, but I'm not sure that actually affects anything. For example, in \PhpTuf\ComposerStager\Internal\FileSyncer\Service\FileSyncer::runCommand the timeout is not passed to $this->rsync->run, so it looks like it would just fall back to 120.
Edit: I raised "FileSyncer::sync doesn't pass timeout through to AbstractProcessRunner::run" against php-tuf/composer-stager.
Comment #17
mstrelan CreditAttribution: mstrelan at PreviousNext commented: The plot thickens. It looks like Package Manager has a bunch of special handling around Composer patches. I've managed to run a test locally that patches php-tuf/composer-stager to pass the timeout through to the rsync command. I tested it by setting the timeout to 1 second in ApiController::createAndApplyStage, and it failed after 1 second instead of 120.
I have a repeat job running with the timeout set to 180: https://git.drupalcode.org/issue/drupal-3497701/-/jobs/4832276. If that passes (maybe we run it a few times), then we can probably postpone this on https://github.com/php-tuf/composer-stager/pull/414
Comment #18
mstrelan CreditAttribution: mstrelan at PreviousNext commented: Now we have yet another timeout to investigate: installing the Standard profile times out after 300 seconds:
Comment #19
mstrelan CreditAttribution: mstrelan at PreviousNext commented: The 50x run has passed 3 times in a row now:
https://git.drupalcode.org/issue/drupal-3497701/-/jobs/4854823
https://git.drupalcode.org/issue/drupal-3497701/-/jobs/4856362
https://git.drupalcode.org/issue/drupal-3497701/-/jobs/4864839
Postponing on https://github.com/php-tuf/composer-stager/pull/414 but we could also focus on speeding up the test instead of increasing the timeout.
Comment #20
mstrelan CreditAttribution: mstrelan at PreviousNext commented: Upstream PR is merged; we just need a release, then we can bump php-tuf/composer-stager. Or maybe we postpone again on #3519246: Update Composer dependencies for 11.2.0.
Comment #21
mstrelan CreditAttribution: mstrelan at PreviousNext commented: Upstream issue is fixed in composer-stager 2.0.1; opened #3521438: Bump php-tuf/composer-stager to 2.0.1.
Comment #23
catch: Committed the update in the other issue.
Comment #24
mstrelan CreditAttribution: mstrelan at PreviousNext commented
Comment #26
quietone CreditAttribution: quietone at PreviousNext commented: Shouldn't this be in the package_manager component?
Comment #27
godotislate: I couldn't reproduce a failure on HEAD after repeating ~50 times (this could be because of the upstream fix), but the changes in the MR make sense and the 50x run passed, so LGTM.
Comment #28
quietone CreditAttribution: quietone at PreviousNext commented: I started tests for the other environments this fails for in the daily scheduled test runs.
[Daily] PHP 8.4 MySQL 8.4
[Daily] PHP 8.3 PostgreSQL 16
[Daily] PHP 8.3 SQLite 3.45
Comment #29
quietone CreditAttribution: quietone at PreviousNext commented: There are other failures on those test runs, but none for a Build test.
Comment #32
catchCommitted/pushed to 11.x and cherry-picked to 11.2.x, thanks!
Comment #35
acbramley CreditAttribution: acbramley at PreviousNext commented: This is still failing insanely often. I would say 80% of my MR pushes yesterday (and I did a lot) had a PHPUnit Build failure with a very similar message, e.g. https://git.drupalcode.org/issue/drupal-3346394/-/jobs/6063361
Should we reopen this or open a new issue?
Comment #36
mstrelan CreditAttribution: mstrelan at PreviousNext commented: I've reopened #3508109: [random test failure] Package manager random build failures, which was originally closed as a duplicate; let's continue there.