Starting to see a new random fail in Drupal\Tests\Core\Command\QuickStartTest
https://dispatcher.drupalci.org/job/drupal_patches/57302/testReport/juni...
https://dispatcher.drupalci.org/job/drupal_patches/57504/testReport/juni...
https://dispatcher.drupalci.org/job/drupal_patches/58062/testReport/juni...
https://dispatcher.drupalci.org/job/drupal_patches/58466/testReport/juni...
https://dispatcher.drupalci.org/job/drupal_patches/58522/testReport/juni...
https://dispatcher.drupalci.org/job/drupal_patches/59593/testReport/juni...
It seems as if the test is trying to access the local server before its actually up?
Its not super frequent, but often enough that I've gotten a couple of reports asking if this was the patch or drupalCI causing the fail.
Additional info
This is what the terminal output looks like when it is completely up:
Drupal development server started: <http://127.0.0.1:8888> This server is not meant for production use. One time login url: <http://127.0.0.1:8888/user/reset/1/1636322519/UGLY_HASH/login> Press Ctrl-C to quit the Drupal development server.
Actual code is waiting for a text which matches this regexp pattern: 127.0.0.1:(\d+)
This kind of text appears twice in the output:
1. LINE: Drupal development server started
2. LINE: One time login url
(Have you noticed that the dots are not escaped in the IP address? It should be like this: 127\.0\.0\.1:(\d+)
)
When it breaks out from the while loop the "One time login url" is not there yet. (at least no all the time)
But the code tries to find it. Like this:
$this->assertStringContainsString("127.0.0.1:$port/user/reset/1/", $server_process->getOutput());
That is why it randomly fails. Very likely it depends on how busy the server is.
Comment | File | Size | Author |
---|---|---|---|
#25 | drupal-2975644-0903-25-quick-start-server-wait.patch | 1.38 KB | Sweetchuck |
#21 | 2975644-21.patch | 743 bytes | neclimdul |
#19 | 2975644_19.patch | 8.42 KB | Mile23 |
| |||
#18 | 2975644_18.patch | 1.16 KB | Mile23 |
#17 | 2975644_17.patch | 1.2 KB | Mile23 |
|
Comments
Comment #2
MixologicComment #3
Anonymous (not verified) CreditAttribution: Anonymous commentedMaybe instead of:
Will be better send guzzle request until code === 500. Though I'm not sure. Let's see if we can reproduce it on 2000 runs.
Comment #4
Anonymous (not verified) CreditAttribution: Anonymous commentedIt seems there is some leak of resources:
I'm trying to cancel testing, but it starts anew: https://dispatcher.drupalci.org/job/drupal_patches/59628/console
Edit:
Re-testing also finished with FAILURE.
Comment #5
Anonymous (not verified) CreditAttribution: Anonymous commentedAlarm! It is necessary to somehow stop #3.
Also 300 runs enough for an fails. Let's also check APCu with TTL=1 (favorite among resource thieves).
Comment #6
Mixologic2000 tests all starting a local php webserver would probably hit some sort of failure to clean up issue in php.
But in any case theres plenty of evidence in the console logs indicative of this test Failing for sure, so that definitely succeed in showing that the failure condition is reproducable.
Comment #7
Anonymous (not verified) CreditAttribution: Anonymous commentedWith TTL = 1 all work as always!) Now the patch, which should clearly show that without changing TTL, 300 tests fail (near the end).
Comment #8
Anonymous (not verified) CreditAttribution: Anonymous commented#7 fails, but this does not seem to be the cases from the IS. Let's check without wait server ready.
Comment #9
Anonymous (not verified) CreditAttribution: Anonymous commentedComment #10
borisson_I don't understand why the usleep is different to the normal sleep? The guzzle request (#3) sounds like a possible solution for this problem.
Comment #11
Mixologicusleep(200000) is only .2 seconds
Not sure why shortening the sleep time helps.
Comment #12
Anonymous (not verified) CreditAttribution: Anonymous commentedBecause it does not help :)
#3 - #9 I just trying to reproduce the random fails from IS cases. And I did not succeed :(
#3, #7, #8 falling due to other reasons (or the relationship is not obvious).
I agree that the wait via guzzle request will be better than magical
sleep(2)
. But not sure that it helps to fix the IS random fails.It may be even more reliable to make a more global wait, like in this patch?
Also, the QuickStartTest has stable fail with #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock. This patch does not help with it too :(
Comment #14
Anonymous (not verified) CreditAttribution: Anonymous commented#13: My bad with check defined constant. But why CI aborted, if
$process->setTimeout(500);
?Comment #17
Mile23All the links to the spurious fails are now 404, so we can't see them. I'm curious if this is still an issue.
Comment #18
Mile23Ugh. It just so happens this is the only unit test in the Command group, so...
Edit:
Same deal as above. There was a panic and a bunch of failed tests, and then the test build started over again. I cancelled it because it would probably do the same thing again and again.
In this case, we're running the maximum number of concurrent run-tests processes. Each process is per test class. Each test class runs a process per test function. Each test function forks at least one process.
So it's no surprise that we run out of processes. This is more of a stress test on DrupalCI than a test of quick start.
I didn't see any fails until after the panic, though, so I'm not sure what's the deal here.
I was looking at this because QuickStartTest has an isolation issue. It says require_once for core/includes/boostrap.inc, but it does this in setUp(), which means it's outside the isolation of the separate process. This causes fails in other tests when you run them under PHPUnit.
Comment #19
Mile23Some cleanup on the test, reduced to concurrency: 16.
Edit:
Same thing:
Comment #21
neclimdulI'm not sure about all the java errors but I think the problem might be the reverse of what might be expected and shortening the wait loop doesn't help. In fact, it might make the race condition built into the test worse.
This is the output of starting the standalone server:
and this is the code I found to fail randomly.
The problem is that the "wait till server is up" logic only works if the timing is _just_ right or the server is up because it drops out matching the first message coming out of the script. But then the test immediately asserts the second message. There is no guarantee the other process has made it far enough to output the reset link though.
The fix should be pretty straight forward though, wait till we get the final boot message out before continuing with the assertions.
Side note, I found this happened a lot when running tests with the phpunit. I assume something about the way they run gives back the process to phpunit faster triggering the error more often. Possibly why some of the other things mentioned triggered the error more often as well.
Comment #24
SweetchuckComment #25
SweetchuckComment #26
SweetchuckComment #28
neclimdulupdated version with some other fixes in #3265291: QuickStartTest: The waiting is the hardest part...
Comment #29
xjmComment #30
Sweetchuck@xjm Why did you open another issue for the same problem?
In my experience it happens very often when there is an open issue for years (in this case ~4 years) an somebody opens and new issue and closes the old one as duplicated.