Problem/Motivation
This is part of #3267247: [meta] Fix and re-enable tests skipped for random failures, for LayoutBuilderDisableInteractionsTest::testFormsLinksDisabled().
See the parent issue for steps to reproduce/proposed resolution.
| Comment | File | Size | Author |
|---|---|---|---|
| #33 | 3268680-33.patch | 6.17 KB | phenaproxima |
| #33 | 3268680-33-1000x-PASS.patch | 7.97 KB | phenaproxima |
| #33 | 3268680-33-1000x-FAIL.patch | 2.59 KB | phenaproxima |
| #32 | interdiff-32.txt | 1.43 KB | xjm |
| #32 | lb-3268680-32.patch | 3.94 KB | xjm |
Comments
Comment #2
phenaproximaSelf-assigning.
Comment #3
phenaproximaHere's a patch to run the test 500 times.
Comment #4
phenaproximaHere's one where I actually run the test 500 times.
Comment #5
phenaproximaLet's see if I can get this one to run...
Comment #6
phenaproximaOkay; guess we'll just run the entire class 500 times.
Comment #8
phenaproximaOK, definitely a random failure. Definitely still happening. Always failing on the same assertion.
My first suspicion here is that there is more than one Close button to be clicked. So this patch adds an assertion that there is only one Close button to press. Let's see if it fails.
Comment #9
phenaproximaLet's try generically waiting for an AJAX request before clicking the Close button. My theory is that maybe, in some cases, it becomes visible and is clicked before its JavaScript behavior -- i.e., close the off-canvas area -- has loaded. That sort of race condition could certainly cause the kind of random failures we're seeing here.
Comment #10
phenaproxima@bnjmnm directed me to #3268368-4: Robustify and restore \Drupal\Tests\ckeditor5\FunctionalJavascript\MediaLibraryTest::testButton, which offers a reasonable theory of what might be happening here. Therefore, this patch waits for the off-canvas area to be visible before trying to close it. Let's see if this helps stabilize the test.
Comment #11
xjmNice, thanks @phenaproxima. I think we have a winner.
We should re-run that with the baseline, but #10's results look pretty solid.
Comment #13
xjmAdding credit for Ben per #10.
Comment #14
xjmBaseline patch and requeuing it with the above during the higher random fail window that seems to happen at night.
Comment #15
spokjeSo we have a baseline test failing (#14) and running at the same time, a non-failing fix patch (#10).
Also we have a PDGE(=Pretty Damn Good Explanation) on _why_ the fix patch works in (#10).
I've attached a patch from #10 _without_ the changes in
core/drupalci.yml, so only containing the fix.I'm unsure against which Drupal Core version this patch should be against, according to here
9.3.x-devseems the way to go and then port upwards.Personally I would aim at
10.0.x-devfor downwards porting.Let's just keep the middle ground that is
9.4.x-devand conveniently is already the version of this issue.Seeing that this is a _random_ failure that occurs (seemingly) only when TestBot is really busy, this patch is very likely to NOT-fail.
There's no proof in that to conclude it is fixing the issue.
(At least for me) The proof is in the test runs mentioned above.
Comment #16
spokjeComment #17
phenaproximaOne thing I'd like to add before this is good to go is a comment above the line that fixes the test, so that we know why we're doing that and don't accidentally revert it, which is a big risk since the failure is random.
Comment #18
spokjeSense = made in #17.
Added new patch with comment.
Comment #19
phenaproximaThanks, Spokje; looks good to me. I can't RTBC, having written the patch, but I'll ping some people...
Comment #20
bnjmnmI'm concerned this won't be sufficient. The successful fix in #3268368: Robustify and restore \Drupal\Tests\ckeditor5\FunctionalJavascript\MediaLibraryTest::testButton accomplished this by wating for something more specific than the dialog wrapper ID. For example, if you use
debugger;to pause execution atDrupal.offCanvasafterCreate.afterCreate,#drupal-off-canvasalready exists, but the dialog hasn't been fully resized/repositioned (and interestingly, the close button is not yet rendered).Based on just comparing the markup pre and post resize, the following selector *might* be a reliable way to check for something that appears after resize + draggable completes
#drupal-off-canvas > form[style=""]Comment #21
phenaproximaDiscussed this a little with @bnjmnm on Zoom. We're thinking that what might help here is to create a simple framework to detect when the off-canvas area is all done resizing itself, and have the test detect that. So, I gave that a try here; let's see how it does after 500 runs. I wasn't able to get the JavaScript dependencies installed to compile this from ES6 to plain JavaScript -- some weird npm error -- so for the moment I bypassed that.
Comment #22
phenaproximaWhoops; had a typo which broke that.
Comment #23
xjmFWIW #10 did look pretty sufficient based on the test results at that time, or at least mitigating.
Is the idea that we would use this test module and assertion attribute in more tests that look for things that might be in the off-canvas area?
If so, should we provide an explicit trait and method for it?
We don't need to @see the issue that added the comment; that's what
git blameis for. ;) The inline comment itself (aside from this line) is sufficient for that. :)Meanwhile, I'll queue up some test result sets for the lastest patch and baseline.
Comment #24
xjmNit: should be "regularly occurring".
Comment #25
ravi.shankar commentedAddressed point number 2 of comment #23 and comment #24.
Comment #26
ravi.shankar commentedMissed the interdiff in comment #25, so adding here.
Comment #27
phenaproximaI discussed this with @xjm and we agreed to add a new test trait for testing the off-canvas area, and keep things as they are in the patch.
So, I present:
Comment #29
phenaproximaWell...that didn't freakin' work. Will investigate tomorrow.
Comment #30
xjmMaybe we need to pass the assertion session into the method or something, because it's not defined in the scope inside the method?But no, your patch fixed that already. At least it's a 100% fail so can be debugged locally. Is it because it's creating a new instance of it? Or... no idea.Maybe that makes sense in context? if you run the check locally?
Comment #31
xjmComment #32
xjmThis should fix the JS errors.
Comment #33
phenaproximaOK, let's try that again. In reply to #30:
Comment #35
xjm7 fails in nearly 10,000 results for the current baseline (the last three envs are about 80% done) means the fail rate of the fail-patch at this time of day is
.0007(or 0.07%) and (we're not actually sure of that last significant digit, could be anything between 0.1 and 1). That means every test has a0.9993chance of passing, which in turn means that there's about a 1% chance that we wouldn't see a still-possible fail in the number of queued test runs.So I will requeue the whole set for the passing patch. :) (Not the fix-only patch though; testing it more than 1 time but fewer than 5,000 doesn't help us here.) :)
Comment #36
xjmQueued it for 9 more runs. That'll make 16K total. If the base fail rate is roughly
0.0007, the chances of us NOT getting a fail are then miniscule. (If it were as low as0.0001, then there would still be a 20% chance of us missing the fail with the current baseline. Isn't statistics fun! However, we got two more fails near the end of the last batches, so I think we're on the right power of 10.)If we compare this with #14, I think we've shown pretty clearly that this fail rate goes way up at night.
Comment #37
xjmI'm convinced. :)
Comment #38
alexpottCommitted 32bee19 and pushed to 10.0.x. Thanks!
Committed and pushed 0fdbe80815 to 9.4.x and 0a06ce704a to 9.3.x. Thanks!
Backported to 9.3.x due as this is only test changes.
Need to rebuild the JS on 10.0.x because of JS build tool updates...
Comment #42
bnjmnmPerhaps it's vanity, but I updated issue credit to the correctly spelled version of me.