We tried to fix random fails with #2901626: CSS animations cause \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInBlockFormTest random fails
But #2830882-99: Introduce Drupal.offCanvas, mirrored after Drupal.dialog, to avoid Settings Tray using Drupal.Dialog in unintended ways seems to prove that even with the animation fix this random failure could still happen
We need to determine a fix.
Is it a code problem in \Drupal\Tests\settings_tray\FunctionalJavascript\SettingsTrayBlockFormTest::testBlocks maybe we aren't waiting for some event that usually happens quick enough we don't have to wait?
Or maybe it is actually a DrupalCI infrastructure problem that will never be able to be fixed with a test code fix
| Comment | File | Size | Author |
|---|---|---|---|
| #40 | interdiff-patches-37-40.txt | 7.89 KB | Anonymous (not verified) |
| #40 | 2902191-40-no_whitespace.txt | 6.62 KB | Anonymous (not verified) |
| #40 | 2902191-40.patch | 14.88 KB | Anonymous (not verified) |
| #37 | interdiff-patches-31-37.txt | 2.98 KB | Anonymous (not verified) |
| #37 | 2902191-37-no_whitespace.txt | 6.56 KB | Anonymous (not verified) |
Comments
Comment #2
tedbowOk here is test to see if #2830882: Introduce Drupal.offCanvas, mirrored after Drupal.dialog, to avoid Settings Tray using Drupal.Dialog in unintended ways cause the random failure.
2 patches that will run \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInBlockFormTest::testBlocks() only 30x. 1 reverts #2830882 also
Comment #3
xjmI wouldn't do this because it actually could affect the test passing or failing when files weren't successfully cleaned up, etc.
This did fail again overnight: https://www.drupal.org/pift-ci-job/739411 Although that commit hash is before the critical was committed.
Comment #4
xjmComment #5
tedbowI think I might have found the random fail point
The new part is
assertPageLoadComplete(). Basically it makes sure all contextual links have been loaded.I think what was happening:
$page->pressButton($button_text);Submitted the form which did an ajax submit and ajax redirect$this->assertElementVisibleAfterWait('css', $new_page_text_locator);assert the page was redirected. But it is possible the page was not completely ready. Contextual links could have not finished loading. Because when the page gets loaded the[data-contextual-id]elements are empty and then contextual js loads the links either through ajax call or local storage.So we need to actually wait for all contextual links to be loaded.
I also added
assertPageLoadCompleteto the overridden version of drupalGet() because this should happen everytime the page is loaded not just redirects.But also added
Because in setup() when logging in this won't be true.
Comment #6
tedbowI was going to try to figure out how I could emulate slow testbots to make the random error happen all the time which I think would prove the fix.
But then I realized the slow testbots could just slowing down Drupal response which cause this error.
So these 2 patches simply add a .25 second sleep to index.php.
I am not running the whole test suite because I just want to prove the .25 second pause will make HEAD fail consistently on OutsideInBlockFormTest and that the fix in #5 passes even with the pause.
This is behavior I am seeing locally when I run these tests.
So 1 patch has the fix and1 doesn't. Running OutsideInBlockFormTest 5x.
Comment #8
GrandmaGlassesRopeManDo we still need this?
Curious if you can just use
window.jQueryComment #9
tedbow1. 😱 Actually leaving that there was mistake and actually my patch fails without it. So the contextual links aren't the problem but I think they should be added to the final patch because I still see them as a possible cause for random failure. I will remove this from the next patch just to prove the other fix that will come fixes it without it though.
2. I think that will not work it the JS condition needs to return a TRUE.
Comment #11
tedbowAfter fixing #8.1 I looked for another possible problem.
Looking that error was happening in the random fails and with patch 2902191-6-slow_drupal_no_fix.patch I realized that OutsideInBlockFormTest::testBlocks() actually loops through all core themes except Seven and I didn't know which theme was causing the error. @see
So put a simple
print $themeline in the test locally to determine which theme was breaking it. It turned out to be Stark.But then I tried
And then it failed on Bartik. Basically it wasn't failing because of Stark or Bartik it always failed on the 2nd theme in the loop.
So I looked at this code chunk again where it failed.
The test was failing when
$block_plugin == 'system_branding_block'.So what happens in that case is the Branding blocks is opened the site name is changed and form is submitted.
Then the intention of this line
$this->assertElementVisibleAfterWait('css', $new_page_text_locator);is to wait for the new text, the site name, from the form submit to take effect on the site. That should mean that the form has been saved and the page has been redirected.
So for the first time through this is true but for the second time through the site name has already been changed to $new_page_text!
So essentially after the first iteration of the loop there is not a wait after the form is submitted which means the next line
$this->openBlockForm($block_selector);could happen either before the ajax redirect. This seems to be happening when testbots are slow.
So the quick fix to this problem is to use a random site name in every iteration of the loop. That way it will truly wait for the form to be submitted and the new site name to be on the page.
This patch provides that fix without the previous fixes. I still think the previous fixes could have also been the cause of the problem under different timing conditions. We should not interact with the page until all the contextual links are loaded.
I will though provide another patch after this to hopefully make the test less confusing..........
Comment #12
tedbowSo #11 passes it and we could use that fix.
I think though the fact that it took me this long to figure out that loop would cause problems with the site name and nobody else has ever found this potential problem when review this test points to the fact that looping and doing the same sequence for each theme in the same method is confusing for this method because is complex user interaction.
So this patch removes the loop and adds
$themeparameter to the testBlocks() function. It then used getTestThemes() in providerTestBlocks() to provide a test case for every block and theme combination. This changes the test cases from 3 to 15.For me running locally this changes running the testBlocks() only from 3.41 minutes(slow I know running in VM) to 7.4 minutes. So not 5x as long because of test setup I guess.
For me the simpler test is more important than the faster test time. I also think before this patch is finished we should add back changes in #6 because
assertPageLoadComplete()still could help stop other random failures.UPDATE: Actually Test time difference
2902191-12-slow_drupal-no_loop.patch
2902191-10-slow-random_site_name.patch
Comment #13
tedbowAdding back the changes from #9 that waits for all contextual links to be loaded on page loaded. These changes don't fix the current random failure but it is an extra check that hopefully will avoid unknown random failures in the future.
Important This still has the usleep() call in index.php to simulate slow testbots and only runs OutsideInBlockFormTest
Comment #14
tedbowI removed the var
$setupCompletebecause it confusing.All I had to do was check for the permission
access contextual linksbefore callingassertAllContextualLinksLoaded()I think this is clearer.
This includes a "slow_drupal" patch with the usleep() command and the runs the 5x times.
2902191-14.patch removes these changes and would be commit ready if RTBC'ed
Comment #15
tedbowHere is a no whitespace patch for #14 to make it easier to review because removing the loop adds a bunch of lines to the patch that are just indentation changes.
Comment #16
tedbowChanging to "major"
If we look at 2902191-6-slow_drupal_no_fix.patch in #6 this patch demonstrates in a "slow drupal" situation, simulated by add like usleep to index.ph might happen in slow testbot HEAD will still likely fail.
So I think this is "major" in the sense the sooner or later if not fixed it will almost certainly break head.
Comment #17
tedbowRemoving text I meant to add to #14
Comment #18
Anonymous (not verified) commented@tedbow, excellent investigation!
Added #14 patch with
diff -w, for ease of comparison changes.Small review:
It is ok, because end of test now :)
+1. Change is good even in itself. Now test more flat, and it is a pleasant.
assertPageLoadComplete- a self-speaking name, but can we partially explain the reasons for waiting via keeping the comment?Look quite useful for general use, not only for OutsideInJavascriptTestBase (follow-up?).
Maybe first
assertWaitOnAjaxRequest()and thanassertAllContextualLinksLoaded(), for cases Where after ajax new links appear?Do we still need it? If yes, what about
jQuery.isReadyortypeof jQuery === 'function'(something like inassertWaitOnAjaxRequest())Comment #19
Anonymous (not verified) commented#16, fair. It is even critical per #2829040: [meta] Known intermittent, random, and environment-specific test failures. Also added issue about testing 'slow bot' for all tests #2902968: How many random fails due to slow bots?.
Comment #20
Anonymous (not verified) commentedI forgot to say, that these changes also look nice when running tests.
Comment #21
tedbow@vaplas thanks for the review!
#18
1. yep
2. Good point
3. Added a comment
4. Added a @todo about existing issue #2821724: Create Javascript Tests for Contextual Links
5. fixed
6. added "typeof jQuery !== 'undefined'" and made all 1 condition.
#19 Thanks for making #2902968: How many random fails due to slow bots? look like this is the only test that fail 😅
#20 Yes will make debugging easier.
Comment #22
Anonymous (not verified) commented#21: looks perfect! And #6 + #14 confirm the fixation of fail. +1 to RTBC. Thanks!
Nit:
jQuery('$selector').length === 0( '===' )#19: Yes, I thought I hit the jackpot on this 😉, but on the other side, the absence of other falls is also not bad 👌 (but we can rerun this test in any time, @Mr. Bot).
Comment #23
Anonymous (not verified) commented#22: done :)
Comment #24
tedbow@vaplas thanks for finishing this up. It looks good to me for RTBC. But since I did most of the patch I shouldn't RTBC.
@vaplas since you provided most of the reviews and your patch on #23 was just 1 character difference I think you could RTBC, if you think it is good.
Comment #25
Anonymous (not verified) commentedI am glad to do this! Thank you for this blow on the random fails, @tedbow.
Comment #27
tedbow@vaplas thanks for RTBC! Patch needed reroll. Here it is
Comment #29
tedbowI meant to upload the "no_whitespace" file as a ".txt" so the tests would not run
Comment #30
Anonymous (not verified) commentedIndeed, needed reroll after #2782891: The Page Title block's title behaves in a confusing way with Settings Tray and the Help block incorrectly has Settings Tray styling.
I also compared #23 and #27. And I found no changes except one:
It seems this is an unintentional changes, right?
Maybe even replace key format? Like:
Because 'bartik--Drupal\outside_in_test\Plugin\Block\OffCanvasFormAnnotationIsClassBlock' it's an acquired taste :)
Comment #31
tedbow@vaplas good catch yes that was a mistake.
Also fixing the keys as you suggested.
Comment #32
Anonymous (not verified) commented@tedbow, thank you for nice work! Back to RTBC again.
Comment #33
star-szrDiscussed with @xjm @effulgentsia @cilefen and we agreed this issue is critical because it's causing a non-trivial amount of random fails.
Comment #34
droplet commentedI didn't look into this issue deeply but I think the actual problem on waitForOffCanvasToOpen. #drupal-off-canvas opened before the content rendering inside the dialog.
I trying to sort out of these problems in global:
#2903300: Dispatch an event to indicate the element is anmiated/loaded
Comment #35
Anonymous (not verified) commentedNow the stability of the test is provided by the fact that
assertElementVisibleAfterWait()works with different values (#11). @tedbow found a great way to implement this through refactoring instead of random-value generation.In addition, @tedbow has been offered two more stability enhancements:
#34 good point! Maybe we can strengthen
waitForOffCanvasToOpen, as part of this issue too. But how to make it better?I did not find a class that says content is loaded. Should we add some class in
afterCreate(off-canvas.es6.js)?Maybe some known elements inside content? But not all content has known elements (like [type="submit"]). Example
off_canvas_testmodule add only one simplediv. Or is it an unsuccessful for OffCanvasTest?The only sign that I found is the resizing, after which the attribute
styleappears. Therefore, we can do so:We can also move this quest to separate issue, since the current patch is sufficient for the stability of this test (imho). Therefore, I did not change the status of issue. But you can do it, if you think otherwise.
Comment #37
Anonymous (not verified) commentedRerolled after #2904134: Settings Tray uses the off-canvas dialog type, but "off_canvas" is not an accurate form plugin name, "settings_tray" is.
Comment #38
lendudeNeeds a reroll now that #2803375: Rename Outside-in module to "Settings Tray" for real is in.
Updated references in the IS to the new test name. New name, same fails : https://www.drupal.org/pift-ci-job/751715
Comment #39
tedbowChanging to new settings_tray.module component. @drpal thanks for script help!
Comment #40
Anonymous (not verified) commented#38: Rerolled.
Comment #41
tedbow@vaplas thanks for the rerolls!!
Setting back to RTBC since it has just been rerolls since it was RTBC confirm the reroll in #40
@droplet #34 that could be problem but this problem because of where it is failing in SettingsTrayBlockFormTest.php:130(seen recently here https://www.drupal.org/pift-ci-job/763335) is because of the problem described in #11 because the site name text already being on the page.
waitForOffCanvasToOpen()is called 7 times directly andopenBlockForm()which contains it is called 7 times also but the random fail has been consistent on SettingsTrayBlockFormTest.php:130 so I think we should commit this issue before worrying about other problems.Comment #42
droplet commented@tedbow,
Yeah we could always refactor that so it's not a reason to block it move forward :)
Comment #46
catchCommitted/pushed to 8.5.x and cherry-picked to 8.4.x. Thanks!
Comment #47
tedbow@catch thanks for committing and thanks everyone who worked on this issue.
Hopefully we just avoided HEAD breaking in a way this test has done in past and causing a major pain for everyone!
Comment #49
xjm