Problem/Motivation
#2905922: Implementation issue for Layout Builder went in and one of the HEAD runs failed with LayoutBuilderTest but it passed on re-test
Proposed resolution
Perhaps we have a random fail - investigate
Remaining tasks
Investigate and fix
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #93 | 2924201-93.patch | 18.73 KB | tedbow |
| #93 | interdiff-85-93.txt | 2.93 KB | tedbow |
| #85 | 2924201-test-85-interdiff.txt | 687 bytes | tim.plunkett |
| #85 | 2924201-test-85.patch | 18.65 KB | tim.plunkett |
| #79 | 2924201-test-79-interdiff.txt | 955 bytes | tim.plunkett |
Comments
Comment #2
larowlanComment #4
larowlan1 fail in 7 - confirmed random
Comment #5
tim.plunkettSeems to be in testing contextual links. Tomorrow I can try chopping up the test instead of having such a single long running method.
Otherwise I'm out of ideas
Comment #6
tim.plunkettStepped through this logic with @Wim Leers, and I think we found the issue.
The custom override for
clickContextualLink()is needed because this is the only place in core where the result of clicking on a contextual link doesn't leave the page (this uses the Off Canvas dialog and AjaxResponse to stay on the page).The logic as written in HEAD:
Step 6 is where the fail is.
Step 5 is where our bug is.
The
waitForLink()is pointless, since the link already exists on the page.What we need is
waitForVisibleLink(), which doesn't exist.Instead I combined the innards of waitForLink within the call of a waitForVisibleElement.
Comment #8
larowlandoh!
Comment #9
xjmWe need to revert the Layout Builder commit for this unfortunately. This is failing at a significant rate in HEAD which will cause serious disruption for other issues. I've done so.
This would be critical otherwise, but we can rescope it to fix and add back this test after Layout Builder (as an alpha module) is committed without it.
Comment #10
Anonymous (not verified) commentedI know that PhantomJS and Mink do not always synchronously determine the visibility of an element (especially if it is set in css through extraneous attributes, like
But at the same time a phantom error pops up. So this is not the case.
Maybe more explicit waiting of the visible link will help?
Comment #12
Anonymous (not verified) commentedWell, what about
[aria-pressed]attribute?Comment #14
Anonymous (not verified) commentedNow error here:
Because after
button->press(), the button doesn't change attribute[aria-pressed]. Why?Comment #15
Anonymous (not verified) commented30 green runs. It seems the problem is definitely around here. Let's divide the last changes into parts.
Comment #16
tim.plunkett@vaplas++!
I think you're onto something.
The module was recommitted without the test, so here's the original test but with interdiff-press applied.
Comment #17
Anonymous (not verified) commentedThank you, @tim.plunkett!
This is nice news! Now we can not so worry that an important improvement is reverted.
But why this works?)
To be honest, I've noticed before in different places (not related with Layout) that by clicking on a pencil contextual links do not always appear, and you need to click again. But earlier I always ignored this oddity, because it happens rarely.
But this can also explain the reason why button still have [aria-pressed="false"] after click. It changed this value on 'true', but then the some trigger worked, which returned the value back to "false".
So, let's try to check the stability without
close()and "interdiff-press".Comment #18
Anonymous (not verified) commentedIt's a pity, the theory burst. Looks like without
close()works like withforseregim. More stable, but not enough.Well, we already have issue to improve
clickContextualLink(): #2918718: Using ContextualLinkClickTrait::clickContextualLink() multiple times per page can cause random failures. So, I reverted #16 with comment about changepress()totrigger('click'), and without extra changes with$pagevariable.Comment #19
tim.plunkettThanks @vaplas! I think we're good.
Comment #20
xjmCan we post an interdiff from what was originally committed to this patch, and a multi-run
run-tests.shhack patch with the intended fix, queued a number of times? Every patch here that did have multiple runs failed on the test at least once.Comment #21
Anonymous (not verified) commentedSure! Here is different to original test (#2905922-99: Implementation issue for Layout Builder):
Also repost about this change in the @todo-issue (#2918718-2: Using ContextualLinkClickTrait::clickContextualLink() multiple times per page can cause random failures).
Comment #22
Anonymous (not verified) commentedWow-wow!!! Stress test-only works stable now! It looks like magic from #2775653: JavascriptTests with webDriver:
https://dispatcher.drupalci.org/job/drupal_patches/39538/console:
MINK_DRIVER_ARGS_WEBDRIVER='["chrome", {"browserName":"chrome","chromeOptions":{"args":["--disable-gpu","--headless"]}},Comment #24
Anonymous (not verified) commented#23: 28 passes, and.. fail. Sorry, I hurried on. But anyway, MINK_DRIVER_ARGS_WEBDRIVER is cool.
Running a couple more tests with fix, for better check it.
Comment #25
Anonymous (not verified) commentedWell, extensive testing showed the stability of the fix, but.. found the new random fail:
https://www.drupal.org/pift-ci-job/827065
How is it possible, that the page contains
'The node body', but not contains'Layout'link?I have three versions:
checkField()works unstable as apress()(although in more rare cases)Moon is definitely the dominant version. But trying to check the rest.
Comment #26
Anonymous (not verified) commentedComment #28
Anonymous (not verified) commented#26: timeout, so, we haven't info about fails, only counts :(
PHP 5.5: 3/14 fails (https://dispatcher.drupalci.org/job/drupal_patches/39609/console)
PHP 7: 1/22 fails (https://dispatcher.drupalci.org/job/drupal_patches/39608/console)
Try again with fewer runs.
Comment #29
Anonymous (not verified) commented#28: In all cases, the checkbox was actually set/unset. So, problem with render page.
https://www.drupal.org/pift-ci-job/827779:
https://www.drupal.org/pift-ci-job/827781:
https://www.drupal.org/pift-ci-job/827788:
Problem with checked page363That is, the prevision version of render node is responded. Because we have random fail with/without
Layoutmode for node (checked/unchecked page).To confirm, let's try to clear cache whenever 'Layout' link is not found.
(interdiff with -w flag)
Comment #30
Anonymous (not verified) commentedYou know, I get a fairly stable localy random fail after..
pressButton('Save'):)Comment #31
xjmThis is major, since it would be a testing gate blocker for a major feature.
Comment #32
tim.plunkettComment #33
larowlanI think this might need to be tagged 'Layout Builder beta blocker'
Comment #35
Anonymous (not verified) commentedReuploded #21 patch with interdiff to #2905922-99: Implementation issue for Layout Builder. The stress test showed that this really solves the main fail.
We stopped only because of the discovery of yet another fail: https://www.drupal.org/pift-ci-job/827065. But it is associated with the unstable work of
pressButton(). And it seems this problem is inherent in all JTB tests, that use this method.I opened a separate issue with simple proof-test (#2934064-6: Random fail in JS-tests due to unstable pressButton()). But for some reason I did not mention it anywhere. Apparently #2926309-55: Random fail due to APCu not being able to allocate memory negatively affected my memory too. Very sorry!
Comment #37
tim.plunkett😂😂😂
Some of the routing has changed, hence the above fail.
Here it is adjusted for those changes.
Comment #39
Anonymous (not verified) commentedSome logic has also been changed? Now the body of node is placed in one of the blocks by default. So, changed this assertion + replaced deprecated
entity_get_display().Comment #41
tim.plunkettAh yes, that was the other change now that #2922033: Use the Layout Builder for EntityViewDisplays landed.
Comment #42
Anonymous (not verified) commented#2922033: Use the Layout Builder for EntityViewDisplays - yes, it's awesome!
#40: And after removing all sections - default layout display returns. Therefore, we need to somehow adjust the final assertions, right?
Comment #43
kristen polI reviewed the patch for readability, consistency, and code style and it was difficult to find anything wrong. :) A couple very minor things:
Are these in this order for any particular reason? Imports are normally alphabetical so this order threw me off a bit.
The wording seems awkward. Perhaps:
// Mink press() is unstable, so the trigger('click') is used here.
Thanks.
Comment #44
Anonymous (not verified) commentedThanks for review, @Kristen Pol! All points make sense. Done.
Also, let's check the test with a webDriver.
Comment #47
tim.plunkettThis caused the fail :)
Comment #48
Anonymous (not verified) commentedIndeed!) Sorry)
Comment #49
Anonymous (not verified) commentedFix #47 point + few more after #2937483: Defining a new type of section storage relies on magic strings and hidden assumptions:
Comment #51
Anonymous (not verified) commentedRevert webDriver.
Comment #52
tim.plunkettComment #53
Anonymous (not verified) commentedRevert "Revert webDriver." and
Based on #2946294-14: Fix race conditions in OffCanvasTestBase tedbow's works.
Perhaps armed with
assertWaitForElement()andassertWaitForNoElement()we can even removeassertWaitOnAjaxRequest().Comment #54
tedbowThis patch should add a @Todo to use #2892440: Provide helper test method to wait for an element to be removed from the page
Comment #55
Anonymous (not verified) commented#54: Done. Also renaming this method for better compatibility.
#53: It failed to abandon fomr
assertWaitForElement()in favorassertWaitForElement() / assertWaitForNoElement(). So while the idea postponed.Also tried to remove
I wonder how faster and more unstable the tests will be after that. (Edit: twice as fast)
Comment #56
Anonymous (not verified) commentedNow the
clickContextualLink()method works more stably. Let's check the relevance of our workaround.Comment #57
Anonymous (not verified) commentedTo reproduce random fails which happened with PostgreSQL tests, we can just change:
It is not unknown problem. See #2901792: Disable all animations in Javascript testing.
@tedbow already made special css-fix fot it. But looks like it needs to be supplemented with rule:
transition: none !important;Also more enhanced
ContextualLinkClickTrait::clickContextualLink()from #2918718-6: Using ContextualLinkClickTrait::clickContextualLink() multiple times per page can cause random failures.Comment #58
tim.plunkettWhat is the status of this now with all the chromedriver stuff and with nightwatch coming?
Comment #59
tedbowIn #2957425: Allow the inline creation of non-reusable Custom Blocks in the layout builder We have JS testing. I have put a lot of work on that issue make sure they test doesn't fail randomly.
The tests there extend
\Drupal\FunctionalJavascriptTests\WebDriverTestBasewhich is what is suppose to be used now. NightWatch is still not ready write tests in I think.Also related to this issue:https://www.drupal.org/drupalorg/docs/drupal-ci/customizing-drupalci-tes...
You don't have to edit
run-test.shto run single test multiple times.Comment #61
tim.plunkettWhere does this stand after the move away from PhantomJS
Comment #62
tedbowRe #61 this patch change to use
WebDriverTestBaseso it won't use phantomjsIt also changes
core/drupalci.ymlinstead ofrun-tests.shchanges to run in multiple times.I realize the test I added #2957425: Allow the inline creation of non-reusable Custom Blocks in the layout builder always do a page load between operation with contextual links so it might not have problems describe in this issue.
Comment #64
tedbowWhoops forgot to test locally. Needed to update for #2936358: Layout Builder should be opt-in per display (entity type/bundle/view mode)
Comment #65
jibranWe had 50 passing test runs let's create a real patch now.
This can be removed now.
Comment #66
tedbowThis should be protected
I am going to change all of the other tests that make sense to extend this class so they can use this method.
Comment #67
tedbowNow that we are going to have a base class for these tests it makes sense to move some of the functionality that repeats in these tests to the base class. I think if didn't have issue with the random failure we would have been using for the base class all along.
This sequence is repeated many times in this class and other tests when want to add a block and we are not always consistent about check if the links exist first.
I am going to create
openAddBlockForm()that will do this sequence given a block link text.We return NULL if the off-canvas dialog is not there but we don't actually check the return value ever.
We should change this to
assertOffCanvasFormAfterWaitand fail if the form is not there or hidden field is not there.Comment #68
jibranThank, for creating the proper patch. Hopefully, my review helps with 1 and 2.
Why are we not using waitForButton() here?
Why are we not using waitForLink() here?
This doesn't need to be absolute.
'canonical' is not needed at all.
These can be combined into waitForLink()
These can be combined into waitForElement()
This should be assertNotNull()
Comment #69
tedbow@jibran thanks for the review again.
re #68
waitForButton()here because we need find the need to button this with the current element.But actually can do this
Which means we change the lines above to the find the link because we don't need the
$elementvariable anymore.$selector$this->assertNotEmpty($assert_session->waitForElementVisible('named', ['link', 'Two column']));We don't really need to
assertWaitOnAjaxRequest()if we know there is something we can check for that will be on the page after wait.waitForLink()won't work because it can cause random failures since the link might exist but be on the off-canvas dialog and not visible yet.>assertWaitOnAjaxRequest()first. So I think we should leave these..I also noticed
I think this more clear as
$this->assertNotEmpty($assert_session->waitForElementVisible('named', ['link', $block_title]));The only difference would be if the
$block_titlealso was somewhere else besides the off-canvas dialog.Also these is a lot of changes so uploaded a x25 patch first to just run layout builder javascript tests.
Comment #71
tedbowThis because the removing the absolute url. We can just use hardcoded "node/1/layout" like most tests do.
Which I think the off-canvas dialog being in the process of opening so there is motion problem. Adding
assertWaitOnAjaxRequest()call because this also checks forjQuery(':animated').length === 0Comment #72
tedbowWhoops here is the interdiff
Comment #73
tedbowI chatted with @tim.plunkett about another related issue and he recommended a trait instead of base class.
I think this make sense especially because other core module may need to make Layout Builder integration tests so
enableLayoutsForBundle()would very useful there.Comment #75
tedbowHMM. The test failure was in
\Drupal\Tests\BrowserHtmlDebugTrait::initBrowserOutputFile()So thinking totally unrelated to this issue and also it only failed once so maybe general DrupalCI error
Retesting to make sure.
Comment #76
bobbygryzyngerHi all,
I want to chime in here because I am also seeing some random failures in my feature test suite that depends on adding blocks within layout builder.
The basic issue is this: clicking the 'Save layout' link is not reliably saving the layout.
My assertion for saving the layout was this:
About 1 in 10 times this would randomly fail for no apparent reason. Instead of being redirected to the the display UI page (where the confirmation message would be displayed), I would still be on the layout builder UI page.
In order to fix this is in my own test suite, I implemented a retry loop:
If I add some debugging output in the
try/catchstatement, I still see that failures occur (still at a rate of about 1 in 10), but thus far this has resolved my own random failures.I hope this can shed some light on what is going on here. Is there anything that can be investigated as far as why the save does not always occur? I've only seen this in tests, if I use the UI myself I've never had a save failure.
Comment #77
tim.plunkettIn order to get this issue done, I'd like to hold off on any helpful traits or other refactoring.
Let's add the test as it is.
This moves all of the trait code directly into the new class and only retains the changes to
core/modules/settings_tray/tests/modules/settings_tray_test_css/css/css_fix.theme.cssComment #78
tedbowWe shouldn't need. This
\Drupal\Tests\system\FunctionalJavascript\OffCanvasTestdoesn't having anything like this.If we could have x30 patch that removes and proves it is good that would be good.
If it fails I think we should duplicate the test module instead of having a settings_tray test module here. And with a @todo to remove it here https://www.drupal.org/project/drupal/issues/2901792#comment-12695082
That issue has a patch to make a system module testing module.
I think we should have a @todo point to https://www.drupal.org/project/drupal/issues/2918718 to remove this.
Comment #79
tim.plunkettFixed both!
Comment #80
tedbowLooks good 🎉
Comment #82
xjm30x test runs isn't enough to prove that a random fail is resolved or not unless the test is failing like 50% of the time, so queuing multiple test runs of the 30-test patch. However, we also usually want a version of the 30-test-runs patch that does not include the fix, to prove that the number of times we ran the test is enough to repeatedly expose the fail. Otherwise a 30-run patch isn't useful. Statistics! ;)
Comment #83
xjmAlso, did anyone try running this locally a few hundred times?
Comment #84
tedbow@xjm
Looks they all came back green!
Yes but since this patch just introduces a new test that is not in 8.7.x then I am don't know what "patch that does not include the fix" would actually be. I don't think we can do that here.
I haven't but now that you ran it 11 times above that is 330 passing on Drupalci!
found 1 remaining problem though.
We shouldn't need the change in this file because we are no longer including this test module.
Comment #85
tim.plunkettOh, I meant to do that in #79 when I stopped loading the module in the test
New diffstat:
Comment #86
tedbowLooks good!
Comment #88
tedbow#87 is totally unrelated non javascript test EntityReferenceAdminTest. Looks to be random failure
Comment #89
xjmThe 330 test runs still actually isn't enough without knowing the base fail rate. Say the base fail rate was 1% (which is still a pretty high fail rate). Running the test 330 times with a 1% random fail would mean there would be a
(.99)^330 = 4%chance every single test run would pass even with the random fail being present. This is why I asked about running the test locally a few hundred times, to know what the base fail rate was previously and compare it to now.Of course the best thing is always to make a version of the test that fails 100% of the time (by generating whatever random condition is causing the fail 100% of the time, whether with a
wait()to slow things down, or hardcoding the random character that breaks the test, or whatever).Anyway, since it looks like the fix involves something like a CSS transition (lol?) then it'd be harder to create that 100% fail patch for something like that.Looks like that change was removed from the patch now.Regarding
x EntityReferenceAdminTest, I confirmed https://www.drupal.org/pift-ci-job/1184261 is in HEAD and has failed twice in the past two weeks. Seems to be missing a core critical issue though... Anyone want to file that for me? :PIs the earlier version of the patch that included the random fail? We could test that the same number of times to expose the failure rate that's being resolved?
Comment #90
xjmOK, reading back on the issue, the original fail rate for the test was in the 5-10% range, so at that point I'm more comfortable with 330 test runs being enough. ;)
Comment #91
xjmPosted #3030902: Random fail in EntityReferenceAdminTest.
Comment #92
xjmThis looks good. Just some nitpicks:
The inline comment doesn't describe what's happening in the next 10 lines; can we remove it?
"From one layout to another" sounds wrong. From one section to another? From one region within the same section to another?
Also there's a missing blank line before the comment.
Missing blank line before the comment again.
Nitpick, but "Ensure the drag persisted" is confusing. Puns abound. How about "Ensure the dragged block is still in the correct position after reload/save"?
I think what's imoportant here is "Reconfigure a block and ensure that the layout content is updated"?
This method name confused me ("Aren't they all?") but it's about configurable layout templates a.k.a. section layouts. I guess we never finalized that terminology issue, so that's why "layouts" is used here without any modifier. We could still clarify the method name.
Nit: "Off Canvas dialog" is wrong. It should be one of "off-canvas dialog".
Comment #93
tedbow@xjm thanks for the review
Fixed
Fixed
Fixed
fixed
fixed
changed to
testConfigurableLayoutSections()fixed
Comment #94
tim.plunkettThanks for the review and fixes!
Comment #97
xjmThanks, committed to 8.7.x. I also backported it to 8.6.x as a test addition; we'll see in an hour or so if that was a bad idea. :P
Comment #99
xjmOK @tedbow indicated a reason it's not 8.6.x-compatible; reverted. Sorry for the emails.
Comment #100
xjmFixing credit; d.o somehow unchecked people that are included in the commit message.
Comment #101
alexpottThis is still causing random fails. See https://www.drupal.org/pift-ci-job/1196600