Problem/Motivation
Drupal's JavaScript tests can fail when the test runner has enough resources to be faster than expected. This is often seen when a link is expected to be clickable but it is not or a field value should be settable but it is not.
For example, if you run Drupal's JS tests on DrupalCI with a concurrency of one it will fail due to this. Also the Layout Builder tests are prone to failing locally due to this.
Follow up to #2946294: Fix race conditions in OffCanvasTestBase
This was solving for the random test failure the produces and error like
1) Drupal\Tests\system\FunctionalJavascript\OffCanvasTest::testOffCanvasLinks with data set "stark" ('stark')
WebDriver\Exception\UnknownError: unknown error: Element ... is not clickable at point (516, 87). Other element would receive the click: ...
(Session info: headless chrome=62.0.3202.94)
(Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.9.0-0.bpo.6-amd64 x86_64)/var/www/html/vendor/instaclick/php-webdriver/lib/WebDriver/Exception.php:155
/var/www/html/vendor/instaclick/php-webdriver/lib/WebDriver/AbstractWebDriver.php:157
/var/www/html/vendor/instaclick/php-webdriver/lib/WebDriver/AbstractWebDriver.php:218
/var/www/html/vendor/instaclick/php-webdriver/lib/WebDriver/Container.php:224
/var/www/html/vendor/behat/mink-selenium2-driver/src/Selenium2Driver.php:775
/var/www/html/vendor/behat/mink-selenium2-driver/src/Selenium2Driver.php:763
/var/www/html/vendor/behat/mink/src/Element/NodeElement.php:153
/var/www/html/vendor/behat/mink/src/Element/TraversableElement.php:73
/var/www/html/core/modules/system/tests/src/FunctionalJavascript/OffCanvasTest.php:99
Example
This seems to be a problem where the test browser is trying to click an element but other element is moving on the page and the element to be clicked is obstructed.
Proposed resolution
Generalize the solution in #2946294: Fix race conditions in OffCanvasTestBase
Option 1: Override existing methods of clicking
If would be nice if
$this->getSession()->getPage()->clickLink()
and
$this->getSession()->getPage()->fillField()
were fault tolerant for JS tests.
Therefore we extend \Drupal\FunctionalJavascriptTests\DrupalSelenium2Driver to add this to the underlying driver methods.
Option 2: Provide a new way to click elements
Add new method on WebDriverTestBase
something like safeClick
Remaining tasks
Open follow-ups or find existing issues for the tests that are still failing regularly with a concurrency of one:
- Drupal\Tests\layout_builder\FunctionalJavascript\ContentPreviewToggleTest
- Drupal\Tests\layout_builder\FunctionalJavascript\LayoutBuilderTest
- Drupal\Tests\layout_builder\FunctionalJavascript\AjaxBlockTest
User interface changes
None
API changes
None
Data model changes
None
Release notes snippet
N/a
Comment | File | Size | Author |
---|---|---|---|
#86 | 3032275-86.patch | 19 KB | alexpott |
#86 | 81-86-interdiff.txt | 1.23 KB | alexpott |
#81 | 3032275-81.patch | 18.86 KB | alexpott |
Comments
Comment #2
dwwI'm torn. Overriding the existing methods (option 1) is sort of a mess, since we'd have to intercept a lot of possible choices, and that probably gets ugly quickly. I'm also not exactly clear on the mechanism to try to intercept those methods, but I haven't looked that closely.
However, expecting test writers to know when to use
clickLink()
vs.safeClick()
(option 2) seems somewhat unreasonable. Yes, it'd be great to have a generic solution, but it's not going to do much good if people don't know it exists and know when to use it. I'm barely getting started with FunctionalJS tests, and I sure as hell don't feel qualified to understand when a given command is likely to introduce a race condition or not. It's mostly all magic to me at this point. Maybe I (and others) would simply adopt the "always use the 'safe' option" approach, and maybe that's fine.But yes, we clearly have a general race condition problem in our JS tests, and I'm glad we now have an issue to try to track it. Moving to the phpunit component (which seems more appropriate) and tagging for 'Random test failure' (since that's the class of issues this one is trying to solve.
Comment #3
xjmComment #4
tedbowI agree that option 1 could be a mess but also should every click be a safeClick 😜
So here is a try at Option 1. If this actually works it is not too bad.
findAll()
because all other find* methods end up there.\Behat\Mink\Element\NodeElement::click()
so I think we just have to override that.Comment #6
tedbow\Drupal\Tests\UiHelperTrait::click()
\Drupal\Tests\content_translation\Functional\ContentTranslationRevisionTranslationDeletionTest::doTestOverview
(in a call)Both call
$this->getSession()->getDriver()->click()
We should probably change
\Drupal\Tests\UiHelperTrait::click()
to use$this->getSession()->getPage()->find('css', $css_selector)->click()
I am not sure why it is not doing so now.
Comment #7
Krzysztof DomańskiA similar problem in the Media and Media Library.
https://www.drupal.org/pift-ci-job/1196544
https://www.drupal.org/pift-ci-job/1190550
Comment #8
Krzysztof DomańskiMany random test failure.
Comment #9
LendudePretty cool that this works, but....
The issue I see here, is that this is just solving one very small fraction of possible causes of these random fails. There are an X number of reasons you need to wait during a javascript test (and even more for Nightwatch tests, which doesn't even pause on page loads). So are we going to try to catch ALL the reasons that can cause this, or are people going to need to learn that you need to waitForX a lot when writing these types of tests?
Annoying? Maybe, but it's the reality of writing javascript tests and catching one (or some) of the causes of this will not make them clearer to use or understand I think.
We don't want to get to a state where we are just debouncing EVERYTHING by 100ms like what was done in the MinkPhantomJSDriver (see https://github.com/jcalderonzumba/MinkPhantomJSDriver/pull/68). Taking away the 100ms led to a number of fails in our tests IIRC, so having that in was a solution to some of our random fails (just not a very good one I feel).
Comment #10
tedbow@Lendude I understand your concerns here
But if you look at this patch I was testing with here with a issue that is having this type of random fail
Here I am waiting for just about everything I can think of. The line directly before the
clickLink()
checks if the link is visible.But still this randomly fails here.
With the fix here this line does not randomly fail.
Comment #11
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedI just corrected some nitpicks:
- Fixed coding standards errors, except in the test. I think it doesn't need a doc comment (I couldn't come up with one).
- Minor modification to the JavascriptSession to be more similar to the parent Session class.
How do you test if it is working though?
Comment #12
tedbow@bendeguz.csirmaz thanks!
Yeah testing is tricky.
Here is idea I had.
Without the fix if you did this
It should fail because the Target Link would be obstructed by the Blocker element
With the fix the same code should pass because
JavascriptNodeElement::click
will try to click repeatedly 10 seconds to click until it is clickable.Of course this assumes that
$this->clickLink('Trigger Link');
will get the errorJavascriptNodeElement::click
was made to handle. but this should prove it.Comment #13
andypostPartially related but naming is often an issue too #2815297-14: Actions module should allow to edit and remove non-configurable actions
Comment #14
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commented#12
Added the test.
I'm not sure if creating a new hidden testing module in the modules folder is okay?
Comment #17
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedAdding @group annotation to the test.
Comment #19
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commented#14
Apparently it's not okay: ComposerIntegrationTest fails.
Comment #20
tedbowTest modules should live under /core/[module_name]/tests/modules/
Since this is functionality not for a particular module it should be under /core/system/tests/modules/
You can look at anyone of the test modules there like core/modules/system/tests/modules/cache_test for examples.
Once you move the module it does not need to be hidden. It should not have the hidden key
Comment #21
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedThanks! Here's the fixed version.
Comment #23
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedComment #24
tedbowI think we should just use regular JS file in the test module rather than include JS from the server-side. It is little more work but I think it is clearer what is going on.
Likewise I think we should just have CSS files in the test module.
Comment #25
dww+1 to the points in #24. When I was looking at #21 I thought the same thing.
Some additional nits and concerns:
fault-tolerant
Also, maybe we should say "... in JavaScript tests" or something? we've got JS in the name, maybe worth being more explicit in this description, too?
Should this 2 second wait be a route parameter or something?
(In the new CSS file, not here, but...) don't we also want to set a z-index to force this to be in front of the target_link?
Not sure what this means. Can we more clearly word what this trait is for? "finding extending" isn't proper, regardless.
Unless I'm missing something, I don't see this actually being used anywhere, other than passing it around for the new constructors. Can't we remove this entirely, and majorly simplify lots of this?
fault-tolerant.
protected instead of private?
"return extend elements" doesn't make sense. Does this mean "return extended elements."? What does "extended" mean?
Comment #26
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedOkay, fixed everything, thanks!
The reason I added that inline code is because I thought keeping the test in one file would be simpler.
#25.5
I'm not sure how this could be simplified.
Comment #27
dwwThanks, that's much improved! Attaching the interdiff for other folks to review, too.
Re: #25.5: Indeed, I was missing something. ;) I didn't notice that we pass
$session
to the parent constructor (I was searching for "elementSession" not "session") and I now understand why that's needed. ;) Please ignore.Two tiny remaining nits:
Thanks!
-Derek
Comment #28
tedbowI think @dww 2 points #27 in are worth fixing
Comment #29
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedAdding the fix
Comment #30
dwwDid you mean 100 vs 1000? If 100 is still enough to "see" the bug, that's great. Just double checking.
Otherwise, I have no more concerns. RTBC to my eyes.
Thanks!
-Derek
Comment #31
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedI think it's enough.
Uploading the testonly to prove it :)
(If it fails it can go back to RTBC)
Comment #32
dwwHrm, bummer, that passed. ;)
Can we see maybe 3 test-only versions @ 500, 1000 and 2000 and then decide?
Sorry for the hassles/churn!
Thanks,
-Derek
Comment #33
dwwLet's see...
Comment #34
dwwWhoops, didn't actually mean to assign to myself.
p.s. Now that I applied locally, I saw that the test is at:
core/tests/Drupal/FunctionalJavascriptTests/Tests/JSClickTest.php
while the module it depends on is in:
core/modules/system/tests/modules/js_click_test
I definitely don't grok how we decide what tests live in system vs the core/tests/Drupal world, but I wonder if the test class should move to:
core/modules/system/tests/modules/js_click_test
core/modules/system/tests/src/FunctionalJavascript/JSClickTest.php
?
I guess this is a test for the testing framework itself, not a test for something in Drupal, so the current location makes sense. But I wanted to ask.
Thanks,
-Derek
Comment #35
dwwp.p.s. I regret rushing to upload the patches in #33 without also including a change to drupalci.yml to only run this test. :/ Whoops!!
Comment #39
dwwCool, that's more like it. ;) Let's see about a sweet spot between 100 and 500, then.
This time, with a change to drupalci.yml to only run this particular test, but to repeat it 20 times.
Comment #42
dwwHrm, what about 100ms x 20 repeats? Also 50 and 10 for good measure. ;)
Meanwhile, I noticed the bot complaining about this:
So I fixed that (I think). Let's see what the bot says now.
(sorry for the noise, but I think this is important to find a reasonable value for this).
Comment #44
dwwHah! ;) So I guess 100 is about right, after all. LOL ;) Maybe we should bump to 150 or so. Committers could easily change the value on commit based on all these test results. But here's a full patch with the fix to using setExpectedException(), back at 100ms. Probably RTBC but I'll let @tedbow (or someone) do that.
Thanks,
-Derek
Comment #46
dwwNice! Those are the test results we were hoping for. ;)
I obviously can no longer RTBC this, but I believe it's ready for that.
Note to committers: please list @tedbow and @bendeguz.csirmaz ahead of me in the commit message. I'm only at the front of the default list as a result of uploading all these test-only patches to find/confirm the right value. Those two actually wrote all the code that matters in here.
Thanks,
-Derek
Comment #47
bendeguz.csirmaz CreditAttribution: bendeguz.csirmaz at Cheppers commentedThanks for doing this!
Since 100 randomly failed once I think it should be a greater value to be safe.
Comment #48
dwwWell, 100 randomly *passed* for the test-only. ;) To consistently see the test-only fail, we need more than 100, but that's not really the point. We just wanted to make sure this fix is working, and it is.
Comment #49
tedbowJust a reroll
Comment #50
alexpottWe've discovered that this Exception class is not reliable in #3056536: LayoutBuilderDisableInteractionsTest randomly fails. It needs to be \Exception because on MacOS connecting directly to chromedriver that's the exception that's thrown. On Linux and via selenium-server it's a UnknownError but this will fail on many people's local testing environments. As you're checking the error message anyway I think you'll get the behaviour you want with
catch (\Exception $e)
Comment #53
alexpottMaking this critical because atm you cannot run layout builder tests locally without it failing if you have a fast machine. Plus DrupalCI can't run JS tests with a concurrency of 1 without failing. Going to mark #3246891: Layout Builder JS tests are failing locally and on DrupalCI with a concurrency of 1 as a duplicate.
Comment #54
alexpottThe patch in #53 also adds protection for NodeElement::setValue() as that was also causing not interactable errors for Layout Builder tests.
Comment #57
alexpottFixing the tests and removing usage of the deprecated jQuery.once library.
Comment #58
alexpottLet's handle the visible issue...
Comment #62
alexpottLooking at this change we can move all the hackery of the wait into something we're already overriding - the DrupalSelenium2Driver.
Comment #64
alexpottHopefully this will fix the remaining errors.
Comment #65
alexpottUpdated issue summary with new information about how to cause the problem and the nature of the fix.
Comment #67
alexpottLocally I can get the AjaxBlockTest is I use sqlite. The attached patch fixes this.
Comment #69
alexpottOkay I'm going to stop trying to fix:
We should file follow-ups to explore fixing those. The current approach here of making link clicking and field value setting more resilient reduces the number of random JS errors considering as we can see from the attached patches.
I've also added tests for the form value setting and renamed things in the tests so they are not so click focussed and I removed the dependency on jQuery because why not.
Here's what the different patches attached show:
Comment #70
alexpottComment #72
alexpottYay another difference between a modern chromedriver and the one we currently test on using DrupalCI - we need to update it sometime.
Comment #73
dww@alexpott: Thanks for resurrecting this issue and moving it so far forward!
A few trivial docs nits fixed with attached patch. Then a few other comments / concerns. Only #3 prevents me from recommending RTBC, but I'd love feedback on the others.
#2946294: Fix race conditions in OffCanvasTestBase makes me wonder if we also want to override
Behat\Mink\Element\TraversableElement::clickLink()
, too. But that'd require undoing much of the goodness / simplification from #62...These docs are a bit clumsy and hard to follow.
A) "element(-s)" vs. "it" - if we're going to handle single or plural, "it" should probably be "it/them".
B) "which result is both used as a waiting condition and returned" is confusing. And maybe now wrong? Looks like the callback is expected to return TRUE if it found the thing, but we don't actually care about or return the underlying result beyond that.
That said, this is a private helper in a dark corner of the test plumbing, so maybe it doesn't matter. ;)
It appears that we removed the code that was registering this in
WebDriverTestBase
. It seems this is no longer needed. If so, we should remove it.Looks like how it was registered was changed in #53, but then completely removed in #62.
Thanks again!
-Derek
Comment #74
alexpottThanks for the review @dww
Re #73.1 All clicks will be more fault tolerant now. If they are being added by ajax you're going to need wait otherwise the find will fail but I think that that is okay.
#73.2 and they are copied from upstream docs. I've improved them a bit. Re part B the "which result is both used as a waiting condition and returned" is correct. The callback is expected to return something truthy to end the waiting and something falsey to continue to wait.
#73.3 yep this is dead code now.
Comment #75
dwwThanks, @alexpott!
1. Great! After I posted 73, I was wondering if that’d be the case, but I didn’t have time to dig in and investigate before bed. thanks for confirmation.
2. New docs look good. Truthy or not is the key point, which it now says. Thanks.
3. Yay, glad to see it go.
I have nothing other than too many patches uploaded here stopping me from RTBC. 😉 hopefully @tedbow will RTBC soon (he mentioned in Slack he was planning to review this today).
Thanks again,
-Derek
Comment #76
tedbowExcited this getting fixed! Just a few things
Bittersweet to see this go 😢. Probably one of the great function names in Drupal core history 😂
is "Resize" correct here? I feel this test is already a bit tricky to understand. We are actively removing an element and this first made wonder if we were actively resizing.
could we just say
/* Size the box to cover the target. */
Whats the point of having
{milliseconds}
? We always use/js_interaction_test/100
to access the link in the test. Couldn't we either hard-code this in\Drupal\js_interaction_test\Controller\JSInteractionTestForm::buildForm()
?or if we just hard-coded in
js_interaction_test.trigger_link.es6.js
we won't need to use thecore/drupalSettings
library at all.To me the combo of this form and the JS in js_interaction_test.trigger_link.es6.js makes this test a bit hard to understand. I didn't read any of the comments first to see if the test is understandable on its own since I hadn't thought about this issue in long time.
In the comment we describe this link as
A link that, when clicked, removes the obstructing element after some time.
But then we call it here "Trigger Link". To me it is a little to hard to connect the 2.
Could just call this link "Blocker remover" or even "Link Blocker remover". I don't see any downside to be overly explicit. We could then change the class to
blocker-remover-link
Think the JS would more understandable
or something
We could also update the comment to match
" removes the blocking element after some time."
To me this would more understandable as "Field enabler" with the class
field-enabler
Do we really want to make this a public method instead of private or protected?
This seems like it could cause some BC headaches. For instance what if
Selenium2Driver
itself solves the current problem we are having themselves so that we no longer of have to have our overrides in this class. We won't need this method any longer but we couldn't remove it either because others might be calling it. We could then deprecate it but then it is just one more thing for people take care when they move to the next major version of Drupal.For the field we don't have actually use block element, correct? We just enable it after a delay
Comment #77
dwwThanks for the review, @tedbow! All good points.
But first, I actually tried running the test locally while I was working on the proposed changes. Discovered we never fully addressed #50. The test was dying for me (MacOS chrome + chromedriver 94.0.4606.81) like so:
I had to change both
core/tests/Drupal/FunctionalJavascriptTests/DrupalSelenium2Driver.php
andcore/tests/Drupal/FunctionalJavascriptTests/Tests/JSInteractionTest.php
to use\Exception
notWebDriver\Exception
. Hope that's cool.Uploading that here separately, before I work on #76 so there's a separate patch / interdiff in case we want to change course on this.
Comment #78
dwwThis fixes most of #76. I'm leaving point 3 for @alexpott to decide. Would like confirmation on 6, too.
$this
into that callback (which we're already doing). We don't need this to be static (and therefore public). Agreed that less API is better here. If we have some public need for this (hard to imagine), we can change it then.Comment #79
alexpott@dww re #77 I would love to understand why you're not getting a typed exception. Imo the change to \Exception is not correct and we need to work out what it is about the way you have things set up that results in an un-typed exception. This is because there are other tests that capture even more specific exceptions that would be broken.
Comment #80
alexpottWe should move this elsewhere. And leave it as a public method. There are other places in the codebase that need this. Plus this should stay as static wherever it goes because it has no need for $this.
Comment #81
alexpott@dww I use OSX and chromedriver and \WebDriver\Exception is what I'm getting. Can you provide a full backtrace for your exception so we can see what code is converting it to \Exception - I can not see where that is coming from and rather than converting this to the catch all I'd like to understand it. (Therefore the patch here reverts this change).
Fixed up the wait to be hard-coded - I agree that having this as a setting is OTT.
Also move the exception message tester to JsWebAssert - couldn't think of a better place and used it in the other place where it needs to be.
Comment #82
dwwYeah, I don't think #77 is the right approach. Happy to see it reverted. But for reference, attaching the full phpunit output. Here's what's going on for me locally that's throwing
\Exception
in my case:vendor/instaclick/php-webdriver/lib/WebDriver/Exception.php
I'm getting "element not interactable" but that's not in the list of exceptions that class understands. My (unmodified) composer.lock says:
Maybe it's a mismatch between chrome (95.0.4638.69) and chromedriver (94.0.4606.61)? Sorry, don't have time to dig deeper right now.
Comment #83
alexpott@dww I have no idea how but your vendor directory is very very out-of-date. The current code for 1.4.10 is
and https://github.com/instaclick/php-webdriver/blob/1.4.10/lib/WebDriver/Ex...
Comment #84
dwwYeah, my local vendor was all messed up from switching branches and not re-running
composer install
. Sorry about that! Confirmed that #81 is now passing locally:Although I've uploaded a lot of patches here, many of them were just trying to get the testbot to give us results. The latest code here is mostly @alexpott's additions to and rewrites of the original work by @tedbow and @bendeguz.csirmaz. So I'm gonna be bold and RTBC this. Fixing this would be a huge win for core development, so no reason to further delay.
Comment #85
lauriiiWas this left as the untyped exception on purpose?
Comment #86
alexpottThat's a great catch @lauriii. Fixing that up. It's a minor change so leaving at rtbc.
Comment #88
lauriiiThank you for the new patch @alexpott! Committed 7ff9cbf and pushed to 9.4.x. Thanks!
Leaving open for 9.3.x backport.
Comment #90
lauriiiBackporting to 9.3.x since this only impacts tests. Also got confirmation from @catch on Slack that he didn't have any concerns on backporting this.