Problem/Motivation
https://www.drupal.org/pift-ci-job/537928
https://www.drupal.org/pift-ci-job/535605
https://www.drupal.org/pift-ci-job/538079
Proposed resolution
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #150 | interdiff-144-150.txt | 976 bytes | tedbow |
| #150 | 2830485-150.patch | 7.47 KB | tedbow |
| #146 | 2830485-146.patch | 8.84 KB | michielnugter |
| #146 | interdiff-144-146.txt | 2.28 KB | michielnugter |
| #144 | 2830485-144.patch | 7.24 KB | michielnugter |
Comments
Comment #2
wim leersAlso seen at #2737719: EntityResource: Provide comprehensive test coverage: for every entity type, every format, every method, twice already.
Comment #3
catchSince this is an experimental module, we could disable the test and make this a major (and initiative blocking) bug to re-enable it. That's very tempting considering the total number of random fails we have.
Comment #4
wim leersMakes sense to me.
Comment #5
tim.plunkett+1, I know @tedbow and I will be busy for the next week or so, makes sense to disable it for now.
Comment #6
xjmI am unsure if this fail is specific to Outside In or a general issue with JSTB tests. Here are the HEAD fails for JSTB tests I have from the past three months:
This test (last happened in HEAD two weeks ago):
https://www.drupal.org/pift-ci-job/515769
https://www.drupal.org/pift-ci-job/517307
https://www.drupal.org/pift-ci-job/521757
Other Outside In test (last happened in HEAD in Sept.; maybe resolved?):
https://www.drupal.org/pift-ci-job/441943
https://www.drupal.org/pift-ci-job/443869
https://www.drupal.org/pift-ci-job/446521
Entity Reference test (last fail in HEAD a month ago, but may be happening on issues still):
https://www.drupal.org/pift-ci-job/482719
https://www.drupal.org/pift-ci-job/485367
https://www.drupal.org/pift-ci-job/483687
https://www.drupal.org/pift-ci-job/499549
https://www.drupal.org/pift-ci-job/500913
https://www.drupal.org/pift-ci-job/503075
https://www.drupal.org/pift-ci-job/503861
https://www.drupal.org/pift-ci-job/504181
https://www.drupal.org/pift-ci-job/508932
https://www.drupal.org/pift-ci-job/511642
https://www.drupal.org/pift-ci-job/514006
https://www.drupal.org/pift-ci-job/515292
https://www.drupal.org/pift-ci-job/515423
Drupal settings test (randomly failing as of last week):
https://www.drupal.org/pift-ci-job/532305
https://www.drupal.org/pift-ci-job/532305
Comment #7
tedbow+1, I will be busy for the next week or so, makes sense to disable it for now.
Comment #8
xjmI don't think disabling this test gains us too much unless we can prove the equally frequent fails for stable modules are unrelated.
Comment #9
klausiFrom a quick look at OutsideInBlockFormTest i can see that the method toggleEditingMode() presses a button at the end. But it never does an assertion after that - no confirmation that the button has done what it was supposed to do. What could happen is that the action of the button has not been finished yet while the following code already calls openBlockForm() which then fails.
In order to confirm that we should:
* Modify run-tests.sh to run OutsideInBlockFormTest 100 times and no other tests. That should give us a patch that always fails on the testbot.
* Modify OutsideInBlockFormTest and try to make the test fails go away, starting by fixing toggleEditingMode() to do an assertion in the end.
Comment #10
klausiTrying to run OutsideInBlockFormTest 50 times, nothing else. This should fail.
Comment #12
klausiThis fails, but for the wrong reasons:
Not sure what that means, haven't seen that before.
Queuing the patch again to see if it fails the same way a second time.
Comment #14
klausiI was trying to test this locally, but PhantomJS 2.1.1 would just hang and not even finish one test.
So we have 3 major problems with JavaScriptTestBase right now:
1) This random test fail in OutsideInBlockFormTest
2) If one test is repeated on the testbot as above a different random test fail appears
3) Local execution of JavaScriptTestBase failing, we need to find out what phantomjs version we use on the testbot.
Comment #15
xjm@mikeryan filed #2831603: Improve JSWebAssert::waitOnAutocomplete for the entity autocomplete one. I somehow failed to notice that that one only happens on postgres, so perhaps I'm wrong about that one being related to this one.
Comment #16
dawehnerIt is phantomjs-2.1.1, see containers/base/php-base-dev/Dockerfile
#2775653: JavascriptTests with webDriver is also a potential related issue.
Comment #17
michielnugter commented@klausi: I had the completely random failure and tests not finishing problem as well earlier. For my restarting phantomjs solved the problem, somehow it got stuck earlier on.
I tried to reproduce and after +/- 15 passes the test started to fail. While failing I heard my fan loudly indicating by mac had a hard time at running the tests. Maybe the failure of the test is related to the load on the server.
I checked the test and clicking the button triggers an ajax request. Shouldn't there be:
After clicking the button? All my other tests will fail if this isn't added.
I'm running the test now to see if it makes a difference.
EDIT: while running the test it got my mySQL server to crash, after that all the tests kept failing and I had to kill the process by hand. Error: PDO::__construct(): Error while reading greeting packet. PID=7558. Not sure if this is relevant.
Comment #18
michielnugter commentedBecause run-tests.sh isn't very informative in why a test failed (at least I don't know how to make it informative) I tried to reproduce the random fail by directly running phpunit and succeeded. I attached the script I used.
The failure I got was while working on other stuff and stressing my mac. The thing I noticed is that waitForElement() defaults to a 1000 ms timeout, which will not always be enough on my mac, this would explain the random failures here...
I'll try to increase it and let it run for an hour or so to see if there are more failures.
Comment #19
michielnugter commentedI increased the default timeout to 10000 ms and set the toggleEditingMode() timeout to the default (form 3000 to 10000).
Before these changes I had about 5 failures on ~100 runs, 2 different failures which the testbot both encountered. After the change I had 0 failures. I'm curious to hear if my patch has any merit or that I'm looking in the completely wrong direction..
Comment #20
klausi@michielnugter: interesting, running the tests directly with phpunit also works better for me locally and does not cause PhantomJS to hang.
Let's try increasing the timeouts and run OutsideInBlockFormTest 50 times again.
Comment #22
michielnugter commentedWas on the edge of my seat for this one, well, back to the drawing board then I guess.
I still think that 1000 ms is a bit short and will cause random failures at some point, it really did for me.
Comment #23
michielnugter commentedComment #24
klausiIncreasing the timeouts had an effect - now only 2 fails in my patch instead of 4 previously. So we are on the right track at least for parts of this issue.
Next we should update all our JS condition assertion code to fail if the timeout is over and the condition still is not met.
Comment #25
michielnugter commentedGlad it's at least an improvement.
I checked the tests and as far as I can tell the assertions already fail if the timeout passes and the condition is not true yet.
The assertTrue() should handle that. All wait...() methods use the assertJsCondition so they should be covered.
Comment #26
klausiNo, they do not fail if the timeout runs out: "Waits some time or until JS condition turns true."So this is a pretty big pitfall for us: we assumed that the test would fail if the time runs out, but it does not and simply returns. Which means we need to put in some extra evaluateScript() in the end to make absolutely sure that the JS condition is met. The test must fail if the JS condition is not met.Comment #28
klausiMerged in 8.3.x, no other changes.
Comment #29
michielnugter commentedWas working in merging as well, you beat me to it..
I looked a little and the wait that is used is the following I think:
File: vendor/jcalderonzumba/mink-phantomjs-driver/src/JavascriptTrait.php
This should return false as long as the condition does not evaluate and make the assertTrue in wait() fail.
Hope I'm wrong though and that the latest patch will fail 'correctly'..
Comment #30
michielnugter commentedComment #31
klausiUgh, sorry, it looks like you are right ...
Side note: there is a pretty severe performance issue in mink-phantomjs-driver/src/JavascriptTrait.php. As soon as the condition evaluates to TRUE the result is not returned but the process always waits 100ms, which is bad as our tests are slow already. We should file an upstream issue to fix this, potentially even implementing our own driver class until it is fixed.
Comment #32
klausiUpstream performance issue: https://github.com/jcalderonzumba/MinkPhantomJSDriver/pull/68
Comment #33
klausiIt is green, yay! 50 runs of OutsideInBlockFormTest are now successful.
Looks like we manged to increase all the relevant timeouts.
Removed the extra evaluateScript() calls. Removed run-test.sh script hacks, let's see if this also works with the rest of the tests in core.
Comment #34
michielnugter commentedYay! Good work klausi!
I reviewed the patch and it looks good as far as I can see. All the time limits are now longer and all tests using the timelimit now use it in a sensable way.
You changed all the calls to wait() to use assertJsCondition, would that mean that the wait() method is not the recommended way anymore? I actually do think so as waiting for a specific number of ms just doesn't make much sense. If you want the js condition the assertJsCondition is a valid solution for that and there would be no need for the wait() method. Right now only JSWebAssert::assertWaitOnAjaxRequest and JavascriptTestBase::tearDown use the wait() method. They could easily be converted to use the driver wait I think. This then would allow deprecating this method. If you agree I could open a new issue for this.
I also noticed the performance problem and figured to do something about it after this was fixed but I see that you already posted the fix, good work again! Hope it's picked up soon as it will shave of a couple of seconds from the whole test set.
Final consideration: are we sure this was the complete problem for the random fails? Everything seems to run smoothly. Just to be sure I'm going to add a couple of tests on the latest patch.
Comment #35
klausiThere are fails on PHP 5.6 & MySQL 5.5, queuing that again. It looks like 8.3.x is currently generally broken on PHP 5.6 & MySQL 5.5, so nothing we can do about that here. That HAL comment test was introduced by #2737719: EntityResource: Provide comprehensive test coverage: for every entity type, every format, every method.
I did the assertJsCondition() replacements to avoid specifying timeout numbers in tests and use the higher default values. Not sure we should deprecate anything yet, but we could discuss in a follow-up issue. Please open one and link it here.
One takeaway from this issue is definitely not using custom timeouts in JS tests that might be too low.
Wether we fixed all the things going wrong in this issue or not: the current patch is certainly an improvement and gives us more robustness. Anyone willing to RTBC this?
Comment #36
dawehnerWorks for me. Well, ideally we would document on wait that you should use
assertJsConditionthough. Do you think its fine to do that here?Comment #37
michielnugter commentedSetting it to RTBC then.
I looked into documenting wait() but it seems it's out of scope, the method is defined in Mink (Behat\Mink\Session). This would also mean that deprecating it is impossible for now. I'm not familiar enough with Mink to be able to say if it's possible to use our own Session object but maybe even then it's a bridge too far to just extend the Session object to be able to deprecate a method (which will never actually be removed)..
Comment #38
michielnugter commentedI ran the 5.6 & mysql 5.5 again yesterday as well to check if it was another random fail. 3 fails in a row don't seem random to me either and unrelated. Is there already an issue open for that or not yet?
Comment #39
wim leersThanks for all the research that went into this issue!
These are bigger culprits in this particular issue; they're what increase the fail rate in
OutsideInBlockFormTest.Changes here are to increase systematic reliability of all Functional JavaScript tests.
Comment #40
alexpottNice sleuthing and work... also I guess it points to the fact that the performance of the test infra is somewhat variable.
Committed and pushed fdbc6eb to 8.3.x and c1a0b23 to 8.2.x. Thanks!
Comment #43
tedbow@klausi and @michielnugter thank you so much for your work on this!
Comment #44
wim leersHm… https://www.drupal.org/pift-ci-job/542596 just failed in the same way again :(
Presumably because this issue only reduced the likelihood of random fails?
Comment #45
berdiralso here: https://www.drupal.org/pift-ci-job/542605
Doesn't look less frequent to me than before?
Comment #46
xjm:(
If it is not fixed, maybe we should revert the previous commits?
Comment #47
catchSuggest we do this for now.
Comment #48
alexpottI don't think reverting is right - since the longer timeouts result in less failures - looking at the evidence provided here.
#47 is sad - since we have no plan on how to turn in on again.
Comment #49
catchIt's an experimental module so it's allowed to be sad - better than making the core queue in general sad with random failures.
Comment #51
alexpott@catch - good point experimental modules shouldn't be making the core queue sad - let's mark the test as skipped.
Comment #52
catch@alexpott suggested markTestAsSkipped().
Comment #54
catchIt's that kind of day.
Comment #55
dawehnerThis throws an exception so we don't really need a
returnstatement.Comment #56
klausiAgreed, that return statement can be removed.
Comment #58
alexpottDiscussed with @catch we agreed to commit #56 and set this issue back to needs work...
Committed and pushed ce52d33 to 8.3.x and 41e5767 to 8.2.x. Thanks!
Comment #61
klausiBack to reproducing the test fails first by running OutsideInBlockFormTest 50 times. This should fail.
Comment #62
klausiOpened #2832672: [PP-1] Upgrade jcalderonzumba/* for better test performance (found in this issue, but otherwise unrelated).
Comment #64
alexpottSince we're not running the test I don't think this should be a critical issue - but having a skipped test is a major bug to me because we have less test coverage of something we thought was worth testing.
Comment #65
klausiTests are failing as expected, yay!
The next step in this issue is to use a hacked version of Zumba\GastonJS\Exception\BrowserError and print out more info so that we know what the actual PhantomJS error was.
Comment #66
klausiI forked gastonjs for debugging into https://github.com/klausi/gastonjs , changed this: https://github.com/jcalderonzumba/gastonjs/compare/master...klausi:debug...
Changed the composer dependency in the patch, let's see if we get more info with this.
Comment #67
klausiHm, this should have failed on the testbot. Let's do the test 100 times to provoke the fail.
Comment #69
michielnugter commentedHmm, doesn't seem to pick up your altered gastonjs yet. The error is as unclear as it was before..
Comment #70
michielnugter commentedComment #71
klausiRight, sorry about that, I should have checked that the correct stuff is in the zip ball referenced in composer.lock.
Comment #73
michielnugter commented:( Well, that's not much better.
Is it in any way possible to create screenshots with the testbot?
Alternative: mail the HTML of the page just before line 88, would have no problem in receiving loads of e-mails to find the problem here.
Comment #74
michielnugter commentedComment #75
michielnugter commentedTried about everything but just can't reproduce it. Added try-catch + mail/HTML debug, curious to see if that works..
(please don't shoot me on the mega-ugly mail, just didn't feel like really trying to do it the right way)
Comment #76
klausiOh, testbots are not allowed to send email, so that email will never reach you. I think we have to squeeze any output into the assertion message to actually see it. Or install the testbot locally.
Comment #78
michielnugter commentedYep, looks like it, the test-bot failed there. Changed it into asssertEquals('')..
Don't know yet how to configure the testbot locally. Would be very usefull, with xdebug within the tests themselves would be golden.. Should try to get that running some other time..
Comment #79
michielnugter commentedComment #81
michielnugter commentedI attached the screenshot (well, this method works :)), gone close my laptop for today, I'll try to look into it tomorrow. Looks wrong though (editing still visible while the test just tried to close it), maybe the previous wait isn't good enough..
Comment #82
michielnugter commentedFor reference, this screenshot is what it should have looked like.
So it's not actually the click that fails but it's the closing of the tray:
That fails I think. Could also be that the initial load of the /user page is incorrect.
Comment #83
michielnugter commentedAdded an initial screenshot and a check if the tray was open or not to narrow down the problem area.
Comment #84
michielnugter commentedHmm, this is weird. Now it keeps passing (2 completed runs).
I still do think I we can get something out of this. The initial screenshot takes some time in the test which causes a longer time before the assertions start.
This could mean that the page actually hasn't finished executing all behaviors yet or ....?
Not sure on how to fix this. Random attempt: added a assertWaitOnAjaxRequest() and increased the number of test runs to 200 just to be sure.
Comment #85
michielnugter commentedInteresting, it now passes. Seems that my previous conclusion maybe isn't that far fetched. Does drupalGet() perform checks to see if the page is really finished loading and excuting any JavaScript code/ajax requests?
Comment #86
klausiNice find! That would mean we should add assertWaitOnAjaxRequest() to every GET request so that we can be sure the page is really ready for further action.
Comment #87
michielnugter commentedThat would be a solution. But first, do you agree with my finding?
I thought about it some more. I don't know yet when PhantomJS reports back that it finished loading. If I assume the load event, which should mean all content is loaded thus sounding reasonable to me, it could still be too early. This would be too early for pages that have AJAX requests triggered in behaviors (which are triggered in documentready). These requests might not be finished in time before load is triggered. This test is an example of a situation where ajax requests are fired in behaviors.
Side note: The more tests I write the more unfortunate I find the name of the method assertWaitOnAjaxRequest, assertWaitOnJavascript() would be a better name I think.
Comment #88
michielnugter commented400 iterations now and it passes (your test and mine). It does seem like a fix that could work and is not too far fetched..
Also very curious to see if this would be the fix for the other random failures.. Was planning on looking into #2831603: Improve JSWebAssert::waitOnAutocomplete altough I don't have postgress installed..
BTW: I do think this will remain relevant, whatever driver / browser is used (#2775653: JavascriptTests with webDriver).
Comment #89
klausiYes, I agree with your findings. PhantomJS and the driver we use is still a black box to me, but we have demonstrated here that we can fix the random fails this way. This might only be another step to get our javascript tests better under control, not a final solution. It is still progress.
I queued a bunch of test runs for the patch above to make sure the 200 OutsideInBlockFormTest tests also pass in all environments.
Let's see if this approach also works for all our other tests.
Comment #90
tim.plunkettAdditionally, there are some tests doing $this->submitForm() followed by assertWaitOnAjaxRequest(), which makes sense.
Maybe we should add it there too?
I couldn't find any places already doing a assertWaitOnAjaxRequest() after a drupalGet(), mostly after things like pressButton/clickLink/selectFieldOption.
Comment #91
tedbowShould we make an optional parameter $wait_for_javascript = TRUE. Would there be tests that don't have ajax on page where they might want to skip this or would the extra check never be a problem?
Comment #92
michielnugter commented@tim.plunkett
Is the usage of submitForm() the correct way of submitting a form in JTB? Untill now I've just been pressing the submitbutton. I does seem usefull to make this the best practice as it does other things as well. Changing it to the best practice does solve the problem as we could add the assertWaitOnAjaxRequest here.
@tedbow
The call shouldn't be a problem as it will immediatly return if there are no AJAX calls running thus not slowing down your page. I can't think of a situation where it could harm your test to have it there.
Comment #93
alexpottNice sleuthing. I think #91 is not a problem because the check is before waiting. See
\Zumba\Mink\Driver\JavascriptTrait::wait(). However, I agree that making this optional behaviour seems like a good idea. I've looked at the code to see if we can do something lower down that affects both drupalGet() and submitForm() - but it does not seem simple. These methods call \Zumba\GastonJS\Browser\BrowserNavigateTrait::visit() and \Zumba\GastonJS\Browser\BrowserMouseEventTrait::click() respectively.Comment #94
alexpottNice sleuthing. I think #91 is not a problem because the check is before waiting. See
\Zumba\Mink\Driver\JavascriptTrait::wait(). However, I agree that making this optional behaviour seems like a good idea. I've looked at the code to see if we can do something lower down that affects both drupalGet() and submitForm() - but it does not seem simple. These methods call \Zumba\GastonJS\Browser\BrowserNavigateTrait::visit() and \Zumba\GastonJS\Browser\BrowserMouseEventTrait::click() respectively.Comment #95
michielnugter commentedI searched for something lower down as well but I couldn't find anything. The get and button click are two completely different paths in causing a page reload. If you submit the form yourself by clicking a button there is no way to intercept it and add the wait as far as I can see, which would mean that we should reject patches that use this method in favor of the submitForm method.
Don't have time anymore to change the patch, very sorry about that. Gotta go for now, if it's not fixed next tuesday I'll continue to help out!
Comment #96
michielnugter commentedComment #97
alexpottSo we could implement out own PhantomJSDriver class that extends \Zumba\Mink\Driver\PhantomJSDriver and uses it's own \Zumba\GastonJS\Browser\Browser that extends that class but by default waits for JS to ajax to finish after doing a command. We'd just have to wrap \Zumba\GastonJS\Browser\BrowserBase::command() to do this.
Comment #98
dawehnerWell, I'm wondering whether this is something we could also fix upstream somehow. I totally imagine that other users of this library might have similar problems.
Comment #99
michielnugter commented@dawehner
Fixing it upsteam would nice but that has it's own problems.
The only, non-hacky, way to see if an AJAX request is running is by using jQuery.active (which Drupal currently does). This a jQuery specific option that only works as long as you use the jQuery methods for AJAX. There is no, standard, way of telling if an XMLHttpRequest is running. You'll need to overwrite some of the XMLHttpRequest functionality as suggested in:
http://stackoverflow.com/questions/10783463/javascript-detect-ajax-requests
This could be done of course but I'm not sure about overwriting such a basic functionality in a test suite. Then again, having a more low-level way of checking for activity would be nice.
Comment #100
alexpott@michielnugter, @dawehner I wonder if rather than fixing this upstream is there anyway we can customise
vendor/jcalderonzumba/gastonjs/src/Client/main.jsto help us?Comment #101
michielnugter commentedI've been discussing this issue offline with lendude and he had a nice insight: "You should know when writing a test when to wait for additional processing to finish.".
I agree with him. Enabling the outside-in module means some js and ajax requests will be triggered on load. The test itself should make sure that it waits for this processing to finish.
We could expand the doxygen for drupalGet to explain this and give a hint. I've also been working on the documentation here: https://www.drupal.org/docs/8/phpunit/phpunit-javascript-testing-tutorial. This page could be expanded with these kind of things. (Section: how to load a page).
What do you think?
EDIT: About altering main.js. Yes it should be possible but only in a hackish way and I'm really not sure about this route. Implementing our own Driver and Browser seem like better ideas to me if we want to implement this low-level.
Comment #102
klausiI agree that we should fix OutsideInBlockFormTest first with an explicit wait for the Ajax to complete. Writing our own drivers or wait logic in the PhantomJS bindings might be taking this a bit too far for now. Even the addition to drupalGet() I proposed earlier might be a bit too much for this issue.
#2832672: [PP-1] Upgrade jcalderonzumba/* for better test performance has to deal with new failures of our existing tests anyway, so let's figure out there if we need additional Ajax waiting and potentially our own driver modifications.
Comment #103
michielnugter commentedI created the patch with only a change for the outside-in test with an additional version that runs 200 times.
Comment #104
klausiThanks, this should all come back green.
Adding a comment why we need to wait, but otherwise this should be RTBC.
Comment #105
michielnugter commentedI added some notes and tips for this on the documentation page:
https://www.drupal.org/docs/8/phpunit/phpunit-javascript-testing-tutorial
I recently updated this page based on my experiences so far..
Comment #106
lendude@michielnugter++, @klausi++, RTBC++
Comment #107
xjmLet's just confirm that the 200 run patch again would actually fail without the fix.
Comment #108
tedbowWould it make sense to override drupalGet() in \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInJavascriptTestBase then call $web_assert->assertWaitOnAjaxRequest(); from there? Then any new test method won't have to do that same thing
Other issue that introduce test methods:
#2786193: Differentiate 2 "Quick edit" links for custom blocks
#2782915: Standardize the behavior of links when Outside In editing mode is enabled
Comment #109
xjmFor #108.
Comment #111
xjm#107 failed 3 times, so that helps confirm the fix. Thanks @michielnugter!
Reuploading #104 and queuing it several times. NR still for #108.
Edit: meant to requeue the 200 run patch. Sorry testbot. :)
Comment #112
dawehnerI think I agree with @tedbow, in case we don't put this into drupalGet or somehow (even if its hard) on the lower levels we end up with more random failures in the future. On the other hand fixing this random failure now helps for itself as well.
Comment #113
tim.plunkettThis should do it, right?
Note there's also \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInJavascriptTestBase::waitForOffCanvasToOpen() but I think that assumes the tray is open.
Comment #114
tedbow@tim.plunkett thanks for making the patch. Looks good!
Comment #115
xjmComment #116
xjmI'm fine with fixing this in all outside in tests for now, but it seems like this problem is going to come up again and again in JS tests. Should we file a followup to discuss a more general way of addressing the problem?
Comment #117
tim.plunkettAdding
$this->assertSession()->assertWaitOnAjaxRequest();after link clicks or button presses is pretty common, and you can't really write a test without it.AFAIK this is the first time we've encountered needing to wait for JS running on page load, with no other interaction, which is why it caught us all.
Not sure if there *is* a better way, other than thinking very carefully about what the page is doing!
Comment #118
michielnugter commentedIt seems logical for me to fix for every outside-in test, I agree completely with the latest patch.
The problem might come up again but I hope that the extended documentation will help with that and maybe we should add something to the drupalGet documentation to indicate this might be a problem. I really don't know if adding a fix for this low-level is a good idea. The behavior will be specific to a few tests only and there are situations where you might not want it (maybe the big-pipe tests?).
Comment #119
dawehnerI'm wondering whether this approach (wrap the driver and wait on every click, visit and form submission.
Comment #121
alexpottI vote for doing #115 here and opening a followup to discuss. However with #115 there still seems to be a random error - https://www.drupal.org/pift-ci-job/551149
Comment #122
dawehnerI agree totally, I think though trying to do the research later would be cool!
Comment #123
tedbowOk chatted with @dawehner briefly about this he suggested using a data provider method for the block info so the test isn't looping in through multiple blocks and making multiple drupalGet() calls.
Maybe the browser gets slower as the test goes on? How knows it's random.
Anyways leaving previous method where it tests all blocks in 1 go and a new one which test each block with a separate drupal install using a dataprovider. For now I changed as little code as possible so they both call the new function individualBlockTest(). Will need more factoring if determine data provider solves the problem.
UPDATE:
I missed this but agreed.
Will update this patch anyways just to see.
Comment #125
tedbowRetesting
Not sure what this means DrupalCI "ERROR: Step ?Publish JUnit test result report? failed: No test report files were found. Configuration error?"
Comment #127
tedbowOk actually looking back on the fail in #123 I saw "Build timed out (after 110 minutes). Marking the build as aborted."
Going to upload a patch that just does the dataprovider version 200 times. Hopefully it won't time out but maybe because 200 x 3 providers runs = 600 installs.
Comment #129
tedbowTime out again. Got 125 tests done. So over half way there.
Going to change to just 100 tests. hopefully pace if running at same rate.
Comment #130
michielnugter commentedI tried 500 tests in another issue and that was too much as well, seems that 200-300 is a max of runs the testbot can handle for the functional javascript tests.
I added a bunch of tests to see if the new patch holds up. If everything passes I still think we should keep an eye on this issue, it seems to me like a work-a-round instead of an actual fix?
I'm curious: does the testbot restart PhantomJS for each test? The PhantomJS browser really acts as a normal browser. I had a problem earlier with javascript changes not beeing picked up in my test, turns out it was using a browser cached version, restarting PhantomJS fixed it. If PhantomJS is not restarted each time this might cause it's own problems.
Comment #131
xjmUnfortunately, #115 is not actually resolving the bug. See https://www.drupal.org/pift-ci-job/551149. Since the
drupalGet()approach was therefore not actually fixing the bug completely, probably we should remove it from the patch and see what the effect of the data provider change is on its own?Comment #132
xjmAgreed @michielnugter.
Comment #133
michielnugter commentedThis patch is without the overridden drupalGet().
The mentioned fail is a different one as before, the fail is later in the test but it might still be the same fail and thus the attempts so far would not be good enough.
I think it pops up later because of the changed way of testing. maybe because of a separate test for each block the tray is never expanded on reload?
Comment #134
xjmQueued more runs for #133.
What's interesting is that #103 did not show the fail #115 did, so I queued more of #103 too.
One possibility is that we are lowering the failure rate but not necessarily fixing it, since it's related to how long the test takes to do stuff.
Comment #135
michielnugter commentedAlright, #132 fails as I expected. The drupalGet() fix actually does good as I see it now.
I do think we're lowering the possibility of a random fail with each of these fixes. All fixes up till now we're explainable, very plausible and had a positive effect on the number of random fails.
I think what we're seeing now is the effect of the JTB tests beeing new to Drupal and the theething problems that come with that.
EDIT: About the #103 not failing and #115 failing: I think #103 is just as likely to have the same random failure because it comes down to the same fix.
Comment #137
tedbowOk so it seems the only setup that hasn't failed is using a dataprovider and overriding drupalGet().
So just to recap
Both fail.
#129 uses dataprovider and overriding drupalGet().
Passes
This patch is effectively the same as #129 but cleaned up.
I could actually see where we might need these 2 things together.
Could be that without the dataprovider the second or third time around the loop, subsequent drupalGet requests, the JS is somehow slower. So using the dataprovider fixes the slowness because there is an install instead of multiple requests. But there is still enough slowness even with the dataprovider that requests only work if we add the wait after the parent drupalGet().
Comment #138
tedbowWhat the @#$%@$%?
So now a random failure on a totally different issue?
Comment #139
michielnugter commentedHad just about the same reaction...
I really seems like a different issue than before. It's not an error immediately after loading a page.
But in this case, aren't there 2 ajax requests in a row? I see one POST and 1 get. The get is fired immediatly after the POST it seems. If the test checks for jquery activity exactly between these two requests it seems logical that the test fails.
So yes a random fail, yes a different one and again explainable I think.
Maybe we should test on the result using a js condition in this case?
Comment #140
michielnugter commentedPatch with the changed assertion. Let's see what this does.
Comment #142
michielnugter commentedSorry about that one, fail was actually reproduceable locally. Updated the patch, this one works locally..
Comment #143
michielnugter commentedLooking good, added another complete test run to see if it will hold up.
Comment #144
michielnugter commentedVersion without the 100 runs for RTBC consideration
Comment #145
tedbow@michielnugter
I like the change but we should probably put in a comment as to why.
I think I could forget why we aren't using
pageTextContains($new_page_text);We could even make a method assertJSElementHTML($selector, $html)
but maybe not needed.
Comment #146
michielnugter commentedIs something like this what you're looking for?
I added the methods to JavascriptTestBase, don't know if they should be placed somewhere else?
Comment #147
tedbow@michielnugter that looks good. If we get push back on the change to JavascriptTestBase then probably just your added comment will do but we can wait and see about that
Crossing my fingers and hoping there is little less randomness in the universe today ;)
Comment #148
lendudeGreat work!
yeah personally not a fan of adding extra assertions that are just wrappers for another assertion. If we see this pattern used more then it might be useful and we can always add and refactor later.
But it's just personal opinion, I know there are other people that do like this since it leads to more readable test code (which I think is a pipe dream anyway).
Comment #149
michielnugter commentedI created #2837676: Provide a better way to validate all javascript activity is completed which could be a fix for this issue. I'd like to know if the created issue is a good solution for this problem.
Comment #150
tedbow@michielnugter sorry for going back and forth on this but now that you have opened up #2837676: Provide a better way to validate all javascript activity is completed
It seems like the patch in #144 cover what is needed without affecting adding anything outside of this module.
I think we should go with the approach in #144.
This patch simply adds a @todo so that we can simply use \Behat\Mink\WebAssert::pageTextContains(). if #2837676 provides a new wait method.
That way won't need the new assert method.
Comment #151
michielnugter commented@tedbow I agree, I'm also in favor of committing #150 and fix the random errors for this specific case and tackle the rest of the random fails in #2837676: Provide a better way to validate all javascript activity is completed with the improved validation.
So, RTBC?
Comment #152
tedbow@michielnugter yep think we are RTBC!
Comment #155
xjmI agree, this seems like the best course for now. I'm also glad we have the followup in #2837676: Provide a better way to validate all javascript activity is completed.
Really great and patient work on this issue! Committed and pushed to 8.3.x and 8.2.x.
Comment #157
xjmComment #158
mpdonadioNot sure if it is related, but still seeing problems, #2848177: More \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInBlockFormTest random fails.
Comment #159
tedbowChanging to new settings_tray.module component. @drpal thanks for script help! :)