Problem/Motivation

https://www.drupal.org/pift-ci-job/537928
https://www.drupal.org/pift-ci-job/535605
https://www.drupal.org/pift-ci-job/538079

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

CommentFileSizeAuthor
#150 interdiff-144-150.txt976 bytestedbow
#150 2830485-150.patch7.47 KBtedbow
#146 2830485-146.patch8.84 KBmichielnugter
#146 interdiff-144-146.txt2.28 KBmichielnugter
#144 2830485-144.patch7.24 KBmichielnugter
#144 interdiff-142-144.txt538 bytesmichielnugter
#142 2830485-142-100x.patch7.85 KBmichielnugter
#142 interdiff-140-142.txt2.17 KBmichielnugter
#140 2830485-140-100x.patch7.75 KBmichielnugter
#140 interdiff-137-140.txt729 bytesmichielnugter
#137 2830485-137-x100.patch7.7 KBtedbow
#137 interdiff-122-137.txt6.02 KBtedbow
#137 2830485-137-x100.patch7.7 KBtedbow
#133 2830485-132-100_tests_provider.patch7.83 KBmichielnugter
#133 interdiff-122-132.txt743 bytesmichielnugter
#129 interdiff-127-128.txt635 bytestedbow
#129 2830485-122-100_tests_provider.patch8.76 KBtedbow
#127 2830485-122-200_tests_provider.patch8.76 KBtedbow
#123 interdiff-113-122.txt6.99 KBtedbow
#123 2830485-122.patch8.11 KBtedbow
#123 2830485-1220-200_tests.patch8.72 KBtedbow
#119 2830485-119.patch6.87 KBdawehner
#115 2830485-outside_in-113.patch1.73 KBxjm
#115 2830485-113-200-tests.patch2.34 KBxjm
#113 2830485-outside_in-113-interdiff.txt1.78 KBtim.plunkett
#113 2830485-outside_in-113.patch1.73 KBtim.plunkett
#111 outsidein-fails-2830485-104.patch1.28 KBxjm
#107 outsidein-fails-2830485-103-200-runs-FAIL.patch1.41 KBxjm
#104 outsidein-fails-2830485-104.patch1.28 KBklausi
#103 outsidein-fails-2830485-103-200-runs.patch1.77 KBmichielnugter
#103 outsidein-fails-2830485-103.patch1.16 KBmichielnugter
#103 interdiff-89-102.txt1.37 KBmichielnugter
#92 interdiff-89-92.txt1.01 KBmichielnugter
#89 interdiff.txt623 bytesklausi
#89 outsidein-fails-2830485-89.patch1.7 KBklausi
#86 interdiff.txt6.09 KBklausi
#86 outsidein-fails-2830485-85.patch2.31 KBklausi
#84 outsidein-fails-2830485-84.patch6.24 KBmichielnugter
#83 outsidein-fails-2830485-83.patch6.29 KBmichielnugter
#82 correct.jpg28.45 KBmichielnugter
#81 screenshot.jpg24 KBmichielnugter
#78 outsidein-fails-2830485-78.patch5.68 KBmichielnugter
#75 outsidein-fails-2830485-75.patch6.89 KBmichielnugter
#75 interdiff-71-75.txt2.46 KBmichielnugter
#71 interdiff.txt1.23 KBklausi
#71 outsidein-fails-2830485-71.patch4.74 KBklausi
#67 interdiff.txt633 bytesklausi
#67 outsidein-fails-2830485-67.patch4.74 KBklausi
#66 interdiff.txt3.33 KBklausi
#66 outsidein-fails-2830485-66.patch4.74 KBklausi
#61 outsidein-fails-2830485-61.patch1.41 KBklausi
#56 disable-outsidein-test-2830485-56.patch823 bytesklausi
#54 2830485.patch834 bytescatch
#52 2830485.patch836 bytescatch
#47 2830485.patch909 bytescatch
#33 interdiff.txt3.32 KBklausi
#33 outsidein-fails-2830485-33.patch7.66 KBklausi
#28 outsidein-fails-2830485-28.patch10.3 KBklausi
#26 outsidein-fails-2830485-26.patch10.29 KBklausi
#20 outsidein-fails-2830485-19.patch4.13 KBklausi
#19 outsidein-fails-2830485-19.patch1.6 KBmichielnugter
#18 outsidein-fails-2830485-18.patch3.19 KBmichielnugter
#10 outsidein-fails-2830485-10.patch621 bytesklausi
Members fund testing for the Drupal project. Drupal Association Learn more

Comments

catch created an issue. See original summary.

Wim Leers’s picture

Title: Outside in test fails randomly » \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInBlockFormTest fails randomly
Issue summary: View changes
catch’s picture

Since this is an experimental module, we could disable the test and make this a major (and initiative blocking) bug to re-enable it. That's very tempting considering the total number of random fails we have.

Wim Leers’s picture

Makes sense to me.

tim.plunkett’s picture

+1, I know @tedbow and I will be busy for the next week or so, makes sense to disable it for now.

tedbow’s picture

+1, I will be busy for the next week or so, makes sense to disable it for now.

xjm’s picture

I don't think disabling this test gains us too much unless we can prove the equally frequent fails for stable modules are unrelated.

klausi’s picture

From a quick look at OutsideInBlockFormTest i can see that the method toggleEditingMode() presses a button at the end. But it never does an assertion after that - no confirmation that the button has done what it was supposed to do. What could happen is that the action of the button has not been finished yet while the following code already calls openBlockForm() which then fails.

In order to confirm that we should:
* Modify run-tests.sh to run OutsideInBlockFormTest 100 times and no other tests. That should give us a patch that always fails on the testbot.
* Modify OutsideInBlockFormTest and try to make the test fails go away, starting by fixing toggleEditingMode() to do an assertion in the end.

klausi’s picture

Trying to run OutsideInBlockFormTest 50 times, nothing else. This should fail.

Status: Needs review » Needs work

The last submitted patch, 10: outsidein-fails-2830485-10.patch, failed testing.

klausi’s picture

This fails, but for the wrong reasons:

Zumba\GastonJS\Exception\BrowserError: There was an error inside the PhantomJS portion of GastonJS.
This is probably a bug, so please report it:
click
html.touchevents.details.js body.layout-one-sidebar.layout-sidebar-first.user-logged-in.path-user.toolbar-fixed.toolbar-horizontal div#toolbar-administration.toolbar.toolbar-oriented nav#toolbar-bar.toolbar-bar.clearfix.js-outside-in-edit-mode div.contextual-toolbar-tab.toolbar-tab button.toolbar-icon.toolbar-icon-edit.toolbar-item.is-active

/var/www/html/vendor/jcalderonzumba/gastonjs/src/Browser/BrowserBase.php:122
/var/www/html/vendor/jcalderonzumba/gastonjs/src/Browser/BrowserBase.php:99
/var/www/html/vendor/jcalderonzumba/gastonjs/src/Browser/BrowserMouseEventTrait.php:17
/var/www/html/vendor/jcalderonzumba/mink-phantomjs-driver/src/MouseTrait.php:31
/var/www/html/vendor/behat/mink/src/Element/NodeElement.php:153
/var/www/html/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInBlockFormTest.php:88

Not sure what that means, haven't seen that before.

Queuing the patch again to see if it fails the same way a second time.

The last submitted patch, 10: outsidein-fails-2830485-10.patch, failed testing.

klausi’s picture

I was trying to test this locally, but PhantomJS 2.1.1 would just hang and not even finish one test.

So we have 3 major problems with JavaScriptTestBase right now:
1) This random test fail in OutsideInBlockFormTest
2) If one test is repeated on the testbot as above a different random test fail appears
3) Local execution of JavaScriptTestBase failing, we need to find out what phantomjs version we use on the testbot.

xjm’s picture

@mikeryan filed #2831603: Improve assertion for testEntityReferenceAutocompleteWidget for the entity autocomplete one. I somehow failed to notice that that one only happens on postgres, so perhaps I'm wrong about that one being related to this one.

dawehner’s picture

3) Local execution of JavaScriptTestBase failing, we need to find out what phantomjs version we use on the testbot.

It is phantomjs-2.1.1, see containers/base/php-base-dev/Dockerfile

#2775653: JavascriptTests with webDriver is also a potential related issue.

michielnugter’s picture

@klausi: I had the completely random failure and tests not finishing problem as well earlier. For my restarting phantomjs solved the problem, somehow it got stuck earlier on.

I tried to reproduce and after +/- 15 passes the test started to fail. While failing I heard my fan loudly indicating by mac had a hard time at running the tests. Maybe the failure of the test is related to the load on the server.

I checked the test and clicking the button triggers an ajax request. Shouldn't there be:

$this->assertSession()->assertWaitOnAjaxRequest();

After clicking the button? All my other tests will fail if this isn't added.

I'm running the test now to see if it makes a difference.

EDIT: while running the test it got my mySQL server to crash, after that all the tests kept failing and I had to kill the process by hand. Error: PDO::__construct(): Error while reading greeting packet. PID=7558. Not sure if this is relevant.

michielnugter’s picture

Because run-tests.sh isn't very informative in why a test failed (at least I don't know how to make it informative) I tried to reproduce the random fail by directly running phpunit and succeeded. I attached the script I used.

The failure I got was while working on other stuff and stressing my mac. The thing I noticed is that waitForElement() defaults to a 1000 ms timeout, which will not always be enough on my mac, this would explain the random failures here...

I'll try to increase it and let it run for an hour or so to see if there are more failures.

michielnugter’s picture

Status: Needs work » Needs review
FileSize
1.6 KB

I increased the default timeout to 10000 ms and set the toggleEditingMode() timeout to the default (form 3000 to 10000).

Before these changes I had about 5 failures on ~100 runs, 2 different failures which the testbot both encountered. After the change I had 0 failures. I'm curious to hear if my patch has any merit or that I'm looking in the completely wrong direction..

klausi’s picture

@michielnugter: interesting, running the tests directly with phpunit also works better for me locally and does not cause PhantomJS to hang.

Let's try increasing the timeouts and run OutsideInBlockFormTest 50 times again.

Status: Needs review » Needs work

The last submitted patch, 20: outsidein-fails-2830485-19.patch, failed testing.

michielnugter’s picture

Status: Needs work » Needs review

Was on the edge of my seat for this one, well, back to the drawing board then I guess.

I still think that 1000 ms is a bit short and will cause random failures at some point, it really did for me.

michielnugter’s picture

Status: Needs review » Needs work
klausi’s picture

Increasing the timeouts had an effect - now only 2 fails in my patch instead of 4 previously. So we are on the right track at least for parts of this issue.

Next we should update all our JS condition assertion code to fail if the timeout is over and the condition still is not met.

michielnugter’s picture

Glad it's at least an improvement.

I checked the tests and as far as I can tell the assertions already fail if the timeout passes and the condition is not true yet.

  protected function assertJsCondition($condition, $timeout = 10000, $message = '') {
    $message = $message ?: "Javascript condition met:\n" . $condition;
    $result = $this->getSession()->getDriver()->wait($timeout, $condition);
    $this->assertTrue($result, $message);
  }

The assertTrue() should handle that. All wait...() methods use the assertJsCondition so they should be covered.

klausi’s picture

Status: Needs work » Needs review
FileSize
10.29 KB

No, they do not fail if the timeout runs out: "Waits some time or until JS condition turns true."

So this is a pretty big pitfall for us: we assumed that the test would fail if the time runs out, but it does not and simply returns. Which means we need to put in some extra evaluateScript() in the end to make absolutely sure that the JS condition is met. The test must fail if the JS condition is not met.

Status: Needs review » Needs work

The last submitted patch, 26: outsidein-fails-2830485-26.patch, failed testing.

klausi’s picture

Status: Needs work » Needs review
FileSize
10.3 KB

Merged in 8.3.x, no other changes.

michielnugter’s picture

Status: Needs review » Needs work

Was working in merging as well, you beat me to it..

I looked a little and the wait that is used is the following I think:

File: vendor/jcalderonzumba/mink-phantomjs-driver/src/JavascriptTrait.php

  public function wait($timeout, $condition) {
    $start = microtime(true);
    $end = $start + $timeout / 1000.0;
    do {
      $result = $this->browser->evaluate($condition);
      usleep(100000);
    } while (microtime(true) < $end && !$result);

    return (bool)$result;
  }

This should return false as long as the condition does not evaluate and make the assertTrue in wait() fail.

Hope I'm wrong though and that the latest patch will fail 'correctly'..

michielnugter’s picture

Status: Needs work » Needs review
klausi’s picture

Ugh, sorry, it looks like you are right ...

Side note: there is a pretty severe performance issue in mink-phantomjs-driver/src/JavascriptTrait.php. As soon as the condition evaluates to TRUE the result is not returned but the process always waits 100ms, which is bad as our tests are slow already. We should file an upstream issue to fix this, potentially even implementing our own driver class until it is fixed.

klausi’s picture

klausi’s picture

It is green, yay! 50 runs of OutsideInBlockFormTest are now successful.

Looks like we manged to increase all the relevant timeouts.

Removed the extra evaluateScript() calls. Removed run-test.sh script hacks, let's see if this also works with the rest of the tests in core.

michielnugter’s picture

Yay! Good work klausi!

I reviewed the patch and it looks good as far as I can see. All the time limits are now longer and all tests using the timelimit now use it in a sensable way.

You changed all the calls to wait() to use assertJsCondition, would that mean that the wait() method is not the recommended way anymore? I actually do think so as waiting for a specific number of ms just doesn't make much sense. If you want the js condition the assertJsCondition is a valid solution for that and there would be no need for the wait() method. Right now only JSWebAssert::assertWaitOnAjaxRequest and JavascriptTestBase::tearDown use the wait() method. They could easily be converted to use the driver wait I think. This then would allow deprecating this method. If you agree I could open a new issue for this.

I also noticed the performance problem and figured to do something about it after this was fixed but I see that you already posted the fix, good work again! Hope it's picked up soon as it will shave of a couple of seconds from the whole test set.

Final consideration: are we sure this was the complete problem for the random fails? Everything seems to run smoothly. Just to be sure I'm going to add a couple of tests on the latest patch.

klausi’s picture

There are fails on PHP 5.6 & MySQL 5.5, queuing that again. It looks like 8.3.x is currently generally broken on PHP 5.6 & MySQL 5.5, so nothing we can do about that here. That HAL comment test was introduced by #2737719: EntityResource: Provide comprehensive test coverage: for every entity type, every format, every method.

I did the assertJsCondition() replacements to avoid specifying timeout numbers in tests and use the higher default values. Not sure we should deprecate anything yet, but we could discuss in a follow-up issue. Please open one and link it here.

One takeaway from this issue is definitely not using custom timeouts in JS tests that might be too low.

Wether we fixed all the things going wrong in this issue or not: the current patch is certainly an improvement and gives us more robustness. Anyone willing to RTBC this?

dawehner’s picture

Works for me. Well, ideally we would document on wait that you should use assertJsCondition though. Do you think its fine to do that here?

michielnugter’s picture

Status: Needs review » Reviewed & tested by the community

Setting it to RTBC then.

I looked into documenting wait() but it seems it's out of scope, the method is defined in Mink (Behat\Mink\Session). This would also mean that deprecating it is impossible for now. I'm not familiar enough with Mink to be able to say if it's possible to use our own Session object but maybe even then it's a bridge too far to just extend the Session object to be able to deprecate a method (which will never actually be removed)..

michielnugter’s picture

I ran the 5.6 & mysql 5.5 again yesterday as well to check if it was another random fail. 3 fails in a row don't seem random to me either and unrelated. Is there already an issue open for that or not yet?

Wim Leers’s picture

Thanks for all the research that went into this issue!

  1. +++ b/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInBlockFormTest.php
    @@ -111,7 +111,7 @@ public function testBlocks() {
    -        $this->getSession()->wait(500);
    +        $this->assertSession()->assertWaitOnAjaxRequest();
    
    @@ -141,7 +141,7 @@ public function testBlocks() {
    -    $this->waitForElement('#toolbar-bar', 3000);
    +    $this->waitForElement('#toolbar-bar');
    

    These are bigger culprits in this particular issue; they're what increase the fail rate in OutsideInBlockFormTest.

  2. +++ b/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInBlockFormTest.php
    --- a/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInJavascriptTestBase.php
    +++ b/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInJavascriptTestBase.php
    

    Changes here are to increase systematic reliability of all Functional JavaScript tests.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Nice sleuthing and work... also I guess it points to the fact that the performance of the test infra is somewhat variable.

Committed and pushed fdbc6eb to 8.3.x and c1a0b23 to 8.2.x. Thanks!

  • alexpott committed fdbc6eb on 8.3.x
    Issue #2830485 by klausi, michielnugter: \Drupal\Tests\outside_in\...

  • alexpott committed c1a0b23 on 8.2.x
    Issue #2830485 by klausi, michielnugter: \Drupal\Tests\outside_in\...
tedbow’s picture

@klausi and @michielnugter thank you so much for your work on this!

Wim Leers’s picture

Status: Fixed » Active

Hm… https://www.drupal.org/pift-ci-job/542596 just failed in the same way again :(

Presumably because this issue only reduced the likelihood of random fails?

Berdir’s picture

also here: https://www.drupal.org/pift-ci-job/542605

Doesn't look less frequent to me than before?

xjm’s picture

:(

If it is not fixed, maybe we should revert the previous commits?

catch’s picture

Status: Active » Needs review
FileSize
909 bytes

Suggest we do this for now.

alexpott’s picture

I don't think reverting is right - since the longer timeouts result in less failures - looking at the evidence provided here.

#47 is sad - since we have no plan on how to turn in on again.

catch’s picture

It's an experimental module so it's allowed to be sad - better than making the core queue in general sad with random failures.

Status: Needs review » Needs work

The last submitted patch, 47: 2830485.patch, failed testing.

alexpott’s picture

@catch - good point experimental modules shouldn't be making the core queue sad - let's mark the test as skipped.

catch’s picture

Status: Needs work » Needs review
FileSize
836 bytes

@alexpott suggested markTestAsSkipped().

Status: Needs review » Needs work

The last submitted patch, 52: 2830485.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
834 bytes

It's that kind of day.

dawehner’s picture

+++ b/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInBlockFormTest.php
@@ -51,6 +51,9 @@ protected function setUp() {
+    $this->markTestSkipped('Test skipped due to random failures in DrupalCI, see https://www.drupal.org/node/2830485');
+    return;

This throws an exception so we don't really need a return statement.

The last submitted patch, 10: outsidein-fails-2830485-10.patch, failed testing.

alexpott’s picture

Status: Needs review » Needs work

Discussed with @catch we agreed to commit #56 and set this issue back to needs work...

Committed and pushed ce52d33 to 8.3.x and 41e5767 to 8.2.x. Thanks!

  • alexpott committed ce52d33 on 8.3.x
    Issue #2830485 by klausi, catch, michielnugter: \Drupal\Tests\outside_in...

  • alexpott committed 41e5767 on 8.2.x
    Issue #2830485 by klausi, catch, michielnugter: \Drupal\Tests\outside_in...
klausi’s picture

Status: Needs work » Needs review
FileSize
1.41 KB

Back to reproducing the test fails first by running OutsideInBlockFormTest 50 times. This should fail.

klausi’s picture

Opened #2832672: [PP-1] Upgrade jcalderonzumba/* for better test performance (found in this issue, but otherwise unrelated).

Status: Needs review » Needs work

The last submitted patch, 61: outsidein-fails-2830485-61.patch, failed testing.

alexpott’s picture

Priority: Critical » Major

Since we're not running the test I don't think this should be a critical issue - but having a skipped test is a major bug to me because we have less test coverage of something we thought was worth testing.

klausi’s picture

Tests are failing as expected, yay!

The next step in this issue is to use a hacked version of Zumba\GastonJS\Exception\BrowserError and print out more info so that we know what the actual PhantomJS error was.

klausi’s picture

Status: Needs work » Needs review
FileSize
4.74 KB
3.33 KB

I forked gastonjs for debugging into https://github.com/klausi/gastonjs , changed this: https://github.com/jcalderonzumba/gastonjs/compare/master...klausi:debug...

Changed the composer dependency in the patch, let's see if we get more info with this.

klausi’s picture

FileSize
4.74 KB
633 bytes

Hm, this should have failed on the testbot. Let's do the test 100 times to provoke the fail.

Status: Needs review » Needs work

The last submitted patch, 67: outsidein-fails-2830485-67.patch, failed testing.

michielnugter’s picture

Status: Needs work » Needs review

Hmm, doesn't seem to pick up your altered gastonjs yet. The error is as unclear as it was before..

michielnugter’s picture

Status: Needs review » Needs work
klausi’s picture

Status: Needs work » Needs review
FileSize
4.74 KB
1.23 KB

Right, sorry about that, I should have checked that the correct stuff is in the zip ball referenced in composer.lock.

Status: Needs review » Needs work

The last submitted patch, 71: outsidein-fails-2830485-71.patch, failed testing.

michielnugter’s picture

Status: Needs work » Needs review

:( Well, that's not much better.

Is it in any way possible to create screenshots with the testbot?

Alternative: mail the HTML of the page just before line 88, would have no problem in receiving loads of e-mails to find the problem here.

michielnugter’s picture

Status: Needs review » Needs work
michielnugter’s picture

Status: Needs work » Needs review
FileSize
2.46 KB
6.89 KB

Tried about everything but just can't reproduce it. Added try-catch + mail/HTML debug, curious to see if that works..

(please don't shoot me on the mega-ugly mail, just didn't feel like really trying to do it the right way)

klausi’s picture

Oh, testbots are not allowed to send email, so that email will never reach you. I think we have to squeeze any output into the assertion message to actually see it. Or install the testbot locally.

Status: Needs review » Needs work

The last submitted patch, 75: outsidein-fails-2830485-75.patch, failed testing.

michielnugter’s picture

Yep, looks like it, the test-bot failed there. Changed it into asssertEquals('')..

Don't know yet how to configure the testbot locally. Would be very usefull, with xdebug within the tests themselves would be golden.. Should try to get that running some other time..

michielnugter’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 78: outsidein-fails-2830485-78.patch, failed testing.

michielnugter’s picture

FileSize
24 KB

I attached the screenshot (well, this method works :)), gone close my laptop for today, I'll try to look into it tomorrow. Looks wrong though (editing still visible while the test just tried to close it), maybe the previous wait isn't good enough..

michielnugter’s picture

FileSize
28.45 KB

For reference, this screenshot is what it should have looked like.

So it's not actually the click that fails but it's the closing of the tray:

// If a tray was open from page load close it.
          $element->click();
          $this->waitForNoElement("#toolbar-administration a.is-active");

That fails I think. Could also be that the initial load of the /user page is incorrect.

michielnugter’s picture

Status: Needs work » Needs review
FileSize
6.29 KB

Added an initial screenshot and a check if the tray was open or not to narrow down the problem area.

michielnugter’s picture

Hmm, this is weird. Now it keeps passing (2 completed runs).

I still do think I we can get something out of this. The initial screenshot takes some time in the test which causes a longer time before the assertions start.

This could mean that the page actually hasn't finished executing all behaviors yet or ....?

Not sure on how to fix this. Random attempt: added a assertWaitOnAjaxRequest() and increased the number of test runs to 200 just to be sure.

michielnugter’s picture

Interesting, it now passes. Seems that my previous conclusion maybe isn't that far fetched. Does drupalGet() perform checks to see if the page is really finished loading and excuting any JavaScript code/ajax requests?

klausi’s picture

Nice find! That would mean we should add assertWaitOnAjaxRequest() to every GET request so that we can be sure the page is really ready for further action.

michielnugter’s picture

That would be a solution. But first, do you agree with my finding?

I thought about it some more. I don't know yet when PhantomJS reports back that it finished loading. If I assume the load event, which should mean all content is loaded thus sounding reasonable to me, it could still be too early. This would be too early for pages that have AJAX requests triggered in behaviors (which are triggered in documentready). These requests might not be finished in time before load is triggered. This test is an example of a situation where ajax requests are fired in behaviors.

Side note: The more tests I write the more unfortunate I find the name of the method assertWaitOnAjaxRequest, assertWaitOnJavascript() would be a better name I think.

michielnugter’s picture

400 iterations now and it passes (your test and mine). It does seem like a fix that could work and is not too far fetched..

Also very curious to see if this would be the fix for the other random failures.. Was planning on looking into #2831603: Improve assertion for testEntityReferenceAutocompleteWidget altough I don't have postgress installed..

BTW: I do think this will remain relevant, whatever driver / browser is used (#2775653: JavascriptTests with webDriver).

klausi’s picture

Yes, I agree with your findings. PhantomJS and the driver we use is still a black box to me, but we have demonstrated here that we can fix the random fails this way. This might only be another step to get our javascript tests better under control, not a final solution. It is still progress.

I queued a bunch of test runs for the patch above to make sure the 200 OutsideInBlockFormTest tests also pass in all environments.

Let's see if this approach also works for all our other tests.

tim.plunkett’s picture

Additionally, there are some tests doing $this->submitForm() followed by assertWaitOnAjaxRequest(), which makes sense.
Maybe we should add it there too?

I couldn't find any places already doing a assertWaitOnAjaxRequest() after a drupalGet(), mostly after things like pressButton/clickLink/selectFieldOption.

tedbow’s picture

+++ b/core/tests/Drupal/FunctionalJavascriptTests/JavascriptTestBase.php
@@ -64,6 +64,19 @@ protected function tearDown() {
+    // When the page has been loaded wait until jQuery is done with all Ajax
+    // activity before any further actions are done on the page.
+    $this->assertSession()->assertWaitOnAjaxRequest();

Should we make an optional parameter $wait_for_javascript = TRUE. Would there be tests that don't have ajax on page where they might want to skip this or would the extra check never be a problem?

michielnugter’s picture

@tim.plunkett

Is the usage of submitForm() the correct way of submitting a form in JTB? Untill now I've just been pressing the submitbutton. I does seem usefull to make this the best practice as it does other things as well. Changing it to the best practice does solve the problem as we could add the assertWaitOnAjaxRequest here.

@tedbow

The call shouldn't be a problem as it will immediatly return if there are no AJAX calls running thus not slowing down your page. I can't think of a situation where it could harm your test to have it there.

alexpott’s picture

Nice sleuthing. I think #91 is not a problem because the check is before waiting. See \Zumba\Mink\Driver\JavascriptTrait::wait(). However, I agree that making this optional behaviour seems like a good idea. I've looked at the code to see if we can do something lower down that affects both drupalGet() and submitForm() - but it does not seem simple. These methods call \Zumba\GastonJS\Browser\BrowserNavigateTrait::visit() and \Zumba\GastonJS\Browser\BrowserMouseEventTrait::click() respectively.

alexpott’s picture

Nice sleuthing. I think #91 is not a problem because the check is before waiting. See \Zumba\Mink\Driver\JavascriptTrait::wait(). However, I agree that making this optional behaviour seems like a good idea. I've looked at the code to see if we can do something lower down that affects both drupalGet() and submitForm() - but it does not seem simple. These methods call \Zumba\GastonJS\Browser\BrowserNavigateTrait::visit() and \Zumba\GastonJS\Browser\BrowserMouseEventTrait::click() respectively.

michielnugter’s picture

FileSize
1.01 KB

I searched for something lower down as well but I couldn't find anything. The get and button click are two completely different paths in causing a page reload. If you submit the form yourself by clicking a button there is no way to intercept it and add the wait as far as I can see, which would mean that we should reject patches that use this method in favor of the submitForm method.

Don't have time anymore to change the patch, very sorry about that. Gotta go for now, if it's not fixed next tuesday I'll continue to help out!

michielnugter’s picture

alexpott’s picture

So we could implement out own PhantomJSDriver class that extends \Zumba\Mink\Driver\PhantomJSDriver and uses it's own \Zumba\GastonJS\Browser\Browser that extends that class but by default waits for JS to ajax to finish after doing a command. We'd just have to wrap \Zumba\GastonJS\Browser\BrowserBase::command() to do this.

dawehner’s picture

Well, I'm wondering whether this is something we could also fix upstream somehow. I totally imagine that other users of this library might have similar problems.

michielnugter’s picture

@dawehner

Fixing it upsteam would nice but that has it's own problems.

The only, non-hacky, way to see if an AJAX request is running is by using jQuery.active (which Drupal currently does). This a jQuery specific option that only works as long as you use the jQuery methods for AJAX. There is no, standard, way of telling if an XMLHttpRequest is running. You'll need to overwrite some of the XMLHttpRequest functionality as suggested in:

http://stackoverflow.com/questions/10783463/javascript-detect-ajax-requests

This could be done of course but I'm not sure about overwriting such a basic functionality in a test suite. Then again, having a more low-level way of checking for activity would be nice.

alexpott’s picture

@michielnugter, @dawehner I wonder if rather than fixing this upstream is there anyway we can customise vendor/jcalderonzumba/gastonjs/src/Client/main.js to help us?

michielnugter’s picture

I've been discussing this issue offline with lendude and he had a nice insight: "You should know when writing a test when to wait for additional processing to finish.".

I agree with him. Enabling the outside-in module means some js and ajax requests will be triggered on load. The test itself should make sure that it waits for this processing to finish.

We could expand the doxygen for drupalGet to explain this and give a hint. I've also been working on the documentation here: https://www.drupal.org/docs/8/phpunit/phpunit-javascript-testing-tutorial. This page could be expanded with these kind of things. (Section: how to load a page).

What do you think?

EDIT: About altering main.js. Yes it should be possible but only in a hackish way and I'm really not sure about this route. Implementing our own Driver and Browser seem like better ideas to me if we want to implement this low-level.

klausi’s picture

I agree that we should fix OutsideInBlockFormTest first with an explicit wait for the Ajax to complete. Writing our own drivers or wait logic in the PhantomJS bindings might be taking this a bit too far for now. Even the addition to drupalGet() I proposed earlier might be a bit too much for this issue.

#2832672: [PP-1] Upgrade jcalderonzumba/* for better test performance has to deal with new failures of our existing tests anyway, so let's figure out there if we need additional Ajax waiting and potentially our own driver modifications.

klausi’s picture

michielnugter’s picture

I added some notes and tips for this on the documentation page:

https://www.drupal.org/docs/8/phpunit/phpunit-javascript-testing-tutorial

I recently updated this page based on my experiences so far..

Lendude’s picture

@michielnugter++, @klausi++, RTBC++

xjm’s picture

Let's just confirm that the 200 run patch again would actually fail without the fix.

tedbow’s picture

+++ b/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInBlockFormTest.php
@@ -80,6 +77,10 @@ public function testBlocks() {
       $this->drupalGet('user');
+      // After the page loaded we need to additionally wait until the settings
+      // tray Ajax activity is done.
+      $web_assert->assertWaitOnAjaxRequest();

Would it make sense to override drupalGet() in \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInJavascriptTestBase then call $web_assert->assertWaitOnAjaxRequest(); from there? Then any new test method won't have to do that same thing

Other issue that introduce test methods:
#2786193: Differentiate 2 "Quick edit" links for custom blocks
#2782915: Standardize the behavior of links when Outside In editing mode is enabled

xjm’s picture

Status: Reviewed & tested by the community » Needs review

For #108.

Status: Needs review » Needs work

The last submitted patch, 107: outsidein-fails-2830485-103-200-runs-FAIL.patch, failed testing.

xjm’s picture

dawehner’s picture

I think I agree with @tedbow, in case we don't put this into drupalGet or somehow (even if its hard) on the lower levels we end up with more random failures in the future. On the other hand fixing this random failure now helps for itself as well.

tim.plunkett’s picture

This should do it, right?
Note there's also \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInJavascriptTestBase::waitForOffCanvasToOpen() but I think that assumes the tray is open.

tedbow’s picture

Status: Needs review » Reviewed & tested by the community

@tim.plunkett thanks for making the patch. Looks good!

xjm’s picture

I'm fine with fixing this in all outside in tests for now, but it seems like this problem is going to come up again and again in JS tests. Should we file a followup to discuss a more general way of addressing the problem?

tim.plunkett’s picture

Adding $this->assertSession()->assertWaitOnAjaxRequest(); after link clicks or button presses is pretty common, and you can't really write a test without it.
AFAIK this is the first time we've encountered needing to wait for JS running on page load, with no other interaction, which is why it caught us all.

Not sure if there *is* a better way, other than thinking very carefully about what the page is doing!

michielnugter’s picture

It seems logical for me to fix for every outside-in test, I agree completely with the latest patch.

The problem might come up again but I hope that the extended documentation will help with that and maybe we should add something to the drupalGet documentation to indicate this might be a problem. I really don't know if adding a fix for this low-level is a good idea. The behavior will be specific to a few tests only and there are situations where you might not want it (maybe the big-pipe tests?).

dawehner’s picture

I'm wondering whether this approach (wrap the driver and wait on every click, visit and form submission.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 119: 2830485-119.patch, failed testing.

alexpott’s picture

I vote for doing #115 here and opening a followup to discuss. However with #115 there still seems to be a random error - https://www.drupal.org/pift-ci-job/551149

dawehner’s picture

I agree totally, I think though trying to do the research later would be cool!

tedbow’s picture

Status: Needs work » Needs review
FileSize
8.72 KB
8.11 KB
6.99 KB

Ok chatted with @dawehner briefly about this he suggested using a data provider method for the block info so the test isn't looping in through multiple blocks and making multiple drupalGet() calls.

Maybe the browser gets slower as the test goes on? How knows it's random.

Anyways leaving previous method where it tests all blocks in 1 go and a new one which test each block with a separate drupal install using a dataprovider. For now I changed as little code as possible so they both call the new function individualBlockTest(). Will need more factoring if determine data provider solves the problem.

UPDATE:

I vote for doing #115 here and opening a followup to discuss.

I missed this but agreed.

Will update this patch anyways just to see.

The last submitted patch, 123: 2830485-1220-200_tests.patch, failed testing.

tedbow’s picture

Retesting

Not sure what this means DrupalCI "ERROR: Step ?Publish JUnit test result report? failed: No test report files were found. Configuration error?"

The last submitted patch, 123: 2830485-1220-200_tests.patch, failed testing.

tedbow’s picture

Ok actually looking back on the fail in #123 I saw "Build timed out (after 110 minutes). Marking the build as aborted."

Going to upload a patch that just does the dataprovider version 200 times. Hopefully it won't time out but maybe because 200 x 3 providers runs = 600 installs.

Status: Needs review » Needs work

The last submitted patch, 127: 2830485-122-200_tests_provider.patch, failed testing.

tedbow’s picture

michielnugter’s picture

I tried 500 tests in another issue and that was too much as well, seems that 200-300 is a max of runs the testbot can handle for the functional javascript tests.

I added a bunch of tests to see if the new patch holds up. If everything passes I still think we should keep an eye on this issue, it seems to me like a work-a-round instead of an actual fix?

I'm curious: does the testbot restart PhantomJS for each test? The PhantomJS browser really acts as a normal browser. I had a problem earlier with javascript changes not beeing picked up in my test, turns out it was using a browser cached version, restarting PhantomJS fixed it. If PhantomJS is not restarted each time this might cause it's own problems.

xjm’s picture

I vote for doing #115 here and opening a followup to discuss.
I missed this but agreed.

Unfortunately, #115 is not actually resolving the bug. See https://www.drupal.org/pift-ci-job/551149. Since the drupalGet() approach was therefore not actually fixing the bug completely, probably we should remove it from the patch and see what the effect of the data provider change is on its own?

xjm’s picture

I added a bunch of tests to see if the new patch holds up. If everything passes I still think we should keep an eye on this issue, it seems to me like a work-a-round instead of an actual fix?

Agreed @michielnugter.

michielnugter’s picture

This patch is without the overridden drupalGet().

The mentioned fail is a different one as before, the fail is later in the test but it might still be the same fail and thus the attempts so far would not be good enough.

I think it pops up later because of the changed way of testing. maybe because of a separate test for each block the tray is never expanded on reload?

xjm’s picture

Queued more runs for #133.

What's interesting is that #103 did not show the fail #115 did, so I queued more of #103 too.

One possibility is that we are lowering the failure rate but not necessarily fixing it, since it's related to how long the test takes to do stuff.

michielnugter’s picture

Alright, #132 fails as I expected. The drupalGet() fix actually does good as I see it now.

I do think we're lowering the possibility of a random fail with each of these fixes. All fixes up till now we're explainable, very plausible and had a positive effect on the number of random fails.

I think what we're seeing now is the effect of the JTB tests beeing new to Drupal and the theething problems that come with that.

EDIT: About the #103 not failing and #115 failing: I think #103 is just as likely to have the same random failure because it comes down to the same fix.

Status: Needs review » Needs work

The last submitted patch, 133: 2830485-132-100_tests_provider.patch, failed testing.

tedbow’s picture

Ok so it seems the only setup that hasn't failed is using a dataprovider and overriding drupalGet().

So just to recap

  • #133 uses dataprovider but no overriding drupalGet().
  • #113 no dataprovider but does overriding drupalGet().

Both fail.

#129 uses dataprovider and overriding drupalGet().
Passes

This patch is effectively the same as #129 but cleaned up.

I could actually see where we might need these 2 things together.
Could be that without the dataprovider the second or third time around the loop, subsequent drupalGet requests, the JS is somehow slower. So using the dataprovider fixes the slowness because there is an install instead of multiple requests. But there is still enough slowness even with the dataprovider that requests only work if we add the wait after the parent drupalGet().

tedbow’s picture

What the @#$%@$%?

So now a random failure on a totally different issue?

Behat\Mink\Exception\ResponseTextException: The text "Can you imagine anyone showing the label on this block?" was not found anywhere in the text of the current page.

michielnugter’s picture

Had just about the same reaction...

I really seems like a different issue than before. It's not an error immediately after loading a page.

But in this case, aren't there 2 ajax requests in a row? I see one POST and 1 get. The get is fired immediatly after the POST it seems. If the test checks for jquery activity exactly between these two requests it seems logical that the test fails.

So yes a random fail, yes a different one and again explainable I think.

Maybe we should test on the result using a js condition in this case?

michielnugter’s picture

Status: Needs review » Needs work

The last submitted patch, 140: 2830485-140-100x.patch, failed testing.

michielnugter’s picture

Sorry about that one, fail was actually reproduceable locally. Updated the patch, this one works locally..

michielnugter’s picture

Looking good, added another complete test run to see if it will hold up.

michielnugter’s picture

Version without the 100 runs for RTBC consideration

tedbow’s picture

@michielnugter

+++ b/core/modules/outside_in/tests/src/FunctionalJavascript/OutsideInBlockFormTest.php
@@ -92,7 +92,7 @@
-      $web_assert->pageTextContains($new_page_text);
+      $this->assertJsCondition('jQuery("' . $block_selector . ' h2").html() == "' . $new_page_text . '"');

I like the change but we should probably put in a comment as to why.
I think I could forget why we aren't using pageTextContains($new_page_text);

We could even make a method assertJSElementHTML($selector, $html)
but maybe not needed.

michielnugter’s picture

Is something like this what you're looking for?

I added the methods to JavascriptTestBase, don't know if they should be placed somewhere else?

tedbow’s picture

@michielnugter that looks good. If we get push back on the change to JavascriptTestBase then probably just your added comment will do but we can wait and see about that

Crossing my fingers and hoping there is little less randomness in the universe today ;)

Lendude’s picture

Great work!

+++ b/core/tests/Drupal/FunctionalJavascriptTests/JavascriptTestBase.php
@@ -94,6 +94,36 @@ protected function assertElementNotVisible($css_selector, $message = '') {
+  protected function assertElementHtmlEquals($css_selector, $html) {
...
+  protected function assertElementHtmlNotEquals($css_selector, $html) {

yeah personally not a fan of adding extra assertions that are just wrappers for another assertion. If we see this pattern used more then it might be useful and we can always add and refactor later.
But it's just personal opinion, I know there are other people that do like this since it leads to more readable test code (which I think is a pipe dream anyway).

michielnugter’s picture

I created #2837676: Provide a better way to validate all javascript activity is completed which could be a fix for this issue. I'd like to know if the created issue is a good solution for this problem.

tedbow’s picture

@michielnugter sorry for going back and forth on this but now that you have opened up #2837676: Provide a better way to validate all javascript activity is completed
It seems like the patch in #144 cover what is needed without affecting adding anything outside of this module.
I think we should go with the approach in #144.

This patch simply adds a @todo so that we can simply use \Behat\Mink\WebAssert::pageTextContains(). if #2837676 provides a new wait method.
That way won't need the new assert method.

michielnugter’s picture

Status: Needs review » Reviewed & tested by the community

@tedbow I agree, I'm also in favor of committing #150 and fix the random errors for this specific case and tackle the rest of the random fails in #2837676: Provide a better way to validate all javascript activity is completed with the improved validation.

So, RTBC?

tedbow’s picture

@michielnugter yep think we are RTBC!

  • xjm committed f53358d on 8.3.x
    Issue #2830485 by michielnugter, klausi, tedbow, xjm, catch, tim....

  • xjm committed 9924b5e on 8.2.x
    Issue #2830485 by michielnugter, klausi, tedbow, xjm, catch, tim....
xjm’s picture

Version: 8.3.x-dev » 8.2.x-dev
Status: Reviewed & tested by the community » Fixed

I agree, this seems like the best course for now. I'm also glad we have the followup in #2837676: Provide a better way to validate all javascript activity is completed.

Really great and patient work on this issue! Committed and pushed to 8.3.x and 8.2.x.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

xjm’s picture

mpdonadio’s picture

tedbow’s picture

Component: outside_in.module » settings_tray.module

Changing to new settings_tray.module component. @drpal thanks for script help! :)