Problem/Motivation

There are numerous random failures in Drupal\Tests\ckeditor\FunctionalJavascript\AjaxCssTest over the past several days, mostly in SQLite but also one on MySQL. The results I've seen are all on PHP 7 and Drupal 8.3.x.

Ckeditor.Drupal\Tests\ckeditor\FunctionalJavascript\AjaxCssTest

✗ Drupal\Tests\ckeditor\FunctionalJavascript\AjaxCssTest
  fail: [Other] Line 0 of sites/default/files/simpletest/phpunit-4.xml:
  PHPunit Test failed to complete; Error: PHPUnit 4.8.27 by Sebastian
Bergmann and contributors.

  F

  Time: 5.02 seconds, Memory: 4.00Mb

  There was 1 failure:

  1)
Drupal\Tests\ckeditor\FunctionalJavascript\AjaxCssTest::testCkeditorAjaxAddCss
  Failed asserting that two strings are equal.
  --- Expected
  +++ Actual
  @@ @@
  -'rgb(255, 0, 0)'
  +'rgb(34, 34, 34)'

  /var/www/html/core/tests/Drupal/Tests/BrowserTestBase.php:1343

/var/www/html/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php:65

Proposed resolution

TBD

Remaining tasks

TBD

Comments

xjm created an issue. See original summary.

xjm’s picture

Title: Random test failure in CKEditpAjaxCssTest » Random test failure in CKEditor AjaxCssTest
xjm’s picture

I wonder if this is another one where we are not waiting long enough for an AJAX response.

wim leers’s picture

Issue tags: +JavaScriptTest

I wonder if this is another one where we are not waiting long enough for an AJAX response.

I'd bet it is.

michielnugter’s picture

With #2837676: Provide a better way to validate all javascript activity is completed the assertWaitOnAjaxRequest is improved and now waits on the complete ajax request to finish, might this also be the fix for this random fail?

If not the test could be changed to waiting for the expected values to be valid. I'd love to know if the random failures have disappeared with the assertWaitOnAjaxRequest patch before making any changes to this test.

catch’s picture

Title: Random test failure in CKEditor AjaxCssTest » Improve assertion in CKEditor AjaxCssTest
Priority: Critical » Normal

Since we're dealing with the same code, let's downgrade this to normal for the assertion improvement. If we somehow see the random fail again, we can always upgrade to critical again.

catch’s picture

Title: Improve assertion in CKEditor AjaxCssTest » Random test failure in CKEditor AjaxCss
Priority: Normal » Critical
Status: Active » Fixed

Updated the wrong issue. Let's mark this fixed, if the random fail resurfaces we can re-open.

wim leers’s picture

+1 to #7.

catch’s picture

Status: Fixed » Active

Unfortunately that didn't fix it. https://www.drupal.org/pift-ci-job/583411

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xjm’s picture

Version: 8.4.x-dev » 8.2.x-dev
Issue tags: +Triaged D8 critical

Discussed with @catch, @alexpott, @effulgentsia, and @cilefen. This is most definitely a disruptive critical bug. (My email is full of notices of this failure every morning.)

droplet’s picture

Status: Active » Needs review
StatusFileSize
new2.09 KB

Status: Needs review » Needs work

The last submitted patch, 12: 2843693-12.patch, failed testing.

michielnugter’s picture

Version: 8.2.x-dev » 8.3.x-dev
Status: Needs work » Needs review

Tried it locally @droplet's test passes 50x in a row. Checked the logs, the test was added in 8.3.x. Updating the version and rerunning the tests.

michielnugter’s picture

StatusFileSize
new2.09 KB

Original patch running on 8.2.x again, it really did say 8.3.x when adding the tests. Adding same patch again to try and make it test 8.3.x

droplet’s picture

StatusFileSize
new3.35 KB

I have another quick look after the patch is failing. The failing may cause by the CKEDITOR didn't initialize.

- Also, ckeditor_test.module is useless. It's loading 404 res.
- And I wonder why:

  1. +++ b/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php
    @@ -45,24 +45,62 @@ public function testCkeditorAjaxAddCss() {
    +    $session->wait('10', $condition);
    

    this is passed. ensure CKEDITOR is loaded.

  2. +++ b/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php
    @@ -45,24 +45,62 @@ public function testCkeditorAjaxAddCss() {
    +      'edit-inline',
    

    But "edit-inline" is failed. (I also tried sleep 10 sec before this test)
    and "edit-iframe-value" always passed.

Mixologic’s picture

If you are trying to figure out random fails like this, please do not submit the patches to every single possible environment. We know it happens in php7, and sqlite, so there is no need to add 7.1 tests, 5.5 tests, pgsql tests or any other tests until we've reached the point where we believe we have figured out the root cause in the failing environment and believe we have a fix. Only then should we run it on all other environments at RTBC time to ensure total covereage.

As it stands, this one issue is blocking testing on all of d.o. for the next couple of hours just to flush through 24 tests.

The last submitted patch, 12: 2843693-12.patch, failed testing.

michielnugter’s picture

@Mixologic: Sorry, that's on me. I didn't think it would have this great an impact. I wanted to run an extensive set to see if we really fixed the issue here. Will be a lot more carefull with this in the future. Cancelling various tests in the hope that helps..

The last submitted patch, 15: 2843693-15.patch, failed testing.

droplet’s picture

Thanks for the remind @Mixologic. Does testbot always use the SAME machine for PHP7 or random config on testing?

OFF-TOPIC:
Still no idea why testbot doesn't implement a feature to prioritize the target test first and failed it immediately. (Doesn't think it's a technical issue?) We could provide an option for it if there's any extra test info really matter.

Basically, for these JS testing, it will take less than 1 min. But the whole testing will take 35+ mins.
(and when our local env always get passed, here's really testing the testbot)

Mixologic’s picture

No sweat, thats on us for not really providing clear instructions on how to make the most of the testing infrastructure. When I see "php7 + sqlite" fails, and they arent database related, my first thought is that "That is the fastest testing environment we have, so something is happening too fast for the tests to have time to react to". So testing on the slower environments wont necessarily prove or disprove much.

Anyhow, thanks for trying to sort out these rando fails. They're a serious pain.

droplet’s picture

That is the fastest testing environment

Could you log the CPU usage? I actually think in diff. Maybe there's some reason caused very high CPU usage and slow down the PhantomJS execution. Or tooo fast caused PhantomJS high-CPU usages? Are they running on same machine?

Mixologic’s picture

@droplet : The testing infrastructure spins bots up and down on demand, up to a limit (normally 12, higher during drupalcons). If bots runs a test, and is then idle, it stays running until 5 minutes before the hour we paid for is up. But if its working, it buys another hour, and keeps running. So bots do get reused, but no tests are assigned particular machines.

Also re: off-topic -priority testing: thats a feature we're discussing, and hope to implement soon (#2569585: Split incoming patches into psr0/psr4 tests and code and run just new tests first.. This core patch went in a few weeks ago and will help considerably with that: (#2841948: Modify run-tests.sh to show file paths of all discovered tests.).

Mixologic’s picture

I can pretty much guarantee that the phantomjs issues arent overloading the cpu. The boxes have 32 cores, and we havent yet figured out the best way to parallelize the phantomjs testing. So we are running the phantomjs tests serially, on a gigantic box.

klausi’s picture

Status: Needs review » Needs work

Good testing practices say that when you have random fails in a test you should disable it temporarily. That way we keep trust in the test system and the release managers are not bombarded with the same test fails in their mail inbox every day.

1) Create and commit a patch that disables this test.
2) Create a patch that reproduces the test fails reliably. (the usual trick is to run the same test class very often, for example in #2830485: \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInBlockFormTest fails randomly)
3) Create a patch that fixes the test fails reliably.
4) Create a final patch that removes all previous debugging and should be committed.

michielnugter’s picture

StatusFileSize
new655 bytes
new599 bytes

@Mixologic
Thanks for the insights into the testbot and I tried to cancel as many tests as I could, wont be doing that again!

@klausi
Thanks for teaching, really want to get everything about this right and this helps a lot.

Patch 1: Skip this test, should be commitable right away.
Patch 2: baseline to reproduce the issue.

Will work on updating @droplet's patch.

michielnugter’s picture

Status: Needs work » Needs review
michielnugter’s picture

StatusFileSize
new3.94 KB
new3.67 KB

Cleanup and restructure for @droplet's patch.

Added waitOnCkEditorInstance, which if you agree on the need should be moved to either a base class or a Trait.

droplet’s picture

Thanks @michielnugter,

Good to go now?

----

07:52:56 Test run duration: 3 min 46 sec
08:20:52 Test run duration: 2 min

PHP7 & SQLITE is twice faster than PHP 5.6

michielnugter’s picture

@droplet:

The skip test is good to go, I would like opinions on moving the waitOnCkEditorInstance method out of this class before commiting the latest version.

---
interesting to see the performance diff, PHP7 is awesome in every single way :)

klausi’s picture

Status: Needs review » Reviewed & tested by the community

RTBC for disabling the test as a first step. Hiding the other patches in the issue summary for now.

klausi’s picture

Hiding more patches.

wim leers’s picture

So #32 is asking to commit 2843693-27-skip-test.patch. Please reupload that patch next time so that it's not confusing to committers which patch is actually RTBC.

droplet’s picture

@michielnugter,

The skip test is good to go, I would like opinions on moving the waitOnCkEditorInstance method out of this class before commiting the latest version.

For testing purpose, I think this is useless. (This CKEDITOR testing module also has a bad design)

In general, with better UX, we should enable the BUTTON after the CKEDITOR loading is finished. I'm able to click the button in Chrome with mouse before CKEDITOR is loaded.

And in a real env, this is likely called from code rather than human clicking. Well, should we try to split it into 2 test cases and running it from JS? It will be executed serially since JS is single-threaded?

Anyway, move it later when it's really required? Addition API is allowed. Remove API from CORE is much harder, hehe.

michielnugter’s picture

@droplet

And in a real env, this is likely called from code rather than human clicking. Well, should we try to split it into 2 test cases and running it from JS? It will be executed serially since JS is single-threaded?

Not sure what you mean here.

Anyway, move it later when it's really required? Addition API is allowed. Remove API from CORE is much harder, hehe.

Agree, let's keep it here for now and move it later when it makes sense.

Can someone review the latest patch?

cilefen’s picture

updating credit

  • cilefen committed 5632ab5 on 8.4.x
    Issue #2843693 by michielnugter, droplet, xjm, Mixologic, klausi: Random...

  • cilefen committed 3c08f91 on 8.3.x
    Issue #2843693 by michielnugter, droplet, xjm, Mixologic, klausi: Random...
cilefen’s picture

Status: Reviewed & tested by the community » Needs work

Committed to 8.3.x and 8.4.x #27 which disables the test. The test does not exist in 8.2.x. Thank you everyone! Work can hereby continue to fix the issue.

droplet’s picture

Status: Needs work » Needs review

Thanks @cilefen.

@michielnugter agreed no more changes. And patch #29 looks good to me also. Anyone can do a 2nd review?

Thanks ALL!

xjm’s picture

Issue tags: +DrupalCampNJ2017
andriy_novak’s picture

Status: Needs review » Reviewed & tested by the community

fixed

The last submitted patch, 27: 2843693-27-skip-test.patch, failed testing.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

@andriy_novak the commit done in #40 was to disable the test. The patch in #29 still has code to run the test repeatedly in it. And #40 set back to needs work to get a proper fix. So, as far I can see, this issue is still needs work.

droplet’s picture

Assigned: Unassigned » droplet

That's fine. We just heard feedbacks. Let me close it.

droplet’s picture

Assigned: droplet » Unassigned
Status: Needs work » Needs review
StatusFileSize
new3.09 KB

Just removed the test script hack

alexpott’s picture

Status: Needs review » Needs work
    // Skip test until #2843693 is fixed.
    $this->markTestSkipped();

Also needs to be removed from the test too - ATM it is not running.

droplet’s picture

Status: Needs work » Needs review
StatusFileSize
new3.26 KB
new3.85 KB
michielnugter’s picture

Status: Needs review » Reviewed & tested by the community

Thanks for updating the patch @droplet

RTBC on random_test_failure_in-2843693-49.patch.

alexpott’s picture

We should confirm that the test.patch in #49 actually causes the random fail

Uploading all the patches from #49 too because the last patch on the issue should be the rtbc patch.

The last submitted patch, 51: 2843693.test-should-fail.patch, failed testing.

droplet’s picture

Ahh. doesn't we confirm that's random fail before patching? But whatever, more bots running has no harmful :)

alexpott’s picture

@droplet what #51 proves is that the number or repetitions was enough to cause the random fail. Interestingly, only once https://dispatcher.drupalci.org/job/drupal_patches/8437/console which might mean that 25 and green is not high enough to prove it.

alexpott’s picture

StatusFileSize
new3.81 KB

Just uploading another test patch with the patch applied. Let's fire the test 300 times - just to be sure as we can. If this is green I'll commit the fix.

michielnugter’s picture

300 might be shooting too high, in my experience the testbot can handle about 150 runs before timing out, let's see what happens.

droplet’s picture

Ahh got the point now. Actually adding more tests with diff config and ENV (https://www.drupal.org/node/2843693/qa/5807580) would be better than running it 300 times once. According to one of testbot members said, JSTests is running on a super computer, it doesn't seem overloaded. So that with diff config & ENV, it will be testing target elements on diff timing.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed 9564973 to 8.4.x and 80f1493 to 8.3.x. Thanks!

FILE: .../modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php
----------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
----------------------------------------------------------------------
 122 | ERROR | [x] The closing brace for the class must have an empty
     |       |     line before it
----------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------

Time: 48ms; Memory: 4Mb

 8.4.x  alex: ~/dev/drupal > git diff
diff --git a/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php b/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php
index 7a9f77e..f011554 100644
--- a/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php
+++ b/core/modules/ckeditor/tests/src/FunctionalJavascript/AjaxCssTest.php
@@ -119,4 +119,5 @@ protected function waitOnCkeditorInstance($instance_id, $timeout = 10000) {

     $this->getSession()->wait($timeout, $condition);
   }
+
 }

Fixed on commit.

  • alexpott committed 9564973 on 8.4.x
    Issue #2843693 by droplet, michielnugter, xjm, Mixologic, klausi: Random...

  • alexpott committed 80f1493 on 8.3.x
    Issue #2843693 by droplet, michielnugter, xjm, Mixologic, klausi: Random...
xjm’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.