Closed (fixed)
Project:
Drupal core
Version:
8.3.x-dev
Component:
ckeditor.module
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
15 Jan 2017 at 21:01 UTC
Updated:
11 Apr 2017 at 14:20 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
xjmComment #3
xjmI wonder if this is another one where we are not waiting long enough for an AJAX response.
Comment #4
wim leersI'd bet it is.
Comment #5
michielnugter commentedWith #2837676: Provide a better way to validate all javascript activity is completed the assertWaitOnAjaxRequest is improved and now waits on the complete ajax request to finish, might this also be the fix for this random fail?
If not the test could be changed to waiting for the expected values to be valid. I'd love to know if the random failures have disappeared with the assertWaitOnAjaxRequest patch before making any changes to this test.
Comment #6
catchSince we're dealing with the same code, let's downgrade this to normal for the assertion improvement. If we somehow see the random fail again, we can always upgrade to critical again.
Comment #7
catchUpdated the wrong issue. Let's mark this fixed, if the random fail resurfaces we can re-open.
Comment #8
wim leers+1 to #7.
Comment #9
catchUnfortunately that didn't fix it. https://www.drupal.org/pift-ci-job/583411
Comment #11
xjmDiscussed with @catch, @alexpott, @effulgentsia, and @cilefen. This is most definitely a disruptive critical bug. (My email is full of notices of this failure every morning.)
Comment #12
droplet commentedComment #14
michielnugter commentedTried it locally @droplet's test passes 50x in a row. Checked the logs, the test was added in 8.3.x. Updating the version and rerunning the tests.
Comment #15
michielnugter commentedOriginal patch running on 8.2.x again, it really did say 8.3.x when adding the tests. Adding same patch again to try and make it test 8.3.x
Comment #16
droplet commentedI have another quick look after the patch is failing. The failing may cause by the CKEDITOR didn't initialize.
- Also, ckeditor_test.module is useless. It's loading 404 res.
- And I wonder why:
this is passed. ensure CKEDITOR is loaded.
But "edit-inline" is failed. (I also tried sleep 10 sec before this test)
and "edit-iframe-value" always passed.
Comment #17
MixologicIf you are trying to figure out random fails like this, please do not submit the patches to every single possible environment. We know it happens in php7, and sqlite, so there is no need to add 7.1 tests, 5.5 tests, pgsql tests or any other tests until we've reached the point where we believe we have figured out the root cause in the failing environment and believe we have a fix. Only then should we run it on all other environments at RTBC time to ensure total covereage.
As it stands, this one issue is blocking testing on all of d.o. for the next couple of hours just to flush through 24 tests.
Comment #19
michielnugter commented@Mixologic: Sorry, that's on me. I didn't think it would have this great an impact. I wanted to run an extensive set to see if we really fixed the issue here. Will be a lot more carefull with this in the future. Cancelling various tests in the hope that helps..
Comment #21
droplet commentedThanks for the remind @Mixologic. Does testbot always use the SAME machine for PHP7 or random config on testing?
OFF-TOPIC:
Still no idea why testbot doesn't implement a feature to prioritize the target test first and failed it immediately. (Doesn't think it's a technical issue?) We could provide an option for it if there's any extra test info really matter.
Basically, for these JS testing, it will take less than 1 min. But the whole testing will take 35+ mins.
(and when our local env always get passed, here's really testing the testbot)
Comment #22
MixologicNo sweat, thats on us for not really providing clear instructions on how to make the most of the testing infrastructure. When I see "php7 + sqlite" fails, and they arent database related, my first thought is that "That is the fastest testing environment we have, so something is happening too fast for the tests to have time to react to". So testing on the slower environments wont necessarily prove or disprove much.
Anyhow, thanks for trying to sort out these rando fails. They're a serious pain.
Comment #23
droplet commentedCould you log the CPU usage? I actually think in diff. Maybe there's some reason caused very high CPU usage and slow down the PhantomJS execution. Or tooo fast caused PhantomJS high-CPU usages? Are they running on same machine?
Comment #24
Mixologic@droplet : The testing infrastructure spins bots up and down on demand, up to a limit (normally 12, higher during drupalcons). If bots runs a test, and is then idle, it stays running until 5 minutes before the hour we paid for is up. But if its working, it buys another hour, and keeps running. So bots do get reused, but no tests are assigned particular machines.
Also re: off-topic -priority testing: thats a feature we're discussing, and hope to implement soon (#2569585: Split incoming patches into psr0/psr4 tests and code and run just new tests first.. This core patch went in a few weeks ago and will help considerably with that: (#2841948: Modify run-tests.sh to show file paths of all discovered tests.).
Comment #25
MixologicI can pretty much guarantee that the phantomjs issues arent overloading the cpu. The boxes have 32 cores, and we havent yet figured out the best way to parallelize the phantomjs testing. So we are running the phantomjs tests serially, on a gigantic box.
Comment #26
klausiGood testing practices say that when you have random fails in a test you should disable it temporarily. That way we keep trust in the test system and the release managers are not bombarded with the same test fails in their mail inbox every day.
1) Create and commit a patch that disables this test.
2) Create a patch that reproduces the test fails reliably. (the usual trick is to run the same test class very often, for example in #2830485: \Drupal\Tests\outside_in\FunctionalJavascript\OutsideInBlockFormTest fails randomly)
3) Create a patch that fixes the test fails reliably.
4) Create a final patch that removes all previous debugging and should be committed.
Comment #27
michielnugter commented@Mixologic
Thanks for the insights into the testbot and I tried to cancel as many tests as I could, wont be doing that again!
@klausi
Thanks for teaching, really want to get everything about this right and this helps a lot.
Patch 1: Skip this test, should be commitable right away.
Patch 2: baseline to reproduce the issue.
Will work on updating @droplet's patch.
Comment #28
michielnugter commentedComment #29
michielnugter commentedCleanup and restructure for @droplet's patch.
Added waitOnCkEditorInstance, which if you agree on the need should be moved to either a base class or a Trait.
Comment #30
droplet commentedThanks @michielnugter,
Good to go now?
----
07:52:56 Test run duration: 3 min 46 sec
08:20:52 Test run duration: 2 min
PHP7 & SQLITE is twice faster than PHP 5.6
Comment #31
michielnugter commented@droplet:
The skip test is good to go, I would like opinions on moving the waitOnCkEditorInstance method out of this class before commiting the latest version.
---
interesting to see the performance diff, PHP7 is awesome in every single way :)
Comment #32
klausiRTBC for disabling the test as a first step. Hiding the other patches in the issue summary for now.
Comment #33
klausiHiding more patches.
Comment #34
wim leersSo #32 is asking to commit
2843693-27-skip-test.patch. Please reupload that patch next time so that it's not confusing to committers which patch is actually RTBC.Comment #35
droplet commented@michielnugter,
For testing purpose, I think this is useless. (This CKEDITOR testing module also has a bad design)
In general, with better UX, we should enable the BUTTON after the CKEDITOR loading is finished. I'm able to click the button in Chrome with mouse before CKEDITOR is loaded.
And in a real env, this is likely called from code rather than human clicking. Well, should we try to split it into 2 test cases and running it from JS? It will be executed serially since JS is single-threaded?
Anyway, move it later when it's really required? Addition API is allowed. Remove API from CORE is much harder, hehe.
Comment #36
michielnugter commented@droplet
Not sure what you mean here.
Agree, let's keep it here for now and move it later when it makes sense.
Can someone review the latest patch?
Comment #37
cilefen commentedupdating credit
Comment #40
cilefen commentedCommitted to 8.3.x and 8.4.x #27 which disables the test. The test does not exist in 8.2.x. Thank you everyone! Work can hereby continue to fix the issue.
Comment #41
droplet commentedThanks @cilefen.
@michielnugter agreed no more changes. And patch #29 looks good to me also. Anyone can do a 2nd review?
Thanks ALL!
Comment #42
xjmComment #43
andriy_novak commentedfixed
Comment #45
alexpott@andriy_novak the commit done in #40 was to disable the test. The patch in #29 still has code to run the test repeatedly in it. And #40 set back to needs work to get a proper fix. So, as far I can see, this issue is still needs work.
Comment #46
droplet commentedThat's fine. We just heard feedbacks. Let me close it.
Comment #47
droplet commentedJust removed the test script hack
Comment #48
alexpottAlso needs to be removed from the test too - ATM it is not running.
Comment #49
droplet commentedComment #50
michielnugter commentedThanks for updating the patch @droplet
RTBC on random_test_failure_in-2843693-49.patch.
Comment #51
alexpottWe should confirm that the test.patch in #49 actually causes the random fail
Uploading all the patches from #49 too because the last patch on the issue should be the rtbc patch.
Comment #53
droplet commentedAhh. doesn't we confirm that's random fail before patching? But whatever, more bots running has no harmful :)
Comment #54
alexpott@droplet what #51 proves is that the number or repetitions was enough to cause the random fail. Interestingly, only once https://dispatcher.drupalci.org/job/drupal_patches/8437/console which might mean that 25 and green is not high enough to prove it.
Comment #55
alexpottJust uploading another test patch with the patch applied. Let's fire the test 300 times - just to be sure as we can. If this is green I'll commit the fix.
Comment #56
michielnugter commented300 might be shooting too high, in my experience the testbot can handle about 150 runs before timing out, let's see what happens.
Comment #57
droplet commentedAhh got the point now. Actually adding more tests with diff config and ENV (https://www.drupal.org/node/2843693/qa/5807580) would be better than running it 300 times once. According to one of testbot members said, JSTests is running on a super computer, it doesn't seem overloaded. So that with diff config & ENV, it will be testing target elements on diff timing.
Comment #58
alexpottCommitted and pushed 9564973 to 8.4.x and 80f1493 to 8.3.x. Thanks!
Fixed on commit.
Comment #61
xjm