Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
Originally reported by @xjm on May 14, who pointed me to https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/54877/... and asked to investigate.
- 9.5.x-dev test with PHP 7.3 & MariaDB 10.3.22 having "build successful" builds, which if you open them say "no failures" (see https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/54779/), but if you look at the console output (https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/54779/... and https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/54873/... for example), you'll see that *some* tests are timing out.
- Note that as 9.5.x-dev test with PHP 7.3 & MariaDB 10.3.22 shows, there *have* been successful builds since this ticket was created: on May 16 and 18 the builds worked just fine.
- Also note that https://www.drupal.org/pift-ci-job/2386056 has only existed since April 30. This suggests that this is a new DrupalCI image.
Steps to reproduce
N/A — only happens on DrupalCI, impossible to reproduce locally.
Proposed resolution
TBD
Remaining tasks
TBD
User interface changes
TBD
API changes
TBD
Data model changes
TBD
Release notes snippet
TBD
Comments
Comment #2
Wim LeersMy analysis:
ReferenceError: jQuery is not defined
seem to confirm this: that is indicate of a static asset having failed to load.yarn test:nightwatch modules/toolbar/tests/src/Nightwatch/Tests/toolbarTest.js
(~92 seconds) andyarn test:nightwatch modules/toolbar/tests/src/Nightwatch/Tests/toolbarApiTest.js
(~85 seconds).Note that if I start doing other things while the test is running, e.g. browsing drupal.org, then I do see these tests failing occasionally. This confirms that Nightwatch tests are extremely susceptible to resource contention.
The only possible conclusion I can reach is: tests are failing only because they're running in a resource-constrained environment. Otherwise they should consistently fail, with explicit errors.
Comment #4
Wim LeersComment #6
Wim LeersDiscussed in detail with @nod_.
He had a hunch … which turned out to be helpful! :D
https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/54779/...
→
You can see that it failed to set the
access toolbar
permission and then silently continued anyway, then crashes after 10 seconds of waiting for a toolbar to appear … but the user does not have access to it, so that makes total sense.https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/54877/...
→
You can see that the checkbox to enable the
toolbar
checkbox never appeared, so it failed to check it to install the Toolbar module, then it silently continues anyway, then eventually crashes because of course executing Toolbar JS is not going to work if Toolbar JS is not loaded because the module was never installed.Conclusion: machine performance is too low and/or machine load is too high. That in turn causes very fundamental things (installing the Toolbar module) to fail, which then of course makes all Toolbar tests fail.
Comment #7
nod_as a followup we should also somehow kill the test if the setup fails instead of going ahead.
Comment #8
MixologicSo, a few things from infra perspective -> containers / images shouldnt really have any bearing on underlying hardware/resource contention, *unless* the container itself is somehow hamstrung from accessing the full resources. When the nightwatch tests are running, there is nothing else at all happening on the machine, and thus no contention at all. Nightwatch tests run serially if I remember correctly, so its not an issue of parallelism either.
In this case it could be that the mariadb container isnt properly configured to take advantage of the massive amounts of memory/disk/cpu available. The configurations for those containers were just what I received from a community member, so dont know if there is more performance to be gained there, or if there is some other issue.
The image itself is not new, it was just recently added as another daily test. The image was added in mar, 2020.
So, Im not sure how we can test if mariadb is underperforming/underconfigured.
Comment #9
hestenetFor reference, the tests are running on a C4 8XL AWS instance (or one of a very comparable one in the pool) which has 32vCPUs/64gigs of RAM. And as @mixologic says, the nightwatch tests run serially, so they shouldn't be constrained by the instance itself.
Comment #10
Wim LeersThanks for the context, @mixologic and @hestenet!
The image dating back to >2 years ago and (I'm assuming) not having caused issues until now is a strong indication that the image itself is fine.
The machine sounds about 4 times more powerful than my measly laptop, and with no other work happening concurrently with the Nightwatch tests, my resource contention theory seems very unlikely.
Unfortunately … that leaves me in a place where I have no idea how to dig deeper 😔
Comment #11
hestenetWe think it was maybe @daffie who helped get the MariaDB container set up. Perhaps they would have some ideas here.
Comment #12
daffie CreditAttribution: daffie commentedMariaDB is saying that they fixed some PHP PDO stuff in version 10.3.27 and we are using version 10.3.22. See: https://mariadb.com/kb/en/mariadb-10327-release-notes/
Maybe update MariaDB to a newer version and see if it fixes the problem.
Comment #13
alexpottIt's not just MariaDB though... see https://www.drupal.org/pift-ci-job/2388848 - same fail and MySQL.
It's something that has changed on 9.4.x that is not on 9.3.x because it does not seem to be happening at all there - you can see this if you look at the build history on https://www.drupal.org/node/3060/qa - compare https://www.drupal.org/pift-ci-job/2388394 and https://www.drupal.org/pift-ci-job/2388902 - these are both PHP 8.0 and MySQL 5.7 - so exactly the same build environment.
Comment #14
alexpottLet's see if we can get Nightwatch to dump the page when it fails to install the module.
Comment #15
alexpottLol - I thought I'd already added the missing semi-colon...
Comment #16
Wim LeersSame failures on https://www.drupal.org/pift-ci-job/2388963, which is PHP 8.1 & MySQL 5.7 for
10.0.x
…Comment #17
xjm#15 looks promising, but there are two Toolbar tests with different fails in its result set:
We should consider just deploying that fix and then filing followups for the different fails.
Comment #18
Wim Leers#16 was for #3265140: Move QuickEditImageController from image to quickedit on D10 + PHP 8.1.
I'm now seeing this also for #3261447: Add an API for dynamically setting recommended and supported PHP versions based on known and predicted PHP release schedules on D10 + PHP 8.1 — see #3261447-35: Add an API for dynamically setting recommended and supported PHP versions based on known and predicted PHP release schedules.
Comment #19
alexpott#15 didn't give us the info I was looking for... trying to work out why...
Comment #20
alexpottHmmm... I wonder if this is all about the .drupalInstallModule() - when Nightwatch was added we created the ability to pass a PHP script to the installer to do tasks like this. I wonder what happens if we move this there.
Comment #21
alexpottMissed the toolbarApiTest.js which does the same thing.
Comment #23
Wim Leers@huzooka just pointed something out:
… maybe it's the fact that we're doing a plain
yarn install
that is the problem?See https://classic.yarnpkg.com/en/docs/cli/install — we should do
yarn install --frozen-lockfile
(theyarn
equivalent ofnpm ci
) if we want to ensure that the dependencies in theyarn.lock
file are actually respected!Related reading: https://newbedev.com/how-to-install-packages-based-on-the-lock-file-with.... Also, this is being proposed as the default behavior for
yarn install
: https://github.com/yarnpkg/yarn/issues/4147 — which I suspect all of us thought was already the default behavior…Maybe that solves it? 🤞 If we don't do that here, we should have a follow-up for it for sure.
Comment #24
alexpott@Wim Leers the problem is all in core/tests/Drupal/Nightwatch/Commands/drupalInstallModule.js - that's proved by #21 which does the module install a different way. But looking at this code. It's not correct. It's waiting for something that is already on the page when it presses the submit button. We can improve this considerably.
I think the attached patch will fix the problem.
Comment #25
alexpott@Wim Leers yarn.lock is not currently updated by running a yarn install - so that's not the issue here. I'm not sure about adding
--frozen-lockfile
yarn respects the lock file in nearly all cases anyway.Comment #26
huzookaWhat about
chromedriver
?Comment #27
alexpottI'm pretty sure that the nightwatch tests on DrupalCI are not using the chromedriver in core/node_modules... fwiw. See the DrupalCI logs...
This is the chromedriver being used by Nightwatch - the version brought down by yarn is ignored.
Comment #28
alexpottThe errors are happening because the
.waitForElementVisible('.system-modules', 10000);
does not work. On really fast computers that assert is able to complete before the browser has refreshed after clicking on the submit. So we have the opposite of @Wim Leers performance suspicions. If we wait for something that can only be true after the module install has completed that this will work. The reason for the different errors is that sometimes the breakpoint install was being interrupted and sometimes the toolbar was so we'd see slightly different errors after this has occurred. Fun stuff. This will also resolve probable random errors in core/tests/Drupal/Nightwatch/Tests/touchDetectionTest.js as that uses drupalInstallModule() too.Comment #29
alexpottAh we need to account for the fact that testing modules are hidden. That's not a problem.
Comment #30
alexpottAnd here's just the fix on itself without the changes to drupalci.yml.
Comment #31
Wim Leers🤣 🥳 👏👏👏👏
Comment #32
nod_ooh I see, very nice find, thanks :)
Comment #33
alexpott#29 shows that we still have the problem with #30. The more specific assert narrows it down to
drupalInstallModule()
in both https://www.drupal.org/pift-ci-job/2389713 and https://www.drupal.org/pift-ci-job/2389714So options are to do #21 and use a PHP site setup script to do the module enabling and we should deprecate
drupalInstallModule()
because it is broken. This was introduced in #3255809: Add nightwatch tests for toolbar - so we still have the question as to why 9.3.x does not seem to be as affected.Comment #34
alexpottLet's do two things before we fallback to PHP. Let's make all the data input and clicking and waiting specific to the form.system-modules. I'm pretty sure that is not an issue here but it help make drupalInstallModules more robust. And let's do a little pause after pressing submit. I don;t think there is much harm in that here and if it results in it being fixed that feels acceptable.
Comment #35
nod_don't remember why I didn't use submit form here: https://nightwatchjs.org/api/submitForm.html
Comment #36
alexpottLet's use submitForm() and let's also add some instrumentation - I think I understand why the original effort did not work... you have to tell the wait not to abort on error because you want the callback to fire.
Comment #37
alexpottOkay submitForm() got a sea of green - yay... triggering the tests again. If it is another sea of green I think we should proceed here.
Comment #38
alexpottSo 9.3.x is not failing because for some reason core/modules/toolbar/tests/src/Nightwatch/Tests/toolbarApiTest.js does not use drupalInstallModule on that branch. In fact drupalInstallModule does not exist on 9.3.x so this is all down to how #3255809: Add nightwatch tests for toolbar was committed.This is incorrect. Still can't explain 9.3.x vs 9.4.x
Comment #39
alexpottSo one difference between 9.4.x and 9.3.x is that #3272727: Nightwatch's drupalModuleInstall() doesn't handle test modules or modules w underscores in machine name - but that was 7 weeks ago and I don't think this has been happening for 7 weeks.
Comment #40
alexpottAnyways #36 is a sea of green after 2 goes so the problem is fixed - yay!
So the fix seems to be changing to submitForm() and making the wait something actually specific that has changed due to enabling a module.
Here's a patch the has just the necessary stuff to fix this.
Comment #41
Wim LeersI'm in awe of this epic showcase of "oh but if you really know what you're doing, it is possible to debug Nightwatch tests on DrupalCI" that we got served here. 🤯
Next time I need to do something like this I'm gonna google "nightwatch drupal module install alexpott" 😁
I kind of want to uncredit everyone except @alexpott on this one because he's basically come in and saved the
dayrelease.Comment #42
nod_Rtbc+1
Comment #44
catchCommitted/pushed to 10.0.x, cherry-picked to 9.5.x and 9.4.x, thanks!