It's the old Rock, Pebbles, and Sand analogy but for test. We need to run the slowest tests first so we know that they won't hog the test runner while everything else has finished. By running the slowest tests early we make optimal use of all 32 cores available.

Three possible approaches:

  1. Mark the slow tests with #slow. TestDiscovery orders by alphabet so the # ensures that these tests come first. It can be any tag obviously, but #slow has a nice ring to it.
  2. Reverse the order of the tests. This would work because the slowest tests come late in the alphabet (Update, Views, Workflow).
  3. Randomise the order of the test. This helps because the same type of test tend to be slow and this evenly distributes them though the run.

Pros-cons:

  1. Option one would give us the highest level of control, but would take manual management and updating. Drawback pointed out by @Mile23 in #5 was addressed in #2296615: Accurately support multiple @groups per test class
  2. Option two is only by coincidence that it works, if we add a slow test to the access group, we are back to square one. But it seems like a logical assumption that updates tests are always going to be the biggest bottleneck, so it solves that. No manual management.
  3. Option three is random, so the slow tests could still wind up at the end, but over the long haul this should even out, new slow tests will just get shuffled in. No manual management.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Lendude created an issue. See original summary.

Lendude’s picture

Here are the three options as shown in patches

Lendude’s picture

HEAD

  • Unit: 25 sec
  • Kernel: 7 min 16 sec
  • Functional: 34 min 31 sec
  • Javascript: 6 min 4 sec
  • Total: 52 min

random

  • Unit: 33 sec
  • Kernel: 5 min 32 sec
  • Functional: 30 min 31 sec
  • Javascript: 5 min 48 sec
  • Total: 47 min

reversed

  • Unit: 42 sec
  • Kernel: 5 min 7 sec
  • Functional: 34 min 5 sec
  • Javascript: 3 min 54 sec
  • Total: 46 min

#slow

  • Unit: 30 sec
  • Kernel: 5 min 10 sec
  • Functional: 31 min 47 sec
  • Javascript: 4 min 8 sec
  • Total: 46 min

All tests done on PHP 7.2 with MYSQL. There is of course variation in test times but I ran this a couple of times in #3038758: [ignore] Lendude testing issue and these times seem a good indicator, same for the test of HEAD.

Mixologic’s picture

Maybe we could reuse the phpunit @large annotation and set the timeout to something long, like 20 minutes? Not sure if that's using that annotation appropriately, or not.

Mile23’s picture

+++ b/core/modules/settings_tray/tests/src/FunctionalJavascript/SettingsTrayBlockFormTest.php
@@ -9,6 +9,7 @@
+ * @group #slow
  * @group settings_tray

So when you say 'orders by alphabet' that's a bit of a problem, because TestDiscovery doesn't know how to deal with multiple groups. So if you make the first group '#slow' then 'settings_tray' is ignored by run-tests.sh. Anyone using run-tests.sh to run that group then loses those tests.

But hey, guess what. We have this: #2296615: Accurately support multiple @groups per test class

We could also introduce our own group annotation scheme like @early or something, which would then be a sort for run-tests.sh. Then establish some metrics with, for instance, speedtrap: https://github.com/johnkary/phpunit-speedtrap

Lendude’s picture

Issue summary: View changes

@Mile23 yeah the loss of group also occurred to me, but -5 minutes on all run times is straight money in the bank, so it might be worth it. Or it means a +1 for randomise which currently doesn't have that drawback.

Doing an @early annotation sounds like a pretty good idea too, since that would clearly signal that this is a Drupal only thing instead of misusing the groups annotation for this.

The charm of the #slow group for me was that it really doesn't require any additional work in TestDiscovery nor run-tests.sh, updated the IS with the drawbacks of #5

alexpott’s picture

I think Using @large is a bit problematic

If the PHP_Invoker package is installed and strict mode is enabled, a large test will fail if it takes longer than 60 seconds to execute. This timeout is configurable via the timeoutForLargeTests attribute in the XML configuration file.

see https://phpunit.de/manual/6.5/en/appendixes.annotations.html#appendixes....

60 seconds :) yes we can change it but we're amusingly different from that.

I think using groups is a good idea but it does sound like we need to fix #2296615: Accurately support multiple @groups per test class first.

Lendude’s picture

Issue summary: View changes

#2296615: Accurately support multiple @groups per test class is in, nice! Updated the IS again to reflect that one of the drawbacks of using groups has been addressed. It seems like that option has the most traction, so struck out the others for now.
#2296615: Accurately support multiple @groups per test class also addressed an issue I had that the #slow group always had to be the first group in the annotation. Since the groups now get sorted, it should always float to the top of the list.

So the only question I still see is do we want to use #slow or something similar that uses the fact that we sort by alphabet or do we want a dedicated group (like @large) for which we provide special handling in run-tests.sh that moves that group to the front of the list.

To me, no new special cases/handling/logic in run-tests.sh sounds good, so I would opt for the #slow group, but curious what others think.

Mile23’s picture

Here's a first pass at @early.

Lendude’s picture

Couple of things I see with #9

  1. +++ b/core/modules/simpletest/src/TestDiscovery.php
    @@ -198,16 +200,26 @@ public function getTestClasses($extension = NULL, array $types = []) {
    +    $list = array_merge($early_list, $list);
    

    This would lead to the entire group to be place at the front, so the whole group becomes early when one test in that group is @early. So I think we need to make @early into a pseudo group and place that at the start of the list. Since tests can now be in multiple groups anyway, that shouldn't be a problem.

  2. +++ b/core/modules/system/tests/src/Functional/Module/InstallUninstallTest.php
    @@ -10,6 +10,8 @@
    + * @early
    

    This doesn't get picked up by the TestDiscovery regex preg_match_all('/^[ ]*\* \@([^\s]*) (.*$)/m', $doc_comment, $matches);

    it's also looking for a value, so if we don't want to modify the regex we need to do something like @early true, also annotations are not automatically added to the info, so we need to add it to the info

This addresses those issues, lets see what happens.

Status: Needs review » Needs work

The last submitted patch, 10: 3038901-10.patch, failed testing. View results

Lendude’s picture

Status: Needs work » Needs review
FileSize
6.88 KB

Fixed test.

Mile23’s picture

Using a group is fine, too. I just wanted to avoid magical naming conventions that work because other things don't work.

Lendude’s picture

I just wanted to avoid magical naming conventions that work because other things don't work.

Agree 100%, so I really wanted to see what it would take to get @early to work, 'cause that is a much less magical solution. But looking at it, I'm not sure if its worth the added complexity, since we would also need to update the regex, because having to do @early something is really bad DX I think.

Lets try another #slow when #3039572: Prevent tests with multiple groups running multiple times lands.

Lendude’s picture

HEAD => #14
Kernel: 6 min 57 sec => 5 min 12 sec
Functional: 34 min 23 sec => 31 min 38 sec

larowlan’s picture

Status: Needs review » Reviewed & tested by the community

It's primitive, but it works.

dww’s picture

I appreciate the primitive simplicity. ;)

Do we document test groups anywhere?

- If so, can we update about this? What's the criteria for being #slow? How will people know when to use this group or not? Is the expectation that core committers will Just Know(tm) and get it right from here on out? Or will @Lendude keep a watchful eye on all tests added to core? ;) Neither seems particularly sustainable.

- If not, can we add some? ;) Not sure where. Perhaps in core/tests/README.md and/or core/core.api.php? Both mention that test groups exist, but neither really gives any guidance about how to define them.

Otherwise, +1 for RTBC on this. Getting the slow operations started first is a classic scheduling algorithm in concurrent systems.

Cheers,
-Derek

Mile23’s picture

Doc pages: https://www.drupal.org/docs/8/phpunit

We don't have much about groups, other than mentioning you need at least one in this document: https://www.drupal.org/docs/8/phpunit/phpunit-file-structure-namespace-a...

As far as metrics for which tests get the treatment: I'd say we should include a timing tool that fails your test run if an unmarked test takes too long. I'd say something like https://github.com/johnkary/phpunit-speedtrap but that requires PHP 7.1.

It's not a very complicated listener to put together, but does require doing. And then bikeshedding. :-)

Note that one benefit here is that we can say:

./vendor/bin/phpunit -c core --exclude-group #slow
dww’s picture

Yeah, I'm actually a maintainer of that doc guide. ;) I was thinking it'd be good to document this stuff directly in core, not just in a handbook page.

Meanwhile, https://www.drupal.org/docs/8/phpunit/phpunit-file-structure-namespace-a... does mention @group, but only to say:

Every test class MUST specify at least one (first) @group that matches the originating module short name (or Drupal core component name).

Not particularly helpful...

So yes, we could add something about #slow to that page. But I'm wondering if we can also put it into some of the included-with-core-in-Git docs, and if so, if there's a preference about where and what.

Speedtrap sounds nice, but that's obviously a separate issue. For now, I'd like some clarity (I guess from @Lendude) on the criteria used to generate this patch so we have something to start with.

Thanks,
-Derek

Lendude’s picture

@dww excellent point on the docs. I would argue against putting it in the core git docs since this is a far from normal occurrence.

For now I would add a point to the 'group' explanation pointed out in #19. Something like:
An additional group '#slow' can be added to test that have a long running time to ensure these tests are run first.

Not very specific, but then again, currently there are no specific criteria for when to add this.

Which brings me to how I build this patch. I just looked at the tests that are last to finish or that ended way out of alphabetical order and tagged them to see if this had any effect. And I totally agree that this is not a sustainable method of doing this :)

#3033037: DrupalCI should track and display job run times should help determine if new slow tests get added by patches, so we could bump the priority on that one.

Adding an automated way of detecting this is obviously even better, so I agree with @Mile23 we should do the speedtrap follow-up and see if that works for us.

Mixologic’s picture

#3033037: DrupalCI should track and display job run times is about the time of the entire build run (1 hr 10 min etc).
#2943340: Process PHPUnit junit reports separately is the drupalci issue that would have us use the phpunit junit.xml, which already has the test timing data in it, vs the junit that run-tests.sh generates.

Mile23’s picture

amateescu’s picture

How about including all the update path tests in the #slow group? They're all really slow "by nature" :)

dww’s picture

Status: Reviewed & tested by the community » Needs work
FileSize
19.78 KB

@amateescu re: #23: At #3040694: Use the phpunit result cache to order tests by slowest first in gitlab jobs we've got a patch that puts all tests that take longer than 75 seconds in the #slow group, and a mechanism to fail new tests that are added which take more than 90 seconds and aren't #slow. There's a lot from core/modules/system/tests/src/Functional/Update in that patch, but definitely not everything. ;) If we put too many things in #slow, it starts to undo the effectiveness of the scheduling algorithm.

Anyway, I'm really in favor of #3040694 as a quantifiable way to decide what belongs in #slow or not.

I just tried applying both patches. Much to my surprise, a few of the tests marked #slow in here didn't hit the 75 second mark over there. Of course that patch catches a lot more tests than this did. Looking at the interdiff, I also noticed one of the ones in here has a typo (@gruop #slow).

So, not entirely sure how to proceed now. Honestly, I think #3040694-11: Use the phpunit result cache to order tests by slowest first in gitlab jobs is the actually RTBC patch now. But in fairness, this was @Lendude's cool idea, and @Mile23 and I just built upon it. In terms of issue history, commit credits, etc, I think we should commit something here first, with @Lendude as the primary author, then we can re-roll #3040694 to remove whatever the conflicts are, and commit @Mile23's cool test listener and the rest of the tests that should also be #slow.

How does that sound to everyone?

NW for the typo, and removing the #slow tests from here that aren't 75 seconds slow.

dww’s picture

Status: Needs work » Needs review
FileSize
2.65 KB
1.75 KB
18.03 KB
amateescu’s picture

Status: Needs review » Reviewed & tested by the community

#24 sounds great to me :)

Lendude’s picture

#24 makes sense to me, lets land this soon so we get the initial benefit and continue the awesome stuff in #3040694: Use the phpunit result cache to order tests by slowest first in gitlab jobs to automate this which might take a bit longer to perfect.

dww’s picture

FileSize
3.16 KB
357 bytes

Let's keep both of the Kernel tests from #14.

  • larowlan committed dcd66e1 on 8.8.x
    Issue #3038901 by Lendude, dww, Mile23: Speed up the tests on DrupalCI...
larowlan’s picture

Version: 8.8.x-dev » 8.7.x-dev
Status: Reviewed & tested by the community » Fixed

Committed dcd66e1 and pushed to 8.8.x. Thanks!

c/p to 8.7.x

  • larowlan committed 28ec907 on 8.7.x
    Issue #3038901 by Lendude, dww, Mile23: Speed up the tests on DrupalCI...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.