[PP-1] Reduce test pipeline times to around 18 minutes and always report back unit tests [#3386091]

Problem/Motivation

Drupal CI test runs do something like this: lint -> unit -> kernel -> functional -> functional javascript -> nightwatch.

Each test group is executed with maximum concurrency, but the groups have to be executed one after the other.

gitlab has jobs rather than single script, this allows different test types to run in parallel.

#3386076: GitLab CI integration for core is structured so that CI pipelines run in two stages.

First, a lint step, with cspell, phpstan etc.

Second, all of the various test types are executed in their own jobs.

Unit tests are a dependency for the other test types, so if unit tests fail,the functional, functional javascript, nightwatch etc. don't run.

Since the lint step blocks the test step, this means you have in effect three steps, each of which must finish before the first starts:

lint -> unit -> other test types

With this configuration, a full pipeline run is around 32 minutes.

This issue is to try to improve on that 32 minute test time, as well as allowing unit tests to run independent of linting, so that they always return even if there's a typo.

This results in:

lint + phpunit -> other test types - takes a minute or two off the pipeline.

Additionally, functional tests even at 32 concurrency were taking over 20 minutes to complete. If we chunk those into eight groups (and also build tests into two groups for the same reason), we end up with various run times of between about 4 and 11 minutes. The maximum concurrency is ten, so when a 4 minute job finishes, it frees up a runner/pod to complete another short job while the 11 minute ones are still running.

We have tweaked the order of test execution to functional -> build -> javascript -> nightwatch to support starting the longest running tests first.

With these changes (all just changes to the YAML config plus a small supporting change to run-tests.sh), pipelines complete in around 18 minutes.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-3386091

Show commands

Start within a Git clone of the project using the version control instructions.

Add & fetch this issue fork’s repository

Or, if you do not have SSH keys set up on git.drupalcode.org:

Add & fetch this issue fork’s repository

3386091-pp-1-consider-putting changes, plain diff MR !4727
Check out this branch for the first time

Check out existing branch, if you already have it locally
3386091-split-only-with-parallel changes, plain diff MR !4733
Check out this branch for the first time

Check out existing branch, if you already have it locally

3 hidden branches

3386091-spread-tests

changes, plain diff MR !4737

Check out this branch for the first time

Check out existing branch, if you already have it locally

3386091-reduce-config

changes, plain diff MR !4736

Check out this branch for the first time

Check out existing branch, if you already have it locally

3386091-split-using-parralel

changes, plain diff MR !4731

Check out this branch for the first time

Check out existing branch, if you already have it locally

About issue forks

Comments

Comment #1

8 September 2023 at 10:54

catch created an issue. See original summary.

Comment #2

catch

he/him

English

commented 8 September 2023 at 11:33

Status:

Active

» Closed (works as designed)

This is already done with --with-unit-tests in the core template, closing as by design. Doesn't need its own group

Comment #3

catch

he/him

English

commented 8 September 2023 at 14:15

Title:	[PP-1] Consider putting unit tests into their own blocking test group	» [PP-1] Allow unit tests to run if linting fails
Issue summary:	View changes
Status:	Closed (works as designed)	» Active

Actually, re-opening with a slightly different purpose.

If we skip the linting tests for unit tests, but add it to all the other types of tests, the following happens:

1. Unit tests run parallel to linting, this removes a 2 minute blocking step in between linting and all other test types so theoretically might reduce overal pipeline time from ~32 minutes down to ~30 minutes.

2. Unit tests will run despite failures in linting tests - this means if you've made a typo in a code comment and also broken a unit test, you'll get the unit test feedback back alongside the linting errors.

All the other test types continue to rely on both unit tests and linting so we don't do expensive test runs for things that are nearly guaranteed to fail.

Comment #4

8 September 2023 at 14:16

catch opened merge request !4727

Comment #5

catch

he/him

English

commented 8 September 2023 at 14:17

The MR is based off the main gitlab MR, this is the actual change for this issue: https://git.drupalcode.org/issue/drupal-3386091/-/commit/8cfe4fc5fed6bc6...

Comment #6

catch

he/him

English

commented 8 September 2023 at 15:04

Status:

Active

» Needs review

Comment #7

smustgrave commented 8 September 2023 at 17:33

Status:

Needs review

» Needs work

Should TESTSDIRECTORY be added to dictionary?

Or phpcs ignore for that file?

Comment #8

8 September 2023 at 18:45

bbrala made their first commit to this issue’s fork.

Comment #9

8 September 2023 at 18:45

bbrala opened merge request !4731

Comment #10

8 September 2023 at 20:27

bbrala opened merge request !4733

Comment #11

8 September 2023 at 20:33

bbrala closed merge request !4731

Comment #12

8 September 2023 at 22:45

larowlan made their first commit to this issue’s fork.

Comment #13

catch

he/him

English

commented 9 September 2023 at 06:55

Title:	[PP-1] Allow unit tests to run if linting fails	» [PP-1] Reduce test pipeline times to around 18 minutes and always report back unit tests
Issue summary:	View changes
Status:	Needs work	» Needs review

Tried allowing kernel tests to run parallel to unit tests, it didn't improve things so reverted it.

Small tidy-up merging with-composer into with-composer-cache because they were identical.

Last pipeline run was 18 minutes, 5 seconds: https://git.drupalcode.org/project/drupal/-/pipelines/18810

Anything else we've tried is not getting us below 18 minutes, some are neutral, some make it worse, so I think this issue has got as far as it can without more intrusive changes.

The next stage would be #3386217: \Drupal\Tests\Core\Test\PhpUnitCliTest requires a database and webserver - move to a functional test to make the unit test run shorter - that in turn would start all the other test types faster.

Retitling again and updating the issue summary.

We keep seeing the same nightwatch failure over and over on the pipelines, that is the only test type we're not messing around with, so probably not introduced here, but needs its own issue at least.

Comment #14

catch

he/him

English

commented 9 September 2023 at 07:02

Priority:

Normal

» Critical

Also I think we're allowed to bump the priority. gitlabci already completes nearly twice as quickly as DrupalCI, this makes it three times as quick as DrupalCI.

Comment #15

9 September 2023 at 08:47

bbrala opened merge request !4736

Comment #16

catch

he/him

English

commented 9 September 2023 at 08:51

Kernel parallel x 2 gets us down to 17 minutes 24 seconds. https://git.drupalcode.org/project/drupal/-/pipelines/18837 this is because the kernel test runner frees up to pick up the last of the functional tests.

Which gives me the idea to up the parallezation of kernel tests and run them alongside linting again - idea is to max out the 10 concurrent jobs during both the linting and test stages. About to push that.

Comment #17

9 September 2023 at 09:20

bbrala closed merge request !4736

Comment #18

catch

he/him

English

commented 9 September 2023 at 09:44

OK 17m49s now: https://git.drupalcode.org/project/drupal/-/pipelines/18855 margin with error with the previous attempt which was 17m30s.

The trade-off is that kernel x 4 can result in one or two of the linting steps queueing, delaying the start of the longest javascript/functional jobs by a minute, so we really are at the limit at 17m30s.

If we can somehow optimize the yarn build step, that would start a lot of things earlier, even 30seconds could shave off 1-2 minutes by freeing up runners. Maybe the cloud caching stuff will help.

The advantage of this approach though is that kernel test failures will report at the same time as phpunit failures and cs failures, all within about 4-5 minutes of pushing to a branch. We still block the properly expensive test runs on unit tests and linting though.

Also now the kernel tests are reporting back quicker, we could consider adding *with-kernel-tests to the more expensive tests without a speed penalty.

Comment #19

catch

he/him

English

commented 9 September 2023 at 11:08

OK even at ten parallel for functional tests, the bottleneck is still the first group of functional tests - there must be some tests in there that are so slow they create a hard floor even while other tests are spread out. So we definitely have reached the limit of overall run times here, but the kernel tests happening earlier isn't affecting the speed of the whole run, so I think that's a good change.

Stopping for now - IMO this is mergeable either as a follow-up to the initial gitlab issue or we could merge it back in there and commit them together.

Comment #20

9 September 2023 at 12:58

bbrala opened merge request !4737

Comment #21

9 September 2023 at 19:44

bbrala closed merge request !4737

Comment #22

fjgarlin commented 11 September 2023 at 07:59

Given that #3386076: GitLab CI integration for core was (and still is) a first proposal and that this issue is built on top of that one, I think the changes from this MR should go directly there and continue the conversation and improvements there.

The MR for #3386076: GitLab CI integration for core already has some feedback and it'd be great to not have two diverging proposals.

The changes to leverage parallelism in this issue are great and the fact that we are already below the 20 minute mark is amazing.

I'm going to mark it RTBC but really aiming for the changes to go directly to #3386076: GitLab CI integration for core and then we can mark this issue as fixed.

The nightwatch issue seems intermittent but we can perhaps try to fix it in the other issue too if it keeps on appearing.

Comment #23

fjgarlin commented 11 September 2023 at 07:59

Status:

Needs review

» Reviewed & tested by the community

Comment #24

catch

he/him

English

commented 11 September 2023 at 09:38

Status:

Reviewed & tested by the community

» Closed (duplicate)

Merged over there, marking this as duplicate.

[PP-1] Reduce test pipeline times to around 18 minutes and always report back unit tests

Problem/Motivation

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-3386091

Comments

Related issues

Referenced by