Problem/Motivation

Drupal core pipelines are configured to cache Composer and Yarn builds, but without distributed caching enabled this doesn't work, because almost all jobs run on separate hosts.

Steps to reproduce

Proposed resolution

See https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-...

Remaining tasks

User interface changes

API changes

Data model changes


Comments

longwave created an issue. See original summary.

fjgarlin’s picture

I've put together an internal MR to enable this. It's based on the setup suggested here https://docs.gitlab.com/runner/configuration/advanced-configuration.html... and using "iam" authentication.

wim leers’s picture

Issue tags: +Test suite performance

Ran into this too: https://drupal.slack.com/archives/CGKLP028K/p1695219548515259

This can deliver a nice boost in some scenarios! 😊

wim leers’s picture

Issue tags: +caching

Bump — this has the potential to help with

11.5 GiB Project Storage

https://git.drupalcode.org/project/big_pipe_sessionless
🤯 11.5 GB!!! 👆 for a trivial module that is being tested against 9.5, 10.0, 10.1, 10.2, and 11.0.
This is because of a nightly scheduled build that has to do multiple composer builds, each of which takes ~100 MB: https://git.drupalcode.org/project/big_pipe_sessionless/-/artifacts.

If we extrapolate this to 100s or even 1000s of contrib modules doing multi-version testing (which they should!), then this is going to quickly become untenably expensive AFAICT? (/cc @hestenet (he/him)). Can we automatically clear out all artifacts from all previous scheduled runs, at least the ones that were successful? That’d make a huge difference I think?

https://drupal.slack.com/archives/CGKLP028K/p1698398728821119

andypost’s picture

I bet even starting to use the local runner's cache could speed up pipelines.

The simplest approach is to just add a fallback cache: https://docs.gitlab.com/ee/ci/caching/#per-cache-fallback-keys
Then tune the push/pull policy: https://docs.gitlab.com/ee/ci/caching/#use-a-variable-to-control-a-jobs-...

Alternatively, cspell (7.5 MB), for example, can use artifacts via https://docs.gitlab.com/ee/ci/jobs/job_artifacts.html#with-a-cicd-job-token
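For illustration, the two suggestions above could look roughly like this in a `.gitlab-ci.yml` (the job name, cache key names, and the `CACHE_POLICY` variable are hypothetical, not taken from the actual templates):

```yaml
# Hypothetical sketch: a per-cache fallback key plus a
# variable-controlled pull/push policy for a Composer job.
composer-job:
  variables:
    CACHE_POLICY: pull-push   # override to "pull" on jobs that should not write
  cache:
    key: composer-$CI_COMMIT_REF_SLUG
    fallback_keys:
      - composer-default      # used when the branch-specific cache is missing
    paths:
      - vendor/
    policy: $CACHE_POLICY
  script:
    - composer install --no-interaction
```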

andypost’s picture

Filed a new issue, as it's the blocker for the new infra: #3449463: Update DB images to prevent server restart

wim leers’s picture

Priority: Normal » Major

Do we have an ETA for this? 🤞

This could AFAICT massively speed up CI runs, and reduce infrastructure costs!

cmlara’s picture

If we don’t implement #3445532: Random HTTP timeouts for GitLab CI jobs, this issue would be helpful to avoid failing jobs (at the expense of increased S3 storage usage).

Cache is not a replacement for artifacts; this won’t fix the storage issue mentioned above, but it can help decrease runtimes by carrying over some temporary cache files (like the phpstan analysis cache). It will not help issue forks, though, since the cache is per project.

catch’s picture

If this doesn't help issue forks, do we want to look at moving phpstan analysis caches (and similar) into artifacts somehow? afaik that wouldn't be blocked on infrastructure changes too, so we could find out whether it helps or not fairly quickly.

edit: opened #3462763: Use artifacts to share the phpstan result and cspell caches from core to MRs
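A rough sketch of the artifacts idea, assuming phpstan's `parameters.tmpDir` has been pointed at a project-local directory in `phpstan.neon` (the job name and paths are hypothetical, not the core templates):

```yaml
# Hypothetical: publish the phpstan result cache as an artifact so
# other pipelines can fetch it via needs: or the job artifacts API.
# Assumes phpstan.neon sets parameters.tmpDir to .phpstan-tmp.
phpstan:
  script:
    - vendor/bin/phpstan analyse --no-progress
  artifacts:
    paths:
      - .phpstan-tmp/
    expire_in: 1 week
```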

murz’s picture

Caching Composer and NPM dependencies can significantly reduce pipeline execution time and resource usage, including bandwidth, because currently every pipeline downloads the packages from scratch.

It should also reduce the load and bandwidth on the Composer repository servers.

I tried to enable caching in the issue #3513729: Implement caching of the Composer packages in GitLab CI pipeline jobs and in a contrib module, but ran into the problem that the GitLab cache URL is not configured; here is an example: https://git.drupalcode.org/project/commercetools/-/jobs/4701921

Restoring cache
Checking cache for composer-11.1.4-09bfc62449f88e9b34f39b2dd9328bbff5cb5e0a-non_protected...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted. 
WARNING: Cache file does not exist
Failed to extract cache

But, surprisingly, the cache can be uploaded fine:

Creating cache composer-11.1.4-7bd9c3ecd13a053b2376270a00aa6f55d8c958da-non_protected...
composer.lock: found 1 matching artifact files and directories 
.composer-cache: found 924 matching artifact files and directories 
vendor: found 28005 matching artifact files and directories 
web/core: found 87866 matching artifact files and directories 
web/modules/contrib: found 266 matching artifact files and directories 
web/themes/contrib: found 9057 matching artifact files and directories 
Uploading cache.zip to https://storage.googleapis.com/new-prod-ocean-s3/path/to/prefix/project/108615/composer-11.1.4-7bd9c3ecd13a053b2376270a00aa6f55d8c958da-non_protected 
Created cache

So maybe we're just missing some easy step to make it work?
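For reference, the kind of cache block that produces keys like `composer-11.1.4-<sha>` above is roughly this (the prefix and paths are illustrative guesses from the log, not the actual commercetools configuration):

```yaml
# Hypothetical reconstruction: a cache keyed on composer.lock with a
# version prefix, covering the directories listed in the upload log.
composer:
  cache:
    key:
      prefix: composer-11.1.4
      files:
        - composer.lock   # key hash changes whenever the lock file changes
    paths:
      - .composer-cache/
      - vendor/
      - web/core/
      - web/modules/contrib/
      - web/themes/contrib/
  script:
    - composer install --no-interaction
```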

fjgarlin’s picture

As mentioned in #2, there is an MR in the private repo, where the GitLab Runner is managed, to add S3 caching.

The patch is this (a bit trimmed down):

--- a/applications/gitlab-runner/gitlab-runner-app.yaml
+++ b/applications/gitlab-runner/gitlab-runner-app.yaml
@@ -29,6 +29,14 @@ spec:
         runners:
           config: |
             [[runners]]
+              [runners.cache]
+                Type = "s3"
+                Path = "drupalcode"
+                Shared = true
+                [runners.cache.s3]
+                  AuthenticationType = "iam"
+                  BucketName = "runners-cache"
+                  Insecure = false            
               [runners.kubernetes]
                 namespace = "{{.Release.Namespace}}"

It needs infra to set up "iam" per https://docs.gitlab.com/runner/configuration/advanced-configuration.html... first.

cmlara’s picture

Considering the proposed config change has been waiting since September 2023:

What is infra's opinion on this? Is this something they actually plan to enable or not, given its known positives and negatives?

drumm’s picture

Since the new cluster has been stable, we can move ahead with this.

Caching looks like it's opt-in via editing .gitlab-ci.yml, so there's not too much risk of destabilizing tests until we get something using this into the main templates.

I’m not seeing any explicit notes about cached items expiring and being cleaned up; we will want to make sure it's not a runaway cost.

cmlara’s picture

I’m not seeing any explicit notes about cached items expiring and being cleaned up; we will want to make sure it's not a runaway cost.

IIRC GitLab currently expects the storage layer to handle this (add lifecycle rules in S3).
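For example, if the bucket lives on AWS, an expiry rule could be added with a CloudFormation snippet along these lines (the bucket name and retention period are assumptions, not the actual infra config):

```yaml
# Hypothetical CloudFormation sketch: expire runner cache objects
# automatically so stale caches don't accumulate storage costs.
Resources:
  RunnersCacheBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: runners-cache
      LifecycleConfiguration:
        Rules:
          - Id: ExpireRunnerCache
            Status: Enabled
            ExpirationInDays: 14   # assumed retention window
```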

fjgarlin’s picture

Internal MR rebased (still the same as #11).

Will create an MR here that tests the cache so we can see it in this fork/MR.

fjgarlin changed the visibility of the branch 3387117-test-s3-cache to hidden.

fjgarlin’s picture

Status: Active » Needs review
Related issues: +#3553846: Test that caching works

fjgarlin’s picture

Status: Needs review » Reviewed & tested by the community

The internal MR was merged.

I did a POC to test that caching works and it did. See this comment.

I don't think that we need to change or document anything anywhere, as GitLab has good documentation about this. Core or contrib can start using this feature and report here if there are any issues.

RTBC (I guess it could be "Fixed" as well, but I'll wait until we have some more tests confirming it).

mondrake’s picture

Newbie here; excuse my ignorance.

Could this help with #3549110: [CI] Performance pipeline execution drops warmed caches?