Problem/Motivation

When a merge request modifies too many files (around 1000) the 📔 Spell-checking job fails with "Argument list too long" errors.

$ export MODIFIED=`git diff --name-only refs/heads/$TARGET_BRANCH|while read r;do echo "$CI_PROJECT_DIR/$r";done|tr "\n" " "`
$ echo $MODIFIED | tr ' ' '\n' | yarn --cwd=./core run -s spellcheck:core --no-must-find-files --file-list stdin
/scripts-125737-332819/step_script: line 262: /usr/bin/tr: Argument list too long
/scripts-125737-332819/step_script: line 262: /usr/bin/yarn: Argument list too long

The maximum length of arguments is determined by getconf ARG_MAX. There are some docs about this in GitLab CI at https://docs.gitlab.com/ee/ci/variables/#argument-list-too-long

Aside: do we need a gitlab or ci component for these issues?

Steps to reproduce

https://git.drupalcode.org/issue/drupal-3399388/-/jobs/332819

Proposed resolution

Instead of piping the list of files to tr and again to yarn we can write the list to a file read it in the yarn command. Something like this:

git diff --name-only refs/heads/$TARGET_BRANCH|while read r;do echo "$CI_PROJECT_DIR/$r";done > ./core/modified.txt
yarn --cwd=./core run -s spellcheck:core --no-must-find-files --file-list modified.txt

Remaining tasks

  • Determine optimal command formatting / file location
  • Update .gitlab-ci.yml

User interface changes

N/A

API changes

N/A

Data model changes

N/A

Release notes snippet

N/A

Issue fork drupal-3401988

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

mstrelan created an issue. See original summary.

fjgarlin’s picture

Note that the "git diff" command might not work as expected if the MR needs rebasing.

#3401971: Test-only job shouldn't require constant rebases to detect which files were changed. needs to change the git diff command.
git diff ${CI_MERGE_REQUEST_DIFF_BASE_SHA} seems to be working well, so maybe, if we need to update the command here, we can use the same.

acbramley’s picture

I just hit this on https://git.drupalcode.org/project/drupal/-/merge_requests/5500

As you can see, there aren't many files changed at all. The issue was that the fork's 11.x branch was way out of date. That doesn't make sense to me since the MR's target is origin's 11.x. Why does it matter where the tip of the fork's 11.x branch is pointing?

Pushing 11.x to the fork fixed it, but then the 11.x branch shows up on the issue (which can easily be hidden now, but still shouldn't be required).

fjgarlin’s picture

Status: Active » Needs review

I made an MR based on my suggestion in #2.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Seems like a simple enough change

acbramley’s picture

Status: Reviewed & tested by the community » Needs work

Tested the change here https://git.drupalcode.org/issue/drupal-3412420/-/jobs/651480 which errored out but passed the pipeline.

smustgrave’s picture

Priority: Normal » Major

Could argue almost critical as some MRs aren't running now

acbramley’s picture

Bumping the depth to 50 (as per recommendation via slack) has fixed it https://git.drupalcode.org/issue/drupal-3412420/-/jobs/651854

mstrelan’s picture

I agree that we should make this change, but it's not addressing the issue that was actually reported here. We would still have a fail when an MR actually does touch that many files (such as #3399388: Add declare(strict_types=1) to all Kernel tests).

acbramley’s picture

For those hitting this (where you DONT have lots of files changed) - a temporary fix is to push the latest 11.x branch to your fork:

git checkout 11.x
git fetch
git pull origin 11.x
git push FORK 11.x

Where FORK is your issue's fork (i.e drupal-123123

You'll then need to manually hide the 11.x branch on your issue in the Issue fork section.

fjgarlin’s picture

Re #10, why not looping through the files changed and run the spellcheck per file? Even if we make more calls, it should be safer, and for small MRs, the impact shouldn't be too big.

mstrelan’s picture

We don't need to loop because the issue summary already shows an example using a temp file.

fjgarlin’s picture

Oh! My bad, I didn't read that part. Will revert the changes.

fjgarlin’s picture

Status: Needs work » Needs review

Using the suggested solution (thanks @mstrelan!) and the fetching fix, things are working as expected. See https://git.drupalcode.org/issue/drupal-3401988/-/jobs/654322

MR is ready for review: https://git.drupalcode.org/project/drupal/-/merge_requests/6209/diffs

andypost’s picture

wim leers’s picture

👀 Ran into this too, I thought this was a recent regression 😅, when it clearly is not.

drumm’s picture

Status: Needs review » Reviewed & tested by the community

I ran this patch on the private fork used for testing security patches and confirmed it works as expected there.

wim leers’s picture

Ran into this again in a core issue at #3317769: Add support for linking to entities in CKEditor 5.

acbramley’s picture

Big +1 to get this committed, it's biting me on almost every issue I touch at the moment.

wim leers’s picture

Priority: Major » Critical
longwave’s picture

Status: Reviewed & tested by the community » Needs review

Added some nitpicks to the shell script.

fjgarlin’s picture

@longwave - your suggestion differs from the one proposed in the issue summary and #13. Both work, so it's just a matter of agreeing to which one to choose.

If there are 100 files changed:
- The pipe+sed+yarn would trigger the command will trigger the spellcheck command 100 times for a single file each time
- The keeping changed files in a file+yarn would trigger the spellcheck command 1 time for the 100 files with just one argument

I'm fine either way but it'd be great to agree on the approach to do before making any more changes.

smustgrave’s picture

Seen a few more issues get hit by this today #3347343: Add Views EntityReference filter to support better UX for exposed filters which is a large issue that's close. If this fixes it can we merge this and tweak in a follow up?

longwave’s picture

@fjgarlin Why would git diff | sed | yarn trigger it 100 times? The --file-list stdin argument takes the entire list directly from stdin, we're not using xargs.

The suggestion is the same as https://github.com/streetsidesoftware/cspell/issues/1850#issuecomment-10... except that we need to do some fixing up of the paths because package.json isn't in the repo root.

fjgarlin’s picture

That’s just me not knowing enough about bash.

takes the entire list directly from stdin

So won’t that pass “too many arguments” again if many files change? Or does it count as just one argument?

longwave’s picture

In fact testing locally we can simplify it to:

git diff $CI_MERGE_REQUEST_DIFF_BASE_SHA --name-only | sed "s_^_../_" | yarn --cwd=./core run -s spellcheck:core --no-must-find-files --file-list stdin

which fixes the directory issue using relative instead of absolute paths.

longwave’s picture

https://docs.gitlab.com/ee/ci/variables/#argument-list-too-long only mentions this affecting variables, I think the problem before was $MODIFIED was too long - here we aren't using variables or arguments at all, just pipes and stdin/stdout.

fjgarlin’s picture

I thought that the limitation was from cspell, not gitlab!!

I'm going to try the suggested fix in the last failing reported MR.

fjgarlin’s picture

Testing the suggested fix on an MR that was failing and is not rebased (#3347343: Add Views EntityReference filter to support better UX for exposed filters):

Reason. The fork does not even have 11.x. This is not the problem on this issue.
I needed to push the 11.x branch, and the pipeline now works, but then I can't reproduce the issue here.

Trying to find another MR.

fjgarlin’s picture

Thanks for the explanations about the different variations of the commands.

I think this is ready for review again. #31 shows how it fixes it in a failing MR.

fjgarlin’s picture

Everything is rebased too.

longwave’s picture

Version: 11.x-dev » 10.2.x-dev
Status: Needs review » Fixed

Committing from NR as this is affecting multiple merge requests.

Committed and pushed 7986c7c8d6 to 11.x and e874b9620b to 10.2.x. Thanks!

  • longwave committed e874b962 on 10.2.x
    Issue #3401988 by fjgarlin, acbramley, longwave, mstrelan, drumm: Spell-...

  • longwave committed 7986c7c8 on 11.x
    Issue #3401988 by fjgarlin, acbramley, longwave, mstrelan, drumm: Spell-...
norman.lol’s picture

#11 did not help unfortunately, but in the end a rebase did.

git switch 11.x
git pull
git switch ISSUE-BRANCH
git rebase 11.x
git push --force-with-lease

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

ressa’s picture

#3082958: Add gitignore(s) to Composer-ready project templates just got hit by this, even though the MR only modifies a few files ...

       * [new branch]        11.x       -> 11.x
$ export MODIFIED=`git diff --name-only refs/heads/$TARGET_BRANCH|while read r;do echo "$CI_PROJECT_DIR/$r";done|tr "\n" " "`
$ echo $MODIFIED | tr ' ' '\n' | yarn --cwd=./core run -s spellcheck:core --no-must-find-files --file-list stdin
/scripts-81307-1019042/step_script: line 286: /usr/bin/tr: Argument list too long
/scripts-81307-1019042/step_script: line 286: /usr/bin/yarn: Argument list too long

Cleaning up project directory and file based variables
ERROR: Job failed: command terminated with exit code 1

https://git.drupalcode.org/issue/drupal-3082958/-/pipelines/108333/failures