Problem/Motivation

Issues like #2829185: Fix spelling errors in Drupal core comments are great, as they provide profit, but it'd be nice when things could be done more automatically.
Let's try to do some research how we could provide maybe a tool to check for spellchecking automatically.

Symfony for example corrects common mistakes using a bot.

Proposed resolution

Use cspell

The config file locates at core/.cspell.json, you can add new words into it or adjust it when necessary. Visit https://github.com/streetsidesoftware/cspell/tree/master/packages/cspell and read the documentation for more information. Such as how to enable/disable checking etc.

Example usages:

  • Checking files inside the core folder:
    cd core
    yarn run spellcheck:core
    

    Warning: the executing time for this command is about 5 to 30 minutes depending on your PC.

  • Or to checking any file or directory:
    cd core
    yarn run spellcheck "path/to/file/or/directory/**/*"
    

Once an unknown word is reported, you have two options:

  • If it's an incorrect word, fixing it
  • if it's a correct word, adding it to the core/misc/cspell/dictionary.txt file

Regenerating the Drupal dictionary

Run:

cd core
rm misc/cspell/dictionary.txt
yarn -s spellcheck:core --unique --wordsOnly | sort -o misc/cspell/dictionary.txt

Remaining tasks

Followups:
#3122088: [Meta] Remove spelling errors from dictionary.txt and fix them

User interface changes

None

API changes

None

Data model changes

None

Dependency evaluation

  1. GitHub repo: https://github.com/streetsidesoftware/cspell
  2. Release cycle: first release was in January 2017, there has been approximately one major release a year (v4 is current, v5 is in alpha) and there are regular (monthly or sooner) patch releases.
  3. Security policy: cspell is a development tool that does not appear to have a security release process.

The package is used by:

  • phpcs
  • Code Spell Checker for Visual Studio Code, 1.3 million installs (see below about the Microsoft-authored dependency)

In addition to its own sub-packages, cspell installs all of the following:

  1. dot-prop (sindresorhus)
  2. graceful-fs (isaacs)
  3. is-obj (sindresorhus)
  4. make-dir (sindresorhus)
  5. write-file-atomic (isaacs)
  6. at-least-node (RyanZim)
  7. comment-json (kael)
  8. configstore (sindresorhus)
  9. crypto-random-string (yeoman)
  10. fs-extra (jprichardson)
  11. gensequence (Jason3S)
  12. has-own-prop (sindresorhus)
  13. iterable-to-stream (Jason3S)
  14. jsonfile (jprichardson)
  15. unique-string (sindresorhus)
  16. universalify (RyanZim)
  17. vscode-uri (Microsoft, apparently)
  18. xdg-basedir (sindresorhus)

Release notes snippet

@todo

CommentFileSizeAuthor
#158 2972224-5-158.patch109.68 KBalexpott
#157 2972224-5-157.patch110.44 KBalexpott
#153 2972224-5-153.patch110.69 KBalexpott
#153 151-153-interdiff.txt289 bytesalexpott
#151 2972224-5-151.patch110.7 KBalexpott
#151 148-151-interdiff.txt522 bytesalexpott
#148 2972224-5-148.patch110.74 KBalexpott
#148 147-148-interdiff.txt984 bytesalexpott
#147 2972224-5-147.patch110.73 KBalexpott
#147 143-147-interdiff.txt2.33 KBalexpott
#146 2972224-no-tests-interdiff.txt25.03 KBlongwave
#143 2972224-5-143.patch110.99 KBalexpott
#143 139-143-interdiff.txt871 bytesalexpott
#139 2972224-5-137.patch111.08 KBalexpott
#139 127-137-interdiff.txt28.85 KBalexpott
#127 interdiff.2972224.125-127.txt881 byteslongwave
#127 2972224-127.patch98.12 KBlongwave
#125 2972224-5-125.patch98.11 KBalexpott
#125 123-125-interdiff.txt461 bytesalexpott
#123 120-123-interdiff.txt370 bytesalexpott
#123 2972224-5-123.patch98.13 KBalexpott
#120 2972224-5-120.patch98.15 KBalexpott
#120 110-120-interdiff.txt715 bytesalexpott
#110 2972224-5-110.patch97.98 KBalexpott
#110 108-110-interdiff.txt1.75 KBalexpott
#108 interdiff-102-108.txt34.23 KBjungle
#108 2972224-108.patch98 KBjungle
#102 2972224-4-102.patch108.34 KBalexpott
#102 90-102-interdiff.txt2.13 KBalexpott
#90 interdiff.2972224.87-90.txt24.6 KBlongwave
#90 2972224-90.patch108.16 KBlongwave
#87 interdiff.2972224.70-87.txt26.37 KBlongwave
#87 2972224-87.patch87.29 KBlongwave
#70 2972224-3-70.patch60.92 KBalexpott
#70 57-70-interdiff.txt4.05 KBalexpott
#57 interdiff-2972224-50-57.txt19.33 KBsja112
#57 2972224-57.patch60.95 KBsja112
#50 interdiff.2972224.46-50.txt43.01 KBlongwave
#50 2972224-50.patch84.95 KBlongwave
#46 interdiff.2972224.40-46.txt25.52 KBlongwave
#46 2972224-46.patch62.39 KBlongwave
#40 2972224-2-40.patch60.53 KBalexpott
#40 30-40-interdiff.txt3.06 KBalexpott
#30 2972224-2-30.patch60.82 KBalexpott
#30 28-30-interdiff.txt7.54 KBalexpott
#28 2972224-28.patch53.96 KBsja112
#26 2972224-img-patch-not-working.png209.08 KBsja112
#11 interdiff-8-11.txt1010 bytesjungle
#11 2972224-11.patch55.61 KBjungle
#8 2972224-8.patch55.07 KBalexpott

Comments

dawehner created an issue. See original summary.

cilefen’s picture

Title: Try to auomate spellchecking in Durpal core » Try to auomate spellchecking in Drupal core

Ha

cilefen’s picture

Title: Try to auomate spellchecking in Drupal core » Try to automate spellchecking in Drupal core

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

alexpott’s picture

PHPCS is using cspell - it looks really quick and support php out-of-the-box. Yes it is a NPM package but at some point core is going to have build process that involves NPM so that shouldn't stop us.

alexpott’s picture

Version: 8.9.x-dev » 9.0.x-dev
Status: Active » Needs review
StatusFileSize
new55.07 KB

To use this patch install & run cspell from the command line:

$ npm install -g cspell
$ cd core
$ cspell "**/*"

Status: Needs review » Needs work

The last submitted patch, 8: 2972224-8.patch, failed testing. View results

avpaderno’s picture

The failure is caused from Drupal\Tests\ComposerIntegrationTest::testExpectedScaffoldFiles(), which isn't related to this patch.

Looking at the words in the dictionary, I wonder why words like Pioggia or Pastafazoul are necessary. I don't recall seeing them used in computer science. I could understand spaghetti in spaghetti code, but even that word doesn't come in Drupal code comments.

jungle’s picture

StatusFileSize
new55.61 KB
new1010 bytes
diff --git a/core/.cspell.json b/core/.cspell.json
index 8955084e3b..3c7c12ce2f 100644
--- a/core/.cspell.json
+++ b/core/.cspell.json
@@ -11,6 +11,7 @@
       "modules/migrate_drupal/tests/fixtures/drupal6.php",
       "modules/migrate_drupal/tests/fixtures/drupal7.php",
       "modules/search/tests/UnicodeTest.txt",
+      "node_modules/*",
       "profiles/demo_umami/modules/demo_umami_content/default_content/languages/es/**/*",
       "MAINTAINERS.txt",
       "package.json",
diff --git a/sites/default/default.settings.php b/sites/default/default.settings.php
index bec626cd16..d4950477ae 100644
--- a/sites/default/default.settings.php
+++ b/sites/default/default.settings.php
@@ -424,7 +424,7 @@
 /**
  * Class Loader.
  *
- * If the APCu extension is detected, the classloader will be optimised to use
+ * If the APCu extension is detected, the classloader will be optimized to use
  * it. Set to FALSE to disable this.
  *
  * @see https://getcomposer.org/doc/articles/autoloader-optimization.md

Ignored node_modules

The failure is caused from Drupal\Tests\ComposerIntegrationTest::testExpectedScaffoldFiles(), which isn't related to this patch.

It's relevant :) , and fixed by s/optimised/optimized in sites/default/default.settings.php

jungle’s picture

Status: Needs work » Needs review

2 coding standards messages haven't been fixed.

✗ 2 more than branch result

/var/lib/drupalci/workspace/jenkins-drupal_patches-38630/source/core/themes/claro/js/nav-tabs.es6.js ✗ 1 more
line 0 File ignored because of a matching ignore pattern. Use "--no-ignore" to override.
/var/lib/drupalci/workspace/jenkins-drupal_patches-38630/source/core/themes/seven/js/nav-tabs.es6.js ✗ 1 more
0 File ignored because of a matching ignore pattern. Use "--no-ignore" to override.

alexpott’s picture

@jungle node_modules is ignored automatically by cspell for what it is worth. Thanks for fixing the fail.

@kiamlaluno so our tests contain lots of foreign words and made up stuff. I generated the list of words by emptying out the word list in the .cspell.json file and then doing...

$ cspell "**/*" --unique --wordsOnly > ../wordlist.txt
$ sort ../wordlist.txt ../sorted_wordlist.txt

And then created valid JSON from that list.

I think we could commit this patch and then do follow-ups to remove incorrect spellings from the list or decide to use cspell's ability to ignore sections / lines / files where appropriate. This main thrust of this patch should be to get this in and then I can add it to https://github.com/alexpott/d8githooks so all core patches are spellchecked. And then patches like #2388925: British again invade config sync won't ever be necessary again.

xjm’s picture

Version: 9.0.x-dev » 9.1.x-dev

Why is this issue filed against 9.0.x? 9.0.x should only contain major-only changes, or bugs that only affect 9.0.x+. Feature requests should be filed against 8.9.x and will be shortly updated to 9.1.x.

alexpott’s picture

It was filed against 9.0.x because I didn't want to support D8 with this because of unnecessary work. If this lands in D9 then great. As for what exact version it is filed against I think we're placing way too much importance on that being "right" before a patch is rtbc and we, ie. committers, should be responsible for setting it to the correct version and ensure the patch applies and is tested against that specific branch. Also with a code quality test improvement like this as long it hits the latest dev branch then it benefits the majority of the code because we have to commit 99.5% of patches there first.

Anyhow, I agree that 9.1.x is the correct version fwiw.

alexpott’s picture

+++ b/core/.cspell.json
@@ -0,0 +1,1734 @@
+      "Customised",
+      "Customises",
...
+      "Initialise",
...
+      "Normalise",
+      "Normalises",
...
+      "Optimises",
...
+      "initialised",
...
+      "normalise",
+      "normalised",
...
+      "sanitise",
+      "sanitised",

So there are loads of real spelling mistakes in the list because I generated the list from our current spelling mistakes as determined by cspell. I think a good way forward would be to commit this list as is. And then do a follow to remove all the obviously wrong spellings - for example these British spellings and go forwards. At least then we can be sure that we don't add any new (unique) spelling mistakes and we don't suffer massive patch churn here.

jungle’s picture

Expecting this to be committed soon if possible.

jungle’s picture

One question, should the command in #8 be added to the scripts section of core/package.json?

For example:

--- a/core/package.json
+++ b/core/package.json
@@ -22,7 +22,8 @@
     "lint:css": "stylelint \"**/*.css\"",
     "lint:css-checkstyle": "stylelint \"**/*.css\" --custom-formatter ./node_modules/stylelint-checkstyle-formatter/index.js",
     "test:nightwatch": "cross-env BABEL_ENV=development node -r dotenv-safe/config -r babel-register ./node_modules/.bin/nightwatch --config ./tests/Drupal/Nightwatch/nightwatch.conf.js",
-    "prettier": "prettier --write \"./**/*.es6.js\" \"./tests/Drupal/Nightwatch/**/*.js\""
+    "prettier": "prettier --write \"./**/*.es6.js\" \"./tests/Drupal/Nightwatch/**/*.js\"",
+    "cspell": "cspell \"**/*\""
   },
alexpott’s picture

Status: Needs review » Needs work
Issue tags: +Needs change record, +Needs issue summary update

@jungle nope I don't think we do need to add to package.json. We could add it there but I don't think that that is necessary. Doing a global install seems fine for now.

We need a few things:

  • We need a change record to tell core developers how to spellcheck their code
  • We need a follow-up to go through the .cspell.json to remove spelling errors from the word list and fix them
  • We need at least a cursory note about the new build dependency - it's not included in any package of core so it's not important but the issue summary needs to state that
  • We need to update the issue summary to describe the approach and follow-up steps.
  • We need a follow-up to discuss adding it to DrupalCI so that can have a spellcheck step.
jungle’s picture

Assigned: Unassigned » jungle

Follow #19 and work on

jungle’s picture

Assigned: jungle » Unassigned
Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs change record, -Needs issue summary update
Related issues: +#3122088: [Meta] Remove spelling errors from dictionary.txt and fix them, +#3122096: Support code spell checking

Re #19

1. CR created
2. See #3122088
3. See section Build dependency changes
4. Updated section Proposed resolution
5. See #3122096

clayfreeman’s picture

Reviewed the work in #11 and #21; everything looks great from what I can tell. I updated the change request to make one sentence a bit easier to read.

+1 RTBC from me; I presume @alexpott will want a chance to review before this hits full RTBC.

avpaderno’s picture

Another issue reminded me that Drupal uses American. (I take that means both in strings shown in the user interface and in comments.) The spellchecker should also correct British words into American words.

Does the spellchecker do that automatically, or is there a setting for that?

jungle’s picture

Re #23, the CR https://www.drupal.org/node/3122084 might help to answer the question.

alexpott’s picture

+++ b/core/.cspell.json
@@ -0,0 +1,1734 @@
+      "Normalise",
+      "Normalises",
...
+      "synchronisation",

British spellings that need fixing - probably due to me :)

The list in core/.cspell.json is a list of things that need to be fixed or are Drupalisms that we want to keep. But adding this list at least allows us to spell check and then work on improving the code in a maintainable and testable way.

sja112’s picture

Status: Needs review » Needs work
Issue tags: +Needs reroll
StatusFileSize
new209.08 KB

As per the comment #15,

Anyhow, I agree that 9.1.x is the correct version fwiw.

If we are planning it to be fixed for 9.1.x patch needs to be re-rolled.

2972224-img-11-patch-not-working

jungle’s picture

Thanks, @sja112!

Tip: to queue testing against the target branch. if the patch does not apply, it will fail and tell/show. Attaching screenshot of failed applying patch is unnecessary.

sja112’s picture

Status: Needs work » Needs review
StatusFileSize
new53.96 KB

Thanks, @jungle for your suggestion.

I have re-rolled the patch for the 9.1.x version.

Changes in core/lib/Drupal/Core/Installer/InstallerServiceProvider.php and core/lib/Drupal/Core/Installer/NormalInstallerServiceProvider.php are not required.

sivaji_ganesh_jojodae’s picture

Issue tags: -Needs reroll
alexpott’s picture

StatusFileSize
new7.54 KB
new60.82 KB

@sja112 if you re-roll this patch you need to run cspell to ensure you don't get any new errors introduced by patches added in the meantime.

$ cspell "**/*"
/Users/alex/dev/sites/drupal8alt.dev/core/assets/scaffold/files/default.settings.php:233:21 - Unknown word (mydriver)
/Users/alex/dev/sites/drupal8alt.dev/core/assets/scaffold/files/default.settings.php:234:56 - Unknown word (mydriver)
/Users/alex/dev/sites/drupal8alt.dev/core/lib/Drupal/Core/Installer/NormalInstallerServiceProvider.php:17:67 - Unknown word (customisations)
/Users/alex/dev/sites/drupal8alt.dev/core/modules/system/tests/src/Functional/Theme/ThemeUiTest.php:328:6 - Unknown word (incompatitable)
/Users/alex/dev/sites/drupal8alt.dev/core/modules/system/tests/src/Functional/Theme/ThemeUiTest.php:343:26 - Unknown word (incompatitable)
/Users/alex/dev/sites/drupal8alt.dev/core/modules/system/tests/src/Functional/Theme/ThemeUiTest.php:360:26 - Unknown word (incompatitable)
/Users/alex/dev/sites/drupal8alt.dev/core/modules/system/tests/src/Functional/Theme/ThemeUiTest.php:364:28 - Unknown word (incompatitable)
/Users/alex/dev/sites/drupal8alt.dev/core/profiles/demo_umami/themes/umami/umami.theme:160:55 - Unknown word (neccessary)
/Users/alex/dev/sites/drupal8alt.dev/core/tests/Drupal/BuildTests/Composer/Template/ComposerProjectTemplatesTest.php:138:17 - Unknown word (SUT's)
/Users/alex/dev/sites/drupal8alt.dev/core/tests/Drupal/KernelTests/Core/Database/FetchTest.php:86:42 - Unknown word (Prefech)
/Users/alex/dev/sites/drupal8alt.dev/core/themes/bartik/bartik.theme:186:55 - Unknown word (neccessary)
/Users/alex/dev/sites/drupal8alt.dev/core/themes/claro/claro.theme:418:57 - Unknown word (neccessary)
CSpell: Files checked: 14772, Issues found: 12 in 8 files

These are fixed in the patch attached.

daffie’s picture

Issue summary: View changes
daffie’s picture

Issue summary: View changes
daffie’s picture

Issue summary: View changes
lendude’s picture

Reading through the word list is fantastic :) Somebody managed to get Teletubbies in! This does take out the fun of trying to sneak a new unique animal into core though ;)

Just some thoughts:

Some false positives in the list that jump out at me are things like:

+++ b/core/.cspell.json
@@ -0,0 +1,1735 @@
+      "Aexception",
+      "Afiller",

These are caused by URL escaping. the 'A' is from a URL escaped character before the actual word, might need an upstream fix if we care enough?

+++ b/core/lib/Drupal/Core/Language/LanguageManager.php
@@ -229,6 +229,7 @@ public static function getStandardLanguageList() {
+    // cSpell:disable

not a fan of needing this in core code, but I can see the list exploding when not doing that in those couple of cases ¯\_(ツ)_/¯

Also, if we want to keep the list maintainable we might want to introduce some more API in the test framework to generate certain test strings. Expand \Drupal\Tests\RandomGeneratorTrait to do strings with UTF-8 characters in them, so you don't have to make up your own when writing tests? And in the follow ups for cleaning out the list we could replace fixed UTF-8 strings with generated UTF-8 strings if we wanted to.

We would want to keep the changes to this list to a minimum I would think, since otherwise this could lead to merge conflicts and re-rolls. So not having false positives and minimising the number of hardcoded weird strings in tests would probably be requirement to make this non-disruptive.

Also, it crashes on Node 8.11 :(

daffie’s picture

Status: Needs review » Reviewed & tested by the community

Runing the command:

cd core
cspell -c .cspell.json "**/*"

Gets the result: "CSpell: Files checked: 14749, Issues found: 0 in 0 files".

Followups have been created for cleaning up the word list in .cspell.json and for supporting this check on the drupalci_testbot.

The patch contains the new .cspell.json file and a number of spelling corrections.

The list of ignored paths looks reasonable and the language is set to "en-US".

All code changes look good to me.
For me it is RTBC.

avpaderno’s picture

Should not the excluded path include more files/directories? As package.json and yarn.lock are excluded, should not composer.json and composer.lock be excluded too?

alexpott’s picture

@kiamlaluno composer.lock is up a directory so not relevant. I think scanning the composer.json in /core is good. They have text in.

xjm’s picture

Status: Reviewed & tested by the community » Needs review
  1. +++ b/core/.cspell.json
    @@ -0,0 +1,1735 @@
    +      "AND's",
    ...
    +      "API's",
    

    We should check the context on these. If it's "the API's functionality" or "the AND's second clause" (possessives), then we can leave them in. However, if it's "There are many API's" or "There are two "AND's" (intended to be plurals), then we should remove the apostrophe. It's a common mistake to pluralize acronyms with an apostrophe.

  2. +++ b/core/.cspell.json
    @@ -0,0 +1,1735 @@
    +      "Aresponse",
    

    This is also not present in core as a single word. Where are these coming from? Is the dictionary from D8? If so we should regenerate it only for D9. This isn't backportable IMO.

  3. +++ b/core/.cspell.json
    @@ -0,0 +1,1735 @@
    +      "Bajor",
    

    lol, hi @phenaproxima

  4. +++ b/core/.cspell.json
    @@ -0,0 +1,1735 @@
    +      "Becalled",
    

    This looks like a typo but I can't find it anywhere as an independent word, even with a case-insensitive search? Is it attempting to parse camel-case? If so the camel-case should be beCalled() and therefore not detected as a single word.

I am thinking the patch for this issue should just add the dictionary with no spelling fixes. Then, we should create a meta of followups to fix the spelling errors included already, or in review comments, as well as others. I almost feel like we need to go over nearly every item in the dictionary and check for why it's there.

alexpott’s picture

We can go over the dictionary in the follow-up. The dictionary being added here is so that we can work on it in the follow up #3122088: [Meta] Remove spelling errors from dictionary.txt and fix them. I agree that we need to go through this list but trying to get the list perfect now will mean that we'll never get this in. And adding this even in this state will help improve core code quality - see #30 and how this caught a spelling mistake in an @see.

I tried just adding the dictionary with all of our spellings but that broke cspell because the custom dictionary grew too long.

I think we should regenerate the list and hopefully that deals with becalled.

alexpott’s picture

StatusFileSize
new3.06 KB
new60.53 KB

Here's how I generated the new list.

Applied the patch

  • edit core/.cspell.json and set words to an empty array
  • run the core directory run cspell "**/*" --unique --wordsOnly > ../wordlist.txt
  • sort ../wordlist.txt > ../sorted_wordlist3.txt
  • edited sorted_wordlist3.txt and turned it into a json list to paste back into .cspell.json

The Becalled comes from an incorrect function call in \Drupal\Tests\Core\Form\FormStateDecoratorBaseTest::testSetLimitValidationErrors() -

    $this->decoratedFormState->setLimitValidationErrors($limit_validation_errors)
      ->shouldBecalled();

The should be shouldBeCalled

The reason for the changes here I think is all the update test removals that happened between #8 and now.

longwave’s picture

+++ b/core/.cspell.json
@@ -0,0 +1,1719 @@
+      "modules/color/preview.html",

Let's also exclude themes/bartik/color/preview.html as this is just Lorem Ipsum and a bunch of Latin-looking words are in the exclusions list.

edit: there is Lorem Ipsum in several files, I guess that can wait for the followup

+      "inheritdic",
+      "inheritdodc",
+      "inheritoc",

I wonder what these should be ;)

longwave’s picture

cspell docs note there is a "php" dictionary for PHP keywords and library methods, we should probably include that and it will cut down our word list a bit.

longwave’s picture

Also shouldn't we add this to package.json and yarn.lock, like the other developer tools? I assume the word lists are in an npm package somewhere and without locking to a specific version we can't guarantee everyone will use identical word lists.

This is basically the same as eslint/stylelint but for English :)

jungle’s picture

Re #41, checked for fun 😆 one word, one usage found only, 3 in total:

  1. inheritdic over \Drupal\Tests\views\Kernel\ViewsTemplateTest::$testViews
  2. inheritdodc over \Drupal\media_library\Form\AddFormBase::trustedCallbacks()
  3. inheritoc over \Drupal\Tests\help\Functional\ExperimentalHelpTest::setUp()
longwave’s picture

Assigned: Unassigned » longwave

Playing with this a bit more. I think we can solve a bunch of URL-encoded false positives by ignoring the regex %[0-9A-F]{2} and there are some bits of Lorem Ipsum we can skip too.

longwave’s picture

Assigned: longwave » Unassigned
StatusFileSize
new62.39 KB
new25.52 KB

Excluded some more non-English text (but there is more to do in various translation tests, if we want to) and ignored the regex %[0-9][0-9A-F] which helps with URL-encoded false positives - %[0-9A-F]{2} was matching "%destination" and triggering a spelling error on "stination". This cuts down the word list by 230 or so entries.

xjm’s picture

BTW for the various stuff we're finding above, we can file issues/patches now as children of the above followup; we don't need to wait for this to land.

I think if we need to run the script with a custom regex, we should add a command to core or something. (The CR at least needs to document it.)

dww’s picture

Re: #47: Good point. I opened #3138671: Fix "incompatitable" typos in core to get us started (since I've made that typo numerous times in core in the last few months). ;)

longwave’s picture

Re #47 the regex is in the .cspell.json config file, no custom command needed, but I still think it's worth adding cspell to package.json both as a package and a command - though @alexpott disagreed with this in #19 so I haven't done this yet.

I think that to scope this better we could remove the various spelling fixes from the patch here, and spin that out into a separate issue ("fix British English spellings" or similar). That would leave us here just adding .cspell.json and a set of comments where spellchecking is disabled.

longwave’s picture

StatusFileSize
new84.95 KB
new43.01 KB

Another pass at this with ~200 more false positives removed such as all the Lorem Ipsum and a number of translations.

Excluded the following files:

lib/Drupal/Component/Diff/** - not our code and has loads of strange variable names
modules/**/Migrate*Test.php - perhaps controversial but this skips a whole bunch of words from translation tests
modules/system/tests/logo.svgz - this is being treated as text by cspell for some reason
COPYRIGHT.txt - contains a lot of names

Also added several more cspell disable/enable blocks.

Remaining questions:

  1. Do we add cspell to package.json and yarn.lock? This lets us lock the cspell and dictionary versions and also provide e.g. "npm run cspell" to make it easier to use
  2. Do we spin out the spelling fixes in this patch to another issue?
dww’s picture

Re: #50:

- Seems a bit dangerous to ignore COPYRIGHT.txt. Granted, it shouldn't be changing much, but that seems like an important file to not have typos in. But yeah, it's a lot of names. /shrug

1. +1 to that. Seems good to have this handled much like eslint, locking specific versions of dictionaries, and having an npm run cspell command.

2. +1 to that. I guess I already started with #3138671: Fix "incompatitable" typos in core since I now see that those fixes are already included here. Seems weird to have a mix of known typos in cspell.json and some typo fixes in this patch. I'd rather this was just adding the cspell plumbing, the cspell ignore comments, and the complete cspell.json list (including bugs), and then we fix the bugs (and remove the entries from cspell.json) in children of the follow-up.

longwave’s picture

jungle’s picture

Filed #3138731: Fix "inheritdoc" typos in core to fix the funny @inheritdoc ones in #41/#44

dww’s picture

Status: Needs review » Needs work

So let's re-roll this to remove the typo fixes from here per #50.2.

quietone’s picture

Regarding excluding "modules/**/Migrate*Test.php - perhaps controversial but this skips a whole bunch of words from translation tests"

I am a casual reader of this issue and reckon if we want to spell check this type of migration test then a separate issue should be made because there are 253 tests of that type, 26 of which are specific to translation. As the writer of most of those translation tests I know I am prone to typos and mixing American and British spelling so maybe it is prudent to make a new issue?

xjm’s picture

+++ b/core/.cspell.json
@@ -0,0 +1,1288 @@
+      "Файл",
+      "Русский",
+      "العربية",
+      "тестирования",
+      "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΣὨ",
+      "αβγδεζηθικλμνξοσὠ"

We should probably skip things that aren't in the Latin alphabet, like the entire Greek alphabet for example, or "Russian" in Russian or "Arabic" in Arabic. Aside: "File" gets straight-up transliterated? That's delightful. :) Edit: "тестирования" is even better; what a wonderful loanword.

If the tool doesn't support skipping things in other character sets, seems like it should.

sja112’s picture

Status: Needs work » Needs review
StatusFileSize
new60.95 KB
new19.33 KB

As per #54 rerolled the patch to remove the typo files.
These fixes will be handled in the related issue Convert British English spellings to American English, for the umpteenth time

xjm’s picture

I agree with @longwave; I think we probably do want it as a dev dependency in package.json. It should probably get checked on commit like coding standards do; otherwise, it'll keep regressing.

Edit: That also means a dependency evaluation. :)

sja112’s picture

Filed #3138731: Fix "neccessary" typos in core.

I handled it separately because it was not in the scope of Convert British English spellings to American English, for the umpteenth time

dww’s picture

@sja112: I already did at #3138721: Fix "neccessary" typos in core.

@all: Please use #3122088: [Meta] Remove spelling errors from dictionary.txt and fix them as the parent for any "Fix X spelling typo" issues, and be sure to check there for what's already been filed before opening anything else. There's an active discussion about how to properly scope things.

This issue is about the cspell plumbing (only).

#3122088 and children are about fixing the actual typos. We don't need comments in this issue every time we open a child of #3122088.

Thanks!
-Derek

alexpott’s picture

Crediting @dww from the duplicate issue #3138721: Fix "neccessary" typos in core

sja112’s picture

@alexpott,

As per #54 rerolled the patch to remove the typo files in #57. That's why worked on another issue to fix "neccessary" typo error like we did for other words.

alexpott’s picture

The recent comments continue to set this back unnecessarily. We could have committed this to core and add to the committer toolchain and the committer can either fix the typos on commit or set the patch back to needs work. It's such a shame to not commit things that will make an obvious and clear benefit to the committer workflow and then iterate.

Let's not add this to package.json yet we can do that after and spend the necessary time whilst getting the improvements in core. We can do this in a follow-up.

longwave’s picture

@alexpott in that case isn't #57 ready for review? The followups can keep building up and wait until this is committed if necessary.

alexpott’s picture

Proceeding with the sub issues of #3122088: [Meta] Remove spelling errors from dictionary.txt and fix them like #3138721: Fix "neccessary" typos in core before committing this means we no way of proving that we're not introducing regressions. Once we have the core/.cspell.json in core we can stop regressing and more forward with 3122088.

Any of the patches that were green and not failing would have been good to commit. If we've had this is core when this was first done then #3138721: Fix "neccessary" typos in core would not have even been neccessary :)

@longwave yes let's get #57 in.

alexpott’s picture

Issue summary: View changes
alexpott’s picture

Title: Try to automate spellchecking in Drupal core » Add .cspell.json to automate spellchecking in Drupal core
alexpott’s picture

Assigned: Unassigned » alexpott
Status: Needs review » Needs work
+++ b/core/.cspell.json
@@ -0,0 +1,1288 @@
+    "dictionaries": ["softwareTerms", "fonts", "companies", "php"],
...
+        "filename": "**/{*.module,*.install,*.inc}",
+        "languageId": "php"

php is automatically detected for .php files... we do not need to add it here. I'm working on updating the file.

+++ b/core/.cspell.json
@@ -0,0 +1,1288 @@
+    "ignoreRegExpList": [
+      "^msgstr .*",
+      "%[0-9][0-9A-F]"
+    ],

Nice!

longwave’s picture

Adding the PHP dictionary removed at least these words from the list, and probably more:

-      "chdir",
-      "chmod",
-      "ctype",
-      "fread",
-      "fseek",
-      "gzinflate",
-      "imagemagick",
-      "instanceof",
-      "isset",
-      "mkdir",
-      "mktime",
-      "rmdir",
-      "rtrim",
-      "sprintf",
-      "stripos",
-      "strrpos",
-      "strtok",
-      "strtolower",
-      "strtoupper",
-      "strtr",
-      "unserialize",
-      "unshift",
alexpott’s picture

StatusFileSize
new4.05 KB
new60.92 KB

Turns out adding php to the global dictionaries is a good idea. Fixes things like chmod in core/INSTALL.txt - and we are a PHP project. I've also added the html dictionary for the same reason. We can always revisit these choices after commit and add new words to the wordlist but this does feel like a good place to start.

With this .cspell.json running cspell "**/*" results in no errors.

alexpott’s picture

+++ b/core/.cspell.json
@@ -753,10 +752,8 @@
-      "inheritdic",
-      "inheritdodc",
-      "inheritoc",

This is why we need to get this in first. If we land issues like #3138731: Fix "inheritdoc" typos in core and #3138671: Fix "incompatitable" typos in core we'll be playing catch up forever.

alexpott’s picture

Status: Needs work » Needs review
jungle’s picture

  1. $ cspell --version
    4.0.63
    $ cspell "**/*"
    CSpell: Files checked: 14490, Issues found: 0 in 0 files
    

    The patch looks good.

  2. +++ b/core/.cspell.json
    @@ -26,7 +26,7 @@
    +    "dictionaries": ["companies", "fonts", "html", "php", "softwareTerms"],
    

    I think the following dictionaries should be added as well.

    General Dictionaries

    • en_US - Derived from Hunspell US English words.

    Programming Language Dictionaries

    • typescript - keywords for Typescript and Javascript
    • css - css, less, and scss related keywords
    • bash - Bash/shell script keywords
    • en_US: explicitly telling that we are adopting American English
    • typescript: for .es.js and .js
    • css: for scss, css etc.

    I checked with

    -    "dictionaries": ["companies", "fonts", "html", "php", "softwareTerms"],
    +    "dictionaries": ["en_US", "companies", "fonts", "html", "css", "typescript", "php", "softwareTerms"],
    

    the output:

    $ cspell "**/*"
    CSpell: Files checked: 14490, Issues found: 0 in 0 files
    

    Note: the test without the bash dictionary

    +++ b/core/.cspell.json
    @@ -0,0 +1,1279 @@
    +        "filename": "**/scripts/{*.php.txt, *.sh}",
    +        "languageId": "php"
    
longwave’s picture

en_US is already specified as the language, we don't need to repeat that with the dictionary.

I think the risk of adding more dictionaries is we might end up with some misspelled words being ignored that happen to be in those dictionaries. I have scanned the words list in .cspell.json and I don't immediately see any CSS or JavaScript words that look like they would be removed anyway (we can test this by deleting the word list, adding the dictionaries and regenerating the word list, then doing a diff).

As far as I can see this is "good enough" for now and we can work on incremental improvements in followups.

jungle’s picture

Thanks, @longwave!

> As far as I can see this is "good enough" for now and we can work on incremental improvements in followups.

Agree, setting back to RTBC?

sja112’s picture

Applied the patch on the 9.0.x branch.

Ran cspell to ensure I don't get any new errors introduced by patches added in the meantime.

Got three,

$ cspell -c .cspell.json "**/*"
core/modules/migrate_drupal_ui/tests/src/Functional/d6/MultilingualReviewPageTest.php:153:11 - Unknown word (nviews)
core/modules/migrate_drupal_ui/tests/src/Functional/d6/NoMultilingualReviewPageTest.php:157:11 - Unknown word (nviews)
core/tests/Drupal/KernelTests/Core/Theme/StableDecoupledTest.php:89:27 - Unknown word (skippable)
CSpell: Files checked: 14486, Issues found: 3 in 3 files
longwave’s picture

Status: Needs review » Needs work

Both nviews and skippable have been removed from 9.1.x but are indeed present in 9.0.x. I guess the question is, how far back are we going to go with this? Should the word list be the same across all branches?

jungle’s picture

To maintain extra custom dictionaries per branch?

"dictionaryDefinitions": [
    { "name": "drupal88", "path": "./dictionary/8.8.x.txt"},
    { "name": "drupal89", "path": "./dictionary/8.9.x.txt"},
    { "name": "drupal90", "path": "./dictionary/9.0.x.txt"},
    { "name": "drupal91", "path": "./dictionary/9.1.x.txt"},
],

nviews and skippable go to 9.0.x.txt only?

jungle’s picture

> As far as I can see this is "good enough" for now and we can work on incremental improvements in followups.

Or no more back and forth, commit it, do all the rest in follow-ups.

jungle’s picture

Status: Needs work » Needs review

Setting back to Needs review.

sja112’s picture

Assigned: alexpott » Unassigned
Status: Needs review » Reviewed & tested by the community

Agreeing to #79.

I also believe in solving the pending issues in the follow-ups.

Patch looking good. Setting it to RTBC.

xjm’s picture

Status: Reviewed & tested by the community » Needs review

I don't think this is ready for commit. Couple things:

  1. The CR in https://www.drupal.org/node/3122084 should include examples of both updating the word list and skipping spellchecking on certain sections of code, as the patch does.
  2. The claim is being made that this patch needs to be committed first for other misspellings to avoid regressions, but right now committing this patch isn't going to prevent any regressions. In order for regressions to actually be avoided, this needs to be built into the on-commit yarn checks that happen in the committer toolchain, which means adding this to package.json and so forth. I disagree with requiring a global installation; every time we've used global installs for our development tooling before, things have eventually gone horribly wrong.
  3. Finally, I really disagree with committing this issue first rather than cleaning up obvious problems in core that already have patches. Folks misunderstood me when I said we didn't need to wait to file issues, and created patches for them in addition to filing the issues, but there's no reason to throw that work away when all we have to do here is delete a couple lines from the list.
xjm’s picture

Okay apparently there are 40-odd issues that now have patches, which definitely wasn't my intention. It's not an ideal situation to be in either way. But I see no reason to block those issues if this issue isn't going to actually stop regressions from being committed. Requiring committers to do this manually on every commit is not an option IMO, and if it doesn't run on commit we're not changing anything from the current status.

xjm’s picture

Note that this is different from your typical PHPCS rule because part of it is about documentation and language. Those tasks always need different scope management than reformatting whitespace or such.

xjm’s picture

@jungle, interesting idea on the custom dictionaries per branch. I think though we want to be striving for as small a dictionary as possible with as little maintainable surface as possible, so I would keep just the one dictionary. IMO this issue is only eligible for 9.1.x, but many of the fixes (where there's merely documentation errors in particular) can be backported as far as 8.8.x.

longwave’s picture

Updated the change record to give the options of fixing spelling, adding to the word list, or ignoring the new word(s) using code comments.

Dependency evaluation for adding this to core:

GitHub repo: https://github.com/streetsidesoftware/cspell
Release cycle: first release was in January 2017, there has been approximately one major release a year (v4 is current, v5 is in alpha) and there are regular (monthly or sooner) patch releases.
Security policy: cspell is a development tool that does not appear to have a security release process.

longwave’s picture

StatusFileSize
new87.29 KB
new26.37 KB

Added cspell to package.json and yarn.lock. Also added two commands for ease of use:

  • npm run spellcheck path/to/file - check a file (can specify multiple files)
  • npm run spellcheck:core - check all of core
alexpott’s picture

@longwave how about the dependencies it pulls in?

longwave’s picture

> how about the dependencies it pulls in?

Do we really evaluate every single npm dependency? As npm packages go it doesn't seem too sprawling, and it is only ever used in development (and even then it doesn't compile code or anything like that), so how detailed do we have to be here? All the packages beginning cspell- are from the same author. The full tree is:

$ npx npm-remote-ls cspell
npx: installed 109 in 8.123s
└─ cspell@4.0.63
   ├─ cspell-glob@0.1.19
   │  └─ micromatch@4.0.2
   │     ├─ braces@3.0.2
   │     │  └─ fill-range@7.0.1
   │     │     └─ to-regex-range@5.0.1
   │     │        └─ is-number@7.0.0
   │     └─ picomatch@2.2.2
   ├─ gensequence@3.1.1
   ├─ get-stdin@7.0.0
   ├─ comment-json@3.0.2
   │  ├─ core-util-is@1.0.2
   │  ├─ has-own-prop@2.0.0
   │  ├─ repeat-string@1.6.1
   │  └─ esprima@4.0.1
   ├─ chalk@2.4.2
   │  ├─ ansi-styles@3.2.1
   │  │  └─ color-convert@1.9.3
   │  │     └─ color-name@1.1.3
   │  ├─ supports-color@5.5.0
   │  │  └─ has-flag@3.0.0
   │  └─ escape-string-regexp@1.0.5
   ├─ fs-extra@9.0.0
   │  ├─ at-least-node@1.0.0
   │  ├─ jsonfile@6.0.1
   │  │  ├─ universalify@1.0.0
   │  │  └─ graceful-fs@4.2.4
   │  ├─ universalify@1.0.0
   │  └─ graceful-fs@4.2.4
   ├─ commander@2.20.3
   ├─ minimatch@3.0.4
   │  └─ brace-expansion@1.1.11
   │     ├─ concat-map@0.0.1
   │     └─ balanced-match@1.0.0
   ├─ cspell-lib@4.1.29
   │  ├─ comment-json@3.0.2
   │  ├─ cspell-dict-aws@1.0.5
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-bash@1.0.3
   │  │  └─ configstore@5.0.1
   │  ├─ configstore@5.0.1
   │  │  ├─ dot-prop@5.2.0
   │  │  │  └─ is-obj@2.0.0
   │  │  ├─ graceful-fs@4.2.4
   │  │  ├─ unique-string@2.0.0
   │  │  │  └─ crypto-random-string@2.0.0
   │  │  ├─ make-dir@3.1.0
   │  │  │  └─ semver@6.3.0
   │  │  ├─ xdg-basedir@4.0.0
   │  │  └─ write-file-atomic@3.0.3
   │  │     ├─ imurmurhash@0.1.4
   │  │     ├─ is-typedarray@1.0.0
   │  │     ├─ signal-exit@3.0.3
   │  │     └─ typedarray-to-buffer@3.1.5
   │  │        └─ is-typedarray@1.0.0
   │  ├─ cspell-dict-cpp@1.1.26
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-companies@1.0.22
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-django@1.0.15
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-cryptocurrencies@1.0.2
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-dotnet@1.0.14
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-en-gb@1.1.16
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-elixir@1.0.13
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-en_us@1.2.26
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-fullstack@1.0.23
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-golang@1.1.14
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-haskell@1.0.4
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-fonts@1.0.5
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-latex@1.0.13
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-java@1.0.12
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-html-symbol-entities@1.0.13
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-python@1.0.20
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-lua@1.0.8
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-powershell@1.0.6
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-lorem-ipsum@1.0.10
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-php@1.0.13
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-rust@1.0.12
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-ruby@1.0.3
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-scala@1.0.11
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-dict-software-terms@1.0.10
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-io@4.0.22
   │  │  ├─ iterable-to-stream@1.0.1
   │  │  └─ iconv-lite@0.4.24
   │  │     └─ safer-buffer@2.1.2
   │  ├─ cspell-dict-typescript@1.0.5
   │  │  └─ configstore@5.0.1
   │  ├─ cspell-util-bundle@4.0.12
   │  ├─ cspell-trie-lib@4.1.10
   │  │  └─ gensequence@3.1.1
   │  ├─ gensequence@3.1.1
   │  ├─ minimatch@3.0.4
   │  ├─ fs-extra@9.0.0
   │  └─ vscode-uri@2.1.1
   └─ glob@7.1.6
      ├─ inflight@1.0.6
      │  ├─ once@1.4.0
      │  └─ wrappy@1.0.2
      ├─ fs.realpath@1.0.0
      ├─ inherits@2.0.4
      ├─ minimatch@3.0.4
      ├─ path-is-absolute@1.0.1
      └─ once@1.4.0
         └─ wrappy@1.0.2
longwave’s picture

StatusFileSize
new108.16 KB
new24.6 KB

Ignored all the non Latin characters, some parts of binary encoded strings, and a few other bits.

dww’s picture

Cool, nice to see this moving forward. +1 to putting this directly into core, instead of relying on global installs of anything. Makes it much more likely people will (be able to) use this.

However, this seems to bring in a lot more than we want:

  1. +++ b/core/yarn.lock
    @@ -1661,6 +1688,285 @@ cross-spawn@^7.0.0:
    +cspell-dict-cpp@^1.1.26:
    +  version "1.1.26"
    +  resolved "https://registry.yarnpkg.com/cspell-dict-cpp/-/cspell-dict-cpp-1.1.26.tgz#67e3f8d26ec2c49d305b086013935f0b0fade2e0"
    +  integrity sha512-ywY7X6UzC5BC7fQhyRAwZHurl52GjwnY6D2wG57JJ/bcT5IsJOWpLAjHORtUH2AcCp6BSAKR6wxl6/bqSuKHJw==
    +  dependencies:
    +    configstore "^5.0.0"
    +
    +cspell-dict-cryptocurrencies@^1.0.2:
    +  version "1.0.2"
    +  resolved "https://registry.yarnpkg.com/cspell-dict-cryptocurrencies/-/cspell-dict-cryptocurrencies-1.0.2.tgz#301773a9b555d4c3173f442b0770483874ec36dd"
    +  integrity sha512-suLIsOGmeHt+lqRBbbOJM9aVeBNcXq+3kKINOyuFiAJFpRhDMQrnATzGmW0hhi8XaJHFBcSeQY7iQYe3u1WbnA==
    +  dependencies:
    +    configstore "^5.0.0"
    +
    +cspell-dict-django@^1.0.15:
    +  version "1.0.15"
    +  resolved "https://registry.yarnpkg.com/cspell-dict-django/-/cspell-dict-django-1.0.15.tgz#a0faec617cab280bd9ef942d1b2a6a5634e5c143"
    +  integrity sha512-heppo6ZEGgv+cVPDLr24miG8xIn3E5SEGFBGHyNLyGqt8sHzeG3eNKhjKOJCC0hG/fq0ZECbE5q4691LvH24/Q==
    +  dependencies:
    +    configstore "^5.0.0"
    +
    +cspell-dict-dotnet@^1.0.14:
    +  version "1.0.14"
    +  resolved "https://registry.yarnpkg.com/cspell-dict-dotnet/-/cspell-dict-dotnet-1.0.14.tgz#780c3143d340e3211be27df7cfd2d9d1f82b24c5"
    +  integrity sha512-gTuh94tNAVMS4XmVCK2AsFgKp2mXBk2b8+f2GLCw2K8HY6QUHlvOJg051JJrZABRW/lAoquKZuqssSo9B1mgng==
    +  dependencies:
    +    configstore "^5.0.0"
    +
    +cspell-dict-elixir@^1.0.13:
    +  version "1.0.13"
    +  resolved "https://registry.yarnpkg.com/cspell-dict-elixir/-/cspell-dict-elixir-1.0.13.tgz#f3d08b27d2ee2a25fcae5050820d5680028e95d5"
    +  integrity sha512-KWDO4NeV3QuMlZxSWpN0sPiFN4GE5AzlDi75eSKRvq/f1+pxgxgXQ5zLNPnDbr2EOSJBV34paZwI+7PvCiTTgA==
    +  dependencies:
    +    configstore "^5.0.0"
    +
    

    Can we prevent it from pulling all this in?

  2. +++ b/core/yarn.lock
    @@ -1661,6 +1688,285 @@ cross-spawn@^7.0.0:
    +cspell-dict-en-gb@^1.1.16:
    +  version "1.1.16"
    +  resolved "https://registry.yarnpkg.com/cspell-dict-en-gb/-/cspell-dict-en-gb-1.1.16.tgz#75155e43c21e972ac2f60117b69fd53b5701335f"
    +  integrity sha512-PBzHF40fVj+6Adm3dV3/uhkE2Ptu8W+WJ28socBDDpEfedFMwnC0rpxvAgmKJlLc0OYsn07/yzRnt9srisNrLg==
    +  dependencies:
    +    configstore "^5.0.0"
    +
    

    And especially this, given #3138718: Convert British English spellings to American English, for the umpteenth time ;)

  3. +++ b/core/yarn.lock
    @@ -1661,6 +1688,285 @@ cross-spawn@^7.0.0:
    +cspell-dict-fonts@^1.0.5:
    ...
    +cspell-dict-fullstack@^1.0.23:
    ...
    +cspell-dict-golang@^1.1.14:
    ...
    +cspell-dict-haskell@^1.0.4:
    ...
    +cspell-dict-java@^1.0.12:
    ...
    +cspell-dict-latex@^1.0.13:
    ...
    +cspell-dict-lua@^1.0.8:
    ...
    +cspell-dict-powershell@^1.0.6:
    ...
    +cspell-dict-python@^1.0.20:
    ...
    +cspell-dict-ruby@^1.0.3:
    ...
    +cspell-dict-rust@^1.0.12:
    ...
    +cspell-dict-scala@^1.0.11:
    ...
    +cspell-dict-typescript@^1.0.5:
    

    Bunch more we don't care about, right?

Thanks!
-Derek

longwave’s picture

No, we can't stop that - cspell declares all those as dependencies, so you get them whether you like it or not. They do nothing except waste disk space, unless you explicitly choose to use them in .cspell.json.

dww’s picture

@longwave: re: #92 Duly noted, thanks.

Maybe this is silly, but is it possible/worthwhile to have a "post-install" command that runs to remove the crap we don't need? ;)

e.g.

% yarn install
installing a bunch of stuff...
triggering post-install hooks:
removing junk you don't really need

I know that kinda thing works in composer land, but I'm not a yarn-er. Thoughts?

Thanks again,
-Derek

alexpott’s picture

Doing #93 introduces more fragility we shouldn't be changing a packages dependencies externally.

alexpott’s picture

I've updated https://github.com/alexpott/d8githooks to support spellchecking. It will use a globally installed cspell if available and the .cspell.json from #90. However once this patch is committed it'll automatically switch to using yarn and the config we commit.

One downside (for me and any committer that chooses to install cspell) is that if one of the file changes that has words that fail checking and we're getting around that with a // cSpell:disable tag then we'll have to work out it if the patch is changing that line - but that feels a small price to pay for better patch review right now. The git hooks support a flag to disable spellchecking to get around this.

alexpott’s picture

So running yarn audit before and after running yarn install with this patch applied reveals that we already have known security vulnerabilities in js dev dependencies.

With patch

22 vulnerabilities found - Packages audited: 995
Severity: 21 Low | 1 Moderate
✨  Done in 2.10s.

Without patch

22 vulnerabilities found - Packages audited: 938
Severity: 21 Low | 1 Moderate
✨  Done in 1.36s.

At least we're not adding anything new :)

xjm’s picture

@longwave re: #89 yep, we try to keep a handle on our dependencies' dependencies too, since these also introduce stability and security risks. We've been trying very hard not to be "that project with 1000 npm dependencies in the dependency tree".

In #3107926: Update stylelint to ^13.0.0 the most recent version of stylelint introduced a much larger dependency tree than it had before, so we're discussing how to proceed there.

alexpott’s picture

Here's a diff using yarn-lock-diff

┌──────────────────────────────┬──────────────────────────────────────────────────┬──────────────────────────────────────────────────┐
│ package name                 │ old version(s)                                   │ new version(s)                                   │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ dot-prop                     │ [...]                                            │ [..., 5.2.0]                                     │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ graceful-fs                  │ [...]                                            │ [..., 4.2.4]                                     │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ is-obj                       │ [...]                                            │ [..., 2.0.0]                                     │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ make-dir                     │ [...]                                            │ [..., 3.1.0]                                     │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ write-file-atomic            │ [...]                                            │ [..., 3.0.3]                                     │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ at-least-node                │ -                                                │ 1.0.0                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ comment-json                 │ -                                                │ 3.0.2                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ configstore                  │ -                                                │ 5.0.1                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ crypto-random-string         │ -                                                │ 2.0.0                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-aws              │ -                                                │ 1.0.5                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-bash             │ -                                                │ 1.0.3                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-companies        │ -                                                │ 1.0.22                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-cpp              │ -                                                │ 1.1.26                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-cryptocurrencies │ -                                                │ 1.0.2                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-django           │ -                                                │ 1.0.15                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-dotnet           │ -                                                │ 1.0.14                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-elixir           │ -                                                │ 1.0.13                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-en-gb            │ -                                                │ 1.1.16                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-en_us            │ -                                                │ 1.2.26                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-fonts            │ -                                                │ 1.0.5                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-fullstack        │ -                                                │ 1.0.23                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-golang           │ -                                                │ 1.1.14                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-haskell          │ -                                                │ 1.0.4                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-html-symbol-ent… │ -                                                │ 1.0.13                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-java             │ -                                                │ 1.0.12                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-latex            │ -                                                │ 1.0.13                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-lorem-ipsum      │ -                                                │ 1.0.10                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-lua              │ -                                                │ 1.0.8                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-php              │ -                                                │ 1.0.13                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-powershell       │ -                                                │ 1.0.6                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-python           │ -                                                │ 1.0.20                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-ruby             │ -                                                │ 1.0.3                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-rust             │ -                                                │ 1.0.12                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-scala            │ -                                                │ 1.0.11                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-software-terms   │ -                                                │ 1.0.10                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-dict-typescript       │ -                                                │ 1.0.5                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-glob                  │ -                                                │ 0.1.19                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-io                    │ -                                                │ 4.0.22                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-lib                   │ -                                                │ 4.1.29                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-trie-lib              │ -                                                │ 4.1.10                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell-util-bundle           │ -                                                │ 4.0.12                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ cspell                       │ -                                                │ 4.0.63                                           │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ fs-extra                     │ -                                                │ 9.0.0                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ gensequence                  │ -                                                │ 3.1.1                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ has-own-prop                 │ -                                                │ 2.0.0                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ iterable-to-stream           │ -                                                │ 1.0.1                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ jsonfile                     │ -                                                │ 6.0.1                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ unique-string                │ -                                                │ 2.0.0                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ universalify                 │ -                                                │ 1.0.0                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ vscode-uri                   │ -                                                │ 2.1.1                                            │
├──────────────────────────────┼──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ xdg-basedir                  │ -                                                │ 4.0.0                                            │
└──────────────────────────────┴──────────────────────────────────────────────────┴──────────────────────────────────────────────────┘
xjm’s picture

@longwave For more background on this concern, see: https://youtu.be/yxVJggeMAPs?t=656 (Edit: Added the start time code as you obviously don't need background information on Drupal 9.) :P Edit 2: https://youtu.be/yxVJggeMAPs?t=1928 is the timecode specifically about our future npm dependency tree.

Re: #96 ouch, we just fixed the bad packages on audit like 6 months ago. We probably want a separate[ issue to update them again... also that number of dev packages has exploded somehow since. GitHub also isn't listing all those for some reason.

alexpott’s picture

By actually using this discovered that we also need to include config for composer/* folders they have things we do and don't want to scan. Will work on this later. Note all the other files outside core/ are scaffolded files so they are sort of checked already.

xjm’s picture

Status: Needs review » Needs work
Issue tags: +Needs release manager review

For #100.

Also tagging for RM review of the dependency tree we're adding. (Definitely prefer to have it as a dev dependency in package.json given the scope of the dep tree.)

alexpott’s picture

Status: Needs work » Needs review
StatusFileSize
new2.13 KB
new108.34 KB

Adding the words from the composer folder. Turns out there aren't too many and there's no file that stands out that we should exclude.

xjm’s picture

So in addition to the various cspell packages, this adds:

  1. dot-prop
  2. graceful-fs
  3. is-obj
  4. make-dir
  5. write-file-atomic
  6. at-least-node
  7. comment-json
  8. configstore
  9. crypto-random-string
  10. fs-extra
  11. gensequence
  12. has-own-prop
  13. iterable-to-stream
  14. jsonfile
  15. unique-string
  16. universalify
  17. vscode-uri
  18. xdg-basedir

Blehhhhh at all the stupid micro-packages. Like is-obj is:

'use strict';

module.exports = value => {
	const type = typeof value;
	return value !== null && (type === 'object' || type === 'function');
};

srsly

xjm’s picture

Have we looked at any other libraries for this?

xjm’s picture

Issue summary: View changes

Most of the packages are authored by a handful of authors who really like to share their three-line functions. Added the list to the IS.

alexpott’s picture

This the package that phpcs uses to do this job.

longwave’s picture

It is also the engine behind the most commonly used spell checker in Visual Studio Code with 1.3 million installs to date.

https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.c...

jungle’s picture

StatusFileSize
new98 KB
new34.23 KB

Comment coming in the next comment

jungle’s picture

  1. +++ b/core/.cspell.json
    @@ -20,1230 +20,17 @@
    +      "misc/cspell/dictionary.txt"
    ...
    -    "dictionaries": ["companies", "fonts", "html", "php", "softwareTerms"],
    -    "words": [
    ...
    +    "dictionaries": ["drupal","companies", "fonts", "html", "php", "softwareTerms"],
    +    "dictionaryDefinitions": [
    +      { "name": "drupal", "path": "./misc/cspell/dictionary.txt"}
         ],
    

    Introduced a new custom dictionary: drupal, split out the words into misc/cspell/dictionary.txt making the .cspell.json cleaner and smaller.

  2. +++ b/core/package.json
    @@ -25,6 +25,7 @@
    +    "spellcheck:dict": "cspell \"**/*\" --unique --wordsOnly | sort > misc/cspell/dictionary.txt",
    

    Added a command spellcheck:dict to generate/update the dictionary

  3. The content of misc/cspell/dictionary.txt was generated by running npm run spellcheck:dict
  4. +++ b/core/.cspell.json
    @@ -0,0 +1,48 @@
    +    "ignorePaths": [
    ...
    +      "misc/cspell/dictionary.txt"
    

    The path of custom dictionary is ignored by cspell.

alexpott’s picture

Issue summary: View changes
StatusFileSize
new1.75 KB
new97.98 KB

@jungle you've undone the changes I made in #102. Also locally I couldn't get the spellcheck:dict command to not exit with:

error Command failed with exit code 1.

So I've adjusted the command to...

    "spellcheck:dict": "rm misc/cspell/dictionary.txt && yarn -s spellcheck:core --unique --wordsOnly | sort -o misc/cspell/dictionary.txt",

so that uses the spellcheck:core command and doesn't error locally for me. Also we need to rm misc/cspell/dictionary.txt first otherwise re-running the command empties the dictionary :)

However... thinking more about this command I'm not sure we should add it. Regenerating the dictionary might be a common task before we've committed this. But after it should never be done. So let's not add it to package.json but let's add it how to do it to the issue summary here.

alexpott’s picture

Issue summary: View changes

I've updated the change record with the yarn commands.

alexpott’s picture

jungle’s picture

  1. > you've undone the changes I made in #102.

    Sorry about this, missed. Thank for fixing it.

  2. I used the command in #108 generated the dictionary. it worked for me. and tested that your command works too.
    CSpell: Files checked: 14543, Issues found: 9118 in 2675 files
    
  3. To me, I would add the command to package.json, it's convenient if we want to re-generate the dictionary for some reasons (yes, currently, no reason). But it's ok, to get rid of it.
alexpott’s picture

I'm torn on the dictionary command - on one hand we really ought not need it if we proceed with all the follow-ups. On the other files will get fixed and removed outside of that process. However, the thing that gives me the most pause as to whether we should add the command is that it becomes another thing to support. Meaning we can't remove the command. I think given this and the fact that it wouldn't be part of any regular CI process mean that we shouldn't add it.

xjm’s picture

Why would we not regenerate the dictionary? We're going to be fixing issues manually, but issues might also get fixed without removing the bad entry from the dictionary, especially when dead code is removed, documentation is rewritten, or code is refactored, but also just because people fix things in it.

The command should at least go in the CR, but it is also still useful in the future I think to continue generating a smaller and smaller dictionary as things might get fixed in other issues.

alexpott’s picture

Because it is yet another thing to support and it will not part of any CI process. Also it is a bit fragile. It has to to remove the existing dictionary otherwise the first time you run it'll generate a blank dictionary. However once you remove the dictionary if you exist the process before quitting then running it again will fail. Exiting the process is not uncommon because it takes a long time to generate the dictionary and you can suddenly think oh I meant to do that as well.

Adding dictionary generation to the CR.

longwave’s picture

> However once you remove the dictionary if you exist the process before quitting then running it again will fail

If you change the first part of the command to rm -f misc/cspell/dictionary.txt it shouldn't fail when the file doesn't exist.

Spotted in another issue, and double checked here: why isn't Driectory spotted as a spelling error?

./core/tests/Drupal/FunctionalTests/Installer/InstallerExistingConfigSyncDriectoryProfileMismatchTest.php
jungle’s picture

Re #117

A demo command to check file names:

$ find . -type file | cspell stdin 
alexpott’s picture

@jungle we have the string in a file. i.e.

class InstallerExistingConfigSyncDriectoryProfileMismatchTest extends InstallerExistingConfigTestBase {

So it should be finding it.

I think we need to file an upstream issue. As InstallerExistingConfigSyncDriectoryProfileMismatchTest is not detected. However InstallerExistingConfigSyncDriectory, BlahDriectory and DriectoryBlah are. Interestingly InstallerExistingConfigSyncDriectoryProfile is not.

alexpott’s picture

StatusFileSize
new715 bytes
new98.15 KB

Here's the make dictionary command back again but with @longwave's suggestion so it is re-entrant after quitting - nice one. I've changed the command name to be more explicit about what it does. I also reran it and there's no changes.

jungle’s picture

  1. @jungle we have the string in a file. i.e.

    In theory. not all file names show up in files, but I believe in that most of them are, as classes everywhere in D8/D9.

  2. Interestingly InstallerExistingConfigSyncDriectoryProfile is no

    Drie-s-magic happens to Driectory? :p

xjm’s picture

Issue summary: View changes

Drie-s-magic happens to Driectory? :p

💯

Adding further notes to the IS for the dependency evaluation, both from #86 and the other projects that use this.

BTW, the words:

...is a development tool that does not appear to have a security release process...

...Always make me sad.

alexpott’s picture

StatusFileSize
new98.13 KB
new370 bytes

Rerolled now that #3138721: Fix "neccessary" typos in core has landed.

jungle’s picture

@catch committed a few relevant fixes in the past 10 mins. needs reroll again, probably.

What left here? Could someone RTBC this after the next reroll?

Thanks!

alexpott’s picture

StatusFileSize
new461 bytes
new98.11 KB

Here we go...

The last submitted patch, 123: 2972224-5-123.patch, failed testing. View results

longwave’s picture

StatusFileSize
new98.12 KB
new881 bytes

sort is collation dependent, this means we get different sort orders on Mac and Linux, and on Linux the current locale has an effect too. If I regenerate the dictionary on Linux I get the same file but in a different order - the upper and lower case letters are mixed rather than all the uppercase letters coming first.

I think we can override this by setting the LC_ALL environment variable to C which should use the same collation everywhere. On Linux this now generates the same dictionary that was in the previous patch.

longwave’s picture

Discussed in Slack: as far as I am concerned this is ready for RTBC now, if @jungle wants to do that when this test run passes?

jungle’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs release manager review

Checking again against the latest commit: c9c0fc1e59

$ cd core
$ yarn
$ yarn run spellcheck:make-drupal-dict
yarn run v1.22.4
$ rm -f misc/cspell/dictionary.txt && yarn -s spellcheck:core --unique --wordsOnly | LC_ALL=C sort -o misc/cspell/dictionary.txt
CSpell: Files checked: 14558, Issues found: 9120 in 2678 files
✨  Done in 247.78s.

Regenerated the dictionary, confirmed it's identical with the one in the patch.

To me, nothing left here, so removing the tag "Needs release manager review". Let's move forward. Do all the rest in child issues or follow-ups

Note for committer: please remember to publish the CR: https://www.drupal.org/node/3122084

Thanks!

alexpott’s picture

@jungle thanks for the rtbc. A release manager needs to remove that tag.

jungle’s picture

@alexpott,. Thanks for correcting me! Yes, realized that I am not eligible to remove that tag.

xjm’s picture

FWIW, @catch and I both had some similar concerns about the potential impact of this on DX.

E.g. is correcting the British and Australian spellings that the Commonwealth committers don't notice really worth adding work and RTBC/NW cycles for someone who puts a Star Trek reference in their test fixture? I'm not sure. I'm not saying I think it's wontfix; just I am not totally sure that we're winning on impact vs. disruption.

longwave’s picture

If there are concerns could we introduce it but with reduced scope? If we could ignore all tests and code comments for now, would that be a good start? That would still catch some typos in YAML files, error messages, etc.

xjm’s picture

Another concern (or at least annoyance) is that the cspell checks seem to take much longer than phpcs or yarn per file. It took fully two minutes to scan the files in this patch when I committed this locally to test! Is there any way to speed it up, I wonder?

xjm’s picture

Tested locally:

Checking changed files...
/Users/xjm/git/maintainer/core/lib/Drupal.php:12:54 - Unknown word (fbhji)
error Command failed with exit code 1.
PHPCS: core/lib/Drupal.php passed
[ibnsina:maintainer | Thu 12:38:48] $ git diff
diff --git a/core/lib/Drupal.php b/core/lib/Drupal.php
index 461fda8547..29fc5f8071 100644
--- a/core/lib/Drupal.php
+++ b/core/lib/Drupal.php
@@ -2,14 +2,14 @@
 
 /**
  * @file
- * Contains \Drupal.
+ * Contains \Durpal.
  */
 
 use Drupal\Core\DependencyInjection\ContainerNotInitializedException;
 use Symfony\Component\DependencyInjection\ContainerInterface;
 
 /**
- * Static Service Container wrapper.
+ * Static Service Container wrapper normalise colour fbhji.
  *
  * Generally, code in Drupal should accept its dependencies via either
  * constructor injection or setter method injection. However, there are cases,

The Brit spellings are in the dictionary I guess, but why didn't it catch "Durpal"?

xjm’s picture

@longwave Sorry, missed your comment amidst posting mine.

Skipping tests could be useful since that's where a lot of test data ends up, but on the other hand we would not be able to catch scenarios where e.g. the test was not actually testing what we thought it did because of a mis-named method, provider case, etc. In some cases we can protect against that already by including "before and after" assertions, but not always. It's worth considering.

Skipping comments I'm similarly conflicted about. On the one hand, typos in comments are less likely to affect anything. On the other, though, part of the advantage for me of doing this would be to cut down on the number of nitpick patches (that also become a part of credit gaming) and have more contributors focus on fixing more impactful things. (That's part of the goal of the coding standards work generally -- fix it once, enable the rule, and prevent us from needing such patches ever again.)

I wonder if skipping things speeds up the scanning? Like skipping whole files certainly should, but does skipping certain lines in a file that's being scanned speed up or slow down the scan?

xjm’s picture

Status: Reviewed & tested by the community » Needs review

I'd like to discuss the above more, e.g. proposals like #133 or other ideas. Thanks!

xjm’s picture

BTW, I'm also going to commit the outstanding RTBC spelling fix patches after discussing with @catch earlier, since RC is the perfect time to commit such patches and since they're backportable while this isn't. We still shouldn't create a zillion more issues or anything, but we can wait and update the dictionary once the already-ready ones are in. (Issue scoping and process are release management purview, and rerolling all those would be more work than rerolling just this one.) Thanks!

alexpott’s picture

StatusFileSize
new28.85 KB
new111.08 KB

I spent sometime investigating #135 trying to figure out why Durpal is allowed. And it's because we have compound words set to TRUE. We shouldn't do that. So I've turned that off and re-run the dictionary.

Also while investigating the cSpell configuration I've realised that our dictionary should be all lowercase. cSpell is case insensitive and all their dictionaries are lowercase too. I've adjusted the command to generate a lowercase dictionary and handle stray \ characters that turn up when spell checking things like Subtheme\'s. Doing this also removes some duplicates from the dictionary which is nice. For example in #127 we've got CPAN and cpan.

I've also looked at the performance issues. cSpell set up is quite expensive - it has to build the dictionaries from compressed and other interesting formats. So I've adjusted the d8githooks to do the spell checking of all the changed files in one go. Do it this way has a massive and beneficial impact on performance. Without this change committing this patch takes me 66.04 secs and with this change 10.71 secs

Not scanning test would speed things up. but then we'd lose some benefit. For example,

  /**
   * Confirms that we can fetch a record into a class using fetchObject.
   *
   * @see \Drupal\system\Tests\Database\FakeRecord
   * @see \Drupal\Core\Database\StatementPrefech::fetchObject
   */
  public function testQueryFetchObjectClass() {

The scanner picks up that StatementPrefech is badly spelt and that is in core/tests/Drupal/KernelTests/Core/Database/FetchTest.php

xjm’s picture

≈ 10s is much more reasonable for a large patch.

Why is "Durpal" considered a compound word? I don't get it.

I just grepped the codebase for "spelt" to make sure that didn't need to be added to #3138718: Convert British English spellings to American English, for the umpteenth time (as it's a grain in AE, but not a past participle).

alexpott’s picture

I think "Durpal" is considered a compound word because pal is in the dictionary and the dur is a three letter word and so ignored. But I'm not sure.

alexpott’s picture

Re the impact versus disruption.

  1. This tooling is finding real bugs - see #3138795: d6_term_node_revision references non-existent migration
  2. We’ve had years and years of spellchecking issues being filed against core and these can become a thing of the past. And people not landing their patches because either they’ve already needed re-rolls or they missed correcting the same mistake somewhere else.
  3. We finally have a way to not regress when we decide to prefer something like “writable” over “writeable” - see #2898947: Change "writeable" to "writable" in documentation - an issue that so far has taken us 3 years to do!
  4. How many times do we want to fix and get credit for #3138718: Convert British English spellings to American English, for the umpteenth time
  5. Adding friction to adding favourite fake words into tests is a feature too. One person's favourite band can be another person's trigger.
alexpott’s picture

StatusFileSize
new871 bytes
new110.99 KB

Re-rolled spelling fixes have been committed.

quietone’s picture

Totally agree with #142, especially #1. I am the author of the typo that lead to this bug discovery and I spent more time today reviewing, fixing, rerolling a related patch and commenting than I would have done if the typo was discovered while working on the original issue.

+++ b/core/.cspell.json
@@ -0,0 +1,48 @@
+      "modules/**/Migrate*Test.php",
...
+      "modules/migrate_drupal/tests/fixtures/drupal6.php",
+      "modules/migrate_drupal/tests/fixtures/drupal7.php",

Nice, I can still add 'Golgafrincham' somewhere. ;-)

longwave’s picture

The backslash removal needs work, on Linux I get:

> rm -f misc/cspell/dictionary.txt && yarn -s spellcheck:core --unique --wordsOnly | tr '[:upper:]' '[:lower:]' | tr -d \\ | LC_ALL=C sort -u -o misc/cspell/dictionary.txt

tr: warning: an unescaped backslash at end of string is not portable
longwave’s picture

StatusFileSize
new25.03 KB

So if we drop tests from the spellcheck then we lose 905 words from the dictionary, some of which are spelling errors but many of which are either real words or foreign words/nonsense/technical terms used in tests.

However this does a lot for the speed of npm run core:spellcheck run, sample timings from my laptop:

#143 including tests: 7m1s
With tests removed: 3m20s

Interdiff attached only for discussion.

alexpott’s picture

StatusFileSize
new2.33 KB
new110.73 KB

Rerolled because #3138718: Convert British English spellings to American English, for the umpteenth time.

Hopefully fixed the warning on unix for the dictionary make command.

I think now that commit performance is fixed I'd like to continue to include tests. Here's why:

  1. It fixed errors in tests like highlighted in #139
  2. This process is not part of the critical path for performance and running a full scan is a rare activity. It's only common now because we trying to land the patch
  3. If we don't fix them in tests we're then going to get spelling mistakes in tests so we're going to get issues filed that need to be reviewed and committed etc.
  4. The friction to adding new fake works is a good thing because as said before - One person's favourite band can be another person's trigger.
  5. There are other cases where a spelling mistake might not result in a test breaking but the assumptions around a test might be changed.
alexpott’s picture

StatusFileSize
new984 bytes
new110.74 KB

Apparently #147 still issues a warning on linux. After discussing with @jungle and @longwave we've discovered we need a lot \ for cross compatibility.

longwave’s picture

Confirm that 8 backslashes is the correct number to fix this on Linux. We need two for tr to not throw a warning, then those two need to be escaped to four for the shell, and those four need to become eight when embedded in JSON.

longwave’s picture

Agree with all of #147.1/3/4/5. My only question around #147.2 is that eventually would we want this to become part of DrupalCI? If so is adding some minutes on to each test run acceptable? Or I guess we could script this like the linter and only make it run on changed files for patches?

alexpott’s picture

StatusFileSize
new522 bytes
new110.7 KB

Keeping up with HEAD.

Re #150 - yes CI would definitely only test changed files. Exactly the same way we run PHP lint against changed files.

longwave’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC on this for @xjm to review #142 onwards, as I'm happy with what's been changed since around compound words, agree that checking tests and comments is worthwhile, and the performance questions have been answered too.

alexpott’s picture

StatusFileSize
new289 bytes
new110.69 KB

Keeping up with HEAD.

dww’s picture

I'm also +100 to all the points in #142 / #147.

I don't know if anyone except a release manager is supposed to RTBC with the "Needs release manager review" tag, but +1 to RTBC.

Re: #133: Let's not ignore/skip any more than we have to. If we're going to nitpick test comments (which we do), let's automatically spell check them, just like everything else.

Here's my experience trying it locally (2016 Macbook Pro, 3.3 GHz Dual-Core Intel Core i7, 16 GB RAM):

% yarn spellcheck:core
yarn run v1.22.4
$ cspell "**/*" "../composer/**/*"
/.../drupal-9_1/core/phpunit.xml:25:67 - Unknown word (MAMP)
/.../drupal-9_1/core/phpunit.xml:25:79 - Unknown word (htdocs)
CSpell: Files checked: 14563, Issues found: 2 in 1 files
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
yarn spellcheck:core  331.03s user 12.90s system 101% cpu 5:39.10 total

One possible improvement: should we ignore core/phpunit.xml, too? We can have weird words in there from filesystem paths, etc. Probably doesn't matter, so I'm not NW or NR'ing for this, but I wanted to ask.

5 mins 39 seconds wall clock time to check all of core isn't great, but checking individual files, patches, etc, is only a few seconds. Checking all of core isn't really going to need to happen much going forward.

Another experience point. I made this change to core/authorize.php:

--- a/core/authorize.php
+++ b/core/authorize.php
@@ -15,6 +15,9 @@
  * settings.php ('allow_authorize_operations') and via the 'administer software
  * updates' permission.
  *
+ * dww dozent spell nicelee
+ * Thignz ar funy

I ran the check on this file and see:

% yarn spellcheck authorize.php
yarn run v1.22.4
$ cspell authorize.php
/.../drupal-9_1/core/authorize.php:18:8 - Unknown word (dozent)
/.../drupal-9_1/core/authorize.php:18:21 - Unknown word (nicelee)
/.../drupal-9_1/core/authorize.php:19:4 - Unknown word (Thignz)
/.../drupal-9_1/core/authorize.php:19:14 - Unknown word (funy)
CSpell: Files checked: 1, Issues found: 4 in 1 files
error Command failed with exit code 1.

Seems odd that it doesn't flag "dww". I don't see that in the Drupal dictionary. ;) I guess "ar" is actually a word of some sort in one of our used dictionaries?

Generally, looks like this is going to help a lot but it won't be fool-proof and we're still going to have to pay some attention to typos/spelling mistakes during reviews.

Cheers,
-Derek

longwave’s picture

@dww cspell ignores words shorter than 4 letters.

xjm’s picture

Yep, any "Needs [committer-type] review" tag is fine to have outstanding if an issue is otherwise RTBC (because the RTBC queue is all issues that are only actionable for committers).

alexpott’s picture

StatusFileSize
new110.44 KB

Re-rolled because yarn.lock has changed and therefore was conflicting.

alexpott’s picture

StatusFileSize
new109.68 KB

yarn.lock changed again so rerolled and generated the dictionary again - no changes.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 158: 2972224-5-158.patch, failed testing. View results

longwave’s picture

Status: Needs work » Reviewed & tested by the community

Random fail in CKEditorIntegrationTest, back to RTBC

xjm’s picture

Saving issue credits.

  • xjm committed 94ca9bc on 9.1.x
    Issue #2972224 by alexpott, longwave, jungle, sja112, xjm, dww, daffie,...
xjm’s picture

Alright, @catch and I have discussed this issue several times, and we both are somewhat reluctant about the slight DX annoyance of having to add or exclude words that are missing from the dictionary. However, the fact that spelling errors can also cause real bugs, expose issues with test coverage, etc. all make the issue more worthwhile. Especially now that the performanced is fix, I'm OK with this going into 9.1.x.

Committed to 9.1.x, and published the change record. I'm also going to NW a few issues that can now add or remove a dictionary entry as appropriate. Thanks!

xjm’s picture

Status: Reviewed & tested by the community » Fixed
jungle’s picture

Thanks @xjm for committing. Landed finally.

One question that could this be an exception and be backported to 9.0.x and 8.9.x?

I know you have your considerations so that it got committed to 9.1.x only. But if so, related patches which touched the dictionary have to be re-rolled if those patches are applicable to 9.0.x and 8.9.x. You could not do cherry-pick from 9.1.x probably which increases an extra effort to both contributors and committers.

chi’s picture

jwilson3’s picture

Apologies, coming here from #3154539: Implement new Gray scale on Claro where I'm looking at trying to standardize the word "gray/grey". This is a really long issue to read through to figure out what is the proper procedure for adding a new spelling correction. Could there maybe be a CR created for this, or at least something in this issue title that describes how to go about proposing a new spelling correction.

I also noticed there is a safeword of "davysgray" in the dictionary introduced in this issue. I dont know if there is something proper or special about that word that would prevent it from being standardized to "davysgray" (or removed entirely if I can convince the designers to move their neutral color scale to a decimal based scale, instead of named colors.

Edit: found the CR here: https://www.drupal.org/node/3122084/

alexpott’s picture

@jwilson3 the current dictionary contains misspellings and other errors. It was added so that we could stop new misspellings occurring and and work in a structured and well-scoped way to fix core. Words can be added to the flagwords configuration and removed from dictionary by issues as appropriate.

jwilson3’s picture

Thank you @alexpott. I came to the same conclusion last night after finding the CR and I created a follow-up issue with patch for the misspelling I found. #3155258: Use American English spelling of "gray" Feedback appreciated on that. I hope I did it right.

jwilson3’s picture

After running yarn run spellcheck:core on a pristine 9.1.x branch, I'm seeing several instances of 2 unknown words that are being flagged that are not in the dictionary.

...
/core/modules/migrate/src/Plugin/migrate/process/StaticMap.php:59:22 - Unknown word (autop)
/core/modules/migrate/src/Plugin/migrate/process/StaticMap.php:61:22 - Unknown word (htmlcorrector) 
...

CSpell: Files checked: 14601, Issues found: 56 in 27 files
error Command failed with exit code 1.

Does it need a follow-up issue?

jungle’s picture

Re #170, @alexpott just did a hot fix on that. please pull the latest code to your local.

jwilson3’s picture

Thanks. confirmed that works.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.