Problem/Motivation
Issues like #2829185: Fix spelling errors in Drupal core comments are great, as they provide profit, but it'd be nice when things could be done more automatically.
Let's try to do some research how we could provide maybe a tool to check for spellchecking automatically.
Symfony for example corrects common mistakes using a bot.
Proposed resolution
Use cspell
The config file locates at core/.cspell.json, you can add new words into it or adjust it when necessary. Visit https://github.com/streetsidesoftware/cspell/tree/master/packages/cspell and read the documentation for more information. Such as how to enable/disable checking etc.
Example usages:
- Checking files inside the core folder:
cd core yarn run spellcheck:coreWarning: the executing time for this command is about 5 to 30 minutes depending on your PC.
-
Or to checking any file or directory:
cd core yarn run spellcheck "path/to/file/or/directory/**/*"
Once an unknown word is reported, you have two options:
- If it's an incorrect word, fixing it
- if it's a correct word, adding it to the core/misc/cspell/dictionary.txt file
Regenerating the Drupal dictionary
Run:
cd core
rm misc/cspell/dictionary.txt
yarn -s spellcheck:core --unique --wordsOnly | sort -o misc/cspell/dictionary.txt
Remaining tasks
Followups:
#3122088: [Meta] Remove spelling errors from dictionary.txt and fix them
User interface changes
None
API changes
None
Data model changes
None
Dependency evaluation
- GitHub repo: https://github.com/streetsidesoftware/cspell
- Release cycle: first release was in January 2017, there has been approximately one major release a year (v4 is current, v5 is in alpha) and there are regular (monthly or sooner) patch releases.
- Security policy: cspell is a development tool that does not appear to have a security release process.
The package is used by:
- phpcs
- Code Spell Checker for Visual Studio Code, 1.3 million installs (see below about the Microsoft-authored dependency)
In addition to its own sub-packages, cspell installs all of the following:
- dot-prop (sindresorhus)
- graceful-fs (isaacs)
- is-obj (sindresorhus)
- make-dir (sindresorhus)
- write-file-atomic (isaacs)
- at-least-node (RyanZim)
- comment-json (kael)
- configstore (sindresorhus)
- crypto-random-string (yeoman)
- fs-extra (jprichardson)
- gensequence (Jason3S)
- has-own-prop (sindresorhus)
- iterable-to-stream (Jason3S)
- jsonfile (jprichardson)
- unique-string (sindresorhus)
- universalify (RyanZim)
- vscode-uri (Microsoft, apparently)
- xdg-basedir (sindresorhus)
Release notes snippet
@todo
| Comment | File | Size | Author |
|---|---|---|---|
| #158 | 2972224-5-158.patch | 109.68 KB | alexpott |
| #157 | 2972224-5-157.patch | 110.44 KB | alexpott |
| #153 | 2972224-5-153.patch | 110.69 KB | alexpott |
| #153 | 151-153-interdiff.txt | 289 bytes | alexpott |
| #151 | 2972224-5-151.patch | 110.7 KB | alexpott |
Comments
Comment #2
cilefen commentedHa
Comment #3
cilefen commentedComment #7
alexpottPHPCS is using cspell - it looks really quick and support php out-of-the-box. Yes it is a NPM package but at some point core is going to have build process that involves NPM so that shouldn't stop us.
Comment #8
alexpottTo use this patch install & run cspell from the command line:
Comment #10
avpadernoThe failure is caused from
Drupal\Tests\ComposerIntegrationTest::testExpectedScaffoldFiles(), which isn't related to this patch.Looking at the words in the dictionary, I wonder why words like Pioggia or Pastafazoul are necessary. I don't recall seeing them used in computer science. I could understand spaghetti in spaghetti code, but even that word doesn't come in Drupal code comments.
Comment #11
jungleIgnored
node_modulesIt's relevant :) , and fixed by s/optimised/optimized in sites/default/default.settings.php
Comment #12
jungle2 coding standards messages haven't been fixed.
Comment #13
alexpott@jungle node_modules is ignored automatically by cspell for what it is worth. Thanks for fixing the fail.
@kiamlaluno so our tests contain lots of foreign words and made up stuff. I generated the list of words by emptying out the word list in the .cspell.json file and then doing...
And then created valid JSON from that list.
I think we could commit this patch and then do follow-ups to remove incorrect spellings from the list or decide to use cspell's ability to ignore sections / lines / files where appropriate. This main thrust of this patch should be to get this in and then I can add it to https://github.com/alexpott/d8githooks so all core patches are spellchecked. And then patches like #2388925: British again invade config sync won't ever be necessary again.
Comment #14
xjmWhy is this issue filed against 9.0.x? 9.0.x should only contain major-only changes, or bugs that only affect 9.0.x+. Feature requests should be filed against 8.9.x and will be shortly updated to 9.1.x.
Comment #15
alexpottIt was filed against 9.0.x because I didn't want to support D8 with this because of unnecessary work. If this lands in D9 then great. As for what exact version it is filed against I think we're placing way too much importance on that being "right" before a patch is rtbc and we, ie. committers, should be responsible for setting it to the correct version and ensure the patch applies and is tested against that specific branch. Also with a code quality test improvement like this as long it hits the latest dev branch then it benefits the majority of the code because we have to commit 99.5% of patches there first.
Anyhow, I agree that 9.1.x is the correct version fwiw.
Comment #16
alexpottSo there are loads of real spelling mistakes in the list because I generated the list from our current spelling mistakes as determined by cspell. I think a good way forward would be to commit this list as is. And then do a follow to remove all the obviously wrong spellings - for example these British spellings and go forwards. At least then we can be sure that we don't add any new (unique) spelling mistakes and we don't suffer massive patch churn here.
Comment #17
jungleExpecting this to be committed soon if possible.
Comment #18
jungleOne question, should the command in #8 be added to the scripts section of core/package.json?
For example:
Comment #19
alexpott@jungle nope I don't think we do need to add to package.json. We could add it there but I don't think that that is necessary. Doing a global install seems fine for now.
We need a few things:
Comment #20
jungleFollow #19 and work on
Comment #21
jungleRe #19
1. CR created
2. See #3122088
3. See section
Build dependency changes4. Updated section
Proposed resolution5. See #3122096
Comment #22
clayfreemanReviewed the work in #11 and #21; everything looks great from what I can tell. I updated the change request to make one sentence a bit easier to read.
+1 RTBC from me; I presume @alexpott will want a chance to review before this hits full RTBC.
Comment #23
avpadernoAnother issue reminded me that Drupal uses American. (I take that means both in strings shown in the user interface and in comments.) The spellchecker should also correct British words into American words.
Does the spellchecker do that automatically, or is there a setting for that?
Comment #24
jungleRe #23, the CR https://www.drupal.org/node/3122084 might help to answer the question.
Comment #25
alexpottBritish spellings that need fixing - probably due to me :)
The list in core/.cspell.json is a list of things that need to be fixed or are Drupalisms that we want to keep. But adding this list at least allows us to spell check and then work on improving the code in a maintainable and testable way.
Comment #26
sja112 commentedAs per the comment #15,
If we are planning it to be fixed for 9.1.x patch needs to be re-rolled.
Comment #27
jungleThanks, @sja112!
Tip: to queue testing against the target branch. if the patch does not apply, it will fail and tell/show. Attaching screenshot of failed applying patch is unnecessary.
Comment #28
sja112 commentedThanks, @jungle for your suggestion.
I have re-rolled the patch for the 9.1.x version.
Changes in
core/lib/Drupal/Core/Installer/InstallerServiceProvider.phpandcore/lib/Drupal/Core/Installer/NormalInstallerServiceProvider.phpare not required.Comment #29
sivaji_ganesh_jojodae commentedComment #30
alexpott@sja112 if you re-roll this patch you need to run cspell to ensure you don't get any new errors introduced by patches added in the meantime.
These are fixed in the patch attached.
Comment #31
daffie commentedComment #32
daffie commentedComment #33
daffie commentedComment #34
lendudeReading through the word list is fantastic :) Somebody managed to get Teletubbies in! This does take out the fun of trying to sneak a new unique animal into core though ;)
Just some thoughts:
Some false positives in the list that jump out at me are things like:
These are caused by URL escaping. the 'A' is from a URL escaped character before the actual word, might need an upstream fix if we care enough?
not a fan of needing this in core code, but I can see the list exploding when not doing that in those couple of cases ¯\_(ツ)_/¯
Also, if we want to keep the list maintainable we might want to introduce some more API in the test framework to generate certain test strings. Expand
\Drupal\Tests\RandomGeneratorTraitto do strings with UTF-8 characters in them, so you don't have to make up your own when writing tests? And in the follow ups for cleaning out the list we could replace fixed UTF-8 strings with generated UTF-8 strings if we wanted to.We would want to keep the changes to this list to a minimum I would think, since otherwise this could lead to merge conflicts and re-rolls. So not having false positives and minimising the number of hardcoded weird strings in tests would probably be requirement to make this non-disruptive.
Also, it crashes on Node 8.11 :(
Comment #35
daffie commentedRuning the command:
Gets the result: "CSpell: Files checked: 14749, Issues found: 0 in 0 files".
Followups have been created for cleaning up the word list in
.cspell.jsonand for supporting this check on the drupalci_testbot.The patch contains the new
.cspell.jsonfile and a number of spelling corrections.The list of ignored paths looks reasonable and the language is set to "en-US".
All code changes look good to me.
For me it is RTBC.
Comment #36
avpadernoShould not the excluded path include more files/directories? As package.json and yarn.lock are excluded, should not composer.json and composer.lock be excluded too?
Comment #37
alexpott@kiamlaluno composer.lock is up a directory so not relevant. I think scanning the composer.json in /core is good. They have text in.
Comment #38
xjmWe should check the context on these. If it's "the API's functionality" or "the AND's second clause" (possessives), then we can leave them in. However, if it's "There are many API's" or "There are two "AND's" (intended to be plurals), then we should remove the apostrophe. It's a common mistake to pluralize acronyms with an apostrophe.
This is also not present in core as a single word. Where are these coming from? Is the dictionary from D8? If so we should regenerate it only for D9. This isn't backportable IMO.
lol, hi @phenaproxima
This looks like a typo but I can't find it anywhere as an independent word, even with a case-insensitive search? Is it attempting to parse camel-case? If so the camel-case should be
beCalled()and therefore not detected as a single word.I am thinking the patch for this issue should just add the dictionary with no spelling fixes. Then, we should create a meta of followups to fix the spelling errors included already, or in review comments, as well as others. I almost feel like we need to go over nearly every item in the dictionary and check for why it's there.
Comment #39
alexpottWe can go over the dictionary in the follow-up. The dictionary being added here is so that we can work on it in the follow up #3122088: [Meta] Remove spelling errors from dictionary.txt and fix them. I agree that we need to go through this list but trying to get the list perfect now will mean that we'll never get this in. And adding this even in this state will help improve core code quality - see #30 and how this caught a spelling mistake in an @see.
I tried just adding the dictionary with all of our spellings but that broke cspell because the custom dictionary grew too long.
I think we should regenerate the list and hopefully that deals with becalled.
Comment #40
alexpottHere's how I generated the new list.
Applied the patch
cspell "**/*" --unique --wordsOnly > ../wordlist.txtThe Becalled comes from an incorrect function call in \Drupal\Tests\Core\Form\FormStateDecoratorBaseTest::testSetLimitValidationErrors() -
The should be
shouldBeCalledThe reason for the changes here I think is all the update test removals that happened between #8 and now.
Comment #41
longwaveLet's also exclude
themes/bartik/color/preview.htmlas this is just Lorem Ipsum and a bunch of Latin-looking words are in the exclusions list.edit: there is Lorem Ipsum in several files, I guess that can wait for the followup
I wonder what these should be ;)
Comment #42
longwavecspell docs note there is a "php" dictionary for PHP keywords and library methods, we should probably include that and it will cut down our word list a bit.
Comment #43
longwaveAlso shouldn't we add this to package.json and yarn.lock, like the other developer tools? I assume the word lists are in an npm package somewhere and without locking to a specific version we can't guarantee everyone will use identical word lists.
This is basically the same as eslint/stylelint but for English :)
Comment #44
jungleRe #41, checked for fun 😆 one word, one usage found only, 3 in total:
\Drupal\Tests\views\Kernel\ViewsTemplateTest::$testViews\Drupal\media_library\Form\AddFormBase::trustedCallbacks()\Drupal\Tests\help\Functional\ExperimentalHelpTest::setUp()Comment #45
longwavePlaying with this a bit more. I think we can solve a bunch of URL-encoded false positives by ignoring the regex
%[0-9A-F]{2}and there are some bits of Lorem Ipsum we can skip too.Comment #46
longwaveExcluded some more non-English text (but there is more to do in various translation tests, if we want to) and ignored the regex
%[0-9][0-9A-F]which helps with URL-encoded false positives -%[0-9A-F]{2}was matching "%destination" and triggering a spelling error on "stination". This cuts down the word list by 230 or so entries.Comment #47
xjmBTW for the various stuff we're finding above, we can file issues/patches now as children of the above followup; we don't need to wait for this to land.
I think if we need to run the script with a custom regex, we should add a command to core or something. (The CR at least needs to document it.)
Comment #48
dwwRe: #47: Good point. I opened #3138671: Fix "incompatitable" typos in core to get us started (since I've made that typo numerous times in core in the last few months). ;)
Comment #49
longwaveRe #47 the regex is in the .cspell.json config file, no custom command needed, but I still think it's worth adding cspell to package.json both as a package and a command - though @alexpott disagreed with this in #19 so I haven't done this yet.
I think that to scope this better we could remove the various spelling fixes from the patch here, and spin that out into a separate issue ("fix British English spellings" or similar). That would leave us here just adding .cspell.json and a set of comments where spellchecking is disabled.
Comment #50
longwaveAnother pass at this with ~200 more false positives removed such as all the Lorem Ipsum and a number of translations.
Excluded the following files:
lib/Drupal/Component/Diff/**- not our code and has loads of strange variable namesmodules/**/Migrate*Test.php- perhaps controversial but this skips a whole bunch of words from translation testsmodules/system/tests/logo.svgz- this is being treated as text by cspell for some reasonCOPYRIGHT.txt- contains a lot of namesAlso added several more cspell disable/enable blocks.
Remaining questions:
Comment #51
dwwRe: #50:
- Seems a bit dangerous to ignore COPYRIGHT.txt. Granted, it shouldn't be changing much, but that seems like an important file to not have typos in. But yeah, it's a lot of names. /shrug
1. +1 to that. Seems good to have this handled much like eslint, locking specific versions of dictionaries, and having an npm run cspell command.
2. +1 to that. I guess I already started with #3138671: Fix "incompatitable" typos in core since I now see that those fixes are already included here. Seems weird to have a mix of known typos in cspell.json and some typo fixes in this patch. I'd rather this was just adding the cspell plumbing, the cspell ignore comments, and the complete cspell.json list (including bugs), and then we fix the bugs (and remove the entries from cspell.json) in children of the follow-up.
Comment #52
longwaveOpened #3138718: Convert British English spellings to American English, for the umpteenth time to deal with the British English spellings.
Comment #53
jungleFiled #3138731: Fix "inheritdoc" typos in core to fix the funny @inheritdoc ones in #41/#44
Comment #54
dwwSo let's re-roll this to remove the typo fixes from here per #50.2.
Comment #55
quietone commentedRegarding excluding "modules/**/Migrate*Test.php - perhaps controversial but this skips a whole bunch of words from translation tests"
I am a casual reader of this issue and reckon if we want to spell check this type of migration test then a separate issue should be made because there are 253 tests of that type, 26 of which are specific to translation. As the writer of most of those translation tests I know I am prone to typos and mixing American and British spelling so maybe it is prudent to make a new issue?
Comment #56
xjmWe should probably skip things that aren't in the Latin alphabet, like the entire Greek alphabet for example, or "Russian" in Russian or "Arabic" in Arabic. Aside: "File" gets straight-up transliterated? That's delightful. :) Edit: "тестирования" is even better; what a wonderful loanword.
If the tool doesn't support skipping things in other character sets, seems like it should.
Comment #57
sja112 commentedAs per #54 rerolled the patch to remove the typo files.
These fixes will be handled in the related issue Convert British English spellings to American English, for the umpteenth time
Comment #58
xjmI agree with @longwave; I think we probably do want it as a dev dependency in
package.json. It should probably get checked on commit like coding standards do; otherwise, it'll keep regressing.Edit: That also means a dependency evaluation. :)
Comment #59
sja112 commentedFiled #3138731: Fix "neccessary" typos in core.
I handled it separately because it was not in the scope of Convert British English spellings to American English, for the umpteenth time
Comment #60
dww@sja112: I already did at #3138721: Fix "neccessary" typos in core.
@all: Please use #3122088: [Meta] Remove spelling errors from dictionary.txt and fix them as the parent for any "Fix X spelling typo" issues, and be sure to check there for what's already been filed before opening anything else. There's an active discussion about how to properly scope things.
This issue is about the cspell plumbing (only).
#3122088 and children are about fixing the actual typos. We don't need comments in this issue every time we open a child of #3122088.
Thanks!
-Derek
Comment #61
alexpottCrediting @dww from the duplicate issue #3138721: Fix "neccessary" typos in core
Comment #62
sja112 commented@alexpott,
As per #54 rerolled the patch to remove the typo files in #57. That's why worked on another issue to fix "neccessary" typo error like we did for other words.
Comment #63
alexpottThe recent comments continue to set this back unnecessarily. We could have committed this to core and add to the committer toolchain and the committer can either fix the typos on commit or set the patch back to needs work. It's such a shame to not commit things that will make an obvious and clear benefit to the committer workflow and then iterate.
Let's not add this to package.json yet we can do that after and spend the necessary time whilst getting the improvements in core. We can do this in a follow-up.
Comment #64
longwave@alexpott in that case isn't #57 ready for review? The followups can keep building up and wait until this is committed if necessary.
Comment #65
alexpottProceeding with the sub issues of #3122088: [Meta] Remove spelling errors from dictionary.txt and fix them like #3138721: Fix "neccessary" typos in core before committing this means we no way of proving that we're not introducing regressions. Once we have the core/.cspell.json in core we can stop regressing and more forward with 3122088.
Any of the patches that were green and not failing would have been good to commit. If we've had this is core when this was first done then #3138721: Fix "neccessary" typos in core would not have even been neccessary :)
@longwave yes let's get #57 in.
Comment #66
alexpottComment #67
alexpottComment #68
alexpottphp is automatically detected for .php files... we do not need to add it here. I'm working on updating the file.
Nice!
Comment #69
longwaveAdding the PHP dictionary removed at least these words from the list, and probably more:
Comment #70
alexpottTurns out adding php to the global dictionaries is a good idea. Fixes things like chmod in core/INSTALL.txt - and we are a PHP project. I've also added the html dictionary for the same reason. We can always revisit these choices after commit and add new words to the wordlist but this does feel like a good place to start.
With this .cspell.json running
cspell "**/*"results in no errors.Comment #71
alexpottThis is why we need to get this in first. If we land issues like #3138731: Fix "inheritdoc" typos in core and #3138671: Fix "incompatitable" typos in core we'll be playing catch up forever.
Comment #72
alexpottComment #73
jungleThe patch looks good.
I think the following dictionaries should be added as well.
General Dictionaries
Programming Language Dictionaries
I checked with
the output:
Note: the test without the
bashdictionaryComment #74
longwaveen_US is already specified as the language, we don't need to repeat that with the dictionary.
I think the risk of adding more dictionaries is we might end up with some misspelled words being ignored that happen to be in those dictionaries. I have scanned the words list in .cspell.json and I don't immediately see any CSS or JavaScript words that look like they would be removed anyway (we can test this by deleting the word list, adding the dictionaries and regenerating the word list, then doing a diff).
As far as I can see this is "good enough" for now and we can work on incremental improvements in followups.
Comment #75
jungleThanks, @longwave!
> As far as I can see this is "good enough" for now and we can work on incremental improvements in followups.
Agree, setting back to RTBC?
Comment #76
sja112 commentedApplied the patch on the 9.0.x branch.
Ran cspell to ensure I don't get any new errors introduced by patches added in the meantime.
Got three,
Comment #77
longwaveBoth nviews and skippable have been removed from 9.1.x but are indeed present in 9.0.x. I guess the question is, how far back are we going to go with this? Should the word list be the same across all branches?
Comment #78
jungleTo maintain extra custom dictionaries per branch?
nviews and skippable go to 9.0.x.txt only?
Comment #79
jungle> As far as I can see this is "good enough" for now and we can work on incremental improvements in followups.
Or no more back and forth, commit it, do all the rest in follow-ups.
Comment #80
jungleSetting back to Needs review.
Comment #81
sja112 commentedAgreeing to #79.
I also believe in solving the pending issues in the follow-ups.
Patch looking good. Setting it to RTBC.
Comment #82
xjmI don't think this is ready for commit. Couple things:
package.jsonand so forth. I disagree with requiring a global installation; every time we've used global installs for our development tooling before, things have eventually gone horribly wrong.Comment #83
xjmOkay apparently there are 40-odd issues that now have patches, which definitely wasn't my intention. It's not an ideal situation to be in either way. But I see no reason to block those issues if this issue isn't going to actually stop regressions from being committed. Requiring committers to do this manually on every commit is not an option IMO, and if it doesn't run on commit we're not changing anything from the current status.
Comment #84
xjmNote that this is different from your typical PHPCS rule because part of it is about documentation and language. Those tasks always need different scope management than reformatting whitespace or such.
Comment #85
xjm@jungle, interesting idea on the custom dictionaries per branch. I think though we want to be striving for as small a dictionary as possible with as little maintainable surface as possible, so I would keep just the one dictionary. IMO this issue is only eligible for 9.1.x, but many of the fixes (where there's merely documentation errors in particular) can be backported as far as 8.8.x.
Comment #86
longwaveUpdated the change record to give the options of fixing spelling, adding to the word list, or ignoring the new word(s) using code comments.
Dependency evaluation for adding this to core:
GitHub repo: https://github.com/streetsidesoftware/cspell
Release cycle: first release was in January 2017, there has been approximately one major release a year (v4 is current, v5 is in alpha) and there are regular (monthly or sooner) patch releases.
Security policy: cspell is a development tool that does not appear to have a security release process.
Comment #87
longwaveAdded cspell to package.json and yarn.lock. Also added two commands for ease of use:
npm run spellcheck path/to/file- check a file (can specify multiple files)npm run spellcheck:core- check all of coreComment #88
alexpott@longwave how about the dependencies it pulls in?
Comment #89
longwave> how about the dependencies it pulls in?
Do we really evaluate every single npm dependency? As npm packages go it doesn't seem too sprawling, and it is only ever used in development (and even then it doesn't compile code or anything like that), so how detailed do we have to be here? All the packages beginning
cspell-are from the same author. The full tree is:Comment #90
longwaveIgnored all the non Latin characters, some parts of binary encoded strings, and a few other bits.
Comment #91
dwwCool, nice to see this moving forward. +1 to putting this directly into core, instead of relying on global installs of anything. Makes it much more likely people will (be able to) use this.
However, this seems to bring in a lot more than we want:
Can we prevent it from pulling all this in?
And especially this, given #3138718: Convert British English spellings to American English, for the umpteenth time ;)
Bunch more we don't care about, right?
Thanks!
-Derek
Comment #92
longwaveNo, we can't stop that - cspell declares all those as dependencies, so you get them whether you like it or not. They do nothing except waste disk space, unless you explicitly choose to use them in .cspell.json.
Comment #93
dww@longwave: re: #92 Duly noted, thanks.
Maybe this is silly, but is it possible/worthwhile to have a "post-install" command that runs to remove the crap we don't need? ;)
e.g.
I know that kinda thing works in composer land, but I'm not a yarn-er. Thoughts?
Thanks again,
-Derek
Comment #94
alexpottDoing #93 introduces more fragility we shouldn't be changing a packages dependencies externally.
Comment #95
alexpottI've updated https://github.com/alexpott/d8githooks to support spellchecking. It will use a globally installed cspell if available and the .cspell.json from #90. However once this patch is committed it'll automatically switch to using yarn and the config we commit.
One downside (for me and any committer that chooses to install cspell) is that if one of the file changes that has words that fail checking and we're getting around that with a
// cSpell:disabletag then we'll have to work out it if the patch is changing that line - but that feels a small price to pay for better patch review right now. The git hooks support a flag to disable spellchecking to get around this.Comment #96
alexpottSo running yarn audit before and after running yarn install with this patch applied reveals that we already have known security vulnerabilities in js dev dependencies.
With patch
Without patch
At least we're not adding anything new :)
Comment #97
xjm@longwave re: #89 yep, we try to keep a handle on our dependencies' dependencies too, since these also introduce stability and security risks. We've been trying very hard not to be "that project with 1000 npm dependencies in the dependency tree".
In #3107926: Update stylelint to ^13.0.0 the most recent version of stylelint introduced a much larger dependency tree than it had before, so we're discussing how to proceed there.
Comment #98
alexpottHere's a diff using yarn-lock-diff
Comment #99
xjm@longwave For more background on this concern, see: https://youtu.be/yxVJggeMAPs?t=656 (Edit: Added the start time code as you obviously don't need background information on Drupal 9.) :P Edit 2: https://youtu.be/yxVJggeMAPs?t=1928 is the timecode specifically about our future npm dependency tree.
Re: #96 ouch, we just fixed the bad packages on audit like 6 months ago. We probably want a separate[ issue to update them again... also that number of dev packages has exploded somehow since. GitHub also isn't listing all those for some reason.
Comment #100
alexpottBy actually using this discovered that we also need to include config for composer/* folders they have things we do and don't want to scan. Will work on this later. Note all the other files outside core/ are scaffolded files so they are sort of checked already.
Comment #101
xjmFor #100.
Also tagging for RM review of the dependency tree we're adding. (Definitely prefer to have it as a dev dependency in
package.jsongiven the scope of the dep tree.)Comment #102
alexpottAdding the words from the composer folder. Turns out there aren't too many and there's no file that stands out that we should exclude.
Comment #103
xjmSo in addition to the various
cspellpackages, this adds:Blehhhhh at all the stupid micro-packages. Like
is-objis:srsly
Comment #104
xjmHave we looked at any other libraries for this?
Comment #105
xjmMost of the packages are authored by a handful of authors who really like to share their three-line functions. Added the list to the IS.
Comment #106
alexpottThis the package that phpcs uses to do this job.
Comment #107
longwaveIt is also the engine behind the most commonly used spell checker in Visual Studio Code with 1.3 million installs to date.
https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.c...
Comment #108
jungleComment coming in the next comment
Comment #109
jungleIntroduced a new custom dictionary:
drupal, split out thewordsintomisc/cspell/dictionary.txtmaking the.cspell.jsoncleaner and smaller.Added a command
spellcheck:dictto generate/update the dictionarymisc/cspell/dictionary.txtwas generated by runningnpm run spellcheck:dictThe path of custom dictionary is ignored by cspell.
Comment #110
alexpott@jungle you've undone the changes I made in #102. Also locally I couldn't get the spellcheck:dict command to not exit with:
So I've adjusted the command to...
so that uses the spellcheck:core command and doesn't error locally for me. Also we need to
rm misc/cspell/dictionary.txtfirst otherwise re-running the command empties the dictionary :)However... thinking more about this command I'm not sure we should add it. Regenerating the dictionary might be a common task before we've committed this. But after it should never be done. So let's not add it to package.json but let's add it how to do it to the issue summary here.
Comment #111
alexpottI've updated the change record with the yarn commands.
Comment #112
alexpottNote the support in https://github.com/alexpott/d8githooks is already working - see #3071682-28: The oembed Resource value object should be more permissive for NULL dimensions - one less spelling mistake in core.
Comment #113
jungleSorry about this, missed. Thank for fixing it.
Comment #114
alexpottI'm torn on the dictionary command - on one hand we really ought not need it if we proceed with all the follow-ups. On the other files will get fixed and removed outside of that process. However, the thing that gives me the most pause as to whether we should add the command is that it becomes another thing to support. Meaning we can't remove the command. I think given this and the fact that it wouldn't be part of any regular CI process mean that we shouldn't add it.
Comment #115
xjmWhy would we not regenerate the dictionary? We're going to be fixing issues manually, but issues might also get fixed without removing the bad entry from the dictionary, especially when dead code is removed, documentation is rewritten, or code is refactored, but also just because people fix things in it.
The command should at least go in the CR, but it is also still useful in the future I think to continue generating a smaller and smaller dictionary as things might get fixed in other issues.
Comment #116
alexpottBecause it is yet another thing to support and it will not part of any CI process. Also it is a bit fragile. It has to to remove the existing dictionary otherwise the first time you run it'll generate a blank dictionary. However once you remove the dictionary if you exist the process before quitting then running it again will fail. Exiting the process is not uncommon because it takes a long time to generate the dictionary and you can suddenly think oh I meant to do that as well.
Adding dictionary generation to the CR.
Comment #117
longwave> However once you remove the dictionary if you exist the process before quitting then running it again will fail
If you change the first part of the command to
rm -f misc/cspell/dictionary.txtit shouldn't fail when the file doesn't exist.Spotted in another issue, and double checked here: why isn't Driectory spotted as a spelling error?
Comment #118
jungleRe #117
A demo command to check file names:
Comment #119
alexpott@jungle we have the string in a file. i.e.
So it should be finding it.
I think we need to file an upstream issue. As
InstallerExistingConfigSyncDriectoryProfileMismatchTestis not detected. HoweverInstallerExistingConfigSyncDriectory,BlahDriectoryandDriectoryBlahare. InterestinglyInstallerExistingConfigSyncDriectoryProfileis not.Comment #120
alexpottHere's the make dictionary command back again but with @longwave's suggestion so it is re-entrant after quitting - nice one. I've changed the command name to be more explicit about what it does. I also reran it and there's no changes.
Comment #121
jungleIn theory. not all file names show up in files, but I believe in that most of them are, as classes everywhere in D8/D9.
Drie-s-magichappens toDriectory? :pComment #122
xjm💯
Adding further notes to the IS for the dependency evaluation, both from #86 and the other projects that use this.
BTW, the words:
...Always make me sad.
Comment #123
alexpottRerolled now that #3138721: Fix "neccessary" typos in core has landed.
Comment #124
jungle@catch committed a few relevant fixes in the past 10 mins. needs reroll again, probably.
What left here? Could someone RTBC this after the next reroll?
Thanks!
Comment #125
alexpottHere we go...
Comment #127
longwavesortis collation dependent, this means we get different sort orders on Mac and Linux, and on Linux the current locale has an effect too. If I regenerate the dictionary on Linux I get the same file but in a different order - the upper and lower case letters are mixed rather than all the uppercase letters coming first.I think we can override this by setting the
LC_ALLenvironment variable toCwhich should use the same collation everywhere. On Linux this now generates the same dictionary that was in the previous patch.Comment #128
longwaveDiscussed in Slack: as far as I am concerned this is ready for RTBC now, if @jungle wants to do that when this test run passes?
Comment #129
jungleChecking again against the latest commit: c9c0fc1e59
Regenerated the dictionary, confirmed it's identical with the one in the patch.
To me, nothing left here, so removing the tag "Needs release manager review". Let's move forward. Do all the rest in child issues or follow-ups
Note for committer: please remember to publish the CR: https://www.drupal.org/node/3122084
Thanks!
Comment #130
alexpott@jungle thanks for the rtbc. A release manager needs to remove that tag.
Comment #131
jungle@alexpott,. Thanks for correcting me! Yes, realized that I am not eligible to remove that tag.
Comment #132
xjmFWIW, @catch and I both had some similar concerns about the potential impact of this on DX.
E.g. is correcting the British and Australian spellings that the Commonwealth committers don't notice really worth adding work and RTBC/NW cycles for someone who puts a Star Trek reference in their test fixture? I'm not sure. I'm not saying I think it's wontfix; just I am not totally sure that we're winning on impact vs. disruption.
Comment #133
longwaveIf there are concerns could we introduce it but with reduced scope? If we could ignore all tests and code comments for now, would that be a good start? That would still catch some typos in YAML files, error messages, etc.
Comment #134
xjmAnother concern (or at least annoyance) is that the cspell checks seem to take much longer than phpcs or yarn per file. It took fully two minutes to scan the files in this patch when I committed this locally to test! Is there any way to speed it up, I wonder?
Comment #135
xjmTested locally:
The Brit spellings are in the dictionary I guess, but why didn't it catch "Durpal"?
Comment #136
xjm@longwave Sorry, missed your comment amidst posting mine.
Skipping tests could be useful since that's where a lot of test data ends up, but on the other hand we would not be able to catch scenarios where e.g. the test was not actually testing what we thought it did because of a mis-named method, provider case, etc. In some cases we can protect against that already by including "before and after" assertions, but not always. It's worth considering.
Skipping comments I'm similarly conflicted about. On the one hand, typos in comments are less likely to affect anything. On the other, though, part of the advantage for me of doing this would be to cut down on the number of nitpick patches (that also become a part of credit gaming) and have more contributors focus on fixing more impactful things. (That's part of the goal of the coding standards work generally -- fix it once, enable the rule, and prevent us from needing such patches ever again.)
I wonder if skipping things speeds up the scanning? Like skipping whole files certainly should, but does skipping certain lines in a file that's being scanned speed up or slow down the scan?
Comment #137
xjmI'd like to discuss the above more, e.g. proposals like #133 or other ideas. Thanks!
Comment #138
xjmBTW, I'm also going to commit the outstanding RTBC spelling fix patches after discussing with @catch earlier, since RC is the perfect time to commit such patches and since they're backportable while this isn't. We still shouldn't create a zillion more issues or anything, but we can wait and update the dictionary once the already-ready ones are in. (Issue scoping and process are release management purview, and rerolling all those would be more work than rerolling just this one.) Thanks!
Comment #139
alexpottI spent sometime investigating #135 trying to figure out why Durpal is allowed. And it's because we have compound words set to TRUE. We shouldn't do that. So I've turned that off and re-run the dictionary.
Also while investigating the cSpell configuration I've realised that our dictionary should be all lowercase. cSpell is case insensitive and all their dictionaries are lowercase too. I've adjusted the command to generate a lowercase dictionary and handle stray \ characters that turn up when spell checking things like
Subtheme\'s. Doing this also removes some duplicates from the dictionary which is nice. For example in #127 we've got CPAN and cpan.I've also looked at the performance issues. cSpell set up is quite expensive - it has to build the dictionaries from compressed and other interesting formats. So I've adjusted the d8githooks to do the spell checking of all the changed files in one go. Do it this way has a massive and beneficial impact on performance. Without this change committing this patch takes me 66.04 secs and with this change 10.71 secs
Not scanning test would speed things up. but then we'd lose some benefit. For example,
The scanner picks up that
StatementPrefechis badly spelt and that is in core/tests/Drupal/KernelTests/Core/Database/FetchTest.phpComment #140
xjm≈ 10s is much more reasonable for a large patch.
Why is "Durpal" considered a compound word? I don't get it.
I just grepped the codebase for "spelt" to make sure that didn't need to be added to #3138718: Convert British English spellings to American English, for the umpteenth time (as it's a grain in AE, but not a past participle).
Comment #141
alexpottI think "Durpal" is considered a compound word because pal is in the dictionary and the dur is a three letter word and so ignored. But I'm not sure.
Comment #142
alexpottRe the impact versus disruption.
Comment #143
alexpottRe-rolled spelling fixes have been committed.
Comment #144
quietone commentedTotally agree with #142, especially #1. I am the author of the typo that lead to this bug discovery and I spent more time today reviewing, fixing, rerolling a related patch and commenting than I would have done if the typo was discovered while working on the original issue.
Nice, I can still add 'Golgafrincham' somewhere. ;-)
Comment #145
longwaveThe backslash removal needs work, on Linux I get:
Comment #146
longwaveSo if we drop tests from the spellcheck then we lose 905 words from the dictionary, some of which are spelling errors but many of which are either real words or foreign words/nonsense/technical terms used in tests.
However this does a lot for the speed of
npm run core:spellcheckrun, sample timings from my laptop:#143 including tests: 7m1s
With tests removed: 3m20s
Interdiff attached only for discussion.
Comment #147
alexpottRerolled because #3138718: Convert British English spellings to American English, for the umpteenth time.
Hopefully fixed the warning on unix for the dictionary make command.
I think now that commit performance is fixed I'd like to continue to include tests. Here's why:
Comment #148
alexpottApparently #147 still issues a warning on linux. After discussing with @jungle and @longwave we've discovered we need a lot \ for cross compatibility.
Comment #149
longwaveConfirm that 8 backslashes is the correct number to fix this on Linux. We need two for
trto not throw a warning, then those two need to be escaped to four for the shell, and those four need to become eight when embedded in JSON.Comment #150
longwaveAgree with all of #147.1/3/4/5. My only question around #147.2 is that eventually would we want this to become part of DrupalCI? If so is adding some minutes on to each test run acceptable? Or I guess we could script this like the linter and only make it run on changed files for patches?
Comment #151
alexpottKeeping up with HEAD.
Re #150 - yes CI would definitely only test changed files. Exactly the same way we run PHP lint against changed files.
Comment #152
longwaveBack to RTBC on this for @xjm to review #142 onwards, as I'm happy with what's been changed since around compound words, agree that checking tests and comments is worthwhile, and the performance questions have been answered too.
Comment #153
alexpottKeeping up with HEAD.
Comment #154
dwwI'm also +100 to all the points in #142 / #147.
I don't know if anyone except a release manager is supposed to RTBC with the "Needs release manager review" tag, but +1 to RTBC.
Re: #133: Let's not ignore/skip any more than we have to. If we're going to nitpick test comments (which we do), let's automatically spell check them, just like everything else.
Here's my experience trying it locally (2016 Macbook Pro, 3.3 GHz Dual-Core Intel Core i7, 16 GB RAM):
One possible improvement: should we ignore core/phpunit.xml, too? We can have weird words in there from filesystem paths, etc. Probably doesn't matter, so I'm not NW or NR'ing for this, but I wanted to ask.
5 mins 39 seconds wall clock time to check all of core isn't great, but checking individual files, patches, etc, is only a few seconds. Checking all of core isn't really going to need to happen much going forward.
Another experience point. I made this change to core/authorize.php:
I ran the check on this file and see:
Seems odd that it doesn't flag "dww". I don't see that in the Drupal dictionary. ;) I guess "ar" is actually a word of some sort in one of our used dictionaries?
Generally, looks like this is going to help a lot but it won't be fool-proof and we're still going to have to pay some attention to typos/spelling mistakes during reviews.
Cheers,
-Derek
Comment #155
longwave@dww cspell ignores words shorter than 4 letters.
Comment #156
xjmYep, any "Needs [committer-type] review" tag is fine to have outstanding if an issue is otherwise RTBC (because the RTBC queue is all issues that are only actionable for committers).
Comment #157
alexpottRe-rolled because yarn.lock has changed and therefore was conflicting.
Comment #158
alexpottyarn.lock changed again so rerolled and generated the dictionary again - no changes.
Comment #160
longwaveRandom fail in CKEditorIntegrationTest, back to RTBC
Comment #161
xjmSaving issue credits.
Comment #163
xjmAlright, @catch and I have discussed this issue several times, and we both are somewhat reluctant about the slight DX annoyance of having to add or exclude words that are missing from the dictionary. However, the fact that spelling errors can also cause real bugs, expose issues with test coverage, etc. all make the issue more worthwhile. Especially now that the performanced is fix, I'm OK with this going into 9.1.x.
Committed to 9.1.x, and published the change record. I'm also going to NW a few issues that can now add or remove a dictionary entry as appropriate. Thanks!
Comment #164
xjmComment #165
jungleThanks @xjm for committing. Landed finally.
One question that could this be an exception and be backported to 9.0.x and 8.9.x?
I know you have your considerations so that it got committed to 9.1.x only. But if so, related patches which touched the dictionary have to be re-rolled if those patches are applicable to 9.0.x and 8.9.x. You could not do cherry-pick from 9.1.x probably which increases an extra effort to both contributors and committers.
Comment #166
chi commentedAdded follow-up issue.
Comment #167
jwilson3Apologies, coming here from #3154539: Implement new Gray scale on Claro where I'm looking at trying to standardize the word "gray/grey". This is a really long issue to read through to figure out what is the proper procedure for adding a new spelling correction. Could there maybe be a CR created for this, or at least something in this issue title that describes how to go about proposing a new spelling correction.
I also noticed there is a safeword of "davysgray" in the dictionary introduced in this issue. I dont know if there is something proper or special about that word that would prevent it from being standardized to "davysgray" (or removed entirely if I can convince the designers to move their neutral color scale to a decimal based scale, instead of named colors.
Edit: found the CR here: https://www.drupal.org/node/3122084/
Comment #168
alexpott@jwilson3 the current dictionary contains misspellings and other errors. It was added so that we could stop new misspellings occurring and and work in a structured and well-scoped way to fix core. Words can be added to the flagwords configuration and removed from dictionary by issues as appropriate.
Comment #169
jwilson3Thank you @alexpott. I came to the same conclusion last night after finding the CR and I created a follow-up issue with patch for the misspelling I found. #3155258: Use American English spelling of "gray" Feedback appreciated on that. I hope I did it right.
Comment #170
jwilson3After running yarn run spellcheck:core on a pristine 9.1.x branch, I'm seeing several instances of 2 unknown words that are being flagged that are not in the dictionary.
Does it need a follow-up issue?
Comment #171
jungleRe #170, @alexpott just did a hot fix on that. please pull the latest code to your local.
Comment #172
jwilson3Thanks. confirmed that works.