Problem/Motivation
Follow-up to #2972224: Add .cspell.json to automate spellchecking in Drupal core. Drupal core now uses cspell for spell checking, but the first iteration of core/misc/dictionary.txt, which has 2,262 words, contains many real spelling mistakes.
Proposed resolution
Identify and remove all spelling errors from the dictionary.
Valid technical terms or abbreviations remain in the dictionary. One example is 'nntp'.
Spreadsheet of words and notes.
How to work on the child issues
- Refer to cspell to learn how to work with cspell.
- Changed text must wrap at 80 columns per coding standards.
- Changed text must be US English.
Issues fixing spelling errors
- #3210125: Fix spelling in core.* yml files
- Issue(s) needed for groups of words such as revisionable, revisioned, revisionid, revisioning, unrevisionable
Other
- #3153919: Rename dictionary.txt file to drupal.dic to make it compatible with PhpStorm on Linux
- #3397353: Keep dictionary in sync
- #3390959: Test for words in 'cspell:ignore' that can be removed
Remaining tasks
- Review words in core/misc/cspell/dictionary.txt and identify the misspelled ones.
- Group the misspellings into manageable issues by component/system so they can be reviewed in context.
- Spreadsheet of words and notes to help determine a scope
- Avoid scoping issues by the initial letter of the words.
- Create child issues for the suggested scope
- Needs a follow-up for #3138788 comment-12.1
- Follow up to enable case-sensitivity after the other spelling issues are fixed.
Number of words in the dictionaries
Date | dictionary.txt | drupal-dictionary.txt | Total |
---|---|---|---|
2020-06-21 | 2,262 | 0 | 2,262 |
2024-02-03 | 972 | 13 | 985 |
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#37 | word-data.ods | 372.08 KB | quietone |
Issue fork drupal-3122088
Comments
Comment #2
apaderno
Comment #3
dww
Comment #4
dww
Moved the list of British words to #3138718: Convert British English spellings to American English, for the umpteenth time
Updated remaining tasks to be more accurate for this as a meta.
Comment #5
jungle
Add a note to IS mentioning potential changes: wrapped too early or over 80 chars.
Comment #6
jungle
Fixing typo: Wrappred -> Wrapped 🤦
Comment #7
xjm
Just as a note, we definitely shouldn't do a single issue for every misspelling. The current ones are OK since miscellaneous bad @inheritdoc are a phpdoc issue, not just a spelling issue, and incompatitable we'll just fix as it's silly. (Edit: There is also an anecdote for that; somehow it got added to @tedbow's PHPStorm dictionary as a real word.) That said though, let's lean more toward scoping like #3138718: Convert British English spellings to American English, for the umpteenth time.
Comment #8
xjm
Miscellanea:
a shouldBeCamelCased category.
This could be an issue.
Unless this is French, it should probably be DoNot/DO_NOT in code and Don't or Do not in text. There may be a whole category in here of contractions without their apostrophes.
Should be hyphenated (un-assign, de-prioritize, re-render, etc.)
There's an interesting question of where to draw the line for these. Generally in English these prefixes are morphologically productive with a hyphen, and get de-hyphenated when the word is adopted into common usage. "Denormalize", "Unsanitized", "Uninstantiated" etc. are all obviously in common usage in programming. "Unsticky", "Unrevisionable", and the like are Drupal terminology, and I'm surprised that "unpublish" and friends aren't already in the main dictionary.
The word is "dependents". This is all up in our States API, so would need an issue with deprecation and all that.
"Disable"
Some tests have this in some method names. Should be DoesNot.
Should be "Monoceros". A lot of variations of that one...
French is "Nourriture" (two Rs). So this is a case of it being a word in another language yet still misspelled. :) I confirmed it's supposed to be French where it's used in tests.
Neither is actually a word, but let's standardize on ORed.
Partially.
Picasso.
"Protected"
Presumably "response".
"autocomplete"
"blockquote"
"bubbleable" is our non-word word, unless this is like the French translation.
compatibility
"configuration"
"complete"
Probably "control".
Huh, this is totally an English word now. Loanword, yes, but totally in the dictionary.
"description"
Probably "depending"
Obviously supposed to be about entities.
This is thankfully mis-flagged from "i18nsync", not the 90s boy band. Bye, bye, bye.
Similarly just a mis-parsed part of "i18ntaxonomy".
In context, it's supposed to be "readily".
I think the standard spelling is "snafus".
"strength"
Comment #9
xjm
I think the obvious individual typos above could be fixed in one issue.
Comment #10
jungle
Thanks @xjm for the list!
#8.26 Bye bye bye in the album no STRINGs attached :p, one of my favorite bands. Assigning to me to file a bunch of child issues per #8
Comment #11
xjm
Correcting a British spelling in the IS. 😂
Comment #12
jungle
Done.
Comment #13
dww
Re: #12, wow, thanks! That clearly took a lot of effort. Unfortunately, you missed #9:
I suspect @xjm will want to consolidate a bunch of these. I don't trust myself to make decisions about scope, so I'll defer to whatever xjm and other core committers say on how to proceed.
But I see our eager contributor community for trivial patches is already uploading fixes to many of these, so maybe by the time we try to consolidate, it'll be easier to just commit and move on instead of trying to merge things. /shrug
Comment #14
jungle
Oh, my... sorry!!! missed #9 completely!
Hoping it is not too bad, maybe. Some of them are not just typos, e.g. the "cache" one.
Comment #15
jungle
My non-native-English-speaker eyes can only see what I am focusing on. 🤦 If #9 were in Chinese, I guess I would have noticed it out of the corner of my eye earlier. 😂
Comment #16
jungle
BTW, it's my mistake. I am willing to merge some of them myself to give them a better scope if one of the committers asks me to do so.
I would do:
Comment #17
jungle
Added a kanban board, everyone can CRUD it, @see https://contribkanban.com/node-board/a9703fb6-a0bc-4dd9-bbe1-4ba065871276
Comment #18
longwave
Initially I thought the individual issues should be merged, but having reviewed three of them I think they are worth keeping separate, as when there are multiple cases and code changes are required it seems better to be able to pick over the details in individual issues.
Comment #19
alexpott
I think this should be postponed on #2972224: Add .cspell.json to automate spellchecking in Drupal core, otherwise the child issues we're doing can regress. And we have to can the wordlist all the time. There's no winning and stay winning if we fix these first.
Comment #20
sja112 CreditAttribution: sja112 at Srijan | A Material+ Company for Drupal India Association commented
A few more pending items:
1. attibute should be attribute
2. defintion should be definition
3. propegated should be propagated
4. traslation should be translation
5. s\'il should be s'il
6. O\'Bar' should be O'Bar"
7. Prefech should be Prefetch
These were part of patch #50 in Try to automate spellchecking in Drupal core, but now we are fixing them in follow-ups. I am working on it.
Comment #21
jungle
@sja112, Re #20, better to wait for more feedback about the scope, that is, whether to merge some of them or not. I missed #9 already, which is a mistake, even though @longwave agreed in #18, from my understanding.
Let's postpone this and all child issues on #2972224: Add .cspell.json to automate spellchecking in Drupal core
Thanks!
Comment #22
sja112 CreditAttribution: sja112 at Srijan | A Material+ Company for Drupal India Association commented
Comment #23
xjm
I disagree with #19, but whatever.
Can we postpone each of the child issues and paste a message to not work on them yet? We don't want novices to invest effort only to have the patches rejected for scoping issues.
There are two kinds of spelling errors: Code spelling errors and comment spelling errors. Comment spelling errors don't need dedicated issues. Code spelling errors probably do.
Comment #24
xjm
Since we have a 50 K patch already in #3138718: Convert British English spellings to American English, for the umpteenth time, I think we should at least resolve that one and reroll the word list accordingly rather than postponing it until it goes stale.
Comment #25
xjm
The way I would do this is create issues in stages. Start with an obvious cluster, and/or something that requires API changes, and file issues for those. Whittle the list down by the significant clusters first, splitting those into smaller issues if they are too unwieldy, and then work on what is left. That way, we don't spend too much time filing and committing issues for one typo.
Comment #26
jungle
#3138778: Fix "Nourriture" relevant typos in core
#3138783: Fix "Partially" relevant typos in core
#3138772: Fix "Disable" relevant typos in core
#3138789: Fix "blockquote" relevant typos in core
It would be appreciated if committers could help remove my credits from the above issues. They made me reach the milestone of 100 credits in Core yesterday, and I had hoped to get 100 credits in Core before the Drupal 9 release day. But I am not happy with it. It is credit gaming: I filed the issues only by copying/pasting/quoting, so I should not be given credit.
Thanks!
Comment #27
catch
Posting an issue is fine for issue credit, even for trivial issues, it takes more work than some RTBCs.
The thing that could have been done differently here is grouping all the straightforward spelling errors into larger patches (i.e. by phpdoc, by test content, then one for variable and method names) to have a smaller number of issues.
Comment #28
xjm
Agreed with #27; sometimes, posting the issue is most of the work so we give credit for a helpful initial issue writeup. Especially for the first few children of a large meta. Yours had clear instructions, included references, etc. so it is a worthwhile contribution.
Since there were issues at RTBC already because we didn't manage to postpone them fast enough, it was less emotional labor to just commit the ones that only needed to fix a single typo, even though that's not something we'd credit normally.
What I suggest we do after those simple issues are in is update the patch dictionary (or the committed one if the issue is in by then). Then, take that file and:
1. Start by creating a sub-list of things that are definitely real words in Drupal and relevant technologies, like "Bartik", "Behat", "Buytaert", etc. Post that list on this issue and get consensus on it.
2. Write a script that loops over the rest of the list, looks for the matches with grep -ri to see what the context is, and stores that output after the word.
3. Post that document to the issue, so we can review it together.
4. Use that information to add any additional items to the dictionary from #1 that are appropriate based on the added context.
5. Group the patches according to the scope guidelines, e.g., one patch to fix testMethodsThatareIncorrectlyCamelCased(), one patch to fix all the one-off typos in documentation that don't appear in any UIs or APIs, individual patches as appropriate to deprecate spelling errors in APIs on a per-API basis, one patch per functional bug to fix said bug with test coverage, etc.
6. File issues for those, starting with a few at a time to make sure the issue scopes make sense.
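The script step in the list above could look roughly like the following. This is only a sketch in Python rather than a literal grep -ri call; the function name find_contexts and the max_hits limit are made up for illustration, and matching is a plain case-insensitive substring search, so short words will also match inside longer ones.

```python
from pathlib import Path


def find_contexts(words, root, max_hits=3):
    """Map each word to up to max_hits 'path:line: text' context strings.

    Matching is a case-insensitive substring search, similar in spirit to
    grep -ri. The name and the max_hits limit are illustrative assumptions.
    """
    root = Path(root)
    lowered = [word.lower() for word in words]
    contexts = {word: [] for word in words}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # Skip binary or unreadable files.
        rel = path.relative_to(root).as_posix()
        for number, line in enumerate(lines, start=1):
            haystack = line.lower()
            for word, needle in zip(words, lowered):
                if needle in haystack and len(contexts[word]) < max_hits:
                    contexts[word].append(f"{rel}:{number}: {line.strip()}")
    return contexts
```

Calling find_contexts with the words read from the dictionary file and the path to a core checkout would give one list of context lines per word, which could then be written out "after the word" as described in the step above.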
Comment #29
alexpott
Another possible stream of things to do is to propose updates to the cSpell dictionaries. For example there are PHP words in list that should probably be a part of https://github.com/streetsidesoftware/cspell-dicts - for example curle since this is from the PHP constant CURLE_OK. And there might be stuff for the html and computer terms dictionaries. If we improve the upstream dictionaries that'd be a really nice outcome for Drupal giving back.
Comment #30
jungle
#2972224: Add .cspell.json to automate spellchecking in Drupal core was in. Thanks, @catch and @xjm for the replies about the credit thing.
Comment #31
jungle
Adding "Needs a follow-up for #3138788 comment-12.1" to IS, because it's a bit hard for me to do it myself. And I am going to remove the "Needs followup" tag from that issue.
Comment #32
jungle
BTW. Just set back 2 RTBC'ed child issues to NW for rerolling. The other 2 RTBC'd ones are still valid.
Comment #33
quietone CreditAttribution: quietone as a volunteer commented
FYI, the dictionary contains 'skłodowska' which is a family name, see Marie Curie.
Comment #34
jungle
Thanks @quietone, FYI, filed #3164652: Update modules/migrate/src/Plugin/migrate/process/Substr.php to remove "skłodowska" from misc/cspell/dictionary.txt to remove it.
Comment #36
jungle
Experimentally, made two patches grouped by the initial letter. #3185640: Fix or ignore words that start with "v", excluding real non-English words, #3185807: Fix or ignore some words starting with "w"
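For reference, grouping a word list by initial letter as in this experiment can be sketched in a few lines of Python. The function name group_by_initial is hypothetical, and the sketch says nothing about whether this is a good issue scope.

```python
from collections import defaultdict


def group_by_initial(words):
    """Group words by lowercased first character, with sorted groups.

    Blank entries are skipped. The function name is an illustrative
    assumption, not part of any existing tooling.
    """
    groups = defaultdict(list)
    for word in words:
        word = word.strip()
        if word:
            groups[word[0].lower()].append(word)
    return {letter: sorted(items) for letter, items in sorted(groups.items())}
```

Each key of the returned dictionary would then correspond to one candidate patch.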
Comment #37
quietone CreditAttribution: quietone as a volunteer commented
To make it manageable for me to find words to change I decided to figure how often a word is used and in what files. The result is in the attached spreadsheet. I am finding it easier to refer to that than constantly grepping for strings.
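A count of this kind could be produced along the following lines. This is a hedged sketch and not the method used to build the attached spreadsheet; count_word_usage is a made-up name, and matching is a case-insensitive substring search, so a word is also counted when it appears inside a longer word.

```python
import re
from collections import Counter
from pathlib import Path


def count_word_usage(words, root):
    """Return {word: Counter({relative_path: occurrences})}.

    Case-insensitive substring matching; files that cannot be read as
    UTF-8 text are skipped. The function name is an assumption.
    """
    root = Path(root)
    patterns = {w: re.compile(re.escape(w), re.IGNORECASE) for w in words}
    usage = {w: Counter() for w in words}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # Skip binary or unreadable files.
        rel = path.relative_to(root).as_posix()
        for word, pattern in patterns.items():
            hits = len(pattern.findall(text))
            if hits:
                usage[word][rel] = hits
    return usage
```

Summing the values of one word's Counter gives its total usage, and the number of keys gives the number of files it appears in, which is the kind of information useful for picking words that are used only once or twice.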
Comment #38
longwave
Just to note that #3210633: Update JavaScript dependencies for Drupal 9.2 updates the dictionaries that ship with cspell and that will remove ~60 words from the custom dictionary.
Comment #40
Spokje
Found out in #3210633-18: Update JavaScript dependencies for Drupal 9.2 that no changes in the custom dictionary core/misc/cspell/dictionary.txt will happen,
Comment #41
quietone CreditAttribution: quietone as a volunteer commented
I really think we need to take a moment and make a list of the words in dictionary.txt that we will NOT be changing. Ones that we consider words (for any reason) despite the dictionary. We should also decide what to do about proper names that are in the dictionary. Do we add cspell:ignore lines in the code or leave them in the dictionary?
One reason to do that now is to prevent work removing a word and have it put back in the dictionary during review. The other is to provide some overall guidance on what to remove and what not to remove from the dictionary. There is too much guessing going on for me.
To keep track of the words that must stay in the dictionary we could add a README.txt to core/misc/cspell to list these words and the reasons to keep them in the dictionary.
Comment #42
quietone CreditAttribution: quietone as a volunteer commented
For example, from my work tonight on #3210129: Fix spelling for words used once, beginning with 'a' -> 'd', inclusive
These words are in dictionary.txt but are real words: bakeware, chocolatiers, corge, cucurbitaceae
Maybe these should stay in dictionary.txt? backlinks, backport, classmaps, drupalisms
Comment #43
alexpott
@quietone we can have more than one custom dictionary - so we could create a dictionary with all the errata and then attempt to remove that.
Another thing to consider - we should look to file upstream issues to the dictionaries we depend on for real words that fit into those dictionaries.
Comment #44
quietone CreditAttribution: quietone as a volunteer commented
@alexpott, yes, I learned that we can add dictionaries after I posted my comment but that doesn't fit what I think is needed.
I've come to realize that what I'd like to have is a place to document decisions about what stays in the dictionary and what does not. Covering points such as how we handle
So, probably a documentation page in the coding standards would be a better place. That is, if people think it will be useful.
Edit: fix formatting
Comment #45
quietone CreditAttribution: quietone as a volunteer commented
This spreadsheet can be used to identify what words should stay in the dictionary. It also has updated sheets showing all the files and words and counts etc.
Comment #47
xjm
Late to the party on the spreadsheet, but perhaps we could also add a column for words that should be ignored on a per-test basis rather than being in the dictionary (rather than removing them and replacing them with something boring). I added a column for this that could be filled in, although it looks like the word list doesn't entirely match the dictionary anymore.
A few specific things I'd request be moved from the dictionary to their own tests instead of being removed, all related to honoring specific top core contributors:
I realize this is a fine line since some references could be unclear, controversial, confusing for English-as-second-language speakers, etc., but I think we can find it in our sometimes-overly-serious maintainer hearts to let core devs have their small joys with some test data.
Comment #52
quietone CreditAttribution: quietone as a volunteer commentedComment #53
quietone CreditAttribution: quietone at PreviousNext commentedI've updated the spreadsheet.
I wonder if we should make a wiki page that lists the words that are not to be changed?
Comment #54
quietone CreditAttribution: quietone at PreviousNext commented
Removing the Kanban board because it is not being kept up to date.
Comment #55
jungle
Re #47 and #53
See the example below copied from https://cspell.org/docs/dictionaries-custom/#example-1
Example 1:
custom-words.txt
Comments are allowed in the custom dictionary -- can we add comments for those words? Otherwise, who can remember all the stories behind them and treat them as non-typos?
Comment #56
quietone CreditAttribution: quietone at PreviousNext commented
I did think about that. For me, it just seems cleaner to have two dictionaries, one of which will eventually be empty and can be removed.
Comment #58
quietone CreditAttribution: quietone at PreviousNext commented
Comment #59
quietone CreditAttribution: quietone at PreviousNext commented
Comment #60
quietone CreditAttribution: quietone at PreviousNext commented
Comment #61
quietone CreditAttribution: quietone at PreviousNext commented
Comment #62
quietone CreditAttribution: quietone at PreviousNext commented
Comment #63
quietone CreditAttribution: quietone at PreviousNext commented
Update IS and credit
Comment #64
quietone CreditAttribution: quietone at PreviousNext commented
Replaced the spreadsheet from #45 (now deleted) with a new one. Details in the Issue Summary.
Comment #65
quietone CreditAttribution: quietone at PreviousNext commented
Comment #66
quietone CreditAttribution: quietone at PreviousNext commented
Comment #67
smustgrave CreditAttribution: smustgrave at Mobomo commented
From #3391788: Fix spelling of function names in tests
Comment#20 from @xjm
Comment #68
quietone CreditAttribution: quietone at PreviousNext commented
Just updating the list of issues to do in the Issue Summary
Comment #69
quietone CreditAttribution: quietone at PreviousNext commented
Comment #70
quietone CreditAttribution: quietone at PreviousNext commented
Comment #71
quietone CreditAttribution: quietone at PreviousNext commented
Comment #72
joachim CreditAttribution: joachim at Factorial GmbH commented
I don't understand #3413984: Simple fixes for words with prefix of 'de' or 're'. Words like 'reclosed' are not spelling errors -- https://dictionary.cambridge.org/dictionary/english/reclose
Comment #73
quietone CreditAttribution: quietone at PreviousNext commented
Determining if a word is spelled incorrectly or not will always be a challenge. Languages change over time as words are added and meanings change. Drupal core uses American English, so which dictionary do we use? There are many online dictionaries and they do not always agree.
In the end, we are using cspell, and must defer to what they are using, which I think is hunspell. Words can be added upstream but I haven't explored that much after reading this issue.
In the case of 'reclosed', that was used only twice. I think it was better to convert that and have a smaller dictionary list. However, for other words like this with high usage we probably do want to file an issue upstream and/or add it to the drupal dictionary.
Comment #74
quietone CreditAttribution: quietone at PreviousNext commented
I made the followup asked for in #31 and earlier added some suggestions about scope. Therefore, removing the tag.
Comment #75
quietone CreditAttribution: quietone at PreviousNext commented