Fixing spelling in GitHub repos using codespell

Posted on 13 Jun 2022 by Edward Betts

Codespell is a spell checker specifically designed for finding misspellings in source code.

I've been using it to correct spelling mistakes in GitHub repos sine 2016.

Most spell checkers use a list of valid words and highlighting any word in a document that is not in the word list. This method doesn't work for source code because code contains abbreviations and words joined together without spaces, a spell checker will generate too many false positives.

Codespell uses a different approach, instead of a list of valid words it has a dictionary of common misspellings.

Currently the codespell dictionary includes 34,466 known misspellings. I've contributed 300 misspellings to the dictionary.

Whenever I find an interesting open source project I run codespell to check for spelling mistakes. Most projects have spelling mistakes and I can send a pull request to fix them.

In 2019 Microsoft made the Windows calculator open source and uploaded it to GitHub. I used codespell to find some spelling mistakes, sent them a pull request and they accepted it.

A great source for GitHub repos to spell check is Hacker News. Let's have a look.

Hacker News has a link to forum software called Flarum. I can use codespell to look for spelling mistakes. When I'm looking for errors in a GitHub repo I don't fork the project until I know there is a spelling mistake to fix.

edward@x1c9 ~/spelling> git clone git@github.com:flarum/flarum.git
Cloning into 'flarum'...
remote: Enumerating objects: 1338, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 1338 (delta 21), reused 36 (delta 19), pack-reused 1296
Receiving objects: 100% (1338/1338), 725.02 KiB | 1.09 MiB/s, done.
Resolving deltas: 100% (720/720), done.
edward@x1c9 ~/spelling> cd flarum/
edward@x1c9 ~/spelling/flarum (master)> codespell -q3
./public/web.config:13: sensitve ==> sensitive
edward@x1c9 ~/spelling/flarum (master)> gh repo fork
 Created fork EdwardBetts/flarum
? Would you like to add a remote for the fork? Yes
 Added remote origin
edward@x1c9 ~/spelling/flarum (master)> git checkout -b spelling
Switched to a new branch 'spelling'
edward@x1c9 ~/spelling/flarum (spelling)> codespell -q3
./public/web.config:13: sensitve ==> sensitive
edward@x1c9 ~/spelling/flarum (spelling)> codespell -q3 -w
FIXED: ./public/web.config
edward@x1c9 ~/spelling/flarum (spelling)> git commit -am "Correct spelling mistakes"
[spelling bbb04c7] Correct spelling mistakes
 1 file changed, 1 insertion(+), 1 deletion(-)
edward@x1c9 ~/spelling/flarum (spelling)> git push -u origin
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 360 bytes | 360.00 KiB/s, done.
Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
remote: 
remote: Create a pull request for 'spelling' on GitHub by visiting:
remote:      https://github.com/EdwardBetts/flarum/pull/new/spelling
remote: 
To github.com:EdwardBetts/flarum.git
 * [new branch]      spelling -> spelling
branch 'spelling' set up to track 'origin/spelling'.
edward@x1c9 ~/spelling/flarum (spelling)> gh pr create 

Creating pull request for EdwardBetts:spelling into master in flarum/flarum

? Title Correct spelling mistakes
? Choose a template Open a blank pull request
? Body <Received>
? What's next? Submit
https://github.com/flarum/flarum/pull/81
edward@x1c9 ~/spelling/flarum (spelling)> 

That worked. I found one spelling mistake, the word "sensitive" was spelled wrong. I forked the repo, fixed the spelling mistake and submitted the fix as a pull request.

The maintainer of Flarum accepted my pull request.

Fixing spelling mistakes in Bootstrap helped me unlocked the Mars 2020 Contributor achievements on GitHub.

Why not try running codespell on your own codebase? You'll probably find some spelling mistakes to fix.

Tags: codespell, github

all posts