Is there a reason why Apple's iPhone spellcheck is often really poor, significantly worse than both LLMs and just...human eyes?
I often find myself butchering the spelling of a word in a way where the correct answer is obvious to human eyes (probably because of "typoglycemia" [1]) and an AI LLM immediately understands what I meant to say, but Apple's spellcheck has "No Guesses Found."
Yes but it’s much broader. Just in general the lack of Steve Jobs noticing these glaring issues and coming down hard to solve them is pretty clear.
I remember when MacBooks briefly came out with a ridiculously bright standby LED that required black electrical tape over it if you wanted to sleep with it in the house. Shortly after, no more status LEDs on any MacBook (thank you!).
Nowadays I find non-stop little annoyances, with threads from others on the same issues on Apple devices. From.the.overly.prominent.full.stop when searching textually in the URL bar to the crappy spell check and the crappy spam filtering. As much as Jobs apparently came across as an asshole, there’s a need for someone at the top to say ‘WTF is this, fix it or get fired!’.
I worked at Apple and heard a lot of Steve stories. He really did personally approve everything. He would be sitting in a room, and team leads would all line up to give their quick 2-minute updates. So it's the MacBook Air guy's turn. He comes in and places his prototype down in front of Steve. Steve opens the lid. Two seconds later he picks up the laptop and heaves it so hard it skips across the table like a stone on water: "I said fxxking INSTANT ON!!" The poor guy collected his prototype and exited the room. Later the MacBook Air launched... and it fxxking turned on the moment you opened the lid.
Good product development really does seem to require some sort of leader who demands quality and smacks people when they don't deliver. Linux is nice because of Torvalds for example.
I was given a small electric fan. It’s great in that it’s portable and I can use it in some of the crummy hotels I have to stay in.
Unfortunately, it has a bright blue LED on it so it’s a pain to use at night when you’re trying to sleep.
It’s so bright that even covered with tape it still shines through the thin plastic of the fan body.
What really gets me is why they bothered putting an operating light on it in the first place.
It’s a fan: the fact that it’s working tells you it’s working.
A Jobs or Torvalds type character would have pointed that out.
I suspect though that it’s often a case of people noticing these types of design flaws but not having the authority to fix them, while those with the authority don’t care.
I've worked in physical product development at some companies that include names you'd recognize.
More often than not, those annoying features are direct requests from the person up top who smacks people. They want that feature because they think it will sell, and it's no use trying to argue with them because you'll just get smacked again.
Oh yeah, it's definitely solvable if you can be bothered with it.
It was more just the observation that an unnecessary light had been included that degrades the performance of the product.
I find it intriguing how that comes to be. On paper it seems like adding the light wouldn't hurt the product even if it's not useful, but it seems nobody actually used it.
There is a chance the LED is also used as an important diode in the circuit; simply removing it can greatly reduce the lifespan of the device (more common in cheaper products).
Adding an appropriate diode in its place is advised.
> I suspect though that it’s often a case of people noticing these types of design flaws but not having the authority to fix them, while those with the authority don’t care.
Kinda related but also not really: my own pet peeve is the pouring spout on many products (coffee machines, water jugs, buckets...). They might look effective, but I find that more often than not they are curved too much and drip all over when actually pouring.
And I always have to wonder, after serving coffee from one of those things: did the person who designed it never even try it just once? Did they never use such a thing, never ever pour water from a pot?
I have to agree with that as a lead. Most developers claim to be done with a task without taking care of the small details that users will immediately notice. It’s a constant struggle to get them to care about what the actual value of the feature they’re implementing is, let alone to chase the small issues on their own initiative unless they’re painfully listed in some requirements document.
As a dev, I'm always noticing these little problems in my designs, but my boss just wants the thing done ASAP without worrying too much about it being nice to use.
Same here, to the point that when I do leave, it’s going to be one of the reasons given.
One example is how this product manager type, because of company politics, isn’t really under the same department as the other software teams.
Because of his very very narrow horse blinkers approach, he doesn’t see or even comprehend why we’d want to align with literally anything in any other team and that includes visual UI stuff.
That’s why we have a bright neon pink “Back” button. Right in the literal center of the screen. It’s insane.
In this case, both fair and fare are words in English. Which shows that spell checking needs to know a lot about grammar and context to work in general. Basically you need an LLM. Or if not a 'large language model', perhaps at least a small language model.
I wonder how it does work, I remember MS Word having a fairly decent grammar checker when I was using it in school - which predated LLMs by many years!
I suspect an LLM wouldn’t be the optimal choice.
Latency is actually an interesting case, because it’s one of those things that, by default, nobody owns end-to-end
If you’re booting a computer or building web search, every subsystem can contribute to latency. If you have more teams and more features, you’re likely to have more latency.
In the early days of Google, Larry Page would push hard on this as well, in person. So Google search was fast.
But later the company became larger and bureaucratized, so nobody was in charge of latency. So then each team contributes a bit to latency, and that’s what ends up shipping.
Google products used to be known for being fast, but they’ve reverted to the mean.
The instant-on thing actually bothered me enough to make me switch from Windows back to Mac (relatedly, the idle battery drain on Windows was also pretty terrible).
Alignment of incentives. I'm sure the personal humiliation of being yelled at by Jobs was a reasonably strong incentive, but I'm certain the perception that failing to deliver would have him personally sending you to the dole queue asap was an even stronger one.
Compare that to most corporations, where the only way to get fired is to fail at office politics, and where failing to deliver, or delivering the lowest-quality crap that can be passed off, is just business as usual.
Alas, humans don't come fully customisable. You get to pick from the packages on offer. And it seems that for Apple, Steve Jobs' good parts only came as part of a package that also included his bad parts.
Most MacBooks I remember, going back a long way, were pretty much instant on well before Apple silicon. Maybe you had some corporate crapware installed on yours.
I've also found a lot of this stuff is due to naysayers telling people that things can't be fixed (because really they don't want to bother). You need a strong leader to say "no it can and we will".
It takes a village. Also to be successful in tech it takes an asshole. No way around it. At some point all successful companies share an overly aggressive visionary. The entire company doesn’t need to be toxic, but the apex does. If you don’t like it, don’t climb the ladder.
Skill issue. There are other ways to get those results, but being an asshole is the lowest-hanging, and is nearly free if the people around you don't have the self-respect to walk away.
I don't know, maybe he wouldn't raise his voice, but I can't imagine it would be fun to be a subordinate of LKY at the moment he decided you were wasting his time.
Digital's Ken Olsen is probably one of the most relevant examples. Though it probably helped that Olsen was largely working with the grain at DEC, building digital playsets which engineers themselves loved, rather than constantly forcing them to adopt a non-technical consumer's perspective.
Not just that, but the strong leader needs to ensure that it can be fixed.
Yelling at a rank-and-file to unfuck some random system, then not giving them any time, resources, or tools to fix it is just being a dictatorial dickhead.
Wait until you realize that the icon of the period and spacebar don't at all line up to the touch area due to touch gravitation. You can tap slightly more on the spacebar side and still end up with a period. https://www.reddit.com/r/iphone/comments/1ekszul/comment/lgn...
So if you suffer from this it's not even your fault. You're literally hitting the spacebar but some incentive at Apple in their org structure has led to the period literally having waaaay too much weighting and the lack of exec oversight at Apple in the post Jobs days is leading to us all.typing.periods.whenever.we.just.wanted.to.search.
It's hard to pinpoint exactly what it is, but yes, there seems to be an increasing number of small issues with Apple devices. They aren't major things simply not working, but: the spam filter is pretty terrible, text overlaps on non-flagship phones (e.g. the iPhone SE). All sorts of minor annoyances.
Yep, I've worked at another big tech company that had periodic "executive bug filing", where executives would flag minor things that annoyed them. These minor issues would then get higher-than-normal priority purely by virtue of being flagged by an exec. I have a likely controversial opinion that this practice actually led to better outcomes.
It did break prioritization in the opinion of the ground level teams and their goals but I argue it's not bad to at least periodically do this since grating against the current org structure prioritization and goals is not a bad thing to do on occasion.
Chances are they'll find there's no team that considers itself the owner of spell check or spam filtering, and the goals the keyboard team is chasing are likely some silly metric like "number of sentences with correct punctuation", leading to the current ridiculous outcome where the period in the URL bar is far too prominent, especially considering we don't even type full URLs into the search bar that often these days.
Dear Apple leads: if you're reading this do a short initiative where execs aim to file an annoyance a day. It's not hard to find such. There will be some complaints at the ground level that these executive annoyances get too much priority but part of that will be because you're questioning lower level org priorities (a healthy thing to do!), not because the issues don't matter. The end result will bring Apple a bit more in line with the quality we saw during the Jobs period since this is exactly the kind of shake up he did on occasion.
Current MBPs have bright green/orange charging lights on both sides of the MagSafe connector. They're bright enough that I have to block them when I'm in a hotel and my laptop is in the same room.
That light was really helpful to do the occasional late night visit to the bathroom in a home where I lived where there were no light controls at the bedside.
There has to be something going on with iOS Safari and the keyboard because my typing goes to complete shit in ways it never does in any other application.
Here are some random examples I thought of for this comment. Notice how everything is spelled wrong as though the screen input doesn’t match the location of the buttons.
I notice this as well. I think it is because autocorrect is turned off there, and we may just be so used to it learning our typing habits that our "raw" typing really is that bad.
> I remember when MacBooks briefly came out with a ridiculously bright standby LED that required black electrical tape over it if you wanted to sleep with it in the house. Shortly after, no more status LEDs on any MacBook (thank you!).
The lack of status LEDs is actually the only thing I really REALLY hate about MacBooks!
Too often I have been bitten by the thing not properly going to sleep because SOMETHING keeps a wake lock (and of course macOS doesn't indicate this anywhere outside of Energy Monitor, nested in System Activity) and overheating in my bag as a result. A simple LED would have been a good visual indicator that it is still awake.
There's nothing more frustrating than when you type the word you want to type, it changes it to a different word, you delete it and type the word you wanted to type again and then rinse/repeat 3 to 4 times before you have the word you actually wanted.
And if you're not paying attention, your message ends up looking like you're having a stroke.
It used to be that if you typed, deleted the correction, and retyped, that spelling would now be the preferred and you wouldn’t have to play that game anymore. Apple broke that years ago.
> they focus too much on the first letter of the word
They also do that in Apple Notes. On the iPad the search can only match word prefixes. So if you type "oo" and the entire note consists of just the word "foo", it will find nothing. This doesn't even require fuzzy search, yet they couldn't be bothered while solving the much more difficult handwriting recognition problem.
Also the iPhone's Settings app still doesn't have all settings in the search index. So it's impossible to find the section "headphone safety" & "reduce loud audio" using words like "headphone", "audio" or "safety". This setting was introduced five years ago, by the way.
> they really don’t want you saying bad words of any kind.
Not true anymore, I just typed fuck in this comment without having to fight it. They made a change I think last year and they even announced it.
> they do not look at context at all
Also not true. It's true that they're not perfect at it, but replacement after you typed 2 more words happen specifically because it can tell better what you want to say. Sometimes works against you because language is highly personal.
Bit off-topic: macOS has an excellent built-in dictionary. Just select a word in any app, press Ctrl+Command+D, and it opens. It even guesses most incorrect words correctly. Translation is also available if it exists for the current keyboard locales.
E.g.
> No entries for "typoglycemia", did you mean "hypoglycemia"?
These user activated dictionaries tend to be excellent (even in vim, a pretty barebones system, I tend to get fantastic guesses from the machine).
Actually, come to think of it, the problem must be a bit easier than on smartphones, right? Real keyboard input is very precise. Smartphone keyboards already guess what word you were trying to spell, so they are influencing the typos in the direction of likely words… cannibalizing the very guess list that the dictionary uses!
That's great. I usually use the context menu on macOS and the "Define" option on long press on iOS.
That said, trying to use long press on iOS (or whatever it actually is), is one of those places that often drives me nuts. I don't know if the issue is a specific app or the OS or what but sometimes I want the popup menu to appear and I can't get it to appear. Or I do something to make it appear but it doesn't appear for x hundred milliseconds, during which I think it didn't get my gesture so I start a new one, just as it's finally responding in which case my new gesture dismisses it. Repeat 3-4 times before I'm ready to tear my hair out
It also shows why canvas based websites suck. Open Google Docs, select a word, press Cmd-Ctrl-D, ... nothing. Try it in gmail (which is not canvas based) and it works.
Alfred ties into it nicely too, you can type `spell someword` and the completions below have the various spellings of words, fuzzy matched. Select one and the word goes onto your clipboard
The spell check is truly bad. It boggles the mind how this is even possible given how solved the problem is everywhere else. Also the period being to the right of the spacebar such that it gets hit instead of space. So annoying!
I feel the same way about Android's. It just seems like spell check was so much better years ago. But I'm not sure whether I'm comparing mobile against desktop expectations. It really seems extremely dumb on Android.
When I used Windows Phone 8.1 I felt like I was typing text twice as fast as on Android. Better suggestions, more accurate keyboard inputs on the same screen size, and selecting an entire word was just a single tap which made fixing a typo very quick as well. Meanwhile back then it was impossible to make certain text selections without a bluetooth keyboard because of how Android constantly tried "fixing" touch-based selections. It's sad that Microsoft shut down the only system & UI that felt like the developers were actually thinking of the user when designing it. To this day no other mobile OS is as friendly to left-handed users.
I mean, TBH I would expect this to be true: an LLM is trained over a massive corpus of internet data, which contains many typos, and is required to accurately predict tokens despite edit errors. A spellchecker is typically running a deterministic algorithm really, really quickly, and has hardcoded limits on acceptable edit distance (and has no learned knowledge of what looks correct/incorrect to human eyes). An LLM should generally trounce a spellchecker at figuring out what you meant to type, unless the spellchecker is secretly a tiny LLM / ML model of some kind under the hood.
"Hypo", meaning low; "glyc-", meaning sugar; and "emia", meaning of the blood. "Low sugar of the blood". (With apologies to chubbyemu.)
Since "typo" comes from "typography", it roughly means "symbolic". So "typoglycemia" should mean "symbolic sugar of the blood". Low typos in your blood would be "hypotypemia".
I have no idea why "typoglycemia" refers to a human ability to autocorrect, but it brings me joy, so I'm not going to question it ^_^
Yes. I just typed in "Tipografical earer" - and iOS 18.6 suggested "Tipograxical" for the first word, and one of "eared", "eager", and "eater" for the second word.
Here are some nice examples (excluding obvious edit distance based ones which it does right)
"snowbalfight" --> "snowball fight"
"unrelevant" --> "irrelevant"
"fone" --> "phone"
"the the" --> "The"
And all of this with auto-capitalization if it notices you're at the start of a sentence, plus handling of proper nouns, punctuation, etc.
What I find really interesting is swipe-type spell checking (it's basically word prediction) on phones. That is a really cool problem to solve well. Sometimes it works like a dream and other times it's annoying. I wonder how they write those.
I have definitely noticed this too. I also use the built in swipe to type feature, and it may as well be a coin flip as to whether it gets the word right. I get that swiping is vague, but even a little bit of frequency prediction would tell you that “sounds good” is going to be more likely than “sings hood”. It’s an absolutely infuriating feature.
I use the swipe feature because I guess I have wide fingertips and frequently hit unintended, adjacent keys when pecking on the keyboard (especially as I’ve gotten older). The words produced by swiping often make no grammatical sense, and are frequently esoteric words that I just can’t believe rank high enough on a basic frequency list to suggest. Not to mention my own vocabulary, which apparently is not considered by the keyboard at all.
I had a way better experience using SwiftKey on my android phone 15 years ago.
It is 2025 and the best spell checker is a search engine. Numerous times an application will not provide the correct word, and the only solution is to try the word in a search engine, and to try using it in a sentence if that fails.
In my opinion, this is where a local ML/AI model, no internet required, would be the most beneficial today.
I even had to use a search engine with "thoughts and opi" because I forgot how to spell "opinion" before posting this. The in-application spell checker was 100% useless at assisting me.
Instead of operating the way LLMs normally do, taking the current text and picking the most likely next token, you take your full text and use an LLM to find the likelihood/rank of each token. I'd imagine this creates a heatmap showing which parts are the most 'surprising'.
You wouldn't catch every misspelling, but it could be very useful information for finding what flows and what doesn't, or for explicitly hunting for something out of the norm to capture attention.
I would like this too. This approach would also fix the most common failure mode of spelling checkers: typos that are accidentally valid words.
I constantly type "form" instead of "from" for example and spelling checkers don't help at all. Even a simple LLM could easily notice out of place words like that. And LLMs also could easily go further and do grammar and style checking.
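A rough sketch of what that could look like, with a tiny made-up bigram model standing in for a real LM (the corpus, smoothing, and scoring here are purely illustrative): score each word by how surprising its immediate context is, and flag the outlier.

```python
from collections import Counter
import math

# Tiny made-up corpus, a stand-in for real language-model training data.
corpus = (
    "i got a letter from you . "
    "she heard from him yesterday . "
    "fill in the form below . "
    "we walked from the station ."
).split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)

def surprisal(prev, word):
    # Negative log P(word | prev), add-one smoothed.
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
    return -math.log(p)

def flag_suspect(sentence):
    """Return the word whose local context is most surprising."""
    words = sentence.split()

    def score(i):
        s = 0.0
        if i > 0:
            s += surprisal(words[i - 1], words[i])
        if i + 1 < len(words):
            s += surprisal(words[i], words[i + 1])
        return s

    return words[max(range(len(words)), key=score)]

print(flag_suspect("a letter form you"))  # -> form
```

"form" is a perfectly valid word, so an edit-distance checker waves it through, but the context score singles it out. A real system would use an actual LM's token probabilities instead of hand-counted bigrams, but the shape of the idea is the same.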
I've seen this in a UI. They went a step further and you could select a word (well token but anyway) and "regenerate" from that point by selecting another word from the token distribution. Pretty neat. Had the heatmaps that you mentioned, based on probabilities returned by the LLM.
This should also be pretty cheap (just one pass through the LLM).
I've used BERT to do that sort of thing. It was a prototype and I was using PyTorch (and I'm no expert on PyTorch performance); I also tried the models that succeeded BERT for masked-token prediction. My first issue was that it was slow :-( . My second issue was that it wasn't integrated into my favorite editor. But it was definitely useful.
Can confirm. The first time I saw an automatic spellchecker was probably with WordStar around 1989, and it blew me away. How can the computer know all the words? That's insane! Sounds lame, but it's true. It was a different world.
Since you're old enough, here's a question for you. Do you remember whether, at the time the first spellcheckers were invented, people were negative about them, on the grounds that people would soon stop learning how to spell, and it was just general dumbing down?
It seems that anything that helps people gets this reaction these days. On the one hand, the argument 100% resonates with me. On the other hand, spelling isn't really the end, is it? It's just a means to an end, so what's wrong with making the means easier? Did people worry that you'd stop knowing how to plant potatoes when trading was invented? EDIT: The example doesn't make sense because agriculture is newer than trading, but you get the idea.
Not as much with spellcheckers because even when they started to get popular, it was apparent that many people cannot spell English. So it was very natural.
People pushed back on the grammar checks when they landed in Word.
Before that, people pushed back on calculators in secondary schools. This was a huge point of contention in all classes except trigonometry, and calculators were definitely not allowed in the SAT/ACT.
> People pushed back on the grammar checks when they landed in Word.
Word’s grammar checker has improved quite a lot. But I absolutely hate the style checker and its useless advice. Yes, I know how the passive voice works and yes, it is appropriate in this sentence. Also, it’s not really a problem in English but Word still can’t do spaces properly so it wants to put normal spaces everywhere and it’s fucking ugly. I wish it would spend as much time fixing inappropriate breaking spaces (in English as well).
What I think is that one should question whether something is the point of the exercise.
I'd argue that spelling is completely arbitrary and thus of no fundamental importance. Arithmetic is the same way: there are lots of algorithms to do it and they are all valid. So calculators and spell checkers are fine, and you should use them.
The same is not true for grammar, or for getting an AI to write an essay for you.
I remember hearing this as late as the early 00s. I'd buy electronic dictionaries and spell checkers at yard sales and things like that, and use them in class. Multiple teachers were disapproving of it, despite it basically just being a paper book dictionary in a small, TI-92 shaped device. 10 year old me never saw how flipping through some obnoxiously heavy book in the back of the classroom was better than just punching in a few letters, hitting the "show definition", and ensuring I was spelling and using "curmudgeonly" properly.
Same went for using MacWord vs AppleWorks. MacWord had a built in dictionary, AppleWorks didn't.
I think this happens every time something gets automated away, and in a way it's true. I'm sure a lot more accountants knew 123x27 by heart before than they do now. The problem is LLMs take out the whole process of thinking, and that is going to be a problem: you generally need to think even when you're not in front of a screen.
> because that would mean that soon people would stop learning how to spell and just general dumbing down?
I'd argue that the negative people were correct. People can't spell anymore, not even with a spellchecker. Maybe they never could? I'm not against spellcheckers, I think they are amazing, but they haven't helped much.
WordStar is not the problem, StarWars is. Popular culture has become so vapid that when it comes to writing and the thinking behind writing, most people fare worse than an LLM. I know that old people have been saying this basically since ancient Greece, but it bears repeating: the youth is lost.
I don't remember any particular negative reaction to spell-checkers like the 'calculator panic'.
Perhaps partly because most schoolkids then wouldn't have been using word processors as their main writing tool at school and people using them in a corporate environment were pleased not to make embarrassing errors in their emails.
It was the opposite experience for me. Before spellcheck was commonly part of the web browser, I would go back and reread some very early emails and/or usenet posts from myself. And realize how atrocious my spelling was.
I actually consider spellcheck to have improved my spelling dramatically over the years. The little red squiggles under words have helped me to recognize my misspellings, especially the words that are hard for me to get right consistently.
As I recall, there was some, but not a lot of FUD around spellcheckers, mostly because personal computers were still relatively new. Most GenX parents (Boomers) didn't even know what personal computers were yet, so they didn't know enough to be concerned. (I grew up in Missouri, which was the Digital Stone Age back then.) At the time, I think their complaining was more focused on MTV and video games.
However, it also sounds weird, but I recall myself and some of my peers questioning spellcheckers ("Why do I need this?"), because spelling was a primary mission of our education. We were all raised being constantly tested on spelling. In fact, I think I disabled the spellchecker on my old-ass 286 because it caused delays in the overall experience.
I had the same thing when the Encarta CDs started to include pronunciation tests. You'd get a word, speak it in the microphone, and get a "score" on how well you pronounced that word. Knowing what I know now, it was probably pretty inaccurate and hand wavy, but in the early 90s that was an absolutely amazing experience for an ESL person.
Having a dictionary is a prerequisite, but it's only a small part of the spell-check problem. Plus, plain-text word lists were slow to parse in the 80s; better to go with a trie or some other compact tree structure, which is naturally compressed and can be traversed in time proportional to the word's length rather than the size of the whole list.
The computer has to figure out whether the word is in the dictionary, but it also has to figure out a suggestion for what to change it to.
And even after just that, we already have a bug: homonym mistakes. Homonyms are in the dictionary, but they're misspelled (that was intentional, btw).
How badly misspelled is another problem. We've had the Levenshtein et al. algorithms for a long time, but how different can you get? A really badly misspelled word might not have any good replacement candidates within your edit-distance limit.
There are also optimizations like frequently mistyped words (acn-> can), acronyms, etc.
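The core of that is small; here's a hedged sketch (toy five-word dictionary, purely illustrative) of plain Levenshtein distance plus a distance-capped suggestion list:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def suggest(word, dictionary, max_dist=2):
    """Candidates within the edit-distance limit, best first. Can come
    back empty for a badly mangled word: the 'No Guesses Found' case."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_dist]

print(suggest("acn", ["can", "cat", "acne", "cane", "scan"]))
# -> ['acne', 'can', 'scan']
```

Note the ranking: pure edit distance puts "acne" ahead of "can" (the transposition acn→can costs 2 here), which is exactly why real checkers layer word frequency and transposition-aware distances like Damerau-Levenshtein on top.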
"A Spellchecker Used to Be a Major Feat of Software Engineering"
It still is. The spell checker on my Android phone is a PIA. It's too dumb to correct many typos, there's no way of highlighting wrongly used but correctly spelled words such as 'fro' and 'for', etc. There's no automatic or user-defined substitution, such as correcting 'rhe' to 'the' while keeping the words highlighted until a final revision.
Word-processor spellers have no way of tagging certain words that one may or may not wish to use depending on context. A classic example that's caught me out past the draft stage and found its way into the final document without me noticing is 'pubic' for 'public'. Why doesn't my speller highlight such words in red and ask whether I actually meant to use this word?
Moreover, spellers are not all of the same level of accuracy; for example, Microsoft Word's speller is much better than LibreOffice's, much to my annoyance, as LibreOffice is my main (preferred) word processor.
Nor is there a method of collecting misspelled words or typos and tagging them as spelling errors or typos to help one's spelling or typing. It'd be nice to have a list of my misspelled words together with their correct spellings; that way I could become a better speller. Spellers could also be integrated with full dictionaries: highlight the word and press F1 for its meaning, etc.
There are no dictionary formats that are both universal and smart, i.e. that would allow easy amalgamation between dictionaries and yet could contain user-defined words and other user metadata, kept distinguishable from the general corpus of words when dictionaries are crossed or amalgamated. For example, a smart dictionary format could contain metadata allowing a dictionary and a thesaurus to coexist in the same word list, and similarly different specialized dictionaries: technical, medical, etc.
All up, spellcheckers are still a damn mess. They need urgent attention.
As a copyeditor/proofreader, the number of times over the years I've had to fix low-quality (i.e., wrong) suggestions is quite large. ("He had a small plague on his desk" remains a favorite.)
I have a spelling checker
It came with my PC
It highlights for my review
Mistakes I cannot sea.
I ran this poem thru it
I'm sure your pleased to no
Its letter perfect in it's weigh
My checker told me sew.
Given that spellcheckers are mostly stable tech, I wonder why Google’s spellchecker in Gmail, or even in Chrome in “enhanced” mode, is so bad.
Even Microsoft Word, being a local app and everything, manages to work better than Google's cloud-based offerings. That's surely evidence that progress is far from linear.
There have been some very specific issues I and others have noticed that lead me to believe the backend for Google’s cloud based spellchecker has changed from a traditional language model to some more generalized LLM-based system. It’s gotten distinctly more terrible a couple of times in the last few years.
The way growth in memory availability changes the scope of problems is really quite astonishing. I cut my teeth writing code for Apple ][ computers with theoretically up to 128K of RAM, but in practice much closer to 40K for most use cases, but it does make me much more conscious of memory and CPU usage than younger devs who never faced these sorts of constraints.
Thinking of the example given about being able to just load the word list into memory, I did something of that ilk when my son’s fifth grade class read a book which had a concept of dollar words: You assign a value to each letter, a=1, b=2, … z=26, add up the value and try to get exactly 100. It was pretty trivial to write a program that read the word list and produced the complete list of dollar words (although I didn’t share that with my son, I did give him access to the word list and challenged him to write the program himself).
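That dollar-word filter fits in a few lines; a sketch with a small made-up sample list standing in for a real word-list file:

```python
def letter_value(word):
    """a=1, b=2, ..., z=26, summed over the word's letters."""
    return sum(ord(c) - ord("a") + 1 for c in word.lower() if c.isalpha())

def dollar_words(words):
    """Words whose letter values total exactly 100."""
    return [w for w in words if letter_value(w) == 100]

sample = ["excellent", "discipline", "cat", "elephants", "hello"]
print(dollar_words(sample))  # -> ['excellent', 'discipline', 'elephants']
```

Against a full dictionary file you'd just swap `sample` for something like `open("words.txt").read().split()`.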
At the moment, I’m building up a Spanish rhyming dictionary by using a Spanish word list, reversing the words and sorting the reversed list to find the groups of words that are most likely to rhyme, which was something that 30 years ago would have been a challenge on my desktop computer but now is a brief script that I’m just as likely to manage through perl 1-liners and shell pipes as not.
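The reverse-and-sort trick fits in a one-liner; a Python sketch with a stand-in word list:

```python
# Reverse each word, sort, and reverse back: words with shared endings
# (the likeliest rhymes) end up adjacent in the output.
def rhyme_sorted(words):
    return [w[::-1] for w in sorted(w[::-1] for w in words)]

palabras = ["cantar", "comer", "bailar", "vivir", "correr"]
print(rhyme_sorted(palabras))  # ['bailar', 'cantar', 'comer', 'correr', 'vivir']
```

Note how the -ar verbs and the -er verbs land next to each other.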
> Few engineers could build a good spell checker without external libraries, given a database of valid words.
Writing a spell checker that quickly identifies if a word is in a list of valid words (the problem described in the article) is a trivial problem for anyone who has basic algorithms and data structure knowledge. It's the classic example for using a trie: https://en.wikipedia.org/wiki/Trie
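A minimal trie for that exact-lookup part, sketched in Python (nested dicts with an end-of-word marker):

```python
# Minimal trie: each node is a dict from character to child node;
# '$' marks that a complete word ends at this node.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True  # end-of-word marker

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return '$' in node

t = Trie()
for w in ["cat", "cats", "dog"]:
    t.insert(w)
print(t.contains("cats"), t.contains("ca"))  # True False
```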
The problem described in the article is doing it within very limited storage space. How do you store your list of 200K words on a system with only 256K of memory? This is the challenging part.
> How do you store your list of 200K words on a system with only 256K of memory?
Your hard disk is almost always larger than your RAM. You only load into memory what's needed at the moment. I hope that gives a hint on how to proceed with the above problem.
But you don’t necessarily even have a hard disk. You might only have a 320K floppy. Floppy-only computers were pretty common in the late 80s when I was in undergrad.
I was actually asked to build a spell checker in an interview. I immediately thought of Peter Norvig's article on spell corrector (https://norvig.com/spell-correct.html) and proceeded to explain. It turned out that the interviewer really wanted a spell checker not spell corrector: the program will only point out words not in the set of known words.
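The scope difference is big: the checker the interviewer wanted is just set membership. A sketch (the tiny word set here is purely illustrative):

```python
# Spell *checker*: report words not in the known set. No correction,
# no edit-distance candidates -- that's the (harder) corrector problem.
KNOWN = {"the", "quick", "brown", "fox", "jumps"}

def misspelled(text):
    return [w for w in text.lower().split() if w not in KNOWN]

print(misspelled("The quikc brown fox jumps"))  # ['quikc']
```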
Almost. You needed to clarify what the interviewer was asking and discover requirements. As much as HN likes to hate on coding interviews requiring specific algorithm knowledge, determining requirements is very much part of the job, and engineers have a tendency to build what they want to build, not what the customer wants.
I don't care about people who like to overengineer things, as long as they are humble and have awareness of this tendency. Sometimes you need the extra power and creativity. If they are affable and don't take criticism personally, they usually get with the program easily.
Marvin the Paranoid Android voice. 'Oh... that. Just do a linear search on a dictionary. Put the small words first. I know it's woefully inefficient but you won't fuck that up'
It's also a key enabler of CJK typing on computers. CJK scripts never map to keyboards well, so instead of actually typing, approximate representations are typed in and regularized into written forms using similar technologies as spell checkers. It's a neat thing if you speak one of the languages; sort of interesting that a similar tool hasn't been integrated into English keyboards.
Doesn't Chinese input usually work by typing Latin codes for characters? Korean characters represent syllables made up of shapes representing individual sounds; those fit on a keyboard just fine. And I'm not sure about Japanese; there they may use something like spell checkers to map kana to kanji.
Another interesting challenge with CJK languages was just displaying them. You need higher-resolution graphics and a much bigger character ROM to even consider that.
Romanization systems for Chinese vary, but all have the issue that a single "word" in the romanized system can map to dozens, if not hundreds, of actual "words" in the target language.
Pinyin is sort of the standard for romanization, although other systems exist, as well as inputs that aren't based on romanization (bopomofo).
Take the pinyin `fei`. Just looking at the tones that can go on this syllable, it can mean at least 4 words (my dictionary app couldn't find any neutral-tone words). In reality, it's at least dozens, each with different contextual meanings.
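At its core an IME is a one-to-many candidate lookup plus context-based ranking; a toy illustration (the character set for `fei` is just a small sample):

```python
# One romanized syllable maps to many characters; a real IME ranks
# candidates by context and frequency. Tones shown in the comment.
CANDIDATES = {
    "fei": ["飞", "非", "肥", "费"],  # fēi (fly), fēi (not), féi (fat), fèi (fee)
}
print(CANDIDATES["fei"])
```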
IIUC there are ambiguity problems in Chinese and Korean, just less than there are for Japanese. Korean input has no end-of-character marks and multi-character entry could be split different ways, Chinese has a bunch of homonyms-in-Latin, and Japanese is a huge mess (like always, if I think about it...)
For checking? Just a lookup on disk (no db, just a large list with a custom index, then binary search in the retrieved block). Decoding anything was slow, and in-core was basically out of the question [1]. Caching was important, though, since just a handful of words make up 50% of the text.
I once built a spell checker plus corrector which had to run in 32kB under a DOS hotkey, interacting with some word processor. On top of that, it had to run from CD ROM, and respond within a second. I could do 4 lookups, in blocks of 8kB, which gave me the option to look up the word in normal order, in reverse order, and a phonetic transcription in both directions. Each 8kB block contained quite a few words, can't remember how many. Then counting the similarities, and returning them as a sorted list. It wasn't perfect, but worked reasonably well.
[1] Adding that for professional spell checking you'd need at least 100k lemmata plus all inflections plus information per word if you have to accept compounds/agglutination.
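That block-lookup scheme can be sketched in Python (block size, file layout, and building the blocks in memory are all illustrative here; the original would have read the blocks straight off the CD):

```python
# Keep only a small in-memory index (the first word of each fixed-size
# block), binary-search the index to pick a block, read that one block,
# then search for the word within it.
import bisect

BLOCK = 8192  # 8kB blocks, as in the comment above

def build_blocks(sorted_words):
    """Pack sorted words into newline-separated blocks of <= BLOCK bytes."""
    blocks, cur = [], b""
    for w in sorted_words:
        line = w.encode() + b"\n"
        if cur and len(cur) + len(line) > BLOCK:
            blocks.append(cur)
            cur = b""
        cur += line
    if cur:
        blocks.append(cur)
    index = [b.split(b"\n", 1)[0].decode() for b in blocks]
    return blocks, index

def lookup(word, blocks, index):
    i = bisect.bisect_right(index, word) - 1  # block whose first word <= word
    if i < 0:
        return False
    return word in blocks[i].decode().split("\n")

words = sorted(["apple", "banana", "cherry", "grape", "melon"])
blocks, index = build_blocks(words)
print(lookup("cherry", blocks, index), lookup("kiwi", blocks, index))  # True False
```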
The limit given in the article is 360KB (on floppy). At that size, you can't use Tries, you need lossy compression. A Bloom filter can get you 1 in 359 false positives with the size of word list given https://hur.st/bloomfilter/?n=234936&p=&m=360KB&k=
The error rate goes up to 1 in 66 for 256KB (in memory only).
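A minimal Bloom filter sketch in Python (the bit count matches the 360KB figure; k=7 and the salted-SHA-256 hashing are illustrative choices, not what the calculator assumes):

```python
# Bloom filter: k hash functions set/check bits in an m-bit array.
# Membership answers are "definitely not" or "probably yes".
import hashlib

class Bloom:
    def __init__(self, m_bits, k):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, word):
        # Derive k positions by hashing the word with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, word):
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, word):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(word))

bf = Bloom(8 * 360 * 1024, k=7)  # ~360KB of bits
for w in ["apple", "banana", "cherry"]:
    bf.add(w)
print(bf.might_contain("banana"), bf.might_contain("kiwi"))  # True False
```

The payoff is that you never store the words themselves, only bit positions, which is why the whole dictionary fits on a floppy at the cost of occasional false positives.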
According to https://en.wikipedia.org/wiki/Ispell, ispell (1971) already used Levenshtein distance (although the article does not say whether this existed in the original version or was added in later years).
Levenshtein distance up to 1, according to that article. If you have a hierarchical structure (trie or a DAG; in some sense, a DAG is a trie, but stored more efficiently, with the disadvantage that adding or removing words is hard) with valid words, it is not hard to check what words satisfy that. If you only do the inexact search after looking for the exact word and finding it missing I think it also won’t be too slow when given ‘normal’ text to spell-check.
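Checking distance ≤ 1 doesn't even need the full DP table; a sketch of the one-pass test:

```python
# Are two words within Levenshtein distance 1? With a length difference
# over 1 the answer is no; otherwise a single pass finds the one allowed
# insertion, deletion, or substitution.
def within_one_edit(a, b):
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) word
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1; j += 1
        else:
            edits += 1
            if edits > 1:
                return False
            if len(a) == len(b):
                i += 1  # substitution consumes a char from both words
            j += 1      # otherwise: insertion into the shorter word
    return True  # any trailing char of b is the single allowed insertion

print(within_one_edit("spell", "spel"), within_one_edit("spell", "shell"),
      within_one_edit("spell", "sell"), within_one_edit("ab", "ba"))
```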
The first article I read about the techniques used in the spell program was the 1985 May issue of Communications of the ACM (CACM for those who know), https://dl.acm.org/toc/cacm/1985/28/5, in Jon Bentley's Programming Pearls column.
Not as much detail as the blog.codingconfessions.com article mentioned above, maybe some of the other/later techniques were added later on?
This article ends too soon! Show me the techniques and solutions those clever programmers of old came up with. Did I miss a link somewhere to subsequent posts?
Early spellcheckers often used Bloom filters to efficiently store dictionaries in minimal memory - a probabilistic data structure that could determine if a word was "definitely not" or "possibly" in the dictionary using just a few bits per word.
I used WordStar in 1993. But I don't remember anyone really using the spell checker or feeling a need for it in those days. Work was slow and we had enough eyes and time to catch the spelling mistakes. When we were upgraded to MS Word on Windows 3.1, we were astonished to see that it had a feature to preview the document before printing. But somehow WordStar still looked more fluid and faster than GUI-based word processors.
One wild thing about the AI era is that tasks which once required specialized NLP expertise—rhyming/meter detection, grammar correction, sentiment analysis—can now be done by weak LLMs. Same APIs, different prompts. I’m surprised more people aren’t exploiting this.
Sentiment analysis by small models is quite bad. I haven't tried grammar correction, but I imagine it will perform better in English than in e.g. German.
IIRC none of the popular text-based games of the 80s and early 90s incorporated even basic support for spelling mistakes; best-case scenario they had one or two synonyms baked in, and even that was uncommon!
My Commodore 64 had a spellchecker. It was a separate program. I had to save my file, exit my word processor program, switch floppies to the spell checker program, wait for it to load, and all I got was a list of misspelled words… no suggested corrections.
Thinking back, how the heck did they do spell checking algorithms on a 6502? That’s a bit of code I’d like to see reverse engineered!
SpellMaster for the (6502-based) BBC Micro was seriously impressive given the space limitations.
It did both spell checking and correction (and had an anagram finder as a bonus), had integration with several different wordprocessors, check as you type functionality, AND its own integrated editor on top of that. The built in dictionary had a claimed 58k words (with a claimed checking speed of 10k words per minute). All of this was somehow squeezed into 128k (as a ROM on a carrier board with a hardware bank switching mechanism paging in 16K at once).
I used a document editor in DOS with spellcheck. It was quite fast too. I don't remember what it was called, but it featured formatting characters such as carriage returns and paragraph marks. It had good printer support. I think it was called Easy Print and it was only a couple of dollars.
Pff, now Microsoft Word probably sends your document to an AI every second and asks it to send back a PNG that highlights every misspelled word, and it overlays that underneath your text.
After having used the IntelliJ/PyCharm spellchecker for German quite extensively, I can only attest to this. It's so much more than just checking if a word is spelled correctly.
On the other hand, I did use Grammarly for a while and ultimately became annoyed at its tendency to keep "re-adjusting" sentences after it had already fixed them, so there is a fine line to walk.