It’s been a long time since I first had to use speech recognition software. I’d broken my arm at Ju-Jitsu and was on deadline—if I remember right, for the middle book of the After America series. Those of you who’ve been around that long might recall that I used the Mac version of Dragon Dictate. And while I made it work, I was always aware that the Windows version was far superior. That made sense—back then, Windows desktops vastly outnumbered Macs, and software companies allocated resources accordingly.
The fundamental divide between Mac and Windows still exists, but I’m afraid the real game now is AI.
I recently swapped out my old dictation setup—AirPods and an open document in Microsoft Word—for AI-powered speech recognition tools. I had no choice, really. Dragon Dictate for Mac died screaming years ago. How many years? I honestly don’t remember. Five? Ten? Whatever, it’s long gone and the bones were buried in Word.
AI for speech recognition (rather than just shit writing) snuck up on me. I suspect it’s been improving for a while, but it wasn’t until recently that I realized just how many alternatives are available now. And because most of them are server-based, they work equally well on Mac and Windows. That said, the one I’m using—MacWhisper Pro—runs locally. I’ve got it on my M2 MacBook Air because it’s slightly faster than my first-gen M1 iMac, and fuck me, the leap in speed and accuracy is staggering.
Back in the day, dictation software was brute force. There was nothing intelligent about it. It was a massive, clunky piece of bloatware trained over tens of thousands of hours to recognize common pronunciations of words. If what you said matched what it had in its memory, great. If not, too bad. It was basically a mechanical Turk for language.
AI doesn’t work that way. It’s not just matching sounds—it’s running probability calculations. It takes your speech, analyzes the waveforms (or whatever creepy shit is going on behind the scenes), and predicts the most likely sequence of words. The result? Much better accuracy.
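To make that concrete, here’s a toy sketch of the idea: the model doesn’t just ask “which word sounds closest?”, it combines an acoustic score (how well a sequence matches the audio) with a language score (how plausible the sequence is as English). Every number below is invented purely for illustration; real systems score thousands of candidates, not two.

```python
import math

# Invented acoustic log-probabilities: how well each candidate transcript
# matches the raw audio. The two phrases sound almost identical, so the
# "wrong" one can even score slightly higher on sound alone.
acoustic_logp = {
    "recognize speech": math.log(0.48),
    "wreck a nice beach": math.log(0.52),
}

# Invented language-model log-probabilities: how plausible each word
# sequence is. This is the part the old matching-based dictation
# software simply didn't have.
language_logp = {
    "recognize speech": math.log(0.30),
    "wreck a nice beach": math.log(0.01),
}

def best_transcript(candidates):
    # Adding log-probabilities multiplies the underlying probabilities;
    # the winner is the most likely sequence overall, not the closest sound.
    return max(candidates, key=lambda s: acoustic_logp[s] + language_logp[s])

print(best_transcript(acoustic_logp))  # → recognize speech
```

Here the language score tips the balance toward the sensible phrase even though the nonsense one “sounded” fractionally closer—which is roughly why modern dictation forgives mumbling in a way the old brute-force matchers never could.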
It took me a few days to figure out the best way to use it, but once I did, my daily word count exploded. I don’t use AI to write—as in, “Hey, robot, write this story for me,” because I’m not a loser. But dictation? That’s different. Now, I just sit down and describe the stories in my head for 10 minutes, let the software transcribe them, and—while it’s not perfect—it’s at least as accurate as the old Dragon Dictate. And a helluva lot more forgiving.
Then, I take that rough transcript, feed it into another AI (Claude or ChatGPT), and tell it: This came from speech recognition, so it’s full of dictation artifacts. Clean it up. A minute later, I’ve got a readable version of my text that’s 99% accurate.
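That cleanup step is really just a prompt wrapped around the raw transcript. Here’s a minimal sketch of what the request might look like—the prompt wording, model placeholder, and payload shape are my guesses at a generic chat-style API call, not the author’s exact setup, and actually sending it would need a real client and API key.

```python
# Hypothetical cleanup request for a dictation transcript. Nothing here
# is sent anywhere; it only builds the payload to show the shape of the
# workflow described above.

CLEANUP_INSTRUCTION = (
    "This came from speech recognition, so it's full of dictation "
    "artifacts. Clean it up: fix punctuation, paragraph breaks, and "
    "obvious mis-hearings, but don't rewrite or embellish the prose."
)

def build_cleanup_request(transcript: str, model: str = "MODEL_NAME") -> dict:
    # "MODEL_NAME" is a placeholder for whichever LLM you use
    # (Claude, ChatGPT, etc.). Payload loosely follows the common
    # chat-completion style: one user message, instruction first.
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"{CLEANUP_INSTRUCTION}\n\n{transcript}"},
        ],
    }

req = build_cleanup_request("so anyway the the guy walks in and um he says")
print(req["messages"][0]["content"].splitlines()[0][:20])
```

The useful detail is the instruction itself: telling the model the text came from speech recognition, and to clean rather than rewrite, is what keeps it from “improving” your prose instead of just de-garbling it.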
The best part? I don’t have to dictate punctuation, paragraph breaks, commas, full stops, quotation marks—none of it. That was always one of the worst things about the old mechanical Turk. I also don’t have to navigate through the text using voice commands, which was a nightmare. I just talk, as if I’m telling the story to someone at a bar or a café, and the AI does the typing.
The end result? I’ve gone from averaging 2,000 words a day to 5,000. It’s partly why I’ve been absent here. I’ve been crushing a deadline for the third Sleeper Agent using this system.
I don’t know if I’ll be able to sustain the pace, but there’s no reason I shouldn’t—it’s not a grind. I’m looking forward to trying it out with WW 3.2.
I support a lot of lawyers in my job, and a lot of them use dictation software. I have been surprised at how good the software is getting, mostly because the vendors pour an enormous amount of resources into getting it right.
The days of transcriptionists are almost over. You can feed a dictation file into an LLM, as you say, and get a very accurate transcription out.
They're even training the LLMs on doing transcription of voicemail messages, because apparently, listening to a voicemail is too hard. My mobile also transcribes voicemails. It's not as good as some enterprise phone systems obviously, but it does it.
The obvious evolution is the iButler - a personalised link to LLMs and so on, being helpful and feeding you directed ads (reduce ads for $49.99 per week), doing things like booking appointments, doing your shopping, transcribing voicemails and so on. Different accents, languages and cosmetic options available for both one-off and subscription pricing.
I think we should make a distinction between what now gets grouped under the umbrella of AI. It isn't helped that every techbro and his politician claims everything is AI whilst conflating the LLM chatbots (Copilot, ChatGPT and so on, or as I refer to them, spicy Clippy) with the network learning algorithms that refine genuinely useful tools: software that scans images to identify pre-tumorous cells better than a radiologist can, that improves classification of microbiota in sewage treatment systems, or, as in the case above, that improves speech recognition.