Handy - a simple, open-source offline speech-to-text app written in Rust using whisper.cpp
I built a simple, offline speech-to-text app after breaking my finger - now open sourcing it
TL;DR: Made a cross-platform speech-to-text app using whisper.cpp that runs completely offline. Press shortcut, speak, get text pasted anywhere. It's rough around the edges but works well and is designed to be easily modified/extended - including adding LLM calls after transcription.
Background
I broke my finger a while back and suddenly couldn't type properly. I tried existing speech-to-text solutions, but they were either subscription-based, cloud-dependent, or impossible to modify to work exactly how I needed for coding and daily computer use.
So I built Handy - intentionally simple speech-to-text that runs entirely on your machine using whisper.cpp (Whisper Small model). No accounts, no subscriptions, no data leaving your computer.
What it does
Press keyboard shortcut → speak → press again (or use push-to-talk)
Transcribes with whisper.cpp and pastes directly into whatever app you're using
Works on Windows, macOS, and Linux
GPU accelerated where available
Completely offline
That's literally it. No fancy UI, no feature creep, just reliable local speech-to-text.
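To make the shortcut flow above concrete, here is a minimal Rust sketch of how the two activation styles (press to start and press again to stop, or push-to-talk) could be modeled as a small state machine. This is not Handy's actual code: `Mode`, `KeyEvent`, `Action`, and `Recorder` are hypothetical names, and audio capture, whisper.cpp, and pasting are left out entirely.

```rust
/// Hypothetical activation modes; names are illustrative, not taken from Handy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Toggle,
    PushToTalk,
}

/// A shortcut key event as a global hotkey listener might report it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyEvent {
    Pressed,
    Released,
}

/// What the app should do in response to a key event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    StartRecording,
    StopAndTranscribe,
    Nothing,
}

/// Tracks whether a recording is in progress for the chosen mode.
#[derive(Debug)]
struct Recorder {
    mode: Mode,
    recording: bool,
}

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, recording: false }
    }

    fn is_recording(&self) -> bool {
        self.recording
    }

    /// Updates the state for one key event and returns the resulting action.
    fn on_key(&mut self, event: KeyEvent) -> Action {
        match (self.mode, event, self.recording) {
            // Toggle: the first press starts, the second press stops.
            (Mode::Toggle, KeyEvent::Pressed, false) => {
                self.recording = true;
                Action::StartRecording
            }
            (Mode::Toggle, KeyEvent::Pressed, true) => {
                self.recording = false;
                Action::StopAndTranscribe
            }
            // Toggle ignores key releases.
            (Mode::Toggle, KeyEvent::Released, _) => Action::Nothing,
            // Push-to-talk: record only while the key is held down.
            (Mode::PushToTalk, KeyEvent::Pressed, false) => {
                self.recording = true;
                Action::StartRecording
            }
            (Mode::PushToTalk, KeyEvent::Released, true) => {
                self.recording = false;
                Action::StopAndTranscribe
            }
            // Key repeat while held, or a stray release: nothing to do.
            (Mode::PushToTalk, _, _) => Action::Nothing,
        }
    }
}
```

In a real app, `StopAndTranscribe` is where the recorded audio would be handed to the transcriber and the resulting text pasted into the focused window.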
Why I'm sharing this
This was my first Rust project and there are definitely rough edges, but the core functionality works well. More importantly, I designed it to be easily forkable and extensible because that's what I was looking for when I started this journey.
The codebase is intentionally simple - you can understand the whole thing in an afternoon. If you want to add LLM integration (calling an LLM after transcription to rewrite/enhance the text), custom post-processing, or whatever else, the foundation is there and it's straightforward to extend.
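As a rough idea of what such an extension point could look like, here is a minimal Rust sketch of a post-processing pipeline that runs after transcription. Everything in it is an assumption for illustration and not Handy's actual API: the `PostProcessor` trait, the `Pipeline` type, and the example steps are hypothetical. An LLM rewrite step would be one more implementation of the same trait; here a plain closure stands in for it.

```rust
/// Hypothetical extension point: one step that rewrites transcribed text.
trait PostProcessor {
    fn process(&self, text: &str) -> String;
}

/// Collapses runs of whitespace into single spaces and trims both ends.
struct NormalizeWhitespace;

impl PostProcessor for NormalizeWhitespace {
    fn process(&self, text: &str) -> String {
        text.split_whitespace().collect::<Vec<_>>().join(" ")
    }
}

/// Uppercases the first character of the text.
struct CapitalizeFirst;

impl PostProcessor for CapitalizeFirst {
    fn process(&self, text: &str) -> String {
        let mut chars = text.chars();
        match chars.next() {
            Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
            None => String::new(),
        }
    }
}

/// Wraps any closure as a step. An LLM call could be plugged in this way.
struct FnStep<F: Fn(&str) -> String>(F);

impl<F: Fn(&str) -> String> PostProcessor for FnStep<F> {
    fn process(&self, text: &str) -> String {
        (self.0)(text)
    }
}

/// Runs the steps in order, feeding each step the previous step's output.
struct Pipeline {
    steps: Vec<Box<dyn PostProcessor>>,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { steps: Vec::new() }
    }

    fn with_step(mut self, step: impl PostProcessor + 'static) -> Self {
        self.steps.push(Box::new(step));
        self
    }

    fn run(&self, transcript: &str) -> String {
        self.steps
            .iter()
            .fold(transcript.to_string(), |text, step| step.process(&text))
    }
}
```

A pipeline built with `Pipeline::new().with_step(NormalizeWhitespace).with_step(CapitalizeFirst)` would be called on the raw transcript right before pasting. Keeping each step behind one small trait means a slow or failing LLM step can be dropped without touching the rest.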
I'm hoping it might be useful for:
People who want reliable offline speech-to-text without subscriptions
Developers who want to experiment with voice computing interfaces
Anyone who prefers tools they can actually modify instead of being stuck with someone else's feature decisions
Project Reality
There are known bugs and architectural decisions that could be better. I'm documenting issues openly because I'd rather have people know what they're getting into. This isn't trying to compete with polished commercial solutions - it's trying to be the most hackable and modifiable foundation for people who want to build their own thing.
If you're looking for something perfect out of the box, this probably isn't it. If you're looking for something you can understand, modify, and make your own, it might be exactly what you need.
Would love feedback from anyone who tries it out, especially if you run into issues or see ways to make the codebase cleaner and more accessible for others to build on.
Hello all! I've been using a great speech-to-text feature on the OpenAI website. I go to this link, click on a green microphone icon, and then upload audio files from my computer. It works really well for converting speech to text. But recently, I saw a message saying that the current method I use is legacy and suggesting I use a new method at this other link. The problem is that uploading audio isn't available in the chat Playground. Thus, I'm worried that soon I won't be able to upload audio the way I do now.
Does anyone know another method to convert speech to text that doesn't use complete mode on Playground? Thanks for any help!