I know OpenAI recently released Whisper V3 Turbo, but I remember hearing about some other models that are supposed to be a lot better, and I can't remember their names.
I wanted a straightforward way to interact with local LLMs using voice, similar to some research projects (think Sesame, which was a huge disappointment, and Orpheus), but packaged into something easier to run. Existing options often involved cloud APIs or complex setups.
I built Persona Engine, an open-source tool that bundles the components for a local speech-to-speech loop:
- Whisper.NET for speech recognition.
- A connection to any OpenAI-compatible LLM API (so your local models work fine, or cloud if you prefer).
- A TTS pipeline (with optional real-time voice cloning) for the audio output.
- Live2D avatar rendering and Spout output for streaming/visualization.
The goal was to create a self-contained system where the ASR, TTS, and optional RVC could all run locally (using an NVIDIA GPU for performance).
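The ASR, LLM, and TTS stages described above can be sketched as one loop iteration, roughly like this (a Python sketch rather than the project's actual .NET code; the endpoint URL, model name, and stub functions are all hypothetical stand-ins for real Whisper/TTS components):

```python
# Sketch of one turn of a local speech-to-speech loop:
# transcribe audio -> send text to an OpenAI-compatible chat endpoint -> synthesize reply.

API_URL = "http://localhost:8080/v1/chat/completions"  # hypothetical local server

def build_chat_request(history, user_text, model="local-model"):
    """Append the transcribed user turn and build an OpenAI-style request payload."""
    messages = history + [{"role": "user", "content": user_text}]
    return {"model": model, "messages": messages, "stream": True}

def speech_to_speech_turn(audio, history, asr, llm, tts):
    """One loop iteration: speech recognition -> LLM reply -> speech synthesis."""
    text = asr(audio)                       # e.g. a Whisper transcription call
    payload = build_chat_request(history, text)
    reply = llm(payload)                    # e.g. POST payload to API_URL
    new_history = history + [
        {"role": "user", "content": text},
        {"role": "assistant", "content": reply},
    ]
    return tts(reply), new_history          # audio out + updated conversation

# Wiring it up with trivial stubs in place of real ASR/LLM/TTS:
audio_out, history = speech_to_speech_turn(
    b"<pcm audio>", [],
    asr=lambda a: "hello",
    llm=lambda p: "hi there",
    tts=lambda t: b"<synthesized audio>",
)
print(history[-1]["content"])  # hi there
```

Keeping the three stages behind plain function boundaries like this is what lets each one run locally or be swapped out (local Whisper vs. a cloud ASR, local LLM server vs. a hosted API) without touching the loop itself.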
Making this kind of real-time, local voice interaction more accessible feels like a useful step as AI becomes more integrated. It allows for private, conversational interaction without constant cloud reliance.
If you're interested in this kind of local AI interface:
Code/Details: https://github.com/fagenorn/handcrafted-persona-engine
Demo: https://www.youtube.com/watch?v=4V2DgI7OtHE (forgive the cheesiness, I was having a bit of fun with CapCut)
Curious about your thoughts!
Are there any recent open-source large language models that can be deployed locally on my computer? I want speech-to-speech capabilities, like voice chat, ideally with real-time interruption. Are there any such open-source models available?
Any GitHub links or advice would be really appreciated.