whisper-v3-turbo because of its wide compatibility with open source ecosystem (not necessarily because of its WER) The architecture is plug and play. You can typically add some LLMs along with whisper to correct stuff for you and customize as you need. I wrote a guide here just now: Creating Very High-Quality Transcripts with Open-Source Tools: An 100% automated workflow guide Answer from phoneixAdi on reddit.com
๐ŸŒ
Northflank
northflank.com โ€บ blog โ€บ best-open-source-text-to-speech-models-and-how-to-run-them
Best open source text-to-speech models and how to run them | Blog โ€” Northflank
Explore the best open source text-to-speech models like XTTS-v2, Mozilla TTS, and Bark. Learn how to choose, deploy, and scale them for production with GPU support using Northflank.
๐ŸŒ
BentoML
bentoml.com โ€บ blog โ€บ exploring-the-world-of-open-source-text-to-speech-models
The Best Open-Source Text-to-Speech Models in 2026
Here is a code example of serving XTTS-v2 with BentoML: Deploy XTTS-v2Deploy XTTS-v2 Deploy XTTS-v2 with a streaming endpointDeploy XTTS-v2 with a streaming endpoint ยท Developed by Neuphonic, NeuTTS Air is the worldโ€™s first on-device, ...
๐ŸŒ
Gladia
gladia.io โ€บ blog โ€บ best-open-source-speech-to-text-models
Gladia - Top 5 Open-Source Speech-to-Text Models for Enterprises
In this article, we will cover the most advanced open-source ASR models available, including Whisper ASR, DeepSpeech, Kaldi, Wav2vec, or SpeechBrain, highlighting their key strength and technical requirements, Modern ASR can very reliably transcribe ...
๐ŸŒ
AssemblyAI
assemblyai.com โ€บ blog โ€บ the-top-free-speech-to-text-apis-and-open-source-engines
The top free Speech-to-Text APIs, AI Models, and Open Source Engines
This post compares the best free Speech-to-Text APIs and AI models on the market today, including APIs that have a free tier. Weโ€™ll also look at several free open-source Speech-to-Text engines and explore why you might choose an API vs. an open-source library, or vice versa.
๐ŸŒ
Eden AI
edenai.co โ€บ post โ€บ top-free-speech-to-text-tools-apis-and-open-source-models
Top Free Speech to text tools, APIs, and Open Source models | Eden AI
DeepSpeech is an open-source, embedded speech-to-text engine that operates in real-time on a variety of devices, ranging from high-powered GPUs to a Raspberry Pi 4.
๐ŸŒ
Northflank
northflank.com โ€บ blog โ€บ best-open-source-speech-to-text-stt-model-in-2025-benchmarks
Best open source speech-to-text (STT) model in 2025 (with benchmarks) | Blog โ€” Northflank
The hybrid design pairs a FastConformer encoder optimized for speech recognition with an unmodified Qwen3-1.7B LLM decoder. This enables dual operation: pure transcription mode and intelligent analysis mode supporting summarization and question answering. ... Word Error Rate: 5.63% (Open ASR Leaderboard average), 1.6% (LibriSpeech Clean), 3.1% (LibriSpeech Other)
๐ŸŒ
Modal
modal.com โ€บ blog โ€บ open-source-stt
The Top Open Source Speech-to-Text (STT) Models in 2025
What sets Canary apart is its new hybrid architecture that combines automatic speech recognition (ASR) with large language model (LLM) capabilities. This makes Canary Qwen 2.5B the first open-source Speech-Augmented Language Model (SALM).
Find elsewhere
๐ŸŒ
GitHub
github.com โ€บ vndee โ€บ local-talking-llm
GitHub - vndee/local-talking-llm: A talking LLM that runs on your own computer without needing the internet.
Speech Recognition: Utilizing OpenAI's Whisper, we convert spoken language into text.
Starred by 735 users
Forked by 147 users
Languages ย  Python 95.8% | Makefile 4.2%
๐ŸŒ
Modal
modal.com โ€บ blog โ€บ open-source-tts
The Top Open-Source Text to Speech (TTS) Models
Itโ€™s an open sourced model that was built on top of Llama 3.2 3B, pre-trained on over 10 million hours of audio data. This model provides industry-leading expressive audio generation and multilingual voice cloning.
๐ŸŒ
GitHub
github.com โ€บ mozilla โ€บ DeepSpeech
GitHub - mozilla/DeepSpeech: DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper.
Starred by 26.7K users
Forked by 4.1K users
Languages ย  C++ 47.0% | Python 21.4% | C 11.2% | Shell 10.8% | C# 2.8% | Swift 1.8%
๐ŸŒ
Reddit
reddit.com โ€บ r/singularity โ€บ building a local speech-to-speech interface for llms (open source)
r/singularity on Reddit: Building a Local Speech-to-Speech Interface for LLMs (Open Source)
March 29, 2025 -

I wanted a straightforward way to interact with local LLMs using voice, similar to some research projects (think sesame which was a huge disapointment and orpheus) but packaged into something easier to run. Existing options often involved cloud APIs or complex setups.

I built Persona Engine, an open-source tool that bundles the components for a local speech-to-speech loop:

  • It uses Whisper .NET for speech recognition.

  • Connects to any OpenAI-compatible LLM API (so your local models work fine or cloud if you prefer).

  • Uses a TTS pipeline (with optional real-time voice cloning) for the audio output.

  • It also includes Live2D avatar rendering and Spout output for streaming/visualization.

The goal was to create a self-contained system where the ASR, TTS, and optional RVC could all run locally (using an NVIDIA GPU for performance).

Making this kind of real-time, local voice interaction more accessible feels like a useful step as AI becomes more integrated. It allows for private, conversational interaction without constant cloud reliance.

If you're interested in this kind of local AI interface:

  • Code/Details: https://github.com/fagenorn/handcrafted-persona-engine

  • Demo: https://www.youtube.com/watch?v=4V2DgI7OtHE (forgive the cheesiness, I was having a bit of fun with capcut)

Curious about your thoughts ๐Ÿ˜Š

๐ŸŒ
GitHub
github.com โ€บ KoljaB โ€บ RealtimeTTS
GitHub - KoljaB/RealtimeTTS: Converts text to speech in realtime
It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available. Short_RealtimeTTS_Demo.mov ยท Low Latency ยท almost instantaneous text-to-speech conversion ยท compatible with LLM outputs ยท
Starred by 3.7K users
Forked by 354 users
Languages ย  Python 96.2% | Shell 1.8% | Batchfile 1.5%
๐ŸŒ
Mistral AI
mistral.ai โ€บ news โ€บ voxtral
Voxtral | Mistral AI
July 15, 2025 - Voxtral comprehensively outperforms Whisper large-v3, the current leading open-source Speech Transcription model.
๐ŸŒ
KDnuggets
kdnuggets.com โ€บ top-5-text-to-speech-open-source-models
Top 5 Text-to-Speech Open Source Models - KDnuggets
Orpheus TTS is a cutting-edge, Llama-based speech LLM designed for high-quality and empathetic text-to-speech applications. It is fine-tuned to deliver human-like speech with exceptional clarity and expressiveness, making it suitable for real-time ...
๐ŸŒ
GitHub
github.com โ€บ openai โ€บ whisper
GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision - openai/whisper
Starred by 92.1K users
Forked by 11.5K users
Languages ย  Python
๐ŸŒ
GitHub
microsoft.github.io โ€บ VibeVoice
VibeVoice: A Frontier Open-Source Text-to-Speech Model
Model (LLM) to understand textual ... distinct speakers, surpassing the typical 1-2 speaker limits of many prior models. 2025-09-05: VibeVoice is an open-source research framework ......
๐ŸŒ
Siliconflow
siliconflow.com โ€บ articles โ€บ en โ€บ best-open-source-speech-to-text-models
The Best Open Source Speech-to-Text Models in 2025
Ultimate guide to 2025's best open source speech-to-text models: 1. Fish Speech V1.5; 2. CosyVoice2-0.5B; 3. IndexTTS-2. Compare TTS performance, multilingual support, latency, and duration control for speech synthesis applications.