whisper-v3-turbo, because of its wide compatibility with the open-source ecosystem (not necessarily because of its WER). The architecture is plug and play: you can typically add an LLM alongside Whisper to correct errors and customize the output as you need. I wrote a guide here just now: Creating Very High-Quality Transcripts with Open-Source Tools: An 100% automated workflow guide. Answer from phoneixAdi on reddit.com
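The plug-and-play setup described in this answer (Whisper for transcription, then an LLM or another tool to correct the output) can be sketched as a chain of interchangeable stages. This is a minimal sketch, not the guide's actual workflow: `TranscriptPipeline`, `whisper_transcriber`, and `glossary_corrector` are hypothetical names, and the glossary corrector is only a simple stand-in for an LLM correction step. The optional Whisper stage assumes the `openai-whisper` package is installed; its `load_model("turbo")` call loads the large-v3-turbo model, and `transcribe()` returns a dict with a `"text"` field.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Each post-processing stage maps transcript text to corrected text.
TextStage = Callable[[str], str]


@dataclass
class TranscriptPipeline:
    """Hypothetical pipeline: one transcriber followed by swappable text stages."""
    transcribe: Callable[[str], str]              # audio path -> raw transcript
    stages: List[TextStage] = field(default_factory=list)

    def run(self, audio_path: str) -> str:
        text = self.transcribe(audio_path)
        for stage in self.stages:                 # apply each correction in order
            text = stage(text)
        return text


def whisper_transcriber(model_name: str = "turbo") -> Callable[[str], str]:
    """Wrap openai-whisper (assumed installed); imported lazily so the rest runs without it."""
    import whisper
    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"]


def glossary_corrector(glossary: Dict[str, str]) -> TextStage:
    """Stand-in for an LLM corrector: replaces known misrecognitions as whole words."""
    def correct(text: str) -> str:
        for wrong, right in glossary.items():
            text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
        return text
    return correct


# Usage with a fake transcriber, so no model download is needed.
fake_asr = lambda path: "open ai whisper is grate"
pipeline = TranscriptPipeline(fake_asr, [glossary_corrector({"open ai": "OpenAI", "grate": "great"})])
print(pipeline.run("meeting.wav"))  # OpenAI whisper is great
```

Swapping the fake transcriber for `whisper_transcriber()` or the glossary stage for a call to a local LLM changes one constructor argument and leaves the rest of the pipeline untouched, which is the point of the plug-and-play design.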
Gladia
gladia.io › blog › best-open-source-speech-to-text-models
Gladia - Top 5 Open-Source Speech-to-Text Models for Enterprises
In this article, we will cover the most advanced open-source ASR models available, including Whisper ASR, DeepSpeech, Kaldi, Wav2vec, and SpeechBrain, highlighting their key strengths and technical requirements. Modern ASR can very reliably transcribe ...
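How reliably an ASR model transcribes is usually quantified with word error rate (WER), the metric the Reddit answer on this page refers to: the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to turn the model's output into a reference transcript, divided by the number of reference words N, i.e. WER = (S + D + I) / N. A minimal Python sketch using Levenshtein dynamic programming over words; it assumes whitespace tokenization and applies none of the text normalization that published benchmarks usually perform first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous DP row: edit distances from ref[:i-1] to each hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          sub)                # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions count against the score, WER can exceed 1.0 when the output is much longer than the reference.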
Reddit
reddit.com › r/localllama › what's the best open source speech to text model
r/LocalLLaMA on Reddit: What's the best open source speech to text model
August 2, 2024 -
I know OpenAI recently released Whisper V3 Turbo, but I remember hearing about some other models that are a lot better, and I can't remember which ones.
Top answer (1 of 9, score 21): phoneixAdi's answer, quoted in full at the top of this page.
Answer 2 of 9 (score 7): You might be thinking of the models at https://huggingface.co/Revai; here is the post you might be remembering: https://x.com/reach_vb/status/1841885263766945930
Videos
Local and Open Source Speech to Speech Assistant - YouTube (13:41)
Real-Time Speech-to-Text & Speaker Identification using Whisper, ... (17:27)
My Top 5 Open-Source AI Text-to-Speech Models - YouTube (28:01)
Possibly THE BEST Open Source Text-to-Speech Model - VibeVoice ... (15:48)
NEW Fast Open Source AI TTS Installation - DMOSpeech 2 - YouTube (16:53)
The Most Accurate Speech-to-text APIs in 2025 - YouTube (23:58)
Eden AI
edenai.co › post › top-free-speech-to-text-tools-apis-and-open-source-models
Top Free Speech to text tools, APIs, and Open Source models | Eden AI
Coqui is a remarkable toolkit for deep-learning-based speech-to-text transcription. It is designed to be used in projects in more than twenty languages and offers an array of inference and productionization features. The platform also provides custom-trained models and has bindings for numerous programming languages, which makes deployment easier. ... Whisper, released by OpenAI in September 2022, can be considered one of the leading open-source options.
GitHub
github.com › mozilla › DeepSpeech
GitHub - mozilla/DeepSpeech: DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper.
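Baidu's Deep Speech model is trained with connectionist temporal classification (CTC), so the network emits one label distribution per audio frame, and turning those frame outputs into text means merging repeated labels and removing a special blank symbol. The sketch below shows greedy (best-path) CTC decoding in plain Python; the vocabulary and probabilities are made-up toy values, `ctc_greedy_decode` is a hypothetical helper, and Mozilla DeepSpeech itself uses a beam-search CTC decoder with an optional external language-model scorer rather than this greedy shortcut.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      labels: List[str], blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the most likely label index for every audio frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge consecutive duplicates (e.g. c c _ a -> c _ a).
    merged = [label for label, _ in groupby(best_path)]
    # 3. Remove the blank symbol; a blank between two equal labels keeps both.
    return "".join(labels[i] for i in merged if i != blank)


labels = ["_", "c", "a", "t"]  # toy vocabulary; index 0 is the CTC blank
frames = [
    [0.1, 0.8, 0.05, 0.05],   # c
    [0.1, 0.7, 0.1, 0.1],     # c (repeat, merged)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # a
    [0.6, 0.2, 0.1, 0.1],     # blank
    [0.1, 0.1, 0.1, 0.7],     # t
]
print(ctc_greedy_decode(frames, labels))  # cat
```

The blank symbol is what lets CTC spell double letters: the best path `l _ l o` decodes to "llo", while `l l o` collapses to "lo".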
Starred by 26.7K users
Forked by 4.1K users
Languages C++ 47.0% | Python 21.4% | C 11.2% | Shell 10.8% | C# 2.8% | Swift 1.8%
GitHub
github.com › resemble-ai › chatterbox
GitHub - resemble-ai/chatterbox: SoTA open-source TTS
Starred by 15.5K users
Forked by 2.2K users
Languages Python
Siliconflow
siliconflow.com › articles › en › best-open-source-speech-to-text-models
The Best Open Source Speech-to-Text Models in 2025
Ultimate guide to 2025's best open source speech-to-text models: 1. Fish Speech V1.5; 2. CosyVoice2-0.5B; 3. IndexTTS-2. Compare TTS performance, multilingual support, latency, and duration control for speech synthesis applications.
GitHub
github.com › openai › whisper
GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.
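As described in the Whisper paper, one decoder can handle all of these tasks because the task is specified by special tokens at the start of the decoder input: a start-of-transcript token, a language token, a task token (transcribe or translate), and an optional no-timestamps token. For language identification the model is given only the start token and predicts the language token itself, and voice activity detection is folded in by training the model to predict a no-speech token. The sketch below only assembles that prompt as token strings to illustrate the format; `build_decoder_prompt` is a hypothetical helper and does not call the real `whisper` tokenizer.

```python
from typing import List, Optional

SOT = "<|startoftranscript|>"
NO_TIMESTAMPS = "<|notimestamps|>"
TASKS = {"transcribe", "translate"}


def build_decoder_prompt(language: Optional[str], task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Sketch of Whisper's multitask decoder prompt as special-token strings."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    if language is None:
        # Language identification: the model predicts the language token next.
        return [SOT]
    prompt = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        prompt.append(NO_TIMESTAMPS)  # otherwise timestamp tokens are interleaved with text
    return prompt


print(build_decoder_prompt("de", task="translate"))
# ['<|startoftranscript|>', '<|de|>', '<|translate|>', '<|notimestamps|>']
```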
Starred by 92.1K users
Forked by 11.5K users
Languages Python
NVIDIA
blog.spheron.network › a-comprehensive-look-at-open-source-speech-to-text-projects-2024
A Comprehensive Look at Open-Source Speech-to-Text Projects (2024)
April 20, 2024 - Similar to DeepSpeech, it has good initial accuracy and can facilitate model training. Kaldi has a long testing history and is currently used by numerous companies in their production environments, which boosts developer confidence in its effectiveness. Facebook AI Research has developed Wav2Letter, an ASR toolkit that utilizes C++ and the ArrayFire tensor library. It is a moderately accurate open-source library that is user-friendly for smaller projects. SpeechBrain is a transcription toolkit that is built on PyTorch.
The Open Source Post
fosspost.org › home › open source for developers › top 15 open source speech recognition/tts/stt/ systems
Top 15 Open Source Speech Recognition/TTS/STT/ Systems
August 1, 2024 - Amphion is an open-source toolkit designed for audio, music, and speech generation. Licensed under the MIT license, it is primarily developed in Python with supporting components written in Jupyter Notebook and Shell scripting. The software leverages various other model structures from other libraries such as FastSpeech2, VITS, VALL-E, NaturalSpeech2 for text-to-speech (TTS) tasks.