🌐
Gladia
gladia.io › blog › best-open-source-speech-to-text-models
Gladia - Top 5 Open-Source Speech-to-Text Models for Enterprises
It can also perform multiple tasks ... text in French and translate it in real time into English. Moreover, Whisper can achieve high accuracy and performance on different speech domains and languages, even without additional fine-tuning. On the downside, Whisper’s “vanilla” version as provided by OpenAI was intended ...
🌐
Voxforge
voxforge.org › fr
French - voxforge.org
VoxForge is a project that aims to collect spoken recordings of texts. The models generated from these recordings can be used by open-source speech recognition engines. We make all of these recordings available under the GPL license.
🌐
Programmez!
programmez.com › actualites › piper-un-text-speech-local-et-open-source-37656
Piper: a local, open-source text-to-speech system
May 7, 2025 - Piper is an open-source system for running text-to-speech locally; in short, a TTS. A simple Raspberry Pi is enough to do TTS. It supports many localized voices (German, English, French, Italian, etc.). You just load the model you want and configure it.
People also ask

Why did you not mention the DeepSpeech project by Mozilla?
DeepSpeech by Mozilla was abandoned many years ago and it is no longer under active development.

We recommend using other open-source models on this page that are still maintained.
🌐
fosspost.org
fosspost.org › home › open source for developers › top 15 open source speech recognition/tts/stt/ systems
Top 15 Open Source Speech Recognition/TTS/STT/ Systems
Some other speech models are not mentioned in your article
Please review the listicle criteria mentioned earlier to understand why we made our choices. We may have missed a few, but all of the systems mentioned were indeed the top ones on the market at the time of writing.

You are always welcome to leave us a comment about an addition that you think should be made to this article.
Why did you remove OpenSeq2Seq from your list?
Just like DeepSpeech by Mozilla, OpenSeq2Seq from NVIDIA is no longer under active development and was abandoned many years ago.

Try using other models in our list.
🌐
GitHub
github.com › mozilla › DeepSpeech
GitHub - mozilla/DeepSpeech: DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Starred by 26.7K users
Forked by 4.1K users
Languages   C++ 47.0% | Python 21.4% | C 11.2% | Shell 10.8% | C# 2.8% | Swift 1.8%
🌐
GitHub
github.com › qanastek › EasyTTS
GitHub - qanastek/EasyTTS: Ready-to-use Multilingual Text-To-Speech (TTS) package.
EasyTTS is an open-source and ready-to-use Multilingual Text-To-Speech (TTS) package. The goal is to simplify the use of state-of-the-art text-to-speech models for a variety of languages (French, English, ...).
Starred by 24 users
Forked by 2 users
Languages   Python
🌐
The Open Source Post
fosspost.org › home › open source for developers › top 15 open source speech recognition/tts/stt/ systems
Top 15 Open Source Speech Recognition/TTS/STT/ Systems
August 1, 2024 - One of the newest open-source speech recognition systems, as its development only started in 2020. Unlike other systems on this list, Vosk is ready to use right after installation: it supports 20+ languages (English, German, French, Turkish…) with portable pre-trained models already available to users.
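Before a pre-trained Vosk model can be used, the audio has to be in a form the recognizer accepts. Vosk's published Python examples check that a WAV file is mono, 16-bit, uncompressed PCM and then feed it to the recognizer in fixed-size chunks; the sketch below reproduces only that preparation step using the standard library. The helper names `check_vosk_wav` and `iter_pcm_chunks` and the 4000-frame chunk size are illustrative assumptions, and the Vosk calls (`Model`, `KaldiRecognizer`, `AcceptWaveform`) appear only in comments, so check them against the Vosk documentation before relying on them.

```python
import io
import wave
from typing import BinaryIO, Iterator


def check_vosk_wav(wf: wave.Wave_read) -> None:
    """Raise ValueError unless the WAV is mono, 16-bit, uncompressed PCM."""
    if wf.getnchannels() != 1:
        raise ValueError("expected mono audio")
    if wf.getsampwidth() != 2:
        raise ValueError("expected 16-bit samples")
    if wf.getcomptype() != "NONE":
        raise ValueError("expected uncompressed PCM")


def iter_pcm_chunks(source: BinaryIO, frames_per_chunk: int = 4000) -> Iterator[bytes]:
    """Validate a WAV stream, then yield raw PCM chunks for a streaming recognizer.

    The format check runs when the generator is first advanced.
    """
    with wave.open(source, "rb") as wf:
        check_vosk_wav(wf)
        # With Vosk's Python bindings one would now create something like
        #   rec = KaldiRecognizer(Model("path/to/model"), wf.getframerate())
        # and call rec.AcceptWaveform(chunk) on each chunk yielded below.
        while True:
            chunk = wf.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    # One second of 16 kHz mono silence, built in memory as a stand-in recording.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    buf.seek(0)
    print(sum(1 for _ in iter_pcm_chunks(buf)))  # 4 chunks of 4000 frames
```

Rejecting stereo or compressed input up front gives a clear error instead of silently poor transcriptions; converting other formats (for example with an external resampling tool) is left outside this sketch.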
🌐
Notta
notta.ai › en › blog › speech-to-text-open-source
13 Best Free Speech-to-Text Open Source Engines, APIs, and AI Models
Julius is one of the oldest speech-to-text projects, dating back to 1997, with roots in Japan. It is available under the BSD-3-Clause license, making it accessible to developers. It has strong support for Japanese ASR, but as a language-independent engine it can process multiple languages, including English, Slovenian, French, Thai, and others.
🌐
DataCamp
datacamp.com › blog › best-open-source-text-to-speech-tts-engines
9 Best Open Source Text-to-Speech (TTS) Engines | DataCamp
December 2, 2024 - Cons: Requires some technical knowledge to implement. ... Coqui TTS is a modern open-source text-to-speech framework that provides an array of pre-trained models for various languages and accents.
🌐
Framacolibri
framacolibri.org › codeurs › développement
MaryTTS: the open-source TTS - Development - Framacolibri
January 31, 2018 - Hello everyone, this is my first post, but I would like to introduce MaryTTS, a great open-source online tool for TTS (Text To Speech). TTS is mainly useful for the so…
Find elsewhere
🌐
Picovoice
picovoice.ai › blog › on-device-voice-ai-for-french
On-device voice AI for French to build AI Agents
Evaluate Cheetah Streaming Speech-to-Text accuracy by comparing it against popular asynchronous French speech-to-text models using our open-source French speech-to-text benchmark.
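Accuracy comparisons between speech-to-text engines are most often reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the engine's output into the reference transcript, divided by the number of reference words. The snippet above does not say which metric or text normalization the Picovoice benchmark uses, so the following is only a minimal sketch of WER under an assumed normalization (lowercasing and dropping punctuation); `normalize` and `word_error_rate` are hypothetical names, not part of any benchmark's code.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, dropping punctuation (a simplifying assumption).

    \\w is Unicode-aware, so accented French letters stay inside words;
    straight and curly apostrophes are kept so "l'institut" is one token.
    """
    return re.findall(r"[\w'’]+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the word-level edit distance between the reference prefix
    # handled so far and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("les modèles sont utilisables", "les modèle sont utilisable")` counts two substitutions over four reference words, giving 0.5. Note that this word-level metric penalizes a wrong agreement ending exactly as much as an entirely wrong word, which matters for French.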
🌐
Capterra
capterra.ca › home › speech recognition software
Speech Recognition Software - Prices & Reviews - Capterra Canada 2025
Multi-language speech recognition software with the ability to dictate in any third-party software or to fill in forms on websites. Apart from dictation, Braina also provides voice command features that allow you to search the web, open files, programs, and websites, find information, set reminders, take notes, and much more. You can use your voice to dictate text to your Windows computer, automate processes, and improve your personal and business productivity.
🌐
Reddit
reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?
r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?
August 24, 2024 -

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs (or at least making it possible to do so in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth attention? I am posting this here in the hope of finding some info, while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open-source projects are based on OpenAI's Whisper and its re-implemented versions, such as:

  • Faster Whisper (MIT license)

  • Insanely fast Whisper (Apache-2.0 license)

  • Distil-Whisper (MIT license)

  • WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)

  • WhisperLive (MIT license, Added here 03/2025)

  • WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but the company has stated on its site that it is shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

  • Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

  • StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list, so I will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license)[Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as simple locally run screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT License) for generating consistent long-form dialogues. It comes in 1.5B and 7B sizes, and both models can be tried here. A 0.5B model is also on the way. It already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). It can be tried here and further tested here. It supports complex long-form dialogues, and extra prompting is supposed to allow setting the scene and adjusting expressions. It also has a quantized (4-bit fork) version.

8/2025 added: StepFun AI's Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model from a Chinese AI team. Added because it is related, even though it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

Top answer
I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions
  • 11labs - Commercial
  • xtts
  • xtts2
  • Alltalk
  • Styletts2
  • Fish-Speech
  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
  • PiperUI
  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
  • Bark
  • Tortoise TTS
  • LMNT
  • AlwaysReddy - (uses Piper)
  • Open-LLM-VTuber
  • MeloTTS
  • OpenVoice
  • Sherpa-onnx
  • Silero
  • Neuro-sama
  • Parler TTS
  • Chat TTS
  • VallE-X
  • Coqui TTS
  • Daswers XTTS GUI
  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech
Second answer
I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper, and some others. I’m on a Mac, so my options are pretty limited, and this worked fairly well. If XTTS is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately, most of my use cases are in SillyTavern, for narration and character TTS, so they may not match yours. The last link I shared might still give you ideas for how to implement this in a real application, though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or willing to follow code, the latter link is actually pretty useful for ideas, despite being targeted at SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and a lot will also depend on the hardware you’ll be deploying on.
🌐
Eden AI
edenai.co › post › top-free-speech-to-text-tools-apis-and-open-source-models
Top Free Speech to text tools, APIs, and Open Source models | Eden AI
Wav2Letter is an Automatic Speech Recognition (ASR) toolkit developed by Facebook AI Research. It is written in C++ and uses the ArrayFire tensor library. Wav2Letter is a moderately accurate open-source library that is easy to use for small projects.
🌐
ElevenLabs
elevenlabs.io › speech-to-text › french
Free French Speech to Text | Transcribe French Voice and Audio to Text
ElevenLabs Speech to Text
High-quality speech-to-text transcription in 99 languages
Rating: 4.5
Price: US$5.00
🌐
GoodFirms
goodfirms.co › home › speech recognition software › blog › the best 7 free and open source speech recognition software solutions
The Best 7 Free and Open Source Speech Recognition Software Solutions
July 1, 2025 - Ample tools for speech-recognition-related purposes such as keyword spotting, pronunciation evaluation, and alignment. Supports various languages such as Mandarin, Dutch, German, Russian, English, and French, and can build models for other languages. Mozilla has released an open-source voice recognition tool that it says is “close to human level performance.” It is free speech recognition software that developers can plug into their projects.
🌐
GitHub
github.com › cjpais › Handy
GitHub - cjpais/Handy: A free, open source, and extensible speech-to-text application that works completely offline.
Starred by 8.6K users
Forked by 583 users
Languages   TypeScript 53.6% | Rust 44.6% | Swift 0.9% | CSS 0.5% | JavaScript 0.2% | HTML 0.1% | C 0.1%
🌐
GitHub
github.com › emma11y › speech-to-text-conference
GitHub - emma11y/speech-to-text-conference
The INRIA institute is researching this topic through the Multi Speech project. French is a very rich and complex language. In our three different demos, we saw verb agreement errors. Now I am going to show you a new demo with the kinds of mixing we run into every day. Here is a new demo from Samuel. I prepared a text with lots of little traps for speech recognizers.
Author   emma11y
🌐
OpenAI
platform.openai.com › docs › guides › speech-to-text
Speech to text | OpenAI API
In this case, the input audio was German and the output text looks like: Hello, my name is Wolfgang and I come from Germany. Where are you heading today? We only support translation into English at this time. We currently support the following languages through both the transcriptions and translations endpoint: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
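The snippet distinguishes two endpoints: transcription, which keeps the spoken language, and translation, which always outputs English. As a rough illustration, the sketch below builds (but does not send) a `multipart/form-data` POST for either one using only the standard library. It assumes the endpoint paths `https://api.openai.com/v1/audio/transcriptions` and `.../translations`, the form fields `file`, `model`, and an optional ISO-639-1 `language` hint, and the model name `whisper-1`; `build_audio_request` is a hypothetical helper, and the OpenAI API reference is the authority on the current field names and models.

```python
import urllib.request
import uuid
from typing import Optional

# Assumed base URL for the audio endpoints described above.
API_BASE = "https://api.openai.com/v1/audio"


def build_audio_request(audio: bytes, filename: str, api_key: str,
                        task: str = "transcriptions", model: str = "whisper-1",
                        language: Optional[str] = None) -> urllib.request.Request:
    """Build a multipart/form-data request; task is "transcriptions" or "translations"."""
    if task not in ("transcriptions", "translations"):
        raise ValueError("task must be 'transcriptions' or 'translations'")
    fields = {"model": model}
    if language is not None:
        if task == "translations":
            # Translation output is always English; this sketch only sends a
            # source-language hint with transcription requests.
            raise ValueError("language is only sent for transcriptions in this sketch")
        fields["language"] = language  # ISO-639-1 code of the spoken audio, e.g. "fr"

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    return urllib.request.Request(
        f"{API_BASE}/{task}",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response is JSON by default, but check its exact shape in the API reference. In practice the official `openai` Python SDK wraps these same endpoints and is usually the simpler route; the stdlib version just makes the request structure visible.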
🌐
AssemblyAI
assemblyai.com › blog › the-top-free-speech-to-text-apis-and-open-source-engines
The top free Speech-to-Text APIs, AI Models, and Open Source Engines
This post examines the best free Speech-to-Text APIs and AI models on the market today, including ones that have a free tier, to help you make an informed decision. We'll also look at several free open-source Speech-to-Text engines and explore why you might choose an API or AI model vs.