best speech to text ai open source - Brave Search

Best local open source Text-To-Speech and Speech-To-Text?

reddit.com › r › LocalLLaMA › comments › 1f0awd6 › best_local_open_source_texttospeech_and

I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions
11labs - Commercial
xtts
xtts2
Alltalk
Styletts2
Fish-Speech
PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
PiperUI
Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
Bark
Tortoise TTS
LMNT
AlwaysReddy - (uses Piper)
Open-LLM-VTuber
MeloTTS
OpenVoice
Sherpa-onnx
Silero
Neuro-sama
Parler TTS
Chat TTS
VallE-X
Coqui TTS
Daswers XTTS GUI
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com

modal.com › blog › open-source-tts

The Top Open-Source Text to Speech (TTS) Models

It’s currently the top trending text-to-speech model on Hugging Face. It’s an open sourced model that was built on top of Llama 3.2 3B, pre-trained on over 10 million hours of audio data.

bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models

The Best Open-Source Text-to-Speech Models in 2026

Chatterbox is a high-performance, open-source TTS model developed by Resemble AI. Built with a 500M-parameter Llama backbone and trained on over 500K hours of cleaned audio, Chatterbox delivers state-of-the-art speech generation quality with ...

Videos

Possibly THE BEST Open Source Text-to-Speech Model - VibeVoice ...

September 2, 2025

My Top 5 Open-Source AI Text-to-Speech Models - YouTube

February 12, 2025

The Most Accurate Speech-to-text APIs in 2025 - YouTube

February 6, 2025

Local and Open Source Speech to Speech Assistant - YouTube

September 12, 2024

SpeechBrain - Speech to text model - YouTube

August 14, 2024

The MOST EXPRESSIVE Open Source Text-to-Speech of 2025 - YouTube

September 8, 2025

reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?

r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?

August 24, 2024

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs into it (or at least developing the possibility of doing so in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions, like:

Faster Whisper (MIT license)
Insanely fast Whisper (Apache-2.0 license)
Distil-Whisper (MIT license)
WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)
WhisperLive (MIT license, Added here 03/2025)
WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list so will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as a simple, locally run, screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT license) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license) by StepFun AI (a Chinese AI team ^source), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even if it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper and some others. I’m on a Mac so my options are pretty limited, and this worked fairly well. If xtts is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately most of my use cases are in SillyTavern, for narration and character TTS, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or a willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware where you’ll be deploying this.

gladia.io › blog › best-open-source-speech-to-text-models

Best open-source speech-to-text models

Whisper, DeepSpeech, Kaldi, Wav2vec, or SpeechBrain: key factors to consider when choosing an open-source ASR model for your apps and projects.

edenai.co › post › top-free-speech-to-text-tools-apis-and-open-source-models

Top Free Speech to text tools, APIs, and Open Source models | Eden AI

IBM Watson's Speech to Text technology facilitates rapid and precise transcription of speech in various languages for a range of applications, including customer self-help, agent aid, and speech analytics.

edenai.co › post › top-free-text-to-speech-tools-apis-and-open-source-models

Top Free Text-to-Speech tools, APIs, and Open Source models | Eden AI

Mozilla TTS is an open-source model that provides tools and models for converting text into human-like speech. The primary model is Tacotron 2, which generates mel-spectrograms, and it can be paired with a vocoder like WaveGlow to create audio.

assemblyai.com › blog › the-top-free-speech-to-text-apis-and-open-source-engines

The top free Speech-to-Text APIs, AI Models, and Open Source Engines

This post compares the best free Speech-to-Text APIs and AI models on the market today, including APIs that have a free tier. We’ll also look at several free open-source Speech-to-Text engines and explore why you might choose an API vs. an open-source library, or vice versa.

datacamp.com › blog › best-open-source-text-to-speech-tts-engines

9 Best Open Source Text-to-Speech (TTS) Engines | DataCamp

December 2, 2024 - Explore 9 common free, open-source text-to-speech engines for your ML projects.

resemble.ai › home › chatterbox – free open source text to speech model

Chatterbox - Free Open Source Text to Speech Model | Resemble AI

May 26, 2025 - New Chatterbox Turbo: Blazing Fast Open Source TTS · MIT licensed. Multilingual. Turbo. Emotion control. Super fast. Consistently outperforms ElevenLabs in blind evaluations.

notta.ai › en › blog › speech-to-text-open-source

13 Best Free Speech-to-Text Open Source Engines, APIs, and AI Models

Best 13 speech-to-text open-source engine · 1 Whisper · 2 Project DeepSpeech · 3 Kaldi · 4 SpeechBrain · 5 Coqui · 6 Julius · 7 Flashlight ASR (Formerly Wav2Letter++) · 8 PaddleSpeech (Formerly DeepSpeech2) · 9 OpenSeq2Seq · 10 Vosk ...

reddit.com › r/localllama › what's the best open source speech to text model

r/LocalLLaMA on Reddit: What's the best open source speech to text model

August 3, 2024

I know OpenAI recently released Whisper V3 Turbo, but I remember hearing about some other ones that are a lot better. I just can't remember which.

whisper-v3-turbo, because of its wide compatibility with the open source ecosystem (not necessarily because of its WER). The architecture is plug and play. You can typically add some LLMs along with Whisper to correct stuff for you and customize as you need. I wrote a guide here just now: Creating Very High-Quality Transcripts with Open-Source Tools: An 100% automated workflow guide

You might be talking about https://huggingface.co/Revai Here is the post you might be remembering https://x.com/reach_vb/status/1841885263766945930
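The answer above mentions WER (word error rate) as one way to compare speech-to-text models. As a rough illustration only, here is a minimal sketch of how WER is commonly computed: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; real benchmarks usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
score = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {score:.3f}")
```

A lower score is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.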

github.com › coqui-ai › TTS

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Tools to curate Text2Speech datasets under dataset_analysis. Utilities to use and test your models. Modular (but not too much) code base enabling easy implementation of new ideas. ... You can also help us implement more models. 🐸TTS is tested on Ubuntu 18.04 with python >= 3.9, < 3.12. If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.

Starred by 43.9K users

Forked by 5.8K users

Languages Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%
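The README excerpt above states a tested interpreter range of python >= 3.9, < 3.12. A minimal sketch of checking that range before installing might look like the following; the helper name `is_supported` and the constants are hypothetical, not part of the 🐸TTS project.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 9)   # inclusive lower bound quoted in the README excerpt
MAX_VERSION = (3, 12)  # exclusive upper bound quoted in the README excerpt


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the quoted tested range."""
    return MIN_VERSION <= version < MAX_VERSION


current = (sys.version_info[0], sys.version_info[1])
if is_supported(current):
    print(f"Python {current[0]}.{current[1]} is inside the tested range.")
else:
    print(f"Python {current[0]}.{current[1]} is outside the tested range.")
```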

northflank.com › blog › best-open-source-text-to-speech-models-and-how-to-run-them

Best open source text-to-speech models and how to run them | Blog — Northflank

Explore the best open source text-to-speech models like XTTS-v2, Mozilla TTS, and Bark. Learn how to choose, deploy, and scale them for production with GPU support using Northflank.

pageon.ai › blog › speech-to-text-ai

Top 10 Free and Open-Source Speech-to-Text AI Tools for 2025

OpenAI Whisper is a revolutionary open-source speech-to-text tool that combines high accuracy with multilingual support. It uses an encoder-decoder transformer architecture to process audio in 30-second segments.
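To make the "30-second segments" point concrete, here is a simplified sketch of cutting a mono signal into fixed 30-second windows and zero-padding the last one. The 16 kHz sample rate is an assumption not stated in the snippet above, and `split_into_windows` is a hypothetical helper; actual Whisper implementations may move the window differently, so treat this as an illustration of the idea rather than the library's behavior.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # assumed sample rate (not stated in the snippet above)
WINDOW_SECONDS = 30    # segment length from the snippet above
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def split_into_windows(samples: Sequence[float],
                       window: int = WINDOW_SAMPLES) -> List[List[float]]:
    """Cut a mono signal into fixed-size windows, zero-padding the last one."""
    windows: List[List[float]] = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        chunk.extend([0.0] * (window - len(chunk)))  # pad a short final chunk
        windows.append(chunk)
    return windows


# 70 seconds of silence: two full windows plus a 10-second tail padded to 30.
audio = [0.0] * (70 * SAMPLE_RATE)
segments = split_into_windows(audio)
print(len(segments), "windows of", len(segments[0]), "samples each")
```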

huggingface.co › models

Text-to-Speech Models – Hugging Face

github.com › resemble-ai › chatterbox

GitHub - resemble-ai/chatterbox: SoTA open-source TTS

If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency of sub 200ms, ideal for production use in agents, applications, or interactive media. Choose the right model for your application. ...

# conda create -yn chatterbox python=3.11
# conda activate chatterbox
git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .

Starred by 15.2K users

Forked by 2.1K users

Languages Python

mistral.ai › news › voxtral

Voxtral | Mistral AI

July 15, 2025 - Voxtral comprehensively outperforms Whisper large-v3, the current leading open-source Speech Transcription model.

rev.com › resources › the-5-best-open-source-speech-recognition-engines-apis

Best Open Source Speech Recognition APIs | Rev

Click to see our top five for converting speech to text. ... In this article, we provide a breakdown of five of the best free-to-use open source speech recognition services along with details on how you can get started.

resemble.ai › home › top open-source ai speech-to-text models in 2026

Top Open-Source AI Speech-to-Text Models in 2026 | Resemble AI

3 weeks ago - Open-source speech-to-text models give developers the freedom, transparency, and flexibility to experiment, fine-tune, and deploy transcription systems on their own terms. They are powerful tools for prototyping, research, offline processing, and privacy-sensitive workflows but they often fall short when teams need expressive output, real-time performance, multilingual accuracy, or enterprise-level reliability. ... Resemble AI becomes the perfect complement.

modal.com › blog › open-source-stt

The Top Open Source Speech-to-Text (STT) Models in 2025

... 50 mg), or financial audio recorders for enforcing trading desk compliance. Whisper Large V3 Turbo is the latest iteration of OpenAI’s flagship speech-to-text model, which debuted in 2022.