I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions
  • 11labs - Commercial
  • xtts
  • xtts2
  • Alltalk
  • Styletts2
  • Fish-Speech
  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
  • PiperUI
  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
  • Bark
  • Tortoise TTS
  • LMNT
  • AlwaysReddy - (uses Piper)
  • Open-LLM-VTuber
  • MeloTTS
  • OpenVoice
  • Sherpa-onnx
  • Silero
  • Neuro-sama
  • Parler TTS
  • Chat TTS
  • VallE-X
  • Coqui TTS
  • Daswers XTTS GUI
  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com
🌐
GitHub
github.com › coqui-ai › TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Tools to curate Text2Speech datasets under dataset_analysis. Utilities to use and test your models. Modular (but not too much) code base enabling easy implementation of new ideas. ... You can also help us implement more models. 🐸TTS is tested on Ubuntu 18.04 with python >= 3.9, < 3.12. If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.
Starred by 43.9K users
Forked by 5.8K users
Languages   Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%
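
As a rough sketch of the PyPI route mentioned in the snippet above: the helper below checks the running interpreter against the stated Python range (>= 3.9, < 3.12) before trying to synthesize anything. Both function names are made up for illustration, and the model name is an assumption based on the project's README examples. The `TTS` package has to be installed separately (`pip install TTS`), so it is imported lazily.

```python
import sys


def python_supported(version=None):
    """Return True if (major, minor) falls in the range the README states."""
    info = version if version is not None else sys.version_info
    major_minor = tuple(info[:2])
    return (3, 9) <= major_minor < (3, 12)


def synthesize(text, out_path="output.wav",
               model_name="tts_models/en/ljspeech/tacotron2-DDC"):
    """Synthesize `text` to a WAV file with a released Coqui model.

    Hypothetical helper: the model name is an assumption, and the TTS
    package is imported lazily so this file loads without it installed.
    """
    if not python_supported():
        raise RuntimeError("Coqui TTS is tested on Python >= 3.9, < 3.12")
    from TTS.api import TTS  # requires `pip install TTS`

    tts = TTS(model_name=model_name)
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

Calling `synthesize("Hello world")` would download the model on first use and write `output.wav`. The version check is the only part that runs without the package.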
🌐
Eden AI
edenai.co › post › top-free-text-to-speech-tools-apis-and-open-source-models
Top Free Text-to-Speech tools, APIs, and Open Source models | Eden AI
Mozilla TTS is an open-source model that provides tools and models for converting text into human-like speech. The primary model is Tacotron 2, which generates mel-spectrograms, and it can be paired with a vocoder like WaveGlow to create audio.
🌐
DataCamp
datacamp.com › blog › best-open-source-text-to-speech-tts-engines
9 Best Open Source Text-to-Speech (TTS) Engines | DataCamp
December 2, 2024 - Explore the best AI coding assistants, including open-source, free, and commercial tools to enhance your development experience. ... OpenAI’s TTS API is an endpoint that enables users to interact with their TTS AI model that converts text to natural-sounding spoken language. ... Harness the capabilities of the ElevenLabs API, a powerful AI voice generator. Learn how to transform text into speech and clone voices with this technology.
🌐
Reddit
reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?
r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?
August 24, 2024 -

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs into it (or at least developing the possibility of doing so in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions like:

  • Faster Whisper (MIT license)

  • Insanely fast Whisper (Apache-2.0 license)

  • Distil-Whisper (MIT license)

  • WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)

  • WhisperLive (MIT license, Added here 03/2025)

  • WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)
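
For a sense of what using one of these re-implementations looks like, here is a minimal sketch with Faster Whisper. It assumes the `faster-whisper` package is installed. The model size `"small"`, the CPU/int8 settings, and both function names are illustrative choices, not recommendations from the post.

```python
def format_segments(segments):
    """Turn transcription segments into '[start -> end] text' lines.

    Works on any objects exposing .start, .end (seconds) and .text.
    """
    return [f"[{s.start:.2f}s -> {s.end:.2f}s] {s.text.strip()}" for s in segments]


def transcribe(audio_path, model_size="small"):
    """Transcribe a file with Faster Whisper (imported lazily)."""
    from faster_whisper import WhisperModel  # requires `pip install faster-whisper`

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # `segments` is a lazy generator; decoding happens while iterating over it.
    segments, info = model.transcribe(audio_path)
    return info.language, format_segments(segments)
```

`format_segments` is kept separate so the output formatting can be tried without downloading a model.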

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

  • Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

  • StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list, so I will begin adding new entries as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as a simple, locally run, screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT license) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: StepFun AI's (a Chinese AI team) Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even if it bypasses the "to-text" part.
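
Since the post gives a license next to almost every project, one way to make the list actionable is to keep it as data and filter by license. The sketch below is only that: a handful of entries copied from the list above, with a hypothetical `Project` record and `by_license` helper that are not part of any of these tools. Licenses can change, so check each repository before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    kind: str      # "TTS", "STT" or "STS" (speech-to-speech)
    license: str


# A subset of the entries above, with licenses as stated in the post.
PROJECTS = [
    Project("Faster Whisper", "STT", "MIT"),
    Project("Distil-Whisper", "STT", "MIT"),
    Project("Parakeet TDT 0.6B V2", "STT", "CC-BY-4.0"),
    Project("Tortoise TTS", "TTS", "Apache-2.0"),
    Project("StyleTTS2", "TTS", "MIT"),
    Project("CosyVoice", "TTS", "Apache-2.0"),
    Project("Kokoro TTS", "TTS", "MIT"),
    Project("Zonos", "TTS", "Apache-2.0"),
    Project("F5-TTS", "TTS", "MIT"),
    Project("Chatterbox-TTS", "TTS", "MIT"),
    Project("Step-Audio 2 Mini", "STS", "Apache-2.0"),
]


def by_license(projects, license_name, kind=None):
    """Return names of projects with the given license, optionally of one kind."""
    return [
        p.name
        for p in projects
        if p.license == license_name and (kind is None or p.kind == kind)
    ]
```

For example, `by_license(PROJECTS, "MIT", kind="TTS")` picks out StyleTTS2, Kokoro TTS, F5-TTS and Chatterbox-TTS from this subset.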

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

Top answer
1 of 38
72
I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions
  • 11labs - Commercial
  • xtts
  • xtts2
  • Alltalk
  • Styletts2
  • Fish-Speech
  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
  • PiperUI
  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
  • Bark
  • Tortoise TTS
  • LMNT
  • AlwaysReddy - (uses Piper)
  • Open-LLM-VTuber
  • MeloTTS
  • OpenVoice
  • Sherpa-onnx
  • Silero
  • Neuro-sama
  • Parler TTS
  • Chat TTS
  • VallE-X
  • Coqui TTS
  • Daswers XTTS GUI
  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech
2 of 38
15
I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper and some others. I’m on a Mac so my options are pretty limited, and this worked fairly well. If xtts is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately most of my cases are in SillyTavern, for narration and character TTS, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or a willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware where you’ll be deploying this.
🌐
Modal
modal.com › blog › open-source-tts
The Top Open-Source Text to Speech (TTS) Models
It’s currently the top trending text-to-speech model on Hugging Face. It’s an open sourced model that was built on top of Llama 3.2 3B, pre-trained on over 10 million hours of audio data. This model provides industry-leading expressive audio generation and multilingual voice cloning.
🌐
AssemblyAI
assemblyai.com › blog › the-top-free-speech-to-text-apis-and-open-source-engines
The top free Speech-to-Text APIs, AI Models, and Open Source Engines
October 23, 2025 - An alternative to APIs and AI models, open-source Speech-to-Text libraries are completely free--with no limits on use. Some developers also see data security as a plus, since your data doesn't have to be sent to a third party or the cloud.
🌐
BentoML
bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models
The Best Open-Source Text-to-Speech Models in 2026
1 week ago - Chatterbox is a high-performance, open-source TTS model developed by Resemble AI. Built with a 500M-parameter Llama backbone and trained on over 500K hours of cleaned audio, Chatterbox delivers state-of-the-art speech generation quality with ...
🌐
Hugging Face
huggingface.co › models
Text-to-Speech Models – Hugging Face
🌐
Resemble AI
resemble.ai › home › chatterbox – free open source text to speech model
Chatterbox - Free Open Source Text to Speech Model | Resemble AI
May 26, 2025 - The fastest open-source text-to-speech model that supports paralinguistic tagging for non-speech sounds.
🌐
GitHub
github.com › resemble-ai › chatterbox
GitHub - resemble-ai/chatterbox: SoTA open-source TTS
1 day ago - Chatterbox is a family of three state-of-the-art, open-source text-to-speech models by Resemble AI.
Starred by 15.2K users
Forked by 2.1K users
Languages   Python
🌐
Northflank
northflank.com › blog › best-open-source-text-to-speech-models-and-how-to-run-them
Best open source text-to-speech models and how to run them | Blog — Northflank
September 11, 2025 - Explore the best open source text-to-speech models like XTTS-v2, Mozilla TTS, and Bark. Learn how to choose, deploy, and scale them for production with GPU support using Northflank.
🌐
Gladia
gladia.io › blog › best-open-source-speech-to-text-models
Gladia - Top 5 Open-Source Speech-to-Text Models for Enterprises
In this article, we will cover the most advanced open-source ASR models available, including Whisper ASR, DeepSpeech, Kaldi, Wav2vec, or SpeechBrain, highlighting their key strengths and technical requirements. Modern ASR can very reliably transcribe ...
🌐
The Open Source Post
fosspost.org › home › open source for developers › top 15 open source speech recognition/tts/stt/ systems
Top 15 Open Source Speech Recognition/TTS/STT/ Systems
August 1, 2024 - If you are looking for something modern, then this one can be included. Flashlight ASR is an open source speech recognition software that was released by Facebook’s AI Research Team.
🌐
Eden AI
edenai.co › post › top-free-speech-to-text-tools-apis-and-open-source-models
Top Free Speech to text tools, APIs, and Open Source models | Eden AI
The platform provides open-source implementations of popular research projects and tightly integrates with HuggingFace, enabling easy access. In general, the platform is clearly defined and regularly updated, making it an uncomplicated tool for training and fine-tuning. ... Coqui is a remarkable toolkit for deep learning in Speech-to-Text transcription.
🌐
Rev
rev.com › resources › the-5-best-open-source-speech-recognition-engines-apis
Best Open Source Speech Recognition APIs | Rev
There are many open-source speech ...
🌐
OpenAI
platform.openai.com › docs › guides › text-to-speech
Text to speech | OpenAI API
14 hours ago - The Audio API provides a speech endpoint based on our GPT-4o mini TTS (text-to-speech) model. It comes with 11 built-in voices and can be used to: ... Our usage policies require you to provide a clear disclosure to end users that the TTS voice ...
🌐
OpenAI
platform.openai.com › docs › guides › speech-to-text
Speech to text | OpenAI API
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/speech.mp3"),
  model: "gpt-4o-transcribe",
  response_format: "text",
  prompt: "The following conversation is a lecture about the recent developments around OpenAI, GPT-4.5 and the future of AI.",
});

console.log(transcription.text);
🌐
GitHub
github.com › mozilla › DeepSpeech
GitHub - mozilla/DeepSpeech: DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper.
Starred by 26.7K users
Forked by 4.1K users
Languages   C++ 47.0% | Python 21.4% | C 11.2% | Shell 10.8% | C# 2.8% | Swift 1.8%
🌐
Vapi
vapi.ai › blog › medical-speech-to-text-software
The 10 Best Open-Source Medical Speech-to-Text Software Tools - Vapi AI Blog
May 22, 2025 - Read The 10 Best Open-Source Medical Speech-to-Text Software Tools on the Vapi blog