I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions

  • 11labs - Commercial

  • xtts

  • xtts2

  • Alltalk

  • Styletts2

  • Fish-Speech

  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.

  • PiperUI

  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.

  • Bark

  • Tortoise TTS

  • LMNT

  • AlwaysReddy - (uses Piper)

  • Open-LLM-VTuber

  • MeloTTS

  • OpenVoice

  • Sherpa-onnx

  • Silero

  • Neuro-sama

  • Parler TTS

  • Chat TTS

  • VallE-X

  • Coqui TTS

  • Daswers XTTS GUI

  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com
r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?
August 23, 2024 -

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs in it (or at least making that possible in the future). The first thing I've struggled with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions, like:

  • Faster Whisper (MIT license)

  • Insanely fast Whisper (Apache-2.0 license)

  • Distil-Whisper (MIT license)

  • WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)

  • WhisperLive (MIT license, Added here 03/2025)

  • WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)
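
For anyone who wants to try one of the Whisper re-implementations above, here is a minimal sketch using Faster Whisper's Python package. It assumes `pip install faster-whisper` has been run; the model size, the CPU/int8 settings and the helper names (`format_segments`, `transcribe`) are my own choices for illustration, not something taken from the thread.

```python
from typing import Iterable, Tuple


def format_segments(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) triples as one timestamped line each."""
    lines = []
    for start, end, text in segments:
        lines.append(f"[{start:.2f}s -> {end:.2f}s] {text.strip()}")
    return "\n".join(lines)


def transcribe(path: str, model_size: str = "small") -> str:
    """Transcribe an audio file with Faster Whisper and return timestamped text."""
    # Imported lazily so the formatting helper works without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU with int8 quantization, picked for modest hardware.
    # On a CUDA GPU, device="cuda" and compute_type="float16" are the usual choice.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=5)
    # `segments` is a generator; decoding happens while it is consumed here.
    return format_segments((s.start, s.end, s.text) for s in segments)
```

Calling `transcribe("meeting.wav")` (a hypothetical file name) should download the model on first use and return one line per recognized segment.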

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

  • Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

  • StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list, so I will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as a simple, locally run, screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT License) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: Step-Audio 2 Mini (Apache-2.0 license) by StepFun AI (a Chinese AI team), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even though it bypasses the "to-text" part.
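
Since the point of this list is picking something to integrate, it may help to keep the entries as data and filter them by task and license. Below is a rough sketch of that idea. The `SpeechModel` class, the `filter_models` helper and the catalog layout are hypothetical, and the license strings are simply copied from a few entries above, so please verify them against each repository before relying on them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class SpeechModel:
    name: str
    kind: str      # "TTS", "STT" or "STS" (speech-to-speech)
    license: str   # as stated in the list above, not independently verified


# A small subset of the list above, for illustration only.
CATALOG = [
    SpeechModel("Faster Whisper", "STT", "MIT"),
    SpeechModel("Distil-Whisper", "STT", "MIT"),
    SpeechModel("StyleTTS2", "TTS", "MIT"),
    SpeechModel("CosyVoice", "TTS", "Apache-2.0"),
    SpeechModel("Kokoro TTS", "TTS", "MIT"),
    SpeechModel("Zonos", "TTS", "Apache-2.0"),
    SpeechModel("Parakeet TDT 0.6B V2", "STT", "CC-BY-4.0"),
    SpeechModel("Step-Audio 2 Mini", "STS", "Apache-2.0"),
]


def filter_models(models: Iterable[SpeechModel],
                  kind: Optional[str] = None,
                  licenses: Optional[Set[str]] = None) -> List[SpeechModel]:
    """Keep models matching the given task and license set (None means no filter)."""
    result = []
    for model in models:
        if kind is not None and model.kind != kind:
            continue
        if licenses is not None and model.license not in licenses:
            continue
        result.append(model)
    return result
```

For example, `filter_models(CATALOG, kind="TTS", licenses={"MIT", "Apache-2.0"})` returns the four TTS entries in the subset above.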

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

2 of 38
15
I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper and some others. I’m on a Mac, so my options are pretty limited, and this worked fairly well. If XTTS is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately, most of my use is in SillyTavern, for narration and character TTS, so these may not fit your use case. The last link I shared might give you ideas for how to implement that in a real application, though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or a willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted at SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and a lot will also depend on the hardware where you’ll be deploying this.
🌐
Linus Tech Tips
linustechtips.com › software › programs, apps and websites
Self Hosted AI TTS - Programs, Apps and Websites - Linus Tech Tips
August 2, 2023 - Are there any options to run an AI text-to-speech/TTS program locally using an RTX GPU? There are some good online paid services like speechify and Naturalreader but are there any similar options that can be self hosted? I have a 2060 in my laptop and a 4090 in my desktop to play around with.
Discussions

What are the best text-to-speech ai generation tools that you can run locally? - AI Questions/Help - Devtalk
Background: Lately I have been on a quest to find a good-quality TTS AI generation tool to run locally in order to create audio for some videos I am making. I have limited knowledge of neural/Bayesian networks, and the area has moved a lot since the last time I studied it in detail, almost ... More on forum.devtalk.com
🌐 forum.devtalk.com
2
March 20, 2025
Best speech to text transcription? Local model or api
Whisper is really good, at least for single speaker text. It's less good for multi-speaker text, but still plenty usable. More on reddit.com
🌐 r/LocalLLaMA
26
6
July 4, 2024
What is current AI go to for voice generation running locally on PC
As far as I know, XTTS-v2 is still the best, but if there's something better now, I'd be quite interested to hear about it. More on reddit.com
🌐 r/LocalLLaMA
45
79
April 12, 2024
🌐
YouTube
youtube.com › watch
The BEST, Local Text-to-Speech Generator - AI Voice ...
🌐
Zapier
zapier.com › app picks › best apps
The 9 best AI voice generators in 2025 | Zapier
August 25, 2025 - You can upload your audio (any kind of audio) and access transcription, speech generation, or noise removal, among many other possibilities. The learning curve is a bit steep here, as this screen has a real audio editor vibe: be sure to ... Altered price: Free plan available for 3 minutes per month of voice morphing, 10k AI tokens, and local voice cloning.
🌐
Synthesia
synthesia.io › home › synthesia features › ai voice generator and text-to-speech tool
AI Voice Generator - Create Realistic Voiceovers for FREE
Synthesia is best AI voice generator (according to G2 reviews). It combines the most advanced AI voices with state-of-the-art generative video capabilities that allow users to generate realistic videos with voiceovers in minutes.
🌐
Unite.AI
unite.ai › best-text-to-speech-generators
10 Best “Text to Speech” Generators (December 2025) – Unite.AI
October 1, 2025 - ... ElevenLabs is an AI-powered text-to-speech platform that converts written text into natural-sounding speech. The platform features a clean interface and the most realistic AI voices available.
🌐
Nerdynav
nerdynav.com › best-ai-voice-generators
15 Best AI Voice Generators That Sound Human (2025) + Audio Samples | Nerdynav
November 1, 2025 - If you’re like me and you’ve been searching for a reliable text-to-speech tool for commercial use, you might want to check out Play.ht. Big companies like Verizon and Samsung trust it, and that says something about its quality. Play.ht really shines when it comes to regional languages and accents. So, if you’re working on a project that needs that local touch, this could be a great fit.
🌐
YouTube
youtube.com › aitrepreneur
RIP ELEVENLABS! Create BEST TTS AI Voices LOCALLY For FREE! - YouTube
Say goodbye to expensive AI voice generators like ELEVENLABS! In this ULTIMATE guide, I'll show you how to create the BEST text-to-speech AI voices on your l...
Published   May 9, 2024
Views   162K
🌐
GitHub
github.com › coqui-ai › TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option. ... If you plan to code or train models, clone 🐸TTS and install it locally:

    git clone https://github.com/coqui-ai/TTS
    cd TTS
    pip install -e ".[all,dev,notebooks]"  # Select the relevant extras
Starred by 43.9K users
Forked by 5.8K users
Languages   Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%
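
Once the package is installed, synthesis from Python can look roughly like the sketch below. It uses the `TTS.api.TTS` class and its `tts_to_file` method from the 🐸TTS package; the wrapper function `synthesize`, the default output path and the choice of the `tts_models/en/ljspeech/tacotron2-DDC` model are assumptions made for illustration.

```python
def synthesize(text: str,
               out_path: str = "output.wav",
               model_name: str = "tts_models/en/ljspeech/tacotron2-DDC") -> str:
    """Write `text` as speech to `out_path` with a released Coqui TTS model."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported lazily: the package is large and only needed at synthesis time.
    from TTS.api import TTS

    # The model is downloaded on first use; the default here is an assumption,
    # any name printed by the `tts --list_models` command should work.
    tts = TTS(model_name=model_name)
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

A call such as `synthesize("Hello from a local model.")` should leave an `output.wav` file in the working directory.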
🌐
GitHub
github.com › rhasspy › piper
GitHub - rhasspy/piper: A fast, local neural text to speech system
October 6, 2025 - A fast, local neural text to speech system. Contribute to rhasspy/piper development by creating an account on GitHub.
Starred by 10.3K users
Forked by 872 users
Languages   C++ 72.8% | Python 18.8% | Jupyter Notebook 7.6% | CMake 0.3% | Shell 0.2% | Dockerfile 0.1%
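
Piper is usually driven from the command line, reading text on stdin and writing a WAV file. Below is a minimal sketch of calling it from Python; it assumes a `piper` executable on the PATH and an already downloaded voice file. The `--model` and `--output_file` flags come from the Piper README, while the function names and the voice file name in the note below are placeholders.

```python
import subprocess
from typing import List


def piper_command(model_path: str, out_path: str) -> List[str]:
    """Build the argument list for one Piper invocation."""
    return ["piper", "--model", model_path, "--output_file", out_path]


def speak(text: str, model_path: str, out_path: str = "welcome.wav") -> str:
    """Pipe `text` into Piper and return the path of the generated WAV file."""
    # Piper reads the text to synthesize from standard input.
    subprocess.run(
        piper_command(model_path, out_path),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with an error
    )
    return out_path
```

For example, `speak("Welcome to the world of speech synthesis!", "en_US-lessac-medium.onnx")` mirrors the shell one-liner from the README, assuming that voice file is in the current directory.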
🌐
Modal
modal.com › blog › open-source-tts
The Top Open-Source Text to Speech (TTS) Models
This especially matters to use ... run on a local device, like a smart device or smartphone, or for companies producing speech at scale where costs might overwhelm other metrics of realism. If you are building a text-to-speech powered application for the first time, we highly recommend starting with Chatterbox. Developed by Resemble AI, Chatterbox ...
🌐
BentoML
bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models
The Best Open-Source Text-to-Speech Models in 2026
5 days ago - Built on a compact 0.5B-parameter LLM backbone, it delivers near-human speech quality and real-time performance. Unlike most cloud-locked systems, NeuTTS Air provides embedded voice AI capabilities on local devices such as laptops, mobile phones, ...
🌐
DataCamp
datacamp.com › blog › best-open-source-text-to-speech-tts-engines
9 Best Open Source Text-to-Speech (TTS) Engines | DataCamp
December 2, 2024 - Explore 9 common free, open-source text-to-speech engines for your ML projects.
🌐
Google Cloud
cloud.google.com › text-to-speech
Text-to-Speech AI: Lifelike Speech Synthesis | Google Cloud
October 1, 2025 - Try Gemini 3, our best model for reasoning, coding, and multimodal understanding in Vertex AI ... Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.
🌐
Northflank
northflank.com › blog › best-open-source-text-to-speech-models-and-how-to-run-them
Best open source text-to-speech models and how to run them | Blog — Northflank
September 11, 2025 - Explore the best open source text-to-speech models like XTTS-v2, Mozilla TTS, and Bark. Learn how to choose, deploy, and scale them for production with GPU support using Northflank.
🌐
Medium
medium.com › @himimemo › top-local-tts-models-with-voice-cloning-to-try-in-november-2024-947ff48c6fe6
Top Local TTS Models with Voice Cloning to Try in November 2024 | by Himimemo | Medium
November 4, 2024 -

  • XTTS-V2. Demo: https://huggingface.co/spaces/coqui/xtts Github: https://github.com/coqui-ai/TTS Model License: https://coqui.ai/cpml (NC)

  • E2-TTS and F5-TTS. Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS Github: https://github.com/SWivid/F5-TTS Model License: CC-BY-NC-4.0

  • MaskGCT. Demo: https://huggingface.co/spaces/amphion/maskgct Github: https://github.com/open-mmlab/Amphion Model License: CC-BY-NC-4.0

  • Fish Speech. Demo: https://huggingface.co/spaces/fishaudio/fish-speech-1 Github: https://github.com/fishaudio/fish-speech Model License: CC-BY-NC-SA 4.0
🌐
Eden AI
edenai.co › post › top-free-text-to-speech-tools-apis-and-open-source-models
Top Free Text-to-Speech tools, APIs, and Open Source models | Eden AI
Mozilla TTS is an open-source model that provides tools and models for converting text into human-like speech. The primary model is Tacotron 2, which generates mel-spectrograms, and it can be paired with a vocoder like WaveGlow to create audio.