I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions

  • 11labs - Commercial
  • xtts
  • xtts2
  • Alltalk
  • Styletts2
  • Fish-Speech
  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
  • PiperUI
  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
  • Bark
  • Tortoise TTS
  • LMNT
  • AlwaysReddy - (uses Piper)
  • Open-LLM-VTuber
  • MeloTTS
  • OpenVoice
  • Sherpa-onnx
  • Silero
  • Neuro-sama
  • Parler TTS
  • Chat TTS
  • VallE-X
  • Coqui TTS
  • Daswers XTTS GUI
  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com
reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?
r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?
August 23, 2024

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs into it (or at least developing the possibility of doing so in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions, like:

  • Faster Whisper (MIT license)

  • Insanely fast Whisper (Apache-2.0 license)

  • Distil-Whisper (MIT license)

  • WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)

  • WhisperLive (MIT license, Added here 03/2025)

  • WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)
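As a rough starting point for the Whisper family above, here is a minimal sketch of local transcription with Faster Whisper. It assumes the `faster-whisper` package is installed and that you have an audio file to pass in; the `small` model size, CPU device and `int8` compute type are only illustrative choices, and `format_timestamp` and `transcribe` are helper names made up for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe(path: str, model_size: str = "small") -> list:
    """Transcribe an audio file locally and return timestamped lines."""
    # Imported lazily so format_timestamp works without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU with int8 quantization; adjust for your hardware.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return [
        f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text.strip()}"
        for seg in segments
    ]
```

Calling `transcribe("audio.wav")` returns one line per recognized segment. The model weights are fetched on first use, so the first call takes noticeably longer than later ones.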

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

  • Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

  • StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list, so I will begin adding new entries as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as a simple, locally run, screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT License) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized version (a 4-bit fork).

8/2025 added: StepFun AI's (Chinese AI-team source) Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even though it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

2 of 38
15
I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on coqui and supports XTTS2, piper and some others. I’m on a Mac so my options are pretty limited, and this worked fairly well. If xtts is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately most of my use cases are in SillyTavern, for narration and character TTS, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or a willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware where you’ll be deploying this.
github.com › rhasspy › piper
GitHub - rhasspy/piper: A fast, local neural text to speech system
Starred by 10.3K users
Forked by 872 users
Languages   C++ 72.8% | Python 18.8% | Jupyter Notebook 7.6% | CMake 0.3% | Shell 0.2% | Dockerfile 0.1%
People also ask

What is the best free AI text to speech tool?
The best free AI text to speech tool depends on what you need: natural voice quality, a variety of languages and accents, low latency, or accessibility. Murf AI outperforms the competition in almost every category. Our free plan gives you access to all the features of our full voice generation studio. Use the free tool on our site to generate audio completely free, no sign-up required. Here are some key features that make Murf the best free TTS tool:

  • Fully free-to-use AI voiceover studio (no credit card required)
  • Ultra-realistic, human-like voices that are contextually aware
  • Full ...
murf.ai › text-to-speech
Free Text to Speech Online with 200+ Realistic AI Voices | Murf AI
What is text to speech?
Text to speech, or TTS, also known as speech synthesis or "read aloud," is a technology that converts digital text into speech. The technology uses advanced AI algorithms and leverages AI voices (also known as synthetic voices) to generate audio from written text. It was originally developed as an assistive technology for better accessibility, but now TTS has an array of modern use cases.
How does text to speech work?
Text to speech works in three simple steps:

  • Text normalization (text-to-word conversion): Pre-processing, or text normalization, helps the TTS software clean up the input by expanding abbreviations, acronyms, dates, and other symbols into readable words.
  • Phonetic conversion (word-to-phoneme conversion): The TTS model identifies phonemes, the basic units of sound, in the normalized text, using a phoneme library to guide pronunciation.
  • Speech synthesis (phoneme-to-sound conversion): A synthetic or AI-generated voice reads the phonemes aloud using pre-recorded samples or learned speech patterns ...
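The three steps can be sketched as a toy pipeline using only the Python standard library. This is an illustration of the data flow, not how any real TTS engine works: the abbreviation table, the letter-to-symbol "phoneme" map and the sine-tone synthesis are all made-up stand-ins, and the function names are hypothetical.

```python
import io
import math
import re
import struct
import wave

# Toy lookup tables (assumptions); real systems use full lexicons or learned models.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
VOWELS = {"a": "AH", "e": "EH", "i": "IH", "o": "OW", "u": "UH"}


def normalize(text: str) -> list:
    """Step 1: expand abbreviations and digits into plain lowercase words."""
    words = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token)
        # Spell out each digit separately ("42" becomes "four two").
        token = re.sub(r"\d", lambda m: f" {DIGITS[int(m.group())]} ", token)
        words.extend(re.findall(r"[a-z]+", token))
    return words


def to_phonemes(words: list) -> list:
    """Step 2: map each letter to a symbol (a crude stand-in for a phoneme library)."""
    return [VOWELS.get(ch, ch.upper()) for word in words for ch in word]


def synthesize(phonemes: list, rate: int = 16000, ms: int = 80) -> bytes:
    """Step 3: render each symbol as a short sine tone and return WAV bytes."""
    samples_per_symbol = rate * ms // 1000
    frames = bytearray()
    for symbol in phonemes:
        freq = 200 + (sum(map(ord, symbol)) % 40) * 20  # arbitrary pitch per symbol
        for i in range(samples_per_symbol):
            value = int(12000 * math.sin(2 * math.pi * freq * i / rate))
            frames += struct.pack("<h", value)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


words = normalize("Dr. Smith lives at 42 Main St.")
audio = synthesize(to_phonemes(words))  # WAV bytes, ready to write to a file
```

The result is a sequence of beeps rather than speech, but each stage has the same input and output shape as in the description: text to words, words to sound units, sound units to audio.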
forum.devtalk.com › ai forum › ai questions/help
What are the best text-to-speech ai generation tools that you can run locally? - AI Questions/Help - Devtalk
March 20, 2025 - Background: Lately I am on a quest to find a good quality TTS AI generation tool to run locally in order to create audio for some videos I am making. I have limited knowledge on the topic of Neural/Bayesian networks and the area has moved a lot since the last time I studied it in detail, almost ...
modal.com › blog › open-source-tts
The Top Open-Source Text to Speech (TTS) Models
This especially matters to use ... run on a local device, like a smart device or smartphone, or for companies producing speech at scale where costs might overwhelm other metrics of realism. If you are building a text-to-speech powered application for the first time, we highly recommend starting with Chatterbox. Developed by Resemble AI, Chatterbox ...
github.com › coqui-ai › TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option. ... If you plan to code or train models, clone 🐸TTS and install it locally:

    git clone https://github.com/coqui-ai/TTS
    pip install -e .[all,dev,notebooks]  # Select the relevant extras
Starred by 43.9K users
Forked by 5.8K users
Languages   Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%
bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models
The Best Open-Source Text-to-Speech Models in 2026
Built on a compact 0.5B-parameter LLM backbone, it delivers near-human speech quality and real-time performance. Unlike most cloud-locked systems, NeuTTS Air provides embedded voice AI capabilities on local devices such as laptops, mobile phones, ...
northflank.com › blog › best-open-source-text-to-speech-models-and-how-to-run-them
Best open source text-to-speech models and how to run them | Blog — Northflank
Explore the best open source text-to-speech models like XTTS-v2, Mozilla TTS, and Bark. Learn how to choose, deploy, and scale them for production with GPU support using Northflank.
play.ht
#1 Free AI Voice Generator, Text to Speech, & AI Voice Over
The Best AI Voice Generator with 200+ realistic AI voices. PlayAI is the voice platform for creators & enterprises. See our low latency Text to Speech API.
murf.ai › text-to-speech
Free Text to Speech Online with 200+ Realistic AI Voices | Murf AI
Text to Speech Converter by Murf
Our MCP server allows you to directly ... in your local setup. Deploy production-ready Python SDKs and get your first API call running under 5 minutes. Control voice styles, pitch and pauses with SSML tags. ... We support text to speech in all popular languages from English and Spanish to Chinese and French. Use our multilingual AI generated ... Convert Text to Voice using Murf Text to Speech online software. Input Text and convert to audio using realistic AI voices & download in mp3, wav, flac formats. Murf TTS supports 120 voices and 20 languages
willowtreeapps.com › craft › 10-speech-to-text-models-tested
We Tested 10 Speech-to-Text Models, See Which Perform Best
Last, we evaluated whisper-large-v3-local on an Apple MacBook Pro running a M3 Max chip, 36 GB of memory, and MacOS Sequoia 15.1. Overall, assemblyai-universal-2 appeared to be the best speech-to-text model we tested.
youtube.com › watch
My Top 5 Open-Source AI Text-to-Speech Models - YouTube
Links referenced in the video:

  • The TTS Interface I Used - https://github.com/JarodMica/audiobook_maker
  • GPT-SoVITS - https://github.com/RVC-Boss/GPT-SoVITS
  • Fish ...
Published   February 12, 2025
medium.com › @himimemo › top-local-tts-models-with-voice-cloning-to-try-in-november-2024-947ff48c6fe6
Top Local TTS Models with Voice Cloning to Try in November 2024 | by Himimemo | Medium
November 4, 2024 -

  • XTTS-V2: Demo: https://huggingface.co/spaces/coqui/xtts · Github: https://github.com/coqui-ai/TTS · Model License: https://coqui.ai/cpml (NC)
  • E2-TTS and F5-TTS: Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS · Github: https://github.com/SWivid/F5-TTS · Model License: CC-BY-NC-4.0
  • MaskGCT: Demo: https://huggingface.co/spaces/amphion/maskgct · Github: https://github.com/open-mmlab/Amphion · Model License: CC-BY-NC-4.0
  • Fish Speech: Demo: https://huggingface.co/spaces/fishaudio/fish-speech-1 · Github: https://github.com/fishaudio/fish-speech · Model License: CC-BY-NC-SA 4.0
elevenlabs.io
Free AI Voice Generator & Voice Agents Platform | ElevenLabs
ElevenLabs Text to Speech
High quality, human-like AI voice generator in 70 languages
(4.5)
Price   US$5.00
cloud.google.com › text-to-speech
Text-to-Speech AI: Lifelike Speech Synthesis | Google Cloud
Try Gemini 3, our best model for reasoning, coding, and multimodal understanding in Vertex AI ... Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.
speechma.com › english
SPEECHMA - Best Free Text to Speech Online | Unlimited AI Voices | Commercial Use TTS 2025
Best free text to speech converter with 580+ natural AI voices. Unlimited usage with commercial license. Perfect for YouTube, TikTok & content creation.
picovoice.ai › blog › local-text-to-speech-with-cloud-quality
Local Text-to-Speech with Cloud Quality - Picovoice
April 29, 2025 - Everything you need to know about local text-to-speech (TTS): the challenges, choosing the right local TTS, and why Orca is developers' top choice in 2025.
nerdynav.com › open-source-ai-voice
Best FREE ElevenLabs Alternatives & Opensource Text to Speech Models (2025) | Nerdynav
Self-Host (Free for Personal): Their 0.5B distilled model OpenAudio S1-mini can be downloaded and run locally. Try at HuggingFace · Commercial Users: The $9.99/month plan or the pay-as-you-go API ($15/1M chars) is a no-brainer compared to ElevenLabs’ pricing which can cost upwards of $100+/month. You get better quality at 80% less cost. ... If you absolutely need a 100% free solution for commercial use, Chatterbox is your best bet. ... Chatterbox is an MIT-licensed AI text to speech model from Resemble AI.
smallest.ai › blog › creating-local-text-to-speech-ai-voices-for-free
Creating Local Text-To-Speech AI Voices for Free
So, what if you could create local text-to-speech AI voices for free? If you need a voiceover for your content, a chatbot for your business, or accessibility features but don’t want to pay, you are in the right place! This guide will show you how TTS works, the best free tools, how to ...
localai.io › features › text-to-audio
🗣 Text to audio (TTS) :: LocalAI
1 month ago - The LocalAI TTS API is compatible with the OpenAI TTS API and the Elevenlabs API. The /tts endpoint can also be used to generate speech from text.
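Since the endpoint is plain HTTP, a client needs nothing beyond the Python standard library. What follows is a minimal sketch, not the documented client: it assumes a LocalAI instance listening on `http://localhost:8080` and a JSON body with `model` and `input` fields (both assumptions here, so check the LocalAI docs for your backend). The names `build_tts_request` and `save_speech` and the voice name `my-voice` are hypothetical.

```python
import json
import urllib.request


def build_tts_request(text: str, model: str,
                      base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /tts endpoint."""
    # Assumption: the server accepts {"model": ..., "input": ...} as JSON.
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, model: str, out_path: str,
                base_url: str = "http://localhost:8080") -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, model, base_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With a server running, `save_speech("Hello there", "my-voice", "hello.wav")` would write whatever audio the server returns. Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real instance.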