I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions
  • 11labs - Commercial
  • xtts
  • xtts2
  • Alltalk
  • Styletts2
  • Fish-Speech
  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
  • PiperUI
  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
  • Bark
  • Tortoise TTS
  • LMNT
  • AlwaysReddy - (uses Piper)
  • Open-LLM-VTuber
  • MeloTTS
  • OpenVoice
  • Sherpa-onnx
  • Silero
  • Neuro-sama
  • Parler TTS
  • Chat TTS
  • VallE-X
  • Coqui TTS
  • Daswers XTTS GUI
  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com
LocalAI
localai.io › features › audio-to-text
🔈 Audio to text :: LocalAI
1 month ago - Audio-to-text models are models that can generate text from an audio file. The transcription endpoint allows you to convert audio files to text. The endpoint is based on whisper.cpp, a C++ library for audio transcription, and its input supports all the audio formats supported by ffmpeg.
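A minimal sketch of calling that transcription endpoint from Python using only the standard library. It assumes LocalAI is running at `http://localhost:8080`, that it exposes the OpenAI-compatible `/v1/audio/transcriptions` route, and that a Whisper model is configured under the name `whisper-1`; the URL, the model name, and the `text` field in the JSON reply are all assumptions that depend on your setup. The helper names are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict, files: dict) -> tuple[bytes, str]:
    """Encode form fields and {name: (filename, bytes)} file parts as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    for name, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_bytes: bytes, filename: str,
                                base_url: str = "http://localhost:8080",  # assumed LocalAI address
                                model: str = "whisper-1") -> urllib.request.Request:  # assumed model name
    body, content_type = build_multipart({"model": model}, {"file": (filename, audio_bytes)})
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def transcribe(path: str, **kwargs) -> str:
    """Upload an audio file and return the transcript (needs a running LocalAI server)."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Because the endpoint accepts anything ffmpeg can decode, the file part is sent as a generic `application/octet-stream` rather than guessing a specific audio MIME type.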
Discussions

Best local open source Text-To-Speech and Speech-To-Text?
r/LocalLLaMA
August 24, 2024
Squawk - Real-Time Local Text-to-Speech with AI
royshilkrot submitted a new resource: Squawk - Real-Time Local Text-to-Speech with AI - Generative AI engine for speech in OBS. The OBS Squawk plugin adds powerful voice cloning capabilities to OBS by leveraging sherpa-onnx. With this plugin, you can generate speech on the fly and in real time...
obsproject.com
June 19, 2024
What are the best text-to-speech ai generation tools that you can run locally? - AI Questions/Help - Devtalk
Background: Lately I have been on a quest to find a good-quality TTS AI generation tool to run locally in order to create audio for some videos I am making. I have limited knowledge of Neural/Bayesian networks, and the area has moved a lot since the last time I studied it in detail, almost ...
forum.devtalk.com
March 20, 2025
Feature request: use Coqui or other Local AI sites for voice... - HomeSeer Message Board
Add server: (wav or mp3) Link (192.168.0.9:5002/api/tts?text=) Then you can select a voice server, or maybe have an option to change the server from an event. ... My implementation already supports selecting different providers, and I'm now adding Microsoft SpeechSynthesizer ...
forums.homeseer.com
WillowTree
willowtreeapps.com › craft › 10-speech-to-text-models-tested
We Tested 10 Speech-to-Text Models, See Which Perform Best
Important note: azure-ai-speech requires batch processing for larger audio files, which requires moving the audio file to a cloud storage bucket before processing, potentially resulting in additional costs. Gladia requires uploading the file to their servers and then performing transcription on the audio_url generated. During our tests using a short French clip about 30 seconds long, whisper-large-v3-local performed the best.
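The snippet does not say which metric the ranking was based on. A common way to compare speech-to-text output against a reference transcript is word error rate, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the minimum edit alignment and N is the number of reference words. A minimal sketch, assuming simple lowercase whitespace tokenization (real evaluations usually also normalize punctuation and numbers):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit distances for the previous reference prefix
    # against every prefix of the hypothesis (classic Levenshtein DP, on words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of a reference word
                          curr[j - 1] + 1,      # insertion of a hypothesis word
                          prev[j - 1] + cost)   # substitution, or match when cost == 0
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring "the cat sit on mat" against "the cat sat on the mat" gives one substitution and one deletion over six reference words, so WER is 2/6. WER can exceed 1.0 when the hypothesis contains many insertions.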
OpenAI
platform.openai.com › docs › guides › text-to-speech
Text to speech | OpenAI API
The Audio API provides a speech endpoint based on our GPT-4o mini TTS (text-to-speech) model. It comes with 11 built-in voices and can be used to: ... Our usage policies require you to provide a clear disclosure to end users that the TTS voice ...
MeetJamie
meetjamie.ai › blog › 10-best-speech-to-text-software
10 Best Speech-to-Text Software [Updated August 2025] | Jamie
5 days ago - Aiko is a high-quality on-device transcription tool that converts speech to text from meetings, lectures, and more. Powered by OpenAI’s Whisper model, it runs locally on your device for fast and reliable transcription.
Gladia
gladia.io › blog › best-open-source-speech-to-text-models
Gladia - Top 5 Open-Source Speech-to-Text Models for Enterprises
Whisper, DeepSpeech, Kaldi, Wav2vec, or SpeechBrain: key factors to consider when choosing an open-source ASR model for your apps and projects.
GitHub
github.com › KoljaB › RealtimeSTT
GitHub - KoljaB/RealtimeSTT: A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Starred by 9K users
Forked by 764 users
Languages   Python 96.1% | HTML 2.1%
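RealtimeSTT's README describes layered voice activity detection (WebRTCVAD for a first pass, SileroVAD to verify). The toy sketch below is not that implementation; it only illustrates the basic idea of VAD with a plain energy threshold over 16-bit little-endian mono PCM frames. The frame size (480 samples, i.e. 30 ms at an assumed 16 kHz sample rate) and the threshold of 500 are arbitrary assumptions you would tune for your microphone.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def speech_frames(pcm: bytes, frame_samples: int = 480, threshold: float = 500.0) -> list[bool]:
    """Flag each fixed-size frame as speech (True) if its RMS energy exceeds `threshold`."""
    step = frame_samples * 2  # 2 bytes per 16-bit sample
    return [frame_rms(pcm[i:i + step]) > threshold for i in range(0, len(pcm), step)]
```

A pure energy threshold is easily fooled by background noise, which is why model-based detectors such as the ones RealtimeSTT uses tend to be preferred in practice.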
BentoML
bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models
The Best Open-Source Text-to-Speech Models in 2026
1 week ago - Built on a compact 0.5B-parameter LLM backbone, it delivers near-human speech quality and real-time performance. Unlike most cloud-locked systems, NeuTTS Air provides embedded voice AI capabilities on local devices such as laptops, mobile phones, ...
Reddit
reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?
r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?
August 24, 2024

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs for it (or at least making that possible in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open-source projects are based on OpenAI's Whisper and its re-implemented versions, like:

  • Faster Whisper (MIT license)

  • Insanely fast Whisper (Apache-2.0 license)

  • Distil-Whisper (MIT license)

  • WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)

  • WhisperLive (MIT license, Added here 03/2025)

  • WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but the company has stated on its site that it is shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

  • Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

  • StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list, so I will start adding new entries as well.

1/2025 added: Kokoro TTS (MIT license)
2/2025 added: Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license)[Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT license) by reddit user u/MattePalte, described as a simple, locally run, screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT license) for generating consistent long-form dialogues. It comes in 1.5B and 7B sizes, and both models can be tried here. A 0.5B model is also on the way. This one already has a ComfyUI wrapper by u/Fabix84 (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). It can be tried here and further tested here. It supports complex long-form dialogues, and extra prompting is supposed to allow setting the scene and adjusting expressions. It also has a quantized (4-bit fork) version.

8/2025 added: StepFun AI's Step-Audio 2 Mini (Apache-2.0 license), an 8B "speech-to-speech" (audio-to-tokens + tokens-to-audio) model from a Chinese AI team. Added because it is related, even though it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

Top answer (jpummill2): the TTS list quoted at the top of this page.
Second answer:
I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper and some others. I’m on a Mac, so my options are pretty limited, and this worked fairly well. If XTTS is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately, most of my use cases are in SillyTavern, for narration and character TTS, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application, though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware you’ll be deploying this on.
GitHub
github.com › rhasspy › piper
GitHub - rhasspy/piper: A fast, local neural text to speech system
October 6, 2025 - A fast, local neural text to speech system.
Starred by 10.4K users
Forked by 874 users
Languages   C++ 72.8% | Python 18.8% | Jupyter Notebook 7.6% | CMake 0.3% | Shell 0.2% | Dockerfile 0.1%
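A minimal sketch of driving the Piper command-line tool from Python. It assumes a `piper` executable on your PATH and a downloaded voice such as `en_US-lessac-medium.onnx` (with its `.onnx.json` config next to it); Piper reads the text from stdin and writes a WAV file given by `--output_file`. The helper names are hypothetical, and flags may differ between Piper releases, so check `piper --help` for your version.

```python
import subprocess


def build_piper_command(model_path: str, output_path: str, piper_bin: str = "piper") -> list[str]:
    """Argument list for one Piper run; the text itself is supplied on stdin."""
    return [piper_bin, "--model", model_path, "--output_file", output_path]


def synthesize(text: str, model_path: str, output_path: str, piper_bin: str = "piper") -> None:
    """Pipe `text` into Piper and let it write a WAV file to `output_path`."""
    subprocess.run(
        build_piper_command(model_path, output_path, piper_bin),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with a non-zero status
    )
```

Usage: `synthesize("Welcome to the world of speech synthesis!", "en_US-lessac-medium.onnx", "welcome.wav")`. Passing the arguments as a list (rather than a shell string) avoids quoting problems with user-supplied text.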
GitHub
github.com › coqui-ai › TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Starred by 43.9K users
Forked by 5.9K users
Languages   Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%
Picovoice
picovoice.ai › blog › local-text-to-speech-with-cloud-quality
Local Text-to-Speech with Cloud Quality - Picovoice
April 29, 2025 - TLDR: Local Text-to-Speech (TTS) converts text to speech offline, making it ideal for apps that need low latency and privacy.
Listnr AI
listnr.ai › blog › creating-local-text-to-speech-ai-voices-for-free
Creating Local Text-To-Speech AI Voices for Free: A Step-by-Step Guide | Listnr AI
April 7, 2025 - 1. What is local TTS voice generation? Local TTS voice generation refers to creating text-to-speech models on your own machine using open-source tools, giving you full control over the data and customization.
LocalAI
localai.io › features › text-to-audio
🗣 Text to audio (TTS) :: LocalAI
1 month ago - With this config, you can now use the following curl command to generate a text-to-speech audio file: curl -L http://localhost:8080/tts \ -H "Content-Type: application/json" \ -d '{ "model": "xtts_v2", "input": "Bonjour, je suis Ana Florence. Comment puis-je vous aider?" }' | aplay
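The same request from Python, using only the standard library. This is a minimal sketch that assumes the LocalAI address and the `xtts_v2` model config from the curl example, and that the endpoint replies with raw audio bytes (the curl example pipes the response straight into `aplay`). The function names are hypothetical.

```python
import json
import urllib.request


def build_tts_request(text: str, model: str = "xtts_v2",
                      base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """POST {"model": ..., "input": ...} to LocalAI's /tts endpoint, as in the curl example."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, output_path: str, **kwargs) -> None:
    """Send the request and save the returned audio bytes (needs a running LocalAI server)."""
    with urllib.request.urlopen(build_tts_request(text, **kwargs)) as resp:
        with open(output_path, "wb") as out:
            out.write(resp.read())
```

Usage: `synthesize_to_file("Bonjour, je suis Ana Florence. Comment puis-je vous aider?", "out.wav")`. The audio format depends on the configured backend, so the `.wav` extension here is an assumption.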
YouTube
youtube.com › aitrepreneur
RIP ELEVENLABS! Create BEST TTS AI Voices LOCALLY For FREE! - YouTube
Say goodbye to expensive AI voice generators like ELEVENLABS! In this ULTIMATE guide, I'll show you how to create the BEST text-to-speech AI voices on your l...
Published   May 9, 2024
Views   162K
Medium
medium.com › @dartisan › how-i-built-a-local-ai-chatbot-that-can-talk-listen-and-read-my-files-675f120098fe
How I Built a Local AI Chatbot That Can Talk, Listen, and ...
April 6, 2025 - 💬 Chat with local LLM using [mistral:7b] from Ollama · 📄 Upload PDFs or text files and ask questions about them ... This makes the assistant aware of the document’s content without fine-tuning. ... It’s surprisingly natural to just ask questions aloud and hear the bot reply. Every conversation is saved as a JSON file per session. When the app starts, it reloads the previous messages and keeps context. ... This was one of the most fun and practical AI side projects I’ve done.
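The article does not show its storage code. One way to implement "a JSON file per session, reloaded when the app starts" is sketched below; the directory layout, the `{"role": ..., "content": ...}` message shape, and the function names are all assumptions made for illustration.

```python
import json
from pathlib import Path


def session_path(session_dir: Path, session_id: str) -> Path:
    """One JSON file per session, named after the session id."""
    return session_dir / f"{session_id}.json"


def load_history(session_dir: Path, session_id: str) -> list[dict]:
    """Return the saved messages for a session, or an empty list for a new one."""
    path = session_path(session_dir, session_id)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def append_message(session_dir: Path, session_id: str, role: str, content: str) -> list[dict]:
    """Add one message and rewrite the session file so context survives a restart."""
    history = load_history(session_dir, session_id)
    history.append({"role": role, "content": content})
    session_dir.mkdir(parents=True, exist_ok=True)
    target = session_path(session_dir, session_id)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(target)  # swap in the new file in one step (atomic on POSIX)
    return history
```

Writing to a temporary file and then replacing the target keeps a crash mid-write from leaving a truncated session file behind.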
HomeSeer
forums.homeseer.com › software products › hs4 plugins › media plugins › ak google cast (alexbk66)
Feature request: use Coqui or other Local AI sites for voice... - HomeSeer Message Board
I locally installed Coqui TTS, which is an AI TTS system that lets you train voice AIs or download pretrained ones. It returns WAV files from a website with the text embedded in the web link, e.g. http://192.168.0.9:5003/api/tts?text=Hello+there I posted a David Attenborough sample for ...
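As the post describes, the Coqui TTS server takes the text as a `text` query parameter on `/api/tts` and returns a WAV file. A minimal sketch of building that URL with proper escaping and saving the result; the default host and port are taken from the post (the earlier HomeSeer snippet uses port 5002, this one 5003, so use whatever your server is bound to), and the function names are hypothetical.

```python
import urllib.parse
import urllib.request


def coqui_tts_url(text: str, host: str = "192.168.0.9", port: int = 5003) -> str:
    """Build http://host:port/api/tts?text=... with the text URL-encoded."""
    query = urllib.parse.urlencode({"text": text})  # spaces become '+', as in the post's example
    return f"http://{host}:{port}/api/tts?{query}"


def save_speech(text: str, output_path: str, **kwargs) -> None:
    """Fetch the generated WAV from a running Coqui TTS server and write it to disk."""
    with urllib.request.urlopen(coqui_tts_url(text, **kwargs)) as resp:
        with open(output_path, "wb") as out:
            out.write(resp.read())
```

Encoding the text with `urlencode` matters once the input contains characters such as `&` or `?`, which would otherwise break the query string.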
Smallest.ai
smallest.ai › blog › creating-local-text-to-speech-ai-voices-for-free
Creating Local Text-To-Speech AI Voices for Free
March 13, 2025 - ElevenLabs is a powerful AI-driven text-to-speech (TTS) platform known for its realistic and expressive AI voices. It leverages deep learning models to produce human-like speech with a natural tone, making it ideal for virtual assistants.