I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions

  • 11labs - Commercial
  • xtts
  • xtts2
  • Alltalk
  • Styletts2
  • Fish-Speech
  • PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
  • PiperUI
  • Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
  • Bark
  • Tortoise TTS
  • LMNT
  • AlwaysReddy - (uses Piper)
  • Open-LLM-VTuber
  • MeloTTS
  • OpenVoice
  • Sherpa-onnx
  • Silero
  • Neuro-sama
  • Parler TTS
  • Chat TTS
  • VallE-X
  • Coqui TTS
  • Daswers XTTS GUI
  • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com
GitHub
github.com › coqui-ai › TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
If you are on Windows, 👑@GuyPaddock wrote installation instructions here. You can also try TTS without installing it by using the Docker image. Simply run the following commands:

    docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
    python3 TTS/server/server.py --list_models # To get the list of available models
    python3 TTS/server/server.py --model_name tts_models/en/vctk/vits # To start a server
Starred by 43.9K users
Forked by 5.9K users
Languages: Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%
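Once the server from the Docker commands above is running, it can be queried over HTTP. The following is a minimal sketch, not the project's official client: it assumes the demo server exposes a GET route at /api/tts that takes a `text` parameter (and an optional `speaker_id` for multi-speaker models such as VCTK) and returns WAV audio. The helper names `build_tts_url` and `synthesize` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002, speaker_id=None):
    """Build a request URL for the server's /api/tts route (assumed route name)."""
    params = {"text": text}
    if speaker_id is not None:
        # Multi-speaker models need a speaker; valid IDs depend on the model.
        params["speaker_id"] = speaker_id
    return f"http://{host}:{port}/api/tts?{urlencode(params)}"


def synthesize(text, out_path="out.wav", **kwargs):
    """Fetch audio from a running server and write the response bytes to a file."""
    with urlopen(build_tts_url(text, **kwargs)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the container running, calling `synthesize("Hello world")` should write `out.wav` in the current directory, provided the route and parameters match the server version in use.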
Reddit
reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?
r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?
August 24, 2024 -

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs into it (or at least making that possible in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions like:

  • Faster Whisper (MIT license)

  • Insanely fast Whisper (Apache-2.0 license)

  • Distil-Whisper (MIT license)

  • WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)

  • WhisperLive (MIT license, Added here 03/2025)

  • WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

  • Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

  • StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list so will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as a simple, locally run screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT License) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: StepFun AI's (Chinese AI-team source) Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even if it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem, added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

2 of 38
15
I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper and some others. I’m on a Mac so my options are pretty limited, and this worked fairly well. If XTTS is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately most of my cases are in SillyTavern, for narration and character TTS, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or a willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware where you’ll be deploying this.
GitHub
github.com › rhasspy › piper
GitHub - rhasspy/piper: A fast, local neural text to speech system
Starred by 10.4K users
Forked by 874 users
Languages: C++ 72.8% | Python 18.8% | Jupyter Notebook 7.6% | CMake 0.3% | Shell 0.2% | Dockerfile 0.1%
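Piper is typically driven from the command line by piping text into the `piper` binary. Below is a small sketch of wrapping that from Python. It assumes a `piper` executable is on the PATH, that a voice model such as `en_US-lessac-medium.onnx` has already been downloaded, and that the binary accepts the `--model` and `--output_file` flags; the helper names are hypothetical.

```python
import subprocess


def build_piper_command(model_path, output_path, piper_bin="piper"):
    """Return the argument list for one synthesis run (text is sent on stdin)."""
    return [piper_bin, "--model", model_path, "--output_file", output_path]


def speak_to_file(text, model_path, output_path="speech.wav", piper_bin="piper"):
    """Pipe text into Piper and return the path of the WAV file it writes."""
    cmd = build_piper_command(model_path, output_path, piper_bin)
    # Piper reads lines from stdin, so terminate the text with a newline.
    subprocess.run(cmd, input=(text + "\n").encode("utf-8"), check=True)
    return output_path
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact invocation before running it.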
OBS Forums
obsproject.com › home › forums › resources › plugins
Squawk - Real-Time Local Text-to-Speech with AI | OBS Forums
June 18, 2024 - royshilkrot submitted a new resource: Squawk - Real-Time Local Text-to-Speech with AI - Generative AI engine for speech in ... Hi, I still have the error for Windows 11.
Pup On Tech
pupontech.com › voice-to-text-on-windows-with-local-processing
Voice-to-Text on Windows with local Processing
July 17, 2025 - I’ve been going down a rabbit hole trying to find a good way to do transcription on Windows so I could do voice-to-text on Windows using local AI processing. Local processing is cool since it doesn’t destroy our environment… as much XD. After a bit of searching I
Microsoft Support
support.microsoft.com › en-us › topic › download-languages-and-voices-for-immersive-reader-read-mode-and-read-aloud-4c83a8d8-7486-42f7-8e46-2b0fdf753130
Download languages and voices for Immersive Reader, Read Mode, and Read Aloud - Microsoft Support
Language packs with text-to-speech capabilities will have the text-to-speech icon. Select the language you would like to download, then select Next. Next, you will see the features available in your selected language and their download sizes. Check or uncheck boxes to choose which features you'd like to install, then select Install. Set as my Windows display language: translates Windows features, such as Settings and File Explorer, into your selected language.
Microsoft Learn
learn.microsoft.com › en-us › azure › ai-services › speech-service › speech-to-text
Speech to text overview - Speech service - Foundry Tools | Microsoft Learn
Transcribing spoken words into written text for documentation purposes. Enabling interactive voice response systems to transcribe user queries and commands. Real-time speech to text can be accessed via the Speech SDK, Speech CLI, and REST API, allowing integration into various applications and workflows.
Notta
notta.ai › en › blog › speech-to-text-open-source
13 Best Free Speech-to-Text Open Source Engines, APIs, and AI Models
The transcription accuracy largely depends on whether you have the right language and acoustic model. The project is written in the most common language, C, allowing it to work in Windows, Linux, Android, and macOS systems. ... Julius can perform real-time speech-to-text transcription with low memory usage.
Reddit
reddit.com › r/windowsapps › speechpulse - a windows app for dictation and file transcription using whisper ai models and apis - now supports realtime ai text formatting and automatic speaker diarization
r/windowsapps on Reddit: SpeechPulse - A Windows app for dictation and file transcription using Whisper AI models and APIs - Now supports realtime AI text formatting and automatic speaker diarization
June 24, 2024 -

Hi,

I am the developer of the SpeechPulse speech recognition application available for Windows.

SpeechPulse uses offline Whisper AI models and Whisper APIs for real-time speech recognition. It can type into any text input area, including text editors, web browsers, and office applications.

You can also use AI language models and OpenAI-compatible LLM APIs to enhance/transform your dictations in real time. SpeechPulse supports customizable AI templates so you can prompt your AI models and APIs for your requirements. Example use cases include grammar correction and text enhancement, Email formatting, text summarization, and code generation.

SpeechPulse also supports batch file transcription and subtitle generation. I also recently added automatic speaker diarization to the file mode. Now SpeechPulse can automatically detect how many speakers are in the audio file and then automatically segment the transcription for each individual speaker.

SpeechPulse has a one-time fee. You can also try SpeechPulse with its 30-day free trial.

I would appreciate hearing your feedback and suggestions!

Thanks.
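The post above mentions subtitle generation from transcriptions. As a rough illustration of that step in general (not SpeechPulse's actual implementation, which is not shown in the post), here is a sketch that turns timed segments into SubRip (SRT) text. The input shape, a list of `(start_seconds, end_seconds, text)` tuples, and both function names are assumptions made for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

If speaker labels are available from diarization, one simple option is to prefix each cue's text with the speaker name before passing the segments in.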

YouTube
youtube.com › watch
How to Use Whisper AI Speech to Text Locally - Tested and Works even with only CPU - YouTube
🚀 Learn How to Use Whisper AI for Speech-to-Text Locally! 🗣💻 In this video, I’ll walk you through a complete guide on how to install and use Whisper AI, th...
Published: September 25, 2024
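For reference, running Whisper locally on a CPU can look roughly like the sketch below. It is not taken from the video; it assumes the `openai-whisper` Python package and `ffmpeg` are installed, and the helper names `simplify_segments` and `transcribe_on_cpu` are hypothetical.

```python
def simplify_segments(result):
    """Reduce a Whisper-style result dict to (start, end, text) tuples."""
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in result.get("segments", [])
    ]


def transcribe_on_cpu(audio_path, model_name="base"):
    """Transcribe an audio file on the CPU and return timed segments."""
    # Imported lazily so simplify_segments works without the package installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name, device="cpu")
    # fp16=False requests full precision, which is what CPUs support.
    result = model.transcribe(audio_path, fp16=False)
    return simplify_segments(result)
```

Smaller models such as `tiny` or `base` are the usual starting point on CPU-only machines; larger ones trade speed for accuracy.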
Vidnoz
vidnoz.com › ai-solutions › windows-text-to-speech.html
Harness the Power of Windows Text to Speech: Simple Setup and Pro Tips
Unlike traditional text-to-speech conversion, creating talking videos with AI avatars will be more engaging. ... Windows text-to-speech is a feature on Microsoft Windows that acts as built-in accessibility software which converts written text into spoken words.
Microsoft Store
apps.microsoft.com › detail › 9ngvr2b94trh
Text to Speech AI Voice - Free download and install on Windows | Microsoft Store
Text to Speech AI Voice – Convert Text into Natural Speech! 🎙️ 🚀 Transform text into realistic AI-generated speech with Text to Speech AI Voice. This app provides lifelike voices in 89 languages, supports multiple dialects and accents, ...
Reddit
reddit.com › r/openai › looking for desktop apps that does speech to text directly at the cursor, using either openai whisper api or locally
r/OpenAI on Reddit: Looking for desktop apps that does speech to text directly at the cursor, using either OpenAI Whisper API or locally
May 28, 2023 -

Hi there, the Whisper model is the most powerful, the most capable speech to text (STT) implementation available to the public I have ever seen. Is there an app that will place the transcription directly at my cursor in Windows and/or macOS?

The closest I have seen to what I am asking for are:

Windows https://github.com/Const-me/Whisper

macOS https://superwhisper.com/

Listnr AI
listnr.ai › blog › creating-local-text-to-speech-ai-voices-for-free
Creating Local Text-To-Speech AI Voices for Free: A Step-by-Step Guide | Listnr AI
1. What is local TTS voice generation? Local TTS voice generation refers to creating text-to-speech models on your own machine using open-source tools, giving you full control over the data and customization.
Voice.ai
voice.ai › home › how to use microsoft text to speech in windows settings step by step
How to Use Microsoft Text to Speech in Windows Settings Step by Step - Voice.ai
September 20, 2025 - This article outlines simple steps to configure voice settings, select a speech engine and voices, and enable read aloud in Windows, allowing your computer to speak clearly and your daily tasks to move faster and feel easier. Voice AI’s ...
TechRadar
techradar.com › pro › software & services
Best free text-to-speech software of 2025 | TechRadar
September 19, 2025 - The second option takes the form of a floating toolbar. In this mode, you can highlight text in any application and use the toolbar controls to start and customize text-to-speech. This means you can very easily use the feature in your web browser, word processor and a range of other programs.