local text-to-speech ai windows

Best local open source Text-To-Speech and Speech-To-Text?

reddit.com › r › LocalLLaMA › comments › 1f0awd6 › best_local_open_source_texttospeech_and

I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions

11labs - Commercial
xtts
xtts2
Alltalk
Styletts2
Fish-Speech
PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
PiperUI
Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
Bark
Tortoise TTS
LMNT
AlwaysReddy - (uses Piper)
Open-LLM-VTuber
MeloTTS
OpenVoice
Sherpa-onnx
Silero
Neuro-sama
Parler TTS
Chat TTS
VallE-X
Coqui TTS
Daswers XTTS GUI
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com

GitHub

github.com › coqui-ai › TTS

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

If you are on Windows, 👑@GuyPaddock wrote installation instructions here. You can also try TTS without installing it by using the Docker image. Simply run the following commands:

docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python3 TTS/server/server.py --list_models  # To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits  # To start a server
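Once a server started this way is listening on port 5002, a client can request audio over HTTP. The sketch below assumes the demo server exposes a GET `/api/tts` route that takes a `text` query parameter and an optional `speaker_id`, and that it returns WAV bytes; verify this against the server version you run. The helper names `build_tts_url` and `synthesize` are hypothetical, and only the Python standard library is used.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text: str, speaker_id: Optional[str] = None,
                  host: str = "localhost", port: int = 5002) -> str:
    """Build the request URL for the (assumed) /api/tts endpoint."""
    params = {"text": text}
    if speaker_id:
        # Multi-speaker models such as VCTK need a speaker to be chosen.
        params["speaker_id"] = speaker_id
    return f"http://{host}:{port}/api/tts?{urlencode(params)}"


def synthesize(text: str, out_path: str,
               speaker_id: Optional[str] = None) -> int:
    """Fetch synthesized audio and write it to out_path; return byte count."""
    with urlopen(build_tts_url(text, speaker_id), timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the container running, a call such as `synthesize("Hello from a local server.", "hello.wav", speaker_id="p225")` would write the response to `hello.wav`; the speaker ID `p225` is an illustrative value and may differ for the model you load.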

Starred by 43.9K users

Forked by 5.9K users

reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?

r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?

August 24, 2024 -

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs into it (or at least developing the possibility of doing so in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions, like:

Faster Whisper (MIT license)
Insanely fast Whisper (Apache-2.0 license)
Distil-Whisper (MIT license)
WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)
WhisperLive (MIT license, Added here 03/2025)
WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)
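
As a rough illustration of how one of these re-implementations is used, the sketch below transcribes a file with Faster Whisper on CPU. It assumes the `faster-whisper` package is installed; the model size `small`, the `int8` compute type, and the helper names `format_timestamp` and `transcribe` are illustrative choices, not recommendations from this thread.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe(path: str, model_size: str = "small") -> List[str]:
    """Transcribe an audio file and return one timestamped line per segment."""
    # Imported lazily so format_timestamp works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus language info.
    segments, info = model.transcribe(path, beam_size=5)
    lines = []
    for seg in segments:
        start = format_timestamp(seg.start)
        end = format_timestamp(seg.end)
        lines.append(f"[{start} -> {end}] {seg.text.strip()}")
    return lines
```

Because the segments are generated lazily, the actual decoding happens while the loop runs, so long files produce output incrementally.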

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list so will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as simple locally run screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT license) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: StepFun AI's (Chinese AI team, ^source) Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even if it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

Top answer

1 of 38

2 of 38

I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ), which is based on Coqui and supports XTTS2, Piper and some others. I’m on a Mac, so my options are pretty limited, and this worked fairly well. If XTTS is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately most of my use is in SillyTavern, for narration and character TTS, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application, though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or a willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware you’ll be deploying this on.

Videos

youtube.com

TEXT TO SPEECH | Piper TTS on Windows AI voice 10x faster ...

12:44

YouTube

How to Install & Use Whisper AI Voice to Text - YouTube

April 5, 2023

474K

youtube.com

Most Realistic AI Text to speech! 100% Free, No Copyright, Offline ...

January 31, 2025

02:55

YouTube

Text to voice and mp3 using AI - using Windows and Word - Text ...

August 12, 2023

08:14

YouTube

How to Use OpenAI's Whisper for Perfect Transcriptions (Speech ...

October 8, 2025

07:36

YouTube

Best FREE Speech to Text AI | TurboScribe - YouTube

github.com › rhasspy › piper

GitHub - rhasspy/piper: A fast, local neural text to speech system

A fast, local neural text to speech system. Contribute to rhasspy/piper development by creating an account on GitHub.
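
For a sense of how Piper is driven from a script, the sketch below wraps its command-line tool, which reads text on standard input and writes a WAV file. It assumes a `piper` executable (`piper.exe` on Windows) is on the PATH and that a voice model such as `en_US-lessac-medium.onnx` has already been downloaded; the helper names `piper_command` and `speak_to_file` are hypothetical.

```python
import subprocess
from typing import List


def piper_command(model_path: str, out_path: str,
                  exe: str = "piper") -> List[str]:
    """Build the argument list for a single Piper synthesis run."""
    return [exe, "--model", model_path, "--output_file", out_path]


def speak_to_file(text: str, model_path: str, out_path: str,
                  exe: str = "piper") -> None:
    """Pipe text into Piper on stdin and let it write a WAV file."""
    subprocess.run(
        piper_command(model_path, out_path, exe),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with an error
    )
```

Passing the arguments as a list avoids shell quoting problems, which matters on Windows where model paths often contain spaces.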

Starred by 10.4K users

Forked by 874 users

OBS Forums

obsproject.com › home › forums › resources › plugins

Squawk - Real-Time Local Text-to-Speech with AI | OBS Forums

June 18, 2024 - royshilkrot submitted a new resource: Squawk - Real-Time Local Text-to-Speech with AI - Generative AI engine for speech in ... Hi, I still have the error for windows 11.

Pup On Tech

pupontech.com › voice-to-text-on-windows-with-local-processing

Voice-to-Text on Windows with local Processing

July 17, 2025 - I’ve been going down a rabbit hole trying to find a good way to do transcription on Windows so I could do voice-to-text on Windows using local AI processing. Local processing is cool since it doesn’t destroy our environment ….. as much XD. After a bit of searching I

Microsoft Support

support.microsoft.com › en-us › windows › use-voice-typing-to-talk-instead-of-type-on-your-pc-fec94565-c4bd-329d-e59a-af033fa5689f

Use voice typing to talk instead of type on your PC - Microsoft Support

Use dictation to convert spoken words into text anywhere on your PC with Windows.

Microsoft Support

support.microsoft.com › en-us › topic › download-languages-and-voices-for-immersive-reader-read-mode-and-read-aloud-4c83a8d8-7486-42f7-8e46-2b0fdf753130

Download languages and voices for Immersive Reader, Read Mode, and Read Aloud - Microsoft Support

Language packs with text-to-speech capabilities will have the text-to-speech icon. Select the language you would like to download, then select Next. Next, you will see the features available in your selected language and their download sizes. Check or uncheck boxes to choose which features you'd like to install, then select Install. Set as my Windows display language: translates Windows features, such as Settings and File Explorer, into your selected language.

Microsoft Learn

learn.microsoft.com › en-us › azure › ai-services › speech-service › speech-to-text

Speech to text overview - Speech service - Foundry Tools | Microsoft Learn

Transcribing spoken words into written text for documentation purposes. Enabling interactive voice response systems to transcribe user queries and commands. Real-time speech to text can be accessed via the Speech SDK, Speech CLI, and REST API, allowing integration into various applications and workflows.

reddit.com › r/machinelearning › [d] locally-runnable text to speech ai?

r/MachineLearning on Reddit: [D] Locally-runnable text to speech AI?

February 10, 2023 -

I've got a 4090 and some stuff that I think it would be fun to have narrated. I've looked at some of the paid online options and $20-$30/mo for 2 hours of AI TTS is not gonna cut it. Can anyone point me to software that I can run locally that'll give me high quality?

It seems like if people are making billions of waifus in stable diffusion there ought to be something like this out there.

Top answer

1 of 15

Try TortoiseTTS on the highest quality setting

2 of 15

Pyttsx, mbrola, mimic 3. I like mimic 3, which is lightweight and can run in Docker or natively. I started out with Mycroft, which has mimic 3 built in, but you can run it standalone as well, and it is quite easy to set up. https://mycroft.ai/mimic-3/ If you want to go down the rabbit hole of speech synthesis and analysis, check out Praat (praat.org). It's a quite impressive piece of software.
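
As a minimal sketch of the first option in that answer, the snippet below speaks a sentence through the operating system's installed voices. It assumes the `pyttsx3` package, the maintained Python 3 successor to Pyttsx, which uses SAPI5 on Windows; the helper names `pick_voice_id` and `speak` and the default rate of 170 are illustrative.

```python
from typing import Iterable, Optional


def pick_voice_id(voices: Iterable, name_fragment: str) -> Optional[str]:
    """Return the id of the first voice whose name contains the fragment."""
    needle = name_fragment.lower()
    for voice in voices:
        if needle in (voice.name or "").lower():
            return voice.id
    return None


def speak(text: str, voice_name: Optional[str] = None,
          rate: int = 170) -> None:
    """Speak text aloud using the system's installed voices."""
    # Imported lazily so pick_voice_id works without the package installed.
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # approximate words per minute
    if voice_name:
        voice_id = pick_voice_id(engine.getProperty("voices"), voice_name)
        if voice_id:
            engine.setProperty("voice", voice_id)
    engine.say(text)
    engine.runAndWait()  # blocks until the queued speech has finished
```

This route gives the robotic-sounding system voices rather than neural ones, so it is mainly useful when low latency and zero model downloads matter more than quality.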

Notta

notta.ai › en › blog › speech-to-text-open-source

13 Best Free Speech-to-Text Open Source Engines, APIs, and AI Models

The transcription accuracy largely depends on whether you have the right language and acoustic model. The project is written in the most common language, C, allowing it to work in Windows, Linux, Android, and macOS systems. ... Julius can perform real-time speech-to-text transcription with low memory usage.

reddit.com › r/windowsapps › speechpulse - a windows app for dictation and file transcription using whisper ai models and apis - now supports realtime ai text formatting and automatic speaker diarization

r/windowsapps on Reddit: SpeechPulse - A Windows app for dictation and file transcription using Whisper AI models and APIs - Now supports realtime AI text formatting and automatic speaker diarization

June 24, 2024 -

Hi,

I am the developer of the SpeechPulse speech recognition application available for Windows.

SpeechPulse uses offline Whisper AI models and Whisper APIs for real-time speech recognition. It can type into any text input area, including text editors, web browsers, and office applications.

You can also use AI language models and OpenAI-compatible LLM APIs to enhance/transform your dictations in real time. SpeechPulse supports customizable AI templates so you can prompt your AI models and APIs for your requirements. Example use cases include grammar correction and text enhancement, Email formatting, text summarization, and code generation.

SpeechPulse also supports batch file transcription and subtitle generation. I also recently added automatic speaker diarization to the file mode. Now SpeechPulse can automatically detect how many speakers are in the audio file and then automatically segment the transcription for each individual speaker.

SpeechPulse has a one-time fee. You can also try SpeechPulse with its 30-day free trial.

I would appreciate hearing your feedback and suggestions!

Thanks.

Top answer

1 of 5

I am trying to use this for medical documentation. I tried using your default file and default settings. Today was my first day. It didn't do well at all with medications such as "metformin" or "jardiance". Do you have any suggestions?

2 of 5

This looks cool and exactly what I'm looking for! I speak fast so I've never had much luck with the built-in Windows tool. Looking forward to testing it

YouTube

youtube.com › watch

How to Use Whisper AI Speech to Text Locally - Tested and Works even with only CPU - YouTube

07:37

🚀 Learn How to Use Whisper AI for Speech-to-Text Locally! 🗣💻In this video, I’ll walk you through a complete guide on how to install and use Whisper AI, th...

Published September 25, 2024

reddit.com › r/automate › any ai tool for speech to text for windows

r/Automate on Reddit: Any AI tool for speech to text for Windows

March 15, 2025 -

My office laptop has blocked the Windows+H combination, which would seamlessly let me speak to type so that I don't have to use my hands. I'm looking for a similar tool, hopefully portable, which I can use on my office laptop. Could you please help?

Top answer

1 of 5

DM me and I will share a script which you can run locally to do that.

2 of 5

Try the Windows 11 Voice Access feature. Go to Settings -> Accessibility -> Speech and turn Voice Access on. It will dictate whatever you say when an empty text box is clicked. It also helps you control the computer with your voice.

Vidnoz

vidnoz.com › ai-solutions › windows-text-to-speech.html

Harness the Power of Windows Text to Speech: Simple Setup and Pro Tips

Unlike traditional text-to-speech conversion, creating talking videos with AI avatars will be more engaging. ... Windows text-to-speech is a feature on Microsoft Windows that acts as built-in accessibility software which converts written text into spoken words.

Microsoft Store

apps.microsoft.com › detail › 9ngvr2b94trh

Text to Speech AI Voice - Free download and install on Windows | Microsoft Store

Text to Speech AI Voice – Convert Text into Natural Speech! 🎙️ 🚀 Transform text into realistic AI-generated speech with Text to Speech AI Voice. This app provides lifelike voices in 89 languages, supports multiple dialects and accents, ...

reddit.com › r/openai › looking for desktop apps that does speech to text directly at the cursor, using either openai whisper api or locally

r/OpenAI on Reddit: Looking for desktop apps that does speech to text directly at the cursor, using either OpenAI Whisper API or locally

May 28, 2023 -

Hi there, the Whisper model is the most powerful, the most capable speech to text (STT) implementation available to the public I have ever seen. Is there an app that will place the transcription directly at my cursor in Windows and/or macOS?

The closest I have seen do what I am asking for is

Windows https://github.com/Const-me/Whisper

macOS https://superwhisper.com/

Top answer

1 of 4

For windows: https://whispertyping.com

2 of 4

Here's an open source tool - https://github.com/dhruvyad/uttertype . You can see the demo video here .

Listnr AI

listnr.ai › blog › creating-local-text-to-speech-ai-voices-for-free

Creating Local Text-To-Speech AI Voices for Free: A Step-by-Step Guide | Listnr AI

1. What is local TTS voice generation? Local TTS voice generation refers to creating text-to-speech models on your own machine using open-source tools, giving you full control over the data and customization.

Speech to Text Conversion

speech-to-text.cloud › home › blog › locally installed tools for speech-to-text transcription and translation with faster-whisper and speech note on windows and linux

Local Speech-to-Text Transcription on Windows and Linux • Online Speech to Text Cloud

June 14, 2024 - If you are working with Windows, ... on how to use it in the next section. Faster-Whisper can be used to transcribe audio files and perform language translation on your local machine....

Voice.ai

voice.ai › home › how to use microsoft text to speech in windows settings step by step

How to Use Microsoft Text to Speech in Windows Settings Step by Step - Voice.ai

September 20, 2025 - This article outlines simple steps to configure voice settings, select a speech engine and voices, and enable read aloud in Windows, allowing your computer to speak clearly and your daily tasks to move faster and feel easier. Voice AI’s ...

TechRadar

techradar.com › pro › software & services

Best free text-to-speech software of 2025 | TechRadar

September 19, 2025 - The second option takes the form of a floating toolbar. In this mode, you can highlight text in any application and use the toolbar controls to start and customize text-to-speech. This means you can very easily use the feature in your web browser, word processor and a range of other programs.