best text-to-speech ai local

Best local open source Text-To-Speech and Speech-To-Text?

reddit.com › r › LocalLLaMA › comments › 1f0awd6 › best_local_open_source_texttospeech_and

I've been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions

11labs - Commercial
xtts
xtts2
Alltalk
Styletts2
Fish-Speech
PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
PiperUI
Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
Bark
Tortoise TTS
LMNT
AlwaysReddy - (uses Piper)
Open-LLM-VTuber
MeloTTS
OpenVoice
Sherpa-onnx
Silero
Neuro-sama
Parler TTS
Chat TTS
VallE-X
Coqui TTS
Daswers XTTS GUI
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com

r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?

August 23, 2024

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs in it (or at least making that possible in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions like:

Faster Whisper (MIT license)
Insanely fast Whisper (Apache-2.0 license)
Distil-Whisper (MIT license)
WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)
WhisperLive (MIT license, Added here 03/2025)
WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list so will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT, English only [Can be tried here.]. Update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as simple locally run screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT license) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. A 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here.

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: Step-Audio 2 Mini (Apache-2.0 license) by StepFun AI, a Chinese AI team. It is an 8B speech-to-speech (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even though it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.

Videos

RIP ELEVENLABS! Create PERFECT TTS AI Voices LOCALLY For FREE! ... (YouTube, 30:47, September 20, 2025)

RIP ELEVENLABS! Here's The BEST TTS AI Voices LOCALLY For FREE! ... (YouTube, 21:59, April 23, 2025)

RIP ELEVENLABS! Create BEST TTS AI Voices LOCALLY For FREE! - YouTube (YouTube, 17:45, May 9, 2024)

ComfyUI Tutorial Series Ep 33: How to Use Free & Local Text-to-Speech ... (YouTube, 162K, February 11, 2025)

Possibly THE BEST Open Source Text-to-Speech Model - VibeVoice ... (YouTube, 15:48, September 2, 2025)

People also ask

What is the best free AI text to speech tool?

The best free AI text to speech tool depends on what you need: natural voice quality, a variety of languages and accents, low latency, or accessibility. · Murf AI outperforms the competition in almost every category. Our free plan gives you access to all the features of our full voice generation studio. Use the free tool on our site to generate audio completely free, no sign-up required. · Here are some key features that make Murf the best free TTS tool: · Fully free-to-use AI voiceover studio (no credit card required) · Ultra-realistic, human-like voices that are contextually aware · Full

murf.ai

murf.ai › text-to-speech

Free Text to Speech Online with 200+ Realistic AI Voices | Murf AI

What is text to speech?

Text to speech, or TTS, also known as speech synthesis or "read aloud," is a technology that converts digital text into speech. The technology uses advanced AI algorithms and leverages AI voices (also known as synthetic voices) to generate audio from written text. It was originally developed as an assistive technology for better accessibility, but now TTS has an array of modern use cases.

How does text to speech work?

Text to speech works in three simple steps:
· Text normalization (text-to-word conversion): Pre-processing, or text normalization, helps the TTS software clean up the input by expanding abbreviations, acronyms, dates, and other symbols into readable words.
· Phonetic conversion (word-to-phoneme conversion): The TTS model identifies phonemes, the basic units of sound, in the normalized text, using a phoneme library to guide pronunciation.
· Speech synthesis (phoneme-to-sound conversion): A synthetic or AI-generated voice reads the phonemes aloud using pre-recorded samples or learned speech patter
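The first two steps above can be illustrated with a toy Python sketch. This is not how Murf or any particular TTS engine implements them: the lookup tables `ABBREVIATIONS`, `DIGITS`, and `LEXICON` and the functions `normalize` and `to_phonemes` are hypothetical names invented for illustration, and the third step (turning phonemes into audio) is left out because it needs a trained model or recorded samples.

```python
import re

# Hypothetical, deliberately tiny lookup tables (illustration only).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "no.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "street": ["S", "T", "R", "IY", "T"],
}


def normalize(text):
    """Step 1: expand abbreviations and digits into plain lowercase words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Read numbers digit by digit; real systems handle dates, sums, etc.
            words.extend(DIGITS[int(d)] for d in token)
        else:
            cleaned = re.sub(r"[^a-z']", "", token)  # drop punctuation
            if cleaned:
                words.append(cleaned)
    return words


def to_phonemes(words):
    """Step 2: look each word up; unknown words fall back to their letters."""
    return [LEXICON.get(word, list(word.upper())) for word in words]


if __name__ == "__main__":
    words = normalize("Dr. Smith lives at 42 Main St.")
    print(words)
    print(to_phonemes(words))
```

A production system would replace the dictionaries with a full normalization grammar and a pronunciation model, but the data flow (text, then words, then phonemes) stays the same.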

Devtalk

forum.devtalk.com › ai forum › ai questions/help

What are the best text-to-speech ai generation tools that you can run locally? - AI Questions/Help - Devtalk

March 20, 2025 - Background Lately I am on a quest to find a good quality TTS AI generation tool to run locally in order to create audio for some videos I am making. I have limited knowledge on the topic of Neural/Bayesian networks and the area has moved a lot since the last time I studied it in detail, almost ...

Modal

modal.com › blog › open-source-tts

The Top Open-Source Text to Speech (TTS) Models

This especially matters to use ... run on a local device, like a smart device or smartphone, or for companies producing speech at scale where costs might overwhelm other metrics of realism. If you are building a text-to-speech powered application for the first time, we highly recommend starting with Chatterbox. Developed by Resemble AI, Chatterbox ...

GitHub

github.com › coqui-ai › TTS

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option. ... If you plan to code or train models, clone 🐸TTS and install it locally:

    git clone https://github.com/coqui-ai/TTS
    cd TTS
    pip install -e .[all,dev,notebooks]  # Select the relevant extras

Starred by 43.9K users

Forked by 5.8K users

BentoML

bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models

The Best Open-Source Text-to-Speech Models in 2026

Built on a compact 0.5B-parameter LLM backbone, it delivers near-human speech quality and real-time performance. Unlike most cloud-locked systems, NeuTTS Air provides embedded voice AI capabilities on local devices such as laptops, mobile phones, ...

reddit.com › r/machinelearning › [d] locally-runnable text to speech ai?

r/MachineLearning on Reddit: [D] Locally-runnable text to speech AI?

February 10, 2023 -

I've got a 4090 and some stuff that I think it would be fun to have narrated. I've looked at some of the paid online options and $20-$30/mo for 2 hours of AI TTS is not gonna cut it. Can anyone point me to software that I can run locally that'll give me high quality?

It seems like if people are making billions of waifus in stable diffusion there ought to be something like this out there.

Top answer

1 of 15

Try TortoiseTTS on the highest quality setting

2 of 15

Pyttsx, mbrola, mimic 3. I like mimic 3, which is lightweight and can run on Docker or just natively. I started out with Mycroft, which has mimic 3 built in, but you can run it standalone as well and it is quite easy to set up. https://mycroft.ai/mimic-3/ If you want to go down the rabbit hole of speech synthesis and analysis, check out Praat (praat.org); it's a quite impressive piece of software.

Northflank

northflank.com › blog › best-open-source-text-to-speech-models-and-how-to-run-them

Best open source text-to-speech models and how to run them | Blog — Northflank

Explore the best open source text-to-speech models like XTTS-v2, Mozilla TTS, and Bark. Learn how to choose, deploy, and scale them for production with GPU support using Northflank.

Play.ht

play.ht

#1 Free AI Voice Generator, Text to Speech, & AI Voice Over

The Best AI Voice Generator with 200+ realistic AI voices. PlayAI is the voice platform for creators & enterprises. See our low latency Text to Speech API.

Murf AI

murf.ai › text-to-speech

Free Text to Speech Online with 200+ Realistic AI Voices | Murf AI

Text to Speech Converter by Murf

Our MCP server allows you to directly ... in your local setup. Deploy production-ready Python SDKs and get your first API call running under 5 minutes. Control voice styles, pitch and pauses with SSML tags. ... We support text to speech in all popular languages from English and Spanish to Chinese and French. Use our multilingual AI generated ... Convert Text to Voice using Murf Text to Speech online software. Input Text and convert to audio using realistic AI voices & download in mp3, wav, flac formats. Murf TTS supports 120 voices and 20 languages

Rating: 5

WillowTree

willowtreeapps.com › craft › 10-speech-to-text-models-tested

We Tested 10 Speech-to-Text Models, See Which Perform Best

Last, we evaluated whisper-large-v3-local on an Apple MacBook Pro running a M3 Max chip, 36 GB of memory, and MacOS Sequoia 15.1. Overall, assemblyai-universal-2 appeared to be the best speech-to-text model we tested.

YouTube

youtube.com › watch

My Top 5 Open-Source AI Text-to-Speech Models - YouTube

28:01

Links referenced in the video: The TTS Interface I Used - https://github.com/JarodMica/audiobook_maker GPT-SoVITS - https://github.com/RVC-Boss/GPT-SoVITS Fish ...

Published February 12, 2025

Medium

medium.com › @himimemo › top-local-tts-models-with-voice-cloning-to-try-in-november-2024-947ff48c6fe6

Top Local TTS Models with Voice Cloning to Try in November 2024 | by Himimemo | Medium

November 4, 2024 -

XTTS-V2
Demo: https://huggingface.co/spaces/coqui/xtts
Github: https://github.com/coqui-ai/TTS
Model License: https://coqui.ai/cpml (NC)

E2-TTS and F5-TTS
Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Github: https://github.com/SWivid/F5-TTS
Model License: CC-BY-NC-4.0

MaskGCT
Demo: https://huggingface.co/spaces/amphion/maskgct
Github: https://github.com/open-mmlab/Amphion
Model License: CC-BY-NC-4.0

Fish Speech
Demo: https://huggingface.co/spaces/fishaudio/fish-speech-1
Github: https://github.com/fishaudio/fish-speech
Model License: CC-BY-NC-SA 4.0

ElevenLabs

elevenlabs.io

Free AI Voice Generator & Voice Agents Platform | ElevenLabs

ElevenLabs Text to Speech

High quality, human-like AI voice generator in 70 languages

(4.5)

Price US$5.00

Google Cloud

cloud.google.com › text-to-speech

Text-to-Speech AI: Lifelike Speech Synthesis | Google Cloud

Try Gemini 3, our best model for reasoning, coding, and multimodal understanding in Vertex AI ... Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.

SPEECHMA

speechma.com › english

SPEECHMA - Best Free Text to Speech Online | Unlimited AI Voices | Commercial Use TTS 2025

Best free text to speech converter with 580+ natural AI voices. Unlimited usage with commercial license. Perfect for YouTube, TikTok & content creation.

Picovoice

picovoice.ai › blog › local-text-to-speech-with-cloud-quality

Local Text-to-Speech with Cloud Quality - Picovoice

April 29, 2025 - Everything you need to know about local text-to-speech (TTS): the challenges, choosing the right local TTS, and why Orca is developers' top choice in 2025.

Nerdynav

nerdynav.com › open-source-ai-voice

Best FREE ElevenLabs Alternatives & Opensource Text to Speech Models (2025) | Nerdynav

Self-Host (Free for Personal): Their 0.5B distilled model OpenAudio S1-mini can be downloaded and run locally. Try at HuggingFace · Commercial Users: The $9.99/month plan or the pay-as-you-go API ($15/1M chars) is a no-brainer compared to ElevenLabs’ pricing which can cost upwards of $100+/month. You get better quality at 80% less cost. ... If you absolutely need a 100% free solution for commercial use, Chatterbox is your best bet. ... Chatterbox is an MIT-licensed AI text to speech model from Resemble AI.

Smallest.ai

smallest.ai › blog › creating-local-text-to-speech-ai-voices-for-free

Creating Local Text-To-Speech AI Voices for Free

So, what if you could create local text-to-speech AI voices for free? If you need a voiceover for your content, a chatbot for your business, or accessibility features but don't want to pay, you are in the right place! This guide will show you how TTS works, the best free tools, how to ...

LocalAI

localai.io › features › text-to-audio

🗣 Text to audio (TTS) :: LocalAI

1 month ago - The LocalAI TTS API is compatible with the OpenAI TTS API and the Elevenlabs API. The /tts endpoint can also be used to generate speech from text.
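As a rough sketch of what calling the `/tts` endpoint mentioned above might look like from Python, the snippet below builds a JSON POST request with the standard library. Several things here are assumptions rather than facts from the snippet: the server address `http://localhost:8080`, the JSON field names `model` and `input` (borrowed from the OpenAI-style request shape), and the helper names `build_tts_request` and `synthesize`. Check the LocalAI documentation for the exact schema and for the model names installed on your instance.

```python
import json
import urllib.request


def build_tts_request(text, model, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /tts endpoint.

    The "model" and "input" field names and the default base_url are
    assumptions for illustration, not confirmed by the source.
    """
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, out_path, base_url="http://localhost:8080"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, model, base_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With a server running, something like `synthesize("Hello there", "my-voice-model", "hello.wav")` would save the response body to disk, where `my-voice-model` is a placeholder for whatever model the instance actually serves.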