local ai speech-to text - Brave Search

Best local open source Text-To-Speech and Speech-To-Text?

reddit.com › r › LocalLLaMA › comments › 1f0awd6 › best_local_open_source_texttospeech_and

I’ve been trying to keep a list of TTS solutions. Here you go:

Text to Speech Solutions

11labs - Commercial
xtts
xtts2
Alltalk
Styletts2
Fish-Speech
PiperTTS - A fast, local neural text to speech system that is optimized for the Raspberry Pi 4.
PiperUI
Paroli - Streaming mode implementation of the Piper TTS with RK3588 NPU acceleration support.
Bark
Tortoise TTS
LMNT
AlwaysReddy - (uses Piper)
Open-LLM-VTuber
MeloTTS
OpenVoice
Sherpa-onnx
Silero
Neuro-sama
Parler TTS
Chat TTS
VallE-X
Coqui TTS
Daswers XTTS GUI
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech

Answer from jpummill2 on reddit.com

localai.io › features › audio-to-text

🔈 Audio to text :: LocalAI

1 month ago - Audio to text models are models that can generate text from an audio file. The transcription endpoint converts audio files to text. The endpoint is based on whisper.cpp, a C++ library for audio transcription. The endpoint input supports all the audio formats supported by ffmpeg.
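As a rough illustration of calling such a transcription endpoint, the sketch below builds a multipart upload in Python without sending it. The snippet above does not give the endpoint path, so the base URL `http://localhost:8080`, the path `/v1/audio/transcriptions`, the form field names, and the model name `whisper-1` are all assumptions to check against your own LocalAI configuration. `build_transcription_request` is a hypothetical helper name.

```python
import io
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename, model="whisper-1",
                                base_url="http://localhost:8080"):
    """Build (but do not send) a multipart POST carrying an audio file.

    ASSUMPTIONS: base_url, the /v1/audio/transcriptions path, the field
    names and the model name are illustrative, not taken from the source.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Form field: which model should transcribe the audio.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # Form field: the audio file itself, sent as raw bytes.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary marks the end of the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With a server running, the request could be sent with `urllib.request.urlopen(req)` and the response body read from the returned object. Building the request separately from sending it keeps the sketch easy to inspect offline.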

reddit.com › r/localllama › best speech to text transcription? local model or api

r/LocalLLaMA on Reddit: Best speech to text transcription? Local model or api

July 4, 2024 -

Hey guys, I would like to add speech to text transcription. Hosting an open source model on cloud so I can do this anywhere would be good.

Do you guys know of any highly accurate open-source STT models?

Also, can it run on CPU, or is a GPU a must?

Whisper is really good, at least for single speaker text. It's less good for multi-speaker text, but still plenty usable.

Whisper.cpp runs on CPU, is pretty fast if you have decent hardware, and in my experience recognizes almost all words and sentences correctly. I was definitely very satisfied with the results. I never tested it with languages other than English, so other languages may require thorough testing before you settle on a specific model size.

Discussions


Squawk - Real-Time Local Text-to-Speech with AI

royshilkrot submitted a new resource: Squawk - Real-Time Local Text-to-Speech with AI - Generative AI engine for speech in The OBS Squawk plugin adds powerful voice cloning capabilities to OBS by leveraging sherpa-onnx. With this plugin, you can generate speech on the fly and in real-time... More on obsproject.com

obsproject.com

22

June 19, 2024

What are the best text-to-speech ai generation tools that you can run locally? - AI Questions/Help - Devtalk

Background Lately I am on a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I am making. I have limited knowledge on the topic of Neural/Bayesian networks and the area has moved a lot since the last time I studied it in detail, almost ... More on forum.devtalk.com

forum.devtalk.com

2

March 20, 2025

Feature request: use Coqui or other Local AI sites for voice... - HomeSeer Message Board

Add server: (wav or mp3) Link (192.168.0.9:5002/api/tts?text=) Then you can select a voice server-- or maybe have an option to change the server from an event.... HS3 Pro Edition 3.0.0.435 (Windows Server 8.1 on ESXi box) Plug-Ins Enabled: Z-Wave:,RaspberryIO:,AirplaySpeak:,Ecobee:, weatherXML:,JowiHue:,APCUPSD:,PHLocation:,Chromecast:,EasyTrigger: ... My implementation already supports selecting different providers. And I'm now adding Microsoft SpeechSynthesizer ... More on forums.homeseer.com

forums.homeseer.com

Videos

RIP ELEVENLABS! Create PERFECT TTS AI Voices LOCALLY For FREE! ...

September 20, 2025

How to Use Whisper AI Speech to Text Locally - Tested and Works ...

September 25, 2024

Local and Open Source Speech to Speech Assistant - YouTube

September 12, 2024

Build A LOCAL AI Voice Chatbot with Raspberry Pi – (COMPLETE ...

September 3, 2025

RIP ELEVENLABS! Here's The BEST TTS AI Voices LOCALLY For FREE! ...

My Top 5 Open-Source AI Text-to-Speech Models - YouTube

February 12, 2025

willowtreeapps.com › craft › 10-speech-to-text-models-tested

We Tested 10 Speech-to-Text Models, See Which Perform Best

Important note: azure-ai-speech requires batch processing for larger audio files, which requires moving the audio file to a cloud storage bucket before processing, potentially resulting in additional costs. Gladia requires uploading the file to their servers and then performing transcription on the audio_url generated. During our tests using a short French clip about 30 seconds long, whisper-large-v3-local performed the best.

platform.openai.com › docs › guides › text-to-speech

Text to speech | OpenAI API

The Audio API provides a speech endpoint based on our GPT-4o mini TTS (text-to-speech) model. It comes with 11 built-in voices and can be used to: ... Our usage policies require you to provide a clear disclosure to end users that the TTS voice ...

meetjamie.ai › blog › 10-best-speech-to-text-software

10 Best Speech-to-Text Software [Updated August 2025] | Jamie

5 days ago - Aiko is a high-quality on-device transcription tool that converts speech to text from meetings, lectures, and more. Powered by OpenAI’s Whisper model, it runs locally on your device for fast and reliable transcription.

gladia.io › blog › best-open-source-speech-to-text-models

Gladia - Top 5 Open-Source Speech-to-Text Models for Enterprises

Whisper, DeepSpeech, Kaldi, Wav2vec, or SpeechBrain: key factors to consider when choosing an open-source ASR model for your apps and projects.

github.com › KoljaB › RealtimeSTT

GitHub - KoljaB/RealtimeSTT: A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription. - KoljaB/RealtimeSTT

Starred by 9K users

Forked by 764 users

Languages Python 96.1% | HTML 2.1%


bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models

The Best Open-Source Text-to-Speech Models in 2026

1 week ago - Built on a compact 0.5B-parameter LLM backbone, it delivers near-human speech quality and real-time performance. Unlike most cloud-locked systems, NeuTTS Air provides embedded voice AI capabilities on local devices such as laptops, mobile phones, ...

reddit.com › r/localllama › best local open source text-to-speech and speech-to-text?

r/LocalLLaMA on Reddit: Best local open source Text-To-Speech and Speech-To-Text?

August 24, 2024 -

I am working on custom data-management software, and for a while now I've been looking into the possibility of integrating and modifying existing local conversational AIs into it (or at least developing the possibility of doing so in the future). The first thing I've been struggling with is that information is somewhat hard to come by: searches often lead me back here to r/LocalLLaMA/ and to year-old threads in r/MachineLearning. Is anyone keeping track of what is out there and what is worth the attention? I am posting this here in the hope of finding some info while also sharing what I know for anyone who finds it useful or is interested.

I've noticed that most open source projects are based on OpenAI's Whisper and its re-implemented versions like:

Faster Whisper (MIT license)
Insanely fast Whisper (Apache-2.0 license)
Distil-Whisper (MIT license)
WhisperSpeech by github.com/collabora (MIT license, Added here 03/2025)
WhisperLive (MIT license, Added here 03/2025)
WhisperFusion, which is WhisperSpeech+WhisperLive in one package. (Added here 03/2025)

Coqui AI's TTS and STT models (MPL-2.0 license) have gained some traction, but on their site they have stated that they're shutting down.

Tortoise TTS (Apache-2.0 license) and its re-implemented versions such as:

Tortoise-TTS-fast (AGPL-3.0, Apache-2.0 licenses) and its slightly faster(?) fork (AGPL-3.0 license).

StyleTTS and its newer version:

StyleTTS2 (MIT license)

Alibaba Group's Tongyi SpeechTeam's SenseVoice (STT) [MIT license+possibly others] and CosyVoice (TTS) [Apache-2.0 license].

(11.2.2025): I will try to maintain this list so will begin adding new ones as well.

1/2025 Kokoro TTS (MIT License)
2/2025 Zonos by Zyphra (Apache-2.0 license)
3/2025 added: Metavoice (Apache-2.0 license)
3/2025 added: F5-TTS (MIT license)
3/2025 added: Orpheus-TTS by canopylabs.ai (Apache-2.0 license)
3/2025 added: MegaTTS3 (Apache-2.0 license)
4/2025 added: Index-tts (Apache-2.0 license). [Can be tried here.]
4/2025 added: Dia TTS (Apache-2.0 license) [Can be tried here.]
5/2025 added: Spark-TTS (Apache-2.0 license)[Can be tried here.]
5/2025 added: Parakeet TDT 0.6B V2 (CC-BY-4.0 license), STT English only [Can be tried here.], update: V3 is multilingual and has an ONNX version.

8/2025 added: Verbify-TTS (MIT License) by reddit user u/MattePalte. Described as a simple, locally run, screen-reader-style app.

8/2025 added: Chatterbox-TTS (MIT License) [Can be tried here.]

8/2025 added: Microsoft's VibeVoice TTS (MIT Licence) for generating consistent long-form dialogues. Comes in 1.5B and 7B sizes. Both models can be tried here. 0.5B model is also on the way. This one also already has a ComfyUI wrapper by u/Fabix84/ (additional info here). Quantized versions by u/teachersecret can be found here

8/2025 added: BosonAI's Higgs Audio TTS (Apache-2.0 license). Can be tried here and further tested here. This one supports complex long-form dialogues. Extra prompting is supposed to allow setting the scene and adjusting expressions. Also has a quantized (4bit fork) version.

8/2025 added: StepFun AI's (a Chinese AI team) Step-Audio 2 Mini Speech-To-Speech (Apache-2.0 license), an 8B "speech-to-speech" (Audio-To-Tokens + Tokens-To-Audio) model. Added because it is related, even if it bypasses the "to-text" part.

---------------------------------------------------------

Edit1: Added Distil-Whisper because "insanely fast whisper" is not a model, but these were shipped together.
Edit2: StyleTTS2FineTune is not actually a different version of StyleTTS2, but rather a framework for fine-tuning it.
Edit3(11.2.2025): as suggested by u/caidong I added Kokoro TTS + also added Zonos to the list.
Edit4(20.3.2025): as suggested by u/Trysem , added WhisperSpeech, WhisperLive, WhisperFusion, Metavoice and F5-TTS.
Edit5(22.3.2025): Added Orpheus-TTS.
Edit6(28.3.2025): Added MegaTTS3.
Edit7(11.4.2025): as suggested by u/Trysem/, added Index-tts.
Edit8(24.4.2025): Added Dia TTS (Nari-labs).
Edit9(02.5.2025): Added Spark-TTS as suggested by u/Tandulim (here)
Edit9(02.5.2025): Added Parakeet TDT 0.6B V2. More info in this thread.

Edit10(29.8.2025): As originally suggested by u/Trysem and later by u/Nitroedge added Chatterbox-TTS to the list.

Edit10(29.8.2025): u/MattePalte asked me to add his own TTS called Verbify-TTS to the list.

Edit10(29.8.2025): Added Microsoft's recently released VibeVoice TTS, BosonAI's Higgs Audio TTS and StepFun's STS. +Extra info.

Edit11+12(1.9.2025): Added VibeVoice TTS's quantized versions and Parakeet V3.


I’ve been using alltalktts ( https://github.com/erew123/alltalk_tts ) which is based off of coqui and supports XTTS2, piper and some others. I’m on a Mac so my options are pretty limited, and this worked fairly well. If xtts is the model you want to go with, then maybe https://github.com/daswer123/xtts-api-server would work even better. Unfortunately most of my use cases are in SillyTavern, for narration and character tts, so these may not match your use case. The last link I shared might give you ideas for how to implement that in a real application though. Are you a dev-like person, or just enthusiastic about it? I ask because if you’re a dev with some Python knowledge, or willingness to follow code, the latter link is actually pretty useful for ideas, in spite of being targeted towards SillyTavern. If not, this whole space might be kind of hard to navigate at this point in time, and it will also depend a lot on the hardware where you’ll be deploying this.

github.com › rhasspy › piper

GitHub - rhasspy/piper: A fast, local neural text to speech system

October 6, 2025 - A fast, local neural text to speech system. Contribute to rhasspy/piper development by creating an account on GitHub.

Starred by 10.4K users

Forked by 874 users

Languages C++ 72.8% | Python 18.8% | Jupyter Notebook 7.6% | CMake 0.3% | Shell 0.2% | Dockerfile 0.1%

obsproject.com › home › forums › resources › plugins

Squawk - Real-Time Local Text-to-Speech with AI | OBS Forums

June 19, 2024 - royshilkrot submitted a new resource: Squawk - Real-Time Local Text-to-Speech with AI - Generative AI engine for speech in The OBS Squawk plugin adds powerful voice cloning capabilities to OBS by leveraging sherpa-onnx.

forum.devtalk.com › ai forum › ai questions/help

What are the best text-to-speech ai generation tools that you can run locally? - AI Questions/Help - Devtalk

March 20, 2025 - Background Lately I am on a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I am making. I have limited knowledge on the topic of Neural/Bayesian networks and the area has moved a lot since the last time I studied it in detail, almost ...

github.com › coqui-ai › TTS

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production - coqui-ai/TTS

Starred by 43.9K users

Forked by 5.9K users

Languages Python 92.0% | Jupyter Notebook 7.5% | HTML 0.3% | Shell 0.1% | Makefile 0.1% | Cython 0.0%

picovoice.ai › blog › local-text-to-speech-with-cloud-quality

Local Text-to-Speech with Cloud Quality - Picovoice

April 29, 2025 - TLDR: Local Text-to-Speech (TTS) converts text to speech offline, making it ideal for apps that need low latency and privacy.

listnr.ai › blog › creating-local-text-to-speech-ai-voices-for-free

Creating Local Text-To-Speech AI Voices for Free: A Step-by-Step Guide | Listnr AI

April 7, 2025 - 1. What is local TTS voice generation? Local TTS voice generation refers to creating text-to-speech models on your own machine using open-source tools, giving you full control over the data and customization.

localai.io › features › text-to-audio

🗣 Text to audio (TTS) :: LocalAI

1 month ago - With this config, you can now use the following curl command to generate a text-to-speech audio file:

    curl -L http://localhost:8080/tts \
      -H "Content-Type: application/json" \
      -d '{ "model": "xtts_v2", "input": "Bonjour, je suis Ana Florence. Comment puis-je vous aider?" }' | aplay
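The curl command above can be approximated in Python with the standard library. This is a minimal sketch that mirrors the URL, header, and JSON fields from the snippet. The helper names `build_tts_request` and `synthesize_to_file`, the default output path `speech.wav`, and the assumption that the response body is playable audio (the curl example pipes it to `aplay`) are illustrative and not confirmed by the source.

```python
import json
import urllib.request


def build_tts_request(text, model="xtts_v2", base_url="http://localhost:8080"):
    """Build (but do not send) the JSON POST shown in the curl example."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path="speech.wav"):
    """Send the request and save the response bytes to out_path.

    ASSUMPTION: the server replies with raw audio bytes; out_path is a
    hypothetical default chosen for this sketch.
    """
    with urllib.request.urlopen(build_tts_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```

Calling `synthesize_to_file("Bonjour, je suis Ana Florence.")` requires a running server with the `xtts_v2` model configured; `build_tts_request` alone can be inspected offline.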

youtube.com › aitrepreneur

RIP ELEVENLABS! Create BEST TTS AI Voices LOCALLY For FREE! - YouTube

Say goodbye to expensive AI voice generators like ELEVENLABS! In this ULTIMATE guide, I'll show you how to create the BEST text-to-speech AI voices on your l...

Published May 9, 2024

Views 162K

medium.com › @dartisan › how-i-built-a-local-ai-chatbot-that-can-talk-listen-and-read-my-files-675f120098fe

How I Built a Local AI Chatbot That Can Talk, Listen, and ...

April 6, 2025 - 💬 Chat with local LLM using [mistral:7b] from Ollama · 📄 Upload PDFs or text files and ask questions about them ... This makes the assistant aware of the document’s content without fine-tuning. ... It’s surprisingly natural to just ask questions aloud and hear the bot reply. Every conversation is saved as a JSON file per session. When the app starts, it reloads the previous messages and keeps context. ... This was one of the most fun and practical AI side projects I’ve done.
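The per-session JSON persistence mentioned in this snippet could look roughly like the sketch below. The article's actual code is not shown here, so the function names, the one-file-per-session layout, and the message format (a list of role/content dictionaries) are assumptions made for illustration.

```python
import json
from pathlib import Path


def save_session(session_dir, session_id, messages):
    """Write the full message list for one session to <session_id>.json."""
    path = Path(session_dir) / f"{session_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path


def load_session(session_dir, session_id):
    """Reload earlier messages, or return an empty history for a new session."""
    path = Path(session_dir) / f"{session_id}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```

On startup an app would call `load_session` to restore context, then call `save_session` after each exchange so the file always reflects the latest conversation.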

forums.homeseer.com › software products › hs4 plugins › media plugins › ak google cast (alexbk66)

Feature request: use Coqui or other Local AI sites for voice... - HomeSeer Message Board

I locally installed Coqui TTS, which is an AI TTS system that lets you train voice AIs or download pretrained ones. It returns wav files from a website with the text embedded in the web link... ex.... http://192.168.0.9:5003/api/tts?text=Hello+there I posted a David Attenborough sample for ...
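The "text embedded in the web link" pattern from this post can be sketched in Python as follows. The host and port are copied from the example URL in the post and will differ on other networks. `build_tts_url` and `fetch_wav` are hypothetical helper names, and the assumption that the server answers a plain GET with WAV bytes comes from the post's description rather than from any documentation.

```python
import urllib.request
from urllib.parse import urlencode


def build_tts_url(text, host="192.168.0.9", port=5003):
    """Embed the text in the query string; urlencode turns spaces into '+'."""
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def fetch_wav(text, out_path="tts.wav"):
    """Download the synthesized audio and save it (out_path is illustrative)."""
    with urllib.request.urlopen(build_tts_url(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path


print(build_tts_url("Hello there"))
# -> http://192.168.0.9:5003/api/tts?text=Hello+there
```

Using `urlencode` rather than string concatenation also escapes characters such as `&` and `?` that would otherwise break the query string.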

smallest.ai › blog › creating-local-text-to-speech-ai-voices-for-free

Creating Local Text-To-Speech AI Voices for Free

March 13, 2025 - ElevenLabs is a powerful AI-driven text-to-speech (TTS) platform known for its realistic and expressive AI voices. It leverages deep learning models to produce human-like speech with a natural tone, making it ideal for virtual assistants.