Try https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2 (smaller option) or https://huggingface.co/openai/whisper-large-v3-turbo. Run the Parakeet model with the NeMo library, and use something like https://github.com/SYSTRAN/faster-whisper or https://github.com/ggml-org/whisper.cpp for the latter. (Answer from Few-Welcome3297 on reddit.com)
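As a rough illustration of the faster-whisper route mentioned in the answer above, here is a minimal sketch. It assumes the `faster-whisper` package is installed and that the installed release recognises the `large-v3-turbo` model name; `format_timestamp` and `transcribe` are hypothetical helper names introduced only for this example, and the CPU/int8 settings are just one possible configuration.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcribe(audio_path: str, model_name: str = "large-v3-turbo") -> list[str]:
    """Transcribe an audio file and return one formatted line per segment.

    Assumption: faster-whisper is installed and can download `model_name`.
    """
    # Imported lazily so format_timestamp stays usable without the package.
    from faster_whisper import WhisperModel

    # int8 on CPU is a conservative default; a GPU setup would differ.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, beam_size=5)

    lines = []
    # `segments` is a generator, so decoding actually happens in this loop.
    for seg in segments:
        start = format_timestamp(seg.start)
        end = format_timestamp(seg.end)
        lines.append(f"[{start} -> {end}] {seg.text.strip()}")
    return lines
```

Calling `transcribe("meeting.wav")` (a hypothetical file name) would return timestamped lines for that recording; the Parakeet model would need NeMo instead and is not covered by this sketch.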
🌐
Hugging Face
huggingface.co › models
Text-to-Speech Models – Hugging Face
Text-to-Speech • Updated 6 days ago • 18.1k • 439 · Text-to-Speech • Updated Sep 23 • 637k • 1.33k · Text-to-Speech • 3B • Updated Nov 12 • 75k • 821 · Text-to-Speech • 0.6B • Updated 6 days ago • 10.9k • 47 · Text-to-Speech • 0.7B • Updated Oct 10 • 23.2k • 804
🌐
Hugging Face
huggingface.co › docs › transformers › en › tasks › text-to-speech
Text to speech
In our experience, obtaining satisfactory results from this model can be challenging. The quality of the speaker embeddings appears to be a significant factor. Since SpeechT5 was pre-trained with English x-vectors, it performs best when using English speaker embeddings.
What models can I use for Text-to-Speech?
The KittenML/kitten-tts-nano-0.1, ResembleAI/chatterbox, fishaudio/fish-speech-1.5, and nari-labs/Dia-1.6B-0626 models can be used for Text-to-Speech.
🌐
huggingface.co
huggingface.co › tasks › text-to-speech
What is Text-to-Speech? - Hugging Face
What is Text-to-Speech?
Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages.
What datasets can I use for Text-to-Speech?
The parler-tts/mls_eng_10k, mythicinfinity/libritts_r, and facebook/multilingual_librispeech datasets can be used for Text-to-Speech.
🌐
Reddit
reddit.com › r/localllama › 🎧 listen and compare 12 open-source text-to-speech models (hugging face space)
r/LocalLLaMA on Reddit: 🎧 Listen and Compare 12 Open-Source Text-to-Speech Models (Hugging Face Space)
July 6, 2025 -

Hey everyone!

We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.

The demo features 12 popular TTS models, all tested using a consistent prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.

Would love to get feedback or suggestions!

👉 Check out the demo space and detailed comparison here!

👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2

Share your use-case and we will update this space as required!

Which TTS model sounds most natural to you?

Cheers!

🌐
Hugging Face
huggingface.co › spaces › NihalGazi › Text-To-Speech-Unlimited
Realistic Text To Speech Unlimited - a Hugging Face Space by NihalGazi
Enter text, choose a voice and emotion, and generate audio. The text is checked for appropriateness before conversion. You'll get an audio file as a result.
🌐
Hugging Face
huggingface.co › models
Automatic Speech Recognition Models – Hugging Face
Text-to-Speech · Text-to-Audio · Automatic Speech Recognition · Audio-to-Audio · Audio Classification · Voice Activity Detection · Tabular · Tabular Classification · Tabular Regression · Time Series Forecasting · Reinforcement Learning · Reinforcement Learning ·
🌐
Hugging Face
huggingface.co › docs › transformers › model_doc › speech_to_text
Speech2Text
Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights. The bare Speech To Text Text Model outputting raw hidden-states without any specific head on top.
🌐
Hugging Face
huggingface.co › learn › audio-course › en › chapter6 › pre-trained_models
Pre-trained models for text-to-speech - Hugging Face Audio Course
... Let’s listen to the result. The sample rate used by SpeechT5 is always 16 kHz. ... Feel free to play with the SpeechT5 text-to-speech demo, explore other voices, experiment with inputs.
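To make the 16 kHz point in the snippet above concrete, here is a minimal sketch of saving mono audio at that rate using only the Python standard library. It does not call SpeechT5; `write_wav` and `sine_tone` are hypothetical helpers, and the sine tone merely stands in for a model-generated waveform, which is assumed to be a sequence of floats in [-1, 1].

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # the rate the snippet attributes to SpeechT5


def sine_tone(freq_hz: float, seconds: float, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Stand-in for model output: a half-amplitude sine wave."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def write_wav(samples, target, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM.

    `target` may be a file path or a writable, seekable binary file object.
    """
    # Clip to [-1, 1] before scaling so out-of-range values cannot overflow.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Usage: half a second of audio at 16 kHz is 8,000 frames.
buffer = io.BytesIO()
write_wav(sine_tone(440.0, 0.5), buffer)
```

Writing the header with the wrong rate does not resample anything; it only makes playback faster or slower, which is why the rate passed to the writer should match what the model produced.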
🌐
Hugging Face
huggingface.co › tasks › text-to-speech
What is Text-to-Speech? - Hugging Face
Text-to-Speech • Updated Mar 25 • 1.87k • 657 · Note: A massively multi-lingual TTS model.
🌐
GitHub
github.com › huggingface › parler-tts
GitHub - huggingface/parler-tts: Inference and training library for high-quality TTS models.
It is a reproduction of work from ... from Stability AI and Edinburgh University respectively. In contrast to other TTS models, Parler-TTS is a fully open-source release...
Starred by 5.5K users
Forked by 582 users
Languages   Python
🌐
Hugging Face
discuss.huggingface.co › models
Real-Time Text-to-Speech Model - Models - Hugging Face Forums
January 4, 2025 - Greetings everyone, I’m currently looking for a real-time TTS model that can generate audio as soon as I type. Kindly guide me in this regard.
🌐
Modal
modal.com › blog › open-source-tts
The Top Open-Source Text to Speech (TTS) Models
This article explores the top open-source TTS models, based on Hugging Face’s trending models and insights from our developer community.
🌐
Hugging Face
huggingface.co › models
Models – Hugging Face
Docker Model Runner · Lemonade · Inference Providers Select all · Groq · Novita · Nebius AI · Cerebras · SambaNova · Nscale · fal · Hyperbolic · Together AI · Fireworks · Featherless AI · Zai · Replicate · Cohere · Scaleway ...
🌐
Hugging Face
huggingface.co › blog › arena-tts
TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Just submit some text, listen to two different models speak it out, and vote on which model you think is the best. The results will be organized into a leaderboard that displays the community’s highest-rated models. The field of speech synthesis has long lacked an accurate method to measure the quality of different models.
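One common way to turn pairwise votes like these into a leaderboard is an Elo-style rating update. The snippet above does not say which scheme the Arena uses, so the sketch below is an assumption for illustration only; the starting rating of 1000, the K-factor of 32, and the function names are all hypothetical choices.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Update both ratings in place after one listener vote."""
    r_win = ratings.get(winner, 1000.0)
    r_lose = ratings.get(loser, 1000.0)
    # The winner gains more when the win was unexpected.
    gain = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + gain
    ratings[loser] = r_lose - gain


def leaderboard(votes) -> list:
    """Build a ranking, highest rating first, from (winner, loser) pairs."""
    ratings = {}
    for winner, loser in votes:
        record_vote(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Usage with made-up model names and votes.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
for name, rating in leaderboard(votes):
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the current ratings, the order of votes affects the final numbers; that is a property of this sketch, not a claim about the Arena itself.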
🌐
GitHub
github.com › huggingface › speech-to-speech
GitHub - huggingface/speech-to-speech: Speech To Speech: an effort for an open-sourced and modular GPT4-o
The pipeline provides a fully open and modular approach, with a focus on leveraging models available through the Transformers library on the Hugging Face hub.
Starred by 4.3K users
Forked by 485 users
Languages   Python 99.7% | Dockerfile 0.3%
🌐
Reddit
reddit.com › r/huggingface › text to speech is getting crazy good - hierspeech++, xtts & styletts2!
r/huggingface on Reddit: Text to Speech is getting CRAZY GOOD - HierSpeech++, XTTS & StyleTTS2!
September 7, 2023 -

Hey,

AI has been going crazy lately and things are changing super fast. I created a video covering a few trending, publicly available TTS (text-to-speech) Hugging Face Spaces. Check it out!

https://www.youtube.com/watch?v=4-2Jk8muo7c

I can't wait to start testing applications of these models in future videos; it seems like this technology is finally starting to gain momentum!

Let me know what you think about it, or if you have any questions / requests for other videos as well,

cheers

🌐
Hugging Face
huggingface.co › spaces
Spaces - Hugging Face
Generate natural-sounding speech from text · Running on Zero · 29 · 🚀 · TTS demo for T5Gemma-TTS model · Running · Featured · 1.73k · 🔥 · Free Text-To-Speech generator with Emotion control (OpenAI) Running on Zero · 609 · 🏢 · Generate expressive speech from text with emotion control ·
🌐
Hugging Face
huggingface.co › infinisoft › tts
infinisoft/tts · Hugging Face
Tools to curate Text2Speech datasets under dataset_analysis. Utilities to use and test your models. Modular (but not too much) code base enabling easy implementation of new ideas. ... You can also help us implement more models. 🐸TTS is tested on Ubuntu 18.04 with python >= 3.7, < 3.11. If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.
🌐
Hugging Face
huggingface.co › fractalego › personal-speech-to-text-model
fractalego/personal-speech-to-text-model · Hugging Face
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)
🌐
Reddit
reddit.com › r/localllama › improved text to speech model: parler tts v1 by hugging face
r/LocalLLaMA on Reddit: Improved Text to Speech model: Parler TTS v1 by Hugging Face
August 8, 2024 -

Hi everyone, I'm VB, the GPU poor in residence (focus on open source audio and on-device ML) at Hugging Face! 🤗

Quite pleased to introduce you to Parler TTS v1 🔉 - 885M (Mini) & 2.2B (Large) - fully open-source Text-to-Speech models! 🤙

Some interesting things about it:

  1. Trained on 45,000 hours of open speech (datasets released as well)

  2. Up to 4x faster generation thanks to torch compile & static KV cache (compared to the previous v0.1 release)

  3. Mini trained with a larger text encoder; Large trained with both a larger text encoder and a larger decoder

  4. Also supports SDPA & Flash Attention 2 for an added speed boost

  5. Built-in streaming: we provide a dedicated streaming class optimised for time to first audio

  6. Better speaker consistency: choose from more than a dozen speakers, or write a speaker description prompt and use that

  7. Not convinced by a speaker? You can fine-tune the model on your dataset (only a couple of hours would do)

Apache 2.0 licensed codebase, weights and datasets! 🤗

Can't wait to see what y'all would build with this!🫡

Quick links:

Model checkpoints: https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c

Space: https://huggingface.co/spaces/parler-tts/parler_tts

GitHub Repo: https://github.com/huggingface/parler-tts