🌐
Hugging Face
huggingface.co › models
Text-to-Speech Models – Hugging Face
🌐
Hugging Face
huggingface.co › docs › transformers › en › tasks › text-to-speech
Text to speech
Since SpeechT5 was pre-trained with English x-vectors, it performs best when using English speaker embeddings. If the synthesized speech sounds poor, try using a different speaker embedding. Increasing the training duration is also likely to enhance the quality of the results. Even so, the speech clearly is Dutch instead of English, and it does capture the voice characteristics of the speaker (compare to the original audio in the example). Another thing to experiment with is the model’s configuration.
People also ask

What models can I use for Text-to-Speech?
The KittenML/kitten-tts-nano-0.1, ResembleAI/chatterbox, fishaudio/fish-speech-1.5, and nari-labs/Dia-1.6B-0626 models can be used for Text-to-Speech.
🌐
huggingface.co
huggingface.co › tasks › text-to-speech
What is Text-to-Speech? - Hugging Face
What is Text-to-Speech?
Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages.
🌐
huggingface.co
huggingface.co › tasks › text-to-speech
What is Text-to-Speech? - Hugging Face
What metrics can I use for Text-to-Speech?
The mel cepstral distortion metric can be used for Text-to-Speech.
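As a rough illustration of the metric named above, the sketch below computes mel cepstral distortion (MCD) in dB between two sequences of mel-cepstral coefficient frames. It assumes the frames are already time-aligned and that coefficient 0 (energy) is skipped, which is a common convention but not something the snippet states. The function name and the `skip_c0` parameter are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames, skip_c0=True):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: both inputs are lists of equal-length coefficient lists,
    already aligned frame by frame (for example with DTW, not shown here).
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need the same non-zero number of frames")

    start = 1 if skip_c0 else 0          # optionally ignore the energy term
    scale = 10.0 / math.log(10.0)        # converts the distance to dB
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        squared = sum((r - s) ** 2 for r, s in zip(ref[start:], syn[start:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


if __name__ == "__main__":
    reference = [[5.0, 1.0, 2.0], [5.0, 0.5, 1.5]]
    synthesized = [[9.0, 1.1, 1.9], [4.0, 0.4, 1.7]]
    print(round(mel_cepstral_distortion(reference, synthesized), 3))
```

Lower values mean the synthesized cepstra are closer to the reference; identical frames give 0.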
🌐
Reddit
reddit.com › r/localllama › 🎧 listen and compare 12 open-source text-to-speech models (hugging face space)
r/LocalLLaMA on Reddit: 🎧 Listen and Compare 12 Open-Source Text-to-Speech Models (Hugging Face Space)
July 6, 2025 -

Hey everyone!

We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.

The demo features 12 popular TTS models, all tested using a consistent prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.

Would love to get feedback or suggestions!

👉 Check out the demo space and detailed comparison here!

👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2

Share your use-case and we will update this space as required!

Which TTS model sounds most natural to you?

Cheers!

🌐
Hugging Face
huggingface.co › learn › audio-course › en › chapter6 › pre-trained_models
Pre-trained models for text-to-speech - Hugging Face Audio Course
It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to a specific predefined voice. Bark is a highly controllable text-to-speech model, meaning you can use ...
🌐
Hugging Face
huggingface.co › models
Automatic Speech Recognition Models – Hugging Face
🌐
Hugging Face
huggingface.co › docs › transformers › model_doc › speech_to_text
Speech2Text
The Speech2Text model was proposed in fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. It’s a transformer-based seq2seq (encoder-decoder) model designed for end-to-end ...
🌐
Hugging Face
huggingface.co › spaces › NihalGazi › Text-To-Speech-Unlimited
Realistic Text To Speech Unlimited - a Hugging Face Space by NihalGazi
Enter text, choose a voice and emotion, and generate audio. The text is checked for appropriateness before conversion. You'll get an audio file as a result.
🌐
Hugging Face
huggingface.co › tasks › text-to-speech
What is Text-to-Speech? - Hugging Face
Text-to-Speech • Updated Mar 25 • 1.87k • 657 · Note A massively multi-lingual TTS model.
🌐
GitHub
github.com › huggingface › parler-tts
GitHub - huggingface/parler-tts: Inference and training library for high-quality TTS models.
Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc).
Starred by 5.5K users
Forked by 582 users
Languages   Python
🌐
Modal
modal.com › blog › open-source-tts
The Top Open-Source Text to Speech (TTS) Models
Chatterbox is a small, fast, and easy-to-use TTS model developed by Resemble AI. Chatterbox was built atop 0.5B Llama. Until recently, it was the #1 trending TTS model on Hugging Face.
🌐
Hugging Face
huggingface.co › models
Models – Hugging Face
🌐
Hugging Face
huggingface.co › blog › arena-tts
TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Just submit some text, listen to two different models speak it out, and vote on which model you think is the best. The results will be organized into a leaderboard that displays the community’s highest-rated models. The field of speech synthesis has long lacked an accurate method to measure the quality of different models.
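One common way to turn pairwise votes like these into a leaderboard is an Elo-style rating update. The sketch below shows that idea only; whether TTS Arena uses this exact scheme, and the `k` and `initial` values chosen here, are assumptions. All function and model names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, winner, loser, k=32.0, initial=1000.0):
    """Apply one vote in place: the winner gains what the loser gives up."""
    r_win = ratings.get(winner, initial)
    r_lose = ratings.get(loser, initial)
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta
    return ratings


def leaderboard(votes, k=32.0, initial=1000.0):
    """Build a ranking (best first) from an ordered list of (winner, loser) votes."""
    ratings = {}
    for winner, loser in votes:
        update_ratings(ratings, winner, loser, k=k, initial=initial)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
    for name, rating in leaderboard(votes):
        print(f"{name}: {rating:.1f}")
```

An upset win against a higher-rated model moves the ratings more than an expected win, which is why the ranking stabilizes as votes accumulate.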
🌐
Hugging Face
huggingface.co › infinisoft › tts
infinisoft/tts · Hugging Face
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option. ... If you plan to code or train models, clone 🐸TTS and install it locally. git clone https://github.com/coqui-ai/TTS ...
🌐
Hugging Face
huggingface.co › spaces
Spaces - Hugging Face
Generate a podcast from a script with AI voices · Running on Zero · 14 · 🏢 · Vibe Voice Large with Custom Voices (Voice Cloning) Running · Featured · 1.21k · 🎤 · Convert spoken words into text · Running · 5 · 🦜 · Generate speech from text with Paratts technology · Running on Zero · Featured · 787 · 🤫 · Transcribe audio or YouTube videos into text · Running on CPU Upgrade · Featured · 909 · 🏆 · Vote on the latest TTS models!
🌐
Hugging Face
discuss.huggingface.co › models
Real-Time Text-to-Speech Model - Models - Hugging Face Forums
January 4, 2025 - Greetings everyone, I’m currently looking for real-time tts model that can create an audio as soon as I type. Kindly guide me in this regard.
🌐
Reddit
reddit.com › r/localllama › improved text to speech model: parler tts v1 by hugging face
r/LocalLLaMA on Reddit: Improved Text to Speech model: Parler TTS v1 by Hugging Face
August 8, 2024 -

Hi everyone, I'm VB, the GPU poor in residence (focus on open source audio and on-device ML) at Hugging Face! 🤗

Quite pleased to introduce you to Parler TTS v1 🔉 - 885M (Mini) & 2.2B (Large) - fully open-source Text-to-Speech models! 🤙

Some interesting things about it:

  1. Trained on 45,000 hours of open speech (datasets released as well)

  2. Up to 4x faster generation thanks to torch compile & static KV cache (compared to previous v0.1 release)

  3. Mini trained on a larger text encoder, Large trained on both a larger text encoder & a larger decoder

  4. Also supports SDPA & Flash Attention 2 for an added speed boost

  5. In-built streaming, we provide a dedicated streaming class optimised for time to the first audio

  6. Better speaker consistency, more than a dozen speakers to choose from or create a speaker description prompt and use that

  7. Not convinced with a speaker? You can fine-tune the model on your dataset (only a couple of hours would do)

Apache 2.0 licensed codebase, weights and datasets! 🤗

Can't wait to see what y'all would build with this!🫡

Quick links:

Model checkpoints: https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c

Space: https://huggingface.co/spaces/parler-tts/parler_tts

GitHub Repo: https://github.com/huggingface/parler-tts

🌐
Hugging Face
huggingface.co › spaces › balacoon › tts
Text-to-Speech - a Hugging Face Space by balacoon
Enter text and select a model and speaker to generate speech. Listen to the synthesized audio result.
🌐
GitHub
github.com › huggingface › speech-to-speech
GitHub - huggingface/speech-to-speech: Speech To Speech: an effort for an open-sourced and modular GPT4-o
--min_speech_ms: Minimum duration of detected voice activity to be considered speech. --min_silence_ms: Minimum length of silence intervals for segmenting speech, balancing sentence cutting and latency reduction. model_name, torch_dtype, and device are exposed for each implementation of the Speech to Text, Language Model, and Text to Speech.
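To make the two thresholds concrete, here is a minimal sketch of how `min_speech_ms` and `min_silence_ms` could shape speech segments, given per-frame voice-activity flags. It is not the repository's implementation: the function name, the `frames` and `frame_ms` inputs, and the merge-then-filter order are all assumptions made for illustration.

```python
def segment_speech(frames, frame_ms, min_speech_ms, min_silence_ms):
    """Return (start_ms, end_ms) speech segments from per-frame voice flags."""
    # 1. Collect raw runs of consecutive voiced frames.
    runs = []
    start = None
    for index, voiced in enumerate(frames):
        if voiced and start is None:
            start = index
        elif not voiced and start is not None:
            runs.append([start * frame_ms, index * frame_ms])
            start = None
    if start is not None:
        runs.append([start * frame_ms, len(frames) * frame_ms])

    # 2. Merge runs separated by a silence shorter than min_silence_ms,
    #    so brief pauses do not cut a sentence in two.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_silence_ms:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 3. Drop segments too short to be considered speech.
    return [(s, e) for s, e in merged if e - s >= min_speech_ms]


if __name__ == "__main__":
    flags = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    print(segment_speech(flags, frame_ms=100, min_speech_ms=200, min_silence_ms=300))
```

A larger `min_silence_ms` keeps sentences intact at the cost of waiting longer before a segment is closed, which is the cutting-versus-latency balance the flag description mentions.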
Starred by 4.3K users
Forked by 485 users
Languages   Python 99.7% | Dockerfile 0.3%
🌐
Hugging Face
huggingface.co › blog › srinivasbilla › llasa-tts
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
An open source llama3 3B finetune that acts as a text to speech model. Not only does it do incredibly realistic text to speech, it can also clone any voice with only a couple seconds of sample audio.