When it comes to locally executable models, there seems to be a lot of accumulated know-how around the Whisper series. However, there are other options as well. In terms of speed, FastRTC excels at real-time performance, but it’s quite specialized. Or rather, is it cloud-based?
Open ASR Leaderboard - a Hu…
Answer from John6666 on discuss.huggingface.co
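The Open ASR Leaderboard mentioned in that answer ranks speech-recognition models by average word error rate (WER) and by inverse real-time factor (RTFx, seconds of audio processed per second of compute), which is the speed axis the answer alludes to. A rough sketch of both metrics follows, assuming transcripts that are already normalized and split on whitespace; the leaderboard applies its own text normalizer, which is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping one row of the DP table at a time.
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTFx = seconds of audio transcribed per second of compute (higher is faster)."""
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(inverse_real_time_factor(3600.0, 12.0))
```

For that pair the word-level edit distance is 2 (one substitution, one deletion) over 6 reference words, so WER is about 0.33; transcribing an hour of audio in 12 seconds corresponds to an RTFx of 300.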
Hugging Face
huggingface.co › models
Text-to-Speech Models – Hugging Face
Text-to-Speech · updated 6 days ago · 18.1k downloads · 439 likes
Text-to-Speech · updated Sep 23 · 637k downloads · 1.33k likes
Text-to-Speech · 3B · updated Nov 12 · 75k downloads · 821 likes
Text-to-Speech · 0.6B · updated 6 days ago · 10.9k downloads · 47 likes
Text-to-Speech · 0.7B · updated Oct 10 · 23.2k downloads · 804 likes
Hugging Face
huggingface.co › docs › transformers › tasks › text-to-speech
Text to speech
Fine-tune SpeechT5, which was originally trained on English speech, on the Dutch (nl) language subset of the VoxPopuli dataset, then use your fine-tuned model for inference in one of two ways: with a pipeline or directly. Before you begin, make sure you have all the necessary libraries installed: ... Install 🤗 Transformers from source, as not all the SpeechT5 features have been merged into an official release yet:
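Because SpeechT5 was originally trained on English, some characters in Dutch transcripts (accented vowels, for example) may be missing from its tokenizer's vocabulary. One way to handle this before fine-tuning is to list the unsupported characters and map them to close ASCII equivalents. The sketch below assumes a toy, hypothetical vocabulary (`TOKENIZER_VOCAB`) and strips accents with Unicode NFKD decomposition; in practice you would read the vocabulary from the real tokenizer and review the mapping by hand.

```python
import unicodedata

# Hypothetical stand-in for the tokenizer's character vocabulary (assumption:
# lowercase ASCII letters, digits, space and a few punctuation marks only).
TOKENIZER_VOCAB = set("abcdefghijklmnopqrstuvwxyz0123456789 .,?!'-")


def unsupported_chars(texts, vocab=TOKENIZER_VOCAB):
    """Return the set of (lowercased) characters in `texts` that `vocab` lacks."""
    found = set()
    for text in texts:
        found.update(ch for ch in text.lower() if ch not in vocab)
    return found


def normalize(text, vocab=TOKENIZER_VOCAB):
    """Lowercase, strip accents, then drop any character still outside `vocab`."""
    # NFKD splits e.g. "ë" into "e" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in stripped if ch in vocab)


if __name__ == "__main__":
    sentences = ["Het café is gesloten.", "Een ruïne in België!"]
    print(sorted(unsupported_chars(sentences)))
    print([normalize(s) for s in sentences])
```

Silently dropping characters that survive decomposition is a deliberate simplification here; for a real dataset it is safer to inspect what `unsupported_chars` reports first.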
Hugging Face
huggingface.co › docs › transformers › model_doc › speech_to_text
Speech2Text
The bare Speech To Text Text Model outputting raw hidden-states without any specific head on top.
GitHub
github.com › huggingface › speech-to-speech
GitHub - huggingface/speech-to-speech: Speech To Speech: an effort for an open-sourced and modular GPT4-o
OpenAI API · TTS: Parler-TTS 🤗 · MeloTTS · ChatTTS
  1. Clone the repository:
     git clone https://github.com/huggingface/speech-to-speech.git
     cd speech-to-speech
  2. Install the required dependencies using uv:
     uv pip install -r requirements.txt
     For Mac users, use the requirements_mac.txt file instead:
     uv pip install -r requirements_mac.txt
Starred by 4.3K users · Forked by 485 users · Languages: Python 99.7%, Dockerfile 0.3%
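The speech-to-speech project chains four kinds of stage (voice activity detection, speech-to-text, a language model and text-to-speech) and lets you swap the implementation used for each, as the list of TTS backends above suggests. The sketch below only illustrates that modular idea and is not the repository's code: each stage is a hypothetical stub running in its own thread, connected to the next by a queue, so any stage can be replaced without touching the others. All names in it are made up for illustration.

```python
import queue
import threading
from typing import Callable, Optional

SENTINEL = None  # marks the end of the stream


def run_stage(fn: Callable[[object], Optional[object]], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply `fn` to each item from `inbox` and forward non-None results to `outbox`."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        result = fn(item)
        if result is not None:  # a stage may swallow items, e.g. VAD dropping silence
            outbox.put(result)


# Hypothetical stub stages; real ones would wrap a VAD model, an ASR model, an LLM and a TTS model.
def vad(chunk: str) -> Optional[str]:
    return None if chunk == "<silence>" else chunk


def stt(speech: str) -> str:
    return speech.removeprefix("audio:")


def llm(text: str) -> str:
    return f"You said: {text}"


def tts(reply: str) -> bytes:
    return reply.encode("utf-8")  # stand-in for synthesized audio


def build_pipeline(stages):
    """Chain stages with queues; return (input queue, output queue, threads)."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    return queues[0], queues[-1], threads


if __name__ == "__main__":
    inbox, outbox, _ = build_pipeline([vad, stt, llm, tts])
    for chunk in ["audio:hello", "<silence>", "audio:how are you"]:
        inbox.put(chunk)
    inbox.put(SENTINEL)
    outputs = []
    while (item := outbox.get()) is not SENTINEL:
        outputs.append(item)
    print(outputs)
```

Running each stage in its own thread lets a slow stage (typically the LLM or TTS) overlap with the others instead of blocking the whole loop; swapping a TTS backend then means passing a different callable in place of `tts`.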
Reddit
reddit.com › r/localllama › 🎧 listen and compare 12 open-source text-to-speech models (hugging face space)
r/LocalLLaMA on Reddit: 🎧 Listen and Compare 12 Open-Source Text-to-Speech Models (Hugging Face Space)
July 6, 2025

Hey everyone!

We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.

The demo features 12 popular TTS models, all tested using a consistent prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.

Would love to get feedback or suggestions!

👉 Check out the demo space and detailed comparison here!

👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2

Share your use-case and we will update this space as required!

Which TTS model sounds most natural to you?

Cheers!

Modal
modal.com › blog › open-source-tts
The Top Open-Source Text to Speech (TTS) Models
It’s currently the top trending text-to-speech model on Hugging Face. It’s an open sourced model that was built on top of Llama 3.2 3B, pre-trained on over 10 million hours of audio data. This model provides industry-leading expressive audio generation and multilingual voice cloning.
Hugging Face
huggingface.co › docs › transformers › en › tasks › asr
Automatic speech recognition
Fine-tune Wav2Vec2 on the MInDS-14 dataset to transcribe audio to text, then use your fine-tuned model for inference. To see all architectures and checkpoints compatible with this task, we recommend checking the task page. Before you begin, make sure you have all the necessary libraries installed: ... We encourage you to log in to your Hugging Face account so you can upload and share your model with the community.
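Wav2Vec2 checkpoints fine-tuned for transcription use a CTC (connectionist temporal classification) head: the model emits one score vector per audio frame, and the transcript is recovered by picking the best token per frame, merging consecutive repeats and then removing the blank token. In `transformers` this decoding is normally done for you by the processor's `batch_decode` on the argmax ids; the pure-Python sketch below just shows the greedy (best-path) idea. It assumes a toy, hypothetical vocabulary in which index 0 is the blank/pad token and `|` is the word delimiter, which mirrors the convention of the English Wav2Vec2 tokenizers; check the tokenizer you actually use.

```python
from typing import List, Sequence

# Toy vocabulary (assumption): index 0 is the CTC blank / pad token, "|" separates words.
VOCAB = ["<pad>", "|", "a", "c", "t", "s"]
BLANK_ID = 0


def greedy_ctc_decode(logits: Sequence[Sequence[float]], vocab: List[str] = VOCAB) -> str:
    """Decode per-frame scores (frames x vocab) with best-path CTC decoding."""
    # 1. Best token per frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    # 2. Merge consecutive duplicates, 3. drop blanks (a blank between repeats keeps both).
    tokens = []
    previous = None
    for idx in best_ids:
        if idx != previous and idx != BLANK_ID:
            tokens.append(vocab[idx])
        previous = idx
    # 4. Turn word delimiters into spaces.
    return "".join(tokens).replace("|", " ").strip()


def one_hot(ids: Sequence[int], size: int = len(VOCAB)) -> List[List[float]]:
    """Build fake frame scores where each frame strongly prefers one token id."""
    return [[1.0 if j == i else 0.0 for j in range(size)] for i in ids]


if __name__ == "__main__":
    # Frames: c c <pad> a t t | c a a t s
    frame_ids = [3, 3, 0, 2, 4, 4, 1, 3, 2, 2, 4, 5]
    print(greedy_ctc_decode(one_hot(frame_ids)))  # cat cats
```

Merging repeats before removing blanks is what lets CTC spell double letters: frames `t <pad> t` decode to "tt", while `t t` decodes to a single "t".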
GitHub
github.com › huggingface › parler-tts
GitHub - huggingface/parler-tts: Inference and training library for high-quality TTS models.
It is a reproduction of work from ... from Stability AI and Edinburgh University respectively. Unlike other TTS models, Parler-TTS is a fully open-source release...
Starred by 5.5K users · Forked by 582 users · Languages: Python
OpenAI
blog.gopenai.com › simplify-text-to-speech-with-hugging-face-open-source-model-4e8dd11b77a5
Simplify Text-to-Speech with Hugging Face Open Source Model | by LaxmiKumar Reddy Sammeta | GoPenAI
May 25, 2024 - Conclusion: In this article, I’ve ... perform Text-to-Speech using Hugging Face’s open source model kakao-enterprise/vits-vctk. By following the step-by-step guide and executing the provided code examples, you can easily integrate TTS ...
Reddit
reddit.com › r/huggingface › open source speech to text model that supports multiple language detection for real time streaming?
r/huggingface on Reddit: Open source Speech To Text model that supports Multiple language detection for Real time streaming?
January 1, 2024

Hi 👋 I’m researching the best speech-to-text model that supports multiple languages and can automatically detect the language for real-time streaming.

I’m really struggling to find the right platform or service. Deepgram has done solid SEO, so I keep getting articles saying it’s better, but it doesn’t support automatic language detection for real-time streaming! Has anyone used Google Speech-to-Text or any other service that supports this? Or any open-source model? Thanks so much

Reddit
reddit.com › r/localllama › improved text to speech model: parler tts v1 by hugging face
r/LocalLLaMA on Reddit: Improved Text to Speech model: Parler TTS v1 by Hugging Face
August 8, 2024

Hi everyone, I'm VB, the GPU poor in residence (focus on open source audio and on-device ML) at Hugging Face! 🤗

Quite pleased to introduce you to Parler TTS v1 🔉 - 885M (Mini) & 2.2B (Large) - fully open-source Text-to-Speech models! 🤙

Some interesting things about it:

  1. Trained on 45,000 hours of open speech (datasets released as well)

  2. Up to 4x faster generation thanks to torch compile & static KV cache (compared to the previous v0.1 release)

  3. Mini is trained with a larger text encoder; Large is trained with both a larger text encoder and a larger decoder

  4. Also supports SDPA & Flash Attention 2 for an added speed boost

  5. Built-in streaming: we provide a dedicated streaming class optimised for time to first audio

  6. Better speaker consistency: choose from more than a dozen speakers, or write your own speaker description prompt and use that

  7. Not convinced by a speaker? You can fine-tune the model on your own dataset (only a couple of hours would do)

Apache 2.0 licensed codebase, weights and datasets! 🤗

Can't wait to see what y'all build with this! 🫡

Quick links:

Model checkpoints: https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c

Space: https://huggingface.co/spaces/parler-tts/parler_tts

GitHub Repo: https://github.com/huggingface/parler-tts
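Item 5 in the post above optimises for "time to first audio": the delay before the first chunk of sound is ready, which matters more for interactive use than total generation time. The sketch below is not Parler-TTS code and does not use its streaming class; it assumes a hypothetical `fake_synthesizer` generator that yields fixed-size chunks after an artificial delay, and shows how a consumer can measure time to first audio while handling chunks as they arrive instead of waiting for the whole clip.

```python
import time
from typing import Iterator, List, Tuple


def fake_synthesizer(text: str, chunk_size: int = 4, delay_s: float = 0.01) -> Iterator[List[float]]:
    """Hypothetical streaming TTS: yields `chunk_size` silent samples per word after a short delay."""
    for _word in text.split():
        time.sleep(delay_s)  # stand-in for model compute per chunk
        yield [0.0] * chunk_size


def consume_stream(chunks: Iterator[List[float]]) -> Tuple[float, float, int]:
    """Return (time to first audio, total time, number of samples) for a chunk stream."""
    start = time.perf_counter()
    first_audio = None
    n_samples = 0
    for chunk in chunks:
        if first_audio is None:
            first_audio = time.perf_counter() - start
        n_samples += len(chunk)  # a real player would send `chunk` to the audio device here
    total = time.perf_counter() - start
    return (first_audio if first_audio is not None else total), total, n_samples


if __name__ == "__main__":
    ttfa, total, n = consume_stream(fake_synthesizer("hello there from a streaming model"))
    print(f"time to first audio: {ttfa:.3f}s, total: {total:.3f}s, samples: {n}")
```

With streaming, time to first audio stays roughly at the cost of one chunk no matter how long the text is, whereas a non-streaming call only returns after the total time.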

Hugging Face
huggingface.co › pantelnm › OpenSpeech-TTS
pantelnm/OpenSpeech-TTS · Hugging Face
OpenSpeech TTS is a user-friendly and efficient Text-to-Speech (TTS) API designed to seamlessly integrate with OpenAI's powerful text-to-speech capabilities.
Hugging Face
huggingface.co › learn › audio-course › chapter5 › asr_models
Pre-trained models for automatic speech recognition - Hugging Face Audio Course
```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition", model="openai/whisper-base", device=device
)
```

Great! Now let's transcribe the audio as before. The only change we make is passing an extra argument, max_new_tokens, which tells the model the maximum number of tokens to generate when making its prediction: ...

{'text': ' He tells us that at this festive season of the year, with Christmas and roast beef looming before us, similarly is drawn from eating and its results occur most readily to the mind.'}
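Whisper models operate on 30-second windows of 16 kHz audio, so longer recordings have to be split before transcription. The `transformers` ASR pipeline can handle this through its `chunk_length_s` argument; the sketch below only illustrates the underlying idea of cutting a long sample array into overlapping windows so that words near a boundary appear in two chunks. The 5-second overlap is an illustrative assumption, and merging the overlapping transcripts back together is not shown.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio


def chunk_audio(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_s: float = 30.0,
    overlap_s: float = 5.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering `samples`."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    spans = []
    start = 0
    while True:
        end = min(start + chunk, len(samples))
        spans.append((start, end))
        if end == len(samples):  # last window reaches the end of the audio
            break
        start += step
    return spans


if __name__ == "__main__":
    audio = [0.0] * (70 * SAMPLE_RATE)  # 70 s of silence as a placeholder
    for start, end in chunk_audio(audio):
        print(f"{start / SAMPLE_RATE:5.1f}s -> {end / SAMPLE_RATE:5.1f}s")
```

For 70 seconds of audio this yields windows at 0 to 30 s, 25 to 55 s and 50 to 70 s; each window would then be passed to the model separately.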
Hugging Face
huggingface.co › models
Automatic Speech Recognition Models – Hugging Face
Hugging Face
huggingface.co › tasks › text-to-speech
What is Text-to-Speech? - Hugging Face
```python
from transformers import pipeline

synthesizer = pipeline("text-to-speech", "suno/bark")
synthesizer("Look I am generating speech in three lines of code!")
```

You can use huggingface.js to infer text-to-speech models on the Hugging Face Hub.
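The `text-to-speech` pipeline returns a dictionary with an `audio` array of floating-point samples and its `sampling_rate`. One way to save that to disk using only the standard library is to clip the samples to [-1, 1] and write them as 16-bit PCM with the `wave` module. The sketch below fakes the pipeline output with a generated sine tone (an assumption, so it runs without downloading Bark) and assumes mono audio given as a flat sequence; real pipeline output is a NumPy array that may carry an extra leading dimension, so flatten it first.

```python
import math
import struct
import wave
from typing import Sequence


def save_wav(path: str, audio: Sequence[float], sampling_rate: int) -> None:
    """Write mono float samples (clipped to [-1, 1]) to `path` as 16-bit PCM."""
    frames = bytearray()
    for sample in audio:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


if __name__ == "__main__":
    # Stand-in for `synthesizer(...)` output: 0.5 s of a 440 Hz tone at 24 kHz (assumed values).
    rate = 24_000
    fake_output = {
        "audio": [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)],
        "sampling_rate": rate,
    }
    save_wav("speech.wav", fake_output["audio"], fake_output["sampling_rate"])
```

Clipping before conversion avoids integer overflow when a model produces samples slightly outside [-1, 1].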
Hugging Face
huggingface.co › microsoft › VibeVoice-1.5B
microsoft/VibeVoice-1.5B · Hugging Face
Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 8 items • Updated 13 days ago • 171
KDnuggets
kdnuggets.com › striving-open-source-modular-gpt4o-hugging-face-speech
Striving for Open Source Modular GPT4-o with Hugging Face’s Speech To Speech - KDnuggets
In pursuit of the capabilities of closed-source models, Hugging Face has created a project called Speech-to-Speech that tries to emulate them. The project uses models from the Hugging Face Transformers library on the Hub to create a pipeline that can ...