Try https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2 (the smaller option) or https://huggingface.co/openai/whisper-large-v3-turbo. Run the Parakeet model with the NeMo library, and for the latter use something like https://github.com/SYSTRAN/faster-whisper or https://github.com/ggml-org/whisper.cpp. (Answer from Few-Welcome3297 on reddit.com)
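A minimal sketch of the two routes suggested in that answer. It assumes faster-whisper 1.1 or later (where the "large-v3-turbo" size name should resolve to a converted checkpoint) and NeMo are installed. The NeMo calls follow the pattern shown on the parakeet-tdt-0.6b-v2 model card. The srt_timestamp helper is a hypothetical addition for turning segment times into subtitle timestamps and is not part of either library.

```python
"""Sketch: transcribe a file with Whisper large-v3-turbo (faster-whisper)
or Parakeet TDT 0.6B v2 (NeMo). Heavy imports are deferred so the
timestamp helper works without either library installed."""


def srt_timestamp(seconds: float) -> str:
    """Hypothetical helper: format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcribe_whisper(path: str) -> list[tuple[float, float, str]]:
    # Assumption: faster-whisper >= 1.1 maps "large-v3-turbo" to a CTranslate2 conversion.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3-turbo", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=5)
    # `segments` is a lazy generator; iterating over it runs the decoding.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


def transcribe_parakeet(path: str) -> str:
    # Assumption: loading pattern taken from the nvidia/parakeet-tdt-0.6b-v2 model card.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
    output = model.transcribe([path])
    return output[0].text
```

Usage would look like `for start, end, text in transcribe_whisper("audio.wav"): print(srt_timestamp(start), "-->", srt_timestamp(end), text)`, with "audio.wav" as a placeholder. On a GPU, device="cuda" with compute_type="float16" is the usual swap.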
Hugging Face
Automatic Speech Recognition Models – Hugging Face
Discussions

Which hugging face llm is best for voice recognition
How do I fine-tune a Hugging Face LLM for a voice recognition project? Which model is best?
discuss.huggingface.co
February 29, 2024
🎧 Listen and Compare 12 Open-Source Text-to-Speech Models (Hugging Face Space)
Nice to have it all in one place. It'd be even nicer to have an apples-to-apples comparison, i.e. all female or all male voices instead of the current mix. Maybe both? The CSM example sounds like it's full of artifacts, just like F5-TTS, and both were highlighted for speech quality. Maybe something went wrong during generation? At least Sesame can sound way better. The Llasa sample seems slightly broken; maybe that's a hint that this happens more often? Same with the background noise for MegaTTS3. Orpheus was probably standing in a large room during the generation 😉.
r/LocalLLaMA
July 6, 2025
Real-Time Text-to-Speech Model
Greetings everyone, I'm currently looking for a real-time TTS model that can create audio as soon as I type. Kindly guide me in this regard.
discuss.huggingface.co
January 4, 2025
Best TTS model right now that I can self host?
This one came out about a month ago, and the quality of the generated voice is pretty good: ChatTTS (https://huggingface.co/2Noise/ChatTTS). It only supports English and Chinese TTS, and it can add laughter and pauses, which make the results sound more like natural speech. Edit: Based on TTS Arena stats, MeloTTS and GPT-SoVITS look worth checking out. ChatTTS isn't included in the TTS Arena rankings.
r/LocalLLaMA
July 3, 2024
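For the real-time TTS question in the discussions above, one common approach is to send text to the engine in small pieces instead of waiting for the whole input. The sketch below buffers typed text and releases it one sentence at a time. `synthesize` is a hypothetical stand-in for whichever TTS model is chosen, not a real library call.

```python
"""Sketch: feed typed text to a TTS engine one sentence at a time.

Assumption: `synthesize` is hypothetical; plug in any engine that maps
a string to audio bytes."""
import re
from typing import Callable, Iterable, Iterator

# A sentence boundary: whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(stream: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a text stream."""
    buffer = ""
    for piece in stream:
        buffer += piece
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


def speak_incrementally(stream: Iterable[str],
                        synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Call the hypothetical `synthesize` once per completed sentence."""
    for sentence in sentence_chunks(stream):
        yield synthesize(sentence)
```

Sentence-sized chunks are a trade-off: they keep prosody within a sentence intact but add latency until the sentence ends. A model with its own streaming mode can start even earlier.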
Hugging Face
Speech2Text
Check out the from_pretrained() method to load the model weights. The bare Speech To Text Text Model outputs raw hidden states without any specific head on top.
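A sketch of the from_pretrained() workflow, assuming transformers with PyTorch is installed and using the facebook/s2t-small-librispeech-asr checkpoint as an example. Because the bare model only outputs hidden states, transcription goes through Speech2TextForConditionalGeneration, which adds the generation head. Audio is assumed to be 16 kHz mono samples already loaded into memory.

```python
"""Sketch: transcribe with a Speech2Text checkpoint loaded via from_pretrained().

Assumptions: the "facebook/s2t-small-librispeech-asr" checkpoint and 16 kHz
mono float samples; transformers and torch must be installed to run it."""
from typing import Sequence

EXPECTED_RATE = 16_000  # assumption: the LibriSpeech S2T checkpoints expect 16 kHz audio


def transcribe_s2t(samples: Sequence[float], sampling_rate: int,
                   checkpoint: str = "facebook/s2t-small-librispeech-asr") -> str:
    if sampling_rate != EXPECTED_RATE:
        raise ValueError(f"resample to {EXPECTED_RATE} Hz first (got {sampling_rate})")
    # Deferred import so the check above runs without transformers installed.
    from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

    processor = Speech2TextProcessor.from_pretrained(checkpoint)
    # The ...ForConditionalGeneration class adds the head the bare model lacks.
    model = Speech2TextForConditionalGeneration.from_pretrained(checkpoint)
    inputs = processor(list(samples), sampling_rate=sampling_rate, return_tensors="pt")
    generated_ids = model.generate(inputs["input_features"],
                                   attention_mask=inputs["attention_mask"])
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```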
Hugging Face
openai/whisper-large-v3 · Hugging Face
Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise and technical language, as well as zero-shot translation from multiple languages into English, and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include text that is not actually spoken in the audio input (i.e. …
Modal
The Top Open Source Speech-to-Text (STT) Models in 2025
August 5, 2025 - Original Canary-1B (April 2024): This multilingual model supports English, German, French, and Spanish with bidirectional translation capabilities. It was trained on 85,000 hours of speech data and achieved a 6.67% word error rate on the HuggingFace ...
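The word error rate quoted here is conventionally WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment between hypothesis and reference words, and N is the number of reference words. Below is a plain-Python sketch. It applies no text normalization (lowercasing, punctuation stripping), which leaderboard numbers usually include.

```python
"""Sketch: word error rate, WER = (S + D + I) / N, via word-level edit distance."""


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

At 6.67%, that works out to roughly one word error per 15 reference words. Because insertions count, WER can exceed 100%: word_error_rate("a", "a b c") returns 2.0.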
Hugging Face
Speech to Text Models - a SamuraiBarbi Collection
Unlock the magic of AI with handpicked models, awesome datasets, papers, and mind-blowing Spaces from SamuraiBarbi
Medium
Here’s to the crazy ones, the misfits: Automatic Speech Recognition with PyTorch & Hugging Face
April 17, 2024 - One of the first things I noticed ... audio extract from one of the most inspiring speeches ever: the 1963 “I have a dream” speech from Martin Luther King. from transformers import pipeline transcriber = pipeline(task="au...
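The article's snippet is cut off mid-call, so the code below is not the author's. It is a minimal sketch of the same transformers pipeline idea, assuming transformers, PyTorch and ffmpeg are available and using the openai/whisper-large-v3 checkpoint from the result above. chunk_length_s lets the pipeline handle audio longer than Whisper's 30-second window, such as a full speech.

```python
"""Sketch: an ASR pipeline around openai/whisper-large-v3.

Assumptions: transformers, torch and ffmpeg are installed; the 30 s chunk
length is an illustrative choice."""


def build_transcriber(model_id: str = "openai/whisper-large-v3"):
    # Deferred import: building the pipeline downloads the model weights.
    from transformers import pipeline

    return pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long audio into 30 s windows
    )


def transcribe(path: str, transcriber=None) -> str:
    """Transcribe an audio file, reusing `transcriber` if one is passed in."""
    transcriber = transcriber or build_transcriber()
    result = transcriber(path, return_timestamps=True)
    return result["text"]
```

The transcriber argument exists so a pre-built pipeline (or a stub in tests) can be reused instead of reloading the weights on every call.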
Gladia
Best open-source speech-to-text models
Unlike most alternatives, which, albeit open source, are mainly backed by the private sector, SpeechBrain has a strong academic background spanning over 30 universities worldwide and a large supporting community. This community has shared over 200 competitive training recipes on more than 40 datasets, supporting 20 speech and text processing tasks. Over 100 models pretrained and hosted on Hugging Face can be plugged in and used directly or fine-tuned.
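To illustrate the plug-and-use claim, here is a sketch assuming SpeechBrain 1.0 or later (which exposes pretrained interfaces under speechbrain.inference), with the speechbrain/asr-crdnn-rnnlm-librispeech checkpoint as an example. Other SpeechBrain encoder-decoder ASR repositories should load the same way. The savedir naming is an arbitrary choice.

```python
"""Sketch: load a pretrained SpeechBrain ASR model from Hugging Face.

Assumptions: SpeechBrain >= 1.0 and the example repo id below."""
from pathlib import Path


def transcribe_with_speechbrain(audio_path: str,
                                source: str = "speechbrain/asr-crdnn-rnnlm-librispeech") -> str:
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)
    # Deferred import so the file check runs without speechbrain installed.
    from speechbrain.inference.ASR import EncoderDecoderASR

    asr = EncoderDecoderASR.from_hparams(
        source=source,
        savedir=f"pretrained_models/{source.split('/')[-1]}",  # local cache directory
    )
    return asr.transcribe_file(audio_path)
```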
Hugging Face
Text-to-Speech Models – Hugging Face
Reddit
r/LocalLLaMA on Reddit: 🎧 Listen and Compare 12 Open-Source Text-to-Speech Models (Hugging Face Space)
July 6, 2025

Hey everyone!

We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.

The demo features 12 popular TTS models, all tested using a consistent prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.

Would love to get feedback or suggestions!

👉 Check out the demo space and detailed comparison here!

👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2

Share your use case and we will update this space as required!

Which TTS model sounds most natural to you?

Cheers!

Hugging Face
Text to speech
In our experience, obtaining satisfactory results from this model can be challenging. The quality of the speaker embeddings appears to be a significant factor. Since SpeechT5 was pre-trained with English x-vectors, it performs best when using English speaker embeddings.
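Here is a minimal sketch of SpeechT5 synthesis with an explicit speaker embedding, assuming transformers, PyTorch and soundfile are installed. SpeechT5 conditions on a 512-dimensional x-vector. Obtaining a good English one (for example from a speaker-verification model run on English speech) is left to the caller, since that step is what the passage identifies as the main quality factor.

```python
"""Sketch: SpeechT5 text-to-speech with a caller-supplied speaker embedding.

Assumptions: transformers, torch and soundfile are installed; the caller
provides a 512-dim x-vector, ideally extracted from English speech."""
from typing import Sequence

XVECTOR_DIM = 512  # SpeechT5 conditions on 512-dim x-vector speaker embeddings


def synthesize(text: str, speaker_embedding: Sequence[float],
               out_path: str = "speech.wav") -> str:
    if len(speaker_embedding) != XVECTOR_DIM:
        raise ValueError(f"expected a {XVECTOR_DIM}-dim x-vector, got {len(speaker_embedding)}")
    # Deferred imports: the check above runs without the heavy dependencies.
    import soundfile as sf
    import torch
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # Shape (1, 512): a batch containing one speaker embedding.
    embedding = torch.tensor(list(speaker_embedding), dtype=torch.float32).unsqueeze(0)
    speech = model.generate_speech(inputs["input_ids"], embedding, vocoder=vocoder)
    sf.write(out_path, speech.numpy(), samplerate=16000)  # SpeechT5 produces 16 kHz audio
    return out_path
```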
Hugging Face
Realistic Text To Speech Unlimited - a Hugging Face Space by NihalGazi
Enter text, choose a voice and emotion, and generate audio. The text is checked for appropriateness before conversion. You'll get an audio file as a result.
Modal
The Top Open-Source Text to Speech (TTS) Models
This article explores the top open-source TTS models, based on Hugging Face’s trending models and insights from our developer community.
Hugging Face
TTS Arena Legacy - a Hugging Face Space by TTS-AGI
This application displays a leaderboard of Text-to-Speech models based on user votes. Users can filter the leaderboard to show preliminary results, exclude battle votes, sort by Arena Score, and hi...
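The snippet does not say how Arena Score is computed. As an illustration only, one common way to turn pairwise votes into a leaderboard is an Elo-style update. The code below is a generic sketch, not TTS Arena's method. The starting rating of 1000 and the K-factor of 32 are arbitrary choices.

```python
"""Sketch: Elo-style ratings from pairwise (winner, loser) votes.

Assumption: illustrative only; not TTS Arena's actual Arena Score computation."""


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A wins under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def apply_vote(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Move both ratings in place by k times how unexpected the outcome was."""
    ratings.setdefault(winner, 1000.0)  # arbitrary starting rating
    ratings.setdefault(loser, 1000.0)
    p_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - p_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def leaderboard(votes: list[tuple[str, str]], k: float = 32.0) -> list[tuple[str, float]]:
    """Replay (winner, loser) votes in order and sort models by rating, highest first."""
    ratings: dict[str, float] = {}
    for winner, loser in votes:
        apply_vote(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

Elo results depend on the order in which votes are replayed. Fitting a Bradley-Terry model over all votes at once is a common order-independent alternative.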
BentoML
The Best Open-Source Text-to-Speech Models in 2026
1 week ago - Dia is a dialogue-focused TTS model developed by Nari Labs. It is able to generate expressive, realistic multi-speaker conversations from text scripts, including nonverbal elements like laughter, coughing, or sighing. Its design makes it ideal for dynamic applications such as podcasts, audio dramas, game dialogues, or conversational interfaces. The newest release, Dia2, features a streaming architecture that can begin synthesizing speech from the first few tokens.
Analytics Vidhya
Top 12 Open Source Models on HuggingFace in 2025
May 9, 2025 - Indic Parler-TTS is a multilingual text-to-speech system developed collaboratively by AI4Bharat and HuggingFace to enhance linguistic inclusivity in AI applications across India. Supporting 21 languages—including Hindi, Bengali, Tamil, Telugu, ...
Hugging Face
What is Text-to-Speech? - Hugging Face
Note: A massively multilingual TTS model.