best speech to text model huggingface

What's the Best Speech-to-Text Model Right Now?

reddit.com › r › LocalLLaMA › comments › 1ng8bec › whats_the_best_speechtotext_model_right_now

Try https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2 (the smaller option) or https://huggingface.co/openai/whisper-large-v3-turbo. Run the Parakeet model with the NeMo library, and use something like https://github.com/SYSTRAN/faster-whisper or https://github.com/ggml-org/whisper.cpp for the latter.

Answer from Few-Welcome3297 on reddit.com
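
A minimal sketch of the two routes suggested above. It assumes `nemo_toolkit[asr]` (for Parakeet) and `faster-whisper` are installed. Each import is deferred into its own function, so you only need the library you actually call. The Parakeet call mirrors the usage shown on the nvidia/parakeet-tdt-0.6b-v2 model card. The `"large-v3-turbo"` size name assumes a recent faster-whisper release. The SRT helpers (`to_srt_time`, `format_segments`) are hypothetical additions showing one way to use the segment timestamps that faster-whisper returns.

```python
# Sketch: transcribe a file with either model from the thread.
# Heavy imports are deferred so each backend is only required if you call it.


def transcribe_parakeet(path: str) -> str:
    """Transcribe one audio file with NVIDIA Parakeet via NeMo."""
    import nemo.collections.asr as nemo_asr  # pip install "nemo_toolkit[asr]"

    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v2"
    )
    # transcribe() takes a list of paths; per the model card, recent NeMo
    # versions return hypothesis objects exposing a .text attribute.
    return model.transcribe([path])[0].text


def transcribe_faster_whisper(path: str, size: str = "large-v3-turbo"):
    """Return a list of (start, end, text) segments from faster-whisper."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel(size, device="auto", compute_type="default")
    segments, _info = model.transcribe(path, beam_size=5)
    # `segments` is a lazy generator: decoding actually runs while we iterate.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_segments(segments) -> str:
    """Render (start, end, text) tuples as a minimal SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

For example, `print(format_segments(transcribe_faster_whisper("talk.wav")))` would print subtitles for a hypothetical `talk.wav`.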

Hugging Face

huggingface.co › models

Automatic Speech Recognition Models – Hugging Face

Text-to-Speech · Text-to-Audio · Automatic Speech Recognition · Audio-to-Audio · Audio Classification · Voice Activity Detection · Tabular · Tabular Classification · Tabular Regression · Time Series Forecasting · Reinforcement Learning

reddit.com › r/localllama › what's the best speech-to-text model right now?

r/LocalLLaMA on Reddit: What's the Best Speech-to-Text Model Right Now?

September 13, 2025

I am looking for the best speech-to-text/speech recognition models. Could anyone recommend some?

Videos

3 steps to run HuggingFace 🤗 "Parler TTS" AI Voice on your local ... · YouTube · 18:42 · October 13, 2024

Let's Dive into a Speech Generation with AI Models Tutorial | ... · YouTube · 05:15 · March 11, 2024

Train your custom Speech Recognition Model with Hugging ... · m.youtube.com

Build Generative AI Text to Speech tool using Hugging Face ... · youtube.com

Creating a Text to Speech AI App with Hugging Face & Next.js - ... · YouTube · 21:30 · August 24, 2023

r/LocalLLaMA on Reddit: Chatterbox Turbo, new open-source voice ... · reddit.com · 8.56K · 1 day ago

Hugging Face

huggingface.co › docs › transformers › model_doc › speech_to_text

Speech2Text

Check out the from_pretrained() method to load the model weights. The bare Speech2Text model, outputting raw hidden-states without any specific head on top.
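
The bare model returns hidden states only. For actual transcription, the Speech2Text docs pair `Speech2TextProcessor` with `Speech2TextForConditionalGeneration`, which adds the language-modeling head. Below is a minimal sketch of that pattern. It assumes `transformers`, `torch` and `sentencepiece` are installed and that `samples` is a 1-D sequence of 16 kHz mono float samples. The `facebook/s2t-small-librispeech-asr` checkpoint is an illustrative choice.

```python
def transcribe_s2t(samples, sampling_rate: int = 16_000,
                   checkpoint: str = "facebook/s2t-small-librispeech-asr") -> str:
    """Transcribe mono float audio with a Speech2Text checkpoint."""
    # The LibriSpeech checkpoints expect 16 kHz input; resample before calling.
    if sampling_rate != 16_000:
        raise ValueError(f"expected 16 kHz audio, got {sampling_rate} Hz")

    # Deferred import so this module loads even without transformers installed.
    from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

    processor = Speech2TextProcessor.from_pretrained(checkpoint)
    model = Speech2TextForConditionalGeneration.from_pretrained(checkpoint)

    # The processor turns raw samples into filter-bank features plus a mask.
    inputs = processor(samples, sampling_rate=sampling_rate, return_tensors="pt")
    generated_ids = model.generate(
        inputs["input_features"], attention_mask=inputs["attention_mask"]
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```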

Hugging Face

huggingface.co › openai › whisper-large-v3

openai/whisper-large-v3 · Hugging Face

Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero-shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e.
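
One cheap guard against part of this failure mode is the check that openai-whisper's own `transcribe()` applies. When a decoded segment's compression ratio exceeds `compression_ratio_threshold` (default 2.4), it treats the decode as failed and repetitive and retries at a higher temperature. The sketch below reimplements that ratio with `zlib` so you can flag suspicious segments from any Whisper runtime. It only catches repetition loops, not fluent text that was never spoken. The `flag_repetitive` helper and its plain-string input are assumptions made for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """UTF-8 byte length divided by zlib-compressed length (as in openai-whisper)."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


def flag_repetitive(segments, threshold: float = 2.4):
    """Return the segment texts whose compression ratio exceeds `threshold`.

    `segments` is any iterable of strings, e.g. the segment texts of a
    Whisper transcript; 2.4 mirrors openai-whisper's default threshold.
    """
    return [text for text in segments if compression_ratio(text) > threshold]
```

Highly repetitive output, such as the same phrase looped dozens of times, compresses far better than ordinary speech, which is why a high ratio is a useful warning sign.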

Modal

modal.com › blog › open-source-stt

The Top Open Source Speech-to-Text (STT) Models in 2025

August 5, 2025 - Original Canary-1B (April 2024): This multilingual model supports English, German, French, and Spanish with bidirectional translation capabilities. It was trained on 85,000 hours of speech data and achieved a 6.67% word error rate on the HuggingFace ...
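
Word error rate, the metric behind the 6.67% figure, is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's hypothesis, divided by the number of reference words. Below is a minimal pure-Python sketch. It assumes only lowercasing and whitespace tokenization. Published benchmarks typically apply more text normalization first, so its scores will not match leaderboard numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Each row holds edit distances from a reference prefix to every
    # hypothesis prefix; row 0 is the distance from the empty reference.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("I have a dream", "I have dream"))  # 0.25 (one deletion / 4 words)
```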

Hugging Face

discuss.huggingface.co › 🤗hub

Which hugging face llm is best for voice recognition - 🤗Hub - Hugging Face Forums

February 29, 2024 - How do I fine-tune a Hugging Face LLM for a voice recognition project? Which model is best?

Hugging Face

huggingface.co › collections › SamuraiBarbi › speech-to-text-models

Speech to Text Models - a SamuraiBarbi Collection

Unlock the magic of AI with handpicked models, awesome datasets, papers, and mind-blowing Spaces from SamuraiBarbi

Medium

medium.com › latinxinai › heres-to-the-crazy-ones-the-misfits-45f2132623c7

Here’s to the crazy ones, the misfits: Automatic Speech Recognition with PyTorch & Hugging Face

April 17, 2024 - One of the first things I noticed ... audio extract from one of the most inspiring speeches ever: the 1963 “I have a dream” speech from Martin Luther King.

from transformers import pipeline
transcriber = pipeline(task="au...
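
The article's snippet is cut off mid-call. Below is a minimal sketch of the same idea using the transformers pipeline API. It assumes `transformers` and `torch` are installed and that ffmpeg is available for decoding audio files. The `openai/whisper-small` model id and the `chunks_to_lines` helper are illustrative choices, not taken from the article. With `return_timestamps=True`, the pipeline returns a dict with `text` plus a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries. The last end time can be `None`.

```python
def transcribe_file(path: str, model_id: str = "openai/whisper-small") -> dict:
    """Run the transformers ASR pipeline on an audio file."""
    from transformers import pipeline  # deferred: heavy import

    transcriber = pipeline(task="automatic-speech-recognition", model=model_id)
    # chunk_length_s splits long recordings into 30 s windows for Whisper;
    # return_timestamps=True adds a "chunks" list to the output dict.
    return transcriber(path, chunk_length_s=30, return_timestamps=True)


def _fmt(seconds) -> str:
    """Seconds with two decimals, or '?' when the pipeline leaves a bound open."""
    return "?" if seconds is None else f"{seconds:.2f}"


def chunks_to_lines(result: dict) -> list[str]:
    """Turn timestamped pipeline chunks into '[start - end] text' lines."""
    return [
        f"[{_fmt(chunk['timestamp'][0])} - {_fmt(chunk['timestamp'][1])}] "
        f"{chunk['text'].strip()}"
        for chunk in result.get("chunks", [])
    ]
```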

Gladia

gladia.io › blog › best-open-source-speech-to-text-models

Best open-source speech-to-text models

Unlike most alternatives, which, although open source, are mainly backed by the private sector, SpeechBrain has strong academic roots, spanning over 30 universities worldwide, and a large supporting community. This community has shared over 200 competitive training recipes on more than 40 datasets, supporting 20 speech and text processing tasks. Over 100 models pre-trained on Hugging Face can easily be plugged in and used as-is or fine-tuned.
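
As an illustration of plugging in one of those pretrained models, here is a minimal sketch using SpeechBrain's `EncoderDecoderASR` interface. It assumes SpeechBrain 1.x, which moved the pretrained interfaces to `speechbrain.inference`. The fallback import covers older releases that used `speechbrain.pretrained`. The `speechbrain/asr-crdnn-rnnlm-librispeech` checkpoint and the `savedir_for` helper are illustrative choices, not recommendations from the article.

```python
def savedir_for(source: str) -> str:
    """Local cache folder for a Hub id: 'org/name' -> 'pretrained_models/name'."""
    return "pretrained_models/" + source.rstrip("/").split("/")[-1]


def transcribe_speechbrain(path: str,
                           source: str = "speechbrain/asr-crdnn-rnnlm-librispeech") -> str:
    """Download (on first use) a pretrained SpeechBrain ASR model and transcribe a file."""
    try:
        from speechbrain.inference.ASR import EncoderDecoderASR  # SpeechBrain 1.x
    except ImportError:
        from speechbrain.pretrained import EncoderDecoderASR  # pre-1.0 releases
    asr = EncoderDecoderASR.from_hparams(source=source, savedir=savedir_for(source))
    return asr.transcribe_file(path)
```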

Hugging Face

huggingface.co › models

Text-to-Speech Models – Hugging Face

1 day ago - Text-to-Speech • 5B • Updated 2 days ago • 1.69k • 46 · Text-to-Speech • 3B • Updated Sep 1 • 517k • 2.09k · Text-to-Speech • Updated Apr 10 • 3.96M • • 5.4k · Text-to-Speech • Updated Dec 11, 2023 • 6.41M • 3.24k · Text-to-Speech • Updated 1 day ago • 18 • 24 ·

reddit.com › r/localllama › 🎧 listen and compare 12 open-source text-to-speech models (hugging face space)

r/LocalLLaMA on Reddit: 🎧 Listen and Compare 12 Open-Source Text-to-Speech Models (Hugging Face Space)

July 6, 2025

Hey everyone!

We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.

The demo features 12 popular TTS models, all tested using a consistent prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.

Would love to get feedback or suggestions!

👉 Check out the demo space and detailed comparison here!

👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2

Share your use-case and we will update this space as required!

Which TTS model sounds most natural to you?

Cheers!

Top answer

Score: 17

Nice to have it all in one place. It'd be even nicer to have an apples-to-apples comparison, i.e. all female or all male voices, instead of the current mix. Maybe both? The CSM example sounds like it's full of artifacts, just like F5-TTS, and both were highlighted for speech quality. Maybe something went wrong during generation? At least Sesame can sound way better. The Llasa sample seems slightly broken; maybe that's a hint that this happens more often? Same with the background noise for MegaTTS3. Orpheus was probably standing in a large room during the generation 😉.

Score: 14

Cool project! And thanks for the work you put into it and for making it a useful tool for others! I'd love to see Chatterbox and Kyutai added to the mix as well, assuming they are open source; if they aren't, of course, ignore this.

Hugging Face

huggingface.co › docs › transformers › en › tasks › text-to-speech

Text to speech

In our experience, obtaining satisfactory results from this model can be challenging. The quality of the speaker embeddings appears to be a significant factor. Since SpeechT5 was pre-trained with English x-vectors, it performs best when using English speaker embeddings.
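
Below is a minimal sketch of the inference step this guide describes. SpeechT5 conditions on a speaker embedding (a 512-dimensional x-vector), and the HiFi-GAN vocoder turns the generated spectrogram into a 16 kHz waveform. The sketch assumes `transformers` and `torch` are installed and that you already have an English x-vector as a list of 512 floats. How you obtain it, e.g. from an x-vector speaker-recognition model, is left out here. `write_wav` is a hypothetical stdlib-only helper for saving the result.

```python
import io
import struct
import wave


def synthesize_speecht5(text: str, speaker_embedding) -> list[float]:
    """Generate 16 kHz speech with SpeechT5; `speaker_embedding` is a 512-float x-vector."""
    import torch  # deferred: heavy imports
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # generate_speech expects a (1, 512) speaker-embedding tensor.
    speaker = torch.tensor(speaker_embedding, dtype=torch.float32).unsqueeze(0)
    speech = model.generate_speech(inputs["input_ids"], speaker, vocoder=vocoder)
    return speech.tolist()


def write_wav(samples, sample_rate: int = 16_000) -> bytes:
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes (values clipped)."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Writing `write_wav(synthesize_speecht5("Hello there!", xvector))` to a `.wav` file in binary mode gives a playable clip, where `xvector` is your 512-float embedding.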

Hugging Face

huggingface.co › spaces › NihalGazi › Text-To-Speech-Unlimited

Realistic Text To Speech Unlimited - a Hugging Face Space by NihalGazi

Enter text, choose a voice and emotion, and generate audio. The text is checked for appropriateness before conversion. You'll get an audio file as a result.

Modal

modal.com › blog › open-source-tts

The Top Open-Source Text to Speech (TTS) Models

This article explores the top open-source TTS models, based on Hugging Face’s trending models and insights from our developer community.

Hugging Face

huggingface.co › spaces › TTS-AGI › TTS-Arena

TTS Arena Legacy - a Hugging Face Space by TTS-AGI

This application displays a leaderboard of Text-to-Speech models based on user votes. Users can filter the leaderboard to show preliminary results, exclude battle votes, sort by Arena Score, and hi...

Hugging Face

discuss.huggingface.co › models

Real-Time Text-to-Speech Model - Models - Hugging Face Forums

January 4, 2025 - Greetings everyone, I’m currently looking for a real-time TTS model that can generate audio as soon as I type. Kindly guide me in this regard.

BentoML

bentoml.com › blog › exploring-the-world-of-open-source-text-to-speech-models

The Best Open-Source Text-to-Speech Models in 2026

1 week ago - Dia is a dialogue-focused TTS model developed by Nari Labs. It is able to generate expressive, realistic multi-speaker conversations from text scripts, including nonverbal elements like laughter, coughing, or sighing. Its design makes it ideal for dynamic applications such as podcasts, audio dramas, game dialogues, or conversational interfaces. The newest release, Dia2, features a streaming architecture that can begin synthesizing speech from the first few tokens.

Analytics Vidhya

analyticsvidhya.com › home › top 12 open source models on huggingface in 2025

Top 12 Open Source Models on HuggingFace in 2025

May 9, 2025 - Indic Parler-TTS is a multilingual text-to-speech system developed collaboratively by AI4Bharat and HuggingFace to enhance linguistic inclusivity in AI applications across India. Supporting 21 languages—including Hindi, Bengali, Tamil, Telugu, ...

reddit.com › r/localllama › best tts model right now that i can self host?

r/LocalLLaMA on Reddit: Best TTS model right now that I can self host?

July 3, 2024

Which TTS model has human-like quality and can be self-hosted?

Or is there a hosted cloud API with reasonable pricing that gives a good, natural voice, like ElevenLabs or Hume AI?