Hey everyone!
We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.
The demo features 12 popular TTS models, all tested with the same prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.
Would love to get feedback or suggestions!
👉 Check out the demo space and detailed comparison here!
👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2
Share your use case and we will update this space as needed!
Which TTS model sounds most natural to you?
Cheers!
I am looking for the best Speech-to-Text (speech recognition) models. Could anyone recommend some?
Hey,
AI has been going crazy lately and things are changing super fast. I created a video covering a few trending, publicly available TTS (text-to-speech) Hugging Face Spaces. Check it out!
https://www.youtube.com/watch?v=4-2Jk8muo7c
I can't wait to start testing these models' practical applications in future videos; it seems like this technology is finally starting to gain momentum!
Let me know what you think, and feel free to send any questions or requests for other videos.
Cheers!
Hi everyone, I'm VB, the GPU poor in residence (focus on open source audio and on-device ML) at Hugging Face! 🤗
Quite pleased to introduce you to Parler TTS v1 🔉: 885M (Mini) & 2.2B (Large) parameter, fully open-source Text-to-Speech models! 🤙
Some interesting things about it:
Trained on 45,000 hours of open speech (datasets released as well)
Up to 4x faster generation thanks to torch.compile & a static KV cache (compared to the previous v0.1 release)
Mini is trained with a larger text encoder; Large is trained with both a larger text encoder & a larger decoder
Also supports SDPA & Flash Attention 2 for an added speed boost
Built-in streaming: we provide a dedicated streaming class optimised for time to first audio
Better speaker consistency: more than a dozen speakers to choose from, or write your own speaker description prompt and use that
Not convinced by a speaker? You can fine-tune the model on your own dataset (only a couple of hours of audio would do)
Apache 2.0 licensed codebase, weights and datasets! 🤗
Can't wait to see what y'all build with this! 🫡
Quick links:
Model checkpoints: https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c
Space: https://huggingface.co/spaces/parler-tts/parler_tts
GitHub Repo: https://github.com/huggingface/parler-tts
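For anyone who wants to try this locally, here is a minimal sketch of generating speech from a text prompt plus a speaker description prompt. It assumes the `parler_tts` package from the GitHub repo linked above is installed along with `transformers` and `soundfile`, and that the Mini checkpoint is published as `parler-tts/parler-tts-mini-v1` (check the collection link for the exact checkpoint names). The `describe_speaker` helper is hypothetical: it only assembles a plain-English description string, and since the model reads free text, any wording works.

```python
"""Sketch: Parler TTS generation from a text prompt and a speaker description.

Assumes `pip install git+https://github.com/huggingface/parler-tts.git soundfile`.
The checkpoint name below is an assumption; see the model collection for the exact names.
Heavy imports live inside `synthesize` so the helper can be used on its own.
"""


def describe_speaker(gender: str, pace: str, pitch: str, quality: str) -> str:
    """Hypothetical helper: build a free-text speaker description prompt."""
    return (
        f"A {gender} speaker talks at a {pace} pace with a {pitch} pitch. "
        f"The recording is of {quality} quality."
    )


def synthesize(text: str, description: str, out_path: str,
               checkpoint: str = "parler-tts/parler-tts-mini-v1") -> str:
    """Generate speech for `text` in the voice described by `description`."""
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()

    # Write the waveform at the model's native sampling rate.
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call would look like `synthesize("Hey, how are you doing today?", describe_speaker("female", "moderate", "slightly high", "very high"), "out.wav")`. Keeping the imports inside `synthesize` means the description helper can be reused without loading the model; in a real script you would load the model and tokenizer once rather than on every call.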