Hey everyone!
We have been exploring various open-source Text-to-Speech (TTS) models, and decided to create a Hugging Face demo space that makes it easy to compare their quality side-by-side.
The demo features 12 popular TTS models, all tested using a consistent prompt, so you can quickly hear and compare their synthesized speech and choose the best one for your audio projects.
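For anyone who wants to try the same "one prompt, many models" idea locally, here is a minimal sketch of how such a comparison could be organized. It is not the code behind the demo space. `fake_tts` is a hypothetical stand-in that just produces a tone, and `render_all`, `PROMPT`, and the output file names are made up for illustration. You would swap each stand-in for a real model's synthesis call.

```python
import math
import struct
import wave
from pathlib import Path
from typing import Callable, Dict, List

SAMPLE_RATE = 16_000      # assumed output rate; real models differ
SAMPLES_PER_CHAR = 320    # 20 ms of audio per character, purely illustrative
PROMPT = "The quick brown fox jumps over the lazy dog."


def fake_tts(pitch_hz: float) -> Callable[[str], List[int]]:
    """Hypothetical stand-in for a TTS model: returns 16-bit mono PCM samples."""
    def synthesize(text: str) -> List[int]:
        n = SAMPLES_PER_CHAR * len(text)
        return [
            int(12_000 * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE))
            for i in range(n)
        ]
    return synthesize


def render_all(
    models: Dict[str, Callable[[str], List[int]]],
    prompt: str,
    out_dir: str,
) -> Dict[str, Path]:
    """Run every model on the same prompt and write one WAV file per model."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: Dict[str, Path] = {}
    for name, synthesize in models.items():
        samples = synthesize(prompt)
        path = out / f"{name}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)           # mono
            wav.setsampwidth(2)           # 16-bit samples
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
        paths[name] = path
    return paths
```

Calling `render_all({"model_a": fake_tts(220.0), "model_b": fake_tts(440.0)}, PROMPT, "tts_out")` writes `model_a.wav` and `model_b.wav`, which you can then listen to back to back.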
Would love to get feedback or suggestions!
👉 Check out the demo space and detailed comparison here!
👉 Check out the blog: Choosing the Right Text-to-Speech Model: Part 2
Share your use-case and we will update this space as required!
Which TTS model sounds most natural to you?
Cheers!
I am looking for the best speech-to-text (speech recognition) models. Could anyone recommend some?
Hey,
AI has been going crazy lately and things are changing super fast. I made a video covering a few trending, publicly available TTS (text-to-speech) Hugging Face Spaces. Check it out!
https://www.youtube.com/watch?v=4-2Jk8muo7c
I can't wait to start testing these models' applications in future videos. It seems like this technology is finally starting to gain momentum!
Let me know what you think about it, or if you have any questions / requests for other videos as well,
cheers
For me, the go-to source for answering the question "What's the state of the art for [task]?" is Papers with Code. They compile tasks, benchmarks, papers (and their associated official/unofficial implementations) into a series of leaderboards.
Consider the text-to-speech leaderboard, for example. At the top, you'll see a series of benchmarks, which you can click on to see the top-performing papers.
Many popular leaderboards (like the one for TTS) also have a libraries section. These libraries may not be state of the art, but they will probably be the easiest to get up and running.
There are some things to keep in mind, however.
Take benchmarks with a grain of salt, especially for something like text-to-speech, where it's difficult to quantify performance in a single number.
Research repositories can be very finicky. Make sure to check whether data and checkpoints are available. I'd recommend starting with existing, maintained libraries, but if you're looking for the state of the art, this might not be an option. If checkpoints or data aren't publicly available, you may be able to email the paper's authors to request them.
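To make the point about benchmarks a bit more concrete: TTS papers commonly report a Mean Opinion Score (MOS), which is an average of listener ratings on a 1 to 5 scale. The sketch below computes a MOS with an approximate 95% interval, assuming a normal approximation. The function name and the ratings are hypothetical and not taken from any leaderboard.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[float]) -> Tuple[float, float]:
    """Return (mean rating, half-width of an approximate 95% interval).

    Uses the normal approximation 1.96 * standard error, which is rough
    for small listener panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.fmean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, 1.96 * std_err


# Hypothetical ratings from two small listener panels for the same clip.
panel_a = [4, 5, 4, 4, 5, 3, 4, 5]
panel_b = [3, 5, 2, 5, 4, 5, 3, 5]

for name, ratings in (("panel_a", panel_a), ("panel_b", panel_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

With panels this small the intervals tend to be wide, which is one reason a single leaderboard number may not settle which model actually sounds better.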
Yeah, I’ve been running into the same problem. Most open-source TTS options either sound robotic or require a ton of setup and tweaking. Coqui is great but, like you said, the good stuff isn’t really available unless you pay. You might want to check out Piper, from the Rhasspy project. It’s open-source and getting better, especially with custom voice training. Not perfect, but might be good enough depending on your use case.