Is there a good free TTS API or on-device TTS model that works with Apple Shortcuts? I've seen many TTS AI models launch over the past months, so I'm wondering if anyone has had any luck using one of them with Shortcuts.
The new Siri voice is good but still far from natural. I use ElevenLabs, but it's way too expensive for roughly 6 minutes of audio every day.
I have a morning overview shortcut that outputs about 10 paragraphs of text every day, which is then converted to audio. Finding an alternative was supposed to be my weekend project, but so far no luck. I tried connecting to the Minimax audio v2 API, but that didn't work either. So I'd greatly appreciate some alternatives.
Hi all, I recently ran a benchmark comparing a bunch of speech-to-text APIs and models under real-world conditions such as background noise, non-native accents, and technical vocabulary.
It includes the big players (Google, AWS, MS Azure), open source models like Whisper (small and large), speech recognition startups like AssemblyAI / Deepgram / Speechmatics, and newer LLM-based models like Gemini 2.0 Flash/Pro and GPT-4o. I've benchmarked the real-time streaming versions of some of the APIs as well.
I mostly did this to decide the best API to use for an app I'm building but figured this might be helpful for other builders too. Would love to know what other cases would be useful to include too.
Link here: https://voicewriter.io/speech-recognition-leaderboard
TLDR if you don't want to click on the link: the best model right now seems to be GPT-4o-transcribe, followed by ElevenLabs, Whisper-large, and the Gemini models. All the startups and AWS/Microsoft are decent, with performance varying by situation. Google (the original, not Gemini) is extremely bad.
What I'm looking for
Yes, "best" is subjective, but specifically what I'm looking for in a text to speech API is one that is as cheap as possible without sacrificing the qualities below:
Good selection of voices and voice customization (speaking rate, tonality, etc.)
A company that is easy to work with and can make fairly reasonable deals on pricing.
Easy-to-use API
And as a bonus, it would be nice for the API to have some sort of caching mechanism, so that repeating the same line doesn't incur additional usage costs.
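If the provider doesn't offer caching, it can also be done on the client side. Here is a minimal sketch, assuming the audio comes back as bytes. `synthesize` is a hypothetical stand-in for whatever function wraps the chosen provider's API, and `tts_cache` is just an illustrative folder name, not anything from a real library:

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_tts(
    text: str,
    voice: str,
    synthesize: Callable[[str, str], bytes],  # hypothetical wrapper around the paid API
    cache_dir: Path = Path("tts_cache"),      # illustrative location for cached audio
) -> bytes:
    """Return audio for (text, voice), calling the paid API only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key on both voice and text so the same line in another voice is not reused.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.audio"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice)
    path.write_bytes(audio)
    return audio
```

With this, a repeated line only costs API usage the first time it is spoken. Any voice settings that change the audio (rate, tonality, and so on) would need to be part of the key too.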
Context for why I'm looking
I'm creating a website that is heavily reliant on text to speech. I've been using the Web Speech API, which has been great, especially because it's free. However, the voices don't sound natural whatsoever, and I'd like to leverage something like ElevenLabs (but once again, I'm looking for any alternatives people have had success with) for my use case.
Or, if people have advice on creating my own text to speech model, and it's low effort - please advise 🤣 Although my assumption is that it will be a lot of effort and spendy.
Hi all, I was wondering if you know of any affordable, convincing text to speech APIs I could use in a commercial app I'm making to speak ChatGPT's text output.
I assume I would have to limit usage per user due to high costs over time, and then they could pay for additional hours of use (which is a whole other can of worms).
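For what it's worth, the limiting part could look roughly like the sketch below. It counts characters rather than hours, on the assumption that the provider bills per character. `UsageQuota` and its method names are made up for illustration, and everything lives in memory, so a real app would need to store the counters in a database:

```python
from collections import defaultdict


class UsageQuota:
    """Hypothetical per-user character quota with purchasable top-ups."""

    def __init__(self, free_chars: int) -> None:
        self.free_chars = free_chars       # allowance every user starts with
        self.used = defaultdict(int)       # characters spoken so far, per user
        self.purchased = defaultdict(int)  # extra characters bought, per user

    def remaining(self, user_id: str) -> int:
        return self.free_chars + self.purchased[user_id] - self.used[user_id]

    def add_credit(self, user_id: str, chars: int) -> None:
        self.purchased[user_id] += chars

    def try_consume(self, user_id: str, text: str) -> bool:
        """Charge the user for `text`; refuse without charging if over quota."""
        cost = len(text)
        if cost > self.remaining(user_id):
            return False
        self.used[user_id] += cost
        return True
```

The app would call `try_consume` before sending text to the TTS API and prompt the user to buy more when it returns `False`.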
I know everyone uses ElevenLabs, but what about cheaper alternatives?
Has anyone used Google Text-to-Speech? Is it more affordable? (The pricing says $16 per 1 million bytes or characters, which doesn't help me understand how much usage that actually gets you.)
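To turn that price into something more tangible, here is a rough back-of-the-envelope estimate. The $16 per 1 million characters is the figure quoted above. The 6 characters per word (including spaces and punctuation) and 150 spoken words per minute are my own assumptions, so adjust them for your text and voice:

```python
PRICE_PER_MILLION_CHARS = 16.0  # USD, the figure quoted above
CHARS_PER_WORD = 6.0            # assumption: average word plus space/punctuation
WORDS_PER_MINUTE = 150.0        # assumption: typical speaking pace


def estimated_minutes(chars: int) -> float:
    """Estimate minutes of speech produced by `chars` characters of text."""
    return chars / CHARS_PER_WORD / WORDS_PER_MINUTE


minutes = estimated_minutes(1_000_000)
cost_per_minute = PRICE_PER_MILLION_CHARS / minutes

print(f"{minutes:.0f} minutes (~{minutes / 60:.1f} hours) per 1M characters")
print(f"${cost_per_minute:.4f} per minute of audio")
```

Under those assumptions, 1 million characters comes out to about 1,100 minutes (roughly 18.5 hours) of audio, or around 1.4 cents per minute.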
Any other good options?
Any input would be greatly appreciated, thank you!