Tricking the AI into pronouncing words properly
System for writing the pronunciation of words (across languages) using AI?
For people who like to make up words: do you use a website/app to find out how to pronounce them?
Here's a tip: I've noticed that some words are consistently mispronounced. You can work around this by changing the spelling of the word to something the AI apparently reads more easily.
For example, the word "emptiness" is consistently pronounced "empy-ness", which does not sound good. Change it to "emptyness": it is not the correct spelling, but the AI understands it and now pronounces the word correctly. So just experiment with the spelling and you can trick the AI into pronouncing words correctly when it would otherwise mispronounce them.
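The respelling trick can be automated as a preprocessing pass that swaps in your TTS-friendly spellings before the text reaches the voice. A minimal sketch, assuming you maintain your own respelling table; `RESPELLINGS` and `apply_respellings` are hypothetical names, not part of any TTS product:

```python
import re

# Hypothetical respelling table: keys are correct spellings the voice
# mispronounces, values are spellings that happen to sound right.
RESPELLINGS = {"emptiness": "emptyness"}


def apply_respellings(text: str, respellings: dict[str, str] = RESPELLINGS) -> str:
    """Swap whole words for TTS-friendly spellings before synthesis."""
    if not respellings:
        return text
    # Try longer keys first so a multi-word entry is not cut short by a shorter one.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    lookup = {key.lower(): value for key, value in respellings.items()}

    def substitute(match: re.Match) -> str:
        word = match.group(0)
        replacement = lookup[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if word[:1].isupper():
            replacement = replacement[:1].upper() + replacement[1:]
        return replacement

    return pattern.sub(substitute, text)


print(apply_respellings("Emptiness fills the room; such emptiness."))
# prints: Emptyness fills the room; such emptyness.
```

Matching whole words only (the `\b` boundaries) keeps a respelling from firing inside longer words such as "emptinesses".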
Say I have written the IPA form of 1000 English words, i.e. written down the pronunciations of 1000 words in a standard text format (or used something like the CMU Pronouncing Dictionary for English, which has written pronunciations for over 100k words). Ideally the approach would need fewer than 1000 written pronunciations covering all aspects of a language's pronunciation, since more than that would be a lot of manual work just to create the training data.
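If you start from the CMU Pronouncing Dictionary, note that it writes pronunciations in ARPAbet with stress digits, not IPA, so its entries need converting before they can sit alongside hand-written IPA. A naive converter is sketched below under stated assumptions: the table covers the dictionary's 39 phones with common General American values, the function names are hypothetical, and the sample line only mimics the dictionary's layout rather than quoting it. It also places stress marks directly before the stressed vowel, whereas dictionaries usually put them at the start of the syllable, so its output will not match hand-written conventions exactly.

```python
# Naive ARPAbet -> IPA table for the 39 CMUdict phones (General American values).
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "ER": "ɝ",
    "EY": "eɪ", "F": "f", "G": "ɡ", "HH": "h", "IH": "ɪ", "IY": "i",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "oʊ", "OY": "ɔɪ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ",
    "T": "t", "TH": "θ", "UH": "ʊ", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}
# Unstressed (stress digit 0) vowels that are usually written differently.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}
# CMUdict marks stress with a digit on the vowel: 1 primary, 2 secondary, 0 none.
STRESS_MARKS = {"0": "", "1": "ˈ", "2": "ˌ"}


def arpabet_to_ipa(phones: str) -> str:
    """Convert a space-separated ARPAbet string such as 'K R IY2 ...' to IPA."""
    out = []
    for phone in phones.split():
        if phone[-1].isdigit():
            base, stress = phone[:-1], phone[-1]
        else:
            base, stress = phone, ""
        if stress == "0" and base in UNSTRESSED:
            symbol = UNSTRESSED[base]
        else:
            symbol = ARPABET_TO_IPA[base]
        # Simplification: the stress mark goes right before the vowel,
        # not at the start of the syllable.
        out.append(STRESS_MARKS.get(stress, "") + symbol)
    return "".join(out)


def parse_cmudict_line(line: str) -> tuple[str, str]:
    """Split a 'WORD  PH1 PH2 ...' line into (word, '/ipa/')."""
    word, phones = line.split(maxsplit=1)
    return word.lower(), "/" + arpabet_to_ipa(phones) + "/"


print(parse_cmudict_line("CREATIVITY  K R IY2 EY0 T IH1 V AH0 T IY0"))
# prints: ('creativity', '/kɹˌieɪtˈɪvəti/')
```

Real dictionary files also contain alternate-pronunciation entries such as `WORD(1)`, which you would strip or keep as variants before converting.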
Ideally, I could write 1000 IPA transcriptions in each of 100 different languages, and some AI model would build on them to automatically generate IPA transcriptions from spoken audio. Is that possible?
So I'd basically have a CSV like this:
sublime,/səˈblaɪm/
creativity,/krieɪˈtɪvəti/
bounty,/ˈbaʊn(t)i/
...
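Loading such a file is the first step of any pipeline built on it. A minimal sketch, assuming one `word,/ipa/` pair per line and UTF-8 text (IPA is not ASCII); `load_lexicon` is a hypothetical helper name:

```python
import csv
import io
from typing import Iterable


def load_lexicon(lines: Iterable[str]) -> dict[str, str]:
    """Read one `word,/ipa/` pair per line into a dict, dropping the slashes."""
    lexicon: dict[str, str] = {}
    for row in csv.reader(lines):
        if not row or not "".join(row).strip():
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        word, ipa = (field.strip() for field in row)
        lexicon[word] = ipa.strip("/")
    return lexicon


sample = io.StringIO("sublime,/səˈblaɪm/\ncreativity,/krieɪˈtɪvəti/\nbounty,/ˈbaʊn(t)i/\n")
print(load_lexicon(sample))
```

For a real file you would pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. Note that `/ˈbaʊn(t)i/` uses parentheses for an optional sound; you would probably want to expand or normalize such notation before using it as a training target.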
Given that this is all the data I have (a CSV of words and their written pronunciations), what sort of AI tools would I need to take speech audio and produce the IPA transcription from it?
Tools like AWS Polly and Google Cloud Text-to-Speech can take IPA transcriptions (like my CSV above) and convert them into AI-generated audio for many languages, with several voices per language (see the Google Cloud link for supported languages). But how would you do the reverse, converting audio into IPA transcriptions?
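For the direction that already works, both Amazon Polly and Google Cloud Text-to-Speech read IPA through the SSML `<phoneme alphabet="ipa" ph="...">` tag. A small helper that turns one CSV row into such markup might look like this; `phoneme_ssml` is a hypothetical name, and each service documents which IPA symbols it accepts, so notation such as the optional `(t)` in `/ˈbaʊn(t)i/` may be rejected.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_ssml(word: str, ipa: str) -> str:
    """Wrap `word` in an SSML <phoneme> tag so the voice uses the given IPA."""
    ph = ipa.strip().strip("/")  # the tag takes bare IPA, without /slashes/
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ph)}>{escape(word)}</phoneme>'
        "</speak>"
    )


print(phoneme_ssml("sublime", "/səˈblaɪm/"))
# prints: <speak><phoneme alphabet="ipa" ph="səˈblaɪm">sublime</phoneme></speak>
```

The resulting string has to be sent as SSML input (for example `TextType="ssml"` in Polly's `SynthesizeSpeech`) rather than as plain text, otherwise the tags are read aloud or rejected.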
What is the basic set of tools you would need to wire together, and which existing "AI models" would you need to find in the wild to make this happen, assuming you don't have access to the huge datasets that Meta/Google/Microsoft/etc. have? For example, I would need some existing model for handling the audio of voices.
The part I would add (so far, it seems) would be a small curated list of ~1000 words and their written IPA pronunciations. I would then plug in a word like "creativity" to get its audio from some free AI tool. Then I would write code to train the AI to convert the audio of "creativity" into "/krieɪˈtɪvəti/" (and correspondingly, after training, "creation", for which it may never have seen an IPA transcription, into "/kriːˈʲeɪʃɘn/").
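Going from audio to IPA is usually called phone (or phoneme) recognition. One common recipe, assumed here rather than the only option, is to fine-tune a pretrained multilingual speech encoder such as wav2vec 2.0 / XLS-R with a CTC output layer whose labels are IPA symbols. Because such a model learns sounds rather than whole words, it can in principle transcribe "creation" without ever having seen its IPA. The text side of that pipeline is small: split each transcription into symbols and give every symbol an integer id. The sketch below covers only that step; the token names and the splitting rule (combining diacritics stay on their base letter, modifier letters such as ˈ and ː become separate tokens) are assumptions.

```python
import unicodedata

# Hypothetical special tokens; id 0 is reserved for padding / the CTC blank
# (a common convention, assumed here).
PAD, UNK = "<pad>", "<unk>"


def ipa_symbols(ipa: str) -> list[str]:
    """Split IPA into symbols, keeping combining diacritics on their base letter."""
    symbols: list[str] = []
    for ch in unicodedata.normalize("NFD", ipa.strip().strip("/")):
        if unicodedata.combining(ch) and symbols:
            symbols[-1] += ch  # e.g. a + combining tilde stays one token
        elif not ch.isspace():
            symbols.append(ch)
    return symbols


def build_vocab(transcriptions: list[str]) -> dict[str, int]:
    """Map every symbol seen in the training transcriptions to an integer id."""
    seen = sorted({s for t in transcriptions for s in ipa_symbols(t)})
    vocab = {PAD: 0, UNK: 1}
    for symbol in seen:
        vocab[symbol] = len(vocab)
    return vocab


def encode(ipa: str, vocab: dict[str, int]) -> list[int]:
    """Turn one transcription into the id sequence a CTC loss would use as target."""
    return [vocab.get(s, vocab[UNK]) for s in ipa_symbols(ipa)]


vocab = build_vocab(["/səˈblaɪm/", "/krieɪˈtɪvəti/", "/ˈbaʊnti/"])
print(len(vocab), encode("/ˈbaʊnti/", vocab))
```

Whatever splitting rule you choose, every language's transcriptions should follow the same conventions, or the model ends up learning two spellings of the same sound.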
What is the general approach to making this a reality?
I would like to build a curated database of IPA pronunciations for words. The best current source is Wiktionary, but for many languages its IPA transcriptions are sorely lacking. So if we could use AI to take an existing database of audio recordings of words in hundreds of languages and convert it into written IPA pronunciations, that would be a huge win. Is that at all possible?
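Before pouring automatic transcriptions into a curated database, it helps to measure how often they are wrong. A standard metric for this is phone error rate (PER): the edit distance between the predicted and reference symbol sequences, divided by the reference length. A sketch, assuming you hold back some of your hand-written transcriptions as the reference set and split both sides into symbols the same way:

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (symbol missing from hyp)
                curr[j - 1] + 1,         # insertion (extra symbol in hyp)
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Edit distance divided by reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference transcription is empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Plain strings work as symbol sequences when every symbol is one code point.
print(phone_error_rate("kriˈeɪʃən", "kriˈeɪʃɘn"))
```

Here one substituted vowel out of nine symbols gives a PER of 1/9; tracking this per language would show where automatic transcriptions are good enough to add and where they still need manual review.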
I am a full-stack software developer focusing mainly on TypeScript, so that's where I'm coming from. If this is possible in Python or some other language, I will happily learn it next; I'm just unsure which components go into the final mix to make it possible.
So I basically just want to know if there's an app/website you know of that can pronounce made-up words, like the name "Shamól". I don't know how to pronounce it correctly because of speaking difficulties, so I'm wondering if there's a website or something you use to figure out how to pronounce made-up words.
Thanks for any help you can share :)