How to Pronounce Google Phonetically (With Audio)?
Google Dictionary's Ridiculously Sexy Voice for Pronunciation
How to use Phonetic or Phoneme pronunciation in google text to speech? - Stack Overflow
Google Pronunciation voice
Hey Ms. Dictionary, allow me to pronounce something for you, my love, your voice, one marriage, define choice.
I had to hunt for all these, since that voice only does certain words, and the links seem unstable, but there are enough examples. Open in all tabs first and click in order.
1 2 3 4 5 6 7 8
...and yes, that was a poem addressed to a dictionary, but the sexiest god damn dictionary I've ever heard.
Right click the page and select "Inspect Element", then go to the network tab. Now, refresh the page with the network panel still open. Wait until nothing is showing up there anymore. While waiting, make sure not to get your mouse near the Listen button. Once nothing is showing up in the network panel, hover and click the listen button. As soon as you hover the listen button, an entry will appear that says "batchexecute". Find this entry. It should be above entries that say log?format=json&hasfast=….
Click on that entry, then select the "Response" tab on the right. There should be a long run of random-looking characters that extends far off the screen to the right.
Select just that text and copy it. The easiest way to do this is to scroll all the way to the right first and then click and hold to the right of the ending quotation mark, then move your mouse up to the line above, then move your mouse down to reach the starting quotation mark, holding the mouse the whole time.
Go to the console tab, type v= and paste the copied text, then press Enter. Then paste the following into the console and press Enter:
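{
  // Build a temporary link whose target is the base64 audio as a data URL.
  const a = document.createElement("a");
  a.href = "data:audio/mp3;base64," + JSON.parse(v)[0];
  // Suggest a file name and trigger the download.
  a.download = "file.mp3";
  a.click();
}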
The mp3 file will download.
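The same decoding step can also be done outside the browser. The following is a minimal sketch for Node.js, assuming the copied response text has the shape the console snippet above relies on: a JSON array whose first element is the base64-encoded audio. The function name `decodeAudioPayload` is made up for illustration.

```javascript
// Sketch only: assumes `responseText` is a JSON array whose first
// element is a base64 string holding the MP3 data (the same shape the
// console snippet reads with JSON.parse(v)[0]).
function decodeAudioPayload(responseText) {
  const parsed = JSON.parse(responseText);
  if (!Array.isArray(parsed) || typeof parsed[0] !== "string") {
    throw new TypeError("expected a JSON array starting with a base64 string");
  }
  // Buffer.from(..., "base64") turns the base64 text into raw bytes.
  return Buffer.from(parsed[0], "base64");
}

// Possible usage, with `v` holding the copied response text:
// require("fs").writeFileSync("file.mp3", decodeAudioPayload(v));
```

Checking the shape before decoding gives a clear error if the wrong part of the response was copied, instead of silently writing a broken file.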
- Search Google for the word whose pronunciation you want to download, using the query "How to pronounce word".
- Right-click the page and click View page source.
- Search for "mp3".
- Click the mp3 link.
- Click the 3 dots and click Download.
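The "search the page source for mp3" step can be sketched as a small helper. This is an illustration, not a description of how Google's markup is structured: it assumes the audio link appears in the source as a plain absolute or protocol-relative URL ending in `.mp3`. The name `findMp3Links` and the example URL are hypothetical.

```javascript
// Sketch only: scans page source text for URLs ending in ".mp3".
// Assumes links appear as plain http(s) or protocol-relative URLs;
// escaped or dynamically built URLs would not be found.
function findMp3Links(html) {
  const pattern = /(?:https?:)?\/\/[^\s"'<>\\]+\.mp3/gi;
  const matches = html.match(pattern) || [];
  // Remove duplicates while keeping first-seen order.
  return [...new Set(matches)];
}

// Possible usage with a made-up snippet of page source:
// findMp3Links('<audio src="https://example.com/audio/word.mp3"></audio>');
```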
Google Text-to-Speech has supported the <phoneme> tag since at least spring 2021.
However, there are a lot of potential gotchas to overcome:
- The demo page filters out <phoneme> tags on the client side before they even reach the API. (It does the same with the <voice> tag, as pointed out here.)
- As with Microsoft Azure Text-to-speech (see the other answer for details), each language only supports a limited set of phonemes ("letters") that can be used.
- If you use an unsupported one, the phoneme tag is completely ignored without any warning. So the official example <phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">manitoba</phoneme> does not work with any English variant but en-US, since all others lack the "o" or "oʊ" phoneme.
- It's unclear if you need to use the v1beta1 API (which I can confirm is working) or if version v1 is also ok.
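Because the demo page strips the tag, the SSML has to be sent to the API directly. Below is a minimal sketch of building such a request body, assuming the REST `text:synthesize` method with its `input.ssml`, `voice.languageCode`, and `audioConfig.audioEncoding` fields. The helper names `escapeXml` and `buildPhonemeRequest` are made up for illustration, and the sketch sends nothing over the network.

```javascript
// Escape characters that would otherwise break the SSML markup.
function escapeXml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Sketch only: builds a request body that wraps one word in a
// <phoneme> tag. en-US is the default because, per the answer above,
// it is the English variant that accepts the official example.
function buildPhonemeRequest(word, ipa, languageCode = "en-US") {
  const ssml =
    `<speak><phoneme alphabet="ipa" ph="${escapeXml(ipa)}">` +
    `${escapeXml(word)}</phoneme></speak>`;
  return {
    input: { ssml },
    voice: { languageCode },
    audioConfig: { audioEncoding: "MP3" },
  };
}

// Possible usage (assumed endpoint, to be checked against the docs):
// POST the JSON-encoded body with credentials to
// https://texttospeech.googleapis.com/v1beta1/text:synthesize
```

Since an unsupported phoneme is ignored silently, it is worth listening to the result for each language variant rather than trusting a successful response.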
There is the SSML tag <phoneme> that serves your purpose.
Unfortunately, it's currently not supported in Google Cloud Text-to-speech. The available subset of SSML tags for Google Cloud is listed in the documentation. The <phoneme> tag is not in this list. An experiment using Google Cloud's text-to-speech-demo confirms that the phonemes are ignored. The content of the tag is being read as ordinary text, as has already been remarked by @Trevor in the comments.
The <phoneme> tag is, however, supported by Microsoft Azure Text-to-Speech and Amazon Polly. In both cases, the available phonemes are limited to those available in the language being used (see here for Azure and here for Polly). The Azure documentation isn't 100% clear about the exclusion of out-of-language phonemes, but practical experiments with the Azure Text-to-Speech demo confirm that they don't work properly. In some cases, they at least seem to be replaced by the nearest available equivalent in the language used.
Being restricted to the phonemes of one language severely limits the usefulness of the phoneme tag. E.g., you can't use the feature to embed correctly pronounced content in a second language, as the second language will usually have some phonemes that are not available in the first language. Concrete language pairs in which each language has some phonemes that are not available in the other one are English/German, Spanish/German, and English/Spanish.