If you're going to do a lot of this, you should really use the API that they offer, but here's a quick step by step if you just want to download a single sample of Google's speech synthesis.
- Go to the page in Google Chrome.
- Open the Developer Tools (by pushing F12)
- Go to the "Network" tab.
- Enter the text you want to get audio of.
- Click the "SPEAK IT" button.
- Watch the "Network" tab populate with a couple of entries.
- Right-click the entry that starts with "data:audio/wav;base64," and click "Open in new tab".
- In the new tab, right-click the audio player, and click "Save video as..."
- Choose where you want to save the resulting .wav file.
Note: This will get you a (marginally) cleaner copy of the audio than recording the Stereo Mix from your sound card.
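If you'd rather not go through the new tab, the same data: entry can be decoded by hand once you have copied its full URL as text. This is only a rough sketch: save_data_uri is a made-up helper name, and it assumes the base64 tool takes -d, as the GNU coreutils version does.

```shell
# Hypothetical helper: write the payload of a base64 data URI to a file.
# $1 = the full "data:...;base64,...." string, $2 = output file name
save_data_uri() {
  case "$1" in
    data:*";base64,"*) ;;   # looks like a base64 data URI, carry on
    *) echo "not a base64 data URI" >&2; return 1 ;;
  esac
  # Strip everything up to and including the first comma (the header),
  # then decode what is left.
  printf '%s' "${1#*,}" | base64 -d > "$2"
}
```

You would call it as save_data_uri "PASTED_URI_HERE" sample.wav, with the copied string in place of the placeholder.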
Answer from 3D1T0R on Stack Exchange
As predicted in this comment, the accepted answer is now broken. The basic approach still works, though: save the proxy.json entry instead, then extract and decode the base64-encoded audio:
jq -r '.audioContent' proxy.json | base64 -d > your-audio.wav
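Since this has already broken once, it may be worth checking that what came out really is a WAV file before you trust it. A small sketch, assuming the usual WAV layout ("RIFF" in the first four bytes, "WAVE" in bytes 9 to 12); looks_like_wav is a made-up name:

```shell
# Hypothetical check: succeed only if the file starts with a RIFF/WAVE header.
looks_like_wav() {
  # Bytes 1-4 should read "RIFF".
  magic=$(dd if="$1" bs=1 count=4 2>/dev/null)
  # Bytes 9-12 should read "WAVE" (bytes 5-8 hold the chunk size).
  kind=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)
  [ "$magic" = "RIFF" ] && [ "$kind" = "WAVE" ]
}
```

For example: looks_like_wav your-audio.wav || echo "that doesn't look like a WAV file". If it fails, the response format has probably changed again.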
If you like to read your text out loud to catch awkward sentences, you may want to try text-to-speech. Unfortunately the free alternatives sound horrible, and the available text-to-speech apps offering premium voices are expensive, especially if you're revising an entire novel. There is, however, a workaround. It's a little involved, but you only have to do it once.
Guide: How to generate text-to-speech using Google's Wavenet voices for free. (And legally.)
Wavenet is the artificial voice API used in Google Assistant, among others, and sounds considerably more natural than the free alternatives. If you register a Google Cloud account, you can activate the Cloud Text-to-Speech API and get 1 million characters a month for free directly from Google. Search for it in the API library, and it pops right up.
Be aware that if you exceed the allotted number of characters, you'll be charged $16 for another million. A million characters is enough for at least 150 000 words though, so you will most likely never come anywhere near running that risk.
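To get a feel for the numbers, here is a back-of-the-envelope sketch using the figures quoted above (1 million free characters, $16 per extra million). It assumes the overage is billed pro rata per character, which I haven't verified against Google's pricing page, and tts_overage_cents is just a name I made up.

```shell
# Hypothetical estimate of the monthly overage, in cents.
# $1 = total characters synthesized this month
tts_overage_cents() {
  chars=$1
  free=1000000            # free characters per month (figure from the post)
  cents_per_million=1600  # $16 per extra million (figure from the post)
  over=$(( chars - free ))
  # Staying under the free allowance costs nothing.
  [ "$over" -gt 0 ] || over=0
  echo $(( over * cents_per_million / 1000000 ))
}
```

Feeding it the character count of a manuscript, e.g. tts_overage_cents "$(wc -m < novel.txt)" with novel.txt standing in for your own file, shows how far one full read-through would take you past the free tier, if at all.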
The trick is now to take your newly acquired characters and generate an actual voice with them. You do that with an extension to Chrome called "Wavenet for Chrome", surprisingly. Install it and head back to Google Cloud to generate an API key. Instructions are provided by the extension, or can be found with a Google search. Generate the key and paste it into the extension. The configuration is now done.
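As an aside, the same API key should also work without the extension, by calling the API directly. The sketch below is not what the extension does internally (I don't know that); it just builds a request body in the shape the public REST API expects, as far as I understand it. tts_request_body and TTS_API_KEY are made-up names, and the voice name is only an example.

```shell
# Hypothetical helper: build the JSON body for a text:synthesize request.
# $1 = text to speak, $2 = voice name, $3 = language code (defaults to en-US)
tts_request_body() {
  # jq handles the quoting, so quotes and newlines in the text are safe.
  jq -n --arg text "$1" --arg voice "$2" --arg lang "${3:-en-US}" '{
    input: {text: $text},
    voice: {languageCode: $lang, name: $voice},
    audioConfig: {audioEncoding: "MP3"}
  }'
}

# Usage (needs network access and a real key in TTS_API_KEY):
# tts_request_body "Hello there." "en-US-Wavenet-D" |
#   curl -s -X POST -H "Content-Type: application/json" -d @- \
#     "https://texttospeech.googleapis.com/v1/text:synthesize?key=$TTS_API_KEY" |
#   jq -r '.audioContent' | base64 -d > hello.mp3
```

The response carries the audio as base64 in an audioContent field, so the tail of the pipeline is the same decode step as in the proxy.json answer further up.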
You access the extension via the right-click menu, so you need to use a web text editor that doesn't override it. Google Docs and Word won't work. I use Wavemaker, but any simple editor will do.
Choose the voice you want in the extension and open your text in the editor. Select the part you want to generate, right-click and select "Download as MP3". Keeping the file saves you from wasting characters by generating the same text over and over. Open your new file in the MP3 player of your choice and there you go. Easy peasy lemon squeezy.