I'd like to start making some datasets, but it's going to take a while, since RVC works best with a lot of training audio.
I was wondering whether there are any alternatives yet that are better at either training models (faster, or needing fewer audio samples) or at the voice conversion itself.
This could be a game changer. I've played around with RVC models before, and they work really well, but they only convert one voice into another and don't do text-to-speech.
This describes a way to morph generic training datasets with RVC models, and then use the morphed training data to create Piper text-to-speech models that will run on a Raspberry Pi.
So. Many. Possibilities.
https://github.com/domesticatedviking/TextyMcSpeechy
I've been experimenting with RVC (Retrieval-based Voice Conversion) recently, and I'm trying to figure out how people find good-quality voice models for cloning.
To be clear, I'm not looking for TTS. I already have source audio, and I just want to convert it into the model's voice.
A couple of questions Iโm hoping the community can help me with:
Are there any popular RVC models that are known to give good results?
What's the best way to actually find the popular / high-quality models?
Are there any better alternatives to RVC right now for high-quality voice conversion (not TTS)?
Basically, I want to know how people in the community are discovering and selecting the models that actually work well. Any recommendations, tips, or even links to trusted sources would be super helpful!
I've been experimenting with different TTS systems for a while now, and I recently tried combining Kokoro TTS with the RVC voice changer. The results were honestly much better than I expected.
What impressed me most was the speed: it took only about 3 seconds to generate a ~40-second audio clip (on my 1080). For someone who's been waiting minutes for other systems to process clips of similar length, this was a game changer.
And all of this runs locally.
http://www.sndup.net/bmfx5
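To put that speed in perspective, a common way to express it is the real-time factor (RTF): processing time divided by the duration of audio produced, where anything below 1.0 is faster than real time. The short Python sketch below only does that arithmetic on the approximate figures quoted above (about 3 s for about 40 s of audio); it does not call Kokoro or RVC, and the function name `real_time_factor` is a hypothetical helper for illustration.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Hypothetical helper: processing time / audio duration (RTF).

    An RTF below 1.0 means audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Approximate figures from the post: ~3 s of processing for a ~40 s clip.
rtf = real_time_factor(3.0, 40.0)
speedup = 1.0 / rtf  # how many times faster than real time
print(f"RTF = {rtf:.3f}, about {speedup:.1f}x faster than real time")
```

With these numbers the RTF works out to 0.075, i.e. roughly 13x faster than real time. Actual figures will vary with GPU, clip length, and settings.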