I'd like to start making some datasets, but it's gonna take some time since RVC works best with a lot of audio.
I was wondering if there are alternatives yet that are better at either training models (faster, or with fewer audio samples required) or at the voice conversion part.
I've been experimenting with RVC (Retrieval-based Voice Conversion) recently, and I'm trying to figure out how people find good quality voice models for cloning.
To be clear, I'm not looking for TTS. I already have a source audio, and I just want to convert it into the model's voice.
A couple of questions Iβm hoping the community can help me with:
Are there any popular RVC models that are known to give good results?
What's the best way to actually find the popular / high-quality models?
Are there any better alternatives to RVC right now for high-quality voice conversion (not TTS)?
Basically, I want to know how people in the community are discovering and selecting the models that actually work well. Any recommendations, tips, or even links to trusted sources would be super helpful!
This could be a game changer. I've played around with RVC models before, and they work really well, but they only change one voice into another voice and don't do text-to-speech.
This describes a way to morph generic training datasets with RVC models, and then use the morphed training data to create Piper text-to-speech models that will run on a Raspberry Pi.
So. Many. Possibilities.
https://github.com/domesticatedviking/TextyMcSpeechy
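For anyone wondering what the "morph the dataset" step might look like, here's a rough sketch. It is not taken from the linked repo. It assumes an LJSpeech-style layout (a `wavs/` folder plus a `metadata.csv` of transcripts), and `convert` is a hypothetical stand-in for whatever RVC inference call you actually use. The idea is just that every clip gets converted to the target voice while filenames and transcripts stay the same, so the morphed set can be used as TTS training data.

```python
import shutil
from pathlib import Path
from typing import Callable


def morph_dataset(src_dir: Path, dst_dir: Path,
                  convert: Callable[[Path, Path], None]) -> int:
    """Apply `convert` to every .wav in src_dir/wavs, writing to dst_dir/wavs.

    `convert(in_path, out_path)` is a hypothetical hook: plug in your own
    RVC inference here. The wavs/ + metadata.csv layout is an assumption.
    Returns the number of clips processed.
    """
    src_wavs = src_dir / "wavs"
    dst_wavs = dst_dir / "wavs"
    dst_wavs.mkdir(parents=True, exist_ok=True)

    count = 0
    for wav in sorted(src_wavs.glob("*.wav")):
        # Keep the filename so each transcript line still matches its clip.
        convert(wav, dst_wavs / wav.name)
        count += 1

    # The spoken text does not change, only the voice, so the transcript
    # file is copied over as-is.
    meta = src_dir / "metadata.csv"
    if meta.exists():
        shutil.copy2(meta, dst_dir / "metadata.csv")
    return count
```

To try it without RVC installed, pass `shutil.copyfile` as `convert`. That just copies the clips, but it confirms the folder handling works before you wire in the real conversion.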
I've been experimenting with different TTS systems for a while now, and I recently tried combining Kokoro TTS with RVC voice changer. The results were honestly much better than I expected.
What impressed me most was the speed - it only took about 3 seconds to generate a ~40 second audio clip (on my 1080). For someone who's been waiting minutes for other systems to process similar lengths, this was a game changer.
And all of this runs locally.
http://www.sndup.net/bmfx5
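To put the speed numbers above in perspective, here's a quick back-of-the-envelope calculation of how much faster than real time that is. The helper name `speedup` is just made up for this example, and the inputs are the rough figures from the post (about 40 seconds of audio in about 3 seconds).

```python
def speedup(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are produced per second of processing."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Approximate figures from the post: ~40 s clip generated in ~3 s.
print(f"{speedup(40, 3):.1f}x faster than real time")  # prints "13.3x faster than real time"
```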