The Real Time Voice Cloning Toolbox

What Is Real Time Voice Cloning?
Real Time Voice Cloning involves a deep learning framework that can generate computerized voices after listening to audio recordings. The framework creates numerical representations of the voices it hears, which a text-to-speech program can then use to imitate those voices while reading text aloud.
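The "numerical representation" of a voice is usually called a speaker embedding: a fixed-size vector in which recordings of the same speaker land close together. The toy sketch below is not the toolbox's code. It assumes each recording has already been turned into frame-level feature vectors (in the real toolbox a neural network encoder produces these). It averages them into one vector, scales that vector to unit length, and compares voices with cosine similarity. The helper names and sample numbers are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def speaker_embedding(frame_vectors):
    """Hypothetical helper: average frame-level vectors into one unit-length embedding."""
    dim = len(frame_vectors[0])
    count = len(frame_vectors)
    mean = [sum(frame[i] for frame in frame_vectors) / count for i in range(dim)]
    return l2_normalize(mean)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up frame features: two recordings of one speaker, one of another.
alice_1 = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
alice_2 = [[0.85, 0.15, 0.05], [0.9, 0.1, 0.0]]
bob = [[0.1, 0.2, 0.9], [0.0, 0.3, 0.8]]

same = cosine_similarity(speaker_embedding(alice_1), speaker_embedding(alice_2))
different = cosine_similarity(speaker_embedding(alice_1), speaker_embedding(bob))
print(f"same speaker: {same:.3f}, different speakers: {different:.3f}")
```

With these sample numbers the two recordings of the same speaker score far higher than the pair of different speakers. This closeness is what lets a text-to-speech model condition on an embedding to reproduce a particular voice.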

The Real Time Voice Cloning Toolbox is written in Python and implements Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS), the framework that handles the voice analysis. It combines Generalized End-To-End Loss for Speaker Verification (GE2E) for voice encoding with Tacotron 2 and WaveRNN for voice synthesis.

This project interests me because of its many potential uses. It could give people who have lost the ability to speak their voices back, and it could make computerized voices sound more natural. However, it also opens up the possibility of people's voices being used without their consent, and I am curious how, if at all, these issues are addressed.
