The Real Time Voice Cloning Toolbox

September 09, 2019

What Is Real Time Voice Cloning?
Real Time Voice Cloning is a deep learning framework that can generate computerized voices after listening to audio recordings of a speaker. The framework creates numerical representations of the voices it hears, which a text-to-speech program can then use to read text aloud in a copy of those voices.
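The idea of a "numerical representation" of a voice can be sketched in a few lines of Python. This is a minimal illustration, not the toolbox's code. It assumes each recording has already been turned into a sequence of acoustic feature frames, and the function names, the window and hop sizes, and the averaging `toy_window_embedding` are all hypothetical stand-ins for a trained neural encoder. What it shows is the shape of the idea: clips of any length map to a fixed-length unit vector, and vectors for the same voice point in nearly the same direction.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so embeddings can be compared by angle alone."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def toy_window_embedding(window: Sequence[Vector]) -> Vector:
    """HYPOTHETICAL stand-in for the trained neural speaker encoder.

    It just averages the feature frames in one window; a real encoder is a
    learned network that maps the window to a speaker-discriminative vector.
    """
    dim = len(window[0])
    return [sum(frame[i] for frame in window) / len(window) for i in range(dim)]


def utterance_embedding(frames: Sequence[Vector], window: int = 4, hop: int = 2) -> Vector:
    """Map a variable-length sequence of feature frames to one fixed-length vector.

    Overlapping windows are embedded separately, the window embeddings are
    averaged, and the result is L2-normalized. `window` and `hop` are
    illustrative values, not the toolbox's settings.
    """
    if len(frames) < window:
        raise ValueError("utterance is shorter than one window")
    partials = [
        l2_normalize(toy_window_embedding(frames[start:start + window]))
        for start in range(0, len(frames) - window + 1, hop)
    ]
    dim = len(partials[0])
    mean = [sum(p[i] for p in partials) / len(partials) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit vectors the dot product is the cosine; near 1.0 suggests the same voice."""
    return sum(x * y for x, y in zip(a, b))


# Toy "feature frames" (3 numbers per frame): two clips of one voice, one clip of another.
short_clip = [[1.0, 0.0, 0.5]] * 6
long_clip = [[1.0, 0.0, 0.5]] * 20
other_voice = [[0.0, 1.0, 0.2]] * 10

e1 = utterance_embedding(short_clip)
e2 = utterance_embedding(long_clip)
e3 = utterance_embedding(other_voice)
print(len(e1), len(e2))                     # 3 3: same size regardless of clip length
print(round(cosine_similarity(e1, e2), 3))  # 1.0
print(cosine_similarity(e1, e3) < 0.5)      # True
```

Because every clip ends up as a vector of the same size, a text-to-speech model can accept "whose voice" as just another fixed-size input, no matter how long the reference recording was.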

The Real Time Voice Cloning Toolbox is written in Python and implements Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS), the framework that performs the voice analysis. It combines WaveRNN, Tacotron 2, and Generalized End-to-End Loss for Speaker Verification (GE2E) to handle voice encoding and synthesis.
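How these pieces fit together can be sketched as a three-stage pipeline: a speaker encoder (the GE2E part) turns reference audio into an embedding, a synthesizer (the Tacotron 2 part) turns text plus that embedding into mel spectrogram frames, and a vocoder (the WaveRNN part) turns those frames into audio samples. The sketch below only models the data flow. Every function name is hypothetical, the toy stages are not real models, and the one-frame-per-character decoder is a simplification (a real Tacotron-style decoder uses attention and emits a variable number of frames). The conditioning step, appending the same speaker embedding to every encoded text step, follows the general approach described for SV2TTS.

```python
from typing import Callable, List

Vector = List[float]
Frames = List[Vector]


def condition_on_speaker(text_steps: Frames, speaker_embedding: Vector) -> Frames:
    """Append the same speaker embedding to every encoded text step, so each
    step carries both 'what to say' and 'whose voice to say it in'."""
    return [list(step) + list(speaker_embedding) for step in text_steps]


def clone_voice(
    reference_audio: Frames,
    text: str,
    speaker_encoder: Callable[[Frames], Vector],
    text_encoder: Callable[[str], Frames],
    decoder: Callable[[Frames], Frames],
    vocoder: Callable[[Frames], List[float]],
) -> List[float]:
    """Run the three stages in order and return raw audio samples."""
    embedding = speaker_encoder(reference_audio)                # 1. speaker encoder (GE2E-style)
    steps = condition_on_speaker(text_encoder(text), embedding)
    mel_frames = decoder(steps)                                 # 2. synthesizer (Tacotron 2-style)
    return vocoder(mel_frames)                                  # 3. vocoder (WaveRNN-style)


# HYPOTHETICAL toy stages that only reproduce the data flow, not real models.
def toy_speaker_encoder(audio: Frames) -> Vector:
    return [0.6, 0.8]  # pretend unit-length voice embedding


def toy_text_encoder(text: str) -> Frames:
    return [[float(ord(ch) % 7)] for ch in text]  # one step per character


def toy_decoder(steps: Frames) -> Frames:
    return [[sum(step)] for step in steps]  # one single-value "mel frame" per step


def toy_vocoder(mel_frames: Frames) -> List[float]:
    return [value for frame in mel_frames for value in frame * 4]  # 4 samples per frame


samples = clone_voice([[0.0]], "hi", toy_speaker_encoder, toy_text_encoder,
                      toy_decoder, toy_vocoder)
print(len(samples))  # 8: 2 characters -> 2 frames -> 4 samples each
```

Keeping the speaker encoder as a separate stage is what lets a short new recording change the output voice: only the embedding passed into the synthesizer changes, while the synthesizer and vocoder stay the same.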

This project interests me because of its many potential uses. It could give people who have lost the ability to speak their voices back, and it could be used to create computerized voices that sound more natural. However, it also opens up the possibility of people's voices being used without their consent, and I am curious how, if at all, these issues are addressed.

