Voice Cloning Application Using RVC

Ever been curious about voice cloning? Thanks to deep learning and RVC (Retrieval-based Voice Conversion), it is now readily available! In this project, we walk through the process of creating a Voice Cloning Application step by step. Don't panic if you are not a computer wizard: every detail is explained as simply as possible. If you are interested in AI, machine learning, and voice technology, this project is for you!

Project Overview

In this project, you will build a voice cloning tool using RVC. The tutorial runs on Google Colab, so there are no troublesome local installations; just follow the steps. You will learn how to use pre-trained models to produce realistic clones of a voice from input audio. Even more interesting, you will be able to manipulate the voice, which makes the project suitable for voice transformation, for example turning a male voice into a female voice.

RVC has also made it possible to clone voices with high precision. Whether you are a developer, a voice-tech hobbyist, or simply interested in AI voice synthesis, this project gives you hands-on experience with the voice cloning technology everyone has been wondering about.

Prerequisites

Before embarking on this fun-filled Voice Cloning Application project, there are a few prerequisites that you need to know.

Basic knowledge of Python programming is required to follow the coding tasks and scripts in the project.
You should know how to work with Google Colab to set up the project environment and run the code.
A background in deep learning will be useful, particularly for understanding how models are trained and how pre-trained models are reused.
Familiarity with the Librosa and PyDub libraries for audio processing and manipulation.
A good understanding of RVC and its role in voice cloning.
Basic knowledge of the WAV and MP3 audio formats, so that you can create and handle voice datasets correctly.
Knowledge of using pip to install Python packages.

Approach

In this project, we build the Voice Cloning Application using RVC (Retrieval-based Voice Conversion) and deep learning techniques. First, we set up our environment in Google Colab, which avoids the hassles of installing everything locally. Next, we gather and prepare the audio sources and pre-trained models needed for voice processing. We then use the Python libraries Librosa and PyDub to manipulate the audio files and extract the features we need.

These features are used to train the model. Once the data is prepared, we move on to the model training stage, where we start from existing pre-trained weights instead of training from scratch, which helps the model produce a better voice. After this phase, we proceed to the most entertaining part of the work: inference!

In this stage, we use the trained model to replicate voices from input audio samples, tweaking features such as pitch for extra customization. Throughout the project, we keep the approach straightforward so that even beginners can follow along.
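Pitch tweaks are usually expressed in semitones. Assuming the standard equal-tempered relationship (which is how RVC-style pipelines typically apply a transpose value), a shift of n semitones multiplies every voiced f0 value by 2^(n/12), so +12 semitones doubles the frequency. The sketch below illustrates this with NumPy; the function name `transpose_f0` is hypothetical, and unvoiced frames are assumed to be stored as 0 Hz.

```python
import numpy as np


def transpose_f0(f0, semitones: float) -> np.ndarray:
    """Shift a pitch contour (in Hz) by a number of semitones.

    Voiced frames are scaled by 2 ** (semitones / 12). Unvoiced frames are
    assumed to be stored as 0 Hz; multiplying 0 by any factor keeps them at 0.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    factor = 2.0 ** (semitones / 12.0)  # 12 semitones = one octave = x2
    return f0 * factor


if __name__ == "__main__":
    contour = np.array([110.0, 0.0, 220.0])  # toy contour with one unvoiced frame
    print(transpose_f0(contour, 12))   # one octave up
    print(transpose_f0(contour, -12))  # one octave down
```

With a +12 shift, a 110 Hz frame becomes 220 Hz while unvoiced frames stay at 0. Roughly +12 semitones is a common starting point for male-to-female conversion (and roughly -12 for the reverse), but the best value depends on the two voices, so treat it as something to tune by ear.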

Workflow and methodologies

The workflow and methodology for building the "Voice Cloning Application using RVC" are as follows:

Workflow

Configure the Google Colab environment so the project runs without any installation on your local computer.
Obtain the pre-trained models and audio datasets needed for voice cloning.
Use audio processing libraries such as Librosa and PyDub to work with the audio files, extract essential features, and clean the dataset.
Select the most suitable RVC (Retrieval-based Voice Conversion) configuration for training and inference.
Refine the model so that it clones voices accurately on the test audio.
Tune vocal characteristics, such as the pitch of the converted voice, to adapt the output for different transformations.
Evaluate the training outcome by assessing the accuracy and voice quality of the trained model.
Use TensorBoard throughout model training to track training performance in real time.

Methodology

Mount Google Drive to save and access files from the Colab environment.
Clone the RVC repository from GitHub to get the tools and code needed for the project.
Download pre-trained models from Hugging Face using aria2c for faster, more efficient downloads.
Upload or download audio files, making sure they are in the right format for training and processing.
Use Librosa and PyDub to process the audio, including format conversion and feature extraction.
Pass the processed data to the RVC model and train it starting from the pre-trained weights to improve the model's accuracy.
Apply pitch (f0) extraction so that the pitch of the converted voice can be adjusted during transformation.
Test and verify the output through inference.
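To make f0 extraction concrete, here is a minimal autocorrelation-based pitch estimator for a single audio frame. It is an illustrative stand-in, not what RVC uses: the upstream RVC WebUI offers dedicated extractors such as harvest, crepe, and rmvpe, which are far more robust on real speech. The function name `estimate_f0_autocorr` and the 50 to 1000 Hz search range are assumptions.

```python
import numpy as np


def estimate_f0_autocorr(frame, sr: int, fmin: float = 50.0, fmax: float = 1000.0) -> float:
    """Estimate the fundamental frequency (Hz) of one audio frame.

    Picks the lag with the strongest autocorrelation inside the lag range
    that corresponds to [fmin, fmax]. Returns 0.0 for a silent frame.
    """
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - frame.mean()  # remove any DC offset
    if not np.any(frame):
        return 0.0  # silence: treat as unvoiced
    # Autocorrelation for non-negative lags only (index = lag in samples)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    min_lag = max(1, int(sr / fmax))                # shortest allowed period
    max_lag = min(int(sr / fmin), len(frame) - 1)   # longest allowed period
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return sr / best_lag


if __name__ == "__main__":
    sr = 16000
    t = np.arange(2048) / sr
    tone = np.sin(2 * np.pi * 220.0 * t)  # synthetic 220 Hz test tone
    print(f"{estimate_f0_autocorr(tone, sr):.1f} Hz")  # a value close to 220
```

Restricting the search to plausible voice periods keeps the estimator away from the trivial peak at lag 0 and from implausibly long periods.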

Data Collection and Preparation

Data Collection Workflow

Collect datasets: Gather audio files from various sources for the voice cloning task.
Format datasets: Make sure all audio files are in a format suitable for processing, for example WAV or MP3.
Evaluate datasets: Assess the audio files for quality and for their suitability for voice cloning.
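Checking the format can be partly automated. The sketch below uses only Python's standard `wave` module to read each WAV file's sample rate, channel count, sample width, and duration, and flags clips that look unsuitable. It handles uncompressed WAV only (MP3 files would first need converting, for example with PyDub). The helper names are hypothetical and the thresholds in `find_problems` are illustrative assumptions; the 32 kHz default simply mirrors the 32 kHz pretrained models used in this tutorial.

```python
import wave
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WavInfo:
    path: str
    sample_rate: int   # samples per second
    channels: int      # 1 = mono, 2 = stereo
    sample_width: int  # bytes per sample (2 = 16-bit PCM)
    duration_s: float


def inspect_wav(path) -> WavInfo:
    """Read basic header information from an uncompressed PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        sr = wf.getframerate()
        return WavInfo(
            path=str(path),
            sample_rate=sr,
            channels=wf.getnchannels(),
            sample_width=wf.getsampwidth(),
            duration_s=wf.getnframes() / sr,
        )


def find_problems(info: WavInfo, min_sr: int = 32000, min_duration_s: float = 1.0):
    """Return a list of issues; an empty list means the clip passed the checks.

    The thresholds are illustrative assumptions, not RVC requirements.
    """
    problems = []
    if info.sample_rate < min_sr:
        problems.append(f"sample rate {info.sample_rate} Hz < {min_sr} Hz")
    if info.duration_s < min_duration_s:
        problems.append(f"only {info.duration_s:.2f} s long")
    return problems


if __name__ == "__main__":
    # /content/dataset is the dataset folder used in this tutorial
    for wav_path in sorted(Path("/content/dataset").glob("*.wav")):
        info = inspect_wav(wav_path)
        print(wav_path.name, find_problems(info) or "OK")
```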

Data Preparation Workflow

Preprocess audio data: Use Librosa and PyDub to extract key features, clean the dataset, and remove noise.
Normalize and resample audio: Standardize the sampling rate, then normalize audio levels so the model receives consistent input.
Split datasets: Divide the audio data into training, validation, and testing sets to train the model and assess its performance effectively.
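The normalization, resampling, and splitting steps can be sketched with NumPy and the standard library. This is a minimal illustration under stated assumptions: peak normalization to 0.95 is one common choice among several (loudness normalization is another), the linear-interpolation resampler has no anti-aliasing filter and is for demonstration only (in practice `librosa.resample(y, orig_sr=..., target_sr=...)` is the better tool), and the 80/10/10 split and all function names are hypothetical.

```python
import random

import numpy as np


def peak_normalize(y, target_peak: float = 0.95) -> np.ndarray:
    """Scale a waveform so its largest absolute sample equals target_peak."""
    y = np.asarray(y, dtype=np.float64)
    peak = np.max(np.abs(y)) if y.size else 0.0
    if peak == 0.0:
        return y.copy()  # silent clip: nothing to scale
    return y * (target_peak / peak)


def resample_linear(y, orig_sr: int, target_sr: int) -> np.ndarray:
    """Naive resampling by linear interpolation (no anti-aliasing filter)."""
    y = np.asarray(y, dtype=np.float64)
    if orig_sr == target_sr:
        return y.copy()
    n_out = int(round(len(y) * target_sr / orig_sr))
    t_in = np.arange(len(y)) / orig_sr      # input sample times in seconds
    t_out = np.arange(n_out) / target_sr    # output sample times in seconds
    return np.interp(t_out, t_in, y)


def split_files(paths, train: float = 0.8, val: float = 0.1, seed: int = 0):
    """Shuffle file paths reproducibly and split them into train/val/test lists."""
    paths = sorted(paths)                   # fixed order before shuffling
    random.Random(seed).shuffle(paths)      # seeded, so the split is repeatable
    n_train = int(len(paths) * train)
    n_val = int(len(paths) * val)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```

Fixing the shuffle seed makes the split reproducible, so re-running the notebook keeps the same files in each set.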

Code Explanation

STEP 1:

Mounting Drive

This code connects your Google Drive account to the Colab workspace. It mounts the drive at the folder /content/drive, so the files in your Google Drive can be accessed like local files.

from google.colab import drive
drive.mount('/content/drive')

Initial Setup for WebUI Voice Conversion

This code changes the current working directory in the Google Colab environment to /content. It then imports the required packages: clear_output (to clear output cells), Button (to create UI buttons), and subprocess, shlex, and os (to run shell commands and interact with the operating system). It also imports the drive module from google.colab. Finally, a few string variables (var, test, c_word, r_word) are defined; they hold the words "WebUI", "Voice", "Conversion", and "Retrieval", which are used in later steps related to the WebUI, voice conversion, and retrieval.

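%cd /content
# Notebook display and widget helpers
from IPython.display import clear_output
from ipywidgets import Button
# Standard-library modules for running shell commands and handling paths
import subprocess, shlex, os
from google.colab import drive
# Name fragments reused in later cells ("WebUI", "Voice", "Conversion", "Retrieval")
var = "We"+"bU"+"I"
test = "Voice"
c_word = "Conversion"
r_word = "Retrieval"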

Cloning Repository and Installing Dependencies

The code first clones a GitHub repository into the /content/RVC directory. It then installs pip version 24.0 (the Python package manager). Finally, it uses the apt package manager to install aria2, a command-line download utility. This prepares the environment for working with the repository.

!git clone https://github.com/splendormagic/RVC_BahaaMahmoud /content/RVC
!pip install pip==24.0
!apt -y install -qq aria2

Downloading Pretrained Models for Voice Conversion

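This code checks /content/RVC/assets/pretrained_v2 for the required pretrained files. Any file that is missing is downloaded from Hugging Face with the aria2c download manager: the files listed in pretrains come from the lj1995/VoiceConversionWebUI repository, and those in new_pretrains come from the poiqazwsx/Ov2Super32kfix repository. The flags -x 16 and -s 16 let aria2c use up to 16 connections per file, -k 1M sets the minimum split size, and -c resumes partially downloaded files. Each download command is run through the subprocess module, and errors are caught by try/except blocks and printed.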

pretrains = ["f0D32k.pth","f0G32k.pth"]
new_pretrains = ["f0Ov2Super32kD.pth","f0Ov2Super32kG.pth"]
# RVC v2 32 kHz pretrained discriminator (D) and generator (G)
for file in pretrains:
    # Skip files that already exist from a previous run
    if not os.path.exists(f"/content/RVC/assets/pretrained_v2/{file}"):
        command = "aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/%s%s%s/resolve/main/pretrained_v2/%s -d /content/RVC/assets/pretrained_v2 -o %s" % ("Voice","Conversion","WebUI",file,file)
        try:
            # check=True raises CalledProcessError if aria2c exits with an error
            subprocess.run(shlex.split(command), check=True)
        except Exception as e:
            print(e)
# Ov2Super 32 kHz pretrained D and G from the poiqazwsx/Ov2Super32kfix repository
for file in new_pretrains:
    if not os.path.exists(f"/content/RVC/assets/pretrained_v2/{file}"):
        command = "aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/poiqazwsx/Ov2Super32kfix/resolve/main/%s -d /content/RVC/assets/pretrained_v2 -o %s" % (file,file)
        try:
            subprocess.run(shlex.split(command), check=True)
            print(shlex.split(command))
        except Exception as e:
            print(e)
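The same skip-if-present logic can be wrapped in a reusable helper. This is a sketch, not part of the RVC repository: `download_missing` is a hypothetical name, and the `run` parameter exists only so the function can be tested without network access. Note that `subprocess.run` raises `CalledProcessError` on a non-zero exit status only when `check=True` is passed, so the helper passes it explicitly and collects the names of failed downloads.

```python
import os
import subprocess


def download_missing(files, url_template: str, dest_dir: str, run=subprocess.run):
    """Download each file with aria2c unless it already exists in dest_dir.

    url_template must contain one '{}' placeholder for the file name.
    Returns the list of file names whose download failed.
    """
    failed = []
    for name in files:
        if os.path.exists(os.path.join(dest_dir, name)):
            continue  # already downloaded in an earlier run
        url = url_template.format(name)
        # Same aria2c options as the loops above, passed as an argument list
        cmd = ["aria2c", "--console-log-level=error", "-c", "-x", "16", "-s", "16",
               "-k", "1M", url, "-d", dest_dir, "-o", name]
        try:
            run(cmd, check=True)
        except (subprocess.CalledProcessError, OSError) as e:
            # OSError also covers FileNotFoundError when aria2c is not installed
            print(f"{name}: {e}")
            failed.append(name)
    return failed
```

For example, the first loop corresponds to `download_missing(pretrains, "https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/{}", "/content/RVC/assets/pretrained_v2")`, which returns an empty list when every file is already present or downloads successfully.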

STEP 2:

Setting Up Directories and Downloading Necessary Files

The code creates directories for the dataset and the audio files, then downloads helper Python scripts and sample audio files from GitHub and Hugging Face. The -nc (no-clobber) flag tells wget to skip files that already exist, so re-running the cell does not download them again. After setup, the download_files.py script is run to set up or download additional files.

!mkdir -p /content/dataset && mkdir -p /content/RVC/audios
!wget -nc https://raw.githubusercontent.com/RejektsAI/EasyTools/main/original -O /content/RVC/original.py
!wget -nc https://raw.githubusercontent.com/RejektsAI/EasyTools/main/app.py -O /content/RVC/demo.py
!wget -nc https://raw.githubusercontent.com/RejektsAI/EasyTools/main/easyfuncs.py -O /content/RVC/easyfuncs.py
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/download_files.py -O /content/RVC/download_files.py
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/a.png -O /content/RVC/a.png
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/easy_sync.py -O /content/RVC/easy_sync.py
!wget -nc https://huggingface.co/spaces/Rejekts/RVC_PlayGround/raw/main/app.py -O /content/RVC/playground.py
!wget -nc https://huggingface.co/spaces/Rejekts/RVC_PlayGround/raw/main/tools/useftools.py -O /content/RVC/tools/useftools.py
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/astronauts.mp3 -O /content/RVC/audios/astronauts.mp3
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/somegirl.mp3 -O /content/RVC/audios/somegirl.mp3
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/someguy.mp3 -O /content/RVC/audios/someguy.mp3
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/unchico.mp3 -O /content/RVC/audios/unchico.mp3
!wget -nc https://huggingface.co/Rejekts/project/resolve/main/unachica.mp3 -O /content/RVC/audios/unachica.mp3
!cd /content/RVC && python /content/RVC/download_files.py

Installing Project Dependencies

The code checks whether a variable named installed already exists. If it does not, it runs pip install on requirements.txt inside /content/RVC and then installs additional dependencies: mega.py, pytube, pydub, and pinned versions of gdown (5.1.0) and gradio (3.42.0). Afterwards, installed is set to True, so re-running the cell in the same session skips the installation. This ensures that all the packages essential to the project are present.

if "installed" not in locals():
    !cd /content/RVC && pip install -r requirements.txt
    !pip install mega.py gdown==5.1.0 pytube pydub gradio==3.42.0
installed=True
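The installed flag only records that the cell has run in the current Python session: it is lost when the runtime restarts, and it does not check whether the packages are actually importable. A hedged alternative is to ask `importlib.util.find_spec` whether each package's module can be found and install only the missing ones. The requirement-to-module mapping below (for example, that `mega.py` provides a module named `mega`) is an assumption, the helper names are hypothetical, and this check does not verify versions, so a pinned package such as `gradio==3.42.0` is skipped if any gradio version is already installed.

```python
import importlib.util
import subprocess
import sys

# Assumed mapping from pip requirement string to the module it provides
PACKAGES = {
    "mega.py": "mega",
    "gdown==5.1.0": "gdown",
    "pytube": "pytube",
    "pydub": "pydub",
    "gradio==3.42.0": "gradio",
}


def missing_requirements(packages):
    """Return the requirement strings whose module cannot be found."""
    return [req for req, module in packages.items()
            if importlib.util.find_spec(module) is None]


def install_missing(packages, run=subprocess.run):
    """pip-install only the requirements whose module is not importable yet."""
    missing = missing_requirements(packages)
    if missing:
        # Use the current interpreter's pip so packages land in this environment
        run([sys.executable, "-m", "pip", "install", *missing], check=True)
    return missing
```

In a Colab cell, `install_missing(PACKAGES)` would then install only what is absent and return the list of requirements it installed.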

Optional Google Drive Integration for Saving Files

This snippet provides optional Google Drive saving through the save_to_drive flag. When the flag is set to True, it authenticates the user with Google Drive and mounts the drive in Colab. It then imports the GarbageMan class from the easy_sync module, which helps organize files on Google Drive and delete unwanted ones after some time. Any errors raised during this process are caught and printed. This feature is useful for automating file management.