BEGANSing + RVC + AudioSuperResolution

Korean Singing Voice Synthesis + Singing Voice Conversion (SVS + SVC)

The system generates a singing voice from given text and MIDI input in an end-to-end manner.

Figure: Overview of the proposed system

Installation
Prepare Dataset
Preprocessing & Training
Usage
Results
To-Do
References

Installation

A Windows or Linux system with at least 16GB of RAM.
A GPU with at least 12GB of VRAM.
Python >= 3.8
Anaconda installed.
PyTorch installed.
CUDA 11.8 installed.
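The minimums above can be checked with a short script before installing anything. This is a sketch, not part of the repository: RAM is read with `os.sysconf`, which exists only on Unix-like systems (it is skipped on Windows), and VRAM is read through PyTorch, so that check is skipped until PyTorch is installed. Sizes are in decimal GB (1e9 bytes), an assumed convention so that a "16GB" machine is not flagged for reserved memory.

```python
import os
import sys

MIN_PYTHON = (3, 8)
MIN_RAM_GB = 16
MIN_VRAM_GB = 12


def find_problems(python_version, ram_gb, vram_gb):
    """Return human-readable problems; a value of None means 'could not detect, skip'."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} < 3.8")
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if vram_gb is not None and vram_gb < MIN_VRAM_GB:
        problems.append(f"VRAM {vram_gb:.1f} GB < {MIN_VRAM_GB} GB")
    return problems


def detect_ram_gb():
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf does not exist


def detect_vram_gb():
    try:
        import torch
    except ImportError:
        return None  # PyTorch not installed yet
    if not torch.cuda.is_available():
        return 0.0  # no usable CUDA GPU
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    issues = find_problems(sys.version_info, detect_ram_gb(), detect_vram_gb())
    print("\n".join(issues) if issues else "All detected requirements are met.")
```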

PyTorch install command (run it inside the begansing Anaconda environment created below):

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

CUDA 11.8 install:

https://developer.nvidia.com/cuda-11-8-0-download-archive
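After running the pip command above, you can confirm that the CUDA 11.8 build was installed rather than a CPU-only one. A minimal sketch: the expected strings (`2.0.1`, `cu118`) are taken from that command, and the version string is assumed to carry the wheel's `+cu118` build tag.

```python
EXPECTED_RELEASE = "2.0.1"
EXPECTED_CUDA_TAG = "cu118"


def parse_torch_build(version: str):
    """Split '2.0.1+cu118' into ('2.0.1', 'cu118'); the tag is None if absent."""
    release, _, tag = version.partition("+")
    return release, tag or None


def check_build(version: str):
    """List mismatches between an installed torch version string and the expected build."""
    release, tag = parse_torch_build(version)
    problems = []
    if release != EXPECTED_RELEASE:
        problems.append(f"expected torch {EXPECTED_RELEASE}, found {release}")
    if tag != EXPECTED_CUDA_TAG:
        problems.append(f"expected a +{EXPECTED_CUDA_TAG} build, found {tag or 'no build tag'}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("torch", torch.__version__, "| CUDA runtime", torch.version.cuda,
              "| GPU available:", torch.cuda.is_available())
        for problem in check_build(str(torch.__version__)):
            print("warning:", problem)
```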

Create an Anaconda environment:

conda create -n begansing python=3.9

Activate the environment:

conda activate begansing

Clone this repository to your local machine:

git clone https://github.com/ORI-Muchim/BEGANSing.git

Navigate to the cloned directory:

cd BEGANSing

Install the necessary dependencies:

pip install -r requirements.txt

Prepare Dataset

Inside the cloned folder there is a folder called ./test_datasets. Put the MIDI files and text files in it following the existing format; each MIDI file must have a text file with the same number. As an example, I provide GFRIEND's "Rough" MIDI and text.

To convert the generated vocals into another voice, create a folder named after the target speaker in the ./datasets folder and put that speaker's voice data for Retrieval-based Voice Conversion (RVC) in it. The ./datasets layout is shown below.

BEGANSing
└───datasets
    ├───kss
    │   ├───1_0000.wav
    │   ├───1_0001.wav
    │   └───...
    └───{speaker_name}
        ├───1.wav
        └───2.wav

This is just an example, and it's okay to add more speakers.
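Both folders can be sanity-checked before running inference. This sketch assumes that a MIDI file and its text file in ./test_datasets share the same file stem (for example `1.mid` and `1.txt`); the exact naming is not spelled out here, so adjust the extensions to the files shipped with the repository. It also counts the .wav files in each speaker folder under ./datasets.

```python
from pathlib import Path

MIDI_SUFFIXES = {".mid", ".midi"}  # assumed MIDI extensions
TEXT_SUFFIX = ".txt"               # assumed lyric text extension


def unmatched_pairs(folder):
    """Return (midi_without_text, text_without_midi) as sorted lists of file stems."""
    folder = Path(folder)
    midi = {p.stem for p in folder.iterdir() if p.suffix.lower() in MIDI_SUFFIXES}
    text = {p.stem for p in folder.iterdir() if p.suffix.lower() == TEXT_SUFFIX}
    return sorted(midi - text), sorted(text - midi)


def speaker_wav_counts(datasets):
    """Map each speaker folder in ./datasets to the number of .wav files it holds."""
    return {
        d.name: sum(1 for f in d.iterdir() if f.suffix.lower() == ".wav")
        for d in sorted(Path(datasets).iterdir())
        if d.is_dir()
    }


if __name__ == "__main__":
    if Path("test_datasets").is_dir():
        no_text, no_midi = unmatched_pairs("test_datasets")
        print("MIDI without text:", no_text or "none")
        print("Text without MIDI:", no_midi or "none")
    if Path("datasets").is_dir():
        for speaker, count in speaker_wav_counts("datasets").items():
            print(f"{speaker}: {count} wav files")
```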

Preprocessing & Training

The provided pre-trained model was trained for an additional 100 epochs. For preprocessing and training, see the Preprocessing and Training sections of the original repository.

Usage

python main.py {speaker_name} {song} {pitch_shift} --audiosr

If the target speaker is male, setting {pitch_shift} to -12 is recommended; if the speaker is female, set it to 0.
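The pitch-shift advice can be wrapped in a small helper that builds the command line. This is a hypothetical convenience script, not part of the repository; `kss` and `rough` are placeholder arguments. It assumes {pitch_shift} is measured in equal-tempered semitones (as in RVC's transpose setting), so -12 lowers the pitch by one octave, i.e. halves the fundamental frequency.

```python
# Recommended transpose values from this README; assumed to be in semitones.
RECOMMENDED_PITCH_SHIFT = {"male": -12, "female": 0}


def semitone_ratio(semitones: int) -> float:
    """Frequency ratio for a shift of n equal-tempered semitones (-12 halves f0)."""
    return 2 ** (semitones / 12)


def build_command(speaker: str, song: str, gender: str, audiosr: bool = True) -> list:
    """Assemble `python main.py {speaker_name} {song} {pitch_shift} [--audiosr]`."""
    pitch_shift = RECOMMENDED_PITCH_SHIFT[gender]
    cmd = ["python", "main.py", speaker, song, str(pitch_shift)]
    if audiosr:
        cmd.append("--audiosr")
    return cmd


if __name__ == "__main__":
    # Placeholder speaker folder and song; replace with your own.
    cmd = build_command("kss", "rough", "female", audiosr=False)
    print(" ".join(cmd))
    # To run inference, pass cmd to subprocess.run(cmd, check=True).
```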

The --audiosr option upsamples the voice generated at 22050 Hz to 48000 Hz. It is slow and GPU-intensive, so use it if you have a powerful graphics card or do not mind a long generation time; otherwise omit it.
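The arithmetic below shows why super-resolution is more than a sample-rate change. A plain resampler (for example a polyphase filter with the integer factors computed here) produces 48000 samples per second, but audio recorded at 22050 Hz contains nothing above its Nyquist frequency of 11025 Hz, so the 11025 to 24000 Hz band stays empty. An audio super-resolution model has to predict that band, which is presumably why the option is costly. This is an illustrative sketch using only the standard library, not the repository's code.

```python
from fractions import Fraction

SOURCE_SR = 22050  # output rate of the synthesis stage, per this README
TARGET_SR = 48000  # rate produced with --audiosr


def resample_factors(src: int, dst: int):
    """Smallest integers (up, down) with dst / src == up / down."""
    ratio = Fraction(dst, src)
    return ratio.numerator, ratio.denominator


def resampled_length(n_samples: int, src: int, dst: int) -> int:
    """Sample count after converting n_samples from src Hz to dst Hz (rounded up)."""
    up, down = resample_factors(src, dst)
    return -(-n_samples * up // down)  # ceiling division


def missing_band(src: int, dst: int):
    """Band (Hz) between the source and target Nyquist frequencies that resampling leaves empty."""
    return src / 2, dst / 2


if __name__ == "__main__":
    print("up/down factors:", resample_factors(SOURCE_SR, TARGET_SR))
    print("1 s of audio ->", resampled_length(SOURCE_SR, SOURCE_SR, TARGET_SR), "samples")
    print("band to be generated (Hz):", missing_band(SOURCE_SR, TARGET_SR))
```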

Results

Audio samples are available at: https://soonbeomchoi.github.io/saebyulgan-blog/. The model was trained on an RTX 3090 (24GB) with a batch size of 32 for 2 days.

To-Do

Change Vocoder Griffin-Lim -> HiFi-GAN
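For context on this item: Griffin-Lim recovers a waveform from a magnitude spectrogram by iteratively estimating the phase that the spectrogram lacks, whereas a neural vocoder such as HiFi-GAN is trained to generate the waveform directly. The sketch below is a generic NumPy illustration of the Griffin-Lim algorithm, not BEGANSing's own vocoder code; the FFT size, hop length, and iteration count are assumed values.

```python
import numpy as np


def hann(n_fft):
    # Periodic Hann window; at hop = n_fft / 4 its squared overlap-add is constant.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)


def stft(y, n_fft, hop):
    """Windowed frames -> complex spectra, shape (frames, n_fft // 2 + 1)."""
    w = hann(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft, hop):
    """Weighted overlap-add inverse of stft()."""
    w = hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * w
        norm[i * hop:i * hop + n_fft] += w ** 2
    ok = norm > 1e-3  # skip normalization where the window sum is near zero (signal edges)
    y[ok] /= norm[ok]
    return y


def griffin_lim(magnitude, n_fft=1024, hop=256, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        y = istft(magnitude * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(y, n_fft, hop)))  # keep the phase, reimpose the magnitude
    return istft(magnitude * phase, n_fft, hop)


if __name__ == "__main__":
    sr = 22050
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone at 440 Hz
    out = griffin_lim(np.abs(stft(tone, 1024, 256)))
    print(len(tone), len(out))
```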

References

g2p/korean_g2p.py: from KoG2P, https://github.com/scarletcho/KoG2P
utils/midi_utils.py: from Madmom, https://madmom.readthedocs.io/en/latest/
