BEGANSing + RVC + AudioSuperResolution

Korean Singing Voice Synthesis + Singing Voice Conversion (SVS + SVC)

The system generates singing voice from a given text and MIDI in an end-to-end manner.

Figure: Overview of the proposed system (model architecture diagram).

Installation

  • A Windows/Linux system with a minimum of 16GB RAM.
  • A GPU with at least 12GB of VRAM.
  • Python >= 3.8
  • Anaconda installed.
  • PyTorch installed.
  • CUDA 11.8 installed.

PyTorch install command:

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

CUDA 11.8 install:

https://developer.nvidia.com/cuda-11-8-0-download-archive

  1. Create an Anaconda environment:
conda create -n begansing python=3.9
  2. Activate the environment:
conda activate begansing
  3. Clone this repository to your local machine:
git clone https://github.com/ORI-Muchim/BEGANSing.git
  4. Navigate to the cloned directory:
cd BEGANSing
  5. Install the necessary dependencies:
pip install -r requirements.txt
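
After installation, a quick check can confirm that the interpreter and PyTorch match the requirements listed above. The following is a minimal sketch, not part of the repository: the function name check_environment and the report keys are hypothetical, and it relies only on torch.cuda.is_available() and torch.version.cuda.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Return a small report on the Python and PyTorch setup.

    Hypothetical helper for illustration; not part of BEGANSing.
    """
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # filled in only if torch is installed
        "cuda_version": None,    # CUDA version the torch wheel was built for
    }
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the install command above, cuda_version would be expected to report 11.8. If cuda_available is False, the GPU driver or the installed wheel is the first thing to check.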

Prepare Dataset

Inside the cloned folder, there is a folder called ./test_datasets. Put the MIDI files and text files there in the expected format. The number of MIDI files and text files must match. As an example, the MIDI and text for GFRIEND's "Rough" are provided. To convert the generated vocals to another voice, create a folder named after the speaker in the ./datasets folder and put that speaker's voice data for Retrieval-based Voice Conversion (RVC) in it. The following shows the ./datasets format.

BEGANSing
└───datasets
    ├───kss
    │   ├───1_0000.wav
    │   ├───1_0001.wav
    │   └───...
    └───{speaker_name}
        ├───1.wav
        └───2.wav

This is just an example, and it's okay to add more speakers.
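
Because the MIDI and text files must match in number, it can help to check the folders before running anything. The sketch below is an illustration under stated assumptions: the .mid and .txt extensions, the pairing of files by name, and both function names are hypothetical and not taken from the repository.

```python
from pathlib import Path


def find_unpaired(test_dir, midi_ext=".mid", text_ext=".txt"):
    """Return (MIDI files without text, text files without MIDI).

    Files are paired by their path relative to test_dir, minus the
    extension. The extensions are assumptions, not repository facts.
    """
    root = Path(test_dir)

    def stems(ext):
        return {
            p.relative_to(root).with_suffix("").as_posix()
            for p in root.rglob(f"*{ext}")
        }

    midi, text = stems(midi_ext), stems(text_ext)
    return sorted(midi - text), sorted(text - midi)


def count_speaker_wavs(datasets_dir):
    """Map each speaker folder in datasets_dir to its number of .wav files."""
    root = Path(datasets_dir)
    return {
        d.name: len(list(d.glob("*.wav")))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


if __name__ == "__main__":
    if Path("./test_datasets").is_dir():
        print(find_unpaired("./test_datasets"))
    if Path("./datasets").is_dir():
        print(count_speaker_wavs("./datasets"))
```

Two empty lists from find_unpaired mean every MIDI file has a text counterpart under these assumptions. A speaker with a count of 0 in count_speaker_wavs has no voice data yet.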

Preprocessing & Training

The provided pre-trained model was trained for an additional 100 epochs. For preprocessing and training, see the Preprocessing and Training sections of the original repository.

Usage

python main.py {speaker_name} {song} {pitch_shift} --audiosr

For a male speaker, a {pitch_shift} value of -12 is recommended; for a female speaker, use 0.
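
The unit of {pitch_shift} is not stated here. Assuming it is measured in semitones, as is common for voice conversion tools, a value of -12 corresponds to one octave down, which halves the fundamental frequency. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones.

    Assumes pitch_shift is in semitones; this is not confirmed by the source.
    """
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    for shift in (-12, 0, 12):
        ratio = semitones_to_ratio(shift)
        print(f"shift {shift:+d}: 440 Hz becomes {440.0 * ratio} Hz")
```

Under this assumption, 0 leaves the pitch unchanged, and intermediate values such as -5 or -7 give smaller downward shifts if a full octave sounds too low.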

The --audiosr option upsamples the generated voice from 22050 Hz to 48000 Hz. Use it if you have a powerful graphics card or don't mind a long generation time; otherwise, omit it.
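
To give a sense of scale, the sketch below works out the sample-count arithmetic of going from 22050 Hz to 48000 Hz. It only illustrates the ratio between the two rates; it does not reproduce what --audiosr does internally, and the function names are hypothetical.

```python
from fractions import Fraction


def resample_ratio(src_hz=22050, dst_hz=48000):
    """Exact ratio between the output and input sample rates."""
    return Fraction(dst_hz, src_hz)


def resampled_length(num_samples, src_hz=22050, dst_hz=48000):
    """Number of samples after changing the rate, rounded to an integer."""
    return round(num_samples * resample_ratio(src_hz, dst_hz))


if __name__ == "__main__":
    print(resample_ratio())                # 320/147
    print(resampled_length(3 * 22050))     # 3 seconds of audio: 144000
```

The output has roughly 2.18 times as many samples as the input, which is part of why the option adds noticeable processing time.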

Results

Audio samples: https://soonbeomchoi.github.io/saebyulgan-blog/

The model was trained on an RTX 3090 (24 GB) with batch size 32 for 2 days.

To-Do

  • Change the vocoder from Griffin-Lim to HiFi-GAN
