Modified VocGAN

This repo implements a modified version of VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network in PyTorch. For the original VocGAN, check out the baseline branch. I slightly modified VocGAN's generator and replaced VocGAN's discriminator with Full-Band MelGAN's discriminator. In my experiments, the MelGAN discriminator trains very fast and is powerful enough to train the generator to produce high-fidelity voice, whereas VocGAN's hierarchically-nested JCU discriminator is quite large and slows training considerably.

Tested on Python 3.6

pip install -r requirements.txt

Prepare Dataset

  • Download a dataset for training. This can be any set of wav files with a sample rate of 22050 Hz (e.g. LJSpeech was used in the paper).
  • Preprocess: python preprocess.py -c config/default.yaml -d [data's root path]
  • Edit the configuration yaml file.
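
Since the dataset is expected to be at 22050 Hz, it can help to check the files before running preprocess.py. The following is a minimal sketch, not part of this repo: the function name find_wrong_rate is hypothetical, and it relies on Python's standard wave module, which only reads uncompressed PCM wav files.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 22050  # sample rate stated in this README


def find_wrong_rate(root, expected=EXPECTED_RATE):
    """Return (path, rate) pairs for .wav files whose header rate differs.

    Hypothetical helper, not part of the repo. Only PCM wav files can be
    opened by the standard `wave` module.
    """
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected:
            mismatched.append((path, rate))
    return mismatched
```

Calling find_wrong_rate("[data's root path]") returns an empty list when every file already matches; any file it reports would need resampling before preprocessing.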

Train & Tensorboard

  • python trainer.py -c [config yaml file] -n [name of the run]

    • Before training, run cp config/default.yaml config/config.yaml and then edit config.yaml.
    • Write the root paths of the train/validation files on the 2nd/3rd lines.
  • tensorboard --logdir logs/
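
The order of the steps above (copy the default config, edit it, then launch training) can be sketched as a small helper. This is only an illustration under assumptions: prepare_run is a hypothetical name, not something the repo provides, and the helper only builds the trainer.py command shown above rather than running it.

```python
import shutil
from pathlib import Path


def prepare_run(name, config_dir="config"):
    """Copy default.yaml to config.yaml once and build the training command.

    Hypothetical helper for illustration; the file names and flags are the
    ones listed in this README.
    """
    src = Path(config_dir) / "default.yaml"
    dst = Path(config_dir) / "config.yaml"
    if not dst.exists():
        # Keep an already edited config.yaml instead of overwriting it.
        shutil.copyfile(src, dst)
    return ["python", "trainer.py", "-c", str(dst), "-n", name]
```

After editing config/config.yaml by hand, the returned list could be passed to subprocess.run to start training.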

Notes

  1. This repo implements a modified VocGAN for faster training; for the true VocGAN implementation, please check out the baseline branch. In my testing I am able to generate high-fidelity audio in real time with the modified VocGAN.
  2. The training cost of the baseline VocGAN discriminator is too high (2.8 sec/it on a P100 with batch size 16) compared to the generator (7.2 it/sec on a P100 with batch size 16), so it is infeasible for me to train this model for a long time.
  3. Maybe we can optimize the baseline VocGAN discriminator by downsampling the audio at the preprocessing stage instead of the training stage (currently I use torchaudio.transforms.Resample as a layer to downsample the audio); this might speed up overall discriminator training.
  4. I trained the baseline model for 300 epochs (batch size 16) on LJSpeech, and the quality of the generated audio is similar to MelGAN at the same epoch on the same dataset. The author recommends training the model for 3000 epochs, which is not feasible at the current training speed (2.80 sec/it).
  5. I am open to any suggestions and modifications to this repo.
  6. For a more complete, end-to-end voice cloning or text-to-speech (TTS) toolbox 🤖, please visit Deepsync Technologies.
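
To put the speeds from notes 2 and 4 in perspective, here is a rough back-of-the-envelope estimate. It is a sketch under assumptions that this README does not confirm: that all 13,100 LJSpeech clips are used for training, that one iteration processes one batch of 16, and that 2.8 sec/it holds for the whole run. The function name training_days is hypothetical.

```python
import math

LJSPEECH_CLIPS = 13_100      # size of the full LJSpeech corpus (train split is an assumption)
BATCH_SIZE = 16              # batch size quoted in the notes
BASELINE_SEC_PER_IT = 2.8    # baseline discriminator speed from note 2
GENERATOR_IT_PER_SEC = 7.2   # generator speed from note 2


def training_days(epochs, sec_per_it, clips=LJSPEECH_CLIPS, batch_size=BATCH_SIZE):
    """Estimate wall-clock days, assuming one iteration per batch."""
    its_per_epoch = math.ceil(clips / batch_size)  # 819 with the defaults
    return epochs * its_per_epoch * sec_per_it / 86_400


# Both speeds expressed as seconds per iteration: 2.8 s versus 1 / 7.2 s,
# so the baseline discriminator step is about 20x slower.
slowdown = BASELINE_SEC_PER_IT * GENERATOR_IT_PER_SEC

days_300 = training_days(300, BASELINE_SEC_PER_IT)    # roughly 8 days
days_3000 = training_days(3000, BASELINE_SEC_PER_IT)  # roughly 80 days
```

Under these assumptions the recommended 3000 epochs would take on the order of months on a single P100, which is consistent with the note that it is not feasible at the current speed.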

Inference

  • python inference.py -p [checkpoint path] -i [input mel path]

Pretrained models

Two pretrained models are provided. Both were trained using the modified VocGAN structure.

Audio Samples

Using pretrained models, we can reconstruct audio samples. Visit here to listen.

Results

[WIP]
