choihkk / cvaejets

Stars: 47, Watchers: 3, Forks: 8, Size: 8.78 MB

Conditional Variational Auto-Encoder with Jointly Training FastSpeech2(+Conformer) and HiFi-GAN for End to End Text to Speech

License: MIT License

Dockerfile 0.07% Python 9.86% Makefile 0.04% HTML 1.71% Batchfile 0.04% Jupyter Notebook 88.28%

Introduction

  1. Building on the open-source FastSpeech2, HiFi-GAN, VITS, and Conformer implementations, this repository provides a simple implementation of JETS (end-to-end) and trains it quickly on a Korean dataset (KSS).
  2. For adversarial training, the discriminator is the module used in VITS, unchanged.
  3. For effective alignment learning, blank tokens are inserted into the text sequence.
  4. In this repository, using the L1 reconstruction loss proposed in HiFi-GAN as-is (on log mel magnitude only) causes an issue in the adversarial loss. It has therefore been replaced with an STFT loss that computes the log STFT magnitude loss together with an L1 norm.
  5. For extensibility, the decoder of the original FastSpeech2 architecture is replaced with the normalizing flows (coupling layers) from VITS, so a posterior encoder is used as well (to improve quality and to enable voice conversion).
  6. The original posterior encoder takes a linear spectrogram as input; this repository uses a mel spectrogram instead.
  7. Existing open-source implementations train on data preprocessed with MFA (Montreal Forced Aligner), whereas this repository trains with alignment learning. To avoid the disk-space problems that preprocessing can cause, training data is fed on the fly from data_utils.py.
  8. A conda environment also works, but this repository provides only a Docker environment. It assumes Docker and nvidia-docker are installed on Ubuntu.
  9. Depending on your GPU and CUDA version, you may need to change the torch image at the top of the Dockerfile.
  10. The preprocessing step only extracts the transcripts and stats needed for training.
  11. No other preprocessing is required.
  12. It is more powerful than the previous repository, VAEJETS, and takes less time to train.
  13. Because the model relies on end-to-end and adversarial training, it needs a long training run to generate high-quality audio.
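
Item 3 does not specify where the blank tokens go. A minimal sketch, assuming the VITS convention of placing a blank ID between every pair of symbols and at both ends of the sequence; `intersperse` and `blank_id=0` are hypothetical choices here, not taken from this repository's symbol table:

```python
def intersperse(tokens, blank_id=0):
    """Return tokens with blank_id inserted between every pair and at both ends.

    Assumption: blank_id=0 is an illustrative ID, not this repository's actual blank symbol.
    """
    result = [blank_id] * (len(tokens) * 2 + 1)
    # Odd positions hold the original symbols; even positions stay blank.
    result[1::2] = tokens
    return result


print(intersperse([12, 7, 31]))  # [0, 12, 0, 7, 0, 31, 0]
```

A sequence of N symbols becomes 2N + 1 tokens, which gives the aligner an explicit "between symbols" state to map frames onto.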
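
Item 4 replaces the mel L1 loss with an STFT loss. The sketch below shows one reading of that description: the mean L1 distance between log STFT magnitudes of the target and generated waveforms, written in NumPy for clarity. Parallel WaveGAN-style STFT losses also add a spectral convergence term and average over several (`n_fft`, `hop`) resolutions; whether this repository does either is not stated, and `n_fft=1024`, `hop=256` are assumed values.

```python
import numpy as np


def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude STFT of a 1-D signal with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    pad = n_fft // 2
    # Reflect-pad both ends so frames are centred on sample positions (similar to torch.stft(center=True)).
    x = np.pad(x, (pad, pad), mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def log_stft_magnitude_loss(y_true, y_pred, n_fft=1024, hop=256, eps=1e-7):
    """Mean absolute (L1) difference between log STFT magnitudes of target and generated audio."""
    # Clamp magnitudes at eps so log() never sees zero.
    mag_true = np.maximum(stft_magnitude(y_true, n_fft, hop), eps)
    mag_pred = np.maximum(stft_magnitude(y_pred, n_fft, hop), eps)
    return float(np.mean(np.abs(np.log(mag_true) - np.log(mag_pred))))


rng = np.random.default_rng(0)
target = rng.standard_normal(22050)
print(log_stft_magnitude_loss(target, target))        # 0.0
print(log_stft_magnitude_loss(target, 0.5 * target))  # approximately log(2)
```

Because the loss compares log magnitudes, a constant gain error shows up as a constant offset (here log 2) regardless of how loud the signal is.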
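
Item 5 swaps the FastSpeech2 decoder for VITS-style coupling layers. The property that matters is exact invertibility: in VITS the flow maps the posterior latent toward the prior during training and is run in reverse at inference. A toy NumPy sketch, assuming the mean-only (shift-only) coupling variant with a channel flip between layers; the single `tanh` linear map is a hypothetical stand-in for the WaveNet-style network VITS uses, and the sketch omits masking and speaker conditioning.

```python
import numpy as np


class MeanOnlyCoupling:
    """Toy coupling layer: the second half of the channels is shifted by a function of the first half."""

    def __init__(self, channels, seed=0):
        assert channels % 2 == 0, "channels must split evenly into two halves"
        self.half = channels // 2
        rng = np.random.default_rng(seed)
        # Hypothetical stand-in for the conditioning network (VITS uses a WaveNet-style stack).
        self.w = rng.standard_normal((self.half, self.half)) * 0.1

    def _shift(self, x0):
        return np.tanh(self.w @ x0)

    def forward(self, x):
        # x has shape (channels, frames); x0 passes through unchanged and conditions the shift.
        x0, x1 = x[: self.half], x[self.half :]
        return np.concatenate([x0, x1 + self._shift(x0)], axis=0)

    def reverse(self, y):
        # y0 equals x0, so the same shift can be recomputed and subtracted exactly.
        y0, y1 = y[: self.half], y[self.half :]
        return np.concatenate([y0, y1 - self._shift(y0)], axis=0)


def flow_forward(layers, x):
    for layer in layers:
        # Flip the channel order so the other half is transformed by the next layer.
        x = layer.forward(x)[::-1]
    return x


def flow_reverse(layers, y):
    for layer in reversed(layers):
        # Undo the flip first, then invert the coupling.
        y = layer.reverse(y[::-1])
    return y


x = np.random.default_rng(42).standard_normal((4, 10))
layers = [MeanOnlyCoupling(4, seed=i) for i in range(4)]
print(np.allclose(flow_reverse(layers, flow_forward(layers, x)), x))  # True
```

Since each layer only adds a shift computed from channels it leaves untouched, the inverse needs no matrix inversion, and the mean-only form keeps the log-determinant at zero.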

Dataset

  1. download dataset - https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset
  2. unzip /path/to/the/kss.zip -d /path/to/the/kss
  3. mkdir -p /path/to/the/CVAEJETS/data/dataset
  4. mv /path/to/the/kss /path/to/the/CVAEJETS/data/dataset
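
For environments without `unzip`, steps 2 to 4 can be sketched with the Python standard library. `prepare_kss` is a hypothetical helper, and all paths are placeholders. It assumes the archive's contents should end up directly under `data/dataset/kss`; if the zip already contains a top-level `kss/` folder, the result will be nested one level deeper, as with the shell commands.

```python
import os
import shutil
import zipfile


def prepare_kss(zip_path, extract_dir, repo_dir):
    """Unzip the KSS archive, then move the extracted folder into <repo_dir>/data/dataset.

    Hypothetical helper mirroring the unzip / mkdir -p / mv steps; returns the final dataset path.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)                       # step 2: unzip kss.zip -d <extract_dir>
    dataset_dir = os.path.join(repo_dir, "data", "dataset")
    os.makedirs(dataset_dir, exist_ok=True)              # step 3: mkdir -p .../data/dataset
    target = os.path.join(dataset_dir, os.path.basename(os.path.normpath(extract_dir)))
    if os.path.exists(target):
        # shutil.move would nest the folder inside an existing target, so refuse instead.
        raise FileExistsError(target)
    shutil.move(extract_dir, target)                     # step 4: mv <extract_dir> .../data/dataset
    return target
```

Usage: `prepare_kss("/path/to/the/kss.zip", "/path/to/the/kss", "/path/to/the/CVAEJETS")` leaves the data at `/path/to/the/CVAEJETS/data/dataset/kss`, which is the path the `ln -s` step in Training expects.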

Docker build

  1. cd /path/to/the/CVAEJETS
  2. docker build --tag cvaejets:latest .

Training

  1. nvidia-docker run -it --name 'CVAEJETS' -v /path/to/the/CVAEJETS:/home/work/CVAEJETS --ipc=host --privileged cvaejets:latest
  2. cd /home/work/CVAEJETS
  3. ln -s /home/work/CVAEJETS/data/dataset/kss
  4. python preprocess.py ./config/kss/preprocess.yaml
  5. Train from scratch: python train.py -p ./config/kss/preprocess.yaml -m ./config/kss/model.yaml -t ./config/kss/train.yaml
  6. Resume from a checkpoint: python train.py --restore_step <checkpoint step number> -p ./config/kss/preprocess.yaml -m ./config/kss/model.yaml -t ./config/kss/train.yaml
  7. Arguments:
  • -p : preprocess config path
  • -m : model config path
  • -t : train config path
  • --restore_step : checkpoint step number to resume training from
  8. (OPTIONAL) tensorboard --logdir=outdir/logdir
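
The arguments above can be sketched with `argparse` as follows. Only `-p`, `-m`, `-t`, and `--restore_step` come from this README; the long option names (`--preprocess_config`, `--model_config`, `--train_config`) and the `restore_step` default of 0 are hypothetical, so check `train.py` for the real definitions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.py flags."""
    parser = argparse.ArgumentParser(description="CVAEJETS training (sketch of the documented CLI)")
    # Assumed default: 0 means "start from scratch".
    parser.add_argument("--restore_step", type=int, default=0,
                        help="checkpoint step number to resume from")
    # Long option names below are illustrative assumptions.
    parser.add_argument("-p", "--preprocess_config", type=str, required=True, help="preprocess config path")
    parser.add_argument("-m", "--model_config", type=str, required=True, help="model config path")
    parser.add_argument("-t", "--train_config", type=str, required=True, help="train config path")
    return parser


args = build_parser().parse_args([
    "--restore_step", "50000",
    "-p", "./config/kss/preprocess.yaml",
    "-m", "./config/kss/model.yaml",
    "-t", "./config/kss/train.yaml",
])
print(args.restore_step, args.model_config)  # 50000 ./config/kss/model.yaml
```

Making the three config paths `required=True` means a missing `-p`, `-m`, or `-t` fails immediately with a usage message instead of partway through setup.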

TensorBoard losses

[Figures: CVAEJETS TensorBoard loss curves (losses1, losses2)]

TensorBoard stats

[Figure: CVAEJETS TensorBoard stats]

Reference

  1. VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis
  2. JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
  3. Comprehensive-Transformer-TTS
  4. Comprehensive-E2E-TTS
  5. Conformer: Convolution-augmented Transformer for Speech Recognition
  6. FastSpeech2
  7. HiFi-GAN
  8. VAEJETS
  9. VITS

Contributors

ailab-choihk, choihkk

Issues

error when training

Thank you for the code.

I have this strange error when training using KSS:

[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnConvolutionBackward. Traceback of forward call that caused the error:
  File "train.py", line 226, in <module>
    main(args, configs)
  File "train.py", line 102, in main
    y_d_hat_r, y_d_hat_g, _, _ = discriminator(wav_targets, wav_predictions.detach())

any ideas? Thanks.

Paper?

Hello, where is the paper corresponding to this work? Thanks.