
PyTorch implementation of NVIDIA WaveGlow with constant memory cost.


Constant Memory WaveGlow

A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis, using the constant-memory method described in Training Glow with Constant Memory Cost.

The model implementation details differ slightly from the official implementation based on personal preference, and the project structure is borrowed from pytorch-template.

We also include implementations of Baidu's WaveFlow and of MelGlow, which are easier to train and more memory friendly.

In addition to neural vocoders, we also include an implementation of the audio super-resolution model WSRGlow.

Requirements

After installing the requirements from pytorch-template:

pip install nnAudio torch_optimizer

Quick Start

Set data_dir in the JSON config file to a directory containing wave files that all share the same sampling rate, and you are good to go. The mel-spectrogram is computed on the fly.

{
  "data_loader": {
    "type": "RandomWaveFileLoader",
    "args": {
      "data_dir": "/your/data/wave/files",
      "batch_size": 8,
      "num_workers": 2,
      "segment": 16000
    }
  }
}

python train.py -c config.json
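
Before training, it can help to confirm that the directory really does contain wave files with a single sampling rate, and to write the path into the config. The following is a minimal sketch, not part of the repository: the helper names check_sampling_rates and set_data_dir are hypothetical, and the standard-library wave module is used, which only reads PCM-encoded files.

```python
import json
import wave
from pathlib import Path


def check_sampling_rates(data_dir):
    """Return the shared sampling rate of all .wav files in data_dir.

    Raises ValueError if there are no .wav files or the rates differ.
    (Hypothetical helper, not part of the repository.)
    """
    rates = {}
    for path in sorted(Path(data_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rates[path.name] = wav_file.getframerate()
    if not rates:
        raise ValueError(f"no .wav files found in {data_dir}")
    if len(set(rates.values())) > 1:
        raise ValueError(f"mixed sampling rates: {rates}")
    return next(iter(rates.values()))


def set_data_dir(config_path, data_dir):
    """Point data_loader.args.data_dir in the JSON config at data_dir."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    config["data_loader"]["args"]["data_dir"] = str(data_dir)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Calling check_sampling_rates("/your/data/wave/files") and then set_data_dir("config.json", "/your/data/wave/files") would validate the data and update the config before running train.py.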

Memory consumption of model training in PyTorch

Model                                                Memory (MB)
WaveGlow, channels=256, batch size=24 (naive)        N.A.
WaveGlow, channels=256, batch size=24 (efficient)    4951
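
The memory saving rests on the fact that flow layers are invertible: a layer's input can be recomputed from its output during the backward pass, so intermediate activations do not have to be stored. The sketch below illustrates only that invertibility property with a toy affine coupling layer in pure Python. It is not the repository's code, and toy_net is a hypothetical stand-in for the network that predicts the scale and shift.

```python
import math


def toy_net(xa):
    """Hypothetical stand-in for the network inside a coupling layer."""
    log_s = [0.5 * math.tanh(v) for v in xa]
    t = [v * v for v in xa]
    return log_s, t


def coupling_forward(x):
    """Affine coupling: the first half passes through unchanged,
    the second half is scaled and shifted based on the first half."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = toy_net(xa)
    yb = [b * math.exp(s) + shift for b, s, shift in zip(xb, log_s, t)]
    return xa + yb


def coupling_inverse(y):
    """Recover the layer input from the layer output alone."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = toy_net(ya)  # same scale/shift, since ya equals xa
    xb = [(b - shift) * math.exp(-s) for b, s, shift in zip(yb, log_s, t)]
    return ya + xb


x = [0.3, -1.2, 0.8, 2.0]
y = coupling_forward(x)
x_recovered = coupling_inverse(y)  # matches x up to floating-point error
```

Because the input is recoverable from the output, a training loop only needs to keep the final output of the flow and can rebuild each layer's input on demand when computing gradients.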

Result

WaveGlow

I trained the model on some cello music pieces from MusicNet using musicnet_config.json. The clips in the samples folder are what I got. Although the audio quality is not very good, it shows that WaveGlow can be used for music generation as well. The generation speed is around 470 kHz (samples per second) on a GTX 1080 Ti.
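
To relate a generation speed in samples per second to real time, divide it by the audio sampling rate. The small helper below is a sketch with a hypothetical name; the 22,050 Hz sampling rate in the usage note is an assumption for illustration only, since the sampling rate used for this experiment is not stated here.

```python
def real_time_factor(samples_per_second, sampling_rate):
    """Seconds of audio generated per second of wall-clock time."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return samples_per_second / sampling_rate
```

For example, real_time_factor(470_000, 22_050) gives roughly 21.3, meaning about 21 seconds of audio per second of compute under that assumed sampling rate.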

WaveFlow

I trained on the full LJ Speech dataset using waveflow_LJ_speech.json. The settings correspond to the 64 residual channels, h=64 model in the paper. After about 1.25M training steps, the audio quality is very similar to their official examples. Samples generated from training data can be listened to here.

MelGlow

Coming soon.

WSRGlow

Pre-trained models on the VCTK dataset are available here. We follow the settings of NU-Wave to prepare the training data.

Citation

If you use our code in any project or research, please cite:

@misc{memwaveglow,
  doi          = {10.5281/zenodo.3874330},
  author       = {Chin Yun Yu},
  title        = {Constant Memory WaveGlow: A PyTorch implementation of WaveGlow with constant memory cost},
  howpublished = {\url{https://github.com/yoyololicon/constant-memory-waveglow}},
  year         = {2019}
}

constant-memory-waveglow's Issues

A minor issue with the data loader

In the data loader, a list of SoundFile objects is kept in self.files, which may cause loading problems because too many files are open at once when memory is limited. I suggest keeping the file names instead and creating the object in the __getitem__ method.
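
The suggestion can be sketched as follows. This is not the repository's loader: the class name LazyWaveDataset is hypothetical, the standard-library wave module stands in for SoundFile, and for simplicity each item is the first segment frames of a file rather than a random segment.

```python
import wave
from pathlib import Path


class LazyWaveDataset:
    """Stores only file paths; each file is opened inside __getitem__
    and closed again before the method returns."""

    def __init__(self, data_dir, segment):
        self.paths = sorted(Path(data_dir).glob("*.wav"))
        self.segment = segment  # number of frames to read per item

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The file handle lives only for the duration of this call.
        with wave.open(str(self.paths[index]), "rb") as wav_file:
            frames = wav_file.readframes(self.segment)
            return frames, wav_file.getframerate()
```

With this layout the number of simultaneously open files is bounded by the number of worker processes rather than by the size of the dataset, at the cost of reopening a file on every access.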
