WaveODE

An ODE-based generative neural vocoder using Rectified Flow

Introduction

ODE-based generative models have recently become a hot topic in machine learning and have achieved remarkable performance in image generation. However, because the data distributions of images and waveforms differ, it is not clear how well these models perform on speech tasks. In this project, I implement an ODE-based generative neural vocoder called WaveODE, using Rectified Flow [4] as the backbone, and hope to contribute to the generalization of ODE-based generative models to speech tasks.

Pre-requisites

  • The testdata folder contains some example files that allow the project to run directly.
  • If you want to run with your own dataset:
    1. Point the feature_dirs and fileid_list entries in config.json at your own dataset.
    2. Modify the acoustic parameters to match the data you are using and adjust the batch size to the number you need.

Training and inference

Generate MELs

python3 -u generate_mels.py --output testdata/train/ --wav_folder testdata/train/wavs/ --mel_folder testdata/train/mels/

Train WaveODE with 1-Rectified Flow from scratch

python3 -u train.py -c config.yaml -l logdir -m waveode_1-rectified_flow

Inference

  1. RK45 solver:
python3 inference.py --hparams config.yaml --checkpoint logdir/waveode_1-rectified_flow/M_0.pth --input test_mels_dir  --output synthesized_eval_rk45 --sampling_method rk45

python3 inference_mel.py --hparams config.yaml --checkpoint logdir/waveode_1-rectified_flow/M_12.pth --input test_mels_dir  --output synthesized_eval_rk45_mels --sampling_method rk45
  2. Euler solver:
python3 inference.py --hparams config.yaml --checkpoint logdir/waveode_1-rectified_flow/M_0.pth --input test_mels_dir  --output synthesized_eval_euler --sampling_method euler --sampling_steps 20

python3 inference_mel.py --hparams config.yaml --checkpoint logdir/waveode_1-rectified_flow/M_12.pth --input test_mels_dir  --output synthesized_eval_euler_mels --sampling_method euler --sampling_steps 20
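
Both samplers integrate the learned ODE from noise (t = 0) to audio (t = 1). The sketch below is a minimal, hypothetical fixed-step Euler sampler in plain Python, not code from this repository: `euler_sample` and `toy_velocity` are illustrative names, and `velocity` stands in for the trained network. The `steps` argument plays the role that --sampling_steps presumably plays in the commands above, whereas RK45 is an adaptive Runge-Kutta method that chooses its own step sizes.

```python
import math
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    steps: int = 20,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    dt = 1.0 / steps
    x = list(x0)
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        # One Euler update: move along the current velocity for a time dt.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy velocity field (an assumption for illustration): dx/dt = x,
# whose exact solution at t=1 is x0 * e.
def toy_velocity(x: Vector, t: float) -> Vector:
    return list(x)


coarse = euler_sample(toy_velocity, [1.0], steps=20)
fine = euler_sample(toy_velocity, [1.0], steps=2000)
coarse_error = abs(coarse[0] - math.e)
fine_error = abs(fine[0] - math.e)
```

On this toy problem the error shrinks as the step count grows, which illustrates the usual trade-off: fewer Euler steps mean faster synthesis but a coarser approximation of the ODE trajectory.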

Train WaveODE with 2-Rectified Flow

  1. Generate (noise, audio) tuples using 1-Rectified Flow:
python3 inference.py --hparams config.yaml --checkpoint logdir/waveode_1-rectified_flow/M_105.pth --input testdata/train/mels  --output testdata/generate
  2. Train 2-Rectified Flow using the generated data:
python3 -u train_reflow.py -c config_reflow.yaml -l logdir -m waveode_2-rectified_flow
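
As a rough sketch of what these two steps amount to in Rectified Flow [4]: the 1-Rectified Flow model maps each noise sample to a generated waveform, and the 2-Rectified Flow model is then trained on those fixed (noise, audio) pairs with the straight-line target x1 - x0. The code below uses plain Python and toy vectors. The names `make_reflow_pair`, `reflow_training_example`, `squared_error`, and `toy_flow` are hypothetical and do not reflect how train_reflow.py is actually implemented.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def make_reflow_pair(
    sample_fn: Callable[[Vector], Vector], dim: int, rng: random.Random
) -> Tuple[Vector, Vector]:
    """Draw Gaussian noise x0 and pair it with the model output x1 = sample_fn(x0)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x1 = sample_fn(x0)
    return x0, x1


def reflow_training_example(
    x0: Vector, x1: Vector, t: float
) -> Tuple[Vector, Vector]:
    """Return the interpolated point x_t and the regression target x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def squared_error(pred: Vector, target: Vector) -> float:
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


# Stand-in for the ODE sampler of a trained 1-Rectified Flow model (assumption).
def toy_flow(x0: Vector) -> Vector:
    return [2.0 * v + 1.0 for v in x0]


rng = random.Random(0)
x0, x1 = make_reflow_pair(toy_flow, dim=4, rng=rng)
t = rng.random()
x_t, target = reflow_training_example(x0, x1, t)
# A perfect velocity model would predict exactly `target` at (x_t, t).
loss_if_perfect = squared_error(target, target)
```

Because each noise sample stays paired with the output it produced, the second model sees a deterministic coupling rather than independently drawn noise and audio.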

Todo

  • Upload demos of WaveODE on open-source speech corpora such as LJSpeech and VCTK

Q&A

What are ODE-based generative models?

ODE-based generative models (also known as continuous normalizing flows) are a family of generative models in which the trajectory from an initial distribution, such as a Gaussian, to the target data distribution follows an ordinary differential equation.
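
In symbols, sampling amounts to solving an initial value problem. The notation here is an assumption of this note, not something defined in the repository: x_0 is a noise sample, x_1 is the generated sample, and v_θ is the learned velocity field. For a vocoder, v_θ would additionally be conditioned on the mel-spectrogram, which is omitted here for brevity.

```latex
\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_\theta(x_t, t), \qquad x_0 \sim \mathcal{N}(0, I), \qquad t \in [0, 1]

x_1 = x_0 + \int_0^1 v_\theta(x_t, t)\,\mathrm{d}t
```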

There are some relevant papers:

[1] Neural Ordinary Differential Equations (Chen et al. 2018)

[2] FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models (Grathwohl et al. 2018)

[3] Score-Based Generative Modeling through Stochastic Differential Equations (Song et al. 2021)

[3] Flow Matching for Generative Modeling (Lipman et al. 2023)

[4] Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al. 2023)

[5] Stochastic Interpolants: A Unifying Framework for Flows and Diffusions (Albergo et al. 2023)

[6] Action Matching: Learning Stochastic Dynamics From Samples (Neklyudov et al. 2022)

[7] Riemannian Flow Matching on General Geometries (Chen et al. 2023)

[8] Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport (Tong et al. 2023)

[9] Minimizing Trajectory Curvature of ODE-based Generative Models (Lee et al. 2023)

Why choose an ODE-based model instead of SDE-based diffusion models or denoising diffusion models?

ODE-based models are simpler in both theory and implementation, which is why they have become very popular recently.

Why do artifacts and glitches exist in the generated samples?

Rectified Flow was proposed for image generation, so it may need to be modified or improved for speech tasks. In addition, glitches in generated images (e.g., unnatural hands) are less likely to affect the perceived overall image quality, whereas glitches in speech are easy to perceive.

How to improve Rectified Flow?

[5] argues that the loss function of Rectified Flow is biased, and [9] argues that Rectified Flow estimates an upper bound on the degree of intersection of the independent coupling but does not actually minimize it. Improvements to the loss function might therefore improve sample quality.
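
For reference, the objective these critiques address can be written as follows, using the formulation in [4]. It regresses the velocity field onto the straight-line direction between a noise sample x_0 and a data sample x_1. Conditioning on the mel-spectrogram is omitted, and the exact loss used in this repository may differ in detail.

```latex
\min_\theta \int_0^1 \mathbb{E}_{(x_0, x_1)}
  \left[ \left\| (x_1 - x_0) - v_\theta(x_t, t) \right\|^2 \right] \mathrm{d}t,
\qquad x_t = t\,x_1 + (1 - t)\,x_0
```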

Reference

https://github.com/gnobitab/RectifiedFlow
