Coder Social home page Coder Social logo

wangtao2668129173 / segan_pytorch Goto Github PK

View Code? Open in Web Editor NEW

This project forked from santi-pdp/segan_pytorch

0.0 0.0 0.0 667 KB

Speech Enhancement Generative Adversarial Network in PyTorch

License: MIT License

Python 95.67% Shell 0.59% MATLAB 3.74%

segan_pytorch's Introduction

Speech Enhancement Generative Adversarial Network in PyTorch

Requirements

SoundFile==0.10.2
scipy==1.1.0
librosa==0.6.1
h5py==2.8.0
numba==0.38.0
torch==0.4.1
matplotlib==2.2.2
numpy==1.14.3
pyfftw==0.10.4
tensorboardX==1.4
torchvision==0.2.1

Ahoprocessing tools (ahoproc_tools) is also needed, and the public repo is found here.

Audio Samples

Latest denoising audio samples with baselines can be found in the segan+ samples website. SEGAN is the vanilla SEGAN version (like the one in TensorFlow repo), whereas SEGAN+ is the shallower improved version included as default parameters of this repo.

The voicing/dewhispering audio samples can be found in the whispersegan samples website. Artifacts can now be palliated a bit more with --interf_pair fake signals, more data than the one we had available (just 20 mins with 1 speaker per model) and longer training session by iterating more than 100 epoch.

Pretrained Models

SEGAN+ generator weights are released and can be downloaded in this link. Make sure you place this file into the ckpt_segan+ directory to make it work with the proper train.opts config file within that folder. The script run_segan+_clean.sh will properly read the ckpt in that directory as it is configured to be used with this referenced file.

Introduction to scripts

Two models are ready to train and use to make wav2wav speech enhancement conversions. SEGAN+ is an improved version of SEGAN [1], denoising utterances with its generator network (G).

SEGAN+_G

To train this model, the following command should be ran:

python train.py --save_path ckpt_segan+ --batch_size 300 \
		--clean_trainset data/clean_trainset \
		--noisy_trainset data/noisy_trainset \
		--cache_dir data/cache

Read run_segan+_train.sh for more guidance. This will use the default parameters to structure both G and D, but they can be tunned with many options. For example, one can play with --d_pretrained_ckpt and/or --g_pretrained_ckpt to specify a departure pre-train checkpoint to fine-tune some characteristics of our enhancement system, like language, as in [2].

Cleaning files is done by specifying the generator weights checkpoint, its config file from training and appropriate paths for input and output files (Use soundfile wav writer backend (recommended) specifying the --soundfile flag):

python clean.py --g_pretrained_ckpt ckpt_segan+/<weights_ckpt_for_G> \
		--cfg_file ckpt_segan+/train.opts --synthesis_path enhanced_results \
		--test_files data/noisy_testset --soundfile

Read run_segan+_clean.sh for more guidance.

There is a WSEGAN, which stands for the dewhispering SEGAN [3]. This system is activated (rather than vanilla SEGAN) by specifying the --wsegan flag. Additionally, the --misalign_pair flag will add another fake pair to the adversarial loss indicating that content changes between input and output of G is bad, something that improved our results for [3].

References:

  1. SEGAN: Speech Enhancement Generative Adversarial Network (Pascual et al. 2017)
  2. Language and Noise Transfer in Speech Enhancement GAN (Pascual et al. 2018)
  3. Whispered-to-voiced Alaryngeal Speech Conversion with GANs (Pascual et al. 2018)

Cite

@article{pascual2017segan,
  title={SEGAN: Speech Enhancement Generative Adversarial Network},
  author={Pascual, Santiago and Bonafonte, Antonio and Serr{\`a}, Joan},
  journal={arXiv preprint arXiv:1703.09452},
  year={2017}
}

Notes

  • Multi-GPU is not supported yet in this framework.
  • Virtual Batch Norm is not included as in the very first SEGAN code, as similar results to those of original paper can be obtained with regular BatchNorm in D (ONLY D).
  • If using this code, parts of it, or developments from it, please cite the above reference.
  • We do not provide any support or assistance for the supplied code nor we offer any other compilation/variant of it.
  • We assume no responsibility regarding the provided code.

segan_pytorch's People

Contributors

santi-pdp avatar chemingway avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.