AM_GAN: Enhancing Acoustic Model for Children with Generative Adversarial Network

Introduction

This is the repository for enhancing an acoustic model with a Generative Adversarial Network (GAN). In this project, we investigate training a GAN to denoise audio and then use a Li-GRU model to build the acoustic model.

The generator inside the GAN is a convolutional encoder-decoder, and the discriminator is a classifier that predicts whether the audio is noisy or not. With the help of the discriminator, the generator learns to remove noise (or to add more noise). We built the GAN with TensorFlow. After training the discriminator on labeled clean/noisy audio, we train the generator to produce more audio samples.

A Light Gated Recurrent Unit (Li-GRU) network is used to train the acoustic model. We build the model from FMLLR features, CMVN, and five Li-GRU layers, together with dropout and batch normalization. We use Kaldi 5.x and PyTorch to implement this model.
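To make the recurrence concrete, here is a minimal sketch of a single Li-GRU time step in plain Python 3. It follows the commonly published Li-GRU formulation (update gate only, no reset gate, ReLU candidate state). It is not taken from this repository's code: batch normalization and dropout are omitted, and all names and the toy sizes are illustrative assumptions.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def ligru_step(x_t, h_prev, W_z, U_z, W_h, U_h):
    """One Li-GRU time step (batch normalization omitted for brevity).

    z_t    = sigmoid(W_z x_t + U_z h_prev)      update gate
    h_cand = relu(W_h x_t + U_h h_prev)         candidate state
    h_t    = z_t * h_prev + (1 - z_t) * h_cand  no reset gate in Li-GRU
    """
    z_in = [a + b for a, b in zip(matvec(W_z, x_t), matvec(U_z, h_prev))]
    h_in = [a + b for a, b in zip(matvec(W_h, x_t), matvec(U_h, h_prev))]
    z_t = [1.0 / (1.0 + math.exp(-v)) for v in z_in]
    h_cand = [max(0.0, v) for v in h_in]
    return [z * h + (1.0 - z) * c for z, h, c in zip(z_t, h_prev, h_cand)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights, only for this demonstration."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
input_dim, hidden_dim = 4, 3  # toy sizes chosen for the demo
W_z = random_matrix(hidden_dim, input_dim, rng)
U_z = random_matrix(hidden_dim, hidden_dim, rng)
W_h = random_matrix(hidden_dim, input_dim, rng)
U_h = random_matrix(hidden_dim, hidden_dim, rng)

# Run the cell over a short random sequence, starting from a zero state.
h = [0.0] * hidden_dim
for x_t in [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(5)]:
    h = ligru_step(x_t, h, W_z, U_z, W_h, U_h)
```

A real layer would apply this step to every frame of FMLLR features and, for a bidirectional layer, run a second pass over the reversed sequence.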

Dependencies

  • Kaldi 5.x
  • PyTorch 1.1
  • Python 2.7 (for TensorFlow) and 3.6 (for PyTorch)
  • TensorFlow 0.12

Data

Audio is encoded as 8Kbps, 8-bit PCM in .wav format. We train the GAN on 20 hours of data and the Li-GRU on 200 hours of data.
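Before training, it can help to confirm that each file really is 8-bit PCM. The sketch below uses Python 3's standard `wave` module to read the header fields. The helper name `describe_wav` and the demo file are hypothetical, and the 8000 Hz sample rate is an assumption made only so the example is self-contained.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, bits per sample, sample rate in Hz, duration in s)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    return channels, bits, rate, duration


# Write a one-second silent 8-bit mono file so the example runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(1)       # 1 byte per sample = 8-bit PCM
    wav.setframerate(8000)    # assumed rate, for the demo only
    wav.writeframes(bytes([128]) * 8000)  # 128 is silence for unsigned 8-bit

info = describe_wav(demo_path)
print(info)
```

Files that report a different sample width can be converted first; the repository's script.wav.sh is described as handling format conversion.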

GAN Training

A sample command line for training:

python main.py --init_noise_std 0. --save_path AM_GAN \
               --init_l1_weight 100. --batch_size 50 --g_nl prelu \
               --save_freq 50 --preemph 0.95 --epoch 86 --bias_deconv True \
               --bias_downconv True --bias_D_conv True
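The `--preemph 0.95` flag suggests that a pre-emphasis filter is applied to the waveform. How main.py applies it is not shown here, so the following is only a sketch of the standard first-order filter and its inverse in Python 3; the function names are illustrative.

```python
def pre_emphasis(samples, coeff=0.95):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out


def de_emphasis(samples, coeff=0.95):
    """Invert pre_emphasis: x[n] = y[n] + coeff * x[n-1]."""
    out = []
    prev = 0.0
    for y in samples:
        prev = y + coeff * prev
        out.append(prev)
    return out


signal = [0.0, 0.5, -0.25, 1.0]
restored = de_emphasis(pre_emphasis(signal))  # matches signal up to rounding
```

If the generator is trained on pre-emphasized audio, its output would normally be passed through the matching de-emphasis step before it is written out.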

Denoising

Here is a sample command to denoise an audio file:

CUDA_VISIBLE_DEVICES="0" python main.py --init_noise_std 0. \
					--save_path AM_GAN  \
					--batch_size 100 \
					--g_nl prelu \
					--weights SEGAN-50 \
					--test_wav sourcefile.wav \
					--clean_save_path clean
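To denoise many files, the same command can be generated once per file. The sketch below only builds the argument lists, using the flags from the command above; the helper names and the example file names are hypothetical, and nothing is executed.

```python
def build_denoise_command(wav_path, weights="SEGAN-50", save_path="AM_GAN",
                          clean_dir="clean"):
    """Build the argv list for one denoising run with the flags shown above."""
    return [
        "python", "main.py",
        "--init_noise_std", "0.",
        "--save_path", save_path,
        "--batch_size", "100",
        "--g_nl", "prelu",
        "--weights", weights,
        "--test_wav", wav_path,
        "--clean_save_path", clean_dir,
    ]


def denoise_commands(paths):
    """Return one command per .wav path, in sorted order; other files are skipped."""
    wavs = sorted(p for p in paths if p.lower().endswith(".wav"))
    return [build_denoise_command(p) for p in wavs]


commands = denoise_commands(["b.wav", "notes.txt", "a.wav"])
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, env=...)` with `CUDA_VISIBLE_DEVICES` set in the environment, mirroring the shell command above.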

Evaluation

Sample audio files are in ./Sample. In one example, a background door slam at t=0.83s is clearly attenuated in the cleaned audio compared with the noisy audio.

Noisy audio: [image: Noise]

Li-GRU Training

A Dockerfile is provided to host the acoustic model training. The Docker container pulls PyTorch 1.1 and Kaldi 5.x. After the container is launched, run.recipe.sh is executed inside it. Logs can be tailed in ./common, ./children_kaldi and ./children_kalditorch.

Command to build the Docker image, run from the directory that contains the Dockerfile:

docker build -t image_name .

The script script.all.sh in the common directory pre-processes the data, splits the dataset (script.trans.sh), builds the language model if necessary (script.lang.sh), and prepares the dictionary. We keep the language model fixed when evaluating the effect of the GAN. script.wav.sh converts the audio format if needed.

cd /storage/Work/common
bash script.all.sh > script.all.log 2> script.all.err

The scripts in children_kaldi (run.sh) follow the standard procedure of the Kaldi framework: they generate the MFCC/FMLLR features, apply CMVN normalization, and train a basic neural network. run.dev.sh runs the decoding process on the dev set.

cd /storage/Work/Children_kaldi
bash run.sh > run.log 2> run.err
bash run.dev.sh > run.dev.log 2> run.dev.err

The script in children_kalditorch (run.sh) builds the Li-GRU layers with PyTorch, based on the neural network output from children_kaldi.

cd /storage/Work/Children_kalditorch
bash run.sh > run.log 2> run.err

The PyTorch command to build the Li-GRU is:

python3 run_exp.py cfg/children/liGRU_fmllr.cfg

liGRU_fmllr.cfg holds the configuration of the Li-GRU layers. The most important parts are:

	ligru_lay = 550,550,550,550,550
	ligru_drop = 0.2,0.2,0.2,0.2,0.2
	ligru_use_laynorm_inp = False
	ligru_use_batchnorm_inp = False
	ligru_use_laynorm = False,False,False,False,False
	ligru_use_batchnorm = True,True,True,True,True
	ligru_bidir = True
	ligru_act = relu,relu,relu,relu,relu
	ligru_orthinit=True

	arch_lr = 0.0002
	arch_halving_factor = 0.5
	arch_improvement_threshold = 0.001
	arch_opt = rmsprop
	opt_momentum = 0.0
	opt_alpha = 0.95
	opt_eps = 1e-8
	opt_centered = False
	opt_weight_decay = 0.0
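As a rough illustration of how these settings might be consumed, the sketch below parses a few of the keys with Python 3's `configparser` and applies a learning-rate halving rule. Two things here are assumptions rather than facts from this repository: the section name `[architecture1]` is hypothetical, and the rule (halve the rate when the relative improvement in dev-set error falls below `arch_improvement_threshold`) is a common reading of these two keys, not something confirmed by the code.

```python
import configparser

# A subset of the keys shown above, under a hypothetical section name.
CFG_TEXT = """
[architecture1]
ligru_lay = 550,550,550,550,550
ligru_drop = 0.2,0.2,0.2,0.2,0.2
ligru_use_batchnorm = True,True,True,True,True
arch_lr = 0.0002
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)
arch = parser["architecture1"]

# Comma-separated values give one entry per Li-GRU layer.
layer_sizes = [int(v) for v in arch["ligru_lay"].split(",")]
dropouts = [float(v) for v in arch["ligru_drop"].split(",")]
use_batchnorm = [v.strip() == "True" for v in arch["ligru_use_batchnorm"].split(",")]


def next_learning_rate(lr, prev_err, curr_err, halving_factor, threshold):
    """Scale lr by halving_factor when the relative error improvement is too small."""
    relative_improvement = (prev_err - curr_err) / curr_err
    if relative_improvement < threshold:
        return lr * halving_factor
    return lr


lr = arch.getfloat("arch_lr")
factor = arch.getfloat("arch_halving_factor")
threshold = arch.getfloat("arch_improvement_threshold")

# Dev error barely improved (0.3000 -> 0.2999), so the rate is halved.
lr_after_plateau = next_learning_rate(lr, 0.3000, 0.2999, factor, threshold)
# Dev error improved clearly (0.30 -> 0.25), so the rate is kept.
lr_after_gain = next_learning_rate(lr, 0.30, 0.25, factor, threshold)
```

With the values above, the five layers each have 550 units, 0.2 dropout, and batch normalization enabled.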

In this project, the GAN proved effective for speech augmentation. Together with the Li-GRU network, it shows significant improvement in the acoustic model for children.
