
samplernn's Introduction

SampleRNN for speech synthesis

Keras implementation of the SampleRNN model published here. This repo implements only the three-tier architecture. The original audio sequence is fed to 3 inputs: Input_1 (in the picture) goes to the slow-tier RNN, which groups 8 audio samples into one timestep; the mid tier gets 2 audio samples at a time plus the input from the slow tier (see add_1); finally, the samples are generated by an MLP that gets the embedding of the previous audio sample (input_3) and the output of the mid-tier layer (see add_2).
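
The tier wiring described above can be sketched with stock Keras layers, roughly as below. This is only an illustration of the layout, not the repo's model: it uses a plain GRU instead of the repo's weight-normalised GRU, and the sequence length, layer width and 256 quantisation levels are assumed values, not taken from train_srnn.py.

    # Illustrative three-tier layout; sizes and layers are assumptions, not the repo's code.
    from keras.layers import (Input, GRU, Dense, Embedding, Add,
                              UpSampling1D, TimeDistributed)
    from keras.models import Model

    seq_len, dim, q_levels = 512, 32, 256              # assumed values

    # Input 1: slow tier, 8 audio samples per timestep
    in_slow = Input(shape=(seq_len // 8, 8), name='input_1')
    slow = GRU(dim, return_sequences=True)(in_slow)
    slow_up = UpSampling1D(size=4)(slow)               # stretch to the mid tier's rate

    # Input 2: mid tier, 2 audio samples per timestep, conditioned on the slow tier
    in_mid = Input(shape=(seq_len // 2, 2), name='input_2')
    mid = Add(name='add_1')([TimeDistributed(Dense(dim))(in_mid), slow_up])
    mid = GRU(dim, return_sequences=True)(mid)
    mid_up = UpSampling1D(size=2)(mid)                 # stretch to one vector per sample

    # Input 3: previous quantised sample, embedded and combined with the mid tier
    in_prev = Input(shape=(seq_len,), name='input_3')
    emb = Embedding(q_levels, dim)(in_prev)
    x = Add(name='add_2')([emb, mid_up])
    x = TimeDistributed(Dense(dim, activation='relu'))(x)
    out = TimeDistributed(Dense(q_levels, activation='softmax'))(x)

    model = Model([in_slow, in_mid, in_prev], out)
    model.summary()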

Audio preprocessing

Before we can start training, the audio must undergo some preprocessing. The steps are:

  • mkdir -p blizzard/tiny
  • copy some wav files to ./blizzard/tiny, for example about 1 minute of audio in total
  • run python preprocess.py $PWD/blizzard/tiny
  • blizzard/tiny_parts now contains the audio split into 8-second chunks (a rough sketch of this step follows the list)
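
What the chunking step produces can be pictured roughly as follows. This is not the repo's preprocess.py (which may also resample, normalise or quantise the audio); it is just an illustration of splitting wav files into consecutive 8-second pieces.

    # Sketch only: split each wav in a directory into 8-second chunks
    # and write them to a sibling "<dir>_parts" directory.
    import os
    import sys
    from scipy.io import wavfile

    src = sys.argv[1]                              # e.g. $PWD/blizzard/tiny
    dst = src.rstrip('/') + '_parts'
    if not os.path.isdir(dst):
        os.makedirs(dst)

    for name in os.listdir(src):
        if not name.endswith('.wav'):
            continue
        rate, audio = wavfile.read(os.path.join(src, name))
        chunk = 8 * rate                           # 8 seconds worth of samples
        for i in range(len(audio) // chunk):
            part = audio[i * chunk:(i + 1) * chunk]
            out = '%s_%03d.wav' % (os.path.splitext(name)[0], i)
            wavfile.write(os.path.join(dst, out), rate, part)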

Baseline

The original implementation of SampleRNN can be found here. It served as the baseline reference during development. Training results on the 'tiny' dataset (see below) were compared with the baseline. The costs in bits per sequence for this code and the baseline are shown below.

                 epoch   Training   Validation
This code            1    3.98438      4.87372
                    10    2.29819      4.14896
Baseline             1    3.9624       4.9070
                    10    2.6645       4.2562
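
To compare your own training log with these numbers: Keras reports categorical cross-entropy in nats, so (assuming that is the quantity being logged) converting to bits is just a division by ln 2.

    import math

    def nats_to_bits(nats):
        # cross-entropy in nats -> bits
        return nats / math.log(2)

    print(nats_to_bits(2.762))   # ~3.985 bits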

Training

Unfortunately, the start/stop indexes that separate the validation and training data sets have to be picked manually, depending on the dataset size. The following values were used for two datasets, tiny and blizzard2013. The index of the last training sequence is given by the --trainstop command-line argument (see below) and --validstop points to the index of the last validation sequence.

Dataset                --trainstop   --validstop   minibatch size
tiny (~50 sec)                   4             6                2
blizzard2013 (~20 h)          8000          9000              100
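
One possible reading of the two flags is sketched below. Whether train_srnn.py treats the indices as inclusive or exclusive is not spelled out here, so the slice bounds are an assumption.

    # Assumed interpretation (the real argument handling in train_srnn.py may differ):
    # sequences before --trainstop go to training, sequences between
    # --trainstop and --validstop go to validation.
    def split_sequences(sequences, trainstop, validstop):
        return sequences[:trainstop], sequences[trainstop:validstop]

    # tiny dataset: ~6 eight-second chunks -> 4 for training, 2 for validation
    train, valid = split_sequences(list(range(6)), trainstop=4, validstop=6)
    print(len(train), len(valid))   # 4 2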

To start training, run THEANO_FLAGS=device=cpu,mode=FAST_RUN python train_srnn.py --exp=tiny --slowdim=32 --dim=32 --cutlen=512 --batchsize=2 --validstop=6 --trainstop=4. This creates a model with 32 hidden units in each layer and runs truncated BPTT over 512 timesteps (due to --cutlen=512), using the Theano backend on the CPU.
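
The effect of --cutlen can be pictured as cutting each long training sequence into fixed-length windows for truncated backpropagation through time. The sketch below shows only the windowing (not the RNN state carry-over between windows) and assumes this reading of the flag.

    import numpy as np

    def tbptt_windows(sequence, cutlen=512):
        # drop the tail that does not fill a whole window, then reshape
        n = len(sequence) // cutlen
        return np.reshape(sequence[:n * cutlen], (n, cutlen))

    print(tbptt_windows(np.arange(2048)).shape)   # (4, 512)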

After about 3 epochs of training on the blizzard2013 dataset the model should be able to generate nice-looking and even nice-sounding samples.

Sampling

The training process produces files named <tiny|all>_srnn_sz<dim>_e<epoch>.h5 with the model weights every --svepoch epochs and at the end of training. Choose the one with the best validation performance to generate a wav sample. For example, THEANO_FLAGS=device=cpu,mode=FAST_RUN python train_srnn.py --exp=tiny --slowdim=32 --dim=32 --cutlen=512 --batchsize=2 --validstop=6 --trainstop=4 --sample=<filename> will produce generated.wav.
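
Generation is autoregressive: a softmax over the 256 quantisation levels is sampled one step at a time and the drawn sample is fed back in. A bare-bones sketch of that loop is given below; predict_next is a hypothetical callable standing in for the real three-tier forward pass in train_srnn.py.

    import numpy as np

    def sample_stream(predict_next, n_samples, q_levels=256, seed=128):
        # predict_next(samples) is assumed to return the softmax distribution
        # over the next 8-bit sample given everything generated so far
        samples = [seed]
        while len(samples) < n_samples:
            probs = predict_next(samples)
            samples.append(int(np.random.choice(q_levels, p=probs)))
        return np.array(samples, dtype=np.uint8)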

Sampling from a pretrained model

This repo contains a file allmost_1e.h5 with model weights after about 12 hours of training on blizzard2013 using Colab's K80 GPU, so it is possible to try sampling right away with the following command: THEANO_FLAGS=device=cpu,mode=FAST_RUN python train_srnn.py --slowdim=1024 --dim=1024 --sample=allmost_1e.h5. This uses the CPU and the Theano backend and produces something like that sample. The audio sample shown in the picture can be found in sample4s.wav.
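
A quick way to check what came out (generated.wav is the file name produced by the command above; the sample rate printed is simply whatever the file contains):

    from scipy.io import wavfile

    rate, audio = wavfile.read('generated.wav')
    print(rate, len(audio) / float(rate))   # sample rate and duration in seconds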


samplernn's Issues

Why Python 2 ?

Out of curiosity, why didn't you code this with Python 3 like most deep learning implementations nowadays?

Upgrading to keras 2.1.2 broke GruWithWeightNorm

This worked in Keras 2.0.5 but broke in Keras 2.1.2:

File "train_srnn.py", line 188, in <module> mlp_activation='relu') File "/Users/sz/deep/samplernn/srnn.py", line 262, in __init__ self.slow_tier_model, initial_state=self.slow_rnn_h0) File "/Users/sz/anaconda/lib/python2.7/site-packages/keras/layers/recurrent.py", line 522, in __call__ return super(RNN, self).__call__(inputs, **kwargs) File "/Users/sz/anaconda/lib/python2.7/site-packages/keras/engine/topology.py", line 603, in __call__ output = self.call(inputs, **kwargs) File "/Users/sz/deep/samplernn/srnn.py", line 151, in call constants = self.get_constants(inputs, training=None) AttributeError: 'GruWithWeightNorm' object has no attribute 'get_constants'
