end2endasr's Introduction

End-to-End Automatic Speech Recognition with TensorFlow

A TensorFlow implementation of end-to-end speech recognition, following the Baidu DeepSpeech papers (DeepSpeech, DeepSpeech2) and the seq2seq listener-attention-speller model (Listen, Attend and Spell; Towards Better Decoding and Language Model Integration in Sequence to Sequence Models).

0 change log:

2018.8.23: (1) fix the error where ds2 model validation required a seqlen input; (2) separate training parameters from neural network graph parameters; (3) add test script.

1 prerequisites

see requirements.txt

2 data and preprocess

2.1 English corpus: LibriSpeech

2.2 Chinese corpus: THCHS-30

2.3 preprocess

1. LibriSpeech

usage: libri_preprocess [-h] [-m {mfcc,fbank,log}] [-f {13,81,161}]
                        [-wl WINLEN] [-ws WINSTEP] [-s SPLIT]
                        [-n {dev-clean,dev-other,test-clean,test-other,train-clean-100,train-clean-360,train-other-500}]
                        path save jsonfile
param:
     -m, feature type: mfcc (13 dims), fbank (40 dims), log (81 or 161 dims)
     -f, feature dims: 13 for mfcc; 81 or 161 for log
     -n, LibriSpeech corpus subdirectory
     path, corpus data dir
     save, feature save dir
     jsonfile, JSON file that indexes every wav feature file and its ground-truth transcript
sample script:
$ python libri_preprocess.py -m log -f 81 -n dev-clean ~/asr_corpus/librispeech/LibriSpeech ~/asr_corpus/librispeech_feat ~/asr_corpus/dev-clean-featlabel.json
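
The 81- and 161-dim "log" features are consistent with log power spectra taken over 10 ms and 20 ms windows of 16 kHz audio (160 or 320 samples, giving 81 or 161 FFT bins). The sketch below illustrates that relationship with NumPy only. It is not the code in libri_preprocess.py: the function name `log_spectrogram`, the Hann window, and the default window sizes are assumptions made for illustration.

```python
import numpy as np

def log_spectrogram(signal, sample_rate=16000, winlen=0.02, winstep=0.01, eps=1e-10):
    """Hypothetical helper: return a (num_frames, frame_len // 2 + 1) log power spectrogram."""
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(round(winlen * sample_rate))    # 320 samples for 20 ms at 16 kHz
    frame_step = int(round(winstep * sample_rate))  # 160 samples for 10 ms at 16 kHz
    if len(signal) < frame_len:
        # Pad very short signals so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    # Index matrix: row i selects the samples that belong to frame i.
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)    # Hann window is an assumption
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    return np.log(power + eps)                      # eps avoids log(0) on silent frames

# One second of synthetic noise stands in for a LibriSpeech utterance.
audio = np.random.RandomState(0).randn(16000)
feat_161 = log_spectrogram(audio, winlen=0.02)  # 320-sample window -> 161 bins
feat_81 = log_spectrogram(audio, winlen=0.01)   # 160-sample window -> 81 bins
```

Under these assumptions the `-wl` and `-ws` options would correspond to `winlen` and `winstep`, and the `-f` value follows from the window length rather than being chosen independently.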
  

3 train

run_train.py

usage: run_train.py [-h] [-rc {gru,lstm,rnn}] [-b BATCH_SIZE] [-n HIDDENS]
                    [-f {13,39,81,161}] [-c CLASSES] [-rl RNN_LAYERS]
                    [-cl CONV_LAYERS] [-g GPUS] [-a {relu,tanh,sigmod}]
                    [-o OPTIMIZER] [-lr LEARNING_RATE] [-k KEEP_PROB]
                    [-gc GRAD_CLIP] [-m MODE] [-r RESTOREMODEL]
                    [-bn BATCHNORM] [-p EPOCHS] [-i INITIAL_EPOCH] -t
                    TRAINFILES [TRAINFILES ...] -d DEVFILES [DEVFILES ...] -s
                    SAVEPATH [-gf GPU_FRACTION] [-md MODEL]
                    [-ub USE_BIDIRECTIONAL_RNN] [-us USE_SUMMARY]
                    [-v VOCABFILE] [--ps_hosts PS_HOSTS] [--ws_hosts WS_HOSTS]
                    [--job_name {ps,worker}] [--task_index TASK_INDEX]

3.1 single machine training

DeepSpeech2 model
$ ./run_libri_ds2_train.sh

LAS model
$ ./run_libri_las_train.sh

3.2 distributed (cluster) training

Example with two GPU cards: start one parameter server (ps) and two workers.
DeepSpeech2 model
$ ./run_libri_ds2_train_dist.sh ps 0
$ ./run_libri_ds2_train_dist.sh worker 0
$ ./run_libri_ds2_train_dist.sh worker 1

LAS model
$ ./run_libri_las_train_dist.sh ps 0
$ ./run_libri_las_train_dist.sh worker 0
$ ./run_libri_las_train_dist.sh worker 1
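
The distributed scripts take a job name and a task index, which match the `--job_name` and `--task_index` options of run_train.py. The sketch below shows one way such flags could be turned into a cluster description. It assumes `--ps_hosts` and `--ws_hosts` are comma-separated `host:port` lists. The function `build_cluster` and the example addresses are hypothetical, and the repository's own parsing may differ.

```python
import argparse

def build_cluster(argv):
    """Hypothetical helper: parse the distributed flags and return (cluster, job_name, task_index)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", default="")   # assumed format: "host:port,host:port"
    parser.add_argument("--ws_hosts", default="")   # worker hosts, same assumed format
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    parser.add_argument("--task_index", type=int, default=0)
    args = parser.parse_args(argv)

    cluster = {
        "ps": [h for h in args.ps_hosts.split(",") if h],
        "worker": [h for h in args.ws_hosts.split(",") if h],
    }
    # Each process must refer to an existing entry of its own job.
    if not 0 <= args.task_index < len(cluster[args.job_name]):
        parser.error("task_index out of range for job %r" % args.job_name)
    return cluster, args.job_name, args.task_index

# Equivalent of "worker 1" in a one-ps, two-worker setup (addresses are made up).
cluster, job, task = build_cluster([
    "--ps_hosts", "localhost:2222",
    "--ws_hosts", "localhost:2223,localhost:2224",
    "--job_name", "worker",
    "--task_index", "1",
])
print(cluster[job][task])  # address this process would serve on
```

In TensorFlow 1.x a dictionary of this shape can be passed to `tf.train.ClusterSpec`, and the job name and task index to `tf.train.Server`. Every process in the cluster is started with the same host lists and differs only in its job name and task index.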

4 test

5 other

Contributors: cdyangbo, vivienzou1
