
Streaming Transformer

This repo contains the streaming Transformer from our work "On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition". The code is based on ESPnet v0.6.0. The streaming Transformer consists of a streaming encoder, either chunk-based or look-ahead based, and a triggered-attention (TA) based decoder.
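To make the difference between the two encoder types concrete, the following sketch prints 0/1 self-attention masks (rows are query frames, columns are key frames). It is an illustration under stated assumptions, not this repository's implementation: the chunk-based mask assumes every frame sees all earlier frames plus the rest of its own chunk, and the look-ahead mask assumes a fixed window of `left` past and `right` future frames per layer. The function names chunk_mask and lookahead_mask are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: 0/1 attention masks for two streaming encoder styles.

# Chunk-based (assumption): frame i may attend to every frame up to the
# end of the chunk that contains it (full left context + in-chunk look-ahead).
chunk_mask() {
  local n=$1 chunk=$2 i j chunk_end row
  for (( i = 0; i < n; i++ )); do
    chunk_end=$(( (i / chunk + 1) * chunk ))  # exclusive end of frame i's chunk
    row=""
    for (( j = 0; j < n; j++ )); do
      if (( j < chunk_end )); then row+="1 "; else row+="0 "; fi
    done
    echo "${row% }"  # drop the trailing space
  done
}

# Look-ahead (assumption): frame i sees at most `left` past frames and
# `right` future frames within a single layer.
lookahead_mask() {
  local n=$1 left=$2 right=$3 i j row
  for (( i = 0; i < n; i++ )); do
    row=""
    for (( j = 0; j < n; j++ )); do
      if (( j >= i - left && j <= i + right )); then row+="1 "; else row+="0 "; fi
    done
    echo "${row% }"
  done
}

chunk_mask 6 3
echo
lookahead_mask 6 2 1
```

Here `chunk_mask 6 3` prints three rows of `1 1 1 0 0 0` followed by three rows of `1 1 1 1 1 1`: however many layers are stacked, a frame never looks beyond the end of its own chunk. In the look-ahead mask each layer adds up to `right` future frames, so stacking N such layers widens the total look-ahead to N × right frames.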

We will release the following models and report reproducible results on LibriSpeech.

Results on LibriSpeech (WER %, beam = 10)

| Model | test-clean | test-other | Latency | Size |
|---|---|---|---|---|
| streaming_transformer-chunk32-conv2d | 2.8 | 7.5 | 640 ms | 78M |
| streaming_transformer-chunk32-vgg | 2.8 | 7.0 | 640 ms | 78M |
| streaming_transformer-lookahead2-conv2d | 3.0 | 8.6 | 1230 ms | 78M |
| streaming_transformer-lookahead2-vgg | 2.8 | 7.5 | 1230 ms | 78M |

Installation

Installation follows the ESPnet installation process.

Step 1. Set up the environment

CUDAROOT=/path/to/cuda

export PATH=$CUDAROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
export CFLAGS="-I$CUDAROOT/include $CFLAGS"
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
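A quick way to confirm that the variables above point at a usable CUDA toolkit is a small check like the one below. It is a sketch that only assumes the layout already implied by the exports (`bin/nvcc` and `lib64` under `$CUDAROOT`); check_cuda_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the CUDA environment from Step 1.
# Returns non-zero and prints a reason if the expected layout is missing.
check_cuda_env() {
  local root=${CUDAROOT:-}
  if [[ -z "$root" ]]; then
    echo "CUDAROOT is not set" >&2; return 1
  fi
  if [[ ! -x "$root/bin/nvcc" ]]; then
    echo "nvcc not found under $root/bin" >&2; return 1
  fi
  if [[ ! -d "$root/lib64" ]]; then
    echo "missing $root/lib64" >&2; return 1
  fi
  echo "CUDA toolkit found at $root"
}

check_cuda_env
```

Because Step 1 sets `CUDAROOT` without `export`, run this in the same shell session (for example by sourcing it); a separate script would only see `CUDAROOT` if it were exported.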

Step 2. Install dependencies (including Kaldi)

cd tools
make -j 10

Build a streaming Transformer model

Step 1. Data preparation

cd egs/librispeech/asr1
./run.sh 

By default, the processed data are stored in the current directory. You can change the path by editing the scripts.

Step 2. Viterbi decoding

Training a TA-based streaming Transformer requires alignments between CTC paths and transcriptions. In our work, we obtain these alignments by Viterbi decoding with the offline Transformer model.

cd egs/librispeech/asr1
./viterbi_decode.sh /path/to/model
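Conceptually, a Viterbi (forced) alignment assigns one CTC label to every encoder frame; collapsing repeated labels and dropping blanks recovers the transcription, and the frame where each token first appears can serve as its trigger for the decoder. The sketch below illustrates only that collapsing step. The input format (space-separated label ids, one per frame, blank = 0), the choice of the first frame of each run as the trigger, and the name ctc_triggers are assumptions, not the output format of viterbi_decode.sh.

```shell
#!/usr/bin/env bash
# Collapse a frame-level CTC alignment into "token trigger_frame" lines.
# Hypothetical input: space-separated label ids per frame; blank id is $2 (default 0).
ctc_triggers() {
  local blank=${2:-0}
  local prev=$blank t=0 label
  for label in $1; do  # unquoted on purpose: split the alignment into frames
    # Emit a token when a non-blank label starts a new run (standard CTC collapsing).
    if [[ "$label" != "$blank" && "$label" != "$prev" ]]; then
      echo "$label $t"
    fi
    prev=$label
    t=$(( t + 1 ))
  done
}

ctc_triggers "0 0 5 5 0 7 0 7 7 0"
```

For this alignment the output is `5 2`, `7 5` and `7 7`: the repeated 5 is merged, while the two 7s are kept as separate tokens because a blank separates them.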

Step 3. Train a streaming Transformer

Here we train a chunk-based streaming Transformer initialized from an offline Transformer provided by ESPnet. Set enc-init in conf/train_streaming_transformer.yaml to the path of your offline model.

cd egs/librispeech/asr1
./train.sh

To train a look-ahead based streaming Transformer instead, set chunk to False and adjust the left-window, right-window, dec-left-window and dec-right-window arguments. The training log is written to exp/streaming_transformer/train.log; you can monitor it with tail -f exp/streaming_transformer/train.log.
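The config edits in this step (enc-init for the chunk-based model; chunk and the window arguments for the look-ahead model) can be scripted. The helper below is a hypothetical sketch: set_yaml_keys is not part of the repository, it assumes each key sits on its own line at column 0 in the form `key: value`, it appends keys it cannot find, and it relies on GNU sed's `-i` flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set top-level "key: value" entries in a flat YAML file.
# Usage: set_yaml_keys FILE key=value [key=value ...]
set_yaml_keys() {
  local file=$1 pair key value
  shift
  for pair in "$@"; do
    key=${pair%%=*}   # text before the first "="
    value=${pair#*=}  # text after the first "="
    if grep -q "^${key}:" "$file"; then
      # Replace the existing line in place (GNU sed syntax).
      sed -i "s|^${key}:.*|${key}: ${value}|" "$file"
    else
      printf '%s: %s\n' "$key" "$value" >> "$file"
    fi
  done
}
```

For example, `set_yaml_keys conf/train_streaming_transformer.yaml enc-init=/path/to/model` points the chunk-based run at your offline model, and `set_yaml_keys conf/train_streaming_transformer.yaml chunk=False left-window=<L> right-window=<R>` switches to look-ahead mode, where `<L>` and `<R>` stand for the window sizes you choose. On macOS/BSD sed, `-i` needs an explicit empty suffix (`sed -i ''`).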

Step 4. Decoding

Run the following script to decode the test_clean and test_other sets:

./decode.sh num_of_gpu job_per_gpu
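If you prefer to fill in num_of_gpu automatically, a wrapper along these lines counts the GPUs listed by `nvidia-smi -L` and passes that number to decode.sh. It is a sketch: count_gpus is a hypothetical helper, and JOBS_PER_GPU defaults to an arbitrary example value of 2 rather than a recommended setting.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: detect the GPU count and call decode.sh with it.
count_gpus() {
  # nvidia-smi -L prints one "GPU <n>: ..." line per GPU it detects.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | grep -c '^GPU '
  else
    echo 0
  fi
}

num_gpu=$(count_gpus)
jobs_per_gpu=${JOBS_PER_GPU:-2}  # arbitrary example value
if (( num_gpu > 0 )) && [[ -x ./decode.sh ]]; then
  ./decode.sh "$num_gpu" "$jobs_per_gpu"
else
  echo "skipping: no GPU detected or ./decode.sh not found in $(pwd)" >&2
fi
```

Note that `nvidia-smi` lists every GPU on the machine and ignores `CUDA_VISIBLE_DEVICES`, so pass the count by hand if you want to restrict decoding to a subset.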

Offline Transformer Reference

For details of the offline Transformer model, please visit here.
