e2e-asr's Introduction

Graves 2013 experiments

File description

  • model.py: RNN-T joint model
  • model2012.py: Graves 2012 model
  • train_rnnt.py: RNN-T training script
  • train_ctc.py: CTC acoustic model training script
  • eval.py: RNN-T & CTC decoding
  • DataLoader.py: Kaldi feature loader
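For orientation, the RNN-T joint network combines one encoder frame with one prediction-network state and projects the result to vocabulary logits. A minimal pure-Python sketch of that additive join (hypothetical names and shapes, not the repo's exact model.py code):

```python
import math

def joint(f_t, g_u, W, b):
    """RNN-T joint: combine one encoder frame f_t and one prediction-network
    state g_u (both length-H lists) by addition + tanh, then project to
    vocabulary logits. W is a V x H weight matrix, b a length-V bias
    (hypothetical toy shapes, not the repo's parameters)."""
    h = [math.tanh(a + p) for a, p in zip(f_t, g_u)]        # additive join
    return [sum(w * x for w, x in zip(row, h)) + b_v        # linear projection
            for row, b_v in zip(W, b)]

# toy example: hidden size 2, vocab size 3
f_t, g_u = [0.5, -0.5], [0.5, 0.5]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.1]
logits = joint(f_t, g_u, W, b)
```

In the full model this is evaluated for every (time step, label position) pair, and the loss marginalizes over all alignments.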

Run

  • Extract features: link the Kaldi TIMIT example directories (local, steps, utils), then execute run.sh to extract 40-dim fbank features, and run feature_transform.sh to get the 123-dim features described in Graves (2013)

  • Train CTC acoustic model

python train_ctc.py --lr 1e-3 --bi --dropout 0.5 --out exp/ctc_bi_lr1e-3 --schedule
  • Train RNNT joint model
python train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule
  • Decode
python eval.py <path to best model> [--ctc] --bi

Results

Model   PER (%)
CTC     21.38
RNN-T   20.59

Requirements

Reference

e2e-asr's People

Contributors

HawkAaron


e2e-asr's Issues

Two problems about training and decoding

Hi, @HawkAaron. I'm trying to train a transducer with PyTorch (I prefer it over MXNet), and I adapted the code in this repo following another MXNet implementation. However, the model cannot converge to a good result. Is there something wrong in my code?

Another problem: I tried to replace the code here with a while loop, but the model never gets out of the while loop. Is there some difference between the two implementations?
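On the while-loop question: a standard RNN-T greedy decoder loops within each time step, emitting non-blank symbols until blank wins the argmax, and a common reason such a loop never terminates is the lack of a cap on emissions per frame. A minimal pure-Python sketch, where logits_fn is a hypothetical stand-in for running the prediction and joint networks:

```python
BLANK = 0

def greedy_decode(logits_fn, T, max_symbols_per_step=30):
    """Greedy RNN-T decoding skeleton. logits_fn(t, prefix) returns a list
    of scores over the vocabulary for time step t given the emitted prefix
    (a stand-in for the prediction + joint networks)."""
    hyp = []
    for t in range(T):
        emitted = 0
        while emitted < max_symbols_per_step:   # guard: never loop forever
            scores = logits_fn(t, hyp)
            k = max(range(len(scores)), key=scores.__getitem__)
            if k == BLANK:
                break                           # blank: advance to next frame
            hyp.append(k)
            emitted += 1
    return hyp

# toy scorer: emits symbol 2 once on frame 0, then always predicts blank
def toy(t, prefix):
    if t == 0 and not prefix:
        return [0.1, 0.2, 0.7]
    return [0.9, 0.05, 0.05]

out = greedy_decode(toy, T=3)   # -> [2]
```

Without the max_symbols_per_step guard, a model that never assigns blank the highest score would spin forever in the inner loop.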

missing txt file for training

Hi, I wanted to integrate this transducer model into one of my projects, so I tried to train it using the train_rnnt script given in the repo,
but I get an error while opening the file 'data/lang/phones.txt'.
Can you please share this folder? If not, can you tell me how the data in that file is laid out?

thank you.
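For context, a Kaldi phones.txt is a plain-text symbol table with one phone and its integer id per line. A small sketch of the format and a minimal parser (the phone names shown are illustrative, not the repo's actual table):

```python
# Kaldi-style phones.txt: "<symbol> <integer-id>", one pair per line.
SAMPLE = """\
<eps> 0
sil 1
aa 2
ae 3
"""

def load_phone_table(text):
    """Parse a phones.txt-style symbol table into a {phone: id} dict."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue                    # skip blank lines
        sym, idx = line.split()
        table[sym] = int(idx)
    return table

phones = load_phone_table(SAMPLE)       # {'<eps>': 0, 'sil': 1, 'aa': 2, 'ae': 3}
```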

Questions about results

Hello Mingkun:
Firstly, thank you for contributing the code. I want to know whether your CTC model and RNN transducer achieve the results in Alex Graves' paper. Previously, my own CTC model without any LM achieved a PER of 21 on TIMIT, but that is far from Graves' result; I also ran your code with your default parameters and got a PER of 22. I am quite confused about that. It would be great if you could give me some advice.
Best regards,
Zhengkun Tian

Any results?

Are there any results on any standard dataset?

undefined symbol: state

Hi. I am trying to train the RNN-T model using the PyTorch binding. I would really appreciate it if someone could shed some light on the issue I'm having.

When I run,

"" python3 train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule ""

I get the following error:

Traceback (most recent call last):
File "train_rnnt.py", line 12, in
from model import Transducer
File "/home/suhas/E2E-ASR/model.py", line 6, in
from warprnnt_pytorch import RNNTLoss
File "/usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/init.py", line 6, in
from .warp_rnnt import *
ImportError: /usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/warp_rnnt.cpython-36m-x86_64-linux-gnu.so: undefined symbol: state

I really do not know what to make of it.

My system details:
Ubuntu 18.04 LTS, CUDA-10.2
Pytorch built from source

The decoded labels do not match the pmap data.

@HawkAaron I have hit a problem: the decoded result includes indices [52, 53, 54, 55, 56, 57, ......],
while the rephone length is just 51. Hence I get the error below.
One of the decoded y values is as follows:
(53, 59, 53, 59, 56, 53, 43, 53, 43, 53, 5, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 48, 53, 59, 53, 25, 53, 43, 53, 43, 53, 43, 53, 59, 53, 59, 31, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 43, 53, 35, 53, 43, 53, 31, 53, 43, 53, 43, 53, 31, 53, 43, 53, 48, 53, 59, 5, 53, 32, 53, 43, 53, 59, 43, 31, 53, 59, 53, 25, 53, 25, 5, 25, 32, 53, 43, 53, 43, 53, 43, 53, 43, 53, 59, 53, 31, 53, 59, 53, 59)
Traceback (most recent call last):
File "eval.py", line 93, in
decode()
File "eval.py", line 84, in decode
y = [pmap[rephone[i]] for i in y]
File "eval.py", line 84, in
y = [pmap[rephone[i]] for i in y]
KeyError: 53

Why is the range of my pmap (or y) not as long as rephone? And some phonemes, such as 'sil', are not included in the pmap dict.
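A KeyError like the one above means the decoder produced an id that falls outside the rephone/pmap tables. A defensive mapping sketch (toy tables and a hypothetical helper, not the repo's eval.py) that flags unknown ids instead of crashing:

```python
def map_labels(y, rephone, pmap, unknown="<unk>"):
    """Map decoded integer ids -> phone names -> collapsed phone set.
    Ids missing from either table become `unknown` instead of raising
    KeyError, so out-of-range decoder outputs are easy to spot."""
    mapped = []
    for i in y:
        phone = rephone.get(i)                          # id -> phone name
        mapped.append(pmap.get(phone, unknown)          # phone -> collapsed
                      if phone is not None else unknown)
    return mapped

# toy tables: two known ids; the decoded sequence contains out-of-range 53
rephone = {0: "sil", 1: "aa"}
pmap = {"sil": "sil", "aa": "aa"}
result = map_labels([0, 1, 53], rephone, pmap)   # -> ['sil', 'aa', '<unk>']
```

Counting the `<unk>` entries then tells you how often the model emits ids outside the expected phone set.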

Why do we use vocab_size = 62?

Why do we use vocab_size = 62, and not vocab_size = 10000 as in the RNN-Transducer paper?
Are there reasons related to the speech data processing?
Many thanks~
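For reference on the 62: TIMIT transcriptions use 61 distinct phone labels, and CTC/RNN-T training adds one blank symbol, which yields 62 output units; this is presumably where vocab_size = 62 comes from. As a trivial check:

```python
TIMIT_PHONES = 61        # distinct phone labels in TIMIT transcriptions
BLANK = 1                # extra blank symbol required by CTC / RNN-T
vocab_size = TIMIT_PHONES + BLANK   # -> 62
```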

Question on feature transform

@HawkAaron
Hi, I have a question about your code for the feature transform part.
According to Alex Graves' 2013 paper, the features used are described as:

The audio data was encoded using a Fourier-transform-based filter-bank with 40 coefficients (plus energy) distributed on a mel-scale, together with their first and second temporal derivatives. Each input vector was therefore size 123. The data were normalised so that every element of the input vectors had zero mean and unit variance over the training set.
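The dimensionality in the quoted passage works out as (40 filter-bank coefficients + 1 energy) × 3 (static, delta, delta-delta) = 123:

```python
FBANK = 40                # mel-scale filter-bank coefficients
ENERGY = 1                # plus energy
STATIC = FBANK + ENERGY   # 41 static features per frame
DERIV_ORDERS = 3          # static + first + second temporal derivatives
input_dim = STATIC * DERIV_ORDERS   # -> 123, matching the paper
```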

In your code, DataLoader.py, the feature transform part is:

copy-feats scp:data_timit/{}/feats.scp ark:- | \
  apply-cmvn --utt2spk=ark:data_timit/{}/utt2spk scp:data_timit/{}/cmvn.scp ark:- ark:- | \
  add-deltas --delta-order=2 ark:- ark:- | \
  nnet-forward data_timit/final.feature_transform ark:- ark:-

Correct me if I'm wrong, but I think the feature transform is already accomplished before the nnet-forward command.
So why did you use a nnet to make the feature embedding?

When I looked into feature_transform.sh, I got more confused: the nnet-forward part seems to be yet another feature normalization, all over again. Can you explain this part a little? Thanks.

How to implement ASR training with the TensorFlow warp binding

@HawkAaron Hi,
I am working on RNN-T training with this E2E-ASR repo using the PyTorch binding, and I found you have another repo that wraps MXNet and TensorFlow. I suppose the warp library provides universal usage across the three bindings.
However, I do not find a plug-in point in the training code. Do I need to change the PyTorch calls manually? What is the right way to implement ASR training with the TensorFlow warp binding?
Thanks in advance.
