e2e-asr's Introduction

Graves 2013 experiments

File description

  • model.py: RNN-T joint model
  • model2012.py: Graves 2012 model
  • train_rnnt.py: RNN-T training script
  • train_ctc.py: CTC acoustic model training script
  • eval.py: RNN-T & CTC decoding
  • DataLoader.py: Kaldi feature loader
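For orientation, the RNN-T joint network combines one encoder frame with one prediction-network state and projects the result to vocabulary logits. A minimal pure-Python sketch of that additive join (hypothetical names and shapes, not the repo's exact model.py code):

```python
import math

def joint(f_t, g_u, W, b):
    """RNN-T joint: combine one encoder frame f_t and one prediction-network
    state g_u (both length-H lists) by addition + tanh, then project to
    vocabulary logits. W is a V x H weight matrix, b a length-V bias
    (hypothetical toy shapes, not the repo's parameters)."""
    h = [math.tanh(a + p) for a, p in zip(f_t, g_u)]        # additive join
    return [sum(w * x for w, x in zip(row, h)) + b_v        # linear projection
            for row, b_v in zip(W, b)]

# toy example: hidden size 2, vocab size 3
f_t, g_u = [0.5, -0.5], [0.5, 0.5]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.1]
logits = joint(f_t, g_u, W, b)
```

In the full model this is evaluated for every (time step, label position) pair, and the loss marginalizes over all alignments.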

Run

  • Extract features: link the Kaldi TIMIT example directories (local, steps, utils), then execute run.sh to extract 40-dim fbank features, and run feature_transform.sh to get the 123-dim features described in Graves (2013)

  • Train CTC acoustic model

python train_ctc.py --lr 1e-3 --bi --dropout 0.5 --out exp/ctc_bi_lr1e-3 --schedule
  • Train RNNT joint model
python train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule
  • Decode
python eval.py <path to best model> [--ctc] --bi

Results

Model   PER (%)
CTC     21.38
RNN-T   20.59

Requirements

Reference

e2e-asr's People

Contributors

HawkAaron


e2e-asr's Issues

Two problems about training and decoding

Hi, @HawkAaron. I'm trying to train a transducer with PyTorch (I prefer it over MXNet), and I adapted the code in this repo following another MXNet implementation. However, the model cannot converge to a good result. Is there something wrong in my code?

Another problem: I tried to replace the code here with a while loop, but the model never gets out of the while loop. Is there some difference between the two implementations?
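On the while-loop question: a standard RNN-T greedy decoder loops within each time step, emitting non-blank symbols until blank wins the argmax, and a common reason such a loop never terminates is the lack of a cap on emissions per frame. A minimal pure-Python sketch, where logits_fn is a hypothetical stand-in for running the prediction and joint networks:

```python
BLANK = 0

def greedy_decode(logits_fn, T, max_symbols_per_step=30):
    """Greedy RNN-T decoding skeleton. logits_fn(t, prefix) returns a list
    of scores over the vocabulary for time step t given the emitted prefix
    (a stand-in for the prediction + joint networks)."""
    hyp = []
    for t in range(T):
        emitted = 0
        while emitted < max_symbols_per_step:   # guard: never loop forever
            scores = logits_fn(t, hyp)
            k = max(range(len(scores)), key=scores.__getitem__)
            if k == BLANK:
                break                           # blank: advance to next frame
            hyp.append(k)
            emitted += 1
    return hyp

# toy scorer: emits symbol 2 once on frame 0, then always predicts blank
def toy(t, prefix):
    if t == 0 and not prefix:
        return [0.1, 0.2, 0.7]
    return [0.9, 0.05, 0.05]

out = greedy_decode(toy, T=3)   # -> [2]
```

Without the max_symbols_per_step guard, a model that never assigns blank the highest score would spin forever in the inner loop.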

missing txt file for training

Hi, I wanted to integrate this transducer model into one of my projects, so I tried to train it using the train_rnnt script given in the repo,
but I get an error while opening the file 'data/lang/phones.txt'.
Can you please share this folder? If not, can you tell me how the data in that file is laid out?

thank you.
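For context, a Kaldi phones.txt is a plain-text symbol table with one phone and its integer id per line. A small sketch of the format and a minimal parser (the phone names shown are illustrative, not the repo's actual table):

```python
# Kaldi-style phones.txt: "<symbol> <integer-id>", one pair per line.
SAMPLE = """\
<eps> 0
sil 1
aa 2
ae 3
"""

def load_phone_table(text):
    """Parse a phones.txt-style symbol table into a {phone: id} dict."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue                    # skip blank lines
        sym, idx = line.split()
        table[sym] = int(idx)
    return table

phones = load_phone_table(SAMPLE)       # {'<eps>': 0, 'sil': 1, 'aa': 2, 'ae': 3}
```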

Questions about results

Hello Mingkun:
Firstly, thank you for contributing the code. I want to know whether your CTC model and RNN transducer achieve the results in Alex Graves' paper. Previously, my own CTC model without any LM achieved a PER of 21 on TIMIT, but that is far from Graves' result; I also ran your code with your default parameters and got a PER of 22. I am quite confused about that. It would be great if you could give me some advice.
Best regards,
Zhengkun Tian

Any results?

Are there any results on any standard dataset?

undefined symbol: state

Hi. I am trying to train the RNN-T model using the PyTorch binding. I would really appreciate it if someone could shed some light on the issue I'm having.

When I run,

"" python3 train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule ""

I get the following error:

Traceback (most recent call last):
File "train_rnnt.py", line 12, in
from model import Transducer
File "/home/suhas/E2E-ASR/model.py", line 6, in
from warprnnt_pytorch import RNNTLoss
File "/usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/init.py", line 6, in
from .warp_rnnt import *
ImportError: /usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/warp_rnnt.cpython-36m-x86_64-linux-gnu.so: undefined symbol: state

I really do not know what to make of it.

My system details:
Ubuntu 18.04 LTS, CUDA-10.2
Pytorch built from source

The decoded labels do not match the pmap data.

@HawkAaron I have hit a problem: the decoded result includes indices [52, 53, 54, 55, 56, 57, ......],
while the rephone length is just 51. Hence I get the error below.
One of the decoded y values is as follows:
(53, 59, 53, 59, 56, 53, 43, 53, 43, 53, 5, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 48, 53, 59, 53, 25, 53, 43, 53, 43, 53, 43, 53, 59, 53, 59, 31, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 43, 53, 35, 53, 43, 53, 31, 53, 43, 53, 43, 53, 31, 53, 43, 53, 48, 53, 59, 5, 53, 32, 53, 43, 53, 59, 43, 31, 53, 59, 53, 25, 53, 25, 5, 25, 32, 53, 43, 53, 43, 53, 43, 53, 43, 53, 59, 53, 31, 53, 59, 53, 59)
Traceback (most recent call last):
File "eval.py", line 93, in
decode()
File "eval.py", line 84, in decode
y = [pmap[rephone[i]] for i in y]
File "eval.py", line 84, in
y = [pmap[rephone[i]] for i in y]
KeyError: 53

Why is the range of my pmap (or y) not as long as rephone? And some phonemes, such as 'sil', are not included in the pmap dict.
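A KeyError like the one above means the decoder produced an id that falls outside the rephone/pmap tables. A defensive mapping sketch (toy tables and a hypothetical helper, not the repo's eval.py) that flags unknown ids instead of crashing:

```python
def map_labels(y, rephone, pmap, unknown="<unk>"):
    """Map decoded integer ids -> phone names -> collapsed phone set.
    Ids missing from either table become `unknown` instead of raising
    KeyError, so out-of-range decoder outputs are easy to spot."""
    mapped = []
    for i in y:
        phone = rephone.get(i)                          # id -> phone name
        mapped.append(pmap.get(phone, unknown)          # phone -> collapsed
                      if phone is not None else unknown)
    return mapped

# toy tables: two known ids; the decoded sequence contains out-of-range 53
rephone = {0: "sil", 1: "aa"}
pmap = {"sil": "sil", "aa": "aa"}
result = map_labels([0, 1, 53], rephone, pmap)   # -> ['sil', 'aa', '<unk>']
```

Counting the `<unk>` entries then tells you how often the model emits ids outside the expected phone set.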

Why do we use vocab_size = 62?

Why do we use vocab_size = 62, and not vocab_size = 10000 as in the RNN-Transducer paper?
Are there reasons related to the speech data processing?
Many thanks~
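For reference on the 62: TIMIT transcriptions use 61 distinct phone labels, and CTC/RNN-T training adds one blank symbol, which yields 62 output units; this is presumably where vocab_size = 62 comes from. As a trivial check:

```python
TIMIT_PHONES = 61        # distinct phone labels in TIMIT transcriptions
BLANK = 1                # extra blank symbol required by CTC / RNN-T
vocab_size = TIMIT_PHONES + BLANK   # -> 62
```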

Question on feature transform

@HawkAaron
Hi, I have a question about your code for the feature transform part.
According to Alex Graves' 2013 paper, the features used are described as:

The audio data was encoded using a Fourier-transform-based filter-bank with 40 coefficients (plus energy) distributed on a mel-scale, together with their first and second temporal derivatives. Each input vector was therefore size 123. The data were normalised so that every element of the input vectors had zero mean and unit variance over the training set.
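The dimensionality in the quoted passage works out as (40 filter-bank coefficients + 1 energy) × 3 (static, delta, delta-delta) = 123:

```python
FBANK = 40                # mel-scale filter-bank coefficients
ENERGY = 1                # plus energy
STATIC = FBANK + ENERGY   # 41 static features per frame
DERIV_ORDERS = 3          # static + first + second temporal derivatives
input_dim = STATIC * DERIV_ORDERS   # -> 123, matching the paper
```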

In your code, DataLoader.py, the feature transform part is:

copy-feats scp:data_timit/{}/feats.scp ark:- | \
  apply-cmvn --utt2spk=ark:data_timit/{}/utt2spk scp:data_timit/{}/cmvn.scp ark:- ark:- | \
  add-deltas --delta-order=2 ark:- ark:- | \
  nnet-forward data_timit/final.feature_transform ark:- ark:-

Correct me if I'm wrong, but I think the feature transform is already accomplished before the nnet-forward command.
So why did you use a nnet to make the feature embedding?

When I looked into feature_transform.sh, I got more confused: the nnet-forward part seems to be yet another feature normalization, all over again. Can you explain this part a little? Thanks.

How to implement ASR training with the TensorFlow warp binding

@HawkAaron Hi,
I am working on RNN-T training with this E2E-ASR repo using the PyTorch binding, and I found you have another repo that wraps MXNet and TensorFlow. I suppose the warp library provides universal usage across the three bindings.
However, I do not find a plug-in point in the training code. Do I need to change the PyTorch calls manually? What is the right way to implement ASR training with the TensorFlow warp binding?
Thanks in advance.
