
rnn-transducer's Introduction

RNN-Transducer

A PyTorch implementation of the Transducer model for end-to-end speech recognition.

If you have any questions, please email me: [email protected]

Environment

  • pytorch >= 0.4
  • warp-transducer
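
As a quick sanity check that the warp-transducer binding is installed correctly, a minimal sketch of computing the transducer loss might look like the following. The shapes and argument order are assumptions based on the warprnnt_pytorch binding from HawkAaron/warp-transducer; check them against your installed version.

# Sketch: compute the RNN-T loss with the warprnnt_pytorch binding (assumed API).
import torch
from warprnnt_pytorch import RNNTLoss

criterion = RNNTLoss()  # blank index defaults to 0, matching <blk> 0 in the vocab format below

B, T, U, V = 2, 50, 10, 4232                               # batch, frames, label length, vocab size
logits = torch.randn(B, T, U + 1, V, requires_grad=True)   # joint-network outputs
labels = torch.randint(1, V, (B, U), dtype=torch.int32)    # target characters, no blank
frame_lens = torch.full((B,), T, dtype=torch.int32)
label_lens = torch.full((B,), U, dtype=torch.int32)

loss = criterion(logits, labels, frame_lens, label_lens)
loss.backward()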

Preparation

We use Kaldi for data preparation. At a minimum, these files (text, feats.scp) should be included in the training/development/test sets. If you apply cmvn, utt2spk and cmvn.scp are also required. The format of these files is consistent with Kaldi. The format of the vocab file is as follows; a sketch of how such a file can be generated is shown after the example.

<blk> 0
<unk> 1
我 2
你 3
...
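
For reference, a character-level vocab file in this format could be generated from the Kaldi-style text file with a short script like the sketch below. This is not part of the repo, and the paths are placeholders.

# Sketch: build a character-level vocab from a Kaldi-style "text" file.
# Paths are placeholders; adapt them to your data layout.
from collections import Counter

counter = Counter()
with open("data/train/text", encoding="utf-8") as f:
    for line in f:
        parts = line.strip().split(maxsplit=1)
        if len(parts) < 2:
            continue
        utt_id, transcript = parts
        counter.update(transcript.replace(" ", ""))   # one character per output unit

with open("vocab", "w", encoding="utf-8") as f:
    f.write("<blk> 0\n<unk> 1\n")
    for idx, (char, _) in enumerate(counter.most_common(), start=2):
        f.write("{} {}\n".format(char, idx))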

Train

python train.py -config config/aishell.yaml

Eval

python eval.py -config config/aishell.yaml

Experiments

The details of our RNN-Transducer are as follows.

model:
    enc:
        type: lstm
        hidden_size: 320
        n_layers: 4
        bidirectional: True
    dec:
        type: lstm
        hidden_size: 512
        n_layers: 1
    embedding_dim: 512
    vocab_size: 4232
    dropout: 0.2
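
For illustration, modules matching this configuration could be defined roughly as in the sketch below. The module names and the joint network are assumptions, not the exact code of this repo.

# Sketch: encoder, prediction network and joint network matching the config above.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_size=320, n_layers=4, dropout=0.2):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_size, n_layers, batch_first=True,
                            bidirectional=True, dropout=dropout)

    def forward(self, x):                      # x: (B, T, input_dim)
        out, _ = self.lstm(x)                  # (B, T, 2 * hidden_size)
        return out

class Predictor(nn.Module):
    def __init__(self, vocab_size=4232, embedding_dim=512, hidden_size=512, dropout=0.2):
        super(Predictor, self).__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.dropout = nn.Dropout(dropout)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, 1, batch_first=True)

    def forward(self, y, hidden=None):         # y: (B, U) previous labels
        out, hidden = self.lstm(self.dropout(self.embed(y)), hidden)
        return out, hidden                     # out: (B, U, hidden_size)

class Joint(nn.Module):
    def __init__(self, enc_dim=640, dec_dim=512, inner_dim=512, vocab_size=4232):
        super(Joint, self).__init__()
        self.net = nn.Sequential(nn.Linear(enc_dim + dec_dim, inner_dim),
                                 nn.Tanh(),
                                 nn.Linear(inner_dim, vocab_size))

    def forward(self, enc, dec):               # enc: (B, T, enc_dim), dec: (B, U+1, dec_dim)
        t, u = enc.size(1), dec.size(1)
        enc = enc.unsqueeze(2).expand(-1, -1, u, -1)    # (B, T, U+1, enc_dim)
        dec = dec.unsqueeze(1).expand(-1, t, -1, -1)    # (B, T, U+1, dec_dim)
        return self.net(torch.cat((enc, dec), dim=-1))  # (B, T, U+1, vocab_size)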

All experiments are conducted on AISHELL-1. During decoding, we use beam search with a width of 5 for all experiments. A character-level 5-gram language model trained on the training transcripts is integrated into beam search by shallow fusion.
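
In shallow fusion, the language-model log-probability is simply added to the transducer log-probability with a tunable weight when scoring candidates in the beam. A minimal sketch follows; lm_weight is an illustrative value, not the one used for the result in the table.

# Illustrative shallow fusion: combine scores when expanding a beam hypothesis.
def fused_score(rnnt_log_prob, lm_log_prob, lm_weight=0.3):
    return rnnt_log_prob + lm_weight * lm_log_prob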

MODEL               DEV CER (%)   TEST CER (%)
RNNT+pretrain+LM    10.13         11.82

Acknowledgements

Thanks to warp-transducer.

rnn-transducer's People

Contributors

zhengkuntian


rnn-transducer's Issues

RNNTLoss

Hi, could you provide the code for RNNTLoss?

Line 140 in train.py

Hi, should line 140 of train.py be 'dev'?
Also, why does the prediction network use a randomly initialized nn.Embedding instead of one-hot vectors?

Email error

Hi, I have a couple of questions about the RNN-Transducer but can't send an email to [email protected] with further details. I would be very grateful to hear back from you.

Why are the recognition results within a batch all identical?

Hello,
I ran an experiment using only the AISHELL data, following your scripts, and saw the following output:
2020-07-03 15:28:31,296 INFO] -Validation-Epoch: 0, AverageLoss:0.00000, AverageCER: 1133.79356 %
When I printed out all the recognition results in one batch, I found that all 8 hypotheses in the batch were identical. What could be the cause? Is it because my amount of data is too small, or is this expected in the first few epochs?

Also, how much training data was used to obtain the roughly 10% CER reported in README.md?

Issue about eval.py

I couldn't find eval.py in this project. Should I write it myself?

About CER: is ~28% on the dev set the best I can get?

Why are my results so poor? The best CER on the dev set is 28%, and I have tried tuning hyperparameters without much effect. On the test set the CER is currently 31%. Why is that? Is it because there is no pretraining?

Negative loss values

Hi, thank you very much for sharing. While running your code, I found that the loss was negative from the very first epoch, and the CER started as high as 600% and finally converged to 100%. The relevant code is in the screenshot below; could you please take a look? Thanks!
[screenshot of the training code]

How to preprocess the dataset?

So what is the data structure of the following directories?

train: egs/aishell/data/train
dev: egs/aishell/data/dev
test: egs/aishell/data/test

Can you show me some examples?


“RuntimeError: CUDA out of memory.” happens during training (epoch 0, 34%~36%)

When I execute train.py as mentioned in the README.md, I hit the error in the title [RuntimeError: CUDA out of memory. Tried to allocate 2.97 GiB (GPU 0; 10.92 GiB total capacity; 5.99 GiB already allocated; 1.7 GiB free; 8.63 GiB reserved in total by PyTorch)]. Although I adjusted the configs (e.g. batch size and feature dim) many times, it didn't help.
Why does this model suddenly need so much memory in the middle of training? Could anyone give me a hand?

A question about vocab

Hi, thanks for your work. I am a newbie. I want to know how to build the vocab file, and whether the vocab file should include the blank symbol or not. Looking forward to your reply.

A question about beam search

@ZhengkunTian Thanks for your code. I have read your blog about the RNN-Transducer.

I want to know how you implemented the beam search procedure. I implemented it in PyTorch by referring to HawkAaron's implementation, but I found the beam search is too slow to use. So I would like to know about your implementation and its speed.

I would really appreciate it if you could answer me.

Vocab

Can you share the vocab file? Thanks.

About loss

Why is my loss always equal to zero? I checked the dimensions and the code: when calculating the RNNT loss, it becomes zero. Is it a problem with my warp-transducer package?

PRETRAINED MODEL

Can you release the pretrained model for RNN-T? It would be very difficult to start training from scratch.

A question about the implementation of the recognize function

Hi:
In the implementation of Transducer.recognize(), the prediction length is always equal to or less than the input length. In reality, I think the prediction length can be longer than the input length. I referred to the HawkAaron/RNN-Transducer implementation; it has a while-True loop inside each iteration over the input length, which guarantees that the prediction length can be greater than the input length.
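
For illustration, a frame-synchronous greedy decoder along those lines might look like the sketch below. The predictor/joint helper names are assumptions rather than this repo's API, and a cap on symbols per frame is added to avoid an unbounded loop.

# Sketch of greedy decoding where several non-blank labels may be emitted per
# frame, so the output can be longer than the number of frames.
# model.predictor_init, model.predictor_step and model.joint are assumed helpers.
def greedy_decode(model, enc_out, blank=0, max_symbols_per_frame=10):
    # enc_out: (T, enc_dim) encoder output for one utterance
    tokens = []
    dec_out, hidden = model.predictor_init()            # assumed: start state from <blk>
    for t in range(enc_out.size(0)):
        emitted = 0
        while emitted < max_symbols_per_frame:          # allow more labels than frames
            logits = model.joint(enc_out[t], dec_out)   # assumed: one (frame, label) pair
            k = int(logits.argmax())
            if k == blank:
                break                                   # advance to the next frame
            tokens.append(k)
            dec_out, hidden = model.predictor_step(k, hidden)   # assumed helper
            emitted += 1
    return tokens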
