
rnn-transducer's Introduction

RNN-Transducer

A PyTorch implementation of the Transducer model for end-to-end speech recognition.

If you have any questions, please email me: [email protected]

Environment

  • pytorch >= 0.4
  • warp-transducer
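
As a quick sanity check that the warp-transducer binding is installed correctly, a minimal sketch of computing the transducer loss might look like the following. The shapes and argument order are assumptions based on the warprnnt_pytorch binding from HawkAaron/warp-transducer; check them against your installed version.

# Sketch: compute the RNN-T loss with the warprnnt_pytorch binding (assumed API).
import torch
from warprnnt_pytorch import RNNTLoss

criterion = RNNTLoss()  # blank index defaults to 0, matching <blk> 0 in the vocab format below

B, T, U, V = 2, 50, 10, 4232                               # batch, frames, label length, vocab size
logits = torch.randn(B, T, U + 1, V, requires_grad=True)   # joint-network outputs
labels = torch.randint(1, V, (B, U), dtype=torch.int32)    # target characters, no blank
frame_lens = torch.full((B,), T, dtype=torch.int32)
label_lens = torch.full((B,), U, dtype=torch.int32)

loss = criterion(logits, labels, frame_lens, label_lens)
loss.backward()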

Preparation

We use Kaldi for data preparation. At a minimum, these files (text, feats.scp) should be included in the training/development/test sets. If you apply cmvn, utt2spk and cmvn.scp are also required. The format of these files is consistent with Kaldi. The format of the vocab file is as follows; a sketch of how such a file can be generated is shown after the example.

<blk> 0
<unk> 1
我 2
你 3
...
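
For reference, a character-level vocab file in this format could be generated from the Kaldi-style text file with a short script like the sketch below. This is not part of the repo, and the paths are placeholders.

# Sketch: build a character-level vocab from a Kaldi-style "text" file.
# Paths are placeholders; adapt them to your data layout.
from collections import Counter

counter = Counter()
with open("data/train/text", encoding="utf-8") as f:
    for line in f:
        parts = line.strip().split(maxsplit=1)
        if len(parts) < 2:
            continue
        utt_id, transcript = parts
        counter.update(transcript.replace(" ", ""))   # one character per output unit

with open("vocab", "w", encoding="utf-8") as f:
    f.write("<blk> 0\n<unk> 1\n")
    for idx, (char, _) in enumerate(counter.most_common(), start=2):
        f.write("{} {}\n".format(char, idx))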

Train

python train.py -config config/aishell.yaml

Eval

python eval.py -config config/aishell.yaml

Experiments

The details of our RNN-Transducer are as follows.

model:
    enc:
        type: lstm
        hidden_size: 320
        n_layers: 4
        bidirectional: True
    dec:
        type: lstm
        hidden_size: 512
        n_layers: 1
    embedding_dim: 512
    vocab_size: 4232
    dropout: 0.2
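
For illustration, modules matching this configuration could be defined roughly as in the sketch below. The module names and the joint network are assumptions, not the exact code of this repo.

# Sketch: encoder, prediction network and joint network matching the config above.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_size=320, n_layers=4, dropout=0.2):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_size, n_layers, batch_first=True,
                            bidirectional=True, dropout=dropout)

    def forward(self, x):                      # x: (B, T, input_dim)
        out, _ = self.lstm(x)                  # (B, T, 2 * hidden_size)
        return out

class Predictor(nn.Module):
    def __init__(self, vocab_size=4232, embedding_dim=512, hidden_size=512, dropout=0.2):
        super(Predictor, self).__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.dropout = nn.Dropout(dropout)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, 1, batch_first=True)

    def forward(self, y, hidden=None):         # y: (B, U) previous labels
        out, hidden = self.lstm(self.dropout(self.embed(y)), hidden)
        return out, hidden                     # out: (B, U, hidden_size)

class Joint(nn.Module):
    def __init__(self, enc_dim=640, dec_dim=512, inner_dim=512, vocab_size=4232):
        super(Joint, self).__init__()
        self.net = nn.Sequential(nn.Linear(enc_dim + dec_dim, inner_dim),
                                 nn.Tanh(),
                                 nn.Linear(inner_dim, vocab_size))

    def forward(self, enc, dec):               # enc: (B, T, enc_dim), dec: (B, U+1, dec_dim)
        t, u = enc.size(1), dec.size(1)
        enc = enc.unsqueeze(2).expand(-1, -1, u, -1)    # (B, T, U+1, enc_dim)
        dec = dec.unsqueeze(1).expand(-1, t, -1, -1)    # (B, T, U+1, dec_dim)
        return self.net(torch.cat((enc, dec), dim=-1))  # (B, T, U+1, vocab_size)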

All experiments are conducted on AISHELL-1. During decoding, we use beam search with a width of 5 for all experiments. A character-level 5-gram language model trained on the training transcripts is integrated into beam search by shallow fusion.
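
In shallow fusion, the language-model log-probability is simply added to the transducer log-probability with a tunable weight when scoring candidates in the beam. A minimal sketch follows; lm_weight is an illustrative value, not the one used for the result in the table.

# Illustrative shallow fusion: combine scores when expanding a beam hypothesis.
def fused_score(rnnt_log_prob, lm_log_prob, lm_weight=0.3):
    return rnnt_log_prob + lm_weight * lm_log_prob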

MODEL               DEV CER (%)   TEST CER (%)
RNNT+pretrain+LM    10.13         11.82

Acknowledgements

Thanks to warp-transducer.

rnn-transducer's People

Contributors

zhengkuntian


rnn-transducer's Issues

RNNTLoss

Hi, could you provide the code for RNNTLoss?

Line 140 in train.py

Hi, should line 140 of train.py be 'dev'?
Also, why does the prediction network use a randomly initialized nn.Embedding instead of one-hot vectors?

Email error

Hi, I have a couple of questions about the RNN-Transducer but can't send an email to [email protected] with further details. I would be very grateful to hear back from you.

Why are the recognition results within a batch all identical?

Hello,
I ran an experiment using only the AISHELL data, following your scripts, and saw the following output:
2020-07-03 15:28:31,296 INFO] -Validation-Epoch: 0, AverageLoss:0.00000, AverageCER: 1133.79356 %
When I printed out all the recognition results in one batch, I found that all 8 hypotheses in the batch were identical. What could be the cause? Is it because my amount of data is too small, or is this expected in the first few epochs?

Also, how much training data was used to obtain the roughly 10% CER reported in README.md?

Issue about eval.py

I couldn't find eval.py in this project. Should I write it myself?

About CER: is ~28% on the dev set the best I can get?

Why are my results so poor? The best CER on the dev set is 28%, and I have tried tuning hyperparameters without much effect. On the test set the CER is currently 31%. Why is that? Is it because there is no pretraining?

Negative loss values

Hi, thank you very much for sharing. While running your code, I found that the loss was negative from the very first epoch, and the CER started as high as 600% and finally converged to 100%. The relevant code is in the screenshot below; could you please take a look? Thanks!
[screenshot of the training code]

How to preprocess the dataset?

So what is the data structure of the following directories?

train: egs/aishell/data/train
dev: egs/aishell/data/dev
test: egs/aishell/data/test

Can you show me some examples?


“RuntimeError: CUDA out of memory.” happens during training (epoch 0, 34%~36%)

When I execute train.py as mentioned in the README.md, I hit the error in the title [RuntimeError: CUDA out of memory. Tried to allocate 2.97 GiB (GPU 0; 10.92 GiB total capacity; 5.99 GiB already allocated; 1.7 GiB free; 8.63 GiB reserved in total by PyTorch)]. Although I adjusted the configs (e.g. batch size and feature dim) many times, it didn't help.
Why does this model suddenly need so much memory in the middle of training? Could anyone give me a hand?

A question about vocab

Hi, thanks for your work. I am a newbie. I want to know how to build the vocab file, and whether the vocab file should include the blank symbol or not. Looking forward to your reply.

A question about beam search

@ZhengkunTian Thanks for your code. I have read your blog about the RNN-Transducer.

I want to know how you implemented the beam search procedure. I implemented it in PyTorch by referring to HawkAaron's implementation, but I found the beam search is too slow to use. So I would like to know about your implementation and its speed.

I would really appreciate it if you could answer me.

Vocab

Can you share the vocab file? Thanks.

About loss

Why is my loss always equal to zero? I checked the dimensions and the code: when calculating the RNNT loss, it becomes zero. Is it a problem with my warp-transducer package?

PRETRAINED MODEL

Can you release the pretrained model for RNN-T? It would be very difficult to start training from scratch.

A question about the implementation of the recognize function

Hi:
In the implementation of Transducer.recognize(), the prediction length is always equal to or less than the input length. In reality, I think the prediction length can be longer than the input length. I referred to the HawkAaron/RNN-Transducer implementation; it has a while-True loop inside each iteration over the input length, which guarantees that the prediction length can be greater than the input length.
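
For illustration, a frame-synchronous greedy decoder along those lines might look like the sketch below. The predictor/joint helper names are assumptions rather than this repo's API, and a cap on symbols per frame is added to avoid an unbounded loop.

# Sketch of greedy decoding where several non-blank labels may be emitted per
# frame, so the output can be longer than the number of frames.
# model.predictor_init, model.predictor_step and model.joint are assumed helpers.
def greedy_decode(model, enc_out, blank=0, max_symbols_per_frame=10):
    # enc_out: (T, enc_dim) encoder output for one utterance
    tokens = []
    dec_out, hidden = model.predictor_init()            # assumed: start state from <blk>
    for t in range(enc_out.size(0)):
        emitted = 0
        while emitted < max_symbols_per_frame:          # allow more labels than frames
            logits = model.joint(enc_out[t], dec_out)   # assumed: one (frame, label) pair
            k = int(logits.argmax())
            if k == blank:
                break                                   # advance to the next frame
            tokens.append(k)
            dec_out, hidden = model.predictor_step(k, hidden)   # assumed helper
            emitted += 1
    return tokens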
