
easyespnet's Introduction

EasyEspnet

ESPnet is a popular toolkit for end-to-end speech processing. However, it is not easy to install, learn, or use. For instance, it follows the Kaldi style and must be run through shell scripts (i.e., its run.sh files), which makes it hard to use, debug, and deploy in online environments.

We provide a wrapper for ESPnet, which we call EasyEspnet, to make ESPnet easier to use. This codebase lets you write, run, and debug your code in a friendlier, plain-Python style.

Requirements

EasyEspnet is not a standalone tool, so you need a working ESPnet installation first. Because installing ESPnet is itself not easy (slow, tedious configuration, etc.), we provide an all-in-one Docker image. All you need to do is install Docker and then pull our ESPnet image:

docker pull jindongwang/espnet:all11

Then you can run ESPnet directly inside this Docker image. The image already contains the ESPnet codebase, so you do not need to install it again. Docker also makes it much easier to submit speech recognition jobs to a cloud environment, since most cloud computing platforms support Docker.

Run

Currently, this repo supports ASR tasks only. All you need to do is extract features using ESPnet and set the data folder path. To extract features, run bash run.sh --stop_stage 2 inside an ESPnet recipe directory such as egs/an4/asr1/.

There are three main Python files to use:

  • train.py: the core script for ASR model training, decoding, and evaluation.
  • data_load.py: contains the data configuration, which must be set before training your model, and the related data-loading functions.
  • utils.py: contains various utility functions, including model saving/loading, recognition, and evaluation.

Check or modify arg_list in train.py; the config should be in ESPnet config style (remember to include decoding information if you want to compute CER/WER). Then you can run train.py. For example:

python train.py --root_path an4/asr1 --dataset an4

Done. Results (log, model, snapshots) are saved in results_(dataset)/(config_name) by default.
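
If you are unsure whether your config includes the decoding information needed for CER/WER, a quick sanity check is to load it with PyYAML before training. This is only a minimal sketch: the config path and the decoding field names (beam-size, ctc-weight, lm-weight) are assumptions about a typical ESPnet-style config, not values required by EasyEspnet.

# Minimal sketch (illustrative, not part of EasyEspnet): load an ESPnet-style
# YAML config and warn if common decoding fields are missing.
import yaml

with open("conf/train.yaml") as f:  # hypothetical config path
    conf = yaml.safe_load(f)

for key in ("beam-size", "ctc-weight", "lm-weight"):  # typical ESPnet decoding options
    if key not in conf:
        print(f"warning: '{key}' not found in config; CER/WER decoding may not work")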

Demo

We provide processed features for the an4 dataset as a demo.

To run the demo, follow the steps below.

Download and unzip the features:

mkdir data; cd data; 
wget https://transferlearningdrive.blob.core.windows.net/teamdrive/dataset/speech/an4_features.tar.gz
tar -zxvf an4_features.tar.gz; rm an4_features.tar.gz; cd ..
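
Optionally, you can check that the features unpacked correctly by counting the utterances in each ESPnet data.json file before training. This is just an illustrative snippet; the dump paths below follow the layout of the demo archive and may differ for your own data.

# Illustrative sanity check (not part of EasyEspnet): count utterances per split.
import json

for split in ("train_nodev", "train_dev", "test"):
    path = f"data/an4/asr1/dump/{split}/deltafalse/data.json"
    with open(path, encoding="utf-8") as f:
        utts = json.load(f)["utts"]  # ESPnet data.json stores utterances under "utts"
    print(f"{split}: {len(utts)} utterances")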

Start training with EasyEspnet:

python train.py --root_path data/an4/asr1/ --dataset an4

Decoding and WER/CER evaluation

Set --decoding_mode to true to perform decoding and CER/WER evaluation. For example:

python train.py --decoding_mode true
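
As background, WER and CER are both edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length (in words or characters). The snippet below only illustrates the metric itself; it is not the evaluation code EasyEspnet uses internally.

# Illustrative WER/CER computation (not EasyEspnet's internal implementation).
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))  # del / ins / sub
    return d[len(hyp)]

ref, hyp = "go forward ten meters", "go forward two meters"
wer = edit_distance(ref.split(), hyp.split()) / len(ref.split())
cer = edit_distance(list(ref), list(hyp)) / len(ref)
print(f"WER: {wer:.2%}, CER: {cer:.2%}")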

Distributed training

EasyEspnet uses PyTorch DataParallel for multi-GPU training by default, but it also supports PyTorch DistributedDataParallel training, which is much faster. For example, with 2 GPUs on 1 node:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --dist_train true
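
For reference, torch.distributed.launch starts one process per GPU and passes each a --local_rank argument; every process then joins a process group and wraps its model in DistributedDataParallel. The sketch below shows this generic PyTorch pattern only; it is not the actual code inside train.py, and the toy Linear model is a stand-in for the real ASR model.

# Generic DistributedDataParallel setup (a sketch; EasyEspnet's train.py may differ).
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # supplied by torch.distributed.launch
args = parser.parse_args()

dist.init_process_group(backend="nccl")  # one process per GPU joins the group
torch.cuda.set_device(args.local_rank)

model = torch.nn.Linear(83, 10).cuda(args.local_rank)  # stand-in for the ASR model
model = DDP(model, device_ids=[args.local_rank])
# Training then proceeds as usual; DDP synchronizes gradients across processes.
# The DataLoader should also use a DistributedSampler so each process sees a distinct shard.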

Acknowledgement

Contact

easyespnet's People

Contributors

houwenxin, jindongwang


easyespnet's Issues

Unreasonable WER obtained by MMD training in CMatch

The problem occurs during CMatchASR training. When I trained with train.yaml, I got a word error rate of 22% on libriadapt_en_us_clean_matrix. I then used the saved model.loss.best model as the value of the load_pretrained_model parameter and used libriadapt_en_us_clean_pseye as the target data for MMD domain adaptation training. After 39 epochs of training, I got train loss: 310.2868, dev loss: 302.119, test loss: 223.363, and the test WER was 147.

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Hi, when I tried the demo in Docker, I ran into this error.

root@Oision-Legion-R7000P2021H:~/EasyEspnet# python train.py --root_path data/an4/asr1/ --dataset an4
2022-03-28 03:29:05,274 (utils:21) WARNING: Skip DEBUG/INFO messages
2022-03-28 03:29:05,349 (train:179) WARNING: ngpu: 1
2022-03-28 03:29:06,526 (data_load:94) WARNING: #Train Json data/an4/asr1/dump/train_nodev/deltafalse/data.json: 848
2022-03-28 03:29:06,526 (data_load:95) WARNING: #Dev Json data/an4/asr1/dump/train_dev/deltafalse/data.json: 100
2022-03-28 03:29:06,526 (data_load:96) WARNING: #Test Json data/an4/asr1/dump/test/deltafalse/data.json: 130
2022-03-28 03:38:48,454 (train:301) WARNING: Total parameter of the model = 27181116
2022-03-28 03:38:48,455 (train:305) WARNING: Trainable parameter of the model = 27181116
Traceback (most recent call last):
  File "train.py", line 315, in <module>
    train(dataloaders, model, optimizer, save_path)
  File "train.py", line 107, in train
    train_stats = train_epoch(train_loader, model, optimizer)
  File "train.py", line 55, in train_epoch
    loss = model(fbank, seq_lens, tokens).mean() # / self.accum_grad
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/espnet/espnet/nets/pytorch_backend/e2e_asr_transformer.py", line 178, in forward
    hs_pad, hs_mask = self.encoder(xs_pad, src_mask)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/espnet/espnet/nets/pytorch_backend/transformer/encoder.py", line 298, in forward
    xs, masks = self.embed(xs, masks)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/espnet/espnet/nets/pytorch_backend/transformer/subsampling.py", line 75, in forward
    x = self.conv(x)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

NotImplementedError during decoding and evaluation: Batch decoding is not supported yet.

When decoding and evaluation start after training finishes, the call nbest_hyps = model.recognize_batch(fbank, recog_args, char_list=None, rnnlm=None) in utils.py resolves to the recognize_batch function in espnet/espnet/nets/asr_interface.py, whose body is:

def recognize_batch(self, x, recog_args, char_list=None, rnnlm=None):
        """Beam search implementation for batch.

        :param torch.Tensor x: encoder hidden state sequences (B, Tmax, Henc)
        :param namespace recog_args: argument namespace containing options
        :param list char_list: list of characters
        :param torch.nn.Module rnnlm: language model module
        :return: N-best decoding results
        :rtype: list
        """
        raise NotImplementedError("Batch decoding is not supported yet.")

Is it resolving to the wrong file?
