
[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation

License: Apache License 2.0


CIF-HieraDist

Introduction

[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

🚀🚀 This repository is the official implementation of hierarchical knowledge distillation (HieraDist) for continuous integrate-and-fire (CIF) based ASR models.

We propose hierarchical knowledge distillation (HKD, or HieraDist) to transfer knowledge from pre-trained language models (PLMs) to ASR models. HieraDist applies cross-modal knowledge distillation with a token-level contrastive loss at the acoustic level, and knowledge distillation with a regression loss at the linguistic level.


Please refer to the original paper for more details: Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation.
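
As a rough illustration of the two loss terms described above, the following framework-free Python sketch computes an InfoNCE-style token-level contrastive loss and a mean-squared-error regression loss between student (ASR) and teacher (PLM) token representations. It is not the repository's implementation: the function names, the use of cosine similarity, the choice of the other tokens in the sequence as negatives, and the loss weighting are all assumptions made for illustration. The default temperature of 0.02 is borrowed from the `conttemp0p02` part of the training script name, on the assumption that it denotes the contrastive temperature.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(student, teacher, temperature=0.02):
    """InfoNCE-style loss: student[i] should be closest to teacher[i].

    All other teacher tokens act as negatives (an illustrative choice).
    """
    total = 0.0
    for i, s in enumerate(student):
        logits = [cosine(s, t) / temperature for t in teacher]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(student)


def regression_loss(student, teacher):
    """Mean squared error over every element of the token representations."""
    count = sum(len(s) for s in student)
    squared = sum(
        (a - b) ** 2 for s, t in zip(student, teacher) for a, b in zip(s, t)
    )
    return squared / count


# Toy example: two tokens with 2-dimensional representations.
asr_tokens = [[1.0, 0.1], [0.2, 1.0]]
plm_tokens = [[1.0, 0.0], [0.0, 1.0]]
acd = contrastive_loss(asr_tokens, plm_tokens)
lrd = regression_loss(asr_tokens, plm_tokens)
print(f"contrastive: {acd:.4f}  regression: {lrd:.4f}")
```

In a real training setup both terms would be added to the ASR loss with tunable weights; the paper and the training scripts are the authoritative source for those values.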

What can you do with this repository?

  1. Train a CIF-based ASR model;
  2. Train a CIF-based ASR model with acoustic contrastive distillation (ACD);
  3. Train a CIF-based ASR model with linguistic regression distillation (LRD);
  4. Train a CIF-based ASR model with hierarchical knowledge distillation (HieraDist/HKD);
  5. Conduct model inference.

Usage

Installation

The default Python version used for this repository:

python==3.7.9

Install all dependencies with the following commands:

cd CIF-HieraDist
pip install -r requirements.txt
pip install -e ./
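
Before installing, it may help to confirm that the active interpreter matches the Python version stated above. The sketch below is a generic check, not part of the repository; the helper name `python_matches` is hypothetical, and other Python versions may well work.

```python
import sys

# Major and minor version of the Python release stated above (3.7.9).
EXPECTED = (3, 7)


def python_matches(version_info=sys.version_info, expected=EXPECTED):
    """Return True if the interpreter's major.minor equals the expected pair."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    if not python_matches():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: running Python {found}; this repository was developed on 3.7.9.")
```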

Let's take the AISHELL-1 dataset as an example and navigate to the corresponding working directory for this dataset:

cd egs/aishell1

Data Preparation

This repository is built on Fairseq. Please refer to Fairseq's original speech-to-text data preparation. You can also use https://github.com/MingLunHan/CIF-HieraDist/blob/main/examples/speech_to_text/prep_aishell1_data.py as a starting point and modify it for your own datasets.

python ../../examples/speech_to_text/prep_aishell1_data.py --input-root ${YOUR_PATH_TO_AISHELL1} --output-root ./data/

Note that YOUR_PATH_TO_AISHELL1 is the parent directory of the AISHELL-1 dataset.
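
After data preparation, a quick sanity check of the generated manifests can catch problems early. The sketch below assumes that the script writes Fairseq-style speech-to-text TSV manifests with `id`, `audio`, `n_frames`, and `tgt_text` columns; that assumption, the helper name `summarize_manifest`, and the sample rows are illustrative and not taken from this repository.

```python
import csv
import io

# Columns used by Fairseq's speech-to-text manifests (assumed to apply here).
EXPECTED_COLUMNS = {"id", "audio", "n_frames", "tgt_text"}


def summarize_manifest(tsv_file):
    """Count utterances and frames in a tab-separated manifest file object."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"manifest is missing columns: {sorted(missing)}")
    rows = list(reader)
    total_frames = sum(int(row["n_frames"]) for row in rows)
    return {"utterances": len(rows), "total_frames": total_frames}


# In-memory sample standing in for a real manifest file.
sample = io.StringIO(
    "id\taudio\tn_frames\ttgt_text\n"
    "utt1\ta.wav\t100\t你好\n"
    "utt2\tb.wav\t250\t世界\n"
)
print(summarize_manifest(sample))
```

For a real manifest, pass a file opened with `encoding="utf-8"` and `newline=""` instead of the in-memory sample.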

Model Training

To train a standard CIF-based ASR model, you should use the command:

bash run_train_aishell1_cif_small_exp35_14.sh

To train a CIF-based ASR model with HieraDist/HKD, you should first extract features from the PLM with the following command:

bash run_extract_plm_feats.sh

Set the path of the output JSON file containing the PLM features in the configuration file in egs/aishell1/data. Then, you should use the command:

bash run_train_bert_distilled_cif_exp4_decdistill0p01_noscale_finalstate_contrastiveloss1p0_conttemp0p02_rmvrpt_neg700.sh

We provide the original training logs in egs/aishell1 for comparison.

Model Inference

To run inference with an ASR model, you should use the command:

bash run_infer.sh

We provide the original inference logs in egs/aishell1 for comparison.

Key Results

Without any extra language model, we obtain the following results:

Methods            dev (CER %)    test (CER %)
CIF                4.5            4.9
CIF + ACD          4.2            4.7
CIF + LRD          4.0            4.5
CIF + HieraDist    3.8            4.2 (4.1 with better decoding hyper-parameters in later experiments)
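
For readers unfamiliar with the metric, character error rate (CER) is commonly computed as the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below shows that standard definition; it is not the scoring code used for the numbers above, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference character
                    curr[j - 1] + 1,     # insert a hypothesis character
                    prev[j - 1] + cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER in percent: total edits over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return 100.0 * errors / chars


# One substituted character out of six reference characters.
print(f"{cer(['今天天气很好'], ['今天天汽很好']):.2f}")
```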

With a language model trained on the text of AISHELL-1 itself, we obtain:

Methods            dev (CER %)    test (CER %)
CIF                4.4            4.8
CIF + ACD          4.2            4.6
CIF + LRD          4.0            4.4
CIF + HieraDist    3.8            4.1
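
To put the absolute numbers in perspective, the relative CER reduction of CIF + HieraDist over the CIF baseline can be derived from the two tables. The short sketch below only performs this arithmetic on the reported values (using the 4.2 test CER for the setting without a language model); the helper name is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# (CIF, CIF + HieraDist) CER pairs copied from the tables above.
results = {
    "no LM": {"dev": (4.5, 3.8), "test": (4.9, 4.2)},
    "with LM": {"dev": (4.4, 3.8), "test": (4.8, 4.1)},
}

for setting, splits in results.items():
    for split, (cif, hieradist) in splits.items():
        reduction = relative_reduction(cif, hieradist)
        print(f"{setting:8s} {split:4s} {reduction:.1f}%")
```

This gives relative reductions of roughly 15.6% (dev) and 14.3% (test) without a language model, and 13.6% (dev) and 14.6% (test) with one.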

Acknowledgments

This repository is built on Fairseq. Thanks to Facebook AI Research (FAIR) for releasing the Fairseq framework.

Other Resources

Citation

If you are inspired by this paper, use the core code from this repository in your own development, or conduct related research, please cite the paper using the following BibTeX entry:

@inproceedings{han23_interspeech,
  author={Minglun Han and Feilong Chen and Jing Shi and Shuang Xu and Bo Xu},
  title={{Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1364--1368},
  doi={10.21437/Interspeech.2023-423}
}

Thanks!


cif-hieradist's Issues

RuntimeError: Failed to decode audio.

Why do I get an audio decoding error while extracting features?
Traceback (most recent call last):
  File "../../examples/speech_to_text/prep_aishell1_data.py", line 358, in <module>
    main()
  File "../../examples/speech_to_text/prep_aishell1_data.py", line 354, in main
    process(args)
  File "../../examples/speech_to_text/prep_aishell1_data.py", line 266, in process
    for wav, sample_rate, _, spk_id, utt_id in tqdm(dataset):
  File "/home/weiyangjie/anaconda3/envs/fairseq/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "../../examples/speech_to_text/prep_aishell1_data.py", line 165, in __getitem__
    return load_aishell1_item(
  File "../../examples/speech_to_text/prep_aishell1_data.py", line 55, in load_aishell1_item
    waveform, sample_rate = torchaudio.load(file_audio)
  File "/home/weiyangjie/anaconda3/envs/fairseq/lib/python3.8/site-packages/torchaudio/_backend/utils.py", line 203, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/home/weiyangjie/anaconda3/envs/fairseq/lib/python3.8/site-packages/torchaudio/_backend/ffmpeg.py", line 334, in load
    return load_audio(os.path.normpath(uri), frame_offset, num_frames, normalize, channels_first, format)
  File "/home/weiyangjie/anaconda3/envs/fairseq/lib/python3.8/site-packages/torchaudio/_backend/ffmpeg.py", line 100, in load_audio
    return torch.ops.torchaudio.compat_load(src, format, filter, channels_first)
  File "/home/weiyangjie/anaconda3/envs/fairseq/lib/python3.8/site-packages/torch/_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: Failed to decode audio.
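
The cause is not established in this thread. One possible way to narrow it down, assuming the failure comes from a corrupt, empty, or non-WAV file rather than from the torchaudio backend itself, is to scan the dataset with Python's standard `wave` module and list the files it cannot open. The helper name `find_unreadable_wavs` and the path in the usage note are hypothetical, and `wave` only understands uncompressed PCM WAV files, so a file flagged here is not necessarily the one torchaudio rejected.

```python
import wave
from pathlib import Path


def find_unreadable_wavs(root):
    """Return (path, reason) pairs for .wav files the stdlib cannot read."""
    unreadable = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                if wav.getnframes() == 0:
                    unreadable.append((path, "no audio frames"))
        except (wave.Error, EOFError) as err:
            # wave.Error: not a valid PCM WAV file; EOFError: truncated or empty.
            unreadable.append((path, str(err) or type(err).__name__))
    return unreadable
```

A call such as `find_unreadable_wavs("/path/to/aishell1")` returns an empty list when every file opens cleanly, which would point toward the audio backend rather than the data.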

Running error

I'm using a Mac. Can I run this model on it? While debugging, it seemed that a GPU is required to run it.

Data Preparation

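Hello, I ran into a problem while preparing the data.
[image]
The AISHELL-1 dataset I downloaded does not contain this subfolder. Does the original dataset need some kind of processing first?
[image]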
