linru0 / paper-joint-unsupervised-and-supervised-training-for-asr-via-bilevel-optimization

This project forked from afmsaif/joint-self-supervised-and-supervised-training-for-speech-models

Paper title: Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

License: Apache License 2.0

Python 100.00%

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

This repository contains the code for the paper "Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization" by A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, and Tianyi Chen.

Abstract

In this paper, we present a novel bilevel optimization-based training approach for acoustic models in automatic speech recognition (ASR) tasks, termed bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs lower and upper level optimizations with unsupervised and supervised losses respectively, leveraging recent advances in penalty-based bilevel optimization to address this challenging ASR problem with manageable complexity and rigorous convergence guarantees. Extensive experiments on the LibriSpeech and TED-LIUM v2 datasets demonstrate that BL-JUST outperforms the commonly used pre-training followed by fine-tuning strategy.

Key Contributions

  1. BL-JUST Framework: Introduces a feedback loop between unsupervised and supervised training, unlike the conventional pre-training followed by fine-tuning (PT+FT) strategy, in which the two stages run one after the other.
  2. Bilevel Optimization: Utilizes penalty-based bilevel optimization for joint training with convergence guarantees.
  3. Empirical Results: Demonstrates superior performance on LibriSpeech and TED-LIUM v2 datasets, reducing word error rates (WERs) and improving training efficiency.
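The idea behind penalty-based bilevel optimization can be illustrated on a toy problem. The sketch below is not the BL-JUST implementation and does not use speech data: the two quadratic losses, the linear penalty schedule, the step-size rule, and all function names are assumptions made up for illustration. The upper-level ("supervised") loss is minimized over parameters that are also pushed toward the minimizers of the lower-level ("unsupervised") loss by a penalty weight gamma that grows during training.

```python
# Toy sketch of penalty-based bilevel optimization (illustrative only).
# Upper level ("supervised"):   f(theta) = (theta0 - 2)^2 + (theta1 - 2)^2
# Lower level ("unsupervised"): g(theta) = (theta0 - 1)^2, with minimum value 0,
# attained by every theta with theta0 == 1.
# The bilevel solution is therefore theta = (1, 2).


def supervised_loss_grad(theta):
    """Gradient of the hypothetical upper-level loss f."""
    return [2.0 * (theta[0] - 2.0), 2.0 * (theta[1] - 2.0)]


def unsupervised_loss_grad(theta):
    """Gradient of the hypothetical lower-level loss g."""
    return [2.0 * (theta[0] - 1.0), 0.0]


def penalty_bilevel_descent(theta, steps=2000, gamma_max=100.0):
    """Gradient descent on f(theta) + gamma * g(theta) with increasing gamma."""
    theta = list(theta)
    for k in range(steps):
        # Assumed schedule: the penalty weight grows linearly up to gamma_max.
        gamma = gamma_max * (k + 1) / steps
        # Shrink the step as the penalized objective gets steeper.
        lr = 0.25 / (1.0 + gamma)
        grad_f = supervised_loss_grad(theta)
        grad_g = unsupervised_loss_grad(theta)
        theta = [
            t - lr * (gf + gamma * gg)
            for t, gf, gg in zip(theta, grad_f, grad_g)
        ]
    return theta


theta_star = penalty_bilevel_descent([0.0, 0.0])
print(theta_star)
```

With a finite penalty weight the first coordinate ends up close to, but not exactly at, the lower-level minimizer; raising gamma_max tightens that gap at the cost of smaller steps.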

Contents

  • Code: Implementation of the BL-JUST training framework.
  • Experiments: Scripts and configurations for reproducing the experiments presented in the paper.
  • Datasets: Instructions for downloading and preparing the LibriSpeech and TED-LIUM v2 datasets.

Getting Started

  1. Dependencies:
    • Python 3.9
    • PyTorch 2
  2. Installation: Step-by-step guide to setting up the environment.
    git clone https://github.com/afmsaif/Joint-self-supervised-and-supervised-training-for-speech-models.git
    cd Joint-self-supervised-and-supervised-training-for-speech-models
    pip install -r requirements.txt
  3. Running Experiments: Detailed instructions to run the training and evaluation scripts.
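The dependency versions listed above can be checked before running anything. The following is a minimal sketch, not part of the repository: the function name check_environment and its parameters are hypothetical, and it assumes PyTorch is installed under its usual distribution name torch.

```python
import sys
from importlib import metadata


def check_environment(python_version=(3, 9), torch_major=2):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []

    # Compare only the major and minor parts of the running interpreter.
    if sys.version_info[:2] != python_version:
        problems.append(
            "expected Python %d.%d, found %d.%d"
            % (python_version + tuple(sys.version_info[:2]))
        )

    # Read the installed PyTorch version from package metadata,
    # which avoids importing torch itself.
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch (package 'torch') is not installed")
    else:
        if torch_version.split(".")[0] != str(torch_major):
            problems.append(
                "expected PyTorch %d.x, found %s" % (torch_major, torch_version)
            )

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```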

Usage

  • Training: Example commands for training the ASR models using the BL-JUST framework.
  • Evaluation: Commands to evaluate the trained models and reproduce the results from the paper.

Results

  • Performance Metrics: Summary of the ASR performance on different datasets.
  • Comparative Analysis: Comparison with the traditional PT+FT approach.
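For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a generic illustration with a hypothetical function name; it is not the scoring script used for the paper, and it applies no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 4))
```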

Citation

If you find this code useful in your research, please consider citing our paper:

@article{saif2024joint,
  title={Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization},
  author={Saif, AFM and Cui, Xiaodong and Shen, Han and Lu, Songtao and Kingsbury, Brian and Chen, Tianyi},
  journal={arXiv preprint arXiv:2401.06980},
  year={2024}
}

Acknowledgments

This work was supported by the Rensselaer-IBM AI Research Collaboration, part of the IBM AI Horizons Network, and by a Cisco research grant.

Contributors

afmsaif, chentianyi1991
