audioku / meta-transfer-learning

Implementation of meta-transfer-learning for ASR and LM (ACL 2020)

License: MIT License

Meta-Transfer Learning for Code-Switched Speech Recognition

Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

This is the implementation of our paper, accepted at ACL 2020.

The code is written in PyTorch. If you use any source code or datasets included in this toolkit in your work, please cite the following paper.

@inproceedings{winata-etal-2020-meta,
    title = "Meta-Transfer Learning for Code-Switched Speech Recognition",
    author = "Winata, Genta Indra  and
      Cahyawijaya, Samuel  and
      Lin, Zhaojiang  and
      Liu, Zihan  and
      Xu, Peng  and
      Fung, Pascale",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.348",
    pages = "3770--3776",
}

Abstract

An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and the expense and significant effort required to collect mixed-language data. We therefore propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting by judiciously extracting information from high-resource monolingual datasets. Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data. Based on experimental results, our model outperforms existing baselines on speech recognition and language modeling tasks, and is faster to converge.
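The optimization described in the abstract can be illustrated with a toy example. The sketch below is not the code in this repository: it assumes a first-order, MAML-style update, uses a one-parameter linear model in place of the Transformer ASR model, and all names (`mse_grad`, `meta_transfer_step`, `train`, `inner_lr`, `outer_lr`) are hypothetical. Each step adapts the parameter on a batch from a high-resource source task, then uses the gradient of the target (code-switching) loss at the adapted parameter to update the original parameter.

```python
# Toy illustration only: a 1-D model y = w * x stands in for the ASR model.
# Assumption: first-order approximation (no second derivatives are used).

def mse_grad(w, batch):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def meta_transfer_step(w, source_batches, target_batch, inner_lr, outer_lr):
    """One meta-transfer update of w.

    source_batches: one batch per high-resource (monolingual) task.
    target_batch:   a batch from the low-resource (code-switching) task.
    """
    meta_grad = 0.0
    for batch in source_batches:
        # Inner step: adapt a temporary copy of w on the source task.
        w_adapted = w - inner_lr * mse_grad(w, batch)
        # Outer signal: target-task gradient evaluated at the adapted value.
        meta_grad += mse_grad(w_adapted, target_batch)
    # Outer step: move the original parameter, not the adapted copies.
    return w - outer_lr * meta_grad


def train(w, source_batches, target_batch, steps, inner_lr=0.05, outer_lr=0.05):
    """Repeat the meta-transfer update for a fixed number of steps."""
    for _ in range(steps):
        w = meta_transfer_step(w, source_batches, target_batch, inner_lr, outer_lr)
    return w


if __name__ == "__main__":
    # Two source tasks (slopes 1 and 3) and one target task (slope 2).
    sources = [[(1.0, 1.0), (2.0, 2.0)], [(1.0, 3.0), (2.0, 6.0)]]
    target = [(1.0, 2.0), (2.0, 4.0)]
    print(train(0.0, sources, target, steps=50))
```

The point of the design is that the source tasks only ever influence the parameter through the target loss, which is one way to read "conditioning the optimization on the code-switching data".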

Data

Model Architecture

Setup

  • Install PyTorch (tested with PyTorch 1.0 and Python 3.6)
  • Install the library dependencies listed in requirement.txt

Run the code

  • Meta-Transfer Learning
python meta_transfer_train.py --train-manifest-list data/manifests/cv-valid-train_manifest.csv data/manifests/hkust_16khz_train_manifest.csv data/manifests/seame_phaseII_train_manifest.csv \
--train-partition-list 1 1 1 --valid-manifest-list data/manifests/cv-valid-dev_manifest.csv data/manifests/hkust_16khz_dev_manifest.csv data/manifests/seame_phaseII_val_manifest.csv \
--name mtl_enc2_dec4_512_b8_16khz_copy_grad --cuda --k-train 8 --k-valid 8 --labels-path data/labels/hkust_seame_labels.json --lr 1e-4 --save-folder save/ --save-every 10000 \
--feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,200 \
--src-max-len 5000 --tgt-max-len 2500 --evaluate-every 10000 --epochs 1000000 --sample-rate 16000 --copy-grad
  • Joint training
python joint_train.py --train-manifest-list data/manifests/cv-valid-train_manifest.csv data/manifests/hkust_16khz_train_manifest.csv data/manifests/seame_phaseII_train_manifest.csv \
--valid-manifest-list data/manifests/cv-valid-dev_manifest.csv data/manifests/hkust_16khz_dev_manifest.csv data/manifests/seame_phaseII_val_manifest.csv --cuda --k-train 8 \
--labels-path data/labels/hkust_seame_labels.json --lr 1e-4 --name joint_enc2_dec4_512_b8_16khz --save-folder save/ --save-every 10000 --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 \
--num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,200 --src-max-len 5000 --tgt-max-len 2500 --evaluate-every 10000 \
--epochs 10000000 --sample-rate 16000 --train-partition-list 1 1 1
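
Both commands pass `--early-stop cer,200`. Assuming `cer` refers to character error rate (the README does not spell this out, and the meaning of `200` is not documented here), the metric is usually defined as the character-level edit distance between hypothesis and reference, divided by the reference length. The following is a minimal, dependency-free sketch of that definition; `edit_distance` and `cer` are hypothetical helper names, and the repository's own metric code may differ in details such as text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because the distance is computed per character, the same function applies to mixed English and Mandarin transcripts without a separate tokenizer. Note that CER can exceed 1.0 when the hypothesis is much longer than the reference.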

Bug Report

Feel free to create an issue or send an email to [email protected]
