
Analyzing the Confounding Effect of Accents in End-to-End ASR Models

This repository contains code for our ACL 2020 paper How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems, on understanding the confounding effect of accents in an end-to-end Automatic Speech Recognition (ASR) model, DeepSpeech2, through several probing and analysis techniques.

Requirements

  • Docker: Version 19.03.1, build 74b1e89
  • nvidia-docker
  • apex==0.1
  • numpy==1.16.3
  • torch==1.1.0
  • tqdm==4.31.1
  • librosa==0.7.0
  • scipy==1.3.1

Instructions

  1. Clone deepspeech.pytorch and check out commit e73ccf6; this was the stable commit used in all our experiments.
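For example, assuming the upstream deepspeech.pytorch repository URL (substitute your own fork if you use one):
git clone https://github.com/SeanNaren/deepspeech.pytorch.git
cd deepspeech.pytorch
git checkout e73ccf6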
  2. Build the Docker image from the Dockerfile provided in this directory, then run it via the bash entrypoint, using the commands below. This Dockerfile should be the same as the one present in your deepspeech.pytorch folder; the instructions in that folder's README.md have been modified.
sudo docker build -t deepspeech2.docker .
sudo docker run -ti --gpus all -v `pwd`/data:/workspace/data --entrypoint=/bin/bash --net=host --ipc=host deepspeech2.docker
  3. Install all the requirements using pip install -r requirements.txt.
  4. Clone this repository inside the docker container in the directory /workspace/ and install the other requirements.
  5. Download the Mozilla Common Voice dataset and the TIMIT dataset used in the experiments, and optionally the LibriSpeech dataset, which is used only for training.
  6. Preparing manifests: deepspeech.pytorch expects data to be listed in .csv files called manifests, with two columns: the path to a .wav file and the path to the corresponding .txt file. The .wav file is the speech clip, and the .txt file contains the transcript in upper case. For LibriSpeech, use data/librispeech.py in deepspeech.pytorch. For the other datasets, use the provided files DeepSpeech/data/make_{MCV,timit}_manifest.py. The file corresponding to TIMIT works on the original folder structure, whereas for MCV you need to provide a .txt file with entries of the format file.mp3 : reference text. (A minimal manifest-writing sketch follows this list.)
  7. The additional and/or modified files can be found in DeepSpeech/, along with our trained model and the Language Model (LM) we used, in DeepSpeech/models.
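A minimal sketch of writing such a manifest (the directory layout below is hypothetical; the dataset-specific scripts above handle the real conversions):

import csv
from pathlib import Path

# Hypothetical layout: each speech clip has a .wav file and a matching
# .txt transcript (upper-case text) next to it.
data_dir = Path("/workspace/data/timit_processed")

with open("timit_manifest.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for wav_path in sorted(data_dir.rglob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if txt_path.exists():
            # Each manifest row: path to .wav file, path to .txt file.
            writer.writerow([str(wav_path), str(txt_path)])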

Reproducing Experiment Results

  • Section 2.1, Table 1: This was obtained by testing the model using the following command with the appropriate accent manifest:
cd deepspeech.pytorch/
python test.py --model-path ../Deepspeech/models/deepspeech_final.pth --test-manifest {accent manifest}.csv --cuda --decoder beam --alpha 2 --beta 0.4 --beam-width 128 --lm-path ../Deepspeech/models/4-gram.arpa
  • Section 3.1, Attribution Analysis: Code for all experiments in this section can be found in AttrbutionAnalysis.ipynb. The main requirements for this notebook are the gradient attributions calculated using Deepspeech/test_attr.py and the frame-level alignments, which can be derived from the time-level alignments (in seconds) produced by gentle, along with accent labels and reference transcripts. The paper contains attribution maps for the sentence 'The burning fire had been extinguished.'; the audio files for the various accents can be found in the folder audioFiles. (A small alignment-conversion sketch follows.)
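A minimal sketch of that conversion, assuming a 10 ms window stride (the deepspeech.pytorch default of 0.01 s) and a hypothetical (label, start, end) alignment format:

# Convert alignments given in seconds (e.g. from gentle) to frame indices,
# assuming a 10 ms window stride (the deepspeech.pytorch default).
WINDOW_STRIDE_S = 0.01

def to_frame_alignments(segments):
    # segments: list of (label, start_s, end_s) tuples.
    # Returns a list of (label, start_frame, end_frame) tuples.
    frame_segments = []
    for label, start_s, end_s in segments:
        start_frame = int(round(start_s / WINDOW_STRIDE_S))
        end_frame = int(round(end_s / WINDOW_STRIDE_S))
        frame_segments.append((label, start_frame, end_frame))
    return frame_segments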

  • Section 3.2, Information Mixing Analysis: Data points for the figures showing phone focus and neighbour analysis can be found in Contribution.ipynb. Deepspeech/test_contr.py is used to calculate the gradient contributions given by equation (1) of the paper. (A toy illustration of the idea follows.)
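As a toy illustration of the general idea behind gradient contributions (a minimal sketch with a stand-in model, not the paper's exact equation (1)):

import torch
import torch.nn as nn

# Stand-in model: a 1-D convolution over (batch, features, frames);
# the real experiments use DeepSpeech2 representations.
torch.manual_seed(0)
model = nn.Conv1d(in_channels=16, out_channels=8, kernel_size=11, padding=5)

x = torch.randn(1, 16, 100, requires_grad=True)  # (batch, features, frames)
h = model(x)                                     # hidden representation

t = 50                          # hidden frame of interest
h[0, :, t].norm().backward()    # backpropagate its norm to the input

# Contribution of each input frame to hidden frame t, measured as the
# gradient norm over the feature dimension (a simple proxy measure).
contributions = x.grad[0].norm(dim=0)  # shape: (frames,)
print(contributions.argmax().item())   # input frames near t dominate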

  • Section 4, Mutual Information Experiments: Data for all experiments involving mutual information can be generated using MI.ipynb, which uses averaged phone representations. These can be generated from frame-level alignments by averaging all the consecutive frames corresponding to a particular phone, as in the sketch below.
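A minimal sketch of that averaging, assuming frame-level representations as a (num_frames, dim) array and frame-level phone alignments (both formats are assumptions):

import numpy as np

def average_phone_representations(frames, phone_segments):
    # frames: (num_frames, dim) array of frame-level representations.
    # phone_segments: list of (phone, start_frame, end_frame) tuples.
    phones, averaged = [], []
    for phone, start_frame, end_frame in phone_segments:
        segment = frames[start_frame:end_frame]
        if len(segment) > 0:
            phones.append(phone)
            # Average all consecutive frames belonging to this phone.
            averaged.append(segment.mean(axis=0))
    return phones, np.stack(averaged)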

  • Section 5, Classifier-driven Analysis: All the code files relevant to the accent probes/classifiers and phone probes/classifiers can be found in the folders AccentProbe/ and PhoneProbes/ respectively. These probes are trained on entire representations and on frame-level (and averaged) representations, respectively. (A rough probe sketch follows.)
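As a rough illustration of a probe classifier (a minimal sketch, not the repository's actual probe architecture or training setup; all dimensions and data below are placeholders):

import torch
import torch.nn as nn

# A linear probe: predict a label (e.g. accent or phone) from a fixed
# representation; probe accuracy indicates how much label information
# the representation encodes.
dim, num_classes = 256, 10
probe = nn.Linear(dim, num_classes)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random stand-in data; in practice, use representations extracted from
# the ASR model and the corresponding accent/phone labels.
reps = torch.randn(512, dim)
labels = torch.randint(0, num_classes, (512,))

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(probe(reps), labels)
    loss.backward()
    optimizer.step()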

Citation

If you use this code in your work, please consider citing our paper:

@inproceedings{prasad-jyothi-2020-accents,
    title = "How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems",
    author = "Prasad, Archiki  and
      Jyothi, Preethi",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.345",
    pages = "3739--3753"}
    

Acknowledgements

This project uses code from deepspeech.pytorch.
