wav2vec-toolkit's Introduction

wav2vec-toolkit

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link]

How to contribute

(Mostly identical to the huggingface/datasets contributing guide)

  1. Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

  2. Clone your fork to your local disk, and add the base repository as a remote:

    git clone git@github.com:<your GitHub handle>/wav2vec-toolkit.git
    cd wav2vec-toolkit
    git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
  3. Set up a development environment by running the following command in a virtual environment:

    conda create -n env python=3.7 -y
    conda activate env
    pip install -e ".[dev]"
    pip install -r languages/{YOUR_SPECIFIC_LANGUAGE}/requirements.txt

    (If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)

  4. Create a new branch to hold your development changes:

    git checkout -b a-descriptive-name-for-my-changes

    Do not work on the main branch.

  5. Develop the features on your branch.

    1. Adding a new language here
  6. Format your code. Run black and isort on your newly added files so that they are formatted consistently with the rest of the repository:

    black --line-length 119 --target-version py36 src scripts languages
    isort src scripts languages
  7. Once you're happy with your implementation, add your changes and make a commit to record your changes locally:

    git add .
    git commit

    It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:

    git fetch upstream
    git rebase upstream/main

    Push the changes to your account using:

    git push -u origin a-descriptive-name-for-my-changes
  8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.

wav2vec-toolkit's Issues

Regarding the Code Organization

Recently, I came across Facebook AI's MMF repository, which has a registry feature that I find really cool. Since then, I have followed the same code organization in my own projects. You can find one of them at https://github.com/gchhablani/toxic-spans-detection

Basically, there is one registry (I call it configmapper). Every run has a YAML config, and the script uses the registry together with that config to look up and load the objects/classes it needs.

I'm not sure if we can do the same here, but some basic organization for experiments could follow a similar structure. Let me know your thoughts on this.
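
A minimal sketch of the registry idea described above may make the proposal more concrete. It is not the actual API of MMF or of configmapper: the `Registry` class, its `register` and `build` methods, and `LowercasePreprocessor` are hypothetical names chosen for illustration, and a plain dict stands in for the parsed YAML config so that the example runs with the standard library only.

```python
from typing import Any, Callable, Dict


class Registry:
    """Maps (kind, name) string pairs to classes so a config can refer to them by name."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, type]] = {}

    def register(self, kind: str, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under (kind, name)."""

        def decorator(cls: type) -> type:
            self._entries.setdefault(kind, {})[name] = cls
            return cls

        return decorator

    def get(self, kind: str, name: str) -> type:
        """Look up a registered class, failing with a readable message if absent."""
        try:
            return self._entries[kind][name]
        except KeyError:
            raise KeyError(f"nothing registered as {kind}/{name}") from None

    def build(self, kind: str, config: Dict[str, Any]) -> Any:
        """Instantiate the class named by config['name'] with config['params']."""
        cls = self.get(kind, config["name"])
        return cls(**config.get("params", {}))


# Hypothetical module-level registry shared by the whole project.
registry = Registry()


@registry.register("preprocessor", "lowercase")
class LowercasePreprocessor:
    """Toy component; a real project would register datasets, models, and so on."""

    def __init__(self, strip: bool = True) -> None:
        self.strip = strip

    def __call__(self, text: str) -> str:
        text = text.lower()
        return text.strip() if self.strip else text


# Stand-in for a parsed YAML run config (the dict a YAML loader would return).
run_config = {"preprocessor": {"name": "lowercase", "params": {"strip": True}}}

preprocessor = registry.build("preprocessor", run_config["preprocessor"])
print(preprocessor("  Hello World  "))  # prints "hello world"
```

With this layout, switching components for an experiment means editing the config rather than the script, which is the main appeal of the pattern.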
