
ts-asr's Introduction

Target Speaker Automatic Speech Recognition

Python version: 3.6 | 3.7 | 3.8 | 3.9 | 3.10 | 3.11

This SpeechBrain recipe includes scripts to train end-to-end, transducer-based target speaker automatic speech recognition (TS-ASR) systems, as proposed in "Streaming Target-Speaker ASR with Neural Transducer".


⚡ Datasets

LibriSpeechMix

Generate the LibriSpeechMix data in <path-to-data-folder> following the official readme.
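
Once the data has been generated, a quick sanity check of the manifests can catch truncated or malformed files before a long training run. The sketch below assumes the manifests are JSON Lines files (one JSON object per line, as the `.jsonl` extension suggests); `load_manifest` is a hypothetical helper, not part of this repository, and it makes no assumption about the fields stored in each entry.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Read a JSON Lines manifest and return its entries as a list of dicts.

    Hypothetical helper for illustration only: it assumes one JSON object
    per non-empty line and does not inspect the fields of each entry.
    """
    entries = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError as e:
                # Report the offending line so a truncated file is easy to spot
                raise ValueError(
                    "{}:{}: invalid JSON ({})".format(path, line_number, e)
                ) from e
    return entries
```

For example, `len(load_manifest("<path-to-data-folder>/<manifest>.jsonl"))` gives the number of entries in a manifest, which can be compared against the count expected from the official readme.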


🛠️️ Installation

Clone the repository, navigate to <path-to-repository>, open a terminal and run:

pip install -e vendor/speechbrain
pip install -r requirements.txt

▶️ Quickstart

Navigate to <path-to-repository>, open a terminal and run:

python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>

To use multiple GPUs on the same node, run:

python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch

To use multiple GPUs on multiple nodes, run the following on each node, setting <node-rank> to that node's rank (0, ..., <num-nodes> - 1):

python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <rank-0-ip-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
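
When launching on several nodes, only <node-rank> differs between the per-node commands, so it can be convenient to generate them programmatically. The following is a minimal sketch that assembles the argument lists shown above; `build_launch_command` and `format_command` are hypothetical helpers, not part of this repository, and they use only the flags that appear in the commands above.

```python
import shlex


def build_launch_command(script, hparams, data_folder, num_gpus_per_node,
                         num_nodes=1, node_rank=0,
                         master_addr=None, master_port=5555):
    """Return the launch command for one node as a list of arguments.

    Hypothetical helper: it mirrors the single-node and multi-node
    commands from this README and adds no other flags.
    """
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes - 1]")
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(num_gpus_per_node),
    ]
    if num_nodes > 1:
        if master_addr is None:
            raise ValueError("master_addr is required for multi-node runs")
        cmd += [
            "--nnodes={}".format(num_nodes),
            "--node_rank={}".format(node_rank),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
        ]
    cmd += [script, hparams, "--data_folder", data_folder,
            "--distributed_launch"]
    return cmd


def format_command(cmd):
    """Join an argument list into a shell-safe, copy-pasteable string."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

Calling `build_launch_command` once per rank and printing `format_command(...)` yields one command per node; the resulting list can also be passed directly to `subprocess.run` on the corresponding machine.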

Helper functions and scripts for plotting and analyzing the results can be found in utils.py and tools.

NOTE: the vendored version of SpeechBrain inside this repository includes several hotfixes (e.g. for distributed training, gradient clipping, gradient accumulation, and causality) and additional features (e.g. distributed evaluation).

Examples

nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &

📧 Contact

[email protected]



ts-asr's Issues

train-2mix split

Where can I find train-2mix (specifically train-2mix.jsonl)? The LibriSpeechMix repository contains only dev-clean-2mix and test-clean-2mix.

How to implement TSE + RNNT

Hi Luca,
Thank you very much for your open source project; it helps me a lot.
(1) I read the reference paper. How should the cascade system (TSE + RNNT) mentioned in the paper be implemented?
(2) How can your scripts be turned into a TSE project that takes single-channel mixed audio as input and outputs the target audio?
Thanks!
