Target Speaker Automatic Speech Recognition

This SpeechBrain recipe includes scripts to train end-to-end, transducer-based target-speaker automatic speech recognition (TS-ASR) systems, as proposed in "Streaming Target-Speaker ASR with Neural Transducer".

⚡ Datasets

LibriSpeechMix

Generate the LibriSpeechMix data in <path-to-data-folder> by following the official LibriSpeechMix README.

🛠️️ Installation

Clone the repository, navigate to <path-to-repository>, open a terminal and run:

pip install -e vendor/speechbrain
pip install -r requirements.txt

▶️ Quickstart

Navigate to <path-to-repository>, open a terminal and run:

python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>

To use multiple GPUs on the same node, run:

python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch

To use multiple GPUs across multiple nodes, run the following on every node, setting <node-rank> to that node's rank (0, ..., <num-nodes> - 1):

python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <rank-0-ip-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
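When launching on several nodes, the only argument that differs between nodes is the node rank, so it can help to generate the per-node commands programmatically. The following is a minimal sketch, not part of this repository: `build_launch_command` is a hypothetical helper that only assembles the same flags shown in the commands above, and the master address `10.0.0.1` is a placeholder.

```python
import shlex


def build_launch_command(script, hparams, data_folder, gpus_per_node,
                         num_nodes=1, node_rank=0,
                         master_addr=None, master_port=5555):
    """Assemble the launch command for one node as a list of arguments.

    Hypothetical helper: it mirrors the flags used in the commands above
    and does not call into SpeechBrain or PyTorch.
    """
    if not 0 <= node_rank < num_nodes:
        raise ValueError(f"node_rank must be in [0, {num_nodes - 1}]")

    cmd = ["python", "-m", "torch.distributed.launch",
           f"--nproc_per_node={gpus_per_node}"]

    # The multi-node flags are only needed when more than one node is used.
    if num_nodes > 1:
        if master_addr is None:
            raise ValueError("master_addr is required for multi-node runs")
        cmd += [f"--nnodes={num_nodes}", f"--node_rank={node_rank}",
                "--master_addr", master_addr,
                "--master_port", str(master_port)]

    cmd += [script, hparams,
            "--data_folder", data_folder,
            "--distributed_launch"]
    return cmd


# Print the command to run on each of two nodes (placeholder master address).
for rank in range(2):
    argv = build_launch_command(
        "train_librispeechmix_scratch.py",
        "hparams/LibriSpeechMix/conformer-t_scratch.yaml",
        "datasets/LibriSpeechMix",
        gpus_per_node=8,
        num_nodes=2,
        node_rank=rank,
        master_addr="10.0.0.1",
    )
    print(shlex.join(argv))
```

Each printed line can be pasted into a terminal on the corresponding node; with `num_nodes=1` the helper falls back to the single-node form of the command.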

Helper functions and scripts for plotting and analyzing the results can be found in utils.py and tools.

NOTE: the vendored version of SpeechBrain inside this repository includes several hotfixes (e.g. for distributed training, gradient clipping, gradient accumulation, and causality) as well as additional features (e.g. distributed evaluation).

Examples

nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &

📧 Contact

[email protected]

GitHub repository: lucadellalib/ts-asr