VoiceFilter

Dependencies

Python and packages

This code was tested on Python 3.8 with PyTorch 1.10.0. Install the other packages with:
```
pip install -r requirements.txt
```

Prepare Dataset

Download LibriSpeech dataset

Install axel first (apt install axel), then use it to download the datasets:

axel -n 10 -a -c "https://www.openslr.org/resources/12/train-clean-100.tar.gz"
axel -n 10 -a -c "https://www.openslr.org/resources/12/train-clean-360.tar.gz"
axel -n 10 -a -c "https://www.openslr.org/resources/12/dev-clean.tar.gz"
axel -n 10 -a -c "https://www.openslr.org/resources/12/test-clean.tar.gz"

Then extract the tar.gz archives into the datasets folder:

tar -xvzf train-clean-100.tar.gz
tar -xvzf train-clean-360.tar.gz
tar -xvzf dev-clean.tar.gz
tar -xvzf test-clean.tar.gz
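
The extraction step can also be scripted. The following is a minimal Python sketch, not part of this repository: the function name extract_archives and its arguments are hypothetical, and it assumes the four archives sit in one directory under the file names used in the download commands above.

```python
import tarfile
from pathlib import Path

# Archive names taken from the download commands above.
ARCHIVES = [
    "train-clean-100.tar.gz",
    "train-clean-360.tar.gz",
    "dev-clean.tar.gz",
    "test-clean.tar.gz",
]


def extract_archives(archive_dir, dest_dir, names=ARCHIVES):
    """Extract each archive found in archive_dir into dest_dir.

    Returns the names of archives that were not found, so an
    incomplete download is reported instead of silently skipped.
    (Hypothetical helper, not part of the repository.)
    """
    archive_dir, dest_dir = Path(archive_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        path = archive_dir / name
        if not path.is_file():
            missing.append(name)
            continue
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(dest_dir)
    return missing
```

Calling extract_archives(".", "datasets") would unpack whatever archives are present in the current directory and return the names of any that still need to be downloaded.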

Edit config.yaml
```
cd config
cp default.yaml config.yaml
```

Train VoiceFilter

Get pretrained model for speaker recognition system

The model can be downloaded at this GDrive link.

For convenience, download it with the gdown command (gdown is installed via pip):
```
gdown --id 1YFmhmUok-W76JkrfA0fzQt3c-ZsfiwfL
```
Run

After specifying train_dir and test_dir in config.yaml, run:
```
python train.py -c [config.yaml] -e [path of embedder pt file] -m [name] --train_set [list of datasets used to generate train data] --test_set [list of datasets used to generate test data]
```
This creates chkpt/name and logs/name in the base directory (set with the -b option; defaults to .).

To reproduce the original experiment, use the following bash command:
```
python train.py -c config.yaml -e embedder.pt -m powlaw_loss --train_set librispeech-train --test_set librispeech-test
```
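
When launching several runs from a script, the same command line can be assembled programmatically. The sketch below is illustrative only: build_train_command is a hypothetical helper, it uses only the flags documented above (-c, -e, -m, -b, --train_set, --test_set), and it assumes multiple dataset names are passed as separate space-delimited arguments.

```python
import shlex


def build_train_command(config, embedder, name, train_sets, test_sets,
                        base_dir=None):
    """Return the argv list for train.py using the flags documented above.

    (Hypothetical helper, not part of the repository.)
    """
    cmd = ["python", "train.py", "-c", config, "-e", embedder, "-m", name]
    if base_dir is not None:
        # -b sets the base directory that holds chkpt/ and logs/.
        cmd += ["-b", base_dir]
    # Assumption: each dataset name is its own command-line argument.
    cmd += ["--train_set", *train_sets]
    cmd += ["--test_set", *test_sets]
    return cmd


cmd = build_train_command(
    "config.yaml", "embedder.pt", "powlaw_loss",
    ["librispeech-train"], ["librispeech-test"],
)
# Prints the same command line as the reproduction example above.
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to start training from Python.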
Supported datasets (for implementation details, see datasets/GenerateDataset.py): librispeech-train, librispeech-test, vctk, vin, voxceleb1-train, voxceleb1-test, voxceleb2-train, zalo-train, zalo-test

View tensorboardX
```
tensorboard --logdir ./logs
```

Resuming from checkpoint

python train.py -c [config.yaml] --checkpoint_path [chkpt/name/chkpt_{step}.pt] -e [path of embedder pt file] -m [name] --train_set [list of datasets used to generate train data] --test_set [list of datasets used to generate test data]

For example, to fine-tune on the VN datasets:

python train.py -c config.yaml -e embedder.pt -m powlaw_loss_finetune --checkpoint_path chkpt/powlaw_loss/chkpt_168000.pt --train_set vin zalo-train --test_set zalo-test
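
To avoid looking up the step number by hand, the most recent checkpoint can be located from the chkpt_{step}.pt naming pattern shown above. This is a minimal sketch under the assumption that {step} is always a plain integer; latest_checkpoint is a hypothetical helper and is not provided by the repository.

```python
import re
from pathlib import Path

# File name pattern taken from the resume command: chkpt_{step}.pt
_CHKPT_RE = re.compile(r"^chkpt_(\d+)\.pt$")


def latest_checkpoint(chkpt_dir):
    """Return the checkpoint path with the highest step, or None if none match.

    (Hypothetical helper; assumes {step} is a non-negative integer.)
    """
    best_step, best_path = -1, None
    for path in Path(chkpt_dir).glob("chkpt_*.pt"):
        match = _CHKPT_RE.match(path.name)
        if match is None:
            continue  # skip files that only look similar, e.g. chkpt_best.pt
        step = int(match.group(1))
        # Compare numerically so that 168000 ranks above 9000.
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

For instance, latest_checkpoint("chkpt/powlaw_loss") would return the path to pass as --checkpoint_path.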

License

Apache License 2.0

This repository contains code adapted or copied from the following sources:

utils/adabound.py from https://github.com/Luolc/AdaBound (Apache License 2.0)
utils/audio.py from https://github.com/keithito/tacotron (MIT License)
utils/hparams.py from https://github.com/HarryVolek/PyTorch_Speaker_Verification (No License specified)
utils/normalize-resample.sh from https://unix.stackexchange.com/a/216475
