MR-RawNet

This repository contains the official PyTorch implementation and pre-trained models for the following paper:

  • Title : MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms
  • Authors : Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo and Ha-Jin Yu
  • Comments : This paper has been accepted at Interspeech 2024 and is available here.

Abstract

In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw waveforms. The MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. The experimental results, conducted on VoxCeleb1 dataset, demonstrate that the MR-RawNet exhibits superior performance in handling utterances of variable duration compared to other raw waveform-based systems.

Our experimental code is based on voxceleb_trainer, and we referenced the baseline code here.

Data

The VoxCeleb datasets were used for training and testing.

The train list should contain the identity and the file path, one line per utterance, as follows:

id00000 id00000/youtube_key/12345.wav
id00012 id00012/21Uxsk56VDQ/00001.wav
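
As an illustration of this format, here is a minimal sketch that groups utterance paths by speaker identity. It is not taken from the repository: the function name `parse_train_list` and the `data_root` value are hypothetical, and the repository's own data loader may read the list differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_train_list(lines, data_root=""):
    """Group utterance paths by speaker ID.

    Each non-empty line is expected to hold "<speaker_id> <relative_path>".
    """
    utterances = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<id> <path>', got {line!r}")
        speaker_id, rel_path = parts
        # Prefix the dataset root (hypothetical) to get a usable path.
        utterances[speaker_id].append(str(PurePosixPath(data_root) / rel_path))
    return dict(utterances)


example = [
    "id00000 id00000/youtube_key/12345.wav",
    "id00012 id00012/21Uxsk56VDQ/00001.wav",
]
speakers = parse_train_list(example, data_root="data/voxceleb2")
print(sorted(speakers))
```

To check a real list, pass the open file object instead of `example`; a malformed line raises a `ValueError` that names the line number.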

The train list for VoxCeleb2 can be downloaded from here, and the test lists for VoxCeleb1 can be downloaded from here.

For data augmentation, the following script can be used to download and prepare the required data.

python3 ./dataprep.py --save_path data --augment

We also performed an out-of-domain evaluation using the VOiCES development set.

Each dataset must be downloaded in advance for training and testing, and its path must be mapped to the docker environment.

Environment

We ran our experiments in the NVIDIA GPU Cloud Docker image nvcr.io/nvidia/pytorch:23.07-py3.

Build the Docker image and start the container:

./docker/build.sh
./docker/run.sh

Note that you need to modify the path mapping in 'run.sh' before running it.

Training

  • MR-RawNet on a single GPU
python3 ./trainSpeakerNet.py --config ./configs/MR_RawNet.yaml
  • MR-RawNet on multiple GPUs
CUDA_VISIBLE_DEVICES=0,1 python3 ./trainSpeakerNet.py --config ./configs/MR_RawNet.yaml --distributed

Use the --distributed flag to enable distributed training. If you are running more than one distributed training session at the same time, you need to change the --port argument.

Note that the configuration file overrides the arguments passed via command line.
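
The precedence described above can be sketched as follows. This is an assumed illustration, not the repository's code: the `merge_config` helper and the `--batch_size` and `--lr` options are hypothetical, and a plain dictionary stands in for the parsed YAML file.

```python
import argparse


def merge_config(args, config):
    """Overwrite parsed command-line values with values from the config."""
    for key, value in config.items():
        if not hasattr(args, key):
            raise KeyError(f"unknown option in config: {key}")
        setattr(args, key, value)
    return args


parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=100)  # hypothetical option
parser.add_argument("--lr", type=float, default=0.001)      # hypothetical option

# Simulates a command line that passes: --batch_size 64 --lr 0.01
args = parser.parse_args(["--batch_size", "64", "--lr", "0.01"])

# Stand-in for the parsed YAML file; only batch_size is set there.
config = {"batch_size": 200}

args = merge_config(args, config)
# batch_size now comes from the config, lr from the command line.
print(args.batch_size, args.lr)
```

In practice this means a value given on the command line only takes effect when the same key is absent from the YAML file.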

Test

The following script should return: EER 0.8294.

python3 ./trainSpeakerNet.py --eval --config ./configs/MR_RawNet.yaml --initial_model MR_RawNet.pt
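
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a simple threshold sweep in plain Python, written for illustration only. It assumes higher scores mean "same speaker", returns the value in percent, and does not treat tied scores specially; the repository's evaluation code may compute the EER differently (for example with interpolation), so small numerical differences are possible.

```python
def compute_eer(scores, labels):
    """Return the equal error rate in percent.

    scores: similarity scores, higher means "same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Sweep the threshold from high to low, accepting one more trial per step.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    accepted_targets = 0
    accepted_nontargets = 0
    best_gap, eer = 1.0, 0.5  # threshold above every score: FAR = 0, FRR = 1

    for _, label in ranked:
        if label == 1:
            accepted_targets += 1
        else:
            accepted_nontargets += 1
        far = accepted_nontargets / n_nontarget    # false acceptance rate
        frr = 1.0 - accepted_targets / n_target    # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return 100.0 * eer


# Toy trial list: one non-target trial scores above one target trial.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(compute_eer(scores, labels))
```

On the toy trial list, one of four target trials and one of four non-target trials fall on the wrong side of the best threshold, so both error rates are 25%.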

Citation

Please cite the following if you make use of the code.

@article{kim2024mrrawnet,
  title={MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms},
  author={Kim, Seung-bin and Lim, Chan-yeong and Heo, Jungwoo and Kim, Ju-ho and Shin, Hyun-seo and Koo, Kyo-Won and Yu, Ha-Jin},
  journal={arXiv preprint arXiv:2406.07103},
  year={2024}
}

@inproceedings{kim2024mrrawnet,
  title={MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms},
  author={Kim, Seung-bin and Lim, Chan-yeong and Heo, Jungwoo and Kim, Ju-ho and Shin, Hyun-seo and Koo, Kyo-Won and Yu, Ha-Jin},
  booktitle={Proc. Interspeech},
  year={2024}
}
