Explainable Speaker Recognition

License: MIT License

forensics automatic-voice-comparison forensic-speaker-recognition interpretability-and-explainability resnet34 x-vector-pytorch likelihood-ratio speaker-recognition

BA-LR: Toward an interpretable and explainable approach for automatic Speaker Recognition

How to install?

To install BA-LR, do the following:

  1. Use a conda environment.
  2. Clone the repository:
git clone https://github.com/Imenbaa/BA-LR.git
  3. Install the requirements from the repository root:
pip install -r requirements.txt

BA-vectors extractor

The extractor is trained on the VoxCeleb2 dataset (https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html). It consists of a ResNet generator of speech representations, optimised for the speaker classification task. After training, the generator produces sparse representations whose components are either 0 or a non-zero value x. Replacing every x with 1 yields the binary representation. The trained generator parameters are in model/voxceleb_BA.

Generator

Filterbanks -> ResNet generator -> embedding -> Softplus layer -> Sparse representation

Speaker Classifier

Sparse representation -> classifier (i.e. NN projected to num_classes with Softmax) -> class prediction

BA-Vector

Sparse representation -> BA-vectors
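
The binarisation step described above can be sketched as follows. This is a minimal illustration, not the code used by extract.py: the function name `binarize` is hypothetical, and it assumes that inactive components are exactly 0 in the sparse representation.

```python
import numpy as np

def binarize(sparse_repr):
    """Map a sparse representation (0 or x) to a binary BA-vector (0 or 1).

    Assumption: inactive attributes are stored as exact zeros.
    """
    sparse_repr = np.asarray(sparse_repr)
    # Every non-zero component x becomes 1, zeros stay 0.
    return (sparse_repr != 0).astype(np.uint8)

# Toy 4-dimensional representation (real BA-vectors are much larger).
sparse = np.array([0.0, 2.3, 0.0, 0.7])
ba_vector = binarize(sparse)
print(ba_vector)
```

If the extracted values turn out to be small but non-zero, a threshold would be needed instead of the exact comparison with 0.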

To extract the trained representations, do the following:

cd extractor
[TRAIN BAvectors]
python extract.py -m /model/voxceleb_BA/model_dir --checkpoint 2100 -d [WAV_FILES_TRAIN] -f "txt"
[TEST BAvectors]
python extract.py -m /model/voxceleb_BA/model_dir --checkpoint 2100 -d [WAV_FILES_TEST] -f "txt"

BA behavioral parameters

The behavioral parameters of each BA, such as the typicality (typ) and the dropout (dout), are computed from the training data.

python BA_params.py --path [TRAIN_DATA]/BAvectors.txt  --typ_path data/typ_BA_soft.txt --dout_path data/dropout_soft.txt
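
To give an idea of what such parameters can look like, here is a rough sketch. It is not the content of BA_params.py, and the definitions are assumptions made for illustration: a speaker is taken to "own" a BA if it is active in at least one of their utterances, typicality is the fraction of speaker pairs that both own the BA, and dropout is the fraction of utterances, among owning speakers, in which the BA is absent. The function name `ba_params` and its inputs are hypothetical.

```python
import numpy as np

def ba_params(ba_vectors, speaker_ids):
    """Estimate per-BA typicality and dropout (assumed definitions).

    ba_vectors : (n_utterances, n_ba) binary array
    speaker_ids: length n_utterances, one speaker label per utterance
    Requires at least two distinct speakers.
    """
    X = np.asarray(ba_vectors, dtype=bool)
    ids = np.asarray(speaker_ids)
    speakers = np.unique(ids)

    # Speaker profile: BA is owned if active in at least one utterance.
    profiles = np.stack([X[ids == s].any(axis=0) for s in speakers])

    # Typicality: share of speaker pairs in which both speakers own the BA.
    n_spk = len(speakers)
    owners = profiles.sum(axis=0)
    typ = (owners * (owners - 1) / 2) / (n_spk * (n_spk - 1) / 2)

    # Dropout: among utterances of owning speakers, share where BA is absent.
    absent = np.zeros(X.shape[1])
    count = np.zeros(X.shape[1])
    for s, prof in zip(speakers, profiles):
        utts = X[ids == s]
        absent += ((~utts) & prof).sum(axis=0)
        count += prof * len(utts)
    dout = np.divide(absent, count, out=np.zeros_like(absent), where=count > 0)
    return typ, dout

# Toy data: 2 speakers, 2 utterances each, 3 BAs.
X = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 0]]
typ, dout = ba_params(X, ["a", "a", "b", "b"])
print(typ, dout)
```

In the toy data, only the first BA is owned by both speakers, and the third BA is missing from one of the two utterances of the speaker who owns it.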

The drop-in parameter, Din, accounts for noise that may occur in the data. Its value is tuned on a dedicated set of comparison pairs extracted from the training corpus by minimizing the actual log-likelihood-ratio cost, Cllr. The optimised drop-in value is 0.50, which gives Cllrmin/act = 0.13/0.16 and EER = 2.8.
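
For reference, the Cllr criterion can be computed with the standard formula, as in the sketch below. This is not taken from the repository: the function name `cllr` is hypothetical, and the inputs are assumed to be natural-log LRs for target and non-target pairs.

```python
import numpy as np

def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost (Cllr), in bits.

    Inputs are natural-log LRs (assumption) for target and non-target pairs.
    Cllr = 0.5 * ( mean_tar log2(1 + 1/LR) + mean_non log2(1 + LR) )
    """
    tar = np.asarray(target_llrs, dtype=float)
    non = np.asarray(nontarget_llrs, dtype=float)
    # logaddexp(0, z) = log(1 + exp(z)), computed in a numerically stable way.
    cost_tar = np.mean(np.logaddexp(0.0, -tar)) / np.log(2)
    cost_non = np.mean(np.logaddexp(0.0, non)) / np.log(2)
    return 0.5 * (cost_tar + cost_non)

# An uninformative system (LR = 1 everywhere) has a Cllr of 1 bit.
print(cllr([0.0, 0.0], [0.0, 0.0]))
```

Tuning the drop-in would then amount to evaluating this cost for several candidate values of Din on the tuning pairs and keeping the value with the lowest Cllr.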

LR Framework

Partial LRs

Global LR

Interpretability & Explainability

The interpretability of the BA-LR approach is illustrated by two aspects: first, the characterisation of each attribute in terms of discriminatory power and reliability; second, the impact of an attribute's behavior on its contribution to the global LR value. For instance, if the behavior of an attribute is highly speaker-discriminating and the attribute is trustworthy, its contribution to the final LR is among the most important, informative and reliable.

Taking a target and a non-target voice pair, the contribution of the different BAs to the final LR decision can be visualised with SHAP figures. A few important BAs drive the decision toward negative or positive values and account for the largest contributions to the LR.
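
As an illustration of per-BA contributions, the sketch below ranks attributes by how much they move the global log LR. It rests on an assumption not stated in this README: that the global LR is the product of the partial LRs, so that each BA contributes additively in the log domain. The function name `ba_contributions` and the toy values are hypothetical.

```python
import numpy as np

def ba_contributions(partial_lrs):
    """Rank BAs by their contribution to the global log10 LR.

    Assumption: global LR = product of partial LRs, so
    log10(global LR) = sum of log10(partial LRs).
    Returns (global_log_lr, [(ba_index, log10_partial_lr), ...]) sorted
    by decreasing absolute contribution.
    """
    log_lrs = np.log10(np.asarray(partial_lrs, dtype=float))
    global_log_lr = float(log_lrs.sum())
    order = np.argsort(-np.abs(log_lrs))
    ranking = [(int(i), float(log_lrs[i])) for i in order]
    return global_log_lr, ranking

# Toy example with three BAs: one supports the same-speaker hypothesis,
# one supports the different-speaker hypothesis, one is neutral.
global_llr, ranking = ba_contributions([10.0, 0.01, 1.0])
print(global_llr)
for ba_index, contribution in ranking:
    print(ba_index, contribution)
```

A positive contribution pushes the decision toward the same-speaker hypothesis and a negative one toward the different-speaker hypothesis, which is the kind of information a waterfall-style plot displays.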

To be continued...

References

For the ResNet extractor we used: https://github.com/Chaanks/stklia

If you use this code, please cite our paper, which first introduced this approach and received the best paper award. Thanks!

@inproceedings{Benamor2022,
  title={BA-LR: Binary-Attribute-based Likelihood Ratio estimation for forensic voice comparison},
  author={Imen Ben Amor and Jean-François Bonastre},
  booktitle={IEEE International Workshop on Biometrics and Forensics 2022},
  year={2022},
  organization={IEEE}
}
