hechmik / voxceleb_enrichment_age_gender

Code and data repository for paper "VoxCeleb enrichment for Age and Gender recognition" submitted at ASRU 2021

License: MIT License

VoxCeleb enrichment for Age and Gender recognition

This repository contains all the material related to the paper "VoxCeleb enrichment for Age and Gender recognition" submitted for publication at ASRU 2021. If you are mainly interested in the data, you can download the ENRICHED DATASET CSV file directly.

Arxiv Link

https://arxiv.org/abs/2109.13510

Abstract

VoxCeleb datasets are widely used in speaker recognition studies. Our work serves two purposes.

First, we provide speaker age labels and (an alternative) annotation of speaker gender. Second, we demonstrate the use of this metadata by constructing age and gender recognition models with different features and classifiers. We query different celebrity databases and apply consensus rules to derive age and gender labels. We also compare the original VoxCeleb gender labels with our labels to identify records that might be mislabeled in the original VoxCeleb data.
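The consensus step described above can be illustrated with a minimal sketch. This is not the rule set used in the paper: the function name `consensus_label`, the `min_agreement` threshold, and the tie handling are all assumptions made for illustration. It accepts a label only when enough sources agree on it and no other value is equally frequent.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def consensus_label(
    values: Sequence[Optional[Hashable]], min_agreement: int = 2
) -> Optional[Hashable]:
    """Return the most frequent non-missing value if enough sources agree.

    Hypothetical rule for illustration only; the paper's actual
    consensus rules may differ.
    """
    # Ignore sources that returned nothing for this speaker.
    present = [v for v in values if v is not None]
    if not present:
        return None

    ranked = Counter(present).most_common()
    top_value, top_count = ranked[0]

    # A tie between the two most frequent values means no consensus.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None

    # Require a minimum number of agreeing sources.
    return top_value if top_count >= min_agreement else None
```

For example, `consensus_label(["male", "male", "female"])` returns `"male"`, while `consensus_label([1970, None, 1971])` returns `None` because no two sources agree.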

On the modeling side, the lowest mean absolute error (MAE) in age regression, 9.443 years, is obtained using i-vector features with ridge regression. This indicates how challenging age estimation is on in-the-wild speech data.
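For readers unfamiliar with the metric, MAE is the average of the absolute differences between true and predicted ages, expressed in years. The sketch below computes it in plain Python; the function name and the toy ages are made up for illustration and are not taken from the dataset or the paper's code.

```python
from typing import Sequence


def mean_absolute_error(
    true_ages: Sequence[float], predicted_ages: Sequence[float]
) -> float:
    """Average absolute difference between true and predicted ages."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("Both sequences must have the same length.")
    if len(true_ages) == 0:
        raise ValueError("At least one sample is required.")

    # Sum the per-speaker absolute errors, then average them.
    total_error = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total_error / len(true_ages)


# Toy values (illustrative only): every prediction is off by 5 years.
example_mae = mean_absolute_error([30, 45, 60], [35, 40, 65])
print(example_mae)  # 5.0
```

An MAE of 9.443 therefore means that, on average, the predicted age is about nine and a half years away from the labeled age.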

Authors

Repo structure

This repository is structured as follows:

  • dataset: here the ENRICHED DATASET can be found and downloaded, as well as support files detailing which records have been used for training and testing
  • best_models: the best models reported in the paper, Linear Regression with i-vectors (age regression) and Logistic Regression with i-vectors (gender recognition), are made available so that other users can try them in a variety of scenarios (assuming that features were computed as described)
  • notebooks: Python scripts and Jupyter notebooks used throughout the various steps

Acknowledgments

This work has been partially sponsored by Academy of Finland (proj. no. 309629).

Considering the nature of the work, we would also like to cite the original VoxCeleb 1 and VoxCeleb 2 papers in this README:

[1] A. Nagrani*, J. S. Chung*, A. Zisserman, VoxCeleb: a large-scale speaker identification dataset, INTERSPEECH, 2017

[2] J. S. Chung*, A. Nagrani*, A. Zisserman, VoxCeleb2: Deep Speaker Recognition, INTERSPEECH, 2018

Similar works

This work was carried out in 2020, when the first author was affiliated with the University of Eastern Finland. The authors later came across an independent but closely related work that addresses age labeling of VoxCeleb. The key difference is that we assigned age labels based on the videos' semantics and the speakers' identities, whereas they trained a facial age estimation model for the labeling task, taking the visual frames of the original YouTube videos as input. For the reader's convenience, the paper's full reference and its GitHub repository are given below.

N. Tawara, A. Ogawa, Y. Kitagishi and H. Kamiyama, "Age-VOX-Celeb: Multi-Modal Corpus for Facial and Speech Estimation," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6963-6967, doi: 10.1109/ICASSP39728.2021.9414272.

GitHub repository: https://github.com/nttcslab-sp/agevoxceleb

Contact information

For any comments, clarifications or suggestions, please feel free to open an issue here on GitHub and/or send me an email at hechmi DOT khaled1995 AT gmail DOT com
