gender_recognition_svm

Summary

gender_recognition_svm is a Python script that detects the gender of a speaker in an audio recording using a Support Vector Machine (SVM). It was trained on the Gender Recognition by Voice data set from Kaggle and uses PyAudio and Praat scripting to record audio segments and extract features from them. It reaches approximately 96% accuracy.

Running Script

$ python ./train_svm.py

Motivation

Speech processing is a rapidly growing area of machine learning. Although speech recognition has become almost ubiquitous in the technologies we use, it is seldom used for more than transcribing speech to text. Digital phones introduced speech recognition to bridge the gap between humans and computers, but this has not produced artificial intelligence that understands more than the words we say. For truly seamless human-computer interaction, devices must understand more than what humans tell them explicitly; they need a kind of clairvoyance. The prime motivation of this project is to produce a classifier that can detect the gender of a speaker. Possible real-world applications of this project are:

  • using it with existing AI technology to learn more about the user
  • classifying the user's gender as a biometric that can be used to detect fraud

Implementation

Through exploratory data analysis, I decided that the appropriate initial model would be a binary classifier implemented with an SVM. The following sections expand on the process used to implement the machine learning model.

Training and Testing

The idea for a gender classification model came while I was exploring data sets for possible projects. Having worked in signal and speech processing before, I was familiar with machine learning in speech analysis. The biggest problem I had found was that labeled speech data was hard to come by, and unlabeled speech data often took hours to label. What's more, extracting features from audio in Python (my preference when prototyping ML models) was an even bigger pain because of the lack of audio processing libraries. When I came upon the Gender Recognition Data Set on Kaggle (https://www.kaggle.com/primaryobjects/voicegender), I saw an opportunity to skip the boring, time-consuming preprocessing tasks and go directly to implementing ML. While Kaggle does have examples of people implementing ML for gender recognition, I wanted to see how accurately a model trained on the data set would work on real-life audio recordings. I grew up with a rather feminine voice that puberty made substantially lower, so having a trained ML model identify me as male would be instant gratification.

Exploratory Data Analysis

As a novice in the field of data science, I always like to visualize the data before doing any sort of manipulation. When implementing a classifier, I like to look at the distribution of the classes first. Visualizing the data shows an equal distribution of male and female points.

[Figure: data distribution]

Because the classes are balanced, a metric such as accuracy is a valid measure of the model's performance.
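
A quick balance check along these lines can back up that choice. This is a minimal sketch using only the standard library; the `labels` list is a made-up stand-in for the label column of the data set, not the real data, and `class_balance` is a hypothetical helper.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the samples as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical stand-in for the data set's label column.
labels = ["male", "female"] * 5

shares = class_balance(labels)
print(shares)
```

If one class made up most of the samples, accuracy alone could look high for a model that always predicts that class, and a metric such as precision or recall would be a safer choice.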

Feature Extraction

Given the lack of audio processing tools accessible in Python, there were only a handful of features I could extract from real audio recordings. To determine the most appropriate features, I turned to academia. In a recent paper out of Columbia University (http://www.cs.columbia.edu/speech/PaperFiles/2016/levitan_prosody16.pdf), researchers made significant progress in gender classification using the fundamental frequency and MFCCs. Luckily, the data set contained the mean, minimum, and maximum fundamental frequency of each recording. With a candidate set of features in hand, I then visualized the data as a function of these features.

[Figure: Features]

Here we can see a clear separation between male and female data points. Simply running a line through the middle would segment the data and give pretty good accuracy.
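
To make "running a line through the middle" concrete for a single feature, here is a minimal sketch of a threshold rule on mean fundamental frequency. Everything in it is illustrative: the sample values are invented, the kHz unit is an assumption, and the 0.14 threshold is hypothetical rather than taken from the project.

```python
def predict_gender(mean_f0_khz, threshold_khz=0.14):
    """Label a sample 'female' above the threshold and 'male' at or below it."""
    return "female" if mean_f0_khz > threshold_khz else "male"


def accuracy(samples, threshold_khz=0.14):
    """Fraction of (mean_f0_khz, label) pairs the threshold rule gets right."""
    hits = sum(predict_gender(f0, threshold_khz) == label for f0, label in samples)
    return hits / len(samples)


# Invented samples: (mean fundamental frequency in kHz, true label).
samples = [
    (0.10, "male"), (0.12, "male"), (0.13, "male"), (0.15, "male"),
    (0.16, "female"), (0.18, "female"), (0.19, "female"), (0.13, "female"),
]

print(accuracy(samples))
```

A linear SVM generalises this idea: instead of a hand-picked cut on one feature, it learns the separating boundary across all features at once.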

Model Selection

Given my familiarity with it, I chose an SVM binary classifier as the primary model. A future iteration of this project could implement another algorithm and compare its accuracy.

Hyperparameter Tuning

The first parameter tested was the SVM kernel type. The kernel defines how the data is separated into distinct classes. A linear kernel segments the data with linear boundaries, while polynomial and rbf kernels segment it with non-linear boundaries. The following shows the mean accuracy of each kernel under 10-fold cross-validation on a training set composed of 60% of the data. All other hyperparameters were left at their defaults.

Kernel Method    Mean Accuracy
Linear           95.32%
rbf              94.78%
Polynomial       50.63%

As noted earlier, the data looked easy to segment with a linear boundary. The results bear this out: the linear kernel performs best of all the options.
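
A kernel comparison of this kind can be sketched with scikit-learn as follows. This is not the project's `train_svm.py`: the data is a synthetic stand-in generated with `make_classification`, so the accuracies it prints will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the training features; not the Kaggle data.
X_train, y_train = make_classification(n_samples=200, n_features=3,
                                       n_informative=3, n_redundant=0,
                                       random_state=0)

mean_accuracy = {}
for kernel in ("linear", "rbf", "poly"):
    svc = SVC(kernel=kernel)  # all other hyperparameters left at defaults
    # 10-fold cross-validation, scored by accuracy.
    scores = cross_val_score(svc, X_train, y_train, cv=10, scoring="accuracy")
    mean_accuracy[kernel] = scores.mean()

for kernel, acc in mean_accuracy.items():
    print(f"{kernel:<8} {acc:.2%}")
```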

The next value to tune is the C value of the SVM. An SVM tries to do two things: separate the points correctly and find a hyperplane that maximises the margin between the two groups. Both cannot always be achieved at the same time, and the C value specifies which of the two we prefer: a larger C penalises misclassified training points more heavily, while a smaller C favours a wider margin. In this case we simply want the C value that gives the highest accuracy.

[Figure: C values]
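
One way to search for such a C value is a cross-validated grid search. The sketch below assumes scikit-learn's `GridSearchCV`; the grid of C values is hypothetical and the data is synthetic, so the chosen value says nothing about the project's actual result.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training features; not the Kaggle data.
X_train, y_train = make_classification(n_samples=200, n_features=3,
                                       n_informative=3, n_redundant=0,
                                       random_state=0)

# Hypothetical grid: small C favours a wide margin, large C favours
# classifying every training point correctly.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(SVC(kernel="linear"), param_grid,
                      cv=10, scoring="accuracy")
search.fit(X_train, y_train)

best_c = search.best_params_["C"]
print(best_c, round(search.best_score_, 4))
```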

The final parameter to tune is the gamma value. In an SVM, gamma is the kernel coefficient of the rbf and polynomial kernels and controls how far the influence of a single training point reaches; a linear kernel does not use it. Consistent with this, we found no significant change in accuracy as gamma varied, so a gamma value of 0.00001 was chosen.

[Figure: Gamma Values]
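
A flat accuracy curve over gamma is what one would expect with a linear kernel, because in scikit-learn's `SVC` gamma only enters the rbf, polynomial, and sigmoid kernel functions. The sketch below illustrates this on synthetic data; `predictions_for` is a hypothetical helper and the two gamma values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the voice features; not the Kaggle data.
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)


def predictions_for(kernel, gamma):
    """Fit an SVC with the given kernel and gamma; return training predictions."""
    return SVC(kernel=kernel, gamma=gamma).fit(X, y).predict(X)


# Linear kernel: gamma is not part of the kernel function, so the fitted
# model is the same whatever value is passed.
linear_same = np.array_equal(predictions_for("linear", 1e-5),
                             predictions_for("linear", 10.0))

# rbf kernel: gamma scales the distance term, so the fit generally changes.
rbf_same = np.array_equal(predictions_for("rbf", 1e-5),
                          predictions_for("rbf", 10.0))

print(linear_same, rbf_same)
```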

After fine tuning the parameters, the SVM was then run on our testing set and found to have an accuracy of 95.66%.
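
A held-out evaluation of this kind can be sketched as below, assuming a 60/40 split to match the training fraction mentioned earlier. The data is synthetic and the `C=1.0` setting is a placeholder, so the printed accuracy is unrelated to the figure reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the voice features; not the Kaggle data.
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Hold out 40% of the samples; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Placeholder hyperparameters standing in for the tuned values.
model = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"{test_accuracy:.2%}")
```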

Results

Despite being trained on a publicly available training set, the model performs very well when guessing the gender of speakers in actual recordings. I tested the model on 8 genuine recordings of the utterance "hello" by two different speakers (1 male, 1 female). The SVM correctly identified the gender in all 8 tests. The SVM begins to show faults when gender is impersonated. On recordings where a female pretended to sound like a male, the model consistently recognized the correct gender. However, when males pretended to sound like females, the SVM was not as consistent. Further testing must be done to give appropriate quantitative values. Likewise, the SVM fails to detect the gender of a child, most likely because the training data set only contained points from adults. A more diverse data set could solve this problem.

gender_recognition_svm's People

Contributors

sharansankar


gender_recognition_svm's Issues

IndexError: too many indices for array

Traceback (most recent call last):
  File "./train_svm.py", line 134, in <module>
    tuned_svm = parameter_tuning_svm(df)
  File "./train_svm.py", line 42, in parameter_tuning_svm
    scores = cross_val_score(svc, training, training_result, cv=10, scoring='accuracy')
  File "/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1572, in cross_val_score
    cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
  File "/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1835, in check_cv
    cv = StratifiedKFold(y, cv)
  File "/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 570, in __init__
    label_test_folds = test_folds[y == label]
IndexError: too many indices for array
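
One possible cause, offered as an assumption because the relevant part of `train_svm.py` is not shown here, is that the labels reach `cross_val_score` as a two-dimensional column of shape (n, 1) rather than a flat array. The failing line then indexes a one-dimensional array with a two-dimensional mask. The sketch below reproduces that indexing pattern with NumPy alone and shows that flattening the labels avoids it; `flatten_labels` and the sample labels are hypothetical.

```python
import numpy as np


def flatten_labels(y):
    """Return the labels as a 1-D array, whatever column shape they arrived in."""
    return np.asarray(y).ravel()


# Hypothetical labels stored as a column vector of shape (4, 1).
y_column = np.array([["male"], ["female"], ["male"], ["female"]])
test_folds = np.zeros(len(y_column), dtype=int)

try:
    # Same indexing pattern as the last frame of the traceback:
    # a 1-D array indexed with a 2-D boolean mask.
    test_folds[y_column == "male"]
    reproduced = False
except IndexError:
    reproduced = True

# With flat labels the mask is 1-D and the indexing succeeds.
y_flat = flatten_labels(y_column)
selected = test_folds[y_flat == "male"]

print(reproduced, y_flat.shape, selected.shape)
```

If this is indeed the cause, passing the flattened labels as the third argument of `cross_val_score` would be the corresponding change in the script.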
