human-identification's Introduction

Classification using multiple data modalities: predicting the class of a data point from an audio-image pair.

Human identification using multi-modal models that take audio-image pairs as input, as well as single-modality models that use audio or images separately.

Raw Data

This project uses raw data from VoxCeleb, a large-scale audio-visual dataset of human speech. Following are the download links to the raw data used:

Data Selection

Data selection is done manually on the local machine, using only the Windows file manager.

  • A subset of the image data is created to match the available audio data and is named subSetFaces.
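The selection above was done by hand, but the same subset could be built with a short script. The following is only a sketch under assumptions that are not stated in the original: that both the face data and the audio data are stored with one sub-folder per person, and that the folder names match between the two trees. The function name `build_face_subset` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def build_face_subset(faces_dir, audio_dir, out_dir):
    """Copy the face folder of every person who also has an audio folder.

    Assumption: each person is one sub-folder, with the same name in
    faces_dir and audio_dir. Returns the sorted list of copied person IDs.
    """
    faces_dir, audio_dir, out_dir = Path(faces_dir), Path(audio_dir), Path(out_dir)

    # Person IDs are taken from the sub-folder names in both trees.
    audio_ids = {p.name for p in audio_dir.iterdir() if p.is_dir()}
    face_ids = {p.name for p in faces_dir.iterdir() if p.is_dir()}

    # Keep only the people for whom both modalities are available.
    selected = sorted(audio_ids & face_ids)

    out_dir.mkdir(parents=True, exist_ok=True)
    for person_id in selected:
        shutil.copytree(faces_dir / person_id, out_dir / person_id,
                        dirs_exist_ok=True)
    return selected
```

A call such as `build_face_subset("faces", "audio", "subSetFaces")` would then produce the subSetFaces folder used in the later steps. `dirs_exist_ok` requires Python 3.8 or newer.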

Pre-Processing Data

All data pre-processing is done in models.ipynb.

  1. Loading, cropping, and saving the image data from subSetFaces as a NumPy array of dtype np.uint8, with 100 images per person for 40 persons, and creating the labels.
  2. Loading the audio data and trimming it to equal length, with 100 audio files per person for 40 persons.
  3. Converting the equal-length audio files into log-frequency spectrograms and storing them as a NumPy array of dtype np.uint8.
  4. Mapping images and audio into audio-image paired data.
  5. Both images and audio are rescaled to the range 0 to 1 inside the models, in a Rescaling layer. This saves memory, because storing the data as np.uint8 (1 byte per value) is much cheaper than storing it as float64 (8 bytes per value).
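Some of the steps above can be illustrated with NumPy alone. This is a minimal sketch, not the code from models.ipynb: the helper names (`trim_to_shortest`, `to_uint8`, `pair_by_label`), the choice to trim to the shortest clip, the min-max quantization, and the index-wise pairing within each person are all assumptions made for illustration. Spectrogram computation is not shown.

```python
import numpy as np


def trim_to_shortest(clips):
    """Trim every 1-D clip to the length of the shortest one and stack them."""
    n = min(len(c) for c in clips)
    return np.stack([np.asarray(c)[:n] for c in clips])


def to_uint8(x):
    """Min-max scale a float array to 0..255 and store it as np.uint8."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros(x.shape, dtype=np.uint8)
    return np.round((x - x.min()) / span * 255).astype(np.uint8)


def pair_by_label(images, image_labels, audio, audio_labels):
    """Pair the i-th image of each person with the i-th audio item of that person."""
    images, audio = np.asarray(images), np.asarray(audio)
    img_order = np.argsort(image_labels, kind="stable")
    aud_order = np.argsort(audio_labels, kind="stable")
    labels = np.asarray(image_labels)[img_order]
    if not np.array_equal(labels, np.asarray(audio_labels)[aud_order]):
        raise ValueError("each person needs the same number of images and audio items")
    return images[img_order], audio[aud_order], labels


# Memory comparison on random stand-in data (not VoxCeleb data).
stored = to_uint8(np.random.default_rng(0).normal(size=(100, 64, 64)))
as_float64 = stored.astype(np.float64)
print(stored.nbytes, as_float64.nbytes)  # the float64 copy is 8 times larger

# What a rescaling step with factor 1/255 computes on the stored data.
rescaled = stored.astype(np.float32) / 255.0
```

Keeping the arrays in np.uint8 and converting to floating point only inside the model means the full-precision copy exists one batch at a time instead of for the whole dataset.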

Models Used

  1. CNN for Image Classification: This model consists of one preprocessing Rescaling layer, two Convolution2D layers with L2 regularization, two MaxPooling2D layers, one Flatten layer, two Dense layers, and one Dropout layer.
  2. CNN for Audio Classification: This model consists of one preprocessing Rescaling layer, two Convolution2D layers with L2 regularization, two MaxPooling2D layers, one Flatten layer, and two Dense layers.
  3. CRNN for Audio Classification: This model consists of one preprocessing Rescaling layer, two Convolution1D layers with L2 regularization, two MaxPooling1D layers, one LSTM layer, and two Dense layers.
  4. CNN-CNN Parallel Multi-Modal Model: This model consists of the 1st and 2nd models in parallel, followed by a fusion model with two Dense layers and one Dropout layer.
  5. CNN-CRNN Parallel Multi-Modal Model: This model consists of the 1st and 3rd models in parallel, followed by a fusion model with two Dense layers and one Dropout layer.
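As a rough illustration of the parallel layout, the CNN-CNN model might be sketched with the Keras functional API as follows. This is not the code from models.ipynb. The input shapes, filter counts, layer widths, L2 strength, dropout rate, and the use of concatenation to join the two branches are all assumptions, and each branch is simplified to end in a single Dense feature layer. It also assumes TensorFlow 2.6 or newer, which is not in the Requirements list.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def conv_branch(inputs, name):
    """Rescaling -> (Conv2D + MaxPooling2D) x 2 -> Flatten -> Dense features."""
    l2 = regularizers.l2(1e-3)  # assumed regularization strength
    x = layers.Rescaling(1.0 / 255)(inputs)  # uint8 range 0..255 -> 0..1
    x = layers.Conv2D(16, 3, activation="relu", kernel_regularizer=l2)(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu", kernel_regularizer=l2)(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    return layers.Dense(64, activation="relu", name=f"{name}_features")(x)


def build_cnn_cnn_model(image_shape=(64, 64, 3), audio_shape=(64, 64, 1),
                        num_classes=40):
    """Two CNN branches in parallel, joined by a small fusion head."""
    image_in = layers.Input(shape=image_shape, name="image")
    audio_in = layers.Input(shape=audio_shape, name="spectrogram")

    # Assumed fusion: concatenate the feature vectors of both branches.
    merged = layers.Concatenate()([
        conv_branch(image_in, "image"),
        conv_branch(audio_in, "audio"),
    ])

    # Fusion head: two Dense layers with one Dropout layer in between.
    x = layers.Dense(128, activation="relu")(merged)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=[image_in, audio_in], outputs=outputs)
```

Calling `build_cnn_cnn_model().summary()` lists both branches and the fusion head. A CNN-CRNN variant would replace the audio branch with Conv1D, MaxPooling1D, and LSTM layers.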

Requirements

  • Python 3
  • Matplotlib
  • PIL
  • numpy
  • librosa

Results

Model                        Training Accuracy   Validation Accuracy
CNN Image Classification     100.0%              84.50%
CNN Audio Classification     100.0%              79%
CRNN Audio Classification    79.75%              71.50%
CNN-CNN Multi Modal Model    99.86%              73.25%
CNN-CRNN Multi Modal Model   99.92%              80.00%
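One way to read these numbers is to look at the difference between training and validation accuracy for each model. The sketch below only does arithmetic on the values from the results table. The helper name `generalization_gaps` is hypothetical, and reading a large gap as a sign of overfitting is a common rule of thumb rather than a claim made by the original project.

```python
# Accuracies in percent, copied from the results table:
# (training accuracy, validation accuracy)
results = {
    "CNN Image Classification": (100.0, 84.50),
    "CNN Audio Classification": (100.0, 79.0),
    "CRNN Audio Classification": (79.75, 71.50),
    "CNN-CNN Multi Modal Model": (99.86, 73.25),
    "CNN-CRNN Multi Modal Model": (99.92, 80.00),
}


def generalization_gaps(table):
    """Return training minus validation accuracy, in percentage points."""
    return {name: round(train - val, 2) for name, (train, val) in table.items()}


gaps = generalization_gaps(results)

# Print the models from smallest to largest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:28s} gap = {gap:5.2f} points")
```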

How to Run

Run the models.ipynb notebook.

Best Result

Among the multi-modal models, the best result was achieved with the CNN-CRNN parallel model (80.00% validation accuracy).

Figures: Accuracy and Loss plots (images not reproduced here).
