Coder Social home page Coder Social logo

elaaatif / spectrogram-voice-analysis-with-resnet-152 Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 1.65 MB

This repository contains code for a deep learning project aimed at analyzing voice data using spectrograms and a ResNet-152 architecture using the UrbanSound8K dataset

Jupyter Notebook 100.00%
resnet-152 spectrogram urban-datasets voice-analysis

spectrogram-voice-analysis-with-resnet-152's Introduction

Spectrogram Voice Analysis with ResNet-152

This repository contains code for a deep learning project aimed at analyzing voice data using spectrograms and a ResNet-152 architecture. The project focuses on classifying audio samples into different categories based on the UrbanSound8K dataset.

Dataset

The dataset used in this project is the UrbanSound8K dataset, available on Kaggle. It consists of 8732 labeled sound excerpts (4 seconds each) of urban sounds across 10 classes. You can find the dataset here.

Usage

  1. Data Preparation: Download the UrbanSound8K dataset from the provided link. Extract the dataset and ensure it is structured correctly. The dataset includes metadata in CSV format along with audio files categorized into different folds.

  2. Environment Setup: This project was developed using Python in a Jupyter Notebook environment. Ensure you have the necessary libraries installed, including pandas, numpy, matplotlib, seaborn, librosa, scikit-learn, tensorflow, opencv-python, and resampy.

  3. Notebook Execution: Run the Jupyter Notebook spectrogram-voice-analysis-with-ResNet-152.ipynb to execute the project. This notebook guides you through the steps of data loading, feature extraction, model building, training, evaluation, and visualization.

  4. Model Usage: After training, the trained model is saved as ResNet152_based_model.zip. You can extract and load this model in your Python code using TensorFlow/Keras and use it for inference on new audio samples.

    from tensorflow.keras.models import load_model
    
    # Load the trained model
    model = load_model('ResNet152_based_model.h5')
    
    # Perform inference on new data
    # Replace X_new with your new data
    predictions = model.predict(X_new)

Model Architecture

The model architecture consists of a pre-trained ResNet-152 base with custom layers added on top. The ResNet-152 base is frozen to leverage the learned features from ImageNet, while custom layers adapt the model for spectrogram-based voice analysis. The architecture includes:

Model Architecture: ResNet-152 with Custom Layers

  1. Base Model: The base of the model is ResNet-152, a deep convolutional neural network (CNN) architecture known for its effectiveness in image classification tasks. ResNet-152 consists of 152 layers and has shown impressive performance on various computer vision tasks. You utilize the pre-trained weights from ImageNet to leverage the learned features.

  2. Freezing Base Layers: All layers of the ResNet-152 base model are frozen, meaning they are not updated during training. This approach allows the model to retain the learned representations from ImageNet while fine-tuning the model's parameters for the specific task of sound classification.

  3. Custom Layers: On top of the frozen ResNet-152 base, custom layers are added to adapt the model for spectrogram-based voice analysis:

    • Global Average Pooling 2D (GAP): After the base model, a Global Average Pooling layer is added to reduce the spatial dimensions of the feature maps to a vector of features. This helps in capturing the most important features from the spectrogram.

    • Dense Layers: Multiple fully connected Dense layers are added to the model to learn complex patterns from the spectrogram features. The Dense layers use Rectified Linear Unit (ReLU) activation functions to introduce non-linearity into the model.

    • Dropout Regularization: Dropout layers are added after some Dense layers to prevent overfitting. Dropout randomly sets a fraction of input units to zero during training, which helps in improving the generalization of the model.

    • Output Layer: The final Dense layer predicts the probability distribution over the classes using a softmax activation function. The number of units in this layer corresponds to the number of classes in the dataset.

Summary

The model architecture combines the power of a pre-trained ResNet-152 base with custom layers tailored for spectrogram-based voice analysis. By leveraging transfer learning and fine-tuning, the model can effectively classify audio samples into different categories. The addition of custom layers allows the model to adapt to the specific characteristics of the spectrogram data while maintaining the robustness and representational power of the pre-trained ResNet-152 base.

spectrogram-voice-analysis-with-resnet-152's People

Contributors

elaaatif avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.