This application is designed for environmental audio classification and aims to assist individuals who are deaf or have severe hearing loss by focusing on environmental and urban sounds.

Home Page: https://youtu.be/srpw3UPxvaw

JavaScript 10.57% PureBasic 66.15% Python 10.30% CSS 11.67% HTML 1.31%
cnn-classification css flask machine-learning pyaudio python3 reactjs

Environmental Audio Classification Application

This application is designed for environmental audio classification and aims to assist individuals who are deaf or have severe hearing loss by focusing on environmental and urban sounds. To get started with this project, follow the steps below. Please note that this application is meant to run locally, as all audio capturing and analysis processes are performed on the server.

Getting Started

Clone this repository to your local machine:

git clone https://github.com/Joal84/audio-class.git

Navigate to the project directory:

cd audio-class

Install the required dependencies:

npm install

Navigate to the flask-server folder and install the required Python packages:

cd flask-server
pip install -r requirements.txt

Start the application:

npm run dev

Project Overview

Purpose: This application is designed for environmental audio classification, with a focus on assisting individuals who are deaf or have severe hearing loss, particularly in recognizing environmental and urban sounds.

Data Sources: The classification model was trained using two datasets:

  • ESC-50: 50 classes, 40 recordings per class (2,000 in total), each lasting 5 seconds.
  • UrbanSound8K (US8K): 10 classes, 8,732 recordings in total, each up to 4 seconds long.

[Figure: class distribution of both datasets]

Feature Extraction

  • The application uses the "Librosa" library to load all audio files.
  • All audio files have a sample rate of 44.1 kHz.
  • Audio signals are captured in 1-second chunks. Shorter sounds are zero-padded to 1 second; longer sounds are truncated.
  • The audio data is reshaped into a 2D array.
  • Spectrogram extraction uses the Normalized Radial Diffusivity Transform (NRDT) algorithm to calculate diffusivity at different time delays.
  • Parameters such as flag, window (w), and channels are set differently for each spectrogram extraction approach.

[Figure: spectrogram of a dog barking]
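The padding, truncation, and reshaping steps above can be sketched as follows. This is a minimal sketch and not the project's code: the helper name to_fixed_chunk and the (1, n_samples) target shape are assumptions made for illustration, and the NRDT spectrogram step is not shown. In the application the signal would come from Librosa, for example via librosa.load(path, sr=44100).

```python
import numpy as np

SAMPLE_RATE = 44_100   # 44.1 kHz, as stated above
CHUNK_SECONDS = 1.0    # audio is handled in 1-second chunks


def to_fixed_chunk(signal, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Zero-pad or truncate a mono signal to a fixed length, then make it 2D.

    The (1, n_samples) output shape is an assumption; the project may use
    a different 2D layout.
    """
    signal = np.asarray(signal, dtype=np.float32)
    n_samples = int(sample_rate * seconds)
    if signal.shape[0] < n_samples:
        # Shorter sounds: append zeros up to the target length.
        signal = np.pad(signal, (0, n_samples - signal.shape[0]))
    else:
        # Longer sounds: keep only the first n_samples.
        signal = signal[:n_samples]
    return signal.reshape(1, n_samples)


# Synthetic signals stand in for real recordings.
short = to_fixed_chunk(np.ones(10_000))
long = to_fixed_chunk(np.ones(100_000))
print(short.shape, long.shape)
```

Both calls return arrays of the same shape, so every chunk handed to the next stage has identical dimensions regardless of the original clip length.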

Convolutional Neural Network (CNN) Model

  • The CNN uses the SELU activation function to mitigate vanishing and exploding gradients.
  • Training monitors validation accuracy and, after each epoch, saves a checkpoint of the best model so far.
  • The same model was trained on each dataset, reaching test accuracy of 94% (ESC-50) and 97% (US8K).
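For reference, SELU and the "keep the best checkpoint" rule can be written out in a few lines. This is a sketch under assumptions, not the project's training code: selu and best_checkpoint_epochs are hypothetical helper names, and the accuracy values are made up for the example. The two constants are the standard SELU values. If the model is built with Keras (an assumption), the equivalent checkpointing would be a ModelCheckpoint callback with monitor="val_accuracy" and save_best_only=True.

```python
import numpy as np

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit, applied element-wise."""
    x = np.asarray(x, dtype=np.float64)
    # np.minimum keeps exp() from overflowing on large positive inputs.
    negative_part = SELU_ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return SELU_SCALE * np.where(x > 0, x, negative_part)


def best_checkpoint_epochs(val_accuracies):
    """Return the epochs (1-based) at which a new best model would be saved."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:  # only save when validation accuracy improves
            best = acc
            saved.append(epoch)
    return saved


# Illustrative (made-up) validation accuracies for five epochs.
history = [0.61, 0.74, 0.72, 0.80, 0.79]
print(best_checkpoint_epochs(history))
print(selu([-1.0, 0.0, 2.0]))
```

With this rule, epochs whose validation accuracy does not beat the previous best leave the saved checkpoint untouched, so the file on disk always holds the best model seen so far.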

Live Audio Input

  • The application uses the PyAudio library to capture live audio input in Python.
  • To choose a microphone, set input_device_index in prediction.py to the index of the device you want to use.
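To find the index of a given microphone, PyAudio can enumerate the audio devices on the machine. The sketch below is one way to do that and is not taken from prediction.py; the helper names input_capable and list_input_devices are hypothetical. It relies only on PyAudio's get_device_count and get_device_info_by_index calls, and imports PyAudio lazily so the filtering helper can be tried without audio hardware.

```python
def input_capable(device_infos):
    """Keep (index, name) for devices that expose at least one input channel."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Query PyAudio for all devices and return the input-capable ones."""
    import pyaudio  # imported lazily so the helper above works without it

    audio = pyaudio.PyAudio()
    try:
        infos = [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        audio.terminate()  # always release PortAudio resources
    return input_capable(infos)


# Example with hand-written device entries (not real hardware):
fake_devices = [
    {"index": 0, "name": "Built-in Output", "maxInputChannels": 0},
    {"index": 1, "name": "USB Microphone", "maxInputChannels": 1},
]
print(input_capable(fake_devices))
```

Calling list_input_devices() on the machine that runs the server should return the candidate values for input_device_index.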

Audio Gate

Issues with live audio input are often related to the "chunks" of audio sent to the model for prediction. To address this, a gate automates the start and end of the recording process. The gate threshold in prediction.py can be adjusted for fine-tuning.
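One common way to build such a gate is to compare each chunk's RMS level against a threshold: recording opens when the level rises above it and closes when it drops back. The sketch below illustrates that idea only. It is an assumption about how a gate could work, not the logic in prediction.py, and rms, gate_segments, and the threshold value are hypothetical.

```python
import numpy as np


def rms(chunk):
    """Root-mean-square level of one chunk of samples."""
    chunk = np.asarray(chunk, dtype=np.float64)
    return float(np.sqrt(np.mean(chunk ** 2)))


def gate_segments(chunks, threshold):
    """Return (start, end) chunk indices of spans whose RMS exceeds threshold.

    `end` is exclusive. The gate opens on the first loud chunk and closes
    on the first quiet chunk after it.
    """
    segments = []
    start = None
    for i, chunk in enumerate(chunks):
        loud = rms(chunk) > threshold
        if loud and start is None:
            start = i                    # gate opens: recording starts
        elif not loud and start is not None:
            segments.append((start, i))  # gate closes: recording ends
            start = None
    if start is not None:                # stream ended while gate was open
        segments.append((start, len(chunks)))
    return segments


# Synthetic stream: quiet, loud, loud, quiet, loud (values are illustrative).
quiet = np.full(1024, 0.01)
loud = np.full(1024, 0.5)
stream = [quiet, loud, loud, quiet, loud]
print(gate_segments(stream, threshold=0.1))
```

In this sketch, raising the threshold makes the gate ignore more background noise, while lowering it makes the gate open for quieter sounds.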
