Speech Recognition with PyTorch

A CNN implementation in Python with PyTorch for classifying audio (.wav) files (94%+ accuracy on the test set).

  1. General
  2. Dependencies

General

Background

A neural network trained on audio files. The loader in gcommand_dataset.py converts each .wav file into a 2D matrix of size 161 x 101.

The audio files in this dataset are about 1 second long, and 30 different commands can be heard in them.
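
The 161 x 101 shape can be illustrated with a short sketch. The code of gcommand_dataset.py is not shown here, so the parameters below (16 kHz sampling rate, 20 ms window, 10 ms hop, Hamming window, log scaling) and the function name `wav_to_matrix` are assumptions, chosen because they reproduce a 161 x 101 matrix for a 1-second clip. A random signal stands in for a real .wav file.

```python
import torch

SAMPLE_RATE = 16_000  # assumed sampling rate in Hz
N_FFT = 320           # assumed 20 ms analysis window (0.02 * 16000 samples)
HOP = 160             # assumed 10 ms hop between frames


def wav_to_matrix(waveform: torch.Tensor) -> torch.Tensor:
    """Turn a 1-D waveform into a log-magnitude spectrogram (hypothetical helper)."""
    # Pad or trim to exactly one second so every clip yields the same shape.
    if waveform.numel() < SAMPLE_RATE:
        missing = SAMPLE_RATE - waveform.numel()
        waveform = torch.nn.functional.pad(waveform, (0, missing))
    waveform = waveform[:SAMPLE_RATE]

    spec = torch.stft(
        waveform,
        n_fft=N_FFT,
        hop_length=HOP,
        win_length=N_FFT,
        window=torch.hamming_window(N_FFT),
        return_complex=True,
    )
    # N_FFT // 2 + 1 = 161 frequency bins, 1 + 16000 // 160 = 101 time frames.
    return torch.log1p(spec.abs())


clip = torch.randn(SAMPLE_RATE)  # synthetic stand-in for a loaded .wav file
matrix = wav_to_matrix(clip)
print(matrix.shape)  # torch.Size([161, 101])
```

Padding shorter clips to one second keeps the input shape fixed, which the fully connected layers of the model rely on.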

Model Structure

In short, the model has 5 convolutional layers, each followed by batch normalization, ReLU and max pooling, and then 2 fully connected layers. The network has 30 outputs, one per command.

In more detail (the brackets give the [input, output] channels or features of each layer):

  • First layer: Convolutional layer, kernel size = 5, stride = 2, padding = 2, batch norm = 16 => [1,16]
  • Second layer: Convolutional layer, kernel size = 3, stride = 1, padding = 1, batch norm = 32 => [16,32]
  • Third layer: Convolutional layer, kernel size = 3, stride = 1, padding = 1, batch norm = 64 => [32,64]
  • Fourth layer: Convolutional layer, kernel size = 3, stride = 1, padding = 1, batch norm = 128 => [64,128]
  • Fifth layer: Convolutional layer, kernel size = 3, stride = 1, padding = 1, batch norm = 256 => [128,256]
  • Sixth layer: Fully Connected layer, batch norm = 128 => [512,128]
  • Seventh layer: Fully Connected layer => [128,30]
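
The layer list above can be sketched in PyTorch roughly as follows. This is an illustration, not the repository's code: the class name `SpeechCNN`, the helper `conv_block`, the 2 x 2 max-pooling window and the ReLU after the first fully connected layer are assumptions. With 2 x 2 pooling, a 161 x 101 input ends up as 256 x 2 x 1 = 512 features, which matches the [512,128] layer.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch, kernel, stride, padding):
    """Conv -> batch norm -> ReLU -> max pool, as described in the list above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, stride=stride, padding=padding),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),  # assumption: 2 x 2 pooling window
    )


class SpeechCNN(nn.Module):  # hypothetical name
    def __init__(self, num_classes=30):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 16, kernel=5, stride=2, padding=2),
            conv_block(16, 32, kernel=3, stride=1, padding=1),
            conv_block(32, 64, kernel=3, stride=1, padding=1),
            conv_block(64, 128, kernel=3, stride=1, padding=1),
            conv_block(128, 256, kernel=3, stride=1, padding=1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),         # 256 channels * 2 * 1 = 512 features
            nn.Linear(512, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),            # assumption: not stated in the description
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = SpeechCNN()
batch = torch.randn(4, 1, 161, 101)  # 4 synthetic spectrograms, 1 channel each
print(model.features(batch).shape)   # torch.Size([4, 256, 2, 1])
print(model(batch).shape)            # torch.Size([4, 30])
```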

Throughout model building, I monitored the loss and accuracy values to measure how accurate the model was and to see how and where it could be improved. This can be seen in the section called "Models validation functions" in the attached code.
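
A minimal sketch of this kind of monitoring is shown below. It is not the repository's validation code. The function name `evaluate`, the use of cross-entropy loss and the toy model and data are assumptions for illustration only.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return (average loss, accuracy in percent) over all samples in loader."""
    model.eval()
    criterion = nn.CrossEntropyLoss(reduction="sum")  # assumed loss function
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed for validation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            logits = model(inputs)
            total_loss += criterion(logits, labels).item()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, 100.0 * correct / seen


# Toy stand-ins so the sketch runs on its own: 20 random samples, 30 classes.
device = torch.device("cpu")
data = TensorDataset(torch.randn(20, 8), torch.randint(0, 30, (20,)))
loader = DataLoader(data, batch_size=5)
toy_model = nn.Linear(8, 30).to(device)

val_loss, val_acc = evaluate(toy_model, loader, device)
print(f"validation loss {val_loss:.3f}, accuracy {val_acc:.1f}%")
```

Calling such a function once per epoch on both the training and validation sets gives the per-epoch curves needed to spot overfitting.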

About The Output Files

The program exports 2 files:

  • A test_y file that contains the predictions for the test set.
  • A BestModelcpu.png or BestModelcuda.png file (depending on the device the code runs on), which contains a graph of the accuracy and loss values for training and validation at each epoch.
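
One way such a device-dependent plot file could be produced is sketched below, assuming matplotlib is used for plotting. The function name `save_history_plot`, the layout of the `history` dictionary and the numbers in it are made up for illustration and are not results from this project.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only, no display needed
import matplotlib.pyplot as plt
import torch


def save_history_plot(history, device, out_dir="."):
    """Plot loss and accuracy per epoch and save as BestModel<device>.png."""
    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history["train_loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history["train_acc"], label="train")
    ax_acc.plot(epochs, history["val_acc"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy (%)")
    ax_acc.legend()

    # device.type is "cpu" or "cuda", giving BestModelcpu.png or BestModelcuda.png.
    path = os.path.join(out_dir, f"BestModel{device.type}.png")
    fig.savefig(path)
    plt.close(fig)
    return path


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
history = {  # placeholder values, not real measurements
    "train_loss": [2.0, 1.0, 0.5],
    "val_loss": [2.1, 1.2, 0.7],
    "train_acc": [40.0, 70.0, 85.0],
    "val_acc": [38.0, 66.0, 80.0],
}
plot_path = save_history_plot(history, device, out_dir=tempfile.mkdtemp())
print(plot_path)
```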

Note that to use the dataset referenced in this repo, you need to download it first (about 1 GB). You can also run this program on Google Colab.

Dependencies
