david8862 / command-words-recognition-keras

This project forked from drukka/command-words-recognition-keras

Speech recognition of command words using Keras and TensorFlow 2

License: MIT License

Python 100.00%

command-words-recognition-keras's Introduction

CNN - Command Words Recognition

Speech recognition in Python using TensorFlow 2 and the high-level Keras API.
Convolutional neural networks (CNNs) were designed to classify time-invariant data such as images. Recent research has found that if a sound is converted into its spectrogram (or better, its log-Mel spectrogram), a CNN can be trained on those spectrogram features to build a speech recognition model.
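
To make the spectrogram idea concrete, here is a minimal NumPy sketch of a log-Mel spectrogram. It is not the repository's feature extractor: the function names, the frame length, hop size, FFT size and number of Mel bands are all assumptions chosen for illustration, and the README's later steps mention MFCCs, which add a further transform on top of this.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style Mel scale.
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_mels, fft_size, sample_rate):
    """Triangular filters with shape (num_mels, fft_size // 2 + 1)."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_mels + 2)
    bins = np.floor((fft_size + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((num_mels, fft_size // 2 + 1))
    for m in range(1, num_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - center)
    return bank


def log_mel_spectrogram(signal, sample_rate=16000, frame_len=400, hop=160,
                        fft_size=512, num_mels=32):
    """Return an array of shape (num_frames, num_mels)."""
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Build overlapping frames by index, then apply a Hann window to each.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=fft_size, axis=1)) ** 2
    mel = power @ mel_filterbank(num_mels, fft_size, sample_rate).T
    return np.log(mel + 1e-6)  # small offset avoids log(0)


# One second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 32)
```

The result is a 2-D array (time frames by Mel bands) that can be treated like a single-channel image, which is what makes a CNN applicable.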

Usage

  1. Clone repository:
    cd /path/to/these/files/
    git clone https://github.com/tyukesz/command-words-recognition-keras
    
  2. Check the constants.py file for parameters such as:
    • DATASET_URL // URL where the zipped sound files are stored
    • SAMPLE_RATE // recommended: 16 kHz
    • EPOCHS // number of training epochs
    • BATCH_SIZE // use a power of 2: 64, 128, 256, 512, ...
    • TESTING and VALIDATION percentage // recommended: 15% each
    • WANTED_WORDS // list of command words: ['yes', 'no', 'up', 'down', 'on', 'off', ...]
    • VERBOSITY // 0 = disabled, 1 = progress bar
  3. Install requirements:
    pip install -r requirements.txt
    
  4. Run training on CPU:
    python train.py
    
    • If you are running training for the first time, it is recommended to use the '--force_extract=True' argument:
      python train.py --force_extract=True
      
    • If your SOUNDS_DIR is empty, the script downloads and extracts the sound files from the provided DATASET_URL.
    • Forcing MFCC feature extraction (force_extract=True) saves the sound features in MFCCS_DIR as tensors.
    • If the features have already been extracted, training starts faster because the MFCCs do not need to be recomputed; they are simply loaded as tensors.
  5. (Optional) You can load the pretrained model for transfer learning:
    python train.py --load_model=name_of_your_model
    
  6. (Optional) Test your model's predictions on a WAV file:
    python predict_wav.py --load_model=name_of_your_model --wav_path=/path/to/yes.wav --num_of_predictions=2
    
    • The above command will output something like:
      • yes (97%)
      • left (0.84%)
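
The output in step 6 can be read as the highest-scoring classes of the model's final layer. The sketch below shows one way such a listing could be produced from a vector of class probabilities. The helper names, the label list and the probability values are hypothetical and are not taken from predict_wav.py.

```python
def top_predictions(probabilities, labels, num_of_predictions=2):
    """Return the num_of_predictions most probable (label, probability) pairs."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:num_of_predictions]


def format_prediction(label, probability):
    # Probabilities are shown as percentages with two decimals.
    return f"{label} ({probability * 100:.2f}%)"


# Hypothetical labels and model output for a recording of "yes".
labels = ["yes", "no", "up", "down", "left", "right"]
probabilities = [0.97, 0.006, 0.005, 0.0056, 0.0084, 0.005]

for label, probability in top_predictions(probabilities, labels):
    print(format_prediction(label, probability))
```

With num_of_predictions=2 this prints the two best candidates, most probable first.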

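The feature caching described in step 4 can be sketched as a "load if cached, otherwise extract and save" helper. This is an illustration under assumptions, not the code in train.py: load_or_extract, extract_fn and the .npy file layout are hypothetical, and NumPy arrays stand in for the tensors the project stores in MFCCS_DIR.

```python
import os
import tempfile

import numpy as np


def load_or_extract(name, cache_dir, extract_fn, force_extract=False):
    """Return cached features for `name`, recomputing them only when needed."""
    cache_path = os.path.join(cache_dir, name + ".npy")
    if not force_extract and os.path.exists(cache_path):
        return np.load(cache_path)          # fast path: reuse saved features
    features = extract_fn(name)             # slow path: compute the features
    os.makedirs(cache_dir, exist_ok=True)
    np.save(cache_path, features)
    return features


calls = []


def fake_extract(name):
    # Stand-in for real MFCC extraction; records how often it is invoked.
    calls.append(name)
    return np.zeros((4, 3))


with tempfile.TemporaryDirectory() as cache_dir:
    load_or_extract("yes_001", cache_dir, fake_extract)                      # extracts
    load_or_extract("yes_001", cache_dir, fake_extract)                      # loads from cache
    load_or_extract("yes_001", cache_dir, fake_extract, force_extract=True)  # extracts again
    print(len(calls))  # 2
```

The second call skips extraction entirely, which is why later training runs start faster.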
CNN architecture

The network is a sequential Keras model with the following layers:

Layer (type)                                  Output Shape         Params
-------------------------------------------------------------------------
conv2d (Conv2D)                               (512, 64, 32, 16)    416
max_pooling2d (MaxPooling2D)                  (512, 32, 16, 16)    0
conv2d_1 (Conv2D)                             (512, 32, 16, 32)    4640
max_pooling2d_1 (MaxPooling2D)                (512, 16, 8, 32)     0
dropout (Dropout)                             (512, 16, 8, 32)     0
batch_normalization_v2 (BatchNormalization)   (512, 16, 8, 32)     128
conv2d_2 (Conv2D)                             (512, 8, 4, 64)      18496
conv2d_3 (Conv2D)                             (512, 8, 4, 128)     73856
max_pooling2d_2 (MaxPooling2D)                (512, 4, 2, 128)     0
dropout_1 (Dropout)                           (512, 4, 2, 128)     0
flatten (Flatten)                             (512, 1024)          0
dropout_2 (Dropout)                           (512, 1024)          0
dense (Dense)                                 (512, 256)           262400
dense_1 (Dense)                               (512, 6)             1542
-------------------------------------------------------------------------
Total params: 361,478
Trainable params: 361,414
Non-trainable params: 64
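
The parameter counts in the summary can be checked with the standard formulas for convolutional, dense and batch-normalization layers. In the sketch below, the kernel sizes (5x5 for the first convolution, 3x3 for the others) and the single input channel are assumptions inferred from the counts; the README does not state them, and other kernel shapes with the same number of weights would give the same totals.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell and input channel, plus one bias per filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit.
    return (in_units + 1) * out_units


def batch_norm_params(channels):
    # gamma and beta are trainable; moving mean and variance are not.
    return 4 * channels


# Kernel sizes and the single input channel are assumptions inferred from the counts.
params = {
    "conv2d": conv2d_params(5, 5, 1, 16),
    "conv2d_1": conv2d_params(3, 3, 16, 32),
    "batch_normalization_v2": batch_norm_params(32),
    "conv2d_2": conv2d_params(3, 3, 32, 64),
    "conv2d_3": conv2d_params(3, 3, 64, 128),
    "dense": dense_params(4 * 2 * 128, 256),
    "dense_1": dense_params(256, 6),
}

total = sum(params.values())
non_trainable = 2 * 32  # moving mean and variance of the batch-norm layer
print(total, total - non_trainable, non_trainable)  # 361478 361414 64
```

The pooling, dropout and flatten layers contribute no parameters, so the seven entries above account for the full total.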

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

command-words-recognition-keras's People

Contributors

tyukesz, david8862