Acoustic Event Detection with TensorFlow Lite

This project is to develop an Android version of "acoustic feature camera" developed in my other project acoustic features.

And the other screenshots.

Android app "spectrogram"

This app is for both training CNN and running inference for acoustic event detection: code

I reused part of this code that was written in C language by me for STM32 MCUs.

The DSP part used JTransforms.

I confirmed that the app runs on LG Nexus 5X and Google Pixel4. The code is short and self-explanatory, but I must explain that the app saves all the feature files in JSON format under "/Android/data/audio.processing.spectrogram/files/Music".

I developed a color map function "Coral reef sea" on my own to convert grayscale image into a color of the sea in Okinawa.

    private fun applyColorMap(src: IntArray, dst: IntArray) {
        for (i in src.indices) {
            val mag = src[i]
            dst[i] = Color.argb(0xff, 128 - mag / 2, mag, 128 + mag / 2)
        }
    }

Training CNN

This is a notebook for training a CNN model for musical instruments recognition: Jupyter notebook.

The audio feature corresponds to gray-scale image the size of 64(W) x 40(H).

Converting Keras model into TFLite model is just two lines of code:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

In case of "musical instruments" use case, the notebook generates two files, "musical_instruments-labels.txt" and "musical_instruments.tflite". I rename the files and place them under "assets" folder for the Android app:

-- assets --+-- labels.txt
            |
            +-- aed.tflite

Audio processing pipeline

Training data collection on Android (smart phone)

Basically, short-time FFT is applied to raw PCM data to obtain audio feature as gray-scale image.

   << MEMS mic >>
         |
         V
  [16bit PCM data]  16kHz mono sampling
         |
  [ Pre-emphasis ]  FIR, alpha = 0.97
         |
[Overlapping frames (50%)]
         |
     [Windowing]  Hann window
         |
  [   Real FFT   ]
         |
  [     PSD      ]  Absolute values of real FFT / N
         |
  [Filterbank(MFSCs)]  40 filters for obtaining Mel frequency spectral coefficients.
         |
     [Log scale]  20*Log10(x) in DB
         |
   [Normalzation]  0-255 range (like grayscale image)
         |
         V
 << Audio feature >>  64 bins x 40 mel filters

Training CNN on Keras

The steps taken for training a CNN model for Acoustic Event Detection is same as that for classification of grayscale images.

 << Audio feature >>  64 bins x 40 mel filters
         |
         V
   [CNN training]
         |
         V
 [Keras model(.h5)]
         |
         V
 << TFLite model >>

Run inference on Android

 << Audio feature >>  64 bins x 40 mel filters
         |
         V
[Trained CNN(.tflite)]
         |
         V
   << Results >>

References

Speech processing for machine learning

I learned audio processing technique for machine learning from this site.

haitmai / android-aed Goto Github PK

android-aed's Introduction

Acoustic Event Detection with TensorFlow Lite

Android app "spectrogram"

Training CNN

Audio processing pipeline

Training data collection on Android (smart phone)

Training CNN on Keras

Run inference on Android

References

Speech processing for machine learning

android-aed's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent