Coder Social home page Coder Social logo

android-aed's Introduction

Acoustic Event Detection with TensorFlow Lite

This project is to develop an Android version of "acoustic feature camera" developed in my other project acoustic features.

android_app

And the other screenshots.

Android app "spectrogram"

This app is for both training CNN and running inference for acoustic event detection: code

I reused part of this code that was written in C language by me for STM32 MCUs.

The DSP part used JTransforms.

I confirmed that the app runs on LG Nexus 5X and Google Pixel4. The code is short and self-explanatory, but I must explain that the app saves all the feature files in JSON format under "/Android/data/audio.processing.spectrogram/files/Music".

I developed a color map function "Coral reef sea" on my own to convert grayscale image into a color of the sea in Okinawa.

    private fun applyColorMap(src: IntArray, dst: IntArray) {
        for (i in src.indices) {
            val mag = src[i]
            dst[i] = Color.argb(0xff, 128 - mag / 2, mag, 128 + mag / 2)
        }
    }

Training CNN

This is a notebook for training a CNN model for musical instruments recognition: Jupyter notebook.

The audio feature corresponds to gray-scale image the size of 64(W) x 40(H).

Converting Keras model into TFLite model is just two lines of code:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

In case of "musical instruments" use case, the notebook generates two files, "musical_instruments-labels.txt" and "musical_instruments.tflite". I rename the files and place them under "assets" folder for the Android app:

-- assets --+-- labels.txt
            |
            +-- aed.tflite

Audio processing pipeline

Training data collection on Android (smart phone)

Basically, short-time FFT is applied to raw PCM data to obtain audio feature as gray-scale image.

   << MEMS mic >>
         |
         V
  [16bit PCM data]  16kHz mono sampling
         |
  [ Pre-emphasis ]  FIR, alpha = 0.97
         |
[Overlapping frames (50%)]
         |
     [Windowing]  Hann window
         |
  [   Real FFT   ]
         |
  [     PSD      ]  Absolute values of real FFT / N
         |
  [Filterbank(MFSCs)]  40 filters for obtaining Mel frequency spectral coefficients.
         |
     [Log scale]  20*Log10(x) in DB
         |
   [Normalzation]  0-255 range (like grayscale image)
         |
         V
 << Audio feature >>  64 bins x 40 mel filters

Training CNN on Keras

The steps taken for training a CNN model for Acoustic Event Detection is same as that for classification of grayscale images.

 << Audio feature >>  64 bins x 40 mel filters
         |
         V
   [CNN training]
         |
         V
 [Keras model(.h5)]
         |
         V
 << TFLite model >>

Run inference on Android

 << Audio feature >>  64 bins x 40 mel filters
         |
         V
[Trained CNN(.tflite)]
         |
         V
   << Results >>

References

Speech processing for machine learning

I learned audio processing technique for machine learning from this site.

android-aed's People

Contributors

araobp avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.