Calibrate FFT for the microspeech (rune issue, closed, 6 comments)

hotg-ai avatar hotg-ai commented on May 31, 2024
Calibrate FFT for the microspeech

from rune.

Comments (6)

meelislootus avatar meelislootus commented on May 31, 2024

Notes on hotg drive: https://docs.google.com/document/d/1IeJjxcj8VIca_nFGxnNVsbxsuQvGmnI0-Lga0Wy5Tg8/edit#

Overall summary:

  • The spectrogram-computer function used from Python (actually implemented in C/C++) is quite complicated; we probably want to simplify it and retrain the model with the simpler version of the spectrogram computation
  • The Rust spectrogram library sonogram does not do exactly what we need (it lacks a mel spectrum) and gives only limited access to its parameters; we probably want to replace sonogram with (1) our own windowing function, written in Rust, and (2) an existing FFT crate
  • OR we just call the individual C/C++ steps in the TF library from Rust

Here’s the TF Ops repo with all the parts to the TF spectrogram-computer, in C/C++ implementation: TF spectrogram-computer repo

The steps in the TF spectrogram-computer (they are all sequentially called from frontend.c) are, with links to the relevant code:

  1. Windowing - chops the incoming audio sample into windows: window.c. This is currently handled by sonogram and should not be too difficult to figure out or reverse engineer
  2. FFT - applied to each window; FFT implementations already exist in Rust: fft.cc
  3. Filterbank calculations - convert the real and imaginary parts of the FFT output into energy: filterbank.c (FilterbankConvertFftComplexToEnergy & FilterbankAccumulateChannels)
  4. Noise reduction - apply a low-pass filter on each of the windows: noise_reduction.c (NoiseReductionApply)
  5. Auto gain control - this might be complicated to reimplement; the algorithm is explained in Wang et al. 2016: pcan_gain_control.c (PcanGainControlApply)
  6. Logarithmic scaling: log_scale.c (LogScaleApply)
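
To make the simpler steps concrete, here is a minimal Rust sketch of steps 1, 2, 3 (energy conversion only) and 6, using only the standard library. It is an illustration under stated assumptions, not a port of the TF code: the function names, the plain Hann window, the `1e-6` floor and the floating-point natural log are all assumptions, the naive DFT stands in for a real FFT crate, and the mel channel accumulation (FilterbankAccumulateChannels), noise reduction and gain control are left out.

```rust
use std::f32::consts::PI;

/// Step 1: split `samples` into overlapping frames and apply a Hann window.
/// `window_len` and `step` are illustrative parameters, not TF frontend values.
fn windowed_frames(samples: &[f32], window_len: usize, step: usize) -> Vec<Vec<f32>> {
    let mut frames = Vec::new();
    if window_len == 0 || step == 0 || samples.len() < window_len {
        return frames;
    }
    let mut start = 0;
    while start + window_len <= samples.len() {
        let frame = samples[start..start + window_len]
            .iter()
            .enumerate()
            .map(|(n, &x)| {
                // Periodic Hann coefficient (assumed; window.c may differ in detail).
                let w = 0.5 - 0.5 * (2.0 * PI * n as f32 / window_len as f32).cos();
                x * w
            })
            .collect();
        frames.push(frame);
        start += step;
    }
    frames
}

/// Steps 2 and 3 (energy only): naive O(n^2) DFT of one frame, returning
/// re^2 + im^2 for bins 0..=n/2. A real proc block would use an FFT crate.
fn power_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f32 / n as f32;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            re * re + im * im
        })
        .collect()
}

/// Step 6: natural-log scaling with a small floor to avoid ln(0).
fn log_scale(energies: &[f32]) -> Vec<f32> {
    energies.iter().map(|&e| (e + 1e-6).ln()).collect()
}

/// Windowing, DFT, energy and log scaling chained together.
fn log_spectrogram(samples: &[f32], window_len: usize, step: usize) -> Vec<Vec<f32>> {
    windowed_frames(samples, window_len, step)
        .iter()
        .map(|frame| log_scale(&power_spectrum(frame)))
        .collect()
}
```

Calling `log_spectrogram(&samples, window_len, step)` returns one row per window and `window_len / 2 + 1` values per row. Keeping each step in its own function leaves room to swap in an FFT crate or add the filterbank accumulation later without touching the rest.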

I think a reasonable plan to match the model, given that step 5 in particular might be quite complicated, would be:

  1. Stage 1: Retrain the TF model with noise reduction and gain control turned off, and match it with a Rust proc block that does steps (1) windowing, (2) FFT, (3) filterbank, and (6) log scaling - this should be doable with sonogram + some hacking
  2. Stage 2: Match steps (4) noise reduction and (5) gain control in Rust (or call the C/C++ functions from Rust?)
  3. Stage 3: Go back to using the original model (now that the FFT proc block is fully matched)
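
One way to decide whether a stage is "matched" is to compare the Rust proc block output against a reference spectrogram exported from the Python side. The sketch below is a hypothetical helper for that check; the function names and the idea of a single absolute tolerance are assumptions, not something taken from rune or TF.

```rust
/// Largest absolute element-wise difference between two spectrograms,
/// or `None` if their shapes differ.
fn max_abs_diff(ours: &[Vec<f32>], reference: &[Vec<f32>]) -> Option<f32> {
    if ours.len() != reference.len() {
        return None;
    }
    let mut worst = 0.0f32;
    for (a, b) in ours.iter().zip(reference) {
        if a.len() != b.len() {
            return None;
        }
        for (x, y) in a.iter().zip(b) {
            worst = worst.max((x - y).abs());
        }
    }
    Some(worst)
}

/// True when both spectrograms have the same shape and every element
/// differs by at most `tolerance` (an assumed, hand-picked threshold).
fn matches_reference(ours: &[Vec<f32>], reference: &[Vec<f32>], tolerance: f32) -> bool {
    matches!(max_abs_diff(ours, reference), Some(d) if d <= tolerance)
}
```

Reporting the worst-case difference rather than just pass/fail makes it easier to see whether a mismatch comes from a small numerical detail or from a missing step such as noise reduction.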

kthakore avatar kthakore commented on May 31, 2024

Starting on microspeech with the FFT fix.
@Michael-F-Bryan might have a better idea here.
https://github.com/kthakore/json-eater

!!! If we can test proc blocks in Python, that is a HUGE deal

kthakore avatar kthakore commented on May 31, 2024

We need to write a Rust implementation (porting a Python function) for microspeech. More notes from @meelislootus.

kthakore avatar kthakore commented on May 31, 2024
  • Should we make proc_block libraries out of these? Users could then use them in their own proc blocks.

meelislootus avatar meelislootus commented on May 31, 2024

https://github.com/hotg-ai/rune/compare/calibrate_models#diff-75a3acce5b7dd27594d4febcdc1d3562368ee3d2ab95027c049a91307fb6a389

Michael-F-Bryan avatar Michael-F-Bryan commented on May 31, 2024

It looks like microspeech is good so I'll close this and #113.
