
arm-software / ml-kws-for-mcu


Keyword spotting on Arm Cortex-M Microcontrollers

License: Apache License 2.0

Python 3.23% C++ 12.65% Objective-C 9.36% C 73.96% Makefile 0.80%
arm machine-learning python cmsis-nn deep-neural-networks microcontrollers

ml-kws-for-mcu's Introduction

Keyword spotting for Microcontrollers

This repository contains the TensorFlow models and training scripts used in the paper: Hello Edge: Keyword Spotting on Microcontrollers. The scripts are adapted from the TensorFlow examples, and some are duplicated here to keep this repository self-contained.

To train a DNN with 3 fully-connected layers with 128 neurons in each layer, run:

python train.py --model_architecture dnn --model_size_info 128 128 128 

The command-line argument --model_size_info passes the network layer dimensions (number of layers, number of neurons, convolution filter sizes/strides, etc.) as a list to models.py, which builds the TensorFlow graph from the chosen model architecture and layer dimensions. See models.py for the meaning of model_size_info for each network architecture. The training commands, with all the hyperparameters needed to reproduce the models in the paper, are given here.
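
For example, the small DS-CNN architecture is specified in train_commands.txt (and in a user command reproduced later on this page) as:

python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1

Roughly, this list describes 5 convolutional layers of 64 features each, the first with a 10x4 kernel and 2x2 stride and the rest with 3x3 kernels and 1x1 stride; see models.py and train_commands.txt for the exact interpretation of each field.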

To run inference with a trained checkpoint on the train/validation/test sets, run:

python test.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint 
<checkpoint path>

To freeze the trained model checkpoint into a .pb file, run:

python freeze.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint 
<checkpoint path> --output_file dnn.pb

Pretrained models

Trained models (.pb files) for the different neural network architectures shown in the arXiv paper, such as DNN, CNN, Basic LSTM, LSTM, GRU, CRNN, and DS-CNN, are provided in Pretrained_models. The models' accuracy on the validation set, their memory requirements, and the operations per inference are also summarized in the following table.

To run an audio file through a trained model (e.g. a DNN) and get the top prediction, run:

python label_wav.py --wav <audio file> --graph Pretrained_models/DNN/DNN_S.pb 
--labels Pretrained_models/labels.txt --how_many_labels 1
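
For reference, here is a minimal sketch of what label_wav.py does internally, written against the TensorFlow 1.x API. The tensor names wav_data:0 and labels_softmax:0 are assumptions carried over from the TensorFlow speech commands example these scripts are adapted from; check label_wav.py for the exact names used in this repository.

import tensorflow as tf

def predict(wav_path, graph_path, labels_path, how_many_labels=1):
    # Load the label list, one keyword per line.
    with open(labels_path) as f:
        labels = [line.strip() for line in f]

    # Load the frozen graph produced by freeze.py.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(graph_path, 'rb') as f:
        graph_def.ParseFromString(f.read())

    with tf.Session() as sess:
        tf.import_graph_def(graph_def, name='')
        with open(wav_path, 'rb') as f:
            wav_data = f.read()
        # The graph consumes raw wav bytes and performs decoding and
        # feature extraction internally before the classifier.
        softmax = sess.graph.get_tensor_by_name('labels_softmax:0')
        predictions, = sess.run(softmax, {'wav_data:0': wav_data})
        for idx in predictions.argsort()[-how_many_labels:][::-1]:
            print(labels[idx], predictions[idx])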

Quantization Guide and Deployment on Microcontrollers

A quick guide on quantizing the KWS neural network models is here. The example code for running a DNN model on a Cortex-M development board is also provided here.


ml-kws-for-mcu's Issues

mbed-cli error leading to a compilation error

Hi author,

I have tried many approaches to resolve this compilation issue.
After running mbed deploy successfully and then running mbed compile, the following errors appear.


The error log says: ...mbed.h: No such file or directory ... compilation terminated.

But when I run find -name mbed.h, it only shows:
./mbed/5aab5a7997ee/mbed.h

So I conclude that mbed-cli is causing the compilation error. Why do we use the previous version of mbed-cli?

@navsuda we can certainly continue development with the gcc-arm toolchain, but at the same time we should keep the mbed-cli toolchain working. What's your opinion?

Support for RNNs in CMSIS-5

Hi, thanks for sharing the code!

Is there any plan to share code for running the RNNs presented in the paper on MCUs, as has been done for the DS-CNNs? Perhaps in the CMSIS-5 library?

Thank you

Question: performance does not match the paper, and feature generation

Hi,
Thanks for your wonderful work. It has really helped me a lot.
I have several questions about this project.
1. I ran the code with your train_commands.txt and found the performance slightly worse than the results in Table 7 of the paper. For the small DS-CNN model, the highest validation accuracy I get is 92.98%, while the paper reports 93.6%.
My question is: did you obtain the Table 7 results with the same code settings?

  2. In train.py, the test set is only evaluated after training is done.
    It does not use the model from the step with the best validation accuracy.
    Did you compute the test accuracy in the paper the same way?

3. Did you compare the performance of LFBE vs. MFCC? Google's paper uses LFBE, but MFCC allows a smaller feature size, and you use only 10 MFCC features. If we use more MFCC features, can we get higher accuracy?

4. Do you apply feature normalization to cope with different signal power ranges?

5. If the signal power in a frame is zero, how do you compute log(LFBE)? I can't find it in the code. In general, one uses log(LFBE + delta), where delta is a small constant; what delta value do you use? (See the sketch after this list.)

  6. Many papers use window_size_ms = 25 or 30 and window_stride_ms = 10,
    but for DS-CNN you use window_size_ms = 40 and window_stride_ms = 20.
    I understand that a larger window stride reduces the number of operations,
    but I don't understand why a 40 ms window size is used: at a 16 kHz sample rate it requires a 1024-point FFT, which is power-hungry.

7. Training takes almost 4 hours for me on a GeForce GTX 1080 Ti GPU with an E5-2650 CPU, but I saw in another of your replies that it takes you only about 1 hour. Is there any way to speed it up? I found that feature generation takes most of the time.
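
(Not an answer, just an illustration of question 5: a minimal NumPy sketch of the log(LFBE + delta) computation, with a hypothetical delta of 1e-6; the value used by this repository, if any, is not stated here.)

import numpy as np

def log_mel_energies(power_spectrum, mel_filterbank, delta=1e-6):
    # power_spectrum: (num_frames, num_fft_bins) per-frame power spectra
    # mel_filterbank: (num_mel_bins, num_fft_bins) triangular filter matrix
    # delta: small constant added before the log to avoid log(0) on silent frames
    fbank = power_spectrum @ mel_filterbank.T   # (num_frames, num_mel_bins)
    return np.log(fbank + delta)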

Thanks
Jinhong

How to run inference

Not sure if this is the right place, but could someone point me to how to do live microphone inference on the FRDM-K64F? I built the .bin file and loaded it, but I'm not sure what to do next.

Issue with quant_test.py, and request for DS_CNN

Hi, I'm trying to get my DNN working on a Disco board.

  1. I have a problem with quant_test.py; it fails with the error below.
    Could you show me how to avoid it?
$ python quant_test.py --data_url= --data_dir=/dataset/speech_commands_v0.01/ --model_architecture dnn --model_size_info 128 128 128 --checkpoint /tmp/speech_commands_train/best/dnn_8480.ckpt-17200
Traceback (most recent call last):
  File "quant_test.py", line 305, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/Users/foo/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "quant_test.py", line 203, in main
    FLAGS.model_architecture, FLAGS.model_size_info)
  File "quant_test.py", line 79, in run_quant_inference
    is_training=False)
  File "/Users/foo/lab/ML-KWS-for-MCU/quant_models.py", line 115, in create_model
    act_max, is_training)
  File "/Users/foo/lab/ML-KWS-for-MCU/quant_models.py", line 162, in create_dnn_model
    if(act_max[i]!=0):
IndexError: list index out of range

Before trying this, I trained my network as follows, and it succeeded.

python train.py --data_url= --data_dir=/dataset/speech_commands_v0.01 --model_architecture dnn --model_size_info 128 128 128
  2. I understand that only the DNN is supported for now; we would be very happy if you could also release quant_test.py and the deployment code for DS-CNN. Thank you!

Extract weight from pb

Hi, thanks for sharing your code.
How did you get the values in ds_cnn_weights.h?
Please tell me how to generate those values.

thx

No op named DecodeWav in defined operations.

Hi Naveen

First of all, I would like to thank you for uploading this nice set of example networks for keyword spotting. It is a great help and a good starting point for many people like me.

I am a newbie to TensorFlow, and while trying to load the models I get an error. Here is the code I am using to load the model:

import tensorflow as tf
from tensorflow.python.platform import gfile
with tf.Session() as sess:
    model_filename ='./Pretrained_models/DNN/DNN_S.pb'
    with gfile.FastGFile(model_filename, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        g_in = tf.import_graph_def(graph_def)

and here is the error

g_in = tf.import_graph_def(graph_def)
  File "/home/users/stageg4m/venv/pynn/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 285, in import_graph_def
    raise ValueError('No op named %s in defined operations.' % node.op)
ValueError: No op named DecodeWav in defined operations.

While searching for a solution online, I found that one possible reason for this error is using different TensorFlow versions for freezing and loading the models. If this is the reason, could you please tell me which version of TensorFlow you used to create the models? If not, could you please help me solve this problem? I would be grateful.
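
(Not a confirmed fix, but if a version mismatch is the cause, a first sanity check is to print the TensorFlow version of the loading environment and compare it against the one used for freezing; DecodeWav is only registered in sufficiently recent TensorFlow 1.x releases.)

import tensorflow as tf

# If this is older than the version used to freeze the .pb files,
# upgrading TensorFlow is the first thing to try.
print(tf.__version__)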

Asking for help: how to do quantization and use arm_fully_connected_q7?

Hi,

Thanks for your nice work!

I am using arm_fully_connected_q7 to speed up my Cortex-M4 project, but its output differs from the raw C implementation and from arm_matrix_mul_f32.

I do the quantization with the following steps:

  1. Get min and max for Wk, Bk, Input_k, Output_k (Input_k = Output_{k-1}).
  2. Calculate Q for Wk, Bk, Ik, Ok.
  3. Calculate the bias shift and output shift:
  • Bias_shift = Q_{Ik} + Q_{Wk} - Q_{Bk}
  • Output_shift = Q_{Ik} + Q_{Wk} - Q_{Ok}
  4. Quantize Wk and Bk using Q_{Wk} and Q_{Bk}.
  5. Use bias_shift and output_shift in arm_fully_connected_q7().

I also tried printing the de-quantized values of W and B at runtime, and they match the raw float values.

Am I missing something? Do I need to reorder the weight elements, e.g. [1,2,3,4] to [1,3,2,4]?

Below are my quantization parameters.

# I use CMVN to normalize the fbank features, so they are in [-2.6, 3.6]
Q_i = 5
Q_W = [5, 6, 6, 7, 6]
Q_B = [4, 4, 4, 5, 5]
Q_O = [2, 1, 2, 2, 3]
Shift_B = [6, 4, 3, 4, 3]
Shift_O = [8, 7, 5, 7, 5]
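
(A quick cross-check of step 3 above, restating the poster's own formulas in Python; this is not the repository's quantization code.)

# Q values from the post above (fractional bits of input, weights, biases, outputs).
Q_i = 5
Q_W = [5, 6, 6, 7, 6]
Q_B = [4, 4, 4, 5, 5]
Q_O = [2, 1, 2, 2, 3]

# The input Q of layer k is the output Q of layer k-1.
Q_I = [Q_i] + Q_O[:-1]

bias_shift = [qi + qw - qb for qi, qw, qb in zip(Q_I, Q_W, Q_B)]
out_shift = [qi + qw - qo for qi, qw, qo in zip(Q_I, Q_W, Q_O)]

print(bias_shift)  # [6, 4, 3, 4, 3], matching Shift_B above
print(out_shift)   # [8, 7, 5, 7, 5], matching Shift_O above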

Thank you very much for helping!

Training other words

Hello, @navsuda
I'm continuously developing a KWS application, and I have run several tests on my STM32 mbed board.
Now I want to train my own words (other than no, yes, right, left, ...), such as turn on, turn off, light on.

After reading your source code, I understood that I should add my .wav files to the data_dir, but is that the only thing I have to do? What else is needed to add my own words?
Or, using only the speech_commands_v0.02 dataset, if I just add words like 'happy' or 'house', will KWS work well?

To produce the weight.h file, the DNN flow is train.py -> test.py -> quant_test.py -> weight.h,
but I want the DS-CNN flow and I'm not sure about the process.
I guess it is fold_batchnorm.py -> test.py -> quant_test.py -> weight.h (for DS-CNN).
Am I right? (Maybe wrong...)

Thank you.

Couldn't find build tools in your program.

zhuan@zhuan-HP:~/zhuan/ML-KWS-for-MCU/Deployment$ mbed new kws_simple_test --mbedlib
[mbed] Creating new program "kws_simple_test" (git)
[mbed] Adding library "mbed" from "https://mbed.org/users/mbed_official/code/mbed/builds" at latest revision in the current branch
[mbed] Unpacking library build "5aab5a7997ee" in "/home/zhuan/zhuan/ML-KWS-for-MCU/Deployment/kws_simple_test/mbed"
[mbed] Updating reference "mbed" -> "https://mbed.org/users/mbed_official/code/mbed/builds/5aab5a7997ee"
[mbed] ## Couldn't find build tools in your program. Downloading the mbed 2.0 SDK tools...

Has anybody come across this problem, "Couldn't find build tools in your program."? Looking forward to your comments, thank you!

GCC Compile Issue

Hi, I am having problems compiling your application using the newly added GCC option. After following your steps (clone CMSIS_5 into the repo, cd into the GCC project, then make), I get the following error:
"""
...GIT\ML-KWS-for-MCU\Deployment\Examples\simple_test_k64f_gcc>make -j 8
make[1]: *** No rule to make target 'simple_test/main.o', needed by 'simple_test_k64f_gcc.elf'. Stop.
Makefile:25: recipe for target 'all' failed
make: *** [all] Error 2
"""

Any thoughts as to why this is not working? Based on another issue, it seems others have already gotten this to work, so I'm curious what I am doing wrong.

GCC compilation

Hello,

  • Can I compile this project with GCC?
  • Do you have any examples using CMake or another open-source build tool?

Thank You in advance!

Kevin Patino

Question: I trained and tested from scratch following the instructions, but the results are far from the published numbers; where did things go wrong?

Hello!
I trained and tested from scratch following the instructions, but my results are far from the published numbers. Where did things go wrong?

Below is a record of my training process:

# python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DS_CNN/DS_CNN1/retrain_logs --train_dir work/DS_CNN/DS_CNN1/training/
......
INFO:tensorflow:Step #29592: rate 0.000020, accuracy 93.00%, cross entropy 0.214778
INFO:tensorflow:Step #29593: rate 0.000020, accuracy 95.00%, cross entropy 0.247013
INFO:tensorflow:Step #29594: rate 0.000020, accuracy 97.00%, cross entropy 0.160600
INFO:tensorflow:Step #29595: rate 0.000020, accuracy 95.00%, cross entropy 0.125619
INFO:tensorflow:Step #29596: rate 0.000020, accuracy 95.00%, cross entropy 0.190318
INFO:tensorflow:Step #29597: rate 0.000020, accuracy 94.00%, cross entropy 0.218497
INFO:tensorflow:Step #29598: rate 0.000020, accuracy 97.00%, cross entropy 0.137457
INFO:tensorflow:Step #29599: rate 0.000020, accuracy 96.00%, cross entropy 0.194881
INFO:tensorflow:Step #29600: rate 0.000020, accuracy 95.00%, cross entropy 0.245867
INFO:tensorflow:Confusion Matrix:
 [[258   0   0   0   0   0   0   0   0   0   0   0]
 [  1 218   1   2   2   7   6   8   7   2   0   4]
 [  2   3 247   4   0   0   2   0   0   0   0   3]
 [  0   9   0 246   1   3   2   0   0   0   0   9]
 [  2   5   0   0 238   0   0   0   0  13   1   1]
 [  0   4   1  12   0 238   0   0   0   0   1   8]
 [  1   2  13   1   1   0 226   2   1   0   0   0]
 [  0   7   0   1   1   0   3 243   1   0   0   0]
 [  4   4   0   0   2   1   2   0 239   5   0   0]
 [  0   0   0   0  16   0   1   0   2 233   2   2]
 [  2   3   0   0   7   0   2   0   0   5 227   0]
 [  5   8   1   8   3   1   0   0   2   5   1 226]]
INFO:tensorflow:Step 29600: Validation accuracy = 91.79% (N=3093)
INFO:tensorflow:So far the best validation accuracy is 92.31%
INFO:tensorflow:Step #29601: rate 0.000020, accuracy 97.00%, cross entropy 0.193472
INFO:tensorflow:Step #29602: rate 0.000020, accuracy 98.00%, cross entropy 0.086266
INFO:tensorflow:Step #29603: rate 0.000020, accuracy 95.00%, cross entropy 0.206013
INFO:tensorflow:Step #29604: rate 0.000020, accuracy 95.00%, cross entropy 0.130586
INFO:tensorflow:Step #29605: rate 0.000020, accuracy 99.00%, cross entropy 0.107168
INFO:tensorflow:Step #29606: rate 0.000020, accuracy 99.00%, cross entropy 0.051911
......
INFO:tensorflow:Step #29969: rate 0.000020, accuracy 98.00%, cross entropy 0.121484
INFO:tensorflow:Step #29970: rate 0.000020, accuracy 99.00%, cross entropy 0.062734
INFO:tensorflow:Step #29971: rate 0.000020, accuracy 96.00%, cross entropy 0.131749
INFO:tensorflow:Step #29972: rate 0.000020, accuracy 96.00%, cross entropy 0.148196
INFO:tensorflow:Step #29973: rate 0.000020, accuracy 100.00%, cross entropy 0.074228
INFO:tensorflow:Step #29974: rate 0.000020, accuracy 95.00%, cross entropy 0.156531
INFO:tensorflow:Step #29975: rate 0.000020, accuracy 97.00%, cross entropy 0.098559
INFO:tensorflow:Step #29976: rate 0.000020, accuracy 95.00%, cross entropy 0.156327
INFO:tensorflow:Step #29977: rate 0.000020, accuracy 90.00%, cross entropy 0.212709
INFO:tensorflow:Step #29978: rate 0.000020, accuracy 94.00%, cross entropy 0.246338
INFO:tensorflow:Step #29979: rate 0.000020, accuracy 94.00%, cross entropy 0.208315
INFO:tensorflow:Step #29980: rate 0.000020, accuracy 95.00%, cross entropy 0.195625
INFO:tensorflow:Step #29981: rate 0.000020, accuracy 90.00%, cross entropy 0.289182
INFO:tensorflow:Step #29982: rate 0.000020, accuracy 96.00%, cross entropy 0.146916
INFO:tensorflow:Step #29983: rate 0.000020, accuracy 95.00%, cross entropy 0.153784
INFO:tensorflow:Step #29984: rate 0.000020, accuracy 95.00%, cross entropy 0.182212
INFO:tensorflow:Step #29985: rate 0.000020, accuracy 94.00%, cross entropy 0.174497
INFO:tensorflow:Step #29986: rate 0.000020, accuracy 97.00%, cross entropy 0.140645
INFO:tensorflow:Step #29987: rate 0.000020, accuracy 95.00%, cross entropy 0.160968
INFO:tensorflow:Step #29988: rate 0.000020, accuracy 98.00%, cross entropy 0.063577
INFO:tensorflow:Step #29989: rate 0.000020, accuracy 93.00%, cross entropy 0.165674
INFO:tensorflow:Step #29990: rate 0.000020, accuracy 98.00%, cross entropy 0.096141
INFO:tensorflow:Step #29991: rate 0.000020, accuracy 96.00%, cross entropy 0.149724
INFO:tensorflow:Step #29992: rate 0.000020, accuracy 92.00%, cross entropy 0.281510
INFO:tensorflow:Step #29993: rate 0.000020, accuracy 93.00%, cross entropy 0.205289
INFO:tensorflow:Step #29994: rate 0.000020, accuracy 89.00%, cross entropy 0.282349
INFO:tensorflow:Step #29995: rate 0.000020, accuracy 96.00%, cross entropy 0.124107
INFO:tensorflow:Step #29996: rate 0.000020, accuracy 97.00%, cross entropy 0.140024
INFO:tensorflow:Step #29997: rate 0.000020, accuracy 97.00%, cross entropy 0.128435
INFO:tensorflow:Step #29998: rate 0.000020, accuracy 94.00%, cross entropy 0.151532
INFO:tensorflow:Step #29999: rate 0.000020, accuracy 96.00%, cross entropy 0.143669
INFO:tensorflow:Step #30000: rate 0.000020, accuracy 93.00%, cross entropy 0.231711
INFO:tensorflow:Confusion Matrix:
 [[258   0   0   0   0   0   0   0   0   0   0   0]
 [  1 216   1   3   3   6   6   7   8   2   0   5]
 [  3   2 247   4   0   1   2   0   0   0   0   2]
 [  0   9   0 248   1   2   2   0   0   0   0   8]
 [  2   7   0   0 235   0   0   0   0  14   2   0]
 [  0   4   1  16   0 233   0   0   0   0   1   9]
 [  1   2  12   1   1   0 227   2   1   0   0   0]
 [  0   7   0   1   1   0   2 243   1   1   0   0]
 [  4   4   0   0   2   1   2   0 239   5   0   0]
 [  0   1   0   0  13   0   1   0   2 235   2   2]
 [  2   3   0   0   8   0   2   0   0   4 226   1]
 [  6   8   1  10   2   1   0   0   2   5   1 224]]
INFO:tensorflow:Step 30000: Validation accuracy = 91.53% (N=3093)
INFO:tensorflow:So far the best validation accuracy is 92.31%
INFO:tensorflow:set_size=3081
INFO:tensorflow:Confusion Matrix:
 [[257   0   0   0   0   0   0   0   0   0   0   0]
 [  0 226   2   3   0   0   3   5   3   2   5   8]
 [  0   5 239   1   0   2   7   1   0   0   0   1]
 [  0   2   1 237   3   3   2   0   0   1   0   3]
 [  0   2   0   0 248   0   3   0   3  11   3   2]
 [  1   5   0   7   2 230   1   0   0   0   0   7]
 [  0   3   4   0   2   0 256   2   0   0   0   0]
 [  1   7   1   0   1   1   2 245   0   0   1   0]
 [  0   2   0   0   2   1   1   1 238   1   0   0]
 [  0   2   0   0   6   1   1   0   8 240   2   2]
 [  0   6   1   1   5   3   0   1   0   1 228   3]
 [  0   4   1   7   1   2   3   4   1   6   0 222]]
INFO:tensorflow:Final test accuracy = 93.02% (N=3081)

# python test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --checkpoint work/DS_CNN/DS_CNN1/training/be/ds_cnn_9230.ckpt-12400
2018-06-07 15:49:56.929040: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2018-06-07 15:49:57.088431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:65:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-06-07 15:49:57.088490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-07 15:49:57.311486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-07 15:49:57.311542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-06-07 15:49:57.311552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-06-07 15:49:57.311777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10413 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2018-06-07 15:50:25.113020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-07 15:50:25.113082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-07 15:50:25.113092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-06-07 15:50:25.113100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-06-07 15:50:25.113310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10413 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
INFO:tensorflow:wav_filename_placeholder: Tensor("Placeholder:0", shape=(), dtype=string)
INFO:tensorflow:Restoring parameters from work/DS_CNN/DS_CNN1/training/best/ds_cnn_9230.ckpt-12400
INFO:tensorflow:set_size=22246
INFO:tensorflow:Confusion Matrix:
 [[1827    0    0    0    0    0    0    0    0    0    0    0]
 [   0 1723    0   89    0   31    1    0    0    0    0   10]
 [   0 1509  150  186    0   36    3    0    0    0    0    1]
 [   0  635    0 1152    0   23    0    0    0    0    0   15]
 [   0 1762    0   97    0   38    2    0    0    0    0    8]
 [   0 1191    0  196    0  386    0    0    0    0    0   10]
 [   0 1572   12  187    0   19   51    0    0    0    1    3]
 [   1 1793    0   36    0   21    9   23    0    0    0    6]
 [   0 1760    1   50    0   55    0    0    2    0    0    4]
 [   0 1635    2  111    0   26    0    0    0    0    0   13]
 [   0 1774    3   95    0   79    1    0    0    0    0    5]
 [   0 1210    0  390    0   53    3    0    0    0    0  159]]
INFO:tensorflow:Training accuracy = 24.60% (N=22246)
INFO:tensorflow:set_size=3093
INFO:tensorflow:Confusion Matrix:
 [[258   0   0   0   0   0   0   0   0   0   0   0]
 [  0 240   0  12   0   4   0   0   0   0   0   2]
 [  0 200  21  33   0   7   0   0   0   0   0   0]
 [  0 109   0 156   0   5   0   0   0   0   0   0]
 [  0 243   0  11   0   3   1   0   0   0   0   2]
 [  0 178   0  38   0  40   0   0   0   0   0   8]
 [  0 222   0  15   0   2   7   0   0   0   0   1]
 [  0 248   0   3   0   1   0   1   0   0   0   3]
 [  0 243   0   9   0   4   0   0   0   0   0   1]
 [  0 242   0   9   0   3   0   0   0   0   0   2]
 [  1 228   0  13   0   4   0   0   0   0   0   0]
 [  0 175   0  56   0   5   0   0   0   0   0  24]]
INFO:tensorflow:Validation accuracy = 24.15% (N=3093)
INFO:tensorflow:set_size=3081
INFO:tensorflow:Confusion Matrix:
 [[257   0   0   0   0   0   0   0   0   0   0   0]
 [  0 240   0  10   0   6   0   0   0   0   0   1]
 [  0 193  21  31   0   9   0   0   0   0   0   2]
 [  0  84   0 158   0   8   0   0   0   0   0   2]
 [  0 244   0  14   0  12   1   0   0   0   0   1]
 [  0 147   0  40   0  64   0   0   0   0   0   2]
 [  0 220   1  33   0   3   9   0   0   0   0   1]
 [  0 234   0   9   0   7   2   3   0   0   0   4]
 [  0 221   0  12   0  13   0   0   0   0   0   0]
 [  0 223   1  26   0   8   0   0   0   0   0   4]
 [  0 203   0  17   0  28   0   0   0   0   0   1]
 [  0 153   1  67   0  12   0   0   0   0   0  18]]
INFO:tensorflow:Test accuracy = 24.99% (N=3081)
#

Incompatibility problem with mbed-os

Hi,

I found that this example is not compatible with mbed-os: if I use the mbed::Timer class (Timer T;), the code causes a BusFault.
I have tested it on the FRDM-K64F and STM32L476-DISCO boards with mbed-os-5.5.3 and mbed-os-5.7.7, and the results were the same.
I also noticed that even in the latest mbed-os, the cmsis_dsp library is located in features/unsupported/dsp/cmsis_dsp.
Have you encountered the same problem?

Thanks!

DS-CNN accuracy reported by test.py

Hi,
If I am not mistaken, test.py evaluates the dataset against a checkpoint and reports the results as a confusion matrix.
When I use test.py for the DNN with my dataset, which is just an older version of the Google dataset, I see better test accuracy for the DNN than for the DS-CNN (~80% vs. ~60%), which is inconsistent with the accuracies reported in the article; the DS-CNN is expected to perform better.
I use the same small model_size_info and other parameters provided here, for both train.py and test.py. Do you see the same accuracies? @navsuda

quant_test returns an incorrect number of fractional bits?

After quant_test.py finishes, I see that the number of fractional bits for the fixed-point representation is 9 or even 14:

final_fc_0 number of wts/bias: (144, 12) dec bits: 9 max: (0.20117188,0.20042063) min: (-0.19726562,-0.19753574)
Variable_0 number of wts/bias: (12,) dec bits: 14 max: (0.0051879883,0.0051957406) min: (-0.002319336,-0.0023462817)

This seems like a bug. If the weight/bias values are in the range (-1, 1), shouldn't the number of integer bits be 0? With the formula int(log2(value)), the number of integer bits becomes negative (because the log of a value < 1 is negative), which makes the number of fractional bits greater than 7.
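
(For illustration, a small NumPy sketch of this style of Q-format selection; it is a sketch of the formula discussed above, not necessarily the exact code in quant_test.py. With q7 there are 7 bits available besides the sign bit, so dec_bits = 7 - int_bits, and when all values are well inside (-1, 1) the integer-bit estimate goes negative and dec_bits exceeds 7, exactly as in the output above.)

import numpy as np

def q7_dec_bits(values):
    # Integer bits needed to cover the largest magnitude; this goes negative
    # when every value is well inside (-1, 1), e.g. max |v| = 0.2 -> -2.
    int_bits = int(np.ceil(np.log2(np.max(np.abs(values)))))
    return 7 - int_bits

print(q7_dec_bits(np.array([0.20117188, -0.19726562])))      # 9, as for final_fc_0
print(q7_dec_bits(np.array([0.0051957406, -0.0023462817])))  # 14, as for Variable_0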

How to use LFBEs in a TensorFlow environment

Dear Naveen,

I was able to run this code and train the networks on a different set of classes. However, the Hello Edge paper mentions that we can train on either LFBE or MFCC features. Could you please guide me on preprocessing with log Mel filterbank energies (LFBEs)? I am new to TensorFlow and having a hard time understanding gen_audio_ops.
I tried to use librosa, but I am not getting meaningful results. Any help would be highly appreciated. Thank you.

There is an error when I try to deploy on the target.

1. Build and run a simple KWS inference
2. mbed new kws_simple_test --mbedlib
3. ERROR: An error occurred while unpacking library archive ".bld.rev-5aab5a7997ee.zip" in "c:\Users\Administrator\Desktop\ML-KWS-for-MCU\Deployment\kws_simple_test\mbed"

Training the model with my own data

Hi,
As train.py describes, the program supports training with my own data. However, when I point --data_dir to my own .wav files, it says: IOError: CRC check failed 0x7668da3b != 0x316d6d75L. Have you ever tested the program with a database other than the Google recordings? Thanks.

Log-mel filter bank features

Hi! Thank you for sharing your ideas and code. I'm a newbie in DL and I'm learning a lot from your material.

I would like to run these examples with log-mel filterbanks as the input features to the neural network.

How can I do it?

Stack overflow

The "int16_t audio_buffer[16000]=WAVE_DATA;" should not be defined in the main function because it will result in stack overflow when running with some other boards such K64F.
This variable should defined as global variables and store in heap.

Question: Extend to larger network

Thanks for extending the TF audio speech command app & making this available.

  1. Say the hardware specs allow a larger network to be trained (think XL or 2XL compared to the S (64), M (172), and L (276) sizes mentioned in the paper); what would be the best way to train such a network? How did you arrive at 64, 172, and 276 (trial and error, or systematically)?

I tried the following:
python train.py --model_architecture ds_cnn --model_size_info 9 276 10 4 2 1 276 3 3 2 2 276 3 3 1 1 276 3 3 1 1 276 3 3 1 1 276 3 3 1 1 276 3 3 1 1 276 3 3 1 1 276 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DS_CNN_9layer/retrain_logs --train_dir work/DS_CNN_9layer/training

It's interesting that a deeper network didn't improve validation accuracy.

  2. The training commands in train_commands.txt were super useful; I wish more GitHub projects were as complete as this repo. However, after running a few of the training commands, I noticed that validation accuracy approached 99 to 100%. I wonder how many iterations you stopped at (i.e. which checkpoint file you used) to generate the pretrained models you provided?

Quant_guide.md example command failed

I want to run quantization as described in Quant_guide.md. I run the command as specified there, but I get an error:
$ python quant_test.py --model_architecture dnn --model_size_info 144 144 144 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 40 --checkpoint Pretrained_models/DNN/DNN_S.pb
/home/arteev/.local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
2018-07-10 20:11:37.959657: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Traceback (most recent call last):
  File "quant_test.py", line 308, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/arteev/tmp/ML-KWS-for-MCU-virtualenv-python27/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "quant_test.py", line 206, in main
    FLAGS.model_architecture, FLAGS.model_size_info)
  File "quant_test.py", line 79, in run_quant_inference
    is_training=False)
  File "/home/arteev/work/src/ML-KWS-for-MCU/quant_models.py", line 115, in create_model
    act_max, is_training)
  File "/home/arteev/work/src/ML-KWS-for-MCU/quant_models.py", line 163, in create_dnn_model
    if(act_max[i]!=0):
IndexError: list index out of range

Does Quant_guide.md contain a correct example command? It seems --model_size_info 144 144 144 is not correct.

How to execute training and other steps to generate the example

Hello all,
I'm trying to run the training and the other steps again to reproduce the .pb and weights.h files.
I use these commands:

python train.py --model_architecture dnn --model_size_info 144 144 144 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 40 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DNN/DNN1/retrain_logs --train_dir work/DNN/DNN1/training

python test.py --model_architecture dnn --model_size_info 144 144 144 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 40 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DNN/DNN1/retrain_logs --train_dir work/DNN/DNN1/training --checkpoint /home/embedded/ML-KWS-for-MCU/work/DNN/DNN1/training/best/dnn_8448.ckpt-29600

python freeze.py --model_architecture dnn --model_size_info 144 144 144 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 40 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DNN/DNN1/retrain_logs --train_dir work/DNN/DNN1/training --checkpoint /home/embedded/ML-KWS-for-MCU/work/DNN/DNN1/training/best/dnn_8448.ckpt-29600 --output_file dnn.pb

But when running this last script, I get an error:
python quant_test.py --model_architecture dnn --model_size_info 144 144 144 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 40 --checkpoint /home/embedded/ML-KWS-for-MCU/work/DNN/DNN1/training/best/dnn_8448.ckpt-29600

The error is:
Traceback (most recent call last):
  File "quant_test.py", line 305, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/embedded/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "quant_test.py", line 203, in main
    FLAGS.model_architecture, FLAGS.model_size_info)
  File "quant_test.py", line 79, in run_quant_inference
    is_training=False)
  File "/home/embedded/ML-KWS-for-MCU/quant_models.py", line 115, in create_model
    act_max, is_training)
  File "/home/embedded/ML-KWS-for-MCU/quant_models.py", line 160, in create_dnn_model
    if(act_max[i]!=0):
IndexError: list index out of range

Please, where am I going wrong?
Can someone publish the correct commands? I want to rebuild and test the project, then replace the audio with different audio (in my case, urban sounds) and repeat the build.

Thanks a lot.
Clemente

CRNN question

Dear @navsuda,
Sorry to bother you again.
1. In Figure 3 of your paper, the CRNN feeds the outputs of all GRU timesteps to the fully connected layer, but in the code only the last timestep's output is fed to the fully connected layer. I believe the original CRNN paper (Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting) uses all the timesteps. What was your reasoning?
2. The original CRNN paper also uses a bidirectional GRU and layer normalization, but your code seems to use neither. What was your consideration?
3. The original CRNN paper uses DeepSpeech2 to align the keyword audio, but Google's speech commands dataset does not align the keyword; it only guarantees that the keyword is somewhere within the .wav, i.e. the format is [random length of filler, keyword, remaining length of filler].
Actually, I don't understand how the original CRNN aligns the keyword: is it
[zeros, keyword],
[keyword, zeros],
or [random length of filler, keyword, remaining length of filler]?
Did you consider this? I think alignment should affect performance, because the fully connected layer cannot handle time-shift invariance.
4. Do you know where I can get another large keyword spotting dataset? The Google speech commands dataset has only about 2000 audio files per keyword, which may not be enough.

Thanks
Jinhong

Extract weight from checkpoint

Hi,
I found I can run quant_test.py with the DNN model,
but running quant_test.py with the DS-CNN model gives an error (after copying the create_ds_cnn_model method from models.py into quant_models.py).
The failing line is quant_test.py -> np.savetxt(f, var_values.transpose(), fmt='%d', delimiter=', ', newline=', ').
How can I fix it?

Issue with MFCC banks noise in realtime test

When I run the code, unmodified, for KWS inference on live audio on the STM32F746NG development kit, I get a solid band in the MFCC features that severely impacts the accuracy. This band exists even when audio.IN.SetVolume() is turned down to 20. I can't hear any noise when listening through the line out. The band also exists when the audio input is changed from the onboard MEMS mic to Line 1, so it is not an issue with the microphone itself. A photo taken with the volume lowered to 20 is attached. I would really appreciate any ideas for eliminating this issue.

Quantization

Hi,

In conclusion of the paper, it states "We quantized representative trained 32-bit floating-point KWS models into 8-bit fixed-point versions... ".

Are the pretrained models already in 8-bit fixed point?

TensorFlow .pb file export

I have trained several models with train.py and have the .pb files. But how do we swap in our own models when loading them in the deployment examples? All I see is an already compiled .bin file.

Question about the supported STM32 board list

Hello, thank you very much for the Arm team's effort.

I have several questions about STM32 board support.
We already have several STM32 boards (STM32F051, F103RB, F407IG),

but these boards do not work in the mbed-cli environment (not supported),
so we purchased other boards in order to use mbed-cli.

  1. Is there another way to run ML-KWS on my STM32 boards?

  2. I have also seen another developer's repo that implements KWS on a Cortex-M4 board.
    Is it possible to apply ARM-KWS to any other board that supports mbed-cli?

Thank you for your support.

Model size info for DS_CNN_L.pb

Hi,
I just want to test with the DS-CNN model, but I can't determine the value of "model_size_info".

So I tried to read the variable values from DS_CNN_L.pb with:

import tensorflow as tf

with tf.Graph().as_default():
    output_graph_def = tf.GraphDef()
    output_graph_path = './Pretrained_models/DS_CNN/DS_CNN_L.pb'
    #sess.graph.add_to_collection("input", mnist.test.images)

    # Load the frozen graph and import it into the default graph.
    with open(output_graph_path, "rb") as f:
        output_graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(output_graph_def, name="")

    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        # Try to read one of the weight tensors by name.
        input_x = sess.graph.get_tensor_by_name("DS-CNN/fc1/weights:0")
        print(input_x)

But it returns an error:
KeyError: "The name 'DS-CNN/conv_1/weights:0' refers to a Tensor which does not exist. The operation, 'DS-CNN/conv_1/weights', does not exist in the graph."

So I want to ask: what is the model_size_info of Pretrained_models/DS_CNN/DS_CNN_L.pb, and why can't I read variables from it?

Thank you!
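
(A possible explanation, with a sketch: a graph frozen with freeze.py typically has its trained variables converted to Const nodes, and batch norms may be folded, so looking up Variable tensors by their training-time names can fail. Listing the constant tensors and their shapes, as below for TensorFlow 1.x, is one way to inspect what is actually stored in the .pb; this is illustrative and not the repository's own tooling.)

import tensorflow as tf
from tensorflow.python.framework import tensor_util

graph_def = tf.GraphDef()
with open('./Pretrained_models/DS_CNN/DS_CNN_L.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# In a frozen graph the learned weights live in Const nodes, so enumerate
# those and print their names and shapes instead of fetching Variables.
for node in graph_def.node:
    if node.op == 'Const':
        value = tensor_util.MakeNdarray(node.attr['value'].tensor)
        print(node.name, value.shape)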

Kernel size of DS-CNN's Conv1

Hi, I read through the paper but the kernel size is still not clear to me, so let me ask:

  1. How did you decide on the DS-CNN kernel size for Conv1 (4 x 10)?
  2. Is it "time by feature" or "feature by time"? In your code it looks like the size along the feature axis is 4 and along the time axis is 10, but to me it would make more sense if 10 were for features and 4 for time.

I ask because it does not seem clear what the best kernel shape is, especially for speech, as questioned in this article:
https://towardsdatascience.com/whats-wrong-with-spectrograms-and-cnns-for-audio-processing-311377d7ccd

I would appreciate it if you could share any of your thoughts from when you designed the DS-CNN.
Thank you.

realtime_test compiling steps fail with "mbed tools not found"

Hi,

I cannot get the realtime_test example to work following the steps here.

mbed new kws_realtime_test --create-only
produces the warning
[mbed] WARNING: Cannot find the mbed tools directory in...

The warning is repeated in the output of the command mbed deploy

Finally, mbed compile ... fails with
[mbed] ERROR: The mbed tools were not found in ...
[mbed] ERROR: Run mbed deploy to install dependencies and tools.

Is this a naming problem (mbed instead of mbed-os), or new behavior of mbed-cli?

Thanks

How can I run these models on an MCU?

Thanks for your open-source project. In your paper you deployed the KWS application on a Cortex-M7, but this project only contains TensorFlow models, so what should I do to run these models on a Cortex-M7? Which system should the Cortex-M7 run, an RTOS or Ubuntu? And how can I port TensorFlow to the Cortex-M7? I would appreciate any suggestions!

About model's posterior on devices

Hi,
Thank you for releasing your awesome work.
I would like to ask some questions about the model's posteriors on the device.

According to the paper,

KWS is running at 10 inferences per second.

(1) Do you smooth the inference scores? If so, how often do you smooth the data, and what algorithm do you use?
(2) How is the confidence score computed? What algorithm do you use?

Thank you for your time to answer these questions.

provided DS-CNN accuracy (on board)

Hi,

When testing the provided DS-CNN on my board, the accuracy seems very low. In particular, in the realtime example the model almost always predicts "left" as soon as there is any sound.
Everything seems fine when listening to the audio loopback, and the DNN gives much better results.
Are there specific parameters to use with the DS-CNN in order to get high accuracy?

Thanks
