quantized-cnn's Introduction

Quantized-CNN for Mobile Devices

Quantized-CNN is a novel framework for convolutional neural networks (CNNs) that simultaneously accelerates computation and compresses the model in the test phase. Mobile devices can perform efficient on-site image classification via Quantized-CNN, with only a negligible loss in accuracy.

Installation

We have prepared a file (500+MB) containing 1k images drawn from the ILSVRC-12 validation set for a more accurate speed-test. You can download it from here, and put it under the "ILSVRC12.227x227.IMG" directory.

For the original AlexNet model, you can download the corresponding model files from here, and put them under the "AlexNet/Bin.Files" directory.

Updates (23/08/30): The AlexNet model file and validation images have been moved to Google Drive.

Prior to compilation, you need to install ATLAS and OpenVML, and modify the "CXXFLAGS" and "LDFLAGS" entries in the Makefile if needed. Also, append the corresponding library paths to LD_LIBRARY_PATH in your ~/.bashrc. After that, use "make" to generate the executable and "make run" to perform the speed-test with the above 1k images.
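As a sketch, the steps above might look like the following; the library paths are placeholders and depend on where ATLAS and OpenVML are installed on your system:

```shell
# Placeholder paths -- substitute your actual ATLAS / OpenVML locations.
echo 'export LD_LIBRARY_PATH="/opt/OpenVML/lib:/usr/lib/atlas-base:$LD_LIBRARY_PATH"' >> ~/.bashrc
source ~/.bashrc

make        # builds the executable
make run    # runs the speed-test on the 1k validation images
```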

You can also use our code for single image classification (BMP format). Please refer to "src/Main.cc" for details.

Speed-test

The experiment is carried out on a single desktop PC, equipped with an Intel® Core™ i7-4790K CPU and 32GB RAM. All programs are executed in the single-thread mode, without GPU acceleration. Note that the run-time speed comparison result may vary under different hardware conditions.

We compare the run-time speed on AlexNet, for which Quantized-CNN's theoretical speed-up is 4.15×. For the baseline, we use the Caffe implementation, compiled with ATLAS (its default BLAS choice). We measure the forward-pass time per image, averaged over 100 batches. Each batch contains a single image, since in practice a user usually takes one photo with a cellphone and then feeds it into the ConvNet for classification. The experiment is repeated five times; the results are:

Run    CNN (ms)    Quantized-CNN (ms)    Speed-up
1      167.431     55.346                -
2      168.578     55.382                -
3      166.120     55.372                -
4      172.792     55.389                -
5      164.008     55.250                -
Ave.   167.786     55.348                3.03×

Quantized-CNN achieves 3.03× speed-up against the Caffe implementation, slightly lower than the theoretical one but still quite acceptable. Meanwhile, our method requires much less memory and storage space, which is critical for mobile applications.
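The averages and the 3.03× figure can be recomputed directly from the per-run times in the table:

```python
# Per-run forward-pass times (ms), copied from the table above.
cnn  = [167.431, 168.578, 166.120, 172.792, 164.008]
qcnn = [55.346, 55.382, 55.372, 55.389, 55.250]

avg_cnn = sum(cnn) / len(cnn)      # 167.786 ms
avg_qcnn = sum(qcnn) / len(qcnn)   # 55.348 ms
speedup = avg_cnn / avg_qcnn       # ~3.03x
```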

Citation

Please cite our paper if it helps your research:

@inproceedings{wu2016quantized,
  author = {Jiaxiang Wu and Cong Leng and Yuhang Wang and Qinghao Hu and Jian Cheng},
  title = {Quantized Convolutional Neural Networks for Mobile Devices},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2016},
}

quantized-cnn's People

Contributors

jiaxiang-wu


quantized-cnn's Issues

Quantizing the Fully-connected Layer ---> look-up table

Hi! In your paper, with respect to quantizing the FC layers, you divide the weight matrix into M subspaces, each represented by a product of D and B. During the test phase, you store the results of the inner products between S(m) and every sub-codeword in D(m) in a look-up table. Since different input images give different S(m), how does a look-up table work? Thank you!
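If I read the paper right, the look-up table is rebuilt for every input at test time, so it never has to cover all possible S(m). A minimal pure-Python sketch with made-up toy sizes (C_in, C_out, M, K are not the paper's hyper-parameters):

```python
import random

# Toy sizes, for illustration only -- not the paper's settings.
C_in, C_out = 8, 6      # FC layer input / output dimensions
M, K = 4, 3             # M subspaces, K sub-codewords per subspace
d = C_in // M           # length of each sub-vector S(m)

random.seed(0)

def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

# D[m] is subspace m's codebook (K sub-codewords of length d);
# B[m][c] is the codeword index assigned to output unit c in subspace m.
D = [[rand_vec(d) for _ in range(K)] for _ in range(M)]
B = [[random.randrange(K) for _ in range(C_out)] for _ in range(M)]

def quantized_fc(x):
    # The look-up table is built PER INPUT: for each subspace, the
    # sub-vector S(m) of x is dotted with every sub-codeword -- only
    # M*K small inner products in total.
    lut = [[sum(a * b for a, b in zip(x[m*d:(m+1)*d], D[m][k]))
            for k in range(K)] for m in range(M)]
    # Each output is then M table look-ups plus additions, instead of
    # a full C_in-dimensional inner product.
    return [sum(lut[m][B[m][c]] for m in range(M)) for c in range(C_out)]

y = quantized_fc(rand_vec(C_in))
```

The result matches what you would get by concatenating each output unit's assigned codewords into an explicit weight vector and taking the full inner product.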

Training Caffe models with Quantized-CNN

Hi,
Thanks for the code reproducing the paper; I was able to reproduce the test-phase computation following the directions. However, I was unable to find any code to generate the compact-representation (BIN) files for Caffe models: AlexNet is provided in the repo, but there is no code to create the BIN files for models such as VggCnnS or CaffeNet. From reviewing the code, it basically loads precomputed codes from BIN files and then performs the optimized layer operations on them.

Do you have any instructions or code on how to use pretrained Caffe models and generate the corresponding BIN files?

help

Sorry to trouble you, but I could not download the two files from OneDrive. Could you provide another link? Thanks.

"make run" fails

hello!
Thanks for the code! I was following the installation steps; the command is
make run
It seemed fine at the start, but it soon failed with this error:
...
swIndvLayerLst #13: 0.5279 (s)
swIndvLayerLst #14: 0.0028 (s)
swIndvLayerLst #15: 0.0038 (s)
swIndvLayerLst #16: 0.4414 (s)
swIndvLayerLst #17: 0.0003 (s)
swIndvLayerLst #18: 0.0003 (s)
swIndvLayerLst #19: 0.1936 (s)
swIndvLayerLst #20: 0.0003 (s)
swIndvLayerLst #21: 0.0002 (s)
swIndvLayerLst #22: 0.1903 (s)
swIndvLayerLst #23: 0.0024 (s)
elapsed time: 7.9943 (s)
*** Error in `bin/QuanCNN': double free or corruption (!prev): 0x00000000019072c0 ***
make: *** [run] Aborted (core dumped)

That looks to me like a memory-management problem.
Have you ever seen this, or did I get a step wrong?

Missing file

Hello,
Thanks for the code reproducing the paper.
However, when I set the mode to "ENUM_CompMethod::Prec" (the mode corresponding to the original Caffe computation), I see:
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.convKnl.01.bin
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.convKnl.05.bin
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.convKnl.09.bin
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.convKnl.13.bin
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.fcntWei.16.bin
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.fcntWei.19.bin
[ERROR] could not open file at .//AlexNet/Bin.Files/bvlc_alexnet_aCaF.fcntWei.22.bin
etc..
It seems you forgot to upload these files.
Without them, I can only measure the test time of Q-CNN and cannot compare Q-CNN against Caffe.
I hope you can upload them.
Thank you very much~

Error correction and fine-tune questions

Hello!!
I have two questions about the implementation.

  1. How much training data is used in the error-correction step? Do the FC and conv layers use the same amount of training data for error correction?
  2. In the paper, you mention fine-tuning after quantization. How does the fine-tuning work? Does it preserve the structure of D and B?

Thanks!!

question about your paper

Hello, thank you for your contribution to model quantization.
I have some questions from reading your paper:
when you quantize the conv kernels and FC weights, what do you mean by pre-computation?
I cannot understand it; the input is not known during the test phase, so how is the pre-computation realized?

help

If I want to test the original model, what should I do? I put the original model file in the directory, but nothing changed.

help

Hello, is the Q-CNN code open-sourced on your home page also about quantization-based acceleration? Could you write a short description of how to use it? I would appreciate it, thanks.

questions about codebook

Hi,
I still have several questions after reading your code and paper.

If the FC layer's parameters form a matrix W and the input is a vector x, then the inference is y = Wx.

  • After splitting W into M sub-matrices, do the M sub-matrices share the same codebook, or does each have its own (so M codebooks in total are stored)?
  • Is the labelled test set needed to compute D(m) and B(m)?
  • For each subspace, we compute the inner products between S(m) and every sub-codeword in D(m) and store the results in a look-up table. But S(m) is a slice of the input, whose value is only known at inference time (e.g., when a new picture reaches the CNN), so how can the inner products be computed in advance?

I may not totally understand the method, could you please explain those questions? Thanks a lot!
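On the first question: as I understand the method, each subspace keeps its own codebook, so M codebooks of K codewords each are stored, and they are learned from the weights themselves (via k-means), not from the test set. A toy sketch with made-up sizes, using plain Lloyd iterations as a stand-in for the paper's clustering:

```python
import random

# Toy sizes, for illustration only.
C_in, C_out, M, K = 8, 10, 4, 3
d = C_in // M           # length of each weight sub-vector

random.seed(1)
W = [[random.uniform(-1.0, 1.0) for _ in range(C_in)] for _ in range(C_out)]

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# One independent codebook per subspace: M codebooks (K codewords each)
# are stored in total, learned from the WEIGHTS, not from the test set.
D, B = [], []
for m in range(M):
    subs = [row[m*d:(m+1)*d] for row in W]    # each output unit's sub-vector
    codebook = random.sample(subs, K)         # crude init (paper uses k-means)
    for _ in range(10):                       # a few Lloyd iterations
        assign = [min(range(K), key=lambda k: dist2(s, codebook[k]))
                  for s in subs]
        for k in range(K):
            members = [s for s, a in zip(subs, assign) if a == k]
            if members:                       # update codeword to cluster mean
                codebook[k] = [sum(col) / len(members)
                               for col in zip(*members)]
    D.append(codebook)
    B.append(assign)
```

The third question is answered by the test-phase procedure itself: the look-up table is filled in at inference time, once per input, which is still far cheaper than the full matrix product.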

Other Model Architectures

Hello -- Will you please share guidance on how to apply the quantization on other model architectures, such as Squeezenet? Thank you in advance.

OpenVML installation

Which folder should I point to for the OpenVML installation?
When I run make, it returns:

mkdir -p bin
mkdir -p obj
g++ -I/usr/include/atlas -I/opt/OpenVML/include -Wall -std=c++11 -O2 -D ENABLE_ATLAS -D ENABLE_OPENVML -c src/BlasWrapper.cc -o obj/BlasWrapper.o
In file included from src/BlasWrapper.cc:8:0:
src/../include/BlasWrapper.h:84:23: fatal error: openvml.h: No such file or directory
   #include <openvml.h>
                       ^
compilation terminated.
make: *** [obj/BlasWrapper.o] Error 1

Program crashed right before ends: Double release buffer in ~CaffeEva::~CaffeEva(void)

In CaffeEva::~CaffeEva(void), I think calling ~Matrix() manually is inappropriate. It causes a double-release problem, since those matrices will be destroyed automatically.
This should be enough ^_^:
CaffeEva::~CaffeEva(void) {
  // release dynamically allocated memory
  delete[] featMapLst;
  for (int layerInd = 0; layerInd < caffeParaObj.layerCnt; layerInd++) {
    FeatBufStrLst& featBufStrLst = featBufStrMat[layerInd];
    for (std::size_t bufInd = 0; bufInd < featBufStrLst.size(); bufInd++) {
      if (featBufStrLst[bufInd].pFeatBuf != nullptr) {
        delete featBufStrLst[bufInd].pFeatBuf;
      }  // ENDIF: featBufStrLst
    }  // ENDFOR: bufInd
  }  // ENDFOR: layerInd

  // destroy objects
}

Is QCNN suitable to fully convolution net(FCN)?

Dear @jiaxiang-wu,
I noticed that the input size of all the example nets is fixed. What if the test images differ in size from each other, e.g., in pixel-wise classification tasks run on an FCN? Theoretically, can the quantization technique be applied to FCNs? If yes, what modifications to this project would be needed?
Best regards

The result is weird.

Hi @jiaxiang-wu, thanks for your demo. I set bool kEnblAprxComp to false or true to test the time on 1000 images.

bool kEnblAprxComp = false; the result is :
ACCURACY@1: 542, 54.20%
elapsed time: 143.5600 (s)

bool kEnblAprxComp = true; the result is :
ACCURACY@1: 545, 54.50%
elapsed time: 116.8000 (s)

The results are weird:

  1. The approximate computation is more accurate than the exact one.
  2. The times are almost the same. Why?

Implementation issues with error correction for the fully-connected layer

I tried to implement error correction for the FC layer, but as the updates of D(m) and B(m) proceed, some subspaces' assignment tables B(m) tend to collapse onto only a few sub-codewords, which aborts the updating.

I tried to figure out the reason; it seems I may have misunderstood the definition of the residual. The following snippet is my implementation of computing the residual of one subspace over all inputs:

def conpute_all_residual_on_one_dim(self, sub_dim_index):
    construct_ouptput = np.zeros(self.N_resudial.shape)
    for i in xrange(self.num_sub_dims):
        if sub_dim_index == i:
            continue
        table_B = self.from_asmt_data_get_index_table(i)
        # print table_B.shape, self.centeroid_data[i].shape
        res_dot = np.dot(table_B, self.centeroid_data[i].T)  # dot([4*64], [64*6916])
        # sum up residual construct by different subspace
        # N*6915 = dot([N*4], dot([4*64], [64*6916]))
        construct_ouptput += np.dot(
            self.feat_in[:, i*self.len_sub_dim:(i+1)*self.len_sub_dim], res_dot.T)
    self.N_resudial = self.feat_out - construct_ouptput

@jiaxiang-wu

question about pre-computed LUT

Hi @jiaxiang-wu,
You show how to quantize the fully-connected layer weights via codebooks, but the paper gives no instruction on how the input S(m) could be quantized, so I am not very clear on how the LUT can be pre-computed. Could you please explain this in more detail?
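As far as I can tell, the input S(m) is never quantized: the LUT is computed at test time, once per input, and "pre-computed" only means "before the per-output accumulation". The point is that building the LUT is cheap relative to the dense matrix product, as a toy mult-add count shows (the layer sizes below are made up purely to make the arithmetic concrete):

```python
# Hypothetical layer sizes, chosen only to make the arithmetic concrete.
C_in, C_out, M, K = 4096, 4096, 512, 16
d = C_in // M                      # sub-vector length: 8

dense_cost = C_in * C_out          # ordinary FC layer: 16,777,216 mult-adds
lut_cost = M * K * d               # build the LUT from the CURRENT input: 65,536
lookup_cost = C_out * M            # accumulate all outputs via look-ups: 2,097,152

speedup = dense_cost / (lut_cost + lookup_cost)   # ~7.8x for these sizes
```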

Not able to read binary file

I am not able to read the binary files in the AlexNet/Bin.Files directory, which I downloaded from the URL mentioned in the README. I get the following error:

static bool FileIO::ReadBinFile(const string&, Matrix*) [with T = float; std::__cxx11::string = std::__cxx11::basic_string]: Assertion `rtnVal == dataCntInBuffer' failed.

It would be great help if you could tell me the cause for this.
