This project forked from artyze/yolov3_lite

YOLOv3 model compression and acceleration (quantization, sparsity), C++ version

Python 4.45% C 10.38% Shell 0.06% Cuda 2.43% C++ 82.58% Makefile 0.10%

yolov3_lite

My models must run on industrial embedded devices with limited computing resources, so I have to compress and accelerate them step by step until the inference time meets our boss's requirement :(

The backbone net of this project is yolov3-lite and an optimized version of it.

While building this project I referred to several Git projects and CVPR papers. Thanks to their authors.

I will keep updating this repo, so please stay tuned.

All acceleration switches can be found in the Makefile.

[What tricks I used]

Multiple Threads

Set OPENMP := 1 in Makefile

If you have run multithreaded code on ARM or x86 chips, you probably know OpenMP.

The next picture shows how OpenMP runs. It uses many tricks to make the threads work well together.
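As a rough illustration of what the `OPENMP` switch enables, here is a minimal sketch of an OpenMP-parallel loop. The function `add_bias_parallel` and its data layout are hypothetical and are not taken from this repo; the only real API used is the standard `#pragma omp parallel for` directive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repo): adds one bias per filter to a
// feature map stored as [filters][spatial], splitting filters across threads.
void add_bias_parallel(std::vector<float>& output,
                       const std::vector<float>& biases,
                       int spatial) {
    const int filters = static_cast<int>(biases.size());
    assert(spatial > 0);
    assert(output.size() == static_cast<std::size_t>(filters) * spatial);

    // Each iteration writes a disjoint slice of `output`, so no locking is
    // needed. Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int f = 0; f < filters; ++f) {
        for (int i = 0; i < spatial; ++i) {
            output[static_cast<std::size_t>(f) * spatial + i] += biases[f];
        }
    }
}
```

Loops like this parallelize well because every filter can be processed independently of the others.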

The result of using OpenMP in this project:

Kernel Mask (net sparsity)

Set MASK := 1 in Makefile

This is a common method for reducing the computation of conv layers. The key point is how to decide which kernels are important and which kernels should be deleted.

In this project, I followed the paper "Accelerating Convolutional Networks via Global & Dynamic Filter Pruning" from Tencent's lab.
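To make the idea of a kernel mask concrete, here is a minimal sketch that ranks the filters of a single layer and keeps only the most salient ones. It uses the L1 norm of each filter as a simple stand-in saliency score; the paper's own criterion and its global, dynamic update are not reproduced here. The names `build_filter_mask` and `keep_ratio` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns one 0/1 entry per filter: 1 = keep, 0 = masked out.
// `weights` holds `filters` filters of `filter_size` values each.
// Assumption: saliency is the L1 norm of the filter (a stand-in criterion).
std::vector<int> build_filter_mask(const std::vector<float>& weights,
                                   int filters, int filter_size,
                                   float keep_ratio) {
    assert(filters > 0 && filter_size > 0);
    assert(weights.size() == static_cast<std::size_t>(filters) * filter_size);
    assert(keep_ratio >= 0.0f && keep_ratio <= 1.0f);

    // Sum of absolute weights per filter.
    std::vector<float> saliency(filters, 0.0f);
    for (int f = 0; f < filters; ++f) {
        for (int i = 0; i < filter_size; ++i) {
            saliency[f] += std::fabs(
                weights[static_cast<std::size_t>(f) * filter_size + i]);
        }
    }

    // Filter indices ordered from most to least salient.
    std::vector<int> order(filters);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&saliency](int a, int b) { return saliency[a] > saliency[b]; });

    // Keep the top `keep_ratio` fraction, mask the rest.
    const int keep = static_cast<int>(std::lround(keep_ratio * filters));
    std::vector<int> mask(filters, 0);
    for (int k = 0; k < keep; ++k) {
        mask[order[k]] = 1;
    }
    return mask;
}
```

A forward pass could then skip every filter whose mask entry is 0, which is where the saving in computation would come from.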

The acceleration result of using the kernel mask in this project:

Weights Prune

Set PRUNE := 1 in Makefile

This method is very simple: you just set every weight below a threshold to 0, so it needs no further introduction.
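A minimal sketch of threshold pruning, assuming the comparison is made on the absolute value of each weight (the README only says "weights < threshold"). The function name `prune_weights` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sets every weight with |w| < threshold to zero, in place.
// Returns the number of weights that fell below the threshold.
std::size_t prune_weights(std::vector<float>& weights, float threshold) {
    assert(threshold >= 0.0f);
    std::size_t below = 0;
    for (float& w : weights) {
        if (std::fabs(w) < threshold) {
            w = 0.0f;
            ++below;
        }
    }
    return below;
}
```

Dividing the returned count by `weights.size()` gives the resulting sparsity of the layer.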

The acceleration result of using kernel mask and weights prune together in this project:

L1 Regularization

L1 regularization can be regarded as another way to reduce the number of kernels. The principle is similar to the methods in other papers that remove kernels based on BN parameters.

YOLO uses L2 regularization by default, so you need to change it to L1 in the code. This method has one disadvantage: you need to change the cfg files after every epoch, because only after an epoch of training do you know how many kernels to keep in each conv layer.

If you want to know more about L2 and L1 regularization in YOLO, you can go to my blog.
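The difference between the two penalties shows up in the weight-decay term of the update: the L2 gradient is proportional to the weight itself, while the L1 gradient is the sign of the weight, which keeps pushing small weights all the way to zero. The sketch below illustrates this under stated assumptions; `add_weight_decay` and the `Regularizer` enum are hypothetical names, not the repo's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Regularizer { L1, L2 };

// Subtracts the weight-decay term from an accumulated update buffer.
// L2: gradient of 0.5 * w^2 is w.   L1: gradient of |w| is sign(w).
void add_weight_decay(const std::vector<float>& weights,
                      std::vector<float>& updates,
                      float decay, Regularizer kind) {
    assert(weights.size() == updates.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        const float w = weights[i];
        // sign(w) is -1, 0, or +1.
        const float sign = static_cast<float>((w > 0.0f) - (w < 0.0f));
        const float grad = (kind == Regularizer::L2) ? w : sign;
        updates[i] -= decay * grad;
    }
}
```

With L1, the decay step has the same size for a weight of 0.01 as for a weight of 10, which is why whole kernels tend to collapse to zero during training.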

The acceleration result of using L1 regularization in this project:

Quantization

In the domain of network acceleration, quantization is always the most important trick. I have implemented two quantization types, which can be switched in the Makefile.

Set QUANTIZATION := 1 in Makefile

This module was imported from AlexeyAB's GitHub repo.

As he explains, this quantization method is based on the theory behind NVIDIA's TensorRT.

When I tested this module, however, it did not work well, so I recently added code for Google's quantization method.

Set QUANTIZATION_GOOGLE := 1 in Makefile

Paper: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [C]// CVPR, 2018

The most novel idea is to plug fake quantization into the training process. As a result, you get the input quantization scale directly after model training, instead of running a separate calibration pass on a calibration dataset.
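Fake quantization can be sketched as a quantize-then-dequantize step: a float value is clamped to a range, snapped to the nearest of a fixed number of evenly spaced levels, and returned as a float again. The code below is a minimal sketch of that idea, not the repo's implementation; `fake_quantize`, `quantization_scale`, and the range parameters are hypothetical names, and the range is assumed to have been tracked during training.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Step size between adjacent quantization levels for [range_min, range_max].
float quantization_scale(float range_min, float range_max, int levels = 256) {
    assert(range_max > range_min && levels >= 2);
    return (range_max - range_min) / static_cast<float>(levels - 1);
}

// Quantize-dequantize: clamps r to the range, rounds it to the nearest
// level, and maps it back to a float. The result is still a float, but it
// can only take one of `levels` distinct values.
float fake_quantize(float r, float range_min, float range_max, int levels = 256) {
    const float scale = quantization_scale(range_min, range_max, levels);
    const float clamped = std::min(std::max(r, range_min), range_max);
    return std::round((clamped - range_min) / scale) * scale + range_min;
}
```

Because the range is already known when training ends, the scale follows from it directly, which is why no separate calibration pass is needed.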

To deploy the project on embedded devices, I also added Google's gemmlowp to darknet.
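The kind of computation a low-precision GEMM library performs can be sketched with a plain reference loop: 8-bit inputs, zero-point offsets, and 32-bit integer accumulators. This is illustrative code only and does not use or mirror the gemmlowp API; `gemm_u8_reference` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference integer matrix multiply, row-major:
//   C[i][j] = sum_p (A[i][p] - a_zero) * (B[p][j] - b_zero)
// A is m x k, B is k x n, C is m x n. Accumulation is done in int32.
std::vector<std::int32_t> gemm_u8_reference(const std::vector<std::uint8_t>& a,
                                            const std::vector<std::uint8_t>& b,
                                            int m, int n, int k,
                                            std::int32_t a_zero,
                                            std::int32_t b_zero) {
    assert(m > 0 && n > 0 && k > 0);
    assert(a.size() == static_cast<std::size_t>(m) * k);
    assert(b.size() == static_cast<std::size_t>(k) * n);

    std::vector<std::int32_t> c(static_cast<std::size_t>(m) * n, 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            std::int32_t acc = 0;
            for (int p = 0; p < k; ++p) {
                const std::int32_t av =
                    static_cast<std::int32_t>(a[static_cast<std::size_t>(i) * k + p]) - a_zero;
                const std::int32_t bv =
                    static_cast<std::int32_t>(b[static_cast<std::size_t>(p) * n + j]) - b_zero;
                acc += av * bv;
            }
            c[static_cast<std::size_t>(i) * n + j] = acc;
        }
    }
    return c;
}
```

Multiplying each accumulator by the product of the two input scales recovers an approximation of the real-valued result.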

Depthwise Conv

Depthwise convolution is the key idea of MobileNet. The author has already merged it into yolov3; I optimized the code so that l.groups can be used in every module.
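Depthwise convolution is the special case of grouped convolution where the number of groups equals the number of channels, so each channel is filtered only by its own kernel. Here is a minimal sketch, assuming stride 1, no padding, and a channel-major layout; `depthwise_conv2d` is a hypothetical function, not the repo's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Depthwise 2D convolution, stride 1, no padding.
// input:   [channels][height][width]
// kernels: [channels][k][k]  (one k x k kernel per channel)
// output:  [channels][height - k + 1][width - k + 1]
std::vector<float> depthwise_conv2d(const std::vector<float>& input,
                                    const std::vector<float>& kernels,
                                    int channels, int height, int width, int k) {
    const int out_h = height - k + 1;
    const int out_w = width - k + 1;
    assert(channels > 0 && k > 0 && out_h > 0 && out_w > 0);
    assert(input.size() == static_cast<std::size_t>(channels) * height * width);
    assert(kernels.size() == static_cast<std::size_t>(channels) * k * k);

    std::vector<float> output(static_cast<std::size_t>(channels) * out_h * out_w, 0.0f);
    for (int c = 0; c < channels; ++c) {
        for (int y = 0; y < out_h; ++y) {
            for (int x = 0; x < out_w; ++x) {
                float sum = 0.0f;
                // Only channel c of the input is read: no cross-channel mixing.
                for (int ky = 0; ky < k; ++ky) {
                    for (int kx = 0; kx < k; ++kx) {
                        sum += input[(static_cast<std::size_t>(c) * height + y + ky) * width + x + kx] *
                               kernels[(static_cast<std::size_t>(c) * k + ky) * k + kx];
                    }
                }
                output[(static_cast<std::size_t>(c) * out_h + y) * out_w + x] = sum;
            }
        }
    }
    return output;
}
```

Compared with a standard convolution, the inner sum over input channels disappears, which is where the reduction in multiply-adds comes from.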

[How to train the repo]

  1. Analyze your original net and decide which modules you need.

  2. Edit the Makefile to enable those modules. For example, to use the kernel mask, just set `MASK=1`.

  3. Start training:

    ./darknet detector train [data_file path] cfg/yolov3.cfg [pretrain weights file]

  4. Start testing. Set `GPU=0`, then run:

    ./darknet detector test [data_file path] cfg/yolov3.cfg [weights file] [image file to detect]

[How to test the repo]

A pretrained model is provided in backup, so you can give it a try :)

  1. Analyze your original net and decide which modules you need.

  2. Edit the Makefile to enable those modules. For example, to use the kernel mask, just set `MASK=1`.

  3. Normal test:

    ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg

  4. Test with NVIDIA quantization:
     1). Set QUANTIZATION := 1
     2). ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg -quantized
  5. Test with Google quantization:
     1). Set QUANTIZATION_GOOGLE := 1
     2). ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg

[Something more]

1. I added F1 score test code. The command is:

./darknet detector f1 [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup

2. I also have some other modules, such as `Hash Compress` and `Huffman Compress`, but for other reasons I cannot release all of them.

3. When I tested all of these methods on the tiny net (not on VGG), they decreased inference time by 30%~50% with very little F1 decrease. If you want it even faster, use quantization. It will surprise you!!!!!
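For reference, the F1 score mentioned in item 1 is the harmonic mean of precision and recall. A minimal sketch of the formula follows; `f1_score` is a hypothetical function, and how the repo's `f1` command matches detections to ground-truth boxes is not shown here.

```cpp
#include <cassert>

// F1 = 2 * precision * recall / (precision + recall)
//    = 2 * TP / (2 * TP + FP + FN)
// tp: true positives, fp: false positives, fn: false negatives.
// Returns 0 when there are no detections and no ground-truth objects.
double f1_score(int tp, int fp, int fn) {
    assert(tp >= 0 && fp >= 0 && fn >= 0);
    const int denom = 2 * tp + fp + fn;
    if (denom == 0) {
        return 0.0;
    }
    return (2.0 * tp) / denom;
}
```

Comparing this value before and after enabling a switch gives a single number for the accuracy cost of each acceleration trick.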

If you want to use my code, please let me know!!!!
