
Dynamic-Network-Surgery-Caffe-Reimplementation

Caffe re-implementation of Dynamic Network Surgery (GPU only; cuDNN is not supported yet).
The official repo is at https://github.com/yiwenguo/Dynamic-Network-Surgery.

Main Differences:

  • We do not prune the bias term.
  • We make the selection of hyper-parameters clearer and more intuitive.
  • We reorganize the code so that you can monitor the weight sparsity of the convolution and fully-connected layers during training (see the sketch after this list).
  • We rewrite the original convolution and inner-product layers instead of creating new classes, which makes it easier to reuse an existing .prototxt without modifying the layer types.
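A minimal pycaffe sketch of such sparsity monitoring, assuming the binary mask is stored as the third param blob of each pruned layer (weight, bias, mask), as the three param blocks in the layer definitions below suggest; the file paths are placeholders:

import caffe

caffe.set_mode_gpu()
net = caffe.Net('/StructPath/train_val.prototxt', '/ModelPath/Ur.caffemodel', caffe.TEST)
for name, params in net.params.items():
    if len(params) >= 3:  # weight, bias, mask (assumed blob order)
        mask = params[2].data
        # sparsity = fraction of masked-out (zero) entries
        print('%s sparsity: %.1f%%' % (name, 100.0 * (mask == 0).mean()))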

How to use?

The same as the original Caffe framework.

$ make all -j8 # USE_NCCL=1 make all -j8 for multi-GPU support
$ ./build/tools/caffe train --weights /ModelPath/Ur.caffemodel --solver /SolverPath/solver.prototxt -gpu 0
$ ./build/tools/caffe test --weights /ModelPath/Ur.caffemodel --model /StructPath/train_val.prototxt -gpu 0 -iterations 100
# Please note:
# The CPU version is not supported yet, but it is quite easy to port conv_layer.cpp and inner_product_layer.cpp from the .cu files.

You may load a pre-trained caffemodel into this framework and fine-tune it (highly recommended), or re-train from the beginning (remember to set the threshold in train_val.prototxt, as described below).
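A minimal pycaffe sketch of the fine-tuning workflow, reusing the placeholder paths from the commands above:

import caffe

caffe.set_device(0)
caffe.set_mode_gpu()
solver = caffe.SGDSolver('/SolverPath/solver.prototxt')
solver.net.copy_from('/ModelPath/Ur.caffemodel')  # start from pre-trained weights
solver.solve()  # masks are updated on the fly according to the surgery schedule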

Usage Example:

Pre-trained caffemodel:
AlexNet with BN (https://github.com/HolmesShuan/AlexNet-BN-Caffemodel-on-ImageNet)
At 50% sparsity, the convolution layers should outperform the full-precision baseline.

Pruned Layer

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  # weight_mask param
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    pad: 2
    threshold: 0.6 ## based on the 68-95-99.7 rule [default 0.6]
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    # weight_mask_filler { ## optional
    #   type: "constant"
    #   value: 1 ## the default is already 1 in caffe.proto
    # }
  }
}

Dense Layer

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    sparsity_term: false ## default is true; set to false to keep this layer dense
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}

solver.prototxt

net: "models/bvlc_alexnet/train_val.prototxt"
base_lr: 0.001
lr_policy: "multistep"
gamma: 0.1
stepvalue: 84000
display: 20
max_iter: 162000
momentum: 0.9
weight_decay: 0.00005
snapshot: 6000
snapshot_prefix: "models/bvlc_alexnet/alexnet-BN"
solver_mode: GPU
surgery_iter_gamma: 0.0001 ## [default 1e-4] Probability(do surgery) = (1+gamma*iter)^-power 
surgery_iter_power: 1 ## [default 1] 
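A quick worked example of the surgery schedule with these default values; at iteration t, the surgery step runs with probability (1 + gamma*t)^(-power):

gamma, power = 1e-4, 1
for it in (0, 10000, 84000, 162000):
    prob = (1.0 + gamma * it) ** (-power)
    print('iter %6d: surgery probability = %.3f' % (it, prob))
# iter      0: surgery probability = 1.000
# iter  10000: surgery probability = 0.500
# iter  84000: surgery probability = 0.106
# iter 162000: surgery probability = 0.058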

Tips

  1. The selection of the threshold is pretty tricky; the best value may differ a lot between layers.
  2. If you encounter vanishing gradients, adjust gamma and power in solver.prototxt.
    If multiple attempts fail, reduce the threshold based on the 68-95-99.7 rule (see the table and sketch below).

Threshold   Sparsity
0.674       50%
0.994       68%
1.281       80%
1.644       90%
1.959       95%
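These thresholds are simply standard-normal quantiles: assuming the weights of a layer are roughly Gaussian and the threshold is a multiple t of the layer's weight standard deviation (our reading of the 68-95-99.7 rule above), pruning |w| < t*sigma removes about 2*Phi(t) - 1 of the weights. A short sketch to compute the threshold for a target sparsity, matching the table above up to rounding:

from scipy.stats import norm

for target in (0.50, 0.68, 0.80, 0.90, 0.95):
    t = norm.ppf((1.0 + target) / 2.0)  # quantile such that P(|Z| < t) = target
    print('sparsity %2.0f%% -> threshold %.3f' % (100 * target, t))
# sparsity 50% -> threshold 0.674
# sparsity 68% -> threshold 0.994
# sparsity 80% -> threshold 1.282
# sparsity 90% -> threshold 1.645
# sparsity 95% -> threshold 1.960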

Citation

The basic idea comes from:

@inproceedings{guo2016dynamic,
  title = {Dynamic Network Surgery for Efficient DNNs},
  author = {Guo, Yiwen and Yao, Anbang and Chen, Yurong},
  booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  year = {2016}
}

And it is based on the Caffe framework:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}


Issues

sparsity

Thanks for your work. Do you have good results for the convolution layers at 90% sparsity? And what sparsity did you set in your best result?

Not prune the bias

Hi Shuan,

Thanks for the great implementation.
I wonder what you mean by "didn't prune the bias term".
Do you mean that you only use Wx (instead of Wx + b) to compute the predictions and the gradients?

For the pruned models of interest, should I use:

  1. both the new weights and the (original) bias (which does not make sense), or
  2. only the new weights (which may hurt the accuracy of the original models because the bias terms are omitted)?

Thanks!
