
group-sparsity-sbp's Introduction

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

This repo contains the code for our NIPS 2017 paper, Structured Bayesian Pruning via Log-Normal Multiplicative Noise (poster). In the paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. it removes neurons and/or convolutional channels in CNNs. To do this, we inject noise into the neurons' outputs while keeping the weights unregularized.
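For readers skimming the repo, the following is only an illustrative sketch of that idea, not the repo's TensorFlow implementation: neuron outputs are multiplied by noise drawn from a log-normal distribution, and in the actual model the noise parameters are learned so that neurons whose noise concentrates near zero can be pruned. Here `mu` and `sigma` are fixed placeholders.

```python
# Illustrative sketch only (NumPy): multiplicative log-normal noise on
# neuron outputs. In the paper's model the noise parameters are learned and
# a truncated posterior is used; this just shows the multiplicative-noise idea.
import numpy as np

def multiplicative_lognormal_noise(activations, mu, sigma, rng):
    """activations: (batch, num_neurons); mu, sigma: per-neuron, shape (num_neurons,)."""
    noise = np.exp(rng.normal(loc=mu, scale=sigma, size=activations.shape))
    return activations * noise

rng = np.random.default_rng(0)
x = np.ones((4, 3))  # toy batch: 4 examples, 3 neurons
y = multiplicative_lognormal_noise(x, mu=np.zeros(3), sigma=0.1 * np.ones(3), rng=rng)
print(y.shape)  # (4, 3)
```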

Spotlight video

Launch experiments

Example for launching the LeNet5 experiment.

python ./scripts/lenet5-sbp.py

Example for launching the VGG-like experiment. To obtain a sparse VGG-like architecture we start from a pretrained network, so you can either use your own weights or train the network from scratch using the following command.

python ./scripts/vgglike.py --num_gpus <num GPUs>

Don't forget to adjust the batch size so that you get the same total number of iterations. For instance, for one GPU we use batch_size=100; for two GPUs we use batch_size=50.
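A small illustrative helper (not part of the repo's scripts) that captures this rule of thumb: keep the total effective batch size at 100 so the number of iterations stays the same for any GPU count.

```python
# Illustrative only: reproduces the rule above (batch_size=100 for 1 GPU,
# 50 for 2 GPUs). The repo's scripts take --num_gpus directly.
def per_gpu_batch_size(num_gpus, total_batch_size=100):
    assert total_batch_size % num_gpus == 0, "batch size must divide evenly"
    return total_batch_size // num_gpus

print(per_gpu_batch_size(1))  # 100
print(per_gpu_batch_size(2))  # 50
```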

Finally, use the following command to launch the SBP model for the VGG-like architecture.

python ./scripts/vgglike-sbp.py --num_gpus <num GPUs> --checkpoint <path to pretrained checkpoint>

MNIST Experiments

Results for LeNet architectures on MNIST

| Network  | Method       | Error (%) | Neurons per Layer    | CPU    | GPU    | FLOPs   |
|----------|--------------|-----------|----------------------|--------|--------|---------|
| LeNet-fc | Original     | 1.54      | 784 - 500 - 300 - 10 | 1.00 × | 1.00 × | 1.00 ×  |
|          | SparseVD     | 1.57      | 537 - 217 - 130 - 10 | 1.19 × | 1.03 × | 3.73 ×  |
|          | SSL          | 1.49      | 434 - 174 - 78 - 10  | 2.21 × | 1.04 × | 6.06 ×  |
|          | StructuredBP | 1.55      | 245 - 160 - 55 - 10  | 2.33 × | 1.08 × | 11.23 × |
| LeNet5   | Original     | 0.80      | 20 - 50 - 800 - 500  | 1.00 × | 1.00 × | 1.00 ×  |
|          | SparseVD     | 0.75      | 17 - 32 - 329 - 75   | 1.48 × | 1.41 × | 2.19 ×  |
|          | SSL          | 1.00      | 3 - 12 - 800 - 500   | 5.17 × | 1.80 × | 3.90 ×  |
|          | StructuredBP | 0.86      | 3 - 18 - 284 - 283   | 5.41 × | 1.91 × | 10.49 × |

CIFAR-10 Experiments

Results for the VGG-like architecture on the CIFAR-10 dataset. Here the speed-up is reported for the CPU; more detailed results are provided in the paper.

Citation

If you found this code useful, please cite our paper:

@incollection{
  neklyudov2018structured,
  title = {Structured Bayesian Pruning via Log-Normal Multiplicative Noise},
  author = {Neklyudov, Kirill and Molchanov, Dmitry and Ashukha, Arsenii and Vetrov, Dmitry P},
  booktitle = {Advances in Neural Information Processing Systems 30},
  editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
  pages = {6778--6787},
  year = {2017},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.nips.cc/paper/7254-structured-bayesian-pruning-via-log-normal-multiplicative-noise.pdf}
}

group-sparsity-sbp's People

Contributors

necludov, senya-ashukha


group-sparsity-sbp's Issues

the final loss

In the paper, the final loss function is presented in equation (12): the expected log-likelihood estimated through SGVB plus the KL divergence.

It seems that the SBP layer only takes the KL divergence into account; why don't we need to deal with the expected log-likelihood term?

Is the log-likelihood included in our objective function?
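For context, a generic SGVB-style objective of the kind equation (12) refers to has the following form; the notation here is illustrative and not copied from the paper.

```latex
% Generic stochastic-variational objective: an expected log-likelihood
% (data term), typically estimated with the reparameterization trick,
% minus a KL regularizer on the approximate posterior over the noise.
\mathcal{L}(\phi)
  = \mathbb{E}_{q_\phi(\theta)}\bigl[\log p(\mathcal{D}\mid\theta)\bigr]
  - \mathrm{KL}\bigl(q_\phi(\theta)\,\|\,p(\theta)\bigr)
```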

Elaboration on the pretrained model used

The paper mentions that for VGG-like training, a pretrained model was used. Could a link be provided for the checkpoint file of the pretrained model so the vgglike-sbp.py experiment can be replicated independently?

different erfcx approximation error on pytorch

Hi, I am trying to re-implement your paper in PyTorch. I changed your erfcx function to work with PyTorch tensors. After comparing with the values of special.erfcx(x), the average absolute error is approximately 2.11e-08 and the average relative error is approximately 3.91e-08, both of which are much larger than for your erfcx approximation. Could this be a problem?

Thanks,
Shangqian
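For anyone reproducing this kind of comparison, here is a hedged sketch. It assumes a recent PyTorch that ships torch.special.erfcx as the candidate implementation (substitute your own approximation there) and uses scipy.special.erfcx as the reference, reporting mean absolute and relative errors as in the issue.

```python
# Compare a candidate erfcx implementation against scipy.special.erfcx.
import numpy as np
import torch
from scipy import special

x = np.linspace(-5.0, 5.0, 10001)
reference = special.erfcx(x)
candidate = torch.special.erfcx(torch.from_numpy(x)).numpy()  # or your own approximation

abs_err = np.abs(candidate - reference)
rel_err = abs_err / np.abs(reference)  # erfcx(x) > 0, so no division by zero
print("mean absolute error:", abs_err.mean())
print("mean relative error:", rel_err.mean())
```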
