
TensorFlow implementation of "Iterative Pruning"

CAUTION: Out-of-date notice.

I have since checked that TensorFlow (>1.3) supports sparse_matmul, which seems to be a more correct way to implement iterative pruning. This work was done naively with a quite old version (0.8.0), so I do not recommend using this code for serious use cases. There will be no further updates or maintenance.
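For orientation only, below is a minimal sketch (an assumption, not this repository's code) of how a pruned layer could be applied with the SparseTensor ops available in newer TensorFlow:

# A minimal sketch (not this repo's code) of applying a pruned layer with
# the SparseTensor support in newer TensorFlow.
import tensorflow as tf

# Hypothetical pruned fully-connected layer: only the surviving weights are
# kept, as <index, value> pairs.
w_sparse = tf.sparse.SparseTensor(
    indices=[[0, 1], [1, 0], [1, 2]],   # positions of surviving weights
    values=[2.0, 1.0, 3.0],
    dense_shape=[2, 3])

x = tf.constant([[1.0], [2.0], [3.0]])  # input activations, shape [3, 1]

# y = W @ x without materializing the dense weight matrix.
y = tf.sparse.sparse_dense_matmul(w_sparse, x)   # -> [[4.0], [10.0]]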


This work is based on "Learning both Weights and Connections for Efficient Neural Network," Han et al. @ NIPS '15. Note that it only quantifies the effect of pruning on latency (within TensorFlow) and is not tuned for the best possible result, so some details are simplified for brevity (e.g., number of pruning iterations, adjusted dropout ratio, etc.).

I applied iterative pruning to a small MNIST CNN model (originally 13 MB) taken from the TensorFlow tutorials. After pruning off various percentages of the weights, I simply retrained for two epochs per case and obtained compressed models (down to 2.6 MB at 90% pruning) with a minor loss of accuracy (99.17% -> 98.99% at 90% pruning with retraining). Again, this is not optimal.
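For reference, here is a minimal sketch of the magnitude-based pruning step described in the paper; the names (e.g. prune_by_magnitude) are illustrative and this is not the repo's exact code.

# A minimal sketch of magnitude-based pruning; not this repo's exact code.
import numpy as np

def prune_by_magnitude(weights, prune_ratio=0.9):
    """Zero out the smallest-magnitude weights; return the pruned weights and
    the binary mask used to keep them at zero during retraining."""
    threshold = np.percentile(np.abs(weights), prune_ratio * 100)
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

# During retraining, the mask is re-applied after each weight update so that
# pruned connections stay removed.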

Issues

Due to the lack of support for SparseTensor and its operations in TensorFlow 0.8.0, this implementation has some limitations. It uses embedding_lookup_sparse to compute the sparse matrix-vector multiplication; since that op is not designed solely for this purpose, its performance may be sub-optimal. Also, TensorFlow represents a sparse matrix as <index, value> pairs rather than the typical CSR format, which is more compact and performant. In summary, the limitations are as follows (a sketch of the embedding_lookup_sparse trick is shown after the list):

  1. embedding_lookup_sparse does not support broadcasting, which prevents running tests on the normal test dataset.
  2. Performance may be somewhat sub-optimal.
  3. Because a "sparse Variable" is not supported, manual dense-to-sparse and sparse-to-dense conversion is required.
  4. The same approach may also be applicable to 4D convolution tensors, but it is a bit tricky.
  5. The current embedding_lookup_sparse forces an additional matrix transpose, dimension squeeze, and reshape.
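As a concrete illustration of the trick (and of limitations 3 and 5), here is a minimal sketch of computing y = A @ x with embedding_lookup_sparse; it is written against current TensorFlow for brevity and assumes every row of A keeps at least one nonzero weight. It is not this repo's exact code.

# A minimal sketch of y = A @ x via embedding_lookup_sparse.
import numpy as np
import tensorflow as tf

A = np.array([[0., 2., 0.],
              [1., 0., 3.]], dtype=np.float32)   # pruned (sparse) weights
x = np.array([1., 2., 3.], dtype=np.float32)     # input activations

rows, cols = np.nonzero(A)                       # <index, value> form
indices = np.stack([rows, cols], axis=1).astype(np.int64)

# sp_ids holds the column index of each nonzero; sp_weights holds its value.
sp_ids = tf.sparse.SparseTensor(indices, cols.astype(np.int64), A.shape)
sp_weights = tf.sparse.SparseTensor(indices, A[rows, cols], A.shape)

# The dense input vector acts as an [n, 1] "embedding table";
# combiner="sum" then gives y[i] = sum_j A[i, j] * x[j].
params = tf.constant(x.reshape(-1, 1))
y = tf.nn.embedding_lookup_sparse(params, sp_ids, sp_weights, combiner="sum")
# y has shape [2, 1]; note the extra reshape/squeeze mentioned in item 5.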

File descriptions and usages

model_ckpt_dense: original model
model_ckpt_dense_pruned: 90% pruned-only model
model_ckpt_sparse_retrained: 90% pruned and retrained model

Python package requirements

sudo apt-get install python-scipy python-numpy python-matplotlib

To regenerate the sparse models, first edit config.py with your threshold configuration, then run training with the second (pruning and retraining) and third (generating the sparse form of the weight data) round options.

./train.py -2 -3

To run inference on a single image (seven.png) and measure its latency,

./deploy_test.py -d -m model_ckpt_dense
./deploy_test_sparse.py -d -m model_ckpt_sparse_retrained

To test the dense models,

./deploy_test.py -t -m model_ckpt_dense
./deploy_test.py -t -m model_ckpt_dense_pruned
./deploy_test.py -t -m model_ckpt_dense_retrained

To draw a histogram of the weight distribution,

# After running train.py (it generates .dat files)
./draw_histogram.py

Performance

Results are currently somewhat mediocre or even degraded, due to the indirection and additional storage overhead of the sparse matrix form. It may also be because the model is too small (12.49 MB).

Storage overhead

Baseline: 12.49 MB
10 % pruned: 21.86 MB
20 % pruned: 19.45 MB
30 % pruned: 17.05 MB
40 % pruned: 14.64 MB
50 % pruned: 12.23 MB
60 % pruned: 9.83 MB
70 % pruned: 7.42 MB
80 % pruned: 5.02 MB
90 % pruned: 2.61 MB

CPU performance (averaged over 5 runs)

CPU: Intel Core i5-2500 @ 3.3 GHz, LLC size: 6 MB

http://garion9013.github.io/images/cpu-desktop.png

Baseline: 0.01118040085 s
10 % pruned: 1.919299984 s
20 % pruned: 0.2325239658 s
30 % pruned: 0.2111079693 s
40 % pruned: 0.1982570648 s
50 % pruned: 0.1691776752 s
60 % pruned: 0.1305227757 s
70 % pruned: 0.116039753 s
80 % pruned: 0.103564167 s
90 % pruned: 0.1058168888 s

GPU performance (averaged over 5 runs)

GPU: NVIDIA GeForce GTX 650 @ 1.058 GHz, LLC size: 256 KB

http://garion9013.github.io/images/gpu-desktop.png

Baseline: 0.1475181845 s
10 % pruned: 0.2954540253 s
20 % pruned: 0.2665398121 s
30 % pruned: 0.2585638046 s
40 % pruned: 0.2090051651 s
50 % pruned: 0.1995279789 s
60 % pruned: 0.1815193653 s
70 % pruned: 0.1436806202 s
80 % pruned: 0.135668993 s
90 % pruned: 0.1218701839 s
