- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. (Stanford, Tsinghua University, ICLR 2016)
  - Network pruning
    - weights below a threshold are removed from the network
    - the remaining weights are retrained
    - the sparse structure is stored by index differences instead of absolute positions (CSR or CSC format)
  - Weight sharing
    - weights are quantized to shared values by k-means clustering (a combined pruning/weight-sharing sketch follows this list)
    - centroids are fine-tuned by gradient descent
    - after quantization and fine-tuning, results show that linear centroid initialization works best
  - Huffman coding
    - the weight indices (indices into the shared-weight table) and the sparse matrix location indices (differences between weight locations) are Huffman-encoded, saving 20%-30% of storage
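A minimal sketch of the pruning and weight-sharing steps described above, assuming a dense weight matrix `W`; the magnitude threshold, the 16 clusters (4-bit codes), and the function name `prune_and_share` are illustrative choices, not the paper's exact settings:

```python
# Minimal sketch: magnitude pruning followed by k-means weight sharing.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

def prune_and_share(W, threshold=0.05, n_clusters=16):
    # Pruning: zero out weights whose magnitude is below the threshold,
    # then keep the survivors in CSR form (positions stored as offsets).
    W_pruned = np.where(np.abs(W) < threshold, 0.0, W)
    sparse_W = csr_matrix(W_pruned)

    # Weight sharing: cluster the surviving weights with k-means and
    # replace each weight by the index of its nearest centroid.
    nonzero = sparse_W.data.reshape(-1, 1)
    # Linear initialization: centroids spread evenly over [min, max],
    # which the notes above report works best after fine-tuning.
    init = np.linspace(nonzero.min(), nonzero.max(), n_clusters).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, init=init, n_init=1).fit(nonzero)
    codebook = km.cluster_centers_.ravel()   # shared weight values
    codes = km.labels_.astype(np.uint8)      # per-weight 4-bit index
    return sparse_W, codebook, codes
```

Retraining the surviving weights and fine-tuning the centroids would follow these steps; Huffman coding of the index streams is a further, lossless compression step.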
- Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing. (University of Toronto, University of British Columbia, ISCA'16)
  - A DNN accelerator that can dynamically eliminate most ineffectual multiplications.
  - Targets the convolutional layers of DNNs, which dominate execution time
  - CNV decouples the neuron lanes (input channels), which previously worked in lockstep, allowing them to proceed independently
  - only non-zero activations appear in the input buffer (zeros are eliminated at the output of the preceding layer); each is stored as a (value, index) pair generated on the fly (see the encoding sketch after this list)
  - the input neurons are divided into bricks along the channel dimension and each brick is processed independently, but some lanes may have to wait for other lanes to finish the current window
  - performance can be further improved by also pruning neurons (activations) close to zero, at a small loss in accuracy
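A minimal sketch of a zero-free (value, offset) encoding for one brick of activations, in the spirit of what the notes above describe; `BRICK_SIZE`, the function names, and the example values are illustrative assumptions, not the paper's exact format:

```python
# Minimal sketch of zero-skipping via (value, offset) pairs within a brick.
import numpy as np

BRICK_SIZE = 16  # activations per brick along the input-channel dimension

def encode_brick(brick):
    """Keep only the non-zero activations plus their offsets within the brick."""
    offsets = np.flatnonzero(brick)
    return brick[offsets], offsets.astype(np.uint8)

def brick_dot(values, offsets, weights):
    """Multiply-accumulate only the non-zero activations against the
    corresponding weights; zero activations are never fetched or multiplied."""
    return float(np.dot(values, weights[offsets]))

# Example: a brick that is mostly zeros after a ReLU.
brick = np.array([0, 0, 1.5, 0, 0, 0, 2.0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0], dtype=float)
weights = np.arange(BRICK_SIZE, dtype=float)
values, offsets = encode_brick(brick)
print(brick_dot(values, offsets, weights))  # only 3 of 16 multiplications performed
```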
- EIE: Efficient Inference Engine on Compressed Deep Neural Network. (Stanford University, Tsinghua University, ISCA'16)
  - First accelerator for sparse and weight-sharing neural networks
  - achieves weight sharing by storing only the indices of quantized weights (a codebook table shared between PEs)
  - Targets the fully connected layers, performing inference directly on the compressed model
  - Proposes a customized sparse matrix-vector multiplication that exploits both the static and dynamic sparsity of the model (see the sketch after this list)
    - static sparsity: weights stored in a compressed column format (a CSC variant), i.e. sparsity of the weights
    - dynamic sparsity: leading non-zero detection on the input activations, i.e. sparsity of the input vectors
    - step 1: broadcast each non-zero activation to every PE; step 2: each PE walks through the stored weights of that column (from the start of this column to the start of the next column)
  - Proposes distributing both computation and storage to parallelize a sparsified layer across multiple PEs
    - a FIFO is used as an activation queue to achieve load balance across PEs
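A minimal, single-PE sketch of the sparse matrix-vector product described above, with SciPy's CSC format standing in for EIE's custom compressed encoding; the function name `eie_spmv` and the example matrix are assumptions for illustration:

```python
# Minimal sketch: exploit static sparsity (compressed weight columns) and
# dynamic sparsity (skip zero activations) in a matrix-vector product.
import numpy as np
from scipy.sparse import csc_matrix

def eie_spmv(W_csc, x):
    y = np.zeros(W_csc.shape[0])
    # Dynamic sparsity: only non-zero activations are broadcast/processed.
    for j in np.flatnonzero(x):
        # Static sparsity: walk only the stored non-zeros of column j,
        # from the start of this column to the start of the next.
        start, end = W_csc.indptr[j], W_csc.indptr[j + 1]
        rows = W_csc.indices[start:end]
        vals = W_csc.data[start:end]
        # In the real design each stored value is a small index into a table
        # of shared weights, decoded before the multiply-accumulate.
        y[rows] += vals * x[j]
    return y

W = csc_matrix(np.array([[0, 2.0, 0], [1.0, 0, 0], [0, 0, 3.0]]))
x = np.array([0.0, 4.0, 0.0])   # a single non-zero activation
print(eie_spmv(W, x))           # [8. 0. 0.]
```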
- Minerva: Enabling Low-Power, High-Accuracy Deep Neural Network Accelerators. (Harvard University, ISCA'16)
  - a highly accurate, ultra-low-power DNN accelerator
  - data type quantization: inputs, weights, and outputs of each layer are quantized to different fixed-point types (different numbers of integer and fractional bits); see the sketch after this list
  - selective operation pruning: operations whose operands are close to zero are removed (dynamically predicated off)
    - 75% of operations can be safely pruned (at a threshold value of 1.05)
  - SRAM fault mitigation: low-overhead techniques to tolerate reduced SRAM supply voltages (the SRAM fault rate rises as the voltage is lowered)
    - faults in SRAM are modeled as random bit-flips in the weight matrix
    - flipping a high-order bit from zero to one dramatically degrades accuracy
    - bit masking: bits detected as faulty are set to zero
  - the relative benefit of each optimization differs across datasets
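A minimal sketch of two of the optimizations above: per-layer fixed-point quantization (separate integer and fractional bit widths) and selective operation pruning that skips MACs on near-zero activations. The bit-width arguments, function names, and the use of 1.05 as a default threshold are illustrative; the paper tunes these per layer and dataset:

```python
# Minimal sketch of per-layer fixed-point quantization and operation pruning.
import numpy as np

def fixed_point(x, int_bits, frac_bits):
    """Quantize to a signed fixed-point type; int_bits includes the sign bit."""
    scale = 2.0 ** frac_bits
    lo = -2.0 ** (int_bits - 1)
    hi = 2.0 ** (int_bits - 1) - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

def pruned_dot(activations, weights, threshold=1.05):
    """Accumulate only products whose activation magnitude exceeds the
    threshold; small activations are predicated off (their MACs are skipped)."""
    keep = np.abs(activations) > threshold
    return float(np.dot(activations[keep], weights[keep]))

# Example: quantize a layer's weights to a Q2.6 format (8 bits total).
w = np.array([0.4375, -1.62, 0.003])
print(fixed_point(w, int_bits=2, frac_bits=6))
```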
- SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. (NVIDIA, UC-Berkeley, Stanford University, ISCA'17)
  - a sparse CNN accelerator architecture that exploits both weight and activation sparsity to improve performance and power
  - exploits sparsity by:
    - compressing data: weights and activations are stored in an encoded (compressed) form
    - eliminating computation: multiplications involving zero weights or activations are gated off
  - employs a Cartesian-product dataflow (see the sketch after this list)
    - computes an all-to-all product of non-zero weights and activations instead of a sliding-window convolution
    - input/output activations are tiled; cross-tile dependencies of the convolution are resolved by data halos
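A minimal sketch of the Cartesian-product dataflow for a single input and output channel with unit stride: only non-zero weights and activations (value plus coordinates) are kept, every pair is multiplied, and each product is scattered to its output coordinate. The function name and the bound check standing in for tile/halo handling are simplifying assumptions:

```python
# Minimal sketch of a Cartesian-product (all-to-all) sparse convolution.
import numpy as np

def scnn_conv2d(act_vals, act_coords, wt_vals, wt_coords, out_shape):
    out = np.zeros(out_shape)
    # All-to-all product of non-zero activations and non-zero weights,
    # instead of sliding a dense window over the input.
    for a, (ay, ax) in zip(act_vals, act_coords):
        for w, (wy, wx) in zip(wt_vals, wt_coords):
            oy, ox = ay - wy, ax - wx          # output coordinate for this pair
            if 0 <= oy < out_shape[0] and 0 <= ox < out_shape[1]:
                out[oy, ox] += a * w           # scatter-accumulate
    return out

# Example: one non-zero activation at (2, 2) and one non-zero weight at (1, 0).
print(scnn_conv2d([3.0], [(2, 2)], [2.0], [(1, 0)], out_shape=(3, 3)))
```

In the tiled design, products whose coordinates fall outside the local tile form the data halo exchanged with neighboring tiles.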
- Cambricon-X: An Accelerator for Sparse Neural Networks. (ICT, CAS, MICRO'16)
- Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. (MIT, NVIDIA, ISCA'16)
  - Presents an energy analysis framework.
  - Proposes an energy-efficient dataflow called Row Stationary, which considers three levels of data reuse (see the sketch after this list).
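A minimal sketch of the row-stationary idea, assuming unit stride and valid padding: each PE keeps one filter row and one input row locally (weight and input reuse) and produces one row of partial sums, which are accumulated across the PEs holding the other filter rows (partial-sum reuse). Function names are illustrative:

```python
# Minimal sketch of the row-stationary dataflow built from 1D row primitives.
import numpy as np

def pe_row_conv(input_row, filter_row):
    """1D convolution performed by a single PE (unit stride, valid padding)."""
    out_len = len(input_row) - len(filter_row) + 1
    return np.array([np.dot(input_row[i:i + len(filter_row)], filter_row)
                     for i in range(out_len)])

def row_stationary_2d(ifmap, kernel):
    """2D convolution assembled from row primitives: partial-sum rows from
    PEs holding different filter rows are accumulated across the PE array."""
    R = kernel.shape[0]
    H = ifmap.shape[0] - R + 1
    W = ifmap.shape[1] - kernel.shape[1] + 1
    out = np.zeros((H, W))
    for oy in range(H):
        for r in range(R):   # each r corresponds to a different PE in a column
            out[oy] += pe_row_conv(ifmap[oy + r], kernel[r])
    return out
```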
- Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures. (Harvard University, ISCA'14)
  - a pre-RTL power-performance simulator for rapid design space exploration of accelerators