Awesome Knowledge-Distillation

Different forms of knowledge
KD + GAN
KD + Meta-learning
Data-free KD
KD + AutoML
KD + RL
Multi-teacher KD
Cross-modal KD
Application of KD
- for NLP
Model Pruning or Quantization
Beyond

Different forms of knowledge

Knowledge from logits

Distilling the knowledge in a neural network. Hinton et al. arXiv:1503.02531
Learning from Noisy Labels with Distillation. Li, Yuncheng et al. ICCV 2017
Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students. arXiv:1805.05551
Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
Learning Metrics from Teachers: Compact Networks for Image Embedding. Yu, Lu et al. CVPR 2019
Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Zehao and Wang, Naiyan. 2017
On Knowledge Distillation from Complex Networks for Response Prediction. Arora, Siddhartha et al. NAACL 2019
On the Efficacy of Knowledge Distillation. Cho, Jang Hyun and Hariharan, Bharath. arXiv:1910.01348. ICCV 2019
[novel] Revisit Knowledge Distillation: a Teacher-free Framework. Yuan, Li et al. arXiv:1909.11723
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. Mirzadeh et al. arXiv:1902.03393
Ensemble Distribution Distillation. ICLR 2020
Noisy Collaboration in Knowledge Distillation. ICLR 2020
On Compressing U-net Using Knowledge Distillation. arXiv:1812.00249
Distillation-Based Training for Multi-Exit Architectures. Phuong, Mary and Lampert, Christoph H. ICCV 2019
Self-training with Noisy Student improves ImageNet classification. Xie, Qizhe et al. (Google) arXiv:1911.04252
Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework. arXiv:1910.12061
Preparing Lessons: Improve Knowledge Distillation with Better Supervision. arXiv:1911.07471
Adaptive Regularization of Labels. arXiv:1908.05474
Positive-Unlabeled Compression on the Cloud. Xu, Yixing (HUAWEI) et al. NIPS 2019
Snapshot Distillation: Teacher-Student Optimization in One Generation. Yang, Chenglin et al. CVPR 2019
QUEST: Quantized embedding space for transferring knowledge. Jain, Himalaya et al. CVPR 2020 (pre)

Knowledge from intermediate layers

Fitnets: Hints for thin deep nets. Romero, Adriana et al. arXiv:1412.6550
Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko et al. ICLR 2017
Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks. Zhang, Zhi et al. arXiv:1710.09505
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. Yim, Junho et al. CVPR 2017
Paraphrasing complex network: Network compression via factor transfer. Kim, Jangho et al. NIPS 2018
Knowledge transfer with jacobian matching. ICML 2018
Self-supervised knowledge distillation using singular value decomposition. Lee, Seung Hyun et al. ECCV 2018
Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
Knowledge Distillation via Route Constrained Optimization. Jin, Xiao et al. ICCV 2019
Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
A Comprehensive Overhaul of Feature Distillation. Heo, Byeongho et al. ICCV 2019
Feature-map-level Online Adversarial Knowledge Distillation. ICLR 2020
Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
Knowledge Squeezed Adversarial Network Compression. Shu, Changyong et al. AAAI 2020
Stagewise Knowledge Distillation. Kulkarni, Akshay et al. arXiv:1911.06786
Knowledge Distillation from Internal Representations. AAAI 2020
Knowledge Flow: Improve Upon Your Teachers. ICLR 2019
LIT: Learned Intermediate Representation Training for Model Compression. ICML 2019

Graph-based

Graph-based Knowledge Distillation by Multi-head Attention Network. Lee, Seunghyun and Song, Byung Cheol. arXiv:1907.02226
Graph Representation Learning via Multi-task Knowledge Distillation. arXiv:1911.05700
Deep geometric knowledge distillation with graphs. arXiv:1911.03080

Mutual Information

Correlation Congruence for Knowledge Distillation. Peng, Baoyun et al. ICCV 2019
Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
Contrastive Representation Distillation. Tian, Yonglong et al. arXiv:1910.10699

Self-KD

Moonshine: Distilling with Cheap Convolutions. Crowley, Elliot J. et al. NIPS 2018
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. Zhang, Linfeng et al. ICCV 2019
Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Clark, Kevin et al. ACL 2019 (short)
Self-Knowledge Distillation in Natural Language Processing. Hahn, Sangchul and Choi, Heeyoul. arXiv:1908.01851
Rethinking Data Augmentation: Self-Supervision and Self-Distillation. Lee, Hankook et al. ICLR 2020
Regularizing Predictions via Class-wise Self-knowledge Distillation. ICLR 2020
MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks. arXiv:1911.09418

Structured Knowledge

Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, Jangho et al. NIPS 2018
Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
Contrastive Representation Distillation. Tian, Yonglong et al. arXiv:1910.10699
Teaching To Teach By Structured Dark Knowledge. ICLR 2020

Privileged Information

Learning using privileged information: similarity control and knowledge transfer. Vapnik, Vladimir and Izmailov, Rauf. JMLR 2015
Unifying distillation and privileged information. Lopez-Paz, David et al. ICLR 2016
Model compression via distillation and quantization. Polino, Antonio et al. ICLR 2018
KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
[novel] Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
Retaining privileged information for multi-task learning. Tang, Fengyi et al. KDD 2019
A Generalized Meta-loss function for regression and classification using privileged information. Asif, Amina et al. arXiv:1811.06885

KD + GAN

Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks. Xu, Zheng et al. arXiv:1709.00513
KTAN: Knowledge Transfer Adversarial Network. Liu, Peiye et al. arXiv:1810.08126
KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
Adversarial Learning of Portable Student Networks. Wang, Yunhe et al. AAAI 2018
Adversarial Network Compression. Belagiannis, Vasileios et al. ECCV 2018
Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks. ICASSP 2018
Adversarial Distillation for Efficient Recommendation with External Knowledge. TOIS 2018
Training student networks for acceleration with conditional adversarial networks. Xu, Zheng et al. BMVC 2018
[novel] DAFL: Data-Free Learning of Student Networks. Chen, Hanting et al. ICCV 2019
MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
Knowledge Distillation with Adversarial Samples Supporting Decision Boundary. Heo, Byeongho et al. AAAI 2019
Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection. Liu, Jian et al. AAAI 2019
Adversarially Robust Distillation. Goldblum, Micah et al. AAAI 2020
GAN-Knowledge Distillation for one-stage Object Detection. Hong, Wei et al. arXiv:1906.08467
Lifelong GAN: Continual Learning for Conditional Image Generation. Kundu et al. arXiv:1908.03884
Compressing GANs using Knowledge Distillation. Aguinaldo, Angeline et al. arXiv:1902.00159
Feature-map-level Online Adversarial Knowledge Distillation. ICLR 2020
MineGAN: effective knowledge transfer from GANs to target domains with few images. Wang, Yaxing et al. arXiv:1912.05270

KD + Meta-learning

Few Sample Knowledge Distillation for Efficient Network Compression. Li, Tianhong et al. ICLR 2020
Learning What and Where to Transfer. Jang, Yunhun et al. ICML 2019
Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation. arXiv:1911.05329v1
Progressive Knowledge Distillation For Generative Modeling. ICLR 2020
Few Shot Network Compression via Cross Distillation. AAAI 2020

Data-free KD

Data-Free Knowledge Distillation for Deep Neural Networks. NIPS 2017
Zero-Shot Knowledge Distillation in Deep Networks. ICML 2019
DAFL: Data-Free Learning of Student Networks. ICCV 2019
Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, Paul and Storkey, Amos. NIPS 2019
Dream Distillation: A Data-Independent Model Compression Framework. Kartikeya et al. ICML 2019
Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion. Yin, Hongxu et al. CVPR 2020
Data-Free Adversarial Distillation. Fang, Gongfan et al. CVPR 2020

Other data-free model compression:

Data-free Parameter Pruning for Deep Neural Networks. Srinivas, Suraj et al. arXiv:1507.06149
Data-Free Quantization Through Weight Equalization and Bias Correction. Nagel, Markus et al. ICCV 2019

KD + AutoML

Improving Neural Architecture Search Image Classifiers via Ensemble Learning. Macko, Vladimir et al. 2019
Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Li, Changlin et al. arXiv:1911.13053v1
Towards Oracle Knowledge Distillation with Neural Architecture Search. Kang, Minsoo et al. AAAI 2020

KD + RL

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
Knowledge Flow: Improve Upon Your Teachers. Liu, Iou-jen et al. ICLR 2019
Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
Exploration by random network distillation. Burda, Yuri et al. ICLR 2019

Multi-teacher KD

Learning from Multiple Teacher Networks. You, Shan et al. KDD 2017
Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. ICLR 2017
Knowledge Adaptation: Teaching to Adapt. arXiv:1702.02052
Deep Model Compression: Distilling Knowledge from Noisy Teachers. Sau, Bharat Bhusan et al. arXiv:1610.09650v2
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, Antti and Valpola, Harri. NIPS 2017
Born-Again Neural Networks. Furlanello, Tommaso et al. ICML 2018
Deep Mutual Learning. Zhang, Ying et al. CVPR 2018
Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
Collaborative learning for deep neural networks. Song, Guocong and Chai, Wei. NIPS 2018
Data Distillation: Towards Omni-Supervised Learning. Radosavovic, Ilija et al. CVPR 2018
Multilingual Neural Machine Translation with Knowledge Distillation. ICLR 2019
Unifying Heterogeneous Classifiers with Distillation. Vongkulbhisal et al. CVPR 2019
Distilled Person Re-Identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Yang, Ze et al. WSDM 2020
FEED: Feature-level Ensemble for Knowledge Distillation. Park, SeongUk and Kwak, Nojun. arXiv:1909.10754 (AAAI 2020 pre)
Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, Kwangjin et al. ICLR 2020
Online Knowledge Distillation with Diverse Peers. Chen, Defang et al. AAAI 2020
Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation. ICCV 2019

Cross-modal KD

SoundNet: Learning Sound Representations from Unlabeled Video. Aytar, Yusuf et al. NIPS 2016
Cross Modal Distillation for Supervision Transfer. Gupta, Saurabh et al. CVPR 2016
Emotion recognition in speech using cross-modal transfer in the wild. Albanie, Samuel et al. ACM MM 2018
Through-Wall Human Pose Estimation Using Radio Signals. Zhao, Mingmin et al. CVPR 2018
Compact Trilinear Interaction for Visual Question Answering. Do, Tuong et al. ICCV 2019
Cross-Modal Knowledge Distillation for Action Recognition. Thoker, Fida Mohammad and Gall, Juergen. ICIP 2019
Learning to Map Nearly Anything. Salem, Tawfiq et al. arXiv:1909.06928
Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation. Kundu et al. ICCV 2019
CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency. Chen, Yun-Chun et al. CVPR 2019
XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings. ICLR 2020
Effective Domain Knowledge Transfer with Soft Fine-tuning. Zhao, Zhichen et al. arXiv:1909.02236
ASR is all you need: cross-modal distillation for lip reading. Afouras et al. arXiv:1911.12747v1

Application of KD

Face model compression by distilling knowledge from neurons. Luo, Ping et al. AAAI 2016
Learning efficient object detection models with knowledge distillation. Chen, Guobin et al. NIPS 2017
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. Mishra, Asit et al. NIPS 2018
Distilled Person Re-Identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
[novel] Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
Fast Human Pose Estimation. Zhang, Feng et al. CVPR 2019
Distilling knowledge from a deep pose regressor network. Saputra et al. arXiv:1908.00858 (2019)
Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
Structured Knowledge Distillation for Semantic Segmentation. Liu, Yifan et al. CVPR 2019
Relation Distillation Networks for Video Object Detection. Deng, Jiajun et al. ICCV 2019
Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection. Dong, Xuanyi and Yang, Yi. ICCV 2019
Progressive Teacher-student Learning for Early Action Prediction. Wang, Xionghui et al. CVPR 2019
Lightweight Image Super-Resolution with Information Multi-distillation Network. Hui, Zheng et al. ICCVW 2019
AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation. Tavakolian, Mohammad et al. ICCV 2019
Dynamic Kernel Distillation for Efficient Pose Estimation in Videos. Nie, Xuecheng et al. ICCV 2019
Teacher Guided Architecture Search. Bashivan, Pouya and Tensen, Mark. ICCV 2019
Online Model Distillation for Efficient Video Inference. Mullapudi et al. ICCV 2019
Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
Knowledge Distillation for Incremental Learning in Semantic Segmentation. arXiv:1911.03462
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization. arXiv:1910.12295
Teacher-Students Knowledge Distillation for Siamese Trackers. arXiv:1907.10586
LaTeS: Latent Space Distillation for Teacher-Student Driving Policy Learning. Zhao, Albert et al. CVPR 2020 (pre)
Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. Meng, Zhong et al. arXiv:2001.01798

for NLP

Patient Knowledge Distillation for BERT Model Compression. Sun, Siqi et al. arXiv:1908.09355
TinyBERT: Distilling BERT for Natural Language Understanding. Jiao, Xiaoqi et al. arXiv:1909.10351
Learning to Specialize with Knowledge Distillation for Visual Question Answering. NIPS 2018
Knowledge Distillation for Bilingual Dictionary Induction. EMNLP 2017
A Teacher-Student Framework for Maintainable Dialog Manager. EMNLP 2018
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, Victor et al. arXiv:1910.01108
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Turc, Iulia et al. arXiv:1908.08962
On Knowledge distillation from complex networks for response prediction. Arora, Siddhartha et al. NAACL 2019
Distilling the Knowledge of BERT for Text Generation. arXiv:1911.03829v1
Understanding Knowledge Distillation in Non-autoregressive Machine Translation. arXiv:1911.02727
MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer. ICLR 2020
Acquiring Knowledge from Pre-trained Model to Neural Machine Translation. Weng, Rongxiang et al. AAAI 2020

Model Pruning or Quantization

Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression. ECCV 2016
N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
Slimmable Neural Networks. Yu, Jiahui et al. ICLR 2019
Co-Evolutionary Compression for Unpaired Image Translation. Shu, Han et al. ICCV 2019
MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. Liu, Zechun et al. ICCV 2019
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning. ICLR 2020
Pruning with hints: an efficient framework for model acceleration. ICLR 2020
Training convolutional neural networks with cheap convolutions and online distillation. arXiv:1909.13063
Cooperative Pruning in Cross-Domain Deep Neural Network Compression. Chen, Shangyu et al. IJCAI 2019
QKD: Quantization-aware Knowledge Distillation. Kim, Jangho et al. arXiv:1911.12491v1

Beyond

Do deep nets really need to be deep? Ba, Jimmy and Caruana, Rich. NIPS 2014
When Does Label Smoothing Help? Müller, Rafael, Kornblith, Simon, and Hinton, Geoffrey. NIPS 2019
Towards Understanding Knowledge Distillation. Phuong, Mary and Lampert, Christoph. ICML 2019
Harnessing deep neural networks with logical rules. ACL 2016
Adaptive Regularization of Labels. Ding, Qianggang et al. arXiv:1908.05474
Knowledge Isomorphism between Neural Networks. Liang, Ruofan et al. arXiv:1908.01581
Role-Wise Data Augmentation for Knowledge Distillation. ICLR 2020
Neural Network Distiller: A Python Package For DNN Compression Research. arXiv:1910.12232
(survey) Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation. arXiv:1912.13179

Note: PDFs of all the papers can be found and downloaded via Bing or Google.

Source: https://github.com/FLHonker/Awesome-Knowledge-Distillation

Contact: Yuang Liu([email protected]), AIDA, ECNU.
