mmkd's Introduction

MMKD

This repo covers the implementation of the following ICME 2023 paper: Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning

Installation

This repo was tested with Python 3.6, PyTorch 1.8.1, and CUDA 11.1.

Running

Before distilling the student, be sure to set the teacher model directory in setting.py.

nohup python train_meta.py --model_s vgg8 --teacher_num 3 --distill inter --ensemble_method META --nesterov -r 1 -a 1 -b 100 --hard_buffer --convs --trial 0 --gpu_id 0 &

The flags are as follows:

--distill: specify the distillation method
--model_s: specify the student model; see 'models/__init__.py' for the available model types
-r: the weight of the cross-entropy loss between logit and ground truth, default: 1
-a: the weight of the KD loss, default: 1
-b: the weight of other distillation losses, default: 0
--teacher_num: specify the ensemble size (number of teacher models)
--ensemble_method: specify the ensemble method
--hard_buffer: whether a hard buffer is required
--convs: selects the feature alignment method; if omitted, a plain 1x1 convolution is used for alignment
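
The flags above can be sketched as a small argparse parser. This is a minimal illustration and not the actual code in train_meta.py: the long option names (--gamma, --alpha, --beta), the defaults for --teacher_num, --distill and --ensemble_method, and the total_loss helper are assumptions made for the example. It shows how the -r, -a and -b weights would plausibly combine the three loss terms into one objective.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags described in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the distillation flags")
    parser.add_argument("--model_s", type=str, default="vgg8", help="student model")
    # The three defaults below are assumptions, not taken from the repository.
    parser.add_argument("--teacher_num", type=int, default=1, help="number of teacher models")
    parser.add_argument("--distill", type=str, default="kd", help="distillation method")
    parser.add_argument("--ensemble_method", type=str, default="META", help="ensemble method")
    # Loss weights; the defaults follow the README (1, 1, 0).
    # The long names are assumed for readability.
    parser.add_argument("-r", "--gamma", type=float, default=1.0, help="weight of the cross-entropy loss")
    parser.add_argument("-a", "--alpha", type=float, default=1.0, help="weight of the KD loss")
    parser.add_argument("-b", "--beta", type=float, default=0.0, help="weight of other distillation losses")
    parser.add_argument("--hard_buffer", action="store_true", help="use a hard buffer")
    parser.add_argument("--convs", action="store_true", help="feature alignment method switch")
    return parser


def total_loss(loss_ce, loss_kd, loss_other, opt):
    """Weighted sum of the three loss terms (assumed form of the objective)."""
    return opt.gamma * loss_ce + opt.alpha * loss_kd + opt.beta * loss_other


# Parse a subset of the flags from the command shown earlier in this README.
opt = build_parser().parse_args(
    "--model_s vgg8 --teacher_num 3 --distill inter -r 1 -a 1 -b 100 --hard_buffer --convs".split()
)
# Placeholder loss values, chosen only to show the weighting.
print(total_loss(0.5, 0.2, 0.01, opt))
```

With -b 100, even a small feature-level loss contributes noticeably to the total, which is presumably why the example command raises it from its default of 0.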

Citation

If you find this repository useful, please consider citing the following paper:

Acknowledgement

The implementation of the compared methods is mainly based on the author-provided code and the open-source benchmarks https://github.com/HobbitLong/RepDistiller and https://github.com/alinlab/L2T-ww.

mmkd's Issues

About Paper

Congratulations on the acceptance of MMKD! When will the paper be released? Thanks!

Best
lujun

File "D:\Multi-Teacher Knowledge Distillation\MMKD-main\helper\meta_optimizer.py", line 90, in meta_backward
a_new = (a[0].mul(1-lr*wd).add_(wd, a[1]).add_(p.grad.data),
AttributeError: 'NoneType' object has no attribute 'data'

How can I solve this?

About pretrain model

Hello, I didn't find any instructions on the pre-trained models in the README. Where can I download them, or which code should I use to train them? Thanks!
