yoshitomo-matsubara / torchdistill

A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆25 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs and configurations are available for ensuring the reproducibility and benchmark.

Home Page: https://yoshitomo-matsubara.net/torchdistill/

License: MIT License

Python 100.00%

knowledge-distillation pytorch image-classification imagenet object-detection coco semantic-segmentation cifar10 cifar100 colab-notebook


torchdistill's Issues

Combine two distillation losses

Hi,

Thanks for your amazing work.
I have a question about how to combine two or more distillation losses.
For example, how can I set up the config files so that I can use CRD and KD together, as mentioned in the CRD paper?
I searched for a while but couldn't find a solution.
Would really appreciate your reply. Thanks again!
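
One common way to combine several distillation losses is a weighted sum of the individual terms. The sketch below is plain Python rather than torchdistill's API: the function name `combine_losses`, the term names `kd` and `crd`, and the example values and weights are all hypothetical.

```python
def combine_losses(loss_values, factors):
    """Return the weighted sum of named loss terms.

    loss_values: dict mapping a term name to its loss value for one batch
    factors:     dict mapping the same term names to their weights
    """
    missing = set(factors) - set(loss_values)
    if missing:
        raise KeyError(f"no loss value for terms: {sorted(missing)}")
    # Each term contributes factor * value to the total.
    return sum(factors[name] * loss_values[name] for name in factors)


# Hypothetical per-batch values and weights, for illustration only.
loss_values = {"kd": 0.8, "crd": 2.5}
factors = {"kd": 1.0, "crd": 0.8}
total = combine_losses(loss_values, factors)
```

Because the function only uses `*` and `+`, the same pattern should also work when the values are PyTorch scalar tensors, so the total can still be backpropagated.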

Why using `log_softmax` instead of `softmax`?

The same question has been asked here and here. These repositories (I think you already know them) are other attempts to implement knowledge distillation algorithms.

Could you please explain why log_softmax is used instead of softmax?

torchdistill/torchdistill/losses/single.py

Lines 99 to 106 in 993ee94

    
    def forward(self, student_output, teacher_output, targets=None, *args, **kwargs):
        soft_loss = super().forward(torch.log_softmax(student_output / self.temperature, dim=1),
                                    torch.softmax(teacher_output / self.temperature, dim=1))
        if self.alpha is None or self.alpha == 0 or targets is None:
            return soft_loss

        hard_loss = self.cross_entropy_loss(student_output, targets)
        return self.alpha * hard_loss + self.beta * (self.temperature ** 2) * soft_loss
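
A possible explanation, assuming the parent class of the snippet above is a KL-divergence loss such as torch.nn.KLDivLoss: that loss expects its first argument as log-probabilities and its second as probabilities, so the student side goes through log_softmax and the teacher side through softmax. The pure-Python sketch below reproduces that convention on a single example. The helper names and logits are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Probabilities from logits, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature=1.0):
    """Log-probabilities computed directly, without forming probabilities first."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def kl_div(student_log_probs, teacher_probs):
    """KL(teacher || student) with the student given as log-probabilities."""
    return sum(p * (math.log(p) - log_q)
               for p, log_q in zip(teacher_probs, student_log_probs) if p > 0)


# Hypothetical logits for one sample with three classes.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]
T = 4.0

loss = kl_div(log_softmax(student_logits, T), softmax(teacher_logits, T))
self_loss = kl_div(log_softmax(teacher_logits, T), softmax(teacher_logits, T))

# Taking math.log of softmax([1000.0, 0.0]) would fail because the second
# probability underflows to 0.0; log_softmax stays finite.
stable = log_softmax([1000.0, 0.0])
```

Computing the log-probabilities directly also avoids the underflow that taking the log of a softmax output can cause for large logit gaps.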

Does torchdistill support knowledge distillation for Vision Foundation Models like Grounding Dino / Grounding DinoSAM?

Hi Team,

Currently I am working with the Grounding Dino vision foundation model for object detection (https://github.com/IDEA-Research/GroundingDINO). The model size is around 660 MB. I want to deploy it on an edge device, and I would like to use the Grounding Dino model as the teacher model for KD.

I want to know whether the torchdistill package supports vision foundation models.
If so, is there any sample link or demo code available for vision foundation model KD?

Thanks for your help.

How should I use torchdistill?

How should torchdistill be used in a project?

RuntimeError: CUDA error: device-side assert triggered

Hi Yoshitomo,

My machine has 2 Titan V GPUs + Torch 1.7.1 + CUDA 11.0 + TorchVision 0.8.2

I ran python examples/image_classification.py --config configs/sample/ilsvrc2012/single_stage/kd/alexnet_from_resnet152.yaml --log log/ilsvrc2012/kd/alexnet_from_resnet152.txt

Then I got the error RuntimeError: CUDA error: device-side assert triggered:

2021/02/24 13:53:53 INFO torchdistill.common.main_util Not using distributed mode
2021/02/24 13:53:53 INFO main Namespace(adjust_lr=False, config='configs/sample/ilsvrc2012/single_stage/kd/alexnet_from_resnet152.yaml', device='cuda', dist_url='env://', log='log/ilsvrc2012/kd/alexnet_from_resnet152.txt', start_epoch=0, student_only=False, sync_bn=False, test_only=False, world_size=1)
2021/02/24 13:53:53 INFO torchdistill.datasets.util Loading train data
2021/02/24 13:53:58 INFO torchdistill.datasets.util dataset_id ilsvrc2012/train: 4.215242624282837 sec
2021/02/24 13:53:58 INFO torchdistill.datasets.util Loading val data
2021/02/24 13:53:58 INFO torchdistill.datasets.util dataset_id ilsvrc2012/val: 0.18817710876464844 sec
2021/02/24 13:53:59 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/ilsvrc2012/teacher/ilsvrc2012-resnet152.pt
2021/02/24 13:54:02 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/ilsvrc2012/single_stage/kd/ilsvrc2012-alexnet_from_resnet152.pt
2021/02/24 13:54:02 INFO main Start training
2021/02/24 13:54:02 INFO torchdistill.models.util [teacher model]
2021/02/24 13:54:02 INFO torchdistill.models.util Using the original teacher model
2021/02/24 13:54:02 INFO torchdistill.models.util [student model]
2021/02/24 13:54:02 INFO torchdistill.models.util Using the original student model
2021/02/24 13:54:02 INFO torchdistill.core.distillation Loss = 1.0 * OrgLoss
2021/02/24 13:54:02 INFO torchdistill.core.distillation Freezing the whole teacher model
2021/02/24 13:54:06 INFO torchdistill.misc.log Epoch: [0] [ 0/40037] eta: 1 day, 21:05:22 lr: 0.0001 img/s: 11.305412092278724 loss: 7.0715 (7.0715) time: 4.0543 data: 1.2238 max mem: 2885
/opt/conda/conda-bld/pytorch_1607370172916/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
File "examples/image_classification.py", line 181, in
main(argparser.parse_args())
File "examples/image_classification.py", line 163, in main
train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
File "examples/image_classification.py", line 123, in train
train_one_epoch(training_box, device, epoch, log_freq)
File "examples/image_classification.py", line 66, in train_one_epoch
metric_logger.update(loss=loss.item(), lr=training_box.optimizer.param_groups[0]['lr'])
RuntimeError: CUDA error: device-side assert triggered
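
The assertion `t >= 0 && t < n_classes` in the log above states that every target label must lie in the range [0, n_classes). A quick way to check a dataset for this is to scan its labels before training. The helper below is a hypothetical plain-Python sketch, not part of torchdistill, and the example labels are invented.

```python
def find_invalid_targets(targets, num_classes):
    """Return (index, label) pairs whose label falls outside [0, num_classes)."""
    return [(i, t) for i, t in enumerate(targets) if not 0 <= t < num_classes]


# Hypothetical labels: 1000 is out of range for a 1000-class model,
# and -1 is sometimes used as an "ignore" marker.
labels = [0, 17, 999, 1000, -1]
bad = find_invalid_targets(labels, num_classes=1000)
```

If the returned list is non-empty, the dataset labels and the number of output classes of the model do not match, for example because of an extra or misnamed class folder.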

Use different models as Teacher/Student

@yoshitomo-matsubara @bot66 @dostos

How can I use different models as teacher and student? Let's say I want to use EfficientNet-B5 as the teacher and EfficientNet-B0 as the student.

[BUG]ImportError: cannot import name 'import_dependencies' from 'torchdistill.common.main_util'

Hi Yoshitomo, I have a question about setting up a virtual environment. How can I resolve it?
Describe the bug
Traceback (most recent call last):
File "H:/torchdistill/torchdistill-main/examples/torchvision/image_classification.py", line 14, in
from torchdistill.common.main_util import is_main_process, init_distributed_mode, load_ckpt, save_ckpt, set_seed,
ImportError: cannot import name 'import_dependencies' from 'torchdistill.common.main_util' (E:\Anaconda\envs\torchdistill\lib\site-packages\torchdistill\common\main_util.py)

Environment (please complete the following information):

OS: [e.g. Ubuntu 20.04 LTS]
Python ver. 3.8
torchdistill ver. v0.3.3
torch==1.8.0+cu111 torchvision==0.9.0+cu111

Using forward hook for auxiliary loss

Hello!

I was trying to find a way to use the intermediate features of a pretrained ResNet without changing the architecture, and then found this GitHub repo from the 2020 RRPR paper "torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation".

In the paper it mentions:
Taking an advantage of forward hook paradigm in PyTorch [30], torchdistill supports introducing such auxiliary modules without altering the original implementations of the models.

I was very excited to find this repo since this is exactly what I was looking for!

However, it is difficult for me to actually find example code for this.

Could anyone kindly point out where in this repo I can find the relevant code, or provide an example?

Thanks for the help in advance :)
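
For reference, the forward hook paradigm mentioned in the quote can be sketched in plain PyTorch as follows. This is not torchdistill's config-driven mechanism: the tiny `nn.Sequential` model stands in for a pretrained backbone, and `io_dict` and `make_hook` are hypothetical names. Only `register_forward_hook` and the returned handle's `remove()` are PyTorch API.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained backbone.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

io_dict = {}  # collects intermediate outputs by name


def make_hook(name):
    def hook(module, inputs, output):
        # Called after the module's forward pass; store a detached copy.
        io_dict[name] = output.detach()
    return hook


# Attach the hook to an intermediate module without editing the model class.
handle = model[1].register_forward_hook(make_hook("relu"))

x = torch.randn(2, 8)
logits = model(x)            # normal forward pass; the hook fires as a side effect
features = io_dict["relu"]   # intermediate features, usable for an auxiliary loss

handle.remove()              # detach the hook when it is no longer needed
```

The captured tensor can then be fed into an auxiliary module or loss, while the original model definition stays untouched.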

Hyperparameter tuning

Hey,
Thank you for your great effort in creating this tool.
Is there a way to tune hyperparameters using your current framework, or should I add Ray Tune to it?

Implementation of SemCKD

Hi @yoshitomo-matsubara ,

Thanks for your earlier advice. I have now implemented another method, SemCKD, based on this framework. Because of the way it trains the student, I have made some changes to the framework, but I am not sure whether they could hurt the pipeline.

In this method, when the student network runs its forward pass, it needs to reshape the output feature of each layer to match the corresponding feature in the teacher network, so we need to know the shape of the feature at each layer of the teacher network. To do that, I added a new forward_proc function like this:

@register_forward_proc_func
def forward_batch_teacher_output(model, sample_batch, targets=None, supp_dict=None, teacher_io_dict=None):
    return model(sample_batch, teacher_io_dict)

I am not sure if there is a better way to do that.

I also designed a wrapper, like the one for SSKD, to compute modules outside the original backbone, such as key, query and value in attention. The problem is that the existing framework extracts every value with the hook manager, including those in the post_forward function, which can lead to a long list of paths in the config file. I added a return value to post_forward, so I put all those values from post_forward into a tuple and pass it to the loss module directly. Maybe using the hook manager only for the original backbone would make it easier to port other code?
There is another question: I found there are two losses, org_term and sub_term. If I only use one kind of loss (such as SemCKD), they are the same, right? If I want to implement SemCKD+CRD, can I directly use these two terms?

Thank you very much!

CSE-L2 KD mobilenetv2 from resnet18 on cifar100

I am trying to apply CSE-L2. The student model is mobilenetv2 and the teacher model is resnet18. The dataset is cifar10/100. Here is my problem.

(torchdistill) lthpc@lthpc:/data/Code/Wang_Yufei/PAD/torchdistill$ bash cifar100.sh
2021/07/04 19:16:30 INFO torchdistill.common.main_util Not using distributed mode
2021/07/04 19:16:30 INFO main Namespace(adjust_lr=False, config='configs/sample/cifar100/kd/L2-mobilenetv2_from_resnet_18-final_run.yaml', device='cuda', dist_url='env://', log='log/cifar100/kd/L2-mobilenetv2_from_resnet_18-final_run.log', seed=None, start_epoch=0, student_only=False, sync_bn=False, test_only=False, world_size=1)
2021/07/04 19:16:30 INFO torchdistill.datasets.util Loading train data
Files already downloaded and verified
2021/07/04 19:16:31 INFO torchdistill.datasets.util dataset_id cifar100/train: 0.9996764659881592 sec
2021/07/04 19:16:31 INFO torchdistill.datasets.util Loading val data
Files already downloaded and verified
2021/07/04 19:16:31 INFO torchdistill.datasets.util dataset_id cifar100/val: 0.6575329303741455 sec
2021/07/04 19:16:31 INFO torchdistill.datasets.util Loading test data
Files already downloaded and verified
2021/07/04 19:16:32 INFO torchdistill.datasets.util dataset_id cifar100/test: 0.6455981731414795 sec
2021/07/04 19:16:32 INFO torchdistill.common.main_util Loading model parameters
2021/07/04 19:16:36 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/cifar100/L2/cifar100-mobilev2_from_resnet_18-final_run.pt
2021/07/04 19:16:36 INFO main Start training
2021/07/04 19:16:36 INFO torchdistill.models.util [teacher model]
2021/07/04 19:16:36 INFO torchdistill.models.util Using the original teacher model
2021/07/04 19:16:36 INFO torchdistill.models.util [student model]
2021/07/04 19:16:36 INFO torchdistill.models.util Using the original student model
Traceback (most recent call last):
File "examples/image_classification.py", line 180, in
main(argparser.parse_args())
File "examples/image_classification.py", line 162, in main
train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
File "examples/image_classification.py", line 110, in train
device, device_ids, distributed, lr_factor)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 406, in get_distillation_box
device, device_ids, distributed, lr_factor, accelerator)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 229, in init
self.setup(train_config)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 105, in setup
self.setup_teacher_student_models(teacher_config, student_config)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 82, in setup_teacher_student_models
student_config, self.student_io_dict))
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/util.py", line 39, in set_hooks
requires_input, requires_output, io_dict)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/forward_hook.py", line 75, in register_forward_hook_with_dict
return module.register_forward_hook(forward_hook4output)
AttributeError: 'function' object has no attribute 'register_forward_hook'
(torchdistill) lthpc@lthpc:/data/Code/Wang_Yufei/PAD/torchdistill$

Do you know the answers?

Not a bug but a discrepancy between the log and config file for kd-resnet18_from_resnet34

Describe the bug
Looking at the config file for resnet18 and its corresponding log file, kd-resnet18_from_resnet34.log, it seems the provided config file differs from the one used to create the log file.
If you look closely, the log file shows a learning rate of 0.3, while the config file sets it to 0.1.

Drop Fully Connected Layer of a Pretrained model

Hey,
I was wondering if we can drop the fully connected layer for a pretrained model on ImageNet and fine-tune it.

Is there any possibility to integrate your framework with the models mentioned in this link or here?

[BUG] Missing Link in Readme

Hi,
Thanks for the great repo! Saves us a lot of time!

I wanted to download checkpoints from the ILSVRC2012 R34 -> R18 distillation, but the last column's (KR: Knowledge Review) checkpoint seems to be missing. The top-1 accuracy is mentioned in the main README, but in the second README (in the ImageNet folder) it disappears.

Thanks a lot.
Amin

I tried this script too, but only a single nproc seems to work. Do I need to define any additional environment variables like RANK or local host?

Also, can we enable --amp in torchdistill? The img/s on ImageNet is also pretty good with the original script: https://github.com/pytorch/vision/blob/main/references/classification/train.py
What can I do differently to get a higher img/s (I'm getting around 12 img/s :( ) and get the multi-GPU run working?

Please help.

Originally posted by @arpitsahni04 in #378 (reply in thread)

Can you please tell me why the data loading speed (img/s) is much lower in torchdistill than with the train.py file in torchvision for the same dataset?

Bug. Bad implementation.

I found a bug in cifar-resnet.

I changed AvgPool2d(8, stride=1) to AdaptiveAvgPool2d(1), which is actually correct.

Besides, thanks for your contribution in open-sourcing this. I am going to use this repo to build my own distillation algorithm. But to be honest, it seems that there are some rather big bugs in this repo. So could you please tell me whether all your experiments were done correctly using this repo? I am quite confused about that.
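
The difference between the two pooling layers can be illustrated with the standard output-size formula for average pooling without ceil mode: out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below is plain arithmetic, and the feature-map sizes are hypothetical examples rather than values taken from the repo.

```python
def avg_pool_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a fixed-kernel average pool."""
    return (size + 2 * padding - kernel) // stride + 1


# Fixed 8x8 kernel with stride 1: output size depends on the input feature map.
fixed = {size: avg_pool_out(size, kernel=8, stride=1) for size in (8, 16, 32)}

# An adaptive pool with target size 1 yields 1 regardless of the input size.
adaptive = {size: 1 for size in (8, 16, 32)}
```

With an 8x8 feature map both layers produce a 1x1 output, so the two choices only differ when the input resolution, and therefore the final feature-map size, changes.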

Affinity Loss usage

Hi @yoshitomo-matsubara
I want to do semantic segmentation with distillation using "AffinityLoss", as in the paper "Knowledge Adaptation for Efficient Semantic Segmentation".

It is defined in the line
class AffinityLoss(nn.Module):

I tried the example mentioned in
python3 examples/semantic_segmentation.py --config configs/sample/coco2017/multi_stage/ktaad/lraspp_mobilenet_v3_large_from_deeplabv3_resnet50.yaml --log log/coco2017/ktaad/lraspp_mobilenet_v3_large_from_deeplabv3_resnet50.txt

But while debugging, the AffinityLoss module is never executed.
Could you please help us with how to proceed?

If the Teacher model is different from Student model, how can I use this framework？

Hi. Thanks a lot for the great framework. I want to know how to use it when the student model is different from the teacher model. For example, the teacher model is BERT and the student model is an RNN (a small model). How should I use this framework?
Could you please tell me how I can implement that? Thanks

Similarity Preserving KD

Hi,
Thanks for your amazing work.

I have a question regarding the implementation of Similarity Preserving KD loss in

torchdistill/torchdistill/losses/single.py

Line 467 in 7f533ba

spkd_loss = spkd_losses.sum()

In the implementation, the loss is calculated by taking the Frobenius norm of the difference between the square matrices of the teacher and student, and then summing the result. However, torch.norm would calculate the norm and give a single value for a layer. I am confused about why we take the sum over it if it is a single value.

The paper says the loss is the summation, over different layer pairs, of the mean element-wise squared difference between the two square matrices. So does the sum correspond to different layers?

Thank you.

About the application scenarios supported by the program

Hi, I would like to know whether all the reproduced methods in this project support semantic segmentation, classification, and object detection, or only the task addressed in the original paper. For example, if the original paper performs knowledge distillation for object detection, does the implementation in this project only support object detection?

Inquiry about "CacheableDataset from wrapper"

Hey,

I am trying to tailor your framework to accommodate my experiments on ImageNet. One of my experiments is to implement the paper "Knowledge distillation: A good teacher is patient and consistent", where data augmentation is applied on both the teacher and student sides. Each model has different input data. I expect I would need to create a different dataloader for each of them. I will also keep the same augmentation method for each image so that I can store the teacher's output vector on my hard drive (SSD) using the default_idx2subpath function. I found that saving the data on the SSD, compared to loading the images, made the training runs slower.

I am trying to train a student model (ResNet18) with a teacher model (ResNet34) on ImageNet using 2 GPUs. Could you recommend the best setup for this pipeline to run faster given the current resources?

Another question: if I need to train using a cosine scheduler (CosineAnnealingLR) with the Adam optimizer, which method should I change?
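
As background for the scheduler question, the closed-form cosine annealing schedule (without restarts) is lr_t = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * t / T_max)). The sketch below evaluates that formula in plain Python. The function name and the example values (base learning rate 0.001, 100 epochs) are hypothetical, and whether torchdistill's optimizer and scheduler config entries accept 'Adam' and 'CosineAnnealingLR' as type names is an assumption to verify against the repo.

```python
import math


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Learning rate at a given epoch under closed-form cosine annealing."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


# Hypothetical setting: start at 1e-3 and anneal to 0 over 100 epochs.
schedule = [cosine_annealing_lr(0.001, epoch, t_max=100) for epoch in range(101)]
```

The schedule starts at the base learning rate, reaches half of it at the midpoint, and decays smoothly to eta_min at epoch T_max.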

[BUG] fp16 causes AssertionError: No inf checks were recorded for this optimizer

Describe the bug
I modified examples/legacy/image_classification.py to work with Hugging Face Accelerate and ran into the following error:

Traceback (most recent call last):
  File "examples/legacy/image_classification_accelerate.py", line 217, in <module>
    main(argparser.parse_args())
  File "examples/legacy/image_classification_accelerate.py", line 198, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args, accelerator)
  File "examples/legacy/image_classification_accelerate.py", line 129, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "examples/legacy/image_classification_accelerate.py", line 71, in train_one_epoch
    training_box.update_params(loss)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torchdistill/core/distillation.py", line 316, in update_params
    self.optimizer.step()
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/optimizer.py", line 133, in step
    self.scaler.step(self.optimizer, closure)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.

To Reproduce
Provide

Exact command to run your code
accelerate launch examples/legacy/image_classification_accelerate.py --config /workspace/sync/torchdistill/configs/legacy/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/at-vit-base_from_vit-base.yaml
Whether or not you made any changes in Python code (if so, how you made the changes?)
I have enabled the fp16 multi-gpu option in the configuration file of accelerate. My main experiment configuration file is for the AT algorithm.
I made some modifications to the image_classification file, mainly following the modifications made to the text_classification.py file by the author. I did not make any personalized changes and simply followed the approach of text_classification.py with minimal modifications, which ultimately led to this error.
YAML config file

datasets:
  ilsvrc2012:
    name: &dataset_name 'ilsvrc2012'
    type: 'ImageFolder'
    root: &root_dir !join ['/workspace/sync/imagenet-1k']
    splits:
      train:
        dataset_id: &imagenet_train !join [*dataset_name, '/train']
        params:
          root: !join [*root_dir, '/train']
          transform_params:
            - type: 'RandomResizedCrop'
              params:
                size: &input_size [224, 224]
            - type: 'RandomHorizontalFlip'
              params:
                p: 0.5
            - &totensor
              type: 'ToTensor'
              params:
            - &normalize
              type: 'Normalize'
              params:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
      val:
        dataset_id: &imagenet_val !join [*dataset_name, '/val']
        params:
          root: !join [*root_dir, '/val']
          transform_params:
            - type: 'Resize'
              params:
                size: 256
            - type: 'CenterCrop'
              params:
                size: *input_size
            - *totensor
            - *normalize

models:
  teacher_model:
    name: &teacher_model_name 'maskedvit_base_patch16_224'
    params:
      num_classes: 1000
      pretrained: True
      mask_ratio: 0.0
    experiment: &teacher_experiment !join [*dataset_name, '-', *teacher_model_name]
    ckpt: !join ['./resource/ckpt/ilsvrc2012/teacher/', *teacher_experiment, '.pt']
  student_model:
    name: &student_model_name 'maskedvit_base_patch16_224'
    params:
      num_classes: 1000
      pretrained: False
      mask_ratio: 0.5
    experiment: &student_experiment !join [*dataset_name, '-', *student_model_name, '_from_', *teacher_model_name]
    ckpt: !join ['./imagenet/mask_distillation/', *student_experiment, '.pt']

train:
  log_freq: 1000
  num_epochs: 100
  train_data_loader:
    dataset_id: *imagenet_train
    random_sample: True
    batch_size: 64
    num_workers: 16
    cache_output:
  val_data_loader:
    dataset_id: *imagenet_val
    random_sample: False
    batch_size: 128
    num_workers: 16
  teacher:
    sequential: []
    forward_hook:
      input: []
      output: ['mask_filter']
    wrapper: 'DataParallel'
    requires_grad: False
  student:
    adaptations:
    sequential: []
    frozen_modules: []
    forward_hook:
      input: []
      output: ['mask_filter']
    wrapper: 'DistributedDataParallel'
    requires_grad: True
  optimizer:
    type: 'SGD'
    grad_accum_step: 16
    max_grad_norm: 5.0
    module_wise_params:
      - params: ['mask_token', 'cls_token', 'pos_embed']
        is_teacher: None
        module: None
        weight_decay: 0.0
    params:
      lr: 0.001
      momentum: 0.9
      weight_decay: 0.0001
      
  scheduler:
    type: 'MultiStepLR'
    params:
      milestones: [30, 60, 90]
      gamma: 0.1
  criterion:
    type: 'GeneralizedCustomLoss'
    org_term:
      criterion:
        type: 'CrossEntropyLoss'
        params:
          reduction: 'mean'
      factor: 1.0
    sub_terms:
      GenerativeKDLoss:
        criterion:
          type: 'GenerativeKDLoss'
          params:
            student_module_io: 'output'
            student_module_path: 'mask_filter'
            teacher_module_io: 'output'
            teacher_module_path: 'mask_filter'
        factor: 1.0

test:
  test_data_loader:
    dataset_id: *imagenet_val
    random_sample: False
    batch_size: 1
    num_workers: 16

Log file

(pytorch_1) root@baa8ef5448b2:/workspace/sync/torchdistill# accelerate launch examples/legacy/image_classification_accelerate.py --config /workspace/sync/torchdistill/configs/legacy/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/at-vit-base_from_vit-base.yaml
2023/08/15 02:49:09     INFO    __main__        Namespace(adjust_lr=False, config='/workspace/sync/torchdistill/configs/legacy/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/at-vit-base_from_vit-base.yaml', device='cuda', dist_url='env://', log=None, log_config=False, seed=None, start_epoch=0, student_only=False, test_only=False, world_size=1)
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Added key: store_based_barrier_key:1 to store for rank: 0
2023/08/15 02:49:09     INFO    __main__        Namespace(adjust_lr=False, config='/workspace/sync/torchdistill/configs/legacy/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/at-vit-base_from_vit-base.yaml', device='cuda', dist_url='env://', log=None, log_config=False, seed=None, start_epoch=0, student_only=False, test_only=False, world_size=1)
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Added key: store_based_barrier_key:1 to store for rank: 1
2023/08/15 02:49:09     INFO    __main__        Namespace(adjust_lr=False, config='/workspace/sync/torchdistill/configs/legacy/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/at-vit-base_from_vit-base.yaml', device='cuda', dist_url='env://', log=None, log_config=False, seed=None, start_epoch=0, student_only=False, test_only=False, world_size=1)
2023/08/15 02:49:09     INFO    __main__        Namespace(adjust_lr=False, config='/workspace/sync/torchdistill/configs/legacy/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/at-vit-base_from_vit-base.yaml', device='cuda', dist_url='env://', log=None, log_config=False, seed=None, start_epoch=0, student_only=False, test_only=False, world_size=1)
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Added key: store_based_barrier_key:1 to store for rank: 2
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Added key: store_based_barrier_key:1 to store for rank: 3
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023/08/15 02:49:09     INFO    torch.distributed.distributed_c10d      Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023/08/15 02:49:09     INFO    __main__        Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

2023/08/15 02:49:09     INFO    torchdistill.datasets.util      Loading train data
2023/08/15 02:49:12     INFO    torchdistill.datasets.util      dataset_id `ilsvrc2012/train`: 2.874385356903076 sec
2023/08/15 02:49:12     INFO    torchdistill.datasets.util      Loading val data
2023/08/15 02:49:12     INFO    torchdistill.datasets.util      dataset_id `ilsvrc2012/val`: 0.12787175178527832 sec
2023/08/15 02:49:15     INFO    timm.models._builder    Loading pretrained weights from Hugging Face hub (timm/vit_base_patch16_224.augreg2_in21k_ft_in1k)
2023/08/15 02:49:16     INFO    timm.models._hub        [timm/vit_base_patch16_224.augreg2_in21k_ft_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
2023/08/15 02:49:16     INFO    torchdistill.common.main_util   ckpt file is not found at `./resource/ckpt/ilsvrc2012/teacher/ilsvrc2012-maskedvit_base_patch16_224.pt`
2023/08/15 02:49:18     INFO    torchdistill.common.main_util   ckpt file is not found at `./imagenet/mask_distillation/ilsvrc2012-maskedvit_base_patch16_224_from_maskedvit_base_patch16_224.pt`
2023/08/15 02:49:18     INFO    __main__        Start training
2023/08/15 02:49:18     INFO    torchdistill.models.util        [teacher model]
2023/08/15 02:49:18     INFO    torchdistill.models.util        Using the original teacher model
2023/08/15 02:49:18     INFO    torchdistill.models.util        [student model]
2023/08/15 02:49:18     INFO    torchdistill.models.util        Using the original student model
2023/08/15 02:49:18     INFO    torchdistill.core.distillation  Loss = 1.0 * OrgLoss + 1.0 * GenerativeKDLoss(
  (cross_entropy_loss): CrossEntropyLoss()
  (SmoothL1Loss): SmoothL1Loss()
)
2023/08/15 02:49:18     INFO    torchdistill.core.distillation  Freezing the whole teacher model
2023/08/15 02:49:18     INFO    torchdistill.common.module_util `None` of `None` could not be reached in `DataParallel`
/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/state.py:802: FutureWarning: The `use_fp16` property is deprecated and will be removed in version 1.0 of Accelerate use `AcceleratorState.mixed_precision == 'fp16'` instead.
  warnings.warn(
/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/state.py:802: FutureWarning: The `use_fp16` property is deprecated and will be removed in version 1.0 of Accelerate use `AcceleratorState.mixed_precision == 'fp16'` instead.
  warnings.warn(
/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/state.py:802: FutureWarning: The `use_fp16` property is deprecated and will be removed in version 1.0 of Accelerate use `AcceleratorState.mixed_precision == 'fp16'` instead.
  warnings.warn(
/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/state.py:802: FutureWarning: The `use_fp16` property is deprecated and will be removed in version 1.0 of Accelerate use `AcceleratorState.mixed_precision == 'fp16'` instead.
  warnings.warn(
2023/08/15 02:49:24     INFO    torchdistill.misc.log   Epoch: [0]  [   0/5005]  eta: 8:39:24  lr: 0.001  img/s: 21.99282017795937  loss: 0.4513 (0.4513)  time: 6.2267  data: 3.3162  max mem: 8400
2023/08/15 02:49:24     INFO    torch.nn.parallel.distributed   Reducer buckets have been rebuilt in this iteration.
2023/08/15 02:49:24     INFO    torch.nn.parallel.distributed   Reducer buckets have been rebuilt in this iteration.
Traceback (most recent call last):
  File "examples/legacy/image_classification_accelerate.py", line 217, in <module>
    main(argparser.parse_args())
  File "examples/legacy/image_classification_accelerate.py", line 198, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args, accelerator)
  File "examples/legacy/image_classification_accelerate.py", line 129, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "examples/legacy/image_classification_accelerate.py", line 71, in train_one_epoch
    training_box.update_params(loss)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torchdistill/core/distillation.py", line 316, in update_params
    self.optimizer.step()
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/optimizer.py", line 133, in step
    self.scaler.step(self.optimizer, closure)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
Traceback (most recent call last):
  File "examples/legacy/image_classification_accelerate.py", line 217, in <module>
    main(argparser.parse_args())
  File "examples/legacy/image_classification_accelerate.py", line 198, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args, accelerator)
  File "examples/legacy/image_classification_accelerate.py", line 129, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "examples/legacy/image_classification_accelerate.py", line 71, in train_one_epoch
    training_box.update_params(loss)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torchdistill/core/distillation.py", line 316, in update_params
    self.optimizer.step()
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/optimizer.py", line 133, in step
    self.scaler.step(self.optimizer, closure)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
Traceback (most recent call last):
  File "examples/legacy/image_classification_accelerate.py", line 217, in <module>
    main(argparser.parse_args())
  File "examples/legacy/image_classification_accelerate.py", line 198, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args, accelerator)
  File "examples/legacy/image_classification_accelerate.py", line 129, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "examples/legacy/image_classification_accelerate.py", line 71, in train_one_epoch
    training_box.update_params(loss)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torchdistill/core/distillation.py", line 316, in update_params
    self.optimizer.step()
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/optimizer.py", line 133, in step
    self.scaler.step(self.optimizer, closure)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
Traceback (most recent call last):
  File "examples/legacy/image_classification_accelerate.py", line 217, in <module>
    main(argparser.parse_args())
  File "examples/legacy/image_classification_accelerate.py", line 198, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args, accelerator)
  File "examples/legacy/image_classification_accelerate.py", line 129, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "examples/legacy/image_classification_accelerate.py", line 71, in train_one_epoch
    training_box.update_params(loss)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torchdistill/core/distillation.py", line 316, in update_params
    self.optimizer.step()
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/optimizer.py", line 133, in step
    self.scaler.step(self.optimizer, closure)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3701268) of binary: /root/miniconda3/envs/pytorch_1/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/pytorch_1/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/commands/launch.py", line 970, in launch_command
    multi_gpu_launcher(args)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/pytorch_1/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/legacy/image_classification_accelerate.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-08-15_02:49:37
  host      : baa8ef5448b2
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 3701269)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-08-15_02:49:37
  host      : baa8ef5448b2
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 3701270)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-08-15_02:49:37
  host      : baa8ef5448b2
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 3701271)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-15_02:49:37
  host      : baa8ef5448b2
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3701268)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Environment (please complete the following information):

OS: Ubuntu 22.04 LTS
Python ver. 3.8
torchdistill ver. v0.3.3

(pytorch_1) root@baa8ef5448b2:/workspace/sync/torchdistill# conda list
# packages in environment at /root/miniconda3/envs/pytorch_1:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
accelerate                0.21.0                   pypi_0    pypi
blas                      1.0                         mkl    defaults
brotlipy                  0.7.0           py38h27cfd23_1003    defaults
bzip2                     1.0.8                h7b6447c_0    defaults
ca-certificates           2023.05.30           h06a4308_0    defaults
certifi                   2023.7.22        py38h06a4308_0    defaults
cffi                      1.15.1           py38h5eee18b_3    defaults
charset-normalizer        2.0.4              pyhd3eb1b0_0    defaults
contourpy                 1.1.0                    pypi_0    pypi
cryptography              41.0.2           py38h22a60cf_0    defaults
cuda-cudart               11.7.99                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
cycler                    0.11.0                   pypi_0    pypi
cython                    3.0.0                    pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
filelock                  3.12.2                   pypi_0    pypi
fonttools                 4.42.0                   pypi_0    pypi
freetype                  2.12.1               h4a9f257_0    defaults
fsspec                    2023.6.0                 pypi_0    pypi
future                    0.18.3           py38h06a4308_0    defaults
giflib                    5.2.1                h5eee18b_3    defaults
gmp                       6.2.1                h295c915_3    defaults
gnutls                    3.6.15               he1e5248_0    defaults
huggingface-hub           0.16.4                   pypi_0    pypi
idna                      3.4              py38h06a4308_0    defaults
importlib-resources       6.0.1                    pypi_0    pypi
intel-openmp              2023.1.0         hdb19cb5_46305    defaults
jpeg                      9e                   h5eee18b_1    defaults
kiwisolver                1.4.4                    pypi_0    pypi
lame                      3.100                h7b6447c_0    defaults
lcms2                     2.12                 h3be6417_0    defaults
ld_impl_linux-64          2.38                 h1181459_1    defaults
lerc                      3.0                  h295c915_0    defaults
libcublas                 11.10.3.66                    0    nvidia
libcufft                  10.7.2.124           h4fbf590_0    nvidia
libcufile                 1.7.1.12                      0    nvidia
libcurand                 10.3.3.129                    0    nvidia
libcusolver               11.4.0.1                      0    nvidia
libcusparse               11.7.4.91                     0    nvidia
libdeflate                1.17                 h5eee18b_0    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgfortran-ng            11.2.0               h00389a5_1    defaults
libgfortran5              11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libiconv                  1.16                 h7f8727e_2    defaults
libidn2                   2.3.4                h5eee18b_0    defaults
libnpp                    11.7.4.75                     0    nvidia
libnvjpeg                 11.8.0.2                      0    nvidia
libopenblas               0.3.21               h043d6bf_0    defaults
libpng                    1.6.39               h5eee18b_0    defaults
libprotobuf               3.20.3               he621ea3_0    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
libtasn1                  4.19.0               h5eee18b_0    defaults
libtiff                   4.5.0                h6a678d5_2    defaults
libunistring              0.9.10               h27cfd23_0    defaults
libwebp                   1.2.4                h11a3e52_1    defaults
libwebp-base              1.2.4                h5eee18b_1    defaults
lz4-c                     1.9.4                h6a678d5_0    defaults
matplotlib                3.7.2                    pypi_0    pypi
mkl                       2023.1.0         h213fc3f_46343    defaults
mkl-service               2.4.0            py38h5eee18b_1    defaults
mkl_fft                   1.3.6            py38h417a72b_1    defaults
mkl_random                1.2.2            py38h417a72b_1    defaults
ncurses                   6.4                  h6a678d5_0    defaults
nettle                    3.7.3                hbbd107a_1    defaults
ninja                     1.10.2               h06a4308_5    defaults
ninja-base                1.10.2               hd09550d_5    defaults
numpy                     1.24.3           py38hf6e8229_1    defaults
numpy-base                1.24.3           py38h060ed82_1    defaults
openh264                  2.1.1                h4ff587b_0    defaults
openssl                   3.0.10               h7f8727e_0    defaults
packaging                 23.1                     pypi_0    pypi
pillow                    9.4.0            py38h6a678d5_0    defaults
pip                       23.2.1           py38h06a4308_0    defaults
psutil                    5.9.5                    pypi_0    pypi
pycocotools               2.0.6                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0    defaults
pyopenssl                 23.2.0           py38h06a4308_0    defaults
pyparsing                 3.0.9                    pypi_0    pypi
pysocks                   1.7.1            py38h06a4308_0    defaults
python                    3.8.17               h955ad1f_0    defaults
python-dateutil           2.8.2                    pypi_0    pypi
pytorch                   1.13.0          py3.8_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pyyaml                    6.0              py38h5eee18b_1    defaults
readline                  8.2                  h5eee18b_0    defaults
requests                  2.31.0           py38h06a4308_0    defaults
safetensors               0.3.2                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                68.0.0           py38h06a4308_0    defaults
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
tbb                       2021.8.0             hdb19cb5_0    defaults
timm                      0.9.5                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
torchaudio                0.13.0               py38_cu117    pytorch
torchdistill              0.3.3                    pypi_0    pypi
torchvision               0.14.0               py38_cu117    pytorch
tqdm                      4.66.1                   pypi_0    pypi
typing-extensions         4.7.1            py38h06a4308_0    defaults
typing_extensions         4.7.1            py38h06a4308_0    defaults
urllib3                   1.26.16          py38h06a4308_0    defaults
wheel                     0.38.4           py38h06a4308_0    defaults
xz                        5.4.2                h5eee18b_0    defaults
yaml                      0.2.5                h7b6447c_0    defaults
zipp                      3.16.2                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    defaults
zstd                      1.5.5                hc292b87_0    defaults

Support for SSD Object Detection Model?

Hi. Thanks for the great framework. Are there any plans to support SSD-like models for object detection knowledge distillation? An example would be an SSD with a larger backbone as the teacher and another SSD with a lighter backbone as the student, or even some other network as the teacher.
If there are no such plans, could you please guide me on how to implement this? Thanks

[BUG] ModuleNotFoundError: No module named 'torch._six'

Bug description
Importing torchdistill.core.forward_hook with an up-to-date version of PyTorch raises ModuleNotFoundError: No module named 'torch._six'.

To Reproduce

Exact command to run your code: import torchdistill.core.forward_hook
Whether or not you made any changes in Python code (if so, how you made the changes?): Did not make changes.
YAML config file - not relevant.
Log file - not relevant.

Expected behavior
The import should succeed without errors.

Environment:

OS: Ubuntu 22.04 LTS
Python ver. 3.12.2
torchdistill ver. v0.3.3
torch ver. v2.2.1

Additional context
Related to this change in pytorch: pytorch/pytorch#94709
Apparently string_classes is no longer needed and str can be used instead.
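
A minimal sketch of a compatibility shim for the import described above. It assumes, as the linked PyTorch change suggests, that plain str is an adequate replacement for string_classes; the try/except fallback and the is_string helper are illustrative and not necessarily how torchdistill addressed the issue.

```python
# Illustrative shim: older PyTorch releases shipped torch._six,
# newer ones removed the module entirely.
try:
    from torch._six import string_classes  # only available in older PyTorch
except ImportError:
    # Assumption taken from the issue text: str can be used instead.
    string_classes = str


def is_string(value):
    """Return True if `value` is a string according to `string_classes`."""
    return isinstance(value, string_classes)
```

ModuleNotFoundError is a subclass of ImportError, so the same except clause covers both a missing torch._six module and a missing torch installation.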

Where is trained model?

I trained a segmentation model, but I can't find where the trained model was saved.

Custom Data

Hi, I adjusted the config file, but when I run segmentation.py it always downloads PASCAL VOC.
How can I change the config to train on my own dataset, which consists of images and labels?

ForwardHookManager on multiple GPUs

Hi,
How can I use ForwardHookManager with a model wrapped in DataParallel?
Thanks

AttributeError: 'dict' object has no attribute 'flatten'

Hello, I think your project is great, but after successfully installing torchdistill with PyTorch 1.6, I ran into the following problem at runtime. I look forward to your help, thanks.

AttributeError: 'dict' object has no attribute 'flatten'
Traceback (most recent call last):
  File "examples/image_classification.py", line 176, in <module>
    main(argparser.parse_args())
  File "examples/image_classification.py", line 158, in main
    distill(teacher_model, student_model, dataset_dict, device, device_ids, distributed, config, args)
  File "examples/image_classification.py", line 121, in distill
    distill_one_epoch(distillation_box, device, epoch, log_freq)
  File "examples/image_classification.py", line 61, in distill_one_epoch
    loss = distillation_box(sample_batch, targets, supp_dict)
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/distill/lib/python3.7/site-packages/torchdistill/core/distillation.py", line 253, in forward
    self.student_model.post_forward(self.student_io_dict)
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/distill/lib/python3.7/site-packages/torchdistill/models/special.py", line 422, in post_forward
    embed_outputs = io_dict[self.input_module_path][self.input_module_io].flatten(1)

Is it possible to use a model with YOLO framework?

Is it possible to use a model with YOLO framework?
Since the framework is written in C, I'm not sure it will work with PyTorch.

Distilling Knowledge from a image classification model with sigmoid function and binary cross entropy

Hi, I found this paper and GitHub repository, and the framework looks robust. Is it possible to use it to distill knowledge from a cumbersome image classification model that uses a sigmoid function for classification and binary cross entropy for loss computation? The cumbersome model was trained on a custom dataset, and I would like to distill its knowledge into a smaller network that uses softmax with binary cross entropy. If this is possible, what steps are required?

[BUG] Not supported to Nvidia 4090

Describe the bug
When I installed torchdistill==0.3.3 and torchvision==0.13.1, the setup did not support my CUDA version (12.0).
I assume the problem is that the installed torch library does not support my CUDA version.

Environment (please complete the following information):

OS: Ubuntu 20.04 LTS
Python ver: 3.7
torchdistill ver: v0.3.3

Additional context
Could you update some of the libraries so that those who are using newer GPUs can also use your repo directly? Thank you.

AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.

(torchdistill) lthpc@lthpc:/data/Code/Wang_Yufei/PAD/torchdistill$ bash run_train.sh

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

2021/06/29 17:16:43 INFO torchdistill.common.main_util | distributed init (rank 0): env://
2021/06/29 17:16:43 INFO torchdistill.common.main_util | distributed init (rank 1): env://
2021/06/29 17:16:43 INFO torchdistill.common.main_util | distributed init (rank 2): env://
2021/06/29 17:16:43 INFO root Added key: store_based_barrier_key:1 to store for rank: 1
2021/06/29 17:16:43 INFO root Added key: store_based_barrier_key:1 to store for rank: 2
2021/06/29 17:16:43 INFO root Added key: store_based_barrier_key:1 to store for rank: 0
2021/06/29 17:16:47 INFO main Namespace(adjust_lr=False, config='configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/cse_l2-resnet18_from_resnet34.yaml', device='cuda', dist_url='env://', log='log/ilsvrc2012/cse_l2-resnet18_from_resnet34.log', seed=None, start_epoch=0, student_only=False, sync_bn=False, test_only=False, world_size=3)
2021/06/29 17:16:47 INFO torchdistill.datasets.util Loading train data
2021/06/29 17:16:51 INFO torchdistill.datasets.util dataset_id ilsvrc2012/train: 4.093475580215454 sec
2021/06/29 17:16:51 INFO torchdistill.datasets.util Loading val data
2021/06/29 17:16:51 INFO torchdistill.datasets.util dataset_id ilsvrc2012/val: 0.1801161766052246 sec
2021/06/29 17:16:52 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/ilsvrc2012/teacher/ilsvrc2012-resnet34.pt
2021/06/29 17:16:52 INFO torchdistill.common.main_util Loading model parameters
2021/06/29 17:16:52 INFO main Start training
2021/06/29 17:16:52 INFO torchdistill.models.util [teacher model]
2021/06/29 17:16:52 INFO torchdistill.models.util Using the original teacher model
2021/06/29 17:16:52 INFO torchdistill.models.util [student model]
2021/06/29 17:16:52 INFO torchdistill.models.util Using the original student model
2021/06/29 17:16:52 INFO torchdistill.core.distillation Loss = 1.0 * OrgLoss + 15.0 * MSELoss()
2021/06/29 17:16:52 INFO torchdistill.core.distillation Freezing the whole teacher model
Traceback (most recent call last):
  File "examples/image_classification.py", line 180, in <module>
    main(argparser.parse_args())
  File "examples/image_classification.py", line 162, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
  File "examples/image_classification.py", line 110, in train
    device, device_ids, distributed, lr_factor)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 406, in get_distillation_box
    device, device_ids, distributed, lr_factor, accelerator)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 229, in __init__
    self.setup(train_config)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 125, in setup
    teacher_any_updatable)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/util.py", line 50, in wrap_model
    model = DistributedDataParallel(model, device_ids=device_ids, find_unused_parameters=find_unused_parameters)
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in __init__
    "DistributedDataParallel is not needed when a module "
AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
Traceback (most recent call last):
  File "examples/image_classification.py", line 180, in <module>
    main(argparser.parse_args())
  File "examples/image_classification.py", line 162, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
  File "examples/image_classification.py", line 110, in train
    device, device_ids, distributed, lr_factor)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 406, in get_distillation_box
    device, device_ids, distributed, lr_factor, accelerator)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 229, in __init__
    self.setup(train_config)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 125, in setup
    teacher_any_updatable)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/util.py", line 50, in wrap_model
    model = DistributedDataParallel(model, device_ids=device_ids, find_unused_parameters=find_unused_parameters)
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in __init__
    "DistributedDataParallel is not needed when a module "
AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
Traceback (most recent call last):
  File "examples/image_classification.py", line 180, in <module>
    main(argparser.parse_args())
  File "examples/image_classification.py", line 162, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
  File "examples/image_classification.py", line 110, in train
    device, device_ids, distributed, lr_factor)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 406, in get_distillation_box
    device, device_ids, distributed, lr_factor, accelerator)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 229, in __init__
    self.setup(train_config)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 125, in setup
    teacher_any_updatable)
  File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/util.py", line 50, in wrap_model
    model = DistributedDataParallel(model, device_ids=device_ids, find_unused_parameters=find_unused_parameters)
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in __init__
    "DistributedDataParallel is not needed when a module "
AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
Killing subprocess 28872
Killing subprocess 28873
Killing subprocess 28874
Traceback (most recent call last):
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/lthpc/.conda/envs/torchdistill/bin/python3', '-u', 'examples/image_classification.py', '--config', 'configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/cse_l2-resnet18_from_resnet34.yaml', '--log', 'log/ilsvrc2012/cse_l2-resnet18_from_resnet34.log', '--world_size', '3']' returned non-zero exit status 1.
(torchdistill) lthpc@lthpc:/data/Code/Wang_Yufei/PAD/torchdistill$
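
The assertion in the log above is raised when DistributedDataParallel is asked to wrap a module in which no parameter requires a gradient, which is the case for a fully frozen teacher model. A minimal sketch of a guard is shown here; has_trainable_params and wrap_if_trainable are hypothetical helper names for illustration and are not part of torchdistill or PyTorch.

```python
def has_trainable_params(model):
    """Return True if at least one parameter of `model` requires a gradient."""
    return any(p.requires_grad for p in model.parameters())


def wrap_if_trainable(model, wrap_fn):
    """Apply `wrap_fn` only when `model` has something to train.

    `wrap_fn` is any callable that takes the model and returns the wrapped
    model, e.g. a lambda that constructs DistributedDataParallel.
    A fully frozen model is returned unchanged.
    """
    return wrap_fn(model) if has_trainable_params(model) else model
```

With PyTorch, a call might look like wrap_if_trainable(teacher_model, lambda m: DistributedDataParallel(m, device_ids=device_ids)), so that a frozen teacher is used as-is while a trainable student is still wrapped.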

Disagreement between the log and configuration of kd-resnet18_from_resnet34

Hi yoshitomo.

I just noticed that the initial learning rate is 0.1 in the config, while in the log it seems to be 0.3.

Moreover, the batch size is set to 256 in the config. However, the ILSVRC 2012 training set has about 1.2 million images, and the log seems to show 1667 batches in total, so I guess the actual batch size would be around 720.
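
The estimate above can be checked with a short calculation. This is a sketch, not an explanation confirmed by the issue: implied_batch_size is a hypothetical helper, and the exact training-set size of 1,281,167 images is a figure added here rather than one quoted in the issue.

```python
def implied_batch_size(num_images, num_batches):
    """Average number of images per iteration implied by a batch count."""
    return num_images / num_batches


# Figures quoted in the issue: about 1.2 million images, 1667 batches.
approx = implied_batch_size(1_200_000, 1667)

# Assumption: the exact ILSVRC 2012 training-set size (1,281,167 images).
exact = implied_batch_size(1_281_167, 1667)

print(f"approximate: {approx:.1f}, exact: {exact:.1f}")
print(f"ratio to the configured batch size of 256: {exact / 256:.2f}")
```

With the exact image count, the implied batch size is close to three times 256, and the logged learning rate of 0.3 is three times 0.1. That would be consistent with a run on three processes that scales both values, although the issue itself does not confirm this.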

How to use different methods for a single task?

Hi,
As mentioned, torchdistill offers various KD methods. Is it possible to apply different KD methods, such as FitNet or a contrastive learning method, to a specific task like image classification? Thank you.

How to train my own COCO dataset for object detection?

I encountered the following problems.
(1)

(2)
How do I modify the number of categories?

How to specify the weight of the downloaded teacher model without access to the Internet?

ValueError: batchmean is not a valid value for reduction

Hello, I tried to reproduce the COCO 2017 example and encountered this error.
(Please let me know if I need to move this to Discussions. I am sorry for the previous post; I did not read your README carefully.)

Thank you!

Command executed:
CUDA_VISIBLE_DEVICES=3 python3 examples/object_detection.py --config configs/sample/coco2017/multi_stage/ft/custom_fasterrcnn_resnet18_fpn_from_fasterrcnn_resnet50_fpn.yaml --log log/coco2017/ft/custom_fasterrcnn_resnet18_fpn_from_fasterrcnn_resnet50_fpn.txt

Error log:

2022/07/07 00:07:12     INFO    torchdistill.models.util        [teacher model]
2022/07/07 00:07:12     INFO    torchdistill.models.util        Using the original teacher model
2022/07/07 00:07:13     INFO    torchdistill.models.util        [teacher model]
2022/07/07 00:07:13     INFO    torchdistill.models.util        Using the Teacher4FactorTransfer teacher model
2022/07/07 00:07:13     INFO    torchdistill.models.util        [student model]
2022/07/07 00:07:13     INFO    torchdistill.models.util        Using the Student4FactorTransfer student model
2022/07/07 00:07:13     INFO    torchdistill.models.util        Frozen module(s): {'student_model.backbone.body.bn1', 'student_model.backbone.body.conv1', 'student_model.backbone.body.maxpool', 'student_model.backbone.body.relu'}
2022/07/07 00:07:13     INFO    torchdistill.core.distillation  Loss = 1.0 * OrgLoss + 1000.0 * FTLoss()
2022/07/07 00:07:13     INFO    torchdistill.core.distillation  Freezing the whole teacher model
2022/07/07 00:07:13     INFO    torchdistill.core.distillation  Advanced to stage 2
Traceback (most recent call last):
  File "examples/object_detection.py", line 244, in <module>
    main(argparser.parse_args())
  File "examples/object_detection.py", line 224, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
  File "examples/object_detection.py", line 178, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "examples/object_detection.py", line 72, in train_one_epoch
    loss = training_box(sample_batch, targets, supp_dict)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torchdistill/core/distillation.py", line 314, in forward
    total_loss = self.criterion(output_dict, org_loss_dict, targets)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torchdistill/losses/custom.py", line 48, in forward
    loss_dict[loss_name] = factor * criterion(student_output_dict, teacher_output_dict, targets)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torchdistill/losses/single.py", line 270, in forward
    reduction=self.reduction)
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torch/nn/functional.py", line 3249, in l1_loss
    return torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
  File "/home/longnv/.conda/envs/torchdistill2/lib/python3.7/site-packages/torch/nn/_reduction.py", line 19, in get_enum
    raise ValueError("{} is not a valid value for reduction".format(reduction))
ValueError: batchmean is not a valid value for reduction

Packages:

dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - ca-certificates=2022.4.26=h06a4308_0
  - certifi=2022.6.15=py37h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - ncurses=6.3=h5eee18b_3
  - openssl=1.1.1p=h5eee18b_0
  - pip=21.2.2=py37h06a4308_0
  - python=3.7.13=h12debd9_0
  - readline=8.1.2=h7f8727e_1
  - setuptools=61.2.0=py37h06a4308_0
  - sqlite=3.38.5=hc218d9a_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=h7f8727e_1
  - zlib=1.2.12=h7f8727e_2
  - pip:
    - cython==0.29.30
    - numpy==1.21.6
    - pycocotools==2.0.4
    - python-dateutil==2.8.2
    - pyyaml==6.0
    - six==1.16.0
    - torch==1.12.0
    - torchdistill==0.3.2
    - torchvision==0.13.0
    - typing-extensions==4.3.0
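
The `ValueError` in the traceback above comes from `torch.nn.functional.l1_loss`, which accepts only `none`, `mean`, and `sum` for `reduction`; `batchmean` is an option of `kl_div`, not of `l1_loss`. A minimal sketch of one possible workaround follows, assuming that `batchmean` should mean "sum over all elements, then divide by the batch size". The helper names `split_reduction` and `l1_loss_with_batchmean` are hypothetical and are not part of torchdistill or PyTorch.

```python
# Reductions that torch.nn.functional.l1_loss accepts
L1_REDUCTIONS = ("none", "mean", "sum")


def split_reduction(reduction):
    """Map a requested reduction to (reduction to pass to l1_loss, divide by batch size?)."""
    if reduction == "batchmean":
        # Assumption: emulate batchmean as a sum divided by the batch size
        return "sum", True
    if reduction not in L1_REDUCTIONS:
        raise ValueError("{} is not a valid value for reduction".format(reduction))
    return reduction, False


def l1_loss_with_batchmean(student, teacher, reduction="mean"):
    """L1 loss that additionally tolerates reduction='batchmean' (hypothetical helper)."""
    # Imported lazily so that split_reduction can be used without torch installed
    import torch.nn.functional as F

    base_reduction, divide_by_batch = split_reduction(reduction)
    loss = F.l1_loss(student, teacher, reduction=base_reduction)
    return loss / student.size(0) if divide_by_batch else loss
```

If the `batchmean` value comes from a yaml configuration, changing it to `mean` or `sum` for the L1-based loss term may be the simpler fix.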

About dockerfile of torchdistill

I think your open source project (torchdistill) is great! However, I ran into a few problems when I tried to run it on my cloud server. Could you provide a Dockerfile for this project, as the following project does (https://github.com/mzhaoshuai/SplitNet-Divide-and-Co-training)? Thanks so much!

Segmentation fault encountered when entering the second epoch with num_workers>0

Hi, thanks for your code. I encountered this issue when running the training script of kd (i.e. resnet34 -> resnet18). It seems something is wrong with the data loader worker. The log is as follows:

2021/04/28 08:41:23 INFO main Updating ckpt (Best top1 accuracy: 0.0000 -> 20.4760)
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
  File "kd_main.py", line 183, in <module>
    main(argparser.parse_args())
  File "kd_main.py", line 165, in main
    train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
  File "kd_main.py", line 124, in train
    train_one_epoch(training_box, device, epoch, log_freq)
  File "kd_main.py", line 61, in train_one_epoch
    metric_logger.log_every(training_box.train_data_loader, log_freq, header):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torchdistill/misc/log.py", line 153, in log_every
    for obj in iterable:
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 940, in __init__
    self._reset(loader, first_iter=True)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 971, in _reset
    self._try_put_index()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1205, in _try_put_index
    index = self._next_index()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 508, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 227, in __iter__
    for idx in self.sampler:
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 125, in __iter__
    yield from torch.randperm(n, generator=self.generator).tolist()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 48668) is killed by signal: Segmentation fault.

Hopefully someone can help me address this issue. Thanks!

There seems to be a bug in `split_dataset`

When building the dataset dict, the transforms of the train and val split datasets are overwritten in `split_dataset`.

I fixed this bug with the following code:

import copy
dataset_dict[sub_dataset_id] = copy.deepcopy(sub_dataset)
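
To illustrate why the `copy.deepcopy` fix above helps, here is a small self-contained sketch. It assumes the splits hold a reference to one shared parent dataset, so that assigning a transform for one split overwrites the other's; `ToyDataset`, `ToySubset`, and `build_splits` are hypothetical stand-ins, not torchdistill or PyTorch classes.

```python
import copy


class ToyDataset:
    """Hypothetical dataset that applies `transform` when an item is read."""

    def __init__(self, items, transform=None):
        self.items = items
        self.transform = transform

    def __getitem__(self, index):
        item = self.items[index]
        return self.transform(item) if self.transform else item


class ToySubset:
    """Hypothetical split that keeps a reference to its parent dataset."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, index):
        return self.dataset[self.indices[index]]


def build_splits(use_deepcopy):
    base = ToyDataset([1, 2, 3, 4])
    splits = {"train": ToySubset(base, [0, 1, 2]), "val": ToySubset(base, [3])}
    if use_deepcopy:
        # Each split gets its own private copy of the parent dataset
        splits = {key: copy.deepcopy(split) for key, split in splits.items()}
    # Assign a different transform per split, as a config-driven builder might do
    splits["train"].dataset.transform = lambda x: x * 10
    splits["val"].dataset.transform = lambda x: -x
    return splits


shared = build_splits(use_deepcopy=False)
independent = build_splits(use_deepcopy=True)
print(shared["train"][0])       # train split wrongly uses the val transform
print(independent["train"][0])  # train split keeps its own transform
```

Without the copy, the second assignment replaces the transform on the shared parent, so the train split silently uses the val transform. The copy costs extra memory when the parent dataset holds its data in memory.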

How to run my own dataset using the object detection example?

Hi,
I want to run the experiment with another dataset, such as the VOC dataset. What should I do before executing the examples/object_detection.py script?
I converted the VOC annotations to COCO format and modified the yaml configuration file (figure 1), but I got the error shown in figure 2.
Could you please tell me the reason? Thank you!

figure 1:

figure 2:

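def forward(self, student_output, teacher_output, targets=None, *args, **kwargs):
    # Soft loss: KL divergence between temperature-softened student and teacher distributions
    soft_loss = super().forward(torch.log_softmax(student_output / self.temperature, dim=1),
                                torch.softmax(teacher_output / self.temperature, dim=1))
    # Without hard labels (or with alpha unset or zero), return the soft loss only
    if self.alpha is None or self.alpha == 0 or targets is None:
        return soft_loss

    # Hard loss: cross entropy against the ground-truth targets
    hard_loss = self.cross_entropy_loss(student_output, targets)
    return self.alpha * hard_loss + self.beta * (self.temperature ** 2) * soft_loss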
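
The arithmetic in the `forward` method above can be checked by hand with a dependency-free sketch. This is only an illustration for a single sample, assuming the KL term is summed over classes; the function names `softmax` and `kd_loss` and the default values of `alpha`, `beta`, and `temperature` are hypothetical and not taken from torchdistill.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, target=None,
            alpha=0.5, beta=0.5, temperature=4.0):
    """Single-sample sketch: alpha * CE + beta * T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions
    soft_loss = sum(t * (math.log(t) - math.log(s))
                    for t, s in zip(p_teacher, p_student) if t > 0)
    if alpha is None or alpha == 0 or target is None:
        return soft_loss
    # Cross entropy of the unsoftened student prediction against the hard label
    hard_loss = -math.log(softmax(student_logits)[target])
    return alpha * hard_loss + beta * (temperature ** 2) * soft_loss


# Identical logits give a zero soft loss, so only the hard term remains
print(kd_loss([0.0, 0.0], [0.0, 0.0], target=0, alpha=1.0, beta=1.0))
```

The `temperature ** 2` factor is the usual compensation for the gradient of the softened term shrinking as the temperature grows.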
