Hi @Coldfire93 ,
Thank you for confirming that!
I identified and fixed a bug introduced in a recent update. PR #126 should resolve it.
Since the bug is in the package, I'll soon release a new version of torchdistill so that you can update the package.
from torchdistill.
Hi @Coldfire93 ,
It solves the issue and starts training. It looks good. Thank you very much. I will learn more about how to config.
Great to hear that :)
I did the experiment, but it seems that the torchvision pretrained model was not loaded. The loss is very large shown below:
I believe that you're using a pretrained teacher model, and the loss values you showed above are not that large for GHND (generalized head network distillation), since the loss is a sum of squared errors as shown in Fig. 1 and Eq. (2) of the paper. Note that the above log says Loss = 1.0 * MSELoss() + 1.0 * MSELoss() + 1.0 * MSELoss() + 1.0 * MSELoss(), but these are MSELoss modules in torch whose reduction is sum, so they work as sum-of-squared-error losses.
The log file associated with the yaml file is also available in the folder, and you can refer to the numbers at each epoch (though you cannot expect them to match loss values in your training log).
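To get a feel for why sum-reduced squared-error losses come out so large, here is a minimal pure-Python sketch. The feature-map size and the per-element error are made-up illustrative numbers, not values taken from the model or the paper; in PyTorch the same distinction corresponds to torch.nn.MSELoss(reduction='sum') versus the default reduction='mean'.

```python
def squared_error(pred, target, reduction="mean"):
    """Squared error over two equal-length sequences, reduced by 'sum' or 'mean'."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = sum((p - t) ** 2 for p, t in zip(pred, target))
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(pred)
    raise ValueError(f"unknown reduction: {reduction}")


# Hypothetical feature map: 256 channels x 50 x 64 positions (illustrative only).
num_elements = 256 * 50 * 64
pred = [0.5] * num_elements
target = [0.0] * num_elements  # every element is off by 0.5

mean_loss = squared_error(pred, target, reduction="mean")
sum_loss = squared_error(pred, target, reduction="sum")

print(f"elements: {num_elements}")
print(f"mean reduction: {mean_loss}")
print(f"sum reduction:  {sum_loss}")
```

The sum-reduced value is the mean-reduced value multiplied by the number of elements, so with four such terms added together, totals in the hundreds of thousands are unsurprising even when the per-element error is modest.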
I want to compare the performance of the two models trained by step 2 and step 3. The performance of the model trained by step 3 is supposed to be better than that of step 2.
I wonder if the above steps are reasonable.
Step 3 looks like it is built on step 2, i.e., pretrained on COCO -> end-to-end training on VOC (step 2) -> GHND on VOC (step 3).
If you have some hypothesis that the three steps significantly improve performance over simple end-to-end training, it may be worth trying.
If not and you simply want to see end-to-end training vs. GHND, I'd suggest the following three separate experiments:
- train teacher model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
- train my student model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
- train my student model by GHND on voc dataset with/without initializing the model by my published trained weights, using 1. as teacher model
so that you can compare the performance of step 2 with that of step 3. Note that the student model at the 3rd experiment is partially initialized with the teacher model obtained through the 1st experiment, not with the student model through the 2nd experiment.
To take advantage of GHND, you should initialize the weights of the student's layers at step 2 with those of the teacher model fine-tuned on VOC (step 1), since HND and GHND reuse the pretrained teacher model's tail portion for the student model (i.e., the first k layers in the student are trained by HND or GHND, and all the remaining layers are fixed and identical to those in the teacher in terms of architecture and learned parameters).
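The initialization described above can be sketched roughly as follows. This is a simplified illustration that uses plain dictionaries in place of real state dicts; the function name init_student_tail, the parameter names, and the choice of tail prefixes are assumptions made for this example and are not torchdistill APIs.

```python
def init_student_tail(student_params, teacher_params, tail_prefixes):
    """Copy teacher parameters into the student for every tail module.

    Returns the updated student parameters and the set of names that were
    copied (and should therefore be kept frozen during HND/GHND training).
    """
    updated = dict(student_params)
    frozen = set()
    for name, value in teacher_params.items():
        if name.startswith(tuple(tail_prefixes)) and name in updated:
            updated[name] = value
            frozen.add(name)
    return updated, frozen


# Toy "state dicts": names and values are hypothetical.
teacher = {
    "backbone.body.layer1.weight": [1.0, 1.0],
    "backbone.body.layer2.weight": [2.0, 2.0],
    "backbone.body.layer3.weight": [3.0, 3.0],
}
student = {
    "backbone.body.layer1.weight": [0.1],       # redesigned head: trained by GHND
    "backbone.body.layer2.weight": [0.0, 0.0],  # tail: same architecture as teacher
    "backbone.body.layer3.weight": [0.0, 0.0],
}

student, frozen = init_student_tail(
    student, teacher, tail_prefixes=["backbone.body.layer2", "backbone.body.layer3"]
)
trainable = sorted(set(student) - frozen)
print("frozen:", sorted(frozen))
print("trainable:", trainable)
```

Only the head parameters remain trainable here, which mirrors the idea that the tail is reused from the teacher as is.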
from torchdistill.
Hi @Coldfire93 ,
If your format is compatible with that of COCO dataset used in torchvision, I think you can use my example code as is.
Can you confirm that your converted VOC files work with their reference code?
The KeyError: '2' in [self.imgs[id] for id in ids] suggests that your dataset instance is missing the image file paths.
Another question: what does voc_collate_fn look like? I don't think it's from my repo.
Since the error occurs in DataLoader, it might be caused by the collate function as well.
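One way to narrow this down is to check the ids against the dataset's image index before building the DataLoader. The sketch below is only an illustration under assumptions: imgs stands in for the self.imgs mapping from the traceback, and the int-versus-string mismatch shown is just one possible cause of KeyError: '2', not a confirmed diagnosis.

```python
def find_missing_ids(imgs, ids):
    """Return the ids that have no entry in the image index."""
    return [image_id for image_id in ids if image_id not in imgs]


# Hypothetical index keyed by integer image ids, as in COCO-style annotations.
imgs = {1: {"file_name": "000001.jpg"}, 2: {"file_name": "000002.jpg"}}

# Ids that were read as strings (e.g., from a converted annotation file).
ids = ["1", "2"]

missing = find_missing_ids(imgs, ids)
print("missing ids:", missing)

try:
    paths = [imgs[image_id] for image_id in ids]
except KeyError as err:
    print("KeyError:", err)

# Converting the ids to the key type used by the index resolves this case.
fixed_ids = [int(image_id) for image_id in ids]
print("missing after conversion:", find_missing_ids(imgs, fixed_ids))
```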
from torchdistill.
Hi @yoshitomo-matsubara ,
Thanks for your answer. I have confirmed that my converted VOC files work fine with the script you mentioned [https://github.com/pytorch/vision/tree/master/references/detection].
- I changed "voc_collate_fn" to "coco_collate_fn".
- But I still got an error (figure 3). Could you please tell me the reason?
- I have another question: how can I get your student model pretrained on the COCO dataset? I want to load it as a pretrained model to train on my own dataset. Thank you!
from torchdistill.
Hi @yoshitomo-matsubara ,
It seems that the problem occurs because I didn't resize the images. I will try to use the "transforms_params" defined in the yaml file to do the resize operation.
Another question, can I set the "collate_fn" in the yaml file to None instead of "coco_collate_fn"?
Thank you~
from torchdistill.
Hi @Coldfire93 ,
Thank you for confirming that!
I identified and fixed a bug introduced in a recent update. PR #126 should resolve it. Since the bug is in the package, I'll soon release a new version of torchdistill so that you can update the package.
OK. Thanks~
from torchdistill.
Hi @Coldfire93 ,
I just released the new version, torchdistill==0.2.3
Update your local package and let me know if it resolves the issue
from torchdistill.
Hi @yoshitomo-matsubara ,
I updated and got an error below:
But I have checked that the contents in the parameter "targets":
from torchdistill.
Could you put your 1) yaml config and 2) executed command in text instead of screenshot? I didn't get such errors when using my config files and example code in this repo
How can I get your pretrained student model on coco dataset? Because I want to load it as pretrained model to train my own dataset. Thank you!
I forgot to answer this; Do you want to use weights of my customized student model? or weights of original Faster R-CNN?
from torchdistill.
OK.
1) My yaml config is below:
frcnn_resnet50_voc.txt
Note: I changed the extension from .yaml to .txt since .yaml files cannot be uploaded.
2) The executed command:
python examples/object_detection.py --config configs/frcnn_resnet50_voc.yaml -student_only
And, I want to use weights of your customized student model.
Thank you~
from torchdistill.
The command looks fine; I'd suggest you add --log <some file path> to keep your training log,
e.g., python examples/object_detection.py --config configs/frcnn_resnet50_voc.yaml --log log/frcnn_resnet50_voc.txt -student_only
Your yaml config file still contains a teacher model in the training loop and attempts to use head network distillation, which requires a teacher model.
Check if the following config works for you. It doesn't include a teacher model, but trains the student model by minimizing the original loss function used in torchvision.
new_frcnn_resnet50_voc.txt
(Note that you'll probably want to tune hyperparameters in the yaml file once you confirm it works)
To use the weights of my customized student model, download the checkpoints available here and specify the file path in the ckpt entry of your yaml file,
e.g.,
models:
  model:
    ... (skipped)
    ckpt: 'ckpt_file_path.pt'
from torchdistill.
Hi, @yoshitomo-matsubara,
Thanks for your answer. But I got the same error when running on my own dataset (using the config you modified).
However, it works fine when running on the COCO dataset. (The error occurred before upgrading torchdistill to 0.2.3.)
The log file and error message are below:
frcnn_resnet50_voc.txt
from torchdistill.
Hi, @yoshitomo-matsubara ,
Is teacher/coco2017-fasterrcnn_resnet50_fpn.pt the weights of the original Faster R-CNN? Where can I download it? Thank you~
from torchdistill.
Hi @Coldfire93 ,
Thanks for your answer. But I got the same error when running on my own dataset (using the config you modified).
I found that forward_proc should be forward_proc: 'forward_batch_target' for end-to-end training in the case of object detection; here is the fixed one:
new_frcnn_resnet50_voc.txt
Is teacher/coco2017-fasterrcnn_resnet50_fpn.pt the weights of the original Faster R-CNN? Where can I download it? Thank you~
The teacher model weights are from torchvision, and when the ckpt file for the teacher (in yaml) does not exist, it downloads and uses the pretrained weights in torchvision as long as you leave pretrained: True in the teacher arguments (the list of args is completely dependent on its original interface).
In our studies, I used torchvision's pretrained models as teachers.
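The fallback behavior described above can be pictured with the following sketch. It is not the torchdistill implementation: resolve_weight_source is a hypothetical helper written for illustration, and only the decision logic (use the checkpoint if the file exists, otherwise rely on pretrained: True) is taken from the explanation above.

```python
import os
import tempfile


def resolve_weight_source(ckpt_path, model_kwargs):
    """Decide where model weights come from.

    Returns 'ckpt' if the checkpoint file exists, 'pretrained' if it does not
    but the model arguments request pretrained weights, else 'random'.
    """
    if ckpt_path and os.path.isfile(ckpt_path):
        return "ckpt"
    if model_kwargs.get("pretrained", False):
        return "pretrained"
    return "random"


# Missing checkpoint + pretrained: True -> fall back to pretrained weights.
missing_path = os.path.join(tempfile.gettempdir(), "no_such_teacher_ckpt.pt")
print(resolve_weight_source(missing_path, {"pretrained": True}))

# An existing checkpoint takes priority over the pretrained flag.
with tempfile.NamedTemporaryFile(suffix=".pt") as ckpt_file:
    print(resolve_weight_source(ckpt_file.name, {"pretrained": True}))
```

This matches the situation in a log line such as "ckpt file is not found at ...": training still proceeds because the pretrained weights are used instead.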
I should have asked you this; What training method would you like to try with torchdistill?
If it's end-to-end training without a teacher (like torchvision's reference code), the above config should be OK. If not and you want to use something else like (generalized) head network distillation, you need teacher configs like the original official and sample configs.
P.S.,
It would be appreciated and more useful if you could copy and paste the error log as txt (e.g., Ctrl + Shift + C on terminal) instead of screenshot so that other users can catch this issue when searching
from torchdistill.
Hi @yoshitomo-matsubara ,
I found forward_proc should be forward_proc: 'forward_batch_target' for end-to-end training in case of object detection.
It solves the issue and starts training. It looks good. Thank you very much. I will learn more about how to config.
The teacher model weights are from torchvision, and when the ckpt file for the teacher (in yaml) does not exist, it downloads and uses the pretrained weights in torchvision as long as you leave pretrained: True in the teacher arguments (the list of args is completely dependent on its original interface).
In our studies, I used torchvision's pretrained models as teachers.
I did the experiment, but it seems that the torchvision pretrained model was not loaded. The loss is very large shown below:
(torchdistill) songhongguang@elcnlhdc-41-239:~/lwh/torchdistill$ python examples/object_detection.py --config configs/official/coco2017/yoshitomo-matsubara/rrpr2020/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.yaml --log logs/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.log
2021/06/30 22:43:39 INFO torchdistill.common.main_util Not using distributed mode
2021/06/30 22:43:39 INFO main Namespace(adjust_lr=False, config='configs/official/coco2017/yoshitomo-matsubara/rrpr2020/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.yaml', device='cuda', dist_url='env://', iou_types=None, log='logs/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.log', seed=None, start_epoch=0, student_only=False, test_only=False, world_size=1)
loading annotations into memory...
Done (t=20.95s)
creating index...
index created!
loading annotations into memory...
Done (t=3.69s)
creating index...
index created!
2021/06/30 22:44:08 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/coco2017/teacher/coco2017-fasterrcnn_resnet50_fpn.pt
2021/06/30 22:44:13 INFO torchdistill.common.main_util Loading model parameters
2021/06/30 22:44:14 INFO main Start training
2021/06/30 22:44:14 INFO torchdistill.datasets.sampler Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
2021/06/30 22:44:14 INFO torchdistill.datasets.sampler Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
2021/06/30 22:44:14 INFO torchdistill.models.util [R-CNN model]
2021/06/30 22:44:14 INFO torchdistill.models.util Redesigning the R-CNN model with ['backbone.body']
2021/06/30 22:44:14 INFO torchdistill.models.util [teacher model]
2021/06/30 22:44:14 INFO torchdistill.models.util Using the HeadRCNN teacher model
2021/06/30 22:44:14 INFO torchdistill.models.util [R-CNN model]
2021/06/30 22:44:14 INFO torchdistill.models.util Redesigning the R-CNN model with ['backbone.body']
2021/06/30 22:44:14 INFO torchdistill.models.util [student model]
2021/06/30 22:44:14 INFO torchdistill.models.util Using the HeadRCNN student model
2021/06/30 22:44:14 INFO torchdistill.models.util Frozen module(s): {'seq.backbone.body.layer4', 'seq.backbone.body.layer3', 'seq.backbone.body.layer2'}
2021/06/30 22:44:14 INFO torchdistill.core.distillation Loss = 1.0 * MSELoss() + 1.0 * MSELoss() + 1.0 * MSELoss() + 1.0 * MSELoss()
2021/06/30 22:44:14 INFO torchdistill.core.distillation Freezing the whole teacher model
2021/06/30 22:44:14 INFO torchdistill.common.main_util Loading optimizer parameters
2021/06/30 22:44:14 INFO torchdistill.common.main_util Loading scheduler parameters
2021/06/30 22:44:21 INFO torchdistill.misc.log Epoch: [0] [ 0/29316] eta: 2 days, 5:11:02 lr: 0.0001 img/s: 1.053708416124615 loss: 1119365.1250 (1119365.1250) time: 6.5310 data: 2.7348 max mem: 9330
2021/06/30 22:57:03 INFO torchdistill.misc.log Epoch: [0] [ 1000/29316] eta: 6:02:25 lr: 0.0001 img/s: 6.1385905169748485 loss: 927617.8750 (962067.6578) time: 0.7018 data: 0.0133 max mem: 9334
2021/06/30 23:08:33 INFO torchdistill.misc.log Epoch: [0] [ 2000/29316] eta: 5:32:01 lr: 0.0001 img/s: 5.9592064060377705 loss: 919090.8750 (964320.9032) time: 0.6940 data: 0.0123 max mem: 9334
2021/06/30 23:20:04 INFO torchdistill.misc.log Epoch: [0] [ 3000/29316] eta: 5:14:16 lr: 0.0001 img/s: 6.350791444465642 loss: 986742.8125 (966603.5369) time: 0.6556 data: 0.0125 max mem: 9334
I should have asked you this; What training method would you like to try with torchdistill?
If it's end-to-end training without teacher (like torchvision's reference code), the above config should be ok. If not and you want to use something else like (generalized) head network distillation, you need teacher configs like the original official and sample configs.
I want to use torchdistill to do knowledge distillation. My method contains three steps:
- train the teacher model on the VOC dataset (as you said, using torchvision's pretrained model as the teacher);
- train the student model on the VOC dataset (using your custom model trained on COCO as my pretrained model);
- train the student model with the GHND method (using the step 2 model as the pretrained model).
I want to compare the performance of the two models trained by step 2 and step 3. The performance of the model trained by step 3 is supposed to be better than that of step 2.
I wonder if the above steps are reasonable.
Thanks for your patience.
P.S.,
It would be appreciated and more useful if you could copy and paste the error log as txt (e.g., Ctrl + Shift + C on terminal) instead of screenshot so that other users can catch this issue when searching
OK. I understand. 😊
from torchdistill.
Hi @yoshitomo-matsubara ,
- train teacher model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
- train my student model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
- train my student model by GHND on voc dataset with/without initializing the model by my published trained weights, using 1. as teacher model
I'm confused about step 2. I thought the student network was designed by you (you modified the structure of the backbone), and there is no corresponding pretrained model in torchvision.
Maybe I should learn more about GHND? I'd like your advice. Thank you~
from torchdistill.
@Coldfire93
Yes, I designed the student model, and there is no pretrained model in torchvision for step 2.
I meant with/without initializing the model with my pretrained weights.
from torchdistill.
The sizes of your teacher model and student model are almost the same (about 160M). I wonder why?
The student model is expected to be smaller than the teacher model.
from torchdistill.
Hi @Coldfire93 ,
The sizes of your teacher model and student model are almost the same (about 160M). I wonder why?
The student model is expected to be smaller than the teacher model.
The student models in the example are from our ICPR paper (preprint ver.).
The teacher models are pretrained Faster, Mask, and Keypoint R-CNNs in torchvision, and their student models are based on the teacher models but modified to introduce bottlenecks for split computing, i.e., the first layers up to the bottleneck (called the head model) are executed on a mobile device, and their output (compressed information called the bottleneck) is transferred to an edge server, which completes the inference with the rest of the model (called the tail model).
While the overall student model size is almost the same as teacher model, the student model w/ bottleneck can achieve shorter end-to-end latency by splitting the inference for resource-constrained edge computing systems. Read the above paper for more details.
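As a rough illustration of the head/tail split, the sketch below treats a model as a list of layer functions and cuts it right after a bottleneck layer. Everything here (the toy layers, split_model, run) is hypothetical and only meant to show the data flow, not the actual R-CNN models.

```python
def split_model(layers, bottleneck_index):
    """Split a sequential list of layers into (head, tail) after the bottleneck."""
    head = layers[: bottleneck_index + 1]
    tail = layers[bottleneck_index + 1:]
    return head, tail


def run(layers, x):
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


# Toy layers operating on a list of numbers.
def scale(x):
    return [2 * v for v in x]


def encode(x):
    """Bottleneck: keep every other value (a stand-in for compression)."""
    return x[::2]


def total(x):
    return sum(x)


layers = [scale, encode, scale, total]
head, tail = split_model(layers, bottleneck_index=1)

inputs = [1, 2, 3, 4]
bottleneck_output = run(head, inputs)      # executed on the mobile device
prediction = run(tail, bottleneck_output)  # executed on the edge server

print("transferred values:", len(bottleneck_output), "of", len(inputs))
print("split result equals unsplit result:", prediction == run(layers, inputs))
```

The total number of layers is unchanged by the split; what shrinks is the amount of data sent between the two devices, which is the point of the bottleneck.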
Could you please tell me the teacher's information? (mAP, #Epochs, Training time)
As described in the torchdistill paper, I did all the experiments for reproducing experimental results reported in prior studies.
Table 6 shows the results originally reported in the above ICPR paper, reproduced with torchdistill. Thus, the teacher models are also the pretrained Faster and Mask R-CNN models in torchvision. The mAPs of the teacher models are also shown in Table 3 of the above ICPR paper, and other information (# epochs and training time) can be found in torchvision's example code and blog post.
from torchdistill.
Hi @yoshitomo-matsubara ,
Thank you for your reply. I understand.
Actually, I want to get a smaller student model through knowledge distillation. Evidently, the GHND method cannot do that.
But the training time is shortened by using the GHND method (60 hours vs. 24 hours). That's good.
Thank you again.
from torchdistill.
Applying knowledge distillation to object detection in an end-to-end manner is pretty difficult, as I answered in #117.
FYI, torchvision recently introduced SSD object detection models.
If you find a pair of module paths in the student and teacher models whose output shapes match so that a loss value can be computed, you can do that kind of knowledge distillation (with a student model much smaller than the teacher model) for object detection by defining it in a yaml file.
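A quick way to look for such pairs is to compare the output shapes of candidate modules. The sketch below assumes you have already collected a mapping from module path to output shape for each model (for example with forward hooks); the paths and shapes shown are invented for illustration.

```python
def find_matching_pairs(student_shapes, teacher_shapes):
    """Return (student_path, teacher_path) pairs whose output shapes are identical."""
    pairs = []
    for s_path, s_shape in student_shapes.items():
        for t_path, t_shape in teacher_shapes.items():
            if tuple(s_shape) == tuple(t_shape):
                pairs.append((s_path, t_path))
    return pairs


# Hypothetical module paths and (channels, height, width) output shapes.
student_shapes = {
    "features.4": (64, 56, 56),
    "features.8": (256, 14, 14),
}
teacher_shapes = {
    "backbone.layer1": (256, 56, 56),
    "backbone.layer3": (256, 14, 14),
}

for student_path, teacher_path in find_matching_pairs(student_shapes, teacher_shapes):
    print(f"candidate pair: student '{student_path}' <-> teacher '{teacher_path}'")
```

Each candidate pair found this way is a place where a loss between student and teacher outputs could be defined in the config.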
from torchdistill.
Hi @yoshitomo-matsubara ,
I'm trying to understand what you said above. I read the paper and learned about the GHND method. This method is very valuable. I will continue to follow your work. I will read your code to learn more.
Thank you again.
from torchdistill.
Hi @Coldfire93 ,
My pleasure. Feel free to ask me if you have any question.
from torchdistill.
@Coldfire93 Closing this issue as I haven't seen any follow-up for a while.
Open a new Discussion (not Issue) if you still have questions.
from torchdistill.