Comments (10)
That seems strange. What were the teacher's and student's performances before distillation? The teacher is at 75%; how about the student?
from fgd.
My student's mAP is 73% on my test dataset.
Let me describe my workflow:
1- I trained RetinaNet-R101 on my data (my use-case data, which is damage detection on equipment) using a pretrained ResNet-101. As I said, my mAP is 0.75.
2- Then I trained RetinaNet-R50 on my data (the same data used for training in step 1) using a pretrained ResNet-50. As I said, my mAP is 0.73. I am using this as the baseline.
3- I distill using the checkpoint of RetinaNet-R101 as the teacher for RetinaNet-R50. The mAP drops to 58%.
Note: I put my data dict in the distiller config.
Would initializing the student backbone with the already trained RetinaNet-R50 backbone (from step 2) help?
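For reference, steps 1-3 correspond to a distiller config along these lines. This is only a sketch: the field names approximate the FGD repo's config style from memory, and every path and annotation file below is a placeholder for my setup, not a real value from the repo.

```python
# Sketch of an FGD distiller config for steps 1-3 (field names and
# paths are illustrative; check the repo's configs/ for the exact schema).
_base_ = ['retinanet_r50_fpn_2x_coco.py']   # the step-2 student baseline

distiller = dict(
    type='DetectionDistiller',
    # step-1 checkpoint of the RetinaNet-R101 teacher (placeholder path)
    teacher_pretrained='work_dirs/retinanet_r101/latest.pth',
    init_student=True,  # inherit the teacher's neck/head weights
)

# "I put my data dict in the distiller config": the damage-detection
# dataset overrides the base data settings here (placeholder files).
data = dict(
    train=dict(ann_file='annotations/train.json', img_prefix='images/'),
    val=dict(ann_file='annotations/val.json', img_prefix='images/'))
```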
from fgd.
For distillation, you should keep the training setting the same as in step 2. For example, use the pretrained ResNet-50 first, then train the student with FGD. Besides, you can use the inheriting strategy to further improve the student.
from fgd.
Thanks for the reply.
1- Would you please explain more about keeping the distillation setting the same as in step 2?
2- I am already using the inheriting strategy to initialize the student's neck and head with the teacher's.
from fgd.
The training and initialization settings for the baseline and the distillation should be the same, such as using the pretrained ResNet-50. Normally, the performance after the first epoch is much higher than that of the baseline.
from fgd.
1- Here, the initialization of the backbone is skipped:
if name.startswith("backbone."): continue
2- My teacher and student were trained with Adam, lr=0.001. Do you think I should change the distiller configuration to the same parameters?
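For context, the line quoted in question 1 sits inside an inheriting-style initialization loop roughly like the following. This is a hypothetical sketch, not the repo's exact code: `inherit_teacher_weights` and its arguments are my own naming.

```python
import torch

def inherit_teacher_weights(student, teacher_ckpt_path, skip_backbone=True):
    """Copy every teacher weight whose name and shape match into the student.

    skip_backbone=True reproduces the quoted line; since the R101 and R50
    backbones differ, most backbone tensors would not match anyway.
    """
    ckpt = torch.load(teacher_ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    student_state = student.state_dict()
    for name, tensor in state.items():
        if skip_backbone and name.startswith("backbone."):
            continue
        # Only copy weights whose name and shape both match.
        if name in student_state and student_state[name].shape == tensor.shape:
            student_state[name].copy_(tensor)
    student.load_state_dict(student_state)
```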
from fgd.
- No, not like this: you do not need to skip it.
- Keep them the same, including the optimizer.
from fgd.
I have initialized the student's backbone and set the optimizer the same as the baseline's. Now I start at 71% after the first epoch, but in the following epochs I get significant drops, and the model does not seem to converge:
```
2023-09-06 18:25:50,163 - mmdet - INFO - Epoch [1][50/203] lr: 9.890e-05, eta: 3:06:07, time: 2.316, data_time: 1.673, memory: 5745, loss_cls: 0.9247, loss_bbox: 0.4944, loss_fgd_fpn_4: 37.8894, loss_fgd_fpn_3: 7.7019, loss_fgd_fpn_2: 0.6114, loss_fgd_fpn_1: 2.2289, loss_fgd_fpn_0: 8.1574, loss: 58.0080, grad_norm: 571.0370
2023-09-06 18:26:23,640 - mmdet - INFO - Epoch [1][100/203] lr: 1.988e-04, eta: 1:58:45, time: 0.670, data_time: 0.069, memory: 5745, loss_cls: 0.1247, loss_bbox: 0.1389, loss_fgd_fpn_4: 2.6471, loss_fgd_fpn_3: 0.9848, loss_fgd_fpn_2: 0.2494, loss_fgd_fpn_1: 0.8901, loss_fgd_fpn_0: 3.3881, loss: 8.4231, grad_norm: 228.2081
2023-09-06 18:26:54,944 - mmdet - INFO - Epoch [1][150/203] lr: 2.987e-04, eta: 1:34:45, time: 0.626, data_time: 0.020, memory: 5745, loss_cls: 0.1039, loss_bbox: 0.1309, loss_fgd_fpn_4: 4.3547, loss_fgd_fpn_3: 0.9474, loss_fgd_fpn_2: 0.1836, loss_fgd_fpn_1: 0.6756, loss_fgd_fpn_0: 2.6093, loss: 9.0055, grad_norm: 350.5065
2023-09-06 18:27:25,823 - mmdet - INFO - Epoch [1][200/203] lr: 3.986e-04, eta: 1:22:20, time: 0.618, data_time: 0.019, memory: 5745, loss_cls: 0.1332, loss_bbox: 0.1357, loss_fgd_fpn_4: 6.1208, loss_fgd_fpn_3: 1.2110, loss_fgd_fpn_2: 0.1994, loss_fgd_fpn_1: 0.6954, loss_fgd_fpn_0: 2.6123, loss: 11.1077, grad_norm: 423.2370
bbox_mAP: 0.7140

2023-09-06 18:31:15,442 - mmdet - INFO - Epoch [2][50/203] lr: 5.045e-04, eta: 1:38:13, time: 2.226, data_time: 1.617, memory: 5745, loss_cls: 0.1705, loss_bbox: 0.1552, loss_fgd_fpn_4: 2.1541, loss_fgd_fpn_3: 0.8034, loss_fgd_fpn_2: 0.1834, loss_fgd_fpn_1: 0.6773, loss_fgd_fpn_0: 2.6396, loss: 6.7835, grad_norm: 174.3350
2023-09-06 18:31:47,209 - mmdet - INFO - Epoch [2][100/203] lr: 6.044e-04, eta: 1:29:06, time: 0.635, data_time: 0.020, memory: 5745, loss_cls: 0.1480, loss_bbox: 0.1469, loss_fgd_fpn_4: 2.1242, loss_fgd_fpn_3: 0.6617, loss_fgd_fpn_2: 0.1525, loss_fgd_fpn_1: 0.5729, loss_fgd_fpn_0: 2.2660, loss: 6.0722, grad_norm: 183.8174
2023-09-06 18:32:18,974 - mmdet - INFO - Epoch [2][150/203] lr: 7.043e-04, eta: 1:22:25, time: 0.635, data_time: 0.021, memory: 5745, loss_cls: 0.1637, loss_bbox: 0.1636, loss_fgd_fpn_4: 1.7635, loss_fgd_fpn_3: 0.6042, loss_fgd_fpn_2: 0.1576, loss_fgd_fpn_1: 0.5902, loss_fgd_fpn_0: 2.3537, loss: 5.7965, grad_norm: 154.4271
2023-09-06 18:32:50,011 - mmdet - INFO - Epoch [2][200/203] lr: 8.042e-04, eta: 1:17:08, time: 0.621, data_time: 0.029, memory: 5745, loss_cls: 0.1569, loss_bbox: 0.1697, loss_fgd_fpn_4: 3.1833, loss_fgd_fpn_3: 0.7770, loss_fgd_fpn_2: 0.1706, loss_fgd_fpn_1: 0.6309, loss_fgd_fpn_0: 2.3866, loss: 7.4749, grad_norm: 227.1264
bbox_mAP: 0.6870

2023-09-06 18:36:43,303 - mmdet - INFO - Epoch [3][50/203] lr: 9.101e-04, eta: 1:25:49, time: 2.288, data_time: 1.640, memory: 5745, loss_cls: 0.4277, loss_bbox: 0.2139, loss_fgd_fpn_4: 4.8695, loss_fgd_fpn_3: 1.1391, loss_fgd_fpn_2: 0.2268, loss_fgd_fpn_1: 0.8387, loss_fgd_fpn_0: 3.3569, loss: 11.0726, grad_norm: 293.6464
2023-09-06 18:37:14,756 - mmdet - INFO - Epoch [3][100/203] lr: 1.000e-03, eta: 1:20:59, time: 0.629, data_time: 0.019, memory: 5745, loss_cls: 0.4082, loss_bbox: 0.2028, loss_fgd_fpn_4: 2.1853, loss_fgd_fpn_3: 0.7505, loss_fgd_fpn_2: 0.2096, loss_fgd_fpn_1: 0.7672, loss_fgd_fpn_0: 3.3458, loss: 7.8694, grad_norm: 174.7761
2023-09-06 18:37:46,440 - mmdet - INFO - Epoch [3][150/203] lr: 1.000e-03, eta: 1:16:57, time: 0.634, data_time: 0.022, memory: 5745, loss_cls: 0.2216, loss_bbox: 0.1832, loss_fgd_fpn_4: 2.8286, loss_fgd_fpn_3: 0.7335, loss_fgd_fpn_2: 0.1637, loss_fgd_fpn_1: 0.5890, loss_fgd_fpn_0: 2.2849, loss: 7.0046, grad_norm: 201.6889
2023-09-06 18:38:18,894 - mmdet - INFO - Epoch [3][200/203] lr: 1.000e-03, eta: 1:13:36, time: 0.649, data_time: 0.017, memory: 5745, loss_cls: 0.4074, loss_bbox: 0.2027, loss_fgd_fpn_4: 2.2815, loss_fgd_fpn_3: 0.8024, loss_fgd_fpn_2: 0.1913, loss_fgd_fpn_1: 0.7121, loss_fgd_fpn_0: 2.5630, loss: 7.1605, grad_norm: 149.1443
bbox_mAP: 0.4750
```
from fgd.
That seems strange. Does the baseline behave the same way, with the first epoch performing best? Is the learning rate for distillation kept the same as the baseline's?
from fgd.
Thanks.
The problem was the learning rate. I reduced it, and now the optimization is working. I will share the results under this post for reference.
My other question is: what do loss_fgd_fpn_0 to loss_fgd_fpn_4 correspond to? I mean, is loss_fgd_fpn_0 L_at and loss_fgd_fpn_4 L_focal?
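For reference, the learning-rate fix amounts to a small change in the distiller config. The values below are illustrative (the baseline used Adam with lr=0.001); `grad_clip` is the standard mmcv `optimizer_config` option.

```python
# Reduced learning rate for distillation (baseline was Adam, lr=1e-3;
# 1e-4 here is an illustrative value, not the exact one I used).
optimizer = dict(type='Adam', lr=1e-4)
# Gradient clipping can also help stabilize the first epochs: the log
# above shows grad_norm spiking past 500 while the FGD losses settle.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```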
from fgd.