Comments (10)
That seems strange. What were the teacher's and student's performances before distillation? The teacher is at 75%; how about the student?
from fgd.
My student's mAP is 73% on my test dataset.
Let me describe my workflow:
1- I trained RetinaNet-R101 on my data (my use-case data, which is damage detection on equipment) using a pretrained ResNet-101. As I said, my mAP is 0.75.
2- Then I trained RetinaNet-R50 on my data (the same data used for training in step 1) using a pretrained ResNet-50. As I said, my mAP is 0.73. I am using this as the baseline.
3- I distill using the checkpoint of RetinaNet-R101 as the teacher for RetinaNet-R50. The mAP drops to 58%.
Note: I put my data dict in the distiller config.
Would initializing the student backbone with the already trained RetinaNet-R50 backbone (from step 2) help?
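For reference, steps 1-3 correspond to a distiller config along these lines. This is only a sketch: the field names approximate the FGD repo's config style from memory, and every path and annotation file below is a placeholder for my setup, not a real value from the repo.

```python
# Sketch of an FGD distiller config for steps 1-3 (field names and
# paths are illustrative; check the repo's configs/ for the exact schema).
_base_ = ['retinanet_r50_fpn_2x_coco.py']   # the step-2 student baseline

distiller = dict(
    type='DetectionDistiller',
    # step-1 checkpoint of the RetinaNet-R101 teacher (placeholder path)
    teacher_pretrained='work_dirs/retinanet_r101/latest.pth',
    init_student=True,  # inherit the teacher's neck/head weights
)

# "I put my data dict in the distiller config": the damage-detection
# dataset overrides the base data settings here (placeholder files).
data = dict(
    train=dict(ann_file='annotations/train.json', img_prefix='images/'),
    val=dict(ann_file='annotations/val.json', img_prefix='images/'))
```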
from fgd.
For distillation, you should keep the training setting the same as in step 2. For example, use the pretrained ResNet-50 first, then train the student with FGD. Besides, you can use the inheriting strategy to further improve the student.
from fgd.
Thanks for the reply.
1- Would you please explain more about keeping the distillation setting the same as in step 2?
2- I am already using the inheriting strategy to initialize the student's neck and head with the teacher's.
from fgd.
The training and initialization settings for the baseline and the distillation should be the same, such as using the pretrained ResNet-50. Normally, the performance after the first epoch is much higher than that of the baseline.
from fgd.
1- Here, the initialization of the backbone is skipped:
if name.startswith("backbone."): continue
2- My teacher and student were trained with Adam, lr=0.001. Do you think I should change the distiller configuration to the same parameters?
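For context, the line quoted in question 1 sits inside an inheriting-style initialization loop roughly like the following. This is a hypothetical sketch, not the repo's exact code: `inherit_teacher_weights` and its arguments are my own naming.

```python
import torch

def inherit_teacher_weights(student, teacher_ckpt_path, skip_backbone=True):
    """Copy every teacher weight whose name and shape match into the student.

    skip_backbone=True reproduces the quoted line; since the R101 and R50
    backbones differ, most backbone tensors would not match anyway.
    """
    ckpt = torch.load(teacher_ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    student_state = student.state_dict()
    for name, tensor in state.items():
        if skip_backbone and name.startswith("backbone."):
            continue
        # Only copy weights whose name and shape both match.
        if name in student_state and student_state[name].shape == tensor.shape:
            student_state[name].copy_(tensor)
    student.load_state_dict(student_state)
```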
from fgd.
- No, not like this: you do not need to skip it.
- Keep them the same, including the optimizer.
from fgd.
I have initialized the student's backbone and set the optimizer the same as the baseline's. Now I start at 71% after the first epoch, but in the following epochs I get significant drops, and the model does not seem to converge:
```
2023-09-06 18:25:50,163 - mmdet - INFO - Epoch [1][50/203] lr: 9.890e-05, eta: 3:06:07, time: 2.316, data_time: 1.673, memory: 5745, loss_cls: 0.9247, loss_bbox: 0.4944, loss_fgd_fpn_4: 37.8894, loss_fgd_fpn_3: 7.7019, loss_fgd_fpn_2: 0.6114, loss_fgd_fpn_1: 2.2289, loss_fgd_fpn_0: 8.1574, loss: 58.0080, grad_norm: 571.0370
2023-09-06 18:26:23,640 - mmdet - INFO - Epoch [1][100/203] lr: 1.988e-04, eta: 1:58:45, time: 0.670, data_time: 0.069, memory: 5745, loss_cls: 0.1247, loss_bbox: 0.1389, loss_fgd_fpn_4: 2.6471, loss_fgd_fpn_3: 0.9848, loss_fgd_fpn_2: 0.2494, loss_fgd_fpn_1: 0.8901, loss_fgd_fpn_0: 3.3881, loss: 8.4231, grad_norm: 228.2081
2023-09-06 18:26:54,944 - mmdet - INFO - Epoch [1][150/203] lr: 2.987e-04, eta: 1:34:45, time: 0.626, data_time: 0.020, memory: 5745, loss_cls: 0.1039, loss_bbox: 0.1309, loss_fgd_fpn_4: 4.3547, loss_fgd_fpn_3: 0.9474, loss_fgd_fpn_2: 0.1836, loss_fgd_fpn_1: 0.6756, loss_fgd_fpn_0: 2.6093, loss: 9.0055, grad_norm: 350.5065
2023-09-06 18:27:25,823 - mmdet - INFO - Epoch [1][200/203] lr: 3.986e-04, eta: 1:22:20, time: 0.618, data_time: 0.019, memory: 5745, loss_cls: 0.1332, loss_bbox: 0.1357, loss_fgd_fpn_4: 6.1208, loss_fgd_fpn_3: 1.2110, loss_fgd_fpn_2: 0.1994, loss_fgd_fpn_1: 0.6954, loss_fgd_fpn_0: 2.6123, loss: 11.1077, grad_norm: 423.2370
bbox_mAP: 0.7140

2023-09-06 18:31:15,442 - mmdet - INFO - Epoch [2][50/203] lr: 5.045e-04, eta: 1:38:13, time: 2.226, data_time: 1.617, memory: 5745, loss_cls: 0.1705, loss_bbox: 0.1552, loss_fgd_fpn_4: 2.1541, loss_fgd_fpn_3: 0.8034, loss_fgd_fpn_2: 0.1834, loss_fgd_fpn_1: 0.6773, loss_fgd_fpn_0: 2.6396, loss: 6.7835, grad_norm: 174.3350
2023-09-06 18:31:47,209 - mmdet - INFO - Epoch [2][100/203] lr: 6.044e-04, eta: 1:29:06, time: 0.635, data_time: 0.020, memory: 5745, loss_cls: 0.1480, loss_bbox: 0.1469, loss_fgd_fpn_4: 2.1242, loss_fgd_fpn_3: 0.6617, loss_fgd_fpn_2: 0.1525, loss_fgd_fpn_1: 0.5729, loss_fgd_fpn_0: 2.2660, loss: 6.0722, grad_norm: 183.8174
2023-09-06 18:32:18,974 - mmdet - INFO - Epoch [2][150/203] lr: 7.043e-04, eta: 1:22:25, time: 0.635, data_time: 0.021, memory: 5745, loss_cls: 0.1637, loss_bbox: 0.1636, loss_fgd_fpn_4: 1.7635, loss_fgd_fpn_3: 0.6042, loss_fgd_fpn_2: 0.1576, loss_fgd_fpn_1: 0.5902, loss_fgd_fpn_0: 2.3537, loss: 5.7965, grad_norm: 154.4271
2023-09-06 18:32:50,011 - mmdet - INFO - Epoch [2][200/203] lr: 8.042e-04, eta: 1:17:08, time: 0.621, data_time: 0.029, memory: 5745, loss_cls: 0.1569, loss_bbox: 0.1697, loss_fgd_fpn_4: 3.1833, loss_fgd_fpn_3: 0.7770, loss_fgd_fpn_2: 0.1706, loss_fgd_fpn_1: 0.6309, loss_fgd_fpn_0: 2.3866, loss: 7.4749, grad_norm: 227.1264
bbox_mAP: 0.6870

2023-09-06 18:36:43,303 - mmdet - INFO - Epoch [3][50/203] lr: 9.101e-04, eta: 1:25:49, time: 2.288, data_time: 1.640, memory: 5745, loss_cls: 0.4277, loss_bbox: 0.2139, loss_fgd_fpn_4: 4.8695, loss_fgd_fpn_3: 1.1391, loss_fgd_fpn_2: 0.2268, loss_fgd_fpn_1: 0.8387, loss_fgd_fpn_0: 3.3569, loss: 11.0726, grad_norm: 293.6464
2023-09-06 18:37:14,756 - mmdet - INFO - Epoch [3][100/203] lr: 1.000e-03, eta: 1:20:59, time: 0.629, data_time: 0.019, memory: 5745, loss_cls: 0.4082, loss_bbox: 0.2028, loss_fgd_fpn_4: 2.1853, loss_fgd_fpn_3: 0.7505, loss_fgd_fpn_2: 0.2096, loss_fgd_fpn_1: 0.7672, loss_fgd_fpn_0: 3.3458, loss: 7.8694, grad_norm: 174.7761
2023-09-06 18:37:46,440 - mmdet - INFO - Epoch [3][150/203] lr: 1.000e-03, eta: 1:16:57, time: 0.634, data_time: 0.022, memory: 5745, loss_cls: 0.2216, loss_bbox: 0.1832, loss_fgd_fpn_4: 2.8286, loss_fgd_fpn_3: 0.7335, loss_fgd_fpn_2: 0.1637, loss_fgd_fpn_1: 0.5890, loss_fgd_fpn_0: 2.2849, loss: 7.0046, grad_norm: 201.6889
2023-09-06 18:38:18,894 - mmdet - INFO - Epoch [3][200/203] lr: 1.000e-03, eta: 1:13:36, time: 0.649, data_time: 0.017, memory: 5745, loss_cls: 0.4074, loss_bbox: 0.2027, loss_fgd_fpn_4: 2.2815, loss_fgd_fpn_3: 0.8024, loss_fgd_fpn_2: 0.1913, loss_fgd_fpn_1: 0.7121, loss_fgd_fpn_0: 2.5630, loss: 7.1605, grad_norm: 149.1443
bbox_mAP: 0.4750
```
from fgd.
That seems strange. Does the baseline behave the same way, with the first epoch performing best? Is the learning rate for distillation kept the same as the baseline's?
from fgd.
Thanks.
The problem was the learning rate. I reduced it, and now the optimization is working. I will share the results under this post for reference.
My other question is: what do loss_fgd_fpn_0 to loss_fgd_fpn_4 correspond to? I mean, is loss_fgd_fpn_0 L_at and loss_fgd_fpn_4 L_focal?
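For reference, the learning-rate fix amounts to a small change in the distiller config. The values below are illustrative (the baseline used Adam with lr=0.001); `grad_clip` is the standard mmcv `optimizer_config` option.

```python
# Reduced learning rate for distillation (baseline was Adam, lr=1e-3;
# 1e-4 here is an illustrative value, not the exact one I used).
optimizer = dict(type='Adam', lr=1e-4)
# Gradient clipping can also help stabilize the first epochs: the log
# above shows grad_norm spiking past 500 while the FGD losses settle.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```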
from fgd.