yzd-v commented on August 16, 2024

It seems strange. What are the teacher's and the student's performance before distillation? The teacher is at 75%; what about the student?

Rrschch-6 commented on August 16, 2024

My student's mAP is 73% on my test dataset.

Let me describe my workflow:
1- I trained Retinanet-r101 on my own data (a damage-detection use case on equipment), starting from a pretrained ResNet-101. As I said, its mAP is 0.75.
2- Then I trained Retinanet-r50 on the same data as in step 1, starting from a pretrained ResNet-50. Its mAP is 0.73, and I use it as the baseline.
3- I distilled from the Retinanet-r101 checkpoint into Retinanet-r50, and the mAP dropped to 58%.

Note: I put my data dict in the distiller config.

Would initializing the student backbone with the already trained Retinanet-r50 backbone (from step 2) help?

yzd-v commented on August 16, 2024

For distillation, you should keep the same training setting as in step 2. For example, first load the pretrained Res-50, then train the student with FGD. Besides, you can use the inheriting strategy to further improve the student.

Rrschch-6 commented on August 16, 2024

Thanks for the reply.
1- Could you explain more about keeping the distillation setting the same as in step 2?
2- I am using the inheriting strategy to initialize the student's neck and head with the teacher's.

yzd-v commented on August 16, 2024

The training and initialization settings for the baseline and for distillation should be the same, such as using the pretrained Res-50. Normally, the performance after the first epoch would be much higher than that of the baseline.

Rrschch-6 commented on August 16, 2024

1- Here, the initialization of the backbone is skipped:

`if name.startswith("backbone."): continue`

2- My teacher and student were trained with Adam (lr=0.001). Should I change the distiller configuration to use the same parameters?

yzd-v commented on August 16, 2024
  1. Not this; you do not need to skip it.
  2. Keep the same settings, including the optimizer.
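The advice to keep the baseline and distillation settings identical, including the optimizer, can be checked mechanically before launching a run. A minimal sketch, assuming both configs are available as plain nested dicts (for example mmdet 2.x-style configs with `optimizer` and `lr_config` sections); `flatten`, `diff_settings` and the section names are illustrative assumptions, not part of FGD or mmdet.

```python
from typing import Any, Dict, Iterable, Tuple

# Illustrative section names (mmdet 2.x-style); adjust to the configs actually in use.
SECTIONS = ("optimizer", "optimizer_config", "lr_config", "runner")


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {'optimizer': {'lr': 0.001}} into {'optimizer.lr': 0.001}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def diff_settings(baseline: Dict[str, Any], distill: Dict[str, Any],
                  sections: Iterable[str] = SECTIONS) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (baseline_value, distill_value)} for every setting that differs."""
    sections = tuple(sections)  # allow any iterable; it is traversed twice below
    b = flatten({s: baseline.get(s) for s in sections})
    d = flatten({s: distill.get(s) for s in sections})
    return {k: (b.get(k), d.get(k)) for k in sorted(set(b) | set(d)) if b.get(k) != d.get(k)}


if __name__ == "__main__":
    # Hypothetical values: a distiller config whose lr was not aligned with the baseline.
    baseline = {"optimizer": {"type": "Adam", "lr": 0.001}, "lr_config": {"warmup_iters": 500}}
    distill = {"optimizer": {"type": "Adam", "lr": 0.01}, "lr_config": {"warmup_iters": 500}}
    for key, (b_val, d_val) in diff_settings(baseline, distill).items():
        print(f"{key}: baseline={b_val!r} distill={d_val!r}")
```

An empty result means the compared sections match; anything printed is a setting to align before attributing a drop in mAP to distillation itself.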

Rrschch-6 commented on August 16, 2024

I have initialized the student's backbone and set the optimizer to match the baseline. Now training starts at 71% mAP after the first epoch, but in the following epochs the mAP drops significantly and the model does not seem to converge:

```
2023-09-06 18:25:50,163 - mmdet - INFO - Epoch [1][50/203] lr: 9.890e-05, eta: 3:06:07, time: 2.316, data_time: 1.673, memory: 5745, loss_cls: 0.9247, loss_bbox: 0.4944, loss_fgd_fpn_4: 37.8894, loss_fgd_fpn_3: 7.7019, loss_fgd_fpn_2: 0.6114, loss_fgd_fpn_1: 2.2289, loss_fgd_fpn_0: 8.1574, loss: 58.0080, grad_norm: 571.0370
2023-09-06 18:26:23,640 - mmdet - INFO - Epoch [1][100/203] lr: 1.988e-04, eta: 1:58:45, time: 0.670, data_time: 0.069, memory: 5745, loss_cls: 0.1247, loss_bbox: 0.1389, loss_fgd_fpn_4: 2.6471, loss_fgd_fpn_3: 0.9848, loss_fgd_fpn_2: 0.2494, loss_fgd_fpn_1: 0.8901, loss_fgd_fpn_0: 3.3881, loss: 8.4231, grad_norm: 228.2081
2023-09-06 18:26:54,944 - mmdet - INFO - Epoch [1][150/203] lr: 2.987e-04, eta: 1:34:45, time: 0.626, data_time: 0.020, memory: 5745, loss_cls: 0.1039, loss_bbox: 0.1309, loss_fgd_fpn_4: 4.3547, loss_fgd_fpn_3: 0.9474, loss_fgd_fpn_2: 0.1836, loss_fgd_fpn_1: 0.6756, loss_fgd_fpn_0: 2.6093, loss: 9.0055, grad_norm: 350.5065
2023-09-06 18:27:25,823 - mmdet - INFO - Epoch [1][200/203] lr: 3.986e-04, eta: 1:22:20, time: 0.618, data_time: 0.019, memory: 5745, loss_cls: 0.1332, loss_bbox: 0.1357, loss_fgd_fpn_4: 6.1208, loss_fgd_fpn_3: 1.2110, loss_fgd_fpn_2: 0.1994, loss_fgd_fpn_1: 0.6954, loss_fgd_fpn_0: 2.6123, loss: 11.1077, grad_norm: 423.2370
bbox_mAP: 0.7140

2023-09-06 18:31:15,442 - mmdet - INFO - Epoch [2][50/203] lr: 5.045e-04, eta: 1:38:13, time: 2.226, data_time: 1.617, memory: 5745, loss_cls: 0.1705, loss_bbox: 0.1552, loss_fgd_fpn_4: 2.1541, loss_fgd_fpn_3: 0.8034, loss_fgd_fpn_2: 0.1834, loss_fgd_fpn_1: 0.6773, loss_fgd_fpn_0: 2.6396, loss: 6.7835, grad_norm: 174.3350
2023-09-06 18:31:47,209 - mmdet - INFO - Epoch [2][100/203] lr: 6.044e-04, eta: 1:29:06, time: 0.635, data_time: 0.020, memory: 5745, loss_cls: 0.1480, loss_bbox: 0.1469, loss_fgd_fpn_4: 2.1242, loss_fgd_fpn_3: 0.6617, loss_fgd_fpn_2: 0.1525, loss_fgd_fpn_1: 0.5729, loss_fgd_fpn_0: 2.2660, loss: 6.0722, grad_norm: 183.8174
2023-09-06 18:32:18,974 - mmdet - INFO - Epoch [2][150/203] lr: 7.043e-04, eta: 1:22:25, time: 0.635, data_time: 0.021, memory: 5745, loss_cls: 0.1637, loss_bbox: 0.1636, loss_fgd_fpn_4: 1.7635, loss_fgd_fpn_3: 0.6042, loss_fgd_fpn_2: 0.1576, loss_fgd_fpn_1: 0.5902, loss_fgd_fpn_0: 2.3537, loss: 5.7965, grad_norm: 154.4271
2023-09-06 18:32:50,011 - mmdet - INFO - Epoch [2][200/203] lr: 8.042e-04, eta: 1:17:08, time: 0.621, data_time: 0.029, memory: 5745, loss_cls: 0.1569, loss_bbox: 0.1697, loss_fgd_fpn_4: 3.1833, loss_fgd_fpn_3: 0.7770, loss_fgd_fpn_2: 0.1706, loss_fgd_fpn_1: 0.6309, loss_fgd_fpn_0: 2.3866, loss: 7.4749, grad_norm: 227.1264
bbox_mAP: 0.6870

2023-09-06 18:36:43,303 - mmdet - INFO - Epoch [3][50/203] lr: 9.101e-04, eta: 1:25:49, time: 2.288, data_time: 1.640, memory: 5745, loss_cls: 0.4277, loss_bbox: 0.2139, loss_fgd_fpn_4: 4.8695, loss_fgd_fpn_3: 1.1391, loss_fgd_fpn_2: 0.2268, loss_fgd_fpn_1: 0.8387, loss_fgd_fpn_0: 3.3569, loss: 11.0726, grad_norm: 293.6464
2023-09-06 18:37:14,756 - mmdet - INFO - Epoch [3][100/203] lr: 1.000e-03, eta: 1:20:59, time: 0.629, data_time: 0.019, memory: 5745, loss_cls: 0.4082, loss_bbox: 0.2028, loss_fgd_fpn_4: 2.1853, loss_fgd_fpn_3: 0.7505, loss_fgd_fpn_2: 0.2096, loss_fgd_fpn_1: 0.7672, loss_fgd_fpn_0: 3.3458, loss: 7.8694, grad_norm: 174.7761
2023-09-06 18:37:46,440 - mmdet - INFO - Epoch [3][150/203] lr: 1.000e-03, eta: 1:16:57, time: 0.634, data_time: 0.022, memory: 5745, loss_cls: 0.2216, loss_bbox: 0.1832, loss_fgd_fpn_4: 2.8286, loss_fgd_fpn_3: 0.7335, loss_fgd_fpn_2: 0.1637, loss_fgd_fpn_1: 0.5890, loss_fgd_fpn_0: 2.2849, loss: 7.0046, grad_norm: 201.6889
2023-09-06 18:38:18,894 - mmdet - INFO - Epoch [3][200/203] lr: 1.000e-03, eta: 1:13:36, time: 0.649, data_time: 0.017, memory: 5745, loss_cls: 0.4074, loss_bbox: 0.2027, loss_fgd_fpn_4: 2.2815, loss_fgd_fpn_3: 0.8024, loss_fgd_fpn_2: 0.1913, loss_fgd_fpn_1: 0.7121, loss_fgd_fpn_0: 2.5630, loss: 7.1605, grad_norm: 149.1443
bbox_mAP: 0.4750
```
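To see how the total `loss` in these logs is composed, a small parser can pull the numeric fields out of a log line. This is a sketch that assumes the mmdet text-log format shown above (`key: value` pairs separated by commas); `parse_mmdet_log_line` and `fgd_loss_share` are hypothetical helpers. On the first logged step, the logged `loss` equals `loss_cls + loss_bbox` plus the five `loss_fgd_fpn_*` terms (up to rounding), and the FGD terms account for roughly 97.5% of it.

```python
import re
from typing import Dict, Tuple

# "key: number" followed by a comma or end of line, so "eta: 3:06:07" is skipped.
_PAIR = re.compile(r"(\w+): ([-+]?\d+(?:\.\d+)?(?:e[-+]?\d+)?)(?=,|$)")


def parse_mmdet_log_line(line: str) -> Dict[str, float]:
    """Return the numeric `key: value` fields of one mmdet training log line."""
    return {key: float(value) for key, value in _PAIR.findall(line)}


def fgd_loss_share(metrics: Dict[str, float]) -> Tuple[float, float]:
    """Sum the loss_fgd_* terms and report their share of the logged total loss."""
    fgd = sum(v for k, v in metrics.items() if k.startswith("loss_fgd_"))
    return fgd, fgd / metrics["loss"]


# First logged step of epoch 1, copied from the log above.
SAMPLE = ("2023-09-06 18:25:50,163 - mmdet - INFO - Epoch [1][50/203] lr: 9.890e-05, "
          "eta: 3:06:07, time: 2.316, data_time: 1.673, memory: 5745, loss_cls: 0.9247, "
          "loss_bbox: 0.4944, loss_fgd_fpn_4: 37.8894, loss_fgd_fpn_3: 7.7019, "
          "loss_fgd_fpn_2: 0.6114, loss_fgd_fpn_1: 2.2289, loss_fgd_fpn_0: 8.1574, "
          "loss: 58.0080, grad_norm: 571.0370")

if __name__ == "__main__":
    m = parse_mmdet_log_line(SAMPLE)
    fgd, share = fgd_loss_share(m)
    detection = m["loss_cls"] + m["loss_bbox"]
    print(f"detection loss {detection:.4f}, FGD loss {fgd:.4f} ({share:.1%} of total)")
    print(f"sum of components {detection + fgd:.4f} vs logged loss {m['loss']:.4f}")
```

Running the same parser over every line of a run makes it easy to plot the FGD terms against `loss_cls` and `loss_bbox` per iteration.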

yzd-v commented on August 16, 2024

It seems strange. Does the baseline also perform best after the first epoch? Is the learning rate for distillation the same as for the baseline?

Rrschch-6 commented on August 16, 2024

Thanks.
The problem was the learning rate. I reduced it, and the optimization now works. I will share the results under this post for reference.
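The logged `lr` values are consistent with a linear warmup from 0.001 × base lr up to a base lr of 0.001 over the first 500 iterations (about 2.5 epochs of 203 iterations). The sketch below reproduces them. The formula is our reading of the linear warmup in mmcv 1.x's `LrUpdaterHook`, and `warmup_iters=500` and `warmup_ratio=0.001` are inferred from the log rather than read from the poster's config; `linear_warmup_lr` and `global_iter` are hypothetical helpers. It shows Adam reaching the full lr of 0.001 early in epoch 3, the same epoch in which bbox_mAP fell to 0.4750, which fits the finding that lowering the learning rate fixed the problem.

```python
# Linear warmup as we understand mmcv 1.x applies it (assumption):
#   lr = base_lr * (1 - (1 - cur_iter / warmup_iters) * (1 - warmup_ratio))
# warmup_iters=500 and warmup_ratio=0.001 are inferred from the logged lr values.


def linear_warmup_lr(base_lr: float, cur_iter: int,
                     warmup_iters: int = 500, warmup_ratio: float = 0.001) -> float:
    """Learning rate at 0-indexed global iteration `cur_iter` (no step decay applied)."""
    if cur_iter >= warmup_iters:
        return base_lr  # warmup finished: the full base learning rate is used
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


def global_iter(epoch: int, iter_in_epoch: int, iters_per_epoch: int = 203) -> int:
    """Map a log position 'Epoch [e][i/N]' to the 0-indexed iteration whose lr it shows."""
    return (epoch - 1) * iters_per_epoch + iter_in_epoch - 1


if __name__ == "__main__":
    for epoch in (1, 2, 3):
        for it in (50, 100, 150, 200):
            lr = linear_warmup_lr(1e-3, global_iter(epoch, it))
            print(f"Epoch [{epoch}][{it}/203] lr: {lr:.3e}")
```

Lowering `base_lr` scales the whole curve down, while a longer `warmup_iters` only delays the point at which the full rate is reached.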

My other question is: what do loss_fgd_fpn_0 to loss_fgd_fpn_4 mean? I mean, is loss_fgd_fpn_0 L_at and loss_fgd_fpn_4 L_focal?
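For reference, the logs above are consistent with the total loss being the plain sum of the detection losses and the five FGD terms: the first logged step gives 0.9247 + 0.4944 + 37.8894 + 7.7019 + 0.6114 + 2.2289 + 8.1574 = 58.0081 against a logged 58.0080. Assuming, as the naming suggests and as our reading of the FGD repository's `FeatureLoss` indicates (not confirmed in this thread), that each `loss_fgd_fpn_i` is the FGD loss computed on FPN level i, each level term would combine the focal parts (foreground, background, attention) and the global part, weighted by α, β, γ, λ. The symbols below are our own notation:

$$
\mathcal{L} = \mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{bbox}} + \sum_{i=0}^{4} \mathcal{L}_{\text{fgd},i},
\qquad
\mathcal{L}_{\text{fgd},i} = \alpha\,\mathcal{L}^{(i)}_{\text{fg}} + \beta\,\mathcal{L}^{(i)}_{\text{bg}} + \gamma\,\mathcal{L}^{(i)}_{\text{at}} + \lambda\,\mathcal{L}^{(i)}_{\text{global}}
$$

Under this reading, the index in `loss_fgd_fpn_i` selects an FPN level rather than a loss type.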
