hikaritju / ld

Localization Distillation for Object Detection (CVPR 2022, TPAMI 2023)

License: Apache License 2.0

detection knowledge-distillation ld object-detection pytorch deep-learning

ld's Introduction

Localization Distillation for Dense Object Detection

English | 简体中文

Rotated-LD-mmRotate and Rotated-LD-Jittor for rotated object detection are now released.

This repo is based on MMDetection.

Analysis of LD on Zhihu: "Object Detection: Localization Distillation (LD, CVPR 2022)" and the sequel "Logit Distillation vs. Feature Distillation"

This is the code for our paper:

@inproceedings{LD,
  title={Localization Distillation for Dense Object Detection},
  author={Zheng, Zhaohui and Ye, Rongguang and Wang, Ping and Ren, Dongwei and Zuo, Wangmeng and Hou, Qibin and Cheng, Ming-Ming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={9407--9416},
  year={2022}
}

@article{zheng2023rotatedLD,
  title={Localization Distillation for Object Detection},
  author={Zheng, Zhaohui and Ye, Rongguang and Hou, Qibin and Ren, Dongwei and Wang, Ping and Zuo, Wangmeng and Cheng, Ming-Ming},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023},
  volume={45},
  number={8},
  pages={10070--10083},
  doi={10.1109/TPAMI.2023.3248583}
}

[2022.12.3] Rotated-LD-Jittor is now available.

[2022.4.13] Rotated-LD-mmRotate is now available.

[2021.3.30] LD is officially included in MMDetection V2; many thanks to @jshilong, @Johnson-Wang and @ZwwWayne for helping migrate the code.

LD extends knowledge distillation to the localization task: it uses the learned bbox distributions to transfer localization dark knowledge from teacher to student.

LD stably improves GFocalV1 by about 2.0 AP without any additional inference cost!

Introduction

Knowledge distillation (KD) has proven to be a powerful tool for learning compact models in object detection. Previous KD methods for object detection mostly focus on imitating deep features within imitation regions rather than mimicking classification logits, because logit mimicking is inefficient at distilling localization information. In this paper, by reformulating the knowledge distillation process for localization, we present a novel localization distillation (LD) method that can efficiently transfer localization knowledge from the teacher to the student. Moreover, we heuristically introduce the concept of the valuable localization region, which helps selectively distill the semantic and localization knowledge for a given region. Combining these two new components, we show for the first time that logit mimicking can outperform feature imitation, and that localization knowledge distillation is more important and efficient than semantic knowledge for distilling object detectors. Our distillation scheme is simple and effective and can easily be applied to different dense object detectors. Experiments show that LD can boost the AP score of GFocal-ResNet-50 with a single-scale 1x training schedule from 40.1 to 42.1 on the COCO benchmark without any sacrifice in inference speed.
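For reference, here is a minimal sketch of the core LD loss described above: a temperature-scaled KL divergence between the teacher's and student's discretized bbox-edge distributions. The function name and tensor shapes are illustrative assumptions; see KnowledgeDistillationKLDivLoss in this repo for the actual implementation.

import torch.nn.functional as F

def ld_loss(pred, soft_label, T=10.0):
    # pred, soft_label: (N, n_bins) logits over the discretized bbox-edge
    # distribution of the student and teacher. T softens both distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(
        F.log_softmax(pred / T, dim=1),
        F.softmax(soft_label.detach() / T, dim=1),
        reduction='none').mean(1) * (T * T)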

Installation

Please refer to INSTALL.md for installation and dataset preparation. PyTorch 1.7 and CUDA toolkit 11 are recommended.

Get Started

Please see GETTING_STARTED.md for the basic usage of MMDetection.

Train

# assume that you are under the root directory of this project,
# and you have activated your virtual environment if needed.
# and with COCO dataset in 'data/coco/'

./tools/dist_train.sh configs/ld/ld_r50_gflv1_r101_fpn_coco_1x.py 8

Learning rate and batch size setting

lr = (samples_per_gpu * num_gpus) / 16 * 0.01

For 2 GPUs and mini-batch size 6, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.00375, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=3,

For 8 GPUs and mini-batch size 16, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=2,

Do not set your samples_per_gpu larger than 3!
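As a quick sanity check of this rule (a throwaway helper, not part of the repo):

def scaled_lr(samples_per_gpu, num_gpus, base_lr=0.01, base_batch=16):
    # Linear scaling rule: lr is proportional to the total batch size.
    return samples_per_gpu * num_gpus / base_batch * base_lr

print(scaled_lr(3, 2))  # 0.00375, the 2-GPU config above
print(scaled_lr(2, 8))  # 0.01, the 8-GPU config above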

Feature Imitation Methods

We provide several feature imitation methods: FitNets (fitnet), DeFeat (decouple), Fine-Grained (finegrain), and GI imitation (gibox).

    bbox_head=dict(
        loss_im=dict(type='IMLoss', loss_weight=2.0),
        imitation_method='finegrained'  # gibox, finegrain, decouple, fitnet
    )
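To give a feel for what these imitation losses compute, below is a rough FitNets-style sketch: an MSE between teacher and student feature maps, optionally restricted to an imitation-region mask. This is an illustration only; the function name and mask convention are assumptions, and the repo's IMLoss and region selection differ in detail.

import torch
import torch.nn.functional as F

def fitnet_imitation_loss(student_feat, teacher_feat, mask=None):
    # student_feat, teacher_feat: (N, C, H, W) feature maps from matching
    # FPN levels. The teacher is detached so only the student is updated.
    loss = F.mse_loss(student_feat, teacher_feat.detach(), reduction='none')
    if mask is None:
        return loss.mean()
    # mask: (N, 1, H, W) binary or soft region weights (e.g., the GT-box
    # region for DeFeat-like variants); average over the selected positions.
    loss = (loss * mask).sum()
    denom = (mask.sum() * student_feat.size(1)).clamp(min=1)
    return loss / denom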

Convert model

If you find the trained model file very large, please refer to publish_model.py:

python tools/model_converters/publish_model.py your_model.pth your_new_model.pth

Speed Test (FPS)

CUDA_VISIBLE_DEVICES=0 python3 ./tools/benchmark.py configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth

Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth 8 --eval bbox
COCO
  • LD for Lightweight Detectors

    LD is applied on the main distillation region only.

    | Teacher | Student | Training schedule | AP (val) | AP50 (val) | AP75 (val) | AP (test-dev) | AP50 (test-dev) | AP75 (test-dev) | AR100 (test-dev) |
    | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
    | -- | R-18 | 1x | 35.8 | 53.1 | 38.2 | 36.0 | 53.4 | 38.7 | 55.3 |
    | R-101 | R-18 | 1x | 36.5 | 52.9 | 39.3 | 36.8 | 53.5 | 39.9 | 56.6 |
    | -- | R-34 | 1x | 38.9 | 56.6 | 42.2 | 39.2 | 56.9 | 42.3 | 58.0 |
    | R-101 | R-34 | 1x | 39.8 | 56.6 | 43.1 | 40.0 | 57.1 | 43.5 | 59.3 |
    | -- | R-50 | 1x | 40.1 | 58.2 | 43.1 | 40.5 | 58.8 | 43.9 | 59.0 |
    | R-101 | R-50 | 1x | 41.1 | 58.7 | 44.9 | 41.2 | 58.8 | 44.7 | 59.8 |
    | -- | R-101 | 2x | 44.6 | 62.9 | 48.4 | 45.0 | 63.6 | 48.9 | 62.3 |
    | R-101-DCN | R-101 | 2x | 45.4 | 63.1 | 49.5 | 45.6 | 63.7 | 49.8 | 63.3 |
  • Self-LD

    LD is applied on the main distillation region only.

    | Teacher | Student | Training schedule | AP (val) | AP50 (val) | AP75 (val) |
    | :-: | :-: | :-: | :-: | :-: | :-: |
    | -- | R-18 | 1x | 35.8 | 53.1 | 38.2 |
    | R-18 | R-18 | 1x | 36.1 | 52.9 | 38.5 |
    | -- | R-50 | 1x | 40.1 | 58.2 | 43.1 |
    | R-50 | R-50 | 1x | 40.6 | 58.2 | 43.8 |
    | -- | X-101-32x4d-DCN | 1x | 46.9 | 65.4 | 51.1 |
    | X-101-32x4d-DCN | X-101-32x4d-DCN | 1x | 47.5 | 65.8 | 51.8 |
  • Logit Mimicking vs. Feature Imitation

    "Ours" = Main KD + Main LD + VLR LD, where "Main" denotes the main distillation region and "VLR" denotes the valuable localization region. The teacher is R-101 and the student is R-50.

    | Method | Training schedule | AP (val) | AP50 (val) | AP75 (val) | APs (val) | APm (val) | APl (val) |
    | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
    | -- | 1x | 40.1 | 58.2 | 43.1 | 23.3 | 44.4 | 52.5 |
    | FitNets | 1x | 40.7 | 58.6 | 44.0 | 23.7 | 44.4 | 53.2 |
    | Inside GT Box | 1x | 40.7 | 58.6 | 44.2 | 23.1 | 44.5 | 53.5 |
    | Main Region | 1x | 41.1 | 58.7 | 44.4 | 24.1 | 44.6 | 53.6 |
    | Fine-Grained | 1x | 41.1 | 58.8 | 44.8 | 23.3 | 45.4 | 53.1 |
    | DeFeat | 1x | 40.8 | 58.6 | 44.2 | 24.3 | 44.6 | 53.7 |
    | GI Imitation | 1x | 41.5 | 59.6 | 45.2 | 24.3 | 45.7 | 53.6 |
    | Ours | 1x | 42.1 | 60.3 | 45.6 | 24.5 | 46.2 | 54.8 |
./tools/dist_test.sh configs/ld/ld_gflv1_r101_r18_fpn_voc.py work_dirs/ld_gflv1_r101_r18_fpn_voc/epoch_4.pth 8 --eval mAP
PASCAL VOC
  • LD for Lightweight Detectors

    LD is applied on the main distillation region only.

    | Teacher | Student | Training epochs | AP | AP50 | AP75 |
    | :-: | :-: | :-: | :-: | :-: | :-: |
    | -- | R-18 | 4 | 51.8 | 75.8 | 56.3 |
    | R-101 | R-18 | 4 | 53.0 | 75.9 | 57.6 |
    | -- | R-50 | 4 | 55.8 | 79.0 | 60.7 |
    | R-101 | R-50 | 4 | 56.1 | 78.5 | 61.2 |
    | -- | R-34 | 4 | 55.7 | 78.9 | 60.6 |
    | R-101-DCN | R-34 | 4 | 56.7 | 78.4 | 62.1 |
    | -- | R-101 | 4 | 57.6 | 80.4 | 62.7 |
    | R-101-DCN | R-101 | 4 | 58.4 | 80.2 | 63.7 |

    This is an example of evaluation results (R-101→R-18).

    +-------------+------+-------+--------+-------+
    | class       | gts  | dets  | recall | ap    |
    +-------------+------+-------+--------+-------+
    | aeroplane   | 285  | 4154  | 0.081  | 0.030 |
    | bicycle     | 337  | 7124  | 0.125  | 0.108 |
    | bird        | 459  | 5326  | 0.096  | 0.018 |
    | boat        | 263  | 8307  | 0.065  | 0.034 |
    | bottle      | 469  | 10203 | 0.051  | 0.045 |
    | bus         | 213  | 4098  | 0.315  | 0.247 |
    | car         | 1201 | 16563 | 0.193  | 0.131 |
    | cat         | 358  | 4878  | 0.254  | 0.128 |
    | chair       | 756  | 32655 | 0.053  | 0.027 |
    | cow         | 244  | 4576  | 0.131  | 0.109 |
    | diningtable | 206  | 13542 | 0.150  | 0.117 |
    | dog         | 489  | 6446  | 0.196  | 0.076 |
    | horse       | 348  | 5855  | 0.144  | 0.036 |
    | motorbike   | 325  | 6733  | 0.052  | 0.017 |
    | person      | 4528 | 51959 | 0.099  | 0.037 |
    | pottedplant | 480  | 12979 | 0.031  | 0.009 |
    | sheep       | 242  | 4706  | 0.132  | 0.060 |
    | sofa        | 239  | 9640  | 0.192  | 0.060 |
    | train       | 282  | 4986  | 0.142  | 0.042 |
    | tvmonitor   | 308  | 7922  | 0.078  | 0.045 |
    +-------------+------+-------+--------+-------+
    | mAP         |      |       |        | 0.069 |
    +-------------+------+-------+--------+-------+
    AP: 0.530091167986393
    ['AP50: 0.759393', 'AP55: 0.744544', 'AP60: 0.724239', 'AP65: 0.693551', 'AP70: 0.639848', 'AP75: 0.576284', 'AP80: 0.489098', 'AP85: 0.378586', 'AP90: 0.226534', 'AP95: 0.068834']
    {'mAP': 0.7593928575515747}
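Incidentally, the final AP printed above is just the mean of the ten per-IoU-threshold APs (AP50 through AP95), which you can verify directly:

aps = [0.759393, 0.744544, 0.724239, 0.693551, 0.639848,
       0.576284, 0.489098, 0.378586, 0.226534, 0.068834]
print(sum(aps) / len(aps))  # ~0.530091, matching 'AP: 0.530091...' above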
    


Pretrained weights

VOC 07+12

GFocal V1

pan.baidu pw: ufc8, teacher R101

pan.baidu pw: 5qra, teacher R101DCN

pan.baidu pw: 1bd3, Main LD R101→R18, box AP = 53.0

pan.baidu pw: thuw, Main LD R101DCN→R34, box AP = 56.5

pan.baidu pw: mp8t, Main LD R101DCN→R101, box AP = 58.4

GoogleDrive Main LD + VLR LD + VLR KD R101→R18, box AP = 54.0

GoogleDrive Main LD + VLR LD + VLR KD + GI imitation R101→R18, box AP = 54.4

COCO

GFocal V1

pan.baidu pw: hj8d, Main LD R101→R18 1x, box AP = 36.5

pan.baidu pw: bvzz, Main LD R101→R50 1x, box AP = 41.1

GoogleDrive Main KD + Main LD + VLR LD R101→R18 1x, box AP = 37.5

GoogleDrive Main KD + Main LD + VLR LD R101→R34 1x, box AP = 41.0

GoogleDrive Main KD + Main LD + VLR LD R101→R50 1x, box AP = 42.1

GoogleDrive Main KD + Main LD + VLR LD + GI imitation R101→R50, box AP = 42.4

GFocal V2

GoogleDrive Main KD + Main LD + VLR LD R101→R50 1x, box AP = 42.7

GoogleDrive | Training log Main KD + Main LD + VLR LD R101-DCN→R101 2x, box AP (test-dev) = 47.1

GoogleDrive | Training log Main KD + Main LD + VLR LD Res2Net101-DCN→X101-32x4d-DCN 2x, box AP (test-dev) = 50.5

For any other teacher model, you can download weights from GFocalV1, GFocalV2 and mmdetection.

AP Landscape

If you want to draw the AP landscape, please replace the relevant files with the files in AP_landscape, and run:

# config1 and checkpoint1 correspond to the heads you want to pass through

./tools/dist_test.py config1 config2 checkpoint1 checkpoint2 1

Score voting Cluster-DIoU-NMS

We provide Score voting Cluster-DIoU-NMS, a sped-up version of score-voting NMS combined with DIoU-NMS. For GFocalV1 and GFocalV2, Score voting Cluster-DIoU-NMS brings a 0.1-0.3 AP increase and a 0.2-0.5 AP75 increase, with at most a 0.4 AP50 decrease, while being much faster than the score-voting NMS in mmdetection. The relevant portion of the config file would be:

# Score voting Cluster-DIoU-NMS
test_cfg = dict(
    nms=dict(type='voting_cluster_diounms', iou_threshold=0.6),
    ...)

# Original NMS
test_cfg = dict(
    nms=dict(type='nms', iou_threshold=0.6),
    ...)
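For reference, a complete test_cfg with the voting Cluster-DIoU-NMS swapped in might look like the sketch below; treat it as an assumption, with the remaining fields taken from the standard GFL test_cfg defaults used in this repo's configs.

test_cfg = dict(
    nms_pre=1000,       # keep the top-1000 boxes per level before NMS
    min_bbox_size=0,
    score_thr=0.05,     # discard low-confidence boxes before NMS
    nms=dict(type='voting_cluster_diounms', iou_threshold=0.6),
    max_per_img=100)    # final detections kept per image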

ld's People

Contributors

hikaritju, zzh-tju


ld's Issues

subprocess.CalledProcessError: Command '[]' returned non-zero exit status 1.

Sorry to bother you; I ran into the following problem while reproducing your code.
After installing the required packages, I wanted to check whether the code runs. Since I only have one GPU, I changed the last argument to 1:
/tools/dist_train.sh configs/ld/ld_r50_gflv1_r101_fpn_coco_1x.py 1

The error is as follows:
Traceback (most recent call last):
  File "./tools/train.py", line 15, in <module>
    from mmdet.apis import set_random_seed, train_detector
  File "/home/cs/LD/mmdet/apis/__init__.py", line 1, in <module>
    from .inference import (async_inference_detector, inference_detector,
  File "/home/cs/LD/mmdet/apis/inference.py", line 10, in <module>
    from mmdet.core import get_classes
  File "/home/cs/LD/mmdet/core/__init__.py", line 5, in <module>
    from .mask import *  # noqa: F401, F403
  File "/home/cs/LD/mmdet/core/mask/__init__.py", line 2, in <module>
    from .structures import BaseInstanceMasks, BitmapMasks, PolygonMasks
  File "/home/cs/LD/mmdet/core/mask/structures.py", line 6, in <module>
    import pycocotools.mask as maskUtils
ModuleNotFoundError: No module named 'pycocotools'
Traceback (most recent call last):
  File "/home/cs/anaconda3/envs/LD/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cs/anaconda3/envs/LD/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cs/anaconda3/envs/LD/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/cs/anaconda3/envs/LD/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/cs/anaconda3/envs/LD/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/ld/ld_r50_gflv1_r101_fpn_coco_1x.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Environment: Ubuntu 20.04

Setup:
mmcv-full 1.2.7
torch 1.5.1
cuda 11.4

Missing file

Could you provide the gfl_r101_fpn_1x_voc.py file?

How can I train my own VOC datasets?

I want to try your method on my own dataset, but I ran into several problems:
1. My dataset has only two classes, so I cannot use the provided pretrained models.
2. I tried to retrain a teacher network with gfl_r101_fpn_voc.py and hit the error shown below.
(screenshot omitted)
Also, the "add new datasets" link is broken. Thanks for your help!

other models?

Hello
Firstly, thank you for this great work.
Please, is it possible to use the code with any teacher/student model from mmdetection?
I am interested in distilling knowledge from a large model into a very light student such as MobileNet, and using it for fast inference.

Thank you in advance

loss_ld is small

Line 487 of ld_head.py contains the commented-out normalization below:
# losses_ld = list(map(lambda x: x / avg_factor, losses_ld))
It is not applied; is that because losses_ld was already small?

On the choice of discretization for the regression variables

Hello, thanks for the great work. I want to apply LD to other detectors and have a question about replacing the regression output with a discretized probability distribution. As far as I can tell, the regression output at every scale is converted into lrtb values through the same integral, so the resulting lrtb values cover the same range at every feature scale. Why not scale the lrtb range proportionally for each scale? Also, if the feature map is large, could the lrtb range produced by this implementation be smaller than the ground-truth lrtb range?

pos_bbox_pred_corners = self.integral(pos_bbox_pred)

Training does not reach the results in the paper

Following the settings given in your paper, I cannot reach the reported results when training either with this repo or with mmdetection; the results do not even match the original GFL. What could be the reason? Does the temperature T need to change after distillation?

mAP is getting worse

I calculated the loss with the formula below:
loss = loss_qfl + loss_bbox + loss_dfl + loss_ld
The result was that the mAP got worse. Why?

On the implementation of the GI (General Instance) paper

Hello, while reading your paper and code I noticed that you implemented GI (General Instance), but I could only find the feature distillation part, not the response-based and relation-based distillation. Did you implement those two parts? If so, could you share them? Many thanks.

Parameter count when using LD

(screenshot omitted)
Hello, while studying your paper I noticed that the parameter count of a model using LD does not decrease but actually increases. I understand knowledge distillation to be a model-compression method, so I am confused; I hope you can clarify.

A question about model training

Hello, thanks for such wonderful work. After reading the paper, I have a question regarding model training.

According to the code, ground-truth annotations are still necessary during knowledge distillation to compute the classification loss, regression loss and DFL. The paper mentions that removing the regression loss and DFL causes only a small decrease in mAP.

But if I want to completely remove the dependency on ground-truth labels, how should I deal with the classification loss? Does this term affect the final mAP a lot? Could you please share some insights or results about this?

Different softmax functions for prediction and soft label

Hello, may I ask why you used log_softmax for prediction but softmax for soft label? Please see the code lines below.

@weighted_loss
def ld_distribution_focal_loss(pred, label, soft_label, T):
    ld_loss = F.kl_div(
        F.log_softmax(pred / T, dim=1),
        F.softmax(soft_label / T, dim=1).detach(),
        reduction='none').mean(1) * (T * T)
    return ld_loss

And here is the corresponding equation in your paper: (equation screenshot omitted)

Best regards

On Self-LD

@HikariTJU Hi, the paper mentions that Self-KD improves performance, but I cannot find it in the released code.
Since there seem to be several ways to do self-KD, could you briefly explain how it is done here, and which paper it follows?
Thanks!

About GPU settings

Hi, in your paper you use 2 GPUs in all experiments except the SOTA experiments, which use 8 GPUs, is that right? If so, may I ask why?

A question about the losses in ld_head.py

Why does ld_head.py in the official mmdetection project have fewer losses than the one provided in LD? It is missing the following:
loss_ld_vlr=losses_ld_vlr,
loss_kd=losses_kd,
loss_kd_neg=losses_kd_neg,
loss_im=losses_im,
Am I misunderstanding something, or was the use of these losses revised later?

bug while running the sample model training

Hello again,
I tried to run the training command provided in the README.md file using 1 GPU.

./tools/dist_train.sh configs/ld/ld_r50_gflv1_r101_fpn_coco_1x.py 1

The only modification I made in the config file is to specify that I want to run the program for one epoch:

runner = dict(type='EpochBasedRunner', max_epochs=1)

I am getting an error when the checkpoint is saved after training. This is the complete traceback:

2022-02-03 02:45:24,914 - mmdet - INFO - Epoch [1][58550/58633] lr: 2.500e-03, eta: 0:00:46, time: 0.558, data_time: 0.004, memory: 4122, loss_cls: 0.7503, loss_bbox: 0.3822, loss_dfl: 0.2494, loss_ld: 0.2703, loss_ld_vlr: 0.4174, loss_kd: 0.2851, loss_kd_neg: 0.0000, loss_im: 0.3847, loss: 2.7392
2022-02-03 02:45:52,665 - mmdet - INFO - Epoch [1][58600/58633] lr: 2.500e-03, eta: 0:00:18, time: 0.555, data_time: 0.004, memory: 4122, loss_cls: 0.7615, loss_bbox: 0.3864, loss_dfl: 0.2457, loss_ld: 0.2216, loss_ld_vlr: 0.3467, loss_kd: 0.2940, loss_kd_neg: 0.0000, loss_im: 0.3496, loss: 2.6055
2022-02-03 02:46:16,693 - mmdet - INFO - Saving checkpoint at 1 epochs
[                                                  ] 0/5000, elapsed: 0s, ETA:Traceback (most recent call last):
  File "./tools/train.py", line 187, in <module>
    main()
  File "./tools/train.py", line 183, in main
    meta=meta)
  File "/home/edouard/eden/work/codes/LD/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 308, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/edouard/eden/work/codes/LD/mmdet/core/evaluation/eval_hooks.py", line 276, in after_train_epoch
    gpu_collect=self.gpu_collect)
  File "/home/edouard/eden/work/codes/LD/mmdet/apis/test.py", line 97, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 
550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 458, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/edouard/eden/work/codes/LD/mmdet/models/detectors/base.py", line 183, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/edouard/eden/work/codes/LD/mmdet/models/detectors/base.py", line 160, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/edouard/eden/work/codes/LD/mmdet/models/detectors/single_stage.py", line 120, in simple_test
    *outs, img_metas, rescale=rescale)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 164, in new_func
    return old_func(*args, **kwargs)
      File "/home/edouard/eden/work/codes/LD/mmdet/models/dense_heads/anchor_head.py", line 583, in get_bboxes
    scale_factors, cfg, rescale)
  File "/home/edouard/eden/work/codes/LD/mmdet/models/dense_heads/gfl_head.py", line 560, in _get_bboxes
    cfg.max_per_img)
  File "/home/edouard/eden/work/codes/LD/mmdet/core/post_processing/bbox_nms.py", line 187, in multiclass_nms
    return dets, labels[keep]
IndexError: index 8663 is out of bounds for dimension 0 with size 100
Traceback (most recent call last):
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/edouard/anaconda3/envs/LD/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/edouard/anaconda3/envs/LD/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/ld/ld_r50_gflv1_r101_fpn_coco_1x.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

P.S. Everything is installed as recommended in the README.

Thank you very much for your help

Config question

The LD in Table 4 and Table 5 of the paper should refer to Main KD + Main LD + VLR LD, right? In the config files, ld_r18_gflv1_r101_fpn_coco_1x.py only applies Main LD, while ld_r50_gflv1_r101_fpn_coco_1x.py applies Main KD + Main LD + VLR LD. So the other configs have to be written by ourselves, correct? Thanks!

Results below the paper, and a config question

1. Training results with the teacher R101 / student R18 config, 1x schedule:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.323
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.467
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.350
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.179
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.345
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.538
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.538
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.538
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.317
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.579
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.707
The results are far below the paper; what might be the cause?
2. The config only loads the teacher's COCO-trained weights, while the student only loads ImageNet-pretrained weights. Shouldn't the distillation start from a COCO-trained student model? Thanks for your answer.

AP landscape

Exciting and excellent work! We are really interested in your AP landscape. Will you release the code or pseudo-code for computing the AP landscape?

Why is the mAP of my trained model below the paper's?

I downloaded the ResNet-101 GFocal model from mmdetection (download link: https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth) and trained with this command line:
bash ./tools/dist_train.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py 8
but the mAP at epoch 12 is 0.3960, while the paper says it should be 41.2.
Training and evaluation log:
2021-08-18 10:02:25,727 - mmdet - INFO - Epoch [12][4750/4887] lr: 3.750e-05, eta: 0:02:01, time: 0.882, data_time: 0.015, memory: 6289, loss_cls: 0.3524, loss_bbox: 0.2883, loss_ld: 0.1610, loss_dfl: 0.2044, loss: 1.0061
2021-08-18 10:03:10,220 - mmdet - INFO - Epoch [12][4800/4887] lr: 3.750e-05, eta: 0:01:17, time: 0.890, data_time: 0.014, memory: 6289, loss_cls: 0.3597, loss_bbox: 0.2918, loss_ld: 0.1602, loss_dfl: 0.2045, loss: 1.0162
2021-08-18 10:03:54,650 - mmdet - INFO - Epoch [12][4850/4887] lr: 3.750e-05, eta: 0:00:32, time: 0.888, data_time: 0.015, memory: 6289, loss_cls: 0.3543, loss_bbox: 0.2908, loss_ld: 0.1611, loss_dfl: 0.2060, loss: 1.0122
2021-08-18 10:04:45,038 - mmdet - INFO - Saving checkpoint at 12 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 95.6 task/s, elapsed: 52s, ETA: 0s

2021-08-18 10:05:50,605 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=6.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=53.03s).
Accumulating evaluation results...
DONE (t=11.64s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.396
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.572
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.226
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.435
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.582
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.582
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.582
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.369
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.630
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.745
2021-08-18 10:07:02,558 - mmdet - INFO - Exp name: ld_gflv1_r101_r50_fpn_coco_1x.py
2021-08-18 10:07:02,559 - mmdet - INFO - Epoch(val) [12][4887] bbox_mAP: 0.3960, bbox_mAP_50: 0.5720, bbox_mAP_75: 0.4280, bbox_mAP_s: 0.2260, bbox_mAP_m: 0.4350, bbox_mAP_l: 0.5160, bbox_mAP_copypaste: 0.396 0.572 0.428 0.226 0.435 0.516

Could you help me solve this problem?

RuntimeError: NCCL error

./tools/dist_train.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py 8

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8

Training problem with retinagfl_r101_2x_coco.py

Hello, I added this repo's retinagfl_r101_2x_coco.py and the required head files to the latest version of mmdetection.
The only code change is to Integral.forward() in retina_fl_head.py: running it unmodified raises a CUDA error from overly large numbers, so I changed
x = F.linear(x, self.project.type_as(x)).reshape(-1, 4)
to
x = F.linear(x.reshape(4, -1, x.shape[-1]), self.project.type_as(x)).reshape(-1, 4)
Everything else follows the original config: learning rate 0.01, 8 GPUs; the detailed settings are in 20220308_071908.log.
But the results are very bad: 2022-03-09 20:50:44,029 - mmdet - INFO - Epoch(val) [24][625] bbox_mAP: 0.0010, bbox_mAP_50: 0.0020, bbox_mAP_75: 0.0000, bbox_mAP_s: 0.0000, bbox_mAP_m: 0.0000, bbox_mAP_l: 0.0020, bbox_mAP_copypaste: 0.001 0.002 0.000 0.000 0.000 0.002
Where might the problem be?

A question about classification distillation

Is this converting the multiple binary sigmoid outputs into a softmax distribution? If so, does the distillation ignore the background class?

Can other pretrained teacher models be used?

I pretrained a teacher model with the FCOS-GFL-R101 config, used it to train a student model, and roughly reproduced the results in your paper.
I would like to try training with other pretrained teacher models; is that possible, and does the code need to be modified anywhere? Please take a look, thanks.

I changed LD/configs/ld/ld_r50_fcos_r101_1x.py to
teacher_config='configs/fcos/fcos_r101_caffe_fpn_gn-head_mstrain_640-800_2x_coco.py',
teacher_ckpt='configs/ld/fcos_r101_caffe_fpn_gn-head_mstrain_640-800_2x_coco-511424d6.pth'
and downloaded the pretrained model provided by mmdetection.

Some questions about the VLR

  1. I notice that you separate the VLR and the MDR in an ATSS manner, but if I use another label assignment such as OTA or TOOD, should I split the valuable distillation region via quality metrics (i.e., set a quality-metric threshold) to keep in step with the different LA method?
  2. Is the region weighting in your paper computed from the student's regression feature map, the teacher's, or a combination of the two?
  3. Great idea, extending the probabilistic attribute of DFL to distillation, fantastic!

A question about the ld_loss computation

In the implementation, ld_loss is not divided by avg_factor.
With a large batch size, ld_loss fluctuates a lot: during my training it rises from about 1 at the start to over 10, then slowly decreases.
The recommended batch size is 2; how should I set things up if I want a larger batch size? Divide the ld_loss weight of 0.25 by batch_size*2?
#28 #7

setting parameter of LDLoss

At present, I have modified the distillation code and run it through using the nanodet training framework.
I set reduction='mean', loss_weight=1.0, T=2, alpha=1, beta=1 in LDLoss; is that right?

[root][05-08 03:57:02]INFO:warmup|Iter(211/300)| lr:7.06e-02| loss_qfl:0.2829| loss_bbox:0.8766| loss_ld:0.0196| loss_dfl:0.3119|
[root][05-08 03:57:10]INFO:warmup|Iter(221/300)| lr:7.39e-02| loss_qfl:0.2794| loss_bbox:0.8341| loss_ld:0.0250| loss_dfl:0.2975|
[root][05-08 03:57:16]INFO:warmup|Iter(231/300)| lr:7.72e-02| loss_qfl:0.5848| loss_bbox:0.9857| loss_ld:0.0168| loss_dfl:0.3245|
[root][05-08 03:57:23]INFO:warmup|Iter(241/300)| lr:8.05e-02| loss_qfl:0.2754| loss_bbox:0.9521| loss_ld:0.0166| loss_dfl:0.3287|
...
[root][05-08 04:00:07]INFO:train|Epoch1/300|Iter281(280/1543)| lr:1.00e-01| loss_qfl:0.2121| loss_bbox:0.6372| loss_ld:0.0287| loss_dfl:0.2452|
[root][05-08 04:00:11]INFO:train|Epoch1/300|Iter291(290/1543)| lr:1.00e-01| loss_qfl:0.1914| loss_bbox:0.6420| loss_ld:0.0306| loss_dfl:0.2301|
[root][05-08 04:00:15]INFO:train|Epoch1/300|Iter301(300/1543)| lr:1.00e-01| loss_qfl:0.2096| loss_bbox:0.6041| loss_ld:0.0293| loss_dfl:0.2287|
[root][05-08 04:00:20]INFO:train|Epoch1/300|Iter311(310/1543)| lr:1.00e-01| loss_qfl:0.2008| loss_bbox:0.6006| loss_ld:0.0285| loss_dfl:0.2271|
......

which config

To which config does the GFocalV1 + LD R101→R50 1x checkpoint (pan.baidu pw: bvzz) correspond?

On the implementation of GI (General Instance) feature-based distillation

I noticed that the implementation in your code:

idx_out = torch.ops.torchvision.nms(gibox, giscore, 0.3)[:10]
return idx_out

gi_idx = self.get_gi_region(soft_label, cls_score, anchors,
                            bbox_pred, soft_targets, stride)
gi_teacher = teacher_x[gi_idx]
gi_student = x[gi_idx]
loss_im = self.loss_im(gi_student, gi_teacher)

does not use GI-box ROIAlign for feature mimicking as in the GI paper.

This way only 10 FPN pixels are selected for feature mimicking, and after NMS the points near these 10 GI-box pixels have most likely been suppressed, even though those pixels should also be quite informative.

Why did you implement it this way? Have you tried the original approach, and how does it perform?

KeyError: "GFL: 'LDGFLHead is not in the models registry'",您好,请问在运行的时候报这个错误是什么原因?

Notice

There are several common situations in reimplementation issues, listed below:

  1. Reimplement a model in the model zoo using the provided configs
  2. Reimplement a model in the model zoo on other dataset (e.g., custom datasets)
  3. Reimplement a custom model but all the components are implemented in MMDetection
  4. Reimplement a custom model with new modules implemented by yourself

Different cases call for different actions:

  • For case 1 & 3, please follow the steps in the following sections thus we could help to quick identify the issue.
  • For case 2 & 4, please understand that we are not able to do much help here because we usually do not know the full code and the users should be responsible to the code they write.
  • One suggestion for case 2 & 4 is that the users should first check whether the bug lies in the self-implemented code or the original code. For example, users can first make sure that the same model runs well on supported datasets. If you still need help, please describe what you have done and what you obtain in the issue, and follow the steps in the following sections and try as clear as possible so that we can better help you.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The issue has not been fixed in the latest version.

Describe the issue

A clear and concise description of the problem you met and what you have done.

Reproduction

  1. What command or script did you run?
A placeholder for the command.
  2. What config did you run?
A placeholder for the config.
  3. Did you make any modifications to the code or config? Do you understand what you modified?
  4. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may help locate the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

A placeholder for results comparison

Issue fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

subprocess.CalledProcessError during training

Hello, I created a new virtual environment as required by this project, using CUDA 10.1 + PyTorch 1.5.0.
(Note: the mmcv-full installation command in install.md needs updating: pip install mmcv-full==1.2.7 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html)
The configs directory in README.md is also wrong.
I only have one GPU, so for training I changed the command to:
./tools/dist_train.sh configs/ld/ld_r18_gflv1_r101_fpn_coco_1x.py 1
One of the errors is:
subprocess.CalledProcessError: Command '['/home/a/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/ld/ld_r18_gflv1_r101_fpn_coco_1x.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
I searched for solutions to this error; one is to add find_unused_parameters=True to DistributedDataParallel:
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True)
In which file of this project should find_unused_parameters be set? Thanks.

My full error output is as follows:
(open-mmlab) a@a-System-Product-Name:~/LD$ ./tools/dist_train.sh configs/ld/ld_r18_gflv1_r101_fpn_coco_1x.py 1
2021-11-27 22:44:39,804 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2060 SUPER
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.1
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+35d732a
OpenCV: 4.5.4
MMCV: 1.2.7
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.10.0+9856a78

2021-11-27 22:44:39,967 - mmdet - INFO - Distributed training: True
2021-11-27 22:44:40,128 - mmdet - INFO - Config:
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=3,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.00375, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth'
model = dict(
    type='KnowledgeDistillationSingleStageDetector',
    pretrained='torchvision://resnet18',
    teacher_config='configs/gfl/gfl_r101_fpn_mstrain_2x_coco.py',
    teacher_ckpt=
    'https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth',
    output_feature=True,
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5),
    bbox_head=dict(
        type='LDHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        loss_cls=dict(
            type='QualityFocalLoss',
            use_sigmoid=True,
            beta=2.0,
            loss_weight=1.0),
        loss_dfl=dict(type='DistributionFocalLoss', loss_weight=0.25),
        loss_ld=dict(
            type='KnowledgeDistillationKLDivLoss', loss_weight=0.25, T=10),
        reg_max=16,
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0)),
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=100))
work_dir = './work_dirs/ld_r18_gflv1_r101_fpn_coco_1x'
gpu_ids = range(0, 1)

Traceback (most recent call last):
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 179, in build_from_cfg
    return obj_cls(**args)
TypeError: __init__() missing 1 required positional argument: 'loss_im'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 179, in build_from_cfg
    return obj_cls(**args)
  File "/home/a/LD/mmdet/models/detectors/kd_one_stage.py", line 35, in __init__
    pretrained)
  File "/home/a/LD/mmdet/models/detectors/single_stage.py", line 30, in __init__
    self.bbox_head = build_head(bbox_head)
  File "/home/a/LD/mmdet/models/builder.py", line 59, in build_head
    return build(cfg, HEADS)
  File "/home/a/LD/mmdet/models/builder.py", line 34, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 182, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: LDHead: __init__() missing 1 required positional argument: 'loss_im'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/train.py", line 187, in <module>
    main()
  File "./tools/train.py", line 161, in main
    test_cfg=cfg.get('test_cfg'))
  File "/home/a/LD/mmdet/models/builder.py", line 77, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/a/LD/mmdet/models/builder.py", line 34, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 182, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: KnowledgeDistillationSingleStageDetector: LDHead: __init__() missing 1 required positional argument: 'loss_im'
Traceback (most recent call last):
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/a/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/a/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/ld/ld_r18_gflv1_r101_fpn_coco_1x.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

AttributeError: 'ATSSAssigner' object has no attribute 'assign_neg'

assign_result_neg, assigned_neg = self.assigner.assign_neg(

Hello, I used the config file configs/ld/ld_r50_atss_r101_1x.py,
where the bbox_head type is LDATSSHead
and the train_cfg assigner is ATSSAssigner.
However, the assign_neg function called at line 467 of mmdet/models/dense_heads/ld_atss.py is not defined.

That is, class ATSSAssigner in LD/mmdet/core/bbox/assigners/atss_assigner.py has no assign_neg function.

About the GFocal implementation for RetinaNet and FCOS

Hello!
I noticed that in the LD project, the gfl folder under configs contains code reproducing generalized focal loss for several different networks.
My understanding of the generalized focal loss improvements is: 1. the classification branch uses the joint quality focal loss; 2. the regression branch uses a general distribution plus the distribution focal loss, whose core idea is to improve box fitting with a probability distribution.
In configs such as gfl_r50_fpn_1x_coco, quality focal loss is used in loss_cls and distribution focal loss in loss_dfl, which is easy to understand.
However, in configs such as retina_gfl_r101_2x and fcos_gfl_r50_center, loss_cls still uses FocalLoss and loss_bbox uses GIoULoss (my understanding is that quality focal loss means using IoU labels instead of one-hot labels). Where is the distribution focal loss reflected in these configs? Also, retinanet (retina_gfl_r101_2x) seems to still use the DeltaXYWHBBoxCoder?
Many thanks!

how to calculate loss?

Hi, is the loss computed by losses = self(**data) on line 234 of base.py in mmdet/models/detectors?
