Comments (5)
Hi, thanks for your attention.
The DotD calculation itself looks fine. Could you please share your config file so that I can help figure out the problem? Here is a link to the official implementation for your reference.
https://github.com/Chasel-Tsui/mmdet-aitod
As for the mAP drop, could you provide more details about the in-house dataset, such as the average object size and the largest/smallest object sizes? The degradation may result from the large size variation of the dataset: we found that DotD can produce sub-optimal results when a dataset contains many medium and large objects (>32*32 pixels). As a substitute, we recommend our newly released NWD-RKA (also at the link above), which may handle tiny object detection better when there is large size variation.
As for the AI-TOD dataset, we have attached a download link (BaiduPan) in this repo; you need to download the xView training set plus the remaining part of AI-TOD (AI-TOD_wo_xview), then generate the full AI-TOD dataset with the end2end tools. We will add other download links (Google Drive or OneDrive) for AI-TOD_wo_xview within a week.
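As we read the DotD paper, the metric replaces IoU with an exponential of the normalized center distance, which is why it behaves differently for tiny versus large boxes. A minimal sketch (the function name and box convention are ours, not the official implementation):

```python
import math

def dotd(box_a, box_b, avg_size):
    """Dot Distance between two boxes given as (x1, y1, x2, y2).

    d = Euclidean distance between box centers.
    avg_size = dataset-level average object size.
    DotD = exp(-d / avg_size): it stays in (0, 1] like IoU, but does not
    collapse to 0 for tiny boxes that miss each other by a few pixels.
    """
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    d = math.hypot(cxa - cxb, cya - cyb)
    return math.exp(-d / avg_size)
```

Note that for large objects the centers of two well-overlapping boxes can still be many pixels apart, which is one intuition for why DotD degrades on medium/large-object datasets.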
Please feel free to contact me if you have further issues.
from ai-tod.
Thanks @Chasel-Tsui,
Here is the config:
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='mmcls.ConvNeXt',
        arch='base',
        out_indices=[0, 1, 2, 3],
        drop_path_rate=0.4,
        layer_scale_init_value=1.0,
        gap_before_final_norm=False,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-base_3rdparty_in21k_20220301-262fd037.pth',
            prefix='backbone.')),
    neck=dict(
        type='FPN',
        in_channels=[128, 256, 512, 1024],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=1,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='DotDistOverlaps', average_size=900.0)),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='DotDistOverlaps', average_size=900.0)),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
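One thing worth double-checking in the config above is `average_size=900.0` in `DotDistOverlaps`: it is a dataset statistic, and DotD is sensitive to it. A hedged sketch of computing it from COCO-style annotations (we are not certain whether the implementation expects the mean area or its square root, so verify against the `DotDistOverlaps` source before use):

```python
import math

def avg_object_size(annotations):
    """Average object size over COCO-style annotations (each with an 'area' field).

    Returns sqrt(mean area), i.e. the side length of the average square box.
    Whether `DotDistOverlaps.average_size` expects this value or the raw mean
    area depends on the implementation -- check its source before relying on it.
    """
    mean_area = sum(a["area"] for a in annotations) / len(annotations)
    return math.sqrt(mean_area)
```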
The stats of the dataset are as follows:
Thanks again.
from ai-tod.
Hi, my pleasure.
I have checked the config and the stats information.
Here are three possible solutions.
First, you could disable DotD in the RCNN stage and switch back to the original IoU to test the performance.
Second, I notice that the "avg_annotation_area" of this dataset is 16761 pixels, which suggests DotD is not a good fit here. Although the dataset contains some tiny objects, globally it is a large-object dataset (by average size), and applying tiny-object detection strategies to such a dataset is unlikely to improve mAP. To the best of my knowledge, DotD is not effective for datasets containing many large objects; AI-TOD, where we tested it, contains mainly tiny objects (smaller than 1024 pixels in area). If you still want to use DotD, it might help (though I am not sure) to ensemble two models, one trained with IoU and one with DotD.
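The arithmetic behind that second point: an average annotation area of 16761 pixels corresponds to an average side length of roughly 129 px, far above the 32*32 = 1024 px tiny/small cutoff quoted earlier in the thread:

```python
import math

avg_area = 16761             # "avg_annotation_area" reported for the in-house dataset
avg_side = math.sqrt(avg_area)
print(round(avg_side, 1))    # roughly 129.5 px per side on average

tiny_threshold = 32 * 32     # 1024 px: the small-object cutoff mentioned above
print(avg_area > tiny_threshold)  # True: on average, a medium/large-object dataset
```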
The last option is to try our newly released NWD-RKA (https://github.com/Chasel-Tsui/mmdet-aitod), which may help with tiny object detection when object sizes vary. But I cannot guarantee an improvement, since on average this appears to be a large-object detection dataset, as mentioned above.
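The first suggestion above is a one-line config change: swap the `iou_calculator` in the `rcnn` assigner back to mmdet's default `BboxOverlaps2D` (deleting the key has the same effect, since `MaxIoUAssigner` defaults to plain IoU). A sketch of the only fragment that changes:

```python
# Only the rcnn assigner fragment changes; the rest of the config stays as posted.
rcnn_assigner = dict(
    type='MaxIoUAssigner',
    pos_iou_thr=0.5,
    neg_iou_thr=0.5,
    min_pos_iou=0.5,
    match_low_quality=False,
    ignore_iof_thr=-1,
    # Default IoU instead of DotD; omitting the iou_calculator key entirely
    # gives the same behavior.
    iou_calculator=dict(type='BboxOverlaps2D'))
```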
from ai-tod.
Thanks @Chasel-Tsui - I have created a new dataset in which the maximum bbox area is 2024 pixels, and on it DotD performed better than IoU.
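One hedged way to build such a size-capped dataset from COCO-style annotations (the helper name and the cap value's interpretation as area in pixels are ours, not from the thread):

```python
def cap_bbox_area(coco, max_area=2024):
    """Keep only annotations whose bbox area is at most `max_area` pixels,
    then drop images left with no annotations.

    `coco` is a dict with 'images' and 'annotations' lists in COCO format.
    """
    anns = [a for a in coco["annotations"] if a["area"] <= max_area]
    kept_ids = {a["image_id"] for a in anns}
    imgs = [im for im in coco["images"] if im["id"] in kept_ids]
    return {**coco, "annotations": anns, "images": imgs}
```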
from ai-tod.