idea-research / maskdino

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

License: Apache License 2.0

Python 85.87% Shell 0.15% C++ 1.39% Cuda 12.59%

instance-segmentation object-detection panoptic-segmentation semantic-segmentation

maskdino's Issues

Inference demo with pre-trained MaskDINO model

Hello Team,
Thanks for publishing the code. I was impressed by the results in the paper. Compared to the other models at the top of the instance segmentation leaderboard (https://paperswithcode.com/sota/instance-segmentation-on-coco), MaskDINO uses quite a low number of parameters and yet achieves a better score. We are therefore planning to use MaskDINO in our current application, and I have been trying to run some inference tests.

I was able to set up an environment following the installation requirements in the repo.

Could you add an inference demo, like the one in detectron2's Getting Started section?
https://detectron2.readthedocs.io/en/latest/tutorials/getting_started.html

Otherwise, please let me know how I can modify detectron2's demo.py script so that I can run it.

MaskDINOHead changes statedict keys

On lines 33/34 of maskdino_head.py, the state dict keys are modified. When fine-tuning with the SwinL weights, the new state dict keys do not match the model layer names because of this renaming. Removing the renaming allows the keys to match.
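A quick way to see this kind of mismatch is to diff the checkpoint keys against the model keys before and after a prefix rename. The sketch below is plain Python with made-up layer names; `rename_keys`, `diff_keys` and the `head.*` prefixes are hypothetical and are not taken from maskdino_head.py.

```python
def rename_keys(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with old_prefix replaced by new_prefix."""
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed


def diff_keys(checkpoint_keys, model_keys):
    """Report keys that would fail to load in either direction."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    return {
        "missing_in_checkpoint": sorted(model_keys - checkpoint_keys),
        "unexpected_in_checkpoint": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical layer names, for illustration only.
checkpoint = {"head.pixel_decoder.conv.weight": 1, "backbone.stem.weight": 2}
model_keys = ["head.pixel_decoder.conv.weight", "backbone.stem.weight"]

# Renaming a prefix that the model still uses produces a mismatch.
renamed = rename_keys(checkpoint, "head.pixel_decoder.", "head.decoder.")
report = diff_keys(renamed, model_keys)
print(report)
```

Running the same diff on the real checkpoint and `model.state_dict().keys()` should show whether the renaming is what breaks the load.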

tensorboard logging

Could you please let me know whether TensorBoard is logging train or val statistics, and whether the logging comes from the detectron2 API or from code in your repo?

masks of instance segmentation

Hi,

I trained a model on a custom dataset, and the bboxes seem pretty solid. How can I get the instance masks? I looked at the inference notebook and at the output of the model; I'm not sure what I'm missing, but I couldn't find any mask parameters.

My goal is eventually to do instance/panoptic segmentation.

Thanks!

AttributeError: 'list' object has no attribute 'get_bounding_boxes'

I am trying to train starting from maskdino_r50_50ep_100q_celoss_hid1024_3s_semantic_cityscapes_79.8miou.pth with the config configs/cityscapes/semantic-segmentation/maskdino_R50_bs16_90k_steplr.yaml (https://github.com/IDEA-Research/MaskDINO/tree/main/configs/cityscapes/semantic-segmentation). The command is:

python train_net.py --num-gpus 1 --config-file configs/cityscapes/semantic-segmentation/maskdino_R50_bs16_90k_steplr.yaml MODEL.WEIGHTS maskdino_r50_50ep_100q_celoss_hid1024_3s_semantic_cityscapes_79.8miou.pth

The error log is:

2
11
2
13
11
12
4
1
12
12
11
2
0
11
4
12
11
5
12
[02/14 20:40:13 d2.utils.events]:  eta: 14:30:56  iter: 419  total_loss: 79.05  loss_ce: 0.7662  loss_mask: 0.4672  loss_dice: 1.376  loss_bbox: 0.641  loss_giou: 0.7265  loss_ce_dn: 0.3494  loss_mask_dn: 0.5014  loss_dice_dn: 1.254  loss_bbox_dn: 0.2784  loss_giou_dn: 0.3464  loss_ce_0: 1.648  loss_mask_0: 0.4273  loss_dice_0: 1.536  loss_bbox_0: 3.453  loss_giou_0: 1.724  loss_ce_1: 1.129  loss_mask_1: 0.4599  loss_dice_1: 1.435  loss_bbox_1: 1.292  loss_giou_1: 1.109  loss_ce_dn_1: 0.8849  loss_mask_dn_1: 0.5496  loss_dice_dn_1: 1.32  loss_bbox_dn_1: 0.4981  loss_giou_dn_1: 0.4988  loss_ce_2: 0.9948  loss_mask_2: 0.4748  loss_dice_2: 1.334  loss_bbox_2: 0.9305  loss_giou_2: 0.9705  loss_ce_dn_2: 0.399  loss_mask_dn_2: 0.5417  loss_dice_dn_2: 1.294  loss_bbox_dn_2: 0.3806  loss_giou_dn_2: 0.4247  loss_ce_3: 0.7965  loss_mask_3: 0.4374  loss_dice_3: 1.374  loss_bbox_3: 0.7992  loss_giou_3: 0.8476  loss_ce_dn_3: 0.2176  loss_mask_dn_3: 0.5195  loss_dice_dn_3: 1.283  loss_bbox_dn_3: 0.3308  loss_giou_dn_3: 0.3852  loss_ce_4: 0.7999  loss_mask_4: 0.4505  loss_dice_4: 1.355  loss_bbox_4: 0.7958  loss_giou_4: 0.8001  loss_ce_dn_4: 0.2266  loss_mask_dn_4: 0.5241  loss_dice_dn_4: 1.261  loss_bbox_dn_4: 0.3066  loss_giou_dn_4: 0.3541  loss_ce_5: 0.9027  loss_mask_5: 0.4278  loss_dice_5: 1.316  loss_bbox_5: 0.6092  loss_giou_5: 0.8138  loss_ce_dn_5: 0.1839  loss_mask_dn_5: 0.5166  loss_dice_dn_5: 1.279  loss_bbox_dn_5: 0.2979  loss_giou_dn_5: 0.3484  loss_ce_6: 0.6809  loss_mask_6: 0.4294  loss_dice_6: 1.335  loss_bbox_6: 0.6096  loss_giou_6: 0.7339  loss_ce_dn_6: 0.2392  loss_mask_dn_6: 0.526  loss_dice_dn_6: 1.29  loss_bbox_dn_6: 0.2936  loss_giou_dn_6: 0.3468  loss_ce_7: 0.7687  loss_mask_7: 0.4149  loss_dice_7: 1.362  loss_bbox_7: 0.624  loss_giou_7: 0.7326  loss_ce_dn_7: 0.2941  loss_mask_dn_7: 0.5212  loss_dice_dn_7: 1.293  loss_bbox_dn_7: 0.2868  loss_giou_dn_7: 0.3471  loss_ce_8: 0.7624  loss_mask_8: 0.456  loss_dice_8: 1.393  loss_bbox_8: 0.6341  loss_giou_8: 
0.7605  loss_ce_dn_8: 0.3587  loss_mask_dn_8: 0.5122  loss_dice_dn_8: 1.299  loss_bbox_dn_8: 0.281  loss_giou_dn_8: 0.3456  time: 0.5916  data_time: 0.0030  lr: 0.0001  max_mem: 3626M
ERROR [02/14 20:40:13 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/train_loop.py", line 268, in run_step
    data = next(self._data_loader_iter)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/data/common.py", line 291, in __iter__
    for d in self.dataset:
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data.append(next(self.dataset_iter))
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/data/common.py", line 258, in __iter__
    yield self.dataset[idx]
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/data/common.py", line 95, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/utils/serialize.py", line 26, in __call__
    return self._obj(*args, **kwargs)
  File "/home/otakutyrant/Projects/MaskDINO/maskdino/data/dataset_mappers/mask_former_semantic_dataset_mapper.py", line 182, in __call__
    instances.gt_boxes = masks.get_bounding_boxes()
AttributeError: 'list' object has no attribute 'get_bounding_boxes'

[02/14 20:40:13 d2.engine.hooks]: Overall training speed: 418 iterations in 0:04:07 (0.5917 s / it)
[02/14 20:40:13 d2.engine.hooks]: Total training time: 0:04:08 (0:00:01 on hooks)
[02/14 20:40:13 d2.utils.events]:  eta: 14:30:55  iter: 420  total_loss: 79.05  loss_ce: 0.7662  loss_mask: 0.4672  loss_dice: 1.376  loss_bbox: 0.641  loss_giou: 0.7265  loss_ce_dn: 0.3494  loss_mask_dn: 0.5014  loss_dice_dn: 1.254  loss_bbox_dn: 0.2784  loss_giou_dn: 0.3464  loss_ce_0: 1.648  loss_mask_0: 0.4273  loss_dice_0: 1.536  loss_bbox_0: 3.453  loss_giou_0: 1.724  loss_ce_1: 1.129  loss_mask_1: 0.4599  loss_dice_1: 1.435  loss_bbox_1: 1.292  loss_giou_1: 1.109  loss_ce_dn_1: 0.8849  loss_mask_dn_1: 0.5496  loss_dice_dn_1: 1.32  loss_bbox_dn_1: 0.4981  loss_giou_dn_1: 0.4988  loss_ce_2: 0.9948  loss_mask_2: 0.4748  loss_dice_2: 1.334  loss_bbox_2: 0.9305  loss_giou_2: 0.9705  loss_ce_dn_2: 0.399  loss_mask_dn_2: 0.5417  loss_dice_dn_2: 1.294  loss_bbox_dn_2: 0.3806  loss_giou_dn_2: 0.4247  loss_ce_3: 0.7965  loss_mask_3: 0.4374  loss_dice_3: 1.374  loss_bbox_3: 0.7992  loss_giou_3: 0.8476  loss_ce_dn_3: 0.2176  loss_mask_dn_3: 0.5195  loss_dice_dn_3: 1.283  loss_bbox_dn_3: 0.3308  loss_giou_dn_3: 0.3852  loss_ce_4: 0.7999  loss_mask_4: 0.4505  loss_dice_4: 1.355  loss_bbox_4: 0.7958  loss_giou_4: 0.8001  loss_ce_dn_4: 0.2266  loss_mask_dn_4: 0.5241  loss_dice_dn_4: 1.261  loss_bbox_dn_4: 0.3066  loss_giou_dn_4: 0.3541  loss_ce_5: 0.9027  loss_mask_5: 0.4278  loss_dice_5: 1.316  loss_bbox_5: 0.6092  loss_giou_5: 0.8138  loss_ce_dn_5: 0.1839  loss_mask_dn_5: 0.5166  loss_dice_dn_5: 1.279  loss_bbox_dn_5: 0.2979  loss_giou_dn_5: 0.3484  loss_ce_6: 0.6809  loss_mask_6: 0.4294  loss_dice_6: 1.335  loss_bbox_6: 0.6096  loss_giou_6: 0.7339  loss_ce_dn_6: 0.2392  loss_mask_dn_6: 0.526  loss_dice_dn_6: 1.29  loss_bbox_dn_6: 0.2936  loss_giou_dn_6: 0.3468  loss_ce_7: 0.7687  loss_mask_7: 0.4149  loss_dice_7: 1.362  loss_bbox_7: 0.624  loss_giou_7: 0.7326  loss_ce_dn_7: 0.2941  loss_mask_dn_7: 0.5212  loss_dice_dn_7: 1.293  loss_bbox_dn_7: 0.2868  loss_giou_dn_7: 0.3471  loss_ce_8: 0.7624  loss_mask_8: 0.456  loss_dice_8: 1.393  loss_bbox_8: 0.6341  loss_giou_8: 
0.7605  loss_ce_dn_8: 0.3587  loss_mask_dn_8: 0.5122  loss_dice_dn_8: 1.299  loss_bbox_dn_8: 0.281  loss_giou_dn_8: 0.3456  time: 0.5916  data_time: 0.0030  lr: 0.0001  max_mem: 3626M
Traceback (most recent call last):
  File "/home/otakutyrant/Projects/MaskDINO/train_net.py", line 388, in <module>
    launch(
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "/home/otakutyrant/Projects/MaskDINO/train_net.py", line 375, in main
    return trainer.train()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/engine/train_loop.py", line 268, in run_step
    data = next(self._data_loader_iter)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/data/common.py", line 291, in __iter__
    for d in self.dataset:
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data.append(next(self.dataset_iter))
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/data/common.py", line 258, in __iter__
    yield self.dataset[idx]
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/data/common.py", line 95, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/otakutyrant/Projects/MaskDINO/.env/lib/python3.10/site-packages/detectron2/utils/serialize.py", line 26, in __call__
    return self._obj(*args, **kwargs)
  File "/home/otakutyrant/Projects/MaskDINO/maskdino/data/dataset_mappers/mask_former_semantic_dataset_mapper.py", line 182, in __call__
    instances.gt_boxes = masks.get_bounding_boxes()
AttributeError: 'list' object has no attribute 'get_bounding_boxes'

The lines of numbers are printed by print(len(masks)), placed just before instances.gt_boxes = masks.get_bounding_boxes().
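The traceback says that `masks` is still a plain Python list when `get_bounding_boxes()` is called. That would happen if the conversion to a mask container is skipped for some samples, for example when the list is empty; this is an assumption, and the log does not confirm it. Below is a minimal sketch of a guard, with nested lists standing in for tensors. `mask_to_box` and `boxes_from_masks` are hypothetical helpers, and the box convention (exclusive max corner, all zeros for an empty mask) is also an assumption.

```python
def mask_to_box(mask):
    """Return (x0, y0, x1, y1) enclosing the nonzero pixels, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def boxes_from_masks(masks):
    """Compute one box per mask; an empty list yields an empty list of boxes."""
    if len(masks) == 0:
        # Nothing to box: return early instead of calling a method on a list.
        return []
    boxes = []
    for mask in masks:
        box = mask_to_box(mask)
        boxes.append(box if box is not None else (0, 0, 0, 0))
    return boxes


example = [[0, 0, 0],
           [0, 1, 1],
           [0, 0, 0]]
print(boxes_from_masks([example]))
print(boxes_from_masks([]))
```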

Question about the segmentation branch

Hi,

Nice work. I see that you use the highest resolution backbone feature map and encoder feature map to generate the pixel embedding map. Did you try including other feature maps with lower resolution (backbone or encoder) and find any increase in performance?

Thanks,
Owen

MaskDINO/maskdino/modeling/pixel_decoder/maskdino_encoder.py

Lines 415 to 428 in 76c8e45

    
    for idx, f in enumerate(self.in_features[:self.num_fpn_levels][::-1]):
        x = features[f].float()
        lateral_conv = self.lateral_convs[idx]
        output_conv = self.output_convs[idx]
        cur_fpn = lateral_conv(x)
        # Following FPN implementation, we use nearest upsampling here
        y = cur_fpn + F.interpolate(out[self.high_resolution_index], size=cur_fpn.shape[-2:], mode="bilinear", align_corners=False)
        y = output_conv(y)
        out.append(y)
    for o in out:
        if num_cur_levels < self.total_num_feature_levels:
            multi_scale_features.append(o)
            num_cur_levels += 1
    return self.mask_features(out[-1]), out[0], multi_scale_features

Question about the training for semantic segmentation.

Thanks for sharing your great work and I read your paper.

I have one question about the training for semantic segmentation on ADE20K.
For instance segmentation and panoptic segmentation, we can get GT bboxes from the masks of thing classes.
But for semantic segmentation, masks of the same class are grouped together, so we cannot use instance-level masks to get GT bboxes.
How did you train Mask DINO for semantic segmentation?
Did you treat multiple masks of the same class as a single instance?
Or did you consider all classes as stuff classes and remove the box loss and box matching?
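The first possibility raised above, treating all pixels of a class as one instance, can be sketched as follows. This only illustrates the idea and does not say how Mask DINO was actually trained; `split_by_class`, `union_box` and the ignore label 255 are assumptions.

```python
def split_by_class(label_map, ignore_label=255):
    """Split a semantic label map into one binary mask per class present."""
    classes = sorted({v for row in label_map for v in row if v != ignore_label})
    return {
        c: [[1 if v == c else 0 for v in row] for row in label_map]
        for c in classes
    }


def union_box(mask):
    """Box (x0, y0, x1, y1) around every pixel of the class, even if disconnected."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Class 1 appears in two disconnected regions; it still yields a single box.
label_map = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [2, 2, 255, 0],
]
per_class = split_by_class(label_map)
boxes = {c: union_box(m) for c, m in per_class.items()}
print(boxes)
```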

Tentative date for code release

Hi authors,

Thanks for your wonderful line of work and for the most recent MaskDINO work.
Would you be so kind as to provide an estimate of when you will release the code?

Regards
Harkirat

Models cannot train with negative examples

I was trying to train using images with no annotations by setting the filter empty annotations flag to false. However the code crashes out in the coco parsing and later down the line when gt_masks is referenced. When using negative examples, the instance object won’t have gt_masks.

training loss starts around 2e4

I tried to train the Swin-L model on COCO, with the Swin backbone initialized from pretrained weights. The initial loss starts around 2e4. The loss printout looks like this:

iter: 19  total_loss: 2.163e+04  loss_ce: 1853  loss_mask: 4.578  loss_dice: 4.78  loss_bbox: 0.05134  loss_giou: 0.03028  loss_ce_dn: 97.37  loss_mask_dn: 4.537  loss_dice_dn: 4.779  loss_bbox_dn: 0.002613  loss_giou_dn: 0.01703  loss_ce_0: 2027  loss_mask_0: 3.692  loss_dice_0: 4.75  loss_bbox_0: 0.07883  loss_giou_0: 0.0301  loss_ce_dn_0: 93.84  loss_mask_dn_0: 4.338  loss_dice_dn_0: 4.748  loss_bbox_dn_0: 0.002613  loss_giou_dn_0: 0.01703  loss_ce_1: 2074  loss_mask_1: 3.531  loss_dice_1: 4.745  loss_bbox_1: 0.05333  loss_giou_1: 0.03028  loss_ce_dn_1: 91.22  loss_mask_dn_1: 3.887  loss_dice_dn_1: 4.74  loss_bbox_dn_1: 0.002613  loss_giou_dn_1: 0.01703  loss_ce_2: 1948  loss_mask_2: 3.681  loss_dice_2: 4.743  loss_bbox_2: 0.05584  loss_giou_2: 0.0301  loss_ce_dn_2: 92.22  loss_mask_dn_2: 4.061  loss_dice_dn_2: 4.766  loss_bbox_dn_2: 0.002613  loss_giou_dn_2: 0.01703  loss_ce_3: 1774  loss_mask_3: 4.15  loss_dice_3: 4.762  loss_bbox_3: 0.05436  loss_giou_3: 0.03028  loss_ce_dn_3: 86.42  loss_mask_dn_3: 4.324  loss_dice_dn_3: 4.754  loss_bbox_dn_3: 0.002613  loss_giou_dn_3: 0.01703  loss_ce_4: 1770  loss_mask_4: 4.513  loss_dice_4: 4.739  loss_bbox_4: 0.05335  loss_giou_4: 0.03519  loss_ce_dn_4: 89.65  loss_mask_dn_4: 4.792  loss_dice_dn_4: 4.726  loss_bbox_dn_4: 0.002613  loss_giou_dn_4: 0.01703  loss_ce_5: 1838  loss_mask_5: 5.272  loss_dice_5: 4.718  loss_bbox_5: 0.05066  loss_giou_5: 0.03384  loss_ce_dn_5: 96.19  loss_mask_dn_5: 5.856  loss_dice_dn_5: 4.696  loss_bbox_dn_5: 0.002613  loss_giou_dn_5: 0.01703  loss_ce_6: 1814  loss_mask_6: 4.328  loss_dice_6: 4.731  loss_bbox_6: 0.04711  loss_giou_6: 0.03028  loss_ce_dn_6: 95.16  loss_mask_dn_6: 4.563  loss_dice_dn_6: 4.727  loss_bbox_dn_6: 0.002613  loss_giou_dn_6: 0.01703  loss_ce_7: 1785  loss_mask_7: 4.494  loss_dice_7: 4.75  loss_bbox_7: 0.0455  loss_giou_7: 0.03191  loss_ce_dn_7: 97.4  loss_mask_dn_7: 4.624  loss_dice_dn_7: 4.761  loss_bbox_dn_7: 0.002613  loss_giou_dn_7: 0.01703  loss_ce_8: 1910  
loss_mask_8: 4.013  loss_dice_8: 4.792  loss_bbox_8: 0.04841  loss_giou_8: 0.03385  loss_ce_dn_8: 102.4  loss_mask_dn_8: 4.038  loss_dice_dn_8: 4.791  loss_bbox_dn_8: 0.002613  loss_giou_dn_8: 0.01703  loss_ce_interm: 2027  loss_mask_interm: 3.694  loss_dice_interm: 4.751  loss_bbox_interm: 0.07883  loss_giou_interm: 0.0301  time: 1.6347  data_time: 0.4793  lr: 0.0001  max_mem: 14792M

When the network is initialized from your pretrained weights, the initial loss is around 40. Does this mean that the weight initialization is not done properly?
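In the printout above, the loss_ce terms are each on the order of 1.8e3 to 2e3 per decoder layer, while the mask, dice and box terms stay small. Grouping the logged terms by family makes that easy to check on any log line. The sketch below is plain Python; `sum_by_family` is a hypothetical helper and the sample line uses made-up numbers.

```python
import re
from collections import defaultdict

# Matches entries such as "loss_ce: 1853" or "loss_mask_dn_3: 4.324".
# "total_loss" is not matched because it does not contain "loss_".
LOSS_PATTERN = re.compile(r"(loss_\w+):\s*([0-9][0-9.eE+-]*)")


def sum_by_family(log_line):
    """Sum logged loss terms by family (ce, mask, dice, bbox, giou)."""
    totals = defaultdict(float)
    for name, value in LOSS_PATTERN.findall(log_line):
        family = name.split("_")[1]
        totals[family] += float(value)
    return dict(totals)


# Made-up sample in the same layout as a detectron2 log line.
sample = (
    "iter: 19  total_loss: 30  loss_ce: 10  loss_mask: 2.5  "
    "loss_ce_dn: 5  loss_ce_0: 10  loss_mask_0: 2.5"
)
totals = sum_by_family(sample)
print(totals)
```

If one family dominates from the first iterations, the head that produces it is a natural place to look at initialization.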

ImportError: cannot import name 'add_maskformer2_config' from 'maskdino'

I followed this for setup -

conda create --name maskdino python=3.8 -y
conda activate maskdino
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
pip install -U opencv-python

git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git

cd ..
git clone git@github.com:facebookresearch/MaskDINO.git
cd MaskDINO
pip install -r requirements.txt
cd maskdino/modeling/pixel_decoder/ops
sh make.sh

I am getting this error while I am trying to run train_net.py

update maskdino init method

The following lines need to be removed from __init__.py in the maskdino/ folder:
from .data.dataset_mappers.mask_former_instance_dataset_mapper import ( MaskFormerInstanceDatasetMapper, )
from .data.dataset_mappers.mask_former_panoptic_dataset_mapper import ( MaskFormerPanopticDatasetMapper, )
These modules were removed in the latest commits.

How to do inference on single gpu, getting cuda oom error.

When running inference on one image using detectron2's default predictor, I get a CUDA OOM error on a single 15 GB T4 GPU. Any help?

Instance Segmentation training on a custom dataset (some questions)

del

TypeError: __init__() got an unexpected keyword argument 'dtype'

Hello, thanks for your great work.

When I run the code, I encounter the following errors.
Can you provide any helpful suggestions? Thanks a lot in advance.

[01/26 23:47:40 d2.engine.train_loop]: Starting training from iteration 0
/home/zoloz/.conda/envs/maskdino/lib/python3.8/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
ERROR [01/26 23:47:45 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/16T-2/zitong/code/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/16T-2/zitong/code/detectron2/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/16T-2/zitong/code/detectron2/detectron2/engine/train_loop.py", line 421, in run_step
    with autocast(dtype=self.precision):
TypeError: __init__() got an unexpected keyword argument 'dtype'
[01/26 23:47:45 d2.engine.hooks]: Total training time: 0:00:05 (0:00:00 on hooks)
[01/26 23:47:45 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 411M
Traceback (most recent call last):
  File "train_net.py", line 377, in <module>
    launch(
  File "/16T-2/zitong/code/detectron2/detectron2/engine/launch.py", line 69, in launch
    mp.start_processes(
  File "/home/zoloz/.conda/envs/maskdino/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/zoloz/.conda/envs/maskdino/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 4 terminated with the following error:
Traceback (most recent call last):
  File "/home/zoloz/.conda/envs/maskdino/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/16T-2/zitong/code/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
    main_func(*args)
  File "/16T-2/zitong/code/MaskDINO/train_net.py", line 364, in main
    return trainer.train()
  File "/16T-2/zitong/code/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/16T-2/zitong/code/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/16T-2/zitong/code/detectron2/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/16T-2/zitong/code/detectron2/detectron2/engine/train_loop.py", line 421, in run_step
    with autocast(dtype=self.precision):
TypeError: __init__() got an unexpected keyword argument 'dtype'
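The traceback shows detectron2 calling `autocast(dtype=...)`. As far as I know, the `dtype` argument of `torch.cuda.amp.autocast` only appeared in PyTorch 1.10, so an older PyTorch paired with a recent detectron2 checkout would produce exactly this error. Treat that as an assumption to verify against the installed versions. A small check could look like this, where `supports_autocast_dtype` is a hypothetical helper and the 1.10 threshold is the assumption just stated:

```python
import re


def parse_version(version):
    """Turn '1.9.0+cu111' into (1, 9, 0), ignoring local/build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def supports_autocast_dtype(torch_version, minimum=(1, 10, 0)):
    """True if the given PyTorch version is at least the assumed minimum."""
    return parse_version(torch_version) >= minimum


for candidate in ("1.9.0+cu111", "1.10.0", "2.0.1+cu118"):
    print(candidate, supports_autocast_dtype(candidate))
```

In a real environment the argument would be `torch.__version__`.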

R50 panoptic eval

Hi, I found some problems when evaluating the R50 panoptic model (53.0) after loading the checkpoint downloaded from your GitHub.

I think the problem is a mismatch between the checkpoint and the model.

Please check it! Many thanks!

IndexError: tuple index out of range

  File "train_net.py", line 378, in <module>
    launch(
  File "/mnt/md0/rabi/transformer_model/detectron2/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 363, in main
    trainer = Trainer(cfg)
  File "train_net.py", line 82, in __init__
    model = self.build_model(cfg)
  File "/mnt/md0/rabi/transformer_model/detectron2/detectron2/engine/defaults.py", line 514, in build_model
    model = build_model(cfg)
  File "/mnt/md0/rabi/transformer_model/detectron2/detectron2/modeling/meta_arch/build.py", line 22, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/mnt/md0/rabi/transformer_model/detectron2/detectron2/config/config.py", line 189, in wrapped
    explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
  File "/mnt/md0/rabi/transformer_model/detectron2/detectron2/config/config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/mnt/md0/rabi/transformer_model/MaskDINO/maskdino/maskdino.py", line 197, in from_config
    "metadata": MetadataCatalog.get(cfg.DATASETS.TRAIN[0]),
IndexError: tuple index out of range
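The failing expression is `cfg.DATASETS.TRAIN[0]`, so the error is what an empty `DATASETS.TRAIN` tuple would produce. A guard with a clearer message might look like the sketch below; `first_train_dataset` is a hypothetical helper, not something in the repository.

```python
def first_train_dataset(train_datasets):
    """Return the first training dataset name, or raise a clearer error."""
    if len(train_datasets) == 0:
        raise ValueError(
            "DATASETS.TRAIN is empty; set at least one registered dataset "
            "name in the config before building the model"
        )
    return train_datasets[0]


print(first_train_dataset(("coco_2017_train",)))

try:
    first_train_dataset(())
except ValueError as exc:
    print("config problem:", exc)
```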

wrong spelling

There is a misspelling of "result" in the Markdown file :)

Error when trying to train with my own custom dataset in COCO format

FileNotFoundError: [Errno 2] No such file or directory: 'MaskDINO/DETECTRON2_DATASETS/coco/coco/annotations/instances_train2017.json'
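As far as I know, detectron2's built-in dataset registration reads the dataset root from the `DETECTRON2_DATASETS` environment variable (defaulting to `datasets`) and appends `coco/annotations/instances_train2017.json` to it. The doubled `coco/coco` in the error would then be consistent with the variable pointing one directory too deep. That is a guess from the path alone. The sketch below mimics the path construction; `coco_train_annotations` is a hypothetical helper, and `posixpath` is used only so the output is the same on every platform.

```python
import posixpath


def coco_train_annotations(environ, default_root="datasets"):
    """Build the annotation path the way the built-in COCO registration is assumed to."""
    root = posixpath.expanduser(environ.get("DETECTRON2_DATASETS", default_root))
    return posixpath.join(root, "coco/annotations/instances_train2017.json")


# Root already ends in "coco": the "coco" component is doubled.
too_deep = coco_train_annotations(
    {"DETECTRON2_DATASETS": "MaskDINO/DETECTRON2_DATASETS/coco"}
)
# Root is the parent of the "coco" directory: the path is as intended.
expected = coco_train_annotations(
    {"DETECTRON2_DATASETS": "MaskDINO/DETECTRON2_DATASETS"}
)
print(too_deep)
print(expected)
```

For a custom dataset, registering it under its own name with explicit paths avoids depending on this built-in layout.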

How can I set some class weights for semantic segmentation?

I have fine-tuned a MaskDINO model on a semantic segmentation dataset that has four classes: other, person, car, and road. It performs poorly on person and car, while the other two classes are fine. I would therefore like to increase the loss penalty for the person and car classes, but how? I read some of the SetCriterion source code, but I am not familiar with some of the loss functions. I appreciate your help!
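For reference, the generic idea of per-class weighting in a cross-entropy loss is sketched below in plain Python. This is not how SetCriterion computes its classification loss; it only shows what a class weight does. The class order (other, person, car, road) and the weights are hypothetical.

```python
import math


def weighted_cross_entropy(logits, target, class_weights):
    """Cross entropy for one pixel, scaled by the weight of its target class."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return class_weights[target] * (log_sum_exp - logits[target])


def mean_weighted_loss(batch_logits, targets, class_weights):
    """Weighted mean over pixels, normalised by the sum of the applied weights."""
    losses = [
        weighted_cross_entropy(logits, target, class_weights)
        for logits, target in zip(batch_logits, targets)
    ]
    norm = sum(class_weights[target] for target in targets)
    return sum(losses) / norm


# Hypothetical weights: person (1) and car (2) count twice as much.
weights = [1.0, 2.0, 2.0, 1.0]
uniform = [0.0, 0.0, 0.0, 0.0]
print(weighted_cross_entropy(uniform, 1, weights))  # person pixel
print(weighted_cross_entropy(uniform, 3, weights))  # road pixel
```

With equal logits, a person pixel contributes twice the loss of a road pixel, which is the effect the question asks for.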

How to visualize the predicted mask

Hi,
I read the article about visualizing the mask and have one question: how can I visualize the mask predicted by an earlier decoder layer (not the last decoder layer) in Mask2Former?
Any suggestion would be helpful!

When will you publish your code?

I read your article and really appreciated it!
When will you publish its code on Github?

Dataset has been registered but training doesn't start?

The error output shows that the dataset is registered, so why does training not start?

The implementation of Hybrid Matching

Thanks for the great work.

I have read the paper and I think it does not give enough detail about the implementation of hybrid matching. Or if it does, I could not understand it. Could you elaborate on it, or will it be analyzed in detail in the second version of the paper?

Thanks in advance
Esat

About implementation details

Thanks for your impressive work first!
I'd like to dive into some details:

We predict both boxes and masks in the encoder and select the top-ranked ones to initialize decoder queries.

As far as I know, the feature sequence is lengthy (128x128+64x64+32x32+16x16 for 4-scale ). If each feature vector predicts its mask prediction (with reso. 256x256), it is very likely to consume too much GPU memory. Could you please elaborate on the implementation?
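The memory concern can be made concrete with a little arithmetic. The sketch below assumes float32 masks, a single image, and ignores activations and gradients. The top-K value of 300 is a hypothetical number chosen only to show the contrast with predicting a mask for every token.

```python
def num_tokens(side_lengths):
    """Total number of feature vectors across square multi-scale maps."""
    return sum(side * side for side in side_lengths)


def mask_bytes(num_queries, mask_side=256, bytes_per_value=4):
    """Bytes needed to hold one mask_side x mask_side mask per query."""
    return num_queries * mask_side * mask_side * bytes_per_value


tokens = num_tokens([128, 64, 32, 16])
dense = mask_bytes(tokens)   # one mask per encoder token
top_k = mask_bytes(300)      # masks only for a hypothetical top-300 selection

print(tokens)                # 21760
print(dense / 2**30)         # 5.3125 (GiB)
print(top_k / 2**20)         # 75.0 (MiB)
```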

initialize both the content and anchor box queries in Mask DINO

Does it behave like 2-stage Deformable DETR ( q, q_pos = split( LN(Linear(PE(bbox))), edims ) ) ? Or something else?

Semantic segmentation with Swin-L backbone

Will there be pre-trained models available for semantic segmentation with Swin-L backbone?

Thanks!

maskdino directory has a duplicate in maskdino/maskdino

Is it a mistake?

Can this model be trained from instance segmentation COCO format?

OneFormer requires a custom dataset to be in panoptic COCO format, which differs from the things-detection COCO format.
Can this model be trained from instance segmentation COCO format, or will I have to convert the things-detection COCO format to panoptic?

TTA Codes for Instance segmentation and object detection

Nice work! Would you be kind enough to share the TTA Codes for Instance segmentation and object detection? Thanks in advance.

EVAL_PERIOD

When I changed EVAL_PERIOD from its default value of 5000 to 1000, evaluation ran every 1000 iterations, but the checkpoint is still saved only every 5000 iterations. It looks like this is hard-coded.

Can anyone please check?
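
In detectron2, evaluation and checkpointing are driven by two separate config keys: TEST.EVAL_PERIOD and SOLVER.CHECKPOINT_PERIOD, and the latter defaults to 5000. Assuming train_net.py relies on the default detectron2 hooks, changing only the first key would produce exactly the behavior described. The plain-Python sketch below simulates the two periods; `hook_iterations` and the run length are hypothetical.

```python
def hook_iterations(max_iter, period):
    """Iterations (1-based) at which a hook with this period would fire."""
    return [it for it in range(1, max_iter + 1) if it % period == 0]

max_iter = 10000            # hypothetical run length
eval_period = 1000          # TEST.EVAL_PERIOD after the change
checkpoint_period = 5000    # SOLVER.CHECKPOINT_PERIOD left at its default

evals = hook_iterations(max_iter, eval_period)
checkpoints = hook_iterations(max_iter, checkpoint_period)

print("evaluate at:  ", evals)
print("checkpoint at:", checkpoints)
```

If that assumption holds, appending `SOLVER.CHECKPOINT_PERIOD 1000` to the training command (next to the EVAL_PERIOD override) should align the two.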

How to export maskdino into onnx format?

Hi Team, after initial evaluation I am planning to export the model to ONNX format using the detectron2 deploy tools.

But I am running into the following error on the build_model(cfg) call.
KeyError: "No object named 'MaskDINO' found in 'META_ARCH' registry!"

I made the following changes in the setup_cfg() method of export_model.py

def setup_cfg(args):
    cfg = get_cfg()
    # cuda context is initialized before creating dataloader, so we don't fork anymore
    cfg.DATALOADER.NUM_WORKERS = 0
    add_pointrend_config(cfg)
    add_deeplab_config(cfg)                            # <---
    add_maskdino_config(cfg)                         # <---
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    return cfg

Is there any way I can resolve this? Please guide me here.
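
The error message suggests the model class was never registered, not that the config is wrong. Registry entries are typically added by a decorator that runs only when the module defining the class is imported, so adding config keys alone would not populate it. The toy registry below (a stand-in written for this note, not the fvcore class) reproduces that behavior. Whether importing the maskdino package in export_model.py, as train_net.py does, is enough for the export to succeed is an assumption.

```python
class Registry:
    """Toy name -> class registry, written for illustration only."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self):
        def deco(cls):
            self._obj_map[cls.__name__] = cls
            return cls
        return deco

    def get(self, name):
        if name not in self._obj_map:
            raise KeyError(
                f"No object named '{name}' found in '{self._name}' registry!"
            )
        return self._obj_map[name]

META_ARCH = Registry("META_ARCH")

# Before the defining module is imported, the lookup fails.
try:
    META_ARCH.get("MaskDINO")
    found_before = True
except KeyError:
    found_before = False

# Importing the module executes the decorator, which registers the class.
@META_ARCH.register()
class MaskDINO:
    pass

found_after = META_ARCH.get("MaskDINO") is MaskDINO
print(found_before, found_after)
```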

Training on custom panoptic COCO data raises KeyError with "stuff_dataset_id_to_contiguous_id"

Thank you for your great work.

After checking that training with the coco2017 dataset works well,
I tried to train on custom panoptic COCO data.
I made a custom dataset and symlinked it in place of the coco dataset.

But it fails with the following message.

Traceback (most recent call last):
  File "train_net.py", line 380, in <module>
    launch(
  File "/usr/local/lib/python3.8/dist-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 364, in main
    trainer = Trainer(cfg)
  File "train_net.py", line 83, in __init__
    data_loader = self.build_train_loader(cfg)
  File "train_net.py", line 221, in build_train_loader
    return build_detection_train_loader(cfg, mapper=mapper)
  File "/usr/local/lib/python3.8/dist-packages/detectron2/config/config.py", line 207, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/detectron2/config/config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/build.py", line 344, in _train_loader_from_config
    dataset = get_detection_dataset_dicts(
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/build.py", line 241, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/build.py", line 241, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/catalog.py", line 58, in get
    return f()
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/datasets/coco_panoptic.py", line 88, in <lambda>
    lambda: load_coco_panoptic_json(panoptic_json, image_root, panoptic_root, metadata),
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/datasets/coco_panoptic.py", line 51, in load_coco_panoptic_json
    segments_info = [_convert_category_id(x, meta) for x in ann["segments_info"]]
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/datasets/coco_panoptic.py", line 51, in <listcomp>
    segments_info = [_convert_category_id(x, meta) for x in ann["segments_info"]]
  File "/usr/local/lib/python3.8/dist-packages/detectron2/data/datasets/coco_panoptic.py", line 33, in _convert_category_id
    segment_info["category_id"] = meta["stuff_dataset_id_to_contiguous_id"][
KeyError: 12

I can't find where to fix this.
Could somebody please help me?

Thank you
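
The KeyError: 12 in the traceback means a segment carries category_id 12, and that id is missing from the metadata's id mapping. Since the custom data was linked in place of COCO, the built-in COCO metadata is presumably still in use (an assumption), and the standard COCO category list has no id 12. A quick check like the one below lists the ids used by the annotations that the metadata does not know. The helper name and the sample data are hypothetical.

```python
def missing_category_ids(panoptic_json, known_ids):
    """Category ids used in segments_info but absent from the metadata."""
    used = {
        seg["category_id"]
        for ann in panoptic_json["annotations"]
        for seg in ann["segments_info"]
    }
    return sorted(used - set(known_ids))

# Hypothetical miniature annotation file and metadata id list.
panoptic_json = {
    "annotations": [
        {"segments_info": [{"id": 1, "category_id": 1},
                           {"id": 2, "category_id": 12}]},
    ]
}
known_category_ids = [1, 2, 3]

missing = missing_category_ids(panoptic_json, known_category_ids)
print(missing)
```

Any id reported here needs either a matching category in the registered metadata or a remapping in the annotation file.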

About mask head on eq 1

About eq (1) below, where the paper says that "M is the segmentation head": does it mean there is an extra layer operating over the 2D map T(Cb) + F(Ce)? If so, what kind of layer is it? Intuitively, I would expect an MLP layer over qc instead.

m = qc ⊗ M(T(Cb) + F(Ce))

Thank you!
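
Read literally, eq (1) applies M to the fused 2D feature map to obtain a per-pixel embedding map, and then takes the dot product of each content query qc with every pixel embedding. The plain-Python sketch below shows only that last step. What kind of layer M is remains the open question of this issue, so the sketch assumes M has already been applied and uses made-up numbers.

```python
def mask_logits(query, pixel_embed):
    """Dot product of one query (C values) with a C x H x W embedding map.

    Returns an H x W grid of mask logits.
    """
    channels = len(query)
    height = len(pixel_embed[0])
    width = len(pixel_embed[0][0])
    return [
        [
            sum(query[c] * pixel_embed[c][y][x] for c in range(channels))
            for x in range(width)
        ]
        for y in range(height)
    ]

# Made-up example: C = 2 channels, 2 x 2 map (stands in for M(T(Cb) + F(Ce))).
pixel_embed = [
    [[1.0, 0.0], [0.0, 1.0]],   # channel 0
    [[0.0, 1.0], [1.0, 0.0]],   # channel 1
]
qc = [1.0, -1.0]

m = mask_logits(qc, pixel_embed)
print(m)
```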

RuntimeError: Global alloc not supported yet

I tried running Mask DINO recently with the latest code and a new environment. I ran into a RuntimeError: Global alloc not supported yet when using the batch_dice_loss_jit function on line 166 in matcher.py.
This error appears to go away if I just switch to using the regular batch_dice_loss instead.
My new environment is using pytorch 1.10, detectron2 0.6, and cuda 11.2.

I saw that in your detrex repository, you have already adjusted the code to no longer use jit.
IDEA-Research/detrex#161

Is the detrex repository replacing/being updated more often than this repository? Should I switch to using that one instead?
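
For readers unfamiliar with the function involved: as far as I understand it, the matcher scores every predicted mask against every target mask with a Dice cost, and the jit and non-jit variants are meant to compute the same values. The plain-Python version below illustrates that pairwise computation. It is a sketch and not the code in matcher.py, and the "+1" smoothing term is an assumption.

```python
import math

def pairwise_dice_loss(pred_logits, targets):
    """Dice loss between every prediction row and every target row.

    pred_logits: N rows of raw scores; targets: M rows of 0/1 labels.
    Returns an N x M grid of losses.
    """
    probs = [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in pred_logits]
    losses = []
    for p in probs:
        row = []
        for t in targets:
            numerator = 2.0 * sum(pi * ti for pi, ti in zip(p, t))
            denominator = sum(p) + sum(t)
            row.append(1.0 - (numerator + 1.0) / (denominator + 1.0))
        losses.append(row)
    return losses

# One confident prediction compared with a matching and a non-matching target.
pred_logits = [[20.0, -20.0]]
targets = [[1, 0], [0, 1]]
cost = pairwise_dice_loss(pred_logits, targets)
print(cost)
```

The matching target yields a cost near 0 and the non-matching one a cost near 2/3, which is the signal the Hungarian matching relies on.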

Training on custom instance segmentation dataset

I am trying to train MaskDINO with an R50 backbone on my custom instance segmentation dataset, which has 3 classes.

My data is in COCO format -

-datasets
    -coco
        -images
            aaaa.jpg
            bbb.jpg
            ...
            zzz.jpg
        -annotations
            -instances_train2017.json
            -instances_val2017.json

I am using the following command -
python3 train_net.py --num-gpus 1 --config-file configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml MODEL.WEIGHTS pretrained_models/maskdino_r50_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask46.3ap_box51.7ap.pth

I made following changes -

I have updated NUM_CLASSES in configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml to 3

After this change I ran training but encountered the following error -

AssertionError: Attribute 'thing_classes' in the metadata of 'coco_2017_train' cannot be set to a different value!
['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'] != ['Facet', 'Wall', 'Extension']

Updated COCO_CATEGORIES in maskdino/data/datasets/register_coco_stuff_10k.py

COCO_CATEGORIES = [
    {"color": [220, 20, 60], "isthing": 1, "id": 1, "name": "class1"},
    {"color": [119, 11, 32], "isthing": 1, "id": 2, "name": "class2"},
    {"color": [0, 0, 142], "isthing": 1, "id": 3, "name": "class3"},
]

After both the changes, I got the following error -

File "/home/ubuntu/abhishek/maskdino/maskdino/data/datasets/register_coco_stuff_10k.py", line 192, in _get_coco_stuff_meta
    assert len(stuff_ids) == 171, len(stuff_ids)
AssertionError: 3

Can anyone guide me on training with my custom data? What changes do I need to make for this to work?
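
The first assertion suggests the custom annotations were loaded under the built-in name coco_2017_train, whose metadata already lists the 80 COCO classes. The second comes from editing the COCO-Stuff-10k registration, which is a different dataset. A common detectron2 approach is to register the data under a new name and point DATASETS.TRAIN at it. The sketch below only derives the class list from a COCO-style JSON; the category ids and the dataset name and paths in the comment are hypothetical.

```python
def classes_from_coco(coco):
    """Class names from a COCO-style dict, ordered by category id."""
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return [c["name"] for c in categories]

# Hypothetical 'categories' section of a custom instances_*.json file.
coco = {
    "categories": [
        {"id": 2, "name": "Wall"},
        {"id": 1, "name": "Facet"},
        {"id": 3, "name": "Extension"},
    ]
}

thing_classes = classes_from_coco(coco)
num_classes = len(thing_classes)   # value to use for NUM_CLASSES
print(thing_classes, num_classes)

# With detectron2 installed, registration under a new name looks like this
# (name and paths are placeholders):
#   from detectron2.data.datasets import register_coco_instances
#   register_coco_instances("my_train", {}, "path/to/train.json", "path/to/images")
```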

point loss in mask loss

Hi @FengLi-ust , thanks for your great works!

In the appendix, you stated that "We also follow [17, 5, 4] to use point loss in mask loss for efficiency." There are some details that I cannot figure out:

How are the points sampled: with (a) a regular grid, (b) uniform, (c) mildly biased, or (d) heavily biased sampling? How many points are sampled for each query?
The dot product in eq (1) will introduce lots of computation and memory cost. Did you sample the point features from M(T(C_b)+F(C_e)) for each query before performing the dot product? While sampling, which kind of boxes are used, GT boxes or anchor boxes?
Is there an obvious APm gap between the point (mask) loss and the mask loss without sampling?

Could you guide me on how to train this model on our own dataset in COCO format? A step-by-step guide would be very helpful

Could anyone guide me on how to train this model on our own dataset, which is in COCO format? A step-by-step guide would be very helpful.
Thank you

Require explanation regarding eval-only mode

I am not able to understand why the same thing happens twice in this part of the code: https://github.com/IDEA-Research/MaskDINO/blob/main/train_net.py#L348-L354

MaskDINO support for CPU-only inference

Hi,

MaskDINO uses the Deformable DETR module (https://github.com/fundamentalvision/Deformable-DETR), which wasn't available in a CPU-only form until recently. But there is now a new open-source implementation at Hugging Face which supports CPU (fundamentalvision/Deformable-DETR#160). Are the authors of this paper or anyone else planning to build CPU-only inference support for MaskDINO?
Also, the install docs of MaskDINO mention that in the absence of a GPU, this command can be used for setup -

To build on a system that does not have a GPU device but provide the drivers:
TORCH_CUDA_ARCH_LIST='8.0' FORCE_CUDA=1 python setup.py build install

https://github.com/IDEA-Research/MaskDINO/blob/main/INSTALL.md#:~:text=TORCH_CUDA_ARCH_LIST%3D%278.0%27%20FORCE_CUDA%3D1%20python%20setup.py%20build%20install
I am a bit confused about this part: without a GPU, how would one set up the drivers (CUDA) on a system?

I was able to train the model on a GPU machine, but I don't have access to a GPU now and want to run inference on a CPU machine. If anyone can help with that, or confirm that CPU-only inference is not possible for MaskDINO as of now, it would be very helpful. Thanks!

MaskDINO 12 epoch results with 100 queries

Hi, @FengLi-ust. I'm very interested in your great work!

According to your paper, the method still works well in the 100-query setting. Did you also evaluate it with a 1x schedule on COCO?

Thank you again for your elaborate work and I look forward to hearing from you.

questions about Mask DINO for semantic segmentation

I have seen Issue #1 and gone through the codebases of MaskFormer and Mask2Former.
I'm interested in Mask DINO for semantic segmentation too, and have some questions.

Is the masked attention of Mask2Former kept in Mask DINO?
Is the semantic map converted into instance masks and bboxes during preprocessing, with the loss calculated on the converted annotations? For example, for a semantic map F:{0,1,2,3,4} with 5 classes, the map is converted into 5 binary masks Fi:{0,1} (i: 0~4) and bboxes Bi. Hence, loss_cls, loss_bbox, and loss_mask are calculated with the three converted annotations. Is this right?
I think the biggest difference between the Mask DINO model and the Mask2Former model is that the decoder of DINO requires anchor boxes generated by the encoder and content query embeddings as input, while the decoder of Mask2Former is like that of the original DETR, which requires learnable embeddings for both content and spatial queries. Although there are no clear bboxes for semantic segmentation, the encoder of DINO still generates a set of anchor boxes for the decoder and iteratively refines them in the decoder. Hence, there are bboxes in the output, even though the task is semantic segmentation. Is that right?
I'm curious about the query selection and query denoising schemes for masks. Once the category, bbox, and binary mask converted from a semantic map are obtained, the class and bbox are noised as in DINO. How is the mask noised to ensure the behavioral consistency of the denoising and matching parts? And how is the mask fed into the decoder? What about the matching part?

My questions are rather long; thanks to the authors.
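
The conversion described in the second question can be sketched in plain Python as follows: one binary mask per class present in the semantic map, plus the tight box around it. Whether the repository preprocesses semantic annotations in exactly this way is for the authors to confirm. The ignore label 255 and the (x0, y0, x1, y1) box convention with exclusive upper bounds are assumptions.

```python
def masks_and_boxes(sem_map, ignore_label=255):
    """Split a semantic label map into per-class binary masks and boxes.

    Returns {class_id: (mask, (x0, y0, x1, y1))}, upper bounds exclusive.
    """
    classes = sorted({v for row in sem_map for v in row if v != ignore_label})
    result = {}
    for c in classes:
        mask = [[1 if v == c else 0 for v in row] for row in sem_map]
        ys = [y for y, row in enumerate(mask) if any(row)]
        xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
        box = (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
        result[c] = (mask, box)
    return result

# Made-up 3 x 3 semantic map with three classes.
sem_map = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
converted = masks_and_boxes(sem_map)
for class_id, (mask, box) in converted.items():
    print(class_id, box)
```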

To filter output results by confidence score

Hi Team,

I am trying to create a demo for MaskDINO; see https://github.com/MeAmarP/MaskDINO/tree/quickfix/infer_demo

If I am correct, MaskDINO currently gives 100 predictions per frame, which sometimes clutters each detected object with overlaid info when visualizing with demo.py.

It would be great if we could filter using the CLI arg config-score. But I am unable to trace where in the code the predictions can be filtered.

I need a little help/guidance here.
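
Independently of where the demo applies it, the filtering itself is a one-line operation: keep only predictions whose score clears a threshold. The sketch below uses plain dictionaries with made-up values. On a detectron2 Instances object the equivalent is boolean indexing, for example `instances[instances.scores > threshold]`. Where exactly to hook this into demo.py is an assumption left to the reader.

```python
def filter_by_score(predictions, threshold):
    """Keep predictions whose confidence score is at least `threshold`."""
    return [p for p in predictions if p["score"] >= threshold]

# Made-up predictions for one frame.
predictions = [
    {"label": "person", "score": 0.92},
    {"label": "car", "score": 0.48},
    {"label": "person", "score": 0.07},
]

kept = filter_by_score(predictions, threshold=0.5)
print(kept)
```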

"Already registered" error occurs when importing libraries

First of all, thank you for sharing your work.

I tried to evaluate the model to test my environment with the following command:
python3 train_net.py --eval-only --num-gpus 1 --config-file configs/coco/panoptic-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml MODEL.WEIGHTS ckpts/maskdino_r50_50ep_300q_hid2048_3sd1_panoptic_pq53.0.pth

But it fails with the message AssertionError: An object named 'MaskDINOEncoder' was already registered in 'SEM_SEG_HEADS' registry!

Has anyone faced the same problem and found a solution?

The full error message is below. Thank you.
Traceback (most recent call last):
  File "train_net.py", line 46, in <module>
    from maskdino import (
  File "/workspace/MaskDINO/maskdino/__init__.py", line 23, in <module>
    from .maskdino import MaskDINO
  File "/workspace/MaskDINO/maskdino/maskdino/__init__.py", line 3, in <module>
    from . import modeling
  File "/workspace/MaskDINO/maskdino/maskdino/modeling/__init__.py", line 3, in <module>
    from .pixel_decoder.maskdino_encoder import MaskDINOEncoder
  File "/workspace/MaskDINO/maskdino/maskdino/modeling/pixel_decoder/maskdino_encoder.py", line 189, in <module>
    class MaskDINOEncoder(nn.Module):
  File "/usr/local/lib/python3.8/dist-packages/fvcore/common/registry.py", line 59, in deco
    self._do_register(name, func_or_class)
  File "/usr/local/lib/python3.8/dist-packages/fvcore/common/registry.py", line 43, in _do_register
    assert (
AssertionError: An object named 'MaskDINOEncoder' was already registered in 'SEM_SEG_HEADS' registry!
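
The paths in the traceback show a package at maskdino/ and a nested copy at maskdino/maskdino/. If the same source file is executed under two package paths, the registration decorator runs twice and the second run trips the assertion. That reading is an assumption, though it matches the duplicated-directory report elsewhere in this list. The toy registry below (written for this note, not the fvcore class) reproduces the failure mode.

```python
class Registry:
    """Toy registry that refuses duplicate names (illustration only)."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self, cls):
        assert cls.__name__ not in self._obj_map, (
            f"An object named '{cls.__name__}' was already registered "
            f"in '{self._name}' registry!"
        )
        self._obj_map[cls.__name__] = cls
        return cls

SEM_SEG_HEADS = Registry("SEM_SEG_HEADS")

def execute_encoder_module():
    """Stands in for executing the encoder's source file once."""
    @SEM_SEG_HEADS.register
    class MaskDINOEncoder:
        pass
    return MaskDINOEncoder

execute_encoder_module()          # first import path: fine
try:
    execute_encoder_module()      # same file under a second path: fails
    second_import_ok = True
    message = ""
except AssertionError as err:
    second_import_ok = False
    message = str(err)

print(second_import_ok, message)
```

If this is what happens, removing the nested duplicate directory so that each module is imported under a single path would be the thing to try first.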

MaskDINO performs poorly on single-class object detection or instance segmentation tasks?

Table 2, model params clarification

Hi!

Table 2 shows Mask DINO's total parameter count as below. Could you please check if the numbers are correct? Especially for Mask DINO with 24 epochs. It looks like there was a typo and the number was carried over from Mask2Former.

Model	Epochs	Query type	Params
Mask2Former	50	100 queries	44M
Mask DINO (ours)	50	100 queries	50M
Mask DINO (ours)	50	300 queries	50M
Mask DINO (ours)	24	300 queries	44M
Mask2Former	12	100 queries	44M
Mask DINO (ours)	12	300 queries	50M

Thank you!

semantic segment train error: Cannot find field 'gt_boxes' in the given Instances!

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 395, in run_step
    loss_dict = self.model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk/local_ssd1/zhj/segment/MaskDINO/maskdino/maskdino.py", line 262, in forward
    targets = self.prepare_targets(gt_instances, images)
  File "/disk/local_ssd1/zhj/segment/MaskDINO/maskdino/maskdino.py", line 350, in prepare_targets
    "boxes":box_ops.box_xyxy_to_cxcywh(targets_per_image.gt_boxes.tensor)/image_size_xyxy
  File "/opt/conda/lib/python3.7/site-packages/detectron2/structures/instances.py", line 65, in getattr
    raise AttributeError("Cannot find field '{}' in the given Instances!".format(name))
AttributeError: Cannot find field 'gt_boxes' in the given Instances!

Base-ADE20K-SemanticSegmentation.yaml sets DATASET_MAPPER_NAME: "mask_former_semantic" by default, which selects MaskFormerSemanticDatasetMapper.

An input sample from that mapper has no "gt_boxes" field:
'instances': Instances(num_instances=2, image_height=640, image_width=640, fields=[gt_classes: tensor([16, 17]), gt_masks: tensor([[]])
But in maskdino.py (line 259), self.prepare_targets needs a "gt_boxes" tensor.

So should I modify maskdino.py (line 347) as follows:
"boxes":box_ops.box_xyxy_to_cxcywh(targets_per_image.gt_boxes.tensor)/image_size_xyxy --> "boxes": None
Or is there another solution?

Look forward to your reply, thank you!
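
For context on what the failing line produces: it converts each ground-truth box from corner format (x0, y0, x1, y1) to center format (cx, cy, w, h) and divides by the image size. Replacing the value with None would therefore likely break any later code that reads the boxes (an assumption). If boxes were instead derived from gt_masks in the mapper, they would need to end up in this same format. The plain-Python sketch below mirrors the conversion with made-up numbers and is not the repository's box_ops code.

```python
def box_xyxy_to_cxcywh(box):
    """(x0, y0, x1, y1) -> (center x, center y, width, height)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)

def normalize(box_cxcywh, image_width, image_height):
    """Scale a center-format box to the [0, 1] range of the image."""
    cx, cy, w, h = box_cxcywh
    return (cx / image_width, cy / image_height,
            w / image_width, h / image_height)

# Made-up box on a 640 x 640 image.
box = (160, 320, 480, 640)
target_box = normalize(box_xyxy_to_cxcywh(box), 640, 640)
print(target_box)
```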

About the LSJ and padding

Hi @FengLi-ust @SuperHenry2333 ,

Congratulations on your great work!

In your paper you mentioned that you used Large Scale Jittering (LSJ) to replace the DINO augmentation. I am also trying to implement LSJ in my project, but there are two things that are not clear:

After padding the image to 1024*1024, did you feed a key_padding_mask to Deformable DETR, or did you let it "see" the whole image with padding like Mask2Former?
Did you modify the point sampling in the matching and loss functions to restrict the sampling points to the valid (non-padded) regions?

Looking forward to your reply!

Ablation question

Thank you for your work.

In table 13, for 12-epoch settings, it is mentioned that -DINO mask branch reaches 49.5, and was 49.6 before training for segmentation. But 49.6 in Table 2 is trained for 36 epochs instead of 12? Could you clarify the setting for the -DINO mask branch setting?

point sample from mask ground truth using bilinear interpolation

In loss_mask, where the segmentation loss is computed, the mask ground truth at uncertain points is sampled using bilinear interpolation, since the sampling mode is omitted. Is this a bug?

This line:

MaskDINO/maskdino/modeling/criterion.py

Line 281 in 76c8e45

point_labels = point_sample(
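
To make the concern concrete: detectron2's point_sample wraps torch.nn.functional.grid_sample, whose default mode is bilinear, so a binary ground-truth mask sampled between pixels yields fractional labels near boundaries. Whether that is intended is for the authors to say. The plain-Python sketch below contrasts the two modes on a tiny mask. Its coordinate convention (pixel centers at integer coordinates) is a simplification and differs from grid_sample's normalized coordinates.

```python
import math

def sample_bilinear(mask, x, y):
    """Bilinear sample at continuous (x, y); pixel centers at integers."""
    height, width = len(mask), len(mask[0])
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = mask[y0][x0] * (1 - fx) + mask[y0][x1] * fx
    bottom = mask[y1][x0] * (1 - fx) + mask[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sample_nearest(mask, x, y):
    """Value of the pixel whose center is closest to (x, y)."""
    return mask[math.floor(y + 0.5)][math.floor(x + 0.5)]

# Binary mask with a vertical boundary between column 0 and column 1.
gt_mask = [
    [0, 1],
    [0, 1],
]
bilinear_label = sample_bilinear(gt_mask, 0.25, 0.0)
nearest_label = sample_nearest(gt_mask, 0.25, 0.0)
print(bilinear_label, nearest_label)
```

At this point the bilinear label is a soft value between 0 and 1, while nearest sampling keeps the label binary.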

Mask/box denoising hparams

What noise hyper-params are used in MaskDINO? Thanks!

Is there any noise added to "init anchors"?

What is "Unified denoising for mask" / "Unified DN" in figure 1? Are there any special losses for this?

Does MaskDINO also use deformable attention? Both in encoder and in decoder?

    for idx, f in enumerate(self.in_features[:self.num_fpn_levels][::-1]):
        x = features[f].float()
        lateral_conv = self.lateral_convs[idx]
        output_conv = self.output_convs[idx]
        cur_fpn = lateral_conv(x)
        # Following FPN implementation, we use nearest upsampling here
        y = cur_fpn + F.interpolate(out[self.high_resolution_index], size=cur_fpn.shape[-2:], mode="bilinear", align_corners=False)
        y = output_conv(y)
        out.append(y)
    for o in out:
        if num_cur_levels < self.total_num_feature_levels:
            multi_scale_features.append(o)
            num_cur_levels += 1
    return self.mask_features(out[-1]), out[0], multi_scale_features