The detreg from amirbar

Fine Tuning the Model on a fraction of VOC

Thank you for the great work. It looks like the parameter --filter_pct is never used in the code, which means fine-tuning effectively runs on the whole VOC/COCO dataset. Please correct me if I am wrong.

Thanks

False detection on colab demo

Hi, I followed the notebook and got only false detections. Please let me know if I am missing something.
Colab Demo result
Thank you :)

TypeError: 'numpy.float64' object cannot be interpreted as an integer

Hello, thank you for your excellent work. I hit the error shown in the following picture when I used the ImageNet-pretrained weights that you provide to fine-tune on a COCO-format dataset that I built myself. Do you know what the problem is?
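
The screenshot is not available here, so the exact call site is unknown. This message generally means that a NumPy float reached an argument that must be an integer (a count, a shape, or an index). One known instance is older evaluation code that passes `np.round(...) + 1` as the number of samples to `np.linspace`; whether that is the cause in this case is an assumption. A minimal sketch of the usual fix, where `as_int_count` is a hypothetical helper and not part of DETReg or pycocotools:

```python
import math


def as_int_count(value, tol=1e-9):
    """Convert a float-like count (e.g. numpy.float64) to a plain int.

    Hypothetical helper: raises if the value is not close to a whole number,
    so a fractional count cannot be truncated silently.
    """
    as_float = float(value)
    nearest = round(as_float)
    if not math.isclose(as_float, nearest, abs_tol=tol):
        raise ValueError(f"{value!r} is not an integral count")
    return int(nearest)


# (0.95 - 0.5) / 0.05 + 1 equals 10 only up to floating-point error.
num_thresholds = as_int_count((0.95 - 0.5) / 0.05 + 1)
steps = list(range(num_thresholds))  # range() now receives a real int
```

Casting at the call site keeps the rest of the pipeline unchanged; upgrading the package that contains the offending call is the alternative.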

Could you provide training logs for the pretrained model?

Thanks for the great work! I want to pretrain the model on IN1K. The number of classes is 92, but the only class label is 1. The cate_error loss is always 0 or 100. Could you provide training logs for the pretraining run?

Which hyperparameters are used in the fewshot setting

Hi, what batch size, lr, lr_drop, and number of epochs are used in the few-shot setting?
There are two few-shot experiments, shown in Tables 3 and 4. Please share the hyperparameters for both experiments. Many thanks!

Question about the lr_drop in DETR based experiment

It seems lr_drop is missing from the pretraining and fine-tuning scripts.
The default lr_drop for DETR is 200, but pretraining runs for only 60 epochs, so the lr_drop setting is possibly missing.
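
DETR-style training scripts typically pass lr_drop as the step size of a step scheduler that multiplies the learning rate by 0.1 at each step. Under that assumption, and with an illustrative base learning rate, a sketch of the resulting schedule shows that a 60-epoch run never reaches a drop when lr_drop is 200:

```python
def step_lr(base_lr, epoch, lr_drop, gamma=0.1):
    """Learning rate at `epoch` for a step schedule that decays every `lr_drop` epochs."""
    return base_lr * gamma ** (epoch // lr_drop)


base_lr = 2e-4  # illustrative value, not taken from the DETReg configs

# With lr_drop=200 the exponent stays 0 for all 60 epochs: no decay at all.
default_schedule = [step_lr(base_lr, e, lr_drop=200) for e in range(60)]

# With lr_drop=40 (an arbitrary example) the rate drops once, at epoch 40.
shorter_schedule = [step_lr(base_lr, e, lr_drop=40) for e in range(60)]
```

A constant learning rate for the whole run may be intentional, so this only shows what the default implies rather than what the authors used.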

few-shot detection performance on COCO 2014

hi,

Interesting work.

Following the training and evaluation pipeline of your paper, I have reproduced the fully supervised object detection results on MS COCO 2017 (Table 1). However, my few-shot results on MS COCO 2014 are significantly worse than those in your paper (30-shot, seed 0, Table 3). Specifically, starting from the ImageNet-pretrained model, I first train on the base classes (60 classes, 99k labeled images) and then fine-tune on the few-shot labeled images (80 classes, 30 instances per class). The results on the 5000 validation images (all classes, base classes, and novel classes) are reported below. All hyperparameters are set the same as in fully supervised training.

evaluation type	AP	AP75
All classes	29.9	32.7
Base classes	33.0	35.9
Novel classes	22.0	23.9
Novel classes (Table 3 in your paper)	30.0	33.7

The table above shows that the reproduced results on novel classes are significantly worse than those in your paper (8.0 AP and 9.8 AP75 lower). Could you share the base-class and all-class results of your model, or the hyperparameters of few-shot training (for example, epochs, learning rate, and so on)?

Best,
Bin-Bin ([email protected])

Selective Search boxes are recomputed for the topk policy

How to run DETReg on a custom video dataset?

Could you give me some feedback on whether this is possible, and some guidance on where the custom data loading actually happens so that I can check?

Thank you for your time

Did you pretrain DETReg from scratch in Semi-supervised Learning experiment

In the Semi-supervised Learning experiment, you said "We pretrain DETReg (Deformable DETR) for 50 epochs on MS COCO train2017 without labels".

So, did you load the IN1K-pretrained DETReg weights and then pretrain on COCO, or did you start from SwAV only and then pretrain on COCO?

evaluate pretrained model

Hi, is there any way to directly evaluate the pretrained model?

Question About Matching

Hello- first off, great work! I have really enjoyed working through this repo.
It appears that the Hungarian Matcher does not use the output and target object embeddings when determining the matching. Why is this? It appears the embeddings still contribute to the loss.
Also, I understand that there are usually more outputs than targets. The paper says you pad the targets with non-objects (I assumed these would be something like random crops from portions of the image for which selective search returned no boxes), but in the code it looks like you discard unmatched outputs as non-objects. Can you explain this a bit more? It seems to me that with no non-object targets, the DETR model would simply learn to assign an object class to every detection no matter what, and that portion of the loss would quickly collapse to zero.
Thanks again!
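
In DETR-style losses, queries that the Hungarian matcher leaves unmatched are usually supervised toward a background ("no object") target rather than dropped, which is what keeps the classification loss from collapsing. A minimal sketch of that target assignment, assuming class index `num_classes` denotes background; the function name and the numbers are illustrative and not DETReg's code:

```python
def build_class_targets(num_queries, matches, num_classes):
    """Return one class target per query.

    `matches` maps a matched query index to the label of its target box.
    Every unmatched query receives the background index `num_classes`.
    """
    targets = [num_classes] * num_queries
    for query_idx, label in matches.items():
        targets[query_idx] = label
    return targets


# 6 queries, 2 proposals matched to queries 1 and 4, a single "object" class 0.
targets = build_class_targets(num_queries=6, matches={1: 0, 4: 0}, num_classes=1)
```

Here four of the six queries are trained to predict background, so predicting "object" everywhere is penalized.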

Missing file (util/plot_utils.py)

Hi,

While creating the SS boxes, I am getting the attached error. It looks like the file util/plot_utils.py is missing. Please advise. Thanks

I used 300 pictures to fine-tune, but the result is bad!

Reproducing the Results of Table 3 & Table 4

Hi,

How can I reproduce the results of Table 3, which reports few-shot detection performance for the 20 novel categories on the COCO dataset?

Also, for Table 4 (comparison to semi-supervised detection methods), the paper says that you pretrained the network on all of the unlabeled COCO train2017 images and then fine-tuned on X% of the data. But the instructions in the README and the corresponding config files load the ImageNet100-pretrained weights for fine-tuning on COCO. Could you tell me what process to follow to reproduce the results reported in Table 4?

Thanks

RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

Hi, I'm trying to run the pretraining but I receive a mismatch size here https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328, src_features has a shape of torch.Size([228, 512]) and target_features a shape of torch.Size([228, 3, 128, 128]). Is this ok?

Start training
/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
Traceback (most recent call last):
File "main.py", line 403, in
main(args)
File "main.py", line 314, in main
model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward
losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs))
File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss
return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss
return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined]
RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 188, in main
cmd=process.args)

Using:
cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64
pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64
torchaudio 0.9.0 py37 pytorch/linux-64
torchvision 0.10.0 py37_cu111 pytorch/linux-64

Thanks and great work!
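
The error follows from broadcasting rules: shapes are aligned from the last dimension backward, and each aligned pair of sizes must be equal or contain a 1. A sketch of that rule in plain Python (`broadcast_shape` is an illustrative helper, not a PyTorch function) reproduces the failure for the two shapes reported above:

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of `a` and `b`, or raise ValueError.

    Shapes are right-aligned; missing leading dimensions count as 1.
    """
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = [0] * ndim
    # Walk from the trailing dimension backward and stop at the first conflict.
    for dim in range(ndim - 1, -1, -1):
        if a[dim] != b[dim] and 1 not in (a[dim], b[dim]):
            raise ValueError(
                f"size {a[dim]} does not match size {b[dim]} at dimension {dim}"
            )
        out[dim] = b[dim] if a[dim] == 1 else a[dim]
    return tuple(out)


try:
    broadcast_shape((228, 512), (228, 3, 128, 128))
    message = "shapes are compatible"
except ValueError as err:
    message = str(err)
```

An L1 loss against a [228, 512] prediction needs a [228, 512] target. The [228, 3, 128, 128] target looks like raw image crops that have not yet been embedded, but that reading is an assumption about this setup.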

A very simple question: how to get COCO 10% labeled data

Is there a public version of the COCO 1% or 10% labeled splits, or should I create the split myself?
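
If no official split is available, one can be created by sampling image ids and keeping only their annotations. A minimal sketch, assuming a dict in standard COCO annotation format; `subsample_coco` is a hypothetical helper, and the seed and rounding choices are arbitrary rather than those used in the paper:

```python
import random


def subsample_coco(coco, fraction, seed=0):
    """Keep a random `fraction` of images and only their annotations.

    `coco` is a dict in COCO annotation format with "images",
    "annotations" and "categories" keys.
    """
    rng = random.Random(seed)
    image_ids = sorted(img["id"] for img in coco["images"])
    keep_count = max(1, round(len(image_ids) * fraction))
    kept = set(rng.sample(image_ids, keep_count))
    return {
        "images": [img for img in coco["images"] if img["id"] in kept],
        "annotations": [a for a in coco["annotations"] if a["image_id"] in kept],
        "categories": coco["categories"],
    }


# Tiny synthetic example: 20 images with one annotation each.
toy = {
    "images": [{"id": i} for i in range(20)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1} for i in range(20)],
    "categories": [{"id": 1, "name": "object"}],
}
ten_percent = subsample_coco(toy, fraction=0.10, seed=0)
```

For real data, the input would come from `json.load` on the train2017 annotation file and the result would be written back with `json.dump`. Results from a self-made split may not be directly comparable to numbers reported on other splits.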

Add Faster RCNN detector

TODO

(also open to pull requests if anyone happens to have implemented this)

Release the code for Few-shot object detection

Could you release the code for the Airbus Ship Detection and few-shot detection experiments?

What happens when Imagenet pre-training is replaced with COCO pre-training in Table-1?

Hi,
Great work! I was just wondering how important is the imagenet pre-training in Table-1. What happens if we pre-train DETReg from scratch on MS-COCO train 2017 set and use that to compare against methods (pre-trained on ImageNet) in Table-1? Did you try that experiment? Is it worse than pre-training on ImageNet 1k? Is it because of fewer images or is it because of bad region proposals on scene centric datasets like COCO?
Thanks

Only one gpu

Could you share how to train with a single GPU?

How to plot the gradients

Could you explain how to plot the gradient norms for the unsupervised DETReg detection?

Could you share the pretrained UP-DETR (Deformable DETR) weights?

Hi, could you share the UP-DETR (Deformable DETR) weights pretrained on ImageNet?

difference between obj_embedding_head head and intermediate?

What is the difference between the two options for the SwAV backbone? What is the use case for choosing obj_embedding_head = "head" or "intermediate"?

What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

DETReg/models/backbone.py

Lines 156 to 177 in 0a258d8

    
    def build_swav_backbone(args, device):
        model = resnet50(
            normalize=True,
            hidden_mlp=2048,
            output_dim=128,
        )
        for name, parameter in model.named_parameters():
            parameter.requires_grad_(False)

        checkpoint = torch.hub.load_state_dict_from_url(
            'https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar', map_location="cpu")
        state_dict = {k.replace("module.", ""): v for k, v in checkpoint.items()}
        missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
        return model.to(device)

    def build_swav_backbone_old(args, device):
        train_backbone = False
        return_interm_layers = args.masks or (args.num_feature_levels > 1)
        model = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation, load_backbone=args.load_backbone).to(device)
        def model_func(elem):
            return model(elem)['0'].mean(dim=(2,3))
        return model_func

It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

Thanks.

Can you provide your hyperparameter settings for DETReg in few-shot object detection?

Thank you very much for your great work!
I'm interested in using your unsupervised pretrained DETReg for few-shot object detection, so could you provide the hyperparameters of DETReg in FSOD (such as total epochs/iterations, optimizer, initial learning rate, learning-rate drop epochs/iterations, weight decay, and so on)? Or are your FSOD hyperparameters the same as Meta-DETR's? (I ask because I noticed you compared your results with Meta-DETR.)
Looking forward to your reply!

It seems that in the pretraining stage the network outputs 90 categories instead of 2

Hello, it seems that the network outputs 90 categories instead of 2 in the pretraining stage.
According to the paper, it is supposed to output 2 categories (either background or foreground), which is not what the code does.
I'm confused. Am I missing something?

DETReg/configs/DETReg_top30_in100.sh

Line 8 in 490e404

    
           python -u main.py --output_dir ${EXP_DIR} --dataset imagenet100 --strategy topk --load_backbone swav --max_prop 30 --object_embedding_loss --lr_backbone 0 ${PY_ARGS}

DETReg/main.py

Line 120 in 490e404

parser.add_argument('--dataset_file', default='coco')

DETReg/models/deformable_detr.py

Lines 497 to 503 in 490e404

    
    if args.dataset_file == 'coco':
        num_classes = 90
    elif args.dataset_file == 'coco_panoptic':
        num_classes = 250
    else:
        num_classes = 20
    num_classes += 1
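
Read together, the three quoted snippets suggest an explanation: the config passes `--dataset imagenet100`, while the class count is chosen from `--dataset_file`, whose default is `coco`. Assuming these are two separate arguments and nothing else overrides `dataset_file`, this sketch mirrors the quoted branch (the function name is illustrative):

```python
def classifier_outputs(dataset_file="coco"):
    """Mirror of the quoted num_classes branch in models/deformable_detr.py."""
    if dataset_file == "coco":
        num_classes = 90
    elif dataset_file == "coco_panoptic":
        num_classes = 250
    else:
        num_classes = 20
    return num_classes + 1


# The pretraining config does not set --dataset_file, so the default applies.
pretrain_outputs = classifier_outputs()
```

Under these assumptions the head is sized for COCO even during pretraining, and only a subset of its outputs would ever receive positive targets.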

Hello, your work is great and I am very interested in it. How can I pre-train on a custom dataset?

inconsistent description?

Hi @amirbar, in the paper you write "We pretrain two variants of DETReg based on DETR [5] and Deformable DETR [71] detectors for 5 and 60 epochs on IN1K and IN100, respectively". Does this mean that DETR is pretrained for 5 epochs and Deformable DETR for 60 epochs, on both IN1K and IN100?

But in the model zoo table of the README, I see DETR pretrained for 60 epochs and Deformable DETR for 5 epochs. Is it a typo?

Have you tried using pre-trained RPN to extract proposals?

Pretrained model on ImageNet-1K

Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

Thank you

Which dataset did you use in the few-shot experiment, COCO 2014 or COCO 2017?

error in ms_deformable_im2col_cuda: initialization error

Following the installation step "Compiling CUDA operators", I ran "python test.py" after deleting some redundant code (see below). It fails with an error in the function ms_deformable_im2col_cuda. How can I fix it?

...
if __name__ == '__main__':
    check_forward_equal_with_pytorch_double()
    check_forward_equal_with_pytorch_float()

    # for channels in [30, 32, 64, 71, 1025, 2048, 3096]:
    #     check_gradient_numerical(channels, True, True, True)

/bin/python /home/DETReg/models/ops/test.py
error in ms_deformable_im2col_cuda: initialization error
* False check_forward_equal_with_pytorch_double: max_abs_err 4.67e-03 max_rel_err 1.00e+00
error in ms_deformable_im2col_cuda: initialization error
* False check_forward_equal_with_pytorch_float: max_abs_err 6.50e-03 max_rel_err 1.00e+00

Bug: Target["area"] incorrect when using selective_search (and possibly others)

The selective_search function changes the boxes to xyxy coordinates.
boxes[..., 2] = boxes[..., 0] + boxes[..., 2]
boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

In __getitem__ (DETReg/datasets/selfdet.py, line 67 in 36ae584):

def __getitem__(self, item):

we have
boxes = selective_search(img, h, w, res_size=128)
...
target['boxes'] = torch.tensor(boxes)
...
target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

But the boxes are in xyxy at this point, not cxcywh, so the "area" is incorrect. I do not know whether this affects anything down the line; it may not.
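
The discrepancy can be checked on a single box. The sketch below uses plain tuples instead of tensors, with illustrative helper names, and compares the product used in the quoted line with the area computed from xyxy corners:

```python
def xywh_to_xyxy(box):
    """Convert (x, y, w, h) to (x1, y1, x2, y2), as the quoted conversion does."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def area_xyxy(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


box = xywh_to_xyxy((10, 20, 30, 40))  # (10, 20, 40, 60)
wrong_area = box[2] * box[3]          # product of x2 and y2: 2400
right_area = area_xyxy(box)           # width * height: 1200
```

The two values agree only for boxes whose top-left corner is at the origin.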

Caching boxes leads to worse class agnostic detection performance

TODO: remove caching and update readme

pre-training

I used 3424 images from ImageNet100 for pre-training, and then fine-tuning on the COCO dataset yields almost the same accuracy. Why?

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

Hi,

I'm getting this error when trying to run the Demo notebook on Colab. Any idea why? Thanks

Edit: I got this working on my local GPU by following the 'Compiling CUDA operators' instructions. But how do you do the same on Colab? Any help is appreciated. Thanks
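
Since the module is produced by the "Compiling CUDA operators" step, it has to be built inside each runtime that imports it, including a fresh Colab session. A small check, using only the standard library, can confirm whether the current interpreter sees the compiled extension (`has_module` is an illustrative helper):

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


if not has_module("MultiScaleDeformableAttention"):
    print("MultiScaleDeformableAttention is not built for this interpreter; "
          "run make.sh in models/ops first.")
```

Running this in a notebook cell before the model import separates a missing build from other import problems.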

Expected Codebase Release Date

Hi,

I just wanted to check if you have any expected release date for this repository?

Thanks

doubts

I have generated the pretraining boxes for a new dataset following your guidance, but I am confused: the code does not seem to generate any boxes for the val set, yet when I browsed the pretraining boxes in ss_box_cache.tar.gz I found some '.npy' files for the val set. I hope to get some help. Thank you very much!

Not implemented on the CPU

I get this error:

Traceback (most recent call last):
File "wkktest.py", line 22, in
res = model(im_t.unsqueeze(0))
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_detr.py", line 138, in forward
return self.forward_samples(samples)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_detr.py", line 167, in forward_samples
query_embeds)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_transformer.py", line 153, in forward
memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_transformer.py", line 256, in forward
output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_transformer.py", line 221, in forward
src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/ops/modules/ms_deform_attn.py", line 113, in forward
value, input_spatial_shapes, input_level_start_index, sampling_locations, attention_weights, self.im2col_step)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/ops/functions/ms_deform_attn_func.py", line 26, in forward
value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, ctx.im2col_step)
RuntimeError: Not implemented on the CPU

Another error:
I ran these commands:
cd ./models/ops
sh ./make.sh
and when I run python test.py, it reports:
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 31.75 GiB total capacity; 22.52 GiB already allocated; 656.44 MiB free; 23.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My device is a V100.
How can I run the DETReg code?

Error occurred when following the Compiling CUDA operators step

Hello, when I run sh ./make.sh as described in Compiling CUDA operators, it always shows this error:
Traceback (most recent call last):
File "setup.py", line 69, in
ext_modules=get_extensions(),
File "setup.py", line 47, in get_extensions
raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

Any idea why this happens? Thanks!
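
The setup script raises this when it cannot see a usable CUDA setup at build time. Assuming the check depends on `torch.cuda.is_available()` and on PyTorch's detected `CUDA_HOME`, as in similar extension builds (an assumption about this repository's setup.py), the following diagnostic prints both values; `cuda_build_report` is an illustrative helper:

```python
def cuda_build_report():
    """Collect the facts a CUDA extension build typically depends on."""
    report = {"torch_installed": False, "cuda_available": False, "cuda_home": None}
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        # PyTorch (or its build helpers) is missing from this environment.
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_home"] = CUDA_HOME  # None when no CUDA toolkit was found
    return report


for key, value in cuda_build_report().items():
    print(f"{key}: {value}")
```

If `cuda_available` is False or `cuda_home` is None, the likely causes are a CPU-only PyTorch build, a missing CUDA toolkit, or a machine or container without GPU access.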

Can't install MultiScaleDeformableAttention

Hi,
I have seen that others have resolved this, but they left no clues behind.
I was not able to install the MultiScaleDeformableAttention package from pip or conda, and the README says nothing about it.

Please assist.
Thank you!

First of all, congratulations on the CVPR 2022 acceptance of this paper. Secondly, could you provide some of the code for your experiments?

Question about error in the GoogleColab demo

Hi! I'm a beginner. I'm sorry for my poor English.
Sorry, same question as #33.

Content

I did a demo on GoogleColab and received the following error in the 'Define model and load checkpoint' cell.
How can I solve this problem?
Or can I plot the results locally instead of GoogleColab?

It's resolved. Sorry for my lack of knowledge.

Error I've received

ModuleNotFoundError Traceback (most recent call last)
in ()
3 from PIL import Image
4 import requests
----> 5 from main import get_args_parser
6 from models import build_model
7 from argparse import Namespace

6 frames
/content/gdrive/My Drive/DETReg/models/ops/functions/ms_deform_attn_func.py in ()
16 from torch.autograd.function import once_differentiable
17
---> 18 import MultiScaleDeformableAttention as MSDA
19
20

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

Matching predictions and targets that belong to different images

Hi! Thank you for your code and work!

It seems that predictions of image A can be matched with targets of image B if the batch size is larger than one. Could this lead to a worse solution when pretraining and when fine-tuning on COCO?

DETReg/models/matcher.py

Line 73 in 4f9cb9a

tgt_ids = torch.cat([v["labels"] for v in targets])
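
In the upstream DETR matcher, the concatenation on this line is followed by a split of the cost matrix by the number of targets per image, so each image is matched only against its own targets. Whether DETReg keeps that split is worth confirming in models/matcher.py. The sketch below illustrates the idea with plain lists and a brute-force solver in place of the Hungarian algorithm; all names and numbers are illustrative:

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force minimum-cost matching of targets (columns) to queries (rows).

    Only suitable for tiny examples; real code would use a Hungarian solver.
    """
    num_queries = len(cost)
    num_targets = len(cost[0]) if cost else 0
    best, best_total = (), float("inf")
    for rows in permutations(range(num_queries), num_targets):
        total = sum(cost[r][c] for c, r in enumerate(rows))
        if total < best_total:
            best, best_total = rows, total
    return [(r, c) for c, r in enumerate(best)]


def match_per_image(batch_cost, sizes):
    """Split the concatenated target axis by `sizes` and match each image alone.

    `batch_cost[b]` holds the cost of image b's queries against ALL targets
    in the batch; only the columns that belong to image b are used.
    """
    results, start = [], 0
    for b, size in enumerate(sizes):
        block = [row[start:start + size] for row in batch_cost[b]]
        results.append(best_assignment(block))
        start += size
    return results


# Two images, two queries each; image A has 1 target, image B has 2.
# Cross-image costs are set to 0.0 on purpose: they are the cheapest entries,
# yet they are never selected because of the per-image split.
batch_cost = [
    [[0.9, 0.0, 0.0], [0.1, 0.0, 0.0]],  # image A queries vs (A0 | B0, B1)
    [[0.0, 0.2, 0.7], [0.0, 0.8, 0.3]],  # image B queries vs (A0 | B0, B1)
]
matches = match_per_image(batch_cost, sizes=[1, 2])
```

Each entry of `matches` lists (query index, target index) pairs local to one image, so the concatenated cost matrix only serves to compute all costs in one batched operation.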

Results between IN100 and IN1k setting

In arXiv v1, the COCO fine-tuning result is 45.5 with IN100 pretraining. In arXiv v2, the COCO fine-tuning result is still 45.5, but the pretraining dataset is IN1K. So, as I understand it, using more pretraining data did not improve the fine-tuning result?

The results turn out to differ hugely:

Please help: am I missing anything in reproducing the results?

By the way, I can reproduce the full COCO result of 45.5 AP, so the conda environment is probably right.

