
amirbar / detreg


Official implementation of the CVPR 2022 paper "DETReg: Unsupervised Pretraining with Region Priors for Object Detection".

Home Page: https://amirbar.net/detreg

License: Apache License 2.0

Python 80.66% Shell 1.45% C++ 1.62% Cuda 16.27%
deep-learning object-detection pytorch unsupervised-learning

detreg's People

Contributors

amirbar, vadimkantorov

detreg's Issues

Fine Tuning the Model on a fraction of VOC

Hi @amirbar,

Thank you for the great work. It looks like the --filter_pct parameter is never used in the code, which means fine-tuning effectively runs on the whole VOC/COCO dataset. Please correct me if I am wrong.

Thanks
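
A minimal sketch of what honoring such a flag could look like, assuming --filter_pct is meant as a fraction in (0, 1] of the training images. The helper name subsample_ids and the seed argument are hypothetical and are not taken from the repository.

```python
import random


def subsample_ids(image_ids, filter_pct, seed=0):
    """Keep a reproducible random fraction of the image ids (hypothetical helper)."""
    if not 0 < filter_pct <= 1:
        raise ValueError("filter_pct must be in (0, 1]")
    ids = sorted(image_ids)
    # Always keep at least one image so the dataset is never empty.
    keep = max(1, round(len(ids) * filter_pct))
    rng = random.Random(seed)  # fixed seed so the subset is repeatable
    return sorted(rng.sample(ids, keep))


subset = subsample_ids(range(1000), filter_pct=0.1)
print(len(subset))  # 100
```

The filtered id list would then have to be applied when the dataset is built; where exactly that belongs in this code base is not shown in the thread.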

Could you provide training logs for the pretrained model?

Thanks for the great work! I want to train the pretraining model on ImageNet-1K. The number of classes is 92, but the only class label is 1, and the cate_error loss is always 0 or 100. Could you provide training logs for the pretraining model?

Which hyperparameters are used in the fewshot setting

Hi, what batch size, lr, lr_drop, and number of epochs are used in the few-shot setting?
There are two few-shot experiments, shown in Tables 3 and 4. Please share the hyperparameters for both experiments. Many thanks!

few-shot detection performance on COCO 2014

hi,

Interesting work.

Following the training and evaluation pipeline of your paper, I have reproduced the fully supervised object detection results on MS COCO 2017 (Table 1). However, my few-shot results on MS COCO 2014 (30-shot, seed 0, Table 3) are significantly worse than those in your paper. Specifically, starting from the ImageNet-pretrained model, I first train on the base classes (60 classes, 99k labeled images) and then fine-tune on the few-shot labeled images (80 classes, 30 instances per class). The results on the 5,000 validation images (all classes, base classes, and novel classes) are reported below. All hyperparameters are the same as in the fully supervised training.

Evaluation type                          AP     AP75
All classes                              29.9   32.7
Base classes                             33.0   35.9
Novel classes                            22.0   23.9
Novel classes (Table 3 in your paper)    30.0   33.7

In the table above, the reproduced results on novel classes are significantly worse than those in your paper (8.0 AP and 9.8 AP75 lower). Could you share your model's results on base classes and all classes, or the hyperparameters of the few-shot training (for example, epochs, learning rate, and so on)?

Best,
Bin-Bin

How to run DETReg on a custom video dataset?

Could you give me some feedback on whether this is possible, and some guidance on where the custom data loading actually happens, so that I can check?

Thank you for your time

Question About Matching

Hello, first off, great work! I have really enjoyed working through this repo.
It appears that the Hungarian matcher does not use the output and target object embeddings when determining the matching. Why is this? The embeddings do still contribute to the loss.
Also, I understand that there are usually more outputs than targets. The paper says you pad the targets with non-objects (I assumed these would be something like random crops from parts of the image for which selective search returned no boxes), but in the code it looks like you discard unmatched outputs as non-objects. Can you explain this a bit more? It seems to me that with no non-object targets, the DETR model would simply learn to assign an object class to every detection no matter what, and that portion of the loss would quickly collapse to zero.
Thanks again!
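
A rough sketch of the general DETR-style idea the question touches on: each target is matched to exactly one prediction, and every unmatched prediction is supervised with a "no object" class label, so the classification loss still sees negatives even without explicit non-object targets. This is not the repository's matcher. A brute-force search over permutations stands in for the Hungarian algorithm, the cost is plain L1 distance between boxes, and the label values NO_OBJECT and OBJECT are hypothetical.

```python
from itertools import permutations

NO_OBJECT = 0  # hypothetical label given to unmatched predictions
OBJECT = 1     # hypothetical label given to matched predictions


def l1(a, b):
    """L1 distance between two boxes given as equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match(pred_boxes, target_boxes):
    """Return {pred_index: target_index} with minimal total L1 box cost.

    Brute force, so only suitable for tiny examples.
    Assumes len(target_boxes) <= len(pred_boxes).
    """
    best_cost, best = float("inf"), ()
    for pred_idx in permutations(range(len(pred_boxes)), len(target_boxes)):
        cost = sum(l1(pred_boxes[p], target_boxes[t])
                   for t, p in enumerate(pred_idx))
        if cost < best_cost:
            best_cost, best = cost, pred_idx
    return {p: t for t, p in enumerate(best)}


def class_targets(n_pred, matching):
    """Matched predictions get OBJECT, all others get NO_OBJECT."""
    return [OBJECT if i in matching else NO_OBJECT for i in range(n_pred)]


preds = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.4, 0.4), (0.9, 0.9, 0.1, 0.1)]
targets = [(0.5, 0.5, 0.4, 0.4)]
matching = match(preds, targets)
print(matching)                             # {1: 0}
print(class_targets(len(preds), matching))  # [0, 1, 0]
```

In this toy case two of the three predictions receive the no-object label, which is what keeps a model from labeling every query as an object.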

Missing file (util/plot_utils.py)

Hi,

While creating the SS boxes, I am getting the attached error. It looks like the file util/plot_utils.py is missing. Please advise. Thanks

Reproducing the Results of Table 3 & Table 4

Hi,

How can I reproduce the results of Table 3 (few-shot detection performance for the 20 novel categories on the COCO dataset)?

Also, for Table 4 (comparison to semi-supervised detection methods), the paper says that you pretrained the network on all of the unlabeled COCO train2017 images and then fine-tuned on X% of the data. But the instructions in the README and the corresponding config files load the ImageNet100-pretrained weights for fine-tuning on COCO. What process should I follow to reproduce the results reported in Table 4?

Thanks

RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

Hi, I'm trying to run the pretraining but I get a size mismatch at https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328: src_features has shape torch.Size([228, 512]) and target_features has shape torch.Size([228, 3, 128, 128]). Is this OK?

Start training
/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
Traceback (most recent call last):
File "main.py", line 403, in <module>
main(args)
File "main.py", line 314, in main
model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward
losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs))
File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss
return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss
return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined]
RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3
Traceback (most recent call last):
File "./tools/launch.py", line 192, in <module>
main()
File "./tools/launch.py", line 188, in main
cmd=process.args)

Using:
cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64
pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64
torchaudio 0.9.0 py37 pytorch/linux-64
torchvision 0.10.0 py37_cu111 pytorch/linux-64

Thanks and great work!
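
To see why the error names dimension 3, here is a small pure-Python sketch of the usual broadcasting rule: shapes are right-aligned, and two sizes are compatible only if they are equal or one of them is 1. The function name is made up for illustration, and scanning from the last dimension is an assumption chosen because it agrees with the message in the log above.

```python
def first_broadcast_mismatch(shape_a, shape_b):
    """Return the index of the first incompatible dimension, scanning from
    the last dimension, or None if the two shapes can be broadcast."""
    ndim = max(len(shape_a), len(shape_b))
    # Right-align the shapes by padding the shorter one with leading 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for dim in range(ndim - 1, -1, -1):
        if a[dim] != b[dim] and 1 not in (a[dim], b[dim]):
            return dim
    return None


print(first_broadcast_mismatch((228, 512), (228, 3, 128, 128)))  # 3
print(first_broadcast_mismatch((228, 512), (228, 512)))          # None
```

The target shape [228, 3, 128, 128] looks like a batch of RGB crops rather than a batch of feature vectors, which would suggest the crops never went through the feature extractor before the L1 loss. That reading is a guess from the shapes alone and is not confirmed anywhere in this thread.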

What happens when Imagenet pre-training is replaced with COCO pre-training in Table-1?

Hi,
Great work! I was just wondering how important is the imagenet pre-training in Table-1. What happens if we pre-train DETReg from scratch on MS-COCO train 2017 set and use that to compare against methods (pre-trained on ImageNet) in Table-1? Did you try that experiment? Is it worse than pre-training on ImageNet 1k? Is it because of fewer images or is it because of bad region proposals on scene centric datasets like COCO?
Thanks

Only one gpu

Could you share how to train with a single GPU?

What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

DETReg/models/backbone.py

Lines 156 to 177 in 0a258d8

def build_swav_backbone(args, device):
    model = resnet50(
        normalize=True,
        hidden_mlp=2048,
        output_dim=128,
    )
    for name, parameter in model.named_parameters():
        parameter.requires_grad_(False)
    checkpoint = torch.hub.load_state_dict_from_url(
        'https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar', map_location="cpu")
    state_dict = {k.replace("module.", ""): v for k, v in checkpoint.items()}
    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
    return model.to(device)


def build_swav_backbone_old(args, device):
    train_backbone = False
    return_interm_layers = args.masks or (args.num_feature_levels > 1)
    model = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation, load_backbone=args.load_backbone).to(device)
    def model_func(elem):
        return model(elem)['0'].mean(dim=(2,3))
    return model_func

It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

Thanks.

Can you provide your hyperparameter settings for DETReg in few-shot object detection?

Thank you very much for your great work!
I'm interested in using your unsupervised pretrained DETReg for few-shot object detection (FSOD). Could you provide the DETReg hyperparameters for FSOD (total epochs/iterations, optimizer, initial learning rate, learning-rate drop epochs/iterations, weight decay, and so on)? Or are your FSOD hyperparameters the same as Meta-DETR's? (I noticed that you compared your results with Meta-DETR.)
Looking forward to your reply!

It seems that in the pretraining stage the network outputs 90 categories instead of 2

Hello, it seems that the network outputs 90 categories instead of 2 in the pretraining stage.
According to the paper, it is supposed to output 2 categories (either background or foreground), which does not match the code.
I'm confused. Am I missing something?

python -u main.py --output_dir ${EXP_DIR} --dataset imagenet100 --strategy topk --load_backbone swav --max_prop 30 --object_embedding_loss --lr_backbone 0 ${PY_ARGS}

DETReg/main.py

Line 120 in 490e404

parser.add_argument('--dataset_file', default='coco')

if args.dataset_file == 'coco':
    num_classes = 90
elif args.dataset_file == 'coco_panoptic':
    num_classes = 250
else:
    num_classes = 20
num_classes += 1
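
The quoted branch can be wrapped in a small function to check what it yields for a given dataset_file value. The function name num_classes_for is made up for this illustration; the branch itself mirrors the excerpt above.

```python
def num_classes_for(dataset_file):
    """Mirror of the quoted branch: class count plus one extra slot."""
    if dataset_file == "coco":
        num_classes = 90
    elif dataset_file == "coco_panoptic":
        num_classes = 250
    else:
        num_classes = 20
    return num_classes + 1


print(num_classes_for("coco"))         # 91
print(num_classes_for("imagenet100"))  # 21
```

The pretraining command above passes --dataset imagenet100, while the quoted parser line defines --dataset_file with default 'coco'. If dataset_file keeps its default during pretraining, the classification head would be sized for COCO regardless of how many labels are actually used. Whether that is what happens cannot be confirmed from the excerpt alone.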

inconsistent description?

Hi @amirbar, in the paper you write "We pretrain two variants of DETReg based on DETR [5] and Deformable DETR [71] detectors for 5 and 60 epochs on IN1K and IN100, respectively". Do you mean that, for both the IN1K and IN100 datasets, DETR is pretrained for 5 epochs and Deformable DETR for 60 epochs?

But in the model zoo table of the README, I see that DETR is pretrained for 60 epochs and Deformable DETR for 5 epochs. Is this a typo?

Pretrained model on ImageNet-1K

Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

Thank you

error in ms_deformable_im2col_cuda: initialization error

Following the "Compiling CUDA operators" step of the installation, I ran "python test.py" after deleting the redundant code (see below). It fails with an error in the function ms_deformable_im2col_cuda. How can I fix it?

...
if __name__ == '__main__':
    check_forward_equal_with_pytorch_double()
    check_forward_equal_with_pytorch_float()

    # for channels in [30, 32, 64, 71, 1025, 2048, 3096]:
    #     check_gradient_numerical(channels, True, True, True)
/bin/python /home/DETReg/models/ops/test.py
error in ms_deformable_im2col_cuda: initialization error
* False check_forward_equal_with_pytorch_double: max_abs_err 4.67e-03 max_rel_err 1.00e+00
error in ms_deformable_im2col_cuda: initialization error
* False check_forward_equal_with_pytorch_float: max_abs_err 6.50e-03 max_rel_err 1.00e+00

Bug: Target["area"] incorrect when using selective_search (and possibly others)

The selective_search function converts the boxes to xyxy coordinates:
    boxes[..., 2] = boxes[..., 0] + boxes[..., 2]
    boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

In __getitem__ (def __getitem__(self, item):) we have
    boxes = selective_search(img, h, w, res_size=128)
    ...
    target['boxes'] = torch.tensor(boxes)
    ...
    target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

But at this point the boxes are in xyxy, not cxcywh, so the "area" is incorrect. I do not know whether this affects anything down the line; it may not.
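
A small worked example of the discrepancy, assuming a box given as (x1, y1, x2, y2). The function names are made up for illustration.

```python
def area_xyxy(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def area_as_in_report(box):
    """What box[2] * box[3] computes, regardless of the box format."""
    return box[2] * box[3]


# A 30 x 40 box whose top-left corner is at (10, 20).
box = (10.0, 20.0, 40.0, 60.0)
print(area_xyxy(box))          # 1200.0
print(area_as_in_report(box))  # 2400.0
```

The product of the last two coordinates only equals the area when they hold the width and height, so after the conversion to xyxy it overestimates the area for any box that does not start at the origin.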

pre-training

I used 3,424 images from ImageNet100 for pre-training, and fine-tuning on the COCO dataset afterwards yields almost the same accuracy. Why?

doubts

I have generated the pretraining boxes for a new dataset following your instructions, but I am unsure about one thing: the code does not seem to generate any boxes for the val set, yet when I browse the pretraining boxes in ss_box_cache.tar.gz I find some '.npy' files for the val set. I am confused and hope to get some help. Thank you very much!

Not implemented on the CPU

I get this error:

Traceback (most recent call last):
File "wkktest.py", line 22, in <module>
res = model(im_t.unsqueeze(0))
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_detr.py", line 138, in forward
return self.forward_samples(samples)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_detr.py", line 167, in forward_samples
query_embeds)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_transformer.py", line 153, in forward
memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_transformer.py", line 256, in forward
output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/deformable_transformer.py", line 221, in forward
src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
File "/ssd2/wangkangkang/anaconda3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/ops/modules/ms_deform_attn.py", line 113, in forward
value, input_spatial_shapes, input_level_start_index, sampling_locations, attention_weights, self.im2col_step)
File "/ssd2/wangkangkang/projects/test_detreg/another/DETReg-main/models/ops/functions/ms_deform_attn_func.py", line 26, in forward
value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, ctx.im2col_step)
RuntimeError: Not implemented on the CPU

Another error. I ran:
cd ./models/ops
sh ./make.sh
and when I then run python test.py, it reports:
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 31.75 GiB total capacity; 22.52 GiB already allocated; 656.44 MiB free; 23.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My device is a V100.
How do I run the DETReg code?

error occured when following the Compiling CUDA operators step.

Hello, when I run sh ./make.sh in the "Compiling CUDA operators" step, it always shows this error:
Traceback (most recent call last):
File "setup.py", line 69, in <module>
ext_modules=get_extensions(),
File "setup.py", line 47, in get_extensions
raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

Any idea why this happens? Thanks!
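
One possible first check, sketched under an assumption: in the upstream Deformable DETR build script this error is raised when PyTorch cannot see a GPU or cannot locate the CUDA toolkit, and it is assumed here that this repo's setup.py behaves the same way. The helper below uses only the standard library and reports whether a toolkit location is set and whether nvcc is on the PATH. The function name and the returned keys are made up for illustration.

```python
import os
import shutil


def cuda_toolchain_report(env=None):
    """Report the CUDA toolkit hints visible in the environment."""
    env = os.environ if env is None else env
    return {
        # Environment variables commonly used to point at the toolkit.
        "cuda_home": env.get("CUDA_HOME") or env.get("CUDA_PATH"),
        # Full path to nvcc if it can be found on the PATH, else None.
        "nvcc_on_path": shutil.which("nvcc", path=env.get("PATH")),
    }


print(cuda_toolchain_report())
```

If both values are None, installing the CUDA toolkit or exporting CUDA_HOME before running make.sh would be the first thing to try. Checking torch.cuda.is_available() in the same Python environment covers the GPU-visibility half of the condition.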

Can't install MultiScaleDeformableAttention

Hi,
I have seen others have resolved this, but no clues are left behind.
I was not able to install the MultiScaleDeformableAttention package from pip or conda, and there is nothing in Readme.

Please assist.
Thank you!
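
Other reports in this thread build the operators locally with cd ./models/ops followed by sh ./make.sh, which suggests the module comes from that build step rather than from pip or conda. A quick way to check whether the current Python environment can see it, using only the standard library (the helper name is made up for illustration):

```python
import importlib.util


def is_installed(module_name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("MultiScaleDeformableAttention"))
```

If this prints False after a successful build, the build probably ran in a different environment from the one used to launch the code.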

Question about error in the GoogleColab demo

Hi! I'm a beginner. I'm sorry for my poor English.
Sorry, same question as #33.

Content

I did a demo on GoogleColab and received the following error in the 'Define model and load checkpoint' cell.
How can I solve this problem?
Or can I plot the results locally instead of GoogleColab?

It's resolved. Sorry for my lack of knowledge.

Error I've received

ModuleNotFoundError Traceback (most recent call last)
in ()
3 from PIL import Image
4 import requests
----> 5 from main import get_args_parser
6 from models import build_model
7 from argparse import Namespace

6 frames
/content/gdrive/My Drive/DETReg/models/ops/functions/ms_deform_attn_func.py in ()
16 from torch.autograd.function import once_differentiable
17
---> 18 import MultiScaleDeformableAttention as MSDA
19
20

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

Results between IN100 and IN1k setting

In arXiv v1, the fine-tuning result on COCO is 45.5 with IN100 pretraining. In arXiv v2, the fine-tuning result on COCO still appears to be 45.5, but the pretraining dataset is IN1K. So, as I understand it, the fine-tuning result does not improve with more pretraining data?

Question about reproducing the Semi-supervised Learning experiment

When I use this checkpoint for pretraining

[screenshot not preserved]

and use this script to reproduce the semi-supervised learning experiment

[screenshot not preserved]

the result turns out to be hugely different:

[screenshot not preserved]

Please help: am I missing anything in reproducing this?

By the way, I can reproduce the full COCO result (45.5 AP), so the conda env is probably right.
