shenyunhang / drn-wsod-pytorch

50.0 6.0 10.0 2.83 MB

Enabling Deep Residual Networks for Weakly Supervised Object Detection

Home Page: https://github.com/shenyunhang/DRN-WSOD-pytorch/tree/DRN-WSOD/projects/WSL

License: Apache License 2.0

Python 89.92% Shell 0.59% C++ 4.13% Cuda 5.27% Dockerfile 0.07% CMake 0.03%

weakly-supervised-detection weakly-supervised-learning object-detection weakly-supervised-object-detection

drn-wsod-pytorch's Issues

ITER_SIZE different for R50 and WSR50 (pascal voc detection)

Why would there be such a difference? Thanks!

None for config key: DATASETS

Hi Shenyun, thanks for sharing the code.

I got an error when trying to train a PCL ResNet-101-WS model with the command

python3 projects/WSL/tools/train_net.py --num-gpus 4 --config-file projects/WSL/configs/PascalVOC-Detection/pcl_WSR_101_DC5_1x.yaml OUTPUT_DIR output/pcl_WSR_101_DC5_VOC07_`date +'%Y-%m-%d_%H-%M-%S'`

The error message is shown below:

Traceback (most recent call last):
  File "tools/train_net.py", line 255, in <module>
    args=(args,),
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "tools/train_net.py", line 218, in main
    cfg = setup(args)
  File "tools/train_net.py", line 210, in setup
    cfg.merge_from_file(args.config_file)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/detectron2/config/config.py", line 49, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/fvcore/common/config.py", line 120, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/yacs/config.py", line 474, in _merge_a_into_b
    v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/yacs/config.py", line 537, in _check_and_coerce_cfg_value_type
    original_type, replacement_type, original, replacement, full_key
ValueError: Type mismatch (<class 'detectron2.config.config.CfgNode'> vs. <class 'NoneType'>) with values (PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: ()
PROPOSAL_FILES_TRAIN: ()
PROPOSAL_FILES_VAL: ()
TEST: ()
TRAIN: ()
VAL: () vs. None) for config key: DATASETS

It seems that the config fails to inherit the base DATASETS settings from Base-RCNN-DilatedC5.yaml.
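The type mismatch in the traceback is what a yacs-style merge reports when a key that holds a nested config node in the defaults arrives as None from the loaded YAML (for example, a `DATASETS:` line with nothing indented under it parses to null). Below is a minimal stdlib-only sketch of that check. `merge_into` is a hypothetical helper written for illustration, not the yacs implementation.

```python
def merge_into(defaults: dict, override: dict, path: str = "") -> dict:
    """Recursively merge `override` into a copy of `defaults`.

    Simplified stand-in for a yacs-style merge (hypothetical helper):
    a nested dict may only be replaced by another dict.
    """
    merged = dict(defaults)
    for key, value in override.items():
        full_key = f"{path}.{key}" if path else key
        if key not in merged:
            raise KeyError(f"Non-existent config key: {full_key}")
        if isinstance(merged[key], dict):
            if not isinstance(value, dict):
                raise ValueError(
                    f"Type mismatch ({dict} vs. {type(value)}) "
                    f"for config key: {full_key}"
                )
            merged[key] = merge_into(merged[key], value, full_key)
        else:
            merged[key] = value
    return merged


defaults = {"DATASETS": {"TRAIN": (), "TEST": ()}}

# A YAML line `DATASETS:` with no children loads as None.
try:
    merge_into(defaults, {"DATASETS": None})
    error = None
except ValueError as exc:
    error = str(exc)

# Supplying the nested keys merges cleanly and keeps untouched defaults.
ok = merge_into(defaults, {"DATASETS": {"TRAIN": ("voc_2007_train",)}})
```

If this is the cause, checking the derived YAML for a `DATASETS:` key whose children are missing or mis-indented would be a reasonable first step.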

Config wsddn_R_50 has RES5_DILATION=1, but for some reason this incurs spatial_scale=1/8

https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/DRN-WSOD/projects/WSL/configs/PascalVOC-Detection/wsddn_R_50_DC5_1x.yaml#L11

and then in the log I get:

  (roi_heads): WSDDNROIHeads(
    (box_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): RoIPool(output_size=(7, 7), spatial_scale=0.03125)
      )
    )

Should it not be 1/32 as the stride of unmodified R_50?

Thanks!
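For reference, the logged value 0.03125 is exactly 1/32. The sketch below shows the arithmetic, assuming the detectron2 convention that RES5_DILATION=2 replaces the stride-2 downsampling in res5 with dilation (output stride 16 instead of 32). `res5_spatial_scale` is a hypothetical helper, not a function from the repository.

```python
def res5_spatial_scale(res5_dilation: int) -> float:
    """Scale of res5 feature maps relative to the input image.

    Assumption: dilation 2 in res5 removes one stride-2 step, so the
    output stride drops from 32 to 16.
    """
    if res5_dilation not in (1, 2):
        raise ValueError("res5_dilation must be 1 or 2")
    stride = 32 // res5_dilation
    return 1.0 / stride


scale_plain = res5_spatial_scale(1)    # stride 32
scale_dilated = res5_spatial_scale(2)  # stride 16
```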

Why is only the last box regression module learned?

In all reg configs:

WSL:
  REFINE_NUM: 4
  REFINE_REG: [False, False, False, True]

Why is only the last box regression layer learned?

Intermediate ones don't help?

Number of epochs for VOC07 / WSDDN difference for R50 and WSR50

WSR50 config has 160 epochs: https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/DRN-WSOD/projects/WSL/configs/PascalVOC-Detection/wsddn_WSR_50_DC5_1x.yaml

R50 config has 28 epochs: https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/DRN-WSOD/projects/WSL/configs/PascalVOC-Detection/wsddn_R_50_DC5_1x.yaml

(this is true for WSR and R configs in general)

Why is this the case?

Thanks!
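One rough way to compare two schedules is to convert iterations to epochs. The sketch below assumes that MAX_ITER counts optimizer steps and that ITER_SIZE is the number of mini-batches accumulated per step; neither assumption is confirmed in this thread. `epochs_from_schedule` is a hypothetical helper, and the schedule values are illustrative rather than taken from the configs. Only the dataset size (5011 images in VOC07 trainval) is a known figure.

```python
def epochs_from_schedule(max_iter: int, ims_per_batch: int,
                         dataset_size: int, iter_size: int = 1) -> float:
    """Number of passes over the dataset implied by a schedule.

    Assumption: each optimizer step consumes ims_per_batch * iter_size images.
    """
    images_seen = max_iter * ims_per_batch * iter_size
    return images_seen / dataset_size


VOC07_TRAINVAL = 5011  # images in VOC07 trainval

# Illustrative values only: same MAX_ITER, different ITER_SIZE.
epochs_a = epochs_from_schedule(5011, 1, VOC07_TRAINVAL, iter_size=1)
epochs_b = epochs_from_schedule(5011, 1, VOC07_TRAINVAL, iter_size=2)
```

Under these assumptions, a larger ITER_SIZE alone multiplies the epoch count even when MAX_ITER is unchanged.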

Instructions To Reproduce the Issue:

The URL provided for downloading the pretrained models cannot be accessed. Could you kindly provide an alternative?
Or can I somehow convert other pretrained weights into a detectron2-compatible format?

Thanks.
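On converting other weights: the sketch below is modeled on the key renaming used when converting standard torchvision ResNet checkpoints for detectron2, as far as I recall it; treat the exact mapping as an assumption to verify. It only covers standard ResNets. The ResNet-WS backbones in this repository use a modified stem, so weights converted this way may not load into them. Both function names are hypothetical.

```python
def torchvision_to_d2_key(key: str) -> str:
    """Rename one torchvision ResNet parameter name to detectron2 style."""
    if "layer" not in key:
        key = "stem." + key            # conv1 / bn1 live in the stem
    for t in (1, 2, 3, 4):
        key = key.replace(f"layer{t}", f"res{t + 1}")
    for t in (1, 2, 3):
        key = key.replace(f"bn{t}", f"conv{t}.norm")
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key


def convert_state_dict(state_dict: dict) -> dict:
    """Wrap renamed weights in the dict layout expected for a .pkl checkpoint.

    Values are passed through unchanged; in practice they would be
    converted to NumPy arrays before pickling.
    """
    model = {torchvision_to_d2_key(k): v for k, v in state_dict.items()}
    return {"model": model, "__author__": "torchvision",
            "matching_heuristics": True}


converted = convert_state_dict({
    "conv1.weight": 0,
    "layer1.0.bn1.weight": 0,
    "layer2.0.downsample.1.bias": 0,
})
```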

The trained models on COCO and Pascal VOC dataset

Thanks for your great work. Here is one request: could you provide your trained models on coco and pascal voc datasets in your paper?

Supervised objectness score from MCG used for modulation of proposal features

It seems that MCG objectness scores are used to modulate proposal features: https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/DRN-WSOD/projects/WSL/wsl/modeling/roi_heads/roi_heads_wsddn.py#L286

From what I understand, the MCG proposal objectness scores are trained using strong supervision.

Do you have results without this modulation?

Can't train with multiple GPUs

Hi. Thanks for your work. The code works with single-GPU training, but when I try to run it on multiple GPUs I get an error.

The command I ran:

python3 projects/WSL/tools/train_net.py --num-gpus 2 --config-file projects/WSL/configs/PascalVOC-Detection/oicr_WSR_101_DC5_1x.yaml OUTPUT_DIR output/oicr_WSR_101_DC5_VOC07_`date +'%Y-%m-%d_%H-%M-%S'`

The error message:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/launch.py", line 94, in _distributed_worker
    main_func(*args)
  File "/home/projects/DRN-WSOD-pytorch/projects/WSL/tools/train_net.py", line 243, in main
    return trainer.train()
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 399, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/projects/DRN-WSOD-pytorch/projects/WSL/tools/train_net.py", line 88, in run_step
    loss_dict = self.model(data)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 528, in forward
    self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

Do you have any suggestions?

Dataset Preparation

Thank you for your work!
I am trying to train on a new dataset, but I don't know how to structure my data. Can you help me?

Looking forward to your reply!

proposal_convert.py

python3 projects/WSL/tools/proposal_convert.py voc_2007_train datasets/proposals/MCG-Pascal-Main_trainvaltest_2007-boxes datasets/proposals/mcg_voc_2007_train_d2.pkl
mat_data["boxes"]
Why does mat_data have no "boxes" entry?
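A quick way to diagnose this is to list which variables the .mat file actually contains. `scipy.io.loadmat` returns a dict that includes metadata entries such as `__header__`, `__version__`, and `__globals__`; the remaining keys are the stored variables. The helper below is a hypothetical sketch that works on any such dict, and the sample dict is a stand-in, not the real contents of the MCG files.

```python
def data_keys(mat_data: dict) -> list:
    """Return the variable names in a loadmat-style dict, skipping metadata."""
    return sorted(k for k in mat_data if not k.startswith("__"))


# Stand-in for the result of scipy.io.loadmat(path); the key names
# here are made up for illustration.
fake_mat = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "bboxes": [[0, 0, 10, 10]],
    "scores": [0.9],
}

keys = data_keys(fake_mat)
```

With a real file, the call would be `data_keys(scipy.io.loadmat(path))`. If the variable is stored under a different name than "boxes", the printed keys will show it.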

Would you mind briefly introducing this repo, instead of Detectron2?

Some questions about the network

Hello, I am a beginner in this field. While running the code I came across some design choices that I didn't understand. I hope you can help.

1. In the WSDDN (2016) paper, the author Bilen said

Region-level features are further processed by two fully connected layers

What is the difference between this design and the Redundant Adaptation Neck design in this paper?

2. In the Robust Information Flow part, you said

" we replace the original stem block with three conservative 3 × 3 convolutions, with the first and third convolutions followed by 2×2 MaxPool layers."

I want to confirm whether the first convolution here refers to the stem structure. If so, I noticed that the stride of the MaxPool following the third convolution is 1. What is the role of this layer?

Looking forward to your reply!

How to run an inference after training the WSL model?

Config for pretraining the ResNet*-WS backbones

Hi!

What config / command-line were used for pretraining the ResNet*-WS backbones (the provided pickled weights)?

Thanks!

Pretrained weights for R50-WS contains some strange tensors

Hi!

What are the fully-connected 'fc1_w', 'fc1_b', 'fc2_w', 'fc2_b', 'last_out_L1000_w', 'last_out_L1000_b' layer weights contained in resnet50_ws_model_120_d2.pkl?

These look like fully-connected layers for VGG? For ResNet there should be no such layers, right?

The shapes are

>>> loaded['fc1_w'].shape
(2048, 100352)  # 100352 == 2048 * 7 * 7
>>> loaded['fc1_b'].shape
(2048,)
>>> loaded['fc2_w'].shape
(4096, 2048)
>>> loaded['fc2_b'].shape
(4096,)
>>> loaded['last_out_L1000_w'].shape
(1000, 4096)
>>> loaded['last_out_L1000_b'].shape

Thank you!

What is the meaning of MEAN_LOSS = False | True?

What is the effective loss scaling? Does it sum or average over classes? Over the batch?

How does it interact with distributed training? Is the loss scaled by the world size anywhere?

Two return statements in roi_heads.py function

Instructions To Reproduce the 🐛 Bug:

There are two return statements in the _sample_proposals() function in roi_heads.py. Is this a bug, or is it intentional?

Code Link

    def _sample_proposals(
        self, matched_idxs: torch.Tensor, matched_labels: torch.Tensor, gt_classes: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
      
        has_gt = gt_classes.numel() > 0
        # Get the corresponding GT for each proposal
        if has_gt:
            gt_classes = gt_classes[matched_idxs]
            # Label unmatched proposals (0 label from matcher) as background (label=num_classes)
            gt_classes[matched_labels == 0] = self.num_classes
            # Label ignore proposals (-1 label)
            gt_classes[matched_labels == -1] = -1
        else:
            gt_classes = torch.zeros_like(matched_idxs) + self.num_classes

        sampled_idxs = torch.arange(gt_classes.shape[0])
        return sampled_idxs, gt_classes[sampled_idxs]

        sampled_fg_idxs, sampled_bg_idxs = subsample_labels(
            gt_classes, self.batch_size_per_image, self.positive_fraction, self.num_classes
        )

        sampled_idxs = torch.cat([sampled_fg_idxs, sampled_bg_idxs], dim=0)
        return sampled_idxs, gt_classes[sampled_idxs]
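In Python, a function exits at the first return it reaches, so the `subsample_labels` call and the second return in the snippet above are never executed, and every proposal is kept. Whether that is a deliberate way to disable subsampling is something only the author can confirm. The sketch below uses the standard `ast` module to flag statements that follow a return in the same block; `unreachable_lines` and the sample function are hypothetical illustrations.

```python
import ast
import textwrap


def unreachable_lines(source: str) -> list:
    """Line numbers of statements that follow a `return` in the same block."""
    tree = ast.parse(textwrap.dedent(source))
    dead = []
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. lambda bodies are expressions, not blocks
            seen_return = False
            for stmt in block:
                if seen_return:
                    dead.append(stmt.lineno)
                elif isinstance(stmt, ast.Return):
                    seen_return = True
    return sorted(dead)


# Simplified function with the same shape as the snippet in question.
SAMPLE = '''
def sample(labels):
    idxs = list(range(len(labels)))
    return idxs, labels

    idxs = idxs[:2]
    return idxs, labels[:2]
'''

dead = unreachable_lines(SAMPLE)
```

Here `dead` lists the two statements after the first return, which mirrors the situation in `_sample_proposals`.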

How important is dropout in DAN?

The dropout probability here is quite high (0.5), and it is not discussed in the paper.

Results don't match paper

I just tried running your ResNet18 WS model on VOC07 (PascalVOC-Detection/oicr_WSR_18_DC5_1x.yaml). I changed the scales in the config to match the ones in the paper (i.e. the standard [480, 576, 688, 864, 1200]) for both training and testing. The result I got was only ~42 mAP, whereas your paper reports ~51 mAP. This is quite a significant discrepancy. Any suggestions as to how one might reproduce the published results?

Exception: process 3 terminated with signal SIGKILL

Hi, when using more than 1 GPU I get the following error:


Traceback (most recent call last):
  File "projects/WSL/tools/train_net.py", line 257, in <module>
    args=(args,),
  File "DRN-WSOD-pytorch/detectron2/engine/launch.py", line 59, in launch                                                                                                                   
    daemon=False,
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
    (error_index, name)
Exception: process 3 terminated with signal SIGKILL

Do you know what could be the problem?
If I use only one GPU, I get a memory error when loading the weights.

This is my configuration:

----------------------  -------------------------------------------------------------------------------
sys.platform            linux
Python                  3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
numpy                   1.19.5
detectron2              0.2 @/home/usr_341317_ulta_com/work/brand-detection/DRN-WSOD-pytorch/detectron2
Compiler                GCC 8.3
CUDA compiler           CUDA 11.0
detectron2 arch flags   sm_70
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.7.1 @/opt/conda/lib/python3.7/site-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0,1,2,3             Tesla V100-SXM2-16GB
CUDA_HOME               /usr/local/cuda
Pillow                  8.1.0
torchvision             0.8.2 @/opt/conda/lib/python3.7/site-packages/torchvision
torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75, sm_80
fvcore                  0.1.4.post20210323
cv2                     4.5.1
----------------------  -------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.4 Product Build 20200917 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

Thanks
