shenyunhang / drn-wsod-pytorch

Enabling Deep Residual Networks for Weakly Supervised Object Detection

Home Page: https://github.com/shenyunhang/DRN-WSOD-pytorch/tree/DRN-WSOD/projects/WSL

License: Apache License 2.0

Python 89.92% Shell 0.59% C++ 4.13% Cuda 5.27% Dockerfile 0.07% CMake 0.03%
weakly-supervised-detection weakly-supervised-learning object-detection weakly-supervised-object-detection

drn-wsod-pytorch's People

Contributors

alexander-kirillov, arutyunovg, botcs, bryant1410, chenbohua3, endernewton, jonmorton, jss367, kondela, lyttonhao, marcszafraniec, maxfrei750, obendidi, pakornvs, patricklabatut, ppwwyyxx, rajprateek, raymondcm, rbgirshick, sampepose, shenyunhang, stanislavglebik, superirabbit, timofurrer, vkhalidov, wangg12, wat3rbro, xmyqsh, yanicklandry, yfeldblum

drn-wsod-pytorch's Issues

None for config key: DATASETS

Hi Shenyun, thanks for sharing the codes.

I got an error when trying to train a PCL ResNet-101 WS model with the command:

python3 projects/WSL/tools/train_net.py --num-gpus 4 --config-file projects/WSL/configs/PascalVOC-Detection/pcl_WSR_101_DC5_1x.yaml OUTPUT_DIR output/pcl_WSR_101_DC5_VOC07_`date +'%Y-%m-%d_%H-%M-%S'`

The error message is shown below:

Traceback (most recent call last):
  File "tools/train_net.py", line 255, in <module>
    args=(args,),
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "tools/train_net.py", line 218, in main
    cfg = setup(args)
  File "tools/train_net.py", line 210, in setup
    cfg.merge_from_file(args.config_file)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/detectron2/config/config.py", line 49, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/fvcore/common/config.py", line 120, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/yacs/config.py", line 474, in _merge_a_into_b
    v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key)
  File "/home/xx/envs/detectron2/lib/python3.6/site-packages/yacs/config.py", line 537, in _check_and_coerce_cfg_value_type
    original_type, replacement_type, original, replacement, full_key
ValueError: Type mismatch (<class 'detectron2.config.config.CfgNode'> vs. <class 'NoneType'>) with values (PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: ()
PROPOSAL_FILES_TRAIN: ()
PROPOSAL_FILES_VAL: ()
TEST: ()
TRAIN: ()
VAL: () vs. None) for config key: DATASETS

It seems that the DATASETS settings from the base config, Base-RCNN-DilatedC5.yaml, are not being inherited.
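One plausible reading of this traceback: when merging a YAML file, yacs requires each value to keep the type of the default it replaces. A mapping key written with no value (for example `DATASETS:` whose child lines are commented out or mis-indented) loads as `None`, which cannot replace a `CfgNode`. The stdlib-only sketch below imitates that check with plain dicts; `merge_checked` is a hypothetical helper for illustration, not the yacs implementation, and it skips yacs's few allowed coercions (such as list <-> tuple).

```python
from typing import Any, Dict


def merge_checked(defaults: Dict[str, Any], override: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Merge `override` into a copy of `defaults`, refusing type changes (yacs-like sketch)."""
    merged = dict(defaults)
    for key, value in override.items():
        full_key = prefix + key
        if key not in defaults:
            raise KeyError(f"Non-existent config key: {full_key}")
        default = defaults[key]
        if isinstance(default, dict):
            if not isinstance(value, dict):
                # An empty YAML mapping such as `DATASETS:` arrives here as None.
                raise ValueError(
                    f"Type mismatch ({type(default).__name__} vs. {type(value).__name__}) "
                    f"for config key: {full_key}"
                )
            # Recurse so nested keys are checked against their own defaults.
            merged[key] = merge_checked(default, value, prefix=full_key + ".")
        elif default is not None and value is not None and type(value) is not type(default):
            # Plain values must keep their type; None on either side is accepted in this sketch.
            raise ValueError(
                f"Type mismatch ({type(default).__name__} vs. {type(value).__name__}) "
                f"for config key: {full_key}"
            )
        else:
            merged[key] = value
    return merged


defaults = {"DATASETS": {"TRAIN": (), "TEST": ()}, "OUTPUT_DIR": "./output"}
print(merge_checked(defaults, {"DATASETS": {"TRAIN": ("voc_2007_train",)}}))
# merge_checked(defaults, {"DATASETS": None})
# raises ValueError: Type mismatch (dict vs. NoneType) for config key: DATASETS
```

If this is the cause, checking the `DATASETS:` block in the derived YAML (indentation, stray comments) would be the first thing to look at.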

Pretrained Model Link Broken

Instructions To Reproduce the Issue:

The URL provided for downloading the pretrained models cannot be accessed. Could you kindly provide an alternative?
Or is there a way to convert other pretrained weights into a detectron2-compatible format?

Thanks.
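For torchvision ResNet weights, detectron2 ships `tools/convert-torchvision-to-d2.py`, which renames the state-dict keys to detectron2's naming and pickles the result. The stdlib sketch below reproduces that renaming as we understand it, so the mapping can be inspected without torch; compare it with the script in your detectron2 checkout before relying on it. Per the "Robust Information Flow" design quoted further down, the WS backbones replace the standard stem, so a torchvision checkpoint would likely leave those stem layers unmatched.

```python
def torchvision_key_to_d2(key: str) -> str:
    """Rename one torchvision ResNet state-dict key to detectron2-style naming (approximate)."""
    if "layer" not in key:
        # Stem conv1/bn1 (and the classifier fc) sit outside layer1..layer4.
        key = "stem." + key
    for t in (1, 2, 3, 4):
        key = key.replace(f"layer{t}", f"res{t + 1}")  # layer1 -> res2, ..., layer4 -> res5
    for t in (1, 2, 3):
        key = key.replace(f"bn{t}", f"conv{t}.norm")  # BatchNorm lives under its conv
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key


def convert_state_dict(state_dict: dict) -> dict:
    """Wrap renamed weights in the dict layout the detectron2 script pickles.

    The real script stores numpy arrays (tensor.detach().numpy()); values are passed
    through unchanged here so the sketch runs without torch.
    """
    renamed = {torchvision_key_to_d2(k): v for k, v in state_dict.items()}
    return {"model": renamed, "__author__": "torchvision", "matching_heuristics": True}


for k in ("conv1.weight", "bn1.running_mean", "layer1.0.bn2.weight", "layer3.0.downsample.1.bias"):
    print(k, "->", torchvision_key_to_d2(k))
```

With torch available, the input would come from `torch.load(...)` on the torchvision checkpoint, and the converted dict would be written with `pickle.dump` to a `.pkl` file given as `MODEL.WEIGHTS`.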

Can't train with multiple GPUs

Hi. Thanks for your work. The code works with single-GPU training, but when I try to run it on multiple GPUs I get an error.

The command I run:

python3 projects/WSL/tools/train_net.py --num-gpus 2 --config-file projects/WSL/configs/PascalVOC-Detection/oicr_WSR_101_DC5_1x.yaml OUTPUT_DIR output/oicr_WSR_101_DC5_VOC07_`date +'%Y-%m-%d_%H-%M-%S'`

The error message:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/launch.py", line 94, in _distributed_worker
    main_func(*args)
  File "/home/projects/DRN-WSOD-pytorch/projects/WSL/tools/train_net.py", line 243, in main
    return trainer.train()
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 399, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/projects/DRN-WSOD-pytorch/projects/WSL/tools/train_net.py", line 88, in run_step
    loss_dict = self.model(data)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/detectron2/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 528, in forward
    self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

Do you have any suggestions?

Dataset Preparation

Thank you for your work!
I am trying to train on a new dataset, but I don't know how to structure my data. Can you help me?

Look forward to your reply!
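detectron2 reads datasets as a list of per-image dicts in its "standard dataset dicts" format and looks them up by name. The sketch below builds such dicts from a hypothetical per-image record (file path, image size, boxes, class ids) using only the standard library. `Record` and `to_detectron2_dicts` are illustrative names, and it is an assumption that the WSL project consumes these standard fields unchanged; weakly supervised training presumably needs only the image-level classes plus precomputed proposals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Integer value of detectron2.structures.BoxMode.XYXY_ABS (absolute x0, y0, x1, y1).
BOX_MODE_XYXY_ABS = 0


@dataclass
class Record:
    """Hypothetical annotation record for one image of a custom dataset."""
    file_name: str
    height: int
    width: int
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)
    class_ids: List[int] = field(default_factory=list)  # 0-based, indexes thing_classes


def to_detectron2_dicts(records: List[Record]) -> List[Dict]:
    """Convert records to detectron2-style dataset dicts (one dict per image)."""
    dataset = []
    for image_id, rec in enumerate(records):
        annotations = [
            {"bbox": list(box), "bbox_mode": BOX_MODE_XYXY_ABS, "category_id": cls}
            for box, cls in zip(rec.boxes, rec.class_ids)
        ]
        dataset.append({
            "file_name": rec.file_name,
            "image_id": image_id,
            "height": rec.height,
            "width": rec.width,
            "annotations": annotations,
        })
    return dataset


print(to_detectron2_dicts([Record("img0.jpg", 375, 500, [(10, 20, 110, 220)], [3])]))
```

With detectron2 installed, such a function would be registered with `DatasetCatalog.register("my_dataset_train", lambda: to_detectron2_dicts(records))`, class names set via `MetadataCatalog.get("my_dataset_train").set(thing_classes=[...])`, and the name listed under `DATASETS.TRAIN` in the config.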

proposal_convert.py

python3 projects/WSL/tools/proposal_convert.py voc_2007_train datasets/proposals/MCG-Pascal-Main_trainvaltest_2007-boxes datasets/proposals/mcg_voc_2007_train_d2.pkl
The script fails at mat_data["boxes"]: mat_data has no "boxes" field.
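A missing `"boxes"` entry usually means the loaded `.mat` file stores its proposals under a different variable name, or that a different file than expected was opened. `scipy.io.loadmat` returns a dict that also contains the metadata entries `__header__`, `__version__` and `__globals__`, so listing the remaining keys is a quick check (note that `loadmat` cannot read MATLAB v7.3 files, which are HDF5). The helper below is a hypothetical, stdlib-only sketch that works on the dict `loadmat` returns; it does not reproduce `proposal_convert.py`.

```python
from typing import Any, Mapping

# Entries scipy.io.loadmat adds to every result, alongside the MATLAB variables.
LOADMAT_METADATA = {"__header__", "__version__", "__globals__"}


def get_proposal_boxes(mat_data: Mapping[str, Any], key: str = "boxes") -> Any:
    """Return mat_data[key], or fail with a message listing the variables actually present."""
    if key in mat_data:
        return mat_data[key]
    variables = sorted(k for k in mat_data if k not in LOADMAT_METADATA)
    raise KeyError(f"{key!r} not found in .mat file; available variables: {variables}")
```

Usage would look like `boxes = get_proposal_boxes(scipy.io.loadmat(path))`, where the error message shows which variable name the file really uses.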

Some questions about the network

Hello, I am a beginner in this field. While running the code, I came across some design choices I didn't understand, and I hope you can help.

  1. In the WSDDN (2016) paper, the author Bilen writes:

Region-level features are further processed by two fully connected layers

What is the difference between this design and the Redundant Adaptation Neck design in this paper?

  2. In the Robust Information Flow section, you write:

" we replace the original stem block with three conservative 3 × 3 convolutions, with the first and third convolutions followed by 2×2 MaxPool layers."

I want to confirm whether the first convolution here refers to the stem. If so, I noticed that the max-pool following the third convolution has a stride of 1. What is the role of this layer?

Look forward to your reply!
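How each stride affects the stem's output resolution follows from the standard output-size formula for convolution and pooling, out = floor((in + 2*pad - kernel) / stride) + 1. The sketch below applies it to the usual ResNet stem and to one hypothetical configuration of the three-3x3-conv stem (stride-1 convolutions with padding 1, 2x2 max-pools without padding). Those strides and paddings are assumptions for illustration, not read from this repo's code.

```python
from typing import List, Tuple

# (kernel, stride, padding) for each layer in a stem; dilation is assumed to be 1.
Layer = Tuple[int, int, int]


def out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a conv/pool layer (floor mode, PyTorch's default)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size: int, layers: List[Layer]) -> List[int]:
    """Return the spatial size after each layer of the stem."""
    sizes = []
    for kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Standard ResNet stem: 7x7/2 conv (pad 3), then 3x3/2 max-pool (pad 1).
RESNET_STEM = [(7, 2, 3), (3, 2, 1)]
# Hypothetical 3-conv stem: conv3x3/1, pool2x2/2, conv3x3/1, conv3x3/1, pool2x2/2.
THREE_CONV_STEM = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (3, 1, 1), (2, 2, 0)]
# Same, but with the pool after the third conv at stride 1, as described in the question.
THREE_CONV_STEM_POOL_S1 = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (3, 1, 1), (2, 1, 0)]

print(trace(224, RESNET_STEM))              # [112, 56] -> overall stride 4
print(trace(224, THREE_CONV_STEM))          # [224, 112, 112, 112, 56] -> overall stride 4
print(trace(224, THREE_CONV_STEM_POOL_S1))  # [224, 112, 112, 112, 111] -> overall stride ~2
```

Under these assumptions, a stride-1 2x2 pool keeps the resolution (minus one pixel without padding) and acts as a local max filter rather than a downsampler.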

Pretrained weights for R50-WS contains some strange tensors

Hi!

What are the fully-connected 'fc1_w', 'fc1_b', 'fc2_w', 'fc2_b', 'last_out_L1000_w', 'last_out_L1000_b' layer weights contained in resnet50_ws_model_120_d2.pkl?

These look like VGG-style fully connected layers. A ResNet should not have such layers, right?

The shapes are

>>> loaded['fc1_w'].shape
(2048, 100352)  # 100352 == 2048 * 7 * 7
>>> loaded['fc1_b'].shape
(2048,)
>>> loaded['fc2_w'].shape
(4096, 2048)
>>> loaded['fc2_b'].shape
(4096,)
>>> loaded['last_out_L1000_w'].shape
(1000, 4096)
>>> loaded['last_out_L1000_b'].shape

Thank you!
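The listed shapes are enough to size these extra tensors. The arithmetic below uses the shapes from the report (`last_out_L1000_b` is omitted because its shape is not shown) and assumes float32 storage: `fc1_w` alone holds 205,520,896 values, exactly 784 MiB. If the model has no parameters with these names, detectron2's checkpointer typically logs them as unused checkpoint keys and ignores them, but that is worth confirming in the training log.

```python
from math import prod

# Shapes as reported for resnet50_ws_model_120_d2.pkl (last_out_L1000_b omitted: shape not shown).
SHAPES = {
    "fc1_w": (2048, 100352),
    "fc1_b": (2048,),
    "fc2_w": (4096, 2048),
    "fc2_b": (4096,),
    "last_out_L1000_w": (1000, 4096),
}
BYTES_PER_VALUE = 4  # assumption: float32 storage

# fc1's input width is consistent with a flattened 2048-channel 7x7 feature map.
assert 100352 == 2048 * 7 * 7

counts = {name: prod(shape) for name, shape in SHAPES.items()}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:18s} {n:>12,d} values  {n * BYTES_PER_VALUE / 2**20:8.1f} MiB")
print(f"{'total':18s} {total:>12,d} values  {total * BYTES_PER_VALUE / 2**20:8.1f} MiB")
```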

What is the meaning of MEAN_LOSS = False | True?

What is the effective loss scaling? Does it sum or average over classes? Over the batch?

How does it interact with distributed training? Is the loss scaled by the world size anywhere?
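The thread does not say what `MEAN_LOSS` does, so the sketch below only illustrates two common conventions such a flag might switch between for an image-level multi-label loss (binary cross-entropy over C classes for a batch of N images): summing over classes and averaging over images, versus averaging over both. The two differ by a constant factor of C. The mapping of `MEAN_LOSS` onto `mean_over_classes` is an assumption. On world size: PyTorch's `DistributedDataParallel` averages gradients across processes in its all-reduce, so with equal per-process batch sizes a per-process mean loss gives gradients equivalent to a mean over the global batch.

```python
import math
from typing import List


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one (probability, label) pair, with clamping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def multilabel_loss(probs: List[List[float]], labels: List[List[float]], mean_over_classes: bool) -> float:
    """Image-level multi-label BCE for a batch of N images and C classes.

    mean_over_classes=True  -> sum / (N * C)  (average over images and classes)
    mean_over_classes=False -> sum / N        (sum over classes, average over images)
    """
    n, c = len(probs), len(probs[0])
    total = sum(
        bce(p, y)
        for row_p, row_y in zip(probs, labels)
        for p, y in zip(row_p, row_y)
    )
    return total / (n * c) if mean_over_classes else total / n


probs = [[0.9, 0.2, 0.6], [0.3, 0.8, 0.1]]
labels = [[1, 0, 1], [0, 1, 0]]
print(multilabel_loss(probs, labels, mean_over_classes=False))
print(multilabel_loss(probs, labels, mean_over_classes=True))
```

Because the factor is a constant, switching conventions mostly acts like rescaling the learning rate for this loss term.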

Two return statements in a roi_heads.py function

Instructions To Reproduce the 🐛 Bug:

There are two return statements in the _sample_proposals() function in roi_heads.py. Is this a bug, or what does it mean?

Code Link

    def _sample_proposals(
        self, matched_idxs: torch.Tensor, matched_labels: torch.Tensor, gt_classes: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:

        has_gt = gt_classes.numel() > 0
        # Get the corresponding GT for each proposal
        if has_gt:
            gt_classes = gt_classes[matched_idxs]
            # Label unmatched proposals (0 label from matcher) as background (label=num_classes)
            gt_classes[matched_labels == 0] = self.num_classes
            # Label ignore proposals (-1 label)
            gt_classes[matched_labels == -1] = -1
        else:
            gt_classes = torch.zeros_like(matched_idxs) + self.num_classes

        sampled_idxs = torch.arange(gt_classes.shape[0])
        return sampled_idxs, gt_classes[sampled_idxs]

        sampled_fg_idxs, sampled_bg_idxs = subsample_labels(
            gt_classes, self.batch_size_per_image, self.positive_fraction, self.num_classes
        )

        sampled_idxs = torch.cat([sampled_fg_idxs, sampled_bg_idxs], dim=0)
        return sampled_idxs, gt_classes[sampled_idxs]

Results don't match the paper

I just tried running your ResNet-18 WS model on VOC07 (PascalVOC-Detection/oicr_WSR_18_DC5_1x.yaml). I changed the scales in the config to match the ones in the paper (i.e. the standard [480, 576, 688, 864, 1200]) for both training and testing. I got only ~42 mAP, whereas your paper reports ~51 mAP. This is quite a significant discrepancy. Any suggestions on how to reproduce the published results?
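One thing worth checking when changing scales: detectron2 resizes so the shorter side matches the requested scale, then caps the longer side at `INPUT.MAX_SIZE_TRAIN` / `INPUT.MAX_SIZE_TEST` (1333 in detectron2's defaults; the WSL configs may override this). The stdlib sketch below mirrors our understanding of the `ResizeShortestEdge` shape computation. For a 500x375 image (a common VOC size), a cap of 1333 turns the 1200 scale into a 1000x1333 (h x w) image, so changing the scales alone may not match the paper's setting if the paper used a larger cap. The cap the paper used is not stated in this thread.

```python
from typing import Tuple


def shortest_edge_resize(h: int, w: int, short_edge: int, max_size: int) -> Tuple[int, int]:
    """Output (height, width) after scaling the shorter side to `short_edge`,
    then shrinking so the longer side does not exceed `max_size`."""
    scale = short_edge / min(h, w)
    new_h, new_w = h * scale, w * scale
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    return int(new_h + 0.5), int(new_w + 0.5)  # round to nearest pixel


for s in (480, 576, 688, 864, 1200):
    print(s, shortest_edge_resize(375, 500, s, max_size=1333))
# the 1200 scale is capped: (1000, 1333)
```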

Exception: process 3 terminated with signal SIGKILL

Hi, when using more than 1 GPU I get the following error:


Traceback (most recent call last):
  File "projects/WSL/tools/train_net.py", line 257, in <module>
    args=(args,),
  File "DRN-WSOD-pytorch/detectron2/engine/launch.py", line 59, in launch
    daemon=False,
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
    (error_index, name)
Exception: process 3 terminated with signal SIGKILL

Do you know what could be the problem?
If I use only one GPU, I get a memory error when loading the weights.

This is my configuration:

----------------------  -------------------------------------------------------------------------------
sys.platform            linux
Python                  3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
numpy                   1.19.5
detectron2              0.2 @/home/usr_341317_ulta_com/work/brand-detection/DRN-WSOD-pytorch/detectron2
Compiler                GCC 8.3
CUDA compiler           CUDA 11.0
detectron2 arch flags   sm_70
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.7.1 @/opt/conda/lib/python3.7/site-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0,1,2,3             Tesla V100-SXM2-16GB
CUDA_HOME               /usr/local/cuda
Pillow                  8.1.0
torchvision             0.8.2 @/opt/conda/lib/python3.7/site-packages/torchvision
torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75, sm_80
fvcore                  0.1.4.post20210323
cv2                     4.5.1
----------------------  -------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.4 Product Build 20200917 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

Thanks
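A worker that dies with SIGKILL and no Python traceback is often the Linux out-of-memory killer at work (the kernel log, e.g. `dmesg`, usually confirms it), and the single-GPU memory error when loading weights may point the same way. Since `torch.multiprocessing.spawn` starts one process per GPU and each loads its own copy of the weights, host RAM needs grow roughly with the number of GPUs. The stdlib sketch below parses `MemAvailable` from `/proc/meminfo`-style text and compares it with a rough per-process estimate; `per_process_gib` is a figure you would have to measure (for example from a single-GPU run), not something derived from this report.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def fits_in_ram(meminfo_text: str, num_procs: int, per_process_gib: float) -> bool:
    """Rough check: do `num_procs` workers of `per_process_gib` each fit in MemAvailable?"""
    available_gib = parse_meminfo(meminfo_text)["MemAvailable"] / 2**20  # kB -> GiB
    return num_procs * per_process_gib <= available_gib
```

On Linux, `fits_in_ram(open("/proc/meminfo").read(), num_procs=4, per_process_gib=...)` gives a quick yes/no before launching with `--num-gpus 4`.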