zjhuang22 / maskscoring_rcnn
Codes for paper "Mask Scoring R-CNN".
License: MIT License
I tried changing the WORLD_SIZE and CUDA_VISIBLE_DEVICES environment variables, but I got errors when initializing the process group:
os.environ['WORLD_SIZE'] = '2'
os.environ["CUDA_VISIBLE_DEVICES"] = "1,9"
num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
args.distributed = num_gpus > 1
Traceback (most recent call last):
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1741, in
main()
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/opt/maskscoring_RCNN/train_net.py", line 178, in
main()
File "/opt/maskscoring_RCNN/train_net.py", line 145, in main
backend="nccl", init_method="env://"
File "/root/anaconda3/lib/python3.6/site-packages/torch/distributed/deprecated/init.py", line 101, in init_process_group
group_name, rank)
RuntimeError: rank is not set but it is required for env:// init method at /opt/conda/conda-bld/pytorch_1549628766161/work/torch/lib/THD/process_group/General.cpp:20
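The error says that the env:// init method could not find a rank. As far as I know, env:// reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK from the environment, so setting WORLD_SIZE alone is not enough; the torch.distributed.launch helper normally sets these for every process it starts. Below is a minimal sketch for checking the environment before init_process_group is called. The helper name missing_env_init_vars is hypothetical and not part of the repo.

```python
import os

# Variables that the env:// init method expects to find in the environment.
REQUIRED_ENV_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_env_init_vars(environ=None):
    """Return the required variables that are absent (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if name not in environ]


if __name__ == "__main__":
    # Mirrors the post above: only WORLD_SIZE was set by hand.
    print(missing_env_init_vars({"WORLD_SIZE": "2"}))
```

If the list is not empty when train_net.py is started directly (for example from an IDE debugger), launching through python -m torch.distributed.launch instead may be the simpler route.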
Thanks for your great work!
In your paper, speed and computation is mentioned as "Our MaskIoU head has about 0.39G FLOPs while Mask head has about 0.53G FLOPs for each proposal."
As far as I know, there should be at least 10 proposals for the mask head in each image, so that's 3.9G and 5.3G FLOPs for the MaskIoU and Mask heads. However, ResNet-18 is about 2G FLOPs, so why didn't the MaskIoU head lead to slower inference? Thank you!
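The arithmetic behind this question can be written out as a small sketch. The per-proposal numbers come from the paper quote above, and the proposal count of 10 is the asker's assumption, not a value taken from the repo.

```python
# Per-proposal cost quoted from the paper, in GFLOPs.
MASKIOU_GFLOPS_PER_PROPOSAL = 0.39
MASK_GFLOPS_PER_PROPOSAL = 0.53


def head_gflops(per_proposal_gflops, num_proposals):
    """Total head cost for one image, assuming it scales linearly with proposals."""
    return per_proposal_gflops * num_proposals


if __name__ == "__main__":
    num_proposals = 10  # assumption made in the question
    print(round(head_gflops(MASKIOU_GFLOPS_PER_PROPOSAL, num_proposals), 2))
    print(round(head_gflops(MASK_GFLOPS_PER_PROPOSAL, num_proposals), 2))
```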
When I use a single GPU to train the network, I get the error "AttributeError: 'list' object has no attribute 'resize'". Could you please tell me how to solve this problem? Thank you very much.
PyTorch version: 1.1.0.dev20190506
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 7.5.17
GPU models and configuration: GPU 0: GeForce GTX TITAN X
Nvidia driver version: 418.40.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.3
/usr/local/lib/libcudnn.so.5.1.10
Versions of relevant libraries:
[pip] numpy==1.16.3
[pip] torch==1.1.0.dev20190506
[pip] torchvision==0.2.3a0+d534785
[conda] blas 1.0 mkl
[conda] mkl 2019.3 199
[conda] mkl_fft 1.0.12 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch-nightly 1.1.0.dev20190506 py3.7_cuda9.0.176_cudnn7.5.1_0 pytorch
Pillow (6.0.0)
2019-05-08 22:23:42,841 maskrcnn_benchmark INFO: Loaded configuration file configs/e2e_ms_rcnn_R_50_FPN_1x.yaml
2019-05-08 22:23:42,842 maskrcnn_benchmark INFO:
.
.
.
.
2019-05-08 22:23:58,578 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
File "tools/train_net.py", line 172, in
main()
File "tools/train_net.py", line 165, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 74, in train
arguments,
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in next
return self._process_next_batch(batch)
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 85, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 36, in getitem
img, anno = super(COCODataset, self).getitem(idx)
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torchvision-0.2.3a0+d534785-py3.7.egg/torchvision/datasets/coco.py", line 114, in getitem
img, target = self.transforms(img, target)
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 14, in call
image, target = t(image, target)
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in call
target = target.resize(image.size)
AttributeError: 'list' object has no attribute 'resize'
Has anyone reproduced the AP results shown in the paper?
I mean the results in the paper, not the AP results on GitHub.
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 143, in main
cfg.merge_from_file(args.config_file)
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 213, in merge_from_file
self.merge_from_other_cfg(cfg)
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 217, in merge_from_other_cfg
_merge_a_into_b(cfg_other, self, self, [])
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 460, in _merge_a_into_b
_merge_a_into_b(v, b[k], root, key_list + [k])
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 473, in _merge_a_into_b
raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.PRETRAINED_MODELS'
loading annotations into memory...
Done (t=14.28s)
creating index...
index created!
loading annotations into memory...
Done (t=1.97s)
creating index...
index created!
2019-05-06 18:22:29,797 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in next
return self._process_next_batch(batch)
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 85, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 36, in getitem
img, anno = super(COCODataset, self).getitem(idx)
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torchvision-0.2.3a0+7a4845a-py3.7.egg/torchvision/datasets/coco.py", line 114, in getitem
img, target = self.transforms(img, target)
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 15, in call
image, target = t(image, target)
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in call
target = target.resize(image.size)
AttributeError: 'list' object has no attribute 'resize'
(with the coco2017 dataset)
Can anyone tell me how to fix it? Thanks!
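In both tracebacks, torchvision's coco.py calls self.transforms(img, target), and the repo's resize transform then receives a plain list. One possible explanation (an assumption on my side, not confirmed by the authors) is that the installed torchvision build applies an attribute named transforms itself, and that this name collides with the attribute the repo's COCODataset stores. The sketch below reproduces that failure mode with stand-in classes; every class and function name in it is hypothetical.

```python
class ParentDataset:
    """Stand-in for a parent dataset that applies self.transforms to raw annotations."""

    transforms = None

    def __getitem__(self, idx):
        img = "image"
        target = [{"bbox": [0, 0, 10, 10]}]  # raw annotations: a plain list
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target


class Boxes:
    """Stand-in for a box container that knows how to resize itself."""

    def __init__(self, annotations):
        self.annotations = annotations

    def resize(self, size):
        return self


def resize_transform(img, target):
    # Expects a Boxes-like target, not a raw list.
    return img, target.resize((800, 600))


class CollidingDataset(ParentDataset):
    def __init__(self, transforms):
        self.transforms = transforms  # same attribute name the parent uses

    def __getitem__(self, idx):
        img, anno = super().__getitem__(idx)  # parent already calls the transform
        return self.transforms(img, Boxes(anno))


class RenamedDataset(ParentDataset):
    def __init__(self, transforms):
        self._transforms = transforms  # private name, invisible to the parent

    def __getitem__(self, idx):
        img, anno = super().__getitem__(idx)
        return self._transforms(img, Boxes(anno))
```

If this is the cause, two things to try are installing the torchvision version that the install instructions expect, or renaming the attribute in the dataset class as in RenamedDataset.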
I want to test on my own image and see the segmentation and bounding box results. Please tell me how to do this.
In the calculation of mask_iou_targets, the pre_mask here has not gone through a sigmoid, so I wonder why you use 0.5 as a threshold.
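For context on this question: thresholding a probability at 0.5 is the same as thresholding the raw logit at 0, because sigmoid(0) = 0.5 and the sigmoid is monotonic. Applying 0.5 directly to values that have not gone through a sigmoid is therefore a different cut. The sketch below only illustrates that equivalence; whether pre_mask is a logit or a probability at that point in the repo is not something it settles.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize_probability(logit, threshold=0.5):
    """Binarize after the sigmoid, using a probability threshold."""
    return sigmoid(logit) > threshold


def binarize_logit(logit):
    """Equivalent binarization applied directly to the logit."""
    return logit > 0.0


if __name__ == "__main__":
    for value in (-3.0, -0.1, 0.0, 0.1, 2.5):
        print(value, binarize_probability(value), binarize_logit(value))
```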
I followed install.md to set up the environment. In the last step, which installs PyTorch maskscoring_rcnn, I got errors like this:
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:15:17: error: switch quantity not an integer
switch (TYPE) {
^
/home//github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:16:44: error: could not convert ‘Double’ from ‘c10::ScalarType’ to ‘’
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, VA_ARGS)
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:8:8: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
case enum_type: {
^
/home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:17:44: error: could not convert ‘Float’ from ‘c10::ScalarType’ to ‘’
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, VA_ARGS)
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:8:8: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
case enum_type: {
^
/home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
In file included from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/Scalar.h:10:0,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/core/Type.h:8,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Type.h:2,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
from /hom/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
from /home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/vision.h:3,
from /home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:2:
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/ScalarType.h:122:28: note: ‘c10::toString’
static inline const char * toString(ScalarType t) {
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/ScalarType.h:122:28: note: ‘c10::toString’
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/ScalarType.h:122:28: note: ‘c10::toString’
error: command 'gcc' failed with exit status 1
can you please help me?
Traceback (most recent call last):
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/tools/train_net.py", line 18, in
from maskrcnn_benchmark.engine.inference import inference
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/engine/inference.py", line 20, in
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in
from maskrcnn_benchmark.layers import nms as _box_nms
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/layers/init.py", line 8, in
from .nms import nms
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/layers/nms.py", line 3, in
from maskrcnn_benchmark import _C
ImportError: cannot import name '_C'
Hi, I want to ask which score type you use when you evaluate box AP: the cls score or the mask score? If you used the mask score to evaluate box AP, the box AP should drop, but in your repo the box AP is nearly unchanged. So I think the box AP and mask AP results come from two different scores. Do you test twice?
Training stopped after only 1880 iterations. May I ask what might be the reason for this?
2019-04-22 02:19:55,592 maskrcnn_benchmark.trainer INFO: eta: 9 days, 1:55:33 iter: 1840 loss: 0.6163 (0.7650) loss_classifier: 0.3722 (0.4438) loss_box_reg: 0.1609 (0.1872) loss_mask: 0.0471 (0.0711) loss_maskiou: 0.0093 (0.0110) loss_objectness: 0.0095 (0.0252) loss_rpn_box_reg: 0.0180 (0.0267) time: 1.0529 (1.0924) data: 0.0552 (0.0508) lr: 0.002500 max mem: 7203
2019-04-22 02:20:16,294 maskrcnn_benchmark.trainer INFO: eta: 9 days, 1:47:48 iter: 1860 loss: 0.6061 (0.7632) loss_classifier: 0.3617 (0.4429) loss_box_reg: 0.1469 (0.1867) loss_mask: 0.0573 (0.0709) loss_maskiou: 0.0125 (0.0111) loss_objectness: 0.0080 (0.0250) loss_rpn_box_reg: 0.0149 (0.0266) time: 1.0085 (1.0918) data: 0.0435 (0.0507) lr: 0.002500 max mem: 7203
2019-04-22 02:20:37,496 maskrcnn_benchmark.trainer INFO: eta: 9 days, 1:43:25 iter: 1880 loss: 0.5906 (0.7615) loss_classifier: 0.3566 (0.4420) loss_box_reg: 0.1496 (0.1863) loss_mask: 0.0485 (0.0707) loss_maskiou: 0.0114 (0.0111) loss_objectness: 0.0094 (0.0248) loss_rpn_box_reg: 0.0148 (0.0265) time: 1.0790 (1.0915) data: 0.0493 (0.0507) lr: 0.002500 max mem: 7203
Process finished with exit code -1
Facebook's Mask R-CNN has lots of files, and I just want to see the code for your algorithm. I already know how Mask R-CNN is implemented, so can you tell me directly where I can find your network design and your loss function?
Thanks.
Hi! I am using coco2017 dataset and have changed paths in paths_catalog from 2014 to corresponding 2017 ones. When I run the project I got this error:
No such file or directory: 'datasets/coco/annotations/instances_valminusminival2017.json'
It seems there is no such json file in coco datasets. May I ask what are these minival json files? How can I reproduce one?
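As far as I know, the minival and valminusminival files are custom splits of the 2014 validation set, and the 2017 release of COCO re-split the same images so that train2017 plays the role of the 2014 train + valminusminival set and val2017 plays the role of minival. There would then be no valminusminival file to look for in 2017. Below is a sketch of what 2017 catalog entries could look like. The entry names and the dict layout are assumptions and may not match the format that paths_catalog uses in this repo; only the annotation file names follow the official COCO 2017 download.

```python
# Hypothetical catalog entries for COCO 2017 (names and layout are assumptions).
COCO_2017_DATASETS = {
    "coco_2017_train": {
        "img_dir": "coco/train2017",
        "ann_file": "coco/annotations/instances_train2017.json",
    },
    "coco_2017_val": {
        "img_dir": "coco/val2017",
        "ann_file": "coco/annotations/instances_val2017.json",
    },
}

# With these entries, a config would reference one train set and one test set,
# instead of the two 2014 train sets plus minival.
TRAIN_SETS = ("coco_2017_train",)
TEST_SETS = ("coco_2017_val",)
```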
When I use your script to run it with multiple GPUs, an error occurs:
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 140, in main
backend="nccl", init_method="env://"
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/distributed/deprecated/__init__.py", line 101, in init_process_group
group_name, rank)
RuntimeError: Address already in use at /opt/conda/conda-bld/pytorch-nightly_1552799380021/work/torch/lib/THD/process_group/General.cpp:20
Traceback (most recent call last):
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/distributed/launch.py", line 238, in <module>
main()
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/distributed/launch.py", line 234, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/chh/anaconda2/envs/maskrcnn_benchmark/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/e2e_ms_rcnn_R_50_FPN_1x.yaml']' returned non-zero exit status 1.
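"Address already in use" usually means that another process, often a stale or parallel training run, is still bound to the master port that the launcher uses by default. A minimal sketch for picking a free port with the standard library follows; the helper name find_free_port is hypothetical.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The returned port can then be passed to the launcher through its --master_port option. The port is released before the launcher starts, so another process could in principle take it in between; checking for leftover training processes first is also worth doing.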
In your code, the MaskIoU value is predicted by the network and trained with an L2 loss.
But the ground-truth MaskIoU lies in [0, 1], so would it be better to apply a ReLU after the linear layer that predicts the IoU?
My gpus are 2* RTX2080
os = ubuntu 16.04.6
cuda version = 9.0.176
python = 3.6
pytorch = 1.0.1
When I run the script
export NGPUS=2
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 700
I get this error:
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 167, in main
test(cfg, model, args.distributed)
File "tools/train_net.py", line 104, in test
maskiou_on=cfg.MODEL.MASKIOU_ON
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/engine/inference.py", line 379, in inference
predictions = compute_on_dataset(model, data_loader, device)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/engine/inference.py", line 31, in compute_on_dataset
output = model(images)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 51, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 43, in forward
loss_maskiou, detections = self.maskiou(roi_feature, detections, selected_mask, labels, maskiou_targets)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/maskiou_head.py", line 41, in forward
x = self.feature_extractor(features, selected_mask)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/roi_maskiou_feature_extractors.py", line 39, in forward
x = torch.cat((x, mask_pool), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 47 and 250 in dimension 0 at /opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THC/generic/THCTensorMath.cu:83
Thanks for the great work!
It seems the link provided in README is for ImageNet pretrained models? Could you please provide the R-50/R-101 MS-RCNN model that is used to produce your results in the paper?
I used the script "python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "maskscoring_rcnn/configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 TEST.IMS_PER_BATCH 2" to run the code twice. On the second try I got an error: UnboundLocalError: local variable 'iteration' referenced before assignment.
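One way this error can arise (an assumption, since the post does not include the traceback): the training loop that appears in other tracebacks on this page iterates with "for iteration, (images, targets, _) in enumerate(data_loader, start_iter)". If the second run resumes from a checkpoint that already reached the last iteration, the loader yields nothing, the loop body never runs, and any later use of iteration fails. The sketch below reproduces that pattern with hypothetical function names.

```python
def last_iteration(batches, start_iter):
    """Return the last loop index; fails if the loop body never runs."""
    for iteration, _batch in enumerate(batches, start_iter):
        pass
    return iteration  # unbound when `batches` is empty


def last_iteration_guarded(batches, start_iter):
    """Same loop, but the name is bound before the loop starts."""
    iteration = start_iter
    for iteration, _batch in enumerate(batches, start_iter):
        pass
    return iteration


if __name__ == "__main__":
    print(last_iteration_guarded([], 5))
    try:
        last_iteration([], 5)
    except UnboundLocalError as error:
        print("reproduced:", error)
```

If this matches your case, starting the second run with a fresh output directory, or raising SOLVER.MAX_ITER, would be things to check.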
Hi @zjhuang22,
do you plan to release the DCN code?
Have you reproduced this on mmdetection? @zjhuang22
We mainly followed your code to modify mmdetection, but we get a 1.5 drop in detection and no gain on instance segmentation compared with Mask R-CNN on mmdetection.
What are the differences between the data representations of the two frameworks? Are there any points that need attention?
The pretrained network gives strange results, even for images from the COCO dataset (the predictions are almost random).
Looks like the problem is the weights files.
What I've done:
configs/e2e_ms_rcnn_R_50_FPN_1x.yaml
and configs/e2e_ms_rcnn_R_101_FPN_1x.yaml
Hello, thank you for your great work!
I want to know how to obtain the visual results that you show.
I just want to test one image and show its mask result.
Can you tell me what I should do?
Maybe "tests/test_data_samplers.py", but I don't know how to run it.
Thanks!
Hi @zjhuang22,
I have a question: in this line, it seems that it returns one image's mask and label info. If one GPU holds more than one image, how do I get the corresponding mask and label?
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 84 and 366 in dimension 0 at /opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THC/generic/THCTensorMath.cu:83
I got this when I ran train_net. My dataset is coco2014.
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
return load_c2_format(self.cfg, f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 155, in load_c2_format
return C2_FORMAT_LOADER[cfg.MODEL.BACKBONE.CONV_BODY](cfg, f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 146, in load_resnet_c2_format
state_dict = _load_c2_pickled_weights(f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 124, in _load_c2_pickled_weights
data = pickle.load(f, encoding="latin1")
_pickle.UnpicklingError: pickle data was truncated.
I don't know why.
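"pickle data was truncated" generally means the file ends before the pickle stream does, which is what an interrupted download of the pretrained weights would produce. Below is a minimal sketch for checking whether a file unpickles completely, mirroring the pickle.load(f, encoding="latin1") call in the traceback. The helper name is hypothetical, and it should only be run on files you trust, since unpickling executes code from the file.

```python
import io
import pickle


def unpickles_completely(fileobj):
    """Return True if the stream unpickles without ending prematurely."""
    try:
        pickle.load(fileobj, encoding="latin1")
    except (pickle.UnpicklingError, EOFError):
        return False
    return True


if __name__ == "__main__":
    payload = pickle.dumps({"weights": b"\x00" * 1000})
    # A complete stream, then the same stream cut in half.
    print(unpickles_completely(io.BytesIO(payload)))
    print(unpickles_completely(io.BytesIO(payload[: len(payload) // 2])))
```

For a real weights file, open it with open(path, "rb") and pass the file object. If the check fails, deleting the cached file and downloading it again is the first thing to try.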
Hello, I can only use one GPU with your code. What should I do if I want to use multiple GPUs? I noticed that you use "torch.distributed.deprecated.init_process_group()" to set it up, but there is not much material about how to use it. Can you tell me how to use it? Thank you very much!
2019-03-12 15:03:22,550 maskrcnn_benchmark.trainer INFO: Start training
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/home/king/githubToolkit/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 631, in next
idx, batch = self._get_batch()
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
return self.data_queue.get()
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 274, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2753) exited unexpectedly with exit code 127. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
Why does this occur? Is it because my GPU memory or system shared memory is limited? Or did the ln command cause some mistake? Has anyone else met this problem?
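The symbol lookup errors above name PySlice_Unpack. As far as I know, that C API function was added in CPython 3.6.1, so a libtorch_python.so built against a newer 3.6.x could fail in this way under Python 3.6.0. This is only a guess about this particular environment. A small sketch for checking the interpreter version follows; the helper name is hypothetical.

```python
import sys

# PySlice_Unpack is documented as new in CPython 3.6.1.
MIN_VERSION_WITH_PYSLICE_UNPACK = (3, 6, 1)


def interpreter_has_pyslice_unpack(version_info=None):
    """Check whether a CPython version is new enough to export PySlice_Unpack."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= MIN_VERSION_WITH_PYSLICE_UNPACK


if __name__ == "__main__":
    print(sys.version_info[:3], interpreter_has_pyslice_unpack())
```

If the check returns False inside the conda environment, updating Python to a later 3.6.x patch release may resolve the worker crashes.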
for segmentation_mask, proposal in zip(segmentation_masks, proposals):
    cropped_mask = segmentation_mask.crop(proposal)
    scaled_mask = cropped_mask.resize((M, M))
    mask = scaled_mask.convert(mode="mask")
    masks.append(mask)
    if maskiou_on:
        x1 = int(proposal[0])
        y1 = int(proposal[1])
        x2 = int(proposal[2]) + 1
        y2 = int(proposal[3]) + 1
        for poly_ in segmentation_mask.polygons:
            poly = np.array(poly_, dtype=np.float32)
            x1 = np.minimum(x1, poly[0::2].min())
            x2 = np.maximum(x2, poly[0::2].max())
            y1 = np.minimum(y1, poly[1::2].min())
            y2 = np.maximum(y2, poly[1::2].max())
        img_h = segmentation_mask.size[1]
        img_w = segmentation_mask.size[0]
        x1 = np.maximum(x1, 0)
        x2 = np.minimum(x2, img_w-1)
        y1 = np.maximum(y1, 0)
        y2 = np.minimum(y2, img_h-1)
        segmentation_mask_for_maskratio = segmentation_mask.crop([x1, y1, x2, y2])
Here is the code for segmentation_mask_for_maskratio. If I am right, segmentation_mask_for_maskratio is the segmentation covering the whole object. So why not directly calculate the area of segmentation_mask, instead of first cropping the area [x1, y1, x2, y2]? Thanks in advance!
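To make the question concrete, the box computation in the quoted snippet can be restated as a standalone function: it takes the union of the proposal box and the extent of every polygon, then clips the result to the image. This is a sketch that mirrors the snippet with plain Python instead of NumPy; the function name full_object_box is hypothetical.

```python
def full_object_box(proposal, polygons, img_w, img_h):
    """Union of the proposal box and all polygon extents, clipped to the image.

    proposal: [x1, y1, x2, y2]
    polygons: list of flat coordinate lists [x0, y0, x1, y1, ...]
    """
    x1, y1 = int(proposal[0]), int(proposal[1])
    x2, y2 = int(proposal[2]) + 1, int(proposal[3]) + 1
    for poly in polygons:
        xs, ys = poly[0::2], poly[1::2]  # even entries are x, odd entries are y
        x1, x2 = min(x1, min(xs)), max(x2, max(xs))
        y1, y2 = min(y1, min(ys)), max(y2, max(ys))
    # Clip to the valid pixel range of the image.
    return [max(x1, 0), max(y1, 0), min(x2, img_w - 1), min(y2, img_h - 1)]


if __name__ == "__main__":
    # The proposal covers only part of the object outlined by the polygon.
    print(full_object_box([10, 10, 20, 20], [[5, 5, 30, 5, 30, 25, 5, 25]], 100, 100))
```

The resulting box contains both the proposal and the complete ground-truth polygon, which is consistent with the reading in the question that the crop covers the whole object.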
Loading and preparing results...
DONE (t=3.93s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=44.62s).
Accumulating evaluation results...
DONE (t=5.38s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.380
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.161
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.379
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.517
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.298
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.450
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.468
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.505
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.619
2019-05-09 06:57:34,621 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.3759153459368749), ('AP50', 0.5908149513831737), ('AP75', 0.407623512259985), ('APs', 0.21592651568910282), ('APm', 0.40564621888303226), ('APl', 0.4964606429195723)])), ('segm', OrderedDict([('AP', 0.35404836208584806), ('AP50', 0.558382551808117), ('AP75', 0.379614103783724), ('APs', 0.1613041010105462), ('APm', 0.37909971606652115), ('APl', 0.5174932965096802)]))])
Hi, I used the repo to train on train2017 and test on val2017 (8 GPUs); the backbone is ResNet-50. I did not change any other config, but the final mask AP is 35.4, which is 0.2 lower than 35.6. Is this normal performance variation?
Hi, thanks for your work. I am trying this code, however I cannot train because of maskrcnn_benchmark\layers\nms.py. The code stops at the line keep = _box_nms(boxes, score, nms_thresh) in maskrcnn_benchmark\structures\boxlist_ops.py, and there is no error. My environment is gcc 5.4.0, PyTorch stable 1.0, Python 3.7, CUDA 8.0. How can I solve this problem? Thanks!
Hi, I want to train on a different dataset. What should I modify?
Looking forward to your reply.
Hi, nice job. But I'm still a little confused about the inference procedure. In the original version of Mask R-CNN, NMS is applied to the proposals from the RPN, and the model uses the remaining proposals to generate cls scores and bbox refinements. After that, another NMS is applied to the detections from the cls head and bbox head. Finally, the remaining boxes are fed to the mask head to get the final result.
However, according to your paper, the mask scores are used to refine the scores of the cls head, which means we first need the mask head outputs and then attach them to the cls head. I'm confused about how the refined scores can help. Do you use the refined scores to redo NMS among the detections from the cls head and bbox head, and then feed them to the mask head again to get new segmentation results? If I am correct, do you run the mask head twice to get more accurate results?
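As I read the paper, the mask head is not run twice: the masks are predicted once for the detections kept after box NMS, the MaskIoU head predicts an IoU for each predicted mask, and the final mask score is the classification score multiplied by that predicted MaskIoU. Only the ranking of the masks changes, which is what AP is sensitive to. A minimal sketch of that rescoring step follows; the function name rescore_masks and the numbers are illustrative, not taken from the repo.

```python
def rescore_masks(cls_scores, predicted_mask_ious):
    """Mask score = classification score * predicted MaskIoU, per detection."""
    return [cls * iou for cls, iou in zip(cls_scores, predicted_mask_ious)]


def ranking(scores):
    """Detection indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    cls_scores = [0.9, 0.8]          # detection 0 is the more confident class prediction
    predicted_mask_ious = [0.5, 0.9]  # but its mask is predicted to be poor
    mask_scores = rescore_masks(cls_scores, predicted_mask_ious)
    print(ranking(cls_scores), ranking(mask_scores))
```

In this toy case the detection with the better mask moves to the top of the ranking, while the masks themselves are unchanged.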
I checked the paper, and many thanks for the awesome work and for sharing the code!
Previously I thought "MaskIoU" meant that you train the mask with IoU directly (dice loss): you predict a mask for box i of class A, use the ground truth of box i of class A to compute the IoU, and use that as a loss to train your Mask R-CNN in a multi-task learning fashion.
Checking the paper more carefully, I realized you feed the predicted mask to a CNN to predict the IoU rather than computing it directly as in https://github.com/kevinzakka/pytorch-goodies/blob/master/losses.py#L54. I wonder if this approach is prone to overfitting. Have you compared the results with and without "concatenation" to see how much the MaskIoU head actually relies on the RoI feature maps?
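To separate the two ideas in this question: the IoU between a binarized predicted mask and its ground-truth mask can be computed directly when the ground truth is available, and, as I understand the paper, that value is the regression target of the MaskIoU head, which has to predict it at test time without ground truth. A minimal sketch of the direct computation on small binary masks follows; the function name mask_iou is hypothetical and this is not the repo's implementation.

```python
def mask_iou(pred, gt):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks have no union; report 0.0 instead of dividing by zero.
    return intersection / union if union else 0.0


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    print(mask_iou(predicted, ground_truth))  # one shared pixel out of three covered
```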
Could anyone explain to me the meaning of mask_ratios and why we need to compute it? Thanks in advance. The calculation of mask_ratios is here:
Since I want to add your code to my own and train it, I want to reproduce your results. Can you provide the training log files?
(maskrcnn) wuyi@nclab:~/github/maskscoring_rcnn/tools$ python test_net.py
Traceback (most recent call last):
File "test_net.py", line 12, in
from maskrcnn_benchmark.engine.inference import inference
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/engine/inference.py", line 20, in
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in
from maskrcnn_benchmark.layers import nms as _box_nms
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/layers/init.py", line 8, in
from .nms import nms
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/layers/nms.py", line 3, in
from maskrcnn_benchmark import _C
ImportError: /media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _Z20ROIPool_forward_cudaRKN2at6TensorES2_fii
I'm confused. Can you help me with it?
I trained my model with your method, but something went wrong. Could you give me some suggestions?
The error message is as follows:
Traceback (most recent call last):
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 194, in DupFd
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 48, in init
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 194, in DupFd
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 48, in init
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 149, in _serve
send(conn, destination_pid)
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 50, in send
reduction.send_handle(conn, new_fd, pid)
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 179, in send_handle
with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/socket.py", line 463, in fromfd
nfd = dup(fd)
OSError: [Errno 24] Too many open files
Traceback (most recent cal
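"[Errno 24] Too many open files" means the process has hit its per-process file descriptor limit, which can happen when data-loader workers pass tensors between processes through file descriptors. A minimal sketch, assuming a Unix system and that raising the soft limit is acceptable there: the standard-library resource module can inspect and raise the limit from inside the training script. The function name and the target value 4096 are hypothetical.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE towards `target`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # the platform refused the new value; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_open_file_limit()
print("open-file limit (soft, hard):", soft, hard)
```

Running `ulimit -n` in the shell before launching training changes the same limit. PyTorch also provides torch.multiprocessing.set_sharing_strategy("file_system"), which avoids passing file descriptors between workers.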
2019-03-15 20:43:52,007 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from catalog://ImageNetPretrained/MSRA/R-50
2019-03-15 20:43:52,007 maskrcnn_benchmark.utils.checkpoint INFO: catalog://ImageNetPretrained/MSRA/R-50 points to https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
2019-03-15 20:43:52,008 maskrcnn_benchmark.utils.checkpoint INFO: url https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl cached in pretrained_models/R-50.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
return load_c2_format(self.cfg, f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 155, in load_c2_format
return C2_FORMAT_LOADER[cfg.MODEL.BACKBONE.CONV_BODY](cfg, f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 146, in load_resnet_c2_format
state_dict = _load_c2_pickled_weights(f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 124, in _load_c2_pickled_weights
data = pickle.load(f, encoding="latin1")
_pickle.UnpicklingError: invalid load key, '\x00'.
When I run setup.py build develop, I get this error:
copying build/lib.linux-x86_64-3.6/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so -> maskrcnn_benchmark
error: could not create 'maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so': No such file or directory
I am trying to reproduce your work. When I run train_net, I get the error below. By the way, I am using COCO 2017 rather than COCO 2014. Is the dataset selection related to this error?
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
return load_c2_format(self.cfg, f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 155, in load_c2_format
return C2_FORMAT_LOADER[cfg.MODEL.BACKBONE.CONV_BODY](cfg, f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 146, in load_resnet_c2_format
state_dict = _load_c2_pickled_weights(f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 124, in _load_c2_pickled_weights
data = pickle.load(f, encoding="latin1")
_pickle.UnpicklingError: invalid load key, '<'.
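The "invalid load key" messages report the first byte that pickle found in the cached weights file. A leading '<' commonly indicates that an HTML or XML error page was saved in place of the model, and a leading '\x00' suggests a truncated or zero-filled download. The sketch below is a hypothetical helper, not part of the repository, that inspects the first bytes of a cached file before training starts.

```python
def sniff_weight_file(path):
    """Classify a cached weights file by its first bytes (heuristic only)."""
    with open(path, "rb") as f:
        head = f.read(16)
    if not head:
        return "empty"
    if head.lstrip().startswith(b"<"):
        return "markup"         # probably an HTML/XML error page
    if head[0] == 0x00:
        return "zero-bytes"     # probably an incomplete download
    if head[0] == 0x80:
        return "binary-pickle"  # pickle protocol 2 or newer starts with 0x80
    return "unknown"
```

If the result is anything other than "binary-pickle", deleting the cached file (pretrained_models/R-50.pkl in the log above) and fetching it again is a reasonable first step.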
Hi,
Thank you for your work! A notebook example that runs prediction on given images with a COCO-trained model, like the demo in the original repo, would be really helpful for everyone interested in the model.
https://github.com/facebookresearch/maskrcnn-benchmark/tree/master/demo
Thanks!
loading annotations into memory...
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 60, in train
start_iter=arguments["iteration"],
File "/home/yg/github/maskscoring_rcnn/maskrcnn_benchmark/data/build.py", line 149, in make_data_loader
datasets = build_dataset(dataset_list, transforms, DatasetCatalog, is_train)
File "/home/yg/github/maskscoring_rcnn/maskrcnn_benchmark/data/build.py", line 41, in build_dataset
dataset = factory(**args)
File "/home/yg/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 13, in init
super(COCODataset, self).init(root, ann_file)
File "/home/yg/anaconda3/envs/ms-rcnn/lib/python3.5/site-packages/torchvision/datasets/coco.py", line 97, in init
self.coco = COCO(annFile)
File "/home/yg/anaconda3/envs/ms-rcnn/lib/python3.5/site-packages/pycocotools-2.0-py3.5-linux-x86_64.egg/pycocotools/coco.py", line 84, in init
dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/yg/datasets/coco/annotations/instances_train2014.json'
It seems that the same problem occurred in maskrcnn-benchmark:
https://github.com/facebookresearch/maskrcnn-benchmark/issues/345
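The traceback shows that the loader expects annotations/instances_train2014.json under the dataset root. A minimal sketch of a pre-flight check, with a hypothetical helper name, that lists the expected files missing on disk so the mismatch is visible before training starts:

```python
import os


def missing_files(root, relative_paths):
    """Return the entries of relative_paths that do not exist under root."""
    return [
        rel for rel in relative_paths
        if not os.path.isfile(os.path.join(root, rel))
    ]


# Only the file named in the traceback is listed; extend the list with
# whatever other files the dataset configuration refers to.
expected = ["annotations/instances_train2014.json"]
print(missing_files(os.path.expanduser("~/datasets/coco"), expected))
```

An empty list means every expected file was found. Otherwise, either the files need to be placed (or symlinked) at those paths or the dataset paths in the configuration need to point at the files that do exist.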
When I execute the command:
python tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
An error is returned:
2019-03-11 14:57:30,345 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from catalog://ImageNetPretrained/MSRA/R-50
2019-03-11 14:57:30,345 maskrcnn_benchmark.utils.checkpoint INFO: catalog://ImageNetPretrained/MSRA/R-50 points to https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
Downloading: "https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl" to pretrained_models/R-50.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 171, in
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/media//ffd15abb-ef51-4903-a331-ef8327a5864a/DukTo/svn/object-detection/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/media//ffd15abb-ef51-4903-a331-ef8327a5864a/DukTo/svn/object-detection/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 128, in _load_file
cached_f = cache_url(f, model_dir=self.cfg.MODEL.PRETRAINED_MODELS)
File "/media//ffd15abb-ef51-4903-a331-ef8327a5864a/DukTo/svn/object-detection/maskscoring_rcnn/maskrcnn_benchmark/utils/model_zoo.py", line 54, in cache_url
_download_url_to_file(url, cached_file, hash_prefix, progress=progress)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 88, in _download_url_to_file
u = urlopen(url)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
How can I get the pretrained model some other way?