xiaohangzhan / deocclusion
Code for our CVPR 2020 work.
License: Apache License 2.0
Hi, I think the offset
here
deocclusion/utils/data_utils.py
Line 90 in c8439ea
When I use the pretrained model from https://github.com/naoto0804/pytorch-inpainting-with-partial-conv, I get a KeyError. Can you tell me what's wrong?
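A KeyError at load time often means that the checkpoint's top-level layout or its parameter names differ from what the loading code expects. The sketch below is one hedged way to inspect and normalize such a checkpoint. The wrapper keys ('state_dict', 'model') and the 'module.' prefix are common conventions assumed here, not something confirmed for this repository or for the linked partial-conv weights, and normalize_checkpoint and diff_keys are hypothetical helpers.

```python
from collections import OrderedDict

# Hypothetical wrapper keys; many training scripts nest the weights under one of these.
WRAPPER_KEYS = ("state_dict", "model")


def normalize_checkpoint(ckpt, prefix="module."):
    """Return a flat {param_name: value} mapping from a loaded checkpoint dict."""
    # Unwrap one level if the weights are nested under a known wrapper key.
    for key in WRAPPER_KEYS:
        if key in ckpt and isinstance(ckpt[key], dict):
            ckpt = ckpt[key]
            break
    # Strip the prefix that DataParallel-style wrappers add to parameter names.
    flat = OrderedDict()
    for name, value in ckpt.items():
        if name.startswith(prefix):
            name = name[len(prefix):]
        flat[name] = value
    return flat


def diff_keys(expected, available):
    """Report names the model wants but the checkpoint lacks, and vice versa."""
    expected, available = set(expected), set(available)
    return sorted(expected - available), sorted(available - expected)
```

With PyTorch, ckpt would typically come from torch.load(path, map_location='cpu') and expected from model.state_dict().keys(); the two lists returned by diff_keys then show which name triggers the KeyError.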
Hi, thanks for your contribution!
I want to test the occlusion order on my own dataset, but I notice that demo_kins.ipynb needs annotation files, which I do not have.
already done
In demos/cocoa.ipynb,
def content_completion(pcnetc, image, input_size, modal, bboxes, amodal_patches_pred, category, idx, dilate, debug=False):
    rgb = cv2.resize(
        utils.crop_padding(image, bboxes[idx], pad_value=(0,0,0)),
        (input_size, input_size), interpolation=cv2.INTER_CUBIC)
    modal_patch = infer.resize_mask(
        utils.crop_padding(modal[idx], bboxes[idx], pad_value=(0,)), input_size, 'linear')
    amodal_patch = infer.resize_mask(
        amodal_patches_pred[idx], input_size, 'linear')
    ret, rgb_erased, vsb_mask = pcnetc.inference(
        rgb, modal_patch, category[idx].item(), amodal_patch, dilate=dilate, with_modal=True)
    ret = recover_image_patch(ret, bboxes[idx], image.shape[0], image.shape[1], (255,255,255))
    vsb_mask = infer.recover_mask(vsb_mask, bbox, image.shape[0], image.shape[1], 'linear')
    return ret, vsb_mask
but where is bbox defined? I guess it is a bug; it should be ori_bboxes[idx]?
Thanks for the work; the task is, by all means, a hard one. However, I want to leave a comment (also open to discussion): the model does not do an acceptable job of filling in content within the amodal masks. I show some context-filling results below, with the image ID at the top left of each. The original images are from COCOA/VAL2014. Current inpainting models may do better even though they know nothing about object layers.
subprocess.CalledProcessError: Command '['/home/james/anaconda3/bin/python', '-u', 'main.py', '--local_rank=1', '--config', 'experiments/COCOA/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.
I got this error when I set "python -m torch.distributed.launch --nproc_per_node=2 main.py" in the train.sh file. When I set "python -m torch.distributed.launch --nproc_per_node=1 main.py", it failed as well. I am trying to train your model on a machine with 2 GPUs and PyTorch 1.5.1 installed. How can this be solved? By the way, what are the recommended GPU requirements for training this model? Thank you!
Hi, do you plan to release the image manipulation app that you demonstrated in the GIF?
Hi, thank you for sharing the code of this nice work :)
It is written in the paper that you reproduced OrderNet.
I found the inference code of OrderNet, but not the training code or the model.
Can you share the code of training the OrderNet?
How to label my own data?
When I run sh experiments/COCOA/pcnet_c/train.sh, the following error is reported:
Original Traceback (most recent call last):
  File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_content_dataset.py", line 114, in __getitem__
    self._init_memcached()
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_content_dataset.py", line 51, in _init_memcached
    self.mclient = mc.MemcachedClient.GetInstance(server_list_config_file, client_config_file)
NameError: name 'mc' is not defined.
How to deal with it? Thanks!
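This NameError usually indicates that the optional mc (memcached client) module was never imported, for example because the import is guarded and the package is not installed, while the dataset still takes the memcached code path. Whether this repository exposes a config switch to turn memcached reading off is not confirmed here. The sketch below shows a general pattern for failing early with a readable message; optional_import and require are hypothetical helpers.

```python
import importlib


def optional_import(name):
    """Import a module by name, returning None instead of raising if it is absent."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(module, name, hint):
    """Raise a readable error when an optional dependency is actually needed."""
    if module is None:
        raise RuntimeError(f"'{name}' is not available: {hint}")
    return module


# 'mc' is the module named in the traceback above; it is usually absent
# outside clusters that ship their own memcached client.
mc = optional_import("mc")
```

A dataset class could then call require(mc, "mc", "read images from disk instead") at the point where it would otherwise touch mc, so the failure names the missing dependency directly.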
Hi, could you give me some advice on this error? The details of the experiment are as follows:
I followed the instructions to run training. PCNet-M trains fine, and I did convert the partialconv.pth model to accept 4 channel inputs. When I run "sh experiments/COCOA/pcnet_c/train.sh", I got the following error:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'
[2020-09-22 15:53:59,916] Validation Iter: [0] Time 0.443 (2.212) Data 0.015 (1.491) hole: 0.06159 (0.05562) valid: 0.05347 (0.05307) prc: 2.072 (2.004) style: 0.01656 (0.01629) $
v: 0.2303 (0.2479) dis: 0 (0) adv: 0 (0)
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 30, in main
trainer.run()
File ".../deocclusion/trainer.py", line 125, in run
self.train()
File ".../deocclusion/trainer.py", line 147, in train
loss_dict = self.model.step()
File ".../deocclusion/models/partial_completion_content_cgan.py", line 153, in step
gen_loss.backward()
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
File "main.py", line 48, in <module>
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 30, in main
main(args)
File "main.py", line 30, in main
trainer.run()
File ".../deocclusion/trainer.py", line 125, in run
trainer.run()
File ".../deocclusion/trainer.py", line 125, in run
self.train()
File ".../deocclusion/trainer.py", line 147, in train
self.train()
File ".../deocclusion/trainer.py", line 147, in train
loss_dict = self.model.step()
File ".../deocclusion/models/partial_completion_content_cgan.py", line 153, in step
loss_dict = self.model.step()
File ".../deocclusion/models/partial_completion_content_cgan.py", line 153, in step
gen_loss.backward()
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
gen_loss.backward()
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 30, in main
trainer.run()
File ".../deocclusion/trainer.py", line 125, in run
self.train()
File ".../deocclusion/trainer.py", line 147, in train
loss_dict = self.model.step()
File ".../deocclusion/models/partial_completion_content_cgan.py", line 153, in step
gen_loss.backward()
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
File ".../anaconda3/envs/python37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File ".../anaconda3/envs/python37/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['.../anaconda3/envs/python37/bin/python', '-u', 'main.py', '--local_rank=3', '--config', 'experiments/COCOA/pcnet_c/config.yaml', '--launcher', 'pytorch', '--load-pretrain', 'pretrains/partialconv_input_ch4.pth']' returned non-zero exit status 1.
Has anyone run into this error before? Any help would be much appreciated. Thanks!
How can I make my own COCOA dataset?
I have read http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_Semantic_Amodal_Segmentation_CVPR_2017_paper.pdf,
but the annotation tools are not open-sourced.
Hey, I tried to train the model. But when I ran demo_cocoa.ipynb with the model I got, I hit an error like this:
RuntimeError: ../experiments/COCOA/pcnet_m/checkpoints/ckpt_iter_56000.pth.tar is a zip archive (did you mean to use torch.jit.load()?)
Can you help me please?
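This message typically appears when a checkpoint written by PyTorch 1.6 or later, which saves in a zip-based format by default, is read by an older PyTorch. Assuming that is what happened here (for instance, training and the notebook running in different environments), the stdlib-only sketch below checks which format a file uses; is_zip_checkpoint is a hypothetical helper.

```python
import zipfile


def is_zip_checkpoint(path):
    """True if the file is a zip archive, i.e. the newer torch.save format."""
    return zipfile.is_zipfile(path)


# If this returns True and the PyTorch that loads the file is older than 1.6,
# either upgrade that PyTorch or re-save the file from the newer installation:
#   torch.save(obj, path, _use_new_zipfile_serialization=False)
```

Checking the file first separates a version mismatch from a genuinely corrupted download, which would fail in both old and new PyTorch versions.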
Hello, I encountered such an error:
Traceback (most recent call last):
File "./coco.py", line 498, in <module>
layers='heads')
File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 2207, in train
validation_data=next(val_generator),
File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 1604, in data_generator
use_mini_mask=config.USE_MINI_MASK)
File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 1163, in load_image_gt
mask, class_ids = dataset.load_mask(image_id)
File "./coco.py", line 249, in load_mask
image_info["width"])
File "./coco.py", line 308, in annToMask
rle = self.annToRLE(ann, height, width)
File "./coco.py", line 294, in annToRLE
rle = maskUtils.merge(rles)
File "pycocotools/_mask.pyx", line 145, in pycocotools._mask.merge (pycocotools/_mask.c:3173)
File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString (pycocotools/_mask.c:2605)
TypeError: Expected bytes, got str.
How to solve this? Thanks!
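With Python 3 and older pycocotools builds, this error is commonly triggered when the 'counts' field of an RLE reaches the C extension as a str rather than bytes. Assuming that is the cause here, one hedged workaround is to encode the field before calling functions such as maskUtils.merge; rle_to_bytes and rles_to_bytes are hypothetical helpers, and upgrading pycocotools may also resolve the problem.

```python
def rle_to_bytes(rle):
    """Return a copy of an RLE dict whose 'counts' is bytes, as the C API expects."""
    fixed = dict(rle)
    counts = fixed.get("counts")
    # Only compressed RLEs carry a string here; uncompressed ones hold a list.
    if isinstance(counts, str):
        fixed["counts"] = counts.encode("ascii")
    return fixed


def rles_to_bytes(rles):
    """Apply rle_to_bytes to every RLE in a list, e.g. before maskUtils.merge."""
    return [rle_to_bytes(r) for r in rles]
```

The helpers copy the dict instead of mutating it, so the loaded annotations can still be written back to JSON, which cannot serialize bytes.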
The "Training PCNet-C" section says to download the pre-trained partial-convolution image inpainting model to pretrains/partialconv.pth. I followed the suggested link but could not find this pre-trained model there. Where can I find it? Thank you!
Thanks for sharing your code. I'm new to this problem. When I looked into your IPython demo on the COCOA dataset, I found that the amodal completion result of PCNet-M is always inside the bounding box provided by the COCOA dataset. However, the bounding box provided by the COCOA dataset seems to cover only the modal annotation. Is that right? If so, I am confused about using the modal bounding box to restrict the amodal completion area of PCNet-M, and I don't know whether it has a bad influence on your training stage.
The figures below show an example I captured from demo_cocoa.ipynb. The image id is 2 (in code).
The bounding boxes are:
The amodal completions are:
At the download link you gave, all the published models appear to be corrupted. Can you upload them again?
Traceback (most recent call last):
  File "main.py", line 9, in <module>
    from trainer import Trainer
  File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/trainer.py", line 14, in <module>
    import datasets
  File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/datasets/__init__.py", line 1, in <module>
    from .reader import *
  File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/datasets/reader.py", line 8, in <module>
    import pycocotools.mask as maskUtils
ModuleNotFoundError: No module named 'pycocotools'
Traceback (most recent call last):
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/wwx/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', 'experiments/KINS/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.
/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
"The module torch.distributed.launch is deprecated "
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : main.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 8
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_muppckot/none_9kg5iq21
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/usr/local/lib/python3.7/dist-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/7/error.json
Traceback (most recent call last):
Traceback (most recent call last):
File "main.py", line 48, in
File "main.py", line 48, in
Traceback (most recent call last):
File "main.py", line 48, in
Traceback (most recent call last):
File "main.py", line 48, in
main(args)
File "main.py", line 29, in main
Traceback (most recent call last):
File "main.py", line 48, in
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
Traceback (most recent call last):
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
main(args)
File "main.py", line 29, in main
main(args)
File "main.py", line 29, in main
Traceback (most recent call last):
File "main.py", line 48, in
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
broadcast_params(self.module)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
work = default_pg.broadcast([tensor], opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
File "main.py", line 48, in
Traceback (most recent call last):
File "main.py", line 48, in
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
main(args)
File "main.py", line 29, in main
main(args)
File "main.py", line 29, in main
main(args)
File "main.py", line 29, in main
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
main(args)
File "main.py", line 29, in main
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
main(args)
File "main.py", line 29, in main
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
broadcast_params(self.module)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in init
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in init
dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
work = default_pg.broadcast([tensor], opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
super(PartialCompletionMask, self).init(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in init
broadcast_params(self.module)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
work = default_pg.broadcast([tensor], opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in init
broadcast_params(self.module)broadcast_params(self.module)broadcast_params(self.module)broadcast_params(self.module)
broadcast_params(self.module)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
dist.broadcast(p, 0) dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
dist.broadcast(p, 0)dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
work = default_pg.broadcast([tensor], opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).work = default_pg.broadcast([tensor], opts)
work = default_pg.broadcast([tensor], opts)work = default_pg.broadcast([tensor], opts)RuntimeError
: RuntimeErrorNCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
RuntimeError: work = default_pg.broadcast([tensor], opts):
NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).RuntimeError
: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 6 (pid: 1211) of binary: /usr/bin/python3
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/7/error.json
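The log above shows nproc_per_node set to 8 and paths under /content/drive, which suggests a hosted notebook. One frequent cause of ncclInvalidUsage is launching more processes than there are visible GPUs, so that several ranks end up on the same device; that diagnosis is an assumption, not something the log proves. The sketch below resolves the local rank from the LOCAL_RANK environment variable, as the deprecation warning recommends, and rejects a rank that has no GPU of its own. resolve_local_rank, check_rank_has_gpu, and the gpu_count argument are hypothetical.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Prefer LOCAL_RANK from the environment, fall back to --local_rank."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Older launchers pass --local_rank=N on the command line.
    parser.add_argument("--local_rank", type=int, default=0)
    args, _unknown = parser.parse_known_args(argv)
    return int(environ.get("LOCAL_RANK", args.local_rank))


def check_rank_has_gpu(local_rank, gpu_count):
    """Fail early if this rank would have to share a GPU with another rank."""
    if local_rank >= gpu_count:
        raise RuntimeError(
            f"local_rank {local_rank} has no GPU of its own "
            f"({gpu_count} visible); lower --nproc_per_node"
        )
    return local_rank
```

In a PyTorch script, gpu_count would typically be torch.cuda.device_count(), and the check would run before the process group is initialized.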
Hi Xiaohang, I ran your demos and the results are amazing. However, I could not find the code for the image manipulation operations mentioned in your paper, such as delete, shift, reposition, and swap. Is it possible to provide this part as well? Thanks a lot!
Hi! Thanks for sharing the excellent codebase. It's very helpful!
I came across an issue related to the backward pass while training the PCNet-C network using the PartialCompletionContentCGAN network. The lines responsible for the errors are:
# update
self.optimD.zero_grad()
dis_loss.backward()
utils.average_gradients(self.netD)
self.optimD.step()
self.optim.zero_grad()
gen_loss.backward()
utils.average_gradients(self.model)
self.optim.step()
If we comment out either of the .backward() lines, the error goes away.
I am using Pytorch 1.8.1.
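Since PyTorch 1.5, optimizer.step() counts as an in-place modification of the parameters. In the order quoted above, self.optimD.step() changes the discriminator weights before gen_loss.backward() runs, and if gen_loss was computed through the discriminator, the tensors saved for its backward pass are stale by then. The toy sketch below uses stand-in linear networks rather than the repository's models and shows one ordering that avoids this: update the generator first, then the discriminator. It assumes dis_loss is computed on a detached fake, so that it does not backpropagate into the generator.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
netG = nn.Linear(4, 4)   # stand-in generator
netD = nn.Linear(4, 1)   # stand-in discriminator
optimG = torch.optim.SGD(netG.parameters(), lr=0.1)
optimD = torch.optim.SGD(netD.parameters(), lr=0.1)


def train_step(x, real):
    fake = netG(x)
    # Generator loss flows through netD, so netD's weights must stay
    # untouched until this backward pass has finished.
    gen_loss = -netD(fake).mean()
    # Discriminator loss uses a detached fake, so it never needs netG's graph.
    dis_loss = netD(fake.detach()).mean() - netD(real).mean()

    optimG.zero_grad()
    gen_loss.backward()
    optimG.step()

    # Clear the gradients that gen_loss left on netD, then update netD.
    optimD.zero_grad()
    dis_loss.backward()
    optimD.step()
    return gen_loss.item(), dis_loss.item()
```

In the quoted code, the utils.average_gradients calls would stay directly after their respective backward() calls; only the relative order of the two update blocks changes.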
Hi Xiaohang,
Thanks for releasing your code; the demo is really amazing. Recently I tested a few images from the COCO validation set with your pre-trained models, and I also used your demo images. However, the results are disappointing and far from satisfactory. Could you please check?
This is COCOA/2.jpg and COCOA/2.json ground-truth annotation.
This is COCOA/2.jpg and CenterMask instance segmentation results.
The output segmentation result is slightly different from the ground truth, but, as we can see, the instance is not completed well.
After training PCNet-C on the KINS dataset, what do the generated images in the folder "/home/jddx/wxp/deocclusion/experiments/KINS/pcnet_c/images" mean?
Are they the content completion results for the validation set images?
These PNG images look like depth maps; they are pitch black.
Hi, I have run your demo and the results are good. However, when I use PyTorch's Mask R-CNN model to obtain the bounding boxes and modal masks, the inpainting result is bad. I visualized the bounding boxes and masks detected by Mask R-CNN and compared them with yours, and they do not differ much. Have you encountered this problem? Thanks a lot.
No such file or directory: '../data/COCOA/annotations/COCO_amodal_val2014.json'
How to make my own data set?
Hi Xiaohang Zhan,
Your framework seems to be elegant and easy to use. And the work you proposed is inspiring. However, I think there is one unreasonable computation on your metric (IOU).
Line 195 in c8439ea
As we can see, you sum the foreground pixels and background pixels over the whole dataset and then use the total pixel counts to get the final IoU. In my experience, this is not the common way to calculate IoU. Since you use the same computation for the other methods, the comparison in your paper is fair.
Could you please clarify this choice? I think it significantly enlarges the metric difference between Method Raw and Method PCNet-M.
Regards,
Qiang Zhou
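The two aggregation styles contrasted in this question can be sketched as follows. This is only an illustration on toy data: it assumes the dataset-level variant pools intersection and union pixel counts across instances, which may not match the repository's code at the referenced line exactly. The function names are hypothetical.

```python
def intersection_union(pred, target):
    """Count intersection and union pixels of two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def pooled_iou(pairs):
    """Sum intersections and unions over all instances, then divide once."""
    return sum(i for i, _ in pairs) / sum(u for _, u in pairs)


def mean_iou(pairs):
    """Compute IoU per instance, then average (assumes every union > 0)."""
    return sum(i / u for i, u in pairs) / len(pairs)


# Toy data: one large, well-predicted instance and one small, poorly predicted one.
large = intersection_union([1] * 90 + [0] * 10, [1] * 100)  # (90, 100)
small = intersection_union([1] * 2 + [0] * 8, [1] * 10)     # (2, 10)
pairs = [large, small]

print(pooled_iou(pairs))  # 92 / 110, dominated by the large instance
print(mean_iou(pairs))    # (0.9 + 0.2) / 2
```

In this toy case the pooled value is much higher than the per-instance mean because large instances carry more pixels, which is one way the two styles can diverge.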
Hi, I was testing the demo code on this RGB image.
I slowly increased the 'th' and 'dilate_kernel' from (0.1, 5), (0.3, 7), (0.5, 9), (0.7, 11) respectively as advised in #14 but the order matrix is still not correct.
From the order matrix, Instance 3 (table) is not considered as an occluder to Instance 5 (bench). May I ask for some advice as to how to improve the situation?
When I run demos/demo_kins.ipynb, the following error occurs. How should it be solved? Thank you!
JSONDecodeError Traceback (most recent call last)
in
6 annot_path = "../data/KINS/instances_{}.json".format(phase)
7
----> 8 data_reader = KINSLVISDataset('KINS', annot_path)
~/fuxian/Self-Supervised_Scene_De-occlusion/deocclusion-master/datasets/reader.py in __init__(self, dataset, annot_fn)
133 def __init__(self, dataset, annot_fn):
134 self.dataset = dataset
--> 135 data = cvb.load(annot_fn)
136 self.images_info = data['images']
137 self.annot_info = data['annotations']
~/anaconda3/envs/py3-env/lib/python3.7/site-packages/cvbase/io.py in load(file, format, **kwargs)
114 if format not in processors:
115 raise TypeError('Unsupported format: ' + format)
--> 116 return processors[format](file, **kwargs)
117
118
~/anaconda3/envs/py3-env/lib/python3.7/site-packages/cvbase/io.py in json_load(file)
18 if isinstance(file, str):
19 with open(file, 'r') as f:
---> 20 obj = json.load(f)
21 elif hasattr(file, 'read'):
22 obj = json.load(file)
~/anaconda3/envs/py3-env/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,
--> 296 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
297
298
~/anaconda3/envs/py3-env/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
346 parse_int is None and parse_float is None and
347 parse_constant is None and object_pairs_hook is None and not kw):
--> 348 return _default_decoder.decode(s)
349 if cls is None:
350 cls = JSONDecoder
~/anaconda3/envs/py3-env/lib/python3.7/json/decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
~/anaconda3/envs/py3-env/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
351 """
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
355 raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Unterminated string starting at: line 1 column 18350068 (char 18350067)
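An "Unterminated string" error at a very large character offset is often a sign that the annotation file is truncated, for example by an interrupted download, although that is only an assumption here. A minimal sketch for checking whether a file parses as JSON, using a hypothetical helper name and only the standard library:

```python
import json
import os


def check_json_file(path):
    """Return (ok, message) describing whether `path` parses as JSON."""
    size = os.path.getsize(path)
    try:
        with open(path, "r") as f:
            json.load(f)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where parsing failed.
        return False, "{} bytes, invalid JSON at char {}: {}".format(
            size, err.pos, err.msg)
    return True, "{} bytes, valid JSON".format(size)


# Usage (path pattern taken from the traceback above; adjust the phase):
# ok, message = check_json_file("../data/KINS/instances_val.json")
# print(ok, message)
```

If the check fails close to the end of the file, comparing the file size with the size listed at the download source and downloading it again is a reasonable next step.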
08/13/2020 08:26:47 PM INFO: Shutdown kernel
08/13/2020 08:26:47 PM WARNING: Exiting with nonzero exit status
Traceback (most recent call last):
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1961, in main
setup = process_command_line(sys.argv)
File "/root/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_command_line_handling.py", line 145, in process_command_line
raise ValueError("Unexpected option: " + argv[i])
ValueError: Unexpected option: --local_rank=0
Usage:
pydevd.py --port N [(--client hostname) | --server] --file executable [file_options]
Process finished with exit code 0
When I debug, this error occurs.
Excuse me, I have another question: how do I get the generated image from pcnet_c?
Sorry, the above is the code I found when I searched Google for the same error. I accidentally copied it. The following is the error I reported while training under the deocclusion file:
Traceback (most recent call last):
File "main.py", line 51, in
main(args)
File "main.py", line 31, in main
trainer.run()
File "/home/peng/python_pro/pig_pro/deocclusion-master/trainer.py", line 122, in run
self.validate('on_val')
File "/home/peng/python_pro/pig_pro/deocclusion-master/trainer.py", line 206, in validate
for i, inputs in enumerate(self.val_loader):
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_dataset.py", line 116, in __getitem__
idx, load_rgb=self.config['load_rgb'], randshift=True) # modal, uint8 {0, 1}
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_dataset.py", line 69, in _get_inst
modal, bbox, category, imgfn, _ = self.data_reader.get_instance(idx)
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/reader.py", line 108, in get_instance
modal, bbox, category = read_COCOA(reg, h, w)
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/reader.py", line 52, in read_COCOA
modal = maskUtils.decode(rle).squeeze()
File "pycocotools/_mask.pyx", line 138, in pycocotools._mask.decode
File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString
TypeError: Expected bytes, got str.
How should it be solved? Thank you!
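The traceback shows pycocotools rejecting a str where it expects bytes while decoding an RLE mask. One possible workaround, assuming the RLE dict was loaded from JSON so that its "counts" field is a Python str, is to encode that field before calling maskUtils.decode. The helper name below is hypothetical, and whether this is needed depends on the installed pycocotools version.

```python
def rle_counts_to_bytes(rle):
    """Return a copy of an RLE dict whose 'counts' field is bytes, not str."""
    fixed = dict(rle)  # shallow copy so the caller's dict is left untouched
    counts = fixed.get("counts")
    if isinstance(counts, str):
        # Compressed RLE strings use ASCII characters only.
        fixed["counts"] = counts.encode("ascii")
    return fixed


# Usage sketch (requires pycocotools; `rle` may be one dict or a list of dicts):
# import pycocotools.mask as maskUtils
# if isinstance(rle, list):
#     rle = [rle_counts_to_bytes(r) for r in rle]
# else:
#     rle = rle_counts_to_bytes(rle)
# modal = maskUtils.decode(rle).squeeze()
```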
Please add a Colab notebook. The interface can be taken from here: https://stackoverflow.com/questions/59630751/simple-ui-on-top-of-colab
I don't know how to download the COCOA annotations from the download link given in the README.
Hi, I am confused about setting shuffle=False in the train loader. Is there any special reason for this?
Line 95 in ac543f9
self.train_loader = DataLoader(train_dataset,
batch_size=args.data['batch_size'],
shuffle=False,
num_workers=args.data['workers'],
pin_memory=False,
sampler=train_sampler)
utils/scheduler.py", line 17, in init
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
"in param_groups[{}] when resuming an optimizer".format(i))
How can I solve this error?
How can I convert JSON files to the KINS format?
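The message says that a scheduler built with a non-initial iteration expects every optimizer param group to carry an 'initial_lr' entry. A minimal sketch of one workaround, assuming the current 'lr' of each group is an acceptable starting value (loading the saved optimizer state from a checkpoint is the more principled route when one exists); the helper name is hypothetical:

```python
def ensure_initial_lr(param_groups):
    """Add 'initial_lr' to each param group that lacks it, using its 'lr'."""
    for group in param_groups:
        group.setdefault("initial_lr", group["lr"])
    return param_groups


# Demonstration on plain dicts; in PyTorch, `optimizer.param_groups`
# is likewise a list of dicts, so the call would be
# ensure_initial_lr(optimizer.param_groups) before creating the scheduler.
groups = [{"lr": 0.001}, {"lr": 0.01, "initial_lr": 0.1}]
ensure_initial_lr(groups)
print(groups)
```

Existing 'initial_lr' values are left unchanged, so the helper only fills in groups that were created without one.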
Hi, thanks for your work!
If I train and validate on a custom dataset, how are the evaluation metrics acc_occpair and miu generated?
Are they computed from the predicted amodal masks and the generated amodal mask annotations? I ask because the custom dataset does not contain amodal or order annotations.
Can you answer my question? Thanks a lot.
ValueError Traceback (most recent call last)
Cell In[19], line 42
40 plt.axis('off')
41 plt.text(0, -10, title[i])
---> 42 pface, pedge = polygon_drawing(toshow[i], selidx, colors, bbox_show, thickness=3)
43 ax.add_collection(pface)
44 ax.add_collection(pedge)
File /deocclusion/demos/demo_utils.py:206, in polygon_drawing(masks, selidx, color_source, bbox, thickness)
204 masks = masks[:, u:b, l:r]
205 for i,am in enumerate(masks[selidx,...]):
--> 206 pts_list = reader.mask_to_polygon(am)
207 for pts in pts_list:
208 pts = np.array(pts).reshape(-1, 2)
File /deocclusion/datasets/reader.py:286, in mask_to_polygon(mask, tolerance, area_threshold)
284 contours = measure.find_contours(padded_mask, 0.5)
285 # Fix coordinates after padding
--> 286 contours = np.subtract(contours, 1)
287 for contour in contours:
288 if not np.array_equal(contour[0], contour[-1]):
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
This error occurred when I ran amodal completion inference. How can I solve it?
The order score is 0.82706 and the mIoU is 0.76812, but in the paper they are 87.1% and 81.35%.
Can you help me? Thank you.
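The failing line passes a list of contours to np.subtract. Newer NumPy releases refuse to build a single array from arrays of different lengths, which would explain the "inhomogeneous shape" message if the contours differ in length (an assumption about this case). A sketch of shifting each contour separately, with a hypothetical helper name:

```python
import numpy as np


def shift_contours(contours, offset=1):
    """Subtract `offset` from every contour individually.

    Avoids stacking contours of different lengths into one array.
    """
    return [np.asarray(c, dtype=float) - offset for c in contours]


# Two contours with different numbers of points, as find_contours may return.
contours = [np.full((5, 2), 3.0), np.full((8, 2), 3.0)]
shifted = shift_contours(contours)
print([c.shape for c in shifted])
```

Under this assumption, line 286 of reader.py in the traceback could use the same list comprehension in place of the single np.subtract call.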
In the second step, "Run demos/demo_cocoa.ipynb or demos/demo_kins.ipynb", I don't know how to run or use files in '.ipynb' format.
All I can do is open them.