
deocclusion's People

Contributors

doubledaibo, hno2, praeclarumjj3, xiaohangzhan


deocclusion's Issues

How to run on my own dataset?

Hi, thanks for your contribution!
I want to test the occlusion order on my own dataset, but I notice that demo_kins.ipynb needs annotation files, which I do not have.
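
One possible route (a sketch, not an official recipe from the authors): wrap your own modal masks in a minimal COCO-style annotation dict, since the demo's readers index data['images'] and data['annotations']. The helper name and the exact field set below are assumptions; adjust them to whatever the KINS/COCOA reader actually expects.

import numpy as np
import pycocotools.mask as maskUtils

def to_coco_ann(binary_mask, image_id, ann_id, category_id=1):
    """Wrap one binary modal mask as a COCO-style annotation entry (hypothetical helper)."""
    rle = maskUtils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    x, y, w, h = maskUtils.toBbox(rle).tolist()
    area = float(maskUtils.area(rle))
    rle['counts'] = rle['counts'].decode('ascii')  # make the RLE JSON-serializable
    return {'id': ann_id, 'image_id': image_id, 'category_id': category_id,
            'segmentation': rle, 'bbox': [x, y, w, h], 'area': area, 'iscrowd': 0}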

About function 'content_completion'

In demos/cocoa.ipynb,

def content_completion(pcnetc, image, input_size, modal, bboxes, amodal_patches_pred, category, idx, dilate, debug=False):
    rgb = cv2.resize(
        utils.crop_padding(image, bboxes[idx], pad_value=(0,0,0)),
        (input_size, input_size), interpolation=cv2.INTER_CUBIC)
    modal_patch = infer.resize_mask(
       utils.crop_padding(modal[idx], bboxes[idx], pad_value=(0,)), input_size, 'linear')
    amodal_patch = infer.resize_mask(
        amodal_patches_pred[idx], input_size, 'linear')
    ret, rgb_erased, vsb_mask = pcnetc.inference(
        rgb, modal_patch, category[idx].item(), amodal_patch, dilate=dilate, with_modal=True)
    ret = recover_image_patch(ret, bboxes[idx], image.shape[0], image.shape[1], (255,255,255))
    vsb_mask = infer.recover_mask(vsb_mask, bbox, image.shape[0], image.shape[1], 'linear')
    return ret, vsb_mask

But where is `bbox` defined? I guess it is a bug; should it be `ori_bboxes[idx]`?
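
For reference, a minimal sketch of the suspected fix; whether the intended variable is `bboxes[idx]` (the per-instance box already in scope) or an extra `ori_bboxes[idx]` argument is a guess, not confirmed by the authors:

# hypothetical correction of the final recover_mask call above
vsb_mask = infer.recover_mask(vsb_mask, bboxes[idx], image.shape[0], image.shape[1], 'linear')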

Results not as expected as shown in the paper

Thanks for the work; this is by all means a hard task. However, I want to leave a comment (and I am open to discussion) that the model does not do an acceptable job of filling in content for amodal masks. I show some content-filling results below, with the image ID on the top left of each. The original images are from COCOA/val2014. Current inpainting models may do better even though they know nothing about object layers.

image

image

image

subprocess.CalledProcessError

subprocess.CalledProcessError: Command '['/home/james/anaconda3/bin/python', '-u', 'main.py', '--local_rank=1', '--config', 'experiments/COCOA/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

I got this error when I set "python -m torch.distributed.launch --nproc_per_node=2 main.py" in the train.sh file. When I set "python -m torch.distributed.launch --nproc_per_node=1 main.py", it failed as well. I tried training your model on a machine with 2 GPUs and PyTorch 1.5.1 installed. How can this be solved? By the way, what is the recommended GPU requirement for training this model? Thank you!

training code for supervised training

Hi, thank you for sharing the code of this nice work :)

It is written in the paper that you reproduced OrderNet.

I found the inference code for OrderNet, but not the training code or the model.
Could you share the code for training OrderNet?

Inference result is not as expected, totally fails

Hi, I want to use this code to complete the human body. The input mask is extracted with Mask R-CNN, but the result is not as expected:
image
I also tried the example image in the repo, but the result is also bad:
image
What is the problem? Many thanks!

NameError: name 'mc' is not defined

When I run sh experiments/COCOA/pcnet_c/train.sh, the following error is reported:
Original Traceback (most recent call last):
  File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_content_dataset.py", line 114, in __getitem__
    self._init_memcached()
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_content_dataset.py", line 51, in _init_memcached
    self.mclient = mc.MemcachedClient.GetInstance(server_list_config_file, client_config_file)
NameError: name 'mc' is not defined.
How to deal with it? Thanks!

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi, could you give me some advice on this error? The details of the experiment are as follows:

  1. Dataset: COCOA
  2. Environment: Python 3.7.9, PyTorch 1.6.0
  3. Downloaded pretrains/partialconv.pth from here

I followed the instructions to run training. PCNet-M trains fine, and I converted the partialconv.pth model to accept 4-channel inputs. When I run "sh experiments/COCOA/pcnet_c/train.sh", I get the following error:

*****************************************                                                                                                                                                                         
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.  
*****************************************                                                                                                                                                                         
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.                                     
  config = yaml.load(f)                                                                                                                                                                                           
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'                                                                                                                                                       
[2020-09-22 15:53:59,916] Validation Iter: [0]  Time 0.443 (2.212)      Data 0.015 (1.491)      hole: 0.06159 (0.05562)  valid: 0.05347 (0.05307)        prc: 2.072 (2.004)      style: 0.01656 (0.01629)        tv: 0.2303 (0.2479)      dis: 0 (0)      adv: 0 (0)
Traceback (most recent call last):
  File "main.py", line 48, in <module>
    main(args)
  File "main.py", line 30, in main
    trainer.run()
  File ".../deocclusion/trainer.py", line 125, in run
    self.train()
  File ".../deocclusion/trainer.py", line 147, in train
    loss_dict = self.model.step()
  File ".../deocclusion/models/partial_completion_content_cgan.py", line 153, in step
    gen_loss.backward()
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
  File ".../anaconda3/envs/python37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File ".../anaconda3/envs/python37/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['.../anaconda3/envs/python37/bin/python', '-u', 'main.py', '--local_rank=3', '--config', 'experiments/COCOA/pcnet_c/config.yaml', '--launcher', 'pytorch', '--load-pretrain', 'pretrains/partialconv_input_ch4.pth']' returned non-zero exit status 1.

Has anyone run into this error before? Any help would be much appreciated. Thanks!

training problem

Hey, I tried to train the model, but when I used the checkpoint I obtained to run demo_cocoa.ipynb, I got an error like this:
RuntimeError: ../experiments/COCOA/pcnet_m/checkpoints/ckpt_iter_56000.pth.tar is a zip archive (did you mean to use torch.jit.load()?)
Can you help me, please?
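
For what it's worth, this error usually appears when a checkpoint saved by PyTorch >= 1.6 (new zip format) is loaded with an older PyTorch. A hedged workaround, assuming the checkpoint path above, is to upgrade the loading environment, or re-save the file in the legacy format from a >= 1.6 environment:

import torch

# Re-save a zip-format checkpoint in the legacy serialization so an older
# PyTorch can read it (run this in an environment with PyTorch >= 1.6).
ckpt = torch.load("experiments/COCOA/pcnet_m/checkpoints/ckpt_iter_56000.pth.tar",
                  map_location="cpu")
torch.save(ckpt, "experiments/COCOA/pcnet_m/checkpoints/ckpt_iter_56000_legacy.pth.tar",
           _use_new_zipfile_serialization=False)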

TypeError: Expected bytes, got str.

Hello, I encountered such an error:
Traceback (most recent call last):
  File "./coco.py", line 498, in <module>
    layers='heads')
  File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 2207, in train
    validation_data=next(val_generator),
  File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 1604, in data_generator
    use_mini_mask=config.USE_MINI_MASK)
  File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 1163, in load_image_gt
    mask, class_ids = dataset.load_mask(image_id)
  File "./coco.py", line 249, in load_mask
    image_info["width"])
  File "./coco.py", line 308, in annToMask
    rle = self.annToRLE(ann, height, width)
  File "./coco.py", line 294, in annToRLE
    rle = maskUtils.merge(rles)
  File "pycocotools/_mask.pyx", line 145, in pycocotools._mask.merge (pycocotools/_mask.c:3173)
  File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString (pycocotools/_mask.c:2605)
TypeError: Expected bytes, got str.
How to solve this? Thanks!

Are the bboxes of COCOA dataset incorrectly used in this code?

Thanks for sharing your code. I am new to this problem. When I looked into your IPython demo on the COCOA dataset, I found that the amodal completion result of PCNet-M always lies inside the bounding box provided by the COCOA dataset. However, that bounding box seems to cover only the modal annotation. Is that right? If so, I am confused about using the modal bounding box to restrict the amodal completion area of PCNet-M, and I do not know whether it harms your training stage.
The figures below show an example I captured from demo_cocoa.ipynb. The image id is 2 (in code).
The bounding boxes are:
image
The amodal completions are:
image
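
For context, a minimal sketch (an assumption, not the repo's exact code) of the kind of box enlargement that would give the amodal completion room to grow beyond the modal bounding box:

def enlarge_bbox(bbox, ratio=2.0):
    """Return a square crop box centred on the instance; 'ratio' is an assumed factor."""
    x, y, w, h = bbox                        # modal box in [x, y, w, h]
    cx, cy = x + w / 2.0, y + h / 2.0
    size = max(w, h) * ratio                 # leave margin for the occluded part
    return [cx - size / 2.0, cy - size / 2.0, size, size]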

please help:subprocess.CalledProcessError: Command '['/home/wwx/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', 'experiments/KINS/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

Traceback (most recent call last):
  File "main.py", line 9, in <module>
    from trainer import Trainer
  File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/trainer.py", line 14, in <module>
    import datasets
  File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/datasets/__init__.py", line 1, in <module>
    from .reader import *
  File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/datasets/reader.py", line 8, in <module>
    import pycocotools.mask as maskUtils
ModuleNotFoundError: No module named 'pycocotools'
Traceback (most recent call last):
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/wwx/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/wwx/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', 'experiments/KINS/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

!sh experiments/COCOA/pcnet_m/train.sh # you may have to set --nproc_per_node=#YOUR_GPUS, I have modified the nproc_per_node =1. Thank you

/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
"The module torch.distributed.launch is deprecated "
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : main.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 8
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_muppckot/none_9kg5iq21
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/usr/local/lib/python3.7/dist-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/7/error.json
Traceback (most recent call last):
  File "main.py", line 48, in <module>
    main(args)
  File "main.py", line 29, in main
    trainer = Trainer(args)
  File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in __init__
    args.model, load_pretrain=args.load_pretrain, dist_model=True)
  File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in __init__
    super(PartialCompletionMask, self).__init__(params, dist_model)
  File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in __init__
    self.model = utils.DistModule(self.model)
  File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in __init__
    broadcast_params(self.module)
  File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
    dist.broadcast(p, 0)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
    work = default_pg.broadcast([tensor], opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
(the same traceback is printed by each of the eight worker processes)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 6 (pid: 1211) of binary: /usr/bin/python3
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_1/7/error.json

manipulation

Hi Xiaohang, I ran your demos and the results are amazing. However, I could not find the code for image manipulation such as delete, shift, reposition, and swap, as mentioned in your paper. Is it possible to also provide this part? Thanks a lot!

Bugs in `.backward()` while training PCNet-C.

Hi! Thanks for sharing the excellent codebase. It's very helpful!

I came across an issue related to the backward pass while training the PCNet-C network using the PartialCompletionContentCGAN network. The lines responsible for the errors are:

# update
self.optimD.zero_grad()
dis_loss.backward()
utils.average_gradients(self.netD)
self.optimD.step()


self.optim.zero_grad()
gen_loss.backward()
utils.average_gradients(self.model)
self.optim.step()

If we comment out either of the .backward() lines, the error goes away.

I am using Pytorch 1.8.1.
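
A hedged note on one common workaround (not necessarily the authors' intended fix): on PyTorch >= 1.5, self.optimD.step() updates the discriminator parameters in place, so a later gen_loss.backward() that reuses the discriminator forward computed before that step sees stale saved tensors (the [1, 512, 4, 4] tensor in the error message looks like a discriminator conv weight). Running the generator backward/step before the discriminator update avoids this, assuming dis_loss is computed on detached generator outputs, as is typical; the alternative is to recompute the discriminator forward for the adversarial term after optimD.step().

# sketch of the reordered update (variable names taken from the snippet above)
self.optim.zero_grad()
gen_loss.backward()                  # generator first, while the D graph is still valid
utils.average_gradients(self.model)
self.optim.step()

self.optimD.zero_grad()
dis_loss.backward()                  # assumes dis_loss used detached fake images
utils.average_gradients(self.netD)
self.optimD.step()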

Concerning the overfitting problem in PCNet-M and PCNet-C

Hi Xiaohang,

Thanks for releasing the code; the demo is really amazing. Recently I tested a few images from the COCO validation set with your pre-trained models, and I also used the demo images you provided. However, the results are frustrating and far from satisfactory. Could you please check?

This is COCOA/2.jpg and COCOA/2.json ground-truth annotation.
image

This is COCOA/2.jpg and CenterMask instance segmentation results.
image

The output segmentation result is slightly different from the ground truth, but, as we can see, the instance is not completed well.

Obtained Result

After training PCNet-C on the KINS dataset, what do the generated images in the folder "/home/jddx/wxp/deocclusion/experiments/KINS/pcnet_c/images" mean?
Are they the content completions of the validation-set images?
These PNG pictures look like depth maps; they are pitch black.

another segmentation model fails

Hi, I ran your demo and the results are good. However, when I tried to use the PyTorch Mask R-CNN model to obtain the bounding boxes and modal masks, the inpainting result was bad. I visualized the bounding boxes and masks detected by Mask R-CNN and compared them with yours, and they do not differ much. Have you encountered this problem? Thanks a lot.

Question about Loader in demo_cocoa.ipynb

Screenshot 2022-11-02 11:19:57

Hi, I ran into a problem in demo_cocoa.ipynb. It seems to come from the yaml package, so I tried changing yaml.load(f) to yaml.safe_load or a full load, or adding a Loader argument, but the result is still the same. Does anyone know how to deal with this issue? Thank you in advance.

Question about your metric computation

Hi Xiaohang Zhan,

Your framework seems elegant and easy to use, and the work you propose is inspiring. However, I think there is one unreasonable computation in your metric (IoU).

miou = intersection_rec.sum / (union_rec.sum + 1e-10) # mIoU

As we can see, you sum the intersection pixels and the union pixels over the whole dataset and then divide the totals to get the final IoU. In my experience, this is not the common way to calculate IoU, although you also apply it to the other methods, so the comparison within your paper is fair.

Could you please clarify this computation style? I think it significantly enlarges the metric gap between the Raw method and PCNet-M.

Regards,
Qiang Zhou
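
For reference, a minimal illustration (with made-up numbers and hypothetical helper names) of the two conventions being contrasted: pooling all pixel counts over the dataset before dividing, versus averaging each instance's own IoU.

import numpy as np

def dataset_level_iou(inters, unions):
    # pools pixel counts over the whole dataset, as in the quoted line
    return np.sum(inters) / (np.sum(unions) + 1e-10)

def per_instance_mean_iou(inters, unions):
    # averages each instance's own ratio instead
    return np.mean(np.asarray(inters) / (np.asarray(unions) + 1e-10))

inters = [90, 5]     # one large instance matched well, one small instance matched poorly
unions = [100, 50]
print(dataset_level_iou(inters, unions))      # ~0.63, dominated by the large instance
print(per_instance_mean_iou(inters, unions))  # 0.50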

Order matrix not correct even after adjusting 'th' and 'dilate_kernel'

Hi, I was testing out the demo codes on this RGB image
image

and its instance mask
image

I gradually increased 'th' and 'dilate_kernel' to (0.1, 5), (0.3, 7), (0.5, 9), and (0.7, 11), as advised in #14, but the order matrix is still not correct.

image

From the order matrix, Instance 3 (table) is not considered an occluder of Instance 5 (bench). May I ask for some advice on how to improve the situation?
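
A small debugging sketch (not the repo's API; the dilation-plus-threshold logic is only an assumption about what 'th' and 'dilate_kernel' control) for checking whether two modal masks even come into contact after dilation; if the overlap stays below the threshold, the pair is treated as non-adjacent and no order can be predicted for it:

import cv2
import numpy as np

def dilated_overlap(mask_a, mask_b, dilate_kernel=11):
    """Return the pixel overlap of two binary masks after dilation, plus overlap ratios."""
    k = np.ones((dilate_kernel, dilate_kernel), np.uint8)
    a = cv2.dilate(mask_a.astype(np.uint8), k)
    b = cv2.dilate(mask_b.astype(np.uint8), k)
    inter = int(np.logical_and(a, b).sum())
    return inter, inter / max(int(mask_a.sum()), 1), inter / max(int(mask_b.sum()), 1)

# e.g. compare the table (instance 3) and bench (instance 5) modal masks here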

ValueError: Unterminated string

Hi, when I run demos/demo_kins.ipynb, the following error occurs. How should it be solved? Thank you!

JSONDecodeError Traceback (most recent call last)
in
6 annot_path = "../data/KINS/instances_{}.json".format(phase)
7
----> 8 data_reader = KINSLVISDataset('KINS', annot_path)

~/fuxian/Self-Supervised_Scene_De-occlusion/deocclusion-master/datasets/reader.py in __init__(self, dataset, annot_fn)
133 def __init__(self, dataset, annot_fn):
134 self.dataset = dataset
--> 135 data = cvb.load(annot_fn)
136 self.images_info = data['images']
137 self.annot_info = data['annotations']

~/anaconda3/envs/py3-env/lib/python3.7/site-packages/cvbase/io.py in load(file, format, **kwargs)
114 if format not in processors:
115 raise TypeError('Unsupported format: ' + format)
--> 116 return processors[format](file, **kwargs)
117
118

~/anaconda3/envs/py3-env/lib/python3.7/site-packages/cvbase/io.py in json_load(file)
18 if isinstance(file, str):
19 with open(file, 'r') as f:
---> 20 obj = json.load(f)
21 elif hasattr(file, 'read'):
22 obj = json.load(file)

~/anaconda3/envs/py3-env/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,
--> 296 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
297
298

~/anaconda3/envs/py3-env/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
346 parse_int is None and parse_float is None and
347 parse_constant is None and object_pairs_hook is None and not kw):
--> 348 return _default_decoder.decode(s)
349 if cls is None:
350 cls = JSONDecoder

~/anaconda3/envs/py3-env/lib/python3.7/json/decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):

~/anaconda3/envs/py3-env/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
351 """
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
355 raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Unterminated string starting at: line 1 column 18350068 (char 18350067)
08/13/2020 08:26:47 PM INFO: Shutdown kernel
08/13/2020 08:26:47 PM WARNING: Exiting with nonzero exit status
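
An "Unterminated string" at a fixed byte offset usually means the annotation JSON was only partially downloaded or got truncated. A quick sanity check (the path is an assumption; use whichever phase the notebook loads):

import json
import os

path = "../data/KINS/instances_val.json"   # assumed; match the demo's annot_path
print("size on disk:", os.path.getsize(path), "bytes")
with open(path) as f:
    json.load(f)   # raises JSONDecodeError at the truncation point if the file is corrupt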

amodal_pred_ours

Hello, I found that the amodal_pred_ours image was not marked when I ran the demo. May I ask what the problem might be?
image

ValueError: Unexpected option: --local_rank=0

Traceback (most recent call last):
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1961, in main
    setup = process_command_line(sys.argv)
  File "/root/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_command_line_handling.py", line 145, in process_command_line
    raise ValueError("Unexpected option: " + argv[i])
ValueError: Unexpected option: --local_rank=0
Usage:
  pydevd.py --port N [(--client hostname) | --server] --file executable [file_options]

Process finished with exit code 0

This error occurs when I debug.

TypeError: Expected bytes, got str.

Sorry, the traceback above is one I found when searching Google for the same error and copied by accident. The following is the error I get while training under the deocclusion directory:
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    main(args)
  File "main.py", line 31, in main
    trainer.run()
  File "/home/peng/python_pro/pig_pro/deocclusion-master/trainer.py", line 122, in run
    self.validate('on_val')
  File "/home/peng/python_pro/pig_pro/deocclusion-master/trainer.py", line 206, in validate
    for i, inputs in enumerate(self.val_loader):
  File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_dataset.py", line 116, in __getitem__
    idx, load_rgb=self.config['load_rgb'], randshift=True)  # modal, uint8 {0, 1}
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_dataset.py", line 69, in _get_inst
    modal, bbox, category, imgfn, _ = self.data_reader.get_instance(idx)
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/reader.py", line 108, in get_instance
    modal, bbox, category = read_COCOA(reg, h, w)
  File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/reader.py", line 52, in read_COCOA
    modal = maskUtils.decode(rle).squeeze()
  File "pycocotools/_mask.pyx", line 138, in pycocotools._mask.decode
  File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString
TypeError: Expected bytes, got str.
How should it be solved? Thank you!
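
A hedged workaround (not the repo's official fix): older pycocotools builds expect the RLE 'counts' field as bytes on Python 3, while the COCOA JSON stores it as str, so encoding it before decoding avoids this TypeError. Upgrading pycocotools may also resolve it.

import pycocotools.mask as maskUtils

def decode_rle_safe(rle):
    """Decode an RLE dict whose 'counts' may be a str instead of bytes."""
    if isinstance(rle.get('counts'), str):
        rle = dict(rle, counts=rle['counts'].encode('ascii'))
    return maskUtils.decode(rle)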

error when resume checkpoints

utils/scheduler.py", line 17, in init
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
"in param_groups[{}] when resuming an optimizer".format(i))

evaluation issues

Hi, thanks for your work!
If I train and validate with a custom dataset, how are the evaluation metrics acc_occpair and miou generated?
Are they computed between the predicted amodal masks and generated amodal mask annotations? My custom dataset does not contain amodal or order annotations.
Could you answer my question? Thanks a lot.

contours = np.subtract(contours, 1) error

ValueError Traceback (most recent call last)
Cell In[19], line 42
40 plt.axis('off')
41 plt.text(0, -10, title[i])
---> 42 pface, pedge = polygon_drawing(toshow[i], selidx, colors, bbox_show, thickness=3)
43 ax.add_collection(pface)
44 ax.add_collection(pedge)

File /deocclusion/demos/demo_utils.py:206, in polygon_drawing(masks, selidx, color_source, bbox, thickness)
204 masks = masks[:, u:b, l:r]
205 for i,am in enumerate(masks[selidx,...]):
--> 206 pts_list = reader.mask_to_polygon(am)
207 for pts in pts_list:
208 pts = np.array(pts).reshape(-1, 2)

File /deocclusion/datasets/reader.py:286, in mask_to_polygon(mask, tolerance, area_threshold)
284 contours = measure.find_contours(padded_mask, 0.5)
285 # Fix coordinates after padding
--> 286 contours = np.subtract(contours, 1)
287 for contour in contours:
288 if not np.array_equal(contour[0], contour[-1]):

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

This error occurred when I ran amodal completion inference. How can I solve it?
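
A hedged sketch of a possible fix for the quoted line in datasets/reader.py: skimage.measure.find_contours returns a list of (N_i, 2) arrays of different lengths, and NumPy >= 1.24 refuses to build a single array from that ragged list, so shift each contour individually instead.

from skimage import measure
import numpy as np

def shifted_contours(padded_mask):
    """Find contours on a 1-pixel-padded mask and undo the padding offset per contour."""
    contours = measure.find_contours(padded_mask, 0.5)
    return [np.subtract(contour, 1) for contour in contours]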
