xiaohangzhan / deocclusion

Code for our CVPR 2020 work.

License: Apache License 2.0

Python 97.41% Shell 2.59%

deocclusion's Issues

box offset of eraser in PCNet-M

Hi, I think the offset here

deocclusion/utils/data_utils.py

Line 90 in c8439ea

bbox = (int(offx * h), int(offy * h), w, h)

should be bbox = (int(offx * w), int(offy * h), w, h).
Is it a bug?
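
To make the suspected mix-up concrete, here is a minimal sketch. The function name `eraser_bbox`, the `scale_x_by_width` switch, and the reading of `offx`/`offy` as fractional offsets are assumptions for illustration and are not taken from `data_utils.py`. It only shows that the two formulas diverge whenever the box is not square.

```python
def eraser_bbox(offx, offy, w, h, scale_x_by_width=True):
    """Return (x, y, w, h); offx/offy are assumed to be fractional offsets."""
    # The quoted line scales the x offset by h; the proposal scales it by w.
    x = int(offx * w) if scale_x_by_width else int(offx * h)
    y = int(offy * h)
    return (x, y, w, h)


current = eraser_bbox(0.5, 0.5, w=200, h=100, scale_x_by_width=False)
proposed = eraser_bbox(0.5, 0.5, w=200, h=100, scale_x_by_width=True)
print(current)   # (50, 50, 200, 100)
print(proposed)  # (100, 50, 200, 100)
```

For a square box (`w == h`) both versions give the same result, which may be why the difference is easy to miss.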

a question about pretrain model

When I use the pretrained model from https://github.com/naoto0804/pytorch-inpainting-with-partial-conv, I get a KeyError. Can you tell me what's wrong?

How to run on my own dataset?

Hi, thanks for your contribution!
I want to test the occlusion order on my own dataset, and I notice that demo_kins.ipynb needs annotation files, which I do not have.

About the coco amodal dataset ಠ_ಠ

already done

About function 'content_completion'

In demos/cocoa.ipynb,

def content_completion(pcnetc, image, input_size, modal, bboxes, amodal_patches_pred, category, idx, dilate, debug=False):
    rgb = cv2.resize(
        utils.crop_padding(image, bboxes[idx], pad_value=(0,0,0)),
        (input_size, input_size), interpolation=cv2.INTER_CUBIC)
    modal_patch = infer.resize_mask(
       utils.crop_padding(modal[idx], bboxes[idx], pad_value=(0,)), input_size, 'linear')
    amodal_patch = infer.resize_mask(
        amodal_patches_pred[idx], input_size, 'linear')
    ret, rgb_erased, vsb_mask = pcnetc.inference(
        rgb, modal_patch, category[idx].item(), amodal_patch, dilate=dilate, with_modal=True)
    ret = recover_image_patch(ret, bboxes[idx], image.shape[0], image.shape[1], (255,255,255))
    vsb_mask = infer.recover_mask(vsb_mask, bbox, image.shape[0], image.shape[1], 'linear')
    return ret, vsb_mask

But where is bbox defined? I guess it is a bug; should it be ori_bboxes[idx]?

Results not as expected as shown in the paper

Thanks for the work; the task is, by all means, a hard one. However, I want to leave a comment (also open to discussion) that the model doesn't do an acceptable job of filling in content with amodal masks. I show some context-filling results below, with the image ID shown at the top left. The original images are from COCOA/VAL2014. Current inpainting models may do better even though they know nothing about object layers.

subprocess.CalledProcessError

subprocess.CalledProcessError: Command '['/home/james/anaconda3/bin/python', '-u', 'main.py', '--local_rank=1', '--config', 'experiments/COCOA/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

I got this error when I set "python -m torch.distributed.launch --nproc_per_node=2 main.py" in the train.sh file. When I set "python -m torch.distributed.launch --nproc_per_node=1 main.py", it failed as well. I tried training your model on a machine with 2 GPUs and PyTorch 1.5.1 installed. How can this be solved? By the way, what are the recommended GPU requirements for training this model? Thank you!
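
A note on reading this kind of log: `subprocess.CalledProcessError` only reports that a child process exited with a non-zero status, so the actual cause is normally the worker's own traceback printed earlier. The sketch below shows this with the standard library alone. The child command is a stand-in for a failing worker, not the repository's `main.py`.

```python
import subprocess
import sys

# Stand-in for a training worker: it prints its real error and exits with 1.
child = [
    sys.executable,
    "-c",
    "import sys; print('real error: see worker traceback', file=sys.stderr); sys.exit(1)",
]

try:
    subprocess.run(child, check=True, capture_output=True, text=True)
    failure = None
except subprocess.CalledProcessError as err:
    # The exception carries only the command and exit status; the useful
    # diagnostic is whatever the child wrote to stderr before exiting.
    failure = err

print(failure.returncode)
print(failure.stderr.strip())
```

When asking for help, it is therefore more useful to post the first traceback in the log than the final `CalledProcessError` line.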

app for image manipulation

Hi, do you plan to release the app for image manipulation that you demonstrated in the gif?

RuntimeError: invalid argument 2: non-empty vector or matrix expected at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:31

When I was running this step sh experiments/COCOA/pcnet_m/train.sh, this problem occurred. I don't know how to solve it.

RuntimeError: invalid argument 2: non-empty vector or matrix expected at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:31

training code for supervised training

Hi, thank you for sharing the code of this nice work :)

It is written in the paper that you reproduced OrderNet.

I found the inference code of OrderNet, but not the training code or the model.
Can you share the code of training the OrderNet?

Inference result is not expected, totally fail

Hi, I want to use this code to complete the human body. The input mask is extracted with mask-rcnn. But the result is not expected:

I also tried the example image in the repo, but the result is also bad:

What is the problem? Many thanks!

How to label my own data

How do I label my own data?

NameError: name 'mc' is not defined

When I run sh experiments/COCOA/pcnet_c/train.sh, the following error is reported:
Original Traceback (most recent call last):
File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/peng/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_content_dataset.py", line 114, in __getitem__
self._init_memcached()
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_content_dataset.py", line 51, in _init_memcached
self.mclient = mc.MemcachedClient.GetInstance(server_list_config_file, client_config_file)
NameError: name 'mc' is not defined
How can I deal with it? Thanks!
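
Judging by `mc.MemcachedClient` in the traceback, `mc` is a memcached client module that is not available in this environment. One possible workaround is to guard the import and fall back to reading files from disk. The sketch below uses hypothetical names (`load_bytes`, `use_memcached`, `HAS_MC`) and is an assumption about how a dataset class could be adapted, not the repository's actual code.

```python
import os
import tempfile

try:
    import mc  # optional memcached client; usually absent on a local machine
    HAS_MC = True
except ImportError:
    mc = None
    HAS_MC = False


def load_bytes(path, use_memcached=False):
    """Read raw file bytes, using memcached only if requested and available."""
    if use_memcached and HAS_MC:
        # Placeholder: the real client calls would go here.
        raise NotImplementedError("wire up the memcached client here")
    with open(path, "rb") as f:
        return f.read()


# Usage: with memcached disabled, the file is simply read from disk.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "img.bin")
    with open(sample, "wb") as f:
        f.write(b"\x89PNG")
    data = load_bytes(sample)

print(data)
```

If the experiment config exposes a switch for memcached, turning it off there would be the less invasive route.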

please:subprocess.CalledProcessError: Command

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi, could you give me some advice on this error? The details of the experiment are listed as follows:

Dataset: COCOA
Environment: Python 3.7.9, PyTorch 1.6.0
Downloaded pretrains/partialconv.pth from here

I followed the instructions to run training. PCNet-M trains fine, and I did convert the partialconv.pth model to accept 4 channel inputs. When I run "sh experiments/COCOA/pcnet_c/train.sh", I got the following error:

*****************************************                                                                                                                                                                         
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.  
*****************************************                                                                                                                                                                         
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.                                     
  config = yaml.load(f)                                                                                                                                                                                           
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.                                     
  config = yaml.load(f)                                                                                                                                                                                           
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.                                     
  config = yaml.load(f)                                                                                                                                                                                           
main.py:14: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.                                     
  config = yaml.load(f)                                                                                                                                                                                           
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'                                                                                                                                                       
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'                                                                                                                                                       
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'                                                                                                                                                       
=> loading checkpoint 'pretrains/partialconv_input_ch4.pth'
[2020-09-22 15:53:59,916] Validation Iter: [0]  Time 0.443 (2.212)      Data 0.015 (1.491)      hole: 0.06159 (0.05562)  valid: 0.05347 (0.05307)        prc: 2.072 (2.004)      style: 0.01656 (0.01629)        $
v: 0.2303 (0.2479)      dis: 0 (0)      adv: 0 (0)
Traceback (most recent call last):
  File "main.py", line 48, in <module>
    main(args)
  File "main.py", line 30, in main
    trainer.run()
  File ".../deocclusion/trainer.py", line 125, in run
    self.train()
  File ".../deocclusion/trainer.py", line 147, in train
    loss_dict = self.model.step()
  File ".../deocclusion/models/partial_completion_content_cgan.py", line 153, in step
    gen_loss.backward()
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
  File ".../anaconda3/envs/python37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File ".../anaconda3/envs/python37/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File ".../anaconda3/envs/python37/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['.../anaconda3/envs/python37/bin/python', '-u', 'main.py', '--local_rank=3', '--config', 'experiments/COCOA/pcnet_c/config.yaml', '--launcher', 'pytorch', '--load-pretrain', 'pretrains/partialconv_input_ch4.pth']' returned non-zero exit status 1.

Has anyone run into this error before? Any help would be much appreciated. Thanks!

I want to ask: How can I make my own data set? Thanks

How can I make my own COCOA data set?

I have seen the paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_Semantic_Amodal_Segmentation_CVPR_2017_paper.pdf,
but the tools are not open source.

training problem

Hey, I tried to train the model. But when I used the model I got to run demo_cocoa.ipynb, I got an error like this:
RuntimeError: ../experiments/COCOA/pcnet_m/checkpoints/ckpt_iter_56000.pth.tar is a zip archive (did you mean to use torch.jit.load()?)
Can you help me please?
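
This message usually appears when a checkpoint written by PyTorch 1.6 or later, which saves in a zip-based format by default, is loaded with an older PyTorch. Assuming that is the cause here, a quick standard-library check of the file format can confirm it before changing environments. The helper name `is_zip_checkpoint` and the file names are hypothetical, and the files below are placeholders rather than real checkpoints.

```python
import os
import tempfile
import zipfile


def is_zip_checkpoint(path):
    """True if the file is a zip archive (default torch.save format since 1.6)."""
    return zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder for a checkpoint saved in the newer zip-based format.
    new_style = os.path.join(tmp, "new.pth.tar")
    with zipfile.ZipFile(new_style, "w") as zf:
        zf.writestr("archive/data.pkl", b"placeholder")

    # Placeholder for a legacy (non-zip) checkpoint file.
    old_style = os.path.join(tmp, "old.pth.tar")
    with open(old_style, "wb") as f:
        f.write(b"\x80\x02placeholder pickle bytes")

    results = (is_zip_checkpoint(new_style), is_zip_checkpoint(old_style))

print(results)  # (True, False)
```

If the check returns True, two options are to run the demo with PyTorch 1.6 or later, or to re-save the checkpoint from the newer version with `torch.save(..., _use_new_zipfile_serialization=False)`.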

TypeError: Expected bytes, got str.

Hello, I encountered such an error:
Traceback (most recent call last):
File "./coco.py", line 498, in <module>
layers='heads')
File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 2207, in train
validation_data=next(val_generator),
File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 1604, in data_generator
use_mini_mask=config.USE_MINI_MASK)
File "/data/g/weidaihua/Mask_RCNN-master/model.py", line 1163, in load_image_gt
mask, class_ids = dataset.load_mask(image_id)
File "./coco.py", line 249, in load_mask
image_info["width"])
File "./coco.py", line 308, in annToMask
rle = self.annToRLE(ann, height, width)
File "./coco.py", line 294, in annToRLE
rle = maskUtils.merge(rles)
File "pycocotools/_mask.pyx", line 145, in pycocotools._mask.merge (pycocotools/_mask.c:3173)
File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString (pycocotools/_mask.c:2605)
TypeError: Expected bytes, got str.
How to solve this? Thanks!
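
This error is commonly reported when pycocotools under Python 3 receives an RLE whose `counts` field is a `str` where the Cython code expects `bytes`. Assuming that is what happens here, one workaround is to encode `counts` before calling `maskUtils.merge`. The helper `ensure_bytes_counts` is hypothetical, does not depend on pycocotools, and the `counts` values below are placeholders rather than real masks.

```python
def ensure_bytes_counts(rles):
    """Return copies of RLE dicts with 'counts' encoded to bytes if it is a str."""
    fixed = []
    for rle in rles:
        rle = dict(rle)  # shallow copy so the caller's data is left untouched
        if isinstance(rle.get("counts"), str):
            rle["counts"] = rle["counts"].encode("utf-8")
        fixed.append(rle)
    return fixed


# Placeholder RLEs: one with str counts, one already in bytes.
rles = [
    {"size": [4, 4], "counts": "04L4"},
    {"size": [4, 4], "counts": b"04L4"},
]
fixed = ensure_bytes_counts(rles)
print([type(r["counts"]).__name__ for r in fixed])  # ['bytes', 'bytes']
```

Upgrading pycocotools may also resolve the problem without any code change.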

Where can I find the pre-trained partial-convolution image inpainting model to save as pretrains/partialconv.pth?

The "Training PCNet-C" section mentions a pre-trained image inpainting model using partial convolution, to be placed at pretrains/partialconv.pth. I followed the download link suggested there but did not find this pre-trained model. Where can I find it? Thank you!

Are the bboxes of COCOA dataset incorrectly used in this code?

Thanks for sharing your code. I'm new to this problem. When I looked into your IPython demo on the COCOA dataset, I found that the amodal completion result of PCNet-M is always inside the bounding box provided by the COCOA dataset. However, the bounding box provided by the COCOA dataset seems to cover only the modal annotations. Is that right? If so, I am confused about using the modal bounding box to restrict the amodal completion area of PCNet-M, and I don't know whether it has any negative influence on your training stage.
The figures below show an example I captured from demo_cocoa.ipynb. The image id is 2 (in code).
The bounding boxes are:

The amodal completions are:

The published model is corrupted

All the published models at the download link you gave appear to be corrupted. Could you upload them again?

please help: subprocess.CalledProcessError: Command '['/home/wwx/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', 'experiments/KINS/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

Traceback (most recent call last):
File "main.py", line 9, in <module>
from trainer import Trainer
File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/trainer.py", line 14, in <module>
import datasets
File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/datasets/__init__.py", line 1, in <module>
from .reader import *
File "/media/wwx/B8D46DEEC022AA4B/deocclusion-master/datasets/reader.py", line 8, in <module>
import pycocotools.mask as maskUtils
ModuleNotFoundError: No module named 'pycocotools'
Traceback (most recent call last):
File "/home/wwx/anaconda3/envs/deo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/wwx/anaconda3/envs/deo/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/wwx/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/home/wwx/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/wwx/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', 'experiments/KINS/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

!sh experiments/COCOA/pcnet_m/train.sh # you may have to set --nproc_per_node=#YOUR_GPUS, I have modified the nproc_per_node =1. Thank you

/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
"The module torch.distributed.launch is deprecated "
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : main.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 8
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_muppckot/none_9kg5iq21
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/usr/local/lib/python3.7/dist-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_muppckot/none_9kg5iq21/attempt_0/7/error.json
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 29, in main
trainer = Trainer(args)
File "/content/drive/MyDrive/deocclusion/trainer.py", line 61, in __init__
args.model, load_pretrain=args.load_pretrain, dist_model=True)
File "/content/drive/MyDrive/deocclusion/models/partial_completion_mask.py", line 16, in __init__
super(PartialCompletionMask, self).__init__(params, dist_model)
File "/content/drive/MyDrive/deocclusion/models/single_stage_model.py", line 16, in __init__
self.model = utils.DistModule(self.model)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 16, in __init__
broadcast_params(self.module)
File "/content/drive/MyDrive/deocclusion/utils/distributed_utils.py", line 32, in broadcast_params
dist.broadcast(p, 0)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 1076, in broadcast
work = default_pg.broadcast([tensor], opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 6 (pid: 1211) of binary: /usr/bin/python3
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

manipulation

Hi Xiaohang, I ran your demos and the results are amazing. However, I could not find the code for the image manipulation operations mentioned in your paper, such as delete, shift, reposition, and swap. Could you also provide this part? Thanks a lot!

Bugs in `.backward()` while training PCNet-C.

Hi! Thanks for sharing the excellent codebase. It's very helpful!

I came across an issue related to the backward pass while training the PCNet-C network using the PartialCompletionContentCGAN network. The lines responsible for the errors are:

# update
self.optimD.zero_grad()
dis_loss.backward()
utils.average_gradients(self.netD)
self.optimD.step()


self.optim.zero_grad()
gen_loss.backward()
utils.average_gradients(self.model)
self.optim.step()

If we comment out either of the .backward() lines, the error goes away.

I am using PyTorch 1.8.1.

Concerning the overfitting problem in PCNet-M and PCNet-C

Hi Xiaohang,

Thanks for releasing the code; the demo is really amazing. I recently tested a few images from the COCO validation set with your pre-trained models, including the demo images you used. However, the results are disappointing and far from satisfactory. Could you please check this?

This is COCOA/2.jpg and COCOA/2.json ground-truth annotation.

This is COCOA/2.jpg and CenterMask instance segmentation results.

The output segmentation result is slightly different from the ground truth, but, as we can see, the instance is not completed well.

Obtained Result

After training PCNet-C on the KINS dataset, what do the generated images in the folder "/home/jddx/wxp/deocclusion/experiments/KINS/pcnet_c/images" mean?
Are they the content completion results for the validation set images?
These PNG images look like depth maps, though: they are pitch black.

another segmentation model fails

Hi, I ran your demo and the results are good. However, when I used the Mask R-CNN model from PyTorch to obtain the bounding boxes and modal masks, the inpainting result was bad. I visualized the bounding boxes and masks detected by Mask R-CNN and compared them with yours, and they do not differ much. Have you encountered this problem? Thanks a lot.

Prediction of amodal mask for own labeled dataset

Hi, thanks for your work!
I labeled the image with a labeling tool, as shown.

Why does it produce a modal mask and an amodal mask that are almost identical?
Why does it produce 3 boxes?

I converted demo_cocoa.ipynb to a .py file and it fails when I run it

No such file or directory: '../data/COCOA/annotations/COCO_amodal_val2014.json'
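
One possible cause, assuming the converted script is launched from a directory other than `demos/`: the notebook's relative path `../data/COCOA/...` is resolved against the current working directory, not against the script's location. The sketch below anchors a relative path to an explicit base directory. `resolve_from` is a hypothetical helper and `/repo/deocclusion/demos` is a placeholder folder; neither comes from the repository.

```python
from pathlib import Path


def resolve_from(base_dir, relative_path):
    """Resolve relative_path against base_dir instead of the working directory."""
    return (Path(base_dir) / relative_path).resolve()


# In a converted script, the base would typically be the script's own folder:
#     base_dir = Path(__file__).resolve().parent
# A literal placeholder folder is used here so the snippet also runs interactively.
annot_path = resolve_from(
    "/repo/deocclusion/demos",
    "../data/COCOA/annotations/COCO_amodal_val2014.json",
)
print(annot_path)
```

This only fixes how the path is resolved; the annotation file still has to be downloaded and placed under `data/COCOA/annotations/`.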

How to make my own data set

How do I make my own dataset?

Question about Loader in demo_cocoa.ipynb

Hi, I ran into a problem in demo_cocoa.ipynb that seems to come from the yaml package. I tried changing yaml.load(f) to yaml.safe_load or yaml.full_load, and also adding a Loader argument, but the result is still the same. Does anyone know how to deal with this issue? Thank you in advance.

Question about your metric computation

Hi Xiaohang Zhan,

Your framework is elegant and easy to use, and the work you proposed is inspiring. However, I think one step in your metric computation (IoU) is questionable.

deocclusion/tools/test.py

Line 195 in c8439ea

miou = intersection_rec.sum / (union_rec.sum + 1e-10) # mIoU

As we can see, you sum the intersection and union pixel counts over the whole dataset and then divide the totals to get the final IoU. In my experience, this is not the usual way to compute IoU, although the comparison in your paper is fair because the same computation is applied to the other methods.

Could you please clarify this choice? I think it significantly enlarges the metric difference between the Raw method and PCNet-M.

Regards,
Qiang Zhou
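
To make the distinction raised above concrete, here is a minimal sketch contrasting the pooled computation in the quoted line with a per-instance average. The function names and the pixel counts are hypothetical illustrations, not taken from `tools/test.py`.

```python
def dataset_level_iou(intersections, unions, eps=1e-10):
    """Sum intersections and unions over all instances, then divide once."""
    return sum(intersections) / (sum(unions) + eps)


def mean_per_instance_iou(intersections, unions, eps=1e-10):
    """Compute IoU for each instance, then average the ratios."""
    ratios = [i / (u + eps) for i, u in zip(intersections, unions)]
    return sum(ratios) / len(ratios)


# Hypothetical pixel counts: one large, well-predicted instance and one
# small, poorly predicted instance.
intersections = [9000, 10]
unions = [10000, 100]

pooled = dataset_level_iou(intersections, unions)
averaged = mean_per_instance_iou(intersections, unions)
print(f"pooled: {pooled:.3f}, per-instance mean: {averaged:.3f}")
```

With pooling, large instances dominate the score because they contribute more pixels; with per-instance averaging, every instance counts equally regardless of size.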

Order matrix not correct even after adjusting 'th' and 'dilate_kernel'

Hi, I was testing out the demo codes on this RGB image

and its instance mask

I slowly increased the 'th' and 'dilate_kernel' from (0.1, 5), (0.3, 7), (0.5, 9), (0.7, 11) respectively as advised in #14 but the order matrix is still not correct.

From the order matrix, Instance 3 (table) is not considered as an occluder to Instance 5 (bench). May I ask for some advice as to how to improve the situation?

ValueError: Unterminated string

Hi, when I run `demos/demo_kins.ipynb`, the following error occurs. How should it be solved? Thank you!

JSONDecodeError Traceback (most recent call last)
in
6 annot_path = "../data/KINS/instances_{}.json".format(phase)
7
----> 8 data_reader = KINSLVISDataset('KINS', annot_path)

~/fuxian/Self-Supervised_Scene_De-occlusion/deocclusion-master/datasets/reader.py in __init__(self, dataset, annot_fn)
133 def __init__(self, dataset, annot_fn):
134 self.dataset = dataset
--> 135 data = cvb.load(annot_fn)
136 self.images_info = data['images']
137 self.annot_info = data['annotations']

~/anaconda3/envs/py3-env/lib/python3.7/site-packages/cvbase/io.py in load(file, format, **kwargs)
114 if format not in processors:
115 raise TypeError('Unsupported format: ' + format)
--> 116 return processors[format](file, **kwargs)
117
118

~/anaconda3/envs/py3-env/lib/python3.7/site-packages/cvbase/io.py in json_load(file)
18 if isinstance(file, str):
19 with open(file, 'r') as f:
---> 20 obj = json.load(f)
21 elif hasattr(file, 'read'):
22 obj = json.load(file)

~/anaconda3/envs/py3-env/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,
--> 296 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
297
298

~/anaconda3/envs/py3-env/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
346 parse_int is None and parse_float is None and
347 parse_constant is None and object_pairs_hook is None and not kw):
--> 348 return _default_decoder.decode(s)
349 if cls is None:
350 cls = JSONDecoder

~/anaconda3/envs/py3-env/lib/python3.7/json/decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):

~/anaconda3/envs/py3-env/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
351 """
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
355 raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Unterminated string starting at: line 1 column 18350068 (char 18350067)
08/13/2020 08:26:47 PM INFO: Shutdown kernel
08/13/2020 08:26:47 PM WARNING: Exiting with nonzero exit status
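
An "Unterminated string" error this far into the file (character 18350067) often points to a truncated annotation file, for example from an interrupted download. That is an assumption, not something confirmed in this thread. A minimal sketch for checking whether a JSON file parses at all, using only the standard library; `check_json_file` is a hypothetical helper, and the temporary file stands in for a cut-off annotation file.

```python
import json
import os
import tempfile


def check_json_file(path):
    """Return (True, None) if the file parses as JSON, else (False, message)."""
    try:
        with open(path, "r") as f:
            json.load(f)
    except json.JSONDecodeError as err:
        # err.pos is the character offset at which parsing failed
        return False, f"{err.msg} at char {err.pos}"
    return True, None


# Demonstrate on a deliberately truncated file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"images": [{"file_name": "000001.pn')
ok, message = check_json_file(tmp.name)
os.remove(tmp.name)
print(ok, message)
```

If the real annotation file fails this check, comparing its size with the published one and downloading it again is a reasonable next step.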

amodel pred ours

Hello, I found that the amodal_pred_ours image was not marked when I ran the demo. What might the problem be?

ValueError: Unexpected option: --local_rank=0

Traceback (most recent call last):
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1961, in main
setup = process_command_line(sys.argv)
File "/root/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_command_line_handling.py", line 145, in process_command_line
raise ValueError("Unexpected option: " + argv[i])
ValueError: Unexpected option: --local_rank=0
Usage:
pydevd.py --port N [(--client hostname) | --server] --file executable [file_options]

Process finished with exit code 0

The error occurs when I debug.

a question about pcnet_c

Excuse me, I have another question: how do I get the generated image from pcnet_c?

TypeError: Expected bytes, got str.

Sorry, the above is code I found when I searched Google for the same error; I copied it by accident. Below is the error I got while training with the deocclusion code:
Traceback (most recent call last):
File "main.py", line 51, in
main(args)
File "main.py", line 31, in main
trainer.run()
File "/home/peng/python_pro/pig_pro/deocclusion-master/trainer.py", line 122, in run
self.validate('on_val')
File "/home/peng/python_pro/pig_pro/deocclusion-master/trainer.py", line 206, in validate
for i, inputs in enumerate(self.val_loader):
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/peng/anaconda3/envs/tpy36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_dataset.py", line 116, in __getitem__
idx, load_rgb=self.config['load_rgb'], randshift=True) # modal, uint8 {0, 1}
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/partial_comp_dataset.py", line 69, in _get_inst
modal, bbox, category, imgfn, _ = self.data_reader.get_instance(idx)
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/reader.py", line 108, in get_instance
modal, bbox, category = read_COCOA(reg, h, w)
File "/home/peng/python_pro/pig_pro/deocclusion-master/datasets/reader.py", line 52, in read_COCOA
modal = maskUtils.decode(rle).squeeze()
File "pycocotools/_mask.pyx", line 138, in pycocotools._mask.decode
File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString
TypeError: Expected bytes, got str.
How should it be solved? Thank you!
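
The traceback ends in `pycocotools._mask._frString`, which in this build rejects a `str` value. A plausible cause, stated here as an assumption, is that the RLE dict passed to `maskUtils.decode` has its `counts` field as `str` (as happens when it is loaded from JSON) rather than `bytes`. The sketch below normalizes such a dict before decoding; `ensure_bytes_counts` and the sample RLE are hypothetical and not part of the repository or pycocotools.

```python
def ensure_bytes_counts(rle):
    """Return a copy of an RLE dict whose 'counts' field is bytes, not str."""
    fixed = dict(rle)
    if isinstance(fixed.get("counts"), str):
        fixed["counts"] = fixed["counts"].encode("utf-8")
    return fixed


# Hypothetical RLE dict as it might look after json.load, where every
# string value is a str.
rle = {"size": [4, 4], "counts": "52203"}
print(ensure_bytes_counts(rle))
```

The helper would be applied just before the decode call, for example `maskUtils.decode(ensure_bytes_counts(rle))`. Uncompressed RLEs, whose `counts` is a list, pass through unchanged.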

Colab

Please add a Colab notebook. The interface can be taken from here: https://stackoverflow.com/questions/59630751/simple-ui-on-top-of-colab

Download link for the COCOA annotations

I don't know how to download the COCOA annotations from the download link given in the README.

confusion with the code which makes "shuffle=False" in trainer.py

Hi, I am confused about why shuffle=False is set in the train loader. Is there any special reason for this?

deocclusion/trainer.py

Line 95 in ac543f9

self.train_loader = DataLoader(train_dataset,

self.train_loader = DataLoader(train_dataset,
                                           batch_size=args.data['batch_size'],
                                           shuffle=False,
                                           num_workers=args.data['workers'],
                                           pin_memory=False,
                                           sampler=train_sampler)
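
For context: PyTorch's `DataLoader` treats `sampler` and `shuffle` as mutually exclusive, so `shuffle` has to stay `False` whenever a sampler is passed, and the sampler alone decides the sample order. Whether `train_sampler` shuffles is not visible in the quoted snippet. The pure-Python sketch below is a simplified stand-in (neither the repository's sampler nor torch's) showing how a sampler can shuffle internally and split indices across ranks; `rank_indices` and its parameters are hypothetical.

```python
import random


def rank_indices(dataset_size, world_size, rank, epoch_seed=0):
    """Shuffle all indices with a shared seed, then take every world_size-th one."""
    indices = list(range(dataset_size))
    # The same seed on every rank yields the same permutation everywhere.
    random.Random(epoch_seed).shuffle(indices)
    return indices[rank::world_size]


# Two ranks receive disjoint, shuffled halves of a 10-item dataset.
part0 = rank_indices(10, world_size=2, rank=0)
part1 = rank_indices(10, world_size=2, rank=1)
print(part0, part1)
```

Under that design, randomness comes from the sampler, and `shuffle=False` on the loader does not mean the data is visited in a fixed order.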

error when resume checkpoints

utils/scheduler.py", line 17, in __init__
"in param_groups[{}] when resuming an optimizer".format(i))
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

ModuleNotFoundError: No module named 'matting_api'

How can I solve this error?

convert to KINS format

How do I convert JSON files to the KINS format?

evaluation issues

Hi, thanks for your work!
When training and validating with a custom dataset, how are the evaluation metrics acc_occpair and mIoU generated?
Are they computed between the predicted amodal masks and generated amodal mask annotations? I ask because the custom dataset does not contain amodal or order annotations.
Could you answer my question? Thanks a lot.

contours = np.subtract(contours, 1) error

ValueError Traceback (most recent call last)
Cell In[19], line 42
40 plt.axis('off')
41 plt.text(0, -10, title[i])
---> 42 pface, pedge = polygon_drawing(toshow[i], selidx, colors, bbox_show, thickness=3)
43 ax.add_collection(pface)
44 ax.add_collection(pedge)

File /deocclusion/demos/demo_utils.py:206, in polygon_drawing(masks, selidx, color_source, bbox, thickness)
204 masks = masks[:, u:b, l:r]
205 for i,am in enumerate(masks[selidx,...]):
--> 206 pts_list = reader.mask_to_polygon(am)
207 for pts in pts_list:
208 pts = np.array(pts).reshape(-1, 2)

File /deocclusion/datasets/reader.py:286, in mask_to_polygon(mask, tolerance, area_threshold)
284 contours = measure.find_contours(padded_mask, 0.5)
285 # Fix coordinates after padding
--> 286 contours = np.subtract(contours, 1)
287 for contour in contours:
288 if not np.array_equal(contour[0], contour[-1]):

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

This error occurred when I ran amodal completion inference. How can I solve it?

The COCOA score that I get from testing the released model on the original code is not the same as the paper

The order score is 0.82706 and the mIoU is 0.76812,
but in the paper they are 87.1% and 81.35%.

Can you help me? Thank you.

I would like to ask about "run demo" related content

In the second step, "Run demos/demo_cocoa.ipynb or demos/demo_kins.ipynb",

I don't know how to run or use files in the '.ipynb' format.
All I can do is open them.
