sgtr's People

Contributors

scarecrow0

sgtr's Issues

Fatal IO error while running under nohup

Thanks for your work!
I ran the program and saved the output to a file with:
nohup pods_train --num-gpus 1 &> /home/XX/dockerFile/sgtr/log/test1.out&

After the training starts, the model runs normally for a few hours, and then reports an error:

XIO: fatal IO error 25 (Inappropriate ioctl for device) on X server "localhost:13.0"
after 8784 requests (8784 known processed) with 52 events remaining.

I searched online and it may be due to a call to matplotlib, which conflicts with nohup. So I would like to ask how to save the output while running, or what I can do to solve this problem?
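
A common generic workaround (not confirmed for SGTR) is to force matplotlib onto its non-interactive Agg backend before any plotting happens, so that no X server connection is opened under nohup. The snippet below is a minimal sketch of that idea, independent of the SGTR code:

```python
# Generic workaround sketch (not SGTR-specific): select matplotlib's
# non-interactive Agg backend before pyplot is imported anywhere, so figures
# are rendered off-screen and no X server connection is opened.
import matplotlib
matplotlib.use("Agg")          # must run before `import matplotlib.pyplot`
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
fig.savefig("plot.png")        # written to disk instead of shown in a window
```

Setting the environment variable MPLBACKEND=Agg before launching pods_train achieves the same effect without touching the code.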

results without graph constraint

Hi, thank you for sharing this wonderful work!
I notice that the results without graph constraint are lower than those with graph constraint (VG: R@100 27.6, ngR@100 25.6), but ngR is usually much higher than R. I wonder what causes SGTR to show this behavior :)

Hello, can you please tell me where this config.py is, so that I can modify the weight path? Thank you.

Training (IMPORTANT)

Prepare Faster-RCNN Detector
You can download the pretrained DETR on ResNet-101 we used in the paper:

[VG]
[OIv6]


Then, you need to modify the pre-trained weight parameter MODEL.WEIGHTS in the config file playground/experiment/path/config_xx.py to the path of the corresponding pre-trained detector weights, to make sure the detection weights are loaded correctly.
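
As a hedged illustration only (the actual structure of config_xx.py may differ, and the path below is a placeholder), the intended edit is of this shape:

```python
# Hypothetical excerpt of playground/experiment/path/config_xx.py.
# The only intended change is pointing MODEL.WEIGHTS at the unzipped
# pre-trained detector checkpoint; everything else stays as shipped.
_config_dict = dict(
    MODEL=dict(
        WEIGHTS="/path/to/pretrained_detr/model_final.pth",  # placeholder path
    ),
)
```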


Besides, you can also train your own detector weights with the provided configs in [Model

Model Selection

Hi,

Thanks for the great work! I have downloaded your pretrained checkpoint (sgtr_vg_new_pth) on the VG dataset, and found from the log.txt that you trained the network for 115200 iterations. However, the saved model you provide to us is model_0095999.pth. May I ask how you selected model_0095999.pth from all the saved models?

Checkpoint Link (VG)

Thanks!

Training with multi GPUs

The code runs normally when training on a single GPU (pods_train --num-gpus 1).
However, it returns errors when training on multiple GPUs (pods_train --num-gpus 2).
The log is as follows:

Scan GPUs to get 2 free GPU ([0, 1])
soft link to /data/Program/SGTR-main/outputs//playground/sgg/detr.res101.c5.one_stage_rel_tfmer
Command Line Args: Namespace(disable_gpu_check=False, dist_url='tcp://127.0.0.1:49192', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False, skip_git_commit=False)
Traceback (most recent call last):
File "/data/Program/SGTR-main/tools/train_net.py", line 295, in
args=(args,),
File "/data/Program/SGTR-main/cvpods/engine/launch.py", line 80, in launch
daemon=False,
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/data/Program/SGTR-main/cvpods/engine/launch.py", line 101, in _distributed_worker
comm.synchronize()
File "/data/Program/SGTR-main/cvpods/utils/distributed/comm.py", line 94, in synchronize
dist.barrier()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 2709, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:891, internal error, NCCL version 21.0.3
ncclInternalError: Internal check failed. This is either a bug in NCCL or due to memory corruption

Does the author have any suggestions about this problem?
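
One generic way to narrow down such failures (a debugging sketch, not a confirmed fix for this repository) is to turn on NCCL's own logging and, if needed, disable peer-to-peer transfers before the worker processes are spawned; both environment variables below are standard NCCL knobs:

```python
# Generic NCCL debugging sketch: set these before the distributed workers are
# spawned (e.g. at the very top of tools/train_net.py, or export them in the
# shell before running `pods_train --num-gpus 2`).
import os

os.environ.setdefault("NCCL_DEBUG", "INFO")     # print which collective/transport fails
os.environ.setdefault("NCCL_P2P_DISABLE", "1")  # workaround for some driver/topology issues
```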

Change to SGCls and PredCls

Hi,

On the Visual Genome dataset, I see that the current code trains in the SGDet (Scene Graph Detection) setting by default. Where can I make changes to train the model in the SGCls (Scene Graph Classification) and/or PredCls (Predicate Classification) settings? I assume some config file sets the related flags somewhere, but I cannot find them...

Thanks!

The time of updating code

Thanks for your work about end-to-end SGG, it is interesting and inspiring!
I'd like to learn more details about the work by running the code, and it would be wonderful if there were a clear time point for updating the code.
Thanks again for your work.

Can not load pre-trained DETR model.

Hi,
this is nice work! :) But I cannot load the pre-trained DETR model. Can you share a new pre-trained DETR model?
Thanks!

Prepare Faster-RCNN Detector

You can download the pretrained DETR on ResNet-101 we used in the paper:
    [VG](https://shanghaitecheducn-my.sharepoint.com/:u:/g/personal/lirj2_shanghaitech_edu_cn/EfJK_InTsk9Hq9RgXEui4gsBsk3pekuzPYk4gTR8coBYAA?e=fAo647),
    [OIv6](https://shanghaitecheducn-my.sharepoint.com/:u:/g/personal/lirj2_shanghaitech_edu_cn/Edrab9pd0O1NuVoz9RPmZtoBwxnSyl-NVIFCjxf6yUZ7FA?e=uPQZ1A),

Unzip the checkpoint into the folder

Then, you need to modify the pre-trained weight parameter MODEL.WEIGHTS in the config file playground/experiment/path/config_xx.py to the path of the corresponding pre-trained detector weights, to make sure the detection weights are loaded correctly.

Error details:

Traceback (most recent call last):
File "/home/zztao/anaconda3/envs/sgtr/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/zztao/dev/SGTR/cvpods/engine/launch.py", line 118, in _distributed_worker
main_func(*args)
File "/home/zztao/dev/SGTR/tools/train_net.py", line 230, in main
stage_main(args, config, build_model)
File "/home/zztao/dev/SGTR/tools/train_net.py", line 155, in stage_main
trainer.resume_or_load(resume=args.resume, load_mapping=cfg.MODEL.WEIGHTS_LOAD_MAPPING)
File "/home/zztao/dev/SGTR/cvpods/engine/trainer.py", line 436, in resume_or_load
self.start_iter = (self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS,
File "/home/zztao/dev/SGTR/cvpods/checkpoint/checkpoint.py", line 170, in resume_or_load
return self.load(path, load_mapping)
File "/home/zztao/dev/SGTR/cvpods/checkpoint/checkpoint.py", line 107, in load
checkpoint = self._load_file(path)
File "/home/zztao/dev/SGTR/cvpods/checkpoint/detection_checkpoint.py", line 65, in _load_file
loaded = super()._load_file(filename) # load native pth checkpoint
File "/home/zztao/dev/SGTR/cvpods/checkpoint/checkpoint.py", line 194, in _load_file
return torch.load(f, map_location=torch.device("cpu"))
File "/home/zztao/anaconda3/envs/sgtr/lib/python3.8/site-packages/torch/serialization.py", line 600, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/home/zztao/anaconda3/envs/sgtr/lib/python3.8/site-packages/torch/serialization.py", line 242, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
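
The "failed finding central directory" message usually means the file on disk is not a complete PyTorch zip archive, typically a truncated download or an archive that still needs to be unzipped first (the README above says to unzip the checkpoint). A quick stand-alone check (the path is a placeholder) is:

```python
# Sanity-check sketch: try loading the downloaded checkpoint directly.
# If this raises the same PytorchStreamReader error, the file itself is
# incomplete (re-download it) or still needs to be extracted before
# MODEL.WEIGHTS points at it.
import torch

ckpt_path = "/path/to/pretrained_detr/model_final.pth"  # placeholder path
state = torch.load(ckpt_path, map_location="cpu")
print(type(state))
```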

The output results may not respect graph constraints.

Thank you for wonderful work!
I notice that the bipartite graph assembling procedure only ensures that there are no self-loops, but does not prevent multiple relationships (predicates) from matching the same subject-object pair, which means that the result does not comply with the graph constraint.
I'm not sure whether my understanding is wrong.
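
For reference, the usual "graph constraint" in SGG evaluation allows at most one predicate per subject-object pair. A minimal post-hoc filter illustrating that restriction (purely illustrative, not SGTR's actual assembling code) could look like this:

```python
# Illustrative sketch only (not SGTR's implementation): enforce the usual
# graph constraint by keeping the single highest-scoring predicate for each
# (subject, object) pair, and dropping self-loops.
def apply_graph_constraint(triplets):
    """triplets: iterable of (subj_idx, obj_idx, predicate, score) tuples."""
    best = {}
    for subj, obj, pred, score in triplets:
        if subj == obj:                      # self-loop, skip
            continue
        key = (subj, obj)
        if key not in best or score > best[key][3]:
            best[key] = (subj, obj, pred, score)
    return list(best.values())
```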

Pretrained Faster-RCNN backbone

Thank you for offering the DETR backbone pretrained on the Visual Genome dataset! I would appreciate it if you could also provide us with the Faster-RCNN (with ResNet-101 FPN or ResNeXt-101-FPN backbone) pretrained on VG, which has been mentioned in your paper. Thanks.

Training time on the datasets

Hi authors,

Thank you so much for the great work!

May I know how long it takes to train your model on Visual Genome and OpenImages with 4 GPUs?

Thank you!
