sgtr's People

Contributors

scarecrow0

sgtr's Issues

Fatal IO error while running under nohup

Thanks for your work!
I ran the program and saved the output to a file with:
nohup pods_train --num-gpus 1 &> /home/XX/dockerFile/sgtr/log/test1.out&

After the training starts, the model runs normally for a few hours, and then reports an error:

XIO: fatal IO error 25 (Inappropriate ioctl for device) on X server "localhost:13.0"
after 8784 requests (8784 known processed) with 52 events remaining.

I searched online and it may be due to a call to matplotlib, which conflicts with nohup. So I would like to ask how to save the output while running, or what I can do to solve this problem?
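
A common generic workaround (not confirmed for SGTR) is to force matplotlib onto its non-interactive Agg backend before any plotting happens, so that no X server connection is opened under nohup. The snippet below is a minimal sketch of that idea, independent of the SGTR code:

```python
# Generic workaround sketch (not SGTR-specific): select matplotlib's
# non-interactive Agg backend before pyplot is imported anywhere, so figures
# are rendered off-screen and no X server connection is opened.
import matplotlib
matplotlib.use("Agg")          # must run before `import matplotlib.pyplot`
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
fig.savefig("plot.png")        # written to disk instead of shown in a window
```

Setting the environment variable MPLBACKEND=Agg before launching pods_train achieves the same effect without touching the code.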

results without graph constraint

Hi, thank you for sharing this wonderful work!
I notice that the results without graph constraint are lower than those with graph constraint (VG: R@100 27.6, ngR@100 25.6), but ngR is usually much higher than R. I wonder what causes SGTR to show this behavior :)

Hello, can you please tell me where this config.py is, so that I can modify the weight path? Thank you.

Training (IMPORTANT)

Prepare Faster-RCNN Detector
You can download the pretrained DETR on ResNet-101 we used in the paper:

[VG]
[OIv6]


Then, you need to modify the pre-trained weight parameter MODEL.WEIGHTS in the config file playground/experiment/path/config_xx.py to the path of the corresponding pre-trained detector weights, to make sure the detection weights are loaded correctly.
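
As a hedged illustration only (the actual structure of config_xx.py may differ, and the path below is a placeholder), the intended edit is of this shape:

```python
# Hypothetical excerpt of playground/experiment/path/config_xx.py.
# The only intended change is pointing MODEL.WEIGHTS at the unzipped
# pre-trained detector checkpoint; everything else stays as shipped.
_config_dict = dict(
    MODEL=dict(
        WEIGHTS="/path/to/pretrained_detr/model_final.pth",  # placeholder path
    ),
)
```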


Besides, you can also train your own detector weights with the provided configs in [Model

Model Selection

Hi,

Thanks for the great work! I have downloaded your pretrained checkpoint (sgtr_vg_new_pth) on the VG dataset, and found from the log.txt that you trained the network for 115200 iterations. However, the saved model you provide to us is model_0095999.pth. May I ask how you selected model_0095999.pth from all the saved models?

Checkpoint Link (VG)

Thanks!

Training with multi GPUs

The code runs normally when training on a single GPU (pods_train --num-gpus 1).
However, it returns errors when training on multiple GPUs (pods_train --num-gpus 2).
The log is as follows:

Scan GPUs to get 2 free GPU ([0, 1])
soft link to /data/Program/SGTR-main/outputs//playground/sgg/detr.res101.c5.one_stage_rel_tfmer
Command Line Args: Namespace(disable_gpu_check=False, dist_url='tcp://127.0.0.1:49192', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False, skip_git_commit=False)
Traceback (most recent call last):
File "/data/Program/SGTR-main/tools/train_net.py", line 295, in
args=(args,),
File "/data/Program/SGTR-main/cvpods/engine/launch.py", line 80, in launch
daemon=False,
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/data/Program/SGTR-main/cvpods/engine/launch.py", line 101, in _distributed_worker
comm.synchronize()
File "/data/Program/SGTR-main/cvpods/utils/distributed/comm.py", line 94, in synchronize
dist.barrier()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 2709, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:891, internal error, NCCL version 21.0.3
ncclInternalError: Internal check failed. This is either a bug in NCCL or due to memory corruption

Does the author have any suggestions about this problem?
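
One generic way to narrow down such failures (a debugging sketch, not a confirmed fix for this repository) is to turn on NCCL's own logging and, if needed, disable peer-to-peer transfers before the worker processes are spawned; both environment variables below are standard NCCL knobs:

```python
# Generic NCCL debugging sketch: set these before the distributed workers are
# spawned (e.g. at the very top of tools/train_net.py, or export them in the
# shell before running `pods_train --num-gpus 2`).
import os

os.environ.setdefault("NCCL_DEBUG", "INFO")     # print which collective/transport fails
os.environ.setdefault("NCCL_P2P_DISABLE", "1")  # workaround for some driver/topology issues
```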

Change to SGCls and PredCls

Hi,

On the Visual Genome dataset, I see that the current code trains in the SGDet (Scene Graph Detection) setting by default. Where can I make changes to train the model in the SGCls (Scene Graph Classification) and/or PredCls (Predicate Classification) settings? I assume some config file sets the related flags somewhere, but I cannot find them...

Thanks!

The time of updating code

Thanks for your work about end-to-end SGG, it is interesting and inspiring!
I'd like to learn more details about the work by running the code, and it would be wonderful if there were a clear time point for updating the code.
Thanks again for your work.

Can not load pre-trained DETR model.

Hi,
this is nice work! :) But I cannot load the pre-trained DETR model. Can you share a new pre-trained DETR model?
Thanks!

Prepare Faster-RCNN Detector

You can download the pretrained DETR on ResNet-101 we used in the paper:
    [VG](https://shanghaitecheducn-my.sharepoint.com/:u:/g/personal/lirj2_shanghaitech_edu_cn/EfJK_InTsk9Hq9RgXEui4gsBsk3pekuzPYk4gTR8coBYAA?e=fAo647),
    [OIv6](https://shanghaitecheducn-my.sharepoint.com/:u:/g/personal/lirj2_shanghaitech_edu_cn/Edrab9pd0O1NuVoz9RPmZtoBwxnSyl-NVIFCjxf6yUZ7FA?e=uPQZ1A),

Unzip the checkpoint into the folder

Then, you need to modify the pre-trained weight parameter MODEL.WEIGHTS in the config file playground/experiment/path/config_xx.py to the path of the corresponding pre-trained detector weights, to make sure the detection weights are loaded correctly.

Error details:

Traceback (most recent call last):
File "/home/zztao/anaconda3/envs/sgtr/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/zztao/dev/SGTR/cvpods/engine/launch.py", line 118, in _distributed_worker
main_func(*args)
File "/home/zztao/dev/SGTR/tools/train_net.py", line 230, in main
stage_main(args, config, build_model)
File "/home/zztao/dev/SGTR/tools/train_net.py", line 155, in stage_main
trainer.resume_or_load(resume=args.resume, load_mapping=cfg.MODEL.WEIGHTS_LOAD_MAPPING)
File "/home/zztao/dev/SGTR/cvpods/engine/trainer.py", line 436, in resume_or_load
self.start_iter = (self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS,
File "/home/zztao/dev/SGTR/cvpods/checkpoint/checkpoint.py", line 170, in resume_or_load
return self.load(path, load_mapping)
File "/home/zztao/dev/SGTR/cvpods/checkpoint/checkpoint.py", line 107, in load
checkpoint = self._load_file(path)
File "/home/zztao/dev/SGTR/cvpods/checkpoint/detection_checkpoint.py", line 65, in _load_file
loaded = super()._load_file(filename) # load native pth checkpoint
File "/home/zztao/dev/SGTR/cvpods/checkpoint/checkpoint.py", line 194, in _load_file
return torch.load(f, map_location=torch.device("cpu"))
File "/home/zztao/anaconda3/envs/sgtr/lib/python3.8/site-packages/torch/serialization.py", line 600, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/home/zztao/anaconda3/envs/sgtr/lib/python3.8/site-packages/torch/serialization.py", line 242, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
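
The "failed finding central directory" message usually means the file on disk is not a complete PyTorch zip archive, typically a truncated download or an archive that still needs to be unzipped first (the README above says to unzip the checkpoint). A quick stand-alone check (the path is a placeholder) is:

```python
# Sanity-check sketch: try loading the downloaded checkpoint directly.
# If this raises the same PytorchStreamReader error, the file itself is
# incomplete (re-download it) or still needs to be extracted before
# MODEL.WEIGHTS points at it.
import torch

ckpt_path = "/path/to/pretrained_detr/model_final.pth"  # placeholder path
state = torch.load(ckpt_path, map_location="cpu")
print(type(state))
```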

The output results may not respect graph constraints.

Thank you for wonderful work!
I notice that the bipartite graph assembling procedure only ensures that there are no self-loops, but does not prevent multiple relationships (predicates) from matching the same subject-object pair, which means that the result does not comply with the graph constraint.
I'm not sure whether my understanding is wrong.
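
For reference, the usual "graph constraint" in SGG evaluation allows at most one predicate per subject-object pair. A minimal post-hoc filter illustrating that restriction (purely illustrative, not SGTR's actual assembling code) could look like this:

```python
# Illustrative sketch only (not SGTR's implementation): enforce the usual
# graph constraint by keeping the single highest-scoring predicate for each
# (subject, object) pair, and dropping self-loops.
def apply_graph_constraint(triplets):
    """triplets: iterable of (subj_idx, obj_idx, predicate, score) tuples."""
    best = {}
    for subj, obj, pred, score in triplets:
        if subj == obj:                      # self-loop, skip
            continue
        key = (subj, obj)
        if key not in best or score > best[key][3]:
            best[key] = (subj, obj, pred, score)
    return list(best.values())
```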

Pretrained Faster-RCNN backbone

Thank you for offering the DETR backbone pretrained on the Visual Genome dataset! I would appreciate it if you could also provide us with the Faster-RCNN (with ResNet-101 FPN or ResNeXt-101-FPN backbone) pretrained on VG, which has been mentioned in your paper. Thanks.

Training time on the datasets

Hi authors,

Thank you so much for the great work!

May I know how long it takes to train your model on Visual Genome and OpenImages with 4 GPUs?

Thank you!
