kaihuatang / scene-graph-benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization & scene graph extraction on custom images/datasets are provided. It's also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training” (CVPR 2020).

License: MIT License


scene-graph-benchmark.pytorch's Introduction

Scene Graph Benchmark in Pytorch


Our paper Unbiased Scene Graph Generation from Biased Training has been accepted by CVPR 2020 (Oral).

Recent Updates

  • 2020.06.23 Add no graph constraint mean Recall@K (ng-mR@K) and no graph constraint Zero-Shot Recall@K (ng-zR@K) [link]
  • 2020.06.23 Allow scene graph detection (SGDet) on custom images [link]
  • 2020.07.21 Change scene graph detection output on custom images to json files [link]
  • 2020.07.21 Visualize detected scene graphs of custom images [link]
  • TODO: Using Background-Exempted Inference to improve the quality of TDE Scene Graph

Contents

  1. Overview
  2. Install the Requirements
  3. Prepare the Dataset
  4. Metrics and Results for our Toolkit
  5. Faster R-CNN Pre-training
  6. Scene Graph Generation as RoI_Head
  7. Training on Scene Graph Generation
  8. Evaluation on Scene Graph Generation
  9. Detect Scene Graphs on Your Custom Images 🌟
  10. Visualize Detected Scene Graphs of Custom Images 🌟
  11. Other Options that May Improve the SGG
  12. Tips and Tricks for TDE on any Unbiased Task
  13. Frequently Asked Questions
  14. Citations

Overview

This project aims to build a new CODEBASE for Scene Graph Generation (SGG), and it is also a Pytorch implementation of the paper Unbiased Scene Graph Generation from Biased Training. The previously widely adopted SGG codebase neural-motifs is detached from the recent development of Faster/Mask R-CNN. Therefore, I decided to build a scene graph benchmark on top of the well-known maskrcnn-benchmark project and define relationship prediction as an additional roi_head. Thanks to their elegant framework, this codebase is much more novice-friendly and easier to read/modify for your own projects than the previous neural-motifs framework (at least I hope so). It is a pity that when I was working on this project, detectron2 had not been released yet, but I think we can consider maskrcnn-benchmark a more stable version with fewer bugs, hahahaha. I also introduce all the old and new metrics used in SGG, and clarify two common misunderstandings in SGG metrics in METRICS.md, which cause abnormal results in some papers.

Benefiting from the up-to-date Faster R-CNN in maskrcnn-benchmark, this codebase achieves new state-of-the-art Recall@K on SGCls & SGGen (as of 2020.2.16) through the reimplemented VCTree, using two 1080ti GPUs and batch size 8:

| Models | SGGen R@20 | SGGen R@50 | SGGen R@100 | SGCls R@20 | SGCls R@50 | SGCls R@100 | PredCls R@20 | PredCls R@50 | PredCls R@100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VCTree | 24.53 | 31.93 | 36.21 | 42.77 | 46.67 | 47.64 | 59.02 | 65.42 | 67.18 |

Note that all results of VCTree should be better than what we reported in Unbiased Scene Graph Generation from Biased Training, because we optimized the tree construction network after the publication.

The illustration of the Unbiased SGG from 'Unbiased Scene Graph Generation from Biased Training'


Installation

Check INSTALL.md for installation instructions.

Dataset

Check DATASET.md for instructions of dataset preprocessing.

Metrics and Results (IMPORTANT)

Explanations of the metrics used in our toolkit and the reported results are given in METRICS.md.

Pretrained Models

Since we tested many SGG models in our paper Unbiased Scene Graph Generation from Biased Training, I won't upload all the pretrained SGG models here. However, you can download the pretrained Faster R-CNN we used in the paper, which is the most time-consuming step in the whole training process (it took 4 2080ti GPUs). As for the SGG models, you can follow the rest of the instructions to train your own; each SGG model only takes 2 GPUs to train. The results should be very close to the reported results given in METRICS.md.

After you download the Faster R-CNN model, please extract all the files to the directory /home/username/checkpoints/pretrained_faster_rcnn. To train your own Faster R-CNN model, please follow the next section.

The above pretrained Faster R-CNN model achieves 38.52/26.35/28.14 mAP on the VG train/val/test sets, respectively.

Alternate links

Since OneDrive links might be broken in mainland China, we also provide the following alternate links for all the pretrained models and dataset annotations using BaiduNetDisk:

Link:https://pan.baidu.com/s/1oyPQBDHXMQ5Tsl0jy5OzgA Extraction code:1234

Faster R-CNN pre-training

The following command can be used to train your own Faster R-CNN model:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=4 tools/detector_pretrain_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 4 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.STEPS "(30000, 45000)" SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 MODEL.RELATION_ON False OUTPUT_DIR /home/kaihua/checkpoints/pretrained_faster_rcnn SOLVER.PRE_VAL False

where CUDA_VISIBLE_DEVICES and --nproc_per_node specify the IDs and the number of GPUs you use, and --config-file selects the config, in which you can change other parameters. SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH are the training and testing batch sizes, respectively. DTYPE "float16" enables Automatic Mixed Precision as supported by APEX. SOLVER.MAX_ITER is the maximum number of iterations, and SOLVER.STEPS are the steps at which we decay the learning rate. SOLVER.VAL_PERIOD and SOLVER.CHECKPOINT_PERIOD are the periods for running validation and saving checkpoints. MODEL.RELATION_ON turns the relationship head on or off (since this is the pretraining phase for the Faster R-CNN only, we turn the relationship head off). OUTPUT_DIR is the output directory used to save checkpoints and the log (consider /home/username/checkpoints/pretrained_faster_rcnn). SOLVER.PRE_VAL controls whether we run validation before training.
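These trailing KEY VALUE pairs are merged into the yacs-style config last, which is why the command line wins over the yaml file and defaults.py (see the priority note below). A minimal sketch of that override mechanism, with illustrative keys and values rather than the full config:

from yacs.config import CfgNode as CN

# tiny stand-in for maskrcnn_benchmark/config/defaults.py
cfg = CN()
cfg.SOLVER = CN()
cfg.SOLVER.IMS_PER_BATCH = 16
cfg.MODEL = CN()
cfg.MODEL.RELATION_ON = True

# the trailing "KEY VALUE KEY VALUE ..." arguments are merged last,
# so they override both the yaml file and the defaults
cfg.merge_from_list(["SOLVER.IMS_PER_BATCH", 8, "MODEL.RELATION_ON", False])
print(cfg.SOLVER.IMS_PER_BATCH, cfg.MODEL.RELATION_ON)  # 8 False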

Scene Graph Generation as RoI_Head

To standardize the SGG, I define scene graph generation as an RoI_Head. Referring to the design of other roi_heads such as box_head, I put most of the SGG code under maskrcnn_benchmark/modeling/roi_heads/relation_head; the calling sequence is as follows:

[Figure: calling sequence of the relation_head modules]

Perform training on Scene Graph Generation

There are three standard protocols: (1) Predicate Classification (PredCls): taking ground truth bounding boxes and labels as inputs; (2) Scene Graph Classification (SGCls): using ground truth bounding boxes without labels; (3) Scene Graph Detection (SGDet): detecting scene graphs from scratch. We use two switches, MODEL.ROI_RELATION_HEAD.USE_GT_BOX and MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL, to select the protocol.

For Predicate Classification (PredCls), we need to set:

MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True

For Scene Graph Classification (SGCls):

MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False

For Scene Graph Detection (SGDet):

MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False

Predefined Models

We abstract various SGG models to be different relation-head predictors in the file roi_heads/relation_head/roi_relation_predictors.py, which are independent of the Faster R-CNN backbone and relation-head feature extractor. To select our predefined models, you can use MODEL.ROI_RELATION_HEAD.PREDICTOR.

For Neural-MOTIFS Model:

MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor

For Iterative-Message-Passing(IMP) Model (Note that SOLVER.BASE_LR should be changed to 0.001 in SGCls, or the model won't converge):

MODEL.ROI_RELATION_HEAD.PREDICTOR IMPPredictor

For VCTree Model:

MODEL.ROI_RELATION_HEAD.PREDICTOR VCTreePredictor

For our predefined Transformer Model (Note that Transformer Model needs to change SOLVER.BASE_LR to 0.001, SOLVER.SCHEDULE.TYPE to WarmupMultiStepLR, SOLVER.MAX_ITER to 16000, SOLVER.IMS_PER_BATCH to 16, SOLVER.STEPS to (10000, 16000).), which is provided by Jiaxin Shi:

MODEL.ROI_RELATION_HEAD.PREDICTOR TransformerPredictor

For Unbiased-Causal-TDE Model:

MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor

The default settings are under configs/e2e_relation_X_101_32_8_FPN_1x.yaml and maskrcnn_benchmark/config/defaults.py. The priority is command > yaml > defaults.py

Customize Your Own Model

If you want to customize your own model, you can refer to maskrcnn-benchmark/modeling/roi_heads/relation_head/model_XXXXX.py and maskrcnn-benchmark/modeling/roi_heads/relation_head/utils_XXXXX.py. You also need to add the corresponding nn.Module in maskrcnn-benchmark/modeling/roi_heads/relation_head/roi_relation_predictors.py. Sometimes you may also need to change the inputs & outputs of the module through maskrcnn-benchmark/modeling/roi_heads/relation_head/relation_head.py.
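A minimal skeleton of such a predictor module is sketched below; the class name, constructor arguments, and forward signature are illustrative assumptions, not the exact interface used by roi_relation_predictors.py:

from torch import nn

class MyRelationPredictor(nn.Module):
    # Hypothetical predictor: classify each subject-object pair from its
    # pooled union feature. Wire it into roi_relation_predictors.py the same
    # way the existing predictors are.
    def __init__(self, in_channels=4096, num_rel_classes=51):
        super().__init__()
        self.rel_classifier = nn.Linear(in_channels, num_rel_classes)

    def forward(self, union_features):
        # union_features: (num_pairs, in_channels) features of object pairs
        return self.rel_classifier(union_features)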

As for the Unbiased-Causal-TDE model, there are some additional parameters you need to know. MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE is used to select the causal effect analysis type during inference (test), where "none" is the original likelihood, "TDE" is the total direct effect, "NIE" is the natural indirect effect, and "TE" is the total effect. MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE has two choices: "sum" and "gate". Since Unbiased Causal TDE Analysis is model-agnostic, we support Neural-MOTIFS, VCTree and VTransE; MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER is used to select one of these models for Unbiased Causal Analysis, with three choices: motifs, vctree, vtranse.

Note that during training, we always set MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE to be 'none', because causal effect analysis is only applicable to the inference/test phase.
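To make the effect types concrete, here is a minimal sketch of how a TDE-style prediction could be assembled at inference time, assuming you already have logits from a factual pass and from a counterfactual (intervened-input) pass; the function and variable names are illustrative, not this codebase's internals:

import torch

def causal_effect(logits_factual, logits_counterfactual, effect_type="TDE"):
    # "none" keeps the original likelihood; "TDE" removes the biased prior
    # captured by the counterfactual pass
    if effect_type == "none":
        return logits_factual
    if effect_type == "TDE":
        return logits_factual - logits_counterfactual
    raise ValueError("NIE/TE require their own counterfactual passes")

factual = torch.tensor([2.0, 0.5, 0.1])        # logits with the real visual input
counterfactual = torch.tensor([1.5, 0.1, 0.0]) # logits with the intervened input
print(causal_effect(factual, counterfactual, "TDE"))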

Examples of the Training Command

Training Example 1 : (PreCls, Motif Model)

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=2 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR /home/kaihua/checkpoints/motif-precls-exmp

where GLOVE_DIR is the directory used to save GloVe initializations, MODEL.PRETRAINED_DETECTOR_CKPT is the pretrained Faster R-CNN model you want to load, and OUTPUT_DIR is the output directory used to save checkpoints and the log. Since we use WarmupReduceLROnPlateau as the learning rate scheduler for SGG, SOLVER.STEPS is not required anymore.

Training Example 2 : (SGCls, Causal, TDE, SUM Fusion, MOTIFS Model)

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10026 --nproc_per_node=2 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs  SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgcls-exmp

Evaluation

Examples of the Test Command

Test Example 1 : (PreCls, Motif Model)

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/motif-precls-exmp OUTPUT_DIR /home/kaihua/checkpoints/motif-precls-exmp

Test Example 2 : (SGCls, Causal, TDE, SUM Fusion, MOTIFS Model)

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10028 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs  TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/causal-motifs-sgcls-exmp OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgcls-exmp

Examples of Pretrained Causal MOTIFS-SUM models

Examples of Pretrained Causal MOTIFS-SUM models on SGDet/SGCls/PredCls (batch size 12): (SGDet Download), (SGCls Download), (PredCls Download)

Corresponding results (the original models used in the paper are lost; these are freshly trained ones, so there are some fluctuations in the results. More results can be found in Reported Results):

| Models | R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 | zR@20 | zR@50 | zR@100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MOTIFS-SGDet-none | 25.42 | 32.45 | 37.26 | 4.36 | 5.83 | 7.08 | 0.02 | 0.08 | 0.24 |
| MOTIFS-SGDet-TDE | 11.92 | 16.56 | 20.15 | 6.58 | 8.94 | 10.99 | 1.54 | 2.33 | 3.03 |
| MOTIFS-SGCls-none | 36.02 | 39.25 | 40.07 | 6.50 | 8.02 | 8.51 | 1.06 | 2.18 | 3.07 |
| MOTIFS-SGCls-TDE | 20.47 | 26.31 | 28.79 | 9.80 | 13.21 | 15.06 | 1.91 | 2.95 | 4.10 |
| MOTIFS-PredCls-none | 59.64 | 66.11 | 67.96 | 11.46 | 14.60 | 15.84 | 5.79 | 11.02 | 14.74 |
| MOTIFS-PredCls-TDE | 33.38 | 45.88 | 51.25 | 17.85 | 24.75 | 28.70 | 8.28 | 14.31 | 18.04 |

SGDet on Custom Images

Note that evaluation on custom images is only applicable to the SGDet model, because the PredCls and SGCls models require additional ground-truth bounding box information. To detect scene graphs on your own images and save them to a json file, you need to turn on the switch TEST.CUSTUM_EVAL and give TEST.CUSTUM_PATH a folder path (or a json file containing a list of image paths) that contains the custom images. Only JPG files are allowed. The output will be saved as custom_prediction.json in the given DETECTED_SGG_DIR.

Test Example 1 : (SGDet, Causal TDE, MOTIFS Model, SUM Fusion) (checkpoint)

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/causal-motifs-sgdet OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH /home/kaihua/checkpoints/custom_images DETECTED_SGG_DIR /home/kaihua/checkpoints/your_output_path

Test Example 2 : (SGDet, Original, MOTIFS Model, SUM Fusion) (same checkpoint)

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/causal-motifs-sgdet OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH /home/kaihua/checkpoints/custom_images DETECTED_SGG_DIR /home/kaihua/checkpoints/your_output_path

The output is a json file. For each image, the scene graph information is saved as a dictionary containing bbox (sorted), bbox_labels (sorted), bbox_scores (sorted), rel_pairs (sorted), rel_labels (sorted), rel_scores (sorted), rel_all_scores (sorted), where the last field rel_all_scores gives the probabilities of all 51 predicates for each pair of objects. The dataset information is saved as custom_data_info.json in the same DETECTED_SGG_DIR.
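A minimal sketch of reading these outputs is given below. The prediction fields follow the list above; the keys of custom_data_info.json (idx_to_files, ind_to_classes, ind_to_predicates) and the per-image indexing are assumptions you should verify against your own output files:

import json

detected_sgg_dir = "/home/username/checkpoints/your_output_path"  # your DETECTED_SGG_DIR
with open(detected_sgg_dir + "/custom_prediction.json") as f:
    predictions = json.load(f)
with open(detected_sgg_dir + "/custom_data_info.json") as f:
    data_info = json.load(f)

classes = data_info["ind_to_classes"]        # assumed key names
predicates = data_info["ind_to_predicates"]

pred = predictions["0"]                      # assumed: keyed by image index
for (s, o), rel in zip(pred["rel_pairs"][:5], pred["rel_labels"][:5]):
    # everything is sorted by score, so top-k is just a slice
    subj = classes[pred["bbox_labels"][s]]
    obj = classes[pred["bbox_labels"][o]]
    print(subj, predicates[rel], obj)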

Visualize Detected SGs of Custom Images

To visualize the detected scene graphs of custom images, you can follow the jupyter notebook visualization/3.visualize_custom_SGDet.ipynb. The inputs of our visualization code are the custom_prediction.json and custom_data_info.json in DETECTED_SGG_DIR. They will be automatically generated if you run the above custom SGDet instructions successfully. Note that there may be too many trivial bounding boxes and relationships, so you can select the top-k boxes and predicates for a better scene graph by changing the parameters box_topk and rel_topk.

Other Options that May Improve the SGG

  • For some models (not all), turning MODEL.ROI_RELATION_HEAD.POOLING_ALL_LEVELS on or off will affect the performance of predicate prediction, e.g., turning it off improves VCTree PredCls but not the corresponding SGCls and SGGen. For the reported VCTree results, we simply turn it on for all three protocols, as with the other models.

  • For some models (not all), a crazy fusion proposed by Learning to Count Objects significantly improves the results; it looks like f(x1, x2) = ReLU(x1 + x2) - (x1 - x2)**2. It can be used to combine the subject and object features in roi_heads/relation_head/roi_relation_predictors.py. For now, most of our models just concatenate them as torch.cat((head_rep, tail_rep), dim=-1); a small sketch of the fusion is given after this list.

  • Not to mention the hidden dimensions in the models, e.g., MODEL.ROI_RELATION_HEAD.CONTEXT_HIDDEN_DIM. Due to limited time, we didn't fully explore all the settings in this project, so I won't be surprised if you improve our results by simply changing one of these hyper-parameters.
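The fusion mentioned in the second bullet above, as a minimal sketch (the head_rep/tail_rep shapes are illustrative; this is not the exact code in roi_relation_predictors.py):

import torch

def count_fusion(x1, x2):
    # f(x1, x2) = ReLU(x1 + x2) - (x1 - x2)**2, from Learning to Count Objects
    return torch.relu(x1 + x2) - (x1 - x2) ** 2

head_rep = torch.randn(8, 512)  # subject features
tail_rep = torch.randn(8, 512)  # object features

concat_rep = torch.cat((head_rep, tail_rep), dim=-1)  # current default, (8, 1024)
fused_rep = count_fusion(head_rep, tail_rep)          # alternative fusion, (8, 512)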

Tips and Tricks for TDE on any Unbiased Task from Biased Training

The counterfactual inference is not only applicable to SGG. Actually, my colleague Yulei found that counterfactual causal inference also has significant potential in unbiased VQA. We believe such counterfactual inference can also be applied to lots of reasoning tasks with significant bias. It basically just runs the model twice (once for the original output, once for the intervened output), and the latter captures the biased prior that should be subtracted from the final prediction. But there are three tips you need to bear in mind:

  • The most important thing is always the causal graph. You need to find the correct causal graph with an identifiable branch that causes the biased predictions. If the causal graph is incorrect, the rest is meaningless. Note that the causal graph is not a summarization of the existing network (but the guidance for building networks): you should modify your network based on the causal graph, not vice versa.
  • For those nodes having multiple input branches in the causal graph, it's crucial to choose the right fusion function. We tested lots of fusion functions and only found the SUM fusion and GATE fusion to work well consistently. Fusion functions like element-wise product won't work for TDE analysis in most cases, because the causal influence from multiple branches can no longer be linearly separated, which means it's no longer an identifiable 'influence'.
  • For those final predictions having multiple input branches in the causal graph, you may also need to add auxiliary losses for each branch to stabilize the causal influence of each independent branch. When these branches have different convergence speeds, the harder branches would easily be learned as unimportant tiny residuals that depend on the fastest/most stably converged branch. Auxiliary losses allow different branches to have independent and equal influences. A small sketch of sum fusion with auxiliary branch losses is given after this list.
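A minimal sketch of the last two tips, assuming two branches z_vis and z_ctx feeding one prediction; all names, shapes, and the exact gate form are illustrative assumptions, not this codebase's implementation:

import torch
import torch.nn.functional as F

def fuse(z_vis, z_ctx, fusion_type="sum"):
    # SUM keeps the two causal branches linearly separable;
    # GATE modulates one branch with a sigmoid gate computed from the other
    if fusion_type == "sum":
        return z_vis + z_ctx
    return z_ctx * torch.sigmoid(z_vis)

z_vis = torch.randn(4, 51, requires_grad=True)  # e.g. visual branch logits
z_ctx = torch.randn(4, 51, requires_grad=True)  # e.g. context branch logits
labels = torch.randint(0, 51, (4,))

# auxiliary per-branch losses keep each branch independently supervised,
# so a slowly converging branch does not collapse into a tiny residual
loss = (F.cross_entropy(fuse(z_vis, z_ctx), labels)
        + F.cross_entropy(z_vis, labels)
        + F.cross_entropy(z_ctx, labels))
loss.backward()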

Frequently Asked Questions:

  1. Q: Fail to load the given checkpoints. A: The model to be loaded is determined by the last_checkpoint file in the OUTPUT_DIR path. If you fail to load the given pretrained checkpoints, it is probably because the last_checkpoint file still contains the path on my workstation rather than your own path (see the sketch after this list).

  2. Q: AssertionError on "assert len(fns) == 108073" A: If you are working on the VG dataset, it is probably caused by a wrong DATASETS (data path) in maskrcnn_benchmark/config/paths_catalog.py. If you are working on your own custom dataset, just comment out the assertion.

  3. Q: AssertionError on "l_batch == 1" in model_motifs.py A: The original MOTIFS code only supports evaluation on 1 GPU. Since my reimplemented motifs is based on their code, I keep this assertion to make sure it won't cause any unexpected errors.
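For the first question, a minimal sketch of repointing the checkpointer, assuming last_checkpoint is a one-line text file that stores the checkpoint path (the directory and file name below are illustrative):

from pathlib import Path

output_dir = Path("/home/username/checkpoints/causal-motifs-sgdet")  # your OUTPUT_DIR
ckpt = output_dir / "model_final.pth"  # whichever checkpoint file you actually have

# overwrite the stale absolute path written on the author's machine
(output_dir / "last_checkpoint").write_text(str(ckpt))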

Citations

If you find this project helpful for your research, please kindly consider citing our project or papers in your publications.

@misc{tang2020sggcode,
  title = {A Scene Graph Generation Codebase in PyTorch},
  author = {Tang, Kaihua},
  year = {2020},
  note = {\url{https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch}},
}

@inproceedings{tang2018learning,
  title = {Learning to Compose Dynamic Tree Structures for Visual Contexts},
  author = {Tang, Kaihua and Zhang, Hanwang and Wu, Baoyuan and Luo, Wenhan and Liu, Wei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year = {2019}
}

@inproceedings{tang2020unbiased,
  title = {Unbiased Scene Graph Generation from Biased Training},
  author = {Tang, Kaihua and Niu, Yulei and Huang, Jianqiang and Shi, Jiaxin and Zhang, Hanwang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year = {2020}
}

scene-graph-benchmark.pytorch's People

Contributors

bighuang624, kaihuatang, karim-53, navidre, rafiberlin, shonejin, zhanwenchen


scene-graph-benchmark.pytorch's Issues

Failed at pre-evaluation when training with MotifPredictor for SGCLS & SGDET

Hi,

This happened during pre-evaluation (SOLVER.PRE_VAL = True) before the training started. I adopted the config file with PREDICTOR: "MotifPredictor", adapting it for SGCLS or SGDET, i.e.

MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False, or

MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False,

the following assertion was triggered

It seems the evaluation can proceed if I comment out the line. I'll let the training run and see where it leads me. So I'm still not sure if that's an issue/bug/etc.?

best,
Julius

The roi_relation_feature_extractor

❓ Questions and Help

I saw that the roi_relation_feature_extractor is Motif-style. Do you mean that all the methods (Motif, IMP, VCTree) you implemented adopt this Motif-style relation feature extractor although their original implementations do not extract features in such a way? In other words, all the methods are only different in terms of the "predictor", right? Thanks.

Unable to use public kernels

Dear reader,

I am unable to install this repo on a win10 machine (using these instructions)
and unable to install it on public kernels: Google Colab and the kernel provided by Kaggle.

Steps to Reproduce

  1. Open this jupyter notebook (link) with Google Colaboratory
  2. Activate the GPU (Edit -> parameter of the notebook -> ...)
  3. Run all cells
    Section "install PyTorch Detection (Scene-Graph-Benchmark.pytorch)" will not work

If the installation succeeds, the output should be

Installed /content/Scene
Processing dependencies for maskrcnn-benchmark==0.1
Finished processing dependencies for maskrcnn-benchmark==0.1

otherwise you can get:

RuntimeError: Error compiling objects for extension

Environment

PyTorch version: 1.5.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: version 3.12.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.5.0+cu101
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect

Additional context

Surprisingly, the installation is successful if the kernel used is not equipped with a GPU, but then it is useless.
If we compare pip freeze on a CPU kernel with a GPU kernel, the difference is:
only in GPU: cupy-cuda101==6.5.0
only in CPU: -e git+https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch.git@db02790a60bb9b9f7c270352820968b2f2089469#egg=maskrcnn_benchmark (because the installation was successful)
Therefore, I strongly think that this cannot be solved by installing another version of a certain package, but who knows...

Thank you for your help. Once this is fixed, I hope we can provide a public Jupyter notebook that anyone can run directly and use online.

Need help for training a detector

❓ Questions and Help

Hi Kaihua!

I followed the cmd in cmd.cache to train a detector based on VGG16, but I always get "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.940911144672375e-213".

As you said, the cost of training a detector is somewhat unacceptable. Compared to models based on the X_101 backbone, it is harder to find one based on VGG16, so would you be willing to upload the pretrained weights of a Faster R-CNN based on VGG16? It may be a little difficult, but it would absolutely be a super precious resource.

Best wishes.

'get_img_info' Error on SGDet on Custom Images

❓ Questions and Help

I run SGDet on Custom Images on my own images, but the code loads the 'VG information' (e.g., height and width) from datasets/image_data.json in maskrcnn_benchmark/data/datasets/visual_genome.py, line 121, function get_custom_imgs. This leads to a mismatch between my images and the VG images. Should I make a json file which contains the width and height information of my own images? Or what should I do?

About used GPU number for training

Firstly, thanks for this excellent work.
I notice that we need 4 GPUs for training the detector and 2 GPUs for training SGG.
E.g., for the detector,

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=4 tools/detector_pretrain_net.py ...(omit other parameters)

If I want to train it using just ONE GPU, can I manually reduce SOLVER.IMS_PER_BATCH and run

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=1 tools/detector_pretrain_net.py ...(omit other parameters)

I tried it and it starts training. I want to ask if there are any problems with the above command. What's the right command for single-GPU training?

`KeyError: 'labels'` for predictions while doing PredCls

❓ Questions and Help

In vg_eval.py, the following lines report the error KeyError: 'labels' during testing.

if mode == 'predcls':
    label = prediction.get_field('labels').detach().cpu().numpy()

Since prediction only has two extra fields: pred_labels and pred_scores.

Is this a bug? Thanks

torch version

Hello my friend, I'm testing your code with pytorch 1.4.0 and torchvision 0.5.0 on cuda 10.0, python 3.7.0 (this is the default version using conda install pytorch torchvision cudatoolkit=10.0 -c pytorch). However, the versions seem to be mismatched, since there is an import error with the symbol _ZN6caffe26detail36_typeMetaDataInstance_preallocated_7E.
Stack Overflow suggested changing the version of my torch or torchvision, but I'm not sure which version I should use.

inference speed

Hi, I'm just testing the pretrained faster_rcnn you provided and I found the speed is really slow. It takes 0.3s per image on average for inference, and around 3 hours for the entire test set on 1 GPU. I wonder what the normal inference speed is.
BTW, I had some trouble installing apex and ended up installing the Python-only version of it. Does that have something to do with the slow speed?
Thanks!

Size mismatch

❓ Questions and Help

Hello! I am facing the following issue:

  File "tools/detector_pretrain_net.py", line 315, in <module>
    main()
  File "tools/detector_pretrain_net.py", line 308, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/detector_pretrain_net.py", line 72, in train
    extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT, update_schedule=cfg.SOLVER.UPDATE_SCHEDULE_DURING_LOAD)
  File "/home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/utils/checkpoint.py", line 65, in load
    self._load_model(checkpoint, load_mapping)
  File "/home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/utils/checkpoint.py", line 106, in _load_model
    load_state_dict(self.model, checkpoint.pop("model"), load_mapping)
  File "/home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/utils/model_serialization.py", line 94, in load_state_dict
    model.load_state_dict(model_state_dict)
  File "/home/lkochiev/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/nn/modules/module.py", line 829, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN:
        size mismatch for roi_heads.box.feature_extractor.fc7.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([2048, 4096]).
        size mismatch for roi_heads.box.feature_extractor.fc7.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for roi_heads.box.predictor.cls_score.weight: copying a param with shape torch.Size([151, 4096]) from checkpoint, the shape in current model is torch.Size([151, 2048]).
        size mismatch for roi_heads.box.predictor.bbox_pred.weight: copying a param with shape torch.Size([604, 4096]) from checkpoint, the shape in current model is torch.Size([604, 2048]).
Traceback (most recent call last):
  File "/home/lkochiev/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lkochiev/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lkochiev/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/lkochiev/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main

while running the following command:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=1 tools/detector_pretrain_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 6 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.STEPS "(30000, 45000)" SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 MODEL.RELATION_ON True MODEL.ATTRIBUTE_ON True OUTPUT_DIR /home/lkochiev/checkpoints/pretrained_faster_rcnn SOLVER.PRE_VAL False

Can I ask how I can eliminate it?

Some questions about SGCls Predictions

❓ Questions and Help

SGCls: given bounding boxes and an image, predict labels and relationships.
It means the model needs to predict labels for the bounding boxes. I printed the predictions of the SGCls model, but the pred_scores are all 1. I am confused about why these values do not range from 0 to 1. Could you help me understand it?

Visualize SGDet

🐛 Bug

I have already prepared all the required files, and the other lines all run correctly. But when I run the last line of code, some errors come up.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-65157587e5d4> in <module>
----> 1 show_all(start_idx=707, length=1)
      2 # show_selected([119, 967, 713, 5224, 19681, 25371])

<ipython-input-6-1557c3d6a601> in show_all(start_idx, length)
      8         print(f'Image {cand_idx}:')
      9         img_path, boxes, labels, pred_labels, pred_scores, gt_rels, pred_rels, pred_rel_score, pred_rel_label = get_info_by_idx(cand_idx, detected_origin_result)
---> 10         draw_image(img_path=img_path, boxes=boxes, labels=labels, pred_labels=pred_labels, pred_scores=pred_scores, gt_rels=gt_rels, pred_rels=pred_rels, pred_rel_score=pred_rel_score, pred_rel_label=pred_rel_label, print_img=True)

<ipython-input-5-1fd503fdce41> in draw_image(img_path, boxes, labels, pred_labels, pred_scores, gt_rels, pred_rels, pred_rel_score, pred_rel_label, print_img)
     29         print_list('gt_rels', gt_rels, None)
     30         print('*' * 50)
---> 31     print_list('pred_labels', pred_labels, pred_rel_score)
     32     print('*' * 50)
     33     print_list('pred_rels', pred_rels, pred_rel_score)

<ipython-input-5-1fd503fdce41> in print_list(name, input_list, scores)
     10 def print_list(name, input_list, scores):
     11     for i, item in enumerate(input_list):
---> 12         if scores == None:
     13             print(name + ' ' + str(i) + ': ' + str(item))
     14         else:

TypeError: eq() received an invalid combination of arguments - got (NoneType), but expected one of:
 * (Tensor other)
      didn't match because some of the arguments have invalid types: (!NoneType!)
 * (Number other)
      didn't match because some of the arguments have invalid types: (!NoneType!)
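A likely fix, offered as an assumption rather than an official patch: pred_rel_score is a torch tensor here, so comparing it to None with == triggers the TypeError above; testing identity avoids it.

# in print_list, compare against None by identity instead of equality
if scores is None:
    ...  # branch bodies unchanged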

MOTIFS-SGDet-none and Neural Motifs

Hi, dear author, are these two models, MOTIFS-SGDet-none and Neural Motifs, the same? If not, which one is the baseline listed in the paper?

GPU memory

❓ Questions and Help

I found that the object detector is trained with the ResNeXt101-32x8d-FPN backbone. I want to know how much memory it uses on a single GPU. I have tried the ResNeXt101-64x4d-FPN backbone and the peak memory it used is about 11GB on a single GPU (2 images per GPU, and the memory of each of my GPUs is 12GB). I wonder if that's enough for further SGG training, because additional modules will be added. Should I reduce the number of images on a single GPU to one?

error in inferencing

Hi, I'm evaluating on the VG dataset using the pretrained model you provided, and I get this warning from time to time.

2020-05-04 20:19:40,514 maskrcnn_benchmark.inference WARNING: WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!Number of images that were gathered from multiple processes is nota contiguous set. Some images might be missing from the evaluation

I found it is somehow related to the number of GPUs I am using. Sometimes I get the warning, but when I change the GPU number and run it with a completely different config, I don't get it any more. It's strange and I don't know under what conditions I will get this warning.
Do you have any idea about what's going on here?
Thanks!

About pretrained detection model

Umm, it's me again.
I noticed that you released your pretrained Faster R-CNN detection model. I downloaded and evaluated it.

CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/detector_pretest_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/myname/glove MODEL.PRETRAINED_DETECTOR_CKPT ./checkpoints/pretrained_faster_rcnn OUTPUT_DIR ./checkpoints/pretrained_faster_rcnn

BTW, I fixed a bug to run the above command. The "last_checkpoint" string is an absolute path which contains "kaihua". We need to modify some code in "./maskrcnn_benchmark/utils/checkpoint.py".

The result is "Detection evaluation mAp=0.2635", which is far from the value of 0.296 mentioned in your paper. Is there something wrong with my evaluation command?

Training errors

❓ Questions and Help

I replaced the backbone with ResNeSt and loaded a pre-trained model, but the loss is always nan. And after a long time, the program crashes.

.....
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.6727647100921956e-51
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.3363823550460978e-51
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.681911775230489e-52
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.3409558876152446e-52
2020-05-10 08:34:41,021 maskrcnn_benchmark INFO: eta: 1 day, 19:57:40  iter: 200  loss: nan (nan)  loss_classifier: nan (nan)  loss_box_reg: 0.0000 (nan)  loss_objectness: 0.6554 (3.2355)  loss_rpn_box_reg: 0.0783 (0.1020)  time: 1.6072 (1.5858)  data: 0.0132 (0.0184)  lr: 0.023000  max mem: 8743
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.6704779438076223e-52
...
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1e-323
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5e-324
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0
Traceback (most recent call last):
  File "tools/detector_pretrain_net.py", line 317, in <module>
    main()
  File "tools/detector_pretrain_net.py", line 310, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/detector_pretrain_net.py", line 128, in train
    scaled_losses.backward()
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/amp/_process_optimizer.py", line 131, in post_backward_models_are_masters
    scaler.unscale_with_stashed(
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/amp/scaler.py", line 176, in unscale_with_stashed
    out_scale/grads_have_scale,   # 1./scale,
ZeroDivisionError: float division by zero
Traceback (most recent call last):
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/mist/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/mist/anaconda3/envs/scene_graph_benchmark/bin/python', '-u', 'tools/detector_pretrain_net.py', '--local_rank=0', '--config-file', 'configs/e2e_det_ResNeSt_FPN.yaml', 'SOLVER.IMS_PER_BATCH', '4', 'TEST.IMS_PER_BATCH', '1', 'DTYPE', 'float16', 'SOLVER.MAX_ITER', '100000', 'SOLVER.STEPS', '(30000, 45000)', 'SOLVER.VAL_PERIOD', '2000', 'SOLVER.CHECKPOINT_PERIOD', '2000', 'MODEL.RELATION_ON', 'False', 'OUTPUT_DIR', '/home/mist/checkpoints/resnest', 'SOLVER.PRE_VAL', 'False', 'SOLVER.BASE_LR', '0.01', 'GLOVE_DIR', '/home/mist/glove', 'MODEL.WEIGHT', '/home/mist/checkpoints/resnest/resnest_det_new.pth']' returned non-zero exit status 1.

[🐛 Bug Report] Some images in val/test sets are filtered by program

🐛 Bug

Hi,
I found a little glitch in your code.
At the config for build up the datasets maskrcnn_benchmark/config/paths_catalog.py, line 159.

# IF MODEL.RELATION_ON is True, filter images with empty rels
# else set filter to False, because we need all images for pretraining detector
args['filter_non_overlap'] = (
                                 not cfg.MODEL.ROI_RELATION_HEAD.USE_GT_BOX) and cfg.MODEL.RELATION_ON and cfg.MODEL.ROI_RELATION_HEAD.REQUIRE_BOX_OVERLAP
args['filter_empty_rels'] = cfg.MODEL.RELATION_ON
args['flip_aug'] = cfg.MODEL.FLIP_AUG
return dict(
    factory="VGDataset",
    args=args,
)

The arg 'filter_empty_rels' will always be set to True if the relation mode is on, even when building up the dataset for validation or testing, so some images are filtered out of the test and validation sets.
I tried to fix it by modifying line 159 to:

args['filter_empty_rels'] = cfg.MODEL.RELATION_ON and split == 'train'

The model (Motifs baseline) obtains lower performance on object detection (mAP50 drops by 4~5) and scene graph generation (Recall@100 drops by 4~5). You may need to check this issue.

Issues when testing on images without annotations

❓ Questions and Help

Hello,

    Thanks for your great work! Recently, I have been using your code and pre-trained model for scene graph generation on other images without ground truth annotations, and for the code in
x, result, detector_losses = self.roi_heads(features, proposals, targets, logger)

and

    def assign_label_to_proposals(self, proposals, targets):
        for img_idx, (target, proposal) in enumerate(zip(targets, proposals)):
            match_quality_matrix = boxlist_iou(target, proposal)
            matched_idxs = self.proposal_matcher(match_quality_matrix)
            # Fast RCNN only need "labels" field for selecting the targets
            target = target.copy_with_fields(["labels", "attributes"])
            matched_targets = target[matched_idxs.clamp(min=0)]
            
            labels_per_image = matched_targets.get_field("labels").to(dtype=torch.int64)
            attris_per_image = matched_targets.get_field("attributes").to(dtype=torch.int64)

            labels_per_image[matched_idxs < 0] = 0
            attris_per_image[matched_idxs < 0, :] = 0
            proposals[img_idx].add_field("labels", labels_per_image)
            proposals[img_idx].add_field("attributes", attris_per_image)
        return proposals

They all require targets. When I pass

targets = None

There will be the following errors:

Traceback (most recent call last):
  File "tools/relation_test_net.py", line 113, in <module>
    main()
  File "tools/relation_test_net.py", line 97, in main
    inference(
  File "/scenegraph/Scene-Graph-Benchmark/maskrcnn_benchmark/engine/inference.py", line 146, in inference
    predictions = compute_on_dataset(model, data_loader, device, synchronize_gather=cfg.TEST.RELATION.SYNC_GATHER, timer=inference_timer)
  File "/scenegraph/Scene-Graph-Benchmark/maskrcnn_benchmark/engine/inference.py", line 39, in compute_on_dataset
    output = model(images.to(device), targets)
  File "anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scenegraph/Scene-Graph-Benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 53, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets, logger)
  File "/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scenegraph/Scene-Graph-Benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 27, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scenegraph/Scene-Graph-Benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 67, in forward
    proposals = self.samp_processor.assign_label_to_proposals(proposals, targets)
  File "/scenegraph/Scene-Graph-Benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/sampling.py", line 119, in assign_label_to_proposals
    for img_idx, (target, proposal) in enumerate(zip(targets, proposals)):
TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main

Is this a bug? Since when testing on other images, target information is not necessary, right?
Thank you for your help!

Question about resampling

Hi!

Thanks for your interesting work and this nice codebase! 6666

I have some questions about the details of the resampling baseline. In the paper, I saw that the resampling means "Rare categories were up-sampled by the inversed sample fraction during training." Scene graph generation is a detection problem, so you can upsample the data by repeating the images or by repeating the relationships in one batch.
If upsampling by repeating images, one image may contain multiple relationships, which means the common categories may be upsampled at the same time. If upsampling by repeating relationships, that means scaling up the loss of the rare relationships in each batch by multiplying by some number.

The code and the reference [3] in the paper do not seem to show this resampling process specifically for the scene graph generation task.

Could you give the details of this resampling technique?

Looking forward to your reply.

A bug in the reimplementation of Neural-Motifs

🐛 Bug

In the motif_utils.py/sort_by_score function:

 scores = scores.split(num_rois, dim=0)
    ordered_scores = []
    for i, (score, num_roi) in enumerate(zip(scores, num_rois)):
        ordered_scores.append( score - 2.0 * float(num_roi * 2 * num_im + i) )
    ordered_scores = cat(ordered_scores, dim=0)
    _, perm = torch.sort(ordered_scores, 0, descending=True)
    num_rois = sorted(num_rois, reverse=True)

The original implementation by Rowan Zellers is:

 for i, s, e in enumerate_by_image(im_inds):
        rois_per_image[i] = 2 * (s - e) * num_im + i
        lengths.append(e - s)
    lengths = sorted(lengths, reverse=True)
    inds, ls_transposed = transpose_packed_sequence_inds(lengths)  # move it to TxB form
    inds = torch.LongTensor(inds).cuda(im_inds.get_device())
   roi_order = scores - 2 * rois_per_image[im_inds]

We can see that rois_per_image[i] is actually a negative value, since it is (s - e). So the roi_order is actually scores + 2 * (2 * num_roi * num_im + i). In this way, the bboxes of the images with more bboxes will rank first. However, in your implementation, the images are ranked by num_roi in descending order, while the bboxes are ranked in ascending order. It will make mistakes. It should be modified to "ordered_scores.append( score + 2.0 * float(num_roi * 2 * num_im + i) )".

Configuration and hyper parameters for training transformer model to achieve reported performance

Hi Kaihua,

Thanks for your great framework!
I met some trouble reproducing the reported performance of the transformer model. For the PredCls task, there's a gap of about 0.3 percent between my results (66.97) and the reported performance (67.29).

I used the provided pretrained detector and configs/e2e_relation_X_101_32_8_FPN_1x.yaml as the config. As instructed in METRICS.md, I set SOLVER.BASE_LR to 0.001.
My environment is like:

python=3.7.6
pytorch=1.2.0
torchvision=0.4.0
CUDA=10.0

I've only tried the transformer model on the PredCls task, so I'm not sure whether this issue exists for other tasks/models or not. Are there any additional changes to the config or hyper-parameters needed to achieve the reported performance, or is there something that I missed in the documents?

Thanks for your help!

AssertionError

❓ Questions and Help

File "/home/xxd/1 scene graph/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/data/datasets/visual_genome.py", line 291, in load_image_filenames
assert len(fns) == 108073
AssertionError

Does anyone encounter this problem? Thanks.

About the "baseline" in the paper.

❓ Questions and Help

I observed that there is a little difference between directly using the MotifPredictor and using CausalPredictor for Motif (other methods also have this problem). So which one do you use to train the "baseline" in the paper for every method (Motif, VCTree, VTransE)?

head_rect, tail_rect

❓ Questions and Help

Thanks for your contribution. I really appreciate your efforts for this repo.

May I ask what the usage of head_rect and tail_rect is in roi_relation_feature_extractors.py? I couldn't get it even though I have read through the papers (VCTree, Motif, ...).

Is there a way to restrict dataset size

❓ Questions and Help

Is there an easy way to restrict the dataset size to only about 100 examples? (Just to sanity-check my model to make sure it converges.)

Train SGDET Directly?

Hi, dear author, do we need to pretrain SGCls and then finetune SGDet to get the final results on the SGDet metrics, like neural motifs? It would be nice if you could provide scripts that generate the results in the paper.

About the pair accuracy

❓ Questions and Help

I wonder if you can get the "PREDDET" (no graph constraints) results reported in the paper "Neural Motifs [CVPR 2018]", or the so-called "PREDCLS" in "Graph Contrastive Loss [CVPR 2019]". I know that this is actually the "PairAccuracy" in your code. The results in those 2 papers are about 96%-98%. However, I only got about 84-85% with your PairAccuracy. I think that your code is right, but your paper did not report this result, so I may need your help. Thanks.

Pretrained checkpoint

Hi, is it possible for you to provide the pre-trained checkpoints for TDE models. I want to try inference with your pretrained models. Thank you!

parameter passing

❓ Questions and Help

obj_dists, obj_preds, edge_ctx, _ = self.context_layer(roi_features, proposals, logger)

In this line you are passing [roi_features, proposals, logger] to the LSTMcontext class

def forward(self, x, proposals, rel_pair_idxs, logger=None, all_average=False, ctx_average=False):

But shouldn't you pass the parameters [x, proposals, rel_pair_idxs] instead? What I understand is that [logger] ends up receiving [rel_pair_idxs] in your current code implementation.

facing problem in training code

❓ Questions and Help

Sir, I am facing an issue:

  1. undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs
  2. subprocess.CalledProcessError: Command '['/home/cse/anaconda3/envs/scene_graph_benchmark/bin/python', u'-u', 'tools/relation_train_net.py', u'--local_rank=1', '--config-file', 'configs/e2e_relation_X_101_32_8_FPN_1x.yaml', 'MODEL.ROI_RELATION_HEAD.USE_GT_BOX', 'True', 'MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL', 'False', 'MODEL.ROI_RELATION_HEAD.PREDICTOR', 'CausalAnalysisPredictor', 'MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE', 'none', 'MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE', 'sum', 'MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER', 'motifs', 'SOLVER.IMS_PER_BATCH', '12', 'TEST.IMS_PER_BATCH', '2', 'DTYPE', 'float16', 'SOLVER.MAX_ITER', '50000', 'SOLVER.VAL_PERIOD', '2000', 'SOLVER.CHECKPOINT_PERIOD', '2000', 'GLOVE_DIR', '/home/cse/neural/Scene-Graph-Benchmark.pytorch/glove', 'MODEL.PRETRAINED_DETECTOR_CKPT', '/home/cse/neural/Scene-Graph-Benchmark.pytorch/checkpoints/pretrained_faster_rcnn/model_final.pth', 'OUTPUT_DIR', '/home/cse/neural/Scene-Graph-Benchmark.pytorch/checkpoints/causal-motifs-sgcls-exmp']' returned non-zero exit status 1

Please suggest a solution for resolving these issues.

Training only uses 2 gpus

❓ Questions and Help

Regardless of the number of GPUs that are made visible via "CUDA_VISIBLE_DEVICES", the training program only uses 2 GPUs.
Is there any setting that I'm missing? I would love some help.

Could you provide the VG-SGG-dicts-with-attri.json

Hi, Kaihua,
Thanks for your excellent framework. I encountered trouble when I tried to run your code.
It needs the VG-SGG-dicts-with-attri.json file to run, but I find that you have only provided the link to VG-SGG-with-attri.h5 in the prepare/DATASET.md instructions. Could you please tell me how to get VG-SGG-dicts-with-attri.json?
Your reply will be appreciated.

Good day!

Update docker file

🚀 Feature

Fix the docker file

I've been struggling endlessly trying to make it build by tuning the versions of pytorch, cuda, and torchvision, and it just won't work.

Regarding zero shot evaluation

❓ Questions and Help

Hi,
First of all this is a great repository.
I enjoyed reading your paper.

Regarding the zero-shot evaluation, I wanted to know what splits you used, because when I checked all the triplets in the most commonly used training and test sets (Language Priors, Iterative Message Passing and Motifs), I did not find any triplets that appear in the test set but do not appear in the train set.

How are models synchronized across GPUS

❓ Questions and Help

How are the models and gradients synchronized when training in the multi-GPU regime?
I get that each process has its own copy of the model and that each process gets its own batch every iteration. When and where are the models synchronized across the GPUs? Do they have the same starting weights?

I would love a clarification on how the distributed training works in this repo, as I didn't fully understand it just by looking at the code.

Transformer model performance

❓ Questions and Help

I used the default configs except for SOLVER.BASE_LR, which I set to 0.001, to train the Transformer model. However, I couldn't get the reported results. The R@K drops by ~2% for all three protocols.

What other configs should I change to train the Transformer model to reach the reported results?

Thanks

Problem of loading word vectors

❓ Questions and Help

When I train or test the model, this warning appears. Does this warning affect the performance of the model? And how can I avoid it?

loading word vectors from /home/mist/glove/glove.6B.200d.pt
__background__ -> __background__ 
fail on __background__
loading word vectors from /home/mist/glove/glove.6B.200d.pt
__background__ -> __background__ 
fail on __background__

Training Error

When I trained causal TDE with VCTree, after one epoch the log was as follows:
creating index... index created! Loading and preparing results... Converting ndarray to lists... (398878, 7) 0/398878 DONE (t=3.14s) creating index... index created! Running per image evaluation... Evaluate annotation type *bbox* DONE (t=61.00s). Accumulating evaluation results... DONE (t=16.11s). Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.004 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.004 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.004 2020-03-22 19:37:24,802 maskrcnn_benchmark INFO: Detection evaluation mAp=0.0002 SGG eval: R @ 20: 0.0000; R @ 50: 0.0000; R @ 100: 0.0000; for mode=sgdet, type=Recall(Main). SGG eval: ngR @ 20: 0.0000; ngR @ 50: 0.0000; ngR @ 100: 0.0000; for mode=sgdet, type=No Graph Constraint Recall(Main). SGG eval: zR @ 20: 0.0000; zR @ 50: 0.0000; zR @ 100: 0.0000; for mode=sgdet, type=Zero Shot Recall. SGG eval: mR @ 20: 0.0000; mR @ 50: 0.0000; mR @ 100: 0.0000; for mode=sgdet, type=Mean Recall. (above:0.0000) (across:0.0000) (against:0.0000) (along:0.0000) (and:0.0000) (at:0.0000) (attached to:0.0000) (behind:0.0000) (belonging to:0.0000) (between:0.0000) (carrying:0.0000) (covered in:0.0000) (covering:0.0000) (eating:0.0000) (flying in:0.0000) (for:0.0000) (from:0.0000) (growing on:0.0000) (hanging from:0.0000) (has:0.0000) (holding:0.0000) (in:0.0000) (in front of:0.0000) (laying on:0.0000) (looking at:0.0000) (lying on:0.0000) (made of:0.0000) (mounted on:0.0000) (near:0.0000) (of:0.0000) (on:0.0000) (on back of:0.0000) (over:0.0000) (painted on:0.0000) (parked on:0.0000) (part of:0.0000) (playing:0.0000) (riding:0.0000) (says:0.0000) (sitting on:0.0000) (standing on:0.0000) (to:0.0000) (under:0.0000) (using:0.0000) (walking in:0.0000) (walking on:0.0000) (watching:0.0000) (wearing:0.0000) (wears:0.0000) (with:0.0000) 2020-03-22 19:37:27,902 maskrcnn_benchmark INFO: Start training Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 32768.0 2020-03-22 19:37:47,367 maskrcnn_benchmark INFO: ---Total norm inf clip coef 0.00000----------------- 2020-03-22 19:37:47,406 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.vision_prior.weight: inf, (torch.Size([1, 1537])) 2020-03-22 19:37:47,406 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.vis_compress.weight: inf, (torch.Size([51, 4096])) 2020-03-22 19:37:47,406 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.feature_extractor.fc7.weight: 20.78024, (torch.Size([4096, 4096])) 2020-03-22 19:37:47,406 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.feature_extractor.fc6.weight: 20.11942, (torch.Size([4096, 12544])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.feature_extractor.pooler.reduce_channel.0.weight: 4.14919, (torch.Size([256, 1024, 3, 3])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.decoderLSTM.px.weight: 3.35416, (torch.Size([512, 5136])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.box_feature_extractor.fc7.weight: 2.27102, (torch.Size([4096, 4096])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.decoderLSTM.iofux.weight: 2.19413, (torch.Size([2560, 5136])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.out.weight: 2.18532, (torch.Size([151, 512])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.4.weight: 1.96494, (torch.Size([256, 128, 3, 3])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.ctx_compress.weight: 1.93220, (torch.Size([51, 4096])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.spt_emb.2.weight: 1.83346, (torch.Size([4096, 512])) 2020-03-22 19:37:47,407 maskrcnn_benchmark INFO: module.roi_heads.relation.box_feature_extractor.fc6.weight: 1.15214, (torch.Size([4096, 12544])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.post_cat.0.weight: 1.12126, (torch.Size([4096, 1024])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.ctx_compress.bias: 0.96463, (torch.Size([51])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.vis_compress.bias: 0.96335, (torch.Size([51])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.px.weight: 0.95863, (torch.Size([256, 4808])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.px.weight: 0.95250, (torch.Size([256, 4808])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.out.bias: 0.80306, (torch.Size([151])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.spt_emb.0.weight: 0.62462, (torch.Size([512, 32])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofux.weight: 0.59710, (torch.Size([1280, 4808])) 2020-03-22 19:37:47,408 maskrcnn_benchmark INFO: 
module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffux.weight: 0.57092, (torch.Size([1536, 4808])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.spt_emb.2.bias: 0.56506, (torch.Size([4096])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.0.weight: 0.54830, (torch.Size([128, 2, 7, 7])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.px.weight: 0.50012, (torch.Size([256, 4424])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.px.weight: 0.49038, (torch.Size([256, 4424])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.decoderLSTM.iofuh.weight: 0.43853, (torch.Size([2560, 512])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofux.weight: 0.32212, (torch.Size([1280, 4424])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffux.weight: 0.31670, (torch.Size([1536, 4424])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.decoderLSTM.px.bias: 0.25716, (torch.Size([512])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.feature_extractor.fc7.bias: 0.22816, (torch.Size([4096])) 2020-03-22 19:37:47,409 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.score_pre.weight: 0.19370, (torch.Size([512, 512])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.spt_emb.0.bias: 0.18030, (torch.Size([512])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.score_sub.weight: 0.17972, (torch.Size([512, 512])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.decoderLSTM.iofux.bias: 0.17216, (torch.Size([2560])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.decoderLSTM.iofuh.bias: 0.17216, (torch.Size([2560])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.score_obj.weight: 0.16978, (torch.Size([512, 512])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.6.bias: 0.14980, (torch.Size([256])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.feature_extractor.fc6.bias: 0.09835, (torch.Size([4096])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.0.bias: 0.09271, (torch.Size([128])) 2020-03-22 19:37:47,410 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.px.bias: 0.08824, (torch.Size([256])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.px.bias: 0.08800, (torch.Size([256])) 2020-03-22 19:37:47,411 
maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofuh.weight: 0.08449, (torch.Size([1280, 256])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_reduce.weight: 0.08011, (torch.Size([128, 4096])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_right.weight: 0.07955, (torch.Size([1536, 256])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.feature_extractor.pooler.reduce_channel.0.bias: 0.07541, (torch.Size([256])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.vision_prior.bias: 0.06488, (torch.Size([1])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofux.bias: 0.05902, (torch.Size([1280])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofuh.bias: 0.05902, (torch.Size([1280])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.box_feature_extractor.fc7.bias: 0.05845, (torch.Size([4096])) 2020-03-22 19:37:47,411 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.pos_embed.2.weight: 0.05703, (torch.Size([128, 32])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffux.bias: 0.05547, (torch.Size([1536])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_left.bias: 0.05547, (torch.Size([1536])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_right.bias: 0.05547, (torch.Size([1536])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_right.weight: 0.05066, (torch.Size([1536, 256])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.post_emb.weight: 0.04889, (torch.Size([1024, 512])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofuh.weight: 0.04459, (torch.Size([1280, 256])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.px.bias: 0.04133, (torch.Size([256])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.px.bias: 0.04059, (torch.Size([256])) 2020-03-22 19:37:47,412 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.overlap_embed.0.weight: 0.03534, (torch.Size([128, 6])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.6.weight: 0.03523, (torch.Size([256])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.post_cat.0.bias: 0.03501, (torch.Size([4096])) 
2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_embed1.weight: 0.03427, (torch.Size([151, 200])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.2.weight: 0.03308, (torch.Size([128])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.4.bias: 0.03284, (torch.Size([256])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.pos_embed.0.weight: 0.03209, (torch.Size([32, 9])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.union_feature_extractor.rect_conv.2.bias: 0.03060, (torch.Size([128])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofux.bias: 0.02700, (torch.Size([1280])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_backward.treeLSTM.iofuh.bias: 0.02700, (torch.Size([1280])) 2020-03-22 19:37:47,413 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffux.bias: 0.02620, (torch.Size([1536])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_left.bias: 0.02620, (torch.Size([1536])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_right.bias: 0.02620, (torch.Size([1536])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.pos_embed.2.bias: 0.02409, (torch.Size([128])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.post_emb.bias : 0.02241, (torch.Size([1024])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.emb_reduce.weight: 0.02194, (torch.Size([128, 200])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.freq_bias.obj_baseline.weight: 0.02130, (torch.Size([22801, 51])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.box_embed.0.weight: 0.02100, (torch.Size([128, 9])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.box_feature_extractor.fc6.bias: 0.01448, (torch.Size([4096])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.score_sub.bias: 0.01442, (torch.Size([512])) 2020-03-22 19:37:47,414 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.score_obj.bias: 0.01403, (torch.Size([512])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.pos_embed.1.bias: 0.01337, (torch.Size([32])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.score_pre.bias: 0.01170, (torch.Size([512])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.overlap_embed.1.weight: 0.00795, (torch.Size([128])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.pos_embed.1.weight: 0.00637, (torch.Size([32])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: 
module.roi_heads.relation.predictor.context_layer.emb_reduce.bias: 0.00607, (torch.Size([128])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_reduce.bias: 0.00599, (torch.Size([128])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.decoder_rnn.obj_embed.weight: 0.00583, (torch.Size([152, 200])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_embed2.weight: 0.00520, (torch.Size([151, 200])) 2020-03-22 19:37:47,415 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.box_embed.1.weight: 0.00474, (torch.Size([128])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.overlap_embed.1.bias: 0.00398, (torch.Size([128])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.box_embed.1.bias: 0.00379, (torch.Size([128])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.edge_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_left.weight: 0.00101, (torch.Size([1536, 256])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.obj_ctx_rnn.multi_layer_lstm.0.treeLSTM_foreward.treeLSTM.ioffuh_left.weight: 0.00039, (torch.Size([1536, 256])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.bi_freq_prior.weight: 0.00002, (torch.Size([1, 22801])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.box_embed.0.bias: 0.00000, (torch.Size([128])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.pos_embed.0.bias: 0.00000, (torch.Size([32])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: module.roi_heads.relation.predictor.context_layer.overlap_embed.0.bias: 0.00000, (torch.Size([128])) 2020-03-22 19:37:47,416 maskrcnn_benchmark INFO: ------------------------------- Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0 2020-03-22 20:06:12,746 maskrcnn_benchmark INFO: eta: 4 days, 23:18:05 iter: 200 loss: 2.3552 (3.5001) auxiliary_ctx: 0.1831 (0.4863) auxiliary_frq: 0.2194 (0.2859) auxiliary_vis: 0.1982 (0.3998) binary_loss: 0.6931 (0.6737) loss_refine_obj: 0.7161 (1.1575) loss_rel: 0.3178 (0.4969) time: 7.8887 (8.6242) data: 0.0262 (0.0358) lr: 0.054984 max mem: 8340

The GPUs are still occupied, but no more logs are produced. What went wrong?
I just ran:
CUDA_VISIBLE_DEVICES=9,8 python -m torch.distributed.launch --master_port 10026 --nproc_per_node=2 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER vctree SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR ./GLOVE MODEL.PRETRAINED_DETECTOR_CKPT ./checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR ./output/VCTrees_out
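(A side note on the "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to ..." lines: with DTYPE "float16", mixed-precision training uses dynamic loss scaling, so whenever any gradient comes out inf/NaN the optimizer step is skipped and the loss scale is halved. A few such messages at the start are normal; if they never stop, the run has effectively diverged. A schematic of that mechanism — this is not the repo's or apex's actual implementation:)

```python
import torch

class DynamicLossScaler:
    """Schematic dynamic loss scaling, illustrating what the log above reports."""

    def __init__(self, init_scale=65536.0):
        self.scale = init_scale

    def backward_and_step(self, loss, optimizer):
        # Scale the loss up so small fp16 gradients do not underflow.
        (loss * self.scale).backward()
        grads = [p.grad for group in optimizer.param_groups
                 for p in group["params"] if p.grad is not None]
        if any(not torch.isfinite(g).all() for g in grads):
            # Overflow detected: skip this update and shrink the scale.
            self.scale /= 2.0
            optimizer.zero_grad()
            print(f"Gradient overflow. Skipping step, reducing loss scale to {self.scale}")
        else:
            for g in grads:
                g.div_(self.scale)  # unscale before the real parameter update
            optimizer.step()
            optimizer.zero_grad()
```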

Causal Effect Analysis

❓ Questions and Help

I want to ask a simple question: what is the difference between using CausalAnalysisPredictor and setting the effect_analysis=True in other predictors (Motifs)? Thanks.

Issue with Evaluation

❓ Questions and Help

On my machine, I tried to evaluate the performance with one of the pretrained models. The inference completes, but during the evaluation phase I see the error in the screenshot below. Any help?

(screenshot: Capture)

AssertionError

❓ Questions and Help

Hello there! I am running the following command:
CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR /home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch MODEL.PRETRAINED_DETECTOR_CKPT /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR /home/lkochiev/checkpoints/causal-motifs-sgcls-exmp
I expect the model to start training, but instead I get the following:

File "/home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/modeling/roi_heads/relation_head/model_motifs.py", line 178, in forward' assert l_batch == 1 AssertionError

Could I ask for some tips on how to solve it?
I am using PyTorch 1.4, CUDA 10.0, and a 2080 Ti. Thank you in advance!

How to get the predicted relationship?

❓ Questions and Help

It seems that the file "eval_results.pytorch" generated by the evaluation code only contains a list of "BoxList" objects, which describe information about the detected objects.
How can I get the predicted relationship between each pair of detected objects?
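(A rough sketch of how one might inspect that file — the dictionary key and BoxList field names below ('predictions', 'rel_pair_idxs', 'pred_rel_scores', 'pred_labels') are assumptions about how this codebase labels its relation-head outputs, so verify them against your own eval_results.pytorch:)

```python
import torch

# Hedged sketch: load the saved evaluation results and read per-image
# relation predictions from the BoxList extra fields. The key and field
# names are assumptions -- check them on your own file.
results = torch.load("eval_results.pytorch", map_location="cpu")
predictions = results["predictions"] if isinstance(results, dict) else results

boxlist = predictions[0]                             # first test image
pair_idxs = boxlist.get_field("rel_pair_idxs")       # (num_rel, 2) subject/object box indices
rel_scores = boxlist.get_field("pred_rel_scores")    # (num_rel, 51) per-predicate scores
obj_labels = boxlist.get_field("pred_labels")        # predicted object class per box

rel_labels = rel_scores[:, 1:].argmax(dim=1) + 1     # skip the background predicate
for (s, o), r in zip(pair_idxs.tolist(), rel_labels.tolist()):
    print(f"box {s} (class {int(obj_labels[s])}) --[predicate {r}]--> box {o} (class {int(obj_labels[o])})")
```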

Groundtruth of VisualGenome Dataset

❓ Questions and Help

In line 147 of visual_genome.py (function get_groundtruth), what is the purpose of the check "if (random.random() > 0.5):"?
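(If it helps anyone else reading this: a common reason for such a check is that Visual Genome can annotate several predicates for the same subject-object pair, while the ground-truth relation map can only store one label per pair, so a coin flip decides which annotation is kept. A hedged sketch of that pattern — not necessarily the exact code at line 147:)

```python
import random
import torch

def build_relation_map(relation_triplets, num_boxes):
    # relation_triplets: list of (subject_idx, object_idx, predicate) tuples.
    # When the same (subject, object) pair appears more than once, randomly
    # keep either the predicate already stored or the new one -- the role a
    # "random.random() > 0.5" check typically plays in such code.
    relation_map = torch.zeros((num_boxes, num_boxes), dtype=torch.int64)
    for subj, obj, pred in relation_triplets:
        if relation_map[subj, obj] > 0 and random.random() > 0.5:
            continue  # keep the predicate that is already there
        relation_map[subj, obj] = pred
    return relation_map
```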

GQA datasets support?

❓ Questions and Help

In your configs, I saw that there are differences between VG and GQA, but I cannot find support for the GQA dataset. Any ideas about GQA support?

How to run SGDet on COCO dataset

❓ Questions and Help

Dear author:
Thank you for your awesome code!
I just want to use the SGDet module to get the boxes, objects, and relations of images in the COCO dataset for my work. What should I do? How can I use it, and which part of your code should I edit?
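(Not an official answer, but the "Detect Scene Graphs on Your Custom Images" section of this README is probably the shortest path: point the custom-evaluation flags of relation_test_net.py at a folder of COCO images and run a trained SGDet model over them; the boxes, object labels, and relations are then written to json. Roughly along these lines — every path below is a placeholder and the flag names are taken from the custom-image instructions, so double-check them against the README:)

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /path/to/glove MODEL.PRETRAINED_DETECTOR_CKPT /path/to/sgdet_checkpoint OUTPUT_DIR /path/to/sgdet_checkpoint TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH /path/to/coco/images DETECTED_SGG_DIR /path/to/output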

NO-MATCHING of current module

I downloaded the pre-trained PredCls model and ran it as below:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/mist/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/mist/checkpoints/motif-precls/model_0030000.pth OUTPUT_DIR /home/mist/checkpoints/motif-precls MODEL.WEIGHT /home/mist/checkpoints/motif-precls/model_0030000.pth
But it seems the checkpoint lacks some modules, and the results are also weird: the detection mAP is around 100%, and the other metrics are all lower than you reported. What's wrong?

2020-05-07 05:34:12,114 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.post_cat.bias of shape (4096,)
2020-05-07 05:34:12,114 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.post_cat.weight of shape (4096, 1024)
2020-05-07 05:34:12,115 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.rel_compress.bias of shape (51,)
2020-05-07 05:34:12,115 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.rel_compress.weight of shape (51, 4096)
2020-05-07 05:34:17,258 maskrcnn_benchmark.inference INFO: Start evaluation on VG_stanford_filtered_with_attribute_test dataset(26446 images).

The results are shown below:

DONE (t=32.50s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.999
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.999
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.999
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.999
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.995
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 1.000

2020-05-07 07:08:52,592 maskrcnn_benchmark.inference INFO: 
====================================================================================================
Detection evaluation mAp=0.9995
====================================================================================================
SGG eval:   R @ 20: 0.4845;   R @ 50: 0.5942;   R @ 100: 0.6388;  for mode=predcls, type=Recall(Main).
SGG eval: ngR @ 20: 0.5314; ngR @ 50: 0.7008; ngR @ 100: 0.8024;  for mode=predcls, type=No Graph Constraint Recall(Main).
SGG eval:  zR @ 20: 0.0002;  zR @ 50: 0.0013;  zR @ 100: 0.0022;  for mode=predcls, type=Zero Shot Recall.
SGG eval:  mR @ 20: 0.0829;  mR @ 50: 0.1276;  mR @ 100: 0.1549;  for mode=predcls, type=Mean Recall.
(above:0.0854) (across:0.0000) (against:0.0000) (along:0.0092) (and:0.0237) (at:0.1920) (attached to:0.0024) (behind:0.4910) (belonging to:0.0000) (between:0.0069) (carrying:0.2284) (covered in:0.1619) (covering:0.0129) (eating:0.4193) (flying in:0.0000) (for:0.1236) (from:0.0282) (growing on:0.0000) (hanging from:0.0372) (has:0.7925) (holding:0.5807) (in:0.3445) (in front of:0.0680) (laying on:0.0045) (looking at:0.0311) (lying on:0.0000) (made of:0.0625) (mounted on:0.0000) (near:0.2985) (of:0.6079) (on:0.7632) (on back of:0.0000) (over:0.0823) (painted on:0.0000) (parked on:0.0000) (part of:0.0000) (playing:0.0000) (riding:0.4037) (says:0.0000) (sitting on:0.2931) (standing on:0.0066) (to:0.0000) (under:0.2259) (using:0.2667) (walking in:0.0000) (walking on:0.0134) (watching:0.0726) (wearing:0.9630) (wears:0.0008) (with:0.0394) 
SGG eval:   A @ 20: 0.6770;   A @ 50: 0.6795;   A @ 100: 0.6795;  for mode=predcls, type=TopK Accuracy.
====================================================================================================
