This project forked from princeton-vl/spatialsense

An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

Home Page: https://arxiv.org/abs/1908.02660

License: BSD 2-Clause "Simplified" License

SpatialSense

Dataset and code for the paper:

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
Kaiyu Yang, Olga Russakovsky, and Jia Deng
International Conference on Computer Vision (ICCV) 2019

@inproceedings{yang2019spatialsense,
  title={SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition},
  author={Yang, Kaiyu and Russakovsky, Olga and Deng, Jia},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2019}
}

Requirements

  1. Download the SpatialSense dataset (images.tar.gz and annotations.json) to the root of this repo, then extract images.tar.gz.
  2. Download and install Miniconda for Python 3 (Anaconda should also work).
  3. Install the Python dependencies using conda: conda env create -f spatialsense.yaml && conda activate spatialsense. If you have trouble with the conda setup in steps 2 and 3, you may install the packages listed in spatialsense.yaml manually in whatever way works for you.
  4. Download the pre-trained Word2Vec model GoogleNews-vectors-negative300.bin.gz to ./baselines.

Dataset

Data Format

The annotations.json file contains a list in which each element holds the annotations for a single image. For example, the first element is:

{
  'url': 'https://farm4.staticflickr.com/3543/5704634119_8b8ccf3229.jpg',  # URL for Flickr Images
  'nsid': '10392797@N03',                                                  # Flickr NSID of the user
  'height': 500,
  'width': 281,
  'split': 'train',                                                        # train/valid/test split
  'annotations': [{                                                        # a list of spatial relations
    '_id': '59fbffe4f25c8070bb77ec42',                                     # a unique identifier for the relation
    'predicate': 'on',               
    'object': {
      'y': 402,                                                            # (x, y) is a point on the object
      'x': 148,
      'name': 'ground',
      'bbox': [196, 500, 3, 278]                                           # bounding box
    },
    'subject': {
      'y': 317, 
      'x': 157, 
      'name': 'cat', 
      'bbox': [230, 434, 31, 264]
    },
    'label': True                                                          # the relation is a positive example
    }, {
    '_id': '59ff0e910de0c80e4077c5f0',
    'predicate': 'to the left of',
    'object': {
      'y': 213,
      'x': 240,
      'name': 'mirror',
      'bbox': [0, 345, 160, 280]
    },
    'subject': {
      'y': 303, 
      'x': 143, 
      'name': 'cat', 
      'bbox': [226, 449, 33, 271]
    },
    'label': True},
  ...  
  ]}
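As a rough illustration of how this structure can be traversed, the sketch below flattens the per-image records into one row per relation and counts relations by split and label. It uses a trimmed in-memory copy of the record above (URL and NSID omitted). The helper names iter_relations, count_by_split_and_label and load_annotations are hypothetical and not part of this repo, and load_annotations assumes the file is standard JSON, as its extension suggests.

```python
import json
from collections import Counter

# Trimmed copy of the first record shown above; the 'url' and 'nsid' fields are omitted.
SAMPLE = [
    {
        "height": 500,
        "width": 281,
        "split": "train",
        "annotations": [
            {
                "_id": "59fbffe4f25c8070bb77ec42",
                "predicate": "on",
                "object": {"y": 402, "x": 148, "name": "ground", "bbox": [196, 500, 3, 278]},
                "subject": {"y": 317, "x": 157, "name": "cat", "bbox": [230, 434, 31, 264]},
                "label": True,
            },
            {
                "_id": "59ff0e910de0c80e4077c5f0",
                "predicate": "to the left of",
                "object": {"y": 213, "x": 240, "name": "mirror", "bbox": [0, 345, 160, 280]},
                "subject": {"y": 303, "x": 143, "name": "cat", "bbox": [226, 449, 33, 271]},
                "label": True,
            },
        ],
    }
]


def load_annotations(path="annotations.json"):
    """Hypothetical loader; assumes the annotation file is plain JSON."""
    with open(path) as f:
        return json.load(f)


def iter_relations(images):
    """Yield (split, subject, predicate, object, label) for every relation."""
    for image in images:
        for rel in image["annotations"]:
            yield (
                image["split"],
                rel["subject"]["name"],
                rel["predicate"],
                rel["object"]["name"],
                rel["label"],
            )


def count_by_split_and_label(images):
    """Count relations per (split, label) pair."""
    return Counter((split, label) for split, _, _, _, label in iter_relations(images))


for row in iter_relations(SAMPLE):
    print(row)
print(count_by_split_and_label(SAMPLE))
```

With the full dataset, passing the result of load_annotations() instead of SAMPLE should give the number of positive and negative relations in each split.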

Conventions for coordinates and bounding boxes: The origin is the upper-left corner of an image; the x-axis runs along the width, and the y-axis runs along the height. A bounding box [y0, y1, x0, x1] has (x0, y0) as its upper-left corner and (x1, y1) as its bottom-right corner.
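Because the [y0, y1, x0, x1] order differs from the (left, upper, right, lower) order that many imaging libraries expect (Pillow's Image.crop, for instance), a small conversion helper can prevent mix-ups. The sketch below is illustrative only: the function names are hypothetical and not part of this repo, and treating the box bounds as inclusive in the point check is an assumption.

```python
def bbox_to_corners(bbox):
    """Convert [y0, y1, x0, x1] to (x0, y0, x1, y1), i.e. (left, upper, right, lower)."""
    y0, y1, x0, x1 = bbox
    return (x0, y0, x1, y1)


def bbox_size(bbox):
    """Return (width, height) of a [y0, y1, x0, x1] box."""
    y0, y1, x0, x1 = bbox
    return (x1 - x0, y1 - y0)


def contains_point(bbox, x, y):
    """Check whether (x, y) lies inside the box; inclusive bounds are an assumption."""
    y0, y1, x0, x1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1


# Values taken from the 'cat' subject of the first relation in the example above.
cat_bbox = [230, 434, 31, 264]
print(bbox_to_corners(cat_bbox))           # (31, 230, 264, 434)
print(bbox_size(cat_bbox))                 # (233, 204)
print(contains_point(cat_bbox, 157, 317))  # True
```

In the example record, each annotated (x, y) point falls inside its own bounding box, which is a quick way to confirm that the axes have not been swapped.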

Visualizations

To visualize the relations in SpatialSense: python visualize.py
Run python visualize.py --help to see the options.

Baselines

Assuming you are in the ./baselines directory, below are instructions for reproducing the baselines in the paper.

Language-only

Training: python main_L.py --train_split train_valid --exp_id language-only

Predictions and model checkpoints will be saved in ./runs/language-only.

2D-only

Training: NO_WORD2VEC=1 python main_S.py --train_split train_valid --exp_id 2d-only

Predictions and model checkpoints will be saved in ./runs/2d-only.

Vip-CNN

Training: python main.py --train_split train_valid --exp_id vipcnn --model vipcnn --learning_rate 4e-4 --l2 5e-7 --n_epochs 40 --batchsize 16 --patience 18

Predictions and model checkpoints will be saved in ./runs/vipcnn.

Peyre et al.

  1. Extract the spatial features: python unrel/spatial_features.py
  2. Extract the appearance features: python unrel/appearance_features.py
  3. Train and test the model: python unrel/train.py --spatial --appr --no_val

PPR-FCN

Training: python main.py --train_split train_valid --exp_id pprfcn --model pprfcn --backbone resnet101 --learning_rate 3e-4 --l2 6e-7 --batchsize 7 --patience 14

Predictions and model checkpoints will be saved in ./runs/pprfcn.

DRNet

Training: python main.py --train_split train_valid --exp_id drnet --model drnet --learning_rate 1.5e-4 --l2 2.8e-4

Predictions and model checkpoints will be saved in ./runs/drnet.

VTransE

Training: python main.py --train_split train_valid --exp_id vtranse --model vtranse --learning_rate 6e-4 --l2 3e-4 --feature_dim 128

Predictions and model checkpoints will be saved in ./runs/vtranse.

Spatial Relations in 3D

In a more recent paper, we constructed Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. It enables quantifying the effectiveness of 3D information in predicting spatial relations. We also proposed minimally contrastive data collection, a novel crowdsourcing method for reducing dataset bias. The examples in Rel3D come in minimally contrastive pairs: the two examples in a pair are almost identical but have different labels.
