hwang-cs-ime / vrr-tamp

[TIP 2022] The official implementation of "One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing".

License: Apache License 2.0

Python 100.00%
transformer visual-relationship-referring

vrr-tamp's Introduction

One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing, (TIP 2022)
Hang Wang1,2 | Youtian Du1 | Yabin Zhang2 | Shuai Li2 | Lei Zhang2
1Xi'an Jiaotong University, 2The Hong Kong Polytechnic University

Abstract

There exist a variety of visual relationships among entities in an image. Given a relationship query <subject, predicate, object>, the task of visual relationship referring (VRR) aims to disambiguate instances of the same entity category and simultaneously localize the subject and object entities in an image. Previous works of VRR can be generally categorized into one-stage and multi-stage methods. The former ones directly localize a pair of entities from the image but they suffer from low prediction accuracy, while the latter ones perform better but they are cumbersome to localize only a couple of entities by generating a rich amount of candidate proposals. In this paper, we formulate the task of VRR as an end-to-end bounding box regression problem and propose a novel one-stage approach, called VRR-TAMP, by effectively integrating Transformers and an adaptive message passing mechanism. First, visual relationship queries and images are respectively encoded to generate the basic modality-specific embeddings, which are then fed into a cross-modal Transformer encoder to produce the joint representation. Second, to obtain the specific representation of each entity, we introduce an adaptive message passing mechanism and design an entity-specific information distiller SR-GMP, which refers to a gated message passing (GMP) module that works on the joint representation learned from a single learnable token. The GMP module adaptively distills the final representation of an entity by incorporating the contextual cues regarding the predicate and the other entity. Experiments on VRD and Visual Genome datasets demonstrate that our approach significantly outperforms its one-stage competitors and achieves competitive results with the state-of-the-art multi-stage methods.
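To make the idea of gated message passing concrete, here is a minimal, generic sketch in plain Python. It is not the SR-GMP module from the paper: the scalar sigmoid gate, the additive update, and the names `gate` and `gated_update` are all assumptions chosen for illustration, and the real module operates on learned Transformer representations rather than toy lists.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate(target, source, weights, bias=0.0):
    """Scalar gate in (0, 1) computed from the concatenated target and source.

    Assumption: a single linear layer followed by a sigmoid; the paper's
    actual gating function may differ.
    """
    joint = target + source  # list concatenation
    return sigmoid(sum(w * v for w, v in zip(weights, joint)) + bias)


def gated_update(target, sources, weights):
    """Add each source message to the target, scaled by its gate."""
    out = list(target)
    for src in sources:
        g = gate(target, src, weights)
        out = [o + g * s for o, s in zip(out, src)]
    return out


if __name__ == "__main__":
    subject = [1.0, 0.0]
    predicate = [0.0, 1.0]
    obj = [0.5, 0.5]
    # Zero weights give a gate of exactly 0.5 for every message.
    weights = [0.0, 0.0, 0.0, 0.0]
    print(gated_update(subject, [predicate, obj], weights))  # [1.25, 0.75]
```

In this toy setup the subject representation is refined with messages from the predicate and the object; swapping the roles of `subject` and `obj` gives the corresponding update for the object entity.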

Overall framework of VRR-TAMP:

[Figure: Framework]

Prerequisites

Create a conda environment, activate it, and then install the dependencies listed in requirements.txt (activating first ensures pip installs into the new environment rather than the base one):

conda create -n VRR-TAMP python=3.6.5
conda activate VRR-TAMP
pip install -r requirements.txt

Installation

Download the pretrained DETR ResNet-50 weights (detr-r50-e632da11.pth) from the DETR repository and store the file in ./saved_models/.
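Before starting a training run, it can help to confirm that the checkpoint is in the expected place. The sketch below only checks the path given above; the helper `find_detr_weights` is hypothetical and not part of the repository, and it makes no claim about how main.py actually loads the file.

```python
from pathlib import Path

# File name and directory are taken from the installation instructions.
WEIGHTS_NAME = "detr-r50-e632da11.pth"
WEIGHTS_DIR = Path("./saved_models")


def find_detr_weights(root=WEIGHTS_DIR, name=WEIGHTS_NAME):
    """Return the checkpoint path if it exists, else raise FileNotFoundError.

    Hypothetical helper for illustration; not part of the VRR-TAMP code base.
    """
    path = Path(root) / name
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected DETR weights at {path}; download them and place them there."
        )
    return path


if __name__ == "__main__":
    try:
        print("Found weights:", find_detr_weights())
    except FileNotFoundError as err:
        print(err)
```

Run it from the repository root so that the relative path ./saved_models/ resolves correctly.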

Train and Evaluate

python main.py --train

Acknowledgement

We thank the authors of TransVG, DETR, and ReSC. Our code is based on their implementations.

Citation

@article{wang2022one,
  title={One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing},
  author={Wang, Hang and Du, Youtian and Zhang, Yabin and Li, Shuai and Zhang, Lei},
  journal={IEEE Transactions on Image Processing},
  volume={32},
  pages={190--202},
  year={2022},
  publisher={IEEE}
}
