This project is forked from dzh19990407/ppmn

ACM MM 2022 - PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

License: Apache License 2.0

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding,
Zihan Ding*, Zi-han Ding*, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei and Si Liu
ACM MM 2022 (arXiv 2208.05647)

News

  • [2022-08-12] Code is released.

Abstract

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image. The previous two-stage approach first extracts segmentation region proposals by an off-the-shelf panoptic segmentation model, then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase. However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage and the loss of spatial details caused by region feature pooling, as well as complicated strategies designed for things and stuff categories separately. To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination. Thus, our model can exploit sufficient and finer cross-modal semantic correspondence from the supervision of densely annotated pixel-phrase pairs rather than sparse region-phrase pairs. In addition, we also propose a Language-Compatible Pixel Aggregation (LCPA) module to further enhance the discriminative ability of phrase features through multi-round refinement, which selects the most compatible pixels for each phrase to adaptively aggregate the corresponding visual context. Extensive experiments show that our method achieves new state-of-the-art performance on the PNG benchmark with 4.0 absolute Average Recall gains.

Installation

Requirements

  • Python
  • NumPy
  • PyTorch 1.7.1
  • tqdm 4.56.0
  • SciPy 1.5.3
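
A minimal environment setup sketch, assuming all packages are installed from PyPI (pick the PyTorch build that matches your CUDA setup):

$ pip install torch==1.7.1 tqdm==4.56.0 scipy==1.5.3 numpy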

Cloning the repository

$ git clone git@github.com:dzh19990407/PPMN.git
$ cd PPMN

Dataset Preparation

Panoptic Narrative Grounding Benchmark

  1. Download the 2017 MSCOCO Dataset from its official webpage. You will need the images and panoptic segmentation annotations for the train and validation splits.

  2. Download the Panoptic Narrative Grounding Benchmark from the PNG project webpage. Organize the files as follows (example commands for arranging them follow the listing):

panoptic_narrative_grounding
|_ images
|  |_ train2017
|  |_ val2017
|_ annotations
   |_ png_coco_train2017.json
   |_ png_coco_val2017.json
   |_ panoptic_segmentation
   |  |_ train2017
   |  |_ val2017
   |_ panoptic_train2017.json
   |_ panoptic_val2017.json
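
One possible way to arrange the downloads into this layout, assuming the standard archive names from the COCO download page (train2017.zip, val2017.zip, panoptic_annotations_trainval2017.zip) and the png_coco_*.json files from the PNG webpage; adjust the names to whatever you actually downloaded:

$ mkdir -p panoptic_narrative_grounding/images panoptic_narrative_grounding/annotations/panoptic_segmentation
$ unzip train2017.zip -d panoptic_narrative_grounding/images/
$ unzip val2017.zip -d panoptic_narrative_grounding/images/
$ mv png_coco_train2017.json png_coco_val2017.json panoptic_narrative_grounding/annotations/
# Extract panoptic_train2017.json / panoptic_val2017.json from the panoptic
# archive into annotations/, and the per-image segmentation PNGs into
# annotations/panoptic_segmentation/train2017 and val2017.
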
  3. Pre-process the Panoptic Narrative Grounding ground-truth annotations for the dataloader using utils/pre_process.py.

  4. At the end of this step you should have two new files in your annotations folder (a quick sanity check follows the listing below):

panoptic_narrative_grounding
|_ annotations
   |_ png_coco_train2017.json
   |_ png_coco_val2017.json
   |_ png_coco_train2017_dataloader.json
   |_ png_coco_val2017_dataloader.json
   |_ panoptic_segmentation
   |  |_ train2017
   |  |_ val2017
   |_ panoptic_train2017.json
   |_ panoptic_val2017.json
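
As a quick sanity check (not part of the repository), confirm that pre-processing produced the two dataloader files listed above:

$ ls panoptic_narrative_grounding/annotations/ | grep dataloader
# expected: png_coco_train2017_dataloader.json and png_coco_val2017_dataloader.json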

Train and Inference

Modify the paths in train_net.sh according to your local setup. If you only want to test the pretrained model, add --ckpt_path ${PRETRAINED_MODEL_PATH} and --test_only.
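
For example, assuming train_net.sh forwards extra command-line arguments to the underlying training script (if it does not, edit the flags into the script instead):

# training
$ bash train_net.sh

# evaluation only, using a pretrained checkpoint
$ bash train_net.sh --ckpt_path ${PRETRAINED_MODEL_PATH} --test_only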

Pretrained Model

To reproduce all our results as reported below, you can use our pretrained model and our source code.

Method   things + stuff   things   stuff
PNG      55.4             56.2     54.3
MCN      -                48.2     -
PPMN     59.4             57.2     62.5

Method   singulars + plurals   singulars   plurals
PNG      55.4                  56.2        48.8
PPMN     59.4                  60.0        54.0

Citation

@inproceedings{Ding_2022_ACMMM,
    author    = {Ding, Zihan and Ding, Zi-han and Hui, Tianrui and Huang, Junshi and Wei, Xiaoming and Wei, Xiaolin and Liu, Si},
    title     = {PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding},
    booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
    year      = {2022}
}

Acknowledgements

Some of the code is built upon K-Net and PNG. Thanks for their great work!
