ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

We provide the code for reproducing the experimental results of ViTAA.

ECCV 2020 conference paper: pdf.
If this work is helpful for your research, please cite ViTAA:

@misc{wang2020vitaa,
    title={ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language},
    author={Zhe Wang and Zhiyuan Fang and Jun Wang and Yezhou Yang},
    year={2020},
    eprint={2005.07327},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Benchmark

CUHK-PEDES

Method	Features	R@1	R@5	R@10
GNA-RNN	global	19.05	-	53.64
CMCE	global	25.94	-	60.48
PWM-ATH	global	27.14	49.45	61.02
Dual Path	global	44.40	66.26	75.07
CMPM+CMPC	global	49.37	-	79.27
MIA	global+region	53.10	75.00	82.90
GALM	global+keypoint	54.12	75.45	82.97
ViTAA	global+attribute	55.97	75.84	83.52
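
The R@K columns above are top-K retrieval accuracies for text-to-image search. As a rough illustration only, the sketch below computes R@K under the usual definition: a text query counts as a hit if at least one gallery image with the same person ID appears among its K highest-scoring images. The function name `recall_at_k` and the list-of-lists similarity layout are assumptions made for this example; the repository's own evaluation code may differ.

```python
from typing import Dict, Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Percentage of queries with a correct identity among the top-k images.

    similarity[i][j] is the score between text query i and gallery image j
    (higher means more similar). This layout is an assumption of the sketch.
    """
    hits = {k: 0 for k in ks}
    for scores, qid in zip(similarity, query_ids):
        # Rank gallery indices from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        for k in ks:
            if any(gallery_ids[j] == qid for j in ranked[:k]):
                hits[k] += 1
    n = len(query_ids)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy example: 2 text queries scored against 3 gallery images.
sim = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.7]]
print(recall_at_k(sim, query_ids=[0, 2], gallery_ids=[0, 1, 2], ks=(1, 2)))
# -> {1: 50.0, 2: 100.0}
```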

Data preparation

Download the CUHK-PEDES dataset and save it anywhere you like (e.g. ~/datasets/cuhkpedes/).
Download text_attribute_graph (GoogleDrive / BaiduYun(code: vbss)), which contains the text phrases parsed from the sentences, and save it in your dataset directory (e.g. ~/datasets/cuhkpedes/).
Use the provided Human Parsing Network to generate the attribute segmentations, and save them in your dataset directory (e.g. ~/datasets/cuhkpedes/).
Run the script tools/cuhkpedes/convert_to_json.py to generate the JSON annotation files:

python tools/cuhkpedes/convert_to_json.py --datadir ~/datasets/cuhkpedes/ --outdir datasets/cuhkpedes/annotations

Your datasets directory should look like this:

ViTAA
-- configs
-- tools
-- vitaa
-- datasets
   |-- cuhkpedes
   |   |-- annotations
   |   |   |-- test.json
   |   |   |-- train.json
   |   |   |-- val.json
   |   |-- imgs
   |   |   |-- cam_a
   |   |   |-- cam_b
   |   |   |--  ...
   |   |-- segs
   |   |   |-- cam_a
   |   |   |-- cam_b
   |   |   |--  ...
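
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it only checks the files and folders listed in the tree above, relative to datasets/cuhkpedes.

```python
from pathlib import Path
from typing import List


def missing_entries(root: str) -> List[str]:
    """Return the expected files/folders that do not exist under `root`."""
    expected = [
        "annotations/train.json",
        "annotations/val.json",
        "annotations/test.json",
        "imgs",
        "segs",
    ]
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Run from the ViTAA repository root.
    missing = missing_entries("datasets/cuhkpedes")
    print("missing:", missing if missing else "nothing")
```

An empty list means every entry shown in the tree was found; the contents of imgs and segs (cam_a, cam_b, ...) are not inspected.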

Training

# single-gpu training
python tools/train_net.py --config-file configs/cuhkpedes/bilstm_r50_seg.yaml

# multi-gpu training
We provide code for distributed training, but it has not been tested.

Note: We train ViTAA with batch_size=64 on one Tesla V100 GPU. If your GPU cannot fit this batch size, please follow the Linear Scaling Rule (scale the learning rate in proportion to the batch size) to adjust the configuration.
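
As a small illustration of the Linear Scaling Rule, the sketch below scales a learning rate in proportion to the batch size. The reference batch size of 64 comes from the note above; the base learning rate of 0.0002 is a made-up placeholder, not the value in configs/cuhkpedes/bilstm_r50_seg.yaml, so read the real value from your config before applying the rule.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear Scaling Rule: the learning rate changes by the same factor as the batch size."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical base learning rate; replace it with the one from your config file.
placeholder_base_lr = 0.0002
for batch_size in (64, 32, 16):
    lr = scale_learning_rate(placeholder_base_lr, base_batch_size=64, new_batch_size=batch_size)
    print(f"batch_size={batch_size}: lr={lr:.6f}")
```

When the batch size shrinks, the number of training iterations is usually increased by the same factor so that the model still sees the same number of samples; whether and how to do that here depends on the solver settings in the config.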

Testing

# single-gpu testing
python tools/test.py --config-file configs/cuhkpedes/bilstm_r50_seg.yaml --checkpoint-file output/cuhkpedes/...

Human Parsing Network

We separately provide the code of our Human Parsing Network because we think it might be a useful tool for the community.

Acknowledgement

Our code is based on maskrcnn-benchmark; many thanks to its authors for their work.
