HOI Transformer

Code for the CVPR 2021 paper "End-to-End Human Object Interaction Detection with HOI Transformer".

This method also won the 2nd Place Award in the HOI Challenge of the Person In Context workshop at CVPR 2021.

TODO list

  • HICO-DET Code & DataSet
  • V-COCO Code & DataSet
  • HOI-A Code & DataSet
  • HICO-DET Res50 Res101
  • V-COCO Res50 Res101
  • HOI-A Res50 Res101
  • Swin-B Code

Performance

Model    HICO-DET (Full/Rare/NonRare)    V-COCO    HOI-A
Res50    28.92/19.83/31.64               51.15     73.23

Reproduction

We recommend setting up with the following steps:

1. Clone the repo.

git clone https://github.com/bbepoch/HoiTransformer.git

2. Download the MS-COCO pretrained DETR model.

cd data/detr_coco && bash download_model.sh

3. Download the annotation files for HICO-DET, V-COCO and HOI-A.

cd data && bash download_annotations.sh

4. Download the image files for HICO-DET, V-COCO and HOI-A. Rather than fetching each dataset by hand, you can use the script we provide to get all of them. The required directory structure is:

    HoiTransformer/
    ├── data/
    │   ├── detr_coco/
    │   ├── hico/
    │   │   ├── eval/
    │   │   └── images/
    │   │       ├── train2015/
    │   │       └── test2015/
    │   ├── hoia/
    │   │   ├── eval/
    │   │   └── images/
    │   │       ├── trainval/
    │   │       └── test/
    │   └── vcoco/
    │       ├── eval/
    │       └── images/
    │           ├── train2014/
    │           └── val2014/
    ├── datasets/
    ├── models/
    ├── tools/
    ├── util/
    ├── engin.py
    ├── main.py
    └── test.py
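
Before training, it can help to confirm that the data directories match the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the list of paths is simply copied from the tree.

```python
from pathlib import Path

# Data sub-directories copied from the tree above, relative to the repo root.
REQUIRED_DIRS = [
    "data/detr_coco",
    "data/hico/eval",
    "data/hico/images/train2015",
    "data/hico/images/test2015",
    "data/hoia/eval",
    "data/hoia/images/trainval",
    "data/hoia/images/test",
    "data/vcoco/eval",
    "data/vcoco/images/train2014",
    "data/vcoco/images/val2014",
]


def missing_dirs(repo_root="."):
    """Return the required directories that do not exist under repo_root.

    Hypothetical helper for illustration; Path.is_dir() follows symlinks,
    so a soft-linked 'data' folder is handled as well.
    """
    root = Path(repo_root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All required data directories are present.")
```

Run it from the 'HoiTransformer' folder; any path it prints still needs to be created or downloaded.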

5. OPTIONAL SETTINGS. Once the subdirectories of 'data' shown above are all in place, you can train a model on any of the three benchmarks. Before that, we highly recommend moving the whole 'data' folder to another location on your computer, e.g. '/home/hoi/data', and leaving only a soft link named 'data' under 'HoiTransformer'.

# Optional but recommended to separate data from code.
mv data /home/hoi/
ln -s /home/hoi/data data

6. Train a model.

# Train on HICO-DET.
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --epochs=150 --lr_drop=110 --dataset_file=hico --batch_size=2 --backbone=resnet50

# Train on HOI-A.
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --epochs=150 --lr_drop=110 --dataset_file=hoia --batch_size=2 --backbone=resnet50

# Train on V-COCO.
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --epochs=150 --lr_drop=110 --dataset_file=vcoco --batch_size=2 --backbone=resnet50

# Training longer can get even better performance.
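
The three training commands differ only in `--dataset_file`. As a rough sketch, they can be generated from one function, which also makes it easier to try a different GPU count. The helper name `build_train_command` is hypothetical and not part of the repository; the flags and default values are copied from the commands above. Whether other hyperparameters should change when `--nproc_per_node` changes is not stated here, so treat non-default values as experiments.

```python
def build_train_command(dataset_file, nproc_per_node=8, epochs=150,
                        lr_drop=110, batch_size=2, backbone="resnet50"):
    """Build the training command as an argument list.

    Hypothetical helper for illustration; defaults mirror the commands above.
    """
    if dataset_file not in ("hico", "hoia", "vcoco"):
        raise ValueError("dataset_file must be 'hico', 'hoia' or 'vcoco'")
    return [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}", "--use_env", "main.py",
        f"--epochs={epochs}", f"--lr_drop={lr_drop}",
        f"--dataset_file={dataset_file}", f"--batch_size={batch_size}",
        f"--backbone={backbone}",
    ]


if __name__ == "__main__":
    # Example: print the V-COCO command for a machine with 4 GPUs.
    print(" ".join(build_train_command("vcoco", nproc_per_node=4)))
```

The returned list can be passed directly to `subprocess.run` from the repository root.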

7. Test a model.

python3 test.py --backbone=resnet50 --batch_size=1 --dataset_file=hico --log_dir=./ --model_path=your_model_path

Annotations

We propose a new annotation format, 'ODGT', which is much easier to understand. We provide annotation files in this format for all the existing benchmarks, i.e. HICO-DET, HOI-A and V-COCO, so you can use them directly without generating them yourself. The core structure of the 'ODGT' format is:

{
    file_name: XXX.jpg,
    width: image width,
    height: image height,
    gtboxes: [
        {
            box: [x, y, w, h],
            tag: object category name,
        },
        ...
    ],
    hoi: [
        {
            subject_id: human box index in gtboxes,
            object_id: object box index in gtboxes,
            interaction: hoi category name,
        },
        ...
    ],
}
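
To illustrate how the fields relate, here is a minimal sketch that reads records and resolves each `hoi` entry to its human and object boxes. Several things here are assumptions rather than facts from this README: that each line of an annotation file holds one JSON record, that `x, y` is the top-left corner of the box, and the names `load_odgt`, `resolve_hoi` and `xywh_to_xyxy`, which are hypothetical. The sample record is made up for illustration.

```python
import json


def xywh_to_xyxy(box):
    """Convert [x, y, w, h] to [x1, y1, x2, y2] (assumes x, y is top-left)."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def resolve_hoi(record):
    """Expand each hoi entry into the boxes its indices point to."""
    boxes = record["gtboxes"]
    triplets = []
    for hoi in record["hoi"]:
        human = boxes[hoi["subject_id"]]  # index into gtboxes
        obj = boxes[hoi["object_id"]]     # index into gtboxes
        triplets.append({
            "human_box": xywh_to_xyxy(human["box"]),
            "interaction": hoi["interaction"],
            "object_tag": obj["tag"],
            "object_box": xywh_to_xyxy(obj["box"]),
        })
    return triplets


def load_odgt(path):
    """Read records from a file, assuming one JSON object per line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up record following the structure above (values are illustrative).
sample = {
    "file_name": "example.jpg",
    "width": 640,
    "height": 480,
    "gtboxes": [
        {"box": [10, 20, 100, 200], "tag": "person"},
        {"box": [50, 120, 150, 100], "tag": "bicycle"},
    ],
    "hoi": [
        {"subject_id": 0, "object_id": 1, "interaction": "ride"},
    ],
}

if __name__ == "__main__":
    for triplet in resolve_hoi(sample):
        print(triplet)
```

Keeping `subject_id` and `object_id` as indices into `gtboxes` means one box can take part in several interactions without being duplicated.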

Citation

@inproceedings{zou2021_hoitrans,
  author = {Zou, Cheng and Wang, Bohan and Hu, Yue and Liu, Junqi and Wu, Qian and Zhao, Yu and Li, Boxun and Zhang, Chenguang and Zhang, Chi and Wei, Yichen and Sun, Jian},
  title = {End-to-End Human Object Interaction Detection with HOI Transformer},
  booktitle={CVPR},
  year = {2021},
}

Acknowledgement

We sincerely thank all previous works, especially DETR, PPDM and iCAN, as some of our code is built upon them.
