Heterologous Image Matching

This is the course project for Pattern Recognition and Machine Learning at the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology.

Introduction

TransT presents an attention-based network that achieves precise and robust detection and tracking by fusing template and search features. Inspired by TransT, and motivated by the characteristics of heterogeneous image-matching tasks, we propose a pseudo-Siamese network whose two branches are independent at the lower level and shared at the higher level. In the experiments, we compare different backbones as well as different feature-extraction strategies for the template and search images. We also simplify the attention module in TransT to suit image matching.
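The weight-sharing idea can be illustrated with a small, framework-free sketch. It only models which stages the template and search branches share. The `Stage` class, the `build_branches` and `count_params` helpers, and the unit parameter counts (2 for the low-level stage, 10 for the high-level stage) are all hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One block of backbone layers with a (hypothetical) parameter count."""
    name: str
    params: int


def build_branches(strategy, low=2, high=10):
    """Return (template_branch, search_branch) as lists of Stage objects.

    Reusing the same Stage object in both lists stands for shared weights.
    The default sizes are illustrative units, not measured values.
    """
    if strategy == "single":
        # Classic Siamese: every stage is shared.
        shared = [Stage("low", low), Stage("high", high)]
        return shared, shared
    if strategy == "low-sep-high-sharing":
        # Pseudo-Siamese: separate low-level stages, one shared high-level stage.
        shared_high = Stage("high", high)
        return ([Stage("low_template", low), shared_high],
                [Stage("low_search", low), shared_high])
    if strategy == "independent":
        # Two fully separate backbones.
        return ([Stage("low_template", low), Stage("high_template", high)],
                [Stage("low_search", low), Stage("high_search", high)])
    raise ValueError(f"unknown strategy: {strategy}")


def count_params(template_branch, search_branch):
    """Sum parameters, counting each distinct Stage object only once."""
    distinct = {id(s): s.params for s in template_branch + search_branch}
    return sum(distinct.values())


if __name__ == "__main__":
    for name in ("single", "low-sep-high-sharing", "independent"):
        print(name, count_params(*build_branches(name)))
```

With these placeholder sizes, the three strategies rank single < low-sep-high-sharing < independent in total parameters. Each image still passes through exactly one low-level and one high-level stage, so the amount of computation per image is the same in all three cases.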

Quick Start

Train

To train a model, run

python run_train.py

Evaluation

To get the test metrics, run

python run_test.py

Demo

To see a demo, run

python demo.py

Experimental Results

Dataset: M3FD

M3FD is a dataset of paired visible and infrared images that contains 6 kinds of targets: {People, Car, Bus, Motorcycle, Lamp, Truck}.

Backbone Processing Strategy

The following are the results of applying different backbone processing strategies to TransT on the M3FD test set.

| Backbone processing strategy | single backbone | low-sep-high-sharing double backbones | independent double backbones |
| --- | --- | --- | --- |
| Model parameters | 23.0M | 23.2M | 31.6M |
| FLOPs | 25.49G | 25.49G | 25.49G |
| mIOU | 0.71 | 0.80 | 0.80 |
| P0.5 | 86.26% | 95.10% | 92.91% |
| P0.7 | 70.35% | 88.79% | 86.95% |

(P0.5 and P0.7 are the fractions of test samples whose IOU is above 0.5 and 0.7, respectively.)
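As a rough illustration of how these metrics can be computed, here is a minimal sketch. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates and that "above" means strictly greater than the threshold. The function names `iou` and `summarize` are hypothetical and do not come from `run_test.py`.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def summarize(pred_boxes, gt_boxes, thresholds=(0.5, 0.7)):
    """Return mIOU and the fraction of samples whose IOU exceeds each threshold."""
    if not pred_boxes or len(pred_boxes) != len(gt_boxes):
        raise ValueError("need equally sized, non-empty box lists")
    ious = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    result = {"mIOU": sum(ious) / len(ious)}
    for t in thresholds:
        result[f"P{t}"] = sum(v > t for v in ious) / len(ious)
    return result


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
    print(summarize(preds, gts))
```

In the toy example, the first prediction matches its ground truth exactly and the second overlaps it by one third, so half of the samples clear both thresholds.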

Different Backbones

The following are the results of applying the low-sep-high-sharing backbone processing strategy to models with backbones other than the one used in TransT, on the M3FD test set.

| Backbone Network | ResNet50 | MobileNetv3 | CSPNet |
| --- | --- | --- | --- |
| Model parameters | 23.2M | 17.4M | 26.7M |
| FLOPs | 25.49G | 15.34G | 29.42G |
| mIOU | 0.80 | 0.81 | 0.77 |
| P0.5 | 95.10% | 96.66% | 91.78% |
| P0.7 | 88.79% | 88.63% | 84.09% |

The following are the results after applying multi-scale feature map fusion to the backbones above.

| Backbone Network | ResNet50(multi) | MobileNetv3(multi) | CSPNet(multi) |
| --- | --- | --- | --- |
| Model parameters | 24.3M | 17.9M | 27.6M |
| FLOPs | 26.16G | 15.64G | 29.98G |
| mIOU | 0.80 | 0.81 | 0.81 |
| P0.5 | 94.75% | 96.33% | 93.91% |
| P0.7 | 86.05% | 88.43% | 88.84% |

We also conduct experiments on homologous image matching using the visible images of M3FD. Note that the network is a Siamese network in this setting.

| Backbone Network | ResNet50 | MobileNetv3(multi) | CSPNet(multi) |
| --- | --- | --- | --- |
| Model parameters | 23.0M | 17.7M | 27.0M |
| FLOPs | 25.49G | 15.64G | 29.98G |
| mIOU | 0.92 | 0.89 | 0.91 |
| P0.5 | 99.25% | 99.20% | 99.61% |
| P0.7 | 96.61% | 97.75% | 95.86% |

We apply the above models to the COCO dataset without fine-tuning.

| Backbone Network | ResNet50 | MobileNetv3(multi) | CSPNet(multi) |
| --- | --- | --- | --- |
| mIOU | 0.85 | 0.81 | 0.84 |
| P0.5 | 95.36% | 94.33% | 94.81% |
| P0.7 | 87.25% | 82.75% | 86.12% |

Experiments on Attention Modules

The following are the results of applying TransT's attention module and our attention module on the M3FD test set.

| Attention Module | TransT(x4) | Ours(x5) |
| --- | --- | --- |
| Model parameters | 23.2M | 19.8M |
| FLOPs | 25.49G | 25.49G |
| mIOU | 0.80 | 0.83 |
| P0.5 | 99.25% | 99.20% |
| P0.7 | 96.61% | 97.75% |

(x4 and x5 denote the number of stacked attention layers.)
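For orientation, the core operation inside attention-based feature fusion is scaled dot-product attention, where features from one image act as queries over features from the other. The sketch below is a generic single-head version in plain Python with no learned projections. It is not the simplified module evaluated above, and the names `cross_attention`, `search_tokens`, and `template_tokens` are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: list of d-dimensional vectors (e.g. search-image tokens)
    keys, values: equally long lists of vectors (e.g. template-image tokens)
    Returns one output vector per query, a weighted average of the values.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    search_tokens = [[1.0, 0.0], [0.0, 1.0]]
    template_tokens = [[1.0, 0.0], [0.0, 1.0]]
    # Keys and values are both the template tokens in this toy example.
    print(cross_attention(search_tokens, template_tokens, template_tokens))
```

Stacking a layer like this several times, as in the x4 and x5 configurations, lets the search features be refined repeatedly against the template features.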

Reference

@inproceedings{TransT,
title={Transformer Tracking},
author={Chen, Xin and Yan, Bin and Zhu, Jiawen and Wang, Dong and Yang, Xiaoyun and Lu, Huchuan},
booktitle={CVPR},
year={2021}
}

@inproceedings{TarDAL,
title={Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection},
author={Liu, Jinyuan and Fan, Xin and Huang, Zhangbo and Wu, Guanyao and Liu, Risheng and Zhong, Wei and Luo, Zhongxuan},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2022}
}
