Multi-source Templates Learning for Real-time Aerial Tracking

This is the official code for the paper "Multi-source Templates Learning for Real-time Aerial Object Tracking". In this work, we present MSTL, an efficient aerial object tracking method based on multi-source templates.

Highlights

Real-time speed on edge platforms.

Our tracker runs at ~200 fps on a GPU, ~100 fps on a CPU, and ~20 fps on the NVIDIA Jetson Xavier NX. With TensorRT acceleration, it reaches ~60 fps on the Jetson Xavier NX and ~19 fps on the Jetson Nano.
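To put these throughput figures in real-time terms, the sketch below converts each reported fps value into an average per-frame processing time and checks whether it keeps up with a camera frame rate. The 30 fps camera rate is an assumption chosen for illustration, and the ~20 fps Xavier NX figure is read as the non-TensorRT number; neither comes from the project's code.

```python
# Throughput figures quoted in this README (frames per second).
REPORTED_FPS = {
    "GPU": 200,
    "CPU": 100,
    "Jetson Xavier NX (PyTorch)": 20,
    "Jetson Xavier NX (TensorRT)": 60,
    "Jetson Nano (TensorRT)": 19,
}

# Assumed camera frame rate, for illustration only.
CAMERA_FPS = 30


def frame_time_ms(fps: float) -> float:
    """Average per-frame processing time implied by a throughput figure."""
    return 1000.0 / fps


def keeps_up(fps: float, camera_fps: float) -> bool:
    """True if frames are processed at least as fast as the camera delivers them."""
    return fps >= camera_fps


if __name__ == "__main__":
    for platform_name, fps in REPORTED_FPS.items():
        print(
            f"{platform_name}: {frame_time_ms(fps):.1f} ms/frame, "
            f"keeps up with {CAMERA_FPS} fps: {keeps_up(fps, CAMERA_FPS)}"
        )
```

Platforms that fall below the camera rate would need to skip frames or lower the capture rate to stay real-time.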

Unlike previous aerial trackers, which are evaluated on high-end platforms (e.g., the Jetson AGX/Orin series), the proposed tracker runs on inexpensive edge platforms: the Jetson Nano and the Jetson Xavier NX.

Competitive performance.

Tracker   Venue       Speed (fps)   UAV123 (Prec.)   UAV123@10fps (Prec.)   UAV20L (Prec.)
Ours      -           209           82.35            83.50                  83.59
TCTrack   CVPR 2022   128           80.05            77.39                  67.20
HIFT      ICCV 2021   137           78.70            74.87                  76.32

Quick Start

Environment Preparation

python 3.7.3
pytorch 1.11.0
opencv-python 4.5.5.64
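A quick way to confirm that a local environment roughly matches the versions listed above is a small check script such as the sketch below. It is not part of the repository; comparing only major.minor versions is an assumption about how strict the match needs to be, and `cv2.__version__` reports "4.5.5" for the opencv-python 4.5.5.64 package.

```python
import platform

# Versions listed in this README; cv2 is the import name of opencv-python.
EXPECTED = {"python": "3.7.3", "torch": "1.11.0", "cv2": "4.5.5"}


def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '1.11.0+cu113'."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def collect_versions() -> dict:
    """Read installed versions; a missing package is recorded as None."""
    found = {"python": platform.python_version()}
    for module in ("torch", "cv2"):
        try:
            found[module] = __import__(module).__version__
        except ImportError:
            found[module] = None
    return found


def report(found: dict) -> list:
    """List (name, expected, found) for packages whose major.minor differs."""
    mismatches = []
    for name, expected in EXPECTED.items():
        got = found.get(name)
        if got is None or major_minor(got) != major_minor(expected):
            mismatches.append((name, expected, got))
    return mismatches


if __name__ == "__main__":
    for name, expected, got in report(collect_versions()):
        print(f"{name}: expected ~{expected}, found {got}")
```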

Training

First, set the paths to the training datasets in lib/train/admin/local.py.

Then, run the following command to start training.

python lib/train/run_training.py
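The exact fields in lib/train/admin/local.py are not documented here. The sketch below assumes the PyTracking/STARK convention of an `EnvironmentSettings` class holding one attribute per path; the attribute names, dataset choices, and paths are hypothetical placeholders, and the `missing_dirs` helper is an illustrative addition for catching unset paths before a long training run.

```python
import os


class EnvironmentSettings:
    """Hypothetical example of lib/train/admin/local.py (PyTracking/STARK style)."""

    def __init__(self):
        self.workspace_dir = "/path/to/MSTL"  # checkpoints and logs are written here
        self.tensorboard_dir = os.path.join(self.workspace_dir, "tensorboard")
        self.pretrained_networks = os.path.join(self.workspace_dir, "pretrained_networks")
        # Dataset roots: placeholder names and paths, adjust to your setup.
        self.lasot_dir = "/data/lasot"
        self.got10k_dir = "/data/got10k/train"
        self.trackingnet_dir = "/data/trackingnet"
        self.coco_dir = "/data/coco"


def missing_dirs(settings) -> list:
    """Return the names of *_dir attributes whose path is not an existing directory."""
    return [
        name
        for name, path in vars(settings).items()
        if name.endswith("_dir") and not os.path.isdir(path)
    ]


if __name__ == "__main__":
    missing = missing_dirs(EnvironmentSettings())
    if missing:
        print("Set these paths before training:", ", ".join(missing))
```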

Evaluation

First, set the paths for this project in lib/test/evaluation/local.py.

Then, run the following commands to evaluate on the four datasets.

  • UAV123
    python tracking/test.py MSTL MSTL --dataset uav
  • UAV20L
    python tracking/test.py MSTL MSTL --dataset uavl
  • UAV123@10fps
    python tracking/test.py MSTL MSTL --dataset uav10
  • UAV -x
    python tracking/test.py MSTL MSTL --dataset uavd
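To run all four evaluations in one go, the commands above can be wrapped in a small driver script. This sketch assumes it is launched from the repository root and that the two positional `MSTL` arguments are the tracker name and the parameter (config) name, as in STARK's tracking/test.py; that reading of the arguments is an assumption.

```python
import subprocess
import sys

# Dataset keys passed to --dataset, as listed in this README.
DATASETS = ("uav", "uavl", "uav10", "uavd")


def build_command(dataset: str, tracker: str = "MSTL", param: str = "MSTL") -> list:
    """Build the evaluation command for one dataset, using the current interpreter."""
    return [sys.executable, "tracking/test.py", tracker, param, "--dataset", dataset]


def run_all(datasets=DATASETS) -> None:
    """Evaluate on each dataset in turn; stop at the first failing run."""
    for name in datasets:
        cmd = build_command(name)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

Using `check=True` makes a failed run raise `CalledProcessError` instead of silently moving on to the next dataset.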

Trained Models and Raw Results

The trained models, training logs, and raw tracking results are provided in the model zoo.

MSTL Framework for Other Transformer-based Trackers

To apply our framework to other transformer-based trackers, we jointly train the original tracker with an additional prediction head. This head takes the outputs of the transformer encoder (or another transformer-based structure) as input and directly predicts the target's bounding box.

As an example, we use TransT (CVPR 2021), which has 4 feature integration layers (each with 2 self-attention and 2 cross-attention modules), to demonstrate how to implement the proposed decoupling strategy.

  • During training:
    • Step 1: Feed the outputs of the second layer into an additional cross-attention module that fuses the search and template features.
    • Step 2: Use the outputs of this cross-attention module as input to the prediction heads to locate the target.
    • Step 3: Train the original model jointly with the additional cross-attention layer and prediction heads.
  • During inference:
    • Use the outputs of the second layer to locate the target.
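The three training steps and the inference shortcut can be sketched in PyTorch as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `FeatureIntegrationLayer` is a simplified stand-in for a TransT layer (2 self-attention plus 2 cross-attention blocks with residuals, no feed-forward or normalization), the box heads are hypothetical mean-pooled regressors of a normalized (cx, cy, w, h) box, and the L1 loss replaces the real tracking losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureIntegrationLayer(nn.Module):
    """Simplified TransT-style layer: 2 self-attention + 2 cross-attention blocks."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.self_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_s = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, t, s):
        t = t + self.self_t(t, t, t)[0]       # template self-attention
        s = s + self.self_s(s, s, s)[0]       # search self-attention
        t_new = t + self.cross_t(t, s, s)[0]  # template attends to search
        s_new = s + self.cross_s(s, t, t)[0]  # search attends to template
        return t_new, s_new


class DecoupledTracker(nn.Module):
    """Hypothetical sketch: 4 integration layers plus an early-exit branch after layer 2."""

    def __init__(self, dim=64, heads=4, num_layers=4, exit_layer=2):
        super().__init__()
        self.layers = nn.ModuleList([FeatureIntegrationLayer(dim, heads) for _ in range(num_layers)])
        self.exit_layer = exit_layer
        # Step 1: additional cross-attention fusing search and template features.
        self.extra_cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Step 2: toy prediction head on the fused features.
        self.early_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))
        # Stand-in for the original tracker's head after the last layer.
        self.final_head = nn.Linear(dim, 4)

    def forward(self, template, search):
        t, s = template, search
        early_box = None
        for i, layer in enumerate(self.layers, start=1):
            t, s = layer(t, s)
            if i == self.exit_layer:
                fused = s + self.extra_cross(s, t, t)[0]
                early_box = self.early_head(fused.mean(dim=1)).sigmoid()
                if not self.training:
                    return early_box  # inference: later layers are skipped
        final_box = self.final_head(s.mean(dim=1)).sigmoid()
        return early_box, final_box


def joint_training_step(model, optimizer, template, search, gt_box):
    """Step 3: train the original path and the early-exit branch together."""
    model.train()
    early_box, final_box = model(template, search)
    loss = F.l1_loss(early_box, gt_box) + F.l1_loss(final_box, gt_box)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = DecoupledTracker()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    template = torch.randn(2, 64, 64)  # (batch, template tokens, dim)
    search = torch.randn(2, 256, 64)   # (batch, search tokens, dim)
    gt = torch.rand(2, 4)
    print("loss:", joint_training_step(model, opt, template, search, gt))
    model.eval()
    with torch.no_grad():
        print(model(template, search).shape)  # torch.Size([2, 4])
```

Because inference exits after the second layer, the later integration layers and the original head are never used at test time and could be dropped from a deployed model, which is where a parameter reduction would come from in this sketch.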

TransT

Tracker   Original Succ. (UAV123)   Original Params   Decoupled Succ. (UAV123)   Decoupled Params
TransT    69.1                      23.0M             -                          16.7M
STARK-S   68.35                     28.079M           68.55                      18.616M
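The relative parameter savings implied by the table can be computed directly from its original and decoupled parameter counts; the short sketch below does only that arithmetic and introduces no new measurements.

```python
# (original params, decoupled params) in millions, taken from the table above.
PARAMS_M = {"TransT": (23.0, 16.7), "STARK-S": (28.079, 18.616)}


def reduction_pct(original: float, decoupled: float) -> float:
    """Percentage of parameters removed by decoupling."""
    return 100.0 * (original - decoupled) / original


if __name__ == "__main__":
    for name, (orig, dec) in PARAMS_M.items():
        print(f"{name}: {reduction_pct(orig, dec):.1f}% fewer parameters")
```

By this calculation TransT sheds roughly 27% of its parameters and STARK-S roughly 34%.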

The corresponding code and pre-trained models will be made available.

About the UAV platform

The platform consists of four main parts: a Pixhawk flight controller, a data transmission module, a visual camera, and a Jetson Xavier NX onboard computer. The onboard computer obtains the video stream through a USB port. The ground station computer can remotely access the onboard computer and select the target to be tracked via the data transmission link.

Acknowledgement

  • We thank the authors of the PyTracking and STARK libraries, which helped us quickly implement our ideas, for providing great frameworks and toolkits.

Contributors

ymsun2020