HiFT: Hierarchical Feature Transformer for Aerial Tracking

Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, and Yiming Li

Our paper has been accepted by ICCV 2021.

Abstract

Most existing Siamese-based tracking methods execute the classification and regression of the target object based on the similarity maps. However, they either employ a single map from the last convolutional layer, which degrades the localization accuracy in complex scenarios, or separately use multiple maps for decision making, introducing intractable computations for aerial mobile platforms. Thus, in this work, we propose an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking. Hierarchical similarity maps generated by multi-level convolutional layers are fed into the feature transformer to achieve the interactive fusion of spatial cues (shallow layers) and semantic cues (deep layers). Consequently, not only can the global contextual information be raised, facilitating the target search, but our end-to-end architecture with the transformer can also efficiently learn the interdependencies among multi-level features, thereby discovering a tracking-tailored feature space with strong discriminability. Comprehensive evaluations on four aerial benchmarks have proven the effectiveness of HiFT. Real-world tests on the aerial platform have strongly validated its practicability with a real-time speed.

Workflow of our tracker

This figure shows the workflow of our tracker.

About Code

1. Environment setup

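This code has been tested on Ubuntu 18.04, Python 3.8.3, PyTorch 1.6.0 (the "0.7.0" originally listed here presumably refers to the matching torchvision release, since no PyTorch 0.7.0 exists), and CUDA 10.2. Please install the related libraries before running this code: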

pip install -r requirements.txt
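
After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch and not part of the repository: the helper name installed_versions and the package names in the list are assumptions for illustration, so adjust them to match requirements.txt.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    # Package names below are assumptions; edit them to match requirements.txt.
    report = installed_versions(["torch", "torchvision", "opencv-python"])
    for name, version in report.items():
        print(name, version or "not installed")
```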

2. Test

Download the pretrained model, general_model (code: c99t) or general_model_google, and put it into the tools/snapshot directory.

Download the testing datasets and put them into the test_dataset directory. If you want to test the tracker on a new dataset, please refer to pysot-toolkit to set test_dataset.

# --dataset:  dataset_name
# --snapshot: tracker_name
python test.py \
	--dataset UAV10fps \
	--snapshot snapshot/general_model.pth

The testing result will be saved in the results/dataset_name/tracker_name directory.
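
The output location can be pictured with a short sketch. It assumes that tracker_name is the snapshot file name without its extension, which the tracker_name comment on the --snapshot argument suggests but which is not confirmed here; the helper result_dir is hypothetical and is not taken from test.py.

```python
import os


def result_dir(dataset, snapshot, root="results"):
    """Build results/<dataset_name>/<tracker_name> from the test.py arguments.

    Assumption: tracker_name is the snapshot file name without its extension.
    """
    tracker_name = os.path.splitext(os.path.basename(snapshot))[0]
    return os.path.join(root, dataset, tracker_name)


print(result_dir("UAV10fps", "snapshot/general_model.pth"))
```

On Linux this prints results/UAV10fps/general_model.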

3. Train

Prepare training datasets

Download the datasets:

Note: train_dataset/dataset_name/readme.md has listed detailed operations about how to generate training datasets.

Train a model

To train the HiFT model, run train.py with the desired configs:

cd tools
python train.py

4. Evaluation

We provide the tracking results (code: tj12; also available as results_google) on UAV123@10fps, DTB70, UAV20L, and UAV123. To evaluate the tracker, put those results into the results directory.

# --tracker_path:   result path
# --dataset:        dataset_name
# --tracker_prefix: tracker_name
python eval.py \
	--tracker_path ./results \
	--dataset UAV20 \
	--tracker_prefix 'general_model'
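
Tracking benchmarks of this kind are usually scored by the overlap between predicted and ground-truth boxes. As a rough illustration only (this is not the code in eval.py, and the function names iou and success_rate are assumptions), the overlap-based success rate at a single threshold can be sketched as follows, with boxes given as (x, y, width, height):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth exceeds the threshold."""
    if not gt_boxes:
        return 0.0
    hits = sum(iou(p, g) > threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)


# Two frames: a perfect match and a complete miss.
pred = [(0, 0, 10, 10), (5, 5, 10, 10)]
gt = [(0, 0, 10, 10), (20, 20, 10, 10)]
print(success_rate(pred, gt))  # 0.5
```

Evaluation toolkits typically sweep the threshold from 0 to 1 and summarize the resulting curve rather than reporting a single threshold.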

5. Contact

If you have any questions, please contact me.

Ziang Cao

Email: [email protected]

Qualitative Evaluation

Compared with deeper trackers

Performance Comparison

Compared with deeper trackers

Results on DTB70 and UAV20L

For more evaluations, please refer to our paper.

References

@INPROCEEDINGS{cao2021iccv,       
	author={Cao, Ziang and Fu, Changhong and Ye, Junjie and Li, Bowen and Li, Yiming},   
	booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)}, 
	title={{HiFT: Hierarchical Feature Transformer for Aerial Tracking}},
	year={2021},
	volume={},
	number={},
	pages={1-10}
}

Acknowledgement

The code is implemented based on pysot. We would like to express our sincere thanks to the contributors.

hift's Issues

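Problem with model training

I trained with GOT-10K only and changed VIDEOS_PER_EPOCH from 600000 to 64000, leaving all other settings unchanged, to see how it would perform. The resulting model is basically unable to track. Is this normal? Does a ViT necessarily need a much larger dataset? Thanks.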

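Question about results

Why does the model I trained on my own machine with your code give very abnormal predictions when tested on UAV123? The last two values are both 10.
[Screenshot: 1634730578(1)]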

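Training with two GPUs takes the same time as with one GPU

Hello, when training with two GPUs I found that it takes the same time as training with one GPU. I then noticed that the distributed training in pysot is set up with rank=0 and world_size=1, so training does not actually run in parallel. Was this set intentionally?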
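
For context on the rank and world_size values mentioned in this issue: PyTorch's distributed launchers export RANK and WORLD_SIZE as environment variables for each process. The sketch below shows one way such values are commonly read. It is an illustration, not the pysot code; the helper dist_env is hypothetical, and its fallback mirrors the single-process values quoted in the issue.

```python
import os


def dist_env(env=None):
    """Return (rank, world_size) from launcher environment variables.

    Falls back to rank=0, world_size=1 (a single process) when they are unset.
    """
    env = os.environ if env is None else env
    return int(env.get("RANK", 0)), int(env.get("WORLD_SIZE", 1))


# With no launcher variables set, the process sees itself as the only worker.
print(dist_env({}))                                # (0, 1)
print(dist_env({"RANK": "1", "WORLD_SIZE": "2"}))  # (1, 2)
```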

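About OT and FT

What do the transformer structures of OT and FT in the ablation study look like? I read the paper but did not fully understand them.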

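Question about test sets

Hello, I see that you tested on the UAV123@10fps and DTB70 datasets. Do you have the json files for these two datasets, and could you share them? Many thanks.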

par_crop.py

In par_crop.py under the got10k folder, where are the files written by cv2.imwrite(join(video_crop_base_path, '{:06d}.{:02d}.x.jpg'.format(int(idx), int(0))), x) used later?
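
For reference, the format string in that call produces zero-padded file names. A small sketch follows; the wrapper crop_name and its parameter names are hypothetical, and only the format pattern comes from the call quoted in the issue.

```python
def crop_name(idx, track_id=0, kind="x"):
    """Format a crop file name as '{:06d}.{:02d}.<kind>.jpg'."""
    return "{:06d}.{:02d}.{}.jpg".format(int(idx), int(track_id), kind)


print(crop_name(7))        # 000007.00.x.jpg
print(crop_name("12", 3))  # 000012.03.x.jpg
```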

Could you please provide the result files of comparison trackers?

We noticed that only the result files of your proposed tracker have been uploaded for UAV123 and DTB70.
We would greatly appreciate it if you could provide the result files of the other comparison trackers in your paper, which would make it much more convenient for us to cite your work in our paper.

heat maps

Could you tell me how the heatmaps in the paper were drawn? Did you use Grad-CAM?
