Code for TCFormer, from the paper "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer".

License: Apache License 2.0

TCFormer (CVPR'2022 Oral)

[📜paper]

Introduction

Official code repository for the paper:
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
[Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, and Xiaogang Wang]

TODO

Release whole-body pose estimation training/testing code.
Release whole-body pose estimation model zoo.
Train TCFormer-large on the COCO-WholeBody dataset.
Add a FLOPs calculation function.
Integrate TCFormer into MMPose.

Model Zoo

You can find the pretrained checkpoints here.

Image Classification

For classification configs and weights, see here.

TCFormer on ImageNet-1K

Method	Input Size	Acc@1 (%)	#Params (M)	Config	Checkpoint	Log
TCFormer-light	224	79.4	14.2	config	57M [Google]	[Google]
TCFormer	224	82.3	25.6	config	103M [Google]	[Google]
TCFormer-large	224	83.6	62.8	config	103M [Google]	[Google]
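
The #Params column above is given in millions. As a rough illustration of how such a figure is usually obtained, the sketch below sums the element counts of a set of tensor shapes and divides by one million. The helper name `count_params_millions` and the toy shapes are hypothetical and are not taken from the TCFormer code or checkpoints.

```python
from math import prod


def count_params_millions(shapes):
    """Return the total element count, in millions, of a name -> shape mapping."""
    total = sum(prod(shape) for shape in shapes.values())
    return total / 1e6


# Hypothetical toy shapes for illustration only, NOT the real TCFormer layers.
toy_shapes = {
    "patch_embed.weight": (64, 3, 7, 7),
    "patch_embed.bias": (64,),
    "head.weight": (1000, 512),
    "head.bias": (1000,),
}

print(f"{count_params_millions(toy_shapes):.3f} M parameters")
```

Assuming the released checkpoints are ordinary PyTorch state dicts, the same helper could be fed `{name: tuple(t.shape) for name, t in state_dict.items()}`. Note that a state dict may also hold non-trainable buffers, so the result can differ slightly from a reported parameter count.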

WholeBody Estimation

For whole-body estimation configs and weights, see here.

Results on COCO-WholeBody v1.0 val, using a person detector that achieves 56.4 human AP on COCO val2017

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
TCFormer	256x192	0.697	0.774	0.705	0.821	0.656	0.753	0.539	0.652	0.576	0.681	ckpt	log
TCFormer_large	384x288	0.718	0.794	0.744	0.850	0.790	0.856	0.614	0.715	0.642	0.733	ckpt	log

Citation

If you use this code for a paper, please cite:

@inproceedings{zeng2022not,
  title={Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11101--11111},
  year={2022}
}