

VideoMAE for Action Detection (NeurIPS 2022 Spotlight) [arXiv]

VideoMAE Framework

License: CC BY-NC 4.0

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong, Yibing Song, Jue Wang, Limin Wang
Nanjing University, Tencent AI Lab

This repo contains the supported code and scripts to reproduce the action detection results of VideoMAE. The pre-training code is available in the original repo.

📰 News

[2023.1.16] Code and pre-trained models are available now!

🚀 Main Results

✨ AVA 2.2

| Method | Extra Data | Extra Label | Backbone | #Frame x Sample Rate | mAP |
| :------: | :----------: | :-----------: | :--------: | :--------------------: | :---: |
| VideoMAE | Kinetics-400 | ✗ | ViT-S | 16x4 | 22.5 |
| VideoMAE | Kinetics-400 | ✓ | ViT-S | 16x4 | 28.4 |
| VideoMAE | Kinetics-400 | ✗ | ViT-B | 16x4 | 26.7 |
| VideoMAE | Kinetics-400 | ✓ | ViT-B | 16x4 | 31.8 |
| VideoMAE | Kinetics-400 | ✗ | ViT-L | 16x4 | 34.3 |
| VideoMAE | Kinetics-400 | ✓ | ViT-L | 16x4 | 37.0 |
| VideoMAE | Kinetics-400 | ✗ | ViT-H | 16x4 | 36.5 |
| VideoMAE | Kinetics-400 | ✓ | ViT-H | 16x4 | 39.5 |
| VideoMAE | Kinetics-700 | ✗ | ViT-L | 16x4 | 36.1 |
| VideoMAE | Kinetics-700 | ✓ | ViT-L | 16x4 | 39.3 |
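
In the table above, "16x4" means each input clip consists of 16 frames sampled with a temporal stride of 4. The snippet below is a minimal sketch of that sampling convention; the index arithmetic is illustrative only and is not the repository's exact data loader:

```python
import numpy as np

def sample_clip_indices(num_frames_total, clip_len=16, sample_rate=4, center=None):
    """Return `clip_len` frame indices spaced `sample_rate` apart around `center`."""
    if center is None:
        center = num_frames_total // 2              # e.g. the keyframe of interest
    start = center - (clip_len * sample_rate) // 2
    idx = start + np.arange(clip_len) * sample_rate
    # Clamp so that short videos still yield a full-length clip.
    return np.clip(idx, 0, num_frames_total - 1)

print(sample_clip_indices(300))  # 16 indices, stride 4, centered on frame 150
```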

🔨 Installation

Please follow the instructions in INSTALL.md.

➡️ Data Preparation

Please follow the instructions in DATASET.md for data preparation.

⤴️ Fine-tuning with pre-trained models

The fine-tuning instructions are in FINETUNE.md.
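
For intuition only: AVA-style action detection is commonly implemented (for example in AlphAction, which this project builds upon) by pooling per-person features from the backbone's spatiotemporal feature map with ROIAlign and then classifying each box. The sketch below illustrates that general idea with plain PyTorch/torchvision; the function name, tensor shapes, and pooling choices are assumptions for illustration, not this repository's actual detection head. See FINETUNE.md and the code for the real pipeline.

```python
import torch
from torchvision.ops import roi_align

def pool_box_features(feat_3d, boxes, out_size=7):
    """Pool per-box features from a video feature map (shapes are assumptions).

    feat_3d: (B, C, T, H, W) backbone output.
    boxes:   (K, 5) tensor of [batch_idx, x1, y1, x2, y2] in feature-map coordinates.
    """
    feat_2d = feat_3d.mean(dim=2)                               # average over time -> (B, C, H, W)
    pooled = roi_align(feat_2d, boxes, output_size=out_size, aligned=True)
    return pooled.flatten(1)                                    # (K, C * out_size * out_size)

# Hypothetical usage with random tensors:
feat = torch.randn(2, 768, 8, 14, 14)
boxes = torch.tensor([[0, 1.0, 1.0, 10.0, 12.0],
                      [1, 2.0, 3.0,  9.0, 13.0]])
print(pool_box_features(feat, boxes).shape)                     # torch.Size([2, 37632])
```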

📍 Model Zoo

We provide pre-trained and fine-tuned models in MODEL_ZOO.md.
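
As a rough illustration of how a downloaded checkpoint is typically loaded into a PyTorch backbone (the wrapping keys "model"/"module" checked below are assumptions; the exact checkpoint format is documented in MODEL_ZOO.md and handled by the fine-tuning script):

```python
import torch

def load_videomae_weights(model, ckpt_path):
    """Load a (possibly wrapped) state dict into `model`, tolerating head mismatches."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Checkpoints are often wrapped under a "model" or "module" key (an assumption here).
    state_dict = ckpt.get("model", ckpt.get("module", ckpt)) if isinstance(ckpt, dict) else ckpt
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
    return model
```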

☎️ Contact

Zhan Tong: [email protected]

👍 Acknowledgements

Thanks to Lei Chen for support. This project is built upon MAE-pytorch, BEiT and AlphAction. Thanks to the contributors of these great codebases.

🔒 License

The majority of this project is released under the CC-BY-NC 4.0 license as found in the LICENSE file. Portions of the project are available under separate license terms: pytorch-image-models is licensed under the Apache 2.0 license, and BEiT is licensed under the MIT license.

✏️ Citation

If you find this project helpful, please feel free to leave a star ⭐️ and cite our paper:

@inproceedings{tong2022videomae,
  title={Video{MAE}: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training},
  author={Zhan Tong and Yibing Song and Jue Wang and Limin Wang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

@article{videomae,
  title={VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training},
  author={Tong, Zhan and Song, Yibing and Wang, Jue and Wang, Limin},
  journal={arXiv preprint arXiv:2203.12602},
  year={2022}
}


Issues

Input resolution for training and validation

Hi authors,

I found that the input resolution during training is set to 16x224x224, while validation uses 16x256x352.

I may have misread the code, but I wonder how much the input resolution affects validation accuracy. Could you please provide results validated at 224x224?

How to test?

Thanks for your work!

After fine-tuning, we obtain the model's performance on the validation set, but how can we evaluate its accuracy on the test set? Is there a demo script or procedure we can refer to?
