This project forked from zhaoyue-zephyrus/avion

Code release for "Training a Large Video Model on a Single Machine in a Day"

Home Page: http://arxiv.org/abs/2309.16669

License: MIT License

Python 99.56% Shell 0.44%

✈️ avion

AVION is short for A VIdeo model in ONe day. AVION (meaning plane in French and Spanish) is fast.

Training a Large Video Model on a Single Machine in a Day
Yue Zhao, Philipp Krähenbühl
UT Austin
arxiv | bibtex

Installation

See INSTALL.md to install this code.

Main results

  1. AVION enables video-language contrastive pre-training on Ego4D (original narratives) on a single node of 8× consumer-grade GPUs within a day.

    | Method | Backbone | Batch size per GPU | GPU memory | Hardware | GPU×hour^ | EK100 MIR 0-shot Avg. mAP |
    | --- | --- | --- | --- | --- | --- | --- |
    | EgoVLP | TSF-B | 16 | 22 | 32× A100 | 1536 | 22.1 |
    | Ours | ViT-B | 256 | 19 | 8× A5000 | 130 | 27.4 |

    ^The reported GPU×hour is not normalized for GPU generations. The cost for EgoVLP is obtained from the original paper (Sec 6.1).

  2. AVION speeds up LLM-augmented video-language contrastive pre-training (LaViLa) on Ego4D.

    a. Pretraining cost and performance.

    | Method | Backbone | Batch size per GPU | GPU memory | Hardware | GPU×hour^ | EK100 MIR 0-shot Avg. mAP |
    | --- | --- | --- | --- | --- | --- | --- |
    | LaViLa | TSF-B | 32 | 25 | 32× V100 | 1824 | 30.9 |
    | Ours | ViT-B | 256 | 19 | 8× A5000 | 260 | 33.2 |

    ^The reported GPU×hour is not normalized for GPU generations.

    b. Downstream performance.

    | Method | Backbone | EK100 MIR Avg. mAP | EK100 MIR Avg. nDCG | EK100 CLS Action Top-1 |
    | --- | --- | --- | --- | --- |
    | LaViLa | TSF-B | 50.5 | 65.0 | 46.9 |
    | Ours | ViT-B | 51.7 | 66.8 | 49.5 |
    | LaViLa | TSF-L | 50.9 | 66.5 | 51.0 |
    | Ours | ViT-L | 54.5 | 69.0 | 54.5 |

    🏆 LaViLa+AVION helped us win the CVPR 2023 EPIC-Kitchens Challenges in both the Action Recognition and Multi-Instance Retrieval tasks by a significant margin.

  3. AVION speeds up VideoMAE pre-training.

    | Method | Backbone | Epochs | GPU×hour^^ | top-1/top-5 (w/ FT) |
    | --- | --- | --- | --- | --- |
    | VideoMAE | ViT-B | 800 | 995 | 80.0/94.4 |
    | Ours | ViT-B | 800 | 583 | 80.1/94.5 |

    ^^Both GPU×hour figures are measured on the same hardware (4× A5000 GPUs).

For more details, please refer to MODEL_ZOO.
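
To make the GPU×hour columns easier to compare, here is a minimal sketch of the arithmetic behind them. It is not part of the AVION codebase: the function names and the `RESULTS` layout are hypothetical, and the numbers are copied from the tables above. The wall-clock estimate assumes every GPU is busy for the whole run, and the first two ratios inherit the caveat that GPU×hour is not normalized across GPU generations.

```python
# GPU-hour figures copied from the result tables above; nothing here is measured.
# Each entry: (baseline GPU-hours, AVION GPU-hours, number of GPUs used by AVION).
RESULTS = {
    "Ego4D contrastive pre-training (vs. EgoVLP)": (1536, 130, 8),
    "LaViLa pre-training (vs. LaViLa)": (1824, 260, 8),
    "VideoMAE pre-training (vs. VideoMAE)": (995, 583, 4),
}


def cost_ratio(baseline_gpu_hours: float, avion_gpu_hours: float) -> float:
    """How many times more GPU-hours the baseline reports than AVION."""
    return baseline_gpu_hours / avion_gpu_hours


def wall_clock_hours(gpu_hours: float, num_gpus: int) -> float:
    """Rough wall-clock time, assuming all GPUs run for the full duration."""
    return gpu_hours / num_gpus


if __name__ == "__main__":
    for name, (baseline, avion, num_gpus) in RESULTS.items():
        ratio = cost_ratio(baseline, avion)
        hours = wall_clock_hours(avion, num_gpus)
        print(f"{name}: {ratio:.1f}x fewer GPU-hours, "
              f"about {hours:.1f} h wall-clock on {num_gpus} GPUs")
```

Under these assumptions, the 130 GPU×hours in the first table correspond to about 16 hours on 8 GPUs, which is consistent with the "within a day" claim for Ego4D pre-training.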

License

MIT License.

Acknowledgements

  • The vision-language contrastive pretraining part is refactored from LaViLa.
  • The MAE-style self-supervised pre-training part is built upon VideoMAE.

Citing AVION

@article{zhao2023training,
  title={Training a large video model on a single machine in a day},
  author={Zhao, Yue and Kr{\"a}henb{\"u}hl, Philipp},
  journal={arXiv preprint arXiv:2309.16669},
  year={2023}
}
@inproceedings{zhao2023lavila,
  title={Learning Video Representations from Large Language Models},
  author={Zhao, Yue and Misra, Ishan and Kr{\"a}henb{\"u}hl, Philipp and Girdhar, Rohit},
  booktitle={CVPR},
  year={2023}
}
