shufangxun / mac

An end-to-end masked contrastive video-and-language pre-training framework

License: MIT License

multimodal pretraining vision-transformer activitynet clip didemo msrvtt video-language contrastive-learning mae


MAC

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval, arXiv 2022.

We present MAC, a simple yet effective Masked Contrastive Video-and-Language Pre-training framework for efficient video-text retrieval. Instead of blindly applying the mask-then-prediction paradigm from MAE, we propose a mask-then-alignment paradigm: random masking is applied to both video and text, and the remaining visible tokens are aligned across the two modalities. MAC enables efficient end-to-end pre-training: it reduces FLOPs by 60%, accelerates pre-training by 3x, and improves performance.
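The mask-then-alignment idea can be illustrated with a minimal, standard-library-only sketch. It is not the code from this repository. The function names (`random_mask`, `mean_pool`, `contrastive_loss`), the mean pooling step, the temperature value, and the symmetric InfoNCE-style loss are all assumptions chosen for illustration; the actual encoders, masking ratios, and loss details are described in the paper.

```python
import math
import random


def random_mask(tokens, mask_ratio, rng):
    """Keep a random subset of tokens, dropping roughly `mask_ratio` of them.

    Only the kept (visible) tokens would be passed to the encoder, which is
    where the compute saving comes from in this illustration.
    """
    num_keep = max(1, round(len(tokens) * (1.0 - mask_ratio)))
    keep_idx = sorted(rng.sample(range(len(tokens)), num_keep))
    return [tokens[i] for i in keep_idx]


def mean_pool(tokens):
    """Average token vectors into one L2-normalized embedding (hypothetical pooling)."""
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def cross_entropy_diag(sim):
    """Mean cross-entropy over rows, with the matching pair on the diagonal."""
    loss = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        loss += log_z - row[i]
    return loss / len(sim)


def contrastive_loss(video_embs, text_embs, temperature=0.05):
    """Symmetric video-to-text and text-to-video contrastive loss."""
    sim = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in text_embs]
        for v in video_embs
    ]
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))
```

In this sketch, each video and each caption would be masked with `random_mask`, encoded (here replaced by `mean_pool`), and the resulting batch of paired embeddings passed to `contrastive_loss`. No reconstruction of the masked tokens is performed, which is the contrast with the mask-then-prediction setup.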


Pre-Training

  1. Download WebVid2M (see https://github.com/m-bain/webvid)

  2. Download CC3M (see https://ai.google.com/research/ConceptualCaptions/download)

Finetune

  1. Download MSRVTT (see https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip)
  2. Download DiDeMo (see https://github.com/LisaAnne/TemporalLanguageRelease)
  3. Download ActivityNet (see https://github.com/activitynet/ActivityNet)

Results

We achieve state-of-the-art results on several video-text retrieval datasets, including MSR-VTT, DiDeMo, and ActivityNet. The results on MSR-VTT are shown below; more details can be found in our paper.

[Image: results on MSR-VTT]

Citation

If you find our paper helpful in your research, please cite:

@article{shu2022masked,
  title={Masked Contrastive Pre-Training for Efficient Video-Text Retrieval},
  author={Shu, Fangxun and Chen, Biaolong and Liao, Yue and Xiao, Shuwen and Sun, Wenyu and Li, Xiaobo and Zhu, Yousong and Wang, Jinqiao and Liu, Si},
  journal={arXiv preprint arXiv:2212.00986},
  year={2022}
}

LICENSE

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

This code is built on Frozen in Time and MAE. We thank the authors for their awesome projects.
