Awesome-GIF-LLM

Part of the information comes from Awesome-LLMs-for-Video-Understanding.

Datasets

GIF

| Name | Paper | # GIFs | # Annotations | Avg. Duration | Comments |
|---|---|---|---|---|---|
| TGIF | TGIF: A New Dataset and Benchmark on Animated GIF Description | 100k | 120k descriptions | 3.1 s | Captioning |
| TGIF-QA | TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | 72k | 165k QA pairs | - | VQA |
| Ani-GIFs | Ani-GIFs: A benchmark dataset for domain generalization of action recognition from GIFs | 17k | 536 classes | 2.1 s | Action recognition on animated GIFs |
| Video2GIF | Video2GIF: Automatic generation of animated GIFs from video | 100k | - | 5.8 s | Generating GIFs from video |
| GIFGIF | Predicting viewer perceived emotions in animated GIFs | 3.8k | 17 emotions | ≤ 303 frames (15 s) | Emotion recognition |
| GIFGIF+ | GIFGIF+: Collecting emotional animated GIFs with clustered multi-task learning | 23k | 17 emotions | - | Emotion recognition |

Video

| Name | Paper | # Videos | # Annotations | Avg. Duration | Comments |
|---|---|---|---|---|---|
| MSR-VTT | MSR-VTT: A Large Video Description Dataset for Bridging Video and Language | 10k clips | 200k descriptions | 14 s | Captioning |
| MSVD | Collecting Highly Parallel Data for Paraphrase Evaluation | 2k | 85k descriptions | 4-10 s | Captioning |
| ActivityNet | ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding | 27k (849 h) | 203 classes | 109 s | Activity recognition |

Methods

Pre-trained

| Name | Paper | Code | Comments |
|---|---|---|---|
| LanguageBind | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Link | OpenCLIP |
| BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Link | ViT + Q-Former + OPT/FlanT5 |
| mPLUG-2 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Link | WebVid-2M |
| mPLUG-Owl2 | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | Link | - |

Instruction-tuned for Language Tasks

| Name | Paper | Code | Datasets | Comments |
|---|---|---|---|---|
| Video-LLaMA | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Link | Pretraining: WebVid + CC3M; fine-tuning: VideoChat + LLaVA + MiniGPT-4 | BLIP-2 + Vicuna/LLaMA |
| Video-LLaVA | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Link | Pretraining: WebVid + CC3M; fine-tuning: Video-ChatGPT + LLaVA | LanguageBind + Vicuna |
| StarVector | StarVector: Generating Scalable Vector Graphics Code from Images | Link | SVG-Fonts + SVG-Icons + SVG-Emoji + SVG-Stack | CLIP + adapter + StarCoder |

Code Generation with Reinforcement Learning (RL)

| Name | Paper | Code | Datasets | Comments |
|---|---|---|---|---|
| StepCoder | StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | Link | APPS+ | Curriculum learning + reinforcement learning |

Contributors

zhugekongkong