Part of the information comes from Awesome-LLMs-for-Video-Understanding.
Name | Paper | # Videos | # Annotations | Avg. Duration | Comments |
---|---|---|---|---|---|
TGIF | TGIF: A New Dataset and Benchmark on Animated GIF Description | 100k | 120k descriptions | 3.1s | Captioning |
TGIF-QA | TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | 72k | 165k QA pairs | - | VideoQA |
Ani-GIFs | Ani-GIFs: A Benchmark Dataset for Domain Generalization of Action Recognition from GIFs | 17k | 536 classes | 2.1s | Action recognition on animated GIFs |
Vid2GIF | Video2GIF: Automatic Generation of Animated GIFs from Video | 100k | - | 5.8s | GIF generation from video |
GIFGIF | Predicting Viewer Perceived Emotions in Animated GIFs | 3.8k | 17 emotions | <=303 frames (15s) | Emotion recognition |
GIFGIF+ | GIFGIF+: Collecting Emotional Animated GIFs with Clustered Multi-Task Learning | 23k | 17 emotions | - | Emotion recognition |
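
The GIF corpora above are usually consumed as frame sequences. A minimal sketch of decoding an animated GIF into RGB frames with Pillow; the file path is a placeholder, not a file from any dataset above:

```python
from PIL import Image, ImageSequence

def gif_to_frames(path):
    """Decode an animated GIF into a list of RGB PIL images, one per frame."""
    with Image.open(path) as gif:
        return [frame.convert("RGB") for frame in ImageSequence.Iterator(gif)]

frames = gif_to_frames("example.gif")  # placeholder path
print(f"{len(frames)} frames of size {frames[0].size}")
```
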
Name | Paper | # Videos | # Annotations | Avg. Duration | Comments |
---|---|---|---|---|---|
MSR-VTT | MSR-VTT: A Large Video Description Dataset for Bridging Video and Language | 10k (200k clips) | 200k descriptions | 14s | Captioning |
MSVD | Collecting Highly Parallel Data for Paraphrase Evaluation | 2k | 85k descriptions | 4-10s | Captioning |
ActivityNet | ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding | 27k (849h) | 203 classes | 109s | Activity recognition |
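
For the video datasets, a common preprocessing step is uniform frame sampling before feeding clips to a captioning or recognition model. A minimal sketch with OpenCV; the filename is a placeholder:

```python
import cv2
import numpy as np

def sample_frames(path, num_frames=8):
    """Uniformly sample `num_frames` RGB frames from a video clip."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # shape: (num_frames, H, W, 3)

clip = sample_frames("video0001.mp4")  # placeholder filename
```
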
Name | Paper | Code | Comments |
---|---|---|---|
LanguageBind | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Link | - |
OpenCLIP | - | - | Open-source implementation of CLIP |
BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Link | ViT + Q-Former + OPT/FlanT5 |
mPLUG-2 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Link | WebVid-2M |
mPLUG-Owl2 | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | Link | - |
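
Of the models above, BLIP-2 (ViT + Q-Former + OPT/FlanT5) ships with a Hugging Face transformers integration. A minimal captioning sketch using the published Salesforce/blip2-opt-2.7b checkpoint; the image path is a placeholder and a CUDA device is assumed:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# "Salesforce/blip2-opt-2.7b" is one published BLIP-2 checkpoint (ViT + Q-Former + OPT).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```
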
Name | Paper | Code | Video Datasets | Comments |
---|---|---|---|---|
Video-LLaMA | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Link | Pretraining: WebVid + CC3M; Finetuning: VideoChat + LLaVA + MiniGPT-4 | BLIP-2 + Vicuna/LLaMA |
Video-LLaVA | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Link | Pretraining: WebVid + CC3M; Finetuning: Video-ChatGPT + LLaVA | LanguageBind + Vicuna |
StarVector | StarVector: Generating Scalable Vector Graphics Code from Images | Link | SVG-Fonts + SVG-Icons + SVG-Emoji + SVG-Stack | CLIP + Adapter + StarCoder |
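
Video-LLaVA (LanguageBind encoder + Vicuna) from the table above has a converted checkpoint usable through recent transformers releases. A minimal sketch, assuming a transformers version that includes the Video-LLaVA classes; the random clip stands in for frames sampled as in the OpenCV helper earlier:

```python
import numpy as np
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

# "LanguageBind/Video-LLaVA-7B-hf" is the converted checkpoint on the Hub;
# requires a transformers release that ships the Video-LLaVA classes.
model = VideoLlavaForConditionalGeneration.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")
processor = VideoLlavaProcessor.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")

# Stand-in for 8 uniformly sampled RGB frames, shape (frames, H, W, 3).
clip = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=clip, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```
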
Name | Paper | Code | Datasets | Comments |
---|---|---|---|---|
StepCoder | StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | Link | APPS+ | Curriculum learning + reinforcement learning |
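
StepCoder's title names the core idea: reward the policy with compiler/executor feedback. A minimal, hypothetical stand-in for such a reward, scoring a candidate Python program against (stdin, expected-stdout) unit tests; this illustrates the general signal, not StepCoder's actual reward or curriculum:

```python
import os
import subprocess
import sys
import tempfile

def unit_test_reward(code: str, tests: list[tuple[str, str]], timeout: int = 5) -> float:
    """Fraction of (stdin, expected stdout) tests a candidate program passes."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    passed = 0
    for stdin, expected in tests:
        try:
            run = subprocess.run(
                [sys.executable, path], input=stdin,
                capture_output=True, text=True, timeout=timeout,
            )
            if run.returncode == 0 and run.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a timeout counts as a failed test
    os.unlink(path)
    return passed / len(tests) if tests else 0.0

print(unit_test_reward("print(int(input()) * 2)", [("3", "6"), ("5", "10")]))  # 1.0
```
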