
ttengwang / awesome_prompting_papers_in_computer_vision


A curated list of prompt-based papers in computer vision and vision-language learning.

Home Page: https://visualprompting.github.io/

prompt-learning adapter few-shot-learning prompt-tuning zero-shot-learning visual-prompt parameter-efficient-tuning


Awesome Prompting Papers in Computer Vision

A curated list of prompt-based papers in computer vision and vision-language learning.

Keywords

  • Task tag: indicates the downstream task a paper targets
  • Abbreviation tag: the common abbreviation of the method
  • Characteristic tag: a characteristic that makes the paper unique
  • Bold font: we highlight some pilot work that may have contributed to the prevalence of visual prompting.

Vision Prompt

This section collects papers prompting pretrained vision foundation models (e.g., ViT) for parameter-efficient adaptation.
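As a rough illustration of the idea shared by many of the papers below, the sketch here (hypothetical function names, NumPy arrays standing in for a real ViT) prepends a few learnable prompt tokens to the frozen patch-embedding sequence; in actual prompt tuning, only those prompt tokens would receive gradients while the backbone stays frozen.

```python
import numpy as np

def prepend_prompts(patch_embeds, prompt_tokens):
    """Prepend learnable prompt tokens to a frozen ViT's patch embeddings.

    patch_embeds:  (num_patches, dim) array from the frozen backbone.
    prompt_tokens: (num_prompts, dim) array, the only trainable parameters.
    Returns the extended token sequence fed to the transformer blocks.
    """
    return np.concatenate([prompt_tokens, patch_embeds], axis=0)

rng = np.random.default_rng(0)
patches = rng.standard_normal((196, 768))  # 14x14 patches of a 224px image
prompts = np.zeros((10, 768))              # 10 prompt tokens, zero-initialized
tokens = prepend_prompts(patches, prompts)
print(tokens.shape)  # (206, 768)
```

The appeal of this family of methods is the parameter count: here 10 × 768 trainable values versus tens of millions in the frozen backbone.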

  • Learning to Prompt for Continual Learning [paper] [code]

    CVPR 2022

  • Visual Prompt Tuning [paper] [code]

    ECCV 2022

  • DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning [paper] [code]

    ECCV 2022

  • AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition [paper] [code]

    NeurIPS 2022

  • Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [paper] [code]

    NeurIPS 2022

  • P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [paper] [code]

    NeurIPS 2022

  • Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models [paper] [code]

    NeurIPS 2022

  • Visual Prompting via Image Inpainting [paper] [code]

    NeurIPS 2022

  • Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation [paper]

    AAAI 2023

  • LPT: Long-tailed Prompt Tuning for Image Classification [paper]

    ICLR 2023

  • Diversity-Aware Meta Visual Prompting [paper] [code]

    CVPR 2023

  • Semantic Prompt for Few-Shot Image Recognition [paper]

    CVPR 2023

  • Visual Prompt Tuning for Generative Transfer Learning [paper] [code]

    CVPR 2023

  • CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching [paper] [code]

    CVPR 2023

  • Images Speak in Images: A Generalist Painter for In-Context Visual Learning [paper] [code]

    CVPR 2023

  • PIVOT: Prompting for Video Continual Learning [paper]

    CVPR 2023

  • Learning Expressive Prompting With Residuals for Vision Transformers [paper]

    CVPR 2023

  • BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning [paper] [code]

    CVPR 2023

  • Visual Prompt Multi-Modal Tracking [paper] [code]

    CVPR 2023

  • A-La-Carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting [paper]

    CVPR 2023

  • Understanding and Improving Visual Prompting: A Label-Mapping Perspective [paper] [code]

    CVPR 2023

  • Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning [paper] [code]

    CVPR 2023

  • Explicit Visual Prompting for Low-Level Structure Segmentations [paper] [code]

    CVPR 2023

ArXiv Papers

  • Exploring Visual Prompts for Adapting Large-Scale Models [paper] [code]

    arXiv 2022/03

  • Vision Transformer Adapter for Dense Predictions [paper] [code]

    arXiv 2022/05

  • Neural Prompt Search [paper] [code]

    arXiv 2022/06

  • Convolutional Bypasses Are Better Vision Transformer Adapters [paper] [code]

    arXiv 2022/07

  • Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets [paper]

    arXiv 2022/08

  • Prompt Vision Transformer for Domain Generalization [paper]

    arXiv 2022/08

  • Prompt-Matched Semantic Segmentation [paper]

    arXiv 2022/08

  • Visual Prompt Tuning for Test-time Domain Adaptation [paper]

    arXiv 2022/10

  • Visual Prompting for Adversarial Robustness [paper]

    arXiv 2022/10

  • Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers [paper] [code]

    arXiv 2022/10

  • Towards a Unified View on Visual Parameter-Efficient Transfer Learning [paper] [code]

    arXiv 2022/10

  • Multitask Vision-Language Prompt Tuning [paper] [code]

    arXiv 2022/11

Vision-Language Prompt

This section collects papers prompting pretrained vision-language foundation models (e.g., CLIP) for parameter-efficient adaptation.
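To make the CoOp-style recipe concrete, the toy sketch below (hypothetical names; a mean-pool stands in for a frozen CLIP text encoder) builds a per-class prompt by concatenating learnable context vectors with the class-name embedding, then classifies an image by cosine similarity between text and image features.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(image_feat, context, class_token_embeds, text_encoder):
    """Score an image against per-class prompts '[ctx] * M  [classname]'."""
    scores = []
    for cls_emb in class_token_embeds:
        prompt = np.vstack([context, cls_emb])  # (M+1, dim) token sequence
        scores.append(cosine(text_encoder(prompt), image_feat))
    return int(np.argmax(scores))

# Stand-in for a frozen text encoder: mean-pool the token sequence.
encoder = lambda tokens: tokens.mean(axis=0)

dim = 8
context = np.zeros((4, dim))                  # 4 learnable context vectors
cat = np.eye(dim)[0]; dog = np.eye(dim)[1]    # toy class-name embeddings
image_feat = np.eye(dim)[1]                   # toy image feature, "dog"-like
print(classify(image_feat, context, [cat, dog], encoder))  # 1
```

In the real methods the context vectors are optimized on a few labeled shots while both encoders stay frozen; everything else above is unchanged.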

  • Learning Transferable Visual Models From Natural Language Supervision [paper] [code]

    ICML 2021

  • Learning to Prompt for Vision-Language Models [paper] [code]

    IJCV 2022

  • Prompt Distribution Learning [paper]

    CVPR 2022

  • Conditional Prompt Learning for Vision-Language Models [paper] [code]

    CVPR 2022

  • DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [paper] [code]

    CVPR 2022

  • Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos [paper] [code]

    CVPR 2022

  • PointCLIP: Point Cloud Understanding by CLIP [paper] [code]

    CVPR 2022

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [paper] [code]

    CVPR 2022

  • A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models [paper]

    ACL 2022

  • Can Language Understand Depth? [paper] [code]

    ACM MM 2022

  • Expanding Language-Image Pretrained Models for General Video Recognition [paper] [code]

    ECCV 2022

  • Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification [paper] [code]

    ECCV 2022

  • OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression [paper]

    NeurIPS 2022

  • Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [paper] [code]

    NeurIPS 2022

  • Learning to Decompose Visual Features with Latent Textual Prompts [paper]

    ICLR 2023

  • PLOT: Prompt Learning with Optimal Transport for Vision-Language Models [paper] [code]

    ICLR 2023

  • Visual-Language Prompt Tuning with Knowledge-guided Context Optimization [paper] [code]

    CVPR 2023

  • Open-Set Fine-Grained Retrieval Via Prompting Vision-Language Evaluator [paper]

    CVPR 2023

  • Multimodal Prompting With Missing Modalities for Visual Recognition [paper] [code]

    CVPR 2023

  • Efficient Multimodal Fusion Via Interactive Prompting [paper]

    CVPR 2023

  • Hierarchical Prompt Learning for Multi-Task Learning [paper] [code]

    CVPR 2023

  • Text-Visual Prompting for Efficient 2D Temporal Video Grounding [paper]

    CVPR 2023

  • VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval [paper] [code]

    CVPR 2023

  • MaPLe: Multi-modal Prompt Learning [paper] [code]

    CVPR 2023

  • Texts as Images in Prompt Tuning for Multi-Label Image Recognition [paper] [code]

    CVPR 2023

  • Vita-CLIP: Video and Text Adaptive CLIP Via Multimodal Prompting [paper] [code]

    CVPR 2023

  • LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [paper] [code]

    CVPR 2023

  • $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation [paper] [code]

    ICML 2023

  • POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models [paper] [code]

    ICML 2023

  • Rethinking the Openness of CLIP [paper] [code]

    ACL 2023

  • PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization [paper] [code]

    ICCV 2023

ArXiv Papers

  • Colorful Prompt Tuning for Pre-trained Vision-Language Models [paper]

    arXiv 2021/08

  • ActionCLIP: A New Paradigm for Video Action Recognition [paper] [code]

    arXiv 2021/09

  • CLIP-Adapter: Better Vision-Language Models with Feature Adapters [paper] [code]

    arXiv 2021/10

  • Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization [paper]

    arXiv 2021/11

  • Prompting Visual-Language Models for Efficient Video Understanding [paper] [code]

    arXiv 2021/12

  • Unsupervised Prompt Learning for Vision-Language Models [paper] [code]

    arXiv 2022/04

  • Prompt-aligned Gradient for Prompt Tuning [paper] [code]

    arXiv 2022/05

  • Parameter-Efficient Image-to-Video Transfer Learning [paper]

    arXiv 2022/06

  • DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations [paper]

    arXiv 2022/06

  • Prompt Tuning for Generative Multimodal Pretrained Models [paper] [code]

    arXiv 2022/06

  • Prompt Tuning with Soft Context Sharing for Vision-Language Models [paper]

    arXiv 2022/08

  • CPL: Counterfactual Prompt Learning for Vision and Language Models [paper] [code]

    arXiv 2022/10

  • Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [paper] [code]

    arXiv 2022/10

  • Unified Vision and Language Prompt Learning [paper]

    arXiv 2022/10

  • Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation [paper]

    arXiv 2022/10

  • Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition [paper] [code]

    arXiv 2023/04

Language-Interactable Prompt

A language-interactable prompter develops zero-/few-shot capabilities by chaining several independent foundation models (VLMs, LLMs, VMs, etc.) through a language interface. One of the most attractive applications is the multimodal chatbot.
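A minimal sketch of this chaining pattern (all names and stub models below are hypothetical placeholders, not any paper's actual API): a vision model's caption is inserted into a text prompt, which lets a language-only LLM answer questions about an image it never sees.

```python
def build_prompt(visual_description, user_question):
    """Compose a text prompt that lets a language-only LLM 'see' an image."""
    return (
        "You are answering questions about an image.\n"
        f"Image content (from a captioning model): {visual_description}\n"
        f"Question: {user_question}\nAnswer:"
    )

# Stub models standing in for a real captioner and LLM.
caption_model = lambda image: "a dog catching a frisbee in a park"
llm = lambda prompt: "a dog" if "What animal" in prompt else "unknown"

def answer(image, question):
    # The only coupling between the two frozen models is the text prompt.
    return llm(build_prompt(caption_model(image), question))

print(answer("photo.jpg", "What animal is in the picture?"))  # a dog
```

Because the models communicate only through text, each component can be swapped independently — the core design choice behind this line of work.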

  • Multimodal Few-Shot Learning with Frozen Language Models [paper]

    NeurIPS 2021

  • An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA [paper] [code]

    AAAI 2022

  • VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning [paper] [code]

    CVPR 2022

  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [paper] [code]

    ICLR 2023

ArXiv Papers

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [paper] [code] [demo]

    arXiv 2023/03

  • Chameleon: Plug-and-play Compositional Reasoning with Large Language Models [paper] [code]

    arXiv 2023/04

  • ClipCap: CLIP Prefix for Image Captioning [paper] [code]

    arXiv 2021/11

  • Flamingo: a Visual Language Model for Few-Shot Learning [paper]

    arXiv 2022/04

  • Language Models Can See: Plugging Visual Controls in Text Generation [paper] [code]

    arXiv 2022/05

  • Zero-Shot Video Question Answering via Frozen Bidirectional Language Models [paper]

    arXiv 2022/06

  • Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning [paper]

    arXiv 2022/06

Vision-Language Instruction Tuning

The goal of vision-language instruction tuning is to train a model that can effectively understand instructions for general-purpose multimodal tasks.
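A common shape for such training data pairs an image with an instruction and a target response, with the loss applied only to the response. The record below is a hypothetical illustration (the file path and field names are made up, not any dataset's actual schema):

```python
# A minimal, hypothetical training record for visual instruction tuning:
# the model sees (image, instruction) and is supervised only on the response.
example = {
    "image": "images/train/0001.jpg",  # made-up path for illustration
    "conversations": [
        {"role": "user", "content": "<image>\nDescribe the scene briefly."},
        {"role": "assistant", "content": "A cyclist rides past a market stall."},
    ],
}

def supervision_mask(conversations):
    """1 for turns that contribute to the loss (assistant responses), else 0."""
    return [1 if turn["role"] == "assistant" else 0 for turn in conversations]

print(supervision_mask(example["conversations"]))  # [0, 1]
```

In a real pipeline the mask is computed per token rather than per turn, but the principle — supervise responses, not instructions — is the same.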

  • Visual Instruction Tuning [paper] [code] [demo]

    arXiv 2023/04

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models [paper] [code] [demo]

    arXiv 2023/04

  • Otter: A Multi-Modal Model with In-Context Instruction Tuning [paper] [code] [demo]

    arXiv 2023/05

  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans [paper] [code]

    arXiv 2023/05

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning [paper] [code]

    arXiv 2023/05

  • InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [paper] [code] [demo]

    arXiv 2023/09

More Resources

  • PromptPapers: A comprehensive curated list of prompting papers (mainly in natural language processing)
  • Awesome Multimodal Assistant: A curated list for vision-language instruction tuning and LLM-based chatbots

Contributors

baoshuo · byakuya-zi · josephkj · liulingbo918 · nbl97 · promptstyler · renshuhuai-andy · shoufachen · sunrainyg · ttengwang · zhangyuanhan-ai


Issues

Could you please include our paper in the "Vision-Language Prompt" section?

Hi Teng Wang,

I hope you're doing well.

My name is Marco Mistretta, and I am a researcher at MICC (Florence, Italy).
Thanks for creating this repository! It has been an invaluable resource for me.

Could you please add our ECCV 2024 paper, Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation?

Thank you for your time and consideration!

TITLE: Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
CONFERENCE: ECCV 2024
PAPER: https://arxiv.org/abs/2407.03056
CODE: https://github.com/miccunifi/KDPL

About ViT-Adapter

Thank you for this awesome work. It really helps me a lot.

Perhaps "Vision Transformer Adapter for Dense Predictions" (ViT-Adapter) is not a parameter-efficient method, since it actually finetunes the ViT backbone.

Some questions

Hi Teng,

Thanks for organizing this awesome project on vision prompting.

I committed several updates to your project recently, and I have two questions.

  1. I notice that some papers are not really related to prompting, though they are very good, e.g., GLIP. So I am a little confused about which papers you think should be included in this project.

  2. Some papers were honored as orals at CVPR 2022; should we add a tag for them?

Looking forward to your reply.

Thank you,
Yuanhan
