
yzhuoning / awesome-clip



Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

clip contrastive-learning pre-training

awesome-clip's Introduction

Awesome CLIP

This repo collects the research resources based on CLIP (Contrastive Language-Image Pre-Training) proposed by OpenAI. If you would like to contribute, please open an issue.

CLIP

Training

OpenCLIP (3rd-party, PyTorch) [code]
Train-CLIP (3rd-party, PyTorch) [code]
Paddle-CLIP (3rd-party, PaddlePaddle) [code]

Applications

GAN

Object Detection

Roboflow Zero-shot Object Tracking [code]
Zero-Shot Detection via Vision and Language Knowledge Distillation [code]
Crop-CLIP [code]
Detic: Detecting Twenty-thousand Classes using Image-level Supervision [code]
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
SLIP: Self-supervision meets Language-Image Pre-training [code]
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension [code]

Information Retrieval

Unsplash Image Search [code]
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval [code]
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [code]
Natural Language YouTube Search [code]
CLIP-as-service: Embed images and sentences into fixed-length vectors with CLIP [code]
clip-retrieval [code]
A CLIP-Hitchhiker’s Guide to Long Video Retrieval [code]
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP [code]
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval [code]
Extending CLIP for Category-to-image Retrieval in E-commerce [code]

Representation Learning

Text-to-3D Generation

Text-to-Image Generation

Big Sleep: A simple command line tool for text to image generation [code]
Deep Daze: A simple command line tool for text to image generation [code]
CLIP-CLOP: CLIP-Guided Collage and Photomontage [code]
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP [code]

Prompt Learning

Video Understanding

Image Captioning

CLIP prefix captioning [code]
CLIPScore: A Reference-free Evaluation Metric for Image Captioning [code]
ClipCap: CLIP Prefix for Image Captioning [code]
Text-Only Training for Image Captioning using Noise-Injected CLIP [code]
Fine-grained Image Captioning with CLIP Reward [code]

Image Editing

Image Segmentation

3D Recognition

Audio

Language Tasks

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment [code]

Object Navigation

CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration [code]

Localization

Adapting CLIP For Phrase Localization Without Further Training [code]

Others

Acknowledgment

Inspired by Awesome Visual-Transformer.


awesome-clip's Issues

Please add the following papers

Text2Mesh: This approach modifies a given mesh according to text or image input, using the CLIP text and image encoders.

Detecting Twenty-thousand Classes using Image-level Supervision: An object detection paper from Facebook that uses CLIP text embeddings as classifier weights.

These are the papers I would like to add.
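The idea of using CLIP text embeddings as classifier weights can be illustrated with a minimal sketch. The toy 3-dimensional vectors, the class names, the function names, and the `temperature` value below are all assumptions made for illustration; real embeddings would come from a CLIP text and image encoder, and this is not the detector described in the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_logits(image_embedding, text_embeddings, temperature=100.0):
    """Treat each normalized text embedding as one row of a linear classifier.

    The logit for a class is the scaled cosine similarity between the image
    embedding and that class's text embedding. The temperature is an assumed
    scale factor, not a value taken from any specific model.
    """
    img = l2_normalize(image_embedding)
    logits = []
    for text in text_embeddings:
        weight = l2_normalize(text)
        logits.append(temperature * sum(a * b for a, b in zip(img, weight)))
    return logits


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins for encoder outputs (one text embedding per class).
class_names = ["cat", "dog", "car"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.1, 0.9, 0.2]

probs = softmax(zero_shot_logits(image_embedding, text_embeddings))
predicted = class_names[probs.index(max(probs))]
print(predicted)
```

Because the classifier weights are just text embeddings, adding a new class only requires encoding one more class name; no retraining of the weights is needed.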

I think Crop-CLIP should be listed under Object Detection

For more image manipulation and generation applications, see the summary on my Medium.
You can add them if you think they are valuable.
Text-Driven Image Manipulation/Generation with CLIP

The list shown in the image above is available in this Google Sheet.

Adding CLIP-as-service?

CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.

Code: https://github.com/jina-ai/clip-as-service
Docs: https://clip-as-service.jina.ai/

Would be awesome to add to this awesome list! Thanks in advance!
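As a rough illustration of how image and text embeddings feed a search pipeline, here is a minimal sketch of cosine-similarity retrieval. It does not call CLIP-as-service; the file names, the toy 3-dimensional vectors, and the `search` helper are hypothetical stand-ins for embeddings that an embedding service would return.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, index, top_k=2):
    """Return the names of the top_k indexed items most similar to the query."""
    scored = [(cosine(query_embedding, emb), name) for name, emb in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical precomputed image embeddings, keyed by file name.
index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query.
query = [0.8, 0.3, 0.0]

results = search(query, index)
print(results)
```

In a real deployment the linear scan would typically be replaced by an approximate nearest-neighbor index, but the scoring step stays the same.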

(CVPR 2022) Please consider our weakly supervised semantic segmentation work based on CLIP

CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.02668
Code: https://github.com/CVI-SZU/CLIMS

Related work on prompt learning

Hi!

This is an excellent collection of CLIP-related works. We recently put out a preprint on prompt learning. It would be awesome if you could include our work under the Prompt Learning section. Below are the details:

Learning to Compose Soft Prompts for Compositional Zero-Shot Learning [paper] [code]

Please let me know if you want me to send a PR instead.

Thank you!

Wrong code link for DetCLIP

In the Representation Learning section, the code link next to DetCLIP points to DeCLIP. I can't find the code for DetCLIP; it may be closed source.

About the usage of pseudolabels to enhance CLIP

This repository is very useful to learn about the works bootstrapping off CLIP, thank you for curating it!

We have just published on arXiv a work that investigates how to best use pseudolabels generated by CLIP to enhance CLIP itself. We believe this work to have good applicability for practitioners that want to adapt CLIP to novel tasks efficiently and with limited, or no, labeled data.

You can find the paper here and the code here

I'm happy to submit a pull request if needed :)
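For readers unfamiliar with pseudolabeling, one common selection strategy can be sketched as follows: for each class, keep the K unlabeled samples to which the model assigns the highest probability for that class. This is a generic illustration and not necessarily the procedure used in the paper; the probabilities and the `top_k_pseudolabels` helper are hypothetical.

```python
def top_k_pseudolabels(probabilities, k=1):
    """Pick, for each class, the k samples with the highest predicted probability.

    probabilities: list of per-sample lists, one probability per class.
    Returns a dict mapping class index -> list of selected sample indices.
    """
    num_classes = len(probabilities[0])
    selected = {}
    for c in range(num_classes):
        ranked = sorted(
            range(len(probabilities)),
            key=lambda i: probabilities[i][c],
            reverse=True,
        )
        selected[c] = ranked[:k]
    return selected


# Hypothetical zero-shot class probabilities for four unlabeled images.
probs = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.20, 0.80],
    [0.45, 0.55],
]

pseudo = top_k_pseudolabels(probs, k=1)
print(pseudo)
```

Selecting a fixed number of samples per class keeps the pseudolabeled set balanced, which a single global confidence threshold does not guarantee.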

related work on information retrieval

Hello!

Thanks for creating this repository, it is super useful!

My colleagues and I recently finished our work on using CLIP for information retrieval in the e-commerce domain. The paper is called 'Extending CLIP for Category-to-image Retrieval in E-commerce'; we presented it at ECIR 2022 a couple of months ago.
I would really appreciate it if you could add it to the Information Retrieval subsection. Here is the markdown code in case it is helpful:

Extending CLIP for Category-to-image Retrieval in E-commerce [paper]

Please let me know if you'd rather I send a pull request.

Thank you!

The code of RegionCLIP is now public

Hi Zhuoning,

Thanks for your work on this nice repo!

Just wanted to give an update that the code for our work RegionCLIP (CVPR 2022) is now public (https://github.com/microsoft/RegionCLIP). Feel free to give it a try!

PS: the name of RegionCLIP was misspelled.

Best,
Yiwu

Maybe add these papers related to few-shot image classification?

Learning to Prompt for Vision-Language Models [paper][code]
Conditional Prompt Learning for Vision-Language Models [paper][code]
Prompt-aligned Gradient for Prompt Tuning [paper][code]
CLIP-Adapter: Better Vision-Language Models with Feature Adapters [paper][code]

New papers for image captioning

Hi!
Thanks for this great repository.
I'm looking for papers that use CLIP for image captioning. I have read the image captioning papers listed in this repository, and I think a few more could be added to that section:

Distinctive Image Captioning via CLIP Guided Group Optimization
paper link: link
The Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis
paper link: link

Adding ICML 2023 paper

Thanks for creating this repository! It's a very comprehensive source of information.

Could you please add our ICML 2023 paper, POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models? The code is provided in this link.

We appreciate your help. Thanks!
