so - I've been looking into some code for VQGAN <a href="https://github.com/mehdid

Generating prompts from an image about open_clip HOT 5 CLOSED

mlfoundations commented on July 22, 2024

Generating prompts from an image

from open_clip.

Comments (5)

mitchellnw commented on July 22, 2024

Hello, although this is not currently supported by this repo, we think it is feasible.

Say you have an image x and a huge bank of possible captions t_1,...,t_n.

You could get image and text features by running everything through CLIP, e.g. image features f_im = model.encode_image(x) and text features f_j = model.encode_text(t_j) for j in 1,...,n.

Then you could use a nearest-neighbors library like faiss to find the nearest neighbor of f_im among f_1,...,f_n.

The caption t_j belonging to that nearest neighbor could be used as the generated words for the image.
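
As a rough illustration of the retrieval step, here is a minimal, dependency-free sketch. It is a brute-force stand-in for a faiss index, not open_clip or faiss code. The helper names (`normalize`, `nearest_captions`) and the toy 3-dimensional vectors are made up for the example. In practice the vectors would come from model.encode_image and model.encode_text.

```python
import math

def normalize(v):
    """Scale a nonzero vector to unit length, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nearest_captions(f_im, text_features, captions, k=1):
    """Return the k (similarity, caption) pairs most similar to the image feature f_im."""
    q = normalize(f_im)
    scored = []
    for caption, f_j in zip(captions, text_features):
        sim = sum(a * b for a, b in zip(q, normalize(f_j)))
        scored.append((sim, caption))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy vectors standing in for CLIP embeddings (assumption: real ones are much larger).
captions = ["a photo of a dog", "a photo of a cat", "a city skyline"]
text_features = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
f_im = [0.8, 0.3, 0.1]

best = nearest_captions(f_im, text_features, captions, k=2)
print(best[0][1])
```

With millions of captions the linear scan above becomes the bottleneck, which is where an approximate nearest-neighbor index such as faiss would come in.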

gabrielilharco commented on July 22, 2024

Alternatively, if you don't have a bank of possible captions, you can try performing an automated search for the prompts that maximize agreement with the image. A good starting point is the method from Shin et al., 2020 (https://arxiv.org/abs/2010.15980).
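
To make the idea of searching for a prompt concrete, here is a minimal sketch of a greedy, gradient-free coordinate search over a small vocabulary. This is a much simpler stand-in, not the gradient-guided method from Shin et al. The function `greedy_prompt_search`, the `score` callable and the toy scorer are all hypothetical. In a real setup `score` would return something like the CLIP similarity between the image and the candidate prompt.

```python
def greedy_prompt_search(vocab, score, length=3, sweeps=2):
    """At each position, keep the token that gives the highest score(prompt)."""
    prompt = [vocab[0]] * length
    for _ in range(sweeps):
        for pos in range(length):
            best_tok, best_val = prompt[pos], score(prompt)
            for tok in vocab:
                candidate = prompt[:pos] + [tok] + prompt[pos + 1:]
                val = score(candidate)
                if val > best_val:
                    best_tok, best_val = tok, val
            prompt[pos] = best_tok
    return prompt

# Toy scorer (assumption): counts positions that match a hidden target prompt.
target = ["red", "sports", "car"]

def toy_score(tokens):
    return sum(t == g for t, g in zip(tokens, target))

vocab = ["a", "red", "blue", "sports", "car", "dog"]
result = greedy_prompt_search(vocab, toy_score)
print(" ".join(result))
```

Each sweep costs length × len(vocab) scorer calls, so this only scales to small vocabularies. That cost is one reason gradient-guided candidate selection is attractive.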

johndpope commented on July 22, 2024

https://ucinlp.github.io/autoprompt/

A bit of GitHub digging for "faiss clip" turned up this repo by @ps-auxw.
It seems like he has done it: https://github.com/ps-auxw/CLI-P

I'll ask whether @ps-auxw can adapt his repo to use open_clip.

UPDATE - if anyone's interested, there's a neat way to install faiss using just pip:
https://pypi.org/project/faiss-gpu/

The repo above has 2 steps:

build-index.py / builds the index, just from images.
query-index.py / just queries the index built in step 1.

"Say you have an image x and a huge bank of possible captions t_1,...,t_n."
Here is such a dataset, with 12 million captions:
https://github.com/google-research-datasets/conceptual-12m

UPDATE 2.
This repo - https://github.com/johndpope/rerank/blob/main/data/prepare_data.py
seems to take captions (from a variety of datasets) and images, index them, and include some retrieval via k-nearest neighbors.
https://github.com/RitaRamo/rerank/blob/993fb49df843ba8c5a3567aa97c0e5382ecbe48e/src/toolkit/data/datasets.py
def retrieve_nearest_for_train_query(self, query_img, k=2):

UPDATE 3.
Found this, which looks more scalable than option 2:
https://github.com/rom1504/clip-retrieval

johndpope commented on July 22, 2024

I believe this does it
https://github.com/dzryk/clip-grams

mitchellnw commented on July 22, 2024

Great, thanks for linking!
