Comments (5)

mitchellnw commented on July 22, 2024

Hello. Although this repo does not currently support this, we think it is feasible.

Say you have an image x and a huge bank of possible captions t_1,...,t_n.

You could get image and text features by running everything through CLIP, e.g. f_im = model.encode_image(x) for the image and f_j = model.encode_text(t_j) for each caption, j in 1,...,n.

Then you could use a nearest neighbors library like faiss to find the nearest neighbor of f_im among f_1,...,f_n.

The caption t_j belonging to that nearest neighbor could then be used as the generated words for that image.
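As a rough illustration of the retrieval step, here is a minimal pure-Python sketch. It assumes the features have already been computed and uses cosine similarity as the notion of "nearest"; the function names (normalize, nearest_caption) and the toy 3-dimensional vectors are made up for the example and merely stand in for real CLIP features. For a large caption bank you would swap the loop for a library index such as faiss, e.g. an exact inner-product index over normalized features.

```python
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nearest_caption(image_feature, text_features, captions):
    """Return (caption, score) for the text feature most similar to the image feature."""
    q = normalize(image_feature)
    best_idx, best_score = -1, float("-inf")
    for j, f in enumerate(text_features):
        score = sum(a * b for a, b in zip(q, normalize(f)))
        if score > best_score:
            best_idx, best_score = j, score
    return captions[best_idx], best_score


# Hypothetical toy data standing in for f_im and f_1,...,f_n.
captions = ["a photo of a dog", "a photo of a cat", "a diagram"]
text_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_feature = [0.1, 0.9, 0.2]

best_caption, best_score = nearest_caption(image_feature, text_features, captions)
print(best_caption, round(best_score, 3))
```

The brute-force loop is linear in n per query, which is fine for a sanity check but is exactly the part a nearest neighbors library is meant to replace.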

from open_clip.

gabrielilharco commented on July 22, 2024

Alternatively, if you don't have a bank of possible captions, you can try performing an automated search for the prompts that maximize agreement with the image. A good starting point is the method from Shin et al., 2020 (https://arxiv.org/abs/2010.15980).
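To make the idea of "searching for a prompt" concrete, here is a deliberately simplified sketch. It is a plain greedy coordinate search, not the gradient-guided method from Shin et al., and everything in it is an assumption for illustration: the function greedy_prompt_search, the tiny vocabulary, and toy_score, which stands in for an image-text agreement score (in practice, something like the similarity between CLIP image and text features).

```python
def greedy_prompt_search(vocab, score_fn, prompt_len=3, n_passes=2):
    """Greedily replace one token at a time, keeping any swap that raises the score."""
    prompt = [vocab[0]] * prompt_len
    best = score_fn(prompt)
    for _ in range(n_passes):
        for pos in range(prompt_len):
            for tok in vocab:
                candidate = prompt[:pos] + [tok] + prompt[pos + 1:]
                score = score_fn(candidate)
                if score > best:
                    prompt, best = candidate, score
    return prompt, best


# Hypothetical stand-in for image-text agreement: counts positions that match
# a hidden target caption. A real score would come from the model.
target = ["a", "red", "car"]


def toy_score(prompt):
    return sum(p == t for p, t in zip(prompt, target))


vocab = ["a", "the", "red", "blue", "car", "dog"]
prompt, score = greedy_prompt_search(vocab, toy_score)
print(" ".join(prompt), score)
```

Each pass costs prompt_len * len(vocab) score evaluations, so with a real vocabulary this exhaustive inner loop becomes expensive; narrowing the candidate tokens is the kind of problem the linked method addresses.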

johndpope commented on July 22, 2024

https://ucinlp.github.io/autoprompt/

With a bit of GitHub digging for "faiss clip", I got a hit on this repo by @ps-auxw.
It seems like he has already done it: https://github.com/ps-auxw/CLI-P

I'll ask whether @ps-auxw can integrate his repo with open_clip.

UPDATE: if anyone's interested, there's a neat way to install faiss using just pip:
https://pypi.org/project/faiss-gpu/

The repo above has 2 steps:

  1. build-index.py: builds the index, just from images.
  2. query-index.py: queries the index built in step 1.

"Say you have an image x and a huge bank of possible captions t_1,...,t_n."
Here is such a dataset, with 12 million captions:
https://github.com/google-research-datasets/conceptual-12m

UPDATE 2:
This repo (https://github.com/johndpope/rerank/blob/main/data/prepare_data.py)
seems to take captions (from a variety of datasets) and images, index them, and include some retrieval via k-nearest neighbors:
https://github.com/RitaRamo/rerank/blob/993fb49df843ba8c5a3567aa97c0e5382ecbe48e/src/toolkit/data/datasets.py
def retrieve_nearest_for_train_query(self, query_img, k=2):

UPDATE 3:
Found this, which looks more scalable than option 2:
https://github.com/rom1504/clip-retrieval

johndpope commented on July 22, 2024

I believe this does it
https://github.com/dzryk/clip-grams

mitchellnw commented on July 22, 2024

Great, thanks for linking!
