
kingyiusuen / clip-image-search


Search images with a text or image query, using OpenAI's pretrained CLIP model.

License: MIT License



Image Search using CLIP


Retrieve images based on a query (text or image), using OpenAI's pretrained CLIP model.

Text as query.

Image as query.

Introduction

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can map images and text into the same latent space, so that they can be compared using a similarity measure.

CLIP
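To make "compared using a similarity measure" concrete, here is a minimal sketch of cosine similarity, the measure the search engine uses. The three-dimensional vectors are made-up stand-ins for real CLIP embeddings (which this README later describes as 512-dimensional); the variable names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings living in the same (shared) latent space.
text_embedding = [0.9, 0.1, 0.0]       # e.g. what the text encoder might give "two dogs"
dog_image_embedding = [0.8, 0.2, 0.1]  # an image of dogs
car_image_embedding = [0.0, 0.3, 0.9]  # an unrelated image

print(cosine_similarity(text_embedding, dog_image_embedding))
print(cosine_similarity(text_embedding, car_image_embedding))
```

Because text and images share one space, the same function scores text-to-image and image-to-image pairs; a matching pair should score higher than an unrelated one.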

Extending the work in this repository, I created a simple image search engine that accepts either text or an image as the query. The search engine works as follows:

  1. Use the image encoder to compute the feature vector of each image in the dataset.
  2. Index the images in the following format:
    image_id: {"url": "https://abc.com/xyz", "feature_vector": [0.1, 0.3, ..., 0.2]}

  3. Compute the feature vector of the query, using the text encoder if the query is text or the image encoder if the query is an image.
  4. Compute the cosine similarity between the query's feature vector and the feature vector of each image in the dataset.
  5. Return the $k$ images with the highest similarity.
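The steps above can be sketched as a brute-force, in-memory search. This is an illustration under stated assumptions, not the project's implementation: the feature vectors are passed in directly (in the real pipeline they would come from CLIP's encoders), and `build_index` and `search` are hypothetical names. Vectors are normalized to unit length at index time so that a plain dot product equals cosine similarity.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def build_index(images):
    """Step 2: map image_id -> {"url": ..., "feature_vector": ...}.

    `images` is an iterable of (image_id, url, feature_vector) tuples; the
    feature vectors are assumed to come from CLIP's image encoder (step 1).
    """
    return {
        image_id: {"url": url, "feature_vector": normalize(vector)}
        for image_id, url, vector in images
    }


def search(index, query_vector, k):
    """Steps 4-5: score every image by cosine similarity and keep the top k.

    Returns a list of (score, image_id, url) tuples, best match first.
    """
    query = normalize(query_vector)
    scored = (
        (sum(q * x for q, x in zip(query, entry["feature_vector"])), image_id, entry["url"])
        for image_id, entry in index.items()
    )
    return heapq.nlargest(k, scored)


index = build_index([
    ("a", "https://example.com/a.jpg", [1.0, 0.0]),
    ("b", "https://example.com/b.jpg", [0.0, 1.0]),
    ("c", "https://example.com/c.jpg", [1.0, 1.0]),
])
for score, image_id, url in search(index, [2.0, 0.1], k=2):
    print(image_id, url, round(score, 3))
```

Scanning every vector costs O(N) per query, which is fine for a toy index; the next paragraph's k-NN search in Elasticsearch replaces this loop at scale.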

I used the Lite version of the Unsplash dataset, which contains 25,000 images. The k-nearest-neighbor (k-NN) search is powered by Amazon Elasticsearch Service. I deployed the query service as an AWS Lambda function behind Amazon API Gateway. The frontend is built with Streamlit.
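As a sketch of what the k-NN setup might look like, the helpers below build request bodies in the style of the Open Distro / OpenSearch k-NN plugin available on Amazon Elasticsearch Service: an index with `"index.knn": true` and a `knn_vector` field, and a `knn` query. The field name follows the index format in step 2; everything else (function names, the `keyword` type for `url`) is an assumption, and the exact options vary by plugin version. If the index uses L2 distance, storing unit-length vectors makes the L2 ranking match the cosine ranking, because for unit vectors $\|a-b\|^2 = 2 - 2\cos(a, b)$.

```python
import json

FIELD = "feature_vector"  # field name from the index format in step 2
DIMENSION = 512           # CLIP feature size stated under "Possible Improvements"


def index_body(dimension=DIMENSION):
    """Hypothetical index settings and mapping for a k-NN-enabled index."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "url": {"type": "keyword"},
                FIELD: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def knn_query(query_vector, k):
    """Search body asking the k-NN plugin for the k nearest stored vectors."""
    return {
        "size": k,
        "query": {"knn": {FIELD: {"vector": list(query_vector), "k": k}}},
    }


print(json.dumps(knn_query([0.1, 0.3, 0.2], k=5)))
```

These dictionaries would be sent as the bodies of the index-creation and search requests; the client call itself is left out because it depends on the client library and authentication setup.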

Possible Improvements

  • The feature vector output by CLIP is a 512-dimensional vector of 32-bit floats. To reduce storage cost and speed up queries, we could use a dimensionality-reduction technique such as PCA to reduce the number of features. To scale the system to billions of images, we might even binarize the features, as is done at Pinterest.
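The storage argument for binarization is simple arithmetic: 512 dimensions × 4 bytes = 2048 bytes per image as float32, versus 512 bits = 64 bytes once each dimension becomes one bit, a 32× reduction. The sketch below uses plain sign thresholding with Hamming distance as the similarity measure; this is one simple binarization scheme chosen for illustration, not necessarily the method Pinterest uses, and the function names are hypothetical.

```python
def binarize(vector):
    """Pack a float vector into an int, one bit per dimension (1 if the value > 0)."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a, b):
    """Count the dimensions whose sign differs; smaller means more similar."""
    return bin(a ^ b).count("1")


DIMENSIONS = 512
float32_bytes = DIMENSIONS * 4  # 2048 bytes per image
binary_bytes = DIMENSIONS // 8  # 64 bytes per image
print(float32_bytes // binary_bytes)  # 32
```

Hamming distance on packed bits reduces to an XOR and a popcount, which is why binary codes are attractive at very large scale, at the cost of some ranking accuracy.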

How to Use

Install dependencies

pip install -e . --no-cache-dir

Download the Unsplash dataset

python scripts/download_unsplash.py --image_width=480 --threads_count=32

This downloads and extracts a zip file containing metadata about the photos in the dataset. The script then uses the photo URLs to download the actual images to unsplash-dataset/photos. The download may fail for a few images (see this issue). Since CLIP downsamples the images to 224 x 224 anyway, you may want to reduce the width of the downloaded images to save storage space. You can also increase the threads_count parameter to speed up the download.
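To illustrate why threads_count helps and how a few failed downloads can be tolerated, here is a minimal sketch of a thread-pooled downloader that records failures instead of aborting. It is not the project's scripts/download_unsplash.py: `download_all` is a hypothetical name, and `fetch` is a hypothetical callable that downloads one URL and raises on error.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, threads_count=32):
    """Download every URL using threads_count worker threads.

    `fetch` is a hypothetical callable (url -> None) that saves one image,
    e.g. under unsplash-dataset/photos, and raises on failure.
    Returns the URLs that failed, in input order.
    """
    def try_fetch(url):
        try:
            fetch(url)
            return None
        except Exception:  # keep going; report the failure at the end
            return url

    with ThreadPoolExecutor(max_workers=threads_count) as pool:
        return [url for url in pool.map(try_fetch, urls) if url is not None]
```

Image downloads are I/O-bound, so threads overlap network waits even under Python's GIL; that is why raising the thread count tends to shorten the download until bandwidth or the server becomes the bottleneck.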

Create index and upload image feature vectors to Elasticsearch

python scripts/ingest_data.py

The script downloads the pretrained CLIP model and processes the images in batches, using a GPU if one is available.
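A minimal sketch of the batch-then-index loop, assuming the images are already loaded: `encode_batch` is a hypothetical stand-in for running CLIP's image encoder on a batch (on `"cuda"` when available, otherwise the CPU), and the index name `"images"` is also an assumption. The generated dictionaries follow the action format that `elasticsearch.helpers.bulk` accepts (`_index`, `_id`, `_source`), with the source matching the index format in step 2.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def ingest_actions(photos, encode_batch, index_name="images", batch_size=64):
    """Encode photos batch by batch and yield one bulk-index action per image.

    photos: iterable of (image_id, url, image) tuples.
    encode_batch: hypothetical callable mapping a list of images to a list of
    feature vectors (e.g. CLIP's image encoder run on GPU when available).
    """
    for batch in batched(photos, batch_size):
        ids, urls, images = zip(*batch)
        vectors = encode_batch(list(images))
        for image_id, url, vector in zip(ids, urls, vectors):
            yield {
                "_index": index_name,
                "_id": image_id,
                "_source": {"url": url, "feature_vector": list(vector)},
            }
```

Batching keeps GPU utilization high and bounds memory use, and because `ingest_actions` is a generator, its output can be streamed to the bulk helper without holding all 25,000 vectors in memory.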

Build Docker image

Build Docker image for AWS Lambda.

docker build --build-arg AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID \
             --build-arg AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY \
             --tag clip-image-search \
             --file server/Dockerfile .

Run the Docker image as a container.

docker run -p 9000:8080 -it --rm clip-image-search

Test the container with a POST request.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"query": "two dogs", "input_type": "text"}'
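Locally, the endpoint in the curl command above is served by the Lambda runtime interface emulator, which hands the POSTed JSON to the function as its event. The sketch below shows how a handler might dispatch on `input_type`; it is hypothetical throughout: `encode_text`, `encode_image`, and `search` stand in for the CLIP encoders and the Elasticsearch k-NN lookup, how an image query is encoded in the `query` field is not specified here, and the `statusCode`/`body` response shape is an assumed API Gateway-style format.

```python
import json


def make_handler(encode_text, encode_image, search):
    """Build a Lambda-style handler(event, context) for the query service.

    encode_text, encode_image and search are hypothetical stand-ins for the
    CLIP text/image encoders and the k-NN search against the index.
    """
    encoders = {"text": encode_text, "image": encode_image}

    def handler(event, context):
        input_type = event.get("input_type")
        if input_type not in encoders:
            error = {"error": f"unsupported input_type: {input_type!r}"}
            return {"statusCode": 400, "body": json.dumps(error)}
        # Step 3: pick the encoder that matches the query type.
        query_vector = encoders[input_type](event["query"])
        # Steps 4-5: delegate the similarity search to the index.
        return {"statusCode": 200, "body": json.dumps(search(query_vector))}

    return handler


handler = make_handler(lambda q: [1.0], lambda q: [0.0], lambda v: {"results": v})
print(handler({"query": "two dogs", "input_type": "text"}, None))
```

Injecting the encoders and the search function keeps the dispatch logic testable without loading CLIP or reaching Elasticsearch.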

Run Streamlit app

streamlit run streamlit_app.py

Acknowledgement


clip-image-search's Issues

Demo is not working

Unfortunately, right now the demo is not working...
Error message:

Whoops — something went wrong! An error has been logged.
(for both text and image search)
