abaldrati / clip4cirdemo
[CVPR 2022 - Demo Track] - Effective conditioned and composed image retrieval combining CLIP-based features
Which image database did you use when you implemented it?
Hi!
Thanks for this interesting work!
I am trying to replicate your results on CIRR and was wondering if you could release the training code. I have evaluated the pre-trained weights from the demo on the evaluation server and get roughly the same results. However, when training the combiner from scratch, I only reach about 31.5% (global R@1, test set, CLIPR50x4, training only the combiner). I am training in full precision on top of pre-extracted features so that I can fit a batch size of 4096.
Hello. As mentioned in the paper, you used the original FashionIQ dataset containing around 77,000 images, but many images are missing from the currently available dataset (both on the dataset's main GitHub page and from other sources). Do you have access to the full dataset? I would appreciate it if you could share the complete dataset for use in future work.
When I run extract_features.py, I find that all elements of batch_features are NaN:
batch_features = clip_model.encode_image(images)
result: tensor([[nan, nan, nan, ..., nan, nan]])
I then used the code below to test. When I load the RN50x4 model, the result is the same as above (for both in-the-wild images and dataset images), but when I load another model such as ViT-B/32, the result is not NaN:
import clip
import torch
import PIL.Image
import PIL.ImageOps

from data_utils import targetpad_transform, server_base_path

if torch.cuda.is_available():
    device = torch.device("cuda")
    data_type = torch.float16
else:
    device = torch.device("cpu")
    data_type = torch.float32

# "RN50x4" gives NaN features; "ViT-B/32" does not
clip_model, clip_preprocess = clip.load("RN50x4")
clip_model = clip_model.eval().to(device)

image_path = 'woman.jpg'
pil_image = PIL.Image.open(image_path).convert('RGB')
image = targetpad_transform(1.25, clip_model.visual.input_resolution)(pil_image).to(device)

reference_features = clip_model.encode_image(image.unsqueeze(0))
How can I solve this problem? I would appreciate it if you could test it and let me know the solution.
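For context, clip.load keeps the model in half precision on CUDA, so my (unconfirmed) suspicion is fp16 overflow somewhere in the RN50x4 forward pass. A minimal sketch of what I mean, showing how an fp16 overflow turns into NaN and how to detect it with torch.isnan:

```python
import torch

def has_nan(t: torch.Tensor) -> bool:
    # True if any element of the tensor is NaN
    return bool(torch.isnan(t).any())

# float16 overflows above ~65504: the value is stored as inf,
# and inf - inf then yields NaN downstream
x = torch.tensor([70000.0], dtype=torch.float16)  # stored as inf
print(has_nan(x))      # False (inf is not NaN)
print(has_nan(x - x))  # True  (inf - inf = nan)
```

If the NaNs do come from half precision, casting the model to full precision with clip_model.float() before calling encode_image would be one thing to try.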