
navervision / lincir

Official PyTorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

License: Other

Python 100.00%
composed-image-retrieval cvpr2024 image-retrieval

lincir's People

Contributors

geonm, sanghyukchun

lincir's Issues

Why do training and validation use the same function? (encode_with_pseudo_tokens_HF)

In Line 239 of train_phi.py and Line 162 of validate.py, i.e.,

replaced_text_embeddings, replaced_last_hidden_states = encode_with_pseudo_tokens_HF(text_encoder, replaced_tokens, estimated_token_embeddings, return_last_states=True)
and
text_features = encode_with_pseudo_tokens_HF(clip_model, tokenized_input_captions, batch_tokens)
, why do these two call the same function? The former's replaced_tokens contain more than one $, while the latter's tokenized_input_captions contain only one $. The two usages mean different things, and the former assigns the same value to every $, which seems problematic.

Using other models as backbone

Hi, thank you for your interesting work,

I wonder if I can use models other than CLIP (like BLIP, BLIP-2, etc.) as the backbone.

How can it be done?
What modifications need to be done?

About the inference

I think it's very interesting work!

The training process is clear, but there seems to be some ambiguity about inference. For example, since the pre-trained module \phi receives images directly as input at test time, how is its output combined with the condition text during inference?
Table B.5 shows the results for different prompts. Which prompt is used in Tables 2-5?

Looking forward to the author's reply!
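For context, the inference path described in the paper (frozen CLIP encoders plus the trained \phi) can be sketched roughly as follows. This is a hedged illustration, not the repository's actual code: clip_model, phi, tokenizer, and encode_with_pseudo_tokens stand in for the real components, and the prompt template is an assumption.

```python
import torch

@torch.no_grad()
def compose_query(clip_model, phi, tokenizer, encode_with_pseudo_tokens,
                  reference_image, condition_text):
    """Sketch of zero-shot CIR inference; all arguments are placeholders
    for the repository's real components."""
    # 1. Encode the reference image with the frozen CLIP image encoder.
    image_feature = clip_model.encode_image(reference_image)   # [bs, d]
    # 2. Project the image feature into the token-embedding space with phi.
    pseudo_token = phi(image_feature)                          # [bs, 768]
    # 3. Build a prompt with a "$" placeholder for the pseudo token
    #    (the exact template is an assumption).
    prompt = f"a photo of $ that {condition_text}"
    token_ids = tokenizer(prompt)
    # 4. Encode the text with the placeholder's embedding replaced by
    #    the pseudo token, yielding the composed query feature.
    return encode_with_pseudo_tokens(clip_model, token_ids, pseudo_token)
```

So \phi never concatenates anything with the condition directly; its output is spliced into the caption's token-embedding sequence before the text encoder runs.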

About ViT-G backbone pretrained model

Hi, I tried to pretrain a phi model with a ViT-G backbone, but the results are not as good. Could you provide a pretrained model with ViT-G as the backbone?

About single GPU

Thank you for this great work!
I trained LinCIR on my machine with a single GPU and found that the program hangs after all the steps finish.
Did you face the same problem when training on a single GPU?

About Model

Nice work! Do you have a plan for when you will release the model?

Maybe a code bug?

Great work! In line 32 of encode_with_pseudo_tokens.py, i.e.,

x = torch.where(text.unsqueeze(-1) == 259,
, why is the text embedding of the caption used as input? Also, the dimension of pseudo_tokens is [bs, 768], while the dimension of x is [bs, 77, 768]. So why is the single pseudo_tokens embedding written to every masked position, giving each the same embedding? x is then fed into the CLIP text encoder. The logic seems hard to follow. Looking forward to your reply!
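For what it's worth, the apparent intent of that line is: token id 259 marks the "$" placeholder, and torch.where broadcasts the single per-sample pseudo token over every placeholder position in the embedded sequence. A minimal sketch of the mechanism (the id 259 comes from the quoted line; the function name, shapes, and demo values are illustrative assumptions):

```python
import torch

PSEUDO_TOKEN_ID = 259  # assumed id of the "$" placeholder in the CLIP vocabulary

def replace_pseudo_tokens(token_ids, token_embeddings, pseudo_tokens):
    """Write the per-sample pseudo token into every placeholder position.

    token_ids:        [bs, seq]       tokenized caption ids
    token_embeddings: [bs, seq, dim]  embeddings looked up from the vocabulary
    pseudo_tokens:    [bs, dim]       one estimated token embedding per sample
    """
    mask = (token_ids == PSEUDO_TOKEN_ID).unsqueeze(-1)        # [bs, seq, 1]
    # Broadcast the single pseudo token over all masked positions; every
    # "$" in a caption therefore receives the same embedding.
    return torch.where(mask, pseudo_tokens.unsqueeze(1), token_embeddings)

# Tiny demo: positions 1 and 3 are placeholders and both get the pseudo token.
ids = torch.tensor([[1, 259, 2, 259, 0]])
emb = torch.zeros(1, 5, 4)
pseudo = torch.ones(1, 4)
out = replace_pseudo_tokens(ids, emb, pseudo)
```

Under this reading, assigning the same embedding to each "$" is by design: there is only one pseudo token per sample, so every placeholder receives it, and the modified sequence is then passed through the CLIP text encoder.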

Evaluation on GeneCIS benchmark

According to the README, the code for evaluating on the GeneCIS benchmark is located in a branch named eval_genecis. However, I could not find this branch when checking your repository.

GeneCIS

Evaluating GeneCIS requires a few additional steps. Check out the eval_genecis branch and make the necessary adjustments to the configuration in ./eval_genecis/config.py. Then, run the following script:

$ cd eval_genecis

$ python evaluate.py \
--combiner_mode phi \
--model large \
--combiner_pretrain_path /path/to/trained_your/phi_best.pt

If this branch hasn't been uploaded, could you make it available?

About the performance

I tried to reproduce the full training, but the performance did not reach the level reported in the paper. It stops improving after about 5000 steps and may even decline in later iterations, so the gain over the starting point is not obvious. Was the learning rate kept at 1e-4 throughout? Could adjusting the learning rate further improve performance?

Request for Code Acknowledgment and License Clarification

Hi there!

First of all, I've been reading your paper, and it's a really interesting piece of work - well done!
On another note, thank you for including a citation to our work in your LinCIR paper; it's really appreciated!

While examining your code, I noticed some notable similarities. For example, the file 'validate.py' is almost identical. Could I kindly request that you include a citation to our code in your README?

On a friendly note, I noticed that the license information in the files borrowed from SEARLE is still intact. To be consistent with open-source practices, would you mind removing the license information from the files taken directly from our repository?

Thank you very much for your understanding and cooperation.

Best,
Alberto
