
navervision / lincir

Official PyTorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

License: Other

Python 100.00%
composed-image-retrieval cvpr2024 image-retrieval

lincir's People

Contributors

geonm, sanghyukchun

lincir's Issues

Why do training and validation use the same function? (encode_with_pseudo_tokens_HF)

In Line 239 of train_phi.py and Line 162 of validate.py, i.e.,

replaced_text_embeddings, replaced_last_hidden_states = encode_with_pseudo_tokens_HF(text_encoder, replaced_tokens, estimated_token_embeddings, return_last_states=True)
and
text_features = encode_with_pseudo_tokens_HF(clip_model, tokenized_input_captions, batch_tokens)
, why do these two call the same function? The former's replaced_tokens contain more than one $, while the latter's tokenized_input_captions contain only one $. The two usages mean different things, and the former assigns the same value to every $, which seems problematic.

Using other models as backbone

Hi, thank you for your interesting work,

I wonder if I can use models other than CLIP (like BLIP, BLIP-2, etc.) as the backbone.

How can it be done?
What modifications need to be done?

About the inference

I think it's very interesting work!

The training process is clear, but there seems to be some ambiguity about inference. For example, since the pre-trained module \phi receives images directly as input at test time, how is its output combined with the condition text during inference?
Table B.5 shows the results for different prompts. Which prompt is used in Tables 2-5?

Looking forward to the author's reply!
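For context, the inference path described in the paper (frozen CLIP encoders plus the trained \phi) can be sketched roughly as follows. This is a hedged illustration, not the repository's actual code: clip_model, phi, tokenizer, and encode_with_pseudo_tokens stand in for the real components, and the prompt template is an assumption.

```python
import torch

@torch.no_grad()
def compose_query(clip_model, phi, tokenizer, encode_with_pseudo_tokens,
                  reference_image, condition_text):
    """Sketch of zero-shot CIR inference; all arguments are placeholders
    for the repository's real components."""
    # 1. Encode the reference image with the frozen CLIP image encoder.
    image_feature = clip_model.encode_image(reference_image)   # [bs, d]
    # 2. Project the image feature into the token-embedding space with phi.
    pseudo_token = phi(image_feature)                          # [bs, 768]
    # 3. Build a prompt with a "$" placeholder for the pseudo token
    #    (the exact template is an assumption).
    prompt = f"a photo of $ that {condition_text}"
    token_ids = tokenizer(prompt)
    # 4. Encode the text with the placeholder's embedding replaced by
    #    the pseudo token, yielding the composed query feature.
    return encode_with_pseudo_tokens(clip_model, token_ids, pseudo_token)
```

So \phi never concatenates anything with the condition directly; its output is spliced into the caption's token-embedding sequence before the text encoder runs.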

About ViT-G backbone pretrained model

Hi, I tried to pretrain a phi model with a ViT-G backbone, but the results are not as good. Could you provide a pretrained model with ViT-G as the backbone?

About single GPU

Thank you for this great work!
I trained LinCIR on my machine with a single GPU and found that the program hangs after all the steps finish.
Did you face the same problem when training on a single GPU?

About Model

Nice work! Do you have a plan for when you will release the model?

Maybe a code bug?

Great work! In line 32 of encode_with_pseudo_tokens.py, i.e.,

x = torch.where(text.unsqueeze(-1) == 259,
, why is the text embedding of the caption used as input? Also, the dimension of pseudo_tokens is [bs, 768], while the dimension of x is [bs, 77, 768]. So why is the single pseudo_tokens embedding written to every masked position, giving each the same embedding? x is then fed into the CLIP text encoder. The logic seems hard to follow. Looking forward to your reply!
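For what it's worth, the apparent intent of that line is: token id 259 marks the "$" placeholder, and torch.where broadcasts the single per-sample pseudo token over every placeholder position in the embedded sequence. A minimal sketch of the mechanism (the id 259 comes from the quoted line; the function name, shapes, and demo values are illustrative assumptions):

```python
import torch

PSEUDO_TOKEN_ID = 259  # assumed id of the "$" placeholder in the CLIP vocabulary

def replace_pseudo_tokens(token_ids, token_embeddings, pseudo_tokens):
    """Write the per-sample pseudo token into every placeholder position.

    token_ids:        [bs, seq]       tokenized caption ids
    token_embeddings: [bs, seq, dim]  embeddings looked up from the vocabulary
    pseudo_tokens:    [bs, dim]       one estimated token embedding per sample
    """
    mask = (token_ids == PSEUDO_TOKEN_ID).unsqueeze(-1)        # [bs, seq, 1]
    # Broadcast the single pseudo token over all masked positions; every
    # "$" in a caption therefore receives the same embedding.
    return torch.where(mask, pseudo_tokens.unsqueeze(1), token_embeddings)

# Tiny demo: positions 1 and 3 are placeholders and both get the pseudo token.
ids = torch.tensor([[1, 259, 2, 259, 0]])
emb = torch.zeros(1, 5, 4)
pseudo = torch.ones(1, 4)
out = replace_pseudo_tokens(ids, emb, pseudo)
```

Under this reading, assigning the same embedding to each "$" is by design: there is only one pseudo token per sample, so every placeholder receives it, and the modified sequence is then passed through the CLIP text encoder.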

Evaluation on GeneCIS benchmark

According to the README, the code for evaluating on the GeneCIS benchmark is located in a branch named eval_genecis. However, I could not find this branch when checking your repository.

GeneCIS

Evaluating GeneCIS requires a few additional steps. Check out the eval_genecis branch and make the necessary adjustments to the configuration in ./eval_genecis/config.py. Then, run the following script:

$ cd eval_genecis

$ python evaluate.py \
--combiner_mode phi \
--model large \
--combiner_pretrain_path /path/to/trained_your/phi_best.pt

If this branch hasn't been uploaded, could you make it available?

About the performance

I tried to reproduce the full training, but the performance did not reach the level reported in the paper. It stops improving after about 5000 steps and may even decline in later iterations, so the gain over the starting point is not obvious. Was the learning rate kept at 1e-4 throughout? Could adjusting the learning rate further improve performance?

Request for Code Acknowledgment and License Clarification

Hi there!

First of all, I've been reading your paper, and it's a really interesting piece of work - well done!
On another note, thank you for including a citation to our work in your LinCIR paper; it's really appreciated!

While examining your code, I noticed some notable similarities. For example, the file 'validate.py' is almost identical. Could I kindly request that you include a citation to our code in your README?

On a friendly note, I noticed that the license information in the files borrowed from SEARLE is still intact. To be consistent with open-source practices, would you mind removing the license information from the files taken directly from our repository?

Thank you very much for your understanding and cooperation.

Best,
Alberto
