tip-adapter's People
Forkers
zrrskywalker teleema bdevnani3 idealwei ronniefu bohao-lee cv-ip gaopengpjlab hongbo-sun omipan shijiaying oe-heart antonioo-c chengy12 faithxia herisai yunkai696 sugary199 nihalbaig0 chenjiehu maxzanella myl-uestc jingchensun holmes-gu mandylove1993 chasemcdo msergencatal xxynov 59-lmq awj2021 buzzy0423 quas-modo onceuponatimemathley fffinale auroramylin hui55hua lanxin-xiang i4vk whuhxb yzh-dev

tip-adapter's Issues
Inference on new Image
Any pointers or a code snippet to run inference on a new image would be helpful.
Thanks!
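Until the authors share one, here is a rough sketch of what single-image inference could look like once the cache model has been built and saved. The file names, tensor shapes, and variable names below are assumptions, not the repo's actual ones; plug in whatever you cached during training.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("RN50", device=device)
clip_model.eval()

# Hypothetical files: adjust to however you saved the cache and text weights.
cache_keys = torch.load("cache_keys.pt").to(device).float()      # (D, N*K) few-shot image features
cache_values = torch.load("cache_values.pt").to(device).float()  # (N*K, C) one-hot labels
clip_weights = torch.load("clip_weights.pt").to(device).float()  # (D, C) text classifier
beta, alpha = 1.0, 1.0  # plug in the values found by the hyper-parameter search

image = preprocess(Image.open("new_image.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    feat = clip_model.encode_image(image).float()
    feat = feat / feat.norm(dim=-1, keepdim=True)

    clip_logits = 100. * feat @ clip_weights                       # zero-shot CLIP term
    affinity = feat @ cache_keys                                   # similarity to cache keys
    cache_logits = ((-1) * (beta - beta * affinity)).exp() @ cache_values
    logits = clip_logits + alpha * cache_logits                    # Tip-Adapter fusion

probs = logits.softmax(dim=-1).squeeze(0)
print(probs.topk(5))
```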
code for tSNE visualization
Can you release the code for tSNE visualization?
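Not the authors' script, but a minimal sketch of how cached CLIP image features could be projected with scikit-learn's t-SNE and plotted; the feature and label files are assumed to be whatever you saved during feature extraction.

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumed inputs: an (N, D) tensor of cached image features and an (N,) label tensor.
features = torch.load("test_features.pt").cpu().float().numpy()
labels = torch.load("test_labels.pt").cpu().numpy()

# Project the high-dimensional CLIP features down to 2-D for plotting.
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(features)

plt.figure(figsize=(8, 8))
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab20", s=5)
plt.axis("off")
plt.savefig("tsne.png", dpi=300, bbox_inches="tight")
```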
Where is the CLIP-Adapter code, please?
Greetings! Really fantastic work!
But where is the CLIP-Adapter code? I wonder how the two-layer MLP in CLIP-Adapter is structured.
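This repository focuses on Tip-Adapter; the CLIP-Adapter code lives in its own repository. As described in the CLIP-Adapter paper, the adapter is a two-layer bottleneck MLP applied to the image feature with a residual blend. A sketch of that structure, where the reduction ratio and residual ratio are illustrative defaults rather than values confirmed here:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Two-layer bottleneck MLP with a residual blend (CLIP-Adapter style)."""
    def __init__(self, dim=1024, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim, bias=False),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, ratio=0.2):
        # Blend the adapted feature with the original (frozen) CLIP feature.
        return ratio * self.fc(x) + (1 - ratio) * x
```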
Bug when I try cifar100
Thanks for your work.
When I try your code on CIFAR100, I get the error below and I don't know how to solve it.
Because ImageNet has a huge number of images, CIFAR100 is the only dataset I can run.
Please help.
Torch version: 1.7.1
Namespace(alpha=1, augment_epoch=10, beta=1.17, lr=0.001, train_epoch=20)
Model parameters: 151,277,313
Input resolution: 224
Context length: 77
Vocab size: 49408
Load data finished.
start getting text features.
finish getting text features.
start getting image features
start saving training image features
Augment time: 0 / 10
3%|▉ | 6/196 [00:03<01:45, 1.81it/s]
Traceback (most recent call last):
  File "main.py", line 487, in <module>
    main()
  File "main.py", line 244, in main
    for i, (images, target) in enumerate(tqdm(train_loader)):
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torchvision/datasets/cifar.py", line 113, in __getitem__
    img, target = self.data[index], self.targets[index]
IndexError: list index out of range
[1]+  Killed                  python main.py
What versions of torch and torchvision did the author use? Thanks.
replicate your results on food101 dataset
Would you consider providing the script to replicate your results on the Food101 dataset? If someone were to adapt your ImageNet script, do you have suggestions on what to make sure to adjust?
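Until a dedicated script is released, the changes usually live in the per-dataset config. Below is a hedged sketch of the fields one would likely adjust; the key names mirror the running config shown in other issues here, and the concrete values are placeholders to double-check against the released configs.

```python
# Illustrative only: mirror whatever the released per-dataset config expects.
cfg = {
    "root_path": "/path/to/data",    # parent folder that contains food-101/
    "dataset": "food101",            # dataset key registered in datasets/__init__.py
    "shots": 16,
    "backbone": "RN50",
    "lr": 0.001,
    "augment_epoch": 10,
    "train_epoch": 20,
    "init_beta": 1.0,                # sharpness; refined by the hyper-parameter search
    "init_alpha": 1.0,               # residual ratio; refined by the search as well
    "search_hp": True,
    "search_scale": [7, 3],          # dataset-specific caps; check the released configs
    "search_step": [200, 20],
    "cache_dir": "./caches/food101",
}
```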
Run TIP-adapter on text2img retrieval instead
Hi, thanks for the amazing work on adapters for CLIP. Currently the framework computes the affinities between the test query image and the cache keys before obtaining the corresponding few-shot label, and this works well. I would just like your advice on how I could extend this to text-to-image retrieval, where I query with a text search term and use the cache key-value adapter to return the corresponding images. Would it be as naive as doing a text-to-text embedding affinity match between the query text and the cache VALUES (instead of keys), since they contain the ground-truth labels for the few-shot learning?
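One possible way to think about it, not something the authors have confirmed: encode the text query with CLIP's text encoder and score it in the shared embedding space. Since the cache keys are image embeddings and the values are one-hot labels, matching the query against the keys directly yields an image ranking, while matching against label-text embeddings (your "values" idea) yields a class ranking that you would then map back to images. A hedged sketch of the first option, where the image-path list is a hypothetical artifact you would need to save alongside the keys:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50", device=device)
model.eval()

# Assumed artifacts: L2-normalised image features of the few-shot images, plus
# the image paths behind each column of cache_keys (hypothetical file names).
cache_keys = torch.load("cache_keys.pt").to(device).float()   # (D, N*K)
image_paths = torch.load("cache_image_paths.pt")              # list of length N*K

query = "a photo of a golden retriever"
with torch.no_grad():
    q = model.encode_text(clip.tokenize([query]).to(device)).float()
    q = q / q.norm(dim=-1, keepdim=True)

scores = (q @ cache_keys).squeeze(0)          # cosine similarity, text -> cached images
top = scores.topk(5).indices.tolist()
print([image_paths[i] for i in top])
```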
Can't find 'dalle_caltech-101\dalle_caltech.json'. How do I solve this?
Running configs.
{'root_path': '', 'load_cache': False, 'load_pre_feat': False, 'search_hp': True, 'search_scale': [12, 5], 'search_step': [200, 20], 'init_beta': 1, 'init_alpha': 1.3, 'gpt3_prompt_file': './gpt_file/caltech_prompt.json', 'dataset': 'caltech101', 'shots': 16, 'clip_backbone': 'RN50', 'dino_backbone': 'resnet50', 'dalle_dataset': 'dalle_caltech', 'dalle_shots': 1, 'lr': 0.001, 'augment_epoch': 10, 'train_epoch': 20, 'cache_dir': './caches\caltech101'}
Pretrained weights found at dino/dino_resnet50_pretrain.pth and loaded with msg:
Preparing dataset.
Reading split from E:/semester of junior year 2/graduation design/CaFo-main/DATA/caltech-101/split_zhou_Caltech101.json
Creating a 16-shot dataset
Reading split from dalle_caltech-101\dalle_caltech.json
Traceback (most recent call last):
  File "main.py", line 292, in <module>
    main()
  File "main.py", line 228, in main
    dalle_dataset = build_dataset(cfg['dalle_dataset'], cfg['root_path'], cfg['dalle_shots'])
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\__init__.py", line 51, in build_dataset
    return dataset_list[dataset](root_path, shots)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\dalle_caltech.py", line 15, in __init__
    train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\oxford_pets.py", line 120, in read_split
    split = read_json(filepath)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\utils.py", line 17, in read_json
    with open(fpath, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'dalle_caltech-101\dalle_caltech.json'
Why am I having this problem after configuring my environment?
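Judging from the log, root_path is empty, so the split path 'dalle_caltech-101\dalle_caltech.json' is resolved relative to the working directory and is not found. A hedged guess at a quick check; the directory layout below is an assumption, and the DALL-E generated data also has to be downloaded into it for the path to exist.

```python
import os

# Hypothetical layout: the DALL-E split is expected under the same data root
# that root_path should point at in the config.
root_path = r"E:/semester of junior year 2/graduation design/CaFo-main/DATA"
split_path = os.path.join(root_path, "dalle_caltech-101", "dalle_caltech.json")
print(os.path.isfile(split_path))  # should be True before running main.py
```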
Any future plan to add support for microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224?
The current CLIP model seems to lack knowledge about medical images, so introducing more pre-trained CLIP models could extend the project's scope of application.
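Not supported here out of the box, but BiomedCLIP is published as an open_clip checkpoint on the Hugging Face Hub, so a hedged first step (assuming the open_clip_torch package) would be to swap the feature extractor and rebuild the cache:

```python
import torch
from open_clip import create_model_from_pretrained, get_tokenizer

# Load BiomedCLIP from the Hugging Face Hub via open_clip.
hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(hub_id)
tokenizer = get_tokenizer(hub_id)
model.eval()

with torch.no_grad():
    texts = tokenizer(["a chest X-ray", "a histopathology slide"])
    text_features = model.encode_text(texts)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
# Image features from model.encode_image(preprocess(img).unsqueeze(0)) could then
# feed the same cache-building pipeline as the original CLIP backbone.
```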
Sharpness & residual ratio range
Hi,
I can't find any information on how you set the alpha and beta hyperparameter ranges for each dataset. Why don't you use the same ranges for all datasets, and how did you determine these ranges?
Why set the CLIP model to eval mode while training Tip-Adapter?
Dear author,
I find that while training Tip-Adapter, the CLIP model is still set to eval mode instead of clip.train(). What is the underlying reason?
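For context, only the cache keys (one linear layer) receive gradients in Tip-Adapter-F; the CLIP backbone is kept frozen, so eval mode simply fixes its BatchNorm/Dropout behaviour while no CLIP parameter is updated anyway. A sketch of that setup, with variable names that are illustrative rather than the repo's:

```python
import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("RN50", device=device)
clip_model.eval()                         # freeze BatchNorm/Dropout behaviour
for p in clip_model.parameters():
    p.requires_grad_(False)               # no gradients flow into the backbone

# Only the cache keys are learnable: a bias-free linear layer initialised
# from the few-shot image features (shape (D, N*K), assumed cached earlier).
cache_keys = torch.load("cache_keys.pt").to(device).float()
adapter = nn.Linear(cache_keys.shape[0], cache_keys.shape[1], bias=False).to(device)
adapter.weight = nn.Parameter(cache_keys.t())

optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
```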
The "alpha" and "beta" in the paper are the opposite of the "alpha" and "beta" in the code of Tip-Adapter
Which version of the ImageNet dataset should be used?
https://image-net.org/challenges/LSVRC/index.php
I saw a lot of versions: 2010, 2012, ...
Would it be convenient to provide a direct download link?
test
Can you provide code showing how to classify any picture from the test set using the cache model and output the class probabilities?
Adapter used in the vision encoder or the text encoder?
Hey, thanks for the nice work. I have some confusion, as follows.
First, why is the adapter used only in the vision encoder? Did the authors try using the adapter in the text encoder?
Second, I don't understand why using an adapter performs better than using learnable prompts. In addition, the "adapter" used in this paper is different from the adapters in NLP tasks, and the insertion position is also different; which one is better?
Can you give me 他和
search_hp
Hi, could the author please explain the use of the search_hp function in the code? It doesn't seem to be mentioned in the paper. How should search_step and search_scale be selected, and what are the ranges of alpha and beta?
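To my understanding, search_hp is a post-training grid search for beta (sharpness) and alpha (residual ratio) on held-out features: search_scale caps the largest value tried for each hyper-parameter and search_step sets how many values are sampled up to that cap. A simplified sketch of the idea (not the exact repo code):

```python
import torch

def search_hp(features, labels, cache_keys, cache_values, clip_weights,
              search_scale=(7, 3), search_step=(200, 20)):
    """Grid-search the beta/alpha pair that maximises accuracy on held-out features."""
    beta_list = [i * search_scale[0] / search_step[0] for i in range(1, search_step[0] + 1)]
    alpha_list = [i * search_scale[1] / search_step[1] for i in range(1, search_step[1] + 1)]

    affinity = features @ cache_keys              # (N, N*K) similarity to cache keys
    clip_logits = 100. * features @ clip_weights  # zero-shot CLIP term
    best_acc, best_beta, best_alpha = 0.0, None, None
    for beta in beta_list:
        cache_logits = ((-1) * (beta - beta * affinity)).exp() @ cache_values
        for alpha in alpha_list:
            logits = clip_logits + alpha * cache_logits
            acc = (logits.argmax(dim=-1) == labels).float().mean().item() * 100
            if acc > best_acc:
                best_acc, best_beta, best_alpha = acc, beta, alpha
    return best_beta, best_alpha, best_acc
```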
How to extend to base-to-novel classes task?
Hi, this method modifies the parameters of the text encoder, so it cannot be extended to the base-to-new classes task. I would like to know how to address this problem.
Is the "clip" folder the same as the "clip" folder in original clip github?
Hi,
I notice that you didn't install clip. Instead you use a "clip" folder. I wonder if the "clip" folder the same as the "clip" folder in the original clip?
Number of runs used to generate the results
Hi,
Thanks for the very nice repo!
How many few-shot runs did you use to generate the results reported in the paper?
Thanks in advance!
"Tip-Adapter/main.py" use test features to eval
It seems odd to use test features for evaluation.
See:
Line 111 in fcb0605
Could the authors give some explanation?
Are CLIP/TIP-Adapter only designed for the few-shot setting?
Sorry, I've got another question.
I did not find experiments under the base-to-new/domain generalization setting or the cross-dataset transfer setting, which are conducted in CoCoOp.
Are CLIP/Tip-Adapter only designed for the few-shot setting? I wonder how their generalization abilities are. Maybe you can give me some intuition?
Details of data augmentation
In the paper, "the CLIP-style pre-processing resizes the cropped image's short side to 224 while keeping its original aspect", and you said that you use the CLIP-style RandomResizedCrop.
However, I found that the standard RandomResizedCrop is used in the code.
I wonder whether this setting is important to the final performance, or have I misunderstood something?
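For reference, the two pre-processing variants under discussion look roughly like this in torchvision; the normalization constants are CLIP's, but treat the rest as a sketch and check the released code for the exact transform used at training time.

```python
from torchvision import transforms

CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

# "CLIP-style": resize the short side to 224 keeping the aspect ratio, then crop.
clip_style_train = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CLIP_MEAN, CLIP_STD),
])

# Standard augmentation: random scale/aspect crop resized straight to 224x224.
standard_train = transforms.Compose([
    transforms.RandomResizedCrop(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CLIP_MEAN, CLIP_STD),
])
```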
result
Why are there overlaps?