
tip-adapter's People

Contributors

gaopengcuhk, zrrskywalker


tip-adapter's Issues

Bug when I try CIFAR-100

Thanks for your work.
When I try your code on CIFAR-100, I get the error below, and I don't know how to solve it.
Because ImageNet contains a huge number of images, CIFAR-100 is the only dataset I can run.
Please help.

Torch version: 1.7.1
Namespace(alpha=1, augment_epoch=10, beta=1.17, lr=0.001, train_epoch=20)
Model parameters: 151,277,313
Input resolution: 224
Context length: 77
Vocab size: 49408
Load data finished.
start getting text features.
finish getting text features.
start getting image features
start saving training image features
Augment time: 0 / 10
3%|▉ | 6/196 [00:03<01:45, 1.81it/s]
Traceback (most recent call last):
  File "main.py", line 487, in <module>
    main()
  File "main.py", line 244, in main
    for i, (images, target) in enumerate(tqdm(train_loader)):
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torchvision/datasets/cifar.py", line 113, in __getitem__
    img, target = self.data[index], self.targets[index]
IndexError: list index out of range
[1]+  Killed                 python main.py
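A likely cause, not confirmed here: main.py builds its few-shot sampling indices for ImageNet-scale splits, so after switching the dataset to CIFAR-100 (only 50,000 training images) some indices exceed the length of the dataset's targets list, which is exactly where the traceback ends. A minimal sketch, with hypothetical helper names, of sampling a k-shot subset only from indices that actually exist:

```python
import random
from collections import defaultdict

import torchvision
from torch.utils.data import Subset

def build_few_shot_subset(dataset, shots=16, seed=0):
    """Sample `shots` indices per class from the indices that actually
    exist in `dataset`, so the DataLoader can never go out of range."""
    rng = random.Random(seed)
    per_class = defaultdict(list)
    for idx, label in enumerate(dataset.targets):  # CIFAR exposes .targets
        per_class[label].append(idx)
    few_shot_indices = []
    for indices in per_class.values():
        few_shot_indices.extend(rng.sample(indices, shots))
    return Subset(dataset, few_shot_indices)

# CIFAR-100 has 500 training images per class, so 16 shots always fits.
train_set = torchvision.datasets.CIFAR100(root="./data", train=True, download=True)
train_few_shot = build_few_shot_subset(train_set, shots=16)
```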

Replicate your results on the Food-101 dataset

Would you consider providing the script to replicate your results on the Food-101 dataset? And if someone wants to adapt your ImageNet script to it, do you have suggestions on what to make sure to adjust?

Run Tip-Adapter on text-to-image retrieval instead

Hi, thanks for the amazing work on adapters for CLIP. Currently the framework computes the affinities between the test query image and the cache keys before obtaining the corresponding few-shot labels. This works well. I would like your advice on how I can extend this to text-to-image retrieval, where I would query with a text search term and use the cache key-value adapter to return the corresponding images. Would it be as naive as doing a text-to-text embedding affinity match of the query text against the cache VALUES (instead of the keys), since they contain the ground-truth labels for the few-shot learning?
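One hedged way to think about it: the cache VALUES are one-hot label vectors, not text embeddings, so there is nothing to match a query text against in embedding space; but since CLIP embeds text and images in a shared space, the query text feature can be matched against the cache KEYS (image features) directly. A minimal sketch under those assumptions, with shapes following the repo's cache convention:

```python
import torch

@torch.no_grad()
def retrieve_images(query_text, clip_model, tokenize, cache_keys, top_k=5):
    """Rank the cached few-shot images by CLIP-space similarity to a text
    query. cache_keys is assumed to be a (D, N) matrix of L2-normalized
    CLIP image features, one column per cached image."""
    tokens = tokenize([query_text])                      # e.g. clip.tokenize
    text_feat = clip_model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)  # (1, D)
    affinity = text_feat @ cache_keys                    # (1, N) cosine sims
    scores, indices = affinity.topk(top_k, dim=-1)
    return indices.squeeze(0), scores.squeeze(0)         # cached-image ids
```

If class-level retrieval is enough, the one-hot cache values can then map the top-scoring keys back to their ground-truth labels.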

Can't find file 'dalle_caltech-101\dalle_caltech.json'. How do I solve it?

Running configs.
{'root_path': '', 'load_cache': False, 'load_pre_feat': False, 'search_hp': True, 'search_scale': [12, 5], 'search_step': [200, 20], 'init_beta': 1, 'init_alpha': 1.3, 'gpt3_prompt_file': './gpt_file/caltech_prompt.json', 'dataset': 'caltech101', 'shots': 16, 'clip_backbone': 'RN50', 'dino_backbone': 'resnet50', 'dalle_dataset': 'dalle_caltech', 'dalle_shots': 1, 'lr': 0.001, 'augment_epoch': 10, 'train_epoch': 20, 'cache_dir': './caches\caltech101'}

Pretrained weights found at dino/dino_resnet50_pretrain.pth and loaded with msg:
Preparing dataset.
Reading split from E:/semester of junior year 2/graduation design/CaFo-main/DATA/caltech-101/split_zhou_Caltech101.json
Creating a 16-shot dataset
Reading split from dalle_caltech-101\dalle_caltech.json
Traceback (most recent call last):
  File "main.py", line 292, in <module>
    main()
  File "main.py", line 228, in main
    dalle_dataset = build_dataset(cfg['dalle_dataset'], cfg['root_path'], cfg['dalle_shots'])
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\__init__.py", line 51, in build_dataset
    return dataset_list[dataset](root_path, shots)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\dalle_caltech.py", line 15, in __init__
    train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\oxford_pets.py", line 120, in read_split
    split = read_json(filepath)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\utils.py", line 17, in read_json
    with open(fpath, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'dalle_caltech-101\dalle_caltech.json'
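A likely cause, judging from 'root_path': '' in the config above: the split path is a bare relative string, so Python resolves 'dalle_caltech-101\dalle_caltech.json' against the current working directory rather than the data root. A hedged sketch of a cross-platform fix (the directory layout and function name are assumptions; the idea is to build the path with os.path.join from the configured root):

```python
import os

def resolve_dalle_split(root_path):
    """Build the DALL-E Caltech split path from the configured data root,
    letting os.path.join pick the right separator on Windows and Linux."""
    root = os.path.abspath(os.path.expanduser(root_path or "./DATA"))
    image_dir = os.path.join(root, "dalle_caltech-101")
    split_path = os.path.join(image_dir, "dalle_caltech.json")
    if not os.path.exists(split_path):
        raise FileNotFoundError(
            f"Expected the DALL-E split at {split_path}; set root_path in "
            "the config to the directory that contains dalle_caltech-101/."
        )
    return image_dir, split_path
```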

Sharpness & residual ratio range

Hi,
I can't find any information on how you set the alpha and beta hyperparameter ranges for each dataset. Why don't you use the same ranges for all datasets, and how did you determine these ranges?

test

Can you provide code showing how to classify any test-set image using the cache model, and how to obtain the class probabilities?
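Not the authors' code, but a minimal sketch of Tip-Adapter inference for a single image, following the paper's formula: the cache logits exp(-beta * (1 - affinity)) @ values are scaled by alpha and added to the zero-shot CLIP logits, and a softmax yields per-class probabilities. The precomputed cache_keys (D, N), cache_values (N, C), and clip_weights (D, C) are assumed to exist as in the repo:

```python
import torch

@torch.no_grad()
def classify_image(image, clip_model, preprocess, cache_keys, cache_values,
                   clip_weights, alpha=1.0, beta=1.0):
    """Return class probabilities for one PIL image using the cache model."""
    x = preprocess(image).unsqueeze(0)                  # (1, 3, 224, 224)
    feat = clip_model.encode_image(x)
    feat = feat / feat.norm(dim=-1, keepdim=True)       # (1, D)

    clip_logits = 100. * feat @ clip_weights            # zero-shot logits, (1, C)
    affinity = feat @ cache_keys                        # similarity to cache, (1, N)
    cache_logits = torch.exp(-beta * (1 - affinity)) @ cache_values  # (1, C)

    logits = clip_logits + alpha * cache_logits         # Tip-Adapter blend
    return logits.softmax(dim=-1).squeeze(0)            # per-class probabilities
```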

Adapter used in the vision encoder or the text encoder?

Hey, thanks for the nice work. I have some points of confusion, as follows.
First, why is the adapter used only in the vision encoder? Did the authors try using the adapter in the text encoder?
Second, I don't understand why using an adapter performs better than using learnable prompts. In addition, the "adapter" used in this paper is different from the adapters in NLP tasks, and the insertion position also differs; which one is better?
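For context, a hedged sketch of the design being discussed: the CLIP-Adapter-style adapter is a small bottleneck appended after the frozen encoder, whereas NLP adapters are inserted inside each transformer block. Dimensions and the residual ratio below are illustrative, not the authors' exact values:

```python
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Bottleneck adapter appended AFTER the frozen CLIP encoder, blending
    adapted and original features with a residual ratio."""
    def __init__(self, dim=1024, reduction=4, ratio=0.2):
        super().__init__()
        self.ratio = ratio
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )

    def forward(self, feat):            # feat: frozen CLIP feature, (B, dim)
        adapted = self.fc(feat)
        return self.ratio * adapted + (1 - self.ratio) * feat
```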

search_hp

Hi, could the author please explain the use of the search_hp function in the code? It doesn't seem to be mentioned in the paper. Also, how should search_step and search_scale be selected, and what are the ranges of alpha and beta?
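Not an official answer, but the config printed above ('search_scale': [12, 5], 'search_step': [200, 20]) suggests search_hp grid-searches beta over roughly (0, 12] in 200 steps and alpha over (0, 5] in 20 steps on validation features, keeping the best-scoring pair. A sketch of that reading (names and step spacing are assumptions):

```python
import torch

def search_hp(features, labels, cache_keys, cache_values, clip_weights,
              search_scale=(12, 5), search_step=(200, 20)):
    """Grid-search (beta, alpha) on held-out validation features and
    return the pair with the highest accuracy."""
    betas = [(i + 1) * search_scale[0] / search_step[0] for i in range(search_step[0])]
    alphas = [(j + 1) * search_scale[1] / search_step[1] for j in range(search_step[1])]
    clip_logits = 100. * features @ clip_weights        # computed once
    affinity = features @ cache_keys
    best = (0.0, 1.0, 1.0)                              # (accuracy, beta, alpha)
    for beta in betas:
        cache_logits = torch.exp(-beta * (1 - affinity)) @ cache_values
        for alpha in alphas:
            logits = clip_logits + alpha * cache_logits
            acc = (logits.argmax(dim=-1) == labels).float().mean().item()
            if acc > best[0]:
                best = (acc, beta, alpha)
    return best
```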

Are CLIP/TIP-Adapter only designed for the few-shot setting?

Sorry, I've got another question.
I did not find experiments under the base-to-new/domain generalization setting or the cross-dataset transfer setting, which are conducted in CoCoOp.
Are CLIP/Tip-Adapter only designed for the few-shot setting? I wonder how good the generalization abilities are. Could you give me any intuition?

Details of data augmentation

In the paper, "the CLIP-style pre-processing resizes the cropped image's short side to 224 while keeping its original aspect", and you said that you use the CLIP-style RandomResizedCrop.

However, I found that the code uses the standard RandomResizedCrop.

I wonder whether this setting is important to the final performance, or whether I have misunderstood something here?
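For reference, a sketch of the two pipelines being contrasted; the CLIP-style variant follows the paper's description (short side resized to 224, aspect kept), and the exact parameters are assumptions:

```python
import torchvision.transforms as T

# CLIP's published normalization statistics.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

# Standard torchvision pipeline, as found in the code: random scale/aspect
# crop, then resize to 224x224 (the crop's aspect ratio is NOT preserved).
standard_train = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(CLIP_MEAN, CLIP_STD),
])

# CLIP-style pre-processing as described in the paper: resize the short
# side to 224 while keeping the original aspect ratio, then center-crop.
clip_style = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(CLIP_MEAN, CLIP_STD),
])
```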
