tip-adapter's People
Forkers
zrrskywalker teleema bdevnani3 idealwei ronniefu bohao-lee cv-ip gaopengpjlab hongbo-sun omipan shijiaying oe-heart antonioo-c chengy12 faithxia herisai yunkai696 sugary199 nihalbaig0 chenjiehu maxzanella myl-uestc jingchensun holmes-gu mandylove1993 chasemcdo msergencatal xxynov 59-lmq awj2021 buzzy0423 quas-modo onceuponatimemathley fffinale auroramylin hui55hua lanxin-xiang i4vk whuhxb yzh-dev

tip-adapter's Issues
Inference on new Image
Any pointers or a code snippet to run inference on a new image would be helpful.
Thanks!
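Until the authors share one, here is a rough sketch of what single-image inference could look like once the cache model has been built and saved. The file names, tensor shapes, and variable names below are assumptions, not the repo's actual ones; plug in whatever you cached during training.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("RN50", device=device)
clip_model.eval()

# Hypothetical files: adjust to however you saved the cache and text weights.
cache_keys = torch.load("cache_keys.pt").to(device).float()      # (D, N*K) few-shot image features
cache_values = torch.load("cache_values.pt").to(device).float()  # (N*K, C) one-hot labels
clip_weights = torch.load("clip_weights.pt").to(device).float()  # (D, C) text classifier
beta, alpha = 1.0, 1.0  # plug in the values found by the hyper-parameter search

image = preprocess(Image.open("new_image.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    feat = clip_model.encode_image(image).float()
    feat = feat / feat.norm(dim=-1, keepdim=True)

    clip_logits = 100. * feat @ clip_weights                       # zero-shot CLIP term
    affinity = feat @ cache_keys                                   # similarity to cache keys
    cache_logits = ((-1) * (beta - beta * affinity)).exp() @ cache_values
    logits = clip_logits + alpha * cache_logits                    # Tip-Adapter fusion

probs = logits.softmax(dim=-1).squeeze(0)
print(probs.topk(5))
```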
code for tSNE visualization
Can you release the code for tSNE visualization?
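Not the authors' script, but a minimal sketch of how cached CLIP image features could be projected with scikit-learn's t-SNE and plotted; the feature and label files are assumed to be whatever you saved during feature extraction.

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumed inputs: an (N, D) tensor of cached image features and an (N,) label tensor.
features = torch.load("test_features.pt").cpu().float().numpy()
labels = torch.load("test_labels.pt").cpu().numpy()

# Project the high-dimensional CLIP features down to 2-D for plotting.
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(features)

plt.figure(figsize=(8, 8))
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab20", s=5)
plt.axis("off")
plt.savefig("tsne.png", dpi=300, bbox_inches="tight")
```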
Where is the CLIP-Adapter code, please?
Greetings! Really fantastic work!
But where is the CLIP-Adapter code? I wonder how the two-layer MLP in CLIP-Adapter is structured.
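This repository focuses on Tip-Adapter; the CLIP-Adapter code lives in its own repository. As described in the CLIP-Adapter paper, the adapter is a two-layer bottleneck MLP applied to the image feature with a residual blend. A sketch of that structure, where the reduction ratio and residual ratio are illustrative defaults rather than values confirmed here:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Two-layer bottleneck MLP with a residual blend (CLIP-Adapter style)."""
    def __init__(self, dim=1024, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim, bias=False),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, ratio=0.2):
        # Blend the adapted feature with the original (frozen) CLIP feature.
        return ratio * self.fc(x) + (1 - ratio) * x
```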
Bug when I try cifar100
Thanks for your work.
When I try your code on CIFAR100, I get the error below and I don't know how to solve it.
Because ImageNet has a huge number of images, CIFAR100 is the only dataset I can run.
Please help.
Torch version: 1.7.1
Namespace(alpha=1, augment_epoch=10, beta=1.17, lr=0.001, train_epoch=20)
Model parameters: 151,277,313
Input resolution: 224
Context length: 77
Vocab size: 49408
Load data finished.
start getting text features.
finish getting text features.
start getting image features
start saving training image features
Augment time: 0 / 10
3%|▉ | 6/196 [00:03<01:45, 1.81it/s]
Traceback (most recent call last):
  File "main.py", line 487, in <module>
    main()
  File "main.py", line 244, in main
    for i, (images, target) in enumerate(tqdm(train_loader)):
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torchvision/datasets/cifar.py", line 113, in __getitem__
    img, target = self.data[index], self.targets[index]
IndexError: list index out of range
[1]+  Killed                  python main.py
What versions of torch and torchvision did the author use? Thanks.
replicate your results on food101 dataset
Would you consider providing the script to replicate your results on the Food101 dataset? If someone were to adapt your ImageNet script, do you have suggestions on what to make sure to adjust?
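Until a dedicated script is released, the changes usually live in the per-dataset config. Below is a hedged sketch of the fields one would likely adjust; the key names mirror the running config shown in other issues here, and the concrete values are placeholders to double-check against the released configs.

```python
# Illustrative only: mirror whatever the released per-dataset config expects.
cfg = {
    "root_path": "/path/to/data",    # parent folder that contains food-101/
    "dataset": "food101",            # dataset key registered in datasets/__init__.py
    "shots": 16,
    "backbone": "RN50",
    "lr": 0.001,
    "augment_epoch": 10,
    "train_epoch": 20,
    "init_beta": 1.0,                # sharpness; refined by the hyper-parameter search
    "init_alpha": 1.0,               # residual ratio; refined by the search as well
    "search_hp": True,
    "search_scale": [7, 3],          # dataset-specific caps; check the released configs
    "search_step": [200, 20],
    "cache_dir": "./caches/food101",
}
```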
Run TIP-adapter on text2img retrieval instead
Hi, thanks for the amazing work on adapters for CLIP. Currently the framework computes the affinities between the test query image and the cache keys before obtaining the corresponding few-shot label, and this works well. I would just like your advice on how I could extend this to text-to-image retrieval, where I query with a text search term and use the cache key-value adapter to return the corresponding images. Would it be as naive as doing a text-to-text embedding affinity match between the query text and the cache VALUES (instead of keys), since they contain the ground-truth labels for the few-shot learning?
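One possible way to think about it, not something the authors have confirmed: encode the text query with CLIP's text encoder and score it in the shared embedding space. Since the cache keys are image embeddings and the values are one-hot labels, matching the query against the keys directly yields an image ranking, while matching against label-text embeddings (your "values" idea) yields a class ranking that you would then map back to images. A hedged sketch of the first option, where the image-path list is a hypothetical artifact you would need to save alongside the keys:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50", device=device)
model.eval()

# Assumed artifacts: L2-normalised image features of the few-shot images, plus
# the image paths behind each column of cache_keys (hypothetical file names).
cache_keys = torch.load("cache_keys.pt").to(device).float()   # (D, N*K)
image_paths = torch.load("cache_image_paths.pt")              # list of length N*K

query = "a photo of a golden retriever"
with torch.no_grad():
    q = model.encode_text(clip.tokenize([query]).to(device)).float()
    q = q / q.norm(dim=-1, keepdim=True)

scores = (q @ cache_keys).squeeze(0)          # cosine similarity, text -> cached images
top = scores.topk(5).indices.tolist()
print([image_paths[i] for i in top])
```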
Can't find 'dalle_caltech-101\dalle_caltech.json'. How do I solve this?
Running configs.
{'root_path': '', 'load_cache': False, 'load_pre_feat': False, 'search_hp': True, 'search_scale': [12, 5], 'search_step': [200, 20], 'init_beta': 1, 'init_alpha': 1.3, 'gpt3_prompt_file': './gpt_file/caltech_prompt.json', 'dataset': 'caltech101', 'shots': 16, 'clip_backbone': 'RN50', 'dino_backbone': 'resnet50', 'dalle_dataset': 'dalle_caltech', 'dalle_shots': 1, 'lr': 0.001, 'augment_epoch': 10, 'train_epoch': 20, 'cache_dir': './caches\caltech101'}
Pretrained weights found at dino/dino_resnet50_pretrain.pth and loaded with msg:
Preparing dataset.
Reading split from E:/semester of junior year 2/graduation design/CaFo-main/DATA/caltech-101/split_zhou_Caltech101.json
Creating a 16-shot dataset
Reading split from dalle_caltech-101\dalle_caltech.json
Traceback (most recent call last):
  File "main.py", line 292, in <module>
    main()
  File "main.py", line 228, in main
    dalle_dataset = build_dataset(cfg['dalle_dataset'], cfg['root_path'], cfg['dalle_shots'])
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\__init__.py", line 51, in build_dataset
    return dataset_list[dataset](root_path, shots)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\dalle_caltech.py", line 15, in __init__
    train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\oxford_pets.py", line 120, in read_split
    split = read_json(filepath)
  File "E:\semester of junior year 2\graduation design\CaFo-main\datasets\utils.py", line 17, in read_json
    with open(fpath, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'dalle_caltech-101\dalle_caltech.json'
Why am I having this problem after configuring my environment?
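Judging from the log, root_path is empty, so the split path 'dalle_caltech-101\dalle_caltech.json' is resolved relative to the working directory and is not found. A hedged guess at a quick check; the directory layout below is an assumption, and the DALL-E generated data also has to be downloaded into it for the path to exist.

```python
import os

# Hypothetical layout: the DALL-E split is expected under the same data root
# that root_path should point at in the config.
root_path = r"E:/semester of junior year 2/graduation design/CaFo-main/DATA"
split_path = os.path.join(root_path, "dalle_caltech-101", "dalle_caltech.json")
print(os.path.isfile(split_path))  # should be True before running main.py
```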
Any future plan to add support for microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224?
The current CLIP model seems to lack knowledge about medical images, so introducing more pre-trained CLIP models could extend the project's scope of application.
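Not supported here out of the box, but BiomedCLIP is published as an open_clip checkpoint on the Hugging Face Hub, so a hedged first step (assuming the open_clip_torch package) would be to swap the feature extractor and rebuild the cache:

```python
import torch
from open_clip import create_model_from_pretrained, get_tokenizer

# Load BiomedCLIP from the Hugging Face Hub via open_clip.
hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(hub_id)
tokenizer = get_tokenizer(hub_id)
model.eval()

with torch.no_grad():
    texts = tokenizer(["a chest X-ray", "a histopathology slide"])
    text_features = model.encode_text(texts)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
# Image features from model.encode_image(preprocess(img).unsqueeze(0)) could then
# feed the same cache-building pipeline as the original CLIP backbone.
```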
Sharpness & residual ratio range
Hi,
I can't find any information on how you set the alpha and beta hyperparameter ranges for each dataset. Why don't you use the same ranges for all datasets, and how did you determine these ranges?
Why set the CLIP model to eval mode while training Tip-Adapter?
Dear author,
I find that while training Tip-Adapter, the CLIP model is still set to eval mode instead of clip.train(). What is the underlying reason?
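For context, only the cache keys (one linear layer) receive gradients in Tip-Adapter-F; the CLIP backbone is kept frozen, so eval mode simply fixes its BatchNorm/Dropout behaviour while no CLIP parameter is updated anyway. A sketch of that setup, with variable names that are illustrative rather than the repo's:

```python
import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("RN50", device=device)
clip_model.eval()                         # freeze BatchNorm/Dropout behaviour
for p in clip_model.parameters():
    p.requires_grad_(False)               # no gradients flow into the backbone

# Only the cache keys are learnable: a bias-free linear layer initialised
# from the few-shot image features (shape (D, N*K), assumed cached earlier).
cache_keys = torch.load("cache_keys.pt").to(device).float()
adapter = nn.Linear(cache_keys.shape[0], cache_keys.shape[1], bias=False).to(device)
adapter.weight = nn.Parameter(cache_keys.t())

optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
```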
The "alpha" and "beta" in the paper are the opposite of the "alpha" and "beta" in the code of Tip-Adapter
Which version of the ImageNet dataset should be used?
https://image-net.org/challenges/LSVRC/index.php
I saw a lot of versions: 2010, 2012, ...
Would it be convenient to provide a direct download link?
test
Can you provide code showing how to classify any picture from the test set using the cache model and output the class probabilities?
Adapter used in the vision encoder or the text encoder?
Hey, thanks for the nice work. I have some confusion, as follows.
First, why is the adapter used only in the vision encoder? Did the authors try using the adapter in the text encoder?
Second, I don't understand why using an adapter performs better than using learnable prompts. In addition, the "adapter" used in this paper is different from the adapters in NLP tasks, and the insertion position is also different; which one is better?
Can you give me 他和
search_hp
Hi, could the author please explain the use of the search_hp function in the code? It doesn't seem to be mentioned in the paper. How should search_step and search_scale be selected, and what are the ranges of alpha and beta?
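To my understanding, search_hp is a post-training grid search for beta (sharpness) and alpha (residual ratio) on held-out features: search_scale caps the largest value tried for each hyper-parameter and search_step sets how many values are sampled up to that cap. A simplified sketch of the idea (not the exact repo code):

```python
import torch

def search_hp(features, labels, cache_keys, cache_values, clip_weights,
              search_scale=(7, 3), search_step=(200, 20)):
    """Grid-search the beta/alpha pair that maximises accuracy on held-out features."""
    beta_list = [i * search_scale[0] / search_step[0] for i in range(1, search_step[0] + 1)]
    alpha_list = [i * search_scale[1] / search_step[1] for i in range(1, search_step[1] + 1)]

    affinity = features @ cache_keys              # (N, N*K) similarity to cache keys
    clip_logits = 100. * features @ clip_weights  # zero-shot CLIP term
    best_acc, best_beta, best_alpha = 0.0, None, None
    for beta in beta_list:
        cache_logits = ((-1) * (beta - beta * affinity)).exp() @ cache_values
        for alpha in alpha_list:
            logits = clip_logits + alpha * cache_logits
            acc = (logits.argmax(dim=-1) == labels).float().mean().item() * 100
            if acc > best_acc:
                best_acc, best_beta, best_alpha = acc, beta, alpha
    return best_beta, best_alpha, best_acc
```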
How to extend to base-to-novel classes task?
Hi, this method modifies the parameters of the text encoder, so it cannot be extended to the base-to-new classes task. I would like to know how to address this problem.
Is the "clip" folder the same as the "clip" folder in original clip github?
Hi,
I notice that you didn't install clip. Instead you use a "clip" folder. I wonder if the "clip" folder the same as the "clip" folder in the original clip?
Number of runs used to generate the results
Hi,
Thanks for the very nice repo!
How many few-shot runs did you use to generate the results reported in the paper?
Thanks in advance!
"Tip-Adapter/main.py" use test features to eval
It seems odd to use test features for evaluation.
See:
Line 111 in fcb0605
Could the authors give some explanation?
Are CLIP/TIP-Adapter only designed for the few-shot setting?
Sorry, I've got another question.
I did not find experiments under the base-to-new/domain generalization setting or the cross-dataset transfer setting, which are conducted in CoCoOp.
Are CLIP/Tip-Adapter only designed for the few-shot setting? I wonder how their generalization abilities are. Maybe you can give me some intuition?
Details of data augmentation
In the paper, "the CLIP-style pre-processing resizes the cropped image's short side to 224 while keeping its original aspect", and you said that you use the CLIP-style RandomResizedCrop.
However, I found that the standard RandomResizedCrop is used in the code.
I wonder whether this setting is important to the final performance, or have I misunderstood something?
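For reference, the two pre-processing variants under discussion look roughly like this in torchvision; the normalization constants are CLIP's, but treat the rest as a sketch and check the released code for the exact transform used at training time.

```python
from torchvision import transforms

CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

# "CLIP-style": resize the short side to 224 keeping the aspect ratio, then crop.
clip_style_train = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CLIP_MEAN, CLIP_STD),
])

# Standard augmentation: random scale/aspect crop resized straight to 224x224.
standard_train = transforms.Compose([
    transforms.RandomResizedCrop(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CLIP_MEAN, CLIP_STD),
])
```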
result
Why are there overlaps?