computer-vision-in-the-wild / Elevater_Toolkit_IC
Toolkit for Elevater Benchmark
License: MIT License
Hi, thanks for this great benchmark.
I have a question about the hyper-parameter tuning.
As shown in the attached sweep logs, both the training accuracy and the validation accuracy are good at the hyper-parameter sweeping stage, and the toolkit chooses "learning rate 0.01, L2 lambda 0.0001" as the best configuration for the final 50 epochs.
However, the performance of the model trained with the selected hyper-parameters is extremely bad (see the attached results).
Have you ever faced this problem? It mainly shows up on the dtd, fer2013, and resisc45 datasets, and usually occurs when a relatively large learning rate (like 0.01) is selected in the sweeping stage.
I don't think this problem comes from a gap between the validation set and the test set, because you can see the training accuracy is also bad during the final 50 epochs of training.
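The selection step being described can be sketched in a few lines (a minimal pure-Python sketch; the grid values and accuracies below are made up for illustration, not taken from the toolkit):

```python
# Hypothetical sweep results: (learning_rate, l2_lambda) -> validation accuracy.
sweep_results = {
    (0.01,  0.0001): 0.82,
    (0.001, 0.0001): 0.79,
    (0.01,  0.01):   0.75,
    (0.001, 0.01):   0.77,
}

# Pick the configuration with the highest validation accuracy;
# the real toolkit then retrains with it for the final 50 epochs.
best_config = max(sweep_results, key=sweep_results.get)
best_lr, best_l2 = best_config
print(best_lr, best_l2)  # -> 0.01 0.0001
```

The failure mode in the question is that the short sweep runs can favor a large learning rate that later diverges over the full 50-epoch schedule, which this selection rule cannot detect.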
I tried to run MoCoV3 on your Evaluation Toolkit, but ran into a problem loading the model.
I only changed the parameter model_cfg from vitb16_CLIP to mocov3_vitb16 as suggested and didn't make any other changes, but it seems that the MoCo model cannot be loaded correctly.
Is MoCoV3 supported in the current version of the Evaluation Toolkit? If so, what should I do to load it correctly?
Hi authors, thanks for your great work and useful toolkit! As the title says, would you mind sharing some of the settings used in your experiments? Such as:
(Simply executing run.sh only involves one GPU and requires a long training time in my case.)
Looking forward to your reply :)
Hi, I wish to evaluate zero-shot learning on a new model. Taking clip_swin.py as an example, there are two parameters that are effectively uninitialized because they are created from torch.empty. trunc_normal_ is applied to them later on, but with the same seed I am receiving drastically different accuracy scores.
What do you recommend in my situation? The models I am trying are maxvit_tiny from torchvision pretrained weights and the pretrained vitb32_CLIP.
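One likely cause is that the RNG state at the moment trunc_normal_ runs differs between runs, even with the same top-level seed. The effect can be sketched with Python's standard random module standing in for torch's RNG (a toy illustration, not the toolkit's code):

```python
import random

def init_params(n):
    # Stand-in for parameters created via torch.empty + trunc_normal_:
    # their values depend entirely on the RNG state at initialization time.
    return [random.gauss(0.0, 0.02) for _ in range(n)]

# A single seed at program start is not enough if other random calls
# happen in between -- the RNG state has advanced by the second init:
random.seed(0)
run_a = init_params(4)
run_b = init_params(4)   # different values from run_a

# Seeding immediately before each initialization makes it reproducible:
random.seed(0); run_c = init_params(4)
random.seed(0); run_d = init_params(4)
assert run_c == run_d
```

In the torch setting the analogous move is to call torch.manual_seed right before the model is constructed, so the trunc_normal_ draws see an identical RNG state across runs.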
Thank you for creating this wonderful benchmark.
It seems that the following line of code tries to load the visual branch of the model.
However, when the model (e.g., a custom model) does not have a visual branch, an AttributeError would be raised.
This problem can be solved by modifying it into:
visual_backbone = model.visual if hasattr(model, 'visual') else model
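The proposed guard can be exercised with dummy stand-ins for the two model shapes (the class names here are hypothetical, chosen only to illustrate the fix):

```python
class DualBranchModel:
    """Stand-in for a CLIP-style model that has a visual branch."""
    def __init__(self):
        self.visual = "visual-encoder"

class VisionOnlyModel:
    """Stand-in for a custom model with no .visual attribute."""
    pass

def get_visual_backbone(model):
    # The proposed fix: use the visual branch when it exists,
    # otherwise fall back to the model itself.
    return model.visual if hasattr(model, 'visual') else model

clip_like = DualBranchModel()
custom = VisionOnlyModel()
assert get_visual_backbone(clip_like) == "visual-encoder"
assert get_visual_backbone(custom) is custom
```

With the original unguarded attribute access, the second call would raise an AttributeError.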
Hi, thanks for your great benchmark.
I wonder how the "Efficiency" score is calculated. Is the challenge ranked based on this score?
Thank you for your work and for releasing this benchmark. I am trying to verify zero-shot results and was going to just grep the log files, but they are not being created when I run run.sh. Do you have any pointers on this? Is there a better way to accumulate results over multiple runs?
Hi, first of all thank you for developing such a wonderful benchmark for zero-shot learning assessment.
I have submitted some predictions to EvalAI, and the evaluation has been running for more than 1 hour. Is the evaluation system working properly?
Thanks.
Currently, I have the following errors:
403:Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
It seems like we no longer have access to the datasets due to a wrong Authorization header or signature; can anyone check on this?
Can you please share how we can use the image classification few-shot learning for custom datasets?
Thank you for your solid work.
Does the repo implement the functionality of picking the model weights that perform best on the validation set and then evaluating them on the test set?
From the code below, it seems that the repo directly chooses the best result on the test set as the final result.
Elevater_Toolkit_IC/vision_benchmark/evaluation/full_model_finetune.py
Lines 267 to 277 in 00d0af7
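The protocol the question is asking about can be sketched as follows (a minimal pure-Python sketch with made-up per-epoch accuracies, not the toolkit's actual code):

```python
# Hypothetical per-epoch metrics: (val_accuracy, test_accuracy).
history = [
    (0.70, 0.68),
    (0.78, 0.74),   # best validation epoch
    (0.76, 0.77),   # best *test* epoch -- must not drive selection
]

# Correct protocol: select the checkpoint by validation accuracy,
# then report that checkpoint's test accuracy.
best_epoch = max(range(len(history)), key=lambda e: history[e][0])
reported_test_acc = history[best_epoch][1]
print(best_epoch, reported_test_acc)  # -> 1 0.74
```

Taking the maximum over the test column instead (0.77 here) would leak test-set information into model selection, which is the concern raised above.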
Adopting your notation in figure 5(d), you initialize the final linear layer W with the text embedding V, and you also keep the visual projection W_v. When you do LP, do you train both W and W_v, or just W? From your code, it seems that you turned off requires_grad for visual.proj, so I guess you only trained W.
In section 5.1, you reached the conclusion that "with the proposed language-init method, one can ensure that few-shot performance is always better than zero-shot". This is not precise, because if you only train W, the optimization problem is convex and the initialization should not matter (up to the optimizer's inductive bias, of course). So I suspect the reason LP beats zero-shot in figure 6 is the implicit regularization from the optimizer (for example, early stopping ~ L2 regularization). The CLIP paper used L-BFGS to solve LP, so initialization really didn't matter for them.
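The convexity point can be illustrated with a toy one-dimensional least-squares problem: run gradient descent to (near) convergence from two very different starting points and both reach essentially the same minimizer, so the initialization only shows through when training is stopped early. (A pure-Python toy, not the paper's setup.)

```python
def gradient_descent(w0, steps, lr=0.1):
    # Minimize the convex loss L(w) = (w - 3)**2; gradient is 2*(w - 3).
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return w

# Run to (near) convergence: both initializations reach the same minimizer.
assert abs(gradient_descent(0.0, 500) - gradient_descent(100.0, 500)) < 1e-9

# Stop early: the initialization still shows through
# (the "early stopping ~ regularization" effect mentioned above).
early_a = gradient_descent(0.0, 3)
early_b = gradient_descent(100.0, 3)
assert abs(early_a - early_b) > 1.0
```

Under early stopping, the solution stays biased toward its initialization, which is one mechanism by which language-init could keep few-shot LP above zero-shot even though the fully-optimized convex problem is initialization-independent.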
Currently I am having the following error:
requests.exceptions.HTTPError: 409 Client Error: Public access is not permitted on this storage account. for url: https://cvinthewildeus.blob.core.windows.net/datasets/classification/cifar_10_20211007/test.txt
It seems like we do not have access to the datasets anymore, can anyone check on this?
Hi team,
In the knowledge source for Imagenet-1K, I have noticed that for GPT definitions, there are 2 classes missing.
They are:
These classes are duplicated in Imagenet-1K, and seem to have been missed as a result (1000 classes down to 998). It is a small change and could be fixed quickly.