computer-vision-in-the-wild / Elevater_Toolkit_IC
Toolkit for Elevater Benchmark
License: MIT License
Hi, thanks for this great benchmark.
I have a question about the hyper-parameter tuning.
As shown in the attached sweep logs, both the training accuracy and the validation accuracy are good at the hyper-parameter sweeping stage, and the toolkit chooses "learning rate 0.01, L2 lambda 0.0001" as the best configuration for the final 50 epochs.
However, the performance of the model trained with the selected hyper-parameters is extremely bad (see the attached results).
Have you ever faced this problem? It mainly shows up on the dtd, fer2013, and resisc45 datasets, and usually occurs when a relatively large learning rate (like 0.01) is selected in the sweeping stage.
I don't think this problem comes from a gap between the validation set and the test set, because you can see the training accuracy is also bad during the final 50 epochs of training.
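The selection step being described can be sketched in a few lines (a minimal pure-Python sketch; the grid values and accuracies below are made up for illustration, not taken from the toolkit):

```python
# Hypothetical sweep results: (learning_rate, l2_lambda) -> validation accuracy.
sweep_results = {
    (0.01,  0.0001): 0.82,
    (0.001, 0.0001): 0.79,
    (0.01,  0.01):   0.75,
    (0.001, 0.01):   0.77,
}

# Pick the configuration with the highest validation accuracy;
# the real toolkit then retrains with it for the final 50 epochs.
best_config = max(sweep_results, key=sweep_results.get)
best_lr, best_l2 = best_config
print(best_lr, best_l2)  # -> 0.01 0.0001
```

The failure mode in the question is that the short sweep runs can favor a large learning rate that later diverges over the full 50-epoch schedule, which this selection rule cannot detect.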
I tried to run MoCoV3 on your Evaluation Toolkit, but ran into a problem loading the model.
I only changed the parameter model_cfg from vitb16_CLIP to mocov3_vitb16 as suggested and didn't make any other changes, but it seems that the MoCo model cannot be loaded correctly.
Is MoCoV3 supported in the current version of the Evaluation Toolkit? If so, what should I do to load it correctly?
Hi authors, thanks for your great work and useful toolkit! As the title says, would you mind sharing some of the settings used in your experiments? Such as:
(Simply executing run.sh only involves one GPU and requires a long training time in my case.)
Looking forward to your reply :)
Hi, I wish to evaluate zero-shot learning on a new model. Taking clip_swin.py as an example, there are two parameters that are effectively uninitialized because they are created from torch.empty. trunc_normal_ is applied to them later on, but with the same seed I am receiving drastically different accuracy scores.
What do you recommend in my situation? The models I am trying are maxvit_tiny from torchvision pretrained weights and the pretrained vitb32_CLIP.
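One likely cause is that the RNG state at the moment trunc_normal_ runs differs between runs, even with the same top-level seed. The effect can be sketched with Python's standard random module standing in for torch's RNG (a toy illustration, not the toolkit's code):

```python
import random

def init_params(n):
    # Stand-in for parameters created via torch.empty + trunc_normal_:
    # their values depend entirely on the RNG state at initialization time.
    return [random.gauss(0.0, 0.02) for _ in range(n)]

# A single seed at program start is not enough if other random calls
# happen in between -- the RNG state has advanced by the second init:
random.seed(0)
run_a = init_params(4)
run_b = init_params(4)   # different values from run_a

# Seeding immediately before each initialization makes it reproducible:
random.seed(0); run_c = init_params(4)
random.seed(0); run_d = init_params(4)
assert run_c == run_d
```

In the torch setting the analogous move is to call torch.manual_seed right before the model is constructed, so the trunc_normal_ draws see an identical RNG state across runs.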
Thank you for creating this wonderful benchmark.
It seems that the following line of code tries to load the visual branch of the model.
However, when the model (e.g., a custom model) does not have a visual branch, an AttributeError would be raised.
This problem can be solved by modifying it into:
visual_backbone = model.visual if hasattr(model, 'visual') else model
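The proposed guard can be exercised with dummy stand-ins for the two model shapes (the class names here are hypothetical, chosen only to illustrate the fix):

```python
class DualBranchModel:
    """Stand-in for a CLIP-style model that has a visual branch."""
    def __init__(self):
        self.visual = "visual-encoder"

class VisionOnlyModel:
    """Stand-in for a custom model with no .visual attribute."""
    pass

def get_visual_backbone(model):
    # The proposed fix: use the visual branch when it exists,
    # otherwise fall back to the model itself.
    return model.visual if hasattr(model, 'visual') else model

clip_like = DualBranchModel()
custom = VisionOnlyModel()
assert get_visual_backbone(clip_like) == "visual-encoder"
assert get_visual_backbone(custom) is custom
```

With the original unguarded attribute access, the second call would raise an AttributeError.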
Hi, thanks for your great benchmark.
I wonder how the "Efficiency" score is calculated. Is the challenge ranked based on this score?
Thank you for your work and for releasing this benchmark. I am trying to verify zero-shot results and was going to just grep the log files, but they are not being created when I run run.sh. Do you have any pointers on this? Is there a better way to accumulate results over multiple runs?
Hi, first of all thank you for developing such a wonderful benchmark for zero-shot learning assessment.
I have submitted some predictions to EvalAI, and the evaluation has been running for more than 1 hour. Is the evaluation system working properly?
Thanks.
Currently, I have the following errors:
403:Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
It seems like we no longer have access to the datasets due to a wrong Authorization header or signature; can anyone check on this?
Can you please share how we can use the image classification few-shot learning for custom datasets?
Thank you for your solid work.
Does the repo implement the functionality of picking the model weights that perform best on the validation set and then evaluating them on the test set?
From the code below, it seems that the repo directly chooses the best result on the test set as the final result.
Elevater_Toolkit_IC/vision_benchmark/evaluation/full_model_finetune.py
Lines 267 to 277 in 00d0af7
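The protocol the question is asking about can be sketched as follows (a minimal pure-Python sketch with made-up per-epoch accuracies, not the toolkit's actual code):

```python
# Hypothetical per-epoch metrics: (val_accuracy, test_accuracy).
history = [
    (0.70, 0.68),
    (0.78, 0.74),   # best validation epoch
    (0.76, 0.77),   # best *test* epoch -- must not drive selection
]

# Correct protocol: select the checkpoint by validation accuracy,
# then report that checkpoint's test accuracy.
best_epoch = max(range(len(history)), key=lambda e: history[e][0])
reported_test_acc = history[best_epoch][1]
print(best_epoch, reported_test_acc)  # -> 1 0.74
```

Taking the maximum over the test column instead (0.77 here) would leak test-set information into model selection, which is the concern raised above.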
Adopting your notation in figure 5(d), you initialize the final linear layer W with the text embedding V, and you also keep the visual projection W_v. When you do LP, do you train both W and W_v, or just W? From your code, it seems that you turned off requires_grad for visual.proj, so I guess you only trained W.
In section 5.1, you reached the conclusion that "with the proposed language-init method, one can ensure that few-shot performance is always better than zero-shot". This is not precise, because if you only train W, the optimization problem is convex and the initialization should not matter (up to the optimizer's inductive bias, of course). So I suspect the reason LP beats zero-shot in figure 6 is the implicit regularization from the optimizer (for example, early stopping ~ L2 regularization). The CLIP paper used L-BFGS to solve LP, so initialization really didn't matter for them.
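The convexity point can be illustrated with a toy one-dimensional least-squares problem: run gradient descent to (near) convergence from two very different starting points and both reach essentially the same minimizer, so the initialization only shows through when training is stopped early. (A pure-Python toy, not the paper's setup.)

```python
def gradient_descent(w0, steps, lr=0.1):
    # Minimize the convex loss L(w) = (w - 3)**2; gradient is 2*(w - 3).
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return w

# Run to (near) convergence: both initializations reach the same minimizer.
assert abs(gradient_descent(0.0, 500) - gradient_descent(100.0, 500)) < 1e-9

# Stop early: the initialization still shows through
# (the "early stopping ~ regularization" effect mentioned above).
early_a = gradient_descent(0.0, 3)
early_b = gradient_descent(100.0, 3)
assert abs(early_a - early_b) > 1.0
```

Under early stopping, the solution stays biased toward its initialization, which is one mechanism by which language-init could keep few-shot LP above zero-shot even though the fully-optimized convex problem is initialization-independent.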
Currently I am having the following error:
requests.exceptions.HTTPError: 409 Client Error: Public access is not permitted on this storage account. for url: https://cvinthewildeus.blob.core.windows.net/datasets/classification/cifar_10_20211007/test.txt
It seems like we do not have access to the datasets anymore, can anyone check on this?
Hi team,
In the knowledge source for Imagenet-1K, I have noticed that for GPT definitions, there are 2 classes missing.
They are:
These classes are duplicated in Imagenet-1K, and seem to have been missed as a result (1000 classes down to 998). It is a small change and could be fixed quickly.