elevater_toolkit_ic's People

Contributors

chunyuanli, haotian-liu

elevater_toolkit_ic's Issues

Question about automatic hyper-parameter tuning toolkit

Hi, thanks for this great benchmark.

I have a question about the hyper-parameter tuning.
The training accuracy and validation accuracy are good at the hyper-parameter sweeping stage (see the attached screenshot), and the toolkit chooses "Learning rate 0.01, L2 lambda 0.0001" as the best configuration for the final 50 epochs.

However, the performance of the model with the selected hyper-parameters is extremely bad (see the attached screenshot).

Have you ever faced this problem? It mainly shows up on the dtd, fer2013, and resisc45 datasets, and usually occurs when a relatively large LR (like 0.01) is selected in the sweeping stage.

I don't think this problem comes from the gap between the validation set and the testing set, because you can see that the training accuracy is also bad during the final 50 epochs of training.
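
For readers unfamiliar with the tuning stage being described, here is a minimal sketch of how such a sweep might pick (learning rate, L2 lambda) by validation accuracy before the final long run; the grid values, epoch count, and train_eval_fn callback are hypothetical, not the toolkit's actual API:

import itertools

# Hypothetical sweep: short probe runs over a small grid, keep the (lr, wd)
# pair with the best validation accuracy, then retrain it for the full 50 epochs.
def sweep_and_select(train_eval_fn, lrs=(0.1, 0.01, 0.001), wds=(1e-4, 1e-6)):
    best = None
    for lr, wd in itertools.product(lrs, wds):
        val_acc = train_eval_fn(lr=lr, wd=wd, epochs=10)  # short sweep run
        if best is None or val_acc > best[0]:
            best = (val_acc, lr, wd)
    return best[1], best[2]  # selected learning rate and L2 lambda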

An error when running mocoV3

I tried to run MoCoV3 on your Evaluation Toolkit, but met a problem in loading the model.

  I only changed the parameter model_cfg from vitb16_CLIP to mocov3_vitb16 as suggested and didn't make any other changes, but it seems that the MoCo model cannot be loaded correctly.

  Is MoCo v3 supported in the current version of the Evaluation Toolkit? If so, what should I do to get it working?
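
Not an answer from the authors, but a hedged diagnostic sketch for this kind of loading failure: inspect the checkpoint keys and check whether they carry a prefix the loader does not expect. The file name and the 'module.base_encoder.' prefix below are assumptions about a typical MoCo v3 release checkpoint, not confirmed details of the toolkit:

import torch

# Hypothetical path to a MoCo v3 ViT-B/16 checkpoint.
ckpt = torch.load('vit-b-300ep.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# Print a few keys to see whether they carry an unexpected prefix.
print(list(state_dict.keys())[:5])

# Strip the assumed prefix so the keys line up with a plain ViT backbone.
cleaned = {k[len('module.base_encoder.'):]: v
           for k, v in state_dict.items()
           if k.startswith('module.base_encoder.')}
print(f'{len(cleaned)} backbone tensors after stripping the prefix')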

Some details about finetuning

Hi authors, thanks for your great work and useful toolkit! As the title says, would you mind sharing some of the settings in your experiments? For example:

  1. How long does it take to finetune one pre-trained model like CLIP-ViT-B/16 on 20 classification datasets?
  2. For experiments on a single dataset (e.g., cifar100), how many GPUs did you use?

(Simply executing run.sh only uses one GPU and results in a long training time in my case.)
Looking forward to your reply :)

Recommendation with Vision Projection and Text Projection for Zero-shot learning

Hi, I wish to evaluate zero-shot learning on a new model. Taking clip_swin.py as an example, there are two projection parameters that are effectively uninitialized because they are created with torch.empty; trunc_normal_ is applied later, but for the same seed I am receiving drastically different accuracy scores.
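
For what it's worth, a minimal sketch of making such a projection reproducible by seeding before the parameter is created and initialized; the dimensions are placeholders, not values from clip_swin.py:

import torch
import torch.nn as nn

torch.manual_seed(0)                      # seed before any parameter is created

embed_dim, proj_dim = 768, 512            # hypothetical dimensions
visual_proj = nn.Parameter(torch.empty(embed_dim, proj_dim))
nn.init.trunc_normal_(visual_proj, std=0.02)  # overwrite the uninitialized memory

# If accuracy still varies run to run with the same seed, the nondeterminism is
# likely elsewhere (data order, CUDA kernels), not in this initialization.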

What do you recommend in my situation? The model I am trying is maxvit_tiny with torchvision pretrained weights, paired with pretrained vitb32_CLIP.

Thank you for creating this wonderful benchmark.

AttributeError when custom model does not have a 'visual' branch

It seems that the following line of code tries to load the visual branch of the model.

visual_backbone = model.visual if model.visual is not None else model

However, when the model (e.g., a custom model) does not have a visual branch, an AttributeError would be raised.

This problem can be solved by modifying it into:

visual_backbone = model.visual if hasattr(model, 'visual') else model
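
A slightly more verbose alternative (not from the repo) that covers both the missing attribute and the None case would be:

# Fall back to the full model when there is no 'visual' branch or it is None.
visual_backbone = getattr(model, 'visual', None)
if visual_backbone is None:
    visual_backbone = model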

Why is imagenet-1k considered zero shot?

I thought ImageNet-21k is a superset of ImageNet-1k, as written in the ViT paper (screenshot attached).
If ImageNet-21k is allowed for pre-training, I assume the evaluation on ImageNet-1k cannot be considered zero-shot?

Log files not being created

Thank you for your work and for releasing this benchmark. I am trying to verify zero-shot results and was going to just grep the log files, but they are not being created when I run run.sh. Do you have any pointers on this? Is there a better way to accumulate results over multiple runs?

EvalAI taking too long

Hi, first of all thank you for developing such a wonderful benchmark for zero-shot learning assessment.

I have submitted some predictions to EvalAI, and the evaluation has been running for more than an hour. Is the evaluation system working properly?

Thanks.

Does the repo pick the weights that perform best in val dataset to evaluate in test dataset?

Thank you for your solid work.
Does the repo implement a function that picks the model weights that perform best on the val dataset to evaluate on the test dataset?
From the code below, it seems that the repo directly chooses the best results on the test dataset as the final results?

train_one(train_dataloader, model, criterion, optimizer, epoch, config)
# evaluate on validation set
acc1, logits = validate(test_dataloader, model, criterion, epoch, config, return_logits=True)
# remember best acc@1 and save checkpoint
if acc1 > best_acc1:
    model_info['best_logits'] = logits
best_acc1 = max(acc1, best_acc1)
logging.info(f'=> Learning rate {config.TRAIN.LR}, L2 lambda {config.TRAIN.WD}: Best score: Acc@1 {best_acc1:.3f}')
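
For comparison, here is a sketch of the pattern the question asks about: select the checkpoint on a held-out validation split and touch the test split only once at the end. It reuses the function names from the snippet above, but the loop wiring and the epoch count are hypothetical:

best_val_acc, best_state = 0.0, None
for epoch in range(num_epochs):  # hypothetical epoch count
    train_one(train_dataloader, model, criterion, optimizer, epoch, config)
    val_acc, _ = validate(val_dataloader, model, criterion, epoch, config, return_logits=True)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

# Evaluate on the test set exactly once, with the val-selected weights.
model.load_state_dict(best_state)
test_acc, test_logits = validate(test_dataloader, model, criterion, -1, config, return_logits=True)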

What parameters are trained in linear probe (LP) exactly?

Adopting your notation in Figure 5(d), you initialize the final linear layer W with the text embeddings V, and you also keep the visual projection W_v. When you do LP, do you train both W and W_v, or just W? From your code, it seems that you turned off requires_grad for visual.proj, so I guess you only trained W.
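
For concreteness, here is a minimal sketch of the language-init linear probe being discussed, where the classifier weight W is initialized from the text embeddings V and the visual projection W_v stays frozen; the dimensions and variable names are illustrative, not taken from the toolkit:

import torch
import torch.nn as nn

feat_dim, num_classes = 512, 100                      # hypothetical dimensions
text_embeddings = torch.randn(num_classes, feat_dim)  # stand-in for the CLIP text embeddings V

head = nn.Linear(feat_dim, num_classes, bias=False)
with torch.no_grad():
    head.weight.copy_(text_embeddings)                # language-init: W <- V

# Freeze the backbone and the visual projection W_v, so only W is trained
# (which makes the remaining problem a convex linear probe), e.g.:
# for p in visual_backbone.parameters():
#     p.requires_grad = False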

In Section 5.1, you reached the conclusion that "with the proposed language-init method, one can ensure that few-shot performance is always better than zero-shot". This is not precise, because if you only train W, the optimization problem becomes convex and the initialization should not matter (up to the optimizer's inductive bias). So I suspect the reason LP beats zero-shot in Figure 6 is the implicit regularization from the optimizer (for example, early stopping acts like L2 regularization). For the CLIP paper, they used L-BFGS to solve the LP, so initialization really didn't matter to them.

Missing classes in GPT-3 Knowledge source for Imagenet-1K

Hi team,
In the knowledge source for ImageNet-1K, I have noticed that two classes are missing from the GPT-3 definitions.
They are:

  1. Class Number: 837, Class Name: Sunglasses
  2. Class Number: 744, Class Name: Missile.

These class names are duplicated in ImageNet-1K, and seem to have been missed as a result (1000 entries down to 998). It is a small change and could be fixed quickly.
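
A quick way to spot such gaps, assuming the knowledge source is a JSON mapping from class name to its external definitions (the file name and field names here are hypothetical):

import json

# Hypothetical layout: {"class name": {"gpt3": ["definition", ...], ...}, ...}
with open('imagenet-1k_knowledge.json') as f:
    knowledge = json.load(f)

missing = [name for name, entry in knowledge.items() if not entry.get('gpt3')]
print(f'{len(knowledge)} classes loaded, {len(missing)} without GPT-3 definitions: {missing}')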
