
gqk / lae


A Unified Continual Learning Framework with General Parameter-Efficient Tuning, ICCV 2023 [PyTorch Code]

Home Page: https://arxiv.org/abs/2303.10070

License: Apache License 2.0

Languages: Python 99.69%, Makefile 0.31%

lae's People


Forkers

mldl whuhxb

lae's Issues

Adapter dimension

I appreciate your good work and thank you for sharing your excellent code. While going through it, I had a question. In vit_adapter.yaml, there is the following section:

extends:
  - ./base/cifar100_order1.yaml
module:
  model:
    backbone: ViT-B_16
  adapt_blocks: [0, 1, 2, 3, 4]
  pet_cls: Adapter
  pet_kwargs:
    down_sample: 5
    mode: parallel
    scale: null

Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a common adapter involves a dimension reduction like hidden_dim -> hidden_dim / 4 -> hidden_dim, whereas your code has the structure (768, 5), GELU(), (5, 768). Is there a specific reason for this setup, and why is the value specifically 5?
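For reference, this is a minimal sketch of how I read the adapter block (my own reconstruction with hypothetical names, not the repository code); bottleneck_dim here plays the role of down_sample, which would make 5 an absolute width rather than a fraction of the hidden size:

import torch.nn as nn

# Hypothetical reconstruction for discussion, not the actual LAE implementation.
class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 5):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # 768 -> 5
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # 5 -> 768

    def forward(self, x):
        # In "parallel" mode this output would be added to the host block's output,
        # optionally multiplied by `scale` (null in the config above).
        return self.up(self.act(self.down(x)))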

Question regarding CIFAR-100 training accuracy being lower than evaluation accuracy

I've noticed an intriguing phenomenon where the training accuracy is lower than the evaluation accuracy. This seems to deviate from the common trend where training accuracy usually surpasses evaluation accuracy.
[attached screenshot]

To provide some context, I have run the L2P code and observed that, as expected, the training accuracy for CIFAR-100 is higher than the evaluation accuracy. Moreover, when your LAE method is applied to the ImageNet-R dataset, it also follows the usual pattern of higher training accuracy. This leads me to wonder whether there is a specific reason behind the different behavior observed with CIFAR-100 in your implementation.
[attached screenshot]
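In case it helps narrow things down, the sketch below shows how I am reading the two numbers (a hypothetical snippet, not your code). If the training accuracy is a running average over batches within an epoch while the evaluation accuracy is computed once with the final weights, the two are not directly comparable in principle, though I am not sure whether this matches your implementation:

import torch

@torch.no_grad()
def accuracy(logits, targets):
    return (logits.argmax(dim=-1) == targets).float().mean().item()

def train_one_epoch(model, loader, optimizer, criterion):
    # Accuracy is accumulated while the weights are still changing,
    # so early, low-accuracy batches drag the epoch average down.
    model.train()
    running_acc, n_batches = 0.0, 0
    for images, targets in loader:
        logits = model(images)
        loss = criterion(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_acc += accuracy(logits, targets)
        n_batches += 1
    return running_acc / n_batches  # reported "training accuracy"

@torch.no_grad()
def evaluate(model, loader):
    # Measured once, after the epoch, with the final weights.
    model.eval()
    accs = [accuracy(model(images), targets) for images, targets in loader]
    return sum(accs) / len(accs)  # reported "evaluation accuracy"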

I am curious to understand more about this and would greatly appreciate any insights or explanations you could provide. Understanding the nuances of your approach would be incredibly beneficial for my ongoing research and experiments.

Thank you very much for your time and consideration. I am looking forward to your response.

Question about the naive baselines

Hi authors,
Congratulations on your great work! I have a few questions about your paper. It would be great if you could kindly answer them!

In your paper, Tab. 1 and Tab. 2, the baselines are extremely high, i.e., comparable to or even better than L2P and DualPrompt.
Also in Tab. 1 and Tab. 2, the Seq-FT results are much higher than the numbers reported in the L2P paper: for the CIFAR-100 dataset, L2P reports 33.61% for Seq-FT while you report 77.61%.
Could you explain why this happens? Did you use the task identity during inference? I.e., for the test set of each task, did you filter out the logits of the other tasks?

Also, is my understanding of the naive baseline correct? For CIFAR-100, you insert some new parameters into the pretrained model, i.e., 20 prompts and a 100-class classifier. Then you train the new parameters sequentially on each task. During inference, given a test image, you predict the category from the 100-class classifier without any other information (i.e., no task class mask). Is this right? I sketch below what I mean by inference with and without a task mask.
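A minimal sketch of the two inference variants I am asking about (hypothetical helper names, not your code); my question is essentially whether the naive baseline and Seq-FT numbers use the unmasked variant:

import torch

def predict_without_task_mask(logits):
    # Class-incremental setting: argmax over all 100 classes, no task identity used.
    return logits.argmax(dim=-1)

def predict_with_task_mask(logits, task_id, classes_per_task=10):
    # Task-incremental setting: logits of all other tasks are filtered out first.
    start = task_id * classes_per_task
    masked = torch.full_like(logits, float("-inf"))
    masked[:, start:start + classes_per_task] = logits[:, start:start + classes_per_task]
    return masked.argmax(dim=-1)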

Thanks!

Question regarding Joint-FT

Hello Authors,

Thank you for releasing the code. Could I ask how you implemented the Joint-FT experiment? More specifically:

  • Which codebase/configs were used?
  • What training settings were used for this experiment (learning rate, epochs, learning-rate schedule, etc.)?

Best,
Jinhyung Park
