gqk / lae
A Unified Continual Learning Framework with General Parameter-Efficient Tuning, ICCV 2023 [PyTorch Code]
Home Page: https://arxiv.org/abs/2303.10070
License: Apache License 2.0
I appreciate your good work and thank you for sharing your excellent code. While going through the code, I had a question about vit_adapter.yaml, which contains the following section:
extends:
  - ./base/cifar100_order1.yaml
module:
  model:
    backbone: ViT-B_16
    adapt_blocks: [0, 1, 2, 3, 4]
    pet_cls: Adapter
    pet_kwargs:
      down_sample: 5
      mode: parallel
      scale: null
Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a typical Adapter reduces the dimension as hidden_dim -> hidden_dim / 4 -> hidden_dim, whereas your code has the structure (768, 5), GELU(), (5, 768). Is there a specific reason for this setup, and why is the value specifically 5?
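For reference, here is a minimal sketch of what such a parallel bottleneck Adapter looks like if down_sample is read as the absolute bottleneck width (768 -> 5 -> 768). The class and argument names mirror the yaml above; the module itself is illustrative, not the repo's exact implementation, and the residual-add below is one common way to wire a parallel adapter.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter sketch: down-project, nonlinearity, up-project.

    down_sample is treated as an absolute hidden width, matching the
    (768, 5), GELU(), (5, 768) structure described above.
    """
    def __init__(self, hidden_dim=768, down_sample=5, scale=None):
        super().__init__()
        self.down = nn.Linear(hidden_dim, down_sample)  # 768 -> 5
        self.act = nn.GELU()
        self.up = nn.Linear(down_sample, hidden_dim)    # 5 -> 768
        self.scale = scale                              # scale: null -> no scaling

    def forward(self, x):
        h = self.up(self.act(self.down(x)))
        if self.scale is not None:
            h = h * self.scale
        # In parallel mode the adapter branch is added to the block's output.
        return x + h
```

With down_sample = 5, the adapter adds only 768*5 + 5 + 5*768 + 768 = 8453 parameters per block, which is why an absolute (rather than ratio-based) bottleneck keeps the tuned parameter count tiny.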
I've noticed an intriguing phenomenon where the training accuracy is lower than the evaluation accuracy. This seems to deviate from the common trend where training accuracy usually surpasses evaluation accuracy.
To provide some context, I have run the L2P code and observed that, as expected, the training accuracy for CIFAR-100 is higher than the evaluation accuracy. However, with your LAE method applied to the ImageNet-R dataset, it aligns well with the usual pattern of higher training accuracy. This leads me to wonder if there might be a specific reason behind the different behavior observed with CIFAR-100 in your implementation.
I am curious to understand more about this and would greatly appreciate any insights or explanations you could provide. Understanding the nuances of your approach would be incredibly beneficial for my ongoing research and experiments.
Thank you very much for your time and consideration. I am looking forward to your response.
Hello,
Thanks for sharing the code and congratulations on your publication. I have been trying to reproduce your results on CIFAR-100, but I am not getting the reported average accuracy of 89.96. Here is the config file: https://pastebin.com/FzdDxBD7
Here is the log file: https://pastebin.com/qFMv3kW4
Looking forward to hearing from you :)
Hi authors,
Congratulations on your great work! I have a few questions about your paper. It would be great if you could kindly answer them!
In Tab. 1 and Tab. 2 of your paper, the baselines are extremely high, i.e., comparable to or even better than L2P and DualPrompt.
Also in Tab. 1 and Tab. 2, the results of Seq-FT are much higher than the numbers reported in the L2P paper. For the CIFAR-100 dataset, L2P reported 33.61% for Seq-FT, while you reported 77.61%.
Could you explain why this happened? Did you use the task identity during inference? I.e., for the test set in each task, did you filter out the logits of other tasks?
Also, is my understanding of the naive baseline correct? For CIFAR-100, you insert some new parameters into the pretrained model, i.e., 20 prompts and a 100-class classifier. Then you train the new parameters sequentially on each task. During inference, given a test image, you predict the category from the 100-class classifier without any other information (e.g., a task class mask). Is this right?
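To make the question concrete, this is a sketch of what "filtering out the logits of other tasks" at inference would look like if the task identity were known. The 10-tasks-of-10-classes layout for CIFAR-100 is an assumption for illustration, not taken from the paper.

```python
import torch

def masked_predict(logits, task_id, classes_per_task=10):
    """Predict within a known task: logits of all other tasks are set to -inf.

    logits: (batch, num_classes) tensor from the shared 100-class classifier.
    task_id: index of the ground-truth task (assumed known at test time).
    """
    masked = torch.full_like(logits, float("-inf"))
    lo = task_id * classes_per_task
    hi = lo + classes_per_task
    masked[:, lo:hi] = logits[:, lo:hi]  # keep only this task's class logits
    return masked.argmax(dim=1)
```

If a method uses this mask, Seq-FT numbers rise sharply because inter-task confusion is removed, which could account for the gap between 33.61% and 77.61% mentioned above; without it, the prediction is a plain argmax over all 100 logits.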
Thanks!
Hello Authors,
Thank you for releasing the code. Could I ask how you implemented the Joint-FT experiment? More specifically:
Best,
Jinhyung Park