gqk / lae
A Unified Continual Learning Framework with General Parameter-Efficient Tuning, ICCV 2023 [PyTorch Code]
Home Page: https://arxiv.org/abs/2303.10070
License: Apache License 2.0
I appreciate your good work and thank you for sharing your excellent code. While going through the code, I had a question about vit_adapter.yaml, which contains the following section:
extends:
  - ./base/cifar100_order1.yaml
module:
  model:
    backbone: ViT-B_16
    adapt_blocks: [0, 1, 2, 3, 4]
    pet_cls: Adapter
    pet_kwargs:
      down_sample: 5
      mode: parallel
      scale: null
Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a typical Adapter reduces the dimension as hidden_dim -> hidden_dim / 4 -> hidden_dim, whereas your code has the structure (768, 5), GELU(), (5, 768). Is there a specific reason for this setup, and why is the value specifically 5?
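For reference, here is a minimal sketch of what such a parallel bottleneck Adapter looks like if down_sample is read as the absolute bottleneck width (768 -> 5 -> 768). The class and argument names mirror the yaml above; the module itself is illustrative, not the repo's exact implementation, and the residual-add below is one common way to wire a parallel adapter.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter sketch: down-project, nonlinearity, up-project.

    down_sample is treated as an absolute hidden width, matching the
    (768, 5), GELU(), (5, 768) structure described above.
    """
    def __init__(self, hidden_dim=768, down_sample=5, scale=None):
        super().__init__()
        self.down = nn.Linear(hidden_dim, down_sample)  # 768 -> 5
        self.act = nn.GELU()
        self.up = nn.Linear(down_sample, hidden_dim)    # 5 -> 768
        self.scale = scale                              # scale: null -> no scaling

    def forward(self, x):
        h = self.up(self.act(self.down(x)))
        if self.scale is not None:
            h = h * self.scale
        # In parallel mode the adapter branch is added to the block's output.
        return x + h
```

With down_sample = 5, the adapter adds only 768*5 + 5 + 5*768 + 768 = 8453 parameters per block, which is why an absolute (rather than ratio-based) bottleneck keeps the tuned parameter count tiny.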
I've noticed an intriguing phenomenon where the training accuracy is lower than the evaluation accuracy. This seems to deviate from the common trend where training accuracy usually surpasses evaluation accuracy.
To provide some context, I have run the L2P code and observed that, as expected, the training accuracy for CIFAR-100 is higher than the evaluation accuracy. However, with your LAE method applied to the ImageNet-R dataset, it aligns well with the usual pattern of higher training accuracy. This leads me to wonder if there might be a specific reason behind the different behavior observed with CIFAR-100 in your implementation.
I am curious to understand more about this and would greatly appreciate any insights or explanations you could provide. Understanding the nuances of your approach would be incredibly beneficial for my ongoing research and experiments.
Thank you very much for your time and consideration. I am looking forward to your response.
Hello,
Thanks for sharing the code and congratulations on your publication. I have been trying to reproduce your results on CIFAR-100, but I am not getting the reported average accuracy of 89.96. Here is the config file: https://pastebin.com/FzdDxBD7
Here is the log file: https://pastebin.com/qFMv3kW4
Looking forward to hearing from you :)
Hi authors,
Congratulations on your great work! I have a few questions about your paper. It would be great if you could kindly answer them!
In Tab. 1 and Tab. 2 of your paper, the baselines are extremely high, i.e., comparable to or even better than L2P and DualPrompt.
Also in Tab. 1 and Tab. 2, the results of Seq-FT are much higher than the numbers reported in the L2P paper. For the CIFAR-100 dataset, L2P reported 33.61% for Seq-FT, while you reported 77.61%.
Could you explain why this happened? Did you use the task identity during inference? I.e., for the test set in each task, did you filter out the logits of other tasks?
Also, is my understanding of the naive baseline correct? For CIFAR-100, you insert some new parameters into the pretrained model, i.e., 20 prompts and a 100-class classifier. Then you train the new parameters sequentially on each task. During inference, given a test image, you predict the category from the 100-class classifier without any other information (e.g., a task class mask). Is this right?
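To make the question concrete, this is a sketch of what "filtering out the logits of other tasks" at inference would look like if the task identity were known. The 10-tasks-of-10-classes layout for CIFAR-100 is an assumption for illustration, not taken from the paper.

```python
import torch

def masked_predict(logits, task_id, classes_per_task=10):
    """Predict within a known task: logits of all other tasks are set to -inf.

    logits: (batch, num_classes) tensor from the shared 100-class classifier.
    task_id: index of the ground-truth task (assumed known at test time).
    """
    masked = torch.full_like(logits, float("-inf"))
    lo = task_id * classes_per_task
    hi = lo + classes_per_task
    masked[:, lo:hi] = logits[:, lo:hi]  # keep only this task's class logits
    return masked.argmax(dim=1)
```

If a method uses this mask, Seq-FT numbers rise sharply because inter-task confusion is removed, which could account for the gap between 33.61% and 77.61% mentioned above; without it, the prediction is a plain argmax over all 100 logits.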
Thanks!
Hello Authors,
Thank you for releasing the code. Could I ask how you implemented the Joint-FT experiment? More specifically:
Best,
Jinhyung Park