Mona

The official implementation of "Adapter is All You Need for Tuning Visual Tasks".

Introduction

Pre-training & fine-tuning can enhance transfer efficiency and performance on visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning methods fail to exceed the upper limit of full fine-tuning on challenging tasks such as instance segmentation and semantic segmentation. To find a competitive alternative to full fine-tuning, we propose Multi-cognitive Visual Adapter (Mona) tuning, a novel adapter-based tuning method.
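As background for how adapter tuning works, here is a minimal sketch of a generic residual bottleneck adapter in PyTorch. It illustrates the general pattern only; the exact Mona module, with its multi-cognitive design, is defined in the paper and this codebase.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter (illustrative, not the exact Mona design)."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)  # project to a small bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # project back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual update: the frozen backbone feature x is adjusted by a small
        # trainable branch, so only the adapter parameters receive gradients.
        return x + self.up(self.act(self.down(self.norm(x))))

During tuning, the backbone weights are frozen and only small modules like this are trained, which is what keeps the trainable parameter count low.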

[Figure: Mona architecture]

Mona achieves strong performance on COCO object detection (53.4 box AP and 46.0 mask AP on test-dev with Swin-Base) and ADE20K semantic segmentation (51.36 mIoU on val with Swin-Large).

Main Results

The proposed Mona outperforms full fine-tuning on representative visual tasks, raising the upper limit set by previous delta-tuning methods. The results demonstrate that the adapter-tuning paradigm can replace full fine-tuning and achieve better performance on most visual tasks. Full fine-tuning may no longer be the only preferred solution for transfer learning in the future.

[Figure: performance comparison]

Note:

  • We report results with the Cascade Mask R-CNN (Swin-Base) and UperNet (Swin-Large) frameworks for COCO and ADE20K, respectively.
  • The pre-trained weights are ImageNet-22K supervised pre-trained Swin-Base and Swin-Large.

Moreover, Mona converges faster than the other delta-tuning methods we tested.

[Figure: convergence comparison]

Note:

  • We obtain the loss curves on the VOC dataset with RetinaNet using a Swin-Large backbone.

Getting Started

Object Detection & Instance Segmentation

Installation

Please refer to Swin-Transformer-Object-Detection for environment setup and dataset preparation.

Training Mona

After organizing the dataset, modify the config file to match your environment:

  • data_root must be set to the actual dataset path.
  • load_from should be set to your pre-trained weight path.
  • norm_cfg must be set to SyncBN if you train the model with multiple GPUs.
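For example, a minimal override might look like the following (the paths are hypothetical; the keys follow the mmdetection-style configs shipped in mona_configs):

# my_coco_config.py -- hypothetical override of a shipped Mona config
_base_ = 'mona_configs/swin-b_coco/cascade_mask_swin_base_3x_coco_sample_1_bs_16_mona.py'

data_root = '/path/to/coco/'                        # actual dataset path
load_from = '/path/to/pretrained_swin_base.pth'     # pre-trained weight path
norm_cfg = dict(type='SyncBN', requires_grad=True)  # needed for multi-GPU training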

Please execute the following command in the project path.

COCO

bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-b_coco/cascade_mask_swin_base_3x_coco_sample_1_bs_16_mona.py `Your GPUs`

VOC

bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-l_voc/voc_retinanet_swin_large_1x_mona.py `Your GPUs`

Semantic Segmentation

Installation

Please refer to Swin-Transformer-Semantic-Segmentation for environment setup and dataset preparation.

Training Mona

Follow the guidance in Object Detection & Instance Segmentation to check your config file.

Please execute the following command in the project path.

ADE20K

bash Swin-Transformer-Semantic-Segmentation/tools/dist_train.sh Swin-Transformer-Semantic-Segmentation/mona_configs/swin-l_ade20k/ade20k_upernet_swin_large_160k_mona.py `Your GPUs`

Classification

Installation

Please refer to Swin-Transformer-Classification for environment setup.

Note:

  • We reorganize the dataset format to match the requirements of mmclassification.
  • You can use the following layout:
mmclassification
└── data
    └── my_dataset
        ├── meta
        │   ├── train.txt
        │   ├── val.txt
        │   └── test.txt
        ├── train
        ├── val
        └── test
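Each file under meta/ is assumed to follow mmclassification's plain-text annotation format: one sample per line, an image path relative to the split folder followed by a class index. A hypothetical train.txt:

class_0/img_001.jpg 0
class_1/img_042.jpg 1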

Training Mona

Follow the guidance in Object Detection & Instance Segmentation to check your config file.

Please execute the following command in the project path.

Oxford Flower

bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_oxford_flower_mona.py `Your GPUs`

Oxford Pet

bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_oxford_pet_mona.py `Your GPUs`

VOC

bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_voc_mona.py `Your GPUs`

Citation

If our work is helpful for your research, please cite:


@misc{yin2023adapter,
      title={Adapter is All You Need for Tuning Visual Tasks}, 
      author={Dongshuo Yin and Leiyi Hu and Bin Li and Youqun Zhang},
      year={2023},
      eprint={2311.15010},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We are grateful to the following wonderful open-source repositories, among others.


Issues

Regarding the baseline performance and hyperparameters

Hello Authors,

Thanks for your work and this codebase. I had some questions regarding your implementation and the hyperparameters of baselines like LoRA.
Specifically:

  1. What is the rank in LoRA? Which layers have a LoRA branch (attention or MLP)? Does adding LoRA to all linear layers improve performance (as shown in recent works)? Were any ablations done on basic LoRA for visual tasks?
  2. Since LoRA is completely re-parameterizable post-tuning, it should be included in the NO extra structure section (see the merge sketch below).
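For context on point 2, the post-tuning merge folds the low-rank update back into the frozen weight, so inference uses a single dense layer. A minimal sketch (generic LoRA, not this repo's implementation):

import torch

def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, scale: float) -> torch.Tensor:
    """Fold a LoRA branch into the base weight: W' = W + scale * (B @ A).

    Shapes: W is (out, in), A is (r, in), B is (out, r).
    """
    return W + scale * (B @ A)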

I am planning on extending your work to much smaller models, and hence any insights would be really appreciated.

Regards,
Arnav

How to calculate the trainable params in Mona?

Hi authors, it seems that the trainable params in Mona are hard to calculate. We have tried torchsummary, but it didn't work.
For example, when we use torchsummary on Swin Transformer Tiny, the result is:

Total params: 28,265,032
Trainable params: 28,265,032
Non-trainable params: 0
Input size (MB): 0.57
Forward/backward pass size (MB): 252.99
Params size (MB): 107.82
Estimated Total Size (MB): 361.39

We hope Mona can add a similar function for precise comparison. Thanks!
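A common workaround (not part of this repo) is to count parameters directly from requires_grad flags, which is how delta-tuning methods typically report trainable parameters:

import torch.nn as nn

def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts based on requires_grad."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

Note that torchsummary also keys its trainable count off requires_grad, so the all-trainable output above may indicate the backbone was not frozen when the summary was taken.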
