ViT-MVT: A Unified Vision Transformer Network for Multiple Vision Tasks

This is an official implementation of ViT-MVT.

ViT-MVT is a consolidated network that integrates numerous visual tasks into a single, manageable supernet and optimizes them concurrently across different dataset domains. For the design of ViT-MVT, we augment the vision transformer with a multi-head se-attention (MHSE) to offer complementary cues in the channel and spatial dimensions, as well as a local perception unit (LPU) and a locality feed-forward network (locality FFN) for information exchange within local regions, thus endowing ViT-MVT with the ability to optimize multiple tasks effectively. Besides, we construct a search space comprising potential architectures with a broad spectrum of model sizes to offer optimal candidates for diverse tasks. After that, we design a layer-adaptive sharing technique that automatically determines, for each layer of the transformer block, whether it is shared across all tasks; this lets ViT-MVT learn task-shared parameters to reduce storage and task-specific parameters to capture task-related features, thereby boosting performance. Finally, we introduce a joint-task evolutionary search algorithm to discover an optimal backbone for all tasks under a total model-size constraint, which challenges the conventional wisdom that visual tasks are typically supplied with backbone networks designed for image classification.

[Figure: ViT-MVT pipeline]
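
To make the layer-adaptive sharing idea concrete, here is a minimal PyTorch sketch. It is an illustration of the mechanism described above, not the released implementation; the class and argument names (SharedOrTaskLayer, make_block, num_tasks) are our own. Each transformer layer carries one shared block, one block per task, and a learnable score that decides, per task, whether the shared path is used.

import torch
import torch.nn as nn

class SharedOrTaskLayer(nn.Module):
    """Hypothetical layer-adaptive sharing: one shared block plus one
    block per task, gated by a learnable per-task sharing score."""

    def __init__(self, num_tasks, make_block):
        super().__init__()
        self.shared = make_block()                         # task-shared parameters
        self.task_specific = nn.ModuleList(
            [make_block() for _ in range(num_tasks)])      # task-specific parameters
        self.score = nn.Parameter(torch.zeros(num_tasks))  # learnable sharing score

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.score[task_id])
        # Soft mix while training the supernet; after search, each layer
        # is binarized to either the shared or the task-specific block.
        return gate * self.shared(x) + (1 - gate) * self.task_specific[task_id](x)

# Toy usage: three tasks sharing (or not) a linear block.
layer = SharedOrTaskLayer(num_tasks=3, make_block=lambda: nn.Linear(384, 384))
out = layer(torch.randn(8, 384), task_id=1)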

Highlights

  • Once-for-all

ViT-MVT is a simple yet effective method to train a vision transformer supernet for multiple vision tasks.

  • Competitive performance

ViT-MVT consistently outperforms the baseline AutoFormer and other state-of-the-art methods.

[Figure: results comparison]

Environment Setup

To set up the environment, you can simply run the following commands:

  • Create environment
conda create -n Vitmvt python=3.7
conda activate Vitmvt
  • Install PyTorch. We verified ViT-MVT with PyTorch 1.10.1 and CUDA 11.3.
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
  • Install openmmlab packages.
pip install mmcv-full==1.5.0
pip install mmcls==0.23.1
pip install mmdet==2.24.1
pip install mmsegmentation==0.24.1
  • Install ViT-MVT
cd vitmvt 
pip install -e .

Data Preparation

First, download ImageNet-2012 to the folder ./data/imagenet, COCO-2017 to the folder ./data/coco, and ADE20K to the folder ./data/ade20k.

Model Zoo

For evaluation, we provide the checkpoints of our models in Google Drive.

After downloading the models, you can run the evaluation following the description in Quick Start - Test.

Model download links:

Model      Total Params.  Top-1 Acc. (%)  mAP (%)  mIoU (%)  Download link
ViT-MVT-T  86.3M          78.7            44.5     45.56     Google Drive
ViT-MVT-S  135.3M         82.4            47.3     49.42     Google Drive
ViT-MVT-B  239.6M         83.2            49.4     50.46     Google Drive
  • Note: We do not integrate these models into a single one. You can merge them manually according to the learnable score of each layer to reduce storage cost, as sketched below.
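
A minimal merging sketch, assuming per-task checkpoints whose keys follow the layout of the SharedOrTaskLayer sketch above (e.g. 'layers.3.score', 'layers.3.shared.*', 'layers.3.task_specific.*'). All key names and the thresholding rule are assumptions for illustration, not the released checkpoint format.

import torch

def merge_checkpoints(task_ckpts, num_layers, threshold=0.5):
    """task_ckpts: {task_name: state_dict}. Returns one combined state_dict
    that stores shared layers once and task-specific layers per task."""
    ref = next(iter(task_ckpts.values()))
    merged = {}
    for i in range(num_layers):
        score = torch.sigmoid(ref[f'layers.{i}.score'])
        if score.mean() > threshold:
            # Shared layer: keep a single copy of its weights.
            prefix = f'layers.{i}.shared.'
            merged.update({k: v for k, v in ref.items() if k.startswith(prefix)})
        else:
            # Task-specific layer: keep each task's copy under a task prefix.
            prefix = f'layers.{i}.task_specific.'
            for task, sd in task_ckpts.items():
                merged.update({f'{task}.{k}': v
                               for k, v in sd.items() if k.startswith(prefix)})
    return merged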

Quick Start

We provide the test code of ViT-MVT as follows:

Test

To test our trained models, put the downloaded model at /PATH/TO/CHECKPOINT. After that, you can use the following commands to test the model (please change the config file and model checkpoint according to the model; here we use ViT-MVT-T as an example).

  • Classification task
cd vitmvt
python tools/test.py configs/nas/vit_mvt_tiny/task_cls.py /PATH/TO/CHECKPOINT --work-dir $WORK_DIR --eval accuracy
  • Detection task
cd vitmvt
python tools/test.py configs/nas/vit_mvt_tiny/task_det.py /PATH/TO/CHECKPOINT --work-dir $WORK_DIR --eval bbox
  • Segmentation task
cd vitmvt
python tools/test.py configs/nas/vit_mvt_tiny/task_seg.py /PATH/TO/CHECKPOINT --work-dir $WORK_DIR --eval mIoU

Visualization

We show some visualizations of ViT-MVT-B for object detection and semantic segmentation:

  • Object detection
  • Semantic segmentation

Demo

We provide two visualization demos: instance segmentation on the COCO dataset and semantic segmentation on ADE20K. You can run them with a trained ViT-MVT model, for example as sketched below.
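
A minimal demo sketch using the standard mmdet 2.x and mmseg 0.x inference APIs (the packages installed above), assuming the ViT-MVT configs are compatible with them. The config paths come from the Test section; the checkpoint and image paths are placeholders you need to fill in.

from mmdet.apis import inference_detector, init_detector
from mmseg.apis import inference_segmentor, init_segmentor

# Detection/instance segmentation on a COCO image.
det_model = init_detector('configs/nas/vit_mvt_tiny/task_det.py',
                          '/PATH/TO/CHECKPOINT', device='cuda:0')
det_result = inference_detector(det_model, '/PATH/TO/COCO_IMAGE.jpg')
det_model.show_result('/PATH/TO/COCO_IMAGE.jpg', det_result, out_file='det_demo.jpg')

# Semantic segmentation on an ADE20K image.
seg_model = init_segmentor('configs/nas/vit_mvt_tiny/task_seg.py',
                           '/PATH/TO/CHECKPOINT', device='cuda:0')
seg_result = inference_segmentor(seg_model, '/PATH/TO/ADE20K_IMAGE.jpg')
seg_model.show_result('/PATH/TO/ADE20K_IMAGE.jpg', seg_result, out_file='seg_demo.jpg')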

Additional Notes

  • We do not plan to release the ViT-MVT training code because it is tied to proprietary enterprise source code.
