This project is forked from czczup/vit-adapter.

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

Home Page: https://arxiv.org/abs/2205.08534

License: Apache License 2.0

Shell 0.33% C++ 0.37% Python 95.55% Cuda 3.75%

ViT-Adapter

The official implementation of the paper "Vision Transformer Adapter for Dense Predictions".

Paper | Blog in Chinese | Slides | Colab Notebook (thanks @IamShubhamGupto, @dudifrid)

News

  • 2023/04/14: 🚀 ViT-Adapter is used in EVA and DINOv2!
  • 2023/01/21: Our paper was accepted by ICLR 2023!
  • 2023/01/17: We won first place in the WSDM Cup 2023 Toloka VQA Challenge using ViT-Adapter.
  • 2022/10/20: ViT-Adapter was adopted by Zhang et al., who ranked 1st in the UVO Challenge 2022.
  • 2022/08/22: ViT-Adapter was adopted by BEiT-3, setting a new SOTA of 62.8 mIoU on ADE20K.
  • 2022/06/09: ViT-Adapter-L achieves 60.4 box AP and 52.5 mask AP on COCO test-dev without Objects365.
  • 2022/06/04: Code and models are released.
  • 2022/05/12: ViT-Adapter-L reaches 85.2 mIoU on Cityscapes test set without coarse data.
  • 2022/05/05: ViT-Adapter-L achieves the SOTA on ADE20K val set with 60.5 mIoU!

Highlights

  • ViT-Adapter supports various dense prediction tasks, including object detection, instance segmentation, semantic segmentation, visual grounding, panoptic segmentation, etc.
  • This codebase includes many SOTA detectors and segmenters to achieve top performance, such as HTC++, Mask2Former, and DINO.

[Video: results.mp4]

Abstract

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT). Unlike recently advanced variants that incorporate vision-specific inductive biases into their architectures, the plain ViT suffers inferior performance on dense predictions due to weak prior assumptions. To address this issue, we propose the ViT-Adapter, which allows plain ViT to achieve comparable performance to vision-specific transformers. Specifically, the backbone in our framework is a plain ViT that can learn powerful representations from large-scale multi-modal data. When transferring to downstream tasks, a pre-training-free adapter is used to introduce the image-related inductive biases into the model, making it suitable for these tasks. We verify ViT-Adapter on multiple dense prediction tasks, including object detection, instance segmentation, and semantic segmentation. Notably, without using extra detection data, our ViT-Adapter-L yields state-of-the-art 60.9 box AP and 53.0 mask AP on COCO test-dev. We hope that the ViT-Adapter could serve as an alternative for vision-specific transformers and facilitate future research. The code and models will be released.
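The general idea of attaching a side branch to a plain backbone can be illustrated with a toy 1-D example. This is only a minimal sketch under stated assumptions, not the ViT-Adapter implementation: the function names (`patch_tokens`, `local_prior`, `inject`), the fixed edge-detecting kernel, and the scalar `gate` are all hypothetical and chosen for illustration. It shows non-overlapping patch tokens (which carry little locality prior) being combined with features from a sliding-window filter (which encodes a locality prior) through a residual addition.

```python
# Toy 1-D sketch. Every name below is hypothetical and is NOT taken from
# the ViT-Adapter codebase; the real adapter design is described in the paper.

def patch_tokens(signal, patch_size):
    """Plain tokenization: average each non-overlapping patch into one token."""
    return [
        sum(signal[i:i + patch_size]) / patch_size
        for i in range(0, len(signal), patch_size)
    ]


def local_prior(signal, patch_size, kernel=(-1.0, 0.0, 1.0)):
    """Side branch: apply a sliding-window filter (overlapping neighborhoods),
    then pool the result per patch so it aligns with the tokens."""
    half = len(kernel) // 2
    # Replicate the edge values so the filtered output keeps the input length.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    filtered = [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]
    return patch_tokens(filtered, patch_size)


def inject(tokens, prior, gate):
    """Residual injection: tokens + gate * prior.
    With gate = 0 the backbone tokens pass through unchanged (an assumed design)."""
    return [t + gate * p for t, p in zip(tokens, prior)]


signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # a step edge at the patch border
tokens = patch_tokens(signal, patch_size=4)
prior = local_prior(signal, patch_size=4)
adapted = inject(tokens, prior, gate=0.5)

print(tokens)   # [0.0, 1.0]
print(prior)    # [0.25, 0.25]
print(adapted)  # [0.125, 1.125]
```

In this toy setup, the patch tokens alone do not record that the step edge sits exactly on the patch border, while the filtered branch responds to it in both neighboring patches. Setting `gate` to 0 returns the original tokens, so the side branch can be attached without altering what the backbone already computes.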

Method

[Two figures illustrating the method]

Catalog

  • Support flash attention
  • Support faster deformable attention
  • Segmentation checkpoints
  • Segmentation code
  • Detection checkpoints
  • Detection code
  • Initialization

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{chen2022vitadapter,
  title={Vision Transformer Adapter for Dense Predictions},
  author={Chen, Zhe and Duan, Yuchen and Wang, Wenhai and He, Junjun and Lu, Tong and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2205.08534},
  year={2022}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributors

czczup, duanduanduanyuchen, vvvb-github
