[ECCV 2022] unofficial pytorch implementation of the paper "MaxViT: Multi-Axis Vision Transformer"

Home Page: https://arxiv.org/pdf/2204.01697.pdf

MaxViT (PyTorch version)

This repo contains an unofficial PyTorch implementation of the MaxViT model, together with training and validation code. Its purpose is to share training hyper-parameters that work for the PyTorch version of MaxViT. To that end, we copy the training hyper-parameters from Table 12 of the original paper, changing only the number of GPUs (we use 4). Since most of the code, including the model, training, and validation scripts, is copied from the timm GitHub repository, credit should go to @rwightman and the original authors. See also their repos:

Tutorial

Test environments: torch==1.11.0 & timm==0.9.2

  1. Clone this repo

    git clone https://github.com/hankyul2/maxvit-pytorch
    cd maxvit-pytorch
  2. Run the following command to train MaxViT-T on the ImageNet-1k dataset. For other model variants, change --drop-path to 0.3 (small) or 0.4 (base), in addition to selecting the corresponding --model. To train with 4 GPUs, we use 16 gradient-accumulation steps: 16 = 4096 (paper total batch) / 256 (our total batch per pass, i.e. 64 per GPU x 4 GPUs).

    Training time: about 5 days for the maxvit_tiny_tf_224 model with 4 GPUs (RTX 3090, 24GB).

    torchrun --nproc_per_node=4 --master_port=12345 train.py /path/to/imagenet --model maxvit_tiny_tf_224 --aa rand-m15-mstd0.5-inc1 --mixup .8 --cutmix 1.0 --remode pixel --reprob 0.25 --drop-path .2 --opt adamw --weight-decay .05 --sched cosine --epochs 300 --lr 3e-3 --warmup-lr 1e-6 --warmup-epoch 30 --min-lr 1e-5 -b 64 -tb 4096 --smoothing 0.1 --clip-grad 1.0 -j 8 --amp --pin-mem --channels-last 
  3. Run the following command to reproduce the validation results of MaxViT-T on the ImageNet-1k dataset.

    Results: Acc@1 83.820 (16.180) Acc@5 96.528 (3.472), where the values in parentheses are the corresponding error rates.

    python3 valid.py /path/to/imagenet --img-size 224 --crop-pct 0.95 --cuda 0 --model maxvit_tiny_tf_224 --pretrained
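
The batch arithmetic in step 2 can be sketched in plain Python. This is an illustrative sketch, not code from this repo: the helper names (`accumulation_steps`, `grad_mse`, `accumulated_grad`) and the toy least-squares problem are assumptions made for the example. It first recovers the 16 accumulation steps from the flags `-b 64`, `-tb 4096`, and 4 GPUs, then checks on a toy problem that averaging the gradients of equally sized micro-batches gives the same gradient as one large batch.

```python
def accumulation_steps(target_batch: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Micro-batches to accumulate before one optimizer step."""
    per_pass = per_gpu_batch * num_gpus  # samples seen in one forward/backward pass
    if target_batch % per_pass != 0:
        raise ValueError("target batch must be a multiple of the per-pass batch")
    return target_batch // per_pass


def grad_mse(w: float, xs: list, ys: list) -> float:
    """Gradient w.r.t. w of mean((w * x - y) ** 2) over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list, ys: list, micro_batch: int) -> float:
    """Average the gradients of equally sized micro-batches (hypothetical helper)."""
    n_micro = len(xs) // micro_batch
    total = 0.0
    for i in range(n_micro):
        chunk = slice(i * micro_batch, (i + 1) * micro_batch)
        total += grad_mse(w, xs[chunk], ys[chunk]) / n_micro
    return total


# Values taken from the training command: -b 64, -tb 4096, 4 GPUs.
steps = accumulation_steps(target_batch=4096, per_gpu_batch=64, num_gpus=4)
print(steps)  # 16

# Toy data (assumed for illustration): 32 samples split into 16 micro-batches of 2.
xs = [float(i) for i in range(32)]
ys = [3.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch=2)
print(abs(full - accum) < 1e-6)  # True
```

The equivalence holds here because every micro-batch has the same size and the loss is a plain mean. Layers that depend on batch statistics, or per-batch augmentations such as mixup, can make accumulated training behave slightly differently from a true large batch.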

Experiment result

| Model | Image size | #Params | FLOPs | Top-1 (%) | Artifacts |
|---|---|---|---|---|---|
| MaxViT-T (paper) | 224 | 31M | 5.6G | 83.62 | - |
| MaxViT-T (ours) | 224 | 31M | 5.6G | 83.82 | [yaml], [ckpt], [log], [csv] |
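
The Top-1 numbers above, and the Acc@1 / Acc@5 values from the validation step, are top-k accuracies: the percentage of images whose true label is among the k highest-scoring classes. The sketch below shows that computation in plain Python. The function name `topk_accuracy` and the toy scores are assumptions made for illustration and are not taken from this repo's validation script.

```python
def topk_accuracy(scores: list, labels: list, k: int = 1) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example (assumed data): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 2, 2]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, 100.0 - top1)  # 50.0 50.0
print(top2, 100.0 - top2)  # 75.0 25.0
```

The second number on each printed line is the error rate, 100 minus the accuracy, which matches the pattern in the reported results (83.820 + 16.180 = 100).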

References

@inproceedings{tu2022maxvit,
  title={Maxvit: Multi-axis vision transformer},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  booktitle={European conference on computer vision},
  pages={459--479},
  year={2022},
  organization={Springer}
}
