cream's Introduction

Neural Architecture Design and Search

This is a collection of our NAS and Vision Transformer work:

TinyViT (@ECCV'22): TinyViT: Fast Pretraining Distillation for Small Vision Transformers

MiniViT (@CVPR'22): MiniViT: Compressing Vision Transformers with Weight Multiplexing

CDARTS (@TPAMI'22): Cyclic Differentiable Architecture Search

AutoFormerV2 (@NeurIPS'21): Searching the Search Space of Vision Transformer

iRPE (@ICCV'21): Rethinking and Improving Relative Position Encoding for Vision Transformer

AutoFormer (@ICCV'21): AutoFormer: Searching Transformers for Visual Recognition

Cream (@NeurIPS'20): Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

We also implemented our NAS algorithms on Microsoft NNI (Neural Network Intelligence).

News

  • ☀️ Hiring research interns for neural architecture search, tiny transformer design, and model compression projects: [email protected]
  • 💥 Jul, 2022: Code for TinyViT is now released.
  • 💥 Apr, 2022: Code for MiniViT is now released.
  • 💥 Mar, 2022: MiniViT has been accepted by CVPR'22.
  • 💥 Feb, 2022: Code for CDARTS is now released.
  • 💥 Feb, 2022: CDARTS has been accepted by TPAMI'22.
  • 💥 Jan, 2022: Code for AutoFormerV2 is now released.
  • 💥 Oct, 2021: AutoFormerV2 has been accepted by NeurIPS'21; code will be released soon.
  • 💥 Aug, 2021: Code for AutoFormer is now released.
  • 💥 Jul, 2021: iRPE code (with CUDA Acceleration) is now released. Paper is here.
  • 💥 Jul, 2021: iRPE has been accepted by ICCV'21.
  • 💥 Jul, 2021: AutoFormer has been accepted by ICCV'21.
  • 💥 Jul, 2021: AutoFormer is now available on arXiv.
  • 💥 Oct, 2020: Code for Cream is now released.
  • 💥 Oct, 2020: Cream was accepted to NeurIPS'20.

Works

TinyViT

TinyViT is a new family of tiny and efficient vision transformers pretrained on large-scale datasets with our proposed fast distillation framework. The central idea is to transfer knowledge from large pretrained models to small ones. The logits of the large teacher models are sparsified and stored on disk in advance to save memory and computation overhead.

TinyViT overview
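The idea of sparsifying teacher logits and storing them ahead of time can be illustrated with a minimal, pure-Python sketch. It is not the released TinyViT code. The function names (`sparsify`, `densify`, `soft_cross_entropy`), the top-k selection rule, and the choice to spread the unstored probability mass uniformly over the remaining classes are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparsify(teacher_logits, k):
    """Keep only the k largest teacher probabilities as (class, prob) pairs."""
    probs = softmax(teacher_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in top]


def densify(sparse, num_classes):
    """Rebuild a dense soft label from the stored pairs.

    Assumption: the probability mass that was not stored is spread
    uniformly over the remaining classes.
    """
    kept = dict(sparse)
    if len(kept) >= num_classes:
        raise ValueError("k must be smaller than the number of classes")
    rest = (1.0 - sum(kept.values())) / (num_classes - len(kept))
    return [kept.get(c, rest) for c in range(num_classes)]


def soft_cross_entropy(student_logits, soft_label):
    """Cross-entropy between a soft teacher label and the student prediction."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in student_logits))
    return -sum(p * (x - log_z) for p, x in zip(soft_label, student_logits))


# Offline step: run the teacher once and store only the sparse pairs.
teacher_logits = [4.0, 1.0, 0.5, 3.0, -1.0, 0.0]
stored = sparsify(teacher_logits, k=2)

# Training step: the student needs the stored pairs, not the teacher itself.
soft_label = densify(stored, num_classes=len(teacher_logits))
loss = soft_cross_entropy([2.0, 0.1, 0.0, 1.5, -0.5, 0.2], soft_label)
```

Storing `k` pairs per sample instead of one value per class is where the memory saving comes from in this sketch, and skipping the teacher forward pass during student training is where the computation saving comes from.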

MiniViT

MiniViT is a new compression framework that achieves parameter reduction in vision transformers while retaining the same performance. The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks. Specifically, we make the weights shared across layers, while imposing a transformation on the weights to increase diversity. Weight distillation over self-attention is also applied to transfer knowledge from large-scale ViT models to weight-multiplexed compact models.

MiniViT overview
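As a rough illustration of weight multiplexing, the sketch below reuses one weight matrix in every layer and gives each layer a small transformation of its own. The per-layer scale and shift used here is a hypothetical stand-in chosen for brevity, not the transformation MiniViT actually applies, and `MultiplexedStack` and `param_counts` are illustrative names.

```python
def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


class MultiplexedStack:
    """Layers that reuse one weight matrix, each with its own cheap transform."""

    def __init__(self, shared_weight, scales, shifts):
        self.shared_weight = shared_weight  # d x d, shared by every layer
        self.scales = scales                # L x d, per layer (hypothetical)
        self.shifts = shifts                # L x d, per layer (hypothetical)

    def forward(self, x):
        for scale, shift in zip(self.scales, self.shifts):
            h = matvec(self.shared_weight, x)  # same weights in each layer
            x = [s * hi + b for s, hi, b in zip(scale, h, shift)]
        return x


def param_counts(dim, num_layers):
    """Return (independent layers, multiplexed layers) parameter counts."""
    independent = num_layers * dim * dim
    multiplexed = dim * dim + 2 * num_layers * dim
    return independent, multiplexed


stack = MultiplexedStack(
    shared_weight=[[1.0, 0.0], [0.0, 1.0]],
    scales=[[2.0, 2.0], [0.5, 0.5]],
    shifts=[[0.0, 0.0], [1.0, 1.0]],
)
output = stack.forward([1.0, 3.0])
```

In this toy accounting, `param_counts(64, 12)` compares twelve independent 64x64 layers with one shared matrix plus twelve scale and shift vectors. The per-layer transforms keep the layers from computing identical functions while adding few parameters.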

CDARTS

In this work, we propose new joint optimization objectives and a novel Cyclic Differentiable ARchiTecture Search framework, dubbed CDARTS. To account for the structural difference between the search and evaluation networks, CDARTS builds a cyclic feedback mechanism between them with introspective distillation.

CDARTS overview

AutoFormerV2

In this work, instead of searching for an architecture in a predefined search space, we propose to first search the search space itself, with the help of AutoFormer, to automatically find a good one. We then search for architectures within the searched space. In addition, we provide insightful observations and guidelines for general vision transformer design.

AutoFormerV2 overview

AutoFormer

AutoFormer is a new one-shot architecture search framework dedicated to vision transformer search. It entangles the weights of different vision transformer blocks in the same layers during supernet training. Benefiting from this strategy, the trained supernet allows thousands of subnets to be very well-trained. Specifically, the performance of these subnets with weights inherited from the supernet is comparable to that of subnets retrained from scratch.

AutoFormer overview
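One way to picture weight entanglement is a layer whose candidate sizes all read from the same underlying matrix, so that a smaller choice is a slice of a larger one. The sketch below is a simplified illustration under that assumption, not the AutoFormer implementation. `EntangledLinear`, `sample_subnet`, and the candidate dimensions are hypothetical.

```python
import random


class EntangledLinear:
    """A linear layer whose candidate sizes share one weight matrix."""

    def __init__(self, max_in, max_out, seed=0):
        rng = random.Random(seed)
        # Weights for the largest candidate; smaller ones are slices of it.
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(max_in)]
                       for _ in range(max_out)]

    def sub_weight(self, in_dim, out_dim):
        """Top-left (out_dim x in_dim) slice used by a smaller subnet."""
        return [row[:in_dim] for row in self.weight[:out_dim]]

    def forward(self, x, out_dim):
        weight = self.sub_weight(len(x), out_dim)
        return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def sample_subnet(choices, rng):
    """Pick one option per searchable dimension (uniform sampling assumed)."""
    return {name: rng.choice(options) for name, options in choices.items()}


layer = EntangledLinear(max_in=4, max_out=6)
rng = random.Random(1)
config = sample_subnet({"in_dim": [2, 3, 4], "out_dim": [4, 5, 6]}, rng)
y = layer.forward([1.0] * config["in_dim"], config["out_dim"])
```

Because every candidate reads from `layer.weight`, an update made while training one sampled subnet is also seen by every other subnet that overlaps the same slice.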

iRPE

Image RPE (iRPE for short) methods are new relative position encoding methods dedicated to 2D images, considering directional relative distance modeling as well as the interactions between queries and relative position embeddings in the self-attention mechanism. The proposed iRPE methods are simple and lightweight, and can be easily plugged into transformer blocks. Experiments demonstrate that, solely due to the proposed encoding methods, DeiT and DETR obtain up to 1.5% (top-1 Acc) and 1.3% (mAP) stable improvements over their original versions on ImageNet and COCO respectively, without tuning any extra hyperparameters such as learning rate and weight decay. Our ablation and analysis also yield interesting findings, some of which run counter to previous understanding.

iRPE overview
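The two ingredients named above, a directional relative distance on a 2D grid and an interaction between the query and a relative position embedding, can be sketched as follows. This is a simplified illustration and not the iRPE code. Mapping offsets to buckets by plain clipping, and the names `bucket_index` and `contextual_bias`, are assumptions.

```python
def clip(value, limit):
    """Clamp value into [-limit, limit]."""
    return max(-limit, min(limit, value))


def bucket_index(pos_i, pos_j, max_dist):
    """Map the signed (row, col) offset from patch i to patch j to a bucket id.

    The sign is kept, so "left of" and "right of" fall in different buckets.
    Plain clipping is an assumption made for this sketch.
    """
    d_row = clip(pos_j[0] - pos_i[0], max_dist)
    d_col = clip(pos_j[1] - pos_i[1], max_dist)
    side = 2 * max_dist + 1
    return (d_row + max_dist) * side + (d_col + max_dist)


def contextual_bias(query, rel_embeddings, pos_i, pos_j, max_dist):
    """Attention bias from the dot product of the query and a bucket embedding."""
    r = rel_embeddings[bucket_index(pos_i, pos_j, max_dist)]
    return sum(q * ri for q, ri in zip(query, r))


max_dist = 2
num_buckets = (2 * max_dist + 1) ** 2
# Hypothetical embedding table: bucket b maps to the vector [b, -b].
rel_embeddings = [[float(b), -float(b)] for b in range(num_buckets)]
bias = contextual_bias([1.0, 0.5], rel_embeddings, (0, 0), (0, 1), max_dist)
```

In a transformer block, a bias of this kind would be added to the attention logit between patches i and j before the softmax.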

Cream

[Paper] [Models-Google Drive] [Models-Baidu Disk (password: wqw6)] [Slides] [BibTex]

In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, which boosts the convergence of individual models. We introduce the concept of a prioritized path, which refers to an architecture candidate that exhibits superior performance during training. Distilling knowledge from the prioritized paths boosts the training of subnetworks. Since the prioritized paths change on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
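A small sketch can make the bookkeeping behind prioritized paths concrete: a fixed-size board holds the best paths seen so far and is updated as training proceeds. This is a hedged illustration rather than the Cream implementation. The class name `PrioritizedPathBoard`, the ranking rule (higher accuracy first, fewer FLOPs as tie-breaker), and the teacher selection rule (the best board entry other than the student itself) are assumptions.

```python
class PrioritizedPathBoard:
    """Fixed-size board of the best-performing paths seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # (accuracy, flops, path), best first

    def update(self, path, accuracy, flops):
        """Insert or refresh a path, then keep only the top entries."""
        self.entries = [e for e in self.entries if e[2] != path]
        self.entries.append((accuracy, flops, path))
        # Assumed ranking: higher accuracy first, lower FLOPs breaks ties.
        self.entries.sort(key=lambda e: (-e[0], e[1]))
        del self.entries[self.capacity:]

    def paths(self):
        return [path for _, _, path in self.entries]

    def teacher_for(self, student_path):
        """Best board entry that is not the student itself (assumed rule)."""
        for path in self.paths():
            if path != student_path:
                return path
        return None


board = PrioritizedPathBoard(capacity=2)
board.update(("conv3", "conv5"), accuracy=0.70, flops=300)
board.update(("conv5", "conv5"), accuracy=0.74, flops=420)
board.update(("conv3", "conv3"), accuracy=0.72, flops=250)
teacher = board.teacher_for(("conv5", "conv5"))
```

During supernet training, a sampled subnetwork would be distilled from the path returned by `teacher_for`, and its own score would then be passed to `update`, so the board contents keep changing as training proceeds.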

Bibtex

@InProceedings{tiny_vit,
  title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
  author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}

@InProceedings{MiniViT,
    title     = {MiniViT: Compressing Vision Transformers With Weight Multiplexing},
    author    = {Zhang, Jinnian and Peng, Houwen and Wu, Kan and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {12145-12154}
}

@article{CDARTS,
  title={Cyclic Differentiable Architecture Search},
  author={Yu, Hongyuan and Peng, Houwen and Huang, Yan and Fu, Jianlong and Du, Hao and Wang, Liang and Ling, Haibin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2022}
}

@InProceedings{S3,
  title={Searching the Search Space of Vision Transformer},
  author={Chen, Minghao and Wu, Kan and Ni, Bolin and Peng, Houwen and Liu, Bei and Fu, Jianlong and Chao, Hongyang and Ling, Haibin},
  booktitle={Conference and Workshop on Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

@InProceedings{iRPE,
    title     = {Rethinking and Improving Relative Position Encoding for Vision Transformer},
    author    = {Wu, Kan and Peng, Houwen and Chen, Minghao and Fu, Jianlong and Chao, Hongyang},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {10033-10041}
}

@InProceedings{AutoFormer,
    title     = {AutoFormer: Searching Transformers for Visual Recognition},
    author    = {Chen, Minghao and Peng, Houwen and Fu, Jianlong and Ling, Haibin},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {12270-12280}
}

@article{Cream,
  title={Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search},
  author={Peng, Houwen and Du, Hao and Yu, Hongyuan and Li, Qi and Liao, Jing and Fu, Jianlong},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

License

Licensed under the MIT license.
