cmhungsteve / awesome-transformer-attention

A comprehensive paper list of Vision Transformer/Attention, including papers, code, and related websites

transformer attention-mechanism vision-transformer deep-learning awesome-list transformer-cv transformer-architecture transformer-awesome transformer-with-cv transformer-models

awesome-transformer-attention's Introduction

Hi there 👋

My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. I received my Ph.D. degree from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on biometric research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and on Edge-AI research as a Senior AI Engineer at MediaTek.

My research interests are mainly in Multi-Modal AI, including Vision-Language, Video Understanding, Cross-Modal Learning, Efficient Tuning, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, X-supervised learning, etc.

[Update] I released a comprehensive paper list for Vision Transformer & Attention to facilitate related research. Feel free to check it out (I would appreciate it if you could ★STAR it)!

[Personal Website][LinkedIn][Twitter][Google Scholar][Resume]

Min-Hung (Steve)'s GitHub stats

awesome-transformer-attention's People

Contributors

apsdehal, arshadshk, bwittmann, chienyiwang, cmhungsteve, cmsflash, davidnvq, fastmetro, guoleisun, jacobyuan7, jeasinema, luoweizhou, nitr098, nothingg24, pengboxiangshang, pranav-gupta-7, scaomath, sebasmos, xiuqhou, ywyue, ziyangwang007


awesome-transformer-attention's Issues

Broken links

Hi, I found some broken links:

Request to add a paper

Hi,

Thanks for maintaining this awesome repository.
We have a paper, https://arxiv.org/abs/2303.14863, that uses DETR for diffusion. We think it is relevant to the awesome repository you are maintaining.

Could it be added to the list?

Thank you in advance

Applying to add a paper

Hi~ Thanks for maintaining this awesome repository!

We have a new paper, Salience-DETR, proposing a novel transformer-based object detector, which has been accepted to CVPR 2024, and we think it is relevant to the Object Detection part of your repository. We were wondering if you could please add our paper to the collection.

Salience DETR: "Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement", CVPR, 2024 (Xi'an Jiaotong University) [Paper][PyTorch]

Thank you very much.

BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models (request to switch categories)

Hi,

Thanks for maintaining this awesome repository, and thank you for adding my work, BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models, to it.

I see that you put our work under the "Efficient Vision Transformer" section. However, I believe this work falls more under the "Model Compression + Transformer" category, as it explores the effect of architectural operations/properties (mainly from CNNs, besides convolutions) on the classification accuracy of binary Vision Transformers with binary weights and activations.

If you could put our work under the "Model Compression + Transformer" category, alongside works on Vision Transformer quantization/binarization, I would be happy :) Thanks!

Also, our source code can be found at https://github.com/phuoc-hoan-le/binaryvit.
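
For readers unfamiliar with binarization: the standard building block underneath is a sign function trained with a straight-through estimator. A minimal PyTorch sketch of that generic pattern (illustrative only, not the BinaryViT code):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign forward pass, straight-through estimator backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)  # values in {-1, +1} (0 only at exactly x == 0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (clipped identity).
        return grad_out * (x.abs() <= 1).float()

w = torch.randn(8, 8, requires_grad=True)  # real-valued latent weights
wb = BinarizeSTE.apply(w)                  # binary weights used in the layer
wb.sum().backward()                        # gradients reach the latent weights
```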

Transformer on Sample Relationship Exploration

Hi, thanks for your collection!

I think our recent work may also be a typical example of Transformer attention for sample-relationship exploration: "BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning", CVPR, 2022, https://arxiv.org/abs/2203.01522. There is also a general version, BatchFormerV2, in which we demonstrate consistent effectiveness on object detection, panoptic segmentation, and image classification.
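
The core idea is lightweight: insert a transformer encoder layer along the batch dimension so that samples in a mini-batch can attend to each other. A minimal PyTorch sketch of that idea (an illustration, not the released BatchFormer implementation):

```python
import torch
import torch.nn as nn

class BatchAttention(nn.Module):
    """Transformer encoder layer applied along the batch dimension,
    letting each sample attend to the other samples in the mini-batch."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Default batch_first=False: input shape is (sequence, batch, feature).
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -> treat the batch itself as a sequence of length B.
        return self.encoder(x.unsqueeze(1)).squeeze(1)

feats = torch.randn(32, 256)       # per-sample features from a backbone
out = BatchAttention(256)(feats)   # (32, 256), now batch-aware
```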

If you agree, it would be great to include this paper.

Regards,

Request to add a paper

Hi,
Thanks for this amazing collection. It really helped me put together a nice literature survey for our work. We also propose a method to improve efficiency in Vision Transformers and were wondering if you could please add our paper to the collection.

Skip-Attention: Improving Vision Transformers by Paying Less Attention - arXiv.

Thanks

Add code link for DGL in Multi-Modal retrieval (video)

Hi Min-Hung,

I'd like to suggest adding a code link for DGL in the list for Multi-Modal Retrieval (Video):

  • DGL: "DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval", AAAI, 2024 (University of Technology Sydney). [Paper][PyTorch]

The code is implemented in PyTorch and provides an innovative approach to enhancing text-video retrieval using dynamic global-local prompt tuning with only 0.53M trainable parameters.
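
For context, the generic prompt-tuning recipe underneath keeps the backbone frozen and trains only a handful of prompt tokens. A rough PyTorch sketch of that recipe (the DGL-specific dynamic global-local design is not reproduced here; all names are illustrative):

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Frozen backbone plus a small set of learnable prompt tokens."""

    def __init__(self, backbone: nn.Module, dim: int, num_prompts: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                    # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim); prepend the learnable prompts to every sample.
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
model = PromptedEncoder(layer, dim=512)
out = model(torch.randn(4, 32, 512))  # only the 8 prompt tokens are trainable
```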

Thank you!

ICCV 2023 SparseBEV

Hello, thanks for your great efforts in collecting papers for reference. Our paper SparseBEV [arXiv][GitHub] has recently been accepted to ICCV 2023. Please consider including it in your list.

Applying to add a paper

Hi~ Thanks for maintaining this awesome repository!

We have a new paper, Agent Attention, proposing a novel attention paradigm, which we think is relevant to the General Vision Transformer part of your repository. We were wondering if you could please add our paper to the collection.

Agent Attention: "Agent Attention: On the Integration of Softmax and Linear Attention", arXiv, 2023 (Tsinghua). [Paper][PyTorch]
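
Roughly, a small set of agent tokens (pooled from the queries) mediates between queries and keys, so two inexpensive softmax attentions replace one quadratic one. A minimal PyTorch sketch of this idea (an illustration, not the paper's reference implementation):

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=49):
    # q, k, v: (B, N, d); pool the queries into a few agent tokens (B, n, d).
    B, N, d = q.shape
    a = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # Agents aggregate information from keys/values: (B, n, N) attention.
    agent_out = F.softmax(a @ k.transpose(1, 2) / d ** 0.5, dim=-1) @ v
    # Queries then read from the agents: (B, N, n) attention.
    return F.softmax(q @ a.transpose(1, 2) / d ** 0.5, dim=-1) @ agent_out

q = k = v = torch.randn(2, 196, 64)
out = agent_attention(q, k, v)  # (2, 196, 64), cost O(N*n) instead of O(N^2)
```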

Thank you very much.

Add TransCeption and our survey paper

Hi there,

Thanks for your great repository. I want to give you some updates about our papers; all of them are in the medical image segmentation category. TransDeepLab was published at the MICCAI PRIME workshop, HiFormer at WACV 2023, and TransNorm in IEEE Access. Please update their information. In addition, please add our new paper, "Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale Feature Fusion Approach", with its implementation code link on GitHub.

Also, our new paper, Attention Swin U-Net, was accepted to ISBI 2023 and is available through this link, with the code link on GitHub.

Moreover, our survey paper has also been published on arXiv, with this repo.

It would be great if you could incorporate these updates into your awesome-list repo.

Best regards, Ehsan

Great works! Could you add some works from our group?

Hi, Dr. Min-Hung! This is a great repo on Transformer+X.

Could you add some detection/segmentation transformer works from my group?
These works are closely related to using transformers for video/multi-modal/universal/few-shot/referring/efficient segmentation.

1. K-Net: "K-Net: Towards Unified Image Segmentation", NeurIPS, 2021 (MMLab@NTU) [Paper][Project]

2. Video K-Net: "Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation", CVPR, 2022 (PKU, MMLab@NTU) [Paper][Project]

3. PanopticPartFormer++: "PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation", arXiv (PKU) [Paper][Project]

4. FashionFormer: "Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition", ECCV, 2022 (PKU, SenseTime) [Paper][Project]

5. RefSegformer: "Towards Robust Referring Image Segmentation", arXiv (PKU) [Project]

6. Tube-Link: "Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation", arXiv (MMLab@NTU) [Paper][Project]

7. EMO: "Rethinking Mobile Block for Efficient Neural Models", arXiv (Tencent, PKU) [Paper][Project]

8. Reference Twice: "Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation", arXiv (ZJU, Tencent, PKU) [Paper][Project]

Although some of these works are not open-sourced yet (under review), I promise we will release the code and models.

Requesting to add four papers published in 2022 and 2023

Please add the following four papers, all of which use transformer backbones:

  1. Egocentric video-language pre-training, solving video-text retrieval, video classification, text-guided video grounding, text-guided video summarization, video question answering, etc.
  2. Image-language pre-training, solving image captioning, image-text retrieval, object detection, segmentation, and referring expression comprehension:
     • VoLTA: "VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment", TMLR, 2023 [Paper][Code][Project]
  3. Video temporal grounding, unifying diverse temporal annotations to power moment retrieval (interval), highlight detection (curve), and video summarization (point):
     • UniVTG: "UniVTG: Towards Unified Video-Language Temporal Grounding", ICCV, 2023 [Paper][Code]
