cmhungsteve / awesome-transformer-attention

A comprehensive paper list of Vision Transformer/Attention, including papers, code, and related websites

transformer attention-mechanism vision-transformer deep-learning awesome-list transformer-cv transformer-architecture transformer-awesome transformer-with-cv transformer-models

awesome-transformer-attention's Introduction

Hi there 👋

My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. I received my Ph.D. degree from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on biometric research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and on Edge-AI research as a Senior AI Engineer at MediaTek.

My research interests are mainly in Multi-Modal AI, including Vision-Language, Video Understanding, Cross-Modal Learning, Efficient Tuning, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, X-supervised learning, etc.

[Update] I released a comprehensive paper list for Vision Transformer & Attention to facilitate related research. Feel free to check it out (I would appreciate it if you could ★STAR it)!

[Personal Website][LinkedIn][Twitter][Google Scholar][Resume]

Min-Hung (Steve)'s GitHub stats

awesome-transformer-attention's People

Contributors

apsdehal, arshadshk, bwittmann, chienyiwang, cmhungsteve, cmsflash, davidnvq, fastmetro, guoleisun, jacobyuan7, jeasinema, luoweizhou, nitr098, nothingg24, pengboxiangshang, pranav-gupta-7, scaomath, sebasmos, xiuqhou, ywyue, ziyangwang007


awesome-transformer-attention's Issues

Broken links

Hi, I found some broken links:

Request to add a paper

Hi,

Thanks for maintaining this awesome repository.
We have a paper, https://arxiv.org/abs/2303.14863, that uses DETR for diffusion. We think it is relevant to the awesome repository you are maintaining.

Could it be added to the list?

Thank you in advance

Applying to add a paper

Hi~ Thanks for maintaining this awesome repository!

We have a new paper, Salience-DETR, proposing a novel transformer-based object detector, which has been accepted to CVPR 2024, and we think it is relevant to the Object Detection part of your repository. We were wondering if you could please add our paper to the collection.

Salience DETR: "Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement", CVPR, 2024 (Xi'an Jiaotong University) [Paper][PyTorch]

Thank you very much.

BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models (request to switch categories)

Hi,

Thanks for maintaining this awesome repository, and thank you for adding my work, BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models, to it.

I see that you put our work under the "Efficient Vision Transformer" section. However, I believe this work falls more under the "Model Compression + Transformer" category, as it explores the effect of architectural operations/properties (mainly from CNNs, besides convolutions) on the classification accuracy of binary Vision Transformers with binary weights and activations.

If you could put our work under the "Model Compression + Transformer" category, alongside works on Vision Transformer quantization/binarization, I would be happy :) Thanks!

Also, our source code can be found at https://github.com/phuoc-hoan-le/binaryvit.
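
For readers unfamiliar with binarization: the standard building block underneath is a sign function trained with a straight-through estimator. A minimal PyTorch sketch of that generic pattern (illustrative only, not the BinaryViT code):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign forward pass, straight-through estimator backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)  # values in {-1, +1} (0 only at exactly x == 0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (clipped identity).
        return grad_out * (x.abs() <= 1).float()

w = torch.randn(8, 8, requires_grad=True)  # real-valued latent weights
wb = BinarizeSTE.apply(w)                  # binary weights used in the layer
wb.sum().backward()                        # gradients reach the latent weights
```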

Transformer on Sample Relationship Exploration

Hi, thanks for your collection!

I think our recent work may also be a typical example of Transformer attention for sample-relationship exploration: "BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning", CVPR, 2022, https://arxiv.org/abs/2203.01522. There is also a general version, BatchFormerV2, in which we demonstrate consistent effectiveness on object detection, panoptic segmentation, and image classification.
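
The core idea is lightweight: insert a transformer encoder layer along the batch dimension so that samples in a mini-batch can attend to each other. A minimal PyTorch sketch of that idea (an illustration, not the released BatchFormer implementation):

```python
import torch
import torch.nn as nn

class BatchAttention(nn.Module):
    """Transformer encoder layer applied along the batch dimension,
    letting each sample attend to the other samples in the mini-batch."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Default batch_first=False: input shape is (sequence, batch, feature).
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -> treat the batch itself as a sequence of length B.
        return self.encoder(x.unsqueeze(1)).squeeze(1)

feats = torch.randn(32, 256)       # per-sample features from a backbone
out = BatchAttention(256)(feats)   # (32, 256), now batch-aware
```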

If you agree, it would be great to include this paper.

Regards,

Request to add a paper

Hi,
Thanks for this amazing collection. It really helped me put together a nice literature survey for our work. We also propose a method to improve efficiency in Vision Transformers and were wondering if you could please add our paper to the collection.

Skip-Attention: Improving Vision Transformers by Paying Less Attention - arXiv.

Thanks

Add code link for DGL in Multi-Modal retrieval (video)

Hi Min-Hung,

I'd like to suggest adding a code link for DGL in the list for Multi-Modal Retrieval (Video):

  • DGL: "DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval", AAAI, 2024 (University of Technology Sydney). [Paper][PyTorch]

The code is implemented in PyTorch and provides an innovative approach to enhancing text-video retrieval using dynamic global-local prompt tuning with only 0.53M trainable parameters.
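
For context, the generic prompt-tuning recipe underneath keeps the backbone frozen and trains only a handful of prompt tokens. A rough PyTorch sketch of that recipe (the DGL-specific dynamic global-local design is not reproduced here; all names are illustrative):

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Frozen backbone plus a small set of learnable prompt tokens."""

    def __init__(self, backbone: nn.Module, dim: int, num_prompts: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                    # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim); prepend the learnable prompts to every sample.
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
model = PromptedEncoder(layer, dim=512)
out = model(torch.randn(4, 32, 512))  # only the 8 prompt tokens are trainable
```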

Thank you!

ICCV 2023 SparseBEV

Hello, thanks for your great efforts in collecting papers for reference. Our paper SparseBEV [arXiv][GitHub] has recently been accepted to ICCV 2023. Please consider including it in your list.

Applying to add a paper

Hi~ Thanks for maintaining this awesome repository!

We have a new paper, Agent Attention, proposing a novel attention paradigm, which we think is relevant to the General Vision Transformer part of your repository. We were wondering if you could please add our paper to the collection.

Agent Attention: "Agent Attention: On the Integration of Softmax and Linear Attention", arXiv, 2023 (Tsinghua). [Paper][PyTorch]
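
Roughly, a small set of agent tokens (pooled from the queries) mediates between queries and keys, so two inexpensive softmax attentions replace one quadratic one. A minimal PyTorch sketch of this idea (an illustration, not the paper's reference implementation):

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=49):
    # q, k, v: (B, N, d); pool the queries into a few agent tokens (B, n, d).
    B, N, d = q.shape
    a = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # Agents aggregate information from keys/values: (B, n, N) attention.
    agent_out = F.softmax(a @ k.transpose(1, 2) / d ** 0.5, dim=-1) @ v
    # Queries then read from the agents: (B, N, n) attention.
    return F.softmax(q @ a.transpose(1, 2) / d ** 0.5, dim=-1) @ agent_out

q = k = v = torch.randn(2, 196, 64)
out = agent_attention(q, k, v)  # (2, 196, 64), cost O(N*n) instead of O(N^2)
```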

Thank you very much.

Add TransCeption and our survey paper

Hi there,

Thanks for your great repository. I want to give you some updates about our papers; all of them are in the medical image segmentation category. TransDeepLab was published at the MICCAI PRIME workshop, HiFormer at WACV 2023, and TransNorm in IEEE Access. Please update their information. In addition, please add our new paper, "Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale Feature Fusion Approach", with its implementation code link on GitHub.

Also, our new paper, Attention Swin U-Net, was accepted to ISBI 2023 and is available through this link, with the code link on GitHub.

Moreover, our survey paper has also been published on arXiv, with this repo.

It would be great if you could incorporate these updates into your awesome-list repo.

Best regards, Ehsan

Great works! Could you add some works from our group?

Hi, Dr. Min-Hung! This is a great repo on Transformer+X.

Could you add some detection/segmentation transformer works from my group?
These works are closely related to using transformers for video/multi-modal/universal/few-shot/referring/efficient segmentation.

1. K-Net: "K-Net: Towards Unified Image Segmentation", NeurIPS, 2021 (MMLab@NTU) [Paper][Project]

2. Video K-Net: "Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation", CVPR, 2022 (PKU, MMLab@NTU) [Paper][Project]

3. PanopticPartFormer++: "PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation", arXiv (PKU) [Paper][Project]

4. FashionFormer: "Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition", ECCV, 2022 (PKU, SenseTime) [Paper][Project]

5. RefSegformer: "Towards Robust Referring Image Segmentation", arXiv (PKU) [Project]

6. Tube-Link: "Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation", arXiv (MMLab@NTU) [Paper][Project]

7. EMO: "Rethinking Mobile Block for Efficient Neural Models", arXiv (Tencent, PKU) [Paper][Project]

8. Reference Twice: "Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation", arXiv (ZJU, Tencent, PKU) [Paper][Project]

Although some of these works are not open-sourced yet (under review), I promise we will release the code and models.

Requesting to add four papers published in 2022 and 2023

Please add the following four papers, all of which use transformer backbones:

  1. Egocentric video-language pre-training, solving video-text retrieval, video classification, text-guided video grounding, text-guided video summarization, video question answering, etc.
  2. Image-language pre-training, solving image captioning, image-text retrieval, object detection, segmentation, and referring expression comprehension:
     • VoLTA: "VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment", TMLR, 2023 [Paper][Code][Project]
  3. Video temporal grounding, unifying diverse temporal annotations to power moment retrieval (interval), highlight detection (curve), and video summarization (point):
     • UniVTG: "UniVTG: Towards Unified Video-Language Temporal Grounding", ICCV, 2023 [Paper][Code]
