Computer Vision and Pattern Recognition (CVPR) is a massive conference. In 2024 alone, 11,532 papers were submitted and 2,719 were accepted. I created this repository to help you find the crème de la crème of CVPR 2024 publications. If the paper you are looking for is not on my short list, take a peek at the full list of accepted papers.
🔥 - highlighted papers
🔥 SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
[paper] [code]
Topic: 3D from multi-view and sensors
Session: Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner
[paper] [code] [video]
Topic: 3D from multi-view and sensors
Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20
🔥 EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra
[paper] [code] [demo]
Topic: Efficient and scalable vision
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #144
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
[paper] [code] [demo] [colab]
Topic: Image and video synthesis and generation
Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #132
🔥 DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
[paper] [code] [video]
Topic: Image and video synthesis and generation
Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #392
DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen
[paper] [code] [video]
Topic: Recognition: Categorization, detection, retrieval
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan
[paper] [code] [video] [demo] [colab]
Topic: Recognition: Categorization, detection, retrieval
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223
🔥 RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma, Jian Wang
[paper] [video]
Topic: Segmentation, grouping and shape analysis
Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378
🔥 Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao
[paper] [code] [video]
Topic: Segmentation, grouping and shape analysis
Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351
🔥 Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu
[paper] [code] [video]
Topic: Video: Low-level analysis, motion, and tracking
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng
[paper] [code]
Topic: Video: Low-level analysis, motion, and tracking
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
[paper] [code] [video] [demo]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327
🔥 Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
[paper] [code]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #390
🔥 LISA: Reasoning Segmentation via Large Language Model
Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia
[paper] [code] [demo]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
[paper] [code] [video] [demo]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317
We would love your help in making this repository even better! If you know of an amazing paper that isn't listed here, or if you have any suggestions for improvement, feel free to open an issue or submit a pull request.