
victoraptord / top-cvpr-2024-papers


This project forked from skalskip/top-cvpr-2024-papers


This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

License: Creative Commons Zero v1.0 Universal

Python 100.00%



top CVPR 2024 papers

2023

vancouver

photo from 2023; I will update in June

👋 hello

Computer Vision and Pattern Recognition (CVPR) is a massive conference. In 2024 alone, 11,532 papers were submitted, and 2,719 were accepted. I created this repository to help you find the crème de la crème of CVPR publications. If the paper you are looking for is not on my shortlist, take a peek at the full list of accepted papers.

🗞️ papers and posters

🔥 - highlighted papers

3d from multi-view and sensors

🔥 SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
[paper] [code]
Topic: 3D from multi-view and sensors
Session: Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84



ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner
[paper] [code] [video]
Topic: 3D from multi-view and sensors
Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20



efficient and scalable vision

🔥 EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra
[paper] [code] [demo]
Topic: Efficient and scalable vision
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #144


image and video synthesis and generation

DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
[paper] [code] [demo] [colab]
Topic: Image and video synthesis and generation
Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #132


🔥 DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
[paper] [code] [video]
Topic: Image and video synthesis and generation
Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #392



recognition: categorization, detection, retrieval

DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen
[paper] [code] [video]
Topic: Recognition: Categorization, detection, retrieval
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229



YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan
[paper] [code] [video] [demo] [colab]
Topic: Recognition: Categorization, detection, retrieval
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223



segmentation, grouping and shape analysis

🔥 RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang
[paper] [video]
Topic: Segmentation, grouping and shape analysis
Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378



🔥 Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao
[paper] [code] [video]
Topic: Segmentation, grouping and shape analysis
Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351



video: low-level analysis, motion, and tracking

🔥 Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu
[paper] [code] [video]
Topic: Video: Low-level analysis, motion, and tracking
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421



DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng
[paper] [code]
Topic: Video: Low-level analysis, motion, and tracking
Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455



vision, language, and reasoning

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
[paper] [code] [video] [demo]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327



🔥 Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
[paper] [code]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #390


🔥 LISA: Reasoning Segmentation via Large Language Model
Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia
[paper] [code] [demo]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413



ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
[paper] [code] [video] [demo]
Topic: Vision, language, and reasoning
Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317



🦸 contribution

We would love your help in making this repository even better! If you know of an amazing paper that isn't listed here, or if you have any suggestions for improvement, feel free to open an issue or submit a pull request.
