yangcaoai / awesome-llm-3d

This project forked from activevisionlab/awesome-llm-3d

Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world

License: MIT License

awesome-llm-3d's Introduction

Awesome-LLM-3D

🏠 About

This is a curated list of papers on 3D-related tasks empowered by Large Language Models (LLMs). It covers tasks including 3D understanding, reasoning, generation, and embodied agents. To give a fuller picture of the area, we also include other foundation models (CLIP, SAM).

This is an active repository; watch it to follow the latest advances. If you find it useful, please star this repo.

🔥 News

  • [2023-12-16] Xianzheng Ma and Yash Bhalgat curated this list and published the first version;
  • [2024-01-06] Runsen Xu added chronological information and Xianzheng Ma reorganized the list in Z-A (newest-first) order so that the latest advances are easier to follow.

3D Understanding via LLM

| Date | Keywords | Institute (first) | Paper | Publication | Others |
|---|---|---|---|---|---|
| 2023-12-21 | LiDAR-LLM | PKU | LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding | Arxiv | project |
| 2023-12-15 | 3DAP | Shanghai AI Lab | 3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V | Arxiv | project |
| 2023-12-13 | Chat-3D v2 | ZJU | Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers | Arxiv | github |
| 2023-12-5 | GPT4Point | HKU | GPT4Point: A Unified Framework for Point-Language Understanding and Generation | Arxiv | github |
| 2023-11-30 | LL3DA | Fudan University | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Arxiv | github |
| 2023-11-26 | ZSVG3D | CUHK(SZ) | Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding | Arxiv | project |
| 2023-11-18 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | Arxiv | github |
| 2023-10-14 | JM3D-LLM | Xiamen University | JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues | ACM MM'2023 | github |
| 2023-9-27 | - | KAUST | Zero-Shot 3D Shape Correspondence | Siggraph Asia'2023 | - |
| 2023-9-21 | LLM-Grounder | U-Mich | LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent | Arxiv | github |
| 2023-9-1 | Point-Bind | CUHK | Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following | Arxiv | github |
| 2023-8-31 | PointLLM | CUHK | PointLLM: Empowering Large Language Models to Understand Point Clouds | Arxiv | github |
| 2023-8-17 | Chat-3D | ZJU | Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Arxiv | github |
| 2023-8-8 | 3D-VisTA | BIGAI | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ICCV'2023 | github |
| 2023-7-24 | 3D-LLM | UCLA | 3D-LLM: Injecting the 3D World into Large Language Models | NeurIPS'2023 | github |
| 2023-3-29 | ViewRefer | CUHK | ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | ICCV'2023 | github |
| 2023-2-14 | ConceptFusion | MIT | ConceptFusion: Open-set Multimodal 3D Mapping | RSS'2023 | project |
| 2022-9-12 | - | MIT | Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | Arxiv | github |

3D Understanding via other Foundation Models

| Date | Keywords | Institute (first) | Paper | Publication | Others |
|---|---|---|---|---|---|
| 2023-12-17 | SAI3D | PKU | SAI3D: Segment Any Instance in 3D Scenes | Arxiv | project |
| 2023-12-17 | Open3DIS | VinAI | Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance | Arxiv | project |
| 2023-11-6 | OVIR-3D | Rutgers University | OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data | CoRL'2023 | github |
| 2023-10-29 | OpenMask3D | ETH | OpenMask3D: Open-Vocabulary 3D Instance Segmentation | NeurIPS'2023 | project |
| 2023-10-5 | Open-Fusion | - | Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation | Arxiv | github |
| - | - | - | From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception | OpenReview | - |
| - | OpenNerf | - | OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | OpenReview | github |
| 2023-9-1 | OpenIns3D | Cambridge | OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | Arxiv | project |
| 2023-6-7 | Contrastive Lift | Oxford-VGG | Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion | NeurIPS'2023 | github |
| 2023-6-4 | Multi-CLIP | ETH | Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes | Arxiv | - |
| 2023-5-23 | 3D-OVS | NTU | Weakly Supervised 3D Open-vocabulary Segmentation | NeurIPS'2023 | github |
| 2023-5-21 | VL-Fields | University of Edinburgh | VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations | ICRA'2023 | project |
| 2023-5-8 | CLIP-FO3D | Tsinghua University | CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP | ICCVW'2023 | - |
| 2023-4-12 | 3D-VQA | ETH | CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes | CVPRW'2023 | github |
| 2023-4-3 | RegionPLC | HKU | RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding | Arxiv | project |
| 2023-3-20 | CG3D | JHU | CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | Arxiv | github |
| 2023-3-16 | LERF | UC Berkeley | LERF: Language Embedded Radiance Fields | ICCV'2023 | github |
| 2023-1-12 | CLIP2Scene | HKU | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | CVPR'2023 | github |
| 2022-12-1 | UniT3D | TUM | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | ICCV'2023 | github |
| 2022-11-29 | PLA | HKU | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | CVPR'2023 | github |
| 2022-11-28 | OpenScene | ETHz | OpenScene: 3D Scene Understanding with Open Vocabularies | CVPR'2023 | github |
| 2022-10-11 | CLIP-Fields | NYU | CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | Arxiv | project |
| 2022-7-23 | Semantic Abstraction | Columbia | Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models | CoRL'2022 | project |
| 2022-4-26 | ScanNet200 | TUM | Language-Grounded Indoor 3D Semantic Segmentation in the Wild | ECCV'2022 | project |

3D Reasoning

| Date | Keywords | Institute (first) | Paper | Publication | Others |
|---|---|---|---|---|---|
| 2023-5-20 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
| - | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |

3D Generation

| Date | Keywords | Institute | Paper | Publication | Others |
|---|---|---|---|---|---|
| 2023-11-29 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | Arxiv | github |
| 2023-11-27 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | Arxiv | project |
| 2023-10-19 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | Arxiv | github |
| 2023-9-21 | LLMR | MIT | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | Arxiv | github |
| 2023-9-20 | DreamLLM | MEGVII | DreamLLM: Synergistic Multimodal Comprehension and Creation | Arxiv | github |
| 2023-4-1 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |

3D Embodied Agent

| Date | Keywords | Institute | Paper | Publication | Others |
|---|---|---|---|---|---|
| 2023-11-27 | Dobb-E | NYU | On Bringing Robots Home | Arxiv | github |
| 2023-11-26 | STEVE | ZJU | See and Think: Embodied Agent in Virtual Environment | Arxiv | github |
| 2023-11-18 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | Arxiv | github |
| 2023-9-14 | UniHSI | Shanghai AI Lab | Unified Human-Scene Interaction via Prompted Chain-of-Contacts | Arxiv | github |
| 2023-7-28 | RT-2 | Google-DeepMind | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Arxiv | github |
| 2023-7-12 | SayPlan | QUT Centre for Robotics | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | CoRL'2023 | github |
| 2023-7-12 | VoxPoser | Stanford | VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | Arxiv | github |
| 2022-12-13 | RT-1 | Google | RT-1: Robotics Transformer for Real-World Control at Scale | Arxiv | github |
| 2022-12-8 | LLM-Planner | The Ohio State University | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV'2023 | github |
| 2022-10-11 | CLIP-Fields | NYU, Meta | CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | RSS'2023 | github |

3D Benchmarks

| Date | Keywords | Institute | Paper | Publication | Others |
|---|---|---|---|---|---|
| 2023-12-26 | EmbodiedScan | Shanghai AI Lab | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | Arxiv | github |
| 2023-12-17 | M3DBench | Fudan University | M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts | Arxiv | github |
| 2023-11-29 | - | DeepMind | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Arxiv | github |
| 2022-10-14 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
| 2021-12-20 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2022 | github |
| 2020-12-3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
| 2020-8-23 | ReferIt3D | Stanford | ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes | ECCV'2020 | github |
| 2019-12-18 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |

Contributing

Your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they are awesome for 3D LLMs; you can vote for them by adding 👍 to them.

If you have any questions about this opinionated list, please get in touch at [email protected] or WeChat ID: mxz1997112.

Acknowledgement

This repo is inspired by Awesome-LLM.
