Awesome-Large-Vision-Language-Models

Papers and code for large vision-language models.

This repo mainly focuses on large vision-language model tasks. To recommend a paper, please open a pull request or email me at [email protected].

If you are interested in related tasks, you can reach me on Discord (yangcao#9724) or WeChat (85298328912).

3D

  1. [3D-LLM] 3D-LLM: Injecting the 3D World into Large Language Models, NeurIPS2023. [Code]
  2. [LL3DA] LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning, CVPR2024. [Code]
  3. [GPT4Point] GPT4Point: A Unified Framework for Point-Language Understanding and Generation, CVPR2024. [Code]
  4. [Uni3D] Uni3D: Exploring Unified 3D Representation at Scale, ICLR2024. [Code]

2D

  1. [LLaMA-VID] LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models, arXiv2023. [Code]
  2. [Mini-Gemini] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, arXiv2024. [Code]
  3. [Prompt Highlighter] Prompt Highlighter: Interactive Control for Multi-Modal LLMs, CVPR2024. [Code]
