Jensen Wang's Projects
Awesome papers about Multi-Camera 3D Object Detection and Segmentation in Bird's-Eye-View, such as DETR3D, BEVDet, BEVFormer
A curated list of awesome neural radiance fields papers
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Emu: An Open Multimodal Generalist
EVA Series: Visual Representation Fantasies from BAAI
A Keypoint-based Global Association Network for Lane Detection. Accepted by CVPR 2022
ImageBind: One Embedding Space to Bind Them All
LAVIS - A One-stop Library for Language-Vision Intelligence
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
Ongoing research training transformer models at scale
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, image/video restoration/enhancement, etc.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
An open source implementation of CLIP.
An open-source framework for training large multimodal models.
Painter & SegGPT Series: Vision Foundation Models from BAAI
PandaGPT: One Model To Instruction-Follow Them All
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
[Preprint] ViT-Lens: Towards Omni-modal Representations