MSMDFusion

Official implementation of our CVPR 2023 paper "MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection", by Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, and Yu-Gang Jiang.

(Figure: overview of the MSMDFusion framework.)

Introduction

Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems. This is challenging, however, because of the difficulty of combining multi-granularity geometric and semantic features from two drastically different modalities. Recent approaches aim to exploit the semantic density of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporating 2D semantics via cross-modal interaction or fusion techniques. However, depth information is under-investigated in these approaches when lifting points into 3D space, so 2D semantics cannot be reliably fused with 3D points. Moreover, their multi-modal fusion strategies, implemented as concatenation or attention, either cannot effectively fuse 2D and 3D information or are unable to perform fine-grained interactions in the voxel space. To this end, we propose a novel framework called MSMDFusion to tackle the above problems.
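To make the seed-lifting idea concrete, here is a minimal sketch of unprojecting 2D seeds into 3D under several depth hypotheses with a pinhole camera model. All names and the hand-picked depth values are illustrative assumptions; the actual method derives its multi-depth seeds with the help of nearby projected LiDAR points rather than fixed hypotheses.

import numpy as np

def lift_seeds(seeds_uv, depths, K):
    """Unproject 2D seeds into 3D camera coordinates, once per depth hypothesis.

    seeds_uv: (N, 2) pixel coordinates of the 2D seeds.
    depths:   (N, D) candidate depths per seed (the multi-depth hypotheses).
    K:        (3, 3) camera intrinsic matrix.
    Returns:  (N, D, 3) points in the camera frame.
    """
    ones = np.ones((seeds_uv.shape[0], 1))
    pix = np.concatenate([seeds_uv, ones], axis=1)  # homogeneous pixels, (N, 3)
    rays = pix @ np.linalg.inv(K).T                 # viewing rays with z = 1, (N, 3)
    return rays[:, None, :] * depths[:, :, None]    # scale each ray by each depth

# Example: two seeds with three depth hypotheses each (values are made up).
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
seeds = np.array([[640.0, 360.0], [900.0, 500.0]])
depths = np.array([[5.0, 10.0, 20.0], [4.0, 8.0, 16.0]])
print(lift_seeds(seeds, depths, K).shape)  # -> (2, 3, 3)

Each lifted seed carries its 2D semantic feature into 3D, where it can then be fused with LiDAR voxel features at multiple scales.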

Getting Started

Installation

For installation, please refer to getting_started.md.

Data Preparation

Step 1: Please refer to the official site for instructions on preparing the nuScenes data (in mmdetection3d this is done with tools/create_data.py). After data preparation, you will see the following directory structure:

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_database
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_test.pkl
│   │   ├── nuscenes_dbinfos_train.pkl

Step 2: Download the preprocessed virtual point samples (extraction code: 9xcb) and sweeps (extraction code: 2eg1) data, put them under the samples and sweeps folders shown above, respectively, and rename each folder to FOREGROUND_MIXED_6NN_WITH_DEPTH. (A quick layout check is sketched below.)
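As a quick sanity check (a minimal sketch, not part of the official tooling; the paths simply follow the layout shown in Step 1), you can verify that both renamed folders exist and are non-empty:

from pathlib import Path

# Expected locations of the virtual-point folders after Step 2.
root = Path("data/nuscenes")
for split in ("samples", "sweeps"):
    vp_dir = root / split / "FOREGROUND_MIXED_6NN_WITH_DEPTH"
    n_files = sum(1 for _ in vp_dir.iterdir()) if vp_dir.is_dir() else 0
    print(f"{vp_dir}: {n_files} files {'(ok)' if n_files else '(MISSING)'}")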

Training and Evaluation

For training, you first need to train a pure LiDAR backbone, such as TransFusion-L. Then, you can merge the checkpoints of the pretrained TransFusion-L and ResNet-50 as suggested here (a minimal merging sketch follows below). We also provide a merged first-stage checkpoint here (extraction code: 69i7).
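As a rough illustration only (the checkpoint file names and the img_backbone. key prefix are assumptions for this sketch, not necessarily the module names the repository uses), merging amounts to combining the two state dicts under the prefixes the fusion model expects:

import torch

# Load both pretrained checkpoints (hypothetical file names).
lidar_ckpt = torch.load("transfusion_L.pth", map_location="cpu")
image_ckpt = torch.load("resnet50.pth", map_location="cpu")

# mmcv-style checkpoints nest weights under "state_dict";
# plain torchvision checkpoints are the state dict itself.
lidar_sd = lidar_ckpt.get("state_dict", lidar_ckpt)
image_sd = image_ckpt.get("state_dict", image_ckpt)

merged = dict(lidar_sd)  # keep all LiDAR-branch weights unchanged
for key, value in image_sd.items():
    # Re-prefix image weights to match the fusion model's image branch.
    merged["img_backbone." + key] = value

torch.save({"state_dict": merged}, "fusion_stage1_merged.pth")

If you use the provided merged first-stage checkpoint above, this step is unnecessary.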

# First-stage training
sh ./tools/dist_train.sh ./configs/transfusion_nusc_voxel_L.py 8
# Second-stage training
sh ./tools/dist_train.sh ./configs/MSMDFusion_nusc_voxel_LC.py 8

Notice: When training the first stage (TransFusion-L), please follow the copy-and-paste fade strategy as suggested here.

For evaluation, you can use the following command:

# Evaluation
sh ./tools/dist_test.sh ./configs/MSMDFusion_nusc_voxel_LC.py $ckpt_path$ 8 --eval bbox

For testing and making a submission to the leaderboard, please refer to the official site.

Results

3D Object Detection on nuScenes

| Model | Set | mAP | NDS | Result Files |
|---|---|---|---|---|
| MSMDFusion | val | 69.27 | 72.05 | checkpoints |
| MSMDFusion | test | 71.49 | 73.96 | predictions |
| MSMDFusion-TTA | test | 73.28 | 75.09 | predictions |

3D Object Tracking on nuScenes

| Model | Set | AMOTA | AMOTP | Recall | Result Files |
|---|---|---|---|---|---|
| MSMDFusion | test | 73.98 | 54.87 | 76.30 | predictions |

Citation

If you find our paper useful, please cite:

@InProceedings{Jiao_2023_CVPR,
    author    = {Jiao, Yang and Jie, Zequn and Chen, Shaoxiang and Chen, Jingjing and Ma, Lin and Jiang, Yu-Gang},
    title     = {MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {21643-21652}
}

Acknowledgement

We sincerely thank the authors of mmdetection3d, CenterPoint, TransFusion, MVP, and the two BEVFusion projects for open-sourcing their methods.
