Coder Social home page Coder Social logo

eaphan / slidr-1 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from valeoai/slidr

0.0 0.0 0.0 360 KB

Official PyTorch implementation of "Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data"

License: Other

Python 88.78% Jupyter Notebook 10.54% Dockerfile 0.68%

slidr-1's Introduction

Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data

Official PyTorch implementation of the method SLidR. More details can be found in the paper:

Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data, CVPR 2022 [arXiv] by Corentin Sautier, Gilles Puy, Spyros Gidaris, Alexandre Boulch, Andrei Bursuc, and Renaud Marlet

Overview of the method

If you use SLidR in your research, please consider citing:

@InProceedings{SLidR,
    author    = {Sautier, Corentin and Puy, Gilles and Gidaris, Spyros and Boulch, Alexandre and Bursuc, Andrei and Marlet, Renaud},
    title     = {Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {9891-9901}
}

Dependencies

Please install the required required packages. Some libraries used in this project, including MinkowskiEngine and Pytorch-lightning are known to have a different behavior when using a different version; please use the exact versions specified in requirements.txt.

Datasets

The code provided is compatible with nuScenes and semantic KITTI. Put the datasets you intend to use in the datasets folder (a symbolic link is accepted).

Pre-trained models

Minkowski SR-UNet

SR-UNet pre-trained on nuScenes

SPconv VoxelNet

VoxelNet pre-trained on nuScenes

PV-RCNN finetuned on KITTI

Reproducing the results

Pre-computing the superpixels (required)

Before launching the pre-training, you first need to compute all superpixels on nuScenes, this can take several hours. You can either compute superpixels for the Minkowski SR-UNet (minkunet) or the voxelnet backbones. The first is adapted for semantic segmentation and the second for object detection.

python superpixel_segmenter.py --model minkunet

Pre-training a 3D backbone

To launch a pre-training of the Minkowski SR-UNet (minkunet) on nuScenes:

python pretrain.py --cfg config/slidr_minkunet.yaml

You can alternatively replace minkunet with voxelnet to pre-train a PV-RCNN backbone.
Weights of the pre-training can be found in the output folder, and can be re-used during a downstream task. If you wish to use multiple GPUs, please scale the learning rate and batch size accordingly.

Semantic segmentation

To launch a semantic segmentation, use the following command:

python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"

with the previously obtained weights, and any config file. The default config will perform a finetuning on 1% of nuScenes' training set, with the learning rates optimized for the provided pre-training.

To re-evaluate the score of any downstream network, run:

python evaluate.py --resume_path="output/downstream/[...]/model.pt" --dataset="nuscenes"

If you wish to reevaluate the linear probing, the experiments in the paper were obtained with lr=0.05, lr_head=null and freeze_layers=True.

Object detection

All experiments for object detection have been done using OpenPCDet.

Published results

All results are obtained with a pre-training on nuScenes.

Few-shot semantic segmentation

Results on the validation set using Minkowski SR-Unet:

Method nuScenes
lin. probing
nuScenes
Finetuning with 1% data
KITTI
Finetuning with 1% data
Random init. 8.1 30.3 39.5
PointContrast 21.9 32.5 41.1
DepthContrast 22.1 31.7 41.5
PPKT 36.4 37.8 43.9
SLidR 38.8 38.3 44.6

Semantic Segmentation on nuScenes

Results on the validation set using Minkowski SR-Unet with a fraction of the training labels:

Method 1% 5% 10% 25% 100%
Random init. 30.3 47.7 56.6 64.8 74.2
SLidR 39.0 52.2 58.8 66.2 74.6

Object detection on KITTI

Results on the validation set using Minkowski SR-Unet with a fraction of the training labels:

Method 5% 10% 20%
Random init. 56.1 59.1 61.6
PPKT 57.8 60.1 61.2
SLidR 57.8 61.4 62.4

Unpublished preliminary results

All results are obtained with a pre-training on nuScenes.

Results on the validation set using PV-RCNN:

Method Car Pedestrian Cyclist mAP@40
Random init. 84.5 57.9 71.3 71.3
STRL* 84.7 57.8 71.9 71.5
PPKT 83.2 55.5 73.8 70.8
SLidR 84.4 57.3 74.2 71.9

*STRL has been pre-trained on KITTI, while SLidR and PPKT were pre-trained on nuScenes

Results on the validation set using SECOND:

Method Car Pedestrian Cyclist mAP@40
Random init. 81.5 50.9 66.5 66.3
DeepCluster* 66.1
SLidR 81.9 51.6 68.5 67.3

*As reimplemented in ONCE

Visualizations

For visualization you need a pre-training containing both 2D & 3D models. We provide the raw SR-UNet & ResNet50 pre-trained on nuScenes. The image part of the pre-trained weights are identical for almost all layers to those of MoCov2 (He et al.)

The visualization code allows to assess the similarities between points and pixels, as shown in the article.

Acknowledgment

Part of the codebase has been adapted from PointContrast. Computation of the lovasz loss used in semantic segmentation follows the code of PolarNet.

License

SLidR is released under the Apache 2.0 license.

slidr-1's People

Contributors

csautier avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.