
DeepViewAgg [CVPR 2022 Oral]


Official repository for the paper Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation 📄, selected for an oral presentation at CVPR 2022.

We propose to exploit the synergy between images and 3D point clouds by learning to select the most relevant views for each point. Our approach uses the viewing conditions of 3D points to merge features from images taken at arbitrary positions. We reach state-of-the-art results on S3DIS (74.7 mIoU 6-fold) and KITTI-360 (58.3 mIoU) without requiring point colorization, meshing, or depth cameras: our full pipeline only requires raw 3D scans and a set of images and poses.

Coming very soon 🚨 🚧

  • pretrained weights from our best-performing model on S3DIS and KITTI-360
  • wandb logs of our experiments

Change log

  • 2022-04-20 Added notebooks and scripts to get started with DeepViewAgg

Requirements 📝

The following must be installed before installing this project.

  • Anaconda3
  • cuda >= 10.1
  • gcc >= 7

All remaining dependencies (PyTorch, PyTorch Geometric, etc.) should be installed using the provided installation script.
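As a quick sanity check before installing, you can verify these toolchains from a terminal (a minimal sketch; adapt the commands to your setup):

    gcc --version      # expect gcc >= 7
    nvcc --version     # expect CUDA >= 10.1
    conda --version    # Anaconda3 should be on your PATH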

The code has been tested in the following environment:

  • Ubuntu 18.04.6 LTS
  • Python 3.8.5
  • PyTorch 1.7.1
  • CUDA 10.2, 11.2 and 11.4
  • 64G RAM

Installation 🧱

To install DeepViewAgg, simply run ./install.sh from inside the repository.

  • You will need to have sudo rights to install MinkowskiEngine and TorchSparse dependencies.
  • ⚠️ Do not install Torch-Points3D from the official repository, or with pip.
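A minimal end-to-end sketch, assuming the requirements above are already in place (the clone URL is the upstream repository referenced in the citation below; substitute your own fork if needed):

    git clone https://github.com/drprojects/DeepViewAgg.git
    cd DeepViewAgg
    ./install.sh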

Disclaimer

This is not the official Torch-Points3D framework. This work builds on and modifies a fixed version of the framework and has not been merged into the official repository yet. In particular, this repository introduces numerous features for multimodal learning on large-scale 3D point clouds. Some TP3D-specific files were removed for simplicity.

Project structure

The project follows the original Torch-Points3D framework structure.

├─ conf                    # All configurations live there
├─ notebooks               # Notebooks to get started with multimodal datasets and models
├─ eval.py                 # Eval script
├─ install.sh              # Installation script for DeepViewAgg
├─ scripts                 # Some scripts to help manage the project
├─ torch_points3d
│   ├─ core                # Core components
│   ├─ datasets            # All code related to datasets
│   ├─ metrics             # All metrics and trackers
│   ├─ models              # All models
│   ├─ modules             # Basic modules that can be used in a modular way
│   ├─ utils               # Various utils
│   └─ visualization       # Visualization
└─ train.py                # Main script to launch a training

Several changes were made to extend the original project to multimodal learning on point clouds with images. The most important ones are the following:

  • conf/data/segmentation/multimodal: configs for the 3D+2D datasets.
  • conf/models/segmentation/multimodal: configs for the 3D+2D models.
  • torch_points3d/core/data_transform/multimodal: transforms for 3D+2D data.
  • torch_points3d/core/multimodal: multimodal data and mapping objects.
  • torch_points3d/datasets/segmentation/multimodal: 3D+2D datasets (e.g. S3DIS, ScanNet, KITTI-360).
  • torch_points3d/models/segmentation/multimodal: 3D+2D architectures.
  • torch_points3d/modules/multimodal: 3D+2D modules. This is where the DeepViewAgg module can be found.
  • torch_points3d/visualization/multimodal_data.py: tools for interactive visualization of multimodal data.

Getting started 🚀

Notebook to create a synthetic toy dataset and get familiar with the construction of 2D-3D mappings:

  • notebooks/synthetic_multimodal_dataset.ipynb

Notebooks to create each dataset, get familiar with dataset configuration, and produce interactive visualizations:

  • notebooks/kitti360_visualization.ipynb (at least 350G of memory 💾)
  • notebooks/s3dis_visualization.ipynb (at least 400G of memory 💾)
  • notebooks/scannet_visualization.ipynb (at least 1.3T of memory 💾)

Notebooks to create multimodal models, get familiar with model configuration and run forward and backward passes for debugging:

  • notebooks/multimodal_model.ipynb

Notebooks to run full inference on multimodal datasets from a model checkpoint:

  • notebooks/kitti360_inference.ipynb
  • notebooks/s3dis_inference.ipynb
  • notebooks/scannet_inference.ipynb
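These are standard Jupyter notebooks, so once the environment set up by install.sh is active they can be launched as usual, for example (assuming Jupyter is available in the environment):

    jupyter notebook notebooks/synthetic_multimodal_dataset.ipynb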

Scripts to replicate our paper's best results 📈 for each dataset:

  • scripts/train_kitti360.sh
  • scripts/train_s3dis.sh
  • scripts/train_scannet.sh
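For instance, a run replicating the S3DIS result would look like the following (a sketch: check the script first, as dataset paths and GPU settings may need to be edited):

    bash scripts/train_s3dis.sh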

If you need to go deeper into this project, see the Documentation section.

If you have trouble using these or need to reproduce other results from our paper, create an issue or leave me a message 💬!

Documentation 📚

The official documentation of PyTorch Geometric and Torch-Points3D is a good starting point, since this project largely builds on top of these frameworks. For DeepViewAgg-specific features (i.e. everything that concerns multimodal learning), the provided code is commented as much as possible, but hit me up 💬 if some parts need clarification.

Visualization of multimodal data 🔭

We provide code to produce interactive and shareable HTML visualizations of multimodal data and point-image mappings:

Examples of such visualizations produced on S3DIS Fold 5 are zipped here and can be opened in your browser.

Credits 💳

  • This implementation of DeepViewAgg relies heavily on the Torch-Points3D framework, although it has not been merged into the official project at this point.
  • For datasets, some code from the official KITTI-360 and ScanNet repositories was used.

Reference

If you use all or part of the present code, please include a citation to the following paper:

@inproceedings{robert2022dva,
  title={Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation},
  author={Robert, Damien and Vallet, Bruno and Landrieu, Loic},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022},
  url = {https://github.com/drprojects/DeepViewAgg}
}
