[PAMI 2022, CVPR 2023] ASH: Parallel Spatial Hashing for Fast Scene Reconstruction

Home Page: https://dongwei.info/publication/ash-mono/

License: MIT License

torch-ash

torch-ash is the missing piece of collision-free, extendable parallel spatial hashing for torch modules. It includes the core implementations of two papers:

[PAMI 2022] | [CVPR 2023]

@article{dong2022ash,
  title={ASH: A modern framework for parallel spatial hashing in 3D perception},
  author={Dong, Wei and Lao, Yixing and Kaess, Michael and Koltun, Vladlen},
  journal={PAMI},
  year={2022},
}

@inproceedings{dong2023ash-mono,
  title={Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids},
  author={Dong, Wei and Choy, Chris and Loop, Charles and Zhu, Yuke and Litany, Or and Anandkumar, Anima},
  booktitle={CVPR},
  year={2023},
}

Note: for a more user-friendly interface and further extensions, I have rewritten everything from scratch in this repo. Discrepancies from the results reported in the papers above are expected. Updates and more examples will come.

Install

First, install PyTorch. Optionally install nerfacc for volume rendering.

cmake is required in the conda environment for compiling the source code.

git clone --recursive git@github.com:theNded/torch-ash.git
cd torch-ash
pip install . --verbose

Engine (ASH)

  • The core is ASHEngine, a PyTorch module implementing a parallel, collision-free, dynamic hash map from coordinates (torch.IntTensor) to indices (torch.LongTensor). It depends on stdgpu.
  • On top of ASHEngine sit HashSet and HashMap, both wrappers around ASHEngine. A HashSet maps a coordinate to a boolean value and is usually used for the unique operation. A HashMap maps a coordinate to a dictionary of values and allows fast insertion of and access to coordinate-value pairs.
  • Similar to HashMap, HashEmbedding maps coordinates to embeddings and is akin to torch.nn.Embedding.
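The coordinate-to-index contract described in these bullets can be pictured with a small CPU-only model. This is a minimal sketch, not ASHEngine itself: the class name `ToyCoordinateMap`, the use of -1 for missing keys, and raising on overflow (rather than growing) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


class ToyCoordinateMap:
    """Dictionary-backed stand-in for a coordinate -> index map (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.index_of: Dict[Coord, int] = {}

    def insert(self, keys: List[Coord]) -> List[int]:
        """Give each new key the next free index; existing keys keep theirs."""
        indices = []
        for key in keys:
            if key not in self.index_of:
                if len(self.index_of) >= self.capacity:
                    # Simplification: this toy does not grow its storage.
                    raise RuntimeError("capacity exceeded")
                self.index_of[key] = len(self.index_of)
            indices.append(self.index_of[key])
        return indices

    def find(self, keys: List[Coord]) -> Tuple[List[int], List[bool]]:
        """Return (indices, masks); the index is -1 where the key is absent."""
        indices = [self.index_of.get(key, -1) for key in keys]
        masks = [index >= 0 for index in indices]
        return indices, masks


toy = ToyCoordinateMap(capacity=100)
inserted = toy.insert([(0, 0, 0), (1, 2, 3), (0, 0, 0)])
found, mask = toy.find([(1, 2, 3), (9, 9, 9)])
print(inserted, found, mask)
```

Because every distinct coordinate receives its own index, values can live in plain arrays addressed by that index, which is what makes index-based access to the stored values possible.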

Usage

import torch
from ash import HashMap  # assumes the installed package is importable as `ash`

hashmap = HashMap(key_dim=3, value_dims={"color": 3, "depth": 1}, capacity=100, device=torch.device("cuda:0"))

# To insert
keys = (torch.rand(10, 3) * 100).int().cuda()
values = {"color": torch.rand(10, 3).float().cuda(), "depth": torch.rand(10, 1).float().cuda()}
hashmap.insert(keys, values)

# To query
query_keys = (torch.rand(10, 3) * 100).int().cuda()
indices, masks = hashmap.find(query_keys)

# To enumerate
all_indices, all_values = hashmap.items(return_indices=True, return_values=True)

SparseDenseGrids for Surface Reconstruction

SparseDenseGrid is the engine for direct/neural scene representation. It consists of sparse arrays of grids and dense arrays of cells. The idea is similar to Instant-NGP and Plenoxels, but precise sparsity is achieved through spatial initialization and collision-free hashing. Essentially it is a modern version of VoxelHashing.
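To make the sparse-grid / dense-cell split concrete, here is a minimal sketch of how a metric point could be decomposed into a sparse grid coordinate and a dense cell coordinate. The function name `decompose` and the floor-based convention are assumptions for illustration; the library's actual conventions (origin, offsets, ordering) may differ.

```python
import math
from typing import Sequence, Tuple

IntCoord = Tuple[int, ...]


def decompose(point: Sequence[float], cell_size: float, grid_dim: int) -> Tuple[IntCoord, IntCoord]:
    """Split a metric point into (grid_coord, cell_coord).

    The global integer cell coordinate is floor(point / cell_size).
    Floor division by grid_dim selects the sparse grid block, and the
    remainder selects the cell inside that block's dense array.
    """
    cell = [math.floor(p / cell_size) for p in point]
    grid_coord = tuple(c // grid_dim for c in cell)
    cell_coord = tuple(c % grid_dim for c in cell)
    return grid_coord, cell_coord


# Python's // and % round toward negative infinity, so negative
# coordinates still land in a valid cell range [0, grid_dim).
grid_coord, cell_coord = decompose((0.125, -0.035, 0.507), cell_size=0.01, grid_dim=8)
print(grid_coord, cell_coord)
```

Only the grid coordinate needs to go through the hash map; the cell coordinate is a direct offset into a dense array, which keeps per-cell lookups cheap.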

It has two wrappers for coordinate transforms: UnboundedSparseDenseGrid for metric scenes that may grow dynamically, and BoundedSparseDenseGrid for scenes bounded in unit cubes. Trilinear interpolation and double backward are implemented to support differentiable gradient computation. All these modules can be converted to and from state dicts by serializing the underlying hash map.
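As a reminder of what trilinear interpolation computes, the following is a plain-Python sketch over a dictionary of per-corner scalars. It is not the library's CUDA implementation; `trilinear_weights` and `interpolate` are hypothetical helper names, and positions are assumed to be expressed in cell units.

```python
import math
from itertools import product


def trilinear_weights(position):
    """Return [(corner, weight)] for the 8 integer corners around `position`."""
    base = [math.floor(p) for p in position]
    frac = [p - b for p, b in zip(position, base)]
    result = []
    for offset in product((0, 1), repeat=3):
        weight = 1.0
        for f, o in zip(frac, offset):
            # Along each axis the far corner gets f, the near corner 1 - f.
            weight *= f if o == 1 else 1.0 - f
        corner = tuple(b + o for b, o in zip(base, offset))
        result.append((corner, weight))
    return result


def interpolate(values, position):
    """Blend the scalars stored at the 8 surrounding corners."""
    return sum(w * values[c] for c, w in trilinear_weights(position))


# A field that is linear in x, y, z is reproduced exactly.
field = {c: c[0] + 2 * c[1] + 3 * c[2] for c in product(range(3), repeat=3)}
weights = trilinear_weights((0.5, 1.25, 0.75))
value = interpolate(field, (0.5, 1.25, 0.75))
print(value)
```

The weights are products of terms that are linear in the query position, so the interpolated value has a well-defined gradient with respect to that position inside each cell.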

The SparseDenseGrid does a good job without an MLP in fast reconstruction tasks (e.g. RGB-D fusion, differentiable volume rendering with a decent initialization), but with an MLP there seems to be no advantage over Instant-NGP as of now. Potential extensions in this line are still in progress.

Demo: RGB-D fusion [PAMI 22]

RGB-D fusion takes in posed RGB-D images and creates a colorized mesh, both raw and filtered. Here, depth can either be sensor depth or depth generated by a monocular depth prediction model (e.g. omnidata), with scales calibrated via COLMAP. Example datasets can be downloaded at Google Drive. Instructions for custom datasets will be available soon.

These datasets are organized as follows:

- image/ # for RGB images [jpg|png]
- depth/ # for sensor depth [optional, png]
- omni_depth/ # for learned depth generated from RGB [npy]
- depth_scales.txt # calculated between learned depth and SfM
- omni_normal/ # for learned normals generated from RGB [optional, npy]
- poses.txt
- intrinsic.txt
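Before running a demo on a custom folder, it can help to check that the layout matches the listing. The sketch below is a hypothetical helper, not part of the repository; it treats every entry not marked optional in the listing as required, which is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List

# Entries the listing does not mark as optional (treated as required here,
# which is an assumption of this sketch).
REQUIRED = ["image", "omni_depth", "depth_scales.txt", "poses.txt", "intrinsic.txt"]


def missing_entries(root: str) -> List[str]:
    """Return the required entries that are absent under `root`."""
    base = Path(root)
    return [name for name in REQUIRED if not (base / name).exists()]


# Demo on a throwaway directory that only contains image/ and poses.txt.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "image").mkdir()
    (Path(tmp) / "poses.txt").touch()
    report = missing_entries(tmp)

print(report)
```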

To run the demo,

# Unbounded scenes, sensor depth
python demo/rgbd_fusion.py --path /path/to/dataset/samples --voxel_size 0.015 --depth_type sensor

# Bounded scenes, learned depth
python demo/rgbd_fusion.py --path /path/to/dataset/samples --resolution 512 --depth_type learned

Demo: surface refinement [CVPR 23]

With learned depth, the fusion result is usually noisy. We can apply volume rendering to further optimize the shape:

python demo/train_scene_recon.py --path /path/to/dataset/samples --voxel_size 0.015 --depth_type learned

We start with a local 7x7x7 Gaussian filter to smooth the initialization.
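For reference, a 7x7x7 Gaussian filter can be sketched in plain Python as follows. The standard deviation `sigma=1.0` and the renormalization over cells that actually exist (to cope with sparsity) are assumptions of this sketch; the source does not state which values or boundary handling the demo uses.

```python
import math
from itertools import product


def gaussian_kernel_3d(size=7, sigma=1.0):
    """Return {(dx, dy, dz): weight} for a normalized size^3 Gaussian."""
    half = size // 2
    offsets = range(-half, half + 1)
    kernel = {
        o: math.exp(-(o[0] ** 2 + o[1] ** 2 + o[2] ** 2) / (2.0 * sigma ** 2))
        for o in product(offsets, repeat=3)
    }
    total = sum(kernel.values())
    return {o: w / total for o, w in kernel.items()}


def smooth_at(values, center, kernel):
    """Weighted average around `center`, renormalized over existing cells."""
    num = den = 0.0
    for (dx, dy, dz), w in kernel.items():
        key = (center[0] + dx, center[1] + dy, center[2] + dz)
        if key in values:
            num += w * values[key]
            den += w
    return num / den if den > 0.0 else None


kernel = gaussian_kernel_3d()
# Smoothing a constant field should leave it unchanged.
constant = {c: 2.0 for c in product(range(-3, 4), repeat=3)}
smoothed = smooth_at(constant, (0, 0, 0), kernel)
print(len(kernel), smoothed)
```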

Volume rendering follows the initialization. Results are written to logs/datetime. Every 500 iterations, a mesh is extracted and stored. The optimization starts with ripples on the surfaces but eventually converges to smooth reconstructions.

API Usage

Here is a brief summary of basic usage; documentation will be online soon.

Allocation

We first initialize a 3D sparse-dense grid with 10000 sparse grid blocks. Each sparse grid block contains a dense array of grid_dim^3 cells (16^3 = 4096 for the values below), and each cell is 0.01 m wide.

grid = UnboundedSparseDenseGrid(in_dim=3,
                                num_embeddings=10000,
                                grid_dim=16,
                                embedding_dims=8,
                                cell_size=0.01)
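A quick back-of-the-envelope calculation shows what such an allocation implies. This sketch assumes the embeddings are stored densely as float32 with one vector per cell, which the source does not state; `embedding_footprint` is a hypothetical helper, and the numbers are taken from the constructor call above.

```python
def embedding_footprint(num_embeddings, grid_dim, embedding_dims,
                        in_dim=3, bytes_per_value=4):
    """Return (cells per grid block, total stored values, total bytes)."""
    cells_per_grid = grid_dim ** in_dim
    num_values = num_embeddings * cells_per_grid * embedding_dims
    return cells_per_grid, num_values, num_values * bytes_per_value


cells, values, nbytes = embedding_footprint(
    num_embeddings=10000, grid_dim=16, embedding_dims=8)
block_extent = 16 * 0.01  # metric side length of one grid block, in meters

print(cells, values, nbytes, block_extent)
```

Under these assumptions the grid holds 4096 cells per block and roughly 1.3 GB of float32 values, so `num_embeddings` and `grid_dim` are the parameters to watch when GPU memory is tight.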

Initialization

We then spatially initialize the grid at input points (e.g. obtained point cloud, RGB-D scans). This results in coordinates and indices that support index-based access.

with torch.no_grad():
    grid_coords, cell_coords, grid_indices, cell_indices = grid.spatial_init_(points)

    # [Optional] direct assignment
    grid.embeddings[grid_indices, cell_indices] = attributes

Optimization

As a PyTorch extension, first and second-order autodiff are enabled by a differentiable query.

optim = torch.optim.SGD(grid.parameters(), lr=1e-3)
for x, gt in batch:
    optim.zero_grad()
    x.requires_grad_(True)
    embedding, mask = grid(x, interpolation="linear")

    output = forward_fn(embedding, mask)

    doutput_dx = torch.autograd.grad(
        outputs=output,
        inputs=x,
        grad_outputs=torch.ones_like(output, requires_grad=False),
        create_graph=True,
        retain_graph=True)[0]

    (loss_fn(output) + grad_loss_fn(doutput_dx)).backward()
    optim.step()

Milestones

  • Initial release
  • Demo: RGB-(pseudo)D SDF fusion
  • Demo: SDF refinement from volume rendering
  • Better instructions and documentation
  • Demo: LiDAR SDF fusion
  • Demo: MLP integration
  • CPU counterpart
