scrn-vrc / point-pillars-in-unitycg

Implementation of PointPillars paper in Unity for VRChat

License: MIT License

PointPillars in UnityCG

NOTE: This was built and tested with Unity 2019.4.31f1 using the built-in render pipeline; there may be shader compatibility issues with other versions.

Overview

A simple implementation of "PointPillars: Fast Encoders for Object Detection from Point Clouds" in Unity for VRChat, using only fragment and geometry shaders with no additional dependencies. This is more of an educational tool than a practical implementation: processing a single frame of input takes ~40 rendered frames while running at ~20 FPS, which works out to about 2 seconds per inference. The original runs at ~60 FPS in PyTorch.

Problems

Slow computations aside, actual lidar data also returns a reflectance value, which is one of the key features the network uses during classification.

The table above was lifted directly from the PointPillars paper. It shows how adding reflectance raises the average precision over using XYZ location information alone. This implementation supports reflectance as an input; however, I fixed it to a constant value of 0.15 because there is no easy way to estimate reflectance without extra information.

Setup

  1. Either clone the repo or download it from Release.
  2. Drag and drop the prefab in the Prefabs folder into the scene, or open the scene that came with the package.
  3. Run the network in Playmode.

The network outputs up to 100 predictions, but only 33 of the bounding boxes are shown. Invalid predictions are returned as -1. All 100 predictions are rendered into PointPillars\RenderTextures\Output3.renderTexture. You can read it in a shader by including PointPillars\Shaders\PointPillarsInclude.cginc.

Shader properties:

    Properties
    {
        _ControllerTex ("Controller", 2D) = "black" {}
        _DataTex ("Data Texture", 2D) = "black" {}
    }

PPControllerBuffer.renderTexture goes into _ControllerTex, Output3.renderTexture goes into _DataTex. PPControllerBuffer.renderTexture contains a count of all predictions.

Includes:

#include "PointPillarsInclude.cginc"

Functions:

int count = getCount(_ControllerTex);
float4 sizeRot = getPredictionSizeRotation(_DataTex, id);
float3 pos = getPredictionPosition(_DataTex, id);

id is the prediction index, ranging from 0 to 99. An example of how to draw the bounding boxes with these functions is in PointPillars\Display\BBoxes\BBoxDraw.shader.

C++ Code

The C++ code included with the repo is an exact CPU clone of how PointPillars would run on the GPU. No additional dependencies are required to compile it, but it runs very slowly.

Model Architecture

Figure from the original PointPillars paper. The network begins by voxelizing the lidar data into a 2D grid without bounding Z, hence the name "pillars". This serves to condense the sparse point cloud into a dense matrix. The matrix is then fed into a classic 2D CNN classifier as the backbone, and the network ends with a single-shot detector structure, like YOLOv4.
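To make the pillar idea concrete, here is a minimal CPU-side sketch of binning points into a 2D grid while ignoring Z. It is not taken from this repo's code: the `Point` and `PillarGrid` types, the function names, and any grid ranges or cell sizes you pass in are hypothetical and only serve as illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type: XYZ position plus reflectance.
struct Point { float x, y, z, r; };

// Hypothetical grid description. The values used with it are illustrative
// assumptions, not the ones used by this implementation.
struct PillarGrid {
    float xMin, yMin;     // lower corner of the covered XY area
    float cellSize;       // edge length of one pillar in the XY plane
    int   width, height;  // number of pillars along X and Y
};

// Returns the flat pillar index for a point, or -1 if it lies outside the grid.
// Z is deliberately ignored: all points above one XY cell share a pillar.
int pillarIndex(const PillarGrid& g, const Point& p) {
    assert(g.cellSize > 0.0f);
    const float fx = (p.x - g.xMin) / g.cellSize;
    const float fy = (p.y - g.yMin) / g.cellSize;
    // Written as a negated "inside" test so NaN coordinates are rejected too.
    const bool inside = fx >= 0.0f && fy >= 0.0f &&
                        fx < static_cast<float>(g.width) &&
                        fy < static_cast<float>(g.height);
    if (!inside) return -1;
    return static_cast<int>(fy) * g.width + static_cast<int>(fx);
}

// Groups point indices by pillar. Most pillars stay empty for real lidar
// sweeps, which is why the data is sparse before it is condensed.
std::vector<std::vector<std::size_t>> buildPillars(const PillarGrid& g,
                                                   const std::vector<Point>& pts) {
    std::vector<std::vector<std::size_t>> pillars(
        static_cast<std::size_t>(g.width) * static_cast<std::size_t>(g.height));
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const int idx = pillarIndex(g, pts[i]);
        if (idx >= 0) pillars[static_cast<std::size_t>(idx)].push_back(i);
    }
    return pillars;
}
```

Two points with the same XY cell but very different heights end up in the same pillar, which is the property the name refers to.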

GPU Implementation

The GPU implementation for VRChat consists of 40+ cameras rendering to about 1 GB of render textures. Moving points into pillar voxels required a bitonic merge sort and d4rk's compact sparse texture code. One million particles were used in a geometry shader to scatter the extracted features into the dense matrix for the 2D CNN.
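For readers unfamiliar with bitonic sorting, the sketch below shows the standard compare-and-swap network on the CPU. It is a generic textbook version, not the shader code from this repo, and `bitonicSort` is a hypothetical name. The technique suits fragment shaders because each `(k, j)` pass is a fixed pattern in which every element only looks at one partner, so a pass could plausibly be expressed as one full-texture render.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `a` in ascending order with a bitonic sorting network.
// The length must be a power of two, as is usual for this algorithm.
void bitonicSort(std::vector<float>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // k: size of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j: distance between the two elements compared in this pass.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            // On a GPU this inner loop would run in parallel, one element per thread or pixel.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    // Blocks alternate between ascending and descending order.
                    const bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}
```

Sorting points by a pillar key in this way places points of the same pillar next to each other, which is one way a sort can help with grouping.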

PointPillars produces 321408 predictions towards the end. Because most of these outputs are 0, they can be filtered with d4rk's compact sparse texture method into a 32x32 render texture and sorted again.
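The filtering step can be pictured as a stream compaction followed by a sort. The sketch below is a sequential CPU stand-in and does not reproduce d4rk's GPU technique. The `Candidate` type, the function name, the "score greater than zero" test, and the choice to drop candidates beyond the 32x32 = 1024 slots are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: which raw prediction a score came from.
struct Candidate {
    int   index;
    float score;
};

// Keeps only non-zero scores, up to `capacity` entries (32x32 by default,
// matching the render texture size mentioned above), then sorts them so the
// highest score comes first.
std::vector<Candidate> compactAndSort(const std::vector<float>& scores,
                                      std::size_t capacity = 32 * 32) {
    assert(capacity > 0);
    std::vector<Candidate> out;
    out.reserve(capacity);

    // Compaction: copy the sparse non-zero entries into a dense list.
    for (std::size_t i = 0; i < scores.size() && out.size() < capacity; ++i) {
        if (scores[i] > 0.0f) {
            out.push_back({static_cast<int>(i), scores[i]});
        }
    }

    // Sort descending by score; stable_sort keeps the original order for ties.
    std::stable_sort(out.begin(), out.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.score > b.score;
                     });
    return out;
}
```

The point of compacting first is that the later sort and suppression steps only touch about a thousand entries instead of several hundred thousand.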

The Non-maximum Suppression (NMS) step used at the end to remove duplicate bounding boxes was simplified in the GPU implementation because the pipeline was already slow. Instead of the normal Rotation-robust Intersection over Union for 3D Object Detection, I simply used a sphere intersection check based on the shortest XY length of the bounding box. The original rotation-robust intersection implementation can be seen in the C++ code.
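A greedy NMS with a sphere overlap test might look roughly like the sketch below. This is one reading of the description above, not the repo's shader code: the `Box` layout, the function names, and in particular the choice of half the shorter XY side as the sphere radius are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical box layout: center, XY footprint (w, l), height, and score.
struct Box {
    float x, y, z;
    float w, l, h;
    float score;
};

// Assumed radius: half of the shorter XY side of the box.
float sphereRadius(const Box& b) {
    assert(b.w >= 0.0f && b.l >= 0.0f);
    return 0.5f * std::min(b.w, b.l);
}

// Two spheres intersect when their centers are closer than the sum of radii.
// Squared distances are compared to avoid a square root.
bool spheresOverlap(const Box& a, const Box& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    const float r  = sphereRadius(a) + sphereRadius(b);
    return dx * dx + dy * dy + dz * dz < r * r;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only
// if its sphere does not intersect the sphere of any box already kept.
std::vector<Box> sphereNms(std::vector<Box> boxes) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (spheresOverlap(candidate, k)) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

The test ignores rotation entirely, which is what makes it cheap. The trade-off is that long, thin boxes lying side by side can be treated differently than they would be under a rotation-aware IoU.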

Resources

Datasets

Thanks to d4rkpl4y3r for the Compact Sparse Texture Demo that made this possible.

If you have questions or comments, you can reach me on Discord: SCRN#8008 or Twitter: https://twitter.com/SCRNinVR
