
complex_yolo_3d's Introduction

Complex YOLO with Uncertainty

Deep Learning Project

Wei Luo and Yuanchu Dang

Our repo contains a PyTorch implementation of the Complex YOLO model with uncertainty for object detection in 3D.
Our code is inspired by and builds on two existing implementations: an implementation of 2D YOLO and a sample Complex YOLO implementation.
Our further contributions are as follows:

  1. Added dropout layers and incorporated uncertainty into 3D object detection while preserving average precision.
  2. Projected predictions to 3D using homography.
  3. Attempted to add innovative loss terms to improve the model in cases when it predicts overlapping bounding boxes.
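One common way to turn dropout into an uncertainty estimate is to keep dropout active at inference time, run several stochastic forward passes, and look at how much the predicted box parameters vary. The README does not state the exact procedure used in this repo, so the following is only a minimal sketch under that assumption. The function name `summarize_passes` and the `(x, y, w, l)` box layout are hypothetical, and the numbers are made up for illustration.

```python
from statistics import mean, pvariance


def summarize_passes(passes):
    """Summarize repeated stochastic predictions for one box.

    passes: list of T predictions, each a list of box parameters
            (all of the same length), one per forward pass.
    Returns (means, variances), one entry per box parameter.
    """
    if not passes:
        raise ValueError("need at least one forward pass")
    # Transpose so that each column holds one parameter across all passes.
    columns = list(zip(*passes))
    means = [mean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


# Three hypothetical stochastic passes for one box: (x, y, w, l).
passes = [
    [10.0, 5.0, 2.0, 4.0],
    [10.2, 5.0, 2.0, 4.2],
    [9.8, 5.0, 2.0, 3.8],
]
means, variances = summarize_passes(passes)
```

A parameter whose variance is zero was predicted identically in every pass; a larger variance suggests the model is less certain about that parameter.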

Data

To run the model, you need to download and unzip the following data:

You also need to set the dataset path by modifying the following line from main.py:

dataset = KittiDataset(root='/Users/yuanchu/columbia/deep_learning/project/milestone/YOLO3D/data',set='train')
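To avoid editing a hard-coded absolute path, the dataset root could be resolved from an environment variable with a relative fallback. This is a sketch, not part of the repo: the helper `resolve_dataset_root` and the variable name `KITTI_ROOT` are assumptions introduced here for illustration.

```python
import os


def resolve_dataset_root(default="data", env_var="KITTI_ROOT"):
    """Return an absolute dataset root.

    Uses the environment variable if it is set, otherwise falls back to
    `default`, interpreted relative to the current working directory.
    """
    root = os.environ.get(env_var, default)
    return os.path.abspath(os.path.expanduser(root))


root = resolve_dataset_root()
# In main.py this value would then be passed to the repo's dataset class:
# dataset = KittiDataset(root=root, set='train')
```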

The following is a visualization of a sample image and its corresponding velodyne point cloud.

Network Architecture

Training

These three lines in kitti.py should be modified to match your own paths:

def __init__(self, root = '/Users/yuanchu/',set='train',type='velodyne_train'):

You also need a train.txt file that lists the filenames of the images you want in the training set, one image per line. See the sample file in this repo.
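A split file like this can be generated from the contents of the velodyne folder. The sketch below assumes each line holds a file stem without extension (for example `000000`); check this against the sample train.txt in the repo before using it, since the expected format is not spelled out here. The function name `write_split_file` is hypothetical.

```python
import os


def write_split_file(velodyne_dir, out_path):
    """Write one file stem per line for every .bin scan in velodyne_dir.

    Returns the sorted list of stems that were written.
    """
    stems = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(velodyne_dir)
        if name.endswith(".bin")
    )
    with open(out_path, "w") as f:
        for stem in stems:
            f.write(stem + "\n")
    return stems
```

For example, `write_split_file('data/training/velodyne', 'train.txt')` would list every scan in that folder; to train on a subset, filter the returned stems before writing.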

Testing

In eval.py, there is a block that begins with the following:

for file_i in range(1):
	test_i = str(file_i).zfill(6)
	cur_dir = os.getcwd()	
	lidar_file = cur_dir + '/data/training/velodyne/'+test_i+'.bin'
	calib_file = cur_dir + '/data/training/calib/'+test_i+'.txt'
	label_file = cur_dir + '/data/training/label_2/'+test_i+'.txt'

You need to change the number in range(1) to the number of files that you want to put in the test set.
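The same path construction can be wrapped in a small helper so that the number of test files is the only value to change. This is a sketch rather than the code in eval.py: the helper `kitti_sample_paths` and the variable `num_test_files` are hypothetical names, and the folder layout is taken from the block above.

```python
import os


def kitti_sample_paths(index, base_dir=None):
    """Build the lidar, calibration, and label paths for one sample index."""
    base_dir = os.getcwd() if base_dir is None else base_dir
    sample_id = str(index).zfill(6)  # e.g. 7 -> '000007'
    training = os.path.join(base_dir, "data", "training")
    return {
        "lidar": os.path.join(training, "velodyne", sample_id + ".bin"),
        "calib": os.path.join(training, "calib", sample_id + ".txt"),
        "label": os.path.join(training, "label_2", sample_id + ".txt"),
    }


num_test_files = 3  # hypothetical: number of files in the test set
all_paths = [kitti_sample_paths(file_i) for file_i in range(num_test_files)]
```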

For each test file, the model makes predictions and outputs a point cloud image, which is saved using:

misc.imsave('eval_bv'+test_i+'.png',img)

Generating Results

The heat folder and project folder contain code for generating heatmaps and 3D projections, respectively.

The heatmap script loads a saved .npy file containing bounding box predictions and a .png file of the corresponding road image. Running it requires a Plotly account, because the script uploads the resulting image to Plotly; change the configuration inside the script accordingly.

The projection script loads saved .npy files containing target and prediction boxes, along with the original road image and the corresponding velodyne point cloud with target and prediction boxes drawn. It also needs predefined heights and fine-tuned homography anchor points to produce an accurate 3D projection.
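For readers unfamiliar with homographies: a homography is a 3x3 matrix that maps points from one plane to another (here, from the bird's-eye-view plane to the road image) using homogeneous coordinates. The sketch below shows only the mapping step and assumes the matrix is already known; how the repo's script estimates it from the anchor points is not shown here. The function name `apply_homography` and the example matrix are illustrative, not taken from the project folder.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography H (nested lists, row-major).

    Each point (x, y) is lifted to (x, y, 1), multiplied by H, and then
    divided by the third coordinate to return to 2D.
    """
    mapped = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        mapped.append((u / w, v / w))
    return mapped


# Illustrative matrix: scale by 2, then shift by (10, 20).
H = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, 20.0],
    [0.0, 0.0, 1.0],
]
box_corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image_corners = apply_homography(H, box_corners)
```

Applying the same mapping to the four corners of each predicted box gives its footprint in the image; the predefined heights mentioned above would then be needed to extrude that footprint into a 3D box.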

Sample Results

Below are sample velodyne point clouds with box predictions, along with the corresponding heatmaps that show our model's confidence.

Below is a comparison of average precision between original Complex YOLO and our Complex YOLO with uncertainty.

You may refer to either our report or poster for more details.

Future Work

For future work, we can train the model directly on labeled 3D data so that it makes 3D predictions without relying on homography, which would also let us visualize uncertainty in 3D. We can also attempt to extend other models, such as Fast R-CNN, to 3D. Yet another direction would be to extend to 4D, as just presented at NeurIPS 2018: YOLO 4D!
