CodedVO: Coded Visual Odometry

Official implementation of "CodedVO: Coded Visual Odometry" accepted in IEEE Robotics and Automation Letters, 2024.

Project page | IEEE Xplore | arXiv

Example of coded aperture setup

Video

CodedVO Video

Citation

If you use this code in your research, please cite:

@ARTICLE{codedvo2024,
  author={Shah, Sachin and Rajyaguru, Naitri and Singh, Chahat Deep and Metzler, Christopher and Aloimonos, Yiannis},
  journal={IEEE Robotics and Automation Letters}, 
  title={CodedVO: Coded Visual Odometry}, 
  year={2024},
  doi={10.1109/LRA.2024.3416788}}

Table of Contents

  1. Introduction
  2. Installation
  3. Models
  4. Dataset
  5. Training
  6. Evaluation
  7. Usage
  8. Contributions

Introduction

  • A novel method for estimating monocular visual odometry that leverages RGB and metric depth estimates obtained through a phase mask on a standard 1-inch camera sensor.
  • A depth-weighted loss function designed to prioritize learning depth maps at closer distances.
  • Zero-shot evaluation on indoor scenes without requiring any scale alignment.
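The exact loss formulation is given in the paper and configured in config.py; as a rough illustration only, a depth-weighted L1 loss that emphasizes closer pixels could be sketched as below (the linear weighting and function name are assumptions, not the repository's implementation):

```python
import numpy as np

def depth_weighted_l1(pred, target, max_depth=6.0):
    """Illustrative depth-weighted L1 loss: pixels at closer depths
    receive larger weights, so near-field depth accuracy is
    prioritized. (Hypothetical linear weighting; see the paper and
    config.py for CodedVO's actual formulation.)"""
    # weight decays linearly from 1 at depth 0 to 0 at max_depth
    weight = 1.0 - np.clip(target / max_depth, 0.0, 1.0)
    return float(np.mean(weight * np.abs(pred - target)))
```

With this weighting, the same absolute error costs more at 1.5 m than at 5.5 m, which is the intended near-field emphasis.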

Installation

Clone Repository

git clone https://github.com/naitri/CodedVO
cd CodedVO

Environment Setup

 conda env create -f environment.yml

Models

Download Pre-trained Models

We provide our metric depth-weighted loss pre-trained model, which has been benchmarked on various indoor datasets. Download Pre-trained Models

Dataset

Download and Setup

We provide the training dataset, which includes the UMD-CodedVO LivingRoom scene and NYU data, each containing 1000 images, along with the corresponding coded blur RGB images.

Additionally, we provide the UMD-CodedVO dataset, which includes ground-truth depth, RGB images, coded blur RGB images, and trajectory information.

Dataset Structure

├── README.md
├── datasets
│   ├── nyu_data
│   │   ├── rgb
│   │   ├── depth
│   │   └── Codedphasecam-27Linear
│   └── ...
├── scripts
│   └── ...
└── weights
    └── ...

Generate Coded Images

To generate coded blur RGB images from your own data, you can use the script coded_generator.py.

cd scripts
python coded_generator.py --root /path/to/your/data --scale_factor YOUR_SCALE_FACTOR
  • The scale factor is 1000 for the NYUv2 dataset, 1 for the UMD-CodedVO dataset, and 5000 for the ICL-NUIM dataset.
  • The root path should be a folder containing rgb, depth, and Codedphasecam-27Linear, e.g. ./datasets/nyu_data.
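As a sketch of how such scale factors are conventionally applied (the helper and dataset keys below are hypothetical, not part of the repository): raw stored depth values divided by the dataset's scale factor yield metric depth in meters.

```python
import numpy as np

# Hypothetical mapping from dataset name to scale factor, using the
# values listed above; the repository passes this via --scale_factor.
SCALE_FACTORS = {"nyuv2": 1000.0, "umd_codedvo": 1.0, "icl_nuim": 5000.0}

def raw_depth_to_meters(raw_depth, dataset):
    """Convert raw stored depth values to meters by dividing by the
    per-dataset scale factor (illustrative helper)."""
    return np.asarray(raw_depth, dtype=np.float32) / SCALE_FACTORS[dataset]
```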

Note: Our Point Spread Functions (PSFs) correspond to discretized depth layers generated with a 23×23 Zernike-parameterized phase mask. The depth range is discretized into 27 bins over the interval [0.5, 6] meters, with a focal distance of 85 cm.
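To illustrate the binning described in the note, a minimal sketch that maps metric depth to one of the 27 PSF layers, assuming linear spacing over [0.5, 6] m (the repository's actual spacing, e.g. linear vs. inverse-depth, may differ):

```python
import numpy as np

NUM_BINS, D_MIN, D_MAX = 27, 0.5, 6.0  # values from the note above

def depth_to_psf_bin(depth_m):
    """Map metric depth to one of the 27 discrete PSF layers over
    [0.5, 6] m, clamping depths outside the range to the nearest bin.
    (Linear spacing assumed here for illustration only.)"""
    idx = (np.asarray(depth_m) - D_MIN) / (D_MAX - D_MIN) * (NUM_BINS - 1)
    return np.clip(np.round(idx), 0, NUM_BINS - 1).astype(int)
```

Each depth layer is then convolved with its bin's PSF and the layers are composited to form the coded blur image.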

Training

Train from Scratch

To train on your own data or on the provided dataset:

 python trainer.py --config MetricWeightedLossBlenderNYU --datasets /path/to/dataset/folder
  • You can add different configurations for loss and depth space in config.py and use those configurations for training. In this example, we use MetricWeightedLossBlenderNYU for our pre-trained weight file.
  • You can also change the training or test dataset in config.py by modifying lines 19-31.

Evaluation

The evaluation script can be executed as follows:

python evaluate.py --CONFIG MetricWeightedLossBlenderNYU --DATASET /path/to/dataset/folder --OUTPUT /path/to/output/folder --CHECKPOINT /path/to/checkpoint/file

Usage

Run Visual Odometry

We use ORB-SLAM2 with loop closure disabled. Predicted depth maps from the models above are used to compute the odometry; follow the ORB-SLAM2 RGB-D execution instructions. Note that we do not use the coded blur RGB images directly: as mentioned in the paper, we apply unsharp masking to them before computing odometry.
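Unsharp masking itself is a standard operation; a minimal NumPy sketch for a grayscale image (the sigma and amount defaults are illustrative, not the values used in the paper) could look like:

```python
import numpy as np

def unsharp_mask(img, sigma=2.0, amount=1.0):
    """Sharpen an image by adding back the high-frequency residual:
    sharp = img + amount * (img - blur(img)).
    Gaussian blur is applied with a separable 1-D kernel and reflect
    padding. (sigma/amount here are illustrative defaults.)"""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # normalize so a flat image stays unchanged
    padded = np.pad(np.asarray(img, dtype=np.float64), radius, mode="reflect")
    # separable blur: rows first, then columns ("valid" trims the padding)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, "valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, "valid"), 0, blurred)
    return img + amount * (img - blurred)
```

The sharpened output is then fed to the RGB-D front end in place of the raw coded blur image.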

Acknowledgements

We would like to thank the authors of PhaseCam3D and ORB-SLAM2 for open-sourcing their codebases.

Contributions

If you have any questions, comments, or bug reports, feel free to open a GitHub issue, submit a pull request, or e-mail the authors, Naitri Rajyaguru or Sachin Shah.
