aeolusguan / nmrf

[CVPR 2024] Neural Markov Random Field for Stereo Matching

License: MIT License

Languages: Python 97.20%, C++ 0.29%, CUDA 2.51%
Topics: markov-random-fields, neural-message-passing, stereo-matching, transformer

nmrf's Introduction

NMRF-Stereo

Official PyTorch implementation of paper:

Neural Markov Random Field for Stereo Matching, CVPR 2024
Tongfan Guan, Chen Wang, Yun-Hui Liu

Introduction

Stereo methods based on hand-crafted Markov Random Field (MRF) models lack sufficient modeling accuracy compared to end-to-end deep models. While deep learning representations have greatly improved the unary terms of MRF models, the overall accuracy is still severely limited by the hand-crafted pairwise terms and message passing. To address these issues, we propose a neural MRF model in which both the potential functions and the message passing are designed using data-driven neural networks. Our fully data-driven model is built on variational inference theory to prevent convergence issues and retain the graph inductive bias of stereo MRFs. To make inference tractable and scale well to high-resolution images, we also propose a Disparity Proposal Network (DPN) that adaptively prunes the search space for every pixel.
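To make the idea of pruning the per-pixel search space concrete, here is a minimal sketch that keeps only the top-k disparity hypotheses for one pixel. It is not the DPN from the paper: the function name `prune_disparity_candidates`, the toy scores, and the plain top-k rule are all assumptions chosen for illustration.

```python
import heapq
from typing import List, Sequence


def prune_disparity_candidates(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring disparities, in ascending order.

    `scores[d]` is a hypothetical matching score for disparity d at one pixel.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # nlargest returns at most len(scores) indices, so k may exceed the range.
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


# Toy example: one pixel with 8 disparity hypotheses (made-up scores).
scores = [0.01, 0.05, 0.40, 0.30, 0.02, 0.15, 0.04, 0.03]
candidates = prune_disparity_candidates(scores, k=3)
print(candidates)  # [2, 3, 5]
```

Later inference then only needs to reason over the few surviving candidates per pixel instead of the full disparity range, which is what makes the approach scale to high-resolution images.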

[Figure: overview]

Highlights

  • High accuracy & efficiency

    NMRF-Stereo reports state-of-the-art accuracy on Scene Flow and ranks first on the KITTI 2012 and KITTI 2015 leaderboards among all published methods at the time of submission. The model runs in 90 ms on an RTX 3090 for KITTI data (1242x375).

  • Strong cross-domain generalization

    NMRF-Stereo generalizes well to other datasets and scenes. The model is trained only with synthetic Scene Flow data:

    [Figures: ETH3D, Middlebury]

  • Sharp depth boundaries

    NMRF-Stereo is able to recover sharp depth boundaries, which is key to downstream applications, such as 3D reconstruction and object detection.

    [Figure: point cloud]
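For downstream uses such as 3D reconstruction, a predicted disparity map is usually back-projected into a point cloud. The sketch below uses the standard rectified pinhole relation Z = f * B / d. It is not part of this repository: the function name, the single focal length shared by both axes, and the toy camera values are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def disparity_to_points(disparity: Sequence[Sequence[float]],
                        focal_px: float, baseline_m: float,
                        cx: float, cy: float) -> List[Point]:
    """Back-project a disparity map (rows of pixels) into left-camera 3D points.

    Pixels with non-positive disparity are skipped because their depth is undefined.
    """
    points: List[Point] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= 0:
                continue
            z = focal_px * baseline_m / d   # depth from disparity
            x = (u - cx) * z / focal_px     # horizontal offset from the principal point
            y = (v - cy) * z / focal_px     # vertical offset from the principal point
            points.append((x, y, z))
    return points


# Toy 2x2 disparity map with hypothetical camera parameters.
points = disparity_to_points([[10.0, 0.0], [5.0, 20.0]],
                             focal_px=100.0, baseline_m=0.5, cx=0.5, cy=0.5)
print(len(points))
```

Because depth is inversely proportional to disparity, small disparity errors at object edges turn into large depth errors, which is why sharp boundaries matter for the resulting point cloud.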

Installation

Our code is developed on Ubuntu 20.04 using Python 3.8 and PyTorch 1.13. Please note that the code has only been tested with these specified versions. We recommend using conda for the installation of dependencies:

  1. Create the NMRF conda environment and install all dependencies:
conda env create -f environment.yml
conda activate NMRF
  2. Build the superpixel-guided disparity downsample operator:
cd kernels/downsample && python setup.py install --user && cd ../..
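Since the code has only been tested with Python 3.8 and PyTorch 1.13, a quick check of the active environment can save debugging time. The following is a minimal sketch, not a script shipped with the repository; the helper names `installed_torch_version` and `check_environment` are hypothetical.

```python
import sys
from importlib import metadata
from typing import List, Optional, Tuple

TESTED_PYTHON = (3, 8)      # version stated in the README
TESTED_TORCH = ("1", "13")  # major and minor version stated in the README


def installed_torch_version() -> Optional[str]:
    """Return the installed PyTorch version string, or None if it is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version: Tuple[int, int],
                      torch_version: Optional[str]) -> List[str]:
    """Return warnings for every deviation from the tested versions."""
    warnings = []
    if tuple(python_version) != TESTED_PYTHON:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} differs from the tested 3.8")
    if torch_version is None:
        warnings.append("PyTorch is not installed")
    elif tuple(torch_version.split(".")[:2]) != TESTED_TORCH:
        warnings.append(f"PyTorch {torch_version} differs from the tested 1.13")
    return warnings


if __name__ == "__main__":
    for message in check_environment(sys.version_info[:2], installed_torch_version()):
        print("warning:", message)
```

An empty result means the interpreter and PyTorch match the tested versions; it does not verify the CUDA toolkit or the compiled downsample kernel.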

Dataset Preparation

To train/evaluate NMRF-Stereo, you will need to download the required datasets.

By default, datasets.py searches for the datasets in the locations shown in the folder structure below. If the datasets were downloaded elsewhere, create a symbolic link named datasets in the repository root ($root) that points to them:

ln -s $YOUR_DATASET_ROOT datasets

Our folder structure is as follows:

├── datasets
    ├── ETH3D
    │   ├── two_view_training
    │   └── two_view_training_gt
    ├── KITTI
    │   ├── KITTI_2012
    │   │   ├── testing
    │   │   └── training
    │   └── KITTI_2015
    │       ├── testing
    │       └── training
    ├── Middlebury
    │   ├── 2014
    │   └── MiddEval3
    └── SceneFlow
        ├── Driving
        │   ├── disparity
        │   └── frames_finalpass
        ├── FlyingThings3D
        │   ├── disparity
        │   └── frames_finalpass
        └── Monkaa
            ├── disparity
            └── frames_finalpass
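Before training or evaluation, it can help to confirm that the layout above is in place. This is a minimal sketch that only checks for the directories listed in the tree; the function name `missing_dataset_dirs` is hypothetical and the check does not inspect the files inside each folder.

```python
from pathlib import Path
from typing import List

# Leaf directories copied from the folder structure in the README.
EXPECTED_DIRS = [
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "KITTI/KITTI_2012/testing",
    "KITTI/KITTI_2012/training",
    "KITTI/KITTI_2015/testing",
    "KITTI/KITTI_2015/training",
    "Middlebury/2014",
    "Middlebury/MiddEval3",
    "SceneFlow/Driving/disparity",
    "SceneFlow/Driving/frames_finalpass",
    "SceneFlow/FlyingThings3D/disparity",
    "SceneFlow/FlyingThings3D/frames_finalpass",
    "SceneFlow/Monkaa/disparity",
    "SceneFlow/Monkaa/frames_finalpass",
]


def missing_dataset_dirs(root: str = "datasets") -> List[str]:
    """Return the expected sub-directories that do not exist under `root`.

    Path.is_dir() follows symbolic links, so a linked datasets folder works too.
    """
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    for name in missing_dataset_dirs():
        print("missing:", name)
```

Only the datasets you actually plan to use need to be present, so a non-empty result is not necessarily an error.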

(Optional) Occlusion mask

We provide a script to generate occlusion masks for the Scene Flow dataset. Using them may bring a marginal performance improvement.

python tools/generate_occlusion_map.py

Demos

Pretrained models can be downloaded from Google Drive.

We assume the downloaded weights are located under the pretrained directory.

You can demo a trained model on pairs of images. To predict disparity for ETH3D, run:

python inference.py --dataset-name eth3d --output $output_directory SOLVER.RESUME pretrained/sceneflow.pth

Or test on your own stereo pairs:

python inference.py --input $left_directory/*.png $right_directory/*.png --output $output_directory SOLVER.RESUME pretrained/$pretrained_model.pth
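The README does not say how inference.py matches left and right images, so it is worth checking your own directories before running the command. The sketch below assumes, as a convention of this example only, that each left image has a right image with the same file name; the function name `unmatched_stereo_files` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def unmatched_stereo_files(left_dir: str, right_dir: str,
                           pattern: str = "*.png") -> Tuple[List[str], List[str]]:
    """Return (left-only, right-only) file names for the given glob pattern.

    Assumption: a stereo pair shares one file name across the two directories.
    """
    left = {p.name for p in Path(left_dir).glob(pattern)}
    right = {p.name for p in Path(right_dir).glob(pattern)}
    return sorted(left - right), sorted(right - left)
```

Calling `unmatched_stereo_files("left", "right")` returns two empty lists when every image has a counterpart. If your data uses a different naming scheme, adapt the comparison accordingly.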

Evaluation

To evaluate on the SceneFlow test set, run:

python main.py --num-gpus 4 --eval-only SOLVER.RESUME pretrained/sceneflow.pth

Or for cross-domain generalization:

python main.py --num-gpus 4 --eval-only --config-file configs/zero_shot_evaluation.yaml SOLVER.RESUME pretrained/sceneflow.pth

For submission to KITTI 2012 and 2015 online test sets, you can run:

python inference.py --dataset-name kitti_2015 SOLVER.RESUME pretrained/kitti.pth

and

python inference.py --dataset-name kitti_2012 SOLVER.RESUME pretrained/kitti.pth

Training

To train on SceneFlow, run

python main.py --checkpoint-dir checkpoints/sceneflow --num-gpus 4

To train on KITTI, run

python main.py --checkpoint-dir checkpoints/kitti --config-file configs/kitti_mix_train.yaml --num-gpus 4 SOLVER.RESUME pretrained/sceneflow.pth

We support using TensorBoard to monitor and visualize the training process. First start a TensorBoard session with

tensorboard --logdir checkpoints

and then access http://localhost:6006 in your browser.

Citation

If you find our work useful in your research, please consider citing our paper:

@article{guan2024neural,
  title={Neural Markov Random Field for Stereo Matching},
  author={Guan, Tongfan and Wang, Chen and Liu, Yun-Hui},
  journal={arXiv preprint arXiv:2403.11193},
  year={2024}
}

Acknowledgements

This project would not have been possible without relying on some awesome repos: RAFT-Stereo, Detectron2, and Swin.

nmrf's Issues

Configure the environment according to environment.yml, but an error is reported.

The error is reported as follows:

(base) D:\Project\NMRF-main>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - libstdcxx-ng=11.2.0
  - openh264=2.1.1
  - _openmp_mutex=5.1
  - gmp=6.2.1
  - libunistring=0.9.10
  - libgcc-ng=11.2.0
  - ncurses=6.4
  - ld_impl_linux-64=2.38
  - readline=8.2
  - libtasn1=4.19.0
  - libcufile=1.9.0.20
  - gnutls=3.6.15
  - libidn2=2.3.4
  - libgomp=11.2.0
  - nettle=3.7.3

Is the network 'fixed' by dataset?

Hello, thank you for your amazing work!
I have a question about it.
Is the network 'fixed' by dataset?
I mean, if I apply kitti_pth to my own dataset (whose camera parameters are completely different from those of the KITTI cameras), can I get the correct depth?
Thank you!
