depth-vo-feat

Introduction

This repo implements the system described in the CVPR-2018 paper:

Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction

Huangying Zhan, Ravi Garg, Chamara Saroj Weerasekera, Kejie Li, Harsh Agarwal, Ian Reid

@InProceedings{Zhan_2018_CVPR,
author = {Zhan, Huangying and Garg, Ravi and Saroj Weerasekera, Chamara and Li, Kejie and Agarwal, Harsh and Reid, Ian},
title = {Unsupervised Learning of Monocular Depth Estimation and Visual Odometry With Deep Feature Reconstruction},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

This repo includes (1) the training procedure of our models; (2) evaluation scripts for the results; (3) trained models and results.

Contents

  1. Requirements
  2. Download dataset and models
  3. Depth
  4. Depth and odometry
  5. Feature Reconstruction Loss for Depth
  6. Depth, odometry and feature
  7. Result evaluation

Part 1. Requirements

This code was tested with Python 2.7, CUDA 8.0 and Ubuntu 14.04 using Caffe.

Caffe: Add the required layers in ./caffe into your own Caffe. Remember to enable Python layers in the Caffe configuration (for a Makefile build, uncomment WITH_PYTHON_LAYER := 1 in Makefile.config and rebuild).

Most of the required models, trained models and results can be downloaded from here. The following instructions also include direct links to the individual items.

Part 2. Download dataset and models

The main dataset used in this project is the KITTI Driving Dataset. Please follow the instructions in ./data/README.md to prepare the required dataset.

For our trained models and prerequisite models, please visit here to download them and put them into the directory ./models.

Part 3. Depth

In this part, the training of the single-view depth estimation network from stereo pairs is introduced. The photometric loss is used as the main supervision signal, and only stereo pairs are used in this experiment.
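
As a rough illustration of this supervision (our own sketch, not the repo's Caffe implementation): the predicted left disparity is used to inverse-warp the right image into the left view, and the photometric difference between the real and synthesized views is penalized. A minimal NumPy version for a single-channel rectified pair, with hypothetical function names:

import numpy as np

def warp_right_to_left(img_right, disp_left):
    # Inverse-warp the right image into the left view: for a rectified pair,
    # the left pixel (x, y) corresponds to the right pixel (x - d, y).
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    src_x = np.clip(xs - disp_left, 0.0, w - 1.0)
    x0 = np.floor(src_x).astype(np.int64)
    x1 = np.minimum(x0 + 1, w - 1)
    a = src_x - x0
    yi = ys.astype(np.int64)
    # Linear interpolation along x only (y is unchanged after rectification).
    return (1.0 - a) * img_right[yi, x0] + a * img_right[yi, x1]

def photometric_loss(img_left, img_right, disp_left):
    # Mean absolute difference between the left image and the warped right image.
    return np.mean(np.abs(img_left - warp_right_to_left(img_right, disp_left)))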

  1. Update $YOUR_CAFFE_DIR in ./experiments/depth/train.sh.
  2. Run bash ./experiments/depth/train.sh.

The trained models are saved in ./snapshots/depth

Part 4. Depth and odometry

In this part, the joint training of the depth estimation network and the visual odometry network is introduced. Photometric losses over both spatial pairs and temporal pairs are used as the main supervision signal. Both spatial (stereo) pairs and temporal pairs (i.e. stereo sequences) are used in this experiment.
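
The temporal photometric loss relies on the differentiable geometry described in the paper: pixels of one frame are back-projected with the predicted depth, rigidly transformed by the predicted relative pose, and re-projected into the other frame, i.e. p(t1) ~ K * T(t2->t1) * D(t2) * K^-1 * p(t2). A minimal NumPy sketch of this projection under our own naming (the repo implements it as a custom Caffe layer):

import numpy as np

def project_to_other_view(depth, K, T):
    # depth: (H, W) depth map of the source frame; K: (3, 3) camera intrinsics;
    # T: (4, 4) relative pose mapping source-frame points into the target frame.
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous pixels
    cam = np.linalg.inv(K).dot(pix) * depth.ravel()           # back-project to 3D
    cam = np.vstack([cam, np.ones((1, h * w))])               # homogeneous 3D points
    cam = T.dot(cam)[:3]                                      # rigid transformation
    proj = K.dot(cam)                                         # re-project
    return proj[:2] / np.maximum(proj[2:], 1e-8)              # target pixel coordinates

The target view sampled at these coordinates is then compared against the source view, just as in the stereo case.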

To facilitate the training, the model trained in the Depth experiment is used as an initialization.

  1. Update $YOUR_CAFFE_DIR in ./experiments/depth_odometry/train.sh.
  2. Run bash ./experiments/depth_odometry/train.sh.

The trained models are saved in ./snapshots/depth_odometry

Part 5. Feature Reconstruction Loss for Depth

In this part, the training of the single-view depth estimation network from stereo pairs is introduced, using both the photometric loss and the feature reconstruction loss as the main supervision signals. Only stereo pairs are used in this experiment. We have tried several feature types for this experiment; currently only the example using KITTI Feat. is shown here, and details on using other features will be added later.
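
Conceptually, the feature reconstruction loss applies the same view synthesis to dense feature maps instead of raw intensities. A hedged sketch that reuses warp_right_to_left from the Part 3 example, assuming (C, H, W) feature maps from some fixed extractor (names are ours):

def feature_reconstruction_loss(feat_left, feat_right, disp_left):
    # Warp each channel of the right feature map into the left view with the
    # predicted disparity, then penalize the mean absolute feature difference.
    warped = np.stack([warp_right_to_left(ch, disp_left) for ch in feat_right])
    return np.mean(np.abs(feat_left - warped))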

To facilitate the training, the model trained in the Depth experiment is used as an initialization.

  1. Update $YOUR_CAFFE_DIR in ./experiments/depth_feature/train.sh.
  2. Run bash ./experiments/depth_feature/train.sh.

The trained models are saved in ./snapshots/depth_feature

Part 6. Depth, odometry and feature

In this part, we show the training that also includes the feature reconstruction loss. Stereo sequences are used in this experiment.

With the feature extractor proposed in Weerasekera et al., we can finetune the trained depth model and/or odometry model with our proposed deep feature reconstruction loss.

  1. Update $YOUR_CAFFE_DIR in ./experiments/depth_odometry_feature/train.sh.
  2. Run bash ./experiments/depth_odometry_feature/train.sh.

NOTE: The link to download the feature extractor proposed in Weerasekera et al. will be released soon.

Part 7. Result evaluation

Note that the evaluation script provided here uses a different image interpolation for resizing input images (i.e. Python's interpolation vs. Caffe's interpolation), so the quantitative results could differ slightly from the published results.

Depth estimation

Using the test set (697 image-depth pairs from 28 scenes) of the Eigen split is a common protocol for evaluating depth estimation results.

We basically use the evaluation script provided by monodepth to evaluate depth estimation results.

In order to run the evaluation, an npy file storing the predicted depths is required. Then run the script to evaluate the performance.

  1. Update caffe_root in ./tools/evaluation_tools.py.
  2. Generate the depth predictions and save them in an npy file:
python ./tools/evaluation_tools.py --func generate_depth_npy --dataset kitti_eigen --depth_net_def ./experiments/networks/depth_deploy.prototxt --model models/trained_models/eigen_split/Baseline.caffemodel --npy_dir ./result/depth/inv_depths_baseline.npy
  3. Evaluate the predictions:
python ./tools/eval_depth.py --split eigen --predicted_inv_depth_path ./result/depth/inv_depths_baseline.npy --gt_path data/kitti_raw_data/ --min_depth 1 --max_depth 50 --garg_crop
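
For reference, a sketch of the standard Eigen-style error metrics that the evaluation reports, assuming the npy file stores per-image inverse depths (the actual eval_depth.py additionally applies the Garg crop and the min/max depth caps; names below are ours):

import numpy as np

def depth_metrics(gt, pred):
    # Standard Eigen-style metrics over valid (positive-depth) pixels.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()                        # delta < 1.25 accuracy
    abs_rel = np.mean(np.abs(gt - pred) / gt)          # absolute relative error
    sq_rel = np.mean(((gt - pred) ** 2) / gt)          # squared relative error
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1

# The released predictions are inverse depths, so convert before evaluating:
inv_depths = np.load('./result/depth/inv_depths_baseline.npy')  # (N, H, W)
pred_depths = 1.0 / np.clip(inv_depths, 1e-8, None)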

Some of our results (inverse depths) are released and can be downloaded from here.

Visual Odometry

The KITTI Odometry benchmark contains 22 stereo sequences, of which 11 are provided with ground truth. These 11 sequences are used for the evaluation or training of visual odometry.

  1. Update caffe_root in ./tools/evaluation_tools.py.
  2. Generate the odometry predictions (relative camera motions):
python ./tools/evaluation_tools.py --func generate_odom_result --model models/trained_models/odometry_split/Temporal.caffemodel --odom_net_def ./experiments/networks/odometry_deploy.prototxt --odom_result_dir ./result/odom_result
  3. After getting the odometry predictions, evaluate the performance by comparing against the ground-truth poses:
python ./tools/evaluation_tools.py --func eval_odom --odom_result_dir ./result/odom_result
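
Since the network predicts frame-to-frame motions while the evaluation and the released results use full trajectories, the relative 4x4 motions have to be chained into absolute poses. A minimal sketch, assuming each prediction T maps points of frame k+1 into frame k (conventions vary, so verify the direction against the ground truth):

import numpy as np

def accumulate_poses(rel_motions):
    # Chain 4x4 relative motions into absolute poses, with the first
    # camera frame taken as the world origin (identity pose).
    poses = [np.eye(4)]
    for T in rel_motions:
        poses.append(poses[-1].dot(T))
    return poses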

Our odometry results are released and can be downloaded from here.

License

For academic usage, the code is released under the permissive BSD license. For any commercial purpose, please contact the authors.


depth-vo-feat's Issues

Depth

Hi,
thank you for your work. I don't use Caffe, so I want to know: if I get the disparity from the network, how can I recover the depth from the disparity using the focal length and baseline, so that I can then use the depth to warp?
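
(For a rectified stereo pair this is the standard relation depth = focal_length * baseline / disparity; a one-line sketch with hypothetical argument names:)

import numpy as np

def disp_to_depth(disp_px, focal_px, baseline_m):
    # Rectified stereo: depth [m] = focal [px] * baseline [m] / disparity [px].
    return focal_px * baseline_m / np.clip(disp_px, 1e-8, None)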

does not support CLI Multi-GPU ?

sh train.sh 

when GPU = 0,1:

I0410 14:03:04.544904 25645 net.cpp:406] SE3 <- T
I0410 14:03:04.544951 25645 net.cpp:380] SE3 -> SE3
F0410 14:03:04.545002 25645 python_layer.hpp:25] PythonLayer does not support CLI Multi-GPU, use train.py
*** Check failure stack trace: ***
    @     0x7fd369e545cd  google::LogMessage::Fail()
    @     0x7fd369e56433  google::LogMessage::SendToLog()
    @     0x7fd369e5415b  google::LogMessage::Flush()
    @     0x7fd369e56e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fd2ee9e61d6  caffe::PythonLayer<>::LayerSetUp()
    @     0x7fd36a51aa3c  caffe::Net<>::Init()
    @     0x7fd36a51d12e  caffe::Net<>::Net()
    @     0x7fd36a52fff5  caffe::Solver<>::InitTrainNet()
    @     0x7fd36a531465  caffe::Solver<>::Init()
    @     0x7fd36a53177f  caffe::Solver<>::Solver()
    @     0x7fd36a542451  caffe::Creator_AdamSolver<>()
    @           0x416dec  caffe::SolverRegistry<>::CreateSolver()
    @           0x40e6bd  train()
    @           0x40b8e3  main
    @     0x7fd368b32830  __libc_start_main
    @           0x40c289  _start
    @              (nil)  (unknown)


For the rmse of odom result

After reading your paper's experiment section, I updated the data builder (which includes only sequences 00-08, 19600 pairs of images) and retrained both the Depth and Depth+Odometry parts. In the end, my trained model differs hugely from your Temporal model:
"
Sequence: 9
Average translational RMSE (%): 18.29599141944674
Average rotational error (deg/100m): 6.84948743104451
Sequence: 10
Average translational RMSE (%): 31.608499883939217
Average rotational error (deg/100m): 13.520508069343077
"
compared to yours:
"
Sequence: 9
Average translational RMSE (%): 11.260165600943994
Average rotational error (deg/100m): 3.8829374695264844
Sequence: 10
Average translational RMSE (%): 10.655759483817
Average rotational error (deg/100m): 3.944550738667936
"

Could you help me figure out why there is such a huge difference?

BTW, is your "Baseline.caffemodel" trained on sequences 00-09 or on the raw data's 23455 pairs?

Can I train my own Temporal.caffemodel from Baseline.caffemodel?

If possible, would you mind sharing the train.prototxt for the depth/odometry training? Your paper says you followed T. Zhou's paper for that part.

Also, could you explain why you chose R12 to train the odometry, but use the left images ("image_02") when testing with eval_tools.py?

Thank you!
my email:[email protected]

Transform relative location to absolute location in SfMLearner

Hi, as we know, all the predictions in SfMLearner are 5-frame snippets in the format timestamp tx ty tz qx qy qz qw, consistent with the TUM evaluation toolkit. However, the odometry results you released are continuous absolute positions. How did you convert the relative positions while ensuring the accuracy of the results?
Thanks!

about plot error

I want to draw the error-plot figure, but when I use the evaluation tool it turns out like this; how can I solve it? Thank you!
[screenshot of the error]

About back-propagation of geometry transformation layer

I have a question about the back-propagation process in the geometry transformation layer. The derivative dX_dD at line 111 of geometry_transformation.cu is written as "dX_dD = trans_off[0] * base_X + trans_off[1] * base_Y + trans_off[2]". In my opinion, "X = trans_off[0] * base_X + trans_off[1] * base_Y + trans_off[2] * D + trans_off[3]", so the derivative with respect to D should be "dX_dD = trans_off[2]". Since the derivative is taken with respect to D, "trans_off[0] * base_X + trans_off[1] * base_Y" should be regarded as constant, just like "trans_off[3]". Thus I think the expression for dX_dD shouldn't contain "trans_off[0] * base_X + trans_off[1] * base_Y". This may be my misunderstanding; could you please explain it to me? Thanks a lot!
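
(A plausible resolution, assuming base_X and base_Y are normalized pixel coordinates, so the back-projected 3D point is (base_X * D, base_Y * D, D) rather than (base_X, base_Y, D). The transformed X coordinate is then

X = trans_off[0] * base_X * D + trans_off[1] * base_Y * D + trans_off[2] * D + trans_off[3]

and differentiating with respect to D gives

dX_dD = trans_off[0] * base_X + trans_off[1] * base_Y + trans_off[2]

which matches the expression in the code.)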

Training on odometry split

Hi Zhan,
Can I use the given solver.prototxt for both Eigen-split based training and Odom-split based training? Is there any difference between them in terms of training parameters?

still cannot repeat your experiment, please help!

Thank you again for the newly provided model.

What confuses me is:

I have been working on the new baseline model for the odometry split that you uploaded. Starting from it, with the exact same train and solver files you provided here for eigen_split, the results are still not very good: around 26 for translational RMSE (%) and 9 for rotational error (deg/100m) on KITTI sequence 09.

Originally posted by @bbbrian94 in #28 (comment)

about Trajectories

I know it is a silly question, but after I trained the model and got the test output, how can I get the trajectories? Thank you!

Questions about "Differentiable geometry modules"

Hi @Huangying-Zhan,
Thanks for your work. I have some questions about the "Differentiable geometry modules" in your paper and code.

  1. The coordinate projection process is formulated as p(L,t1) = K * T(t2->t1) * D(L,t2) * K^-1 * p(L,t2). Does this mean the estimated pose is from t2 to t1, given that the input images are stacked in the order I1, I2? Then, for evaluating the whole sequences 09 and 10 with your code, should the input queue be inverted, i.e. (t2,t1), (t3,t2), (t4,t3)?

  2. Why can't the camera pose in the projection function be T(t1->t2), from the source view to the target view?

Check failed: false Unknown filler name: EdgeX

From Part 3. Depth, I run

bash ./experiments/depth/train.sh

and get:

I0402 21:05:01.934139 12367 net.cpp:406] gx_Itgt_RGB <- Itgt_norm_imR_0_split_1
I0402 21:05:01.934151 12367 net.cpp:380] gx_Itgt_RGB -> gx_Itgt_RGB
F0402 21:05:01.934406 12367 filler.hpp:288] Check failed: false Unknown filler name: EdgeX

I checked the train.prototxt file and found:

layer {
  name: "gx_Itgt_RGB"
  type: "Convolution"
  bottom: "Itgt"
  top: "gx_Itgt_RGB"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 3
    bias_term: false
    kernel_size: 3
    group: 3
    stride: 1
    weight_filler {
      type: "EdgeX"
    }
  }
  include {phase: TRAIN}
}

but there seems to be no "EdgeX" filler.

When I run the command "make runtest" I get this error, even though I have added the required layers in ./caffe to my own Caffe

F0708 06:35:00.978440 12323 blob.hpp:140] Check failed: num_axes() <= 4 (5 vs. 4) Cannot use legacy accessors on Blobs with > 4 axes.
*** Check failure stack trace: ***
@ 0x7f0be45ab5cd google::LogMessage::Fail()
@ 0x7f0be45ad433 google::LogMessage::SendToLog()
@ 0x7f0be45ab15b google::LogMessage::Flush()
@ 0x7f0be45ade1e google::LogMessageFatal::~LogMessageFatal()
@ 0x490d41 caffe::Blob<>::LegacyShape()
@ 0x77a9dc caffe::PositiveUnitballFiller<>::Fill()
@ 0x8f1a90 caffe::PositiveUnitballFillerTest<>::test_params()
@ 0x8eb5ee caffe::PositiveUnitballFillerTest_TestFill5D_Test<>::TestBody()
@ 0x955973 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x94ef8a testing::Test::Run()
@ 0x94f0d8 testing::TestInfo::Run()
@ 0x94f1b5 testing::TestCase::Run()
@ 0x95048f testing::internal::UnitTestImpl::RunAllTests()
@ 0x9507b3 testing::UnitTest::Run()
@ 0x47123d main
@ 0x7f0be10bc830 __libc_start_main
@ 0x4792e9 _start
@ (nil) (unknown)
Makefile:533: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)

An issue about the compiling of caffe

Hi:) @Huangying-Zhan
After I added your layers to Caffe and compiled it, I got this import error when I ran the bash file ./experiments/depth/train.sh to train the depth estimation network.

I0301 14:24:43.756604 3770 layer_factory.hpp:77] Creating layer SE3
Traceback (most recent call last):
  File "/home/myubuntu/anaconda3/envs/caf/caffe/python/pygeometry.py", line 1, in <module>
    import caffe
  File "/home/myubuntu/anaconda3/envs/caf/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
  File "/home/myubuntu/anaconda3/envs/caf/caffe/python/caffe/pycaffe.py", line 11, in <module>
    import numpy as np
  File "/home/myubuntu/anaconda3/envs/caf/lib/python2.7/site-packages/numpy/__init__.py", line 140, in <module>
    from . import _distributor_init
  File "/home/myubuntu/anaconda3/envs/caf/lib/python2.7/site-packages/numpy/_distributor_init.py", line 33, in <module>
    with RTLD_for_MKL():
  File "/home/myubuntu/anaconda3/envs/caf/lib/python2.7/site-packages/numpy/_distributor_init.py", line 18, in __enter__
    import ctypes
  File "/home/myubuntu/anaconda3/envs/caf/lib/python2.7/ctypes/__init__.py", line 7, in <module>
    from _ctypes import Union, Structure, Array
ImportError: /home/myubuntu/anaconda3/envs/caf/lib/python2.7/lib-dynload/_ctypes.so: undefined symbol: _PySlice_Unpack

It seems that this issue occurred while the SE3 layer we added was being created. However, when I import caffe in other Python files, it is imported successfully; for example, when I ran pygeometry.py in the terminal, there was no error.
Could you tell me how I can solve this issue?

doubts on the odometry

Hello Zhan,
I have some problems here. In your paper, you use a CNN to get the two-view odometry.
How do you get results like those in Figure 3? Do you use the tools at https://vision.in.tum.de/data/datasets/rgbd-dataset/tools? They take tx ty tz qx qy qz qw as input.
I am a beginner in this area, so could you share code for these transformations, such as accumulating the two-view odometry over all timestamps, and converting between transformation matrices and quaternions?

thanks
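
(The matrix/quaternion conversion asked about above is standard and independent of this repo; a minimal sketch, with our own function name, for the TUM component order qx qy qz qw:)

import numpy as np

def quat_to_rot(qx, qy, qz, qw):
    # Unit quaternion (TUM order: qx qy qz qw) -> 3x3 rotation matrix.
    x, y, z, w = np.array([qx, qy, qz, qw]) / np.linalg.norm([qx, qy, qz, qw])
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
        [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
    ])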

error: (-215) ssize.area() > 0 in function resize

Visual Odometry
To generate the odometry predictions (relative camera motions), run the following script.

python ./tools/evaluation_tools.py --func generate_odom_result --model models/trained_models/odometry_split/Temporal.caffemodel --odom_net_def ./experiments/networks/odometry_deploy.prototxt --odom_result_dir ./result/odom_result
I0410 19:00:23.287318  7460 net.cpp:744] Ignoring source layer silence_train
Getting predictions... Sequence:  0  /  10
OpenCV Error: Assertion failed (ssize.area() > 0) in resize, file /home/travis/miniconda/conda-bld/conda_1486587066442/work/opencv-3.1.0/modules/imgproc/src/imgwarp.cpp, line 3229
Traceback (most recent call last):
  File "../../caffe-master/tools/evaluation_tools.py", line 648, in <module>
    pred_poses = generator.getPredPoses()
  File "../../caffe-master/tools/evaluation_tools.py", line 155, in getPredPoses
    img1 = self.getImage(img1_path)
  File "../../caffe-master/tools/evaluation_tools.py", line 125, in getImage
    img = cv2.resize(img, (self.image_width, self.image_height))
cv2.error: /home/travis/miniconda/conda-bld/conda_1486587066442/work/opencv-3.1.0/modules/imgproc/src/imgwarp.cpp:3229: error: (-215) ssize.area() > 0 in function resize

Pretrained model on Dropbox can't be downloaded

Thank you for your code. I want to use your pre-trained model to test some results. I went to the Dropbox page to download the model, but I can't download it; maybe there is something wrong with the website. Do you have other places where the model is stored, such as Google Drive? Thank you very much!

get depth-prediction

[screenshot of the error]

Sorry to bother you. How can I solve this error when running the depth evaluation? Thank you!

Get wrong Results

Hi,
when I use your method for the depth experiment with stereo image pairs, my disparity goes all dark as the training goes on. Do you know what causes this?

about trajectories plot

It's so kind of you to provide not only your pose results but also other methods' results, like ORB-SLAM.
I notice that you compare against SfMLearner's result. Could you please upload a file of it?

No register converter error occured when running the Python Layer('SE3') in odometry_deploy.prototxt

Error info:

Traceback (most recent call last):
  File "compare.py", line 110, in main
    odom_net.forward()
  File "/home/gaof/caffe-dev/python/caffe/pycaffe.py", line 121, in _Net_forward
    self._forward(start_ind, end_ind)
  File "/home/gaof/caffe-dev/python/pygeometry.py", line 33, in forward
    self.uw = bottom[0].data[:,:3]
TypeError: No registered converter was able to produce a C++ rvalue of type std::shared_ptr<caffe::Blob > from this Python object of type Blob

Note that I have copied files in ./caffe into my own caffe and recompiled it.

How can I fix this bug? Has anyone else encountered this problem as well?

What is your numpy version, please?

I met a problem; could you tell me your numpy version?

RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
ImportError: numpy.core.multiarray failed to import

Align the unsupervised methods results

hi @Huangying-Zhan

I have a question about how to align the results from SfMLearner with the ground truth.

  1. First, the results produced by SfMLearner are 5-frame snippets, and all the poses are relative to the middle frame; is that why you said in your paper that you align every 5 frames independently?
  2. Could you share your code for aligning with the GT?

need gpu for running

Hello,

I have a question:
Do you need a GPU to run your application?
If yes, what is the minimum GPU required?

Part 4. Depth and odometry

Thanks for the released code!
When I run the following command from Part 3. Depth:

bash ./experiments/depth/train.sh

I get this error:

 29274 layer_factory.hpp:77] Creating layer geoTransform
F0402 17:17:15.390364 29274 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: GeoTransform (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Parameter, Pooling, Power, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
    @     0x7f53f519f5cd  google::LogMessage::Fail()
    @     0x7f53f51a1433  google::LogMessage::SendToLog()
    @     0x7f53f519f15b  google::LogMessage::Flush()
    @     0x7f53f51a1e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f53f583f794  caffe::Net<>::Init()
    @     0x7f53f5840bae  caffe::Net<>::Net()
    @     0x7f53f5849d05  caffe::Solver<>::InitTrainNet()
    @     0x7f53f584b175  caffe::Solver<>::Init()
    @     0x7f53f584b48f  caffe::Solver<>::Solver()
    @     0x7f53f585c961  caffe::Creator_AdamSolver<>()
    @           0x40bd63  train()
    @           0x408480  main
    @     0x7f53f3e7d830  __libc_start_main
    @           0x408ca9  _start
    @              (nil)  (unknown)

By the way, may I ask how to "enable Python Layers in the Caffe configuration", as mentioned in Part 1. Requirements?
