
mvsnet's Introduction

MVSNet & R-MVSNet

[News] BlendedMVS dataset is released!!! (project link).

About

MVSNet is a deep learning architecture for depth map inference from unstructured multi-view images, and R-MVSNet is its extension for scalable learning-based MVS reconstruction. If you find this project useful for your research, please cite:

@article{yao2018mvsnet,
  title={MVSNet: Depth Inference for Unstructured Multi-view Stereo},
  author={Yao, Yao and Luo, Zixin and Li, Shiwei and Fang, Tian and Quan, Long},
  journal={European Conference on Computer Vision (ECCV)},
  year={2018}
}
@article{yao2019recurrent,
  title={Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference},
  author={Yao, Yao and Luo, Zixin and Li, Shiwei and Shen, Tianwei and Fang, Tian and Quan, Long},
  journal={Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

If BlendedMVS dataset is used in your research, please also cite:

@article{yao2020blendedmvs,
  title={BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks},
  author={Yao, Yao and Luo, Zixin and Li, Shiwei and Zhang, Jingyang and Ren, Yufan and Zhou, Lei and Fang, Tian and Quan, Long},
  journal={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

How to Use

Installation

  • Check out the source code: git clone https://github.com/YoYo000/MVSNet
  • Install CUDA 9.0, cuDNN 7.0 and Python 2.7
  • Install TensorFlow and other dependencies with sudo pip install -r requirements.txt

Download

Training

  • Enter the mvsnet script folder: cd MVSNet/mvsnet
  • Train MVSNet on BlendedMVS, DTU and ETH3D:
    python train.py --regularization '3DCNNs' --train_blendedmvs --max_w 768 --max_h 576 --max_d 128 --online_augmentation
    python train.py --regularization '3DCNNs' --train_dtu --max_w 640 --max_h 512 --max_d 128
    python train.py --regularization '3DCNNs' --train_eth3d --max_w 896 --max_h 480 --max_d 128
  • Train R-MVSNet:
    python train.py --regularization 'GRU' --train_blendedmvs --max_w 768 --max_h 576 --max_d 128 --online_augmentation
    python train.py --regularization 'GRU' --train_dtu --max_w 640 --max_h 512 --max_d 128
    python train.py --regularization 'GRU' --train_eth3d --max_w 896 --max_h 480 --max_d 128
  • Specify your input training data folders using --blendedmvs_data_root, --dtu_data_root and --eth3d_data_root
  • Specify your output log and model folders using --log_folder and --model_folder
  • Switch from BlendedMVS to BlendedMVG by replacing --train_blendedmvs with --train_blendedmvg

Validation

  • Validate MVSNet on BlendedMVS, DTU and ETH3D:
    python validate.py --regularization '3DCNNs' --validate_set blendedmvs --max_w 768 --max_h 576 --max_d 128
    python validate.py --regularization '3DCNNs' --validate_set dtu --max_w 640 --max_h 512 --max_d 128
    python validate.py --regularization '3DCNNs' --validate_set eth3d --max_w 896 --max_h 480 --max_d 128
  • Validate R-MVSNet:
    python validate.py --regularization 'GRU' --validate_set blendedmvs --max_w 768 --max_h 576 --max_d 128
    python validate.py --regularization 'GRU' --validate_set dtu --max_w 640 --max_h 512 --max_d 128
    python validate.py --regularization 'GRU' --validate_set eth3d --max_w 896 --max_h 480 --max_d 128
  • Specify your input model checkpoint using --pretrained_model_ckpt_path and --ckpt_step
  • Specify your input training data folders using --blendedmvs_data_root, --dtu_data_root and --eth3d_data_root
  • Specify your output result file using --validation_result_path

Testing

  • Download the test data scan9 and unzip it to the TEST_DATA_FOLDER folder
  • Run MVSNet (GTX1080Ti):
    python test.py --dense_folder TEST_DATA_FOLDER --regularization '3DCNNs' --max_w 1152 --max_h 864 --max_d 192 --interval_scale 1.06
  • Run R-MVSNet (GTX1080Ti):
    python test.py --dense_folder TEST_DATA_FOLDER --regularization 'GRU' --max_w 1600 --max_h 1200 --max_d 256 --interval_scale 0.8
  • Specify your input model checkpoint using --pretrained_model_ckpt_path and --ckpt_step
  • Specify your input dense folder using --dense_folder
  • Inspect the .pfm format outputs in TEST_DATA_FOLDER/depths_mvsnet using python visualize.py .pfm. For example, the depth map and probability map for image 00000012 should look like:
(figures: reference image, depth map, probability map)

Post-Processing

R/MVSNet itself only produces per-view depth maps. To generate the 3D point cloud, we need to apply depth map filtering/fusion as post-processing. As our implementation of this step depends on the Altizure internal library, we are currently unable to provide the corresponding code. Fortunately, depth map filtering/fusion is a standard step in MVS reconstruction, and similar implementations exist in other open-source MVS algorithms. We provide the script depthfusion.py to utilize fusibile for post-processing (thanks to Silvano Galliani for the excellent code!).

To run the post-processing:

  • Check out the modified version of fusibile: git clone https://github.com/YoYo000/fusibile
  • Install fusibile by cmake . and make, which will generate the executable at FUSIBILE_EXE_PATH
  • Run post-processing (--prob_threshold 0.8 if using 3DCNNs): python depthfusion.py --dense_folder TEST_DATA_FOLDER --fusibile_exe_path FUSIBILE_EXE_PATH --prob_threshold 0.3
  • The final point cloud is stored in TEST_DATA_FOLDER/points_mvsnet/consistencyCheck-TIME/final3d_model.ply.
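
The --prob_threshold flag filters depth estimates by their probability (confidence) maps before fusion. As a rough illustration of that thresholding idea only (this is not the depthfusion.py or fusibile logic; it assumes load_pfm from preprocess.py is importable and returns a numpy array, and the file names are only examples):

import numpy as np
from preprocess import load_pfm   # pfm IO provided in this repository

def threshold_depth_map(depth_path, prob_path, prob_threshold=0.3):
    # Keep only depth estimates whose probability (confidence) is high enough.
    with open(depth_path, 'rb') as f:
        depth = np.array(load_pfm(f))
    with open(prob_path, 'rb') as f:
        prob = np.array(load_pfm(f))
    return depth * (prob >= prob_threshold)

# e.g. (file names assumed from the depths_mvsnet outputs):
# filtered = threshold_depth_map('00000000.pfm', '00000000_prob.pfm', prob_threshold=0.3)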

We observe that depthfusion.py produces similar but quantitatively worse results than our own implementation. For detailed differences, please refer to the MVSNet paper and Galliani's paper. The point cloud for scan9 should look like:

(figures: point cloud result, ground truth point cloud)

Reproduce Paper Results

The following steps are required to reproduce depth map/point cloud results:

  • Generate R/MVSNet inputs from SfM outputs. You can use our preprocessed testing data in the download section. (provided)
  • Run R/MVSNet testing script to generate depth maps for all views (provided)
  • Run R/MVSNet validation script to generate depth map validation results. (provided)
  • Apply variational depth map refinement for all views (optional, not provided)
  • Apply depth map filter and fusion to generate point cloud results (partially provided via fusibile)

R-MVSNet point cloud results with full post-processing are also provided: DTU evaluation point clouds

File Formats

Each project folder should contain the following:

.                          
├── images                 
│   ├── 00000000.jpg       
│   ├── 00000001.jpg       
│   └── ...                
├── cams                   
│   ├── 00000000_cam.txt   
│   ├── 00000001_cam.txt   
│   └── ...                
└── pair.txt               

If you want to apply R/MVSNet to your own data, please organize your data into this folder structure. We also provide a simple script colmap2mvsnet.py to convert COLMAP SfM results to R/MVSNet inputs.
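
As a quick sanity check before running the scripts, a minimal sketch like the following (a hypothetical helper, not part of the repository) can verify that a folder matches this layout:

import os

def check_project_folder(folder):
    # Hypothetical helper: verify the images/cams/pair.txt layout shown above.
    images_dir = os.path.join(folder, 'images')
    cams_dir = os.path.join(folder, 'cams')
    assert os.path.isdir(images_dir), 'missing images/ folder'
    assert os.path.isdir(cams_dir), 'missing cams/ folder'
    assert os.path.isfile(os.path.join(folder, 'pair.txt')), 'missing pair.txt'
    # every image 00000000.jpg should have a matching 00000000_cam.txt
    for name in sorted(os.listdir(images_dir)):
        index = os.path.splitext(name)[0]
        cam_file = os.path.join(cams_dir, index + '_cam.txt')
        assert os.path.isfile(cam_file), 'missing ' + cam_file

# check_project_folder('TEST_DATA_FOLDER')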

Image Files

All image files are stored in the images folder. We index each image with an 8-digit number starting from 00000000. The camera and output files described below use the same indices.

Camera Files

The camera parameters of each image are stored in a cam.txt file. The text file contains the camera extrinsic E = [R|t], the intrinsic K and the depth range:

extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
E30 E31 E32 E33

intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22

DEPTH_MIN DEPTH_INTERVAL (DEPTH_NUM DEPTH_MAX) 

Note that the depth range and depth resolution are determined by the minimum depth DEPTH_MIN, the interval between two depth samples DEPTH_INTERVAL, and the number of depth samples DEPTH_NUM (or max_d in the training/testing scripts if DEPTH_NUM is not provided). We also provide interval_scale for controlling the depth resolution. The maximum depth is then computed as:

DEPTH_MAX = DEPTH_MIN + (interval_scale * DEPTH_INTERVAL) * (max_d - 1)
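
For illustration, a minimal cam.txt reader following the layout above and applying this formula (a sketch, not the loader used in preprocess.py; the DTU example values 425 and 2.5 are taken from the camera files shown later on this page):

import numpy as np

def read_cam_file(path, max_d=192, interval_scale=1.06):
    # Sketch of the cam.txt layout above; not the repository's own loader.
    with open(path) as f:
        lines = [l.strip() for l in f if l.strip()]
    # lines[0] == 'extrinsic', lines[1:5] hold the 4x4 E matrix
    extrinsic = np.array([[float(v) for v in lines[i].split()] for i in range(1, 5)])
    # lines[5] == 'intrinsic', lines[6:9] hold the 3x3 K matrix
    intrinsic = np.array([[float(v) for v in lines[i].split()] for i in range(6, 9)])
    # lines[9] holds DEPTH_MIN DEPTH_INTERVAL (and optionally DEPTH_NUM DEPTH_MAX)
    depth_fields = [float(v) for v in lines[9].split()]
    depth_min, depth_interval = depth_fields[0], depth_fields[1]
    depth_max = depth_min + (interval_scale * depth_interval) * (max_d - 1)
    return extrinsic, intrinsic, depth_min, depth_interval, depth_max

# e.g. DEPTH_MIN = 425 and DEPTH_INTERVAL = 2.5 (DTU) with max_d = 192 and
# interval_scale = 1.06 give DEPTH_MAX = 425 + 2.65 * 191 = 931.15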

View Selection File

We store the view selection result in pair.txt. For each reference image, we calculate its view selection score with each of the other views and store the 10 best source views in the pair.txt file:

TOTAL_IMAGE_NUM
IMAGE_ID0                       # index of reference image 0 
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 0 
IMAGE_ID1                       # index of reference image 1
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 1 
...
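
For reference, a minimal sketch of parsing this file into a {reference_id: [(source_id, score), ...]} mapping (illustrative only, not the reader used in the repository; the # comments above are annotations in this page, not part of the actual file):

def read_pair_file(path):
    # Parse pair.txt into {reference_id: [(source_id, score), ...]}.
    with open(path) as f:
        tokens = f.read().split()
    total_image_num = int(tokens[0])
    pairs = {}
    pos = 1
    while pos < len(tokens):
        ref_id = int(tokens[pos])
        num_src = int(tokens[pos + 1])   # usually 10
        pos += 2
        views = []
        for _ in range(num_src):
            views.append((int(tokens[pos]), float(tokens[pos + 1])))
            pos += 2
        pairs[ref_id] = views
    return total_image_num, pairs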

MVSNet input from SfM output

We provide a script to convert COLMAP SfM results to R/MVSNet input. After recovering the SfM result and undistorting all images, COLMAP should generate a dense folder COLMAP/dense/ containing an undistorted image folder COLMAP/dense/images/ and an undistorted camera folder COLMAP/dense/sparse/. Then, you can apply the following script to generate the R/MVSNet input:

python colmap2mvsnet.py --dense_folder COLMAP/dense

The depth sample number will be automatically computed using the inverse depth setting. If you want to generate the MVSNet input with a fixed depth sample number (e.g., 256), you can specify the depth number via --max_d 256.
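
The inverse-depth computation itself lives in colmap2mvsnet.py. As a rough illustration of the fixed-sample-count case only (an assumption for illustration, not the script's actual code), the per-sample interval could be derived from an estimated depth range like this:

def depth_interval_from_range(depth_min, depth_max, max_d=256):
    # Illustration only: spread max_d samples evenly over [depth_min, depth_max].
    return (depth_max - depth_min) / (max_d - 1)

# e.g. a scene spanning depths 425.0 .. 935.0 sampled with max_d = 256
# gives an interval of (935 - 425) / 255 = 2.0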

Output Format

The test.py script creates a depths_mvsnet folder to store the running results, including the depth maps, probability maps, scaled/cropped images and the corresponding cameras. The depth and probability maps are stored in .pfm format. We provide Python IO for pfm files in the preprocess.py script; for C++ IO we refer users to the CImg library. To inspect the pfm format results, you can simply type python visualize.py .pfm.
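
As an alternative to visualize.py, a minimal sketch for loading a result directly (assuming preprocess.py is importable from the same folder and that load_pfm takes a binary file object, as in the scripts above; the file path is only an example following the indexing convention):

import matplotlib.pyplot as plt
from preprocess import load_pfm

# Example path only; pick any .pfm written by test.py.
with open('TEST_DATA_FOLDER/depths_mvsnet/00000012.pfm', 'rb') as f:
    depth = load_pfm(f)

plt.imshow(depth)
plt.colorbar()
plt.show()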

Changelog

2020 April 13

  • Update BlendedMVG interface

2020 March 2

  • Pretrained models on BlendedMVS, DTU and ETH3D (trained for 150000 iterations)
  • Update instructions in README.md

2020 Feb 29

  • Training with BlendedMVS dataset
  • Add validate.py script for depth map validation
  • Add photometric_augmentation.py script for online augmentation during training

2019 April 29

  • Add colmap2mvsnet.py script to convert COLMAP SfM result to MVSNet input, including depth range estimation and view selection

2019 April 10

  • Add Baiduyun (code: s2v2) link for mainland China users

2019 March 14

  • Add R-MVSNet point clouds of DTU evaluation set

2019 March 11

  • Add "Reproduce Benchmarking Results" section

2019 March 7

  • MVSNet/R-MVSNet training/testing scripts
  • MVSNet and R-MVSNet models (trained for 100000 iterations)

2019 March 1

  • Implement R-MVSNet and GRU regularization
  • Network change: enable scale and center in batch normalization
  • Network change: replace UniNet with 2D UNet
  • Network change: use group normalization in R-MVSNet

2019 Feb 28

  • Use tf.contrib.image.transform for differentiable homography warping. Reconstruction is now x2 faster!


mvsnet's Issues

Error while executing make command

Thanks for the implementation.

I successfully ran the training part. For post-processing, I tried to install fusibile using the readme commands. I successfully ran the cmake command with CUDA 7.5, but while executing the make command I get the error below:

CMake Error at fusibile_generated_fusibile.cu.o.cmake:207 (message):
Error generating
~/MVSNet/mvsnet/fusibile/CMakeFiles/fusibile.dir//./fusibile_generated_fusibile.cu.o

CMakeFiles/fusibile.dir/build.make:63: recipe for target 'CMakeFiles/fusibile.dir/fusibile_generated_fusibile.cu.o' failed
make[2]: *** [CMakeFiles/fusibile.dir/fusibile_generated_fusibile.cu.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/fusibile.dir/all' failed
make[1]: *** [CMakeFiles/fusibile.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Please help me!

UnicodeDecodeError by load_pfm()

Hi! I am trying to implement this project.
But when I use the function load_pfm(), I get
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa5 in position 16: illegal multibyte sequence
in the line
header = str(file.readline()).rstrip().
I've tried many encoding types and none of them worked.
Did you get this error when loading the pfm file?
I can ignore the error, but I'm still not sure whether it matters.

How do you compute the memory required?

How can I compute the memory required to do inference with MVSNet as a function of max_w, max_h, max_d, interval_scale and possibly some other parameters?

It seems that as DEPTH_INTERVAL decreases, the network uses more memory.

Exporting camera values for use in MVSNet

I've been struggling to adapt camera extrinsics and intrinsics for use in MVSNet.

For extrinsics, I export E = [R|t] (via Colmap) as a 3x4 matrix composed of a 3x3 rotation matrix and 3x1 translation vector, as shown below:
r1,1 r1,2 r1,3 | t1
r2,1 r2,2 r2,3 | t2
r3,1 r3,2 r3,3 | t3

For example, here's the working '00000036_cam.txt' file:

extrinsic
-0.304163 -0.87333 0.380498 -236.771
0.244298 0.314555 0.917264 -567.94
-0.920762 0.371953 0.117677 583.523
0.0 0.0 0.0 1.0

intrinsic
2892.33 0 823.205
0 2883.18 619.072
0 0 1

425 2.5

When run, it yields the resulting depth map:
(resulting depth map image)
My output for the same source image using the code snippet shown below is:

extrinsic
0.998263 0.00300635 0.0588315 -0.453214
0.00892024 0.979466 -0.201412 -0.553871
-0.058229 0.201587 0.977738 1.11213
0.0 0.0 0.0 1.0

intrinsic
2889.61 0 800
0 2889.61 600
0 0 1

425 1.0

The depth map that results is shown here:
(resulting depth map image)

Looking at the intrinsics, you will immediately note that I'm using a single focal length value and that I haven't tuned the principal point; however, that seems not to be a problem as I'm able to compute results for your provided data by substituting my intrinsics values.

I thought perhaps the data was fine but I needed to adjust the DEPTH_MIN and DEPTH_INTERVAL values in order to frame the depth values, but changing those values yields highly similar results.

Therefore, the problem seems to be my construction of the extrinsics matrix. Any pointers would be very welcome.

Could you share the specification or code you use to export the camera values prior to MVSNet reconstruction?

My c++ output code using the Colmap library is below:

    file << "extrinsic" << std::endl;

    Eigen::Matrix3d R;
    R = image.second.RotationMatrix();
    
    // Write camera rotation matrix and translation vector
    file << R(0,0) << " " << R(0,1) << " " << R(0,2) << " " << image.second.Tvec(0) << std::endl;
    file << R(1,0) << " " << R(1,1) << " " << R(1,2) << " " << image.second.Tvec(1) << std::endl;
    file << R(2,0) << " " << R(2,1) << " " << R(2,2) << " " << image.second.Tvec(2) << std::endl;
    file << "0.0 0.0 0.0 1.0" << std::endl;

    // Write camera intrinsics
    file << std::endl;
    file << "intrinsic" << std::endl;

    // Reference to current image's camera
    auto& camera = cameras_.at(image.second.CameraId());
    
    // Note hard-coded zero value for skew
    file << camera.FocalLength() << " 0 " << camera.PrincipalPointX() << std::endl;
    file << "0 " << camera.FocalLength() << " " << camera.PrincipalPointY() << std::endl;
    file << "0 0 1" << std::endl;
    file << std::endl;

ROI via principal point shift and focal length scaling

Here I have MVSNet camera information for a 4000x3000 image:

extrinsic
0.999804 -0.019096 -0.00518646 -1.1364
0.0189494 0.999457 -0.0269683 -1.71932
0.00569863 0.0268648 0.999623 1.65699
0.0 0.0 0.0 1.0

intrinsic
4806.29 0 2000
0 4806.29 1500
0 0 1

2.56448 0.0640064

The focal length (4806.29 pixels) and principal point (centered) are shown in the intrinsic matrix above:

4806.29 0 2000
0 4806.29 1500
0 0 1

Next we crop the input image to its top left quadrant (one quarter of our original image, now 2000x1500) and compensate in the intrinsics by doubling the focal length and shifting the principal point as follows:

9612.58 0 0
0 9612.58 0
0 0 1

The idea is to break the existing camera into parts, creating an arbitrary region of interest.

When I run MVSNet on the above, I get sensible results for the initial pose but unexpected results for the upper left quadrant intrinsics.

Is this a sound approach? Does MVSNet need all input images to be the same size?

Testing on Aerial Dataset data

Hello,

I am currently trying to test MVSNet on the Aerial dataset. The scene I am using for testing contains 226 images of 1280 × 720. I resized them to 1280 × 900 prior to using MVSNet.
The 00000000_cam.txt looks like this:

extrinsic
-0.781098 0.61589 0.102788 -1.16349
0.278708 0.491198 -0.825255 -1.00648
-0.558756 -0.615958 -0.555327 16.4424
0.0 0.0 0.0 1.0

intrinsic
2307.27 0 640
0 2307.27 360
0 0 1

8 0.05

I also added information to the pair.txt file by computing feature similarity between the pictures and selecting the top 10 results (for that I simply computed the RMSE between images and chose the 10 lowest results). The depth range for the input images is between 8 and 17, so I selected a depth_interval of 0.05. I am also using a max_d of 192 and an interval_scale of 1.
The output looks like this:
(output depth map image)
Clearly, the result is very far from the original input image, which looks like this:
(input image)

If you could provide some guidance about the depth reconstruction, or let me know whether you ever assessed MVSNet on aerial data, it would be great!

Thank you!

Fusion of input with varying K values

@YoYo000's custom variant of Gipuma fusion, provided here, appears to assume that all input cameras share the same intrinsics K. For instance, input with varying principal points breaks the fusion.

Can you confirm, and perhaps suggest a workaround or an alternative fusion pipeline?

Freezing model

Hi,

I'm trying to optimize the code. You previously stated that one could freeze the model to improve run times. Do you have an associated .pbtxt file?

Or does your training code save a .pbtxt file? I didn't find a "tf.train.write_graph" call to write out the .pbtxt file.

Thanks again for the help and great software!

fusibile printing an empty .ply

Hello! I managed to run everything for MVSNet and I got a depth folder that looks exactly like your example, but when I try to run fusibile the .ply file is empty. It also gives me the following message:

.
.
.

Resizing globalstate to 49
Run cuda
Run gipuma
Grid size initrand is grid: 9-7 block: 32-32
Device memory used: 337.707001MB
Number of iterations is 8
Blocksize is 15x15
Disparity threshold is 0.250000
Normal threshold is 6.283185
Number of consistent points is 3
Cam scale is 1.000000
Fusing points
Processing camera 0
Found 0.00 million points
Processing camera 1
Found 0.00 million points
...
Processing camera 48
Found 0.00 million points
ELAPSED 0.022905 seconds
Error: no kernel image is available for execution on the device
Writing ply file ../TEST_DATA_FOLDER/dtu_test_scan9/scan9/points_mvsnet//consistencyCheck-20190220-150052//final3d_model.ply
store 3D points to ply file

It looks like there's something wrong with the kernel image. Do you have any idea what could be causing this? I'm running it on a Tesla K80 with CUDA 9.0.

Unable to replicate benchmark results on ETH3D

Hi @YoYo000,

I have been attempting to replicate your results on the lakeside dataset from ETH3D (and again, thank you for the preprocessed dataset). However, I am getting spotty results and am wondering what parameters you used when running the network and fusibile.

Here are the images of my final reconstruction:

(reconstruction screenshots)

As for my setup, I used a max_w of 640 and a max_h of 480. There were some problems in the actual test.py file where if a collection of views had different image sizes, the actual cropping of the images to max_w and max_h failed. I bypassed this by having max_w and max_h be divided by the min width and height of the images (I'm sure this didn't affect anything).

After running this, I ran fusibile with 0.3 prob_threshold.

I'm hoping you could let me know where I am going wrong and how to reproduce your benchmarks.

Thanks,
Prashant

Training with max_d 256

Hi,

I was trying to compare the performance of the network with different numbers of depth samples. My expectation (and what the paper suggests) is that a higher number of depth samples should lead to more accurate depth predictions.
I ran two experiments on DTU, one with max_d 128 and interval_scale set to 1.6, and a second with max_d 256 and interval_scale set to 0.8. I observe that the first model performs much better (% < 1mm close to 90%) than the second model with more depth samples (% < 1mm around 80%).

Is there anything else in the code to be changed when training/testing with different max_d besides interval_scale ? I am using the provided DTU cam files.

Question about ground-truth depth maps and the evaluation

Hello!
Thanks a lot for your work!

I have tried to train and test the network. It works fine but I still have two questions:
(1) For training, I wonder how to obtain the ground-truth depth maps. You have provided the depth maps and given a brief description in the paper, but it is difficult for people new to the field to reproduce. Can you provide more instructions on generating ground-truth depth maps? Or is there any software to do that?
(2) For testing, I have tried to compare the performance with COLMAP. However, COLMAP does not rely on calibrated camera parameters and seems to use a different coordinate system. Since the evaluation criteria of accuracy and completeness compute distances between point clouds, the computed distances may be very high even though the point clouds are similar. I wonder how to conduct fair comparisons as you did in the paper?

Hope you can give some instructions. Thanks a lot.

Minimum hardware requirements to train/test

Hi,

First of all, congratulations on your excellent work and contribution to MVS.

I've tried to test the code with an Intel 8700k, 16GB RAM and Nvidia 1070 8GB VRAM without success using both tensorflow-cpu and tensorflow-gpu.

The problem is resource exhaustion: the system runs out of RAM or VRAM a few seconds after the program starts.

My first idea to fix it was decreasing batch size, but it is already 1.

Do I need a better graphics card to run your code? Could I tune some params to make it work with my current computer setup?

Thanks.

Shape mismatch after reducing test image sizes

From your paper, I see:

"It is noteworthy that the training and validation on DTU dataset could be done using one consumer level GTX 1080ti graphics card (11 GB)."

After finding that 6GB of GPU RAM wasn't sufficient to complete testing with the DTU 'scan9' data set, I halved max width and height in 'test.py' in the hope of being able to test (but not train) with 6GB of GPU RAM, e.g.:
'max_w' 1152-->576,
'max_h' 864-->432

Launching test.py now fails during a Tensorflow call as the height dimension varies by one:

ValueError: Dimension 2 in both shapes must be equal, but are 28 and 27. Shapes are [1,48,28,36,32] and [1,48,27,36,32].

I also tried scaling the (49) input images by half (1,600x1,200 --> 800x600) with the above reduced max width and height, with similar results:

ValueError: Dimension 2 in both shapes must be equal, but are 28 and 27. Shapes are [1,48,28,36,32] and [1,48,27,36,32].

Are there other values I need to edit to successfully run resized images?

About test set results

Dear author:
Thanks for your work and for open-sourcing MVSNet!
I have some trouble generating the test set results.
I used your code step by step to generate the depth results for the test sets; however, an error comes up in the terminal (the error message is at the bottom). I have rechecked that my TensorFlow version is 1.9.0. It looks like the error comes from a dimension mismatch between these two tensors in gather_nd. Maybe there is some problem I have not found yet. Please help me, and thanks again.

tiffany@tiffany-System-Product-Name:~/Downloads/MVSNet/mvsnet$ python test.py --dense_folder ../scan9/
Testing MVSNet with 5 views
sample number: 49
2D with 32 filters
2D with 32 filters
2D with 32 filters
2D with 32 filters
2D with 32 filters
3D with 8 filters
2018-08-31 19:30:31.197754: I Pre-trained model restored from /home/tiffany/Downloads/compare/scan9/model/model.ckpt-70000
left information:[[[0.970263 0.00747983 0.241939]]...]
fronto_direction information:[[[-0.241605 -0.030951 0.969881]]]
left information:[[[0.970263 0.00747983 0.241939]]...]
fronto_direction information:[[[-0.241605 -0.030951 0.969881]]]
left information:[[[0.970263 0.00747983 0.241939]]...]
fronto_direction information:[[[-0.241605 -0.030951 0.969881]]]
left information:[[[0.970263 0.00747983 0.241939]]...]
fronto_direction information:[[[-0.241605 -0.030951 0.969881]]]
Traceback (most recent call last):
File "test.py", line 252, in
tf.app.run()
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "test.py", line 239, in main
mvsnet_pipeline(mvs_list)
File "test.py", line 198, in mvsnet_pipeline
[depth_map, init_depth_map, prob_map, croped_images, scaled_cams, image_index])
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: flat indices[301, :] = [0, 192, 1, 13] does not index into param (shape: [1,192,216,288]).
[[Node: GatherNd_3 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](soft_arg_min/prob_volume, stack_5)]]

Caused by op u'GatherNd_3', defined at:
File "test.py", line 252, in
tf.app.run()
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "test.py", line 239, in main
mvsnet_pipeline(mvs_list)
File "test.py", line 162, in mvsnet_pipeline
centered_images, scaled_cams, FLAGS.max_d, depth_start, depth_interval)
File "/data1/Downloads/MVSNet/mvsnet/model.py", line 241, in inference_mem
prob_map = get_propability_map(probability_volume, estimated_depth_map, depth_start, depth_interval)
File "/data1/Downloads/MVSNet/mvsnet/model.py", line 61, in get_propability_map
prob_map_right1 = tf.gather_nd(cv, voxel_coordinates_right1)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3051, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/home/tiffany/anaconda3/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1740, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): flat indices[301, :] = [0, 192, 1, 13] does not index into param (shape: [1,192,216,288]).
[[Node: GatherNd_3 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](soft_arg_min/prob_volume, stack_5)]]

Evaluation on DTU

Hi, Thanks for sharing this implementation !

We are trying to evaluate the model on the entire DTU dataset. We were wondering whether the numbers quoted in the paper for the DTU test set are over all 7 lighting conditions, or just the max light setting (l3) of DTU, because it seems the authors of the original DTU paper only used the max light setting to evaluate the methods.

Thanks !

None

Thank you for your answer. I just have a question on the function to calculate theta: is p supposed to be the sparse 3D point seen from both images, in world coordinates? I did a first conversion and the results look weird; do you have any idea of what could be happening? I'm putting here the results I'm getting with MVSNet and the results I got with OpenMVS:

(MVSNet results and OpenMVS result screenshots)

My images look like this one:
(input image)

I'm getting min depth 0 and max depth 26 from OpenMVG. When I use those values the output is empty, but when I start increasing the min value I start getting these results.

Originally posted by @UannaFF in #29 (comment)

Question: testing the model on a training example

Hello!
Thanks, @YoYo000 a lot for your work!

I'm using your model to establish a baseline for my experiments on data augmentation for depth inference with DL.

I do not have much experience in DL but my understanding of general principles of ML makes me believe that I should expect a good performance on a training example given that the training converged to an optimal solution. Thus, I assumed that if I take your pre-trained model and run it on an example from the training set I should get something that looks similar to the corresponding ground truth. However, the visual assessment of the results I get does not support my assumption.

Here is the visualization for the test on mvs_training/Rectified/rect_001_0_r5000.png
(visualization of the test result vs. ground truth)

Steps to reproduce the result:

  • download dtu_training.rar as explained in the Training section of the readme
  • compose the 'test_dir' as follows
    ~/test_dir/images <- mvs_training/Rectified/scan1_train
    ~/test_dir/cams <- mvs_training/Cameras/train
    ~/test_dir/pair.txt <- mvs_training/Cameras/pair.txt
    ~/test_dir/gt <- mvs_training/Depths/scan1_train
  • modify function gen_pipeline_mvs_list in preprocess.py accordingly to account for different naming of images:
    line 424: ref_image_path = os.path.join(image_folder, ('rect_%03d_0_r5000.png' % (ref_index+1)))
    line 434: view_image_path = os.path.join(image_folder, ('rect_%03d_0_r5000.png' % (view_index+1)))
  • run the testing script with the parameters used for training:
    python test.py --dense_folder ~/test_dir/ --regularization 'GRU' --max_w 640 --max_h 512 --max_d 192 --interval_scale 1.06 --view_num 3

Does this mean that my assumption is wrong or did I make a mistake in my experiment somehow?

tf.contrib.image.transform produces slightly different results

Hi there,
thanks for the mvsnet and the code.

I tried the "image.transform()" commit and compared its depth map with the results without the change. It seems, like there is a considerable difference between them (+- 2mm) and even a bias of around 1 mm between the two results.

Can you confirm this? Feel free to use my comparison script below.

I think you can easily fix this by updating the training inference() function and the stored trained model as well. Although it would be really nice if it still produced the same results, even when used with an old trained model!

Best regards, Christian Achilles.

import os
from preprocess import load_pfm
import numpy as np
from tensorflow.python.lib.io import file_io
import matplotlib.pyplot as plt

def compareDepthFiles(file1, file2, probFile1, probFile2):
    depth1 = load_pfm_by_name(file1)
    depth2 = load_pfm_by_name(file2)
    
#    prob1 = load_pfm_by_name(probFile1)
#    prob2 = load_pfm_by_name(probFile2)
#    mask1 = (prob1 > 0.5) & (prob1 <= 1.001)
#    mask2 = (prob2 > 0.5) & (prob2 <= 1.001)
#    depth1 *= mask1
#    depth2 *= mask2
    
    plotEvaluation(depth2-depth1, [-5, 5])
    
def load_pfm_by_name(filename):
    return load_pfm(file_io.FileIO(filename, mode='rb'))


def plotEvaluation(diff, diff_range):
        plt.set_cmap('jet')
        fig = plt.figure(figsize=(15,10))
        plt.subplot(2,1,1)
        plt.title("difference")
        plt.imshow(diff, vmin=diff_range[0], vmax=diff_range[1])
        plt.colorbar()
        
        plt.subplot(2,2,3)
        plt.title("diff distribution within the given range")
        plt.hist(np.reshape(diff, [-1]), bins=200, range=diff_range, density=True)
        
        plt.subplot(2,2,4)
        plt.title("diff distribution")
        plt.hist(np.reshape(diff, [-1]), bins=200, density=True)
    
        plt.show()

folder2 = "/data/scan9/depths_mvsnet/"
folder1 = "/data/scan9/depths_ImageTransform/"

probFile1 = os.path.join(folder1, "00000000_prob.pfm")
probFile2 = os.path.join(folder2, "00000000_prob.pfm")
depthFile1 = os.path.join(folder1, "00000000.pfm")
depthFile2 = os.path.join(folder2, "00000000.pfm")


compareDepthFiles(depthFile1, depthFile2, probFile1, probFile2)

OOM memory issue

Hi,

I had a question regarding running the 'family' image set from Tanks and Temples. I'm using a P100 on a Google Cloud compute engine machine. According to the initial logs, I have 15.5GB of GPU memory free (see below). I'm running into the OOM memory issue when running the following command:

python test.py --dense_folder /home/ubuntu/images/family/ --max_d 256 --max_w 1760 --max_h 960

The dimensions used are less than the 1920 x 1056 x 256 dimensions you used. Is there anything else you had to do to get MVSNet to work? My volume is approximately 83% of what you tested.

Did you use a P100 on a compute engine? Or did you use the Google Cloud ML platform without server?

Thanks again!
Ed

sample number: 152
2D with 32 filters
2D with 32 filters
2D with 32 filters
2D with 32 filters
2D with 32 filters
3D with 8 filters
2019-01-13 14:00:02.865478: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-13 14:00:03.482779: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-13 14:00:03.483170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
totalMemory: 15.90GiB freeMemory: 15.53GiB
2019-01-13 14:00:03.483203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-13 14:00:14.140075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-13 14:00:14.140130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-13 14:00:14.140137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-13 14:00:14.140357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15051 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)

Some questions about the depth sample setting D

Hello, thank you for open-sourcing your work. I have some questions about the setting of D.
First, I noticed that for training the parameters are set to 640x512x128 with interval = 1.6, for testing you use 1100x860x192 with interval = 0.8, and in the paper you use 1600x1184x256 with interval = 0.8 to obtain the final results. During testing, different values of D (192, 256) share the same interval (0.8); would this affect the results?
Second, I would like to ask: is the choice of D proportional to the image resolution? If I trained on 640*540 images with D = 256 and interval = 0.8, would that be inappropriate? Did you arrive at this choice of D through repeated experiments?
Thanks again for open-sourcing the code. I look forward to your reply.

Evaluation on DTU dataset

Hi, Yao Yao
I'm sorry to bother you because I ran into a problem when I tried to reproduce benchmarking results on DTU dataset.
Using the pre-trained R-MVSNet model and preprocessed inputs that you provide, the evaluation results are 0.4719(mean acc) and 0.5894(mean comp). W x H x D is set to 1600 x 1200 x 256, interval_scale is set to 0.8 and prob_threshold is set to 0.3. All other settings are default. Variational depth map refinement is not applied. So, I want to know whether variational depth map refinement improves the results a lot, or some settings need to be changed.
I’m looking forward to your reply, thank you.
xiaochi

CPU inference seg faults in depthfusion.py

I noticed that the outputs generated from running test.py using just CPU are different from the GPU and cause depthfusion.py to fail with segmentation fault.

This is the TnT Horse dataset using max_w, max_h, max_d = (1152, 672, 256).

Here are the depth maps produced using the GPU:
https://www.dropbox.com/s/6nwbic9884ulwtt/depths_mvsnet_gpu.zip?dl=0

and these are the CPU ones:
https://www.dropbox.com/s/jozz38morx0pxd2/depths_mvsnet_cpu.zip?dl=0

Not exactly sure what's different between them that causes depthfusion.py to seg fault with the CPU dataset.

I wanted to try using the CPU because I had access to more memory even though it's slower.

Testing links not working

I am trying to download the model and the scan9 link in the testing section, but it says there was a problem redirecting.
Can someone check whether it is working or provide another source?

How to test the model on other images

I have trained and tested the model on the given dataset, and the experimental result is pretty good.

However, when I test the model on my own data, it seems not to work.

My testing procedure is:

  1. Collect a set of images using a camera with fixed focus;
  2. Get the camera parameters and matching information through VisualSFM, and fill the information in the corresponding cam.txt and pair.txt;
  3. Get the depth range based on point clouds produced by VisualSFM, and calculate the depth interval as (max_depth-min_depth)/192. In my case, the min_depth is 0.0, the interval is 0.01, and I have filled the information in each cam.txt;
  4. Set the max_w and max_h in the test script based on my image size (width:1152, height:768).

However, in the last depthfusion step, 0.00 million points are found. By visualizing the pfm files, I find that the probabilities are all 0.020833334.

I am new to this field and cannot figure out what is wrong. Could anyone be so kind as to help me?

Questions about evaluation results on middlebury datasets

Hi,
Really appreciate your great work on both MVSNet and R-MVSNet!
I tested R-MVSNet using the Middlebury Multi-View Dataset (temple). I tried many different depth ranges, input image pairs, and numbers of images, but the result always seems bad. Could you please give some advice?

Point cloud results look very dark in MeshLab

Hello, thank you for open-sourcing this project for us to learn from. I would like to ask why the point cloud I generate appears so dark in MeshLab, while your results are so clear and bright. Could you please take a look?
(screenshots)

Converting openMVG json to MVSNet usable data

Hello! I'm trying to convert the json I got from doing SFM with openMVG and I have some doubts.

First: this is what i understand from the pairs.txt provided by you
49
0
10 10 2346.41 1 2036.53 9 1243.89 12 1052.87 11 1000.84 13 703.583 2 604.456 8 439.759 14 327.419 27 249.278

line 1: number of views; line 2: index of the view for which we are listing the matches; line 3: the first number is the count of matches, followed by the views that contain matches with the specified view, but I don't know what the number next to each view is.

If view 0 has a match with view 10, what is 2346.41? From the JSON result of OpenMVG I get something like the following:
"structure": [
{
"key": 13,
"value": {
"X": [
1.8031973191036138,
2.2495496437301769,
6.762512416188528
],
"observations": [
{
"key": 0,
"value": {
"id_feat": 13,
"x": [
495.1080017089844,
723.5880126953125
]
}
},
{
"key": 1,
"value": {
"id_feat": 652,
"x": [
719.593017578125,
748.8690185546875
]
}
}
.
.
.
}

So, for each 3D point, they list the 2D points in each view that contribute to it. I thought I'd put those 2D coordinates next to the view match, but you have only one number, not a 2D coordinate.

Question: UniNet vs. UNetDS2GN - which is used in the MVSNet / R-MVSNet papers?

Hi, @YoYo000!

I see that in model.py all the inference routines actually employ a 2D UNet (UNetDS2GN) instead of the relatively simple 8-layer UniNet, and that change is not mentioned in either of your papers.

As I understand it, this change might have a noticeable impact on performance, as UNetDS2GN performs multi-scale processing and should be able to encode fine feature details better, while UniNet does not.

Was this change implemented after the R-MVSNet paper? Is it a new iterative improvement of the model?

Some problems with point cloud fusion

Hello, I apologize for writing to you out of the blue. I am a graduate student at Northwestern Polytechnical University, and I have recently run into some post-processing problems while studying MVS.
The specific problem is as follows: after generating the depth map for each image on the test data, the fusibile program compiles and runs normally, but part of the final output looks like this:
p folder is ../data/dtu/calib/
pmvs folder is
numImages is 49
img_filenames is 49

Fusing points
Processing camera 0
Found 0.00 million points
Processing camera 1
Found 0.00 million points
Processing camera 2
Found 0.00 million points
Processing camera 3
Found 0.00 million points
Processing camera 4
Found 0.00 million points
Processing camera 5
Found 0.00 million points
Processing camera 6
Found 0.00 million points
Processing camera 7
That is, it reports that no points can be found, and the resulting final3d_model.ply is only 255 bytes and shows no points in MeshLab. I have confirmed that the program's outputs are correct, and I have not made major changes to the code (apart from some path adjustments). Have you encountered this problem before, and how should it be handled?

I hope to hear back from you soon. Thank you very much for your work on MVS and for open-sourcing it.

Some problems about the evaluation standard(f-score and Comp.) and the depth range.

Thank you for your code. I have some questions about the evaluation standard(f-score and Comp.) and the depth range during my debug process.

1. The evaluation standards in your program are different from those in your paper. In particular, the unit in the code is px, but it is mm in the paper. Also, there are no f-score and Comp. functions in the code. We can only achieve 0.75 in <3px.

2. Besides that, we find the depth range in the depth map labels is from 429 to 939, while it is from 425 to 935 in the paper. We want to know whether this would affect the final result.

3. We want to know where we can find the benchmark of the DTU dataset. We can only find the link to the dataset but no link to the benchmark for evaluating the results.

Looking forward to your early reply. Thank you!

About DTU's matlab code for evaluation

Hello!
I would like to use the DTU evaluation code mentioned in your paper to compute the distance metric.
However, after running BaseEvalMain_web.m I obtain a .mat file. How should I use this .mat file to obtain the Acc and Comp values?
Looking forward to your reply, thank you.

Questions about shape of depth_start and depth in get_homographies()

Hi,
Really appreciate your great work on both MVSNet and R-MVSNet!

When I was reading your code, the shapes of depth_start, depth_interval and depth really confused me. To be specific, in the function train (located in train.py), you extract the DEPTH_MIN of the reference view in a batch-wise fashion, and it is quite obvious that the shape of depth_start is [FLAGS.batch_size], since you explicitly reshape it. When it comes to get_homographies (located in homography_warping.py), depth = depth_start + tf.cast(tf.range(depth_num), tf.float32) * depth_interval. This is where I am stuck and confused.

# train.py/train()
depth_start = tf.reshape(
    tf.slice(cams, [0, 0, 1, 3, 0], [FLAGS.batch_size, 1, 1, 1, 1]),
    [FLAGS.batch_size])
depth_interval = tf.reshape(
    tf.slice(cams, [0, 0, 1, 3, 1], [FLAGS.batch_size, 1, 1, 1, 1]),
    [FLAGS.batch_size])

# homography_warping.py/get_homographies()
depth_num = tf.reshape(tf.cast(depth_num, 'int32'), [])
depth = depth_start + tf.cast(tf.range(depth_num), tf.float32) * depth_interval
num_depth = tf.shape(depth)[0]

For one thing, depth_interval's shape is [FLAGS.batch_size], but tf.cast(tf.range(depth_num), tf.float32) has shape [depth_num]; can these two tensors be multiplied correctly?
For another, depth's shape ought to be the same as depth_start's, and in the next line of the code, num_depth = tf.shape(depth)[0], which is batch_size from my perspective. I am really confused about how the shape of depth is formed.

Currently, I am trying to rewrite your code in tensorpack and haven't tried to run it yet. I am wondering if this is an issue related to the TF version? I'm using TF 1.13.

Thanks a lot!

Questions about points fusion on ETH3D dataset

Hello! Thank you for your source code~

I want to ask some questions about point cloud fusion on the ETH3D dataset. The ETH3D data are grayscale images with 1 channel, whereas the DTU dataset consists of RGB images with 3 channels. How can I generate point clouds on the ETH3D dataset using the model pretrained on DTU?

I use cv2.imread('gray.png') so that the grayscale image has 3 channels, but the result is very bad!

Can I transform the RGB images of DTU to grayscale and obtain a pretrained model using the grayscale DTU dataset? Will it work?

Hoping for your early reply! Thank you.

Results with no variational depth map refinement

Hi,

RMVSNet appears to have improved the memory management on MVSNet. Good job!

I had a question regarding the point cloud results without depth map refinement (Figure 1 in your paper). I tried to reproduce your point cloud results without refinement and had a couple of questions. From the views attached, you can see my results against the published point clouds. The front views appear very similar. However, from the side view, the refined mesh is cleaner and sharper; my results have more noise.

I used settings of 1600x1200x256. I also set probability to 0.1. Is there anything else that could affect the noise level? Or do my results correspond to what you saw?

Thanks!

(front view and side view screenshots)

An unexplained error during training

Environment:
ubuntu 16
GTX 2080
Python 2.7
cudatoolkit 9.0
cudnn 7.1.2

The error is shown below. Where might the problem be?
2019-05-11 00:20:11.500263: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-11 00:20:11.873102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:88:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-05-11 00:20:11.873190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-05-11 00:20:12.375421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-11 00:20:12.375482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-05-11 00:20:12.375490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-05-11 00:20:12.376248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:88:00.0, compute capability: 7.5)
Forward pass: d_min = 425.000000, d_max = 931.150000.
2019-05-11 00:21:27.889785: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x562e4471b430
Forward pass: d_min = 425.000000, d_max = 931.150000.
2019-05-11 00:21:28.671645: E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED
2019-05-11 00:21:29.142631: I tensorflow/stream_executor/stream.cc:4737] stream 0x562e44930550 did not memcpy host-to-device; source: 0x7fb5ae822b00
2019-05-11 00:21:29.142743: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at matrix_inverse_op.cc:191 : Internal: MatInvBatched: failed to copy pointers to device
Traceback (most recent call last):
File "train.py", line 352, in
tf.app.run()
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 347, in main
train(sample_list)
File "train.py", line 313, in train
[summary_op, train_opt, loss, less_one_accuracy, less_three_accuracy])
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
[[Node: Model_tower0/get_homographies/MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Model_tower0/get_homographies/transpose_1, Model_tower0/get_homographies/Squeeze_5)]]
[[Node: Model_tower0/gradients/AddN_515/_2989 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_94560_Model_tower0/gradients/AddN_515", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op u'Model_tower0/get_homographies/MatMul_1', defined at:
File "train.py", line 352, in
tf.app.run()
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 347, in main
train(sample_list)
File "train.py", line 220, in train
images, cams, FLAGS.max_d, depth_start, depth_interval, is_master_gpu)
File "/home/xhy/depth/MVS/mvsnet/model.py", line 98, in inference
depth_start=depth_start, depth_interval=depth_interval)
File "/home/xhy/depth/MVS/mvsnet/homography_warping.py", line 32, in get_homographies
c_right = -tf.matmul(R_right_trans, tf.squeeze(t_right, axis=1)) # (B, D, 3, 1)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 2084, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/xhy/anaconda3/envs/py97/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1718, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
[[Node: Model_tower0/get_homographies/MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Model_tower0/get_homographies/transpose_1, Model_tower0/get_homographies/Squeeze_5)]]
[[Node: Model_tower0/gradients/AddN_515/_2989 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_94560_Model_tower0/gradients/AddN_515", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Tanks and Temples Setup

Thanks for sharing the code.

I am trying to reproduce the results on tanks and temples with the pre-trained model but not succeeding so far. An example camera file looks like:

extrinsic
0.333487 -0.0576322 -0.940992 -0.0320506
0.0582181 -0.994966 0.0815704 -0.0245921
-0.940956 -0.0819853 -0.328452 0.248608
0.0 0.0 0.0 1.0

intrinsic
1165.71 0 962.81
0 1165.71 541.723
0 0 1

0.193887 0.00406869 778 3.35933

I have parsed the file to adjust the depth min and max, but it doesn't seem to help much. I only have 12 GB of GPU memory, so I am running at half the image resolution, which shouldn't hurt much. However, the outputs I am getting are pretty bad and nothing like the paper. Moreover, I find that I have to change the parameters for every single scan (horse, family, etc.) separately, and no set of values seems to apply to all.

@YoYo000 Since there are multiple similar questions on this, it'd be great if you could please summarize the detailed steps for reproducing the scan results including the parameters to use and changes to the repository scripts if any.

Score values in view selection

I'm attempting to supply my own data, and was hoping for a little explanation of the score values read from the view selection file. How should we compute these values? I see they're read, but I don't see if and where they're used. Any helpful hints would be welcome.

As an example, here in 'pair.txt' the score values follow each of the 10 IDs:

49
0
10 10 2346.41 1 2036.53 9 1243.89 12 1052.87 11 1000.84 13 703.583 2 604.456 8 439.759 14 327.419 27 249.278

Question about the dataset links

Hello, thank you very much for open-sourcing this project. For certain reasons I cannot access websites hosted abroad. Is there a download link for the dataset that is accessible from mainland China? Thank you very much.

About the training data

First, thanks for your code. I can't download the data in China because I cannot access Google Drive; could you share it via Baidu disk? I also tried to download it from http://roboimagedata.compute.dtu.dk/, but the problem is that I can't find the right version. As your paper says: "As it provides the ground truth point cloud with normal information, we use the screened Poisson surface reconstruction (SPSR) [16] to generate the mesh surface, and then render the mesh to each viewpoint to generate the depth maps for our training", I think it should be point cloud data, but I'm not sure which one I should download. Besides, I want to try to classify objects of different sizes, so I want to classify using the depth images. I'm not sure whether that can work; do you have any good ideas about the classification? Many thanks.

Questions about downloading the pre-trained model

Hello!
Thanks, @YoYo000 a lot for your work!
When I tried to download the pre-trained models, it seems that we cannot reach Google Drive because of the "wall". Could you please provide a Baidu Netdisk version for mainland China?
Thank you very much!

About training, I can't make it work

Thanks for your work and code.
I encountered some problems when training MVSNet.

  1. I downloaded the preprocessed DTU training data from your link; however, there is no train folder in the Cameras folder. Also, the camera parameters seem to correspond to the original image size, so I modified the camera parameters following the process in the paper: resize to 800x600 and center crop to 512x640.
    The final 00000000_cam.txt should look like this; is it right?

extrinsic
0.970263 0.00747983 0.241939 -191.02
-0.0147429 0.999493 0.0282234 3.28832
-0.241605 -0.030951 0.969881 22.5401
0 0 0 1

intrinsic
1446.1650390625 0.0 331.6025085449219
0.0 1441.5899658203125 265.5354919433594
0.0 0.0 1.0

425 2.5

  2. I have tried training the network using the original train.py. I also tried the hyperparameters listed in the paper with max_d=256 and interval_scale=0.8. However, the loss decreases very slowly, and at 70000 iterations the model doesn't work at all. I also found that my model is very small, around 20MB, while the model you provided is almost 200MB.

I'm sorry, I'm new to TensorFlow, so it's a little hard for me to track the problems down.
Could you please give me some detailed suggestions about training this network?

error occurs while doing depth fusion

hi, Dr Yao,
I have used your pre-trained model and the scan9 dataset. I can get a correct point cloud (transformed to world coordinates) for each image, but it fails when fusing multiple images (it looks like they are in different coordinate systems). Could you give me some advice?

UnicodeDecodeError

Hi! I'm trying to run the test code under Python 3. I got this error when it ran to "image = scipy.misc.imread(image_file, mode='RGB')" in the MVSGenerator class:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte.

Can I please get some advice from you (or any others)? Thank you.

can you provide the derivation of Differentiable Homography?

Thanks for your work.
I have run the code on the DTU dataset, and the result is the same as yours.
However, when I run this model on my own data, it doesn't work. I looked into why and found that the coordinate system representation in my data is different from yours, so the extrinsic and intrinsic matrices are different.
This caused your core part (the differentiable homography) not to work for my own data, so I have to re-derive the formula. The introduction to the differentiable homography in your paper is rather brief for me, since I am a new learner.
(formula image)

If it is convenient, can you provide the derivation of this formula? I don't particularly understand what the middle part means. (second formula image)
(BTW, I use the coordinate system in photogrammetry).

Sorry to bother you and look forward to your reply.

Questions regarding optimization

Hi,

Great software. I had a couple questions regarding optimization and output:

Memory usage:
For memory, I'm testing images on a Google compute engine running one P100 GPU. I'm running out of memory with max_h = 1792, max_w = 1344, and max_d = 256. The MVSNet paper states you were able to run images with max_w = 1920, max_h = 1440 and max_d = 256. Is this correct? Any other thing I should look at in regard to memory? This is the error I'm getting.

Resource exhausted: OOM when allocating tensor with shape[150528,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Timing:
My first depth inference is taking over 100 seconds. All subsequent depth inferences take about 3-5 seconds depending on resolution. Is there a way to speed this up? Also, the code takes about 2-3 minutes to get to the first depth inference. Is this normal?

Depthmap upsampling:
Is there a way to improve the depthmap resolution? The depthmaps are very accurate. However, with the 4X down-sampling, the resulting pointclouds are fairly sparse. This poses a problem for meshing which likes dense pointclouds.

Thanks again and looking to your comments,
Ed
