
[ECCV 2022] RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering

Home Page: https://boese0601.github.io/rc-mvsnet/

License: MIT License

Python 86.61% Shell 0.09% MATLAB 12.36% C++ 0.84% C 0.10%
eccv2022 multi-view-stereo neural-rendering depth-estimation unsupervised-learning 3d-computer-vision neural-radiance-fields

rc-mvsnet's Introduction

RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering

Di Chang¹, Aljaž Božič¹, Tong Zhang², Qingsong Yan³, Yingcong Chen³, Sabine Süsstrunk² and Matthias Nießner¹
¹TUM  ²EPFL  ³HKUST
ECCV 2022
arXiv | Project page

Introduction

This is the official pytorch implementation of our ECCV2022 paper: RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering. In this work, we introduce RC-MVSNet, a neural-rendering-based unsupervised Multi-View Stereo 3D reconstruction approach. First, we leverage NeRF-like rendering to generate consistent photometric supervision for non-Lambertian surfaces in the unsupervised MVS task. Second, we impose a depth rendering consistency loss to refine the initial depth map predicted by the naive photometric consistency loss. We also propose Gaussian-Uniform sampling to improve NeRF's ability to learn geometry features close to the object surface, which overcomes occlusion artifacts present in existing approaches. We achieve state-of-the-art performance on the DTU and Tanks&Temples benchmarks, competitive with many supervised methods.

Installation

Clone repo:

git clone https://github.com/Boese0601/RC-MVSNet.git
cd RC-MVSNet

The code is tested with Python == 3.7, PyTorch == 1.10.1 and CUDA == 11.3 on an NVIDIA GeForce RTX 3090. We recommend using anaconda to manage dependencies. You may need to change the torch and cuda versions in requirements.txt according to your machine.

conda create -n rcmvsnet python=3.7
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda activate rcmvsnet
pip install -r requirements.txt
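A quick way to confirm that the environment picks up the intended versions is a minimal check like the following (a sketch, not part of the repo):

import torch

# Verify that the installed torch and CUDA toolkit versions match the setup above.
print(torch.__version__)         # expected: 1.10.1
print(torch.version.cuda)        # expected: 11.3
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))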

Datasets

DTU

Training

Download the DTU dataset pre-processed by MVSNet and extract the archive. You can use gdown to download it from Google Drive. Refer to MVSNet for detailed documentation of the file formats.
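If you prefer to script the download, gdown can also be called from Python. A minimal sketch, assuming a recent gdown version; the file ID and output name below are placeholders, take the real ones from the MVSNet download link:

import gdown

# Placeholder ID and filename: substitute the actual Google Drive file ID and
# archive name from the MVSNet link referenced above.
gdown.download(id="<GOOGLE_DRIVE_FILE_ID>", output="dtu_training_archive.zip", quiet=False)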

Download the original-resolution depth maps provided by YaoYao. Extract the archive and rename the folder to Depths_raw.

Merge the folders and you should get a dataset folder like the one below:

dtu
├── Cameras
├── Depths
├── Depths_raw
└── Rectified

Testing

Download the DTU testing dataset pre-processed by MVSNet and extract the archive. You can use gdown to download it from Google Drive. Refer to MVSNet for detailed documentation of the file formats.

dtu_test
├── scan1
├── scan4
├── scan9
...
├── scan114
└── scan118

Tanks and Temples (only for testing)

Download the Tanks and Temples testing set pre-processed by MVSNet. For the intermediate subset, remember to replace the cameras with those in short_range_caemeras_for_mvsnet.zip in the intermediate folder; see here. You should get a dataset folder like the one below:

tankandtemples
├── advanced
│   ├── Auditorium
│   ├── Ballroom
│   ├── Courtroom
│   ├── Museum
│   ├── Palace
│   └── Temple
└── intermediate
    ├── Family
    ├── Francis
    ├── Horse
    ├── Lighthouse
    ├── M60
    ├── Panther
    ├── Playground
    └── Train

Configure

There are several flag options at the beginning of each train/test file. The key options are explained below; the others are self-explanatory in the code. Before running our code, you may need to change true_gpu, trainpath/testpath, logdir and loadckpt (loadckpt only for testing).

  • logdir A relative or absolute folder path for writing logs.
  • true_gpu The true GPU IDs, used for setting CUDA_VISIBLE_DEVICES in the code. You may change it to your GPU IDs.
  • gpu The GPU ID(s) used in your experiment. For example, if true_gpu is "5, 6", you could use gpu: [0], gpu: [1], or gpu: [0, 1] (see the sketch after this list).
  • loadckpt The checkpoint file path used for testing.
  • trainpath/testpath A relative or absolute folder path for training or testing data. You may need to change it to your data folder.
  • outdir A relative or absolute folder path for generating depth maps and writing point clouds (DTU).
  • plydir A relative or absolute folder path for writing point clouds (Tanks).
  • dataset Dataset to be used. ["dtu_train","dtu_test","tanks"]
  • resume Resume training from the latest saved checkpoint.
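The interplay between true_gpu and gpu boils down to standard CUDA device masking. A rough illustration (a sketch, not the repo's exact code):

import os

# true_gpu restricts which physical GPUs the process can see at all;
# this must be set before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "5,6"   # corresponds to true_gpu = "5, 6"

import torch

# gpu then indexes into the visible subset: physical GPU 5 becomes cuda:0
# and physical GPU 6 becomes cuda:1 inside the program.
device = torch.device("cuda:0")              # corresponds to gpu = [0]
print(device)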

Training

Train the model on the DTU dataset:

python train_rcmvsnet.py --logdir ./rc-mvsnet --trainpath {your data dir} --dataset dtu_train --gpu [0,1,2,3] --true_gpu 0,1,2,3 

Testing

DTU

We provide pre-trained models in the pretrain folder, containing weights for both the backbone network and the rendering consistency network. Only the backbone network (filename ending with 'cas') is used for testing, as mentioned in the paper; the rendering consistency network (ending with 'nerf') is only needed to resume training from the current epoch.
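If you want to inspect a checkpoint before testing, something like the following works (a generic sketch; the exact keys stored in the file depend on how it was saved):

import torch

# Load on CPU just to inspect the contents of the backbone checkpoint.
ckpt = torch.load("./pretrain/model_000014_cas.ckpt", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
else:
    print(type(ckpt))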

You can use eval_rcmvsnet_dtu.py to reconstruct depth maps and point clouds from the checkpoint. To reproduce the DTU results in our paper, run the command below:

python eval_rcmvsnet_dtu.py

After you obtain the point clouds, you can follow the instructions on the DTU website to evaluate them quantitatively.

DTU Point Cloud Evaluation

We provide evaluation code in the matlab_eval folder. The code relies on the official evaluation code of the DTU dataset. Please use BaseEvalMain_web_pt.m, ComputeStat_web_pt.m and compute_mean.m for evaluation.

  • gt_datapath The path to the ground truth point clouds.
  • dataPaths The path to the generated point clouds of RC-MVSNet.
  • resultsPaths The output path for the metrics produced by the evaluation script.

Tanks and Temples

To reproduce the Tanks and Temples results in our paper, run the commands below:

python eval_rcmvsnet_tanks.py --split "intermediate" --loadckpt "./pretrain/model_000014_cas.ckpt"  --plydir "./tanks_submission" --outdir './tanks_exp' --testpath {your data dir}
python eval_rcmvsnet_tanks.py --split "advanced"  --loadckpt "./pretrain/model_000014_cas.ckpt" --plydir "./tanks_submission" --outdir './tanks_exp' --testpath {your data dir}

After you obtain the point clouds, you can submit them to the Tanks and Temples website for quantitative evaluation.

License

Our code is distributed under the MIT License. See the LICENSE file for more information.

Citation

@inproceedings{chang2022rc,
  title={RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering},
  author={Chang, Di and Bo{\v{z}}i{\v{c}}, Alja{\v{z}} and Zhang, Tong and Yan, Qingsong and Chen, Yingcong and S{\"u}sstrunk, Sabine and Nie{\ss}ner, Matthias},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}

Contact

If you have any questions, please raise an issue or email Di Chang ([email protected] or [email protected]).

Acknowledgments

Our code builds on several awesome repositories. We appreciate them for making their code available to the public.

rc-mvsnet's People

Contributors

boese0601

rc-mvsnet's Issues

Hello, when will the code be released?

Hello, thank you for your nice research.
When will your code be published?
I want to refer to your research, but there is no code yet.
Please tell me your rough schedule; it would be helpful for me.

A little confusion about Reference View Synthesis Loss.

Hello, thank you for your wonderful work!

In your ablation (Tab. 5), Lrc improves the performance of depth estimation.
As I understand it, Lrc in Sec. 3.2.3 is used to supervise NeRF. How does it affect the performance of CasMVSNet? Is it through the shared 2D Feature Extraction Network?

Looking forward to your reply.

Question about code

Hi! Thanks for open-sourcing your code!
I have read your paper and code, and one key step is the ray marching in the implicit neural volume for volume rendering, where the sampling points on the ray are converted to NDC space. It is well known that in NeRF the ray marching is done in the world coordinate system, because the rays are converted to world coordinates by the camera pose.
I am confused about why the ray marching of the implicit neural volume can be done in NDC space. Is the implicit neural volume in NDC space after regularization? What is the physical meaning behind this? Since I cannot seem to find the code that converts the final result from NDC coordinates back to the world coordinate system, could you point it out for me?
BTW, if the implicit neural volume is indexed in NDC space, shouldn't it use trilinear interpolation? Why bilinear interpolation? Or is this just how PyTorch names it? :)

features = F.grid_sample(volume_feature, grid, align_corners=True, mode='bilinear')[:,:,0].permute(2,3,0,1).squeeze()#, padding_mode="border"

Looking forward to your reply!
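On the bilinear-versus-trilinear question raised above: PyTorch's F.grid_sample interpolates trilinearly when the input is 5-D, even though the mode argument is still spelled 'bilinear'. A standalone check (not from the repo):

import torch
import torch.nn.functional as F

# 5-D input (N, C, D, H, W), e.g. a feature volume, and a 5-D sampling grid
# (N, D_out, H_out, W_out, 3) with normalized coordinates in [-1, 1].
volume = torch.randn(1, 8, 16, 32, 32)
grid = torch.rand(1, 1, 64, 64, 3) * 2 - 1

# mode='bilinear' on a 5-D input is carried out as trilinear interpolation
# internally (see the grid_sample documentation).
out = F.grid_sample(volume, grid, mode='bilinear', align_corners=True)
print(out.shape)  # torch.Size([1, 8, 1, 64, 64])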

Problem with reproducing "tanksandtemples" results

Hi,

I have reproduced the "tanksandtemples" .ply files following the instructions (without changing any parameters) on the main page of RC-MVSNet. But unfortunately, I could not get meaningful results. I have uploaded a wide screenshot that depicts the results.

Results

You can see the depth estimates (left) and masks (right) on the left page. As you can see here, some of the depth estimates come out empty. Even worse, I could get only a few meaningful masks, and not surprisingly I only get results like the right screen.

I would appreciate it if you could help me solve this issue.

Sincerely.

How to use with iPhone depth data

Hey! I have a dataset of MVS images and corresponding depths obtained by iPhone LiDAR. I want to evaluate the existing net / fine-tune on this dataset. However, as I noticed, the out-of-the-box results look far from good on my samples (reference image attached). I'm using COLMAP poses.


One thing I noticed is that depth_min in DTU is 425 and the interval is 1.06, while in my dataset the depth range is from 0.03 to 26, so I likely need to take these different parameters into account even when just evaluating the pretrained net. How can I reproduce the results on my dataset?
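For a rough sense of the scale difference, the per-plane interval for such a range follows from simple arithmetic; a sketch with a hypothetical number of depth hypotheses, not the repo's configuration:

# Hypothetical adaptation of the depth hypotheses to the iPhone dataset's range.
depth_min, depth_max = 0.03, 26.0   # range quoted above
num_hypotheses = 192                 # hypothetical plane count, not the repo's setting
depth_interval = (depth_max - depth_min) / (num_hypotheses - 1)
print(depth_interval)                # ~0.136, far smaller than DTU's 1.06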

Questions about neural rendering quality

Thanks for the excellent work! Here I have a question regarding the neural rendering network.

As suggested in the paper, the RGB rendering loss on the reference image boosts the depth prediction performance. I wonder whether you have visualized the RGB rendering quality of the reference view on test scenes. Is it better than, or comparable to, pure NeRF-like methods such as IBRNet?

I tried to inspect the rendering quality of the rendering network using your provided 'model_000014_nerf.ckpt', but the result seems quite weird:

RGB rendering: [attached image]
depth rendering: [attached image]

Any suggestions would be greatly appreciated. Thanks in advance!

Questions about batchnorm errors in Neural_Volume_Net

Hello. First of all, thank you for sharing this great research.

  1. However, Neural_Volume_Net, defined at line 736 of my render_models.py file, is the code that aggregates the 5D volumes. But when I run the code, I get an error because the batch norm is declared as 2D. I think it needs to be changed to BatchNorm3d; what do you think?
class Neural_Volume_Net(nn.Module):
    def __init__(self,
         num_groups=1,
         norm_act=nn.BatchNorm2d,
         levels=1):
  2. Also, I was wondering why you create an empty channel of size C for volume_feature_no_ref, which has no reference image, at line 83 of casmvsnet.py?

torch.empty((B, 3*(num_views-1) + C, num_depth, *ref_feature.shape[-2:]), device=imgs.device, dtype=torch.float)

About TT dataset

Do you have the code for the jdcas TT dataset? I contacted the other party, who said to communicate with the author, and I also encountered questions similar to those the jdcas author raised under the cvp-mvsnet code earlier: a dimension error. Do you have their TT dataset code? Thanks a lot!!!

RuntimeError: NCCL error in

RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1639180594101/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3

The parameters of the network.

Thanks for your excellent work!

I would like to know the network's parameters, which are not mentioned in your paper.

  1. Are the sizes (D, H, W) of the cost volume pyramid of the MVS branch the same as in the original CasMVSNet?
  2. What is the size (D, H, W) of the cost volume of your NeRF?
  3. How many samples do you take along one ray?

About train batch_size

When I set the training batch_size to 4, it reports an error at File "H:\jintan\RC-MVSNET + SE\models\render_utils.py", line 100, in get_rays_mvs:
dirs = torch.stack([(xs-intrinsic[0,2])/intrinsic[0,0], (ys-intrinsic[1,2])/intrinsic[1,1], torch.ones_like(xs)], -1) # use 1 instead of -1
How can I fix it? Thank you.

depth_est.detach()

return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs), depth_est.detach(),volume_feature,loss

Hi, I found that the MVSNeRF module only seems to affect the gradient of the image encoder. How does it improve the depth estimation of CascadeMVSNet?

Self collected dataset testing

Thank you for your hard work. I would like to use your algorithm with my own data for object reconstruction. After carefully reading the README, I did not find any relevant description. Can you tell me how to do this?

out of memory error

Author, I would like some useful advice for dealing with my GPU running out of memory, other than setting the batch size to 1 (which I have already done). It is OK if the change degrades the model's performance or slows down training; I just want to see it train. Thank you!

Decrease memory needs

I have a simple RTX 2060 with 6 GB; is it possible to run on it?
I'm testing with the tanksandtemples data, but it complains about GPU memory.

But if I decrease the resolution in the arguments when running, it doesn't seem to make a difference. If I change the ndepths it does work, up to a certain point, until it complains:

RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 2

I think the problem is that the provided training checkpoint was trained using the default depths 48,32,8.

So in the end I can't run the tests.
Is there another way?
Thanks a lot.

Can i train mine dataests?

Hello, I have tested your code and it works perfectly. I have a question: I want to train on my own dataset. How do I train on my own dataset? Thanks!

Maybe there is a bug in "models/casmvsnet.py#L61"

Maybe there is a bug in "models/casmvsnet.py#L61"

        img_feat[:, :3, :, pad:H + pad, pad:W + pad] = imgs[0].unsqueeze(2).expand(-1, -1, num_depth, -1, -1)

the shape of img_feat is (B, N, C, H, W) and the shape of imgs is (B*N, C, H, W), so the shape of imgs[0].unsqueeze(2) is (C, H, 1, W), and an error happens at the next step, .expand(-1, -1, num_depth, -1, -1)
I guess this line is meant to add the reference view (RGB channels) into img_feat
I think this line should be changed to:

        img_feat[:, :3, :, pad:H + pad, pad:W + pad] = imgs.unsqueeze(2).expand(-1, -1, num_depth, -1, -1)[0]

I have not tried to train the whole model, so I'm not sure.
Please reply if you see this; I would sincerely appreciate it.
(●'◡'●)
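A quick standalone check (not from the repo) of the shape mismatch described in the issue above:

import torch

B, N, C, H, W, num_depth = 1, 5, 3, 8, 8, 4
imgs = torch.randn(B * N, C, H, W)            # (B*N, C, H, W)

x = imgs[0].unsqueeze(2)                      # (C, H, 1, W): only 4 dimensions
try:
    x.expand(-1, -1, num_depth, -1, -1)       # 5 target sizes vs. 4 dims -> error
except RuntimeError as err:
    print("expand failed:", err)

# Unsqueezing before indexing keeps the depth dimension in the right place:
y = imgs.unsqueeze(2).expand(-1, -1, num_depth, -1, -1)[0]
print(y.shape)                                # torch.Size([3, 4, 8, 8])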

Reconstruction results are strange

I used the pretrained model in the repo with the command python eval_rcmvsnet_dtu.py, and visualized the .ply file of scan4 in the output directory, but the visualization looks strange, with some black parts; the visualizations of the other scans have the same problem, as shown below:
[attached screenshot: Snipaste_2023-06-09_10-49-18]
