gangweix / igev

[CVPR 2023] Iterative Geometry Encoding Volume for Stereo Matching and Multi-View Stereo

License: MIT License

Python 89.27% MATLAB 10.73%
multi-view-stereo stereo-matching cost-volume-filtering iterative-optimization cvpr2023

igev's Introduction

IGEV-Stereo & IGEV-MVS (CVPR 2023)

This repository contains the source code for our paper:

Iterative Geometry Encoding Volume for Stereo Matching
Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang

Demos

Pretrained models can be downloaded from Google Drive.

We assume the downloaded pretrained weights are located under the pretrained_models directory.

You can demo a trained model on pairs of images. To predict stereo for Middlebury, run

python demo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth

Comparison with RAFT-Stereo

Method        | KITTI 2012 (3-noc) | KITTI 2015 (D1-all) | Memory (G) | Runtime (s)
RAFT-Stereo   | 1.30 %             | 1.82 %              | 1.02       | 0.38
IGEV-Stereo   | 1.12 %             | 1.59 %              | 0.66       | 0.18

Environment

  • NVIDIA RTX 3090
  • Python 3.8
  • PyTorch 1.12

Create a virtual environment and activate it.

conda create -n IGEV_Stereo python=3.8
conda activate IGEV_Stereo

Dependencies

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -c nvidia
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib 
pip install tqdm
pip install timm==0.5.4
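
After installing the dependencies, a quick way to confirm that PyTorch sees the GPU (a generic check, not part of the repository):

import torch
print(torch.__version__)             # expect 1.12.x
print(torch.cuda.is_available())     # should print True on an RTX 3090 setup
print(torch.version.cuda)            # CUDA toolkit version the wheel was built against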

Required Data

To evaluate/train IGEV-Stereo, you will need to download the required datasets.

By default stereo_datasets.py will search for the datasets in these locations.

├── /data
    ├── sceneflow
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2012
            ├── training
            ├── testing
            ├── vkitti
        ├── KITTI_2015
            ├── training
            ├── testing
            ├── vkitti
    ├── Middlebury
        ├── trainingH
        ├── trainingH_GT
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
    ├── DTU_data
        ├── dtu_train
        ├── dtu_test
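
A small sanity check before launching training (a sketch; the paths come from the tree above and assume the default /data root):

import os

expected = [
    "/data/sceneflow/frames_finalpass",
    "/data/sceneflow/disparity",
    "/data/KITTI/KITTI_2012/training",
    "/data/KITTI/KITTI_2015/training",
    "/data/Middlebury/trainingH",
    "/data/ETH3D/two_view_training",
    "/data/DTU_data/dtu_train",
]
for path in expected:
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{status:8s} {path}")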

Evaluation

To evaluate on Scene Flow, Middlebury, or ETH3D, run

python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset sceneflow

or

python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset middlebury_H

or

python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset eth3d

Training

To train on Scene Flow, run

python train_stereo.py --logdir ./checkpoints/sceneflow

To train on KITTI, run

python train_stereo.py --logdir ./checkpoints/kitti --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --train_datasets kitti
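
To watch the loss curves while training (assuming train_stereo.py writes TensorBoard summaries under --logdir, which the tensorboard dependency above suggests), point TensorBoard at the checkpoint directory:

tensorboard --logdir ./checkpoints/sceneflow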

Submission

For submission to the KITTI benchmark, run

python save_disp.py

MVS training and evaluation

To train on DTU, run

python train_mvs.py

To evaluate on DTU, run

python evaluate_mvs.py

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{xu2023iterative,
  title={Iterative Geometry Encoding Volume for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21919--21928},
  year={2023}
}

Acknowledgements

This project is heavily based on RAFT-Stereo; we thank the original authors for their excellent work.

igev's People

Contributors

gangweix


igev's Issues

A question about demo_imgs.py

@gangweiX Hello, when I run the demo on test images, the following error is reported:
[screenshot of the error]
After initializing the variables at the reported location,
[screenshot of the change]
the following error appears instead:
[screenshot of the new error]
How should I resolve this?

About the performance on ETH3D

Dear Dr. Xu,
You did a fantastic job on this project! Thank you for sharing your work.
I would like to ask a question regarding ETH3D. Your evaluation score (3.6) on the ETH3D training set is more modest than RAFT-Stereo's (3.2), yet your method outperforms RAFT-Stereo on the ETH3D online benchmark (you rank 4th while RAFT-Stereo ranks 62nd). Could you please explain why it performs better there?
Thank you very much!

A question about the model's generalization ability

Hello, thank you for open-sourcing the code.
Regarding the network, I would like to ask how you improved the model's generalization ability. I used other networks to predict on images I captured myself and the results were very poor, but the results from your IGEV network are quite decent. I would like to know how you achieved such strong generalization. Thank you.

Problems when retraining on the dataset myself

I trained from scratch on the Scene Flow dataset, but the loss keeps rising and the EPE rises with it, and both eventually level off at a high value. Is something wrong with the dataset? Below are my loss and EPE curves from training. For the whole network I only modified the dataset loading path, so could the dataset itself be the problem?
[screenshots of the loss and EPE curves]

Question: how can the training cost be reduced?

Fine-tuning from the pretrained model is not expensive, only a few hours, but a single training run on the synthetic dataset takes at least four days.
Did you consider ways to reduce the experimental cost when designing your experiments?
I saw in your replies to other questions that you did not try training directly on KITTI, so I would like to know whether you tried to reduce the training cost; if each result takes almost a week, the experiments are hard to push forward.
Perhaps you split off a small part of the dataset as a benchmark and only ran the full training at the end; if so, I would appreciate details of the split and the benchmark settings.
Or perhaps you first used fewer than 200k iterations as a baseline; if so, please describe that as well.
These are only my guesses, since I cannot realistically consider an eight-GPU server or several servers for experiments (poverty limits the imagination).

Question about EPE calculation.

Thanks for your amazing work.

evaluate_stereo.py on line 192:

epe = torch.sum((flow_pr - flow_gt)**2, dim=0).sqrt()

However, by definition, $EPE=|d_{est} - d_{gt}|$ and $MAE=\frac{1}{N}\sum_{(x, y)\in N}|d_{est} - d_{gt}|$. So should it be like this?

epe = ((flow_pr - flow_gt)**2).sqrt()

In addition, the D1 metric would also be affected by changing how EPE is computed.
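
For what it's worth, a quick numerical check (a sketch, not from the repository) suggests the two expressions coincide whenever the disparity tensor has a single channel in dim 0, since summing one squared element and taking the square root is just the absolute value:

import torch

flow_pr = torch.rand(1, 4, 5) * 100          # hypothetical predicted disparity, shape [1, H, W]
flow_gt = torch.rand(1, 4, 5) * 100          # hypothetical ground-truth disparity
epe_repo = torch.sum((flow_pr - flow_gt) ** 2, dim=0).sqrt()   # as in evaluate_stereo.py
epe_abs  = (flow_pr - flow_gt).abs().squeeze(0)                # |d_est - d_gt|
print(torch.allclose(epe_repo, epe_abs))     # True for single-channel disparity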

How can the model be applied to fisheye images?

@gangweiX Hello, how should I feed binocular fisheye images into the model to predict depth? Is it feasible to train the model directly on fisheye images, or do I need to map between the fisheye camera model and the pinhole model first?

[Question regarding resolution of input data]

Hi! Thank you for the great work!

I was wondering whether the resolution of the input data used for training affects the performance of the model.

If so, how high a resolution would be good for performance?

Thank you!

Trifocal case: MVS?

Would you advise extending IGEV-Stereo for the special case of a trifocal tensor establishing the geometric projection between three views, or would IGEV-MVS be a more natural choice? In the latter case, are three images enough for inference? In your paper I see the number of input images is N=5 for training.

Need help running inference on grayscale images

I want to run the network on grayscale (single-channel) images.

I get the following error when running the network on grayscale images:
Traceback (most recent call last):
File "demo_imgs.py", line 100, in
demo(args)
File "demo_imgs.py", line 50, in demo
image1 = load_image(imfile1)
File "demo_imgs.py", line 29, in load_image
img = torch.from_numpy(img).permute(2, 0, 1).float()
RuntimeError: number of dims don't match in permute

I tried copying the same gray value into all 3 channels, but the results are not very good.

I see that ETH3D is a grayscale dataset, so I also tried the shared ETH3D weights, but I still get the error above.

Can you please share what changes are needed to adapt the network to grayscale images?
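
One way to avoid the permute error (a sketch, not an official fix; it assumes load_image reads the file into a NumPy array with PIL, as the demo script appears to do) is to replicate the single channel before building the tensor. Prediction quality on gray input is a separate question, since the released weights were trained on RGB:

import numpy as np
import torch
from PIL import Image

def load_image(imfile, device="cuda"):
    img = np.array(Image.open(imfile)).astype(np.uint8)
    if img.ndim == 2:                             # grayscale: H x W -> H x W x 3
        img = np.repeat(img[:, :, None], 3, axis=2)
    img = torch.from_numpy(img).permute(2, 0, 1).float()
    return img[None].to(device)                   # add batch dimension, move to GPU like the demo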

Two way matching (left->right, right->left)

I usually switch the images (and rotate them 180°) and run the model a second time to get a stereo match from right to left. This enables me to perform a consistency check between the two disparity maps. However, this is inefficient, since some modules perform the same calculations for both directions (e.g. the feature network). What would be the best place to split the model into two paths, where each path performs one stereo-matching direction?
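
For reference, the consistency check itself only needs the two disparity maps; a minimal sketch (names hypothetical, not part of the IGEV code base):

import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    # disp_left[y, x] = d means pixel (x, y) in the left image matches (x - d, y) in the right image
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_right_warped = disp_right[ys, x_right]
    return np.abs(disp_left - disp_right_warped) < thresh   # True where the two directions agree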

Which code should be used to colorize the disparity map?

I tried the two options below, but neither looks good; the colors are abnormal in some places.
plt.imsave(file_stem, disp, cmap='jet')
cv2.imwrite(file_stem, cv2.applyColorMap(cv2.convertScaleAbs(disp, alpha=0.01),cv2.COLORMAP_JET))
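
A common cause of the abnormal colors is the fixed scaling applied before the colormap; normalizing each disparity map to its own range first usually looks better (a generic sketch, not the authors' recommended code):

import cv2
import numpy as np

disp = np.load("disp.npy")                                    # hypothetical float disparity map
disp_norm = (disp - disp.min()) / (disp.max() - disp.min() + 1e-6)
disp_u8 = (disp_norm * 255.0).astype(np.uint8)
cv2.imwrite("disp_color.png", cv2.applyColorMap(disp_u8, cv2.COLORMAP_JET))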

SceneFlow Model Performance

Hi!

Thank you for your amazing work!

I tested the model on the Scene Flow dataset. In the original paper the EPE is 0.47, but when I used the released pre-trained model for testing I only got an EPE of 0.66. Is the released Scene Flow pre-trained model the one trained with data augmentation, intended for fine-tuning on other datasets and for generalization tests?

If the model with an EPE of 0.47 is only used for the Scene Flow evaluation, would you mind releasing the Scene Flow-only pre-trained weights? I want to compare against my model in occluded regions, and I cannot reproduce the 0.47 result.

Thank you very much.

Licence?

Hi, I noticed this repo didn't have a licence. Can it be used commercially?

Precomputed results on KITTI 2015

Thank you for sharing the code for your great work!

Can you provide precomputed results on KITTI 2015 (the results you submitted to KITTI)?

About generalization experiments on Middlebury and ETH3D.

Hi, thank you very much for your great work.
I tested the generalization performance of the checkpoint you provided on Middlebury and ETH3D, but the results differ from Table 7.
Is it because the provided checkpoint was trained without data augmentation?
I would like to know whether I misunderstood or misused something. Thanks.

The checkpoint used for the test is sceneflow.pth.

The results on the Middlebury (half) training set are:
All:
'EPE': [7.572738313674927], 'D1': [0.27418503363927205], 'Thres1': [0.4556864986817042], 'Thres2': [0.3513654222091039], 'Thres3': [0.3013058652480443]
Noc:
avg_test_scalars_nonocc {'EPE': [6.537938332557678], 'D1': [0.2406569098432859], 'Thres1': [0.41864077051480614], 'Thres2': [0.3155737062295278], 'Thres3': [0.2668188696106275]}

The results on the ETH3D training set are:
All:
'EPE': [0.9156075097896434], 'D1': [0.06998444140157921], 'Thres1': [0.18718870177313132], 'Thres2': [0.10728459610362295], 'Thres3': [0.06998444140157921]
Noc:
'EPE': [0.9049188627137078], 'D1': [0.06913529045548919], 'Thres1': [0.1841263070150658], 'Thres2': [0.10554461269768783], 'Thres3': [0.06913529045548919]

The results in Table 7 are:
[screenshot of Table 7 from the paper]

Some qualitative results from the checkpoint:

[colorized disparity screenshots: color_disp, playground_1l]

BTW, I am also curious how the RAFT-Stereo results in Table 7 were obtained.
[screenshot of Table 7]

The generalization results in the RAFT-Stereo paper are:
[screenshot from the RAFT-Stereo paper]

The results reported in GraftNet are the same as RAFT-Stereo's:
[screenshot from the GraftNet paper]

Error when loading the model with the timm library

Hello, when running your code the error occurs at this point:
[screenshot of the code location]
The error message is:
[screenshot of the error]
How can this be resolved?
Thank you!

ETH3D and Middlebury

Hello, I would like to ask: for these two datasets, are the numbers in your paper obtained directly with the Scene Flow pretrained model, rather than after mixed fine-tuning? Thank you for your guidance.

Results on ETH3D

Hi, thank you for your great work!
I have one question about results on ETH3D:
What is the difference between the models sceneflow.pth and eth3d.pth? Did you fine-tune on the training set of ETH3D?

max disparity can't be changed

Hi

Many thanks for your work!

It seems that the max_disp parameter, which defaults to 192, cannot be changed.
Changing this value causes a shape mismatch.

I found that the shape mismatch can be avoided by changing this line to:
gwc_volume = build_gwc_volume(match_left, match_right, self.args.max_disp//4, 8)
However, IGEV-Stereo is still not able to match disparities > 192 pixels.

Do I need to retrain the network with this change?

Best regards

There appears to be a small bug

Hi, thanks for the great work!

There appears to be a small bug in Line 137 of ./core/igev_mvs.py, which gives:
view_weight_sum += view_weight_sum + view_weight.unsqueeze(1)

I'm wondering whether this line of code should be written as:
view_weight_sum = view_weight_sum + view_weight.unsqueeze(1)
or
view_weight_sum += view_weight.unsqueeze(1)

How to train on my own dataset

Hello, I would like to ask what changes are needed in data loading, data format conversion, and data preprocessing in order to train on my own dataset. I would be very grateful if you could let me know.

A problem when training on the KITTI dataset

Hello, while training on the KITTI dataset I found that the validation set being loaded is FlyingThings, even though I have already changed the dataset to kitti in evaluate_stereo.py. I cannot find the cause; could you please help explain it?

Question about IGEV-MVS

Nice work!
I know that the basic framework of IGEV-Stereo is based on RAFT-Stereo.
May I ask which method or repo IGEV-MVS refers to?

How do I fine-tune?

I need to fine-tune IGEV on my own dataset. What steps should I follow to run the fine-tuning, given that there is no dedicated fine-tuning script?

Training divergence

"Hello, author. I'm training the SceneFlow dataset with IGEV, and the EPE keeps increasing, reaching more than 4000. Could you please provide guidance on how to solve this issue?"

n_downsample can't be changed

To reduce the size of the model, I want to retrain the network with a lower resolution of the disparity field.
I have already done this with the RAFT-Stereo network. However, when changing the corresponding parameter n_downsample from 2 to 3, the update_block throws a size-mismatch error in this line.

It seems that cnet changes the resolution correctly, while the feature network keeps the original resolution.

Can you supply a bugfix, or describe how you would change the network to allow different downsampling values?

Many thanks in advance!

Difference between the kitti2012 and kitti2015 checkpoints

Hello, I would like to ask about the difference between the two KITTI checkpoints you provide, kitti2012 and kitti2015.
The paper says the model was fine-tuned on the mixed KITTI 2012 and 2015 training sets, so the two datasets should share a single checkpoint.
I hope you can provide some training details for these two checkpoints. Thank you.

How to generate Figure 2(b)?

Hi,

Thank you for sharing this amazing work! I am wondering how you generated Figure 2(b) (shown in the screenshot below) in the paper. Did you first generate the all-pairs correlations using the following function? If so, how did you go from there to a disparity map?

def corr(fmap1, fmap2):

[screenshot of Figure 2(b)]

Thank you in advance for your help with my question :)
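
One plausible (unconfirmed, not the authors' stated method) way to go from all-pairs correlation to a coarse disparity map is to take, for every left-image pixel, the arg-max of its correlation along the same row of the right image:

import torch

def corr_argmax_disparity(fmap1, fmap2):
    # fmap1, fmap2: [B, C, H, W] left/right feature maps; returns a coarse disparity map [B, H, W]
    # (at feature resolution; multiply by the downsampling factor to compare at image scale).
    b, c, h, w = fmap1.shape
    corr = torch.einsum('bchw,bchx->bhwx', fmap1, fmap2) / c ** 0.5   # correlation within each row
    x_left = torch.arange(w).view(1, 1, w, 1)
    x_right = torch.arange(w).view(1, 1, 1, w)
    corr = corr.masked_fill(x_left - x_right < 0, float('-inf'))      # disparity must be >= 0
    best = corr.argmax(dim=-1)                                        # best matching right-image column
    return (torch.arange(w).view(1, 1, w) - best).float()             # disparity = x_left - x_right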

Some questions about model training

Hello!
I trained the model directly on the KITTI dataset, but I found that at around 800 steps the loss dropped to 14, and with further training it jumped to around 200 and never came back down.
Below is my loss curve; I would like to know what causes this. Thank you!
[screenshot of the loss curve]

Question about deploying the algorithm

Hello, I tried converting the IGEV-Stereo model to ONNX so it can be called from C++, but the conversion did not succeed. Do you have any related code or documentation you could share?

Questions about MVS training

First, thank you for sharing this great research.

I have a question regarding MVS training.
When training MVS, given that ground-truth depth is available, is there a reason to convert it to the disparity level and compute the loss there?

When I was training, I noticed that with the RAFT-style structure, applying the loss at the depth level does not work, but applying the loss at the disparity level does.

Do you have any idea why this is the case?

Thanks again.
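
A possible intuition (an outside sketch, not from the paper or the repository): converting depth to a normalized inverse-depth index before the loss keeps the supervision target in a bounded, roughly uniform range, which tends to train more stably than raw metric depth. A hypothetical conversion:

import torch

def depth_to_inverse_index(depth, depth_min, depth_max):
    # Map valid depths in [depth_min, depth_max] to a normalized inverse-depth
    # ("disparity-like") index in [0, 1]; invalid pixels (depth <= 0) stay at 0.
    valid = depth > 0
    inv = torch.zeros_like(depth)
    inv[valid] = (1.0 / depth[valid] - 1.0 / depth_max) / (1.0 / depth_min - 1.0 / depth_max)
    return inv.clamp(0.0, 1.0), valid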
