gangweix / igev

[CVPR 2023] Iterative Geometry Encoding Volume for Stereo Matching and Multi-View Stereo

License: MIT License

Python 89.27% MATLAB 10.73%
multi-view-stereo stereo-matching cost-volume-filtering iterative-optimization cvpr2023

igev's Introduction

IGEV-Stereo & IGEV-MVS (CVPR 2023)

This repository contains the source code for our paper:

Iterative Geometry Encoding Volume for Stereo Matching
Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang

Demos

Pretrained models can be downloaded from Google Drive.

We assume the downloaded pretrained weights are located under the pretrained_models directory.

You can demo a trained model on pairs of images. To predict stereo for Middlebury, run

python demo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth

Comparison with RAFT-Stereo

Method        | KITTI 2012 (3-noc) | KITTI 2015 (D1-all) | Memory (G) | Runtime (s)
RAFT-Stereo   | 1.30 %             | 1.82 %              | 1.02       | 0.38
IGEV-Stereo   | 1.12 %             | 1.59 %              | 0.66       | 0.18

Environment

  • NVIDIA RTX 3090
  • Python 3.8
  • PyTorch 1.12

Create a virtual environment and activate it.

conda create -n IGEV_Stereo python=3.8
conda activate IGEV_Stereo

Dependencies

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -c nvidia
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib 
pip install tqdm
pip install timm==0.5.4
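
After installing the dependencies, a quick way to confirm that PyTorch sees the GPU (a generic check, not part of the repository):

import torch
print(torch.__version__)             # expect 1.12.x
print(torch.cuda.is_available())     # should print True on an RTX 3090 setup
print(torch.version.cuda)            # CUDA toolkit version the wheel was built against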

Required Data

To evaluate/train IGEV-Stereo, you will need to download the required datasets.

By default stereo_datasets.py will search for the datasets in these locations.

├── /data
    ├── sceneflow
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2012
            ├── training
            ├── testing
            ├── vkitti
        ├── KITTI_2015
            ├── training
            ├── testing
            ├── vkitti
    ├── Middlebury
        ├── trainingH
        ├── trainingH_GT
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
    ├── DTU_data
        ├── dtu_train
        ├── dtu_test
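
A small sanity check before launching training (a sketch; the paths come from the tree above and assume the default /data root):

import os

expected = [
    "/data/sceneflow/frames_finalpass",
    "/data/sceneflow/disparity",
    "/data/KITTI/KITTI_2012/training",
    "/data/KITTI/KITTI_2015/training",
    "/data/Middlebury/trainingH",
    "/data/ETH3D/two_view_training",
    "/data/DTU_data/dtu_train",
]
for path in expected:
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{status:8s} {path}")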

Evaluation

To evaluate on Scene Flow, Middlebury, or ETH3D, run

python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset sceneflow

or

python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset middlebury_H

or

python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset eth3d

Training

To train on Scene Flow, run

python train_stereo.py --logdir ./checkpoints/sceneflow

To train on KITTI, run

python train_stereo.py --logdir ./checkpoints/kitti --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --train_datasets kitti
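
To watch the loss curves while training (assuming train_stereo.py writes TensorBoard summaries under --logdir, which the tensorboard dependency above suggests), point TensorBoard at the checkpoint directory:

tensorboard --logdir ./checkpoints/sceneflow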

Submission

For submission to the KITTI benchmark, run

python save_disp.py

MVS training and evaluation

To train on DTU, run

python train_mvs.py

To evaluate on DTU, run

python evaluate_mvs.py

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{xu2023iterative,
  title={Iterative Geometry Encoding Volume for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21919--21928},
  year={2023}
}

Acknowledgements

This project is heavily based on RAFT-Stereo; we thank the original authors for their excellent work.

igev's People

Contributors

gangweix


igev's Issues

A question about demo_imgs.py

@gangweiX Hello, when I run the demo on test images, the following error is reported:
[screenshot of the error]
After initializing the variables at the reported location,
[screenshot of the change]
the following error appears instead:
[screenshot of the new error]
How should I resolve this?

About the performance on ETH3D

Dear Dr. Xu,
You did a fantastic job on this project! Thank you for sharing your work.
I would like to ask a question regarding ETH3D. Your evaluation score (3.6) on the ETH3D training set is more modest than RAFT-Stereo's (3.2), yet your method outperforms RAFT-Stereo on the ETH3D online benchmark (you rank 4th while RAFT-Stereo ranks 62nd). Could you please explain why it performs better there?
Thank you very much!

A question about the model's generalization ability

Hello, thank you for open-sourcing the code.
Regarding the network, I would like to ask how you improved the model's generalization ability. I used other networks to predict on images I captured myself and the results were very poor, but the results from your IGEV network are quite decent. I would like to know how you achieved such strong generalization. Thank you.

Problems when retraining on the dataset myself

I trained from scratch on the Scene Flow dataset, but the loss keeps rising and the EPE rises with it, and both eventually level off at a high value. Is something wrong with the dataset? Below are my loss and EPE curves from training. For the whole network I only modified the dataset loading path, so could the dataset itself be the problem?
[screenshots of the loss and EPE curves]

Question: how can the training cost be reduced?

Fine-tuning from the pretrained model is not expensive, only a few hours, but a single training run on the synthetic dataset takes at least four days.
Did you consider ways to reduce the experimental cost when designing your experiments?
I saw in your replies to other questions that you did not try training directly on KITTI, so I would like to know whether you tried to reduce the training cost; if each result takes almost a week, the experiments are hard to push forward.
Perhaps you split off a small part of the dataset as a benchmark and only ran the full training at the end; if so, I would appreciate details of the split and the benchmark settings.
Or perhaps you first used fewer than 200k iterations as a baseline; if so, please describe that as well.
These are only my guesses, since I cannot realistically consider an eight-GPU server or several servers for experiments (poverty limits the imagination).

Question about EPE calculation.

Thanks for your amazing work.

evaluate_stereo.py on line 192:

epe = torch.sum((flow_pr - flow_gt)**2, dim=0).sqrt()

However, by definition, $EPE=|d_{est} - d_{gt}|$ and $MAE=\frac{1}{N}\sum_{(x, y)\in N}|d_{est} - d_{gt}|$. So should it be like this?

epe = ((flow_pr - flow_gt)**2).sqrt()

In addition, the D1 metric would also be affected by changing how EPE is computed.
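
For what it's worth, a quick numerical check (a sketch, not from the repository) suggests the two expressions coincide whenever the disparity tensor has a single channel in dim 0, since summing one squared element and taking the square root is just the absolute value:

import torch

flow_pr = torch.rand(1, 4, 5) * 100          # hypothetical predicted disparity, shape [1, H, W]
flow_gt = torch.rand(1, 4, 5) * 100          # hypothetical ground-truth disparity
epe_repo = torch.sum((flow_pr - flow_gt) ** 2, dim=0).sqrt()   # as in evaluate_stereo.py
epe_abs  = (flow_pr - flow_gt).abs().squeeze(0)                # |d_est - d_gt|
print(torch.allclose(epe_repo, epe_abs))     # True for single-channel disparity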

How can the model be applied to fisheye images?

@gangweiX Hello, how should I feed binocular fisheye images into the model to predict depth? Is it feasible to train the model directly on fisheye images, or do I need to map between the fisheye camera model and the pinhole model first?

[Question regarding resolution of input data]

Hi! Thank you for the great work!

I was wondering whether the resolution of the input data used for training affects the performance of the model.

If so, how high a resolution would be good for performance?

Thank you!

Trifocal case: MVS?

Would you advise extending IGEV-Stereo for the special case of a trifocal tensor establishing the geometric projection between three views, or would IGEV-MVS be a more natural choice? In the latter case, are three images enough for inference? In your paper I see the number of input images is N=5 for training.

Need help running inference on grayscale images

I want to run the network on grayscale (single-channel) images.

I get the following error when running the network on grayscale images:
Traceback (most recent call last):
File "demo_imgs.py", line 100, in
demo(args)
File "demo_imgs.py", line 50, in demo
image1 = load_image(imfile1)
File "demo_imgs.py", line 29, in load_image
img = torch.from_numpy(img).permute(2, 0, 1).float()
RuntimeError: number of dims don't match in permute

I tried copying the same gray value into all 3 channels, but the results are not very good.

I see that ETH3D is a grayscale dataset, so I also tried the shared ETH3D weights, but I still get the error above.

Can you please share what changes are needed to adapt the network to grayscale images?
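
One way to avoid the permute error (a sketch, not an official fix; it assumes load_image reads the file into a NumPy array with PIL, as the demo script appears to do) is to replicate the single channel before building the tensor. Prediction quality on gray input is a separate question, since the released weights were trained on RGB:

import numpy as np
import torch
from PIL import Image

def load_image(imfile, device="cuda"):
    img = np.array(Image.open(imfile)).astype(np.uint8)
    if img.ndim == 2:                             # grayscale: H x W -> H x W x 3
        img = np.repeat(img[:, :, None], 3, axis=2)
    img = torch.from_numpy(img).permute(2, 0, 1).float()
    return img[None].to(device)                   # add batch dimension, move to GPU like the demo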

Two way matching (left->right, right->left)

I usually switch the images (and rotate them 180°) and run the model a second time to get a stereo match from right to left. This enables me to perform a consistency check between the two disparity maps. However, this is inefficient, since some modules perform the same calculations for both directions (e.g. the feature network). What would be the best place to split the model into two paths, where each path performs one stereo-matching direction?
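
For reference, the consistency check itself only needs the two disparity maps; a minimal sketch (names hypothetical, not part of the IGEV code base):

import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    # disp_left[y, x] = d means pixel (x, y) in the left image matches (x - d, y) in the right image
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_right_warped = disp_right[ys, x_right]
    return np.abs(disp_left - disp_right_warped) < thresh   # True where the two directions agree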

Which code should be used to colorize the disparity map?

I tried the two options below, but neither looks good; the colors are abnormal in some places.
plt.imsave(file_stem, disp, cmap='jet')
cv2.imwrite(file_stem, cv2.applyColorMap(cv2.convertScaleAbs(disp, alpha=0.01),cv2.COLORMAP_JET))
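
A common cause of the abnormal colors is the fixed scaling applied before the colormap; normalizing each disparity map to its own range first usually looks better (a generic sketch, not the authors' recommended code):

import cv2
import numpy as np

disp = np.load("disp.npy")                                    # hypothetical float disparity map
disp_norm = (disp - disp.min()) / (disp.max() - disp.min() + 1e-6)
disp_u8 = (disp_norm * 255.0).astype(np.uint8)
cv2.imwrite("disp_color.png", cv2.applyColorMap(disp_u8, cv2.COLORMAP_JET))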

SceneFlow Model Performance

Hi!

Thank you for your amazing work!

I tested the model on the Scene Flow dataset. In the original paper the EPE is 0.47, but when I used the released pre-trained model for testing I only got an EPE of 0.66. Is the released Scene Flow pre-trained model the one trained with data augmentation, intended for fine-tuning on other datasets and for generalization tests?

If the model with an EPE of 0.47 is only used for the Scene Flow evaluation, would you mind releasing the Scene Flow-only pre-trained weights? I want to compare against my model in occluded regions, and I cannot reproduce the 0.47 result.

Thank you very much.

Licence?

Hi, I noticed this repo didn't have a licence. Can it be used commercially?

Precomputed results on KITTI 2015

Thank you for sharing the code for your great work!

Can you provide precomputed results on KITTI 2015 (the results you submitted to KITTI)?

About generalization experiments on Middlebury and ETH3D.

Hi, thank you very much for your great work.
I tested the generalization performance of the checkpoint you provided on Middlebury and ETH3D, but the results differ from Table 7.
Is it because the provided checkpoint was trained without data augmentation?
I would like to know whether I misunderstood or misused something. Thanks.

The checkpoint used for the test is sceneflow.pth.

The results on the Middlebury (half) training set are:
All:
'EPE': [7.572738313674927], 'D1': [0.27418503363927205], 'Thres1': [0.4556864986817042], 'Thres2': [0.3513654222091039], 'Thres3': [0.3013058652480443]
Noc:
avg_test_scalars_nonocc {'EPE': [6.537938332557678], 'D1': [0.2406569098432859], 'Thres1': [0.41864077051480614], 'Thres2': [0.3155737062295278], 'Thres3': [0.2668188696106275]}

The results on the ETH3D training set are:
All:
'EPE': [0.9156075097896434], 'D1': [0.06998444140157921], 'Thres1': [0.18718870177313132], 'Thres2': [0.10728459610362295], 'Thres3': [0.06998444140157921]
Noc:
'EPE': [0.9049188627137078], 'D1': [0.06913529045548919], 'Thres1': [0.1841263070150658], 'Thres2': [0.10554461269768783], 'Thres3': [0.06913529045548919]

The results in Table 7 are:
[screenshot of Table 7 from the paper]

Some qualitative results from the checkpoint:

[colorized disparity screenshots: color_disp, playground_1l]

BTW, I am also curious how the RAFT-Stereo results in Table 7 were obtained.
[screenshot of Table 7]

The generalization results in the RAFT-Stereo paper are:
[screenshot from the RAFT-Stereo paper]

The results reported in GraftNet are the same as RAFT-Stereo's:
[screenshot from the GraftNet paper]

Error when loading the model with the timm library

Hello, when running your code the error occurs at this point:
[screenshot of the code location]
The error message is:
[screenshot of the error]
How can this be resolved?
Thank you!

ETH3D and Middlebury

Hello, I would like to ask: for these two datasets, are the numbers in your paper obtained directly with the Scene Flow pretrained model, rather than after mixed fine-tuning? Thank you for your guidance.

Results on ETH3D

Hi, thank you for your great work!
I have one question about results on ETH3D:
What is the difference between the models sceneflow.pth and eth3d.pth? Did you fine-tune on the training set of ETH3D?

max disparity can't be changed

Hi

Many thanks for your work!

It seems that the max_disp parameter, which defaults to 192, cannot be changed.
Changing this value causes a shape mismatch.

I found that the shape mismatch can be avoided by changing this line to:
gwc_volume = build_gwc_volume(match_left, match_right, self.args.max_disp//4, 8)
However, IGEV-Stereo is still not able to match disparities > 192 pixels.

Do I need to retrain the network with this change?

Best regards

There appears to be a small bug

Hi, thanks for the great work!

There appears to be a small bug in Line 137 of ./core/igev_mvs.py, which gives:
view_weight_sum += view_weight_sum + view_weight.unsqueeze(1)

I'm wondering whether this line of code should be written as:
view_weight_sum = view_weight_sum + view_weight.unsqueeze(1)
or
view_weight_sum += view_weight.unsqueeze(1)

How to train on my own dataset

Hello, I would like to ask what changes are needed in data loading, data format conversion, and data preprocessing in order to train on my own dataset. I would be very grateful if you could let me know.

A problem when training on the KITTI dataset

Hello, while training on the KITTI dataset I found that the validation set being loaded is FlyingThings, even though I have already changed the dataset to kitti in evaluate_stereo.py. I cannot find the cause; could you please help explain it?

Question about IGEV-MVS

Nice work!
I know that the basic framework of IGEV-Stereo is based on RAFT-Stereo.
May I ask which method or repo IGEV-MVS refers to?

How do I fine-tune?

I need to fine-tune IGEV on my own dataset. What steps should I follow to run the fine-tuning, given that there is no dedicated fine-tuning script?

Training divergence

"Hello, author. I'm training the SceneFlow dataset with IGEV, and the EPE keeps increasing, reaching more than 4000. Could you please provide guidance on how to solve this issue?"

n_downsample can't be changed

To reduce the size of the model, I want to retrain the network with a lower resolution of the disparity field.
I have already done this with the RAFT-Stereo network. However, when changing the corresponding parameter n_downsample from 2 to 3, the update_block throws a size-mismatch error in this line.

It seems that cnet changes the resolution correctly, while the feature network keeps the original resolution.

Can you supply a bugfix, or describe how you would change the network to allow different downsampling values?

Many thanks in advance!

Difference between the kitti2012 and kitti2015 checkpoints

Hello, I would like to ask about the difference between the two KITTI checkpoints you provide, kitti2012 and kitti2015.
The paper says the model was fine-tuned on the mixed KITTI 2012 and 2015 training sets, so the two datasets should share a single checkpoint.
I hope you can provide some training details for these two checkpoints. Thank you.

How to generate Figure 2(b)?

Hi,

Thank you for sharing this amazing work! I am wondering how you generated Figure 2(b) (shown in the screenshot below) in the paper. Did you first generate the all-pairs correlations using the following function? If so, how did you go from there to a disparity map?

def corr(fmap1, fmap2):

[screenshot of Figure 2(b)]

Thank you in advance for your help with my question :)
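
One plausible (unconfirmed, not the authors' stated method) way to go from all-pairs correlation to a coarse disparity map is to take, for every left-image pixel, the arg-max of its correlation along the same row of the right image:

import torch

def corr_argmax_disparity(fmap1, fmap2):
    # fmap1, fmap2: [B, C, H, W] left/right feature maps; returns a coarse disparity map [B, H, W]
    # (at feature resolution; multiply by the downsampling factor to compare at image scale).
    b, c, h, w = fmap1.shape
    corr = torch.einsum('bchw,bchx->bhwx', fmap1, fmap2) / c ** 0.5   # correlation within each row
    x_left = torch.arange(w).view(1, 1, w, 1)
    x_right = torch.arange(w).view(1, 1, 1, w)
    corr = corr.masked_fill(x_left - x_right < 0, float('-inf'))      # disparity must be >= 0
    best = corr.argmax(dim=-1)                                        # best matching right-image column
    return (torch.arange(w).view(1, 1, w) - best).float()             # disparity = x_left - x_right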

Some questions about model training

Hello!
I trained the model directly on the KITTI dataset, but I found that at around 800 steps the loss dropped to 14, and with further training it jumped to around 200 and never came back down.
Below is my loss curve; I would like to know what causes this. Thank you!
[screenshot of the loss curve]

Question about deploying the algorithm

Hello, I tried converting the IGEV-Stereo model to ONNX so it can be called from C++, but the conversion did not succeed. Do you have any related code or documentation you could share?

Questions about MVS training

First, thank you for sharing this great research.

I have a question regarding MVS training.
When training MVS, given that ground-truth depth is available, is there a reason to convert it to the disparity level and compute the loss there?

When I was training, I noticed that with the RAFT-style structure, applying the loss at the depth level does not work, but applying the loss at the disparity level does.

Do you have any idea why this is the case?

Thanks again.
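
A possible intuition (an outside sketch, not from the paper or the repository): converting depth to a normalized inverse-depth index before the loss keeps the supervision target in a bounded, roughly uniform range, which tends to train more stably than raw metric depth. A hypothetical conversion:

import torch

def depth_to_inverse_index(depth, depth_min, depth_max):
    # Map valid depths in [depth_min, depth_max] to a normalized inverse-depth
    # ("disparity-like") index in [0, 1]; invalid pixels (depth <= 0) stay at 0.
    valid = depth > 0
    inv = torch.zeros_like(depth)
    inv[valid] = (1.0 / depth[valid] - 1.0 / depth_max) / (1.0 / depth_min - 1.0 / depth_max)
    return inv.clamp(0.0, 1.0), valid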
