cnexah / va-depthnet


VA-DepthNet: A Variational Approach to Single Image Depth Prediction

License: MIT License

Python 100.00%


va-depthnet's Issues

Processing time problem

I plan to use this code to process the images in the dataset to get the appropriate depth map. The dataset has 365 images. But it takes about two seconds to process each image. I guess this is because I have to load the model every time I process an image.
Could you please help me out there?

Code:
import torch
from PIL import Image
import numpy as np
from vadepthnet.networks.vadepthnet import VADepthNet
from vadepthnet.dataloaders.dataloader import ToTensor
import os
import tqdm
import glob
import time

def save_raw_16bit(depth, fpath="raw.png"):
    if isinstance(depth, torch.Tensor):
        depth = depth.squeeze().cpu().numpy()
    assert isinstance(depth, np.ndarray), "Depth must be a torch tensor or numpy array"
    assert depth.ndim == 2, "Depth must be 2D"
    depth = depth * 5000  # scale for 16-bit png
    depth = depth.astype(np.uint16)
    depth = Image.fromarray(depth)
    depth.save(fpath)
    print("Saved raw depth to", fpath)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: %s" % device)
model = VADepthNet(max_depth=10,
                   prior_mean=1.54,
                   img_size=(480, 640))
model = torch.nn.DataParallel(model)
checkpoint = torch.load('vadepthnet_nyu.pth', map_location=device)
model.load_state_dict(checkpoint['model'])
model.eval()
totensor = ToTensor('test')

# Dataset path
# There are 365 images in the dataset.
img_dir = 'D:/xiaohe/VA-depthnet/VA-DepthNet-main/rgb/*.png'
for img in glob.glob(img_dir):
    start = time.perf_counter()
    img = 'D:/xiaohe/VA-depthnet/VA-DepthNet-main/rgb/' + os.path.basename(img)
    image = Image.open(img)
    image = np.asarray(image, dtype=np.float32) / 255.0
    image = totensor.to_tensor(image)
    image = totensor.normalize(image)
    image = image.unsqueeze(0)
    pdepth = model(image)
    pdepth = pdepth.cpu().detach().numpy()
    pdepth = np.squeeze(pdepth)
    fpath = "./output/"
    fpath = os.path.join(fpath, os.path.basename(img))
    save_raw_16bit(pdepth, fpath)
    end = time.perf_counter()
    runTime = end - start
    runTime_ms = runTime * 1000
    print(runTime, "s")
    print(runTime_ms, "ms")

Result:
D:\xiaohe\anaconda\envs\zoedepth\python.exe D:\xiaohe\VA-depthnet\VA-DepthNet-main\test-author.py
device: cuda
D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Saved raw depth to ./output/1341845820.751833.png
3.3062937000000003 s
3306.2937 ms
Saved raw depth to ./output/1341845820.787768.png
2.1543924000000008 s
2154.3924000000006 ms
Saved raw depth to ./output/1341845820.819654.png
2.1151333 s
2115.1333 ms
Saved raw depth to ./output/1341845820.851997.png
2.1185686000000015 s
2118.5686000000014 ms
Saved raw depth to ./output/1341845820.887882.png
2.2202800999999983 s
2220.280099999998 ms
Saved raw depth to ./output/1341845820.920082.png
2.1999916000000006 s
2199.9916000000007 ms
Traceback (most recent call last):
File "D:\xiaohe\VA-depthnet\VA-DepthNet-main\test-author.py", line 49, in
pdepth = model(image)
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\xiaohe\VA-depthnet\VA-DepthNet-main\vadepthnet\networks\vadepthnet.py", line 303, in forward
d = self.vlayer(x)
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\xiaohe\VA-depthnet\VA-DepthNet-main\vadepthnet\networks\vadepthnet.py", line 188, in forward
x = torch.linalg.solve(ATA+jitter,ATB)
KeyboardInterrupt.
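
One way to check the model-loading guess is to time each stage of the loop separately. The script above already builds the model once, before the loop, so the roughly two seconds per image are more likely spent in preprocessing, the forward pass, or saving. The sketch below is a minimal, library-free timing helper; `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls only stand in for the real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Mean seconds per call for each stage, slowest stage first.
        means = {n: self.totals[n] / self.counts[n] for n in self.totals}
        return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


timer = StageTimer()
for _ in range(3):
    with timer.stage("load_and_preprocess"):
        time.sleep(0.001)  # stand-in for Image.open + normalize
    with timer.stage("forward"):
        time.sleep(0.05)   # stand-in for model(image)
    with timer.stage("save"):
        time.sleep(0.001)  # stand-in for save_raw_16bit

for name, mean_s in timer.report():
    print(f"{name}: {mean_s * 1000:.1f} ms per image")
```

When timing a real model on a GPU, CUDA kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock gives more trustworthy numbers. If the forward stage dominates, wrapping it in `torch.no_grad()` may be worth trying, since inference does not need gradient bookkeeping.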

CUDA out of memory

Thank you for your great work.
I am using an NVIDIA 3090 graphics card and trying to train with my own dataset. The dimensions of the dataset are consistent with KITTI. I attempted to modify the batch size, but it had no effect. The error details are as follows:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 684.00 MiB (GPU 0; 23.69 GiB total capacity; 20.26 GiB already allocated; 386.12 MiB free; 21.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Additionally, the dataset I am using is in TIFF format.
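
The error message itself points at `PYTORCH_CUDA_ALLOC_CONF` and `max_split_size_mb`. A minimal sketch of setting it from Python follows. The value 128 is an arbitrary assumption, not a recommendation from the authors, and the variable has to be set before the first CUDA allocation. It only addresses fragmentation; if the network genuinely needs more memory than the card has at this resolution, a smaller input size may still be required.

```python
import os

# Must be set before PyTorch initialises its CUDA caching allocator,
# so place this at the very top of the training script.
# 128 is an illustrative value (assumption); tune it for your workload.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```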

WARNING batched routines are designed for small sizes. It might be better to use the Native/Hybrid classical routines if you want good performance.

When I train this network with a different dataset, the following warning appears:

WARNING batched routines are designed for small sizes. It might be better to use the
Native/Hybrid classical routines if you want good performance.

Do you know how to deal with it?

demo.py or test.py

Thank you very much for your sharing. I am very interested in your research and would like to politely ask if you have a demo.py or test.py file that I can use to test a single image with the trained model and obtain a depth map.

prior_mean

How is the prior_mean calculated?
Thanks~

Calculation of prior_mean

Hi again! I noticed that you mentioned a rough method for calculating prior_mean in your previous answers. I tried experimenting based on that, but the prior_mean I computed for the NYU dataset is significantly different from the value you provided. I'm curious to know where the issue might be. Here's my code:

import os
import cv2
import numpy as np

def calculate_prior_mean(dataset_path, filenames_file):
    with open(filenames_file, 'r') as file:
        filenames = file.read().splitlines()

    total_log_depths = 0
    total_valid_pixels = 0

    for filename in filenames:
        depth_image_path = os.path.join(dataset_path, filename)  # Assuming filenames contain paths
        depth_image = cv2.imread(depth_image_path, cv2.IMREAD_UNCHANGED).astype(np.float32) / 1000.0  # Assuming depth values are in millimeters
        valid_pixels = depth_image > 0
        log_depths = np.log(depth_image[valid_pixels])
        total_log_depths += np.sum(log_depths)
        total_valid_pixels += np.sum(valid_pixels)
    prior_mean = total_log_depths / total_valid_pixels

    return prior_mean

dataset_path = 'image-depth-estimation/data/nyu2_test'
filenames_file = 'priormeannyutest.txt'
prior_mean = calculate_prior_mean(dataset_path, filenames_file)
print(f'The calculated prior mean is: {prior_mean}')

Output:
The calculated prior mean is: 0.8990918813373943
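
Two details can move a number like this noticeably, and neither is confirmed here as the authors' definition: whether the statistic is the mean of the log-depths (as in the code above) or the log of the mean depth, and which divisor converts the stored PNG values to metres. The sketch below uses made-up depth values purely to show the size of both effects; `depths_m` and the scale factor `s` are hypothetical.

```python
import math

# Made-up depths in metres (illustration only, not NYU statistics).
depths_m = [0.8, 1.5, 2.5, 4.0, 7.0]

mean_of_log = sum(math.log(d) for d in depths_m) / len(depths_m)
log_of_mean = math.log(sum(depths_m) / len(depths_m))

# By Jensen's inequality, mean(log d) <= log(mean d).
print(f"mean of log-depth: {mean_of_log:.4f}")
print(f"log of mean depth: {log_of_mean:.4f}")

# A wrong unit divisor multiplies every depth by a constant factor s,
# which shifts the mean log-depth by exactly log(s).
s = 2.0  # hypothetical scale error
shifted = sum(math.log(s * d) for d in depths_m) / len(depths_m)
print(f"shift from scale error: {shifted - mean_of_log:.4f} (log(s) = {math.log(s):.4f})")
```

The script above also reads the nyu2_test split, so a value computed on the training split may differ for that reason alone.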

Question about the SUN RGB-D metric of VA-DepthNet

I am writing to you today with a question regarding your impressive paper, "VA-DepthNet: A Variational Approach to Single Image Depth Prediction." I am particularly interested in Table 4, which presents the results on the SUN-RGB-D test set when the model is trained only on the NYU V2 training dataset.

While reading your paper, I noticed that the reported metrics for the SUN-RGB-D test set in Table 4 differ from those presented in other related works, such as the AdaBins paper by Bhat et al. (2021) and the DDP paper by Yuanfeng Ji et al. (2023). I am wondering if there might be an explanation for these discrepancies or if I have perhaps missed some important detail in my understanding.

Metrics reported in the VA-DepthNet paper:

Metrics reported in the AdaBins paper:

Metrics reported in the DDP paper:

I would be very grateful if you could clarify this point for me. I am eager to learn more about your work and its impressive methodology.

Thank you for your time and consideration.

Why does the measurement number change with each run evaluation?

Run the command for the first time: python ./vadepthnet/eval.py ./configs/arguments_eval_nyu.txt
result:
Computing errors for 654 eval samples , post_process: False
silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, d3
37.1363, 0.5363, 0.1786, 1.2995, 0.8748, 0.4910, 0.3442, 0.6113, 0.8001

Run the command for the second time: python ./vadepthnet/eval.py ./configs/arguments_eval_nyu.txt
result:
Computing errors for 654 eval samples , post_process: False
silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, d3
36.3882, 0.4038, 0.1615, 1.2035, 0.5721, 0.4457, 0.3762, 0.6654, 0.8434

Can you help me with this problem?
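
Metrics this far from the published ones that also change between runs are what one would expect if some weights stay randomly initialised, for example when the checkpoint keys do not match the model's keys and loading is non-strict. That is only a guess about this report. The pure-Python sketch below shows one way to compare key sets, using plain dictionaries as stand-ins for state dicts; `strip_prefix`, `diff_keys`, and the key names are hypothetical.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy with a leading prefix removed from every key.

    torch.nn.DataParallel stores parameters under 'module.<name>', so a
    checkpoint saved from a wrapped model carries this prefix.
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


def diff_keys(model_keys, checkpoint_keys):
    """Report keys missing from the checkpoint and keys the model lacks."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Stand-ins for model.state_dict() and checkpoint['model'] (invented names).
model_state = {"backbone.weight": 0, "vlayer.bias": 0}
checkpoint_state = {"module.backbone.weight": 1, "module.vlayer.bias": 1}

print(diff_keys(model_state, checkpoint_state))
print(diff_keys(model_state, strip_prefix(checkpoint_state)))
```

If both lists come back empty for the real model and checkpoint, the weights are not the cause and the next things to check would be `model.eval()` and any randomness in the data loader.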

Output of model is odd

Hi, as a sanity check I am training the model on a small number of KITTI samples. Below is the output when I run inference on one of the training inputs. After convergence the errors came down to reasonable values, but the output depth map is blurry and I can't figure out why.

When I load the model pretrained on KITTI (vadepthnet_eigen.pth), the outputs are fine. However, when I train it on KITTI data starting from the pretrained encoder backbone, the outputs look like the one above.

Do you have any idea why the output is blurry? Thanks in advance.

Usage example on random images?

Hi there! I was trying to test your wonderful model on a random image when I encountered this bug. Could you please help me out?
Below is the code to reproduce the error:
import torch
from vadepthnet.networks.vadepthnet import VADepthNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: %s" % device)
model = VADepthNet(max_depth=2, prior_mean=.6, img_size=(480, 640))
model = torch.nn.DataParallel(model)
checkpoint = torch.load('vadepthnet_nyu.pth', map_location=device)
model.load_state_dict(checkpoint['model'])
img = torch.rand(1,3,480,640).to(torch.float32)
pdepth = model.forward(img)
print(pdepth)
And the error message gives:
Traceback (most recent call last):
  File "/data2/zq/VA-DepthNet/test_img.py", line 25, in
    pdepth = model.forward(img)
  File "/home/zq/micromamba/envs/zoe/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zq/micromamba/envs/zoe/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data2/zq/VA-DepthNet/vadepthnet/networks/vadepthnet.py", line 302, in forward
    d = self.vlayer(x)
  File "/home/zq/micromamba/envs/zoe/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data2/zq/VA-DepthNet/vadepthnet/networks/vadepthnet.py", line 187, in forward
    x, _ = torch.linalg.solve(ATB, ATA+jitter)
RuntimeError: linalg.solve: A must be batches of square matrices, but they are 1200 by 1 matrices
Could you add a usage example for inference on a random image, please? Thanks in advance!
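
The shape in the error message (1200 by 1) suggests that the right-hand side is being passed where the square matrix is expected. `torch.linalg.solve(A, B)` takes the coefficient matrix first, whereas the older `torch.solve(B, A)` took it second and returned a tuple, so the `x, _ = ...` line in the traceback looks like a leftover of the older call. NumPy's `numpy.linalg.solve` uses the same (A, B) order, so the effect can be reproduced without the model. The sketch below is an illustration with a made-up 3 by 3 system, not the repository's code.

```python
import numpy as np

# Made-up symmetric positive-definite system standing in for ATA + jitter.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = np.array([[1.0], [2.0], [3.0]])  # column vector standing in for ATB

x = np.linalg.solve(A, B)            # coefficient matrix first
print("solution satisfies A @ x = B:", np.allclose(A @ x, B))

try:
    np.linalg.solve(B, A)            # swapped: B is 3 by 1, not square
    swapped_failed = False
except np.linalg.LinAlgError as err:
    swapped_failed = True
    print("swapped arguments fail:", err)
```

The other traceback on this page shows the line as `x = torch.linalg.solve(ATA+jitter,ATB)`, which matches the (A, B) order, so updating to the current version of vadepthnet.py may already resolve this.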

Questions on Eval/Dataset

Hello.
I am new to the field of single image depth prediction (SIDP).
I am trying to understand your architecture.

As I understand it, the raw RGB images are located under the 'kitti_raw' folder and the ground-truth images under the 'gt' folder.

When I tried your pre-trained "KITTI EIGEN" model,

$ python vadepthnet/eval.py configs/yoon_arguments_eval_kittieigen.txt

Q. Do you have any ideas why it cannot read files and show "0 eval samples"?

/home/013907062/.conda/envs/depthEst/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/013907062/.conda/envs/depthEst/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/013907062/.conda/envs/depthEst/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
== Total number of parameters: 263110761
== Total number of learning parameters: 263110761
== Model Initialized
== Loading checkpoint 'ckpts/vadepthnet_eigen.pth'
== Loaded checkpoint 'ckpts/vadepthnet_eigen.pth'
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 697/697 [00:30<00:00, 22.61it/s]
Computing errors for 0 eval samples , post_process:  False
  silog, abs_rel,   log10,     rms,  sq_rel, log_rms,      d1,      d2,      d3
    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan

Q. When I tried to evaluate with kitti_official_valid.txt, I ran into a problem with the depth_image paths:

depth_path = os.path.join(gt_path, "./" + sample_path.split()[1])
                                              ~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
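
`sample_path.split()[1]` fails when a line of the filenames file has only one whitespace-separated field, so the evaluation list appears to need at least an RGB path and a depth path on every line. A small pre-check like the hypothetical `find_short_lines` below can locate offending lines before running eval.py; the sample lines are invented for illustration.

```python
def find_short_lines(lines, min_fields=2):
    """Return (line_number, text) for non-empty lines with too few fields."""
    bad = []
    for number, text in enumerate(lines, start=1):
        if text.strip() and len(text.split()) < min_fields:
            bad.append((number, text))
    return bad


# Invented example lines: the second one lacks a depth path.
sample_lines = [
    "seq_a/image_0001.png seq_a/depth_0001.png",
    "seq_a/image_0002.png",
]
print(find_short_lines(sample_lines))
```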

So I added the depth_image paths to the file, like this:

(newDepth) [013907062@g5 VA-DepthNet]$ python vadepthnet/eval.py configs/yoon_arguments_eval_kittieigen.txt
/home/013907062/.conda/envs/newDepth/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
== Total number of parameters: 263110761
== Total number of learning parameters: 263110761
== Model Initialized
== Loading checkpoint 'ckpts/vadepthnet_eigen.pth'
== Loaded checkpoint 'ckpts/vadepthnet_eigen.pth'
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [07:27<00:00,  2.24it/s]
Computing errors for 1000 eval samples , post_process:  False
  silog, abs_rel,   log10,     rms,  sq_rel, log_rms,      d1,      d2,      d3
 6.5207,  0.0461,  0.0198,  1.9626,  0.1426,  0.0714,  0.9802,  0.9967,  0.9991

Am I doing this correctly?
I would also like to know how to run the model on the KITTI test set (without ground truth) from the KITTI site.

Q. Where are the inference results stored? Where can I look at the outputs?

torch._C._LinAlgError

Problem description:
torch._C._LinAlgError is raised after a few iterations.
Proposed solution:
Replace torch.linalg.solve with torch.linalg.lstsq.
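
As a small illustration of the proposal, the sketch below uses NumPy, whose `numpy.linalg.solve` and `numpy.linalg.lstsq` behave analogously to the torch functions for this purpose: `solve` raises on an exactly singular matrix, while `lstsq` returns the minimum-norm least-squares solution. The 2 by 2 matrix is made up, and whether swapping the solver is appropriate for training VA-DepthNet is not established here.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # rank 1, exactly singular
b = np.array([1.0, 2.0])    # lies in the column space of A

try:
    np.linalg.solve(A, b)
    solve_failed = False
except np.linalg.LinAlgError:
    solve_failed = True

x, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)
print("solve failed:", solve_failed)
print("lstsq rank:", rank)
print("A @ x close to b:", np.allclose(A @ x, b))
```

One caveat to check before relying on this in training: according to the PyTorch documentation, `torch.linalg.lstsq` on CUDA inputs only offers the `gels` driver, which assumes a full-rank matrix, so it may not help with singular systems on the GPU.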

Need some help please

Could you please point out where I can find the ground truth (GT) for the KITTI raw data?

Much appreciated,
Thank you.

Why does the difference computation include a difference with step size 2?

vadepthnet.py

Lines 133 to 139 of the code appear to compute a difference with step 2. The paper does not mention this; what is the reason for using a step-2 difference?

Also, does the step-2 difference operator ignore the condition ((i+1) % w != 0)?

    if (i+2) % w != 0 and (i+1) % w != 0 and (i+2) < num:
        a[i, 2, i] = 1.0
        a[i, 2, i+2] = -1.0
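
To see what each part of that condition does, the sketch below enumerates the flattened indices that pass it for a tiny grid. It assumes row-major flattening with row width `w`, which is a guess about the surrounding code; `valid_step2_indices` is a hypothetical helper, not part of the repository.

```python
def valid_step2_indices(w, num, check_next_column=True):
    """Flattened indices i for which the pair (i, i+2) passes the condition.

    With check_next_column=False the (i+1) % w != 0 test is dropped, to show
    what that test protects against.
    """
    valid = []
    for i in range(num):
        ok = (i + 2) % w != 0 and (i + 2) < num
        if check_next_column:
            ok = ok and (i + 1) % w != 0
        if ok:
            valid.append(i)
    return valid


w, h = 4, 2
print(valid_step2_indices(w, w * h))
print(valid_step2_indices(w, w * h, check_next_column=False))
```

With `w = 4`, index 3 is the last pixel of the first row. Without the `(i+1) % w != 0` test it would be paired with index 5, which lies in the next row, so the quoted snippet does apply that condition rather than ignore it.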