cnexah / va-depthnet
VA-DepthNet: A Variational Approach to Single Image Depth Prediction
License: MIT License
I plan to use this code to compute a depth map for each image in my dataset. The dataset has 365 images, but it takes about two seconds to process each one. I guess this is because the model is loaded every time I process an image.
Could you please help me out?
Code:
import torch
from PIL import Image
import numpy as np
from vadepthnet.networks.vadepthnet import VADepthNet
from vadepthnet.dataloaders.dataloader import ToTensor
import os
import tqdm
import glob
import time

def save_raw_16bit(depth, fpath="raw.png"):
    if isinstance(depth, torch.Tensor):
        depth = depth.squeeze().cpu().numpy()
    assert isinstance(depth, np.ndarray), "Depth must be a torch tensor or numpy array"
    assert depth.ndim == 2, "Depth must be 2D"
    depth = depth * 5000  # scale for 16-bit png
    depth = depth.astype(np.uint16)
    depth = Image.fromarray(depth)
    depth.save(fpath)
    print("Saved raw depth to", fpath)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: %s" % device)
model = VADepthNet(max_depth=10,
                   prior_mean=1.54,
                   img_size=(480, 640))
model = torch.nn.DataParallel(model)
checkpoint = torch.load('vadepthnet_nyu.pth', map_location=device)
model.load_state_dict(checkpoint['model'])
model.eval()
totensor = ToTensor('test')

# Dataset path
# There are 365 images in the dataset.
img_dir = 'D:/xiaohe/VA-depthnet/VA-DepthNet-main/rgb/*.png'
for img in glob.glob(img_dir):
    start = time.perf_counter()
    img = 'D:/xiaohe/VA-depthnet/VA-DepthNet-main/rgb/' + os.path.basename(img)
    image = Image.open(img)
    image = np.asarray(image, dtype=np.float32) / 255.0
    image = totensor.to_tensor(image)
    image = totensor.normalize(image)
    image = image.unsqueeze(0)
    pdepth = model(image)
    pdepth = pdepth.cpu().detach().numpy()
    pdepth = np.squeeze(pdepth)
    fpath = "./output/"
    fpath = os.path.join(fpath, os.path.basename(img))
    save_raw_16bit(pdepth, fpath)
    end = time.perf_counter()
    runTime = end - start
    runTime_ms = runTime * 1000
    print(runTime, "s")
    print(runTime_ms, "ms")
Result:
D:\xiaohe\anaconda\envs\zoedepth\python.exe D:\xiaohe\VA-depthnet\VA-DepthNet-main\test-author.py
device: cuda
D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Saved raw depth to ./output/1341845820.751833.png
3.3062937000000003 s
3306.2937 ms
Saved raw depth to ./output/1341845820.787768.png
2.1543924000000008 s
2154.3924000000006 ms
Saved raw depth to ./output/1341845820.819654.png
2.1151333 s
2115.1333 ms
Saved raw depth to ./output/1341845820.851997.png
2.1185686000000015 s
2118.5686000000014 ms
Saved raw depth to ./output/1341845820.887882.png
2.2202800999999983 s
2220.280099999998 ms
Saved raw depth to ./output/1341845820.920082.png
2.1999916000000006 s
2199.9916000000007 ms
Traceback (most recent call last):
File "D:\xiaohe\VA-depthnet\VA-DepthNet-main\test-author.py", line 49, in <module>
pdepth = model(image)
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\xiaohe\VA-depthnet\VA-DepthNet-main\vadepthnet\networks\vadepthnet.py", line 303, in forward
d = self.vlayer(x)
File "D:\xiaohe\anaconda\envs\zoedepth\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\xiaohe\VA-depthnet\VA-DepthNet-main\vadepthnet\networks\vadepthnet.py", line 188, in forward
x = torch.linalg.solve(ATA+jitter,ATB)
KeyboardInterrupt.
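In the script above the model is built and the checkpoint is loaded once, before the loop, so a cost of about two seconds per image is more likely to come from the steps inside the loop. One way to check is to time each stage separately. The following is a minimal sketch using only the standard library; `StageTimer` and the stage names are hypothetical and not part of VA-DepthNet, and the loop body uses placeholders where the real script would open the image, run the model, and save the result.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return {stage name: mean seconds per call}."""
        return {name: self.totals[name] / self.counts[name] for name in self.totals}


timer = StageTimer()
for _ in range(3):                       # stands in for the loop over images
    with timer.stage("load_image"):
        data = [0.0] * 1000              # placeholder for Image.open + preprocessing
    with timer.stage("forward"):
        total = sum(data)                # placeholder for model(image)
    with timer.stage("save"):
        _ = str(total)                   # placeholder for save_raw_16bit

for name, mean_s in timer.report().items():
    print(f"{name}: {mean_s * 1000:.3f} ms per image")
```

When timing GPU work, calling `torch.cuda.synchronize()` before reading the clock keeps asynchronous kernel launches from being attributed to a later stage. The script also runs the forward pass without `torch.no_grad()`, which is usually worth adding for inference.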
Thank you for your great work.
I am using an NVIDIA 3090 graphics card and trying to train with my own dataset. The dimensions of the dataset are consistent with KITTI. I attempted to modify the batch size, but it had no effect. The error details are as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 684.00 MiB (GPU 0; 23.69 GiB total capacity; 20.26 GiB already allocated; 386.12 MiB free; 21.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Additionally, the dataset I am using is in TIFF format.
===============================================================================
Do you know how to deal with it?
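The error message itself points to `PYTORCH_CUDA_ALLOC_CONF`. As a rough check of whether fragmentation is plausible, the sketch below parses the allocated and reserved figures out of the message and shows how the variable could be set. `memory_gap_gib` is a hypothetical helper, and `max_split_size_mb:128` is only an illustrative value, not a recommendation from the authors.

```python
import os
import re


def memory_gap_gib(oom_message):
    """Parse a CUDA OOM message; return (allocated, reserved, reserved - allocated) in GiB."""
    allocated = float(re.search(r"([\d.]+) GiB already allocated", oom_message).group(1))
    reserved = float(re.search(r"([\d.]+) GiB reserved", oom_message).group(1))
    return allocated, reserved, reserved - allocated


message = (
    "CUDA out of memory. Tried to allocate 684.00 MiB (GPU 0; 23.69 GiB total capacity; "
    "20.26 GiB already allocated; 386.12 MiB free; 21.52 GiB reserved in total by PyTorch)"
)
allocated, reserved, gap = memory_gap_gib(message)
print(f"allocated={allocated} GiB, reserved={reserved} GiB, gap={gap:.2f} GiB")

# Set the allocator option before the first CUDA allocation, most safely before
# importing torch. The 128 MiB value is an arbitrary example.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

In this message the gap between reserved and allocated memory is only about 1.26 GiB, so the allocator setting may not be enough on its own. It may also be worth confirming that the changed batch size is the value actually used at run time.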
Thank you very much for your sharing. I am very interested in your research and would like to politely ask if you have a demo.py or test.py file that I can use to test a single image with the trained model and obtain a depth map.
How is the prior_mean calculated?
Thanks~
Hi again! I noticed that you mentioned a rough method for calculating prior_mean in your previous answers. I tried experimenting based on that, but the prior_mean I computed for the NYU dataset is significantly different from the value you provided. I'm curious to know where the issue might be. Here's my code:
import os
import cv2
import numpy as np

def calculate_prior_mean(dataset_path, filenames_file):
    with open(filenames_file, 'r') as file:
        filenames = file.read().splitlines()
    total_log_depths = 0
    total_valid_pixels = 0
    for filename in filenames:
        depth_image_path = os.path.join(dataset_path, filename)  # Assuming filenames contain paths
        depth_image = cv2.imread(depth_image_path, cv2.IMREAD_UNCHANGED).astype(np.float32) / 1000.0  # Assuming depth values are in millimeters
        valid_pixels = depth_image > 0
        log_depths = np.log(depth_image[valid_pixels])
        total_log_depths += np.sum(log_depths)
        total_valid_pixels += np.sum(valid_pixels)
    prior_mean = total_log_depths / total_valid_pixels
    return prior_mean

dataset_path = 'image-depth-estimation/data/nyu2_test'
filenames_file = 'priormeannyutest.txt'
prior_mean = calculate_prior_mean(dataset_path, filenames_file)
print(f'The calculated prior mean is: {prior_mean}')
Output result:
The calculated prior mean is: 0.8990918813373943
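One thing to rule out is which statistic is being compared. The function above averages log-depth over valid pixels, and it does so on `nyu2_test` rather than on the training split. The sketch below, on made-up depth values, shows that the mean of log-depth, the log of the mean depth, and the mean depth itself are three different numbers. Which of these (if any) corresponds to the authors' `prior_mean` is not stated here, so this is only a diagnostic; `depth_statistics` is a hypothetical helper.

```python
import math


def depth_statistics(depths_m):
    """Return mean log-depth, log of mean depth, and mean depth for positive depths in metres."""
    valid = [d for d in depths_m if d > 0]          # zero marks missing depth
    mean_depth = sum(valid) / len(valid)
    mean_log = sum(math.log(d) for d in valid) / len(valid)
    return {
        "mean_log_depth": mean_log,
        "log_mean_depth": math.log(mean_depth),
        "mean_depth": mean_depth,
    }


stats = depth_statistics([0.0, 1.0, 2.0, 4.0])       # toy values, not NYU data
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

For reference, a mean log-depth of 0.899 corresponds to a geometric-mean depth of about exp(0.899) ≈ 2.46 m, so comparing that value with 1.54 only makes sense if both numbers are defined the same way.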
I am writing to you today with a question regarding your impressive paper, "VA-DepthNet: A Variational Approach to Single Image Depth Prediction." I am particularly interested in Table 4, which presents the results on the SUN-RGB-D test set when the model is trained only on the NYU V2 training dataset.
While reading your paper, I noticed that the reported metrics for the SUN-RGB-D test set in Table 4 differ from those presented in other related works, such as the AdaBins paper by Bhat et al. (2021) and the DDP paper by Yuanfeng Ji et al. (2023). I am wondering if there might be an explanation for these discrepancies or if I have perhaps missed some important detail in my understanding.
Metrics written on VA-depthnet:
Metrics written on Adabins:
Metrics written on DDP:
I would be very grateful if you could clarify this point for me. I am eager to learn more about your work and its impressive methodology.
Thank you for your time and consideration.
Run the command for the first time: python ./vadepthnet/eval.py ./configs/arguments_eval_nyu.txt
result:
Computing errors for 654 eval samples , post_process: False
silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, d3
37.1363, 0.5363, 0.1786, 1.2995, 0.8748, 0.4910, 0.3442, 0.6113, 0.8001
Run the command for the second time: python ./vadepthnet/eval.py ./configs/arguments_eval_nyu.txt
result:
Computing errors for 654 eval samples , post_process: False
silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, d3
36.3882, 0.4038, 0.1615, 1.2035, 0.5721, 0.4457, 0.3762, 0.6654, 0.8434
Can you help me with this problem?
Hi, as a sanity check I am training the model on a small number of KITTI samples. Below is the output when I use one of the training inputs for inference. The output after convergence is a bit odd: the errors all came down reasonably, but I can't figure out why the output depth map is blurry.
When I load the pretrained model on KITTI(vadepthnet_eigen.pth), the outputs are fine. However, when I try to train it with KITTI data with the pretrained encoder backbone, the outputs are like the above.
Do you have any clue why the output is blurry? Thanks in advance.
Hi there! I was trying to test your wonderful model on a random image when I encountered this bug. Could you please help me out?
Below is the code to reproduce the error:
import torch
from vadepthnet.networks.vadepthnet import VADepthNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: %s" % device)
model = VADepthNet(max_depth=2,
                   prior_mean=.6,
                   img_size=(480, 640))
model = torch.nn.DataParallel(model)
checkpoint = torch.load('vadepthnet_nyu.pth', map_location=device)
model.load_state_dict(checkpoint['model'])
img = torch.rand(1,3,480,640).to(torch.float32)
pdepth = model.forward(img)
print(pdepth)
And the error message gives:
Traceback (most recent call last):
File "/data2/zq/VA-DepthNet/test_img.py", line 25, in <module>
pdepth = model.forward(img)
File "/home/zq/micromamba/envs/zoe/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/zq/micromamba/envs/zoe/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data2/zq/VA-DepthNet/vadepthnet/networks/vadepthnet.py", line 302, in forward
d = self.vlayer(x)
File "/home/zq/micromamba/envs/zoe/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data2/zq/VA-DepthNet/vadepthnet/networks/vadepthnet.py", line 187, in forward
x, _ = torch.linalg.solve(ATB, ATA+jitter)
RuntimeError: linalg.solve: A must be batches of square matrices, but they are 1200 by 1 matrices
Could you add a usage example for inference on a random image, please? Thanks in advance!
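The two tracebacks on this page show different versions of the same line: `x = torch.linalg.solve(ATA+jitter,ATB)` in one and `x, _ = torch.linalg.solve(ATB, ATA+jitter)` in the other. `torch.linalg.solve(A, B)` takes the square coefficient matrix first and returns a single tensor, whereas the older `torch.solve(B, A)` took the right-hand side first and returned a pair. NumPy's `np.linalg.solve` uses the same (A, B) order, so the sketch below uses it to reproduce the shape error without needing the model. The sizes and the jitter value are illustrative assumptions, not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                    # small stand-in for the 1200-row system in the error
A = rng.standard_normal((20, n))         # overdetermined system A x = B
B = rng.standard_normal((20, 1))

ATA = A.T @ A                            # (n, n), square
ATB = A.T @ B                            # (n, 1)
jitter = 1e-6 * np.eye(n)                # small diagonal term for numerical stability

x = np.linalg.solve(ATA + jitter, ATB)   # coefficient matrix first, right-hand side second
print(x.shape)                           # (6, 1)

try:
    np.linalg.solve(ATB, ATA + jitter)   # swapped arguments: first argument is n-by-1
except np.linalg.LinAlgError as err:
    print("swapped order fails:", err)
```

If the installed code still contains the second form, the message "A must be batches of square matrices, but they are 1200 by 1 matrices" is what one would expect, since the 1200-by-1 right-hand side is being passed where the square matrix belongs.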
Hello.
I am really new to the field of SIDP.
I am trying to understand your architecture.
As I understand it, the raw RGB images are located under the 'kitti_raw' folder and the ground truth images are located under the 'gt' folder.
I tried your pre-trained "KITTI EIGEN" model with:
$ python vadepthnet/eval.py configs/yoon_arguments_eval_kittieigen.txt
Q. Do you have any idea why it cannot read the files and shows "0 eval samples"?
/home/013907062/.conda/envs/depthEst/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/013907062/.conda/envs/depthEst/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
/home/013907062/.conda/envs/depthEst/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
== Total number of parameters: 263110761
== Total number of learning parameters: 263110761
== Model Initialized
== Loading checkpoint 'ckpts/vadepthnet_eigen.pth'
== Loaded checkpoint 'ckpts/vadepthnet_eigen.pth'
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 697/697 [00:30<00:00, 22.61it/s]
Computing errors for 0 eval samples , post_process: False
silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, d3
nan, nan, nan, nan, nan, nan, nan, nan, nan
Q. When I tried to evaluate with kitti_official_valid.txt, I got an error related to the depth image paths:
depth_path = os.path.join(gt_path, "./" + sample_path.split()[1])
~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
So I added the depth image paths to the file, like this:
(newDepth) [013907062@g5 VA-DepthNet]$ python vadepthnet/eval.py configs/yoon_arguments_eval_kittieigen.txt
/home/013907062/.conda/envs/newDepth/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
== Total number of parameters: 263110761
== Total number of learning parameters: 263110761
== Model Initialized
== Loading checkpoint 'ckpts/vadepthnet_eigen.pth'
== Loaded checkpoint 'ckpts/vadepthnet_eigen.pth'
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [07:27<00:00, 2.24it/s]
Computing errors for 1000 eval samples , post_process: False
silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, d3
6.5207, 0.0461, 0.0198, 1.9626, 0.1426, 0.0714, 0.9802, 0.9967, 0.9991
Am I doing this correctly?
I also wonder how I can run the model on the test set (without ground truth) from the KITTI site.
Q. Where are the inference results saved? Where can I look at the outputs?
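The `IndexError` at `sample_path.split()[1]` suggests that at least one line of the filenames file has only one whitespace-separated field, whereas the loader reads the second field as the depth path. A small pre-flight check along the following lines can flag such lines, and missing files, before running `eval.py`. It is a sketch under assumptions: `check_filenames_file` is a hypothetical helper, and the column layout (RGB path first, depth path second) is inferred from the traceback rather than from the repository's documentation.

```python
import os


def check_filenames_file(filenames_file, data_path, gt_path):
    """Return a list of (line_number, problem) for lines the loader would likely reject."""
    problems = []
    with open(filenames_file, "r") as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue                                  # ignore blank lines
            if len(fields) < 2:
                problems.append((number, "missing depth path column"))
                continue
            rgb = os.path.join(data_path, fields[0])
            depth = os.path.join(gt_path, fields[1])
            if not os.path.isfile(rgb):
                problems.append((number, f"RGB file not found: {rgb}"))
            if not os.path.isfile(depth):
                problems.append((number, f"depth file not found: {depth}"))
    return problems
```

Calling it with the filenames file and the 'kitti_raw' and 'gt' folders returns an empty list when every line has both columns and both files exist. Paths that do not resolve may be one explanation for "0 eval samples", though that is a guess without seeing the loader.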
Replace torch.linalg.solve by torch.linalg.lstsq
Can you please point out where I can find the GT for the KITTI raw data?
Much appreciated,
Thank you.
Lines 133 to 139 of the code seem to compute a difference with step 2, which the paper does not mention. What is the reason for using a step-2 difference?
Also, does the step-2 difference operator ignore the condition ((i+1) % w != 0)?
if (i+2) % w != 0 and (i+1) % w != 0 and (i+2) < num:
    a[i, 2, i] = 1.0
    a[i, 2, i+2] = -1.0
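To see what each part of that condition does, the sketch below enumerates, for a small row-major image, the pixel indices that would receive a step-2 horizontal difference. It compares the condition as quoted against a variant without `(i+1) % w != 0`. The helper names and the 4-by-2 toy size are assumptions for illustration; this does not reproduce the repository's matrix construction.

```python
def step2_indices(w, h, keep_i_plus_1_check=True):
    """Indices i for which a difference between pixel i and pixel i+2 would be written."""
    num = w * h
    selected = []
    for i in range(num):
        ok = (i + 2) % w != 0 and (i + 2) < num
        if keep_i_plus_1_check:
            ok = ok and (i + 1) % w != 0
        if ok:
            selected.append(i)
    return selected


def same_row_indices(w, h):
    """Reference: pixels whose neighbour two columns to the right lies in the same row."""
    return [i for i in range(w * h) if i % w + 2 < w]


w, h = 4, 2
print(step2_indices(w, h))                             # [0, 1, 4, 5]
print(step2_indices(w, h, keep_i_plus_1_check=False))  # [0, 1, 3, 4, 5]
print(same_row_indices(w, h))                          # [0, 1, 4, 5]
```

In this toy case, dropping the `(i+1) % w != 0` term would pair index 3 (last column of row 0) with index 5 in the next row, so the term is what prevents the step-2 difference from wrapping across rows.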