jugghm / penet_icra2021 Goto Github PK

View Code? Open in Web Editor NEW

311.0 6.0 44.0 2.36 MB

ICRA 2021 "Towards Precise and Efficient Image Guided Depth Completion"

License: MIT License

Python 100.00%

pytorch depth-completion

penet_icra2021's People

Contributors

Stargazers

Watchers

Forkers

salarim adamzdw maxee1900 wangqiqi577 versatran01 starbike tkuri liuguoyou mdl-psu jzengust mac137 oscarmind alihahaa xuelimin fanrz danishnazir leifengsoul shikishima-tasakilab entc17-fyp8 victor-yg byeonggichae frreason phatli sadteemo icra-2021 zhenglongyu jlqzzz isaac0424 standdrinkmilk giantmonkeytc creasson sxjyjay shyu216 seokyeongkim orram loahit5101 kirilllzaitsev yfh-yufeihu dzcgaara 5l1v3r1 buptwjf

penet_icra2021's Issues

Experimental results using the NYU dataset

Hello, I am still training on PENet with NYU dataset, please help me to take a look. The third one in this graph is the predicted result, right? I think this also proves that this network can run on the NYU dataset, is that correct? Because I want to know if this network is suitable for a densely labeled dataset, thanks. Looking forward to your recovery.

How to back project pixel-wise .png depth image to .bin point cloud?

Training with Multiple GPU's

Hi,
Thanks for the code, I am using 4 GPU's to train the model and training pass goes well. However when It starts validation i receive an error given below.

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nazir/PENet_ICRA2021/model.py", line 440, in forward
    sparsed_feature3 = self.depth_layer3(sparsed_feature2_plus, geo_s2, geo_s3) # b 64 88 304
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nazir/PENet_ICRA2021/basic.py", line 312, in forward
    x = torch.cat((x, g1), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 176 and 160 in dimension 2 at /tmp/pip-req-build-akjifb_7/aten/src/THC/generic/THCTensorMath.cu:71

I assume this is due to the multi-gpu training. I see that you have also used 2 GPUs for training, did you see this issue while training?

Thanks,

Use other datasets to train PENet

Hello
I now want to use the PENet network architecture to complete depth completion, because I only need the dataset at hand, so I found the HandNet dataset, which contains depth data and image data collected by the realsense series of depth cameras. I especially want to know where do I need to modify the code? Thank you in advance!
Looking forward to your reply!

Using the NYU dataset

I'm so sorry to bother you again. I watched the NLSPN network a few days ago, but I still want to use the NYU dataset on PENet to see the effect. I especially want to know why the prediction error is so large. Because I have been reading this article before. So I may still need to ask you a few questions:

I only downloaded the labeled dataset of the NYU dataset. Because of the limited hardware conditions in our lab, I downloaded rgb (16-bit jpg), depth (16-bit png), and rawdepths (16-bit) in the labeled dataset. png), as far as I know, they are all aligned. I think depths is groundtruth, I don't know if I think it's right. Because I read a lot of information on the Internet, I feel that it is not specific enough.
2.depths is the depth map after rawdepths is completed. In PENet, I take rawdepths and rgb as input, is this right? Because when I run the code, I see that the pixel values of rawdepths are all 0. I don't know why this is? Is there any other work I need to do before entering it?
I don't know if I express myself clearly? I'm really sorry to bother you again, I'm just getting to know this direction too. Because I really have no other way. Now, my teacher has no time, and I really can't understand what I read online. Thank you very much! Looking forward to your reply!

Are the released pre-trained models the best models?

Hi!
I just want to know whether the released pre-trained models the best models which can reproduce the results presented on the paper?

How to use KITTI-raw data

Hi, recently I've been through your great work, I would like to train the network , but which classes should I to download, you know that, KITTI-raw dataset is too big, so please guide me which classes in KITTI-raw dataset for training. Thanks in advance!

Runtime measurement

Hi there,
thank you very much for your excellent work and for publishing it.

I am trying to implement a "light weight" version of the ENet which aims to be faster in computation.
In order to have the runtime of the ENet on my hardware (Tesla V100 GPU) as a benchmark, I was trying to measure it. Being aware of this issue #4, I took account of the torch.cuda.synchronize() command.
Measuring the time this way, I obtained 10,5 ms runtime for processing a single image.
However, I realized that I can not compute more than 6 depth images per second (while having a GPU load of 100 %), which indicated to me that something was wrong.
Doing further investigations, I came across the PyTorch profiler which seems to be the official tool for correct GPU time measurement . Measuring the time that way I got 150 ms, which is in accordance with my maximum frame rate, as data preprocessing on the CPU comes on top.

Is it possible, that the times you measured are still not the proper execution times of the network, but rather the kernel launch times ?

Inference: own data

Hi,

I just had a few questions regarding using our own data and running inference using PENet pretrained weights.

How sparse can the depth map be?
Currently, my inference image is from the Kitti360 dataset which is quite similar to the previous kitti that the network was trained on. But there is no GT depth to sample the depth from. So my sparse depth map is quite sparse.
When I run inference on this image, the prediction is also sparse i.e I have prediction only in the regions covered by the sparse depth map. Is this an expected behaviour?
What should my input be for 'positions' (i.e the cropped image), I don't want to crop the images for running inference, so should I just set input['positions'] = input['rgb']?

It would be great if you can answer these questions when time permits :)

Regards,
Shrisha

How to convert .bin lidar files to .png

I want to infer using your model on kitti .bin files but your model take .png files. So if it's possible to provide the code for this conversion.
Thanks in advance.

Question about lr

PENet_ICRA2021/helper.py

Line 216 in ee4318a

def adjust_learning_rate(lr_init, optimizer, epoch, args):

Confusion about training time?

How long it takes to train the model one epoch?

How to inference with different input sizes?

Hi!I need to infer image with size 370*1226 without cropping. Is there any args in the code that I can change to directly conduct it?

Normalization Input Data

Hi and thank you very much for your great repo.
As it was mentioned in #43 before, you did not normalize the input data. However, as far as I know it is common practice to do so for a more efficient training and keeping weights and biases small. May I ask why you decided to not normalize your data?

Projection result confirmation

Hello,

I'm now working on the cross-modality detection tasks in 3D space.
Since the SFDNet uses this method as their depth completion way, so I try this repo as well.
The following is the output of PENet and the outcome I obtained when I project them back.
Does it look correct?
I know that the recovered 3d position from the depth map is suffering from the artifacts as u discuss in this issue: #3.
But it looks more severe far beyond my expectation.
Thanks for any help

Questioning Inference Speed

Good day,

First of all, congratulations on your work and paper. The idea of separating depth-dominant and color-dominant branches is interesting. Also, thank you for releasing the source code to the public. I have been replicating your code the past few days, and so far inferencing has been straightforward (I am getting RMSE scores at around ~760).

However, correct me if I'm wrong but I think there might be a mistake in the inference time computation. In main.py line 213/216, this is where the predictions are generated from the ENet/PENet models, after which gpu_time is computed. I tried adding a print(pred) function call (see in the image below).

I got very different inference times with and without the print(pred) function call. I ran this on a machine with RTX 2080Ti, i7-9700k, CUDA 11.2, torch==1.3.1, torchvision==0.4.2. Below are my runtimes:

original code - a bit faster than your official runtime presumably due to my newer CUDA version(?)

modified code - much slower when print(pred) was added

My understanding is that calling pred = model(batch_data) does not yet run the model prediction; the model inference only actually runs when you call result.evaluate() in line 268 (i.e. lazy execution):

This results in a nearly x10 increase in inference time (i.e. 151ms vs 17ms). Can you confirm that this also happens in your environment?

Questions about KITTI-odometry

您好，您的工作非常出色！
我想将他应用于KITTI-odometry数据集上，并使用了PENet的预训练模型，但是效果不尽如人意。想请问一下您对此问题的建议。
万分感谢！

Cannot load ENet Model

Hi,

I am trying to load the e.pth.tar model and I am not able to do it. All I am doing is

import torch
torch.load('e.pth.tar', 'cuda:0')

and it gives me this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/core_uc/.local/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/core_uc/.local/lib/python3.6/site-packages/torch/serialization.py", line 787, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'metrics'

Would you know what could be the issue? I didpip3 install metrics just in case but it did not work.

Is that possible to use 3 inputs to your model instead of 2 ?

Hi, I am wondering to use 3 inputs like as if now you are using 2 inputs like Sparse depth map and corresponding RGB, Is that possible to use one more input like Coarse dense depth map got from hand crafted methods with corresponding to sparse depth map ? please let me know if its possible, Thanks in advance .

input images with standard size like 720p

Thank you for your amazing work!
I have an interest to use the ENet depth completion model. It is working very well with the KITTI database, but when I try to feed it with images of different size (640x360) instead of (1216x352) I am facing this error:
RuntimeError: Given groups=1, weight of size [32, 4, 5, 5], expected input[1, 5, 360, 640] to have 4 channels, but got 5 channels instead.
Is the input layer dedicated to this form factor? What is your recommendation to adapt to input image of size 360x640 or 720p?

Swapped model weight links to google drive

Opening the link for the PE model leads to the google drive link e.pth.tar, wheres the link for the E model leads to the drive link for pe.pth.tar.

About the training data

Thanks for your code!
I have a question about the KITTI raw data. Do I need to download all the raw data? I know that in the monocular depth estimation, generally only a part of it needs to be downloaded. Is there a download list that I can refer to? thanks!

A problem of incomplete network structure of backbone

In model.py，for the depth-dominant branch of backbone,
is it missing sparsed_feature0_plus = torch.cat([rgb_feature0_plus, sparsed_feature], 1) ？
Between 183 and 184.

Why did I only modify the network a little and triple the size of. PTH. Tar?

Thank you very much for sharing this great work~
In the process of modifying your network, I found a small problem. I just increased the network parameters from 131762888 to 131763344. Why did the saved .Pth.tar file triple from about 500m to 1.5G

Do intesity of pixels of RGB images get normalized to the range [0, 1]?

Hello,
Thank you very much for such an amazing work.
I have a question about input of ENet. As I understand from method __getitem__ of class KittiDepth, images represented by variable rgb do not get normalized to have pixel intensity in the range [0, 1.0], instead the intesity range is [0, 255]. Is it correct?
Looking forward to your answer.

The number of epochs to train the model

The number of epochs to train the model (the three stages) is 100 ?
And how long to train the whole model ?

Why can sparse loss functions supervise the generation of dense depth maps?

Thank you for your excellent work, network design gives me a lot of inspiration.
But there's one thing I never understood. I noticed that the training loss function in the paper only focuses on pixels with valid depth(sparse)，so why can networks generate dense depth maps? How to ensure that pixels are accurate without supervision?

question about the implementation of DE-CSPN++

Hi,

When I read the code about the implementation of DE-CSPN++. I notice that during the iteration you continuously update the depth3, depth5 and depth7. However, the depth3, depth5, depth7 are stored in the list without cloning the underlying data. I am wondering if it is incorrect, since the assignment operation in pytorch is only a reference assignment, so the depth stored in the list would also change when the iteration continues.

Thanks for your reply.

Can I reduce kitti_Raw data for training ?

Hi, recently I've been through your great work, I would like to train the network but with less Kitti_Raw data, is that possible? if so please guide me on how to reduce Kitt-raw data for training. Thanks in advance!

Unable to decompress the pre training model

Display file corruption

How to use the sparse depth?

Thank you for your outstanding contribution！
I want to know how the Color-dominant Branch is combined with the point cloud and sent to the network. Does the radar point cloud only take effective points, and does the color image also take the same effective points as the point cloud? How can we get a dense depth map in this way?
This question has been bothering me. I hope you can answer it for me. Thank you again！

Query Regarding iRMSE calculation.

Hello,
Thank you for the great work. I want to ask one small query regarding the iRMSE calculation. In your code iRMSE is calculated as follows

        # convert from meters to km
        inv_output_km = (1e-3 * output[valid_mask])**(-1)
        inv_target_km = (1e-3 * target[valid_mask])**(-1)
        abs_inv_diff = (inv_output_km - inv_target_km).abs()
        irmse = math.sqrt((torch.pow(abs_inv_diff, 2)).mean())`

I want to ask two things

In the first two lines according to your comment, you are converting meters to kilometers. Why are you multiplying with 1e-3 and taking the power to the -1? shouldn't it be 1e3?
inv_output_km is causing the iRMSE to go Nan. This is because output[valid_mask] is 0 at some entries of the tensor. Is this normal behavior?. I am using the same dataset as yours. Is it expected to decrease in late epochs? I only ran training for 5 epochs and iRMSE and IMAE are going Nan.
P.S I am training ENet so I am on round 1.

Thanks for helping out.

Ran PENet pre-trained model and results do not match Kitti benchmark depth completion page

Thank you very much for a very interesting paper.
I have run the PENet pre-trained model you've provided in evaluation mode on the cropped image.
in the results the code provided (val.csv under results) I got RMSE=757.197 MAE=209.001, compared to RMSE=730.08 MAE=210.55 as it appears in the Kitti benchmark page.
IIs there a different PENet model that matches the submitted results? or there is something in the parameters that I put wrong (I kept the parameters as is in this repository).
Thanks a lot,
Mani

Confusion about crop size

Can cspn++ accelerate during training？

Hi, thanks for your good work!

I found that cspn++ only accelerate in inference, can it accelerate during training?

Thanks!

RuntimeError: CUDA out of memory.

Hello!

I encountered some difficulties while training the mode. I run it on sigle NVIDIA GTX 2080 GPU, but an error occurred

RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 7.79 GiB total capacity; 5.85 GiB already allocated; 87.56 MiB free; 136.18 MiB cached)

I think memory should be sufficient, but it only run successfully when the batch size is reduced to 1, and the gradient jitter is very severe. Do you have any advise for this ？ :(

A rather common but vital problem about confidence map

Thank you for your good work!
I notice that in both CD-branch and DD-branch, confidence maps(concatenated with CD-depth and DD-depth respectively) are generated by the last convolutional layer, which are comprized of ordinary conv+bn+relu layer.

self.rgb_decoder_output = deconvbnrelu(in_channels=32, out_channels=2, kernel_size=3, stride=1, padding=1, output_padding=0)

rgb_output = self.rgb_decoder_output(rgb_feature0_plus)
rgb_depth = rgb_output[:, 0:1, :, :]
rgb_conf = rgb_output[:, 1:2, :, :]

self.decoder_layer6 = convbnrelu(in_channels=32, out_channels=2, kernel_size=3, stride=1, padding=1)

depth_output = self.decoder_layer6(decoder_feature5)
d_depth, d_conf = torch.chunk(depth_output, 2, dim=1)
rgb_conf, d_conf = torch.chunk(self.softmax(torch.cat((rgb_conf, d_conf), dim=1)), 2, dim=1)

I wonder how convbnrelu layer can output a neat confidence map and a depth map without confidence supervision. Would you please provide me with some relavant works or papers to see? Thanks.

Questions about the principal point of the croped image

In "kitti_loader.py" file, the function load_calib() changes the principal point of image by the code
"""
K[0, 2] = K[0, 2] - 13; # from width = 1242 to 1216, with a 13-pixel cut on both sides
K[1, 2] = K[1, 2] - 11.5; # from width = 375 to 352, with a 11.5-pixel cut on both sides
"""
but I find the data is croped from the bottom. In this case, I think the change of the principal point should be calculated as
"""
K[0, 2] = K[0, 2] - 13; # from width = 1242 to 1216, with a 13-pixel cut on both sides
K[1, 2] = K[1, 2] - 23; # from width = 375 to 352, with a 23-pixel cut from the bottom
"""

How to use .bin (Raw point Cloud) file for Depth completion ?

Hi, I would like to estimate the dense depth for the Kitti 3D Object detection dataset which also includes Images and its corresponding point cloud but it's in 360 views, not only front view. so how to convert Raw point cloud to the front view depth map and need to save in .png so that I can estimate dense depth map with that. If you provide code for this conversion, I would be very happy to use that code. Thanks in advance.

Trouble loading .pth.tar: Fix: metrics.py must be in the main directory

Hi,

Torch have officially removed version 1.3.1 from their website and I am unable to load the pretrained weights using torch versions 1.4 >. I get this error: ModuleNotFoundError: No module named 'metrics', which is then followed by AttributeError: Can't get attribute 'Result' on <module 'metrics' from '/home/lib/python3.8/site-packages/metrics/__init__.py'> after I pip install metrics.

All of these errors arise when I try torch.load('pe.pth.tar').

I searched for some issues and this maybe because PyTorch has changed the way they save and load the models. Would it be possible to save the model using this: _use_new_zipfile_serialization=True as a flag?

Or did you face this error aswell? The latest version of metrics is 0.3.3

Is that possible to access tensorboard during training ?

Hi, I am just curious to know whether it's possible to access tensrboard logs during training like others. Because it helps to see how convergence our loss function is! Thanks in advance

How to use the sparse depth?

Modify the backbone network

Hi!
I have a question for you!
Recently I changed PENet's backbone network ENet to attention-unet, and I only used part of Kitti's dataset. I felt that the training was a little slow, and then I modified the learning rate when training the backbone network so that the network could learn faster.

It's ok when I train ENet. (Blue is the training set, yellow is the validation set)
But when I used the trained backbone network for the second stage of training, the network experienced severe overfitting. I wonder if this has something to do with the learning rate?

We can see that cspn++ is trained very poorly. The validation set error is large.
What is the reason for this?
Below is the learning rate when I train ENet.

Inquiry on the dataset download

Hi thanks for your impressive work! My future research interests is on the depth completion so I would like to start with your work!

Just one minor question on downloading the kitti raw data: should I download the rectified or uncertified data? And Are all the branches (road, campus, city, person) are used?

Thanks again for your great work and looking forward to your reply!

Questions about data augmentation

Thanks for your great work !
I have a question about data augmentation part. You applied random crop on images and I notice that you didn't apply same augmentation on camera intrinsic matrix. When you crop the images, then the parameters in camera intrinsic matrix like cx,cy will mismatch. Do you guys think fixing this will get some improvement or you leave this on purpose ?

Thanks again for your open-source !

the point cloud looks a bit plausible and noisy

PENet is efficient and the output depth image seems very nice, but the point cloud transformed from the depth inferred by PENet looks plausible and noisy. Is there something wrong with my matlab code used for transformation? Thank you.
depth2cloud.zip

Training Log

Thanks for the nice work!
Is there any chance that the training loss log can be released? That can greatly help my debugging process.

please remove this issue, create by accident

why training using original CSPN implementation and inference using accelerated CSPN

Hi, thanks for your great work! I am wondering why during the training, the original CSPN is used. Because as you mentioned in the paper, the new implemented one is much faster.

Query regarding stage 3 training.

Thank you for the great work.
I am currently running stage 3 training to further refine the depth maps. However, the RMSE is not so great (till 15 epochs), It's still lingering around 860-890. After how many epochs do you generally experience a drop of RMSE especially in the third stage? and the hyperparameters used in the code repository for stage three is the same as you used in the stage 3 experiment?