
guidenet's People

Contributors

kakaxi314

guidenet's Issues

Hope to get your reply

Hello, thank you very much for your contribution. When I test your model, the script only reports the loss. How can I obtain the RMSE results reported in the paper?

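For reference, a minimal sketch of the standard KITTI depth-completion RMSE, assuming predictions and ground truth are dense depth maps in millimetres with zero marking invalid ground-truth pixels:

import torch

def rmse_mm(pred, gt):
    # Evaluate only where the ground truth is valid (non-zero), as in the KITTI benchmark.
    valid = gt > 0
    return torch.sqrt(torch.mean((pred[valid] - gt[valid]) ** 2))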

About the NYU training and evaluation

Hello! I notice that the released code doesn't include the training and evaluation code for the NYU Depth V2 dataset. How can those results be reproduced? Can anyone explain? Thank you very much!
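
Not an official answer, but most depth-completion works evaluate on NYU Depth V2 by sampling a fixed number of random valid points (commonly 500) from the dense depth map as the sparse input. A minimal sketch of that sampling step, with num_samples treated as an assumption:

import torch

def sample_sparse_depth(dense_depth, num_samples=500):
    # dense_depth: [H, W] ground-truth depth map; keep only num_samples random valid pixels.
    valid_idx = torch.nonzero(dense_depth > 0)
    choice = torch.randperm(valid_idx.shape[0])[:num_samples]
    rows, cols = valid_idx[choice, 0], valid_idx[choice, 1]
    sparse = torch.zeros_like(dense_depth)
    sparse[rows, cols] = dense_depth[rows, cols]
    return sparse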

Question about the kernel design with multi-batch input

Thank you for your nice work.
Since the code is not yet open-sourced, I would like to write down my question about your kernel design.

In the paper, given the image features, you use them to generate the convolution weights.
My question is: when the batch size of the input images is larger than 1, how do you compute the convolution?

In PyTorch, a convolution kernel has shape [out_channels, in_channels, kernel_size, kernel_size], which means the same kernel is applied to every sample in a batched tensor (e.g. img.shape = [batch, channel, height, width]).
So I guess that you either split the batched tensor into single-sample tensors for computation, or re-implemented the convolution operation yourself?
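
Not the authors' implementation, but one batch-friendly way to apply per-sample, per-pixel kernels in plain PyTorch is to unfold the input into its K*K neighbourhoods and weight them with the generated kernels. A minimal sketch under that assumption:

import torch
import torch.nn.functional as F

def spatially_variant_conv(x, kernels, K=3):
    # x:       [B, C, H, W] input features
    # kernels: [B, C*K*K, H, W] per-pixel weights predicted from the guidance features
    B, C, H, W = x.shape
    # Gather every pixel's K*K neighbourhood and weight it, independently per sample and pixel.
    patches = F.unfold(x, kernel_size=K, padding=K // 2).view(B, C, K * K, H, W)
    weights = kernels.view(B, C, K * K, H, W)
    return (patches * weights).sum(dim=2)  # [B, C, H, W]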

Compile error

Thank you for your great project.
When I compile this project, it shows:
unable to execute ':/usr/local/cuda/bin/nvcc': No such file or directory
error: command ':/usr/local/cuda/bin/nvcc' failed with exit status 1

My environment is: Ubuntu 18.04, Python 3.6.9, PyTorch 1.4.0, CUDA 10.0. Other projects run fine on the same platform.
I don't know why this happens; could you share your environment settings, please? Thank you very much!
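
Not a confirmed fix, but the stray leading colon in ':/usr/local/cuda/bin/nvcc' suggests the compiler path is being assembled from an environment variable that starts with an extra ':' (or contains an empty component). A quick way to check which CUDA root PyTorch's extension builder resolves:

# Diagnostic sketch: print the CUDA root used by torch.utils.cpp_extension when building the extension.
from torch.utils.cpp_extension import CUDA_HOME
print(CUDA_HOME)  # should be something like '/usr/local/cuda', without any leading ':'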

Help for my GuidedConv Implementation

Dear author,
I tried to reimplement your GuidedConv based on the paper,
but I found the training process to be very unstable.
Here is my PyTorch implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F


class _GuideConv(nn.Module):
    def __init__(self, img_in, sparse_in, sparse_out, K=3):
        super(_GuideConv, self).__init__()
        # KGL: kernel generating layer
        self.conv1 = nn.Conv2d(img_in, K*K*sparse_in, kernel_size=3,
                               stride=1, padding=1, groups=1, bias=False)
        self.bn1 = nn.BatchNorm2d(K*K*sparse_in)

        self.fc = nn.Linear(img_in, sparse_in * sparse_out, bias=False)
        self.bn2 = nn.BatchNorm2d(sparse_out)

        self.img_in = img_in
        self.sparse_in = sparse_in
        self.sparse_out = sparse_out

    def forward(self, G, S):
        '''
        G: input guidance
        S: input source feature
        '''
        # spatially-variant
        W1 = self.conv1(G) # [B,Cin*K*K,H,W]
        W1 = self.bn1(W1)
        depths = torch.chunk(S, self.sparse_in, 1) # Cin*[B,1,H,W]
        kernels = torch.chunk(W1, self.sparse_in, 1) # Cin*[B,K*K,H,W]
        S1 = []
        for i in range(self.sparse_in):
            S1.append(torch.sum(depths[i]*kernels[i], 1, keepdim=True))
        S1 = torch.cat(S1, 1) # [B,Cin,H,W]

        # cross-channel conv
        W2 = F.adaptive_avg_pool2d(G, (1, 1))
        B = W2.size(0)
        W2 = W2.reshape(B, -1) # (b,img_in)
        W2 = self.fc(W2)
        W2 = W2.view([B, self.sparse_out, self.sparse_in])

        depths = torch.chunk(S1, B, 0) # B*[1,Cin,H,W]
        kernels = torch.chunk(W2, B, 0) # B*[1,Cout,Cin]
        S2 = []
        for i in range(B):
            weight = kernels[i][0].unsqueeze(-1).unsqueeze(-1) # [Cout,Cin,1,1]
            S2.append(F.conv2d(depths[i], weight, bias=None, stride=1, padding=0))
        S2 = torch.cat(S2, 0) # [B,Cout,H,W]
        S2 = F.relu(self.bn2(S2))

        return S2

Could you give me some advice, or just open-source this part of the code? 🎉

Hi, I want to ask some questions

How effective is your method on indoor datasets? Can it be used to fill the holes left by depth cameras? When will your code be released?

A naive GuidedConv implementation

I guess the source code for this repo will never be released, and since people are confused by the "Guided Conv Module" in the paper, I'm going to share my naive PyTorch implementation, based on the CSPN code.

Note: this implementation is based on my own understanding of the paper, so I'm not sure it's correct. Also, I got worse results with this module than with the ordinary concat/add fusion method.

Let me know if you have any questions or suggestions.

import torch
import torch.nn as nn


class _GuidedConv(nn.Module):
    def __init__(self):
        super(_GuidedConv, self).__init__()

        self.pad_left_top = nn.ZeroPad2d((1, 0, 1, 0))
        self.pad_center_top = nn.ZeroPad2d((0, 0, 1, 0))
        self.pad_right_top = nn.ZeroPad2d((0, 1, 1, 0))
        self.pad_left_middle = nn.ZeroPad2d((1, 0, 0, 0))
        self.pad_right_middle = nn.ZeroPad2d((0, 1, 0, 0))
        self.pad_left_bottom = nn.ZeroPad2d((1, 0, 0, 1))
        self.pad_center_bottom = nn.ZeroPad2d((0, 0, 0, 1))
        self.pad_right_bottom = nn.ZeroPad2d((0, 1, 0, 1))

    def forward(self, x, cw: list, cc: list):
        """
        `x`: input feature maps with size `[B, C_in, H, W]`  
        `cw`: `C_in` channel-wise kernels, each with size [B, 3*3, H, W]  
        `cc`: `C_out` cross-channel 1*1 kernels, each with size [B, C_in]
        """

        # stage-1: weight x with kernels in `cw`
        tmp = []
        for i in range(len(cw)):
            feat = self._compose_feat(x[:, i:i+1, :, :])  # [B, 1, H, W] slice, no in-place ops on views
            feat *= cw[i]
            tmp.append(torch.sum(feat, dim=1, keepdim=True))
        tmp = torch.cat(tmp, dim=1)  # [B, C_in, H, W]

        # stage-2: weight tmp with kernels in `cc`
        out = []
        for i in range(len(cc)):
            weight = cc[i].unsqueeze(-1).unsqueeze(-1)  # [B, C_in, 1, 1]; avoid in-place ops that mutate the caller's kernels
            out.append(torch.sum(tmp * weight, dim=1, keepdim=True))

        return torch.cat(out, dim=1)  # [B, C_out, H, W]

    def _compose_feat(self, feat: torch.FloatTensor):
        [H, W] = feat.shape[2:]
        output = [feat]

        # left-top
        output.append(self.pad_left_top(feat)[:, :, :H, :W])
        # center-top
        output.append(self.pad_center_top(feat)[:, :, :H, :])
        # right-top
        output.append(self.pad_right_top(feat)[:, :, :H, 1:])
        # left-middle
        output.append(self.pad_left_middle(feat)[:, :, :, :W])
        # right-middle
        output.append(self.pad_right_middle(feat)[:, :, :, 1:])
        # left-bottom
        output.append(self.pad_left_bottom(feat)[:, :, 1:, :W])
        # center-bottom
        output.append(self.pad_center_bottom(feat)[:, :, 1:, :])
        # right-bottom
        output.append(self.pad_right_bottom(feat)[:, :, 1:, 1:])
        # concat
        output = torch.cat(output, dim=1)  # [B, 3*3, H, W]

        return output
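
A minimal usage sketch for the module above, with randomly generated stand-ins for the kernel tensors (in a real model, cw and cc would come from kernel-generating layers driven by the guidance image):

import torch

B, C_in, C_out, H, W = 2, 32, 64, 64, 304  # hypothetical sizes
x = torch.randn(B, C_in, H, W)
cw = [torch.randn(B, 9, H, W) for _ in range(C_in)]  # one 3x3 spatially-variant kernel map per input channel
cc = [torch.randn(B, C_in) for _ in range(C_out)]    # one cross-channel 1x1 kernel per output channel

module = _GuidedConv()
out = module(x, cw, cc)
print(out.shape)  # torch.Size([2, 64, 64, 304])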

Dataloader for Virtual KITTI

Hi,
In your paper I noticed that you have done some ablation studies on the Virtual KITTI dataset.
You mentioned that you used the sparse depth masks from the corresponding frames in the original KITTI dataset to generate sparse depth maps for Virtual KITTI.
It would be great if you could share the code for this matching of frames between KITTI and Virtual KITTI.

Thanks,
Hardik
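
For reference, a minimal sketch of the mask-transfer idea described above, once a real KITTI frame has been paired with a Virtual KITTI frame (the pairing itself is left as an input, and the Virtual KITTI depth encoding in centimetres is an assumption):

import numpy as np
from PIL import Image

def transfer_sparse_mask(kitti_sparse_png, vkitti_depth_png):
    # KITTI depth-completion sparse maps: uint16 PNG, metres = value / 256, 0 = no LiDAR return.
    kitti_sparse = np.array(Image.open(kitti_sparse_png), dtype=np.float32) / 256.0
    # Virtual KITTI depth (assumed encoding): uint16 PNG with depth in centimetres.
    vkitti_depth = np.array(Image.open(vkitti_depth_png), dtype=np.float32) / 100.0
    # Keep Virtual KITTI depth only at pixels where the matched real frame had a measurement
    # (the two maps are assumed to have the same resolution).
    return np.where(kitti_sparse > 0, vkitti_depth, 0.0)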

Can someone convert the CUDA code into PyTorch?

Thanks to the author for the good work and open-source code!
Since the core code is written in CUDA, it is difficult to understand for people who are not familiar with CUDA programming. Can someone provide a PyTorch version of that code?
Thanks!

Can I reduce the KITTI raw data for training?

Hi, I recently went through your great work and would like to train the network with less KITTI raw data. Is that possible? As everyone knows, the KITTI raw dataset is very large, and out of the 5 recording dates I only have 3: 2011_09_26, 2011_09_28, and 2011_09_30. If so, please guide me on how to reduce the KITTI raw data for training. Thanks in advance!
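
Not an official recommendation, but one simple way is to filter the training file list down to the drives whose date prefix you actually have. A minimal sketch, with the directory layout treated as an assumption:

import glob
import os

AVAILABLE_DATES = ("2011_09_26", "2011_09_28", "2011_09_30")
# Hypothetical layout: one folder per drive under the depth-completion train split.
all_drives = sorted(glob.glob(os.path.join("kitti_depth", "train", "*_drive_*_sync")))
kept_drives = [d for d in all_drives if os.path.basename(d).startswith(AVAILABLE_DATES)]
print(f"keeping {len(kept_drives)} of {len(all_drives)} drives")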

Implementation details (layer and channel numbers)

Thank you for your interesting work and for planning to release your code!

I understand that you plan to release the code after the review of the TIP submission.
Meanwhile, I'm trying to implement your method from the descriptions in the paper, but some important information needed to fully understand the method is missing. Could you help me by answering the questions below?

  • Layer numbers: I suppose the overall network architecture is exactly the one illustrated in Figure 3 in terms of layer numbers, i.e., single conv (+BN+ReLU) layers at the input and output, and 5 successive blocks of ResBlock-ResBlock (with down-sampling in the first ResBlock) in the encoders; see the sketch below. Is that right?
  • Places of down-sampling: Where does down-sampling happen? I suppose there is always down-sampling at the first internal layer of each block of two successive ResBlocks. Do the first convolution layers (gray layers) also down-sample?
  • Channel numbers: Could you provide the channel number of each layer in Figure 3? It would be very helpful if you could write the numbers directly beside the layers in the figure and upload it here as an image (you can upload an image by copying it to the clipboard and pasting it into the text box).

Thank you.
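
For context, a minimal sketch of the encoder-stage structure the first question supposes (two standard ResBlocks, with the stride-2 down-sampling in the first); this only illustrates the supposition and is not the paper's confirmed design:

import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, cin, cout, stride=1):
        super(ResBlock, self).__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(cout)
        self.conv2 = nn.Conv2d(cout, cout, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(cout)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the skip path when the shape changes.
        self.down = None
        if stride != 1 or cin != cout:
            self.down = nn.Sequential(nn.Conv2d(cin, cout, 1, stride=stride, bias=False),
                                      nn.BatchNorm2d(cout))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(out + identity)

def encoder_stage(cin, cout):
    # "ResBlock-ResBlock", with down-sampling in the first block, as the question supposes.
    return nn.Sequential(ResBlock(cin, cout, stride=2), ResBlock(cout, cout, stride=1))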

Flip in test

Hi,
Thanks for releasing the code.
I noticed that in the code you feed both the original maps and horizontally flipped maps into the network to predict the dense depth map. I wonder whether the submitted benchmark results were obtained with the same strategy.
Best
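
For reference, a minimal sketch of the test-time flip averaging the question refers to; the model signature (RGB plus sparse depth) is a placeholder:

import torch

@torch.no_grad()
def predict_with_flip(model, rgb, sparse):
    # Average the prediction on the original input with the un-flipped prediction on the flipped input.
    pred = model(rgb, sparse)
    pred_flipped = model(torch.flip(rgb, dims=[3]), torch.flip(sparse, dims=[3]))
    return 0.5 * (pred + torch.flip(pred_flipped, dims=[3]))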
