zouchuhang / layoutnet

Torch implementation of our CVPR 18 paper: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image"

Home Page: http://openaccess.thecvf.com/content_cvpr_2018/papers/Zou_LayoutNet_Reconstructing_the_CVPR_2018_paper.pdf

License: MIT License

Lua 2.41% MATLAB 56.30% M 0.01% Objective-C 0.01% C++ 7.84% C 12.40% Makefile 0.11% HTML 17.73% CSS 0.12% Perl 0.01% Python 0.34% PostScript 2.72%
3d-layout 3d-reconstruction deep-learning layoutnet

layoutnet's People

Contributors

alexcolburn, zouchuhang

layoutnet's Issues

3d layout construction

Hello @zouchuhang @alexcolburn,
I already checked issue #3, but I still cannot fully understand the output.
Since I do not have any background in 3D, could you please explain in more detail which tool to use to generate the mesh and wrap the pano onto it, which variables to use, etc.?
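
For illustration only (not the repository's own pipeline), here is a minimal sketch that writes an axis-aligned room box as a Wavefront OBJ mesh, which most 3D tools can open; the (w, l, h) dimensions would come from the predicted layout, and texturing the walls with the pano would still need a tool such as Blender:

def write_box_obj(w, l, h, path='room.obj'):
    # 8 corners; OBJ indices are 1-based: index = 1 + 4*(x>0) + 2*(y>0) + (z>0)
    verts = [(x, y, z) for x in (0.0, w) for y in (0.0, l) for z in (0.0, h)]
    faces = [(1, 2, 4, 3), (5, 6, 8, 7),   # x = 0 and x = w walls
             (1, 2, 6, 5), (3, 4, 8, 7),   # y = 0 and y = l walls
             (1, 3, 7, 5), (2, 4, 8, 6)]   # floor (z = 0) and ceiling (z = h)
    with open(path, 'w') as out:
        for v in verts:
            out.write('v %f %f %f\n' % v)
        for f in faces:
            out.write('f %d %d %d %d\n' % f)  # face winding not normalized

write_box_obj(4.0, 5.0, 2.8)  # example dimensions (made up)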

Some questions about dataset

Hello,

Thank you for open-sourcing this great work.
Recently I reimplemented LayoutNet in PyTorch (https://github.com/sunset1995/pytorch-layoutnet) and have some questions about the ground truth data:

  1. The number of lines in gt/pano*txt is 1063, while the number of gt/label_cor/**/*.mat files is 1028. Could you please supply the missing ground truth?
  2. If I'm not wrong, the labeled corners under label_cor/**/* should be scaled before visualization or evaluation, with scales of 4.0 for stanford2d3d and 8.890625 for panoContext. Is that correct? (See the sketch below.)
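
For reference, a minimal sketch of applying that scale, assuming label_cor stores corner coordinates at the original pano resolution (so dividing maps them onto the 1024x512 network input); the .mat key 'cor' and the paths are guesses:

from scipy.io import loadmat

SCALE = {'stanford2d3d': 4.0,       # e.g. 4096x2048 -> 1024x512
         'panoContext': 8.890625}   # e.g. 9104x4552 -> 1024x512

cor = loadmat('label_cor/some_pano.mat')['cor']  # hypothetical path and key
cor_1024 = cor / SCALE['panoContext']            # divide if stored at full res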

Thank you, and have a nice day :)

Max pooling is directly followed by upsampling

The network's structure contains a max pooling layer immediately followed by upsampling.
Maybe I'm missing something, but it doesn't seem to make sense, and simply removing the pair should improve results.

local pool7 = nn.SpatialMaxPooling(2,2,2,2)(conv7_relu)  -- halves H and W
local unpool00 = nn.SpatialUpSamplingNearest(2)(pool7)   -- doubles them back

Is there something specific that this structure addresses?
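
For what it's worth, a quick NumPy check (a sketch, not the repository's code) shows the pair is not an identity; it block-quantizes the feature map:

import numpy as np

# 2x2 max-pool followed by 2x nearest upsampling keeps the spatial size
# but discards fine detail, so the pair is not a no-op.
x = np.arange(16, dtype=float).reshape(4, 4)
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # 2x2 block maxima
up = pooled.repeat(2, axis=0).repeat(2, axis=1)  # back to 4x4
print(np.array_equal(x, up))  # False: each 2x2 block is now constant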

What is the ground-truth mask used during training?

Hi

I saw in the file train_pano_joint.lua that you use some kind of mask to increase the loss at certain positions.
I couldn't find any reference to it either in the paper or anywhere else in the repository.

Could you please explain what the mask is, how it is generated, and why it is needed?

Thanks

    -- scale the mask by 4 and move it to the GPU
    gtMsk = torch.mul(gtMsk, 4)
    gtMsk = gtMsk:cuda()
    -- weight the per-pixel loss by the mask and add it back, i.e.
    -- loss_d_1 := loss_d_1 * (1 + 4 * gtMsk)
    gtMsk_w = torch.cmul(loss_d_1, gtMsk)
    loss_d_1 = torch.add(gtMsk_w, loss_d_1)
    -- same weighting for the second loss term
    gt2Msk = torch.mul(gt2Msk, 4)
    gt2Msk = gt2Msk:cuda()
    gt2Msk_w = torch.cmul(loss_d_2, gt2Msk)
    loss_d_2 = torch.add(gt2Msk_w, loss_d_2)

Experiment result is not consistent with the result reported on the paper

Hi, thanks a lot for sharing your work!
I downloaded your full-approach model pretrained on the PanoContext dataset and used it to predict on the PanoContext test set. However, the results from your pretrained model are 73.85, 1.07 and 3.40 respectively, while the paper reports 74.48, 1.06 and 3.34 (3D IoU, corner error and pixel error). I wonder whether the pretrained model can actually reach the numbers reported in the paper, or whether I did something wrong.
Can you help me? Thanks a lot!

question about perspective training

I want to train the perspective model from scratch, but I can't find info_edg_stack_tr_lsun_640_d6_sig20_trname.t7. Can you tell me where I can get this dataset?

Seems like gradient computation is not right, correct me if I am wrong

d_aob_x0 = (2*x0-1)*n_v_ao*n_v_bo + dot(v_ao, v_bo) * (x0*n_v_bo/n_v_ao + (x0-1)*n_v_ao/n_v_bo);

Lines 44 and 45 (and the analogous lines after) should be:
d_aob_x0 = (2*x0-1)*n_v_ao*n_v_bo/(sqrt(1-cos(b_aob)*cos(b_aob))+eps) + dot(v_ao, v_bo) * (x0*n_v_bo/n_v_ao + (x0-1)*n_v_ao/n_v_bo);
d_aob_x0 = -d_aob_x0/n_v_ao/n_v_ao/n_v_bo/n_v_bo;

It looks like only a scale difference, but the optimization steps decrease.
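
One way to settle this is a numerical check; the 1/sqrt(1 - cos^2) factor in the proposed fix appears to be exactly the arccos chain-rule term, which the check below would catch. A minimal sketch (f and grad_f stand in for the angle function and its claimed derivative):

import numpy as np

def check_grad(f, grad_f, x0, eps=1e-6):
    # compare a claimed analytic derivative against central differences
    num = (f(x0 + eps) - f(x0 - eps)) / (2.0 * eps)
    ana = grad_f(x0)
    rel_err = abs(num - ana) / max(abs(num), abs(ana), 1e-12)
    return num, ana, rel_err

# Toy usage: d/dx arccos(x) = -1/sqrt(1 - x^2)
print(check_grad(np.arccos, lambda x: -1.0 / np.sqrt(1.0 - x * x), 0.3))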

json to mat

Hello, I have used PanoAnnotator to annotate some panorama images, which gives me .json files. How can I produce .mat files like the ones in label_cor? (label_cor only contains the corner coordinates.)
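
A hedged sketch of the conversion; the PanoAnnotator JSON schema (key 'points') and the 'cor' variable name in the .mat file are both assumptions here, not the author's confirmed format:

import json
import numpy as np
from scipy.io import savemat

with open('pano_example.json') as f:       # hypothetical filename
    ann = json.load(f)

# Suppose the annotation stores corner positions as a list of [x, y] pairs.
cor = np.asarray(ann['points'], dtype=np.float64)  # key name is a guess

savemat('pano_example.mat', {'cor': cor})  # corner-only, like label_cor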

Could you share the expected validation loss during training?

So far I have trained only the first two steps (I don't really care about the box prediction, just the edge and corner detection).

First I trained using driver_pano_edg.lua and reached a validation loss of 0.12333107 after 3260 iterations, which stopped improving for the remaining ~4700 iterations.

Then, starting from this model, I trained with driver_pano_joint.lua and reached a validation loss of 0.20790252 after 1480 iterations, which stopped improving for the remaining ~6500 iterations.

This seems to produce results that are not as good as the supplied pretrained model.

What is the expected validation loss in each step?

How to convert rotEdge (a variable in getManhattanAndAlign.m) to .t7?

I have successfully run your getManhattanAndAlign.m code and tested testNet_pano_full.lua on your demo, but I want to try it on my own picture.
I see in your code:
lne_ts = torch.load('./data/panoContext_line_test.t7')
and
img_ts = torch.load('./data/panoContext_img_test.t7')

It needs .t7 files, but how can I convert a MATLAB variable to a .t7 file? Should it be saved as an image first and then converted to .t7? Thank you very much for your help.
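
There is no standard Python writer for .t7, but one common bridge (an assumption about tooling, not necessarily the author's workflow) is to go through .npy and finish in Lua:

import numpy as np
from scipy.io import loadmat

# Saved from MATLAB with: save('rotEdge.mat', 'rotEdge')
m = loadmat('rotEdge.mat')                  # hypothetical file name
arr = np.ascontiguousarray(m['rotEdge'], dtype=np.float32)
np.save('rotEdge.npy', arr)
# In Lua/Torch, a loader such as npy4th can then read the .npy and
# torch.save() it to a .t7 file.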

What is the scale of the reconstructed 3D room to the actual room?

Hi,

Thanks for sharing your work. This is very interesting and has some great use cases.

Regarding the result, is it possible to recover the measurements of the actual room from the 3D reconstruction? In other words, how accurate is the 3D reconstruction in terms of wall width or length compared to the actual room?

preprocessPano requires pano_edge_tr_1024

I'm trying the Matlab scripts, but preprocessPano.m requires pano_edge_tr_1024, which I can't find.
Could you explain how to run it properly?

>> preprocessPano
pano_aadmuaxyxouqic
Error using load
Unable to read file '..\data\pano_edge_tr_1024\vp\pano_aadmuaxyxouqic.mat'. No
such file or directory.

Error in preprocessPano (line 59)
    load(['..\data\pano_edge_tr_1024\vp\' im_name '.mat']);

Creating Ground Truth

Hi, thank you for your wonderful work. I would like to create a new dataset for this architecture and was wondering how you created the ground truth.

I am having trouble interpreting the ground-truth .mat files and figuring out how to reproduce them for a series of new images.

Thank you so much for your time in advance, and I sincerely appreciate whatever insights you are able to give.

Issues reproducing network - model ends up a different size

Hello! :)
I'm trying to implement your network with Keras, and it seems that the network I built has many more parameters than the number you state in your paper.
You've mentioned you were able to train the entire network with a batch size of 20 using 12 GB.
(I've even seen in #5 that you mentioned using 10.969 GB.)
My GPU has 10.57 GiB available, but when I try a batch size of 15, which by my calculation should fit, the model does not fit into memory.
I've even removed the 3D regression part and it still fails.

So I wanted to ask if you could help me check whether I've made an implementation error :)
Could you, for example, provide the total number of parameters of your model?
And perhaps even better, the number of parameters per layer? :)

Here is a description of my implementation :)
I've defined the network as follows:

def layoutnet():
    # Encoder
    input = layers.Input(shape=(6, 512, 1024))  # chw format
    e1 = conv2d_relu_pool(input, 32, name='e1')  # [?, 32, 256, 512]
    e2 = conv2d_relu_pool(e1, 64, name='e2')  # [?, 64, 128, 256]
    e3 = conv2d_relu_pool(e2, 128, name='e3')  # [?, 128, 64, 128]
    e4 = conv2d_relu_pool(e3, 256, name='e4')  # [?, 256, 32, 64]
    e5 = conv2d_relu_pool(e4, 512, name='e5')  # [?, 512, 16, 32]
    e6 = conv2d_relu_pool(e5, 1024, name='e6')  # [?, 1024, 8, 16]
    e7 = conv2d_relu_pool(e6, 2048, name='e7')  # [?, 2048, 4, 8]
    encoder = Model(input, e7)

    # Top decoder branch
    td1 = up_conv2d_relu(e7, 1024, 'td1')  # [?, 1024, 8, 16]
    td1 = layers.Concatenate(axis=1, name='td1_concat')([td1, e6])  # [?, 1024 * 2, 8, 16]

    td2 = up_conv2d_relu(td1, 512, name='td2')  # [?, 512, 16, 32]
    td2 = layers.Concatenate(axis=1, name='td2_concat')([td2, e5])  # [?, 512 * 2, 16, 32]

    td3 = up_conv2d_relu(td2, 256, name='td3')  # [?, 256, 32, 64]
    td3 = layers.Concatenate(axis=1, name='td3_concat')([td3, e4])  # [?, 256 * 2, 32, 64]

    td4 = up_conv2d_relu(td3, 128, name='td4')  # [?, 128, 64, 128]
    td4 = layers.Concatenate(axis=1, name='td4_concat')([td4, e3])  # [?, 128 * 2, 64, 128]

    td5 = up_conv2d_relu(td4, 64, name='td5')  # [?, 64, 128, 256]
    td5 = layers.Concatenate(axis=1, name='td5_concat')([td5, e2])  # [?, 64 * 2, 128, 256]

    td6 = up_conv2d_relu(td5, 32, name='td6')  # [?, 32, 256, 512]
    td6 = layers.Concatenate(axis=1, name='td6_concat')([td6, e1])  # [?, 32 * 2, 256, 512]

    td7 = up_conv2d_relu(td6, 3, name='td7')  # [?, 3, 512, 1024]
    td = layers.Activation('sigmoid')(td7)
    top_decoder = Model(input, td)

    # Bottom decoder branch
    bd1 = layers.Convolution2D(1024, (3, 3), (1, 1), padding='same', activation='relu', name='bd1_conv'+'_conv')(top_decoder.get_layer('td1_upsample').output)  # [?, 1024, 8, 16]
    bd1 = layers.Concatenate(axis=1, name='bd1_concat')([bd1, td1])  # [?, 1024 * 3, 8, 16]

    bd2 = up_conv2d_relu(bd1, 512, name='bd2')  # [?, 512, 16, 32]
    bd2 = layers.Concatenate(axis=1, name='bd2_concat')([bd2, td2])  # [?, 512 * 3, 16, 32]

    bd3 = up_conv2d_relu(bd2, 256, name='bd3')  # [?, 256, 32, 64]
    bd3 = layers.Concatenate(axis=1, name='bd3_concat')([bd3, td3])  # [?, 256 * 3, 32, 64]

    bd4 = up_conv2d_relu(bd3, 128, name='bd4')  # [?, 128, 64, 128]
    bd4 = layers.Concatenate(axis=1, name='bd4_concat')([bd4, td4])  # [?, 128 * 3, 64, 128]

    bd5 = up_conv2d_relu(bd4, 64, name='bd5')  # [?, 64, 128, 256]
    bd5 = layers.Concatenate(axis=1, name='bd5_concat')([bd5, td5])  # [?, 64 * 3, 128, 256]

    bd6 = up_conv2d_relu(bd5, 32, name='bd6')  # [?, 32, 256, 512]
    bd6 = layers.Concatenate(axis=1, name='bd6_concat')([bd6, td6])  # [?, 32 * 3, 256, 512]

    bd7 = up_conv2d_relu(bd6, 1, name='bd7')  # [?, 1, 512, 1024]
    bd = layers.Activation('sigmoid')(bd7)
    bot_decoder = Model(input, bd)

    # 3D box
    # reg = layers.Concatenate(axis=1, name='reg_input')([td, bd])  # [?, 4, 512, 1024]
    # reg = conv2d_relu_pool(reg, 8, name='reg_downsample1')  # [?, 8, 256, 512]
    # reg = conv2d_relu_pool(reg, 16, name='reg_downsample2')  # [?, 16, 128, 256]
    # reg = conv2d_relu_pool(reg, 32, name='reg_downsample3')  # [?, 32, 64, 128]
    # reg = conv2d_relu_pool(reg, 64, name='reg_downsample4')  # [?, 64, 32, 64]
    # reg = conv2d_relu_pool(reg, 128, name='reg_downsample5')  # [?, 128, 16, 32]
    # reg = conv2d_relu_pool(reg, 256, name='reg_downsample6')  # [?, 256, 8, 16]
    # reg = conv2d_relu_pool(reg, 512, name='reg_downsample7')  # [?, 512, 4, 8]
    # reg = layers.Flatten(name='reg_flatten')(reg)
    # reg = layers.Dense(1024, activation='relu', name='reg_dense1')(reg)
    # reg = layers.Dense(256, activation='relu', name='reg_dense2')(reg)
    # reg = layers.Dense(64, activation='relu', name='reg_dense3')(reg)
    # reg = layers.Dense(6, name='reg_dense4')(reg)

    # model = Model(input, [top_decoder, bot_decoder, reg])
    model = Model(input, [td, bd])
    return model
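
The helpers conv2d_relu_pool and up_conv2d_relu aren't shown; judging from the layer summary below (3x3 kernels, 'same' padding, channels-first data format, 2x2 pooling and nearest upsampling), they are presumably something like:

from keras import layers

def conv2d_relu_pool(x, filters, name):
    # 3x3 conv + ReLU, then 2x2 max pool (channels-first throughout)
    x = layers.Conv2D(filters, (3, 3), padding='same', activation='relu',
                      data_format='channels_first', name=name + '_conv')(x)
    return layers.MaxPooling2D((2, 2), data_format='channels_first',
                               name=name + '_pool')(x)

def up_conv2d_relu(x, filters, name):
    # 2x nearest upsampling, then 3x3 conv + ReLU
    x = layers.UpSampling2D(2, data_format='channels_first',
                            name=name + '_upsample')(x)
    return layers.Conv2D(filters, (3, 3), padding='same', activation='relu',
                         data_format='channels_first', name=name + '_conv')(x)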

And the number of parameters per layer is shown here:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 6, 512, 1024) 0                                            
__________________________________________________________________________________________________
e1_conv (Conv2D)                (None, 32, 512, 1024 1760        input_1[0][0]                    
__________________________________________________________________________________________________
e1_pool (MaxPooling2D)          (None, 32, 256, 512) 0           e1_conv[0][0]                    
__________________________________________________________________________________________________
e2_conv (Conv2D)                (None, 64, 256, 512) 18496       e1_pool[0][0]                    
__________________________________________________________________________________________________
e2_pool (MaxPooling2D)          (None, 64, 128, 256) 0           e2_conv[0][0]                    
__________________________________________________________________________________________________
e3_conv (Conv2D)                (None, 128, 128, 256 73856       e2_pool[0][0]                    
__________________________________________________________________________________________________
e3_pool (MaxPooling2D)          (None, 128, 64, 128) 0           e3_conv[0][0]                    
__________________________________________________________________________________________________
e4_conv (Conv2D)                (None, 256, 64, 128) 295168      e3_pool[0][0]                    
__________________________________________________________________________________________________
e4_pool (MaxPooling2D)          (None, 256, 32, 64)  0           e4_conv[0][0]                    
__________________________________________________________________________________________________
e5_conv (Conv2D)                (None, 512, 32, 64)  1180160     e4_pool[0][0]                    
__________________________________________________________________________________________________
e5_pool (MaxPooling2D)          (None, 512, 16, 32)  0           e5_conv[0][0]                    
__________________________________________________________________________________________________
e6_conv (Conv2D)                (None, 1024, 16, 32) 4719616     e5_pool[0][0]                    
__________________________________________________________________________________________________
e6_pool (MaxPooling2D)          (None, 1024, 8, 16)  0           e6_conv[0][0]                    
__________________________________________________________________________________________________
e7_conv (Conv2D)                (None, 2048, 8, 16)  18876416    e6_pool[0][0]                    
__________________________________________________________________________________________________
e7_pool (MaxPooling2D)          (None, 2048, 4, 8)   0           e7_conv[0][0]                    
__________________________________________________________________________________________________
td1_upsample (UpSampling2D)     (None, 2048, 8, 16)  0           e7_pool[0][0]                    
__________________________________________________________________________________________________
td1_conv (Conv2D)               (None, 1024, 8, 16)  18875392    td1_upsample[0][0]               
__________________________________________________________________________________________________
td1_concat (Concatenate)        (None, 2048, 8, 16)  0           td1_conv[0][0]                   
                                                                 e6_pool[0][0]                    
__________________________________________________________________________________________________
bd1_conv_conv (Conv2D)          (None, 1024, 8, 16)  18875392    td1_upsample[0][0]               
__________________________________________________________________________________________________
td2_upsample (UpSampling2D)     (None, 2048, 16, 32) 0           td1_concat[0][0]                 
__________________________________________________________________________________________________
bd1_concat (Concatenate)        (None, 3072, 8, 16)  0           bd1_conv_conv[0][0]              
                                                                 td1_concat[0][0]                 
__________________________________________________________________________________________________
td2_conv (Conv2D)               (None, 512, 16, 32)  9437696     td2_upsample[0][0]               
__________________________________________________________________________________________________
bd2_upsample (UpSampling2D)     (None, 3072, 16, 32) 0           bd1_concat[0][0]                 
__________________________________________________________________________________________________
td2_concat (Concatenate)        (None, 1024, 16, 32) 0           td2_conv[0][0]                   
                                                                 e5_pool[0][0]                    
__________________________________________________________________________________________________
bd2_conv (Conv2D)               (None, 512, 16, 32)  14156288    bd2_upsample[0][0]               
__________________________________________________________________________________________________
td3_upsample (UpSampling2D)     (None, 1024, 32, 64) 0           td2_concat[0][0]                 
__________________________________________________________________________________________________
bd2_concat (Concatenate)        (None, 1536, 16, 32) 0           bd2_conv[0][0]                   
                                                                 td2_concat[0][0]                 
__________________________________________________________________________________________________
td3_conv (Conv2D)               (None, 256, 32, 64)  2359552     td3_upsample[0][0]               
__________________________________________________________________________________________________
bd3_upsample (UpSampling2D)     (None, 1536, 32, 64) 0           bd2_concat[0][0]                 
__________________________________________________________________________________________________
td3_concat (Concatenate)        (None, 512, 32, 64)  0           td3_conv[0][0]                   
                                                                 e4_pool[0][0]                    
__________________________________________________________________________________________________
bd3_conv (Conv2D)               (None, 256, 32, 64)  3539200     bd3_upsample[0][0]               
__________________________________________________________________________________________________
td4_upsample (UpSampling2D)     (None, 512, 64, 128) 0           td3_concat[0][0]                 
__________________________________________________________________________________________________
bd3_concat (Concatenate)        (None, 768, 32, 64)  0           bd3_conv[0][0]                   
                                                                 td3_concat[0][0]                 
__________________________________________________________________________________________________
td4_conv (Conv2D)               (None, 128, 64, 128) 589952      td4_upsample[0][0]               
__________________________________________________________________________________________________
bd4_upsample (UpSampling2D)     (None, 768, 64, 128) 0           bd3_concat[0][0]                 
__________________________________________________________________________________________________
td4_concat (Concatenate)        (None, 256, 64, 128) 0           td4_conv[0][0]                   
                                                                 e3_pool[0][0]                    
__________________________________________________________________________________________________
bd4_conv (Conv2D)               (None, 128, 64, 128) 884864      bd4_upsample[0][0]               
__________________________________________________________________________________________________
td5_upsample (UpSampling2D)     (None, 256, 128, 256 0           td4_concat[0][0]                 
__________________________________________________________________________________________________
bd4_concat (Concatenate)        (None, 384, 64, 128) 0           bd4_conv[0][0]                   
                                                                 td4_concat[0][0]                 
__________________________________________________________________________________________________
td5_conv (Conv2D)               (None, 64, 128, 256) 147520      td5_upsample[0][0]               
__________________________________________________________________________________________________
bd5_upsample (UpSampling2D)     (None, 384, 128, 256 0           bd4_concat[0][0]                 
__________________________________________________________________________________________________
td5_concat (Concatenate)        (None, 128, 128, 256 0           td5_conv[0][0]                   
                                                                 e2_pool[0][0]                    
__________________________________________________________________________________________________
bd5_conv (Conv2D)               (None, 64, 128, 256) 221248      bd5_upsample[0][0]               
__________________________________________________________________________________________________
td6_upsample (UpSampling2D)     (None, 128, 256, 512 0           td5_concat[0][0]                 
__________________________________________________________________________________________________
bd5_concat (Concatenate)        (None, 192, 128, 256 0           bd5_conv[0][0]                   
                                                                 td5_concat[0][0]                 
__________________________________________________________________________________________________
td6_conv (Conv2D)               (None, 32, 256, 512) 36896       td6_upsample[0][0]               
__________________________________________________________________________________________________
bd6_upsample (UpSampling2D)     (None, 192, 256, 512 0           bd5_concat[0][0]                 
__________________________________________________________________________________________________
td6_concat (Concatenate)        (None, 64, 256, 512) 0           td6_conv[0][0]                   
                                                                 e1_pool[0][0]                    
__________________________________________________________________________________________________
bd6_conv (Conv2D)               (None, 32, 256, 512) 55328       bd6_upsample[0][0]               
__________________________________________________________________________________________________
bd6_concat (Concatenate)        (None, 96, 256, 512) 0           bd6_conv[0][0]                   
                                                                 td6_concat[0][0]                 
__________________________________________________________________________________________________
td7_upsample (UpSampling2D)     (None, 64, 512, 1024 0           td6_concat[0][0]                 
__________________________________________________________________________________________________
bd7_upsample (UpSampling2D)     (None, 96, 512, 1024 0           bd6_concat[0][0]                 
__________________________________________________________________________________________________
td7_conv (Conv2D)               (None, 3, 512, 1024) 1731        td7_upsample[0][0]               
__________________________________________________________________________________________________
bd7_conv (Conv2D)               (None, 1, 512, 1024) 865         bd7_upsample[0][0]               
__________________________________________________________________________________________________
activation (Activation)         (None, 3, 512, 1024) 0           td7_conv[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 1, 512, 1024) 0           bd7_conv[0][0]                   
==================================================================================================
Total params: 94,347,396
Trainable params: 94,347,396
Non-trainable params: 0

Some images fail at the optimization step

Hi, really good work!
When I evaluate some images at the optimization step, it no longer works. Most of the images I use succeed, but some of them, like the one below, fail. Can you help me? :D

Undefined function or variable 'cor_fn_t'.
Error in samplingPanoBox (line 38)
            line_can(line_n+1:end,score_id(line_id)) =cor_fn_t(2*score_id(line_id),1);

3D ground truth interpretation

Hello,

First, thank you for sharing this great work. A quick question about the panoContext_box_train.t7 tensor:

The paper mentions 6 ground-truth 3D parameters: sw, sl, sh, tx, tz, r_theta. The first 6 elements of the box tensor above (box[{{1},{1},{1,6}}]), which I believe contain those parameters for the first example image, read:

sw = -0.5154072972870558
sl = -0.6748731674025037
sh = -1.316387492900166
tx = -0.24216556285261603
tz = -0.2114205765327388
r_theta = 0.08283438070600802

A naive interpretation would suggest that the room is almost 3x higher than it is wide. Is there a reason for the negative scale factors? Any guidance on interpretation would be much appreciated.
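
For inspecting the tensor from Python, the third-party torchfile package can usually read such files (a sketch; the exact tensor shape is an assumption):

import torchfile  # pip install torchfile

box = torchfile.load('panoContext_box_train.t7')
print(box.shape)           # shape is an assumption, e.g. (N, 1, 6)
print(box[0].ravel()[:6])  # sw, sl, sh, tx, tz, r_theta for the first image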

Training does not converge

Hello,
thank you for sharing this great work.
I tried the edge training ('th driver_pano_edg.lua') following your guidance, but the loss does not converge. I checked the code and the data but cannot find the reason. Could you give me some advice?

The output is as follows:

done
414
Uploaded training
46
Uploaded validation
start training
update param, loss = 0.64732569, gradnorm = 6.7421e+00
update param, loss = 2.26082158, gradnorm = 9.6880e-01
update param, loss = 2.10377669, gradnorm = 1.5707e-03
update param, loss = 2.14185762, gradnorm = 0.0000e+00
update param, loss = 2.16074014, gradnorm = 0.0000e+00
update param, loss = 2.17253232, gradnorm = 0.0000e+00
...
update param, loss = 2.05997038, gradnorm = 0.0000e+00
update param, loss = 2.17931867, gradnorm = 0.0000e+00
update param, loss = 2.07385349, gradnorm = 0.0000e+00
iteration 8000, loss = 2.07385349, gradnorm = 0.0000e+00
validation loss = 2.14375149

Original mapping between dataset ids and panoContext names

Hi

What is the mapping between the images in the .t7 files (from data.zip) and their original name in the panoContext dataset? (and Stanford 2d-3d)

I tried looking in the file ./gt/panoContext_train.txt, but it doesn't match.
For example:
./gt/panoContext_train.txt[1] says pano_aurfmkmrmsfgau.png,
but that is actually
./data/panoContext_img_train.t7[{{336},{},{},{}}]

and
./data/panoContext_img_train.t7[{{1},{},{},{}}]
is actually the file pano_93a57c28c5e11bb9c96f944c2a649f2b.jpg, which appears as
./gt/panoContext_train.txt[364]

Is there some way to find the original mapping between the datasets?

Also, do you have the rotation matrix that was used to align each image? Or should I just run getManhattanAndAlign.m on the images to obtain those exact images?
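
Failing an official mapping, one fallback is a brute-force nearest-image search. A sketch under several assumptions: torchfile can read the archive, and aligned versions of the originals (here a hypothetical ./aligned/ directory produced by getManhattanAndAlign.m) are available, since the .t7 images are Manhattan-aligned and raw panos may not match pixel-wise:

import numpy as np
import torchfile
from PIL import Image

imgs = torchfile.load('./data/panoContext_img_train.t7')  # assumed (N, 3, H, W), floats in [0, 1]
names = [l.strip() for l in open('./gt/panoContext_train.txt')]

def thumb_from_chw(a, size=(32, 16)):
    # cheap descriptor: grayscale thumbnail of a CHW float image
    g = Image.fromarray((a.mean(axis=0) * 255).astype(np.uint8))
    return np.asarray(g.resize(size), dtype=np.float32)

def thumb_from_file(path, size=(32, 16)):
    g = Image.open(path).convert('L').resize(size)
    return np.asarray(g, dtype=np.float32)

refs = {n: thumb_from_file('./aligned/' + n) for n in names}  # hypothetical directory

for i in range(len(imgs)):
    t = thumb_from_chw(imgs[i])
    best = min(refs, key=lambda n: np.mean((refs[n] - t) ** 2))
    print(i, '->', best)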
