sunset1995 / hohonet

"HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features" official pytorch implementation.

Home Page: https://sunset1995.github.io/HoHoNet/

License: MIT License


HoHoNet

Code for our paper in CVPR 2021: HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features (paper, video).

[teaser figure]

News

  • April 3, 2021: Released inference code, Jupyter notebooks, and visualization tools. The guide for reproduction is also finished.
  • March 4, 2021: Added a new backbone, HarDNet, which shows better speed and depth accuracy.

Pretrained weights

Links to trained weights for ckpt/: download from Google Drive or download from Dropbox.
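
After downloading, the example commands below assume the checkpoints sit under ckpt/, e.g. ckpt/mp3d_depth_HOHO_depth_dct_efficienthc_TransEn1_hardnet/ep60.pth and ckpt/mp3d_layout_HOHO_layout_aug_efficienthc_Transen1_resnet34/ep300.pth.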

Inference

Below, we use an out-of-training-distribution 360 image from PanoContext as an example.

Jupyter notebook

See infer_depth.ipynb, infer_layout.ipynb, and infer_sem.ipynb for interactive demos and visualization.

Batch inference

Run infer_depth.py or infer_layout.py to infer depth or layout. Use --cfg and --pth to specify the paths to the config file and the pretrained weights, and specify the input path with --inp; a glob pattern is available for batching a set of files (see the batch example below). The results are stored in the --out directory under the same filename, with the extension set to .depth.png or .layout.txt.

Example for depth:

python infer_depth.py --cfg config/mp3d_depth/HOHO_depth_dct_efficienthc_TransEn1_hardnet.yaml --pth ckpt/mp3d_depth_HOHO_depth_dct_efficienthc_TransEn1_hardnet/ep60.pth --out assets/ --inp assets/pano_asmasuxybohhcj.png

Example for layout:

python infer_layout.py --cfg config/mp3d_layout/HOHO_layout_aug_efficienthc_Transen1_resnet34.yaml --pth ckpt/mp3d_layout_HOHO_layout_aug_efficienthc_Transen1_resnet34/ep300.pth --out assets/ --inp assets/pano_asmasuxybohhcj.png
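
To process a batch, pass a glob pattern to --inp (quoted here, assuming the script expands the pattern itself); for example, running depth inference on every .png under assets/:

python infer_depth.py --cfg config/mp3d_depth/HOHO_depth_dct_efficienthc_TransEn1_hardnet.yaml --pth ckpt/mp3d_depth_HOHO_depth_dct_efficienthc_TransEn1_hardnet/ep60.pth --out assets/ --inp 'assets/*.png'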

Visualization tools

To visualize layout as 3D mesh, run:

python vis_layout.py --img assets/pano_asmasuxybohhcj.png --layout assets/pano_asmasuxybohhcj.layout.txt

Rendering options --show_ceiling, --ignore_floor, --ignore_wall, and --ignore_wireframe are available. Set --out to export the mesh to a .ply file, and set --no_vis to disable the interactive visualization.
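
For example, combining the flags above to export the mesh without opening the viewer (the .ply filename is illustrative):

python vis_layout.py --img assets/pano_asmasuxybohhcj.png --layout assets/pano_asmasuxybohhcj.layout.txt --out assets/pano_asmasuxybohhcj.ply --no_vis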

To visualize depth as point cloud, run:

python vis_depth.py --img assets/pano_asmasuxybohhcj.png --depth assets/pano_asmasuxybohhcj.depth.png

Rendering options --crop_ratio and --crop_z_above are available.
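
A hedged example (the values are illustrative, and the flag semantics are assumed from their names: trimming rows near the poles and dropping points above a height threshold):

python vis_depth.py --img assets/pano_asmasuxybohhcj.png --depth assets/pano_asmasuxybohhcj.depth.png --crop_ratio 0.1 --crop_z_above 1.2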

Reproduction

Please see README_reproduction.md for the guide to:

  1. prepare the datasets for each task in our paper
  2. reproduce the training for each task
  3. reproduce the numerical results in our paper with the provided pretrained weights

Citation

@inproceedings{SunSC21,
  author    = {Cheng Sun and
               Min Sun and
               Hwann{-}Tzong Chen},
  title     = {HoHoNet: 360 Indoor Holistic Understanding With Latent Horizontal
               Features},
  booktitle = {CVPR},
  year      = {2021},
}


hohonet's Issues

Spherical coordinate system used in codebase

Hi, thanks for your excellent work and repo. Question for you: how are you defining your spherical coordinate system? From the code below, it seems you have a reflection about the y axis, since u is not negated in get_uni_sphere_xyz() (the image unwraps left to right, clockwise, but theta unwraps right to left, counterclockwise).

A more traditional spherical coordinate system is the one below:

[figure: conventional spherical coordinate system and its defining equations]

I noted that you are using r = ρ·cos(φ), with φ = v and ρ = 1, which you call c = np.cos(v). But this doesn't match the traditional spherical coordinate system. Could you explain a bit more about your derivation? Thanks.

https://github.com/sunset1995/HoHoNet/blob/master/vis_depth.py#L7

import numpy as np

def get_uni_sphere_xyz(H, W):
    # Pixel-center grid over the equirectangular image.
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    u = (i+0.5) / W * 2 * np.pi      # azimuth in [0, 2*pi), increasing left to right
    v = ((j+0.5) / H - 0.5) * np.pi  # elevation in [-pi/2, pi/2), top to bottom
    z = -np.sin(v)                   # up axis (top row maps to z = +1)
    c = np.cos(v)                    # horizontal radius at this elevation
    y = c * np.sin(u)
    x = c * np.cos(u)
    sphere_xyz = np.stack([x, y, z], -1)
    return sphere_xyz
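
One reading of the code (an observation from the definitions above, not the author's reply): v is the elevation from the equator rather than the polar angle from +z, so with ρ = 1 the horizontal radius is r = cos(v) instead of sin(φ), giving x = cos(v)cos(u), y = cos(v)sin(u), z = -sin(v); the remaining difference from the textbook convention is the azimuth sweep direction noted above. A minimal sketch of the negated-u variant the question suggests (a hypothetical helper, not part of the repo):

import numpy as np

def get_uni_sphere_xyz_ccw(H, W):
    # Same pixel grid, but the azimuth is negated so it sweeps
    # counterclockwise when viewed from +z, as in the textbook convention.
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    u = -((i + 0.5) / W * 2 * np.pi)   # azimuth, negated
    v = ((j + 0.5) / H - 0.5) * np.pi  # elevation
    c = np.cos(v)
    return np.stack([c * np.cos(u), c * np.sin(u), -np.sin(v)], -1)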

The result does not match your given npz file

Hi, I used your given pth file and your yaml with test_depth.py:

ckpt/s2d3d_depth_HOHO_depth_dct_efficienthc_TransEn1/ep60.pth
config/s2d3d_depth/HOHO_depth_dct_efficienthc_TransEn1.yaml
{'mre': array(0.10142188), 'mae': array(0.2026864), 'rmse': array(0.38335027), 'rmse_log': array(0.06684125), 'log10': array(0.04376619), 'delta_1': array(0.90537266), 'delta_2': array(0.96934565), 'delta_3': array(0.98862388)}

mre 163.6033
mae 1.8321
rmse 2.0242
rmse_log 2.1812
log10 2.1632
delta_1 0.0001
delta_2 0.0002
delta_3 0.0005

Why do I get such a bad sem result?

input image: 'assets/pano_asmasuxybohhcj.png'

get the sem result:

[screenshot: degraded semantic segmentation result]

and my code is:
import os
import argparse
import importlib

import cv2
from natsort import natsorted

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F

from lib.config import config, update_config, infer_exp_id

if __name__ == '__main__':

    # Parse args & config
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--cfg', default='config/s2d3d_sem/HOHO_depth_dct_efficienthc_TransEn1_h1024_fold1_resnet101rgb.yaml')
    parser.add_argument('--pth')
    parser.add_argument('--out')
    parser.add_argument('--vis_dir', default=True)
    parser.add_argument('--y', action='store_true')
    parser.add_argument('--test_hw', type=int, nargs='*')
    parser.add_argument('opts',
                        help='Modify config options using the command-line',
                        default=None, nargs=argparse.REMAINDER)
    args = parser.parse_args()
    update_config(config, args)
    device = 'cuda' if config.cuda else 'cpu'

    if config.cuda and config.cuda_benchmark:
        torch.backends.cudnn.benchmark = False

    # Init global variable
    if not args.pth:
        from glob import glob
        exp_id = infer_exp_id(args.cfg)
        exp_ckpt_root = os.path.join(config.ckpt_root, exp_id)
        args.pth = natsorted(glob(os.path.join(exp_ckpt_root, 'ep*pth')))[-1]
        print(f'No pth given,  inferring the trained pth: {args.pth}')

    # Init network
    model_file = importlib.import_module(config.model.file)
    model_class = getattr(model_file, config.model.modelclass)
    net = model_class(**config.model.kwargs).to(device)
    net.load_state_dict(torch.load(args.pth))
    net = net.to(device).eval()

    # Start eval
    cm = 0
    num_classes = config.model.kwargs.modalities_config.SemanticSegmenter.num_classes
    with torch.no_grad():
        color = cv2.imread('assets/pano_asmasuxybohhcj.png')  # NOTE: cv2.imread returns BGR channel order
        # color = cv2.imread('assets/1.jpg')
        x = torch.from_numpy(color).permute(2, 0, 1)[None].float()/255.
        if x.shape[2:] != config.dataset.common_kwargs.hw:
            # x = F.interpolate(x, size=config.dataset.common_kwargs.hw, mode='bilinear', align_corners=False)
            x = torch.nn.functional.interpolate(x, size=config.dataset.common_kwargs.hw, mode='area')
        x = x.to(device)

        pred_sem = net.infer(x)['sem']

        # Visualization
        if args.vis_dir:
            import matplotlib.pyplot as plt
            from imageio import imwrite
            cmap = (plt.get_cmap('gist_rainbow')(np.arange(num_classes) / num_classes)[...,:3] * 255).astype(np.uint8)

            vis_sem = cmap[pred_sem[0].argmax(0).cpu().numpy()]

            color = cv2.resize(color, (vis_sem.shape[1], vis_sem.shape[0]))
            vis_sem = (color * 0.2 + vis_sem * 0.8).astype(np.uint8)
            cv2.imwrite('result.jpg', vis_sem)

            cv2.imshow('seg', vis_sem)
            cv2.waitKey(0)

I also checked test_sem.py and infer_sem.ipynb.
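
One thing worth checking (an assumption, not a confirmed diagnosis): cv2.imread returns BGR, so if the model was trained on RGB input, converting the channel order before building the tensor may fix the result:

import cv2
import torch

# Hypothetical fix: convert BGR (cv2 default) to RGB before building the tensor.
color = cv2.cvtColor(cv2.imread('assets/pano_asmasuxybohhcj.png'), cv2.COLOR_BGR2RGB)
x = torch.from_numpy(color).permute(2, 0, 1)[None].float() / 255.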

How to get the correct depth?

I have downloaded the Stanford2D3D dataset, but the ground-truth depth image looks like the one below:

[screenshot: raw Stanford2D3D depth image]

Could you let me know how you got this result in your paper?

[figure: depth result from the paper]
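
If the raw encoding is the issue, here is a minimal decoding sketch, assuming the Stanford2D3D convention of 16-bit depth PNGs where depth = value / 512 meters and 65535 marks missing pixels ('depth.png' is a placeholder path):

import cv2
import numpy as np

raw = cv2.imread('depth.png', cv2.IMREAD_UNCHANGED).astype(np.float32)  # 16-bit depth PNG
invalid = (raw == 65535)   # 65535 marks missing depth in this convention
depth_m = raw / 512.0      # convert raw units to meters
depth_m[invalid] = 0       # zero out invalid pixels before visualizing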

For perspective images

Hi, I am pretty new to 3D layout estimation and reconstruction research.

Is there any way to make HoHoNet work for perspective images?

Thanks:)

Semantic segmentation on Matterport3D dataset

Hi,
Thanks for your great work!

I see that this work performs the semantic segmentation task on the S2D3D dataset, but it seems that the Matterport3D dataset also contains panoramic semantic annotations.

Is it possible to use HoHoNet to perform semantic segmentation on the Matterport3D dataset? Have you tried it?

Thanks!
