sunset1995 / hohonet

"HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features" official pytorch implementation.

Home Page: https://sunset1995.github.io/HoHoNet/

License: MIT License


HoHoNet

Code for our paper in CVPR 2021: HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features (paper, video).

[teaser figure]

News

  • April 3, 2021: Released inference code, Jupyter notebooks, and visualization tools. The guide for reproduction is also finished.
  • March 4, 2021: Added a new backbone, HarDNet, which shows better speed and depth accuracy.

Pretrained weights

Links to trained weights for ckpt/: download from Google Drive or download from Dropbox.
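
After downloading, the example commands below assume the checkpoints sit under ckpt/, e.g. ckpt/mp3d_depth_HOHO_depth_dct_efficienthc_TransEn1_hardnet/ep60.pth and ckpt/mp3d_layout_HOHO_layout_aug_efficienthc_Transen1_resnet34/ep300.pth.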

Inference

Below, we use an out-of-training-distribution 360 image from PanoContext as an example.

Jupyter notebook

See infer_depth.ipynb, infer_layout.ipynb, and infer_sem.ipynb for interactive demos and visualization.

Batch inference

Run infer_depth.py or infer_layout.py to infer depth or layout. Use --cfg and --pth to specify the paths to the config file and the pretrained weights, and specify the input path with --inp; a glob pattern is available for batching a set of files (see the batch example below). The results are stored in the --out directory under the same filename, with the extension set to .depth.png or .layout.txt.

Example for depth:

python infer_depth.py --cfg config/mp3d_depth/HOHO_depth_dct_efficienthc_TransEn1_hardnet.yaml --pth ckpt/mp3d_depth_HOHO_depth_dct_efficienthc_TransEn1_hardnet/ep60.pth --out assets/ --inp assets/pano_asmasuxybohhcj.png

Example for layout:

python infer_layout.py --cfg config/mp3d_layout/HOHO_layout_aug_efficienthc_Transen1_resnet34.yaml --pth ckpt/mp3d_layout_HOHO_layout_aug_efficienthc_Transen1_resnet34/ep300.pth --out assets/ --inp assets/pano_asmasuxybohhcj.png
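
To process a batch, pass a glob pattern to --inp (quoted here, assuming the script expands the pattern itself); for example, running depth inference on every .png under assets/:

python infer_depth.py --cfg config/mp3d_depth/HOHO_depth_dct_efficienthc_TransEn1_hardnet.yaml --pth ckpt/mp3d_depth_HOHO_depth_dct_efficienthc_TransEn1_hardnet/ep60.pth --out assets/ --inp 'assets/*.png'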

Visualization tools

To visualize layout as 3D mesh, run:

python vis_layout.py --img assets/pano_asmasuxybohhcj.png --layout assets/pano_asmasuxybohhcj.layout.txt

Rendering options --show_ceiling, --ignore_floor, --ignore_wall, and --ignore_wireframe are available. Set --out to export the mesh to a .ply file, and set --no_vis to disable the interactive visualization.
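
For example, combining the flags above to export the mesh without opening the viewer (the .ply filename is illustrative):

python vis_layout.py --img assets/pano_asmasuxybohhcj.png --layout assets/pano_asmasuxybohhcj.layout.txt --out assets/pano_asmasuxybohhcj.ply --no_vis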

To visualize depth as point cloud, run:

python vis_depth.py --img assets/pano_asmasuxybohhcj.png --depth assets/pano_asmasuxybohhcj.depth.png

Rendering options --crop_ratio and --crop_z_above are available.
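
A hedged example (the values are illustrative, and the flag semantics are assumed from their names: trimming rows near the poles and dropping points above a height threshold):

python vis_depth.py --img assets/pano_asmasuxybohhcj.png --depth assets/pano_asmasuxybohhcj.depth.png --crop_ratio 0.1 --crop_z_above 1.2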

Reproduction

Please see README_reproduction.md for the guide to:

  1. prepare the datasets for each task in our paper
  2. reproduce the training for each task
  3. reproduce the numerical results in our paper with the provided pretrained weights

Citation

@inproceedings{SunSC21,
  author    = {Cheng Sun and
               Min Sun and
               Hwann{-}Tzong Chen},
  title     = {HoHoNet: 360 Indoor Holistic Understanding With Latent Horizontal
               Features},
  booktitle = {CVPR},
  year      = {2021},
}


hohonet's Issues

Spherical coordinate system used in codebase

Hi, thanks for your excellent work and repo. Question for you: how are you defining your spherical coordinate system? From the code below, it seems you have a reflection about the y axis, since u is not negated in get_uni_sphere_xyz() (the image unwraps left to right, clockwise, but theta unwraps right to left, counterclockwise).

A more traditional spherical coordinate system is the one below:

[figure: conventional spherical coordinate system and its defining equations]

I noted that you are using r = ρ·cos(φ), with φ = v and ρ = 1, which you call c = np.cos(v). But this doesn't match the traditional spherical coordinate system. Could you explain a bit more about your derivation? Thanks.

https://github.com/sunset1995/HoHoNet/blob/master/vis_depth.py#L7

import numpy as np

def get_uni_sphere_xyz(H, W):
    # Pixel-center grid over the equirectangular image.
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    u = (i+0.5) / W * 2 * np.pi      # azimuth in [0, 2*pi), increasing left to right
    v = ((j+0.5) / H - 0.5) * np.pi  # elevation in [-pi/2, pi/2), top to bottom
    z = -np.sin(v)                   # up axis (top row maps to z = +1)
    c = np.cos(v)                    # horizontal radius at this elevation
    y = c * np.sin(u)
    x = c * np.cos(u)
    sphere_xyz = np.stack([x, y, z], -1)
    return sphere_xyz
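
One reading of the code (an observation from the definitions above, not the author's reply): v is the elevation from the equator rather than the polar angle from +z, so with ρ = 1 the horizontal radius is r = cos(v) instead of sin(φ), giving x = cos(v)cos(u), y = cos(v)sin(u), z = -sin(v); the remaining difference from the textbook convention is the azimuth sweep direction noted above. A minimal sketch of the negated-u variant the question suggests (a hypothetical helper, not part of the repo):

import numpy as np

def get_uni_sphere_xyz_ccw(H, W):
    # Same pixel grid, but the azimuth is negated so it sweeps
    # counterclockwise when viewed from +z, as in the textbook convention.
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    u = -((i + 0.5) / W * 2 * np.pi)   # azimuth, negated
    v = ((j + 0.5) / H - 0.5) * np.pi  # elevation
    c = np.cos(v)
    return np.stack([c * np.cos(u), c * np.sin(u), -np.sin(v)], -1)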

The result does not match your given npz file

Hi, I used your given pth file and your yaml with test_depth.py:

ckpt/s2d3d_depth_HOHO_depth_dct_efficienthc_TransEn1/ep60.pth
config/s2d3d_depth/HOHO_depth_dct_efficienthc_TransEn1.yaml
{'mre': array(0.10142188), 'mae': array(0.2026864), 'rmse': array(0.38335027), 'rmse_log': array(0.06684125), 'log10': array(0.04376619), 'delta_1': array(0.90537266), 'delta_2': array(0.96934565), 'delta_3': array(0.98862388)}

mre 163.6033
mae 1.8321
rmse 2.0242
rmse_log 2.1812
log10 2.1632
delta_1 0.0001
delta_2 0.0002
delta_3 0.0005

Why do I get such a bad sem result?

input image: 'assets/pano_asmasuxybohhcj.png'

get the sem result:

[screenshot: degraded semantic segmentation result]

and my code is:
import os
import argparse
import importlib

import cv2
from natsort import natsorted

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F

from lib.config import config, update_config, infer_exp_id

if __name__ == '__main__':

    # Parse args & config
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--cfg', default='config/s2d3d_sem/HOHO_depth_dct_efficienthc_TransEn1_h1024_fold1_resnet101rgb.yaml')
    parser.add_argument('--pth')
    parser.add_argument('--out')
    parser.add_argument('--vis_dir', default=True)
    parser.add_argument('--y', action='store_true')
    parser.add_argument('--test_hw', type=int, nargs='*')
    parser.add_argument('opts',
                        help='Modify config options using the command-line',
                        default=None, nargs=argparse.REMAINDER)
    args = parser.parse_args()
    update_config(config, args)
    device = 'cuda' if config.cuda else 'cpu'

    if config.cuda and config.cuda_benchmark:
        torch.backends.cudnn.benchmark = False

    # Init global variable
    if not args.pth:
        from glob import glob
        exp_id = infer_exp_id(args.cfg)
        exp_ckpt_root = os.path.join(config.ckpt_root, exp_id)
        args.pth = natsorted(glob(os.path.join(exp_ckpt_root, 'ep*pth')))[-1]
        print(f'No pth given,  inferring the trained pth: {args.pth}')

    # Init network
    model_file = importlib.import_module(config.model.file)
    model_class = getattr(model_file, config.model.modelclass)
    net = model_class(**config.model.kwargs).to(device)
    net.load_state_dict(torch.load(args.pth))
    net = net.to(device).eval()

    # Start eval
    cm = 0
    num_classes = config.model.kwargs.modalities_config.SemanticSegmenter.num_classes
    with torch.no_grad():
        color = cv2.imread('assets/pano_asmasuxybohhcj.png')  # NOTE: cv2.imread returns BGR channel order
        # color = cv2.imread('assets/1.jpg')
        x = torch.from_numpy(color).permute(2, 0, 1)[None].float()/255.
        if x.shape[2:] != config.dataset.common_kwargs.hw:
            # x = F.interpolate(x, size=config.dataset.common_kwargs.hw, mode='bilinear', align_corners=False)
            x = torch.nn.functional.interpolate(x, size=config.dataset.common_kwargs.hw, mode='area')
        x = x.to(device)

        pred_sem = net.infer(x)['sem']

        # Visualization
        if args.vis_dir:
            import matplotlib.pyplot as plt
            from imageio import imwrite
            cmap = (plt.get_cmap('gist_rainbow')(np.arange(num_classes) / num_classes)[...,:3] * 255).astype(np.uint8)

            vis_sem = cmap[pred_sem[0].argmax(0).cpu().numpy()]

            color = cv2.resize(color, (vis_sem.shape[1], vis_sem.shape[0]))
            vis_sem = (color * 0.2 + vis_sem * 0.8).astype(np.uint8)
            cv2.imwrite('result.jpg', vis_sem)

            cv2.imshow('seg', vis_sem)
            cv2.waitKey(0)

I also checked test_sem.py and infer_sem.ipynb.
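
One thing worth checking (an assumption, not a confirmed diagnosis): cv2.imread returns BGR, so if the model was trained on RGB input, converting the channel order before building the tensor may fix the result:

import cv2
import torch

# Hypothetical fix: convert BGR (cv2 default) to RGB before building the tensor.
color = cv2.cvtColor(cv2.imread('assets/pano_asmasuxybohhcj.png'), cv2.COLOR_BGR2RGB)
x = torch.from_numpy(color).permute(2, 0, 1)[None].float() / 255.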

How to get the correct depth?

I have downloaded the Stanford2D3D dataset, but the ground-truth depth image looks like the one below:

[screenshot: raw Stanford2D3D depth image]

Could you let me know how you got this result in your paper?

[figure: depth result from the paper]
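
If the raw encoding is the issue, here is a minimal decoding sketch, assuming the Stanford2D3D convention of 16-bit depth PNGs where depth = value / 512 meters and 65535 marks missing pixels ('depth.png' is a placeholder path):

import cv2
import numpy as np

raw = cv2.imread('depth.png', cv2.IMREAD_UNCHANGED).astype(np.float32)  # 16-bit depth PNG
invalid = (raw == 65535)   # 65535 marks missing depth in this convention
depth_m = raw / 512.0      # convert raw units to meters
depth_m[invalid] = 0       # zero out invalid pixels before visualizing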

For perspective images

Hi, I am pretty new to 3D layout estimation and reconstruction research.

Is there any way to make HoHoNet work for perspective images?

Thanks:)

Semantic segmentation on Matterport3D dataset

Hi,
Thanks for your great work!

I see that this work performs the semantic segmentation task on the S2D3D dataset, but it seems that the Matterport3D dataset also contains panoramic semantic annotations.

Is it possible to use HoHoNet to perform semantic segmentation on the Matterport3D dataset? Have you tried it?

Thanks!
