zju3dv / manhattan_sdf

Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral

Home Page: https://zju3dv.github.io/manhattan_sdf/

License: Other

Python 100.00%
3d-reconstruction 3d-vision computer-vision cvpr2022

manhattan_sdf's Introduction

News

  • 06/03/2022 We provide instructions for running on custom data here.
  • 05/10/2022 To make comparison on ScanNet easier, we provide all quantitative and qualitative results of the baselines here, including COLMAP, COLMAP*, ACMP, NeRF, UNISURF, NeuS, and VolSDF.
  • 05/10/2022 To make it easier for follow-up works to compare with our model, we provide our quantitative and qualitative results, as well as the trained models on ScanNet, here.
  • 05/10/2022 We have uploaded our processed ScanNet scene data to Google Drive.

Neural 3D Scene Reconstruction with the Manhattan-world Assumption


introduction

Neural 3D Scene Reconstruction with the Manhattan-world Assumption
Haoyu Guo*, Sida Peng*, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun Bao, Xiaowei Zhou
CVPR 2022 (Oral Presentation)


Setup

Installation

conda env create -f environment.yml
conda activate manhattan

Data preparation

Download the ScanNet scene data evaluated in the paper from Google Drive and extract it into data/. Make sure that the path is consistent with the config file.

We provide instructions for running on custom data here.

Usage

Training

python train_net.py --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

Mesh extraction

python run.py --type mesh_extract --output_mesh result.obj --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

Evaluation

python run.py --type evaluate --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{guo2022manhattan,
  title={Neural 3D Scene Reconstruction with the Manhattan-world Assumption},
  author={Guo, Haoyu and Peng, Sida and Lin, Haotong and Wang, Qianqian and Zhang, Guofeng and Bao, Hujun and Zhou, Xiaowei},
  booktitle={CVPR},
  year={2022}
}

Acknowledgement

  • Thanks to Lior Yariv for her excellent work VolSDF.
  • Thanks to Jianfei Guo for his implementation of VolSDF neurecon.
  • Thanks to Johannes Schönberger for his excellent work COLMAP.
  • Thanks to Shaohui Liu for his customized implementation of COLMAP as a submodule of NerfingMVS.

manhattan_sdf's People

Contributors

ghy0324, pengsida


manhattan_sdf's Issues

Depth on the ScanNetV2 dataset

Hello!
I notice that the depth values in the ScanNetV2 dataset are very large, often above 1000. I have tried to normalize the depth using (x - min) / (max - min), but the reconstruction and the metrics are terrible.
So I wonder how the depth should be normalized, and which factors influence it.
Thanks!
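
For reference, the raw ScanNet depth PNGs store depth in millimeters, which is why values above 1000 appear; a minimal sketch of the usual conversion to meters (the file path is hypothetical, and this is an assumption about the data being described, not code from this repository):

import cv2
import numpy as np

# ScanNet exports depth as 16-bit PNGs in millimeters; 0 means "no measurement".
depth_mm = cv2.imread('depth/000000.png', cv2.IMREAD_UNCHANGED).astype(np.float32)
depth_m = depth_mm / 1000.0
print(depth_m[depth_mm > 0].min(), depth_m.max())  # typical indoor values are a few meters

A per-image (x - min) / (max - min) rescaling is not geometrically consistent across views, so a fixed metric conversion like the one above is normally preferred.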

intrinsic.txt problem

Thanks for your excellent work!

I am new to this area, and I have a question about the dataset details. How do you get the 'intrinsic.txt' file? Do you extract the camera intrinsic parameters with COLMAP tools?

Different offset & scale values when performing the camera normalization by myself

Hi, thanks for releasing the code!

I attempted to do the camera normalization step according to this instruction, but I cannot obtain the same offset and scale values you provided. For instance, the provided offset and scale of scene0050_00 are [4.24910, 2.30138, 1.15986] and [0.40543], while the parameters I got are [4.2678, 2.2656, 1.2732] and [0.4167]. I directly borrowed the code from VolSDF with some modifications to fit the format of the ScanNet annotations.

They are slightly different; could you explain where the difference comes from, and will it affect the final performance much?

Thanks a lot.
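
For comparison, the run_colmap code quoted later on this page centers the camera translations at the midpoint of their bounding box and scales them so that the farthest camera fits inside a fixed radius; a minimal sketch of that scheme (scale_radius and the function name are assumptions, not the authors' exact script):

import numpy as np

def normalize_cameras(poses, scale_radius=1.0):
    # poses: list of 4x4 camera-to-world matrices
    centers = np.stack([p[:3, 3] for p in poses])
    offset = (centers.max(axis=0) + centers.min(axis=0)) / 2
    scale = scale_radius / np.linalg.norm(centers - offset, axis=1).max() / 1.1
    normalized = []
    for p in poses:
        q = p.copy()
        q[:3, 3] = (q[:3, 3] - offset) * scale
        normalized.append(q)
    return normalized, offset, scale

Whether the bounding-box midpoint or the mean of the camera centers is used, and which frames are filtered out beforehand, will shift offset and scale slightly, which may explain small differences like the ones above.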

how to get the camera poses

Hello, for my own dataset processed with COLMAP, how should I normalize the camera poses? Can I just use the raw poses?
The normalize_cameras.py in the VolSDF project performs a global normalization to obtain a single normalization matrix, whereas your approach normalizes each scene separately to obtain a per-scene matrix. I have only recently started studying deep-learning papers on reconstruction, so please give me some advice. Thanks.

Problem on semantic segmentation evaluation

Hi, I have evaluated the predicted semantics of scene0050_00. I think this sequence has a worse segmentation result than the others, judging from the semantic images. However, the IoU_f, IoU_w, IoU_m I get are 0.73332, 0.76606, 0.78970, which is much higher than Table 3 in the paper (about 0.62, 0.52, 0.57).

The GT is based on the ScanNet semantic labels, where the floor class contains labels 1, 161, 52 and the wall class contains 3, 140. The evaluation code is based on semantic_nerf's calculate_segmentation_metrics, with ignore_label=-1.

Is there anything wrong? If possible, could I get your calculation formula and evaluation code for the semantic evaluation?

Thanks for your work and looking forward to your reply!
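
For reference, per-class IoU is usually computed as TP / (TP + FP + FN) over valid pixels; a minimal sketch (not the authors' evaluation code), assuming integer label maps with -1 as the ignore label:

import numpy as np

def per_class_iou(pred, gt, num_classes, ignore_label=-1):
    valid = gt != ignore_label
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float('nan'))
    return ious

Differences in which raw ScanNet labels are mapped to the floor and wall classes, and in how unlabeled pixels are ignored, can easily shift the IoU by several points.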

Invalid values generated by sdf_net

Hi @ghy0324 @pengsida ,

Thanks for your great work. I've tried it on my custom scene and it works successfully!
However, when I try to reconstruct a more difficult scene, such as an entire home captured in a single recording with my cell phone (such as the floorplan below), the model fails at the mesh extraction step.

In utils/mesh_utils.py, the function extract_mesh() first evaluates the SDF on an N x N x N volume. However, my model predicts all negative values for the SDF, which makes the following marching_cubes() call fail.

Could you kindly give me some opinions on what the root cause of this might be? Is it a model capacity problem because my scene is too large (~10m x ~10m)? Or could it be a problem with the input poses or input depth? I normalized the poses to a unit sphere, and if I train on a small scene such as the living room, the model works perfectly.

Thanks!!
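
As a quick diagnostic (a sketch, not code from this repository): skimage's marching_cubes requires the chosen level to lie strictly inside the sampled value range, so checking for a zero crossing before extraction makes this failure mode explicit.

import numpy as np
from skimage.measure import marching_cubes

def try_extract(sdf_volume, level=0.0, spacing=(1.0, 1.0, 1.0)):
    # sdf_volume: (N, N, N) array of SDF values sampled on a regular grid
    if not (sdf_volume.min() < level < sdf_volume.max()):
        raise ValueError(
            f'No zero crossing: sdf in [{sdf_volume.min():.4f}, {sdf_volume.max():.4f}]; '
            'the sampling volume may not cover the scene surface.')
    verts, faces, normals, values = marching_cubes(sdf_volume, level=level, spacing=spacing)
    return verts, faces

An all-negative volume usually means the sampled grid lies entirely "inside" the learned surface, for example because the scene was scaled or translated so that the surface falls outside the bounds the grid covers.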

room layout

Depth result of run_colmap is not the same as the provided depth data

Hi, thanks for sharing the code to help us run a modified version of COLMAP. I downloaded the ScanNet data and used the official python toolkit https://github.com/ScanNet/ScanNet/tree/master/SensReader/python to extract it.

Then I ran https://github.com/zju3dv/manhattan_sdf/blob/main/docs/run_colmap/run.py.

However, when I checked the consistency between your provided data and the result of the above script, I found two differences.

  1. The name of the depth folder is depth_patchmatch rather than depth_colmap in the provided data.
  2. The depth images have different values between your provided data and the result of the above script, as the screenshot shows.
    image
    In this screenshot, pub10 is your provided data for the 10th image of scene 0050_00, and obj10 is the result of your script.
    You can see that there is an inconsistency.
    But the final metric is quite similar, as the screenshot shows.

image

Here, "Provided data" is your provided data and "From Scannet" is the result from the downloaded ScanNet data.

For the sake of rigor, I would like to know whether this difference is normal. I would greatly appreciate your response!

The coordinate system of pose.txt?

I have read the data preparation guide. You said you get the poses from COLMAP, which means x points right, y points down, and z points towards the scene. If I use Blender data (x points right, y points up, z points towards the scene), should I transform the poses to match yours like this?

# coordinate change
# (R_col2bcam is assumed to be the rotation from the Blender camera frame to the
#  COLMAP camera frame; it is not defined in this snippet.)
location = np.array([bcam.matrix_world.decompose()[0]]).T
R_bcam2world = np.array(bcam.matrix_world.decompose()[1].to_matrix())
R_col2world = np.matmul(R_col2bcam, R_bcam2world)
T_col2world = np.matmul(R_col2bcam, location)
extrinsic = np.concatenate((R_col2world, T_col2world), axis=1)

The coordinate system of COLMAP depth

Hi~

I would like to ask a question about the COLMAP depth input.
Within one scene, are the depth values of each view's input COLMAP depth map expressed in a single world coordinate frame, or in the camera coordinate frame of each view?

Since the original COLMAP tool outputs 'points3D.txt', how do you convert the 3D points into depth maps?

Looking for your reply. Thank you!
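
For reference, per-view depth can be obtained by transforming 3D points into each camera frame and projecting them with the intrinsics, keeping the z coordinate; a minimal sketch (an illustration, not this repository's pipeline), assuming world-to-camera extrinsics [R|t] and a 3x3 pinhole K:

import numpy as np

def points_to_depth(points_world, R, t, K, height, width):
    # points_world: (N, 3); R, t: world-to-camera rotation and translation
    pts_cam = points_world @ R.T + t          # camera coordinates; z is the depth
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    uv = pts_cam @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
    depth = np.zeros((height, width), dtype=np.float32)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    # write farthest points first so the nearest one wins at each pixel
    order = np.argsort(-pts_cam[inside, 2])
    for (u, v), z in zip(uv[inside][order], pts_cam[inside, 2][order]):
        depth[v, u] = z
    return depth

Note that the dense depth used in this repository comes from COLMAP's patch-match stereo output (the *.geometric.bin files under dense/stereo/depth_maps), which stores per-view depth in each camera's own frame, rather than from the sparse points3D.txt.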

Training and running on custom dataset

Thank you for this amazing work.

I wanted to run this on a custom dataset. I prepared everything as recommended here. However, I am stuck at the part where we need a ground-truth mesh for training. How do I get that?

Run on my own data

Could you please tell me the detailed steps for running the code on our own data? I am confused about how to prepare the data, e.g. the intrinsic.txt. And I don't know why this happens:
image
My images are all 756x504, but I got this error. I don't know how to fix it. I hope to receive your reply, thanks!

Scannet scenes used for training

Hi, thanks for sharing the amazing work!

I am curious about the number of scenes you used for training on the reconstruction task for the ScanNet and 7-Scenes datasets. Did you train the model on all scenes or just a few scenes for the reconstruction?

Looking forward to your reply.

something wrong with DDP

When I run:
python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/scannet/0084.yaml exp_name 0084
image

How do you get GT mesh?

The GT mesh in your data is different from the one in ScanNet, which is of very low quality. How do you get the GT mesh?

Why does a bug appear when using COLMAP to reconstruct the depth information?

Because I use COLMAP to reconstruct the camera poses, I thought I could directly reuse the COLMAP model from that pose reconstruction, so I modified the code in run.py. But when I do this, the code that reads information from the fuse.ply.vis file raises an error. What is the reason for this? Also, what is the purpose of the code that reads from the fuse.ply.vis file? Could you provide comments for it? Thanks!
Here is the main modified code of run.py:
for scene_id in ['214']:
    source = f'/data8T/ydf/manhattan_sdf/data/tmp/{scene_id}/' # TODO: modify this to your path
    target = f'/data8T/ydf/manhattan_sdf/data/{scene_id}'

    os.makedirs(f'{target}/images', exist_ok=True)
    os.makedirs(f'{target}/pose', exist_ok=True)
    os.makedirs(f'{target}/depth_patchmatch', exist_ok=True)

    if not os.path.exists(f'{source}/images/10.jpg'):
        sortFile(source)

    colmap_path = "colmap"

    with open(f'{source}/colmap_output.txt', 'a') as f:
        feature_extractor_args = [
            colmap_path, 'feature_extractor',
            '--database_path', os.path.join(source, "database.db"),
            '--image_path', os.path.join(source, "images"),
            '--ImageReader.single_camera', '1',
            # '--ImageReader.mask_path', os.path.join(basedir, args.masks),
        ]
        feat_output = subprocess.check_output(feature_extractor_args, universal_newlines=True)
        f.writelines(feat_output)
        print('Features extracted')

        exhaustive_matcher_args = [
            colmap_path, 'exhaustive_matcher',
            '--database_path', os.path.join(source, "database.db"),
        ]
        exh_output = subprocess.check_output(exhaustive_matcher_args, universal_newlines=True)
        f.writelines(exh_output)
        print('Exhaustive matched')

        p = os.path.join(source, 'sparse')
        if not os.path.exists(p):
            os.makedirs(p)

        mapper_args = [
            colmap_path, 'hierarchical_mapper',
            '--database_path', os.path.join(source, "database.db"),
            '--image_path', os.path.join(source, "images"),
            # '--ImageReader.mask_path', os.path.join(basedir, args.masks),
            '--output_path', os.path.join(source, 'sparse'),
        ]

        map_output = subprocess.check_output(mapper_args, universal_newlines=True)
        f.writelines(map_output)
        print('Sparse map created')

        os.makedirs(f'{source}/dense/', exist_ok=True)

        dense_args = [
            colmap_path, 'image_undistorter',
            '--image_path', os.path.join(source, "images"),
            # '--ImageReader.mask_path', os.path.join(basedir, args.masks),
            '--input_path', os.path.join(source, 'sparse', '0'),
            '--output_path', os.path.join(source, 'dense'),
            '--output_type', 'COLMAP',
        ]

        dense_output = subprocess.check_output(dense_args, universal_newlines=True)
        f.writelines(dense_output)
        print('Dense map created')

        patch_args = [
            colmap_path, 'patch_match_stereo',
            '--workspace_path', os.path.join(source, 'dense'),
            '--workspace_format', 'COLMAP',
            '--PatchMatchStereo.cache_size', '64',
        ]

        patch_output = subprocess.check_output(patch_args, universal_newlines=True)
        f.writelines(patch_output)
        print('Patch match stereo succeeded')

        stereo_args = [
            colmap_path, 'stereo_fusion',
            '--workspace_path', os.path.join(source, 'dense'),
            '--workspace_format', 'COLMAP',
            '--input_type', 'geometric',
            '--output_path', os.path.join(source, 'dense', 'fused.ply'),
            '--StereoFusion.cache_size', '64',
        ]

        stereo_output = subprocess.check_output(stereo_args, universal_newlines=True)
        f.writelines(stereo_output)
        print('Stereo fusion succeeded')

    load_save_pose(source)
    npy2one(source)

    id_list = os.listdir(f'{source}/images')
    id_list = [id[:-4] for id in id_list if id.endswith('0.jpg')]
    id_list.sort(key=lambda _: int(_))

    pose_dict = dict()
    for id in id_list:
        pose_dict[id] = np.loadtxt(source + f'pose/{id}.txt')

    id_list = [id for id in id_list if not np.isinf(pose_dict[id]).any()]
    id_list.sort()

    translation_list = []
    for id in id_list:
        translation_list.append(pose_dict[id][None, :3, 3])
    translation_list = np.concatenate(translation_list)
    translation_center = (translation_list.max(axis=0) + translation_list.min(axis=0)) / 2
    translation_list -= translation_center
    max_cam_norm = np.linalg.norm(translation_list, axis=1).max()
    scale = scale_radius / max_cam_norm / 1.1

    for id in id_list:
        pose_dict[id][:3, 3] -= translation_center
        pose_dict[id][:3, 3] *= scale

    with open(f'{source}/offset.txt', 'w') as f:
        f.write(f'{translation_center}')

    with open(f'{source}/scale.txt', 'w') as f:
        f.write(f'{scale}')

    os.system(f'cp {source}/intrinsic.txt {target}/intrinsic.txt')

    for id in tqdm(id_list):
        color = cv2.imread(f'{source}/images/{id}.jpg')
        color = cv2.resize(color, (width, height))
        cv2.imwrite(f'{target}/images/{id}.png', color)
        np.savetxt(f'{target}/pose/{id}.txt', pose_dict[id])

    intrinsic = np.loadtxt(f'{target}/intrinsic.txt')

    images_bin_path = f'{source}/sparse/0/images.bin'
    images = read_images_binary(images_bin_path)
    names = [_[1].name for _ in images.items()]

    shape = (height, width)

    ply_vis_path = f'{source}/dense/fused.ply.vis'
    assert os.path.exists(ply_vis_path)
    masks = [np.zeros(shape, dtype=np.uint8) for name in names]
    load_point_vis(ply_vis_path, masks)

    for name, mask in tqdm(zip(names, masks)):
        depth_bin_path = f'{source}/dense/stereo/depth_maps/{name}.geometric.bin'
        if not os.path.exists(depth_bin_path):
            continue
        depth_fname = depth_bin_path
        depth = read_array(depth_fname)
        depth[mask == 0] = 0
        np.save(f'{target}/depth_patchmatch/{name[:-4]}.npy', depth)

Here is the error message:

Traceback (most recent call last):
  File "/home/ydf/manhattan_sdf/docs/run_colmap/run.py", line 179, in <module>
    load_point_vis(ply_vis_path, masks)
  File "/home/ydf/manhattan_sdf/docs/run_colmap/run.py", line 33, in load_point_vis
    idx, u, v = struct.unpack('<III', f.read(4 * 3))
struct.error: unpack requires a buffer of 12 bytes

The reason is that, when the file is read the way your code expects, the file size does not match, so not enough data can be read.

How to represent lens radial distortion with the camera intrinsic matrix?

How do I convert the intrinsic parameters of COLMAP's SIMPLE_RADIAL camera model into a 4x4 intrinsic matrix? I saw that you said that, when using custom data, the camera intrinsics should be saved to intrinsic.txt. But if there are camera distortion parameters, how do I express them in the intrinsic matrix? Or do we just discard the distortion parameters?
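
For reference, COLMAP's SIMPLE_RADIAL model stores f, cx, cy and one radial coefficient k; the radial term cannot be expressed in a linear intrinsic matrix, so a common approach (an assumption, not the authors' instructions) is to undistort the images first, e.g. with colmap image_undistorter, which outputs a PINHOLE model, and then write only the pinhole part:

import numpy as np

# Hypothetical SIMPLE_RADIAL parameters exported from COLMAP: f, cx, cy, k
f, cx, cy, k = 600.0, 378.0, 252.0, 0.01

# Only the pinhole part fits into a 4x4 matrix; k is handled by undistorting the images.
K = np.array([
    [f, 0.0, cx, 0.0],
    [0.0, f, cy, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
np.savetxt('intrinsic.txt', K)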

Data Download problem

Sorry to bother you, but I cannot download the ScanNet data uploaded to OneDrive. Could you upload it to BaiduNetDisk or Google Drive?

Running on Custom Dataset

A big thanks for this excellent work. The training script runs fine with the provided ScanNet dataset, but please explain how to run your code on other ScanNet scenes and on custom datasets too. A few specific questions:

  1. Can I replace 'depth_colmap' with the GT depth images provided in the ScanNet dataset?
  2. How do I generate the semantic_deeplab data?
  3. Is any transformation needed for the poses?

Question about the depth loss

Hi, thanks for your beautiful work. I recently read your paper; it is amazing and enlightening. While reading your source code, I got confused about depth_loss_clamp.

if 'depth_loss_clamp' in loss_weights:
    depth_loss = depth_loss.clamp(max=loss_weights['depth_loss_clamp'])

Why should the depth loss be clamped?

Ground truth created with your script is not good

Hi, thanks for kindly sharing your method of creating the ground-truth mesh from ScanNet in #37 (comment)!
I followed your script and made slight modifications based on your suggestion, as follows.
image
The modifications are:

  1. I didn't multiply voxel_length by cfg.test_dataset.scale.
  2. I used the ground-truth depth and poses from ScanNet.
  3. I divided the depth maps by 1000.

Except for those three changes, everything else is the same as your script.

But I found two differences. Take scene 0050_00 as an example.

  1. There are some outliers in the reconstructed scene (visualized in MeshLab):

image

The ground truth above was created by myself; the one at the bottom is the one you provided.

  2. The file sizes differ:
    image

The first file is the ground truth created by myself, which is 4.5 MB, and the second is the one you provided, which is 8.3 MB.

I've checked all the issues for this repo but didn't find a solution. Could you please give me more instructions? Thanks for your help in advance!

Calculate the new intrinsic and pose matrix after resizing image

Hi, developers. It's an excellent paper and a very good code repo. I'm quite new to 3D indoor scene reconstruction, and I have a basic question.

In your paper, you say, "Images are resized to 640 x 480 for both 2D semantic segmentation and scene reconstruction" (4. Implementation details). I downloaded the original ScanNet data and found that the original image size is 1296x968. So I'm wondering how you got the new intrinsic and pose matrices after resizing the images? I have searched a lot on the internet, but I'm still stuck.

I've compared the intrinsic and pose matrices of the original ScanNet data and your provided data for the 4 scenes. I found that the intrinsic matrix didn't change. For the pose matrix, as it is [R|t], only t, the translation vector, has changed, and I'm confused about why only the translation vector changed.

I would appreciate it a lot if you could help me!
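
For reference, resizing an image by factors (sx, sy) scales the focal lengths and principal point by the same factors, while the extrinsic pose does not depend on the image resolution; a minimal sketch (not taken from this repository):

import numpy as np

def resize_intrinsics(K, old_wh, new_wh):
    # K: pinhole intrinsics (3x3, or the top-left block of a 4x4) for an image of size old_wh = (W, H)
    sx, sy = new_wh[0] / old_wh[0], new_wh[1] / old_wh[1]
    K = K.astype(float).copy()
    K[0, 0] *= sx  # fx
    K[0, 2] *= sx  # cx
    K[1, 1] *= sy  # fy
    K[1, 2] *= sy  # cy
    return K

The translation differences observed above are therefore unrelated to resizing; they are consistent with the per-scene offset/scale normalization used in the data preparation.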

Problem occurs in training

Hello, thanks for the wonderful work !

There is something wrong when training the net using:

python train_net.py --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

The error printed in the terminal is:

File "train_net.py", line 119, in
main()
File "train_net.py", line 115, in main
train(cfg, network)
File "train_net.py", line 70, in train
trainer.val(epoch, test_loader)
File "/home/manhattan_sdf/manhattan_sdf/lib/train/trainers/trainer.py", line 126, in val
mesh = extract_mesh(self.network.net.model.sdf_net, self.network.net.model.semantic_net)
File "/home/manhattan_sdf/manhattan_sdf/lib/utils/mesh_utils.py", line 115, in extract_mesh
sdf, level=level, spacing=[float(v) / N for v in volume_size]
File "/home/anaconda3/envs/manhattan_sdf/lib/python3.7/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 137, in marching_cubes
mask=mask)
File "/home/anaconda3/envs/manhattan_sdf/lib/python3.7/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 300, in _marching_cubes_lewiner
level = float(level)
TypeError: float() argument must be a string or a number, not 'SemanticNet'

It seems two nets are fed into 'extract_mesh' function.
Can you tell me how to change the code to make it work fine?

Looking for your reply. Thank you!

GPU-memory?

I use a TITAN RTX (24 GB) for training, but CUDA out of memory still occurs when the step is about 2454.
eta: 0:05:38 epoch: 2 step: 2454 rgb_loss: 0.1126 psnr: 15.9341 depth_loss: 0.0696 joint_loss: 0.1813 cross_entropy_loss: 0.7693 eikonal_loss: 0.0659 loss: 0.5825 beta: 0.0238 theta: -0.0845 rgb_weight: 1.0000 depth_weight: 1.0000 depth_loss_clamp_weight: 0.5000 joint_weight: 0.0500 ce_weight: 0.5000 eikonal_weight: 0.1000 data: 0.0226 batch: 0.6272 lr: 0.000456 max_mem: 22170
What should I do? Thanks!

Question about the ply generated by the pre-trained model vs. the ply on the project page

Hi @ghy0324

Thank you again for this wonderful project!

I have a question regarding your presented model. On your project webpage, I do see reconstructions with high texture quality.
image

However, from the evaluation data you provided (gt.obj) and the inference result (the ply model), I have not observed similar quality. Here is a capture.

image

Can you explain why?

The visualization comes from MeshLab. The file was downloaded from the eval zip you provided: https://zjueducn-my.sharepoint.com/personal/guohaoyu_zju_edu_cn/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fguohaoyu%5Fzju%5Fedu%5Fcn%2FDocuments%2Fmanhattan%5Fsdf%2Fdata%2Fmanhattan%5Fsdf%5Fscannet%5Fdata&ga=1

Another question: how do you achieve the high-quality texture presented on your project webpage? I could not find specific details in your paper, and I would appreciate it if you could point out explicitly which section discusses this.

Thank you,

Questions about training my own dataset

I have already verified scene 0050_00. When training on my own dataset, the result visualized in MeshLab does not look right. The image frames are sampled at 5 frames per second, the ARCore poses are converted to 4x4 [R|t] matrices, the camera intrinsics and the segmentation maps are fine, and the depth estimation data was also generated with NerfingMVS's COLMAP. I also changed the scale and offset in the config file. Is there anything else that needs to be aligned?

Question about COLMAP generated depth map

Hi Author,

Please allow me to show my appreciation for this great work!

I have a question regarding data preparation #5, where you give a guideline. I followed your instructions in #7 to generate the sparse and dense reconstruction using the COLMAP provided by NerfingMVS.

However, after obtaining the predicted depth maps by loading the files under 'dense/stereo/depth_maps' (image_name + '.geometric.bin', as in #7) and converting them to numpy arrays, I compared your depth with my prediction and found a large difference. After I replaced the COLMAP depth with my version and retrained, training generally converges, but the appearance is not good: there are large areas of distortion on the floor.

How did I generate the depth prediction?

  1. Following your instructions, I downloaded the COLMAP version used in NerfingMVS. This version supports the fusion ops; I manually verified this following the issues in NerfingMVS.
  2. I ran the script at https://github.com/weiyithu/NerfingMVS/blob/main/colmap.sh to generate the depth maps with sparse + dense reconstruction.
  3. I gathered the depth maps with the function you refer to in #7 and saved them as numpy arrays.

I have no idea why there is such a large difference. So here are my questions:

  1. Are you doing the same as me? Did you normalize the COLMAP depth result?
  2. Do we need to adjust the poses according to the generated COLMAP depth? I used your poses instead; I'm not sure if that is related. Actually, I have no idea how to adjust the poses :D.
  3. For question 1, if yes, can you tell me the possible reasons why I failed to generate good results? If no, can you tell me which part I missed?

Thank you, and I would really appreciate it if you could help answer my questions.
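
For a quantitative check, comparing the two depth maps only where both are valid makes it easier to tell a global scale or normalization difference from genuinely different stereo estimates; a generic sketch (the file names are hypothetical, not this repository's layout):

import numpy as np

mine = np.load('depth_mine/0010.npy')          # hypothetical paths
provided = np.load('depth_provided/0010.npy')

valid = (mine > 0) & (provided > 0)
ratio = mine[valid] / provided[valid]
abs_rel = np.abs(mine[valid] - provided[valid]) / provided[valid]
print(f'median ratio: {np.median(ratio):.3f}, mean abs-rel: {abs_rel.mean():.3f}')
# A median ratio far from 1 with small spread suggests a scale/normalization mismatch;
# a ratio near 1 with large spread suggests genuinely different stereo estimates.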

About the depth loss L_d

Hi, thanks for your wonderful work. I am not familiar with this area, so I have a question about the calculation of the depth loss L_d (Eq. 7). Why are only the camera rays going through image pixels that have depth values estimated by COLMAP used for calculating the depth loss? Why not use the depth of all pixels?
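
For intuition (a generic sketch, not the repository's loss code): COLMAP's multi-view stereo only produces depth at pixels it can match reliably, so the depth term is typically restricted to those pixels with a validity mask, since supervising all pixels would also penalize rays where no trustworthy estimate exists. An L1 variant as an illustration:

import torch

# rendered_depth: depth rendered by the model for a batch of rays
# colmap_depth:   COLMAP depth sampled at the same pixels, 0 where COLMAP has no estimate
rendered_depth = torch.rand(1024)
colmap_depth = torch.rand(1024) * (torch.rand(1024) > 0.5)

mask = colmap_depth > 0
depth_loss = torch.abs(rendered_depth - colmap_depth)[mask].mean()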

Facing issue while training.

Hi, thank you for sharing your wonderful work.

I've followed the steps given in the setup. When I proceed further to the training step and run python train_net.py --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050, I'm facing a RuntimeError: Function 'MmBackward' returned nan values in its 1th output.

How should I solve this error?

One issue regarding image rendering with pre-trained model

Hi @ghy0324,

Thanks for sharing your pre-trained model.

I tested one model (0050_00.pth) by rendering images at the training camera poses.

Please check the sample image below. The left figure is the GT image and the right figure is the rendered image.

Screenshot from 2022-06-13 19-35-58

The rendered image is still very blurry when using your pre-trained model. I only changed N_rays to 512 due to memory limits on my machine. Is it by design to trade image loss (L_img in Eq. 5) for better geometry?

Any idea why this happens?

Thanks.

Meaning of scale and offset?

In the config file, what do the scale and offset mean? Thanks a lot!

scale: 0.44963
offset: [1.23815, 2.57319, 1.38001]
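
For reference, judging from the camera normalization in the run_colmap code quoted earlier on this page, offset and scale describe how camera translations were mapped into the normalized scene coordinates; a minimal sketch of applying them, and of undoing them on an extracted mesh (treated as an assumption about the convention, not the authors' exact definition):

import numpy as np

offset = np.array([1.23815, 2.57319, 1.38001])
scale = 0.44963

def world_to_normalized(points):
    return (points - offset) * scale

def normalized_to_world(points):
    return points / scale + offset

# e.g. mapping a mesh extracted in normalized coordinates back to the original frame:
# mesh.vertices = normalized_to_world(np.asarray(mesh.vertices))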

Visualization issue

Hi Authors, thanks for your work. After running "mesh extraction" I can get .obj files. I would like to ask what tools I can use to get the same visualization as Fig.5, Fig.11, and Fig.12 in your paper?
image

Problem on data preparation

Hi, thanks for your wonderful work!
I would like to train on newly recorded sequences, so I wonder when the 'Data preparation' instructions will be published?
Looking forward to your early reply.

The script to generate groundtruth mesh

Hi authors,

Thank you for sharing your great work! I notice that you are using a mesh generated by TSDF fusion with Open3D, instead of the original mesh provided by ScanNet, to evaluate your results. I would like to generate the ground-truth mesh in the same manner for a fair comparison with our algorithm, so it would be great if you could share the script!

Thanks!
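
For reference, a minimal sketch of TSDF fusion with Open3D from posed RGB-D frames (an illustration of the general technique; the parameter values, intrinsics, and file layout are assumptions, not the authors' script):

import numpy as np
import open3d as o3d

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.04,
    sdf_trunc=0.12,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

fx, fy, cx, cy = 577.87, 577.87, 319.5, 239.5  # example 640x480 ScanNet-style intrinsics (assumed)
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, fx, fy, cx, cy)

for frame_id in range(0, 1000, 10):                      # assumed frame sampling
    color = o3d.io.read_image(f'images/{frame_id}.png')
    depth = o3d.io.read_image(f'depth/{frame_id}.png')   # 16-bit PNG in millimeters
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=5.0, convert_rgb_to_intensity=False)
    pose = np.loadtxt(f'pose/{frame_id}.txt')             # camera-to-world
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))  # integrate expects world-to-camera

mesh = volume.extract_triangle_mesh()
o3d.io.write_triangle_mesh('gt_mesh.ply', mesh)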

why divide distance_map by depth_ratio?

Hi @ghy0324,
Thanks for the amazing and inspiring work!
But I have a quick question regarding the depth calculation. Specifically, why do you divide distance_map by depth_ratio? What is the difference between distance and depth? (I originally thought that the distance d to the camera origin ray_o is exactly the depth value of the corresponding pixel.)

depth_map = distance_map / depth_ratio

Looking forward to your reply.
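
For reference (an explanation of the common convention, not taken from this codebase): distance is measured along the viewing ray from the camera center, while depth is the z coordinate in the camera frame. If ray directions are built as K^-1 [u, v, 1]^T (so the z component equals 1), the factor between the two is the norm of that direction, which is one common meaning of a depth_ratio term:

import numpy as np

K = np.array([[580.0, 0.0, 320.0],
              [0.0, 580.0, 240.0],
              [0.0, 0.0, 1.0]])                 # example intrinsics (assumed)

u, v = 100.0, 50.0                              # pixel coordinates
d = np.linalg.solve(K, np.array([u, v, 1.0]))   # ray direction with z component equal to 1
depth_ratio = np.linalg.norm(d)

distance = 3.0                                  # distance along the unit-norm ray (example)
depth = distance / depth_ratio                  # z-depth of the same 3D point
point = distance * d / depth_ratio              # the point in camera coordinates; point[2] == depth
print(depth, point[2])

For a ray through the image center the two coincide; towards the image borders the distance is larger than the depth, which is why the conversion is needed.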

Which scenes are used from 7-Scenes?

hello,

In the paper, the authors say that 4 scenes are selected from 7-Scenes. Could you share the scene data used for 7-Scenes, as you did for ScanNet?

Thank you very much.

Out of Memory

When I run the code with PyTorch 1.9 for evaluation, this problem appears. My memory is 8 GB; is that enough for evaluation?

Questions for preprocessing custom data

Thank you for your nice work!

I want to apply this work to my own data.

I have some images and their camera poses, and I can also get new poses from COLMAP.

However, regardless of the source of the poses, their axes are not aligned with the dominant x, y, and z axes.

As far as I understand from your paper, the data must be aligned to these axes according to the Manhattan-world assumption.

Is it okay to just use the COLMAP procedure you provide in order to apply your code to my data?

Thank you!

Error: No such file or directory: 'configs/default.yaml' (VS Code)

When I try to replicate the results in VS Code:

[Running] python -u "f:\Code\Code04_0617_Manhattan-SDF\manhattan_sdf-1\run.py"
Traceback (most recent call last):
  File "f:\Code\Code04_0617_Manhattan-SDF\manhattan_sdf-1\run.py", line 1, in <module>
    from lib.config import args, cfg
  File "f:\Code\Code04_0617_Manhattan-SDF\manhattan_sdf-1\lib\config\__init__.py", line 1, in <module>
    from .config import cfg, args
  File "f:\Code\Code04_0617_Manhattan-SDF\manhattan_sdf-1\lib\config\config.py", line 105, in <module>
    cfg = make_cfg(args)
  File "f:\Code\Code04_0617_Manhattan-SDF\manhattan_sdf-1\lib\config\config.py", line 87, in make_cfg
    cfg.merge_from_file(args.cfg_file)
  File "f:\Code\Code04_0617_Manhattan-SDF\manhattan_sdf-1\lib\config\yacs.py", line 169, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/default.yaml'

How should I solve this problem?

One issue regarding 3D reconstruction quality

Hi @ghy0324 ,

Thanks for your work. I found one issue when I tried to reproduce your results.

I trained the model with your code from scratch for 50 epochs. Unfortunately, the 3D reconstruction quality is not as good as the one on the project page. It lacks texture detail and the walls are bumpy in some parts.

I also compared the rendered images (at training camera poses) with the training images, and found that the rendered images are pretty blurry. Please check the sample images below.

Screenshot from 2022-06-13 11-08-21

Screenshot from 2022-06-13 11-09-32

Could you please share insights about this issue from your experience?

By the way, it would be great if you could provide a pre-trained model as a reference.

Thanks.
