whu-usi3dv / freereg


[ICLR 2024] FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators

Home Page: https://whu-usi3dv.github.io/FreeReg/

Language: Python 100.00%
Topics: cross-modality-feature-extraction, diffusion-model, diffusion-models, image-to-point-cloud-registration, paper, diffusion-feature

freereg's People

Contributors

hpwang-whu


freereg's Issues

Request for the model.

I have some issues loading the model locally, as shown below, since the MiDaS repository is also downloaded from the Internet. In this case, however, a failed Internet access does not raise an error. Could you please send me a copy of it?
midas = torch.hub.load("/mnt/proj/SOTAs/ZoeDepth-main/pkgs/intel-isl_MiDaS_master", midas_model_type, pretrained=use_pretrained_midas, source='local')
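
For reference, a minimal offline-loading sketch, assuming the fix is to avoid the silent download entirely: the MiDaS clone path is taken from the snippet above, while the model type "DPT_BEiT_L_384" and the checkpoint path are placeholders to adapt.

import torch

repo_dir = "/mnt/proj/SOTAs/ZoeDepth-main/pkgs/intel-isl_MiDaS_master"  # local MiDaS clone
midas = torch.hub.load(repo_dir, "DPT_BEiT_L_384",  # hypothetical midas_model_type
                       pretrained=False, source="local")  # no silent network download
state = torch.load("/path/to/dpt_beit_large_384.pt", map_location="cpu")  # placeholder path
state = state.get("model", state)  # some checkpoints nest the weights under "model"
midas.load_state_dict(state)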

MinkowskiEngine installation requirements

Really impressive work! However, under the environment configuration given in the README, I cannot install MinkowskiEngine successfully. I have also heard how notoriously hard this library is to install. Did you really complete the MinkowskiEngine installation in the environment described in the README? Are there any other details you could share?

the number of keypoints for matching

The work is inspiring.

Note that we uniformly sample a dense grid of keypoints on both the depth map and the image.

So how many points are sampled for matching?
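
For concreteness, a minimal sketch of what "uniformly sample a dense grid" could look like; the stride value and the function name are my own assumptions, not the repository's code.

import numpy as np

def grid_keypoints(h, w, stride=8):
    ys = np.arange(stride // 2, h, stride)  # row coordinates of the grid
    xs = np.arange(stride // 2, w, stride)  # column coordinates of the grid
    u, v = np.meshgrid(xs, ys)              # pixel (u, v) positions
    return np.stack([u.ravel(), v.ravel()], axis=1)  # (N, 2) keypoints

# e.g. a 384 x 512 map with stride 8 yields 48 * 64 = 3072 keypoints
print(grid_keypoints(384, 512).shape)  # (3072, 2)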

Solving "CUDA out of memory" -- running FreeReg on an RTX 3060 (12 GB)

RuntimeError: CUDA out of memory. Tried to allocate 968.00 MiB (GPU 0; 11.76 GiB total capacity; 8.87 GiB already allocated; 484.12 MiB free; 9.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My device: an RTX 3060 with 12 GB of memory.

Is there some way to run this? Thanks!
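
One hedged starting point, following the error message's own suggestion rather than any official fix; the 128 MiB split size is an assumption to tune.

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # assumed value; tune it

import torch  # import only after setting the variable so the allocator sees it
torch.cuda.empty_cache()  # release cached blocks between pipeline stages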

Segmentation fault (core dumped) when running on my own images and point cloud?

Hi. I have been able to run the demo; however, when I run it on my own images and point cloud, it fails:

/home/researcher/anaconda3/envs/freereg/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/init.py:36: UserWarning: The environment variable OMP_NUM_THREADS not set. MinkowskiEngine will automatically set OMP_NUM_THREADS=16. If you want to set OMP_NUM_THREADS manually, please export it on the command line before running a python script. e.g. export OMP_NUM_THREADS=12; python your_program.py. It is recommended to set it below 24.
warnings.warn(
logging improved.
Overwriting config with config_version None
img_size [384, 512]
/home/researcher/anaconda3/envs/freereg/lib/python3.8/site-packages/torch/functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3587.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Params passed to Resize transform:
width: 512
height: 384
resize_target: True
keep_aspect_ratio: True
ensure_multiple_of: 32
resize_method: minimal
/home/researcher/anaconda3/envs/freereg/lib/python3.8/site-packages/torch/nn/modules/transformer.py:306: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Using pretrained resource local::./tools/zoe/models/ZoeD_M12_NK.pt
Loaded successfully
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
/home/researcher/anaconda3/envs/freereg/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
/home/researcher/anaconda3/envs/freereg/lib/python3.8/site-packages/transformers/modeling_utils.py:433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt") as f:
Loaded model config from [./tools/controlnet/models/control_v11f1p_sd15_depth.yaml]
Loaded state_dict from [./tools/controlnet/models/v1-5-pruned.ckpt]
Loaded state_dict from [./tools/controlnet/models/control_v11f1p_sd15_depth_ft.pth]
Global seed set to 12345
We force to use step-150 (~150 rather than 150) for our control process use 20 steps!
source-feat:['rgb_df', 'rgb_gf']
target-feat:['dpt_df', 'dpt_gf']
weight: [0.5 0.5]
we use zoe-ransac solver for source-rgb and target-dpt!
[Open3D WARNING] Read PTS: only points and colors attributes are supported.
Estimating zoe-depth for rgb on demo:
100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 66576.25it/s]
50%|██████████████████████▌ | 1/2 [00:08<00:08, 8.99s/it]Segmentation fault (core dumped)

How can I make it work? Thank you very much.

Testing on my own data

Hello authors. After running the demo on the provided data, I tried testing on my own data: I changed rgb_size to (2158, 3844) and updated the camera intrinsics accordingly, but the following error appears at runtime:
logging improved.
Overwriting config with config_version None
img_size [384, 512]
/root/miniconda3/envs/gs_model/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402412426/work/aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Params passed to Resize transform:
width: 512
height: 384
resize_target: True
keep_aspect_ratio: True
ensure_multiple_of: 32
resize_method: minimal
Using pretrained resource local::./tools/zoe/models/ZoeD_M12_N.pt
Loaded successfully
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./tools/controlnet/models/control_v11f1p_sd15_depth.yaml]
Loaded state_dict from [./tools/controlnet/models/v1-5-pruned.ckpt]
Loaded state_dict from [./tools/controlnet/models/control_v11f1p_sd15_depth_ft.pth]
Seed set to 12344
We force to use step-150 (~150 rather than 150) for our control process use 20 steps!
source-feat:['rgb_df', 'rgb_gf']
target-feat:['dpt_df', 'dpt_gf']
weight: [0.5 0.5]
we use zoe-ransac solver for source-rgb and target-dpt!
Estimating zoe-depth for rgb on demo:
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 9289.71it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:17<00:00, 68.83s/it]
0%| | 0/1 [00:00<?, ?it/s]/root/miniconda3/envs/gs_model/lib/python3.9/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
warnings.warn(
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:12<00:00, 12.33s/it]
Evaling on demo...
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/data_user/ysl/FreeReg/demo.py", line 150, in
mm_reg.run()
File "/root/data_user/ysl/FreeReg/demo.py", line 121, in run
self.eval()
File "/root/data_user/ysl/FreeReg/demo.py", line 109, in eval
self.evalor.run({'demo':self.meta})
File "/root/data_user/ysl/FreeReg/pipeline/gen_eval.py", line 67, in run
smatch_xyz, tmatch_xyz = self.eval_pair(stype, ttype, sitem, titem, pps)
File "/root/data_user/ysl/FreeReg/pipeline/gen_eval.py", line 46, in eval_pair
gts, smask = self.eval_mask(source_type,sitem)
File "/root/data_user/ysl/FreeReg/pipeline/gen_eval.py", line 34, in eval_mask
gtd = gtd[uv[:,1],uv[:,0]]
IndexError: index 986 is out of bounds for axis 0 with size 968

What is the cause of this?
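
A hedged guess at the failure: after changing rgb_size, projected keypoint coordinates uv can exceed the bounds of the ground-truth depth map gtd. A minimal bounds filter before the failing indexing line (my sketch, not the repository's code) would look like:

import numpy as np

def filter_in_bounds(uv, h, w):
    # keep only keypoints whose (u, v) pixel coordinates index gtd safely
    mask = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[mask], mask

# before: gtd = gtd[uv[:, 1], uv[:, 0]]
# after:  uv, mask = filter_in_bounds(uv, gtd.shape[0], gtd.shape[1])
#         gtd = gtd[uv[:, 1], uv[:, 0]]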

Error when loading Zoe

Hi!

Could you please help check this error?

File "/FreeReg/tools/zoe/zoedepth/models/model_io.py", line 49, in load_state_dict
model.load_state_dict(state)
File "
/anaconda3/envs/freereg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ZoeDepth:
Unexpected key(s) in state_dict: "core.core.pretrained.model.blocks.0.attn.relative_position_index", "core.core.pretrained.model.blocks.1.attn.relative_position_index", "core.core.pretrained.model.blocks.2.attn.relative_position_index", "core.core.pretrained.model.blocks.3.attn.relative_position_index", "core.core.pretrained.model.blocks.4.attn.relative_position_index", "core.core.pretrained.model.blocks.5.attn.relative_position_index", "core.core.pretrained.model.blocks.6.attn.relative_position_index", "core.core.pretrained.model.blocks.7.attn.relative_position_index", "core.core.pretrained.model.blocks.8.attn.relative_position_index", "core.core.pretrained.model.blocks.9.attn.relative_position_index", "core.core.pretrained.model.blocks.10.attn.relative_position_index", "core.core.pretrained.model.blocks.11.attn.relative_position_index", "core.core.pretrained.model.blocks.12.attn.relative_position_index", "core.core.pretrained.model.blocks.13.attn.relative_position_index", "core.core.pretrained.model.blocks.14.attn.relative_position_index", "core.core.pretrained.model.blocks.15.attn.relative_position_index", "core.core.pretrained.model.blocks.16.attn.relative_position_index", "core.core.pretrained.model.blocks.17.attn.relative_position_index", "core.core.pretrained.model.blocks.18.attn.relative_position_index", "core.core.pretrained.model.blocks.19.attn.relative_position_index", "core.core.pretrained.model.blocks.20.attn.relative_position_index", "core.core.pretrained.model.blocks.21.attn.relative_position_index", "core.core.pretrained.model.blocks.22.attn.relative_position_index", "core.core.pretrained.model.blocks.23.attn.relative_position_index".
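
A hedged workaround sketch, inferred from the error rather than an official fix: the unexpected keys are non-learnable relative_position_index buffers (often a timm version mismatch), so filtering them out and loading non-strictly should succeed; the helper name is hypothetical.

import torch

def load_zoe_state_dict(model, ckpt_path):  # hypothetical helper, not the repo's API
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model", state)  # ZoeDepth checkpoints may nest the weights
    state = {k: v for k, v in state.items()
             if "relative_position_index" not in k}  # drop the offending buffers
    model.load_state_dict(state, strict=False)
    return model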

Great Work!

Thank you for sharing this work. I have reproduced demo.py for now, though it uses quite a lot of GPU memory. I will try using it to register my own dataset.

On the Calculation of the Final Pose of Registered Images

Hello!

As I understand it, the Tpre matrix generated from the matching results is the transformation that converts the extrinsics of the input point cloud (pcd) projection into the camera pose at which the image is registered within the point cloud.

Could you provide an example tool script that lets the user directly obtain the final pose of the registered image? If my understanding is incorrect, could you please explain how to accurately determine the final pose of the registered image?

Thank you!
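
A minimal sketch under the questioner's own assumption, i.e. that Tpre maps point-cloud coordinates into the registered camera frame (X_cam = Tpre @ X_pcd); if so, the camera pose expressed in the point-cloud frame is simply the inverse transform. The helper name is hypothetical.

import numpy as np

def camera_pose_from_tpre(Tpre):
    pose = np.linalg.inv(Tpre)        # 4x4 camera-to-point-cloud transform
    R, t = pose[:3, :3], pose[:3, 3]  # orientation and camera center in pcd frame
    return R, t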
