lioryariv / idr Goto Github PK
View Code? Open in Web Editor NEWLicense: MIT License
License: MIT License
In the function load_K_Rt_from_P()
, the rotation matrix was transposed after got from the cv2.decomposeProjectionmatrix()
. As I know, the result from the cv2.decomposeProjectionmatrix()
is the really rotation matrix and t and not need to be changed, why the rotation matrix iwas transposed but others was not changed.
Hi, thanks for the great work!
I wonder if this differentiable renderer is suitable to supervise the SDF by normal maps.
Since computing normals would implement another gradient computation, is there any trouble with this?
Thanks!
How do you get the input cameras.npz (world_mat and scale_mat), not the output cameras_new.npz?
Hi, just going through the rgb loss calculation part. For my understanding, it seems the highlighted part should be: l_1 loss divided by the float((object_mask).sum()), which means averge the valid number of pixels instead of also considering the pixels outside of the object mask. Hope someone can correct me if there is anything wrong, thanks.
I encountered a "ResolvePackageNotFound" error when attempting to create a conda virtual environment using environment.yml file.
ResolvePackageNotFound:
- markupsafe==1.1.1=py37h7b6447c_0
- libarchive==3.3.3=h5d8350f_5
- tbb==2020.2=hfd86e86_0
- libedit==3.1.20181209=hc058e9b_0
- zeromq==4.3.1=he6710b0_3
- astropy==3.1.2=py37h7b6447c_0
- wrapt==1.11.1=py37h7b6447c_0
- glib==2.56.2=hd408876_0
- jbig==2.1=hdba287a_0
- unixodbc==2.3.7=h14c3975_0
- mkl_random==1.0.2=py37hd81dba3_0
- secretstorage==3.1.1=py37_0
- pillow==5.4.1=py37h34e0f95_0
- hdf5==1.10.4=hb1b8bf9_0
- pyrsistent==0.14.11=py37h7b6447c_0
- readline==7.0=h7b6447c_5
- krb5==1.16.1=h173b8e3_7
- python==3.7.3=h0371630_0
- pandas==0.24.2=py37he6710b0_0
- graphite2==1.3.13=h23475e2_0
- cffi==1.12.2=py37h2e261b9_1
- intel-openmp==2019.3=199
- cairo==1.14.12=h8948797_3
- pyzmq==18.0.0=py37he6710b0_0
- get_terminal_size==1.0.0=haa9412d_0
- libsodium==1.0.16=h1bed415_0
- lxml==4.3.2=py37hefd8a0e_0
- pixman==0.38.0=h7b6447c_0
- pycurl==7.43.0.2=py37h1ba5d50_0
- sqlite==3.27.2=h7b6447c_0
- libxml2==2.9.9=he19cac6_0
- numpy==1.16.2=py37h7e9f1db_0
- pyqt==5.9.2=py37h05f1152_2
- llvmlite==0.28.0=py37hd408876_0
- kiwisolver==1.0.1=py37hf484d3e_0
- mpc==1.1.0=h10f8cd9_1
- pywavelets==1.0.2=py37hdd07704_0
- mkl-service==1.1.2=py37he904b0f_5
- h5py==2.9.0=py37h7918eee_0
- lzo==2.10=h49e0be7_2
- ptyprocess==0.6.0=py37_0
- curl==7.64.0=hbc83047_2
- bottleneck==1.2.1=py37h035aef0_1
- lz4-c==1.8.1.2=h14c3975_0
- libtiff==4.0.10=h2733197_2
- libtool==2.4.6=h7b6447c_5
- statsmodels==0.9.0=py37h035aef0_0
- libstdcxx-ng==8.2.0=hdf63c60_1
- numexpr==2.6.9=py37h9e4a6bb_0
- liblief==0.9.0=h7725739_2
- fontconfig==2.13.0=h9420a91_0
- libpng==1.6.36=hbc83047_0
- libxcb==1.13=h1bed415_1
- pycosat==0.6.3=py37h14c3975_0
- scipy==1.2.1=py37h7c811a0_0
- wurlitzer==1.0.2=py37_0
- py-lief==0.9.0=py37h7725739_2
- gevent==1.4.0=py37h7b6447c_0
- mkl_fft==1.0.10=py37ha843d7b_0
- yaml==0.1.7=had09818_2
- xz==5.2.4=h14c3975_4
- bitarray==0.8.3=py37h14c3975_0
- mpfr==4.0.1=hdf1c602_3
- mistune==0.8.4=py37h7b6447c_0
- libgcc-ng==8.2.0=hdf63c60_1
- gmp==6.1.2=h6c8ec71_1
- numba==0.43.1=py37h962f231_0
- freetype==2.9.1=h8a8886c_1
- tk==8.6.8=hbc83047_0
- openssl==1.1.1b=h7b6447c_1
- lazy-object-proxy==1.3.1=py37h14c3975_2
- greenlet==0.4.15=py37h7b6447c_0
- blosc==1.15.0=hd408876_0
- libcurl==7.64.0=h20c2e04_2
- libssh2==1.8.0=h1ba5d50_4
- ruamel_yaml==0.15.46=py37h14c3975_0
- ncurses==6.1=he6710b0_1
- pytorch==1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0
- msgpack-python==0.6.1=py37hfd86e86_1
- numpy-base==1.16.2=py37hde5b4d6_0
- patchelf==0.9=he6710b0_3
- jpeg==9b=h024ee3a_2
- bzip2==1.0.6=h14c3975_5
- cytoolz==0.9.0.1=py37h14c3975_1
- pango==1.42.4=h049681c_0
- zstd==1.3.7=h0b5b093_0
- pcre==8.43=he6710b0_0
- scikit-image==0.14.2=py37he6710b0_0
- harfbuzz==1.8.8=hffaf4a1_0
- scikit-learn==0.20.3=py37hd81dba3_0
- fribidi==1.0.5=h7b6447c_0
- libffi==3.2.1=hd88cf55_4
- mkl==2019.3=199
- pytables==3.5.1=py37h71ec239_0
- expat==2.2.6=he6710b0_0
- tornado==6.0.2=py37h7b6447c_0
- zlib==1.2.11=h7b6447c_3
- gmpy2==2.0.8=py37h10f8cd9_2
- torchvision==0.4.0=cuda100py37hecfc37a_0
- ninja==1.10.1=py37hfd86e86_0
- libxslt==1.1.33=h7d1a2b0_0
- psutil==5.6.1=py37h7b6447c_0
- icu==58.2=h9c2bf20_1
- sqlalchemy==1.3.1=py37h7b6447c_0
- dbus==1.13.6=h746ee38_0
- gst-plugins-base==1.14.0=hbbd80ab_1
- pyodbc==4.0.26=py37he6710b0_0
- gstreamer==1.14.0=hb453b48_1
- libgfortran-ng==7.3.0=hdf63c60_0
- pycrypto==2.6.1=py37h14c3975_9
- pyyaml==5.1=py37h7b6447c_0
- matplotlib==3.0.3=py37h5429711_0
- qt==5.9.7=h5867ecd_1
- cryptography==2.6.1=py37h1ba5d50_0
- fastcache==1.0.2=py37h14c3975_2
- sip==4.19.8=py37hf484d3e_0
- cython==0.29.6=py37he6710b0_0
- snappy==1.1.7=hbae5bb6_3
- libuuid==1.0.3=h1bed415_2
Upon further investigation, I suspect the underlying issue may be due to OS-specific dependencies causing incompatibilities. Although I modified the environment.yml based on recommendations from this link, the problem persists.
...>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abor\
failed
Is it possible to run idr on Windows, or should I transition to Linux?
Any suggestions welcome!
I have run the code and it nearly took 5:30 hours and after trying with SCAN_ID=24, I have seen there has been this pop up saying that face normal shape incorrect in 2000 epoch,
can you give a solution how to resolve it?
RayTracing: object = 1359/2048, secant on 634/653.
what I have done is python evaluation/eval.py --conf ./confs/dtu_fixed_cameras.conf --scan_id 24 --checkpoint 2000
resulting face normals incorrect shape for 2 steps after that it notifies face normal is zero for rest of the following and stop executing.
Hoping to know what can be done further.
Hello, I am a big fan of this project
I wonder how you get the scale matrix with DTU dataset.
If you don't mind, could you share the code which you use to get scale matrix?
I think I solved it
Thank you
In the paper, the surface light field assumes, to my understanding, that the light source is fixed in the scene, and the DTU MVS dataset uses fixed lights, so, would moving light sources (like a light attached to the camera) worsen the performance of the method?
On a similar topic, what effect would including images with the same camera poses but different lighting conditions (position or intensity) have on the quality of the 3d reconstructions?
Hi, thanks for your awesome work.
I intended to use the preprocess_cameras.py
to preprocess the custom data, which can be run on threestudio. However, it always raise the bellow error.
Can you provide some insights of where to modify?
My input camera is proj_mtx_new
, mask is extracted from the alpha of a png.
Thanks in advance!
c2w = transform[:3, :4] # transform is a synthesized parameters
mv = torch.linalg.inv(transform)
fovy = np.random.uniform(self.fovy_range_min, self.fovy_range_max)
self.resolution = [512, 512]
K = np.array([
[fovy, 0, self.resolution[0] / 2, 0],
[0, fovy, self.resolution[1] / 2, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]
])
K = torch.from_numpy(K).float()
proj_mtx_new = K @ mv
Number of points:0
/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py:220: RuntimeWarning: Mean of empty slice.
centroid = np.array(all_Xs).mean(axis=0)
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:269: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean,
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:261: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 312, in <module>
get_normalization(opt.source_dir, opt.use_linear_init, dataset)
File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 266, in get_normalization
normalization, all_Xs = get_normalization_function(Ps, mask_points_all, number_of_normalization_points, number_of_cameras,masks_all)
File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 225, in get_normalization_function
centroid,scale,all_Xs = refine_visual_hull(masks_all, Ps, scale, centroid)
File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 151, in refine_visual_hull
points = points + center[:, np.newaxis]
IndexError: invalid index to scalar variable.
How can I create a .ply file with colored mesh?
Hi! Greak work!
I want to know how to save the mesh as an image like the photos in your paper.
Thanks!
I noticed that the cameras.npz
file provided for the DTU dataset has some additional matrices called "camera_mat_{i}". Are these the 3x3 calibration matrices K from P = K[R | t]?
I tried to extract K
, R
and t
from P (which are "world_mat_{i}") using opencv:
K, R, t, _, _, _ = cv2.decomposeProjectionMatrix(P2)
The K
that I get from here does not match with the corresponding "camera_mat_{i}". Also the t
values that I get are confusing to me. t
has the shape (4, 1) and usually the 4th entry is always 1, but that is not happening here.
UPDATE:
The 't' values were actually okay. I forgot to divide it by the 4th entry
Hi,
I have a quick question regarding the design choice made in this work. In the rendering network, positional encoding is applied to view directions but not positions. This is quite different from NeRF, where they positional-encode the 3d positions. I just wonder if there's any special consideration behind this choice? Thank you.
Hi I noticed that the default batch size is 1 and I was thinking that increasing the batch size would speed up the training. But After I set the batch size to 25. The result looks blury comparing to the result trained with batch size one. I am wandering what might go wrong in my setting. Have you met such an issue before?
Hi There,
Thanks for sharing this great work.
I have difficulties understanding Equation 3 in the paper relating the
According to the code below in implicit_differentiable_renderer.py, I don't think it is implemented as described in the paper.
points = (cam_loc.unsqueeze(1) + dists.reshape(batch_size, num_pixels, 1) * ray_dirs).reshape(-1, 3)
sdf_output = self.implicit_network(points)[:, 0:1]
ray_dirs = ray_dirs.reshape(-1, 3)
if self.training:
.....
surface_points = points[surface_mask]
surface_output = sdf_output[surface_mask]
.....
output = self.implicit_network(surface_points)
surface_sdf_values = output[:N, 0:1].detach()
From the code above, sdf_output
and surface_sdf_values
are the output of implicit_network
with the same input. Hence, they should always be the same. Actually, When I print surface_output - surface_sdf_values
, the value is indeed close to zero (within the float value precision limit).
So, does it make sense to update
surface_dists_theta = surface_dists - (surface_output - surface_sdf_values) / surface_points_dot
I literally spent days trying to understand this code but failed. Could you elaborate more on this?
Thank you.
Hi, I am trying to train network with synthetic dataset from blender and trying to figure out the exact meaning 'scale_mats, world_mats' in scene_dataset.py Line41. And I did download the original DTU dataset, it seems only provide the intrinsic and extrinsic camera parameters. I want to know how did you preprocess the data, is it possible to upload the 'data processing code' used in the paper?
I see the Pytorch in the environment.yml file uses pytorch=1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0.
Can the code run with Pytorch CPU version as well?
conda install -c pytorch pytorch-cpu
Thanks.
thanks for sharing this great work! I have downloaded the ground truth 3d point clouds from DTU dataset. While the point clouds contain the table and some point clouds are not complete. How did you get the chamfer distance from your generated meshes and the ground truth point coluds?
Hi @lioryariv, this is great work!
I noticed that the published code doesn't contain scripts for quantitatively evaluating the reconstructed surface against the DTU ground truth. Your packaged DTU dataset doesn't contain ground-truth shapes either.
Could you please share your scripts for quantitative evaluation of DTU scenes? If you can also share the ground truth shapes (packaged with your DTU data) to use with your scripts, that would be amazing!
Thanks and Regards,
Shubham
Hi,
I am trying to get a sense of how this method performs compared with MVSNet and its variants. But seems you are using different measures on DTU dataset. So I wonder do you have some comparisons that you can share? Thank you
I did try to run the last instruction , where for GEOMETRY_ID , I have given the trained surface model with surface_world_coordinates.ply and same for appearance_id too, even I have changed those generated evals to code and then tried to run the code by specifying the location_path saying evals/dtu_trained_cameras/surface_world_coordinates.ply
But nothing worked for me, can you be bit more clear about the instructions for the execution.
Thank you
Hi! Thank you for the great work, I'm trying to get system to work on my own images, but am facing some difficulties. I obtain the position information using an aruco board on which I place my object. Now, the algorithm moves past the ray-tracing step properly, but then the algorithm outputs an rgb-loss of "0.0". Did anyone encounter something similar? I am very grateful for any type of response :)
Hi folks,
Thanks for this awesome work. I'm wondering how to render the "spiral" videos demonstrated in the project page?
Any reply would be appreciated!
Best,
Peihao
In the code:
I want to know if it is well-designed by you, and why?
Hi there,
First of all congrats on the project and on the code. It's really neat 🔥 .
I'm trying to run the code with my own scene and, even though the losses are printed with kind of meaningful values, they do not decrease and the learnt scene doesn't resemble at all at what it should. I was wondering if the issue could be related to the coordinate system used.
My coordinate system is this way:
Could it be you are assuming an OpenGL-like coordinate system and that it could be the reason it doesn't work?
Any thoughts will be very welcome.
Thanks in advance!
The given readme shows how to run the code on the DTU dataset but if we want to run it on some other set of images, how do we do it?
What all data is needed other than the images to make the code run?
I don't need help anymore, thanks.
Hi,
Thank you for sharing the code. I trained the model. When I ran eval.py to extract the mesh using the latest checkpoint I am getting Cuda OOM in the File "../code/utils/plots.py", line 197, in get_surface_high_res_mesh which is the line
grid_points = torch.cat(g, dim=0)
I tried to change the following code snippet in plots.py, but it did not help. My GPU has 11GB memory. I also tried using 2 gpus, but I think the eval code uses only a single gpu. I also tried torch.cuda.empty_cache() to clear the cuda cache but I still get the OOM error. Could you please provide some guidance on how to fix the OOM problem? Thanks.
g = []
for i, pnts in enumerate(torch.split(grid_points, 100000, dim=0)):
g.append(torch.bmm(vecs.unsqueeze(0).repeat(pnts.shape[0], 1, 1).transpose(1, 2),
pnts.unsqueeze(-1)).squeeze() + s_mean)
grid_points = torch.cat(g, dim=0)
Great work! Thanks for sharing the code. Do you have a rough estimate of how long it takes your method to train a single DTU scene? Thank you.
In the data_convention.md file, the preprocess script says that it generates the needed cameras.npz file.
The code loads an initial cameras.npz file, but how should we build this cameras.npz file to run the preprocess code on new images ?
Closing this issue as I've figured it out. Thanks!
I've got the sense that world_mat_
is the projection matrix converting 3d world coordinates to 2d pixel image coordinates; 'scale_mat_' is the normalization transform to move and scale the 3d object in world space to let it be centered at the origin and inside a unit sphere.
My question is why do we need to do that in such a sophisticated way? I checked the normalization matrix on DTU and data_BlendedMVS, It seems that for each object the normalization matrices are the same for all views. Then what kind of scenarios are this preprocess for? Thank you :-)
In order to run IDR on new data [DIR PATH], you need to supply image and mask directories, as well as cameras.npz file containing the appropriate camera projection matrices.
Dear authors,
Do we need to convert camera projection matrices in DTU dataset from txt to npy files?
Thanks
hello, is there some way to construct the color mesh?
Hello, I'm trying to annotate the binary masks for other DTU objects the same way as your IDR work. So may you provide your annotation methods?
Thanks for the great work. I was trying to run it on some images captured through my camera. I used the colmap to get the world matrices. But it is giving the following error while trying to run preprocess_camera.py
preprocess_cameras.py:87: DeprecationWarning:
np.floatis a deprecated alias for the builtin
float. To silence this warning, use
floatby itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use
np.float64here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations mask_points_all.append(np.stack((xs,ys,np.ones_like(xs))).astype(np.float)) preprocess_cameras.py:45: DeprecationWarning:
np.floatis a deprecated alias for the builtin
float. To silence this warning, use
floatby itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use
np.float64 here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations cur_l_1=Fj0 @ np.array([curx,cury,1.0]).astype(np.float) Number of points:0 preprocess_cameras.py:169: RuntimeWarning: Mean of empty slice. centroid = np.array(all_Xs).mean(axis=0) /home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) /home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:263: RuntimeWarning: Degrees of freedom <= 0 for slice keepdims=keepdims, where=where) /home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in true_divide subok=False) /home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) Traceback (most recent call last): File "preprocess_cameras.py", line 244, in <module> get_normalization(opt.source_dir, opt.use_linear_init) File "preprocess_cameras.py", line 207, in get_normalization normalization,all_Xs=get_normalization_function(Ps, mask_points_all, number_of_normalization_points, number_of_cameras,masks_all) File "preprocess_cameras.py", line 175, in get_normalization_function centroid,scale,all_Xs = refine_visual_hull(masks_all, Ps, scale, centroid) File "preprocess_cameras.py", line 101, in refine_visual_hull points = points + center[:, np.newaxis] IndexError: invalid index to scalar variable.
idr/code/model/sample_network.py
Line 15 in 44959e7
Thanks for your great work! I want to convert the nerf synthetic dataset into the form like yours, can you tell me how to do it? Thanks!
I would like to ask you, how to modify the geometric initialisation part for the case, where the output from the positional encoding does not contain the raw XYZ and consist only of sin and cos of the XYZ at various frequencies? Thank you.
Hi there, really nice work!
And I'm trying to understand this line code, it looks different from Eq.3 of the main paper, the "surface_output, surface_sdf_values" seems are the same value and it would be really helpful if you could explain it more. Thank you in advance!
Compared the implementation of the description of equation 3 to the repo code, I am confused about the code surface_dists_theta = surface_dists - (surface_output - surface_sdf_values) / surface_points_dot
, here surface_output
is f (c + t 0 v; θ)
and is equal to surface_sdf_values
which has a detach operation in surface_sdf_values = output[:N, 0:1].detach()
, it seems that f (c + t 0 v; θ) - f (c + t 0 v; θ).detach()
, should the code consistent with the formula be surface_dists_theta = surface_dists - surface_output / surface_points_dot
?
Hi, thanks for your great work.
I am reading your codes and one thing confuses me.
For the spherical tracing part, accoring to the paper, it should find the first intersection point (which is also nearer) first and if it is not convergent, the algorithm will begin at the another intersection point, which is the farthest point.
However, when I read the codes, the start_point seems to be farther from the camera center than the end_points because the start_points choose the index 0 of "sphere_intersections_points" and the end_points choose index 1.
Because according to "get_sphere_intersection" function,
the index 0 stores (-under_sqrt - ray_cam_dot) and index 1 stores (under_sqrt - ray_cam_dot), which means the tensors on index 0 have larger absolute value.
So it seems that start_points will be farther than the end_points, which is comfusing.
Did I get anything wrong?
Hey there,
This is not an issue, just a question. Using softplus
activation is significantly slower than using for instance ReLU
. Is there any technical (theoretical or practical) reason why you chose softplus
in front of other more efficient activations?
Thank you!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.