cvlab-columbia / zero123
Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
Home Page: https://zero123.cs.columbia.edu/
License: MIT License
Hi, thanks for sharing this fantastic work.
I am trying to understand the cross-attention used in the model. Here, the conditional context has only one token, i.e., the CLIP embedding concatenated with the pose. As a result, the cross-attention matrix has size [num_spatial_token, 1] and all attention weights are one. The output just copies the value vector to each spatial location (or adds it, if we consider the residual connection). It seems that K and Q are redundant in this case. Is this the expected behavior?
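To illustrate what I mean, here is a minimal sketch (the shapes are illustrative, not taken from the code): with a single context token, the softmax runs over one element, so every attention weight is exactly 1 regardless of Q and K.

```python
import torch

# One conditioning token (CLIP embedding + pose) attended to by all spatial tokens.
num_spatial_tokens, d = 4096, 64
q = torch.randn(1, num_spatial_tokens, d)  # queries from spatial features
k = torch.randn(1, 1, d)                   # single conditioning token
v = torch.randn(1, 1, d)

attn = torch.softmax(q @ k.transpose(-1, -2) / d**0.5, dim=-1)  # [1, 4096, 1]
print(attn.unique())  # tensor([1.]) -- softmax over a single key is always 1
out = attn @ v        # every spatial location receives the same value vector
```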
As you mentioned, 'If you are trying out images of humans, especially faces, note that it is unfortunately not the intended use case. We would encourage trying out images of everyday objects instead, or even artworks.' I am wondering whether it is inherently hard to generate 3D models of human faces, or whether the model is still in training and will be released later?
Thanks for the awesome project!
I wonder whether the training code for this study has also been released.
Again, thanks for the awesome and very cool project.
Hi!
I have seen the updated README about training. Will you also release the command for testing?
Also, may I know how to use ObjaverseDataModuleFromConfig myself for more flexible control, instead of going through trainer.fit()?
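For reference, this is roughly what I am trying to do: a minimal sketch, assuming the data module can be instantiated straight from the data: section of the training config via the repo's instantiate_from_config helper.

```python
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Build the data module directly from the config instead of trainer.fit().
config = OmegaConf.load("configs/sd-objaverse-finetune-c_concat-256.yaml")
data = instantiate_from_config(config.data)
data.prepare_data()
data.setup()

# Iterate the training dataloader manually.
loader = data.train_dataloader()
batch = next(iter(loader))
print({k: getattr(v, "shape", type(v)) for k, v in batch.items()})
```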
Nice work!
The script cannot find configs/sd-objaverse-finetune-c_concat-256.yaml or the corresponding last.ckpt model.
Will they be shared later?
Hi, first of all, great respect for this work!
I am just wondering: how many Objaverse objects are used in this project? I see the Objaverse dataset is quite large lol.
Perhaps you could provide a tutorial on processing the Objaverse dataset (rendering, etc.)?
Best,
Hi @ruoshiliu, thank you for making your incredible work public!
I'm looking into the code for 3D reconstruction based on SJC, and I'm wondering if there is code for extracting the mesh from the generated 3D representation, like what is shown on the project page. Or, if there are any libraries or code repositories you used, that information would also be of great help. Thank you!
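In case it helps frame the question, this is the kind of thing I have in mind (not your code, just a common sketch that runs marching cubes over a queried density grid; the threshold and the grid source are my assumptions):

```python
import numpy as np
from skimage import measure
import trimesh

def extract_mesh(density: np.ndarray, threshold: float = 10.0, out_path: str = "mesh.obj"):
    """density: [N, N, N] grid of volume densities queried from the 3D representation."""
    verts, faces, normals, _ = measure.marching_cubes(density, level=threshold)
    mesh = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
    mesh.export(out_path)
    return mesh
```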
I have checked Appendix Section A of the paper regarding the camera coordinate system. In gradio_new.py, is the rotation matrix (camera_R) in w2c format?
In gradio_new.py, you use camera_R to obtain the camera extrinsics, but the T matrix of the camera is not given. So I wanted to use pytorch3d's look_at_view_transform to compute the extrinsic matrix [R|t] via: R, T = look_at_view_transform(dist=radius, elev=90 - polar_deg, azim=azimuth_deg, up=((0, 1, 0),)). But I got a different result. In the image below, R_jisuan was computed with pytorch3d, and camera_R_ori is the result from your code. How can I compute the same R with pytorch3d, or alternatively, how can I compute the T matrix of the camera with your code?
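For completeness, here is the relationship I am relying on (standard math; the spherical convention below is my assumption, not something I verified against gradio_new.py):

```python
import numpy as np

# If camera_R is a world-to-camera rotation and the camera sits on a sphere,
# the missing translation is T = -R @ C, where C is the camera center in
# world coordinates. The colatitude convention here is an assumption.
def camera_center(radius, polar_deg, azimuth_deg):
    polar, azim = np.deg2rad(polar_deg), np.deg2rad(azimuth_deg)
    return radius * np.array([
        np.sin(polar) * np.cos(azim),
        np.sin(polar) * np.sin(azim),
        np.cos(polar),
    ])

C = camera_center(radius=1.5, polar_deg=30.0, azimuth_deg=45.0)
# T = -camera_R @ C  # completes [R|T] if camera_R is indeed w2c
```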
Looking forward to your reply!
Thanks for this great work 🚀!
However, the current weight files saved on Google Drive are very large and slow to download. Is there any plan to upload the weight files to Hugging Face or to split them by submodule?
On your webpage you show comparisons against Point-E for single-view 3D reconstruction. I was wondering which version of Point-E this is: did you train your own or use the open-source release? And in either case, what were the specifics?
Thank you
Awesome work! Due to the high VRAM requirement, I'm unable to try it myself. Is it possible to export the model and use it in other pipelines, like games?
Thanks for the interesting work and for releasing the code. Could you please provide more details on the evaluation steps in your experiments? Additionally, I noticed that you evaluated the method on the GSO dataset and the RTMV dataset, but it seems that the GSO dataset is a subset of the RTMV dataset. Could you please clarify the evaluation process and provide more information on how you evaluated on these two datasets?
Hi,
Thanks for the amazing work. I would like to use your released data for some experiments. I projected the vertices of the 3D object bounding box onto the 2D image, but I got results like this:
I set the intrinsic matrix as K = np.asarray([512, 0, 256, 0, 512, 256, 0, 0, 1]).reshape(3, 3), which could be wrong. Is there any normalization step in your implementation? Could you please provide the correct intrinsic matrix?
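For reference, this is the projection I used (a generic pinhole sketch, not your code, assuming a world-to-camera [R|t]):

```python
import numpy as np

def project(points_w, K, R, t):
    """Project Nx3 world points with intrinsics K and w2c extrinsics [R|t]."""
    pts_c = points_w @ R.T + t      # world -> camera
    uv = pts_c @ K.T                # camera -> image plane
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

K = np.asarray([512, 0, 256, 0, 512, 256, 0, 0, 1], dtype=float).reshape(3, 3)
```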
Looking forward to your reply!
Hi, I appreciate your excellent work. I tried to download your data and run your training script. However, the released data has no file 'valid_paths.json' (I managed to find it in another issue); is it just all the subfolder names in the views_release folder? Also, in your dataset code:
```python
if self.paths[index][-2:] == '_1':  # dirty fix for rendering dataset twice
    total_view = 8
else:
    total_view = 4
```
the total number of views per subfolder is taken to be 8 or 4. However, in my downloaded data most scenes have more than 10 views; do you only randomly sample from 4 or 8 views during training?
Thanks a lot for releasing the training scripts and data. When I run the training scripts, I get some errors.
We originally extracted 100 objects from the dataset and ran on 2 GPUs with 32GB of VRAM each, setting batch_size=16 and num_workers=8 in configs/sd-objaverse-finetune-c_concat-256.yaml, and it ran successfully.
However, when we then use a larger dataset of 2000 objects while keeping batch_size=16 and num_workers=8, we get the error RuntimeError: DataLoader worker (pid(s) 8298) exited unexpectedly. I modified configs/sd-objaverse-finetune-c_concat-256.yaml: we tried setting batch_size or num_workers smaller, but we still get the same error. We also tried num_workers=0, and it did not work. Only with batch_size=1 does the error not occur. So, I want to know where the error may come from, and whether there is any solution to this problem.
I had some trouble debugging it, but it seems like an out-of-memory error, as nvidia-smi shows 24.2 GB filled, leading to a failure:
cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
or, with cuDNN disabled:
cuBLAS error: CUBLAS_STATUS_NOT_INITIALIZED
in the F.conv2d call.
If you got it running on an RTX 3090, could you share the configuration changes?
I used a 128x128 RGB image as the smallest test.
Hello and thank you for your very nice paper!
I am trying to train a view-conditional network using the code in zero123, but something is going wrong. I am wondering if my command is wrong, or if there is something else that I am missing.
I am using the command:
python main.py --base configs/sd-objaverse-finetune-c_concat-256.yaml --train --gpus=0,1,2,3 precision=16
I have trained for 10,000 steps and it is evident from the generations that something is going wrong. Do you know why this might be / should I be using a different command?
For context, the logged images are:
inputs_gs-000000_e-000000_b-000000
conditioning_gs-000000_e-000000_b-000000
reconstruction_gs-000000_e-000000_b-000000
samples_gs-000000_e-000000_b-000000
samples_cfg_scale_3.00_gs-000000_e-000000_b-000000
Thank you so much for your help!
Currently the README provides an example of how one might do 3D reconstruction with an existing image and transforms_train.json pair, but it provides no details on how we'd start doing reconstructions with our own images.
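For anyone in the same situation, here is my guess at the minimal transforms_train.json you would write for your own image, based on the standard NeRF-synthetic layout that 3drec's load_blender appears to expect (all values below are placeholders, not verified against this repo):

```python
import json
import numpy as np

transforms = {
    "camera_angle_x": 0.857,  # horizontal FoV in radians (placeholder)
    "frames": [{
        "file_path": "./train/my_image",         # image path, extension usually omitted
        "transform_matrix": np.eye(4).tolist(),  # 4x4 camera-to-world pose (placeholder)
    }],
}
with open("transforms_train.json", "w") as f:
    json.dump(transforms, f, indent=2)
```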
After cd zero123/zero123 and then python gradio_new.py, the program gets Killed.
The code is located in zero123/zero123/ldm/models/diffusion/ddpm.py:

```python
def instantiate_cond_stage(self, config):
    if not self.cond_stage_trainable:
        if config == "__is_first_stage__":
            print("Using first stage also as cond stage.")
            self.cond_stage_model = self.first_stage_model
        elif config == "__is_unconditional__":
            print(f"Training {self.__class__.__name__} as an unconditional model.")
            self.cond_stage_model = None
            # self.be_unconditional = True
        else:
            model = instantiate_from_config(config)
```

instantiate_from_config raises an error; the config is cond_stage_config: target: ldm.modules.encoders.modules.FrozenCLIPImageEmbedder.
Has anyone else encountered similar situations?
Hi, thanks for the great work!
I saw that in the paper one baseline called SJC-I was mentioned: "Finally, we adapted SJC [53], a diffusion-based text-to-3D model where the original text-conditioned diffusion model is replaced with an image-conditioned diffusion model, which we termed SJC-I". I'm just wondering what the "image-conditioned diffusion model" refers to. (I guess it's not your finetuned view-conditioned diffusion model, right?)
Thanks!
When I try to run gradio_new.py, I get a CUDA OOM error, despite having >23GB of available memory. Is there something in the config I can tweak to overcome this?
Hi, thanks for providing the 1.5T of rendered views!
As I'm using this dataset for my research, I noticed that only the transformation matrices are provided, while other camera parameters are missing, such as fov, camera_angle_x, rotation, etc. (those from BlenderDataset). Since we are using objects from Objaverse, where the ground truth is available, it would be better to skip the calibration step and use the information from the 3D assets. Therefore, I'm wondering if you could provide more details about how each object is rendered.
Thanks!
Hello! Thanks for the awesome project and for releasing the models and data!
By the way, I found that downloading the rendered images is very slow (about 30KB/s).
Is there any other way to download the dataset?
Thanks!
Hi,
I was wondering what GPU the authors use for 3drec/run_zero123.py, and whether anyone has been successful running it with <=32GB of GPU RAM? The novel view synthesis works fine, but I'm getting a CUDA out-of-memory error running the 3D reconstruction script. Thanks, and great work!
Hi ruoshi,
Thank you for your awesome work. I have a question about the training script. In main.py, when loading the pretrained SD model sd-image-conditioned-v2.ckpt, the parameters of FrozenCLIPImageEmbedder will not be loaded due to unmatched key names. So does your fine-tuned model load the CLIP embedding parameters?
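To make the question concrete, this is how I checked (generic PyTorch, nothing repo-specific):

```python
import torch

# `model` is assumed to be the LatentDiffusion instantiated from the config.
sd = torch.load("sd-image-conditioned-v2.ckpt", map_location="cpu")["state_dict"]
missing, unexpected = model.load_state_dict(sd, strict=False)
print(len(missing), "missing keys;", len(unexpected), "unexpected keys")
print([k for k in missing if "cond_stage_model" in k][:5])
```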
Hi, authors of zero123,
I wonder if it is necessary to filter out some objects from Objaverse when creating the dataset, since some instances are quite odd (e.g., a single sheet of paper). Have you filtered the dataset you provide in this repo? Could you give me some suggestions?
Hi @ruoshiliu,
Thanks for your excellent contribution. I tried to reproduce the results but cannot be sure whether mine are correct. Could you please provide the training log so I can take a closer look? TensorBoard logs would also be helpful if you could provide them.
Thanks for this fantastic work!
I appreciate the effort it took to fit the model in 22GB. Would it be possible to squeeze it further down to 16GB? I'd love to be able to run it on my card (RTX 4080).
Hi!
First of all, thank you so much for releasing such a wonderful work.
I've checked this issue but still have a question.
In the paper, it is said that 3D reconstruction was performed with SJC, not NeRF.
Even in the 3D reconstruction section of the README, run_zero123.py, which uses SJC, is shown as an example.
However, there is no part of the SJC code that extracts the 3D model.
Can you tell me the reason?
Thanks.
Hi @ruoshiliu,
Thanks for your contribution. I'm trying to quickly test an idea, but I need to re-render the dataset. I noticed that your previous answer mentioned using 10 machines to render; could you provide code and a tutorial for multi-machine rendering?
I'm wondering if the RT matrix is the extrinsic matrix of the camera? And I just want to make sure that in the dataset this matrix is stored in transforms.json, right? I'm confused because the RT matrix generated during rendering seems to differ from GET3D's, so I want to make sure that I'm doing the rendering process correctly and getting the camera's extrinsic matrix.
Hi, authors, thanks for your work. I'm training your repo with 2 A100s and batch_size = 192; do I have to set accumulate_grad_batches to 4, since you trained on 8 A100s?
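(My own reasoning, assuming batch_size in the config is per GPU: the effective batch size is num_gpus × batch_size × accumulate_grad_batches, so 2 × 192 × 4 = 1536 would match 8 × 192 × 1 on 8 A100s.)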
Hi @ruoshiliu,
Have you tried to train the model with half-precision?
As mentioned in #20, I couldn't get it to train with precision=16.
Thanks for your excellent work!
Hi @ruoshiliu,
I tried to reproduce the results with 32 V100 GPUs (batch size of 12 per node and accumulate_grad_batches of 4); could you please help check these losses and reconstruction results?
In addition, I really hope that you can provide the training logs for future research.
Hi! @ruoshiliu
May I ask a question about the LR scheduler? Currently it seems you are using a constant lr=1 after warm-up. Is this the optimal schedule you've found? I am asking because I am wondering what the appropriate learning rate would be if we want to fine-tune the model without it deviating.
Thanks so much for your help!
Dear friends! I really appreciate your work and your care in choosing testing images outside the Objaverse distribution! May I ask how you chose your testing objects? (To be more concrete, do you have any advice on choosing a 3D object test dataset so that we can evaluate against ground truth?)
Hi! Happy to read about your excellent work!
May I know where the file valid_paths.json used for training on Objaverse is? I can only find object-paths.json in the downloaded files.
Thanks.
Hi, thank you for the amazing work!
I was able to download the renderings provided in the repository. However, I was not able to find the camera intrinsics (e.g., focal length or camera angle) or the near and far depths for each scene. I wanted to check whether I was missing something.
Also, I wanted to confirm: the downloadable renderings do not contain depth maps for each view, right?
Thanks
Hi, @ruoshiliu
I tried to download the rendered results of the Objaverse dataset. However, the following errors occurred:
Hi @ruoshiliu,
Thanks for sharing the code! When I run the 3drec code, it seems the config is missing?
Hello, thanks for fixing the download link!
I successfully downloaded your dataset!
By the way, what information do the numpy files contain?
I checked that each npy file includes a 3x4 matrix (a camera pose).
Is it the camera extrinsic matrix?
I wonder whether the camera extrinsics are already preprocessed, assuming the object is centered at the origin of the coordinate system.
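For reference, this is how I inspected one file (my own sketch; the w2c interpretation is the very assumption I am asking about, and the path is a placeholder):

```python
import numpy as np

RT = np.load("views_release/<uid>/000.npy")  # placeholder path; shape (3, 4)
R, t = RT[:, :3], RT[:, 3]

# If [R|t] is world-to-camera, the camera center in world coordinates is:
C = -R.T @ t
print("camera center (if w2c):", C, "| distance from origin:", np.linalg.norm(C))
```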
I ran the training script and encountered an error: No such file or directory: 'views_whole_sphere/valid_paths.json'. Please help me fix this problem. Thank you.
Hi, thanks for demonstrating this fantastic work!
I have been following the provided rendering code based on your instructions.
However, as mentioned in #4, the rendering speed is extremely slow, even with 8 GPUs.
Do you have an estimate of how long the whole process would take to render the Objaverse dataset?
Would it be possible to have a downloadable link, such as Google Drive, Dropbox, etc., for your preprocessed dataset (or even a reasonably sized subset)?
Thanks!
Hello, this is excellent work.
I have a question about gradio_new.py. Specifically, I want to know whether the rotation matrix (camera_R) is in c2w or w2c format. Please clarify this for me.
Also, I want to know whether the camera coordinate system used in gradio_new.py follows the NeRF/OpenGL convention, where the camera faces the negative z-axis and the positive x-axis points to the right, or some other convention.
Finally, I was wondering whether the values of cam_x, cam_y, and cam_z in gradio_new.py represent the coordinates of the camera in the world coordinate system.
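For reference, here is the generic relationship I would use to test which convention camera_R follows (standard math, not taken from this repo):

```python
import numpy as np

# A camera-to-world pose [R | C] and a world-to-camera extrinsic [R' | t]
# are related by R' = R^T and t = -R^T @ C.
def c2w_to_w2c(R_c2w: np.ndarray, C: np.ndarray):
    R_w2c = R_c2w.T
    t = -R_w2c @ C
    return R_w2c, t
```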
Looking forward to your reply!
Hi, just to validate that the files downloaded correctly, could you please provide md5 checksums for them?
This works great on Windows for novel view synthesis, but there is a small path issue with 3D reconstruction:

```
Loading model from ../zero123/105000.ckpt
Global Step: 165000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "C:\Users\user\Desktop\zero123\3drec\run_zero123.py", line 405, in <module>
    dispatch(SJC)
  File "C:\Users\user\Desktop\zero123\3drec\my\config.py", line 76, in dispatch
    mod.run()
  File "C:\Users\user\Desktop\zero123\3drec\run_zero123.py", line 124, in run
    sjc_3d(**cfgs, poser=poser, model=model, vox=vox)
  File "C:\Users\user\Desktop\zero123\3drec\run_zero123.py", line 150, in sjc_3d
    images_, _, poses_, mask_, fov_x = load_blender('train', scene=scene, path=nerf_path)
  File "C:\Users\user\Desktop\zero123\3drec\voxnerf\data.py", line 15, in load_blender
    with open(root / f'transforms_{split}.json', "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: "'data\\nerf_wild'\\'pikachu'\\transforms_train.json"
```
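My guess at a workaround (not an official fix): on Windows the quotes from the command line survive into the config strings, producing the quoted path components seen above, so stripping the quotes before the path is joined fixes it. data_root and scene below stand in for the variables in voxnerf/data.py:

```python
from pathlib import Path

def clean_path(data_root: str, scene: str) -> Path:
    # Strip any literal quote characters carried over from the shell.
    return Path(data_root.strip("'\"")) / scene.strip("'\"")
```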
Hello, I want to know how to get the image data for training. I only downloaded the Objaverse data from Hugging Face, with the '.glb' files, and when I run the training scripts, they seem to try to load an image from .objaverse/hf-objaverse-v1/692db5f2d3a04bb286cb977a7dba903e_1/002.png, but I do not have these '.png' image files.
By the way, on line 282 of zero123/zero123/ldm/data/simple.py, sys is not defined in that file (the import is missing).