ken2576 / vision-nerf

Official PyTorch Implementation of paper "Vision Transformer for NeRF-Based View Synthesis from a Single Input Image", WACV 2023.

License: MIT License

Python 100.00%

vision-nerf's Issues

How to export the 3D mesh?

Thank you for your amazing work!
Your code works great for novel view synthesis, but how can I export the 3D mesh (as an OBJ file) from your representation?

Confusion about generalization of NeRF

Thank you for your great work!
I am confused about how NeRF generalizes.

Your paper's title says that only a single image is needed to synthesize novel views. What, then, is the function of the pretrained weights you provide, and how did you obtain them?

Are the pretrained weights used to extract the global and local features of a single input image, with the NeRF MLP then producing the target view?

The original NeRF needs dozens to hundreds of images of a scene as input and, after training, can render any new view of that scene. Although you input only a single image, you train a network on an image dataset to extract global and local features. How does this differ from feeding many images to the original NeRF?

Sorry, I don't understand the generalizability of NeRF. I'd appreciate your reply, thanks!

Question about the number of input images

Hi,

In your paper, you say that the input is a single image.
However, after reading your code, it seems that all images except the target view are used as input to the transformer, in both the training and generation phases.
This is not consistent with what you describe in your paper.
Why is that?

Single image input for NeRF

Hi there! Really cool model. I managed to get it working on my own input images, but I had to resort to a bit of a workaround to get there.

I've been trying to run your model on my own data, and I can't get it to take a single image as input as described in the paper. The only way I've found to make the model work is to duplicate the input image 100 times and add a set of poses from the training SRN files.

This is the error code I get when running the SRN, NMR and gen_real models on a single image:

I also tried preparing the data as suggested by using Pixel-NeRF's method, and managed to get detectron working but not Pixel-NeRF itself, which are both required to prepare the data as suggested. Would you be able to clarify the format of the input data for the Vision-NeRF model?

visualization

Could you release the code that renders the visualization video and calculates metrics? It will save me a lot of time.
Thanks!

index is out of bounds

Hello,
When reading the ynz of the srn_cars dataset, an "index is out of bounds for dimension with size 0" error appears. The dataset was downloaded from PixelNeRF as you described. How can I solve it?

Question about reproducing.

Hi,
I am trying to reproduce the training of your checkpoint. However, the code does not seem to support DDP or DP training and evaluation. I therefore used the default training config, with a batch size of 1, and trained for 500K iterations. The resulting performance is significantly worse than that of your provided checkpoint. Do you have any idea why?
Best,

Question about producing the result using just VIT features

Is there any chance I could see the rendering results on the srn-cars dataset using only the ViT features?

Generate Multi-level Feature Maps

Hello, may I ask what is the difference between using ViT Encoder and Convolutional Decoder to generate Multi-level Feature Maps, and using PVT Encoder to generate Multi-level Feature Maps directly?

Failed to render images on the pretrained srn_cars model

A problem about cutting the Stanford automobile data set

Thank you for your great work!
While trying to use the Stanford automobile dataset for experiments in real scenarios, I found that I could not cut the dataset correctly. Could you please help? I need the cut dataset.

Details to train your model

Hi, could you provide the details needed to train your model? Many thanks.

Seems still need the pose information of the single input image

Hi, thanks for sharing this work.

As you mention in the paper, Vision-NeRF can synthesize novel views conditioned on a single unposed input image.
However, from the code in render_ray.py, it still seems to require the pose information of the source image.

Could you point out whether I have misunderstood something?

Weights are not available

Dear author, my access to your pretrained weights link was recently denied. Could you please grant me permission to download the weights? My Google account is [email protected]. I promise not to use them for any commercial purpose, and I very much look forward to your permission.
Best wishes!

ModuleNotFoundError: No module named 'configargparse'

When I run the code with python eval_nmr.py --config [config path], I always get the error: No module named 'configargparse'
Is my config file wrong, or did I miss some setting?
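
This error message means that Python could not import the module at all, so it is raised before any config file is read. A minimal sketch of a check for this follows; the helper name missing_modules is hypothetical and not part of the repository. The import name configargparse is provided by the PyPI package ConfigArgParse.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["configargparse"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install ConfigArgParse")
    else:
        print("configargparse is importable in this environment.")
```

If the module is reported as missing, installing it into the same environment that runs eval_nmr.py (for example with pip install ConfigArgParse) should remove this particular error.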

Excuse me, what part of the code corresponds to local feature extraction?

Excuse me, what part of the code corresponds to local feature extraction?

Question about reproducing this paper

When I tried to reproduce this paper using the NMR dataset, I ran into this error:
File "<array_function internals>", line 200, in stack
File "/home4T/cxj/anaconda3/envs/VF/lib/python3.8/site-packages/numpy/core/shape_base.py", line 460, in stack
raise ValueError('need at least one array to stack')

I have tried many approaches but can't resolve it. Could you give me some suggestions? Thanks.
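
NumPy raises "need at least one array to stack" when the sequence passed to stack is empty, which often means that no data files were found at the expected path. A minimal sketch of an early check follows. The helper list_view_files, the "*.png" pattern, and the directory layout are assumptions for illustration, not the repository's actual loader.

```python
import glob
import os
import tempfile


def list_view_files(scene_dir, pattern="*.png"):
    """Return sorted files matching pattern, failing early if none are found."""
    files = sorted(glob.glob(os.path.join(scene_dir, pattern)))
    if not files:
        # An empty list here is what would later make a stack call fail
        # with "need at least one array to stack".
        raise FileNotFoundError(
            f"No files matching {pattern!r} in {scene_dir!r}; "
            "check the dataset path and layout."
        )
    return files


if __name__ == "__main__":
    # Demonstrate the check on an empty temporary directory.
    with tempfile.TemporaryDirectory() as scene_dir:
        try:
            list_view_files(scene_dir)
        except FileNotFoundError as err:
            print("Caught:", err)
```

Running a check like this on the dataset directory before training can show whether the path in the config points at the extracted NMR data.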

Question about training time.

Hi,
May I ask about the training time and number of GPUs used for your method on each dataset, e.g., SRN-chairs, SRN-cars, and NMR?
Best,