yilundu / cross_attention_renderer

CVPR 2023: Learning to Render Novel Views from Wide-Baseline Stereo Pairs

Home Page: https://yilundu.github.io/wide_baseline/

Python 100.00%
deep-learning rendering-3d-graphics

cross_attention_renderer's People

Contributors

yilundu


cross_attention_renderer's Issues

Error in colab

I ran the code in Colab and encountered the error below. Please help me fix it. Thank you!
[image: screenshot of the error]

Failed to reproduce the results

Dear authors,

I recently tried to replicate the results presented in the paper by rerunning the repository code myself. However, I encountered a discrepancy wherein the results I obtained did not match the numbers reflected in the table presented in your documentation.
Furthermore, I evaluated the released pre-trained model and noticed a decline in performance, particularly in the LPIPS and SSIM metrics, compared to the reported values. I have noted the numerical results that I obtained in the below table.

I would greatly appreciate any insights or suggestions you might have that could help me identify potential reasons for this discrepancy.

                                        LPIPS   SSIM    PSNR
The paper's numbers                     0.262   0.839   21.38
The released model                      0.316   0.807   21.17
The model trained with this repository  0.343   0.797   20.59

Regarding the settings and the code:

I used the same versions of PyTorch and Torchvision and ran the code on 4 V100 GPUs, which is the same GPU configuration noted in the paper.
I did not change the original code except in the data processing section, where an error stopped the run. More specifically, I observed that some images did not have the expected (360, 640) resolution and were never resized to the appropriate shape, because they did not pass the check on this line. To address this, I made the following modifications so that images are resized properly and the intrinsic parameters are adjusted accordingly:

H, W = rgb.shape[0], rgb.shape[1]  # some images have a height below 360

rgb = data_util.square_crop_img(rgb)
rgb = cv2.resize(rgb, (256, 256))  # all images are resized to (256, 256)

intrinsics = unnormalize_intrinsics(cam_params[timestep].intrinsics, 256, W*(256/H))
xscale = W / min(H, W)
yscale = H / min(H, W)

intrinsics[0, 2] = intrinsics[0, 2] / xscale
intrinsics[1, 2] = intrinsics[1, 2] / yscale
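
For reference, here is a minimal sketch of how pixel intrinsics are usually updated for a center square crop followed by a resize. It is not the repository's implementation. It assumes the intrinsics are stored normalized (focal length and principal point as fractions of image width and height) and that the crop is exactly centered; the function name `intrinsics_after_square_crop` and its parameters are hypothetical. The crop shifts the principal point by the crop offset, and the resize scales both focal length and principal point by the same factor.

```python
def intrinsics_after_square_crop(fx_n, fy_n, cx_n, cy_n, height, width, out_size=256):
    """Return a 3x3 pixel intrinsics matrix (nested lists) for an image that is
    center-cropped to a square and then resized to out_size x out_size.

    Assumption: fx_n, cx_n are fractions of the image width and fy_n, cy_n are
    fractions of the image height (normalized intrinsics).
    """
    # Pixel intrinsics at the original resolution.
    fx, fy = fx_n * width, fy_n * height
    cx, cy = cx_n * width, cy_n * height

    # A center crop removes the same margin on both sides of the longer axis,
    # which shifts the principal point but leaves the focal length unchanged.
    side = min(height, width)
    cx -= (width - side) / 2.0
    cy -= (height - side) / 2.0

    # A uniform resize scales focal length and principal point together.
    scale = out_size / side
    return [
        [fx * scale, 0.0, cx * scale],
        [0.0, fy * scale, cy * scale],
        [0.0, 0.0, 1.0],
    ]


# Example: a 360x640 frame whose principal point is at the image center.
K = intrinsics_after_square_crop(0.5, 0.5, 0.5, 0.5, height=360, width=640)
```

With a centered principal point, the result keeps it at the center of the 256x256 output regardless of the input resolution, which gives a quick sanity check for frames that are smaller than (360, 640).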

Train / test splits of ACID dataset

Hi, thank you for sharing the reproducing code!
I am trying to download the ACID videos, but I found no explicit train/test split information in the ACID pose files downloaded from the link in the downloaders README. Could you tell me the train/test splits of ACID used to get the results in the paper?

Regarding square crop

Hi, Thanks for sharing such wonderful work!

I have a small question regarding the square crop function.
So if my understanding is correct, in the augmentation, you use 'center square crop', but square crop (256x256) is a default setting for both training and evaluation?

I found the lines of code that perform square crops at training time, but I'm not sure whether you do the same at evaluation.

Do you square crop at evaluation as well?

Thanks!

Experimental details

Hello, I would like to ask what unit the baseline is measured in for the experimental results at different baselines shown in Table 7 of your paper. Is it centimeters, or something else?

RealEstate10K download error

I am very sorry to bother you, but there was an error when I downloaded the RealEstate10K dataset. The error message is as follows:

(base) mxy@ZYR:~/DataSets/cross_attention_renderer/data_download$ python3 generate_realestate.py  test
[INFO] Loading data list ...  Done! 
[INFO] 0 movies are used in test mode
########################################
TOTAL : 0 sequnces
[INFO] Start downloading 0 movies
Done!

How do I download this dataset correctly? Thank you very much.

Regarding pretrained weights, reproducing results

Hi,

I was trying to run the evaluation code, but it seems that the pretrained weights do not match the default code in this repository.
Some dimensions do not match, which causes errors when loading the weights.

Also, I did not realize that the RealEstate10K dataset is so large that it requires around 8TB of storage. Did you use all of it to train your model? Do you think it is possible to produce similar results with the small subset of the dataset that you provide in this repository?

Thanks!
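
One way to narrow down this kind of loading error is to compare parameter names and shapes between the checkpoint and the freshly constructed model before loading. The sketch below is only an illustration: the helper `find_shape_mismatches` and the parameter names in the example are hypothetical and not taken from this repository. In PyTorch the two inputs could be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and the same expression over the loaded checkpoint, assuming the checkpoint is a plain state dict rather than nested under another key.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Compare two mappings of parameter name -> shape and report differences.

    Both arguments map parameter names to shapes (tuples or lists of ints).
    """
    report = {
        "missing_in_checkpoint": [],     # model expects it, checkpoint lacks it
        "unexpected_in_checkpoint": [],  # checkpoint has it, model does not
        "shape_mismatch": [],            # (name, checkpoint shape, model shape)
    }
    for name, shape in model_shapes.items():
        if name not in checkpoint_shapes:
            report["missing_in_checkpoint"].append(name)
        elif tuple(checkpoint_shapes[name]) != tuple(shape):
            report["shape_mismatch"].append(
                (name, tuple(checkpoint_shapes[name]), tuple(shape))
            )
    for name in checkpoint_shapes:
        if name not in model_shapes:
            report["unexpected_in_checkpoint"].append(name)
    return report


# Hypothetical parameter names, for illustration only.
ckpt = {"encoder.weight": (64, 3, 7, 7), "head.weight": (3, 128)}
model = {"encoder.weight": (64, 3, 7, 7), "head.weight": (3, 256), "head.bias": (3,)}
report = find_shape_mismatches(ckpt, model)
```

The entries under `shape_mismatch` usually point to the configuration option (for example a feature dimension) that differs between the released weights and the default settings.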

File missing

Dear Authors,

Thanks for adding support for Co3Dv2. However, dataset/co3d.yaml is missing. It is loaded by train_co3d.py:

with open('dataset/co3d.yaml', 'r') as file:
    config = yaml.safe_load(file)

Side note: training.multiscale_training should be training.training in train_co3d and train_acid, as it already is in train_realestate10k.

Regards
