
R matrix from gradio_new.py (zero123, closed, 9 comments)

cwwjyh commented on August 19, 2024
R matrix from gradio_new.py

from zero123.

Comments (9)

ruoshiliu commented on August 19, 2024

The matrix calculation in gradio_new.py is only for angle visualization purposes and is independent of the zero123 inference code, so I wouldn't use one to understand the other.

Assuming you are asking about zero123 inference conditioning, here's the code that converts a pair of camera matrices (input view and target view) into the 4-dimensional conditioning vector described in appendix section A:

def get_T(self, target_RT, cond_RT):
    # RT stores a world-to-camera [R | T], so the camera center in
    # world coordinates is -R^T T.
    R, T = target_RT[:3, :3], target_RT[:, -1]
    T_target = -R.T @ T
    R, T = cond_RT[:3, :3], cond_RT[:, -1]
    T_cond = -R.T @ T

    # Convert both camera centers to spherical coordinates.
    theta_cond, azimuth_cond, z_cond = self.cartesian_to_spherical(T_cond[None, :])
    theta_target, azimuth_target, z_target = self.cartesian_to_spherical(T_target[None, :])

    # Relative viewpoint change: polar offset, azimuth offset wrapped
    # to [0, 2*pi), and radius offset.
    d_theta = theta_target - theta_cond
    d_azimuth = (azimuth_target - azimuth_cond) % (2 * math.pi)
    d_z = z_target - z_cond

    # 4-d conditioning vector; azimuth is encoded as (sin, cos) to
    # avoid the wrap-around discontinuity at 2*pi.
    d_T = torch.tensor([d_theta.item(), math.sin(d_azimuth.item()),
                        math.cos(d_azimuth.item()), d_z.item()])
    return d_T
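The helper cartesian_to_spherical is not shown here; a minimal self-contained sketch of the convention the code above appears to assume (polar angle measured from the +z axis, azimuth in the x-y plane, radius as distance to the origin) — not the exact zero123 implementation:

```python
import numpy as np

def cartesian_to_spherical(xyz):
    # xyz: (N, 3) array of camera positions in world coordinates.
    # Returns polar angle theta (from the +z axis), azimuth (in the
    # x-y plane), and radius. A sketch, not the exact zero123 helper.
    xy = xyz[:, 0] ** 2 + xyz[:, 1] ** 2
    radius = np.sqrt(xy + xyz[:, 2] ** 2)
    theta = np.arctan2(np.sqrt(xy), xyz[:, 2])
    azimuth = np.arctan2(xyz[:, 1], xyz[:, 0])
    return theta, azimuth, radius
```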

Hope it makes sense.

P.S. camera matrices over-parameterize the transformation, so there is more than one way to represent the same RT operation. I think a better way to compare two RTs is to visualize the cameras, or to convert the rotations to Euler angles or axis-angle first.
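The comparison suggested in the P.S. can also be done directly: the angle of the relative rotation R1^T R2 is zero exactly when the two matrices encode the same orientation, even if their raw entries differ. A minimal sketch (rotation_angle_between is an illustrative helper, not part of zero123):

```python
import numpy as np

def rotation_angle_between(R1, R2):
    # Angle (radians) of the relative rotation R1^T R2, via the trace
    # identity trace(R) = 1 + 2*cos(angle). Zero means both matrices
    # represent the same orientation.
    rel = R1.T @ R2
    cos_angle = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_angle))
```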


cwwjyh commented on August 19, 2024


If I use camera_R to calculate the rotation matrix in gradio_new.py, is camera_R the target image's rotation matrix or not?


ruoshiliu commented on August 19, 2024

Sorry, but I don't understand. What do you need camera_R for? It is used only for visualization.


cwwjyh commented on August 19, 2024


camera_R is not used only for visualization. I want to input a single RGB image and synthesize an image from a specified camera viewpoint; given many specified viewpoints, I can get many target views. I then want to use each target image and its corresponding extrinsic matrix (e.g. camera_R) as a new dataset. So I want to know whether camera_R in gradio_new.py is the target image's rotation matrix. If so, I can also calculate the translation vector T in gradio_new.py.


ruoshiliu commented on August 19, 2024

If you want to generate a novel view based on a sampled target viewpoint described by an RT matrix, assuming your input viewpoint is cond_RT, you can sample a target viewpoint target_RT and use the get_T function to get d_T, which is the conditioning information for zero123:

def get_T(self, target_RT, cond_RT):
    R, T = target_RT[:3, :3], target_RT[:, -1]
    T_target = -R.T @ T
    R, T = cond_RT[:3, :3], cond_RT[:, -1]
    T_cond = -R.T @ T
    theta_cond, azimuth_cond, z_cond = self.cartesian_to_spherical(T_cond[None, :])
    theta_target, azimuth_target, z_target = self.cartesian_to_spherical(T_target[None, :])
    d_theta = theta_target - theta_cond
    d_azimuth = (azimuth_target - azimuth_cond) % (2 * math.pi)
    d_z = z_target - z_cond
    d_T = torch.tensor([d_theta.item(), math.sin(d_azimuth.item()),
                        math.cos(d_azimuth.item()), d_z.item()])
    return d_T


cwwjyh commented on August 19, 2024


Thank you! I will try it. Can I write the above code in gradio_new.py to get the target RT matrix?


ruoshiliu commented on August 19, 2024

This function does (cond_RT, target_RT) -> d_T. Sounds like you want (cond_RT, d_T) -> target_RT. You will need to implement that yourself.
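Under the assumption that every camera looks at the origin (as in the zero123 spherical setup), the inverse mapping can at least recover the target camera position; building a full look-at RT from that position is then a separate step. A sketch (spherical_to_target_position is a hypothetical helper name, and the d_T unpacking assumes the order get_T produces):

```python
import math
import numpy as np

def spherical_to_target_position(theta_cond, azimuth_cond, radius_cond, d_T):
    # Invert the get_T parameterization, assuming cameras look at the
    # origin. d_T = [d_theta, sin(d_azimuth), cos(d_azimuth), d_z].
    d_theta, sin_az, cos_az, d_z = d_T
    theta = theta_cond + d_theta
    azimuth = azimuth_cond + math.atan2(sin_az, cos_az)
    radius = radius_cond + d_z
    # Spherical -> Cartesian, with theta measured from the +z axis.
    return np.array([radius * math.sin(theta) * math.cos(azimuth),
                     radius * math.sin(theta) * math.sin(azimuth),
                     radius * math.cos(theta)])
```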


cwwjyh commented on August 19, 2024


cond_RT means the input image's RT and target_RT means the synthesized view's RT, but what does d_T mean?


ruoshiliu commented on August 19, 2024

It means the relative transformation from cond_RT to target_RT, which is used as input to the diffusion model:

T = torch.tensor([math.radians(x),
                  math.sin(math.radians(y)),
                  math.cos(math.radians(y)),
                  z])

I suggest you try to understand the paper/code before raising an issue here. Closing the ticket for now.

