
R matrix from gradio_new.py (zero123, closed, 9 comments)

cwwjyh commented on August 19, 2024
R matrix from gradio_new.py

from zero123.

Comments (9)

ruoshiliu commented on August 19, 2024

The matrix calculation in gradio_new.py is only for angle visualization purposes and is independent of the zero123 inference code, so I wouldn't use one to understand the other.

Assuming you are asking about zero123 inference conditioning, here's the code that converts a pair of camera matrices (input view and target view) into the 4-dimensional conditioning vector described in appendix section A:

def get_T(self, target_RT, cond_RT):
    # RT stores a world-to-camera [R | T], so the camera center in
    # world coordinates is -R^T T.
    R, T = target_RT[:3, :3], target_RT[:, -1]
    T_target = -R.T @ T
    R, T = cond_RT[:3, :3], cond_RT[:, -1]
    T_cond = -R.T @ T

    # Convert both camera centers to spherical coordinates.
    theta_cond, azimuth_cond, z_cond = self.cartesian_to_spherical(T_cond[None, :])
    theta_target, azimuth_target, z_target = self.cartesian_to_spherical(T_target[None, :])

    # Relative viewpoint change: polar offset, azimuth offset wrapped
    # to [0, 2*pi), and radius offset.
    d_theta = theta_target - theta_cond
    d_azimuth = (azimuth_target - azimuth_cond) % (2 * math.pi)
    d_z = z_target - z_cond

    # 4-d conditioning vector; azimuth is encoded as (sin, cos) to
    # avoid the wrap-around discontinuity at 2*pi.
    d_T = torch.tensor([d_theta.item(), math.sin(d_azimuth.item()),
                        math.cos(d_azimuth.item()), d_z.item()])
    return d_T
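The helper cartesian_to_spherical is not shown here; a minimal self-contained sketch of the convention the code above appears to assume (polar angle measured from the +z axis, azimuth in the x-y plane, radius as distance to the origin) — not the exact zero123 implementation:

```python
import numpy as np

def cartesian_to_spherical(xyz):
    # xyz: (N, 3) array of camera positions in world coordinates.
    # Returns polar angle theta (from the +z axis), azimuth (in the
    # x-y plane), and radius. A sketch, not the exact zero123 helper.
    xy = xyz[:, 0] ** 2 + xyz[:, 1] ** 2
    radius = np.sqrt(xy + xyz[:, 2] ** 2)
    theta = np.arctan2(np.sqrt(xy), xyz[:, 2])
    azimuth = np.arctan2(xyz[:, 1], xyz[:, 0])
    return theta, azimuth, radius
```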

Hope it makes sense.

P.S. camera matrices over-parameterize the transformation, so there is more than one way to represent the same RT operation. I think a better way to compare two RTs is to visualize the cameras, or to convert the rotations to Euler angles or axis-angle first.
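The comparison suggested in the P.S. can also be done directly: the angle of the relative rotation R1^T R2 is zero exactly when the two matrices encode the same orientation, even if their raw entries differ. A minimal sketch (rotation_angle_between is an illustrative helper, not part of zero123):

```python
import numpy as np

def rotation_angle_between(R1, R2):
    # Angle (radians) of the relative rotation R1^T R2, via the trace
    # identity trace(R) = 1 + 2*cos(angle). Zero means both matrices
    # represent the same orientation.
    rel = R1.T @ R2
    cos_angle = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_angle))
```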


cwwjyh commented on August 19, 2024


If I use camera_R to calculate the rotation matrix in gradio_new.py, is camera_R the target image's rotation matrix or not?


ruoshiliu commented on August 19, 2024

Sorry, but I don't understand. What do you need camera_R for? It is used only for visualization.


cwwjyh commented on August 19, 2024


camera_R is not used only for visualization. I want to input a single RGB image and synthesize an image from a specified camera viewpoint; given many specified viewpoints, I can get many target views. I then want to use each target image and its corresponding extrinsic matrix (e.g. camera_R) as a new dataset. So I want to know whether camera_R in gradio_new.py is the target image's rotation matrix. If so, I can also calculate the translation vector T in gradio_new.py.


ruoshiliu commented on August 19, 2024

If you want to generate a novel view based on a sampled target viewpoint described by an RT matrix, assuming your input viewpoint is cond_RT, you can sample a target viewpoint target_RT and use the get_T function to get d_T, which is the conditioning information for zero123:

def get_T(self, target_RT, cond_RT):
    R, T = target_RT[:3, :3], target_RT[:, -1]
    T_target = -R.T @ T
    R, T = cond_RT[:3, :3], cond_RT[:, -1]
    T_cond = -R.T @ T
    theta_cond, azimuth_cond, z_cond = self.cartesian_to_spherical(T_cond[None, :])
    theta_target, azimuth_target, z_target = self.cartesian_to_spherical(T_target[None, :])
    d_theta = theta_target - theta_cond
    d_azimuth = (azimuth_target - azimuth_cond) % (2 * math.pi)
    d_z = z_target - z_cond
    d_T = torch.tensor([d_theta.item(), math.sin(d_azimuth.item()),
                        math.cos(d_azimuth.item()), d_z.item()])
    return d_T


cwwjyh commented on August 19, 2024


Thank you! I will try it. Can I write the above code in gradio_new.py to get the target RT matrix?


ruoshiliu commented on August 19, 2024

This function does (cond_RT, target_RT) -> d_T. Sounds like you want (cond_RT, d_T) -> target_RT. You will need to implement that yourself.
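Under the assumption that every camera looks at the origin (as in the zero123 spherical setup), the inverse mapping can at least recover the target camera position; building a full look-at RT from that position is then a separate step. A sketch (spherical_to_target_position is a hypothetical helper name, and the d_T unpacking assumes the order get_T produces):

```python
import math
import numpy as np

def spherical_to_target_position(theta_cond, azimuth_cond, radius_cond, d_T):
    # Invert the get_T parameterization, assuming cameras look at the
    # origin. d_T = [d_theta, sin(d_azimuth), cos(d_azimuth), d_z].
    d_theta, sin_az, cos_az, d_z = d_T
    theta = theta_cond + d_theta
    azimuth = azimuth_cond + math.atan2(sin_az, cos_az)
    radius = radius_cond + d_z
    # Spherical -> Cartesian, with theta measured from the +z axis.
    return np.array([radius * math.sin(theta) * math.cos(azimuth),
                     radius * math.sin(theta) * math.sin(azimuth),
                     radius * math.cos(theta)])
```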


cwwjyh commented on August 19, 2024


cond_RT means the input image's RT and target_RT means the synthesized view's RT, but what does d_T mean?


ruoshiliu commented on August 19, 2024

It means the relative transformation from cond_RT to target_RT, which is used as input to the diffusion model:

T = torch.tensor([math.radians(x),
                  math.sin(math.radians(y)),
                  math.cos(math.radians(y)),
                  z])

I suggest you try to understand the paper/code before raising an issue here. Closing the ticket for now.

