Combination Strategy for Root + Pose (issue in 3dmppe_rootnet_release, closed)

mks0601 commented on May 24, 2024

Combination Strategy for Root + Pose

Comments (7)

mks0601 commented on May 24, 2024

There are too many questions and actually, answers of most of the questions are in my paper. Could you reduce the number of questions after reading the paper?

usamahjundia commented on May 24, 2024

No, I actually understand the gist of some of the questions (the paper explains it clearly, thanks for that), but not the technical side of implementing it.
Here are the reduced questions:

1. Do you simply crop the bounding box from the detection (as the paper mentions), or do you crop the bounding box and also mask the image (this remains unclear)? Does skipping the masking still work well when two people occlude each other?
2. What is the dimension of RootNet's correction factor? The computed K is in a real-world length unit (mm, cm, inch, etc.), but in the testing protocol you compute the error directly against the GT "depth" (the distance from the camera). It does undergo some transformations, but when I inspect the code, the Z value remains unchanged. In short: RootNet outputs x, y, and Z. Are x and y in pixels and Z in a length unit?
3. If (2) is true, how do we convert Z back to pixels, or to whatever unit is suitable for the input? If I know the dimensions of the image sensor and the actual pixel range of the image, can I use this information to do so?

Thank you. I apologize for asking so many questions. I wrote them while reading the paper, and I needed several passes to understand it. As you can see, I edited the questions several times as each pass answered some of them.

mks0601 commented on May 24, 2024

1. I don't understand what you mean by 'masking'; I simply 'crop and resize' the bounding box area from the original image.
2. The dimension of the correction factor is 1 because it is a scalar. I think you are asking about its unit, which is described in the paper. RootNet outputs the (x, y) image coordinates of the human root joint, and the correction factor gamma has no unit. K is in mm.
3. What do you mean by converting Z back to pixels? Pixels are defined in the xy space, not in z.
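The units work out as follows. The paper's distance term K combines three quantities: the focal lengths in pixels, the bounding-box area in pixels, and an assumed real-world human area in mm². The pixel units cancel, so K comes out in mm, while gamma stays unitless. The sketch below assumes a 2000 mm × 2000 mm real area and assumes that the corrected root depth is the product gamma × K. Both choices and the function names are hypothetical illustrations, not the repository's code.

```python
import math

# Assumption: real-world human area A_real of 2000 mm x 2000 mm (hypothetical value).
BBOX_REAL_MM = (2000.0, 2000.0)


def k_value(focal_px, bbox_wh_px, bbox_real_mm=BBOX_REAL_MM):
    """Distance term K = sqrt(f_x * f_y * A_real / A_img), returned in mm.

    focal_px:   (f_x, f_y) focal lengths in pixels.
    bbox_wh_px: (width, height) of the human bounding box in pixels.
    """
    fx, fy = focal_px
    area_img = bbox_wh_px[0] * bbox_wh_px[1]          # A_img in px^2
    area_real = bbox_real_mm[0] * bbox_real_mm[1]     # A_real in mm^2
    # px^2 * mm^2 / px^2 = mm^2, so the square root is in mm
    return math.sqrt(fx * fy * area_real / area_img)


def root_depth(gamma, focal_px, bbox_wh_px):
    """Absolute root depth in mm, assuming Z = gamma * K with gamma unitless."""
    return gamma * k_value(focal_px, bbox_wh_px)
```

With f = 1500 px and a 300 × 300 px box, K is 10,000 mm. Halving the box side doubles K, so a person who appears smaller in the image is estimated to be farther away.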

usamahjundia commented on May 24, 2024

What I meant by the third point is representing the depth/distance (the corrected K value) in pixel units. But I think I found some insight in the code. I'm looking at these lines in MSCOCO.py:

```python
pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
```

It seems that PoseNet outputs x, y, and z all in pixel units, and these lines convert them into length units.
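The quoted MSCOCO.py lines can be restated as a standalone function, which makes it easier to see which unit each axis ends up in. This is a sketch under three assumptions: a 64×64 heatmap, 64 depth bins, and a 2000 mm depth extent, standing in for cfg.output_shape, cfg.depth_dim and cfg.bbox_3d_shape[0]. The function name is hypothetical. The quoted evaluation code adds the ground-truth root depth gt_3d_root[2]. Here that value is a parameter, which at inference time would presumably come from RootNet.

```python
import numpy as np

# Hypothetical config values, assumed for illustration only.
OUTPUT_SHAPE = (64, 64)       # heatmap (height, width)
DEPTH_DIM = 64                # number of discretized depth bins
BBOX_3D_DEPTH_MM = 2000.0     # depth extent of the 3D box around the root, in mm


def heatmap_to_image_and_depth(pred, bbox, root_depth_mm):
    """Map (x, y, z) in heatmap/bin units to (x_img, y_img, z_cam).

    pred:          (J, 3) array-like, x and y in heatmap cells, z in depth bins.
    bbox:          (x0, y0, w, h) of the crop in original-image pixels.
    root_depth_mm: absolute root depth in mm.
    """
    out = np.asarray(pred, dtype=np.float64).copy()
    x0, y0, w, h = bbox
    # x, y: heatmap cells -> original-image pixels (undo the crop-and-resize)
    out[:, 0] = out[:, 0] / OUTPUT_SHAPE[1] * w + x0
    out[:, 1] = out[:, 1] / OUTPUT_SHAPE[0] * h + y0
    # z: bins [0, DEPTH_DIM] -> [-1, 1] -> +/- half the extent in mm -> absolute mm
    out[:, 2] = (out[:, 2] / DEPTH_DIM * 2 - 1) * (BBOX_3D_DEPTH_MM / 2) + root_depth_mm
    return out
```

After this mapping, x and y are in original-image pixels and only z is in mm. The quoted lines therefore convert only the depth axis into a length unit.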

So, is the following statement true?

In 3D pixel coordinates, a human is enclosed in a 3D bounding box of size 256x256x256 (or 64?), which corresponds to a 2x2x2 m box in real-world coordinates, and the processing of the third dimension of the PoseNet output maps from this 256^3 pixel space to that 2x2x2 m length (meter) space?

(Yes, I am aware this isn't the PoseNet repo, but opening a new issue in the other repo just to get a related question answered did not seem effective.)

mks0601 commented on May 24, 2024

First of all, depth cannot be converted to pixels. Depth is defined along the z-axis, and pixels are defined along the x- and y-axes.
PoseNet outputs x, y in image coordinate space and z in a discretized camera-centered space.
Regarding your question, there is no such thing as a 3D pixel coordinate.
Do you mean voxel? If so, the answer is close to yes. I don't understand what you mean by 'processing on the third dimension'.
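Under the voxel reading, the output volume is a grid with heatmap cells along x and y (image space) and depth bins along z (camera-centered, relative to the root). The sketch below computes the bin size and inverts the quoted z mapping. It assumes 64 bins over a 2000 mm extent; these values and the function names are hypothetical.

```python
# Hypothetical values, assumed for illustration: 64 depth bins spanning 2000 mm.
DEPTH_DIM = 64
BBOX_3D_DEPTH_MM = 2000.0


def bin_size_mm(depth_dim=DEPTH_DIM, extent_mm=BBOX_3D_DEPTH_MM):
    """Depth covered by one bin of the discretized z axis, in mm."""
    return extent_mm / depth_dim


def bin_to_depth(z_bin, root_depth_mm, depth_dim=DEPTH_DIM, extent_mm=BBOX_3D_DEPTH_MM):
    """Forward mapping, same formula as the quoted pred_2d_kpt[:,2] line."""
    return (z_bin / depth_dim * 2 - 1) * (extent_mm / 2) + root_depth_mm


def depth_to_bin(z_cam_mm, root_depth_mm, depth_dim=DEPTH_DIM, extent_mm=BBOX_3D_DEPTH_MM):
    """Inverse mapping: absolute depth in mm -> continuous bin index."""
    rel = (z_cam_mm - root_depth_mm) / (extent_mm / 2)  # in [-1, 1] inside the box
    return (rel + 1) / 2 * depth_dim
```

With these assumptions each bin covers 31.25 mm. A joint 437.5 mm behind a root at 5000 mm falls at bin 46, and converting bin 46 back gives 5437.5 mm.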

usamahjundia commented on May 24, 2024

By 'processing on the third dimension' I meant this line:

```python
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
```

Okay, I obviously failed to frame it in academic terms, so let me reframe.
Let's say I want to visualize or export the 3D coordinates of a pose, for example to animate a 3D character model. How do I make sure the x, y, and z coordinates are consistent? That is, how do I make movement along the x and y axes (which PoseNet outputs in image space, for lack of a better term) use the same scale and unit as movement along the z axis?

mks0601 commented on May 24, 2024

L168 of https://github.com/mks0601/3DMPPE_POSENET_RELEASE/blob/master/data/MSCOCO/MSCOCO.py

pixel2cam converts (x_img, y_img, z_cam) to (x_cam, y_cam, z_cam)
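This conversion answers the animation question. Once (x_img, y_img) are back-projected using the depth, all three axes share the depth's unit (mm). The sketch below is the standard pinhole back-projection, written to match the behavior described for pixel2cam. The name pixel_to_camera is hypothetical and is not the repository's function. The focal lengths and principal point are assumed to come from your own camera intrinsics.

```python
import numpy as np


def pixel_to_camera(pixel_coord, focal, princpt):
    """Back-project (x_img, y_img, z_cam) to (x_cam, y_cam, z_cam) with a pinhole model.

    pixel_coord: (J, 3) array-like; x, y in image pixels, z in mm.
    focal:       (f_x, f_y) focal lengths in pixels.
    princpt:     (c_x, c_y) principal point in pixels.
    All three output axes are in the unit of z (here mm).
    """
    p = np.asarray(pixel_coord, dtype=np.float64)
    z = p[:, 2]
    x = (p[:, 0] - princpt[0]) / focal[0] * z   # pixel offset / focal length * depth
    y = (p[:, 1] - princpt[1]) / focal[1] * z
    return np.stack([x, y, z], axis=1)
```

For example, with f = 1500 px, principal point (960, 540) and z = 5000 mm, the pixel (1260, 390) maps to (1000, -500, 5000) mm.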