
Comments (7)

mks0601 commented on May 24, 2024

There are too many questions, and the answers to most of them are in my paper. Could you reduce the number of questions after reading the paper?

from 3dmppe_rootnet_release.

usamahjundia commented on May 24, 2024

No, I do understand the gist of some of the questions (the paper explains it clearly, thanks for that), but not the technical details of how it is done.
Here are the reduced questions:

  1. Do you just crop the bounding box out of the detection (as the paper mentions), or do you crop the bounding box and also mask the image (this remains unclear)? If you only crop, does it still work well when two people occlude one another?
  2. What is the dimension of RootNet's correction factor? The computed K is in a real-world length unit (mm, cm, inch, etc.), but in the testing protocol you directly compute the distance against the GT "depth", i.e. the distance from the camera. The values do undergo some transformations, but when inspecting the code I see that the Z value remains unchanged. In short: RootNet outputs x, y, and Z. Are x and y in pixels and Z in a length unit?
  3. If (2) is true, how do we convert Z back to pixels, or whatever unit is suitable for input? If I know the dimensions of the image sensor and the actual pixel range of the image, can I use this information to do so?

Thank you. I apologize for asking too many questions. I wrote them down while reading the paper, which took me many passes, and as you can see I edited the questions several times as each pass answered some of them.


mks0601 commented on May 24, 2024
  1. I do not understand what you mean by 'masking'. I simply crop and resize the bounding box area from the original image.
  2. The dimension of the correction factor is 1 because it is a scalar. I think you are asking about the unit, which is described in the paper. RootNet outputs the (x,y) image coordinates of the human root joint, and the correction factor gamma has no unit. K is in mm.
  3. What do you mean by converting Z back to pixels? Pixels are defined in xy space, not z space.
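
To make the units in point 2 concrete, here is a minimal sketch of how a distance measure in mm can come out of pixel-space inputs. It assumes the pinhole relation k = sqrt(fx * fy * A_real / A_img), with focal lengths in pixels, an assumed real-world human area A_real in mm^2, and the bounding-box area A_img in pixels^2. The function name, the example focal lengths, the 2000 mm x 2000 mm area, and the convention that gamma multiplies k are all illustrative assumptions, not code taken from the repository.

```python
import math

def distance_measure_k(fx, fy, bbox_w, bbox_h, real_area_mm2=2000.0 * 2000.0):
    """Return k in mm.

    fx, fy          : focal lengths in pixels (hypothetical example values below)
    bbox_w, bbox_h  : bounding-box size in pixels
    real_area_mm2   : assumed real-world area covered by the box, in mm^2
    """
    img_area_px2 = bbox_w * bbox_h
    # pixels * pixels * mm^2 / pixels^2 -> mm^2, so the square root is in mm
    return math.sqrt(fx * fy * real_area_mm2 / img_area_px2)

# Hypothetical camera and detection
k_mm = distance_measure_k(fx=1500.0, fy=1500.0, bbox_w=300.0, bbox_h=300.0)

# gamma is a unitless scalar; here it is assumed to multiply k directly
gamma = 1.1
root_depth_mm = gamma * k_mm
print(k_mm, root_depth_mm)
```

Because gamma carries no unit, the corrected depth stays in the same unit as k, which is why Z ends up in mm while x and y remain in pixels.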


usamahjundia commented on May 24, 2024

What I meant by the 3rd point is representing the depth/distance (the corrected K value) in pixel units. But I think I found some insight in the code. I'm looking at these lines in MSCOCO.py:

pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]

It seems that PoseNet outputs x, y, and z all in pixel units, and that this is the conversion into length units.

So is this statement true:

In 3D pixel coordinates, a human is surrounded by a 3D bounding box of size 256x256x256 (or 64?), and in real-world coordinates by one of size 2x2x2 meters, and the processing on the third dimension of the PoseNet outputs is the mapping from this 256^3 pixel space to the 2^3 length-unit (meter) space?

(Yes, I am aware this isn't the PoseNet repo, but opening a new issue in the other repo just to get a related question answered does not seem effective.)


mks0601 commented on May 24, 2024

First of all, depth cannot be converted to pixels. Depth is defined along the z-axis, and pixels are defined along the x- and y-axes.
PoseNet outputs x,y in image coordinate space and z in discretized camera-centered space.
Regarding your question, there is no 3D pixel coordinate.
Do you mean voxels? If so, the answer is close to yes. I do not understand what you mean by 'processing on the third dimension'.
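
As a rough sketch of what "discretized camera-centered space" means in practice, the MSCOCO.py lines quoted earlier in the thread can be wrapped in a standalone function: x and y are rescaled from heatmap cells to image pixels inside the bounding box, while the z bin index is mapped to a root-relative offset in mm and then shifted by the root depth. The function name and the example values (a 64x64x64 heatmap and a 2000 mm depth range) are assumptions for illustration; the real values come from `cfg` in the repository.

```python
import numpy as np

def heatmap_to_image_xy_and_cam_z(kpt, output_shape, depth_dim, bbox,
                                  bbox_3d_depth_mm, root_depth_mm):
    """Map (x_hm, y_hm, z_bin) to (x_img [px], y_img [px], z_cam [mm]).

    kpt          : (J, 3) array of heatmap-space coordinates
    output_shape : (height, width) of the heatmap
    depth_dim    : number of depth bins
    bbox         : (x_min, y_min, width, height) in image pixels
    """
    kpt = np.asarray(kpt, dtype=np.float64).copy()
    out_h, out_w = output_shape
    x_min, y_min, box_w, box_h = bbox
    # x, y: heatmap cells -> image pixels inside the bounding box
    kpt[:, 0] = kpt[:, 0] / out_w * box_w + x_min
    kpt[:, 1] = kpt[:, 1] / out_h * box_h + y_min
    # z: bin index -> [-1, 1] -> root-relative mm -> absolute camera depth in mm
    kpt[:, 2] = (kpt[:, 2] / depth_dim * 2 - 1) * (bbox_3d_depth_mm / 2) + root_depth_mm
    return kpt

# Hypothetical example: one joint at the heatmap centre, one at the origin
joints = [[32, 32, 32], [0, 0, 0]]
out = heatmap_to_image_xy_and_cam_z(joints, output_shape=(64, 64), depth_dim=64,
                                    bbox=(100, 50, 200, 400),
                                    bbox_3d_depth_mm=2000.0, root_depth_mm=3000.0)
print(out)
```

The central depth bin lands exactly on the root depth, and the first bin lands half the assumed depth range in front of it, so after this step x and y are still in pixels while z is already metric.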


usamahjundia commented on May 24, 2024

By processing on the third dimension I meant this line:

pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]

And okay, I obviously failed at framing it in academic terms, so let me reframe:
Let's say I want to visualize or export the 3D coordinates of a pose, for example to animate a 3D character model. How do I make sure the x, y, and z coordinates are consistent, i.e. that movement along the x and y axes (which PoseNet outputs in image space, the term I had been failing to find) is in the same scale/unit as movement along the z axis?


mks0601 commented on May 24, 2024

L168 of https://github.com/mks0601/3DMPPE_POSENET_RELEASE/blob/master/data/MSCOCO/MSCOCO.py

pixel2cam converts (x_img, y_img, z_cam) to (x_cam, y_cam, z_cam)
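
For readers who want to see what such a conversion looks like, here is a minimal sketch of a pinhole back-projection from (x_img, y_img, z_cam) to (x_cam, y_cam, z_cam). It assumes known focal lengths `f = (fx, fy)` and principal point `c = (cx, cy)`, both in pixels. The function name `pixel2cam_sketch` and the example intrinsics are hypothetical; the repository's own pixel2cam may differ in details.

```python
import numpy as np

def pixel2cam_sketch(pixel_coord, f, c):
    """Back-project (x_img [px], y_img [px], z_cam [mm]) to camera space in mm.

    pixel_coord : (J, 3) array
    f           : (fx, fy) focal lengths in pixels
    c           : (cx, cy) principal point in pixels
    """
    pixel_coord = np.asarray(pixel_coord, dtype=np.float64)
    z = pixel_coord[:, 2]
    # Pinhole model: x_img = fx * x_cam / z_cam + cx, solved for x_cam (same for y)
    x = (pixel_coord[:, 0] - c[0]) / f[0] * z
    y = (pixel_coord[:, 1] - c[1]) / f[1] * z
    return np.stack([x, y, z], axis=1)

# Hypothetical intrinsics and a single joint
cam = pixel2cam_sketch([[600.0, 400.0, 2000.0]], f=(1000.0, 1000.0), c=(500.0, 500.0))
print(cam)
```

After this step all three coordinates share the unit of z_cam (mm here), so they can be exported together, for example to drive a 3D character rig.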
