Comments (7)
There are too many questions, and the answers to most of them are actually in my paper. Could you reduce the number of questions after reading the paper?
from 3dmppe_rootnet_release.
No, I do understand the gist of some of the questions (which is explained pretty clearly in the paper, thanks for that), but not the technical aspect of doing so.
Here are the reduced questions:
- Did you just crop the bounding box out of the detection result (the paper mentions this), or do you crop the bounding box and also mask the image (this remains unclear)? If you only crop, does it still do well when two people occlude one another?
- What is the dimension of RootNet's correction factor? The computed K is in a real-world length unit (mm, cm, inch, etc.), but in the testing protocol I see that you directly compute the distance to the GT "depth", i.e. the distance from the camera. Yes, it undergoes some transformations, but inspecting the code, I see that the Z value remains unchanged. The short question is: RootNet outputs x, y and Z. Are x and y in pixels and Z in a length unit?
- If the second point is true, how do we convert Z back to pixels, or whatever unit is suitable for input? If I know the dimensions of the image sensor and the actual pixel range of the image, can I use this information to do so?
Thank you. I apologize for asking too many questions. I asked them while reading the paper, and I needed many passes through it; as you may have noticed, I edited the questions several times as each pass answered some of them.
from 3dmppe_rootnet_release.
- I cannot understand what you mean by 'masking'; I simply crop and resize the bounding box area from the original image.
- The dimension of the correction factor is 1 because it is a scalar. I think you are interested in the unit, and that is described in the paper. RootNet outputs the (x, y) image coordinates of the human root joint, and the correction factor gamma has no unit. K is in mm.
- What do you mean by converting Z back to pixels? Pixels are defined in xy space, not z space.
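To make the units explicit, here is a minimal sketch of how a distance measure like K ends up in mm while gamma stays unitless. It assumes the pinhole formulation I recall from the paper, k = sqrt(alpha_x * alpha_y * A_real / A_img), with focal lengths in pixels, A_img in pixels squared, and A_real assumed to be 2000 mm x 2000 mm. The function names are hypothetical, and the released code may parameterize the correction factor differently.

```python
import math

def pinhole_distance_k(focal_x_px, focal_y_px, bbox_w_px, bbox_h_px,
                       real_area_mm2=2000.0 * 2000.0):
    """Distance measure k in mm (hypothetical helper, not the repo's API).

    Units: sqrt(px * px * mm^2 / px^2) = mm.
    real_area_mm2 is an assumed real-world human area of 2000 mm x 2000 mm.
    """
    img_area_px2 = bbox_w_px * bbox_h_px
    return math.sqrt(focal_x_px * focal_y_px * real_area_mm2 / img_area_px2)

def corrected_depth_mm(k_mm, gamma):
    """Apply a unitless correction factor to the image area.

    Assumption: gamma scales A_img, so the depth becomes k / sqrt(gamma).
    """
    return k_mm / math.sqrt(gamma)

# A 300 x 300 px box seen with 1500 px focal lengths
k = pinhole_distance_k(1500.0, 1500.0, 300.0, 300.0)
depth = corrected_depth_mm(k, gamma=4.0)
```

Because gamma only rescales an area ratio, the result keeps the unit of K, which is why no pixel unit ever appears on the z-axis.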
from 3dmppe_rootnet_release.
What I meant by the third point is representing the depth/distance (the corrected K value) in pixel dimensions. But it seems I have found some insight in the code. I'm looking at these lines in MSCOCO.py:
pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
It seems that PoseNet outputs x, y and z all in pixel units, and this is the conversion into length units.
So is this statement true?
In 3D pixel coordinates, humans are surrounded by a 3D bounding box of size 256x256x256 (or 64?), and in real-world coordinates by one of size 2x2x2 meters, and the processing on the third dimension of the PoseNet outputs is the mapping from this 256^3 pixel space to the 2^3 length-unit (meter) space?
(Yes, I am aware this isn't the repo for PoseNet, but starting a new issue in the other repo just to get a related question answered does not seem effective.)
from 3dmppe_rootnet_release.
First of all, depth cannot be converted to pixels. Depth is defined on the z-axis, and pixels are defined on the x- and y-axes.
PoseNet outputs x, y in image coordinate space and z in discretized camera-centered space.
Regarding your question, there is no 3D pixel coordinate.
Do you mean voxel? If that is the case, the answer is close to yes. I cannot understand 'processing on the third dimension'.
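As a rough illustration of the voxel mapping discussed above, the sketch below mirrors the arithmetic of the quoted MSCOCO.py lines in plain Python. The defaults (a 64-bin depth axis, a 64x64 heatmap, a 2000 mm deep 3D box) and the function names are assumptions for illustration, not values confirmed in this thread.

```python
def heatmap_to_image_xy(x_hm, y_hm, bbox, output_shape=(64, 64)):
    """Map heatmap coordinates back to original-image pixels.

    bbox is (x_min, y_min, width, height) in pixels;
    output_shape is (height, width) of the heatmap (assumed 64 x 64).
    """
    x_img = x_hm / output_shape[1] * bbox[2] + bbox[0]
    y_img = y_hm / output_shape[0] * bbox[3] + bbox[1]
    return x_img, y_img

def depth_bin_to_mm(z_bin, root_depth_mm, depth_dim=64, bbox_depth_mm=2000.0):
    """Map a discretized depth index to camera-centered depth in mm.

    The index is normalized to [-1, 1], scaled by half the 3D box depth,
    and offset by the root joint depth.
    """
    relative = z_bin / depth_dim * 2 - 1
    return relative * (bbox_depth_mm / 2) + root_depth_mm

# The middle bin sits at the root depth; the first bin is 1 m closer.
x_img, y_img = heatmap_to_image_xy(32, 32, bbox=(100, 50, 200, 400))
z_mid = depth_bin_to_mm(32, root_depth_mm=5000.0)
z_near = depth_bin_to_mm(0, root_depth_mm=5000.0)
```

After this step, x and y are still image pixels and only z is in mm, so one more conversion is needed before the three axes share a unit.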
from 3dmppe_rootnet_release.
By 'processing on the third dimension' I meant this line:
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
And okay, I obviously failed at framing it in academic terms, so let me reframe:
Let's say I want to visualize or export the 3D coordinates of a pose so that, for example, I can use them to animate a 3D character model. How do I make sure the x, y and z coordinates are consistent, that is, that movements along the x- and y-axes (which PoseNet outputs in image space, the term I had been failing to find) are in the same scale/unit as movements along the z-axis?
from 3dmppe_rootnet_release.
See L168 of https://github.com/mks0601/3DMPPE_POSENET_RELEASE/blob/master/data/MSCOCO/MSCOCO.py
pixel2cam converts (x_img, y_img, z_cam) to (x_cam, y_cam, z_cam).
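For reference, a conversion of this kind is the standard pinhole back-projection. The sketch below is a minimal version under that assumption; the function name, argument layout, and example intrinsics are hypothetical, and the repository's own pixel2cam may differ in detail.

```python
def pixel_to_camera(x_img, y_img, z_cam, focal, principal):
    """Back-project an image point with known depth into camera space.

    x_img, y_img: pixel coordinates; z_cam: depth in mm.
    focal = (fx, fy) and principal = (cx, cy), both in pixels.
    The pixel units cancel, so all three outputs are in mm.
    """
    fx, fy = focal
    cx, cy = principal
    x_cam = (x_img - cx) / fx * z_cam
    y_cam = (y_img - cy) / fy * z_cam
    return x_cam, y_cam, z_cam

# A point 250 px right of the principal point, 4 m from the camera
point_mm = pixel_to_camera(1210.0, 540.0, 4000.0,
                           focal=(1000.0, 1000.0),
                           principal=(960.0, 540.0))
```

Once every joint has gone through this step, x, y and z are all camera-centered lengths in the same unit and can be exported for animation.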
from 3dmppe_rootnet_release.