Hello. I was looking at 05_wrist_rom where I can see hand in 3d being properly pos

Demo of body 3d pose in space about google-mediapipe HOT 12 CLOSED

ntu-rris commented on July 30, 2024

Demo of body 3d pose in space

from google-mediapipe.

Comments (12)

gb2111 commented on July 30, 2024

@guanming001
I wonder if you could give it one more try, namely using the same 4 points with solvePnP. It seems to me that the legs, especially when the person is on their knees, have a bad impact on the estimate.
You need to keep rvec and tvec as global variables, always pass them to the function as arguments, and read them back as the result.
I use something similar in C#, and it lets you use 4 points when you set useExtrinsicGuess and pass the previous rotation and translation vectors:

# Reuse the previous pose as the initial guess for the solver
(_, rotation_vector, translation_vector) = cv2.solvePnP(
    points_model,
    image_points,
    self.camera_matrix,
    self.dist_coeffs,
    rvec=self.r_vec,
    tvec=self.t_vec,
    useExtrinsicGuess=True)
# Store the result so it can seed the next frame
self.r_vec = rotation_vector
self.t_vec = translation_vector

guanming001 commented on July 30, 2024

Hi @gb2111 thank you for your interest in this project.

I have only tried converting the hand joints to 3D space in camera coordinates, by making some simplifying assumptions, such as the distance from the wrist to the index finger MCP being around 8 cm and the distance from the hand to the camera being around 0.6 m (details can be found in the convert_joint_to_camera_coor function).
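
The scale assumption above can be illustrated with a pinhole-camera similarity argument. This is a minimal sketch, not the repo's convert_joint_to_camera_coor implementation: the function names, the 450 px focal length, the principal point, and the pixel coordinates are all hypothetical, and it assumes the wrist-to-MCP segment lies roughly parallel to the image plane.

```python
import math


def estimate_depth_m(p1_px, p2_px, real_length_m, focal_px):
    """Depth at which a segment of known length spans the observed pixels.

    Pinhole similarity: pixel_length / focal_px = real_length_m / depth.
    Assumes the segment is roughly parallel to the image plane.
    """
    pixel_length = math.hypot(p2_px[0] - p1_px[0], p2_px[1] - p1_px[1])
    if pixel_length == 0:
        raise ValueError("the two keypoints coincide; depth is undefined")
    return focal_px * real_length_m / pixel_length


def backproject(p_px, depth_m, focal_px, cx, cy):
    """Lift a pixel to camera coordinates (metres) at the given depth."""
    u, v = p_px
    return ((u - cx) * depth_m / focal_px,
            (v - cy) * depth_m / focal_px,
            depth_m)


# Hypothetical detections: the wrist and index MCP are 60 px apart.
wrist_px = (300.0, 300.0)
index_mcp_px = (300.0, 240.0)

# Assumed values: 8 cm wrist-to-MCP length, 450 px focal length.
depth = estimate_depth_m(wrist_px, index_mcp_px, 0.08, 450.0)
wrist_cam = backproject(wrist_px, depth, 450.0, 320.0, 240.0)
```

With these made-up numbers the segment lands about 0.6 m from the camera. If the hand tilts away from the image plane, the segment looks shorter and the depth is overestimated, which is one reason such a shortcut is fragile.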

However, when I tried to apply similar assumptions to the human body, the results were not as good, perhaps because body dimensions vary more between people and because the distance from the body to the camera is much harder to estimate from a single view due to depth ambiguity.

gb2111 commented on July 30, 2024

But I have seen that you had some kind of depth estimation, so you must have made some updates.
In the README of this repo, the 08_skeleton_3D gif shows the skeleton placed in some perspective, whereas when I run the demo now it sits at the camera origin. I think 01_video was also shown in some perspective before, and now the middle of the hips is simply at the camera origin.

Second question: do you somehow normalize the skeleton so that it always keeps the same size?
Thanks.

guanming001 commented on July 30, 2024

Hi @gb2111

I made an update that tries to estimate the body joints in camera coordinates (details can be found in the convert_body_joint_to_camera_coor function).

Below is the result when tested on a video clip from The Greatest Showman, inspired by the work of VIBE: Video Inference for Human Body Pose and Shape Estimation [CVPR-2020].
When the person is closer to the camera, his 3D joints also appear closer. Take note, however, that it is still not very stable or accurate, as the actual camera intrinsics are unknown and it only uses 4 keypoints to estimate the 3D pose. If you have any suggestions or improvements, feel free to make a pull request.
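
When the real intrinsics are unknown, a rough guess is often substituted. Below is a minimal sketch of one common heuristic, which is an assumption here and not something the repo is confirmed to use: set the focal length in pixels to about the image width and put the principal point at the image center. The function names and the focal_scale parameter are hypothetical.

```python
import math


def approx_camera_matrix(width_px, height_px, focal_scale=1.0):
    """Guess a 3x3 pinhole camera matrix from the image size alone.

    Heuristic (an assumption, not a calibration): focal length in pixels
    is focal_scale * image width, principal point is the image center.
    """
    focal_px = focal_scale * width_px
    return [
        [focal_px, 0.0, width_px / 2.0],
        [0.0, focal_px, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ]


def horizontal_fov_deg(width_px, focal_px):
    """Horizontal field of view implied by a focal length in pixels."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


K = approx_camera_matrix(640, 480)
fov = horizontal_fov_deg(640, K[0][0])  # about 53 degrees for focal = width
```

Under a pinhole model, an error in the assumed focal length shows up mostly as a proportional error in the estimated depth, so the skeleton may appear too near or too far while its 2D projection still looks right.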

P.S. For the 08_skeleton_3D gif, I think the hip was hardcoded and fixed at some distance in front of the camera.

gb2111 commented on July 30, 2024

Yes. Actually, I have an idea :)
With each frame, we have:

3D model (fixed size, not scaled)
2D projection of the model onto the image

This is perfect input for solvePnP from OpenCV. What do you think? This function would address the situation where a person leans forward, which seems to produce the jitter in your gif.
Unfortunately, I don't have the Python skills to use it; I know it mostly from C#.

Also, can you tell me whether you scale the model or not? If not, how do you keep it at a fixed size?

Edit: I tested it and at first look it seems to work very well. I will take a closer look tomorrow. Can we add more points, like the knees, to the estimation to improve accuracy?

I hope you can take a look at solvePnP as well ;)
Best regards.

guanming001 commented on July 30, 2024

I have added two options to the convert_body_joint_to_camera_coor function that you can try:

scale_body: tries to keep the body dimensions fixed (e.g. hip width, arm length, leg length)
use_solvepnp: uses solvePnP, but sometimes the depth of the joints may be estimated wrongly (i.e. the translation in the z direction is negative)
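
As an illustration of what a fixed-size body option could do, here is a minimal sketch. It is not the repo's scale_body code: it only rescales hip-centered joints uniformly so that one reference segment (the hip width here) has a fixed length, whereas the option described above also mentions arm and leg lengths. The function name, the joint layout, and the 0.25 m target are hypothetical.

```python
import math


def scale_to_reference(joints, idx_a, idx_b, target_length_m):
    """Uniformly scale joints about the origin so that the segment
    between joints idx_a and idx_b has length target_length_m."""
    current = math.dist(joints[idx_a], joints[idx_b])
    if current == 0:
        raise ValueError("reference joints coincide; cannot rescale")
    scale = target_length_m / current
    return [tuple(c * scale for c in joint) for joint in joints]


# Hypothetical hip-centered joints in metres.
joints = [
    (0.10, 0.0, 0.0),    # 0: left hip
    (-0.10, 0.0, 0.0),   # 1: right hip
    (0.15, -0.5, 0.05),  # 2: left shoulder
]

# Force the hip width to an assumed 0.25 m; every joint scales by 1.25.
scaled = scale_to_reference(joints, 0, 1, 0.25)
```

A uniform scale keeps the body proportions from the detector and only fixes the overall size; fixing each limb length separately would need a per-bone rescale along the kinematic chain.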

gb2111 commented on July 30, 2024

Thank you for adding this! I'd say they both work very well. I can see you are using all 33 landmarks for solvePnP. I am surprised it works :) I can't tell which one works better, apart from the issue you mention about the solvePnP z translation being negative.
I am not sure why you added scale_body, since you already take the joints from pose_world_landmarks, which are already in a unified scale?

guanming001 commented on July 30, 2024

The scale_body option was just to test whether a fixed-size body model works. In any case, the latest version of MediaPipe (0.8.6) already offers quite a useful estimate of real-world 3D body joints, so the scale_body option may not be necessary.

gb2111 commented on July 30, 2024

Thanks for the clarification.
Honestly, I think it works quite well.
Again, I appreciate that you made these changes.
Edit: If you find anything about the PnP function and the negative direction, I hope you will update the repo. If I find anything, I will let you know :)

gb2111 commented on July 30, 2024

OK. I improved my Python skills, added class variables for the rotation and translation, and use them as described above. I initialize the translation vector as [0,0,1] so that the estimated pose always starts in a good location. Now I can use solvePnP with either 4 or all landmarks.

When using 4 points, I needed to remove the multiplication by rmat = cv2.Rodrigues(rvec)[0], which doesn't seem to be required, as we already get a very good rotation from MediaPipe and here we only need the position. So in the end I removed it entirely.

So now we have, in theory, 3 methods. I cannot tell which one is better. Ideally, it would be good to compare them one by one on the same video, always with the same camera view. If you have a code snippet showing how to initialize the camera with a given view, please share it.

Thanks again for adding pnp to your repo!

guanming001 commented on July 30, 2024

Thank you for your suggestions!

By initializing rvec and tvec and enabling useExtrinsicGuess, solvePnP can work with 4 landmarks and the issue of negative z translation is gone.

You can initialize the camera view by simply pressing the 'r' key, which resets to the default camera view.

If you want a different view, you can change the values of the identity matrix in self.pinhole.extrinsic.

Alternatively, you can replace the reset_view function with the code below:

def reset_view(self):
    # Set camera view
    ctr = self.vis.get_view_control()
    ctr.set_up([0,-1,0]) # Set up as -y axis
    ctr.set_front([1,-1,-1])
    ctr.set_lookat([0,0,3])
    ctr.set_zoom(0.5)

gb2111 commented on July 30, 2024

I will give it a try.

Thank you.
