
License: MIT License

computer-vision deep-learning 3d-hand-pose-estimation hand-tracking hand-motion-capture

minimal-hand's Introduction

Minimal Hand

A minimal solution to hand motion capture from a single color camera at over 100 fps. Easy to use, plug and run.

[Teaser figure]

This project provides the core components for hand motion capture:

  1. estimating joint locations from a monocular RGB image (DetNet)
  2. estimating joint rotations from locations (IKNet)

We focus on:

  1. ease of use (all you need is a webcam)
  2. time efficiency (on our GTX 1080 Ti, 8.9 ms for DetNet, 0.9 ms for IKNet)
  3. robustness to occlusion, hand-object interaction, fast motion, and changing scale and viewpoint

Some links: [video] [paper] [supp doc] [webpage]

The author does not have time to prepare the training code for release. That said, the training part should not be difficult to implement. Feel free to open an issue for any problems you encounter.

PyTorch Version

Here is a PyTorch version implemented by @MengHao666. I didn't personally check it, but I believe it is worth trying. Many thanks to @MengHao666!

With Unity

Here is a project that connects this repo to Unity. It looks very cool; many thanks to @vinnik-dmitry07!

Usage

Install dependencies

Please check requirements.txt. All dependencies are available via pip and conda (e.g. pip install -r requirements.txt).

Prepare MANO hand model

  1. Download the MANO model from here and unzip it.
  2. In config.py, set OFFICIAL_MANO_PATH to the path of the left-hand model (see the sketch after this list).
  3. Run python prepare_mano.py; you will get a converted MANO model that is compatible with this project at config.HAND_MESH_MODEL_PATH.
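
For reference, the setting in config.py would look roughly like the following. The exact file name depends on the MANO release you downloaded, so treat the path below as an example only:

# config.py (excerpt) -- example path only; point this at your own MANO download
OFFICIAL_MANO_PATH = '/path/to/mano_v1_2/models/MANO_LEFT.pkl'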

Prepare pre-trained network models

  1. Download the models from here.
  2. Put detnet.ckpt.* in model/detnet, and iknet.ckpt.* in model/iknet (see the layout sketch after this list).
  3. Check config.py and make sure all required files are in place.
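
After step 2, the directory layout should look roughly like this (a sketch; the exact checkpoint file suffixes depend on the downloaded archive):

model/
  detnet/
    detnet.ckpt.*
  iknet/
    iknet.ckpt.*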

Run the demo for webcam input

  1. python app.py
  2. Put your right hand in front of the camera. The pre-trained model is for the left hand, but the input is flipped internally.
  3. Press ESC to quit.
  4. Although the model is robust to varying scales, ideally the image should be about 1.3x larger than the hand bounding box. A good bounding box may yield better accuracy. You can track the bounding box using the model's 2D predictions (see the sketch below).
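
A rough sketch of what such bounding-box tracking could look like. The helper below is hypothetical (not part of this repo): it takes the previous frame's 2D joint predictions, enlarges their bounding box by about 1.3x, and crops the next frame around it.

import numpy as np

def track_crop(frame, uv_prev, margin=1.3):
    """Crop `frame` around the previous frame's 2D joint predictions.

    frame:   (H, W, 3) image array.
    uv_prev: (21, 2) array of (x, y) pixel coordinates from the last prediction.
    Returns the square-ish crop and its top-left corner in the original frame.
    Hypothetical helper -- adapt to how app.py actually prepares its input.
    """
    center = (uv_prev.min(axis=0) + uv_prev.max(axis=0)) / 2
    size = (uv_prev.max(axis=0) - uv_prev.min(axis=0)).max() * margin
    half = int(round(size / 2))
    cx, cy = int(round(center[0])), int(round(center[1]))
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, frame.shape[1]), min(cy + half, frame.shape[0])
    crop = frame[y0:y1, x0:x1]
    # Resize to the 128x128 network input, e.g. with cv2.resize(crop, (128, 128)).
    return crop, (x0, y0)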

We found that the model may fail on some "simple" poses. We think this is because such poses were not present in the training data. We are working on a v2 version with further extended data to tackle this problem.

Use the models in your project

Please check wrappers.py.
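
For example, here is a minimal single-image inference sketch, based on how app.py and the issues below use the wrapper (constructor arguments and the expected input size/format may differ, so check wrappers.py and app.py for the exact interface):

import numpy as np
from wrappers import ModelPipeline

model = ModelPipeline()
# `frame` is assumed to be a 128x128 RGB image, e.g. a square crop of a webcam frame.
frame = np.zeros((128, 128, 3), dtype=np.uint8)
xyz, theta_mpii = model.process(frame)  # 3D joint locations and joint rotations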

IKNet Alternative

We also provide an optimization-based IK solver here.

Dataset

The detection model is trained with the following datasets:

  • CMU Panoptic Dataset (CMU)
  • Rendered Handpose Dataset (RHD)
  • GANerated Hands Dataset (GAN)

The IK model is trained with the poses shipped with MANO.

Citation

This is the official implementation of the paper "Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data" (CVPR 2020).

The quantitative numbers reported in the paper can be found in plot.py.

If you find the project helpful, please consider citing us:

@InProceedings{zhou2020monocular,
  author = {Zhou, Yuxiao and Habermann, Marc and Xu, Weipeng and Habibie, Ikhsanul and Theobalt, Christian and Xu, Feng},
  title = {Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

minimal-hand's People

Contributors

adujardin, calciferzh


minimal-hand's Issues

IK using 3D joint coordinates

Hello,
First, I would like to congratulate you on the amazing paper. I also have a question regarding the IK architecture: is there any comparison between the IK architecture you propose here and the algorithm you previously proposed based on Levenberg-Marquardt on the MANO hand? Additionally, could you guide me on applying the IK architecture without running the entire code? I have some ground-truth 3D coordinates and want to obtain the IK parameters from them. Thanks a lot.

How to determine the global rotation?

In the MANO model, the first three pose parameters are the global rotation; I want to know how you define it. I also find that the global rotation of the 3D joints in the GANerated Hands dataset does not match the picture. How do you process it during training?

python app.py issue

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\platform\self_check.py", line 75, in preload_check
    ctypes.WinDLL(build_info.cudart_dll_name)
  File "C:\Program Files\Python37\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 13, in <module>
    from wrappers import ModelPipeline
  File "C:\Users\faker\OneDrive\Desktop\minimal-hand\wrappers.py", line 4, in <module>
    import tensorflow as tf
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 30, in <module>
    self_check.preload_check()
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\platform\self_check.py", line 82, in preload_check
    % (build_info.cudart_dll_name, build_info.cuda_version_number))
ImportError: Could not find 'cudart64_100.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable. Download and install CUDA 10.0 from this URL: https://developer.nvidia.com/cuda-90-download-archive

Windows 10. I installed everything in requirements.txt, but I still have issues running the file.

Obtaining MoCap from a two-hand video dataset

Greetings and many thanks for the great work.

I wanted to use your code to extract MoCap data from a first-person RGB video dataset that has a clear view of both hands during a task. Given that your model is restricted to predicting a single hand, I wonder whether it will consistently show a preference for the left hand when presented with videos that display both. If that's the case, I suppose I could parse the dataset twice, flipping it the second time, to obtain both hands' coordinates, right?
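
For what it's worth, here is a sketch of that flip-and-reparse idea. The helper is hypothetical (not part of the repo): it assumes the frame is a NumPy image array and that the wrapper's process method is used as in app.py; if you map predictions back onto the original image, the x-coordinates must be mirrored as well.

import numpy as np

def process_other_hand(model, frame):
    """Run the left-hand model on the other hand by mirroring the frame.

    model: the ModelPipeline / ModelDet wrapper from wrappers.py (assumed interface).
    frame: (H, W, 3) image array.
    """
    flipped = np.ascontiguousarray(frame[:, ::-1])  # horizontal flip
    return model.process(flipped)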

about loss of quat?

Hello, this is great work.
Could you please share some code for Formula 10? When I use it, the loss does not converge.
Thank you!

How can I get coordinates on frame_large?

Hi. Your project is amazing.

I'm building an AR app that puts 3D objects around a finger. I multiply the 32x32 coordinates by 4 to map them onto the 128x128 image.

I wonder how I can get the joint coordinates in the initial frame?
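
One general way to do this, assuming you know how the square crop was taken from the original frame (hypothetical names; the actual crop logic lives in app.py):

import numpy as np

def to_original_coords(uv_net, crop_topleft, crop_size, net_size=128):
    """Map 2D predictions from the network input back to the full frame.

    uv_net:       (N, 2) coordinates in the network input (e.g. the 32x32 grid scaled by 4).
    crop_topleft: (x0, y0) of the square crop in the original frame.
    crop_size:    side length of that square crop, in original-frame pixels.
    Hypothetical helper -- adapt to how app.py actually builds its crop.
    """
    uv_net = np.asarray(uv_net, dtype=float)
    scale = crop_size / float(net_size)
    x0, y0 = crop_topleft
    return uv_net * scale + np.array([x0, y0], dtype=float)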

Training step

Hello.
About DetNet training: are the 2D points, 3D points, and delta maps trained all at once, or one after another?
In my code I train the 2D points and 3D points on FreiHAND at the same time, but I get terrible results.
Hoping for your reply.

IKNet Input

Hi! Thanks for sharing this nice project. Going through your code, it seems to me that the input to IKNet is simply the joint coordinates instead of the encoding you mention in the paper. Is that really the case, or did I miss something? Thanks!

Does your MANO model take in the shape parameter (beta)?

Thanks for your sharing. I have several confusions.

  1. The network structure in the paper shows that DetNet estimates the shape parameter and that the MANO model takes in both shape and pose parameters. But in the code in app.py, it seems that DetNet only outputs joint locations and the MANO model only needs theta to recover the hand mesh.

  2. Is your IKNet pretrained using only MoCap data and then integrated into the whole network pipeline and fine-tuned?

Thanks for your answer.

About the pre-trained network model

Hello, thanks for sharing this amazing code! This is great work!
I have a question about the pre-trained model in the Google Drive.

I downloaded this pre-trained model and ran the app.py code for inference.
I noticed that the hand shape was inconsistent with some hand poses.
For example,

  1. when I tried to bend my middle finger, the hand mesh did not bend accordingly.
  2. when I rotated my hand and let the camera capture the side of the hand (not the palm or back), the hand mesh did not match the hand pose.

So, I tried to retrieve the hand pose (21-joint skeleton) by modifying this line:
_, theta_mpii = model.process(frame)
into
xyz, theta_mpii = model.process(frame)
and displayed "xyz" in the Open3D visualizer.
I found that the hand pose also had the same issues.

I have a feeling that this pre-trained model was not fully converged during training.
Could you please kindly confirm whether I am right? :)
Or is there something wrong in the app.py code?
Thank you!

Output probability

Cool repo. I am wondering if it's possible to also output a per-joint probability. Is that something you have looked into?

Suggestion for extracting 3D keypoints for both hands

Hi @CalciferZh ,
Thanks for this elaborate and easy-to-use repository. I tried your code and it works fine for the right hand, as you said (the input is flipped internally because the model is trained for the left hand).
I went one step further and tried to detect both hands in real time, using your "ModelDet()" from "wrappers.py" to get xyz for every bounding box generated by a hand-detection model.
My confusion is: how can I distinguish the two hands (left or right) to get the corresponding results from your model?
Please share here if anyone has done the same.

Any plans on evaluating on FreiHAND dataset?

I'm curious, as it seems to be one of the better datasets publicly available: not only does it include really accurate 3D poses, but they are all on real images, including challenging poses and object interactions. On top of that, it includes MANO hand-shape ground truths. I would love to see how this model performs.

It also allows seeing how this performs without needing alignment, since both camera intrinsics and scale are included for each image.

I'm also curious whether this would be a good alternative for training IKNet instead of the MoCap data, since it includes the hand-shape ground truths. I'm not sure if I should open a separate issue for that to make it easier for others to find.

About fist

This is great work. Have you tried making a fist? I find that the reconstructed fist looks unnatural.

'NoneType' object has no attribute 'load_from_json'

Hi, @CalciferZh, thanks for releasing this wonderful work, but when testing on my own images I get the error above; can you give some advice?
By the way, after successfully installing all requirements, there is also a GLFW error: "X11: The DISPLAY environment variable is missing. Failed to initialize GLFW."
Hope to receive your response, thanks~

How to extract 3D coordinates

Hello! First of all, thank you for this project, it is awesome!

Please, how can I extract the 3D coordinates of each joint? What is the distance unit? Is it possible to export the hand pose to a file?

Thank you.

Demo on CPU

Hi,

I was wondering whether the demo can run on the CPU. I guess it will not be as fast, but are the code and models compatible with it?

Thanks

Benet

Hand cropping?

Was DetNet trained on cropped hand images, or did you use the full image? I'm a bit confused, since there seems to be no hand cropping in the inference app. It looks like all that is done is converting the image to a square and resizing it to 128x128.

From what I can tell most hand keypoint detection pipelines usually involve first cropping out the hand and then feeding it into the 2D/3D keypoint detector.

[Plan on PyTorch?]

Thanks for your great work!
Do you have a plan for releasing a PyTorch version?

How to use the right-hand model

In config.py, I set OFFICIAL_MANO_PATH to the right-hand model and ran python prepare_mano.py, which gave me a converted right-hand MANO model. But when I use that converted model, the result is very bad. What is going wrong?
In short, I want to know how to use the right-hand MANO model, or how to convert it correctly.
Looking forward to your reply! Thanks a lot.

How to train the model?

Will this code be compatible with TF 2.0?

hand3d_minimal/wrappers.py", line 25, in __init__
    with tf.variable_scope('prior_based_hand'):
AttributeError: module 'tensorflow' has no attribute 'variable_scope'

Also, how do I train the model? Which datasets are used?

About the training

Hey, this is very cool work. By the way, is there any plan for releasing the training code and dataset?

Are there some ways to update the model?

Hi, thanks for sharing. I am also working on hand gesture recognition (HGR), and I want to make some modifications for my product. I do not need to reconstruct the hand model using MANO; I just want to classify different gestures from the skeleton. But it seems your models are pretrained. Are there ways to modify your pretrained model, or should I just retrain it?
Thanks!!!

Could you add some new features?

I'll write it succinctly.

Nice features:

  • Two hands at the same time
  • A training class where you can train on your own dataset, with a training example
  • Hand positional tracking (not only relative position)

Some solutions:
Add a hand finder and feed its output into the network, then calculate the relative position from the center of the detected rectangle.

best regards,
DENNIS

About location map

Hello, I have read the paper and the code, but I am still confused about how to get the ground-truth location maps from the 2D heatmaps. Could you explain it in more detail? Thanks a lot!

How to mix and train the different datasets?

The paper says that DetNet is trained on three datasets: the CMU Panoptic Dataset (CMU), the Rendered Handpose Dataset (RHD), and the GANerated Hands Dataset (GAN).

Since the images of the three datasets are quite different from each other, can you please tell me how to preprocess the images?

Batching

I'm wondering how you would go about changing the batch size of DetNet. I've tried a few things but am unable to load your pretrained weights unless the batch size is 1. Thanks again for your help.

About 3D label

Thank you for the great project. In your project, a tensor of shape Batch × num_points × W × H × 3 is used as the 3D label. I want to know how to generate this 3D label. Can anyone help me?

About SMPL (MoSh) labels

Hello, I have another question.
There are no MoSh labels (SMPL theta and beta) in the STB, RHD, FreiHAND datasets, etc. How do you translate 3D keypoints into a mesh (SMPL theta and beta)?
Hoping for your reply, thanks.

Could you supply me with the training code?

I am very interested in your work and want to follow it. I don't know whether you could provide the training code to me.

Hope for your reply.
Best,
Wu

Questions about training IKNet

Thank you for the great project. I have a few questions about training IKNet:

  1. When changing the original 16 rotations of MANO into 21 rotations, do W, T0, I0, M0, R0, and L0 share the rotation of W in the original MANO?
  2. I found that the joints_xyz calculated from the MANO ref_pose and the transformed 21 rotation parameters (using the method in hand_mesh.py) is not equal to the 'J_transformed' saved in the MANO pkl file; the joint order has been adjusted according to kinematics.py. When using the MANO dataset to train IKNet, how did you get the ground-truth 3D joint annotation in Lxyz? Is the calculation method of FK(Q) the same as the calculation of joint_xyz in hand_mesh.py?

How can I save joint location?

That's a nice job. By the way, could you tell me how or where I can save the real-time joint location information? Thank you very much!
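
As a rough sketch (assuming you collect the xyz output of model.process for each frame, as in the snippet in the pre-trained-model issue above), one simple way is to accumulate the joint locations and dump them with NumPy:

import numpy as np

all_xyz = []
# Inside the capture loop you would do something like:
#     xyz, theta_mpii = model.process(frame)
#     all_xyz.append(xyz)
# After the loop, save everything to disk:
np.save('joint_locations.npy', np.array(all_xyz))  # shape: (num_frames, 21, 3)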
