
License: MIT License

computer-vision deep-learning 3d-hand-pose-estimation hand-tracking hand-motion-capture

minimal-hand's Introduction

Minimal Hand

A minimal solution to hand motion capture from a single color camera at over 100 fps. Easy to use, plug and run.

[Teaser figure]

This project provides the core components for hand motion capture:

  1. estimating joint locations from a monocular RGB image (DetNet)
  2. estimating joint rotations from locations (IKNet)

We focus on:

  1. ease of use (all you need is a webcam)
  2. time efficiency (on our GTX 1080 Ti, 8.9 ms for DetNet, 0.9 ms for IKNet)
  3. robustness to occlusion, hand-object interaction, fast motion, and changing scale and viewpoint

Some links: [video] [paper] [supp doc] [webpage]

The author does not have time to prepare the training code for release. That said, the training part should not be difficult to implement. Feel free to open an issue for any problems you encounter.

PyTorch Version

Here is a PyTorch version implemented by @MengHao666. I didn't personally check it, but I believe it is worth trying. Many thanks to @MengHao666!

With Unity

Here is a project that connects this repo to Unity. It looks very cool; many thanks to @vinnik-dmitry07!

Usage

Install dependencies

Please check requirements.txt. All dependencies are available via pip and conda (e.g. pip install -r requirements.txt).

Prepare MANO hand model

  1. Download the MANO model from here and unzip it.
  2. In config.py, set OFFICIAL_MANO_PATH to the path of the left-hand model (see the sketch after this list).
  3. Run python prepare_mano.py; you will get a converted MANO model that is compatible with this project at config.HAND_MESH_MODEL_PATH.
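
For reference, the setting in config.py would look roughly like the following. The exact file name depends on the MANO release you downloaded, so treat the path below as an example only:

# config.py (excerpt) -- example path only; point this at your own MANO download
OFFICIAL_MANO_PATH = '/path/to/mano_v1_2/models/MANO_LEFT.pkl'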

Prepare pre-trained network models

  1. Download the models from here.
  2. Put detnet.ckpt.* in model/detnet, and iknet.ckpt.* in model/iknet (see the layout sketch after this list).
  3. Check config.py and make sure all required files are in place.
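
After step 2, the directory layout should look roughly like this (a sketch; the exact checkpoint file suffixes depend on the downloaded archive):

model/
  detnet/
    detnet.ckpt.*
  iknet/
    iknet.ckpt.*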

Run the demo for webcam input

  1. python app.py
  2. Put your right hand in front of the camera. The pre-trained model is for the left hand, but the input is flipped internally.
  3. Press ESC to quit.
  4. Although the model is robust to varying scales, ideally the image should be about 1.3x larger than the hand bounding box. A good bounding box may yield better accuracy. You can track the bounding box using the model's 2D predictions (see the sketch below).
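
A rough sketch of what such bounding-box tracking could look like. The helper below is hypothetical (not part of this repo): it takes the previous frame's 2D joint predictions, enlarges their bounding box by about 1.3x, and crops the next frame around it.

import numpy as np

def track_crop(frame, uv_prev, margin=1.3):
    """Crop `frame` around the previous frame's 2D joint predictions.

    frame:   (H, W, 3) image array.
    uv_prev: (21, 2) array of (x, y) pixel coordinates from the last prediction.
    Returns the square-ish crop and its top-left corner in the original frame.
    Hypothetical helper -- adapt to how app.py actually prepares its input.
    """
    center = (uv_prev.min(axis=0) + uv_prev.max(axis=0)) / 2
    size = (uv_prev.max(axis=0) - uv_prev.min(axis=0)).max() * margin
    half = int(round(size / 2))
    cx, cy = int(round(center[0])), int(round(center[1]))
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, frame.shape[1]), min(cy + half, frame.shape[0])
    crop = frame[y0:y1, x0:x1]
    # Resize to the 128x128 network input, e.g. with cv2.resize(crop, (128, 128)).
    return crop, (x0, y0)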

We found that the model may fail on some "simple" poses. We think this is because such poses were not present in the training data. We are working on a v2 version with further extended data to tackle this problem.

Use the models in your project

Please check wrappers.py.
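
For example, here is a minimal single-image inference sketch, based on how app.py and the issues below use the wrapper (constructor arguments and the expected input size/format may differ, so check wrappers.py and app.py for the exact interface):

import numpy as np
from wrappers import ModelPipeline

model = ModelPipeline()
# `frame` is assumed to be a 128x128 RGB image, e.g. a square crop of a webcam frame.
frame = np.zeros((128, 128, 3), dtype=np.uint8)
xyz, theta_mpii = model.process(frame)  # 3D joint locations and joint rotations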

IKNet Alternative

We also provide an optimization-based IK solver here.

Dataset

The detection model is trained with the following datasets:

  • CMU Panoptic Dataset (CMU)
  • Rendered Handpose Dataset (RHD)
  • GANerated Hands Dataset (GAN)

The IK model is trained with the poses shipped with MANO.

Citation

This is the official implementation of the paper "Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data" (CVPR 2020).

The quantitative numbers reported in the paper can be found in plot.py.

If you find the project helpful, please consider citing us:

@InProceedings{zhou2020monocular,
  author = {Zhou, Yuxiao and Habermann, Marc and Xu, Weipeng and Habibie, Ikhsanul and Theobalt, Christian and Xu, Feng},
  title = {Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

minimal-hand's People

Contributors

adujardin, calciferzh


minimal-hand's Issues

IK using 3D joint coordinates

Hello,
First, I would like to congratulate you on the amazing paper. I also have a question regarding the IK architecture: is there any comparison between the IK architecture you propose here and the algorithm you previously proposed based on Levenberg-Marquardt on the MANO hand? Additionally, could you guide me on applying the IK architecture without running the entire code? I have some ground-truth 3D coordinates and want to obtain the IK parameters from them. Thanks a lot.

How to determine the global rotation?

In the MANO model, the first three pose parameters are the global rotation; I want to know how you define it. I also find that the global rotation of the 3D joints in the GANerated Hands dataset does not match the picture. How do you process it during training?

python app.py issue

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\platform\self_check.py", line 75, in preload_check
    ctypes.WinDLL(build_info.cudart_dll_name)
  File "C:\Program Files\Python37\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 13, in <module>
    from wrappers import ModelPipeline
  File "C:\Users\faker\OneDrive\Desktop\minimal-hand\wrappers.py", line 4, in <module>
    import tensorflow as tf
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 30, in <module>
    self_check.preload_check()
  File "C:\Program Files\Python37\lib\site-packages\tensorflow\python\platform\self_check.py", line 82, in preload_check
    % (build_info.cudart_dll_name, build_info.cuda_version_number))
ImportError: Could not find 'cudart64_100.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable. Download and install CUDA 10.0 from this URL: https://developer.nvidia.com/cuda-90-download-archive

Windows 10. I installed everything in requirements.txt, but I still have issues running the file.

Obtaining MoCap from a two-hand video dataset

Greetings and many thanks for the great work.

I wanted to use your code to extract MoCap data from a first-person RGB video dataset that has a clear view of both hands during a task. Given that your model is restricted to predicting a single hand, I wonder whether it will consistently show a preference for the left hand when presented with videos that display both. If that's the case, I suppose I could parse the dataset twice, flipping it the second time, to obtain both hands' coordinates, right?
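
For what it's worth, here is a sketch of that flip-and-reparse idea. The helper is hypothetical (not part of the repo): it assumes the frame is a NumPy image array and that the wrapper's process method is used as in app.py; if you map predictions back onto the original image, the x-coordinates must be mirrored as well.

import numpy as np

def process_other_hand(model, frame):
    """Run the left-hand model on the other hand by mirroring the frame.

    model: the ModelPipeline / ModelDet wrapper from wrappers.py (assumed interface).
    frame: (H, W, 3) image array.
    """
    flipped = np.ascontiguousarray(frame[:, ::-1])  # horizontal flip
    return model.process(flipped)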

about loss of quat?

Hello, this is great work.
Could you please share some code for Formula 10? When I use it, the loss does not converge.
Thank you!

How can I get coordinates on frame_large?

Hi. Your project is amazing.

I'm building an AR app that puts 3D objects around a finger. I multiply the 32x32 coordinates by 4 to map them onto the 128x128 image.

I wonder how I can get the joint coordinates in the initial frame?
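
One general way to do this, assuming you know how the square crop was taken from the original frame (hypothetical names; the actual crop logic lives in app.py):

import numpy as np

def to_original_coords(uv_net, crop_topleft, crop_size, net_size=128):
    """Map 2D predictions from the network input back to the full frame.

    uv_net:       (N, 2) coordinates in the network input (e.g. the 32x32 grid scaled by 4).
    crop_topleft: (x0, y0) of the square crop in the original frame.
    crop_size:    side length of that square crop, in original-frame pixels.
    Hypothetical helper -- adapt to how app.py actually builds its crop.
    """
    uv_net = np.asarray(uv_net, dtype=float)
    scale = crop_size / float(net_size)
    x0, y0 = crop_topleft
    return uv_net * scale + np.array([x0, y0], dtype=float)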

Training step

Hello.
About DetNet training: are the 2D points, 3D points, and delta maps trained all at once, or one after another?
In my code I train the 2D points and 3D points on FreiHAND at the same time, but I get terrible results.
Hoping for your reply.

IKNet Input

Hi! Thanks for sharing this nice project. Going through your code, it seems to me that the input to IKNet is simply the joint coordinates instead of the encoding you mention in the paper. Is that really the case, or did I miss something? Thanks!

Does your MANO model take in the shape parameter (beta)?

Thanks for your sharing. I have several confusions.

  1. The network structure in the paper shows that DetNet estimates the shape parameter and that the MANO model takes in both shape and pose parameters. But in the code in app.py, it seems that DetNet only outputs joint locations and the MANO model only needs theta to recover the hand mesh.

  2. Is your IKNet pretrained using only MoCap data and then integrated into the whole network pipeline and fine-tuned?

Thanks for your answer.

About the pre-trained network model

Hello, thanks for sharing this amazing code! This is great work!
I have a question about the pre-trained model in the Google Drive.

I downloaded this pre-trained model and ran the app.py code for inference.
I noticed that the hand shape was inconsistent with some hand poses.
For example,

  1. when I tried to bend my middle finger, the hand mesh did not bend accordingly.
  2. when I rotated my hand and let the camera capture the side of the hand (not the palm or back), the hand mesh did not match the hand pose.

So, I tried to retrieve the hand pose (21-joint skeleton) by modifying this line:
_, theta_mpii = model.process(frame)
into
xyz, theta_mpii = model.process(frame)
and displayed "xyz" in the Open3D visualizer.
I found that the hand pose also had the same issues.

I have a feeling that this pre-trained model was not fully converged during training.
Could you please kindly confirm whether I am right? :)
Or is there something wrong in the app.py code?
Thank you!

Output probability

Cool repo. I am wondering if it's possible to also output a per-joint probability. Is that something you have looked into?

Suggestion for extracting 3D keypoints for both hands

Hi @CalciferZh ,
Thanks for this elaborate and easy-to-use repository. I tried your code and it works fine for the right hand, as you said (the input is flipped internally because the model is trained for the left hand).
I went one step further and tried to detect both hands in real time, using your "ModelDet()" from "wrappers.py" to get xyz for every bounding box generated by a hand-detection model.
My confusion is: how can I distinguish the two hands (left or right) to get the corresponding results from your model?
Please share here if anyone has done the same.

Any plans on evaluating on FreiHAND dataset?

I'm curious, as it seems to be one of the better datasets publicly available: not only does it include really accurate 3D poses, but they are all on real images, including challenging poses and object interactions. On top of that, it includes MANO hand-shape ground truths. I would love to see how this model performs.

It also allows seeing how this performs without needing alignment, since both camera intrinsics and scale are included for each image.

I'm also curious whether this would be a good alternative for training IKNet instead of the MoCap data, since it includes the hand-shape ground truths. I'm not sure if I should open a separate issue for that to make it easier for others to find.

About fist

This is great work. Have you tried making a fist? I find that the reconstructed fist looks unnatural.

'NoneType' object has no attribute 'load_from_json'

Hi, @CalciferZh, thanks for releasing this wonderful work, but when testing on my own images I get the error above; can you give some advice?
By the way, after successfully installing all requirements, there is also a GLFW error: "X11: The DISPLAY environment variable is missing. Failed to initialize GLFW."
Hope to receive your response, thanks~

How to extract 3D coordinates

Hello! First of all, thank you for this project, it is awesome!

Please, how can I extract the 3D coordinates of each joint? What is the distance unit? Is it possible to export the hand pose to a file?

Thank you.

Demo on CPU

Hi,

I was wondering whether the demo can run on the CPU. I guess it will not be as fast, but are the code and models compatible with it?

Thanks

Benet

Hand cropping?

Was DetNet trained on cropped hand images, or did you use the full image? I'm a bit confused, since there seems to be no hand cropping in the inference app. It looks like all that is done is converting the image to a square and resizing it to 128x128.

From what I can tell most hand keypoint detection pipelines usually involve first cropping out the hand and then feeding it into the 2D/3D keypoint detector.

[Plan on PyTorch?]

Thanks for your great work!
Do you have a plan for releasing a PyTorch version?

How to use the right-hand model

In config.py, I set OFFICIAL_MANO_PATH to the right-hand model and ran python prepare_mano.py, which gave me a converted right-hand MANO model. But when I use that converted model, the result is very bad. What is going wrong?
In short, I want to know how to use the right-hand MANO model, or how to convert it correctly.
Looking forward to your reply! Thanks a lot.

How to train the model?

Will this code be compatible with TF 2.0?

hand3d_minimal/wrappers.py", line 25, in __init__
    with tf.variable_scope('prior_based_hand'):
AttributeError: module 'tensorflow' has no attribute 'variable_scope'

Also, how do I train the model? Which datasets are used?

About the training

Hey, this is very cool work. By the way, is there any plan for releasing the training code and dataset?

Are there some ways to update the model?

Hi, thanks for sharing. I am also working on hand gesture recognition (HGR), and I want to make some modifications for my product. I do not need to reconstruct the hand model using MANO; I just want to classify different gestures from the skeleton. But it seems your models are pretrained. Are there ways to modify your pretrained model, or should I just retrain it?
Thanks!!!

Could you add some new features?

I'll write it succinctly.

Nice features:

  • Two hands at the same time
  • A training class where you can train on your own dataset, with a training example
  • Hand positional tracking (not only relative position)

Some solutions:
Add a hand finder and feed its output into the network, then calculate the relative position from the center of the detected rectangle.

best regards,
DENNIS

About location map

Hello, I have read the paper and the code, but I am still confused about how to get the ground-truth location maps from the 2D heatmaps. Could you explain it in more detail? Thanks a lot!

How to mix and train the different datasets?

The paper says that DetNet is trained on three datasets: the CMU Panoptic Dataset (CMU), the Rendered Handpose Dataset (RHD), and the GANerated Hands Dataset (GAN).

Since the images of the three datasets are quite different from each other, can you please tell me how to preprocess the images?

Batching

I'm wondering how you would go about changing the batch size of DetNet. I've tried a few things but am unable to load your pretrained weights unless the batch size is 1. Thanks again for your help.

About 3D label

Thank you for the great project. In your project, a tensor of shape Batch × num_points × W × H × 3 is used as the 3D label. I want to know how to generate this 3D label. Can anyone help me?

About SMPL (MoSh) labels

Hello, I have another question.
There are no MoSh labels (SMPL theta and beta) in the STB, RHD, FreiHAND datasets, etc. How do you translate 3D keypoints into a mesh (SMPL theta and beta)?
Hoping for your reply, thanks.

Could you supply me with the training code?

I am very interested in your work and want to follow it. I don't know whether you could provide the training code to me.

Hope for your reply.
Best,
Wu

Questions about training IKNet

Thank you for the great project. I have a few questions about training IKNet:

  1. When changing the original 16 rotations of MANO into 21 rotations, do W, T0, I0, M0, R0, and L0 share the rotation of W in the original MANO?
  2. I found that the joints_xyz calculated from the MANO ref_pose and the transformed 21 rotation parameters (using the method in hand_mesh.py) is not equal to the 'J_transformed' saved in the MANO pkl file; the joint order has been adjusted according to kinematics.py. When using the MANO dataset to train IKNet, how did you get the ground-truth 3D joint annotation in Lxyz? Is the calculation method of FK(Q) the same as the calculation of joint_xyz in hand_mesh.py?

How can I save joint location?

That's a nice job. By the way, could you tell me how or where I can save the real-time joint location information? Thank you very much!
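
As a rough sketch (assuming you collect the xyz output of model.process for each frame, as in the snippet in the pre-trained-model issue above), one simple way is to accumulate the joint locations and dump them with NumPy:

import numpy as np

all_xyz = []
# Inside the capture loop you would do something like:
#     xyz, theta_mpii = model.process(frame)
#     all_xyz.append(xyz)
# After the loop, save everything to disk:
np.save('joint_locations.npy', np.array(all_xyz))  # shape: (num_frames, 21, 3)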
