A TensorFlow implementation of VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera.
To obtain the Caffe model required by this repository, please contact the author of the paper.
- Python 3.5
- opencv-python 3.4.4.19
- tensorflow-gpu 1.12.0
- pycaffe
- matplotlib 3.0.0 or 3.0.2 (this module occasionally shuts down for unknown reasons)
- ……
Fedora 29
pip3 install -r requirements.txt --user
sudo dnf install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel glog-devel gflags-devel lmdb-devel atlas-devel python-lxml boost-python3-devel
git clone https://github.com/BVLC/caffe.git
cd caffe
sudo make all
sudo make runtest
sudo make pycaffe
sudo make distribute
sudo cp -a .build_release/lib/* /usr/lib64/
sudo cp -a distribute/python/caffe/ /usr/lib/python3.7/site-packages/
- Drop the Caffe model into models/caffe_model.
- Run init_weights.py to generate the TensorFlow model.
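The internals of init_weights.py are not described here. One detail that any Caffe-to-TensorFlow weight conversion has to handle is tensor layout: Caffe stores convolution kernels as (out_channels, in_channels, kernel_h, kernel_w), while TensorFlow's conv2d expects (kernel_h, kernel_w, in_channels, out_channels). The sketch below shows only that transpose, using NumPy; the helper name caffe_to_tf_kernel is hypothetical and the code is not taken from the repository.

```python
import numpy as np


def caffe_to_tf_kernel(caffe_kernel):
    """Convert a Caffe conv kernel (out, in, kh, kw) to TF layout (kh, kw, in, out).

    Hypothetical helper for illustration only; not the repository's code.
    """
    if caffe_kernel.ndim != 4:
        raise ValueError("expected a 4-D convolution kernel")
    # Spatial axes first, then input channels, then output channels.
    return np.transpose(caffe_kernel, (2, 3, 1, 0))


# Example: a 3x3 kernel mapping 16 input channels to 32 output channels.
w_caffe = np.random.rand(32, 16, 3, 3).astype(np.float32)
w_tf = caffe_to_tf_kernel(w_caffe)
print(w_tf.shape)  # (3, 3, 16, 32)
```

With pycaffe, a layer's kernel is typically available as net.params[layer_name][0].data and its bias as net.params[layer_name][1].data. Both frameworks compute cross-correlation, so the kernel needs no spatial flip, only the axis reordering.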
- benchmark.py is a class implementation containing all the elements needed to run the model.
- run_estimator.py is a script for running the model on a video stream.
- run_estimator_ps.py is a multiprocessing version of that script. If the 3D plotting function shuts down in run_estimator.py, you can try this one instead.
- run_estimator_robot.py provides ROS and serial connections for robot control, in addition to the functions in run_estimator.py.
- [NOTE] To run the video-stream scripts mentioned above:
i) click the left mouse button to confirm the simple static bounding box generated by the HOG method;
ii) press any key to exit while the network is running.
- run_pic.py is a script for running the model on a single picture; the outputs are the 4×21 heatmaps and the 2D results.
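In the VNect paper, each joint has one 2D heatmap plus three location maps (X, Y, Z), which matches the 4×21 maps reported by run_pic.py. The 2D joint position is the heatmap maximum, and the 3D position is read from the three location maps at that same pixel. The sketch below shows this read-out in NumPy, assuming channel-last arrays of shape (H, W, 21); the function name extract_joints is hypothetical, and this is not the repository's optimized extraction code.

```python
import numpy as np


def extract_joints(heatmaps, x_maps, y_maps, z_maps):
    """Read 2D and 3D joint positions from VNect-style output maps.

    Each argument is assumed to have shape (H, W, J), e.g. J = 21.
    Returns (uv, xyz, conf):
      uv   -- (J, 2) integer (row, col) of each heatmap maximum
      xyz  -- (J, 3) location-map values at those pixels
      conf -- (J,) peak heatmap values
    """
    h, w, j = heatmaps.shape
    # Flatten the spatial axes so one argmax per joint finds its peak.
    flat_idx = heatmaps.reshape(h * w, j).argmax(axis=0)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    joints = np.arange(j)
    conf = heatmaps[rows, cols, joints]
    # The 3D position of each joint is read at its 2D heatmap peak.
    xyz = np.stack(
        [x_maps[rows, cols, joints],
         y_maps[rows, cols, joints],
         z_maps[rows, cols, joints]],
        axis=1,
    )
    uv = np.stack([rows, cols], axis=1)
    return uv, xyz, conf


# Example with random maps and one planted peak.
np.random.seed(0)
H, W, J = 46, 46, 21
hm = np.random.rand(H, W, J) * 0.5
hm[10, 20, 3] = 1.0  # clear peak for joint 3
xm, ym, zm = (np.random.rand(H, W, J) for _ in range(3))
uv, xyz, conf = extract_joints(hm, xm, ym, zm)
print(uv[3], conf[3])  # [10 20] 1.0
```

In VNect the maps have a lower resolution than the input image, so uv must be rescaled to image coordinates. Because every image is assumed to contain all 21 joints, a low value in conf may hint that a joint is not actually visible; choosing a threshold is left to the user.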
- In some cases the 3D plotting function shuts down in the script for unknown reasons, possibly because of differences between programming environments. A pull request that fixes this would be greatly appreciated.
- The input image in this implementation is in BGR color format (as read by cv2.imread()), and pixel values are normalized to the range [-0.4, 0.6).
- The joint-parent map (detailed information in joint_index.xlsx):
- The joint positions (index numbers as above):
- Every input image is assumed to contain all 21 joints, so the model can easily produce wrong results when a joint is not actually present in the input.
- In some cases the estimation results are not as good as those shown in the paper author's promotional video.
- UPDATE: the model now runs faster thanks to optimized coordinate extraction!
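The input format described in the notes above (BGR channel order, pixel values shifted into roughly [-0.4, 0.6)) can be sketched as follows. This assumes the mapping pixel / 255 - 0.4; the exact formula used by the repository is an assumption here and should be checked in its preprocessing code. The function name preprocess_bgr is hypothetical.

```python
import numpy as np


def preprocess_bgr(img_bgr):
    """Scale a BGR uint8 image (as returned by cv2.imread) for the network.

    Assumes the mapping pixel / 255 - 0.4 (hypothetical; verify against
    the repository). Channel order is left as BGR.
    """
    img = np.asarray(img_bgr, dtype=np.float32)
    scaled = img / 255.0 - 0.4
    # Add a leading batch dimension: (H, W, 3) -> (1, H, W, 3).
    return scaled[np.newaxis, ...]


# Example with a blank stand-in frame instead of cv2.imread(...).
frame = np.zeros((240, 320, 3), dtype=np.uint8)
batch = preprocess_bgr(frame)
print(batch.shape, batch.dtype)  # (1, 240, 320, 3) float32
```

If frames come from a source that delivers RGB rather than BGR, reversing the last axis (img[..., ::-1]) converts them before scaling.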
- Optimize the structure of the code.
- Implement a better bounding box strategy.
- Implement the training script.
Refer to MPI-INF-3DHP Dataset
- original MATLAB implementation provided by the model author
- timctho/VNect-tensorflow
- EJShim/vnect_estimator