Human pose estimation is a computer vision problem that involves detecting human body positions and their key points in images or videos. This repository hosts a pre-trained model (simple-HRNet) that specializes in this task, enabling the detection of human poses in diverse scenarios.
- Clone the Repository:
git clone https://github.com/Jd8111997/Human-pose-estimation
- Navigate to the Simple-HRNet Subdirectory:
cd simple-HRNet
- Install Required Packages:
pip install -r requirements.txt
- Install the Ultralytics Package:
pip install ultralytics
- Obtain YOLOv5:
- Clone [YOLOv5](https://github.com/ultralytics/yolov5) into the ./simple-HRNet/models_/detectors folder and rename the cloned folder from yolov5 to yolo.
- Download the official pre-trained weights for the model:
- For COCO w48 384x288 (Default in inference.py): pose_hrnet_w48_384x288.pth
- Create a new weights directory within the main repository and copy the downloaded pre-trained weights there.
- Optional Uninstallation (For CPU-Only Machine):
- Uninstall the nvidia_cublas_cu11 package:
pip uninstall nvidia_cublas_cu11
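The setup steps above leave two things on disk that are easy to misplace: the renamed YOLOv5 folder and the downloaded weights file. The following is a minimal sketch of a check for both. The helper name `find_missing` is hypothetical, and it assumes that both paths are relative to the root of the cloned repository, which the steps above do not state explicitly.

```python
from pathlib import Path

# Paths taken from the setup steps. Their location relative to the
# repository root is an assumption; adjust them if your layout differs.
EXPECTED_PATHS = (
    Path("simple-HRNet") / "models_" / "detectors" / "yolo",
    Path("weights") / "pose_hrnet_w48_384x288.pth",
)


def find_missing(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p.as_posix() for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Setup looks complete.")
```

Run it from the repository root before the first inference run; an empty result only means the paths exist, not that the files are valid.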
- The inference.py script analyzes human poses in images, offering the ability to detect bounding boxes and keypoints. Below are the available options:
python inference.py [-h] [--visualize] [--output_folder_name OUTPUT_FOLDER_NAME] image_path
- image_path: Path to a single image or a directory containing multiple images.
- --visualize: Flag to enable visualization of bounding box detection and keypoints in the input image.
- --output_folder_name OUTPUT_FOLDER_NAME: Path to the output directory to save the final output images with visualizations.
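For readers who want to see how a command line like the one above maps onto parsed values, here is a minimal sketch using Python's standard argparse module. It mirrors the usage string only; it is not the actual parser from inference.py. The default of `None` for `--output_folder_name` is an assumption, and the paths `images/` and `results` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented usage string."""
    parser = argparse.ArgumentParser(prog="inference.py")
    parser.add_argument(
        "image_path",
        help="Path to a single image or a directory containing multiple images.",
    )
    parser.add_argument(
        "--visualize",
        action="store_true",  # a flag: present means True, absent means False
        help="Visualize bounding box detection and keypoints in the input image.",
    )
    parser.add_argument(
        "--output_folder_name",
        default=None,  # assumed default; the real script may use another value
        help="Output directory for the final images with visualizations.",
    )
    return parser


# Equivalent to: python inference.py images/ --visualize --output_folder_name results
args = build_parser().parse_args(
    ["images/", "--visualize", "--output_folder_name", "results"]
)
print(args.image_path, args.visualize, args.output_folder_name)
# prints: images/ True results
```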
When executed, the inference.py script runs the human pose estimation model and, at the end, prints a dictionary containing the model output. This dictionary comprises:
- Key: Represents the name of the input image.
- Value: A nested dictionary containing:
- bounding_box: A numpy array indicating the detected bounding box coordinates of the human.
- key_points: A numpy array containing 17 keypoints for the detected human within the bounding box.
For example:
{'baseball1.jpg': {'bounding_box': array([[141, 45, 618, 681]], dtype=int32), 'key_points': array([[[ 144.38, 372.88, 1.0009],
[ 131.12, 372.88, 0.87439],
[ 137.75, 359.62, 0.98067],
[ 117.88, 319.88, 0.48314],
...
[ 561.75, 200.62, 0.88002]]], dtype=float32)}}
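The third value in each keypoint row looks like a confidence score, so a common next step is to keep only the confident keypoints and attach names to them. The sketch below assumes the 17 rows follow the standard COCO keypoint order, which is plausible for a COCO-trained model but not stated above. The helper name `confident_keypoints` and the 0.5 threshold are hypothetical. The two coordinate values are passed through unchanged, because the text above does not say whether they are ordered (x, y) or (y, x).

```python
# Standard COCO keypoint order (17 entries).
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def confident_keypoints(person_keypoints, min_confidence=0.5):
    """Map keypoint name -> (coord_0, coord_1) for rows above the threshold.

    person_keypoints is a sequence of rows with three values each.
    Plain lists and numpy arrays both work, since rows are only iterated.
    """
    kept = {}
    for name, row in zip(COCO_KEYPOINT_NAMES, person_keypoints):
        coord_0, coord_1, confidence = (float(v) for v in row)
        if confidence >= min_confidence:
            kept[name] = (coord_0, coord_1)
    return kept


# First four rows from the example output above.
rows = [
    [144.38, 372.88, 1.0009],
    [131.12, 372.88, 0.87439],
    [137.75, 359.62, 0.98067],
    [117.88, 319.88, 0.48314],
]
print(list(confident_keypoints(rows)))
# prints: ['nose', 'left_eye', 'right_eye']
```

With the printed dictionary stored in a variable such as `output`, the rows for the first detected person in the example would be `output['baseball1.jpg']['key_points'][0]`.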
The pre-trained model demonstrates strong performance when analyzing diverse images to detect human poses. Particularly, it excels in detecting human poses within CCTV footage. Additional images are attached in the assets.
- Although this pre-trained model is trained on the COCO keypoints dataset, it performs well in detecting human poses across different test distributions and in real CCTV footage. However, further testing is needed on diverse datasets and edge cases, including different occlusions, viewing angles, and background contexts.
- This repository uses the open-source simple-HRNet model for human pose detection. It follows a top-down approach: an object detector first detects humans, and a second deep learning model then extracts key points from the object detector's output. Consequently, the model's accuracy relies heavily on the performance of the underlying object detector. In some images, the object detector misses detections or produces false positives, which reduces key point detection accuracy.
- Being pretrained on the COCO keypoints dataset, this model is constrained to regress only 17 keypoints. To identify more intricate keypoints, fine-tuning becomes necessary.
- In specific images, as illustrated below, the predicted bounding box occasionally fails to encompass the entire human body. Consequently, the key-point detector network struggles to effectively regress all keypoints.
- As it's a pretrained model, fine-tuning for specific tasks is essential to enhance model accuracy.
- The default object detector in inference.py is YOLOv5 nano, a lightweight model (only 3.75 million parameters). Users can modify inference.py to experiment with heavier models like YOLOv5 medium and YOLOv5 large for improved object detection accuracy. However, this may increase inference time.
- In default settings, the script runs with an inference time of approximately 1.5 seconds on a CPU-only machine with 8GB RAM. Deploying it on a GPU machine can significantly enhance the inference speed. Additionally, leveraging TensorRT with the simple-HRNet package can optimize inference time.
- simple-HRNet is an older method (released four years ago), and several newer state-of-the-art approaches can improve key point detection accuracy. Consider exploring the MMPose framework, which supports various cutting-edge human pose detection models for effective deployment.
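The top-down approach mentioned in the list above can be sketched in a few lines. This is an illustration of the two-stage control flow only, not the code from inference.py. The functions `top_down_pose`, `fake_detector`, and `fake_pose_estimator` are hypothetical stand-ins, and the (x1, y1, x2, y2) box layout is an assumption.

```python
def top_down_pose(image, detect_people, estimate_keypoints):
    """Run a two-stage (top-down) pipeline on one image.

    detect_people(image) -> list of (x1, y1, x2, y2) boxes    [hypothetical]
    estimate_keypoints(image, box) -> list of keypoint rows   [hypothetical]
    """
    results = []
    # Stage 1: the object detector proposes one box per person.
    for box in detect_people(image):
        # Stage 2: the pose network only ever sees detected boxes.
        results.append(
            {"bounding_box": box, "key_points": estimate_keypoints(image, box)}
        )
    return results


# Stand-in stages for illustration only; they are not the real models.
def fake_detector(image):
    return image["boxes"]


def fake_pose_estimator(image, box):
    x1, y1, x2, y2 = box
    # A single dummy keypoint at the box center with confidence 1.0.
    return [((x1 + x2) / 2, (y1 + y2) / 2, 1.0)]


one_person = {"boxes": [(141, 45, 618, 681)]}
missed = {"boxes": []}  # the detector found nobody

print(len(top_down_pose(one_person, fake_detector, fake_pose_estimator)))
# prints: 1
print(len(top_down_pose(missed, fake_detector, fake_pose_estimator)))
# prints: 0
```

The second call shows why detector quality matters: when stage 1 returns no box, stage 2 never runs, so no keypoints can be recovered for that person.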