Human-pose-estimation

Human pose estimation is a computer vision problem that involves detecting the positions of human bodies and their keypoints in images or videos. This repository hosts a pre-trained model (simple-HRNet) specialized for this task, enabling human pose detection across diverse scenarios.

Table of Contents

  1. Installation
  2. Usage
  3. Output
  4. Visualization of Model Output
  5. Limitations
  6. Improvements
  7. References

Installation

  • Clone the Repository:
    git clone https://github.com/Jd8111997/Human-pose-estimation
  • Navigate to the Simple-HRNet Subdirectory:
    cd simple-HRNet
  • Install Required Packages:
    pip install -r requirements.txt
  • Install the Ultralytics package:
    pip install ultralytics
  • Obtain YOLOv5:
  • Download the official pre-trained weights for the model:
    • For COCO w48 384x288 (default in inference.py): pose_hrnet_w48_384x288.pth
    • Create a new weights directory within the main repository and copy the downloaded pre-trained weights there.
  • Optional uninstallation (for CPU-only machines):
    • Uninstall the nvidia_cublas_cu11 package:
      pip uninstall nvidia_cublas_cu11
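After copying the checkpoint, a quick check that it is where you expect can save a failed run. The sketch below is a minimal example, assuming the checkpoint lives at weights/pose_hrnet_w48_384x288.pth relative to the main repository root, as described above. find_weights is a hypothetical helper, not part of inference.py, and the script itself may look for the file in a different location.

```python
from pathlib import Path

# Default checkpoint named in the installation steps.
WEIGHTS_FILE = "pose_hrnet_w48_384x288.pth"


def find_weights(repo_root: str = ".") -> Path:
    """Return the checkpoint path, assuming it sits in <repo_root>/weights/.

    Hypothetical layout check: raises FileNotFoundError if the file is missing.
    """
    path = Path(repo_root) / "weights" / WEIGHTS_FILE
    if not path.is_file():
        raise FileNotFoundError(
            f"Pre-trained weights not found at {path}. "
            f"Download {WEIGHTS_FILE} and copy it into the weights directory."
        )
    return path
```

Calling find_weights() from the repository root returns the path when the file is present, and otherwise raises an error that names the expected location.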

Usage

  • The inference.py script estimates human poses in images: it detects a bounding box around each person and then predicts that person's keypoints. The available options are:
python inference.py [-h] [--visualize] [--output_folder_name OUTPUT_FOLDER_NAME] image_path
  • image_path: Path to a single image or a directory containing multiple images.
  • --visualize: Flag to enable visualization of bounding box detection and keypoints in the input image.
  • --output_folder_name OUTPUT_FOLDER_NAME: Path to the directory in which to save the output images with visualizations.
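The usage line above has the exact form that Python's argparse prints, so the interface can be sketched as follows. This is a minimal reconstruction under that assumption, not the actual contents of inference.py. collect_image_paths and the accepted extensions in IMAGE_SUFFIXES are hypothetical; they illustrate how image_path can name either a single image or a directory of images.

```python
import argparse
from pathlib import Path
from typing import List

# Assumption: file extensions treated as images when image_path is a directory.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def build_parser() -> argparse.ArgumentParser:
    """Parser matching the documented usage line (reconstruction, not the original)."""
    parser = argparse.ArgumentParser(
        prog="inference.py", description="Human pose estimation with simple-HRNet."
    )
    parser.add_argument(
        "image_path", help="Path to a single image or a directory of images."
    )
    parser.add_argument(
        "--visualize",
        action="store_true",
        help="Draw bounding boxes and keypoints on the input image.",
    )
    parser.add_argument(
        "--output_folder_name",
        default=None,
        help="Directory in which to save the visualized output images.",
    )
    return parser


def collect_image_paths(image_path: str) -> List[Path]:
    """Hypothetical helper: expand a directory into its image files, or wrap a single file."""
    path = Path(image_path)
    if path.is_dir():
        return sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    return [path]
```

With this parser, the arguments images --visualize --output_folder_name results produce visualize=True and output_folder_name='results', while omitting --visualize leaves it False.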

Output

When run, inference.py applies the human pose estimation model and, at the end, prints a dictionary containing the model output. This dictionary comprises:

  • Key: the name of the input image.
  • Value: a nested dictionary containing:
    • bounding_box: a NumPy array of shape (N, 4) holding the bounding box of each of the N detected people as pixel coordinates (x1, y1, x2, y2).
    • key_points: a NumPy array of shape (N, 17, 3) holding the 17 COCO keypoints of each detected person; following simple-HRNet's convention, each keypoint is stored as (y, x, confidence).

For example:

{'baseball1.jpg': {'bounding_box': array([[141,  45, 618, 681]], dtype=int32), 'key_points': array([[[     144.38,      372.88,      1.0009],
        [     131.12,      372.88,     0.87439],
        [     137.75,      359.62,     0.98067],
        [     117.88,      319.88,     0.48314],
        ...
        [     561.75,      200.62,     0.88002]]], dtype=float32)}}
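To consume this output in your own code, it helps to attach the standard COCO joint names to each row. A minimal sketch, assuming the printed dictionary is available as a Python object and that each keypoint follows simple-HRNet's (y, x, confidence) order. The example above is consistent with this: its last joint (right ankle) has a first coordinate of 561.75, near the bottom edge of the box at y = 681. named_keypoints and its 0.5 confidence threshold are illustrative choices, not part of the repository.

```python
from typing import Dict, List, Tuple

import numpy as np

# The 17 COCO keypoints, in the standard COCO order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def named_keypoints(
    results: Dict[str, dict], min_conf: float = 0.5
) -> Dict[str, List[Dict[str, Tuple[float, float, float]]]]:
    """Map each image to a list of people, each a {joint name: (x, y, confidence)} dict.

    Assumes key_points has shape (N, 17, 3) with rows stored as (y, x, confidence).
    Joints whose confidence is below min_conf are dropped.
    """
    named = {}
    for image_name, result in results.items():
        people = []
        for person in np.asarray(result["key_points"]):
            joints = {}
            for name, (y, x, conf) in zip(COCO_KEYPOINTS, person):
                if conf >= min_conf:
                    # Reorder to (x, y, confidence), the usual order for drawing libraries.
                    joints[name] = (float(x), float(y), float(conf))
            people.append(joints)
        named[image_name] = people
    return named
```

Lowering min_conf keeps more joints, at the cost of including less reliable ones such as occluded limbs.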

Visualization of Model Output

The pre-trained model performs well at detecting human poses across a diverse set of test images, and it is particularly effective on CCTV footage. Additional example images are available in the repository's assets.







Limitations

  • Although this model is pre-trained on the COCO keypoints dataset, it performs well at detecting human poses across different test distributions and in real CCTV footage. However, further testing is needed on diverse datasets and edge cases, including different occlusions, viewing angles, and background contexts.
  • This repository uses the open-source simple-HRNet model for human pose detection. It follows a top-down approach: an object detector first finds each person, and a second deep learning model then extracts keypoints from each detected region. Consequently, the model's accuracy depends heavily on the performance of the underlying object detector. In some images the detector misses people or produces false positives, which reduces keypoint accuracy.
  • Because it is pre-trained on the COCO keypoints dataset, the model can regress only the 17 COCO keypoints. Detecting additional or more fine-grained keypoints requires fine-tuning.
  • In some images, as illustrated below, the predicted bounding box does not cover the entire human body. Consequently, the keypoint network cannot reliably regress the keypoints that fall outside the box.


  • In certain test images, multiple overlapping bounding boxes are predicted, which results in redundant keypoint predictions. Additionally, when several people are close together in an image, the performance of the keypoint detection model degrades. Several examples of these scenarios are presented below.
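The top-down pipeline described above can be sketched in a few lines, which also shows why a box that misses part of the body loses those keypoints: the second stage only ever sees the crop. This is a simplified illustration, not simple-HRNet's implementation. detect_people and estimate_keypoints are hypothetical stand-ins for the YOLO detector and the HRNet model, and the real pipeline also resizes each crop to the network's input resolution (384x288 for the default weights) and scales the keypoints back, which is omitted here.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def top_down_pose(
    image: np.ndarray,
    detect_people: Callable[[np.ndarray], List[Box]],
    estimate_keypoints: Callable[[np.ndarray], np.ndarray],
) -> Tuple[np.ndarray, np.ndarray]:
    """Stage 1 finds people; stage 2 regresses 17 keypoints inside each box.

    estimate_keypoints receives one crop and returns a (17, 3) array of
    (y, x, confidence) in crop coordinates.
    """
    height, width = image.shape[:2]
    boxes, keypoints = [], []
    for x1, y1, x2, y2 in detect_people(image):
        # Clip the box to the image bounds; skip boxes that become empty.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(width, x2), min(height, y2)
        if x2 <= x1 or y2 <= y1:
            continue
        # Stage 2 sees only this crop, so body parts outside the box cannot be localized.
        crop = image[y1:y2, x1:x2]
        pts = np.array(estimate_keypoints(crop), dtype=np.float32)
        # Shift from crop coordinates back to full-image coordinates.
        pts[:, 0] += y1
        pts[:, 1] += x1
        boxes.append((x1, y1, x2, y2))
        keypoints.append(pts)
    return (
        np.array(boxes, dtype=np.int32).reshape(-1, 4),
        np.array(keypoints, dtype=np.float32).reshape(-1, 17, 3),
    )
```

Every error in stage 1 propagates: a missed person yields no keypoints at all, a false positive yields a spurious skeleton, and duplicate boxes yield duplicate skeletons.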





Improvements

  • Because this is a generic pre-trained model, fine-tuning on task-specific data is essential to improve its accuracy for a particular application.
  • The default object detector in inference.py is YOLOv5 nano, which is lightweight (only 3.75 million parameters). Users can modify inference.py to try heavier models such as YOLOv5 medium or YOLOv5 large for more accurate object detection, although this may increase inference time.
  • With the default settings, the script has an inference time of approximately 1.5 seconds on a CPU-only machine with 8 GB of RAM. Running it on a GPU can significantly speed up inference, and using TensorRT with the simple-HRNet package can reduce inference time further.
  • simple-HRNet is an older method (released four years ago), and several more recent state-of-the-art approaches improve keypoint detection accuracy. The MMPose framework, which supports a range of cutting-edge human pose estimation models, is worth exploring for deployment.
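To check the 1.5 second figure on your own hardware, or to compare CPU, GPU, and TensorRT setups, a small wall-clock benchmark is enough. A minimal sketch: measure_latency is a hypothetical helper, and run stands for whatever call performs a single inference in your copy of inference.py.

```python
import statistics
import time
from typing import Any, Callable, Dict


def measure_latency(
    run: Callable[[], Any], warmup: int = 2, repeats: int = 10
) -> Dict[str, float]:
    """Time repeated calls to run() and summarize the latency in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up calls absorb one-off costs (lazy initialization, caching) and are not timed.
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
    }
```

On a GPU, CUDA work runs asynchronously, so run should call torch.cuda.synchronize() before returning; otherwise the timer may stop before the computation has finished. The median is usually a more stable summary than the mean when a few runs are slowed by background load.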

References
