

Detecting Sign Language using action detection

Overview

This project interprets sign language from actions performed by a user. Each action is mapped, through a dictionary, to its meaning (a word). The model is an LSTM neural network, chosen because of the sequential nature of the data. Each action consists of a fixed number of 30 frames, and each frame contains landmarks that are detected by MediaPipe and stored as NumPy arrays.
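The word dictionary and the expected shape of one recorded action can be pictured with the small sketch below. The action names are hypothetical placeholders, and the per-frame feature length of 1662 is an assumption (it is the length obtained by concatenating MediaPipe Holistic pose, face and hand landmarks); the real project may use different values.

```python
# Hypothetical vocabulary: the actual project defines its own list of actions.
ACTIONS = ["hello", "thanks", "iloveyou"]

# Word -> class index, used as the training label for each recorded action.
label_map = {word: idx for idx, word in enumerate(ACTIONS)}
# Class index -> word, used to turn a predicted index back into a word.
index_to_word = {idx: word for word, idx in label_map.items()}

FRAMES_PER_ACTION = 30      # fixed number of frames per action
FEATURES_PER_FRAME = 1662   # assumption: length of one flattened landmark vector


def sequence_shape_ok(sequence):
    """Return True if an action has exactly 30 frames of equal-length landmark vectors."""
    return len(sequence) == FRAMES_PER_ACTION and all(
        len(frame) == FEATURES_PER_FRAME for frame in sequence
    )


dummy_action = [[0.0] * FEATURES_PER_FRAME for _ in range(FRAMES_PER_ACTION)]
print(label_map)                        # {'hello': 0, 'thanks': 1, 'iloveyou': 2}
print(index_to_word[2])                 # iloveyou
print(sequence_shape_ok(dummy_action))  # True
```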

Why LSTM

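Since the data is sequential (frames in a fixed order), a recurrent model is a natural fit. A simple RNN was tried first but yielded poor results during inference: over long sequences the gradients shrink as they are propagated back through time, so the weights are not updated effectively. This is the vanishing gradient problem, and it caused the simple RNN to perform poorly. An LSTM's gated cell state is designed to mitigate this problem.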
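The vanishing gradient can be seen numerically in a toy scalar RNN, h_t = tanh(w * h_{t-1} + u * x_t). The gradient of the last hidden state with respect to the first is a product of one factor w * (1 - h_t^2) per time step; with |w| < 1 each factor is below 1, so the product shrinks geometrically with the number of frames. The weights and the constant input below are arbitrary illustrative values, not taken from the project.

```python
import math


def rnn_gradient_through_time(w, u, inputs, h0=0.0):
    """Return dh_T/dh_0 for a scalar RNN h_t = tanh(w * h_{t-1} + u * x_t).

    By the chain rule, dh_T/dh_0 is the product over t of w * (1 - h_t ** 2),
    because d/da tanh(a) = 1 - tanh(a) ** 2.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # local derivative dh_t/dh_{t-1}
    return grad


inputs = [0.5] * 30  # 30 time steps, one per frame of an action
grads = {
    steps: rnn_gradient_through_time(w=0.9, u=1.0, inputs=inputs[:steps])
    for steps in (1, 5, 10, 30)
}
for steps, g in grads.items():
    print(f"{steps:2d} steps: dh_T/dh_0 = {g:.2e}")
```

Over 30 frames the gradient reaching the first frame is many orders of magnitude smaller than after a single step, which is why early frames barely influence the weight updates of a plain RNN.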

Process

  1. Data Preparation: Using MediaPipe and OpenCV, individual actions are recorded and stored as frames. To provide sufficient training data, each action consists of 30 videos (sequences). Each sequence is split into 30 frames, and each frame is represented as a NumPy array containing all detected landmarks (face, limbs, fingers, etc.). After the actions are captured, the arrays are labelled with their corresponding action names so that the model can predict them after training.

  2. Training: The sequences are partitioned into training and test sets and fed into the model. The model comprises 3 LSTM layers and Dense layers, with a final softmax layer that predicts the action with the highest probability. The trained model is then exported in .h5 format and loaded for inference.

  3. Inference: During inference, the probability distribution over the words is displayed on screen to monitor the model's confidence. Predicted words are also appended to a sentence to keep track of previously predicted actions.
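For the data preparation step, one frame's landmarks can be flattened into a single vector and the recorded sequences paired with their labels as sketched below. The landmark counts (33 pose landmarks with x, y, z, visibility; 468 face and 21 per-hand landmarks with x, y, z) follow a common MediaPipe Holistic setup and are an assumption, as is zero-filling parts that were not detected. In a live pipeline the tuples would be built from the Holistic results (e.g. results.pose_landmarks.landmark); here plain tuples keep the sketch self-contained.

```python
# Assumed landmark counts for MediaPipe Holistic: pose, face mesh, and each hand.
POSE_LANDMARKS, FACE_LANDMARKS, HAND_LANDMARKS = 33, 468, 21


def flatten(landmarks, count, values_per_landmark):
    """Flatten landmark tuples into one list, or zero-fill if the part was not detected."""
    size = count * values_per_landmark
    if landmarks is None:
        return [0.0] * size
    flat = [float(v) for landmark in landmarks for v in landmark]
    if len(flat) != size:
        raise ValueError(f"expected {size} values, got {len(flat)}")
    return flat


def extract_keypoints(pose, face, left_hand, right_hand):
    """Concatenate every landmark of one frame into a single feature vector."""
    return (
        flatten(pose, POSE_LANDMARKS, 4)          # x, y, z, visibility
        + flatten(face, FACE_LANDMARKS, 3)        # x, y, z
        + flatten(left_hand, HAND_LANDMARKS, 3)   # x, y, z
        + flatten(right_hand, HAND_LANDMARKS, 3)  # x, y, z
    )


def build_dataset(recordings, label_map):
    """Pair each recorded sequence with its action's class index.

    recordings maps an action name to a list of sequences; each sequence is a
    list of 30 per-frame vectors produced by extract_keypoints.
    """
    X, y = [], []
    for action, sequences in recordings.items():
        for sequence in sequences:
            X.append(sequence)
            y.append(label_map[action])
    return X, y


# Example: a frame where only the pose was detected (face and hands zero-filled).
frame = extract_keypoints([(0.5, 0.5, 0.0, 1.0)] * 33, None, None, None)
print(len(frame))  # 1662
```

Zero-filling keeps every frame the same length, which the fixed-shape LSTM input requires even when a hand leaves the camera view.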
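The training step's label handling and the final softmax can be illustrated without a deep-learning framework. The sketch below one-hot encodes labels (the format a categorical cross-entropy loss expects), shuffles and partitions the sequences, and shows how a softmax turns the last layer's scores into probabilities whose argmax is the predicted action. The 5% test fraction and the fixed seed are hypothetical choices; in Keras the model itself would be a Sequential stack of LSTM layers followed by Dense layers ending in Dense(num_classes, activation="softmax").

```python
import math
import random


def one_hot(index, num_classes):
    """Encode a class index as a one-hot list, e.g. for a categorical cross-entropy loss."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec


def train_test_split(X, y, test_fraction=0.05, seed=0):
    """Shuffle (sequence, label) pairs together and hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx], [X[i] for i in test_idx],
        [y[i] for i in train_idx], [y[i] for i in test_idx],
    )


def softmax(logits):
    """Turn raw scores into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_index(logits):
    """Index of the action with the highest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


X = [f"seq{i}" for i in range(20)]  # stand-ins for 20 recorded sequences
y = [one_hot(i % 2, 2) for i in range(20)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.25)
print(len(X_train), len(X_test))       # 15 5
print(predict_index([2.0, 1.0, 0.1]))  # 0
```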
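The inference step can be sketched as a rolling window over the most recent 30 frames: once the window is full, the model produces one probability per word, and a word is appended to the sentence when it is confident and differs from the previous word. The confidence threshold, the sentence length cap and the fake_model stand-in are hypothetical; with a trained Keras model, predict_fn could wrap something like model.predict(np.expand_dims(window, axis=0))[0].

```python
from collections import deque

SEQUENCE_LENGTH = 30  # frames per action, matching training
THRESHOLD = 0.7       # hypothetical minimum confidence before a word is accepted


def run_inference(frame_stream, predict_fn, actions, threshold=THRESHOLD, max_words=5):
    """Slide a 30-frame window over incoming keypoints and build up a sentence.

    predict_fn is a stand-in for the trained model: it takes the current window
    (a list of 30 keypoint vectors) and returns one probability per action.
    """
    window = deque(maxlen=SEQUENCE_LENGTH)  # keeps only the most recent 30 frames
    sentence = []
    history = []  # probability distributions, e.g. for drawing bars on screen
    for keypoints in frame_stream:
        window.append(keypoints)
        if len(window) < SEQUENCE_LENGTH:
            continue  # not enough frames yet for a full action
        probs = predict_fn(list(window))
        history.append(probs)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Record the word only if confident and different from the last word.
        if probs[best] > threshold and (not sentence or sentence[-1] != actions[best]):
            sentence.append(actions[best])
        sentence = sentence[-max_words:]  # keep the displayed sentence short
    return sentence, history


def fake_model(window):
    """Toy predictor: 'hello' for frames below 40, 'thanks' afterwards."""
    return [0.9, 0.05, 0.05] if window[-1] < 40 else [0.05, 0.9, 0.05]


sentence, history = run_inference(range(60), fake_model, ["hello", "thanks", "iloveyou"])
print(sentence)      # ['hello', 'thanks']
print(len(history))  # 31
```

Requiring the word to differ from the previous one stops a single held gesture from filling the sentence with repeats, since consecutive overlapping windows usually yield the same prediction.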

Inference


Evaluation of Model


Dependency Notes

  1. To view TensorBoard, downgrade protobuf from 3.20.3 to 3.20.1 with pip install --upgrade protobuf==3.20.1. TensorBoard can then be brought up using tensorboard --logdir=.

MediaPipe

MediaPipe provides trained ML models for building pipelines that perform computer vision inference over arbitrary sensory data such as video or audio. In this project, it is used to detect facial and body features and output them as landmarks that are easy to manipulate.
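As an illustration of how easy landmarks are to manipulate: MediaPipe reports x and y normalized to [0.0, 1.0] by the image width and height, so mapping a landmark onto the video frame is a multiplication. The second helper, which re-expresses landmarks relative to a reference point such as the wrist (landmark 0 for MediaPipe hands), is a hypothetical preprocessing step shown only as an example; it is not described in this project.

```python
def to_pixel(x_norm, y_norm, width, height):
    """Map a normalized landmark coordinate to integer pixel coordinates inside the image."""
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


def relative_to(landmarks, ref_index=0):
    """Express (x, y, z) landmarks relative to a reference landmark (e.g. the wrist).

    Hypothetical preprocessing: it discards where the hand is in the frame and
    keeps only its shape.
    """
    rx, ry, rz = landmarks[ref_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in landmarks]


print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
print(relative_to([(0.5, 0.5, 0.0), (0.75, 0.25, 0.5)]))  # [(0.0, 0.0, 0.0), (0.25, -0.25, 0.5)]
```

Clamping matters because landmarks of partly visible body parts can fall slightly outside the [0, 1] range.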

Further Notes

In the future, I hope to extend this pipeline so that it can be fine-tuned to a specific sign language. Since there are many sign languages in the world, such as American Sign Language and Spanish Sign Language, fine-tuning is required to cover their different gestures.

Limitation

Each action is currently mapped to a single label, and each frame is treated independently of the others. In real-world signing, facial expressions and the transitions between actions are also essential for building the context of a sentence, so a higher level of detail needs to be captured. More complex architectures, such as Transformers, are needed to capture such details and build up the correct context of a sentence.


Contributors

tshjustin
