
rkasale28 / hastakshar


BE-Project : Hastakshar Video Calling Web Application

Languages: Python 88.04%, Jupyter Notebook 10.51%, HTML 0.76%, JavaScript 0.42%, Dockerfile 0.12%, Shell 0.08%, CSS 0.07%
Topics: isl, video-calling, web-application, web-development, django, object-detection, tensorflow, be-project

hastakshar's Introduction

HastAkshar

A video calling web application with Indian Sign Language Interpretation

During the COVID-19 pandemic, family and friends stayed in contact through video meeting platforms and apps. These interactions helped compensate for the lack of physical contact and kept spirits high during a difficult time. COVID-19 also forced many people to work from home, where, without in-person interaction, information can be lost or misinterpreted when relying only on written and verbal communication. This shift in technology also widened a digital divide: people who are deaf or mute, and so cannot rely on verbal communication, face difficulties using these tools. HastAkshar aims to help by interpreting a limited set of ISL cues and transcribing them.

Problem Definition

HastAkshar is a video calling web application that interprets Indian Sign Language (ISL) cues and transcribes these non-verbal cues into verbal ones. It interprets a defined set of ISL gestures for users who do not know the language and provides closed captioning, enabling convenient conversation for all parties.

Features of the Project:

  • Video Calling Service: This is the base feature of the application. It depends on the network connection: stable Internet connectivity keeps communication between users uninterrupted and latency low. A camera, microphone and speakers are also required for a smooth conversation between the two users.
  • ISL Interpretation Service: The application aims to provide video calling to everyone, including individuals who are mute or deaf. It detects ISL cues from camera input and transcribes them into verbal cues for users who do not understand sign language. An HD camera is recommended for this feature, as a poor-quality camera can hamper the interpretation model.

Components of the Application

The web application is built using Django, an open-source web framework that allows rapid development.

  • Video Calling Service (VCS): The VCS uses Socket.IO for communication between users and must run asynchronously to support the application's critical functionality. The default PeerCloud service is used to establish connections between any two users. Once users have joined a room using the generated ‘roomId’ and the call is connected, they can access the calling interface, which enables chat and the ISL Interpretation service. Messages are sent by broadcasting a Socket.IO event within the room. All button toggles are implemented in JavaScript.
  • ISL Interpretation Service (SLS): The SLS module is backed by a deep learning model built with SSD MobileNet and TensorFlow's object detection libraries. Feature extraction uses the ‘ssd_mobilenet_v2_fpn_keras’ feature extractor with the ‘RELU_6’ activation function. The detailed model configuration is stored in a configuration file that is read when the model is trained. The model uses depthwise separable convolutions to stay efficient while covering 15 classes (one per ISL gesture). The custom training dataset contains 40 labeled images per ISL gesture, split 4:1 between training and testing (32 training and 8 test images per gesture). The images were contributed by different individuals, with no particular background filtering or setup and without any image data augmentation. These conditions are intended to produce a diverse dataset, which should improve model performance when the application is deployed in a real-time environment. The images were labeled using ‘labelImg’, an open-source tool that generates XML files in PASCAL VOC format. Each XML file holds the details of an image, such as the bounding box points and labels. The model is trained until 30,000 epochs, the point where it achieves a minimum classification loss at a learning rate of 0.0286. A final regularized loss of 0.05298 was recorded when training ended. Once tested, the model is exported as an API so that the application can invoke it with a request call when the SLS button is switched on.
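The labeling and split steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the function names `parse_voc_annotation` and `split_per_class`, the sample XML, and the fixed random seed are all assumptions made for the example.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical labelImg output in PASCAL VOC format (file and label names are made up).
SAMPLE_XML = """<annotation>
  <filename>hello_01.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>hello</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>360</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from VOC XML."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bnd = obj.find("bndbox")
        coords = tuple(int(bnd.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, *coords))
    return filename, boxes


def split_per_class(samples, train_parts=4, test_parts=1, seed=0):
    """Split (filename, label) pairs per class in a train:test ratio (4:1 by default)."""
    by_label = defaultdict(list)
    for filename, label in samples:
        by_label[label].append(filename)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, files in sorted(by_label.items()):
        files = sorted(files)
        rng.shuffle(files)
        n_train = len(files) * train_parts // (train_parts + test_parts)
        train += [(f, label) for f in files[:n_train]]
        test += [(f, label) for f in files[n_train:]]
    return train, test
```

With 40 images for each of 15 gestures, a per-class split like this yields 32 training and 8 test images per gesture.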

Implementation

Results

  • With this pipeline, 15 ISL gestures, including double-handed ones, can be interpreted over a live video call between users.
  • The model was trained for 30,000 epochs at a learning rate of 0.028.
  • The total loss recorded during model training was 0.172.
  • With ISL interpretation in use, the network latency of the overall application was recorded at less than 2 seconds, which suggests fast communication.
  • The average response time for interpretation, from image capture to displaying the response on-screen, is 0.214 seconds.
  • To measure the accuracy of the object detection model, metrics such as Recall, Precision, Mean Average Precision (mAP), and Intersection over Union (IoU) are used. The results are tabulated below:
Metric Used                         Metric Score (ratio)
Average Precision (area = large)    0.853
Average Recall (area = large)       0.871
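For reference, the metrics named above are commonly defined as in the sketch below. It is not the evaluation code used by the project: the helper names `iou` and `precision_recall` and the `(xmin, ymin, xmax, ymax)` box layout are assumptions for illustration, and the reported Average Precision and Average Recall are normally aggregated over many detections by an evaluation tool rather than computed from a single pair of boxes.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

A detection is usually counted as a true positive when its IoU with a ground-truth box of the same class exceeds a chosen threshold; the threshold used for the scores above is not stated in this document.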

hastakshar's People

Contributors

dhairyap99, rkasale28

Forkers

dhairyap99

hastakshar's Issues

.gitignore suggested changes

assets
media

These directories need to stay in the repository because the database references files on the local file system; if the data is lost on a system, the application can no longer load the images.

Suggested action: revert changes made in .gitignore

Forget password

The forgot-password flow works, but it allows the new password to be the same as the old one. Validation should be added: if the user enters the same password as before, the application should prompt them to log in instead.
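One way to express the requested check is sketched below. It uses only the Python standard library so it runs on its own; the helper names (`hash_password`, `matches_stored`, `validate_new_password`), the PBKDF2 parameters, and the error message are assumptions for illustration, not code from the project. In a Django view or form, the comparison would more likely go through the user model's `check_password` method.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, iterations, digest) for a password using PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def matches_stored(password, stored):
    """Check a raw password against a stored (salt, iterations, digest) tuple."""
    salt, iterations, digest = stored
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


def validate_new_password(new_password, stored):
    """Return an error message if the new password equals the current one, else None."""
    if matches_stored(new_password, stored):
        return "This is already your password. Please log in instead."
    return None
```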

Regex not working

Regex validation needs to be tested for:

  • Email Id
  • User name
  • Password

Currently, the JavaScript regex validation is not working.
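As a reference point for testing, the three checks can be sketched in Python as below. The patterns are hypothetical examples, since this document does not state the project's actual rules: a simple email shape, a username of 3 to 30 characters that starts with a letter, and a password of at least 8 characters with a lowercase letter, an uppercase letter, and a digit. `re.fullmatch` requires the whole string to match, which corresponds roughly to anchoring a JavaScript pattern with `^` and `$`.

```python
import re

# Hypothetical validation rules, chosen only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")
USERNAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,29}")
PASSWORD_RE = re.compile(r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}")


def validate(email, username, password):
    """Return a dict mapping each field name to True if it passes its pattern."""
    return {
        "email": EMAIL_RE.fullmatch(email) is not None,
        "username": USERNAME_RE.fullmatch(username) is not None,
        "password": PASSWORD_RE.fullmatch(password) is not None,
    }
```

Testing each field with one valid and one invalid input, as in `validate("user@example.com", "asha_01", "Passw0rdX")` versus `validate("bad@", "1abc", "short")`, makes it easier to see which pattern is failing.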

Check Bootstrap version

The current Bootstrap version did not support modals, so the version was changed in home.html. Check whether anything else is affected.
