
emanuelegiona / mi2020


Project for Multimodal Interaction course (A.Y. 2019/2020), GesturePad

License: GNU Affero General Public License v3.0

Python 70.79% Makefile 2.36% C++ 26.84%
python voicetext gesture-control multimodal-interactions text-editor google-cloud-speech sapienza-university mediapipe

MI2020

Project for Multimodal Interaction course (A.Y. 2019/2020), codename GesturePad.

GesturePad is a text editor that produces HTML/Markdown documents and supports multiple input modalities: text or voice, and gestures.

The vocal interaction is based on a continuous, dictation-style, speaker-independent speech recognition model implemented by Google Cloud Speech-to-Text.

The gesture interaction is based on arbitrary semaphoric gestures. Their recognition relies on hand landmark detection by Google MediaPipe, whose output is then processed by a cloud-deployed Google Cloud Vision AutoML model.
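The two-stage flow described above can be sketched as follows. This is only an illustration of the data flow, not the project's actual code: `recognize_gesture`, the `detect_landmarks` and `classify` callables, and the `min_score` threshold are all hypothetical stand-ins for the MediaPipe detection step and the cloud-deployed model, and landmarks are simplified to 2D points.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A hand landmark simplified to (x, y) image coordinates; MediaPipe's hand
# model reports 21 such points per detected hand.
Landmark = Tuple[float, float]


def recognize_gesture(
    frame: object,
    detect_landmarks: Callable[[object], List[Landmark]],
    classify: Callable[[Sequence[Landmark]], Tuple[str, float]],
    min_score: float = 0.5,  # hypothetical confidence threshold
) -> Optional[str]:
    """Run the two stages in order and return a gesture label, or None."""
    landmarks = detect_landmarks(frame)  # stage 1: local landmark detection
    if not landmarks:
        return None  # no hand in the frame, so there is nothing to classify
    label, score = classify(landmarks)  # stage 2: remote classification
    return label if score >= min_score else None
```

Passing the two stages in as callables keeps the sketch independent of any specific MediaPipe or Google Cloud client API, so either stage can be replaced by a stub when testing the editor logic offline.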

Complete details can be found in the PDF report.

Instructions

GesturePad has been developed on Ubuntu 18.04 (LTS) with Python 3.6+. See the Google MediaPipe documentation for further installation requirements.

To run this project, a Makefile has been set up to install all the required libraries and Python packages; for this reason, the suggested routine is the following:

  1. Download (and unzip) or clone the project;

  2. Move into the main directory (containing the Makefile);

  3. Run the make command which will: (a) install required system libraries, (b) install the required Python packages, (c) clone Google MediaPipe from its official GitHub repository, and (d) patch the MediaPipe installation with our custom files;

  4. Modify the config.json file according to your Google Cloud Platform subscription and AutoML settings;

  5. Run gesture_pad.py in the main project directory and follow the instructions in the GUI.
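The steps above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the clone URL is inferred from the repository name and may differ, the helper names are hypothetical, and the sketch only checks that `config.json` still parses after editing, since its expected keys are not listed here.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumed clone URL, inferred from the repository name; adjust if it differs.
REPO_URL = "https://github.com/emanuelegiona/mi2020.git"


def setup_commands(project_dir: Path) -> List[List[str]]:
    """Commands for steps 1 to 3: clone the project, then run make inside it."""
    return [
        ["git", "clone", REPO_URL, str(project_dir)],
        ["make", "-C", str(project_dir)],  # -C runs make in that directory
    ]


def config_is_valid_json(config_path: Path) -> bool:
    """Step 4 sanity check: the edited config.json must still parse as JSON."""
    try:
        with config_path.open(encoding="utf-8") as handle:
            json.load(handle)
        return True
    except (OSError, ValueError):  # missing/unreadable file or malformed JSON
        return False


def run_setup(project_dir: Path) -> None:
    """Run the setup commands in order, stopping at the first failure."""
    for command in setup_commands(project_dir):
        subprocess.run(command, check=True)


def launch(project_dir: Path) -> None:
    """Step 5: start the editor from the main project directory."""
    subprocess.run([sys.executable, "gesture_pad.py"], cwd=project_dir, check=True)
```

A typical sequence would be `run_setup(Path("mi2020"))`, then editing `config.json` by hand, then calling `launch(Path("mi2020"))` once `config_is_valid_json(Path("mi2020") / "config.json")` returns `True`.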

Note

It is advised to set up a Python virtual environment and to download/clone the project into a directory whose parent is not a root-protected directory.
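Both recommendations can be checked from Python with the standard library alone. The sketch below is illustrative: the function names and the `.venv` directory name are hypothetical choices, not part of the project.

```python
import os
import venv
from pathlib import Path


def parent_is_user_writable(project_dir: Path) -> bool:
    """True if the directory that will contain the project is writable
    by the current user, i.e. it is not a root-protected location."""
    parent = project_dir.resolve().parent
    return parent.is_dir() and os.access(parent, os.W_OK)


def create_virtualenv(env_dir: Path) -> None:
    """Create a virtual environment with pip, like `python3 -m venv`."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
```

For example, `parent_is_user_writable(Path.home() / "mi2020")` should hold before cloning into the home directory, and `create_virtualenv(Path(".venv"))` creates the environment, which then has to be activated in the shell before running `make`.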

License

The code in this repository is distributed under the AGPL-3.0 license, with the exceptions listed below. The file dataset.zip, which contains the gesture dataset we created, is distributed under CC-BY-4.0.

Authors¹: Angelo Di Mambro, Emanuele Giona.

¹ Equal contribution; alphabetical ordering is applied.

Acknowledgments

The files demo_run_graph_main.cc, end_loop_calculator.h, and landmarks_to_render_data_calculator.cc are unmodified copies of those in the repository Sign language recognition with RNN and Mediapipe, which is the property of Anna Kim; the license of the original repository applies to them.

The files multi_hand_renderer_cpu.pbtxt and multi_hand_tracking_desktop_live.pbtxt are original modifications of those in the repository MediaPipe: Cross-platform ML solutions made simple, which is the property of Google's MediaPipe team; they are redistributed under the same AGPL-3.0 license as the rest of this repository.
