This project is a fork of david-wb/gaze-estimation.


Gaze Estimation with Deep Learning

This project implements a deep learning model that predicts eye-region landmarks and gaze direction. The model is trained on computer-generated eye images synthesized with UnityEyes [1]. The work is heavily based on [2], with some key modifications. The model achieves ~14% mean angular error on the MPIIGaze evaluation set after training on UnityEyes alone.
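
For reference, angular gaze error is usually computed as the angle between the predicted and ground-truth 3-D gaze vectors. Below is a minimal sketch of that metric, assuming gaze is given as (pitch, yaw) in radians; the sign convention is one common choice and the helper names are illustrative, not taken from this codebase:

import numpy as np

def pitchyaw_to_vector(pitchyaw):
    # Convert (pitch, yaw) angles in radians to unit 3-D gaze vectors.
    pitch, yaw = pitchyaw[:, 0], pitchyaw[:, 1]
    return np.stack([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)], axis=1)

def mean_angular_error(pred, true):
    # Mean angle (in degrees) between predicted and ground-truth gaze vectors.
    a, b = pitchyaw_to_vector(pred), pitchyaw_to_vector(true)
    cos_sim = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim)).mean()

print(mean_angular_error(np.array([[0.1, 0.2]]), np.array([[0.15, 0.25]])))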

Setup

NOTE: This repo has been tested only on Ubuntu 16.04 and macOS.

First, create a conda env for your system and activate it:

conda env create -f env-linux.yml
conda activate ge-linux

Next, download the pretrained model files: one for detecting face landmarks, the other the main PyTorch gaze model.

./scripts/fetch_models.sh

Finally, run the webcam demo. You will likely need a GPU with CUDA 10.1 installed to get acceptable performance.

python run_with_webcam.py
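
If you want to confirm that PyTorch can see your GPU before launching the demo, a quick check from inside the activated environment (using the standard PyTorch API, not a script from this repo) is:

python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"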

If you'd like to train the model yourself, please see the readme under datasets/UnityEyes.

Materials and Methods

Over 100k training images were generated with UnityEyes [1]. Each image is labeled with a JSON metadata file that provides the eye-region landmark positions in screen space, the gaze direction in camera space, and other information. A rectangular region around the eye was extracted from each raw training image and normalized to have a width equal to the eye width (1.5 times the distance between the eye corners). For each preprocessed image, a set of heatmaps corresponding to 34 eye-region landmarks was created. The model was trained to regress directly on the landmark locations and on the gaze direction in (pitch, yaw) form. The model was implemented in PyTorch. The overall method is summarized in the following figure.

[Figure: overview of the training pipeline]
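
To illustrate the heatmap step: each landmark heatmap is typically a 2-D Gaussian centered on the landmark position. The sketch below is not taken from this repository; the patch size (160x96) and sigma are arbitrary values chosen for illustration:

import numpy as np

def landmark_heatmap(x, y, width, height, sigma=2.0):
    # One heatmap: a 2-D Gaussian centered at the landmark position (x, y).
    xs = np.arange(width)[np.newaxis, :]
    ys = np.arange(height)[:, np.newaxis]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

# One heatmap per landmark, e.g. 34 eye-region landmarks per preprocessed image.
landmarks = np.random.rand(34, 2) * [160, 96]  # dummy (x, y) positions in a 160x96 patch
heatmaps = np.stack([landmark_heatmap(x, y, 160, 96) for x, y in landmarks])
print(heatmaps.shape)  # (34, 96, 160)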

The model architecture is based on the stacked hourglass model [3]. The main modification was to add a separate pre-hourglass layer for predicting the gaze direction. The output of the additional layer is concatenated with the predicted eye-region landmarks before being passed to two fully connected layers. This way, the model can make use of the high-level landmark features for predicting the gaze direction.
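A minimal sketch of such a gaze head, assuming the predicted landmarks are flattened to (x, y) coordinates and concatenated with a feature vector from the additional layer (the layer sizes and names below are illustrative, not the repo's actual configuration):

import torch
import torch.nn as nn

class GazeHead(nn.Module):
    # Predicts (pitch, yaw) from predicted landmarks plus an extra feature vector.
    def __init__(self, n_landmarks=34, feature_dim=64, hidden_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(n_landmarks * 2 + feature_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 2)  # outputs (pitch, yaw)

    def forward(self, landmarks, features):
        # landmarks: (batch, n_landmarks, 2), features: (batch, feature_dim)
        x = torch.cat([landmarks.flatten(start_dim=1), features], dim=1)
        return self.fc2(torch.relu(self.fc1(x)))

# Example: a batch of 8 eye patches.
gaze = GazeHead()(torch.randn(8, 34, 2), torch.randn(8, 64))
print(gaze.shape)  # torch.Size([8, 2])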

Demo Video

Watch the video

References

  1. UnityEyes: https://www.cl.cam.ac.uk/research/rainbow/projects/unityeyes/
  2. GazeML: https://github.com/swook/GazeML
  3. PyTorch stacked hourglass: https://github.com/princeton-vl/pytorch_stacked_hourglass
