
gaze-tracking-pipeline's Introduction

Framework for the Complete Gaze Tracking Pipeline

The figure below shows a general representation of the camera-to-screen gaze tracking pipeline [1]. From left to right: the webcam image is preprocessed into normalized images of the eyes and face, these images are fed into a model that predicts the 3D gaze vector, and the predicted gaze vector is projected onto the screen once the user's head pose is known.
This framework enables a real-time approach that predicts the viewing position on the screen based only on the input image. A rough sketch of the final projection step is given after the figure.

camera-to-screen gaze tracking pipeline
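For illustration only (this is not code from the repository), the projection step can be thought of as intersecting the gaze ray with the screen plane once both are expressed in the same camera coordinate system. eye_center, plane_point, and plane_normal are hypothetical inputs that the calibration steps below would provide:

import numpy as np

def intersect_gaze_with_screen(eye_center, gaze_vector, plane_point, plane_normal):
    """Intersect the ray eye_center + t * gaze_vector with the screen plane."""
    gaze_vector = gaze_vector / np.linalg.norm(gaze_vector)
    denom = plane_normal.dot(gaze_vector)
    if abs(denom) < 1e-6:  # gaze is (nearly) parallel to the screen plane
        return None
    t = plane_normal.dot(plane_point - eye_center) / denom
    return eye_center + t * gaze_vector  # 3D point on the screen plane

The resulting 3D point still has to be mapped to pixel coordinates using the screen's physical size and resolution.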

  1. pip install -r requirements.txt
  2. If necessary, calibrate the camera using the provided interactive script python calibrate_camera.py, see Camera Calibration by OpenCV.
  3. For higher accuracy, it is also advisable to calibrate the position of the screen as described by Takahashiet al., which provide an OpenCV and matlab implementation.
  4. To make reliable predictions, the proposed model needs to be specially calibration for each user. A software is provided to collect this calibration data.
  5. Train a model or download a pretrained model.
  6. Once all previous steps are complete, python main.py --calibration_matrix_path=./calibration_matrix.yaml --model_path=./p00.ckpt can be executed, and a "red laser pointer" should be visible on the screen (a sketch of reading the calibration file follows this list). main.py also provides multiple visualization options:
    1. --visualize_preprocessing to visualize the preprocessed images
    2. --visualize_laser_pointer to show the gaze point the person is looking at on the screen as a red laser-pointer dot, see the right monitor in the image below
    3. --visualize_3d to visualize the head, the screen, and the gaze vector in a 3D scene, see the left monitor in the image below
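As a minimal, hypothetical sketch of how the calibration file from step 2 might be read back in step 6 (the key names here are assumptions, not the repository's actual schema):

import numpy as np
import yaml

with open('./calibration_matrix.yaml') as f:
    data = yaml.safe_load(f)

# Key names are assumptions; check the file written by calibrate_camera.py.
camera_matrix = np.asarray(data['camera_matrix'])          # 3x3 intrinsic matrix
dist_coefficients = np.asarray(data['dist_coefficients'])  # lens distortion terms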

live-example

[1] Amogh Gudi, Xin Li, and Jan van Gemert, "Efficiency in real-time webcam gaze tracking", in Computer Vision - ECCV 2020 Workshops, Glasgow, UK, August 23-28, 2020, Proceedings, Part I, Adrien Bartoli and Andrea Fusiello, Eds., ser. Lecture Notes in Computer Science, vol. 12535, Springer, 2020, pp. 529-543. DOI: 10.1007/978-3-030-66415-2_34. [Online]. Available: https://doi.org/10.1007/978-3-030-66415-2_34.

gaze-tracking-pipeline's People

Contributors

pperle


gaze-tracking-pipeline's Issues

Duplicated Code

This code in main.py:

face_landmarks = np.asarray([[landmark.x * width, landmark.y * height] for landmark in results.multi_face_landmarks[0].landmark])
face_landmarks = np.asarray([face_landmarks[i] for i in landmarks_ids])
smoothing_buffer.append(face_landmarks)
face_landmarks = np.asarray(smoothing_buffer).mean(axis=0)
success, rvec, tvec, inliers = cv2.solvePnPRansac(face_model, face_landmarks, camera_matrix, dist_coefficients, rvec=rvec, tvec=tvec, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_EPNP)  # initial fit
for _ in range(10):
    success, rvec, tvec = cv2.solvePnP(face_model, face_landmarks, camera_matrix, dist_coefficients, rvec=rvec, tvec=tvec, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)  # second fit for higher accuracy

seems duplicated in this call:

face_model_transformed, face_model_all_transformed = get_face_landmarks_in_ccs(camera_matrix, dist_coefficients, frame.shape, results, face_model, face_model_all, landmarks_ids)

here:
face_landmarks = np.asarray([[landmark.x * width, landmark.y * height] for landmark in results.multi_face_landmarks[0].landmark])
face_landmarks = np.asarray([face_landmarks[i] for i in landmarks_ids])
rvec, tvec = None, None
success, rvec, tvec, inliers = cv2.solvePnPRansac(face_model, face_landmarks, camera_matrix, dist_coefficients, rvec=rvec, tvec=tvec, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_EPNP)  # initial fit
for _ in range(10):
    success, rvec, tvec = cv2.solvePnP(face_model, face_landmarks, camera_matrix, dist_coefficients, rvec=rvec, tvec=tvec, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)  # second fit for higher accuracy

with the only difference being that the main call smooths face_landmarks, while get_face_landmarks_in_ccs resets rvec and tvec to None.

Is this intentional?
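Purely as a hypothetical sketch (fit_head_pose is a made-up name, not a function in this repository), the two call sites could share one helper and leave the smoothing to the caller:

import cv2
import numpy as np

def fit_head_pose(face_model, face_landmarks, camera_matrix, dist_coefficients,
                  rvec=None, tvec=None):
    # Initial EPnP fit, optionally warm-started with the previous frame's pose.
    success, rvec, tvec, inliers = cv2.solvePnPRansac(
        face_model, face_landmarks, camera_matrix, dist_coefficients,
        rvec=rvec, tvec=tvec, useExtrinsicGuess=rvec is not None,
        flags=cv2.SOLVEPNP_EPNP)
    # Iterative refinement for higher accuracy.
    for _ in range(10):
        success, rvec, tvec = cv2.solvePnP(
            face_model, face_landmarks, camera_matrix, dist_coefficients,
            rvec=rvec, tvec=tvec, useExtrinsicGuess=True,
            flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec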

use of calibration data

Hi, I cannot understand how the calibration data obtained with the main.py script in the gaze-data-collection project is used in this project. In that case a .csv file is produced and the calibration images are saved. How and where are these data used to optimize the projection of the gaze vector from 3D space to a 2D point on the monitor? The paper "Efficiency in Real-time Webcam Gaze Tracking" describes three ways to perform this type of monitor calibration: geometric, machine learning, and hybrid. Which kind of optimization do you apply?

Thanks
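For illustration only, and not necessarily the optimization this repository applies: a simple "machine learning" style monitor calibration could fit an affine correction from predicted screen points to the ground-truth targets collected during calibration, by least squares:

import numpy as np

def fit_affine_correction(predicted, targets):
    # predicted, targets: (N, 2) arrays of screen coordinates.
    # Augment with a bias column so the map can translate as well as scale/shear.
    A = np.hstack([predicted, np.ones((len(predicted), 1))])  # (N, 3)
    # Solve A @ W ~= targets in the least-squares sense; W is (3, 2).
    W, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return W

def apply_correction(W, point):
    return np.append(point, 1.0) @ W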

model use

The model does not seem to be used, so why do I need it?

Gaze tracking is not accurate

Hi! First of all, thanks for publishing your work, it is really helpful! I have been working on my graduation project, which is very similar to yours, and I wanted to run your program to get an idea of how it works.

I was able to calibrate and get the yaml file, then I ran main.py and manually entered the screen size. However, when I run it, the screen appears to be way smaller than the actual screen (playing with the numbers didn't help), the laser doesn't appear, and the red line stays mostly in the center unless I move my head, in which case the line starts moving towards the side my head moved.
Could this be caused by the calibration? Or what else might cause it? How did it work for you? Could you please share more details?

Thank you!

error for getting calibration output

Hello, thank you for this work. I want to get an idea of what the output will look like.
I am running the code in Colab:
https://colab.research.google.com/drive/1LvhTnsOw3MnVR5YDOP8euNbcpMvzdUC8?usp=sharing
to get the calibration_matrix.yaml file, but I got the error below:
[ WARN:0] global /tmp/pip-req-build-7m_g9lbm/opencv/modules/videoio/src/cap_v4l.cpp (893) open VIDEOIO(V4L2:/dev/video0): can't open camera by index
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
Traceback (most recent call last):
  File "camera_calibration.py", line 103, in <module>
    record_video(width=1280, height=720, fps=30)
  File "camera_calibration.py", line 24, in record_video
    for idx, frame in enumerate(source):
  File "/content/gaze-tracking-pipeline/webcam.py", line 30, in __iter__
    raise StopIteration
StopIteration
I would appreciate it if you could let me know how I can fix this issue.
Thanks
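Judging from the first warning line, the capture device could not be opened: a Colab VM has no local webcam behind /dev/video0, so the frame iterator stops immediately. A minimal, hypothetical check before iterating frames:

import cv2

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    # Colab VMs have no local webcam, so /dev/video0 does not exist there.
    raise RuntimeError('Cannot open camera 0; run the calibration script on a machine with a webcam.')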

python version

Which Python version did you use?
I can't run pip install on Python 3.7.

Dead links.

Hi @pperle!
Thanks for this original work! I wanted to try out the demo. Could you provide links to the processed datasets and model checkpoints?
The current ones seem to be dead.

Many thanks!

No reaction after running

I am running main.py as described. I've created the yaml file and downloaded the trained model. After running the command, there is no reaction. While debugging, I see: Exception has occurred: OSError.

License

Hello, I would like to use parts of the code presented here in my own project.

The gaze-tracking repository has a license file; sadly, this repository does not.
Would it be possible to add a license to this repository?

Best regards.

Gaze Estimation is not accurate

[screenshot: the estimated gaze points stay within a small square on the screen]

When I look at the corners of my screen, the estimated points are only shown inside this square. Is there any scaling factor for that?
