gaze360's People

Contributors

erkil1452

gaze360's Issues

some question about the code

@erkil1452 Hello, I have recently been reading your code and have a few doubts about the lines marked in red:
(screenshot attached)

I don't understand why the predicted values are passed through nn.Tanh, and why one output is multiplied by math.pi while the other is multiplied by math.pi/2.
I also don't understand what [:, 3, :] means.
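For context, here is a minimal sketch of what such an output head typically does (my reading of the idea, not the authors' exact code): Tanh bounds each raw output to (-1, 1), and the scaling maps one angle to (-pi, pi) and the other to (-pi/2, pi/2), which together cover the full sphere of gaze directions. The [:, 3, :] indexing most likely selects the central frame of the 7-frame input window, but that should be confirmed against model.py.

import math
import torch
import torch.nn as nn

# Minimal sketch (not the authors' exact code): Tanh bounds each raw output to (-1, 1);
# multiplying the first angle by pi and the second by pi/2 maps them to (-pi, pi) for yaw
# and (-pi/2, pi/2) for pitch, covering the full sphere of gaze directions.
class SphericalGazeHead(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 2)

    def forward(self, x):
        angles = torch.tanh(self.fc(x))
        yaw = angles[:, 0] * math.pi          # full 360-degree horizontal range
        pitch = angles[:, 1] * (math.pi / 2)  # +/- 90-degree vertical range
        return torch.stack([yaw, pitch], dim=1)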

Demo

Hello,
how do I reduce the red arrow size when running the demo on Google Colab? It is too big.
Sorry, my English is not good.
Thank you.

fail to download dataset

Hi,
I have registered for Gaze360, but I cannot log in. I have tried again and again, and it always fails.
Please tell me what is wrong.

some problem about the predict

@erkil1452 Hello, I am sorry to trouble you again. I tested my own trained model using the laptop's built-in camera, which sits at the top middle of the laptop. When I sit in front of the laptop with my eyes at roughly the same height as the camera, I get the correct gaze vector. However, when I test the same model on a video from the web, the gaze vector is not correct, and I do not know why. Should I do something like head normalization? A related question: when the tester is not directly in front of the camera, so there is some angle between the camera and the eyes (for example, the camera is to the tester's left or right, or looks up at the face), how can I get the correct gaze vector? Thank you!

test error

Using the author's code, the mean angular error of the model is 12.8 degrees. There is a certain gap with the results in the paper. What could be the reason?

some question about gaze estimation

@erkil1452 Hello, I want to ask a question:
I only have an RGB camera and use it to capture images of people's faces. How can I get the gaze vector (pitch, yaw)? Is there a method or paper I can refer to? Thanks in advance!

An error occurred in the test results

@erkil1452 Hello! I am sorry to bother you; I am having some problems while running the code. My test had an angular error of 75, and my attempts to modify some parameters did not help. Can you tell me some possible reasons? Thank you!
(screenshot attached)

DATASET ACCESS INFORMATION

Important notes about the dataset access:

The registration does work, but we are currently experiencing issues with our e-mail service provider. This means that the registration confirmation may be delayed or lost. Despite that, the account should still be ready to use, so please try to log in.

Furthermore, there is currently no way to reset a password. If you have forgotten your password, contact us from the e-mail address used for registration and we will reset the entire account.

some error in the traindata

@erkil1452 I trained a model using your train.txt, but there are some warnings; I suspect there are some errors in your training data:

Epoch: [325][651/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular 2.076 (2.421) Loss 0.0120 (0.0128) Prediction Error 2.5859 (2.5833)
Epoch: [325][652/1587] Time 0.722 (0.724) Data 0.009 (0.013) Angular 2.126 (2.421) Loss 0.0104 (0.0128) Prediction Error 2.4450 (2.5831)
Epoch: [325][653/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular 2.402 (2.421) Loss 0.0133 (0.0128) Prediction Error 2.4985 (2.5830)
WARNING:root:NaN or Inf found in input tensor.
Epoch: [325][654/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular nan (nan) Loss 0.0152 (0.0128) Prediction Error 2.5007 (2.5829)
Epoch: [325][655/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular 2.482 (nan) Loss 0.0134 (0.0128) Prediction Error 2.5017 (2.5828)
Epoch: [325][656/1587] Time 0.720 (0.724) Data 0.010 (0.012) Angular 2.701 (nan) Loss 0.0138 (0.0128) Prediction Error 2.8752 (2.5832)
Epoch: [325][657/1587] Time 0.721 (0.724) Data 0.010 (0.012) Angular 1.995 (nan) Loss 0.0106 (0.0128) Prediction Error 2.3353 (2.5828)
Epoch: [325][658/1587] Time 0.723 (0.724) Data 0.010 (0.012) Angular 2.179 (nan) Loss 0.0112 (0.0128) Prediction Error 2.4611 (2.5826)
Epoch: [325][659/1587] Time 0.723 (0.724) Data 0.010 (0.012) Angular 2.341 (nan) Loss 0.0140 (0.0128) Prediction Error 2.5880 (2.5826)
Epoch: [325][660/1587] Time 0.721 (0.724) Data 0.010 (0.012) Angular 2.273 (nan) Loss 0.0114 (0.0128) Prediction Error 2.2808 (2.5822)
Epoch: [325][661/1587] Time 0.720 (0.724) Data 0.010
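The warning usually points at a NaN appearing in the logged metric rather than in the images themselves. One common source, offered here only as a guess, is the angular-error computation: taking acos of a dot product that drifts slightly outside [-1, 1] due to floating-point error produces NaN. A minimal clamped version:

import math
import torch

# Hedged sketch: clamp the cosine before acos so floating-point drift outside [-1, 1]
# cannot produce NaN in the logged angular error.
def angular_error_deg(pred, gt, eps=1e-7):
    pred = pred / pred.norm(dim=1, keepdim=True)
    gt = gt / gt.norm(dim=1, keepdim=True)
    cos = (pred * gt).sum(dim=1).clamp(-1.0 + eps, 1.0 - eps)
    return torch.acos(cos) * 180.0 / math.pi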

Cross dataset evaluation

Thanks for your great paper and code.
I have a question regarding cross-dataset evaluation. When you evaluate your static model on MPIIFaceGaze, do you test on normalized data or on the original dataset without normalization?

Location of stored coordinates

Hello,
I tried out the Google Colaboratory version of the gaze estimation, but I am not sure where the coordinates of the arrow pointing in the gaze direction are stored before it is plotted.

some question about camera calibration

@erkil1452 Hello, I am sorry to trouble you again. In issue #34 you told me, "For the ball I can imagine using multiple calibrated cameras (see multiview stereo camera calibration) and triangulating the ball position." I found a website with code for calibrating multiple cameras (https://sites.google.com/site/prclibo/toolbox); the calibration images are also provided there. I do not know whether this code can be used to calibrate the cameras as you described, and whether I need to change anything about the images fed to the code. I think my RGB camera is a pinhole camera, and in my scenario two cameras will be used: one placed in front of and to the left of the driver, the other in front of and to his right. The pictures taken by the two cameras overlap somewhat, so do I only need to input two images to the code?
Another question: the triangulation method can be used to compute the ball's 3D coordinate, and I think it can also be used to compute the eye's 3D coordinate, so I would not need the method used in MPIIGaze to get the person's 3D position ("For getting 3D position of the person you can either rely on face scale as a cue (MPIIGaze uses that I believe) or you can use additional depth camera (eg Kinect Azure)."). I am not sure whether I am right. There is an implementation of triangulation in OpenCV, cv::triangulatePoints, but that function can only compute the 3D coordinate of a point matched in two images. I cannot guarantee that the center of the ball or the center of the eye will be matched in both images, so I also cannot guarantee that I can compute the ball's or the eye's 3D coordinate. Is that right?
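For reference, a minimal sketch of how cv2.triangulatePoints is typically used, assuming two calibrated pinhole cameras with known 3x4 projection matrices P1 and P2 and a pair of matched 2D observations of the same physical point (finding that match, e.g. of ball or eye centers, is indeed the hard part):

import numpy as np
import cv2

# Hedged sketch: triangulate one matched point from two calibrated views.
# P1, P2 are 3x4 projection matrices; pt1, pt2 are the (x, y) pixel observations
# of the same physical point in each image.
def triangulate_point(P1, P2, pt1, pt2):
    pts1 = np.asarray(pt1, dtype=np.float64).reshape(2, 1)
    pts2 = np.asarray(pt2, dtype=np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4x1 homogeneous coordinates
    return (X_h[:3] / X_h[3]).ravel()                # 3D point in the frame P1 and P2 are defined in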

some error in the

@erkil1452 When people look up at the ceiling, I find the model cannot predict precisely. Is this because the training data in Gaze360 does not include images of people looking up at the sky?

Pitch Yaw angles

Hello, I have a question

You report that gaze_dir is the 3D gaze direction in the eye coordinate system. How do you calculate the spherical angles pitch and yaw from the direction vector, and in which coordinate system are these angles expressed?

fail to obtain gaze360 dataset

Hi, I registered successfully yesterday, but I still cannot log in to download the dataset. How long should I have to wait? Several hours does not seem right. Is there any other way to get the dataset?

Looking forward to your reply.

Thanks!

where is the physical location of the origin of Ladybug camera coordinate system?

As mentioned in the ICCV 2019 paper, "The Ladybug5 consists of five synchronized and overlapping 5 megapixel camera", and the metadata also provides the index of the camera. It seems that the subject and the target are captured by different cameras. Generally each camera has its own coordinate system, so which camera is "the Ladybug camera coordinate system" based on?

some question about generate ground truth

@erkil1452 Hello, I have some questions:

  1. In the article you say, "We compute the gaze vector in the Ladybug coordinate system as a simple difference gL = pt − pe." What does pe represent: the 3D coordinate of the right eye or of the left eye?

  2. You describe the process of getting the target's 3D coordinate as: "We use the original AprilTag library to detect the marker in each of the camera views and estimate its 3D pose using the known camera calibration parameters and marker size. We then use the pose and known board geometry to find the 3D location of the target cross pt."
     I understand that AprilTag gives the 2D position of the marker, but how do you then obtain the target's 3D coordinate? Could you describe the process in detail, or give an example (for instance: the detected marker position is (20,50), the marker size is 20 pixels, ...)?

  3. In the paper you use 7 frames to estimate the gaze of the middle frame. Did you evaluate the performance of using 5 frames or 3 frames?

  4. I noticed a new gaze dataset, ETH-XGaze. They collect their data with 2D cameras, but I cannot find how they obtain the ground truth. Do you know how they get the gaze labels?

Dataset access

I tried to follow the link (http://gaze360.csail.mit.edu/) and got the following error message using different browsers and devices:

Forbidden
You don't have permission to access / on this server.
Apache/2.4.7 (Ubuntu) Server at gaze360.csail.mit.edu Port 80

Could you please check this issue?
Thanks!

Static Model configuration Query

Hello,
I was interested in knowing more about the Static model.
Quoting your paper:

Static - the backbone model, ResNet-18, and two final layers to compute the prediction

I would like to know the size and activation function of the second-to-last layer.
Thanks a lot!
Ashesh

Colab notebook error Lucid

Has anyone tried running the Colab notebook? I am getting the error ModuleNotFoundError: No module named 'lucid' when trying to run import lucid.misc.io.showing as show. Thank you for your help.
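One thing worth trying (an assumption on my part, not a confirmed fix) is installing the standalone lucid package into the Colab runtime before the import; note that lucid targets TensorFlow 1.x, so a compatible TensorFlow version may also be needed:

# Hedged suggestion, not a confirmed fix: install the lucid package into the Colab runtime.
!pip install lucid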

Definition of yaw and pitch

Hi,

In your paper,

yaw=-atan(x./z);
pitch=asin(y);

I do not know why you define yaw and pitch like this. I think yaw and pitch should be defined as:

r = (x^2 + y^2 + z^2)^0.5;
yaw = acos(z/r);
pitch = atan(y/x);
(because (x, y, z) is a normalized vector, r always equals 1)

I know this may be a stupid question, but I would be very grateful if you could help me. Thanks.
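For what it is worth, the two definitions are simply different spherical parameterizations: acos(z/r) measures the polar angle from the z-axis, whereas the paper's yaw is a signed rotation about the vertical axis and its pitch an elevation above the horizontal plane. A minimal sketch of the quoted convention (using atan2 for robustness when z is near zero; the exact sign convention should be checked against the repository code):

import math

# Hedged sketch of the quoted convention: yaw is a signed rotation about the vertical axis,
# pitch an elevation above the horizontal plane. atan2 replaces atan(x/z) for robustness.
def vector_to_yaw_pitch(x, y, z):
    yaw = math.atan2(-x, z)   # equals -atan(x/z) when z > 0
    pitch = math.asin(y)
    return yaw, pitch

def yaw_pitch_to_vector(yaw, pitch):
    x = -math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return x, y, z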

Where to find the supplementary file?

Hi,

I have questions regarding the camera's Cartesian eye coordinate system and would like to refer to the supplementary file for understanding. I cannot find the file on the CVF open access site and only see the video at the link provided in the paper. Would you mind providing the supplementary file?

It would also be great if you could address the following concerns: in the paper it is said that the gaze vector is expressed in the camera's Cartesian coordinate system and that E_z has the same direction as g_L. But g_L is the gaze direction and varies, so does that mean the coordinate system changes for every gaze? I do not understand this point. Also, what is p_e in the paper? Is it the eye position?

Thank you very much

predict

Can I predict gaze on a single image with this model? Thank you.

Cross-dataset evaluation

Hi, I am very interested in your Gaze360 work. Recently I have been trying to reproduce your results.
When I try the cross-dataset evaluation (train on Gaze360, test on Columbia Gaze), I find my result (10.8°) is worse than the one reported in your paper (9.0°).
Here, my method is to use the LSTM model with 7 identical images.

After that, I also tried to train a static model (just ResNet-18, no LSTM); its result (13.3°) is also worse than 9.0°.

I use all the images in Columbia Gaze for testing, and my annotation comes from the image name (V and H):
gaze[0] = -math.cos(V) * math.sin(H)
gaze[1] = math.sin(V)
gaze[2] = -math.cos(V) * math.cos(H)

I hope you can answer my doubts or point out the mistakes in my procedure.
Thanks

Incorrect example for getting eye region in README.md

Relevant Section

The following snippet is incorrect.

eyeBBInCrop = [
    eyeBBInFull[0] - headBBInFull[0], # subtract offset of the crop
    eyeBBInFull[1] - headBBInFull[1], 
    eyeBBInFull[2] / headBBInFull[2], # scale to smaller space of the crop
    eyeBBInFull[3] / headBBInFull[3], 
    ]

The correct snippet should be:

eyeBBInCrop = [
    (eyeBBInFull[0] - headBBInFull[0]) / headBBInFull[2], # subtract crop offset, then scale to crop space
    (eyeBBInFull[1] - headBBInFull[1]) / headBBInFull[3],
    eyeBBInFull[2] / headBBInFull[2], # scale size to the smaller space of the crop
    eyeBBInFull[3] / headBBInFull[3],
    ]
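A worked example with hypothetical pixel values (assuming bounding boxes in [x, y, width, height] order, as in the snippets above):

# Hypothetical numbers for illustration: head crop at (100, 50), size 200x200 px;
# eye box at (150, 100), size 80x30 px, both in full-frame coordinates.
headBBInFull = [100, 50, 200, 200]
eyeBBInFull = [150, 100, 80, 30]
eyeBBInCrop = [
    (eyeBBInFull[0] - headBBInFull[0]) / headBBInFull[2],  # (150 - 100) / 200 = 0.25
    (eyeBBInFull[1] - headBBInFull[1]) / headBBInFull[3],  # (100 - 50) / 200  = 0.25
    eyeBBInFull[2] / headBBInFull[2],                      # 80 / 200 = 0.40
    eyeBBInFull[3] / headBBInFull[3],                      # 30 / 200 = 0.15
]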

Shader validation error

Thank you so much for this amazing work.
I am trying to run your code on my local machine.
At the step where the shaders are created I get an error.

shader = shaders.compileProgram(VERTEX_SHADER, FRAGMENT_SHADER)

ShaderValidationError: Validation failure (0):

package versions:
lucid 0.2.3
PyOpenGL 3.1.5

I would greatly appreciate any help. Thanks!

demo

How do I shrink the gaze vector (the arrow) in the demo? It is quite big.

Prediction for single image

Thanks for sharing the code for Gaze360. As far as I know, the model is trained on 7-frame videos; what changes should I make to use the model to predict the gaze vector for a single image?
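One workaround, also mentioned in the cross-dataset evaluation issue above, is to repeat the single frame seven times so the sequence looks like a static video to the LSTM model. A minimal sketch (the preprocessing below is a generic ImageNet-style transform, an assumption rather than the exact pipeline of this repository; the input shape expected by the model should be checked against run.py / model.py):

import torch
from torchvision import transforms
from PIL import Image

# Hedged sketch: build a 7-frame "sequence" from one image by repeating it.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("face_crop.jpg")                 # hypothetical head-crop image
frame = preprocess(image)                           # 3 x 224 x 224
sequence = frame.unsqueeze(0).repeat(7, 1, 1, 1)    # 7 x 3 x 224 x 224, all frames identical
batch = sequence.unsqueeze(0)                       # 1 x 7 x 3 x 224 x 224; reshape as the model expects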

Code for domain adaptation

Hi, in the paper I see a part about using an unsupervised method for domain adaptation to unseen people. Will the code for this part be made available?

how to get gaze label?

@erkil1452 Hello, I noticed a new gaze dataset named ETH-XGaze (https://ait.ethz.ch/projects/2020/ETH-XGaze/), and I still have some questions about how to obtain gaze labels. 1) How do I get the target's 3D coordinate? You told me to measure it with a tape measure. What measurement unit should I use: mm, cm, or m? A second question is how to choose the origin of the world coordinate system: when the origin changes, the target's world coordinate also changes, and when we then convert the target's world coordinate into the camera coordinate system, the target's 3D coordinate will be different. 2) How do I get the eye's 3D coordinate? First I want to use dlib to get the face landmarks and then fit a 3D face landmark model, but this approach only gives me a rotation matrix and a translation vector through PnP, and I do not know how to get the eye's 3D coordinate from them. Can you help me?
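On question 2, a minimal sketch of the usual approach (my reading, not something from this repository): once solvePnP gives the pose (R, t) that maps points of a generic 3D face model into camera coordinates, applying the same transform to the model's eye point gives an approximate eye position in camera coordinates. The face model points, landmark correspondences, and camera matrix are assumptions you have to supply:

import numpy as np
import cv2

# Hedged sketch: recover an approximate 3D eye position from 2D landmarks via head-pose PnP.
# model_points: Nx3 points of a generic 3D face model; image_points: matching Nx2 landmarks;
# eye_model_point: the eye's 3D position in the same face-model coordinates; K: camera matrix.
def eye_position_in_camera(model_points, image_points, eye_model_point, K, dist_coeffs=None):
    ok, rvec, tvec = cv2.solvePnP(model_points, image_points, K, dist_coeffs)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                                   # rotation vector -> 3x3 matrix
    return (R @ np.asarray(eye_model_point).reshape(3, 1) + tvec).ravel()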

How to get mean angular errors

Hi Petr, thanks for your work. I have a question about your paper "Gaze360: Physically Unconstrained Gaze Estimation in the Wild". In Table 2 you report the mean angular errors for various models. How do you compute this mean angular error? Is it (yaw error + pitch error) / 2?

how to present the predicted result

Is there some code to produce a picture like this:

(example image)

I do not know how to present the predicted result. I only get the result as (x, y, z) or (pitch, yaw); how do I then draw the prediction on the image, like the picture above?
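A minimal sketch of one common way to do this (not the repository's own visualization code): draw a fixed-length 2D arrow starting from a chosen eye position, with the arrow direction derived from the predicted angles. The sign conventions below follow the spherical definition quoted earlier on this page and may need adjusting:

import math
import cv2

# Hedged sketch: draw a gaze arrow of fixed pixel length starting at eye_pos = (x, y) in pixels.
def draw_gaze(image, eye_pos, yaw, pitch, arrow_len=100, color=(0, 0, 255)):
    dx = -arrow_len * math.cos(pitch) * math.sin(yaw)
    dy = -arrow_len * math.sin(pitch)                 # image y axis points downwards
    start = (int(eye_pos[0]), int(eye_pos[1]))
    end = (int(eye_pos[0] + dx), int(eye_pos[1] + dy))
    cv2.arrowedLine(image, start, end, color, thickness=2, tipLength=0.2)
    return image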

About eye and camera coordinate system in dataset

@erkil1452
Hi, thanks for your great work, but when I try to use your dataset I have some problems.

  1. In https://github.com/erkil1452/gaze360/tree/master/dataset, it says:

gaze_dir = M * (target_pos3d - person_eyes3d), where M depends on a normal direction between the eyes and the camera.
Is M the conversion matrix M = SR, as in "Revisiting data normalization for appearance-based gaze estimation"
and "Learning-by-synthesis for appearance-based 3D gaze estimation"?

  2. If 1 is true, when I use a raw image from the dataset, do I still need to do some data normalization?
  3. How can I get the camera parameters for the dataset so that I can compute the M matrix?
  4. In your paper I also found an illuminating part, "Estimating attention in a supermarket". If possible, could you please tell me how you map the gaze vector onto the shelf?

Thanks a lot!

Dataset download interrupted

I obtained the data download link after registration, but because the dataset is so large the download gets interrupted and the link becomes invalid. Could I instead get a cloud storage link (e.g. Google Drive or Dropbox) after registration?

about the data normalization

Hi, thanks for your great work. I have some questions about running inference on my own data.
As in "Revisiting data normalization for appearance-based gaze estimation"
and "Learning-by-synthesis for appearance-based 3D gaze estimation",
the face image usually needs to be normalized with some matrix, but you mentioned in #30 that you do not do any face normalization.
So when I test a new image with a model trained on the face images in Gaze360, do I just need to crop the face using the bounding box and feed it into the model?

Unable to login the account for the Gaze360

To whom it may concern,

I was trying to register an account to access the Gaze360 dataset for a human-interaction project.

It seems I cannot log in with the activated e-mail address. There might be some problem with the dataset login system: I could not log in with the right password, tried a few times, and got forbidden. I also could not contact you via the e-mail address [email protected], and I have sent an e-mail to [email protected]. Could you please help me?

Best regards,
Haiyuan Liu

Could you tell me the settings for gaze360 V2?

Thank you for your great work!
My environment is as follows:
Ubuntu 20.04
cuda: 9.2
cudnn:
tensorflow: 1.15
tensorflow-gpu: 1.15
pytorch: 1.1.0
but it does not work... 😭😭
Could you tell me the settings you used for gaze360 V2?

how to compute the data in train.txt

@erkil1452 In train.txt, I notice the data format looks like this:

rec_022/head/000000/000131.jpg 0.453840661720333 0.057788951913994 -0.889207001100381
rec_022/head/000022/000131.jpg 0.187331672118635 0.036473247576072 -0.981619349255347
rec_022/head/000001/000131.jpg 0.376386464341092 0.055499673938626 -0.924798905521367
rec_022/head/000002/000131.jpg 0.286355056945921 0.029490498088630 -0.957669615203481
rec_022/head/000584/000131.jpg 0.430812587090941 0.056813836181665 -0.900651265930567

I want to know how to compute these numbers:
0.453840661720333 0.057788951913994 -0.889207001100381
0.187331672118635 0.036473247576072 -0.981619349255347
0.376386464341092 0.055499673938626 -0.924798905521367
0.286355056945921 0.029490498088630 -0.957669615203481
0.430812587090941 0.056813836181665 -0.900651265930567

Is there a formula to compute these values, or some code that produces them?
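For reference, the dataset README describes these three numbers as gaze_dir = M * (target_pos3d - person_eyes3d) (see the coordinate-system issue above). A minimal sketch of that computation, with M left as a placeholder rotation since its exact construction is described in the dataset documentation:

import numpy as np

# Hedged sketch: unit gaze direction from 3D target and eye positions, rotated by M into the
# eye coordinate system. M = identity here is only a placeholder, not the dataset's actual M.
def gaze_label(target_pos3d, person_eyes3d, M=np.eye(3)):
    g = M @ (np.asarray(target_pos3d, dtype=float) - np.asarray(person_eyes3d, dtype=float))
    return g / np.linalg.norm(g)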

Static model

Hello Petr Kellnhofer,

I am a student at Xidian University in China, and I am studying gaze estimation. After reading your ICCV 2019 paper I have learned a lot, but I can only find the LSTM model in your GitHub repository, and I need to test the static model. Could you provide the static model?

I would greatly appreciate it if you could send me the static model from "Gaze360: Physically Unconstrained Gaze Estimation in the Wild" (ICCV 2019).

Thank you very much! Best wishes!

Gerong

Full images

Hi,

I'm working on a 2D gaze estimation task using image coordinates. It requires both the head-crop images and the full scene images. Could you please provide a link to the full image dataset? Thank you.

How about the predict results?

Hello, dear friend!
I wonder whether the prediction result is (yaw, pitch, roll),
and how can it be converted to two-dimensional image coordinates?
Thanks a lot!

Prediction

Hello, I have two questions:
what do the output coordinates represent, and how can the visualization be achieved?
