gaze360's People

Contributors

erkil1452

gaze360's Issues

some question about the code

@erkil1452 Hello, I have recently been reading your code and have a few doubts about the lines marked in red:
(screenshot attached)

I don't understand why the predicted values are passed through nn.Tanh, and why one output is multiplied by math.pi while the other is multiplied by math.pi/2.
I also don't understand what [:, 3, :] means.
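For context, here is a minimal sketch of what such an output head typically does (my reading of the idea, not the authors' exact code): Tanh bounds each raw output to (-1, 1), and the scaling maps one angle to (-pi, pi) and the other to (-pi/2, pi/2), which together cover the full sphere of gaze directions. The [:, 3, :] indexing most likely selects the central frame of the 7-frame input window, but that should be confirmed against model.py.

import math
import torch
import torch.nn as nn

# Minimal sketch (not the authors' exact code): Tanh bounds each raw output to (-1, 1);
# multiplying the first angle by pi and the second by pi/2 maps them to (-pi, pi) for yaw
# and (-pi/2, pi/2) for pitch, covering the full sphere of gaze directions.
class SphericalGazeHead(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 2)

    def forward(self, x):
        angles = torch.tanh(self.fc(x))
        yaw = angles[:, 0] * math.pi          # full 360-degree horizontal range
        pitch = angles[:, 1] * (math.pi / 2)  # +/- 90-degree vertical range
        return torch.stack([yaw, pitch], dim=1)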

Demo

Hello,
how do I reduce the red arrow size when running the demo on Google Colab? It is too big.
Sorry, my English is not good.
Thank you.

fail to download dataset

Hi,
I have registered for Gaze360, but I cannot log in. I have tried again and again, and it always fails.
Please tell me what is wrong.

some problem about the predict

@erkil1452 Hello, I am sorry to trouble you again. I tested my own trained model using the laptop's built-in camera, which sits at the top middle of the laptop. When I sit in front of the laptop with my eyes at roughly the same height as the camera, I get the correct gaze vector. However, when I test the same model on a video from the web, the gaze vector is not correct, and I do not know why. Should I do something like head normalization? A related question: when the tester is not directly in front of the camera, so there is some angle between the camera and the eyes (for example, the camera is to the tester's left or right, or looks up at the face), how can I get the correct gaze vector? Thank you!

test error

Using the author's code, the mean angular error of the model is 12.8 degrees. There is a certain gap with the results in the paper. What could be the reason?

some question about gaze estimation

@erkil1452 Hello, I want to ask a question:
I only have an RGB camera and use it to capture images of people's faces. How can I get the gaze vector (pitch, yaw)? Is there a method or paper I can refer to? Thanks in advance!

An error occurred in the test results

@erkil1452 Hello! I am sorry to bother you; I am having some problems while running the code. My test had an angular error of 75, and my attempts to modify some parameters did not help. Can you tell me some possible reasons? Thank you!
(screenshot attached)

DATASET ACCESS INFORMATION

Important notes about the dataset access:

The registration does work, but we are currently experiencing issues with our e-mail service provider. This means that the registration confirmation may be delayed or lost. Despite that, the account should still be ready to use, so please try to log in.

Furthermore, there is currently no way to reset a password. If you have forgotten your password, contact us from the e-mail address used for registration and we will reset the entire account.

some error in the traindata

@erkil1452 I trained a model using your train.txt, but there are some warnings; I suspect there are some errors in your training data:

Epoch: [325][651/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular 2.076 (2.421) Loss 0.0120 (0.0128) Prediction Error 2.5859 (2.5833)
Epoch: [325][652/1587] Time 0.722 (0.724) Data 0.009 (0.013) Angular 2.126 (2.421) Loss 0.0104 (0.0128) Prediction Error 2.4450 (2.5831)
Epoch: [325][653/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular 2.402 (2.421) Loss 0.0133 (0.0128) Prediction Error 2.4985 (2.5830)
WARNING:root:NaN or Inf found in input tensor.
Epoch: [325][654/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular nan (nan) Loss 0.0152 (0.0128) Prediction Error 2.5007 (2.5829)
Epoch: [325][655/1587] Time 0.721 (0.724) Data 0.010 (0.013) Angular 2.482 (nan) Loss 0.0134 (0.0128) Prediction Error 2.5017 (2.5828)
Epoch: [325][656/1587] Time 0.720 (0.724) Data 0.010 (0.012) Angular 2.701 (nan) Loss 0.0138 (0.0128) Prediction Error 2.8752 (2.5832)
Epoch: [325][657/1587] Time 0.721 (0.724) Data 0.010 (0.012) Angular 1.995 (nan) Loss 0.0106 (0.0128) Prediction Error 2.3353 (2.5828)
Epoch: [325][658/1587] Time 0.723 (0.724) Data 0.010 (0.012) Angular 2.179 (nan) Loss 0.0112 (0.0128) Prediction Error 2.4611 (2.5826)
Epoch: [325][659/1587] Time 0.723 (0.724) Data 0.010 (0.012) Angular 2.341 (nan) Loss 0.0140 (0.0128) Prediction Error 2.5880 (2.5826)
Epoch: [325][660/1587] Time 0.721 (0.724) Data 0.010 (0.012) Angular 2.273 (nan) Loss 0.0114 (0.0128) Prediction Error 2.2808 (2.5822)
Epoch: [325][661/1587] Time 0.720 (0.724) Data 0.010
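The warning usually points at a NaN appearing in the logged metric rather than in the images themselves. One common source, offered here only as a guess, is the angular-error computation: taking acos of a dot product that drifts slightly outside [-1, 1] due to floating-point error produces NaN. A minimal clamped version:

import math
import torch

# Hedged sketch: clamp the cosine before acos so floating-point drift outside [-1, 1]
# cannot produce NaN in the logged angular error.
def angular_error_deg(pred, gt, eps=1e-7):
    pred = pred / pred.norm(dim=1, keepdim=True)
    gt = gt / gt.norm(dim=1, keepdim=True)
    cos = (pred * gt).sum(dim=1).clamp(-1.0 + eps, 1.0 - eps)
    return torch.acos(cos) * 180.0 / math.pi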

Cross dataset evaluation

Thanks for your great paper and code.
I have a question regarding cross-dataset evaluation. When you evaluate your static model on MPIIFaceGaze, do you test on normalized data or on the original dataset without normalization?

Location of stored coordinates

Hello,
I tried out the Google Colaboratory version of the gaze estimation, but I am not sure where the coordinates of the arrow pointing in the gaze direction are stored before it is plotted.

some question about camera calibration

@erkil1452 Hello, I am sorry to trouble you again. In issue #34 you told me, "For the ball I can imagine using multiple calibrated cameras (see multiview stereo camera calibration) and triangulating the ball position." I found a website with code for calibrating multiple cameras (https://sites.google.com/site/prclibo/toolbox); the calibration images are also provided there. I do not know whether this code can be used to calibrate the cameras as you described, and whether I need to change anything about the images fed to the code. I think my RGB camera is a pinhole camera, and in my scenario two cameras will be used: one placed in front of and to the left of the driver, the other in front of and to his right. The pictures taken by the two cameras overlap somewhat, so do I only need to input two images to the code?
Another question: the triangulation method can be used to compute the ball's 3D coordinate, and I think it can also be used to compute the eye's 3D coordinate, so I would not need the method used in MPIIGaze to get the person's 3D position ("For getting 3D position of the person you can either rely on face scale as a cue (MPIIGaze uses that I believe) or you can use additional depth camera (eg Kinect Azure)."). I am not sure whether I am right. There is an implementation of triangulation in OpenCV, cv::triangulatePoints, but that function can only compute the 3D coordinate of a point matched in two images. I cannot guarantee that the center of the ball or the center of the eye will be matched in both images, so I also cannot guarantee that I can compute the ball's or the eye's 3D coordinate. Is that right?
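For reference, a minimal sketch of how cv2.triangulatePoints is typically used, assuming two calibrated pinhole cameras with known 3x4 projection matrices P1 and P2 and a pair of matched 2D observations of the same physical point (finding that match, e.g. of ball or eye centers, is indeed the hard part):

import numpy as np
import cv2

# Hedged sketch: triangulate one matched point from two calibrated views.
# P1, P2 are 3x4 projection matrices; pt1, pt2 are the (x, y) pixel observations
# of the same physical point in each image.
def triangulate_point(P1, P2, pt1, pt2):
    pts1 = np.asarray(pt1, dtype=np.float64).reshape(2, 1)
    pts2 = np.asarray(pt2, dtype=np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4x1 homogeneous coordinates
    return (X_h[:3] / X_h[3]).ravel()                # 3D point in the frame P1 and P2 are defined in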

some error in the

@erkil1452 When people look up at the ceiling, I find the model cannot predict precisely. Is this because the training data in Gaze360 does not include images of people looking up at the sky?

Pitch Yaw angles

Hello, I have a question

You report that gaze_dir is the 3D gaze direction in the eye coordinate system. How do you calculate the spherical angles pitch and yaw from the direction vector, and in which coordinate system are these angles expressed?

fail to obtain gaze360 dataset

Hi, I registered successfully yesterday, but I still cannot log in to download the dataset. How long should I have to wait? Several hours does not seem right. Is there any other way to get the dataset?

Looking forward to your reply.

Thanks!

where is the physical location of the origin of Ladybug camera coordinate system?

As mentioned in the ICCV 2019 paper, "The Ladybug5 consists of five synchronized and overlapping 5 megapixel camera", and the metadata also provides the index of the camera. It seems that the subject and the target are captured by different cameras. Generally each camera has its own coordinate system, so which camera is "the Ladybug camera coordinate system" based on?

some question about generate ground truth

@erkil1452 Hello, I have some questions:

  1. In the article you say, "We compute the gaze vector in the Ladybug coordinate system as a simple difference gL = pt − pe." What does pe represent: the 3D coordinate of the right eye or of the left eye?

  2. You describe the process of getting the target's 3D coordinate as: "We use the original AprilTag library to detect the marker in each of the camera views and estimate its 3D pose using the known camera calibration parameters and marker size. We then use the pose and known board geometry to find the 3D location of the target cross pt."
     I understand that AprilTag gives the 2D position of the marker, but how do you then obtain the target's 3D coordinate? Could you describe the process in detail, or give an example (for instance: the detected marker position is (20,50), the marker size is 20 pixels, ...)?

  3. In the paper you use 7 frames to estimate the gaze of the middle frame. Did you evaluate the performance of using 5 frames or 3 frames?

  4. I noticed a new gaze dataset, ETH-XGaze. They collect their data with 2D cameras, but I cannot find how they obtain the ground truth. Do you know how they get the gaze labels?

Dataset access

I tried to follow the link (http://gaze360.csail.mit.edu/) and got the following error message using different browsers and devices:

Forbidden
You don't have permission to access / on this server.
Apache/2.4.7 (Ubuntu) Server at gaze360.csail.mit.edu Port 80

Could you please check this issue?
Thanks!

Static Model configuration Query

Hello,
I was interested in knowing more about the Static model.
Quoting your paper:

Static - the backbone model, ResNet-18, and two final layers to compute the prediction

I would like to know the size and activation function of the second-to-last layer.
Thanks a lot!
Ashesh

Colab notebook error Lucid

Has anyone tried running the Colab notebook? I am getting the error ModuleNotFoundError: No module named 'lucid' when trying to run import lucid.misc.io.showing as show. Thank you for your help.
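One thing worth trying (an assumption on my part, not a confirmed fix) is installing the standalone lucid package into the Colab runtime before the import; note that lucid targets TensorFlow 1.x, so a compatible TensorFlow version may also be needed:

# Hedged suggestion, not a confirmed fix: install the lucid package into the Colab runtime.
!pip install lucid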

Definition of yaw and pitch

Hi,

In your paper,

yaw=-atan(x./z);
pitch=asin(y);

I do not know why you define yaw and pitch like this. I think yaw and pitch should be defined as:

r = (x^2 + y^2 + z^2)^0.5;
yaw = acos(z/r);
pitch = atan(y/x);
(because (x, y, z) is a normalized vector, r always equals 1)

I know this may be a stupid question, but I would be very grateful if you could help me. Thanks.
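For what it is worth, the two definitions are simply different spherical parameterizations: acos(z/r) measures the polar angle from the z-axis, whereas the paper's yaw is a signed rotation about the vertical axis and its pitch an elevation above the horizontal plane. A minimal sketch of the quoted convention (using atan2 for robustness when z is near zero; the exact sign convention should be checked against the repository code):

import math

# Hedged sketch of the quoted convention: yaw is a signed rotation about the vertical axis,
# pitch an elevation above the horizontal plane. atan2 replaces atan(x/z) for robustness.
def vector_to_yaw_pitch(x, y, z):
    yaw = math.atan2(-x, z)   # equals -atan(x/z) when z > 0
    pitch = math.asin(y)
    return yaw, pitch

def yaw_pitch_to_vector(yaw, pitch):
    x = -math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return x, y, z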

Where to find the supplementary file?

Hi,

I have questions regarding the camera's Cartesian eye coordinate system and would like to refer to the supplementary file for understanding. I cannot find the file on the CVF open access site and only see the video at the link provided in the paper. Would you mind providing the supplementary file?

It would also be great if you could address the following concerns: in the paper it is said that the gaze vector is expressed in the camera's Cartesian coordinate system and that E_z has the same direction as g_L. But g_L is the gaze direction and varies, so does that mean the coordinate system changes for every gaze? I do not understand this point. Also, what is p_e in the paper? Is it the eye position?

Thank you very much

predict

Can I predict gaze on a single image with this model? Thank you.

Cross-dataset evaluation

Hi, I am very interested in your Gaze360 work. Recently I have been trying to reproduce your results.
When I try the cross-dataset evaluation (train on Gaze360, test on Columbia Gaze), I find my result (10.8°) is worse than the one reported in your paper (9.0°).
Here, my method is to use the LSTM model with 7 identical images.

After that, I also tried to train a static model (just ResNet-18, no LSTM); its result (13.3°) is also worse than 9.0°.

I use all the images in Columbia Gaze for testing, and my annotation comes from the image name (V and H):
gaze[0] = -math.cos(V) * math.sin(H)
gaze[1] = math.sin(V)
gaze[2] = -math.cos(V) * math.cos(H)

I hope you can answer my doubts or point out the mistakes in my procedure.
Thanks

Incorrect example for getting eye region in README.md

Relevant Section

The following snippet is incorrect.

eyeBBInCrop = [
    eyeBBInFull[0] - headBBInFull[0], # subtract offset of the crop
    eyeBBInFull[1] - headBBInFull[1], 
    eyeBBInFull[2] / headBBInFull[2], # scale to smaller space of the crop
    eyeBBInFull[3] / headBBInFull[3], 
    ]

The correct snippet should be:

eyeBBInCrop = [
    (eyeBBInFull[0] - headBBInFull[0]) / headBBInFull[2], # subtract crop offset, then scale to crop space
    (eyeBBInFull[1] - headBBInFull[1]) / headBBInFull[3],
    eyeBBInFull[2] / headBBInFull[2], # scale size to the smaller space of the crop
    eyeBBInFull[3] / headBBInFull[3],
    ]
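A worked example with hypothetical pixel values (assuming bounding boxes in [x, y, width, height] order, as in the snippets above):

# Hypothetical numbers for illustration: head crop at (100, 50), size 200x200 px;
# eye box at (150, 100), size 80x30 px, both in full-frame coordinates.
headBBInFull = [100, 50, 200, 200]
eyeBBInFull = [150, 100, 80, 30]
eyeBBInCrop = [
    (eyeBBInFull[0] - headBBInFull[0]) / headBBInFull[2],  # (150 - 100) / 200 = 0.25
    (eyeBBInFull[1] - headBBInFull[1]) / headBBInFull[3],  # (100 - 50) / 200  = 0.25
    eyeBBInFull[2] / headBBInFull[2],                      # 80 / 200 = 0.40
    eyeBBInFull[3] / headBBInFull[3],                      # 30 / 200 = 0.15
]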

Shader validation error

Thank you so much for this amazing work.
I am trying to run your code on my local machine.
At the step where the shaders are created I get an error.

shader = shaders.compileProgram(VERTEX_SHADER, FRAGMENT_SHADER)

ShaderValidationError: Validation failure (0):

package versions:
lucid 0.2.3
PyOpenGL 3.1.5

I would greatly appreciate any help. Thanks!

demo

How do I shrink the gaze vector (the arrow) in the demo? It is quite big.

Prediction for single image

Thanks for sharing the code for Gaze360. As far as I know, the model is trained on 7-frame videos; what changes should I make to use the model to predict the gaze vector for a single image?
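One workaround, also mentioned in the cross-dataset evaluation issue above, is to repeat the single frame seven times so the sequence looks like a static video to the LSTM model. A minimal sketch (the preprocessing below is a generic ImageNet-style transform, an assumption rather than the exact pipeline of this repository; the input shape expected by the model should be checked against run.py / model.py):

import torch
from torchvision import transforms
from PIL import Image

# Hedged sketch: build a 7-frame "sequence" from one image by repeating it.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("face_crop.jpg")                 # hypothetical head-crop image
frame = preprocess(image)                           # 3 x 224 x 224
sequence = frame.unsqueeze(0).repeat(7, 1, 1, 1)    # 7 x 3 x 224 x 224, all frames identical
batch = sequence.unsqueeze(0)                       # 1 x 7 x 3 x 224 x 224; reshape as the model expects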

Code for domain adaptation

Hi, in the paper I see a part about using an unsupervised method for domain adaptation to unseen people. Will the code for this part be made available?

how to get gaze label?

@erkil1452 Hello, I noticed a new gaze dataset named ETH-XGaze (https://ait.ethz.ch/projects/2020/ETH-XGaze/), and I still have some questions about how to obtain gaze labels. 1) How do I get the target's 3D coordinate? You told me to measure it with a tape measure. What measurement unit should I use: mm, cm, or m? A second question is how to choose the origin of the world coordinate system: when the origin changes, the target's world coordinate also changes, and when we then convert the target's world coordinate into the camera coordinate system, the target's 3D coordinate will be different. 2) How do I get the eye's 3D coordinate? First I want to use dlib to get the face landmarks and then fit a 3D face landmark model, but this approach only gives me a rotation matrix and a translation vector through PnP, and I do not know how to get the eye's 3D coordinate from them. Can you help me?
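On question 2, a minimal sketch of the usual approach (my reading, not something from this repository): once solvePnP gives the pose (R, t) that maps points of a generic 3D face model into camera coordinates, applying the same transform to the model's eye point gives an approximate eye position in camera coordinates. The face model points, landmark correspondences, and camera matrix are assumptions you have to supply:

import numpy as np
import cv2

# Hedged sketch: recover an approximate 3D eye position from 2D landmarks via head-pose PnP.
# model_points: Nx3 points of a generic 3D face model; image_points: matching Nx2 landmarks;
# eye_model_point: the eye's 3D position in the same face-model coordinates; K: camera matrix.
def eye_position_in_camera(model_points, image_points, eye_model_point, K, dist_coeffs=None):
    ok, rvec, tvec = cv2.solvePnP(model_points, image_points, K, dist_coeffs)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                                   # rotation vector -> 3x3 matrix
    return (R @ np.asarray(eye_model_point).reshape(3, 1) + tvec).ravel()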

How to get mean angular errors

Hi Petr, thanks for your work. I have a question about your paper "Gaze360: Physically Unconstrained Gaze Estimation in the Wild". In Table 2 you report the mean angular errors for various models. How do you compute this mean angular error? Is it (yaw error + pitch error) / 2?

how to present the predicted result

Is there some code to produce a picture like this:

(example image)

I do not know how to present the predicted result. I only get the result as (x, y, z) or (pitch, yaw); how do I then draw the prediction on the image, like the picture above?
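A minimal sketch of one common way to do this (not the repository's own visualization code): draw a fixed-length 2D arrow starting from a chosen eye position, with the arrow direction derived from the predicted angles. The sign conventions below follow the spherical definition quoted earlier on this page and may need adjusting:

import math
import cv2

# Hedged sketch: draw a gaze arrow of fixed pixel length starting at eye_pos = (x, y) in pixels.
def draw_gaze(image, eye_pos, yaw, pitch, arrow_len=100, color=(0, 0, 255)):
    dx = -arrow_len * math.cos(pitch) * math.sin(yaw)
    dy = -arrow_len * math.sin(pitch)                 # image y axis points downwards
    start = (int(eye_pos[0]), int(eye_pos[1]))
    end = (int(eye_pos[0] + dx), int(eye_pos[1] + dy))
    cv2.arrowedLine(image, start, end, color, thickness=2, tipLength=0.2)
    return image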

About eye and camera coordinate system in dataset

@erkil1452
Hi, thanks for your great work, but when I try to use your dataset I have some problems.

  1. In https://github.com/erkil1452/gaze360/tree/master/dataset, it says:

gaze_dir = M * (target_pos3d - person_eyes3d), where M depends on a normal direction between the eyes and the camera.
Is M the conversion matrix M = SR, as in "Revisiting data normalization for appearance-based gaze estimation"
and "Learning-by-synthesis for appearance-based 3D gaze estimation"?

  2. If 1 is true, when I use a raw image from the dataset, do I still need to do some data normalization?
  3. How can I get the camera parameters for the dataset so that I can compute the M matrix?
  4. In your paper I also found an illuminating part, "Estimating attention in a supermarket". If possible, could you please tell me how you map the gaze vector onto the shelf?

Thanks a lot!

Dataset download interrupted

I obtained the data download link after registration, but because the dataset is so large the download gets interrupted and the link becomes invalid. Could I instead get a cloud storage link (e.g. Google Drive or Dropbox) after registration?

about the data normalization

Hi, thanks for your great work. I have some questions about running inference on my own data.
As in "Revisiting data normalization for appearance-based gaze estimation"
and "Learning-by-synthesis for appearance-based 3D gaze estimation",
the face image usually needs to be normalized with some matrix, but you mentioned in #30 that you do not do any face normalization.
So when I test a new image with a model trained on the face images in Gaze360, do I just need to crop the face using the bounding box and feed it into the model?

Unable to login the account for the Gaze360

To whom it may concern,

I was trying to register an account to access the Gaze360 dataset for a human-interaction project.

It seems I cannot log in with the activated e-mail address. There might be some problem with the dataset login system: I could not log in with the right password, tried a few times, and got forbidden. I also could not contact you via the e-mail address [email protected], and I have sent an e-mail to [email protected]. Could you please help me?

Best regards,
Haiyuan Liu

Could you tell me the settings for gaze360 V2?

Thank you for your great work!
My environment is as follows:
Ubuntu 20.04
cuda: 9.2
cudnn:
tensorflow: 1.15
tensorflow-gpu: 1.15
pytorch: 1.1.0
but it does not work... 😭😭
Could you tell me the settings you used for gaze360 V2?

how to compute the data in train.txt

@erkil1452 In train.txt, I notice the data format looks like this:

rec_022/head/000000/000131.jpg 0.453840661720333 0.057788951913994 -0.889207001100381
rec_022/head/000022/000131.jpg 0.187331672118635 0.036473247576072 -0.981619349255347
rec_022/head/000001/000131.jpg 0.376386464341092 0.055499673938626 -0.924798905521367
rec_022/head/000002/000131.jpg 0.286355056945921 0.029490498088630 -0.957669615203481
rec_022/head/000584/000131.jpg 0.430812587090941 0.056813836181665 -0.900651265930567

I want to know how to compute these numbers:
0.453840661720333 0.057788951913994 -0.889207001100381
0.187331672118635 0.036473247576072 -0.981619349255347
0.376386464341092 0.055499673938626 -0.924798905521367
0.286355056945921 0.029490498088630 -0.957669615203481
0.430812587090941 0.056813836181665 -0.900651265930567

Is there a formula to compute these values, or some code that produces them?
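For reference, the dataset README describes these three numbers as gaze_dir = M * (target_pos3d - person_eyes3d) (see the coordinate-system issue above). A minimal sketch of that computation, with M left as a placeholder rotation since its exact construction is described in the dataset documentation:

import numpy as np

# Hedged sketch: unit gaze direction from 3D target and eye positions, rotated by M into the
# eye coordinate system. M = identity here is only a placeholder, not the dataset's actual M.
def gaze_label(target_pos3d, person_eyes3d, M=np.eye(3)):
    g = M @ (np.asarray(target_pos3d, dtype=float) - np.asarray(person_eyes3d, dtype=float))
    return g / np.linalg.norm(g)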

Static model

Hello Petr Kellnhofer,

I am a student at Xidian University in China, and I am studying gaze estimation. After reading your ICCV 2019 paper I have learned a lot, but I can only find the LSTM model in your GitHub repository, and I need to test the static model. Could you provide the static model?

I would greatly appreciate it if you could send me the static model from "Gaze360: Physically Unconstrained Gaze Estimation in the Wild" (ICCV 2019).

Thank you very much! Best wishes!

Gerong

Full images

Hi,

I'm working on a 2D gaze estimation task using image coordinates. It requires both the head-crop images and the full scene images. Could you please provide a link to the full image dataset? Thank you.

How about the predict results?

Hello, dear friend!
I wonder whether the prediction result is (yaw, pitch, roll),
and how can it be converted to two-dimensional image coordinates?
Thanks a lot!

Prediction

Hello, I have two questions:
what do the output coordinates represent, and how can the visualization be achieved?
