
Comments (6)

erkil1452 commented on August 17, 2024
  1. It is the cyclopean eye (the mean of both eyes). We do not detect the eyes explicitly, so the point is estimated from the AlphaPose skeleton.
  2. The tag has a known size (and it is relatively large), so we can recover its position from its size in the image. This assumes we know the camera intrinsics, the pixel coordinates of the marker's corners (and therefore the 3D view rays), and the physical size of the marker. From there we need to find the 3D rotation and translation of the marker that fit the size and shape constraints (a minimal sketch follows this list).
  3. We use 7 frames. I believe we tested fewer as well, but I do not see the results anywhere; presumably they were at least a little worse. The MSE Static entry in Table 2 should show the 1-frame case.
  4. It seems they simply ask people to look at a target chosen for them, so they know where each person is looking: https://ait.ethz.ch/projects/2020/ETH-XGaze/ . They also know the size of the screen, and judging from the head rest, they can also control the position of the participant. So they have all the 3D points under control.
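
To illustrate point 2, here is a minimal sketch of recovering a marker's pose from its known physical size via perspective-n-point, assuming OpenCV and a previously calibrated camera. All numeric values (marker size, corner pixels, intrinsics) are hypothetical placeholders, not values from the paper:

```python
# Sketch: marker pose from known size + detected corners (PnP).
import numpy as np
import cv2

MARKER_SIZE = 0.20  # assumed physical side length in meters (placeholder)

# 3D corner positions in the marker's own coordinate frame.
object_points = np.array([
    [0.0,         0.0,         0.0],
    [MARKER_SIZE, 0.0,         0.0],
    [MARKER_SIZE, MARKER_SIZE, 0.0],
    [0.0,         MARKER_SIZE, 0.0],
], dtype=np.float64)

# Detected corner pixel coordinates (hypothetical values).
image_points = np.array([
    [412.0, 305.0],
    [530.0, 310.0],
    [525.0, 420.0],
    [408.0, 415.0],
], dtype=np.float64)

# Camera intrinsics from a prior calibration (hypothetical values).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

# Solve for the marker's rotation and translation in camera coordinates.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
print("marker position in camera frame (m):", tvec.ravel())
```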


jxncyym commented on August 17, 2024

@erkil1452 thank you for your reply.

  1. About the fourth question, I want to confirm my understanding of how the ground truth is obtained. Because the data are collected in a lab, if we set a point (say, the top-left corner of the screen) as the origin of the world coordinate system, then every 3D world coordinate in the lab is known, including every point on the screen. They can then use Zhang's calibration method to get the rotation and translation matrices for every camera, use those matrices to convert the 3D world coordinates to camera coordinates, and use the camera coordinates to compute the ground truth (see the conversion sketch after this list). Is my understanding correct?
  2. Another question: once a camera is calibrated (say, with Zhang's method), do the rotation and translation matrices stay unchanged in any environment?
  3. Why do you use the head image instead of the face image to train the model? And did you evaluate the performance of using the face image?
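
As a reference for point 1, here is a minimal sketch of the world-to-camera conversion being described, assuming the extrinsics R (3x3 rotation) and t (translation) have already been obtained from a Zhang-style calibration. All numeric values are placeholders:

```python
# Sketch: converting a 3D world point into camera coordinates.
import numpy as np

R = np.eye(3)                    # extrinsic rotation (placeholder)
t = np.array([0.0, 0.0, 2.0])    # extrinsic translation in meters (placeholder)

# A gaze target on the screen, in a world frame whose origin is e.g.
# the screen's top-left corner.
p_world = np.array([0.25, 0.15, 0.0])

# Points transform with both the rotation and the translation.
p_cam = R @ p_world + t
print("target in camera coordinates:", p_cam)
```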


erkil1452 commented on August 17, 2024
  1. You can still use the known camera position (from a previous calibration) as the origin. Subtracting the 3D head position from the 3D position of the gaze target on the screen gives the gaze vector (see the sketch after this list).
  2. The intrinsic matrix does not change when you move the camera. The extrinsic matrix (translation and rotation) generally does change.
  3. In a 360-degree setup the face is often not visible, and we only tested the method on head crops.
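
A minimal sketch of the subtraction in point 1, with all positions as placeholder values and both points assumed to be in the same frame. Note that a direction vector is rotated, but not translated, when expressed in the camera frame:

```python
# Sketch: gaze ground truth as (target - eye), normalized.
import numpy as np

target_world = np.array([0.25, 0.15, 0.00])  # gaze target on the screen (placeholder)
eye_world    = np.array([0.10, 0.05, 0.60])  # cyclopean eye position (placeholder)

gaze_world = target_world - eye_world
gaze_world /= np.linalg.norm(gaze_world)     # unit gaze direction

# Directions transform with R only; the translation t applies to
# points, not to direction vectors.
R = np.eye(3)                                # world-to-camera rotation (placeholder)
gaze_cam = R @ gaze_world
print("gaze direction in camera frame:", gaze_cam)
```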


jxncyym commented on August 17, 2024

@erkil1452 I am very sorry, but I still have not fully understood what you said. Since I am new to this, could you describe the process in detail? My guess: first we fix the camera and the screen, then we calibrate the camera to get the rotation and translation matrices. Do you mean that we use the camera position as the origin of the world coordinate system, use the distance between the camera and the gaze target to compute the gaze target's world coordinates, compute the head position's world coordinates in the same way, and then subtract the head position from the gaze target to get the gaze vector, finally using the rotation and translation matrices to convert the gaze vector to the camera coordinate system? Is that right? If not, could you describe the process in detail?


erkil1452 commented on August 17, 2024

Yes, it is as you say.


jxncyym commented on August 17, 2024

Thank you very much.

