
ivgaze's Introduction

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng , Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang, CVPR 2024

Description

This repository provides the official code for the paper What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation, accepted at CVPR 2024. Our contributions include:

  • We provide a dataset IVGaze collected on vehicles containing 44k images of 125 subjects.
  • We propose a gaze pyramid transformer (GazePTR) that leverages transformer-based multilevel features integration.
  • We introduce the dual-stream gaze pyramid transformer (GazeDPTR). Employing perspective transformation, we rotate virtual cameras to normalize images, utilizing camera pose to merge normalized and original images for accurate gaze estimation.

Please visit our project page for details. The dataset is available on this page.

Requirement

  1. Install PyTorch and torchvision. This code was written in Python 3.8 and uses PyTorch 1.13.1 with CUDA 11.6 on an Nvidia GeForce RTX 3090. This environment is recommended but not mandatory; feel free to run the code in your preferred environment.
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
  2. Install other packages.
pip install opencv-python PyYAML easydict warmup_scheduler

If you have any issues due to missing packages, please report them. I will update the requirements. Thank you for your cooperation.
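A quick way to spot missing packages before training is to probe for the importable module names. The sketch below is not part of this repository: the helper name find_missing is hypothetical, and the mapping from pip package names to module names (for example opencv-python to cv2) is an assumption spelled out in the code.

```python
import importlib.util

# pip package name -> importable module name (illustrative mapping, not from the repo)
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "PyYAML": "yaml",
    "easydict": "easydict",
    "warmup_scheduler": "warmup_scheduler",
}


def find_missing(required=REQUIRED):
    """Return the pip names whose module cannot be located in this environment."""
    missing = []
    for pip_name, module_name in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates modules and does not import them, so it stays fast and does not verify versions or CUDA availability.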

Training

Step 1: Choose the model file.

We provide three models: GazePTR.py, GazeDPTR.py and GazeDPTR_v2.py. (We will update the pretrained weights ASAP.)

| # | Name | Description | Input | Output | Accuracy | Pretrained Weights |
|---|------|-------------|-------|--------|----------|--------------------|
| 1 | GazePTR | This method leverages multi-level features. | Normalized images | Gaze directions | 7.04° | Link |
| 2 | GazeDPTR | This method integrates features from two images. | Normalized images, original images | Gaze directions | 6.71° | Link |
| 3 | GazeDPTR_V2 | This method contains a differential projection for gaze zone prediction. | Normalized images, original images | Gaze directions, gaze zone | 6.71° (gaze direction), 81.8% (gaze zone) | Link |

Please choose one model and rename it as model.py, e.g.,

cp GazeDPTR.py model.py

Step 2: Modify the config file

Please modify config/train/config_iv.yaml according to your environment settings.

  • The save attribute specifies the save path. Models are stored at os.path.join({save.metapath}, {save.folder}), and each saved model is named Iter_{epoch}_{save.model_name}.pt.
  • The data attribute indicates the dataset path. Update the image and label entries to match your dataset location.
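The naming rule for saved models can be sketched as follows. This is a minimal illustration of the path pattern described above, not the repository's code: the function name checkpoint_path and the example values are hypothetical, and the training script may add further subfolders (for example one per fold).

```python
import os


def checkpoint_path(metapath, folder, model_name, epoch):
    """Compose a checkpoint path from the save settings described above."""
    save_dir = os.path.join(metapath, folder)
    return os.path.join(save_dir, f"Iter_{epoch}_{model_name}.pt")


if __name__ == "__main__":
    # Hypothetical values standing in for save.metapath, save.folder and save.model_name.
    print(checkpoint_path("/data/results", "ivgaze", "GazeDPTR", 10))
```

Choosing a distinct save.folder per experiment keeps checkpoints from different models from overwriting each other, since the file name only encodes the epoch and model name.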

Step 3: Training models

Run the following command to initiate training. The argument 3 indicates that it will automatically perform three-fold cross-validation:

python trainer/leave.py config/train/config_iv.yaml 3

Once training is complete, the weights are saved at os.path.join({save.metapath}, {save.folder}). Within the checkpoint directory, you will find three folders named train1.txt, train2.txt, and train3.txt, corresponding to the three folds of the cross-validation. Each folder contains the respective trained model.
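To confirm that all three folds produced weights, one option is to walk the save directory and group the .pt files by folder. The sketch below makes no assumption about the exact nesting of the checkpoint directory; the helper name list_checkpoints and the example path are hypothetical.

```python
import os
from collections import defaultdict


def list_checkpoints(save_root):
    """Group every .pt file under save_root by the folder that contains it."""
    found = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(save_root):
        for name in sorted(filenames):
            if name.endswith(".pt"):
                # Report folders relative to save_root for readability.
                found[os.path.relpath(dirpath, save_root)].append(name)
    return dict(found)


if __name__ == "__main__":
    # Hypothetical save root; replace with os.path.join(save.metapath, save.folder).
    for folder, files in sorted(list_checkpoints("/data/results/ivgaze").items()):
        print(folder, files)
```

After a complete run, the output should include one entry for each of the train1.txt, train2.txt, and train3.txt folders.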

Testing

Run the following command for testing.

python tester/leave.py config/train/config_iv.yaml config/test/config_iv.yaml 3

Similarly,

  • Update the image and label in config/test/config_iv.yaml based on your dataset location.
  • The savename attribute specifies the folder to save prediction results, which will be stored at os.path.join({save.metapath}, {save.folder}) as defined in config/train/config_iv.yaml.
  • The script tester/leave.py also outputs gaze zone prediction results. Remove that part of the code if you do not require gaze zone prediction.

Evaluation

We provide the evaluation.py script to assess the accuracy of gaze direction estimation. Run the following command:

python evaluation.py {PATH}

Replace {PATH} with the path of {savename} as configured in your settings.
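Gaze direction accuracy is reported in degrees in the model table, which corresponds to the angle between predicted and ground-truth gaze directions. The sketch below shows how such an angular error is commonly computed. It is not the content of evaluation.py: it assumes gaze is given as 3D direction vectors, and both function names are hypothetical.

```python
import math


def angular_error_deg(pred, label):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * q for p, q in zip(pred, label))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(q * q for q in label))
    # Clamp to [-1, 1] so rounding errors never push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(preds, labels):
    """Average angular error over paired predictions and labels."""
    errors = [angular_error_deg(p, q) for p, q in zip(preds, labels)]
    return sum(errors) / len(errors)
```

The vectors do not need to be unit length, because the dot product is divided by both norms before the arccosine is taken.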

Contact

Please send an email to [email protected] if you have any questions.


ivgaze's Issues

Question about dataset parse

Hi, professor Cheng @yihuacheng , Thank you for your work!
I did some visualizations of labels while parsing normal dataset, and I found that some of the lines of sight were going in different directions than I expected. For example, I think the person is looking down, but the label says the person is looking up.
I did a simple test using the pre-trained ResNet50(ETH-XGaze) and labels for comparison, I found that the ResNet50 was similar to my visual judgment, i.e. my judgment and the label diverged a bit.
Here is a picture of my experiment, the data comes from the Norm/20221013/subject0050_yaw_out, green represents the label and red represents the prediction, a red source indicates a suspicious output. Can you give me some useful information? thanks!
