What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang, CVPR 2024

Description

This repository provides the official code for the paper What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation, accepted at CVPR 2024. Our contributions include:

We provide IVGaze, a dataset collected in vehicles that contains 44k images of 125 subjects.
We propose a gaze pyramid transformer (GazePTR) that leverages transformer-based multi-level feature integration.
We introduce the dual-stream gaze pyramid transformer (GazeDPTR). Using a perspective transformation, we rotate virtual cameras to normalize images, and we use the camera pose to merge normalized and original images for accurate gaze estimation.
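To make the "rotate virtual cameras" idea concrete, here is a minimal sketch of the standard pinhole-camera relation for a pure camera rotation, H = K R K^-1, which maps pixels of the original view into the rotated virtual view. It is not the authors' normalization code: the intrinsics (fx, fy, cx, cy), the yaw-only rotation, and all function names are assumptions chosen for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def yaw_rotation(angle_rad):
    """Rotation about the camera's vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotation_homography(fx, fy, cx, cy, rotation):
    """Return H = K R K^-1 for a pinhole camera with the given intrinsics."""
    k = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    # Closed-form inverse of the upper-triangular intrinsic matrix.
    k_inv = [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(k, rotation), k_inv)


def warp_point(h, u, v):
    """Apply homography h to pixel (u, v) and dehomogenize."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


# Hypothetical intrinsics and a small virtual-camera rotation.
H = rotation_homography(1000.0, 1000.0, 640.0, 360.0, yaw_rotation(0.1))
print(warp_point(H, 640.0, 360.0))  # where the principal point lands
```

In practice a matrix like H would be applied to a whole image, for example with OpenCV's cv2.warpPerspective, rather than point by point.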

Please visit our project page for details. The dataset is available on this page.

Requirement

Install PyTorch and torchvision. This code is written in Python 3.8 and uses PyTorch 1.13.1 with CUDA 11.6 on an Nvidia GeForce RTX 3090. While this environment is recommended, it is not mandatory. Feel free to run the code in your preferred environment.

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

Install other packages.

pip install opencv-python PyYAML easydict warmup_scheduler

If you have any issues due to missing packages, please report them. I will update the requirements. Thank you for your cooperation.
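Before reporting a missing package, a quick check like the following sketch can tell you which of the packages listed above are not importable in your environment. The helper name missing_packages is hypothetical and not part of this repository; the mapping uses the usual import names for these pip packages (for example, opencv-python is imported as cv2 and PyYAML as yaml).

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "PyYAML": "yaml",
    "easydict": "easydict",
    "warmup_scheduler": "warmup_scheduler",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```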

Training

Step 1: Choose the model file.

We provide three models: GazePTR.py, GazeDPTR.py and GazeDPTR_v2.py. (We will update pretrained weights ASAP.)

#	Name	Description	Input	Output	Accuracy	Pretrained Weights
1	GazePTR	This method leverages multi-level features.	Normalized Images	Gaze Directions	7.04°	Link
2	GazeDPTR	This method integrates features from two images.	Normalized Images, Original Images	Gaze Directions	6.71°	Link
3	GazeDPTR_V2	This method contains a differential projection for gaze zone prediction.	Normalized Images, Original Images	Gaze Directions, Gaze Zone	6.71°, 81.8%	Link

Please choose one model and rename it as model.py, e.g.,

cp GazeDPTR.py model.py

Step 2: Modify the config file

Please modify config/train/config_iv.yaml according to your environment settings.

The Save attribute specifies the save path; the model will be stored at os.path.join({save.metapath}, {save.folder}). Each saved model is named Iter_{epoch}_{save.model_name}.pt.
The data attribute indicates the dataset path. Update the image and label to match your dataset location.
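As an illustration of the naming rule above, the following sketch builds the path at which a checkpoint would be expected. The function name and the example values are hypothetical; only the os.path.join({save.metapath}, {save.folder}) directory and the Iter_{epoch}_{save.model_name}.pt pattern come from the description above, and any per-fold subfolders created during cross-validation are not modeled here.

```python
import os


def checkpoint_path(metapath, folder, epoch, model_name):
    """Build the expected checkpoint path from the save.* config values."""
    save_dir = os.path.join(metapath, folder)  # {save.metapath}/{save.folder}
    return os.path.join(save_dir, f"Iter_{epoch}_{model_name}.pt")


# Hypothetical config values, for illustration only.
print(checkpoint_path("/data/experiments", "ivgaze", 10, "GazeDPTR"))
```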

Step 3: Training models

Run the following command to initiate training. The argument 3 indicates that it will automatically perform three-fold cross-validation:

python trainer/leave.py config/train/config_iv.yaml 3

Once training is complete, you will find the weights saved at os.path.join({save.metapath}, {save.folder}). Within the checkpoint directory, you will find three folders named train1.txt, train2.txt, and train3.txt, corresponding to the three-fold cross-validation. Each folder contains the respective trained model.
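For readers unfamiliar with three-fold cross-validation, here is a generic sketch of the idea: each fold is held out once for testing while the remaining folds are used for training. How trainer/leave.py actually splits the data is not described here, so the function and the fold names below are assumptions for illustration only.

```python
def leave_one_fold_out(folds):
    """Yield (training_folds, held_out_fold) once for every fold."""
    for i, held_out in enumerate(folds):
        training = [fold for j, fold in enumerate(folds) if j != i]
        yield training, held_out


# Hypothetical fold identifiers; three folds give three training runs.
for training, held_out in leave_one_fold_out(["fold1", "fold2", "fold3"]):
    print(f"train on {training}, test on {held_out}")
```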

Testing

Run the following command for testing.

python tester/leave.py config/train/config_iv.yaml config/test/config_iv.yaml 3

Similarly,

Update the image and label in config/test/config_iv.yaml based on your dataset location.
The savename attribute specifies the folder to save prediction results, which will be stored at os.path.join({save.metapath}, {save.folder}) as defined in config/train/config_iv.yaml.
The code in tester/leave.py also outputs gaze zone prediction results. Remove that part if you do not require gaze zone prediction.

Evaluation

We provide the evaluation.py script to assess the accuracy of gaze direction estimation. Run the following command:

python evaluation.py {PATH}

Replace {PATH} with the path of {savename} as configured in your settings.
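Gaze direction accuracy is commonly reported as the angle between the predicted and ground-truth gaze vectors, which matches the degree values in the model table. The sketch below shows that metric under the assumption that gaze directions are given as 3D vectors; it is not the code of evaluation.py, whose input format is not described here, and the function names are hypothetical.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and labels."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

For example, mean_angular_error_deg([(0, 0, -1), (1, 0, 0)], [(0, 0, -1), (0, 1, 0)]) averages a perfect prediction with a perpendicular one.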

Contact

Please send an email to [email protected] if you have any questions.
