
[cvpr 20] Demo, training and evaluation code for joint hand-object pose estimation in sparsely annotated videos

Home Page: https://hassony2.github.io/handobjectconsist.html

License: MIT License

cvpr2020 sparse-supervision photometric differentiable-rendering hands pose-estimation video 3d-reconstruction

handobjectconsist's Introduction

Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction

Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, and Cordelia Schmid

Table of Contents

Setup

Download and install code

  • Retrieve the code
git clone https://github.com/hassony2/handobjectconsist
cd handobjectconsist
  • Create and activate the virtual environment with python dependencies
conda env create --file=environment.yml
conda activate handobject_env
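
Optionally, sanity-check that the environment sees PyTorch and the GPU (this assumes PyTorch is installed by environment.yml, which the training scripts require):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"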

Download the MANO model files

  • Go to MANO website

  • Create an account by clicking Sign Up and provide your information

  • Download Models and Code (the downloaded file should have the format mano_v*_*.zip). Note that all code and data from this download falls under the MANO license.

  • Unzip it and copy the content of the models folder into the assets/mano folder

  • Your structure should look like this:

handobjectconsist/
  assets/
    mano/
      MANO_LEFT.pkl
      MANO_RIGHT.pkl
      fhb_skel_centeridx9.pkl
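
A quick way to verify the layout (a minimal sketch mirroring the tree above):

import os

# Files expected under assets/mano (see the tree above).
required = [
    "assets/mano/MANO_LEFT.pkl",
    "assets/mano/MANO_RIGHT.pkl",
    "assets/mano/fhb_skel_centeridx9.pkl",
]
missing = [path for path in required if not os.path.exists(path)]
print("All MANO assets found." if not missing else "Missing: {}".format(missing))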

Download datasets

First-Person Hand Action Benchmark (FPHAB)

  • Download the First-Person Hand Action Benchmark dataset following the official instructions to the data/fhbhands folder
  • Unzip the Object_models

unzip data/fhbhands/Object_models.zip -d data/fhbhands

  • Unzip MANO fits

tar -xvf assets/fhbhands_fits.tgz -C assets/

  • Download and unzip the pre-trained models

wget https://github.com/hassony2/handobjectconsist/releases/download/v0.2/releasemodels.zip

unzip releasemodels.zip

  • Optionally, resize the images (speeds up training!)

    • python reduce_fphab.py
  • Your structure should look like this:

data/
  fhbhands/
    Video_files/
    Video_files_480/  # Optional, created by reduce_fphab.py script
    Subjects_info/
    Object_models/
    Hand_pose_annotation_v1/
    Object_6D_pose_annotation_v1_1/
assets/
  fhbhands_fits/
releasemodels/
  fphab/
     ...

HO3D

CVPR 2020

Note that all results in our paper are reported on a subset of the current dataset, which was published as an early release; additionally, we used synthetic data which is not released. The results are therefore not directly comparable with the final published results, which are reported on the v2 version of the dataset.

Codalab challenge pre-trained model

After submission, I retrained a baseline model on the current version of the dataset (the official release of HO3D, which I refer to as HO3D-v2). You can get the model from releasemodels.zip (see below).

Evaluate the pre-trained model:

  • Download pre-trained models

  • Extract the pre-trained models: unzip releasemodels.zip

  • Run the evaluation code and generate the codalab submission file

python evalho3dv2.py --resume releasemodels/ho3dv2/realonly/checkpoint_200.pth --val_split test --json_folder jsonres/res

This will create a file pred.zip ready for upload to the codalab challenge.

Training model on HO3D-v2

  • Download the HO3D-v2 dataset.

  • Launch training using python trainmeshreg.py, providing all the arguments listed in releasemodels/ho3dv2/realonly/opt.txt (see the example command below)
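
For reference, one invocation that has been used to train on HO3D-v2 (taken from the "Training on HO3D" issue further down this page; check the arguments against opt.txt before relying on them):

python trainmeshreg.py --freeze_batchnorm --workers 8 --block_rot --train_datasets ho3dv2 --version 1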

Demo

Run the demo on the FPHAB dataset.

python visualize.py

This script loads three models and visualizes their predictions on samples from the test split of FPHAB:

  • a model trained on the full FPHAB dataset
  • a model trained with only a fraction (<1%) of the full ground truth annotations, fine-tuned with photometric consistency
  • a control model trained with the same fraction of the full ground truth annotations, fine-tuned without photometric consistency

It produces images such as the following:

[demo output image]

Training

Run the training code

Baseline model for joint hand-object pose estimation

Train the baseline model on the entire FPHAB dataset (100% of the data supervised with 3D annotations):

python trainmeshreg.py --freeze_batchnorm --workers 8 --block_rot

Train in sparsely annotated setting

  • Step 1: Train the baseline model on a fraction of the FPHAB dataset (here 0.625%)
python trainmeshreg.py --freeze_batchnorm --workers 8 --fraction 0.00625 --eval_freq 50
  • Step 2: Resume training, adding photometric supervision

Step 1 will have produced a trained model, saved in a subdirectory of checkpoints/fhbhands_train_mini1/{date_you_launched_training}/.

Step 2 will resume training from this model and further train with the additional photometric consistency loss on the frames for which the ground truth annotations are not used (a generic sketch of such a loss is given after this section).

python trainmeshwarp.py --freeze_batchnorm --consist_gt_refs --workers 8 --fraction 0.00625 --resume checkpoints/path/to/saved/checkpoint.pth

  • Optional: For a fair comparison (same number of training epochs), training can also be resumed without photometric consistency; this shows that the improvement does not come simply from longer training.

python trainmeshwarp.py --freeze_batchnorm --consist_gt_refs --workers 8 --fraction 0.00625 --resume checkpoints/path/to/saved/checkpoint.pth --lambda_data 1 --lambda_consist 0
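
For intuition, here is a minimal sketch of a photometric consistency term of this general kind: a reference frame is warped towards the current frame and compared pixel-wise on trusted regions. It is a generic illustration built on torch.nn.functional.grid_sample, not the repository's implementation (the paper derives the dense warp from the predicted hand and object meshes through differentiable rendering); the flow and mask inputs below are assumptions of this sketch.

import torch
import torch.nn.functional as F

def photometric_consistency_loss(ref_img, tgt_img, flow, valid_mask):
    """L1 photometric error between tgt_img and ref_img warped into the target view.

    ref_img, tgt_img: (B, 3, H, W) images in [0, 1]
    flow:             (B, 2, H, W) per-pixel displacements from target to reference
    valid_mask:       (B, 1, H, W), 1 where the warp is trusted (e.g. a rendered silhouette)
    """
    b, _, h, w = tgt_img.shape
    device = tgt_img.device
    # Base pixel grid of the target image.
    ys = torch.arange(h, device=device, dtype=torch.float32).view(1, h, 1).expand(1, h, w)
    xs = torch.arange(w, device=device, dtype=torch.float32).view(1, 1, w).expand(1, h, w)
    coords_x = xs + flow[:, 0]
    coords_y = ys + flow[:, 1]
    # Normalize sampling coordinates to [-1, 1], as required by grid_sample.
    grid = torch.stack((2 * coords_x / (w - 1) - 1, 2 * coords_y / (h - 1) - 1), dim=-1)
    warped = F.grid_sample(ref_img, grid, align_corners=True)
    # For simplicity, average over all pixels; a real implementation would normalize by the mask area.
    return (valid_mask * (warped - tgt_img).abs()).mean()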

Citation

If you find this code useful for your research, consider citing our paper:

@INPROCEEDINGS{hasson20_handobjectconsist,
	       title     = {Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction},
	       author    = {Hasson, Yana and Tekin, Bugra and Bogo, Federica and Laptev, Ivan and Pollefeys, Marc and Schmid, Cordelia},
	       booktitle = {CVPR},
	       year      = {2020}
}

To fix

Thanks to Samira Kaviani for spotting that in Table 2 the splits are different, because I previously filtered out frames in which hands are further than 10 cm away from the object! I will rerun the results beginning of September and update them here.

Acknowledgements

Code

For this project, we relied on research code from:

Advice and discussion

I would like to especially thank Shreyas Hampali for advice on the HO-3D dataset and Guillermo Garcia-Hernando for advice on the FPHAB dataset.

I would also like to thank Mihai Dusmanu, Yann Labbé and Thomas Eboli for helpful discussions and proofreading!

handobjectconsist's People

Contributors

hassony2


handobjectconsist's Issues

Why does the cam_extr look like this?

Hi, hassony.

Thanks for this work.
I started reading the code recently, and I am now confused about the get_hand_verts3d function in ho3dv2.py.
In that function, you use cam_extr, which is defined as np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]).
I do not understand the reason for this.
I hope to do some small work in this field; any suggestions are welcome.
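
For what it is worth, this 4x4 matrix keeps x and negates the y and z axes of camera-space points, which is the usual flip between two common camera conventions (e.g. OpenGL-style vs. OpenCV-style axes); whether that is the exact reason it is used in ho3dv2.py is an assumption. A small illustration:

import numpy as np

# The extrinsic from ho3dv2.py: identity rotation with the y and z axes negated.
cam_extr = np.array([[1, 0, 0, 0],
                     [0, -1, 0, 0],
                     [0, 0, -1, 0],
                     [0, 0, 0, 1]], dtype=np.float32)

# Applied to a homogeneous 3D point, it keeps x and flips the sign of y and z.
point = np.array([0.1, 0.2, 0.5, 1.0], dtype=np.float32)
print(cam_extr @ point)  # [ 0.1 -0.2 -0.5  1. ]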

About self.reorder_idxs

Hi, thanks for the great work!

In ho3dv2.py (line 54) you use self.reorder_idxs. I have checked that the 3D joint locations obtained by passing the parameters to the MANO layer (your manopth) differ from the ground-truth 3D joint locations, but after applying self.reorder_idxs to change the order they are nearly the same.
So do you change the order through self.reorder_idxs to adapt to the MANO layer? In the official HO3D code, this reordering is only used for simpler visualization.
After training on the HO3D subset, do you use an inverse of reorder_idxs to recover the original joint order used by the evaluation dataset? Is it necessary? How do you achieve it?

Hope to get some suggestions from you.
Merci beaucoup!
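
In case it is useful, undoing an index-based reordering like this amounts to inverting a permutation, which np.argsort gives directly. A toy sketch (the real reorder_idxs covers the 21 hand joints):

import numpy as np

reorder_idxs = np.array([2, 0, 3, 1])                        # toy stand-in for self.reorder_idxs
joints = np.arange(4 * 3, dtype=np.float32).reshape(4, 3)    # (n_joints, 3) array

reordered = joints[reorder_idxs]          # apply the reordering
inverse_idxs = np.argsort(reorder_idxs)   # inverse permutation
recovered = reordered[inverse_idxs]       # back to the original joint order

assert np.allclose(recovered, joints)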

same issue as #11

On running python visualize.py I get the following error (I have made sure I followed all instructions for installing the FPHAB dataset correctly):

Traceback (most recent call last):
  File "visualize.py", line 214, in <module>
    main(args)
  File "visualize.py", line 127, in main
    crops = [vizdemo.get_crop(render_res) for render_res in render_ress]
  File "visualize.py", line 127, in <listcomp>
    crops = [vizdemo.get_crop(render_res) for render_res in render_ress]
  File "/advaya/handobjectconsist/meshreg/visualize/vizdemo.py", line 32, in get_crop
    x_min = xs.min()
  File "/h/advaya/.conda/envs/handobject_env/lib/python3.7/site-packages/numpy/core/_methods.py", line 43, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity

data_split_action_recognition.txt

I was following the README to run visualize.py and kept running into the following error:

Traceback (most recent call last):
  File "visualize.py", line 214, in <module>
    main(args)
  File "visualize.py", line 51, in main
    sample_nb=None,
  File "/scratch/ssd002/home/advaya/handobjectconsist/meshreg/netscripts/get_dataset.py", line 51, in get_dataset
    split=split, use_cache=use_cache, mini_factor=mini_factor, fraction=fraction, mode=mode
  File "/scratch/ssd002/home/advaya/handobjectconsist/meshreg/datasets/fhbhands.py", line 126, in __init__
    self.load_dataset()
  File "/scratch/ssd002/home/advaya/handobjectconsist/meshreg/datasets/fhbhands.py", line 174, in load_dataset
    with open(self.info_split, "r") as annot_f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/fhbhands/data_split_action_recognition.txt'

How do I obtain/create this file?

Problem with neural_renderer

Hi there,

Thanks for sharing your work.

I got this problem while creating the virtual environment:

[screenshot of the environment creation error]

My system: Ubuntu 20.04, Cuda 10.1.

Any suggestions would be greatly appreciated!

Memory used by Warping

Hi @hassony2,
Thanks for sharing the code.
I have tried the second stage of training (meshwarp); after some epochs (about 50 in my case), the occupied memory grows to around 128 GB. I have no idea why this happens.

About Table 1 results in your paper

Hello.
I have two questions.

  1. For the errors reported in Table 1 of your paper, are these errors computed after Procrustes alignment or not?

  2. Does the model checkpoint in releasemodels/fphab/hands_and_objects/checkpoint_200.pth correspond to the model of Table 1?

Thanks in advance!

Live demo with RealSense D435 camera

Hi, thanks for sharing this great project!

I'm wondering whether it's possible to run a live demo using the D435i camera, or whether this library works only with specific datasets?

Thanks!

Which subset exactly were you using for evaluation on HO3D?

Hi, thanks for the great work!

For HO3D you mentioned that
"
HO3D
Optional: Download the HO3D-v2 dataset. Note that all results in our paper are reported on a subset of the current dataset which was published as an early release. The results are therefore not directly comparable with the final published results which are reported on the v2 version of the dataset.
"

Do you remember exactly which subset you were using? I'm working on a project that may need to follow the same protocol for a fair comparison later. HO3D currently runs an online competition and the evaluation-set ground truths are not available to individuals. If I understand correctly, you were not using the same evaluation set at that time?

Training details about HO3D

Hi, Hasson,

Thanks for your great work!

Could you be kind enough to share the training args used to train meshreg on the HO3D dataset, as well as the pre-trained model (for HO3D)?

Thanks 👍

Processing a single image with the model

Hello everyone,

I want to write an application that processes each frame of a video with the pre-trained model of handobjectconsist (producing the MANO mesh and object pose for each frame).
I saw that the code in the visualize.py file demonstrates the model, but it receives a dataset data structure obtained via "dataset, input_res = get_dataset.get_dataset(...)"; however, I don't want to process a whole data structure, just a single frame.
I would like to know the easiest way to process a single frame when each RGB frame is given as a single OpenCV Mat.
I tried it like this (the getMat() function receives the frame from the video stream as an OpenCV Mat):

resume = "releasemodels/fphab/hands_and_objects/checkpoint_200.pth"
opts = reloadmodel.load_opts(resume)
self.model, epoch = reloadmodel.reload_model(resume, opts)
freeze.freeze_batchnorm_stats(self.model)
self.model.cuda()
self.model.eval()

mat = getMat()
dataset = torch.utils.data.Dataset (np.array (mat))
loader = torch.utils.data.DataLoader(dataset,batch_size=1)
_, results, _ = self.model (loader)

However, I get the error:
Failed to call callback: 'DataLoader' object is not subscriptable
Traceback (most recent call last):

Does anybody have an idea how to fix this?
Thanks in advance,
Patrick
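
A note on the snippet above: a DataLoader cannot be passed directly to the model, and torch.utils.data.Dataset is an abstract class that cannot be built from a bare array. The sketch below only shows the generic PyTorch pattern for turning a single OpenCV frame into a batched tensor; the actual model in this repository expects a sample dictionary assembled by its dataset classes, so the resolution and normalization here are assumptions, not the repository's preprocessing.

import cv2
import numpy as np
import torch

def frame_to_batch(mat, input_res=(256, 256)):
    """Turn a single OpenCV BGR frame into a (1, 3, H, W) float tensor.

    The resolution and normalization below are placeholders; the real model
    expects inputs preprocessed exactly like the repository's dataset classes.
    """
    img = cv2.cvtColor(mat, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, input_res)
    img = img.astype(np.float32) / 255.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)  # (1, 3, H, W)

# Usage sketch (getMat() is the hypothetical function from the question):
# batch = frame_to_batch(getMat()).cuda()
# with torch.no_grad():
#     output = model(batch)  # the real model likely expects a dict of inputs, not a bare tensor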

visualize.py zero-size array

On running python visualize.py I get the following error (I have made sure I followed all instructions for installing the FPHAB dataset correctly):

Traceback (most recent call last):
  File "visualize.py", line 214, in <module>
    main(args)
  File "visualize.py", line 127, in main
    crops = [vizdemo.get_crop(render_res) for render_res in render_ress]
  File "visualize.py", line 127, in <listcomp>
    crops = [vizdemo.get_crop(render_res) for render_res in render_ress]
  File "/advaya/handobjectconsist/meshreg/visualize/vizdemo.py", line 32, in get_crop
    x_min = xs.min()
  File "/h/advaya/.conda/envs/handobject_env/lib/python3.7/site-packages/numpy/core/_methods.py", line 43, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity

Are there full MANO fitted parameters for the other frames of "fhbhands"?

Hi, Hasson,

Thank you for the work.
I noticed that the MANO fits in "fhbhands_fits" do not cover the full dataset.
I saw that your previous paper "Learning joint reconstruction of hands and manipulated objects" fitted the MANO model on the "First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations" dataset.
I am wondering whether MANO fitted parameters exist for that whole dataset.
If so, could you share a URL?
Thank you.

Figure 8 in your paper

Hi, I see you have resized all FPHAB images to [480x270] in this code.

In Figure 8, are the reported pixel errors measured at the [480x270] resolution, or at the full resolution? I'm working on a research project and want to compare with this method fairly.

Training on HO3D

Hi Yana,

Thanks for sharing your code on GitHub. It is very cool! I am trying to reproduce your HO3D results using your code, but encountered an error and I am not sure if I missed anything.

The command for launching the training procedure:

python trainmeshreg.py --freeze_batchnorm --workers 8 --block_rot --train_datasets ho3dv2 --version 1

The error message:

Traceback (most recent call last):
  File "trainmeshreg.py", line 361, in <module>
    main(args)
  File "trainmeshreg.py", line 79, in main
    sample_nb=None,
  File "/home/elytra/nether/projects/handobjectconsist/meshreg/netscripts/get_dataset.py", line 40, in get_dataset
    full_sequences=False,
  File "/home/elytra/nether/projects/handobjectconsist/meshreg/datasets/ho3dv2.py", line 259, in __init__
    self.obj_meshes = ho3dfullutils.load_objects(os.path.join(self.root, "modelsprocess"))
  File "/home/elytra/nether/projects/handobjectconsist/meshreg/datasets/ho3dfullutils.py", line 8, in load_objects
    object_names = [obj_name for obj_name in os.listdir(obj_root) if ".tgz" not in obj_name]
FileNotFoundError: [Errno 2] No such file or directory: 'data/ho3dv2/modelsprocess'

I am not sure what modelsprocess is, but according to the object file names it should contain YCB object files, because the expected file is called textured_simple_2000.obj while the YCB objects are called textured_simple.obj.

Questions:

  • How do you go from textured_simple.obj to textured_simple_2000.obj? It is my first mesh-based project. Could you kindly provide some instructions to generate the modelsprocess folder? (my email is [email protected])

Thank you,
Alex
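
In case it helps, the _2000 suffix suggests the meshes were simplified to roughly 2000 faces. One plausible way to produce such files is quadric decimation, sketched below with Open3D; whether this matches the preprocessing actually used to build modelsprocess is an assumption, and the paths are hypothetical (texture handling may also need extra care).

import open3d as o3d

# Hypothetical paths; adjust to where the YCB models live on disk.
src = "data/ho3dv2/models/003_cracker_box/textured_simple.obj"
dst = "data/ho3dv2/modelsprocess/003_cracker_box/textured_simple_2000.obj"

mesh = o3d.io.read_triangle_mesh(src)
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=2000)
o3d.io.write_triangle_mesh(dst, simplified)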

FileNotFoundError: [Errno 2] No such file or directory: '/home/chen/datasets/HO3D_v2/modelsprocess'

Hi, thanks for the update on HO3D-v2. I have two questions:

  1. When I run python evalho3dv2.py --resume releasemodels/ho3dv2/realonly/checkpoint_200.pth --val_split test --json_folder jsonres/res, I get the error below. It seems the object models are needed, so could you share them?

Traceback (most recent call last):
  File "evalho3dv2.py", line 156, in <module>
    main(args)
  File "evalho3dv2.py", line 56, in main
    has_dist2strong=True,
  File "/home/chen/PycharmProjects/handobjectconsist/meshreg/netscripts/get_dataset.py", line 41, in get_dataset
    full_sequences=False,
  File "/home/chen/PycharmProjects/handobjectconsist/meshreg/datasets/ho3dv2.py", line 260, in __init__
    self.obj_meshes = ho3dfullutils.load_objects(os.path.join(self.root, "modelsprocess"))
  File "/home/chen/PycharmProjects/handobjectconsist/meshreg/datasets/ho3dfullutils.py", line 8, in load_objects
    object_names = [obj_name for obj_name in os.listdir(obj_root) if ".tgz" not in obj_name]
FileNotFoundError: [Errno 2] No such file or directory: '/home/chen/datasets/HO3D_v2/modelsprocess'

  2. I am confused about how you could submit results to the Codalab competition, since the method described in your CVPR 2020 paper cannot handle the unseen objects in the official evaluation set. I also remember you once uploaded a better result to the Codalab competition using the method from your CVPR 2019 paper. Could you spare some time to explain this?

Merci beaucoup!
