ronlek / fastv2c-handnet

Repository for the implementation of "FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks"

Home Page: https://arxiv.org/abs/1907.06327

Python 78.40% M 21.60%

hand-pose-estimation fastv2c-handnet deep-learning computer-vision 3d-pose-estimation 3d-hand-pose depth-images 3d-convolutional-network

FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks

Introduction

This is the project repository for the paper "FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks" (Springer).

Please refer to our paper for details.

If you find our work useful in your research or publication, please cite our work:

[1] Rohan Lekhwani, Bhupendra Singh. "FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks" [Springer]

Lekhwani, Rohan, and Bhupendra Singh. 
"FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks." 
International Conference on Innovative Computing and Communications. 
Springer, Singapore, 2019.

In this repository, we provide:

Our model architecture description (FastV2C-HandNet)
Comparison with the previous state-of-the-art methods
Training code
Dataset we used (MSRA)
Trained models and estimated results

Model Architecture

Comparison with the previous state-of-the-art methods

About our code

Dependencies

The code was tested on Ubuntu 18.04 and Windows 10 with an Nvidia P100 GPU (16 GB VRAM).

Code

Clone this repository into any place you want. You may follow the example below.

makeReposit=/the/directory/as/you/wish
mkdir -p "$makeReposit" && cd "$makeReposit"
git clone https://github.com/RonLek/FastV2C-HandNet.git

The src folder contains the Python scripts for the data loader, trainer, tester and other utilities.
The data folder should contain an 'MSRA' folder with the binary image files.

To train our model, please run the following command in the src directory:

python train.py

Dataset

We trained and tested our model on the MSRA Hand Pose Dataset.

MSRA Hand Pose Dataset [link] [paper]

Results

We provide the precomputed centers, estimated 3D coordinates and pre-trained models for the MSRA dataset. You can download the precomputed centers and 3D hand pose results here and the pre-trained models here.

The precomputed centers were obtained by training the hand center estimation network from DeepPrior++. Each line contains the 3D world coordinates of the hand center for one frame. If a frame's depth map does not exist or does not contain a hand, that frame is considered invalid. All test images are considered valid.
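
As a rough illustration of the one-line-per-frame layout described above, the following sketch parses center lines into an array plus a validity mask. The function name `load_centers` is hypothetical, and the exact on-disk format is an assumption: a valid line is taken to hold three whitespace-separated floats, and any other line is treated as an invalid frame. Check the downloaded files before relying on it.

```python
import numpy as np

def load_centers(lines):
    """Parse center lines into an (N, 3) array and a boolean validity mask.

    Assumption: a valid line holds three whitespace-separated floats (x, y, z);
    any other line marks an invalid frame and is stored as NaN.
    """
    centers = np.full((len(lines), 3), np.nan, dtype=np.float64)
    valid = np.zeros(len(lines), dtype=bool)
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            centers[i] = [float(p) for p in parts]
        except ValueError:
            continue  # non-numeric marker: leave the row as NaN
        valid[i] = True
    return centers, valid

# Made-up sample lines, not taken from the released files.
sample = ["10.5 -20.0 350.25", "invalid invalid invalid", "1 2 3"]
centers, valid = load_centers(sample)
print(valid)
```

Keeping invalid rows as NaN, rather than dropping them, preserves the line-to-frame correspondence so the mask can be applied to the depth frames as well.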

We used awesome-hand-pose-estimation to evaluate the accuracy of the FastV2C-HandNet on the MSRA dataset.

Qualitative results are shown below.

fastv2c-handnet's Issues

How to generate the "center__.txt" file?

Should center__.txt be generated in advance? If so, how is it generated? Thank you.

Have you managed to improve the accuracy?

@RonLek , nice work! Thanks for sharing!
Wonder if you have managed to improve the accuracy while maintaining the similar speed.

You provide "center_train_*_refined" files, but I don't know which file each line corresponds to. Could you give me some advice?

You provide "center_train_*_refined" files, but I don't know which file each line corresponds to. Could you give me some advice? Thank you very much.

wrong number of predictions for msra?

I am using the test.py script to generate the predictions so I can later evaluate the model using the https://github.com/xinghaochen/awesome-hand-pose-estimation framework.

For msra the total number of predictions in the generated txt file is 8496 while the expected number of predictions in the https://github.com/xinghaochen/awesome-hand-pose-estimation is 76375.

Therefore I am getting an error while trying to generate the scores:

ValueError: operands could not be broadcast together with shapes (76375,21,3) (8496,21,3)

Shouldn't the test.py script generate a bigger list of predictions for this dataset? Or am I doing something wrong?
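
The ValueError above is NumPy refusing to subtract two arrays whose first dimensions differ (76375 frames of labels against 8496 frames of predictions). A minimal sketch of a per-joint error computation that checks the shapes first is shown here. The function name `mean_joint_error` is hypothetical and this is not the evaluation code of awesome-hand-pose-estimation; it only assumes both arrays are laid out as (frames, joints, 3).

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean Euclidean distance per joint, in the units of the inputs.

    pred, gt: arrays of shape (num_frames, num_joints, 3).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(
            f"prediction shape {pred.shape} does not match label shape {gt.shape}; "
            "check that both cover the same frames"
        )
    dists = np.linalg.norm(pred - gt, axis=2)  # shape (frames, joints)
    return dists.mean(axis=0)                  # shape (joints,)

# Toy data: every predicted joint is offset by (3, 4, 0) from the label.
gt = np.zeros((5, 21, 3))
pred = gt.copy()
pred[..., 0] = 3.0
pred[..., 1] = 4.0
per_joint = mean_joint_error(pred, gt)
print(per_joint.mean())
```

Failing early with both shapes in the message makes it clear whether the prediction file covers only part of the dataset before any scores are computed.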

how to speed up training?

Environment: Ubuntu, RTX 2080 Ti (11 GB)

When I train on the MSRA dataset, training is very slow and the batch size can only be set to 2. How can I increase the batch size? Why can a GPU with 11 GB of memory only fit a batch size of 2? How can I improve the training speed? How many epochs are needed for training? Thank you for your response.

Evaluation Script

Hi,

Thank you for open-sourcing this project.

How can I evaluate the model with awesome-hand-pose-estimation? It seems that test.py can only generate 2124 samples, while awesome-hand-pose-estimation requires the whole MSRA dataset.

Is it possible to evaluate on the whole dataset, or individually?

Thanks

Your model file is 11.4M, so why do you say the model size is 42M?

I tried to run the "hand render" code, but it does not produce the expected result.

Error while running train.py

Hello!
I was trying to reproduce the experiments with your solution. When I run train.py I get this error:

Traceback (most recent call last):
  File "train.py", line 145, in <module>
    net = model_inst(input_channels = 1, output_channels = keypoints_num) 
  File "/home/oriuser/mounted/src/mymodel.py", line 176, in model_inst
    x = Dense(44, activation = 'relu')(x) #Changed
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py", line 1005, in build
    raise ValueError('The last dimension of the inputs to `Dense` '
ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.

I tried to fix that by commenting out this line in mymodel.py

x = Reshape((output_channels, -1))(x)

And got an error:

Traceback (most recent call last):
  File "train.py", line 177, in <module>
    history = net.fit_generator(train_set, steps_per_epoch = steps_per_epoch_train, epochs = epochs_num, verbose = 1, callbacks = [cp_callback], validation_data = val_set, validation_steps = steps_per_epoch_val, workers = 0, use_multiprocessing = False, shuffle = True, initial_epoch = 0) #Changed
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1153, in train_on_batch
    extract_tensors_from_dataset=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 2692, in _standardize_user_data
    y, self._feed_loss_fns, feed_output_shapes)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training_utils.py", line 549, in check_loss_and_target_compatibility
    ' while using as loss `' + loss_name + '`. '
ValueError: A target array with shape (4, 21, 3) was passed for an output of shape (None, 21, 22, 22, 3) while using as loss `mean_squared_error`. This loss expects targets to have the same shape as the output.

Could you please specify tensorflow and keras versions? (I guess this is the issue: I use different versions of tensorflow)

If not, is there any suggestion to fix those errors?
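
The second traceback is consistent with how a Dense layer behaves: it multiplies along the last axis only and leaves every leading axis untouched, so without the Reshape the output stays 5-D and cannot match a (batch, 21, 3) target. The sketch below mimics that behaviour with plain NumPy rather than Keras. It is not the repository's model code, and the feature shape, including the channel count of 2, is a hypothetical choice made only to reproduce the reported output shape.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, joints = 4, 21

def dense(x, units, rng):
    """Mimic a Dense layer (no bias): a matrix product over the last axis only."""
    w = rng.standard_normal((x.shape[-1], units))
    return x @ w

# Hypothetical un-flattened features; the channel count (2) is an assumption.
features = rng.standard_normal((batch, joints, 22, 22, 2))

# Applied directly, the projection keeps all leading axes: the output is 5-D.
out_5d = dense(features, 3, rng)
print(out_5d.shape)    # (4, 21, 22, 22, 3)

# Flatten per sample, project to joints * 3 values, then reshape to (batch, 21, 3).
flat = features.reshape(batch, -1)
out_pose = dense(flat, joints * 3, rng).reshape(batch, joints, 3)
print(out_pose.shape)  # (4, 21, 3)
```

In other words, the target shape can only be matched if the spatial axes are collapsed to a known size before the final projection, which is also why the first traceback complains when that size is undefined.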

I cannot find "mean_error_all" in your paper. Could you provide it?

I cannot find "mean_error_all" in your paper. Could you provide it? The way "mean_error_all" is calculated can be found in show_accuracy.py. Thank you very much.

Why are the functions pixel2world and world2pixel implemented differently for different datasets?

In the V2V GitHub repository (https://github.com/mks0601/V2V-PoseNet_RELEASE), I find that the functions pixel2world and world2pixel are implemented differently for different datasets.
In dataset MSRA: world2pixel(x,y,z) local pixelY = imgHeight/2 - fy * torch.cdiv(y,z)
In dataset ICVL: world2pixel(x,y,z) local pixelY = imgHeight/2 + fy * torch.cdiv(y, z)
Can you provide your ICVL-related code? Is the formula correct? Thank you!
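
The two formulas quoted above differ only in the sign of the y term, which can be sketched as a pinhole projection with a per-dataset sign flag. This is an illustrative Python version, not code from either repository: the `y_sign` parameter, the x-axis formula (the usual pinhole form, not quoted in the issue), and the example camera values are all assumptions.

```python
def world2pixel(x, y, z, fx, fy, width, height, y_sign):
    """Project a world point to pixel coordinates.

    y_sign = -1 gives pixelY = height/2 - fy * y / z (the MSRA form quoted above);
    y_sign = +1 gives pixelY = height/2 + fy * y / z (the ICVL form quoted above).
    """
    px = width / 2 + fx * x / z
    py = height / 2 + y_sign * fy * y / z
    return px, py, z

def pixel2world(px, py, z, fx, fy, width, height, y_sign):
    """Inverse of world2pixel for the same camera parameters and y_sign."""
    x = (px - width / 2) * z / fx
    y = y_sign * (py - height / 2) * z / fy
    return x, y, z

# Example camera values are assumptions; take the real ones from the dataset.
cam = dict(fx=241.42, fy=241.42, width=320, height=240)

px, py, pz = world2pixel(30.0, -40.0, 400.0, y_sign=-1, **cam)
back = pixel2world(px, py, pz, y_sign=-1, **cam)
print(px, py, back)
```

A sign flip of this kind usually reflects whether a dataset's world y axis points up or down relative to the image rows, so either formula can be correct for its own dataset as long as pixel2world uses the same sign.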

I used the "model.h5" that you provide to obtain "mean_error_all", which is [99.6296708]. So the accuracy is not very good, right? The model proposed in another paper (V2V) achieves 7.49 on the MSRA dataset.