tucan9389 / tf2-mobile-2d-single-pose-estimation

:dancer: Pose estimation for iOS and android using TensorFlow 2.0

License: Apache License 2.0

Python 99.50% Shell 0.50%
tensorflow2 pose-estimation mobile ios tensorflow-lite

tf2-mobile-2d-single-pose-estimation's Issues

Training doesn't work

First of all, thank you for sharing the code 👍🏻 I'm trying to get into ML for my project. I've installed all dependencies successfully, downloaded the dataset, and started the training script with python train.py.

It seems to work fine. I'm getting output like this:

Train for 100 steps
Epoch 1/500
2019-11-20 05:27:23.706954: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
 99/100 [============================>.] - ETA: 2s - loss: 0.2444   
Epoch 00001: saving model to /Users/aeyoa/Documents/Sites/tf2-mpe/outputs/models/11200527_hg_lr0.0001.hdf5
100/100 [==============================] - 251s 3s/step - loss: 0.2422

After 500 epochs I'm getting a loss of 0.0038:

Epoch 500/500
 99/100 [============================>.] - ETA: 1s - loss: 0.0038  
Epoch 00500: saving model to /Users/aeyoa/Documents/Sites/tf2-mpe/outputs/models/11200527_hg_lr0.0001.hdf5
100/100 [==============================] - 208s 2s/step - loss: 0.0038

But when I check the progress in TensorBoard, the training seems to be broken: the heatmaps change randomly. Here are a few examples:

After 245 epochs
Screenshot 2019-11-20 at 19 30 52

After 246 epochs — suddenly white
Screenshot 2019-11-20 at 19 37 03

After 306 epochs — still no results
Screenshot 2019-11-20 at 23 01 59

After the fitting I've checked the model with model.predict and still didn't get any reasonable heatmap.

Have you tried the latest code for training? Does it work? Do you have any idea why this isn't working?

Thank you

from_concrete_function()

'TFLiteConverterV2' has no attribute 'from_concrete_function'?

The reason is that TensorFlow 2.0 replaced 'from_concrete_function(func)' with 'from_concrete_functions(funcs)', which takes a list of concrete functions. So we only need to change the code to 'converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])'

Convert Eager to Keras fit in training

Related script

@tf.function
def train_step(model, images, labels):
    with tf.GradientTape() as tape:
        # The model returns one prediction per output stage
        model_output = model(images)
        predictions_layers = model_output
        losses = [loss_object(labels, predictions) for predictions in predictions_layers]
        total_loss = tf.math.add_n(losses)
    # Largest value in the last stage's heatmaps, returned for monitoring
    max_val = tf.math.reduce_max(predictions_layers[-1])
    gradients = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(total_loss)
    return total_loss, losses[-1], max_val

Training trend seems weird on COCO Datasets

This is my configuration file rewritten as the yaml format for my own convenience.

dataset:
  dataset_root_path: /datasets/
  dataset_directory_name: coco_dataset
  train_images: train2017
  train_annotation: annotations_trainval2017/person_keypoints_train2017.json
  valid_images: val2017
  valid_annotation: annotations_trainval2017/person_keypoints_val2017.json

model:
  model_name: cpm
  model_subname: 
  batch_size: 3
  input_width: 192
  input_height: 192
  output_width: 24
  output_height: 24

preprocessing:
  is_scale: True
  is_rotate: True
  is_flipping: True
  is_resize_shortest_edge: True
  is_crop: True
  rotate_min_degree: -15.0
  rotate_max_degree: 15.0
  heatmap_std: 5.0

training:
  batch_size: 32
  learning_rate: 0.001
  epsilon: +1.e-8
  number_of_epoch: 200
  period_echo: 100
  period_save_model: 5000
  period_tensorboard: 10
  period_valid_image: 5000
  valid_pckh: True
  pckh_distance_ratio: 0.5
  multiprocessing_num: 4

I've trained the CPM model for days using this configuration, and when I looked at TensorBoard, the loss and the PCKh value showed no improvement over the course of training.

image

As expected, the predicted heatmaps showed no improvement either.
result2-090000

When I changed my dataset from coco to ai_challenger dataset, the model seemed to be training fine.
image

I suspect the problem is in the data preparation steps, but I could not find the specific code to fix. Do you have any idea what causes this?

RAM memory increases

I use Colab to retrain with custom data. I noticed that after saving the model (5000 steps), I see an increase in the amount of RAM. It could also be due to the val_step function or something else. Has anyone come across a case like this?

Evaluate PCKh chart with saved models

  • write on tensorboard
  • evaluate pckh for each part

Example PCKh Table

Name                 Head  Neck  Shoulder  Elbow  Wrist  Hip   Knee  Ankle  Total
mv2 hourglass        0.00  0.00  0.00      0.00   0.00   0.00  0.00  0.00   0.00
mv2 cpm              0.00  0.00  0.00      0.00   0.00   0.00  0.00  0.00   0.00
mv2 simplepose       0.00  0.00  0.00      0.00   0.00   0.00  0.00  0.00   0.00
resnet18 simplepose  0.00  0.00  0.00      0.00   0.00   0.00  0.00  0.00   0.00
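
For reference, a minimal sketch of how a single PCKh score could be computed. It assumes the usual definition, where a keypoint counts as correct when its distance to the ground truth is within a ratio (0.5 here) of the head size. The function name, the sample coordinates, and the choice of "<=" for the threshold are illustrative assumptions, not code from this repository.

```python
import math


def pckh(predicted, ground_truth, head_size, ratio=0.5):
    """Return the fraction of keypoints within ratio * head_size of the truth.

    predicted and ground_truth are equal-length lists of (x, y) tuples.
    """
    threshold = ratio * head_size
    correct = 0
    for (px, py), (gx, gy) in zip(predicted, ground_truth):
        # Euclidean distance between prediction and annotation
        if math.hypot(px - gx, py - gy) <= threshold:
            correct += 1
    return correct / len(ground_truth)


# Hypothetical sample: the point errors are 2, 5 and 10 pixels
pred = [(10.0, 10.0), (20.0, 25.0), (40.0, 40.0)]
truth = [(10.0, 12.0), (20.0, 20.0), (30.0, 40.0)]
score = pckh(pred, truth, head_size=10.0)  # threshold is 5 pixels
print(f"PCKh@0.5 = {score:.2f}")
```

A per-part table like the one above would apply the same check separately to each body part across all validation images.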

Replace config python code to .cfg file

  • replace model_config.py with config/experiment01.cfg
  • replace config/train_config.py with config/experiment01.cfg
  • create config/dataset/ai_challenge-gpu.cfg
  • create config/dataset/coco2017-gpu.cfg

Getting confidence values from the body points

I'm using the Pose Estimation model for Android, I was wondering how I can get the confidence value from a body point?

I'm also using the iOS/CoreML model and it has a confidence value.
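
One common convention, sketched below, is to take the peak value of each body point's heatmap as its confidence. This assumes the model outputs one heatmap per body point with shape (height, width, num_keypoints); whether the Android and Core ML apps do exactly this is not confirmed here. The sketch uses Python with NumPy for brevity, and the function name and sample values are hypothetical, but the same argmax logic would carry over to the Android side.

```python
import numpy as np


def keypoints_with_confidence(heatmaps):
    """Return a list of (x, y, confidence), one entry per body point.

    heatmaps is assumed to have shape (height, width, num_keypoints).
    """
    height, width, num_keypoints = heatmaps.shape
    flat = heatmaps.reshape(-1, num_keypoints)  # (height * width, K)
    flat_index = flat.argmax(axis=0)            # peak position per keypoint
    confidence = flat.max(axis=0)               # peak value per keypoint
    rows, cols = np.unravel_index(flat_index, (height, width))
    return [(int(c), int(r), float(v)) for r, c, v in zip(rows, cols, confidence)]


# Hypothetical 24x24 output with two body points
heatmaps = np.zeros((24, 24, 2), dtype=np.float32)
heatmaps[5, 7, 0] = 0.9   # row 5, column 7
heatmaps[12, 3, 1] = 0.4  # row 12, column 3
points = keypoints_with_confidence(heatmaps)
print(points)
```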

Training stopped when reaching num_train_samples (I guess)

  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8009, in sub
    "Sub", x=x, y=y, name=name)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [16,48,48,14] vs. [12,48,48,14]
         [[Node: GPU_0/sub_4 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU_0/upsample_for_loss_0, IteratorGetNext_1/_4657)]]
         [[Node: GPU_0/hourglass_out_3_1/_4663 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2975_GPU_0/hourglass_out_3_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'GPU_0/sub_4', defined at:
  File "src/train.py", line 256, in <module>
    tf.app.run()
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "src/train.py", line 158, in main
    reuse_variable)
  File "src/train.py", line 48, in get_loss_and_output
    loss_l2 = tf.nn.l2_loss(tf.concat(pred_heat, axis=0) - input_heat, name='loss_heatmap_stage%d' % idx)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 979, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8009, in sub
    "Sub", x=x, y=y, name=name)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/mot/.conda/envs/pefm-env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [16,48,48,14] vs. [12,48,48,14]
         [[Node: GPU_0/sub_4 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU_0/upsample_for_loss_0, IteratorGetNext_1/_4657)]]
         [[Node: GPU_0/hourglass_out_3_1/_4663 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2975_GPU_0/hourglass_out_3_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
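
The two shapes differ only in the first dimension (16 vs. 12), which is what one would expect if the last batch of an epoch is smaller than the batch size. That is a guess based on the error message, not a confirmed diagnosis. The sketch below shows in plain Python how a sample count that is not a multiple of the batch size produces a short final batch; the sample count of 44 and the function name are hypothetical. In tf.data, passing drop_remainder=True to Dataset.batch (in TensorFlow versions that support it) discards such a short batch.

```python
def batch_sizes(num_samples, batch_size, drop_remainder=False):
    """Return the size of every batch produced from num_samples items."""
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_remainder:
        # The leftover samples form one smaller batch at the end
        sizes.append(remainder)
    return sizes


# Hypothetical dataset of 44 samples with a batch size of 16
print(batch_sizes(44, 16))                       # the last batch has 12 items
print(batch_sizes(44, 16, drop_remainder=True))  # the short batch is dropped
```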

KeyError: 'extra' in train.py

Output:
"""
tensorflow version : 2.2.0
keras version : 2.3.0-tf
config/dataset/coco2017-gpu.cfg
config/training/experiment01.cfg
Traceback (most recent call last):
  File "train.py", line 78, in <module>
    for key in parser["extra"]:
  File "/usr/local/lib/python3.6/configparser.py", line 959, in __getitem__
    raise KeyError(key)
KeyError: 'extra'
"""
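
The traceback shows that parser["extra"] raises KeyError when the loaded .cfg files contain no [extra] section. A minimal sketch of guarding against that is below; the helper name and the sample section contents are hypothetical, and the alternative fix is simply to add an [extra] section to the config file.

```python
import configparser


def read_extra_options(parser):
    """Return the [extra] section as a dict, or an empty dict if it is absent."""
    if not parser.has_section("extra"):
        return {}
    return dict(parser["extra"])


parser = configparser.ConfigParser()
# Hypothetical config without an [extra] section
parser.read_string("""
[training]
learning_rate = 0.001
""")
extra = read_extra_options(parser)
print(extra)  # empty, and no KeyError is raised
```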

Building the model in Google Colab (https://colab.research.google.com/)

Update README for 1.0.0

  • Add results of experiments
  • Change training script command
  • Add new supporting models
  • Add future works

Future Works

  • Convert to S4TF
  • Rearrange the pre-processing code
  • Converting script for Core ML
  • Evaluate PCKh with Core ML model(.mlmodel)

Sequential training

Usage

python train.py --dataset_config=config/dataset/coco2017-gpu.cfg --experiment_config=config/training/experiment01.cfg && \
python train.py --dataset_config=config/dataset/coco2017-gpu.cfg --experiment_config=config/training/experiment02.cfg && \
python train.py --dataset_config=config/dataset/coco2017-gpu.cfg --experiment_config=config/training/experiment03.cfg && \
python train.py --dataset_config=config/dataset/coco2017-gpu.cfg --experiment_config=config/training/experiment04.cfg && \
python train.py --dataset_config=config/dataset/coco2017-gpu.cfg --experiment_config=config/training/experiment05.cfg
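
The same sequence could also be driven from a small Python script, which makes it easier to hook in notifications or reports later. This is only a sketch: the helper names are hypothetical, and it assumes the experiment files follow the experimentNN.cfg naming used above.

```python
import subprocess
import sys

DATASET_CONFIG = "config/dataset/coco2017-gpu.cfg"


def build_commands(experiment_ids):
    """Build one train.py command line per experiment number."""
    return [
        [
            sys.executable,
            "train.py",
            f"--dataset_config={DATASET_CONFIG}",
            f"--experiment_config=config/training/experiment{i:02d}.cfg",
        ]
        for i in experiment_ids
    ]


def run_sequentially(commands):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = build_commands(range(1, 6))
for command in commands:
    print(" ".join(command))
# To start training, call run_sequentially(commands) from the repository root.
```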

Features

  • Slack or mail on end of training for each experiment
  • Write report automatically

Model Files Needed

Training takes too long, in my case about 200 hrs for one model. Could you please put trained models into the project so that we can compare the performance?

Getting an error while running train.py on Windows

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = main(fd)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\spawn.py", line 114, in main
    prepare(preparation_data)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Harsh\Desktop\tf2-mobile\train.py", line 135, in <module>
    from evaluate import calculate_total_pckh
  File "C:\Users\Harsh\Desktop\tf2-mobile\evaluate.py", line 105, in <module>
    manager = multiprocessing.Manager()
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\managers.py", line 563, in start
    self._process.start()
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Harsh\anaconda3\envs\posewin\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Please help me to solve this.
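
The traceback shows evaluate.py creating multiprocessing.Manager() at import time (line 105). On Windows, child processes are started with spawn, which re-imports the main module, so any process started at module level is started again during that import and triggers this error. A minimal sketch of the idiom the error message recommends is below; the function names are hypothetical and this is not the repository's actual fix. The key points are that the Manager is created inside a function and that the entry point is guarded.

```python
import multiprocessing


def double_all(values):
    """Plain helper with no multiprocessing involved."""
    return [value * 2 for value in values]


def collect_results(values):
    """Create the Manager only when called, never at import time."""
    with multiprocessing.Manager() as manager:
        shared = manager.list()
        shared.extend(double_all(values))
        return list(shared)


if __name__ == "__main__":
    # Needed only for frozen Windows executables; harmless otherwise
    multiprocessing.freeze_support()
    print(collect_results([1, 2, 3]))
```

Applied to this project, that would mean moving the Manager creation in evaluate.py into the function that uses it and wrapping the training entry point in train.py in the same guard.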
