DeepGraphPose

License: GNU Lesser General Public License v3.0

This is the code for Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking.

DGP is a semi-supervised model that can run on top of other tracking algorithms, such as DLC.
Since the DLC developers have put a lot of work into their GUI (and made it open source!), our algorithm can be run using the same file structure as DLC.

If you have used DLC before, you can use the DGP pipeline to run DGP on top of these results!
If you have not used DLC before, you can use the DGP pipeline to run DLC and DGP.

Please see the installation instructions to install and run DGP on your videos.
Note: We have cloned the DLC package within the DGP repository, so we highly recommend installing DGP in a new conda environment to avoid conflicts with any existing DLC installation.

Installation Instructions

To install DGP, navigate to your desired installation directory "{DGP_DIR}" and clone the repository:

cd "{DGP_DIR}"
git clone https://github.com/paninski-lab/deepgraphpose.git

Follow the installation instructions here, using the dgp*.yaml files instead of the dlc*.yaml files.

For example, if you are on Ubuntu with available GPUs, you can run the following:

cd deepgraphpose/src/DeepLabCut/conda-environments/
conda env create -f dgp-ubuntu-GPU-clean.yaml

This only creates an empty conda environment. Because DGP is maintained against some older package versions, the dependencies must be installed manually, as follows. Activate the dgp conda environment, install the pinned packages, and then navigate to the parent directory to install the DLC clone bundled inside DGP:

source activate dgp

pip install opencv-python==3.4.5.20
pip install scipy==1.2.1
pip install matplotlib==3.0.3
pip install tensorflow-gpu==1.13.1
pip install tensorflow==1.15
pip install imgaug==0.4.0
pip install https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-18.04/wxPython-4.0.3-cp36-cp36m-linux_x86_64.whl

cd ../
pip install -e .

Next, install DGP in dev mode:

cd ../..
pip install -e .

Finally, download the ResNet weights used to train the network:

curl http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz | tar xvz -C src/DeepLabCut/deeplabcut/pose_estimation_tensorflow/models/pretrained/

You may need to install curl first; on Ubuntu, the command is:

sudo apt install curl

Notes for remote servers:

On a remote server, install the headless build of OpenCV by running:

pip install opencv-python-headless

On a remote server, install DLC/DGP in light mode by running:

export DLClight=True

before you begin the DGP installation.

Check that both packages were installed:

ipython
import deeplabcut 
print(deeplabcut.__path__)
['{DGP_DIR}/deepgraphpose/src/DeepLabCut/deeplabcut']
import deepgraphpose
print(deepgraphpose.__path__)
['{DGP_DIR}/deepgraphpose/src/deepgraphpose']

To run DGP on your videos:

  1. Use DLC's GUI to collect labels, as described here. This step should create a project folder "{PROJ_DIR}/task-scorer-date" and a folder with your labeled data, "{PROJ_DIR}/task-scorer-date/labeled-data". If you label frames from multiple videos, the GUI should automatically create a folder for each video containing *.csv and *.h5 files with the manual label information, along with the labeled *.png frames.
  2. Inside '{PROJ_DIR}/task-scorer-date', create a folder 'videos_dgp'.
cd '{PROJ_DIR}/task-scorer-date'
mkdir videos_dgp
  3. Add to this folder any other videos on which you want to run DGP at test time. Since DGP is a semi-supervised model, it can exploit information from video frames with and without manually labeled markers during training. You don't have to include all your videos, but you should include at least the most representative ones. Training time increases in proportion to the number of videos in this folder. Note: we currently don't support running multiple passes of DGP.

  4. Check the "bodyparts" and "skeleton" entries in "{PROJ_DIR}/task-scorer-date/config.yaml". For example, if you are tracking four fingers on each paw of a mouse, the "bodyparts" and "skeleton" entries will have the following form:

bodyparts:
- pinky_finger_r
- ring_finger_r
- middle_finger_r
- pointer_finger_r
skeleton:
- - pinky_finger_r
  - ring_finger_r
- - ring_finger_r
  - middle_finger_r
- - middle_finger_r
  - pointer_finger_r

Each item in "bodyparts" corresponds to a marker, and each item in "skeleton" corresponds to a pair of connected markers (the order in which the parts are listed does not matter). If you don't want to use a skeleton (no interconnected parts), leave the "skeleton" field empty:

skeleton:

  5. Run the DGP pipeline with the following command:
python {DGP_DIR}/demo/run_dgp_demo.py --dlcpath '{PROJ_DIR}/task-scorer-date/' --shuffle <the shuffle to run> --dlcsnapshot <the DLC snapshot, if you've already run DLC with location refinement>

*** You can run the demo with the example project in test mode to check that the code runs:

python {DGP_DIR}/demo/run_dgp_demo.py --dlcpath {DGP_DIR}/data/Reaching-Mackenzie-2018-08-30 --test
  6. The output of the pipeline, including the labeled videos and the pickle files with the predicted trajectories, will be stored in "{PROJ_DIR}/task-scorer-date/videos_pred" (see the inspection sketch below).
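
A minimal way to inspect one of these prediction pickles, assuming only that it is a standard Python pickle (the path below is hypothetical; substitute your own project and video names):

import pickle

# Hypothetical path to a prediction file produced by the pipeline.
pkl_path = '{PROJ_DIR}/task-scorer-date/videos_pred/video1.pkl'

with open(pkl_path, 'rb') as f:
    preds = pickle.load(f)

# Inspect the object before assuming any particular layout.
print(type(preds))
if isinstance(preds, dict):
    for key, value in preds.items():
        print(key, getattr(value, 'shape', type(value)))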

Try DGP on the cloud:

You can try DGP with GPUs on the cloud for free through a web interface using NeuroCAAS. The current DGP implementation in NeuroCAAS is in beta, so we are looking for feedback from beta users. If you are interested, please email [email protected] once you have signed up for NeuroCAAS to get started.

Contributors

ekellbuch, md-121, waq1129

Issues

Fitting DGP throws ValueError with 2 training frames

When running the run_dgp_demo.py script with two training frames, fitting DGP with labeled and unlabeled frames throws the following error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly

The error occurs whether or not I run in test mode:
python "demo/run_dgp_demo.py" --dlcpath "path/to/model" or python "demo/run_dgp_demo.py" --dlcpath "path/to/model" --test

It does not occur if I increase the number of training frames to 4 or greater. I am using a fork of the dgp repo, forked from commit 0c5d91b, running on Ubuntu 18.04.4 LTS with a Tesla V100 GPU on Amazon AWS. I've attached the full stdout and stderr logs below.
DATASET_NAME_raw_data.zip_STATUS (5).json.txt

"Tensor had NaN values" when encountering improbable label values

Hi, and thanks for all the great work!
I just wanted to point out a potential problem we encountered while training DGP. With our dataset, we could successfully run the first 50k iterations of "DGP on labeled frames only", but then during "Running DGP" we encountered:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
[[{{node VerifyFinite/CheckNumerics}}]]

This occurs in line 818 of fitdgp.py:
[loss_eval, _] = sess.run([loss, train_op], feed_dict)
After some debugging, I traced the error to labeled frames in which the labels were accidentally set outside the normal range (DLC deletes markers set at x=0, y=0, but here they were accidentally at x=1, y=4; normally, labels were x/y > 200). After removing these improbable labels, training continued normally.
It's great that we had a chance to clean our training dataset, but it would be better if DGP could simply ignore such labels and emit a precise warning to alert the user; otherwise, it is quite hard for the user to figure out where the actual problem is.
Thanks,
Oliver
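
As an editorial note: DGP does not currently skip such labels, but you can screen a labels file yourself before training. Below is a rough sketch, assuming the standard DLC CollectedData_*.h5 layout and a hypothetical minimum-coordinate threshold (the paths and threshold are placeholders):

import pandas as pd

# Hypothetical path and threshold, for illustration only.
labels_path = 'labeled-data/video1/CollectedData_scorer.h5'
min_coord = 5.0  # x/y values below this are treated as implausible

df = pd.read_hdf(labels_path)

# Report and mask any coordinate below the threshold as missing (NaN),
# instead of letting an implausible label destabilize training.
suspect = df < min_coord
if suspect.any().any():
    print('Suspect labels found:')
    print(df[suspect.any(axis=1)])

# Write the cleaned labels to a copy; 'df_with_missing' is the key DLC
# conventionally uses for these files (assumption: your file matches).
df.mask(suspect).to_hdf('CollectedData_scorer_cleaned.h5', key='df_with_missing', mode='w')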

Code in command prompt runs, now what?

I ran the code in the command prompt and trained a model. How do I use this model without going through the whole training process again? I see that it has created snapshots in my dlc-models directory; however, I cannot use these with the DeepLabCut workflow.

runfile('C:/Users/wlwee/Documents/python/fhl_three_target_experiment/CODE/creating_rotated_data/test_deepgraphpose_model.py', wdir='C:/Users/wlwee/Documents/python/fhl_three_target_experiment/CODE/creating_rotated_data')
Traceback (most recent call last):

  File "<ipython-input-683-4679c477922f>", line 1, in <module>
    runfile('C:/Users/wlwee/Documents/python/fhl_three_target_experiment/CODE/creating_rotated_data/test_deepgraphpose_model.py', wdir='C:/Users/wlwee/Documents/python/fhl_three_target_experiment/CODE/creating_rotated_data')

  File "C:\Users\wlwee\Anaconda3\envs\DLC-GPU\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\wlwee\Anaconda3\envs\DLC-GPU\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/wlwee/Documents/python/fhl_three_target_experiment/CODE/creating_rotated_data/test_deepgraphpose_model.py", line 27, in <module>
    destfolder = snapshot_dirs[m])

  File "C:\Users\wlwee\Anaconda3\envs\DLC-GPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\predict_videos.py", line 207, in analyze_videos
    increasing_indices = np.argsort([int(m.split("-")[1]) for m in Snapshots])

  File "C:\Users\wlwee\Anaconda3\envs\DLC-GPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\predict_videos.py", line 207, in <listcomp>
    increasing_indices = np.argsort([int(m.split("-")[1]) for m in Snapshots])

out of memory error

sigmoid_pred_np = np.exp(softmaxtensors) / (np.exp(softmaxtensors) + 1)

When running DGP on a large video, this line gives me an out-of-memory error (it required ~40 GB of memory). Since this variable is only used in one other place (L348), I would suggest removing L339 and replacing L348 with:

sigmoid_pred_np_jj = np.exp(softmaxtensors[ff_idx, :, :, jj_idx]) / (np.exp(softmaxtensors[ff_idx, :, :, jj_idx]) + 1)
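
As an editorial aside, scipy.special.expit computes the same sigmoid in a numerically stable way and avoids materializing two large np.exp temporaries. A sketch of the per-index variant suggested above (softmaxtensors, ff_idx, and jj_idx are assumed to be defined as in fitdgp.py):

from scipy.special import expit  # numerically stable sigmoid

# Computing the sigmoid only for the slice that is actually used keeps peak
# memory proportional to one frame/joint slice rather than the full tensor.
def sigmoid_slice(softmaxtensors, ff_idx, jj_idx):
    return expit(softmaxtensors[ff_idx, :, :, jj_idx])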

Error with DLC iteration > 0

Hi all,

Ran into an error with the run_dgp_demo pipeline when I tried to use it on a project that had two DLC iterations (iteration-0 and iteration-1):

File "[my home directory]/deepgraphpose/src/deepgraphpose/utils_model.py", line 93, in get_train_config
    TrainingFraction = cfg['TrainingFraction'][iteration]
IndexError: list index out of range

My project's config.yaml has "iteration: 1", which I think is causing this problem. It only has one value to index for TrainingFraction.

So far, I've fixed it in a hacky way by changing line 93 of utils_model.py to index using 0 rather than iteration. It runs just fine for me after that. (Not sure if that fix would cause other problems for folks with more than one TrainingFraction value; see the sketch below.)
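
A less hacky variant of the same workaround (a sketch, not an official fix) clamps the index instead of hard-coding 0, so projects that do define one TrainingFraction per iteration keep working:

# Sketch of a replacement for line 93 of utils_model.py: fall back to the
# last available entry when the project's iteration counter exceeds the
# length of the TrainingFraction list. cfg and iteration as in that function.
idx = min(iteration, len(cfg['TrainingFraction']) - 1)
TrainingFraction = cfg['TrainingFraction'][idx]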

Do we need to train DLC prior to running the DGP pipeline?

This might be a stupid question. I followed the guidelines all the way to step 4, and when I ran the command in step 5 I received the following message. It is asking for a trained DLC network. Do we need to train DLC before running the DGP pipeline, or am I missing a step here?

Thanks in advance! Great model

Update: If you think "hey, I have the same question": you need to follow the DLC steps up through Create training dataset, which will create '{PROJ_DIR}/task-scorer-date/dlc-models/iteration-0/dgpdate-trainset95shuffle1/train/pose_cfg.yaml'.

Specifically, go through Welcome -> Manage Project -> Extract Frames -> Label Frames -> Create training dataset. You should see "The training dataset is successfully created. Use the function 'train_network' to start training. Happy training!" in your terminal.

Then run the command in step 5:
python {DGP_DIR}/demo/run_dgp_demo.py --dlcpath '{PROJ_DIR}/task-scorer-date/' --shuffle <the shuffle to run> --dlcsnapshot <the DLC snapshot, if you've already run DLC with location refinement>

However, this still won't get training started. It shows "Start Training" in the terminal and then hangs there for hours. Any suggestions?

config_path /home/victoria/Github/deepgraphpose/dgp-victoria-2021-03-15/config.yaml
Traceback (most recent call last):
  File "/home/victoria/Github/deepgraphpose/demo/run_dgp_demo.py", line 179, in <module>
    fit_dlc(snapshot, dlcpath, shuffle=shuffle, step=0)
  File "/home/victoria/Github/deepgraphpose/src/deepgraphpose/models/fitdgp.py", line 92, in fit_dlc
    dlc_cfg = load_config(pose_config_yaml)
  File "/home/victoria/Github/deepgraphpose/src/DeepLabCut/deeplabcut/pose_estimation_tensorflow/config.py", line 55, in load_config
    return cfg_from_file(filename)
  File "/home/victoria/Github/deepgraphpose/src/DeepLabCut/deeplabcut/pose_estimation_tensorflow/config.py", line 42, in cfg_from_file
    with open(filename, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/victoria/Github/deepgraphpose/dgp-victoria-2021-03-15/dlc-models/iteration-0/dgpMar15-trainset95shuffle1/train/pose_cfg.yaml'

Insufficient memory problem

Hi,

I ran into the same problem as #7 when I tried to run the demo:

Begin Training for 5 iterations
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\dgp\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\envs\dgp\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\dgp\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,512,94,104] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

DLC runs normally on my GPU.

I also tried the method mentioned in #7, adding a few lines of code after line 158 of run_dgp_demo.py:

[screenshot: the GPU memory-growth code added after line 158 of run_dgp_demo.py]

But the error still did not disappear. I understand that the error comes from insufficient CUDA memory, but I am not sure why this method did not work for me. My understanding of this code is that it allows the GPU memory to grow automatically. Did I add it in the correct position?

Hope someone can help me solve this problem, it is very important to me! Thanks for any comments!
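
For reference, the standard TensorFlow 1.x pattern for letting GPU memory grow on demand is sketched below. Note that the config has to reach the tf.Session that DGP creates inside fitdgp.py, and it only helps if the model can fit in GPU memory at all:

import tensorflow as tf

# Allocate GPU memory incrementally instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)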

How to continue training from the checkpoint

Hi,

I just want to know: in the step of running DGP (step 2), is there a way to start training from a checkpoint? So far, I have not found a way to continue training from the last snapshot-step2 when training is interrupted.

Thanks for any reply!

Questions

Hey there!

Beautiful! I just read your paper and this looks fantastic. My study organism is the octopus, and we have had issues getting DeepLabCut to differentiate the tips of their arms reliably. This creates lots of noise in the dataset, which stops us from using tools like B-SOiD effectively. The tool you have created looks like it might help, and I am going to try including it in my workflow.

Is this a good place to reach out with stupid questions and issues?

Do videos placed in videos_dgp need to be labeled?

Hi,

I'm trying to run DGP on some DLC-labeled data I have from a long time ago. My DLC version is not up to date, but I already have the project folder and labeled data, and I ran up to iteration-7 on the DLC network last year.

When I try to run DGP, I get through to "Running DGP with labeled frames only" and this section:

Creating training datasets
Selected additional 95 hidden frames
Skipped 0 high motion energy (me) frames since in visible window or close to higher me hidden frame
Selected additional 95 hidden frames
Skipped 0 high motion energy (me) frames since in visible window or close to higher me hidden frame
Selected additional 95 hidden frames
Skipped 0 high motion energy (me) frames since in visible window or close to higher me hidden frame

Video: 2mt_drtop_concat_cropped has 0 visible frames selected; 95 hidden frames selected.
Video: 2mt_b2279_concat_cropped has 0 visible frames selected; 95 hidden frames selected.
Video: 2mt_c2293_concat_cropped has 0 visible frames selected; 95 hidden frames selected.
n_hidden_frames_total 5985
n_visible_frames_total 0
n_frames_total 5985

But then I get this error:

Traceback (most recent call last):
  File "/tigress/vcorbit/DGP/deepgraphpose/demo/run_dgp_demo.py", line 209, in <module>
    step=1)
  File "/tigress/vcorbit/DGP/deepgraphpose/src/deepgraphpose/models/fitdgp.py", line 390, in fit_dgp_labeledonly
    loss, total_loss, total_loss_visible, placeholders = dgp_loss(data_batcher, dgp_cfg)
  File "/tigress/vcorbit/DGP/deepgraphpose/src/deepgraphpose/models/fitdgp.py", line 885, in dgp_loss
    limb_full = np.reshape(limb_full, [joint_loc_full.shape[0], 2, -1])
  File "/home/vcorbit/.conda/envs/dgp/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 292, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/vcorbit/.conda/envs/dgp/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 0 into shape (0,2,newaxis)

So, I'm a bit confused by the instructions: should the videos I include in the videos_dgp folder for the "test phase" be videos that already have DLC labels, or can they be videos that the network has never seen?
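
As an editorial note, one quick way to check whether DGP is finding your labels (the "0 visible frames selected" lines above suggest it is not) is to count the labeled frames per video under the project's labeled-data folder. A rough sketch, assuming the standard DLC project layout:

import glob
import os
import pandas as pd

proj_dir = '{PROJ_DIR}/task-scorer-date'  # substitute your project path

# Each labeled video has a CollectedData_*.h5 file under labeled-data/<video>/.
for h5 in glob.glob(os.path.join(proj_dir, 'labeled-data', '*', 'CollectedData_*.h5')):
    video = os.path.basename(os.path.dirname(h5))
    print(video, 'labeled frames:', len(pd.read_hdf(h5)))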

missing dgp-windows-GPU.yaml

Hello,

I am trying to install DeepGraphPose on a Windows operating system. I noticed that there is no dgp-windows-GPU.yaml within the src folder. I also noticed that there is a thread within the issues discussing the use of DeepGraphPose on Windows. Where would I find the dgp-windows-GPU.yaml for installation?

Thanks for your help!

DGP stuck selecting training frames ['Starting with standard pose-dataset loader.']

Hi,
after successfully running DGP on the supplied demo dataset, I am now trying to run it on my already-trained DLC network, which is fairly large (23 videos, ~1,000 training frames total). Possibly due to these constraints, the code seems to get stuck (silently) at the point of training-frame selection (the last thing I see is 'Starting with standard pose-dataset loader'). Before that, I run run_dgp_demo.py including the DLC snapshot (running DGP with labeled frames only), and it successfully initializes the ResNet for all videos. After seeing it stuck at 'Starting with standard pose-dataset loader' for ~48 hours, I started debugging, and found that the dataset target counter (dgp/dataset.py, l. 607) is stuck at counter=59 out of nt=113 frames. Specifically, the code starts to skip more and more frames because they belong to another video or have already been processed (l. 627 and 632), until it is only skipping.
Without rewriting the underlying structure, is there a good workaround, maybe a more straightforward way to select the training frames?
Thanks,
Oliver

windows compat

Hi! I've been trying to run this on Windows and realized there is a bunch of code that operates on Unix file separators, e.g.:

line 597 in dataset.py:

def extract_frame_num(img_path):
    return int(img_path.rsplit('/', 1)[-1][3:].split('.')[0])

Could we get a simple fix (e.g., os.path.split/os.path.join) to make file separators platform-independent?
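
A platform-independent version of that helper could rely on os.path instead of a hard-coded separator. A sketch, assuming the 'imgNNNN.png'-style file names that the original slicing implies:

import os

def extract_frame_num(img_path):
    # 'labeled-data/video1/img0042.png' -> 42, regardless of '/' or '\\'.
    fname = os.path.basename(img_path)          # strip the directory portably
    return int(os.path.splitext(fname)[0][3:])  # drop the 'img' prefix and extension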

CUDA_ERROR_OUT_OF_MEMORY with 8GB GPU

Hi,
I'm very curious how DGP performs on our existing DLC data, so I installed DGP following the instructions on Ubuntu 20.04 with a GeForce RTX 2080 (8 GB), CUDA Toolkit 10.0.130, and driver version 450.102.04. On this machine DLC (2.0.8) works without problems, but I'm running into memory problems when trying the test run 'python demo/run_dgp_demo.py --dlcpath data/Reaching-Mackenzie-2018-08-30 --test'. Memory monitoring shows used memory at about 5.5 GB when it tries to allocate an additional 2.53 GB. Is there a way to circumvent this error? With DLC, I used to solve this by allowing GPU growth, but I could see in the code that this has already been included...
Maybe this part of the error message is key:
UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory
Below is the full output.
Thanks a lot!
Oliver


            =====================
            |                   |
            |                   |
            |    Running DGP    |
            |                   |
            |                   |
            =====================
            
config_path /home/oliver/Git/deepgraphpose/data/Reaching-Mackenzie-2018-08-30/config.yaml
/home/oliver/Git/deepgraphpose/data/Reaching-Mackenzie-2018-08-30/dlc-models/iteration-0/ReachingAug30-trainset95shuffle1/train/pose_cfg.yaml
Warning. Check the number of frames
Warning. Check the number of frames
Initializing ResNet
reachingvideo1
[  5  20  23  28  31  33  36  37  38  40  42  46  48  52  60  68  71  75
  77  80  87  90 100 103 108 118 119 126 141 142 145 151 152 157 167 168
 177 179 180 194 211 213 214 225 227 228 230 231 234 237 240 245]



Creating training datasets
--------------------------
loading hidden indices from /home/oliver/Git/deepgraphpose/data/Reaching-Mackenzie-2018-08-30/dlc-models/iteration-0/ReachingAug30-trainset95shuffle1/train/batched_data/snapshot-0/reachingvideo1__nsjump=None_step=1_ns=10_nc=2048_max=2000_idxs.npy
Starting with standard pose-dataset loader.



n_hidden_frames_total 204
n_visible_frames_total 52
n_frames_total 256
WARNING:py.warnings:/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

restoring resnet weights from /home/oliver/Git/deepgraphpose/data/Reaching-Mackenzie-2018-08-30/dlc-models/iteration-0/ReachingAug30-trainset95shuffle1/train/snapshot-step1-final--0
Begin Training for 5 iterations
2021-03-04 10:29:53.912679: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.53G (2715310336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[the same CUDA_ERROR_OUT_OF_MEMORY line repeats ~30 more times]
Traceback (most recent call last):
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,187,208,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/Conv2D-0-1-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[{{node ConstantFoldingCtrl/absolute_difference/weighted_loss/assert_broadcastable/AssertGuard/Switch_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo/run_dgp_demo.py", line 243, in <module>
    gm3=gm3)
  File "/home/oliver/Git/deepgraphpose/src/deepgraphpose/models/fitdgp.py", line 816, in fit_dgp
    [loss_eval, _] = sess.run([loss, train_op], feed_dict)
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/oliver/anaconda3/envs/dgp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,187,208,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/Conv2D-0-1-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[{{node ConstantFoldingCtrl/absolute_difference/weighted_loss/assert_broadcastable/AssertGuard/Switch_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

If the training set does not contain a frame where all joints are labeled, crashes with IndexError

Hello!

This issue is potentially related to #9, which occurs when training with a small number of frames. If there is not a single frame in your training set with all joints labeled, the code will throw the error shown in the screenshot below:
[screenshot: IndexError traceback]

This is because all of the frames in my training set have at least one point missing. I have tracked this down to the method _compute_targets in the deepgraphpose.dataset module, line 599 (computing the max over joints across datapoints). Is there a reason not to take the number of joints directly from the config file? If not, I can make the change and submit a pull request.
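
For reference, the reporter's proposed change would look roughly like the sketch below, assuming cfg is the loaded pose config exposing DLC's all_joints_names list (the commented-out line is a hypothetical stand-in for the current data-derived computation):

# Instead of inferring the joint count from the labeled data, which fails
# when no single training frame has every joint labeled, e.g.:
#     num_joints = joint_ids.max() + 1   # hypothetical current computation
# take it directly from the configuration:
num_joints = len(cfg['all_joints_names'])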
