wmcnally / kapao

KAPAO is an efficient single-stage human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

License: GNU General Public License v3.0

Shell 0.46% Python 99.54%
pytorch deep-learning human-pose-estimation yolo pose-estimation


KAPAO (Keypoints and Poses as Objects)

Accepted to ECCV 2022

KAPAO is an efficient single-stage multi-person human pose estimation method that models keypoints and poses as objects within a dense anchor-based detection framework. KAPAO simultaneously detects pose objects and keypoint objects and fuses the detections to predict human poses:


When not using test-time augmentation (TTA), KAPAO is much faster and more accurate than previous single-stage methods like DEKR, HigherHRNet, HigherHRNet + SWAHR, and CenterGroup:


This repository contains the official PyTorch implementation for the paper:
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation.

Our code was forked from ultralytics/yolov5 at commit 5487451.

Setup

  1. If you haven't already, install Anaconda or Miniconda.
  2. Create a new conda environment with Python 3.6: $ conda create -n kapao python=3.6.
  3. Activate the environment: $ conda activate kapao
  4. Clone this repo: $ git clone https://github.com/wmcnally/kapao.git
  5. Install the dependencies: $ cd kapao && pip install -r requirements.txt
  6. Download the trained models: $ python data/scripts/download_models.py
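To verify the setup, a quick sanity check can be run from the repo root (a minimal sketch; the weight filenames are taken from the commands in this README, and the download destination is assumed to be the repo root):

from pathlib import Path

import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

# Weight filenames referenced by the commands in this README; adjust the
# paths if download_models.py places them elsewhere.
for name in ["kapao_s_coco.pt", "kapao_m_coco.pt", "kapao_l_coco.pt"]:
    status = "found" if Path(name).exists() else "missing (re-run data/scripts/download_models.py?)"
    print(f"{name}: {status}")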

Inference Demos

Note: FPS calculations include all processing (image loading, resizing, inference, plotting/tracking, etc.). See the script arguments for inference options.


Static Image

To generate the four images in the GIF above:

  1. $ python demos/image.py --bbox
  2. $ python demos/image.py --bbox --pose --face --no-kp-dets
  3. $ python demos/image.py --bbox --pose --face --no-kp-dets --kp-bbox
  4. $ python demos/image.py --pose --face

Shuffling Video

KAPAO runs fastest on low resolution video with few people in the frame. This demo runs KAPAO-S on a single-person 480p dance video using an input size of 1024. The inference speed is ~9.5 FPS on our CPU, and ~60 FPS on our TITAN Xp.

CPU inference:

To display the results in real-time:
$ python demos/video.py --face --display

To create the GIF above:
$ python demos/video.py --face --device cpu --gif

CPU specs:
Intel Core i7-8700K
16GB DDR4 3000MHz
Samsung 970 Pro M.2 NVMe SSD


Flash Mob Video

This demo runs KAPAO-S on a 720p flash mob video using an input size of 1280.

GPU inference:

To display the results in real-time:
$ python demos/video.py --yt-id 2DiQUX11YaY --tag 136 --imgsz 1280 --color 255 0 255 --start 188 --end 196 --display

To create the GIF above:
$ python demos/video.py --yt-id 2DiQUX11YaY --tag 136 --imgsz 1280 --color 255 0 255 --start 188 --end 196 --gif


Red Light Green Light

This demo runs KAPAO-L on a 480p clip from the TV show Squid Game using an input size of 1024. The plotted poses constitute keypoint objects only.

GPU inference:

To display the results in real-time:
$ python demos/video.py --yt-id nrchfeybHmw --imgsz 1024 --weights kapao_l_coco.pt --conf-thres-kp 0.01 --kp-obj --face --start 56 --end 72 --display

To create the GIF above:
$ python demos/video.py --yt-id nrchfeybHmw --imgsz 1024 --weights kapao_l_coco.pt --conf-thres-kp 0.01 --kp-obj --face --start 56 --end 72 --gif


Squash Video

This demo runs KAPAO-S on a 1080p slow-motion squash video. It uses a simple player tracking algorithm based on frame-to-frame pose differences (a rough sketch follows the commands below).

GPU inference:

To display the inference results in real-time:
$ python demos/squash.py --display --fps

To create the GIF above:
$ python demos/squash.py --start 42 --end 50 --gif --fps
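The sketch below illustrates the general idea of frame-to-frame matching by mean keypoint distance (an illustration of the tracking heuristic, not the exact code in demos/squash.py):

import numpy as np

def match_poses(prev_poses, curr_poses):
    """Greedy frame-to-frame matching: assign each previous pose to the
    current pose with the smallest mean keypoint distance.

    prev_poses, curr_poses: lists of (K, 2) arrays of keypoint coordinates.
    Returns a list of (prev_idx, curr_idx) pairs.
    """
    matches = []
    used = set()
    if not curr_poses:
        return matches
    for i, prev in enumerate(prev_poses):
        dists = [np.linalg.norm(prev - curr, axis=1).mean() if j not in used else np.inf
                 for j, curr in enumerate(curr_poses)]
        j = int(np.argmin(dists))
        if np.isfinite(dists[j]):
            matches.append((i, j))
            used.add(j)
    return matches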


Depth Video

Pose objects generalize well and can even be detected in depth video. Here KAPAO-S was run on a depth video from a fencing action recognition dataset.


The depth video above can be downloaded directly from here. To create the GIF above:
$ python demos/video.py -p 2016-01-04_21-33-35_Depth.avi --face --start 0 --end -1 --gif --gif-size 480 360


Web Demo

A web demo was integrated into Hugging Face Spaces using Gradio (credit to @AK391). It uses KAPAO-S to run CPU inference on short video clips.
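A minimal sketch of how a Gradio video demo can wrap KAPAO inference (the run_kapao helper is hypothetical; the actual Space's code may differ):

import gradio as gr

def run_kapao(video_path):
    # Hypothetical helper: a real implementation would run KAPAO-S on the
    # clip (as demos/video.py does) and return the annotated video path.
    # Here it just passes the input through.
    return video_path

demo = gr.Interface(
    fn=run_kapao,
    inputs=gr.Video(label="Input clip"),
    outputs=gr.Video(label="Pose overlay"),
    title="KAPAO pose estimation",
)

if __name__ == "__main__":
    demo.launch()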

COCO Experiments

Download the COCO dataset: $ sh data/scripts/get_coco_kp.sh

Validation (without TTA)

  • KAPAO-S (63.0 AP): $ python val.py --rect
  • KAPAO-M (68.5 AP): $ python val.py --rect --weights kapao_m_coco.pt
  • KAPAO-L (70.6 AP): $ python val.py --rect --weights kapao_l_coco.pt

Validation (with TTA)

  • KAPAO-S (64.3 AP): $ python val.py --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-M (69.6 AP): $ python val.py --weights kapao_m_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-L (71.6 AP): $ python val.py --weights kapao_l_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1
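Here --scales and --flips define the test-time augmentation: one forward pass per (scale, flip) pair, with predictions mapped back to the original image and pooled before a single NMS pass. A minimal sketch of the pattern (coordinate mapping elided; not the repo's exact TTA code):

import torch
import torch.nn.functional as F

def tta_inference(model, img, scales=(0.8, 1.0, 1.2), flips=(None, 3, None)):
    """One forward pass per (scale, flip) pair; pool the raw detections.

    Following the README convention, --flips -1 3 -1 means no flip for the
    first and last scales and a horizontal flip (tensor dim 3) for the
    middle one. De-scaling and left/right keypoint swapping are elided.
    """
    detections = []
    for scale, flip in zip(scales, flips):
        x = F.interpolate(img, scale_factor=scale, mode="bilinear",
                          align_corners=False)
        if flip is not None:
            x = torch.flip(x, dims=[flip])   # flip the width dimension
        pred = model(x)                      # raw predictions at this scale
        # ...map boxes/keypoints back to the original resolution and
        # un-flip coordinates before collecting...
        detections.append(pred)
    return detections                        # pooled, then one NMS pass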

Testing

  • KAPAO-S (63.8 AP): $ python val.py --scales 0.8 1 1.2 --flips -1 3 -1 --task test
  • KAPAO-M (68.8 AP): $ python val.py --weights kapao_m_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1 --task test
  • KAPAO-L (70.3 AP): $ python val.py --weights kapao_l_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1 --task test

Training

The following commands were used to train the KAPAO models on 4 V100s with 32GB memory each.

KAPAO-S:

python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/s_e500 \
--name train \
--workers 128

KAPAO-M:

python train.py \
--img 1280 \
--batch 72 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5m6.pt \
--project runs/m_e500 \
--name train \
--workers 128

KAPAO-L:

python train.py \
--img 1280 \
--batch 48 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5l6.pt \
--project runs/l_e500 \
--name train \
--workers 128

Note: DDP is usually recommended but we found training was less stable for KAPAO-M/L using DDP. We are investigating this issue.

CrowdPose Experiments

  • Install the CrowdPose API to your conda environment:
    $ cd .. && git clone https://github.com/Jeff-sjtu/CrowdPose.git
    $ cd CrowdPose/crowdpose-api/PythonAPI && sh install.sh && cd ../../../kapao
  • Download the CrowdPose dataset: $ sh data/scripts/get_crowdpose.sh

Testing

  • KAPAO-S (63.8 AP): $ python val.py --data crowdpose.yaml \
    --weights kapao_s_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-M (67.1 AP): $ python val.py --data crowdpose.yaml \
    --weights kapao_m_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-L (68.9 AP): $ python val.py --data crowdpose.yaml \
    --weights kapao_l_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1

Training

The following commands were used to train the KAPAO models on 4 V100s with 32GB memory each. Training was performed on the trainval split with no validation. The test results above were generated using the last model checkpoint.

KAPAO-S:

python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/cp_s_e300 \
--name train \
--workers 128 \
--noval

KAPAO-M:

python train.py \
--img 1280 \
--batch 72 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5m6.pt \
--project runs/cp_m_e300 \
--name train \
--workers 128 \
--noval

KAPAO-L:

python train.py \
--img 1280 \
--batch 48 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5l6.pt \
--project runs/cp_l_e300 \
--name train \
--workers 128 \
--noval

Acknowledgements

This work was supported in part by Compute Canada, the Canada Research Chairs Program, the Natural Sciences and Engineering Research Council of Canada, a Microsoft Azure Grant, and an NVIDIA Hardware Grant.

If you find this repo helpful in your research, please cite our paper:

@article{mcnally2021kapao,
  title={Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation},
  author={McNally, William and Vats, Kanav and Wong, Alexander and McPhee, John},
  journal={arXiv preprint arXiv:2111.08557},
  year={2021}
}

Please also consider citing our previous works:

@inproceedings{mcnally2021deepdarts,
  title={DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera},
  author={McNally, William and Walters, Pascale and Vats, Kanav and Wong, Alexander and McPhee, John},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4547--4556},
  year={2021}
}

@article{mcnally2021evopose2d,
  title={EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation Using Accelerated Neuroevolution With Weight Transfer},
  author={McNally, William and Vats, Kanav and Wong, Alexander and McPhee, John},
  journal={IEEE Access},
  volume={9},
  pages={139403--139414},
  year={2021},
  publisher={IEEE}
}

Contributors

aehogan, albinxavi, alexstoken, anon-artist, ayushexel, borda, cristifati, developer0hye, dlawrences, fcakyon, glenn-jocher, imyhxy, kalenmike, kinoute, laughing-q, lorenzomammana, lornatang, nanocode012, olehb, ownmarc, skalskip, taoxiesz, thanhminhmr, tkianai, toretak, wanghaoyang0106, wmcnally, yeric1789, yxnong, zldrobit


Issues

hand pose

Hi,

Me again ^_^

I just want to ask whether you have plans to do whole-body pose estimation, including hand pose?

Thanks and best!

Nearly 50% of images are missing in training and validation

Scanning data/datasets/coco/kp_labels/img_txt/train2017.cache images and labels... 64115 found, 54172 missing, 0 empty, 0 corrupted'
Scanning data/datasets/coco/kp_labels/img_txt/val2017.cache images and labels... 2693 found, 2307 missing, 0 empty, 0 corrupted'

I wonder if you are using an ad hoc data preprocessing/filtering step. If so, the numbers reported in the paper would not be comparable to DEKR.

How to fuse the results of different scales?

Thanks for your great work. During inference, how do you combine the results of the 4 different output grids? Is there some special fusion? Looking forward to your reply.

How about supporting the Halpe dataset?

Hi, KAPAO seems very promising, but I think it can be more powerful.
Would you consider adding Halpe dataset support? I don't mean adding whole-body keypoints, but could you train a body25 keypoint model?

body25 can be much more useful than the simple 17 COCO keypoints. It also provides foot motion, and you don't need to change anything but the dataset.

Halpe dataset link (it just uses COCO and MPII images):
https://github.com/Fang-Haoshu/Halpe-FullBody

Inference time 16 bit vs 32 bit

In the video.py file I hardcoded half = False to avoid any conversion to half precision, but the inference time was the same.

Is it because the weights are half precision by default, so I cannot measure how long the model would take in 32-bit?
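For a fair comparison, both the model and the input must be cast to the same precision, and GPU timing needs explicit synchronization; the checkpoints store FP16 weights, but attempt_load upcasts them to FP32 (it comments "load FP32 model"). A minimal timing sketch (model loading elided):

import time

import torch

def time_inference(model, img, half=False, iters=50):
    """Average GPU latency; model and input cast to the same precision."""
    model = model.half() if half else model.float()
    img = img.half() if half else img.float()
    with torch.no_grad():
        for _ in range(5):                 # warm-up iterations
            model(img)
        torch.cuda.synchronize()           # wait for queued kernels
        start = time.time()
        for _ in range(iters):
            model(img)
        torch.cuda.synchronize()
    return (time.time() - start) / iters

Comparing time_inference(model, img, half=False) against half=True then isolates the precision difference.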

Keypoint loss calculation

In the code you apply an L2 loss to the keypoints, and you only consider keypoints with v > 0.

Does this mean that you don't penalize false-positive keypoints? You also filter the predictions based on the vis tensor.
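For reference, a visibility-masked L2 keypoint loss has roughly this shape (a sketch, not the repo's exact loss code):

import torch

def keypoint_l2_loss(pred_kp, target_kp, vis):
    """L2 loss over keypoint coordinates, masked by visibility.

    pred_kp, target_kp: (N, K, 2) predicted / ground-truth coordinates.
    vis: (N, K) visibility flags. Only keypoints with v > 0 contribute,
    so unlabeled keypoints are neither rewarded nor penalized.
    """
    mask = (vis > 0).unsqueeze(-1).float()         # (N, K, 1)
    sq_err = ((pred_kp - target_kp) ** 2) * mask   # zero out invisible kps
    return sq_err.sum() / mask.sum().clamp(min=1)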

Input image size and speed

In your paper, it's mentioned that the input image is resized to 1280×1280. My question is: is there a version of KAPAO that takes a smaller image size (e.g., 640), which can speed up inference at the expense of lower accuracy?

Confidence of all keypoints in pose_object

You mention in the paper:
1. Human pose predictions typically contain a sparse set of keypoint confidences because pose objects don't provide keypoint confidences by design.
2. If we rely solely on keypoint objects, we miss a lot of keypoints.

So what is the challenge in adding another 17 confidence values to the object representation? Instead of 34 (x, y) values, it would become 51 (x, y, c) values.

DEKR performance on COCO val2017 is much higher than what is reported in your paper

Here is a table summarizing the performance of DEKR and KAPAO on COCO val2017. The numbers are copied directly from DEKR's Table 2 and KAPAO's Table 1. We can see that the number reported in DEKR's paper is much higher than what is reported in KAPAO's paper. Is there anything I misunderstood? Thank you, and I hope for your reply.

Method     TTA  Input Size  Params (M)  AP    AR
DEKR-W48   N    640         65.7        66.3  73.2
DEKR-W48   N    640         N/A         71    76
KAPAO-S    N    1280        12.6        63    70.2

OKS

Which equation do you use to compute OKS for CrowdPose?
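For reference, COCO's OKS for one pose is OKS = Σ_i exp(−d_i² / (2 s² k_i²)) · 1[v_i > 0] / Σ_i 1[v_i > 0], where d_i is the predicted-to-GT keypoint distance, s² is the object scale (segment area), and k_i is a per-keypoint constant; CrowdPose uses the same form with its own 14-keypoint sigmas. A sketch:

import numpy as np

def oks(pred, gt, vis, area, sigmas):
    """Object Keypoint Similarity between one predicted and one GT pose.

    pred, gt: (K, 2) keypoint coordinates; vis: (K,) GT visibility flags;
    area: GT object scale s^2 (segment area); sigmas: (K,) per-keypoint
    constants (COCO defines 17 of these, CrowdPose defines 14).
    """
    d2 = ((pred - gt) ** 2).sum(axis=1)   # squared distances per keypoint
    k2 = (2 * sigmas) ** 2                # per-keypoint variance constants
    e = d2 / (2 * area * k2 + np.finfo(float).eps)
    mask = vis > 0
    return float(np.exp(-e)[mask].mean()) if mask.any() else 0.0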

Doubts regarding anchors

1. In the yolovXXX.yaml file there are anchor values in multiples of 3. What are these values and how are they calculated?
2. Aren't they supposed to be 2 values (w and h) for each anchor? Therefore 3×2 values: 3 anchors and 2 values each?
3. In the yaml file comments, does P represent the size of the feature map after a certain layer? The anchors are described as P3/8, P4/16, P5/32, P6/64. Can you elaborate? This is also related to question 1.
4. In the yolo.py file, you transform 4 values into x, y, w, h using anchor_grid, grid, and stride. Can you explain that calculation of xy and wh?

The attached image relates to YOLO: the transformation of the 4 raw values into an xywh box.
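For context, the anchors lists in YOLOv5-style .yaml files are flattened (w, h) pairs in pixels, three per output level, and the detection head (which KAPAO inherits) decodes raw outputs as b_xy = (2·σ(t_xy) − 0.5 + grid)·stride and b_wh = (2·σ(t_wh))²·anchor. A minimal sketch of that decoding (not the repo's exact yolo.py code):

import torch

def decode_boxes(t, grid, anchor_wh, stride):
    """YOLOv5-style decoding of raw head outputs into pixel-space boxes.

    t: (..., 4) raw outputs (tx, ty, tw, th); grid: (..., 2) cell indices;
    anchor_wh: (..., 2) anchor sizes in pixels; stride: 8/16/32/64, matching
    the P3/8 ... P6/64 notation (the feature map's downsampling factor).
    """
    xy = (t[..., :2].sigmoid() * 2 - 0.5 + grid) * stride   # box center
    wh = (t[..., 2:4].sigmoid() * 2) ** 2 * anchor_wh       # box size
    return torch.cat((xy, wh), dim=-1)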

Typos in Eq (3) and Eq (4)?

Dear McNally:

This is such a fantastic work! We love it.

I have a few questions below regarding Eq. 4:

  1. In Eq. 3 and Eq. 4, should the A_w/s be A_h/s for t_h and v{yk}_?
  2. Eq.4 is an extension of v's parameterization from YOLOv4/5.
    2.1. Do you have any ablation study on the parameterization's extension? Why don't you adopt 4σ(v)−2 for v?
    2.2. What are the intuitions behind?

Thanks!

BCEWithLogitsLoss for 18 classes

Why do YOLO (and also KAPAO) use BCEWithLogitsLoss for 18 classes?
1. YOLO and KAPAO are multi-class, single-label problems.
2. But this loss is suited to multi-class, multi-label problems, since it doesn't use softmax.
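For context: with one-hot targets, BCEWithLogitsLoss treats each class as an independent binary classifier, which tolerates overlapping labels and pairs naturally with label smoothing, while softmax cross-entropy enforces mutual exclusivity. A small comparison sketch:

import torch
import torch.nn as nn

logits = torch.randn(4, 18)                # 4 detections, 18 classes
target_idx = torch.tensor([0, 3, 17, 5])   # ground-truth class indices

# Softmax cross-entropy: classes compete, exactly one label per detection.
ce = nn.CrossEntropyLoss()(logits, target_idx)

# BCE-with-logits on one-hot targets: each class is an independent binary
# classifier, so overlapping labels (and smoothed targets) are allowed.
one_hot = torch.zeros_like(logits).scatter_(1, target_idx.unsqueeze(1), 1.0)
bce = nn.BCEWithLogitsLoss()(logits, one_hot)

print(float(ce), float(bce))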

Question about tau_{ck}

How can I change the value of tau_{ck} in the code? I want to make it zero to have a higher number of keypoint confidences returned as described in Section 3.4 of the paper.

Body Global Pose

Hi

How can I get the global body pose in 3D space with respect to the camera coordinate frame? Thank you.

Colab

Please add a Google Colab notebook for inference.

Multi-Class Pose estimation

@wmcnally thanks for the code base. How can I extend the current code base to train for both person pose estimation and animal pose estimation? What changes need to be made for multi-class pose estimation?

Animal pose estimation using KAPAO

@Kanav123 thanks for the open-source code base. I have a few queries:

  1. Can we train KAPAO for animal pose estimation on an animal pose dataset? If so, what changes need to be made?
  2. Can we get hand pose, facial landmarks, and body pose from the same architecture? Can we modify the KAPAO architecture for these? If so, what are your suggestions?

Thanks in advance.

Error when I run train.py on the CrowdPose dataset

When I run KAPAO-S on CrowdPose, there are some errors:

(kapao) ywk@hello-Precision-3640-Tower:~/Desktop/Paper/kapao$ python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/cp_s_e300 \
--name train \
--workers 128 \
--noval
/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


train: weights=yolov5s6.pt, cfg=, data=data/crowdpose.yaml, hyp=data/hyps/hyp.kp-p6.yaml, epochs=300, batch_size=128, imgsz=1280, rect=False, resume=False, nosave=False, noval=True, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=128, project=runs/cp_s_e300, entity=None, name=train, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=0, freeze=0, patience=100, val_scales=[1.0], val_flips=[-1], autobalance=False
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 489, in main
    assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
AssertionError: insufficient CUDA devices for DDP command
YOLOv5 🚀 2021-12-29 torch 1.9.1+cu102 CUDA:0 (GeForce RTX 2080 Ti, 11012.4375MB)

Added key: store_based_barrier_key:1 to store for rank: 0
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 489, in main
    assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
AssertionError: insufficient CUDA devices for DDP command
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 489, in main
    assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
AssertionError: insufficient CUDA devices for DDP command
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 17304) of binary: /home/ywk/anaconda3/envs/kapao/bin/python
Traceback (most recent call last):
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/run.py", line 692, in run
    )(*cmd_args)
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ywk/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


train.py FAILED
=======================================
Root Cause:
[0]:
  time: 2022-01-05_18:05:39
  rank: 1 (local_rank: 1)
  exitcode: 1 (pid: 17304)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
Other Failures:
[1]:
  time: 2022-01-05_18:05:39
  rank: 2 (local_rank: 2)
  exitcode: 1 (pid: 17305)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
[2]:
  time: 2022-01-05_18:05:39
  rank: 3 (local_rank: 3)
  exitcode: 1 (pid: 17306)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"

Can you tell me how to solve this problem?
Thanks!

The fusing operation described in the paper

Hi, this project is impressive, but I have some questions.

"fusing the candidate keypoint objects with the candidate pose objects using a distance tolerance τfd." --from the paper.

Maybe the fusing operation code is there? But the code doesn't change the variable poses. When I comment out the code under if data['use_kp_dets'] and nkp and then run youtube.py, the results are still correct.
Is there a bug, or is it my mistake?
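For intuition, the fusion described in the paper matches each keypoint of a candidate pose to nearby keypoint-object detections within the distance tolerance τfd and keeps the more confident coordinates. A rough sketch of that idea (an illustration under assumed data structures, not the repo's actual code):

import numpy as np

def fuse(pose_kps, kp_objects, tau_fd):
    """Fuse keypoint-object detections into a candidate pose.

    pose_kps: (K, 3) array of (x, y, conf) keypoints from one pose object.
    kp_objects: list of dicts {"cls": k, "xy": (x, y), "conf": c}.
    tau_fd: distance tolerance in pixels.
    """
    fused = pose_kps.copy()
    for obj in kp_objects:
        k = obj["cls"]                    # keypoint class, e.g. left wrist
        d = np.linalg.norm(fused[k, :2] - np.asarray(obj["xy"]))
        if d < tau_fd and obj["conf"] > fused[k, 2]:
            fused[k, :2] = obj["xy"]      # keep the more confident location
            fused[k, 2] = obj["conf"]
    return fused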

Pose Tracking

Do you have a solution for pose tracking with this algorithm?

KAPAO structure

Is YOLOv5 the backbone of KAPAO? Are the weights the same, or did you train the whole model?

Precision and recall still 0 when I train on COCO 2017

I trained on COCO 2017 with coco-kp.yaml and weights yolov5s6.pt.

In result.csv, the mAP is 0.83, but the precision and recall metrics are still 0.
Running val.py, I can see the scores are mostly 0.003611, 0.0027580, close to 0.
e.g.
{"image_id": 463730, "category_id": 1, "keypoints": [584.2581176757812, 211.4682159423828, 0.0, 584.8880615234375, 209.99844360351562, 0.0, 583.2449340820312, 210.3875274658203, 0.0, 585.7825927734375, 210.25306701660156, 0.0, 580.646484375, 211.4004364013672, 0.0, 587.0042724609375, 215.35562133789062, 0.0, 578.5013427734375, 216.1226348876953, 0.0, 589.2357177734375, 220.04568481445312, 0.0, 573.6966552734375, 222.9219970703125, 0.0, 585.150634765625, 220.6389617919922, 0.0, 577.4009399414062, 223.36631774902344, 0.0, 582.9385986328125, 228.6605987548828, 0.0, 577.1669921875, 228.58680725097656, 0.0, 579.8134155273438, 239.3750762939453, 0.0, 571.2628173828125, 239.9718475341797, 0.0, 575.2064208984375, 253.8463134765625, 0.0, 569.2610473632812, 254.59164428710938, 0.0], "score": 0.002862187335267663},

How to train my own data?

Thank you for your splendid work! I have some questions about training my own data with your model. For example, what should I do with my label files (.json)?

puzzle about Algorithm 1

Hi,

I am deeply impressed by your work; however, I am puzzled by Algorithm 1 in the paper.

Some symbols in Algorithm 1 are not defined in the description, and the purpose of the condition P̂*_m[k, 3] < max(Ô^c_k) is not well explained.

I would deeply appreciate it if Algorithm 1 could be revised to make it clearer.

Thanks!

Can you provide the right utils/datasets.py file?

I got an error when running the training process:
assert nf > 0 or not augment, f'{prefix}No labels in {cache_path}. Can not train without labels. See {HELP_URL}'
It seems that the provided utils/datasets.py was not updated to load the keypoints JSON files. Could you please provide the right utils/datasets.py?

Why using the ultra-high 1280 resolution?

It is a fantastic and extraordinary piece of work.
Quick question: why do you adopt a 1280 resolution when all other works use 640?
Does the high AP come from the high resolution rather than the model design?

KAPAO model structure

@Kanav123 @wmcnally I have a framework that has both YOLOv5 and KAPAO in the same folder structure. When I run KAPAO inference, it tries to load the yolo.py file from the models folder; I tried renaming the file and loading KAPAO, but it gives an error. Looking deeper, during training the model node type properties are model.yolo_. Can we change the model node properties while retraining? Can you please share your thoughts?

Thanks in advance.

The result of KAPAO-S

I trained KAPAO-S, but the result is not good. Is this a difference caused by the GPU?

YOLOv5 🚀 2021-12-14 torch 1.9.1+cu102 CUDA:0 (GeForce RTX 2080 SUPER, 7982.3125MB)
 Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.587
 Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.840
 Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.644
 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.542
 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.663
 Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.662
 Average Recall    (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.890
 Average Recall    (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.715
 Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.610
 Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.737
Speed: 0.518ms pre-process, 12.620ms inference, 5.793ms NMS per image at shape (1, 3, 1280, 1280)

Also, can you tell me what "keypoint objects fused" means?

Kapao Half precision AP

Have you validated KAPAO's AP at half precision? In other words, how does half precision affect AP and speed for the S/M/L models?

Not working on RTX 3060 and RTX 3090

I used the installation instructions with conda on an RTX 3060 and an RTX 3090 (CUDA 11.4, Ubuntu 20 and 21). I'm getting this error:

/kapao$ python demos/video.py --yt-id nrchfeybHmw --imgsz 1024 --weights kapao_l_coco.pt --conf-thres-kp 0.01 --kp-obj --face --start 56 --end 72 --display
Downloading demo video...
Done.
/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Using device: cuda:0
Traceback (most recent call last):
  File "demos/video.py", line 115, in <module>
    model = attempt_load(args.weights, map_location=device)  # load FP32 model
  File "/home/beltech/kapao/models/experimental.py", line 96, in attempt_load
    model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
  File "/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 692, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/home/beltech/anaconda3/envs/kapao2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 692, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Convert to ONNX

After converting the PT model to ONNX, the simplifier reports an error, and onnxruntime also reports an error: failed: Node (Mul_3056) Op (Mul) [ShapeInferenceError] Incompatible dimensions. Is this caused by YOLOv5?

Importing as a package

Hi guys. I want to use KAPAO models in my project. Is it possible to import the repo as a package? Or can you recommend how to properly load these models?
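There is no pip package, but the repo can be used as a library by putting its root on sys.path and calling its loader (attempt_load lives in models/experimental.py, as the tracebacks above show). A minimal sketch, assuming the weights sit in the repo root:

import sys

sys.path.append("/path/to/kapao")   # repo root

import torch
from models.experimental import attempt_load   # KAPAO's checkpoint loader

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = attempt_load("kapao_s_coco.pt", map_location=device)   # FP32 model
model.eval()

img = torch.zeros(1, 3, 1280, 1280, device=device)   # dummy letterboxed input
with torch.no_grad():
    pred = model(img)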

invalid memory access

val.py

I'm getting consistent failures from invalid memory access. I think this statement is sharing the same memory addresses simultaneously:

if len(poses_mask):
    kpd[:, :4] = scale_coords(imgs[si].shape[1:], kpd[:, :4], shape)
    kpd = kpd[:, :6].cpu()

This seems to work:

if len(poses_mask):
    kpd_local = kpd.clone()  # avoid in-place aliasing of kpd
    kpd[:, :4] = scale_coords(imgs[si].shape[1:], kpd_local[:, :4], shape)
    kpd = kpd[:, :6].cpu()

GPU or TensorRT

Can we run it on the GPU?
The FPS is mentioned, but the hardware specs are not.
Can we convert it to TensorRT or OpenVINO to run it on edge devices?

Amazing paper; found you on Papers with Code.

Training speed

On my V100 32 GB GPU, I can only train one epoch per hour. Does training for 500 epochs need 500 hours?

Batched inference (stacked images)

By stacking several images (from a video or camera) into a tensor and then feeding it to inference, would you expect a higher FPS? In other words, on my GPU, inference for one image uses 35% of the GPU. How can I use the GPU's full capacity at inference? Any ideas?
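Batching generally raises GPU utilization and throughput at the cost of per-frame latency: stack N preprocessed frames into a single (N, 3, H, W) tensor and run one forward pass. A minimal sketch:

import torch

def batch_frames(frames, device):
    """Stack preprocessed frames into one batch tensor.

    frames: list of (3, H, W) float tensors, already letterboxed to a common
    size and normalized to [0, 1].
    """
    return torch.stack(frames).to(device)   # (N, 3, H, W)

# One forward pass then covers N frames:
#     with torch.no_grad():
#         preds = model(batch_frames(frames, device))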

TensorFlow and TFlite export

I've seen that some past commits mention the possibility of exporting the model to TensorFlow and TFLite. Is this still possible? If so, how can it be done?

Inference showing error

I wanted to test the model on Google Colab. After setting up the requirements and models, I ran a command and got the error shown in the attached screenshot (Kapao_error1). Any reason why?

For more clarity on what I have done, you can view my Colab notebook link.

'NoneType' object has no attribute 'span'

When I run any of the demos, I get an error. Any suggestions?

(kapao) Me:~/kapao$ python demos/squash.py --display --fps
Traceback (most recent call last):
  File "demos/squash.py", line 83, in <module>
    stream = [s for s in yt.streams if s.itag == 137][0]  # 1080p, 25 fps
  File "/home/media/miniconda3/envs/kapao/lib/python3.6/site-packages/pytube/__main__.py", line 292, in streams
    return StreamQuery(self.fmt_streams)
  File "/home/media/miniconda3/envs/kapao/lib/python3.6/site-packages/pytube/__main__.py", line 177, in fmt_streams
    extract.apply_signature(stream_manifest, self.vid_info, self.js)
  File "/home/media/miniconda3/envs/kapao/lib/python3.6/site-packages/pytube/extract.py", line 409, in apply_signature
    cipher = Cipher(js=js)
  File "/home/media/miniconda3/envs/kapao/lib/python3.6/site-packages/pytube/cipher.py", line 44, in __init__
    self.throttling_array = get_throttling_function_array(js)
  File "/home/media/miniconda3/envs/kapao/lib/python3.6/site-packages/pytube/cipher.py", line 323, in get_throttling_function_array
    str_array = throttling_array_split(array_raw)
  File "/home/media/miniconda3/envs/kapao/lib/python3.6/site-packages/pytube/parser.py", line 158, in throttling_array_split
    match_start, match_end = match.span()
AttributeError: 'NoneType' object has no attribute 'span'
