Official implementation of MSR-GCN (ICCV2021 paper)


MSR-GCN


Official implementation of MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction (ICCV 2021 paper)

[Paper] [Supp] [Poster] [Slides] [Video]


Authors

  1. Lingwei Dang, School of Computer Science and Engineering, South China University of Technology, China, [email protected]
  2. Yongwei Nie, School of Computer Science and Engineering, South China University of Technology, China, [email protected]
  3. Chengjiang Long, JD Finance America Corporation, USA, [email protected]
  4. Qing Zhang, School of Computer Science and Engineering, Sun Yat-sen University, China, [email protected]
  5. Guiqing Li, School of Computer Science and Engineering, South China University of Technology, China, [email protected]

Overview

    Human motion prediction is a challenging task due to the stochasticity and aperiodicity of future poses. Recently, graph convolutional networks (GCNs) have proven very effective at learning dynamic relations among pose joints, which helps pose prediction. On the other hand, a human pose can be abstracted recursively to obtain a set of poses at multiple scales. As the abstraction level increases, the motion of the pose becomes more stable, which also benefits pose prediction. In this paper, we propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for human pose prediction in an end-to-end manner. GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale. The extracted features at each scale are then combined and decoded to obtain the residuals between the input and target poses. Intermediate supervision is imposed on all the predicted poses, which forces the network to learn more representative features. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset. Experimental results demonstrate that our method outperforms state-of-the-art approaches.
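The core building blocks described above can be sketched in a few lines. The following is a minimal NumPy illustration of one graph-convolution step with a learnable adjacency and a residual output; all names are hypothetical and this is not the repository's actual code:

```python
import numpy as np

def gcn_layer(x, adj, weight):
    """One graph-convolution step: aggregate features over the (learnable)
    joint graph `adj`, then mix feature channels with `weight`.
    x: (num_joints, num_feats)."""
    return np.tanh(adj @ x @ weight)

rng = np.random.default_rng(0)
num_joints, feats = 22, 16
pose_feats = rng.standard_normal((num_joints, feats))
adj = np.eye(num_joints)                       # initialised as identity, learned during training
weight = rng.standard_normal((feats, feats)) * 0.1

hidden = gcn_layer(pose_feats, adj, weight)

# Residual prediction: the decoder outputs an offset that is added to the
# input representation, rather than predicting the target directly.
decoded = gcn_layer(hidden, adj, weight)
prediction = pose_feats + decoded
```

In the actual network, such layers are stacked at each scale, and the decoded residuals at every scale receive intermediate supervision.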

Dependencies

  • PyTorch 1.7.0 + cu110
  • Python 3.8.5
  • NVIDIA RTX 3090 (GPU used for training)

Get the data

Human3.6M in exponential map format can be downloaded from here.

The CMU Mocap data was obtained from the repo of the ConvSeq2Seq paper.

About datasets

Human3.6M

  • A pose in H3.6M has 32 joints, from which we select 22 and build the multi-scale representation by grouping joints in a 22 -> 12 -> 7 -> 4 manner.
  • We use S5 / S11 as the test / validation sets and the rest as the training set. Testing is done on the 15 actions separately; on each action we use all test samples instead of 8 randomly selected ones.
  • Some of the original 32 joints share the same position.
  • The input / output length is 10 / 25 frames.
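The grouping step that produces the coarser scales can be illustrated by mean-pooling joint positions group-wise. The index groups below are made up for illustration only; the repository's actual groupings live in its Index22* tables:

```python
import numpy as np

def downsample_pose(pose, groups):
    """Average each group of joint positions to get one coarser-scale joint.
    pose: (num_joints, 3) -> (len(groups), 3)."""
    return np.stack([pose[g].mean(axis=0) for g in groups])

# Illustrative 22 -> 12 grouping (NOT the repo's actual index tables):
groups_22_to_12 = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9],
                   [10, 11], [12, 13], [14, 15], [16, 17],
                   [18, 19], [20], [21]]

pose_22 = np.random.default_rng(1).standard_normal((22, 3))
pose_12 = downsample_pose(pose_22, groups_22_to_12)
```

Applying the same idea again with coarser groupings yields the 7-joint and 4-joint scales.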

CMU Mocap dataset

  • A pose in CMU Mocap has 38 joints, from which we select 25 and build the multi-scale representation by grouping joints in a 25 -> 12 -> 7 -> 4 manner.
  • CMU Mocap has no validation set. Testing is done on the 8 actions separately; on each action we use all test samples instead of 8 randomly selected ones.
  • Some of the original 38 joints share the same position.
  • The input / output length is 10 / 25 frames.

Train

  • train on Human3.6M:

    python main.py --exp_name=h36m --is_train=1 --output_n=25 --dct_n=35 --test_manner=all

  • train on CMU Mocap:

    python main.py --exp_name=cmu --is_train=1 --output_n=25 --dct_n=35 --test_manner=all
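The --dct_n=35 flag sets how many discrete cosine transform (DCT) coefficients encode each joint trajectory (10 input + 25 output = 35 frames). Below is a sketch of the transform, assuming the usual pad-then-DCT pipeline from LearnTrajDep-style code; it is not necessarily the repo's exact implementation:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; rows are frequency components."""
    m = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            w = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
            m[k, i] = w * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return m

seq_len, dct_n = 35, 35                    # 10 observed + 25 predicted frames
traj = np.sin(np.linspace(0, 3, seq_len))  # one joint coordinate over time
D = dct_matrix(seq_len)

coeffs = D[:dct_n] @ traj                  # trajectory -> frequency coefficients
recon = D[:dct_n].T @ coeffs               # inverse: coefficients -> trajectory
```

With dct_n equal to the sequence length the transform is lossless; choosing dct_n smaller than the sequence length would keep only the lowest-frequency components.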

Evaluate and visualize results

  • evaluate on Human3.6M:

    python main.py --exp_name=h36m --is_load=1 --model_path=ckpt/pretrained/h36m_in10out25dctn35_best_err57.9256.pth --output_n=25 --dct_n=35 --test_manner=all

  • evaluate on CMU Mocap:

    python main.py --exp_name=cmu --is_load=1 --model_path=ckpt/pretrained/cmu_in10out25dctn35_best_err37.2310.pth --output_n=25 --dct_n=35 --test_manner=all
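The tables below report average joint position errors in millimetres, i.e. a mean per-joint position error (MPJPE)-style metric. A minimal sketch follows; the authoritative computation is in the repo's evaluation code:

```python
import numpy as np

def mpjpe(pred, target):
    """Mean Euclidean distance between predicted and ground-truth joints,
    averaged over all joints (and, in practice, frames and samples).
    pred, target: (..., num_joints, 3), in millimetres."""
    return np.linalg.norm(pred - target, axis=-1).mean()

target = np.zeros((25, 22, 3))               # 25 frames, 22 joints
pred = target + np.array([3.0, 4.0, 0.0])    # every joint off by 5 mm
err = mpjpe(pred, target)                    # -> 5.0
```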

Results

| H3.6M-10/25/35-all | 80 ms | 160 ms | 320 ms | 400 ms | 560 ms | 1000 ms | Overall |
|:--|--:|--:|--:|--:|--:|--:|--:|
| walking | 12.16 | 22.65 | 38.65 | 45.24 | 52.72 | 63.05 | - |
| eating | 8.39 | 17.05 | 33.03 | 40.44 | 52.54 | 77.11 | - |
| smoking | 8.02 | 16.27 | 31.32 | 38.15 | 49.45 | 71.64 | - |
| discussion | 11.98 | 26.76 | 57.08 | 69.74 | 88.59 | 117.59 | - |
| directions | 8.61 | 19.65 | 43.28 | 53.82 | 71.18 | 100.59 | - |
| greeting | 16.48 | 36.95 | 77.32 | 93.38 | 116.24 | 147.23 | - |
| phoning | 10.10 | 20.74 | 41.51 | 51.26 | 68.28 | 104.36 | - |
| posing | 12.79 | 29.38 | 66.95 | 85.01 | 116.26 | 174.33 | - |
| purchases | 14.75 | 32.39 | 66.13 | 79.63 | 101.63 | 139.15 | - |
| sitting | 10.53 | 21.99 | 46.26 | 57.80 | 78.19 | 120.02 | - |
| sittingdown | 16.10 | 31.63 | 62.45 | 76.84 | 102.83 | 155.45 | - |
| takingphoto | 9.89 | 21.01 | 44.56 | 56.30 | 77.94 | 121.87 | - |
| waiting | 10.68 | 23.06 | 48.25 | 59.23 | 76.33 | 106.25 | - |
| walkingdog | 20.65 | 42.88 | 80.35 | 93.31 | 111.87 | 148.21 | - |
| walkingtogether | 10.56 | 20.92 | 37.40 | 43.85 | 52.93 | 65.91 | - |
| Average | 12.11 | 25.56 | 51.64 | 62.93 | 81.13 | 114.18 | 57.93 |

The following results use the same evaluation metric as MotionMixer (IJCAI 2022):

| H3.6M-10/25/35-256 | <=80 ms | <=160 ms | <=320 ms | <=400 ms | <=560 ms | <=1000 ms |
|:--|--:|--:|--:|--:|--:|--:|
| walking | 9.54 | 15.36 | 24.89 | 28.89 | 35.24 | 44.99 |
| eating | 5.88 | 9.94 | 17.76 | 21.48 | 28.58 | 44.71 |
| smoking | 6.39 | 10.66 | 18.78 | 22.58 | 29.43 | 44.23 |
| discussion | 8.81 | 15.55 | 29.81 | 36.66 | 49.06 | 74.06 |
| directions | 6.68 | 12.2 | 24.78 | 31.05 | 42.2 | 65.19 |
| greeting | 11.35 | 19.83 | 37.69 | 46.1 | 60.98 | 89.2 |
| phoning | 7.56 | 12.69 | 22.91 | 27.92 | 37.57 | 60.16 |
| posing | 8.77 | 16.11 | 32.94 | 41.69 | 58.66 | 99.05 |
| purchases | 10.96 | 19.39 | 36.22 | 43.9 | 57.6 | 85.08 |
| sitting | 7.96 | 13.47 | 25.34 | 31.2 | 42.38 | 67.88 |
| sittingdown | 13.2 | 21.52 | 37.02 | 44.3 | 58.25 | 89.99 |
| takingphoto | 7.18 | 12.45 | 23.81 | 29.5 | 40.95 | 68.61 |
| waiting | 7.63 | 13.14 | 25.19 | 31.07 | 41.76 | 64.19 |
| walkingdog | 14.97 | 25.66 | 44.8 | 52.61 | 66.25 | 93.61 |
| walkingtogether | 8.04 | 13.5 | 23.17 | 27.39 | 34.66 | 47.19 |
| average | 8.99 | 15.43 | 28.34 | 34.42 | 45.57 | 69.21 |

| CMU-10/25/35-all | 80 ms | 160 ms | 320 ms | 400 ms | 560 ms | 1000 ms | Overall |
|:--|--:|--:|--:|--:|--:|--:|--:|
| basketball | 10.24 | 18.64 | 36.94 | 45.96 | 61.12 | 86.24 | - |
| basketball_signal | 3.04 | 5.62 | 12.49 | 16.60 | 25.43 | 49.99 | - |
| directing_traffic | 6.13 | 12.60 | 29.37 | 39.22 | 60.46 | 114.56 | - |
| jumping | 15.19 | 28.85 | 55.97 | 69.11 | 92.38 | 126.16 | - |
| running | 13.17 | 20.91 | 29.88 | 33.37 | 38.26 | 43.62 | - |
| soccer | 10.92 | 19.40 | 37.41 | 47.00 | 65.25 | 101.85 | - |
| walking | 6.38 | 10.25 | 16.88 | 20.05 | 25.48 | 36.78 | - |
| washwindow | 5.41 | 10.93 | 24.51 | 31.79 | 45.13 | 70.16 | - |
| Average | 8.81 | 15.90 | 30.43 | 37.89 | 51.69 | 78.67 | 37.23 |

Citation

If you use our code, please cite our work:

@InProceedings{Dang_2021_ICCV,
    author    = {Dang, Lingwei and Nie, Yongwei and Long, Chengjiang and Zhang, Qing and Li, Guiqing},
    title     = {MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11467-11476}
}

Acknowledgments

Some of our evaluation and data-processing code was adapted/ported from LearnTrajDep by Wei Mao.

Licence

MIT


msrgcn's Issues

Model details

Hello, is the released model exactly the same as the one in the paper, e.g. the number of graph-convolution layers and the number of downsampling/upsampling layers? Thanks for your reply.

Memory issue when evaluating the dataset

Hi! Thanks for your nice work.

Given the recent trend in human motion forecasting on Human3.6M of evaluating models on 256 test samples (rather than all samples or 8, as you provide), I wanted to obtain these results. However, my PC does not have enough memory to run the evaluation.
It would be enough to load only the test set, rather than first loading the training set to obtain the global_max and global_min values. Could you please provide the global_max and global_min values for the default configuration?

Wrong parameter name in README.md

In Evaluate and visualize results section of README.md,
python main.py --expname=h36m --is_load=1 --model_path=ckpt/pretrained/h36m_in10out25dctn35_best_epoch82_err57.9256.pth --output_n=25 --dct_n=35 --test_manner=all

The parameter name is written as --expname, but in main.py the parameter name is actually --exp_name.

Reproducing MSR-GCN on the H3.6M dataset

Hello, using your released code on H3.6M with 10 input frames to predict 25, the best average error I obtain is around 59 mm, which does not reach the 57 mm reported in the paper. Is there anything I should pay attention to during training? Many thanks!

Results on AMASS

Hi, thank you for sharing the implementation of your interesting work.
I want to compare your method with others on AMASS dataset.
Have you already tried this experiment? If so, could you kindly share the results? Otherwise, I would appreciate any suggestions for a fair comparison, in particular what I should change in the config file (Index2212/127/74, dim_repeat...).

Thank you.

Question about training

Hi @Droliven ,

Thanks for your work.

Looking at the code, are the reported results obtained after training for 5000 epochs? How much time does training take on the Human3.6M 3D dataset?

Data processing question

Hello, I do not understand how the h36m and cmu datasets are processed. What is stored at these two paths respectively? Is this where the training and test sets are passed to the model?
(screenshot in the original issue)

Performance of different actions

Dear author,
The test script does not output per-action or per-time-horizon evaluation results. How can I print these results (Tables 1/2 of the paper)?
Many thanks!
