
pointhmr_release's Introduction

Sampling is Matter: Point-guided 3D Human Mesh Reconstruction

This repository is the official PyTorch implementation of the paper "Sampling is Matter: Point-guided 3D Human Mesh Reconstruction".
Jeonghwan Kim*, Mi-Gyeong Gwon*, Hyunwoo Park, Hyukmin Kwon, Gi-Mun Um, and Wonjun Kim (Corresponding Author)
* equally contributed
🍁 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2023. 🍁

👀 Overview

  • We exploit the correspondence between encoded image features and vertex positions projected into 2D space through our point-guided feature sampling scheme. By explicitly feeding these vertex-relevant features to the transformer encoder, the model accurately estimates the coordinates of the 3D human mesh.
  • Our progressive attention masking scheme helps the model efficiently deal with local vertex-to-vertex relations even under complicated poses and occlusions.
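
To make the idea of sampling features at projected vertex positions concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code of this repository: the function names, the `[row][col][channel]` layout of the feature map, and the clamping of out-of-image points are all assumptions. It bilinearly interpolates a feature vector at each 2D point.

```python
from typing import List, Sequence, Tuple

# Assumed layout for illustration: feature_map[row][col][channel].
FeatureMap = List[List[List[float]]]


def bilinear_sample(feature_map: FeatureMap, x: float, y: float) -> List[float]:
    """Interpolate a feature vector at the continuous pixel position (x, y)."""
    height, width = len(feature_map), len(feature_map[0])
    # Assumption: points projected outside the map are clamped to its border.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    sampled = []
    for c in range(len(feature_map[0][0])):
        top = (1 - wx) * feature_map[y0][x0][c] + wx * feature_map[y0][x1][c]
        bottom = (1 - wx) * feature_map[y1][x0][c] + wx * feature_map[y1][x1][c]
        sampled.append((1 - wy) * top + wy * bottom)
    return sampled


def sample_vertex_features(
    feature_map: FeatureMap, points_2d: Sequence[Tuple[float, float]]
) -> List[List[float]]:
    """Return one sampled feature vector per projected vertex."""
    return [bilinear_sample(feature_map, x, y) for x, y in points_2d]
```

In a batched PyTorch setting the same interpolation would typically be done with `torch.nn.functional.grid_sample`; whether this repository does so is not stated here.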

⚙️ How to use it

Installation

Please refer to Installation.md for installation.

Download

We provide guidelines to download pre-trained models and datasets.
Please check Download.md for more information.

Demo

We provide demo code to run end-to-end inference on the test images.

Please check Demo.md for more information.

Experiments

We provide guidelines to train and evaluate our model on Human3.6M and 3DPW.

Please check Experiments.md for more information.

📃 Results

Quantitative result

Model           Dataset     MPJPE (mm)   PA-MPJPE (mm)   Checkpoint
PointHMR-HR32   Human3.6M   48.3         32.9            Download
PointHMR-HR32   3DPW        73.9         44.9            Download
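
For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints; PA-MPJPE is the same quantity computed after a rigid Procrustes alignment of the prediction to the ground truth. The sketch below covers only MPJPE with an optional root alignment. It is a simplified illustration, not the evaluation code of this repository; the function names and the choice of joint 0 as root are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def root_align(joints: Sequence[Joint], root_index: int = 0) -> List[Joint]:
    """Translate joints so that the root joint sits at the origin."""
    rx, ry, rz = joints[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding joints (input units)."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)
```

A typical call under these assumptions is `mpjpe(root_align(pred), root_align(gt))`, with coordinates given in millimetres.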

Qualitative results

Results on 3DPW dataset:

Results on COCO dataset:

License

This research code is released under the MIT license. Please see LICENSE for more information.

SMPL and MANO models are subject to the Software Copyright License for non-commercial scientific research purposes. Please see SMPL-Model License and MANO License for more information.

We use a submodule from a third party (hassony2/manopth). Please see NOTICE for more information.

Acknowledgments

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2021-0-02084, eXtended Reality and Volumetric media generation and transmission technology for immersive experience sharing in noncontact environment with a Korea-EU international cooperative research).

Our implementation and experiments are built on top of open-source GitHub repositories. We thank all the authors who made their code public, which greatly accelerated our progress. If you find these works helpful, please consider citing them as well.

microsoft/MeshTransformer
microsoft/MeshGraphormer
postech-ami/FastMETRO
Arthur151/ROMP

Citation

@InProceedings{PointHMR,
author = {Kim, Jeonghwan and Gwon, Mi-Gyeong and Park, Hyunwoo and Kwon, Hyukmin and Um, Gi-Mun and Kim, Wonjun},
title = {{Sampling is Matter}: Point-guided 3D Human Mesh Reconstruction},
booktitle = {CVPR},
month = {June},
year = {2023}
}

pointhmr_release's People

Contributors

hwp97, jhkim0759, kmk3942

pointhmr_release's Issues

About M

Thank you for your excellent work! We noticed in your paper that M refers to the distance threshold of local connectivity in self-attention. Could you explain in detail how this distance threshold should be understood?

Note that N and M denote the number of vertices and the distance threshold for defining the local connection in self-attention, respectively.
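
One possible reading of the quoted sentence, offered only as an assumption and not as the authors' answer, is that M is a hop distance on the mesh graph: vertex i may attend to vertex j only if j can be reached from i within M edges. The sketch below builds such a boolean mask with a breadth-first search; all names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def build_adjacency(num_vertices: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Undirected adjacency lists from a list of mesh edges."""
    adjacency: List[List[int]] = [[] for _ in range(num_vertices)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def hops_from(source: int, adjacency: List[List[int]]) -> List[int]:
    """Breadth-first hop counts from source; -1 marks unreachable vertices."""
    dist = [-1] * len(adjacency)
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adjacency[v]:
            if dist[n] < 0:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist


def local_attention_mask(
    num_vertices: int, edges: Sequence[Tuple[int, int]], threshold: int
) -> List[List[bool]]:
    """mask[i][j] is True when j lies within `threshold` hops of i (assumed meaning of M)."""
    adjacency = build_adjacency(num_vertices, edges)
    return [
        [0 <= d <= threshold for d in hops_from(i, adjacency)]
        for i in range(num_vertices)
    ]
```

Under this reading, a small threshold restricts attention to close neighbours, while a threshold at least as large as the graph diameter recovers full attention. How the threshold is scheduled across encoder blocks is not covered by this sketch.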

Checkpoint licensing question

Just wanted to confirm that the checkpoints are MIT licensed just like this repository. Thank you for all of your great work!

what the joint token does

Hi, thanks for the great work.
I wonder what the joint token does. Does it influence the recovery of vertices by interacting with the vertex tokens?
And as you mentioned, during the training phase it is randomly initialized and optimized, so in the test phase, is it used as a trained parameter?
Thanks!

About grid feature

Hi, I've been reading your paper. Impressive work!
I've been wondering how the grid tokens work. I guess they are used to compensate for inaccurate point estimation? Have you done any ablation study on the grid tokens, and could you share some results for PointHMR without them?

projection question

Thanks for your great work. I'd like to know how you do the 3D-to-2D projection, since the single-view mesh estimation task does not provide camera parameters.
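
As background for this question: single-image mesh recovery methods commonly let the network predict weak-perspective camera parameters (a scale and a 2D translation) and project vertices with them. Whether PointHMR does exactly this is an assumption here, and the function below is a hypothetical sketch of that general technique, not code from this repository.

```python
from typing import List, Sequence, Tuple


def weak_perspective_project(
    vertices: Sequence[Tuple[float, float, float]],
    scale: float,
    tx: float,
    ty: float,
) -> List[Tuple[float, float]]:
    """Project 3D points to 2D by dropping depth, then translating and scaling.

    scale, tx and ty stand in for camera parameters that a network would
    predict per image (assumption for illustration).
    """
    return [(scale * (x + tx), scale * (y + ty)) for x, y, _z in vertices]
```

The depth coordinate is ignored, so no camera intrinsics are needed; the resulting 2D points live in a normalized image space defined by the predicted scale and translation.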

Why use grid tokens and joint tokens?

In your paper, you mention that

It is noteworthy that the grid feature plays an important role to create the united body structure by aligning each point in an appropriate location.

Is there any experiment that can prove this statement? Will there be any significant performance drop if you remove the grid feature?

In addition, since your objective is to predict the human mesh, why do you need joint tokens? Also, why do you apply the sampling technique to vertex tokens only and not to joint tokens?

Thanks.
