river-zhang / gta

[NeurIPS 23] Official repository for NeurIPS 2023 paper "Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction"

Home Page: https://river-zhang.github.io/GTA-projectpage/

Languages: Python 98.14% Shell 0.53% GLSL 1.34%
Topics: 3d clothed-humans clothed-people-digitalization digital human reconstruction vision neurips-2023 python pytorch

gta's Introduction

Official Implementation for GTA (NeurIPS 2023)

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction (NeurIPS 2023) [Paper] [Website]

News

  • [2023/12/29] We are thrilled to announce the release of our latest model, SIFU, offering enhanced geometry and texture reconstruction capabilities!
  • [2023/11/30] We release the code, including inference and testing.
  • [2023/9/26] We release the arXiv version (paper on arXiv).

TODO

  • [ ] Hugging Face
  • [√] Release code
  • [√] Release paper

Introduction

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing. Current methods exhibit limitations in performance, largely attributable to their dependence on insufficient 2D image features and inconsistent query methods. To address this, we present the Global-correlated 3D-decoupling Transformer for clothed Avatar reconstruction (GTA), a novel transformer-based architecture that reconstructs clothed human avatars from monocular images. Our approach leverages transformer architectures by utilizing a Vision Transformer model as an encoder for capturing global-correlated image features. Subsequently, our innovative 3D-decoupling decoder employs cross-attention to decouple tri-plane features, using learnable embeddings as queries for cross-plane generation. To effectively enhance feature fusion with the tri-plane 3D feature and human body prior, we propose a hybrid prior fusion strategy combining spatial and prior-enhanced queries, leveraging the benefits of spatial localization and human body prior knowledge. Comprehensive experiments on CAPE and THuman2.0 datasets illustrate that our method outperforms state-of-the-art approaches in both geometry and texture reconstruction, exhibiting high robustness to challenging poses and loose clothing, and producing higher-resolution textures.
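
For intuition, here is a minimal, hypothetical sketch of the 3D-decoupling idea described above: learnable per-plane query embeddings cross-attend to the global image tokens from the ViT encoder to produce the xy/xz/yz tri-plane features. Module names, shapes, and the attention layout are illustrative assumptions, not the actual implementation.

import torch
import torch.nn as nn

class TriplaneDecouplingSketch(nn.Module):
    """Illustrative only: learnable per-plane queries cross-attend to global image tokens."""

    def __init__(self, dim=256, plane_hw=32, num_heads=8):
        super().__init__()
        # One learnable query set per plane (xy, xz, yz); sizes are assumptions.
        self.plane_queries = nn.Parameter(torch.randn(3, plane_hw * plane_hw, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)
        self.plane_hw = plane_hw

    def forward(self, image_tokens):
        # image_tokens: [B, N, dim] global-correlated features from a ViT encoder.
        B, _, dim = image_tokens.shape
        planes = []
        for q in self.plane_queries:                      # iterate over xy / xz / yz queries
            q = q.unsqueeze(0).expand(B, -1, -1)          # [B, HW, dim]
            feat, _ = self.cross_attn(q, image_tokens, image_tokens)
            feat = self.proj(feat)                        # [B, HW, dim]
            planes.append(feat.transpose(1, 2).reshape(B, dim, self.plane_hw, self.plane_hw))
        return planes  # three [B, dim, H, W] plane feature maps

# Toy usage: 2 images, 196 ViT tokens of width 256.
tokens = torch.randn(2, 196, 256)
xy, xz, yz = TriplaneDecouplingSketch()(tokens)
print(xy.shape)  # torch.Size([2, 256, 32, 32])

At query time, a 3D point would be projected onto each plane, the sampled features fused with SMPL-X prior features via the hybrid spatial / prior-enhanced queries described above, and fed to the implicit-function MLP; that part is omitted here.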

Framework overview (figure)

Installation

git clone https://github.com/River-Zhang/GTA.git
sudo apt-get install libeigen3-dev ffmpeg
cd GTA
conda env create -f environment.yaml
conda activate gta
pip install -r requirements.txt

Please download the checkpoints and place them in ./data/ckpt.

Please follow the instructions in ICON to download the extra data, such as the HPS and SMPL models.

Inference

python -m apps.infer -cfg ./configs/GTA.yaml -gpu 0 -in_dir ./examples -out_dir ./results -loop_smpl 100 -loop_cloth 200 -hps_type pixie

Testing

# 1. Register at http://icon.is.tue.mpg.de/ or https://cape.is.tue.mpg.de/
# 2. Download CAPE testset (Easy: 50, Hard: 100)
bash fetch_cape.sh 
# 3. Check CAPE testset via 3D visualization
python -m lib.dataloader_demo -v -c ./configs/train/GTA.yaml -d cape

# evaluation
python -m apps.train -cfg ./configs/train/GTA.yaml -test

# TIP: the default "mcube_res" is 256 in apps/train.

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{zhang2023globalcorrelated,
      title={Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction}, 
      author={Zhang, Zechuan and Sun, Li and Yang, Zongxin and Chen, Ling and Yang, Yi},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
      year={2023}
}

Acknowledgement

Our implementation is mainly based on ICON and PIFu; many thanks to these and the other open-source projects we build on.

In addition, we sincerely thank Yuliang Xiu, the author of ICON and ECON, for resolving many of our concerns in GitHub issues.

More related papers about 3D avatars: https://github.com/pansanity666/Awesome-Avatars


gta's Issues

About training

If I don't use your checkpoint, how long does it take to reproduce your results?

get_sampling_geo

The get_sampling_geo method is used to obtain the geometric sampling points in the code. smplx_verts can be obtained from smplx_param using the compute_smpl_verts method, which calls load_fit_body to obtain the SMPL-X mesh.

The first problem: load_fit_body returns a trimesh (smpl_mesh) as smpl_out, but compute_smpl_verts returns only the vertices of smpl_out. Does that mean the trimesh construction for smpl_verts is unnecessary? We could simply return smpl_verts as smpl_out and have compute_smpl_verts return that.

The second problem: in load_fit_body, what are param['scale'] and param['translation'], and how were they obtained? Why can't we just use the vertices produced by smpl_model directly from the SMPL-X parameters?
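
For orientation, a hypothetical sketch of how such per-scan scale/translation parameters are typically used to move SMPL-X model output into the scan's coordinate frame; the exact order of operations is a convention of the fitting pipeline, so this is not the repository's code:

import numpy as np
import trimesh

def align_smplx_to_scan(smplx_vertices, smplx_faces, scale, translation):
    """Map canonical SMPL-X vertices into the scan's coordinate frame.

    THuman2.0-style fits store a per-scan global scale and translation.
    Whether translation is applied before or after scaling depends on the
    fit convention; verify against the actual parameter files.
    """
    verts = (np.asarray(smplx_vertices) + np.asarray(translation)) * scale
    return trimesh.Trimesh(verts, smplx_faces, process=False)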

Question about evaluation.

Hi, thanks for your work.

I noticed you used GT normals while testing on THuman2.0 for normal evaluation of different views (as Table 2 in the paper). I wonder if you are using GT normals too or only GT SMPL-X while testing on CAPE (as in Table 1)?


The test results seem to be inconsistent with the paper

I tried your code on Ubuntu 20.04 with CUDA 11.8 and PyTorch 2.0.1 and got the following results:

GTA.mp4

but the corresponding results in your paper are obviously much better, as shown in the attached comparison.

I want to know whether my result is reasonable.

Ask for training code

Excellent work! I would like to ask whether the training code is included in the repository.

Question about numbers, evaluation.

Hello, I have two questions regarding the test results in your two papers, GTA and SIFU.

  1. I have seen Issue #5 and your explanation there, but I still don't understand why your GTA numbers for THuman 2.0 are different.
    In the GTA paper they are Chamfer 0.814, P2S 0.862, Normal 0.055; in SIFU they are 0.73, 0.72, 0.04.

  2. I noticed that in the evaluation code for both of your papers you use GT front and back normals. This differs from ICON's evaluation protocol, which uses estimated normals. (YuliangXiu/ICON#183)
    With estimated normals, your GTA numbers for THuman 2.0 should be 1.12, 1.12, 0.065.

Could you please clarify these two points? Thank you!

About the SMPL-X model

In which part of your code do you use a PIXIE-like model to estimate SMPL-X parameters? I have read your code, and it seems that during training you use the SMPL-X parameters from the THuman2.0 dataset as the prior-enhanced query; only at inference time, since the input is not an image from the dataset, is the PIXIE model used to predict the SMPL-X parameters for the prior-enhanced query. Is my understanding correct?

ViT encoder input

I found that the front/back normal maps are used, together with the image, as input to the encoder to generate the tri-plane features. I want to know why. Does this improve the results?
Reading the code, I found that after the tri-plane feature maps are obtained, they are concatenated with the normal features.
If I only feed the image through VitPose's pre-trained ViT encoder to get image features, then pass them through the three decoders to get tri-plane features and concatenate them with the normal features, would that be acceptable?
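
Not the repository's code, but a toy illustration of the kind of channel-wise fusion being asked about, concatenating normal-map features onto the decoupled plane features; the shapes and the question of which planes actually receive the normal features are assumptions:

import torch

# Hypothetical shapes for illustration only.
B, C, H, W = 2, 64, 128, 128
plane_feats = [torch.randn(B, C, H, W) for _ in range(3)]  # xy / xz / yz plane features
normal_feat = torch.randn(B, 32, H, W)                     # features from front/back normal maps

# Concatenate the normal features onto each plane along the channel dimension.
fused = [torch.cat([p, normal_feat], dim=1) for p in plane_feats]
print([tuple(f.shape) for f in fused])  # (2, 96, 128, 128) for each plane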

About PSNR

Amazing work! Can you provide code for calculating PSNR, or tell me where to find the relevant code?
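
For reference, PSNR is straightforward to compute from the per-pixel MSE; a minimal NumPy version (not the repository's evaluation code) looks like this:

import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)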

inference time

Hi, thanks for your great work.
I read your paper, but didn't see any mention of inference time for a single image.
Do you have a rough idea of what it would be on a modern GPU?

thanks!

About HGPIFu

When estimating the human body geometry, the query operation is performed in HGPIFuNet.
The first step is to project the sampled point set onto the image plane, but I found that the transforms parameter is None, so in xyz = self.projection(points, calibs, transforms) the points are only rotated and translated.
Are all the points in the world coordinate system? The projection operation only converts points from the world coordinate system to the camera coordinate system through rotation and translation, and does not project them further onto the image plane. Please give me some help.
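
For context, GTA builds on PIFu/ICON, which assume an orthographic camera: the 4x4 calibration matrix already maps world coordinates into normalized image space, so applying the rotation and translation is the projection and no perspective divide is needed; transforms is only an optional extra 2D affine (e.g. for crops). A sketch in that spirit (argument shapes are assumptions, not the exact repository code):

import torch

def orthographic_project(points, calibs, transforms=None):
    """Project world-space points using an orthographic calibration matrix.

    points:     [B, 3, N] sampled points in world coordinates
    calibs:     [B, 4, 4] matrices mapping world -> normalized image space
    transforms: optional [B, 2, 3] extra 2D affine applied in image space
    """
    rot = calibs[:, :3, :3]                   # [B, 3, 3]
    trans = calibs[:, :3, 3:4]                # [B, 3, 1]
    pts = torch.baddbmm(trans, rot, points)   # [B, 3, N]; xy are already image coordinates
    if transforms is not None:
        scale = transforms[:, :2, :2]
        shift = transforms[:, :2, 2:3]
        pts[:, :2, :] = torch.baddbmm(shift, scale, pts[:, :2, :])
    return pts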

Expecting a demo

Hi River-Zhang,
I am studying papers on human body reconstruction and have read yours. It is very nice work! May I ask when you will release the open-source code? Looking forward to your demo!

About the version of pymeshlab

During inference, I first installed the current pymeshlab release (2023.12) and encountered:
AttributeError: 'pymeshlab.pmeshlab.MeshSet' object has no attribute 'laplacian_smooth'

I then downgraded pymeshlab to a 2022 release and the inference finished successfully.
The new pymeshlab version appears to be incompatible with the code.

Strange surfaces in inference results

Thanks for your great work!
I encountered some problems during inference.
Would you please help me?
My inference results have strange surfaces, just as in #7.

I noticed that an ERROR occurred, although it didn't stop the inference:

Resume MLP weights from ./data/ckpt/GTA.ckpt
Resume normal model from ./data/ckpt/normal.ckpt
Using pixie as HPS Estimator

Dataset Size: 5
  0%|                                                                                                                                                            | 0/5 [00:00<?, ?it/s]
2024-03-02 16:02:28.809516226 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:515 CreateExecutionProviderInstance] Failed to create TensorrtExecutionProvider. 
Please reference https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements to ensure all dependencies are met.
1eca7a73c3c61d9debde493de37c7d99:   0%|                                                                                                                          | 0/5 [00:06<?, ?it/s
Body Fitting --- normal: 0.089 | silhouette: 0.043 | Total: 0.132:  12%|█████████▎                                                                    | 12/100 [00:01<00:13,  6.32it/s]
1eca7a73c3c61d9debde493de37c7d99:   0%|                                                                                                                          | 0/5 [00:08<?, ?it/s]

Is it normal that this error occurred during inference?

I tried changing the onnxruntime-gpu and TensorRT versions, but it didn't help.

My environment is:
CUDA 11.7
pytorch 1.13.1
onnxruntime-gpu 1.14
TensorRT 8.5.3.1
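
For what it's worth, the logged line is a non-fatal onnxruntime warning: the TensorRT execution provider could not be created, and onnxruntime falls back to the next available provider. One way to avoid it, assuming you can reach the place where the ONNX session is created (the model path below is a placeholder, not a file from this repository), is to request only the providers you actually have:

import onnxruntime as ort

# Placeholder model path; point this at the ONNX model the pipeline loads.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # skip TensorRT
)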

SSIM and LPIPS metrics

I observed that the SSIM and LPIPS metrics for GTA on the THuman2.0 dataset have not been made available. Could you kindly provide this data or share the rendered results?
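
In case it helps while waiting for the official numbers, a minimal sketch of computing SSIM and LPIPS on rendered images with off-the-shelf packages (scikit-image and the lpips package); this is not the paper's evaluation code, and details such as resolution and masking will affect the numbers:

import torch
import lpips                                              # pip install lpips
from skimage.metrics import structural_similarity as ssim

def image_metrics(pred, gt):
    """pred, gt: float32 NumPy arrays in [0, 1], shape [H, W, 3]."""
    ssim_val = ssim(pred, gt, channel_axis=2, data_range=1.0)

    # LPIPS expects NCHW tensors in [-1, 1].
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None] * 2 - 1
    lpips_fn = lpips.LPIPS(net="alex")
    lpips_val = lpips_fn(to_tensor(pred), to_tensor(gt)).item()
    return ssim_val, lpips_val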

there is no .npy file

Thanks for sharing this work.
When I tried to run infer.py, I found that the expected .npy files and directories are missing at the location shown in the attached screenshot.

THuman 2.0 evaluation protocol

Hi authors, I have a question regarding the THuman 2.0 evaluation protocol in your Table 1.

  • How do you create the train/test split?
  • For the test set, how many views do you render per subject, and what is the FOV?

Thank you in advance!
