
tmr's Introduction

TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis

Mathis Petrovich · Michael J. Black · Gül Varol

ICCV 2023 · arXiv · License

Description

Official PyTorch implementation of the paper TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis.

Please visit our webpage for more details.

Bibtex

If you find this code useful in your research, please cite:

@inproceedings{petrovich23tmr,
    title     = {{TMR}: Text-to-Motion Retrieval Using Contrastive {3D} Human Motion Synthesis},
    author    = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
    booktitle = {International Conference on Computer Vision ({ICCV})},
    year      = 2023
}

and if you use this repo's re-implementation of TEMOS, please cite:

@inproceedings{petrovich22temos,
    title     = {{TEMOS}: Generating diverse human motions from textual descriptions},
    author    = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
    booktitle = {European Conference on Computer Vision ({ECCV})},
    year      = 2022
}

You can also give the repo a star ⭐ if the code is useful to you.

Installation 👷

Create environment

Create a Python virtual environment:

python -m venv ~/.venv/TMR
source ~/.venv/TMR/bin/activate

Install PyTorch

python -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
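
You can check that PyTorch detects your GPU (a quick sanity check, not part of the original instructions):

python -c "import torch; print(torch.cuda.is_available())"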

Then install the remaining packages:

python -m pip install -r requirements.txt

which installs the following packages: pytorch_lightning, einops, hydra-core, hydra-colorlog, orjson, tqdm, and scipy. The code was tested with Python 3.10.12 and PyTorch 2.0.1.

Set up the datasets

Introduction

The process is a little different from other repos because we need a common representation for HumanML3D, KIT-ML and BABEL (to be able to train on one and evaluate on another). If you are curious about the details, I recommend reading this file: DATASETS.md. I also include the bibtex files of the datasets, which I recommend you cite.

Get the data

Please follow the instructions in raw_pose_processing.ipynb from the HumanML3D repo to obtain the pose_data folder. Then copy or symlink the pose_data folder into datasets/motions/:

ln -s /path/to/HumanML3D/pose_data datasets/motions/pose_data

Compute the features

Run the following command to compute the HumanML3D Guo features on the whole AMASS (+ HumanAct12) dataset:

python -m prepare.compute_guoh3dfeats

It will compute the features (plus mirrored versions) and save them in datasets/motions/guoh3dfeats.

Compute the text embeddings

Run this command to compute the sentence embeddings and token embeddings used in TMR for each dataset:

python -m prepare.text_embeddings data=humanml3d

This will save:

  • the token embeddings of distilbert in datasets/annotations/humanml3d/token_embeddings
  • the sentence embeddings of all-mpnet-base-v2 in datasets/annotations/humanml3d/sent_embeddings
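
For intuition, here is a minimal sketch of the kind of embeddings the script precomputes, using standard transformers / sentence-transformers calls (an illustration, not the repo's exact code):

import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

text = "A person is walking forward."

# Token embeddings from distilbert: one 768-d vector per token.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased").eval()
with torch.no_grad():
    tokens = tokenizer(text, return_tensors="pt")
    token_embeddings = distilbert(**tokens).last_hidden_state  # (1, num_tokens, 768)

# Sentence embedding from all-mpnet-base-v2: one 768-d vector per sentence.
sent_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
sentence_embedding = sent_model.encode(text)  # numpy array of shape (768,)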

Compute statistics (already done for you)

To compute statistics of the motion distribution for each dataset, you can run the following command. The statistics are computed on the training set and are already included in the repo, so you don't have to run it.

python -m prepare.motion_stats data=humanml3d

It will save the statistics (mean.pt and std.pt) in the folder stats/humanml3d/guoh3dfeats. You can replace data=humanml3d with data=kitml or data=babel anywhere in this repo.
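
For intuition, a minimal sketch of the statistics themselves (train_feats is a hypothetical stand-in for the real training features):

import torch

# Hypothetical stand-in for the per-motion training feature tensors.
train_feats = [torch.randn(60, 263), torch.randn(120, 263)]

all_frames = torch.cat(train_feats, dim=0)  # stack the frames of all motions
mean = all_frames.mean(dim=0)               # per-dimension mean, shape (feat_dim,)
std = all_frames.std(dim=0)                 # per-dimension std, shape (feat_dim,)

# Features are then typically normalized as (x - mean) / std before training.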

Training 🚀

python train.py [OPTIONS]
Details

By default, it will train TMR on HumanML3D and store the outputs in outputs/tmr_humanml3d_guoh3dfeats, which I will refer to as RUN_DIR. The other options are listed below (see the example command after the lists):

Models:

  • model=tmr: TMR (by default)
  • model=temos: TEMOS

Datasets:

  • data=humanml3d: HumanML3D (by default)
  • data=kitml: KIT-ML
  • data=babel: BABEL
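
These Hydra overrides can be combined; for example, a hypothetical command to train TEMOS on KIT-ML would be:

python train.py model=temos data=kitml
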
Extracting weights

After training, run the following command to extract the weights from the checkpoint:

python extract.py run_dir=RUN_DIR

It will take the last checkpoint by default. This creates the folder RUN_DIR/last_weights and populates it with the files motion_decoder.pt, motion_encoder.pt and text_encoder.pt. This makes loading models faster, no longer depends on the checkpoint file structure, and lets each module be loaded independently. This is already done for the pretrained models.
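
As an illustration, a minimal sketch of loading one extracted module on its own, assuming the .pt files load directly with torch.load (check extract.py to see whether full modules or state dicts are saved):

import torch

# Hypothetical path: RUN_DIR from the training step above.
text_encoder = torch.load("outputs/tmr_humanml3d_guoh3dfeats/last_weights/text_encoder.pt")
# If a full module was saved, it is ready to use; if it is a state dict,
# instantiate the encoder first and call load_state_dict instead.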

Pretrained models 📀

bash prepare/download_pretrain_models.sh

This will put pretrained models in the models folder. Currently, there are:

  • TMR trained on HumanML3D with Guo et al. humanml3d features: models/tmr_humanml3d_guoh3dfeats
  • TMR trained on KIT-ML with Guo et al. humanml3d features: models/tmr_kitml_guoh3dfeats

Note that KIT-ML is used with the Guo et al. humanml3d features (this is not a mistake). The motions come from AMASS and are converted (I am not using the MMM joints from the original KIT-ML). This makes the two models work in the same motion space.

More models may be available later on.

Evaluation 📊

python retrieval.py run_dir=RUN_DIR

It will compute the metrics, display them, and save them in the folder RUN_DIR/contrastive_metrics/.

Usage 💻

Encode a motion

Note that the .npy file should correspond to HumanML3D Guo features.

python encode_motion.py run_dir=RUN_DIR npy=/path/to/motion.npy
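
A quick sanity check on the input (assuming standard HumanML3D Guo features, which are 263-dimensional per frame):

import numpy as np

feats = np.load("/path/to/motion.npy")
print(feats.shape)  # expected: (num_frames, 263) for HumanML3D Guo features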

Encode a text

python encode_text.py run_dir=RUN_DIR text="A person is walking forward."

Compute similarity between text and motion

python text_motion_sim.py run_dir=RUN_DIR text=TEXT npy=/path/to/motion.npy

For example, with text="a man sets to do a backflips then fails back flip and falls to the ground" and npy=HumanML3D/HumanML3D/new_joint_vecs/001034.npy, you should get a score of around 0.96.
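
Under the hood, the score is the cosine similarity between the text and motion embeddings; a minimal illustrative sketch (not the repo's exact code):

import torch
import torch.nn.functional as F

def cosine_similarity(text_emb: torch.Tensor, motion_emb: torch.Tensor) -> float:
    # Normalize both embeddings, then take their dot product.
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    return (text_emb * motion_emb).sum(-1).item()

print(cosine_similarity(torch.randn(256), torch.randn(256)))  # random example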

Launch the demo

Encode the whole motion dataset

python encode_dataset.py run_dir=RUN_DIR

Text-to-motion retrieval demo

Run this command:

python app.py

and then open your web browser at the address: http://localhost:7860.

Localization (WIP)

The code will be available a bit later.

Reimplementation of TEMOS (WIP)

Details and difference

The TEMOS code was probably a bit too abstract, and some users struggled to understand it. As TMR and TEMOS share a similar architecture, I took the opportunity to rewrite TEMOS in this repo (src/model/temos.py) to make it more user-friendly. Note that in this repo, the motion representation is different from the original TEMOS paper (see DATASETS.md for more details). Another difference is that I precompute the token embeddings (from distilbert) beforehand, since distilbert is not fine-tuned for the final model. This makes training around 2× faster and more memory efficient.
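
A hedged sketch of what this precomputation looks like (illustrative paths and names, not the repo's exact script):

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased").eval()

@torch.no_grad()
def cache_token_embeddings(text: str, path: str) -> None:
    # Run the frozen text backbone once and cache the result,
    # so it never has to run during training.
    tokens = tokenizer(text, return_tensors="pt")
    embeddings = model(**tokens).last_hidden_state[0]  # (num_tokens, 768)
    np.save(path, embeddings.numpy())

cache_token_embeddings("A person is walking forward.", "token_emb_example.npy")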

The code and the generations are not fully tested yet; I will update the README with pretrained models and more information later.

License 📚

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including PyTorch, PyTorch3D, Hugging Face, Hydra, and uses datasets which each have their own respective licenses that must also be followed.


tmr's Issues

Different definitions of face_joint_idx in src/guofeats/skeleton.py and src/guofeats/motion_representation.py

Hi Mathux, thanks for your excellent work on this motion retrieval task, it's very impressive!
I have a question about the face_joint_idx variable.

You commented that on line 77 in this src/guofeats/common/skeleton.py#L82-L86,

"face_joint_idx should follow the order of right hip, left hip, right shoulder, left shoulder"

but on line 82, you wrote l_hip, r_hip, sdr_r, sdr_l = face_joint_idx

However, on line 101 in src/guofeats/motion_representation.py#L239-L242 you just wrote
r_hip, l_hip, sdr_r, sdr_l = face_joint_idx

According to the code
face_joint_idx = [2, 1, 17, 14] in src/guofeats/motion_representation.py#L351-L352 ,
it seems l_hip, r_hip, sdr_r, sdr_l = face_joint_idx in src/guofeats/common/skeleton might be wrong.

I want to know what makes this difference.

Offset joints

Hi,

I have a quick question.
I am wondering why offset joints are needed when you compute guoh3dfeats?

Code lines are in src/guofeats/motion_representation/_get_joints_to_guofeats

Thank you

Negative filter

Hi TMR team.
I want to know when you will release your code. I am quite interested in the negative filter part!
Best wishes
Kangning

Not the same device

In model.py, line 121:

latent_unit_texts = normalize(self(texts)).to(unit_embs.device)
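
For context (an illustrative sketch, not part of the original report): the quoted line moves the normalized text latents onto the device of the motion embeddings, which is the usual fix for this kind of CPU/GPU mismatch:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
motion_embs = torch.randn(4, 256, device=device)    # e.g. precomputed embeddings on GPU
text_latents = torch.randn(4, 256)                  # may start on a different device
text_latents = text_latents.to(motion_embs.device)  # align devices before comparing
scores = text_latents @ motion_embs.T               # now safe: both on the same device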

'models/tmr_humanml3d_guoh3dfeats/latents/humanml3d_all_unit.npy' file missing

After I train and prepare all the data by following the README, when I run python app.py, there is an error:

Traceback (most recent call last):
  File "/app/TMR/app.py", line 244, in <module>
    unit_motion_embs, keyids_index, index_keyids = load_unit_embeddings(
  File "/app/TMR/demo/load.py", line 16, in load_unit_embeddings
    motion_embs = torch.from_numpy(np.load(unit_emb_path)).to(device)
  File "/usr/local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'models/tmr_humanml3d_guoh3dfeats/latents/humanml3d_all_unit.npy'

Need help pls.

Mirrored HumanML3D

Thank you for the great work!
I'm trying to train your code with HumanML3D dataset and I have a simple question.

How did you get the mirrored data from HumanML3D?
I followed raw_pose_processing.ipynb in the repository of HumanML3D from top to bottom, but I only got npy files named with 6-digit numbers in the 'joints' folder.

I confirmed that annotations.json in your repository includes data whose paths look like "M/CMU/80/80_32_poses".
How did you get these mirrored motion files?
Thank you!

The motion encoder and its representation

Hi, Mathux. Thanks for your impressive work on motion retrieval. It could support many works in this field. I was wondering if you could share more about your release plan (e.g., the motion encoder) and the motion representation. Is it similar to the one used in HumanML3D?

In addition, I was wondering if TMR could be used to evaluate motion generation. Specifically, could its motion scores be used like R-Precision or even to assess motion quality? I have some doubts about the accuracy of the motion encoder provided by the HumanML3D dataset, and I believe that TMR may be a more reliable tool for this purpose.

Thank you for your time and expertise!
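
As a hedged illustration of the idea (not an official implementation), TMR text/motion embeddings could be plugged into an R-Precision-style protocol, following the common 32-sample batch convention:

import torch
import torch.nn.functional as F

def r_precision(text_embs, motion_embs, k=3, batch=32):
    # Fraction of queries whose ground-truth motion ranks in the top-k
    # among a batch of 1 positive + 31 negatives.
    text_embs = F.normalize(text_embs, dim=-1)
    motion_embs = F.normalize(motion_embs, dim=-1)
    hits, total = 0, 0
    for i in range(0, len(text_embs) - batch + 1, batch):
        sims = text_embs[i:i + batch] @ motion_embs[i:i + batch].T  # (32, 32)
        topk = sims.topk(k, dim=1).indices
        ground_truth = torch.arange(batch).unsqueeze(1)
        hits += (topk == ground_truth).any(dim=1).sum().item()
        total += batch
    return hits / total

print(r_precision(torch.randn(64, 256), torch.randn(64, 256)))  # random example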

How to reproduce the metrics in the paper?

Hi Mathux,

Thank you for releasing the TMR training code. I ran train.py with the default settings on Guo's dataset, and the contrastive_metrics result I got was quite a bit worse than the contrastive_metrics packaged with the pre-trained model. For example, the following are the "threshold_0.95.yaml" metrics from my newly trained model and from the pre-trained model.

Metric     My newly trained model   Pre-trained model
t2m/R01    11.43                    13.0
m2t/R01    12.27                    12.32
t2m/R02    14.03                    16.15
m2t/R02    14.32                    14.92
t2m/R03    20.0                     21.35
m2t/R03    20.51                    21.35
t2m/R05    25.84                    28.51
m2t/R05    26.05                    27.6
t2m/R10    36.11                    39.07
m2t/R10    35.45                    37.73
t2m/MedR   21.0                     18.0
m2t/MedR   23.5                     20.0
t2m/len    4384.0                   4384.0
m2t/len    4384.0                   4384.0

The pre-trained model's metrics are clearly better than my newly trained model's. Can you advise how I can reproduce the same level of metrics as the pre-trained model? I increased max_epochs from 1000 to 2000 and it still didn't help. Also, I trained the model on a single A6000 GPU with 40GB VRAM. What hardware did you use for the pre-trained model? Are there other hyperparameters I need to change from the default values to achieve similar results to the pre-trained model?

Thanks,
Kc

eval with threshold

Hi @Mathux:
When running the evaluation with a threshold on my own model, I faced the problems below, and I wonder how to fix this issue.
In my setting, the shapes of sims and text_selfsim are both (512, 512), and I set the threshold to 0.8.
Thanks a lot!
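
As an illustrative sketch of the thresholded protocol described in the TMR paper (not the repo's exact code): with a text-to-motion similarity matrix sims and a text-to-text similarity matrix text_selfsim, any motion whose paired text is more similar than the threshold to the query text can be counted as a correct match:

import torch

N = 512
sims = torch.randn(N, N)           # text-to-motion similarities (queries x gallery)
text_selfsim = torch.rand(N, N)    # text-to-text similarities
text_selfsim.fill_diagonal_(1.0)   # each text matches itself
threshold = 0.8

positives = text_selfsim > threshold          # near-duplicate texts also count as correct
ranks = sims.argsort(dim=1, descending=True)  # gallery indices sorted per query
r1 = positives.gather(1, ranks[:, :1]).any(dim=1).float().mean().item()
print(f"Thresholded R@1: {100 * r1:.2f}")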

Missing file when running encode_dataset.py

Hi, thanks for your great work. However, when I run encode_dataset.py, I get an error:

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/annotations/humanml3d/token_embeddings/distilbert-base-uncased.npy'

I do not find this file in the dataset. Am I missing a step?

python encode_dataset.py error

python encode_dataset.py run_dir=/data/home/wangyiming/Motion/TMR/models/tmr_humanml3d_guoh3dfeats

Error executing job with overrides: ['run_dir=/data/home/wangyiming/Motion/TMR/models/tmr_humanml3d_guoh3dfeats']
Error in call to target 'src.data.text_motion.TextMotionDataset':
FileNotFoundError(2, 'No such file or directory')

Motion Rendering

Hello, thanks for the great work.
I'd like to ask how to render the 3D human motion as in your paper, i.e., the mesh version instead of only the skeleton?

Paper Question

Thank you for the amazing project.

In Figure 3 of the paper, you say "we provide sample qualitative results for text-to-motion retrieval on the full test set of H3D."
If Rank 1 is 000000.npy and Rank 2 is M000000.npy, do you mean they count together as Rank 1?

I'm asking because you said you used the full test set, but the retrieved motions don't include the symmetrical (mirrored) motion.
