alvinliu0 / ha2g


[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

Home Page: https://alvinliu0.github.io/projects/HA2G

License: GNU General Public License v3.0

Language: Python 100.00%
Topics: audio-visual-learning, co-speech-gesture, cvpr2022

ha2g's People

Contributors

alvinliu0, qianyiwu

ha2g's Issues

About visualizing proj_joints on original images

Hi, thanks for your outstanding work!
However, I have some questions about the visualization of the proj_joints extracted from ExPose.

mm = clip['3d'][i]['focal_length_in_mm']   # camera focal length in millimetres
px = clip['3d'][i]['focal_length_in_px']   # camera focal length in pixels
proj_joints = proj_joints * px * 10 / mm   # scale by the pixel/millimetre focal-length ratio, plus a factor of 10
proj_joints += center                      # shift into image coordinates around the centre

This is your code for visualizing proj_joints on the original images. I am confused by it: why does proj_joints * px need to be multiplied by 10? Does this 10 have a concrete meaning, or can you offer some reference for this part? Thanks!
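
For reference, the block below is a minimal sketch of the textbook pinhole relation between a focal length in millimetres and one in pixels, and of projecting camera-space joints onto the image plane. It is not necessarily the convention used in this repo (the sensor width and the perspective division are assumptions), and it does not account for the factor of 10 asked about above.

import numpy as np

def mm_focal_to_px(focal_mm, image_width_px, sensor_width_mm):
    # standard pinhole conversion: focal length in pixels for a sensor of the given width
    return focal_mm * image_width_px / sensor_width_mm

def project(joints_3d, focal_px, center_xy):
    # perspective projection of (N, 3) camera-space joints onto the image plane
    joints_3d = np.asarray(joints_3d, dtype=float)
    xy = joints_3d[:, :2] / joints_3d[:, 2:3]      # divide by depth
    return focal_px * xy + np.asarray(center_xy)   # shift to the image centre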

quantitative results in the paper

In the repo, evaluation seems to run on the val split: in ted_gesture_original.log the best FGD is 3.072, and I didn't find code that evaluates on the test split.
Which split's results (val or test) are reported in the paper?

Accessing the pre-built TED Gesture dataset

Hello, nice work on your research!
I just want to access the pre-built TED Gesture dataset.
Which library did you use to read the mdb files in Python?
I get an error with mdb_parser like this:

mdb_read_table: Page 2 [size=4096] is not a valid table definition page (First byte = 0xDA, expected 0x02)
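
In case it helps, below is a minimal sketch of reading the data with the lmdb and pyarrow packages rather than an MS Access mdb parser; the path and the zero-padded key format are illustrative assumptions, not the repo's documented interface.

import lmdb
import pyarrow as pa

# open the pre-built dataset directory as an LMDB environment (path is illustrative)
env = lmdb.open('ted_dataset/lmdb_train', readonly=True, lock=False)
with env.begin() as txn:
    key = '{:010d}'.format(0).encode('ascii')  # hypothetical zero-padded integer key
    buf = txn.get(key)
    if buf is not None:
        sample = pa.deserialize(buf)  # needs a pyarrow version that still provides deserialize
        print(type(sample))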

Questions about making datasets

Hello, thank you very much for your contribution. I have a doubt that I hope you can clear up: when building the dataset, I found that the keypoints you use have shape (23, 3), but the skeleton keypoint dimension in your dataset is (43, 3), which seems a little inconsistent.

Animation with human mesh?

This is really great work! I got train_feature_extractor_expressive.py to run, and it produced the following video:

yq3TQoMjXTw_998_2_000_0.mp4

It looks good, but the human motion is rendered as a line-based skeleton:
[screenshot: line drawing]

I'd like to show an animation on a human mesh, like the following screenshot from your demo video:
[screenshot: human mesh]

I was wondering, for your demo video, what software did you use to render it as an "animated mesh" instead of an "animated line drawing"?

mean_pose and mean_dir_vec

Hello, you've done a good job, but when I try to build my own dataset and train on it, I can't find any code for calculating these two parameters. In calc_mean, it seems you haven't finished/uploaded the right file. Will you update this? Thank you very much.
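
In case it is useful in the meantime, here is a minimal sketch of how such statistics could be computed, assuming poses come as (n_frames, n_joints, 3) arrays and that skeleton_pairs is a hypothetical list of (parent, child) joint indices; the repo's actual joint order and bone list may differ.

import numpy as np

def calc_mean_pose_and_dir_vec(pose_list, skeleton_pairs):
    poses = np.concatenate(pose_list, axis=0)        # (total_frames, n_joints, 3)
    mean_pose = poses.mean(axis=0)                   # average joint positions

    dir_vecs = []
    for parent, child in skeleton_pairs:
        vec = poses[:, child] - poses[:, parent]     # bone vector per frame
        vec = vec / (np.linalg.norm(vec, axis=1, keepdims=True) + 1e-8)  # normalize to unit length
        dir_vecs.append(vec)
    mean_dir_vec = np.stack(dir_vecs, axis=1).mean(axis=0)  # (n_bones, 3)
    return mean_pose.flatten(), mean_dir_vec.flatten()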

code

waiting for the code:)

Data Reader

I encountered some bugs when reading the preprocessed data.
Traceback (most recent call last):
File "test.py", line 14, in
value = pa.deserialize(buf)
File "pyarrow/serialization.pxi", line 550, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 555, in pyarrow.lib._deserialize
File "pyarrow/serialization.pxi", line 461, in pyarrow.lib._read_serialized
File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected IPC message of type sparse tensor but got tensor

Here is my code. My pyarrow==8.0.0. Could you provide the data reader or some instructions?

# env is an LMDB environment opened earlier, e.g. lmdb.open(<dataset path>, readonly=True)
txn = env.begin()
key = '%010d' % 0                     # zero-padded integer key of the first sample
buf = txn.get(key.encode('ascii'))    # raw bytes stored under that key
value = pyarrow.deserialize(buf)      # fails here with the error above

ValueError: not enough values to unpack (expected 6, got 3)

Hello,

I am trying to run the train_expressive.py (HA2G model) file for the TED Expressive Dataset. However, I am encountering the following error.

Traceback (most recent call last):
File "scripts/train_expressive.py", line 900, in
main({'args': _args})
File "scripts/train_expressive.py", line 895, in main
pose_dim=pose_dim, speaker_model=train_dataset.speaker_model)
File "scripts/train_expressive.py", line 237, in train_epochs
val_metrics = evaluate_testset(test_data_loader, generator, g1, g2, g3, g4, g5, g6, audio_encoder, loss_fn, embed_space_evaluator, args)
File "scripts/train_expressive.py", line 418, in evaluate_testset
for iter_idx, data in enumerate(test_data_loader, 0):
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 652, in next
data = self._next_data()
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/fs03/me12/HA2G/scripts/data_loader/lmdb_data_loader_expressive.py", line 119, in getitem
word_seq, pose_seq, vec_seq, audio, spectrogram, aux_info = sample
ValueError: not enough values to unpack (expected 6, got 3)

I am using the downloadable TED Expressive dataset from the given link in the repo.

Also, the error happens after calling the pyarrow.deserialize function in the __getitem__ method of the lmdb_data_loader_expressive.py file.
Any help would be highly appreciated.
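
One way to narrow this down is to inspect what the cached LMDB actually stores. The snippet below is a hypothetical diagnostic (the path is illustrative) that prints how many fields each serialized sample has, i.e. whether it matches the 6-tuple the Expressive loader unpacks or the 3 values the error reports.

import lmdb
import pyarrow

# point this at the cache LMDB that the data loader reads from (path is illustrative)
env = lmdb.open('path/to/ted_expressive_dataset/train_cache', readonly=True, lock=False)
with env.begin() as txn:
    for key, buf in txn.cursor():
        sample = pyarrow.deserialize(buf)
        print(key, len(sample))  # the Expressive loader expects 6 fields per sample
        break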

About real-time performance

Hello!
I would like to ask about the real-time performance of this project: for example, for 10 seconds of speech, how long does it take to generate the result?

Output conversion

Hello,
I notice that the output physically represents direction vectors with the spine (or pelvis) as the origin, which is not directly compatible with existing rendering engines such as Blender and UE5. Is there a way to convert the output to an axis-angle or rotation-matrix representation that UE5 can make use of?
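
Not the repo's own tooling, but below is a minimal sketch of one standard way to turn a generated bone direction into a rotation that engines such as Blender or UE5 accept, assuming you know each bone's rest-pose direction (the function name and the scipy dependency are my own choices). Note that this only recovers the bone's swing, not its twist about its own axis.

import numpy as np
from scipy.spatial.transform import Rotation as R

def dir_vec_to_rotation(rest_dir, target_dir):
    # rotation taking the rest-pose bone direction onto the generated direction vector
    a = rest_dir / np.linalg.norm(rest_dir)
    b = target_dir / np.linalg.norm(target_dir)
    axis = np.cross(a, b)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:
        # parallel: identity; anti-parallel: 180-degree turn about any axis perpendicular to a
        if np.dot(a, b) > 0:
            return R.identity()
        perp = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(perp) < 1e-8:
            perp = np.cross(a, [0.0, 1.0, 0.0])
        return R.from_rotvec(np.pi * perp / np.linalg.norm(perp))
    angle = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return R.from_rotvec(axis / norm * angle)

rot = dir_vec_to_rotation(np.array([0.0, 1.0, 0.0]), np.array([0.3, 0.9, 0.1]))
print(rot.as_matrix())   # 3x3 rotation matrix
print(rot.as_rotvec())   # axis-angle (unit axis scaled by angle in radians)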

File "/home/ubuntu/HA2G/scripts/model/hierarchy_net.py", line 129, in forward

File "/home/ubuntu/HA2G/scripts/model/hierarchy_net.py", line 129, in forward
in_data = torch.cat((pre_seq, audio_feat_seq, text_feat_seq), dim=2)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 4 but got size 34 for tensor number 1 in the list.

I encountered the above error when using HA2G. How can I solve it?
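
Without knowing the exact inputs it is hard to say, but a hypothetical first step is to print the three tensors' shapes right before the failing concatenation, to see which stream (pose seed, audio features, or text features) has the unexpected time dimension:

print(pre_seq.shape, audio_feat_seq.shape, text_feat_seq.shape)  # all dims except dim 2 must match
in_data = torch.cat((pre_seq, audio_feat_seq, text_feat_seq), dim=2)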

Driving 2D characters in a picture with speech?

It's an amazing project!
I want to use speech to drive a person in a picture to make gestures, but after reading the paper and the project, I still don't have a good idea how to do it.

Might I get some pointers from you in your spare time?

Thanks very much!

TED Expressive dataset

Are you going to release the TED Expressive dataset?
It will be helpful for further research.

Code

What wonderful work! Could you release the code as soon as possible?

How to give only audio as input?

Thank you for your great work! The paper mentioned that the model can use speech audio as guidance without text. Where could I find the related code? Thanks in advance!
