alvinliu0 / ha2g


[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

Home Page: https://alvinliu0.github.io/projects/HA2G

License: GNU General Public License v3.0

Language: Python 100.00%
Topics: audio-visual-learning, co-speech-gesture, cvpr2022

ha2g's People

Contributors

alvinliu0, qianyiwu

ha2g's Issues

About visualizing proj_joints on original images

Hi, thanks for your outstanding work!
However, I have some questions about the visualization of the proj_joints extracted from ExPose.

mm = clip['3d'][i]['focal_length_in_mm']   # camera focal length in millimetres
px = clip['3d'][i]['focal_length_in_px']   # camera focal length in pixels
proj_joints = proj_joints * px * 10 / mm   # scale by the pixel/millimetre focal-length ratio, plus a factor of 10
proj_joints += center                      # shift into image coordinates around the centre

This is your code for visualizing proj_joints on the original images. I am confused by it: why does proj_joints * px need to be multiplied by 10? Does this 10 have a concrete meaning, or can you offer some reference for this part? Thanks!
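
For reference, the block below is a minimal sketch of the textbook pinhole relation between a focal length in millimetres and one in pixels, and of projecting camera-space joints onto the image plane. It is not necessarily the convention used in this repo (the sensor width and the perspective division are assumptions), and it does not account for the factor of 10 asked about above.

import numpy as np

def mm_focal_to_px(focal_mm, image_width_px, sensor_width_mm):
    # standard pinhole conversion: focal length in pixels for a sensor of the given width
    return focal_mm * image_width_px / sensor_width_mm

def project(joints_3d, focal_px, center_xy):
    # perspective projection of (N, 3) camera-space joints onto the image plane
    joints_3d = np.asarray(joints_3d, dtype=float)
    xy = joints_3d[:, :2] / joints_3d[:, 2:3]      # divide by depth
    return focal_px * xy + np.asarray(center_xy)   # shift to the image centre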

quantitative results in the paper

In the repo, evaluation seems to run on the val split: in ted_gesture_original.log the best FGD is 3.072, and I didn't find code that evaluates on the test split.
Which split's results (val or test) are reported in the paper?

Accessing the pre-built TED Gesture dataset

Hello, nice work on your research!
I just want to access the pre-built TED Gesture dataset.
Which library did you use to read the mdb files in Python?
I get an error with mdb_parser like this:

mdb_read_table: Page 2 [size=4096] is not a valid table definition page (First byte = 0xDA, expected 0x02)
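
In case it helps, below is a minimal sketch of reading the data with the lmdb and pyarrow packages rather than an MS Access mdb parser; the path and the zero-padded key format are illustrative assumptions, not the repo's documented interface.

import lmdb
import pyarrow as pa

# open the pre-built dataset directory as an LMDB environment (path is illustrative)
env = lmdb.open('ted_dataset/lmdb_train', readonly=True, lock=False)
with env.begin() as txn:
    key = '{:010d}'.format(0).encode('ascii')  # hypothetical zero-padded integer key
    buf = txn.get(key)
    if buf is not None:
        sample = pa.deserialize(buf)  # needs a pyarrow version that still provides deserialize
        print(type(sample))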

Questions about making datasets

Hello, thank you very much for your contribution. I have a doubt that I hope you can clear up: when building the dataset, I found that the keypoints you use have shape (23, 3), but the skeleton keypoint dimension in your dataset is (43, 3), which seems a little inconsistent.

Animation with human mesh?

This is really great work! I got train_feature_extractor_expressive.py to run, and it produced the following video:

yq3TQoMjXTw_998_2_000_0.mp4

It looks good, but the human motion is rendered as a line-based skeleton:
[screenshot: line drawing]

I'd like to show an animation on a human mesh, like the following screenshot from your demo video:
[screenshot: human mesh]

I was wondering, for your demo video, what software did you use to render it as an "animated mesh" instead of an "animated line drawing"?

mean_pose and mean_dir_vec

Hello, you've done a good job, but when I try to build my own dataset and train on it, I can't find any code for calculating these two parameters. In calc_mean, it seems you haven't finished/uploaded the right file. Will you update this? Thank you very much.
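
In case it is useful in the meantime, here is a minimal sketch of how such statistics could be computed, assuming poses come as (n_frames, n_joints, 3) arrays and that skeleton_pairs is a hypothetical list of (parent, child) joint indices; the repo's actual joint order and bone list may differ.

import numpy as np

def calc_mean_pose_and_dir_vec(pose_list, skeleton_pairs):
    poses = np.concatenate(pose_list, axis=0)        # (total_frames, n_joints, 3)
    mean_pose = poses.mean(axis=0)                   # average joint positions

    dir_vecs = []
    for parent, child in skeleton_pairs:
        vec = poses[:, child] - poses[:, parent]     # bone vector per frame
        vec = vec / (np.linalg.norm(vec, axis=1, keepdims=True) + 1e-8)  # normalize to unit length
        dir_vecs.append(vec)
    mean_dir_vec = np.stack(dir_vecs, axis=1).mean(axis=0)  # (n_bones, 3)
    return mean_pose.flatten(), mean_dir_vec.flatten()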

code

waiting for the code:)

Data Reader

I encountered some bugs when reading the preprocessed data.
Traceback (most recent call last):
File "test.py", line 14, in
value = pa.deserialize(buf)
File "pyarrow/serialization.pxi", line 550, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 555, in pyarrow.lib._deserialize
File "pyarrow/serialization.pxi", line 461, in pyarrow.lib._read_serialized
File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected IPC message of type sparse tensor but got tensor

Here is my code. My pyarrow==8.0.0. Could you provide the data reader or some instructions?

# env is an LMDB environment opened earlier, e.g. lmdb.open(<dataset path>, readonly=True)
txn = env.begin()
key = '%010d' % 0                     # zero-padded integer key of the first sample
buf = txn.get(key.encode('ascii'))    # raw bytes stored under that key
value = pyarrow.deserialize(buf)      # fails here with the error above

ValueError: not enough values to unpack (expected 6, got 3)

Hello,

I am trying to run the train_expressive.py (HA2G model) file for the TED Expressive Dataset. However, I am encountering the following error.

Traceback (most recent call last):
File "scripts/train_expressive.py", line 900, in
main({'args': _args})
File "scripts/train_expressive.py", line 895, in main
pose_dim=pose_dim, speaker_model=train_dataset.speaker_model)
File "scripts/train_expressive.py", line 237, in train_epochs
val_metrics = evaluate_testset(test_data_loader, generator, g1, g2, g3, g4, g5, g6, audio_encoder, loss_fn, embed_space_evaluator, args)
File "scripts/train_expressive.py", line 418, in evaluate_testset
for iter_idx, data in enumerate(test_data_loader, 0):
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 652, in next
data = self._next_data()
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/scratch/me12/kaustubk/miniconda/envs/HAG/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/fs03/me12/HA2G/scripts/data_loader/lmdb_data_loader_expressive.py", line 119, in getitem
word_seq, pose_seq, vec_seq, audio, spectrogram, aux_info = sample
ValueError: not enough values to unpack (expected 6, got 3)

I am using the downloadable TED Expressive dataset from the given link in the repo.

Also, the error happens after calling the pyarrow.deserialize function in the __getitem__ method of the lmdb_data_loader_expressive.py file.
Any help would be highly appreciated.
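
One way to narrow this down is to inspect what the cached LMDB actually stores. The snippet below is a hypothetical diagnostic (the path is illustrative) that prints how many fields each serialized sample has, i.e. whether it matches the 6-tuple the Expressive loader unpacks or the 3 values the error reports.

import lmdb
import pyarrow

# point this at the cache LMDB that the data loader reads from (path is illustrative)
env = lmdb.open('path/to/ted_expressive_dataset/train_cache', readonly=True, lock=False)
with env.begin() as txn:
    for key, buf in txn.cursor():
        sample = pyarrow.deserialize(buf)
        print(key, len(sample))  # the Expressive loader expects 6 fields per sample
        break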

About real-time performance

Hello!
I would like to ask about the real-time performance of this project: for example, for 10 seconds of speech, how long does it take to generate the result?

Output conversion

Hello,
I notice that the output physically represents direction vectors with the spine (or pelvis) as the origin, which is not directly compatible with existing rendering engines such as Blender and UE5. Is there a way to convert the output to an axis-angle or rotation-matrix representation that UE5 can make use of?
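
Not the repo's own tooling, but below is a minimal sketch of one standard way to turn a generated bone direction into a rotation that engines such as Blender or UE5 accept, assuming you know each bone's rest-pose direction (the function name and the scipy dependency are my own choices). Note that this only recovers the bone's swing, not its twist about its own axis.

import numpy as np
from scipy.spatial.transform import Rotation as R

def dir_vec_to_rotation(rest_dir, target_dir):
    # rotation taking the rest-pose bone direction onto the generated direction vector
    a = rest_dir / np.linalg.norm(rest_dir)
    b = target_dir / np.linalg.norm(target_dir)
    axis = np.cross(a, b)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:
        # parallel: identity; anti-parallel: 180-degree turn about any axis perpendicular to a
        if np.dot(a, b) > 0:
            return R.identity()
        perp = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(perp) < 1e-8:
            perp = np.cross(a, [0.0, 1.0, 0.0])
        return R.from_rotvec(np.pi * perp / np.linalg.norm(perp))
    angle = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return R.from_rotvec(axis / norm * angle)

rot = dir_vec_to_rotation(np.array([0.0, 1.0, 0.0]), np.array([0.3, 0.9, 0.1]))
print(rot.as_matrix())   # 3x3 rotation matrix
print(rot.as_rotvec())   # axis-angle (unit axis scaled by angle in radians)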

File "/home/ubuntu/HA2G/scripts/model/hierarchy_net.py", line 129, in forward

File "/home/ubuntu/HA2G/scripts/model/hierarchy_net.py", line 129, in forward
in_data = torch.cat((pre_seq, audio_feat_seq, text_feat_seq), dim=2)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 4 but got size 34 for tensor number 1 in the list.

I encountered the above error when using HA2G. How can I solve it?
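
Without knowing the exact inputs it is hard to say, but a hypothetical first step is to print the three tensors' shapes right before the failing concatenation, to see which stream (pose seed, audio features, or text features) has the unexpected time dimension:

print(pre_seq.shape, audio_feat_seq.shape, text_feat_seq.shape)  # all dims except dim 2 must match
in_data = torch.cat((pre_seq, audio_feat_seq, text_feat_seq), dim=2)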

Driving 2D characters in a picture with speech?

It's an amazing project!
I want to use speech to drive a person in a picture to make gestures, but after reading the paper and the project, I still don't have a good idea how to do it.

Might I get some pointers from you in your spare time?

Thanks very much!

TED Expressive dataset

Are you going to release the TED Expressive dataset?
It will be helpful for further research.

Code

What wonderful work! Could you release the code as soon as possible?

How to give only audio as input?

Thank you for your great work! The paper mentioned that the model can use speech audio as guidance without text. Where could I find the related code? Thanks in advance!
