fudan-generative-vision / champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Home Page: https://fudan-generative-vision.github.io/champ/

License: Apache License 2.0

Python 100.00%
human-animation video-generation image-animation

champ's Introduction

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

1Nanjing University 2Fudan University 3Alibaba Group
*Equal Contribution +Corresponding Author
head.mp4

Framework

framework

News

  • 2024/05/05: 🎉🎉🎉Sample training data on HuggingFace released.

  • 2024/05/02: 🌟🌟🌟Training source code released #99.

  • 2024/04/28: 👏👏👏Method for smoothing SMPLs in Blender released #96.

  • 2024/04/26: 🚁Great Blender add-on CEB Studios for various SMPL processing!

  • 2024/04/12: ✨✨✨SMPL & Rendering scripts released! Champ your dance videos now💃🤸‍♂️🕺. See docs.

  • 2024/03/30: 🚀🚀🚀Amazing ComfyUI Wrapper by the community. Here is the video tutorial. Thanks to @kijai🥳

  • 2024/03/27: Cool Demo on replicate🌟. Thanks to @camenduru👏

  • 2024/03/27: Visit our roadmap🕒 to preview the future of Champ.

Installation

  • System requirements: Ubuntu 20.04 / Windows 11, CUDA 12.1
  • Tested GPUs: A100, RTX3090

Create conda environment:

  conda create -n champ python=3.10
  conda activate champ

Install packages with pip

  pip install -r requirements.txt

Install packages with poetry

If you want to run this project on a Windows device, we strongly recommend using poetry.

poetry install --no-root

Inference

The inference entrypoint script is ${PROJECT_ROOT}/inference.py. Before testing your own cases, the following preparations need to be completed:

  1. Download all required pretrained models.
  2. Prepare your guidance motions.
  3. Run inference.

Download pretrained models

You can easily get all pretrained models required for inference from our HuggingFace repo.

Clone the pretrained models into the ${PROJECT_ROOT}/pretrained_models directory with the command below:

git lfs install
git clone https://huggingface.co/fudan-generative-ai/champ pretrained_models

Or you can download them separately from their source repos:

  • Champ ckpts: Consists of the denoising UNet, guidance encoders, reference UNet, and motion module.
  • StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
  • sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
  • image_encoder: Fine-tuned from CompVis/stable-diffusion-v1-4-original to accept CLIP image embedding rather than text embeddings. (Thanks to lambdalabs)

Finally, these pretrained models should be organized as follows:

./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
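
If inference complains about missing weights, you can sanity-check the layout with a small script like the one below. This is only a minimal sketch, not part of the repo; it simply verifies that the main files listed above exist under ./pretrained_models:

  from pathlib import Path

  # File names taken from the directory layout above.
  required = [
      "champ/denoising_unet.pth",
      "champ/guidance_encoder_depth.pth",
      "champ/guidance_encoder_dwpose.pth",
      "champ/guidance_encoder_normal.pth",
      "champ/guidance_encoder_semantic_map.pth",
      "champ/reference_unet.pth",
      "champ/motion_module.pth",
      "image_encoder/pytorch_model.bin",
      "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
      "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
  ]

  root = Path("pretrained_models")
  missing = [p for p in required if not (root / p).exists()]
  print("All pretrained models found." if not missing else f"Missing: {missing}")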

Prepare your guidance motions

Guidance motion data, produced via SMPL & Rendering, is required for inference.

You can download our pre-rendered samples from our HuggingFace repo and place them into the ${PROJECT_ROOT}/example_data directory:

git lfs install
git clone https://huggingface.co/datasets/fudan-generative-ai/champ_motions_example example_data

Or you can follow the SMPL & Rendering doc to produce your own motion data.

Finally, the ${PROJECT_ROOT}/example_data directory should look like this:

./example_data/
|-- motions/  # Directory includes motions per subfolder
|   |-- motion-01/  # A motion sample
|   |   |-- depth/  # Depth frame sequence
|   |   |-- dwpose/ # DWPose frame sequence
|   |   |-- mask/   # Mask frame sequence
|   |   |-- normal/ # Normal map frame sequence
|   |   `-- semantic_map/ # Semantic map frame sequence
|   |-- motion-02/
|   |   |-- ...
|   |   `-- ...
|   `-- motion-N/
|       |-- ...
|       `-- ...
`-- ref_images/ # Reference image samples (optional)
    |-- ref-01.png
    |-- ...
    `-- ref-N.png

Run inference

Now all the prepared models and motions are in ${PROJECT_ROOT}/pretrained_models and ${PROJECT_ROOT}/example_data respectively.

Here is the command for inference:

  python inference.py --config configs/inference/inference.yaml

If using poetry, the command is:

poetry run python inference.py --config configs/inference/inference.yaml

Animation results will be saved in ${PROJECT_ROOT}/results folder. You can change the reference image or the guidance motion by modifying inference.yaml.

The default motion-02 in inference.yaml has about 250 frames and requires ~20GB of VRAM.

Note: If your VRAM is insufficient, you can switch to a shorter motion sequence or cut out a segment from a long sequence. We provide a frame range selector in inference.yaml, which you can set to a list [min_frame_index, max_frame_index] to conveniently cut out a segment from the sequence.
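
If you prefer to trim the motion files on disk instead of editing the config, the sketch below copies a frame segment into a new motion folder. This is only an illustrative helper, not code from the repo; it assumes the example_data layout shown above with PNG frames, and the motion names and indices are placeholders:

  import shutil
  from pathlib import Path

  def cut_motion_segment(src, dst, start, end,
                         guidance_types=("depth", "dwpose", "mask", "normal", "semantic_map")):
      # Copy frames with index in [start, end) from each guidance subfolder of src into dst.
      for g in guidance_types:
          frames = sorted((Path(src) / g).glob("*.png"))[start:end]
          out_dir = Path(dst) / g
          out_dir.mkdir(parents=True, exist_ok=True)
          for frame in frames:
              shutil.copy(frame, out_dir / frame.name)

  # e.g. keep the first 100 frames of motion-02 to reduce VRAM usage
  cut_motion_segment("example_data/motions/motion-02",
                     "example_data/motions/motion-02-short", 0, 100)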

Train the Model

The training process consists of two distinct stages. For more information, refer to the Training Section in the paper on arXiv.

Prepare Datasets

Prepare your own training videos with human motion (or use our sample training data on HuggingFace) and modify the data.video_folder value in the training config yaml.

All training videos need to be processed into SMPL & DWPose format. Refer to the Data Process doc.

The directory structure will be like this:

/training_data/
|-- video01/          # Frame data for one training video
|   |-- depth/        # Depth frame sequence
|   |-- dwpose/       # DWPose frame sequence
|   |-- mask/         # Mask frame sequence
|   |-- normal/       # Normal map frame sequence
|   `-- semantic_map/ # Semantic map frame sequence
|-- video02/
|   |-- ...
|   `-- ...
`-- videoN/
    |-- ...
    `-- ...

Select another small batch of data as the validation set, and modify the validation.ref_images and validation.guidance_folders paths in the training config yaml.
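
Before launching training, it can also help to check that every video folder contains all five guidance subfolders with matching frame counts. The following is a minimal sketch (not part of the training scripts), assuming the layout above with PNG frames:

  from pathlib import Path

  GUIDANCE_TYPES = ("depth", "dwpose", "mask", "normal", "semantic_map")

  def check_training_data(video_folder):
      # Report any video whose guidance folders have mismatched frame counts.
      for video_dir in sorted(Path(video_folder).iterdir()):
          if not video_dir.is_dir():
              continue
          counts = {g: len(list((video_dir / g).glob("*.png"))) for g in GUIDANCE_TYPES}
          if len(set(counts.values())) != 1:
              print(f"{video_dir.name}: inconsistent frame counts -> {counts}")

  check_training_data("/training_data/")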

Run Training Scripts

To train the Champ model, use the following command:

# Run training script of stage1
accelerate launch train_s1.py --config configs/train/stage1.yaml

# Modify the `stage1_ckpt_dir` value in yaml and run training script of stage2
accelerate launch train_s2.py --config configs/train/stage2.yaml

Datasets

Type       HuggingFace                    ETA
Inference  SMPL motion samples            Thu Apr 18 2024
Training   Sample datasets for Training   Sun May 05 2024

Roadmap

Status  Milestone                                                    ETA
        Inference source code meets everyone on GitHub for the first time  Sun Mar 24 2024
        Model and test data on HuggingFace                           Tue Mar 26 2024
        Optimize dependencies to run well on Windows                 Sun Mar 31 2024
        Data preprocessing code release                              Fri Apr 12 2024
        Training code release                                        Thu May 02 2024
        Sample of training data release on HuggingFace               Sun May 05 2024
        Smoothing SMPL motion                                        Sun Apr 28 2024
🚀🚀🚀  Gradio demo on HuggingFace                                   TBD

Citation

If you find our work useful for your research, please consider citing the paper:

@misc{zhu2024champ,
      title={Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
      author={Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
      year={2024},
      eprint={2403.14781},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Opportunities available

Multiple research positions are open at the Generative Vision Lab, Fudan University! These include:

  • Research assistant
  • Postdoctoral researcher
  • PhD candidate
  • Master's students

Interested individuals are encouraged to contact us at [email protected] for further information.

champ's People

Contributors

aricgamma, leoooo333, shenhaozhu, subazinga


champ's Issues

Parametric shape alignment

Hi, thanks for sharing this work -- it looks great!
The paper mentions parametric shape alignment, which sounds intriguing. However, I'm not seeing any reference to those models in the codebase. Do you plan to release the inference code/models for that as well?

How to obtain normal map

Following your previous advice, I have tried SMPLer-X and Depth Anything to obtain the 3D human body and the corresponding depth image. I also have a question about how to obtain the normal map. Could you please tell me which model you use?

Thank you!

CUDA out of memory, feature request

This looks really wonderful. Many thanks for sharing with the community.
I almost gave up after looking at the system and graphics card requirements, but thought there was no harm in trying.

So I tested on Windows 11 + RTX 4060 8 GB VRAM + 16 GB RAM and it worked.
The change I made was to keep only 20 frames inside the motion and delete the remaining frames from all folders within the motion. [EDIT] Tried with 40 motion frames and it worked without complaining. :)

  1. Requesting a batch-processing feature, with the number of frames specifiable in configs/inference.yaml.
  2. Correct the system requirements to also include Windows 11 along with Ubuntu 20.04.

Here is the output
https://github.com/fudan-generative-vision/champ/assets/2102186/b3fc3b93-a5cc-4ef7-94d3-22cd4e5ed9f3

the code of computing PSNR in the Disco repository is wrong

The DisCo code for computing PSNR, linked below, is wrong:
https://github.com/Wangt-CN/DisCo/blob/8538889c9ee9edd8dd43ffee182d1a91ce7a9828/tool/metrics/ssim_l1_lpips_psnr.py#L13.

image

As pointed out in Wangt-CN/DisCo#86, the correct code is mse = np.mean((original/1.0 - compressed/1.0) ** 2) instead of mse = np.mean((original - compressed) ** 2), because original and compressed images are uint8 in their code, and (original - compressed) ** 2 will cause numerical overflow.

If you used their PSNR evaluation code, please update your results.
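
For reference, a minimal corrected PSNR sketch (assuming uint8 image arrays, as in the DisCo code) looks like this:

  import numpy as np

  def psnr(original, compressed, max_pixel=255.0):
      # Cast to float before subtracting; subtracting uint8 arrays directly
      # wraps around and corrupts the MSE, which is the bug described above.
      mse = np.mean((original.astype(np.float64) - compressed.astype(np.float64)) ** 2)
      if mse == 0:
          return float("inf")
      return 20 * np.log10(max_pixel / np.sqrt(mse))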

Commercial usage?

Are the checkpoints available for commercial usage?
Thank you for your work!

What model is to be used for extraction of the semantic segmentation map?

Thank you for your awesome work released in open source! I really appreciate the impact this paper and code will bring to the community.

I would like to test this model by using in-the-wild video which requires preprocessing.
I am planning to use the following:

  • Depth: Depth Anything with greyscale
  • Dwpose: official dwpose repository
  • Normal: ICON normal map
  • mask: unsure whether it is necessary as "inference.yaml" does not require mask for its guidance_types by default

Meanwhile, I am unsure which model I should use for the semantic segmentation map. Please point me to any model that is suitable for the data preprocessing stage.

Video flickers severely on my own data

Hi, thank you very much for your great work. It's really awesome!

I successfully ran the entire project on my own dataset, but the generated results seem to flicker much more severely compared to the example data. Is there a way to stabilize the results like the examples do?
The tools I used are as follows:

0x1. To ensure the stability of the running results, I deliberately resized the ref_image and motion_data to the same dimensions. The ref_image was regenerated based on the pose from the motion data.
0x2. I obtained complete motion data based on the project at https://github.com/kijai/ComfyUI-champWrapper, and then imported the data into CHAMP to run the process:
1.1 DSINE Normal Map to obtain normal data
1.2 DWpose Estimator to obtain dwpose data
1.3 Depth Anything to obtain depth data
1.4 DensePose Estimator to obtain semantic_map data
0x3. The motion data and ref image I used are attached.
Thank you very much.

data.zip

2 Issues regarding example data

  1. In example_data/motions/motion-0X:
    There are extra output.mp4 files which should not be in the dataset.
    This causes an error at 'inference.py' line 78, where the video file is opened with the PIL.Image module.
    I fixed this locally by adding the following lines after line 73:

    try:
        Image.open(guidance_image_path).convert("RGB")
    except Exception:
        continue

  2. In example_data/motions/motion-07:
    There is an extra 0389_all.png in motion-07/semantic_map, while depth, dwpose, mask, and normal do not have 0389_all.png.
    This causes an assertion error at 'inference.py' line 87.
    I manually fixed it by deleting the file in my folder, but I would be grateful if you fixed it by adding code that automatically skips an image when it does not have a match in the other guidance folders (a rough sketch of what I mean is below).
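
A minimal sketch of such a skip (an illustrative helper assuming the guidance subfolders sit under one motion directory as in example_data, not code from the repo):

    from pathlib import Path

    def common_frames(motion_dir, guidance_types=("depth", "dwpose", "mask", "normal", "semantic_map")):
        # Keep only frame names present in every guidance subfolder, so a stray
        # file in one folder (e.g. an extra semantic_map frame) is skipped.
        frame_sets = [
            {p.name for p in (Path(motion_dir) / g).iterdir() if p.suffix.lower() == ".png"}
            for g in guidance_types
        ]
        return sorted(set.intersection(*frame_sets))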

I cannot find the file "smpl_rendering.blend"

Hello,
Thank you for your great work, it is solid and meaningful.
I am interested in your four conditions (depth, normal, seg_map, pose) rendered from the SMPL model; this is also meaningful for my current work.
However, I followed your old version of the code (https://github.com/Leoooo333/champ/tree/master?tab=readme-ov-file). When processing the SMPL section, I finished Fit SMPL and Transfer SMPL successfully.
THE QUESTION is about the command "blender smpl_rendering.blend --background --python rendering.py --driving_path test_smpl/transfer_result/smpl_results --reference_path test_smpl/reference_imgs/images/ref.png" in the RENDERING section: I cannot find the file smpl_rendering.blend in the code.
I am confused, how can I find this file to finish my rendering? Thank you!

VRAM required?

Thanks for the great work! What is the minimum VRAM required?

ComfyUI version: size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

When I use the ComfyUI version, I get a tensor error. I really don't know what is going on; can someone help me? The workflow is below.

Error occurred when executing champ_sampler:

The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

ERROR:root:Traceback (most recent call last):
File "/root/ComfyUI/execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "/root/ComfyUI/execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/root/ComfyUI/execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 418, in process
result_video_tensor = inference(
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 471, in inference
video = pipeline(
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/pipelines/pipeline_aggregation.py", line 550, in call
pred = self.denoising_unet(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/models/unet_3d.py", line 484, in forward
sample = sample + guidance_fea
RuntimeError: The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4
Champ_replace_person_01 (1).json

How to get depth images

Wonderful Work!!
I really appreciate you releasing your model and testing code. I have a question about the depth images. How do you get them?
Thank you!

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

(champ) meme@ubuntugpu:~/champ$ /mnt/data/meme/.conda/envs/champ/bin/python  inference.py --config configs/inference.yaml
[2024-04-06 14:13:13,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-04-06 14:13:14.193352: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-06 14:13:14.243075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-06 14:13:15.100858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/meme/.local/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
04/06/2024 14:13:16 - INFO - root - Running inference ...
04/06/2024 14:13:29 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
04/06/2024 14:13:56 - INFO - models.unet_3d - Load motion module params from pretrained_models/champ/motion_module.pth
04/06/2024 14:14:14 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
  File "/mnt/data/meme/champ/inference.py", line 312, in <module>
    main(cfg)
  File "/mnt/data/meme/champ/inference.py", line 260, in main
    result_video_tensor = inference(
  File "/mnt/data/meme/champ/inference.py", line 134, in inference
    video = pipeline(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/meme/champ/pipelines/pipeline_aggregation.py", line 387, in __call__
    clip_image_embeds = self.image_encoder(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1310, in forward
    vision_outputs = self.vision_model(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Am I supposed to manually create the folders for the models?

./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

I don't have a folder called pretrained_models, champ (unless it means the main app folder, which is called champ), image_encoder, sd-vae-ft-mse, or stable-diffusion-v1-5.

Why do you choose to use blender to render condition images?

Hi dear author @Leoooo333,

Thanks for releasing your code and providing guidance on preparing condition images to run with your method!

I noticed that you use pyrender (as in HMR2) for the semantic condition rendering and Blender for rendering the rest of the conditions. I am wondering why you made this choice -- is it intentional? Are there any issues we should keep in mind if we only use, say, pyrender to render all conditions?

Thanks!
Hang

some problems in data_process.md

When I followed data_process.md to set up the environment and download models, I ran into some questions about downloading the models. The original text says "download our Pose model dw-ll_ucoco_384.onnx and Det model yolox_l.onnx, then put them into Champ/annotator/ckpts/". Is "Champ" the root directory of the project or "pretrained_models/champ", or should the models be placed in a not-yet-released "annotator/ckpts"?

As shown in the red boxes in the screenshots below, "annotator" and "hmr2" don't feel like third-party libraries, but there are no corresponding directories in the repo. Have they not been released yet?
image

image

Inference time

Hi, I'm grateful for your excellent work! I've implemented the code as per the instructions, and it runs without errors. However, the inference time is slow, approximately 176 seconds per iteration. I tested it on an 80GB A100 GPU, and it seems to be using around 71GB of GPU memory. Is this normal?
image
image

Memory Consumption

Hello,

This looks like an excellent piece of work - thank you for releasing openly with models available!

Question on whether there are any means by which we can reduce VRAM usage? For those of us who don't have an A100 :)

Cheers.

Kudos to this project! The whole pipeline for producing Motion data is technically really impressive!

✌✌✌ Got it running with the 06 sample, which has the fewest frames...
Starting up, please wait patiently......
03/26/2024 15:38:35 - INFO - root - Running inference ...
03/26/2024 15:38:39 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
03/26/2024 15:38:46 - INFO - models.unet_3d - Load motion module params from pretrained_models\champ\motion_module.pth
D:\AITest\Champ\runtime\lib\site-packages\torch_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
03/26/2024 15:38:49 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
90%|████████████████████████████████████████████████████████████████████████ | 18/20 [18:10<02:01, 60.65s/it]

The output video is messy and disorganized

I used the configs/inference.yaml file for testing, and my graphics memory is only 20GB, so I deleted a lot of the action images of motion-09, leaving only the first 100 images at an output resolution of 512x512. But the output video had serious errors; what is the reason for this? I tested a total of 4 images and motions in example_data, but none of them yielded correct results.
20240328182857

Blender script for animations

Hi, your script is awesome! I have a question: when will you make the Blender file available for generating new animations from video? At the moment, I can only use the example data for animation.

CUDA out of memory

Hi :) Champ is really an inspiring work! During my experiments, Champ shows a high demand for memory, and I cannot run the inference code on a 3090 because it runs out of memory. May I ask if there is any solution to this other than changing to A10

The resulting video flickers badly

Thank you for your work! I have followed your instructions to complete the entire process, but the generated video flickers very badly. Is this caused by SMPL not being smoothed? Would this problem be fully solved if SMPL were smoothed?

grid_wguidance.mp4

Some doubts after testing Champ

Without data preprocessing, I used a random picture as ref_image and the provided motion_6 for inference. The result is as follows. The consistency of the character's movements is very good, but the character's face is badly damaged. This is probably due to the lack of preprocessing: the human body in the ref_image and the figure in the motion are not aligned.

grid_wguidance.mp4

Because the paper mentions that Champ was tested on the UBC fashion dataset, I selected the following video from the UBC fashion dataset as the guidance motion in order to test the data preprocessing pipeline.

91D23ZVV6NS.mp4

Based on the data preprocessing doc, after completing the environment setup, the required depth, normal, semantic_map and dwpose features can be successfully obtained from the guidance motion video. But I encountered a problem: the obtained semantic_map was missing two frames for some reason. Have you encountered this during data preprocessing? Since the 14s guidance motion video has a total of 422 frames and the difference between adjacent frames is small, I simply copied the previous frame to fill in the two missing semantic_map frames.

In the figure below, on the left is the first frame of the guidance motion video (960×1254), and on the right is the reference image (451×677). In the middle is the depth map of the first frame after data preprocessing; you can see that the image size is aligned to 451×677 and the human body parts are also more consistent.
image

However, running inference with the data preprocessed from the above reference image and guidance motion video gives a very bad result, as shown below. There is a lot of jitter in the video, and there are serious distortions in the characters' faces and bodies.

animation.mp4

Can somebody tell me the reason for the poor performance or provide some suggestions for improvement? Thanks

Salute your open source spirit!

The earth and the sky will praise your generosity, and countless algorithm engineers will praise you, the selfless devotee, the great architect.

Torch not compiled with CUDA enabled

My machine is an RTX 4080 on Windows.
I installed all the pretrained models and packages and ran it in conda; when I use torch==2.0.1, it says "Torch not compiled with CUDA enabled".

  File "D:\Github\champ\inference.py", line 284, in <module>
    main(cfg)
  File "D:\Github\champ\inference.py", line 162, in main
    ).to(dtype=weight_dtype, device="cuda")
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\transformers\modeling_utils.py", line 1902, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\cuda\__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I also tried using "conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia"; in that case, it says

`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 40

How do I solve this?

Is it possible to use a customized base model?

I would like to run inference using a base model other than SD1.5, such as majicMIX-realistic on Hugging Face.

I faced a problem running it with another base model. Simply changing cfg.base_model_path to majicMIX-realistic does not work:
denoising_unet/reference_unet.load_state_dict( ... ) at lines 200-213 resets the UNet to the base model.
When I nullify lines 200-213, the result I obtain is a basic grey noise image.

I would like to know whether it is possible to use a .safetensors file from CivitAI to change the base model.
Or are the denoising_unet.pth and reference_unet.pth in the provided checkpoint specialized for their own task, which makes other base models unable to function?

If there is a method I can easily implement to use another base model, please guide me. Thank you!

V100 runs out of VRAM

Running on a V100 (32GB) with
CUDA_VISIBLE_DEVICES=6 python inference.py --config configs/inference.yaml

It runs out of VRAM as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.95 GiB (GPU 0; 31.75 GiB total capacity; 24.84 GiB already allocated; 1.96 GiB free; 28.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How much VRAM is needed to run this, or what should I modify?

can't open file 'C:\\sd1\\champ\\inference_smpl.py'

Hello,

I am trying to run SMPL & Rendering for my own video.

I followed all the steps but it comes up with an error:

(venv) C:\sd1\champ>python inference_smpl.py  --reference_imgs_folder test_smpl/reference_imgs --driving_videos_folder test_smpl/driving_videos --device 1
C:\Program Files\Python310\python.exe: can't open file 'C:\\sd1\\champ\\inference_smpl.py': [Errno 2] No such file or directory

I can't locate this file on GitHub either.

where is model_config.yaml?

Traceback (most recent call last):
File "/workspace/champ/4D-Humans/inference_smpl.py", line 74, in
model, model_cfg = load_hmr2(DEFAULT_CHECKPOINT)
File "/workspace/champ/4D-Humans/hmr2/models/init.py", line 72, in load_hmr2
model_cfg = get_config(model_cfg, update_cachedir=True)
File "/workspace/champ/4D-Humans/hmr2/configs/init.py", line 103, in get_config
cfg.merge_from_file(config_file)
File "/opt/conda/lib/python3.10/site-packages/yacs/config.py", line 211, in merge_from_file
with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/4DHumans/logs/train/multiruns/hmr2/0/model_config.yaml'

No matter how hard I look, I can't find the relevant yaml file. Is there anything I missed?

Strong image distortion

Hi

Thank you very much for your great work.

I have tried your model using reference images of my own and the end result is often not visually pleasing at all (face distorted, image proportions changed, ...).

I would be grateful if you could provide any potential constraints regarding the source image, for instance:

  • any height/width ratio requirement for the image
  • proportion of the head versus the rest of the image
  • location of the head and body in the image (horizontally centered ?)
  • amount of body visible and the pose of the body
  • max dimensions of the image

Many thanks in advance
