fudan-generative-vision / champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Home Page: https://fudan-generative-vision.github.io/champ/

License: Apache License 2.0

Python 100.00%
human-animation video-generation image-animation

champ's Introduction

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

1Nanjing University 2Fudan University 3Alibaba Group
*Equal Contribution +Corresponding Author
head.mp4

Framework

framework

News

  • 2024/05/05: 🎉🎉🎉Sample training data on HuggingFace released.

  • 2024/05/02: 🌟🌟🌟Training source code released #99.

  • 2024/04/28: 👏👏👏Method for smoothing SMPLs in Blender released #96.

  • 2024/04/26: 🚁Great Blender add-on CEB Studios for various SMPL processing!

  • 2024/04/12: ✨✨✨SMPL & Rendering scripts released! Champ your dance videos now💃🤸‍♂️🕺. See docs.

  • 2024/03/30: 🚀🚀🚀Amazing ComfyUI Wrapper by the community. Here is the video tutorial. Thanks to @kijai🥳

  • 2024/03/27: Cool Demo on replicate🌟. Thanks to @camenduru👏

  • 2024/03/27: Visit our roadmap🕒 to preview the future of Champ.

Installation

  • System requirements: Ubuntu 20.04 / Windows 11, CUDA 12.1
  • Tested GPUs: A100, RTX3090

Create conda environment:

  conda create -n champ python=3.10
  conda activate champ

Install packages with pip

  pip install -r requirements.txt

Install packages with poetry

If you want to run this project on a Windows device, we strongly recommend using poetry.

poetry install --no-root

Inference

The inference entrypoint script is ${PROJECT_ROOT}/inference.py. Before testing your own cases, the following preparations need to be completed:

  1. Download all required pretrained models.
  2. Prepare your guidance motions.
  3. Run inference.

Download pretrained models

You can easily get all pretrained models required for inference from our HuggingFace repo.

Clone the pretrained models into the ${PROJECT_ROOT}/pretrained_models directory with the command below:

git lfs install
git clone https://huggingface.co/fudan-generative-ai/champ pretrained_models

Or you can download them separately from their source repos:

  • Champ ckpts: Consists of the denoising UNet, guidance encoders, reference UNet, and motion module.
  • StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
  • sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
  • image_encoder: Fine-tuned from CompVis/stable-diffusion-v1-4-original to accept CLIP image embedding rather than text embeddings. (Thanks to lambdalabs)

Finally, these pretrained models should be organized as follows:

./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
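
If inference complains about missing weights, you can sanity-check the layout with a small script like the one below. This is only a minimal sketch, not part of the repo; it simply verifies that the main files listed above exist under ./pretrained_models:

  from pathlib import Path

  # File names taken from the directory layout above.
  required = [
      "champ/denoising_unet.pth",
      "champ/guidance_encoder_depth.pth",
      "champ/guidance_encoder_dwpose.pth",
      "champ/guidance_encoder_normal.pth",
      "champ/guidance_encoder_semantic_map.pth",
      "champ/reference_unet.pth",
      "champ/motion_module.pth",
      "image_encoder/pytorch_model.bin",
      "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
      "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
  ]

  root = Path("pretrained_models")
  missing = [p for p in required if not (root / p).exists()]
  print("All pretrained models found." if not missing else f"Missing: {missing}")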

Prepare your guidance motions

Guidance motion data, produced via SMPL & Rendering, is required for inference.

You can download our pre-rendered samples from our HuggingFace repo and place them into the ${PROJECT_ROOT}/example_data directory:

git lfs install
git clone https://huggingface.co/datasets/fudan-generative-ai/champ_motions_example example_data

Or you can follow the SMPL & Rendering doc to produce your own motion data.

Finally, the ${PROJECT_ROOT}/example_data directory should look like this:

./example_data/
|-- motions/  # Directory includes motions per subfolder
|   |-- motion-01/  # A motion sample
|   |   |-- depth/  # Depth frame sequence
|   |   |-- dwpose/ # DWPose frame sequence
|   |   |-- mask/   # Mask frame sequence
|   |   |-- normal/ # Normal map frame sequence
|   |   `-- semantic_map/ # Semantic map frame sequence
|   |-- motion-02/
|   |   |-- ...
|   |   `-- ...
|   `-- motion-N/
|       |-- ...
|       `-- ...
`-- ref_images/ # Reference image samples (optional)
    |-- ref-01.png
    |-- ...
    `-- ref-N.png

Run inference

Now all the prepared models and motions are in ${PROJECT_ROOT}/pretrained_models and ${PROJECT_ROOT}/example_data respectively.

Here is the command for inference:

  python inference.py --config configs/inference/inference.yaml

If using poetry, the command is:

poetry run python inference.py --config configs/inference/inference.yaml

Animation results will be saved in ${PROJECT_ROOT}/results folder. You can change the reference image or the guidance motion by modifying inference.yaml.

The default motion-02 in inference.yaml has about 250 frames and requires ~20GB of VRAM.

Note: If your VRAM is insufficient, you can switch to a shorter motion sequence or cut out a segment from a long sequence. We provide a frame range selector in inference.yaml, which you can set to a list [min_frame_index, max_frame_index] to conveniently cut out a segment from the sequence.
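
If you prefer to trim the motion files on disk instead of editing the config, the sketch below copies a frame segment into a new motion folder. This is only an illustrative helper, not code from the repo; it assumes the example_data layout shown above with PNG frames, and the motion names and indices are placeholders:

  import shutil
  from pathlib import Path

  def cut_motion_segment(src, dst, start, end,
                         guidance_types=("depth", "dwpose", "mask", "normal", "semantic_map")):
      # Copy frames with index in [start, end) from each guidance subfolder of src into dst.
      for g in guidance_types:
          frames = sorted((Path(src) / g).glob("*.png"))[start:end]
          out_dir = Path(dst) / g
          out_dir.mkdir(parents=True, exist_ok=True)
          for frame in frames:
              shutil.copy(frame, out_dir / frame.name)

  # e.g. keep the first 100 frames of motion-02 to reduce VRAM usage
  cut_motion_segment("example_data/motions/motion-02",
                     "example_data/motions/motion-02-short", 0, 100)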

Train the Model

The training process consists of two distinct stages. For more information, refer to the Training Section in the paper on arXiv.

Prepare Datasets

Prepare your own training videos with human motion (or use our sample training data on HuggingFace) and modify the data.video_folder value in the training config yaml.

All training videos need to be processed into SMPL & DWPose format. Refer to the Data Process doc.

The directory structure will be like this:

/training_data/
|-- video01/          # Frame data for one training video
|   |-- depth/        # Depth frame sequence
|   |-- dwpose/       # DWPose frame sequence
|   |-- mask/         # Mask frame sequence
|   |-- normal/       # Normal map frame sequence
|   `-- semantic_map/ # Semantic map frame sequence
|-- video02/
|   |-- ...
|   `-- ...
`-- videoN/
    |-- ...
    `-- ...

Select another small batch of data as the validation set, and modify the validation.ref_images and validation.guidance_folders paths in the training config yaml.
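
Before launching training, it can also help to check that every video folder contains all five guidance subfolders with matching frame counts. The following is a minimal sketch (not part of the training scripts), assuming the layout above with PNG frames:

  from pathlib import Path

  GUIDANCE_TYPES = ("depth", "dwpose", "mask", "normal", "semantic_map")

  def check_training_data(video_folder):
      # Report any video whose guidance folders have mismatched frame counts.
      for video_dir in sorted(Path(video_folder).iterdir()):
          if not video_dir.is_dir():
              continue
          counts = {g: len(list((video_dir / g).glob("*.png"))) for g in GUIDANCE_TYPES}
          if len(set(counts.values())) != 1:
              print(f"{video_dir.name}: inconsistent frame counts -> {counts}")

  check_training_data("/training_data/")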

Run Training Scripts

To train the Champ model, use the following command:

# Run training script of stage1
accelerate launch train_s1.py --config configs/train/stage1.yaml

# Modify the `stage1_ckpt_dir` value in yaml and run training script of stage2
accelerate launch train_s2.py --config configs/train/stage2.yaml

Datasets

Type       HuggingFace                    ETA
Inference  SMPL motion samples            Thu Apr 18 2024
Training   Sample datasets for Training   Sun May 05 2024

Roadmap

Status  Milestone                                                    ETA
        Inference source code meets everyone on GitHub for the first time  Sun Mar 24 2024
        Model and test data on HuggingFace                           Tue Mar 26 2024
        Optimize dependencies to run well on Windows                 Sun Mar 31 2024
        Data preprocessing code release                              Fri Apr 12 2024
        Training code release                                        Thu May 02 2024
        Sample of training data release on HuggingFace               Sun May 05 2024
        Smoothing SMPL motion                                        Sun Apr 28 2024
🚀🚀🚀  Gradio demo on HuggingFace                                   TBD

Citation

If you find our work useful for your research, please consider citing the paper:

@misc{zhu2024champ,
      title={Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
      author={Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
      year={2024},
      eprint={2403.14781},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Opportunities available

Multiple research positions are open at the Generative Vision Lab, Fudan University! These include:

  • Research assistant
  • Postdoctoral researcher
  • PhD candidate
  • Master's students

Interested individuals are encouraged to contact us at [email protected] for further information.

champ's People

Contributors

aricgamma, leoooo333, shenhaozhu, subazinga


champ's Issues

Parametric shape alignment

Hi, thanks for sharing this work -- it looks great!
The paper mentions parametric shape alignment, which sounds intriguing. However, I'm not seeing any reference to those models in the codebase. Do you plan to release the inference code/models for that as well?

How to obtain normal map

Following your previous advice, I have tried SMPLer-X and Depth Anything to obtain the 3D human body and the corresponding depth image. I also have a question about how to obtain the normal map. Could you please tell me which model you use?

Thank you!

CUDA out of memory, feature request

This looks really wonderful. Many thanks for sharing with the community.
I almost gave up after looking at the system and graphics card requirements, but thought there was no harm in trying.

So I tested on Windows 11 + RTX 4060 8 GB VRAM + 16 GB RAM and it worked.
The change I made was to keep only 20 frames inside the motion and delete the remaining frames from all folders within the motion. [EDIT] Tried with 40 motion frames and it worked without complaining. :)

  1. Requesting a batch-processing feature, with the number of frames specifiable in configs/inference.yaml.
  2. Correct the system requirements to also include Windows 11 along with Ubuntu 20.04.

Here is the output
https://github.com/fudan-generative-vision/champ/assets/2102186/b3fc3b93-a5cc-4ef7-94d3-22cd4e5ed9f3

the code of computing PSNR in the Disco repository is wrong

The DisCo code for computing PSNR, linked below, is wrong:
https://github.com/Wangt-CN/DisCo/blob/8538889c9ee9edd8dd43ffee182d1a91ce7a9828/tool/metrics/ssim_l1_lpips_psnr.py#L13.

image

As pointed out in Wangt-CN/DisCo#86, the correct code is mse = np.mean((original/1.0 - compressed/1.0) ** 2) instead of mse = np.mean((original - compressed) ** 2), because original and compressed images are uint8 in their code, and (original - compressed) ** 2 will cause numerical overflow.

If you used their PSNR evaluation code, please update your results.
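
For reference, a minimal corrected PSNR sketch (assuming uint8 image arrays, as in the DisCo code) looks like this:

  import numpy as np

  def psnr(original, compressed, max_pixel=255.0):
      # Cast to float before subtracting; subtracting uint8 arrays directly
      # wraps around and corrupts the MSE, which is the bug described above.
      mse = np.mean((original.astype(np.float64) - compressed.astype(np.float64)) ** 2)
      if mse == 0:
          return float("inf")
      return 20 * np.log10(max_pixel / np.sqrt(mse))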

Commercial usage?

Are the checkpoints available for commercial usage?
Thank you for your work!

What model is to be used for extraction of the semantic segmentation map?

Thank you for your awesome work released in open source! I really appreciate the impact this paper and code will bring to the community.

I would like to test this model by using in-the-wild video which requires preprocessing.
I am planning to use the following:

  • Depth: Depth Anything with greyscale
  • Dwpose: official dwpose repository
  • Normal: ICON normal map
  • mask: unsure whether it is necessary as "inference.yaml" does not require mask for its guidance_types by default

Meanwhile, I am unsure which model I should use for the semantic segmentation map. Please point me to any model that is suitable for the data preprocessing stage.

Video flickers severely on my own data

Hi, thank you very much for your great work. It's really awesome!

I successfully ran the entire project on my own dataset, but the generated results seem to flicker much more severely compared to the example data. Is there a way to stabilize the results like the examples do?
The tools I used are as follows:

0x1. To ensure the stability of the running results, I deliberately resized the ref_image and motion_data to the same dimensions. The ref_image was regenerated based on the pose from the motion data.
0x2. I obtained complete motion data based on the project at https://github.com/kijai/ComfyUI-champWrapper, and then imported the data into CHAMP to run the process:
1.1 DSINE Normal Map to obtain normal data
1.2 DWpose Estimator to obtain dwpose data
1.3 Depth Anything to obtain depth data
1.4 DensePose Estimator to obtain semantic_map data
0x3. The motion data and ref image I used are attached.
Thank you very much.

data.zip

2 Issues regarding example data

  1. In example_data/motions/motion-0X:
    There are extra output.mp4 files which should not be in the dataset.
    This causes an error at 'inference.py' line 78, where the video file is opened with the PIL.Image module.
    I fixed this locally by adding the following lines after line 73:

    try:
        Image.open(guidance_image_path).convert("RGB")
    except Exception:
        continue

  2. In example_data/motions/motion-07:
    There is an extra 0389_all.png in motion-07/semantic_map, while depth, dwpose, mask, and normal do not have 0389_all.png.
    This causes an assertion error at 'inference.py' line 87.
    I manually fixed it by deleting the file in my folder, but I would be grateful if you fixed it by adding code that automatically skips an image when it does not have a match in the other guidance folders (a rough sketch of what I mean is below).
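
A minimal sketch of such a skip (an illustrative helper assuming the guidance subfolders sit under one motion directory as in example_data, not code from the repo):

    from pathlib import Path

    def common_frames(motion_dir, guidance_types=("depth", "dwpose", "mask", "normal", "semantic_map")):
        # Keep only frame names present in every guidance subfolder, so a stray
        # file in one folder (e.g. an extra semantic_map frame) is skipped.
        frame_sets = [
            {p.name for p in (Path(motion_dir) / g).iterdir() if p.suffix.lower() == ".png"}
            for g in guidance_types
        ]
        return sorted(set.intersection(*frame_sets))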

I cannot find the file "smpl_rendering.blend"

Hello,
Thank you for your great work, it is solid and meaningful.
I am interested in your four conditions (depth, normal, seg_map, pose) rendered from the SMPL model; this is also meaningful for my current work.
However, I followed your old version of the code (https://github.com/Leoooo333/champ/tree/master?tab=readme-ov-file). When processing the SMPL section, I finished Fit SMPL and Transfer SMPL successfully.
THE QUESTION is about the command "blender smpl_rendering.blend --background --python rendering.py --driving_path test_smpl/transfer_result/smpl_results --reference_path test_smpl/reference_imgs/images/ref.png" in the RENDERING section: I cannot find the file smpl_rendering.blend in the code.
I am confused, how can I find this file to finish my rendering? Thank you!

VRAM required?

Thanks for the great work! What is the minimum VRAM required?

ComfyUI version: size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

When I use the ComfyUI version, I get a tensor error. I really don't know what is going on; can someone help me? The workflow is below.

Error occurred when executing champ_sampler:

The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

ERROR:root:Traceback (most recent call last):
File "/root/ComfyUI/execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "/root/ComfyUI/execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/root/ComfyUI/execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 418, in process
result_video_tensor = inference(
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 471, in inference
video = pipeline(
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/pipelines/pipeline_aggregation.py", line 550, in call
pred = self.denoising_unet(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/models/unet_3d.py", line 484, in forward
sample = sample + guidance_fea
RuntimeError: The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4
Champ_replace_person_01 (1).json

How to get depth images

Wonderful Work!!
I really appreciate you releasing your model and testing code. I have a question about the depth images. How do you get them?
Thank you!

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

(champ) meme@ubuntugpu:~/champ$ /mnt/data/meme/.conda/envs/champ/bin/python  inference.py --config configs/inference.yaml
[2024-04-06 14:13:13,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-04-06 14:13:14.193352: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-06 14:13:14.243075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-06 14:13:15.100858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/meme/.local/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
04/06/2024 14:13:16 - INFO - root - Running inference ...
04/06/2024 14:13:29 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
04/06/2024 14:13:56 - INFO - models.unet_3d - Load motion module params from pretrained_models/champ/motion_module.pth
04/06/2024 14:14:14 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
  File "/mnt/data/meme/champ/inference.py", line 312, in <module>
    main(cfg)
  File "/mnt/data/meme/champ/inference.py", line 260, in main
    result_video_tensor = inference(
  File "/mnt/data/meme/champ/inference.py", line 134, in inference
    video = pipeline(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/meme/champ/pipelines/pipeline_aggregation.py", line 387, in __call__
    clip_image_embeds = self.image_encoder(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1310, in forward
    vision_outputs = self.vision_model(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Am I supposed to manually create the folders for the models?

./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

I don't have a folder called pretrained_models, champ (unless it means the main app folder, which is called champ), image_encoder, sd-vae-ft-mse, or stable-diffusion-v1-5.

Why do you choose to use blender to render condition images?

Hi dear author @Leoooo333,

Thanks for releasing your code and providing guidance on preparing condition images to run with your method!

I noticed that you use pyrender (as in HMR2) for the semantic condition rendering and Blender for rendering the rest of the conditions. I am wondering why you made this choice -- is it intentional? Are there any issues we should keep in mind if we only use, say, pyrender to render all conditions?

Thanks!
Hang

some problems in data_process.md

When I followed data_process.md to set up the environment and download models, I ran into some questions about downloading the models. The original text says "download our Pose model dw-ll_ucoco_384.onnx and Det model yolox_l.onnx, then put them into Champ/annotator/ckpts/". Is "Champ" the root directory of the project or "pretrained_models/champ", or should the models be placed in a not-yet-released "annotator/ckpts"?

As shown in the red boxes in the screenshots below, "annotator" and "hmr2" don't feel like third-party libraries, but there are no corresponding directories in the repo. Have they not been released yet?
image

image

Inference time

Hi, I'm grateful for your excellent work! I've implemented the code as per the instructions, and it runs without errors. However, the inference time is slow, approximately 176 seconds per iteration. I tested it on an 80GB A100 GPU, and it seems to be using around 71GB of GPU memory. Is this normal?
image
image

Memory Consumption

Hello,

This looks like an excellent piece of work - thank you for releasing openly with models available!

Question on whether there are any means by which we can reduce VRAM usage? For those of us who don't have an A100 :)

Cheers.

Kudos to this project! The whole pipeline for producing Motion data is technically really impressive!

✌✌✌ Got it running with the 06 sample, which has the fewest frames...
Starting up, please wait patiently......
03/26/2024 15:38:35 - INFO - root - Running inference ...
03/26/2024 15:38:39 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
03/26/2024 15:38:46 - INFO - models.unet_3d - Load motion module params from pretrained_models\champ\motion_module.pth
D:\AITest\Champ\runtime\lib\site-packages\torch_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
03/26/2024 15:38:49 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
90%|████████████████████████████████████████████████████████████████████████ | 18/20 [18:10<02:01, 60.65s/it]

The output video is messy and disorganized

I used the configs/inference.yaml file for testing, and my graphics memory is only 20GB, so I deleted a lot of the action images of motion-09, leaving only the first 100 images at an output resolution of 512x512. But the output video had serious errors; what is the reason for this? I tested a total of 4 images and motions in example_data, but none of them yielded correct results.
20240328182857

Blender script for animations

Hi, your script is awesome! I have a question: when will you make the Blender file available for generating new animations from video? At the moment, I can only use the example data for animation.

CUDA out of memory

Hi :) Champ is really an inspiring work! During my experiments, Champ shows a high demand for memory, and I cannot run the inference code on a 3090 because it runs out of memory. May I ask if there is any solution to this other than changing to A10

The resulting video flickers badly

Thank you for your work! I have followed your instructions to complete the entire process, but the generated video flickers very badly. Is this caused by SMPL not being smoothed? Would this problem be fully solved if SMPL were smoothed?

grid_wguidance.mp4

Some doubts after testing Champ

Without data preprocessing, I used a random picture as ref_image and the provided motion_6 for inference. The result is as follows. The consistency of the character's movements is very good, but the character's face is badly damaged. This is probably due to the lack of preprocessing: the human body in the ref_image and the figure in the motion are not aligned.

grid_wguidance.mp4

Because the paper mentions that Champ was tested on the UBC fashion dataset, I selected the following video from the UBC fashion dataset as the guidance motion in order to test the data preprocessing pipeline.

91D23ZVV6NS.mp4

Based on the data preprocessing doc, after completing the environment setup, the required depth, normal, semantic_map and dwpose features can be successfully obtained from the guidance motion video. But I encountered a problem: the obtained semantic_map was missing two frames for some reason. Have you encountered this during data preprocessing? Since the 14s guidance motion video has a total of 422 frames and the difference between adjacent frames is small, I simply copied the previous frame to fill in the two missing semantic_map frames.

In the figure below, on the left is the first frame of the guidance motion video (960×1254), and on the right is the reference image (451×677). In the middle is the depth map of the first frame after data preprocessing; you can see that the image size is aligned to 451×677 and the human body parts are also more consistent.
image

However, running inference with the data preprocessed from the above reference image and guidance motion video gives a very bad result, as shown below. There is a lot of jitter in the video, and there are serious distortions in the characters' faces and bodies.

animation.mp4

Can somebody tell me the reason for the poor performance or provide some suggestions for improvement? Thanks

Salute your open source spirit!

The earth and the sky will praise your generosity, and countless algorithm engineers will praise you, the selfless devotee, the great architect.

Torch not compiled with CUDA enabled

My machine is an RTX 4080 on Windows.
I installed all the pretrained models and packages and ran it in conda; when I use torch==2.0.1, it says "Torch not compiled with CUDA enabled".

  File "D:\Github\champ\inference.py", line 284, in <module>
    main(cfg)
  File "D:\Github\champ\inference.py", line 162, in main
    ).to(dtype=weight_dtype, device="cuda")
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\transformers\modeling_utils.py", line 1902, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\cuda\__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I also tried using "conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia"; in that case, it says

`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 40

How do I solve this?

Is it possible to use a customized base model?

I would like to run inference using a base model other than SD1.5, such as majicMIX-realistic on Hugging Face.

I faced a problem running it with another base model. Simply changing cfg.base_model_path to majicMIX-realistic does not work:
denoising_unet/reference_unet.load_state_dict( ... ) at lines 200-213 resets the UNet to the base model.
When I nullify lines 200-213, the result I obtain is a basic grey noise image.

I would like to know whether it is possible to use a .safetensors file from CivitAI to change the base model.
Or are the denoising_unet.pth and reference_unet.pth in the provided checkpoint specialized for their own task, which makes other base models unable to function?

If there is a method I can easily implement to use another base model, please guide me. Thank you!

V100 runs out of VRAM

Running on a V100 (32GB) with
CUDA_VISIBLE_DEVICES=6 python inference.py --config configs/inference.yaml

It runs out of VRAM as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.95 GiB (GPU 0; 31.75 GiB total capacity; 24.84 GiB already allocated; 1.96 GiB free; 28.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How much VRAM is needed to run this, or what should I modify?

can't open file 'C:\\sd1\\champ\\inference_smpl.py'

Hello,

I am trying to run SMPL & Rendering for my own video.

I followed all the steps but it comes up with an error:

(venv) C:\sd1\champ>python inference_smpl.py  --reference_imgs_folder test_smpl/reference_imgs --driving_videos_folder test_smpl/driving_videos --device 1
C:\Program Files\Python310\python.exe: can't open file 'C:\\sd1\\champ\\inference_smpl.py': [Errno 2] No such file or directory

I can't locate this file on GitHub either.

where is model_config.yaml?

Traceback (most recent call last):
File "/workspace/champ/4D-Humans/inference_smpl.py", line 74, in
model, model_cfg = load_hmr2(DEFAULT_CHECKPOINT)
File "/workspace/champ/4D-Humans/hmr2/models/init.py", line 72, in load_hmr2
model_cfg = get_config(model_cfg, update_cachedir=True)
File "/workspace/champ/4D-Humans/hmr2/configs/init.py", line 103, in get_config
cfg.merge_from_file(config_file)
File "/opt/conda/lib/python3.10/site-packages/yacs/config.py", line 211, in merge_from_file
with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/4DHumans/logs/train/multiruns/hmr2/0/model_config.yaml'

No matter how hard I look, I can't find the relevant yaml file. Is there anything I missed?

Strong image distortion

Hi

Thank you very much for your great work.

I have tried your model using reference images of my own and the end result is often not visually pleasing at all (face distorted, image proportions changed, ...).

I would be grateful if you could provide any potential constraints regarding the source image, for instance:

  • any height/width ratio requirement for the image
  • proportion of the head versus the rest of the image
  • location of the head and body in the image (horizontally centered ?)
  • amount of body visible and the pose of the body
  • max dimensions of the image

Many thanks in advance
