
animateanyone's Introduction

AnimateAnyone

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, Liefeng Bo

YouTube

Teaser Image

Updates

Thank you all for your incredible support and interest in our project. We've received lots of inquiries regarding a demo or the source code. We want to assure you that we are actively working on preparing the demo and code for public release. Although we cannot commit to a specific release date at this very moment, please be certain that the intention to provide access to both the demo and our source code is firm.

Our goal is to not only share the code but also ensure that it is robust and user-friendly, transitioning it from an academic prototype to a more polished version that provides a seamless experience. We appreciate your patience as we take the necessary steps to clean, document, and test the code to meet these standards.

Thank you for your understanding and continuous support.

Citation

@article{hu2023animateanyone,
  title={Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation},
  author={Li Hu and Xin Gao and Peng Zhang and Ke Sun and Bang Zhang and Liefeng Bo},
  journal={arXiv preprint arXiv:2311.17117},
  website={https://humanaigc.github.io/animate-anyone/},
  year={2023}
}

animateanyone's People

Contributors

humanaigc


animateanyone's Issues

An empty project

An empty project with just a README getting so many stars is truly unbelievable.

What is the minimum required GPU memory?

I reproduced the code and trained it on a V100 32GB, but OOM errors still occurred even with batch_size=1, a 128x128 image resolution, and fp16 AMP training.
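
For context, the usual levers for fitting diffusion training into limited VRAM are gradient checkpointing, fp16 autocast, and gradient accumulation. Below is a minimal sketch in PyTorch/diffusers terms, not the paper's actual training code: the checkpoint name, the dataloader, the batch layout, and compute_loss are all placeholders.

import torch
from diffusers import UNet2DConditionModel

# Stand-in UNet; the actual AnimateAnyone model is not public.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
).to("cuda")
unet.enable_gradient_checkpointing()  # trades extra compute for activation memory

def compute_loss(unet, batch):
    # Placeholder noise-prediction MSE; batch layout is hypothetical.
    noisy, t, cond, target = batch
    pred = unet(noisy, t, encoder_hidden_states=cond).sample
    return torch.nn.functional.mse_loss(pred, target)

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()
accum = 8  # effective batch size 8 at the memory cost of batch size 1

for step, batch in enumerate(dataloader):  # `dataloader` is a placeholder
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(unet, batch) / accum
    scaler.scale(loss).backward()
    if (step + 1) % accum == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)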

Some Questions About the ReferenceNet

The ReferenceNet takes the VAE-encoded image as input. Does it add noise to it?

If you are not adding noise to the ReferenceNet image latents, do you call the ReferenceNet U-Net multiple times with the same timesteps as the denoising network, or do you just call it with a single timestep?
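
For what it's worth, unofficial reproductions commonly feed the reference latents through ReferenceNet without added noise, run it once per sampling pass at a fixed timestep, and cache its spatial features for every denoising step. A rough sketch of that pattern follows; reference_net, denoising_unet, and their keyword arguments are hypothetical, since the official code is unreleased.

import torch

@torch.no_grad()
def encode_reference(vae, ref_image):
    # Clean latents: no noise is added to the reference image.
    return vae.encode(ref_image).latent_dist.sample() * 0.18215

@torch.no_grad()
def sample(vae, reference_net, denoising_unet, ref_image, latents, timesteps):
    ref_latents = encode_reference(vae, ref_image)
    # Run ReferenceNet once at a fixed timestep (here t=0) and reuse
    # its spatial-attention features at every denoising step.
    ref_feats = reference_net(ref_latents, timestep=0)  # hypothetical API
    for t in timesteps:
        latents = denoising_unet(latents, t, reference_features=ref_feats)  # hypothetical kwarg
    return latents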

Is this project from the Alibaba development team?

When you hear that a project is from the Alibaba development team, you should understand the following points:

  1. The possibility of it being open-source is low.
  2. There's more talk than action.
  3. Many projects are abandoned halfway.

Request for Consideration: Inclusive Image Usage in Repository

I hope this finds you well. I would like to bring to your attention a concern regarding the usage of images, particularly pictures of women, in your GitHub repository. There's a possibility that the individuals in these images may not have consented to their use.

In the spirit of promoting inclusivity and respecting individuals' privacy, I kindly request a review and consideration for alternative, more inclusive imagery in your repository. It's essential to ensure that the visual content aligns with ethical practices and fosters a welcoming environment for all contributors and users.

Many journals have also been speaking out about the use of images without consent, so this could cause problems with publication. Sources: (1) Journal of Modern Optics, (2) Nature Nanotechnology, (3) Optical Engineering.

Furthermore, there are resources for alternative pictures, or you could just use a picture of yourself; that would be a great showcase of how quickly your application can make anyone dance! :)

I appreciate your understanding and attention to this matter. If you have any questions or would like further clarification, please feel free to reach out!

Questions about the training data

Excellent work!
I'm surprised that it animates both real and cartoon characters very well. Does the training dataset contain cartoon characters? And how do you ensure the pose sequences are applicable to both real and cartoon characters?
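
On the second question: the pose condition is typically a rendered skeleton, which is appearance-free, so the same conditioning signal works for photos and drawings alike (though pose estimators trained on real people can still fail on heavily stylized characters). A minimal extraction sketch using the controlnet_aux OpenPose wrapper as a stand-in detector, not necessarily what the authors used:

from PIL import Image
from controlnet_aux import OpenposeDetector

# Any off-the-shelf pose estimator works as a stand-in here.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

frame = Image.open("frame_0001.png")  # hypothetical input frame
skeleton = detector(frame)            # rendered skeleton image
skeleton.save("pose_0001.png")        # appearance-free pose condition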

The sooner the release, the better

Since the paper has already been published, please release the code. With how fierce the competition is right now, what if Douyin ships this exact feature one day? It's best to seize the first-mover advantage.

When will the source code be released?

Hello! I am very interested in your impressive work after watching the demo!
I wonder when you will release the source code.

Thank you!

I've just completed a simplified version of AnimateAnyone and invite everyone to give it a try! Currently, the training code has been made available, and soon we'll be releasing our pre-trained models as well. You can access the project via this link: https://github.com/guoqincode/AnimateAnyone-unofficial. Looking forward to your feedback and support!

Collaborators for an Implementation (Awaiting Code Release...)

The project is interesting in the context of security, since it allows for the replication of dangerous scenes.

Therefore, I am proceeding with the implementation using the few details provided in the article.

I have already set up a draft of the temporal part and am testing various training runs, adapting ControlLDM as I inferred from the figures in the article. I am seeing the first results and most of the details are maintained, but the other network, the one that preserves the details, is still missing.

It is clear from the videos that generation is done in steps, on groups of frames, because the jump is noticeable when looking at, for example, the background.

Is anyone working on sketching out the detail preservation part?
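
On the visible jump between frame groups: a common mitigation in video-diffusion codebases (not confirmed for this paper) is to denoise overlapping windows of frames and average the overlapping region. A rough sketch, where denoise_window stands in for a hypothetical per-window denoiser:

import torch

def chunked_frames(num_frames, window=16, overlap=4):
    # Yield overlapping [start, end) windows covering all frames.
    stride = window - overlap
    for start in range(0, max(num_frames - overlap, 1), stride):
        yield start, min(start + window, num_frames)

def blend_windows(latents, denoise_window, window=16, overlap=4):
    # latents: (frames, C, H, W); overlapping frames are averaged,
    # which smooths the seam between consecutive windows.
    out = torch.zeros_like(latents)
    weight = torch.zeros(latents.shape[0], 1, 1, 1, device=latents.device)
    for s, e in chunked_frames(latents.shape[0], window, overlap):
        out[s:e] += denoise_window(latents[s:e])  # hypothetical denoiser call
        weight[s:e] += 1.0
    return out / weight.clamp(min=1.0)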

[Official Updates] Follow-up plans for the project

(The issue body repeats the announcement from the Updates section above.)

I love this!

I would love this very much as a composer, director and animation guy! I want to participate!

Starting a group so we can help each other out.

An image-to-video exploration group.

Mainly for solving the series of problems encountered in image-to-video work: environment setup, extracting poses from images, video artifacts, running out of GPU memory, multi-GPU execution, swapping base models, and so on, as well as collecting examples of outstanding results. The aim is to make communication more efficient, lower the learning cost, and push the project forward.

Men

Men wearing a compression shirt

Some problems of my unofficial implementation

Hi,

I have unofficially reproduced the code for 'Animate Anyone' based on the description in your paper. However, I encountered two issues during the training process:

Currently, with a single GPU and a batch size of 2, I have trained for 8k iterations. The generated images show a significant difference in the background compared to the target images, which are pure white (see the third row in the figure below).

The faces reconstructed by the VAE decoding exhibit distortion. I'm wondering if it's possible to utilize the latent diffusion model to capture the information lost by the VAE and correct the distorted faces. In your video demo, the faces appear clear, and I'm unsure how to address this issue.

[figure: generated samples compared with target images]
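
On the face distortion: it can help to first check whether the blur is introduced by the VAE itself rather than by the diffusion model, by round-tripping ground-truth frames through encode/decode and comparing face crops. A sketch using a stock SD VAE as a stand-in; the input tensor and crop coordinates are placeholders:

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def vae_roundtrip(image):
    # image: (1, 3, H, W), values in [-1, 1]
    latents = vae.encode(image).latent_dist.mode()
    return vae.decode(latents).sample

frame = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in for a real frame
y0, y1, x0, x1 = 40, 104, 60, 124           # hypothetical face bounding box

recon = vae_roundtrip(frame)
face_err = torch.nn.functional.mse_loss(recon[..., y0:y1, x0:x1],
                                        frame[..., y0:y1, x0:x1])
print(f"face-crop reconstruction MSE: {face_err.item():.4f}")

If the roundtrip already loses the face, swapping in a VAE checkpoint reportedly fine-tuned partly for faces (such as sd-vae-ft-mse) or raising the working resolution is a more direct fix than changing the diffusion model.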


Considerations for Keeping Your Code Closed Source

You should consider not releasing your code as open source if your business model relies on maintaining a competitive edge and generating revenue from software sales. Following OpenAI's example, keeping your code closed source allows you to control distribution and directly monetize your innovations. It also protects your intellectual property and ensures that you can provide high-quality support and services to your customers, which often serves as an additional revenue stream. This decision is not solely about security and ethics; it's also about establishing a strong financial foundation for sustainable growth.

Unable to Locate Downloadable Code

Hello,

I am eagerly looking forward to your amazing tool's demo. I attempted to download the code and understood the general explanation, but I could not locate the actual source files. If the source has already been published, I would be grateful if you could direct me to it. I am looking forward to future updates.

Thank you.
