modelscope / diffsynth-studio

Enjoy the magic of Diffusion models!

License: Apache License 2.0

Python 100.00%

diffsynth-studio's Introduction

DiffSynth Studio

Introduction

DiffSynth Studio is a Diffusion engine. We have restructured architectures including Text Encoder, UNet, VAE, among others, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of Diffusion models!

Until now, DiffSynth Studio has supported the following models:

News

August 22, 2024 We have implemented an interesting painter that supports all text-to-image models. Now you can create stunning images using the painter, with assistance from AI!
- Use it in our WebUI.
August 21, 2024 FLUX is supported in DiffSynth-Studio.
- Enable CFG and highres-fix to improve visual quality. See here
- LoRA, ControlNet, and additional models will be available soon.
June 21, 2024. 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to achieve the generation of long videos up to 128 frames.
- Project Page
- Source code is released in this repo. See examples/ExVideo.
- Models are released on HuggingFace and ModelScope.
- Technical report is released on arXiv.
- You can try ExVideo in this Demo!
June 13, 2024. DiffSynth Studio is transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.
Jan 29, 2024. We propose Diffutoon, a fantastic solution for toon shading.
- Project Page
- The source code is released in this project.
- The technical report (IJCAI 2024) is released on arXiv.
Dec 8, 2023. We decided to develop a new project that aims to unlock the potential of diffusion models, especially in video synthesis. Development of this project has started.
Nov 15, 2023. We propose FastBlend, a powerful video deflickering algorithm.
- The sd-webui extension is released on GitHub.
- Demo videos are shown on Bilibili, including three tasks.
- The technical report is released on arXiv.
- An unofficial ComfyUI extension developed by other users is released on GitHub.
Oct 1, 2023. We released an early version of this project, named FastSDXL, as a first attempt at building a diffusion engine.
- The source code is released on GitHub.
- FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
  - The original repo of OLSS is here.
  - The technical report (CIKM 2023) is released on arXiv.
  - A demo video is shown on Bilibili.
  - Since OLSS requires additional training, we don't implement it in this project.
Aug 29, 2023. We propose DiffSynth, a video synthesis framework.
- Project Page.
- The source code is released in EasyNLP.
- The technical report (ECML PKDD 2024) is released on arXiv.

Installation

Install from source code (recommended):

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Or install from PyPI:

pip install diffsynth

Usage (in Python code)

The Python examples are in examples. We provide an overview here.

Download Models

Download the preset models. Model IDs can be found in the config file.

from diffsynth import download_models

download_models(["FLUX.1-dev", "Kolors"])

Download your own models.

from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope

# From Modelscope (recommended)
download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
# From Huggingface
download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")

Video Synthesis

Long Video Synthesis

We trained an extended video synthesis model, which can generate 128 frames. examples/ExVideo

github_title.mp4

Toon Shading

Render realistic videos in a flat style and enable video editing features. examples/Diffutoon

Diffutoon.mp4

Diffutoon_edit.mp4

Video Stylization

Video stylization without video models. examples/diffsynth

winter_stone.mp4

Image Synthesis

Generate high-resolution images, by breaking the limitation of diffusion models! examples/image_synthesis.

LoRA fine-tuning is supported in examples/train.

FLUX	Stable Diffusion 3

Kolors	Hunyuan-DiT

Stable Diffusion	Stable Diffusion XL

Usage (in WebUI)

Create stunning images using the painter, with assistance from AI!

video.mp4

This video is not rendered in real-time.

Before launching the WebUI, please download models to the folder ./models. See here.

Gradio version

pip install gradio

python apps/gradio/DiffSynth_Studio.py

Streamlit version

pip install streamlit streamlit-drawable-canvas

python -m streamlit run apps/streamlit/DiffSynth_Studio.py

sdxl_turbo_ui.mp4


diffsynth-studio's Issues

TOO MUCH ERRORS.

File "D:\DiffSynth-Studio\venv\lib\site-packages\imageio\core\format.py", line 437, in get_data
raise IndexError(index)
Can you please help me? Thank you!

Load models into respective folders

Is there a simple way to download all the models this project needs into their respective folders?

where to set ip and port?

sysinfo:
Miniconda conda3
Python 3.10
(ubuntu22.04)

Repo is difficult to set up. Would appreciate an easy, working Colab notebook

Hello. I tried installing this locally, it took around an hour, and it did not work. I tried installing it on Colab, that took another hour, and it did not work. It would be nice to have a notebook attached to this repo so we can just click "run all" and try it out. Thank you for making this repo though.

Error when running examples/diffutoon_toon_shading.py

Traceback (most recent call last):
File "D:\PyCharm 2023.2.1\project\DiffSynth-Studio\examples\diffutoon_toon_shading.py", line 94, in <module>
runner.run(config)
File "D:\PyCharm 2023.2.1\project\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 340, in run
model_manager, pipe = self.load_pipeline(**config["models"])
File "D:\PyCharm 2023.2.1\project\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 271, in load_pipeline
model_manager.load_textual_inversions(textual_inversion_folder)
File "D:\PyCharm 2023.2.1\project\DiffSynth-Studio\diffsynth\models\__init__.py", line 177, in load_textual_inversions
for file_name in os.listdir(folder):
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'models/textual_inversion'
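
The traceback above ends with os.listdir failing because the folder models/textual_inversion does not exist. A minimal workaround sketch, assuming the pipeline only needs the folder to exist and accepts an empty one, is to create it before running the example. The helper name ensure_folder is hypothetical and is not part of DiffSynth Studio.

```python
import os


def ensure_folder(path: str) -> list[str]:
    """Create `path` if it is missing and return its (possibly empty) listing."""
    os.makedirs(path, exist_ok=True)
    return sorted(os.listdir(path))


if __name__ == "__main__":
    # Path copied from the error message; run this from the repository root.
    print(ensure_folder("models/textual_inversion"))
```

If the folder already contains files, the call leaves them untouched and simply lists them.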

Error when running examples\diffutoon_toon_shading.py

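Error when using LCM_lora_1.5

When I set the LoRA to the LCM 1.5 LoRA, I get this error:

I found that the problem occurs when converting the downsamplers: the weight for down has shape (320, 64, 3, 3), and the last two dimensions are not 1, 1, so they cannot be squeezed. How should I modify this?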
lora_unet_down_blocks_0_downsamplers_0_conv.alpha:torch.Size([])
lora_unet_down_blocks_0_downsamplers_0_conv.lora_down.weight:torch.Size([64, 320, 3, 3])
lora_unet_down_blocks_0_downsamplers_0_conv.lora_up.weight:torch.Size([320, 64, 1, 1])
The LCM LoRA implementation in diffusers is wrapped too deeply and I could not find it. Is it implemented the same way as in this project?
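
One way to handle a convolutional LoRA whose down kernel is 3x3 is to contract only over the rank dimension instead of squeezing the spatial dimensions. The sketch below uses NumPy with the shapes printed in the issue. It assumes the common convention that the merged delta is scaled by alpha / rank, and it is not taken from DiffSynth Studio or diffusers; merge_conv_lora is a hypothetical name.

```python
import numpy as np


def merge_conv_lora(lora_up: np.ndarray, lora_down: np.ndarray, alpha: float) -> np.ndarray:
    """Fold a 1x1 'up' kernel and a k x k 'down' kernel into one k x k weight delta."""
    rank = lora_down.shape[0]
    up = lora_up[:, :, 0, 0]  # (out, rank); valid because the up kernel is 1x1
    # Sum over the rank axis only, keeping the spatial axes of the down kernel.
    delta = np.einsum("or,rikl->oikl", up, lora_down)  # (out, in, k, k)
    return (alpha / rank) * delta


rng = np.random.default_rng(0)
lora_down = rng.standard_normal((64, 320, 3, 3))  # shape from the issue
lora_up = rng.standard_normal((320, 64, 1, 1))    # shape from the issue
delta = merge_conv_lora(lora_up, lora_down, alpha=64.0)
print(delta.shape)  # (320, 320, 3, 3)
```

The result has the same shape as a 3x3 convolution weight with 320 input and 320 output channels, so it can be added to such a weight directly.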

Upload examples via this issue

Could you package the models and upload them to a cloud drive?

Explain sd_text_to_video numframes and fps

numframes default is 64
fps default is 120

Shouldn't this make a video lasting approx half a second? But it makes a 4 second video?

If I wanted a 10 second video at 30 fps, what would I set the numframes and fps to?
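
The arithmetic behind this question can be sketched as follows, assuming numframes counts the frames written to the output file and fps is the playback rate of that file. Neither assumption is confirmed by the source, and the function names are illustrative only.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip with `num_frames` frames shown at `fps`."""
    return num_frames / fps


def frames_needed(duration_seconds: float, fps: float) -> int:
    """Number of frames required to fill `duration_seconds` at `fps`."""
    return round(duration_seconds * fps)


print(clip_duration_seconds(64, 120))  # about 0.53 seconds
print(frames_needed(10, 30))           # 300
```

Under these assumptions a 10 second clip at 30 fps needs 300 frames. A 64-frame clip that plays for 4 seconds corresponds to 16 fps, which would suggest the saved file is not using the 120 fps setting, although the source does not say why.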

Suddenly stopped working

IndexError: 150
Traceback:
File "D:\DiffSynth-Studio.glut\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
exec(code, module.__dict__)
File "D:\DiffSynth-Studio\pages\2_Video_Creator.py", line 197, in <module>
SDVideoPipelineRunner(in_streamlit=True).run(config)
File "D:\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 337, in run
config["pipeline"]["pipeline_inputs"] = self.add_data_to_pipeline_inputs(config["data"], config["pipeline"]["pipeline_inputs"])
File "D:\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 315, in add_data_to_pipeline_inputs
pipeline_inputs["input_frames"] = self.load_video(**data["input_frames"])
File "D:\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 310, in load_video
frames = [video[i] for i in range(start_frame_id, end_frame_id)]
File "D:\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 310, in <listcomp>
frames = [video[i] for i in range(start_frame_id, end_frame_id)]
File "D:\DiffSynth-Studio\diffsynth\data\video.py", line 121, in __getitem__
frame = self.data.__getitem__(item)
File "D:\DiffSynth-Studio\diffsynth\data\video.py", line 15, in __getitem__
return Image.fromarray(np.array(self.reader.get_data(item))).convert("RGB")
File "D:\DiffSynth-Studio.glut\lib\site-packages\imageio\core\format.py", line 437, in get_data
raise IndexError(index)

Black pictures

Hello, the pictures I produce are always black. I guess it is related to half precision. How can I adjust the parameters?

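About using LoRA models, and whether SDXL is properly supported

Hello, how do I use a LoRA model? I tried adding the LoRA model models/lora/HWmakeup.safetensors in diffutoon_toon_shading.py, set its value to 1, and also used lora:HWmakeup:1.0 in the prompt, but it does not seem to have any effect.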

Docker Setup

I would like to add a Docker option for setting up this project, so everyone can get started easily.

Repository Not Found for url: https://huggingface.co/models/Annotators/resolve/main/sk_model.pth.

Why am I getting this error? Can someone please help?

Traceback (most recent call last):
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/models/Annotators/resolve/main/sk_model.pth

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/examples/image_synthesis/sd_text_to_image.py", line 21, in <module>
pipe = SDImagePipeline.from_model_manager(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion.py", line 69, in from_model_manager
pipe.fetch_controlnet_models(model_manager, controlnet_config_units)
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion.py", line 42, in fetch_controlnet_models
Annotator(config.processor_id),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/diffsynth/controlnets/processors.py", line 23, in __init__
self.processor = LineartDetector.from_pretrained(model_path).to("cuda")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/controlnet_aux/lineart/__init__.py", line 108, in from_pretrained
model_path = hf_hub_download(pretrained_model_or_path, filename, cache_dir=cache_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
raise head_call_error
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
hf_raise_for_status(response)
File "/home/goku-kakarot-7227/Desktop/DiffSynth-Studio/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 352, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-667bc91f-2e38fd6b79acdb762c1dba9d;c12df456-0cd6-4711-ac97-91c4de0683f8)

Repository Not Found for url: https://huggingface.co/models/Annotators/resolve/main/sk_model.pth.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.

cuda error

My computer has CUDA correctly installed and it works fine with other open-source projects. However, when I start this project, I get this error and cannot use other features.

How to work on low VRAM?

can not run sd_video_rerender.py

An error occurs when running sd_video_rerender.py:

(/root/autodl-tmp/DiffSynth-Studio/venv/DiffSynthStudio) root@autodl-container-28ee458c74-84d890ed:~/autodl-tmp/DiffSynth-Studio/examples# python sd_video_rerender.py
Traceback (most recent call last):
File "/root/autodl-tmp/DiffSynth-Studio/examples/sd_video_rerender.py", line 1, in <module>
from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video
ModuleNotFoundError: No module named 'diffsynth'
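
This error means the active Python interpreter cannot locate the diffsynth package, which typically happens when the README step "pip install -e ." was not run from the repository root in the same environment. A small sketch for checking this before launching an example, using only the standard library (the helper name is_importable is illustrative):

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# 'basicsr' appears in another issue below with the same kind of error.
for name in ("diffsynth", "basicsr"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

A "missing" result for diffsynth suggests reinstalling the package in the environment that runs the script.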

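How to change the sampler and sampler parameters

How do I change the sampler and its parameters?
I could not find the corresponding fields in the config.

Does Diffutoon currently only support square videos?

As the title says. Are there plans to change this?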

ModuleNotFoundError: No module named 'basicsr'

Whenever I select controlnet-depth, I get the following error:
ModuleNotFoundError: No module named 'basicsr'
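The preprocessor models were all downloaded from https://huggingface.co/lllyasviel/Annotators/tree/main. Does anyone know why this happens, or which preprocessor model at that link corresponds to depth? I think the problem is with the preprocessor model.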

mat1 and mat2 shapes cannot be multiplied

When toon shading, two matrix shapes cannot be multiplied

Cannot access the page after deployment

Python 3.12.3
pip install -e .
python -m streamlit run DiffSynth_Studio.py

Please help, why IndexError: 2400?

input_video = [video[i] for i in range(40*60, 41*60)]

File "/code/./diffsynth/data/video.py", line 121, in __getitem__
frame = self.data.__getitem__(item)
File "/code/./diffsynth/data/video.py", line 15, in __getitem__
return Image.fromarray(np.array(self.reader.get_data(item))).convert("RGB")
File "/opt/conda/lib/python3.10/site-packages/imageio/core/format.py", line 437, in get_data
raise IndexError(index)
IndexError: 2400
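
IndexError: 2400 suggests that the requested range starts at or beyond the last frame of the input video. A minimal sketch of clamping the range to the frames that exist, assuming you can obtain the total frame count from your video reader (clamp_frame_range and num_frames are hypothetical names, not part of the project):

```python
def clamp_frame_range(start: int, end: int, num_frames: int) -> range:
    """Clip the half-open range [start, end) to frames that exist in the video."""
    start = max(0, min(start, num_frames))
    end = max(start, min(end, num_frames))
    return range(start, end)


# A video with exactly 2400 frames has valid indices 0..2399.
print(list(clamp_frame_range(40 * 60, 41 * 60, 2400)))  # []
print(len(clamp_frame_range(40 * 60, 41 * 60, 2430)))   # 30
```

An empty result signals that the chosen start frame is already past the end of the video, so the start index itself needs to be lowered.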

Video Creator Error (Text2Video)

TypeError: diffsynth.pipelines.stable_diffusion_video.SDVideoPipelineRunner.load_video() argument after ** must be a mapping, not NoneType
File "F:\DiffSynth-Studio-main\DiffSynthStudio\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
exec(code, module.__dict__)
File "F:\DiffSynth-Studio-main\pages\2_Video_Creator.py", line 197, in <module>
SDVideoPipelineRunner(in_streamlit=True).run(config)
File "F:\DiffSynth-Studio-main\diffsynth\pipelines\stable_diffusion_video.py", line 337, in run
config["pipeline"]["pipeline_inputs"] = self.add_data_to_pipeline_inputs(config["data"], config["pipeline"]["pipeline_inputs"])
File "F:\DiffSynth-Studio-main\diffsynth\pipelines\stable_diffusion_video.py", line 315, in add_data_to_pipeline_inputs
pipeline_inputs["input_frames"] = self.load_video(**data["input_frames"])
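
The TypeError arises because data["input_frames"] is None in text-to-video mode, and None cannot be unpacked with **. A sketch of a guard that skips video loading when no input video is configured is shown below. It is a standalone illustration with a stand-in loader (fake_load_video), not the project's actual fix.

```python
def add_data_to_pipeline_inputs(data: dict, pipeline_inputs: dict, load_video) -> dict:
    """Load input frames only when the config actually describes an input video."""
    spec = data.get("input_frames")
    if spec is not None:
        pipeline_inputs["input_frames"] = load_video(**spec)
    return pipeline_inputs


def fake_load_video(video_file, start_frame_id, end_frame_id):
    # Stand-in for a real loader: returns labels instead of decoded frames.
    return [f"{video_file}#{i}" for i in range(start_frame_id, end_frame_id)]


print(add_data_to_pipeline_inputs({"input_frames": None}, {}, fake_load_video))  # {}
```

With a populated "input_frames" mapping the same function forwards its keys to the loader as before.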

Flash attention question

Hi, Great work!!
I have one question. In the paper, you said that "we adopt flash attention [6] in all attention layers, including the text encoder, UNet, VAE, ControlNet models, and motion modules". I found the xformers_forward() function in the Attention module, but it is never called while running "diffutoon_toon_shading.py". This is strange, since it can still generate high-resolution videos. How does this work?
Thanks!

Error when loading HunyuanDiTCLIPTextEncoder: Missing key "embeddings.position_ids" in state_dict

Thank you for your help and support!

Issue Description

An error occurred while running ExVideo_svd_test.py. The error happens when loading the HunyuanDiTCLIPTextEncoder model, indicating a missing key "embeddings.position_ids" in the state dictionary.

Execute the following command:

python examples/ExVideo/ExVideo_svd_test.py

output

pytorch_model.bin has been already in models/HunyuanDiT/t2i/clip_text_encoder.
pytorch_model.bin has been already in models/HunyuanDiT/t2i/mt5.
pytorch_model_ema.pt has been already in models/HunyuanDiT/t2i/model.
diffusion_pytorch_model.bin has been already in models/HunyuanDiT/t2i/sdxl-vae-fp16-fix.
Traceback (most recent call last):
File "/home/hans/DiffSynth-Studio/examples/ExVideo/ExVideo_svd_test.py", line 88, in <module>
image = generate_image()
File "/home/hans/DiffSynth-Studio/examples/ExVideo/ExVideo_svd_test.py", line 34, in generate_image
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", model_id_list=["HunyuanDiT"])
File "/home/hans/DiffSynth-Studio/diffsynth/models/__init__.py", line 98, in __init__
self.load_models(downloaded_files + file_path_list)
File "/home/hans/DiffSynth-Studio/diffsynth/models/__init__.py", line 476, in load_models
self.load_model(file_path, lora_alphas=lora_alphas)
File "/home/hans/DiffSynth-Studio/diffsynth/models/__init__.py", line 464, in load_model
self.load_hunyuan_dit_clip_text_encoder(state_dict, file_path=file_path)
File "/home/hans/DiffSynth-Studio/diffsynth/models/__init__.py", line 360, in load_hunyuan_dit_clip_text_encoder
model.load_state_dict(model.state_dict_converter().from_civitai(state_dict))
File "/home/hans/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HunyuanDiTCLIPTextEncoder:
Missing key(s) in state_dict: "embeddings.position_ids".

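Is Mac supported, or does it only work on NVIDIA GPUs?

Create a community chat group so people can ask questions and discuss directly

Suggestion: create a community chat group so people can ask questions and discuss directly.
If the QR code has expired, you can add the group owner's WeChat ID directly: dreamingforhope

About inference speed

If I use SDXL-Lightning, will inference be faster? On a single 4090, rendering a 20-second video took almost two hours.

Hello, why does the image creator not have LoRA and ControlNet modules?

Stuck at "loading models"

Is 6 GB of VRAM just not enough?

How do I use an SDXL model for video-to-video rendering?

As the title says.

Question about the required configuration

Could the README describe the specific configuration needed to run the project?
I am not sure whether my environment can run this project.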

test

testtesttesttesttesttest

Alibaba is awesome

How to handle 16:9 videos and long videos

If the aspect ratio is set to 16:9, an exception occurs:

100%|█████████████████████████████████████████████████████████████████████████████████| 300/300 [00:01<00:00, 288.29it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 300/300 [00:16<00:00, 17.91it/s]
  0%|                                                                                             | 0/10 [00:02<?, ?it/s]
2024-03-19 16:17:44.901 Uncaught app exception
Traceback (most recent call last):
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
  File "/mnt/e/DiffSynth-Studio/examples/diffutoon_toon_shading.py", line 94, in <module>
    runner.run(config)
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 349, in run
    output_video = self.synthesize_video(model_manager, pipe, config["pipeline"]["seed"], smoother, **config["pipeline"]["pipeline_inputs"])
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 299, in synthesize_video
    output_video = pipe(**pipeline_inputs, smoother=smoother)
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 221, in __call__
    noise_pred_posi = lets_dance_with_long_video(
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 38, in lets_dance_with_long_video
    hidden_states_batch = lets_dance(
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/dancer.py", line 72, in lets_dance
    hidden_states, time_emb, text_emb, res_stack = block(hidden_states, time_emb, text_emb, res_stack)
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/e/DiffSynth-Studio/diffsynth/models/sd_unet.py", line 222, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 24 but got size 23 for tensor number 1 in the list.

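If the number of frames exceeds 1000, testing suggests that all frames are decoded to images and held in memory, which even 128 GB of RAM cannot handle. Could this be changed to decode on demand, or to save all frames to disk as images and load them when needed?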

100%|██████████████████████████████████████████████████████████████████████████████████| 900/900 [00:09<00:00, 92.59it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 900/900 [01:50<00:00,  8.18it/s]
Killed
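
The on-demand decoding requested above could look roughly like the sketch below, which yields frames in small batches instead of building one list for the whole video. It is only an illustration of the idea: read_frame stands in for whatever function decodes a single frame, and none of these names come from DiffSynth Studio.

```python
from typing import Callable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def iter_frame_batches(
    read_frame: Callable[[int], Frame], num_frames: int, batch_size: int
) -> Iterator[List[Frame]]:
    """Decode frames lazily, holding at most `batch_size` frames at a time."""
    for start in range(0, num_frames, batch_size):
        stop = min(start + batch_size, num_frames)
        yield [read_frame(i) for i in range(start, stop)]


# Demo with string placeholders; list() is used here only to display the result.
batches = list(iter_frame_batches(lambda i: f"frame-{i}", num_frames=5, batch_size=2))
print(batches)  # [['frame-0', 'frame-1'], ['frame-2', 'frame-3'], ['frame-4']]
```

In real use the consumer would process each batch and discard it before requesting the next, so memory use stays proportional to the batch size.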

How to optimize VRAM usage

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 8.00 GiB total capacity; 7.31 GiB already allocated; 0 bytes free; 7.99 GiB allowed; 7.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

As the title says.

CPU: 5800H
GPU: 3070 Laptop 8 GB
RAM: 64 GB

Image Creator
Model type: sd_xl_turbo_1.0_fp16.safetensors

Drawing tool: freedraw
Stroke width: 50
Denoising strength: 0.7
Repetition: 6

Generate image:
enable auto update
columns: 3
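
The error message itself points to PYTORCH_CUDA_ALLOC_CONF and max_split_size_mb. A minimal sketch of setting it from Python is shown below; the value 128 is an arbitrary assumption to experiment with, and the helper name is illustrative. This only targets fragmentation and will not help if the model simply does not fit in 8 GB.

```python
import os


def set_max_split_size(max_split_size_mb: int, env=os.environ) -> str:
    """Write the allocator setting into `env` and return the value that was set."""
    value = f"max_split_size_mb:{max_split_size_mb}"
    env["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# Call this before importing torch so the CUDA allocator sees the setting.
print(set_max_split_size(128))  # max_split_size_mb:128
```

The same variable can be set in the shell before launching the WebUI instead of in code.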

PermissionError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions.

(DiffSynthStudio) D:\下载\DiffSynth-Studio-main>python -m streamlit run Diffsynth_Studio.py
Traceback (most recent call last):
File "D:\Anaconda\envs\DiffSynthStudio\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Anaconda\envs\DiffSynthStudio\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\__main__.py", line 20, in <module>
main(prog_name="streamlit")
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\cli.py", line 233, in main_run
_main_run(target, args, flag_options=kwargs)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\cli.py", line 269, in _main_run
bootstrap.run(file, is_hello, args, flag_options)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\bootstrap.py", line 430, in run
asyncio.run(run_server())
File "D:\Anaconda\envs\DiffSynthStudio\lib\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "D:\Anaconda\envs\DiffSynthStudio\lib\asyncio\base_events.py", line 647, in run_until_complete
return future.result()
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\bootstrap.py", line 418, in run_server
await server.start()
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\server\server.py", line 262, in start
start_listening(app)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\server\server.py", line 129, in start_listening
start_listening_tcp_socket(http_server)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\streamlit\web\server\server.py", line 188, in start_listening_tcp_socket
http_server.listen(port, address)
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\tornado\tcpserver.py", line 183, in listen
sockets = bind_sockets(
File "D:\Anaconda\envs\DiffSynthStudio\lib\site-packages\tornado\netutil.py", line 162, in bind_sockets
sock.bind(sockaddr)
PermissionError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions.
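
The traceback fails inside sock.bind, which suggests that the port Streamlit tried to listen on is reserved or blocked on this machine. One possible workaround, sketched here with the standard library, is to ask the operating system for a port that can be bound and pass it to Streamlit's --server.port option. The script name in the printed command is copied from the issue, and find_free_port is an illustrative helper.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a port that can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick an available port
        return sock.getsockname()[1]


port = find_free_port()
print(
    "python -m streamlit run DiffSynth_Studio.py "
    f"--server.port {port} --server.address 127.0.0.1"
)
```

The --server.address option also answers the earlier question about where to set the IP, at least for the Streamlit version of the WebUI.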

diffutoon_toon_shading_with_editing_signals is very slow on a 6-second video

As the title says.

RuntimeError: output with shape [77, 12, 1, 1] doesn't match the broadcast shape [77, 12, 77, 77]

I cannot get diffsynth/models/attention.py to run.
Please help. Compared to ComfyUI, I really like your code.
Thanks for your hard work.

output_paths = sample(
File "/code/./diffsynth_w.py", line 148, in sample
output_video = pipe(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/code/./diffsynth/pipelines/stable_diffusion_video.py", line 195, in __call__
prompt_emb_posi = self.prompter.encode_prompt(self.text_encoder, prompt, clip_skip=clip_skip, device=self.device, positive=True).cpu()
File "/code/./diffsynth/prompts/__init__.py", line 84, in encode_prompt
prompt_emb = text_encoder(input_ids, clip_skip=clip_skip)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/code/./diffsynth/models/sd_text_encoder.py", line 68, in forward
embeds = encoder(embeds, attn_mask=attn_mask)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/code/./diffsynth/models/sd_text_encoder.py", line 23, in forward
hidden_states = self.attn(hidden_states, attn_mask=attn_mask)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/code/./diffsynth/models/attention.py", line 76, in forward
return self.torch_forward(hidden_states[0], encoder_hidden_states=encoder_hidden_states, attn_mask=attn_mask)
File "/code/./diffsynth/models/attention.py", line 43, in torch_forward
hidden_states = torch.nn.functional._scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
RuntimeError: output with shape [77, 12, 1, 1] doesn't match the broadcast shape [77, 12, 77, 77]

Ghff

VID_20240126_163708.mp4

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 15 for tensor number 1 in the list.

Hi! :)
I'm really interested in the new Diffutoon pipeline, but whatever input video I use, I get this error:

File "/home/wizard/repositories/DiffSynth-Studio/diffsynth/models/sd_unet.py", line 222, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 15 for tensor number 1 in the list.

I set up the environment as indicated in the README.md and it worked flawlessly. I have no idea what I should look for to fix this: I haven't changed anything in the settings except the input video path and its resolution.

Thank you for the help!!
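
A plausible cause of this "Expected size 16 but got size 15" mismatch is an input resolution that does not survive repeated halving inside the UNet. The sketch below simulates the feature-map sizes under the assumption of a VAE that downsamples by 8 and a UNet with three stride-2 downsampling stages, as in Stable Diffusion 1.x. These factors and the function names are assumptions, not values read from the project.

```python
def unet_skip_sizes_match(pixels: int, vae_factor: int = 8, num_downsamples: int = 3) -> bool:
    """Check whether 2x upsampling can restore every size produced on the way down."""
    size = pixels // vae_factor
    for _ in range(num_downsamples):
        halved = (size + 1) // 2  # a stride-2 convolution rounds odd sizes up
        if halved * 2 != size:    # 2x upsampling cannot reproduce an odd size
            return False
        size = halved
    return True


def round_to_multiple(pixels: int, multiple: int = 64) -> int:
    """Snap a width or height to the nearest multiple (at least one multiple)."""
    return max(multiple, round(pixels / multiple) * multiple)


print(unet_skip_sizes_match(480))  # False: 60 -> 30 -> 15 -> 8, and 8 * 2 = 16, not 15
print(unet_skip_sizes_match(512))  # True
print(round_to_multiple(1080))     # 1088
```

Under these assumptions, choosing a width and height that are both multiples of 64 avoids the mismatch, which may also explain the similar errors reported for 16:9 inputs.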

Please help: it worked fine last week, but today I suddenly get IndexError: 140

Traceback:
File "C:\Users\DL\anaconda3\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
exec(code, module.__dict__)
File "F:\CYQ\AI\DiffSynth-Studio\pages\2_Video_Creator.py", line 197, in <module>
SDVideoPipelineRunner(in_streamlit=True).run(config)
File "F:\CYQ\AI\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 337, in run
config["pipeline"]["pipeline_inputs"] = self.add_data_to_pipeline_inputs(config["data"], config["pipeline"]["pipeline_inputs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\CYQ\AI\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 315, in add_data_to_pipeline_inputs
pipeline_inputs["input_frames"] = self.load_video(**data["input_frames"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\CYQ\AI\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 310, in load_video
frames = [video[i] for i in range(start_frame_id, end_frame_id)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\CYQ\AI\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 310, in <listcomp>
frames = [video[i] for i in range(start_frame_id, end_frame_id)]
~~~~~^^^
File "F:\CYQ\AI\DiffSynth-Studio\diffsynth\data\video.py", line 121, in __getitem__
frame = self.data.__getitem__(item)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\CYQ\AI\DiffSynth-Studio\diffsynth\data\video.py", line 15, in __getitem__
return Image.fromarray(np.array(self.reader.get_data(item))).convert("RGB")
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DL\anaconda3\Lib\site-packages\imageio\core\format.py", line 437, in get_data
raise IndexError(index)

generate video error, model weight error?

Traceback (most recent call last):
File "/root/miniconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
exec(code, module.__dict__)
File "/root/autodl-tmp/DiffSynth-Studio/pages/2_Video_Creator.py", line 197, in <module>
SDVideoPipelineRunner(in_streamlit=True).run(config)
File "/root/autodl-tmp/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 340, in run
model_manager, pipe = self.load_pipeline(**config["models"])
File "/root/autodl-tmp/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 271, in load_pipeline
model_manager.load_textual_inversions(textual_inversion_folder)
File "/root/autodl-tmp/DiffSynth-Studio/diffsynth/models/__init__.py", line 181, in load_textual_inversions
state_dict = load_state_dict(os.path.join(folder, file_name))
File "/root/autodl-tmp/DiffSynth-Studio/diffsynth/models/__init__.py", line 244, in load_state_dict
return load_state_dict_from_bin(file_path, torch_dtype=torch_dtype)
File "/root/autodl-tmp/DiffSynth-Studio/diffsynth/models/__init__.py", line 258, in load_state_dict_from_bin
state_dict = torch.load(file_path, map_location="cpu")
File "/root/miniconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/serialization.py", line 1040, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/miniconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/serialization.py", line 1258, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '-'.

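PyTorch was not compiled with Flash Attention enabled

Getting this to work has been a very bumpy ride, and I really want to try this application. Could someone who understands this issue take a look?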

D:\AIGC\DiffSynth-Studio-main\diffsynth\models\attention.py:43: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
hidden_states = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

It stays stuck here.

How to use this? Where are the models?

Can anyone provide a short guide on how to use this, and where I can download the models?
Thank you.

Please help, why are 2 GIFs generated?

I really like the video. Thank you!

Just testing

ERROR: Sizes of tensors must match except in dimension 1. Expected size 136 but got size 135 for tensor number 1 in the list --which occurs at dancer.py

!!! Exception during processing !!!
Traceback (most recent call last):
File "/data/comfy-ui/execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "/data/comfy-ui/execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/data/comfy-ui/execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/nodes.py", line 71, in stylize
DiffSynthService().stylize(video_file_path, width, height, frames, fps, output_dir, TARGET_FPS, prompt, neg_prompt,stage1_infer_steps,stage2_infer_steps)
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth_service.py", line 181, in stylize
runner.run(config_stage_1)
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth/pipelines/stable_diffusion_video.py", line 358, in run
output_video = self.synthesize_video(model_manager, pipe, config["pipeline"]["seed"], smoother, **config["pipeline"]["pipeline_inputs"])
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth/pipelines/stable_diffusion_video.py", line 304, in synthesize_video
output_video = pipe(**pipeline_inputs, smoother=smoother)
File "/data/comfy-ui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth/pipelines/stable_diffusion_video.py", line 226, in __call__
noise_pred_posi = lets_dance_with_long_video(
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth/pipelines/stable_diffusion_video.py", line 43, in lets_dance_with_long_video
hidden_states_batch = lets_dance(
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth/pipelines/dancer.py", line 72, in lets_dance
hidden_states, time_emb, text_emb, res_stack = block(hidden_states, time_emb, text_emb, res_stack)
File "/data/comfy-ui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/comfy-ui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/comfy-ui/custom_nodes/comfyui-cartoon-stylization/diffsynth/models/sd_unet.py", line 222, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 136 but got size 135 for tensor number 1 in the list.