
amblyopius / stable-diffusion-onnx-fp16

Example code and documentation on how to get Stable Diffusion running with ONNX FP16 models on DirectML. Runs accelerated on all DirectML-supported cards, including AMD and Intel.

License: GNU General Public License v3.0

Language: Python 100.00%
Topics: onnx, stablediffusion

stable-diffusion-onnx-fp16's People

Contributors

amblyopius, nahasapeemapetilon



stable-diffusion-onnx-fp16's Issues

About onnxruntime-directml

Instead of onnxruntime-directml, I installed with "pip install onnxruntime". When running

python conv_sd_to_onnx.py --model_path "./stable-diffusion-v1-5" --output_path "./model/sd1.5_base-fp32"

I got this error:

onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for BiasSplitGelu(1) node with name 'BiasSplitGelu_0'

Is it a must to use onnxruntime-directml?

onnxruntime-directml not supported

onnxruntime-directml is not supported on all systems. It is not needed for the ONNX conversion itself, but it causes the requirements installation to fail.

For my system (Linux, x64, CUDA) I installed onnxruntime-gpu and test-txt2img.py worked just fine.

I suggest removing this library from requirements.txt and adding a note to the README that people should install an onnxruntime package themselves before the test run.
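One way to keep DirectML for Windows users while unblocking other platforms would be PEP 508 environment markers in requirements.txt; a sketch, not the repo's actual file:

    onnxruntime-directml; sys_platform == 'win32'
    onnxruntime; sys_platform != 'win32'

pip would then install the DirectML build only on Windows and fall back to the CPU package elsewhere (Linux/CUDA users could still swap in onnxruntime-gpu manually).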


Inpainting errors in the OnnxDiffusersUI interface

First of all, thank you, author: you saved my AMD graphics card. No matter what model I use for inpainting in OnnxDiffusersUI, I get an error.
I read https://github.com/azuritecoin/OnnxDiffusersUI#inpainting-fix
The author says: if inpainting does not work for you, follow these steps from de_inferno#6407 on Discord to fix it.
Within: virtualenv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion_inpaint_legacy.py
Find (likely on line 402): sample=latent_model_input, timestep=np.array([t]), encoder_hidden_states=prompt_embeds
Replace with: sample=latent_model_input, timestep=np.array([t], dtype="float32"), encoder_hidden_states=prompt_embeds
But I can't find that file. What do I do? Thanks again.
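For context, the patch most likely works because np.array([t]) infers the dtype from the scheduler's timestep (often int64 or float64), while the exported UNet declares its timestep input with a fixed element type; forcing float32 makes the feed match. A tiny illustration, assuming a float32 timestep input as in these converted models:

    import numpy as np

    t = 801  # a scheduler timestep
    print(np.array([t]).dtype)                   # int64, rejected by the ONNX input check
    print(np.array([t], dtype="float32").dtype)  # float32, matches the UNet's timestep input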

How to convert a .safetensors model

I tried to convert a .safetensors model, but couldn't.

The source of the model is civitai

https://civitai.com/models/14171/cutegirlmix4

I use v1-inference.yaml

global_step key not found in model
Traceback (most recent call last):
File "D:\python\diffusion-convert-fp32\conv_sd_to_onnx.py", line 523, in
pl = load_pipeline_from_original_stable_diffusion_ckpt(
File "D:\python\diffusion-convert-fp32\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\convert_from_ckpt.py", line 1137, in load_pipeline_from_original_stable_diffusion_ckpt
converted_unet_checkpoint = convert_ldm_unet_checkpoint(
File "D:\python\diffusion-convert-fp32\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\convert_from_ckpt.py", line 380, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
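The "global_step key not found" warning together with KeyError: 'time_embed.0.weight' means the converter never found the UNet weights in the file. One thing worth checking is whether the file is actually being parsed as safetensors. A minimal sketch calling the same diffusers helper the script uses (visible in the traceback above); the local filename is hypothetical:

    from diffusers.pipelines.stable_diffusion.convert_from_ckpt import (
        load_pipeline_from_original_stable_diffusion_ckpt,  # renamed download_... in diffusers >= 0.15
    )

    pipe = load_pipeline_from_original_stable_diffusion_ckpt(
        checkpoint_path="cuteGirlMix4.safetensors",   # hypothetical local filename
        original_config_file="v1-inference.yaml",
        from_safetensors=True,  # parse the file as safetensors, not a pickled ckpt
    )
    pipe.save_pretrained("./cutegirlmix4-diffusers")

If from_safetensors is already set, the same missing-key error can also mean the file is not a full checkpoint (for example a LoRA or a VAE-only file).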

ControlNet ONNX model usable in Stable Diffusion WebUI DirectML?

Hello,
I wanted to know whether it is possible to use a model converted with ControlNet in Stable Diffusion WebUI DirectML. I did a test and encountered this error:

raise ValueError(
ValueError: Required inputs (['down_block_0', 'down_block_1', 'down_block_2', 'down_block_3', 'down_block_4', 'down_block_5', 'down_block_6', 'down_block_7', 'down_block_8', 'down_block_9', 'down_block_10', 'down_block_11', 'mid_block_additional_residual']) are missing from the input feed (['sample', 'timestep', 'encoder_hidden_states']).


Do you have any ideas on how to solve this problem?
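The ValueError suggests the UNet was exported with the ControlNet residuals as required graph inputs, so the stock feed of sample/timestep/encoder_hidden_states can no longer satisfy it. A quick way to confirm what a converted model actually expects (the path below is illustrative):

    import onnxruntime as ort

    sess = ort.InferenceSession("model/unet/model.onnx", providers=["CPUExecutionProvider"])
    for inp in sess.get_inputs():
        print(inp.name, inp.shape, inp.type)

A frontend that is unaware of ControlNet would have to be patched to run the ControlNet session first and add its 13 residual outputs (down_block_0 through down_block_11 plus mid_block_additional_residual) to the UNet input feed.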

DirectML not supported on Linux

Hi, can't we use onnxruntime-gpu for converting a model to ONNX?
On CPU it takes about 30 minutes and 12 GB of RAM per model.
I tried changing CPUExecutionProvider to CUDAExecutionProvider after installing onnxruntime-gpu, but it still loaded everything on the CPU.
I am using Colab.
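For what it's worth, onnxruntime only binds CUDA if the CUDAExecutionProvider is actually present in the installed package; a quick sanity check using real onnxruntime APIs (the model path is illustrative):

    import onnxruntime as ort

    print(ort.get_available_providers())  # lists 'CUDAExecutionProvider' when onnxruntime-gpu is active
    sess = ort.InferenceSession(
        "model/unet/model.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    print(sess.get_providers())  # shows which provider was actually bound

Note that the conversion itself runs through torch.onnx.export (PyTorch), so the ONNX provider mainly affects the post-conversion test inference, not the export step.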

Textual inversion support

Hi,

I might be talking nonsense here since I have little idea about the code; I just wanted to create a desktop frontend (TypeScript) using this as the backend (I'm still learning Python).

Anyhow, I wanted to know whether the script supports textual inversion (embeddings) like Automatic1111 does. Or is there API documentation I can start reading so I can try to implement it?

Bad output on Intel A770

You wanted to know how it ran on Intel.

[Image: SD 2.1 large, base fp-16 top, base fp-32 bottom.]

I've tried converting an earlier version of SD and changing the steps and guidance scale, but the output is always bad.

diffusers 0.15.0 breaks conv_sd_to_onnx.py

load_pipeline_from_original_stable_diffusion_ckpt is now download_from_original_stable_diffusion_ckpt.

Renaming the two occurrences accordingly appears to work.
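Until the script is updated, a small compatibility shim covering both diffusers versions (a sketch using only the two names mentioned above; load_sd_ckpt is a hypothetical alias):

    try:
        # diffusers >= 0.15
        from diffusers.pipelines.stable_diffusion.convert_from_ckpt import (
            download_from_original_stable_diffusion_ckpt as load_sd_ckpt,
        )
    except ImportError:
        # older diffusers
        from diffusers.pipelines.stable_diffusion.convert_from_ckpt import (
            load_pipeline_from_original_stable_diffusion_ckpt as load_sd_ckpt,
        )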

Failed to export Stable Diffusion model with UnsupportedOperatorError

Hi, I tried to export SD to ONNX but ran into a conversion error when running the export command documented in the README.

How can I solve it?

> python3 conv_sd_to_onnx.py --model_path "stabilityai/stable-diffusion-2-1-base" --output_path "./model/sd2_1base-fp16" --fp16
...
======================= 0 NONE 0 NOTE 0 WARNING 1 ERROR ========================
ERROR: missing-standard-symbolic-function
=========================================
Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 17 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
None
<Set verbose=True to see more details>


Traceback (most recent call last):
  File "/home/marty/Documents/Stable-Diffusion-ONNX-FP16/conv_sd_to_onnx.py", line 658, in <module>
    convert_models(pl, args.output_path,
  File "/home/marty/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/marty/Documents/Stable-Diffusion-ONNX-FP16/conv_sd_to_onnx.py", line 375, in convert_models
    onnx_export(
  File "/home/marty/Documents/Stable-Diffusion-ONNX-FP16/conv_sd_to_onnx.py", line 125, in onnx_export
    export(
  File "/home/marty/.local/lib/python3.10/site-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/home/marty/.local/lib/python3.10/site-packages/torch/onnx/utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/marty/.local/lib/python3.10/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
  File "/home/marty/.local/lib/python3.10/site-packages/torch/onnx/utils.py", line 665, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/home/marty/.local/lib/python3.10/site-packages/torch/onnx/utils.py", line 1901, in _run_symbolic_function
    raise errors.UnsupportedOperatorError(
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 17 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
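The operator comes from PyTorch 2.x; as the message says, the exporter has no ONNX symbolic for aten::scaled_dot_product_attention at opset 17. Two known workarounds: pin torch below 2.0 (see the "Torch 2.0 not compatible" issue below), or swap the SDPA attention processor for the classic one before exporting. A sketch of the latter, assuming a diffusers version that ships AttnProcessor:

    from diffusers import StableDiffusionPipeline
    from diffusers.models.attention_processor import AttnProcessor

    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
    # Replace AttnProcessor2_0 (which calls scaled_dot_product_attention under torch 2.x)
    # with the manual-math processor that traces to plain ONNX ops.
    pipe.unet.set_attn_processor(AttnProcessor())
    # ...then run the export as usual; the trace no longer hits the unsupported op.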

Support Long Prompts

Is there a way to modify the pipeline before converting while still using custom models? There are custom pipelines that support long prompts (see here), but so far I can't see a way to use them. You have provided your own custom pipelines for ControlNet etc., but those replace the model (as far as I understand; I am not that great with Python and diffusers yet).
Is there a way to provide a custom pipeline (or add first-class support for long prompts) while still loading custom models?
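For context, the usual long-prompt trick in those custom pipelines is to encode the prompt in windows of tokenizer.model_max_length (77 for CLIP) and concatenate the per-window embeddings along the sequence axis before handing them to the UNet. An illustrative sketch (a hypothetical helper, not code from this repo), written against the diffusers ONNX text-encoder call convention:

    import numpy as np

    def encode_long_prompt(tokenizer, text_encoder, prompt):
        # Tokenize without truncation, then encode in full 77-token windows.
        ids = tokenizer(prompt, truncation=False, return_tensors="np").input_ids[0]
        window = tokenizer.model_max_length  # 77 for CLIP
        embeds = []
        for i in range(0, len(ids), window):
            chunk = ids[i:i + window]
            if len(chunk) < window:  # pad the last window to a full 77 tokens
                pad = np.full(window - len(chunk), tokenizer.pad_token_id, dtype=chunk.dtype)
                chunk = np.concatenate([chunk, pad])
            # diffusers' ONNX text encoder is called with int32 input_ids
            embeds.append(text_encoder(input_ids=chunk[None].astype(np.int32))[0])
        # Concatenate along the sequence axis: shape (1, n_windows * 77, hidden_dim)
        return np.concatenate(embeds, axis=1)

Real implementations also re-insert the BOS/EOS tokens per window and keep the conditional and unconditional embeddings the same length; the exported UNet should accept the longer sequence because the encoder_hidden_states sequence axis is typically exported as dynamic.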

"invalid value encountered in cast" at images = (images * 255).round().astype("uint8")

I ran into this problem when I executed

python conv_sd_to_onnx.py --model_path "stabilityai/stable-diffusion-2-1" --output_path "./model/sd2_1-fp16" --fp16
python test-txt2img.py --model "model\sd2_1-fp16" --size 768 --seed 0

and

python conv_sd_to_onnx.py --model_path "stabilityai/stable-diffusion-2-1" --output_path "./model/sd2_1-fp16-autoslicing" --fp16 --attention-slicing auto
python test-txt2img.py --model "model\sd2_1-fp16-autoslicing" --size 768 --seed 0

The resulting image is completely black.

Does anyone know how to deal with it?

Thank you very much!
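The "invalid value encountered in cast" warning means the decoded image array contains NaNs before the uint8 cast, which renders as a black image. A quick check using real diffusers/numpy APIs, with the model path from the commands above:

    import numpy as np
    from diffusers import OnnxStableDiffusionPipeline

    pipe = OnnxStableDiffusionPipeline.from_pretrained(
        "model/sd2_1-fp16", provider="DmlExecutionProvider"
    )
    out = pipe("an astronaut riding a horse", num_inference_steps=20, output_type="np")
    print(np.isnan(out.images).any())  # True means fp16 overflow/NaNs in the UNet or VAE

If it prints True, the fp16 graph is overflowing somewhere; SD 2.1 in particular is known to be prone to fp16 attention overflow, which is what options like --ckpt-upcast-attention (mentioned in another issue here) exist to address.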

ControlNet for Stable Diffusion 2.1

I tried to use the conv_sd_to_onnx.py script to convert thibaud's Canny ControlNet for SD 2.1 to ONNX format, but it did not work; specifically, a shape mismatch error occurred. The command I used was:

python conv_sd_to_onnx.py --model_path "stabilityai/stable-diffusion-2-1" --controlnet_path "thibaud/controlnet-sd21-canny-diffusers" --output_path "models/sd21-canny" --attention-slicing auto --ckpt-upcast-attention --fp16

It would be great if converting ControlNet for SD 2.1 to ONNX format were possible. For now I am using ControlNet 1.0 and 1.1 with SD 1.5 in ONNX, and that's enough, but it would be great to have ControlNet for SD 2.1.

Fails to run the scripts.

I get this error when I try to run either .py script; everything appeared to install correctly.

(sd_env_conv) G:\SD\sd3>python diffusers_to_onnx_optim.py --model_path "stabilityai/stable-diffusion-2-1-base" --output_path "./model/sd2_1base-fp32"
Traceback (most recent call last):
  File "G:\SD\sd3\diffusers_to_onnx_optim.py", line 46, in <module>
    from diffusers.models import AutoencoderKL
  File "G:\SD\sd3\sd_env_conv\lib\site-packages\diffusers\__init__.py", line 4, in <module>
    from .onnx_utils import OnnxRuntimeModel
  File "G:\SD\sd3\sd_env_conv\lib\site-packages\diffusers\onnx_utils.py", line 31, in <module>
    import onnxruntime as ort
  File "G:\SD\sd3\sd_env_conv\lib\site-packages\onnxruntime\__init__.py", line 55, in <module>
    raise import_capi_exception
  File "G:\SD\sd3\sd_env_conv\lib\site-packages\onnxruntime\__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "G:\SD\sd3\sd_env_conv\lib\site-packages\onnxruntime\capi\_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: DLL load failed while importing onnxruntime_pybind11_state: The specified module could not be found.

ControlNet Annotators to ONNX

At the moment I am able to use Canny Edge Detection, Pose Estimation, and Semantic Segmentation (all in JavaScript thanks to OpenCV.js) in diffusers.js for the annotation (pre-processing) of images used as input for various ControlNets. These are the only ones I've managed to implement so far, but there are more annotator models, such as Scribble, HED, and M-LSD, for which I have not found a JavaScript implementation.

It would be useful to have these models in ONNX format so they can be called the same way as a UNet, VAE, or other model. A script that converts the annotator models to ONNX format, or the models published directly in ONNX format so they can be used in diffusers.js, would be an ideal solution if possible.

I believe the original implementation of the Annotator models is controlnet_aux.

Error installing UI

After cloning the UI and launching the command python OnnxDiffusersUI\onnxUI.py,
I receive the following error message:

File "H:\Stable-Diffusion-ONNX-FP16\OnnxDiffusersUI\onnxUI.py", line 14, in <module>
from diffusers import (
ImportError: cannot import name 'OnnxStableDiffusionInpaintPipelineLegacy' from 'diffusers' (H:\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\__init__.py)

I tried reinstalling everything, verified all dependencies, and so on; when I run python OnnxDiffusersUI\onnxUI.py I receive the same message. Could someone point me to a workaround?
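OnnxStableDiffusionInpaintPipelineLegacy only exists in certain diffusers releases, so this import error usually points to a version mismatch rather than a broken install. A minimal check:

    import importlib.metadata

    print(importlib.metadata.version("diffusers"))
    try:
        from diffusers import OnnxStableDiffusionInpaintPipelineLegacy  # noqa: F401
        print("legacy ONNX inpaint pipeline available")
    except ImportError:
        print("not shipped by this diffusers version; install the version pinned in requirements.txt")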

Text2Video and/or Img2Video Conversion Script to ONNX

Not sure if I'm getting ahead of myself since some of the models I will mention are new, but I am not able to convert certain Text2Video or Img2Video models from Diffusers format to ONNX format. Some of the models are:

  1. Text2Video
  2. Img2Video

It would be convenient to have a script that converts Text2Video or Img2Video models from Diffusers format to ONNX format. I tried the conv_sd_to_onnx.py script to see if it would work by any chance, but it did not.

Error when converting runwayml/stable-diffusion-v1-5

My command:
python conv_sd_to_onnx.py --model_path "runwayml/stable-diffusion-v1-5" --output_path "./model/sd1_5base"

Error:

(sd) PS D:\stable-diffusion\Stable-Diffusion-ONNX-FP16> python conv_sd_to_onnx.py --model_path "runwayml/stable-diffusion-v1-5" --output_path "./model/sd1_5base"
text_encoder\model.safetensors not found
Downloading (…)e6a/unet/config.json: 100%|████████████| 743/743 [00:00<?, ?B/s]
C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\LiAng\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
Fetching 15 files: 47%|██████████████████████████████████████████████████████▏ | 7/15 [00:11<00:12, 1.62s/it]
Traceback (most recent call last):
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\connection.py", line 454, in getresponse
httplib_response = super().getresponse()
File "C:\Users\LiAng\miniconda3\envs\sd\lib\http\client.py", line 1375, in getresponse
response.begin()
File "C:\Users\LiAng\miniconda3\envs\sd\lib\http\client.py", line 318, in begin
version, status, reason = self._read_status()
File "C:\Users\LiAng\miniconda3\envs\sd\lib\http\client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\LiAng\miniconda3\envs\sd\lib\socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\ssl.py", line 1274, in recv_into
return self.read(nbytes, buffer)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\ssl.py", line 1130, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\util\retry.py", line 470, in increment
raise reraise(type(error), error, _stacktrace)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\util\util.py", line 39, in reraise
raise value
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\connectionpool.py", line 790, in urlopen
response = self._make_request(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\connectionpool.py", line 538, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\urllib3\connectionpool.py", line 370, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\stable-diffusion\Stable-Diffusion-ONNX-FP16\conv_sd_to_onnx.py", line 619, in
pl = StableDiffusionPipeline.from_pretrained(args.model_path,
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 908, in from_pretrained
cached_folder = cls.download(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1349, in download
cached_folder = snapshot_download(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub_snapshot_download.py", line 235, in snapshot_download
thread_map(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\tqdm\contrib\concurrent.py", line 69, in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\tqdm\contrib\concurrent.py", line 51, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\tqdm\std.py", line 1178, in iter
for obj in iterable:
File "C:\Users\LiAng\miniconda3\envs\sd\lib\concurrent\futures_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "C:\Users\LiAng\miniconda3\envs\sd\lib\concurrent\futures_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\concurrent\futures_base.py", line 458, in result
return self.__get_result()
File "C:\Users\LiAng\miniconda3\envs\sd\lib\concurrent\futures_base.py", line 403, in __get_result
raise self._exception
File "C:\Users\LiAng\miniconda3\envs\sd\lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub_snapshot_download.py", line 211, in _inner_hf_hub_download
return hf_hub_download(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\file_download.py", line 1364, in hf_hub_download
http_get(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\file_download.py", line 505, in http_get
r = _request_wrapper(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\file_download.py", line 442, in _request_wrapper
return http_backoff(
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\huggingface_hub\utils_http.py", line 212, in http_backoff
response = session.request(method=method, url=url, **kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "C:\Users\LiAng\miniconda3\envs\sd\lib\site-packages\requests\adapters.py", line 532, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10.0)

How to use a VAE file ending in *.vae.pt

Lots of models use their own VAE file named *.vae.pt.
Now I have some VAE files (*.vae.pt) and am trying to use them.
How can I combine one (*.vae.pt) with multiple models? Do I move the *.vae.pt to models/model/vae?

gradio install step needed

I may have missed it, but going through the installation instructions it seems pip install gradio was missing. I received an error at the UI run step; installing gradio fixed it.

SD-to-ONNX conversion error

When I convert SD to ONNX, there is an error and I don't know why.
Here is the error:

global_step key not found in model
Traceback (most recent call last):
  File "D:\Stable-Diffusion-ONNX-FP16\conv_sd_to_onnx.py", line 410, in <module>
    pl = load_pipeline_from_original_stable_diffusion_ckpt(
  File "D:\Stable-Diffusion-ONNX-FP16\lib\site-packages\diffusers\pipelines\stable_diffusion\convert_from_ckpt.py", line 1078, in load_pipeline_from_original_stable_diffusion_ckpt
    converted_unet_checkpoint = convert_ldm_unet_checkpoint(
  File "D:\Stable-Diffusion-ONNX-FP16\lib\site-packages\diffusers\pipelines\stable_diffusion\convert_from_ckpt.py", line 368, in convert_ldm_unet_checkpoint
    new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'

Torch 2.0 not compatible

I couldn't get this running with Torch 2.0, so I reinstalled Torch 1.13.1 and then it worked. Maybe the requirements file should be changed accordingly.
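For reference, the pin that matches what worked here:

    pip install torch==1.13.1

Until the aten::scaled_dot_product_attention export issue above is resolved, requirements.txt could cap the version the same way (torch<2.0).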

Do the ONNX models produced this way support batch size above 1?

I am working on a C++ and DirectML based Stable Diffusion app. I have experimented with this tool and generated an ONNX model from Realistic Vision 1.4 with FP16 + autoslicing. The model works fine as long as my batch size is 1; if I go above 1, only the first image is good (or even that fails, depending on input).

I thought the problem was my code, but I have tried with this model and it works fine with the same code.

Another possibility is that the converted models expect the data for multiple batches in a different way. I tried multiple configurations, but at most only the first image was good with the converted models; the others did not work.

Some inputs I have tried for batch size 3:

  • sample [A, A, B, B, C, C]
  • encoder_hidden_states [ Uncond, Cond ]
  • output expectation [ AU, AC, BU, BC, CU, CC ]

Converted model: first image good, rest are bad
Reference model: fail to run

  • sample [A, A, B, B, C, C]
  • encoder_hidden_states [ Uncond, Cond, Uncond, Cond, Uncond, Cond ]
  • output expectation [ AU, AC, BU, BC, CU, CC ]

Converted model: first image good, rest are bad
Reference model: all ok

  • sample [A, B, C, A, B, C ]
  • encoder_hidden_states [ Uncond, Uncond, Uncond, Cond, Cond, Cond ]
  • output expectation [ AU, BU, CU, AC, BC, CC ]

Converted model: all fail
Reference model: all ok

Do you have any idea what is the issue?
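For reference, here is a sketch of the batch layout the stock diffusers ONNX pipeline uses for classifier-free guidance, which matches the third configuration above (dummy shapes, session call commented out):

    import numpy as np

    batch = 3
    latents = np.zeros((batch, 4, 64, 64), dtype=np.float32)   # images A, B, C
    uncond = np.zeros((batch, 77, 768), dtype=np.float32)      # U, U, U
    cond = np.ones((batch, 77, 768), dtype=np.float32)         # C, C, C

    sample = np.concatenate([latents] * 2)    # [A, B, C, A, B, C]
    hidden = np.concatenate([uncond, cond])   # [U, U, U, C, C, C]
    # noise_pred = unet.run(None, {"sample": sample, "timestep": t,
    #                              "encoder_hidden_states": hidden})[0]
    # noise_uncond, noise_cond = np.split(noise_pred, 2)

Since that layout fails only on the converted model, the conversion itself (for example fp16 numeric behavior at batch sizes above 1) looks like a more likely suspect than the input ordering.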

BTW, I have also noticed that with FP16 without autoslicing I cannot generate 512x512 px images with batch size 2, as I badly run out of memory with 12 GB of VRAM. This might be completely normal, but the reference model, which uses a lot more resources at batch size 1, still runs fine at batch size 2, so its resource use does not increase as much.

Great tool though! I'm very glad it exists.

Getting an error installing OnnxUI

Traceback (most recent call last):
  File "C:\Users\minn\Downloads\Stable Diffusion\Stable-Diffusion-ONNX-FP16\OnnxDiffusersUI\onnxUI.py", line 1288, in <module>
    image_t2 = gr.Image(
  File "C:\Users\minn\Downloads\Stable Diffusion\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\gradio\component_meta.py", line 146, in wrapper
    return fn(self, **kwargs)
TypeError: Image.__init__() got an unexpected keyword argument 'source'
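gr.Image lost its source keyword in Gradio 4 (it became sources), so this looks like a Gradio major-version mismatch; assuming the UI was written against Gradio 3, pinning the older major version is the likely fix:

    pip install "gradio<4"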

Use local common conversion resources

Hi, there is a model I am trying to convert. The file is from civitai.com, so it is a single .safetensors file, and I have successfully tested it via the Automatic1111 WebUI. I am new to this, so when I try to convert it, the script starts downloading a 1.22 GB file named pytorch_model, and since my network is slow, it gets cut off halfway. My question is: is there a way I can reuse the required files from the Automatic1111 WebUI folder so the conversion won't need to download anything at conversion time? BTW, I have been using the same models folder for testing with the Automatic1111 WebUI, EasyDiffusion, and ComfyUI, and they all work fine; ComfyUI and Automatic1111 even share the same venv.
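One approach, assuming flaky networking is the only blocker: pre-download the Hugging Face side once with resume support, then point the conversion at the local copy (snapshot_download and resume_download are real huggingface_hub APIs; the repo id is illustrative):

    from huggingface_hub import snapshot_download

    local_dir = snapshot_download("runwayml/stable-diffusion-v1-5", resume_download=True)
    print(local_dir)  # pass this path to conv_sd_to_onnx.py --model_path

The 1.22 GB pytorch_model file is plausibly one of the supporting models the converter fetches (such as the safety checker) rather than your .safetensors checkpoint itself.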

Generated image is always a completely black canvas

I'm trying to use the tool (CPU only) and I'm following the guide step by step. However, the generated image is always a completely black canvas.

First I downloaded the model and converted it to ONNX models:

Conversion log
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python conv_sd_to_onnx.py --model_path "d:/stable-diffusion-2-1-base" --output_path "d:/stable-diffusion-2-1-base_onnx" --fp16
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\symbolic_opset9.py:5742: UserWarning: Exporting aten::index operator of advanced indexing in opset 15 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\unet_2d_condition.py:650: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:205: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:127: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:140: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\unet_2d_condition.py:793: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\autoencoder_kl.py:168: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\utils.py:691: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\utils.py:1198: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\autoencoder_kl.py:193: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

ONNX pipeline saved to c:\stable-diffusion-2-1-base_onnx
2023-05-17 17:14:46.0759019 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:47.0292714 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:47.0359445 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:14:49.3462450 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:49.4506139 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:49.4578436 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:14:50.3127437 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:50.8373542 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:50.8446672 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:14:55.1853004 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:55.2730131 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:55.2808058 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ONNX pipeline is loadable

Then the generation process

Process log
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python test-txt2img.py --model "d:\stable-diffusion-2-1-base_onnx2" --size 256 --seed 0
2023-05-17 17:18:35.1594695 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:35.2571994 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:35.2638711 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:18:35.5660742 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:35.6494087 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:35.6565771 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:18:36.1736922 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:36.6198002 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:36.6280828 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:18:38.2551350 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:39.0760730 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:39.0841299 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
100%|██████████████████████████████████████████████████████████████████████████████████| 31/31 [01:27<00:00,  2.82s/it]
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\utils\pil_utils.py:38: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")
My list of installed Python packages
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>pip list
Package                Version
---------------------- ---------------------
accelerate             0.19.0
aiofiles               23.1.0
aiohttp                3.8.4
aiosignal              1.3.1
altair                 5.0.0
antlr4-python3-runtime 4.9.3
anyio                  3.6.2
async-timeout          4.0.2
attrs                  23.1.0
blis                   0.7.9
catalogue              2.0.8
certifi                2023.5.7
charset-normalizer     3.1.0
click                  8.1.3
colorama               0.4.6
coloredlogs            15.0.1
confection             0.0.4
contourpy              1.0.7
cycler                 0.11.0
cymem                  2.0.7
diffusers              0.16.1
fastapi                0.95.1
ffmpy                  0.3.0
filelock               3.12.0
flatbuffers            23.5.9
fonttools              4.39.4
frozenlist             1.3.3
fsspec                 2023.5.0
ftfy                   6.1.1
gradio                 3.30.0
gradio_client          0.2.4
h11                    0.14.0
httpcore               0.17.0
httpx                  0.24.0
huggingface-hub        0.14.1
humanfriendly          10.0
idna                   3.4
importlib-metadata     6.6.0
Jinja2                 3.1.2
jsonschema             4.17.3
kiwisolver             1.4.4
langcodes              3.3.0
linkify-it-py          2.0.2
markdown-it-py         2.2.0
MarkupSafe             2.1.2
matplotlib             3.7.1
mdit-py-plugins        0.3.3
mdurl                  0.1.2
mpmath                 1.3.0
multidict              6.0.4
murmurhash             1.0.9
networkx               3.1
numpy                  1.24.3
omegaconf              2.3.0
onnx                   1.14.0
onnxconverter-common   1.13.0
onnxruntime-directml   1.14.1
opencv-python          4.7.0.72
orjson                 3.8.12
packaging              23.1
pandas                 2.0.1
pathy                  0.10.1
Pillow                 9.5.0
pip                    23.1.2
preshed                3.0.8
protobuf               4.23.0
psutil                 5.9.5
pydantic               1.10.7
pydub                  0.25.1
Pygments               2.15.1
pyparsing              3.0.9
pyreadline3            3.4.1
pyrsistent             0.19.3
python-dateutil        2.8.2
python-multipart       0.0.6
pytz                   2023.3
PyYAML                 6.0
regex                  2023.5.5
requests               2.30.0
safetensors            0.3.1
scipy                  1.10.1
semantic-version       2.10.0
setuptools             63.2.0
six                    1.16.0
smart-open             6.3.0
sniffio                1.3.0
spacy                  3.5.2
spacy-legacy           3.0.12
spacy-loggers          1.0.4
srsly                  2.4.6
starlette              0.26.1
sympy                  1.12
thinc                  8.1.10
tokenizers             0.13.3
toolz                  0.12.0
torch                  2.1.0.dev20230514+cpu
tqdm                   4.65.0
transformers           4.29.1
typer                  0.7.0
typing_extensions      4.5.0
tzdata                 2023.3
uc-micro-py            1.0.2
urllib3                2.0.2
uvicorn                0.22.0
wasabi                 1.1.1
wcwidth                0.2.6
websockets             11.0.3
yarl                   1.9.2
zipp                   3.15.0

Can anyone tell me what I have done wrong here, please?
