voltaml / voltaml-fast-stable-diffusion

Beautiful and Easy to use Stable Diffusion WebUI

Home Page: https://voltaml.github.io/voltaML-fast-stable-diffusion/

License: GNU General Public License v3.0

Python 72.05% Shell 0.13% HTML 0.02% CSS 0.08% JavaScript 0.02% Vue 19.89% TypeScript 4.15% Rust 3.06% Batchfile 0.03% C++ 0.12% MATLAB 0.46%
ai-art aitemplate generative-art linux naive-ui python pytorch rust stable-diffusion text2image typescript vue webui windows

voltaml-fast-stable-diffusion's Introduction


Accelerate your machine learning and deep learning models by up to 10X

🔥UPDATE: Stable-Diffusion/DreamBooth acceleration. Up to 2.5X speed-up in inference🔥

voltaML is an open-source lightweight library to accelerate your machine learning and deep learning models. VoltaML can optimize, compile and deploy your models to your target CPU and GPU devices, with just one line of code.


Out of the box support for

✅ FP16 Quantization

✅ Int8 Quantization*

✅ Hardware specific compilation




voltaML has compilation support for the following:

[Image: supported frameworks and hardware targets]

Installation

Own setup:

Requirements:

  • CUDA Version >11.x
  • TensorRT == 8.4.1.2
  • PyTorch == 1.12 cu11.x
  • NVIDIA Driver version > 510
git clone https://github.com/VoltaML/voltaML.git
cd voltaML
python setup.py install

Docker Container 🐳

docker pull voltaml/voltaml:v0.4
docker run -it --gpus=all -p "8888:8888" voltaml/voltaml:v0.4 \
        jupyter lab --port=8888 --no-browser --ip 0.0.0.0 --allow-root

Usage

import torch
from voltaml.compile import VoltaGPUCompiler, VoltaCPUCompiler, TVMCompiler
from voltaml.inference import gpu_performance, cpu_performance

model = torch.load("path/to/model/dir")

# compile the model by giving paths
compiler = VoltaGPUCompiler(
        model=model,
        output_dir="destination/path/of/compiled/model",
        input_shape=(1, 3, 224, 224), # example input shape
        precision="fp16" # specify precision[fp32, fp16, int8] - Only for GPU compiler
        target="llvm" # specify target device - Only for TVM compiler
    )

# returns the compiled model
compiled_model = compiler.compile()

# compute and compare performance
gpu_performance(compiled_model, model, input_shape=(1, 3, 224, 224))
cpu_performance(compiled_model, model, compiler="voltaml", input_shape=(1, 3, 224, 224))
cpu_performance(compiled_model, model, compiler="tvm", input_shape=(1, 3, 224, 224))

Notebooks

  1. ResNet-50 Image Classification
  2. DeeplabV3_MobileNet_v3_Large Segmentation
  3. YOLOv5 Object Detection
  4. YOLOv6 Object Detection
  5. Bert_Base_Uncased (Hugging Face)

Benchmarks

🖼️ Classification Models Inference Latency (on GPU) ⏱️

Classification benchmarks were run on ImageNet data with batch size = 1 and image size = 224 on an NVIDIA RTX 2080 Ti. For the int8 models, we have not seen a top-1 or top-5 accuracy drop of more than 1%.

Pytorch (ms), VoltaGPU FP16 (ms) and VoltaGPU int8 (ms)

Model Pytorch (ms) VoltaGPU FP16 (ms) VoltaGPU int8 (ms) Pytorch vs Int8 Speed
squeezenet1_1 1.6 0.2 0.2 8.4x
resnet18 2.7 0.4 0.3 9.0x
resnet34 4.5 0.7 0.5 9.0x
resnet50 6.6 0.7 0.5 13.2x
resnet101 13.6 1.3 1.0 13.6x
densenet121 15.7 2.4 2.0 7.9x
densenet169 22.0 4.4 3.8 5.8x
densenet201 26.8 6.3 5.0 5.4x
vgg11 2.0 0.9 0.5 4.0x
vgg16 3.5 1.2 0.7 5.0x
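For context, the sketch below shows how per-image GPU latency at batch size 1 is commonly measured with PyTorch. This is not the project's benchmarking script; the warmup and iteration counts are arbitrary assumptions.

import time
import torch

def latency_ms(model, input_shape=(1, 3, 224, 224), warmup=20, iters=100):
    """Average forward-pass latency in milliseconds on the current GPU."""
    model = model.eval().cuda()
    x = torch.randn(*input_shape, device="cuda")
    with torch.no_grad():
        for _ in range(warmup):       # warm up kernels and caches
            model(x)
        torch.cuda.synchronize()      # make sure queued work has finished
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()      # wait for all iterations before stopping the clock
    return (time.time() - start) / iters * 1000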

🧐 Object Detection (YOLO) Models Inference Latency (on GPU) ⏱️

Object detection inference was run on dummy data with image size = 640 and batch size = 1 on an NVIDIA RTX 2080 Ti.

Pytorch (ms) and VoltaGPU FP16 (ms)

Model Pytorch (ms) VoltaGPU FP16 (ms) Pytorch vs FP16 Speed
YOLOv5n 5.2 1.2 4.3x
YOLOv5s 5.1 1.6 3.2x
YOLOv5m 9.1 3.2 2.8x
YOLOv5l 15.3 5.1 3.0x
YOLOv5x 30.8 6.4 4.8x
YOLOv6s 8.8 3.0 2.9x
YOLOv6l_relu 23.4 5.5 4.3x
YOLOv6l 18.1 4.1 4.4x
YOLOv6n 9.1 1.6 5.7x
YOLOv6t 8.6 2.4 3.6x
YOLOv5m 15.5 3.5 4.4x

🎨 Segmentation Models Inference Latency (on GPU) ⏱️

Segmentation inference was run on dummy data with image size = 224 and batch size = 1 on an NVIDIA RTX 2080 Ti.

Pytorch (ms), VoltaGPU FP16 (ms) and VoltaGPU Int8 (ms)

Model Pytorch (ms) VoltaGPU FP16 (ms) VoltaGPU Int8 (ms) Speed Up (X)
FCN_Resnet50 8.3 2.3 1.8 3.6x
FCN_Resnet101 14.7 3.5 2.5 5.9x
DeeplabV3_Resnet50 12.1 2.5 1.3 9.3x
DeeplabV3_Resnet101 18.7 3.6 2.0 9.4x
DeeplabV3_MobileNetV3_Large 6.1 1.5 0.8 7.6x
DeeplabV3Plus_ResNet50 6.1 1.1 0.8 7.6x
DeeplabV3Plus_ResNet34 4.7 0.9 0.8 5.9x
UNet_ResNet50 6.2 1.3 1 6.2x
UNet_ResNet34 4.3 1.1 0.8 5.4x
FPN_ResNet50 5.5 1.2 1 5.5x
FPN_ResNet34 4.2 1.1 1 4.2x

🤗 Accelerating Huggingface Models using voltaML

We're adding support for accelerating Hugging Face NLP models with voltaML. This work has been inspired by ELS-RD's work. It is still in the early stages, and only the few models listed in the table below are supported. We're working to add more models soon.

from voltaml.compile import VoltaNLPCompile
from voltaml.inference import nlp_performance


model='bert-base-cased'
backend=["tensorrt","onnx"] 
seq_len=[1, 1, 1] 
task="classification"
batch_size=[1,1,1]

VoltaNLPCompile(model=model, device='cuda', backend=backend, seq_len=seq_len)

nlp_performance(model=model, device='cuda', backend=backend, seq_len=seq_len)

Pytorch (ms) and VoltaML FP16 (ms)

Model Pytorch (ms) VoltaML FP16 (ms) SpeedUp
bert-base-uncased 6.4 1 6.4x
Jean-Baptiste/camembert-ner 6.3 1 6.3x
gpt2 6.6 1.2 5.5x
xlm-roberta-base 6.4 1.08 5.9x
roberta-base 6.6 1.09 6.1x
bert-base-cased 6.2 0.9 6.9x
distilbert-base-uncased 3.5 0.6 5.8x
roberta-large 11.9 2.4 5.0x
deepset/xlm-roberta-base-squad2 6.2 1.08 5.7x
cardiffnlp/twitter-roberta-base-sentiment 6 1.07 5.6x
sentence-transformers/all-MiniLM-L6-v2 3.2 0.42 7.6x
bert-base-chinese 6.3 0.97 6.5x
distilbert-base-uncased-finetuned-sst-2-english 3.4 0.6 5.7x
albert-base-v2 6.7 1 6.7x

voltaTrees ⚡🌴 -> Link

A LLVM-based compiler for XGBoost and LightGBM decision trees.

voltaTrees converts trained XGBoost and LightGBM models to optimized machine code, speeding up prediction by ≥10x.

Example

import voltatrees as vt

model = vt.XGBoostRegressor.Model(model_file="NYC_taxi/model.txt")
model.compile()
model.predict(df)  # df: feature data to score, e.g. a pandas DataFrame (not defined in this snippet)

Installation

git clone https://github.com/VoltaML/volta-trees.git
cd volta-trees/
pip install -e .

Benchmarks

On smaller datasets, voltaTrees is 2-3X faster than Treelite by DMLC. Testing on large-scale datasets is yet to be conducted.

Enterprise Platform 🛣️

Enterprise customers who would like a fully managed solution hosted on their own cloud can contact us at [email protected]

  • Fully managed and cloud-hosted optimization engine.
  • Hardware-targeted, optimized Docker images for maximum performance.
  • One-click deployment of the compiled models.
  • Cost-benefit analysis dashboard for optimal deployment.
  • NVIDIA Triton optimized Docker images for large-scale GPU deployment.
  • Quantization-Aware Training (QAT)

voltaml-fast-stable-diffusion's People

Contributors

aaronsantiago, ahmedsheashaa, eltociear, gabe56f, harishprabhala, kamalkraj, katehuuh, miningp, riteshgangnani10, sahil280114, stax124, xenfo


voltaml-fast-stable-diffusion's Issues

Error on optimization step

[12/09/2022-20:25:24] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[12/09/2022-20:25:24] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[12/09/2022-20:25:24] [TRT] [W] Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/09/2022-20:25:25] [TRT] [W] Skipping tactic 8 due to insufficient memory on requested size of 17179869184 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().

How to fix this? I'm testing on T4 GPU with 15.1GB VRAM.
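For reference, the log's suggestion can be applied through the standard TensorRT Python API when building the engine yourself. This is a hedged sketch, not voltaML's own build code; the ONNX file name and the 4 GiB cap are illustrative assumptions.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("unet.onnx", "rb") as f:  # illustrative path
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Cap the tactic workspace so the builder stops requesting more memory than the T4 has.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GiB

engine_bytes = builder.build_serialized_network(network, config)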

[Feature]: Add ControlNet support for the SD 2.0 model and AIT acceleration capability

Is your feature request related to a problem? Please describe.

When I try to compile AIT for the SD 2.0 model, there is an error in the second step; I think it is not supported yet.

Describe the solution you'd like

https://huggingface.co/thibaud/controlnet-sd21

Describe alternatives you've considered

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

How to delete this?

Hello there, after I installed it and tinkered with it with no results, I found out that my C drive is full because of this. How do I delete it? Why didn't it get installed in the folder I ran it from?

Batch size of 8 acceleration with AI Template does not work

Running latest image

Running inference fails with this exception:

2023-03-18T18:54:45.876370659Z   File "/app/api/routes/generate.py", line 31, in txt2img_job
2023-03-18T18:54:45.876372038Z     images, time = await cluster.generate(job)
2023-03-18T18:54:45.876373180Z   File "/app/core/cluster.py", line 164, in generate
2023-03-18T18:54:45.876374433Z     raise e
2023-03-18T18:54:45.876375506Z   File "/app/core/cluster.py", line 143, in generate
2023-03-18T18:54:45.876376675Z     return await best_gpu.generate(job)
2023-03-18T18:54:45.876377823Z   File "/app/core/gpu.py", line 135, in generate
2023-03-18T18:54:45.876379035Z     raise err
2023-03-18T18:54:45.876380137Z   File "/app/core/gpu.py", line 127, in generate
2023-03-18T18:54:45.876381353Z     images = await run_in_thread_async(func=generate_thread_call, args=(job,))
2023-03-18T18:54:45.876382550Z   File "/app/core/utils.py", line 77, in run_in_thread_async
2023-03-18T18:54:45.876383781Z     raise exc
2023-03-18T18:54:45.876384916Z   File "/app/core/thread.py", line 45, in run
2023-03-18T18:54:45.876386300Z     self._return = target(*self._args, **self._kwargs)  # type: ignore
2023-03-18T18:54:45.876387628Z   File "/app/core/gpu.py", line 93, in generate_thread_call
2023-03-18T18:54:45.876388817Z     images: List[Image.Image] = model.generate(job)
2023-03-18T18:54:45.876390019Z   File "/app/core/inference/aitemplate.py", line 103, in generate
2023-03-18T18:54:45.876391269Z     images = self.txt2img(job)
2023-03-18T18:54:45.876392395Z   File "/app/core/inference/aitemplate.py", line 142, in txt2img
2023-03-18T18:54:45.876393651Z     data = pipe(
2023-03-18T18:54:45.876394809Z   File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2023-03-18T18:54:45.876396087Z     return func(*args, **kwargs)
2023-03-18T18:54:45.876397224Z   File "/app/core/aitemplate/src/ait_txt2img.py", line 285, in __call__
2023-03-18T18:54:45.876398387Z     text_embeddings = self.clip_inference(text_input.input_ids.to(self.device))
2023-03-18T18:54:45.876399602Z   File "/app/core/aitemplate/src/ait_txt2img.py", line 170, in clip_inference
2023-03-18T18:54:45.876400929Z     exe_module.run_with_tensors(inputs, ys, graph_mode=False)
2023-03-18T18:54:45.876402103Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 535, in run_with_tensors
2023-03-18T18:54:45.876403418Z     outputs_ait = self.run(
2023-03-18T18:54:45.876404582Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 438, in run
2023-03-18T18:54:45.876406137Z     return self._run_impl(
2023-03-18T18:54:45.876407297Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 377, in _run_impl
2023-03-18T18:54:45.876408540Z     self.DLL.AITemplateModelContainerRun(
2023-03-18T18:54:45.876410860Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 181, in _wrapped_func
2023-03-18T18:54:45.876412236Z     raise RuntimeError(f"Error in function: {method.__name__}")
2023-03-18T18:54:45.876413454Z RuntimeError: Error in function: AITemplateModelContainerRun

[Bug]: AIT Acceleration doesn't work

Describe the bug

AIT Acceleration doesn't work (I have been waiting for 40 minutes already); it gets stuck on the UNet tab. If I run the AIT scripts directly with Python, everything is fine. Task Manager shows no activity after 8 minutes.

Reproduction

Set Acceleration to 512x512, batch size 1, 24 CPU threads. Model: stable-diffusion-v1-5.

Expected behavior

Model acceleration

Installation Method

Local

Branch

Main

System Info

Latest version of the main branch. Windows, Ryzen 9 5900X, 128 GB RAM, RTX 3090 Ti.

Logs

2023-05-01 03:51:20,361 INFO <aitemplate.backend.profiler_cache> Ignore repeat profile_record:
SELECT algo, workspace, split_k
FROM cuda_gemm_3
WHERE
dtype_a=14 AND
dtype_b=14 AND
dtype_c=14 AND
dtype_acc=14 AND
major_a=2 AND
major_b=1 AND
major_c=2 AND
op_type='gemm_rcr_permute' AND
device='80' AND
epilogue=1 AND
pshape='64_1_8' AND
exec_entry_sha1='d9106d7291c48fc10faca140108b9deb185eed00';
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profiler (gemm_rcr_bias_fast_gelu_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 M == 128 && N == 5120 && K == 1280) selected kernel: best_algo='cutlass_tensorop_h16816gemm_64x128_32x6_tn_align_8_8' workspace=0 split_k=1
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profiler (gemm_rcr_bias_mul_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 M == 128 && N == 5120 && K == 1280) selected kernel: best_algo='cutlass_tensorop_h16816gemm_64x128_32x6_tn_align_8_8' workspace=0 split_k=1
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profiler (gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 M == 128 && N == 1280 && K == 5120) selected kernel: best_algo='cutlass_tensorop_h16816gemm_64x64_64x5_tn_align_8_8' workspace=0 split_k=1
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.transform.profile> ran 75 profilers elapsed time: 0:00:07.081708
2023-05-01 03:51:20,374 INFO <aitemplate.backend.codegen> generated 1 function srcs
2023-05-01 03:51:20,385 INFO <aitemplate.compiler.compiler> folded constants elapsed time: 0:00:00.021329
2023-05-01 03:51:21,261 INFO <aitemplate.backend.codegen> generated 199 function srcs
2023-05-01 03:51:23,263 INFO <aitemplate.backend.codegen> generated 7 library srcs
2023-05-01 03:51:23,264 INFO <aitemplate.backend.builder> Using 24 CPU for building

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • I am writing the issue in English.

CLIP model in models.py cannot be changed

Hi,

In the models.py file, the CLIP class can only load the default CLIP model (I copy/pasted the current code below). Shouldn't it be self.model_path instead of "openai/clip-vit-large-patch14"? VAE and UNET are set to the correct self.model_path.

class CLIP(BaseModel):
    def get_model(self):
        return CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(self.device)

Thanks !
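For illustration, a minimal sketch of the change being suggested. This is hedged: whether a subfolder argument is needed depends on how the model directory is laid out on disk.

class CLIP(BaseModel):
    def get_model(self):
        # Load the text encoder from the user-selected model path instead of the hard-coded default.
        # Diffusers-style model directories usually keep it under a "text_encoder" subfolder.
        return CLIPTextModel.from_pretrained(self.model_path, subfolder="text_encoder").to(self.device)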

Not a directory when trying to look for TRT models

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/workspace/voltaML-fast-stable-diffusion/app.py", line 155, in scan_directory
    tmp2 = os.listdir(os.path.join(trt_model_path,i))
NotADirectoryError: [Errno 20] Not a directory: 'engine/unet_fp16.plan'
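A hedged sketch of the kind of guard that avoids this error (a hypothetical helper, not the project's actual fix): skip plain files such as engine/unet_fp16.plan when scanning for model directories.

import os

def list_trt_models(trt_model_path):
    models = []
    for entry in os.listdir(trt_model_path):
        full_path = os.path.join(trt_model_path, entry)
        if not os.path.isdir(full_path):
            continue  # ignore stray files like engine/unet_fp16.plan
        models.extend(os.listdir(full_path))
    return models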

It seems like AIT doesn't support dynamic shapes?

Is your feature request related to a problem? Please describe.

When I change the output size of txt2img, the following errors occur (I built the AIT engine with runwayml--stable-diffusion-v1-5__512x512x1):

// case 1 output size is 64 * 64
...
[18:16:17] model_interface.cu:210: Error: [SetValue] Dimension got value out of bounds; expected value to be in [32, 64], but got 8.

// case 2 output size is 512 * 256
...
  File "/home/quchenxi/test/fastsd/core/aitemplate/src/ait_txt2img.py", line 429, in __call__
    latents = self.scheduler.step(
  File "/home/quchenxi/test/TensorRT/dm_engine/lib/python3.10/site-packages/diffusers/schedulers/scheduling_ddim.py", line 326, in step
    pred_original_sample = (sample - beta_prod_t ** (0.5) * model_output) / alpha_prod_t ** (0.5)
RuntimeError: The size of tensor a (32) must match the size of tensor b (64) at non-singleton dimension 3

Describe the solution you'd like

Maybe it needs some work on AITStableDiffusionPipeline before building the engines, similar to how the TRT engines are built.

Describe alternatives you've considered

No response

Additional context

Do you have a plan to support dynamic shapes in AIT? Or can you share some ideas about how to implement it?

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Error when using optimize.sh

Traceback (most recent call last):
  File "volta_accelerate.py", line 153, in
    convert_to_onnx(args)
  File "volta_accelerate.py", line 79, in convert_to_onnx
    traced_model = torch.jit.trace(
  File "/home/work/python/lib/python3.8/site-packages/torch/jit/_trace.py", line 750, in trace
    return trace_module(
  File "/home/work/python/lib/python3.8/site-packages/torch/jit/_trace.py", line 967, in trace_module
    module._c._create_method_from_trace(
  File "/home/work/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/work/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
TypeError: forward() takes from 4 to 5 positional arguments but 6 were given

Obvious Output Discrepancy between PyTorch and AITemplate inference

Describe the bug

Description

The output discrepancy between PyTorch and AITemplate inference is quite obvious.

According to our various testing cases, AITemplate produces lower-quality results on average, especially for human faces.

Reproduction

Model:
chilloutmix-ni-pruned-fp16-fix

Prompt:

brown hair, 1girl, solo, hand on the hip, dress, looking at viewer, smile, street

Negative Prompt:

(worst quality low quality:1.4)
Parameter Value
Height 512
Width 512
Sampler DPMSolverMultiStep
CFG 7
Batch Count 4
Batch Size 1
Seed 1191535362

PyTorch Results


AITemplate Results


Expected behavior

PyTorch and AITemplate should produce similar results and quality.

Branch

Experimental

System Info

OS: Debian 11
GPU: Nvidia L4
CUDA: 12.1

Additional context

It might be related to facebookincubator/AITemplate#141

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

Question: about FP8 support for Ada and Hopper..

Hi,
Just asking how difficult it would be to add FP8 support to VoltaML once TensorRT supports the new FP8 format on Ada and Hopper.
What performance uplift can we expect? Maybe a 4090 going from 80 iters/s to 120 iters/s? Or nearer to 2x, up to 160 iters/s?
It would also be nice to include VoltaML in H100 performance numbers now (before FP8), in case anyone can get access to one.

Thanks..

Quietly fails and closes

I have been testing it out to see what it tries to load into memory. I have an RTX 3070 8 GB and wanted to see how much memory it used before failing; however, it has not used any VRAM beyond the 2.4 GB needed to load the model. RAM usage jumps up by 6 GB, but then it quietly fails, barely touching the 48 GB of RAM available.

By quietly fails, I mean that it just doesn't do anything, or say anything.

Example:

if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
A:\Anaconda\envs\voltaml\lib\site-packages\transformers\models\clip\modeling_clip.py:262: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
(voltaml) PS H:\Downloads\voltaML-fast-stable-diffusion-main>

Output directory is empty, no logs were found either.

Process killed when using "volta_accelerate.py"

Hello, I encountered a problem when trying to use volta_accelerate.py

System info:
Windows 10
Nvidia 3060Ti 8GB
i5-11400F
Using Docker Desktop with WSL2 (Ubuntu) with the voltaML Docker Container image

Here is the command and the log I got

root@f585f96dd9a2:/workspace/voltaML-fast-stable-diffusion# python3 volta_accelerate.py --model="runwayml/stable-diffusion-v1-5"
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.44G/3.44G [05:11<00:00, 11.0MB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<00:00, 589kB/s]
/workspace/voltaML-fast-stable-diffusion/diffusers/models/unet_2d_condition_onnx.py:274: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:182: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:122: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:777: UserWarning: no signature found for <torch.ScriptMethod object at 0x7fa7a64aa040>, skipping _decide_input_format
  warnings.warn(f"{e}, skipping _decide_input_format")
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1880: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input latent_model_input
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1880: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input t
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1880: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input encoder_hidden_states
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/onnx/_patch_torch.py:67: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1874.)
  torch._C._jit_pass_onnx_node_shape_type_inference(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:648: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1874.)
  _C._jit_pass_onnx_graph_shape_type_inference(
Killed

The last log line is just "Killed", so I don't know where to investigate. Maybe it's because my graphics card doesn't have enough VRAM?

python3 volta_accelerate.py --onnx_trt=trt "$@"

Sorry for bothering you. I am not much of a coder. I seem to be getting closer using your Docker image, but I still get a bunch of errors when I try to tune a model with it. The last error I got was:

optimize.sh: line 1: 4939 Segmentation fault python3 volta_accelerate.py --onnx_trt=onnx "$@"
optimize.sh: line 2: 4959 Segmentation fault python3 volta_accelerate.py --onnx_trt=trt "$@"

any idea what i can do to get it to work?

WebSocket connection suddenly breaks when "batch count" is set to more than 1

logs:
02:10:14 | root | INFO » Adding job e2ffa91a-e182-45ca-a93b-6112251cbd6c to queue
100% 25/25 [00:01<00:00, 17.57it/s]
100% 25/25 [00:01<00:00, 17.52it/s]
100% 25/25 [00:01<00:00, 17.43it/s]
100% 25/25 [00:01<00:00, 17.46it/s]
100% 25/25 [00:01<00:00, 17.43it/s]
100% 25/25 [00:01<00:00, 17.23it/s]
100% 25/25 [00:01<00:00, 17.40it/s]
100% 25/25 [00:01<00:00, 17.42it/s]
100% 25/25 [00:01<00:00, 17.32it/s]
INFO: 172.18.22.48:62843 - "POST /api/generate/txt2img HTTP/1.1" 200 OK
02:10:33 | asyncio | ERROR » Task exception was never retrieved
future: <Task finished name='Task-4' coro=<WebSocketManager.perf_loop() done, defined at /app/api/websockets/manager.py:32> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Traceback (most recent call last):
  File "/app/api/websockets/manager.py", line 60, in perf_loop
    await self.broadcast(Data(data_type="cluster_stats", data=data))
  File "/app/api/websockets/manager.py", line 83, in broadcast
    await connection.send_json(data.to_json())
  File "/usr/local/lib/python3.8/dist-packages/starlette/websockets.py", line 173, in send_json
    await self.send({"type": "websocket.send", "text": text})
  File "/usr/local/lib/python3.8/dist-packages/starlette/websockets.py", line 85, in send
    await self._send(message)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 65, in sender
    await send(message)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/websockets/websockets_impl.py", line 327, in asgi_send
    await self.send(data)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/protocol.py", line 953, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
ERROR: Exception in ASGI application

Segmentation fault (core dumped) - TRT Inference

I'm not able to run TRT inference locally on my A100 machine.

[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:25] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
51it [00:01, 40.63it/s] | 0/1 [00:00<?, ?it/s]
100%|█████████████████████████| 1/1 [00:02<00:00, 2.94s/it]
Segmentation fault (core dumped)

Packages Details
accelerate==0.14.0
diffusers==0.9.0
ftfy==6.1.1
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.6.0.163
onnx==1.12.0
onnxconverter-common==1.13.0
onnxruntime==1.13.1
onnxsim==0.4.10
pycuda==2022.2
spacy==3.4.3
tensorrt==8.5.1.7
thinc==8.1.5
tokenizers==0.13.2
torch==1.13.0+cu116
torchaudio==0.13.0+cu116
torchvision==0.14.0+cu116
transformers==4.24.0

[Bug]: Image to image always outputs images with 512x512 resolution

Describe the bug

Image to image always outputs images with 512x512 resolution.

Reproduction

build latest branch

Expected behavior

Output images at the resolution set in the UI.

Installation Method

Docker

Branch

Experimental

System Info

building with dockerfile

Logs

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • I am writing the issue in English.

Bug in Frontend View

Describe the bug

frontend\src\views\TextToImageView.vue

The width and height are swapped, so when generating an image, the frontend sends the width value as the height, and vice versa.

Reproduction

Generate any image with different width and height values, and see how the frontend calls the API with the values reversed.

Expected behavior

The frontend should send the correct values from the width and height sliders

Branch

Experimental

System Info

Experimental branch

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

What is the trick?

Is it true that this repo just converts the model to TRT, and that this is how it gets its speed boost?

[Enhancement]: Code Hygiene

The code is excellent from a scientific point of view, and I love how fast the models work, but there is room for improvement in code hygiene.

It would significantly improve the code quality (easier to read, 10x faster to modify) if you added standard industry linters and formatters to the pre-commit hooks and CI/CD.

It would take only 20 minutes to add these checks to the code base, but the value they provide would be substantial.

Error during TRT model build

Running from the docker container

python volta_accelerate.py --build-static-batch --prompt "Forest" --onnx-dir onnx --engine-dir engine --force-onnx-export --backend TRT

Get an error:

  File "volta_accelerate.py", line 741, in <module>
    infer_trt(saving_path=args.output_dir,
  File "volta_accelerate.py", line 670, in infer_trt
    load_trt(model, prompt, img_height, img_width, num_inference_steps)
  File "volta_accelerate.py", line 596, in load_trt
    trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset, 
  File "volta_accelerate.py", line 301, in loadEngines
    engine.build(onnx_opt_path, fp16=True, \
  File "/workspace/utilities.py", line 72, in build
    engine = engine_from_network(network_from_onnx_path(onnx_path), config=CreateConfig(fp16=fp16,max_workspace_size=8100654080, profiles=[p],
  File "<string>", line 3, in func_impl
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 526, in call_impl
    return engine_from_bytes(super().call_impl)
  File "<string>", line 3, in func_impl
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 550, in call_impl
    buffer, owns_buffer = util.invoke_if_callable(self._serialized_engine)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py", line 661, in invoke_if_callable
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 484, in call_impl
    G_LOGGER.critical("Invalid Engine. Please ensure the engine was built correctly")
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py", line 597, in critical
    raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Invalid Engine. Please ensure the engine was built correctly

Does this project support the acceleration of AITemplate pipeline?

Is your feature request related to a problem? Please describe.

Does this project support accelerating LoRA in the AITemplate pipeline?

Describe the solution you'd like

Maybe it needs to "inject" or "swap" the LoRA weights into the already-compiled unet.so. In TensorRT, the Refit API can be used to patch/update the weights of an engine at runtime (see the sketch below).
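A rough sketch of that TensorRT Refit idea, assuming the engine was built with the REFIT flag and that base + LoRA weights have already been merged into a NumPy array. The layer and file names here are illustrative assumptions, not voltaML's actual code.

import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open("unet_fp16.plan", "rb") as f:  # illustrative engine path
    engine = runtime.deserialize_cuda_engine(f.read())

refitter = trt.Refitter(engine, logger)

# Pre-merged (base + LoRA) kernel for one layer; the layer name must match the engine's.
merged_kernel = np.load("merged_lora_kernel.npy")
refitter.set_weights("unet.some_linear_layer", trt.WeightsRole.KERNEL, merged_kernel)

# Apply the new weights in place without rebuilding the engine.
assert refitter.refit_cuda_engine()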

Describe alternatives you've considered

No response

Additional context

Or can anyone share some ideas on how to implement it ?

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Unable to run the voltaml/volta_diffusion:v0.1 docker image

-> % sudo docker run -it --gpus all voltaml/volta_diffusion:v0.1 bash
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/e049fdb3bc56fecdeefb3b950034cbc757eeb166b152330d00ef6e8a2972af06/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

This is probably because when --gpus=all is specified, the Docker engine will try and mount all the nvidia & cuda bits & pieces into the container. But some of the files in the image (e.g. /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1) are actually links rather than files, so the mounting process is not successful.

Could you please open-source the Dockerfile as well?

After failing to load a model (or the correct one) and trying to load a textual inversion (after getting an error), it doesn't reset the LOAD button

Discussed in #79

Originally posted by cleverestx May 14, 2023


The button is no longer usable even after re-loading the correct model when it tells you 'X model is not loaded', even when the window is closed and re-opened. Please fix this so the LOAD button is restored when loading the asked-for model (or closing and re-opening this window).

Even when loading it properly with the correct model, you can't unload/remove it? It does not appear in the prompt either...the latter would be nice, and some embeddings are for NEGATIVE prompts...how would that work?

Thank you.

Docker gets stuck

I get stuck at Volume "voltaml-fast-stable-diffusion_output" Creating

I use Windows 10 64-bit.

Add Multiple ControlNets support

Is your feature request related to a problem? Please describe.

It would be nice if voltaML supported Multi-ControlNet.

Describe the solution you'd like

https://github.com/huggingface/diffusers/releases
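For context, a rough sketch of how multiple ControlNets are combined in the upstream diffusers library. This is diffusers' own API, not something voltaML exposes today; the model IDs and conditioning image paths are illustrative.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Two ControlNets conditioned on different signals.
canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[canny, pose],  # passing a list enables Multi-ControlNet
    torch_dtype=torch.float16,
).to("cuda")

# One conditioning image per ControlNet, in the same order as above (paths are placeholders).
canny_image = load_image("canny_condition.png")
pose_image = load_image("pose_condition.png")

image = pipe(
    "a man standing in a forest",
    image=[canny_image, pose_image],
    num_inference_steps=25,
).images[0]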

Describe alternatives you've considered

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Documentation

First of all, thank you for making compilation much simpler.

However, I think proper documentation is needed for better adoption of this technology.

More specifically, the infer_trt function:

def infer_trt(saving_path, model, prompt, neg_prompt, img_height, img_width, num_inference_steps, guidance_scale, num_images_per_prompt, seed=None):
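For what it's worth, here is a hedged example call based only on the signature above; the argument values are illustrative, and the exact semantics of saving_path and model are precisely what better documentation would need to spell out.

infer_trt(
    saving_path="outputs/",                               # where generated images are written (assumed)
    model="runwayml/stable-diffusion-v1-5",               # model id or path (assumed)
    prompt="a photograph of an astronaut riding a horse",
    neg_prompt="",
    img_height=512,
    img_width=512,
    num_inference_steps=50,
    guidance_scale=7.5,
    num_images_per_prompt=1,
    seed=42,
)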

I would also like to confirm that all the "convertor" does is compile the model into TensorRT.

Thank you.

MPS support

Is your feature request related to a problem? Please describe.

What do you think about adding Apple MPS support? Would it be useful, or even possible, with the voltaML architecture?

Describe the solution you'd like

PyTorch supports the MPS backend for low-level Apple Silicon GPU acceleration (see the sketch below).
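For reference, plain PyTorch already lets you pick the MPS device where available. This is the upstream PyTorch API, not a voltaML feature.

import torch

# Fall back to CPU when the Metal Performance Shaders backend is unavailable.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(16, 4).to(device)
x = torch.randn(1, 16, device=device)
print(model(x).shape)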

Describe alternatives you've considered

No response

Additional context

Device: Macbook M1 Pro Max
OS: MacOS Ventura

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

[Bug]: Dead link in Documentation

Describe the bug

The top DOCS URL/link is broken and needs to be updated on this page: https://voltaml.github.io/voltaML-fast-stable-diffusion/


Reproduction

Clicked on DOCS on the main page.

Expected behavior

Should go to: https://voltaml.github.io/voltaML-fast-stable-diffusion/getting-started/introduction

Installation Method

Local

Branch

Experimental

System Info

Windows 11
Python 3.10
Latest build
The issue is not related to the actual AI software, but to the webpage.

Logs

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • I am writing the issue in English.

[Bug]: Web UI sometimes disconnects under heavy load and breaks the preview.

Describe the bug

As the title says, sometimes while generating on the experimental branch, the Web UI will disconnect. Upon reconnecting, the noise preview feature won't work, and there is a 1-2 second delay before the generated image pops up in the UI. Disconnecting and reconnecting again doesn't fix the issue; the server has to be restarted.

This issue goes back a while, a few weeks at least.

This is one of those annoying non-deterministic issues I can't just trigger on the experimental branch... but the exact same error consistently happens when testing this torch.compile PR: #72

If I close the Web UI and open it when the model is done compiling/the image is done generating, it won't error out.

It seems to be related to some kind of networking "heartbeat" timeout? I tried making a few config changes, but had no luck:

PIPE_COMPILE_SET
  0%|                                                                             | 0/25 [00:00<?, ?it/s]
INFO     22:18:59 | uvicorn.access » 127.0.0.1:57130 - "POST /api/generate/txt2img        h11_impl.py:498
         HTTP/1.1" 500
ERROR    22:18:59 | uvicorn.error » Exception in ASGI application                         h11_impl.py:433

         ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:213 in generate     │
         │                                                                              │
         │   210 │   │   │   except Exception as err:  # pylint: disable=broad-except   │
         │   211 │   │   │   │   self.memory_cleanup()                                  │
         │   212 │   │   │   │   self.queue.mark_finished()                             │
         │ ❱ 213 │   │   │   │   raise err                                              │
         │   214 │   │   │                                                              │
         │   215 │   │   │   deltatime = time.time() - start_time                       │
         │   216                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:180 in generate     │
         │                                                                              │
         │   177 │   │   │   # Generate images                                          │
         │   178 │   │   │   try:                                                       │
         │   179 │   │   │   │   generated_images: Optional[List[Image.Image]]          │
         │ ❱ 180 │   │   │   │   generated_images = await run_in_thread_async(          │
         │   181 │   │   │   │   │   func=generate_thread_call, args=(job,)             │
         │   182 │   │   │   │   )                                                      │
         │   183                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/utils.py:104 in            │
         │ run_in_thread_async                                                          │
         │                                                                              │
         │   101 │   value, exc = thread.join()                                         │
         │   102 │                                                                      │
         │   103 │   if exc:                                                            │
         │ ❱ 104 │   │   raise exc                                                      │
         │   105 │                                                                      │
         │   106 │   return value                                                       │
         │   107                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/thread.py:45 in run        │
         │                                                                              │
         │   42 │   │   │   │   │   │   "Executing coroutine %s in %s", target.__name__ │
         │   43 │   │   │   │   │   )                                                   │
         │   44 │   │   │   │   │   try:                                                │
         │ ❱ 45 │   │   │   │   │   │   self._return = target(*self._args, **self._kwar │
         │      ignore                                                                  │
         │   46 │   │   │   │   │   except Exception as err:  # pylint: disable=broad-e │
         │   47 │   │   │   │   │   │   self._err = err                                 │
         │   48 │   │   │   │   else:                                                   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:125 in              │
         │ generate_thread_call                                                         │
         │                                                                              │
         │   122 │   │   │                                                              │
         │   123 │   │   │   if isinstance(model, PyTorchStableDiffusion):              │
         │   124 │   │   │   │   logger.debug("Generating with PyTorch")                │
         │ ❱ 125 │   │   │   │   images: List[Image.Image] = model.generate(job)        │
         │   126 │   │   │   elif isinstance(model, AITemplateStableDiffusion):         │
         │   127 │   │   │   │   logger.debug("Generating with AITemplate")             │
         │   128 │   │   │   │   images: List[Image.Image] = model.generate(job)        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/pytorch.py:621   │
         │ in generate                                                                  │
         │                                                                              │
         │   618 │   │   │   │   raise ValueError("Invalid job type for this pipeline") │
         │   619 │   │   except Exception as e:                                         │
         │   620 │   │   │   self.memory_cleanup()                                      │
         │ ❱ 621 │   │   │   raise e                                                    │
         │   622 │   │                                                                  │
         │   623 │   │   # Clean memory and return images                               │
         │   624 │   │   self.memory_cleanup()                                          │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/pytorch.py:610   │
         │ in generate                                                                  │
         │                                                                              │
         │   607 │   │                                                                  │
         │   608 │   │   try:                                                           │
         │   609 │   │   │   if isinstance(job, Txt2ImgQueueEntry):                     │
         │ ❱ 610 │   │   │   │   images = self.txt2img(job)                             │
         │   611 │   │   │   elif isinstance(job, Img2ImgQueueEntry):                   │
         │   612 │   │   │   │   images = self.img2img(job)                             │
         │   613 │   │   │   elif isinstance(job, InpaintQueueEntry):                   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/pytorch.py:257   │
         │ in txt2img                                                                   │
         │                                                                              │
         │   254 │   │   │   if "highres_fix" in job.flags:                             │
         │   255 │   │   │   │   output_type = "latent"                                 │
         │   256 │   │   │                                                              │
         │ ❱ 257 │   │   │   data = pipe.text2img(                                      │
         │   258 │   │   │   │   prompt=job.data.prompt,                                │
         │   259 │   │   │   │   height=job.data.height,                                │
         │   260 │   │   │   │   width=job.data.width,                                  │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/lwp_sd.py:685 in │
         │ text2img                                                                     │
         │                                                                              │
         │   682 │   │   │   list of `bool`s denoting whether the corresponding generat │
         │       represents "not-safe-for-work"                                         │
         │   683 │   │   │   (nsfw) content, according to the `safety_checker`.         │
         │   684 │   │   """                                                            │
         │ ❱ 685 │   │   return self.__call__(                                          │
         │   686 │   │   │   prompt=prompt,                                             │
         │   687 │   │   │   negative_prompt=negative_prompt,                           │
         │   688 │   │   │   height=height,                                             │
         │                                                                              │
         │ /home/alpha/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py:1 │
         │ 15 in decorate_context                                                       │
         │                                                                              │
         │   112 │   @functools.wraps(func)                                             │
         │   113 │   def decorate_context(*args, **kwargs):                             │
         │   114 │   │   with ctx_factory():                                            │
         │ ❱ 115 │   │   │   return func(*args, **kwargs)                               │
         │   116 │                                                                      │
         │   117 │   return decorate_context                                            │
         │   118                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/lwp_sd.py:580 in │
         │ __call__                                                                     │
         │                                                                              │
         │   577 │   │   │   │   # call the callback, if provided                       │
         │   578 │   │   │   │   if i % callback_steps == 0:                            │
         │   579 │   │   │   │   │   if callback is not None:                           │
         │ ❱ 580 │   │   │   │   │   │   callback(i, t, latents)  # type: ignore        │
         │   581 │   │   │   │   │   if is_cancelled_callback is not None and is_cancel │
         │   582 │   │   │   │   │   │   return None                                    │
         │   583                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference_callbacks.py:47  │
         │ in txt2img_callback                                                          │
         │                                                                              │
         │    44 def txt2img_callback(step: int, _timestep: int, tensor: torch.Tensor): │
         │    45 │   "Callback for txt2img with progress and partial image"             │
         │    46 │                                                                      │
         │ ❱  47 │   images, send_image = pytorch_callback(step, _timestep, tensor)     │
         │    48 │                                                                      │
         │    49 │   websocket_manager.broadcast_sync(                                  │
         │    50 │   │   data=Data(                                                     │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference_callbacks.py:173 │
         │ in pytorch_callback                                                          │
         │                                                                              │
         │   170 │                                                                      │
         │   171 │   if shared.interrupt:                                               │
         │   172 │   │   shared.interrupt = False                                       │
         │ ❱ 173 │   │   raise InferenceInterruptedError                                │
         │   174 │                                                                      │
         │   175 │   shared.current_done_steps += 1                                     │
         │   176 │   send_image: bool = time.time() - last_image_time > config.api.imag │
         ╰──────────────────────────────────────────────────────────────────────────────╯
         InferenceInterruptedError

         During handling of the above exception, another exception occurred:

         ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/uvicorn/protocols/http/h11_impl.py:428 in run_asgi                        │
         │                                                                              │
         │   425 │   # ASGI exception wrapper                                           │
         │   426 │   async def run_asgi(self, app: "ASGI3Application") -> None:         │
         │   427 │   │   try:                                                           │
         │ ❱ 428 │   │   │   result = await app(  # type: ignore[func-returns-value]    │
         │   429 │   │   │   │   self.scope, self.receive, self.send                    │
         │   430 │   │   │   )                                                          │
         │   431 │   │   except BaseException as exc:                                   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/uvicorn/middleware/proxy_headers.py:78 in __call__                        │
         │                                                                              │
         │   75 │   │   │   │   │   port = 0                                            │
         │   76 │   │   │   │   │   scope["client"] = (host, port)  # type: ignore[arg- │
         │   77 │   │                                                                   │
         │ ❱ 78 │   │   return await self.app(scope, receive, send)                     │
         │   79                                                                         │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/applications.py:276 in __call__                                   │
         │                                                                              │
         │   273 │   async def __call__(self, scope: Scope, receive: Receive, send: Sen │
         │   274 │   │   if self.root_path:                                             │
         │   275 │   │   │   scope["root_path"] = self.root_path                        │
         │ ❱ 276 │   │   await super().__call__(scope, receive, send)                   │
         │   277 │                                                                      │
         │   278 │   def add_api_route(                                                 │
         │   279 │   │   self,                                                          │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/applications.py:122 in __call__                                 │
         │                                                                              │
         │   119 │   │   scope["app"] = self                                            │
         │   120 │   │   if self.middleware_stack is None:                              │
         │   121 │   │   │   self.middleware_stack = self.build_middleware_stack()      │
         │ ❱ 122 │   │   await self.middleware_stack(scope, receive, send)              │
         │   123 │                                                                      │
         │   124 │   def on_event(self, event_type: str) -> typing.Callable:  # pragma: │
         │   125 │   │   return self.router.on_event(event_type)                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/errors.py:184 in __call__                            │
         │                                                                              │
         │   181 │   │   │   # We always continue to raise the exception.               │
         │   182 │   │   │   # This allows servers to log the error, or allows test cli │
         │   183 │   │   │   # to optionally raise the error within the test case.      │
         │ ❱ 184 │   │   │   raise exc                                                  │
         │   185 │                                                                      │
         │   186 │   def format_line(                                                   │
         │   187 │   │   self, index: int, line: str, frame_lineno: int, frame_index: i │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/errors.py:162 in __call__                            │
         │                                                                              │
         │   159 │   │   │   await send(message)                                        │
         │   160 │   │                                                                  │
         │   161 │   │   try:                                                           │
         │ ❱ 162 │   │   │   await self.app(scope, receive, _send)                      │
         │   163 │   │   except Exception as exc:                                       │
         │   164 │   │   │   request = Request(scope)                                   │
         │   165 │   │   │   if self.debug:                                             │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/cors.py:92 in __call__                               │
         │                                                                              │
         │    89 │   │   │   await response(scope, receive, send)                       │
         │    90 │   │   │   return                                                     │
         │    91 │   │                                                                  │
         │ ❱  92 │   │   await self.simple_response(scope, receive, send, request_heade │
         │    93 │                                                                      │
         │    94 │   def is_allowed_origin(self, origin: str) -> bool:                  │
         │    95 │   │   if self.allow_all_origins:                                     │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/cors.py:147 in simple_response                       │
         │                                                                              │
         │   144 │   │   self, scope: Scope, receive: Receive, send: Send, request_head │
         │   145 │   ) -> None:                                                         │
         │   146 │   │   send = functools.partial(self.send, send=send, request_headers │
         │ ❱ 147 │   │   await self.app(scope, receive, send)                           │
         │   148 │                                                                      │
         │   149 │   async def send(                                                    │
         │   150 │   │   self, message: Message, send: Send, request_headers: Headers   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/exceptions.py:79 in __call__                         │
         │                                                                              │
         │    76 │   │   │   │   handler = self._lookup_exception_handler(exc)          │
         │    77 │   │   │                                                              │
         │    78 │   │   │   if handler is None:                                        │
         │ ❱  79 │   │   │   │   raise exc                                              │
         │    80 │   │   │                                                              │
         │    81 │   │   │   if response_started:                                       │
         │    82 │   │   │   │   msg = "Caught handled exception, but response already  │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/exceptions.py:68 in __call__                         │
         │                                                                              │
         │    65 │   │   │   await send(message)                                        │
         │    66 │   │                                                                  │
         │    67 │   │   try:                                                           │
         │ ❱  68 │   │   │   await self.app(scope, receive, sender)                     │
         │    69 │   │   except Exception as exc:                                       │
         │    70 │   │   │   handler = None                                             │
         │    71                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/middleware/asyncexitstack.py:21 in __call__                       │
         │                                                                              │
         │   18 │   │   │   │   │   await self.app(scope, receive, send)                │
         │   19 │   │   │   │   except Exception as e:                                  │
         │   20 │   │   │   │   │   dependency_exception = e                            │
         │ ❱ 21 │   │   │   │   │   raise e                                             │
         │   22 │   │   │   if dependency_exception:                                    │
         │   23 │   │   │   │   # This exception was possibly handled by the dependency │
         │   24 │   │   │   │   # still bubble up so that the ServerErrorMiddleware can │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/middleware/asyncexitstack.py:18 in __call__                       │
         │                                                                              │
         │   15 │   │   │   async with AsyncExitStack() as stack:                       │
         │   16 │   │   │   │   scope[self.context_name] = stack                        │
         │   17 │   │   │   │   try:                                                    │
         │ ❱ 18 │   │   │   │   │   await self.app(scope, receive, send)                │
         │   19 │   │   │   │   except Exception as e:                                  │
         │   20 │   │   │   │   │   dependency_exception = e                            │
         │   21 │   │   │   │   │   raise e                                             │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/routing.py:718 in __call__                                      │
         │                                                                              │
         │   715 │   │   │   match, child_scope = route.matches(scope)                  │
         │   716 │   │   │   if match == Match.FULL:                                    │
         │   717 │   │   │   │   scope.update(child_scope)                              │
         │ ❱ 718 │   │   │   │   await route.handle(scope, receive, send)               │
         │   719 │   │   │   │   return                                                 │
         │   720 │   │   │   elif match == Match.PARTIAL and partial is None:           │
         │   721 │   │   │   │   partial = route                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/routing.py:276 in handle                                        │
         │                                                                              │
         │   273 │   │   │   │   )                                                      │
         │   274 │   │   │   await response(scope, receive, send)                       │
         │   275 │   │   else:                                                          │
         │ ❱ 276 │   │   │   await self.app(scope, receive, send)                       │
         │   277 │                                                                      │
         │   278 │   def __eq__(self, other: typing.Any) -> bool:                       │
         │   279 │   │   return (                                                       │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/routing.py:66 in app                                            │
         │                                                                              │
         │    63 │   async def app(scope: Scope, receive: Receive, send: Send) -> None: │
         │    64 │   │   request = Request(scope, receive=receive, send=send)           │
         │    65 │   │   if is_coroutine:                                               │
         │ ❱  66 │   │   │   response = await func(request)                             │
         │    67 │   │   else:                                                          │
         │    68 │   │   │   response = await run_in_threadpool(func, request)          │
         │    69 │   │   await response(scope, receive, send)                           │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/routing.py:237 in app                                             │
         │                                                                              │
         │    234 │   │   if errors:                                                    │
         │    235 │   │   │   raise RequestValidationError(errors, body=body)           │
         │    236 │   │   else:                                                         │
         │ ❱  237 │   │   │   raw_response = await run_endpoint_function(               │
         │    238 │   │   │   │   dependant=dependant, values=values, is_coroutine=is_c │
         │    239 │   │   │   )                                                         │
         │    240                                                                       │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/routing.py:163 in run_endpoint_function                           │
         │                                                                              │
         │    160 │   assert dependant.call is not None, "dependant.call must be a func │
         │    161 │                                                                     │
         │    162 │   if is_coroutine:                                                  │
         │ ❱  163 │   │   return await dependant.call(**values)                         │
         │    164 │   else:                                                             │
         │    165 │   │   return await run_in_threadpool(dependant.call, **values)      │
         │    166                                                                       │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/api/routes/generate.py:35 in    │
         │ txt2img_job                                                                  │
         │                                                                              │
         │    32 │   try:                                                               │
         │    33 │   │   images: Union[List[Image.Image], List[str]]                    │
         │    34 │   │   time: float                                                    │
         │ ❱  35 │   │   images, time = await gpu.generate(job)                         │
         │    36 │   except ModelNotLoadedError:                                        │
         │    37 │   │   raise HTTPException(  # pylint: disable=raise-missing-from     │
         │    38 │   │   │   status_code=400, detail="Model is not loaded"              │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:233 in generate     │
         │                                                                              │
         │   230 │   │   │                                                              │
         │   231 │   │   │   return (images, deltatime)                                 │
         │   232 │   │   except InferenceInterruptedError:                              │
         │ ❱ 233 │   │   │   await websocket_manager.broadcast(                         │
         │   234 │   │   │   │   Notification(                                          │
         │   235 │   │   │   │   │   "warning",                                         │
         │   236 │   │   │   │   │   "Inference interrupted",                           │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/api/websockets/manager.py:156   │
         │ in broadcast                                                                 │
         │                                                                              │
         │   153 │   │                                                                  │
         │   154 │   │   for connection in self.active_connections:                     │
         │   155 │   │   │   if connection.application_state.CONNECTED:                 │
         │ ❱ 156 │   │   │   │   await connection.send_json(data.to_json())             │
         │   157 │   │   │   else:                                                      │
         │   158 │   │   │   │   self.active_connections.remove(connection)             │
         │   159                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/websockets.py:173 in send_json                                  │
         │                                                                              │
         │   170 │   │   │   raise RuntimeError('The "mode" argument should be "text" o │
         │   171 │   │   text = json.dumps(data)                                        │
         │   172 │   │   if mode == "text":                                             │
         │ ❱ 173 │   │   │   await self.send({"type": "websocket.send", "text": text})  │
         │   174 │   │   else:                                                          │
         │   175 │   │   │   await self.send({"type": "websocket.send", "bytes": text.e │
         │   176                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/websockets.py:85 in send                                        │
         │                                                                              │
         │    82 │   │   │   │   )                                                      │
         │    83 │   │   │   if message_type == "websocket.close":                      │
         │    84 │   │   │   │   self.application_state = WebSocketState.DISCONNECTED   │
         │ ❱  85 │   │   │   await self._send(message)                                  │
         │    86 │   │   else:                                                          │
         │    87 │   │   │   raise RuntimeError('Cannot call "send" once a close messag │
         │    88                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/exceptions.py:65 in sender                           │
         │                                                                              │
         │    62 │   │   │                                                              │
         │    63 │   │   │   if message["type"] == "http.response.start":               │
         │    64 │   │   │   │   response_started = True                                │
         │ ❱  65 │   │   │   await send(message)                                        │
         │    66 │   │                                                                  │
         │    67 │   │   try:                                                           │
         │    68 │   │   │   await self.app(scope, receive, sender)                     │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/uvicorn/protocols/websockets/websockets_impl.py:345 in asgi_send          │
         │                                                                              │
         │   342 │   │                                                                  │
         │   343 │   │   else:                                                          │
         │   344 │   │   │   msg = "Unexpected ASGI message '%s', after sending 'websoc │
         │ ❱ 345 │   │   │   raise RuntimeError(msg % message_type)                     │
         │   346 │                                                                      │
         │   347 │   async def asgi_receive(                                            │
         │   348 │   │   self,                                                          │
         ╰──────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: Unexpected ASGI message 'websocket.send', after sending
         'websocket.close'.
INFO     22:19:02 | root » Adding job d359deb8-5040-4165-8fee-efee38f60f96 to queue        pytorch.py:605
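
The final RuntimeError means broadcast() in api/websockets/manager.py tried to send on a connection that had already gone through websocket.close. A minimal, defensive sketch of the broadcast loop, assuming Starlette's WebSocketState enum and the to_json() payload seen in the traceback (illustrative only, not the project's actual fix):

from starlette.websockets import WebSocketState

async def broadcast(self, data) -> None:
    # Iterate over a copy so dead connections can be dropped while looping
    for connection in list(self.active_connections):
        # Compare against the enum member explicitly; the attribute-style check
        # in the traceback does not reliably filter out closed sockets
        if connection.application_state == WebSocketState.CONNECTED:
            try:
                await connection.send_json(data.to_json())
            except RuntimeError:
                # The socket closed between the check and the send
                self.active_connections.remove(connection)
        else:
            self.active_connections.remove(connection)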

Installation Method

Local

Branch

Experimental

System Info

CachyOS Arch Linux, RTX 2060 GPU, 4900HS CPU, Python 3.11 (also tested in 3.10).

Logs

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • I am writing the issue in English.

mount directory of Local models already downloaded and allow civitai downloads

I want to compare to a1111, and I have a ton of models downloaded... but I see no way to mount a model directory so I can use them.
Also, I often prefer civitai to huggingface.

And if the TensorRT conversion is done locally, I'd like to save those engines and be able to reuse them... again, via an externally mounted directory?

WSL2 + docker + cuda toolkit

I have an 8 GB 2080 and 32 GB of memory and got lots of out-of-memory errors.
But in the end I think it succeeded somehow, yet it still gave this error.
I think Docker with CUDA toolkit support is a bit limited at the moment?
Cublas (Could not initialize cublas. Please check CUDA installation.)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl

/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py:39: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py:349: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Generating optimizing model: onnx/unet_fp16.opt.onnx
[I] Folding Constants | Pass 1
[I]     Total Nodes | Original:  8201, After Folding:  5947 |  2254 Nodes Folded
[I] Folding Constants | Pass 2
2022-12-27 21:32:57.943186900 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7900
2022-12-27 21:32:57.943320900 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7729
2022-12-27 21:32:57.943632400 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7969
2022-12-27 21:32:57.943700800 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7574
2022-12-27 21:32:57.943810200 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7427
2022-12-27 21:32:57.944080600 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7256
...
[I]     Total Nodes | Original:  5947, After Folding:  4536 |  1411 Nodes Folded
[I] Folding Constants | Pass 3
[I]     Total Nodes | Original:  4536, After Folding:  4536 |     0 Nodes Folded
Building TensorRT engine for onnx/unet_fp16.opt.onnx: engine/unet_fp16.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(32, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
	
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Requested amount of GPU memory (8589934592 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] Skipping tactic 3 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000004.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 9 due to insufficient memory on requested size of 8589934592 detected for tactic 0x000000000000003c.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 8 due to insufficient memory on requested size of 8589934592 detected for tactic 0x000000000000003c.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[W] - 24 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 498.222 seconds
[I] Saving engine to engine/unet_fp16.plan
Exception ignored in: <function Engine.__del__ at 0x7fcd429b1550>
Traceback (most recent call last):
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 50, in __del__
    [buf.free() for buf in self.buffers.values() if isinstance(buf, cuda.DeviceArray) ]
AttributeError: 'Engine' object has no attribute 'buffers'
Exporting model: onnx/vae.onnx
Downloading:  23%|████████████████████████▍      
/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py:583: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Generating optimizing model: onnx/vae.opt.onnx
[I] Folding Constants | Pass 1
[I]     Total Nodes | Original:   759, After Folding:   679 |    80 Nodes Folded
[I] Folding Constants | Pass 2
2022-12-27 21:43:47.511248100 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_149
2022-12-27 21:43:47.511309200 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_66
[I]     Total Nodes | Original:   679, After Folding:   675 |     4 Nodes Folded
[I] Folding Constants | Pass 3
[I]     Total Nodes | Original:   675, After Folding:   675 |     0 Nodes Folded
Building TensorRT engine for onnx/vae.opt.onnx: engine/vae.plan
[I]     Configuring with profiles: [Profile().add('latent', min=(1, 4, 64, 64), opt=(1, 4, 64, 64), max=(16, 4, 64, 64))]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
	
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 0 due to insufficient memory on requested size of 8589934592 detected for tactic 0x00000000000003e8.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 1 due to insufficient memory on requested size of 8589934592 detected for tactic 0x00000000000003ea.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 2 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000000.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 12 due to insufficient memory on requested size of 17179869184 detected for tactic 0x994f5b723e2d80da.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 13 due to insufficient memory on requested size of 17179869184 detected for tactic 0x65d82d184f452332.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 14 due to insufficient memory on requested size of 17179869184 detected for tactic 0x8d5c64a52fab02c9.
	Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[I] Finished engine building in 221.174 seconds
[I] Saving engine to engine/vae.plan
Exception ignored in: <function Engine.__del__ at 0x7fcd429b1550>
Traceback (most recent call last):
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 50, in __del__
    [buf.free() for buf in self.buffers.values() if isinstance(buf, cuda.DeviceArray) ]
AttributeError: 'Engine' object has no attribute 'buffers'
Building TensorRT engine for onnx/clip.opt.onnx: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I]     Configuring with profiles: [Profile().add('input_ids', min=(1, 77), opt=(1, 77), max=(16, 77))]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[W] - 6 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 49.716 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/clip.plan
Building TensorRT engine for onnx/unet_fp16.opt.onnx: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(32, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[W] - 22 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 426.725 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Building TensorRT engine for onnx/vae.opt.onnx: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I]     Configuring with profiles: [Profile().add('latent', min=(1, 4, 64, 64), opt=(1, 4, 64, 64), max=(16, 4, 64, 64))]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[I] Finished engine building in 164.962 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/vae.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/clip.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/vae.plan
[E] 1: [defaultAllocator.cpp::allocate::20] Error Code 1: Cuda Runtime (out of memory)
[W] Requested amount of GPU memory (5570036736 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[E] 2: [executionContext.cpp::ExecutionContext::409] Error Code 2: OutOfMemory (no further information)
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 1.03MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 738kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 36.6kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 582kB/s]
[I] Warming up ..
[I] Running StableDiffusion pipeline
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/clip.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/vae.plan
[E] 1: [wrapper.cpp::CublasWrapper::85] Error Code 1: Cublas (Could not initialize cublas. Please check CUDA installation.)
[E] 1: [engine.cpp::deserialize::867] Error Code 1: Serialization (Serialization assertion postDeserializationCheck() failed.Post deserialization check failure)
[E] 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
[!] Could not deserialize engine. See log for details.

172.17.0.1 - - [27/Dec/2022 21:58:48] "POST /voltaml/job HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
    pipeline_time = infer_trt(saving_path=saving_path,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 678, in infer_trt
    load_trt(saving_path, model, prompt, img_height, img_width, num_inference_steps)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 599, in load_trt
    trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 309, in loadEngines
    self.engine[model_name].activate()
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 78, in activate
    self.engine = engine_from_bytes(bytes_from_path(self.engine_path))
  File "<string>", line 3, in func_impl
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 564, in call_impl
    G_LOGGER.critical("Could not deserialize engine. See log for details.")
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py", line 597, in critical
    raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Could not deserialize engine. See log for details.
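
For reference, the repeated OutOfMemory tactic skips above come from the build being allowed a ~7.7 GiB workspace on an 8 GB card, and the log itself suggests lowering it via IBuilderConfig::setMemoryPoolLimit(). A minimal sketch with the plain TensorRT Python API (TensorRT >= 8.4; this is not the project's actual build path, which goes through polygraphy):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Cap the workspace pool at 2 GiB so tactics that ask for 8+ GiB are excluded up front
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)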


[Feature]: Tesla A40 support?

Is your feature request related to a problem? Please describe.

· Graphics card for AITemplate: RTX 40xx, RTX 30xx, H100, A100, A10, A30, V100, T4
It seems that the Tesla A40 is not supported?

Describe the solution you'd like

I have a Tesla A40, and it seems not to be on the supported list. Can I use it normally, or do I need to wait for support?

Describe alternatives you've considered

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Windows Native Errors

Describe the bug

AITemplate has an issue compiling models: it complains about the Makefile being missing. Manually editing the Makefile for Windows-specific directory structures does not work (it has not in the past). The aitemplate module can't be imported unless it is built manually, and even then, like TensorRT, it fails to work on Windows out of the box.

Reproduction

  1. Install the repo as you would on any other system
  2. Manually build AItemplate
  3. Install AITemplate Wheel (due to repo not being able to import Pypi AITemplate)
  4. Attempt to accelerate model
  5. Receive error about missing makefile directory

Expected behavior

Honestly, I expected this; most speedups are Linux-only (WSL or otherwise).

Branch

Main

System Info

Python: 3.10
OS: Windows 11
Repo: 6c82d05
GPU: RTX 3090
RAM: 48 GB

Additional context

FileNotFoundError: [Errno 2] No such file or directory: "'data\aitemplate\Linaqruf--anything-v3.0__512x512x1\profiler'\Makefile"
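
The quotes embedded inside the path above suggest the profiler directory string still carries literal quoting when the Makefile path is assembled. A purely hypothetical illustration of that failure mode and one way to sanitise it (not the repo's code):

from pathlib import Path

# Quoted workdir string as it appears inside the error message above
workdir = "'data\\aitemplate\\Linaqruf--anything-v3.0__512x512x1\\profiler'"
makefile = Path(workdir.strip("'\"")) / "Makefile"  # strip stray quotes before joining
print(makefile)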

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

NameError: name 'loaded_model' is not defined, and FileNotFoundError: [Errno 2] No such file or directory: 'onnx/clip.onnx'

Hi,

Trying to run accelerated SD1.5 models, I'm getting this issue.
Running on Windows 11 WSL, with an RTX 3070 8GB.

CMD:

docker run --gpus=all -v C:\voltaml\engine/engine:/workspace/voltaML-fast-stable-diffusion/engine -v C:\voltaml\output/engine:/workspace/voltaML-fast-stable-diffusion/static/output -p 5003:5003 -it voltaml/volta_diffusion_webui:v0.2
172.17.0.1 - - [18/Dec/2022 13:15:21] "POST /voltaml/job HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 661, in infer_trt
    if loaded_model!=args.model_path:
NameError: name 'loaded_model' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
    pipeline_time = infer_trt(saving_path=saving_path,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 664, in infer_trt
    load_trt(saving_path, model, prompt, img_height, img_width, num_inference_steps)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 599, in load_trt
    trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 279, in loadEngines
    torch.onnx.export(model,
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/__init__.py", line 350, in export
    return utils.export(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 163, in export
    _export(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 1148, in _export
    with torch.serialization._open_file_like(f, "wb") as opened_file:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'onnx/clip.onnx'
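
The NameError happens because loaded_model is only assigned after a model has been loaded once, so the very first request hits an unbound name; the follow-up FileNotFoundError is torch.onnx.export failing to write onnx/clip.onnx, most likely because the onnx/ directory does not exist in the container's working directory. A simplified sketch of the first guard (names mirror the traceback, but this is illustrative, not the project's fix):

loaded_model = None  # module-level default instead of an unbound name

def load_trt(model_path):
    # Stub standing in for the real engine loader in volta_accelerate.py
    print(f"loading TensorRT engines for {model_path}")

def infer_trt(model_path):
    global loaded_model
    if loaded_model != model_path:  # safe on the first request too
        load_trt(model_path)
        loaded_model = model_path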

image to image AND controlnet don't work when using an AIT model

INFO: 172.18.22.48:57122 - "POST /api/generate/img2img HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/core/cluster.py", line 143, in generate
    return await best_gpu.generate(job)
  File "/app/core/gpu.py", line 135, in generate
    raise err
  File "/app/core/gpu.py", line 127, in generate
    images = await run_in_thread_async(func=generate_thread_call, args=(job,))
  File "/app/core/utils.py", line 77, in run_in_thread_async
    raise exc
  File "/app/core/thread.py", line 45, in run
    self._return = target(*self._args, **self._kwargs)  # type: ignore
  File "/app/core/gpu.py", line 93, in generate_thread_call
    images: List[Image.Image] = model.generate(job)
  File "/app/core/inference/aitemplate.py", line 102, in generate
    images = self.img2img(job)
  File "/app/core/inference/aitemplate.py", line 176, in img2img
    pipe = StableDiffusionImg2ImgAITPipeline(
  File "/app/core/aitemplate/src/ait_img2img.py", line 111, in __init__
    self.clip_ait_exe = self.init_ait_module(
  File "/app/core/aitemplate/src/ait_img2img.py", line 137, in init_ait_module
    mod = Model(os.path.join(workdir, model_name, "test.so"))
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 213, in __init__
    self.DLL = self._DLLWrapper(lib_path)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 164, in __init__
    self.DLL = ctypes.cdll.LoadLibrary(lib_path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: tmp/CLIPTextModel/test.so: cannot open shared object file: No such file or directory
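
The OSError means ctypes found no compiled AITemplate module at tmp/CLIPTextModel/test.so: the img2img/controlnet pipelines expect their own compiled .so files, which were apparently never built for this model. A hypothetical guard around the loader (names follow the traceback; the import path is an assumption, and this is not the repo's actual code):

import os
from aitemplate.compiler import Model  # assumed import path for AITemplate's runtime Model class

def init_ait_module(model_name, workdir="tmp"):
    lib_path = os.path.join(workdir, model_name, "test.so")
    if not os.path.exists(lib_path):
        # Fail with an actionable message instead of ctypes' bare OSError
        raise FileNotFoundError(
            f"{lib_path} not found; compile the AITemplate modules for this "
            "pipeline before loading it"
        )
    return Model(lib_path)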

Fine-tuned model quality degrades after compiling to TRT

Hi, I found your code works really well.
The compile went smoothly, but I found that my fine-tuned model's quality goes down when running inference with the compiled TRT engines.
Do you have experience of quality changing after compilation?
Or is it okay with your models?
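
One quick way to narrow this down (not part of the repo) is to check how much of the drift is plain FP16 precision before blaming the TensorRT conversion: generate the same seed with stock diffusers at fp32 and fp16 and compare. The model id, prompt, and step count below are placeholders:

import numpy as np
import torch
from diffusers import StableDiffusionPipeline

model_id = "path/or/hub-id/of/the/fine-tuned/model"  # placeholder

def sample(dtype):
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype).to("cuda")
    gen = torch.Generator("cuda").manual_seed(42)
    image = pipe("a test prompt", generator=gen, num_inference_steps=25).images[0]
    return np.asarray(image, dtype=np.float32)

diff = np.abs(sample(torch.float32) - sample(torch.float16))
print("mean abs pixel difference:", diff.mean())

If fp16 alone already shifts the output noticeably, a change after the FP16 TRT build is expected; if not, the regression is more likely in the export/conversion itself.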

TRT Inference Not Working [volta_trt_flash]

[E] 3: [executionContext.cpp::validateInputBindings::1831] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::validateInputBindings::1831, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [2,4,64,96] for bindings[0] exceed min ~ max range at index 3, maximum dimension in profile is 64, minimum dimension in profile is 64, but supplied dimension is 96.
Exception in thread Thread-87:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 544, in infer_trt
    images = demo.infer(prompt, negative_prompt, args.height, args.width, verbose=args.verbose, seed=args.seed)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 404, in infer
    noise_pred = self.runEngine(self.unet_model_key, {"sample": sample_inp, "timestep": timestep_inp, "encoder_hidden_states": embeddings_inp})['latent']
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 271, in runEngine
    return engine.infer(feed_dict, self.stream)
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 108, in infer
    raise ValueError(f"ERROR: inference failed.")
ValueError: ERROR: inference failed.
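
The binding error above is a profile mismatch rather than a crash inside the engine: the error text says the 'sample' profile has min and max pinned to 64 at index 3 (64x64 latents, i.e. 512x512 images), so a 512x768 request arrives as a 64x96 latent and is rejected at bindings[0]. A sketch of a wider profile using the same polygraphy Profile API that appears in the build logs (the exact min/opt/max values are illustrative):

from polygraphy.backend.trt import Profile

profile = (
    Profile()
    .add("sample", min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(2, 4, 96, 96))
    .add("encoder_hidden_states", min=(2, 77, 768), opt=(2, 77, 768), max=(2, 77, 768))
    .add("timestep", min=[1], opt=[1], max=[1])
)
# Rebuilding the engine with this profile lets it accept latent sizes up to 96x96
# (image sizes up to 768x768) instead of only 64x64.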

RTX 4090
Used the original Dockerfile from the volta_trt_flash branch.

[12/10/2022-22:18:01] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9398 + (Unnamed Layer* 6862) [Shuffle].../3/0_2/Reshape_1 + /3/0_2/Transpose_1]}.)
[12/10/2022-22:18:01] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

python3 volta_accelerate.py --onnx_trt=trt

GPU: GTX 2080Ti

Running from the docker image: voltaml/volta_diffusion:v0.2

[12/10/2022-22:18:01] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9398 + (Unnamed Layer* 6862) [Shuffle].../3/0_2/Reshape_1 + /3/0_2/Transpose_1]}.)
[12/10/2022-22:18:01] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

clip skip selection?

Is your feature request related to a problem? Please describe.

A feature other UIs already have.

Describe the solution you'd like

Maybe a slider?

Describe alternatives you've considered

No response

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.
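
For context, "clip skip" in other UIs means taking the text-encoder hidden states from an earlier CLIP layer (clip skip 2 = the second-to-last layer) instead of the final one. A small illustration with transformers, purely to show the mechanism; the model id and wiring are assumptions, not this repo's implementation:

import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # the text encoder used by SD 1.x
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

def encode(prompt: str, clip_skip: int = 1) -> torch.Tensor:
    tokens = tokenizer(prompt, padding="max_length", max_length=77, return_tensors="pt")
    out = text_encoder(tokens.input_ids, output_hidden_states=True)
    hidden = out.hidden_states[-clip_skip]  # -1 = last layer, -2 = skip one layer, ...
    # Re-apply the final layer norm, as UIs usually do when skipping layers
    return text_encoder.text_model.final_layer_norm(hidden)

A slider in the UI would then just pick the clip_skip value passed to the text encoder.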

Converted models produce same output

I've converted SD1.5, SD2.1, and my merge https://huggingface.co/Magistr/Magmix to TRT
and ran this prompt against them:
(18 years young girl:1.4),detailed face and eyes,green eyes,female focus,silver hair,short messy hair,small breasts,flat chest,(blue sneakers:1.2),(black bike shorts:1.2),full body,fooling around,standing, wolf years, wolf tail, animal tail, grey croptop,red jacket, riverside, ruins, old bridge,(yellow socks:1.2),

I got the same images in the output from all 3 TRT models.
