
ml-stable-diffusion's Introduction

Core ML Stable Diffusion

Run Stable Diffusion on Apple Silicon with Core ML

[Blog Post] [BibTeX]

This repository comprises:

  • python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python
  • StableDiffusion, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by python_coreml_stable_diffusion

If you run into issues during installation or runtime, please refer to the FAQ section. Please refer to the System Requirements section before getting started.

System Requirements


Model Conversion:

macOS 13.1 | Python 3.8 | coremltools 7.0

Project Build:

macOS 13.1 | Xcode 14.3 | Swift 5.8

Target Device Runtime:

macOS 13.1 | iPadOS, iOS 16.2

Target Device Runtime (With Memory Improvements):

macOS 14.0 | iPadOS, iOS 17.0

Target Device Hardware Generation:

Mac: M1 | iPad: M1 | iPhone: A14

Performance Benchmarks


stabilityai/stable-diffusion-2-1-base (512x512)

Device --compute-unit --attention-implementation End-to-End Latency (s) Diffusion Speed (iter/s)
iPhone 12 Mini CPU_AND_NE SPLIT_EINSUM_V2 18.5* 1.44
iPhone 12 Pro Max CPU_AND_NE SPLIT_EINSUM_V2 15.4 1.45
iPhone 13 CPU_AND_NE SPLIT_EINSUM_V2 10.8* 2.53
iPhone 13 Pro Max CPU_AND_NE SPLIT_EINSUM_V2 10.4 2.55
iPhone 14 CPU_AND_NE SPLIT_EINSUM_V2 8.6 2.57
iPhone 14 Pro Max CPU_AND_NE SPLIT_EINSUM_V2 7.9 2.69
iPad Pro (M1) CPU_AND_NE SPLIT_EINSUM_V2 11.2 2.19
iPad Pro (M2) CPU_AND_NE SPLIT_EINSUM_V2 7.0 3.07
  • This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 Seed 8 in August 2023.
  • The performance data was collected using the benchmark branch of the Diffusers app
  • Swift code is not fully optimized, introducing up to ~10% overhead unrelated to Core ML model execution.
  • The median latency value across 5 back-to-back end-to-end executions is reported
  • The image generation procedure follows the standard configuration: 20 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).
  • The actual prompt length does not impact performance because the Core ML model is converted with a static shape that computes the forward pass for all 77 elements (tokenizer.model_max_length) in the text token sequence regardless of the actual length of the input text (a short tokenizer sketch follows this list).
  • Weights are compressed to 6 bit precision. Please refer to this section for details.
  • Activations are in float16 precision for both the GPU and the Neural Engine.
  • * indicates that the reduceMemory option was enabled which loads and unloads models just-in-time to avoid memory shortage. This added up to 2 seconds to the end-to-end latency.
  • In the benchmark table, we report the best performing --compute-unit and --attention-implementation values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
  • Note that the performance optimizations in this repository (e.g. --attention-implementation) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent peak HW capability.
  • Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
  • Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.
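
The static-shape behavior noted above is easy to see at the tokenizer level. Below is a minimal, illustrative Python sketch using the Hugging Face transformers CLIP tokenizer (the openai/clip-vit-base-patch32 checkpoint is an assumption used as a stand-in; the Stable Diffusion tokenizers behave the same way): every prompt is padded to tokenizer.model_max_length (77) tokens, so the text encoder always sees the same input shape.

from transformers import CLIPTokenizer

# Illustrative checkpoint; any CLIP tokenizer with model_max_length == 77 behaves the same way.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

for prompt in ["a dog", "a high quality photo of a surfing dog on a sunny beach"]:
    ids = tokenizer(
        prompt,
        padding="max_length",                   # pad every prompt to the same static length
        max_length=tokenizer.model_max_length,  # 77 tokens
        truncation=True,
        return_tensors="np",
    ).input_ids
    print(prompt, "->", ids.shape)  # both prompts yield shape (1, 77)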

stabilityai/stable-diffusion-xl-base-1.0-ios (768x768)

Device --compute-unit --attention-implementation End-to-End Latency (s) Diffusion Speed (iter/s)
iPhone 12 Pro CPU_AND_NE SPLIT_EINSUM 116* 0.50
iPhone 13 Pro Max CPU_AND_NE SPLIT_EINSUM 86* 0.68
iPhone 14 Pro Max CPU_AND_NE SPLIT_EINSUM 77* 0.83
iPhone 15 Pro Max CPU_AND_NE SPLIT_EINSUM 31 0.85
iPad Pro (M1) CPU_AND_NE SPLIT_EINSUM 36 0.69
iPad Pro (M2) CPU_AND_NE SPLIT_EINSUM 27 0.98
  • This benchmark was conducted by Apple and Hugging Face using iOS 17.0.2 and iPadOS 17.0.2 in September 2023.
  • The performance data was collected using the benchmark branch of the Diffusers app
  • The median latency value across 5 back-to-back end-to-end executions is reported
  • The image generation procedure follows this configuration: 20 inference steps, 768x768 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).
  • Unet.mlmodelc is compressed to 4.04 bit precision following the Mixed-Bit Palettization algorithm recipe published here
  • All models except for Unet.mlmodelc are compressed to 16 bit precision
  • madebyollin/sdxl-vae-fp16-fix by @madebyollin was used as the source PyTorch model for VAEDecoder.mlmodelc in order to enable float16 weight and activation quantization for the VAE model.
  • --attention-implementation SPLIT_EINSUM is chosen in lieu of SPLIT_EINSUM_V2 due to the prohibitively long compilation time of the latter
  • * indicates that the reduceMemory option was enabled which loads and unloads models just-in-time to avoid memory shortage. This added significant overhead to the end-to-end latency. Note the end-to-end latency gap between iPad Pro (M1) and iPhone 13 Pro Max despite their nearly identical diffusion speeds.
  • The actual prompt length does not impact performance because the Core ML model is converted with a static shape that computes the forward pass for all of the 77 elements (tokenizer.model_max_length) in the text token sequence regardless of the actual length of the input text.
  • In the benchmark table, we report the best performing --compute-unit and --attention-implementation values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
  • Note that the performance optimizations in this repository (e.g. --attention-implementation) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent peak HW capability.
  • Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
  • Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.

stabilityai/stable-diffusion-xl-base-1.0 (1024x1024)

Device --compute-unit --attention-implementation End-to-End Latency (s) Diffusion Speed (iter/s)
MacBook Pro (M1 Max) CPU_AND_GPU ORIGINAL 46 0.46
MacBook Pro (M2 Max) CPU_AND_GPU ORIGINAL 37 0.57
Mac Studio (M1 Ultra) CPU_AND_GPU ORIGINAL 25 0.89
Mac Studio (M2 Ultra) CPU_AND_GPU ORIGINAL 20 1.11
  • This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 in July 2023.
  • The performance data was collected by running the StableDiffusion Swift pipeline.
  • The median latency value across 3 back-to-back end-to-end executions is reported
  • The image generation procedure follows the standard configuration: 20 inference steps, 1024x1024 output image resolution, classifier-free guidance (batch size of 2 for unet).
  • Weights and activations are in float16 precision
  • Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
  • Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state. Given these factors, we do not report sub-second variance in latency.

Weight Compression (6-bits and higher)


coremltools-7.0 supports advanced weight compression techniques for pruning, palettization and linear 8-bit quantization. For these techniques, coremltools.optimize.torch.* includes APIs that require fine-tuning to maintain accuracy at higher compression rates whereas coremltools.optimize.coreml.* includes APIs that are applied post-training and are data-free.

We demonstrate how data-free post-training palettization implemented in coremltools.optimize.coreml.palettize_weights enables greatly improved performance for Stable Diffusion on mobile devices. This API implements the Fast Exact k-Means algorithm for optimal weight clustering, which yields more accurate palettes. Passing --quantize-nbits {2,4,6,8} during conversion applies this compression to the unet and text_encoder models.

For best results, we recommend training-time palettization: coremltools.optimize.torch.palettization.DKMPalettizer if fine-tuning your model is feasible. This API implements the Differentiable k-Means (DKM) learned palettization algorithm. In this exercise, we stick to post-training palettization for the sake of simplicity and ease of reproducibility.

The Neural Engine is capable of accelerating models with low-bit palettization: 1, 2, 4, 6 or 8 bits. With iOS 17 and macOS 14, compressed weights for Core ML models can be just-in-time decompressed during runtime (as opposed to ahead-of-time decompression upon load) to match the precision of activation tensors. This yields significant memory savings and enables models to run on devices with smaller RAM (e.g. iPhone 12 Mini). In addition, compressed weights are faster to fetch from memory which reduces the latency of memory bandwidth-bound layers. The just-in-time decompression behavior depends on the compute unit, layer type and hardware generation.
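
As a rough illustration of the data-free path described above, here is a minimal Python sketch of post-training 6-bit palettization with coremltools 7 applied to an already converted model (the .mlpackage file name is a hypothetical placeholder; passing --quantize-nbits during conversion performs the equivalent step for you):

import coremltools as ct
import coremltools.optimize.coreml as cto

# Load a previously converted float16 model (hypothetical path).
model = ct.models.MLModel("Stable_Diffusion_unet.mlpackage")

# 6-bit k-means palettization applied to all weights; no calibration data required.
op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=6)
config = cto.OptimizationConfig(global_config=op_config)

compressed = cto.palettize_weights(model, config=config)
compressed.save("Stable_Diffusion_unet_palettized.mlpackage")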

Sample outputs for stabilityai/stable-diffusion-2-1-base generating "a high quality photo of a surfing dog" (images omitted) were compared across three configurations: 6-bit weights on cpuAndNeuralEngine, 16-bit weights on cpuAndNeuralEngine, and 16-bit weights on cpuAndGPU.

Note that there are minor differences across the 16-bit (float16) and 6-bit results. These differences are comparable to the differences across float16 and float32, or across compute units, as exemplified above. We recommend a minimum of 6 bits for palettizing Stable Diffusion. Smaller numbers of bits (1, 2 and 4) will require either fine-tuning or advanced palettization techniques such as MBP.


Advanced Weight Compression (Lower than 6-bits)


This section describes an advanced compression algorithm called Mixed-Bit Palettization (MBP) built on top of the Post-Training Weight Palettization tools and using the Weights Metadata API from coremltools.

MBP builds a per-layer "palettization recipe" by picking a suitable number of bits among the Neural Engine supported bit-widths of 1, 2, 4, 6 and 8 in order to achieve the minimum average bit-width while maintaining a desired level of signal strength. The signal strength is measured by comparing the compressed model's output to that of the original float16 model. Given the same random seed and text prompts, PSNR between denoised latents is computed. The compression rate will depend on the model version as well as the tolerance for signal loss (drop in PSNR) since this algorithm is adaptive.
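
The PSNR used as the signal-strength metric can be computed directly from a pair of denoised latents. Here is a minimal sketch of one common PSNR definition (the latent arrays are placeholders, and the pre-analysis script may use a slightly different normalization):

import numpy as np

def psnr(reference, test):
    # Peak signal-to-noise ratio in dB, using the reference's peak magnitude as the signal ceiling.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    peak = np.abs(reference).max()
    return 10.0 * np.log10(peak ** 2 / mse)

# latents_fp16 and latents_compressed would be denoised latents produced by the original
# and the compressed model, using identical seeds and prompts:
# print(f"{psnr(latents_fp16, latents_compressed):.1f} dB")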

Sample comparison (images omitted): 3.41-bit, 4.50-bit, 6.55-bit and 16-bit (original) variants.

For example, the original float16 stabilityai/stable-diffusion-xl-base-1.0 model has an ~82 dB signal strength. Naively applying linear 8-bit quantization to the Unet model drops the signal to ~65 dB. Instead, applying MBP yields an average of 2.81-bits quantization while maintaining a signal strength of ~67 dB. This technique generally yields better results compared to using --quantize-nbits during model conversion but requires a "pre-analysis" run that takes up to a few hours on a single GPU (mps or cuda).

For stabilityai/stable-diffusion-xl-base-1.0, the signal strength (PSNR in dB) versus model size reduction (% of float16 size) curves behave as follows. The {1,2,4,6,8}-bit curves are generated by progressively palettizing more layers using a palette with a fixed number of bits. The layers were ordered in ascending order of their isolated impact on end-to-end signal strength, so the cumulative compression's impact is delayed as much as possible. The mixed-bit curve is based on falling back to a higher number of bits as soon as a layer's isolated impact on end-to-end signal integrity drops below a threshold. Note that all curves based on palettization outperform linear 8-bit quantization at the same model size, except for 1-bit.

Here are the steps for applying this technique on another model version:

Step 1: Run the pre-analysis script to generate "recipes" with varying signal strength:

python -m python_coreml_stable_diffusion.mixed_bit_compression_pre_analysis --model-version <model-version> -o <output-dir>

For popular base models, you may find the pre-computed pre-analysis results here. Fine-tuned models are likely to honor the recipes of their corresponding base models, but this is untested.

Step 2: The resulting JSON file from Step 1 will list "baselines", e.g.:

{
  "model_version": "stabilityai/stable-diffusion-xl-base-1.0",
  "baselines": {
    "original": 82.2,
    "linear_8bit": 66.025,
    "recipe_6.55_bit_mixedpalette": 79.9,
    "recipe_5.52_bit_mixedpalette": 78.2,
    "recipe_4.89_bit_mixedpalette": 76.8,
    "recipe_4.41_bit_mixedpalette": 75.5,
    "recipe_4.04_bit_mixedpalette": 73.2,
    "recipe_3.67_bit_mixedpalette": 72.2,
    "recipe_3.32_bit_mixedpalette": 71.4,
    "recipe_3.19_bit_mixedpalette": 70.4,
    "recipe_3.08_bit_mixedpalette": 69.6,
    "recipe_2.98_bit_mixedpalette": 68.6,
    "recipe_2.90_bit_mixedpalette": 67.8,
    "recipe_2.83_bit_mixedpalette": 67.0,
    "recipe_2.71_bit_mixedpalette": 66.3
  }
}

Among these baselines, select a recipe based on your desired signal strength. We recommend palettizing to roughly 4 bits or more depending on the use case, even though the signal integrity of lower-bit recipes is still higher than the linear 8-bit quantization baseline.
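
As a convenience, the recipe can also be picked programmatically from the pre-analysis JSON, for example by keeping the lowest average bit-width that still meets a chosen PSNR floor. A minimal sketch (the file name and the 70 dB threshold are illustrative):

import json

with open("pre_analysis.json") as f:
    baselines = json.load(f)["baselines"]

MIN_PSNR_DB = 70.0  # illustrative threshold; tune for your use case

# Keep only mixed-palette recipes above the floor, then take the smallest bit-width.
candidates = {
    name: psnr for name, psnr in baselines.items()
    if name.startswith("recipe_") and psnr >= MIN_PSNR_DB
}
selected = min(candidates, key=lambda name: float(name.split("_")[1]))
print(selected)  # e.g. "recipe_3.19_bit_mixedpalette" for the example above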

Finally, apply the selected recipe to the float16 Core ML model as follows:

python -m python_coreml_stable_diffusion.mixed_bit_compression_apply --mlpackage-path <path-to-float16-unet-mlpackage> -o <output-dir> --pre-analysis-json-path <path-to-pre-analysis-json> --selected-recipe <selected-recipe-string-key>

An example <selected-recipe-string-key> would be "recipe_4.50_bit_mixedpalette" which achieves an average of 4.50-bits compression (compressed from ~5.2GB to ~1.46GB for SDXL). Please note that signal strength does not directly map to image-text alignment. Always verify that your MBP-compressed model variant is accurately generating images for your test prompts.

Using Stable Diffusion XL


Model Conversion

e.g.:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-vae-decoder --convert-text-encoder --xl-version --model-version stabilityai/stable-diffusion-xl-base-1.0 --refiner-version stabilityai/stable-diffusion-xl-refiner-1.0 --bundle-resources-for-swift-cli --attention-implementation {ORIGINAL,SPLIT_EINSUM} -o <output-dir>
  • --xl-version: Additional argument to pass to the conversion script when specifying an XL model
  • --refiner-version: Additional argument to pass to the conversion script when specifying an XL refiner model, required for "Ensemble of Expert Denoisers" inference.
  • --attention-implementation: ORIGINAL is recommended for cpuAndGPU for deployment on Mac
  • --attention-implementation: SPLIT_EINSUM is recommended for cpuAndNeuralEngine for deployment on iPhone & iPad
  • --attention-implementation: SPLIT_EINSUM_V2 is not recommended for Stable Diffusion XL because of prohibitively long compilation time
  • Tip: Adding --latent-h 96 --latent-w 96 is recommended for iOS and iPadOS deployment, which yields 768x768 generation instead of the default 1024x1024.
  • Tip: Due to known float16 overflow issues in the original Stable Diffusion XL VAE, the model conversion script enforces float32 precision. Using a custom VAE version such as madebyollin/sdxl-vae-fp16-fix by @madebyollin via --custom-vae-version madebyollin/sdxl-vae-fp16-fix will restore the default float16 precision for VAE.

Swift Inference

swift run StableDiffusionSample <prompt> --resource-path <output-mlpackages-directory/Resources> --output-path <output-dir> --compute-units {cpuAndGPU,cpuAndNeuralEngine} --xl
  • Only the base model is required; the refiner model is optional and will be used by default if present in the resource directory
  • ControlNet for XL is not yet supported

Python Inference

python -m python_coreml_stable_diffusion.pipeline --prompt <prompt> --compute-unit {CPU_AND_GPU,CPU_AND_NE} -o <output-dir> -i <output-mlpackages-directory/Resources> --model-version stabilityai/stable-diffusion-xl-base-1.0
  • refiner model is not yet supported
  • ControlNet for XL is not yet supported

Using ControlNet


Example results using the prompt "a high quality photo of a surfing dog" conditioned on the scribble (leftmost):

ControlNet allows users to condition image generation with Stable Diffusion on signals such as edge maps, depth maps, segmentation maps, scribbles and pose. Thanks to @ryu38's contribution, both the Python CLI and the Swift package support ControlNet models. Please refer to this section for details on setting up Stable Diffusion with ControlNet.

Note that ControlNet is not yet supported for Stable Diffusion XL.

Using the System Multilingual Text Encoder


With iOS 17 and macOS 14, the NaturalLanguage framework introduced NLContextualEmbedding, which provides Transformer-based textual embeddings for Latin (20 languages), Cyrillic (4 languages) and CJK (3 languages) scripts. The WWDC23 session titled Explore Natural Language multilingual models demonstrated how this model can be used by developers to train downstream tasks such as multilingual image generation with Stable Diffusion.

The code to reproduce this demo workflow is made available in this repository. There are several ways in which this workflow can be implemented. Here is an example:

Step 1: Curate an image-text dataset with the desired languages.

Step 2: Pre-compute the NLContextualEmbedding values and replace the text strings with these embedding vectors in your dataset.

Step 3: Fine-tune a base model from Hugging Face Hub that is compatible with the StableDiffusionPipeline by using your new dataset and replacing the default text_encoder with your pre-computed NLContextualEmbedding values.

Step 4: In order to swap the text_encoder of a base model without training new layers, the base model's text_encoder.hidden_size must match that of NLContextualEmbedding. If it doesn't, you will need to train a linear projection layer to map between the two dimensionalities. After fine-tuning, this linear layer should be converted to Core ML as follows:

python -m python_coreml_stable_diffusion.multilingual_projection --input-path <path-to-projection-torchscript> --output-dir <output-dir>

The command above will yield a MultilingualTextEncoderProjection.mlmodelc file under --output-dir and this should be colocated with the rest of the Core ML model assets that were generated through --bundle-resources-for-swift-cli.

Step 5: The multilingual system text encoder can now be invoked by setting useMultilingualTextEncoder to true when initializing a pipeline or by setting --use-multilingual-text-encoder in the CLI. Note that the model assets are distributed over the air, so the first invocation will trigger an asset download of less than 100 MB.


Using Ready-made Core ML Models from Hugging Face Hub


🤗 Hugging Face ran the conversion procedure on the following models and made the Core ML weights publicly available on the Hub. If you would like to convert a version of Stable Diffusion that is not already available on the Hub, please refer to the Converting Models to Core ML section.

If you want to use any of those models you may download the weights and proceed to generate images with Python or Swift.

There are several variants in each model repository. You may clone the whole repos using git and git lfs to download all variants, or selectively download the ones you need.

To clone the repos using git, please follow this process:

Step 1: Install the git lfs extension for your system.

git lfs stores large files outside the main git repo, and it downloads them from the appropriate server after you clone or checkout. It is available in most package managers; check the installation page for details.

Step 2: Enable git lfs by running this command once:

git lfs install

Step 3: Use git clone to download a copy of the repo that includes all model variants. For Stable Diffusion version 1.4, you'd issue the following command in your terminal:

git clone https://huggingface.co/apple/coreml-stable-diffusion-v1-4

If you prefer to download specific variants instead of cloning the repos, you can use the huggingface_hub Python library. For example, to do generation in Python using the ORIGINAL attention implementation (read this section for details), you could use the following helper code:

from huggingface_hub import snapshot_download
from pathlib import Path

repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/packages"

model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path, local_dir_use_symlinks=False)
print(f"Model downloaded at {model_path}")

model_path would be the path in your local filesystem where the checkpoint was saved. Please refer to this post for additional details.

Converting Models to Core ML


Step 1: Create a Python environment and install dependencies:

conda create -n coreml_stable_diffusion python=3.8 -y
conda activate coreml_stable_diffusion
cd /path/to/cloned/ml-stable-diffusion/repository
pip install -e .

Step 2: Log in to or register for your Hugging Face account, generate a User Access Token and use this token to set up Hugging Face API access by running huggingface-cli login in a Terminal window.

Step 3: Navigate to the version of Stable Diffusion that you would like to use on Hugging Face Hub and accept its Terms of Use. The default model version is CompVis/stable-diffusion-v1-4. The model version may be changed by the user as described in the next step.

Step 4: Execute the following command from the Terminal to generate Core ML model files (.mlpackage)

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --model-version <model-version-string-from-hub> -o <output-mlpackages-directory>

WARNING: This command will download several GB worth of PyTorch checkpoints from Hugging Face. Please ensure that you are on Wi-Fi and have enough disk space.

This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful execution, the 4 neural network models that comprise Stable Diffusion will have been converted from PyTorch to Core ML (.mlpackage) and saved into the specified <output-mlpackages-directory>. Some additional notable arguments:

  • --model-version: The model version name as published on the Hugging Face Hub

  • --refiner-version: The refiner version name as published on the Hugging Face Hub. This is optional and if specified, this argument will convert and bundle the refiner unet alongside the model unet.

  • --bundle-resources-for-swift-cli: Compiles all 4 models and bundles them along with necessary resources for text tokenization into <output-mlpackages-directory>/Resources, which should be provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. However, using these compiled models in Python will significantly speed up inference.

  • --quantize-nbits: Quantizes the weights of the unet and text_encoder models down to 2, 4, 6 or 8 bits using a globally optimal k-means clustering algorithm. If this argument is not specified, all models keep their default 16-bit weights. Please refer to the Weight Compression (6-bits and higher) section for details and further guidance on weight compression.

  • --chunk-unet: Splits the Unet model into two approximately equal chunks (each with less than 1GB of weights) for mobile-friendly deployment. This is required for Neural Engine deployment on iOS and iPadOS if weights are not quantized to 6 bits or less (--quantize-nbits {2,4,6}). This is not required for macOS. The Swift CLI can consume both the chunked and regular versions of the Unet model but prioritizes the former. Note that the chunked unet is not compatible with the Python pipeline because the Python pipeline is intended for macOS only.

  • --attention-implementation: Defaults to SPLIT_EINSUM which is the implementation described in Deploying Transformers on the Apple Neural Engine. --attention-implementation SPLIT_EINSUM_V2 yields 10-30% improvement for mobile devices, still targeting the Neural Engine. --attention-implementation ORIGINAL will switch to an alternative implementation that should be used for CPU or GPU deployment on some Mac devices. Please refer to the Performance Benchmark section for further guidance.

  • --check-output-correctness: Compares original PyTorch model's outputs to final Core ML model's outputs. This flag increases RAM consumption significantly so it is recommended only for debugging purposes.

  • --convert-controlnet: Converts the ControlNet models specified after this option. Multiple models can be converted by listing them, e.g. --convert-controlnet lllyasviel/sd-controlnet-mlsd lllyasviel/sd-controlnet-depth.

  • --unet-support-controlnet: Enables a converted UNet model to receive additional inputs from ControlNet. This is required for generating images with ControlNet; the resulting model is saved under a different name, *_control-unet.mlpackage, distinct from the normal UNet. Note that this UNet variant cannot run without ControlNet, so use the normal UNet for plain txt2img.

  • --convert-vae-encoder: Not required for text-to-image applications. Required for image-to-image applications in order to map the input image to the latent space.
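
After conversion completes, a quick sanity check from Python is to load one of the generated .mlpackage files with coremltools and inspect its inputs and outputs. A minimal sketch (the file name is a hypothetical placeholder; actual names depend on the converted model version):

import coremltools as ct

# Hypothetical file name; torch2coreml writes one .mlpackage per converted component.
unet = ct.models.MLModel(
    "Stable_Diffusion_unet.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY,  # light-weight load for inspection
)

spec = unet.get_spec()
print([(inp.name, inp.type.WhichOneof("Type")) for inp in spec.description.input])
print([out.name for out in spec.description.output])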

Image Generation with Python


Run text-to-image generation using the example Python pipeline based on diffusers:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i <core-ml-model-directory> -o </path/to/output/image> --compute-unit ALL --seed 93

Please refer to the help menu for all available arguments: python -m python_coreml_stable_diffusion.pipeline -h. Some notable arguments:

  • -i: Should point to the -o directory from Step 4 of the Converting Models to Core ML section above. If you specified --bundle-resources-for-swift-cli during conversion, then use the resulting Resources folder (which holds the compiled .mlmodelc files). The compiled models load much faster after first use.
  • --model-version: If you overrode the default model version while converting models to Core ML, you will need to specify the same model version here.
  • --compute-unit: Note that the most performant compute unit for this particular implementation may differ across different hardware. CPU_AND_GPU or CPU_AND_NE may be faster than ALL. Please refer to the Performance Benchmark section for further guidance.
  • --scheduler: If you would like to experiment with different schedulers, you may specify one here. For available options, please see the help menu. You may also specify a custom number of inference steps with --num-inference-steps, which defaults to 50.
  • --controlnet: ControlNet models specified with this option are used in image generation. Use this option in the format --controlnet lllyasviel/sd-controlnet-mlsd lllyasviel/sd-controlnet-depth and make sure to use --controlnet-inputs in conjunction.
  • --controlnet-inputs: Image inputs corresponding to each ControlNet model. Please provide image paths in the same order as the models in --controlnet, for example: --controlnet-inputs image_mlsd image_depth.

Image Generation with Swift


Example CLI Usage

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path <output-mlpackages-directory>/Resources/ --seed 93 --output-path </path/to/output/image>

The output will be named based on the prompt and random seed: e.g. </path/to/output/image>/a_photo_of_an_astronaut_riding_a_horse_on_mars.93.final.png

Please use the --help flag to learn about batched generation and more.

Example Library Usage

import StableDiffusion
...
let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL)
pipeline.loadResources()
let image = try pipeline.generateImages(prompt: prompt, seed: seed).first

On iOS, the reduceMemory option should be set to true when constructing a StableDiffusionPipeline.

Swift Package Details

This Swift package contains two products:

  • StableDiffusion library
  • StableDiffusionSample command-line tool

Both of these products require the Core ML models and tokenization resources to be supplied. When specifying resources via a directory path, that directory must contain the following:

  • TextEncoder.mlmodelc or TextEncoder2.mlmodelc (text embedding model)
  • Unet.mlmodelc or UnetChunk1.mlmodelc & UnetChunk2.mlmodelc (denoising autoencoder model)
  • VAEDecoder.mlmodelc (image decoder model)
  • vocab.json (tokenizer vocabulary file)
  • merges.txt (merges for byte pair encoding file)

Optionally, for image2image, in-painting, or similar:

  • VAEEncoder.mlmodelc (image encoder model)

Optionally, it may also include the safety checker model that some versions of Stable Diffusion include:

  • SafetyChecker.mlmodelc

Optionally, for the SDXL refiner:

  • UnetRefiner.mlmodelc (refiner unet model)

Optionally, for ControlNet:

  • ControlledUNet.mlmodelc or ControlledUnetChunk1.mlmodelc & ControlledUnetChunk2.mlmodelc (enabled to receive ControlNet values)
  • controlnet/ (directory containing ControlNet models)
    • LllyasvielSdControlnetMlsd.mlmodelc (for example, from lllyasviel/sd-controlnet-mlsd)
    • LllyasvielSdControlnetDepth.mlmodelc (for example, from lllyasviel/sd-controlnet-depth)
    • Other models you converted

Note that the chunked version of Unet is checked for first. Only if it is not present will the full Unet.mlmodelc be loaded. Chunking is required for iOS and iPadOS and not necessary for macOS.

Example Swift App


🤗 Hugging Face created an open-source demo app on top of this library. It's written in native Swift and SwiftUI, and runs on macOS, iOS and iPadOS. You can use the code as a starting point for your app, or to see how to integrate this library in your own projects.

Hugging Face has made the app available in the Mac App Store.

FAQ

Q1: ERROR: Failed building wheel for tokenizers or error: can't find Rust compiler

A1: Please review this potential solution.

Q2: RuntimeError: {NSLocalizedDescription = "Error computing NN outputs."

A2: There are many potential causes for this error. In this context, it is highly likely to be encountered when your system is under increased memory pressure from other applications. Reducing memory utilization of other applications is likely to help alleviate the issue.

Q3: My Mac has 8GB RAM and I am converting models to Core ML using the example command. The process is getting killed because of memory issues. How do I fix this issue?

A3: In order to minimize the memory impact of the model conversion process, please execute the following command instead:

python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-encoder --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-decoder --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-safety-checker --model-version <model-version-string-from-hub> -o <output-mlpackages-directory>

If you need --chunk-unet, you may do so in yet another independent command which will reuse the previously exported Unet model and simply chunk it in place:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --chunk-unet -o <output-mlpackages-directory>
Q4: My Mac has 8GB RAM, should image generation work on my machine?

A4: Yes! In particular, the --compute-unit CPU_AND_NE option should work under reasonable system load from other applications. Note that part of the Example Results were generated using an M2 MacBook Air with 8GB RAM.

Q5: Every time I generate an image using the Python pipeline, loading all the Core ML models takes 2-3 minutes. Is this expected?

A5: Both .mlpackage and .mlmodelc models are compiled (also known as "model preparation" in Core ML terms) upon first load when a specific compute unit is specified. .mlpackage does not cache this compiled asset, so each model load retriggers the compilation, which may take up to a few minutes. On the other hand, .mlmodelc files do cache this compiled asset and non-first load times are reduced to just a few seconds.

In order to benefit from compilation caching, you may use the .mlmodelc assets instead of .mlpackage assets in both Swift (default) and Python (possible thanks to @lopez-hector's contribution) image generation pipelines.
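
One way to load the compiled .mlmodelc assets directly from Python is via coremltools' CompiledMLModel, which benefits from the compilation cache described above. A minimal sketch (paths are illustrative):

import coremltools as ct

# Non-first loads of a compiled .mlmodelc reuse the cached compilation and take only seconds.
text_encoder = ct.models.CompiledMLModel(
    "Resources/TextEncoder.mlmodelc",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)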

Q6: I want to deploy StableDiffusion, the Swift package, in my mobile app. What should I be aware of?

A6: The Image Generation with Swift section describes the minimum SDK and OS versions as well as the device models supported by this package. We recommend carefully testing the package on the device with the least amount of RAM available among your deployment targets.

The image generation process in StableDiffusion can yield over 2 GB of peak memory during runtime depending on the compute units selected. On iPadOS, we recommend using .cpuAndNeuralEngine in your configuration and the reduceMemory option when constructing a StableDiffusionPipeline to minimize memory pressure.

If your app crashes during image generation, consider adding the Increased Memory Limit capability to inform the system that some of your app's core features may perform better by exceeding the default app memory limit on supported devices.

On iOS, depending on the iPhone model, Stable Diffusion model version, selected compute units, system load and design of your app, this may still not be sufficient to keep your app's peak memory under the limit. Please remember that, because the device shares memory between apps and iOS processes, one app using too much memory can compromise the user experience across the whole device.

We strongly recommend compressing your models following the recipes in Advanced Weight Compression (Lower than 6-bits) for iOS deployment. This reduces the peak RAM usage by up to 75% (from 16-bit to 4-bit) while preserving model output quality.

Q7: How do I generate images with different resolutions using the same Core ML models?

A7: The current version of python_coreml_stable_diffusion does not support single-model multi-resolution out of the box. However, developers may fork this project and leverage the flexible shapes support from coremltools to extend the torch2coreml script by using coremltools.EnumeratedShapes. Note that, while the text_encoder is agnostic to the image resolution, the inputs and outputs of vae_decoder and unet models are dependent on the desired image resolution.
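
As a sketch of what the flexible-shapes extension might look like, an input declared with coremltools.EnumeratedShapes could enumerate the latent resolutions to support. This is illustrative only; the actual torch2coreml inputs, names and shapes differ per model component:

import coremltools as ct

# Two candidate latent resolutions: 512x512 and 768x768 images map to 64x64 and 96x96 latents.
latent_shapes = ct.EnumeratedShapes(shapes=[[1, 4, 64, 64], [1, 4, 96, 96]])

# This TensorType would be passed to ct.convert alongside the traced PyTorch module (not shown);
# the other unet inputs (timestep, encoder_hidden_states, ...) would be declared similarly.
sample_input = ct.TensorType(name="sample", shape=latent_shapes)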

Q8: Are the Core ML and PyTorch generated images going to be identical?

A8: If desired, the generated images across PyTorch and Core ML can be made approximately identical. However, it is not guaranteed by default. There are several factors that might lead to different images across PyTorch and Core ML:

1. Random Number Generator Behavior

The main source of potentially different results across PyTorch and Core ML is the Random Number Generator (RNG) behavior. PyTorch and Numpy have different sources of randomness. python_coreml_stable_diffusion generally relies on Numpy for RNG (e.g. latents initialization) and the StableDiffusion Swift Library reproduces this RNG behavior by default. However, PyTorch-based pipelines such as Hugging Face diffusers rely on PyTorch's RNG behavior. Thanks to @liuliu's contributions, one can match the PyTorch (CPU/GPU) RNG behavior in Swift by specifying --rng torch/cuda, which selects the torchRNG/cudaRNG mode.
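
The divergence is easy to demonstrate: with the same seed, NumPy's and PyTorch's generators produce different latent noise, so the resulting images differ even if everything else matches. A minimal sketch:

import numpy as np
import torch

seed, shape = 93, (1, 4, 64, 64)

np_latents = np.random.RandomState(seed).standard_normal(shape)
torch_latents = torch.randn(shape, generator=torch.Generator("cpu").manual_seed(seed))

# Different RNG sources: the two tensors do not match despite the identical seed.
print(np.allclose(np_latents, torch_latents.numpy()))  # False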

2. PyTorch

"Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds." (source).

3. Model Function Drift During Conversion

Drift between the outputs of corresponding PyTorch and Core ML models is another potential cause. Signal integrity is tested during the conversion process (enabled via the --check-output-correctness argument to python_coreml_stable_diffusion.torch2coreml) and it is verified to be above a minimum PSNR value on random inputs. Note that this is simply a sanity check and does not guarantee this minimum PSNR across all possible inputs. Furthermore, the results are not guaranteed to be identical when executing the same Core ML models across different compute units. This is not expected to be a major source of difference, as the sample visual results in this section indicate.

4. Weights and Activations Data Type

When quantizing models from float32 to lower-precision data types such as float16, the generated images are known to vary slightly in semantics even when using the same PyTorch model. Core ML models generated by coremltools have float16 weights and activations by default unless explicitly overridden. This is not expected to be a major source of difference.

Q9: The model files are very large, how do I avoid a large binary for my App?

A9: The recommended option is to prompt the user to download these assets upon first launch of the app. This keeps the app binary size independent of the Core ML models being deployed. Disclosing the size of the download to the user is extremely important as there could be data charges or storage impact that the user might not be comfortable with.

Q10: `Could not initialize NNPACK! Reason: Unsupported hardware`

A10: This warning is safe to ignore in the context of this repository.

Q11: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect

A11: This warning is safe to ignore in the context of this repository.

Q12: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

A12: If this warning is printed right after zsh: killed python -m python_coreml_stable_diffusion.torch2coreml ... , then it is highly likely that your Mac has run out of memory while converting models to Core ML. Please see Q3 from above for the solution.

BibTeX Reference

@misc{stable-diffusion-coreml-apple-silicon,
title = {Stable Diffusion with Core ML on Apple Silicon},
author = {Atila Orhon and Michael Siracusa and Aseem Wadhwa},
year = {2022},
URL = {https://github.com/apple/ml-stable-diffusion}
}

ml-stable-diffusion's People

Contributors

0o001, 1ucas, alejandro-isaza, atiorh, cclauss, dec2-anon, godly-devotion, guiyec, jiangdi0924, justinmeans, kasima, littleowl, liuliu, lopez-hector, msiracusa, nathantannar4, olegponomaryov, pcuenca, pd95, ryu-ga, ryu38, stephengoodman, stuartjmoore, thibaultcastells, tobyroseman, vzsg, wanaldino, wmorgue, yarspirin, zachnagengast



ml-stable-diffusion's Issues

Installation / Runtime Instructions

For those of us who aren't python heads, would it be possible to include a snippet of how to install and run this package? Thanks for building it!

Terminated due to memory issue (when deploying on iOS)

I was able to get StableDiffusion up and running in an app for macOS. Thanks for your incredible work, CoreML team!

However the same setup does not work on my iPhone 14 with iOS 16.2 (20C5058d) (no Simulator involved, tried it on a physical device).

Execution seems to struggle at line:

let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL)

...after a short freeze the app crashes and I'm getting an error from Xcode saying Terminated due to memory issue.

As recommended I’ve added the Increased Memory Limit capability to my project but without luck.

I also did use --chunk-unet, resulting in two unet chunks of about 850 MB each.

I am working with Xcode 14.1 (14B47) on a MBP 16 M1 Pro, 16 GB, macOS 13.0.1 Ventura.

What am I missing?

Always grey image generated, no matter the model

I always receive the following generated output for:

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars"  --seed 11 --output-path ./out/ --compute-units cpuAndGPU

a_photo_of_an_astronaut_riding_a_horse_on_mars 11 final

Checked with: stabilityai/stable-diffusion-2-base/stabilityai/stable-diffusion and 1.4

Any idea what is wrong?

Symbol not found: (_$s10Accelerate6vImageO11PixelBufferV5widthSivg)

When trying to run the example through the Swift CLI on an M1 MacBook I get a "Symbol not found" abort:

> swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./Resources --seed 93 --output-path output
Building for debugging...
Build complete! (0.11s)
dyld[71007]: Symbol not found: (_$s10Accelerate6vImageO11PixelBufferV5widthSivg)
  Referenced from: '/Users/me/ml-stable-diffusion/.build/arm64-apple-macosx/debug/StableDiffusionSample'
  Expected in: '/usr/lib/swift/libswiftAccelerate.dylib'
[1]    71007 abort      swift run StableDiffusionSample  --resource-path ./Resources --seed 93  outpu

Unable to run model on iPhone 14 Pro with error: "failed to load ANE model" - works fine from CLI

When I run:

let config = MLModelConfiguration()
config.computeUnits = .all
let pipeline = try! StableDiffusionPipeline(resourcesAt: modelUrl, configuration: config)

I get the following error:

[espresso] [Espresso::handle_ex_plan] exception=ANECF error: failed to load ANE model. Error=ANECCompile(/var/mobile/Library/Caches/com.apple.aned/tmp/com.featherless.MLDemo/CD3F6A18321CD0468900D511BF6E116C1AC2F5D1DB1D65F480343B1E5551B8A8/7204A653B1634F14166A639585DE3E3EDCFE052221F97F3476ECE9475CD8A5DE/) FAILED: err=(
    CompilationFailure
)
[coreml] Error plan build: -1.
[client] doUnloadModel:options:qos:error:: nil _ANEModel

The model is a converted stablediffusion model. I converted it using the following command line invocation:

python3 -m python_coreml_stable_diffusion.torch2coreml --convert-unet \
  --convert-text-encoder --convert-vae-decoder --convert-safety-checker \
  -o /Users/featherless/MLDemo/new-model   --model-version featherless/test-model \
  --chunk-unet --bundle-resources-for-swift-cli

The same model runs fine when invoked via command line:

swift run StableDiffusionSample "a digital portrait of an astronaut riding a horse, futuristic, highly detailed, HDR, 4k, illustration" \
  --resource-path /Users/featherless/MLDemo/new-model/Resources  \
  --seed=1235 --output-path /Users/featherless/MLDemo/output

Environment

Xcode Version 14.1 (14B47b)
Apple M1 Max, Ventura 13.1
iPhone 14 Pro, iOS 16.1.2

The model supplied is of version 7, intended for a newer version of Xcode

I got the following error, does anyone know why?

When I run the generation command, I always get the following error, even though I have set my Xcode to version 14.1.

Traceback (most recent call last):
  File "/Users/lizheng/opt/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/lizheng/opt/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/lizheng/lz/lz_code/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 534, in <module>
    main(args)
  File "/Users/lizheng/lz/lz_code/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 478, in main
    image = coreml_pipe(
  File "/Users/lizheng/lz/lz_code/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 297, in __call__
    text_embeddings = self._encode_prompt(
  File "/Users/lizheng/lz/lz_code/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 127, in _encode_prompt
    text_embeddings = self.text_encoder(
  File "/Users/lizheng/lz/lz_code/ml-stable-diffusion/python_coreml_stable_diffusion/coreml_model.py", line 79, in __call__
    return self.model.predict(kwargs)
  File "/Users/lizheng/opt/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/coremltools/models/model.py", line 545, in predict
    raise self._framework_error
  File "/Users/lizheng/opt/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/coremltools/models/model.py", line 143, in _get_proxy_and_spec
    return (_MLModelProxy(filename, compute_units.name), specification, None)
RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".

Running swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path <output-mlpackages-directory>/Resources/ --seed 93 --output-path </path/to/output/image> gives the below error.

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path mlmodels/Resources/ --seed 93 --output-path image.jpg
objc[9023]: Class _TtC10FoundationP33_45BFD3D387700B862E3A7353B97EF7ED21__CharacterSetStorage is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff8521766c0) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8062f8). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10Foundation13__DataStorage is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85217f1b8) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8063d8). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10Foundation13__NSSwiftData is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85216d640) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8044c8). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtCV10Foundation4Data14RangeReference is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85217f2d8) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8064b8). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10Foundation13__JSONEncoder is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85217ffa0) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8065d0). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10FoundationP33_12768CA107A31EF2DCE034FD75B541C913__JSONEncoder is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff8521800f0) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c806730). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10FoundationP33_12768CA107A31EF2DCE034FD75B541C924__JSONReferencingEncoder is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff8521801f0) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c806820). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10Foundation13__JSONDecoder is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff852180320) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c806930). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10FoundationP33_12768CA107A31EF2DCE034FD75B541C913__JSONDecoder is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff852180498) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c806a70). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtCE10FoundationCSo12NSDictionary9_Iterator is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85217b8e8) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c806b58). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10Foundation26__NSErrorRecoveryAttempter is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85217b018) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c806bf0). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtC10FoundationP33_6DA0945A07226B3278459E9368612FF427__KVOKeyPathBridgeMachinery is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85216c980) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c804570). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtCC10FoundationP33_6DA0945A07226B3278459E9368612FF427__KVOKeyPathBridgeMachinery9BridgeKey is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85216ca28) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8045d0). One of the two will be used. Which one is undefined.
objc[9023]: Class _NSKeyValueObservation is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85216cae8) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c804648). One of the two will be used. Which one is undefined.
objc[9023]: Class _TtCC10Foundation21NSKeyValueObservationP33_6DA0945A07226B3278459E9368612FF46Helper is implemented in both /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (0x7ff85216cbc0) and /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/lib/swift/macosx/libswiftFoundation.dylib (0x10c8046b8). One of the two will be used. Which one is undefined.
objc[9023]: (… dozens more duplicate-class warnings of the same form, for Foundation, Dispatch and Swift standard library classes implemented in both the system Swift runtime and the swift-tensorflow-RELEASE-0.11 toolchain; in each case "One of the two will be used. Which one is undefined." …)
2022-12-01 17:02:10.729 swift-run[9023:125583] -[Swift.__StringStorage _fastCharacterContents]: unrecognized selector sent to instance 0x600002368840
2022-12-01 17:02:10.739 swift-run[9023:125583] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[Swift.__StringStorage _fastCharacterContents]: unrecognized selector sent to instance 0x600002368840'
*** First throw call stack:
(
0 CoreFoundation 0x00007ff8107b1cb3 __exceptionPreprocess + 242
1 libobjc.A.dylib 0x00007ff81030210a objc_exception_throw + 48
2 CoreFoundation 0x00007ff810848bbe -[NSObject(NSObject) __retain_OA] + 0
3 CoreFoundation 0x00007ff81071cab0 forwarding + 1324
4 CoreFoundation 0x00007ff81071c4f8 _CF_forwarding_prep_0 + 120
5 CoreFoundation 0x00007ff8106ca630 _CFStringGetFileSystemRepresentationWithErrorStatus + 55
6 CoreFoundation 0x00007ff8106ca5eb CFStringGetFileSystemRepresentation + 11
7 Foundation 0x00007ff8114725e4 -[NSFileManager getFileSystemRepresentation:maxLength:withPath:] + 65
8 Foundation 0x00007ff8114a3ca1 +[NSFileAttributes _attributesAtPath:partialReturn:filterResourceFork:error:] + 98
9 swift-run 0x0000000100dd33ef $s8TSCBasic15LocalFileSystem33_DA6B485C7A646531CFCACE37CF46BC6BLLC02isC0ySbAA12AbsolutePathVFTf4nd_n + 303
10 swift-run 0x0000000100b2e8d8 $s8Commands15findPackageRoot33_F429052364556F5CBB375DC04530186ALL8TSCBasic12AbsolutePathVSgyF + 248
11 swift-run 0x0000000100b068c5 $s8Commands9SwiftToolC8toolName5usage8overview4args7seeAlsoACyxGSS_S2SSaySSGSSSgtcfcAA03RunC7OptionsC_Tg5Tf4gggggn_n + 6101
12 swift-run 0x0000000100b1714c $s8Commands12SwiftRunToolC4argsACSaySSG_tcfC + 140
13 swift-run 0x000000010109a1e2 main + 82
14 dyld 0x0000000201a75310 start + 2432
)
libc++abi: terminating with uncaught exception of type NSException
[1] 9023 abort swift run StableDiffusionSample --resource-path mlmodels/Resources/ --seed 9

Misleading benchmarks?

The benchmarks only include inference latency, but the actual end-to-end latency is much larger. For example, they say it takes 18 seconds on the 32-core M1 Max, which I have validated. However, there's an additional 22 seconds of latency before that, while it says Sampling.... I pulled it up in Activity Monitor, and here's what's happening:

  • Loading resources and creating pipeline - 2 seconds, because I've already run the model several times
  • Sampling... - 99% CPU, ~0% GPU, which means one CPU core utilized through this entire step (not multi-core), 22 seconds
  • Step 50 of 50 [mean: 0.99, median: 1.56, last 1.55] step/sec - ~0% CPU, 88% GPU, which means the actual model is running, 18 seconds
  • Total time: 40 seconds

Is anyone else getting these weird results? Is it the same for you, or much larger than 22 seconds? I don't know whether it's because I used the Swift CLI instead of the Python CLI; I cannot get the Python CLI to work: #43 (comment).

Generated images are constantly blank

What happened?

I downloaded the model checkpoints from Hugging Face and ran the inference command, but every time the output is either completely blank or something like the image below.

(image attached)

Tech details

Chip: Apple M2
Memory: 8GB
OS: 13.1 Beta (22C5059b)
pip list
Package                        Version    Editable project location
------------------------------ ---------- ----------------------------------------------------------
accelerate                     0.15.0
certifi                        2022.9.24
charset-normalizer             2.1.1
coremltools                    6.1
diffusers                      0.9.0
filelock                       3.8.0
huggingface-hub                0.11.1
idna                           3.4
importlib-metadata             5.1.0
mpmath                         1.2.1
numpy                          1.23.5
packaging                      21.3
Pillow                         9.3.0
pip                            22.3.1
protobuf                       3.20.3
psutil                         5.9.4
pyparsing                      3.0.9
python-coreml-stable-diffusion 0.1.0      /...
PyYAML                         6.0
regex                          2022.10.31
requests                       2.28.1
scipy                          1.9.3
setuptools                     65.5.1
sympy                          1.11.1
tokenizers                     0.13.2
torch                          1.12.0
tqdm                           4.64.1
transformers                   4.25.1
typing_extensions              4.4.0
urllib3                        1.26.13
wheel                          0.38.4
zipp                           3.11.0
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o output --compute-unit CPU_AND_NE --num-inference-steps 50
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 17467.17it/s]
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from models/coreml-stable-diffusion-v1-4_original_packages
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-v1-4_original_packages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 5.1 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-v1-4_original_packages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 116.1 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-v1-4_original_packages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 7.3 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading safety_checker mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-v1-4_original_packages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_safety_checker.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 2.2 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|██████████| 51/51 [00:35<00:00,  1.44it/s]
INFO:__main__:Generated image has nsfw concept=False
INFO:__main__:Saving generated image to output/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_CPU_AND_NE_modelVersion_CompVis_stable-diffusion-v1-4.png

--attention-implementation not implemented

Not sure if this is a code bug or whether I misunderstood the documentation. At any rate, --attention-implementation is not accepted as a flag by the pipeline script, despite the performance table suggesting otherwise.

time python -m python_coreml_stable_diffusion.pipeline --prompt "highly detailed photo of assistant professor getting lost in a data labyrinth with cobras and vipers" -i output-mlpackages -o images --compute-unit ALL --seed 93 --model-version runwayml/stable-diffusion-v1-5 --attention-implementation ORIGINAL
WARNING:coremltools:Torch version 1.13.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
usage: pipeline.py [-h] --prompt PROMPT -i I -o O [--seed SEED] [--model-version MODEL_VERSION]
                   [--compute-unit {ALL,CPU_AND_GPU,CPU_ONLY,CPU_AND_NE}]
                   [--scheduler {DDIM,DPMSolverMultistep,EulerAncestralDiscrete,EulerDiscrete,LMSDiscrete,PNDM}]
                   [--num-inference-steps NUM_INFERENCE_STEPS]
pipeline.py: error: unrecognized arguments: --attention-implementation ORIGINAL
python -m python_coreml_stable_diffusion.pipeline --prompt  -i  -o images  AL  1.79s user 3.92s system 384% cpu 1.483 total

Bug: File name too long when prompt too long

When I use a "mantra" on the Internet, Python will report an error when save a picture because the prompt text is too long:

prompt like this:

best quality,Amazing,Beautiful golden eyes,finely detail,Depth of field,extremely detailed CG unity 8k wallpaper, masterpiece,(((Long dark blond hair))),((red mediumhair)),(1 girl)...

Same as #48 and #45.

There appear to be 1 leaked semaphore objects to clean up at shutdown

Can't complete the conversion of models to Core ML

Chip: Apple M2
Memory: 8GB
OS: 13.0.1 (22A400)
pip list
Package                        Version    Editable project location
------------------------------ ---------- ----------------------------------------------------------
accelerate                     0.15.0
certifi                        2022.9.24
charset-normalizer             2.1.1
coremltools                    6.1
diffusers                      0.9.0
filelock                       3.8.0
huggingface-hub                0.11.1
idna                           3.4
importlib-metadata             5.1.0
mpmath                         1.2.1
numpy                          1.23.5
packaging                      21.3
Pillow                         9.3.0
pip                            21.3.1
protobuf                       3.20.3
psutil                         5.9.4
pyparsing                      3.0.9
python-coreml-stable-diffusion 0.1.0      /Users/....
PyYAML                         6.0
regex                          2022.10.31
requests                       2.28.1
scipy                          1.9.3
setuptools                     60.2.0
sympy                          1.11.1
tokenizers                     0.13.2
torch                          1.12.0
tqdm                           4.64.1
transformers                   4.25.1
typing_extensions              4.4.0
urllib3                        1.26.13
wheel                          0.37.1
zipp                           3.11.0

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o packages

!!! macOS 13.1 and newer or iOS/iPadOS 16.2 and newer is required for best performance !!!
INFO:__main__:Initializing StableDiffusionPipeline with CompVis/stable-diffusion-v1-4..
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 11636.70it/s]
INFO:__main__:Done.
INFO:__main__:Converting vae_decoder
INFO:__main__:`vae_decoder` already exists at packages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_vae_decoder.mlpackage, skipping conversion.
INFO:__main__:Converted vae_decoder
INFO:__main__:Converting unet
INFO:__main__:Attention implementation in effect: AttentionImplementations.SPLIT_EINSUM
INFO:__main__:Sample inputs spec: {'sample': (torch.Size([2, 4, 64, 64]), torch.float32), 'timestep': (torch.Size([2]), torch.float32), 'encoder_hidden_states': (torch.Size([2, 768, 1, 77]), torch.float32)}
INFO:__main__:JIT tracing..
/Users/xxx/xxx/apple/ml-stable-diffusion/venv/lib/python3.9/site-packages/torch/nn/functional.py:2515: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
/Users/xxx/xxx/apple/ml-stable-diffusion/python_coreml_stable_diffusion/layer_norm.py:61: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
INFO:__main__:Done.
INFO:__main__:Converting unet to CoreML..
WARNING:coremltools:Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                           | 0/7876 [00:00<?, ? ops/s]WARNING:coremltools:Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▉| 7874/7876 [00:01<00:00, 4105.24 ops/s]
Running MIL Common passes: 100%|██████████| 39/39 [00:27<00:00,  1.43 passes/s]
Running MIL FP16ComputePrecision pass: 100%|██████████| 1/1 [00:44<00:00, 44.50s/ passes]
Running MIL Clean up passes: 100%|██████████| 11/11 [03:00<00:00, 16.40s/ passes]
zsh: killed     python -m python_coreml_stable_diffusion.torch2coreml --convert-unet    -o
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

no such module 'PackageDescription'

I have Apple Swift version 5.7.2, xcode-select version 2396, and Ventura 13.1.

When I run swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path models/coreml-stable-diffusion-v1-4_original_compiled --seed 93 --output-path /Users/jibril/stablediffusion/output, I get this error:

.../Package.swift:4:8: error: no such module 'PackageDescription'
import PackageDescription

Incompatible Architecture issue when Converting Models to Core ML

I am on an M1 Mac and created the conda environment using

CONDA_SUBDIR=osx-arm64 conda create -n coreml_stable_diffusion python=3.8 -y

Yet when I try to generate Core ML model files by running python -m python_coreml_stable_diffusion.torch2coreml --convert-unet, I run into this issue:

(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

Here's the full error:

ImportError: dlopen(/Users/kush/opt/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/tokenizers/tokenizers.cpython-38-darwin.so, 0x0002): 
tried: '/Users/kush/opt/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/tokenizers/tokenizers.cpython-38-darwin.so' 
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')),

'/System/Volumes/Preboot/Cryptexes/OS/Users/kush/opt/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/tokenizers/tokenizers.cpython-38-darwin.so' (no such file), 
'/Users/kush/opt/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/tokenizers/tokenizers.cpython-38-darwin.so' 
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

Support for negative prompts in Swift CLI

As far as I could see, the Swift CLI implementation currently doesn't support negative prompts.

In the Python implementation, the negative prompt argument is tokenised, encoded into uncond_embeddings, and prepended to the text embeddings of the regular prompt:

else:
    uncond_tokens = negative_prompt

max_length = text_input_ids.shape[-1]
uncond_input = self.tokenizer(
    uncond_tokens,
    padding="max_length",
    max_length=max_length,
    truncation=True,
    return_tensors="np",
)
uncond_embeddings = self.text_encoder(
    input_ids=uncond_input.input_ids.astype(np.float32))["last_hidden_state"]

# For classifier free guidance, we need to do two forward passes.
# Here we concatenate the unconditional and text embeddings into a single batch
# to avoid doing two forward passes
text_embeddings = np.concatenate(
    [uncond_embeddings, text_embeddings])

The Swift implementation, on the other hand, does not handle a negative prompt argument:

// Encode the input prompt as well as a blank unconditioned input
let promptEmbedding = try textEncoder.encode(prompt)
let blankEmbedding = try textEncoder.encode("")

// Convert to Unet hidden state representation
let concatEmbedding = MLShapedArray<Float32>(
    concatenating: [blankEmbedding, promptEmbedding],
    alongAxis: 0
)

I don't really know how Stable Diffusion (v2) deals with negative prompts, but how would one add support for them to the Swift implementation? The Core ML model seems to support it, after all.
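For what it's worth, here is a minimal sketch of what that could look like, reusing only the calls already shown above (textEncoder.encode and the MLShapedArray concatenation); the negativePrompt value and how it gets plumbed in from the CLI are hypothetical:

// Encode the negative prompt in place of the blank unconditioned input,
// mirroring the Python ordering [uncond_embeddings, text_embeddings].
let promptEmbedding = try textEncoder.encode(prompt)
let negativeEmbedding = try textEncoder.encode(negativePrompt) // hypothetical argument, defaulting to ""
// Convert to Unet hidden state representation, negative/unconditioned half first
let concatEmbedding = MLShapedArray<Float32>(
    concatenating: [negativeEmbedding, promptEmbedding],
    alongAxis: 0
)

Since the text encoder treats a negative prompt exactly like any other prompt, passing an empty string would reproduce the current unconditioned behaviour.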

Error reading protobuf spec. validator error: The model supplied is of version 7

I am seeing the following error when I try to run the image generation script. I am using macOS Monterey 12.6.1
and my Xcode version is 14.1.

My protobuf version is (libprotoc 3.21.9)

(coreml_stable_diffusion) ☁  mint  python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i coreml_stable_diffusion_out_ml_packages_v1 -o astro_horse_mars.jpeg  --compute-unit ALL --seed 93
WARNING:coremltools:Torch version 1.13.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 17972.38it/s]
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from coreml_stable_diffusion_out_ml_packages_v1
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading coreml_stable_diffusion_out_ml_packages_v1/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlpackage
/Users/verapurv/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/coremltools/models/model.py:145: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".
  _warnings.warn(
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 0.0 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading coreml_stable_diffusion_out_ml_packages_v1/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 0.3 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading coreml_stable_diffusion_out_ml_packages_v1/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 0.0 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading safety_checker mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading coreml_stable_diffusion_out_ml_packages_v1/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_safety_checker.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 0.1 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
Traceback (most recent call last):
  File "/Users/verapurv/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/verapurv/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/verapurv/ProgramFiles/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 534, in <module>
    main(args)
  File "/Users/verapurv/ProgramFiles/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 478, in main
    image = coreml_pipe(
  File "/Users/verapurv/ProgramFiles/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 297, in __call__
    text_embeddings = self._encode_prompt(
  File "/Users/verapurv/ProgramFiles/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 127, in _encode_prompt
    text_embeddings = self.text_encoder(
  File "/Users/verapurv/ProgramFiles/ml-stable-diffusion/python_coreml_stable_diffusion/coreml_model.py", line 79, in __call__
    return self.model.predict(kwargs)
  File "/Users/verapurv/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/coremltools/models/model.py", line 545, in predict
    raise self._framework_error
  File "/Users/verapurv/anaconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/coremltools/models/model.py", line 143, in _get_proxy_and_spec
    return (_MLModelProxy(filename, compute_units.name), specification, None)
RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".

Am I missing anything?

Could not initialize NNPACK! Reason: Unsupported hardware.

Thanks Apple ML team for this!

I got this warning: Could not initialize NNPACK! Reason: Unsupported hardware.

Is this expected?

Torch version 1.13.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
!!! macOS 13.1 and newer or iOS/iPadOS 16.2 and newer is required for best performance !!!
INFO:__main__:Initializing StableDiffusionPipeline with stabilityai/stable-diffusion-2-base..
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 10409.85it/s]
INFO:__main__:Done.
INFO:__main__:Converting vae_decoder
[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
/usr/local/Caskroom/miniforge/base/envs/coreml_stable_diffusion/lib/python3.8/site-packages/diffusers/models/resnet.py:109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/usr/local/Caskroom/miniforge/base/envs/coreml_stable_diffusion/lib/python3.8/site-packages/diffusers/models/resnet.py:122: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if hidden_states.shape[0] >= 64:

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect.

Also got this warning:

/Applications/ml-stable-diffusion-main/python_coreml_stable_diffusion/layer_norm.py:61: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

Is this expected or something I should be worried about?

Getting protobuf spec error

RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".

I am on a Mac with an M1 Pro.

(image attached)

Is the loading-time wait still present if a single Python run generates multiple images?

Q5: Every time I generate an image using the Python pipeline, loading all the Core ML models takes 2-3 minutes. Is this expected?

A5: Yes, and using the Swift library reduces this to just a few seconds. The reason is that coremltools loads Core ML models (.mlpackage) and each model is compiled to run on the requested compute unit during load time. Because of the size and number of operations of the unet model, it takes around 2-3 minutes to compile it for Neural Engine execution. Other models should take at most a few seconds. Note that coremltools does not cache the compiled model for later loads, so each load takes equally long. In order to benefit from compilation caching, the StableDiffusion Swift package by default relies on compiled Core ML models (.mlmodelc), which are compiled for the requested compute unit upon first load; the cache is then reused on subsequent loads until it is purged due to lack of use.

Is this still the case if a single Python run runs the pipeline multiple times and generates multiple images?

Error when using flag --bundle-resources-for-swift-cli

Forgive me if this is an obvious error but I do not have experience with Swift.

This command ran fine without the --bundle.. flag, and it worked fine with the Python executable, but I wanted to try the Swift version and this is the error I get.

xcrun: error: unable to find utility "coremlcompiler", not a developer tool or in PATH

It seems that in this instance the path is not being found; where should it be?

I am running via the exact environment setup from the docs (conda).

python -m python_coreml_stable_diffusion.torch2coreml --bundle-resources-for-swift-cli --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o ~/Code/mlpackages/
Torch version 1.13.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
!!! macOS 13.1 and newer or iOS/iPadOS 16.2 and newer is required for best performance !!!
INFO:__main__:Initializing StableDiffusionPipeline with CompVis/stable-diffusion-v1-4..
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 15473.57it/s]
INFO:__main__:Done.
INFO:__main__:Converting vae_decoder
INFO:__main__:`vae_decoder` already exists at /Users/0x44/Code/mlpackages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_vae_decoder.mlpackage, skipping conversion.
INFO:__main__:Converted vae_decoder
INFO:__main__:Converting unet
INFO:__main__:`unet` already exists at /Users/0x44/Code/mlpackages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_unet.mlpackage, skipping conversion.
INFO:__main__:Converted unet
INFO:__main__:Converting text_encoder
INFO:__main__:`text_encoder` already exists at /Users/0x44/Code/mlpackages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlpackage, skipping conversion.
INFO:__main__:Converted text_encoder
INFO:__main__:Converting safety_checker
INFO:__main__:`safety_checker` already exists at /Users/0x44/Code/mlpackages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_safety_checker.mlpackage, skipping conversion.
INFO:__main__:Converted safety_checker
INFO:__main__:Bundling resources for the Swift CLI
INFO:__main__:Compiling /Users/0x44/Code/mlpackages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlpackage
xcrun: error: unable to find utility "coremlcompiler", not a developer tool or in PATH
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion/lib/python3.8/shutil.py", line 791, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/0x44/Code/mlpackages/Resources/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlmodelc' -> '/Users/0x44/Code/mlpackages/Resources/TextEncoder.mlmodelc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/0x44/Code/diffusion/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 926, in <module>
    main(args)
  File "/Users/0x44/Code/diffusion/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 822, in main
    bundle_resources_for_swift_cli(args)
  File "/Users/0x44/Code/diffusion/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 199, in bundle_resources_for_swift_cli
    target_path = _compile_coreml_model(source_path, resources_dir,
  File "/Users/0x44/Code/diffusion/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 175, in _compile_coreml_model
    shutil.move(compiled_output, target_path)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion/lib/python3.8/shutil.py", line 811, in move
    copy_function(src, real_dst)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion/lib/python3.8/shutil.py", line 435, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion/lib/python3.8/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/0x44/Code/mlpackages/Resources/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlmodelc'
➜  ml-stable-diffusion git:(main) cat ~/.config/fish/config.fish                                                                                                  (coreml_stable_diffusion) 
if status is-interactive
    # Commands to run in interactive sessions can go here
end

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
eval /opt/homebrew/Caskroom/miniconda/base/bin/conda "shell.fish" "hook" $argv | source
# <<< conda initialize <<<

➜  ml-stable-diffusion git:(main) echo $CONDA_PREFIX                                                                                                              (coreml_stable_diffusion) 
/opt/homebrew/Caskroom/miniconda/base/envs/coreml_stable_diffusion

Package problems running with Xcode

I have successfully downloaded 5 mlmodelc files.
But I have been struggling to build using the package. The issues range from Bundle.module not available to symbol not found.

I have Xcode 14.1 on an M2 MacBook Pro.

Here is what I have tried:

  • Added Resources folder (holding downloaded compiled models) under this directory
  • Changed Package.swift like the following:
        .target(
            name: "StableDiffusion",
            dependencies: [],
            path: "swift/StableDiffusion",
            resources: [.copy("Resources")]),
  • Built the package for 'Any iOS Device' (building for simulator gives errors)
  • Created client project and imported my package locally using XCode
  • I see a problem ('module' is inaccessible) with the line below:

if let url = Bundle.module.url(forResource: "Unet", withExtension: "mlmodelc")

  • I also get the following linking error when building the client project (it built fine without the .copy("Resources") change above):
Undefined symbols for architecture arm64:
  "_$s15StableDiffusion0aB8PipelineV11resourcesAt13configuration13disableSafetyAC10Foundation3URLV_So20MLModelConfigurationCSbtKcfC", referenced from:

Some elaboration on client code would truly be a great enabler in making this great work useful. Thanks in advance!
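In the meantime, here is a minimal, hedged sketch of client code pieced together from the initializer visible in the mangled symbol above (StableDiffusionPipeline.init(resourcesAt:configuration:disableSafety:)) and the sample CLI; the Bundle.module lookup assumes the .copy("Resources") target setting shown earlier, and the generateImages parameter labels may differ between package versions:

import CoreML
import StableDiffusion

// Locate the compiled .mlmodelc resources copied into the package target.
guard let resourceURL = Bundle.module.url(forResource: "Resources", withExtension: nil) else {
    fatalError("Bundled Resources folder not found")
}

// Build the pipeline with the initializer referenced in the linker error above.
let configuration = MLModelConfiguration()
configuration.computeUnits = .cpuAndNeuralEngine
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    configuration: configuration,
    disableSafety: false
)

// Reuse the loaded pipeline for as many prompts as needed; the load cost is paid once.
let images = try pipeline.generateImages(
    prompt: "a photo of an astronaut riding a horse on mars",
    seed: 93
)

Note that the undefined-symbol error itself usually indicates the StableDiffusion library product is not linked into the client target (or was built for a different platform), which is a separate problem from the resource-copying step.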

Error when running with Swift CLI: no "merges.txt"?

Ran the commands in the readme, checked them twice, they succeeded, but generation doesn't work:
% swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path mlpackages/ --seed 93 --output-path outputs/

Building for debugging...

Build complete! (0.10s)

Loading resources and creating pipeline

(Note: This can take a while the first time using these resources)

Error: The file “merges.txt” couldn’t be opened because there is no such file.

Can't use 2.0 after installation is done, I get the following output:

FileNotFoundError: text_encoder CoreML model doesn't exist at /Users/seanfrohman/Documents/AI_MODELS/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_text_encoder.mlpackage

So if I go ahead and install without using --model-version stabilityai/stable-diffusion-2-base

It works fine during install, but then I get the error above when prompting to use the 2.0 service.

If I leave --model-version blank, 1.4 installs into ~/Documents/AI_MODELS and it works with 1.4 no problem.

Anyone have any idea here? I am almost done, other than a few small issues with the prompt.

Convert pre-downloaded models

Many of us have already downloaded many models and don't want to waste time and bandwidth downloading them again. I'd like to request an option to point to a folder where already-downloaded models can be discovered and used instead of fetching them from the interwebs.

Error calling plan_submit in batch processing.

I guess I should have given up by now, but I have not been able to get it to run on an M1 iPad Pro or an M1 Mac. I just updated the Mac to Ventura and reinstalled conda, brew, everything, but it still fails:

(coreml_stable_diffusion)swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path mlpackages384/Resources --seed 93 --output-path output
Building for debugging...
[59/59] Linking StableDiffusionSample
Build complete! (10.72s)
Loading resources and creating pipeline
(Note: This can take a while the first time using these resources)
Sampling ...
inputLength: 77
inputShape: [1, 77]
inputLength: 77
inputShape: [1, 77]
2022-12-04 18:02:11.377 StableDiffusionSample[28003:235755] Error calling plan_submit in batch processing.
zsh: trace trap swift run StableDiffusionSample --resource-path --seed 93 --output-path

Model Version 7 warning with Xcode 14.1

coremltools/models/model.py:145: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".

I thought Xcode 14.1 was supported? What version of Xcode supports version 7?

xcodebuild -version:
Xcode 14.1
Build version 14B47b

Error calling plan_submit in batch processing.

Hi, I'm seeing it consistently fail with the following error:

(coreml_stable_diffusion) ➜  ml-stable-diffusion git:(main) βœ— swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars"  --seed 93 --output-path ./out/
Building for debugging...
[3/3] Linking StableDiffusionSample
Build complete! (0.79s)
Loading resources and creating pipeline
(Note: This can take a while the first time using these resources)
Step 50 of 50  [mean: 0.71, median: 0.72, last 0.73] step/sec
2022-12-04 13:51:27.635 StableDiffusionSample[23277:3591764] Error calling plan_submit in batch processing.
2022-12-04 13:51:27.635 StableDiffusionSample[23277:3591730] -[NSNull featureNames]: unrecognized selector sent to instance 0x20c536980

Any advice? Thanks.

Error when trying to run coreml-stable-diffusion-v1-4_original_compiled on iPhone 12 Pro

iOS 16.2 (20C5058d)

2022-12-03 19:01:04.690255-0500 Hyperpaint[555:11322] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: failed to load ANE model. Error=createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:error:: Program load failure (0x20004)
2022-12-03 19:01:04.702382-0500 Hyperpaint[555:11322] [coreml] Error plan build: -1.
2022-12-03 19:01:04.730233-0500 Hyperpaint[555:11322] [client] doUnloadModel:options:qos:error:: nil _ANEModel
2022-12-03 19:01:04.730324-0500 Hyperpaint[555:11322] [espresso] ANECF error:

2 Minutes for Pipeline

It takes 2 minutes to instantiate the pipeline on an M1.

This seems to rule out use for all but the most curious early adopters. I spent an entire day on this because you claimed to create an image in about 30 seconds on an M1, but that claim really should have included the fact that it takes 2 minutes for the pipeline to be ready.

Is there any way to cache the state or something? Otherwise, I guess I'll write my time off as a loss.

coremlcompiler not found by xcrun

When I use the --bundle-resources-for-swift-cli option I get the following error:

xcrun: error: unable to find utility "coremlcompiler", not a developer tool or in PATH

I already tried reinstalling the core Xcode developer tools. I'm running on an M1 Pro MacBook Pro 14", 2021.

Using --compute-unit CPU_AND_GPU results in a black image

python -m python_coreml_stable_diffusion.pipeline --prompt "highly detailed photo of assistant professor getting lost in a data labyrinth with venomous snakes" -i output-mlpackages -o images --compute-unit ALL --seed 93
produces the attached image.

python -m python_coreml_stable_diffusion.pipeline --prompt "highly detailed photo of assistant professor getting lost in a data labyrinth with venomous snakes" -i output-mlpackages -o images --compute-unit CPU_AND_GPU --seed 93
produces a black image (also attached).

Both runs were on an M1 Max MacBook Pro with Ventura and a default installation of ml-stable-diffusion.

Model Name:	MacBook Pro
  Model Identifier:	MacBookPro18,2
  Model Number:	Z14X0002GD/A
  Chip:	Apple M1 Max
  Total Number of Cores:	10 (8 performance and 2 efficiency)
  Memory:	32 GB
  System Firmware Version:	8419.41.10
  OS Loader Version:	8419.41.10

Chipset Model:	Apple M1 Max
  Type:	GPU
  Bus:	Built-In
  Total Number of Cores:	32
  Vendor:	Apple (0x106b)
  Metal Support:	Metal 3
![randomSeed_93_computeUnit_ALL_modelVersion_CompVis_stable-diffusion-v1-4](https://user-images.githubusercontent.com/3430712/205510539-7aeccfa0-75bb-491b-a100-176d19ac5731.png)
![randomSeed_93_computeUnit_CPU_AND_GPU_modelVersion_CompVis_stable-diffusion-v1-4](https://user-images.githubusercontent.com/3430712/205510545-b1e08456-cc10-49cd-b43e-3527f2f6d8c3.png)

filename too long running with runwayml/stable-diffusion-v1-5

python -m python_coreml_stable_diffusion.pipeline --prompt $prompt -i output-ml-packages -o ~/Desktop/ --compute-unit ALL --seed 93 --model-version runwayml/stable-diffusion-v1-5

The output :

INFO:__main__:Generated image has nsfw concept=False
Traceback (most recent call last):
  File "/Users/brunoamaral/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/brunoamaral/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/brunoamaral/Labs/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 534, in <module>
    main(args)
  File "/Users/brunoamaral/Labs/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 485, in main
    out_path = get_image_path(args)
  File "/Users/brunoamaral/Labs/ml-stable-diffusion/python_coreml_stable_diffusion/pipeline.py", line 444, in get_image_path
    os.makedirs(out_folder, exist_ok=True)
  File "/Users/brunoamaral/.pyenv/versions/3.10.6/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
OSError: [Errno 63] File name too long: '/Users/brunoamaral/Desktop/${$((_p9k_on_expand()...

Running on macOS Ventura 13.0.1, M2.

Issue crashing when converting model, and more

Running on an M1 iMac with 8 GB of memory, I get this error:

INFO:__main__:Converted vae_decoder
INFO:__main__:Converting unet
INFO:__main__:Attention implementation in effect: AttentionImplementations.SPLIT_EINSUM
INFO:__main__:Sample inputs spec: {'sample': (torch.Size([2, 4, 64, 64]), torch.float32), 'timestep': (torch.Size([2]), torch.float32), 'encoder_hidden_states': (torch.Size([2, 768, 1, 77]), torch.float32)}
INFO:__main__:JIT tracing..
/Users/blendersushi/Documents/CoreMLDiffusion/ml-stable-diffusion-main/python_coreml_stable_diffusion/layer_norm.py:61: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
INFO:__main__:Done.
INFO:__main__:Converting unet to CoreML..
WARNING:coremltools:Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                      | 0/7876 [00:00<?, ? ops/s]WARNING:coremltools:Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▉| 7874/7876 [00:01<00:00, 4933.86 ops/s]
Running MIL Common passes: 100%|██████████| 39/39 [00:23<00:00,  1.63 passes/s]
Running MIL FP16ComputePrecision pass: 100%|██████████| 1/1 [00:40<00:00, 40.70s/ passes]
Running MIL Clean up passes:  18%|██        | 2/11 [00:15<01:10,  7.85s/ passes]zsh: killed     python -m python_coreml_stable_diffusion.torch2coreml --model-version      -o
(coreml_stable_diffusion) blendersushi@192-168-1-102 ml-stable-diffusion-main % /Users/blendersushi/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
(coreml_stable_diffusion) blendersushi@192-168-1-102 ml-stable-diffusion-main % 
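
The "zsh: killed" message during the MIL clean-up passes means the conversion process was terminated by the operating system, which on an 8 GB machine is most likely an out-of-memory kill while converting the unet (the largest of the models). One possible mitigation, sketched here on the assumption that the standard torch2coreml flags from this repository are used and with stable-diffusion-v1-5 only as an example model version, is to convert the models in separate invocations so that only one conversion holds memory at a time:

# convert each model in its own process to keep peak memory lower
python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --model-version runwayml/stable-diffusion-v1-5 -o output-ml-packages
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-decoder --model-version runwayml/stable-diffusion-v1-5 -o output-ml-packages
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --model-version runwayml/stable-diffusion-v1-5 -o output-ml-packages

Closing other applications during the unet conversion also helps, since the process in the log was killed by the system rather than failing inside coremltools.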

Getting this weird hash error

Collecting scipy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/44/8a/bae77e624391b27aeea2d33a02f2ce4a8019f1378ce92faf5780f1521f2e/scipy-1.9.3-cp38-cp38-macosx_12_0_arm64.whl (28.5 MB)
     ━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/28.5 MB 29.9 kB/s eta 0:12:49
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    scipy from https://pypi.tuna.tsinghua.edu.cn/packages/44/8a/bae77e624391b27aeea2d33a02f2ce4a8019f1378ce92faf5780f1521f2e/scipy-1.9.3-cp38-cp38-macosx_12_0_arm64.whl#sha256=545c83ffb518094d8c9d83cce216c0c32f8c04aaf28b92cc8283eda0685162d5 (from python-coreml-stable-diffusion==0.1.0):
        Expected sha256 545c83ffb518094d8c9d83cce216c0c32f8c04aaf28b92cc8283eda0685162d5
             Got        fdf5cb99a7ec232c106d287ed30932f6e87f54ff951d5dd48ea79ac6aada9d84

I wonder what's causing this error.
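
A hash mismatch like this usually means the downloaded file was corrupted or incomplete, or that the mirror served a file different from the one whose hash is pinned in the requirements. A reasonable first step, using standard pip options and assuming the package is being installed with pip install -e . as in the setup instructions, is to clear the pip cache and retry against the official index:

# drop any partially downloaded or stale wheels, then retry without the cache against pypi.org
pip cache purge
pip install --no-cache-dir -i https://pypi.org/simple -e .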

Error when converting to mlmodels

I'm trying to convert the models from transformers to mlmodels and I get the error below. Is this because the conversion must be done on Linux?

from torch.distributed import ReduceOp
ImportError: cannot import name 'ReduceOp' from 'torch.distributed' (/Users/r2q2/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/torch/distributed/__init__.py)
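
The conversion is designed to run on macOS, so Linux should not be required; this import failure generally points at the PyTorch installation in the conda environment (a build compiled without distributed support, or an otherwise broken or mismatched install) rather than at the conversion code. A quick sanity check, assuming the same coreml_stable_diffusion environment is active:

# print the active torch version and whether it was built with distributed support
python -c "import torch; print(torch.__version__, torch.distributed.is_available())"

If this prints False or fails with the same import error, reinstalling torch at the version pinned in this repository's requirements usually resolves it.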

Crash when running SD1.4

Runs very slowly, then crashes on an iPad Pro M1 running iPadOS 16.1 beta 4.

I did add the Increased Memory Limit entitlement, which got it a bit further, but it still isn't working.

2022-12-04 16:20:26.776921-0500 Hyperpaint[728:15307] Metal API Validation Enabled
Creating pipeline...2022-12-04 16:27:41.052156-0500 Hyperpaint[728:16566] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: failed to load ANE model. Error=createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:error:: Program load failure (0x20004)
2022-12-04 16:27:41.052245-0500 Hyperpaint[728:16566] [coreml] Error plan build: -1.
2022-12-04 16:27:41.055306-0500 Hyperpaint[728:16566] [client] doUnloadModel:options:qos:error:: nil _ANEModel
2022-12-04 16:27:41.055336-0500 Hyperpaint[728:16566] [espresso] ANECF error:
Created pipeline!Starting...Sampling ...
2022-12-04 16:28:02.932071-0500 Hyperpaint[728:17101] [ServicesDaemonManager] interruptionHandler is called. -[FontServicesDaemonManager connection]_block_invoke
Step 0 of 10 [mean: 0.10, median: 0.10, last 0.10] step/sec
