nixified-ai / flake Goto Github PK

View Code? Open in Web Editor NEW

649.0 649.0 69.0 966 KB

A Nix flake for many AI projects

License: GNU Affero General Public License v3.0

Nix 97.47% HTML 0.08% CSS 0.58% PowerShell 1.87%

flake's People

Contributors

Stargazers

Watchers

Forkers

tomberek kranzes garaiza-93 arcnmx ohmymndy 414owen 5shekel davhau rydnr cadkin o7renebro jpetrucciani katanallama prodypanda lucasew trestripes-com qwbarch freesig puffnfresh repos-ai yboettcher 0xkyran zhengyangfeng00 henrik-ch yvieta m-laniakea k2052 requaos darthpjb denismhz alejandrosame anothergroupchat lanathlor javdl airradda daniel-fahey byteshiva 7omb saintdoggie danth silky akr2002 thorstenweber83 cannedmoose ayush5harma eownerdead program-learning sjagoe siryoussef ink-splatters meta-introspector ksenia-portu tobtobxx inayet jaduff dieracdelta someoneserge rcmast3r lboklin jtdoepke hauskens olafkfreund knkski virtual-architect

flake's Issues

RX 470/480/570/580/590 support

👋 Hi. I'm a nixOS user with a radeon RX 590.

I get the error: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!".

This GPU isn't exactly top-of-the-line, but people have managed to run stable diffusion et al on it.

The process, documented here, and a bit here, seems to involve building ROCm with the ROC_ENABLE_PRE_VEGA flag.

Then again, according to this issue other OSes have patched the ROCm packages, so maybe this is an issue for nixpkgs.

Any tips/insights welcome. Does anyone else have this issue?

FEATURE REQUEST: Okada voice changer

Would be great to package the Okada voice changer, as it looks like it uses the same Python goop under the hood as the rest of this stuff.

https://github.com/w-okada/voice-changer

invokeai terminated by signal SIGSEGV

hi! i'm trying to get invokeai running on my setup but i'm running into an address boundary error.

2023-10-11 07:49:52.269848: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/nix/store/6nyknk2dj5kxial6ymksbpgqhcmw2x7c-python3.10-pytorch-lightning-1.9.0/lib/python3.10/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
* Initializing, be patient...
>> Initialization file /home/muni/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.1.post2
>> InvokeAI runtime directory is "/home/muni/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type cuda
>> xformers not installed
>> Initializing NSFW checker
fish: Job 1, 'nix run github:nixified-ai/flak…' terminated by signal SIGSEGV (Address boundary error)

i have an AMD RX 7600 GPU (gfx1102). let me know what other information i can provide to help!

Is it possible to access the CLI for invoked-ai via this flake?

When running nix run .#invokeai-nvidia, is it possible to somehow access the CLI?
By the way, thanks for packaging this into a nix flake! This is exactly what I needed

Can't save or load story in KoboldAI

In KoboldAI, I can't save my story. I'm on NixOS. It always gives a "read-only file system" error due to probably trying to save the file into the Nix store rather than somewhere in the home directory (shouldn't it be going somewhere in ~/.koboldai?).

I can save the story as JSON, but this is useless because I get the following error in logs when I try to import it in again:

Exception in thread Thread-29 (_handle_event_internal):
Traceback (most recent call last):
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/socketio/server.py", line 731, in _handle_event_internal
    r = server._trigger_event(data[0], namespace, sid, *data[1:])
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/socketio/server.py", line 756, in _trigger_event
    return self.handlers[namespace][event](*args)
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/flask_socketio/__init__.py", line 282, in _handler
    return self._handle_event(handler, message, namespace, sid,
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/flask_socketio/__init__.py", line 766, in _handle_event
    ret = handler(*args)
  File "/nix/store/513r43w6qkrv1ri5m49dawhjvrp205dw-koboldAi-patchedSrc/aiserver.py", line 466, in g
    return f(*a, **k)
  File "/nix/store/513r43w6qkrv1ri5m49dawhjvrp205dw-koboldAi-patchedSrc/aiserver.py", line 3661, in get_message
    loadfromfile()
  File "/nix/store/513r43w6qkrv1ri5m49dawhjvrp205dw-koboldAi-patchedSrc/aiserver.py", line 6496, in loadfromfile
    loadpath = fileops.getloadpath(vars.savedir, "Select Story File", [("Json", "*.json")])
  File "/nix/store/513r43w6qkrv1ri5m49dawhjvrp205dw-koboldAi-patchedSrc/fileops.py", line 32, in getloadpath
    import tkinter as tk
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/tkinter/__init__.py", line 37, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'

This is really frustrating, because I have to manually copy in all my world data, story, memory, etc. every single time I start up KoboldAI.

Thanks in advance!

Permission Denied on Fresh Install (InvokeAI)

Hello, and thanks for the neat flake! It really helps streamline the installation process that Invoke AI requires.
However, it seems that I am running into a permissions issue, and I am unsure if it's relevant to this repository but I am making this issue just in case it is.
What tends to happen is:

nix run github:nixified-ai/flake/mc/update-invokeai#invokeai-nvidia -- --web (to use the latest version of Invoke AI)
models.yaml not found, so invokeai-configure runs.
After configuration, models are downloaded
InvokeAI creates configs/ without write permissions
InvokeAI attempts to create models.yaml but doesn't have write permissions
PermissionsError
Here is the output I get:

If you need more info let me know. Thanks!

overriding ports

How can I override the default ports? I tried editing koboldai/nixos/default.nix directly but it continues to launch on 5000. Thanks!

Mention nix installer

There is a short paragraph on setting up WSL2 for Windows users, but if I want to pitch this thing to people running, say Arch Linux, there should also be a sentence like "you need Nix with flake support installed, to get it, head over to https://zero-to-nix.com/start/install"

Update InvokeAI

Invoke has gone through a couple of updates since the last time it was updated here. Would be nice to have a bump to 3.6.0.

Eradicate cudaPackages.cudatoolkit from the closures

The cudaPackages.cudatoolkit attribute in nixpkgs is being deprecated (too slowly). This attribute corresponds to the runfile-based installation of the cuda toolkit, comes in just two huge outputs, which in addition have unreasonably many dependencies (like X, or gstreamer, or python2 even). Nixpkgs' CUDA-accelerated applications are being rewritten to use individual components of the toolkit, like buildInputs = with cudaPackages; [ cuda_cudart.dev cuda_cudart.lib cuda_cudart.static libcublas.dev libcublas.static ] (the example is a bit tedious atm, cc NixOS/nixpkgs#271792), etc. One benefit of doing so, aside from better cache reuse, is that most of the cuda inputs (e.g. libcublas.static, which is huge) are automatically discarded by Nix after the build as long as they're not referenced from the application package's outputs (naturally, the static archives usually aren't) so even if the build requires tens of gigabytes of storage, the runtime closure can as small as 3GIB.

Is Nvidia/CUDA passthru supposed to work on non NixOS Linux hosts?

I this a thing? The Readme mentions that it's supposed to work on WSL, but what about other Linux hosts? I am using Arch Linux.

Any pointers where to look to try to make it work or at least to figure out if it would work or not at all?

nixosModule: models.yaml not found exception

I've tried to install invokeai-nvidia on my server (I'm using flakes):

{ nixified, ... }: {

  imports = [ nixified.nixosModules.invokeai-nvidia ];

  services.invokeai = {
    enable = true;
  };

}

The service always fails on startup:

Nov 27 22:44:05 rtx3060 systemd[1]: Started invokeai.service.
Nov 27 22:44:08 rtx3060 invokeai-web[596092]: The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Nov 27 22:44:08 rtx3060 invokeai-web[596092]: [38B blob data]
Nov 27 22:44:09 rtx3060 invokeai-web[596092]: 2023-11-27 22:44:09.131483361 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1827 CreateInferencePybindStateModule] Init provider bridge failed.
Nov 27 22:44:11 rtx3060 invokeai-web[596092]: [2023-11-27 22:44:11,708]::[InvokeAI]::INFO --> Patchmatch initialized
Nov 27 22:44:11 rtx3060 invokeai-web[596092]: /nix/store/i50149q86mr8adaxplbss5gxj5z3nmkv-python3.11-torchvision-0.15.2/lib/python3.11/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
Nov 27 22:44:11 rtx3060 invokeai-web[596092]:   warnings.warn(
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: An exception has occurred: /var/lib/invokeai/configs/models.yaml not found
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: == STARTUP ABORTED ==
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** One or more necessary files is missing from your InvokeAI root directory **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** Please rerun the configuration script to fix this problem. **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** From the launcher, selection option [7]. **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** From the command line, activate the virtual environment and run "invokeai-configure --yes --skip-sd-weights" **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** (To skip this check completely, add "--ignore_missing_core_models" to your CLI args. Not installing these core models will prevent the loading of some or all .safetensors and .ckpt files. However, you can always come back and install these core models in the future.)
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: Press any key to continue...Traceback (most recent call last):
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/lib/python3.11/site-packages/invokeai/backend/install/check_root.py", line 11, in check_invokeai_root
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     assert config.model_conf_path.exists(), f"{config.model_conf_path} not found"
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: AssertionError: /var/lib/invokeai/configs/models.yaml not found
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: During handling of the above exception, another exception occurred:
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: Traceback (most recent call last):
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/bin/.invokeai-web-wrapped", line 9, in <module>
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     sys.exit(invoke_api())
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:              ^^^^^^^^^^^^
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/lib/python3.11/site-packages/invokeai/app/api_app.py", line 216, in invoke_api
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     check_invokeai_root(app_config)  # note, may exit with an exception if root not set up
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/lib/python3.11/site-packages/invokeai/backend/install/check_root.py", line 40, in check_invokeai_root
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     input("Press any key to continue...")
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: EOFError: EOF when reading a line
Nov 27 22:44:13 rtx3060 systemd[1]: invokeai.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 22:44:13 rtx3060 systemd[1]: invokeai.service: Failed with result 'exit-code'.
Nov 27 22:44:13 rtx3060 systemd[1]: invokeai.service: Consumed 7.511s CPU time, received 9.1K IP traffic, sent 1.2K IP traffic.

I've tried deleting /var/lib/invokeai/- changed nothing.

Is the NixOS module currently meant to be working?
Are there any known workarounds for the issue I'm facing?

Latest version of InvokeAI not working

Latest version of invokeai (7b9730e) not working on my system. Not sure what could be causing this.

invokeai output log:

2023-11-04 22:45:00.154733553 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1827 CreateInferencePybindStateModule] Init provider bridge failed.
/nix/store/ds0qkkilzh7mqawssx7z8dmpgk34v7wm-python3.10-torchvision-0.15.2/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
[2023-11-04 22:45:00,815]::[InvokeAI]::INFO --> Patchmatch initialized
[2023-11-04 22:45:00,926]::[InvokeAI]::INFO --> InvokeAI version 3.3.0post3
[2023-11-04 22:45:00,945]::[InvokeAI]::INFO --> GPU device = cuda NVIDIA GeForce RTX 4080
[2023-11-04 22:45:00,955]::[InvokeAI]::INFO --> Scanning /home/nikoru/invokeai/models for new models
[2023-11-04 22:45:01,112]::[InvokeAI]::INFO --> Scanned 9 files and directories, imported 0 models
[2023-11-04 22:45:01,119]::[InvokeAI]::INFO --> Model manager service initialized
[2023-11-04 22:45:01,119]::[InvokeAI]::INFO --> InvokeAI database location is "/home/nikoru/invokeai/databases/invokeai.db"
Traceback (most recent call last):
  File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/bin/.invokeai-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/lib/python3.10/site-packages/invokeai/frontend/legacy_launch_invokeai.py", line 18, in main
    invoke_cli()
  File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/lib/python3.10/site-packages/invokeai/app/cli_app.py", line 257, in invoke_cli
    graph_execution_manager = SqliteItemStorage[GraphExecutionState](conn=db_conn, table_name="graph_executions")
  File "/nix/store/pzf6dnxg8gf04xazzjdwarm7s03cbrgz-python3-3.10.12/lib/python3.10/typing.py", line 957, in __call__
    result = self.__origin__(*args, **kwargs)
TypeError: SqliteItemStorage.__init__() missing 1 required positional argument: 'lock'

Permission denied for --web without specifying --outdir

Thanks for the flake! The console interface for invokeai works, no errors found of any kind.

However when I want to run the web-UI:
nix run github:nixified-ai/flake#invokeai-nvidia --extra-experimental-features flakes --extra-experimental-features nix-command -- --web

I also get a permission denied error (Similar to #6 ?):
PermissionError: [Errno 13] Permission denied: '../outputs'

I get a permission denied error in the invoke_ai_web_server.py for ../outputs :

I guess it is meant to be ~/invokeai/outputs , and it wants to crate new folders there with mkdirs in line 282 invoke_ai_web_server.py?

Now, only after specifying the outdir argument
--outdir ~/Pictures/invokeai_output

The web UI runs:
nix run github:nixified-ai/flake#invokeai-nvidia --extra-experimental-features flakes --extra-experimental-features nix-command -- --web --outdir ~/Pictures/invokeai_output

Intended default behavior: This should also run without specifying --outdir.

Url in installer for WSL2 wrong

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/nixified-ai/flake/install.ps1'))

The correct URL is https://raw.githubusercontent.com/nixified-ai/flake/master/install.ps1

request: rename project to `nixified-ai`

The name flake is not optimal because:

after cloning the project it will end up in a directory called flake which might collide with other tings and also in not very expressive.
Also after forking the project on github it just ends up as DavHau/flake which has the same problems as described above.

How about just calling the project nixified-ai/nixified-ai ?

License compatible with nixpkgs

Hi,

I was wondering if the AGPL-3.0 license used here is compatible with nixpkgs MIT?

I'm not sure yet but I had been thinking of potentially submitting one or two packages upstream, would this be permitted?

Bitsandbytes dos not support ROCm

While trying to run nix run .#textgen-amd I got the following error:

error:
… while evaluating the attribute 'optionalValue.value'

     at /nix/store/xjviahzwa7x51vl51kc3c1k1n1jmhpd5-source/lib/modules.nix:854:5:

      853|
      854|     optionalValue =
         |     ^
      855|       if isDefined then { value = mergedValue; }

   … while evaluating a branch condition

     at /nix/store/xjviahzwa7x51vl51kc3c1k1n1jmhpd5-source/lib/modules.nix:855:7:

      854|     optionalValue =
      855|       if isDefined then { value = mergedValue; }
         |       ^
      856|       else {};

   (stack trace truncated; use '--show-trace' to show the full trace)

   error:

   text-generation-webui is not supported on AMD yet, as bitsandbytes does not support ROCm.

Also may I ask why not keep both KoboldAI and textgen as a part of the project?

How much of this do you think we can get upstreamed?

Hi there, was just curious if we could get most/all of this upstreamed to nixpkgs.
Haven't had a thorough look around yet, but at least a bunch of the python packages should be able to be upstreamed.
Also you might want to make sure things are working for ROCm 5.7.0. NixOS/nixpkgs#258328 upgraded the ROCm stack.

trouble running out of GPU memory & setting env var

When I try a basic prompt for invokeai-nvidia, this is the result:

Traceback (most recent call last):
  File "/nix/store/h4035a8gz1kqyvza3lpqw77qmrny9hzs-python3.10-InvokeAI-2.3.1.post2/lib/python3.10/site-packages/ldm/generate.py", line 557, in prompt2image
    results = generator.generate(
  File "/nix/store/h4035a8gz1kqyvza3lpqw77qmrny9hzs-python3.10-InvokeAI-2.3.1.post2/lib/python3.10/site-packages/ldm/invoke/generator/base.py", line 115, in generate
    image = make_image(x_T)
  File "/nix/store/h4035a8gz1kqyvza3lpqw77qmrny9hzs-python3.10-InvokeAI-2.3.1.post2/lib/python3.10/site-packages/ldm/invoke/generator/txt2img.py", line 45, in make_image
    pipeline_output = pipeline.image_from_embeddings(
  File "/nix/store/h4035a8gz1kqyvza3lpqw77qmrny9hzs-python3.10-InvokeAI-2.3.1.post2/lib/python3.10/site-packages/ldm/invoke/generator/diffusers_pipeline.py", line 429, in image_from_embeddings
    image = self.decode_latents(result_latents)
  File "/nix/store/h4035a8gz1kqyvza3lpqw77qmrny9hzs-python3.10-InvokeAI-2.3.1.post2/lib/python3.10/site-packages/ldm/invoke/generator/diffusers_pipeline.py", line 758, in decode_latents
    return super().decode_latents(latents)
  File "/nix/store/znwn5m13gh87v74wbvl2gfmyr2hykwqw-python3.10-diffusers-0.14.0/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 426, in decode_latents
    image = self.vae.decode(latents).sample
  File "/nix/store/znwn5m13gh87v74wbvl2gfmyr2hykwqw-python3.10-diffusers-0.14.0/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 185, in decode
    decoded = self._decode(z).sample
  File "/nix/store/znwn5m13gh87v74wbvl2gfmyr2hykwqw-python3.10-diffusers-0.14.0/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 172, in _decode
    dec = self.decoder(z)
  File "/nix/store/k7f999ns4h0v0zb3yjnpka3935pydw2w-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/nix/store/znwn5m13gh87v74wbvl2gfmyr2hykwqw-python3.10-diffusers-0.14.0/lib/python3.10/site-packages/diffusers/models/vae.py", line 188, in forward
    sample = up_block(sample)
  File "/nix/store/k7f999ns4h0v0zb3yjnpka3935pydw2w-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/nix/store/znwn5m13gh87v74wbvl2gfmyr2hykwqw-python3.10-diffusers-0.14.0/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 1949, in forward
    hidden_states = upsampler(hidden_states)
  File "/nix/store/k7f999ns4h0v0zb3yjnpka3935pydw2w-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/nix/store/znwn5m13gh87v74wbvl2gfmyr2hykwqw-python3.10-diffusers-0.14.0/lib/python3.10/site-packages/diffusers/models/resnet.py", line 131, in forward
    hidden_states = F.interpolate(hidden_states, scale_factor=2.0, mode="nearest")
  File "/nix/store/k7f999ns4h0v0zb3yjnpka3935pydw2w-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/nn/functional.py", line 3922, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 5.80 GiB total capacity; 4.41 GiB already allocated; 251.38 MiB free; 4.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What is the recommended way of setting environment variables? I tried setting them outside of the flake, i.e. PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 nix run .#invokeai-nvidia , but predictably, that violates reproduciblity guarantees and so doesn't work. I'm open to setting this any other way, if there's a working option I can toggle e.g. directly in the GUI or an imperative config file.

Torch is borken in nixpkgs ?

Hi.

While running nix run .#invokeai-nvidia I end up getting a crash.

error: builder for '/nix/store/17w2k9yq7jr1jhfy63wb6l053w7si37p-7ae4d7c0e2dec358b4fe81538efe9da5eb580ec9.patch.drv' failed with exit code 1;
       last 7 log lines:
       >
       > trying https://github.com/pytorch/pytorch/pull/108847/commits/7ae4d7c0e2dec358b4fe81538efe9da5eb580ec9.patch
       >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
       >                                  Dload  Upload   Total   Spent    Left  Speed
       >   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
       > curl: (22) The requested URL returned error: 406
       > error: cannot download 7ae4d7c0e2dec358b4fe81538efe9da5eb580ec9.patch from any mirror
       For full logs, run 'nix log /nix/store/17w2k9yq7jr1jhfy63wb6l053w7si37p-7ae4d7c0e2dec358b4fe81538efe9da5eb580ec9.patch.drv'.
error (ignored): error: cannot unlink '/tmp/nix-build-linux-headers-5.19.16.drv-3/linux-5.19.16/usr/include/linux': Directory not empty
error: 1 dependencies of derivation '/nix/store/zj1h1xwdp870d27gqsq0zixhq9231k1d-python3.11-torch-2.0.1.drv' failed to build
error (ignored): error: cannot unlink '/tmp/nix-build-perl5.38.0-Module-Build-0.4231.drv-0': Directory not empty
error: 1 dependencies of derivation '/nix/store/4v7gbq225xfwlxnrflra4ds7fifs9cn2-python3.11-accelerate-0.23.0.drv' failed to build
error (ignored): error: cannot unlink '/tmp/nix-build-openssl-3.0.11.drv-1/openssl-3.0.11/crypto': Directory not empty
error: 1 dependencies of derivation '/nix/store/n55js9hsfgx8zd0r3wllgpsr5kpaznqk-python3.11-safetensors-0.3.3.drv' failed to build
error: 1 dependencies of derivation '/nix/store/xdq819mifas9p85aasqm2aaj5r1fsr6v-python3.11-timm-0.9.8.drv' failed to build
error: 1 dependencies of derivation '/nix/store/8rk0r41hwnbf5x2v2z3grzihf8kw5vvw-python3.11-torchvision-0.15.2.drv' failed to build
error: 1 dependencies of derivation '/nix/store/5jmnh9mjk1y5sm8bqp0izssa2v4579qn-python3.11-InvokeAI-3.3.0post3.drv' failed to build

I traced down this pytorch/pytorch#108847 and this NixOS/nixpkgs#249259.
I dont know if there is anything to be done while this is fixed upstream. I tried using a previous build of torch but failed.

Document NixOS system requirements?

Hi, thanks so much for your work! I'm very excited to see this come together.

I'm getting started with this (and stable diffusion in general) and I'm running into some problems, when I run with invokeai-amd I get the following error:

"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

I've gone through the steps in https://nixos.wiki/wiki/AMD_GPU for both the HIP section and OpenCL and I get output for my graphics card when I run rocminfo.

I'm wonder if I need something like https://github.com/nixos-rocm/nixos-rocm to make this work, or what other steps are necessary to get this running on a NixOS system?

Thanks again!

macOS (via home-manager) somewhere out there?

Some of our use cases are macOS based and our flow would benefit greatly from bringing those M-class cores to good use on AI crunch. I realize that those are two separate challenges, but was wondering if you could possibly provide a broad idea on what feasibility and vague schedule range on something like that would look like. "Never going to work" / "Not this year" / "mac yes, M-chip, no" etc.

textgen-nvidia build fails on NixOS-WSL

I updated the flake and now I am getting this error for textgen-nvidia:

error: builder for '/nix/store/900bmg4iknf0yb7r1b3f5xdfarqc9yzy-triton-llvm-14.0.6-f28c006a5895.drv' failed with exit code 1;
       last 10 log lines:
       > In file included from /build/source/llvm/include/llvm/Support/YAMLTraits.h:23,
       >                  from /build/source/llvm/include/llvm/CodeGen/MIRYamlMapping.h:22,
       >                  from /build/source/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h:21,
       >                  from /build/source/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h:18:
       > /build/source/llvm/include/llvm/Support/SourceMgr.h: In member function ‘bool llvm::SMFixIt::operator<(const llvm::SMFixIt&) const’:
       > /build/source/llvm/include/llvm/Support/SourceMgr.h:241: note: ‘-Wmisleading-indentation’ is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
       >   241 |     if (Range.Start.getPointer() != Other.Range.Start.getPointer())
       >       |
       > /build/source/llvm/include/llvm/Support/SourceMgr.h:241: note: adding ‘-flarge-source-files’ will allow for more column-tracking support, at the expense of compilation time and memory
       > ninja: build stopped: subcommand failed.
       For full logs, run 'nix log /nix/store/900bmg4iknf0yb7r1b3f5xdfarqc9yzy-triton-llvm-14.0.6-f28c006a5895.drv'.
error: 1 dependencies of derivation '/nix/store/0xf7hi05hpx45khnbwvrhh1rxc5vc9j2-python3.11-triton-2.0.0.drv' failed to build
error (ignored): error: cannot unlink '/tmp/nix-build-nccl-2.18.5-1.drv-3': Directory not empty
error (ignored): error: cannot unlink '/tmp/nix-build-magma-2.7.2.drv-1': Directory not empty
error: 1 dependencies of derivation '/nix/store/vr6knfixvhazw998iqz207dr99ffhbv7-python3-3.11.5-env.drv' failed to build
error: 1 dependencies of derivation '/nix/store/9wbs0ybrpkc81b0x26wrsdhb7c86iqa2-textgen.drv' failed to build

Open Assistant support

I'd love to see Nixified support for Open Assistant.

I've got it running on NixOS using Docker with the default somewhat useless distilgpt2 model - which seems to be hard coded.

Attempting to build locally (w/ RoCM support for my 6900XT) on NixOS fails because of linking problems with libstdc++.so.6 with the python greenlet library. I'm not skilled enough at nix to figure out how to fix the issues.

Here are the steps to run with Docker + distilgpt2 - this works, but the model is useless:

cd ~/source
git clone https://github.com/LAION-AI/Open-Assistant/
cd Open-Assistant

DOCKER_BUILDKIT=1 docker compose --profile ci --profile inference up --build --attach-dependencies
# https://localhost:3000/ # OpenAssitant UI
# https://localhost:1080/ # Fake email box where you can get link to sign-in with email

Here is what I tried to build on NixOS using their full-dev-setup.sh - doesn't work: the inference workers don't start:

mkdir ~/source &>/dev/null; cd ~/source
git clone https://github.com/LAION-AI/Open-Assistant/ &>/dev/null
cd Open-Assistant
python -m venv .venv --system-site-packages
source .venv/bin/activate
# Use this (https://pytorch.org/get-started/locally/) to get pip install command:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

#
# FIXME: If 'lit' fails to install see https://github.com/pypa/pip/issues/8559
#
#Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock,  charset-normalizer, certifi, requests, jinja2, pytorch-triton-rocm, torch, torchvision, torchaudio
#  DEPRECATION: lit is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
#  Running setup.py install for lit ... done
#

cd inference
pip install uvicorn -r worker/requirements.txt -r server/requirements.txt -r text-client/requirements.txt
pushd ../oasst-shared && pip install . && popd

MODEL_ID=OA_SFT_Pythia_12Bq bash ./full-dev-setup.sh OA_SFT_Pythia_12Bq

# greenlet python package has an error with libstdc++.so.6 not found
# worker has a problem with .tokenizers ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory

(Open Assistant was also mentioned in #16)

"tensorboardx-2.5.1.drv' failed with exit code 134" on nixos unstable, building invokeai-amd

Here is what I'm getting:

 % nix run github:nixified-ai/flake#invokeai-amd -- --web
do you want to allow configuration setting 'extra-substituters' to be set to 'https://ai.cachix.org' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? n
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'ai.cachix.org-1:N9dzRK+alWwoKXQlnn0H6aUx0lU/mspIoz8hMvGvbbc=' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? y
error: builder for '/nix/store/dq88axzv2415y2har3hzjzypzmp0y5qv-python3.10-tensorboardx-2.5.1.drv' failed with exit code 134;
       last 10 log lines:
       >   File "/nix/store/0vfaxdjhrx9h0nc41csbqp1c323l9k3g-python3.10-pluggy-1.0.0/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
       >   File "/nix/store/wmk96rdcmvpnnynx7hv595v8mzy5gxgl-python3.10-pytest-7.2.0/lib/python3.10/site-packages/_pytest/config/__init__.py", line 167 in main
       >   File "/nix/store/wmk96rdcmvpnnynx7hv595v8mzy5gxgl-python3.10-pytest-7.2.0/lib/python3.10/site-packages/_pytest/config/__init__.py", line 190 in console_main
       >   File "/nix/store/wmk96rdcmvpnnynx7hv595v8mzy5gxgl-python3.10-pytest-7.2.0/lib/python3.10/site-packages/pytest/__main__.py", line 5 in <module>
       >   File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/runpy.py", line 86 in _run_code
       >   File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/runpy.py", line 196 in _run_module_as_main
       >
       > Extension modules: torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _cffi_backend, crc32c, markupsafe._speedups, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image (total: 28)
       > /nix/store/qddjjyvjyrfi7i0x17b0sh9kbigplmjc-pytest-check-hook/nix-support/setup-hook: line 53: 796670 Aborted                 (core dumped) /nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/bin/python3.10 -m pytest -k "not test_TorchVis and not test_onnx_graph" --ignore="tests/test_lint.py"
       > /nix/store/c3f4jdwzn8fm9lp72m91ffw524bakp6v-stdenv-linux/setup: line 1593: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/dq88axzv2415y2har3hzjzypzmp0y5qv-python3.10-tensorboardx-2.5.1.drv'.
error: 1 dependencies of derivation '/nix/store/m6l2wjzhrgy2m3bclx6fv0x47xj5dyl7-python3.10-pytorch-lightning-1.9.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/nyxg11ps997sxvhcc6by1mbww4hkisr1-python3.10-InvokeAI-2.3.1.post2.drv' failed to build

I am one of these "have zero clue about the packaging, let me just run it" kind of people :) Hope this report helps with reaching the goal of making AI tooling more reproducible. I'll be here if you need more info and can test stuff from time to time to check fixes.

I've re-run this 3 times, with the same result every time.

api errors with "Could not find the character "Alpaca" error

~~> nix run .#textgen-nvidia -- --api --auto-launch
Running on local URL:  http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
2023-12-12 23:44:37 ERROR:Could not find the character "Alpaca" inside instruction-templates/. No character has been loaded.
Traceback (most recent call last):
  File "/nix/store/7hpffz24mjm12y5ymd2is43lxl7nf27b-python3-3.11.6-env/lib/python3.11/site-packages/gradio/routes.py", line 414, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/7hpffz24mjm12y5ymd2is43lxl7nf27b-python3-3.11.6-env/lib/python3.11/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/7hpffz24mjm12y5ymd2is43lxl7nf27b-python3-3.11.6-env/lib/python3.11/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/7hpffz24mjm12y5ymd2is43lxl7nf27b-python3-3.11.6-env/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/7hpffz24mjm12y5ymd2is43lxl7nf27b-python3-3.11.6-env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/nix/store/7hpffz24mjm12y5ymd2is43lxl7nf27b-python3-3.11.6-env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 833, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/1zzqmn5cl5c8dcbv37xp8xvvii892015-textgen-patchedSrc/modules/chat.py", line 561, in load_character
    raise ValueError
ValueError

I'm using 63339e4.

Possibly related:

error whil fetching input github:invoke-ai

command:

nix flake show github:nixified-ai/flake --show-trace

output:

...
… while fetching the input 'github:invoke-ai/InvokeAI/650f4bb58ceca458bff1410f35cd6d6caad399c6'

       error: failed to extract archive (truncated gzip input)

CUDA detection failed for text-generation-webui

First off, thanks for the great project.

I'm trying to run the text-generation-webui project, and hitting a CUDA error. This is on an older card, a 1080 TI, so I possibly just need to upgrade it? Here's what I'm trying to do and the error:

$ nix --extra-experimental-features nix-command --extra-experimental-features flakes run .#textgen-nvidia
warning: Using saved setting for 'extra-substituters = https://ai.cachix.org' from ~/.local/share/nix/trusted-settings.json.
warning: Using saved setting for 'extra-trusted-public-keys = ai.cachix.org-1:N9dzRK+alWwoKXQlnn0H6aUx0lU/mspIoz8hMvGvbbc=' from ~/.local/share/nix/trusted-settings.json.
False
/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /run/opengl-driver/lib:/nix/store/izaqzlav74p7q86bjwk8201wab14q4cs-cudatoolkit-11.8.0/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/home/me/.nix-profile/etc/xdg'), PosixPath('/nix/var/nix/profiles/default/etc/xdg')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/share/pixmaps'), PosixPath('/home/me/.nix-profile/share/pixmaps'), PosixPath('/nix/var/nix/profiles/default/share/icons'), PosixPath('/home/me/.nix-profile/share/icons'), PosixPath('/home/me/.icons')}
The following directories listed in your path were found to be non-existent: {PosixPath('nixpkgs=/nix/var/nix/profiles/per-user/root/channels/nixos'), PosixPath('nixos-config=/etc/nixos/configuration.nix')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/profiles/per-user/me/info'), PosixPath('/home/me/.nix-profile/share/info'), PosixPath('/home/me/.nix-profile/info'), PosixPath('/nix/var/nix/profiles/default/share/info'), PosixPath('/run/current-system/sw/info'), PosixPath('/nix/var/nix/profiles/default/info')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/me/.nix-profile/lib/gtk-2.0'), PosixPath('/nix/var/nix/profiles/default/lib/gtk-2.0'), PosixPath('/home/me/.nix-profile/lib/gtk-4.0'), PosixPath('/nix/var/nix/profiles/default/lib/gtk-4.0'), PosixPath('/home/me/.nix-profile/lib/gtk-3.0'), PosixPath('/nix/var/nix/profiles/default/lib/gtk-3.0')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/lib/mozilla/plugins'), PosixPath('/etc/profiles/per-user/me/lib/mozilla/plugins'), PosixPath('/home/me/.nix-profile/lib/mozilla/plugins'), PosixPath('/run/current-system/sw/lib/mozilla/plugins')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/share/terminfo'), PosixPath('/home/me/.nix-profile/share/terminfo')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/lib/mozilla/plugins'), PosixPath('/etc/profiles/per-user/me/lib/mozilla/plugins'), PosixPath('/home/me/.nix-profile/lib/mozilla/plugins'), PosixPath('/run/current-system/sw/lib/mozilla/plugins')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default'), PosixPath('/home/me/.nix-profile')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/me/.nix-profile/lib/libexec'), PosixPath('/nix/var/nix/profiles/default/lib/libexec'), PosixPath('/run/current-system/sw/lib/libexec'), PosixPath('/etc/profiles/per-user/me/lib/libexec')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/profiles/per-user/me/lib/kde4/plugins'), PosixPath('/run/current-system/sw/lib/kde4/plugins'), PosixPath('/nix/var/nix/profiles/default/lib/qt4/plugins'), PosixPath('/etc/profiles/per-user/me/lib/qt4/plugins'), PosixPath('/run/current-system/sw/lib/qt4/plugins'), PosixPath('/home/me/.nix-profile/lib/kde4/plugins'), PosixPath('/nix/var/nix/profiles/default/lib/kde4/plugins'), PosixPath('/home/me/.nix-profile/lib/qt4/plugins')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so.11.0'), PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so.11.0'), PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda118_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/server.py", line 30, in <module>
    from modules import (
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/modules/chat.py", line 18, in <module>
    from modules.text_generation import (
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/modules/text_generation.py", line 24, in <module>
    from modules.models import clear_torch_cache, local_rank
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/modules/models.py", line 10, in <module>
    from accelerate import infer_auto_device_map, init_empty_weights
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 131, in <module>
    from .bnb import has_4bit_bnb_layers, load_and_quantize_model
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/utils/bnb.py", line 42, in <module>
    import bitsandbytes as bnb
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Here's the output from nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   28C    P8    10W / 280W |    546MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

If I run the bitsandbytes commands myself, the package is built and installs just fine, so I'm not sure why it's failing as part of the larger build.

cp: not replacing '/var/lib/invokeai/configs/models.yaml'

When I set up the service I get this error on startup. Perhaps an upstream issue?

invoke-ai crashes during installation on Ubuntu 22.04.2 LTS

[18:03:52] ~ $ nix run "github:nixified-ai/flake#invokeai-amd" -- --web
warning: ignoring untrusted substituter 'https://hydra.iohk.io'
warning: ignoring untrusted substituter 'https://iohk.cachix.org'
warning: ignoring untrusted substituter 'https://nixbld.m-labs.hk'
warning: ignoring untrusted substituter 'https://unblob.cachix.org'
warning: ignoring untrusted substituter 'https://hydra.iohk.io'
warning: ignoring untrusted substituter 'https://iohk.cachix.org'
warning: ignoring untrusted substituter 'https://nixbld.m-labs.hk'
warning: ignoring untrusted substituter 'https://unblob.cachix.org'
warning: ignoring untrusted substituter 'https://ai.cachix.org'
[1/34/61 built, 477 copied (4899.6/4901.4 MiB), 933.7 MiB DL] building torch-1.13.1+rocm5.1.1-cp310-cp310-linux_x86_64.whl:                                  Dload  Upload   Total   Spent    Left  Speed
[1/34/61 built, 477 copied (4899.6/4901.4 MiB), 933.7 MiB DL] building torch-1.13.1+rocm5.1.1-cp310-cp310-linux_x86_64.whl:                                  Dload  Upload   Total   Spent    Left  Speed
[1/34/61 built, 477 copied (4899.6/4901.4 MiB), 933.7 MiB DL] building torch-1.13.1+rocm5.1.1-cp310-cp310-linux_x86_64.whl:                                  Dload  Upload   Total   Spent    Left  Speed
2023-04-03 18:33:55.429880: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

Aborted (core dumped)

Invokeai-amd failed to build/run

My Issue

I ran nix run github:nixified-ai/flake#invokeai-amd -- --web and got:

warning: Using saved setting for 'extra-substituters = https://ai.cachix.org' from ~/.local/share/nix/trusted-settings.json.
warning: Using saved setting for 'extra-trusted-public-keys = ai.cachix.org-1:N9dzRK+alWwoKXQlnn0H6aUx0lU/mspIoz8hMvGvbbc=' from ~/.local/share/nix/trusted-settings.json.
warning: ignoring untrusted substituter 'https://ai.cachix.org'
2023-04-12 08:35:25.584940: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-12 08:35:25.659855: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)

My system (output from `neofetch --stdout`)

OS: NixOS 22.11.2483.49efda9011e (Raccoon) x86_64 
Host: ASRock X670E Taichi Carrara 
Kernel: 6.1.10 
Uptime: 1 hour, 6 mins 
Packages: 1448 (nix-system), 3309 (nix-user) 
Shell: bash 5.1.16 
Resolution: 2560x1440, 2560x1440 
DE: Plasma 
WM: KWin 
Icons: breeze-dark [GTK2/3] 
Terminal: .konsole-wrappe 
CPU: AMD Ryzen 9 7950X (32) @ 4.500GHz 
GPU: AMD ATI Raphael 
GPU: AMD ATI Radeon RX 6600/6600 XT/6600M 
Memory: 6610MiB / 31216MiB

Support Axolotl

I would love to see axolotl added. Would throw money at it in fact.

Error: unsupported locale setting

During initial setup with nix run .#invokeai-nvidia on a non-nixos machine i got Error: unsupported locale setting thrown from nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/locale.py:620.

The fix is as easy as export LC_ALL=C, but it is not something i want to add to my bash profile.
I've run into something like this before, and my fix at the time was to include more locales in the runtime closure.

Additional context

$ env | grep -e ^LC_ -e ^LOCAL -e ^LANG | sort
LANG=en_US.UTF-8
LC_ADDRESS=nb_NO.UTF-8
LC_IDENTIFICATION=nb_NO.UTF-8
LC_MEASUREMENT=nb_NO.UTF-8
LC_MONETARY=nb_NO.UTF-8
LC_NAME=nb_NO.UTF-8
LC_NUMERIC=nb_NO.UTF-8
LC_PAPER=nb_NO.UTF-8
LC_TELEPHONE=nb_NO.UTF-8
LC_TIME=nb_NO.UTF-8

The setup worked fine on my nixos machine, which also uses a norwegian locale.

can there be a new target "vulkan" be added?

I have an intel arc 750 and would like to use the ai via vulkan api.
PyTourch should support it but you have to compile it with a feature flag manually enabled.

projects like realesrgan-ncnn-vulkan work nice so it should be doable?

Support for InstantNGP

InstantNGP [1] is a toolkit to train Neural Radiance Fields (NERF).

Basically video/set of images in, 3D model out.

I started packaging it here [2] and got it working along with colmap. It needs some tweaking to work with this repo.

I don't know yet if it can be patched to work with HIP. Its not based in Pytorch and the dependencies are vendored so self contained. The derivation already exposes the generated dynamic lib as a Python package and the CUDA target is old enough to work with the Tesla K80 from GCP.

This build is for the headless mode. More work is needed for the GUI to work. I used it headless because I was using it in a headless instance.

[1] https://github.com/NVlabs/instant-ngp
[2] https://github.com/lucasew/nixcfg/blob/a8aa81d09c3f68a9d330e838c71db594b063fc80/nix/pkgs/instantngp.nix#LL1

Can't load models: got an unexpected keyword argument 'n_threads_batch'

I think models are just not loading for me period. Is there some known working model I should be able to use? I tried:

TheBloke/orca_mini_v3_7B-GGUF

It downloaded fine but I get this error when trying to load with llama.cpp loader:

TypeError: Llama.__init__() got an unexpected keyword argument 'n_threads_batch'

out of date projects

Hi! The projects for both KoboldAI and InvokeAI are running out of date. The currently InvokeAI rev is covered here, but 3.0.0 pre is staging and a cadence for updates would be useful (#40 ). KoboldAI seems to have forked with this fork being the most active: henk717/KoboldAI. Happy to help, but I'm a bit of a Nix newbie and am not quite sure how the pixnixify step is managed.

Support for LLaMa and Alpaca

This project is awesome and I'm so glad it exists.

However I'm wonder about support dalai or text-generation-web-ui for running LLaMA and Alpaca. These models are much more suitable to be run on home computers and I believe would make an excellent addition to nixified.ai :)

I'd be open to helping as well, though my Nix-fu is not quite as powerful as yours.

How to add xformers package?

Where would I have to add the xformers package?
I’ve read in the documentation that it could help a lot with performance in NVIDIA cards.

Cannot create characters in textgen.

When running textgen-nvidia as of 7b9730e, and attempting to add a new character via Parameters -> Character:

FileNotFoundError: [Errno 2] No such file or directory: '/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/characters/TestCharacter.yaml'

Where TestCharacter is the name of the character created through the UI wizard.

Needless to say, it probably is not supposed to be trying to write to the nix store, so, bug? :P

Towards bundled GPU drivers

Problem Statement

Right now, we're not bundling userspace GPU driver libraries (Mesa, NVIDIA libraries). Instead, we rely on the correct drivers being available at /run/opengl-driver/lib, as they are on NixOS. This creates a problematic scenario in which the driver libraries are linked against incompatible versions of the same libraries that we're using as well. Chiefly among them, glibc (#34).

In order to be forwards- and backwards-compatible with anything and everything, our only real choice is to bundle those libraries as well. Bundled GPU driver libraries need to be compatible with the kernel they're running on. To my knowledge, this doesn't really matter for Mesa because the kernel APIs for in-tree graphics drivers are fairly stable. Not so much with NVIDIA. We need a way to dynamically detect at runtime which version of the NVIDIA kernel driver is in use and then realise the accompanying userspace driver libraries.

Ideas

Figure out how Flatpak does it and see if their way is reliable enough for us
Use https://github.com/guibou/nixGL

Social Links?

Is there anywhere in particular where people wishing to contribute can chat about this project? I want to help package a few things into this flake here but it would be nice if there was a chatroom to discuss details about the project.

Can't Unload Models (max_loaded_models option isn't working)

Issue

While switching the models, the "VRAM in use" keeps increasing to a point where the application crashes with message "CUDA out of memory"

Potential Solution

The invoke.ai application seems to have a "--max_loaded_models 1" option, which isn't working when launched through nixified-ai:

nix flakes run github:nixified-ai/flake#invokeai-nvidia -- --free_gpu_mem 1 --max_loaded_models 1
Unknown args: ['--max_loaded_models', '1']

Steps to Reproduce

Geneate an image, notice the "VRAM in use: x"
Generate another image using the same model, notice "VRAM in use:" is still x
Switch to another model, notice increase in "VRAM in use: y"
Generate another image using the same model, notice "VRAM in use" is still y
Switch to another model, notice increase in "VRAM in use: z"
Switch to another model, application crashes with "CUDA out of memory"

Clarification/Request

How to ensure VRAM doesn't keep increasing when switching models?

What is best solution of MLops/dataflow in the Nix land.

It looks like we did the same thing.

running your AI/data jobs with Orchestrates: (https://github.com/GTrunSec/dataflow2nix/tree/main/nix)
deal with a complex workflow then publish them?: https://github.com/GTrunSec/workflow-template

; nix run github:nixified-ai/flake#koboldai-nvidia                                                                                                                                                                                                         # ❌1 took 2s zsh 
warning: binary cache 'http://127.0.0.1:12304' is for Nix stores with prefix 'Nix::Store::getStoreDir', not '/nix/store'
Traceback (most recent call last):
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1076, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/onnx/config.py", line 35, in <module>
    from PIL import Image
  File "/nix/store/ndcb708sh41sjbszxj27a78jl6d2amcz-python3-3.10.11-env/lib/python3.10/site-packages/PIL/Image.py", line 103, in <module>
    from . import _imaging as core
ImportError: /nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6: version `GLIBC_2.36' not found (required by /nix/store/0d4xl0xk1g0w41yqyd64jvzbip5lhfig-libXdmcp-1.1.3/lib/libXdmcp.so.6)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1076, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 47, in <module>
    from .configuration_gpt2 import GPT2Config
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/models/gpt2/configuration_gpt2.py", line 23, in <module>
    from ...onnx import OnnxConfigWithPast, PatchingSpec
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1066, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1078, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.onnx.config because of the following error (look up to see its traceback):
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6: version `GLIBC_2.36' not found (required by /nix/store/0d4xl0xk1g0w41yqyd64jvzbip5lhfig-libXdmcp-1.1.3/lib/libXdmcp.so.6)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nix/store/513r43w6qkrv1ri5m49dawhjvrp205dw-koboldAi-patchedSrc/aiserver.py", line 62, in <module>
    from transformers import StoppingCriteria, GPT2Tokenizer, GPT2LMHeadModel, GPTNeoForCausalLM, GPTNeoModel, AutoModelForCausalLM, AutoTokenizer, PreTrainedModel, modeling_utils
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1067, in __getattr__
    value = getattr(module, name)
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1066, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/nix/store/blsb0ajywpv3ahbzxwaf2d23r77cxb5n-python3-3.10.10-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1078, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.gpt2.modeling_gpt2 because of the following error (look up to see its traceback):
Failed to import transformers.onnx.config because of the following error (look up to see its traceback):
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6: version `GLIBC_2.36' not found (required by /nix/store/0d4xl0xk1g0w41yqyd64jvzbip5lhfig-libXdmcp-1.1.3/lib/libXdmcp.so.6)