CUDA error: invalid argument, about d8ahazard/sd_dreambooth_extension

Comments (18)

0xItx commented on August 16, 2024

Enabling fp16 solved this for me as well (even with my base model being fp32!) 👍

from sd_dreambooth_extension.

vshulman commented on August 16, 2024

Can confirm the same error. Reverting to 6f0ab90 resolves it, although it then fails during image generation or when saving a checkpoint (which I assume were fixed after 6f0ab90).

Lukium commented on August 16, 2024

Also posted here as it appears to be the same issue: #115

UNRELATED EDIT, BUT STILL AN ISSUE: It claims to save the weights to .\models\dreambooth\[trainedmodelname], but it actually saves them to .\models instead.

What follows is a wall of text showing the full install process, from scratch to a successful install. Note that enabling --xformers will kill the install and lead to the error mentioned by the OP (which is why I'm posting this here, after hours of getting the same error and trying to troubleshoot it). This leads me to think the following:

One: some steps in the install are happening at the wrong time or in the wrong order, or are not waiting for a needed step to complete.
Two: there seem to be some odd permission issues. If my memory serves me right, cv2 is related to xformers, and since one of the permission errors being thrown involves cv2, this is likely why --xformers is breaking the install.

Steps:

I created a new conda environment called sd-dreambooth and activated it.
I cloned the current Auto1111 Git repo (exactly the same thing happens if you clone d8ahazard's, so this doesn't seem to be relevant at all).
The UI launches fine. I go to Extensions and tell it to install Dreambooth, which works, with a few errors along the way:

stderr: Traceback (most recent call last):
  File "G:\stable-diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\install.py", line 13, in <module>
    run(f'"{sys.executable}" -m pip install -r "{req_file}"', f"Checking {name} requirements.", f"Couldn't install {name} requirements.")
  File "G:\stable-diffusion\stable-diffusion-webui\launch.py", line 34, in run
    raise RuntimeError(message)
RuntimeError: Couldn't install Dreambooth requirements..
Command: "C:\Users\Lukium\.conda\envs\sd-dreambooth\python.exe" -m pip install -r "G:\stable-diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\requirements.txt"
Error code: 1

AND

stderr: Running command git clone --filter=blob:none --quiet https://github.com/ShivamShrirao/diffusers.git 'C:\Users\Lukium\AppData\Local\Temp\pip-install-jn6bgho7\diffusers_7d00acc10b3847a7a70237c1935c7415'
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\Lukium\\.conda\\envs\\sd-dreambooth\\Lib\\site-packages\\cv2\\cv2.pyd'
Consider using the --user option or check the permissions.

I terminate it with Ctrl+C and run it again. I get a few more errors, but it gets further:

stderr: Traceback (most recent call last):
  File "G:\stable-diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\install.py", line 16, in <module>
    run(f'"{sys.executable}" -m {torch_cmd}', "Checking torch and torchvision versions", "Couldn't install torch")
  File "G:\stable-diffusion\stable-diffusion-webui\launch.py", line 34, in run
    raise RuntimeError(message)
RuntimeError: Couldn't install torch.
Command: "C:\Users\Lukium\.conda\envs\sd-dreambooth\python.exe" -m pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
Error code: 1

AND

stderr: ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\Lukium\\.conda\\envs\\sd-dreambooth\\Lib\\site-packages\\~orch\\lib\\asmjit.dll'
Consider using the --user option or check the permissions.

AND

Launching Web UI with arguments:
Traceback (most recent call last):
  File "G:\stable-diffusion\stable-diffusion-webui\launch.py", line 256, in <module>
    start()
  File "G:\stable-diffusion\stable-diffusion-webui\launch.py", line 247, in start
    import webui
  File "G:\stable-diffusion\stable-diffusion-webui\webui.py", line 13, in <module>
    from modules import devices, sd_samplers, upscaler, extensions, localization
  File "G:\stable-diffusion\stable-diffusion-webui\modules\sd_samplers.py", line 8, in <module>
    import k_diffusion.sampling
  File "G:\stable-diffusion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\__init__.py", line 1, in <module>
    from . import augmentation, config, evaluation, external, gns, layers, models, sampling, utils
  File "G:\stable-diffusion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\config.py", line 8, in <module>
    from . import augmentation, layers, models, utils
  File "G:\stable-diffusion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\layers.py", line 8, in <module>
    from . import utils
  File "G:\stable-diffusion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\utils.py", line 13, in <module>
    from torchvision.transforms import functional as TF
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\__init__.py", line 5, in <module>
    from torchvision import datasets
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\datasets\__init__.py", line 1, in <module>
    from ._optical_flow import KittiFlow, Sintel, FlyingChairs, FlyingThings3D, HD1K
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\datasets\_optical_flow.py", line 12, in <module>
    from ..io.image import _read_png_16
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\io\__init__.py", line 8, in <module>
    from ._load_gpu_decoder import _HAS_GPU_VIDEO_DECODER
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\io\_load_gpu_decoder.py", line 1, in <module>
    from ..extension import _load_library
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\extension.py", line 93, in <module>
    _check_cuda_version()
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\site-packages\torchvision\extension.py", line 66, in _check_cuda_version
    raise RuntimeError(
RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.6 and torchvision has CUDA Version=11.3. Please reinstall the torchvision that matches your PyTorch install.
Press any key to continue . . .
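A mismatch like this one can be spotted before launching by comparing the `+cuXXX` local version tags that the PyTorch wheel index puts on its builds (the working pair in the log further down is `1.12.1+cu116` / `0.13.1+cu116`). This is a minimal sketch, assuming that tag convention; `cuda_tag` and `tags_match` are hypothetical helpers, not part of the webui or the extension, and a build without a `+cu` suffix is simply reported as "no tag".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def cuda_tag(pkg_version: str) -> Optional[str]:
    """Return the CUDA tag of a wheel version, e.g. '1.12.1+cu116' -> 'cu116'.

    Assumes the '+cuXXX' local-version convention of the PyTorch wheel index;
    returns None when there is no such suffix (CPU-only or untagged builds).
    """
    _, sep, local = pkg_version.partition("+")
    if sep and local.startswith("cu"):
        return local
    return None


def tags_match(torch_version: str, torchvision_version: str) -> bool:
    """True only when both versions carry the same '+cuXXX' tag."""
    torch_tag = cuda_tag(torch_version)
    return torch_tag is not None and torch_tag == cuda_tag(torchvision_version)


if __name__ == "__main__":
    try:
        torch_v, vision_v = version("torch"), version("torchvision")
    except PackageNotFoundError as exc:
        print(f"not installed: {exc}")
    else:
        status = "OK" if tags_match(torch_v, vision_v) else "MISMATCH"
        print(f"torch {torch_v} / torchvision {vision_v}: {status}")
```

If the tags differ, reinstalling both from the same index (as the extension's own command above does with `torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116`) should bring them back in line.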

Then I just relaunch it one more time (we're almost there) and only get one error:

stderr: Traceback (most recent call last):
  File "G:\stable-diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\install.py", line 47, in <module>
    shutil.copy(fullfile, site_dir)
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\shutil.py", line 417, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "C:\Users\Lukium\.conda\envs\sd-dreambooth\lib\shutil.py", line 256, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'G:\\stable-diffusion\\stable-diffusion-webui\\venv\\Lib\\site-packages\\bitsandbytes'

I'm already familiar with this one: the bitsandbytes step uses an absolute path (the webui's venv, as the error shows), so I just symlink the folder from the conda env to the place where it expects it to be. Now I run it one more time and voilà, everything is operational:
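The symlink workaround can be scripted. This is a minimal sketch, assuming the source is the conda env's `Lib\site-packages\bitsandbytes` folder and the destination is the `venv\Lib\site-packages\bitsandbytes` path from the error above; `link_bitsandbytes` is a hypothetical helper, not something the extension provides. On Windows, `os.symlink` needs Administrator rights or Developer Mode enabled; `mklink /J` from a command prompt creates a directory junction without either.

```python
import os
from pathlib import Path


def link_bitsandbytes(conda_site_packages: Path, venv_site_packages: Path) -> Path:
    """Symlink <conda site-packages>/bitsandbytes into <venv site-packages>.

    Hypothetical helper: the paths depend on your setup.
    """
    source = conda_site_packages / "bitsandbytes"
    target = venv_site_packages / "bitsandbytes"
    if not source.is_dir():
        raise FileNotFoundError(f"bitsandbytes not found in {conda_site_packages}")
    if target.exists() or target.is_symlink():
        return target  # already a real folder or an existing link: leave it alone
    # In a conda-only setup the venv folder usually does not exist at all.
    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, target, target_is_directory=True)
    return target


# Example with the paths from the log above (adjust to your own setup):
# link_bitsandbytes(
#     Path(r"C:\Users\Lukium\.conda\envs\sd-dreambooth\Lib\site-packages"),
#     Path(r"G:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages"),
# )
```

Once the link exists, `shutil.copy(fullfile, site_dir)` sees `site_dir` as a directory and copies the files into it instead of failing on a missing path.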

`(sd-dreambooth) G:\stable-diffusion\stable-diffusion-webui>webui-user.bat
Python 3.10.6 | packaged by conda-forge | (main, Oct 24 2022, 16:02:16) [MSC v.1916 64 bit (AMD64)]
Commit hash: 98947d173e3f1667eba29c904f681047dea9de90
Installing requirements for Web UI
loading Dreambooth reqs from G:\stable-diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\requirements.txt
Checking Dreambooth requirements.
Checking torch and torchvision versions
Dreambooth revision is d15cbc9
[!] Not using xformers memory efficient attention.
Diffusers version is 0.8.0.dev0.
Torch version is 1.12.1+cu116.
Torch vision version is 0.13.1+cu116.
Copying 8Bit Adam files for Windows.

Launching Web UI with arguments:
Preloading Dreambooth!
[!] Not using xformers memory efficient attention.
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using VAE found beside selected model
Loading weights [81761151] from G:\stable-diffusion\stable-diffusion-webui\models\Stable-diffusion\sd-15\sd-v1-5.ckpt
Global Step: 840000
Loading VAE weights from: G:\stable-diffusion\stable-diffusion-webui\models\Stable-diffusion\sd-15\sd-v1-5.vae.pt
Applying cross attention optimization (Doggettx).
Model loaded.
Loaded a total of 0 textual inversion embeddings.
Embeddings:
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Starting Dreambooth training...
VRAM cleared.
Allocated: 0.0GB
Reserved: 0.0GB

Loaded model.
Allocated: 0.0GB
Reserved: 0.0GB

Ignoring non-image file: dataset\Thumbs.db
Caching latents: 100%|███████████████████████████████████████████████████████████████████| 7/7 [00:29<00:00, 4.14s/it]
Scheduler Loaded
Allocated: 0.0GB
Reserved: 0.0GB

Steps: 50%|█████████████████████████ | 500/1000 [03:20<03:17, 2.54it/s, loss=0.0728, lr=5e-6]Successfully trained model for a total of 2000 steps, converting to ckpt.
[] Weights saved at G:\stable-diffusion\stable-diffusion-webui\models\dreambooth\SD15CustomMK1
Steps: 100%|█████████████████████████████████████████████████| 1000/1000 [06:52<00:00, 2.58it/s, loss=0.0653, lr=5e-6]Successfully trained model for a total of 2500 steps, converting to ckpt.
[] Weights saved at G:\stable-diffusion\stable-diffusion-webui\models\dreambooth\SD15CustomMK1
Training complete??
CLEANUP:
Allocated: 11.0GB
Reserved: 17.7GB

Cleanup Complete.
Allocated: 11.0GB
Reserved: 16.6GB

Steps: 100%|█████████████████████████████████████████████████| 1000/1000 [07:07<00:00, 2.34it/s, loss=0.0653, lr=5e-6]
Training completed, reloading SD Model.
Allocated: 0.0GB
Reserved: 0.0GB

Memory output: {'VRAM cleared.': '0.0/0.0GB', 'Training completed, reloading SD Model.': '0.0/0.0GB'}
Re-applying optimizations...
Returning result: Training finished. Total lifetime steps: 2500`

morphicschris commented on August 16, 2024

Running on an RTX 3080 10GB.

seihoukei commented on August 16, 2024

Same issue, 3090 Ti 24GB.

Arguments: ('scampussv4', 'd:\stablediffusions\scampussdata', '', 'digital painting of a scampuss', '', 'scampuss', '', 4.0, 8, 40.0, 0, 512, False, True, 1, 1, 1, 4300, 1, False, 1e-06, False, 'constant', 0, False, 0.9, 0.999, 0.01, 1e-08, 1, 500, 500, 'no', False, '', False, True, False) {}

shogo3209 commented on August 16, 2024

Same issue, 3080 Ti 12GB.
Arguments: ('ckpt_make', '', 'girl', 'girl', '', '', 1.0, 7.5, 40.0, 0, 512, False, True, 1, 1, 1, 1000, 1, True, 5e-06, False, 'constant', 0, False, 0.9, 0.999, 0.01, 1e-08, 1, 500, 500, 'no', True, '', False, True, True) {}

cstone1991 commented on August 16, 2024

Same issue, 3080 12GB.
Arguments: ('resumetest', 'E:\StableDiffusion\stable-diffusion-webui\outputs\Training\yume\Cropped', '', 'photo of m_yumegirl', 'photo of a woman', '', '', 1.0, 7.5, 40.0, 0, 512, False, True, 1, 1, 1, 200, 1, True, 5e-06, False, 'constant', 0, True, 0.9, 0.999, 0.01, 1e-08, 1, 0, 100, 'no', True, '', False, True, True) {}

jsfs11 commented on August 16, 2024

Exact same errors on a 4090 as well.

0xItx commented on August 16, 2024

Getting the same error on Windows / RTX 3090. Tried many setting combinations on the training page to no avail, with and without the cu116 torch versions.
I also tried reinstalling the repo from scratch, and creating new DB models based on SD / Trinart / others.

seihoukei commented on August 16, 2024

I've tried enabling Adam and setting precision to fp16 (instead of none) under Advanced and it helped.

Cerv3ra commented on August 16, 2024

Just enabling fp16 makes it work for me on a 3090.

(Perhaps we should change the default?)

Edit: It breaks on save for some reason.

seihoukei commented on August 16, 2024

Yeah, disabling Adam kept it working; disabling mixed precision broke it.
However, we then hit the next issue (also reported elsewhere): in fp16 mode the final model is not created in the models folder. My current workaround is to set it to save a model every X steps and train for X+1 steps. This way the model file is created after X steps, and the one remaining step runs into the void.
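The arithmetic behind this workaround is simple enough to check. This is a minimal sketch, assuming a periodic save fires whenever the step count is a positive multiple of the save interval (the extension's actual trigger condition is not shown in this thread); `save_steps` is a hypothetical helper.

```python
def save_steps(total_steps: int, save_every: int) -> list[int]:
    """Steps at which a periodic save fires, assuming a save whenever
    step % save_every == 0 (a hypothetical model of the extension)."""
    if save_every <= 0:
        return []
    return [step for step in range(1, total_steps + 1) if step % save_every == 0]


# The workaround above: save every X steps, train for X + 1 steps.
X = 1000
print(save_steps(X + 1, X))  # [1000]: the save fires at step X, then one extra step runs
```

The point of the extra step is that the model file comes from the periodic save at step X rather than from the end-of-training save, which is the one reported as missing in fp16 mode.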

ozzymuppet commented on August 16, 2024

I've tried enabling Adam and setting precision to fp16 (instead of none) under Advanced and it helped.

This worked for me, thank you.

d8ahazard commented on August 16, 2024

So it sounds like this is resolved?

Closing. Feel free to re-open if I'm mistaken.

commented on August 16, 2024

I wasn't getting this error a few days ago, but as of the latest commit (4d9b93d) I have to use fp16 as well. If I revert to 6f0ab90, it works fine, with no need to change the mixed precision option.

Edit: Upon further investigation, this may be due to a change in my setup. I have two GPUs, a 3090 and a 2080 Ti, and now it seems only the 3090 is used for actually generating images, while the 2080 Ti is only used for VRAM. I'm seeing similar behavior in other software such as Blender, even though it recognizes both GPUs. It would make sense that I'm getting this error and have to use fp16, since only then does it "fit" in both GPUs' memory; I actually got an error like this in Blender years ago when I tried using both GPUs without thinking about this limitation.
Will update again if I'm able to resolve this issue.

Edit 2: Partially resolved. I was previously using the CUDA_VISIBLE_DEVICES env var to choose which GPU to use, and at some point very recently this must have stopped working. If I now specify the device with the command-line arg --device-id (which didn't exist back when I started using the env var), the GPU is used as expected. However, to keep this extension from throwing the CUDA error, I still have to revert to 6f0ab90, so I don't think this is fixed yet.
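For reference, `CUDA_VISIBLE_DEVICES` is only read when the CUDA runtime initializes inside a process, so it has to be in the environment before the webui (or anything it imports) touches the GPU. Setting it for the child process when launching avoids ordering surprises. This is a minimal sketch, assuming you start `launch.py` from its own folder; `gpu_env` and `launch_webui` are hypothetical helpers, and the `CUDA_DEVICE_ORDER=PCI_BUS_ID` line is an addition that makes CUDA's device numbering match `nvidia-smi`, which can otherwise differ on mixed setups like a 3090 plus a 2080 Ti.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def gpu_env(gpu_index: int, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment that exposes a single GPU to a child process."""
    env = dict(os.environ if base is None else base)
    # Number devices by PCI bus (as nvidia-smi does) instead of CUDA's
    # default "fastest first" order.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Must be set before CUDA initializes in the child, hence passed via env.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_webui(gpu_index: int) -> int:
    """Run launch.py (assumed to be in the current directory) on one GPU."""
    return subprocess.call([sys.executable, "launch.py"], env=gpu_env(gpu_index))
```

Inside the child process the selected GPU is renumbered as device 0, so this approach should not be combined with a non-zero `--device-id`; use one mechanism or the other.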

Edit 3: Narrowed it down to edcae96 and its use of Shivam's diffusers package. If I install the previously specified version instead (0.7.2), I can use the latest commit of this extension just fine.
There's another related problem, though. Since the webui tries to use the diffusers version listed in its own requirements_versions.txt (0.3.0), it ends up installing that version instead. This is why a lot of people (including me) get AttributeError: 'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing', as mentioned in #18, #40, and #107.
I think what ultimately needs to happen is making sure the correct diffusers version is installed, while also making sure it does not throw the CUDA error mentioned here (in my testing, 0.7.2 works just fine) or the error in #21 (if that still applies; I never ran into it, so I can't say for certain). Frankly, the only clean way I see to resolve the former is getting the webui to use 0.7.2 by default; as for the latter, you probably know its status better than I do.
You're probably getting mixed reports because some people are likely launching webui.py directly, bypassing launch.py, which is normally invoked by the webui-user script and contains all the installation checks.
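A quick way to see which diffusers build actually ended up in the environment (the webui's pinned 0.3.0, Shivam's fork, or 0.7.2) is to read the installed package metadata. This is a minimal sketch using `importlib.metadata`; `EXPECTED_DIFFUSERS` reflects the 0.7.2 version this comment reports as working and is an assumption, not a pin taken from either project, and `check_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

# Version the comment above reports as working (assumption; adjust as needed).
EXPECTED_DIFFUSERS = "0.7.2"


def check_version(package: str, expected: str,
                  get_version: Callable[[str], str] = version) -> str:
    """Describe whether the installed `package` matches `expected`."""
    try:
        installed = get_version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if installed == expected:
        return f"{package} {installed} OK"
    return f"{package} {installed} installed, expected {expected}"


if __name__ == "__main__":
    print(check_version("diffusers", EXPECTED_DIFFUSERS))
```

Run it with the same interpreter the webui uses (for a conda setup, the env's `python.exe`); otherwise it reports on a different environment. A result of 0.3.0 would match the `enable_gradient_checkpointing` AttributeError described above.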

commented on August 16, 2024

Pinging @d8ahazard regarding my comments above (not sure if edits reflect new pings). Last edit is the important bit.

underlines commented on August 16, 2024

Lukium commented on Nov 12

Also posted here as it appears to be the same issue: #115

UNRELATED EDIT BUT STILL AN ISSUE: It claims to save the weights to .\models\dreambooth\[trainedmodelname] but it actually saves to .\models instead

What follows is a wall of text showing the full install process to a successful install from scratch. Note, that enabling --xformers will kill the install and lead to the error mentioned by the OP (The reason why I'm posting this here after hours of getting the same error and trying to troubleshoot it), which leads me to think the following:
....

I tried for over 20 hours to get a new env working again (I was running DreamBooth fine three months ago with Shivam's original repo). After trying everything (DreamBooth, LoRA, xformers, Ubuntu on WSL2, native Windows, StableTuner, etc.), the only thing that worked was following your guide:

Basically:

Create a new env with Python 3.10.0
Activate it
Git clone Auto1111's repo to sd-dreambooth
Copy the SD 1.5 or SD 2.1 model into the models directory
Run python launch.py
In the UI, install Dreambooth and ignore the errors in the console
Kill the webui
Run python launch.py again and wait for it to install more stuff, then kill it again
Run python launch.py --xformers (works only on certain cards like my 3080; others have to build it)
Now the check should run through with all requirements installed properly on Windows, natively! WOW
You can train with 5GB using LoRA, or with 10GB at higher batch counts, probably even with native DreamBooth, given the right settings (check Shivam's repo)

It probably even works on the 768px SD v2.1 model; I will try soon.

Thanks!

PresleyRen commented on August 16, 2024

Same issue on a 4070 Ti.
