akegarasu / lora-scripts

LoRA & Dreambooth training scripts & GUI use kohya-ss's trainer, for diffusion model.

License: GNU Affero General Public License v3.0

PowerShell 0.41% Jupyter Notebook 0.12% Shell 0.30% Python 98.17% Dockerfile 0.01% TypeScript 0.98%

dreambooth finetune lora

lora-scripts's Issues

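Missing runtime libraries

Python reports that triton is missing, and pip cannot install it, whether I use the system pip or the pip inside the venv.

I'd also suggest upgrading the torch bundled in the package; reportedly torch 1.13.1 with xformers 0.0.16 is a big improvement.

Error during installation: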
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.14.0 requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible.
torchaudio 0.13.0+cpu requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible.

A harmless error message that could be suppressed

Running this script on Windows prints:
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
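It is printed many, many times at the start of almost every epoch.

It does not affect training, though.
According to a Google search, Triton targets Linux and does not apply to Windows, so this error is meaningless on Windows. There should be a way to suppress it; otherwise it wastes a little time (though not much compared with the whole training run).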
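One possible way to hide the message, sketched below, is a log filter. This assumes the warning is emitted through Python's standard `logging` module and reaches a handler on the root logger, which the report above does not confirm. `DropSubstringFilter` and `silence_triton_warning` are hypothetical names, not part of lora-scripts or xformers.

```python
import logging


class DropSubstringFilter(logging.Filter):
    """Drop any log record whose rendered message contains one of the given substrings."""

    def __init__(self, *substrings):
        super().__init__()
        self.substrings = substrings

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells the logging module to discard the record.
        return not any(s in message for s in self.substrings)


def silence_triton_warning():
    """Attach the filter to every root-logger handler (hypothetical helper)."""
    # basicConfig() is a no-op when the root logger already has handlers.
    logging.basicConfig()
    noise = DropSubstringFilter("A matching Triton is not available")
    for handler in logging.getLogger().handlers:
        handler.addFilter(noise)
    return noise
```

If this assumption holds, calling `silence_triton_warning()` once before the training code imports the library that prints the warning should be enough. If the text is written with a plain `print`, the filter has no effect.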

Cannot start training (Python version 3.10.10)

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 1799.40it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (256, 256), count: 100
bucket 1: resolution (384, 384), count: 200
bucket 2: resolution (512, 512), count: 1500
mean ar error (without repeats): 0.0
prepare accelerator
Traceback (most recent call last):
File "D:\AI\kohya_ss\train_network.py", line 652, in <module>
train(args)
File "D:\AI\kohya_ss\train_network.py", line 108, in train
accelerator, unwrap_model = train_util.prepare_accelerator(args)
File "D:\AI\kohya_ss\library\train_util.py", line 1973, in prepare_accelerator
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, mixed_precision=args.mixed_precision,
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 355, in __init__
raise ValueError(err.format(mode="fp16", requirement="a GPU"))
ValueError: fp16 mixed precision requires a GPU
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\AI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI/novelai-webui-aki-v3A/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors', '--train_data_dir=D:/AI/text1/image', '--resolution=512,512', '--output_dir=D:/AI/text1/model', '--logging_dir=D:/AI/text1/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0002', '--lr_scheduler=cosine', '--lr_warmup_steps=180', '--train_batch_size=1', '--max_train_steps=1800', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=Adafactor', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

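Do any parameters need to change for multi-node, multi-GPU training on Linux?

I'm using two Linux servers, each with one A10. After running train.sh, the two machines seem to be training independently. Is there a key setting that needs to change?
Both machines run accelerate's official example nlp_example.py successfully.

Cannot start training (confirmed the venv is Python 3.10.8)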

subprocess.CalledProcessError: Command '['D:\stable-diffusion-webui\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/stable-diffusion-webui/models/Stable-diffusion/protogenX34Photorealism_1.ckpt', '--train_data_dir=./train/wq', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=15', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=32', '--network_alpha=32', '--output_name=starwanqian', '--train_batch_size=2', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.

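Dependency installation fails

I don't have a mainland China phone number, so I can't download from the Quark cloud drive.
Running install cn fails to install the dependencies.
Running install prints errors in red: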

cp : Cannot find path 'I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\sd-scripts\bitsandbytes_windows' because it does not exist.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\install.ps1:16 char:1

cp .\bitsandbytes_windows*.dll ..\venv\Lib\site-packages\bitsandbyte ...

  + CategoryInfo          : ObjectNotFound: (I:\stable-diffu...ndbytes_windows:String) [Copy-Item], ItemNotFoundException
  + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand

cp : Cannot find path 'I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\sd-scripts\bitsandbytes_windows\cextension.py' because it does not exist.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\install.ps1:17 char:1

cp .\bitsandbytes_windows\cextension.py ..\venv\Lib\site-packages\bit ...

  + CategoryInfo          : ObjectNotFound: (I:\stable-diffu...s\cextension.py:String) [Copy-Item], ItemNotFoundException
  + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand

cp : Cannot find path 'I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\sd-scripts\bitsandbytes_windows\main.py' because it does not exist.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\install.ps1:18 char:1

cp .\bitsandbytes_windows\main.py ..\venv\Lib\site-packages\bitsandby ...

  + CategoryInfo          : ObjectNotFound: (I:\stable-diffu...windows\main.py:String) [Copy-Item], ItemNotFoundException
  + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand

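The installer still reports success, but running train in that state gives:

accelerate : The term 'accelerate' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\train.ps1:95 char:1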

accelerate launch --num_cpu_threads_per_process=8 "./sd-scripts/train ...

  + CategoryInfo          : ObjectNotFound: (accelerate:String) [], CommandNotFoundException
  + FullyQualifiedErrorId : CommandNotFoundException

I looked this up and also ran Set-ExecutionPolicy -ExecutionPolicy RemoteSigned; the second time it asked for confirmation.

Can the xformers installation method be improved?

pip install git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd

This command fails with the error below:

fatal: unable to access 'https://github.com/facebookresearch/xformers.git/': Failed to connect to github.com port 443 after 130947 ms: Connection timed out

Is there another way to install xformers?

Errors during installation

ERROR: Exception:
Traceback (most recent call last):
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\cli\base_command.py", line 167, in exc_logging_wrapper
status = run_func(*args)
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\cli\req_command.py", line 247, in wrapper
return func(self, options, args)
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\commands\install.py", line 461, in run
installed = install_given_reqs(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\req\__init__.py", line 73, in install_given_reqs
requirement.install(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\req\req_install.py", line 790, in install
install_wheel(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\operations\install\wheel.py", line 727, in install_wheel
_install_wheel(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\operations\install\wheel.py", line 587, in _install_wheel
file.save()
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\operations\install\wheel.py", line 388, in save
shutil.copyfileobj(f, dest)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 195, in copyfileobj
buf = fsrc_read(length)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 925, in read
data = self._read1(n)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1015, in _read1
self._update_crc(data)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 943, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'torch/lib/cusolver64_11.dll'

[notice] A new release of pip available: 22.2.1 -> 23.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

min_snr_gamma support?

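While reading a training tutorial on Civitai I came across this algorithm: https://arxiv.org/abs/2303.09556
It reportedly improves training efficiency. Are you familiar with it, and is there any plan (or need) to add it to the training scripts?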
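For readers unfamiliar with the idea, here is a minimal sketch of the weighting as I understand it from the linked paper, assuming a noise-prediction (epsilon) objective. Each timestep's loss is scaled by min(SNR, gamma) / SNR, where SNR is the signal-to-noise ratio of that timestep. The function names and the default `gamma=5.0` are illustrative assumptions, not the option that sd-scripts or lora-scripts exposes.

```python
def snr(alpha_bar):
    """Signal-to-noise ratio of a timestep, given its cumulative alpha product."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(alpha_bar, gamma=5.0):
    """Loss weight for a noise-prediction objective: min(SNR, gamma) / SNR."""
    s = snr(alpha_bar)
    # Timesteps with SNR <= gamma keep weight 1; low-noise (high SNR)
    # timesteps are down-weighted so they do not dominate training.
    return min(s, gamma) / s
```

With `alpha_bar = 0.5` the SNR is 1 and the weight stays at 1, while a nearly clean timestep such as `alpha_bar = 0.99` (SNR about 99) gets a weight of roughly 5/99.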

Installing the other dependencies fails. Error: ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

Looking in indexes: https://mirrors.bfsu.edu.cn/pypi/web/simple
Looking in links: https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html
Collecting torch==1.12.1+cu116
Downloading https://mirror.sjtu.edu.cn/pytorch-wheels/cu116/torch-1.12.1%2Bcu116-cp310-cp310-win_amd64.whl (2388.4 MB)
---------------------------------------- 2.4/2.4 GB 34.4 MB/s eta 0:00:00
Collecting torchvision==0.13.1+cu116
Downloading https://mirror.sjtu.edu.cn/pytorch-wheels/cu116/torchvision-0.13.1%2Bcu116-cp310-cp310-win_amd64.whl (2.6 MB)
---------------------------------------- 2.6/2.6 MB 9.6 MB/s eta 0:00:00
Collecting typing-extensions
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/31/25/5abcd82372d3d4a3932e1fa8c3dbf9efac10cc7c0d16e78467460571b404/typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting numpy
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/fa/df/53e8c0c8ccecf360b827a3d2b1b6060644c635c3149a9d6415a6fe4ccf44/numpy-1.24.2-cp310-cp310-win_amd64.whl (14.8 MB)
---------------------------------------- 14.8/14.8 MB 65.1 MB/s eta 0:00:00
Collecting requests
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/d2/f4/274d1dbe96b41cf4e0efb70cbced278ffd61b5c7bb70338b62af94ccb25b/requests-2.28.2-py3-none-any.whl (62 kB)
---------------------------------------- 62.8/62.8 kB ? eta 0:00:00
Collecting pillow!=8.3.*,>=5.3.0
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/5e/7c/293136a5171800001be33c21a51daaca68fae954b543e2c015a6bb81a716/Pillow-9.4.0-cp310-cp310-win_amd64.whl (2.5 MB)
---------------------------------------- 2.5/2.5 MB 154.1 MB/s eta 0:00:00
Collecting idna<4,>=2.5
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl (61 kB)
---------------------------------------- 61.5/61.5 kB ? eta 0:00:00
Collecting certifi>=2017.4.17
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/71/4c/3db2b8021bd6f2f0ceb0e088d6b2d49147671f25832fb17970e9b583d742/certifi-2022.12.7-py3-none-any.whl (155 kB)
---------------------------------------- 155.3/155.3 kB ? eta 0:00:00
Collecting charset-normalizer<4,>=2
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/98/f4/5ca33ee1e0b3412cbd13eae230321a9fe819acf1a99ad6482420fb97cc6b/charset_normalizer-3.0.1-cp310-cp310-win_amd64.whl (96 kB)
---------------------------------------- 96.5/96.5 kB ? eta 0:00:00
Collecting urllib3<1.27,>=1.21.1
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/fe/ca/466766e20b767ddb9b951202542310cba37ea5f2d792dae7589f1741af58/urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
---------------------------------------- 140.6/140.6 kB ? eta 0:00:00
Installing collected packages: charset-normalizer, urllib3, typing-extensions, pillow, numpy, idna, certifi, torch, requests, torchvision
Successfully installed certifi-2022.12.7 charset-normalizer-3.0.1 idna-3.4 numpy-1.24.2 pillow-9.4.0 requests-2.28.2 torch-1.12.1+cu116 torchvision-0.13.1+cu116 typing-extensions-4.5.0 urllib3-1.26.14
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
Failed to install the other dependencies.
Installation failed.

An error occurred while training on macOS

OS version 13.2
Python 3.10.9
Device MacBook Pro M1 Pro 32G
Error info

caching latents.
  0%|                                                                                                                         | 0/32 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/cpp/Downloads/sd/lora-scripts/./sd-scripts/train_network.py", line 548, in <module>
    train(args)
  File "/Users/cpp/Downloads/sd/lora-scripts/./sd-scripts/train_network.py", line 167, in train
    train_dataset.cache_latents(vae)
  File "/Users/cpp/Downloads/sd/lora-scripts/sd-scripts/library/train_util.py", line 508, in cache_latents
    info.latents = vae.encode(img_tensor).latent_dist.sample().squeeze(0).to("cpu")
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 114, in encode
    h = self.encoder(x)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/diffusers/models/vae.py", line 101, in forward
    sample = self.conv_in(sample)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
Traceback (most recent call last):
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    simple_launcher(args)
  File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 552, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/cpp/Downloads/sd/lora-scripts/venv/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/nonomi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=64', '--network_alpha=32', '--output_name=nonomi', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.

Training script fails with UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 64, followed by CalledProcessError

I watched Qinglong's (青龙) video and modified the install script:
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 -f https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html -i https://mirrors.bfsu.edu.cn/pypi/web/simple -U
pip install --upgrade -r requirements.txt

I don't know whether that is related.

Traceback (most recent call last):
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 415, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 797, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 283, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 419, in load_state_dict
if f.read(7) == "version":
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 64: illegal multibyte sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\train_network.py", line 711, in <module>
train(args)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\train_network.py", line 130, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\library\train_util.py", line 2626, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\library\model_util.py", line 921, in load_models_from_stable_diffusion_checkpoint
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 2301, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 431, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin' at 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
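The first exception in the chain ("failed finding central directory") usually indicates a truncated or otherwise incomplete zip-format checkpoint. The UnicodeDecodeError that follows is raised in the fallback path that reads the same file as text. A rough way to check for that, assuming the cached `pytorch_model.bin` was saved in PyTorch's zip-based format, is sketched below. `find_broken_checkpoints` is a hypothetical helper, and the cache directory to scan is whatever location the traceback reports.

```python
import zipfile
from pathlib import Path


def find_broken_checkpoints(cache_dir):
    """Return pytorch_model.bin files under cache_dir that are not readable zip archives."""
    broken = []
    for path in Path(cache_dir).rglob("pytorch_model.bin"):
        # is_zipfile looks for the end-of-central-directory record,
        # which is missing when a download was cut short.
        if not zipfile.is_zipfile(path):
            broken.append(path)
    return broken
```

Any file this reports is a candidate for deleting and downloading again. A checkpoint saved in the older non-zip format would also be listed, so treat the result as a hint rather than proof.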

Error when starting training

prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
found directory 6_ichinose contains 76 image files
456 train images with repeating.
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 1117.78it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 640), count: 456
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
Traceback (most recent call last):
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 415, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "D:\Software\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 705, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "D:\Software\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 242, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 419, in load_state_dict
if f.read(7) == "version":
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 64: illegal multibyte sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Software\lora-scripts\sd-scripts\train_network.py", line 507, in <module>
train(args)
File "D:\Software\lora-scripts\sd-scripts\train_network.py", line 96, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "D:\Software\lora-scripts\sd-scripts\library\train_util.py", line 1860, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "D:\Software\lora-scripts\sd-scripts\library\model_util.py", line 919, in load_models_from_stable_diffusion_checkpoint
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 2301, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 431, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin' at 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Software\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\Software\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\Software\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\Software\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\Software\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/ichinose', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,640', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=ichinose', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--network_train_unet_only', '--use_8bit_adam']' returned non-zero exit status 1.
Train finished

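Could a sequential training queue be supported?

Each LoRA takes several hours to train. I start a run before bed, and in the morning I'm rushing out the door with no time to start another round, so most of the day goes to waste, which is a pity.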
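Until something like this exists in the scripts, a small wrapper can approximate it. The sketch below is an assumption-laden illustration, not a lora-scripts feature: `run_queue` is a hypothetical helper that starts each command only after the previous one has exited.

```python
import subprocess


def run_queue(commands):
    """Run each command to completion, one after another, and collect exit codes."""
    results = []
    for cmd in commands:
        # subprocess.run blocks until the process exits, so jobs never overlap
        # and the GPU is only used by one training run at a time.
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
    return results
```

For example, `run_queue([["bash", "train_a.sh"], ["bash", "train_b.sh"]])` would run two copies of the training script back to back; the script names here are placeholders. A failed job does not stop the queue, and its non-zero exit code appears in the returned list.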

Shouldn't the Linux installation be updated?

Its dependencies no longer match the Windows ones; please bring them back in sync.

Can I use LyCORIS in lora-scripts?

sd-scripts has added support for LyCORIS, but inside lora-scripts I can't seem to update sd-scripts to the new version.

Why does training on a mobile 4060 also run out of VRAM?

Using the latest code from GitHub.
Base model: chilloutmix
Error message:
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 8.00 GiB total capacity; 7.04 GiB already allocated; 0 bytes free; 7.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Running train.sh fails

Output from running train.sh:

(venv) taper@pc:~/App/lora-scripts$ ./train.sh 
Traceback (most recent call last):
  File "/home/taper/App/python/3.10.6/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/launch.py", line 747, in launch_command
    defaults = load_config_from_file(args.config_file)
  File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/config/config_args.py", line 64, in load_config_from_file
    return config_class.from_yaml_file(yaml_file=config_file)
  File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/config/config_args.py", line 117, in from_yaml_file
    return cls(**config_dict)
TypeError: ClusterConfig.__init__() got an unexpected keyword argument 'command_file'

train.sh

# I only changed these lines
pretrained_model="./sd-models/v1-5-pruned-emaonly.safetensors" # base model path
train_data_dir="./train/xxx/"              # train dataset path
resolution="512,640"      # image resolution w,h. Non-square is supported, but both values must be multiples of 64.
max_train_epoches=20      # max train epochs
output_name="xxx"           # output model name

Could the author please take a look at what is going wrong here? This accelerate error message is really hard to debug.

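Regularization images not recognized

I have set the path to the regularization image set, and the folder does contain files, but it still reports that there are no regularization images.

Error during training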

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 4675.58it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (384, 640), count: 54
bucket 1: resolution (448, 576), count: 12
bucket 2: resolution (512, 512), count: 342
bucket 3: resolution (640, 384), count: 48
mean ar error (without repeats): 0.022491709236204763
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\safetensors\torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
Traceback (most recent call last):
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\train_network.py", line 724, in <module>
    train(args)
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\train_network.py", line 135, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\library\train_util.py", line 2649, in load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path, device)
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\library\model_util.py", line 869, in load_models_from_stable_diffusion_checkpoint
    _, state_dict = load_checkpoint_with_text_encoder_conversion(ckpt_path, device)
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\library\model_util.py", line 844, in load_checkpoint_with_text_encoder_conversion
    state_dict = load_file(ckpt_path) # , device) # may causes error
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\safetensors\torch.py", line 100, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: shape '[1280, 1280, 3, 3]' is invalid for input of size 3657939
Traceback (most recent call last):
  File "C:\Users\default.DESKTOP-08UGQK7\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\default.DESKTOP-08UGQK7\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\default.DESKTOP-08UGQK7\\Desktop\\lora-scripts-main\\venv\\Scripts\\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/v1-5-pruned-emaonly.safetensors', '--train_data_dir=./train/nine-age', '--output_dir=./output', '--logging_dir=./logs', '--log_prefix=nine_ling', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=nine_ling', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0']' returned non-zero exit status 1.
Train finished

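The above is the error I get when running train.ps1. I have already swapped the base model and rechecked the file names against the author's video, but it still fails. I am at a loss.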

I found an issue that many people probably run into as well

Device: win10 / 3060ti
I edited the training parameters in Notepad++, but when I launched the script from the shell it gave me this error (picture 1).

I quickly found that something is wrong in train.ps1. Although the "--learning-rate" parameter has a value, the program ignores it and I don't know why. So I changed the script like this:

Fortunately that worked, but then it reported another error like this.

Is there a better way to solve these problems? I am quite confused.

Error caught was: No module named 'triton'

I have reinstalled several times and this error appears every time. install.bash does contain a command that installs triton.

FileNotFoundError: [Errno 2] No such file or directory: './train/aki'

$ bash train.sh
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
Traceback (most recent call last):
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/./sd-scripts/train_network.py", line 548, in
train(args)
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/./sd-scripts/train_network.py", line 121, in train
train_dataset = DreamBoothDataset(args.train_batch_size, args.train_data_dir, args.reg_data_dir,
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/sd-scripts/library/train_util.py", line 754, in init
train_dirs = os.listdir(train_data_dir)
FileNotFoundError: [Errno 2] No such file or directory: './train/aki'
Traceback (most recent call last):
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/bin/accelerate", line 8, in
sys.exit(main())
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/aki', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=32', '--network_alpha=32', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.

Training never runs on a 4070

It still does not work even with 8-bit Adam disabled.

prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
Traceback (most recent call last):
File "D:\AI\developer\lora\lora-scripts\sd-scripts\train_network.py", line 548, in
train(args)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\train_network.py", line 156, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\library\train_util.py", line 1584, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\library\model_util.py", line 877, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\library\model_util.py", line 234, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File "C:\Users\XIE\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\XIE\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "D:\AI\developer\lora\lora-scripts\venv\Scripts\accelerate.exe_main.py", line 7, in
File "D:\AI\developer\lora\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\AI\developer\lora\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\AI\developer\lora\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\developer\lora\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/koreanDollLikeness_v15.safetensors', '--train_data_dir=./train/donghan', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=15', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=32', '--network_alpha=32', '--output_name=donghan', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=512', '--max_bucket_reso=512']' returned non-zero exit status 1.
Train finished

How do I configure training to use a v2.1 base model?

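How do I configure training to use a v2.1 base model? I tried manually adding --v2 in front of --enable_bucket ` on line 101 of train.ps1, but running it warns that v2 cannot be used together with clip_skip. After commenting out clip_skip it still does not run.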

Failed to install xformers==0.0.14.dev0

When I ran the install-cn.ps1, it failed at installing xformers.

The install cmd:
pip install -U -I --no-deps https://jihulab.com/api/v4/projects/82097/packages/pypi/files/e8508fe14c8f2552a822f5e6f5620b24fdd4ba3129c2a31a39b56425bcc023bc/xformers-0.0.14.dev0+torch12-cp310-cp310-win_amd64.whl

The error msg:
ERROR: xformers-0.0.14.dev0+torch12-cp310-cp310-win_amd64.whl is not a supported wheel on this platform.

I tried to manually install that version via pip, but no version matches:
pip install xformers==0.0.14.dev0
ERROR: Could not find a version that satisfies the requirement xformers==0.0.14.dev0 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.0.10, 0.0.11, 0.0.12, 0.0.13, 0.0.16rc424, 0.0.16rc425, 0.0.16, 0.0.17.dev448, 0.0.17.dev449, 0.0.17.dev451, 0.0.17.dev461, 0.0.17.dev464)
ERROR: No matching distribution found for xformers==0.0.14.dev0

I also tried some of the .whl files on this page, still with no luck.

Could anyone help?
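The "is not a supported wheel on this platform" message usually means the tags in the wheel file name do not match the interpreter that pip is running under. The file name above ends in cp310-cp310-win_amd64, so it targets 64-bit CPython 3.10 on Windows. The following is a minimal sketch for comparing those tags with the current interpreter. The helper names (`parse_wheel_tags`, `current_tags`, `explain_mismatch`) are hypothetical, and the platform comparison is deliberately coarse: pip's real matching also accepts compatibility tags such as manylinux, which this sketch does not model.

```python
import sys
import sysconfig


def parse_wheel_tags(filename):
    """Return the (python, abi, platform) tags encoded in a wheel file name."""
    stem = filename[:-4] if filename.endswith(".whl") else filename
    # Wheel names end with -<python tag>-<abi tag>-<platform tag>.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def current_tags():
    """Approximate the CPython and platform tags of the running interpreter."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def explain_mismatch(filename):
    """List the reasons this interpreter would likely reject the wheel."""
    wheel_python, _abi, wheel_platform = parse_wheel_tags(filename)
    here_python, here_platform = current_tags()
    problems = []
    if wheel_python != here_python:
        problems.append(f"wheel targets {wheel_python}, interpreter is {here_python}")
    if wheel_platform != here_platform:
        problems.append(f"wheel targets {wheel_platform}, platform is {here_platform}")
    return problems


if __name__ == "__main__":
    name = "xformers-0.0.14.dev0+torch12-cp310-cp310-win_amd64.whl"
    for line in explain_mismatch(name) or ["tags look compatible"]:
        print(line)
```

Run it with the same python.exe that the venv uses. A different Python minor version or a 32-bit interpreter (which reports win32) would both show up as a mismatch here.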

Error installing the environment on Linux

It ran fine last night but fails today. I am running on Colab, where Python has already been updated to 3.10.8.

The new AutoDL image does not support training on the A100 80GB and other high-VRAM GPUs

Error:
Traceback (most recent call last):
File "/root/lora-scripts/./sd-scripts/train_network.py", line 699, in
train(args)
File "/root/lora-scripts/./sd-scripts/train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
sample, res_samples = downsample_block(
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/lora-scripts/sd-scripts/library/train_util.py", line 1700, in forward_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None) # 最適なのを選んでくれる
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 975, in memory_efficient_attention
return op.apply(query, key, value, attn_bias, p, scale).reshape(output_shape)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 360, in forward
out, lse = cls.FORWARD_OPERATOR(
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/_ops.py", line 143, in call
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
steps: 0%| | 0/1600 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/lora/bin/accelerate", line 8, in
sys.exit(main())
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/lora/bin/python', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/wzgrx', '--output_dir=./output', '--logging_dir=./logs', '--resolution=1536,1920', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=128', '--network_alpha=64', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.
(lora) root@autodl-container-a25511bdfa-77fa09a8:~/lora-scripts#

An error message is printed during training but the process does not stop: A matching Triton is not available, some optimizations will not be enabled.

Is this normal log output?
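An earlier report in this tracker notes that Triton targets Linux rather than Windows and that the message does not affect training. As a rough way to confirm which case applies, the sketch below checks whether the `triton` module is importable in the active environment. The function name `triton_status` and the wording of the returned messages are hypothetical, and the assumption that the warning is harmless on Windows comes from that earlier report, not from the library's documentation.

```python
import importlib.util
import platform


def triton_status():
    """Describe whether the triton module can be imported in this environment."""
    installed = importlib.util.find_spec("triton") is not None
    system = platform.system()
    if installed:
        return "triton is importable; the warning should not appear"
    if system == "Windows":
        # Assumption taken from the earlier report: no Windows build is available.
        return "triton is missing on Windows; the warning is expected"
    return f"triton is missing on {system}; installing it may remove the warning"


if __name__ == "__main__":
    print(triton_status())
```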

Error when training on an M40 24GB

prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare images.
found directory train\nijika\6_nijika contains 42 image files
252 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False

[Subset 0 of Dataset 0]
image_dir: "train\nijika\6_nijika"
image_count: 42
num_repeats: 6
shuffle_caption: True
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
is_reg: False
class_tokens: nijika
caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 419.76it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (1024, 1024), count: 126
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
loading text encoder:
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 21/21 [00:36<00:00, 1.73s/it]
import network module: networks.lora
create LoRA network. base dim (rank): 32, alpha: 32.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
override steps. steps for 10 epochs is / 指定エポックまでのステップ数: 1260
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 252
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 126
num epochs / epoch数: 10
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1260
steps: 0%| | 0/1260 [00:00<?, ?it/s]epoch 1/10
Traceback (most recent call last):
File "K:\lora-scripts\sd-scripts\train_network.py", line 699, in
train(args)
File "K:\lora-scripts\sd-scripts\train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "K:\lora-scripts\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 381, in forward
sample, res_samples = downsample_block(
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 612, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\attention.py", line 484, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\sd-scripts\library\train_util.py", line 1700, in forward_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None) # 最適なのを選んでくれる
File "K:\lora-scripts\venv\lib\site-packages\xformers\ops.py", line 865, in memory_efficient_attention
return op.apply(query, key, value, attn_bias, p).reshape(output_shape)
File "K:\lora-scripts\venv\lib\site-packages\xformers\ops.py", line 319, in forward
out, lse = cls.FORWARD_OPERATOR(
File "K:\lora-scripts\venv\lib\site-packages\torch_ops.py", line 143, in call
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
steps: 0%| | 0/1260 [01:11<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Administrator.WIN-7JDG5CD1HHH\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator.WIN-7JDG5CD1HHH\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "K:\lora-scripts\venv\Scripts\accelerate.exe_main.py", line 7, in
File "K:\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "K:\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "K:\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['K:\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/nijika', '--output_dir=./output', '--logging_dir=./logs', '--resolution=1024,1024', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=ckpt', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption']' returned non-zero exit status 1.
Train finished
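"CUDA error: no kernel image is available for execution on the device" generally means a compiled extension (here the traceback ends inside xformers) was not built for this GPU's compute capability. The Tesla M40 is a Maxwell card with compute capability 5.2. The sketch below reports the device capability and compares it with an architecture list. It uses PyTorch's own list only as a stand-in, since the xformers wheel carries a separate list that this code does not read. The helper name `is_arch_compiled` is hypothetical, and PTX entries such as compute_80 are ignored for simplicity.

```python
import re

SM_PATTERN = re.compile(r"^sm_(\d+)(\d)[a-z]?$")


def is_arch_compiled(capability, arch_list):
    """Return True if some sm_XY entry can run on a device of this capability.

    Binary kernels built for sm_XY run on devices with the same major
    version X and a minor version of at least Y.
    """
    major, minor = capability
    for arch in arch_list:
        match = SM_PATTERN.match(arch)
        if match is None:
            continue  # skip PTX entries such as "compute_80"
        built_major, built_minor = int(match.group(1)), int(match.group(2))
        if built_major == major and built_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", capability)
            print("PyTorch built for:", archs)
            print("covered by PyTorch build:", is_arch_compiled(capability, archs))
        else:
            print("CUDA is not available")
```

If the capability is not covered, removing `--xformers` from the launch arguments or installing a build compiled for the card are the usual things to try.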

Error on Ubuntu with two 3090 GPUs installed

Traceback (most recent call last):
File "/home/ipfs10/lora/lora-scripts/venv/bin/accelerate", line 8, in
sys.exit(main())
File "/home/ipfs10/lora/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ipfs10/lora/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/home/ipfs10/lora/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ipfs10/lora/lora-scripts/venv/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=/home/ipfs10/stable-diffusion-webui/models/Stable-diffusion/yy/yy_900.safetensors', '--train_data_dir=./train/sucai', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,640', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=64', '--network_alpha=32', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.

How to use local environment instead of venv

How can I use my local environment instead of the venv? With the virtual environment I have to install PyTorch a second time, and I don't have much disk space.

No data found

(lora) root@autodl-container-a1d611953c-8035be35:~/lora-scripts# bash train.sh
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
ignore directory without repeats / 繰り返し回数のないディレクトリを無視します: ./train/laohuang/.ipynb_checkpoints
0 train images with repeating.
loading image sizes.
0it [00:00, ?it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (256, 832), count: 0
bucket 1: resolution (256, 896), count: 0
bucket 2: resolution (256, 960), count: 0
bucket 3: resolution (256, 1024), count: 0
bucket 4: resolution (320, 704), count: 0
bucket 5: resolution (320, 768), count: 0
bucket 6: resolution (384, 640), count: 0
bucket 7: resolution (448, 576), count: 0
bucket 8: resolution (512, 512), count: 0
bucket 9: resolution (576, 448), count: 0
bucket 10: resolution (640, 384), count: 0
bucket 11: resolution (704, 320), count: 0
bucket 12: resolution (768, 320), count: 0
bucket 13: resolution (832, 256), count: 0
bucket 14: resolution (896, 256), count: 0
bucket 15: resolution (960, 256), count: 0
bucket 16: resolution (1024, 256), count: 0
/root/miniconda3/envs/lora/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/root/miniconda3/envs/lora/lib/python3.10/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
mean ar error (without repeats): nan
No data found. Please verify arguments / 画像がありません。引数指定を確認してください
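The log above reports "ignore directory without repeats" and "0 train images", while a working run elsewhere in this tracker shows images under `train\nijika\6_nijika` with `num_repeats: 6`. That suggests the trainer expects image subfolders named `<repeats>_<name>` inside `--train_data_dir`. The sketch below lists what such a scan would find. The function name `summarize_train_dir` and the extension set are assumptions for illustration, not the trainer's actual implementation.

```python
import os
import re

# Assumed set of image extensions; the trainer's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}
REPEAT_DIR = re.compile(r"^(\d+)_(.+)$")


def summarize_train_dir(train_data_dir):
    """Map each subfolder to images * repeats, or None if it lacks a repeat prefix."""
    if not os.path.isdir(train_data_dir):
        raise FileNotFoundError(f"No such directory: {train_data_dir!r}")
    summary = {}
    for entry in sorted(os.listdir(train_data_dir)):
        path = os.path.join(train_data_dir, entry)
        if not os.path.isdir(path):
            continue  # loose files directly under the train dir are not counted
        match = REPEAT_DIR.match(entry)
        if match is None:
            summary[entry] = None  # no "<repeats>_" prefix, so likely ignored
            continue
        images = [
            name for name in os.listdir(path)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        ]
        summary[entry] = int(match.group(1)) * len(images)
    return summary


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "./train"
    for folder, count in summarize_train_dir(target).items():
        print(folder, "->", "ignored" if count is None else f"{count} images with repeats")
```

A result with only `None` values, or an empty result, would match the "No data found" outcome. A `FileNotFoundError` here corresponds to the `./train/aki` report, where the directory itself does not exist.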

After running install-cn.ps1 the installer reports success, but training prints "Error caught was: No module named 'triton'"

As the title says. How should I fix this?

CalledProcessError: Command

Traceback (most recent call last):
File "F:\lora-scripts-0.2.0\sd-scripts\train_network.py", line 642, in
train(args)
File "F:\lora-scripts-0.2.0\sd-scripts\train_network.py", line 114, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "F:\lora-scripts-0.2.0\sd-scripts\library\train_util.py", line 2008, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "F:\lora-scripts-0.2.0\sd-scripts\library\model_util.py", line 877, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File "F:\lora-scripts-0.2.0\sd-scripts\library\model_util.py", line 234, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File "C:\Users\keina\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\keina\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "F:\lora-scripts-0.2.0\venv\Scripts\accelerate.exe_main.py", line 7, in
File "F:\lora-scripts-0.2.0\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "F:\lora-scripts-0.2.0\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "F:\lora-scripts-0.2.0\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\lora-scripts-0.2.0\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/before.safetensors', '--train_data_dir=./train/', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,768', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=64', '--network_alpha=32', '--output_name=after', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--reg_data_dir=./train/reg', '--use_8bit_adam', '--use_lion_optimizer']' returned non-zero exit status 1.
Train finished

Training error on Linux

running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 78
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 26
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 3
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 520
steps:   0%|                                                                                    | 0/520 [00:00<?, ?it/s]epoch 1/20
Traceback (most recent call last):
  File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 699, in <module>
    train(args)
  File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 538, in train
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
    sample, res_samples = downsample_block(
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/lora-scripts/sd-scripts/library/train_util.py", line 1700, in forward_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)  # 最適なのを選んでくれる
  File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 975, in memory_efficient_attention
    return op.apply(query, key, value, attn_bias, p, scale).reshape(output_shape)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 360, in forward
    out, lse = cls.FORWARD_OPERATOR(
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/_ops.py", line 442, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:140 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:488 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:291 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:482 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:743 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:189 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:484 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]

steps:   0%|                                                                                    | 0/520 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/stable/anaconda3/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stable/anaconda3/bin/python', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/yazi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,704', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=ba_yazi_V10', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.

Could someone help me figure out what is going on here?

ModuleNotFoundError: No module named 'lion_pytorch'

Training works fine with 8-bit Adam, but switching to Lion raises an error. How can I fix this? The full error output is below:

Traceback (most recent call last):
  File "D:\mys\tools\Lora\sd-scripts\library\train_util.py", line 1593, in get_optimizer
    import lion_pytorch
ModuleNotFoundError: No module named 'lion_pytorch'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\mys\tools\Lora\sd-scripts\train_network.py", line 507, in <module>
    train(args)
  File "D:\mys\tools\Lora\sd-scripts\train_network.py", line 150, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "D:\mys\tools\Lora\sd-scripts\library\train_util.py", line 1595, in get_optimizer
    raise ImportError("No lion_pytorch / lion_pytorch がインストールされていないようです")
ImportError: No lion_pytorch / lion_pytorch がインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\iamgb\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\iamgb\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\mys\tools\Lora\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\mys\tools\Lora\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\mys\tools\Lora\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\mys\tools\Lora\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\mys\\tools\\Lora\\venv\\Scripts\\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=64', '--network_alpha=64', '--output_name=ZhangJiani', '--train_batch_size=1', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--network_train_unet_only', '--use_lion_optimizer']' returned non-zero exit status 1.
Train finished
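
The traceback above shows `import lion_pytorch` failing inside `get_optimizer`, so the module is not present in the environment that `accelerate` launches. A minimal sketch for checking this before a run follows. The helper name `missing_modules` is hypothetical, and whether the package installs from PyPI under the name `lion-pytorch` is an assumption that should be checked against the project's own requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the tracebacks quoted in these issues.
    missing = missing_modules(["lion_pytorch", "bitsandbytes"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All optional optimizer modules were found.")
```

The check is only meaningful when run with the same interpreter that appears in the failing command (here the `python.exe` inside the `venv` folder).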

Two 2080 Ti cards, but only one is used

I have two 2080 Ti cards, but only one of them is used. How do I configure training to use both at the same time?

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

TypeError: replace_unet_cross_attn_to_xformers.<locals>.forward_xformers() got an unexpected keyword argument 'encoder_hidden_states'

When training, I get the following error. Has anyone else run into the same problem?

steps: 0%| | 0/20880 [00:00<?, ?it/s]epoch 1/20
Traceback (most recent call last):
File "/data/sd/sd/stable-diffusion-webui/lora-scripts/./sd-scripts/train_network.py", line 642, in <module>
train(args)
File "/data/sd/sd/stable-diffusion-webui/lora-scripts/./sd-scripts/train_network.py", line 503, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 489, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 582, in forward
sample, res_samples = downsample_block(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 837, in forward
hidden_states = attn(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
hidden_states = block(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 291, in forward
attn_output = self.attn1(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
TypeError: replace_unet_cross_attn_to_xformers.<locals>.forward_xformers() got an unexpected keyword argument 'encoder_hidden_states'

Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

TypeError: Input must be callable

/content/lora-scripts
Traceback (most recent call last):
File "./sd-scripts/train_network.py", line 20, in <module>
import library.config_util as config_util
File "/content/lora-scripts/sd-scripts/library/config_util.py", line 113, in <module>
class ConfigSanitizer:
File "/content/lora-scripts/sd-scripts/library/config_util.py", line 116, in ConfigSanitizer
def __validate_and_convert_twodim(klass, value: Sequence) -> Tuple:
File "/usr/local/lib/python3.8/dist-packages/toolz/functoolz.py", line 201, in __init__
raise TypeError("Input must be callable")
TypeError: Input must be callable
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.safetensors', '--train_data_dir=./train/aki', '--output_dir=./output', '--logging_dir=./logs', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=2e-4', '--text_encoder_lr=3e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=3', '--network_dim=128', '--network_alpha=64', '--output_name=train', '--train_batch_size=4', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=1', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_lion_optimizer']' returned non-zero exit status 1.

Train with RuntimeError

My graphics card is an RTX 2080. When I run train.ps1 with a ckpt model, I get this error:
RuntimeError: PytorchStreamReader failed reading zip archive: failed central directory
My CUDA version is 11.7. Could this be a torch version problem? When I switched the model to safetensors, I got a different error:
KeyError:'encoder.conv_in.weight'
How can I fix this? Any help is much appreciated.
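
One thing that can be ruled out cheaply for the first error is a truncated or damaged model file, since "failed central directory" refers to the index stored at the end of a zip archive. The sketch below assumes the checkpoint was saved in PyTorch's zip-based format; a checkpoint in an older non-zip format would also be reported as unreadable by this check. The function name `checkpoint_zip_status` is hypothetical.

```python
import zipfile


def checkpoint_zip_status(path):
    """Classify a file as 'ok', 'unreadable', or 'corrupt-member:<name>'."""
    # is_zipfile looks for the end-of-central-directory record near the end
    # of the file; an incomplete download usually lacks it.
    if not zipfile.is_zipfile(path):
        return "unreadable"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip reads every member and returns the first name whose
            # CRC check fails, or None when all members are intact.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return "unreadable"
    return "ok" if bad_member is None else f"corrupt-member:{bad_member}"


if __name__ == "__main__":
    # Path taken from the commands quoted in these issues.
    print(checkpoint_zip_status("./sd-models/model.ckpt"))
```

A result other than `ok` would point at the file itself rather than at the CUDA or torch version. This check says nothing about the separate `KeyError` seen with the safetensors file.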

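install-cn.ps1 fails with a character error

After pulling the night before last, I ran the script once without any problem. This morning I pulled again, and now PowerShell reports the error below. I have not changed any PowerShell settings, and the script itself looks fine?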

./install-cn.ps1
At D:\tools\AIGC\lora-scripts\install-cn.ps1:48 char:14
+ Write-Output "瀹夎瀹屾瘯銆?
+              ~~~~~~~~~
The string is missing the terminator: ".
    + CategoryInfo          : ParserError: (:) [], ParseException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString
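
The garbled text inside the quoted `Write-Output` line looks like UTF-8 bytes being read with a legacy Chinese code page. The sketch below reproduces that kind of misreading in Python. It assumes the original string was `安装完毕。` and that the wrong codec was GBK; both are guesses from the log, not facts stated in the issue, and the helper name `misread_utf8` is hypothetical.

```python
def misread_utf8(text, wrong_codec="gbk"):
    """Encode text as UTF-8, then decode those bytes with a different codec."""
    raw = text.encode("utf-8")
    # errors="replace" keeps decoding when a byte pair is invalid in the
    # wrong codec instead of raising UnicodeDecodeError.
    return raw.decode(wrong_codec, errors="replace")


if __name__ == "__main__":
    # Assumed original line from install-cn.ps1 (a guess, see above).
    original = 'Write-Output "安装完毕。"'
    print(misread_utf8(original))
```

Different decoders handle invalid byte pairs differently, so the exact replacement characters can vary. When the bytes just before the closing quote are misread, a parser can lose the quote, which would match the "missing the terminator" message.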

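How do I force an update?

As the title says.

I have a 1050 Ti with 4 GB of VRAM. My first LoRA training run succeeded, but every run since then runs out of VRAM. Is there any way to fix this?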

train_network.py: error: unrecognized arguments: --use_lion_optimizer

I updated to the latest version today, and the error above appears on startup. My configuration is as follows:

# LoRA train script by @Akegarasu

# Train data path | model and images used for training
$pretrained_model = "./sd-models/model.ckpt" # base model path
$train_data_dir = "./train" # training dataset path
$reg_data_dir = "" # directory for regularization images | regularization images are not used by default

# Train related params
$resolution = "512,512" # image resolution w,h | non-square is supported, but both values must be multiples of 64
$batch_size = 1 # batch size
$max_train_epoches = 10 # max training epochs
$save_every_n_epochs = 1 # save every n epochs
$network_dim = 64 # network dim | 4~128 is common; bigger is not always better
$network_alpha = 64 # network alpha | commonly the same value as network_dim, or a smaller one such as half of network_dim, to prevent underflow. Default is 1; a smaller alpha needs a higher learning rate.
$clip_skip = 2 # clip skip | largely empirical; 2 is the usual choice
$train_unet_only = 1 # train U-Net only | trades quality for a large reduction in VRAM use; can be enabled with 6 GB of VRAM
$train_text_encoder_only = 0 # train text encoder only

# Learning rate
$lr = "1e-4"
$unet_lr = "1e-4"
$text_encoder_lr = "1e-5"
$lr_scheduler = "cosine_with_restarts" # "linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"
$lr_warmup_steps = 0 # warmup steps | only needed when lr_scheduler is constant_with_warmup
$lr_restart_cycles = 1 # cosine_with_restarts restart cycles | number of cosine-annealing restarts; only takes effect when lr_scheduler is cosine_with_restarts

# Output settings
$output_name = "test" # output model name
$save_model_as = "safetensors" # model save ext | ckpt, pt, safetensors

# Other settings
$network_weights = "" # pretrained weights for LoRA network | to continue training from an existing LoRA model, enter its path here
$min_bucket_reso = 256 # arb min resolution
$max_bucket_reso = 1024 # arb max resolution
$persistent_data_loader_workers = 0 # persistent dataloader workers | prone to exhausting RAM; keeps the dataset-loading workers alive to reduce the pause between epochs

# Optimizer settings
$use_8bit_adam = 0 # use 8bit adam optimizer | saves VRAM and is enabled by default. Some older 10-series cards cannot use it; set to 0 to disable.
$use_lion = 1 # use lion optimizer
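
The wording `unrecognized arguments` is what Python's `argparse` prints when the command line carries a flag that the script's parser never registered. That would be consistent with a launcher script that is newer than the `sd-scripts` checkout it calls, although the issue does not confirm the cause. The sketch below shows the mechanism with a hypothetical stand-in parser; `build_demo_parser` and `split_known_flags` are illustrative names, not part of the trainer.

```python
import argparse


def build_demo_parser():
    """Hypothetical stand-in for a trainer that predates the Lion flag."""
    parser = argparse.ArgumentParser(prog="train_network.py")
    parser.add_argument("--use_8bit_adam", action="store_true")
    parser.add_argument("--network_dim", type=int, default=4)
    return parser


def split_known_flags(parser, argv):
    """Return (parsed namespace, list of arguments the parser does not define)."""
    # parse_known_args collects unknown flags instead of exiting with an error.
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    argv = ["--network_dim=64", "--use_lion_optimizer"]
    _, unknown = split_known_flags(build_demo_parser(), argv)
    print("Flags this parser does not define:", unknown)
```

Running the real script with `--help` lists the flags its parser actually accepts, which shows whether the checked-out trainer knows about `--use_lion_optimizer` at all.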

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
Traceback (most recent call last):
File "D:\AI\lora-scripts\sd-scripts\train_network.py", line 497, in <module>
train(args)
File "D:\AI\lora-scripts\sd-scripts\train_network.py", line 150, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "D:\AI\lora-scripts\sd-scripts\library\train_util.py", line 1566, in get_optimizer
import bitsandbytes as bnb
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from .autograd._functions import (
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
import bitsandbytes.functional as F
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cextension.py", line 41, in <module>
lib = CUDALibrary_Singleton.get_instance().lib
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cextension.py", line 37, in get_instance
cls.instance.initialize()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cextension.py", line 31, in initialize
self.lib = ct.cdll.LoadLibrary(binary_path)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\ctypes\__init__.py", line 452, in LoadLibrary
return self.dlltype(name)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\ctypes\__init__.py", line 364, in __init__
if '/' in name or '\\' in name:
TypeError: argument of type 'WindowsPath' is not iterable
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/goodAsianGirlFaceV10_goodasiangirlfaceV10.safetensors', '--train_data_dir=./train/shishi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=768,1024', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=64', '--network_alpha=32', '--output_name=liushishi', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
Train finished
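
The final `TypeError` in the traceback above comes from a membership test (`'/' in name`) being applied to a path object rather than a string. The sketch below reproduces only that step; it is not the bitsandbytes code, and the function name and file path are hypothetical. `PureWindowsPath` is used so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def has_path_separator(name):
    """Mimic the check from the traceback: look for a slash or backslash in name."""
    return "/" in name or "\\" in name


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    binary_path = PureWindowsPath(r"C:\libs\libbitsandbytes_cpu.so")
    try:
        has_path_separator(binary_path)
    except TypeError as exc:
        print("Path object:", exc)
    # Converting to str first makes the membership test valid.
    print("str(path):", has_path_separator(str(binary_path)))
```

Passing `str(binary_path)` would avoid this particular `TypeError`. Whether the library that the log reports loading (`libbitsandbytes_cpu.so`) would then work on Windows is a separate question that the log does not answer.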
