akegarasu / lora-scripts
LoRA & Dreambooth training scripts & GUI use kohya-ss's trainer, for diffusion model.
License: GNU Affero General Public License v3.0
Error during installation:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.14.0 requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible.
torchaudio 0.13.0+cpu requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible.
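The two conflict lines say that the already-installed torchtext 0.14.0 and torchaudio 0.13.0+cpu both pin torch==1.13.0, while the installer put torch 1.12.1+cu116 in place. A minimal sketch for pulling those pairs out of pip's output, so it is easier to see which packages disagree; the helper name `parse_conflicts` and the regular expression are illustrative assumptions, not part of lora-scripts or pip.

```python
import re

# Matches pip resolver lines such as:
# "torchtext 0.14.0 requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible."
CONFLICT_RE = re.compile(
    r"^(?P<pkg>\S+) (?P<pkg_ver>\S+) requires (?P<req>\S+), "
    r"but you have (?P<dep>\S+) (?P<have>\S+) which is incompatible\.$"
)


def parse_conflicts(pip_output: str) -> list[dict]:
    """Return one dict per dependency-conflict line found in pip's output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


sample = """\
torchtext 0.14.0 requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible.
torchaudio 0.13.0+cpu requires torch==1.13.0, but you have torch 1.12.1+cu116 which is incompatible.
"""

for conflict in parse_conflicts(sample):
    print(f"{conflict['pkg']} {conflict['pkg_ver']} wants {conflict['req']}, "
          f"found {conflict['dep']} {conflict['have']}")
```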
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 1799.40it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (256, 256), count: 100
bucket 1: resolution (384, 384), count: 200
bucket 2: resolution (512, 512), count: 1500
mean ar error (without repeats): 0.0
prepare accelerator
Traceback (most recent call last):
File "D:\AI\kohya_ss\train_network.py", line 652, in <module>
train(args)
File "D:\AI\kohya_ss\train_network.py", line 108, in train
accelerator, unwrap_model = train_util.prepare_accelerator(args)
File "D:\AI\kohya_ss\library\train_util.py", line 1973, in prepare_accelerator
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, mixed_precision=args.mixed_precision,
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 355, in __init__
raise ValueError(err.format(mode="fp16", requirement="a GPU"))
ValueError: fp16 mixed precision requires a GPU
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\AI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI/novelai-webui-aki-v3A/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors', '--train_data_dir=D:/AI/text1/image', '--resolution=512,512', '--output_dir=D:/AI/text1/model', '--logging_dir=D:/AI/text1/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0002', '--lr_scheduler=cosine', '--lr_warmup_steps=180', '--train_batch_size=1', '--max_train_steps=1800', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=Adafactor', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
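The ValueError above means `--mixed_precision=fp16` was requested but accelerate could not find a GPU. A minimal sketch, assuming the launcher can be told which value to pass, that asks PyTorch whether a CUDA device is visible and falls back to `no` otherwise. `torch.cuda.is_available()` is a real PyTorch call; the two helper names are hypothetical.

```python
def pick_mixed_precision(cuda_available: bool, requested: str = "fp16") -> str:
    """Return the requested mode on a CUDA machine, otherwise "no"."""
    return requested if cuda_available else "no"


def cuda_is_available() -> bool:
    """Ask PyTorch whether it can see a CUDA device; False if torch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    available = cuda_is_available()
    print(f"CUDA visible to PyTorch: {available}")
    print(f"--mixed_precision={pick_mixed_precision(available)}")
```

If this prints False on a machine that does have an NVIDIA card, one possibility worth checking is that a CPU-only torch build ended up in the venv.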
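I'm using two Linux servers, each with one A10. After running train.sh, the two machines seem to be training independently. Is there some key configuration I need to change?
Both machines run accelerate's official example nlp_example.py successfully.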
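For two machines to join one training job, each one has to be launched with the same rendezvous address and its own rank. A rough sketch of the per-machine `accelerate launch` command, built as a Python list. The flags used are standard `accelerate launch` options; the IP address and port are placeholders, the function name is hypothetical, and whether train.sh forwards such flags is not something the post confirms.

```python
def build_launch_command(machine_rank: int, main_ip: str, main_port: int = 29500,
                         num_machines: int = 2, gpus_per_machine: int = 1) -> list[str]:
    """Build the accelerate launch prefix for one machine of a multi-node job."""
    return [
        "accelerate", "launch",
        "--multi_gpu",
        "--num_machines", str(num_machines),
        # num_processes is the total across all machines, not per machine
        "--num_processes", str(num_machines * gpus_per_machine),
        "--machine_rank", str(machine_rank),
        "--main_process_ip", main_ip,
        "--main_process_port", str(main_port),
        "./sd-scripts/train_network.py",
        # training arguments omitted
    ]


# 192.0.2.10 is a placeholder for the address of the rank-0 machine
for rank in (0, 1):
    print(" ".join(build_launch_command(rank, "192.0.2.10")))
```

Only `--machine_rank` differs between the two printed commands; if both machines run with rank 0 and no shared address, each would behave as its own single-node job.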
subprocess.CalledProcessError: Command '['D:\stable-diffusion-webui\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/stable-diffusion-webui/models/Stable-diffusion/protogenX34Photorealism_1.ckpt', '--train_data_dir=./train/wq', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=15', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=32', '--network_alpha=32', '--output_name=starwanqian', '--train_batch_size=2', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
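I don't have a mainland China phone number, so I can't download from the Quark netdisk link.
Running install-cn fails while installing dependencies.
Running install prints errors in red: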
cp : Cannot find path 'I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\sd-scripts\bitsandbytes_windows' because it does not exist.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\install.ps1:16 char:1
+ CategoryInfo : ObjectNotFound: (I:\stable-diffu...ndbytes_windows:String) [Copy-Item], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
cp : Cannot find path 'I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\sd-scripts\bitsandbytes_windows\cextension.py' because it does not exist.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\install.ps1:17 char:1
+ CategoryInfo : ObjectNotFound: (I:\stable-diffu...s\cextension.py:String) [Copy-Item], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
cp : Cannot find path 'I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\sd-scripts\bitsandbytes_windows\main.py' because it does not exist.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\install.ps1:18 char:1
+ CategoryInfo : ObjectNotFound: (I:\stable-diffu...windows\main.py:String) [Copy-Item], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
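The installer still reports success, but running train in this state gives:
accelerate : The term 'accelerate' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At I:\stable-diffusion-webui_23-02-17\lora-scripts-0.2.0\train.ps1:95 char:1
+ CategoryInfo : ObjectNotFound: (accelerate:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
I looked it up and also ran Set-ExecutionPolicy -ExecutionPolicy RemoteSigned; the second time it asked for confirmation.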
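Both errors above point at files that are not where the scripts expect them: the `sd-scripts\bitsandbytes_windows` folder is missing, and `accelerate` is not on the PATH. A small diagnostic sketch, assuming the repository layout shown in the error paths; the function name is hypothetical, and the guess that `sd-scripts` is empty because the code came from a release archive rather than a full clone is only an assumption.

```python
import shutil
from pathlib import Path


def check_install(repo_root: str) -> list[str]:
    """Return human-readable problems found under repo_root (empty list if none)."""
    problems = []
    sd_scripts = Path(repo_root) / "sd-scripts"
    if not sd_scripts.is_dir() or not any(sd_scripts.iterdir()):
        problems.append("sd-scripts is missing or empty")
    elif not (sd_scripts / "bitsandbytes_windows").is_dir():
        problems.append("sd-scripts/bitsandbytes_windows not found")
    # shutil.which searches PATH the same way the shell would
    if shutil.which("accelerate") is None:
        problems.append("accelerate is not on PATH (is the venv activated?)")
    return problems


if __name__ == "__main__":
    for problem in check_install("."):
        print("problem:", problem)
```

Run from the lora-scripts folder, this lists each missing piece on its own line, which helps separate the sd-scripts problem from the venv problem.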
pip install git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd
This command fails with the error below:
fatal: unable to access 'https://github.com/facebookresearch/xformers.git/': Failed to connect to github.com port 443 after 130947 ms: Timed out
Is there another way to install xformers?
ERROR: Exception:
Traceback (most recent call last):
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\cli\base_command.py", line 167, in exc_logging_wrapper
status = run_func(*args)
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\cli\req_command.py", line 247, in wrapper
return func(self, options, args)
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\commands\install.py", line 461, in run
installed = install_given_reqs(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\req\__init__.py", line 73, in install_given_reqs
requirement.install(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\req\req_install.py", line 790, in install
install_wheel(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\operations\install\wheel.py", line 727, in install_wheel
_install_wheel(
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\operations\install\wheel.py", line 587, in _install_wheel
file.save()
File "C:\Users\Tyo\Desktop\lora-scripts-main\lora-scripts-main\venv\lib\site-packages\pip\_internal\operations\install\wheel.py", line 388, in save
shutil.copyfileobj(f, dest)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 195, in copyfileobj
buf = fsrc_read(length)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 925, in read
data = self._read1(n)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1015, in _read1
self._update_crc(data)
File "C:\Users\Tyo\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 943, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'torch/lib/cusolver64_11.dll'
[notice] A new release of pip available: 22.2.1 -> 23.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
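`BadZipFile: Bad CRC-32` is raised while pip unpacks the torch wheel, and a wheel is an ordinary zip archive, so one plausible reading is that the downloaded file was damaged. A minimal sketch that checks an archive with the standard library before installing it; `ZipFile.testzip()` is a real stdlib method, the wrapper name is hypothetical.

```python
import zipfile
from typing import Optional


def first_corrupt_member(archive) -> Optional[str]:
    """Return the name of the first member that fails its CRC check, or None if all pass.

    archive may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as opened:
        return opened.testzip()


if __name__ == "__main__":
    import sys

    # Usage: python check_wheel.py path/to/downloaded.whl
    for wheel in sys.argv[1:]:
        bad = first_corrupt_member(wheel)
        print(wheel, "->", bad if bad else "all members pass")
```

If a member is reported, deleting the file and downloading it again (for example with pip's `--no-cache-dir` option, so a cached copy is not reused) is a reasonable next step.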
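While reading a training tutorial on Civitai I came across this algorithm: https://arxiv.org/abs/2303.09556
It reportedly improves training efficiency. Is the author familiar with it, and is there any plan (or need) to add it to the training scripts?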
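For context, the linked paper is "Efficient Diffusion Training via Min-SNR Weighting Strategy". Its core idea, for noise-prediction training, is to scale each timestep's loss by min(SNR, gamma) / SNR so that low-noise timesteps do not dominate. A plain-Python sketch of just that weight, with gamma = 5 as in the paper; how it would be wired into the trainer is not covered here, and the function name is illustrative.

```python
def min_snr_weight(alpha_bar_t: float, gamma: float = 5.0) -> float:
    """Loss weight min(SNR, gamma) / SNR for noise-prediction training.

    alpha_bar_t is the cumulative product of (1 - beta) up to timestep t,
    so SNR = alpha_bar_t / (1 - alpha_bar_t). Requires 0 < alpha_bar_t < 1.
    """
    snr = alpha_bar_t / (1.0 - alpha_bar_t)
    return min(snr, gamma) / snr


# Nearly clean, half-noised, and nearly pure-noise timesteps
for alpha_bar in (0.99, 0.5, 0.01):
    print(alpha_bar, round(min_snr_weight(alpha_bar), 4))
```

Timesteps whose SNR is at or below gamma keep a weight of 1; only the high-SNR (low-noise) timesteps are down-weighted.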
Looking in indexes: https://mirrors.bfsu.edu.cn/pypi/web/simple
Looking in links: https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html
Collecting torch==1.12.1+cu116
Downloading https://mirror.sjtu.edu.cn/pytorch-wheels/cu116/torch-1.12.1%2Bcu116-cp310-cp310-win_amd64.whl (2388.4 MB)
---------------------------------------- 2.4/2.4 GB 34.4 MB/s eta 0:00:00
Collecting torchvision==0.13.1+cu116
Downloading https://mirror.sjtu.edu.cn/pytorch-wheels/cu116/torchvision-0.13.1%2Bcu116-cp310-cp310-win_amd64.whl (2.6 MB)
---------------------------------------- 2.6/2.6 MB 9.6 MB/s eta 0:00:00
Collecting typing-extensions
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/31/25/5abcd82372d3d4a3932e1fa8c3dbf9efac10cc7c0d16e78467460571b404/typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting numpy
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/fa/df/53e8c0c8ccecf360b827a3d2b1b6060644c635c3149a9d6415a6fe4ccf44/numpy-1.24.2-cp310-cp310-win_amd64.whl (14.8 MB)
---------------------------------------- 14.8/14.8 MB 65.1 MB/s eta 0:00:00
Collecting requests
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/d2/f4/274d1dbe96b41cf4e0efb70cbced278ffd61b5c7bb70338b62af94ccb25b/requests-2.28.2-py3-none-any.whl (62 kB)
---------------------------------------- 62.8/62.8 kB ? eta 0:00:00
Collecting pillow!=8.3.*,>=5.3.0
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/5e/7c/293136a5171800001be33c21a51daaca68fae954b543e2c015a6bb81a716/Pillow-9.4.0-cp310-cp310-win_amd64.whl (2.5 MB)
---------------------------------------- 2.5/2.5 MB 154.1 MB/s eta 0:00:00
Collecting idna<4,>=2.5
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl (61 kB)
---------------------------------------- 61.5/61.5 kB ? eta 0:00:00
Collecting certifi>=2017.4.17
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/71/4c/3db2b8021bd6f2f0ceb0e088d6b2d49147671f25832fb17970e9b583d742/certifi-2022.12.7-py3-none-any.whl (155 kB)
---------------------------------------- 155.3/155.3 kB ? eta 0:00:00
Collecting charset-normalizer<4,>=2
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/98/f4/5ca33ee1e0b3412cbd13eae230321a9fe819acf1a99ad6482420fb97cc6b/charset_normalizer-3.0.1-cp310-cp310-win_amd64.whl (96 kB)
---------------------------------------- 96.5/96.5 kB ? eta 0:00:00
Collecting urllib3<1.27,>=1.21.1
Downloading https://mirrors.bfsu.edu.cn/pypi/web/packages/fe/ca/466766e20b767ddb9b951202542310cba37ea5f2d792dae7589f1741af58/urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
---------------------------------------- 140.6/140.6 kB ? eta 0:00:00
Installing collected packages: charset-normalizer, urllib3, typing-extensions, pillow, numpy, idna, certifi, torch, requests, torchvision
Successfully installed certifi-2022.12.7 charset-normalizer-3.0.1 idna-3.4 numpy-1.24.2 pillow-9.4.0 requests-2.28.2 torch-1.12.1+cu116 torchvision-0.13.1+cu116 typing-extensions-4.5.0 urllib3-1.26.14
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
Failed to install other dependencies.
Installation failed.
caching latents.
0%| | 0/32 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/cpp/Downloads/sd/lora-scripts/./sd-scripts/train_network.py", line 548, in <module>
train(args)
File "/Users/cpp/Downloads/sd/lora-scripts/./sd-scripts/train_network.py", line 167, in train
train_dataset.cache_latents(vae)
File "/Users/cpp/Downloads/sd/lora-scripts/sd-scripts/library/train_util.py", line 508, in cache_latents
info.latents = vae.encode(img_tensor).latent_dist.sample().squeeze(0).to("cpu")
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 114, in encode
h = self.encoder(x)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/diffusers/models/vae.py", line 101, in forward
sample = self.conv_in(sample)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
Traceback (most recent call last):
File "/Users/cpp/Downloads/sd/lora-scripts/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
simple_launcher(args)
File "/Users/cpp/Downloads/sd/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 552, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/cpp/Downloads/sd/lora-scripts/venv/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/nonomi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=64', '--network_alpha=32', '--output_name=nonomi', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
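The `"slow_conv2d_cpu" not implemented for 'Half'` error means a half-precision convolution was attempted on the CPU, which matches the `--mixed_precision=fp16` flag in the command above being used on a machine without CUDA. A sketch that rewrites such an argument list for a CPU-only run. Treating `--xformers` and `--use_8bit_adam` as CUDA-only is an assumption on my part, and the function name is hypothetical.

```python
# Assumption: these options only work with a CUDA GPU
CUDA_ONLY_FLAGS = {"--xformers", "--use_8bit_adam"}


def adapt_args_for_cpu(args: list[str]) -> list[str]:
    """Drop CUDA-only flags and switch mixed precision off; leave everything else alone."""
    adapted = []
    for arg in args:
        if arg in CUDA_ONLY_FLAGS:
            continue
        if arg.startswith("--mixed_precision="):
            arg = "--mixed_precision=no"
        adapted.append(arg)
    return adapted


original = ["--mixed_precision=fp16", "--save_precision=fp16",
            "--cache_latents", "--xformers", "--use_8bit_adam"]
print(adapt_args_for_cpu(original))
```

`--save_precision` is left untouched because it only affects how the result is written to disk. Training on the CPU this way is likely to be very slow.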
I watched Qinglong's (青龙) video and modified the install script:
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 -f https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html -i https://mirrors.bfsu.edu.cn/pypi/web/simple -U
pip install --upgrade -r requirements.txt
I'm not sure whether that is related.
Traceback (most recent call last):
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 415, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 797, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 283, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 419, in load_state_dict
if f.read(7) == "version":
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 64: illegal multibyte sequence
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\train_network.py", line 711, in <module>
train(args)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\train_network.py", line 130, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\library\train_util.py", line 2626, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\sd-scripts\library\model_util.py", line 921, in load_models_from_stable_diffusion_checkpoint
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 2301, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 431, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin' at 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "E:\soft\stable-diffusion-webui_23-01-20\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
found directory 6_ichinose contains 76 image files
456 train images with repeating.
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 1117.78it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 640), count: 456
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
Traceback (most recent call last):
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 415, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "D:\Software\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 705, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "D:\Software\lora-scripts\venv\lib\site-packages\torch\serialization.py", line 242, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 419, in load_state_dict
if f.read(7) == "version":
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 64: illegal multibyte sequence
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Software\lora-scripts\sd-scripts\train_network.py", line 507, in <module>
train(args)
File "D:\Software\lora-scripts\sd-scripts\train_network.py", line 96, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "D:\Software\lora-scripts\sd-scripts\library\train_util.py", line 1860, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "D:\Software\lora-scripts\sd-scripts\library\model_util.py", line 919, in load_models_from_stable_diffusion_checkpoint
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 2301, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "D:\Software\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 431, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin' at 'huggingface\hub\models--openai--clip-vit-large-patch14\snapshots\8d052a0f05efbaefbc9e8786ba291cfdf93e5bff\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Software\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\Software\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\Software\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\Software\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\Software\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/ichinose', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,640', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=ichinose', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--network_train_unet_only', '--use_8bit_adam']' returned non-zero exit status 1.
Train finished
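`failed finding central directory` comes from PyTorch's zip reader: the cached `pytorch_model.bin` for openai/clip-vit-large-patch14 was opened as a zip archive, but the index stored at the end of the file could not be found, which is what an interrupted download tends to look like. A stdlib-only sketch that flags such files; the helper name is hypothetical, and the check only applies to checkpoints saved in the zip-based format.

```python
import zipfile
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # first bytes of a zip archive's first local file header


def looks_truncated(path) -> bool:
    """True if the file starts like a zip archive but has no readable central directory."""
    path = Path(path)
    with path.open("rb") as handle:
        starts_like_zip = handle.read(4) == ZIP_MAGIC
    return starts_like_zip and not zipfile.is_zipfile(path)


if __name__ == "__main__":
    import sys

    # Usage: python check_ckpt.py <path to the cached pytorch_model.bin>
    for candidate in sys.argv[1:]:
        print(candidate, "-> truncated?", looks_truncated(candidate))
```

If it reports True for the cached file named in the OSError, removing that file so it is downloaded again on the next run is the usual remedy.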
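Each LoRA takes a few hours to train. I start a run before bed, and in the morning I'm rushing out the door with no time to start another round, so most of the day goes to waste, which is a pity.
The dependencies differ from those of the Windows version; please sync them.
sd-scripts has added support for LyCORIS, but from lora-scripts I can't seem to update to the new version of sd-scripts.
Using the latest code from GitHub.
Base model: chilloutmix
Error message: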
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 8.00 GiB total capacity; 7.04 GiB already allocated; 0 bytes free; 7.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
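The message itself suggests `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`. A small sketch of setting that variable for the training subprocess from Python; the value 128 is only an illustrative choice and the helper name is hypothetical.

```python
import os


def env_with_alloc_conf(max_split_size_mb: int, base_env=None) -> dict:
    """Copy the environment and set PYTORCH_CUDA_ALLOC_CONF for a child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env


env = env_with_alloc_conf(128)
print(env["PYTORCH_CUDA_ALLOC_CONF"])
# The dict can then be passed to the launcher, e.g. subprocess.run(command, env=env)
```

Note that in this report reserved memory (7.15 GiB) is barely above allocated memory (7.04 GiB), so fragmentation may not be the main issue; on an 8 GiB card, a smaller batch size or resolution may matter more.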
run
train.sh
info
(venv) taper@pc:~/App/lora-scripts$ ./train.sh
Traceback (most recent call last):
File "/home/taper/App/python/3.10.6/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/launch.py", line 747, in launch_command
defaults = load_config_from_file(args.config_file)
File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/config/config_args.py", line 64, in load_config_from_file
return config_class.from_yaml_file(yaml_file=config_file)
File "/home/taper/App/python/3.10.6/lib/python3.10/site-packages/accelerate/commands/config/config_args.py", line 117, in from_yaml_file
return cls(**config_dict)
TypeError: ClusterConfig.__init__() got an unexpected keyword argument 'command_file'
train.sh
# I only changed these lines
pretrained_model="./sd-models/v1-5-pruned-emaonly.safetensors" # base model path
train_data_dir="./train/xxx/" # train dataset path
resolution="512,640" # image resolution w,h. Non-square is supported, but both values must be multiples of 64.
max_train_epoches=20 # max train epoches
output_name="xxx" # output model name
Could the author please take a look at what is going wrong here? This accelerate error message is really hard to troubleshoot.
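The TypeError says the accelerate config file contains a key, `command_file`, that the installed `ClusterConfig` does not accept. One plausible cause (an assumption) is that the config was written by a different accelerate version than the one being run; the traceback shows accelerate loaded from /home/taper/App/python/3.10.6 rather than from a venv folder. The sketch below shows the general mechanism with a stand-in class; `OlderClusterConfig` is hypothetical and is not accelerate's real class.

```python
import inspect


def unexpected_keys(config: dict, target) -> list[str]:
    """Keys in config that target's constructor does not accept as parameters."""
    accepted = set(inspect.signature(target).parameters)
    return sorted(key for key in config if key not in accepted)


class OlderClusterConfig:  # hypothetical stand-in for an older config class
    def __init__(self, compute_environment, mixed_precision="no", num_processes=1):
        self.compute_environment = compute_environment
        self.mixed_precision = mixed_precision
        self.num_processes = num_processes


# A config dict as a newer tool might have written it
saved = {"compute_environment": "LOCAL_MACHINE", "mixed_precision": "fp16", "command_file": None}
print(unexpected_keys(saved, OlderClusterConfig))
```

Any key reported this way would trigger the same kind of TypeError when the dict is unpacked into the constructor. Regenerating the file with `accelerate config` from the same environment that runs training is one way to get the two back in step.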
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 4675.58it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 640), count: 54
bucket 1: resolution (448, 576), count: 12
bucket 2: resolution (512, 512), count: 342
bucket 3: resolution (640, 384), count: 48
mean ar error (without repeats): 0.022491709236204763
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\safetensors\torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
Traceback (most recent call last):
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\train_network.py", line 724, in <module>
train(args)
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\train_network.py", line 135, in train
text_encoder, vae, unet, _ = train_util.load_target_model(
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\library\train_util.py", line 2649, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path, device)
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\library\model_util.py", line 869, in load_models_from_stable_diffusion_checkpoint
_, state_dict = load_checkpoint_with_text_encoder_conversion(ckpt_path, device)
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\sd-scripts\library\model_util.py", line 844, in load_checkpoint_with_text_encoder_conversion
state_dict = load_file(ckpt_path) # , device) # may causes error
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\safetensors\torch.py", line 100, in load_file
result[k] = f.get_tensor(k)
RuntimeError: shape '[1280, 1280, 3, 3]' is invalid for input of size 3657939
Traceback (most recent call last):
File "C:\Users\default.DESKTOP-08UGQK7\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\default.DESKTOP-08UGQK7\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\default.DESKTOP-08UGQK7\Desktop\lora-scripts-main\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\default.DESKTOP-08UGQK7\\Desktop\\lora-scripts-main\\venv\\Scripts\\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/v1-5-pruned-emaonly.safetensors', '--train_data_dir=./train/nine-age', '--output_dir=./output', '--logging_dir=./logs', '--log_prefix=nine_ling', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=nine_ling', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0']' returned non-zero exit status 1.
Train finished
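The above is the error from running train.ps1. I also swapped the base model and rechecked the file names against the author's video, and it still doesn't work. I'm completely stumped.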
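`shape '[1280, 1280, 3, 3]' is invalid for input of size 3657939` means the loader found far fewer values than that tensor needs (1280 × 1280 × 3 × 3 = 14,745,600), which would fit an incomplete .safetensors download, though that is a guess. A sketch that compares the size promised by a safetensors header with the actual file size. It relies on the published file layout (8-byte little-endian header length, JSON header, then tensor data); the function name is hypothetical.

```python
import json
import struct


def safetensors_size_mismatch(path) -> int:
    """Return expected_size - actual_size in bytes (0 means the sizes agree)."""
    with open(path, "rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len))
        handle.seek(0, 2)  # jump to the end of the file to measure it
        actual_size = handle.tell()
    # Each tensor entry records [begin, end) offsets into the data section
    data_end = max(
        (entry["data_offsets"][1] for name, entry in header.items() if name != "__metadata__"),
        default=0,
    )
    return 8 + header_len + data_end - actual_size


if __name__ == "__main__":
    import sys

    # Usage: python check_safetensors.py ./sd-models/v1-5-pruned-emaonly.safetensors
    for model in sys.argv[1:]:
        print(model, "missing bytes:", safetensors_size_mismatch(model))
```

A positive result means the file is shorter than its own header says it should be, in which case downloading the model again is the likely fix.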
Device: win10 / 3060ti
While setting up training, I used Notepad++ to change the parameters, but when I ran the script from the shell it gave me this error (picture 1).
I quickly found that something is wrong in train.ps1. Although the parameter "--learning-rate" has a value, the program seems to ignore it, and I don't know why. So I changed the script like this:
Fortunately that worked, but then it reported another error, like this:
Is there a better way to solve these problems? I'm quite confused.
I have reinstalled several times and get this error every time. install.bash does indeed contain the install command for triton.
$ bash train.sh
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
Traceback (most recent call last):
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/./sd-scripts/train_network.py", line 548, in <module>
train(args)
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/./sd-scripts/train_network.py", line 121, in train
train_dataset = DreamBoothDataset(args.train_batch_size, args.train_data_dir, args.reg_data_dir,
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/sd-scripts/library/train_util.py", line 754, in __init__
train_dirs = os.listdir(train_data_dir)
FileNotFoundError: [Errno 2] No such file or directory: './train/aki'
Traceback (most recent call last):
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/application/xiw/project/pratice/Chilloutmix/lora-scripts/venv/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/aki', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=32', '--network_alpha=32', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
It still does not work even after disabling 8-bit Adam.
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
Traceback (most recent call last):
File "D:\AI\developer\lora\lora-scripts\sd-scripts\train_network.py", line 548, in <module>
train(args)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\train_network.py", line 156, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\library\train_util.py", line 1584, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\library\model_util.py", line 877, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File "D:\AI\developer\lora\lora-scripts\sd-scripts\library\model_util.py", line 234, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File "C:\Users\XIE\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\XIE\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\AI\developer\lora\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\AI\developer\lora\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\AI\developer\lora\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\AI\developer\lora\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\developer\lora\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/koreanDollLikeness_v15.safetensors', '--train_data_dir=./train/donghan', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=15', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=32', '--network_alpha=32', '--output_name=donghan', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=512', '--max_bucket_reso=512']' returned non-zero exit status 1.
Train finished
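One way to narrow down the `KeyError: 'time_embed.0.weight'` above is to look at which tensor names the model file actually contains. The following is only a sketch: it assumes the loader expects U-Net weights under a `model.diffusion_model.` prefix (the usual Stable Diffusion checkpoint layout, not confirmed in this thread), and the helper names `safetensors_keys` and `has_unet_weights` are made up for illustration. It reads just the JSON header of a `.safetensors` file, so no tensors are loaded.

```python
import json
import struct

# Assumed name of the first time-embedding tensor in a full SD checkpoint.
UNET_PROBE_KEY = "model.diffusion_model.time_embed.0.weight"


def safetensors_keys(path):
    """Return tensor names from a .safetensors file by reading only its header."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def has_unet_weights(keys):
    """True if the key list contains the U-Net tensor used as a probe."""
    return UNET_PROBE_KEY in set(keys)
```

Calling `has_unet_weights(safetensors_keys("./sd-models/koreanDollLikeness_v15.safetensors"))` would show whether that file carries U-Net weights at all. If it does not, the file may be something other than a full base model (for example a LoRA), which would be consistent with the error, though that is a guess.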
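How should I configure things to use a v2.1 base model? I tried manually adding --v2 in front of "--enable_bucket `" on line 101 of train.ps1. Running it then warns that v2 cannot be used together with clip_skip, and it still will not run after I comment out clip_skip.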
When I ran install-cn.ps1, it failed while installing xformers.
The install command:
pip install -U -I --no-deps https://jihulab.com/api/v4/projects/82097/packages/pypi/files/e8508fe14c8f2552a822f5e6f5620b24fdd4ba3129c2a31a39b56425bcc023bc/xformers-0.0.14.dev0+torch12-cp310-cp310-win_amd64.whl
The error message:
ERROR: xformers-0.0.14.dev0+torch12-cp310-cp310-win_amd64.whl is not a supported wheel on this platform.
I tried to manually install that version via pip, but no version matches:
pip install xformers==0.0.14.dev0
ERROR: Could not find a version that satisfies the requirement xformers==0.0.14.dev0 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.0.10, 0.0.11, 0.0.12, 0.0.13, 0.0.16rc424, 0.0.16rc425, 0.0.16, 0.0.17.dev448, 0.0.17.dev449, 0.0.17.dev451, 0.0.17.dev461, 0.0.17.dev464)
ERROR: No matching distribution found for xformers==0.0.14.dev0
I also tried some of the .whl files on this page, still with no luck.
Could anyone help?
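A wheel filename ends in three compatibility tags (Python, ABI, platform), and pip reports "not a supported wheel on this platform" when none of them match the running interpreter. The sketch below compares the tags of the wheel named in this post against the current interpreter. The helper names are hypothetical, and the comparison is deliberately rough: `pip debug --verbose` prints the authoritative list of accepted tags, and on Linux the platform tags are usually `manylinux` variants that this simple check does not model.

```python
import sys
import sysconfig


def parse_wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields are the compatibility tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def current_tags():
    """Approximate the running interpreter's CPython tag and platform tag."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


wheel = "xformers-0.0.14.dev0+torch12-cp310-cp310-win_amd64.whl"
wheel_py, wheel_abi, wheel_plat = parse_wheel_tags(wheel)
here_py, here_plat = current_tags()
print(f"wheel needs : {wheel_py} / {wheel_plat}")
print(f"interpreter : {here_py} / {here_plat}")
if (wheel_py, wheel_plat) != (here_py, here_plat):
    print("Tags differ, so pip would likely reject this wheel here.")
```

This wheel targets CPython 3.10 on 64-bit Windows, so running the check with the same python.exe that pip uses should reveal a different Python version or a 32-bit interpreter if either is the cause.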
Error:
Traceback (most recent call last):
File "/root/lora-scripts/./sd-scripts/train_network.py", line 699, in <module>
train(args)
File "/root/lora-scripts/./sd-scripts/train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
sample, res_samples = downsample_block(
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/lora-scripts/sd-scripts/library/train_util.py", line 1700, in forward_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None) # 最適なのを選んでくれる
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 975, in memory_efficient_attention
return op.apply(query, key, value, attn_bias, p, scale).reshape(output_shape)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 360, in forward
out, lse = cls.FORWARD_OPERATOR(
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/torch/_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
steps: 0%| | 0/1600 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/lora/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/lora/bin/python', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/wzgrx', '--output_dir=./output', '--logging_dir=./logs', '--resolution=1536,1920', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=128', '--network_alpha=64', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.
(lora) root@autodl-container-a25511bdfa-77fa09a8:~/lora-scripts#
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare images.
found directory train\nijika\6_nijika contains 42 image files
252 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir: "train\nijika\6_nijika"
image_count: 42
num_repeats: 6
shuffle_caption: True
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
is_reg: False
class_tokens: nijika
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 419.76it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (1024, 1024), count: 126
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
loading text encoder:
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 21/21 [00:36<00:00, 1.73s/it]
import network module: networks.lora
create LoRA network. base dim (rank): 32, alpha: 32.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
override steps. steps for 10 epochs is / 指定エポックまでのステップ数: 1260
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 252
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 126
num epochs / epoch数: 10
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1260
steps: 0%| | 0/1260 [00:00<?, ?it/s]epoch 1/10
Traceback (most recent call last):
File "K:\lora-scripts\sd-scripts\train_network.py", line 699, in <module>
train(args)
File "K:\lora-scripts\sd-scripts\train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "K:\lora-scripts\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 381, in forward
sample, res_samples = downsample_block(
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 612, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\venv\lib\site-packages\diffusers\models\attention.py", line 484, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "K:\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "K:\lora-scripts\sd-scripts\library\train_util.py", line 1700, in forward_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None) # 最適なのを選んでくれる
File "K:\lora-scripts\venv\lib\site-packages\xformers\ops.py", line 865, in memory_efficient_attention
return op.apply(query, key, value, attn_bias, p).reshape(output_shape)
File "K:\lora-scripts\venv\lib\site-packages\xformers\ops.py", line 319, in forward
out, lse = cls.FORWARD_OPERATOR(
File "K:\lora-scripts\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
steps: 0%| | 0/1260 [01:11<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Administrator.WIN-7JDG5CD1HHH\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator.WIN-7JDG5CD1HHH\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "K:\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "K:\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "K:\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "K:\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['K:\lora-scripts\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/nijika', '--output_dir=./output', '--logging_dir=./logs', '--resolution=1024,1024', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=ckpt', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption']' returned non-zero exit status 1.
Train finished
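"CUDA error: no kernel image is available for execution on the device" generally means that a compiled CUDA extension (both tracebacks above end inside xformers) contains no binary for the GPU's compute capability. A minimal sketch for collecting the relevant facts follows. The function name `cuda_report` is made up for illustration, and the output only describes the torch side, so it can point at a mismatch but cannot prove which architectures the xformers build supports.

```python
def cuda_report():
    """Collect the facts that matter for a 'no kernel image' error."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = f"{major}.{minor}"
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

Posting this output together with the xformers version makes it easier for others to tell whether the installed xformers build is likely to cover the card. Removing `--xformers` from the command is a possible workaround to test, at the cost of higher memory use.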
Traceback (most recent call last):
File "/home/ipfs10/lora/lora-scripts/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/ipfs10/lora/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ipfs10/lora/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/home/ipfs10/lora/lora-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ipfs10/lora/lora-scripts/venv/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=/home/ipfs10/stable-diffusion-webui/models/Stable-diffusion/yy/yy_900.safetensors', '--train_data_dir=./train/sucai', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,640', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=64', '--network_alpha=32', '--output_name=aki', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.
How to use local environment instead of venv?
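Using the venv means installing PyTorch a second time, and my computer does not have much disk space to spare. How can I use my local environment in its place?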
(lora) root@autodl-container-a1d611953c-8035be35:~/lora-scripts# bash train.sh
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
ignore directory without repeats / 繰り返し回数のないディレクトリを無視します: ./train/laohuang/.ipynb_checkpoints
0 train images with repeating.
loading image sizes.
0it [00:00, ?it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (256, 832), count: 0
bucket 1: resolution (256, 896), count: 0
bucket 2: resolution (256, 960), count: 0
bucket 3: resolution (256, 1024), count: 0
bucket 4: resolution (320, 704), count: 0
bucket 5: resolution (320, 768), count: 0
bucket 6: resolution (384, 640), count: 0
bucket 7: resolution (448, 576), count: 0
bucket 8: resolution (512, 512), count: 0
bucket 9: resolution (576, 448), count: 0
bucket 10: resolution (640, 384), count: 0
bucket 11: resolution (704, 320), count: 0
bucket 12: resolution (768, 320), count: 0
bucket 13: resolution (832, 256), count: 0
bucket 14: resolution (896, 256), count: 0
bucket 15: resolution (960, 256), count: 0
bucket 16: resolution (1024, 256), count: 0
/root/miniconda3/envs/lora/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/root/miniconda3/envs/lora/lib/python3.10/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
mean ar error (without repeats): nan
No data found. Please verify arguments / 画像がありません。引数指定を確認してください
As the title says, how should I fix this?
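The log above says "ignore directory without repeats" and then finds 0 training images, while another log on this page shows a folder named `6_nijika` being read with `num_repeats: 6`. That suggests the trainer looks for subfolders named `<repeats>_<name>` inside `--train_data_dir`. The sketch below lists each subfolder with the repeat count parsed from its name and the number of image files in it. The function name, the extension list, and the parsing rule are assumptions for illustration, not the trainer's actual code.

```python
import os

# Assumed set of image extensions; the trainer's own list may differ.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def scan_train_dir(train_data_dir):
    """Return (folder, repeats or None, image count) for each subfolder."""
    rows = []
    for name in sorted(os.listdir(train_data_dir)):
        path = os.path.join(train_data_dir, name)
        if not os.path.isdir(path):
            continue
        prefix = name.split("_", 1)[0]
        # A usable folder looks like "6_nijika": digits, underscore, name.
        repeats = int(prefix) if "_" in name and prefix.isdecimal() else None
        images = [f for f in os.listdir(path)
                  if os.path.splitext(f)[1].lower() in IMAGE_EXTS]
        rows.append((name, repeats, len(images)))
    return rows


example_dir = "./train/laohuang"  # path taken from the log above
if os.path.isdir(example_dir):
    for name, repeats, count in scan_train_dir(example_dir):
        status = "ok" if repeats and count else "ignored or empty"
        print(f"{name}: repeats={repeats}, images={count} ({status})")
```

If every row shows `repeats=None`, the images are probably sitting directly in the training folder and would need to move into a subfolder such as `./train/laohuang/6_laohuang` (a hypothetical name).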
Traceback (most recent call last):
File "F:\lora-scripts-0.2.0\sd-scripts\train_network.py", line 642, in <module>
train(args)
File "F:\lora-scripts-0.2.0\sd-scripts\train_network.py", line 114, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "F:\lora-scripts-0.2.0\sd-scripts\library\train_util.py", line 2008, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "F:\lora-scripts-0.2.0\sd-scripts\library\model_util.py", line 877, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File "F:\lora-scripts-0.2.0\sd-scripts\library\model_util.py", line 234, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File "C:\Users\keina\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\keina\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\lora-scripts-0.2.0\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "F:\lora-scripts-0.2.0\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "F:\lora-scripts-0.2.0\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "F:\lora-scripts-0.2.0\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\lora-scripts-0.2.0\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/before.safetensors', '--train_data_dir=./train/', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,768', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=64', '--network_alpha=32', '--output_name=after', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--reg_data_dir=./train/reg', '--use_8bit_adam', '--use_lion_optimizer']' returned non-zero exit status 1.
Train finished
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 78
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 26
num epochs / epoch数: 20
batch size per device / バッチサイズ: 3
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 520
steps: 0%| | 0/520 [00:00<?, ?it/s]epoch 1/20
Traceback (most recent call last):
File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 699, in <module>
train(args)
File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
sample, res_samples = downsample_block(
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stable/lora-scripts/sd-scripts/library/train_util.py", line 1700, in forward_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None) # 最適なのを選んでくれる
File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 975, in memory_efficient_attention
return op.apply(query, key, value, attn_bias, p, scale).reshape(output_shape)
File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 360, in forward
out, lse = cls.FORWARD_OPERATOR(
File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/_ops.py", line 442, in __call__
return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:140 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:488 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:291 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:482 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:743 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:189 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:484 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
steps: 0%| | 0/520 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/stable/anaconda3/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stable/anaconda3/bin/python', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/yazi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,704', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=ba_yazi_V10', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.
Could someone help me figure out what is going on here?
Training works fine with 8bit-adam, but fails with an error when I use Lion. How can I fix this? The full error is below:
Traceback (most recent call last):
File "D:\mys\tools\Lora\sd-scripts\library\train_util.py", line 1593, in get_optimizer
import lion_pytorch
ModuleNotFoundError: No module named 'lion_pytorch'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\mys\tools\Lora\sd-scripts\train_network.py", line 507, in <module>
train(args)
File "D:\mys\tools\Lora\sd-scripts\train_network.py", line 150, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "D:\mys\tools\Lora\sd-scripts\library\train_util.py", line 1595, in get_optimizer
raise ImportError("No lion_pytorch / lion_pytorch がインストールされていないようです")
ImportError: No lion_pytorch / lion_pytorch がインストールされていないようです
Traceback (most recent call last):
File "C:\Users\iamgb\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\iamgb\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\mys\tools\Lora\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\mys\tools\Lora\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\mys\tools\Lora\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\mys\tools\Lora\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\mys\\tools\\Lora\\venv\\Scripts\\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=64', '--network_alpha=64', '--output_name=ZhangJiani', '--train_batch_size=1', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--network_train_unet_only', '--use_lion_optimizer']' returned non-zero exit status 1.
Train finished
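The first traceback above shows `import lion_pytorch` failing inside `get_optimizer`, so the package appears to be missing from the environment that `accelerate` launches. A minimal sketch for confirming that, assuming it is run with the same interpreter as in the log (`venv\Scripts\python.exe`); `missing_modules` is a hypothetical helper name, not part of sd-scripts:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "lion_pytorch" is the module named in the ImportError above.
    missing = missing_modules(["lion_pytorch"])
    if missing:
        print(f"Not importable from {sys.executable}: {', '.join(missing)}")
    else:
        print("lion_pytorch is importable in this environment.")
```

If the module is reported as missing, installing the `lion-pytorch` package into that same venv is the likely fix.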
I have two 2080 Ti cards, but only one of them is being used. How do I configure training to use both at the same time?
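One possible direction, sketched under assumptions: the logs in this thread show training being started through `accelerate launch`, and that command accepts `--multi_gpu` and `--num_processes`. Whether the repo's `train.ps1` exposes these options is not confirmed here, and `build_launch_command` is a hypothetical helper used only to show the shape of the command line:

```python
def build_launch_command(script, num_gpus, script_args=()):
    """Build an `accelerate launch` argv list for one or several GPUs."""
    cmd = ["accelerate", "launch"]
    if num_gpus > 1:
        # --multi_gpu turns on distributed training; use one process per GPU.
        cmd += ["--multi_gpu", f"--num_processes={num_gpus}"]
    cmd.append(script)
    cmd.extend(script_args)
    return cmd


if __name__ == "__main__":
    # Print the command that would be run for two GPUs.
    print(" ".join(build_launch_command("./sd-scripts/train_network.py", 2, ["--xformers"])))
```

The remaining training arguments would be passed unchanged after the script path.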
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
When training I get the following error. Has anyone else run into the same problem?
steps: 0%| | 0/20880 [00:00<?, ?it/s]epoch 1/20
Traceback (most recent call last):
File "/data/sd/sd/stable-diffusion-webui/lora-scripts/./sd-scripts/train_network.py", line 642, in <module>
train(args)
File "/data/sd/sd/stable-diffusion-webui/lora-scripts/./sd-scripts/train_network.py", line 503, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 489, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 582, in forward
sample, res_samples = downsample_block(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 837, in forward
hidden_states = attn(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
hidden_states = block(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 291, in forward
attn_output = self.attn1(
File "/data/sd/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
TypeError: replace_unet_cross_attn_to_xformers.<locals>.forward_xformers() got an unexpected keyword argument 'encoder_hidden_states'
Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
/content/lora-scripts
Traceback (most recent call last):
File "./sd-scripts/train_network.py", line 20, in <module>
import library.config_util as config_util
File "/content/lora-scripts/sd-scripts/library/config_util.py", line 113, in <module>
class ConfigSanitizer:
File "/content/lora-scripts/sd-scripts/library/config_util.py", line 116, in ConfigSanitizer
def __validate_and_convert_twodim(klass, value: Sequence) -> Tuple:
File "/usr/local/lib/python3.8/dist-packages/toolz/functoolz.py", line 201, in __init__
raise TypeError("Input must be callable")
TypeError: Input must be callable
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.safetensors', '--train_data_dir=./train/aki', '--output_dir=./output', '--logging_dir=./logs', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=2e-4', '--text_encoder_lr=3e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=3', '--network_dim=128', '--network_alpha=64', '--output_name=train', '--train_batch_size=4', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=1', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_lion_optimizer']' returned non-zero exit status 1.
My graphics card is an RTX 2080. When I run train.ps1 with a ckpt model I get this error:
RuntimeError: PytorchStreamReader failed reading zip archive: failed central directory
My CUDA version is 11.7. Is this a problem with the torch version? When I switched the model to safetensors, I got a different error:
KeyError:'encoder.conv_in.weight'
How can I fix this? Any help is much appreciated.
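A "failed central directory" error often points to a truncated or corrupted download rather than to the torch or CUDA version, since checkpoints written by recent `torch.save` are zip archives. A minimal sketch for checking that, assuming the `.ckpt` uses the zip-based format (a legacy non-zip checkpoint would also return `False` here); `looks_like_intact_zip_checkpoint` is a hypothetical helper name:

```python
import zipfile


def looks_like_intact_zip_checkpoint(path):
    """Return True if `path` opens as a zip archive and every member passes its CRC check."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    # Path taken from the training commands quoted in this thread.
    print(looks_like_intact_zip_checkpoint("./sd-models/model.ckpt"))
```

If this prints `False`, re-downloading the model and comparing its size or hash with the published one is a reasonable next step. Note that `testzip()` reads the whole file, so it can take a while on multi-gigabyte models.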
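After pulling the night before last I ran it once without any problems. This morning I pulled again on a whim, and now PowerShell reports the error below. I haven't touched my PowerShell settings, and the script itself looks fine?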
./install-cn.ps1
At D:\tools\AIGC\lora-scripts\install-cn.ps1:48 char:14
+ Write-Output "瀹夎瀹屾瘯銆?
+ ~~~~~~~~~
The string is missing the terminator: ".
+ CategoryInfo : ParserError: (:) [], ParseException
+ FullyQualifiedErrorId : TerminatorExpectedAtEndOfString
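The garbled `瀹夎瀹屾瘯銆?` in the error is consistent with a UTF-8 script (without a BOM) being read as GBK by a Chinese-locale Windows PowerShell. A small sketch reproducing that effect, assuming the original string on line 48 was "安装完毕。" ("Installation complete."), which is a guess from the mojibake rather than something confirmed in the thread; `misread_utf8_as_gbk` is a hypothetical helper name:

```python
def misread_utf8_as_gbk(text):
    """Encode `text` as UTF-8, then decode those bytes as GBK the way a wrong code page would."""
    return text.encode("utf-8").decode("gbk", errors="replace")


if __name__ == "__main__":
    # Assumed original text of the Write-Output string (not confirmed by the source).
    print(misread_utf8_as_gbk("安装完毕。"))
```

The result starts with the same `瀹夎` seen in the log. Because the final multi-byte character no longer lines up with GBK's two-byte units, the closing quote can get swallowed, which would explain the missing-terminator error. Saving the script as UTF-8 with BOM is one common way around this.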
As in the title.
I updated to the latest version today and got the error above on startup. My configuration is as follows:
# LoRA train script by @Akegarasu
# Train data path | model and images used for training
$pretrained_model = "./sd-models/model.ckpt" # base model path
$train_data_dir = "./train" # train dataset path
$reg_data_dir = "" # directory for regularization images; regularization images are not used by default.
# Train related params
$resolution = "512,512" # image resolution w,h. Non-square is supported, but both values must be multiples of 64.
$batch_size = 1 # batch size
$max_train_epoches = 10 # max train epochs
$save_every_n_epochs = 1 # save every n epochs
$network_dim = 64 # network dim | commonly 4~128; bigger is not always better
$network_alpha = 64 # network alpha | commonly the same value as network_dim, or a smaller value such as half of network_dim, to prevent underflow. Default is 1; a smaller alpha requires a higher learning rate.
$clip_skip = 2 # clip skip | no firm rule; 2 is the usual choice
$train_unet_only = 1 # train U-Net only | greatly reduces VRAM usage at some cost in quality. Can be enabled with 6 GB of VRAM
$train_text_encoder_only = 0 # train Text Encoder only
# Learning rate
$lr = "1e-4"
$unet_lr = "1e-4"
$text_encoder_lr = "1e-5"
$lr_scheduler = "cosine_with_restarts" # "linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"
$lr_warmup_steps = 0 # warmup steps | only needs to be set when lr_scheduler is constant_with_warmup
$lr_restart_cycles = 1 # cosine_with_restarts restart cycles | only takes effect when lr_scheduler is cosine_with_restarts.
# Output settings
$output_name = "test" # output model name
$save_model_as = "safetensors" # model save ext | ckpt, pt, safetensors
# Other settings
$network_weights = "" # pretrained weights for LoRA network | to continue training from an existing LoRA model, enter its path here.
$min_bucket_reso = 256 # arb min resolution
$max_bucket_reso = 1024 # arb max resolution
$persistent_data_loader_workers = 0 # persistent dataloader workers | keeps the dataset-loading workers alive to reduce the pause between epochs; can easily exhaust RAM
# Optimizer settings
$use_8bit_adam = 0 # use 8bit adam optimizer | saves VRAM, enabled by default. Some older 10-series cards cannot use it; set to 0 to disable.
$use_lion = 1 # use lion optimizer
···
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
Traceback (most recent call last):
File "D:\AI\lora-scripts\sd-scripts\train_network.py", line 497, in <module>
train(args)
File "D:\AI\lora-scripts\sd-scripts\train_network.py", line 150, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "D:\AI\lora-scripts\sd-scripts\library\train_util.py", line 1566, in get_optimizer
import bitsandbytes as bnb
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from .autograd._functions import (
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
import bitsandbytes.functional as F
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cextension.py", line 41, in <module>
lib = CUDALibrary_Singleton.get_instance().lib
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cextension.py", line 37, in get_instance
cls.instance.initialize()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\bitsandbytes\cextension.py", line 31, in initialize
self.lib = ct.cdll.LoadLibrary(binary_path)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\ctypes\__init__.py", line 452, in LoadLibrary
return self.dlltype(name)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\ctypes\__init__.py", line 364, in __init__
if '/' in name or '\\' in name:
TypeError: argument of type 'WindowsPath' is not iterable
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Administrator\AppData\Local\Programs\Python\Python3108\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/goodAsianGirlFaceV10_goodasiangirlfaceV10.safetensors', '--train_data_dir=./train/shishi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=768,1024', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--network_dim=64', '--network_alpha=32', '--output_name=liushishi', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
Train finished
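The final `TypeError` in the first traceback comes from `ctypes` running a string membership test on the library name while bitsandbytes hands it a `WindowsPath`. A minimal sketch of that failure mode and the usual workaround of converting to `str` first; `contains_separator` and the example path are hypothetical stand-ins, not bitsandbytes code:

```python
from pathlib import PureWindowsPath


def contains_separator(name):
    """Mimic the separator check that ctypes performs on the library name."""
    return "/" in name or "\\" in name


# Hypothetical path, used only for illustration.
binary_path = PureWindowsPath(r"C:\site-packages\bitsandbytes\libbitsandbytes_cpu.so")

try:
    contains_separator(binary_path)
    failed = False
except TypeError:
    # Path objects do not support the `in` operator, hence "not iterable".
    failed = True

# Converting to str first makes the same check work.
ok = contains_separator(str(binary_path))
print(failed, ok)
```

Note that the log also shows a CPU-only `.so` binary being loaded on Windows, so a bitsandbytes build that actually supports Windows and CUDA may still be needed; setting `$use_8bit_adam = 0` avoids the import altogether.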