xavierxiao / dreambooth-stable-diffusion
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
License: MIT License
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loggers\test_tube.py:104: LightningDeprecationWarning: The TestTubeLogger is deprecated since v1.5 and will be removed in v1.7. We recommend switching to the `pytorch_lightning.loggers.TensorBoardLogger` as an alternative.
  rank_zero_deprecation(
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg: {'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs\\SUBJECT2022-10-04T06-25-48_DSU90\\checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 1, 'every_n_train_steps': 500}}
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py:1584: UserWarning: GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.
  rank_zero_warn(
train, PersonalizedBase, 1500
reg, PersonalizedBase, 15000
validation, PersonalizedBase, 15
accumulate_grad_batches = 1
++++ NOT USING LR SCALING ++++
Setting learning rate to 1.00e-06
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:275: LightningDeprecationWarning: The `on_keyboard_interrupt` callback hook was deprecated in v1.5 and will be removed in v1.7. Please use the `on_exception` callback hook instead.
  rank_zero_deprecation(
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:284: LightningDeprecationWarning: Base `LightningModule.on_train_batch_start` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.
  rank_zero_deprecation(
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:291: LightningDeprecationWarning: Base `Callback.on_train_batch_end` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.
  rank_zero_deprecation(
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\datamodule.py:469: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
  rank_zero_deprecation(
LatentDiffusion: Also optimizing conditioner params!
Project config
model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    reg_weight: 1.0
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: caption
    image_size: 64
    channels: 4
    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: false
    embedding_reg_weight: 0.0
    unfreeze_model: true
    model_lr: 1.0e-06
    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        placeholder_strings:
        - '*'
        initializer_words:
        - sculpture
        per_image_tokens: false
        num_vectors_per_token: 1
        progressive_words: false
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_heads: 8
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: true
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 512
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
    ckpt_path: C:\Users\Urban\Desktop\textual_inversion-main\models\ldm\sd-v1-4-full-ema.ckpt
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 1
    wrap: false
    train:
      target: ldm.data.personalized.PersonalizedBase
      params:
        size: 512
        set: train
        per_image_tokens: false
        repeats: 100
        placeholder_token: dog
    reg:
      target: ldm.data.personalized.PersonalizedBase
      params:
        size: 512
        set: train
        reg: true
        per_image_tokens: false
        repeats: 10
        placeholder_token: dog
    validation:
      target: ldm.data.personalized.PersonalizedBase
      params:
        size: 512
        set: val
        per_image_tokens: false
        repeats: 10
        placeholder_token: dog
Lightning config
modelcheckpoint:
  params:
    every_n_train_steps: 500
callbacks:
  image_logger:
    target: main.ImageLogger
    params:
      batch_frequency: 200
      max_images: 8
      increase_log_steps: false
trainer:
  benchmark: true
  max_steps: 800
  gpus: 0
982 M Trainable params
83.7 M Non-trainable params
1.1 B Total params
4,264.941 Total estimated model params size (MB)
Validation sanity check: 0it [00:00, ?it/s]C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\data_loading.py:132: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]Summoning checkpoint.
Traceback (most recent call last):
File "main.py", line 838, in
trainer.fit(model, data)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
self._dispatch()
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
return self._run_train()
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1311, in _run_train
self._run_sanity_check(self.lightning_module)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1375, in _run_sanity_check
self._evaluation_loop.run()
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 217, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 236, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 219, in validation_step
return self.model.validation_step(*args, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 368, in validation_step
_, loss_dict_no_ema = self.shared_step(batch)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 908, in shared_step
loss = self(x, c)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 937, in forward
c = self.get_learned_conditioning(c)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 595, in get_learned_conditioning
c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 324, in encode
return self(text, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 319, in forward
z = self.transformer(input_ids=tokens, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 297, in transformer_forward
return self.text_model(
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 258, in text_encoder_forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids, embedding_manager=embedding_manager)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 180, in embedding_forward
inputs_embeds = self.token_embedding(input_ids)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
return F.embedding(
File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\functional.py", line 2199, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
I'd appreciate any perspective anyone can give on making this work correctly in a Windows environment where WSL is not an option.
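A likely cause, given gpus: 0 in the Lightning config and "GPU available: True, used: False" in the log above: Lightning reads an integer --gpus 0 as "use zero GPUs", so parts of the pipeline end up on different devices. The working commands elsewhere on this page pass the flag with a trailing comma (--gpus 0,), which Lightning parses as the device list [0]; re-running in that form (paths below are placeholders for your own) may resolve the mismatch:

python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume <your .ckpt> --gpus 0, --data_root <subject images> --reg_data_root <regularization images> --class_word dog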
To create a public link, set share=True in launch().
Loading weights [6f849715] from C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\models\gfx_20_training_images_2020_batch_size_person_class_word.ckpt
Traceback (most recent call last):
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 273, in run_predict
output = await app.blocks.process_api(
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 742, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 653, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\modules\ui.py", line 927, in run_settings
opts.data_labels[key].onchange()
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\webui.py", line 41, in f
res = func(*args, **kwargs)
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\webui.py", line 73, in
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(shared.sd_model)))
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\modules\sd_models.py", line 147, in reload_model_weights
load_model_weights(sd_model, checkpoint_info.filename, checkpoint_info.hash)
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\modules\sd_models.py", line 97, in load_model_weights
pl_sd = torch.load(checkpoint_file, map_location="cpu")
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 705, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "C:\Python_Scripts_stableDiffusion\super-stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 242, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
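"failed finding central directory" means the .ckpt is not a readable zip archive, which usually indicates a truncated download or a checkpoint whose save crashed partway through (compare the MemoryError-during-save reports further down this page). A minimal hedged check, using the path from the log above:

# Sanity-check a PyTorch zip-format checkpoint before loading it.
# Note: very old checkpoints use the legacy pickle format and are not zips,
# but SD-era .ckpt files saved by torch.save >= 1.6 should be.
import zipfile

path = r"models\gfx_20_training_images_2020_batch_size_person_class_word.ckpt"
if zipfile.is_zipfile(path):
    print("zip structure intact:", len(zipfile.ZipFile(path).namelist()), "entries")
else:
    print("truncated/corrupt archive: re-download or re-export the checkpoint")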
This is what I get when trying to run the training command
Traceback (most recent call last):
File "main.py", line 634, in
config.data.params.reg.params.placeholder_token = opt.class_word
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 353, in getattr
self._format_and_raise(
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/base.py", line 190, in _format_and_raise
format_and_raise(
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/_utils.py", line 821, in format_and_raise
_raise(ex, cause)
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/_utils.py", line 719, in _raise
raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full backtrace
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 351, in getattr
return self._get_impl(key=key, default_value=DEFAULT_MARKER)
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 438, in _get_impl
node = self._get_node(key=key, throw_on_missing_key=True)
File "/home/linuxinstall/anaconda3/envs/ldm/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 470, in _get_node
raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key reg
full_key: data.params.reg
object_type=dict
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 851, in
if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
Also, as a side question: I haven't been sure what exactly the difference is between the data_root images and the reg_data_root images in the training command?
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.65 GiB total capacity; 22.01 GiB already allocated; 26.44 MiB free; 22.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
What can I do? Is there a setting like this I can use?
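The error text itself points at the allocator knob: PYTORCH_CUDA_ALLOC_CONF must be set before CUDA initializes, either exported in the shell before launching main.py or set at the very top of the script. A minimal sketch (the value 128 is an arbitrary starting point to experiment with, not a recommendation):

# Set the allocator config before torch initializes CUDA, per the OOM hint.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch only after the variable is set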
main.py aborts abruptly only giving "^C" as the error
(Colab miniconda environment, have tried installing libraries manually through pip and get the same problem)
I keep getting this error that it's been Killed, lol.
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2102: LightningDeprecationWarning: Trainer.root_gpu is deprecated in v1.6 and will be removed in v1.8. Please use Trainer.strategy.root_device.index instead.
rank_zero_deprecation(
Epoch 0: 0%| | 0/1010 [00:00<?, ?it/s]/venv/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:72: UserWarning: Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
warning_cache.warn(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:229: UserWarning: You called self.log('global_step', ...) in your training_step but the value needs to be floating point. Converting it to torch.float32.
warning_cache.warn(
Epoch 0: 0%| | 1/1010 [00:02<47:26, 2.82s/it, loss=0.0249, v_num=0, train/losHere comes the checkpoint...
Killed
Hi - thanks so much for this awesome implementation!
I noticed a small issue with a missing font file, which leads to a crash after saving the first checkpoint at 500 steps. It was easy to work around, but I thought I'd mention it:
Dreambooth-Stable-Diffusion/ldm/util.py, line 25 (commit 00e984c)
Traceback (most recent call last):
File "scripts/stable_txt2img.py", line 292, in <module>
main()
File "scripts/stable_txt2img.py", line 195, in main
model = load_model_from_config(config, f"{opt.ckpt}")
File "scripts/stable_txt2img.py", line 31, in load_model_from_config
model = instantiate_from_config(config.model)
File "/Users/adam/workspace/stable-diffusion/ldm/util.py", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/Users/adam/workspace/stable-diffusion/ldm/models/diffusion/ddpm.py", line 448, in __init__
super().__init__(conditioning_key=conditioning_key, *args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'personalization_config'
Starting this "issue" to show how easy it is to overtrain an object.
200 regularization images. 13 subject photos. 2000 steps.
GROUND TRUTH
3D render of sks, unreal engine 5
3D render of sks guitar, unreal engine 5
jimi hendrix playing a guitar made of sks
jimi hendrix playing a guitar made of sks, a psychedelic godlike humanoid, hyper detailed, in the style of rutkowski and junji ito and bob ross and lisa frank, sks
sks fox in a sks black trench coat, holding sks with both hands, 3 d render, unreal engine 5, photorealistic
sks guitar in the style of carne griffiths
woman playing sks guitar, pastel in the style of bruce weber
publicity photograph of a marble statue sks guitar, in the massive atrium of a musical instrument museum
simon brehm playing an acoustic double bass sks guitar in the style of carl larsson
portrait painting of incredibly beautiful woman holding an sks guitar
painting on a wall of sks guitar in the style of banksy
Hey, So I managed to run Stable Diffusion dreambooth training in just 17.7GB GPU usage by replacing the attention with memory efficient flash attention from xformers. Along with using way less memory, it also runs 2 times faster. So it's possible to train SD in 24GB GPUs now. Tested on Nvidia A10G, took 15-20 mins to train. I hope it's helpful.
Code in my fork: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/
With some more tweaks it might be possible to train even on 16 GB gpus.
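For reference, on current diffusers releases the same xformers memory-efficient attention can be toggled with a single call; a sketch, assuming diffusers and xformers are installed (the method landed in diffusers around v0.7, so availability depends on your version):

# Sketch: enable xformers memory-efficient attention on a diffusers pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()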
Hey I am running on PC via Anaconda CLI with Administrator privileges.
I'm getting this error thrown: "PermissionError: [WinError 5] Access is denied", which cascades with more "MemoryError" errors.
How may I get it to work?
This is the command I am using:
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume models/ldm/stable-diffusion-v1/sd-v1-4-full-ema.ckpt -n spinferno_01 --gpus 0, --data_root training-images --reg_data_root regularization-images --class_word man
and this is the full exception traceback:
exception traceback.txt
My deepest thanks in advance for anyone who can take a look and get this working :)
I want to save and load the model from Google Drive or other online resources. Can you please add 2 cells for that functionality?
Thanks
I have never built a Gradio or Streamlit app, but it shouldn't be too complicated. It would give non-tech-savvy people the opportunity to use it easily, e.g. https://github.com/sd-webui/stable-diffusion-webui
Traceback (most recent call last):
File "E:\ai\sd\dbsdo\main.py", line 643, in
model = load_model_from_config(config, opt.actual_resume)
File "E:\ai\sd\dbsdo\main.py", line 30, in load_model_from_config
model = instantiate_from_config(config.model)
File "E:\ai\sd\dbsdo\ldm\util.py", line 86, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()), **kwargs)
File "E:\ai\sd\dbsdo\ldm\util.py", line 94, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
File "E:\Anaconda3\lib\importlib_init_.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1030, in _gcd_import
File "", line 1007, in _find_and_load
File "", line 986, in _find_and_load_unlocked
File "", line 680, in _load_unlocked
File "", line 850, in exec_module
File "", line 228, in _call_with_frames_removed
File "E:\ai\sd\dbsdo\ldm\models\diffusion\ddpm.py", line 26, in
from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
File "E:\ai\sd\dbsdo\ldm\models\autoencoder.py", line 6, in
from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
ImportError: cannot import name 'VectorQuantizer2' from 'taming.modules.vqvae.quantize' (E:\Anaconda3\lib\site-packages\taming\modules\vqvae\quantize.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\ai\sd\dbsdo\main.py", line 860, in
if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
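The root failure is the first traceback: the installed build of taming-transformers lacks VectorQuantizer2, and the later NameError: name 'trainer' is not defined is just fallout in main.py's cleanup path after the import died. A common workaround (an assumption about your setup, not a confirmed fix from this repo) is installing the package straight from the CompVis repository, which does ship VectorQuantizer2:

pip install git+https://github.com/CompVis/taming-transformers.git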
I've seen this mentioned in some other threads here, but figured I'd start a new issue.
This is what my trained model looked like before:
And after training someone else:
Every setting was different for the second person (class word, regularization images, etc). Yet there's certainly some sort of melding happening to the original trained token. Second token looked mostly fine (though there were some of the cfg_scale colorful artifacts).
Does anyone have any info on how to train multiple tokens at once (which seems possible, when looking deep in the code)?
Or any info on how to preserve the original trained token while adding another?
I saw that the majority of the README file talks about regularization images. I guess those are used for fine-tuning. But what is data_root used for? It seems to be from the textual inversion repo. Should I give the two the same folder path?
I noticed that there are some modified versions that require less VRAM, will these modifications affect the quality of the trained model?
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume ~/Dreambooth-Stable-Diffusion/sd-v1-4.ckpt -n project01 --gpus 0, --data_root ~/input --reg_data_root ~/Dreambooth-Stable-Diffusion/outputs/txt2img-samples/samples --class_word titus
"""
Epoch 0: 45%|███████████████████████▉ | 137/303 [02:04<02:30, 1.10it/s, loss=0.412, v_num=0, train/loss_simple_step=0.783, train/loss_vlb_step=0.0191, train/loss_step=0.783, global_step=136.0]Summoning checkpoint.
Traceback (most recent call last):
File "main.py", line 830, in
trainer.fit(model, data)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
self.fit_loop.run()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 156, in advance
batch_idx, (batch, self.batch_progress.is_last_batch) = next(self._dataloader_iter)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 203, in next
return self.fetching_function()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 270, in fetching_function
self._fetch_next_batch()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 300, in _fetch_next_batch
batch = next(self.dataloader_iter)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 550, in next
return self.request_next_batch(self.loader_iters)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 562, in request_next_batch
return apply_to_collection(loader_iters, Iterator, next)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 95, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
IsADirectoryError: Caught IsADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/Dreambooth-Stable-Diffusion/main.py", line 221, in getitem
return tuple(d[idx] for d in self.datasets)
File "/home/ubuntu/Dreambooth-Stable-Diffusion/main.py", line 221, in
return tuple(d[idx] for d in self.datasets)
File "/home/ubuntu/Dreambooth-Stable-Diffusion/ldm/data/personalized.py", line 188, in getitem
image = Image.open(self.image_paths[i % self.num_images])
File "/home/ubuntu/anaconda3/envs/ldm/lib/python3.8/site-packages/PIL/Image.py", line 2953, in open
fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/Dreambooth-Stable-Diffusion/outputs/txt2img-samples/samples/.ipynb_checkpoints'
"""
Hi, thanks for the Dreambooth implementation. It's amazing.
The issue I ran into was as follows: I trained the model with my data and wanted to generate some images, but the above error occurred when generating.
I used the code in this repo as-is and didn't change anything.
What should I do about this issue?
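The path in the IsADirectoryError is the giveaway: Jupyter dropped a .ipynb_checkpoints directory into the regularization-image folder, and the loader tried to open it as an image. Deleting it (the runpod notebook further down this page does the same for training_samples) should let the run proceed; a sketch using the path from the traceback:

# Remove the stray Jupyter checkpoint directory the image loader trips over.
import shutil
shutil.rmtree(
    "/home/ubuntu/Dreambooth-Stable-Diffusion/outputs/txt2img-samples/samples/.ipynb_checkpoints",
    ignore_errors=True,
)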
Hi, similar to https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion, would you be interested in adding example notebook for training and inference using the diffusers pipeline?
Any ideas on how to reduce the size of the trained model from 12GB to 4GB (the same as the original Stable Diffusion checkpoint)?
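Most of the extra size is optimizer and trainer state that Lightning saves alongside the weights; keeping only the state_dict gets close to the original size, and casting float weights to fp16 roughly halves it again. A hedged sketch (paths are examples, and the fp16 cast may cost a little quality):

# Prune a Lightning training checkpoint down to inference weights.
import torch

ckpt = torch.load("last.ckpt", map_location="cpu")
sd = ckpt.get("state_dict", ckpt)  # drop optimizer/trainer state if present
sd = {k: (v.half() if v.dtype == torch.float32 else v) for k, v in sd.items()}
torch.save({"state_dict": sd}, "pruned-fp16.ckpt")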
Runpod.io:
It appears that the line below is missing from "dreambooth_runpod_joepenna.ipynb". I was getting the "trainer not defined" error until I manually added it before the training step and ran it; after that it appears to be working (still training, so I'm not sure how well it will go).
%load ldm/data/personalized.py
Also, the instructions need to be updated since this file has changed; previously we had to change the code below in it to reflect our token. This piece of code is now missing.
training_templates_smallest = [
    'joepenna {}',
]
After passing 2 epochs, I am getting this error:
pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`.
Here is some more context:
Epoch 0, global step 499: val/loss_simple_ema was not in top 1
Epoch 0: 100%|█| 505/505 [09:33<00:00, 1.14s/it, loss=0.276, v_num=0, train/loss_simple_step=0.0151, train/loss_vlb_step=6.71e-5, Average Epoch time: 573.97 seconds
Average Peak memory 35456.11MiB
Epoch 1: 0%| | 0/505 [00:00<?, ?it/s, loss=0.276, v_num=0, train/loss_simple_step=0.0151, train/loss_vlb_step=6.71e-5, train/loss_Data shape for DDIM sampling is (1, 4, 64, 64), eta 1.0
Running DDIM Sampling with 200 timesteps
DDIM Sampler: 100%|███████████████████████████████████████████████████████████████████████████████| 200/200 [00:22<00:00, 8.96it/s]
Data shape for DDIM sampling is (1, 4, 64, 64), eta 1.0███████████████████████████████████████████| 200/200 [00:22<00:00, 8.96it/s]
Running DDIM Sampling with 200 timesteps
DDIM Sampler: 100%|███████████████████████████████████████████████████████████████████████████████| 200/200 [00:29<00:00, 6.78it/s]
Epoch 1: 0%| | 1/505 [00:59<8:19:19, 59.44s/it, loss=0.275, v_num=0, train/loss_simple_step=0.0144, train/loss_vlb_step=6.2e-5, tr[W accumulate_grad.h:185] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [320, 320, 1, 1], strides() = [320, 1, 1, 1]
param.sizes() = [320, 320, 1, 1], strides() = [320, 1, 320, 320] (function operator())
Epoch 1: 59%|▌| 300/505 [06:13<04:15, 1.24s/it, loss=0.245, v_num=0, train/loss_simple_step=0.0778, train/loss_vlb_step=0.000256, Average Epoch time: 373.33 seconds
Average Peak memory 35567.64MiB
Epoch 1: 60%|▌| 301/505 [06:13<04:13, 1.24s/it, loss=0.245, v_num=0, train/loss_simple_step=0.0778, train/loss_vlb_step=0.000256,
Saving latest checkpoint...
Traceback (most recent call last):
File "main.py", line 835, in <module>
trainer.test(model, data)
File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
results = self._run(model, ckpt_path=self.tested_ckpt_path)
File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1128, in _run
verify_loop_configurations(self)
File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 42, in verify_loop_configurations
__verify_eval_loop_configuration(trainer, model, "test")
File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 186, in __verify_eval_loop_configuration
raise MisconfigurationException(f"No `{loader_name}()` method defined to run `Trainer.{trainer_method}`.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`.
Almost made it to a trained model after a day trying but I've fallen at this hurdle if anyone can help.
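Note that this exception fires after training finishes, when main.py calls trainer.test without any test_dataloader defined, so the checkpoint written just before it ("Saving latest checkpoint...") should still be intact. The runpod command in the next report sidesteps the call entirely with the --no-test flag.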
Using runpod on an A5000
project_name = "Tammy"
max_training_steps = 1000
class_word = "woman" # << match this word to the class word from regularization images above
reg_data_root = "/workspace/Dreambooth-Stable-Diffusion/outputs/txt2img-samples/samples/" + dataset
!rm -rf training_samples/.ipynb_checkpoints
!python "main.py"
--base configs/stable-diffusion/v1-finetune_unfrozen.yaml
-t
--actual_resume "model.ckpt"
--reg_data_root {reg_data_root}
-n {project_name}
--gpus 0,
--data_root "/workspace/Dreambooth-Stable-Diffusion/training_samples"
--max_training_steps {max_training_steps}
--class_word {class_word}
--no-test
This runs until:
Sanity Checking: 0it [00:00, ?it/s]/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2102: LightningDeprecationWarning: `Trainer.root_gpu` is deprecated in v1.6 and will be removed in v1.8. Please use `Trainer.strategy.root_device.index` instead.
  rank_zero_deprecation(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2102: LightningDeprecationWarning: `Trainer.root_gpu` is deprecated in v1.6 and will be removed in v1.8. Please use `Trainer.strategy.root_device.index` instead.
  rank_zero_deprecation(
Epoch 0: 0%| | 0/2020 [00:00<?, ?it/s]/venv/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:72: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
  warning_cache.warn(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:229: UserWarning: You called `self.log('global_step', ...)` in your `training_step` but the value needs to be floating point. Converting it to torch.float32.
  warning_cache.warn(
Epoch 0: 25%|▏| 500/2020 [10:51<33:00, 1.30s/it, loss=0.344, v_num=0, train/lo/venv/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:378: UserWarning: `ModelCheckpoint(monitor='val/loss_simple_ema')` could not find the monitored key in the returned metrics: ['train/loss_simple', 'train/loss_simple_step', 'train/loss_vlb', 'train/loss_vlb_step', 'train/loss', 'train/loss_step', 'global_step', 'epoch', 'step']. HINT: Did you call `log('val/loss_simple_ema', value)` in the `LightningModule`?
  warning_cache.warn(m)
Epoch 0, global step 500: 'val/loss_simple_ema' was not in top 1
Killed
I tried some advice from another thread to open the terminal and kill the launcher and webui processes, which worked on my first error, but now I can't get past this one.
I could run txt2img successfully on my MacBook with an M1 chip.
However, when I ran main.py I hit this problem:
RuntimeError: Placeholder storage has not been allocated on MPS device!
I also got a message like this: [W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
There seem to be no instructions for model training on a Mac. I wonder if there is a good solution. Thank you!
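"Placeholder storage has not been allocated on MPS device" is PyTorch's way of saying the model and its inputs ended up on different devices; training code that hardcodes .cuda() needs equivalent .to(device) handling for mps. A minimal illustration of the invariant (generic PyTorch, not code from this repo):

# Model and batch must share a device on Apple silicon.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torch.nn.Linear(4, 4).to(device)
x = torch.randn(2, 4)        # still on the CPU: model(x) raises on MPS
y = model(x.to(device))      # moving the batch alongside the model works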
Hi, thanks for providing the code; it's been helpful so far. One thing I have in mind: are there any tips to make it work on an A100? I know there is already a discussion somewhere about memory usage, and it's said that the training pipeline uses 35+ GB. I tried it on an 8x A100 instance last night and it still gives out-of-memory errors. Running on a single A6000 works though, as the A6000 has slightly more GPU memory than the A100. It's just 48GB vs 40GB though...
Are there any plans for future windows compatibility?
Currently it causes a lot of issues thanks to the dependencies
It trains the first 332 steps, then when that's done it trains again to 189, and then I get "OSError: cannot open resource".
Traceback (most recent call last):
File "main.py", line 830, in
trainer.fit(model, data)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 231, in advance
self.trainer._call_callback_hooks("on_train_batch_end", batch_end_outputs, batch, batch_idx, **extra_kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1630, in _call_callback_hooks
self._on_train_batch_end(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1660, in _on_train_batch_end
callback.on_train_batch_end(self, self.lightning_module, outputs, batch, batch_idx, 0)
File "/workspace/Dreambooth-Stable-Diffusion-main/main.py", line 456, in on_train_batch_end
self.log_img(pl_module, batch, batch_idx, split="train")
File "/workspace/Dreambooth-Stable-Diffusion-main/main.py", line 424, in log_img
images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/Dreambooth-Stable-Diffusion-main/ldm/models/diffusion/ddpm.py", line 1343, in log_images
xc = log_txt_as_img((x.shape[2], x.shape[3]), batch["caption"])
File "/workspace/Dreambooth-Stable-Diffusion-main/ldm/util.py", line 25, in log_txt_as_img
font = ImageFont.truetype('data/DejaVuSans.ttf', size=size)
File "/opt/conda/lib/python3.7/site-packages/PIL/ImageFont.py", line 844, in truetype
return freetype(font)
File "/opt/conda/lib/python3.7/site-packages/PIL/ImageFont.py", line 841, in freetype
return FreeTypeFont(font, size, index, encoding, layout_engine)
File "/opt/conda/lib/python3.7/site-packages/PIL/ImageFont.py", line 194, in init
font, size, index, encoding, layout_engine=layout_engine
OSError: cannot open resource
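ldm/util.py's log_txt_as_img hardcodes data/DejaVuSans.ttf (see the missing-font report earlier on this page), and the crash partway through training is just the image logger failing to find it. Placing any copy of the font at that path avoids it; a sketch that borrows matplotlib's bundled copy (an assumption: matplotlib is installed in the env):

# Put a DejaVuSans.ttf where ldm/util.py expects it.
import os
import shutil
from matplotlib import font_manager

os.makedirs("data", exist_ok=True)
shutil.copy(font_manager.findfont("DejaVu Sans"), "data/DejaVuSans.ttf")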
I made a simple colab here https://colab.research.google.com/drive/1tugfQjRtH26QsX9lvMUdJqfCwWb5GHw4?usp=sharing
With some black magic (such as loading the state dict into 16GB of VRAM instead of 12GB of RAM, using the TensorBoard logger because new PyTorch Lightning cannot use test-tube, and so on) it can run all the code up to these lines:
# run
if opt.train:
    try:
        trainer.fit(model, data)
    except Exception:
        melk()
        raise
And I have 2 issues:
First, TypeError: on_train_batch_start() missing 1 required positional argument: 'dataloader_idx' right after trainer.fit(model, data). I'm sure all the dirs are correct and have no idea why. Also, after that, the model is stored on the CPU, not CUDA.
Second, on exception, melk() calls trainer.save_checkpoint(ckpt_path), and at that moment RAM inflates very fast and reaches 12GB, so Colab crashes.
It's just that I think running this in Colab would be so cool; not everyone has 16GB of VRAM to load the model at full precision. Textual inversion is awesome too, but this one looks a bit more interesting.
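The on_train_batch_start() missing 'dataloader_idx' TypeError is the Lightning hook-signature change that the deprecation warnings earlier on this page describe: the installed pytorch-lightning is newer than the hooks in this repo expect. The usual fix is pinning Lightning to the version in the repo's environment.yaml (1.4.2 at the time of writing, but check the file):

pip install pytorch-lightning==1.4.2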
Thanks for releasing this implementation.
GPU: Quadro P6000, 24GB VRAM
I get "CUDA out of memory" when running both scripts/stable_txt2img.py and main.py.
How much memory did you consume in your experiments? And do you have suggestions on how to reduce or de-allocate wasteful memory usage?
Traceback (most recent call last):
File "main.py", line 836, in
trainer.fit(model, data)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 205, in run
self.on_advance_end()
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 255, in on_advance_end
self._run_validation()
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 311, in _run_validation
self.val_loop.run()
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 134, in advance
self._on_evaluation_batch_end(output, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 267, in _on_evaluation_batch_end
self.trainer._call_callback_hooks(hook_name, output, *kwargs.values())
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
fn(self, self.lightning_module, *args, **kwargs)
File "/content/Dreambooth-SD-optimized/main.py", line 463, in on_validation_batch_end
self.log_img(pl_module, batch, batch_idx, split="val")
File "/content/Dreambooth-SD-optimized/main.py", line 426, in log_img
images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/content/Dreambooth-SD-optimized/ldm/models/diffusion/ddpm.py", line 1328, in log_images
batch = batch[0]
KeyError: 0
Does it work like Dreambooth with identity preservation? Or does it synthesize derivatives/mutations like textual inversion?
Did you test it on a human face, or on a complex character like a robot with particular armor, to see if it would recreate the armor exactly as in the training images?
Validation: 0it [00:00, ?it/s]
Validation: 0%| | 0/30 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/30 [00:00<?, ?it/s]pop from empty list
Summoning checkpoint.
Traceback (most recent call last):
File "main.py", line 832, in <module>
trainer.fit(model, data)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 205, in run
self.on_advance_end()
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 255, in on_advance_end
self._run_validation()
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 311, in _run_validation
self.val_loop.run()
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 134, in advance
self._on_evaluation_batch_end(output, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 267, in _on_evaluation_batch_end
self.trainer._call_callback_hooks(hook_name, output, *kwargs.values())
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
fn(self, self.lightning_module, *args, **kwargs)
File "/home/ubuntu/Dreambooth-Stable-Diffusion/main.py", line 460, in on_validation_batch_end
self.log_img(pl_module, batch, batch_idx, split="val")
File "/home/ubuntu/Dreambooth-Stable-Diffusion/main.py", line 424, in log_img
images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
File "/usr/lib/python3/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 1328, in log_images
batch = batch[0]
KeyError: 0
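Both KeyError: 0 reports above die on the same line, ddpm.py's batch = batch[0] in log_images: with a dict-style validation batch there is no integer key 0. A defensive sketch of that spot (an assumption, not the repo's official fix):

# In ldm/models/diffusion/ddpm.py, log_images (around line 1328):
# only unwrap when the loader actually delivered a list/tuple of batches.
if isinstance(batch, (list, tuple)):
    batch = batch[0]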
Hello,
Very excited to see that a Dreambooth implementation is already available for SD! Great work.
I'm trying to run the training script on a 3090, so I switched "unfreeze_model" from True to False in v1-finetune_unfrozen.yaml (I don't think this GPU has enough VRAM for the unfrozen model) - unfortunately this yields the following error:
Summoning checkpoint.
Traceback (most recent call last):
File "main.py", line 831, in <module>
trainer.fit(model, data)
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1188, in _run
self._pre_dispatch()
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1223, in _pre_dispatch
self.accelerator.pre_dispatch(self)
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 136, in pre_dispatch
self.training_type_plugin.pre_dispatch()
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 394, in pre_dispatch
self.configure_ddp()
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 371, in configure_ddp
self._model = self._setup_model(LightningDistributedModule(self.model))
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 189, in _setup_model
return DistributedDataParallel(module=model, device_ids=self.determine_ddp_device_ids(), **self._ddp_kwargs)
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\torch\nn\parallel\distributed.py", line 477, in __init__
self._log_and_throw(
File "T:\programs\anaconda3\envs\textualinversion\lib\site-packages\torch\nn\parallel\distributed.py", line 604, in _log_and_throw
raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
Any advice? Thank you!
Hello. Thanks for the good work. I managed to train a model using a Colab notebook. Now I have exported the model into the folder "stable_diffusion_weights" and downloaded it. Where do I put the folder in a local stable-diffusion installation, and how do I use it in the prompt? I use the AUTOMATIC1111 version.
Please help clear up this issue.
It seems like the entire latent space is shifted towards what you're training. And also, the longer you train, the more is affected.
However, @nikopueringer was figuring out what's missing in @XavierXiao's code from Google's implementation -- regularization on the go.
To be reductive, it's:
As a test, I trained my face on the class word "brazilian". At 9K steps, here are some unrelated prompts (euler, seed 1, cfg 15):
Some of the issues above might be ameliorated by removing "photo of" from the personalized.py file? Or with more regularization images, perhaps as many images as there are steps? Or perhaps a much, much narrower class?
Traceback (most recent call last):
File "main.py", line 792, in
ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
AttributeError: 'int' object has no attribute 'strip'
I'm using this fork: https://github.com/gammagec/Dreambooth-SD-optimized
I've seen a lot of digital art from both Midjourney and the rest of the A.I. folks, and I've seen that there's an artistic depth to Midjourney that the others simply don't have. Sure, here and there I will find one creation that appears to be outstanding, and yet they are far fewer than what one can find in Midjourney. Here's an example of the same prompt (carbon, hydrogen, oxygen, nitrogen, sculpture, abstract, octane rendering, hyper extremism). I used this prompt to represent the very basic elements giving rise to the origin of the universe, of life. As you can see, I needed an abstract image with enough depth to represent this idea. Here are a couple of examples I got from SD and Midjourney. Guess which one was made by MJ?
It was suggested to me to include the prompts: DOF, strong bokeh, sampler: dpm2, scale: 7, steps: 61, and yet I cannot find one single result that can truly compete with MJ.
I understand that MJ uses SD, but why is it so far superior?
Was MJ trained with paintings by famous artists? Is that the reason?
In the paper, the training images are all of one object. How about training with images belonging to one domain, so the trained model can do img2img tasks like a pix2pix or CycleGAN model?
I did two fine-tunings of the model:
1. Starting from sd-v1-4-full-ema.ckpt, as in the README. Training data were photos/selfies. I modified personalized.py to use a custom identifier. The class noun was "man".
2. Starting from the result of (1). Training data were photos of my partner, again with a (different) custom identifier, but this time using "woman" as the class noun.
The good news is that both models produce pretty high-quality results for the new concept. These are not always ideal for photos; I noticed e.g. glossy edges and other artifacts in a number of them, but nothing too out of the ordinary for SD. However, there is great generalization to painted styles and pretty good editability.
However, when trying to generate the second set of regularization images from the first finetuned model, I did notice some degradation of the results (I ended up generating regularization images with the original model). The pattern seems similar to the examples without prior-preservation loss from the paper (Fig. 13):
Not sure why output quality and diversity for "photo of a man" is further degraded here, given the different class noun for the second training.
Would LLM.int8 allow this to be trained on low-end cards?
https://huggingface.co/blog/hf-bitsandbytes-integration
https://github.com/TimDettmers/bitsandbytes
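bitsandbytes' 8-bit optimizers are a drop-in replacement that sharply cut optimizer-state memory, though on their own they won't shrink the activation/gradient footprint of full fine-tuning; whether that alone is enough for a low-end card is untested here. A sketch:

# Swap the training optimizer for bitsandbytes' 8-bit Adam.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(8, 8)  # stand-in for the trainable diffusion params
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-6)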
My command is
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml --train --actual_resume models/ldm/stable-diffusion-v1/sd-v1-4-full-ema.ckpt --name jacobpederson --gpus 1, --data_root training/face --reg_data_root classes/class-face-samples --class_word man
Error is
Traceback (most recent call last):
File "main.py", line 792, in
ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
AttributeError: 'int' object has no attribute 'strip'
Full text here https://pastebin.com/cPtUUe6H
Also referenced here #46
Thanks!
What should I do?
Traceback (most recent call last):
File "main.py", line 787, in <module>
ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
AttributeError: 'int' object has no attribute 'strip'
I know the paper doesn't really talk about it either, but what's the impact of using the template "a photo of" instead of just "a"? Should the template describe the training images (if the training images were drawings of dogs, should it say "drawing" instead of "photo")?
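For anyone experimenting with this: the training prompts come from template strings in personalized.py, so swapping the wording there is straightforward. A minimal sketch for drawing-style training data (the variable name is an assumption and may differ in your checkout):

```python
# ldm/data/personalized.py (sketch): the '{}' slot is filled with the
# identifier + class word at train time. Variable name is an assumption.
training_templates_smallest = [
    'a drawing of a {}',  # instead of the default 'photo of a {}'
]
```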
After watching the Textual Inversion experiments from the sidelines, I decided to jump in and try this. Thank goodness I had an A6000 on hand for the whopping 38 GB of VRAM needed, but the results are amazing. Far better than basic Textual Inversion, and it trains in a FRACTION of the time!
The following images are me as a zombie, using a basic prompt like “digital painting of yourNameHere as a zombie, undead, decomposing, concept art, by Greg Rutkowski and Wlop”.
Some of my initial findings:
Low CFG (I used 5) helps the image be more flexible in style and subject. Otherwise it kind of just looks like the iPhone selfies I trained it on.
You really have to push to bring in painter styles and subjects. I had to write “zombie, undead, decomposing” before it finally broke away from generating a normal face.
There are about a million variables I can think of to start tweaking. I would love to hear about other people’s experiences. What’s the best image set to train on? How many images? How many iterations? Can we make this run on a 3090?
I have a ton of questions and experiments to do. What do you guys think?
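Regarding the low-CFG finding above, a generation command in the style of the repo's README with the scale dropped to 5 might look like this (the checkpoint path and identifier are placeholders):

python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --scale 5.0 --ddim_steps 50 --ckpt logs/<your-run>/checkpoints/last.ckpt --prompt "digital painting of <identifier> as a zombie, undead, decomposing, concept art"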
Every time I train a model, it crashes when saving the checkpoint:
Epoch 0, global step 499: val/loss_simple_ema was not in top 1
Summoning checkpoint.
Traceback (most recent call last):
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 379, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 499, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 380, in save
return
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 259, in exit
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:300] . unexpected pos 6057217600 vs 6057217496
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 836, in
trainer.fit(model, data)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
self._dispatch()
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
return self._run_train()
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1319, in _run_train
self.fit_loop.run()
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 216, in advance
self.trainer.call_hook("on_train_batch_end", batch_end_outputs, batch, batch_idx, **extra_kwargs)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1495, in call_hook
callback_fx(*args, **kwargs)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\callback_hook.py", line 181, in on_train_batch_end
callback.on_train_batch_end(self, self.lightning_module, outputs, batch, batch_idx)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 309, in on_train_batch_end
self.save_checkpoint(trainer)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 400, in save_checkpoint
self._save_last_checkpoint(trainer, monitor_candidates)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 670, in _save_last_checkpoint
trainer.save_checkpoint(filepath, self.save_weights_only)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1913, in save_checkpoint
self.checkpoint_connector.save_checkpoint(filepath, weights_only)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py", line 472, in save_checkpoint
self.trainer.training_type_plugin.save_checkpoint(_checkpoint, filepath)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 294, in save_checkpoint
return self.checkpoint_io.save_checkpoint(checkpoint, filepath)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\plugins\io\torch_plugin.py", line 37, in save_checkpoint
atomic_save(checkpoint, path)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\utilities\cloud_io.py", line 68, in atomic_save
torch.save(checkpoint, bytesbuffer)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 381, in save
_legacy_save(obj, opened_file, pickle_module, pickle_protocol)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 225, in exit
self.file_like.flush()
ValueError: I/O operation on closed file.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 379, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 499, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 380, in save
return
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 259, in exit
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:300] . unexpected pos 9857564736 vs 9857564632
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 838, in
melk()
File "main.py", line 818, in melk
trainer.save_checkpoint(ckpt_path)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1913, in save_checkpoint
self.checkpoint_connector.save_checkpoint(filepath, weights_only)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py", line 472, in save_checkpoint
self.trainer.training_type_plugin.save_checkpoint(_checkpoint, filepath)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 294, in save_checkpoint
return self.checkpoint_io.save_checkpoint(checkpoint, filepath)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\plugins\io\torch_plugin.py", line 37, in save_checkpoint
atomic_save(checkpoint, path)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\pytorch_lightning\utilities\cloud_io.py", line 68, in atomic_save
torch.save(checkpoint, bytesbuffer)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 381, in save
_legacy_save(obj, opened_file, pickle_module, pickle_protocol)
File "H:\Anaconda\envs\dreambooth\lib\site-packages\torch\serialization.py", line 225, in exit
self.file_like.flush()
ValueError: I/O operation on closed file.
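The innermost frames show the actual cause: Lightning's atomic_save serializes the entire multi-gigabyte checkpoint into an in-memory bytesbuffer (torch.save(checkpoint, bytesbuffer)) before writing it to disk, so saving needs roughly the checkpoint's size in extra system RAM on top of what training already uses, and torch.save raises MemoryError when that allocation fails. One workaround sketch (an assumption, not an official fix) is to bypass the buffer and let torch.save write straight to disk:

```python
import torch

def save_checkpoint_to_disk(checkpoint: dict, filepath: str) -> None:
    # torch.save accepts a file path directly, skipping the io.BytesIO
    # buffer that atomic_save fills first (visible in the traceback), at
    # the cost of losing the atomic write-then-rename behavior.
    torch.save(checkpoint, filepath)
```

Freeing system RAM or enlarging the Windows pagefile may also help, since the failure is in CPU memory, not VRAM.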