adobe-research / custom-diffusion

Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)

Home Page: https://www.cs.cmu.edu/~custom-diffusion

License: Other

Python 99.06% Shell 0.94%
customization fine-tuning text-to-image-generation computer-vision diffusion-models few-shot pytorch

custom-diffusion's Introduction

Custom Diffusion

[NEW!] Custom Diffusion is now supported in diffusers. Please refer here for training and inference details.

[NEW!] CustomConcept101 dataset. We release a new dataset of 101 concepts along with their evaluation prompts. For more details please refer here.

[NEW!] Custom Diffusion now supports SDXL. The diffusers code has been updated to work with diffusers==0.21.4.


Custom Diffusion allows you to fine-tune text-to-image diffusion models, such as Stable Diffusion, given a few images of a new concept (~4-20). Our method is fast (~6 minutes on 2 A100 GPUs) as it fine-tunes only a subset of model parameters, namely key and value projection matrices, in the cross-attention layers. This also reduces the extra storage for each additional concept to 75MB.

Our method further allows you to use a combination of multiple concepts such as new object + new artistic style, multiple new objects, and new object + new category. See multi-concept results for more visual results.

Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
In CVPR 2023

Results

All our results are based on fine-tuning the stable-diffusion-v1-4 model. We show results on various categories of images, including scene, pet, personal toy, and style, with a varying number of training samples. For more generations and comparisons with concurrent methods, please refer to our webpage and gallery.

Single-Concept Results

Multi-Concept Results

Method Details

Given a few user-provided images of a concept, our method augments a pre-trained text-to-image diffusion model, enabling new generations of the concept in unseen contexts. We fine-tune a small subset of model weights, namely the key and value mappings from text to latent features in the cross-attention layers of the diffusion model. Our method also uses a small set of regularization images (200) to prevent overfitting. For personal categories, we add a new modifier token V* in front of the category name, e.g., V* dog. For multiple concepts, we jointly train on the datasets for the two concepts. Our method also enables merging two fine-tuned models via optimization. For more details, please refer to our paper.
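
For intuition, the snippet below is a minimal sketch (not this repo's training code) of selecting only those key/value projections, assuming the layer naming of diffusers' UNet2DConditionModel (cross-attention blocks are named attn2, with to_k/to_v projections):

from diffusers import UNet2DConditionModel

# Sketch: freeze everything except the cross-attention key/value projections.
# Layer names ("attn2", "to_k", "to_v") follow diffusers' UNet2DConditionModel.
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")
for name, param in unet.named_parameters():
    if "attn2.to_k" in name or "attn2.to_v" in name:
        param.requires_grad_(True)
    else:
        param.requires_grad_(False)

n_train = sum(p.numel() for p in unet.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in unet.parameters())
print(f"training {n_train/1e6:.1f}M of {n_total/1e6:.1f}M UNet parameters")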

Getting Started

git clone https://github.com/adobe-research/custom-diffusion.git
cd custom-diffusion
git clone https://github.com/CompVis/stable-diffusion.git
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm
pip install clip-retrieval tqdm

Our code was developed against commit 21f890f9da3cfbeaba8e2ac3c425ee9e998d5229 of stable-diffusion.

Download the stable-diffusion model checkpoint:
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
For more details, please refer here.

Dataset: we release some of the datasets used in the paper here. Images taken from Unsplash are under the Unsplash license. The moongate dataset can be downloaded from here.

Models: all our models can be downloaded from here.

Single-Concept Fine-tuning

Real images as regularization

## download dataset
wget https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip
unzip data.zip

## run training (30 GB on 2 GPUs)
bash scripts/finetune_real.sh "cat" data/cat real_reg/samples_cat  cat finetune_addtoken.yaml <pretrained-model-path>

## save updated model weights
python src/get_deltas.py --path logs/<folder-name> --newtoken 1

## sample
python sample.py --prompt "<new1> cat playing with a ball" --delta_ckpt logs/<folder-name>/checkpoints/delta_epoch\=000004.ckpt --ckpt <pretrained-model-path>

Here, <pretrained-model-path> is the path to the pretrained sd-v1-4.ckpt model. Our results in the paper are not based on using clip-retrieval to gather real images as the regularization samples, but doing so leads to similar results.
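
If you want to retrieve real regularization images yourself with clip-retrieval (roughly what scripts/finetune_real.sh does through src/retrieve.py), the sketch below shows the client API; the backend URL, index name, and result fields are assumptions that may change on the server side:

from clip_retrieval.clip_client import ClipClient

# Query a public LAION index for images whose captions match the concept's category name.
# The URL/index shown here are illustrative defaults, not guaranteed to stay available.
client = ClipClient(url="https://knn.laion.ai/knn-service", indice_name="laion5B-L-14", num_images=200)
results = client.query(text="photo of a cat")
for r in results[:5]:
    print(r["url"], "|", r["caption"])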

Generated images as regularization

bash scripts/finetune_gen.sh "cat" data/cat gen_reg/samples_cat  cat finetune_addtoken.yaml <pretrained-model-path>

Multi-Concept Fine-tuning

Joint training

## run training (30 GB on 2 GPUs)
bash scripts/finetune_joint.sh "wooden pot" data/wooden_pot real_reg/samples_wooden_pot \
                                    "cat" data/cat real_reg/samples_cat  \
                                    wooden_pot+cat finetune_joint.yaml <pretrained-model-path>

## save updated model weights
python src/get_deltas.py --path logs/<folder-name> --newtoken 2

## sample
python sample.py --prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" --delta_ckpt logs/<folder-name>/checkpoints/delta_epoch\=000004.ckpt --ckpt <pretrained-model-path>

Optimization-based weights merging

Given two fine-tuned model weights delta_ckpt1 and delta_ckpt2 for any two categories, the weights can be merged to create a single model as shown below.

python src/composenW.py --paths <delta_ckpt1>+<delta_ckpt2> --categories  "wooden pot+cat"  --ckpt <pretrained-model-path> 

## sample
python sample.py --prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" --delta_ckpt optimized_logs/<folder-name>/checkpoints/delta_epoch\=000000.ckpt --ckpt <pretrained-model-path>

Training using Diffusers library

[NEW!] Custom Diffusion is also supported in diffusers now. Please refer here for training and inference details.

## install requirements 
pip install "accelerate>=0.24.1"
pip install modelcards
pip install "transformers>=4.31.0"
pip install deepspeed
pip install diffusers==0.21.4
accelerate config
export MODEL_NAME="CompVis/stable-diffusion-v1-4"

Single-Concept fine-tuning

## launch training script (2 GPUs recommended, increase --max_train_steps to 500 if 1 GPU)

accelerate launch src/diffusers_training.py \
          --pretrained_model_name_or_path=$MODEL_NAME  \
          --instance_data_dir=./data/cat  \
          --class_data_dir=./real_reg/samples_cat/ \
          --output_dir=./logs/cat  \
          --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
          --instance_prompt="photo of a <new1> cat"  \
          --class_prompt="cat" \
          --resolution=512  \
          --train_batch_size=2  \
          --learning_rate=1e-5  \
          --lr_warmup_steps=0 \
          --max_train_steps=250 \
          --num_class_images=200 \
          --scale_lr --hflip  \
          --modifier_token "<new1>"

## sample 
python src/diffusers_sample.py --delta_ckpt logs/cat/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt "<new1> cat playing with a ball"

You can also pass --enable_xformers_memory_efficient_attention and enable fp16 during accelerate config for faster training with a lower VRAM requirement. To train with SDXL, use diffusers_training_sdxl.py with MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0".
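
If you instead train with the diffusers-native Custom Diffusion integration mentioned at the top of this section (rather than this repo's delta.bin format), inference looks roughly like the sketch below. This is based on the diffusers documentation; the weight file names are the ones the diffusers example script saves and may differ across versions:

import torch
from diffusers import DiffusionPipeline

# Load the base model, then the saved Custom Diffusion attention processors and the
# learned <new1> token embedding ("path-to-saved-model" is a placeholder output directory).
pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs("path-to-saved-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-saved-model", weight_name="<new1>.bin")

image = pipe("<new1> cat playing with a ball", num_inference_steps=100, guidance_scale=6.0, eta=1.0).images[0]
image.save("new1_cat.png")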

Multi-Concept fine-tuning

Provide a json file with the info about each concept, similar to this.
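
For reference, the entries mirror the flags of the single-concept command above. The snippet below writes a hypothetical concepts file; the exact keys expected by src/diffusers_training.py may differ, so treat it as a sketch:

import json

# Hypothetical concepts list with one entry per concept, mirroring the single-concept flags.
concepts = [
    {"instance_prompt": "photo of a <new1> cat", "class_prompt": "cat",
     "instance_data_dir": "./data/cat", "class_data_dir": "./real_reg/samples_cat/"},
    {"instance_prompt": "photo of a <new2> wooden pot", "class_prompt": "wooden pot",
     "instance_data_dir": "./data/wooden_pot", "class_data_dir": "./real_reg/samples_wooden_pot/"},
]
with open("assets/concept_list.json", "w") as f:
    json.dump(concepts, f, indent=2)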

## launch training script (2 GPUs recommended, increase --max_train_steps to 1000 if 1 GPU)

accelerate launch src/diffusers_training.py \
          --pretrained_model_name_or_path=$MODEL_NAME  \
          --output_dir=./logs/cat_wooden_pot  \
          --concepts_list=./assets/concept_list.json \
          --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
          --resolution=512  \
          --train_batch_size=2  \
          --learning_rate=1e-5  \
          --lr_warmup_steps=0 \
          --max_train_steps=500 \
          --num_class_images=200 \
          --scale_lr --hflip  \
          --modifier_token "<new1>+<new2>" 

## sample 
python src/diffusers_sample.py --delta_ckpt logs/cat_wooden_pot/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt "<new1> cat sitting inside a <new2> wooden pot and looking up"

Optimization-based weights merging for Multi-Concept

Given two fine-tuned model weights delta1.bin and delta2.bin for any two categories, the weights can be merged to create a single model as shown below.

python src/diffusers_composenW.py --paths <delta1.bin>+<delta2.bin> --categories  "wooden pot+cat"  --ckpt "CompVis/stable-diffusion-v1-4"

## sample
python src/diffusers_sample.py --delta_ckpt optimized_logs/<folder-name>/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt "<new1> cat sitting inside a <new2> wooden pot and looking up"

The diffusers training code is modified from the DreamBooth and Textual Inversion training scripts. For more details on how to set up accelerate, please refer here.

Fine-tuning on human faces

For fine-tuning on human faces, we recommend learning_rate=5e-6 and max_train_steps=750 in the diffusers training script above, or using the finetune_face.yaml config with the stable-diffusion training script.

We observe better results with a lower learning rate, longer training, and more images for human faces compared to the other categories shown in our paper. With fewer images, fine-tuning all parameters in the cross-attention layers is slightly better; this can be enabled with --freeze_model "crossattn".
Example results of fine-tuning on 14 close-up photos of Richard Zhang with the diffusers training script.

Model compression

python src/compress.py --delta_ckpt <finetuned-delta-path> --ckpt <pretrained-model-path>

## sample
python sample.py --prompt "<new1> cat playing with a ball" --delta_ckpt logs/<folder-name>/checkpoints/compressed_delta_epoch\=000004.ckpt --ckpt <pretrained-model-path> --compress

Sample generations with different levels of compression. By default, our code saves a low-rank approximation keeping the top 60% of singular values, resulting in ~15 MB models.
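
The compression is a standard low-rank (SVD) approximation of each weight delta. The snippet below is a toy illustration of the idea, not the actual src/compress.py; it keeps the top 60% of singular values by count (whether the script counts values or their spectral energy is an implementation detail):

import torch

# Toy low-rank compression of a single (hypothetical) cross-attention weight delta:
# keep the top 60% of singular values and store two thin factors instead of the full matrix.
delta = torch.randn(320, 768)                        # stand-in for W_finetuned - W_pretrained
U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
rank = max(1, int(0.6 * S.numel()))                  # "top 60% singular values"
U_r, SVh_r = U[:, :rank], S[:rank, None] * Vh[:rank] # factors actually stored on disk
delta_approx = U_r @ SVh_r                           # reconstructed at sampling time
print("stored elements:", U_r.numel() + SVh_r.numel(), "vs full:", delta.numel())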

Checkpoint conversions for stable-diffusion-v1-4

  • From diffusers delta.bin to CompVis delta_model.ckpt.
python src/convert.py --delta_ckpt <path-to-folder>/delta.bin --ckpt <path-to-model-v1-4.ckpt> --mode diffuser-to-compvis                  
# sample
python sample.py --delta_ckpt <path-to-folder>/delta_model.ckpt --ckpt <path-to-model-v1-4.ckpt> --prompt <text-prompt> --config configs/custom-diffusion/finetune_addtoken.yaml
python src/convert.py --delta_ckpt <path-to-folder>/delta.bin --ckpt <path-to-model-v1-4.ckpt> --mode diffuser-to-webui                  
# launch UI in stable-diffusion-webui directory
bash webui.sh --embeddings-dir <path-to-folder>/webui/embeddings  --ckpt <path-to-folder>/webui/model.ckpt
  • From CompVis delta_model.ckpt to diffusers delta.bin.
python src/convert.py --delta_ckpt <path-to-folder>/delta_model.ckpt --ckpt <path-to-model-v1-4.ckpt> --mode compvis-to-diffuser                  
# sample
python src/diffusers_sample.py --delta_ckpt <path-to-folder>/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt <text-prompt>
python src/convert.py --delta_ckpt <path-to-folder>/delta_model.ckpt --ckpt <path-to-model-v1-4.ckpt> --mode compvis-to-webui                  
# launch UI in stable-diffusion-webui directory
bash webui.sh --embeddings-dir <path-to-folder>/webui/embeddings  --ckpt <path-to-folder>/webui/model.ckpt

Converted checkpoints are saved in the same <path-to-folder> as the original checkpoints.

References

@inproceedings{kumari2022customdiffusion,
  title     = {Multi-Concept Customization of Text-to-Image Diffusion},
  author    = {Kumari, Nupur and Zhang, Bingliang and Zhang, Richard and Shechtman, Eli and Zhu, Jun-Yan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}

Acknowledgments

We are grateful to Nick Kolkin, David Bau, Sheng-Yu Wang, Gaurav Parmar, John Nack, and Sylvain Paris for their helpful comments and discussion, and to Allie Chang, Chen Wu, Sumith Kulal, Minguk Kang, Yotam Nitzan, and Taesung Park for proofreading the draft. We also thank Mia Tang and Aaron Hertzmann for sharing their artwork. Some of the datasets are downloaded from Unsplash. This work was partly done by Nupur Kumari during the Adobe internship. The work is partly supported by Adobe Inc.


custom-diffusion's Issues

Question about diffusers training

In diffuser_training.py line 1205:
index_grads_to_zero = index_grads_to_zero | torch.arange(len(tokenizer)) != modifier_token_id[I]

Should '&' be used here instead of '|'?

Table 1 Metrics

Thank you for this great work! Is there a script or code to help replicate evaluation of your method using text/image alignment and KID metrics?

Can this run on Google Colab? What is the minimum required VRAM? Is there a Google Colab notebook available?

Can this run on free Google Colab?

What is the minimum required VRAM?

Is there a Google Colab notebook available?

I have recently made a tutorial for Stable Diffusion 1.5 and DreamBooth. It works great. I would like to prepare a tutorial for this one too.

https://www.youtube.com/watch?v=mnCY8uM7E50

image

In this video, I will show you how to create your own Lensa app-like magic avatars without using any third-party apps. Our method ensures 100% privacy, unlike uploading your personal photos to external apps and platforms. I am sure you have seen many people sharing their magic avatars on social media. While these paid apps use free open-source AI models, they still require payment. Additionally, AI image generation apps may keep your private photos and use them as they please. Instead of paying and trusting the goodwill of these paid apps, we will use these open-source public AI models ourselves. I will provide step-by-step instructions, so even those with no technical expertise can follow along. We will be using Google's Colab, a free #AI platform that offers access to a powerful GPU at no cost. Therefore, this tutorial can even be completed on a mobile phone without a PC. All you need is a Gmail account. Once we train the open-source image generation model #StableDiffusion with #DreamBooth by using our own portrait images, the possibilities are endless. Unlike other applications, the method I will demonstrate does not impose any limitations on image generation. You will be able to freely compose any kind of image as many times as you desire. You can not only generate your own avatars, but also any other images you want, such as highly detailed car images or landscapes.

TypeError: new_forward() got an unexpected keyword argument 'attention_mask'

ENV:
torch 1.11.0+cu113
torchvision 0.12.0+cu113
diffusers 0.11.1
transformers 4.25.0

Traceback (most recent call last):
File "src/diffuser_training.py", line 1252, in
main(args)
File "src/diffuser_training.py", line 1152, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 424, in forward
sample, res_samples = downsample_block(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 777, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/diffusers/models/attention.py", line 216, in forward
hidden_states = block(hidden_states, encoder_hidden_states=encoder_hidden_states, timestep=timestep)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/diffusers/models/attention.py", line 498, in forward
self.attn2(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
TypeError: new_forward() got an unexpected keyword argument 'attention_mask'

How to use the checkpoint folders?

There is only one delta.bin that I can use to generate images, but how should I use the checkpoint folders to test them out? Thank you!

Questions about Dataset and Evaluation

Hello, I am trying to reproduce the results of your paper, but I found that some training images are not available in the link.

Specifically, there are no images for the flower, table, and chair concepts.
Also, there are eight images for the dog concept and 136 images for the moongate concept, which differs from the paper (following Table 5 in the Appendix, there should be ten images for the dog and 135 images for the moongate).

If possible, could I get the images of the flower, table, and chair, and the full image set for the dog?

Also, I want to ask some details of evaluation for clarity.

  1. I think the same generated images from prompts containing the modifier token (20 prompts x 50 samples = 1000 images) are used for measuring both the image-alignment and text-alignment metrics, and the set generated from prompts without the modifier token (20 prompts x 50 samples = 1000 images) is used for KID. Is that right?

  2. For measuring KID, how are the validation images retrieved from the LAION-400M dataset? In the document, you mention that "Our results in the paper are not based on using clip-retrieval to gather real images as the regularization samples." Then, what method do you use for the retrieval (for training and evaluation)?

  3. For measuring KID, we have 20 prompts per concept. When you retrieve the images from LAION-400M, did you retrieve 25 images per prompt to make the 500 validation images? And when you retrieve the real images, I think the modifier token is removed from the prompts. Is this right?

What's the purpose of these lines?

Thanks for your amazing work! I have one quick question: what is the purpose of these lines, in the modified CrossAttention's forward function? It seems like you disable the gradient of the first token in the embedding? Can you explain a bit?

Thanks!

Connection refused by the server

Thanks for the project!
Would you please advise how to fix the error below? Thanks!

(ldm) root@autodl-container-ad2d11a13c-8c30d587:~/autodl-tmp/custom-diffusion# bash scripts/finetune_real.sh "cat" data/cat real_reg/samples_cat cat finetune_addtoken.yaml sd-v1-4.ckpt
cat
data/cat
real_reg/samples_cat
cat
finetune_addtoken.yaml
sd-v1-4.ckpt
Connection refused by the server..
19566 0
Connection refused by the server..
80970 1
6477 2
Connection refused by the server..
158533 3
37678 4
7444 5
25722 6
18951 7
30868 8
Connection refused by the server..
196341 9

Not working

I use this command with 100 reg images in ./data/1girl-reg and 56 instance images in ./data/Power

accelerate launch src/diffuser_training.py --pretrained_model_name_or_path 'C:\Users\jhon\StableTuner\models\Anything-V3' --instance_data_dir ./data/Power --class_data_dir ./data/1girl-reg --output_dir ./logs/power --with_prior_preservation --prior_loss_weight 1.0 --instance_prompt "illustration of a <new1> 1girl" --class_prompt "1girl" --resolution 512 --revision fp16 --mixed_precision fp16 --gradient_checkpointing --gradient_accumulation_steps 1 --use_8bit_adam --train_batch_size 1 --learning_rate 5e-6 --lr_warmup_steps 0 --max_train_steps 250 --scale_lr --modifier_token "<new1>"

After training, I sample like this:
python src/sample_diffuser.py --delta_ckpt logs/power/delta.bin --ckpt "C:\Users\jhon\StableTuner\models\Anything-V3" --prompt "illustration of <new1> 1girl"

I also tried

python src/sample_diffuser.py --delta_ckpt logs/power/delta.bin --ckpt "C:\Users\jhon\StableTuner\models\Anything-V3" --prompt "<new1> 1girl"

The generated images do not look anything like my instance character. Below are some of the instance character images that I used for training:

explorer_FRprcjSFmJ

Below are some of the regularization images that I used for training:
explorer_bMvWoV5Iz9

and this is the sampled image
s

As you can see, the images look nothing like the instance character. I'm not sure what I am doing wrong.

questions about the src/convert.py

Hi! I notice that you are trying to update the code to be compatible with the new version of diffusers. However, it seems that src/convert.py is not correct after the newest commit. Could you please fix this?

current code is not compatible with new diffusers

Since the CrossAttention module has been heavily refactored in newer diffusers versions (0.13+, probably after 0.11.1), the current create_custom_diffusion code no longer works with it out of the box.
That said, UNet fine-tuning works quite well even without updating the forward function, but it requires much longer training - roughly ~2000 steps instead of ~500. Copying the old self._attention code into new_forward improves the quality, but it still takes more time and VRAM than, e.g., the CompVis version.

CUDA out of memory for sample_diffuser

Hi,

I faced an out-of-memory error when trying to test my trained model with sample_diffuser.py. 40 GB was too little for it. Is there any way to fix this?

Command that I ran:

! python src/sample_diffuser.py --delta_ckpt /content/drive/MyDrive/logs/cat/delta.bin --ckpt "stabilityai/stable-diffusion-2-1" --prompt "female face in the style of <new1>"
dict_keys(['unet', 'modifier_token'])
  0% 0/200 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "src/sample_diffuser.py", line 70, in <module>
    sample(args.ckpt, args.delta_ckpt, args.from_file, args.prompt, args.compress, args.freeze_model)
  File "src/sample_diffuser.py", line 27, in sample
    images = pipe([prompt]*5, num_inference_steps=200, guidance_scale=6., eta=1.).images
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 606, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prompt_embeds).sample
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py", line 481, in forward
    sample, res_samples = downsample_block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 789, in forward
    hidden_states = attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/transformer_2d.py", line 265, in forward
    hidden_states = block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 291, in forward
    attn_output = self.attn1(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/cross_attention.py", line 160, in forward
    return self.processor(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/cross_attention.py", line 239, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/cross_attention.py", line 203, in get_attention_scores
    attention_probs = attention_probs.to(dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.91 GiB (GPU 0; 39.59 GiB total capacity; 34.67 GiB already allocated; 2.86 GiB free; 34.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Clear Instructions for Finetuning Faces

Hey, thanks for the amazing work.
I was trying to fine-tune on faces. DreamBooth works really well for fine-tuning a model on a specific face subject, but there are very few guidelines for doing this with Custom Diffusion. I tried training through diffusers, and the results were very odd, with a lot of identity loss. Training through the scripts is difficult to understand, since there is no specific guideline for what the instance prompt and class prompt should be.
It would be great if you could provide better guidelines on fine-tuning faces with Custom Diffusion.
Also, the paper says the method is ~6x faster than DreamBooth, but this does not seem to be the case: training Custom Diffusion through diffusers takes almost the same time and GPU memory. Please advise on this as well.
Really looking forward to your reply.

About the training of input embedding.

In the paper "Multi-Concept Customization of Text-to-Image Diffusion," it is stated that only the target token "new1" or V* was tuned. However, upon reviewing the code in diffuser_training.py, it appears that the entire embedding was optimized. Can you clarify if this is accurate? Thank you for your response.

composen_W_diffuser runtime error

When I run the following command, I encounter the error below. Have you encountered this error, and how can it be solved?

Command:
python composenW_diffuser.py --paths delta_cat.bin+delta_wooden_pot.bin --categories "cat+wooden pot" --ckpt "CompVis/stable-diffusion-v1-4"

Error:
_LinAlgError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info). This error may appear if the input matrix contains NaN.

convert.py contains bugs and suggested fixes

I am trying to run the compvis-to-webui mode and it doesn't work. I went into the code and found some potential bugs. After I tried to fix them, it seems to work, although I haven't tried using the output models. Please see if the fixes are correct. Thank you!

Lines 72-73, should be 'and' not 'or'

    # convert checkpoint to webui
    if mode in ['diffuser-to-webui' and 'compvis-to-webui']:

Lines 91-92, 'compvis_st' is not defined within that scope so I think it should be 'st'

            st = torch.load(delta_ckpt)["state_dict"]
            model.load_state_dict(st, strict=False)

Screen Shot 2023-02-24 at 3 34 27 PM

retrieve.py doesn't work

Hi!

When I executed finetune_real.sh, I hit the error below.
Could you give me some advice on how to fix this?

Traceback (most recent call last):
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/conda/envs/ldm/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/conda/envs/ldm/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/opt/conda/envs/ldm/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/conda/envs/ldm/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/conda/envs/ldm/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/opt/conda/envs/ldm/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/retrieve.py", line 91, in <module>
    retrieve(args.target_name, args.outpath, args.num_class_images)
  File "src/retrieve.py", line 27, in retrieve
    results = client.query(text=target_name)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/clip_retrieval/clip_client.py", line 84, in query
    return self.__search_knn_api__(text=text)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/clip_retrieval/clip_client.py", line 131, in __search_knn_api__
    return requests.post(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/requests/adapters.py", line 547, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

About given prompts.

Hi, I want to ask why moongate.txt and moongate_dog.txt don't contain a placeholder token.

Where can I find "--ckpt"?

For example, in python src/compress.py --delta_ckpt --ckpt, should the pretrained model path be "CompVis/stable-diffusion-v1-4"?
But that gives: FileNotFoundError: [Errno 2] No such file or directory: 'CompVis/stable-diffusion-v1-4'

Entry Not Found for url: https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/tokenizer/config.json.

I am getting this error both through the main repo and the gradio one, and I have tried SD versions 1.4, 1.5, and 2.1 as well.

Connection refused by the server..
19109 164
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 239, in hf_raise_for_status
    response.raise_for_status()
  File "/opt/conda/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/tokenizer/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1067, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1376, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 257, in hf_raise_for_status
    raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-63ab18d6-5143fbe5424472572e730170)

Entry Not Found for url: https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/tokenizer/config.json.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/sdw/examples/dreambooth/custom-diffusion/custom-diffusion/src/diffuser_training.py", line 1246, in <module>
    main(args)
  File "/workspace/sdw/examples/dreambooth/custom-diffusion/custom-diffusion/src/diffuser_training.py", line 892, in main
    tokenizer = AutoTokenizer.from_pretrained(
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 575, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 776, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/configuration_utils.py", line 559, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/configuration_utils.py", line 614, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/conda/lib/python3.9/site-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: stabilityai/stable-diffusion-2-1-base does not appear to have a file named tokenizer/config.json. Checkout 'https://huggingface.co/stabilityai/stable-diffusion-2-1-base/main' for available files.
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'custom-diffusion/src/diffuser_training.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--output_dir=results', '--concepts_list=results/temp.json', '--with_prior_preservation', '--prior_loss_weight=1.0', '--resolution=768', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-06', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000', '--num_class_images=200', '--initializer_token=ktn+pll+ucd', '--scale_lr', '--hflip', '--modifier_token', '<new1>', '--real_prior', '--use_8bit_adam', '--train_text_encoder', '--gradient_checkpointing']' returned non-zero exit status 1.

Least squares setup for model merge

Hi, when iterating through the attention layers to form Vtarget1, how come we are able to reuse the same text embedding (when multiplying the weights here) throughout all the layers? Wouldn't this embedding get transformed as it propagates through the network?

How hard would it be to extend this concept beyond the attention layers? I want to merge two fine-tuned models for CLIP and the UNet, trained on all layers.

Also, I think line 91 in composenW.py has a bug where the prompts param isn't used.

Not working on macOS (Silicon)

% git clone https://github.com/adobe-research/custom-diffusion.git
Cloning into 'custom-diffusion'...
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 214 (delta 26), reused 31 (delta 14), pack-reused 165
Receiving objects: 100% (214/214), 51.28 MiB | 33.20 MiB/s, done.
Resolving deltas: 100% (88/88), done.
% git clone https://github.com/CompVis/stable-diffusion.git
Cloning into 'stable-diffusion'...
remote: Enumerating objects: 340, done.
remote: Total 340 (delta 0), reused 0 (delta 0), pack-reused 340
Receiving objects: 100% (340/340), 42.65 MiB | 30.80 MiB/s, done.
Resolving deltas: 100% (114/114), done.
% cd stable-diffusion
% conda env create -f environment.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - pip=20.3
  - torchvision=0.12.0
  - cudatoolkit=11.3
  - python=3.8.5

[CrossAttention: new_forward pass]

Hi,

Wonderful work, thanks for sharing the code.

I have a question about the new_forward function defined in the model.py file. I do not understand why, when crossattn is True, you manipulate the key and value tensors with a new variable named modifier (lines 154-158).

Thanks

Which license is right?

So, which license applies to Custom Diffusion? Here at the repo, we have a copyright notice permitting non-commercial use. On the Hugging Face model, it's MIT. Which one applies?

Thanks

1 gpu train error

bash scripts/finetune_real.sh "cat" data/cat real_reg/samples_cat cat finetune_addtoken.yaml v1-5-pruned-emaonly.ckpt

| Name | Type | Params

0 | model | DiffusionWrapper | 859 M
1 | first_stage_model | AutoencoderKL | 83.7 M
2 | cond_stage_model | FrozenCLIPEmbedderWrapper | 123 M

57.1 M Trainable params
1.0 B Non-trainable params
1.1 B Total params
4,264.944 Total estimated model params size (MB)
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]Summoning checkpoint.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/fuchao/work/github/custom-diffusion/train.py:967 in │
│ │
│ 964 │ │ # run │
│ 965 │ │ if opt.train: │
│ 966 │ │ │ try: │
│ ❱ 967 │ │ │ │ trainer.fit(model, data) │
│ 968 │ │ │ except Exception: │
│ 969 │ │ │ │ melk() │
│ 970 │ │ │ │ raise │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py │
│ :553 in fit │
│ │
│ 550 │ │ │
│ 551 │ │ self.checkpoint_connector.resume_start() │
│ 552 │ │ │
│ ❱ 553 │ │ self._run(model) │
│ 554 │ │ │
│ 555 │ │ assert self.state.stopped │
│ 556 │ │ self.training = False │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py │
│ :918 in _run │
│ │
│ 915 │ │ self.checkpoint_connector.restore_training_state() │
│ 916 │ │ │
│ 917 │ │ # dispatch start_training or start_evaluating or start_predicting
│ ❱ 918 │ │ self._dispatch() │
│ 919 │ │ │
│ 920 │ │ # plugin will finalized fitting (e.g. ddp_spawn will load trained model) │
│ 921 │ │ self._post_dispatch() │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py │
│ :986 in _dispatch │
│ │
│ 983 │ │ elif self.predicting: │
│ 984 │ │ │ self.accelerator.start_predicting(self) │
│ 985 │ │ else: │
│ ❱ 986 │ │ │ self.accelerator.start_training(self) │
│ 987 │ │
│ 988 │ def run_stage(self): │
│ 989 │ │ self.accelerator.dispatch(self) │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accel │
│ erator.py:92 in start_training │
│ │
│ 89 │ │ self.setup_precision_plugin() │
│ 90 │ │
│ 91 │ def start_training(self, trainer: "pl.Trainer") -> None: │
│ ❱ 92 │ │ self.training_type_plugin.start_training(trainer) │
│ 93 │ │
│ 94 │ def start_evaluating(self, trainer: "pl.Trainer") -> None: │
│ 95 │ │ self.training_type_plugin.start_evaluating(trainer) │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/training_t │
│ ype/training_type_plugin.py:161 in start_training │
│ │
│ 158 │ │
│ 159 │ def start_training(self, trainer: "pl.Trainer") -> None: │
│ 160 │ │ # double dispatch to initiate the training loop │
│ ❱ 161 │ │ self._results = trainer.run_stage() │
│ 162 │ │
│ 163 │ def start_evaluating(self, trainer: "pl.Trainer") -> None: │
│ 164 │ │ # double dispatch to initiate the test loop │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py │
│ :996 in run_stage │
│ │
│ 993 │ │ │ return self._run_evaluate() │
│ 994 │ │ if self.predicting: │
│ 995 │ │ │ return self._run_predict() │
│ ❱ 996 │ │ return self._run_train() │
│ 997 │ │
│ 998 │ def _pre_training_routine(self): │
│ 999 │ │ # wait for all to join if on distributed │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py │
│ :1031 in _run_train │
│ │
│ 1028 │ │ if not self.is_global_zero and self.progress_bar_callback is not None: │
│ 1029 │ │ │ self.progress_bar_callback.disable() │
│ 1030 │ │ │
│ ❱ 1031 │ │ self._run_sanity_check(self.lightning_module) │
│ 1032 │ │ │
│ 1033 │ │ # enable train mode │
│ 1034 │ │ self.model.train() │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py │
│ :1115 in _run_sanity_check │
│ │
│ 1112 │ │ │ │
│ 1113 │ │ │ # run eval step │
│ 1114 │ │ │ with torch.no_grad(): │
│ ❱ 1115 │ │ │ │ self._evaluation_loop.run() │
│ 1116 │ │ │ │
│ 1117 │ │ │ self.on_sanity_check_end() │
│ 1118 │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py:111 │
│ in run │
│ │
│ 108 │ │ while not self.done: │
│ 109 │ │ │ try: │
│ 110 │ │ │ │ self.on_advance_start(*args, **kwargs) │
│ ❱ 111 │ │ │ │ self.advance(*args, **kwargs) │
│ 112 │ │ │ │ self.on_advance_end() │
│ 113 │ │ │ │ self.iteration_count += 1 │
│ 114 │ │ │ │ self.restarting = False │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/e │
│ valuation_loop.py:110 in advance │
│ │
│ 107 │ │ dataloader_iter = enumerate(dataloader) │
│ 108 │ │ dl_max_batches = self._max_batches[self.current_dataloader_idx] │
│ 109 │ │ │
│ ❱ 110 │ │ dl_outputs = self.epoch_loop.run( │
│ 111 │ │ │ dataloader_iter, self.current_dataloader_idx, dl_max_batches, self.num_datal │
│ 112 │ │ ) │
│ 113 │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py:111 │
│ in run │
│ │
│ 108 │ │ while not self.done: │
│ 109 │ │ │ try: │
│ 110 │ │ │ │ self.on_advance_start(*args, **kwargs) │
│ ❱ 111 │ │ │ │ self.advance(*args, **kwargs) │
│ 112 │ │ │ │ self.on_advance_end() │
│ 113 │ │ │ │ self.iteration_count += 1 │
│ 114 │ │ │ │ self.restarting = False │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evalua │
│ tion_epoch_loop.py:110 in advance │
│ │
│ 107 │ │ │
│ 108 │ │ # lightning module methods │
│ 109 │ │ with self.trainer.profiler.profile("evaluation_step_and_end"): │
│ ❱ 110 │ │ │ output = self.evaluation_step(batch, batch_idx, dataloader_idx) │
│ 111 │ │ │ output = self.evaluation_step_end(output) │
│ 112 │ │ │
│ 113 │ │ self.batch_progress.increment_processed() │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evalua │
│ tion_epoch_loop.py:154 in evaluation_step │
│ │
│ 151 │ │ else: │
│ 152 │ │ │ self.trainer.lightning_module._current_fx_name = "validation_step" │
│ 153 │ │ │ with self.trainer.profiler.profile("validation_step"): │
│ ❱ 154 │ │ │ │ output = self.trainer.accelerator.validation_step(step_kwargs) │
│ 155 │ │ │
│ 156 │ │ return output │
│ 157 │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accel │
│ erator.py:211 in validation_step │
│ │
│ 208 │ │ │ │ (only if multiple val dataloaders used) │
│ 209 │ │ """ │
│ 210 │ │ with self.precision_plugin.val_step_context(), self.training_type_plugin.val_ste │
│ ❱ 211 │ │ │ return self.training_type_plugin.validation_step(*step_kwargs.values()) │
│ 212 │ │
│ 213 │ def test_step(self, step_kwargs: Dict[str, Union[Any, int]]) -> Optional[STEP_OUTPUT │
│ 214 │ │ """The actual test step. │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/training_t │
│ ype/training_type_plugin.py:178 in validation_step │
│ │
│ 175 │ │ pass │
│ 176 │ │
│ 177 │ def validation_step(self, *args, **kwargs): │
│ ❱ 178 │ │ return self.model.validation_step(*args, **kwargs) │
│ 179 │ │
│ 180 │ def test_step(self, *args, **kwargs): │
│ 181 │ │ return self.model.test_step(*args, **kwargs) │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27 in │
│ decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /home/fuchao/work/github/stable-diffusion/ldm/models/diffusion/ddpm.py:359 in validation_step │
│ │
│ 356 │ │
│ 357 │ @torch.no_grad() │
│ 358 │ def validation_step(self, batch, batch_idx): │
│ ❱ 359 │ │ _, loss_dict_no_ema = self.shared_step(batch) │
│ 360 │ │ with self.ema_scope(): │
│ 361 │ │ │ _, loss_dict_ema = self.shared_step(batch) │
│ 362 │ │ │ loss_dict_ema = {key + '_ema': loss_dict_ema[key] for key in loss_dict_ema} │
│ │
│ /home/fuchao/work/github/custom-diffusion/src/model.py:294 in shared_step │
│ │
│ 291 │ │
│ 292 │ def shared_step(self, batch, **kwargs): │
│ 293 │ │ x, c, mask = self.get_input_withmask(batch, **kwargs) │
│ ❱ 294 │ │ loss = self(x, c, mask=mask) │
│ 295 │ │ return loss │
│ 296 │ │
│ 297 │ @torch.no_grad() │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in │
│ _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/fuchao/work/github/stable-diffusion/ldm/models/diffusion/ddpm.py:875 in forward │
│ │
│ 872 │ │ if self.model.conditioning_key is not None: │
│ 873 │ │ │ assert c is not None │
│ 874 │ │ │ if self.cond_stage_trainable: │
│ ❱ 875 │ │ │ │ c = self.get_learned_conditioning(c) │
│ 876 │ │ │ if self.shorten_cond_schedule: # TODO: drop this option │
│ 877 │ │ │ │ tc = self.cond_ids[t].to(self.device) │
│ 878 │ │ │ │ c = self.q_sample(x_start=c, t=tc, noise=torch.randn_like(c.float())) │
│ │
│ /home/fuchao/work/github/stable-diffusion/ldm/models/diffusion/ddpm.py:554 in │
│ get_learned_conditioning │
│ │
│ 551 │ def get_learned_conditioning(self, c): │
│ 552 │ │ if self.cond_stage_forward is None: │
│ 553 │ │ │ if hasattr(self.cond_stage_model, 'encode') and callable(self.cond_stage_mod │
│ ❱ 554 │ │ │ │ c = self.cond_stage_model.encode(c) │
│ 555 │ │ │ │ if isinstance(c, DiagonalGaussianDistribution): │
│ 556 │ │ │ │ │ c = c.mode() │
│ 557 │ │ │ else: │
│ │
│ /home/fuchao/work/github/custom-diffusion/src/custom_modules.py:318 in encode │
│ │
│ 315 │ │ return z │
│ 316 │ │
│ 317 │ def encode(self, text): │
│ ❱ 318 │ │ return self(text) │
│ 319 │
│ 320 │
│ 321 if __name__ == "__main__": │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in │
│ _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/fuchao/work/github/custom-diffusion/src/custom_modules.py:310 in forward │
│ │
│ 307 │ │ input_shape = tokens.size() │
│ 308 │ │ tokens = tokens.view(-1, input_shape[-1]) │
│ 309 │ │ │
│ ❱ 310 │ │ hidden_states = self.transformer.text_model.embeddings(input_ids=tokens) │
│ 311 │ │ hidden_states = (1-indices)*hidden_states.detach() + indices*hidden_states │
│ 312 │ │ │
│ 313 │ │ z = self.custom_forward(hidden_states, tokens) │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in │
│ _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_cl │
│ ip.py:223 in forward │
│ │
│ 220 │ │ │ position_ids = self.position_ids[:, :seq_length] │
│ 221 │ │ │
│ 222 │ │ if inputs_embeds is None: │
│ ❱ 223 │ │ │ inputs_embeds = self.token_embedding(input_ids) │
│ 224 │ │ │
│ 225 │ │ position_embeddings = self.position_embedding(position_ids) │
│ 226 │ │ embeddings = inputs_embeds + position_embeddings │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in │
│ _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/sparse.py:158 in │
│ forward │
│ │
│ 155 │ │ │ │ self.weight[self.padding_idx].fill_(0) │
│ 156 │ │
│ 157 │ def forward(self, input: Tensor) -> Tensor: │
│ ❱ 158 │ │ return F.embedding( │
│ 159 │ │ │ input, self.weight, self.padding_idx, self.max_norm, │
│ 160 │ │ │ self.norm_type, self.scale_grad_by_freq, self.sparse) │
│ 161 │
│ │
│ /home/fuchao/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/functional.py:2199 in │
│ embedding │
│ │
│ 2196 │ │ #   torch.embedding_renorm_ │
│ 2197 │ │ # remove once script supports set_grad_enabled │
│ 2198 │ │ _no_grad_embedding_renorm_(weight, input, max_norm, norm_type) │
│ ❱ 2199 │ return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) │
│ 2200 │
│ 2201 │
│ 2202 def embedding_bag( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)

Unable to reproduce results

I am using the diffusers-based multi-concept fine-tuning script for the cat and wooden_pot case, but I am not able to reproduce the kind of results shown in the image below.

image

For example, for the prompt "V1 cat sitting inside a V2 wooden pot and looking up", I get the following results when sampling after fine-tuning.
image

I am using more than 2 GPUs while training. Are there any specific steps to follow that I might have missed?

The results are not good when I change the data

The results I reproduced with the data from the paper are good.

my input images:
image

my result:
image
image

But when I change to other pictures, the result is worse (the sunglasses are not like my input sunglasses).

my input images:
image

my result:
image

My training parameters are as follows (I use the pretrained model runwayml/stable-diffusion-v1-5):

!accelerate launch src/diffuser_training.py \
          --pretrained_model_name_or_path=$MODEL_NAME \
          --output_dir=$OUTPUT_DIR \
          --concepts_list="/content/concepts_list.json" \
          --with_prior_preservation --prior_loss_weight=1.0 \
          --resolution=512 \
          --train_batch_size=2 \
          --learning_rate=1e-5 \
          --lr_warmup_steps=0 \
          --max_train_steps=500 \
          --num_class_images=200 \
          --scale_lr --hflip \
          --modifier_token "<whxm>+<bkmj>"

My concept_list.json are as follows:
image

Is there something wrong with my training process causing this poor result?
Thanks!

fp16 training

Is it possible to train in mixed-precision to reduce training time? Using --mixed_precision "fp16" gives ValueError: Attempting to unscale FP16 gradients.

--mixed_precision "bf16" also gives an error

Also, is the code optimized to work with xformers by default?

The install requirement

Hi,

Firstly, thanks for the work. Can you please list the installation requirements?

I am having this issue right now, and it seems I cannot make it work after a few hours: AttributeError: 'CrossAttention' object has no attribute 'new_forward'.

Looking forward to hearing from you soon!

Strange stop gradient operation

What is the purpose of these codes in diffuser_training.py?
if crossattn:
    modifier = torch.ones_like(key)
    # print(key.shape)
    modifier[:, :1, :] = modifier[:, :1, :]*0.
    key = modifier*key + (1-modifier)*key.detach()
    value = modifier*value + (1-modifier)*value.detach()

It looks like this only uses the first token to train to_k and to_v and does not allow gradient flow to the modifier token.

Evaluation Prompts

Firstly, thank you for your great work. :)

I saw the prompts folder that you uploaded.
For single-concept evaluation, the prompts include the customized token ().
Did you use the same prompts for evaluating text alignment?
