
hkproj / pytorch-stable-diffusion

Stable Diffusion implemented from scratch in PyTorch

Home Page: https://www.youtube.com/watch?v=ZBKpAp_6TGI

License: MIT License

Languages: Jupyter Notebook 84.70%, Python 15.30%
Topics: diffusion-models, latent-diffusion-models, paper-implementations, pytorch, pytorch-implementation, stable-diffusion

pytorch-stable-diffusion's Introduction

pytorch-stable-diffusion

PyTorch implementation of Stable Diffusion from scratch

Download weights and tokenizer files:

  1. Download vocab.json and merges.txt from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/tokenizer and save them in the data folder
  2. Download v1-5-pruned-emaonly.ckpt from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main and save it in the data folder
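Once both files are in the data folder, you can sanity-check them before running the demo notebook. A minimal sketch, not the repo's own loader; the transformers dependency and the exact paths are assumptions:

```python
import torch
from transformers import CLIPTokenizer  # assumes transformers is installed

# Build the tokenizer from the two files saved in ./data
tokenizer = CLIPTokenizer("data/vocab.json", merges_file="data/merges.txt")
print(tokenizer("a cat sitting on a chair")["input_ids"])

# The checkpoint is a plain pickled dict of tensors
checkpoint = torch.load("data/v1-5-pruned-emaonly.ckpt", map_location="cpu")
print(list(checkpoint.keys()))  # typically contains a 'state_dict' entry
```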

Tested fine-tuned models:

Just download the ckpt file of any fine-tuned SD model (up to v1.5).

  1. InkPunk Diffusion: https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main
  2. Illustration Diffusion (Hollie Mengert): https://huggingface.co/ogkalu/Illustration-Diffusion/tree/main

Special thanks

Special thanks to the following repositories:

  1. https://github.com/CompVis/stable-diffusion/
  2. https://github.com/divamgupta/stable-diffusion-tensorflow
  3. https://github.com/kjsman/stable-diffusion-pytorch
  4. https://github.com/huggingface/diffusers/

pytorch-stable-diffusion's People

Contributors

hkproj, lemonface0309, nick8592


pytorch-stable-diffusion's Issues

Parallel Computing

How can I use the code on multiple GPUs with torch.nn.DataParallel? It runs out of memory on a single GPU, and the kernel dies when inferring on the CPU.
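A minimal sketch of the DataParallel route, using a stand-in module (wrapping the repo's diffusion UNet would work the same way; reducing the image size or switching to fp16 are the usual single-GPU workarounds):

```python
import torch
import torch.nn as nn

# Stand-in for the repo's diffusion UNet; DataParallel splits the batch
# dimension across all visible GPUs and gathers the outputs on GPU 0.
model = nn.Sequential(nn.Linear(16, 32), nn.SiLU(), nn.Linear(32, 16))

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(8, 16, device=device)  # batch of 8, sharded across replicas
y = model(x)
print(y.shape)
```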

training code

Hello. Thank you so much for sharing your video and code. I found yours the easiest one to follow.
Do you have a plan to write training code? Maybe fine-tuning the pretrained model with some domain-specific data.
Regards

Not an Issue

Will you post a tutorial on how to train/fine-tune the model?

Question about model training.

I watched all your videos and followed along; it took about 5 days 😀. It was great fun, and I appreciate you!
Now I wonder how to train this model.

I also watched another video of yours, "How diffusion models work - explanation and code!".
That one is also very useful and a great video, thank you again!!
It was about how to train the UNet (the diffusion model) for latent denoising.

But we have four major models here:
the VAE encoder, the VAE decoder, the UNet, and CLIP.

If we want to train the UNet (the diffusion model) as in that training video,
do we freeze the other models and train only the UNet?

I also don't fully understand how the training objective should be defined.
For example, suppose we want to create image B in a specific style A, i.e. image A -> styled image B.

Where should I feed image A (or random input) and styled image B (the output), respectively?
Inference looks like this, but I don't know what to do in the training phase:
A (or random) -> VAE encoder -> [z, clip-emb, time-emb -> UNet -> z] * loop -> VAE decoder -> B

It is also unclear whether the CLIP embedding should be left blank, random, or a specific text prompt,
or whether I should feed image A into the CLIP embedding.

I searched YouTube for how people train Stable Diffusion models, and most videos use DreamBooth.
That again looks very high-level, like Hugging Face.

I would like to know the exact concept and what happens under the hood.
Thanks to your video and code I could understand the Stable Diffusion DDPM model, but I want to extend that understanding to training.

Thank you for the amazing work!
Happy new year!
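For reference, the standard latent-diffusion recipe matches the question's guess: the VAE and CLIP stay frozen, and only the UNet is trained to predict the noise added to the latent at a randomly sampled timestep. A minimal sketch, where the module names and call signatures are assumptions rather than the repo's exact API; for an A -> styled-B task one would typically fine-tune on captioned images in style B rather than feed A/B pairs:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear beta schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha-bar per timestep

def training_step(images, token_ids, vae_encoder, clip, unet, optimizer):
    with torch.no_grad():                       # VAE and CLIP are frozen
        latents = vae_encoder(images)           # x -> z_0
        context = clip(token_ids)               # prompt embeddings
    t = torch.randint(0, T, (latents.shape[0],))
    noise = torch.randn_like(latents)           # the epsilon the UNet must recover
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_bar.sqrt() * latents + (1 - a_bar).sqrt() * noise  # q(z_t | z_0)
    pred = unet(noisy, context, t)              # assumed signature: predicts epsilon
    loss = F.mse_loss(pred, noise)              # gradients flow only into the UNet
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```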

Issue about in-painting

Thanks a lot for your YouTube channel and the code!!! But I have a question about in-painting:
you said there would be a combination in latent space, but actually there is not.

In https://github.com/CompVis/latent-diffusion, they just show a pre-trained model for in-painting... and I have checked the code: there is no combining in latent space. It seems they simply trained a model specifically for in-painting and that was it.

Any response will be appreciated!!!!
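For what it's worth, the blending described in the video is a sampling-time trick rather than something the CompVis in-painting checkpoint does: at every denoising step the known region of the latent is overwritten with a re-noised copy of the original, so only the masked region is generated. A minimal sketch, where the `sampler.add_noise` call is an assumption about the sampler's API:

```python
import torch

def inpaint_blend(latents, original_latents, mask, sampler, t):
    """mask == 1 where new content should be generated."""
    # Re-noise the clean latent to the current timestep so both terms
    # sit at the same noise level before blending.
    noised_original = sampler.add_noise(original_latents, t)
    return mask * latents + (1 - mask) * noised_original
```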

RuntimeError: Error(s) in loading state_dict for VAE_Encoder:

In fact, I faced this problem when I ran the demo; it seems the converted keys cannot be found. What should I do?
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VAE_Encoder:
Unexpected key(s) in state_dict: "1.groupnorm_1.weight", "1.groupnorm_1.bias", "1.conv_1.weight", "1.conv_1.bias", "1.groupnorm_2.weight", "1.groupnorm_2.bias", "1.conv_2.weight", "1.conv_2.bias", "2.groupnorm_1.weight", "2.groupnorm_1.bias", "2.conv_1.weight", "2.conv_1.bias", "2.groupnorm_2.weight", "2.groupnorm_2.bias", "2.conv_2.weight", "2.conv_2.bias", "4.groupnorm_1.weight", "4.groupnorm_1.bias", "4.conv_1.weight", "4.conv_1.bias", "4.groupnorm_2.weight", "4.groupnorm_2.bias", "4.conv_2.weight", "4.conv_2.bias", "5.groupnorm_1.weight", "5.groupnorm_1.bias", "5.conv_1.weight", "5.conv_1.bias", "5.groupnorm_2.weight", "5.groupnorm_2.bias", "5.conv_2.weight", "5.conv_2.bias", "7.groupnorm_1.weight", "7.groupnorm_1.bias", "7.conv_1.weight", "7.conv_1.bias", "7.groupnorm_2.weight", "7.groupnorm_2.bias", "7.conv_2.weight", "7.conv_2.bias", "8.groupnorm_1.weight", "8.groupnorm_1.bias", "8.conv_1.weight", "8.conv_1.bias", "8.groupnorm_2.weight", "8.groupnorm_2.bias", "8.conv_2.weight", "8.conv_2.bias", "10.groupnorm_1.weight", "10.groupnorm_1.bias", "10.conv_1.weight", "10.conv_1.bias", "10.groupnorm_2.weight", "10.groupnorm_2.bias", "10.conv_2.weight", "10.conv_2.bias", "11.groupnorm_1.weight", "11.groupnorm_1.bias", "11.conv_1.weight", "11.conv_1.bias", "11.groupnorm_2.weight", "11.groupnorm_2.bias", "11.conv_2.weight", "11.conv_2.bias", "12.groupnorm_1.weight", "12.groupnorm_1.bias", "12.conv_1.weight", "12.conv_1.bias", "12.groupnorm_2.weight", "12.groupnorm_2.bias", "12.conv_2.weight", "12.conv_2.bias", "14.groupnorm_1.weight", "14.groupnorm_1.bias", "14.conv_1.weight", "14.conv_1.bias", "14.groupnorm_2.weight", "14.groupnorm_2.bias", "14.conv_2.weight", "14.conv_2.bias".
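One way to narrow this down is to diff the converted keys against what the module expects; mismatches like these usually mean the checkpoint was converted for a different layer layout than the one being loaded. A hedged sketch, where `VAE_Encoder` and `converted_state_dict` are taken from the error message rather than a verified API:

```python
encoder = VAE_Encoder()
expected = set(encoder.state_dict().keys())
loaded = set(converted_state_dict.keys())
print("missing from checkpoint:", sorted(expected - loaded)[:10])
print("unexpected in checkpoint:", sorted(loaded - expected)[:10])

# strict=False skips mismatched keys so the diff can be inspected on a
# live module; do not run inference while weights are still missing.
encoder.load_state_dict(converted_state_dict, strict=False)
```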

Not really an issue

Hey Umar, I'm really grateful for your YouTube video and GitHub repo for this Stable Diffusion model. If it's not too much to ask, is there any way I can talk to you on Discord about some errors I'm getting?

Question about the output of the Unet

Hi, thank you so much for your work and for sharing it.

I want to retrain the UNet from scratch. May I ask what the output of the UNet should be under this DDPM scheduler? Is it fresh random noise, or the noise computed for the specific timestep? In other code I have seen, random noise is used as the model's output target, but my model only works when I use the latter.

Highly appreciate your help!
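In the usual DDPM formulation the two options coincide: the target is the very epsilon that was sampled and mixed into the latent at timestep t, so drawing fresh random noise after the fact is the wrong target. A tiny illustration with dummy shapes:

```python
import torch

z0 = torch.randn(1, 4, 8, 8)             # clean latent (dummy)
alpha_bar_t = torch.tensor(0.5)           # cumulative alpha at some timestep t
eps = torch.randn_like(z0)                # noise actually mixed into z_t
zt = alpha_bar_t.sqrt() * z0 + (1 - alpha_bar_t).sqrt() * eps

target = eps                              # correct training target for z_t
wrong = torch.randn_like(z0)              # a fresh sample is unrelated to z_t
```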

Small Typo

In attention.py, line 39, causal_mask is misspelled as casual_mask.
