
hkproj / pytorch-stable-diffusion

Stable Diffusion implemented from scratch in PyTorch

Home Page: https://www.youtube.com/watch?v=ZBKpAp_6TGI

License: MIT License

Languages: Jupyter Notebook 84.70%, Python 15.30%
Topics: diffusion-models, latent-diffusion-models, paper-implementations, pytorch, pytorch-implementation, stable-diffusion

pytorch-stable-diffusion's Introduction

pytorch-stable-diffusion

PyTorch implementation of Stable Diffusion from scratch

Download weights and tokenizer files:

  1. Download vocab.json and merges.txt from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/tokenizer and save them in the data folder
  2. Download v1-5-pruned-emaonly.ckpt from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main and save it in the data folder
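Once both files are in the data folder, you can sanity-check them before running the demo notebook. A minimal sketch, not the repo's own loader; the transformers dependency and the exact paths are assumptions:

```python
import torch
from transformers import CLIPTokenizer  # assumes transformers is installed

# Build the tokenizer from the two files saved in ./data
tokenizer = CLIPTokenizer("data/vocab.json", merges_file="data/merges.txt")
print(tokenizer("a cat sitting on a chair")["input_ids"])

# The checkpoint is a plain pickled dict of tensors
checkpoint = torch.load("data/v1-5-pruned-emaonly.ckpt", map_location="cpu")
print(list(checkpoint.keys()))  # typically contains a 'state_dict' entry
```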

Tested fine-tuned models:

Just download the ckpt file of any fine-tuned SD model (up to v1.5).

  1. InkPunk Diffusion: https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main
  2. Illustration Diffusion (Hollie Mengert): https://huggingface.co/ogkalu/Illustration-Diffusion/tree/main

Special thanks

Special thanks to the following repositories:

  1. https://github.com/CompVis/stable-diffusion/
  2. https://github.com/divamgupta/stable-diffusion-tensorflow
  3. https://github.com/kjsman/stable-diffusion-pytorch
  4. https://github.com/huggingface/diffusers/

pytorch-stable-diffusion's People

Contributors

hkproj, lemonface0309, nick8592


pytorch-stable-diffusion's Issues

Parallel Computing

How can I use the code on multiple GPUs with torch.nn.DataParallel? It runs out of memory on a single GPU, and the kernel dies when inferring on the CPU.
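A minimal sketch of the DataParallel route, using a stand-in module (wrapping the repo's diffusion UNet would work the same way; reducing the image size or switching to fp16 are the usual single-GPU workarounds):

```python
import torch
import torch.nn as nn

# Stand-in for the repo's diffusion UNet; DataParallel splits the batch
# dimension across all visible GPUs and gathers the outputs on GPU 0.
model = nn.Sequential(nn.Linear(16, 32), nn.SiLU(), nn.Linear(32, 16))

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(8, 16, device=device)  # batch of 8, sharded across replicas
y = model(x)
print(y.shape)
```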

training code

Hello. Thank you so much for sharing your video and code. I found yours the easiest one to follow.
Do you have a plan to write training code? Maybe fine-tuning the pretrained model with some domain-specific data.
Regards

Not an Issue

Will you post a tutorial on how to train/fine-tune the model?

Question about model training.

I watched all your videos and followed along; it took about 5 days 😀. It was great fun, and I appreciate you!
Now I wonder how to train this model.

I also watched another video of yours, "How diffusion models work - explanation and code!".
That one is also very useful and a great video, thank you again!!
It was about how to train the UNet (the diffusion model) for latent denoising.

But we have four major models here:
the VAE encoder, the VAE decoder, the UNet, and CLIP.

If we want to train the UNet (the diffusion model) as in that training video,
do we freeze the other models and train only the UNet?

I also don't fully understand how the training objective should be defined.
For example, suppose we want to create image B in a specific style A, i.e. image A -> styled image B.

Where should I feed image A (or random input) and styled image B (the output), respectively?
Inference looks like this, but I don't know what to do in the training phase:
A (or random) -> VAE encoder -> [z, clip-emb, time-emb -> UNet -> z] * loop -> VAE decoder -> B

It is also unclear whether the CLIP embedding should be left blank, random, or a specific text prompt,
or whether I should feed image A into the CLIP embedding.

I searched YouTube for how people train Stable Diffusion models, and most videos use DreamBooth.
That again looks very high-level, like Hugging Face.

I would like to know the exact concept and what happens under the hood.
Thanks to your video and code I could understand the Stable Diffusion DDPM model, but I want to extend that understanding to training.

Thank you for the amazing work!
Happy new year!
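For reference, the standard latent-diffusion recipe matches the question's guess: the VAE and CLIP stay frozen, and only the UNet is trained to predict the noise added to the latent at a randomly sampled timestep. A minimal sketch, where the module names and call signatures are assumptions rather than the repo's exact API; for an A -> styled-B task one would typically fine-tune on captioned images in style B rather than feed A/B pairs:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear beta schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha-bar per timestep

def training_step(images, token_ids, vae_encoder, clip, unet, optimizer):
    with torch.no_grad():                       # VAE and CLIP are frozen
        latents = vae_encoder(images)           # x -> z_0
        context = clip(token_ids)               # prompt embeddings
    t = torch.randint(0, T, (latents.shape[0],))
    noise = torch.randn_like(latents)           # the epsilon the UNet must recover
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_bar.sqrt() * latents + (1 - a_bar).sqrt() * noise  # q(z_t | z_0)
    pred = unet(noisy, context, t)              # assumed signature: predicts epsilon
    loss = F.mse_loss(pred, noise)              # gradients flow only into the UNet
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```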

Issue about in-painting

Thanks a lot for your YouTube channel and the code!!! But I have a question about in-painting:
you said there would be a combination in latent space, but actually there is not.

In https://github.com/CompVis/latent-diffusion, they just show a pre-trained model for in-painting... and I have checked the code: there is no combining in latent space. It seems they simply trained a model specifically for in-painting and that was it.

Any response will be appreciated!!!!
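For what it's worth, the blending described in the video is a sampling-time trick rather than something the CompVis in-painting checkpoint does: at every denoising step the known region of the latent is overwritten with a re-noised copy of the original, so only the masked region is generated. A minimal sketch, where the `sampler.add_noise` call is an assumption about the sampler's API:

```python
import torch

def inpaint_blend(latents, original_latents, mask, sampler, t):
    """mask == 1 where new content should be generated."""
    # Re-noise the clean latent to the current timestep so both terms
    # sit at the same noise level before blending.
    noised_original = sampler.add_noise(original_latents, t)
    return mask * latents + (1 - mask) * noised_original
```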

RuntimeError: Error(s) in loading state_dict for VAE_Encoder:

In fact, I faced this problem when I ran the demo; it seems the converted keys cannot be found. What should I do?
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VAE_Encoder:
Unexpected key(s) in state_dict: "1.groupnorm_1.weight", "1.groupnorm_1.bias", "1.conv_1.weight", "1.conv_1.bias", "1.groupnorm_2.weight", "1.groupnorm_2.bias", "1.conv_2.weight", "1.conv_2.bias", "2.groupnorm_1.weight", "2.groupnorm_1.bias", "2.conv_1.weight", "2.conv_1.bias", "2.groupnorm_2.weight", "2.groupnorm_2.bias", "2.conv_2.weight", "2.conv_2.bias", "4.groupnorm_1.weight", "4.groupnorm_1.bias", "4.conv_1.weight", "4.conv_1.bias", "4.groupnorm_2.weight", "4.groupnorm_2.bias", "4.conv_2.weight", "4.conv_2.bias", "5.groupnorm_1.weight", "5.groupnorm_1.bias", "5.conv_1.weight", "5.conv_1.bias", "5.groupnorm_2.weight", "5.groupnorm_2.bias", "5.conv_2.weight", "5.conv_2.bias", "7.groupnorm_1.weight", "7.groupnorm_1.bias", "7.conv_1.weight", "7.conv_1.bias", "7.groupnorm_2.weight", "7.groupnorm_2.bias", "7.conv_2.weight", "7.conv_2.bias", "8.groupnorm_1.weight", "8.groupnorm_1.bias", "8.conv_1.weight", "8.conv_1.bias", "8.groupnorm_2.weight", "8.groupnorm_2.bias", "8.conv_2.weight", "8.conv_2.bias", "10.groupnorm_1.weight", "10.groupnorm_1.bias", "10.conv_1.weight", "10.conv_1.bias", "10.groupnorm_2.weight", "10.groupnorm_2.bias", "10.conv_2.weight", "10.conv_2.bias", "11.groupnorm_1.weight", "11.groupnorm_1.bias", "11.conv_1.weight", "11.conv_1.bias", "11.groupnorm_2.weight", "11.groupnorm_2.bias", "11.conv_2.weight", "11.conv_2.bias", "12.groupnorm_1.weight", "12.groupnorm_1.bias", "12.conv_1.weight", "12.conv_1.bias", "12.groupnorm_2.weight", "12.groupnorm_2.bias", "12.conv_2.weight", "12.conv_2.bias", "14.groupnorm_1.weight", "14.groupnorm_1.bias", "14.conv_1.weight", "14.conv_1.bias", "14.groupnorm_2.weight", "14.groupnorm_2.bias", "14.conv_2.weight", "14.conv_2.bias".
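One way to narrow this down is to diff the converted keys against what the module expects; mismatches like these usually mean the checkpoint was converted for a different layer layout than the one being loaded. A hedged sketch, where `VAE_Encoder` and `converted_state_dict` are taken from the error message rather than a verified API:

```python
encoder = VAE_Encoder()
expected = set(encoder.state_dict().keys())
loaded = set(converted_state_dict.keys())
print("missing from checkpoint:", sorted(expected - loaded)[:10])
print("unexpected in checkpoint:", sorted(loaded - expected)[:10])

# strict=False skips mismatched keys so the diff can be inspected on a
# live module; do not run inference while weights are still missing.
encoder.load_state_dict(converted_state_dict, strict=False)
```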

Not really an issue

Hey Umar, I'm really grateful for your YouTube video and GitHub repo for this Stable Diffusion model. If it's not too much to ask, is there any way I can talk to you on Discord about some errors I'm getting?

Question about the output of the Unet

Hi, thank you so much for your work and for sharing it.

I want to retrain the UNet from scratch. May I ask what the output of the UNet should be under this DDPM scheduler? Is it fresh random noise, or the noise computed for the specific timestep? In other code I have seen, random noise is used as the model's output target, but my model only works when I use the latter.

Highly appreciate your help!
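In the usual DDPM formulation the two options coincide: the target is the very epsilon that was sampled and mixed into the latent at timestep t, so drawing fresh random noise after the fact is the wrong target. A tiny illustration with dummy shapes:

```python
import torch

z0 = torch.randn(1, 4, 8, 8)             # clean latent (dummy)
alpha_bar_t = torch.tensor(0.5)           # cumulative alpha at some timestep t
eps = torch.randn_like(z0)                # noise actually mixed into z_t
zt = alpha_bar_t.sqrt() * z0 + (1 - alpha_bar_t).sqrt() * eps

target = eps                              # correct training target for z_t
wrong = torch.randn_like(z0)              # a fresh sample is unrelated to z_t
```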

Small Typo

In attention.py, line 39, causal_mask is misspelled as casual_mask.
