hkproj / pytorch-stable-diffusion

Stable Diffusion implemented from scratch in PyTorch
Home Page: https://www.youtube.com/watch?v=ZBKpAp_6TGI
License: MIT License
Can you also write a simple script for training on a dataset of images? @hkproj
Hello. Thank you so much for sharing your video and code. I found yours the easiest to follow.
Do you have any plans to write training code? Maybe fine-tuning the pretrained model on some domain-specific data.
Regards
In attention.py, line 39, causal_mask is misspelled as casual_mask.
Will you post a tutorial on how to train/fine-tune the model?
I'm a newbie. Can you make a video on how to run this project?
Thanks
Hi, thank you so much for your work and sharing.
I want to retrain the UNet from scratch. May I ask what the UNet's target output should be under this DDPM scheduler: the random noise itself, or the noise scaled for the specific timestep? In other code I have seen, the random noise is used as the model's target, but my model only works when I use the latter.
Highly appreciate your help!
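For what it's worth, in standard DDPM training the two views coincide: the target is the random noise eps that was sampled, and the timestep only controls how strongly that noise is mixed into the noisy latent the UNet sees. Below is a minimal sketch of one training step under that assumption; the `unet(x_t, context, t)` call signature and the `alphas_cumprod` buffer are illustrative, not this repo's exact API.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(unet, x0, context, alphas_cumprod):
    # Sample one timestep and one Gaussian noise tensor per example.
    b = x0.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,))
    eps = torch.randn_like(x0)
    # Forward-diffuse x0 to x_t: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    abar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps
    # The regression target is the very eps that was sampled; the scheduler
    # only decides how much of it appears in x_t at timestep t.
    eps_pred = unet(x_t, context, t)
    return F.mse_loss(eps_pred, eps)
```

So "random noise as the output" and "noise at the specific timestep" describe the same quantity; the timestep scaling lives in the construction of x_t, not in the target.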
How can I use the code on multiple GPUs using torch.nn.DataParallel? It runs out of memory on a single GPU, and the kernel dies when inferring on CPU.
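A minimal sketch of wrapping a module in `torch.nn.DataParallel`, assuming the module to parallelize is the UNet (the `nn.Linear` below is just a stand-in). Note that `DataParallel` replicates the model on every GPU, so it helps with batch throughput but not with a model that doesn't fit on one device; lower resolution or half precision may be needed for the memory issue itself.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for e.g. the UNet module
if torch.cuda.device_count() > 1:
    # Replicates the module on each visible GPU and splits the
    # batch dimension of the input across them.
    model = nn.DataParallel(model)
    model.to("cuda")

x = torch.randn(4, 8)
y = model(x)
```

One caveat: saving a `DataParallel`-wrapped model prefixes every state_dict key with `module.`, so strip that prefix (or save `model.module.state_dict()`) before loading into an unwrapped model.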
Hi!
Can you upload the PDF slides?
I watched all your videos and followed along; it took about 5 days, it was very fun, and I appreciate you!
Now I wonder how to train this model.
I also watched another of your videos, "How diffusion models work - explanation and code!".
That is also a very useful and great video, thank you again!!
That video was about how to train the UNet (diffusion model) for latent denoising.
But we have four major models here:
the VAE encoder, the VAE decoder, the UNet, and CLIP.
If we want to train the UNet (diffusion model) as in the diffusion-training video,
do we freeze the other models and train only the UNet?
However, I don't fully understand how the training setup is defined.
For example, if we want to create image B in a specific style from image A (image A -> styled image B),
where should I feed image A (or random noise) as the input and styled image B as the target, respectively?
Inference looks like this, but I don't know how to set it up in the training phase:
A (or random) -> VAE-encode -> [ z, clip-emb, time-emb -> unet -> z ] * loop -> VAE-decode -> B
It is also unclear whether the CLIP embedding should be left blank, random, or a specific text prompt,
or whether I should feed image A into the CLIP embedding.
I have searched YouTube for how people train Stable Diffusion models, and most videos use DreamBooth.
That again looks very high-level, like Hugging Face.
I would like to know the exact concepts and what happens under the hood.
Thanks to your video and code I could understand the Stable Diffusion DDPM model, but I want to expand into the training side.
Thank you for the amazing work!
Happy new year!
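On the freezing question: yes, the usual Stable Diffusion training/fine-tuning recipe keeps the VAE and CLIP fixed and optimizes only the UNet. A minimal sketch, with `nn.Linear` stand-ins for the four components (the real modules have different constructors):

```python
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    # Disable gradients and switch to eval mode so the module acts
    # as a fixed feature extractor during UNet training.
    for p in module.parameters():
        p.requires_grad = False
    module.eval()

# Illustrative stand-ins for the four components.
vae_encoder, vae_decoder, clip, unet = (nn.Linear(4, 4) for _ in range(4))
for frozen in (vae_encoder, vae_decoder, clip):
    freeze(frozen)

# Only the UNet's parameters go to the optimizer.
trainable = [p for p in unet.parameters() if p.requires_grad]
```

For the style question: in the standard text-to-image objective, the "input" is a noised VAE latent of the target image (styled B) and the conditioning is the CLIP embedding of its text caption; image A only enters at inference, via img2img-style initialization. Conditioning directly on image A would require an image-conditioned variant, which is a different setup.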
The prompt is a sentence, and we don't need to predict the next token in the prompt, so is there a reason to prevent each token from seeing the tokens to its right?
x = self.attention(x, causal_mask=True)
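For context: CLIP's text transformer was pretrained with causal (left-to-right) masking, so inference must apply the same mask to reproduce the embeddings the pretrained weights expect, even though no next-token prediction happens here. A sketch of what `causal_mask=True` typically does inside attention (function and shapes are illustrative):

```python
import torch

def apply_causal_mask(scores: torch.Tensor) -> torch.Tensor:
    # scores: (batch, heads, seq_len, seq_len) attention logits.
    # Mask the strictly upper-triangular positions (future tokens)
    # with -inf so softmax gives them zero attention weight.
    seq_len = scores.size(-1)
    mask = torch.ones(seq_len, seq_len, dtype=torch.bool).triu(diagonal=1)
    return scores.masked_fill(mask, float("-inf"))

scores = torch.zeros(1, 1, 4, 4)
weights = apply_causal_mask(scores).softmax(dim=-1)
```

So the mask is about matching the pretraining setup, not about hiding anything the model "shouldn't" know at inference.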
In fact, I faced this problem when I ran the demo; it seems the converted keys cannot be found. What should I do?
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VAE_Encoder:
Unexpected key(s) in state_dict: "1.groupnorm_1.weight", "1.groupnorm_1.bias", "1.conv_1.weight", "1.conv_1.bias", "1.groupnorm_2.weight", "1.groupnorm_2.bias", "1.conv_2.weight", "1.conv_2.bias", "2.groupnorm_1.weight", "2.groupnorm_1.bias", "2.conv_1.weight", "2.conv_1.bias", "2.groupnorm_2.weight", "2.groupnorm_2.bias", "2.conv_2.weight", "2.conv_2.bias", "4.groupnorm_1.weight", "4.groupnorm_1.bias", "4.conv_1.weight", "4.conv_1.bias", "4.groupnorm_2.weight", "4.groupnorm_2.bias", "4.conv_2.weight", "4.conv_2.bias", "5.groupnorm_1.weight", "5.groupnorm_1.bias", "5.conv_1.weight", "5.conv_1.bias", "5.groupnorm_2.weight", "5.groupnorm_2.bias", "5.conv_2.weight", "5.conv_2.bias", "7.groupnorm_1.weight", "7.groupnorm_1.bias", "7.conv_1.weight", "7.conv_1.bias", "7.groupnorm_2.weight", "7.groupnorm_2.bias", "7.conv_2.weight", "7.conv_2.bias", "8.groupnorm_1.weight", "8.groupnorm_1.bias", "8.conv_1.weight", "8.conv_1.bias", "8.groupnorm_2.weight", "8.groupnorm_2.bias", "8.conv_2.weight", "8.conv_2.bias", "10.groupnorm_1.weight", "10.groupnorm_1.bias", "10.conv_1.weight", "10.conv_1.bias", "10.groupnorm_2.weight", "10.groupnorm_2.bias", "10.conv_2.weight", "10.conv_2.bias", "11.groupnorm_1.weight", "11.groupnorm_1.bias", "11.conv_1.weight", "11.conv_1.bias", "11.groupnorm_2.weight", "11.groupnorm_2.bias", "11.conv_2.weight", "11.conv_2.bias", "12.groupnorm_1.weight", "12.groupnorm_1.bias", "12.conv_1.weight", "12.conv_1.bias", "12.groupnorm_2.weight", "12.groupnorm_2.bias", "12.conv_2.weight", "12.conv_2.bias", "14.groupnorm_1.weight", "14.groupnorm_1.bias", "14.conv_1.weight", "14.conv_1.bias", "14.groupnorm_2.weight", "14.groupnorm_2.bias", "14.conv_2.weight", "14.conv_2.bias".
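An "Unexpected key(s) in state_dict" error usually means the checkpoint's key names were produced for a different module layout than the model being loaded, e.g. raw Stable Diffusion weights that were not passed through the repo's converter script, or keys carrying a `module.` prefix from DataParallel. A small diagnostic sketch, assuming you want to see both sides of the mismatch before deciding how to remap:

```python
import torch

def report_key_mismatch(model, state_dict):
    # Compare the checkpoint's keys with the model's own keys instead
    # of letting load_state_dict raise, so the naming difference
    # between the two layouts becomes visible.
    model_keys = set(model.state_dict())
    ckpt_keys = set(state_dict)
    missing = sorted(model_keys - ckpt_keys)      # model wants, ckpt lacks
    unexpected = sorted(ckpt_keys - model_keys)   # ckpt has, model lacks
    return missing, unexpected

# Toy example: checkpoint keys use a different layer index than the model.
model = torch.nn.Sequential(torch.nn.Linear(2, 2))
ckpt = {"1.weight": torch.zeros(2, 2)}
missing, unexpected = report_key_mismatch(model, ckpt)
```

Once the pattern is clear (an offset index, a prefix, a renamed submodule), the keys can be renamed into a new dict before calling `load_state_dict`.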
Hey Umar, I'm really grateful for your YouTube video and GitHub repo for this Stable Diffusion model. If it's not too much to ask, is there any way I can talk to you on Discord about some errors I'm getting?
Thanks a lot for your YouTube channel and the code!!! But I have a question about in-painting:
you said there would be a combination in latent space, but actually there isn't.
In https://github.com/CompVis/latent-diffusion, they just show a pre-trained model for in-painting... and I have checked the code: there is no combining in latent space. It seems they just train a model specifically for in-painting and that's it.
Any response will be appreciated!!!!
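Both approaches exist: latent-diffusion ships a model trained specifically for inpainting, while the "combination in latent space" refers to a training-free trick (used e.g. in RePaint-style sampling) where, at each denoising step, the known region of the latent is overwritten with a re-noised copy of the original latent. A minimal sketch of that blend, with illustrative names:

```python
import torch

def blend_known_region(x_t, x0_latent, mask, abar_t):
    # mask: 1 where the region should be regenerated,
    #       0 where the original image must be kept.
    # Re-noise the original latent to the current noise level so it is
    # statistically consistent with the partially denoised x_t.
    eps = torch.randn_like(x0_latent)
    known_t = abar_t.sqrt() * x0_latent + (1.0 - abar_t).sqrt() * eps
    return mask * x_t + (1.0 - mask) * known_t
```

With this blend applied after every sampler step, even a plain text-to-image checkpoint can inpaint, at some quality cost compared to a model trained for inpainting.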