Comments (6)
I cannot reproduce the results shown in Figure 1 of the paper with the same text prompt.
Unfortunately, this is normal, because the publicly available model:
- is smaller, as it has roughly 10x fewer parameters,
- was trained on a filtered dataset.
You should get outputs similar to the third row of Figure 9.
From a user perspective, the main benefit of GLIDE is that it is much faster than the CLIP-guided methods which I have tried so far.
Is the base the only checkpoint available for the base diffusion model?
I think so. From what I can see in the code below, there are 6 checkpoints:
- two for classifier-free guidance (sampling and upsampling),
- two for inpainting (sampling and upsampling),
- two for CLIP (text encoding and image encoding).
See glide-text2im/glide_text2im/download.py, lines 10 to 17 at commit 742510e, for the list of checkpoint URLs; a loading sketch follows below.
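For context, here is a minimal sketch of how one of these checkpoints is loaded, following the repo's text2im.ipynb notebook; treat the option values and the checkpoint names in the comment as my reading of the code rather than something canonical:

import torch as th

from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

has_cuda = th.cuda.is_available()
device = th.device("cuda" if has_cuda else "cpu")

# Create the base model and diffusion process, then load the "base" checkpoint.
# The other checkpoint names should be "upsample", "base-inpaint",
# "upsample-inpaint", "clip/image-enc" and "clip/text-enc" (my reading of
# download.py; double-check against the file referenced above).
options = model_and_diffusion_defaults()
options["use_fp16"] = has_cuda
options["timestep_respacing"] = "100"  # use 100 diffusion steps for fast sampling
model, diffusion = create_model_and_diffusion(**options)
model.eval()
if has_cuda:
    model.convert_to_fp16()
model.to(device)
model.load_state_dict(load_checkpoint("base", device))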
I see the following nice commits:
- 146bd9c "add install command to notebooks" -> git and pip install commands at the start of the notebooks,
- f468908 "add colab links" -> Colab badges for the links in the README,
- 9cc8e56 "colab GPU backend" -> GPU backend toggled on.
I can see that the sampling part is slightly different from yours, as it adds the model_fn function to the sample loop. Is this related to the fact that they just do classifier-free guidance (cond_fn=None) rather than CLIP guidance like in your Colab?
To clarify any confusion:
- when cond_fn is not None, I assume you are looking at the CLIP-guided approach: https://github.com/openai/glide-text2im/blob/main/notebooks/clip_guided.ipynb
- the notebook linked in my first post uses classifier-free guidance, with cond_fn=None, and is copied from: https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb

Unless I am missing something, the model_fn function is added to the sample loop in both notebooks called text2im.ipynb.
# Sample from the base model.
model.del_cache()
samples = diffusion.p_sample_loop(
model_fn,
(full_batch_size, 3, options["image_size"], options["image_size"]),
device=device,
clip_denoised=True,
progress=True,
model_kwargs=model_kwargs,
cond_fn=None,
)[:batch_size]
model.del_cache()
Also, I have tried to combine the last two, and the results seem to be better, as if CLIP guidance introduced too much randomness for the small model. Any idea why?
I need to see the diff of what you did to understand better.
I would be glad to test this and see the results, if they are better. :) The black cat with white paws looks nice. 👍
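In the meantime, my guess is that the combination looks something like the following sketch, which passes both the classifier-free model_fn and the CLIP cond_fn to the same sample loop (an untested assumption on my part, not code from the repo):

# Hypothetical combination of classifier-free and CLIP guidance.
# Note the prompt is repeated full_batch_size times, since model_fn
# expects a doubled (conditional + unconditional) batch.
cond_fn = clip_model.cond_fn([prompt] * full_batch_size, guidance_scale)
samples = diffusion.p_sample_loop(
    model_fn,  # classifier-free guidance wrapper
    (full_batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=cond_fn,  # CLIP guidance applied on top
)[:batch_size]  # keep only the conditioned half of the batch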
Thanks! I have two versions. This one:
samples = diffusion.p_sample_loop(
model,
(batch_size, 3, options["image_size"], options["image_size"]),
device=device,
clip_denoised=True,
progress=True,
model_kwargs=model_kwargs,
cond_fn=cond_fn,
)
where
cond_fn = clip_model.cond_fn([prompt] * batch_size, guidance_scale)
and this one, from the latest Colab in the repo:
samples = diffusion.p_sample_loop(
model_fn,
(full_batch_size, 3, options["image_size"], options["image_size"]),
device=device,
clip_denoised=True,
progress=True,
model_kwargs=model_kwargs,
cond_fn=None,
)[:batch_size]
with cond_fn=None and model_fn defined as:
def model_fn(x_t, ts, **kwargs):
    # The batch is doubled: first half conditioned on the prompt,
    # second half unconditioned (empty prompt).
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    model_out = model(combined, ts, **kwargs)
    # The first 3 output channels are the noise prediction, the rest the variance.
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    # Classifier-free guidance: extrapolate from the unconditional prediction
    # towards the conditional one.
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)
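For context, this model_fn implements classifier-free guidance: the batch is doubled so the model predicts noise for the prompt-conditioned and unconditioned halves in a single forward pass, and the two predictions are combined as eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps). With guidance_scale = 1, this reduces to the conditional prediction; larger values push samples closer to the text prompt, usually trading diversity for fidelity.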
@woctezuma thanks!!! Is the base the only checkpoint available for the base diffusion model? I cannot reproduce the results shown in Figure 1 of the paper with the same text prompt.
In the references, I can also see CLIP-guided diffusion models for both 256x256 and 512x512:
Crowson, K. CLIP guided diffusion HQ 256x256. https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj, 2021a.
Crowson, K. CLIP guided diffusion 512x512, secondary model method. https://twitter.com/RiversHaveWings/status/1462859669454536711, 2021b.
@woctezuma thanks! I can see that the sampling part is slightly different from yours, as it adds the model_fn function to the sample loop. Is this related to the fact that they just do classifier-free guidance (cond_fn=None) rather than CLIP guidance like in your Colab? Also, I have tried to combine the last two, and the results seem to be better, as if CLIP guidance introduced too much randomness for the small model. Any idea why?
# Create the text tokens to feed to the model.
tokens = model.tokenizer.encode(prompt)
tokens, mask = model.tokenizer.padded_tokens_and_mask(
tokens, options['text_ctx']
)
# Create the classifier-free guidance tokens (empty)
full_batch_size = batch_size * 2
uncond_tokens, uncond_mask = model.tokenizer.padded_tokens_and_mask(
[], options['text_ctx']
)
# Pack the tokens together into model kwargs.
model_kwargs = dict(
tokens=th.tensor(
[tokens] * batch_size + [uncond_tokens] * batch_size, device=device
),
mask=th.tensor(
[mask] * batch_size + [uncond_mask] * batch_size,
dtype=th.bool,
device=device,
),
)
# Create a classifier-free guidance sampling function
def model_fn(x_t, ts, **kwargs):
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    model_out = model(combined, ts, **kwargs)
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)
# Sample from the base model.
model.del_cache()
samples = diffusion.p_sample_loop(
model_fn,
(full_batch_size, 3, options["image_size"], options["image_size"]),
device=device,
clip_denoised=True,
progress=True,
model_kwargs=model_kwargs,
cond_fn=None,
)[:batch_size]
model.del_cache()
# Show the output
show_images(samples)
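Note that prompt, batch_size and guidance_scale are defined earlier in the notebook; if I recall correctly, text2im.ipynb uses guidance_scale = 3.0 for the base model.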