Comments (8)
I haven't tested this code with less than 16GB of GPU memory, but this is a bit surprising since each model is roughly 400M parameters and therefore around 800MB of memory.
One suggestion: try loading the checkpoint on CPU, and then moving to GPU, like so:
options_up = model_and_diffusion_defaults_upsampler()
options_up['use_fp16'] = has_cuda
options_up['timestep_respacing'] = 'fast27' # use 27 diffusion steps for very fast sampling
model_up, diffusion_up = create_model_and_diffusion(**options_up)
model_up.load_state_dict(load_checkpoint('upsample', th.device('cpu')))
model_up.eval()
if has_cuda:
    model_up.convert_to_fp16()
model_up.to(device)
print('total upsampler parameters', sum(x.numel() for x in model_up.parameters()))
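The 800MB figure above is just parameter count times bytes per parameter. A minimal sketch of that arithmetic, assuming fp16 weights (2 bytes each) and 1 MB = 10^6 bytes; `param_memory_mb` is a hypothetical helper, not part of glide_text2im:

```python
def param_memory_mb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in MB (1 MB = 10**6 bytes).

    bytes_per_param is 2 for fp16 and 4 for fp32.
    """
    return num_params * bytes_per_param / 1e6


print(param_memory_mb(400_000_000))     # fp16 weights -> 800.0
print(param_memory_mb(400_000_000, 4))  # fp32 weights -> 1600.0
```

This only counts the weights. Activations and intermediate buffers during sampling come on top of it and grow with the batch size, so the real footprint is larger.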
Amazing, thanks! In fact, I read: total upsampler parameters 398361286.
Using the CPU trick, it worked with the 4GB GTX 1050. Also, it took only a few seconds to generate this curious dog.
Maybe this approach could be a guideline in the docs...
from glide-text2im.
I still get the following error even when trying the above code:
RuntimeError: CUDA out of memory. Tried to allocate 5.27 GiB (GPU 0; 11.77 GiB total capacity; 6.51 GiB already allocated; 1.50 GiB free; 7.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
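The error message itself points at `max_split_size_mb`. A minimal sketch of setting it from Python, assuming the script has not imported torch yet; the value 128 is only an illustrative guess, not a recommendation from the GLIDE authors:

```python
import os

# The allocator reads this variable when CUDA is first initialised,
# so set it at the very top of the script, before `import torch`.
# 128 is an arbitrary example value; tune it for your GPU.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This can only help with fragmentation (reserved memory much larger than allocated memory). It will not make a 5.27 GiB allocation fit if the GPU simply does not have that much free memory.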
from glide-text2im.
I ran into the same issue on a 4GB GPU. Oddly enough, loading the upsample model before the base model worked for me with no other changes.
from glide-text2im.
@kgullion thanks for your suggestion. But I have tried loading the upsample model before the base model, and I still get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 1.32 GiB (GPU 0; 11.77 GiB total capacity; 4.32 GiB already allocated; 540.94 MiB free; 5.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Note that this error only occurs when I run the code with a larger batch size, e.g. >50.
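Since the error only shows up at large batch sizes, one possible workaround is to generate the images in several smaller batches instead of one big one. A minimal sketch of the splitting step; `chunk_sizes` is a hypothetical helper, and the limit of 16 is just an example of a batch size that fits:

```python
from typing import Iterator


def chunk_sizes(total: int, max_chunk: int) -> Iterator[int]:
    """Yield batch sizes no larger than max_chunk that sum to total."""
    if total < 0 or max_chunk <= 0:
        raise ValueError("total must be >= 0 and max_chunk must be > 0")
    remaining = total
    while remaining > 0:
        size = min(max_chunk, remaining)
        yield size
        remaining -= size


print(list(chunk_sizes(50, 16)))  # [16, 16, 16, 2]
```

Each yielded size would then be used as `batch_size` for one full pass through the base and upsampler sampling loops, saving the images before starting the next chunk so that earlier results do not stay on the GPU.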
from glide-text2im.
It looks like you had a 12-GB-VRAM GPU, while OP managed to work with a 4-GB-VRAM GPU.
Try clearing the memory, e.g. by restarting the Colab session if you use Google Colab.
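For a local run without Colab, clearing memory from inside the script could look roughly like the sketch below. `free_gpu_memory` is a hypothetical helper; the guarded import is only there so the snippet also runs on a machine without PyTorch installed:

```python
import gc

try:
    import torch as th
except ImportError:  # allows the sketch to run where PyTorch is missing
    th = None


def free_gpu_memory() -> bool:
    """Collect garbage, then release PyTorch's cached CUDA blocks.

    Returns True if the CUDA cache was emptied, False if no CUDA
    device (or no PyTorch) is available.
    """
    gc.collect()
    if th is not None and th.cuda.is_available():
        th.cuda.empty_cache()
        return True
    return False


print(free_gpu_memory())
```

Note that `empty_cache()` only returns memory that no live tensor is using. Anything still referenced (for example `samples` or a model left on the GPU) has to be deleted with `del` or moved to the CPU first.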
from glide-text2im.
Hi @woctezuma. Thank you for your suggestion. I am running the code locally, and I am not sure how to clear the memory from within the code.
Also one thing to note, I am using the 'Inpaint' script:
from typing import Tuple
from PIL import Image
from datetime import datetime
import numpy as np
import torch as th
import torch.nn.functional as F
from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
    model_and_diffusion_defaults_upsampler
)

# Returns a single [1, 3, size, size] float tensor scaled to [-1, 1].
def read_image(path: str, size: int = 256) -> th.Tensor:
    pil_img = Image.open(path).convert('RGB')
    pil_img = pil_img.resize((size, size), resample=Image.BICUBIC)
    img = np.array(pil_img)
    return th.from_numpy(img)[None].permute(0, 3, 1, 2).float() / 127.5 - 1
#########################################
# Sampling parameters
prompt = "a evil cyborg"
batch_size = 16
guidance_scale = 5.0
# Tune this parameter to control the sharpness of 256x256 images.
# A value of 1.0 is sharper, but sometimes results in grainy artifacts.
upsample_temp = 0.997
# Source image we are inpainting
source_image_256 = read_image('notebooks/img_219.png', size=256)
source_image_64 = read_image('notebooks/img_219.png', size=64)
# The mask should always be a boolean 64x64 mask, and then we
# can upsample it for the second stage.
source_mask_64 = th.ones_like(source_image_64)[:, :1]
source_mask_64[:, :, 20:] = 0
source_mask_256 = F.interpolate(source_mask_64, (256, 256), mode='nearest')
#########################################
#########################################
# This notebook supports both CPU and GPU.
# On CPU, generating one sample may take on the order of 20 minutes.
# On a GPU, it should be under a minute.
has_cuda = th.cuda.is_available()
device = th.device('cpu' if not has_cuda else 'cuda')
# Make a filename
xprompt = prompt.replace(" ", "_")[:] + "-gs_" + str(guidance_scale)
# Create base model.
options = model_and_diffusion_defaults()
options['inpaint'] = True
options['use_fp16'] = has_cuda
options['timestep_respacing'] = '100' # use 100 diffusion steps for fast sampling
model, diffusion = create_model_and_diffusion(**options)
model.eval()
if has_cuda:
    model.convert_to_fp16()
model.to(device)
model.load_state_dict(load_checkpoint('base-inpaint', device))
print('total base parameters', sum(x.numel() for x in model.parameters()))
# Create upsampler model.
options_up = model_and_diffusion_defaults_upsampler()
options_up['inpaint'] = True
options_up['use_fp16'] = has_cuda
options_up['timestep_respacing'] = 'fast27' # use 27 diffusion steps for very fast sampling
model_up, diffusion_up = create_model_and_diffusion(**options_up)
model_up.eval()
if has_cuda:
    model_up.convert_to_fp16()
model_up.to(device)
model_up.load_state_dict(load_checkpoint('upsample-inpaint', device))
print('total upsampler parameters', sum(x.numel() for x in model_up.parameters()))
def save_images(batch: th.Tensor):
    """ Save images """
    scaled = ((batch + 1) * 127.5).round().clamp(0, 255).to(th.uint8).cpu()
    reshaped = scaled.permute(2, 0, 3, 1).reshape([batch.shape[2], -1, 3])
    # Save strip
    stamp = datetime.today().strftime('%H%M%S')
    Image.fromarray(reshaped.numpy()).save(f'output-{stamp}.png')
    # Save individual
    for i in range(0, batch.shape[0]):
        test_single = scaled.select(0, i)
        test_reshape = test_single.permute(1, 2, 0).reshape([batch.shape[2], -1, 3])
        Image.fromarray(test_reshape.numpy()).save(f'{xprompt}-{i}-{stamp}.png')
# Visualise the image we are inpainting - if you want to, uncomment
# save_images(source_image_256 * source_mask_256)
##############################
# Sample from the base model #
##############################
# Create the text tokens to feed to the model.
tokens = model.tokenizer.encode(prompt)
tokens, mask = model.tokenizer.padded_tokens_and_mask(
    tokens, options['text_ctx']
)
# Create the classifier-free guidance tokens (empty)
full_batch_size = batch_size * 2
uncond_tokens, uncond_mask = model.tokenizer.padded_tokens_and_mask(
    [], options['text_ctx']
)
# Pack the tokens together into model kwargs.
model_kwargs = dict(
    tokens=th.tensor(
        [tokens] * batch_size + [uncond_tokens] * batch_size, device=device
    ),
    mask=th.tensor(
        [mask] * batch_size + [uncond_mask] * batch_size,
        dtype=th.bool,
        device=device,
    ),
    # Masked inpainting image
    inpaint_image=(source_image_64 * source_mask_64).repeat(full_batch_size, 1, 1, 1).to(device),
    inpaint_mask=source_mask_64.repeat(full_batch_size, 1, 1, 1).to(device),
)
# Create a classifier-free guidance sampling function
def model_fn(x_t, ts, **kwargs):
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    model_out = model(combined, ts, **kwargs)
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)
def denoised_fn(x_start):
    # Force the model to have the exact right x_start predictions
    # for the part of the image which is known.
    return (
        x_start * (1 - model_kwargs['inpaint_mask'])
        + model_kwargs['inpaint_image'] * model_kwargs['inpaint_mask']
    )
# Sample from the base model.
model.del_cache()
samples = diffusion.p_sample_loop(
    model_fn,
    (full_batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=None,
    denoised_fn=denoised_fn,
)[:batch_size]
model.del_cache()
# 64x64 output - not worth saving, but uncomment if you like!
# save_images(samples)
##############################
# Upsample the 64x64 samples #
##############################
tokens = model_up.tokenizer.encode(prompt)
tokens, mask = model_up.tokenizer.padded_tokens_and_mask(
    tokens, options_up['text_ctx']
)
# Create the model conditioning dict.
model_kwargs = dict(
    # Low-res image to upsample.
    low_res=((samples + 1) * 127.5).round() / 127.5 - 1,
    # Text tokens
    tokens=th.tensor(
        [tokens] * batch_size, device=device
    ),
    mask=th.tensor(
        [mask] * batch_size,
        dtype=th.bool,
        device=device,
    ),
    # Masked inpainting image.
    inpaint_image=(source_image_256 * source_mask_256).repeat(batch_size, 1, 1, 1).to(device),
    inpaint_mask=source_mask_256.repeat(batch_size, 1, 1, 1).to(device),
)
def denoised_fn(x_start):
    # Force the model to have the exact right x_start predictions
    # for the part of the image which is known.
    return (
        x_start * (1 - model_kwargs['inpaint_mask'])
        + model_kwargs['inpaint_image'] * model_kwargs['inpaint_mask']
    )
# Sample from the upsampler model.
model_up.del_cache()
up_shape = (batch_size, 3, options_up["image_size"], options_up["image_size"])
up_samples = diffusion_up.p_sample_loop(
    model_up,
    up_shape,
    noise=th.randn(up_shape, device=device) * upsample_temp,
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=None,
    denoised_fn=denoised_fn,
)[:batch_size]
model_up.del_cache()
# Show the output
save_images(up_samples)
Setting the batch size above 16 results in an OOM error.
Here is my nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:43:00.0  On |                  N/A |
|  0%   44C    P0    34W / 170W |   1100MiB / 12288MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3555      G   /usr/lib/xorg/Xorg                643MiB |
|    0   N/A  N/A      4427      G   /usr/bin/gnome-shell              185MiB |
|    0   N/A  N/A      9394      G   ...275038181146881444,131072       23MiB |
|    0   N/A  N/A     14262      G   telegram-desktop                    2MiB |
|    0   N/A  N/A     14661      G   /usr/lib/firefox/firefox          204MiB |
|    0   N/A  N/A     16530      G   gnome-control-center                2MiB |
|    0   N/A  N/A     16806      G   .../debug.log --shared-files       21MiB |
|    0   N/A  N/A     24302      G   ..._24179.log --shared-files       13MiB |
+-----------------------------------------------------------------------------+
from glide-text2im.
@kgullion Thanks, I will try your suggestion!
from glide-text2im.