CUDA out of memory · big-sleep (14 comments, open)

hocermanc avatar hocermanc commented on August 14, 2024
CUDA out of memory

from big-sleep.

Comments (14)

WiseNat avatar WiseNat commented on August 14, 2024 2

To use this, PyTorch requires a decent amount of VRAM - probably around 8GB for preset one.

It seems like setting the image_size parameter in the Imagine constructor to either 128 or 256 helps lower the amount of memory being allocated.
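A sketch of what that looks like, assuming big_sleep is installed and a CUDA GPU is available (the exact values are illustrative, not tested defaults; the prompt is borrowed from a comment further down):

```python
# Hedged sketch: a smaller image_size (and fewer cutouts) lowers VRAM use.
settings = dict(
    text = "a pyramid made of ice",
    image_size = 256,   # 128 or 256 allocates far less than the full 512
    num_cutouts = 32,   # fewer cutouts also reduces memory
)

def run():
    # Imported lazily so the settings above can be inspected without a GPU.
    from big_sleep import Imagine
    Imagine(**settings)()
```

Calling run() starts generation; on a 6GB card, values in this ballpark are what the comments in this thread converge on.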


scambier avatar scambier commented on August 14, 2024 2

For those in need of a quick solution: write a Python script and use the num_cutouts option.

from big_sleep import Imagine

dream = Imagine(
    text = "a pyramid made of ice",
    lr = 5e-2,
    save_every = 25,
    save_progress = True,
    num_cutouts = 64 # 64 is ok for 6GB of video memory
)

dream()


Jackal-boop avatar Jackal-boop commented on August 14, 2024

Same error here.


WiseNat avatar WiseNat commented on August 14, 2024

I have a similar issue when running the command-line version - I haven't tested the library itself yet.

Traceback (most recent call last):
  File "c:\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python38\Scripts\dream.exe\__main__.py", line 7, in <module>
  File "c:\python38\lib\site-packages\big_sleep\cli.py", line 65, in main
    fire.Fire(train)
  File "c:\python38\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\python38\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "c:\python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\cli.py", line 62, in train
    imagine()
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\big_sleep.py", line 396, in forward
    self.model(self.encoded_texts["max"][0]) # one warmup step due to issue with CLIP and CUDA
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\big_sleep.py", line 216, in forward
    image_embed = perceptor.encode_image(into)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 519, in encode_image
    return self.visual(image.type(self.dtype))
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 410, in forward
    x = self.transformer(x)
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 381, in forward
    return self.resblocks(x)
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 369, in forward
    x = x + self.mlp(self.ln_2(x))
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 340, in forward
    ret = super().forward(x.type(torch.float32))
  File "c:\python38\lib\site-packages\torch\nn\modules\normalization.py", line 170, in forward
    return F.layer_norm(
  File "c:\python38\lib\site-packages\torch\nn\functional.py", line 2202, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.22 GiB already allocated; 10.82 MiB free; 4.38 GiB reserved in total by PyTorch)


NotNANtoN avatar NotNANtoN commented on August 14, 2024

Even better than reducing image_size is reducing num_cutouts, if you install the PR in #60. That way it runs faster and uses less memory, while still generating images at the same resolution.


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

It's still the same 512px² resolution, but if you set num_cutouts below 64, the result seems to have less... informational resolution? More blurry, basically. 64 seems fine, though. (Does it have to be a power of two, btw?)
Also, 64 and 128 produce wildly different results with torch_deterministic enabled - can you please tell me why?


NotNANtoN avatar NotNANtoN commented on August 14, 2024

Hi @LtqxWYEG. Do you refer to my comment about num_cutouts? The results should be different for sure, but in my experience the quality does not degrade too much. The whole process just gets a bit more unstable.


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

Hi @LtqxWYEG. Do you refer to my comment about num_cutouts? The results should be different for sure, but in my experience the quality does not degrade too much. The whole process just gets a bit more unstable.

Yea. I guess it depends on what you want it to imagine. Dogs seem to be the easiest. I tested with "magma flowing through a city" and "fractal mushrooms", and 16 never produced anything with much detail in it. 32 was still a little blob-like, but 64 was about as good as 128 imo.
(Does it have to be a power of two, btw?)


NotNANtoN avatar NotNANtoN commented on August 14, 2024

True, some texts are more difficult to create and thus need a larger num_cutouts.

It does not have to be a power of two. As far as I understand, 32 will be faster than 31 because powers of two split nicely into batches on a GPU, but something like 48 should be fine too. Feel free to benchmark the speed.
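A minimal, hedged harness for that kind of benchmark (pure stdlib; dummy_step is a stand-in you would replace with one actual big-sleep training step at a given num_cutouts):

```python
import time

def benchmark(step, iterations=101, repeats=3):
    """Run `step` `iterations` times, `repeats` times over; return the best wall-clock total."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            step()
        best = min(best, time.perf_counter() - start)
    return best

def dummy_step():
    # Stand-in workload; swap in a single big-sleep training iteration here.
    sum(i * i for i in range(10_000))

print(f"best of 3 runs, 101 iterations: {benchmark(dummy_step):.2f}s")
```

Taking the best of several repeats reduces noise from other processes; run it once per candidate num_cutouts value to compare.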


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

Feel free to benchmark the speed.

Yo: 32 = 43.59s; 33 = 48.87s; 48 = 48.13s; 64 = 52.67s (101 iterations, Tesla T4)
You're right that counts that don't align with powers of two are slower: 33 even comes out slightly slower than 48, and 48 sits time-wise exactly midway between 32 and 64.

People have said they want num_cutouts set permanently to 32. I'd say 48 is better.
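That "middle between 32 and 64" observation is exact, as a quick check of the reported numbers shows:

```python
# Timings reported above: seconds for 101 iterations on a Tesla T4.
t32, t48, t64 = 43.59, 48.13, 52.67
midpoint = round((t32 + t64) / 2, 2)
print(midpoint)  # 48.13 -- exactly the measured time for 48 cutouts
```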


NotNANtoN avatar NotNANtoN commented on August 14, 2024

@LtqxWYEG Quite interesting! It seems that 48 is a viable option indeed.

You said people want 32? Where do they say so? Am I missing out on a discussion somewhere?

I agree, though, that the default of 128 should definitely be reduced so that the 512 model fits into maybe even 6GB of VRAM.


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

@NotNANtoN yeah, it's here: #15 (comment)


joetomasone avatar joetomasone commented on August 14, 2024

Any options for a lowly 2GB of dedicated GPU memory? Can we use shared memory? None of the workarounds I've seen work for such a paltry amount of memory.


wolfgangmeyers avatar wolfgangmeyers commented on August 14, 2024

I think you need at least 4GB, and that is with minimal settings (`--num-cutouts=16 --image-size=128`).

