CUDA out of memory · big-sleep (14 comments, open)

hocermanc avatar hocermanc commented on August 14, 2024
CUDA out of memory

from big-sleep.

Comments (14)

WiseNat avatar WiseNat commented on August 14, 2024 2

To use this, PyTorch requires a decent amount of VRAM - probably around 8GB for preset one.

It seems like setting the image_size parameter in the Imagine constructor to either 128 or 256 helps lower the amount of memory being allocated.
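A sketch of what that looks like, assuming big_sleep is installed and a CUDA GPU is available (the exact values are illustrative, not tested defaults; the prompt is borrowed from a comment further down):

```python
# Hedged sketch: a smaller image_size (and fewer cutouts) lowers VRAM use.
settings = dict(
    text = "a pyramid made of ice",
    image_size = 256,   # 128 or 256 allocates far less than the full 512
    num_cutouts = 32,   # fewer cutouts also reduces memory
)

def run():
    # Imported lazily so the settings above can be inspected without a GPU.
    from big_sleep import Imagine
    Imagine(**settings)()
```

Calling run() starts generation; on a 6GB card, values in this ballpark are what the comments in this thread converge on.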


scambier avatar scambier commented on August 14, 2024 2

For those in need of a quick solution: write a Python script and use the num_cutouts option.

from big_sleep import Imagine

dream = Imagine(
    text = "a pyramid made of ice",
    lr = 5e-2,
    save_every = 25,
    save_progress = True,
    num_cutouts = 64 # 64 is ok for 6GB of video memory
)

dream()


Jackal-boop avatar Jackal-boop commented on August 14, 2024

Same error here.


WiseNat avatar WiseNat commented on August 14, 2024

I have a similar issue when running the command-line version - I haven't tested the library itself yet.

Traceback (most recent call last):
  File "c:\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python38\Scripts\dream.exe\__main__.py", line 7, in <module>
  File "c:\python38\lib\site-packages\big_sleep\cli.py", line 65, in main
    fire.Fire(train)
  File "c:\python38\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\python38\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "c:\python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\cli.py", line 62, in train
    imagine()
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\big_sleep.py", line 396, in forward
    self.model(self.encoded_texts["max"][0]) # one warmup step due to issue with CLIP and CUDA
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\big_sleep.py", line 216, in forward
    image_embed = perceptor.encode_image(into)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 519, in encode_image
    return self.visual(image.type(self.dtype))
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 410, in forward
    x = self.transformer(x)
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 381, in forward
    return self.resblocks(x)
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 369, in forward
    x = x + self.mlp(self.ln_2(x))
  File "c:\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python38\lib\site-packages\big_sleep\clip.py", line 340, in forward
    ret = super().forward(x.type(torch.float32))
  File "c:\python38\lib\site-packages\torch\nn\modules\normalization.py", line 170, in forward
    return F.layer_norm(
  File "c:\python38\lib\site-packages\torch\nn\functional.py", line 2202, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.22 GiB already allocated; 10.82 MiB free; 4.38 GiB reserved in total by PyTorch)


NotNANtoN avatar NotNANtoN commented on August 14, 2024

Even better than reducing image_size is reducing num_cutouts, if you install the PR in #60. That way it runs faster and uses less memory, while still generating images at the same resolution.


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

It's still the same 512px² resolution, but if you set num_cutouts below 64, the result seems to have less... informational resolution? More blurry, basically. 64 seems fine, though. (Does it have to be a power of two, btw?)
Also, 64 and 128 produce wildly different results with torch_deterministic enabled - can you please tell me why?


NotNANtoN avatar NotNANtoN commented on August 14, 2024

Hi @LtqxWYEG. Do you refer to my comment about num_cutouts? The results should be different for sure, but in my experience the quality does not degrade too much. The whole process just gets a bit more unstable.


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

Hi @LtqxWYEG. Do you refer to my comment about num_cutouts? The results should be different for sure, but in my experience the quality does not degrade too much. The whole process just gets a bit more unstable.

Yea. I guess it depends on what you want it to imagine. Dogs seem to be the easiest. I tested with "magma flowing through a city" and "fractal mushrooms", and 16 never produced anything with much detail in it. 32 was still a little blob-like, but 64 was about as good as 128 imo.
(Does it have to be a power of two, btw?)


NotNANtoN avatar NotNANtoN commented on August 14, 2024

True, some texts are more difficult to create and thus need a larger num_cutouts.

It does not have to be a power of two. As far as I understand, 32 will be faster than 31 because powers of two split nicely into batches on a GPU, but something like 48 should be fine too. Feel free to benchmark the speed.
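A minimal, hedged harness for that kind of benchmark (pure stdlib; dummy_step is a stand-in you would replace with one actual big-sleep training step at a given num_cutouts):

```python
import time

def benchmark(step, iterations=101, repeats=3):
    """Run `step` `iterations` times, `repeats` times over; return the best wall-clock total."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            step()
        best = min(best, time.perf_counter() - start)
    return best

def dummy_step():
    # Stand-in workload; swap in a single big-sleep training iteration here.
    sum(i * i for i in range(10_000))

print(f"best of 3 runs, 101 iterations: {benchmark(dummy_step):.2f}s")
```

Taking the best of several repeats reduces noise from other processes; run it once per candidate num_cutouts value to compare.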


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

Feel free to benchmark the speed.

Yo: 32 = 43.59s; 33 = 48.87s; 48 = 48.13s; 64 = 52.67s (101 iterations, Tesla T4)
You're right that counts that don't align with powers of two are slower: 33 even comes out slightly slower than 48, and 48 sits time-wise exactly midway between 32 and 64.

People have said they want num_cutouts set permanently to 32. I'd say 48 is better.
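That "middle between 32 and 64" observation is exact, as a quick check of the reported numbers shows:

```python
# Timings reported above: seconds for 101 iterations on a Tesla T4.
t32, t48, t64 = 43.59, 48.13, 52.67
midpoint = round((t32 + t64) / 2, 2)
print(midpoint)  # 48.13 -- exactly the measured time for 48 cutouts
```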


NotNANtoN avatar NotNANtoN commented on August 14, 2024

@LtqxWYEG Quite interesting! It seems that 48 is a viable option indeed.

You said people want 32? Where do they say so? Am I missing out on a discussion somewhere?

I agree, though, that the default of 128 should definitely be reduced so that the 512 model fits into maybe even 6GB of VRAM.


LtqxWYEG avatar LtqxWYEG commented on August 14, 2024

@NotNANtoN yeah, it's here: #15 (comment)


joetomasone avatar joetomasone commented on August 14, 2024

Any options for a lowly 2GB of dedicated GPU memory? Can we use shared memory? None of the workarounds I've seen work for such a paltry amount of memory.


wolfgangmeyers avatar wolfgangmeyers commented on August 14, 2024

I think you need at least 4GB, and that is with minimal settings (`--num-cutouts=16 --image-size=128`).

