
NF4Tensor uses 8 bits of memory (pytorch/ao, closed, 7 comments)

cuichenx commented on June 2, 2024
NF4Tensor uses 8 bits of memory


Comments (7)

drisspg commented on June 2, 2024

I am not able to replicate this:

import torch
import torchao
from pathlib import Path
import logging
logging.basicConfig(level=logging.INFO)

from transformer_nuggets.utils.benchmark import save_memory_snapshot
print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")
with save_memory_snapshot(Path("nf4_memory")):
    original = torch.rand([1024, 4096], dtype=torch.bfloat16, device="cuda")
    print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")
    t4 = torchao.dtypes.nf4tensor.NF4Tensor.from_tensor(original, 64, 256)  # block_size=64, scaler_block_size=256
    del original
    for _ in range(10):
        a = torch.empty(4096, dtype=torch.bfloat16, device="cuda")
    print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")

Produces:

Memory allocated: 0, Memory Reserved: 0
Memory allocated: 8388608, Memory Reserved: 20971520
Memory allocated: 2302976, Memory Reserved: 90177536

The final memory allocated by the NF4Tensor is 2302976 bytes:
(2302976 * 8) / (1024 * 4096) ≈ 4.39 bits/param
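For context, that figure is in line with a rough back-of-the-envelope estimate of what NF4 storage should take for a 1024 x 4096 tensor quantized with block_size=64 and scaler_block_size=256. The layout details below (int8 block scalers, a bf16 code book, double-quantized scaler factors) are assumptions used only to show the order of magnitude, not a description of the exact implementation:

n = 1024 * 4096                        # number of parameters
packed_data = n // 2                   # 4-bit codes, two per byte
block_scalers = n // 64                # assumed: one int8 scaler per 64-element block
scaler_factors = (n // 64) // 256 * 2  # assumed: one bf16 factor per 256 scalers (double quantization)
code_book = 16 * 2                     # assumed: 16-entry bf16 NF4 lookup table

total = packed_data + block_scalers + scaler_factors + code_book
print(total, total * 8 / n)            # ~2.16 MB, ~4.1 bits/param before allocator rounding

The measured 2302976 bytes (~4.39 bits/param) is a bit higher, which is consistent with allocator rounding plus the small throwaway allocations from the loop above.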


cuichenx commented on June 2, 2024

Hi @drisspg, thanks for the prompt reply! I can get the same result as you using your script. I found that if the NF4 tensor is initialized from a CPU tensor and then moved to the GPU, the memory usage comes out to 8 bits per parameter.
Also, calling .cuda() the first time doesn't seem to have any effect?

import torch
import torchao
from pathlib import Path
import logging
logging.basicConfig(level=logging.INFO)

# from transformer_nuggets.utils.benchmark import save_memory_snapshot
print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")
# with save_memory_snapshot(Path("nf4_memory")):
original = torch.rand([1024, 4096], dtype=torch.bfloat16)
print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")
t4 = torchao.dtypes.nf4tensor.NF4Tensor.from_tensor(original, 64, 256)
del original

t4 = t4.cuda()  # first .cuda() call
print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")
t4 = t4.cuda()  # second .cuda() call (only this one changes allocated memory below)
print(f"Memory allocated: {torch.cuda.memory_allocated()}, Memory Reserved: {torch.cuda.memory_reserved()}")

Produces:

Memory allocated: 0, Memory Reserved: 0
Memory allocated: 0, Memory Reserved: 0
Memory allocated: 0, Memory Reserved: 0
Memory allocated: 4194304, Memory Reserved: 20971520
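For reference, those numbers work out as follows (assuming nothing else is resident on the GPU): the CPU -> GPU path ends up at exactly 8 bits per parameter, which is the behaviour in the issue title, versus the ~4.39 bits per parameter measured when quantizing directly on the GPU:

params = 1024 * 4096
print(4194304 * 8 / params)  # 8.0 bits/param after moving the CPU-initialized NF4Tensor to GPU
print(2302976 * 8 / params)  # ~4.39 bits/param when quantizing directly on the GPU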


msaroufim commented on June 2, 2024

Yeah, I think I broke this:

# get_original_weight() reconstructs the full bf16 tensor, so this to() override
# returns a dense dequantized tensor instead of moving the packed NF4 buffers:
return args[0][0].get_original_weight().to(args[1]["dtype"]).to(args[1]["device"])


drisspg commented on June 2, 2024

Ahh, you are right, the problem was in the .to() implementation, but I think that has since been resolved on main: https://github.com/pytorch/ao/blob/main/torchao/dtypes/nf4tensor.py#L899-L923
Previously you were actually just getting a bf16 tensor back silently, since the device move wasn't supported.


drisspg commented on June 2, 2024

Actually, one caveat: if you call t.cuda() this will end up returning you a full bf16 value, but if you call t.to("cuda") this should work as expected.
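A quick way to observe that difference is to check what comes back from each call; this is only a sketch against the snippet above, and the exact behaviour will depend on which torchao commit is installed:

import torch
from torchao.dtypes.nf4tensor import NF4Tensor

w = torch.rand(1024, 4096, dtype=torch.bfloat16)
t = NF4Tensor.from_tensor(w, 64, 256)

moved_to = t.to("cuda")  # expected on recent main: still an NF4Tensor (~4.4 bits/param)
moved_cuda = t.cuda()    # caveat above: may come back as a dense bf16 tensor

print(type(moved_to).__name__, type(moved_cuda).__name__)
print(f"Memory allocated: {torch.cuda.memory_allocated()}")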


cuichenx commented on June 2, 2024

Thanks for the comments!
Using a nightly release from 2024.4.26, the output from my snippet above is

Memory allocated: 0, Memory Reserved: 0
Memory allocated: 0, Memory Reserved: 0
Memory allocated: 8388608, Memory Reserved: 20971520
Memory allocated: 8388608, Memory Reserved: 20971520

and this happens for both .cuda() and .to('cuda').

The 8388608 bytes over 1024 * 4096 parameters works out to 16 bits per parameter, i.e. the size of a dense bf16 copy, so I think there's still a memory issue with moving an NF4 tensor from CPU to GPU.
Initializing the NF4 tensor directly on the GPU still produces the correct result.


drisspg commented on June 2, 2024

I think the update landed 2 days ago, so it likely isn't in that package.
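If it helps, one quick way to confirm which build is installed before rerunning the snippet is to print the torchao version; nightly builds typically embed the build date in the version string:

import torchao
print(torchao.__version__)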

