
Comments (5)

rwightman commented on August 24, 2024

@EIFY did you try forcing non-reentrant checkpointing? We could look at changing the default if that works...

from open_clip.

rwightman commented on August 24, 2024

@kkjh0723 I think it might break with gradient checkpointing? Not sure there is a workaround; possibly non-reentrant mode?


EIFY commented on August 24, 2024

I got the same error when running with both --grad-checkpointing and --torchcompile. As the next best option, since PyTorch 2.1.0 --torchcompile now works with --accum-freq > 1.


EIFY commented on August 24, 2024

@rwightman No I haven't tried that.

In that regard, the good news concerns this TODO:

if self.grad_checkpointing and not torch.jit.is_scripting():
    # TODO: handle kwargs https://github.com/pytorch/pytorch/issues/79887#issuecomment-1161758372
    x = checkpoint(r, x, None, None, attn_mask)

pytorch/pytorch#79887 is now fixed, so we should be able to write, e.g.:

if self.grad_checkpointing and not torch.jit.is_scripting(): 
    x = checkpoint(r, x, None, None, attn_mask, use_reentrant=False)
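For illustration, here is a minimal sketch, assuming PyTorch 2.0 or newer, of what that fix enables: with use_reentrant=False, checkpoint forwards keyword arguments to the wrapped module, so attn_mask no longer has to be passed positionally behind None placeholders. ToyResidualBlock and forward_block are hypothetical stand-ins whose signature is modeled on the call above; they are not open_clip's actual residual block. The script checks that checkpointed gradients match a plain forward pass.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyResidualBlock(nn.Module):
    """Hypothetical stand-in for a residual attention block (not open_clip's code)."""

    def __init__(self, dim: int = 8, heads: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

    def forward(self, q_x, k_x=None, v_x=None, attn_mask=None):
        # Fall back to self-attention when k_x / v_x are omitted.
        k_x = q_x if k_x is None else k_x
        v_x = q_x if v_x is None else v_x
        x = q_x + self.attn(q_x, k_x, v_x, need_weights=False, attn_mask=attn_mask)[0]
        return x + self.mlp(x)


def forward_block(block, x, attn_mask, use_checkpoint):
    if use_checkpoint:
        # Non-reentrant checkpointing passes keyword arguments through to block.forward,
        # so no positional None placeholders are needed for k_x / v_x.
        return checkpoint(block, x, attn_mask=attn_mask, use_reentrant=False)
    return block(x, attn_mask=attn_mask)


torch.manual_seed(0)
block = ToyResidualBlock()
seq_len, dim = 4, 8
# Boolean causal mask: True marks key positions a query may NOT attend to.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

grads = {}
for use_ckpt in (False, True):
    block.zero_grad()
    # Re-seed the generator so both runs see the identical input batch.
    x = torch.randn(3, seq_len, dim, generator=torch.Generator().manual_seed(1), requires_grad=True)
    forward_block(block, x, mask, use_ckpt).sum().backward()
    grads[use_ckpt] = (x.grad.clone(), block.mlp[0].weight.grad.clone())

same = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(grads[False], grads[True]))
print("checkpointed grads match:", same)
```

This only covers eager mode; it says nothing about whether the same call traces cleanly under torch.compile.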

The bad news is that, apart from that call site, grad checkpointing is either delegated to the vision/text trunks without any way to pass arguments through:

@torch.jit.ignore
def set_grad_checkpointing(self, enable=True):
    self.visual.set_grad_checkpointing(enable)
    self.transformer.grad_checkpointing = enable

or not supported at all:

@torch.jit.ignore
def set_grad_checkpointing(self, enable=True):
    # FIXME support for non-transformer
    pass

So fairly involved changes would be necessary. When I get a chance, I will try the easy part and see whether it at least gets past the error.
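One possible shape for the more involved part is sketched below, assuming PyTorch 2.0 or newer: each trunk owns its checkpointing settings, and the top-level set_grad_checkpointing threads a use_reentrant choice down instead of only toggling a boolean. ToyTrunk, ToyCLIP, and the use_reentrant parameter on set_grad_checkpointing are hypothetical names for illustration, not open_clip's API.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyTrunk(nn.Module):
    """Hypothetical vision/text trunk that owns its checkpointing settings."""

    def __init__(self, width: int = 8, layers: int = 2):
        super().__init__()
        self.resblocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.GELU()) for _ in range(layers)]
        )
        self.grad_checkpointing = False
        # Mirrors PyTorch's historical default until set_grad_checkpointing overrides it.
        self.use_reentrant = True

    @torch.jit.ignore
    def set_grad_checkpointing(self, enable=True, use_reentrant=False):
        self.grad_checkpointing = enable
        self.use_reentrant = use_reentrant

    def forward(self, x):
        for r in self.resblocks:
            if self.grad_checkpointing and not torch.jit.is_scripting():
                # The per-trunk setting decides which checkpoint implementation runs.
                x = checkpoint(r, x, use_reentrant=self.use_reentrant)
            else:
                x = r(x)
        return x


class ToyCLIP(nn.Module):
    """Hypothetical two-tower model mirroring the delegation pattern above."""

    def __init__(self, width: int = 8):
        super().__init__()
        self.visual = ToyTrunk(width)
        self.transformer = ToyTrunk(width)

    @torch.jit.ignore
    def set_grad_checkpointing(self, enable=True, use_reentrant=False):
        # Forward both the on/off switch and the reentrant choice to each trunk.
        self.visual.set_grad_checkpointing(enable, use_reentrant=use_reentrant)
        self.transformer.set_grad_checkpointing(enable, use_reentrant=use_reentrant)


model = ToyCLIP()
model.set_grad_checkpointing(True, use_reentrant=False)
image = torch.randn(2, 8, requires_grad=True)
model.visual(image).sum().backward()
print(model.visual.use_reentrant, tuple(image.grad.shape))
```

A trunk without checkpointing support, like the FIXME stub above, could accept the same keyword arguments and simply ignore them, which keeps the top-level call uniform.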


EIFY commented on August 24, 2024

@rwightman OK so it turned out that use_reentrant=False doesn't help. It still breaks at the same point:

[2023-11-08 12:56:29,383] [0/0] torch._utils_internal: [INFO] CompilationMetrics(frame_key='1', co_name='forward', co_filename='/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/model.py', co_firstlineno=256, cache_size=0, guard_count=None, graph_op_count=None, graph_node_count=None, graph_input_count=None, entire_frame_compile_time_s=None, backend_compile_time_s=None, fail_reason="'NNModuleVariable' object has no attribute 'get_name'")
Traceback (most recent call last):
(...)
torch._dynamo.exc.InternalTorchDynamoError: 'NNModuleVariable' object has no attribute 'get_name'

from user code:
   File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/model.py", line 274, in forward
    image_features = dim_scale_img * self.encode_image(image, normalize=self.normalize) if image is not None else None
  File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/model.py", line 239, in encode_image
    features = self.visual(image)
  File "/home/jason-chou/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/transformer.py", line 486, in forward
    x = self.transformer(x)
  File "/home/jason-chou/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/transformer.py", line 319, in forward
    x = checkpoint(r, x, None, None, attn_mask, use_reentrant=False)
