Comments (5)
@EIFY did you try forcing non-reentrant checkpointing? Could look at changing the default if that works...
@kkjh0723 I think it might break with gradient checkpointing? Not sure there is a workaround; possibly using non-reentrant mode?
I got the same error trying to run with both --grad-checkpointing and --torchcompile. However, since PyTorch 2.1.0, --torchcompile works with --accum-freq > 1, which is the next best option.
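For context on the --accum-freq route: a plain "sum gradients over micro-batches" loop is not equivalent for a contrastive loss, because each sample's loss depends on every other sample in the batch. Below is a minimal sketch of a feature-caching approach in the spirit of what open_clip does for --accum-freq (this is our understanding, not its actual code): encode all micro-batches once without gradients, then re-encode one micro-batch at a time with gradients, splice it into the cached features, and backpropagate. The toy encoders, `clip_loss`, `accum_step`, and the fixed logit scale are hypothetical stand-ins; open_clip learns its logit scale and also gathers features across GPUs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def clip_loss(img_feats, txt_feats, logit_scale):
    """Symmetric cross-entropy (CLIP-style) loss over normalized features."""
    logits = logit_scale * img_feats @ txt_feats.t()
    labels = torch.arange(img_feats.shape[0], device=img_feats.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2


def accum_step(img_enc, txt_enc, images, texts, accum_freq, logit_scale=10.0):
    """Accumulate gradients for one contrastive step over `accum_freq` micro-batches.

    Assumes the batch size is divisible by `accum_freq`. Gradients are summed
    into the encoders' .grad fields; the caller runs optimizer.step() afterwards.
    """
    img_chunks = images.chunk(accum_freq)
    txt_chunks = texts.chunk(accum_freq)

    # Pass 1: encode every micro-batch without building an autograd graph.
    with torch.no_grad():
        img_cache = [F.normalize(img_enc(x), dim=-1) for x in img_chunks]
        txt_cache = [F.normalize(txt_enc(t), dim=-1) for t in txt_chunks]

    # Pass 2: re-encode one micro-batch with grad, splice it into the cached
    # features, and backpropagate the full-batch loss through that slice only.
    losses = []
    for j in range(accum_freq):
        img_j = F.normalize(img_enc(img_chunks[j]), dim=-1)
        txt_j = F.normalize(txt_enc(txt_chunks[j]), dim=-1)
        all_img = torch.cat(img_cache[:j] + [img_j] + img_cache[j + 1:])
        all_txt = torch.cat(txt_cache[:j] + [txt_j] + txt_cache[j + 1:])
        loss = clip_loss(all_img, all_txt, logit_scale)
        loss.backward()
        losses.append(loss.item())
    return sum(losses) / len(losses)
```

Because each backward pass carries gradients through only one micro-batch while the rest are constants, the summed gradients match a single full-batch step, at the cost of one extra forward pass over the whole batch.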
@rwightman No I haven't tried that.
In that regard, the good news is that pytorch/pytorch#79887 (see open_clip/src/open_clip/transformer.py, lines 320 to 322 in 91923df) is now fixed, and we should be able to do, e.g.:
```
if self.grad_checkpointing and not torch.jit.is_scripting():
    x = checkpoint(r, x, None, None, attn_mask, use_reentrant=False)
```
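A toy, self-contained version of that loop, assuming a simplified residual block (`ResBlock` and `ToyTransformer` are hypothetical stand-ins, not open_clip's classes). With use_reentrant=False, torch.utils.checkpoint.checkpoint forwards keyword arguments to the wrapped module, so attn_mask can be passed by name instead of padding the positional slots with None. This only illustrates eager-mode behavior; as reported later in this thread, it does not by itself make torch.compile succeed.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ResBlock(nn.Module):
    """Hypothetical pre-norm residual block; a stand-in for open_clip's blocks."""

    def __init__(self, width, heads):
        super().__init__()
        self.ln_1 = nn.LayerNorm(width)
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.ln_2 = nn.LayerNorm(width)
        self.mlp = nn.Sequential(
            nn.Linear(width, 4 * width), nn.GELU(), nn.Linear(4 * width, width)
        )

    def forward(self, x, attn_mask=None):
        h = self.ln_1(x)
        x = x + self.attn(h, h, h, need_weights=False, attn_mask=attn_mask)[0]
        return x + self.mlp(self.ln_2(x))


class ToyTransformer(nn.Module):
    def __init__(self, width=32, layers=4, heads=4):
        super().__init__()
        self.resblocks = nn.ModuleList([ResBlock(width, heads) for _ in range(layers)])
        self.grad_checkpointing = False

    def forward(self, x, attn_mask=None):
        for r in self.resblocks:
            if self.grad_checkpointing and not torch.jit.is_scripting():
                # Non-reentrant checkpointing forwards keyword arguments to `r`,
                # so the mask no longer has to be passed positionally.
                x = checkpoint(r, x, attn_mask=attn_mask, use_reentrant=False)
            else:
                x = r(x, attn_mask=attn_mask)
        return x
```

Checkpointing trades memory for an extra forward pass per block during backward; the gradients themselves should match the non-checkpointed run.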
The bad news is that, other than that, grad_checkpointing is either delegated to the vision/text trunks without argument support (open_clip/src/open_clip/model.py, lines 260 to 263 in 91923df) or not supported at all (open_clip/src/open_clip/modified_resnet.py, lines 161 to 164 in 91923df).
So fairly involved changes would be necessary. I will try doing the easy part and see if it at least gets past that when I get a chance.
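One way the argument could be plumbed through is to give every trunk's set_grad_checkpointing an extra parameter and have the top-level model forward it. The sketch below assumes hypothetical `Tower` and `TwoTower` classes and a `use_reentrant` parameter on `set_grad_checkpointing`; this is not open_clip's current signature. It defaults to non-reentrant mode (also an assumption), since the reentrant variant yields no parameter gradients for the checkpointed part when none of its inputs require grad.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Tower(nn.Module):
    """Hypothetical trunk that checkpoints each layer when enabled."""

    def __init__(self, width=16, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.GELU()) for _ in range(depth)]
        )
        self.grad_checkpointing = False
        self.use_reentrant = False

    def set_grad_checkpointing(self, enable=True, use_reentrant=False):
        self.grad_checkpointing = enable
        self.use_reentrant = use_reentrant

    def forward(self, x):
        for layer in self.layers:
            if self.grad_checkpointing:
                x = checkpoint(layer, x, use_reentrant=self.use_reentrant)
            else:
                x = layer(x)
        return x


class TwoTower(nn.Module):
    """Hypothetical top-level model that delegates checkpointing to its trunks."""

    def __init__(self, visual, text):
        super().__init__()
        self.visual = visual
        self.text = text

    def set_grad_checkpointing(self, enable=True, use_reentrant=False):
        # Forward the extra argument to both trunks instead of dropping it.
        self.visual.set_grad_checkpointing(enable, use_reentrant=use_reentrant)
        self.text.set_grad_checkpointing(enable, use_reentrant=use_reentrant)

    def forward(self, image_x, text_x):
        return self.visual(image_x), self.text(text_x)
```

A trunk with no checkpointing support at all would still need its own per-layer implementation, which is where most of the work lies.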
@rwightman OK, so it turned out that use_reentrant=False doesn't help. It still breaks at the same point:
```
[2023-11-08 12:56:29,383] [0/0] torch._utils_internal: [INFO] CompilationMetrics(frame_key='1', co_name='forward', co_filename='/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/model.py', co_firstlineno=256, cache_size=0, guard_count=None, graph_op_count=None, graph_node_count=None, graph_input_count=None, entire_frame_compile_time_s=None, backend_compile_time_s=None, fail_reason="'NNModuleVariable' object has no attribute 'get_name'")
Traceback (most recent call last):
  (...)
torch._dynamo.exc.InternalTorchDynamoError: 'NNModuleVariable' object has no attribute 'get_name'

from user code:
   File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/model.py", line 274, in forward
    image_features = dim_scale_img * self.encode_image(image, normalize=self.normalize) if image is not None else None
   File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/model.py", line 239, in encode_image
    features = self.visual(image)
   File "/home/jason-chou/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
   File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/transformer.py", line 486, in forward
    x = self.transformer(x)
   File "/home/jason-chou/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
   File "/home/jason-chou/.local/lib/python3.10/site-packages/open_clip/transformer.py", line 319, in forward
    x = checkpoint(r, x, None, None, attn_mask, use_reentrant=False)
```