Thank you in advance for this amazing project :-) I'm trying to run

Not able to run inference in fp16 mode about open_clip HOT 5 CLOSED

mlfoundations commented on August 24, 2024

Not able to run inference in fp16 mode

from open_clip.

Comments (5)

ivanprado commented on August 24, 2024

@rwightman suggested in #80 that the right way of running inference in fp16 (so that it runs faster) is by using autocast. Below you can find an example.

import torch
from PIL import Image
import open_clip
import requests

device = torch.device("cuda")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32',
                                                             pretrained='openai',
                                                             device=device)

url = "https://raw.githubusercontent.com/mlfoundations/open_clip/main/docs/CLIP.png"
image = preprocess(Image.open(requests.get(url, stream=True).raw)).unsqueeze(0).to(device)
text = open_clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.autocast(device_type=device.type):
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)

        text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

This is a solution to the issue, so I'm closing it. Thank you very much for the help!

thouger commented on August 24, 2024

Thanks, it helped me too!

carlini commented on August 24, 2024

Your model is on your GPU (device=torch.device("cuda")) but your input is on the CPU. You should send the image and text to the GPU first.

ivanprado commented on August 24, 2024

Ohh, sorry. I sent the wrong code sample. This is the one for which I get the full/half precision error:

import torch
from PIL import Image
import open_clip
import requests

device = torch.device("cuda")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32-quickgelu',
                                                             pretrained='laion400m_e32',
                                                             precision="fp16",
                                                             device=device)

url = "https://raw.githubusercontent.com/mlfoundations/open_clip/main/docs/CLIP.png"
image = preprocess(Image.open(requests.get(url, stream=True).raw)).unsqueeze(0).to(device)
text = open_clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

The full error:

RuntimeError                              Traceback (most recent call last)
Input In [7], in <module>
     15 text = open_clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)
     17 with torch.no_grad():
---> 18     image_features = model.encode_image(image)
     19     text_features = model.encode_text(text)
     20     image_features /= image_features.norm(dim=-1, keepdim=True)

File ~/miniconda/lib/python3.8/site-packages/open_clip/model.py:406, in CLIP.encode_image(self, image)
    405 def encode_image(self, image):
--> 406     return self.visual(image)

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda/lib/python3.8/site-packages/open_clip/model.py:261, in VisualTransformer.forward(self, x)
    260 def forward(self, x: torch.Tensor):
--> 261     x = self.conv1(x)  # shape = [*, width, grid, grid]
    262     x = x.reshape(x.shape[0], x.shape[1], -1)  # shape = [*, width, grid ** 2]
    263     x = x.permute(0, 2, 1)  # shape = [*, grid ** 2, width]

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/conv.py:446, in Conv2d.forward(self, input)
    445 def forward(self, input: Tensor) -> Tensor:
--> 446     return self._conv_forward(input, self.weight, self.bias)

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442, in Conv2d._conv_forward(self, input, weight, bias)
    438 if self.padding_mode != 'zeros':
    439     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    440                     weight, bias, self.stride,
    441                     _pair(0), self.dilation, self.groups)
--> 442 return F.conv2d(input, weight, bias, self.stride,
    443                 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

@carlini, do you see anything wrong with it, or might this be a bug?

ivanprado commented on August 24, 2024

I think I found the problem. I've created the following PR that fixes the problem: #80
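
For anyone hitting the same RuntimeError before picking up the fix: the message says the input is float32 while the conv weights are float16. A minimal sketch of a workaround, which is not necessarily what the PR does, is to cast the input to the dtype of the layer that consumes it. The helper name `cast_like_params` is hypothetical, and the small `Conv2d` is only a stand-in so the example runs on CPU (float64 weights are used in place of fp16 for that reason).

```python
import torch
import torch.nn as nn


def cast_like_params(x: torch.Tensor, module: nn.Module) -> torch.Tensor:
    """Return x on the device and in the dtype of the module's first parameter.

    Hypothetical helper for illustration; it is not part of open_clip.
    """
    ref = next(module.parameters())
    return x.to(device=ref.device, dtype=ref.dtype)


# Stand-in for the first conv layer of the vision tower. float64 weights are
# an assumption made only so the sketch runs on CPU; with precision="fp16"
# the weights would be torch.float16 on the GPU instead.
conv = nn.Conv2d(3, 8, kernel_size=3).double()
image = torch.rand(1, 3, 32, 32)  # float32, like the output of preprocess

try:
    conv(image)  # input and weight dtypes differ, as in the traceback
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Casting the input to the layer's dtype makes the forward pass succeed.
out = conv(cast_like_params(image, conv))
print(mismatch_raised, out.dtype)
```

In the failing snippet above, the equivalent would presumably be something like `model.encode_image(image.half())`; I have not verified that against this open_clip version. The token ids passed to `encode_text` should stay as integers, so only the image needs the cast.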
