Comments (5)

ivanprado commented on August 24, 2024

@rwightman suggested in #80 that the right way of running inference in fp16 (so that it runs faster) is by using autocast. Below you can find an example.

import torch
from PIL import Image
import open_clip
import requests

device = torch.device("cuda")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32',
                                                             pretrained='openai',
                                                             device=device)

url = "https://raw.githubusercontent.com/mlfoundations/open_clip/main/docs/CLIP.png"
image = preprocess(Image.open(requests.get(url, stream=True).raw)).unsqueeze(0).to(device)
text = open_clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.autocast(device_type=device.type):
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)

        text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

This is a solution to the issue, so I'm closing it. Thank you very much for the help!

from open_clip.

thouger commented on August 24, 2024

Thanks, it helped me too!

carlini commented on August 24, 2024

Your model is on your GPU (device=torch.device("cuda")) but your input is on the CPU. You should send the image and text to the GPU first.
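
A minimal sketch of that advice, for anyone landing here later. The `nn.Linear` layer is only a stand-in for the CLIP model, and `move_to_model_device` is a hypothetical helper name, not part of open_clip. It falls back to the CPU when no GPU is present so it runs anywhere.

```python
import torch
from torch import nn


def move_to_model_device(model: nn.Module, *tensors: torch.Tensor):
    """Move every tensor to the device that holds the model's parameters."""
    device = next(model.parameters()).device
    return tuple(t.to(device) for t in tensors)


# Stand-in model (assumption): any nn.Module with parameters behaves the same way.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)

# Inputs are created on the CPU, as they are after preprocessing and tokenizing.
image = torch.randn(1, 4)
text = torch.randint(0, 10, (1, 4))

image, text = move_to_model_device(model, image, text)
print(image.device, text.device)
```

In the snippets in this thread the same effect comes from calling `.to(device)` on the preprocessed image and on the tokenized text.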

ivanprado commented on August 24, 2024

Ohh, sorry. I sent the wrong code sample. This is the one for which I get the full/half precision error:

import torch
from PIL import Image
import open_clip
import requests

device = torch.device("cuda")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32-quickgelu',
                                                             pretrained='laion400m_e32',
                                                             precision="fp16",
                                                             device=device)

url = "https://raw.githubusercontent.com/mlfoundations/open_clip/main/docs/CLIP.png"
image = preprocess(Image.open(requests.get(url, stream=True).raw)).unsqueeze(0).to(device)
text = open_clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

The full error:

RuntimeError                              Traceback (most recent call last)
Input In [7], in <module>
     15 text = open_clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)
     17 with torch.no_grad():
---> 18     image_features = model.encode_image(image)
     19     text_features = model.encode_text(text)
     20     image_features /= image_features.norm(dim=-1, keepdim=True)

File ~/miniconda/lib/python3.8/site-packages/open_clip/model.py:406, in CLIP.encode_image(self, image)
    405 def encode_image(self, image):
--> 406     return self.visual(image)

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda/lib/python3.8/site-packages/open_clip/model.py:261, in VisualTransformer.forward(self, x)
    260 def forward(self, x: torch.Tensor):
--> 261     x = self.conv1(x)  # shape = [*, width, grid, grid]
    262     x = x.reshape(x.shape[0], x.shape[1], -1)  # shape = [*, width, grid ** 2]
    263     x = x.permute(0, 2, 1)  # shape = [*, grid ** 2, width]

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/conv.py:446, in Conv2d.forward(self, input)
    445 def forward(self, input: Tensor) -> Tensor:
--> 446     return self._conv_forward(input, self.weight, self.bias)

File ~/miniconda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442, in Conv2d._conv_forward(self, input, weight, bias)
    438 if self.padding_mode != 'zeros':
    439     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    440                     weight, bias, self.stride,
    441                     _pair(0), self.dilation, self.groups)
--> 442 return F.conv2d(input, weight, bias, self.stride,
    443                 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

@carlini do you see anything wrong with it, or might this be a bug?
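
For reference, the last line of the traceback says the input is float32 while the convolution weights are float16. One possible workaround, sketched below, is to cast floating-point inputs to the dtype of the model's parameters before the forward pass. This is an assumption about how to sidestep the mismatch, not necessarily what the PR changes. `cast_like_model` is a hypothetical helper, and the small `nn.Conv2d` is only a stand-in for the fp16 visual tower.

```python
import torch
from torch import nn


def cast_like_model(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Move x to the model's device, and to its dtype if x is floating point."""
    param = next(model.parameters())
    if not x.is_floating_point():
        # Integer inputs such as token ids keep their dtype; only the device changes.
        return x.to(param.device)
    return x.to(device=param.device, dtype=param.dtype)


# Stand-in for the fp16 model (assumption): a conv layer with half-precision weights.
conv = nn.Conv2d(3, 8, kernel_size=3).half()

# preprocess() yields a float32 tensor, which is what triggers the error above.
image = torch.randn(1, 3, 32, 32)
image_fp16 = cast_like_model(conv, image)

print(image.dtype, "->", image_fp16.dtype)  # torch.float32 -> torch.float16
```

With the code from this comment, the equivalent call would be `model.encode_image(cast_like_model(model, image))`. The autocast approach from the first comment avoids the manual cast altogether.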

ivanprado commented on August 24, 2024

I think I found the problem. I've created the following PR that fixes the problem: #80
