
Comments (19)

BabyChouSr commented on August 25, 2024

thank you! works now :)

BabyChouSr commented on August 25, 2024

@JanuRam I don't think this model should be used for chat responses; the output will not be very meaningful. You can try the llava template (see the sketch after the command below), but chat is probably not the use case you would want this model for. If you are looking for chat, you should try https://huggingface.co/openbmb/MiniCPM-V-2_6

python -m vllm.entrypoints.openai.api_server \
    --model google/paligemma-3b-mix-224 \
    --chat-template template_llava.jinja
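
For reference, a minimal llava-style template could look like the sketch below. This is an assumption about the rough shape of template_llava.jinja, not the exact file from the vLLM examples directory; writing it out from Python keeps the snippet self-contained.

# Minimal sketch of a llava-style chat template (an assumed shape, not the
# exact template_llava.jinja shipped with vLLM). It renders each turn as
# "USER: ..." / "ASSISTANT: ..." and ends with an open assistant turn.
template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "USER: {{ message['content'] }}\n"
    "{% elif message['role'] == 'assistant' %}"
    "ASSISTANT: {{ message['content'] }}\n"
    "{% endif %}"
    "{% endfor %}"
    "ASSISTANT:"
)

with open("template_llava.jinja", "w") as f:
    f.write(template)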

DarkLight1337 commented on August 25, 2024

Thanks for reporting this! Can you check whether #6430 fixes this issue?

ywang96 commented on August 25, 2024

Not related to this PR in particular, but since you're serving this from the OpenAI API server, I don't think PaliGemma is supposed to work out of the box with it, because it was never instruction fine-tuned.

In the PaliGemma paper, it says

Gemma [79] is a family of auto-regressive decoder-only open large language models built
from the same research and technology used to create the Gemini [7] models. The models come
in different sizes (2B, 7B), both pretrained and instruction fine-tuned. PaliGemma uses the 2B
pretrained version.

BabyChouSr commented on August 25, 2024

@DarkLight1337 Thank you for taking on this issue! Sorry, but this still doesn't work for me. I pulled your branch using git fetch origin pull/6430/head, but I still run into the same error with the same input.

@ywang96 You bring up a good point! I'll have to familiarize myself with the paper, thanks for sharing.

DarkLight1337 commented on August 25, 2024

Oops, I forgot to update the async version of fetch_image. Can you try again?

JanuRam commented on August 25, 2024

Hi @BabyChouSr
I tried the curl command below on the PaliGemma model that we have hosted:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/paligemma-3b-mix-448",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://placehold.co/600x400/jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

I am getting the following output, not the one you mentioned:

<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_participation>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_participation>assistant"

What might be the issue here? Can you please help me?

DarkLight1337 commented on August 25, 2024

You should use a custom chat template so that the input has the same format as the one shown on HuggingFace.
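
For reference, the PaliGemma usage shown on HuggingFace feeds the model a bare task prompt rather than role-tagged chat turns, roughly like the sketch below (the image path and question are placeholders; the transformers calls follow the model card's pattern):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-448"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# PaliGemma expects a plain task prompt (e.g. "caption en" or a question),
# not chat-formatted turns, so a custom chat template should produce this shape.
image = Image.open("example.jpg")  # placeholder local image
inputs = processor(text="What is in this image?", images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))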

JanuRam commented on August 25, 2024

@DarkLight1337 I assumed the request body for the PaliGemma API would be the same for any model hosted through vLLM. Why should we be using a custom chat template? Can you please elaborate on this?

DarkLight1337 commented on August 25, 2024

From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.

JanuRam commented on August 25, 2024

@DarkLight1337 To give more context, I tried the above curl command on the PaliGemma model that we have hosted through the vLLM framework, the same as what @BabyChouSr used for his query. But our output was completely different from what he reported, so I asked for help with that.

DarkLight1337 commented on August 25, 2024

How are you hosting the model? Please show the command that you used.

ywang96 commented on August 25, 2024

@DarkLight1337 To give more context, I tried the above curl command on the PaliGemma model that we have hosted through the vLLM framework, the same as what @BabyChouSr used for his query. But our output was completely different from what he reported, so I asked for help with that.

I don't think the temperature is set to 0 by default (i.e., we're not sampling greedily), and that's probably why you're seeing the difference.
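
For example, you can pin the sampling in the request itself. Here's a sketch using the OpenAI Python client against the endpoint and model from your curl command (the localhost base URL is an assumption about where your server is listening):

from openai import OpenAI

# vLLM ignores the API key, but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="google/paligemma-3b-mix-448",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://placehold.co/600x400/jpg"}},
        ],
    }],
    max_tokens=300,
    temperature=0,  # greedy sampling, so repeated runs give the same output
)
print(response.choices[0].message.content)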

I would also encourage you to take a look at our example script examples/offline_inference_vision_language.py.
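
For PaliGemma, that example boils down to roughly the following (a simplified sketch rather than the exact script; the image path is a placeholder):

from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="google/paligemma-3b-mix-448")

# PaliGemma takes a bare task prompt; "caption en" requests an English caption.
prompt = "caption en"
image = Image.open("example.jpg")  # placeholder local image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0, max_tokens=64),
)
print(outputs[0].outputs[0].text)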

JanuRam commented on August 25, 2024

How are you hosting the model? Please show the command that you used.

@DarkLight1337 It is through a cloud platform called Jarvislabs.ai; they have a vLLM option for hosting open-source models from Hugging Face. When I tried it with PaliGemma, it gave us two APIs: /v1/chat/completions and /v1/completions. I thought /v1/chat/completions would work for us and tried it, but I didn't get a proper response. The goal is simple: given an image and a prompt, it should be able to produce an output.

DarkLight1337 commented on August 25, 2024

Do you have the ability to pass through command-line arguments? As mentioned above:

From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.

JanuRam commented on August 25, 2024

Do you have the ability to pass through command-line arguments? As mentioned above:

From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.

No, I only have control over the request body given for the API call.

DarkLight1337 commented on August 25, 2024

How about selecting the HuggingFace model to use? Maybe you can fork the model repo and add the chat template to it.
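
Concretely, after duplicating the model repo on the Hub (the fork also needs the model weights), attaching a template to the tokenizer could look roughly like this; the fork name and template string below are placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-mix-448")

# Attach a placeholder chat template; vLLM should then pick it up from the
# tokenizer config of the forked repo when no --chat-template is given.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['content'] }}\n"
    "{% endfor %}"
)
tokenizer.push_to_hub("your-username/paligemma-3b-mix-448-chat")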

JanuRam commented on August 25, 2024

Not sure. But my question is why I am not able to get a proper output like @BabyChouSr got for his JPG image query, using the /v1/chat/completions API call with the PaliGemma model.

JanuRam commented on August 25, 2024

It is not for chat (conversational purposes); it is mainly for visual question answering, to be precise.
