
Comments (19)

BabyChouSr commented on August 25, 2024

thank you! works now :)

BabyChouSr commented on August 25, 2024

@JanuRam I don't think this model should be used for chat responses; the output will not be very meaningful. You can try the llava template (see the sketch after the command below), but chat is probably not the use case you would want this model for. If you are looking for chat, you should try https://huggingface.co/openbmb/MiniCPM-V-2_6

python -m vllm.entrypoints.openai.api_server \
    --model google/paligemma-3b-mix-224 \
    --chat-template template_llava.jinja
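
For reference, a minimal llava-style template could look like the sketch below. This is an assumption about the rough shape of template_llava.jinja, not the exact file from the vLLM examples directory; writing it out from Python keeps the snippet self-contained.

# Minimal sketch of a llava-style chat template (an assumed shape, not the
# exact template_llava.jinja shipped with vLLM). It renders each turn as
# "USER: ..." / "ASSISTANT: ..." and ends with an open assistant turn.
template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "USER: {{ message['content'] }}\n"
    "{% elif message['role'] == 'assistant' %}"
    "ASSISTANT: {{ message['content'] }}\n"
    "{% endif %}"
    "{% endfor %}"
    "ASSISTANT:"
)

with open("template_llava.jinja", "w") as f:
    f.write(template)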

DarkLight1337 commented on August 25, 2024

Thanks for reporting this! Can you check whether #6430 fixes this issue?

ywang96 commented on August 25, 2024

Not related to this PR in particular, but since you're serving this from the OpenAI API server, I don't think PaliGemma is supposed to work out of the box with it, because it was never instruction fine-tuned.

In the PaliGemma paper, it says

Gemma [79] is a family of auto-regressive decoder-only open large language models built
from the same research and technology used to create the Gemini [7] models. The models come
in different sizes (2B, 7B), both pretrained and instruction fine-tuned. PaliGemma uses the 2B
pretrained version.

BabyChouSr commented on August 25, 2024

@DarkLight1337 Thank you for taking on this issue! Sorry, but this still doesn't work for me. I pulled your branch using git fetch origin pull/6430/head, but I still run into the same error with the same input.

@ywang96 You bring up a good point! I'll have to familiarize myself with the paper, thanks for sharing.

DarkLight1337 commented on August 25, 2024

Oops, I forgot to update the async version of fetch_image. Can you try again?

JanuRam commented on August 25, 2024

Hi @BabyChouSr
I tried the curl command below on the PaliGemma model that we have hosted:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/paligemma-3b-mix-448",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://placehold.co/600x400/jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

I am getting the following output, not the one you mentioned:

<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_participation>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_participation>assistant"

What might be the issue here? Can you please help me?

DarkLight1337 commented on August 25, 2024

You should use a custom chat template so that the input has the same format as the one shown on HuggingFace.
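
For reference, the PaliGemma usage shown on HuggingFace feeds the model a bare task prompt rather than role-tagged chat turns, roughly like the sketch below (the image path and question are placeholders; the transformers calls follow the model card's pattern):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-448"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# PaliGemma expects a plain task prompt (e.g. "caption en" or a question),
# not chat-formatted turns, so a custom chat template should produce this shape.
image = Image.open("example.jpg")  # placeholder local image
inputs = processor(text="What is in this image?", images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))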

JanuRam commented on August 25, 2024

@DarkLight1337 I assumed the request body for the PaliGemma API would be the same for any model hosted through vLLM. Why should we be using a custom chat template? Can you please elaborate on this?

DarkLight1337 commented on August 25, 2024

From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.

JanuRam commented on August 25, 2024

@DarkLight1337 To give more context, I tried the above curl command on the PaliGemma model that we have hosted through the vLLM framework, the same as what @BabyChouSr used for his query. But our output was completely different from what he reported, so I asked for help with that.

DarkLight1337 commented on August 25, 2024

How are you hosting the model? Please show the command that you used.

ywang96 commented on August 25, 2024

@DarkLight1337 To give more context, I tried the above curl command on the PaliGemma model that we have hosted through the vLLM framework, the same as what @BabyChouSr used for his query. But our output was completely different from what he reported, so I asked for help with that.

I don't think the temperature is set to 0 by default (i.e., we're not sampling greedily), and that's probably why you're seeing the difference.
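
For example, you can pin the sampling in the request itself. Here's a sketch using the OpenAI Python client against the endpoint and model from your curl command (the localhost base URL is an assumption about where your server is listening):

from openai import OpenAI

# vLLM ignores the API key, but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="google/paligemma-3b-mix-448",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://placehold.co/600x400/jpg"}},
        ],
    }],
    max_tokens=300,
    temperature=0,  # greedy sampling, so repeated runs give the same output
)
print(response.choices[0].message.content)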

I would also encourage you to take a look at our example script examples/offline_inference_vision_language.py.
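
For PaliGemma, that example boils down to roughly the following (a simplified sketch rather than the exact script; the image path is a placeholder):

from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="google/paligemma-3b-mix-448")

# PaliGemma takes a bare task prompt; "caption en" requests an English caption.
prompt = "caption en"
image = Image.open("example.jpg")  # placeholder local image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0, max_tokens=64),
)
print(outputs[0].outputs[0].text)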

JanuRam commented on August 25, 2024

How are you hosting the model? Please show the command that you used.

@DarkLight1337 It is through a cloud platform called Jarvislabs.ai; they have a vLLM option for hosting open-source models from Hugging Face. When I tried it with PaliGemma, it gave us two APIs: /v1/chat/completions and /v1/completions. I thought /v1/chat/completions would work for us and tried it, but I didn't get a proper response. The goal is simple: given an image and a prompt, it should be able to produce an output.

DarkLight1337 commented on August 25, 2024

Do you have the ability to pass through command-line arguments? As mentioned above:

From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.

JanuRam commented on August 25, 2024

Do you have the ability to pass through command-line arguments? As mentioned above:

From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.

No, I only have control over the request body given for the API call.

DarkLight1337 commented on August 25, 2024

How about selecting the HuggingFace model to use? Maybe you can fork the model repo and add the chat template to it.
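
Concretely, after duplicating the model repo on the Hub (the fork also needs the model weights), attaching a template to the tokenizer could look roughly like this; the fork name and template string below are placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-mix-448")

# Attach a placeholder chat template; vLLM should then pick it up from the
# tokenizer config of the forked repo when no --chat-template is given.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['content'] }}\n"
    "{% endfor %}"
)
tokenizer.push_to_hub("your-username/paligemma-3b-mix-448-chat")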

JanuRam commented on August 25, 2024

Not sure. But my question is why I am not able to get a proper output like @BabyChouSr got for his JPG image query, using the /v1/chat/completions API call with the PaliGemma model.

JanuRam commented on August 25, 2024

It is not for chat (conversational purposes); it is mainly for visual question answering, to be precise.
