chatglm output results are repeating with basic prompts (openvino.genai, open, 12 comments)

avinashbhat09 commented on July 17, 2024

chatglm output results are repeating with basic prompts

Comments (12)

peterchen-intel commented on July 17, 2024

This may be because benchmark.py forces the model to generate exactly "--infer_count" tokens for performance consistency (a fixed output size, instead of stopping at the end token). We will add an option to stop at the end token.
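A toy sketch of the two decoding modes described above: forcing a fixed number of new tokens versus stopping at the end token. It assumes a greedy loop and a hypothetical model (`make_toy_model`) that simply restarts its answer after emitting the end token; real models and benchmark.py differ in detail, and `max_new_tokens` here only stands in for `--infer_count`.

```python
from typing import Callable, List

EOS = "</s>"  # hypothetical end-of-sequence token for this toy example


def make_toy_model(prompt_len: int) -> Callable[[List[str]], str]:
    """Hypothetical model: emits a short answer, then EOS, then starts the answer again."""
    answer = ["OpenVINO", "is", "a", "toolkit", EOS]

    def next_token(tokens: List[str]) -> str:
        step = len(tokens) - prompt_len  # how many tokens have been generated so far
        return answer[step % len(answer)]

    return next_token


def generate(next_token: Callable[[List[str]], str], prompt: List[str],
             max_new_tokens: int, stop_at_eos: bool) -> List[str]:
    """Greedy decoding loop; optionally stops as soon as EOS is produced."""
    tokens = list(prompt)
    new_tokens: List[str] = []
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if stop_at_eos and tok == EOS:
            break
        tokens.append(tok)
        new_tokens.append(tok)
    return new_tokens


prompt = ["What", "is", "OpenVINO", "?"]
model = make_toy_model(len(prompt))
print(" ".join(generate(model, prompt, 12, stop_at_eos=False)))
# OpenVINO is a toolkit </s> OpenVINO is a toolkit </s> OpenVINO is
print(" ".join(generate(model, prompt, 12, stop_at_eos=True)))
# OpenVINO is a toolkit
```

With a fixed output length, everything after the first end token is still counted as output, which is one way repeated text can appear in the printed answer even when the model itself is fine.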

avinashbhat09 commented on July 17, 2024

We tried the same OV-converted model with chatglm-openvino (https://github.com/OpenVINO-dev-contest/chatglm3.openvino) and it works fine; we don't see any repeated words.

From this we can conclude:

No issue running inference on CPU or GPU
Not a model issue
No quantization issue

This looks more like an issue with how openvino.genai interfaces with the model.

avinashbhat09 commented on July 17, 2024

Can we add a chatbot-style implementation, like the one in chatglm-openvino, to openvino.genai to support chatglm?

avinashbhat09 commented on July 17, 2024

Hi, any update on this? @Wovchena

Wovchena commented on July 17, 2024

Hi. I don't have any update. @peterchen-intel is the correct person to discuss llm_bench related questions with. As for the chatbot kind of implementation, the sample is here https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample.

avinashbhat09 commented on July 17, 2024

Thanks, @Wovchena. Unfortunately, the chat sample does not work for me.

avinashbhat09 commented on July 17, 2024

@peterchen-intel: Any input from your side?

Wovchena commented on July 17, 2024

Your model is stateless. You need a stateful one. To export such a model, ensure you don't have --disable-stateful while running optimum-cli export openvino. Alternatively, if you use python ./llm_bench/python/convert.py, you need to specify --stateful (and not --disable-stateful).
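A toy illustration of the stateless/stateful distinction, not OpenVINO code: it only shows where the KV cache lives. In a stateless model the cache is an explicit input and output that the caller must carry between steps; in a stateful model it is kept inside the model between calls. `stateless_step` and `StatefulModel` are hypothetical names, and a list of tokens stands in for the real KV cache.

```python
from typing import List, Tuple

# Toy stand-in for a KV cache: just the list of tokens seen so far.
Cache = List[str]


def stateless_step(token: str, cache: Cache) -> Tuple[str, Cache]:
    """Stateless: the cache is passed in and returned, so the caller must carry it."""
    new_cache = cache + [token]
    return f"context_len={len(new_cache)}", new_cache


class StatefulModel:
    """Stateful: the cache is kept inside the model object between calls."""

    def __init__(self) -> None:
        self._cache: Cache = []

    def step(self, token: str) -> str:
        self._cache.append(token)
        return f"context_len={len(self._cache)}"


# A caller written for a stateful model only passes tokens; context accumulates.
model = StatefulModel()
print([model.step(t) for t in ["Hi", "there", "!"]])
# ['context_len=1', 'context_len=2', 'context_len=3']

# The same caller pattern against a stateless step never threads the cache through,
# so each call sees only the current token.
print([stateless_step(t, [])[0] for t in ["Hi", "there", "!"]])
# ['context_len=1', 'context_len=1', 'context_len=1']
```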

avinashbhat09 commented on July 17, 2024

Thanks for your input, @Wovchena. I converted the model with this command and it worked:
optimum-cli export openvino --trust-remote-code --model THUDM/chatglm3-6b chatglm3-6b_stateful --task question-answering

So this confirms that we don't have issues with the model or quantization. Coming back to the original bug: when we use benchmark.py, why do the answers repeat? Can anything be done to fix that? For our validation it is important to get metrics printed for each response (e.g., tokens/sec, first-token latency), which chat_sample currently does not provide.
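For per-response numbers like those requested here, a minimal sketch using a hypothetical helper `response_metrics` (not part of chat_sample or llm_bench). It assumes you record a wall-clock timestamp when the request starts and another each time a token arrives (for example, with `time.perf_counter()` in a per-token streaming callback). The definitions below are one common choice; llm_bench may compute its metrics differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ResponseMetrics:
    first_token_latency_s: float      # time from request start to first token
    mean_next_token_latency_s: float  # average gap between the remaining tokens
    tokens_per_second: float          # all generated tokens / total response time


def response_metrics(start: float, token_times: Sequence[float]) -> ResponseMetrics:
    """Compute per-response metrics from a start time and per-token arrival times."""
    if not token_times:
        raise ValueError("no tokens were generated")
    n = len(token_times)
    first = token_times[0] - start
    # Mean latency of tokens after the first; zero if only one token was produced.
    rest = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    total = token_times[-1] - start
    tps = n / total if total > 0 else float("inf")
    return ResponseMetrics(first, rest, tps)


print(response_metrics(0.0, [0.25, 0.5, 0.75, 1.0]))
# ResponseMetrics(first_token_latency_s=0.25, mean_next_token_latency_s=0.25, tokens_per_second=4.0)
```

Counting throughput over the whole response (including the first-token wait) is a design choice; excluding the first token gives a higher "generation-only" rate.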

peterchen-intel commented on July 17, 2024

CVS-146307

peterchen-intel commented on July 17, 2024

#606

peterchen-intel commented on July 17, 2024

@avinashbhat09 Can you try the HEAD of the openvino.genai master branch with the --end_token_stopping option?
