Comments (12)
It may be due to benchmark.py forcing the output of "--infer_count" tokens for performance consistency (a fixed output size instead of stopping at end_token). Will add an option to stop at the ending token.
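To illustrate the behaviour described above, here is a minimal sketch of a greedy decoding loop. It is not the actual benchmark.py code: `generate`, `make_toy_model`, `eos_id`, and `stop_at_eos` are hypothetical names used only for illustration. With a forced token count the loop keeps asking the model for tokens after the end token, which is one plausible source of repeated or meaningless trailing text.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             max_new_tokens: int,
             eos_id: int,
             stop_at_eos: bool) -> List[int]:
    """Return the newly generated token ids (prompt excluded)."""
    context = list(prompt)
    output: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(context)
        output.append(token)
        context.append(token)
        # Only honour the end token when asked to; otherwise always
        # produce exactly max_new_tokens tokens.
        if stop_at_eos and token == eos_id:
            break
    return output


def make_toy_model(prompt_len: int, eos_id: int) -> Callable[[List[int]], int]:
    """Hypothetical stand-in for a model: a 3-token answer, then EOS forever."""
    answer = [11, 12, 13]

    def next_token(context: List[int]) -> int:
        step = len(context) - prompt_len
        return answer[step] if step < len(answer) else eos_id

    return next_token


toy = make_toy_model(prompt_len=1, eos_id=2)
fixed_size = generate(toy, [1], max_new_tokens=6, eos_id=2, stop_at_eos=False)
early_stop = generate(toy, [1], max_new_tokens=6, eos_id=2, stop_at_eos=True)
```

In this toy case `fixed_size` contains tokens past the first end token, while `early_stop` ends at it. A real model may emit arbitrary tokens after the end token rather than repeating it.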
from openvino.genai.
Tried the same OV-converted model with chatglm-openvino (https://github.com/OpenVINO-dev-contest/chatglm3.openvino) and it works fine. We don't see any repeated words.
From this we can conclude the following:
- No issue running inference on CPU or GPU
- Not a model issue
- No quantization issue
This looks more like an issue in how gen-ai interfaces with the model.
Can we add a chatbot-style implementation, like the one in chatglm-openvino, to gen-ai to support chatglm?
Hi, any update on this? @Wovchena
Hi. I don't have any update. @peterchen-intel is the correct person to discuss llm_bench related questions with. As for the chatbot kind of implementation, the sample is here https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample.
Thanks @Wovchena. Unfortunately, the chat sample does not work for me.
@peterchen-intel : Any input from your side?
Your model is stateless. You need a stateful one. To export such a model, ensure you don't have --disable-stateful while running optimum-cli export openvino. Alternatively, if you use python ./llm_bench/python/convert.py, you need to specify --stateful (and not --disable-stateful).
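As a rough way to tell the two kinds of export apart, the sketch below inspects the model's input names. It rests on an assumption rather than a documented guarantee: that a stateless export exposes its KV cache as explicit inputs whose names start with `past_key_values`, while a stateful export keeps the cache inside the model. `looks_stateless` and `model_input_names` are hypothetical helper names, and the path in the usage note is only an example.

```python
from typing import Iterable, List


def looks_stateless(input_names: Iterable[str]) -> bool:
    """Heuristic (assumption): explicit KV-cache inputs suggest a stateless export."""
    return any(name.startswith("past_key_values") for name in input_names)


def model_input_names(xml_path: str) -> List[str]:
    """Read an OpenVINO IR file and return the names of its inputs."""
    # Imported lazily so the heuristic above can be used without OpenVINO installed.
    import openvino as ov

    model = ov.Core().read_model(xml_path)
    return [port.get_any_name() for port in model.inputs]
```

For example, `looks_stateless(model_input_names("chatglm3-6b_stateful/openvino_model.xml"))` should be false for a stateful export if the naming assumption holds for your model.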
Thanks for your input @Wovchena. I converted the model with this command and it worked.
optimum-cli export openvino --trust-remote-code --model THUDM/chatglm3-6b chatglm3-6b_stateful --task question-answering
So this confirms that we don't have issues with the model or quantization. Coming back to the original bug: when we use benchmark.py, why do we see the answers repeating? Can anything be done to fix that? For our validation, it is important to get metrics printed for each response (such as tokens/sec and first-token latency), which is currently not available in chat_sample.
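For reference, the two metrics mentioned above can be measured around any token stream. The following is a minimal sketch, not the way llm_bench or chat_sample computes them: `measure_stream` is a hypothetical helper that takes any iterable yielding tokens as they are produced and reports first-token latency and average tokens per second. The `clock` parameter exists only so the timing can be replaced in tests.

```python
import time
from typing import Callable, Dict, Iterable


def measure_stream(tokens: Iterable[object],
                   clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Consume a token stream and return simple latency/throughput metrics."""
    start = clock()
    first_token_at = None
    last_token_at = start
    count = 0
    for _ in tokens:
        now = clock()  # timestamp taken when the token arrives
        if first_token_at is None:
            first_token_at = now
        last_token_at = now
        count += 1
    if count == 0:
        return {"tokens": 0, "first_token_latency_s": float("nan"), "tokens_per_s": 0.0}
    total = last_token_at - start
    return {
        "tokens": count,
        "first_token_latency_s": first_token_at - start,
        "tokens_per_s": count / total if total > 0 else float("inf"),
    }
```

Here the throughput includes the first-token time. Benchmarks often report the first token separately from the remaining tokens, so adjust the formula if you need to compare numbers with another tool.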
CVS-146307
@avinashbhat09 Can you try the HEAD of the openvino.genai master branch with the option --end_token_stopping?