openvinotoolkit / openvino.genai
Run Generative AI models using native OpenVINO C++ API
License: Apache License 2.0
New versions of dependencies are out and the LLM samples no longer work end to end.
This task regards enabling tests for gemma-7b-it. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for dolly-v2-12b. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for baichuan2-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for llama-2-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
Requesting help to understand how TTFT is calculated
No response
This task regards enabling tests for mini-cpm-2b-dpo. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
Currently, dependencies in the requirements files are not pinned, so new dependency versions break our pipelines, as is happening on 2023.3 right now.
Let's pin the dependencies and use Dependabot to track new versions and propose updates as PRs. Such PRs can be tested for compatibility with our CI.
This task regards enabling tests for phi-2. You can find more details under openvino_notebooks LLM question answering README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
auto-gptq can't be installed due to a build error:
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [7 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/var/folders/gx/znq60x355475d1njb709q0jh0000gn/T/pip-install-2tmqwfxi/auto-gptq_3c9ac5ea4a8f4b049c9ff044cc750c38/setup.py", line 58, in <module>
CUDA_VERSION = "".join(os.environ.get("CUDA_VERSION", default_cuda_version).split("."))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
The issue is that the CPU-only version of PyTorch is installed, so default_cuda_version = torch.version.cuda is None. This seems like an upstream issue, but we may need a workaround here.
This task regards enabling tests for mistral-7b. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
I have converted the llama-7b-chat model to int4 using the following commands:
python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP16 --compress_weights INT4_SYM INT4_ASYM 4BIT_DEFAULT
python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP32 --compress_weights 4BIT_DEFAULT
I'm running benchmarking with the int4 converted models. I tried the following variations and, as you can see, all the responses contain German words.
Tried with a different prompt - it gives a partial answer in German.
Using the following prompt generates the complete response in German.
Am I missing something here? Please provide some guidance.
This task regards enabling tests for qwen1.5-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
python convert.py --model_id /home/llm/disk/llm/meta-llama/Llama-2-7b-hf --output_dir /home/llm/disk/llm/meta-llama/Llama-2-7b-hf-openvino --precision FP32
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
[ INFO ] openvino runtime version: 2023.3.0-13775-ceeafaf64f3-releases/2023/3
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.50s/it]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled
is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable
instead
warnings.warn(
The cos_cached attribute will be removed in 4.40. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead.
The sin_cached attribute will be removed in 4.40. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead.
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:1057: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_218 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_218':
Arguments cannot have boolean element type (argument element type: boolean).
Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance. Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
This task regards enabling tests for youri-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
I am running Qwen-7B on SPR.
I found that there is no significant perf improvement between FP32 and the compressed FP32-INT4_ASYM model.
FP32 benchmarking cmd:
python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/FP32 -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino
Latency: 129.71 ms/token
INT4 benchmarking cmd:
python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/compressed_weights/OV_FP32-INT4_ASYM -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino
Latency: 121.91 ms/token
Did I miss something important in the benchmarking cmd, or is it an issue?
I've run the benchmark to test Stable Diffusion v1.5, but the quality of the generated image is low. I suspect something is computed incorrectly in the process.
I compared it with optimum-intel, using the same prompt, steps (20), and resolution (512x512), and got a very different quality image. Please have a look.
This is an effort to increase Large Language Models tests coverage in OpenVINO GenAI.
Working on this task will let you familiarize yourself with:
If you would like to add a new model which there's not a task for, please let us know! We would love to get outside ideas.
Example commit: bf4c200#diff-2c8a6fc2893aa2e1103985c1ee763cc325de6042ea66a11ae30428d77e73e416
The OpenVINO Tokenizers extension has a new home: https://github.com/openvinotoolkit/openvino_tokenizers
Or maybe it's better to remove the submodule and use release packages via CMake FetchContent?
Dear,
I've run llm_bench\python to test Stable Diffusion v1.5 and found some regression between different OV packages, as below:
2023.3->8.73s
2024.0->10.65s
2024.1->9.18s
The parameters are 20 steps, 512x512; the others are the same as in the prompt/stable-diffusion.jsonl file.
The command is:
python benchmark.py -d GPU --model "C:\AIGC\openvino\models\stable-diffusion-optimum-sdv1_5" --prompt_file "prompts/stable-diffusion.jsonl" -n 1
The platform is an MTL U9 185 iGPU with 32GB.
Thanks a lot,
I am trying to run text generation using text_generation/causal_lm/cpp/greedy_causal_lm.cpp without any modifications. I followed the build instructions in the README and ran this command, after which I was shown the following error.
$ ./build/greedy_causal_lm ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16 "Why is the Sun yellow?"
Exception from src/inference/src/core.cpp:85:
Exception from src/frontends/ir/src/ir_deserializer.cpp:438:
Invalid IR! ScatterNDUpdate_15 name is not unique!
I have limited knowledge of the toolkit's internal processes, and would like to get some indication of where this issue might be arising from (and how it could be resolved).
gcc and g++ versions are 11.4.0. The Python environment has the following packages among others:
torch==2.3.0
openvino==2024.1.0
openvino-tokenizers==2024.1.0
transformers==4.37.2
optimum==1.19.1
The setup has aarch64 and x86 processors and a shared file system. The tokenizer conversion using openvino-tokenizers (command below) runs successfully every time on the aarch64 machine but often results in a seg fault on the x86 machine. I could not find any pattern in the occurrence of seg faults.
convert_tokenizer ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --output ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --with-detokenizer --trust-remote-code
On the aarch64 machine, trying to build greedy_causal_lm.cpp along with other files using cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j results in the following error. This never occurs on the x86 machine.
CMake Error at /home/nishant/workspace/llm/openvino.genai/thirdparty/openvino_tokenizers/CMakeLists.txt:15 (find_package):
By not providing "FindOpenVINO.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "OpenVINO",
but CMake did not find one.
Could not find a package configuration file provided by "OpenVINO" with any
of the following names:
OpenVINOConfig.cmake
openvino-config.cmake
Add the installation prefix of "OpenVINO" to CMAKE_PREFIX_PATH or set
"OpenVINO_DIR" to a directory containing one of the above files. If
"OpenVINO" provides a separate development package or SDK, be sure it has
been installed.
-- Configuring incomplete, errors occurred!
I've had to switch between the two machines to execute specific commands that run on those machines successfully. This could have resulted in some issues as well.
Benchmark cmd:
numactl -C 0-55 -m 0 python benchmark.py -m /root/.cache/huggingface/hub/flan-t5-xl-ov/pytorch/dldt/FP16 -p "It is done..." -n 3 -bs 1
-d CPU --torch_compile_backend openvino -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log
This task regards enabling tests for mpt-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for Phi-1_5. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for tiny-llama-1b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
Step 1 of the SD1.5 setup is failing on Windows. Is there a specific channel that should be added?
(openvino_sd_cpp) C:\Users\local_user>conda install openvino eigen c-compiler cxx-compiler make
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
- c-compiler
- openvino
- make
- cxx-compiler
Current channels:
- https://repo.anaconda.com/pkgs/main/win-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/win-64
- https://repo.anaconda.com/pkgs/r/noarch
- https://repo.anaconda.com/pkgs/msys2/win-64
- https://repo.anaconda.com/pkgs/msys2/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
System details:
Intel Core Ultra 155H
32GB LPDDR5
conda v23.7.4
I downloaded and compiled on Windows. I probably made a stupid mistake, and that's why it won't work, but I'm trying to get this project done so I can join the Intel Partner Alliance. I can't get this project to run because of a weird error preventing the DLL from loading, even though it exists at the exact location - I Ctrl+clicked it and it opened, so it's there. It just won't load. Can you tell if there's anything in particular that would cause the DLL to be unable to load?
This is using OpenVINO 2024.0.0 with the model "acen20/Mistral-7B-Instruct-v0.2-openvino-int4".
Error output:
(base) PS C:\Users\hdtru\prg\openvino.genai\text_generation\causal_lm\cpp> .\build\Release\beam_search_causal_lm.exe " C:\Users\hdtru\.cache\huggingface\hub\models--acen20--Mistral-7B-Instruct-v0.2-openvino-int4\snapshots\0a94c646b59e31dc1c52024c1469a5087edf704c\openvino_model.xml" "What is your name?"
Exception from src\inference\src\cpp\core.cpp:163:
Cannot add extension. Cannot find entry point to the extension library. This error happened: Cannot load library 'C:\Users\hdtru\prg\openvino.genai\text_generation\causal_lm\cpp\build\openvino_tokenizers\src\Release\openvino_tokenizers.dll': 127 from cwd: C:\Users\hdtru\prg\openvino.genai\text_generation\causal_lm\cpp
Fix it so the DLL loads correctly... not sure what else to say!
No response
none
not sure
No response
Running convert.py for microsoft/trocr-base-printed gives the below error:
ValueError: Unrecognized configuration class <class
'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MambaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
As you can see in the error message, it says Model type should be one of ... and "TrOCRConfig" is part of the list. So "trocr-base-printed" should be supported, but the conversion fails.
N/A
No response
N/A
No response
End Of Sequence (EOS) tokens are an essential part of LLM training and inference. You can find more details in this comment.
Thanks to a PR adding End Of Sequence tokens to Runtime Info, openvino_tokenizers now puts the EOS token value into the rt_info section of the OpenVINO Intermediate Representation (the .xml file, to be specific) when converting a tokenizer to OpenVINO.
Since EOS has been enabled in OpenVINO, it now needs to be enabled in the GenAI text_generation module. beam_search_causal_lm.cpp and greedy_causal_lm.cpp from https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp should read the EOS token instead of keeping a hardcoded value with the comment // There's no way to extract special token values from the detokenizer for now.
The value should be extracted using ov::Model::get_rt_info() and used; remove the comments about the absence of a way to extract that value.
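A minimal sketch of what reading that value could look like, assuming the converted tokenizer IR stores the id under an rt_info key named eos_token_id and is saved as openvino_tokenizer.xml (both the key and file name are assumptions and should be checked against what openvino_tokenizers actually writes):

```cpp
#include <openvino/openvino.hpp>

#include <cstdint>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    ov::Core core;
    // Path is illustrative: the directory produced by convert_tokenizer with --with-detokenizer.
    std::shared_ptr<ov::Model> tokenizer =
        core.read_model(std::string{argv[1]} + "/openvino_tokenizer.xml");

    // rt_info is an ov::AnyMap attached to the model; openvino_tokenizers is
    // expected to have written the EOS token id into it during conversion.
    ov::AnyMap& rt_info = tokenizer->get_rt_info();

    int64_t eos_token_id = 2;  // fallback if the key is absent (e.g. a model-specific hardcoded value)
    auto it = rt_info.find("eos_token_id");  // assumed key name
    if (it != rt_info.end()) {
        eos_token_id = it->second.as<int64_t>();
    }
    std::cout << "EOS token id: " << eos_token_id << '\n';
    return 0;
}
```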
This task regards enabling tests for red-pajama-3b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
As Title
I want to capture profiling information in my C++ application using get_runtime_model(). I attempted to patch the sample with something like:
ov::CompiledModel compiledModel = core.compile_model(
    std::string{argv[1]} + "/openvino_model.xml", "CPU",
    ov::device::properties("CPU", ov::enable_profiling(true)));
...
std::string FLAGS_exec_graph_path = "greedy_causal_lm.exec_graph.xml";
try {
    ov::serialize(compiledModel.get_runtime_model(), FLAGS_exec_graph_path);
    std::cerr << "Executable graph is stored to " << FLAGS_exec_graph_path << std::endl;
} catch (const std::exception& ex) {
    std::cerr << "Can't get executable graph: " << ex.what() << std::endl;
}
but when I execute the sample I hit an error with the profiling code:
./build/greedy_causal_lm llama-2-7b-chat.f16.int4/pytorch/dldt/compressed_weights/OV_FP16-4BIT_DEFAULT "Why is the Sun yellow?"
Can't get executable graph: Exception from src/inference/src/cpp/compiled_model.cpp:35:
Exception from src/plugins/intel_cpu/src/node.cpp:499:
Node Broadcast_143553 contains less child edges than 1
The sample works fine without the profiling snippet.
Can an option be added to the sample C++ applications to correctly dump runtime profiling information?
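As a possible interim workaround (not a fix for the get_runtime_model() exception above), per-node timings can also be read from the infer request after a run, provided the model was compiled with ov::enable_profiling(true); a minimal sketch:

```cpp
#include <openvino/openvino.hpp>

#include <iostream>

// Print per-node execution statistics collected for the last inference.
// Assumes `request` comes from a model compiled with ov::enable_profiling(true)
// and that at least one inference has already been run.
void dump_profiling(ov::InferRequest& request) {
    for (const ov::ProfilingInfo& info : request.get_profiling_info()) {
        if (info.status == ov::ProfilingInfo::Status::NOT_RUN) {
            continue;  // skip nodes that were never executed
        }
        std::cout << info.node_name << " (" << info.node_type << ", " << info.exec_type << "): "
                  << info.real_time.count() << " us\n";
    }
}
```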
(openvino.genai) E:\projects\openvino.genai\text_generation\causal_lm\cpp>python ......\llm_bench\python\convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir .\TinyLlama-1.1B-Chat-v1.0\ --precision FP16 --stateful
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6'
Traceback (most recent call last):
File "D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\utils\import_utils.py", line 1364, in get_module
return importlib.import_module("." + module_name, self.name)
File "D:\anaconda\envs\openvino.genai\lib\importlib_init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in call_with_frames_removed
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\onnx_main.py", line 33, in
from .convert import export_models, validate_models_outputs
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\onnx\convert.py", line 49, in
from transformers.pytorch_utils import is_torch_less_than_1_11
ImportError: cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils' (D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\pytorch_utils.py)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\projects\openvino.genai\llm_bench\python\convert.py", line 27, in
from optimum.exporters.openvino import export_models
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\openvino_init_.py", line 1, in
from .main import main_export
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\openvino_main_.py", line 24, in
from optimum.exporters.onnx import main as optimum_main
File "", line 1075, in _handle_fromlist
File "D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\utils\import_utils.py", line 1352, in getattr
value = self._get_module(name)
File "D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\utils\import_utils.py", line 1366, in _get_module
raise RuntimeError(
RuntimeError: Failed to import optimum.exporters.onnx.main because of the following error (look up to see its traceback):
cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils' (D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\pytorch_utils.py)
Request:
Maybe pin the versions of all Python packages instead of using ">=" or leaving versions unspecified for model conversion.
This task regards enabling tests for chatglm3-6b. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for red-pajama-3b-instruct. You can find more details under openvino_notebooks LLM question answering README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
This task regards enabling tests for notus-7b-v1. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
Benchmark cmd (-ic is 128):
numactl -C 0-55 -m 0 python benchmark.py -m /root/.cache/huggingface/hub/chatglm3-6b-ov/pytorch/dldt/FP16 -p "It is done, and submitted..." -n 2 -bs 1 -d CPU --torch_compile_backend openvino -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log
BTW, ChatGLM2's output size is right.
@Wovchena This is related to #406 but goes deeper, hence I decided to make this a new issue.
During greedy_causal_lm inference on arm:
1. Large matmuls (e.g. 1x2x4096:4096x4096 in query/key/value projection) fall back to a slow reference matmul implementation (ref_any_bf16).
2. Small matmuls (e.g. 1x32x2x2:1x32x2x128 in dot-product attention) use a faster gemm_acl_f16 kernel, which indicates that ACL kernels through oneDNN are available but not being used for point (1).
3. On x86, most matmuls use the brgemm_avx512_bf16 kernel, resulting in much faster inference.
Question: Is there a heuristic in OpenVINO that causes large matmuls on arm to fall back to the reference implementation?
Following #327, I added code in greedy_causal_lm.cpp to save profiling information after inference. The table below shows the kernels used on x86 vs arm and their runtime (in microseconds) from the first decoder block of meta-llama/Llama-2-7b-hf (HuggingFace, compressed to INT8_ASYM).
Operation | Shape | x86 kernel | x86 time (mcs) | arm kernel | arm time (mcs) |
---|---|---|---|---|---|
q_proj | 1x2x4096:4096x4096 | brgemm_avx512_bf16 | 584 | ref_any_f16 | 44714 |
k_proj | 1x2x4096:4096x4096 | brgemm_avx512_bf16 | 699 | ref_any_f16 | 39713 |
v_proj | 1x2x4096:4096x4096 | brgemm_avx512_bf16 | 5775 | ref_any_f16 | 37816 |
o_proj | 1x2x4096:4096x4096 | brgemm_avx512_bf16 | 833 | ref_any_f16 | 36469 |
mlp.gate_proj | 1x2x4096:4096x11008 | brgemm_avx512_bf16 | 1023 | ref_any_f16 | 97537 |
mlp.up_proj | 1x2x4096:4096x11008 | brgemm_avx512_bf16 | 965 | ref_any_f16 | 101677 |
mlp.down_proj | 1x2x11008:11008x4096 | brgemm_avx512_bf16 | 953 | ref_any_f16 | 98749 |
classifier (logits) | 1x2x4096:4096x32000 | brgemm_avx512_bf16 | 1813 | ref_any_f16 | 284708 |
For small matmuls however, I noticed gemm_acl_f16 being called on arm.
Operation | Shape | arm kernel | arm time (mcs) |
---|---|---|---|
A = q_proj * k_proj ^ T | 1x32x2x128:1x32x2x128 | gemm_acl_f16 | 226 |
A * v_proj | 1x32x2x2:1x32x2x128 | gemm_acl_f16 | 69 |
This indicates that ACL kernels are available during inference but are not being used for large matmuls specifically. To check that this divergence is not coming from oneDNN, I compiled oneDNN with ACL using the following version combinations and benchmarked the above matmul sizes with benchdnn. Results are shown below; they indicate that gemm:acl is used on arm for all of the above sizes when the two compile correctly.
oneDNN source | oneDNN version | ACL version | Kernel used on arm | Comments |
---|---|---|---|---|
oneapi-src/oneDNN | 3.3.3 | 24.02.1 | Build fails; error from ACL | References to ACL v24.02.1 were found in openvino docs. |
oneapi-src/oneDNN | 3.3.3 | 23.11 | Build fails; error from ACL | N/A |
oneapi-src/oneDNN | 3.3.3 | 23.08 | gemm:acl | Expected kernel is used. |
openvinotoolkit/oneDNN | 3.3.3 | 24.02.1 | gemm:jit_f32 | ACL compilation flag ignored by cmake |
openvinotoolkit/oneDNN | 3.3.3 | 23.11 | gemm:jit_f32 | ACL compilation flag ignored by cmake |
openvinotoolkit/oneDNN | 3.3.3 | 23.08 | gemm:jit_f32 | ACL compilation flag ignored by cmake |
I think ACL kernels should ideally be used for any matmul on arm since they are available and being used for some operations. Since oneDNN is not the one causing a reference kernel to be used, I am led to believe that a heuristic within OpenVINO is causing this fallback. I'm hoping to know whether that is indeed the case, and (if yes) where this heuristic has been implemented.
Convert cmd:
python3 convert.py --model_id tiiuae/falcon-40b --output_dir /root/.cache/huggingface/hub/tiiuae/falcon-40b-ov --stateful --precision FP16
Benchmarking cmd:
numactl -C 0-55 -m 0 python benchmark.py -m /root/.cache/huggingface/hub/falcon-40b-ov/pytorch/dldt/FP16 -p "It is done" -n 3 -bs 1 -d CPU --torch_compile_backend openvino -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log
Speculative decoding is a very popular method to speed up text generation significantly; moreover, it is already being adopted by industry.
Could you add such an example for text generation? For example, it could be Llama2-7B + TinyLlama.
This task regards enabling tests for gemma-2b-it. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
This task regards enabling tests for dolly-v2-3b. You can find more details under openvino_notebooks LLM question answering README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response
Conversion configs have been merged into optimum, so we can use optimum-cli directly.
This task regards enabling tests for zephyr-7b-beta. You can find more details under openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at #259
Described in the main Discussion issue at: #259
No response