
openvino.genai's People

Contributors

akiseakusa, andrei-kochin, andreyanufr, anzr299, as-suvorov, dependabot[bot], eaidova, fionazz92, ilya-lavrenov, jlxintel, kodiaqq, likholat, ljaljushkin, p-wysocki, pavel-esir, peterchen-intel, qxprakash, sammysun0711, skuros, slyalin, sswierze, usstq, vurusovs, wgzintel, wovchena, xczhai, xipingyan, yangsu2022, yatarkan, zhangyiintel


openvino.genai's Issues

[Good First Issue]: Verify gemma-7b-it with GenAI text_generation

Context

This task regards enabling tests for gemma-7b-it. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify dolly-v2-12b with GenAI text_generation

Context

This task regards enabling tests for dolly-v2-12b. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify baichuan2-7b-chat with GenAI text_generation

Context

This task regards enabling tests for baichuan2-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify llama-2-7b-chat with GenAI text_generation

Context

This task regards enabling tests for llama-2-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

Requesting help to understand how TTFT is calculated. Is there any documentation?

Context

Requesting help to understand how TTFT (time to first token) is calculated.

What needs to be done?

Requesting help to understand how TTFT is calculated

Example Pull Requests

No response

Resources

Contact points

@ngaloppo

Ticket

Requesting help to understand how TTFT is calculated
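
For reference, TTFT is commonly defined as the wall-clock time from the start of generation (including the prompt/prefill pass) until the first output token is produced, with per-token latency then reported over the remaining tokens; whether llm_bench follows exactly this definition should be confirmed in its sources. Below is a minimal, self-contained C++ timing sketch under that assumption; run_prefill_and_first_token and generate_next_token are hypothetical stand-ins for the real inference calls, not functions from this repository.

    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Hypothetical stand-ins for the real prefill and decode steps.
    int64_t run_prefill_and_first_token() {
        std::this_thread::sleep_for(std::chrono::milliseconds(120));  // simulated prefill latency
        return 42;  // id of the first generated token
    }

    int64_t generate_next_token() {
        std::this_thread::sleep_for(std::chrono::milliseconds(30));   // simulated per-token latency
        return 7;
    }

    int main() {
        using clock = std::chrono::steady_clock;

        const auto start = clock::now();
        std::vector<int64_t> tokens{run_prefill_and_first_token()};
        const auto first_token_time = clock::now();

        for (int i = 0; i < 31; ++i)
            tokens.push_back(generate_next_token());
        const auto end = clock::now();

        const double ttft_ms =
            std::chrono::duration<double, std::milli>(first_token_time - start).count();
        // Per-token latency is usually reported over the 2nd..Nth tokens only.
        const double ms_per_token =
            std::chrono::duration<double, std::milli>(end - first_token_time).count() /
            static_cast<double>(tokens.size() - 1);

        std::cout << "TTFT: " << ttft_ms << " ms, " << ms_per_token << " ms/token\n";
        return 0;
    }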

[Good First Issue]: Verify mini-cpm-2b-dpo with GenAI text_generation

Context

This task regards enabling tests for mini-cpm-2b-dpo. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

[Infra] Use dependabot to update Python dependencies

Currently, dependencies in the requirements files are not pinned, so new dependency releases break our pipelines, as is happening on 2023.3 now.

Let's pin the dependencies and use Dependabot to track new versions and submit updates as PRs. Such PRs can then be tested for compatibility with our CI.

[Good First Issue]: Verify phi-2 with GenAI text_generation

Context

This task regards enabling tests for phi-2. You can find more details under openvino_notebooks LLM question answering README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

llm_bench requirements.txt / environment setup is broken

auto-gptq can't be installed due to a build error:

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/gx/znq60x355475d1njb709q0jh0000gn/T/pip-install-2tmqwfxi/auto-gptq_3c9ac5ea4a8f4b049c9ff044cc750c38/setup.py", line 58, in <module>
          CUDA_VERSION = "".join(os.environ.get("CUDA_VERSION", default_cuda_version).split("."))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: 'NoneType' object has no attribute 'split'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

The issue is that the CPU-only build of PyTorch is installed, in which default_cuda_version = torch.version.cuda is None. This looks like an upstream issue, but we may need a workaround here.

[Good First Issue]: Verify mistral-7b with GenAI text_generation

Context

This task regards enabling tests for mistral-7b. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

Int4(llama-7b-chat) converted model generates response with German words

I have converted the llama-7b-chat model to INT4 using the following commands:

python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP16 --compress_weights INT4_SYM INT4_ASYM 4BIT_DEFAULT

python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP32 --compress_weights 4BIT_DEFAULT

I'm benchmarking with the INT4-converted models. I tried the following variations and, as the attached screenshots show, all of the responses contain German words.

  • OV_FP16-INT4_ASYM (screenshot attached)

  • OV_FP16-INT4_SYM (screenshot attached)

    Tried with a different prompt: it gives a partial answer in German (screenshot attached).

  • OV_FP16-4BIT_DEFAULT (screenshot attached)

  • OV_FP32-4BIT_DEFAULT (screenshot attached)

Using the following prompt generates the complete response in German (screenshot attached).

Am I missing something here? Please provide some guidance.

Conversion of baichuan-inc/Baichuan2-13B-Chat failed

Convert cmd:

python3 convert.py         --model_id baichuan-inc/Baichuan2-13B-Chat         --output_dir /root/.cache/huggingface/hub/baichuan2-13b-chat-ov         --stateful --precision FP16

(error screenshot attached)

BTW, Baichuan2-7B works well

[Good First Issue]: Verify qwen1.5-7b-chat with GenAI text_generation

Context

This task regards enabling tests for qwen1.5-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

Llama-2-7b-hf convert "Export model to OpenVINO directly failed" "Model will be exported to ONNX"

python convert.py --model_id /home/llm/disk/llm/meta-llama/Llama-2-7b-hf --output_dir /home/llm/disk/llm/meta-llama/Llama-2-7b-hf-openvino --precision FP32
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
[ INFO ] openvino runtime version: 2023.3.0-13775-ceeafaf64f3-releases/2023/3
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.50s/it]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
The cos_cached attribute will be removed in 4.40. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead.
The sin_cached attribute will be removed in 4.40. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead.
/home/llm/miniconda3/envs/openvino/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:1057: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_218 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_218':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True

[Good First Issue]: Verify youri-7b-chat with GenAI text_generation

Context

This task regards enabling tests for youri-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

No performance improvement between FP32 and FP32-INT4_ASYM

I am running Qwen-7B on SPR.
I found no significant performance improvement between FP32 and the compressed FP32-INT4_ASYM model.

FP32 benchmarking cmd:

python benchmark.py     -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/FP32     -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. "     -n 5     -ic 32     -bs 1     --num_beams 1     -d CPU     --torch_compile_backend openvino

Latency: 129.71 ms/token

INT4 benchmarking cmd:

python benchmark.py     -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/compressed_weights/OV_FP32-INT4_ASYM     -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. "     -n 5     -ic 32     -bs 1     --num_beams 1     -d CPU     --torch_compile_backend openvino

Latency: 121.91 ms/token

Did I miss something important in the benchmarking command, or is this an issue?

The quality of the generated image is low when benchmarking Stable Diffusion v1.5

I've run the benchmark to test Stable Diffusion v1.5, but the quality of the generated image is low; I suspect something is being computed incorrectly in the process. I compared it with optimum-intel, using the same prompt, steps (20), and resolution (512x512), and got images of very different quality. Please have a look at the attached images (xx_1 and stable-diffusion-optimum-sdv1_5_p0_iter0_pid15516_output).

Stable Diffusion 1.5 generates a noise image with "--num"

Environment
Windows 11
openvino 2024.1.0
12900K+A770

Command
build\Release\stable_diffusion.exe --num 4 -d GPU.1 -m models/dreamlike_anime_1_0_ov

Description

With the argument "--num 4", a noise image appears starting from the second generation round (screenshot attached).

[Discussion][Good First Issue]: Verify different LLMs work with text_generation

Context

This is an effort to increase Large Language Models tests coverage in OpenVINO GenAI.

Working on this task will let you familiarize yourself with:

  • C++ inference workflow
  • Different LLM models
  • GitHub Actions extension
  • Adding Python tests
  • And more

If you would like to add a new model for which there is no task yet, please let us know! We would love to get outside ideas.

What needs to be done?

  1. Try running the LLM specified in your task with greedy_causal_lm.cpp and beam_search_causal_lm.cpp.
  2. If it doesn't fail and produces reasonable output, submit a PR to extend the supported models list.
  3. If the model is small enough to fit on the default GitHub Actions runner, add the model to the tests here.
  4. If the comparison against Python fails, it's still worth adding a test for the model, just without the comparison against Python; leave a comment in the workflow code if this is the case. Since the default runners are available to everyone, you can verify that the test passes by opening a PR in your own fork first.

Example Pull Requests

Example commit: bf4c200#diff-2c8a6fc2893aa2e1103985c1ee763cc325de6042ea66a11ae30428d77e73e416

Resources

List of tasks in the effort


Contact points

@Wovchena

Ticket

134074

some regression for benchmark on stable diffusion v1.5

Dear,
I've run llm_bench\python to test Stable Diffusion v1.5 and found some regression between different OpenVINO packages, as shown below:

  1. 2023.3 -> 8.73 s

  2. 2024.0 -> 10.65 s

  3. 2024.1 -> 9.18 s

The parameters are 20 steps at 512x512; the rest are the same as in the prompt/stable-diffusion.jsonl file. The command is:

python benchmark.py -d GPU --model "C:\AIGC\openvino\models\stable-diffusion-optimum-sdv1_5" --prompt_file "prompts/stable-diffusion.jsonl" -n 1

The platform is MTL U9 185 iGPU with 32GB.
Thanks a lot,

Causal LM text generation example errors out

I am trying to run text generation using text_generation/causal_lm/cpp/greedy_causal_lm.cpp without any modifications. I followed the build instructions in the README and ran this command, after which I was shown the following error.

$ ./build/greedy_causal_lm ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16 "Why is the Sun yellow?"
Exception from src/inference/src/core.cpp:85:
Exception from src/frontends/ir/src/ir_deserializer.cpp:438:
Invalid IR! ScatterNDUpdate_15 name is not unique!

I have limited knowledge of the toolkit's internal processes, and would like to get some indication of where this issue might be arising from (and how it could be resolved).

System information

  • Platform: Linux (Ubuntu 22.04) running on an x86 IceLake CPU
  • OpenVINO 2024.1.0, installed by following instructions here
  • gcc and g++ versions 11.4.0

The python environment has the following packages among others:

torch==2.3.0
openvino==2024.1.0
openvino-tokenizers==2024.1.0
transformers==4.37.2
optimum==1.19.1

Other things to note

  • I happen to have access to two machines with aarch64 and x86 processors and a shared file system. The tokenizer conversion using openvino-tokenizers (command below) runs successfully every time on the aarch64 machine but often results in a seg fault on the x86 machine. I could not find any pattern in the occurrence of seg faults.
convert_tokenizer ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --output ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --with-detokenizer --trust-remote-code
  • On the aarch64 machine, trying to build greedy_causal_lm.cpp along with other files using cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j results in the following error. This never occurs on the x86 machine.
CMake Error at /home/nishant/workspace/llm/openvino.genai/thirdparty/openvino_tokenizers/CMakeLists.txt:15 (find_package):
  By not providing "FindOpenVINO.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "OpenVINO",
  but CMake did not find one.

  Could not find a package configuration file provided by "OpenVINO" with any
  of the following names:

    OpenVINOConfig.cmake
    openvino-config.cmake

  Add the installation prefix of "OpenVINO" to CMAKE_PREFIX_PATH or set
  "OpenVINO_DIR" to a directory containing one of the above files.  If
  "OpenVINO" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

I've had to switch between the two machines to execute specific commands that run on those machines successfully. This could have resulted in some issues as well.

[Improvement] Wrong output size of google/flan-t5-xl

Benchmark cmd:

numactl -C 0-55 -m 0 python benchmark.py     -m /root/.cache/huggingface/hub/flan-t5-xl-ov/pytorch/dldt/FP16     -p "It is done..."     -n 3     -bs 1
-d CPU --torch_compile_backend openvino     -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log

The -ic value is 128, but the output log shows 129 tokens (screenshot attached).

[Good First Issue]: Verify mpt-7b-chat with GenAI text_generation

Context

This task regards enabling tests for mpt-7b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify Phi-1_5 with GenAI text_generation

Context

This task regards enabling tests for Phi-1_5. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify tiny-llama-1b-chat with GenAI text_generation

Context

This task regards enabling tests for tiny-llama-1b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

conda install fails on Windows

Step 1 of the SD1.5 setup is failing on Windows. Is there a specific channel that should be added?

(openvino_sd_cpp) C:\Users\local_user>conda install openvino eigen c-compiler cxx-compiler make
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - c-compiler
  - openvino
  - make
  - cxx-compiler

Current channels:

  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

System details:
Intel Core Ultra 155H
32GB LPDDR5
conda v23.7.4

[Good First Issue]: Project won't run because it can't load openvino_tokenizers.dll - even though it exists!

Context

I downloaded and compiled the project on Windows. I probably made a simple mistake, and that's why it won't work. I'm trying to get this project running so I can join the Intel Partner Alliance, but I can't because a strange error prevents the DLL from loading, even though it exists at the exact location reported (Ctrl-clicking the path opens it, so it is there). Can you tell if there is anything in particular that would cause the DLL to fail to load?

This is using OpenVINO 2024.0.0 with the model "acen20/Mistral-7B-Instruct-v0.2-openvino-int4".

Error output:

(base) PS C:\Users\hdtru\prg\openvino.genai\text_generation\causal_lm\cpp> .\build\Release\beam_search_causal_lm.exe " C:\Users\hdtru\.cache\huggingface\hub\models--acen20--Mistral-7B-Instruct-v0.2-openvino-int4\snapshots\0a94c646b59e31dc1c52024c1469a5087edf704c\openvino_model.xml" "What is your name?"
Exception from src\inference\src\cpp\core.cpp:163:
Cannot add extension. Cannot find entry point to the extension library. This error happened: Cannot load library 'C:\Users\hdtru\prg\openvino.genai\text_generation\causal_lm\cpp\build\openvino_tokenizers\src\Release\openvino_tokenizers.dll': 127 from cwd: C:\Users\hdtru\prg\openvino.genai\text_generation\causal_lm\cpp

What needs to be done?

Fix it so dll loads correctly... not sure what else to say!

Example Pull Requests

No response

Resources

none

Contact points

not sure

Ticket

No response

MPT-30B hits a benchmarking issue

(error screenshot attached)

Benchmarking cmd:

numactl -C 0-55 -m 0 python benchmark.py     -m /root/.cache/huggingface/hub/mpt-30b-chat-ov/pytorch/dldt/FP16     -p "It is done, and submitted. ..."     -n 2     -bs 1     -d CPU --torch_compile_backend openvino     -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log

Convert.py breaks for microsoft/trocr-base-printed

Context

Running convert.py for microsoft/trocr-base-printed gives the error below:

ValueError: Unrecognized configuration class <class
'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MambaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.

As you can see in the error message, it says Model type should be one of ... and "TrOCRConfig" is part of the list. So "trocr-base-printed" should be supported, but the conversion fails.

What needs to be done?

N/A

Example Pull Requests

No response

Resources

Contact points

N/A

Ticket

No response

[Good First Issue]: causal_lm/cpp must read EOS token value from rt_info of openvino_tokenizer.xml

Context

End Of Sequence tokens are an essential part of LLM training and inference. You can find more details in this comment.

Thanks to a PR adding End Of Sequence tokens to Runtime Info, openvino_tokenizers now puts the EOS token value into the rt_info section of the OpenVINO Intermediate Representation (the .xml file, to be specific) when converting a tokenizer to OpenVINO.

Since EOS has been enabled in OpenVINO, it now needs to be enabled in the GenAI text_generation module.

What needs to be done?

beam_search_causal_lm.cpp and greedy_causal_lm.cpp from https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp should read the EOS token value instead of hardcoding it with the comment // There's no way to extract special token values from the detokenizer for now.

It's required to extract the value using ov::Model::get_rt_info() and use it; remove the comments about there being no way to extract that value.
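
A minimal C++ sketch of how this could look, assuming openvino_tokenizers stores the value under an rt_info key named "eos_token_id" (the exact key should be verified in the converted openvino_tokenizer.xml):

    #include <openvino/openvino.hpp>

    #include <iostream>
    #include <string>

    int main(int argc, char* argv[]) {
        ov::Core core;
        // argv[1] is assumed to point at the directory with the converted tokenizer IR.
        std::shared_ptr<ov::Model> tokenizer =
            core.read_model(std::string{argv[1]} + "/openvino_tokenizer.xml");

        int64_t eos_token_id = -1;                    // fallback if the key is absent
        if (tokenizer->has_rt_info("eos_token_id")) { // assumed key name
            eos_token_id = tokenizer->get_rt_info<int64_t>("eos_token_id");
        }
        std::cout << "EOS token id: " << eos_token_id << '\n';
        return 0;
    }

The generation loop would then stop when this id is produced instead of comparing against a hardcoded constant.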

Example Pull Requests

Resources

Contact points

@pavel-esir

Ticket

132861

[Good First Issue]: Verify red-pajama-3b-chat with GenAI text_generation

Context

This task regards enabling tests for red-pajama-3b-chat. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

Add profiling option to C++ sample for greedy_causal_lm

I want to capture profiling information in my C++ application using get_runtime_model(). I attempted to patch the sample with something like:

    ov::CompiledModel compiledModel = core.compile_model(
        std::string{argv[1]} + "/openvino_model.xml", "CPU", ov::device::properties("CPU", ov::enable_profiling(true)) );
...

    std::string FLAGS_exec_graph_path = "greedy_causal_lm.exec_graph.xml";
    try {
        ov::serialize(compiledModel.get_runtime_model(), FLAGS_exec_graph_path);
        std::cerr << "Executable graph is stored to " << FLAGS_exec_graph_path << std::endl;
    } catch (const std::exception& ex) {
        std::cerr << "Can't get executable graph: " << ex.what() << std::endl;
    }

but when I execute the sample I hit an error with the profiling code:

./build/greedy_causal_lm llama-2-7b-chat.f16.int4/pytorch/dldt/compressed_weights/OV_FP16-4BIT_DEFAULT "Why is the Sun yellow?"
Can't get executable graph: Exception from src/inference/src/cpp/compiled_model.cpp:35:
Exception from src/plugins/intel_cpu/src/node.cpp:499:
Node Broadcast_143553 contains less child edges than 1

The sample works fine without the profiling snippet.

Can an option be added to the sample C++ applications to correctly dump runtime profiling information?
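
As a possible interim alternative while get_runtime_model() serialization fails, per-node timings can also be pulled from the infer request after a run via ov::InferRequest::get_profiling_info(), which only requires ov::enable_profiling(true) at compile time. A rough sketch (illustrative only, not the sample's actual code; the model path handling mirrors the sample):

    #include <openvino/openvino.hpp>

    #include <iostream>
    #include <string>

    int main(int argc, char* argv[]) {
        ov::Core core;
        ov::CompiledModel compiled = core.compile_model(
            std::string{argv[1]} + "/openvino_model.xml", "CPU", ov::enable_profiling(true));
        ov::InferRequest request = compiled.create_infer_request();

        // ... set input tensors and run the generation loop exactly as the sample does ...
        // request.infer();

        // After at least one inference, per-node profiling data is available on the request.
        for (const ov::ProfilingInfo& info : request.get_profiling_info()) {
            std::cout << info.node_name << " (" << info.exec_type << "): "
                      << info.real_time.count() << " us\n";
        }
        return 0;
    }

This avoids serializing the runtime model altogether, though an option in the sample applications to dump the executable graph would still be useful.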

Looking for verified/pinned versions of Python packages in llm_bench/python/requirements.txt

(openvino.genai) E:\projects\openvino.genai\text_generation\causal_lm\cpp>python ......\llm_bench\python\convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir .\TinyLlama-1.1B-Chat-v1.0\ --precision FP16 --stateful
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6'
Traceback (most recent call last):
File "D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\utils\import_utils.py", line 1364, in get_module
return importlib.import_module("." + module_name, self.name)
File "D:\anaconda\envs\openvino.genai\lib\importlib_init
.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in call_with_frames_removed
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\onnx_main
.py", line 33, in
from .convert import export_models, validate_models_outputs
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\onnx\convert.py", line 49, in
from transformers.pytorch_utils import is_torch_less_than_1_11
ImportError: cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils' (D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\pytorch_utils.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "E:\projects\openvino.genai\llm_bench\python\convert.py", line 27, in
from optimum.exporters.openvino import export_models
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\openvino_init_.py", line 1, in
from .main import main_export
File "D:\anaconda\envs\openvino.genai\lib\site-packages\optimum\exporters\openvino_main_.py", line 24, in
from optimum.exporters.onnx import main as optimum_main
File "", line 1075, in _handle_fromlist
File "D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\utils\import_utils.py", line 1352, in getattr
value = self._get_module(name)
File "D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\utils\import_utils.py", line 1366, in _get_module
raise RuntimeError(
RuntimeError: Failed to import optimum.exporters.onnx.main because of the following error (look up to see its traceback):
cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils' (D:\anaconda\envs\openvino.genai\lib\site-packages\transformers\pytorch_utils.py)

Request:
Please pin the versions of all Python packages instead of using ">=" or leaving versions unspecified, so model conversion works reliably.

[Good First Issue]: Verify chatglm3-6b with GenAI text_generation

Context

This task regards enabling tests for chatglm3-6b. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify red-pajama-3b-instruct with GenAI text_generation

Context

This task regards enabling tests for red-pajama-3b-instruct. You can find more details under openvino_notebooks LLM question answering README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify notus-7b-v1 with GenAI text_generation

Context

This task regards enabling tests for notus-7b-v1. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

ChatGLM3 output token count is not generated as expected

Benchmark cmd (-ic is 128):

numactl -C 0-55 -m 0 python benchmark.py     -m /root/.cache/huggingface/hub/chatglm3-6b-ov/pytorch/dldt/FP16     -p "It is done, and submitted..."     -n 2     -bs 1     -d CPU --torch_compile_backend openvino     -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log

(screenshot attached)

BTW, ChatGLM2's output size is right.

FullyConnected nodes use slow reference kernel on ARM

@Wovchena This is related to #406 but goes deeper, hence I decided to make this a new issue.

TL;DR

  • For greedy_causal_lm inference on arm, large matmuls (e.g. 1x2x4096:4096x4096 in query/key/value projection) fall back to using a slow reference matmul implementation (ref_any_bf16)
  • Smaller matmuls (e.g. 1x32x2x2:1x32x2x128 in dot-product attention) use a faster gemm_acl_f16 kernel, which indicates that ACL kernels through oneDNN are available but not being used for point (1).
  • On x86 most matmuls use brgemm_avx512_bf16 kernel resulting in much faster inference.

Question: Is there a heuristic in OpenVINO that causes large matmuls on arm to fall back to the reference implementation?

Details

Following #327, I added code in greedy_causal_lm.cpp to save profiling information after inference. The table below shows the kernels used on x86 vs arm and their runtime (in microseconds) from the first decoder block of meta-llama/Llama-2-7b-hf (HuggingFace, compressed to INT8_ASYM).

Operation Shape x86 kernel x86 time (mcs) arm kernel arm time (mcs)
q_proj 1x2x4096:4096x4096 brgemm_avx512_bf16 584 ref_any_f16 44714
k_proj 1x2x4096:4096x4096 brgemm_avx512_bf16 699 ref_any_f16 39713
v_proj 1x2x4096:4096x4096 brgemm_avx512_bf16 5775 ref_any_f16 37816
o_proj 1x2x4096:4096x4096 brgemm_avx512_bf16 833 ref_any_f16 36469
mlp.gate_proj 1x2x4096:4096x11008 brgemm_avx512_bf16 1023 ref_any_f16 97537
mlp.up_proj 1x2x4096:4096x11008 brgemm_avx512_bf16 965 ref_any_f16 101677
mlp.down_proj 1x2x11008:11008x4096 brgemm_avx512_bf16 953 ref_any_f16 98749
classifier (logits) 1x2x4096:4096x32000 brgemm_avx512_bf16 1813 ref_any_f16 284708

For small matmuls however, I noticed gemm_acl_f16 being called on arm.

Operation Shape arm kernel arm time (mcs)
A = q_proj * k_proj ^ T 1x32x2x128:1x32x2x128 gemm_acl_f16 226
A * v_proj 1x32x2x2:1x32x2x128 gemm_acl_f16 69

This indicates that ACL kernels are available during inference but are not being used for the large matmuls specifically. To check that this divergence is not coming from oneDNN, I compiled oneDNN with ACL using the following version combinations and benchmarked the above matmul sizes with benchdnn. The results, shown below, indicate that gemm:acl is used on arm for all of the above sizes when the two compile correctly.

oneDNN source oneDNN version ACL version Kernel used on arm Comments
oneapi-src/oneDNN 3.3.3 24.02.1 Build fails; error from ACL References to ACL v24.02.1 were found in openvino docs.
oneapi-src/oneDNN 3.3.3 23.11 Build fails; error from ACL N/A
oneapi-src/oneDNN 3.3.3 23.08 gemm:acl Expected kernel is used.
openvinotoolkit/oneDNN 3.3.3 24.02.1 gemm:jit_f32 ACL compilation flag ignored by cmake
openvinotoolkit/oneDNN 3.3.3 23.11 gemm:jit_f32 ACL compilation flag ignored by cmake
openvinotoolkit/oneDNN 3.3.3 23.08 gemm:jit_f32 ACL compilation flag ignored by cmake

I think ACL kernels should ideally be used for any matmul on arm since they are available and being used for some operations. Since oneDNN is not the one causing a reference kernel to be used, I am led to believe that a heuristic within OpenVINO is causing this fallback. I'm hoping to know whether that is indeed the case, and (if yes) where this heuristic has been implemented.

Falcon-40B converts successfully but the output_dir has not been created

Convert cmd:

python3 convert.py         --model_id tiiuae/falcon-40b         --output_dir /root/.cache/huggingface/hub/tiiuae/falcon-40b-ov         --stateful --precision FP16

Benchmarking cmd:

numactl -C 0-55 -m 0 python benchmark.py     -m /root/.cache/huggingface/hub/falcon-40b-ov/pytorch/dldt/FP16  -p "It is done" -n 3     -bs 1     -d CPU --torch_compile_backend openvino     -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log
  1. The model converts successfully to /root/.cache/huggingface/hub/tiiuae/falcon-40b-ov (screenshot attached).
  2. The model cannot be found during benchmarking (screenshot attached).
  3. The directory /root/.cache/huggingface/hub/tiiuae/falcon-40b-ov does not exist.

Add speculative decoding example

Speculative decoding is a very popular method for significantly speeding up text generation, and it is already being adopted by industry.
Could you add such an example for text generation? For example, it could be Llama2-7B as the target model with TinyLlama as the draft model.
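
For reference, the accept/verify loop at the heart of greedy speculative decoding is quite small. Below is a toy, self-contained C++ sketch of that loop; draft_next and target_next are deliberately trivial stand-in functions (not real models and not part of this repository), and a real example would replace them with a small draft LLM and the large target LLM.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Toy stand-ins: each predicts the next token from the current sequence.
    int64_t draft_next(const std::vector<int64_t>& seq) { return (seq.back() * 7 + 3) % 100; }
    int64_t target_next(const std::vector<int64_t>& seq) { return (seq.back() * 7 + 3) % 101; }

    int main() {
        std::vector<int64_t> tokens{1};   // "prompt"
        const size_t draft_len = 4;       // tokens proposed by the draft model per step
        const size_t max_tokens = 16;

        while (tokens.size() < max_tokens) {
            // 1. The cheap draft model proposes draft_len tokens autoregressively.
            std::vector<int64_t> draft = tokens;
            for (size_t i = 0; i < draft_len; ++i)
                draft.push_back(draft_next(draft));

            // 2. The target model verifies the proposals. In a real implementation this is a
            //    single batched forward pass over all proposed positions, which is where the
            //    speedup over plain autoregressive decoding comes from.
            std::vector<int64_t> verified = tokens;
            size_t accepted = 0;
            for (size_t i = 0; i < draft_len; ++i) {
                const int64_t target_token = target_next(verified);
                verified.push_back(target_token);
                if (target_token == draft[tokens.size() + i]) {
                    ++accepted;  // proposal matches the target's choice: keep going
                } else {
                    break;       // mismatch: keep the target's token, discard the rest of the draft
                }
            }
            tokens = verified;
            std::cout << "accepted " << accepted << " of " << draft_len << " draft tokens\n";
        }
        return 0;
    }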

[Good First Issue]: Verify gemma-2b-it with GenAI text_generation

Context

This task regards enabling tests for gemma-2b-it. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

MPT-30B hits a conversion issue

Convert cmd:

python3 convert.py         --model_id mosaicml/mpt-30b-chat         --output_dir /root/.cache/huggingface/hub/mpt-30b-chat-ov         --stateful --precision FP16

(error screenshot attached)

[Good First Issue]: Verify dolly-v2-3b with GenAI text_generation

Context

This task regards enabling tests for dolly-v2-3b. You can find more details under openvino_notebooks LLM question answering README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

[Good First Issue]: Verify zephyr-7b-beta with GenAI text_generation

Context

This task regards enabling tests for zephyr-7b-beta. You can find more details under openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response
