Comments (3)
The Arc770 and the iGPU can't work in the same environment; we are still working on it (related issue: #10940). But the error there is different: it should be `RuntimeError: could not create a primitive`. The difference may be caused by your different torch version.
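As a workaround while both devices are installed, inference can be pinned to a single SYCL device via the `ONEAPI_DEVICE_SELECTOR` environment variable, as the runs later in this thread do. A sketch (the device index `0` is an assumption; check the `sycl-ls` output on your machine first):

```shell
# List the SYCL devices oneAPI can see, to find the right index.
sycl-ls
# Expose only Level Zero device 0 (the iGPU here) to the process.
export ONEAPI_DEVICE_SELECTOR=level_zero:0
python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --prompt 'History of Intel' --n-predict 64
```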
from bigdl.
Got it! I will remove the ARC770 and test my iGPU on MTL again.
BTW, I also tested the same SW environment on my TGL platform (Core i7-1185G7), and the iGPU indeed works well.
- SW environment: Ubuntu 22.04 + kernel v6.8.2
intel_extension_for_pytorch 2.1.20+git0e2bee2
torch 2.1.0.post0+cxx11.abi
torchvision 0.16.0+fbb4cc5
intel-openmp 2024.1.0
openvino 2024.1.0
openvino-telemetry 2024.1.0
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.29138]
- Test result
(llm-test) intel@myDUT:~/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3$ ONEAPI_DEVICE_SELECTOR=level_zero:0 python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'History of Intel' --n-predict 64
2024-05-15 09:57:23,463 - INFO - intel_extension_for_pytorch auto imported
/home/intel/anaconda3/envs/llm-test/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.16it/s]
2024-05-15 09:57:25,302 - INFO - Converting the current model to sym_int4 format......
/home/intel/anaconda3/envs/llm-test/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
Inference time: 9.984711408615112 s
-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>
History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Intel Corporation is an American multinational corporation that specializes in the design and manufacture of microprocessors, memory chips, and other semiconductor technologies. Here is a brief history of the company:
**Early Years (1968-1979)**
Intel was founded on July 18, 1968, by Gordon Moore and Robert N
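The log line "Converting the current model to sym_int4 format" refers to symmetric 4-bit weight quantization. A minimal pure-Python sketch of the idea (illustrative only; the function names are hypothetical and this is not ipex-llm's actual kernel, which quantizes per block and runs on the GPU):

```python
def sym_int4_quantize(weights):
    # Symmetric 4-bit: signed integers in [-8, 7], zero point fixed at 0,
    # one float scale shared by the whole block of weights.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def sym_int4_dequantize(q, scale):
    # Reconstruction: multiply each 4-bit integer back by the scale.
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = sym_int4_quantize(weights)
restored = sym_int4_dequantize(q, scale)
# Each restored value lies within scale/2 of the original.
```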
@qiuxin2012, I appreciate your support.
@qiuxin2012 I confirmed the MTL-H iGPU works well without the ARC770 in the platform.
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 155H OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]
...
(llm) intel@mydevice:~/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3$ ONEAPI_DEVICE_SELECTOR=level_zero:0 python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'History of Intel' --n-predict 64
2024-05-15 10:36:33,547 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.48it/s]
2024-05-15 10:36:34,559 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
Inference time: 6.857227563858032 s
-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>
History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>
The legendary Intel!
Intel Corporation is an American multinational corporation that specializes in the design and manufacture of microprocessors, the "brain" of modern computers. Here's a brief history of the company:
**Early Years (1968-1971)**
Intel was founded on July 18, 1968, by Gordon
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt History of Intel --n-predict 64
Uptime: 63.459550 s
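The prompt echoed in both runs follows Llama 3's instruct chat template. A small sketch of building that string by hand (the helper name is hypothetical; in practice the tokenizer's `apply_chat_template` produces it):

```python
def llama3_user_prompt(user_msg: str) -> str:
    # Llama 3 instruct format: special header tokens delimit each turn,
    # and the trailing assistant header cues the model to generate a reply.
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_user_prompt("History of Intel")
```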