Comments (10)
Hi @raj-ritu17,
Would you mind sharing the result of pip list in your current conda env to help us better locate this issue?
@rnwang04, here is the list
(llama) intel@IMU-NEX-EMR1-SUT:~/ritu/ollama$ pip list
Package Version
--------------------------- ------------------
accelerate 0.21.0
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.2.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
backoff 2.2.1
bigdl-core-cpp 2.5.0b20240421
bigdl-core-xe-21 2.5.0b20240416
bigdl-core-xe-esimd-21 2.5.0b20240416
bigdl-llm 2.5.0b20240416
blinker 1.7.0
cachetools 5.3.3
certifi 2024.2.2
charset-normalizer 3.3.2
chromadb 0.3.25
click 8.1.7
clickhouse-connect 0.7.3
coloredlogs 15.0.1
dataclasses-json 0.5.14
diskcache 5.6.3
distro 1.9.0
dpcpp-cpp-rt 2024.0.2
duckdb 0.10.1
exceptiongroup 1.2.0
fastapi 0.110.0
filelock 3.13.1
flatbuffers 24.3.7
frozenlist 1.4.1
fsspec 2024.3.0
gguf 0.6.0
gitdb 4.0.11
GitPython 3.1.42
greenlet 3.0.3
h11 0.14.0
hnswlib 0.8.0
httpcore 1.0.4
httptools 0.6.1
httpx 0.25.2
huggingface-hub 0.21.4
humanfriendly 10.0
idna 3.6
influxdb-client 1.41.0
intel-cmplr-lib-rt 2024.0.2
intel-cmplr-lic-rt 2024.0.2
intel-extension-for-pytorch 2.1.10+xpu
intel-opencl-rt 2024.0.2
intel-openmp 2024.0.2
ipex-llm 2.1.0b20240421
Jinja2 3.1.3
jsonpatch 1.33
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
langchain 0.1.12
langchain-community 0.0.28
langchain-core 0.1.32
langchain-text-splitters 0.0.1
langsmith 0.1.27
llama_cpp_python 0.2.56
lz4 4.3.3
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.21.1
mdurl 0.1.2
mkl 2024.0.0
mkl-dpcpp 2024.0.0
monotonic 1.6
mpmath 1.3.0
multidict 6.0.5
mypy-extensions 1.0.0
networkx 3.2.1
numexpr 2.9.0
numpy 1.26.4
ollama 0.1.7
onednn 2024.0.0
onemkl-sycl-blas 2024.0.0
onemkl-sycl-datafitting 2024.0.0
onemkl-sycl-dft 2024.0.0
onemkl-sycl-lapack 2024.0.0
onemkl-sycl-rng 2024.0.0
onemkl-sycl-sparse 2024.0.0
onemkl-sycl-stats 2024.0.0
onemkl-sycl-vm 2024.0.0
onnxruntime 1.17.1
openai 1.14.1
openapi-schema-pydantic 1.2.4
orjson 3.9.15
overrides 7.7.0
packaging 23.2
pandas 2.0.3
pillow 10.2.0
pip 23.3.1
posthog 3.5.0
protobuf 4.25.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.1
pydantic 1.10.14
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2024.1
PyYAML 6.0.1
reactivex 4.0.4
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 68.2.2
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
SQLAlchemy 2.0.28
starlette 0.36.3
streamlit 1.32.2
sympy 1.12
tabulate 0.9.0
tbb 2021.11.0
tenacity 8.2.3
tokenizers 0.15.2
toml 0.10.2
toolz 0.12.1
torch 2.1.0a0+cxx11.abi
torchvision 0.16.0a0+cxx11.abi
tornado 6.4
tqdm 4.66.2
transformers 4.36.0
typing_extensions 4.10.0
typing-inspect 0.9.0
tzdata 2024.1
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
watchdog 4.0.0
watchfiles 0.21.0
websockets 12.0
wheel 0.41.2
yarl 1.9.4
zstandard 0.22.0
Hi @raj-ritu17,
I guess this issue is caused by:
onednn 2024.0.0
onemkl-sycl-blas 2024.0.0
onemkl-sycl-datafitting 2024.0.0
onemkl-sycl-dft 2024.0.0
onemkl-sycl-lapack 2024.0.0
onemkl-sycl-rng 2024.0.0
onemkl-sycl-sparse 2024.0.0
onemkl-sycl-stats 2024.0.0
onemkl-sycl-vm 2024.0.0
If you want to use llama.cpp or ollama on a Linux system, DO NOT use pip to install oneAPI like this: pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0.
Would you mind creating a new conda env without pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 and trying ollama again?
Thanks for the update. Don't we need those for oneAPI?
I was following this document:
https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html
don't we need those for oneAPI? I was following this document: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html
Yes, in our Linux installation guide we recommend using APT to install oneAPI: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi.
You don't need to install oneAPI again in your conda env by pip installing any oneAPI-related package.
So I suggest creating a new conda env, not pip installing any oneAPI-related package, and seeing whether this solves the issue.
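For reference, a minimal sketch of that clean setup, assuming the env name llm-cpp and Python version are as in the quickstart (adjust to your setup), with oneAPI itself installed via APT per the #install-oneapi section above:

# fresh env with no pip-installed oneAPI packages (hypothetical env name)
conda create -n llm-cpp python=3.11 -y
conda activate llm-cpp
# install ipex-llm with llama.cpp/ollama support, as in the quickstart
pip install --pre --upgrade ipex-llm[cpp]
# pick up the APT-installed oneAPI runtime before starting ollama
source /opt/intel/oneapi/setvars.sh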
@rnwang04, much appreciated. This is working fine.
@rnwang04 unfortunately, it doesn't work anymore.
It is not stable. I pulled a new model and tried to run it, but it just hangs.
For example, with gemma:
intel@IMU-NEX-EMR1-SUT:~/ritu/ollama$ ./ollama run gemma
pulling manifest
pulling ef311de6af9d... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 5.0 GB
pulling 097a36493f71... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.4 KB
pulling 109037bec39c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 136 B
pulling 65bb16cf5983... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 109 B
pulling 0c2a5137eb3c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
⠙ (spinner keeps running until the request times out)
Error: timed out waiting for llama runner to start:
And on the server side:
found 4 SYCL devices:
| | | |Compute |Max compute|Max work|Max sub| |
|ID| Device Type| Name|capability|units |group |group |Global mem size|
|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
| 0|[level_zero:gpu:0]| Intel(R) Data Center GPU Flex 170| 1.3| 512| 1024| 32| 14193102848|
| 1| [opencl:gpu:0]| Intel(R) Data Center GPU Flex 170| 3.0| 512| 1024| 32| 14193102848|
| 2| [opencl:cpu:0]| INTEL(R) XEON(R) PLATINUM 8580| 3.0| 240| 8192| 64| 67113893888|
| 3| [opencl:acc:0]| Intel(R) FPGA Emulation Device| 1.2| 240|67108864| 64| 67113893888|
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:512
llm_load_tensors: ggml ctx size = 0.19 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors: SYCL0 buffer size = 4773.90 MiB
llm_load_tensors: CPU buffer size = 615.23 MiB
[GIN] 2024/04/23 - 13:37:01 | 200 | 54.86µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/04/23 - 13:37:01 | 200 | 3.36308ms | 127.0.0.1 | GET "/api/tags"
time=2024-04-23T13:46:43.762+02:00 level=ERROR source=server.go:285 msg="error starting llama server" server=cpu_avx2 error="timed out waiting for llama runner to start: "
time=2024-04-23T13:46:43.763+02:00 level=ERROR source=server.go:293 msg="unable to load any llama server" error="timed out waiting for llama runner to start: "
[GIN] 2024/04/23 - 13:46:43 | 500 | 10m1s | 127.0.0.1 | POST "/api/chat"
I have also tried llama2, but sometimes get a different issue:
Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp, line:17037, func:operator()
SYCL error: CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!
in function ggml_backend_sycl_buffer_set_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:17037
GGML_ASSERT: /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:3021: !"SYCL error"
time=2024-04-23T14:09:56.097+02:00 level=ERROR source=server.go:285 msg="error starting llama server" server=cpu_avx2 error="llama runner process no longer running: -1 error:CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!\n in function ggml_backend_sycl_buffer_set_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:17037\nGGML_ASSERT: /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:3021: !\"SYCL error\""
time=2024-04-23T14:09:56.097+02:00 level=ERROR source=server.go:293 msg="unable to load any llama server" error="llama runner process no longer running: -1 error:CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!\n in function ggml_backend_sycl_buffer_set_tensor
Here is my pip list:
(llm-v2) intel@IMU-NEX-EMR1-SUT:~/ritu/ollama$ pip list
Package Version
--------------------------- ------------------
accelerate 0.21.0
annotated-types 0.6.0
bigdl-core-xe-21 2.5.0b20240421
bigdl-core-xe-esimd-21 2.5.0b20240421
certifi 2024.2.2
charset-normalizer 3.3.2
filelock 3.13.4
fsspec 2024.3.1
huggingface-hub 0.22.2
idna 3.7
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp 2024.1.0
ipex-llm 2.1.0b20240421
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.3
numpy 1.26.4
packaging 24.0
pillow 10.3.0
pip 23.3.1
protobuf 5.27.0rc1
psutil 5.9.8
py-cpuinfo 9.0.0
pydantic 2.7.0
pydantic_core 2.18.1
PyYAML 6.0.1
regex 2024.4.16
requests 2.31.0
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 68.2.2
sympy 1.12.1rc1
tabulate 0.9.0
tokenizers 0.13.3
torch 2.1.0a0+cxx11.abi
torchvision 0.16.0a0+cxx11.abi
tqdm 4.66.2
transformers 4.31.0
typing_extensions 4.11.0
urllib3 2.2.1
wheel 0.41.2
Hi @raj-ritu17
I think these two are actually different issues from the oneAPI issue above.
For the first one, the gemma hang: if your program hangs after llm_load_tensors: CPU buffer size = xx.xx MiB, setting use_mmap to false usually solves it. You can refer to #10797 (comment) to see how to add PARAMETER use_mmap false for your model.
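A minimal sketch of that Modelfile approach, assuming hypothetical names Modelfile.gemma and gemma-nommap:

# Modelfile.gemma (hypothetical file name), based on the already-pulled gemma model
FROM gemma
PARAMETER use_mmap false

# build and run the variant that loads weights without mmap
./ollama create gemma-nommap -f ./Modelfile.gemma
./ollama run gemma-nommap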
For the second, -5 (PI_ERROR_OUT_OF_RESOURCES) means you are out of GPU memory.
You can check your GPU memory with watch -t -n 1 "sudo xpu-smi stats -d 0 | grep \"GPU Memory Used\"" or any other monitoring tool.
I guess this is caused by loading several models into VRAM at the same time.
You can check with ./ollama list, and if there are several models, you can remove some of them with ./ollama rm xxxx.
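A short console sketch of that check and cleanup, with llama2:latest as a hypothetical leftover model name:

# list locally stored models
./ollama list
# remove a model you no longer need (hypothetical name)
./ollama rm llama2:latest
# watch GPU memory while the remaining model loads
watch -t -n 1 "sudo xpu-smi stats -d 0 | grep \"GPU Memory Used\""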
@rnwang04
So, I have set the use_mmap and GPU parameters.
Still hanging... could it be a memory issue?
This was my memory utilization at runtime:
Hi @raj-ritu17
According to our offline sync, can we consider this problem solved?