
Comments (10)

rnwang04 commented on May 20, 2024

Hi @raj-ritu17,
Would you mind sharing the result of pip list in your current conda env, to help us better locate this issue?

raj-ritu17 commented on May 20, 2024

@rnwang04, here is the list:

(llama) intel@IMU-NEX-EMR1-SUT:~/ritu/ollama$ pip list
Package                     Version
--------------------------- ------------------
accelerate                  0.21.0
aiohttp                     3.9.3
aiosignal                   1.3.1
altair                      5.2.0
annotated-types             0.6.0
anyio                       4.3.0
async-timeout               4.0.3
attrs                       23.2.0
backoff                     2.2.1
bigdl-core-cpp              2.5.0b20240421
bigdl-core-xe-21            2.5.0b20240416
bigdl-core-xe-esimd-21      2.5.0b20240416
bigdl-llm                   2.5.0b20240416
blinker                     1.7.0
cachetools                  5.3.3
certifi                     2024.2.2
charset-normalizer          3.3.2
chromadb                    0.3.25
click                       8.1.7
clickhouse-connect          0.7.3
coloredlogs                 15.0.1
dataclasses-json            0.5.14
diskcache                   5.6.3
distro                      1.9.0
dpcpp-cpp-rt                2024.0.2
duckdb                      0.10.1
exceptiongroup              1.2.0
fastapi                     0.110.0
filelock                    3.13.1
flatbuffers                 24.3.7
frozenlist                  1.4.1
fsspec                      2024.3.0
gguf                        0.6.0
gitdb                       4.0.11
GitPython                   3.1.42
greenlet                    3.0.3
h11                         0.14.0
hnswlib                     0.8.0
httpcore                    1.0.4
httptools                   0.6.1
httpx                       0.25.2
huggingface-hub             0.21.4
humanfriendly               10.0
idna                        3.6
influxdb-client             1.41.0
intel-cmplr-lib-rt          2024.0.2
intel-cmplr-lic-rt          2024.0.2
intel-extension-for-pytorch 2.1.10+xpu
intel-opencl-rt             2024.0.2
intel-openmp                2024.0.2
ipex-llm                    2.1.0b20240421
Jinja2                      3.1.3
jsonpatch                   1.33
jsonpointer                 2.4
jsonschema                  4.21.1
jsonschema-specifications   2023.12.1
langchain                   0.1.12
langchain-community         0.0.28
langchain-core              0.1.32
langchain-text-splitters    0.0.1
langsmith                   0.1.27
llama_cpp_python            0.2.56
lz4                         4.3.3
markdown-it-py              3.0.0
MarkupSafe                  2.1.5
marshmallow                 3.21.1
mdurl                       0.1.2
mkl                         2024.0.0
mkl-dpcpp                   2024.0.0
monotonic                   1.6
mpmath                      1.3.0
multidict                   6.0.5
mypy-extensions             1.0.0
networkx                    3.2.1
numexpr                     2.9.0
numpy                       1.26.4
ollama                      0.1.7
onednn                      2024.0.0
onemkl-sycl-blas            2024.0.0
onemkl-sycl-datafitting     2024.0.0
onemkl-sycl-dft             2024.0.0
onemkl-sycl-lapack          2024.0.0
onemkl-sycl-rng             2024.0.0
onemkl-sycl-sparse          2024.0.0
onemkl-sycl-stats           2024.0.0
onemkl-sycl-vm              2024.0.0
onnxruntime                 1.17.1
openai                      1.14.1
openapi-schema-pydantic     1.2.4
orjson                      3.9.15
overrides                   7.7.0
packaging                   23.2
pandas                      2.0.3
pillow                      10.2.0
pip                         23.3.1
posthog                     3.5.0
protobuf                    4.25.3
psutil                      5.9.8
py-cpuinfo                  9.0.0
pyarrow                     15.0.1
pydantic                    1.10.14
pydantic_core               2.16.3
pydeck                      0.8.1b0
Pygments                    2.17.2
python-dateutil             2.9.0.post0
python-dotenv               1.0.1
pytz                        2024.1
PyYAML                      6.0.1
reactivex                   4.0.4
referencing                 0.34.0
regex                       2023.12.25
requests                    2.31.0
rich                        13.7.1
rpds-py                     0.18.0
safetensors                 0.4.2
sentencepiece               0.2.0
setuptools                  68.2.2
six                         1.16.0
smmap                       5.0.1
sniffio                     1.3.1
SQLAlchemy                  2.0.28
starlette                   0.36.3
streamlit                   1.32.2
sympy                       1.12
tabulate                    0.9.0
tbb                         2021.11.0
tenacity                    8.2.3
tokenizers                  0.15.2
toml                        0.10.2
toolz                       0.12.1
torch                       2.1.0a0+cxx11.abi
torchvision                 0.16.0a0+cxx11.abi
tornado                     6.4
tqdm                        4.66.2
transformers                4.36.0
typing_extensions           4.10.0
typing-inspect              0.9.0
tzdata                      2024.1
urllib3                     2.2.1
uvicorn                     0.29.0
uvloop                      0.19.0
watchdog                    4.0.0
watchfiles                  0.21.0
websockets                  12.0
wheel                       0.41.2
yarl                        1.9.4
zstandard                   0.22.0

rnwang04 commented on May 20, 2024

Hi @raj-ritu17 ,
I guess this issue is caused by:

onednn                      2024.0.0
onemkl-sycl-blas            2024.0.0
onemkl-sycl-datafitting     2024.0.0
onemkl-sycl-dft             2024.0.0
onemkl-sycl-lapack          2024.0.0
onemkl-sycl-rng             2024.0.0
onemkl-sycl-sparse          2024.0.0
onemkl-sycl-stats           2024.0.0
onemkl-sycl-vm              2024.0.0

If you want to use llama.cpp or ollama on a Linux system, DO NOT USE pip to install oneAPI, i.e. avoid pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0.
Would you mind creating a new conda env without pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 and trying ollama again?
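
For reference, a minimal sketch of setting up a clean env (assuming the ipex-llm[cpp] package spec from the ipex-llm quickstart; double-check the guide for the current command):

conda create -n llm-cpp python=3.11
conda activate llm-cpp
# install ipex-llm with llama.cpp/ollama support; note there is NO pip install of
# dpcpp-cpp-rt, mkl-dpcpp or onednn here -- oneAPI should come from the system APT install
pip install --pre --upgrade ipex-llm[cpp]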

raj-ritu17 commented on May 20, 2024

Thanks for the update. Don't we need those for oneAPI?

I was following this document:
https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html

rnwang04 commented on May 20, 2024

Don't we need those for oneAPI?
I was following this document: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html

Yes, in our Linux installation guide we recommend using APT to install oneAPI: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi.

You don't need to install oneAPI again in your conda env by pip installing any oneAPI-related package.

So I suggest creating a new conda env, without using pip to install any oneAPI-related package, and seeing whether that solves this issue.
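
For reference, the APT route looks roughly like this (a sketch following Intel's standard oneAPI APT repo instructions; the guide linked above pins the exact packages and versions):

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
  | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" \
  | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
# the metapackage below is one option; the guide lists the specific pinned oneAPI packages
sudo apt install intel-basekit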

raj-ritu17 commented on May 20, 2024

@rnwang04, much appreciated.
This is working fine.

raj-ritu17 commented on May 20, 2024

@rnwang04 Unfortunately, it doesn't work anymore.
It is not stable. I pulled a new model and tried to run it, but it just hangs.

For example, gemma:

intel@IMU-NEX-EMR1-SUT:~/ritu/ollama$ ./ollama run  gemma
pulling manifest
pulling ef311de6af9d... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 5.0 GB
pulling 097a36493f71... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.4 KB
pulling 109037bec39c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  136 B
pulling 65bb16cf5983... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  109 B
pulling 0c2a5137eb3c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
⠸ ⠦ ⠇ ⠧ ⠏ ⠙ ⠏ ⠙
Error: timed out waiting for llama runner to start:

and on the server side:

found 4 SYCL devices:
|  |                  |                                             |Compute   |Max compute|Max work|Max sub|               |
|ID|       Device Type|                                         Name|capability|units      |group   |group  |Global mem size|
|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
| 0|[level_zero:gpu:0]|            Intel(R) Data Center GPU Flex 170|       1.3|        512|    1024|     32|    14193102848|
| 1|    [opencl:gpu:0]|            Intel(R) Data Center GPU Flex 170|       3.0|        512|    1024|     32|    14193102848|
| 2|    [opencl:cpu:0]|               INTEL(R) XEON(R) PLATINUM 8580|       3.0|        240|    8192|     64|    67113893888|
| 3|    [opencl:acc:0]|               Intel(R) FPGA Emulation Device|       1.2|        240|67108864|     64|    67113893888|
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:512
llm_load_tensors: ggml ctx size =    0.19 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  4773.90 MiB
llm_load_tensors:        CPU buffer size =   615.23 MiB
[GIN] 2024/04/23 - 13:37:01 | 200 |       54.86µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/04/23 - 13:37:01 | 200 |     3.36308ms |       127.0.0.1 | GET      "/api/tags"



time=2024-04-23T13:46:43.762+02:00 level=ERROR source=server.go:285 msg="error starting llama server" server=cpu_avx2 error="timed out waiting for llama runner to start: "
time=2024-04-23T13:46:43.763+02:00 level=ERROR source=server.go:293 msg="unable to load any llama server" error="timed out waiting for llama runner to start: "
[GIN] 2024/04/23 - 13:46:43 | 500 |         10m1s |       127.0.0.1 | POST     "/api/chat"

I have also tried llama2, but sometimes I hit a different issue:

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp, line:17037, func:operator()
SYCL error: CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!
  in function ggml_backend_sycl_buffer_set_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:17037
GGML_ASSERT: /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:3021: !"SYCL error"
time=2024-04-23T14:09:56.097+02:00 level=ERROR source=server.go:285 msg="error starting llama server" server=cpu_avx2 error="llama runner process no longer running: -1 error:CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!\n  in function ggml_backend_sycl_buffer_set_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:17037\nGGML_ASSERT: /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml-sycl.cpp:3021: !\"SYCL error\""
time=2024-04-23T14:09:56.097+02:00 level=ERROR source=server.go:293 msg="unable to load any llama server" error="llama runner process no longer running: -1 error:CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!\n  in function ggml_backend_sycl_buffer_set_tensor

here is my pip list:

(llm-v2) intel@IMU-NEX-EMR1-SUT:~/ritu/ollama$ pip list
Package                     Version
--------------------------- ------------------
accelerate                  0.21.0
annotated-types             0.6.0
bigdl-core-xe-21            2.5.0b20240421
bigdl-core-xe-esimd-21      2.5.0b20240421
certifi                     2024.2.2
charset-normalizer          3.3.2
filelock                    3.13.4
fsspec                      2024.3.1
huggingface-hub             0.22.2
idna                        3.7
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp                2024.1.0
ipex-llm                    2.1.0b20240421
Jinja2                      3.1.3
MarkupSafe                  2.1.5
mpmath                      1.3.0
networkx                    3.3
numpy                       1.26.4
packaging                   24.0
pillow                      10.3.0
pip                         23.3.1
protobuf                    5.27.0rc1
psutil                      5.9.8
py-cpuinfo                  9.0.0
pydantic                    2.7.0
pydantic_core               2.18.1
PyYAML                      6.0.1
regex                       2024.4.16
requests                    2.31.0
safetensors                 0.4.3
sentencepiece               0.2.0
setuptools                  68.2.2
sympy                       1.12.1rc1
tabulate                    0.9.0
tokenizers                  0.13.3
torch                       2.1.0a0+cxx11.abi
torchvision                 0.16.0a0+cxx11.abi
tqdm                        4.66.2
transformers                4.31.0
typing_extensions           4.11.0
urllib3                     2.2.1
wheel                       0.41.2

rnwang04 commented on May 20, 2024

Hi @raj-ritu17
I think these two are actually different issues from the oneAPI issue above.

For the first one, the gemma hang:

If your program hangs after llm_load_tensors: CPU buffer size = xx.xx MiB, setting use_mmap to false usually solves it. You can refer to #10797 (comment) to see how to add PARAMETER use_mmap false for your model.
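
A minimal sketch of such a Modelfile (the gemma-nommap name below is just an example):

# Modelfile
FROM gemma
PARAMETER use_mmap false

# then rebuild and run the model:
./ollama create gemma-nommap -f ./Modelfile
./ollama run gemma-nommap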

For the second one, -5 (PI_ERROR_OUT_OF_RESOURCES):

-5 (PI_ERROR_OUT_OF_RESOURCES) means you have run out of GPU memory.
You can check your GPU memory with watch -t -n 1 "sudo xpu-smi stats -d 0 | grep \"GPU Memory Used\"" or any other monitoring tool.
I guess this is caused by loading several models into VRAM at the same time.
You can check with ./ollama list, and if there are several models, you can remove some of them with ./ollama rm xxxx.
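
Putting those checks together (llama2 below is just an example of a model you might remove):

./ollama list        # see which models are present
./ollama rm llama2   # free space by removing a model you don't need
watch -t -n 1 "sudo xpu-smi stats -d 0 | grep \"GPU Memory Used\""   # monitor GPU memory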

raj-ritu17 commented on May 20, 2024

@rnwang04
So, I have set the mmap and GPU parameters.
It is still hanging... could it be a memory issue?

this was my memory utilization at the runtime:
[screenshot: GPU memory utilization at runtime]

rnwang04 commented on May 20, 2024

Hi @raj-ritu17
according to our offline sync, can we consider this problem solved?
