Comments (10)
This is CUDA 11.3, which I didn't test on. Can you try CUDA 11.8?
Let me add a section to the readme about known CUDA support.
from openllm.
Thanks for your answer (and the great lib by the way!)
Starting from another fresh install and running:
# uninstall previous cuda install
sudo /usr/bin/nvidia-uninstall
# install cuda 11.8
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run --silent
# install openllm
conda create -n py10 python=3.10 -y
conda activate py10
pip install "openllm[llama, fine-tune, vllm]"
openllm start llama --model-id huggyllama/llama-13b
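Before launching the server, a quick sanity check that the fresh CUDA install is actually visible from Python can rule out a driver/toolkit mismatch. This is a sketch of my own, not part of the original steps; it assumes `torch` was pulled in by the `pip install "openllm[...]"` step above:

```python
# Hedged sanity check: confirm the new CUDA toolkit is visible to PyTorch
# before starting OpenLLM. Degrades gracefully if torch is not installed.
import importlib.util

def cuda_status() -> str:
    """Return a one-line summary of CUDA visibility from Python."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed in this environment"
    import torch
    if not torch.cuda.is_available():
        return "torch installed, but no CUDA device visible"
    return f"CUDA OK: {torch.cuda.get_device_name(0)}"

print(cuda_status())
```

On a healthy A100 box this should report the device name; any other output points at the environment rather than at OpenLLM.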
The missing SciPy issue still shows up. After installing it, the logs jump straight to loading the checkpoint shards (without showing anything about downloading the model weights). Then nothing much happens: OpenLLM slowly uses more and more RAM, but barely any CPU and no GPU. Any chance loading via CPU is the bottleneck here (despite the GPU being found, as evidenced by DeepSpeed setting the right accelerator)?
I just fixed a bug for loading on a single GPU.
Can you try with 0.2.6?
I guess since you are using an A100, it should be fine to load the whole model into memory.
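For context, a rough back-of-the-envelope estimate (my own illustrative numbers, not from the thread) of why a 13B model in half precision should comfortably fit on a single 80 GB A100:

```python
# Rough fp16 memory estimate for a 13B-parameter model (illustrative only;
# ignores activations, KV cache, and framework overhead).
params = 13_000_000_000   # llama-13b parameter count, approximate
bytes_per_param = 2       # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB of weights")   # ~24.2 GiB, well under 80 GB
```

So if nothing lands on the GPU, the issue is in device placement, not capacity.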
The logs, an hour and a half after running openllm start llama --model-id huggyllama/llama-13b:
bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
[2023-07-24 07:39:35,243] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Downloading (…)fetensors.index.json: 100% 33.4k/33.4k [00:00<00:00, 13.5MB/s]
Downloading (…)of-00003.safetensors: 100% 9.95G/9.95G [02:43<00:00, 60.7MB/s]
Downloading (…)of-00003.safetensors: 100% 9.90G/9.90G [02:40<00:00, 61.7MB/s]
Downloading (…)of-00003.safetensors: 100% 6.18G/6.18G [01:41<00:00, 61.0MB/s]
Downloading shards: 100% 3/3 [07:06<00:00, 142.29s/it]
Loading checkpoint shards: 100% 3/3 [00:03<00:00, 1.21s/it]
Downloading (…)neration_config.json: 100% 137/137 [00:00<00:00, 1.01MB/s]
Downloading (…)okenizer_config.json: 100% 700/700 [00:00<00:00, 5.03MB/s]
Downloading tokenizer.model: 100% 500k/500k [00:00<00:00, 5.05MB/s]
Downloading (…)/main/tokenizer.json: 100% 1.84M/1.84M [00:00<00:00, 12.5MB/s]
Downloading (…)cial_tokens_map.json: 100% 411/411 [00:00<00:00, 3.17MB/s]
^C^C^C^C^C^C2023-07-24T08:00:54+0000 [DEBUG] [cli] Importing service "_service.py:svc" from working dir: "/opt/conda/envs/py10/lib/python3.10/site-packages/openllm"
bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
2023-07-24T08:01:14+0000 [INFO] [cli] Created a temporary directory at /tmp/tmpqthsnq8d
2023-07-24T08:01:14+0000 [INFO] [cli] Writing /tmp/tmpqthsnq8d/_remote_module_non_scriptable.py
[2023-07-24 08:01:14,881] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-07-24T08:01:16+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-24T08:01:17+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-24T08:01:17+0000 [DEBUG] [cli] Trying paths: ['/home/user/.docker/config.json', '/home/user/.dockercfg']
2023-07-24T08:01:17+0000 [DEBUG] [cli] Found file at path: /home/user/.docker/config.json
2023-07-24T08:01:17+0000 [DEBUG] [cli] Found 'credHelpers' section
2023-07-24T08:01:17+0000 [DEBUG] [cli] [Tracing] Create new propagation context: {'trace_id': 'daf4767d6aa948b4b96d0cdc18949e70', 'span_id': '8ddcc746bd7df314', 'parent_span_id': None, 'dynamic_sampling_context': None}
Loading checkpoint shards: 100% 3/3 [12:19<00:00, 246.41s/it]
Using pad_token, but it is not set yet.
Still nothing loaded on the GPU by that time unfortunately.
What happens with openllm start llama --model-id huggyllama/llama-13b --debug?
Pretty much the same thing at first (using 0.2.9):
[2023-07-25 14:03:55,952] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
Loading checkpoint shards: 100% 3/3 [04:04<00:00, 81.50s/it]
But things got moving when I tried to shut down the command:
^C^C^C^C^C^CStarting server with arguments: ['/opt/conda/envs/py10/bin/python3.10', '-m', 'bentoml', 'serve-http', '_service.py:svc', '--host', '0.0.0.0', '--port', '3000', '--backlog', '2048', '--api-workers', '12', '--working-dir', '/opt/conda/envs/py10/lib/python3.10/site-packages/openllm', '--ssl-version', '17', '--ssl-ciphers', 'TLSv1']
2023-07-25T14:25:28+0000 [DEBUG] [cli] Importing service "_service.py:svc" from working dir: "/opt/conda/envs/py10/lib/python3.10/site-packages/openllm"
2023-07-25T14:25:31+0000 [DEBUG] [cli] Initializing MLIR with module: _site_initialize_0
2023-07-25T14:25:31+0000 [DEBUG] [cli] Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/opt/conda/envs/py10/lib/python3.10/site-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
2023-07-25T14:25:32+0000 [DEBUG] [cli] No jax_plugins namespace packages available
2023-07-25T14:25:33+0000 [DEBUG] [cli] etils.epath found. Using etils.epath for file I/O.
2023-07-25T14:25:51+0000 [INFO] [cli] Created a temporary directory at /tmp/tmpgwt7mutk
2023-07-25T14:25:51+0000 [INFO] [cli] Writing /tmp/tmpgwt7mutk/_remote_module_non_scriptable.py
[2023-07-25 14:25:52,312] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-07-25T14:26:01+0000 [DEBUG] [cli] Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 7 to 5
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 5 to 7
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 7 to 5
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 5 to 7
2023-07-25T14:26:11+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-25T14:26:11+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-25T14:26:11+0000 [DEBUG] [cli] Trying paths: ['/home/user/.docker/config.json', '/home/qlutz/.dockercfg']
2023-07-25T14:26:11+0000 [DEBUG] [cli] Found file at path: /home/user/.docker/config.json
2023-07-25T14:26:11+0000 [DEBUG] [cli] Found 'credHelpers' section
2023-07-25T14:26:11+0000 [DEBUG] [cli] [Tracing] Create new propagation context: {'trace_id': '663640676af84209a41185161a0d1eac', 'span_id': 'b2ab05f9966f5d45', 'parent_span_id': None, 'dynamic_sampling_context': None}
Loading checkpoint shards: 0% 0/3 [00:00<?, ?it/s]
Either way, nothing is loaded on the GPU.
How many GPUs do you have? What does nvidia-smi show?
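To make the answer reproducible, here is one hedged way to capture the GPU inventory from Python. This is a sketch of my own; it just shells out to nvidia-smi and degrades gracefully on machines where the binary is absent:

```python
# List visible NVIDIA GPUs by shelling out to `nvidia-smi -L`.
# Returns an empty list when nvidia-smi is missing or fails.
import subprocess

def list_gpus() -> list[str]:
    try:
        out = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line for line in out.splitlines() if line.strip()]

print(list_gpus() or "no NVIDIA GPU detected")
```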
Still the same setup as in the original post: 1x A100 80GB. I tested on CUDA 11.6 and 11.8.
Fixed in the latest version (0.2.25) for the described setup and model. Thanks!
@aarnphm I still have the same problem when using openllm start baichuan to load a Baichuan LLM: no GPU usage, and it cannot accept requests.