AIO - memory issue - embedding (localai, open)

shuther commented on June 19, 2024

Comments (4)

localai-bot commented on June 19, 2024

Hello shuther,

Thank you for reporting this issue. I apologize for the inconvenience you're experiencing with the LocalAI container image. Based on the information you've provided, it seems this is a memory allocation issue on your VM.

Could you please provide the output of the following command to help us gather more details about your GPU and system capabilities?

nvidia-smi

This command will give us information about the GPU usage and memory details.
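
If it is more convenient, a narrower query that reports just the memory figures (using standard nvidia-smi flags) is:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv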

Additionally, please try to run the command you provided but with the following modification:

CUDA_LAUNCH_BLOCKING=0

This disables CUDA launch blocking, which might help prevent the GPU memory error. You should run the command like this:

CUDA_LAUNCH_BLOCKING=0 curl http://linuxmain.local:8445/embeddings \
  -X POST -H "Content-Type: application/json" \
  -d '{"input": "Your text string goes here", "model": "text-embedding-ada-002"}'

I understand that this issue is crucial for your project, and I assure you that we'll do our best to investigate this problem and provide a solution as soon as possible.

Best regards,

The GitHub Bot (Experiment of @mudler)


shuther commented on June 19, 2024

It works fine if I stop the Docker container and start it again; likely we need a better eviction approach when we switch models? A sketch of the idea follows below.
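
For illustration, a minimal sketch (in Go, since LocalAI is written in Go) of what an LRU-style eviction could look like. ModelLoader, loadIntoGPU, and unload are hypothetical names for this sketch, not LocalAI's actual API:

package modelcache

import (
	"errors"
	"sync"
	"time"
)

// errOOM stands in for the backend's out-of-GPU-memory failure.
var errOOM = errors.New("out of GPU memory")

type loadedModel struct {
	name     string
	lastUsed time.Time
}

// ModelLoader is a hypothetical cache of models resident on the GPU.
type ModelLoader struct {
	mu     sync.Mutex
	loaded []loadedModel
}

// loadIntoGPU and unload are stubs standing in for the real backend calls.
func (l *ModelLoader) loadIntoGPU(name string) error { return nil }
func (l *ModelLoader) unload(name string)            {}

// LoadWithEviction tries a plain load first; on an out-of-memory error it
// evicts the least-recently-used model and retries once.
func (l *ModelLoader) LoadWithEviction(name string) error {
	l.mu.Lock()
	defer l.mu.Unlock()

	err := l.loadIntoGPU(name)
	if errors.Is(err, errOOM) && len(l.loaded) > 0 {
		// Find the least-recently-used model and free its VRAM.
		lru := 0
		for i, m := range l.loaded {
			if m.lastUsed.Before(l.loaded[lru].lastUsed) {
				lru = i
			}
		}
		l.unload(l.loaded[lru].name)
		l.loaded = append(l.loaded[:lru], l.loaded[lru+1:]...)
		err = l.loadIntoGPU(name) // retry once with the freed memory
	}
	if err == nil {
		l.loaded = append(l.loaded, loadedModel{name: name, lastUsed: time.Now()})
	}
	return err
}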


shuther commented on June 19, 2024

Extra logs:
nvidia-smi  # when I launch the Docker container (initial load)

Thu Apr 25 11:14:44 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2060        Off | 00000000:13:00.0  On |                  N/A |
| 38%   38C    P8              16W / 160W |    258MiB /  6144MiB |     22%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+


+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2158      G   /usr/lib/xorg/Xorg                          131MiB |
|    0   N/A  N/A      2686      G   /usr/bin/gnome-shell                         67MiB |
|    0   N/A  N/A      3376      G   /usr/bin/nextcloud                            3MiB |
|    0   N/A  N/A     24782      G   ...30092458,1701102826035513081,262144       50MiB |
+---------------------------------------------------------------------------------------+

I also spotted this error:

localai-docker-api-1  | 9:15AM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/vall-e-x/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/exllama/run.sh
localai-docker-api-1  | 9:15AM INF [llama-cpp] Attempting to load
localai-docker-api-1  | 9:15AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
localai-docker-api-1  | 9:15AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 9:15AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: llama-cpp): {backendString:llama-cpp model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0000bae00 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 9:15AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
localai-docker-api-1  | 9:15AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:44089'
localai-docker-api-1  | 9:15AM INF [llama-cpp] Fails: fork/exec /tmp/localai/backend_data/backend-assets/grpc/llama-cpp: permission denied
localai-docker-api-1  | 9:15AM INF [llama-ggml] Attempting to load
localai-docker-api-1  | 9:15AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:44789'
localai-docker-api-1  | 9:15AM INF [rwkv] Fails: fork/exec /tmp/localai/backend_data/backend-assets/grpc/rwkv: permission denied
...
localai-docker-api-1  | 9:15AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
localai-docker-api-1  | 9:15AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:42503'
localai-docker-api-1  | 9:15AM INF [whisper] Fails: fork/exec /tmp/localai/backend_data/backend-assets/grpc/whisper: permission denied
localai-docker-api-1  | 9:15AM INF [stablediffusion] Attempting to load
...
localai-docker-api-1  | 9:15AM INF [/build/backend/python/vall-e-x/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
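
The repeated "permission denied" lines above suggest the backend binaries extracted under /tmp/localai/backend_data lost their executable bits, or that /tmp is mounted noexec inside the container. A quick way to check, assuming the paths from the logs:

ls -l /tmp/localai/backend_data/backend-assets/grpc/   # look for missing x bits on the binaries
mount | grep ' /tmp '                                  # a noexec flag here would also explain fork/exec failures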

Now, with LOCALAI_SINGLE_ACTIVE_BACKEND=true, the embedding works.
I would recommend changing the Docker Compose YAML file to load the .env file by default, and updating the documentation, since this seems to be a crucial parameter (a sketch of the change follows below).
Still, shouldn't eviction be attempted in the case of a memory error?
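
A minimal sketch of that compose change, assuming the service is named api (the image tag is also an assumption):

services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    env_file:
      - .env   # picks up LOCALAI_SINGLE_ACTIVE_BACKEND and other settings by default
    environment:
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true   # keep only one backend resident at a time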

nvidia-smi

Thu Apr 25 11:19:50 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2060        Off | 00000000:13:00.0  On |                  N/A |
| 38%   39C    P8              13W / 160W |   4422MiB /  6144MiB |     20%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2158      G   /usr/lib/xorg/Xorg                          131MiB |
|    0   N/A  N/A      2686      G   /usr/bin/gnome-shell                         67MiB |
|    0   N/A  N/A      3376      G   /usr/bin/nextcloud                            3MiB |
|    0   N/A  N/A     24782      G   ...30092458,1701102826035513081,262144       50MiB |
|    0   N/A  N/A   1647486      C   python                                        0MiB |
|    0   N/A  N/A   1647698      C   python                                        0MiB |
+---------------------------------------------------------------------------------------+


jtwolfe commented on June 19, 2024

I believe the eviction process is being assessed at the moment, possibly related to #2047 and #2102.

