AIO - memory issue - embedding (localai, open)

shuther commented on June 19, 2024

Comments (4)

localai-bot commented on June 19, 2024

Hello shuther,

Thank you for reporting this issue. I apologize for the inconvenience you're experiencing with the LocalAI container image. Based on the information you've provided, it seems this is a memory allocation issue on your VM.

Could you please provide the output of the following command to help us gather more details about your GPU and system capabilities?

nvidia-smi

This command will give us information about the GPU usage and memory details.
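
If it is more convenient, a narrower query that reports just the memory figures (using standard nvidia-smi flags) is:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv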

Additionally, please try to run the command you provided but with the following modification:

CUDA_LAUNCH_BLOCKING=0

This disables CUDA launch blocking, which might help prevent the GPU memory error. You should run the command like this:

CUDA_LAUNCH_BLOCKING=0 curl http://linuxmain.local:8445/embeddings \
  -X POST -H "Content-Type: application/json" \
  -d '{"input": "Your text string goes here", "model": "text-embedding-ada-002"}'

I understand that this issue is crucial for your project, and I assure you that we'll do our best to investigate this problem and provide a solution as soon as possible.

Best regards,

The GitHub Bot (Experiment of @mudler)


shuther commented on June 19, 2024

It works fine if I stop the Docker container and start it again; likely we need a better eviction approach when we switch models? A sketch of the idea follows below.
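
For illustration, a minimal sketch (in Go, since LocalAI is written in Go) of what an LRU-style eviction could look like. ModelLoader, loadIntoGPU, and unload are hypothetical names for this sketch, not LocalAI's actual API:

package modelcache

import (
	"errors"
	"sync"
	"time"
)

// errOOM stands in for the backend's out-of-GPU-memory failure.
var errOOM = errors.New("out of GPU memory")

type loadedModel struct {
	name     string
	lastUsed time.Time
}

// ModelLoader is a hypothetical cache of models resident on the GPU.
type ModelLoader struct {
	mu     sync.Mutex
	loaded []loadedModel
}

// loadIntoGPU and unload are stubs standing in for the real backend calls.
func (l *ModelLoader) loadIntoGPU(name string) error { return nil }
func (l *ModelLoader) unload(name string)            {}

// LoadWithEviction tries a plain load first; on an out-of-memory error it
// evicts the least-recently-used model and retries once.
func (l *ModelLoader) LoadWithEviction(name string) error {
	l.mu.Lock()
	defer l.mu.Unlock()

	err := l.loadIntoGPU(name)
	if errors.Is(err, errOOM) && len(l.loaded) > 0 {
		// Find the least-recently-used model and free its VRAM.
		lru := 0
		for i, m := range l.loaded {
			if m.lastUsed.Before(l.loaded[lru].lastUsed) {
				lru = i
			}
		}
		l.unload(l.loaded[lru].name)
		l.loaded = append(l.loaded[:lru], l.loaded[lru+1:]...)
		err = l.loadIntoGPU(name) // retry once with the freed memory
	}
	if err == nil {
		l.loaded = append(l.loaded, loadedModel{name: name, lastUsed: time.Now()})
	}
	return err
}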


shuther commented on June 19, 2024

Extra logs:
nvidia-smi  # when I launch the Docker container (initial load)

Thu Apr 25 11:14:44 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2060        Off | 00000000:13:00.0  On |                  N/A |
| 38%   38C    P8              16W / 160W |    258MiB /  6144MiB |     22%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+


+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2158      G   /usr/lib/xorg/Xorg                          131MiB |
|    0   N/A  N/A      2686      G   /usr/bin/gnome-shell                         67MiB |
|    0   N/A  N/A      3376      G   /usr/bin/nextcloud                            3MiB |
|    0   N/A  N/A     24782      G   ...30092458,1701102826035513081,262144       50MiB |
+---------------------------------------------------------------------------------------+

I also spotted this error:

localai-docker-api-1  | 9:15AM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/vall-e-x/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/exllama/run.sh
localai-docker-api-1  | 9:15AM INF [llama-cpp] Attempting to load
localai-docker-api-1  | 9:15AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
localai-docker-api-1  | 9:15AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 9:15AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: llama-cpp): {backendString:llama-cpp model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0000bae00 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 9:15AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
localai-docker-api-1  | 9:15AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:44089'
localai-docker-api-1  | 9:15AM INF [llama-cpp] Fails: fork/exec /tmp/localai/backend_data/backend-assets/grpc/llama-cpp: permission denied
localai-docker-api-1  | 9:15AM INF [llama-ggml] Attempting to load
localai-docker-api-1  | 9:15AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:44789'
localai-docker-api-1  | 9:15AM INF [rwkv] Fails: fork/exec /tmp/localai/backend_data/backend-assets/grpc/rwkv: permission denied
...
localai-docker-api-1  | 9:15AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
localai-docker-api-1  | 9:15AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:42503'
localai-docker-api-1  | 9:15AM INF [whisper] Fails: fork/exec /tmp/localai/backend_data/backend-assets/grpc/whisper: permission denied
localai-docker-api-1  | 9:15AM INF [stablediffusion] Attempting to load
...
localai-docker-api-1  | 9:15AM INF [/build/backend/python/vall-e-x/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
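
The repeated "permission denied" lines above suggest the backend binaries extracted under /tmp/localai/backend_data lost their executable bits, or that /tmp is mounted noexec inside the container. A quick way to check, assuming the paths from the logs:

ls -l /tmp/localai/backend_data/backend-assets/grpc/   # look for missing x bits on the binaries
mount | grep ' /tmp '                                  # a noexec flag here would also explain fork/exec failures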

Now, with LOCALAI_SINGLE_ACTIVE_BACKEND=true, the embedding works.
I would recommend changing the Docker Compose YAML file to load the .env file by default, and updating the documentation, since this seems to be a crucial parameter (a sketch of the change follows below).
Still, shouldn't eviction be attempted in the case of a memory error?
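
A minimal sketch of that compose change, assuming the service is named api (the image tag is also an assumption):

services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    env_file:
      - .env   # picks up LOCALAI_SINGLE_ACTIVE_BACKEND and other settings by default
    environment:
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true   # keep only one backend resident at a time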

nvidia-smi

Thu Apr 25 11:19:50 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2060        Off | 00000000:13:00.0  On |                  N/A |
| 38%   39C    P8              13W / 160W |   4422MiB /  6144MiB |     20%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2158      G   /usr/lib/xorg/Xorg                          131MiB |
|    0   N/A  N/A      2686      G   /usr/bin/gnome-shell                         67MiB |
|    0   N/A  N/A      3376      G   /usr/bin/nextcloud                            3MiB |
|    0   N/A  N/A     24782      G   ...30092458,1701102826035513081,262144       50MiB |
|    0   N/A  N/A   1647486      C   python                                        0MiB |
|    0   N/A  N/A   1647698      C   python                                        0MiB |
+---------------------------------------------------------------------------------------+


jtwolfe commented on June 19, 2024

I believe the eviction process is being assessed at the moment, possibly related to #2047 and #2102.

