
Comments (5)

gillbates commented on June 11, 2024

Same problem here.

from localai.

ga-it commented on June 11, 2024

Hi @mudler

I appreciate all your great work and workload

Any word on the above? Is it my misconfiguration, or is this a bona fide bug?

I am stuck without a resolution path.

Regards


teto commented on June 11, 2024

I usually wouldn't add anything, but because of the "unconfirmed" label I wanted to say "me too". I haven't been able to find the root cause: the same version used to work and then all of a sudden stopped working. I might have updated my system in between, which could explain that.
I use my NVIDIA GPU with the https://github.com/Robitx/gp.nvim plugin. It fails all the time now, even on new sessions.

...
$ nix run .#local-ai-cublas -- --models-path ~/localai-models --autoload-galleries --address ":11111" --debug
....
<|im_start|>assistant

[127.0.0.1]:51000 200 - POST /v1/chat/completions
1:23AM DBG Sending chunk: {"created":1711585346,"object":"chat.completion.chunk","id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
This

'data: {"created":1711585346,"object":"chat.completion.chunk","id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\ndata: {"created":1711585346,"object":"chat.completion.chunk","id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral","choices":[{"index":0,"finish_reason":"stop","delta":{"content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\ndata: [DONE]\n'

While writing this message, I realized that I had recently started adding the --autoload-galleries flag, and without it LocalAI works again \o/ I am not sure what the flag does, but it looks like a tricky one!
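For anyone trying to check whether they are hitting the same symptom, here is a minimal sketch that parses a streamed response body like the one quoted above and joins the `delta.content` pieces. The payload is copied from this comment; the helper name `collect_stream` is made up for illustration and is not part of LocalAI or any client library.

```python
import json

# Raw SSE body copied from the comment above: two chunks, then the [DONE] sentinel.
RAW = (
    'data: {"created":1711585346,"object":"chat.completion.chunk",'
    '"id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral",'
    '"choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],'
    '"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n'
    'data: {"created":1711585346,"object":"chat.completion.chunk",'
    '"id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral",'
    '"choices":[{"index":0,"finish_reason":"stop","delta":{"content":""}}],'
    '"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n'
    'data: [DONE]\n'
)


def collect_stream(raw):
    """Return (joined_text, last_finish_reason) for an SSE chat completion body.

    Hypothetical helper: it only understands "data: ..." lines as shown above.
    """
    parts = []
    finish_reason = None
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


text, reason = collect_stream(RAW)
print(repr(text), reason)
```

For the body quoted above, the joined text is empty and the finish reason is `stop`, which matches the "request succeeds but nothing comes back" behaviour described here.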


s0undy commented on June 11, 2024

Same issue here. I'm able to send 1-2 messages and get responses back, then it just stops.

Logs

2024-04-05 20:21:19 6:21PM DBG Model already loaded in memory: 5c7cd056ecf9a4bb5b527410b97f48cb
2024-04-05 20:21:19 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:19 
2024-04-05 20:21:19 6:21PM DBG Model '5c7cd056ecf9a4bb5b527410b97f48cb' already loaded
2024-04-05 20:21:19 6:21PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341279,"level":"INFO","function":"launch_slot_with_data","line":884,"message":"slot is processing task","slot_id":0,"task_id":58}
2024-04-05 20:21:19 6:21PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341279,"level":"INFO","function":"update_slots","line":1783,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":58,"p0":0}
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"U"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"d"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"e"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"r"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":" "}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"k"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"o"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"m"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"m"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"u"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"f"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"u"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"l"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"l"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"m"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23 
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"ä"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

It keeps going like this until it stops:

2024-04-05 20:23:53 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

2024-04-05 20:23:53 
2024-04-05 20:23:53 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:53 
2024-04-05 20:23:53 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:53 
2024-04-05 20:23:54 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:54 
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"print_timings","line":327,"message":"prompt eval time     =    3762.01 ms /  1559 tokens (    2.41 ms per token,   414.41 tokens per second)","slot_id":0,"task_id":58,"t_prompt_processing":3762.013,"num_prompt_tokens_processed":1559,"t_token":2.413093649775497,"n_tokens_second":414.4057981724146}
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"print_timings","line":341,"message":"generation eval time =  150698.70 ms /  2048 runs   (   73.58 ms per token,    13.59 tokens per second)","slot_id":0,"task_id":58,"t_token_generation":150698.697,"n_decoded":2048,"t_token":73.58334814453124,"n_tokens_second":13.59003123961981}
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"print_timings","line":351,"message":"          total time =  154460.71 ms","slot_id":0,"task_id":58,"t_prompt_processing":3762.013,"t_token_generation":150698.697,"t_total":154460.71}
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"update_slots","line":1594,"message":"slot released","slot_id":0,"task_id":58,"n_ctx":4096,"n_past":3606,"n_system_tokens":0,"n_cache_tokens":3607,"truncated":false}
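As a quick sanity check on the timing lines above, the figures can be recomputed from the raw values in the log. This is only arithmetic on numbers copied from those lines; the variable names are chosen here for readability.

```python
# Figures copied from the print_timings and "slot released" log lines above.
t_prompt_ms = 3762.013        # t_prompt_processing
n_prompt_tokens = 1559        # num_prompt_tokens_processed
t_generation_ms = 150698.697  # t_token_generation
n_decoded = 2048              # n_decoded
n_ctx = 4096                  # n_ctx

ms_per_token = t_generation_ms / n_decoded
tokens_per_second = n_decoded / (t_generation_ms / 1000.0)
total_ms = t_prompt_ms + t_generation_ms
tokens_in_cache = n_prompt_tokens + n_decoded  # log reports n_cache_tokens = 3607

print(f"{ms_per_token:.2f} ms per token")
print(f"{tokens_per_second:.2f} tokens per second")
print(f"{total_ms:.2f} ms total")
print(f"{tokens_in_cache} of {n_ctx} context tokens used")
```

The results agree with the logged values (73.58 ms per token, 13.59 tokens per second, 154460.71 ms total, 3607 cached tokens). The request spent about 150 seconds generating exactly 2048 tokens, most of which appear in the log as bare newlines. That round number may mean generation ran until a token limit instead of stopping on its own, but the log alone does not confirm this.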

LocalAI version:
Docker using docker-compose:
Image version: 7e498578e3fd

version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - ./models:/build/models:cached
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Environment, CPU architecture, OS, and Version:
WSL2- Ubuntu 22.04
Linux GIBBSTATION 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

CPU info:
2024-04-05 20:33:03 model name : AMD Ryzen 5 5600X 6-Core Processor
2024-04-05 20:33:03 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm

NVIDIA GPU detected via WSL2
2024-04-05 20:33:03 Fri Apr 5 18:33:03 2024
2024-04-05 20:33:03 +---------------------------------------------------------------------------------------+
2024-04-05 20:33:03 | NVIDIA-SMI 545.23.06 Driver Version: 545.92 CUDA Version: 12.3 |
2024-04-05 20:33:03 |-----------------------------------------+----------------------+----------------------+
2024-04-05 20:33:03 | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2024-04-05 20:33:03 | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2024-04-05 20:33:03 | | | MIG M. |
2024-04-05 20:33:03 |=========================================+======================+======================|
2024-04-05 20:33:03 | 0 NVIDIA GeForce RTX 3070 On | 00000000:2B:00.0 On | N/A |
2024-04-05 20:33:03 | 56% 46C P3 47W / 270W | 1189MiB / 8192MiB | 27% Default |
2024-04-05 20:33:03 | | | N/A |
2024-04-05 20:33:03 +-----------------------------------------+----------------------+----------------------+
2024-04-05 20:33:03
2024-04-05 20:33:03 +---------------------------------------------------------------------------------------+
2024-04-05 20:33:03 | Processes: |
2024-04-05 20:33:03 | GPU GI CI PID Type Process name GPU Memory |
2024-04-05 20:33:03 | ID ID Usage |
2024-04-05 20:33:03 |=======================================================================================|
2024-04-05 20:33:03 | 0 N/A N/A 31 G /Xwayland N/A |
2024-04-05 20:33:03 | 0 N/A N/A 33 G /Xwayland N/A |
2024-04-05 20:33:03 | 0 N/A N/A 35 G /Xwayland N/A |
2024-04-05 20:33:03 +---------------------------------------------------------------------------------------+
2024-04-05 20:33:03 NVIDIA GPU detected. Attempting to find memory size...
2024-04-05 20:33:03 Total GPU Memory: 8192 MiB

EDIT
After some troubleshooting, it seems that the issue only occurs when trying to have a conversation around embedded documents. When I curl the API manually with plain sentences, it works without throwing any errors.


localai-bot commented on June 11, 2024

It seems like the issue you're experiencing is related to sending multiple messages in a conversation involving embedded documents. This issue might be due to a bug in one of the recent updates of the LocalAI version you are using. However, I have found a workaround for you to continue using the chat functionality while this issue is being investigated and fixed.

Workaround: When sending multiple messages in a conversation involving embedded documents, split the messages into separate chat completion requests with a delay between them. For example, instead of sending:

{
  "object": "chat.completion.chunk",
  "choices": [
  {
    "index": 0,
    "finish_reason": "",
    "delta": {
      "content": 'Message 1...',
      "children": [
        {
          "object
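The bot's example is cut off above. As a rough sketch of the workaround it describes (separate chat completion requests with a pause between them), something like the following could be tried. The endpoint and model name are assumptions taken from the compose file and logs earlier in this thread, the helper names are hypothetical, and nothing in this thread confirms that the workaround actually avoids the problem.

```python
import json
import time
import urllib.request

# Assumptions: port 8080 comes from the compose file above, "gpt-4" from the logs.
# Adjust both for your own setup.
API_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "gpt-4"


def build_request(message):
    """Build one standalone (non-streaming) chat completion request."""
    body = json.dumps(
        {"model": MODEL, "messages": [{"role": "user", "content": message}]}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(request):
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def send_separately(messages, delay_seconds=2.0, send=post, sleep=time.sleep):
    """Send each message as its own request, pausing between requests."""
    replies = []
    for i, message in enumerate(messages):
        if i > 0:
            sleep(delay_seconds)  # pause only between requests, not before the first
        replies.append(send(build_request(message)))
    return replies
```

With a server running, `send_separately(["first question", "second question"])` would issue two independent requests two seconds apart. The `send` and `sleep` parameters are there only so the loop can be exercised without a live server.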
