Comments (5)
same problem here.
from localai.
Hi @mudler
I appreciate all your great work and workload
Any word on the above? Is it my misconfiguration or is this a bona fide bug?
I am stuck without a resolution path.
Regards
from localai.
I usually wouldn't add anything, but because of the "unconfirmed" label I wanted to say "me too". I haven't been able to find the root cause: the same version used to work and then suddenly stopped. I might have updated my system in between, which could explain that.
I use my NVIDIA GPU with the https://github.com/Robitx/gp.nvim plugin. It fails all the time now, even on new sessions.
...
$ nix run .#local-ai-cublas -- --models-path ~/localai-models --autoload-galleries --address ":11111" --debug
....
<|im_start|>assistant
[127.0.0.1]:51000 200 - POST /v1/chat/completions
1:23AM DBG Sending chunk: {"created":1711585346,"object":"chat.completion.chunk","id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
This
'data: {"created":1711585346,"object":"chat.completion.chunk","id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\ndata: {"created":1711585346,"object":"chat.completion.chunk","id":"868f2609-0af6-4e96-9e92-ff3d7fc84aca","model":"mistral","choices":[{"index":0,"finish_reason":"stop","delta":{"content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\ndata: [DONE]\n'
As I was writing this message, I realized that I had recently started adding the --autoload-galleries flag, and without it LocalAI works again \o/ I am not sure what the flag does, but it looks like a tricky one!
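For anyone who wants to check whether they are hitting the same empty-stream symptom, here is a minimal sketch that parses a captured response body like the one above and reports whether any text was streamed. The helper names `parse_sse_chunks` and `streamed_text` are made up for illustration, and the sample body is an abbreviated copy of the response above (ids and usage fields dropped).

```python
import json


def parse_sse_chunks(body: str) -> list[dict]:
    """Return the JSON payload of every `data:` line, stopping at the [DONE] sentinel."""
    chunks = []
    for line in body.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks


def streamed_text(chunks: list[dict]) -> str:
    """Concatenate the `delta.content` pieces across all chunks."""
    return "".join(
        choice.get("delta", {}).get("content") or ""
        for chunk in chunks
        for choice in chunk.get("choices", [])
    )


# Abbreviated copy of the captured response above.
body = (
    'data: {"object":"chat.completion.chunk","model":"mistral","choices":'
    '[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}]}\n'
    'data: {"object":"chat.completion.chunk","model":"mistral","choices":'
    '[{"index":0,"finish_reason":"stop","delta":{"content":""}}]}\n'
    'data: [DONE]\n'
)

chunks = parse_sse_chunks(body)
text = streamed_text(chunks)
finished = chunks[-1]["choices"][0]["finish_reason"]

# An empty string together with finish_reason "stop" is the symptom described above.
print(repr(text), finished)
```

If `text` comes back empty while the last chunk says `stop`, the server closed the stream without generating anything, which matches what I saw.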
from localai.
Same issue here. I'm able to send 1-2 messages and get responses back, then it just stops.
Logs
2024-04-05 20:21:19 6:21PM DBG Model already loaded in memory: 5c7cd056ecf9a4bb5b527410b97f48cb
2024-04-05 20:21:19 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:19
2024-04-05 20:21:19 6:21PM DBG Model '5c7cd056ecf9a4bb5b527410b97f48cb' already loaded
2024-04-05 20:21:19 6:21PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341279,"level":"INFO","function":"launch_slot_with_data","line":884,"message":"slot is processing task","slot_id":0,"task_id":58}
2024-04-05 20:21:19 6:21PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341279,"level":"INFO","function":"update_slots","line":1783,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":58,"p0":0}
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"U"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"d"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"e"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"r"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":" "}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"k"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"o"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"m"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"m"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"u"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"f"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"u"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"l"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"l"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"m"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:21:23
2024-04-05 20:21:23 6:21PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"ä"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
Keeps going like this until it stops
2024-04-05 20:23:53 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:53
2024-04-05 20:23:53 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:53
2024-04-05 20:23:53 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:53
2024-04-05 20:23:54 6:23PM DBG Sending chunk: {"created":1712341017,"object":"chat.completion.chunk","id":"d28dfe6e-75ec-4fea-b74a-a69f6e2afafd","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"content":"\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
2024-04-05 20:23:54
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"print_timings","line":327,"message":"prompt eval time = 3762.01 ms / 1559 tokens ( 2.41 ms per token, 414.41 tokens per second)","slot_id":0,"task_id":58,"t_prompt_processing":3762.013,"num_prompt_tokens_processed":1559,"t_token":2.413093649775497,"n_tokens_second":414.4057981724146}
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"print_timings","line":341,"message":"generation eval time = 150698.70 ms / 2048 runs ( 73.58 ms per token, 13.59 tokens per second)","slot_id":0,"task_id":58,"t_token_generation":150698.697,"n_decoded":2048,"t_token":73.58334814453124,"n_tokens_second":13.59003123961981}
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"print_timings","line":351,"message":" total time = 154460.71 ms","slot_id":0,"task_id":58,"t_prompt_processing":3762.013,"t_token_generation":150698.697,"t_total":154460.71}
2024-04-05 20:23:54 6:23PM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43219): stdout {"timestamp":1712341434,"level":"INFO","function":"update_slots","line":1594,"message":"slot released","slot_id":0,"task_id":58,"n_ctx":4096,"n_past":3606,"n_system_tokens":0,"n_cache_tokens":3607,"truncated":false}
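A quick way to read the timing lines above is to pull the JSON out of each GRPC stdout line and compare the token counts. The sketch below does that on trimmed copies of three of those lines (timestamps, the GRPC prefix, and most fields are dropped); `stdout_json` is a made-up helper name. The numbers add up to the cache size reported when the slot is released, which suggests the generation ran for the full 2048 tokens rather than stopping on its own. That reading is an assumption, since the logs do not show the configured token limit.

```python
import json


def stdout_json(log_line: str) -> dict:
    """Extract the JSON object that follows the `stdout ` marker in a GRPC debug line."""
    _, _, payload = log_line.partition("stdout ")
    return json.loads(payload)


# Trimmed copies of the log lines above; only the fields used here are kept.
prompt_line = 'DBG GRPC: stdout {"function":"print_timings","num_prompt_tokens_processed":1559}'
generation_line = 'DBG GRPC: stdout {"function":"print_timings","t_token_generation":150698.697,"n_decoded":2048}'
release_line = 'DBG GRPC: stdout {"function":"update_slots","n_ctx":4096,"n_past":3606,"n_cache_tokens":3607}'

prompt = stdout_json(prompt_line)
generation = stdout_json(generation_line)
released = stdout_json(release_line)

# Prompt tokens plus generated tokens should match the cache size at release.
total_tokens = prompt["num_prompt_tokens_processed"] + generation["n_decoded"]
ms_per_token = generation["t_token_generation"] / generation["n_decoded"]

print(total_tokens, released["n_cache_tokens"], round(ms_per_token, 2))
```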
LocalAI version:
Docker using docker-compose:
Image version: 7e498578e3fd
version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - ./models:/build/models:cached
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Environment, CPU architecture, OS, and Version:
WSL2- Ubuntu 22.04
Linux GIBBSTATION 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
CPU info:
2024-04-05 20:33:03 model name : AMD Ryzen 5 5600X 6-Core Processor
2024-04-05 20:33:03 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
NVIDIA GPU detected via WSL2
2024-04-05 20:33:03 Fri Apr 5 18:33:03 2024
2024-04-05 20:33:03 +---------------------------------------------------------------------------------------+
2024-04-05 20:33:03 | NVIDIA-SMI 545.23.06 Driver Version: 545.92 CUDA Version: 12.3 |
2024-04-05 20:33:03 |-----------------------------------------+----------------------+----------------------+
2024-04-05 20:33:03 | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2024-04-05 20:33:03 | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2024-04-05 20:33:03 | | | MIG M. |
2024-04-05 20:33:03 |=========================================+======================+======================|
2024-04-05 20:33:03 | 0 NVIDIA GeForce RTX 3070 On | 00000000:2B:00.0 On | N/A |
2024-04-05 20:33:03 | 56% 46C P3 47W / 270W | 1189MiB / 8192MiB | 27% Default |
2024-04-05 20:33:03 | | | N/A |
2024-04-05 20:33:03 +-----------------------------------------+----------------------+----------------------+
2024-04-05 20:33:03
2024-04-05 20:33:03 +---------------------------------------------------------------------------------------+
2024-04-05 20:33:03 | Processes: |
2024-04-05 20:33:03 | GPU GI CI PID Type Process name GPU Memory |
2024-04-05 20:33:03 | ID ID Usage |
2024-04-05 20:33:03 |=======================================================================================|
2024-04-05 20:33:03 | 0 N/A N/A 31 G /Xwayland N/A |
2024-04-05 20:33:03 | 0 N/A N/A 33 G /Xwayland N/A |
2024-04-05 20:33:03 | 0 N/A N/A 35 G /Xwayland N/A |
2024-04-05 20:33:03 +---------------------------------------------------------------------------------------+
2024-04-05 20:33:03 NVIDIA GPU detected. Attempting to find memory size...
2024-04-05 20:33:03 Total GPU Memory: 8192 MiB
EDIT
After some troubleshooting, it seems that the issue only occurs when trying to have a conversation around embedded documents. When I curl the API manually with plain sentences, it works without throwing any errors.
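For reference, the manual test looks roughly like the sketch below, written in Python instead of curl. The base URL assumes the 8080 port published in the compose file above, the model name is the one that appears in the logs, and the response shape (`choices[0].message.content`) assumes the usual OpenAI-compatible non-streaming format. `build_chat_request` and `ask` are made-up helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: port published in the compose file above
MODEL = "gpt-4"  # model name as it appears in the logs above


def build_chat_request(sentence: str) -> urllib.request.Request:
    """Build a plain, non-streaming chat completion request with a single user message."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": sentence}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(sentence: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant text (assumes an OpenAI-style response)."""
    with urllib.request.urlopen(build_chat_request(sentence), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
#   print(ask("Hello, can you hear me?"))
```

Requests built this way contain no embedded document context, which is the case that works for me.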
from localai.
It seems like the issue you're experiencing is related to sending multiple messages in a conversation involving embedded documents. This issue might be due to a bug in one of the recent updates of the LocalAI version you are using. However, I have found a workaround for you to continue using the chat functionality while this issue is being investigated and fixed.
Workaround: When sending multiple messages in a conversation involving embedded documents, split the messages into separate chat completion requests with a delay between them. For example, instead of sending:
{
"object": "chat.completion.chunk",
"choices": [
{
"index": 0,
"finish_reason": "",
"delta": {
"content": 'Message 1...',
"children": [
{
"object
from localai.