mudler / localai Goto Github PK

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. It allows to generate Text, Audio, Video, Images. Also with voice cloning capabilities.

Home Page: https://localai.io

License: MIT License

Earthly 0.01% Go 26.75% Dockerfile 0.60% Makefile 2.59% Shell 1.29% Python 7.37% CMake 0.19% C++ 60.87% HTML 0.33%

llama rwkv ai llm stable-diffusion api kubernetes gpt4all falcon tts

localai's Introduction

LocalAI

💡 Get help - ❓FAQ 💭Discussions 💬 Discord 📖 Documentation website

💻 Quickstart 📣 News 🛫 Examples 🖼️ Models 🚀 Roadmap

LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that’s compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU.

🔥🔥 Hot topics / Roadmap

Roadmap

Parler-TTS: #2027
Landing page: #1922
Openvino support: #1892
Vector store: #1795
All-in-one container image: #1855
Parallel function calling: #1726 / Tools API support: #1715

Hot topics (looking for contributors):

Backends v2: #1126
Improving UX v2: #1373
Assistant API: #1273
Moderation endpoint: #999
Vulkan: #1647

If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22

💻 Getting started

For a detailed step-by-step introduction, refer to the Getting Started guide.

For those in a hurry, here's a straightforward one-liner to launch a LocalAI AIO(All-in-one) Image using docker:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# or, if you have an Nvidia GPU:
# docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

🚀 Features

📖 Text generation with GPTs (llama.cpp, gpt4all.cpp, ... 📖 and more)
🗣 Text to Audio
🔈 Audio to Text (Audio transcription with whisper.cpp)
🎨 Image generation with stable diffusion
🔥 OpenAI functions 🆕
🧠 Embeddings generation for vector databases
✍️ Constrained grammars
🖼️ Download Models directly from Huggingface
🆕 Vision API

💻 Usage

Check out the Getting started section in our documentation.

🔗 Community and integrations

Build and deploy custom containers:

https://github.com/sozercan/aikit

WebUIs:

Model galleries

https://github.com/go-skynet/model-gallery

Other:

Helm chart https://github.com/go-skynet/helm-charts
VSCode extension https://github.com/badgooooor/localai-vscode-plugin
Local Smart assistant https://github.com/mudler/LocalAGI
Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation
Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
Examples: https://github.com/mudler/LocalAI/tree/master/examples/

🔗 Resources

🆕 New! LLM finetuning guide
How to build locally
How to install in Kubernetes
Projects integrating LocalAI
How tos section (curated by our community)

📖 🎥 Media, Blogs, Social

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

❤️ Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project:


Spectro Cloud
Spectro Cloud kindly supports LocalAI by providing GPU and computing resources to run tests on lamdalabs!

And a huge shout-out to individuals sponsoring the project by donating hardware or backing the project.

Sponsor list
JDAM00 (donating HW for the CI)

🌟 Star history

📖 License

LocalAI is a community-driven project created by Ettore Di Giacinto.

MIT - Author Ettore Di Giacinto

🙇 Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

🤗 Contributors

This is a community project, a special thanks to our contributors! 🤗

localai's People

Contributors

Stargazers

Watchers

Forkers

bamoka devs-talha ezquant jackrain lapnd pub-ai evgenkud darknerofil ciaompe tylergillson mlazar-endear normen my-basement ashsimmonds sikkgit curiosity007 github-lsq3 zaephor dwongdev jesusoctavioas johnatag ebarahona nooproblem roguesupport dkstar11q limiu82214 ofux denislyashko liangsj ai-ld williamtran29 dave-gray101 davidmoore-io hhy5277 evdcush wizrds merelychris iamleon121 soon14 tiero tngamemo blm666 antongisli geekcheng devsoft hybridgroup martylake jfontestad maxwelldps nateyu antony-jia ailabteam hawkerattack cristanlk jmaigc sycomix sd0-tech nesique qxmao divaling anoop-qasolve sandipmavani pllz7 opiumfan dhruvgera ai-alebrijecircus-x juanmartinrivas mondocosm lee-b lukaslow fhachenberg brianhammons ockhamlabs donomii ckevens jonyhuang awinlei snoopycn erichou2010 damontk aicodehunt ci-forks joshuawalcher hellobinker tuanshu noxinc-dev zhangli344236745 ceonlabs zzu-andrew 19890843006 kfdslsope rozek peins caomeiyouren manoelbenicio gnnave playhousehosting rioncarter genjudev ekryski

localai's Issues

feature: GPU/CUDA support?

Please close this if it's off-topic or ill informed.

LocalAI seems to be focused on providing an OpenAI-compatible API for models running via CPU, (llama.cpp, ggml). I was excited about this project because I want to use my local models with projects like BabyAGI, AutoGPT, LangChain etc, which typically either only support OpenAI API or support OpenAI first.

I know it would add a lot of work to support every model under the sun in CPU, CUDA, ROCM, and Triton, so not proposing that, but it seems leaving CUDA off the table is really limiting this projects usability.

Am I simply wrong, and typical pt / safetensors models will work fine with LocalAI, or is this a valid concern?

When I read about LocalAI on Github, I imagined this project was more of a "dumb adapter"; an HTTP server that would route requests to models being run inside projects like text-generation-webui or others, but I see it actually does the work to stand up the models, which is impressive.

Perhaps (either in this project, or another project) it would be useful to provide a project that presents as an HTTP API / CLI, and has a simple plugin architecture to allow multiple models with different backends/requirements to interface with it, such that this project could support a variety of models without having to suffer the integration and maintenance headaches that projects like text-generation-webui are going for?

Crash upon calling `/completions`

I have LocalAI hosted in a docker container. Calling models endpoint provides expected output:

{"object":"list","data":[{"id":"ggml-gpt4all-j.bin","object":"model"},{"id":"ggml-model-f16.bin","object":"model"}]}

But providing the example prompt - at either of those models - yields an opaque looking error:

internal/poll.(*FD).Read(0xc000110000, {0xc00032e000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x299 fp=0xc000287b40 sp=0xc000287aa8 pc=0x4ba939
net.(*netFD).Read(0xc000110000, {0xc00032e000?, 0xc000116088?, 0xc000116000?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29 fp=0xc000287b88 sp=0xc000287b40 pc=0x583d09
net.(*conn).Read(0xc000114000, {0xc00032e000?, 0xc000114000?, 0xc00032f000?})
	/usr/local/go/src/net/net.go:183 +0x45 fp=0xc000287bd0 sp=0xc000287b88 pc=0x5930c5
net.(*TCPConn).Read(0xc0003061e0?, {0xc00032e000?, 0x7b8c8f?, 0x7bbca5?})
	<autogenerated>:1 +0x29 fp=0xc000287c00 sp=0xc000287bd0 pc=0x5a5a69
bufio.(*Reader).fill(0xc00008a720)
	/usr/local/go/src/bufio/bufio.go:106 +0xff fp=0xc000287c38 sp=0xc000287c00 pc=0x5c42df
bufio.(*Reader).Peek(0xc00008a720, 0x1)
	/usr/local/go/src/bufio/bufio.go:144 +0x5d fp=0xc000287c58 sp=0xc000287c38 pc=0x5c443d
github.com/valyala/fasthttp.(*Server).serveConn(0xc000306000, {0xab69c0?, 0xc000114000})
	/go/pkg/mod/github.com/valyala/[email protected]/server.go:2183 +0x58e fp=0xc000287ec8 sp=0xc000287c58 pc=0x7c874e
github.com/valyala/fasthttp.(*Server).serveConn-fm({0xab69c0?, 0xc000114000?})
	<autogenerated>:1 +0x39 fp=0xc000287ef0 sp=0xc000287ec8 pc=0x7d8a59
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0000b7860, 0xc000118020)
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:224 +0xa9 fp=0xc000287fa0 sp=0xc000287ef0 pc=0x7d4d29
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:196 +0x38 fp=0xc000287fe0 sp=0xc000287fa0 pc=0x7d4a98
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000287fe8 sp=0xc000287fe0 pc=0x482821
created by github.com/valyala/fasthttp.(*workerPool).getCh
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:195 +0x1b0
goroutine 19 [IO wait]:
runtime.gopark(0x0?, 0xb?, 0x0?, 0x0?, 0x8?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00006e900 sp=0xc00006e8e0 pc=0x453eb6
runtime.netpollblock(0x495f05?, 0x41fb4f?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:527 +0xf7 fp=0xc00006e938 sp=0xc00006e900 pc=0x44c8b7
internal/poll.runtime_pollWait(0x7fad85b947a0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:306 +0x89 fp=0xc00006e958 sp=0xc00006e938 pc=0x47d529
internal/poll.(*pollDesc).wait(0xc000110080?, 0xc000122000?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc00006e980 sp=0xc00006e958 pc=0x4b9552
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000110080, {0xc000122000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x299 fp=0xc00006ea18 sp=0xc00006e980 pc=0x4ba939
net.(*netFD).Read(0xc000110080, {0xc000122000?, 0x7fad86bc4228?, 0x7c540d?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29 fp=0xc00006ea60 sp=0xc00006ea18 pc=0x583d09
net.(*conn).Read(0xc000114008, {0xc000122000?, 0x41f605?, 0x59?})
	/usr/local/go/src/net/net.go:183 +0x45 fp=0xc00006eaa8 sp=0xc00006ea60 pc=0x5930c5
net.(*TCPConn).Read(0x1010000000000?, {0xc000122000?, 0x7fadaf75b5b8?, 0x1000?})
	<autogenerated>:1 +0x29 fp=0xc00006ead8 sp=0xc00006eaa8 pc=0x5a5a69
bufio.(*Reader).fill(0xc00011c0c0)
	/usr/local/go/src/bufio/bufio.go:106 +0xff fp=0xc00006eb10 sp=0xc00006ead8 pc=0x5c42df
bufio.(*Reader).Peek(0xc00011c0c0, 0x1)
	/usr/local/go/src/bufio/bufio.go:144 +0x5d fp=0xc00006eb30 sp=0xc00006eb10 pc=0x5c443d
github.com/valyala/fasthttp.(*RequestHeader).tryRead(0xc000120000, 0xc00011c0c0, 0x1)
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2184 +0x5a fp=0xc00006ec18 sp=0xc00006eb30 pc=0x7ac59a
github.com/valyala/fasthttp.(*RequestHeader).readLoop(0xc000306000?, 0xc00011c0c0, 0x1)
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2115 +0x4d fp=0xc00006ec58 sp=0xc00006ec18 pc=0x7abf8d
github.com/valyala/fasthttp.(*RequestHeader).Read(...)
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2106
github.com/valyala/fasthttp.(*Server).serveConn(0xc000306000, {0xab69c0?, 0xc000114008})
	/go/pkg/mod/github.com/valyala/[email protected]/server.go:2244 +0x918 fp=0xc00006eec8 sp=0xc00006ec58 pc=0x7c8ad8
github.com/valyala/fasthttp.(*Server).serveConn-fm({0xab69c0?, 0xc000114008?})
	<autogenerated>:1 +0x39 fp=0xc00006eef0 sp=0xc00006eec8 pc=0x7d8a59
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0000b7860, 0xc000118040)
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:224 +0xa9 fp=0xc00006efa0 sp=0xc00006eef0 pc=0x7d4d29
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:196 +0x38 fp=0xc00006efe0 sp=0xc00006efa0 pc=0x7d4a98
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x482821
created by github.com/valyala/fasthttp.(*workerPool).getCh
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:195 +0x1b0
goroutine 7 [sleep]:
runtime.gopark(0x219ceb706396b?, 0x965de0?, 0xf8?, 0xc1?, 0x1?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000384f88 sp=0xc000384f68 pc=0x453eb6
time.Sleep(0x3b9aca00)
	/usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc000384fc8 sp=0xc000384f88 pc=0x47f695
github.com/valyala/fasthttp.updateServerDate.func1()
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2246 +0x1e fp=0xc000384fe0 sp=0xc000384fc8 pc=0x7d517e
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000384fe8 sp=0xc000384fe0 pc=0x482821
created by github.com/valyala/fasthttp.updateServerDate
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2244 +0x25
rax    0x478a693f04c46b1d
rbx    0x0
�������
rcx    0x270
rdx    0x4c46d8c
rdi    0x7fad78000cd0
rsi    0x16e0
rbp    0x7fad87c03d60
rsp    0x7fad87c02890
r8     0x7fad78000cd0
r9     0x7fad78000080
r10    0x6f
r11    0x0
r12    0x7fad78000ca0
r13    0x0
r14    0x7fad78000cd0
�������
r15    0x200
rip    0x904e04
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0

Is there anything more I can try to help diagnose the reason?

I am running this on an HP z800 Workstation which is a fairly old machine using Dual Xeon X5570 CPU's. These don't have the AVX instruction set, in case that's a hard requirement, with proc/cpuinfo being:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
stepping	: 5
microcode	: 0x1d
cpu MHz		: 1596.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
vmx flags	: vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 5860.84
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Standalone llama.cpp works, albeit slowly.

SIGILL Immediately

Hi, I'm trying this out but I don't get anywhere, I'm just trying the normal instructions but it dies right away.

# docker run -ti --rm quay.io/go-skynet/llama-cli:v0.3  --instruction "What's an alpaca?" --topk 10000
Unable to find image 'quay.io/go-skynet/llama-cli:v0.3' locally
v0.3: Pulling from go-skynet/llama-cli
3e440a704568: Already exists 
68a71c865a2c: Already exists 
670730c27c2e: Already exists 
5a7a2c95f0f8: Already exists 
db119aaf144b: Pull complete 
6f87262882f9: Pull complete 
28b555baed36: Pull complete 
8116bacd01c4: Pull complete 
4e04bf3ce7a2: Pull complete 
cc7bc433a9c3: Pull complete 
226e900d2e08: Pull complete 
3073586a35a4: Pull complete 
fa6b8559eb8a: Pull complete 
dc1c10a22389: Pull complete 
Digest: sha256:a93f23f48cf9df2f1386f6aebf6c1a16cca7c6581e758d4859594e80a6986ee2
Status: Downloaded newer image for quay.io/go-skynet/llama-cli:v0.3
SIGILL: illegal instruction
PC=0x8beac9 m=0 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc5 0xf0 0x57 0xc9 0x48 0x8b 0x4 0xf8 0x48 0x85 0xc0 0x78 0x1a 0xc4 0xe1 0xf2

goroutine 1 [syscall]:
runtime.cgocall(0x8a33d0, 0xc00018f880)
	/usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc00018f858 sp=0xc00018f820 pc=0x414f1c
github.com/go-skynet/llama/go._Cfunc_llama_bootstrap(0x225d740, 0x225d5f0, 0x200, 0x0, 0x1)
	_cgo_gotypes.go:124 +0x4c fp=0xc00018f880 sp=0xc00018f858 pc=0x511d4c
github.com/go-skynet/llama/go.New.func1(0xc00002a03b?, 0xa?, {0x9a9bc6?, 0x5?, 0x0?})
	/go/pkg/mod/github.com/go-skynet/[email protected]/go/llama.go:20 +0x78 fp=0xc00018f8d0 sp=0xc00018f880 pc=0x512258
github.com/go-skynet/llama/go.New({0xc00002a03b, 0xa}, {0xc0001c4b90, 0x2, 0x92eca0?})
	/go/pkg/mod/github.com/go-skynet/[email protected]/go/llama.go:20 +0xe5 fp=0xc00018f928 sp=0xc00018f8d0 pc=0x512125
main.llamaFromOptions(0x9ce762?)
	/build/main.go:40 +0x109 fp=0xc00018f9a8 sp=0xc00018f928 pc=0x8a15e9
main.main.func3(0xc0001b8420?)
	/build/main.go:239 +0x452 fp=0xc00018fba8 sp=0xc00018f9a8 pc=0x8a2b12
github.com/urfave/cli/v2.(*Command).Run(0xc0001b8420, 0xc0000dac00, {0xc0000ac000, 0x5, 0x5})
	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:273 +0x9eb fp=0xc00018fe48 sp=0xc00018fba8 pc=0x88e1ab
github.com/urfave/cli/v2.(*App).RunContext(0xc0001645a0, {0xa71a98?, 0xc0000ae000}, {0xc0000ac000, 0x5, 0x5})
	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616 fp=0xc00018feb8 sp=0xc00018fe48 pc=0x88b016
github.com/urfave/cli/v2.(*App).Run(...)
	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
main.main()
	/build/main.go:262 +0x931 fp=0xc00018ff80 sp=0xc00018feb8 pc=0x8a2191
runtime.main()
	/usr/local/go/src/runtime/proc.go:250 +0x207 fp=0xc00018ffe0 sp=0xc00018ff80 pc=0x4484e7
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00018ffe8 sp=0xc00018ffe0 pc=0x477d61

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000042fb0 sp=0xc000042f90 pc=0x448916
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc000042fe0 sp=0xc000042fb0 pc=0x448750
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000042fe8 sp=0xc000042fe0 pc=0x477d61
created by runtime.init.6
	/usr/local/go/src/runtime/proc.go:293 +0x25

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000043780 sp=0xc000043760 pc=0x448916
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:278 +0x8e fp=0xc0000437c8 sp=0xc000043780 pc=0x434bee
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc0000437e0 sp=0xc0000437c8 pc=0x429ec6
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000437e8 sp=0xc0000437e0 pc=0x477d61
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:178 +0x6b

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc00006a000?, 0xa6a010?, 0x1?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000043f70 sp=0xc000043f50 pc=0x448916
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.(*scavengerState).park(0xe03d60)
	/usr/local/go/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc000043fa0 sp=0xc000043f70 pc=0x432b33
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:628 +0x45 fp=0xc000043fc8 sp=0xc000043fa0 pc=0x433105
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc000043fe0 sp=0xc000043fc8 pc=0x429e66
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000043fe8 sp=0xc000043fe0 pc=0x477d61
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:179 +0xaa

goroutine 18 [finalizer wait]:
runtime.gopark(0x1a0?, 0xe04c80?, 0xe0?, 0x24?, 0xc000042770?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000042628 sp=0xc000042608 pc=0x448916
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000427e0 sp=0xc000042628 pc=0x428f07
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000427e8 sp=0xc0000427e0 pc=0x477d61
created by runtime.createfing
	/usr/local/go/src/runtime/mfinal.go:163 +0x45

rax    0xa9ca80
rbx    0x225d680
rcx    0x224b010
rdx    0x1
rdi    0x0
rsi    0x1
rbp    0x7ffc8d90e620
rsp    0x7ffc8d90e058
r8     0x2
r9     0x1
r10    0xffffffffffffff65
r11    0x7f4b70e8e720
r12    0x20
r13    0x200
r14    0x7cff
r15    0x1000
rip    0x8beac9
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0

Update llama to pickup "Avoid heavy V transpose operation + improvements"

Please see ggerganov/llama.cpp#775

I added issue in this repo since the https://github.com/go-skynet/llama repo doesn't have issues available.

Thanks! 😸

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

fix(deps): update github.com/ggerganov/whisper.cpp/bindings/go digest to 54c978c
fix(deps): update github.com/nomic-ai/gpt4all/gpt4all-bindings/golang digest to 9193a95
fix(deps): update github.com/tmc/langchaingo digest to b33244e
fix(deps): update module github.com/google/uuid to v1.4.0
fix(deps): update module github.com/onsi/gomega to v1.29.0
Click on this checkbox to rebase all open PRs at once

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

docker-compose

docker-compose.yaml

dockerfile

Dockerfile

golang 1.21-bullseye

github-actions

.github/workflows/bump_deps.yaml

actions/checkout v4

peter-evans/create-pull-request v5

.github/workflows/image.yml

actions/checkout v4

docker/metadata-action v5

docker/login-action v3

docker/build-push-action v5

.github/workflows/release.yaml

actions/checkout v4

actions/setup-go v4

actions/upload-artifact v3

softprops/action-gh-release v1

actions/checkout v4

actions/setup-go v4

actions/upload-artifact v3

softprops/action-gh-release v1

.github/workflows/test-gpu.yml

actions/checkout v4

actions/setup-go v4

.github/workflows/test.yml

actions/checkout v4

actions/setup-go v4

actions/checkout v4

actions/setup-go v4

gomod

go.mod

go 1.21

github.com/donomii/go-rwkv.cpp v0.0.0-20230715075832-c898cd0f62df@c898cd0f62df

github.com/ggerganov/whisper.cpp/bindings/go v0.0.0-20230628193450-85ed71aaec8e@85ed71aaec8e

github.com/go-audio/wav v1.1.0

github.com/go-skynet/bloomz.cpp v0.0.0-20230529155654-1834e77b83fa@1834e77b83fa

github.com/go-skynet/go-bert.cpp v0.0.0-20230716133540-6abe312cded1@6abe312cded1

github.com/go-skynet/go-ggml-transformers.cpp v0.0.0-20230714203132-ffb09d7dd71e@ffb09d7dd71e

github.com/go-skynet/go-llama.cpp v0.0.0-20231009155254-aeba71ee8428@aeba71ee8428

github.com/gofiber/fiber/v2 v2.50.0

github.com/google/uuid v1.3.1

github.com/hashicorp/go-multierror v1.1.1

github.com/hpcloud/tail v1.0.0

github.com/imdario/mergo v0.3.16

github.com/json-iterator/go v1.1.12

github.com/mholt/archiver/v3 v3.5.1

github.com/mudler/go-ggllm.cpp v0.0.0-20230709223052-862477d16eef@862477d16eef

github.com/mudler/go-processmanager v0.0.0-20230818213616-f204007f963c@f204007f963c

github.com/mudler/go-stable-diffusion v0.0.0-20230605122230-d89260f598af@d89260f598af

github.com/nomic-ai/gpt4all/gpt4all-bindings/golang v0.0.0-20231022042237-c25dc5193530@c25dc5193530

github.com/onsi/ginkgo/v2 v2.13.0

github.com/onsi/gomega v1.28.1

github.com/otiai10/openaigo v1.6.0

github.com/phayes/freeport v0.0.0-20220201140144-74d24b5ae9f5@74d24b5ae9f5

github.com/prometheus/client_golang v1.17.0

github.com/rs/zerolog v1.31.0

github.com/sashabaranov/go-openai v1.16.0

github.com/schollz/progressbar/v3 v3.13.1

github.com/tmc/langchaingo v0.0.0-20231019140956-c636b3da7701@c636b3da7701

github.com/urfave/cli/v2 v2.25.7

github.com/valyala/fasthttp v1.50.0

go.opentelemetry.io/otel v1.19.0

go.opentelemetry.io/otel/exporters/prometheus v0.42.0

go.opentelemetry.io/otel/metric v1.19.0

go.opentelemetry.io/otel/sdk/metric v1.19.0

google.golang.org/grpc v1.59.0

google.golang.org/protobuf v1.31.0

gopkg.in/yaml.v2 v2.4.0

gopkg.in/yaml.v3 v3.0.1

github.com/shirou/gopsutil/v3 v3.23.9

github.com/mudler/go-piper v0.0.0-20230621222733-56b8a81b4760@56b8a81b4760

pip_requirements

extra/requirements.txt

Check this box to trigger a request for Renovate to run again on this repository

Feature request: docker-compose file

That would be great to offer a simple way to run the api locally

macOS/build-locally instructions

This is about fixing up the docs on building locally and/or find an easier way to run it on Mac where docker containers can't be used.

feature: support cerebras gpt2

Go port: https://github.com/go-skynet/go-gpt2.cpp

Simultaneous api requests crash the application

Whenever I send an api request before another one is finished, I receive the following error:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8b1421]

runtime stack:
runtime.throw({0x9c56fb?, 0x4?})
	/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7f33477e41d0 sp=0x7f33477e41a0 pc=0x445bfd
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:821 +0x3e9 fp=0x7f33477e4230 sp=0x7f33477e41d0 pc=0x45c049

goroutine 34 [syscall]:
runtime.cgocall(0x8a3420, 0xc000169888)
	/usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc000169860 sp=0xc000169828 pc=0x414f1c
github.com/go-skynet/llama/go._Cfunc_llama_predict(0x7f3334000c30, 0x15e85f0, 0xc000318080)
	_cgo_gotypes.go:154 +0x4c fp=0xc000169888 sp=0xc000169860 pc=0x511f0c
github.com/go-skynet/llama/go.(*LLama).Predict.func1(0x7f3334000b60?, 0x8ffffffff?, {0xc000318080, 0x3f0000003f666666?, 0x403f800000?})
	/go/pkg/mod/github.com/go-skynet/[email protected]/go/llama.go:39 +0x7e fp=0xc0001698c8 sp=0xc000169888 pc=0x5127de
github.com/go-skynet/llama/go.(*LLama).Predict(0x9a7f62?, {0xc00033c000, 0xc5}, {0xc000169af8, 0x5, 0x1?})
	/go/pkg/mod/github.com/go-skynet/[email protected]/go/llama.go:39 +0x285 fp=0xc000169a20 sp=0xc0001698c8 pc=0x512545
main.api.func1(0x939a80?)
	/build/api.go:55 +0x359 fp=0xc000169b30 sp=0xc000169a20 pc=0x89fe99
github.com/gofiber/fiber/v2.(*App).next(0xc000132900, 0xc00033a000)
	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:134 +0x1b6 fp=0xc000169bd8 sp=0xc000169b30 pc=0x81d996
github.com/gofiber/fiber/v2.(*App).handler(0xc000132900, 0x4d8f37?)
	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:160 +0x87 fp=0xc000169c38 sp=0xc000169bd8 pc=0x81dbc7
github.com/gofiber/fiber/v2.(*App).handler-fm(0xc00032e000?)
	<autogenerated>:1 +0x2c fp=0xc000169c58 sp=0xc000169c38 pc=0x822d4c
github.com/valyala/fasthttp.(*Server).serveConn(0xc0002a2000, {0xa74040?, 0xc000306018})
	/go/pkg/mod/github.com/valyala/[email protected]/server.go:2372 +0x11d3 fp=0xc000169ec8 sp=0xc000169c58 pc=0x7815f3
github.com/valyala/fasthttp.(*Server).serveConn-fm({0xa74040?, 0xc000306018?})
	<autogenerated>:1 +0x39 fp=0xc000169ef0 sp=0xc000169ec8 pc=0x790cb9
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0001234a0, 0xc00031a000)
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:224 +0xa9 fp=0xc000169fa0 sp=0xc000169ef0 pc=0x78cf89
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:196 +0x38 fp=0xc000169fe0 sp=0xc000169fa0 pc=0x78ccf8
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000169fe8 sp=0xc000169fe0 pc=0x477d61
created by github.com/valyala/fasthttp.(*workerPool).getCh
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:195 +0x1b0

goroutine 1 [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000265368 sp=0xc000265348 pc=0x448916
runtime.netpollblock(0x7f334c2b87f8?, 0x4145af?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:527 +0xf7 fp=0xc0002653a0 sp=0xc000265368 pc=0x441317
internal/poll.runtime_pollWait(0x7f334c2c1518, 0x72)
	/usr/local/go/src/runtime/netpoll.go:306 +0x89 fp=0xc0002653c0 sp=0xc0002653a0 pc=0x472729
internal/poll.(*pollDesc).wait(0xc00015eb00?, 0x437d60?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc0002653e8 sp=0xc0002653c0 pc=0x4e6952
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00015eb00)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x2bd fp=0xc000265490 sp=0xc0002653e8 pc=0x4ec25d
net.(*netFD).accept(0xc00015eb00)
	/usr/local/go/src/net/fd_unix.go:172 +0x35 fp=0xc000265548 sp=0xc000265490 pc=0x56cb75
net.(*TCPListener).accept(0xc000012888)
	/usr/local/go/src/net/tcpsock_posix.go:148 +0x25 fp=0xc000265570 sp=0xc000265548 pc=0x582c05
net.(*TCPListener).Accept(0xc000012888)
	/usr/local/go/src/net/tcpsock.go:297 +0x3d fp=0xc0002655a0 sp=0xc000265570 pc=0x581cfd
github.com/valyala/fasthttp.acceptConn(0xc0002a2000, {0xa71790, 0xc000012888}, 0xc000265798)
	/go/pkg/mod/github.com/valyala/[email protected]/server.go:1931 +0x62 fp=0xc000265680 sp=0xc0002655a0 pc=0x77fa82
github.com/valyala/fasthttp.(*Server).Serve(0xc0002a2000, {0xa71790?, 0xc000012888})
	/go/pkg/mod/github.com/valyala/[email protected]/server.go:1824 +0x4f4 fp=0xc0002657c8 sp=0xc000265680 pc=0x77f094
github.com/gofiber/fiber/v2.(*App).Listen(0xc000132900, {0x9a9310?, 0x4?})
	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/listen.go:82 +0x110 fp=0xc000265828 sp=0xc0002657c8 pc=0x815190
main.api(0xc000014338, {0x9adfd0?, 0x7?}, 0x8)
	/build/api.go:76 +0xfb fp=0xc000265890 sp=0xc000265828 pc=0x89fafb
main.main.func2(0xc000290160?)
	/build/main.go:184 +0xf2 fp=0xc000265908 sp=0xc000265890 pc=0x8a2692
github.com/urfave/cli/v2.(*Command).Run(0xc000290160, 0xc0000250c0, {0xc0001728d0, 0x3, 0x3})
	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:273 +0x9eb fp=0xc000265ba8 sp=0xc000265908 pc=0x88e1ab
github.com/urfave/cli/v2.(*Command).Run(0xc000290420, 0xc000024cc0, {0xc000024080, 0x4, 0x4})
	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:266 +0xc4d fp=0xc000265e48 sp=0xc000265ba8 pc=0x88e40d
github.com/urfave/cli/v2.(*App).RunContext(0xc0002401e0, {0xa71a98?, 0xc00002c040}, {0xc000024080, 0x4, 0x4})
	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616 fp=0xc000265eb8 sp=0xc000265e48 pc=0x88b016
github.com/urfave/cli/v2.(*App).Run(...)
	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
main.main()
	/build/main.go:262 +0x931 fp=0xc000265f80 sp=0xc000265eb8 pc=0x8a2191
runtime.main()
	/usr/local/go/src/runtime/proc.go:250 +0x207 fp=0xc000265fe0 sp=0xc000265f80 pc=0x4484e7
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000265fe8 sp=0xc000265fe0 pc=0x477d61

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00004efb0 sp=0xc00004ef90 pc=0x448916
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc00004efe0 sp=0xc00004efb0 pc=0x448750
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00004efe8 sp=0xc00004efe0 pc=0x477d61
created by runtime.init.6
	/usr/local/go/src/runtime/proc.go:293 +0x25

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00004f780 sp=0xc00004f760 pc=0x448916
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:278 +0x8e fp=0xc00004f7c8 sp=0xc00004f780 pc=0x434bee
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc00004f7e0 sp=0xc00004f7c8 pc=0x429ec6
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00004f7e8 sp=0xc00004f7e0 pc=0x477d61
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:178 +0x6b

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc000078000?, 0xa6a010?, 0x1?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00004ff70 sp=0xc00004ff50 pc=0x448916
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.(*scavengerState).park(0xe03d60)
	/usr/local/go/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc00004ffa0 sp=0xc00004ff70 pc=0x432b33
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:628 +0x45 fp=0xc00004ffc8 sp=0xc00004ffa0 pc=0x433105
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc00004ffe0 sp=0xc00004ffc8 pc=0x429e66
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00004ffe8 sp=0xc00004ffe0 pc=0x477d61
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:179 +0xaa

goroutine 5 [finalizer wait]:
runtime.gopark(0x1a0?, 0xe04c80?, 0x60?, 0x78?, 0xc00004e770?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00004e628 sp=0xc00004e608 pc=0x448916
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00004e7e0 sp=0xc00004e628 pc=0x428f07
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00004e7e8 sp=0xc00004e7e0 pc=0x477d61
created by runtime.createfing
	/usr/local/go/src/runtime/mfinal.go:163 +0x45

goroutine 6 [sleep]:
runtime.gopark(0x63b6c190e8fcd?, 0xc000050788?, 0x25?, 0x82?, 0xc0001234d0?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000050758 sp=0xc000050738 pc=0x448916
time.Sleep(0x2540be400)
	/usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc000050798 sp=0xc000050758 pc=0x474b55
github.com/valyala/fasthttp.(*workerPool).Start.func2()
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:67 +0x56 fp=0xc0000507e0 sp=0xc000050798 pc=0x78c456
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000507e8 sp=0xc0000507e0 pc=0x477d61
created by github.com/valyala/fasthttp.(*workerPool).Start
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:59 +0xdd

goroutine 7 [syscall]:
runtime.cgocall(0x8a3420, 0xc00016d888)
	/usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc00016d860 sp=0xc00016d828 pc=0x414f1c
github.com/go-skynet/llama/go._Cfunc_llama_predict(0x7f333c000c40, 0x15e85f0, 0xc000318000)
	_cgo_gotypes.go:154 +0x4c fp=0xc00016d888 sp=0xc00016d860 pc=0x511f0c
github.com/go-skynet/llama/go.(*LLama).Predict.func1(0x7f333c000b60?, 0x8ffffffff?, {0xc000318000, 0x3f0000003f666666?, 0x403f800000?})
	/go/pkg/mod/github.com/go-skynet/[email protected]/go/llama.go:39 +0x7e fp=0xc00016d8c8 sp=0xc00016d888 pc=0x5127de
github.com/go-skynet/llama/go.(*LLama).Predict(0x9a7f62?, {0xc000314000, 0xd4}, {0xc00016daf8, 0x5, 0xe8?})
	/go/pkg/mod/github.com/go-skynet/[email protected]/go/llama.go:39 +0x285 fp=0xc00016da20 sp=0xc00016d8c8 pc=0x512545
main.api.func1(0x939a80?)
	/build/api.go:55 +0x359 fp=0xc00016db30 sp=0xc00016da20 pc=0x89fe99
github.com/gofiber/fiber/v2.(*App).next(0xc000132900, 0xc00012e840)
	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:134 +0x1b6 fp=0xc00016dbd8 sp=0xc00016db30 pc=0x81d996
github.com/gofiber/fiber/v2.(*App).handler(0xc000132900, 0x4d8f37?)
	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:160 +0x87 fp=0xc00016dc38 sp=0xc00016dbd8 pc=0x81dbc7
github.com/gofiber/fiber/v2.(*App).handler-fm(0xc000179800?)
	<autogenerated>:1 +0x2c fp=0xc00016dc58 sp=0xc00016dc38 pc=0x822d4c
github.com/valyala/fasthttp.(*Server).serveConn(0xc0002a2000, {0xa74040?, 0xc000306010})
	/go/pkg/mod/github.com/valyala/[email protected]/server.go:2372 +0x11d3 fp=0xc00016dec8 sp=0xc00016dc58 pc=0x7815f3
github.com/valyala/fasthttp.(*Server).serveConn-fm({0xa74040?, 0xc000306010?})
	<autogenerated>:1 +0x39 fp=0xc00016def0 sp=0xc00016dec8 pc=0x790cb9
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0001234a0, 0xc00006f0a0)
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:224 +0xa9 fp=0xc00016dfa0 sp=0xc00016def0 pc=0x78cf89
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:196 +0x38 fp=0xc00016dfe0 sp=0xc00016dfa0 pc=0x78ccf8
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00016dfe8 sp=0xc00016dfe0 pc=0x477d61
created by github.com/valyala/fasthttp.(*workerPool).getCh
	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:195 +0x1b0

goroutine 18 [sleep]:
runtime.gopark(0x63b6e3cc0ee1e?, 0x9159a0?, 0x68?, 0x44?, 0x1?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00005cf88 sp=0xc00005cf68 pc=0x448916
time.Sleep(0x3b9aca00)
	/usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc00005cfc8 sp=0xc00005cf88 pc=0x474b55
github.com/valyala/fasthttp.updateServerDate.func1()
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2246 +0x1e fp=0xc00005cfe0 sp=0xc00005cfc8 pc=0x78d3de
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00005cfe8 sp=0xc00005cfe0 pc=0x477d61
created by github.com/valyala/fasthttp.updateServerDate
	/go/pkg/mod/github.com/valyala/[email protected]/header.go:2244 +0x25```

windows compatibility?

I'm a beginner , is this program compatible with windows? what are the neccessary steps. I have the alpaca.cpp already installed on my laptop

feature: add webUI

We could use chatgpt-ui or either chatgpt-web in our docker compose file to enable a webui sitting next to the API. This would help debugging and testing models with ease.

feature: Add edit endpoint

feature: embedding support

Add support to embeddings to the API and the llama backend: https://github.com/ggerganov/llama.cpp/blob/e4422e299c10c7e84c8e987770ef40d31905a76b/llama.cpp#L2160

go-llama.cpp
go-gpt4all-j.cpp
go-gpt2.cpp

error building go-gpt4all-j

I ran into the following error while following the instructions for GPT4ALL-J. Is this a transient issue with the GPT4ALL-J code repo maybe ?

15.84 cd build && cmake ../gpt4all-j/ggml && make VERBOSE=1 ggml && cp -rf src/CMakeFiles/ggml.dir/ggml.c.o ../ggml.o
15.92 -- The C compiler identification is GNU 10.2.1
15.99 -- The CXX compiler identification is GNU 10.2.1
16.00 -- Detecting C compiler ABI info
16.07 -- Detecting C compiler ABI info - done
16.09 -- Check for working C compiler: /usr/bin/cc - skipped
16.09 -- Detecting C compile features
16.09 -- Detecting C compile features - done
16.10 -- Detecting CXX compiler ABI info
16.18 -- Detecting CXX compiler ABI info - done
16.20 -- Check for working CXX compiler: /usr/bin/c++ - skipped
16.20 -- Detecting CXX compile features
16.21 -- Detecting CXX compile features - done
16.21 -- Found Git: /usr/bin/git (found version "2.30.2")
16.22 -- Looking for pthread.h
16.29 -- Looking for pthread.h - found
16.29 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
16.37 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
16.37 -- Looking for pthread_create in pthreads
16.43 -- Looking for pthread_create in pthreads - not found
16.43 -- Looking for pthread_create in pthread
16.51 -- Looking for pthread_create in pthread - found
16.51 -- Found Threads: TRUE
16.51 -- CMAKE_SYSTEM_PROCESSOR: x86_64
16.51 -- x86 detected
16.51 -- Linux detected
16.53 -- x86 detected
16.53 -- Linux detected
16.54 CMake Error at tests/CMakeLists.txt:145 (set_target_properties):
16.54 set_target_properties called with incorrect number of arguments.
16.54
16.54
16.54 -- Configuring incomplete, errors occurred!
16.54 See also "/build/go-gpt4all-j/build/CMakeFiles/CMakeOutput.log".
16.54 See also "/build/go-gpt4all-j/build/CMakeFiles/CMakeError.log".
16.55 make[1]: Leaving directory '/build/go-gpt4all-j'
16.55 make[1]: *** [Makefile:144: ggml.o] Error 1
16.55 make: *** [Makefile:61: go-gpt4all-j/libgptj.a] Error 2

failed to solve: process "/bin/sh -c make build" did not complete successfully: exit code: 2

feature: ChatML support

Hi there,

Thanks for sharing the codes.

May I know any plan to support ChatML?

You can find out more about ChatML with the following links:
https://github.com/openai/openai-python/blob/main/chatml.md
https://cobusgreyling.medium.com/the-introduction-of-chat-markup-language-chatml-is-important-for-a-number-of-reasons-5061f6fe2a85

Thanks.

Error: cannot find -lgpt2

...
go mod edit -replace github.com/go-skynet/go-llama.cpp=/workspaces/LocalAI/go-llama
go mod edit -replace github.com/go-skynet/go-gpt4all-j.cpp=/workspaces/LocalAI/go-gpt4all-j
go mod edit -replace github.com/go-skynet/go-gpt2.cpp=/workspaces/LocalAI/go-gpt2
go run ./ api
# github.com/go-skynet/LocalAI
/usr/local/go/pkg/tool/linux_arm64/link: running g++ failed: exit status 1
/usr/bin/ld: cannot find -lgpt2
/usr/bin/ld: cannot find -lgpt2
/usr/bin/ld: cannot find -lgptj
collect2: error: ld returned 1 exit status

make: *** [Makefile:88: run] Error 1

Used #51 this makefile to generate my build on a macOS M1 arm machine.

create contribution guidelines, governance, adopters, contributors files

feature: support GPT4ALL-J

GPT4ALL-J is Apache-licensed, which makes it very appealing as could be re-distributed freely, however, they seem to have hardforked ggml/llama.cpp, so it's not compatible with the llama.cpp backend

Model too old, how do I regenerate?

Hi, thanks for this project. 😃

I've got the .bin model today I don't remember from where 😅 .

# docker run -v $PWD/models:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli api --model /models/ggml-alpaca-7b-q4.bin
llama_model_load: invalid model file '/models/ggml-alpaca-7b-q4.bin' (too old, regenerate your model files!)
llama_bootstrap: failed to load model from '/models/ggml-alpaca-7b-q4.bin'
Loading the model failed: failed loading model

How do I regenerate the model?
I tried with the instructions from README (https://github.com/go-skynet/llama-cli#using-other-models), but without success.

Set models path in .env

Would it be possible to set the models path in .env file, gpt4all chat installs its models in ~/.local/.... . Being able to set the path that models are consumed from would allow shared location, instead of having to copy the model to a fixed directory.

Models not responding (No AVX Support)

Not sure if im doing something wrong but when i send a request through curl to the api, it does this:

It doesn't go past this whatsoever, I'm new to this whole thing, so far I built the binary by itself but the same thing would happen in docker too.

If there's anything that needs to be supplied, let me know.

feature: binary releases

Now that we eased out building locally, we could as well release generic binary and optimized ones for the various platform we support

Readme request: Feature comparison for the API

Hi there,
it looks like it is quite similar to https://github.com/hyperonym/basaran

could you incorporate a table like this?

So I can better understand what I would use?
Thanks in advance.
Greetings.

feat: Add swagger docs

#168 ( https://github.com/gofiber/swagger )

CrashLoopBackOff: invalid model file '/model.bin' too old

+ llama-566dfbb897-7pzn4 › llama
llama-566dfbb897-7pzn4 llama llama_model_load: invalid model file '/model.bin' (too old, regenerate your model files!)
llama-566dfbb897-7pzn4 llama llama_bootstrap: failed to load model from '/model.bin'
llama-566dfbb897-7pzn4 llama Loading the model failed: failed loading model
- llama-566dfbb897-7pzn4 › llama
+ llama-566dfbb897-7pzn4 › llama
llama-566dfbb897-7pzn4 llama llama_model_load: invalid model file '/model.bin' (too old, regenerate your model files!)
llama-566dfbb897-7pzn4 llama llama_bootstrap: failed to load model from '/model.bin'
llama-566dfbb897-7pzn4 llama Loading the model failed: failed loading model
- llama-566dfbb897-7pzn4 › llama

Deploying the file linked on README

error

Error: initializing source docker://quay.io/go-skynet/local-api:latest: reading manifest latest in quay.io/go-skynet/local-api: unauthorized: access to the requested resource is not authorized

feature: Add CI tests with free models

feature: identify model file by SHA

If the API would support model aliases that would allow to plug into existing webui more easily, without supporting directly each one of them. Seems all the UI out there filter by known models - even if the api returns the supported models

Feature Request: select model from drop down in ui

Since you can now load multiple models, we should provide a drop down in the default web UI, and have the model be passed with each request.

acceptance criteria:

Not sure if a thread is started with each request, and how this handles multiple parallel requests, but i would make sure to check if a model is already loaded, and use the same if it is, so it doesn't load the same model between each request.
Match the openai api structure for the requests, so you can pick between models the same way you would with GPT-3.5 or GPT-4, etc.

feature: stablelm

go-skynet/go-ggml-transformers.cpp@1c24f5b

API : Output is cut

I'm trying to run the API, exmple cURL command:

curl --location --request POST 'http://10.0.1.11:8080/predict' --header 'Content-Type: application/json' --data-raw '{
    "text": "What is an alpaca?",
    "topP": 0.8,
    "topK": 50,
    "temperature": 0.7,
    "tokens": 100
}'

The answers aparently are cut short:

{"prediction":"\nAn alpaca is a domesticated member of the South American camelid family. They are related to the llama, but are smaller in size. Alpacas are raised for their fleece, which is used for making clothing, blankets, and other textiles. Alpacas are also raised for their meat, which is similar in taste to beef.\nWhat is an alpaca's habitat?\nAn alpaca's habitat is the high Andes Mountains in South America. Alpacas are native to Peru, Bolivia and Ecuador.\nWhere do"}

Is this expected ?

Server crashes after a call to handlers

I can see there are 3 endpoints using two handlers.
If I call use the path /models I get this:
{"object":"list","data":[{"id":"ggml-gpt4all-j","object":"model"}]}
If I try the other handler I get "Empty reply from server" and the container crashes.
Looking the logs of the container I see:

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.44.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 10  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................. 1 │ 
 └───────────────────────────────────────────────────┘ 

llama.cpp: loading model from /models/ggml-gpt4all-j

But nothing more.
I have tried both the Usage examples in here: https://github.com/go-skynet/LocalAI#usage
Any idea what I'm missing?
I also have tried other models with similar behaviors.
Thanks.

up: llama.cpp

update llama.cpp backend code to latest, including ggerganov/llama.cpp@9411288 . See: go-skynet/go-llama.cpp#22 (comment)

llama_bootstrap: failed to load model from '/model.bin'

Look's like the latest is failing. Perhaps a broken path?

docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.4 api --context-size 700 --threads 4
Unable to find image 'quay.io/go-skynet/llama-cli:v0.4' locally
v0.4: Pulling from go-skynet/llama-cli
8022b074731d: Already exists 
7971239fe1d6: Already exists 
26c861b53509: Already exists 
1714880ecc1c: Already exists 
c71e83b44ada: Already exists 
e4448c041760: Already exists 
736f744dca4b: Already exists 
7517d65a7897: Pull complete 
0afdf5bf81eb: Pull complete 
c7aef89193c7: Pull complete 
ea356902fa2d: Pull complete 
8865ead58fd1: Pull complete 
487435084471: Pull complete 
Digest: sha256:b4a2556985d4496a1db89db50688fd3f15ffc21e76cce6b713fc4feefabd9268
Status: Downloaded newer image for quay.io/go-skynet/llama-cli:v0.4
llama_model_load: failed to open '/model.bin'
llama_bootstrap: failed to load model from '/model.bin'
Loading the model failed: failed loading model

Compilation issue on mac

Hi,

it seems the gptj compilation fails on MacOS, the previous versions without gptj support compiled fine.

The error message when doing make build (previous warnings excluded):

gptj.cpp:657:43: error: expected expression
gptj.cpp:707:14: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
gptj.cpp:707:22: warning: range-based for loop is a C++11 extension [-Wc++11-extensions]
gptj.cpp:740:5: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
make: *** [build] Error 1

For reference, this is the line the error is mentioning:

657: gptj_eval(model, params.n_threads, 0, { 0, 1, 2, 3 }, logits, mem_per_token);

Used software:

MacOS Ventura 13.3.1 (22E261)
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
go version go1.20.2 darwin/amd64

If I can do anything to help to gather more information about this I'd be happy to try.

Cheers!

feature: Add default model options as yaml files

Would be helpful to have an optional yaml file for each model, that gets loaded for model defaults.

For instance, we could specify the template there, among top_k or top_p defaults or how to translate assistant, system and user roles so we don't have to carry them in each API call. It will be also helpful to identify the backend to use (see #30 ) and as well declare which models alias. For instance gpt4all-j could alias gpt-turbo, and so on so forth.

Suffix needed?

Hi,

first of all thanks for this software!

I was wondering if the README is still correct in that the model files need a .bin suffix. I just added a symbolic link to a model and called it "gpt-4" which seemed to work as intended, I can use the same model name as with OpenAI.

Is this intended to work now or am I just exploiting a bug here? I'd very much appreciate if the models could work without the .bin suffix.

Thanks for the attention!

create blog/docs website

upstream gpt4all bindings

After having a discussion with gonzochess75 from gpt4all, seems there is interest in having golang bindings upstream!

This card is to track upstreaming our bindings (mainly the high-level additional functions to bind to the inference code) so we can just contribute all upstream, and anyone can also use gpt4all in their go programs!

Feature Request: mimic openai API endpoints

I'm using this docker compose file to deploy a front end UI that is very similar to the ChatGPT UI interface.

version: '3.6'

services:
  chatgpt:
    build: .
    # image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 9080:3000
    environment:
      - 'OPENAI_API_KEY='
      - 'OPENAI_API_HOST=http://api:8080'

  api:
    image: quay.io/go-skynet/llama-cli:v0.4
    volumes:
      - /Users/Shared/Models:/models
    ports:
      - 9000:8080
    environment:
      - MODEL_PATH=/models/7B/gpt4all-lora-quantized.bin
      - CONTEXT_SIZE=700
      - THREADS=4
    command: api

Would it be possible to add API endpoints to mimic the same output as openai? Not sure if it's easier to do here, or to to add a proxy that converts the in/out of each call. But i see value in other tools that normally call openai apis, could simply targer this local instance.

Your thoughts?

feature: auto-build images

The binaries are built with a certain CPU flagset that if not missing would panic on old CPUS, it would be nice to publish generic images with less optimizations. (but to note, inference will be painfully slow)

feature: model gallery

A curated gallery of models:

support for multiple repositories (so anyone can build their own personal gallery)
automatically download, setup model with default config files
#286
automatically load it in memory so it's already ready for inference
Wizard in the index.html -> if no models, propose to install a new model
UI to select and install models from repositories

llama-cli: command not found

Thanks so much for making and sharing this!

The first command works perfectly, but when I do the one that starts llama-cli I get 'command not found'

bossbaby@Will-of-Steve:~/projects/llama-cli$ sudo docker run -ti --rm quay.io/go-skynet/llama-cli:latest --instruction "What's an alpaca?" --topk 10000

Alpacas are domesticated animals that are closely related to llamas and camels. They are native to the Andes Mountains in South America, where they were first domesticated by the Incas.

bossbaby@Will-of-Steve:~/projects/llama-cli$ llama-cli --model ~/ggml-alpaca-7b-q4.bin --instruction "What's an alpaca?"
llama-cli: command not found

Also, I saw from the issue post in the alpaca.cpp github that with this project alpaca should be running in memory all the time, but it seems like it has to start up a new instance every time I run that first command, also when i do 'ps aux | grep alpaca' after that first command has completed there seems to be no process with 'alpaca' running. Is it possible with this to get responses as fast as in the original alpaca.cpp, but with this awesome single command API-style system?

feature: stopwords

Stopwords are implemented in the go-llama.cpp backend, however, the same should be ported to:

go-llama.cpp
go-gpt4all-j.cpp
go-gpt2.cpp
rwkv

Possible to use it without docker?

Would really appreciate some instructions or guidance for getting this working directly, without docker. I noticed it's using a modified llama.cpp mixed with golong, but I don't have enough knowledge with this to build it. I did try building it but got a error about lama.h not being found.

Docker container loading model fails - Unexepected End of File

Hey there! I am trying to use this on a docker "server" within my local network, and on that specific machine I get hit with the following error:

local-ai  | 2023-04-26T00:13:47.432284170Z error loading model: unexpectedly reached end of file
local-ai  | 2023-04-26T00:13:47.432317663Z llama_init_from_file: failed to load model
local-ai  | 2023-04-26T00:13:47.524068627Z gptj_model_load: loading model from '/models/ggml-gpt4all-j' - please wait ...
local-ai  | 2023-04-26T00:13:47.524100675Z gptj_model_load: n_vocab = 50400
local-ai  | 2023-04-26T00:13:47.524104579Z gptj_model_load: n_ctx   = 2048
local-ai  | 2023-04-26T00:13:47.524107130Z gptj_model_load: n_embd  = 4096
local-ai  | 2023-04-26T00:13:47.524110309Z gptj_model_load: n_head  = 16
local-ai  | 2023-04-26T00:13:47.524135455Z gptj_model_load: n_layer = 28
local-ai  | 2023-04-26T00:13:47.524137975Z gptj_model_load: n_rot   = 64
local-ai  | 2023-04-26T00:13:47.524140167Z gptj_model_load: f16     = 2
local-ai  | 2023-04-26T00:13:47.524142430Z gptj_model_load: ggml ctx size = 5401.45 MB
local-ai  | 2023-04-26T00:13:47.524144654Z gptj_model_load: memory_size =  1792.00 MB, n_mem = 57344
local-ai  | 2023-04-26T00:13:48.630473827Z SIGILL: illegal instruction
local-ai  | 2023-04-26T00:13:48.630521560Z PC=0x8da2a9 m=0 sigcode=2
local-ai  | 2023-04-26T00:13:48.630525275Z signal arrived during cgo execution
local-ai  | 2023-04-26T00:13:48.630527908Z instruction bytes: 0x62 0xf2 0xfd 0x8 0x7c 0xc0 0x49 0x89 0x45 0x0 0x48 0x89 0x83 0x68 0x1 0x0
local-ai  | 2023-04-26T00:13:48.630530427Z 
local-ai  | 2023-04-26T00:13:48.630532762Z goroutine 7 [syscall]:
local-ai  | 2023-04-26T00:13:48.630535146Z runtime.cgocall(0x894bf0, 0xc0002871b0)
local-ai  | 2023-04-26T00:13:48.630537480Z 	/usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc000287188 sp=0xc000287150 pc=0x4204bc
local-ai  | 2023-04-26T00:13:48.630540122Z github.com/go-skynet/go-gpt4all-j%2ecpp._Cfunc_gptj_predict(0x2cba890, 0x2c9f490, 0xc00011a200)
local-ai  | 2023-04-26T00:13:48.630542517Z 	_cgo_gotypes.go:158 +0x4c fp=0xc0002871b0 sp=0xc000287188 pc=0x5bfb8c
local-ai  | 2023-04-26T00:13:48.630544960Z github.com/go-skynet/go-gpt4all-j%2ecpp.(*GPTJ).Predict.func1(0x2ca0ea0?, 0x10ffffffff?, {0xc00011a200, 0x3f6666663f333333?, 0xc000000009?})
local-ai  | 2023-04-26T00:13:48.630547417Z 	/build/go-gpt4all-j/gptj.go:43 +0x7e fp=0xc0002871f0 sp=0xc0002871b0 pc=0x5c037e
local-ai  | 2023-04-26T00:13:48.630549880Z github.com/go-skynet/go-gpt4all-j%2ecpp.(*GPTJ).Predict(0x100c0002872e0?, {0xc0001220c0, 0xb6}, {0xc0002873d0, 0x5, 0xc0001220c0?})
local-ai  | 2023-04-26T00:13:48.630555133Z 	/build/go-gpt4all-j/gptj.go:43 +0x225 fp=0xc0002872f8 sp=0xc0002871f0 pc=0x5c0045
local-ai  | 2023-04-26T00:13:48.630557757Z github.com/go-skynet/LocalAI/api.openAIEndpoint.func1.3()
local-ai  | 2023-04-26T00:13:48.630560143Z 	/build/api/api.go:285 +0x33e fp=0xc000287408 sp=0xc0002872f8 pc=0x84b1de
local-ai  | 2023-04-26T00:13:48.630562497Z github.com/go-skynet/LocalAI/api.openAIEndpoint.func1(0x98a500?)
local-ai  | 2023-04-26T00:13:48.630564835Z 	/build/api/api.go:341 +0xebc fp=0xc0002878c8 sp=0xc000287408 pc=0x849f9c
local-ai  | 2023-04-26T00:13:48.630567217Z github.com/gofiber/fiber/v2.(*App).next(0xc0002ef680, 0xc0000d4840)
local-ai  | 2023-04-26T00:13:48.630571963Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:134 +0x1b6 fp=0xc000287970 sp=0xc0002878c8 pc=0x8424b6
local-ai  | 2023-04-26T00:13:48.630574843Z github.com/gofiber/fiber/v2.(*Ctx).Next(0xc0002fa930?)
local-ai  | 2023-04-26T00:13:48.630578591Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/ctx.go:964 +0x53 fp=0xc000287990 sp=0xc000287970 pc=0x82e733
local-ai  | 2023-04-26T00:13:48.630581401Z github.com/gofiber/fiber/v2/middleware/cors.New.func1(0xc0000d4840)
local-ai  | 2023-04-26T00:13:48.630584290Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/middleware/cors/cors.go:140 +0x385 fp=0xc000287a98 sp=0xc000287990 pc=0x848185
local-ai  | 2023-04-26T00:13:48.630586814Z github.com/gofiber/fiber/v2.(*Ctx).Next(0x14?)
local-ai  | 2023-04-26T00:13:48.630597212Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/ctx.go:961 +0x43 fp=0xc000287ab8 sp=0xc000287a98 pc=0x82e723
local-ai  | 2023-04-26T00:13:48.630599982Z github.com/gofiber/fiber/v2/middleware/recover.New.func1(0x98a500?)
local-ai  | 2023-04-26T00:13:48.630603910Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/middleware/recover/recover.go:43 +0xcb fp=0xc000287b30 sp=0xc000287ab8 pc=0x848dab
local-ai  | 2023-04-26T00:13:48.630606610Z github.com/gofiber/fiber/v2.(*App).next(0xc0002ef680, 0xc0000d4840)
local-ai  | 2023-04-26T00:13:48.630620795Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:134 +0x1b6 fp=0xc000287bd8 sp=0xc000287b30 pc=0x8424b6
local-ai  | 2023-04-26T00:13:48.630638618Z github.com/gofiber/fiber/v2.(*App).handler(0xc0002ef680, 0x4a4697?)
local-ai  | 2023-04-26T00:13:48.630646665Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:160 +0x87 fp=0xc000287c38 sp=0xc000287bd8 pc=0x8426e7
local-ai  | 2023-04-26T00:13:48.630653028Z github.com/gofiber/fiber/v2.(*App).handler-fm(0xc0002fa600?)
local-ai  | 2023-04-26T00:13:48.630658799Z 	<autogenerated>:1 +0x2c fp=0xc000287c58 sp=0xc000287c38 pc=0x84786c
local-ai  | 2023-04-26T00:13:48.630670537Z github.com/valyala/fasthttp.(*Server).serveConn(0xc000306000, {0xab69c0?, 0xc0000143c0})
local-ai  | 2023-04-26T00:13:48.630677246Z 	/go/pkg/mod/github.com/valyala/[email protected]/server.go:2372 +0x11d3 fp=0xc000287ec8 sp=0xc000287c58 pc=0x7c9393
local-ai  | 2023-04-26T00:13:48.630691410Z github.com/valyala/fasthttp.(*Server).serveConn-fm({0xab69c0?, 0xc0000143c0?})
local-ai  | 2023-04-26T00:13:48.630697411Z 	<autogenerated>:1 +0x39 fp=0xc000287ef0 sp=0xc000287ec8 pc=0x7d8a59
local-ai  | 2023-04-26T00:13:48.630702961Z github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0000b7900, 0xc00030a180)
local-ai  | 2023-04-26T00:13:48.630715544Z 	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:224 +0xa9 fp=0xc000287fa0 sp=0xc000287ef0 pc=0x7d4d29
local-ai  | 2023-04-26T00:13:48.630722293Z github.com/valyala/fasthttp.(*workerPool).getCh.func1()
local-ai  | 2023-04-26T00:13:48.630728071Z 	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:196 +0x38 fp=0xc000287fe0 sp=0xc000287fa0 pc=0x7d4a98
local-ai  | 2023-04-26T00:13:48.630733370Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.630745332Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000287fe8 sp=0xc000287fe0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.630752732Z created by github.com/valyala/fasthttp.(*workerPool).getCh
local-ai  | 2023-04-26T00:13:48.630758305Z 	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:195 +0x1b0
local-ai  | 2023-04-26T00:13:48.630763584Z 
local-ai  | 2023-04-26T00:13:48.630768662Z goroutine 1 [IO wait]:
local-ai  | 2023-04-26T00:13:48.630781621Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
local-ai  | 2023-04-26T00:13:48.630788024Z 	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0002b34a0 sp=0xc0002b3480 pc=0x453eb6
local-ai  | 2023-04-26T00:13:48.630793761Z runtime.netpollblock(0x7f5a540e36a8?, 0x41fb4f?, 0x0?)
local-ai  | 2023-04-26T00:13:48.630799030Z 	/usr/local/go/src/runtime/netpoll.go:527 +0xf7 fp=0xc0002b34d8 sp=0xc0002b34a0 pc=0x44c8b7
local-ai  | 2023-04-26T00:13:48.630873165Z internal/poll.runtime_pollWait(0x7f5a2a513980, 0x72)
local-ai  | 2023-04-26T00:13:48.630891371Z 	/usr/local/go/src/runtime/netpoll.go:306 +0x89 fp=0xc0002b34f8 sp=0xc0002b34d8 pc=0x47d529
local-ai  | 2023-04-26T00:13:48.630927780Z internal/poll.(*pollDesc).wait(0xc00002ed80?, 0x4?, 0x0)
local-ai  | 2023-04-26T00:13:48.630948702Z 	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc0002b3520 sp=0xc0002b34f8 pc=0x4b9552
local-ai  | 2023-04-26T00:13:48.630956640Z internal/poll.(*pollDesc).waitRead(...)
local-ai  | 2023-04-26T00:13:48.630962491Z 	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
local-ai  | 2023-04-26T00:13:48.630967926Z internal/poll.(*FD).Accept(0xc00002ed80)
local-ai  | 2023-04-26T00:13:48.630973063Z 	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x2bd fp=0xc0002b35c8 sp=0xc0002b3520 pc=0x4bee5d
local-ai  | 2023-04-26T00:13:48.630978247Z net.(*netFD).accept(0xc00002ed80)
local-ai  | 2023-04-26T00:13:48.630983588Z 	/usr/local/go/src/net/fd_unix.go:172 +0x35 fp=0xc0002b3680 sp=0xc0002b35c8 pc=0x585e95
local-ai  | 2023-04-26T00:13:48.630991617Z net.(*TCPListener).accept(0xc000012708)
local-ai  | 2023-04-26T00:13:48.630996294Z 	/usr/local/go/src/net/tcpsock_posix.go:148 +0x25 fp=0xc0002b36a8 sp=0xc0002b3680 pc=0x59c105
local-ai  | 2023-04-26T00:13:48.631001114Z net.(*TCPListener).Accept(0xc000012708)
local-ai  | 2023-04-26T00:13:48.631005265Z 	/usr/local/go/src/net/tcpsock.go:297 +0x3d fp=0xc0002b36d8 sp=0xc0002b36a8 pc=0x59b1fd
local-ai  | 2023-04-26T00:13:48.631010413Z github.com/valyala/fasthttp.acceptConn(0xc000306000, {0xab40c0, 0xc000012708}, 0xc0002b38d0)
local-ai  | 2023-04-26T00:13:48.631016848Z 	/go/pkg/mod/github.com/valyala/[email protected]/server.go:1931 +0x62 fp=0xc0002b37b8 sp=0xc0002b36d8 pc=0x7c7822
local-ai  | 2023-04-26T00:13:48.631022467Z github.com/valyala/fasthttp.(*Server).Serve(0xc000306000, {0xab40c0?, 0xc000012708})
local-ai  | 2023-04-26T00:13:48.631027695Z 	/go/pkg/mod/github.com/valyala/[email protected]/server.go:1824 +0x4f4 fp=0xc0002b3900 sp=0xc0002b37b8 pc=0x7c6e34
local-ai  | 2023-04-26T00:13:48.631037966Z github.com/gofiber/fiber/v2.(*App).Listen(0xc0002ef680, {0x9f6919?, 0x7?})
local-ai  | 2023-04-26T00:13:48.631048021Z 	/go/pkg/mod/github.com/gofiber/fiber/[email protected]/listen.go:82 +0x110 fp=0xc0002b3960 sp=0xc0002b3900 pc=0x839cb0
local-ai  | 2023-04-26T00:13:48.631051706Z main.main.func1(0xc0002fe160?)
local-ai  | 2023-04-26T00:13:48.631054660Z 	/build/main.go:83 +0x1f5 fp=0xc0002b39f8 sp=0xc0002b3960 pc=0x885c75
local-ai  | 2023-04-26T00:13:48.631061855Z github.com/urfave/cli/v2.(*Command).Run(0xc0002fe160, 0xc000026a80, {0xc000024220, 0x1, 0x1})
local-ai  | 2023-04-26T00:13:48.631074097Z 	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:274 +0x9eb fp=0xc0002b3c98 sp=0xc0002b39f8 pc=0x873eeb
local-ai  | 2023-04-26T00:13:48.631102273Z github.com/urfave/cli/v2.(*App).RunContext(0xc0002fc000, {0xab43d8?, 0xc00002c050}, {0xc000024220, 0x1, 0x1})
local-ai  | 2023-04-26T00:13:48.631113809Z 	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616 fp=0xc0002b3d08 sp=0xc0002b3c98 pc=0x870cf6
local-ai  | 2023-04-26T00:13:48.631138996Z github.com/urfave/cli/v2.(*App).Run(...)
local-ai  | 2023-04-26T00:13:48.631152233Z 	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
local-ai  | 2023-04-26T00:13:48.631171211Z main.main()
local-ai  | 2023-04-26T00:13:48.631177099Z 	/build/main.go:87 +0x8c9 fp=0xc0002b3f80 sp=0xc0002b3d08 pc=0x8859a9
local-ai  | 2023-04-26T00:13:48.631181761Z runtime.main()
local-ai  | 2023-04-26T00:13:48.631184564Z 	/usr/local/go/src/runtime/proc.go:250 +0x207 fp=0xc0002b3fe0 sp=0xc0002b3f80 pc=0x453a87
local-ai  | 2023-04-26T00:13:48.631187366Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.631189955Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0002b3fe8 sp=0xc0002b3fe0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.631192711Z 
local-ai  | 2023-04-26T00:13:48.631195264Z goroutine 2 [force gc (idle)]:
local-ai  | 2023-04-26T00:13:48.631197913Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
local-ai  | 2023-04-26T00:13:48.631200946Z 	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000060fb0 sp=0xc000060f90 pc=0x453eb6
local-ai  | 2023-04-26T00:13:48.631209657Z runtime.goparkunlock(...)
local-ai  | 2023-04-26T00:13:48.631212893Z 	/usr/local/go/src/runtime/proc.go:387
local-ai  | 2023-04-26T00:13:48.631215688Z runtime.forcegchelper()
local-ai  | 2023-04-26T00:13:48.631219185Z 	/usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc000060fe0 sp=0xc000060fb0 pc=0x453cf0
local-ai  | 2023-04-26T00:13:48.631222067Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.631228769Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000060fe8 sp=0xc000060fe0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.631235542Z created by runtime.init.6
local-ai  | 2023-04-26T00:13:48.631240250Z 	/usr/local/go/src/runtime/proc.go:293 +0x25
local-ai  | 2023-04-26T00:13:48.631245478Z 
local-ai  | 2023-04-26T00:13:48.631249997Z goroutine 3 [GC sweep wait]:
local-ai  | 2023-04-26T00:13:48.631260540Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
local-ai  | 2023-04-26T00:13:48.631270600Z 	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000061780 sp=0xc000061760 pc=0x453eb6
local-ai  | 2023-04-26T00:13:48.631288597Z runtime.goparkunlock(...)
local-ai  | 2023-04-26T00:13:48.631291756Z 	/usr/local/go/src/runtime/proc.go:387
local-ai  | 2023-04-26T00:13:48.631297449Z runtime.bgsweep(0x0?)
local-ai  | 2023-04-26T00:13:48.631304029Z 	/usr/local/go/src/runtime/mgcsweep.go:278 +0x8e fp=0xc0000617c8 sp=0xc000061780 pc=0x44018e
local-ai  | 2023-04-26T00:13:48.631307082Z runtime.gcenable.func1()
local-ai  | 2023-04-26T00:13:48.631316122Z 	/usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc0000617e0 sp=0xc0000617c8 pc=0x435466
local-ai  | 2023-04-26T00:13:48.631321955Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.631333627Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000617e8 sp=0xc0000617e0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.631338069Z created by runtime.gcenable
local-ai  | 2023-04-26T00:13:48.631340877Z 	/usr/local/go/src/runtime/mgc.go:178 +0x6b
local-ai  | 2023-04-26T00:13:48.631343645Z 
local-ai  | 2023-04-26T00:13:48.631346219Z goroutine 4 [GC scavenge wait]:
local-ai  | 2023-04-26T00:13:48.631352052Z runtime.gopark(0xc000088000?, 0xaacfb8?, 0x1?, 0x0?, 0x0?)
local-ai  | 2023-04-26T00:13:48.631363424Z 	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000061f70 sp=0xc000061f50 pc=0x453eb6
local-ai  | 2023-04-26T00:13:48.631366783Z runtime.goparkunlock(...)
local-ai  | 2023-04-26T00:13:48.631372288Z 	/usr/local/go/src/runtime/proc.go:387
local-ai  | 2023-04-26T00:13:48.631375320Z runtime.(*scavengerState).park(0xe36bc0)
local-ai  | 2023-04-26T00:13:48.631381090Z 	/usr/local/go/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc000061fa0 sp=0xc000061f70 pc=0x43e0d3
local-ai  | 2023-04-26T00:13:48.631384196Z runtime.bgscavenge(0x0?)
local-ai  | 2023-04-26T00:13:48.631387177Z 	/usr/local/go/src/runtime/mgcscavenge.go:628 +0x45 fp=0xc000061fc8 sp=0xc000061fa0 pc=0x43e6a5
local-ai  | 2023-04-26T00:13:48.631390226Z runtime.gcenable.func2()
local-ai  | 2023-04-26T00:13:48.631402767Z 	/usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc000061fe0 sp=0xc000061fc8 pc=0x435406
local-ai  | 2023-04-26T00:13:48.631408944Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.631421482Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.631427638Z created by runtime.gcenable
local-ai  | 2023-04-26T00:13:48.631430572Z 	/usr/local/go/src/runtime/mgc.go:179 +0xaa
local-ai  | 2023-04-26T00:13:48.631436909Z 
local-ai  | 2023-04-26T00:13:48.631439812Z goroutine 5 [finalizer wait]:
local-ai  | 2023-04-26T00:13:48.631445451Z runtime.gopark(0x1a0?, 0xe378a0?, 0x60?, 0x78?, 0xc000060770?)
local-ai  | 2023-04-26T00:13:48.631525149Z 	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000060628 sp=0xc000060608 pc=0x453eb6
local-ai  | 2023-04-26T00:13:48.631537876Z runtime.runfinq()
local-ai  | 2023-04-26T00:13:48.631541428Z 	/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000607e0 sp=0xc000060628 pc=0x4344a7
local-ai  | 2023-04-26T00:13:48.631548354Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.631552373Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.631556795Z created by runtime.createfing
local-ai  | 2023-04-26T00:13:48.631566661Z 	/usr/local/go/src/runtime/mfinal.go:163 +0x45
local-ai  | 2023-04-26T00:13:48.631571688Z 
local-ai  | 2023-04-26T00:13:48.631582007Z goroutine 6 [sleep]:
local-ai  | 2023-04-26T00:13:48.631586602Z runtime.gopark(0x1014f278b3c0?, 0xc000062788?, 0xc5?, 0x37?, 0xc0000b7930?)
local-ai  | 2023-04-26T00:13:48.631596872Z 	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000062758 sp=0xc000062738 pc=0x453eb6
local-ai  | 2023-04-26T00:13:48.631600397Z time.Sleep(0x2540be400)
local-ai  | 2023-04-26T00:13:48.631603157Z 	/usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc000062798 sp=0xc000062758 pc=0x47f695
local-ai  | 2023-04-26T00:13:48.631609759Z github.com/valyala/fasthttp.(*workerPool).Start.func2()
local-ai  | 2023-04-26T00:13:48.631612639Z 	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:67 +0x56 fp=0xc0000627e0 sp=0xc000062798 pc=0x7d41f6
local-ai  | 2023-04-26T00:13:48.631615507Z runtime.goexit()
local-ai  | 2023-04-26T00:13:48.631618180Z 	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000627e8 sp=0xc0000627e0 pc=0x482821
local-ai  | 2023-04-26T00:13:48.631643792Z created by github.com/valyala/fasthttp.(*workerPool).Start
local-ai  | 2023-04-26T00:13:48.631650041Z 	/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:59 +0xdd
local-ai  | 2023-04-26T00:13:48.631654705Z 
local-ai  | 2023-04-26T00:13:48.631659179Z rax    0x43e0100
local-ai  | 2023-04-26T00:13:48.631663561Z rbx    0x7ffd4e6d1630
local-ai  | 2023-04-26T00:13:48.631667460Z rcx    0x43e02f0
local-ai  | 2023-04-26T00:13:48.631671803Z rdx    0x43e02f8
local-ai  | 2023-04-26T00:13:48.631676353Z rdi    0x43e0100
local-ai  | 2023-04-26T00:13:48.631680512Z rsi    0x1f8
local-ai  | 2023-04-26T00:13:48.631684735Z rbp    0x7ffd4e6d14d0
local-ai  | 2023-04-26T00:13:48.631689321Z rsp    0x7ffd4e6d1400
local-ai  | 2023-04-26T00:13:48.631693725Z r8     0x43e0100
local-ai  | 2023-04-26T00:13:48.631697789Z r9     0x7f5a542eebe0
local-ai  | 2023-04-26T00:13:48.631701977Z r10    0xfffffffffffff327
local-ai  | 2023-04-26T00:13:48.631706132Z r11    0x200
local-ai  | 2023-04-26T00:13:48.631710458Z r12    0x7ffd4e6d1470
local-ai  | 2023-04-26T00:13:48.631714481Z r13    0x43e00c8
local-ai  | 2023-04-26T00:13:48.631717749Z r14    0x7ffd4e6d1440
local-ai  | 2023-04-26T00:13:48.631720182Z r15    0x7ffd4e6d1618
local-ai  | 2023-04-26T00:13:48.631722494Z rip    0x8da2a9
local-ai  | 2023-04-26T00:13:48.631724802Z rflags 0x10206
local-ai  | 2023-04-26T00:13:48.631727733Z cs     0x33
local-ai  | 2023-04-26T00:13:48.631731851Z fs     0x0
local-ai  | 2023-04-26T00:13:48.631735934Z gs     0x0

I don't quite know what the issue is, and am unfamiliar with go programming, so I don't quite know where to start. I can run it locally, on my 13900K and it works without issue. The server I am trying to run it on is a proxmox host with a Intel 9900K as the processor. I don't quite know what else I'd need to provide to re-create this, but I'm hoping the logs will help.

On another side note, I tried running this on my k3s cluster, that is running off some old Xeons, and I couldn't create the container at all. After googling I guess llama.cpp needs avx2 which my xeons don't have, is this accurate for local-ai as well (would assume so, but confirmation never hurts).

Please let me know If I can provide anything else!

feature: define backend to use in the config file for the model

In this case we can just have a default for "auto"

Feature Request: Add a regenerate button

Request for clarification re llama.cpp use

Hey there!

First up, great job on the work you've done! It's really impressive

I've got a short clarification question about the llama-cli tool. As far as I understand, it uses llama.cpp to load llama-based models and runs them on a CPU-only basis. This gives an OpenAI-compatible API, which now works with tools like Chatbot-UI, thanks to your recent tweaks.

My main question is whether this setup works only with small-scale local machines using CPUs. We've got some bigger GPU clusters at our workplace, and we're testing different models in the 30B and 65B range. It'd be awesome if we could also make use the OpenAI API, so I'm trying to find a good starting point. Any pointers would be super helpful!

Also, if I'm not mistaken, the llama.cpp foundation of your repo might not work for our needs. Please let me know if I've got that wrong.

Thanks a lot and greetings from Berlin!

Robert

error: llama: model does not exist gpt: model does not exist gpt2: model does not exist stableLM: model does not exist

Hi there,

First of all, I managed to compile the binary using make build, it produced an executable file local-ai.

I started the local-ai using the following command:

./local-ai --f16 true --debug true --threads 2 --models-path ./models --context-size 2048

The app is started properly in my Ubuntu linux machine with the following screen:

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.42.0                                   
 │               http://127.0.0.1:8080                            
 │       (bound on host 0.0.0.0 and port 8080)        
 │                                                                            
 │ Handlers ............ 10  Processes ........... 1  
 │ Prefork ....... Disabled  PID ............. 31788 
 └───────────────────────────────────────────────────┘

For your information, there's only one model file in the ./models directory: ggml-gpt4all-j.bin

However, when I send the following request to the endpoint:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "ggml-gpt4all-j.bin",            
        "prompt": "What is Flutter?",
        "temperature": 0.7
}'

It send me the following response:

{"error":"llama: model does not exist gpt: model does not exist gpt2: model does not exist stableLM: model does not exist"}

May I know what did I missed? How do I find out what's wrong.
Let me know if you need more information.

Please advise. Thank you.

mudler / localai Goto Github PK

localai's Introduction

LocalAI

🔥🔥 Hot topics / Roadmap

💻 Getting started

🚀 Features

💻 Usage

🔗 Community and integrations

🔗 Resources

📖 🎥 Media, Blogs, Social

Citation

❤️ Sponsors

🌟 Star history

📖 License

🙇 Acknowledgements

🤗 Contributors

localai's People

Contributors

Stargazers

Watchers

Forkers

localai's Issues

Open

Ignored or Blocked

Detected dependencies

Recommend Projects

Recommend Topics

Recommend Org