Inference API server for local GGUF language models, based on llama.cpp
- Multiple models: switch between models at runtime
- Inference queries: HTTP API with streaming response support
- Tasks: predefined language model tasks
Works with the Infergui frontend
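As a sketch of what an inference query against the HTTP API might look like, the snippet below builds a JSON request body with a streaming flag. The host, port, and parameter names (`prompt`, `stream`, `n_predict`) are assumptions modeled on common llama.cpp-style servers, not this server's documented API:

```python
import json

# Assumed default host/port; check the server's configuration.
SERVER_URL = "http://localhost:8080"

def build_inference_request(prompt: str, stream: bool = True) -> dict:
    """Build a JSON payload for an inference query (illustrative only).

    Parameter names here are hypothetical, chosen to mirror
    common llama.cpp-style request fields.
    """
    return {
        "prompt": prompt,
        "stream": stream,    # request a streaming response
        "n_predict": 128,    # maximum number of tokens to generate
    }

payload = build_inference_request("List three uses of a local LLM.")
body = json.dumps(payload)
print(body)
```

Sending `body` as a POST request to the server (and reading the response chunk by chunk when `stream` is true) would complete the round trip; the exact endpoint path should be taken from the server's documentation.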