aronneburg / codeassistant-vscode-endpoint-server

LLM based code assistant endpoint server for tabnine/huggingface-vscode extensions

License: Apache License 2.0

Python 97.71% Dockerfile 2.29%

Hugging Face VSCode Endpoint Server

A server for huggingface-vscode custom endpoints using LLMs.

This fork properly handles multiple client requests, adds Bearer-token authentication, and supports HTTPS.

Currently, inference is not batched. For a single client, batching is not required, but multi-client use may benefit from it.

Usage

Install pipenv, then run:

pipenv install
CUDA_VISIBLE_DEVICES=<devices> pipenv run python -m app.main --api-type=code --pretrained=<model> --auth-prefix=<token> --port 8000

In VS Code, set Hugging Face Code > Model ID or Endpoint to http://localhost:8000/api/generate/.

In VS Code:

Run "Hugging Face Code: Set API token" from the command palette (Ctrl + Shift + P).
Set the token to the value passed via the --auth-prefix option, which defaults to "<secret-key>".
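
For orientation, a Bearer-token check on the server side could look roughly like the sketch below. This is not the server's actual code: the function name is_authorized is hypothetical, and comparing the whole token for an exact match is an assumption (the option is called --auth-prefix, so the real check may differ).

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True if the header has the form 'Bearer <expected_token>'.

    Hypothetical helper; exact-match comparison is an assumption.
    """
    if not authorization_header:
        return False
    # Split "Bearer <token>" into the scheme and the token itself.
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A request whose Authorization header is missing, uses another scheme, or carries a different token would be rejected by this check.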

API

curl -X POST http://localhost:8000/api/generate/ -d '{"inputs": "def fib(n):", "parameters": {"max_new_tokens": "10"}}' -H "Authorization: Bearer <secret-key>"
# response = {"generated_text": "def fib(n):\n    if n == 0:\n        return"}
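
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names build_generate_request and generate are hypothetical, and the URL and token are the placeholder values from this README. It sends max_new_tokens as an integer and adds a Content-Type header, both of which are assumptions about what the server accepts.

```python
import json
import urllib.request


def build_generate_request(prompt, token,
                           url="http://localhost:8000/api/generate/",
                           max_new_tokens=10):
    """Build the POST request for the generate endpoint (hypothetical helper)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(prompt, token, **kwargs):
    """Send the request and return the 'generated_text' field of the response."""
    request = build_generate_request(prompt, token, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]
```

With the server running, generate("def fib(n):", "<secret-key>") would return the completed text as a string.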

Completion triggers

The extension triggers a completion whenever one of the keys listed below is typed.

If the IDE does not show a suggestion after you type a key, you can retrigger it by deleting the character and typing it again (DEL + key). As the server caches previous completions, this is more efficient than continuing to type.

export const COMPLETION_TRIGGERS = [
  " ",
  ".",
  "(",
  ")",
  "{",
  "}",
  "[",
  "]",
  ",",
  ":",
  "'",
  '"',
  "=",
  "<",
  ">",
  "/",
  "\\",
  "+",
  "-",
  "|",
  "&",
  "*",
  "%",
  "=",
  "$",
  "#",
  "@",
  "!",
];
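
The caching of previous completions mentioned above can be pictured as a small cache keyed by the prompt text. The sketch below is an illustration only: the class name CompletionCache, the least-recently-used eviction policy, and the default size are all assumptions, not the server's actual implementation.

```python
from collections import OrderedDict
from typing import Optional


class CompletionCache:
    """Hypothetical LRU cache mapping a prompt to its generated text."""

    def __init__(self, max_entries: int = 128) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, prompt: str) -> Optional[str]:
        """Return the cached completion, or None if the prompt is unknown."""
        if prompt not in self._entries:
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(prompt)
        return self._entries[prompt]

    def put(self, prompt: str, generated_text: str) -> None:
        """Store a completion and evict the least recently used entry if full."""
        self._entries[prompt] = generated_text
        self._entries.move_to_end(prompt)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
```

Under this model, retyping a trigger key sends the same prompt again, so the lookup hits the cache and no new inference run is needed.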