abetlen / llama-cpp-python

Python bindings for llama.cpp

Home Page: https://llama-cpp-python.readthedocs.io

License: MIT License

Python 97.82% CMake 0.64% Dockerfile 0.62% Shell 0.57% Makefile 0.36%

llama-cpp-python's Introduction

🦙 Python Bindings for llama.cpp


Simple Python bindings for @ggerganov's llama.cpp library. This package provides low-level access to the llama.cpp C API via a ctypes interface, a high-level Python API for text and chat completion, and an OpenAI-compatible web server.

Documentation is available at https://llama-cpp-python.readthedocs.io/en/latest.

Installation

Requirements:

  • Python 3.8+
  • C compiler
    • Linux: gcc or clang
    • Windows: Visual Studio or MinGW
    • MacOS: Xcode

To install the package, run:

pip install llama-cpp-python

This will also build llama.cpp from source and install it alongside this python package.

If this fails, add --verbose to the pip install command to see the full cmake build log.
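For example, to retry a failed install with the full build log:

pip install llama-cpp-python --verbose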

Pre-built Wheel (New)

It is also possible to install a pre-built wheel with basic CPU support.

pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

Installation Configuration

llama.cpp supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the llama.cpp README for a full list.

All llama.cpp cmake build options can be set via the CMAKE_ARGS environment variable or via the --config-settings / -C cli flag during installation.

Environment Variables
# Linux and Mac
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
  pip install llama-cpp-python
# Windows
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
CLI / requirements.txt

They can also be set via the pip install -C / --config-settings flag and saved to a requirements.txt file:

pip install --upgrade pip # ensure pip is up to date
pip install llama-cpp-python \
  -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
# requirements.txt

llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"

Supported Backends

Below are some common backends, their build commands and any additional environment variables required.

OpenBLAS (CPU)

To install with OpenBLAS, set the LLAMA_BLAS and LLAMA_BLAS_VENDOR CMake options via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
CUDA

To install with CUDA support, set the LLAMA_CUDA=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python

Pre-built Wheel (New)

It is also possible to install a pre-built wheel with CUDA support, as long as your system meets these requirements:

  • CUDA Version is 12.1, 12.2, 12.3, or 12.4
  • Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>

Where <cuda-version> is one of the following:

  • cu121: CUDA 12.1
  • cu122: CUDA 12.2
  • cu123: CUDA 12.3
  • cu124: CUDA 12.4

For example, to install the CUDA 12.1 wheel:

pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
Metal

To install with Metal (MPS), set the LLAMA_METAL=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

Pre-built Wheel (New)

It is also possible to install a pre-built wheel with Metal support, as long as your system meets these requirements:

  • MacOS Version is 11.0 or later
  • Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
CLBlast (OpenCL)

To install with CLBlast, set the LLAMA_CLBLAST=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
hipBLAS (ROCm)

To install with hipBLAS / ROCm support for AMD cards, set the LLAMA_HIPBLAS=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
Vulkan

To install with Vulkan support, set the LLAMA_VULKAN=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
Kompute

To install with Kompute support, set the LLAMA_KOMPUTE=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
SYCL

To install with SYCL support, set the LLAMA_SYCL=on CMake option via CMAKE_ARGS before installing:

source /opt/intel/oneapi/setvars.sh   
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
RPC

To install with RPC support, set the LLAMA_RPC=on CMake option via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_RPC=on" pip install llama-cpp-python

Windows Notes

Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'

If you run into issues where it complains it can't find 'nmake' or CMAKE_C_COMPILER, you can extract w64devkit as described in the llama.cpp repo and set the compiler paths manually in CMAKE_ARGS before running pip install:

$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"

See the above instructions and set CMAKE_ARGS to the BLAS backend you want to use.

MacOS Notes

Detailed MacOS Metal GPU install documentation is available at docs/install/macos.md

M1 Mac Performance Issue

Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh

Otherwise, the installation will build the x86 version of llama.cpp, which will be about 10x slower on an Apple Silicon (M1) Mac.

M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`

Try installing with

CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python

Upgrading and Reinstalling

To upgrade and rebuild llama-cpp-python, add the --upgrade --force-reinstall --no-cache-dir flags to the pip install command to ensure the package is rebuilt from source.
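For example:

pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir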

High-level API

API Reference

The high-level API provides a simple managed interface through the Llama class.

Below is a short example demonstrating how to use the high-level API for basic text completion:

from llama_cpp import Llama

llm = Llama(
      model_path="./models/7B/llama-model.gguf",
      # n_gpu_layers=-1, # Uncomment to use GPU acceleration
      # seed=1337, # Uncomment to set a specific seed
      # n_ctx=2048, # Uncomment to increase the context window
)
output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
) # Generate a completion, can also call create_completion
print(output)

By default llama-cpp-python generates completions in an OpenAI compatible format:

{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "./models/7B/llama-model.gguf",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 28,
    "total_tokens": 42
  }
}

Text completion is available through the __call__ and create_completion methods of the Llama class.
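Completions can also be streamed token by token by passing stream=True; the sketch below assumes each streamed chunk follows the same OpenAI-style layout as the full completion shown above:

from llama_cpp import Llama

llm = Llama(model_path="./models/7B/llama-model.gguf")
stream = llm.create_completion(
      "Q: Name the planets in the solar system? A: ",
      max_tokens=32,
      stop=["Q:", "\n"],
      stream=True, # yield partial completions as they are generated
)
for chunk in stream:
      # each chunk carries the newly generated text in choices[0]["text"]
      print(chunk["choices"][0]["text"], end="", flush=True)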

Pulling models from Hugging Face Hub

You can download Llama models in gguf format directly from Hugging Face using the from_pretrained method. You'll need to install the huggingface-hub package to use this feature (pip install huggingface-hub).

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q8_0.gguf",
    verbose=False
)

By default from_pretrained will download the model to the Hugging Face cache directory; you can then manage installed model files with the huggingface-cli tool.
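For example, assuming a recent version of huggingface-hub, the cache can be inspected from the command line (flags may differ between versions; see huggingface-cli --help):

huggingface-cli scan-cache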

Chat Completion

The high-level API also provides a simple interface for chat completion.

Chat completion requires that the model knows how to format the messages into a single prompt. The Llama class does this using pre-registered chat formats (e.g. chatml, llama-2, gemma) or by providing a custom chat handler object.

The model will format the messages into a single prompt using the following order of precedence:

  • Use the chat_handler if provided
  • Use the chat_format if provided
  • Use the tokenizer.chat_template from the gguf model's metadata (should work for most new models, older models may not have this)
  • Otherwise, fall back to the llama-2 chat format

Set verbose=True to see the selected chat format.

from llama_cpp import Llama
llm = Llama(
      model_path="path/to/llama-2/llama-model.gguf",
      chat_format="llama-2"
)
llm.create_chat_completion(
      messages = [
          {"role": "system", "content": "You are an assistant who perfectly describes images."},
          {
              "role": "user",
              "content": "Describe this image in detail please."
          }
      ]
)

Chat completion is available through the create_chat_completion method of the Llama class.

For OpenAI API v1 compatibility, you can use the create_chat_completion_openai_v1 method, which returns pydantic models instead of dicts.
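A minimal sketch, assuming the returned pydantic objects expose the same attribute layout as the OpenAI v1 client types:

from llama_cpp import Llama

llm = Llama(model_path="path/to/llama-2/llama-model.gguf", chat_format="llama-2")
response = llm.create_chat_completion_openai_v1(
      messages = [
          {"role": "user", "content": "Describe this image in detail please."}
      ]
)
print(response.choices[0].message.content) # attribute access instead of dict indexing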

JSON and JSON Schema Mode

To constrain chat responses to only valid JSON or a specific JSON Schema use the response_format argument in create_chat_completion.

JSON Mode

The following example will constrain the response to valid JSON strings only.

from llama_cpp import Llama
llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs in JSON.",
        },
        {"role": "user", "content": "Who won the world series in 2020"},
    ],
    response_format={
        "type": "json_object",
    },
    temperature=0.7,
)

JSON Schema Mode

To constrain the response further to a specific JSON Schema add the schema to the schema property of the response_format argument.

from llama_cpp import Llama
llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs in JSON.",
        },
        {"role": "user", "content": "Who won the world series in 2020"},
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"team_name": {"type": "string"}},
            "required": ["team_name"],
        },
    },
    temperature=0.7,
)

Function Calling

The high-level API supports OpenAI compatible function and tool calling. This is possible through the functionary pre-trained models chat format or through the generic chatml-function-calling chat format.

from llama_cpp import Llama
llm = Llama(model_path="path/to/chatml/llama-model.gguf", chat_format="chatml-function-calling")
llm.create_chat_completion(
      messages = [
        {
          "role": "system",
          "content": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary"

        },
        {
          "role": "user",
          "content": "Extract Jason is 25 years old"
        }
      ],
      tools=[{
        "type": "function",
        "function": {
          "name": "UserDetail",
          "parameters": {
            "type": "object",
            "title": "UserDetail",
            "properties": {
              "name": {
                "title": "Name",
                "type": "string"
              },
              "age": {
                "title": "Age",
                "type": "integer"
              }
            },
            "required": [ "name", "age" ]
          }
        }
      }],
      tool_choice={
        "type": "function",
        "function": {
          "name": "UserDetail"
        }
      }
)
Functionary v2

The various gguf-converted files for this set of models can be found here. Functionary is able to intelligently call functions and also analyze any provided function outputs to generate coherent responses. All functionary v2 models support parallel function calling. You can provide either functionary-v1 or functionary-v2 for the chat_format when initializing the Llama class.

Due to discrepancies between llama.cpp and HuggingFace's tokenizers, you must provide the HF tokenizer for functionary. The LlamaHFTokenizer class can be initialized and passed into the Llama class; this overrides the default llama.cpp tokenizer used in the Llama class. The tokenizer files are already included in the respective HF repositories hosting the gguf files.

from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer
llm = Llama.from_pretrained(
  repo_id="meetkai/functionary-small-v2.2-GGUF",
  filename="functionary-small-v2.2.q4_0.gguf",
  chat_format="functionary-v2",
  tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF")
)

NOTE: There is no need to provide the default system messages used in Functionary as they are added automatically in the Functionary chat handler. Thus, the messages should contain just the chat messages and/or system messages that provide additional context for the model (e.g. the current datetime).

Multi-modal Models

llama-cpp-python supports multi-modal models such as llava1.5, which allow the language model to read information from both text and images.

Below are the supported multi-modal models and their respective chat handlers (Python API) and chat formats (Server API).

Model                | LlamaChatHandler             | chat_format
llava-v1.5-7b        | Llava15ChatHandler           | llava-1-5
llava-v1.5-13b       | Llava15ChatHandler           | llava-1-5
llava-v1.6-34b       | Llava16ChatHandler           | llava-1-6
moondream2           | MoondreamChatHandler         | moondream2
nanollava            | NanollavaChatHandler         | nanollava
llama-3-vision-alpha | Llama3VisionAlphaChatHandler | llama-3-vision-alpha

Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler
chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
llm = Llama(
  model_path="./path/to/llava/llama-model.gguf",
  chat_handler=chat_handler,
  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
)
llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You are an assistant who perfectly describes images."},
        {
            "role": "user",
            "content": [
                {"type" : "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }
            ]
        }
    ]
)

You can also pull the model from the Hugging Face Hub using the from_pretrained method.

from llama_cpp import Llama
from llama_cpp.llama_chat_format import MoondreamChatHandler

chat_handler = MoondreamChatHandler.from_pretrained(
  repo_id="vikhyatk/moondream2",
  filename="*mmproj*",
)

llm = Llama.from_pretrained(
  repo_id="vikhyatk/moondream2",
  filename="*text-model*",
  chat_handler=chat_handler,
  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
)

response = llm.create_chat_completion(
    messages = [
        {
            "role": "user",
            "content": [
                {"type" : "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }

            ]
        }
    ]
)
print(response["choices"][0]["text"])

Note: Multi-modal models also support tool calling and JSON mode.

Loading a Local Image

Images can be passed as base64 encoded data URIs. The following example demonstrates how to do this.

import base64

def image_to_base64_data_uri(file_path):
    with open(file_path, "rb") as img_file:
        base64_data = base64.b64encode(img_file.read()).decode('utf-8')
        return f"data:image/png;base64,{base64_data}"

# Replace 'file_path.png' with the actual path to your PNG file
file_path = 'file_path.png'
data_uri = image_to_base64_data_uri(file_path)

messages = [
    {"role": "system", "content": "You are an assistant who perfectly describes images."},
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_uri }},
            {"type" : "text", "text": "Describe this image in detail please."}
        ]
    }
]
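The messages list can then be passed to a multi-modal model set up as in the Llava example above (a sketch reusing that configuration and the messages variable defined here; the paths are placeholders):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
llm = Llama(
    model_path="path/to/llava/llama-model.gguf",
    chat_handler=chat_handler,
    n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
)
response = llm.create_chat_completion(messages=messages)
print(response["choices"][0]["message"]["content"])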

Speculative Decoding

llama-cpp-python supports speculative decoding which allows the model to generate completions based on a draft model.

The fastest way to use speculative decoding is through the LlamaPromptLookupDecoding class.

Just pass this as a draft model to the Llama class during initialization.

from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

llama = Llama(
    model_path="path/to/model.gguf",
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10) # num_pred_tokens is the number of tokens to predict; 10 is the default and generally good for GPU, while 2 performs better for CPU-only machines.
)

Embeddings

To generate text embeddings use create_embedding or embed. Note that you must pass embedding=True to the constructor upon model creation for these to work properly.

import llama_cpp

llm = llama_cpp.Llama(model_path="path/to/model.gguf", embedding=True)

embeddings = llm.create_embedding("Hello, world!")

# or create multiple embeddings at once

embeddings = llm.create_embedding(["Hello, world!", "Goodbye, world!"])

There are two primary notions of embeddings in a Transformer-style model: token level and sequence level. Sequence level embeddings are produced by "pooling" token level embeddings together, usually by averaging them or using the first token.

Models that are explicitly geared towards embeddings will usually return sequence level embeddings by default, one for each input string. Non-embedding models such as those designed for text generation will typically return only token level embeddings, one for each token in each sequence. Thus the dimensionality of the return type will be one higher for token level embeddings.

It is possible to control pooling behavior in some cases using the pooling_type flag on model creation. You can ensure token level embeddings from any model using LLAMA_POOLING_TYPE_NONE. The reverse, getting a generation-oriented model to yield sequence level embeddings, is currently not possible, but you can always do the pooling manually.
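A minimal sketch of requesting token level embeddings, assuming the pooling_type constructor flag accepts the LLAMA_POOLING_TYPE_NONE constant exposed by the package:

import llama_cpp

llm = llama_cpp.Llama(
    model_path="path/to/model.gguf",
    embedding=True,
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_NONE, # disable pooling: one embedding per token
)

token_embeddings = llm.embed("Hello, world!") # a list of per-token vectors rather than a single pooled vector
print(len(token_embeddings), len(token_embeddings[0]))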

Adjusting the Context Window

The context window of the Llama models determines the maximum number of tokens that can be processed at once. By default, this is set to 512 tokens, but can be adjusted based on your requirements.

For instance, if you want to work with larger contexts, you can expand the context window by setting the n_ctx parameter when initializing the Llama object:

llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)

OpenAI Compatible Web Server

llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp compatible models with any OpenAI compatible client (language libraries, services, etc).

To install the server package and get started:

pip install 'llama-cpp-python[server]'
python3 -m llama_cpp.server --model models/7B/llama-model.gguf

As in the Supported Backends section above, you can also install with GPU (CUDA) support like this:

CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.

To bind to 0.0.0.0 to enable remote connections, use python3 -m llama_cpp.server --host 0.0.0.0. Similarly, to change the port (default is 8000), use --port.
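For example (illustrative host and port values):

python3 -m llama_cpp.server --model models/7B/llama-model.gguf --host 0.0.0.0 --port 8080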

You probably also want to set the prompt format. For chatml, use

python3 -m llama_cpp.server --model models/7B/llama-model.gguf --chat_format chatml

That will format the prompt according to how the model expects it. You can find the prompt format in the model card. For possible options, see llama_cpp/llama_chat_format.py and look for lines starting with "@register_chat_format".

If you have huggingface-hub installed, you can also use the --hf_model_repo_id flag to load a model from the Hugging Face Hub.

python3 -m llama_cpp.server --hf_model_repo_id Qwen/Qwen1.5-0.5B-Chat-GGUF --model '*q8_0.gguf'

Web Server Features

Docker image

A Docker image is available on GHCR. To run the server:

docker run --rm -it -p 8000:8000 -v /path/to/models:/models -e MODEL=/models/llama-model.gguf ghcr.io/abetlen/llama-cpp-python:latest

Docker on termux (requires root) is currently the only known way to run this on phones; see the termux support issue.

Low-level API

API Reference

The low-level API is a direct ctypes binding to the C API provided by llama.cpp. The entire low-level API can be found in llama_cpp/llama_cpp.py and directly mirrors the C API in llama.h.

Below is a short example demonstrating how to use the low-level API to tokenize a prompt:

import llama_cpp
import ctypes
llama_cpp.llama_backend_init(False) # Must be called once at the start of each program
params = llama_cpp.llama_context_default_params()
# use bytes for char * params
model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", params)
ctx = llama_cpp.llama_new_context_with_model(model, params)
max_tokens = params.n_ctx
# use ctypes arrays for array params
tokens = (llama_cpp.llama_token * int(max_tokens))()
n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, llama_cpp.c_bool(True))
llama_cpp.llama_free(ctx)

Check out the examples folder for more examples of using the low-level API.

Documentation

Documentation is available via https://llama-cpp-python.readthedocs.io/. If you find any issues with the documentation, please open an issue or submit a PR.

Development

This package is under active development and I welcome any contributions.

To get started, clone the repository and install the package in editable / development mode:

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python

# Upgrade pip (required for editable mode)
pip install --upgrade pip

# Install with pip
pip install -e .

# if you want to use the fastapi / openapi server
pip install -e .[server]

# to install all optional dependencies
pip install -e .[all]

# to clear the local build cache
make clean

You can also test out specific commits of llama.cpp by checking out the desired commit in the vendor/llama.cpp submodule and then running make clean and pip install -e . again. Any changes in the llama.h API will require changes to the llama_cpp/llama_cpp.py file to match the new API (additional changes may be required elsewhere).
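A sketch of that workflow (the commit hash is a placeholder):

cd vendor/llama.cpp
git checkout <commit-hash>
cd ../..
make clean
pip install -e .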

FAQ

Are there pre-built binaries / binary wheels available?

The recommended installation method is to install from source as described above. The reason for this is that llama.cpp is built with compiler optimizations that are specific to your system. Using pre-built binaries would require disabling these optimizations or supporting a large number of pre-built binaries for each platform.

That being said, there are some pre-built binaries available through the Releases page as well as some community-provided wheels.

In the future, I would like to provide pre-built binaries and wheels for common platforms, and I'm happy to accept any useful contributions in this area. This is currently being tracked in #741.

How does this compare to other Python bindings of llama.cpp?

I originally wrote this package for my own use with two goals in mind:

  • Provide a simple process to install llama.cpp and access the full C API in llama.h from Python
  • Provide a high-level Python API that can be used as a drop-in replacement for the OpenAI API so existing apps can be easily ported to use llama.cpp

Any contributions and changes to this package will be made with these goals in mind.

License

This project is licensed under the terms of the MIT license.

llama-cpp-python's People

Contributors

abetlen, aniljava, bretello, c0sogi, caiyesd, cisc, dependabot[bot], earonesty, fakerybakery, gjmulder, iamlemec, isydmr, janvdp, jeffrey-fong, jm12138, kddubey, khimaros, maximilian-winter, millionthodin16, niek, phiharri, pipboyguy, player1537, rgerganov, sagsmug, smartappli, stonelinks, th-neu, twaka, zocainviken

llama-cpp-python's Issues

Investigate using `make` instead of `cmake` to build shared library

It's been pointed out that make may be better supported by llama.cpp (on some platforms). We're currently using scikit-build to build the shared library on installation with cmake but it also supports make.

Additional note as pointed out in #32 we should support passing environment variables in both settings.

Implement `logprobs`

logprobs return format should match OpenAI API. Currently calling a Llama instance with logprobs enabled just returns a list of floats.

Example of the correct format:

"logprobs": {
    "text_offset": [
        11,
        12,
        13,
        14,
        15,
        17,
        18,
        20,
        21,
        23,
        24,
        26,
        27,
        29,
        30,
        32
    ],
    "token_logprobs": [
        -0.028534053,
        -0.0013638621,
        -0.0001191709,
        -0.037809037,
        -0.008346983,
        -1.3900239e-05,
        -6.0395385e-05,
        -2.462996e-05,
        -5.4432137e-05,
        -4.3108244e-05,
        -6.0395385e-05,
        -4.382537e-05,
        -4.489638e-05,
        -4.751897e-05,
        -0.00017937786,
        -7.314978e-05
    ],
    "tokens": [
        "\n",
        "\n",
        "1",
        ",",
        " 2",
        ",",
        " 3",
        ",",
        " 4",
        ",",
        " 5",
        ",",
        " 6",
        ",",
        " 7",
        ","
    ],
    "top_logprobs": [
        {
            "\n": -0.028534053,
            "\n\n": -5.3414392,
            " (": -6.8118296,
            " in": -4.9322805,
            ":": -5.6061873
        },
        {
            "\n": -0.0013638621,
            " \u00a7\u00a7": -8.594428,
            "//": -9.296644,
            "1": -9.727121,
            "Count": -9.291412
        },
        {
            " 1": -10.996209,
            "\"": -12.673454,
            "#": -12.253096,
            "1": -0.0001191709,
            "One": -9.39247
        },
        {
            " -": -6.4947214,
            " 2": -7.7675867,
            ")": -8.327954,
            ",": -0.037809037,
            ".": -3.3655276
        },
        {
            "\n": -14.826643,
            " ": -10.675518,
            " 2": -0.008346983,
            " two": -16.126537,
            "2": -4.792885
        },
        {
            " ,": -11.469002,
            " 3": -12.7872095,
            ",": -1.3900239e-05,
            ".": -14.724538,
            "<|endoftext|>": -15.308233
        },
        {
            " ": -12.118958,
            " 3": -6.0395385e-05,
            " three": -17.906118,
            "3": -9.814757,
            "<|endoftext|>": -15.049129
        },
        {
            " ,": -10.729593,
            " 4": -14.016008,
            ",": -2.462996e-05,
            ".": -14.297305,
            "<|endoftext|>": -13.67176
        },
        {
            " ": -11.351273,
            " 4": -5.4432137e-05,
            "4": -10.086686,
            "<|endoftext|>": -13.919009,
            "\u00a0": -16.80569
        },
        {
            " ,": -10.206355,
            " 5": -12.87644,
            ",": -4.3108244e-05,
            ".": -13.588498,
            "<|endoftext|>": -13.03574
        },
        {
            " ": -11.478045,
            " 5": -6.0395385e-05,
            "5": -9.931537,
            "<|endoftext|>": -13.568035,
            "\u00a0": -16.266188
        },
        {
            " ,": -10.160495,
            " 6": -12.964705,
            ",": -4.382537e-05,
            ".": -14.101328,
            "<|endoftext|>": -13.08568
        },
        {
            " ": -11.344849,
            " 6": -4.489638e-05,
            "6": -10.329956,
            "<|endoftext|>": -14.879237,
            "\u00a0": -16.98358
        },
        {
            " ,": -10.096309,
            " 7": -12.389179,
            ",": -4.751897e-05,
            ".": -13.817777,
            "<|endoftext|>": -13.860558
        },
        {
            " ": -11.630913,
            " 7": -0.00017937786,
            " seven": -16.613815,
            "7": -8.680304,
            "<|endoftext|>": -14.859097
        },
        {
            " ,": -9.754253,
            " 8": -11.516983,
            ",": -7.314978e-05,
            ".": -13.250221,
            "<|endoftext|>": -12.703088
        }
    ]
}

Long duration until generation starts with big context

When just saying like "Hello, who are you?", I get like 200ms/token and it starts generating almost instantly.
On the other hand, when I paste a small text (e.g. search results from the DuckDuckGo API) and ask a question about it, I have to wait +- 1 min and then it generates, but quite slowly. Is this normal behaviour?

My CPU is a Ryzen 7 6800H with 32 GB of DDR5 RAM. I'm running Vicuna 7B.
I paste the search result context from the Python bindings.

[Question] How to use kv cache?

Hello!

I have been trying to test the new kv cache loading and ran into an issue: it seems to segfault when running llama_eval.
To save the current cache I do:

import llama_cpp
import pickle
from ctypes import cast
# Some work...
kv_tokens = llama_cpp.llama_get_kv_cache_token_count(ctx)
kv_len = llama_cpp.llama_get_kv_cache_size(ctx)
kv_cache = llama_cpp.llama_get_kv_cache(ctx) 
kv_cache = cast(kv_cache, llama_cpp.POINTER(llama_cpp.c_uint8 * kv_len))
kv_cache = bytearray(kv_cache)
with open("test.bin", "wb") as f:
    pickle.dump([kv_cache,kv_tokens], f)

Loading:

with open("test.bin", "rb") as f:
    kv_cache, kv_tokens = pickle.load(f)
    llama_cpp.llama_set_kv_cache(ctx, 
	    (llama_cpp.c_uint8 * len(kv_cache)).from_buffer(kv_cache),
	    len(kv_cache),
	    kv_tokens
    )

But running llama_cpp.llama_eval after will result in a segfault.

llama-cpp-python version: 0.1.16

How do I fix this?
Thanks

Chat does not remember initial prompt as well as llama.cpp

When feeding llama.cpp main app with an initial prompt from a file (--file parameter) like this:

./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct --file 'prompt.txt' --keep -1 --n_predict -1 --mlock -m models/ggml-vicuna-13b-4bit.bin

... it remembers the character name that is in that file for quite a long time. But it seems that when using low_level_api_chat_cpp.py like this:

python3 low_level_api_chat_cpp.py --mlock --color --interactive-first --interactive-start -r "### Human:" -ins -c 2048 -i --repeat_penalty 1.2 --temp 0 --n_parts -1 --ignore-eos --keep -1 --n_predict -1 --file '/Users/admin/scripts/llama-cpp-python/examples/low_level_api/prompt.txt' -m ../../llama.cpp/models/ggml-vicuna-13b-4bit.bin

it forgets the name quicker, even if I use the --keep -1 parameter. It seems to be worse at remembering context overall.

Any way to make it remember context for longer as in llama.cpp main app?

Unexpected output

EDIT: I'm running this on an M1 Macbook. Using the model directly works as expected, but running it through Python gives me this output. The .dylib binary is built from source too.

Do you know what could be giving me this output? Using the model without the bindings works as expected...

  "id": "cmpl-f49883d5-e368-4fa0-a4fa-bf758daa1831",
  "object": "text_completion",
  "created": 1680203705,
  "model": "ggml-model-q4_0-new.bin",
  "choices": [
    {
      "text": "Question: What are the names of the planets in the solar system? Answer: \u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 48,
    "total_tokens": 67
  }
}

Installation on Windows failed in building wheel, UnicodeDecodeError

Running pip install llama-cpp-python==0.1.23

Collecting llama-cpp-python==0.1.23
  Using cached llama_cpp_python-0.1.23.tar.gz (530 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "I:\Python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "I:\Python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "I:\Python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\31415\AppData\Local\Temp\pip-build-env-2r2v_z25\overlay\Lib\site-packages\setuptools\build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "C:\Users\31415\AppData\Local\Temp\pip-build-env-2r2v_z25\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "C:\Users\31415\AppData\Local\Temp\pip-build-env-2r2v_z25\overlay\Lib\site-packages\setuptools\build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 6, in <module>
        File "I:\Python\lib\pathlib.py", line 1135, in read_text
          return f.read()
      UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 4: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

hi, I need help:

Hello, I need help, I am a very beginner programmer and I understand almost nothing, but could someone explain to me step by step how to use and execute the AI? I don't understand how to do it

[WinError 193] When trying to run the high level API example with vicuna

I ran pip install llama-cpp-python and the installation was a success. Then I created a python file and copied over the example text in the readme.
The only change I made was the model path to the vicuna model I am using, and when I try to run the script I end up getting this error:


  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama_cpp.py", line 36, in _load_shared_library
    return ctypes.CDLL(str(_lib_path))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 193] %1 is not a valid Win32 application

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Chula\Desktop\code_projects\Vicuna_1\test.py", line 1, in <module>
    from llama_cpp import Llama
  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama_cpp.py", line 46, in <module>
    _lib = _load_shared_library(_lib_base_name)
  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama_cpp.py", line 38, in _load_shared_library
    raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library 'C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama.dll': [WinError 193] %1 is not a valid Win32 application
Press any key to continue . . .

I'm fairly new to all this so I may have done something wrong, but I can't seem to find a fix anywhere.

[Investigate] Custom `llama.dll` Dependency Resolution Issues on Windows

This is a note for using a custom llama.dll build on Windows. I ran into dependency resolution issues with loading my own llama.dll compiled with BLAS support and some extra hardware-specific optimization flags. No matter what I do, it can't seem to locate all of its dependencies, even though I've tried placing them in system paths and even the same dir.

My current workaround is using the default llama.dll that llama-cpp-python builds, but it doesn't have the hardware optimizations and BLAS compatibility that I enabled in my custom build. So, I'm still trying to figure out what my issue is. Maybe something Python-specific that I'm missing...

I'm dropping this issue here just in case anyone else runs into something similar. If you have any ideas or workarounds, let me know. I'll keep trying to figure it out until I get it resolved haha :)

[Feature] Dynamic Model Loading and Model Endpoint in FastAPI

I'd like to propose a future feature I think would add useful flexibility for users of the completions/embeddings API. I'm suggesting the ability to dynamically load models based on calls to the FastAPI endpoint.

The concept is as follows:

  • Have a predefined location for model files (e.g., a models folder within the project) and allow users to specify an additional model folder if needed.
  • When the API starts, it checks the designated model folders and populates the available models dynamically.
  • Users can query the available models through a GET request to the /v1/engines endpoint, which would return a list of models and their statuses.
  • Users can then specify the desired model when making inference requests.

This dynamic model loading feature would align with the behavior of the OpenAI spec for models and model status. It would offer users the flexibility to easily choose and use different models without having to make manual changes to the project or configs.

This is a suggestion for later, but I wanted to suggest it now so we can plan if we do decide to implement it.

Let me know your thoughts :)

[Windows] "Failed building wheel for llama-cpp-python"

Edit: For now I've installed the wheel from "https://github.com/Loufe/llama-cpp-python/blob/main/wheels/llama_cpp_python-0.1.26-cp310-cp310-win_amd64.whl". The installation of the wheel works, so everything is fine for me. Got things working in WSL with no issue as well.
I would still be happy to build the wheel myself, first as a learning experience, to understand what I did wrong, and secondly because, if I understood "#40" correctly, it might lead to better performance if I compile it myself: "The issue is that the binaries will likely not be built with the correct optimizations for the users particular CPU which will likely result in much worse performance than the user expects."
Though maybe I did not understand correctly and it doesn't matter.
I leave the issue in case it might be useful to someone, or in case someone wants to try to help me build the wheel for fun.

Hi !
I've been trying to install this package for a while, but I can't get it working on windows.

When I run "pip install llama-cpp-python", I get the following errors :

(short version, i'll put the full output at the end of the message)

ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

It seems that it is trying to find a C compiler and then build the wheel for the library (as far as I understand it).
At some point it seems to find one:

-- Trying 'Visual Studio 16 2019 x64 v142' generator - success

But then it seems to fail:


CMake Error at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-m7g4zo_5/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeTestCCompiler.cmake:67 (message):
        The C compiler
          "C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe"
        is not able to compile a simple test program.

And then it gives me a very long output full of directory paths explaining why it failed.

Close to the end I can see an error that might be relevant:

An error occurred while configuring with CMake.
 Command: 'C:\Users\Antoine\AppData\Local\Temp\pip-build-env-m7g4zo_5\overlay\Lib\site-packages\cmake\data\bin/cmake.exe' 'C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7' -G 'Visual Studio 16 2019' '-DCMAKE_INSTALL_PREFIX:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7\_skbuild\win-amd64-3.10\cmake-install' -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE '-DCMAKE_MODULE_PATH:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-build-env-m7g4zo_5\overlay\Lib\site-packages\skbuild\resources\cmake' '-DPYTHON_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPYTHON_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' '-DPYTHON_LIBRARY:PATH=D:\Anaconda\envs\textgen\libs\python310.lib' '-DPython_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython_FIND_REGISTRY:STRING=NEVER '-DPython3_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython3_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython3_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython3_FIND_REGISTRY:STRING=NEVER -T v142 -A x64 -DCMAKE_BUILD_TYPE:STRING=Release
        Source directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7
        Working directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7\_skbuild\win-amd64-3.10\cmake-build
      Please see CMake's output for more information. 

But I don't really know what to make of it.
I would really love to understand how to make it work on Windows, but I lack knowledge on building wheels.
I've tried:
- Upgrading pip and setuptools
- Installing Visual Studio AND Build Tools for C++ (therefore I have cmake on my computer, but I don't know if it's even used when trying to build the wheel considering the previous output...)
- Finding 'CMake's output for more information.', but I have no idea where to find it and Google didn't help me on that one.
- Downloading the repo and trying to build it using cmake, but maybe I did it wrong:

(textgen) PS F:\ChatBots\text-generation-webui\repositories\GPTQ-for-LLaMa\cmaketentative\llama-cpp-python> cmake ./ -B./build
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
CMake Error at CMakeLists.txt:21 (add_subdirectory):
  The source directory

    F:/ChatBots/text-generation-webui/repositories/GPTQ-for-LLaMa/cmaketentative/llama-cpp-python/vendor/llama.cpp

  does not contain a CMakeLists.txt file.


CMake Error at CMakeLists.txt:22 (install):
  install TARGETS given target "llama" which does not exist.


-- Configuring incomplete, errors occurred!


- Tried doing the same thing with cygwin after installing it

I've tried since yesterday to make it work but I can't figure it out. Is there someone that could help me get this working on my Windows machine?

Thank you very much in advance.

Additional info :
I'm trying to install it in a conda environment named "textgen", but not sure it is relevant.

Full error output :


(textgen) PS F:\ChatBots\text-generation-webui\repositories\GPTQ-for-LLaMa> pip install llama-cpp-python
Collecting llama-cpp-python
  Using cached llama_cpp_python-0.1.27.tar.gz (529 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in d:\anaconda\envs\textgen\lib\site-packages (from llama-cpp-python) (4.5.0)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [263 lines of output]


      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 17 2022

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
      -- The C compiler identification is MSVC 19.29.30148.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- The CXX compiler identification is MSVC 19.29.30148.0
      CMake Warning (dev) at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-yc585726/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCXXCompiler.cmake:168 (if):
        Policy CMP0054 is not set: Only interpret if() arguments as variables or
        keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
        details.  Use the cmake_policy command to set the policy and suppress this
        warning.

        Quoted variables like "MSVC" will no longer be dereferenced when the policy
        is set to NEW.  Since the policy is not set the OLD behavior will be used.
      Call Stack (most recent call first):
        CMakeLists.txt:4 (ENABLE_LANGUAGE)
      This warning is for project developers.  Use -Wno-dev to suppress it.

      CMake Warning (dev) at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-yc585726/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCXXCompiler.cmake:189 (elseif):
        Policy CMP0054 is not set: Only interpret if() arguments as variables or
        keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
        details.  Use the cmake_policy command to set the policy and suppress this
        warning.

        Quoted variables like "MSVC" will no longer be dereferenced when the policy
        is set to NEW.  Since the policy is not set the OLD behavior will be used.
      Call Stack (most recent call first):
        CMakeLists.txt:4 (ENABLE_LANGUAGE)
      This warning is for project developers.  Use -Wno-dev to suppress it.

      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Configuring done (11.1s)
      -- Generating done (0.0s)
      -- Build files have been written to: C:/Users/Antoine/AppData/Local/Temp/pip-install-kab8dxp_/llama-cpp-python_6aab703992964fd9953365ad8cceacea/_cmake_test_compile/build
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator - success
      --------------------------------------------------------------------------------

      Configuring Project
        Working directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build
        Command:
          'C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\cmake\data\bin/cmake.exe' 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea' -G 'Visual Studio 16 2019' '-DCMAKE_INSTALL_PREFIX:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-install' -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE '-DCMAKE_MODULE_PATH:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\resources\cmake' '-DPYTHON_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPYTHON_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' '-DPYTHON_LIBRARY:PATH=D:\Anaconda\envs\textgen\libs\python310.lib' '-DPython_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython_FIND_REGISTRY:STRING=NEVER '-DPython3_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython3_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython3_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython3_FIND_REGISTRY:STRING=NEVER -T v142 -A x64 -DCMAKE_BUILD_TYPE:STRING=Release

      -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
      -- The C compiler identification is MSVC 19.29.30148.0
      -- The CXX compiler identification is MSVC 19.29.30148.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - failed
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - broken
      CMake Error at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-yc585726/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeTestCCompiler.cmake:67 (message):
        The C compiler

          "C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe"

        is not able to compile a simple test program.

        It fails with the following output:

          Change Dir: C:/Users/Antoine/AppData/Local/Temp/pip-install-kab8dxp_/llama-cpp-python_6aab703992964fd9953365ad8cceacea/_skbuild/win-amd64-3.10/cmake-build/CMakeFiles/CMakeScratch/TryCompile-gyt569

          Run Build Command(s):C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/MSBuild/Current/Bin/MSBuild.exe cmTC_903a4.vcxproj /p:Configuration=Debug /p:Platform=x64 /p:VisualStudioVersion=16.0 /v:n && Microsoft (R) Build Engine version 16.11.2+f32259642 pour .NET Framework
          Copyright (C) Microsoft Corporation. Tous droits réservés.

          La génération a démarré 09/04/2023 16:06:07.
          Projet "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" sur le nœud 1 (cibles par défaut).
          PrepareForBuild:
            Création du répertoire "cmTC_903a4.dir\Debug\".
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(517,5): warning MSB8029: Le répertoire intermédiaire ou le répertoire de sortie ne peut pas se trouver sous le répertoire temporaire car cela risque de créer des problèmes avec la génération incrémentielle. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
            Création du répertoire "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\Debug\".
            Création du répertoire "cmTC_903a4.dir\Debug\cmTC_903a4.tlog\".
          InitializeBuildStatus:
            Création de "cmTC_903a4.dir\Debug\cmTC_903a4.tlog\unsuccessfulbuild", car "AlwaysCreate" a été spécifié.
          ClCompile:
            C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\CL.exe /c /Zi /W1 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D "CMAKE_INTDIR=\"Debug\"" /Gm- /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"cmTC_903a4.dir\Debug\\" /Fd"cmTC_903a4.dir\Debug\vc142.pdb" /external:W1 /Gd /TC /errorReport:queue "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\testCCompiler.c"
            Compilateur d'optimisation Microsoft (R) C/C++ version 19.29.30148 pour x64
            testCCompiler.c
            Copyright (C) Microsoft Corporation. Tous droits réservés.
            cl /c /Zi /W1 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D "CMAKE_INTDIR=\"Debug\"" /Gm- /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"cmTC_903a4.dir\Debug\\" /Fd"cmTC_903a4.dir\Debug\vc142.pdb" /external:W1 /Gd /TC /errorReport:queue "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\testCCompiler.c"
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003: Impossible d'exécuter la tâche exécutable spécifiée "CL.exe". System.IO.DirectoryNotFoundException: Impossible de trouver une partie du chemin d'accès 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.dir\Debug\cmTC_903a4.tlog'. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.FileSystemEnumerableIterator`1.CommonInit() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.Directory.GetFiles(String path, String searchPattern) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.TrackedDependencies.ExpandWildcards(ITaskItem[] expand) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.CanonicalTrackedOutputFiles.InternalConstruct(ITask ownerTask, ITaskItem[] tlogFiles, Boolean constructOutputsFromTLogs) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.CanonicalTrackedOutputFiles..ctor(ITaskItem[] tlogFiles) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.CPPTasks.CL.PostExecuteTool(Int32 exitCode) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.CPPTasks.TrackedVCToolTask.ExecuteTool(String pathToTool, String responseFileCommands, String commandLineCommands) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.ToolTask.Execute() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          Génération du projet "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" terminée (cibles par défaut) -- ÉCHEC.

          ÉCHEC de la build.

          "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" (cible par dรƒยฉfaut) (1) ->
          (PrepareForBuild cible) ->
            C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(517,5): warning MSB8029: Le rรƒยฉpertoire intermรƒยฉdiaire ou le rรƒยฉpertoire de sortie ne peut pas se trouver sous le rรƒยฉpertoire temporaire car cela risque de crรƒยฉer des problรƒยจmes avec la gรƒยฉnรƒยฉration incrรƒยฉmentielle. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]


          "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" (cible par dรƒยฉfaut) (1) ->
          (ClCompile cible) ->
            C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003: Impossible d'exรƒยฉcuter la tรƒยขche exรƒยฉcutable spรƒยฉcifiรƒยฉe "CL.exe". System.IO.DirectoryNotFoundException: Impossible de trouver une partie du chemin d'accรƒยจs 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.dir\Debug\cmTC_903a4.tlog'. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  System.IO.FileSystemEnumerableIterator`1.CommonInit() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  System.IO.Directory.GetFiles(String path, String searchPattern) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  Microsoft.Build.Utilities.TrackedDependencies.ExpandWildcards(ITaskItem[] expand) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  Microsoft.Build.Utilities.CanonicalTrackedOutputFiles.InternalConstruct(ITask ownerTask, ITaskItem[] tlogFiles, Boolean constructOutputsFromTLogs) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  Microsoft.Build.Utilities.CanonicalTrackedOutputFiles..ctor(ITaskItem[] tlogFiles) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  Microsoft.Build.CPPTasks.CL.PostExecuteTool(Int32 exitCode) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  Microsoft.Build.CPPTasks.TrackedVCToolTask.ExecuteTool(String pathToTool, String responseFileCommands, String commandLineCommands) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    รƒย  Microsoft.Build.Utilities.ToolTask.Execute() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]

              1 Avertissement(s)
              1 Erreur(s)

          Temps écoulé 00:00:00.82





        CMake will not be able to correctly generate this project.
      Call Stack (most recent call first):
        CMakeLists.txt:3 (project)


      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\setuptools_wrap.py", line 634, in setup
          env = cmkr.configure(
        File "C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\cmaker.py", line 332, in configure
          raise SKBuildError(

      An error occurred while configuring with CMake.
        Command:
          'C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\cmake\data\bin/cmake.exe' 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea' -G 'Visual Studio 16 2019' '-DCMAKE_INSTALL_PREFIX:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-install' -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE '-DCMAKE_MODULE_PATH:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\resources\cmake' '-DPYTHON_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPYTHON_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' '-DPYTHON_LIBRARY:PATH=D:\Anaconda\envs\textgen\libs\python310.lib' '-DPython_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython_FIND_REGISTRY:STRING=NEVER '-DPython3_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython3_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython3_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython3_FIND_REGISTRY:STRING=NEVER -T v142 -A x64 -DCMAKE_BUILD_TYPE:STRING=Release
        Source directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea
        Working directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build
      Please see CMake's output for more information.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

Standalone Server

Since the server is one of the goals / highlights of this project, I'm planning to move it into a subpackage, e.g. llama-cpp-python[server] or something like that.

Work that needs to be done first:

  • Ensure compatibility with OpenAI
    • Response objects match
    • Request objects match
    • Loaded model appears under /v1/models endpoint
    • Test OpenAI client libraries
    • Unsupported parameters should be silently ignored
  • Ease-of-use
    • Integrate server as a subpackage
    • CLI tool to run the server

Future work

  • Prompt caching to improve latency
  • Support multiple models in the same server
  • Add tokenization endpoints to make it easier for small clients to calculate context window sizes
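Once the server subpackage is in place, a quick way to sanity-check the OpenAI-compatibility goals above would be something like the following (a minimal sketch only; it assumes the server is running locally on port 8000 and that /v1/models mirrors OpenAI's response shape):

import requests

# List the loaded model(s); with OpenAI-compatible responses this should return
# an object of the form {"object": "list", "data": [{"id": "<model>", ...}, ...]}.
models = requests.get("http://localhost:8000/v1/models").json()
print([m["id"] for m in models.get("data", [])])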

Incredibly slow response time

Hello.
I am still new to llama-cpp and I was wondering if it was normal that it takes an incredibly long time to respond to my prompt.

FYI, I am assuming it runs on my CPU; here are my specs:

  • I have 16.0 GB of RAM
  • I am using an AMD Ryzen 7 1700X eight-core processor rated at 3.40 GHz
  • Just in case, my GPU is an NVIDIA GeForce RTX 2070 SUPER.

Everything else seems to work fine; the model loads correctly (or at least, it seems to).
I did a first test using the code showcased in the README.md

from llama_cpp import Llama
llm = Llama(model_path="models/7B/...")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

which returned me this:

[screenshot of the output]
The output is what I expected (even though Uranus, Neptune and Pluto were missing), but the total time is extremely long: 1124707.08 ms, about 18 minutes.

I wrote this second script to try to narrow down what could be causing the insanely long response time, but I don't know what's going on.

from llama_cpp import Llama
import time
print("Model loading")
llm = Llama(model_path="./model/ggml-model-q4_0_new.bin")

while True:
    prompt = input("Prompt> ")
    start_time = time.time()

    prompt = f"Q: {prompt} A: "
    print("Your prompt:", prompt, "Start time:", start_time)

    output = llm(prompt, max_tokens=1, stop=["Q:", "\n"], echo=True)
    print("Output:", output)
    print("End time:", time.time())
    print("--- Prompt reply duration: %s seconds ---" % (time.time() - start_time))

I may have done things wrong since I am still new to all of this, but do any of you have any idea how I could speed up the process? I have searched for solutions on Google, GitHub and various forums, but nothing seems to work.

PS: For those interested in the CLI output when it loads the model:

llama_model_load: loading model from './model/ggml-model-q4_0_new.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from './model/ggml-model-q4_0_new.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  512.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

I apologize in advance if my English doesn't always make sense; it is not my native language.
Thanks in advance for the help, regards. 👋

Investigate model aliasing

Allow the user to alias their local models to OpenAI model names as many tools have those hard-coded.

This may cause unexpected issues with tokenization mismatches.

Error running on M1 Mac

Hi!

I am having issues with using it on a M1 Mac:

from llama_cpp import Llama
produces this error:

zsh: illegal hardware instruction

Best,
Benjamin

On M1 Mac, after install, running the server and navigating to the local URL gives the error {"detail":"Not Found"}

Everything worked without errors but the web page says:
{"detail":"Not Found"}

Console shows the following:
INFO: Started server process [4194]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO: ::1:58413 - "GET / HTTP/1.1" 404 Not Found

The only thing I didn't understand about the install process was what was supposed to go in the brackets for [server] in the command pip install "llama-cpp-python[server]". Could that have caused a problem?

llama.cpp works perfectly standalone BTW.

Interactive mode/Session

Hey there, sorry if this is obviously possible/easy, but is it possible right now to use llama.cpp's interactive mode, and if so, how?

Possible Typo in fastapi_server.py: n_ctx vs n_batch

When running the server in fastapi_server.py I noticed a possible typo in the configuration of the llama_cpp.Llama instance.

Here is the relevant code:

llama = llama_cpp.Llama(
    settings.model,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_batch=2048,  # <-- should be n_ctx=2048
)

It appears that n_batch is set to 2048, but I believe it might be intended to set n_ctx to 2048 instead. When I tried to run the code as is, I encountered an exception due to the assert for ctx being None during generation. Changing n_batch to n_ctx resolved the issue.
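For reference, the corrected instantiation would presumably look like this (keeping the reporter's other settings):

llama = llama_cpp.Llama(
    settings.model,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_ctx=2048,  # context window size, instead of n_batch
)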

Also, default batch size is 8, so 2048 seems a bit high :)

Awesome job!!!

https://github.com/abetlen/llama-cpp-python/blob/b9a4513363267dcc1f4b77d709ac3333fc889c6e/examples/fastapi_server.py#LL36C5-L36C5

[Question] Drop in replacement for OpenAI

I noticed that you mentioned your goal of creating a drop-in replacement for OpenAI. Awesome job! This is super helpful to have, especially with your demo using FastAPI.

I'm looking at langchain right now, and I see you have implemented most, if not all, of the OpenAI API, including streaming. It got official integration with langchain today, and I'm getting ready to get the integration working with streaming as literally a drop-in for OpenAI in langchain. Do you already have this done? Just trying to see what your goals are in the near future for this package :)

Kudos on a great job! Need a little help with BLAS

Let me first congratulate everyone working on this for:

  1. Python bindings for llama.cpp
  2. Making them compatible with OpenAI's API
  3. Superb documentation!

I was wondering if anyone can help me get this working with BLAS? Right now when the model loads, I see BLAS=0.
I've been using kobold.cpp, and they have a BLAS flag at compile time which enables BLAS. It cuts down the prompt loading time by 3-4x, which is a major factor in handling longer prompts and chat-style messages.

P.S. I was also wondering: what is the difference between create_embedding(input) and embed(input)?

main.py not running on M1 Mac due to llama_context_default_params symbol not found

Things were working fine until I closed my terminal window, opened a new one, and started seeing issues (I don't remember the error). I went ahead and did a quick update via the "development" steps in the README and started getting this issue when running python3 -m llama_cpp.server

Traceback (most recent call last):
  File "/Users/kelden/opt/anaconda3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/kelden/opt/anaconda3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/kelden/Documents/tiny-leaps/llama-cpp-python/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "/Users/kelden/Documents/tiny-leaps/llama-cpp-python/llama_cpp/llama_cpp.py", line 99, in <module>
    _lib.llama_context_default_params.argtypes = []
  File "/Users/kelden/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
    func = self.__getitem__(name)
  File "/Users/kelden/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x308a36490, llama_context_default_params): symbol not found

I've gone in and run make in llama.cpp again, run the develop script again and again to no avail, deleted the .so file and rebuilt it multiple times, and made sure the MODEL variable is set properly too :/ What am I doing wrong?

Interactive mode with Llama class

  • Similar to interactive mode in llama.cpp.
  • Changes should not affect __call__ and the create_* method behaviour.
  • Should support max_tokens / infinite generation, eos / ignore_eos, and a reverse prompt.
  • Should support streaming
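As a rough illustration only (not the proposed implementation), an interactive session can already be approximated on top of the existing high-level API by streaming completions, stopping on a reverse prompt, and carrying the transcript forward between turns; the model path below is a placeholder:

from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")  # placeholder path
reverse_prompt = "User:"
transcript = "A chat between a curious user and a helpful assistant.\n"

while True:
    user_input = input("User: ")
    transcript += f"User: {user_input}\nAssistant:"
    # stream=True yields completion chunks as they are generated
    for chunk in llm(transcript, max_tokens=256, stop=[reverse_prompt], stream=True):
        piece = chunk["choices"][0]["text"]
        print(piece, end="", flush=True)
        transcript += piece
    print()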

AttributeError occurred

Can you help me, please?

LangChain 0.0.136, llama-cpp-python 0.1.31
An AttributeError occurred at this code.

It didn't happen yesterday.

Code:
from langchain.llms import LlamaCpp
LlmLlama = LlamaCpp(model_path="./ggml-vicuna-13b-4bit.bin")

Error:
AttributeError: 'Llama' object has no attribute 'ctx'

Shared library with base name 'llama' not found, Windows

Can someone advise me on this issue on Windows?

    from .llama_cpp import *
  File "C:\Users\usr1\Anaconda3\envs\chatgpt1\lib\site-packages\llama_cpp\llama_cpp.py", line 46, in
    _lib = _load_shared_library(_lib_base_name)
  File "C:\Users\moham\Anaconda3\envs\chatgpt1\lib\site-packages\llama_cpp\llama_cpp.py", line 40, in _load_shared_library
    raise FileNotFoundError(f"Shared library with base name '{lib_base_name}' not found")
FileNotFoundError: Shared library with base name 'llama' not found

Add performance optimization example

I've had some success using scikit-optimize to tune the parameters for the Llama class; it can improve token eval performance by around 50% over the default parameters. I'm planning to turn this into a script, and it could also be of some use for upstream llama.cpp users.

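For anyone curious, the general approach looks roughly like this (a sketch only, with illustrative parameter ranges and a placeholder model path, not the actual tuning script):

import time
from llama_cpp import Llama
from skopt import gp_minimize
from skopt.space import Integer

# Search space: thread count and batch size (illustrative ranges).
space = [Integer(1, 16, name="n_threads"), Integer(8, 512, name="n_batch")]

def objective(params):
    n_threads, n_batch = params
    llm = Llama(
        model_path="./models/7B/ggml-model-q4_0.bin",  # placeholder path
        n_threads=int(n_threads),
        n_batch=int(n_batch),
    )
    start = time.time()
    out = llm("Q: Name the planets in the solar system? A: ", max_tokens=64)
    tokens = out["usage"]["completion_tokens"]
    return -(tokens / (time.time() - start))  # minimize negative tokens/sec

result = gp_minimize(objective, space, n_calls=15, random_state=0)
print("best [n_threads, n_batch]:", result.x)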

Text generation stops prematurely

Hi!

I have stumbled upon a problem with low_level_api_chat_cpp.py. When asking for generation of longer texts (for example "tell a story about a cat"), the generated text is cut off prematurely after only a couple of sentences, in the middle of a sentence. Using the same prompt in llama.cpp there is no such problem; the text is generated in its entirety. I am using this command for llama.cpp:

./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct --mlock -m models/ggml-vicuna-13b-4bit.bin

and this one in llama-cpp-python:

python3 low_level_api_chat_cpp.py --mlock --color --interactive-first --interactive-start -r "### Human:" -ins -c 2048 -i --repeat_penalty 1.2 --temp 0 --n_parts -1 --ignore-eos -m ../../llama.cpp/models/ggml-vicuna-13b-4bit.bin

Is this a bug?

Issue with emoji decoding in streaming mode only

When the model wants to output an emoji, this error comes up:

Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "C:\Users\zblac\AppData\Local\Programs\Python\Python310\lib\site-packages\werkzeug\wsgi.py", line 500, in __next__
    return self._next()
  File "C:\Users\zblac\AppData\Local\Programs\Python\Python310\lib\site-packages\werkzeug\wrappers\response.py", line 50, in _iter_encoded
    for item in iterable:
  File "C:\Users\zblac\llama.cpp\test\normal.py", line 37, in vicuna
    for line in response:
  File "C:\Users\zblac\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 370, in _create_completion
    "text": text[start:].decode("utf-8"),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 0: unexpected end of data

Add Verbose Logging Support to Diagnose Performance Issues

Sorry, this might be the totally wrong place to open the issue. Feel free to close.

Anyway, I'm working with a 3rd-party project* that uses your awesome wrapper, and I'm having problems there, which brings me back here. Everything seems to be working, but not with the speed I expect after using plain llama.cpp. With some prompts it seems to even freeze completely, never completing the task. Could I somehow raise this wrapper's logging level to make it more verbose, so I could watch in real time what it is doing?

* https://github.com/hwchase17/langchain

Support pickling the `Llama` instance

As pointed out here, the Llama class cannot currently be pickled because it holds pointers to C memory addresses. To implement this we'll need to write custom __getstate__ and/or __reduce__ methods for pickling, as well as a __setstate__ method for unpickling.
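A minimal sketch of the idea (not the eventual implementation): pickle only the plain-Python constructor arguments and rebuild the underlying llama.cpp context on unpickling.

from llama_cpp import Llama

class PicklableLlama(Llama):
    def __getstate__(self):
        # Persist only construction parameters, never the C pointers themselves.
        # (Assumes model_path is kept on the instance; other kwargs would be
        # captured the same way.)
        return {"model_path": self.model_path}

    def __setstate__(self, state):
        # Re-run __init__ to recreate the llama.cpp context from the saved arguments.
        self.__init__(model_path=state["model_path"])

A __reduce__ implementation would work similarly, returning the class together with the saved constructor arguments.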

References

[Question] Purpose of completion ID field

I have a question about the id field in the data returned from the completions endpoint. I see that there's a unique ID that identifies what completion a message is part of, and I'm wondering if this is only data for the client, or whether it has additional functionality.

Eventually I'm hoping to have a couple of different models running on my server, and I'm trying to figure out if there's a mechanism that exists for a sort of chat functionality with unique contexts. llama.cpp recently gained the ability to run multiple instances at once without much overhead, so I'm looking for a way to keep a unique context between a couple of conversation 'threads'.

Is there any mechanism, or is there a plan for one? Just want to make sure I'm not missing something if it's built already xD

{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "models/7B/...",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
...
}

make: *** No rule to make target `libllama.so'. Stop.

Configuring Project
        Working directory:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-build
        Command:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/cmake/data/bin/cmake /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006 -G Ninja -DCMAKE_INSTALL_PREFIX:PATH=/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-install -DPYTHON_VERSION_STRING:STRING=3.9.6 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/Users/emmanuel/workspace/code/.venv/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -DPYTHON_LIBRARY:PATH=libpython3.9.a -DPython_EXECUTABLE:PATH=/Users/emmanuel/workspace/code/.venv/bin/python3 -DPython_ROOT_DIR:PATH=/Users/emmanuel/workspace/code/.venv -DPython_INCLUDE_DIR:PATH=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -DPython_FIND_REGISTRY:STRING=NEVER -DPython3_EXECUTABLE:PATH=/Users/emmanuel/workspace/code/.venv/bin/python3 -DPython3_ROOT_DIR:PATH=/Users/emmanuel/workspace/code/.venv -DPython3_INCLUDE_DIR:PATH=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -DPython3_FIND_REGISTRY:STRING=NEVER -DCMAKE_MAKE_PROGRAM:FILEPATH=/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/ninja/data/bin/ninja -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_OSX_DEPLOYMENT_TARGET:STRING=13.0 -DCMAKE_OSX_ARCHITECTURES:STRING=arm64
      
      -- The C compiler identification is AppleClang 14.0.0.14000029
      -- The CXX compiler identification is AppleClang 14.0.0.14000029
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Configuring done (0.3s)
      -- Generating done (0.0s)
      CMake Warning:
        Manually-specified variables were not used by the project:
      
          PYTHON_EXECUTABLE
          PYTHON_INCLUDE_DIR
          PYTHON_LIBRARY
          PYTHON_VERSION_STRING
          Python3_EXECUTABLE
          Python3_FIND_REGISTRY
          Python3_INCLUDE_DIR
          Python3_ROOT_DIR
          Python_EXECUTABLE
          Python_FIND_REGISTRY
          Python_INCLUDE_DIR
          Python_ROOT_DIR
          SKBUILD
      
      
      -- Build files have been written to: /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-build
      [1/2] Generating /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/vendor/llama.cpp/libllama.so
      FAILED: /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/vendor/llama.cpp/libllama.so
      cd /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/vendor/llama.cpp && make libllama.so
      make: *** No rule to make target `libllama.so'.  Stop.
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/setuptools_wrap.py", line 642, in setup
          cmkr.make(make_args, install_target=cmake_install_target, env=env)
        File "/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/cmaker.py", line 679, in make
          self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
        File "/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/cmaker.py", line 710, in make_impl
          raise SKBuildError(
      
      An error occurred while building with CMake.
        Command:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
        Install target:
          install
        Source directory:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006
        Working directory:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-build
      Please check the install target is valid and see CMake's output for more information.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python

It can't seem to find the cpp file.

Any suggestions?

Implement caching for evaluated prompts

The goal of this feature is to reduce latency for repeated calls to the chat_completion API by saving the kv_cache keyed by the prompt tokens.

The basic version of this is to simply save the kv_state after the prompt has been evaluated.

Additionally, we should investigate whether it's possible to save and restore the kv_state after the completion has been generated as well.
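Roughly, the idea could be sketched like this (the save_kv_state / load_kv_state helpers are placeholders for whatever llama.cpp state serialization ends up being used; this is not the package's API):

class PromptKVCache:
    """Maps evaluated prompt tokens to an opaque saved kv_state blob."""

    def __init__(self):
        self._states = {}

    def get(self, prompt_tokens):
        return self._states.get(tuple(prompt_tokens))

    def put(self, prompt_tokens, kv_state):
        self._states[tuple(prompt_tokens)] = kv_state

# Intended usage inside completion (pseudocode-level):
#
#   state = cache.get(prompt_tokens)
#   if state is not None:
#       llm.load_kv_state(state)      # hit: skip re-evaluating the prompt
#   else:
#       llm.eval(prompt_tokens)       # miss: evaluate, then remember the state
#       cache.put(prompt_tokens, llm.save_kv_state())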

simple python api example?

I have the server running and everything, but I really fail to understand the documentation at http://localhost:8000/docs.
Is there a simple code example of how I would interact with this from Python (Flask)?

For example, my code for querying OpenAI (for which this should be a "drop-in" replacement) is the following; what would be the equivalent when using llama-cpp-python?

  def get_text_gpt(prompt_persona, prompt, temp=0.8, freqpen=0.0, stop=None, biasdict=None, maxtok=512):
      # make sure mutable default arguments are reset
      biasdict = {} if biasdict is None else biasdict
      stop = "" if stop is None else stop
  
      try:
          response = openai.ChatCompletion.create(
              model="gpt-3.5-turbo",
              temperature=temp,
              max_tokens=maxtok,
              frequency_penalty=freqpen,
              stop=stop,
              logit_bias=biasdict,
              messages=[
                  {"role": "system", "content": prompt_persona},
                  {"role": "user", "content": prompt}]
          )
          message_content = response['choices'][0]['message']['content']
          return (message_content)
  
      except Exception as e:
          error = f"GPT API error: {e}"
          return error
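One way to do this (a sketch, assuming the llama_cpp.server is running at http://localhost:8000 and that its endpoints mirror OpenAI's, as intended) is to keep using the openai client library and simply point it at the local server; the model name below is a placeholder, since the single-model local server does not use it to pick a model:

import openai

openai.api_key = "sk-not-needed"              # placeholder; the local server does not validate keys
openai.api_base = "http://localhost:8000/v1"  # point the openai client at llama-cpp-python

response = openai.ChatCompletion.create(
    model="local-model",                      # placeholder model name
    temperature=0.8,
    max_tokens=512,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the planets in the solar system."},
    ],
)
print(response["choices"][0]["message"]["content"])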

Installation on Windows failed because Visual Studio is not installed

Trying to install with

pip install llama-cpp-python==0.1.23

on Windows in a micromamba environment resulted in the following error. It seems like the package is looking for Visual Studio, which is not installed on my system.

Is it possible to make it such that the package can be installed without the need for Visual Studio?

(C:\Users\me\Downloads\oobabooga-windows\oobabooga-windows\installer_files\env) C:\Users\me\Downloads\oobabooga-windows\oobabooga-windows\text-generation-webui>pip install llama-cpp-python==0.1.23
Collecting llama-cpp-python==0.1.23
  Downloading llama_cpp_python-0.1.23.tar.gz (530 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 530.0/530.0 kB 504.2 kB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in c:\users\me\downloads\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages (from llama-cpp-python==0.1.23) (4.5.0)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [308 lines of output]


      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 17 2022

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 16 2019

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 15 2017 x64 v141)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 15 2017 x64 v141)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 15 2017 x64 v141' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 15 2017

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 15 2017 x64 v141' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 17 2022 x64 v143)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Running

         'nmake' '-?'

        failed with:

         The system cannot find the file specified


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 17 2022 x64 v143)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 16 2019 x64 v142)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Running

         'nmake' '-?'

        failed with:

         The system cannot find the file specified


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 16 2019 x64 v142)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 15 2017 x64 v141)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Running

         'nmake' '-?'

        failed with:

         The system cannot find the file specified


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 15 2017 x64 v141)' generator - failure
      --------------------------------------------------------------------------------

      ********************************************************************************
      scikit-build could not get a working generator for your system. Aborting build.

      Building windows wheels for Python 3.10 requires Microsoft Visual Studio 2022.
      Get it with "Visual Studio 2017":

        https://visualstudio.microsoft.com/vs/

      Or with "Visual Studio 2019":

          https://visualstudio.microsoft.com/vs/

      Or with "Visual Studio 2022":

          https://visualstudio.microsoft.com/vs/

      ********************************************************************************
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

Wrong size of embeddings

While playing around, I noticed the embeddings are only 512 floats rather than the 4096 you get when using the standalone application.

So I went digging and I found the culprit, which was a copy-paste leftover in the function llama_n_embd:

def llama_n_embd(ctx: llama_context_p) -> c_int:
    return _lib.llama_n_ctx(ctx)

It's calling llama_n_ctx rather than llama_n_embd.
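For clarity, the corrected binding would simply be:

def llama_n_embd(ctx: llama_context_p) -> c_int:
    return _lib.llama_n_embd(ctx)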

I don't think this warrants a pull request as it is a very easy issue to fix, so I made a simple issue instead.

Keep up the good work :)

Fix unicode decoding error

Certain tokens in the vocabulary cannot be decoded to valid UTF-8. I'm actually not sure if this is because they represent partial UTF-8 code points, but in any case they cause generation to fail.
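One possible approach (a sketch only, not necessarily the fix that will land): buffer the raw token bytes and use Python's incremental UTF-8 decoder, which only emits text once a complete code point has arrived, instead of decoding each token's bytes independently.

import codecs

decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

def stream_text(token_bytes_iter):
    for token_bytes in token_bytes_iter:
        # Returns "" while the bytes seen so far form only a partial code point.
        text = decoder.decode(token_bytes)
        if text:
            yield text

# Example: an emoji split across two "tokens" decodes cleanly once complete.
chunks = [b"\xf0\x9f", b"\xa6\x99"]   # UTF-8 bytes of the llama emoji, split in two
print("".join(stream_text(chunks)))   # -> 🦙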
