
This project is a fork of keldenl/gpt-llama.cpp.


A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.


gpt-llama.cpp


Replace OpenAI's GPT APIs with llama.cpp-supported models running locally

Demo

Demo GIF: Real-time interactive-mode demo of gpt-llama.cpp's API + chatbot-ui (a GPT-powered app) running on an M1 Mac with a local Vicuna-7B model. See all demos here.

🔥 Hot Topics (4/20/2023)

  • 🔥🔥 WE MADE A DISCORD CHANNEL, JOIN HERE: https://discord.gg/aWHBQnJaFC 🔥🔥
  • Auto-GPT support: basic support is complete; optimization work continues.
  • BabyAGI/TeenageAGI support
  • Discord bot!

Description

gpt-llama.cpp is an API wrapper around llama.cpp. It runs a local API server that simulates OpenAI's GPT API endpoints but uses local llama-based models to process requests.

It is designed to be a drop-in replacement for GPT-based applications, meaning that any apps created for use with GPT-3.5 or GPT-4 can work with llama.cpp instead.

The purpose is to enable GPT-powered apps without relying on OpenAI's GPT endpoints: using local models instead decreases cost (it's free) and ensures privacy (everything stays local).
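
To make the idea concrete, here is a minimal sketch of a raw request against the local server (assuming Node 18+ for its built-in fetch and the default port 443; as described under Usage below, the absolute path to your local model file is passed where the OpenAI API key would normally go):

// Standard OpenAI chat-completion request, pointed at gpt-llama.cpp
fetch("http://localhost:443/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // the "API key" slot carries the absolute path to your local model .bin
    Authorization: "Bearer /Users/<YOUR_USERNAME>/Documents/llama.cpp/models/7B/ggml-model-q4_0.bin",
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "the sky is" }],
  }),
})
  .then((res) => res.json())
  .then((data) => console.log(data.choices[0].message.content));

If this returns a completion, any app that speaks the OpenAI chat format can be pointed at the same endpoint.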

Tested platforms

  • macOS (ARM)
  • macOS (Intel)
  • Windows
  • Linux (port 443 is blocked by default; you may have to change the port to e.g. 8000 to get it working)

Features

gpt-llama.cpp provides the following features:

  • Drop-in replacement for GPT-based applications
  • Interactive mode support, meaning requests within the same chat context get blazing-fast responses
  • Automatic adoption of new improvements from llama.cpp
  • Usage of local models for GPT-powered apps
  • Support for multiple platforms

Supported applications

The following applications (list growing) have been tested and confirmed to work with gpt-llama.cpp:

More applications are currently being tested; requests for verification or fixes are welcome via a new issue in the repo.

See all demos here.

Quickstart Installation

Prerequisite

🔴🔴 ⚠️ DO NOT SKIP THIS STEP ⚠️ 🔴🔴

Set up llama.cpp by following the instructions below, which are based on the llama.cpp README. You may skip this step if you already have llama.cpp set up.

Mac

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# install Python dependencies
python3 -m pip install -r requirements.txt

Windows

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
  • Then, download the latest release of llama.cpp here. There is no simple way to tell whether you should download the avx, avx2, or avx512 build, but roughly, avx targets the oldest chips and avx512 the newest, so pick the one you think matches your machine. (Let's try to automate this step in the future.)
  • Extract the contents of the zip file and copy everything in the folder (which should include main.exe) into the llama.cpp folder you just cloned. Then, back on the command line:
# install Python dependencies
python3 -m pip install -r requirements.txt

Test llama.cpp

Confirm that llama.cpp works by running an example. Replace <YOUR_MODEL_BIN> with the filename of your llama model, typically something like ggml-model-q4_0.bin.

# Mac
./main -m models/7B/<YOUR_MODEL_BIN> -p "the sky is"

# Windows
main -m models/7B/<YOUR_MODEL_BIN> -p "the sky is"

It'll start spitting random BS, but you're golden if it's responding. You may now move on to one of the two methods below to get up and running.

Running gpt-llama.cpp

NPM Package

# run without installing
npx gpt-llama.cpp start

# alternatively, you can install it globally
npm i gpt-llama.cpp -g
gpt-llama.cpp start

That's it!

Run Locally

  1. Clone the repository:

    git clone https://github.com/keldenl/gpt-llama.cpp.git
    cd gpt-llama.cpp
    • Recommended folder structure
         documents
         ├── llama.cpp
         │   ├── models
         │   │   └── <YOUR_.BIN_MODEL_FILES_HERE>
         │   └── main
         └── gpt-llama.cpp
  2. Install the required dependencies:

    npm install
  3. Start the server!

    # Basic usage
    npm start 
    
    # To run on a different port
    # Mac
    PORT=8000 npm start
    
    # Windows cmd
    set PORT=8000
    npm start

Usage

  1. To set up the GPT-powered app, there are 2 ways:

    • To use it with a documented GPT-powered application, follow the supported applications directions.
    • To use it with an undocumented GPT-powered application, do the following (see the client-configuration sketch after this list):
      • Update the openai_api_key slot in the GPT-powered app to the absolute path of your local llama-based model (e.g. for Mac, "/Users/<YOUR_USERNAME>/Documents/llama.cpp/models/vicuna/7B/ggml-vicuna-7b-4bit-rev1.bin").
      • Change the BASE_URL for the OpenAI endpoint the app is calling to localhost:443 or localhost:443/v1. This is sometimes provided in a .env file; otherwise it requires manually updating the app's OpenAI calls, depending on the specific application.
  2. Open another terminal window and test the installation by running the test-installation script below (make sure you have a llama .bin model file ready):

    # Mac
    sh ./test-installation.sh
  3. (Optional) Access the Swagger API docs at http://localhost:443/docs to test requests using the provided interface. Note that the authentication token needs to be set to the path of your local llama-based model (e.g. for Mac, "/Users/<YOUR_USERNAME>/Documents/llama.cpp/models/vicuna/7B/ggml-vicuna-7b-4bit-rev1.bin") for requests to work properly.
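
For undocumented apps (or to call the server from your own code), the client-side setup amounts to repointing the OpenAI SDK at the local server. A minimal sketch using the openai npm package (this assumes the v3.x SDK, whose Configuration/basePath options were renamed in v4; the model path is an example placeholder):

const { Configuration, OpenAIApi } = require("openai");

const configuration = new Configuration({
  // gpt-llama.cpp reads the model path from the "API key" slot
  apiKey: "/Users/<YOUR_USERNAME>/Documents/llama.cpp/models/vicuna/7B/ggml-vicuna-7b-4bit-rev1.bin",
  // point the SDK at the local server instead of api.openai.com
  basePath: "http://localhost:443/v1",
});
const openai = new OpenAIApi(configuration);

openai
  .createChatCompletion({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Why is the sky blue?" }],
  })
  .then((res) => console.log(res.data.choices[0].message.content));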

API Documentation

Obtaining and verifying the Facebook LLaMA original model and Stanford Alpaca model data

  • Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.

  • The LLaMA models are officially distributed by Facebook and will never be provided through this repository.

Contributing

You can contribute to gpt-llama.cpp by creating branches and opening pull requests to merge. Please follow the standard open-source contribution process.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contributors

keldenl, afbenevides, adampaigge
