rezanonestop / llm

This project forked from rustformers/llm

An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/

License: Apache License 2.0

Rust 99.70% Nix 0.16% Dockerfile 0.13%

llm's Introduction

llm - Large Language Models for Everyone, in Rust

llm is an ecosystem of Rust libraries for working with large language models - it's built on top of the fast, efficient GGML library for machine learning.

A llama riding a crab, AI-generated

Image by @darthdeus, using Stable Diffusion

The primary entry point for developers is the llm crate, which wraps llm-base and the supported model crates. Documentation for the released version is available on Docs.rs.

For end-users, there is a CLI application, llm-cli, which provides a convenient interface for interacting with supported models. Text generation can be done as a one-off based on a prompt, or interactively, through REPL or chat modes. The CLI can also be used to serialize (print) decoded models, quantize GGML files, or compute the perplexity of a model. It can be downloaded from the latest GitHub release or by installing it from crates.io.

llm is powered by the ggml tensor library, and aims to bring the robustness and ease of use of Rust to the world of large language models. At present, inference is only on the CPU, but we hope to support GPU inference in the future through alternate backends.

Currently, the following models are supported:

See getting models for more information on how to download supported models.

Using llm in a Rust Project

This project depends on Rust v1.65.0 or above and a modern C toolchain.

The llm crate exports llm-base and the model crates (e.g. bloom, gpt2, llama).

Add llm to your project by listing it as a dependency in Cargo.toml. To use the version of llm you see in the main branch of this repository, add it from GitHub (although keep in mind this is pre-release software):

[dependencies]
llm = { git = "https://github.com/rustformers/llm", branch = "main" }

To use a released version, add it from crates.io by specifying the desired version:

[dependencies]
llm = "0.1"

By default, llm builds with support for remotely fetching the tokenizer from Hugging Face's model hub. To disable this, disable the default features for the crate, and turn on the models feature to get llm without the tokenizer:

[dependencies]
llm = { version = "0.1", default-features = false, features = ["models"] }

NOTE: To improve debug performance, exclude the transitive ggml-sys dependency from being built in debug mode:

[profile.dev.package.ggml-sys]
opt-level = 3

Leverage Accelerators with llm

The llm library is designed to take advantage of hardware accelerators such as CUDA and Metal for better performance.

To enable llm to harness these accelerators, some preliminary configuration steps are necessary, which vary based on your operating system. For comprehensive guidance, please refer to Acceleration Support in our documentation.

Using llm from Other Languages

Bindings for this library are available in the following languages:

Using the llm CLI

The easiest way to get started with llm-cli is to download a pre-built executable from a released version of llm, but the releases are currently out of date and we recommend you install from source instead.

Installing from Source

To install the main branch of llm with the most recent features to your Cargo bin directory, which rustup is likely to have added to your PATH, run:

cargo install --git https://github.com/rustformers/llm llm-cli

The CLI application can then be run through llm. See also features and acceleration support to turn features on as required. Note that GPU support (CUDA, OpenCL, Metal) will not work unless you build with the relevant feature.

Installing with cargo

Note that the currently published version is out of date and does not include support for the most recent models. We currently recommend that you install from source.

To install the most recently released version of llm to your Cargo bin directory, which rustup is likely to have added to your PATH, run:

cargo install llm-cli

The CLI application can then be run through llm. See also features to turn features on as required.

Features

By default, llm builds with support for remotely fetching the tokenizer from Hugging Face's model hub. This adds a dependency on your system's native SSL stack, which may not be available on all systems.

To disable this, disable the default features for the build:

cargo build --release --no-default-features

To enable hardware acceleration, see the Acceleration Support for Building section, which also applies to the CLI.

Getting Models

GGML models are easy to acquire. They are primarily located on Hugging Face (see From Hugging Face), but can be obtained from elsewhere.

Models are distributed as single files, and do not need any additional files to be downloaded. However, they are quantized with different levels of precision, so you will need to choose a quantization level that is appropriate for your application.
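To get a rough feel for this trade-off, the sketch below estimates how much space a given number of weights takes in a few formats. The block layout is an assumption, not something stated in this README: 32 weights per block, 18 bytes per q4_0 block and 20 bytes per q4_1 block, which matches a common GGML layout but differs in older file versions. Real model files also carry metadata and some non-quantized tensors, so treat the results as ballpark figures only. The names Format and approx_bytes are hypothetical.

```rust
/// Storage formats considered in this sketch (hypothetical type, not part of llm).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format {
    F16,
    Q4_0,
    Q4_1,
}

/// Assumed number of weights that share one quantization block.
const BLOCK_WEIGHTS: u64 = 32;

/// Rough size in bytes of `n_weights` weights stored in `format`.
///
/// Assumed block sizes: q4_0 = 18 bytes, q4_1 = 20 bytes per 32 weights.
pub fn approx_bytes(n_weights: u64, format: Format) -> u64 {
    // Round up so a partially filled final block still counts as a whole block.
    let blocks = (n_weights + BLOCK_WEIGHTS - 1) / BLOCK_WEIGHTS;
    match format {
        Format::F16 => n_weights * 2,
        Format::Q4_0 => blocks * 18,
        Format::Q4_1 => blocks * 20,
    }
}
```

Under these assumptions, 3 billion weights take about 6 GB at f16 and about 1.7 GB at q4_0, which is the kind of difference that decides whether a model fits in memory.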

Additionally, we support Hugging Face tokenizers to improve the quality of tokenization. These are separate files (tokenizer.json) that can be used with the CLI using the -v or -r flags, or with the llm crate by using the appropriate TokenizerSource enum variant.
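The three tokenizer options (embedded in the model file, a local tokenizer.json, or a remote Hugging Face repository) can be modeled as a small enum. The sketch below is a self-contained stand-in: TokenizerChoice and choose_tokenizer are hypothetical names, the variants are not copied from the crate's TokenizerSource, and rejecting both options at once is an assumption about sensible behavior rather than documented CLI behavior.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the crate's tokenizer source type.
#[derive(Debug, PartialEq)]
pub enum TokenizerChoice {
    /// Use the vocabulary embedded in the GGML model file.
    Embedded,
    /// Use a local tokenizer.json file (what the -v flag points at).
    LocalFile(PathBuf),
    /// Fetch the tokenizer from a remote repository (what the -r flag names).
    Remote(String),
}

/// Map the optional -v and -r values to a single tokenizer choice.
pub fn choose_tokenizer(
    local_file: Option<&str>,
    remote_repo: Option<&str>,
) -> Result<TokenizerChoice, String> {
    match (local_file, remote_repo) {
        // Assumption: the two sources are mutually exclusive.
        (Some(_), Some(_)) => {
            Err("specify a local tokenizer file or a remote repository, not both".to_string())
        }
        (Some(path), None) => Ok(TokenizerChoice::LocalFile(PathBuf::from(path))),
        (None, Some(repo)) => Ok(TokenizerChoice::Remote(repo.to_string())),
        (None, None) => Ok(TokenizerChoice::Embedded),
    }
}
```

Falling back to the embedded vocabulary when neither option is given mirrors the description above, where the Hugging Face tokenizer is an optional improvement.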

For a list of models that have been tested, see the known-good models.

Certain older GGML formats are not supported by this project, but the goal is to maintain feature parity with the upstream GGML project. For problems loading models, or to request support for GGML model types that this project does not yet handle, please open an Issue.

From Hugging Face

Hugging Face 🤗 is a leader in open-source machine learning and hosts hundreds of GGML models. Search for GGML models on Hugging Face 🤗.

r/LocalLLaMA

This Reddit community maintains a wiki related to GGML models, including well-organized lists of links for acquiring GGML models (mostly from Hugging Face 🤗).

Usage

Once the llm executable has been built or is in a $PATH directory, try running it. Here's an example that uses the open-source RedPajama language model:

llm infer -a gptneox -m RedPajama-INCITE-Base-3B-v1-q4_0.bin -p "Rust is a cool programming language because" -r togethercomputer/RedPajama-INCITE-Base-3B-v1

In the example above, the first two arguments specify the command (infer) and the model architecture (-a gptneox), respectively. The required -m argument specifies the local path to the model, and the required -p argument specifies the evaluation prompt. The optional -r argument loads the model's tokenizer from a remote Hugging Face 🤗 repository, which typically improves results compared to loading the tokenizer from the model file itself; the optional -v argument specifies the path to a local tokenizer file instead. For more information about the llm CLI, use the --help parameter.
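If you want to drive the CLI from a Rust program rather than a shell, the same invocation can be assembled with std::process::Command. This is a minimal sketch that uses only the flags shown above (-a, -m, -p, -r); infer_args and infer_command are hypothetical helper names, and the sketch assumes the llm executable is on your PATH.

```rust
use std::process::Command;

/// Build the argument list for `llm infer` (hypothetical helper).
pub fn infer_args(
    architecture: &str,
    model_path: &str,
    prompt: &str,
    tokenizer_repo: Option<&str>,
) -> Vec<String> {
    let mut args: Vec<String> = ["infer", "-a", architecture, "-m", model_path, "-p", prompt]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // -r is optional: only add it when a remote tokenizer repository is given.
    if let Some(repo) = tokenizer_repo {
        args.push("-r".to_string());
        args.push(repo.to_string());
    }
    args
}

/// Wrap the arguments in a Command for the `llm` executable.
/// Nothing runs until the caller invokes `.status()` or `.output()`.
pub fn infer_command(args: &[String]) -> Command {
    let mut command = Command::new("llm");
    command.args(args);
    command
}
```

Because each argument is passed as its own string, a prompt containing spaces or quotes needs no shell escaping.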

There is also a simple inference example that is helpful for debugging:

cargo run --release --example inference gptneox RedPajama-INCITE-Base-3B-v1-q4_0.bin -r $OPTIONAL_VOCAB_REPO -p $OPTIONAL_PROMPT

Q&A

Does the llm CLI support chat mode?

Yes, but certain fine-tuned models (e.g. Alpaca, Vicuna, Pygmalion) are more suited to chat use-cases than so-called "base models". Here's an example of using the llm CLI in REPL (Read-Evaluate-Print Loop) mode with an Alpaca model - note that the provided prompt format is tailored to the model that is being used:

llm repl -a llama -m ggml-alpaca-7b-q4.bin -f utils/prompts/alpaca.txt

There is also a Vicuna chat example that demonstrates how to create a custom chatbot:

cargo run --release --example vicuna-chat llama ggml-vicuna-7b-q4.bin

Can llm sessions be persisted for later use?

Sessions can be loaded (--load-session) or saved (--save-session) to file. To automatically load and save the same session, use --persist-session. This can be used to cache prompts to reduce load time, too.
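One way to read the relationship between these three flags is that --persist-session is shorthand for loading from and saving to the same file. The sketch below expresses that reading; it is not the CLI's actual implementation. SessionFlags, SessionPlan, and plan_session are hypothetical names, and letting the persist path take precedence over the other two is an assumption.

```rust
use std::path::PathBuf;

/// Parsed values of the three session flags (hypothetical struct).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SessionFlags {
    pub load_session: Option<PathBuf>,
    pub save_session: Option<PathBuf>,
    pub persist_session: Option<PathBuf>,
}

/// Where to read the session from before inference and write it to afterwards.
#[derive(Debug, PartialEq)]
pub struct SessionPlan {
    pub load_from: Option<PathBuf>,
    pub save_to: Option<PathBuf>,
}

/// Resolve the flags into a load path and a save path.
pub fn plan_session(flags: &SessionFlags) -> SessionPlan {
    match &flags.persist_session {
        // Persist: load and save the same file (assumed to take precedence).
        Some(path) => SessionPlan {
            load_from: Some(path.clone()),
            save_to: Some(path.clone()),
        },
        // Otherwise the load and save paths are independent.
        None => SessionPlan {
            load_from: flags.load_session.clone(),
            save_to: flags.save_session.clone(),
        },
    }
}
```

A real implementation would also need to handle the first run with a persisted session, when the file does not exist yet and there is nothing to load.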

How do I use llm to quantize a model?

llm can produce a q4_0- or q4_1-quantized model from an f16 GGML model:

cargo run --release quantize -a $MODEL_ARCHITECTURE $MODEL_IN $MODEL_OUT {q4_0,q4_1}
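For intuition about what 4-bit quantization does to the weights, here is a simplified block quantizer. It is not GGML's q4_0 or q4_1 algorithm or file layout: it assumes blocks of 32 weights, a single f32 scale per block, and symmetric rounding to integers in -7..=7. All names (Block4, quantize_block, dequantize_block) are hypothetical.

```rust
/// Number of weights that share one scale factor (assumed block size).
pub const BLOCK: usize = 32;

/// One quantized block: a scale plus 32 four-bit values packed two per byte.
#[derive(Debug, Clone, PartialEq)]
pub struct Block4 {
    pub scale: f32,
    pub packed: [u8; BLOCK / 2],
}

/// Convert one weight to an unsigned nibble: round to -7..=7, then offset by 8.
fn to_nibble(weight: f32, scale: f32) -> u8 {
    let q = (weight / scale).round().clamp(-7.0, 7.0) as i8;
    (q + 8) as u8
}

/// Quantize 32 weights to 4 bits each, sharing one scale.
pub fn quantize_block(weights: &[f32; BLOCK]) -> Block4 {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Map the largest magnitude to 7; fall back to 1.0 for an all-zero block.
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    let mut packed = [0u8; BLOCK / 2];
    for (i, pair) in weights.chunks_exact(2).enumerate() {
        // Low nibble holds the even-indexed weight, high nibble the odd one.
        packed[i] = to_nibble(pair[0], scale) | (to_nibble(pair[1], scale) << 4);
    }
    Block4 { scale, packed }
}

/// Recover approximate weights from a quantized block.
pub fn dequantize_block(block: &Block4) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (i, byte) in block.packed.iter().enumerate() {
        out[2 * i] = ((byte & 0x0F) as i32 - 8) as f32 * block.scale;
        out[2 * i + 1] = ((byte >> 4) as i32 - 8) as f32 * block.scale;
    }
    out
}
```

In this scheme each recovered weight differs from the original by at most half the block's scale, so blocks with one large outlier lose more precision on their small weights.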

Do you provide support for Docker and NixOS?

The llm Dockerfile is in the utils directory; the NixOS flake manifest and lockfile are in the project root.

What's the best way to get in touch with the llm community?

GitHub Issues and Discussions are welcome, or come chat on Discord!

Do you accept contributions?

Absolutely! Please see the contributing guide.

What applications and libraries use llm?

Applications

  • llmcord: Discord bot for generating messages using llm.
  • local.ai: Desktop app for hosting an inference API on your local machine using llm.
  • secondbrain: Desktop app to download and run LLMs locally on your computer using llm.
  • floneum: A graph editor for local AI workflows.

Libraries

  • llm-chain: Build chains in large language models for text summarization and completion of more complex tasks
