
rayrtfr / llama.cpp


This project forked from ggerganov/llama.cpp


Port of Facebook's LLaMA model in C/C++

License: MIT License

Languages: Shell 0.69%, C++ 65.67%, Python 4.17%, C 18.75%, Objective-C 1.71%, Cuda 5.02%, Swift 0.02%, Nix 0.23%, Makefile 0.31%, CMake 0.67%, Metal 2.58%, Dockerfile 0.09%, Zig 0.07%, Batchfile 0.01%

llama.cpp's Introduction

Quantized deployment with llama.cpp

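This guide uses the llama.cpp tool to walk through the detailed steps of quantizing a model and deploying it locally. On Windows you may also need to install build tools such as cmake. For a quick local trial, the instruction-tuned Atom-7B-Chat model is recommended. If your hardware allows, use a 6-bit or 8-bit model for better output quality. Before running, make sure that: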

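  1. A build tool is available: make (bundled with macOS/Linux) or cmake (must be installed separately on Windows)
  2. Python 3.10 or later is used to build and run the tool (recommended)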
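You can check both prerequisites up front with a small Python sketch. The function name `check_prerequisites` is hypothetical. It assumes that finding `make` or `cmake` on PATH is enough, so it does not check whether a C/C++ compiler is installed.

```python
from __future__ import annotations  # keeps the list[str] hint valid on older Pythons

import shutil
import sys


def check_prerequisites() -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Python 3.10+ is recommended for building and running the tool.
    if sys.version_info < (3, 10):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found; 3.10+ recommended"
        )
    # Either make (macOS/Linux) or cmake (Windows) must be on PATH.
    if shutil.which("make") is None and shutil.which("cmake") is None:
        problems.append("neither make nor cmake found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Prerequisites look OK.")
```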

Step 1: Clone and build llama.cpp

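  1. (Optional) If you already downloaded an older copy of the repository, run git pull to fetch the latest code, then run make clean to remove old build artifacts.
  2. Clone the latest llama.cpp repository, which has been adapted for the Atom model, and enter it:
$ git clone https://github.com/Rayrtfr/llama.cpp
$ cd llama.cpp
  3. Build the llama.cpp project. This produces the ./main (inference) and ./quantize (quantization) binaries:
$ make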

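To speed up prompt processing, Windows/Linux users are advised to build with BLAS. If you have an NVIDIA GPU and want GPU inference, build with cuBLAS instead. The following command builds with cuBLAS for NVIDIA GPUs. See llama.cpp#blas-build.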

$ make LLAMA_CUBLAS=1

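macOS users need no extra steps: llama.cpp is already optimized for ARM NEON and enables BLAS automatically. On M-series chips, using Metal for GPU inference is recommended and significantly improves speed. Just change the build command to LLAMA_METAL=1 make (see llama.cpp#metal-build):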

$ LLAMA_METAL=1 make
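You can choose among the three build variants above automatically. The following Python sketch is hypothetical and uses an assumed heuristic:

- Apple Silicon (Darwin/arm64) gets the Metal build.
- A machine with `nvidia-smi` on PATH gets the cuBLAS build.
- Everything else gets a plain `make`.

It only reproduces the make commands shown above and does not cover a cmake build on Windows. By default it just prints the command.

```python
import os
import platform
import shutil
import subprocess


def choose_build(system=None, machine=None, has_nvidia=None):
    """Return (make argv, extra environment variables) for this machine.

    Assumed heuristic, not part of llama.cpp: Apple Silicon -> Metal build,
    nvidia-smi on PATH -> cuBLAS build, anything else -> plain CPU build.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if has_nvidia is None:
        has_nvidia = shutil.which("nvidia-smi") is not None
    if system == "Darwin" and machine == "arm64":
        return ["make"], {"LLAMA_METAL": "1"}  # same as: LLAMA_METAL=1 make
    if has_nvidia:
        return ["make", "LLAMA_CUBLAS=1"], {}  # same as: make LLAMA_CUBLAS=1
    return ["make"], {}  # same as: make


def build(repo_dir="llama.cpp", dry_run=True):
    """Print (and optionally run) the chosen build inside repo_dir."""
    cmd, extra_env = choose_build()
    shown = [f"{k}={v}" for k, v in extra_env.items()] + cmd
    print("Build command:", " ".join(shown))
    if not dry_run:
        subprocess.run(cmd, cwd=repo_dir, env={**os.environ, **extra_env}, check=True)
    return cmd, extra_env


if __name__ == "__main__":
    build()  # prints the command only; pass dry_run=False to compile
```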

Step 2: Generate a quantized model

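llama.cpp currently supports converting .safetensors files and Hugging Face-format .bin files to GGUF in FP16 format. First convert the checkpoint to an FP16 GGUF file, then quantize it. The commands below produce a 4-bit (q4_0) model.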

$ python convert.py --outfile ./atom-7B-cpp.gguf  /path/Atom-7B-Chat

$ ./quantize ./atom-7B-cpp.gguf ./ggml-atom-7B-q4_0.gguf q4_0
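You can chain the two commands in a script. This also makes it easy to produce the 6-bit or 8-bit variants recommended earlier. `convert_and_quantize` is a hypothetical helper that must be run from the llama.cpp directory. The `q6_K` and `q8_0` type names are assumed to be accepted by this fork's ./quantize, as they are by upstream llama.cpp. With `dry_run=True` (the default) it only prints the commands.

```python
import subprocess
import sys
from pathlib import Path


def convert_and_quantize(hf_model_dir, out_dir=".", quant_types=("q4_0",), dry_run=True):
    """Convert a Hugging Face checkpoint to FP16 GGUF, then quantize it.

    Must be run from the llama.cpp directory. Returns the commands (argv lists)
    in execution order; with dry_run=True they are only printed.
    """
    out = Path(out_dir)
    fp16_path = out / "atom-7B-cpp.gguf"
    # HF checkpoint -> FP16 GGUF (same as: python convert.py --outfile ...)
    commands = [[sys.executable, "convert.py", "--outfile", str(fp16_path), str(hf_model_dir)]]
    # One ./quantize call per requested type, named like ggml-atom-7B-q4_0.gguf
    for qtype in quant_types:
        q_path = out / f"ggml-atom-7B-{qtype}.gguf"
        commands.append(["./quantize", str(fp16_path), str(q_path), qtype])
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # q6_K and q8_0 are assumed to be accepted by this fork's ./quantize.
    convert_and_quantize("/path/Atom-7B-Chat", quant_types=("q4_0", "q6_K", "q8_0"))
```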

Step 3: Load and start the model

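  • For GPU inference: with a cuBLAS or Metal build, you must specify how many layers to offload. Pass it to ./main with -ngl; for example, -ngl 40 offloads 40 layers of model parameters to the GPU.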

Start a chat with the following command:

text="<s>Human: Tell me about Beijing\n</s><s>Assistant:"
./main -m ./ggml-atom-7B-q4_0.gguf \
  -p "${text}" \
  --logdir ./logtxt

To include earlier chat turns as context, change the text above to something like this:

text="<s>Human: Tell me about Beijing\n</s><s>Assistant:Beijing is a beautiful city</s>\n<s>Human: Now tell me about Hefei\n</s><s>Assistant:"
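The multi-turn format follows a fixed pattern:

- Each earlier exchange becomes `<s>Human: {question}\n</s><s>Assistant:{answer}</s>\n`.
- The prompt ends with the new question followed by `<s>Assistant:`.

The hypothetical `build_prompt` below assembles that string from a list of (question, answer) pairs. It reproduces the shell strings exactly, including the two-character `\n` sequences that bash leaves unexpanded inside double quotes. If you prefer real newline characters, set `nl` to `"\n"`. This sketch makes no assumption about how ./main treats the backslash sequence.

```python
def build_prompt(history, question):
    """Assemble an Atom-7B-Chat prompt in the format used by the shell examples.

    history: list of (question, answer) pairs from earlier turns.
    nl is the two-character sequence backslash + n, which is what bash passes
    for the escape written inside double quotes in those examples.
    """
    nl = "\\n"
    parts = [f"<s>Human: {q}{nl}</s><s>Assistant:{a}</s>{nl}" for q, a in history]
    parts.append(f"<s>Human: {question}{nl}</s><s>Assistant:")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        [("Tell me about Beijing", "Beijing is a beautiful city")],
        "Now tell me about Hefei",
    )
    print(prompt)
```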

For more detailed official instructions, see: https://github.com/Rayrtfr/llama.cpp/tree/master/examples/main
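When scripting the Step 3 launch, you can build the ./main arguments as a list, including the optional -ngl GPU offload, and pass them to `subprocess.run`. `main_command` and `run_chat` are hypothetical helpers that use only the flags shown in this guide (-m, -p, --logdir, -ngl). Passing a list bypasses the shell, so the prompt needs no extra quoting.

```python
import subprocess


def main_command(model_path, prompt, n_gpu_layers=0, logdir="./logtxt"):
    """Build the argv for ./main; -ngl is added only when offloading to a GPU."""
    cmd = ["./main", "-m", model_path, "-p", prompt]
    if logdir:
        cmd += ["--logdir", logdir]
    if n_gpu_layers > 0:
        # Only useful for cuBLAS or Metal builds.
        cmd += ["-ngl", str(n_gpu_layers)]
    return cmd


def run_chat(model_path, prompt, n_gpu_layers=0, dry_run=True):
    """Print the command and, unless dry_run is set, launch ./main."""
    cmd = main_command(model_path, prompt, n_gpu_layers)
    print(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Same prompt as the shell example; "\\n" keeps the literal backslash-n.
    text = "<s>Human: Tell me about Beijing\\n</s><s>Assistant:"
    run_chat("./ggml-atom-7B-q4_0.gguf", text, n_gpu_layers=40)
```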

llama.cpp's People

Contributors

ggerganov, slaren, cebtenzzre, johannesgaessler, ikawrakow, kerfufflev2, sw, prusnak, jhen0409, howard0su, anzz1, slyecho, someoneserge, dannydaemonic, green-sky, galunid, danbev, ejones, monatis, xaedes, lshzh-ww, marcusdunn, 0cc4m, jart, ptsochantaris, shibe2, tjohnman, goerch, comex, jpodivin

Forkers

knifeman
