Comments (8)
Can't utilize GPU on Mac with:
llama_cpp_rs = { git = "https://github.com/mdrokz/rust-llama.cpp", version = "0.3.0", features = ["metal"] }
Code
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    let model_options = ModelOptions {
        n_gpu_layers: 1,
        ..Default::default()
    };

    let llama = LLama::new("zephyr-7b-alpha.Q2_K.gguf".into(), &model_options);
    println!("llama: {:?}", llama);

    let predict_options = PredictOptions {
        tokens: 0,
        threads: 14,
        top_k: 90,
        top_p: 0.86,
        token_callback: Some(Box::new(|token| {
            println!("token1: {}", token);
            true
        })),
        ..Default::default()
    };

    llama
        .unwrap()
        .predict(
            "what are the national animals of india".into(),
            predict_options,
        )
        .unwrap();
}
Error
llama_new_context_with_model: kv self size = 64.00 MB
llama_new_context_with_model: ggml_metal_init() failed
llama: Err("Failed to load model")
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Failed to load model"', src/main.rs:40:10
Hmm, weird. I don't have a Mac available to test this at the moment; I will try to look into it. Thanks
I have the same problem on my Apple M1.
@phudtran I have found the root cause: you should put the ggml-metal.metal
file next to your binary. I also found that build.rs disables the debug log
output when building the metal feature, so I removed that flag to print more
logs and track down the error.
build.rs
fn compile_metal(cx: &mut Build, cxx: &mut Build) {
cx.flag("-DGGML_USE_METAL").flag("-DGGML_METAL_NDEBUG");
cxx.flag("-DGGML_USE_METAL");
println!("cargo:rustc-link-lib=framework=Metal");
println!("cargo:rustc-link-lib=framework=Foundation");
println!("cargo:rustc-link-lib=framework=MetalPerformanceShaders");
println!("cargo:rustc-link-lib=framework=MetalKit");
cx.include("./llama.cpp/ggml-metal.h")
.file("./llama.cpp/ggml-metal.m");
}
With GGML_METAL_NDEBUG removed (debug logging enabled):
fn compile_metal(cx: &mut Build, cxx: &mut Build) {
cx.flag("-DGGML_USE_METAL"); // <============== enable print debug log.
cxx.flag("-DGGML_USE_METAL");
println!("cargo:rustc-link-lib=framework=Metal");
println!("cargo:rustc-link-lib=framework=Foundation");
println!("cargo:rustc-link-lib=framework=MetalPerformanceShaders");
println!("cargo:rustc-link-lib=framework=MetalKit");
cx.include("./llama.cpp/ggml-metal.h")
.file("./llama.cpp/ggml-metal.m");
}
@mdrokz Should we add a flag to enable/disable the debug log?
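One possible shape for such a flag, as a sketch: cargo exposes each enabled feature to build scripts as a CARGO_FEATURE_<NAME> environment variable, so a hypothetical metal-debug feature (the name is an assumption, not the crate's actual API) could gate the define:

fn compile_metal(cx: &mut Build, cxx: &mut Build) {
    cx.flag("-DGGML_USE_METAL");
    // Only silence ggml's Metal debug logging when the (hypothetical)
    // "metal-debug" cargo feature is NOT enabled.
    if std::env::var("CARGO_FEATURE_METAL_DEBUG").is_err() {
        cx.flag("-DGGML_METAL_NDEBUG");
    }
    cxx.flag("-DGGML_USE_METAL");
    println!("cargo:rustc-link-lib=framework=Metal");
    println!("cargo:rustc-link-lib=framework=Foundation");
    println!("cargo:rustc-link-lib=framework=MetalPerformanceShaders");
    println!("cargo:rustc-link-lib=framework=MetalKit");
    cx.include("./llama.cpp/ggml-metal.h")
        .file("./llama.cpp/ggml-metal.m");
}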
@zackshen I've tried adding the ggml-metal.metal file next to the binary, but now I get the following message:
-[MTLComputePipelineDescriptorInternal setComputeFunction:withType:]:722: failed assertion 'computeFunction must not be nil.'
I have never seen this error before. I just modified the example code in this repo to test GPU utilization. Can you show your code?
I will add an option for enabling/disabling the debug logging.
Encountered the same error. Placing ggml-metal.metal into the project directory leads to the same error @hugonijmek has seen.
However, setting the following environment variable to point at the llama.cpp sources fixes the original issue: GGML_METAL_PATH_RESOURCES=/rust-llama.cpp/llama.cpp/ (see https://github.com/ggerganov/whisper.cpp/blob/master/ggml-metal.m#L261).
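For reference, a minimal sketch of applying this from the example program itself rather than from the shell; the relative path "./llama.cpp/" is an assumption about where your llama.cpp checkout lives:

use llama_cpp_rs::{options::ModelOptions, LLama};

fn main() {
    // ggml_metal_init reads GGML_METAL_PATH_RESOURCES at runtime to locate
    // ggml-metal.metal, so set it before the Metal context is created.
    std::env::set_var("GGML_METAL_PATH_RESOURCES", "./llama.cpp/");

    let model_options = ModelOptions {
        n_gpu_layers: 1,
        ..Default::default()
    };
    let llama = LLama::new("zephyr-7b-alpha.Q2_K.gguf".into(), &model_options);
    println!("llama: {:?}", llama);
}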
If you want to include it in the build, so you don't have to worry about keeping the shader file next to the binary or setting the environment variable, you can use the solution from the rustformers/llm repository:
rustformers/llm@9d39ff8
To get it working, update the needle to the current string. The file this puts in the output directory has a prefix before 'ggml-metal.o', so when checking the ggml_type in compile_llama, check for "metal" and, if so, search the directory for the file with a call to ends_with("-ggml-metal.o"), then add it with cxx.object(metal_path).
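A rough sketch of what that lookup could look like (the helper name add_metal_object and the exact OUT_DIR layout are assumptions; the rustformers/llm commit above is the authoritative version):

use std::{env, fs, path::PathBuf};
use cc::Build;

// Search OUT_DIR for the object file cc emitted for ggml-metal.m (it carries
// a hashed prefix before "ggml-metal.o") and add it to the C++ build so the
// embedded shader gets linked into the final binary.
fn add_metal_object(cxx: &mut Build) {
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set"));
    for entry in fs::read_dir(&out_dir).expect("cannot read OUT_DIR") {
        let path = entry.expect("bad dir entry").path();
        let is_metal_obj = path
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.ends_with("-ggml-metal.o"));
        if is_metal_obj {
            cxx.object(&path);
        }
    }
}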
Related Issues (20)
- Not cloning llama.cpp submodule
- Can't compile on Win64
- Can't build on Mac aarch64
- Can't build on Ubuntu 22.04
- Error in loading models
- llama.cpp ./embedding
- Slow performance compared to Python binding
- Sometimes crashes with UTF-8 error
- `LLama` is not `Send`
- Error when enabling CUDA on Windows
- clang - fatal error: 'assert.h' file not found
- Remove or add a way to disable `println!("count {}", reverse_count);`
- Error running Phi2 models
- Using metal and `n_gpu_layers` produces no tokens
- Support for GBNF grammars
- Include ggml-metal.metal file in source code
- Maintenance and improvements
- Bug: cannot build correctly on macOS with M2 chip
- Compiling with metal feature has `ggml-metal.o` linker failure