mdrokz / rust-llama.cpp
LLama.cpp rust bindings
Home Page: https://crates.io/crates/llama_cpp_rs/
License: MIT License
error occurred: Command ZERO_AR_DATE="1" "ar" "cq" "/home/tc-wolf/rust-llama.cpp/target/release/build/llama_cpp_rs-75252caa56296e09/out/libbinding.a" "/home/tc-wolf/rust-llama.cpp/target/release/build/llama_cpp_rs-75252caa56296e09/out/./llama.cpp/common/common.o" "/home/tc-wolf/rust-llama.cpp/target/release/build/llama_cpp_rs-75252caa56296e09/out/./llama.cpp/llama.o" "/home/tc-wolf/rust-llama.cpp/target/release/build/llama_cpp_rs-75252caa56296e09/out/./binding.o" "/home/tc-wolf/rust-llama.cpp/target/release/build/llama_cpp_rs-75252caa56296e09/out/llama.cpp/ggml.o" "/home/tc-wolf/rust-llama.cpp/target/release/build/llama_cpp_rs-75252caa56296e09/out/llama.cpp/ggml-metal.o" with args "ar" did not execute successfully (status code exit status: 1).
This happens because the cc-rs crate adds a hash to each generated object file name, to avoid collisions when a file with the same name exists in another subdirectory.
Should be fixed by #39
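For illustration, the mismatch can be mimicked in plain shell (the directory layout and hash below are made up for the demo): an `ar` invocation that hard-codes a name like `common.o` misses the hashed file cc-rs actually wrote, while matching by pattern finds it.

```shell
# Simulate cc-rs output: the object file carries a hash suffix.
mkdir -p demo_out/llama.cpp/common
touch demo_out/llama.cpp/common/common-75252caa.o

# The hard-coded name the failing ar command expects is not there.
ls demo_out/llama.cpp/common/common.o 2>/dev/null || echo "common.o: not found"

# Matching by pattern picks up the hashed object instead.
ls demo_out/llama.cpp/common/*.o
```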
When trying to build it on my aarch64 MacBook, I'm getting a build error.
cargo build --verbose
Fresh unicode-ident v1.0.9
Fresh glob v0.3.1
Fresh minimal-lexical v0.2.1
Fresh proc-macro2 v1.0.63
Fresh cfg-if v1.0.0
Fresh regex-syntax v0.7.2
Fresh libc v0.2.147
Fresh quote v1.0.29
Fresh memchr v2.5.0
Fresh libloading v0.7.4
Fresh either v1.8.1
Fresh regex v1.8.4
Fresh syn v2.0.22
Fresh nom v7.1.3
Fresh which v4.4.0
Fresh clang-sys v1.6.1
Fresh log v0.4.19
Fresh peeking_take_while v0.1.2
Fresh bitflags v2.3.3
Fresh cexpr v0.6.0
Fresh prettyplease v0.2.9
Fresh shlex v1.1.0
Fresh lazy_static v1.4.0
Fresh rustc-hash v1.1.0
Fresh lazycell v1.3.0
Fresh cc v1.0.79
Fresh bindgen v0.66.1
Compiling llama_cpp_rs v0.2.0 (/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp)
Running `/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-31c12aeaf8da45ac/build-script-build`
The following warnings were emitted during compilation:
warning: clang: warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument]
warning: In file included from ./llama.cpp/examples/common.cpp:1:
warning: In file included from ./llama.cpp/examples/common.h:5:
warning: In file included from ./llama.cpp/llama.h:4:
warning: ./llama.cpp/ggml.h:254:24: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_TYPE_COUNT,
warning: ^
warning: ./llama.cpp/ggml.h:260:36: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_BACKEND_GPU_SPLIT = 20,
warning: ^
warning: ./llama.cpp/ggml.h:278:36: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_FTYPE_MOSTLY_Q6_K = 14, // except 1d tensors
warning: ^
warning: ./llama.cpp/ggml.h:355:22: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_OP_COUNT,
warning: ^
warning: ./llama.cpp/ggml.h:450:27: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_TASK_FINALIZE,
warning: ^
warning: ./llama.cpp/ggml.h:1294:23: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_OPT_LBFGS,
warning: ^
warning: ./llama.cpp/ggml.h:1303:54: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_LINESEARCH_BACKTRACKING_STRONG_WOLFE = 2,
warning: ^
warning: ./llama.cpp/ggml.h:1318:43: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: GGML_LINESEARCH_INVALID_PARAMETERS,
warning: ^
warning: In file included from ./llama.cpp/examples/common.cpp:1:
warning: In file included from ./llama.cpp/examples/common.h:5:
warning: ./llama.cpp/llama.h:124:46: warning: commas at the end of enumerator lists are a C++11 extension [-Wc++11-extensions]
warning: LLAMA_FTYPE_MOSTLY_Q6_K = 18,// except 1d tensors
warning: ^
warning: In file included from ./llama.cpp/examples/common.cpp:1:
warning: ./llama.cpp/examples/common.h:25:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t seed = -1; // RNG seed
warning: ^
warning: ./llama.cpp/examples/common.h:26:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t n_threads = get_num_physical_cores();
warning: ^
warning: ./llama.cpp/examples/common.h:27:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t n_predict = -1; // new tokens to predict
warning: ^
warning: ./llama.cpp/examples/common.h:28:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t n_ctx = 512; // context size
warning: ^
warning: ./llama.cpp/examples/common.h:29:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t n_batch = 512; // batch size for prompt processing (must be >=32 to use BLAS)
warning: ^
warning: ./llama.cpp/examples/common.h:30:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t n_keep = 0; // number of tokens to keep from initial prompt
warning: ^
warning: ./llama.cpp/examples/common.h:31:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t n_gpu_layers = 0; // number of layers to store in VRAM
warning: ^
warning: ./llama.cpp/examples/common.h:32:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t main_gpu = 0; // the GPU that is used for scratch and small tensors
warning: ^
warning: ./llama.cpp/examples/common.h:33:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float tensor_split[LLAMA_MAX_DEVICES] = {0}; // how split tensors should be distributed across GPUs
warning: ^
warning: ./llama.cpp/examples/common.h:34:45: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool low_vram = 0; // if true, reduce VRAM usage at the cost of performance
warning: ^
warning: ./llama.cpp/examples/common.h:38:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t top_k = 40; // <= 0 to use vocab size
warning: ^
warning: ./llama.cpp/examples/common.h:39:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float top_p = 0.95f; // 1.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:40:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float tfs_z = 1.00f; // 1.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:41:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float typical_p = 1.00f; // 1.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:42:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float temp = 0.80f; // 1.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:43:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float repeat_penalty = 1.10f; // 1.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:44:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int32_t repeat_last_n = 64; // last n tokens to penalize (0 = disable penalty, -1 = context size)
warning: ^
warning: ./llama.cpp/examples/common.h:45:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float frequency_penalty = 0.00f; // 0.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:46:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float presence_penalty = 0.00f; // 0.0 = disabled
warning: ^
warning: ./llama.cpp/examples/common.h:47:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: int mirostat = 0; // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
warning: ^
warning: ./llama.cpp/examples/common.h:48:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float mirostat_tau = 5.00f; // target entropy
warning: ^
warning: ./llama.cpp/examples/common.h:49:31: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: float mirostat_eta = 0.10f; // learning rate
warning: ^
warning: ./llama.cpp/examples/common.h:51:35: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string model = "models/7B/ggml-model.bin"; // model path
warning: ^
warning: ./llama.cpp/examples/common.h:52:35: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string model_alias = "unknown"; // model alias
warning: ^
warning: ./llama.cpp/examples/common.h:53:35: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string prompt = "";
warning: ^
warning: ./llama.cpp/examples/common.h:54:35: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string path_prompt_cache = ""; // path to file for saving/loading prompt eval state
warning: ^
warning: ./llama.cpp/examples/common.h:55:35: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string input_prefix = ""; // string to prefix user inputs with
warning: ^
warning: ./llama.cpp/examples/common.h:56:35: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string input_suffix = ""; // string to suffix user inputs with
warning: ^
warning: ./llama.cpp/examples/common.h:59:30: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string lora_adapter = ""; // lora adapter path
warning: ^
warning: ./llama.cpp/examples/common.h:60:30: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: std::string lora_base = ""; // base model path for the lora adapter
warning: ^
warning: ./llama.cpp/examples/common.h:62:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool memory_f16 = true; // use f16 instead of f32 for memory kv
warning: ^
warning: ./llama.cpp/examples/common.h:63:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool random_prompt = false; // do not randomize prompt if none provided
warning: ^
warning: ./llama.cpp/examples/common.h:64:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool use_color = false; // use color to distinguish generations and inputs
warning: ^
warning: ./llama.cpp/examples/common.h:65:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool interactive = false; // interactive mode
warning: ^
warning: ./llama.cpp/examples/common.h:66:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool prompt_cache_all = false; // save user input and generations to prompt cache
warning: ^
warning: ./llama.cpp/examples/common.h:67:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool prompt_cache_ro = false; // open the prompt cache read-only and do not update it
warning: ^
warning: ./llama.cpp/examples/common.h:69:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool embedding = false; // get only sentence embedding
warning: ^
warning: ./llama.cpp/examples/common.h:70:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool interactive_first = false; // wait for user input immediately
warning: ^
warning: ./llama.cpp/examples/common.h:71:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool multiline_input = false; // reverse the usage of `\`
warning: ^
warning: ./llama.cpp/examples/common.h:73:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool instruct = false; // instruction mode (used for Alpaca models)
warning: ^
warning: ./llama.cpp/examples/common.h:74:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool penalize_nl = true; // consider newlines as a repeatable token
warning: ^
warning: ./llama.cpp/examples/common.h:75:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool perplexity = false; // compute perplexity over the prompt
warning: ^
warning: ./llama.cpp/examples/common.h:76:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool use_mmap = true; // use mmap for faster loads
warning: ^
warning: ./llama.cpp/examples/common.h:77:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool use_mlock = false; // use mlock to keep model in memory
warning: ^
warning: ./llama.cpp/examples/common.h:78:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool mem_test = false; // compute maximum memory usage
warning: ^
warning: ./llama.cpp/examples/common.h:79:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool numa = false; // attempt optimizations that help on some NUMA systems
warning: ^
warning: ./llama.cpp/examples/common.h:80:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool export_cgraph = false; // export the computation graph
warning: ^
warning: ./llama.cpp/examples/common.h:81:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool verbose_prompt = false; // print prompt tokens before generation
warning: ^
warning: ./llama.cpp/examples/common.h:100:6: error: no template named 'tuple' in namespace 'std'
warning: std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(const gpt_params & params);
warning: ~~~~~^
warning: ./llama.cpp/examples/common.h:123:26: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool multiline_input = false;
warning: ^
warning: ./llama.cpp/examples/common.h:124:20: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: bool use_color = false;
warning: ^
warning: ./llama.cpp/examples/common.h:125:27: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: console_color_t color = CONSOLE_COLOR_DEFAULT;
warning: ^
warning: ./llama.cpp/examples/common.h:127:15: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: FILE* out = stdout;
warning: ^
warning: ./llama.cpp/examples/common.h:131:15: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
warning: FILE* tty = nullptr;
warning: ^
warning: ./llama.cpp/examples/common.cpp:537:6: error: no template named 'tuple' in namespace 'std'
warning: std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(const gpt_params & params) {
warning: ~~~~~^
warning: ./llama.cpp/examples/common.cpp:538:5: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
warning: auto lparams = llama_context_default_params();
warning: ^
warning: ./llama.cpp/examples/common.cpp:556:21: error: no member named 'make_tuple' in namespace 'std'
warning: return std::make_tuple(nullptr, nullptr);
warning: ~~~~~^
warning: ./llama.cpp/examples/common.cpp:563:21: error: no member named 'make_tuple' in namespace 'std'
warning: return std::make_tuple(nullptr, nullptr);
warning: ~~~~~^
warning: ./llama.cpp/examples/common.cpp:575:25: error: no member named 'make_tuple' in namespace 'std'
warning: return std::make_tuple(nullptr, nullptr);
warning: ~~~~~^
warning: ./llama.cpp/examples/common.cpp:579:17: error: no member named 'make_tuple' in namespace 'std'
warning: return std::make_tuple(model, lctx);
warning: ~~~~~^
warning: 63 warnings and 6 errors generated.
error: failed to run custom build command for `llama_cpp_rs v0.2.0 (/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp)`
Caused by:
process didn't exit successfully: `/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-31c12aeaf8da45ac/build-script-build` (exit status: 1)
--- stdout
cargo:rerun-if-env-changed=TARGET
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_aarch64-apple-darwin
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_aarch64_apple_darwin
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS
cargo:rerun-if-changed=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/15.0.0/include/stdbool.h
TARGET = Some("aarch64-apple-darwin")
OPT_LEVEL = Some("0")
HOST = Some("aarch64-apple-darwin")
cargo:rerun-if-env-changed=CC_aarch64-apple-darwin
CC_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=CC_aarch64_apple_darwin
CC_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_CC
HOST_CC = None
cargo:rerun-if-env-changed=CC
CC = None
cargo:rerun-if-env-changed=CFLAGS_aarch64-apple-darwin
CFLAGS_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=CFLAGS_aarch64_apple_darwin
CFLAGS_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_CFLAGS
HOST_CFLAGS = None
cargo:rerun-if-env-changed=CFLAGS
CFLAGS = None
cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
CRATE_CC_NO_DEFAULTS = None
DEBUG = Some("true")
CARGO_CFG_TARGET_FEATURE = Some("aes,crc,dit,dotprod,dpb,dpb2,fcma,fhm,flagm,fp16,frintts,jsconv,lor,lse,neon,paca,pacg,pan,pmuv3,ras,rcpc,rcpc2,rdm,sb,sha2,sha3,ssbs,vh")
running: "cc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-gdwarf-2" "-fno-omit-frame-pointer" "-arch" "arm64" "-I" "./llama.cpp" "-Wall" "-Wextra" "-Wall" "-Wextra" "-Wpedantic" "-Wcast-qual" "-Wdouble-promotion" "-Wshadow" "-Wstrict-prototypes" "-Wpointer-arith" "-march=native" "-mtune=native" "-o" "/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out/./llama.cpp/ggml.o" "-c" "./llama.cpp/ggml.c"
exit status: 0
cargo:rerun-if-env-changed=AR_aarch64-apple-darwin
AR_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=AR_aarch64_apple_darwin
AR_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_AR
HOST_AR = None
cargo:rerun-if-env-changed=AR
AR = None
cargo:rerun-if-env-changed=ARFLAGS_aarch64-apple-darwin
ARFLAGS_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=ARFLAGS_aarch64_apple_darwin
ARFLAGS_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_ARFLAGS
HOST_ARFLAGS = None
cargo:rerun-if-env-changed=ARFLAGS
ARFLAGS = None
running: ZERO_AR_DATE="1" "ar" "cq" "/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out/libggml.a" "/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out/./llama.cpp/ggml.o"
exit status: 0
running: "ar" "s" "/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out/libggml.a"
exit status: 0
cargo:rustc-link-lib=static=ggml
cargo:rustc-link-search=native=/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out
TARGET = Some("aarch64-apple-darwin")
OPT_LEVEL = Some("0")
HOST = Some("aarch64-apple-darwin")
cargo:rerun-if-env-changed=CXX_aarch64-apple-darwin
CXX_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=CXX_aarch64_apple_darwin
CXX_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_CXX
HOST_CXX = None
cargo:rerun-if-env-changed=CXX
CXX = None
cargo:rerun-if-env-changed=CXXFLAGS_aarch64-apple-darwin
CXXFLAGS_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=CXXFLAGS_aarch64_apple_darwin
CXXFLAGS_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_CXXFLAGS
HOST_CXXFLAGS = None
cargo:rerun-if-env-changed=CXXFLAGS
CXXFLAGS = None
cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
CRATE_CC_NO_DEFAULTS = None
DEBUG = Some("true")
CARGO_CFG_TARGET_FEATURE = Some("aes,crc,dit,dotprod,dpb,dpb2,fcma,fhm,flagm,fp16,frintts,jsconv,lor,lse,neon,paca,pacg,pan,pmuv3,ras,rcpc,rcpc2,rdm,sb,sha2,sha3,ssbs,vh")
running: "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-gdwarf-2" "-fno-omit-frame-pointer" "-arch" "arm64" "-shared" "-I" "./llama.cpp/examples" "-I" "./llama.cpp" "-Wall" "-Wextra" "-Wall" "-Wdeprecated-declarations" "-Wunused-but-set-variable" "-Wextra" "-Wpedantic" "-Wcast-qual" "-Wno-unused-function" "-Wno-multichar" "-march=native" "-mtune=native" "-o" "/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out/./llama.cpp/examples/common.o" "-c" "./llama.cpp/examples/common.cpp"
cargo:warning= bool interactive_first = false; // wait for user input immediately
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:71:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool multiline_input = false; // reverse the usage of `\`
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:73:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool instruct = false; // instruction mode (used for Alpaca models)
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:74:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool penalize_nl = true; // consider newlines as a repeatable token
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:75:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool perplexity = false; // compute perplexity over the prompt
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:76:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool use_mmap = true; // use mmap for faster loads
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:77:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool use_mlock = false; // use mlock to keep model in memory
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:78:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool mem_test = false; // compute maximum memory usage
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:79:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool numa = false; // attempt optimizations that help on some NUMA systems
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:80:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool export_cgraph = false; // export the computation graph
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:81:28: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool verbose_prompt = false; // print prompt tokens before generation
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:100:6: error: no template named 'tuple' in namespace 'std'
cargo:warning=std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(const gpt_params & params);
cargo:warning=~~~~~^
cargo:warning=./llama.cpp/examples/common.h:123:26: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool multiline_input = false;
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:124:20: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= bool use_color = false;
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:125:27: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= console_color_t color = CONSOLE_COLOR_DEFAULT;
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:127:15: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= FILE* out = stdout;
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.h:131:15: warning: default member initializer for non-static data member is a C++11 extension [-Wc++11-extensions]
cargo:warning= FILE* tty = nullptr;
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.cpp:537:6: error: no template named 'tuple' in namespace 'std'
cargo:warning=std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(const gpt_params & params) {
cargo:warning=~~~~~^
cargo:warning=./llama.cpp/examples/common.cpp:538:5: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
cargo:warning= auto lparams = llama_context_default_params();
cargo:warning= ^
cargo:warning=./llama.cpp/examples/common.cpp:556:21: error: no member named 'make_tuple' in namespace 'std'
cargo:warning= return std::make_tuple(nullptr, nullptr);
cargo:warning= ~~~~~^
cargo:warning=./llama.cpp/examples/common.cpp:563:21: error: no member named 'make_tuple' in namespace 'std'
cargo:warning= return std::make_tuple(nullptr, nullptr);
cargo:warning= ~~~~~^
cargo:warning=./llama.cpp/examples/common.cpp:575:25: error: no member named 'make_tuple' in namespace 'std'
cargo:warning= return std::make_tuple(nullptr, nullptr);
cargo:warning= ~~~~~^
cargo:warning=./llama.cpp/examples/common.cpp:579:17: error: no member named 'make_tuple' in namespace 'std'
cargo:warning= return std::make_tuple(model, lctx);
cargo:warning= ~~~~~^
cargo:warning=63 warnings and 6 errors generated.
exit status: 1
--- stderr
error occurred: Command "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-gdwarf-2" "-fno-omit-frame-pointer" "-arch" "arm64" "-shared" "-I" "./llama.cpp/examples" "-I" "./llama.cpp" "-Wall" "-Wextra" "-Wall" "-Wdeprecated-declarations" "-Wunused-but-set-variable" "-Wextra" "-Wpedantic" "-Wcast-qual" "-Wno-unused-function" "-Wno-multichar" "-march=native" "-mtune=native" "-o" "/Users/jorgosnomikos/RustroverProjects/rust-llama.cpp/target/debug/build/llama_cpp_rs-16e92bd0bb55faf0/out/./llama.cpp/examples/common.o" "-c" "./llama.cpp/examples/common.cpp" with args "c++" did not execute successfully (status code exit status: 1).
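The warnings ("default member initializer ... is a C++11 extension") and errors ("no template named 'tuple'") all indicate the compiler is running in a pre-C++11 dialect. A hedged sketch of a possible fix in the crate's build script, using the cc crate's `flag_if_supported` (the file list here is illustrative, not the crate's actual build.rs):

```rust
// Hypothetical build.rs excerpt: request C++11 explicitly so that
// std::tuple and default member initializers in common.h compile.
fn main() {
    cc::Build::new()
        .cpp(true)
        .include("./llama.cpp")
        .include("./llama.cpp/examples")
        .file("./llama.cpp/examples/common.cpp")
        .flag_if_supported("-std=c++11") // no-op on compilers that reject it
        .compile("common");
}
```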
Llama.cpp has had support for BNF-style grammars for a while, but I don't see how to use them with these bindings.
Is there a way?
If not, is there a good starting place for hooking them up? I could take a whack at it, but I don't know a lot about C++ <-> Rust bindings.
Hi, I cannot compile on my Win11 machine.
This is the verbose output:

PS C:\Users\gtnom\RustroverProjects\rust-llama.cpp> cargo build --verbose
Fresh unicode-ident v1.0.9
Fresh glob v0.3.1
Fresh minimal-lexical v0.2.1
Fresh regex-syntax v0.7.2
Fresh either v1.8.1
Fresh once_cell v1.18.0
Fresh log v0.4.19
Fresh shlex v1.1.0
Fresh lazy_static v1.4.0
Fresh proc-macro2 v1.0.63
Fresh regex v1.8.4
Fresh lazycell v1.3.0
Fresh rustc-hash v1.1.0
Fresh bitflags v2.3.3
Fresh peeking_take_while v0.1.2
Fresh cc v1.0.79
Fresh libc v0.2.147
Fresh winapi v0.3.9
Fresh quote v1.0.29
Fresh memchr v2.5.0
Fresh nom v7.1.3
Fresh syn v2.0.22
Fresh libloading v0.7.4
Fresh which v4.4.0
Fresh prettyplease v0.2.9
Fresh clang-sys v1.6.1
Fresh cexpr v0.6.0
Fresh bindgen v0.66.1
Compiling llama_cpp_rs v0.2.0 (C:\Users\gtnom\RustroverProjects\rust-llama.cpp)
Running C:\Users\gtnom\RustroverProjects\rust-llama.cpp\target\debug\build\llama_cpp_rs-684aac4c827c5037\build-script-build
The following warnings were emitted during compilation:
warning: cl : Command line error D8021 : invalid numeric argument '/Wextra'
error: failed to run custom build command for llama_cpp_rs v0.2.0 (C:\Users\gtnom\RustroverProjects\rust-llama.cpp)
Caused by:
process didn't exit successfully: C:\Users\gtnom\RustroverProjects\rust-llama.cpp\target\debug\build\llama_cpp_rs-684aac4c827c5037\build-script-build
(exit code: 1)
--- stdout
cargo:rerun-if-env-changed=TARGET
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64-pc-windows-msvc
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64_pc_windows_msvc
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS
cargo:rerun-if-changed=C:\Program Files\LLVM\lib\clang\17\include\stdbool.h
TARGET = Some("x86_64-pc-windows-msvc")
OPT_LEVEL = Some("0")
HOST = Some("x86_64-pc-windows-msvc")
cargo:rerun-if-env-changed=CC_x86_64-pc-windows-msvc
CC_x86_64-pc-windows-msvc = None
cargo:rerun-if-env-changed=CC_x86_64_pc_windows_msvc
CC_x86_64_pc_windows_msvc = None
cargo:rerun-if-env-changed=HOST_CC
HOST_CC = None
cargo:rerun-if-env-changed=CC
CC = None
cargo:rerun-if-env-changed=CFLAGS_x86_64-pc-windows-msvc
CFLAGS_x86_64-pc-windows-msvc = None
cargo:rerun-if-env-changed=CFLAGS_x86_64_pc_windows_msvc
CFLAGS_x86_64_pc_windows_msvc = None
cargo:rerun-if-env-changed=HOST_CFLAGS
HOST_CFLAGS = None
cargo:rerun-if-env-changed=CFLAGS
CFLAGS = None
cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
CRATE_CC_NO_DEFAULTS = None
CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
DEBUG = Some("true")
running: "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\bin\HostX64\x64\cl.exe" "-nologo" "-MD" "-Z7" "-Brepro" "-I" "./llama.cpp" "-W4" "-Wall" "-Wextra" "-Wpedantic" "-Wcast-qual" "-Wdouble-promotion" "-Wshadow" "-Wstrict-prototypes" "-Wpointer-arith" "-march=native" "-mtune=native" "-FoC:\Users\gtnom\RustroverProjects\rust-llama.cpp\target\debug\build\llama_cpp_rs-261afeb35ceff647\out\./llama.cpp/ggml.o" "-c" "./llama.cpp/ggml.c"
cargo:warning=cl : Command line error D8021 : invalid numeric argument '/Wextra'
exit code: 2
--- stderr
error occurred: Command "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\bin\HostX64\x64\cl.exe" "-nologo" "-MD" "-Z7" "-Brepro" "-I" "./llama.cpp" "-W4" "-Wall" "-Wextra" "-Wpedantic" "-Wcast-qual" "-Wdouble-promotion" "-Wshadow" "-Wstrict-prototypes" "-Wpointer-arith" "-march=native" "-mtune=native" "-FoC:\Users\gtnom\RustroverProjects\rust-llama.cpp\target\debug\build\llama_cpp_rs-261afeb35ceff647\out\./llama.cpp/ggml.o" "-c" "./llama.cpp/ggml.c" with args "cl.exe" did not execute successfully (status code exit code: 2).
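The D8021 error comes from GCC-style flags (`-Wextra`, `-march=native`, ...) being passed verbatim to MSVC's cl.exe, which rejects them. A hedged sketch of one way a build script can avoid this with the cc crate (the file list is illustrative, not the crate's actual build.rs):

```rust
// Hypothetical build.rs excerpt: probe each warning flag instead of passing
// it unconditionally, so compilers that reject it (like cl.exe) skip it.
fn main() {
    let mut build = cc::Build::new();
    build.file("./llama.cpp/ggml.c").include("./llama.cpp");
    for flag in ["-Wall", "-Wextra", "-Wpedantic", "-march=native", "-mtune=native"] {
        build.flag_if_supported(flag); // silently dropped when unsupported
    }
    build.compile("ggml");
}
```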
Some prompts fail with:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: None }', /home/kimt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/llama_cpp_rs-0.3.0/src/lib.rs:528:46
I'm using this model: https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF
Anything I can do to try to fix this problem?
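A Utf8Error with error_len: None usually means a multi-byte UTF-8 character was split across a token boundary and then decoded strictly. A minimal stdlib demonstration (the model and the crate are not involved; the split is simulated):

```rust
fn main() {
    // "é" is two bytes in UTF-8 (0xC3 0xA9). If a token boundary falls
    // between them, the partial byte sequence is not valid UTF-8 on its own.
    let bytes = "café".as_bytes(); // [0x63, 0x61, 0x66, 0xC3, 0xA9]
    let head = &bytes[..4]; // cuts the 'é' in half

    // Strict decoding fails; this is the Err that unwrap() panics on:
    assert!(std::str::from_utf8(head).is_err());

    // A lossy decode substitutes U+FFFD instead of panicking:
    assert_eq!(String::from_utf8_lossy(head), "caf\u{FFFD}");
}
```

Buffering the raw bytes and decoding lossily (or deferring decoding until the sequence is complete) would avoid the panic, though that would be a change inside the binding's token callback path.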
Can't utilize the GPU on Mac with:

llama_cpp_rs = { git = "https://github.com/mdrokz/rust-llama.cpp", version = "0.3.0", features = [
    "metal",
] }
Code
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    let model_options = ModelOptions {
        n_gpu_layers: 1,
        ..Default::default()
    };

    let llama = LLama::new("zephyr-7b-alpha.Q2_K.gguf".into(), &model_options);
    println!("llama: {:?}", llama);

    let predict_options = PredictOptions {
        tokens: 0,
        threads: 14,
        top_k: 90,
        top_p: 0.86,
        token_callback: Some(Box::new(|token| {
            println!("token1: {}", token);
            true
        })),
        ..Default::default()
    };

    llama
        .unwrap()
        .predict(
            "what are the national animals of india".into(),
            predict_options,
        )
        .unwrap();
}
Error
llama_new_context_with_model: kv self size = 64.00 MB
llama_new_context_with_model: ggml_metal_init() failed
llama: Err("Failed to load model")
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Failed to load model"', src/main.rs:40:10
To use Metal, the ggml-metal.metal file needs to be placed in the current directory. In the llm crate we have a little hack that patches ggml-metal.m to include the contents of that file directly in the source code, which is more convenient. See https://github.com/rustformers/llm/blob/9376078c12ea1990bd42e63432656819a056d379/crates/ggml/sys/build.rs#L198
The same hack can be applied here too. I can make a PR if this is deemed a good idea...
Is there no Rust binding to get the embeddings?
Using llama.cpp directly, one would run:
./embedding -m ./path/to/model --log-disable -p "Hello World!" 2>/dev/null
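A hedged sketch of what this might look like through the bindings. The `embeddings` field on ModelOptions and the `embeddings` method on LLama are assumptions (mirroring the upstream go-llama.cpp binding this crate is ported from); check the crate source before relying on them:

```rust
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    // Assumption: ModelOptions exposes an `embeddings` switch analogous to
    // llama.cpp's --embedding flag; field name unverified.
    let model_options = ModelOptions {
        embeddings: true,
        ..Default::default()
    };
    let llama = LLama::new("./path/to/model.gguf".into(), &model_options).unwrap();

    // Assumption: an `embeddings` method returning the embedding vector;
    // signature unverified.
    let emb = llama
        .embeddings("Hello World!".into(), PredictOptions::default())
        .unwrap();
    println!("embedding has {} dimensions", emb.len());
}
```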
I'm in a context where I have to instantiate a LLama instance once and then call it across threads.
Within the compiler errors I see this, which is probably of use:
error: future cannot be sent between threads safely
...
help: within `LLama`, the trait `std::marker::Send` is not implemented for `*mut c_void`
Is there any way to make LLama thread-safe? Or maybe some way to accomplish more or less the same thing, where one model is called to generate text from multiple threads?
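LLama is !Send because it holds a raw pointer to the llama.cpp context. One common workaround is a newtype that asserts Send and is only ever accessed through a Mutex, so the context is touched by one thread at a time. A sketch with a stand-in type (Model and SendModel are illustrative, not the crate's API, and the `unsafe impl` is only sound if every access really is serialized):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for LLama: it holds a raw pointer, which makes it !Send.
struct Model(*mut std::ffi::c_void);

// Workaround: a newtype that asserts Send. Sound only if all access to the
// inner context goes through the Mutex below.
struct SendModel(Model);
unsafe impl Send for SendModel {}

// Run `n` "predictions" against one shared model, one thread at a time.
fn predict_from_threads(n: usize) -> Vec<String> {
    let shared = Arc::new(Mutex::new(SendModel(Model(std::ptr::null_mut()))));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let m = Arc::clone(&shared);
            thread::spawn(move || {
                let _guard = m.lock().unwrap(); // exclusive access to the context
                format!("thread {i}: predict() would run here")
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    for line in predict_from_threads(4) {
        println!("{line}");
    }
}
```

Note that this serializes generation; it shares one model across threads but does not make inference concurrent.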
I'm running the example script with a few different models:
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

pub fn llama_predict() -> Result<String, anyhow::Error> {
    // metal seems to give really bad results
    let model_options = ModelOptions {
        //n_gpu_layers: 1,
        ..Default::default()
    };
    // let model_options = ModelOptions::default();

    let llama = LLama::new(
        "models/mistral-7b-instruct-v0.1.Q4_0.gguf".into(),
        &model_options,
    )
    .unwrap();

    let predict_options = PredictOptions {
        //top_k: 20,
        // top_p: 0.1,
        // f16_kv: true,
        token_callback: Some(Box::new(|token| {
            println!("token: {}", token);
            true
        })),
        ..Default::default()
    };

    // TODO: get this working on master. Metal support is flakey.
    let response = llama
        .predict(
            "what are the national animals of india".into(),
            predict_options,
        )
        .unwrap();
    println!("Response: {}", response);
    Ok(response)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_llama_cpp_rs() -> Result<(), anyhow::Error> {
        let response = llama_predict()?;
        println!("Response: {}", response);
        assert!(!response.is_empty());
        Ok(())
    }
}
When not using Metal (not setting n_gpu_layers), the models generate tokens, e.g.:
token: ind
token: ian
token: national
token: animal
token: is
token: t
token: iger
token:
Response: indian national animal is tiger
Response: indian national animal is tiger
When I use n_gpu_layers, it does not generate tokens, e.g.:
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 64.00 MiB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 76.07 MiB
llama_new_context_with_model: max tensor size = 102.54 MiB
count 0
token:
token:
token:
token:
...
Response:
Response:
Is this a known behavior?
This is in the readme, but it doesn't actually work:
git clone --recurse-submodules https://github.com/mdrokz/rust-llama.cpp
Running this within the repo fixes it:
git submodule add https://github.com/ggerganov/llama.cpp/ llama.cpp
It still won't build, though. Looks like breaking changes in llama.cpp.
Hi, just wanted to say thank you for creating this project! I am testing out building a simple application, identical to your example, but setting the crate type as a lib and building with wasm-pack. And I get the following error:
cargo:warning=clang: warning: argument unused during compilation: '-march=native' [-Wunused-command-line-argument]
cargo:warning=In file included from ./llama.cpp/ggml.c:4:
cargo:warning=./llama.cpp/ggml-impl.h:7:10: fatal error: 'assert.h' file not found
cargo:warning=#include <assert.h>
cargo:warning= ^~~~~~~~~~
cargo:warning=1 error generated.
exit status: 1
I am fairly new to Rust; any ideas on how to work around this? I am running on macOS and just building with "wasm-pack build".
@mdrokz, are you planning to maintain this project? I saw it uses a pretty old llama.cpp version.
When enabling the cuda feature, I get the following error on windows:
[...]
running: "nvcc" "-O0" "-ffunction-sections" "-fdata-sections" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "./llama.cpp/ggml-cuda.h" "-Wall" "-Wextra" "--forward-unknown-to-host-compiler" "-arch=native" "/W4" "/Wall" "/wd4820" "/wd4710" "/wd4711" "/wd4820" "/wd4514" "-DGGML_USE_CUBLAS" "-DGGML_CUDA_DMMV_X=32" "-DGGML_CUDA_DMMV_Y=1" "-DK_QUANTS_PER_ITERATION=2" "-Wno-pedantic" "-o" "C:\\dev\\ai_kuinox\\target\\debug\\build\\llama_cpp_rs-dbbb5a5dac5f7f5e\\out\\./llama.cpp/ggml-cuda.o" "-c" "./llama.cpp/ggml-cuda.cu"
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
exit code: 1
When running the example from the readme, the build fails: it cannot find the ggml.o file or directory. I can't figure out why. Has anyone had a successful build on macOS with an M2 chip?
When trying to build it on my Ubuntu 22.04, I'm getting a build error.
cargo build
+ cargo build
Compiling proc-macro2 v1.0.63
Compiling quote v1.0.29
Compiling libc v0.2.147
Compiling memchr v2.5.0
Compiling glob v0.3.1
Compiling unicode-ident v1.0.9
Compiling prettyplease v0.2.9
Compiling cfg-if v1.0.0
Compiling minimal-lexical v0.2.1
Compiling bindgen v0.66.1
Compiling regex-syntax v0.7.2
Compiling either v1.8.1
Compiling bitflags v2.3.3
Compiling rustc-hash v1.1.0
Compiling lazy_static v1.4.0
Compiling shlex v1.1.0
Compiling lazycell v1.3.0
Compiling libloading v0.7.4
Compiling log v0.4.19
Compiling peeking_take_while v0.1.2
Compiling cc v1.0.79
Compiling clang-sys v1.6.1
Compiling nom v7.1.3
Compiling which v4.4.0
Compiling syn v2.0.22
Compiling regex v1.8.4
Compiling cexpr v0.6.0
Compiling llama_cpp_rs v0.3.0 (/home/rodrigo/Documents/SRS/rust-llama.cpp)
error: failed to run custom build command for `llama_cpp_rs v0.3.0 (/home/rodrigo/Documents/SRS/rust-llama.cpp)`
Caused by:
process didn't exit successfully: `/home/rodrigo/Documents/SRS/rust-llama.cpp/target/debug/build/llama_cpp_rs-3e62109abc25cc59/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-env-changed=TARGET
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64-unknown-linux-gnu
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64_unknown_linux_gnu
cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS
--- stderr
thread 'main' panicked at 'Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"', /home/rodrigo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen-0.66.1/lib.rs:604:31
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
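This is bindgen failing to locate the libclang shared library at build time. On Ubuntu 22.04 it can be installed via apt, or LIBCLANG_PATH can be pointed at an existing install (the llvm-14 path below is what Ubuntu 22.04 ships by default; the version may differ on other systems):

```shell
# Install libclang so bindgen can find it (Ubuntu 22.04)
sudo apt-get update
sudo apt-get install -y clang libclang-dev

# If bindgen still cannot find it, point LIBCLANG_PATH at the directory
# containing libclang.so (path varies with the LLVM version installed):
export LIBCLANG_PATH=/usr/lib/llvm-14/lib
cargo build
```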
First, thanks for your work :)
I'm trying to silence llama.cpp output and keep only the answer.
I've closed stderr temporarily while loading the model (this is not a nice approach, but it works).
// Save the real stderr, then point fd 2 at /dev/null while the model loads.
// (The original fdopen/dup2 version was broken: fdopen returns a FILE*,
// not an fd, and "w" was not NUL-terminated.)
let saved = unsafe { libc::dup(libc::STDERR_FILENO) };
let devnull = unsafe {
    libc::open(b"/dev/null\0".as_ptr() as *const libc::c_char, libc::O_WRONLY)
};
unsafe {
    libc::dup2(devnull, libc::STDERR_FILENO);
}

let llama = LLama::new(model, &options);

unsafe {
    // Restore the original stderr.
    libc::dup2(saved, libc::STDERR_FILENO);
    libc::close(devnull);
    libc::close(saved);
}
But when I call predict I still have an unwanted output: count 0.
Maybe you can change it to log::debug!("count {}", reverse_count)?
I've been playing around with the Python and Rust bindings of llama and noticed that Python was producing content much faster despite the same model and input.
When I printed out the args/specs of the run, I noticed some things were missing from the Rust binding that Python was using:
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: n_yarn_orig_ctx = 2048
I am not sure whether Python is using better settings or I am using inappropriately poor ones, since I have been playing around with threads, n_batch, batch, and n_gpu_layers.
I tried to find comments in the Rust code but couldn't find anything.
Any recommendations?
When attempting to run dolphin-2_6-phi-2.Q4_0.gguf, I'm getting: error loading model: unknown model architecture: 'phi2'.
Phi2 support was added a couple of weeks ago: ggerganov/llama.cpp#4490.
Is there a way to include this?
For reference, I am using this repo as part of a different package using current master:
[dependencies]
llama_cpp_rs = { git = "https://github.com/mdrokz/rust-llama.cpp.git", rev = "4922cac", features = ["metal"] }
Hello!
I'm trying to run the basic CPU example in the repo and I'm facing the following error when trying to load the "wizard-vicuna-13B.ggmlv3.q4_0.bin" model:
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /<hidden>/models/wizard-vicuna-13B.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
called `Result::unwrap()` on an `Err` value: "Failed to load model"
thread 'llama::tests::cuda_inference' panicked at 'called `Result::unwrap()` on an `Err` value: "Failed to load model"', app/llm/src/llama.rs:84:127
stack backtrace:
Then I tried other .gguf models, and in all my attempts the code would load the model but get stuck in the prediction until I got a free() error (which would take some minutes).
Does llama.cpp not support .bin files, and are the llama models just so heavy that I can't run them on my notebook? (I have an Intel® Core™ i5-12500H and an NVIDIA® GeForce® RTX™ 3050 Ti with 4 GB of GDDR6.)
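The magic number in that error explains the failure: 0x67676a74 decodes to ASCII "ggjt", the old pre-GGUF ggml container that .ggmlv3.q4_0.bin files use, while current llama.cpp only loads files beginning with "GGUF". A stdlib check of the decode (converting the model to GGUF, e.g. with llama.cpp's conversion scripts, is the likely fix):

```rust
// Decode a 4-byte magic (as printed big-endian in the error) to ASCII.
fn magic_to_ascii(magic: u32) -> String {
    magic.to_be_bytes().iter().map(|&b| b as char).collect()
}

fn main() {
    // The rejected .bin file starts with the old "ggjt" (ggml v3) magic:
    assert_eq!(magic_to_ascii(0x6767_6a74), "ggjt");
    // A GGUF file begins with the literal bytes "GGUF" instead.
    println!("0x67676a74 = {:?}", magic_to_ascii(0x6767_6a74));
}
```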