
mybigday / llama.rn

React Native binding of llama.cpp

License: MIT License

JavaScript 0.15% C++ 55.59% TypeScript 1.16% Swift 0.01% Ruby 0.10% C 36.68% Objective-C 4.62% Objective-C++ 0.64% Shell 0.08% CMake 0.05% Java 0.92%
android ios llama llama-cpp llm react-native

llama.rn's People

Contributors

jhen0409, smashinfries


llama.rn's Issues

Feature Request: TextStreaming

Is it possible to add a text streaming feature? It looks like you're loading a local C++ server; I wonder, does Swift support sockets for React Native? Inference is quite slow on mobile devices right now, and streaming would help the user know something is happening. I'm interested in contributing if you need contributors. I believe streaming is supported by llama.cpp in LangChain's implementation, but I'm not sure if that's custom. A sketch of what I mean follows below.
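
For what it's worth, the completion API in this library already accepts a per-token callback (see the code snippet in the context-limit issue below), which gives token-by-token streaming without sockets. A minimal sketch, where '/path/to/model.gguf' is a placeholder for a real on-device file:

import { initLlama } from 'llama.rn';

// Streaming sketch: completion() takes a second argument that is invoked
// once per generated token, so the UI can render text as it arrives.
const streamCompletion = async (prompt: string, onToken: (t: string) => void) => {
  const context = await initLlama({ model: '/path/to/model.gguf', n_ctx: 2048 });
  const result = await context.completion(
    { prompt, n_predict: 128 },
    (data) => onToken(data.token), // fires for each partial token
  );
  return result.text; // full text once generation finishes
};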

LLaVa support

llama.cpp includes a LLaVA example (plus clip.cpp), which we could use to provide vision support. We may implement it after #30 is done.

Also, it would be great to make another package named clip.rn or react-native-clip, but I'm afraid we don't currently have the resources to maintain it, so just keep it in mind.

cannot load model

There is an issue in the README regarding model loading: it mentions the GGUF model format but lacks clear instructions. Is file loading implemented yet? The result is always "No model found".
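
For reference, a minimal loading sketch consistent with the snippet in the context-limit issue below; it assumes the .gguf file was already downloaded or copied into the app's document directory (the file name is a placeholder):

import * as FileSystem from 'expo-file-system';
import { initLlama } from 'llama.rn';

// Placeholder path: the .gguf file must already exist on-device, e.g. after
// fetching it with FileSystem.downloadAsync or bundling it with the app.
const modelPath = `${FileSystem.documentDirectory}model.gguf`;

const loadModel = async () => {
  const info = await FileSystem.getInfoAsync(modelPath);
  if (!info.exists) {
    // Matches the reported symptom: the file simply isn't where we look.
    throw new Error(`No model found at ${modelPath}`);
  }
  return initLlama({ model: modelPath, n_ctx: 2048 }); // resolves with a context on success
};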

Error Initializing Llama Model: Context Limit Reached

Description

When using this library in an Expo v50 app, I hit a persistent error that stopped completions from working. The app functioned correctly at first, but after a series of prompts it failed with the error "[Error: Context limit reached]". I suspect this is an easy fix on my end? Any help is appreciated :)

Steps to Reproduce

  1. Initialize the Llama model with standard settings.
  2. Send multiple messages to the model.
  3. Observe the failure after several interactions.

Environment

  • Expo version: v50
  • Model: phi2 3B Q2_K - Medium
  • Device: iPhone 14 Pro Max (development build)

Code Snippet

import * as FileSystem from 'expo-file-system';
import { initLlama } from 'llama.rn';

export const runLlama = async (message: string, modelId: string) => {
  const MODEL_FILE_PATH = `${FileSystem.documentDirectory}${modelId}.gguf`;
  return new Promise((resolve, reject) => {
    initLlama({
      model: MODEL_FILE_PATH,
      use_mlock: true,
      n_ctx: 2048,
      n_gpu_layers: 1,
    })
      .then((context) => {
        context
          .completion(
            {
              prompt: message,
              n_predict: 60,
              temperature: 0.7,
              top_p: 1.0,
              stop: ['</s>', 'Llama:', 'User:'],
            },
            (data) => {
              // Partial-token callback; note this resolves the outer promise
              // on the first token, before the completion has finished.
              const { token } = data;
              resolve({ type: 'partial', token });
            },
          )
          .then((result) => {
            console.log('Completion result:', result.text);
            resolve({ type: 'final', text: result.text });
          })
          .catch((error) => {
            console.log('Error running Llama:', error);
            reject(error);
          });
      })
      .catch((error) => {
        console.log('Error initializing Llama:', error);
        reject(error);
      });
  });
};

Error

 ERROR  Error running Llama: [Error: Context limit reached]
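
One workaround sketch for this symptom, assuming the error means the accumulated prompt plus n_predict no longer fits in n_ctx (2048 here): trim older turns from the conversation before each call. MAX_PROMPT_CHARS is a rough hypothetical proxy for the real token budget; a tokenizer-based check would be more precise.

// Keep only the most recent turns so the prompt stays within the context
// window. Character count is a crude stand-in for token count.
const MAX_PROMPT_CHARS = 4000;

const buildPrompt = (turns: string[]): string => {
  const kept: string[] = [];
  let length = 0;
  // Walk the history newest-to-oldest, dropping old turns once we overflow.
  for (let i = turns.length - 1; i >= 0; i--) {
    length += turns[i].length;
    if (length > MAX_PROMPT_CHARS) break;
    kept.unshift(turns[i]);
  }
  return kept.join('\n');
};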

Android: Cannot load models, stopCompletions not working.

As it says on the tin: loading small 3B models such as TinyLlama or StableLM does not work. Tested models:

Attempting to call initLlama results in

  • Error: Failed to initialize context

Which I can only assume is here:

I do not know enough about native functions to investigate further.

In addition, stopCompletions() does not stop a completion on Android.
Thanks for your work, the project is fantastic otherwise.
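
For context, a usage sketch of how stopping is typically wired up, assuming the method is named stopCompletion as referenced in the parallel-decoding issue below; whether the native side honors it on Android is exactly what this report questions:

import { initLlama } from 'llama.rn';

// Start a completion, then cancel it from elsewhere (e.g. a "Stop" button).
// '/path/to/model.gguf' is a placeholder for a real on-device file.
const run = async () => {
  const context = await initLlama({ model: '/path/to/model.gguf', n_ctx: 2048 });
  const pending = context.completion({ prompt: 'Hello', n_predict: 512 }, () => {});
  setTimeout(() => context.stopCompletion(), 2000); // request cancellation after 2s
  const result = await pending; // should resolve with whatever was generated so far
  console.log(result.text);
};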

stablelm-2-zephyr-1_6b-Q8_0.gguf does not work

Hello,

I've been working on getting stablelm-2-zephyr-1_6b-Q8_0.gguf operational (link: https://huggingface.co/spaces/stabilityai/stablelm-2-1_6b-zephyr), especially since the 3B version seems to function quite well. However, the 1.6B version fails to initialize the context. I'm currently compiling the library from the latest master branch. Is there a straightforward modification I can make on my end to resolve this?

From the logs:

01-29 22:50:05.365 3017 20732 E RNLLAMA_LOG_ANDROID: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 340, got 268

Thank you.

Crash on loading specific model

Related to:
Vali-98/ChatterUI#20

Model used:
https://huggingface.co/Crataco/stablelm-2-1_6b-chat-imatrix-GGUF/blob/main/stablelm-2-1_6b-chat.Q4_K_M.imx.gguf

llama.rn version:
0.3.1

Error provided by llama.rn:

[RNLlama] is_model_loaded false
handling signal: 11
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x698 in tid 4201 (AsyncTask #1), pid 31056 (ali98.ChatterUI)

From what I can tell, it's attempting to access memory outside its address space. Oddly enough, this doesn't occur in the emulator, only in built APKs.
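
Before suspecting the native loader, one sanity-check sketch: verify the on-device file actually starts with the GGUF magic bytes, since a truncated or corrupted download is a common cause of is_model_loaded false followed by a crash. Assumes expo-file-system is available:

import * as FileSystem from 'expo-file-system';

// Every valid GGUF file begins with the ASCII magic "GGUF". Reading the
// first 4 bytes as base64 should therefore yield "R0dVRg==".
const isLikelyGguf = async (fileUri: string): Promise<boolean> => {
  const head = await FileSystem.readAsStringAsync(fileUri, {
    encoding: FileSystem.EncodingType.Base64,
    position: 0,
    length: 4,
  });
  return head === 'R0dVRg==';
};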

Parallel decoding

llama.cpp now supports parallel decoding within one context, so we can support it here.

Breaking change: deprecate the stopCompletion method and move stopping into the return value of completion.
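
One possible shape for that breaking change (purely a sketch, not a decided API): have completion() return a handle that carries both the result promise and a per-request stop method, so each of several parallel requests can be cancelled individually.

// Hypothetical API sketch for the proposed change; none of these names are
// final. Each request gets its own stop() instead of a context-wide call.
interface CompletionHandle {
  promise: Promise<{ text: string }>;
  stop: () => Promise<void>;
}

// Hypothetical usage once such an API exists:
//   const handle = context.completion(params, onToken);
//   await handle.stop();                 // cancels just this request
//   const result = await handle.promise; // partial text generated so far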

Failure to initLlama on Xiaomi phones.

Hello again, I've received reports from users of ChatterUI that model loading fails on Xiaomi-branded phones:

Confirmed not working:

  • Xiaomi Poco F5 - Android 14
  • Redmi 10C - Android 13

I've also queried about other phones, and got a few responses for working devices.

Confirmed working:

  • Samsung A71 - Android 13
  • Samsung M52 - Android 13

Version used:

  • llama.rn 0.3.0-rc.14

Logcat response on the tested Poco F5:

RNLLAMA_ANDROID_JNI: [RNLlama] is_model_loaded false

There aren't enough users to confirm this is a trend across all Xiaomi phones, but it is peculiar.

[Android] Seed value does not create deterministic outputs.

As mentioned in the title, setting a seed value does not make an output deterministic on Android.

  • llama.rn version: 0.3.0-rc.13

  • Model used: phi-2.Q3_K_M.gguf

  • Android Devices Tested on: Emulated Pixel 3a - Android 14

Params used:

{
  "frequency_penalty": 0, 
  "grammar": "", 
  "min_p": 0.07, 
  "mirostat": 0, 
  "mirostat_eta": 0.1, 
  "mirostat_tau": 5, 
  "n_predict": 288, 
  "n_threads": 5, 
  "presence_penalty": 0, 
  "prompt": "", 
  "repeat_penalty": 1, 
  "seed": 2, 
  "stop": ["User:", "### Response: "], 
  "temperature": 1, 
  "tfs_z": 1, 
  "top_k": 0, 
  "top_p": 1, 
  "typical_p": 1
}
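
A minimal repro sketch for the determinism check, assuming two runs with identical params (including seed: 2) over fresh contexts should produce identical text:

import { initLlama } from 'llama.rn';

// Run the same completion twice with the same seed and compare outputs.
// This issue reports the comparison comes out false on Android.
const checkSeed = async (modelPath: string) => {
  const params = { prompt: 'Once upon a time', n_predict: 32, seed: 2, temperature: 1 };
  const a = await (await initLlama({ model: modelPath })).completion(params);
  const b = await (await initLlama({ model: modelPath })).completion(params);
  console.log('deterministic:', a.text === b.text);
};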

Update llamacpp module to latest

I am having trouble updating the llama.cpp submodule myself. Could the project be updated? llama.cpp has added support for a few new base models that currently do not work in llama.rn.

Bump llamacpp version pls

Could we get a new release with the latest llama.cpp? llama.cpp is evolving fast, and as far as I understand it received a few fixes over the last few days related to Llama 3. In other words, Llama 3 is broken with the current release (the models do run, but the quality of the GGUF versions is apparently affected).

OpenCL Implementation for Android

First of all, thanks for the hard work on bringing this project to the react-native ecosystem.

I have been using llama.rn for a few weeks now in my personal project:
https://github.com/Vali-98/ChatterUI

I was wondering if there is any interest in implementing OpenCL for Android. I have attempted to work on it myself with little success, given my inexperience with native modules.

Early stopping inference

Shouldn't there be a function that allows the user to stop inference? It could be implemented as a callback function, just like whisper.rn's realtimeInference(). A stopgap sketch follows below.
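
In the meantime, one stopgap sketch: watch tokens in the partial callback and request cancellation when a caller-supplied condition fires (assuming context.stopCompletion() exists, per the issues above):

// Minimal structural type for the bits of the context used here.
type Ctx = {
  completion: (params: object, onToken: (d: { token: string }) => void) => Promise<{ text: string }>;
  stopCompletion: () => Promise<void>;
};

// Early-stop sketch: the per-token callback checks a predicate (e.g. a
// "Stop" flag toggled from the UI) and asks the context to cancel.
const completeWithEarlyStop = (ctx: Ctx, prompt: string, shouldStop: () => boolean) =>
  ctx.completion({ prompt, n_predict: 256 }, () => {
    if (shouldStop()) ctx.stopCompletion(); // request mid-generation cancel
  });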
