tjake / jlama

Jlama is a modern Java inference engine for LLMs

License: Apache License 2.0

Java 54.86% Shell 0.22% Makefile 0.15% C 43.23% JavaScript 0.96% CSS 0.12% HTML 0.34% Dockerfile 0.09% Batchfile 0.05%
ai java llm simd transformers gpt llama llama2 openai huggingface

jlama's Issues

CodeLlama loading is broken?

This worked in the Oct 15 jlama:

$ ./run-cli.sh complete -p "def fib(" -t 0.2 -tc 24 -n 100 models/CodeLlama-7b-hf

Now it OOMs (note that I have doubled the default Xmx, which was not necessary in October):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at picocli.CommandLine.executeUserObject(CommandLine.java:2035)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
	at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:30)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:111)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:66)
	at com.github.tjake.jlama.cli.commands.CompleteCommand.run(CompleteCommand.java:16)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:107)
	... 11 more
Caused by: java.lang.OutOfMemoryError
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
	at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.IntPipeline.forEach(IntPipeline.java:463)
	at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:620)
	at com.github.tjake.jlama.model.llama.LlamaModel.loadTransformerBlockWeights(LlamaModel.java:56)
	at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:109)
	at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:31)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	... 14 more
Caused by: java.lang.OutOfMemoryError: Cannot reserve 180355136 bytes of direct buffer memory (allocated: 25708094948, limit: 25769803776)
	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
	at com.github.tjake.jlama.util.UnsafeDirectByteBuffer.allocateAlignedByteBuffer(UnsafeDirectByteBuffer.java:36)
	at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:73)
	at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:112)
	at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:16)
	at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:172)
	at com.github.tjake.jlama.model.llama.LlamaModel.lambda$loadTransformerBlockWeights$1(LlamaModel.java:70)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
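The final frame shows the allocation hitting the JVM's direct-buffer ceiling, not the heap: jlama's FloatBufferTensor allocates off-heap via ByteBuffer.allocateDirect, which is capped by -XX:MaxDirectMemorySize and defaults to roughly the -Xmx value when that flag is unset. A minimal sketch (the class name is illustrative, not part of jlama) to check what the running JVM was given:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectMemoryCheck {
    public static void main(String[] args) {
        // Direct ByteBuffers (used by jlama's FloatBufferTensor) are capped by
        // -XX:MaxDirectMemorySize, which defaults to roughly the -Xmx value
        // when not set explicitly -- so raising -Xmx alone raises both limits,
        // but the two pools still compete for the same budget.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        boolean explicit = jvmArgs.stream()
                .anyMatch(a -> a.startsWith("-XX:MaxDirectMemorySize"));
        System.out.println("MaxDirectMemorySize set explicitly: " + explicit);
        System.out.println("Heap max (fallback limit): "
                + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

If the flag is not set explicitly, passing e.g. `-XX:MaxDirectMemorySize=30g` may be a lighter-weight workaround than doubling Xmx, since the weights live off-heap.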

Windows build failures

[ERROR] testSaxpy(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.051 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testSaxpy(TestOperations.java:180)

[ERROR] testSxpby(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.031 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testSxpby(TestOperations.java:214)

[ERROR] testAccumulate(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.019 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testAccumulate(TestOperations.java:118)

[ERROR] testDotProduct(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.144 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testDotProduct(TestOperations.java:85)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TestOperations.testAccumulate:118 » ClassCast a Vector<class java.lang.Integer...
[ERROR]   TestOperations.testDotProduct:85 » ClassCast a Vector<class java.lang.Integer>...
[ERROR]   TestOperations.testSaxpy:180 » ClassCast a Vector<class java.lang.Integer>: re...
[ERROR]   TestOperations.testSxpby:214 » ClassCast a Vector<class java.lang.Integer>: re...
[INFO]
[ERROR] Tests run: 17, Failures: 0, Errors: 4, Skipped: 6
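These failures are typical of code that pins a 512-bit vector species on a machine whose CPU only supports 256-bit lanes (AVX2, common on Windows desktops). A hedged sketch of the usual fix, querying the platform's preferred species at runtime instead of assuming S_512_BIT (class and field names here are illustrative, not jlama's actual code; requires JDK 16+ with --add-modules jdk.incubator.vector):

```java
// Compile/run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public class SpeciesCheck {
    // Ask the JVM for the widest species this CPU actually supports, so an
    // AVX2-only machine gets Species[int, 8, S_256_BIT] instead of tripping a
    // ClassCastException against a hardcoded Species[int, 16, S_512_BIT].
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    public static void main(String[] args) {
        System.out.println("Preferred int species: " + SPECIES);
        System.out.println("Lanes: " + SPECIES.length());
    }
}
```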

./download-hf-model.sh won't work on Windows

Writing it in Java itself solves that, so I did. Happy to contribute it; here it is, FWIW:

package com.github.tjake.jlama.cli;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DownloadModel {
    private static final String HF_ACCESS_TOKEN = System.getenv("HF_ACCESS_TOKEN");
    private static final String MODEL_DIR = "models";

    public static void main(String[] args) throws IOException {
        if (args.length != 1 || args[0].equals("-h")) {
            usage();
            System.exit(1);
        }

        String hfModel = args[0];
        String authHeader = null;
        if (HF_ACCESS_TOKEN != null && !HF_ACCESS_TOKEN.isEmpty()) {
            // Header value only; the "Authorization" name is set at the connection.
            authHeader = "Bearer " + HF_ACCESS_TOKEN;
        }

        InputStream modelInfoStream = getResponse("https://huggingface.co/api/models/" + hfModel, authHeader);
        String modelInfo = readInputStream(modelInfoStream);

        if (modelInfo == null) {
            System.out.println("No valid model found or trying to access a restricted model (use HF_ACCESS_TOKEN env. var.)");
            System.exit(1);
        }

        List<String> allFiles = parseFileList(modelInfo);
        if (allFiles.isEmpty()) {
            System.out.println("No valid model found");
            System.exit(1);
        }

        List<String> tensorFiles = new ArrayList<>();
        for (String currFile : allFiles) {
            if (currFile.contains("safetensor")) {
                tensorFiles.add(currFile);
            }
        }

        if (tensorFiles.isEmpty()) {
            System.out.println("Model is not available in safetensor format");
            System.exit(1);
        }

        tensorFiles.addAll(Arrays.asList("config.json", "vocab.json", "tokenizer.json"));

        Path modelDir = Paths.get(MODEL_DIR, hfModel);
        try {
            Files.createDirectories(modelDir);
        } catch (IOException e) {
            System.out.println("Error creating directory: " + modelDir);
            System.exit(1);
        }

        // Download only the safetensor weights plus config/tokenizer files,
        // not every file in the repo (which may include pytorch .bin duplicates).
        for (String currFile : tensorFiles) {
            System.out.println("Downloading file: " + modelDir.resolve(currFile));
            downloadFile(hfModel, currFile, authHeader, modelDir.resolve(currFile));
        }

        System.out.println("Downloading file: " + modelDir.resolve("tokenizer.model") + " (if it exists)");
        try {
            downloadFile(hfModel, "tokenizer.model", authHeader, modelDir.resolve("tokenizer.model"));
        } catch (IOException e) {
            // Optional file; many safetensor-only repos don't ship it.
        }

        System.out.println("Done! Model downloaded in ./" + MODEL_DIR + "/" + hfModel);
    }

    private static void usage() {
        System.out.println("""
                usage: java DownloadModel [-h] owner/model_name

                This program downloads a model's safetensor files and inference configuration from Hugging Face.
                To download restricted models set the HF_ACCESS_TOKEN environment variable to a valid HF access token.
                To create a token see https://huggingface.co/settings/tokens

                OPTIONS:
                   -h   Show this message

                EXAMPLES:
                    java DownloadModel gpt2-medium
                    java DownloadModel meta-llama/Llama-2-7b-chat-hf""");
    }

    private static List<String> parseFileList(String modelInfo) {
        List<String> fileList = new ArrayList<>();
        try {
            ObjectMapper objectMapper = new ObjectMapper();
            JsonNode rootNode = objectMapper.readTree(modelInfo);
            JsonNode siblingsNode = rootNode.path("siblings");
            if (siblingsNode.isArray()) {
                for (JsonNode siblingNode : siblingsNode) {
                    String rFilename = siblingNode.path("rfilename").asText();
                    fileList.add(rFilename);
                }
            }
        } catch (IOException e) {
            System.out.println("Error parsing JSON: " + e.getMessage());
        }
        return fileList;
    }

    public static InputStream getResponse(String urlString, String authHeader) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();

            // Set the request method
            connection.setRequestMethod("GET");

            // Set the request header
            if (authHeader != null)
                connection.setRequestProperty("Authorization", authHeader);

            // Get the response code
            int responseCode = connection.getResponseCode();

            if (responseCode == HttpURLConnection.HTTP_OK) {
                // If the response code is 200 (HTTP_OK), return the input stream
                return connection.getInputStream();
            } else {
                // If the response code is not 200, throw an IOException
                throw new IOException("HTTP response code: " + responseCode);
            }
        } catch (IOException ioe) {
            System.out.println("WARNING: Fetch of URL " + urlString + " failed due to " + ioe);
            return null;
        }
    }

    public static String readInputStream(InputStream inStream) throws IOException {
        if (inStream == null) return null;

        BufferedReader inReader = new BufferedReader(new InputStreamReader(inStream));
        StringBuilder stringBuilder = new StringBuilder();

        String currLine;
        while ((currLine = inReader.readLine()) != null) {
            stringBuilder.append(currLine);
            stringBuilder.append(System.lineSeparator());
        }

        return stringBuilder.toString();
    }
    private static void downloadFile(String hfModel, String currFile, String authHeader, Path outputPath) throws IOException {
        InputStream inStream = getResponse("https://huggingface.co/" + hfModel + "/resolve/main/" + currFile, authHeader);
        if (inStream == null)
            throw new IOException("WARNING: Fetch of file " + currFile + " failed.");
        Files.copy(inStream, outputPath, StandardCopyOption.REPLACE_EXISTING);
    }
}

streaming server support?

Is there a way to run jlama as an API server that streams responses, compatible with the OpenAI API specification?
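Whether jlama exposes this is for the maintainer to answer, but for reference, an OpenAI-compatible streaming endpoint emits server-sent events whose data payloads are "chat.completion.chunk" JSON objects, terminated by "data: [DONE]". A minimal sketch of that wire framing (the SseChunk class is illustrative, not part of jlama; field names follow the public OpenAI API):

```java
public class SseChunk {
    // Build one SSE event carrying a single streamed token in the
    // OpenAI "chat.completion.chunk" shape.
    static String chunk(String id, String token) {
        String json = String.format(
            "{\"id\":\"%s\",\"object\":\"chat.completion.chunk\","
            + "\"choices\":[{\"index\":0,\"delta\":{\"content\":\"%s\"}}]}",
            id, token);
        return "data: " + json + "\n\n"; // SSE framing: blank line ends the event
    }

    public static void main(String[] args) {
        System.out.print(chunk("cmpl-1", "Hello"));
        System.out.print(chunk("cmpl-1", " world"));
        System.out.print("data: [DONE]\n\n"); // stream terminator per the OpenAI spec
    }
}
```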

Feature request: support for the smallest reasonable codegen model

I want to build a local Copilot with jlama, but generalist models are too big and slow.

Three candidates I found:
replit-code-v1_5-3b:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.MPT

codegen-2B-multi:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.CODEGEN

WizardCoder-1B-V1.0 (using the safetensors branch):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@693fe6c9): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.GPT_BIGCODE
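All three errors come from the same place: the architecture name from the model's config is looked up as a ModelSupport.ModelType enum constant, and architectures jlama doesn't implement (MPT, CODEGEN, GPT_BIGCODE) fall through to IllegalArgumentException. A sketch of that pattern, with a hypothetical enum standing in for jlama's actual list:

```java
public class ModelTypeDemo {
    // Hypothetical subset standing in for jlama's supported-architecture enum;
    // valueOf() on an unlisted name throws IllegalArgumentException, which is
    // exactly the "No enum constant ... ModelType.MPT" failure above.
    enum ModelType { GPT2, LLAMA, BERT }

    public static void main(String[] args) {
        for (String t : new String[]{"llama", "mpt"}) {
            try {
                System.out.println("Supported: " + ModelType.valueOf(t.toUpperCase()));
            } catch (IllegalArgumentException e) {
                System.out.println("Unsupported architecture: " + t);
            }
        }
    }
}
```

So supporting any of these candidates means adding a new model implementation, not just a config tweak.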

File model.safetensors.index.json not found

I downloaded the model directly from Meta's repo, not Hugging Face, but the code looks for a file called model.safetensors.index.json when loading with loadWithWeights.

I do not have this file. Where is it coming from? There is a file called params.json:

{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}

Is that the same?
