janhq / nitro

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan

Home Page: https://cortex.so

License: Apache License 2.0

Shell 0.34% CMake 0.55% C++ 59.29% Batchfile 0.13% C 14.92% Makefile 0.36% TypeScript 23.48% JavaScript 0.30% Dockerfile 0.19% Python 0.27% Inno Setup 0.17%
gguf llama2 llamacpp tensorrt-llm accelerated ai inference-engine openai-api stable-diffusion cuda llama llm llms

nitro's Introduction

Cortex


Documentation - API Reference - Changelog - Bug reports - Discord

⚠️ Cortex is currently in Development: Expect breaking changes and bugs!

About

Cortex is an OpenAI-compatible AI engine that developers can use to build LLM apps. It is packaged with a Docker-inspired command-line interface and client libraries. It can be used as a standalone server or imported as a library.

Cortex Engines

Cortex supports the following engines:

  • cortex.llamacpp: a C++ inference library that can be dynamically loaded by any server at runtime. It provides GGUF model inference, and the underlying llama.cpp is optimized for performance on both CPU and GPU.
  • cortex.onnx: a C++ inference library for Windows that leverages onnxruntime-genai and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
  • cortex.tensorrt-llm: a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA's TensorRT-LLM for GPU-accelerated inference.


Quickstart

Prerequisites

  • OS:
    • macOS 13.6 or higher.
    • Windows 10 or higher.
    • Ubuntu 22.04 and later.
  • Dependencies:
    • Node.js: version 18 or above is required to run the installation.
    • NPM: Needed to manage packages.
    • CPU Instruction Sets: Available for download from the Cortex GitHub Releases page.
    • OpenMPI: Required for Linux. Install by using the following command:
      sudo apt install openmpi-bin libopenmpi-dev

Visit Quickstart to get started.

NPM

# Install using NPM
npm i -g cortexso
# Run model
cortex run mistral
# To uninstall globally using NPM
npm uninstall -g cortexso

Homebrew

# Install using Brew
brew install cortexso
# Run model
cortex run mistral
# To uninstall using Brew
brew uninstall cortexso

You can also install Cortex using the Cortex Installer available on GitHub Releases.

Cortex Server

cortex serve

# Output
# Started server at http://localhost:1337
# Swagger UI available at http://localhost:1337/api

You can now access the Cortex API server at http://localhost:1337, and the Swagger UI at http://localhost:1337/api.
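
Because the server is OpenAI-compatible, you can exercise it with plain curl. A minimal sketch, assuming the OpenAI-style /v1/chat/completions route and a model named mistral (both may differ in your install):

# Send a chat-completion request to the local Cortex server
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'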

Build from Source

To install Cortex from source, follow the steps below:

  1. Clone the Cortex repository here.
  2. Navigate to the cortex-js folder.
  3. Open the terminal and run the following command to build the Cortex project:
npx nest build
  4. Make the command.js executable:
chmod +x '[path-to]/cortex/cortex-js/dist/src/command.js'
  5. Link the package globally:
npm link
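
Once linked, the cortex command should be available on your PATH. A quick smoke test (assuming the standard --help flag; output varies by version):

# Verify the globally linked CLI resolves and runs
cortex --help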

Cortex CLI Commands

The following CLI commands are currently available. See CLI Reference Docs for more information.

  serve               Provide the API endpoint for the Cortex backend.
  chat                Send a chat request to a model.
  init|setup          Initialize settings and download Cortex's dependencies.
  ps                  Show running models and their status.
  kill                Kill running Cortex processes.
  pull|download       Download a model. Works with Hugging Face model IDs.
  run [options]       EXPERIMENTAL: Shortcut to start a model and chat.
  models              Subcommands for managing models.
  models list         List all available models.
  models pull         Download a specified model.
  models remove       Delete a specified model.
  models get          Retrieve the configuration of a specified model.
  models start        Start a specified model.
  models stop         Stop a specified model.
  models update       Update the configuration of a specified model.
  benchmark           Benchmark and analyze the performance of a specific AI model on your system.
  presets             Show all available model presets within Cortex.
  telemetry           Retrieve telemetry logs for monitoring and analysis.
  embeddings          Create an embedding vector representing the input text.
  engines             Subcommands for managing engines.
  engines get         Get details of an engine.
  engines list        List all available Cortex engines.
  engines init        Set up and download the required dependencies to run Cortex engines.
  configs             Subcommands for managing configurations.
  configs get         Get details of a configuration.
  configs list        List all available configurations.
  configs set         Set a configuration.
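
Taken together, a typical session chains a few of these commands. A sketch, with an illustrative model name:

# Download a model, start it, then open an interactive chat with it
cortex pull mistral
cortex models start mistral
cortex chat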


nitro's People

Contributors

0xsage, cameronng, dan-homebrew, dotieuthien, eckartal, github-actions[bot], hahuyhoang411, henryh0x1, hiento09, hientominh, hiro-v, ikraduya, imtuyethan, innoobwetrust, irfanpena, jan-service-account, l2d, louis-jan, marknguyen1302, maurodruwel, mevemo, namchuai, psugihara, shavit, tikikun, tohrnii, urmauur, van-qa, vansangpfiev, wujjpp


nitro's Issues

Epic: Refactor Nitro into a standalone inference service on top of Llama.cpp, compatible with Jan

Deliverable

  • janhq/nitro Github Repo #20
  • Nitro documentation with overview and installation janhq/jan#113
  • Stretch goal: endpoint /models returns a list of models that have been downloaded & are ready to be used

Owners

Big Picture

  • Jan can take in a Nitro server URL
  • Nitro can run on Apple Silicon (GGUF, can drop GGML)
  • Nitro can run on Nvidia GPUs (with llama.cpp)
  • We are a .cpp compatible server

Exclusions

  • Focus on Llama.cpp first; we will tackle TensorRT in a subsequent sprint (aligned with our DGX cluster arriving)

chore: refactor jan-inference -> Nitro repo

Nitro, at the moment, encompasses:

  • llama-python-backend, llama.cpp
  • C++ server
  • Accelerated models (submodule?)
  • GGML models (submodule?)

The point is that we'll be adding more to it long term.

bug: cuBlas build is currently not working

mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release

These commands are supposed to enable Nitro to run on NVIDIA GPUs, but the build currently segfaults.
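
When triaging a CUDA build failure like this, it helps to first confirm the toolchain is visible at all; a generic sanity check, not taken from the issue:

# Confirm the CUDA compiler and the NVIDIA driver are both installed and visible
nvcc --version
nvidia-smi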

Nitro has an installation script and configurations

Success Criteria

  • User should be able to configure Nitro and change defaults
  • User should be able to run a single script to install Nitro/OS dependencies
  • User should be able to deploy Nitro service
  • User should be able to integrate Nitro with Jan seamlessly

Rough spec

An installation path could look like the following:

  1. Install dependencies: ./install.sh

     logs:
     # If gpu_mode:
     echo "Running Nitro on GPUs, checking dependencies"
     # install nvidia-smi ...

  2. Configure .env:

     NITRO_PORT: 8000
     GPU_MODE: true

     # What other configs are possible for a good UX?
  3. Install the model(s) into a directory:

     wget ... /models
  4. Run Nitro: run.sh
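
A minimal run.sh for this spec might just export the .env values and launch the binary. A sketch only: it assumes .env uses KEY=VALUE lines, and the issue does not pin down the actual launch command:

#!/bin/sh
# Export every variable defined in .env (assumes KEY=VALUE syntax), then start Nitro
set -a
. ./.env
set +a
exec ./nitro   # placeholder: replace with the actual binary path and arguments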

Ship Nitro as a Binary

Nitro should be statically built and distributed as a binary

Tasks

  • Build Drogon with llama.cpp
  • Spin up a Mac VM for testing
  • Target a macOS, x86, Metal-supported binary
  • (clarify?) We have an LLM endpoint using GGML
  • Server can be configured using a config file

Success criteria

  • Nitro is a multi-platform binary
  • Runs a Drogon C++ server
  • Serves llama-cpp for Metal or CPU-only modes
  • Include encoding / decoding
  • An architecture diagram showing what's up

feat: Nitro should support docker image

Problem

  • The manual build steps in README.md are frustrating, especially when there are system-dependency bugs

Success Criteria

  • Dockerfile - related to #32
  • Prebuilt docker images
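
Once the Dockerfile lands, usage could look like the following sketch (the image name and port mapping are illustrative assumptions, not from the issue):

# Build the image, then run the containerized server with the service port exposed
docker build -t nitro .
docker run -p 3928:3928 nitro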

Additional context
None

feat: Nitro speed up for 1st inference time after model loaded

Problem
The first request to the Nitro web server is slow, which is frustrating. Once the server reports ready, users should get fast results.

Success Criteria

  • The first request from a user should be quick
  • A mock request should be issued right after the model is first loaded, to warm it up (see the sketch below)
  • /health should return 500 while model warm-up is still in progress, and 200 once it is done. The process-exit case has already been handled
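
A readiness check against that /health contract could look like this (host and port are illustrative):

# Poll /health until the server returns 200; curl -f exits nonzero on 5xx responses
until curl -sf http://localhost:3928/health > /dev/null; do
  sleep 1
done
echo "Warm-up complete; server is ready"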

Additional context
None atm

Add GitHub Actions for the nitro build

  • Use GitHub Actions in janhq
  • Artifacts: GitHub releases
  • Runner matrix for build status

Platform

  • Linux - amd64 - with/without CUDA
  • Mac - amd64 - without Metal
  • Mac - arm64 - with Metal

Model-load failure should exit with code 1 instead of continuing to serve the HTTP server

[1] stderr: gguf_init_from_file: invalid magic number 0a8a0280
[1] 
[1] stderr: error loading model: llama_model_loader: failed to load model from /Users/louis/Library/Application Support/jan-electron/pytorch_model.bin
[1] 
[1] llama_load_model_from_file: failed to load model
[1] llama_init_from_gpt_params: error: failed to load model '/Users/louis/Library/Application Support/jan-electron/pytorch_model.bin'
[1] 
[1] stdout: 20231005 01:38:04.960344 UTC 4991698 INFO   - main.cc:27
[1] 20231005 01:38:04.971173 UTC 4991698 INFO  {"timestamp":1696469884,"level":"WARNING","function":"llamaCPP","line":1198,"message":"build info","build":1273,"commit":"99115f3"} - llamaCPP.h:108
[1] 20231005 01:38:04.971215 UTC 4991698 INFO  {"timestamp":1696469884,"level":"WARNING","function":"llamaCPP","line":1204,"message":"system info","n_threads":6,"total_threads":10,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "} - llamaCPP.h:108
[1] 20231005 01:38:04.971447 UTC 4991698 INFO  {"timestamp":1696469884,"level":"ERROR","function":"loadModel","line":245,"message":"unable to load model","model":"/Users/louis/Library/Application Support/jan-electron/pytorch_model.bin"} - llamaCPP.h:108
[1] 20231005 01:38:04.971451 UTC 4991698 INFO  "Error loading the model" - llamaCPP.h:108
[1]       ___                                   ___           ___     
[1]      /__/        ___           ___        /  /\         /  /\    
[1]      \  \:\      /  /\         /  /\      /  /::\       /  /::\   
[1]       \  \:\    /  /:/        /  /:/     /  /:/\:\     /  /:/\:\  
[1]   _____\__\:\  /__/::\       /  /:/     /  /:/  \:\   /  /:/  \:\ 
[1]  /__/::::::::\ \__\/\:\__   /  /::\    /__/:/ /:/___ /__/:/ \__\:\
[1]  \  \:\~~\~~\/    \  \:\/\ /__/:/\:\   \  \:\/:::::/ \  \:\ /  /:/
[1]   \  \:\  ~~~      \__\::/ \__\/  \:\   \  \::/~~~~   \  \:\  /:/ 
[1]    \  \:\          /__/:/       \  \:\   \  \:\        \  \:\/:/  
[1]     \  \:\         \__\/         \__\/    \  \:\        \  \::/   
[1]      \__\/                                 \__\/         \__\/    
[1] 
