
llama4micro 🦙🔬

A "large" language model running on a microcontroller.

Example run

Background

I was wondering if it's possible to fit a non-trivial language model on a microcontroller. Turns out the answer is some version of yes! (Later, things got a bit out of hand and now the prompt is based on objects detected by the camera.)

This project uses the Coral Dev Board Micro with its FreeRTOS toolchain. The board has a number of neat hardware features, but – most importantly for our purposes – it has 64MB of RAM. That's tiny for LLMs, whose sizes are typically measured in gigabytes, but comparatively huge for a microcontroller.

The LLM implementation itself is an adaptation of llama2.c and the tinyllamas checkpoints trained on the TinyStories dataset. The quality of the smaller model versions isn't ideal, but good enough to generate somewhat coherent (and occasionally weird) stories.
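
For orientation, the core of that adaptation is the llama2.c generation loop. The sketch below is simplified and follows run.c; the struct and function names approximate the originals and are not the exact llama4micro code.

#include <cstdio>

// The types and helpers below come from llama2.c's run.c; declarations are
// repeated here only so the sketch is self-contained, and the names
// approximate the originals rather than the exact llama4micro adaptation.
struct Transformer; struct Tokenizer; struct Sampler;
float* forward(Transformer* t, int token, int pos);     // one transformer decoder step
int sample(Sampler* s, float* logits);                   // pick the next token from the logits
char* decode(Tokenizer* t, int prev_token, int token);   // map a token id to its text piece

void generate_story(Transformer* transformer, Tokenizer* tokenizer, Sampler* sampler,
                    int* prompt_tokens, int num_prompt_tokens, int max_steps) {
  int token = prompt_tokens[0];                          // start from the first prompt token
  for (int pos = 0; pos < max_steps; pos++) {
    float* logits = forward(transformer, token, pos);    // runs on the Cortex-M7
    int next;
    if (pos + 1 < num_prompt_tokens) {
      next = prompt_tokens[pos + 1];                     // still feeding the prompt
    } else {
      next = sample(sampler, logits);                    // sample once past the prompt
    }
    if (next == 1) break;                                // token 1 delimits sequences: stop
    std::printf("%s", decode(tokenizer, token, next));   // stream the piece to serial
    token = next;
  }
}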

Note

Language model inference runs on the 800 MHz Arm Cortex-M7 CPU core. Camera image classification uses the Edge TPU and a compiled YOLOv5 model. The board also has a second 400 MHz Arm Cortex-M4 CPU core, which is currently unused.

Setup

Clone this repo with its submodules karpathy/llama2.c, google-coral/coralmicro, and ultralytics/yolov5.

git clone --recurse-submodules https://github.com/maxbbraun/llama4micro.git

cd llama4micro

The pre-trained models are in the models/ directory. Refer to the instructions on how to download and convert them.

Build the image:

mkdir build
cd build

cmake ..
make -j

Flash the image:

python3 -m venv venv
. venv/bin/activate

pip install -r ../coralmicro/scripts/requirements.txt

python ../coralmicro/scripts/flashtool.py \
    --build_dir . \
    --elf_path llama4micro

Usage

  1. The models load automatically when the board powers up.
    • This takes ~7 seconds.
    • The green light will turn on when ready.
  2. Point the camera at an object and press the button.
    • The green light will turn off.
    • The camera will take a picture and detect an object.
  3. The model now generates tokens starting with a prompt based on the object.
    • The results are streamed to the serial port.
    • This happens at a rate of ~2.5 tokens per second.
  4. Generation stops at the end token or after the maximum number of steps.
    • The green light will turn on again.
    • Go to step 2. (A code sketch of this loop follows below.)
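
Putting the steps above together, a hypothetical sketch of the firmware's main loop might look like the following. The helper functions and the prompt text are illustrative placeholders, not the actual coralmicro or llama4micro APIs.

#include <string>

struct Image {};                                // placeholder frame type
void LoadModels();                              // load LLM weights + compiled YOLOv5 (placeholder)
void SetGreenLed(bool on);                      // user LED (placeholder)
void WaitForButtonPress();                      // user button (placeholder)
Image CaptureCameraFrame();                     // camera capture (placeholder)
std::string DetectObject(const Image& frame);   // YOLOv5 on the Edge TPU (placeholder)
void GenerateStory(const std::string& prompt);  // llama2.c-style generation (placeholder)

void MainLoop() {
  LoadModels();                      // ~7 seconds at power-up
  while (true) {
    SetGreenLed(true);               // green light on: ready
    WaitForButtonPress();
    SetGreenLed(false);              // green light off while busy

    Image frame = CaptureCameraFrame();
    std::string object = DetectObject(frame);

    // Seed the story with the detected object; tokens stream to the serial
    // port at ~2.5 tokens/s until the end token or the step limit is reached.
    std::string prompt = "Once upon a time, there was a " + object;  // illustrative prompt
    GenerateStory(prompt);
  }
}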

llama4micro's Issues

Try larger models 💪

The current implementation works with the 15M parameter version of tinyllamas. Just dropping in the next larger one (42M) flashes fine, but freezes at runtime.

Would need to look into what's happening here. It could be that the model weights plus the run state are larger than the available RAM (63.5MB). I might also have overlooked something about the memory layout. If it's the former, there might be a way to optimize memory usage to fit everything.

Another option would be to train a model between 15M and 42M parameters that just barely fits without any further optimizations.
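
As a back-of-the-envelope check, the sketch below estimates the footprint. All config values are assumptions based on the published tinyllamas configs and llama2.c's fp32 export, not measurements from the board.

#include <cstdint>
#include <cstdio>

int main() {
  const double kMB = 1024.0 * 1024.0;

  // Weights: ~42M parameters at 4 bytes each in the fp32 llama2.c .bin export.
  const int64_t weight_bytes = 42000000LL * 4;              // ~160 MB

  // Run state is dominated by the KV cache: 2 * n_layers * seq_len * dim floats
  // (assuming dim = 512, 8 layers, and a 1024-token context for the 42M model).
  const int64_t kv_cache_bytes = 2LL * 8 * 1024 * 512 * 4;  // 32 MB

  std::printf("weights:  %.1f MB\n", weight_bytes / kMB);
  std::printf("kv cache: %.1f MB\n", kv_cache_bytes / kMB);
  // Under these assumptions the fp32 weights alone exceed 63.5 MB; even
  // int8-quantized weights (~42 MB) plus an fp32 KV cache this size would not fit.
  return 0;
}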

Measure power consumption 🔌

One nice thing about microcontrollers is their low power consumption. We should measure it, both while running inference and while idle/suspended.

I assume it'll consume less power when driven with 3.3V directly instead of 5V USB. I think the PMIC used here should work with a battery. Sheets 3 and 4 of this schematic are probably useful to figure this out.

There are also some relevant notes on power savings in this draft.

Optimize inference speed ⚡️

Experimenting with compiler options in the fast-opts branch.

Switching from -Os to -O3 seems to have a significant impact on tokens per second, roughly a 46% speedup in the measurement below. (-Ofast doesn't noticeably add on top.)

- >>> Averaged 2.60 tokens/s
+ >>> Averaged 3.79 tokens/s

Unfortunately, something about this seems to break the camera input or TPU inference, and I haven't debugged that yet.
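
If the global flag turns out to be the culprit, one option worth exploring (a sketch only, not what the fast-opts branch currently does) is GCC's per-function optimize attribute: keep the firmware at -Os but compile just the llama2.c matmul hot loop, which dominates the per-token cost, at -O3.

// Sketch only: per-function optimization via a GCC attribute, assuming the
// llama2.c-style matmul below is the dominant cost of each generated token.
__attribute__((optimize("O3")))
void matmul(float* out, const float* x, const float* w, int n, int d) {
  // out[i] = sum over j of w[i * n + j] * x[j]
  for (int i = 0; i < d; i++) {
    float sum = 0.0f;
    for (int j = 0; j < n; j++) {
      sum += w[i * n + j] * x[j];
    }
    out[i] = sum;
  }
}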
