qkv_rs

This is an experimental project that aims to build a program which performs inference for a single transformer block. The idea is that if we can develop a flexible, optimized transformer block, we could launch many of them and have them communicate to perform full inference of a model. This is very much a work in progress; I started it to keep up to speed on best practices for optimizing transformer-based model inference (KV caching generally, paged attention, flash attention, etc.).
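As a rough illustration of the KV caching idea mentioned above, here is a minimal single-head sketch. The `KvCache` type and its `attend` method are hypothetical names chosen for this example and are not taken from the qkv_rs codebase. Each decode step appends the new token's key and value, then attends over everything cached so far, so earlier keys and values are never recomputed.

```rust
/// Hypothetical single-head KV cache (illustrative only, not the qkv_rs API).
struct KvCache {
    dim: usize,
    keys: Vec<f32>,   // flattened [n_tokens, dim]
    values: Vec<f32>, // flattened [n_tokens, dim]
}

impl KvCache {
    fn new(dim: usize) -> Self {
        assert!(dim > 0, "head dimension must be non-zero");
        Self { dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Number of tokens currently cached.
    fn len(&self) -> usize {
        self.keys.len() / self.dim
    }

    /// Cache the newest token's key/value, then compute scaled dot-product
    /// attention of `q` over all cached tokens.
    fn attend(&mut self, q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
        assert_eq!(q.len(), self.dim);
        assert_eq!(k.len(), self.dim);
        assert_eq!(v.len(), self.dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);

        // One score per cached token: (q . key) / sqrt(dim).
        let scale = 1.0 / (self.dim as f32).sqrt();
        let scores: Vec<f32> = self
            .keys
            .chunks(self.dim)
            .map(|key| key.iter().zip(q).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();

        // Numerically stable softmax: subtract the max before exponentiating.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let denom: f32 = exps.iter().sum();

        // Weighted sum of the cached values.
        let mut out = vec![0.0f32; self.dim];
        for (w, value) in exps.iter().zip(self.values.chunks(self.dim)) {
            for (o, x) in out.iter_mut().zip(value) {
                *o += (w / denom) * x;
            }
        }
        out
    }
}
```

Calling `attend` once per generated token keeps the per-step cost linear in the number of cached tokens. Paged attention and flash attention change how this memory is laid out and traversed, which this sketch does not attempt.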

Project Goals

  • Single Block Focus: The qkv_rs program should only launch a single block (i.e., it is not trying to rebuild PyTorch).
  • Modular Design: To support future optimizations, the project is split into a logical graph and a physical graph. The physical graph can carry hardware-specific optimizations.
  • Scalable Communication (IPC): Eventually support IPC when multiple blocks are launched.
  • Forward Pass Only: Nothing around backprop, gradient calculation, etc. should be added to this codebase. It adds significant complexity and doesn't match the project's intention.
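To make the logical/physical split more concrete, here is one way it could look. This is a sketch under assumptions: the `LogicalOp`, `PhysicalOp`, and `Device` types, the `lower` function, and the tile size of 16 are all hypothetical and do not reflect the actual qkv_rs data structures. The logical graph names what to compute, and lowering picks a hardware-specific way to compute it.

```rust
/// Hardware-agnostic operations (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
enum LogicalOp {
    MatMul,
    Softmax,
    Add,
}

/// Target hardware for the physical graph (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    Cpu,
    Gpu,
}

/// Hardware-specific operations chosen during lowering (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum PhysicalOp {
    CpuMatMulNaive,
    GpuMatMulTiled { tile: usize },
    Softmax { device: Device },
    Add { device: Device },
}

/// Lower a linear sequence of logical ops to physical ops for one device.
/// A real graph would have edges and shapes; a flat slice keeps the sketch short.
fn lower(graph: &[LogicalOp], device: Device) -> Vec<PhysicalOp> {
    graph
        .iter()
        .map(|op| match (op, device) {
            // The same logical matmul maps to different kernels per device.
            (LogicalOp::MatMul, Device::Cpu) => PhysicalOp::CpuMatMulNaive,
            (LogicalOp::MatMul, Device::Gpu) => PhysicalOp::GpuMatMulTiled { tile: 16 },
            (LogicalOp::Softmax, d) => PhysicalOp::Softmax { device: d },
            (LogicalOp::Add, d) => PhysicalOp::Add { device: d },
        })
        .collect()
}
```

Keeping hardware choices out of `LogicalOp` means graph-level rewrites can be written once, while device-specific tuning stays inside the lowering step.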

Current Status

  • Compute Graph in Progress:
    • The logical and physical structures of the compute graph are being actively developed.
    • I'm exploring different ways of structuring the graphs, considering rewrite rules, etc.
    • Need to add support for reading weight files from a few formats, with GGUF being the first.
    • Need to add tests that validate the eventual outputs against xformers and similar reference implementations.
  • Potential Features:
    • Quantization
    • Sparse Attention
    • Mixed Precision
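As a possible starting point for the GGUF reader listed above, the sketch below parses only the fixed-size file header. It assumes the GGUF v2+ layout as I understand it (4-byte magic `GGUF`, a little-endian `u32` version, then `u64` tensor and metadata key/value counts). The `GgufHeader` struct and `read_gguf_header` function are hypothetical names, and metadata and tensor parsing are left out.

```rust
use std::io::{self, Read};

/// Fixed-size part of a GGUF file header (assumed v2+ layout).
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Read and validate the header; everything after it is left unread.
fn read_gguf_header<R: Read>(r: &mut R) -> io::Result<GgufHeader> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }
    let version = read_u32(r)?;
    if version < 2 {
        // Version 1 used 32-bit counts, which this sketch does not handle.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported GGUF version"));
    }
    let tensor_count = read_u64(r)?;
    let metadata_kv_count = read_u64(r)?;
    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}
```

Because the function takes any `Read`, it can be exercised with an in-memory `std::io::Cursor` in tests and with a `std::fs::File` for real weight files.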

Contributing

This is very much an educational side project. If you'd like to add something, please don't hesitate. Feel free to open issues for bug reports or feature requests. If you'd like to contribute code, fork the repository and submit a pull request.

Contributors

  • michaelgiba
