elftausend / custos

A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.

License: MIT License

Rust 99.05% Cuda 0.01% Dockerfile 0.13% Kotlin 0.78% Java 0.02%
rust opencl gpu cpu cuda cuda-support array-manipulations framework custos autograd

custos's Introduction

A minimal, extensible OpenCL, Vulkan (with WGSL), CUDA, NNAPI (Android) and host CPU array manipulation engine / framework written in Rust. This crate provides tools for executing custom array and automatic differentiation operations.

Installation

The latest published version is 0.7.x (April 14th, 2023). A lot has changed since then. The 0.7.x code can be found in the custos-0.7 branch.

Add "custos" as a dependency:

[dependencies]
custos = "0.7.0"

# to disable the default features (cpu, cuda, opencl, static-api, blas, macro) and use your own set of features:
#custos = {version = "0.7.0", default-features=false, features=["opencl", "blas"]}

Available features:

custos ships combinable modules. Selecting different modules results in different behaviour when executing operations. New modules can be added in user code.

use custos::prelude::*; 
// Autograd, Base = Modules
let device = CPU::<Autograd<Base>>::new();

To make specific modules usable for building a device, activate the corresponding features:

| Feature | Module | Description |
| --- | --- | --- |
| on by default | Base | Default behaviour. |
| autograd | Autograd | Enables running automatic differentiation. |
| cached | Cached | Reuses allocations on demand. |
| fork | Fork | Decides whether the CPU or GPU is faster for an operation. It then uses the faster device for following computations. (unified memory devices) |
| lazy | Lazy | Lazy execution of operations and lazy intermediate allocations. Enables support for CUDA graphs. |
| graph | Graph | Adds a memory usage optimizable graph. |
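
To illustrate what the Lazy row describes, here is a minimal sketch in plain Rust that does not use custos at all. `LazyQueue`, `add_op`, `run` and `lazy_demo` are hypothetical names invented for this example and are not part of the custos API. Operations are only recorded when they are added and are executed later in one pass.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical queue that records operations instead of running them (not a custos type).
struct LazyQueue {
    ops: Vec<Box<dyn FnMut()>>,
}

impl LazyQueue {
    fn new() -> Self {
        LazyQueue { ops: Vec::new() }
    }

    /// Records an operation; nothing is executed here.
    fn add_op(&mut self, op: impl FnMut() + 'static) {
        self.ops.push(Box::new(op));
    }

    /// Executes all recorded operations in insertion order.
    fn run(&mut self) {
        for op in self.ops.iter_mut() {
            (*op)();
        }
    }
}

/// Returns the data before and after the recorded operations were run.
fn lazy_demo() -> (Vec<f32>, Vec<f32>) {
    let data = Rc::new(RefCell::new(vec![1.0_f32, 2.0, 3.0]));
    let mut queue = LazyQueue::new();

    let d = Rc::clone(&data);
    queue.add_op(move || {
        for x in d.borrow_mut().iter_mut() {
            *x *= 2.0;
        }
    });

    let d = Rc::clone(&data);
    queue.add_op(move || {
        for x in d.borrow_mut().iter_mut() {
            *x += 1.0;
        }
    });

    // Both operations are recorded, but the data is still untouched.
    let before = data.borrow().to_vec();
    queue.run();
    let after = data.borrow().to_vec();
    (before, after)
}
```

Calling `lazy_demo()` returns the untouched input as the first element and the doubled-then-incremented values as the second, which shows that the work only happens inside `run`.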

Usage of these modules when writing custom operations: modules.md

If an operation should be affected by a module, specific custos code must be called in that operation.
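
The idea of stacking modules as nested generic parameters, and of an operation opting in by calling into that stack, can be sketched without custos. Everything below (`Module`, `Buf`, `Device`, `op_output`, `allocations_after_two_calls`) is a hypothetical, simplified model, not the library's real traits or signatures; only the names `Base` and `Cached` are borrowed from the table above to keep the analogy readable.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Shared buffer handle used only by this sketch (not a custos type).
type Buf = Rc<RefCell<Vec<f32>>>;

/// Hypothetical module interface: each module decides how an output buffer is obtained.
trait Module {
    fn retrieve(&self, id: usize, len: usize) -> Buf;
    /// Number of fresh allocations performed so far.
    fn allocations(&self) -> usize;
}

/// Innermost module: always allocates a new buffer.
#[derive(Default)]
struct Base {
    count: Cell<usize>,
}

impl Module for Base {
    fn retrieve(&self, _id: usize, len: usize) -> Buf {
        self.count.set(self.count.get() + 1);
        Rc::new(RefCell::new(vec![0.0; len]))
    }

    fn allocations(&self) -> usize {
        self.count.get()
    }
}

/// Wrapping module: hands out the same buffer again for the same `(id, len)`.
#[derive(Default)]
struct Cached<Mods> {
    inner: Mods,
    cache: RefCell<HashMap<(usize, usize), Buf>>,
}

impl<Mods: Module> Module for Cached<Mods> {
    fn retrieve(&self, id: usize, len: usize) -> Buf {
        self.cache
            .borrow_mut()
            .entry((id, len))
            .or_insert_with(|| self.inner.retrieve(id, len))
            .clone()
    }

    fn allocations(&self) -> usize {
        self.inner.allocations()
    }
}

/// Device that is generic over its module stack, similar in spirit to `CPU<Mods>`.
#[derive(Default)]
struct Device<Mods> {
    modules: Mods,
}

impl<Mods: Module> Device<Mods> {
    /// An operation is only affected by the modules if it routes its allocation through them.
    fn op_output(&self, id: usize, len: usize) -> Buf {
        self.modules.retrieve(id, len)
    }
}

/// Runs the same "operation" twice and reports how many allocations happened.
fn allocations_after_two_calls<Mods: Module + Default>() -> usize {
    let device = Device::<Mods>::default();
    device.op_output(0, 4);
    device.op_output(0, 4);
    device.modules.allocations()
}
```

In this model, `allocations_after_two_calls::<Base>()` allocates on every call, while `allocations_after_two_calls::<Cached<Base>>()` allocates once and reuses the buffer. The operation's code is the same in both cases; only the module stack in the type parameter changes.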

To make specific devices usable, activate the corresponding features:

| Feature | Device | Notes |
| --- | --- | --- |
| cpu | CPU | Uses heap allocations. |
| stack | Stack | Usable in no-std environments as it uses stack allocated Buffers without requiring alloc. Practically only supports the Base module. |
| opencl | OpenCL | Currently the only device that supports automatic unified memory mapping. |
| cuda | CUDA | |
| vulkan | Vulkan | Shaders are written in WGSL. |
| nnapi | NnapiDevice | Lazy module is mandatory. |

Remaining features:

| Feature | Description |
| --- | --- |
| std | Adds standard library support. |
| no-std | For no-std environments, activates stack feature. |
| static-api | Enables the creation of Buffers without providing a device. |
| macro | Reexport of custos-macro |
| blas | Adds gemm functions of the system's (selected) BLAS library. |

Implement an operation for CPU:

  • If you want to implement your own operations for all compute devices, see implement_operations.rs.
    For larger-scale examples, see custos-math (outdated, requires custos 0.7) or sliced (for automatic differentiation examples).

This operation is only affected by the Cached module (and partially Autograd).

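use std::ops::Mul;
use custos::prelude::*;

// Element-wise multiplication of two buffers, implemented per device.
pub trait MulBuf<T, S: Shape = (), D: Device = Self>: Sized + Device {
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S>;
}

impl<Mods, T, S, D> MulBuf<T, S, D> for CPU<Mods>
where
    Mods: Retrieve<Self, T, S>,
    T: Mul<Output = T> + Copy + 'static,
    S: Shape,
    D: Device,
    // The input buffers must be readable as slices on the host.
    D::Base<T, S>: core::ops::Deref<Target = [T]>
{
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S> {
        // Request the output buffer through the device's modules;
        // (lhs, rhs) are passed along as the inputs of the operation.
        let mut out = self.retrieve(lhs.len(), (lhs, rhs));

        // Multiply element by element and write into the output.
        for ((lhs, rhs), out) in lhs.iter().zip(rhs.iter()).zip(&mut out) {
            *out = *lhs * *rhs;
        }

        out
    }
}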
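
The loop at the heart of the implementation above does not depend on custos. As a minimal standalone sketch, the same zip-based element-wise product can be written over plain slices; `mul_slices` is a hypothetical helper introduced only for this illustration.

```rust
use std::ops::Mul;

/// Writes the element-wise product of `lhs` and `rhs` into `out`.
/// Iteration stops at the shortest of the three slices.
fn mul_slices<T>(lhs: &[T], rhs: &[T], out: &mut [T])
where
    T: Mul<Output = T> + Copy,
{
    for ((lhs, rhs), out) in lhs.iter().zip(rhs.iter()).zip(out.iter_mut()) {
        *out = *lhs * *rhs;
    }
}
```

For example, multiplying `[1, 2, 3]` by `[4, 5, 6]` into a zeroed three-element output leaves `[4, 10, 18]` in that output.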

Many more usage examples can be found in the tests and examples folders (or in the unary operation file, custos-math and sliced).

custos's People

Contributors

elftausend, haydnv

custos's Issues

NNAPI/Hexagon

Hi, amazing little library!

Currently, exploring using it as a platform for GPU execution of an RNN-based LLM. One thing though, it will eventually get deployed to mobile, and there, NNAPI is king and provides crazy performance on some platforms.

Is there a chance the library can support NNAPI?

It would also be extremely beneficial, to, maybe, directly access or support Hexagon instead:
https://developer.qualcomm.com/software/hexagon-dsp-sdk

Thanks,
Oleksii

opencl unified memory device mismatch

If an OpenCL device selected via the environment variable CUSTOS_CL_DEVICE_IDX uses unified memory and another device without unified memory is then selected via a dynamic index, the unified_cl flag is still active.
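
The behaviour reported here can be modelled in a few lines of plain Rust. This is only a sketch of the described symptom under assumed names (`ClDeviceInfo`, `unified_flag_cached_once`, `unified_flag_for` are all hypothetical); it does not reproduce custos internals and is not the library's fix.

```rust
/// Hypothetical description of an OpenCL device (not a custos type).
#[derive(Clone, Copy, Debug)]
struct ClDeviceInfo {
    unified_mem: bool,
}

/// Models the reported behaviour: the flag is derived once from the device
/// chosen via the environment variable and then reused for every device.
fn unified_flag_cached_once(devices: &[ClDeviceInfo], env_idx: usize) -> bool {
    devices[env_idx].unified_mem
}

/// One possible alternative: look the flag up for the device that is actually used.
/// Returns `None` if the index does not refer to a known device.
fn unified_flag_for(devices: &[ClDeviceInfo], idx: usize) -> Option<bool> {
    devices.get(idx).map(|device| device.unified_mem)
}
```

With a unified-memory device at index 0 and a non-unified device at index 1, the cached flag stays `true` after switching to index 1, whereas the per-device lookup reports `Some(false)`.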
