breandan / kotlingrad Goto Github PK

🧩 Shape-Safe Symbolic Differentiation with Algebraic Data Types

Home Page: https://breandan.net/public/masters_thesis.pdf#page=49

License: Apache License 2.0

Kotlin 100.00%

automatic-differentiation differentiable-programming computer-algebra kotlin types shape-safety multi-stage-programming symbolic-differentiation optimization gradient-descent

kotlingrad's Introduction

Kotlin∇: Type-safe Symbolic Differentiation for the JVM

Kotlin∇ is a type-safe automatic differentiation framework written in Kotlin. It allows users to express differentiable programs with higher-dimensional data structures and operators. We attempt to restrict syntactically valid constructions to those which are algebraically valid and can be checked at compile-time. By enforcing these constraints in the type system, it eliminates certain classes of runtime errors that may occur during the execution of a differentiable program. Due to type-inference, most type declarations may be safely omitted by the end-user. Kotlin∇ strives to be expressive, safe, and notationally similar to mathematics.

Introduction
Supported features
Usage
Visualization
Testing and gradient checking
How does it work?
Experimental ideas
Formal grammar
UML diagram
Comparison to other frameworks
References
Acknowledgements

Introduction

Inspired by Stalin∇, Autograd, DiffSharp, Myia, Nexus, Tangent, Lantern et al., Kotlin∇ attempts to port recent advancements in automatic differentiation (AD) to the Kotlin language. AD is useful for gradient descent and has a variety of applications in numerical optimization and machine learning. Our implementation adds a number of experimental ideas, including compile-time shape-safety, algebraic simplification and numerical stability checking with property-based testing. We aim to provide an algebraically-grounded implementation of AD for shape-safe tensor operations. Tensors in Kotlin∇ are represented as multidimensional arrays.

Features

Kotlin∇ currently supports the following features:

Arithmetical operations on scalars, vectors and matrices
Shape-safe vector and matrix algebra
Partial and higher-order differentiation on scalars
Property-based testing for numerical gradient checking
Recovery of symbolic derivatives from AD

Additionally, it aims to support:

PyTorch-style define-by-run semantics
N-dimensional tensors and higher-order tensor operators
Fully-general AD over control flow, variable reassignment (via delegation), and array programming, possibly using a typed IR such as Myia

All of these features are implemented without access to bytecode or special compiler tricks - just using higher-order functions and lambdas as shown in Lambda the Ultimate Backpropogator, embedded DSLs a la Lightweight Modular Staging, and ordinary generics. Please see below for a more detailed feature comparison.

Usage

Installation

Kotlin∇ is hosted on Maven Central. An example project is provided here.

Gradle

dependencies {
  implementation("ai.hypergraph:kotlingrad:0.4.7")
}

Maven

<dependency>
  <groupId>ai.hypergraph</groupId>
  <artifactId>kotlingrad</artifactId>
  <version>0.4.7</version>
</dependency>

Jupyter Notebook

To access Kotlin∇'s notebook support, use the following line magic:

@file:DependsOn("ai.hypergraph:kotlingrad:0.4.7")

For more information, explore the tutorial.

Notation

Kotlin∇ operators are higher-order functions, which take at most two inputs and return a single output, all of which are functions with the same numerical type, and whose shape is denoted using superscript in the rightmost column below.

Math	Infix ^†	Prefix	Postfix^‡	Operator Type Signature
$$\mathbf{A}(\mathbf{B})$$ $$\mathbf{A}\circ\mathbf{B}$$	`a(b)` `a of b`			$$(\texttt{a}: ℝ^{τ}→ℝ^{π}, \texttt{b}: ℝ^{λ} → ℝ^{τ}) → (ℝ^{λ}→ℝ^{π})$$
$$\mathbf{A}\pm\mathbf{B}$$	`a + b` `a - b`	`plus(a, b)` `minus(a, b)`		$$(\texttt{a}: ℝ^{τ}→ℝ^{π}, \texttt{b}: ℝ^{λ} → ℝ^{π}) → (ℝ^{?}→ℝ^{π})$$
$$\mathbf{A}\mathbf{B}$$	`a * b` `a.times(b)`	`times(a, b)`		$$(\texttt{a}: ℝ^{τ}→ℝ^{m×n}, \texttt{b}: ℝ^{λ}→ℝ^{n×p}) → (ℝ^{?}→ℝ^{m×p})$$
$$\frac{\mathbf{A}}{\mathbf{B}}$$ $$\mathbf{A}\mathbf{B}^{-1}$$	`a / b` `a.div(b)`	`div(a, b)`		$$(\texttt{a}: ℝ^{τ}→ℝ^{m×n}, \texttt{b}: ℝ^{λ}→ℝ^{p×n}) → (ℝ^{?}→ℝ^{m×p})$$
$$\pm\mathbf{A}$$		`-a` `+a`	`a.neg()` `a.pos()`	$$(\texttt{a}: ℝ^{τ}→ℝ^{π}) → (ℝ^{τ}→ℝ^{π})$$
$$\sin{a}$$ $$\cos{a}$$ $$\tan{a}$$		`sin(a)` `cos(a)` `tan(a)`	`a.sin()` `a.cos()` `a.tan()`	$$(\texttt{a}: ℝ→ℝ) → (ℝ→ℝ)$$
$$\ln{a}$$		`ln(a)` `log(a)`	`a.ln()` `a.log()`	$$(\texttt{a}: ℝ^{τ}→ℝ^{m×m}) → (ℝ^{τ}→ℝ^{m×m})$$
$$\log_{b}a$$	`a.log(b)`	`log(a, b)`		$$(\texttt{a}: ℝ^{τ}→ℝ^{m×m}, \texttt{b}: ℝ^{λ}→ℝ^{m×m}) → (ℝ^{?}→ℝ)$$
$$\mathbf{A}^b$$	`a.pow(b)`	`pow(a, b)`		$$(\texttt{a}: ℝ^{τ}→ℝ^{m×m}, \texttt{b}: ℝ^{λ}→ℝ) → (ℝ^{?}→ℝ^{m×m})$$
$$\sqrt{A}$$ $$\sqrt[3]{A}$$	`a.pow(1.0/2)` `a.root(3)`	`sqrt(a)` `cbrt(a)`	`a.sqrt()` `a.cbrt()`	$$(\texttt{a}: ℝ^{τ}→ℝ^{m×m}) → (ℝ^{τ}→ℝ^{m×m})$$
$$\frac{da}{db},\frac{\partial{a}}{\partial{b}}$$ $$D_b{a}$$	`a.d(b)` `d(a) / d(b)`	`grad(a)[b]`		$$(\texttt{a}: C(ℝ^{τ}→ℝ)^{*}, \texttt{b}: C(ℝ^{λ}→ℝ)) → (ℝ^{?}→ℝ)$$
$$\nabla{a}$$		`grad(a)`	`a.grad()`	$$(\texttt{a}: C(ℝ^{τ}→ℝ)) → (ℝ^{τ}→ℝ^{τ})$$
$$\nabla_{\mathbf{B}}a$$	`a.d(b)` `a.grad(b)`	`grad(a, b)` `grad(a)[b]`		$$(\texttt{a}: C(ℝ^{τ}→ℝ^{π}), \texttt{b}: C(ℝ^{λ}→ℝ^{ω})) → (ℝ^{?}→ℝ^{π×ω})$$
$$\nabla\cdot{\mathbf{A}}$$		`divg(a)`	`a.divg()`	$$(\texttt{a}: C(ℝ^{τ}→ℝ^{m})) → (ℝ^{τ}→ℝ)$$
$$\nabla\times{\mathbf{A}}$$		`curl(a)`	`a.curl()`	$$(\texttt{a}: C(ℝ^{3}→ℝ^{3})) → (ℝ^{3}→ℝ^{3})$$
$$\mathcal{J}(\mathbf{A})$$		`grad(a)`	`a.grad()`	$$(\texttt{a}: C(ℝ^{τ}→ℝ^{m})) → (ℝ^{τ}→ℝ^{m×τ})$$
$$\mathbf{H}(a)$$		`hess(a)`	`a.hess()`	$$(\texttt{a}: C(ℝ^{τ}→ℝ)) → (ℝ^{τ}→ℝ^{τ×τ})$$
$$\Delta{a},\nabla^{2}a$$		`lapl(a)`	`a.lapl()`	$$(\texttt{a}: C(ℝ^{τ}→ℝ)) → (ℝ^{τ}→ℝ^{τ})$$

ℝ can be a Double, Float or BigDecimal. Specialized operators are defined for subsets of ℝ, e.g., Int, Short or BigInteger for subsets of ℤ, however differentiation is only defined for continuously differentiable functions on ℝ.

^† a and b are higher-order functions. These may be constants (e.g., 0, 1.0), variables (e.g., Var()) or expressions (e.g., x + 1, 2 * x + y).

^‡ For infix notation, . is optional. Parentheses are also optional depending on precedence.

^§ Matrix division is defined iff B is invertible, although it could be possible to redefine this operator using the Moore-Penrose inverse.

^∗ Where C(ℝ^m) is the space of all continuous functions over ℝ. If the function is not over ℝ, it will fail at compile-time. If the function is over ℝ but not continuous differentiable at the point under consideration, it will fail at runtime.

^? The input shape is tracked at runtime, but not at the type level. While it would be nice to infer a union type bound over the inputs of binary functions, it is likely impossible using the Kotlin type system without great effort. If the user desires type checking when invoking higher order functions with literal values, they will need to specify the combined input type explicitly or do so at runtime.

^{τ, λ, π, ω} Arbitrary products.

Higher-Rank Derivatives

Kotlin∇ supports derivatives between tensors of up to rank 2. The shape of a tensor derivative depends on (1) the shape of the function under differentiation and (2) the shape of the variable with respect to which we are differentiating.

I/O Shape	$$ℝ^{?}→ℝ$$	$$ℝ^{?}→ℝ^{m}$$	$$ℝ^{?}→ℝ^{j×k}$$
$$ℝ^{?}→ℝ$$	$$ℝ^{?}→ℝ$$	$$ℝ^{?}→ℝ^{m}$$	$$ℝ^{?}→ℝ^{j×k}$$
$$ℝ^{?}→ℝ^{n}$$	$$ℝ^{?}→ℝ^{n}$$	$$ℝ^{?}→ℝ^{m×n}$$	❌
$$ℝ^{?}→ℝ^{h×i}$$	$$ℝ^{?}→ℝ^{h×i}$$	❌	❌

Matrix-by-vector, vector-by-matrix, and matrix-by-matrix derivatives require rank 3+ tensors and are currently unsupported.

Higher-order derivatives

Kotlin∇ supports arbitrary order derivatives on scalar functions, and up to 2nd order derivatives on vector functions. Higher-order derivatives on matrix functions are unsupported.

Shape safety

Shape safety is an important concept in Kotlin∇. There are three broad strategies for handling shape errors:

Hide the error somehow by implicitly reshaping or broadcasting arrays
Announce the error at runtime, with a relevant message, e.g., InvalidArgumentError
Do not allow programs which can result in a shape error to compile

In Kotlin∇, we use the last strategy to check the shape of tensor operations. Consider the following program:

// Inferred type: Vec<Double, D2>
val a = Vec(1.0, 2.0)
// Inferred type: Vec<Double, D3>
val b = Vec(1.0, 2.0, 3.0)

val c = b + b

// Does not compile, shape mismatch
// a + b

Attempting to sum two vectors whose shapes do not match will fail to compile, and they must be explicitly resized.

// Inferred type: Mat<Double, D1, D4>
val a = Mat1x4(1.0, 2.0, 3.0, 4.0)
// Inferred type: Mat<Double, D4, D1>
val b = Mat4x1(1.0, 2.0, 3.0, 4.0)

val c = a * b

// Does not compile, inner dimension mismatch
// a * a
// b * b

Similarly, attempting to multiply two matrices whose inner dimensions do not match will fail to compile.

val a = Mat2x4( 
  1.0, 2.0, 3.0, 4.0,
  5.0, 6.0, 7.0, 8.0
)

val b = Mat4x2( 
  1.0, 2.0,
  3.0, 4.0,
  5.0, 6.0,
  7.0, 8.0
)

// Types are optional, but encouraged
val c: Mat<Double, D2, D2> = a * b 

val d = Mat2x1(1.0, 2.0)

val e = c * d

val f = Mat3x1(1.0, 2.0, 3.0)

// Does not compile, inner dimension mismatch
// e * f

Explicit types are optional but encouraged. Type inference helps preserve shape information over long programs.

fun someMatFun(m: Mat<Double, D3, D1>): Mat<Double, D3, D3> = ...
fun someMatFun(m: Mat<Double, D2, D2>) = ...

When writing a function, it is mandatory to declare the input type(s), but the return type may be omitted. Shape-safety is currently supported up to rank-2 tensors, i.e. matrices.

Example

The following example shows how to derive higher-order partials of a function z of type ℝ²→ℝ:

val z = x * (-sin(x * y) + y) * 4  // Infix notation
val `∂z∕∂x` = d(z) / d(x)          // Leibniz notation [Christianson, 2012]
val `∂z∕∂y` = d(z) / d(y)          // Partial derivatives
val `∂²z∕∂x²` = d(`∂z∕∂x`) / d(x)  // Higher-order derivatives
val `∂²z∕∂x∂y` = d(`∂z∕∂x`) / d(y) // Higher-order partials
val `∇z` = z.grad()                // Gradient operator

val values = arrayOf(x to 0, y to 1)

println("z(x, y) \t= $z\n" +
  "z(${values.map { it.second }.joinToString()}) \t\t= ${z(*values)}\n" +
  "∂z/∂x \t\t= $`∂z∕∂x` \n\t\t= " + `∂z∕∂x`(*values) + "\n" +
  "∂z/∂y \t\t= $`∂z∕∂y` \n\t\t= " + `∂z∕∂y`(*values) + "\n" +
  "∂²z/∂x² \t= $`∂z∕∂y` \n\t\t= " + `∂²z∕∂x²`(*values) + "\n" +
  "∂²z/∂x∂y \t= $`∂²z∕∂x∂y` \n\t\t= " + `∂²z∕∂x∂y`(*values) + "\n" +
  "∇z \t\t= $`∇z` \n\t\t= [${`∇z`[x]!!(*values)}, ${`∇z`[y]!!(*values)}]ᵀ")

Any backticks and unicode characters above are simply for readability and have no effect on the behavior. Running this program via ./gradlew HelloKotlingrad should produce the following output:

z(x, y)         = ((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)
z(0, 1)         = 0.0
∂z/∂x           = d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(x) 
                = 4.0
∂z/∂y           = d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(y) 
                = 0.0
∂²z/∂x²         = d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(y) 
                = 4.0
∂²z/∂x∂y        = d(d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(x)) / d(y) 
                = 4.0
∇z              = {y=d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(y), x=d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(x)} 
                = [4.0, 0.0]ᵀ

Variable capture

Not only does Kotlin∇'s type system encode output shape, it is also capable of tracking free and bound variables, for order-independent name binding and partial application. Expressions inhabited by free variables are typed as functions until fully bound, at which time they return a concrete value. Consider the following example:

val q = X + Y * Z + Y + 0.0
val p0 = q(X to 1.0, Y to 2.0, Z to 3.0) // Name binding
val p1 = q(X to 1.0, Y to 1.0)(Z to 1.0) // Variadic currying
val p3 = q(Z to 1.0)(X to 1.0, Y to 1.0) // Any order is possible
val p4 = q(Z to 1.0)(X to 1.0)(Y to 1.0) // Proper currying
val p5 = q(Z to 1.0)(X to 1.0) // Returns a partially applied function
val p6 = (X + Z + 0)(Y to 1.0) // Does not compile

This feature is made possible by encoding a type-level Hasse diagram over a small set of predefined variable names, with skip-connections for variadic combination and partial application. Curious readers may glean further details by referring to the implementation and usage example.

Visualization tools

Kotlin∇ provides various graphical tools that can be used for visual debugging.

Dataflow graphs

Kotlin∇ functions are a type of directed acyclic graph, called dataflow graphs (DFGs). For example, running the expression ((1 + x * 2 - 3 + y + z / y).d(y).d(x) + z / y * 3 - 2).render() will display the following DFG:

Red and blue edges indicate the right and left inputs to a binary operator, respectively. Consider the DFG for a batch of stochastic gradients on linear regression, which can be written in matrix form as $\nabla_{\Theta}||\mathbf{Y} - \mathbf{X}\Theta||^2$ :

Thetas represent the hidden parameters under differentiation and the constants are the batch inputs (X) and targets (Y). When all the free variables are bound to numerical values, the graph collapses into a single node, which can be unwrapped into a Kotlin Number.

Plotting

To generate the sample 2D plots below, run ./gradlew Plot2D.

Plotting is also possible in higher dimensions, for example in 3D via ./gradlew Plot3D:

Loss curves

Gradient descent is one application for Kotlin∇. Below, is a typical loss curve of SGD on a multilayer perceptron:

To train the model, execute ./gradlew MLP from within the parent directory.

Testing

To run the tests, execute ../gradlew allTests from the core directory.

Kotlin∇ claims to eliminate certain runtime errors, but how do we know the proposed implementation is not incorrect? One method, borrowed from the Haskell community, is called property-based testing (PBT), closely related to metamorphic testing. Notable implementations include QuickCheck, Hypothesis and ScalaTest (ported to Kotlin in Kotest). PBT uses algebraic properties to verify the result of an operation by constructing semantically equivalent but syntactically distinct expressions, which should produce the same answer. Kotlin∇ uses two such equivalences to validate its AD implementation:

Analytic differentiation: manually differentiate and compare the values returned on a subset of the domain with AD.
Finite difference approximation: sample space of symbolic (differentiable) functions, comparing results of AD to FD.

For example, consider the following test, which checks whether the analytical derivative and the automatic derivative, when evaluated at a given point, are equal to each other within the limits of numerical precision:

val x by Var()
val y by Var()

val z = y * (sin(x * y) - x)            // Function under test
val `∂z∕∂x` = d(z) / d(x)               // Automatic derivative
val manualDx = y * (cos(x * y) * y - 1) // Analytical derivative 

"∂z/∂x should be y * (cos(x * y) * y - 1)" {
  NumericalGenerator.assertAll { ẋ, ẏ ->
    // Evaluate the results at a given seed
    val autoEval = `∂z∕∂x`(x to ẋ, y to ẏ) 
    val manualEval = manualDx(x to ẋ, y to ẏ)
    // Should pass iff Δ(adEval, manualEval) < Ɛ
    autoEval shouldBeApproximately manualEval
  }
}

PBT will search the input space for two numerical values ẋ and ẏ, which violate the specification, then "shrink" them to discover pass-fail boundary values. We can construct a similar test using finite differences:

"d(sin x)/dx should be equal to (sin(x + dx) - sin(x)) / dx" {
  NumericalGenerator.assertAll { ẋ ->
    val f = sin(x)
    
    val `df∕dx` = d(f) / d(x)
    val adEval = `df∕dx`(ẋ) 
    
    val dx = 1E-8
    // Since ẋ is a raw numeric type, sin => kotlin.math.sin
    val fdEval = (sin(ẋ + dx) - sin(ẋ)) / dx
    adEval shouldBeApproximately fdEval
  }
}

Above, we compare numerical errors for three types of computational differentiation against infinite precision symbolic differentiation (IP):

Finite precision automatic differentiation (AD)
Finite precision symbolic differentiation (SD)
Finite precision finite differences (FD)

AD and SD both exhibit relative errors (i.e. with respect to each other) several orders of magnitude lower than their absolute errors (i.e. with respect to IP), which roughly agree to within numerical precision. As expected, FD exhibits numerical error significantly higher than AD and SD due to the inaccuracy of floating-point division.

There are many other ways to independently verify the numerical gradient, such as dual numbers or the complex step derivative. Another method is to compare the numerical output against a well-known implementation, such as TensorFlow. We plan to conduct a more thorough comparison of numerical accuracy and performance.

How?

To understand the core of Kotlin∇'s AD implementation, please refer to the scalar example.

This project relies on a few Kotlin-specific language features, which together enable a concise, flexible and type-safe user interface. The following features have proven beneficial to the development of Kotlin∇:

Operator overloading

Operator overloading enables concise notation for arithmetic on abstract types, where the types encode algebraic structures, e.g., Group, Ring, and Field. These abstractions are extensible to other kinds of mathematical structures, such as complex numbers and quaternions.

For example, suppose we have an interface Group, which overloads the operators + and *, and is defined like so:

interface Group<T: Group<T>> {
  operator fun plus(addend: T): T

  operator fun times(multiplicand: T): T
}

Here, we specify a recursive type bound using a method known as F-bounded quantification to ensure that operations return the concrete type variable T, rather than something more abstract like Group. Imagine a class Fun that has implemented Group. It can be used as follows:

fun <T: Group<T>> cubed(t: T): T = t * t * t

fun <T: Group<T>> twiceCubed(t: T): T = cubed(t) + cubed(t)

Like Python, Kotlin supports overloading a limited set of operators, which are evaluated using a fixed precedence. In the current version of Kotlin∇, operators do not perform any computation, they simply construct a directed acyclic graph representing the symbolic expression. Expressions are only evaluated when invoked as a function.

First-class functions

With higher-order functions and lambdas, Kotlin treats functions as first-class citizens. This allows us to represent mathematical functions and programming functions with the same underlying abstractions (typed FP). Several recent papers have demonstrated the expressiveness of this paradigm for automatic differentiation.

In Kotlin∇, all expressions can be treated as functions. For example:

fun <T: Group<T>> makePoly(x: Var<T>, y: Var<T>) = x * y + y * y + x * x
val x by Var()
val y by Var()
val f = makePoly(x, y)
val z = f(1.0, 2.0) // Returns a value
println(z) // Prints: 7

Additionally, it is possible to build functions consisting of varying dimensional inputs:

fun <T: Fun<T>> mlp(p1: VFun<T, D3>, p2: MFun<T, D3, D3>, p3: T) =
  ((p1 * p2 + p1 * p2 * p2 dot p1 + p1) - p3) pow p3

Multi-stage programming

Kotlin∇ uses operator overloading in the host language to first construct a dataflow graph, but evaluates the graph lazily. Called "multi-stage programming", or staging, this is a metaprogramming technique from the ML community which enables type-safe runtime code translation and compilation. More recently, staging has been put to effective use for compiling embedded DSLs similar to Kotlin∇.

In its current form, Kotlin∇ takes a "shallow embedding" approach. Similar to an interpreter, it adheres closely to the user-defined program and does not perform much code specialization or rewriting for optimization purposes. Unlike an interpreter, it postpones evaluation until all free variables in an expression have been bound. Consider the following snippet, which decides when to evaluate an expression:

var EAGER = false
operator fun invoke(newBindings: Bindings<X>): Fun<X> =
    Composition(this, newBindings).run { if (bindings.complete || EAGER) evaluate() else this }

If bindings are complete, this means there are no unbound variables remaining (implementation omitted for brevity), and we can evaluate the expression to obtain a numerical result. Suppose we have the following user code:

val x = Var()
val y = Var()
val z = Var()
val f0 = x + y * z
var f1 = f0(x to 1).also { println(it) } // Prints: (x + y * z)(x=1)
var f2 = f1(y to 2).also { println(it) } // Prints: (x + y * z)(x=1)(y=2)
var f3 = f2(z to 3).also { println(it) } // Prints: 7

Once the last line is reached, all variables are bound, and instead of returning a Composition, Kotlin∇ evaluates the function, returning a constant. Alternatively, if EAGER mode is enabled, each invocation is applied as early as possible:

EAGER = true
f1 = f0(x to 1).also { println(it) } // Prints: 1 + y * z
f2 = f1(y to 2).also { println(it) } // Prints: 1 + 2 * z
f3 = f2(z to 3).also { println(it) } // Prints: 7

In the following section, we describe how evaluation works.

Algebraic data types

Algebraic data types (ADTs) in the form of sealed classes (a.k.a. sum types) facilitate a limited form of pattern matching over a closed set of subclasses. By using these, the compiler forces us to provide an exhaustive control flow when type checking a sealed class. Consider the following classes:

class Const<T: Fun<T>>(val number: Number) : Fun<T>()
class Sum<T: Fun<T>>(val left: Fun<T>, val right: Fun<T>) : Fun<T>()
class Prod<T: Fun<T>>(val left: Fun<T>, val right: Fun<T>) : Fun<T>()
class Var<T: Fun<T>>: Fun<T>() { override val variables: Set<Var<X>> = setOf(this) }
class Zero<T: Fun<T>>: Const<T>(0.0)
class One<T: Fun<T>>: Const<T>(1.0)

When checking the type of a sealed class, consumers must explicitly handle every case, as incomplete control flow will produce a compiler error rather than fail at runtime. Consider a simplified definition of the superclass Fun, which defines invocation and differentiation using a restricted form of pattern matching:

sealed class Fun<X: Fun<X>>(open val variables: Set<Var<X>> = emptySet()): Group<Fun<X>> {
    constructor(vararg fns: Fun<X>): this(fns.flatMap { it.variables }.toSet())

    // Since the subclasses of Fun are a closed set, no `else  ...` is required.
    operator fun invoke(map: Bindings<X>): Fun<X> = when (this) {
        is Const -> this
        is Var -> map.getOrElse(this) { this } // Partial application is permitted
        is Prod -> left(map) * right(map) // Smart casting implicitly casts after checking
        is Sum -> left(map) + right(map)
    }

    fun d(variable: Var<X>): Fun<X> = when(this) {
       is Const -> Zero
       is Var -> if (variable == this) One else Zero
       // Product rule: d(u*v)/dx = du/dx * v + u * dv/dx
       is Prod -> left.d(variable) * right + left * right.d(variable)
       is Sum -> left.d(variable) + right.d(variable)
    }

    operator fun plus(addend: Fun<T>) = Sum(this, addend)

    operator fun times(multiplicand: Fun<T>) = Prod(this, multiplicand)
}

Symbolic differentiation as implemented by Kotlin∇ has two distinct passes, one for differentiation and one for evaluation. Differentiation constitutes a top-down substitution process on the computation graph and evaluation propagates the values from the bottom, up. This reduction semantics for this procedure are described more precisely in the specification.

Kotlin∇ functions are not only data structures, but Kotlin functions which can be invoked by passing a Bindings instance (effectively, a Map<Fun<X>, Fun<X>>). To enable this functionality, we overload the invoke operator, then recurse over the graph, using Bindings as a lookup table. If a matching subexpression is found, we propagate the bound value instead of the matching function. This is known as the interpreter pattern.

Kotlin's smart casting is an example of flow-sensitive type analysis where the abstract type Fun can be treated as Sum after performing an is Sum check. Without smart casting, we would need to write (this as Sum).left to access the member, left, causing a potential ClassCastException if the cast were mistaken.

Extension functions

By using extension functions, users can convert between numerical types in the host language and our eDSL, by augmenting classes with additional operators. Context-oriented programming, allows users to define custom extensions without requiring subclasses or inheritance.

data class Const<T: Group<T>>(val number: Double) : Fun()
data class Sum<T: Group<T>>(val e1: Fun, val e2: Fun) : Fun()
data class Prod<T: Group<T>>(val e1: Fun, val e2: Fun) : Fun()

class Fun<T: Group<T>>: Group<Fun<T>> {
  operator fun plus(addend: Fun<T>) = Sum(this, addend)
  
  operator fun times(multiplicand: Fun<T>) = Prod(this, multiplicand)
}

object DoubleContext {
  operator fun Number.times(expr: Fun<Double>) = Const(toDouble()) * expr
}

Now, we can use the context to define another extension, Fun.multiplyByTwo, which computes the product inside a DoubleContext, using the operator overload we defined above:

fun Fun<Double>.multiplyByTwo() = with(DoubleContext) { 2 * this } // Uses `*` operator in DoubleContext

Extensions can also be defined in another file or context and imported on demand. For example, Kotlin∇ also uses extensions to define shape-safe constructors and operators for vector and matrix arithmetic.

Multiple dispatch

In conjunction with ADTs, Kotlin∇ also uses multiple dispatch to instantiate the most specific result type of applying an operator based on the type of its operands. While multiple dispatch is not an explicit language feature, it can be emulated using inheritance.

Building on the previous example, a common task in AD is to simplify a graph. This is useful in order to minimize the total number of calculations required, improving numerical stability. We can eagerly simplify expressions based on algebraic rules of replacement. Smart casting allows us to access members of a class after checking its type, without explicitly casting it:

override fun times(multiplicand: Function<X>): Function<X> = when {
  this == zero -> this
  this == one -> multiplicand
  multiplicand == one -> this
  multiplicand == zero -> multiplicand
  this == multiplicand -> pow(two)
  this is Const && multiplicand is Const -> const(value * multiplicand.value)
  // Further simplification is possible using rules of replacement
  else -> Prod(this, multiplicand)
}

val result = Const(2.0) * Sum(Var(2.0), Const(3.0)) // Sum(Prod(Const(2.0), Var(2.0)), Const(6.0))

This allows us to put all related control flow on a single abstract class which is inherited by subclasses, simplifying readability, debugging and refactoring.

Shape-safe tensor operations

While first-class dependent types are useful for ensuring arbitrary shape safety (e.g., when concatenating and reshaping matrices), they are unnecessary for simple equality checking (such as when multiplying two matrices). When the shape of a tensor is known at compile-time, it is possible to encode this information using a less powerful type system*, as long as it supports subtyping and parametric polymorphism (a.k.a. generics). In practice, we can implement a shape-checked tensor arithmetic in languages like Java, Kotlin, C++, C# or Typescript, which accept generic type parameters. In Kotlin, whose type system is less expressive than Java, we use the following strategy.

Shape safety is currently supported up to rank-2 tensors, i.e. matrices. To perform dimension checking in our type system, we first enumerate a list of integer type literals as a chain of subtypes, C <: C - 1 <: C - 2 <: ... <: 1 <: 0, where C is the largest fixed-length dimension we wish to represent, which can be specified by the user prior to compilation. This guarantees linear space and time complexity for subtype checking, with a constant upper bound.

@file:Suppress("ClassName")
interface Nat<T: D0> { val i: Int } // Used for certain type bounds
sealed class D0(open val i: Int = 0) { companion object: D0(), Nat<D0> }
sealed class D1(override val i: Int = 1): D0(i) { companion object: D1(), Nat<D1> }
sealed class D2(override val i: Int = 2): D1(i) { companion object: D2(), Nat<D2> }
sealed class D3(override val i: Int = 3): D2(i) { companion object: D3(), Nat<D3> }
//... † Automatically generated

Next, we overload the call operator to emulate instantiating a collection literal, using arity to infer its dimensionality. Consider the rank-1 case for length inference on vector literals:

open class Vec<E, Len: D1>(val contents: List<E>)
fun <T> Vec(t1: T): Vec<T, D1> = Vec(listOf(t1))
fun <T> Vec(t1: T, t2: T): Vec<T, D2> = Vec(listOf(t1, t2))
fun <T> Vec(t1: T, t2: T, t3: T): Vec<T, D3> = Vec(listOf(t1, t2, t3))
//... † Automatically generated

Finally, we encode length as a parameter of the operand type. Since integer literals are a chain of subtypes, we need only define one operator using the highest literal, and can rely on Liskov substitution to preserve shape safety for all subtypes.

infix operator fun <C: D1, V: Vec<Int, C>> V.plus(v: V): Vec<Int, C> =
  Vec(contents.zip(v.contents).map { it.first + it.second })

The operator + can now be used like so. Incompatible operands will cause a type error:

val one = Vec(1, 2, 3) + Vec(1, 2, 3)          // Always runs safely
val add = Vec(1, 2, 3) + Vec(listOf(/*...*/))  // May fail at runtime
val sum = Vec(1, 2) + add                      // Does not compile

A similar syntax is available for matrices and higher-rank tensors. For example, Kotlin∇ can infer the shape of multiplying two matrices, and will not compile if their inner dimensions do not match:

open class Mat<X, R: D1, C: D1>(vararg val rows: Vec<X, C>)
fun <X> Mat1x2(d0: X, d1: X): Mat<X, D1, D2> = Mat(Vec(d0, d1))
fun <X> Mat2x1(d0: X, d1: X): Mat<X, D2, D1> = Mat(Vec(d0), Vec(d1))
//... † Automatically generated
operator fun <Q: D1, R: D1, S: D1> Mat<Int, Q, R>.times(m: Mat<Int, R, S>): Mat<Int, Q, S> = TODO()

// Inferred type: Mat<Int, D4, D4>
val l = Mat4x4(
  1, 2, 3, 4,
  5, 6, 7, 8,
  9, 0, 0, 0,
  9, 0, 0, 0
)

// Inferred type: Mat<Int, D4, D3>
val m = Mat4x3(
  1, 1, 1,
  2, 2, 2,
  3, 3, 3,
  4, 4, 4
)

// Inferred type: Mat<Int, D4, D3>
val lm = l * m
// m * m // Compile error: Expected Mat<3, *>, found Mat<4, 3>

Further examples are provided for shape-safe matrix operations such as addition, subtraction and transposition.

A similar technique is possible in Haskell, which is capable of a more powerful form of type-level computation, type arithmetic. Type arithmetic makes it easy to express convolutional arithmetic and other arithmetic operations on shape variables (say, splitting a vector in half), which is currently not possible, or would require enumerating every possible combination of type literals.

^∗ Many type systems are still capable of performing arbitrary computation in the type checker. As specified, Java's type system is known to be Turing Complete. It may be possible to emulate a limited form of dependent types in Java by exploiting this property, although this may not be computationally tractable due to the practical limitations noted by Grigore.

^† Statically generated code, shipped within the library. To regenerate these methods (e.g., using larger dimensions), a code generator is provided.

Intermediate representation

Kotlin∇ programs are staged into Kaliningraph, an experimental IR for graph computation. As written by the user, many graphs are computationally suboptimal due to expression swell and parameter sharing. To accelerate forward- and backpropagation, it is often advantageous to simplify the graph by applying the reduction semantics in a process known as graph canonicalization. Kaliningraph enables compiler-like optimizations over the graph such as expression simplification and analytic root-finding, and supports features for visualization and debugging, e.g., in computational notebooks.

Property delegation

Property delegation is a reflection feature in the Kotlin language which lets us access properties to which an instance is bound. For example, we can read the property name like so:

class Var(val name: String?) {
  operator fun getValue(thisRef: Any?, property: KProperty<*>) = Var(name ?: property.name)
}

This feature allows consumers to instantiate variables e.g., in an embedded DSL without redeclaring their names:

val x by Var()   // With property delegation
val x = Var("x") // Without property delegation

Without property delegation, users would need to repeat the property name in the constructor.

Experimental ideas

The current API is stable but can be improved in many ways. Currently, Kotlin∇ does not infer a function's input dimensionality (i.e. free variables and their corresponding shape). While it is possible to perform variable capture over a small alphabet using type safe currying, this technique incurs a large source code overhead. It may be possible to reduce the footprint using phantom types or some form of union type bound (cf. Kotlin, Java).

When the shape of an N-dimensional array is known at compile-time, we can use type-level integers to ensure shape conforming tensor operations (inspired by Nexus and others).

Allowing users to specify a matrix's structure in its type signature, (e.g., Singular, Symmetric, Orthogonal, Unitary, Hermitian, Toeplitz) would allow us to specialize derivation over such matrices (cf. section 2.8 of The Matrix Cookbook).

Church encoding

Computers appear to be very complicated machines. Beneath this complexity lies a remarkably simple idea: many apparently complex routines can be rewritten in terms of function composition. Consider the binary operator ^, which can be lowered as follows:

a ^ b :=  a * ... * a 
          \_________/
            b times
a * b :=  a + ... + a 
          \_________/
            b times
a + b :=  a + 1 + ... + 1
              \_________/
                b times
a := next*(next(...next(1)...))
     \________________/
          a times

∗ next is also called S in Peano arithmetic.

By using the λ-calculus, Church tells us, we can lower a large portion of mathematics onto a single operator: function application. Curry, by way of Schönfinkel, gives us combinatory logic, a kind of Rosetta stone for deciphering and translating between a host of cryptic languages. These two ideas, λ-calculus and combinators, are keys to unlocking many puzzles in computer science and mathematics.

Though mathematically elegant, Church numerals are not particularly efficient or pleasant to read. One discovers that trying to encode Church arithmetic in a language without dependent types grows quickly impractical. By selecting a higher radix, however, it is possible to reduce spatial complexity and improve readability, albeit at the cost of increased temporal complexity on certain operations (e.g., + and -). Kotlin∇ uses a binary encoding by default, however generators for other bases are also provided for convenience.

Type classes

The trouble with numerical towers is that they assume all inheritors are aware of the tower. In practice, many types we would like to reuse are entirely oblivious to our DSL. How do we allow users to bring in existing types without needing to modify their source code? This kind of ad hoc polymorphism can be achieved using a pattern called the type class. While the JVM does not allow multiple inheritance on classes, it does support multiple inheritance and default methods on interfaces, allowing users to implement an interface via delegation rather than inheritance.

Suppose we have a base type, Nat defined as an interface with a unitary member, nil, and its successor function, next, representing the Church encoding for natural numbers. To emulate instantiation, we can provide a nested class equipped with a constructor overriding nil and next as follows:

interface Nat<T> {
  val nil: T
  val one: T get() = nil.next()
  fun T.next(): T

  class of<T>(
    override val nil: T,
    val vnext: T.() -> T
  ): Nat<T> {
    override fun T.next(): T = vnext()
  }
}

Now, if we wanted to wrap an external type, such as Double, inside our tower, we could do so as follows:

val doubleNat = Nat.of(nil = 0.0) { this + 1.0 }

Although the Nat interface is very expressive, evaluating arithmetic expressions on Nats can be computationally expensive. For instance, we could define the first three hyperoperations naïvely as follows:

tailrec fun <T> Nat<T>.plus(l: T, r: T, acc: T = l, i: T = nil): T =
  if (i == r) acc else plus(l, r, acc.next(), i.next())

tailrec fun <T> Nat<T>.times(l: T, r: T, acc: T = nil, i: T = nil): T =
  if (i == r) acc else times(l, r, acc + l, i.next())

tailrec fun <T> Nat<T>.pow(base: T, exp: T, acc: T = one, i: T = one): T =
  if (i == exp) acc else pow(base, exp, acc * base, i.next())

However, we note that computing pow(a, b) using this representation requires 𝓞(a↑b) operations using Knuth notation. Clearly, we must do better if this encoding is to be usable. We can make Nat more efficient by introducing a subtype, Group, which forces implementors to define a native addition operator:

interface Group<T>: Nat<T> {
  override fun T.next(): T = this + one
  override fun T.plus(t: T): T

  class of<T>(
    override val nil: T, override val one: T,
    val plus: (T, T) -> T
  ): Group<T> {
    override fun T.plus(t: T) = plus(this, t)
  }
}

Given a Group, we can now define a more efficient implementation of Fibonacci. This will use the group-specific addition operator:

tailrec fun <T> Nat<T>.fibonacci(
  n: T,
  seed: Pair<T, T> = nil to one,
  fib: (Pair<T, T>) -> Pair<T, T> = { (a, b) -> b to a + b },
  i: T = nil,
): T =
  if (i == n) fib(seed).first
  else fibonacci(n = n, seed = fib(seed), i = i.next())

val doubleGroup = Group.of(one = 1.0, plus = { a, b -> a + b })
println(doubleGroup.fibonacci(10.0)) // Prints: 233.0

We could further extend this chain by introducing a subtype called Ring, which overrides + and requires implementors to define a native * operator. Rings and their relatives are known to have many useful applications in graph theory and statistics:

interface Ring<T>: Group<T> {
  override fun T.plus(t: T): T
  override fun T.times(t: T): T

  class of<T>(
    override val nil: T, override val one: T,
    val plus: (T, T) -> T,
    val times: (T, T) -> T
  ): Ring<T> {
    override fun T.plus(t: T) = plus(this, t)
    override fun T.times(t: T) = times(this, t)
  }
}

val doubleRing = Ring.of(one = 1.0, plus = { a, b -> a + b }, times = { a, b -> a * b })

Since differentiation is a linear map between function spaces, we now have the primitives necessary to build a fully-generic AD system, and could easily implement the sum and product rules. To view the above example in full, see Types.kt.

What benefit does this abstraction provide to the end user? By parameterizing over primitive operators, Kotlin∇ consumers can easily swap out a tensor backend without needing to alter or recompile any upstream dependencies. This feature makes multiplatform development a breeze: wherever a type class operator (e.g., + or *) with matching signature is encountered across a project, it will be dispatched to the user-supplied lambda delegate for specialized execution on custom hardware. Runtime indirection can be elided with proper compiler inlining for zero-cost abstraction.

Type arithmetic

By default, Kotlin∇ supports compile time type arithmetic in the following domain:

Fully symmetric arithmetic: { a ⍟ b ϵ [0..16){+,-,*}[0..16) | 0 ≤ a ⍟ b }
Asymmetric arithmetic: { a ⍟ b ϵ [0..512){+,-}[0..16) | 0 ≤ a ⍟ b < 512 }
Semi-symmetric arithmetic: { a / b = c, a = b * c | a, b, c ϵ [0..128) & a % b = 0 }

Arithmetic outside this domain is checked at runtime, prior to evaluation.

Compile time type arithmetic is achieved by generating a type-level representation of the Church encoding. A usage example is shown in ChurchArithmeticTest.kt, which may be run with the following command:

./gradlew :kotlingrad:cleanJvmTest :kotlingrad:jvmTest --tests "ai.hypergraph.kotlingrad.typelevel.church.ChurchArithmeticTest"

Extensions to other bases, including binary and decimal are also provided, which may be used as follows:

// Boolean arithmetic
val b32 = T.F
  .let { it + T.F }   // B_4<Ø>
  .let { it + T.F.F } // B_8<Ø>
  .let { it + T.T }   // T<T<F<T<Ø>>>>
  .let { it + T.F }   // T<F<T<T<Ø>>>>
  .let { it - T.F }   // T<T<F<T<Ø>>>>
  .let { it + T.F }   // T<F<T<T<Ø>>>>
  .let { it + T.F }   // T<T<T<T<Ø>>>>
  .let { it + T }     // T<F<F<F<Ø>>>>

assertEquals(T.F.F.F.F, b32)

// Chinese arithmetic
val 四十二 = (十七 减 九)
  .let { it 加 it }        // 六<一<无>>
  .let { (it 加 八) 加 六 } // 零<三<无>>
  .let { (it 减 三) 加 九 } // 六<三<无>>
  .let { (it 加 六) 除 六 } // 七<无>
  .let { (it 乘 六) 加 五 } // 七<四<无>>
  .let { (it 减 三) 减 九 } // 五<三<无>>
  .let { (it 加 五) 加 二 } // 二<四<无>>
  .also { assertEquals(六 乘 七, it) }

assertEquals(42, 四十二.toInt())

To alter the arithmetic domain, edit the file BinGen.kt/算盘厂.kt, then use the following command to regenerate Arithmetic.kt/算盘.kt:

./gradlew genShapes

In practice, compile time type arithmetic may struggle to compute numbers in excess of 4095. The Kotlin team has been informed of these issues:

This API is experimental and subject to change without notice. In the future, it will be used to statically type check tensor functions whose output shape is an arithmetic function of the input shapes, e.g., concatenation, splitting and convolution.

Grammar

For a detailed grammar and semantics, please refer to the Kotlin∇ specification.

UML Diagram

The following graph depicts the subtyping relation between classes and interfaces in Kotlin∇.

Comparison

Unlike certain frameworks which simply wrap an existing AD library in a type-safe DSL, Kotlin∇ contains a fully shape-safe implementation of algorithmic differentiation, written in pure Kotlin. By doing so, it can leverage Kotlin language features such as typed functional programming, as well as interoperability with other languages on the JVM platform. Furthermore, it implements symbolic differentiation, which unlike Wengert tape or dual-number based ADs, allows it to calculate derivatives of arbitrarily high order with zero extra engineering required. Further details can be found below.

Framework	Language	SD¹	AD²	HD³	DP⁴	FP⁵	TS⁶	SS⁷	DT⁸	MP⁹
Kotlin∇	Kotlin	✔️	✔️	✔️	✔️	✔️	✔️	✔️	🚧	✔️
DiffSharp	F#	❌	✔️	✔️	✔️	✔️	✔️	❌	❌	❌
TensorFlow.FSharp	F#	❌	❌	❌	✔️	✔️	✔️	✔️	❌	❌
shapesafe	Scala	🚧	🚧	🚧	🚧	✔️	✔️	✔️	🚧	❌
Nexus	Scala	❌	✔️	❌	✔️	✔️	✔️	✔️	❌	❌
Lantern	Scala	❌	✔️	✔️	✔️	✔️	✔️	❌	❌	❌
Hipparchus	Java	❌	✔️	❌	❌	❌	✔️	❌	❌	❌
JAutoDiff	Java	✔️	✔️	❌	❌	❌	✔️	❌	❌	❌
Eclipse DL4J	Java	❌	🚧	❌	❌	❌	✔️	❌	❌	❌
SICMUtils	Clojure	✔️	✔️	✔️	✔️	✔️	❌	❌	❌	❌
Halide	C++	❌	✔️	✔️	✔️	❌	✔️	❌	❌	❌
Tensor Safe	Haskell	❌	❌	❌	❌	✔️	✔️	✔️	✔️	❌
HaskTorch	Haskell	❌	❌	❌	❌	✔️	✔️	✔️	❌	❌
Dex	Haskell	❌	✔️	✔️	✔️	✔️	✔️	✔️	🚧	❌
Grenade	Haskell	❌	❌	❌	❌	✔️	✔️	✔️	❌	❌
Stalin∇	Scheme	❌	✔️	❌	❌	✔️	❌	❌	❌	❌
Myia	Python	✔️	✔️	✔️	✔️	✔️	❌	❌	❌	🚧
Autograd	Python	❌	✔️	❌	❌	❌	❌	❌	❌	❌
JAX	Python	❌	✔️	✔️	✔️	✔️	❌	❌	❌	🚧
Tangent	Python	❌	✔️	❌	❌	❌	❌	❌	❌	❌
Analitik	Analitik	✔️	❌	❌	❌	✔️	❌	❌	❌	❌

¹ Symbolic differentiation*, ² Automatic differentiation*, ³ Higher-order/rank differentiation, ⁴ Differentiable programming*, ⁵ Functional programming, ⁶ Compile-time type safety, ⁷ Compile-time shape safety, ⁸ Dependently Typed, ⁹ Multiplatform

^∗ Although we do not distinguish between AD and SD, here we adopt the authors' preferred nomenclature. We do make a distinction between differentiable programming libraries and those which simply construct neural networks. The 🚧 symbol indicates work in progress.

References

To the author's knowledge, Kotlin∇ is the first AD implementation in native Kotlin. While the particular synthesis of these ideas (i.e. shape-safe, functional AD, using generic types) is unique, it has been influenced by a long list of prior work in AD. Below is a list of projects and publications that helped inspire this work.

Automatic differentiation

Complexity

Fast parallel computation of polynomials using few processors, Valiant and Skyum (1983)
The complexity of partial derivatives, Baur and Strassen (1983)
Lower Bounds on Arithmetic Circuits via Partial Derivatives
Learning Restricted Models of Arithmetic Circuits

Differentiable programming

Calculus

The Matrix Calculus You Need For Deep Learning, Parr and Howard (2018)
Backpropagation in matrix notation, Mishachev (2017)
Matrix derivatives, from the Matrix Cookbook
Div, Grad, Curl and All That, Petersen and Pedersen (2012)
Matrix Differentiation (and some other stuff), Barnes (2006)
Symbolic Matrix Derivatives, Dwyer and Macphail (1948)

Computer algebra

Towards an API for the real numbers, Boehm (2020)
miniKanren as a Tool for Symbolic Computation in Python, Willard (2020)
A Design Proposal for an Object Oriented Algebraic Library, Niculescu (2003)
On Using Generics for Implementing Algebraic Structures, Niculescu (2011)
How to turn a scripting language into a domain-specific language for computer algebra, Jolly and Kredel (2008)
Evaluation of a Java Computer Algebra System, Kredel (2007)
Typesafe Abstractions for Tensor Operations, Chen (2017)
Einstein Summation in Numpy, Bilaniuk (2016)
Issues in Computer Algebra, Nunes-Harwitt
Term Rewriting and All That, Baader and Nipkow (1998)
Describing the syntax of programming languages using conjunctive and Boolean grammars, Okhotin (2016)
Formal languages over GF(2), Okhotin (2019)

Symbolic mathematics

KMath - Kotlin mathematics extensions library
SymJa - Computer algebra language & symbolic math library for Android
tensor - Linear algebra for tensors with symbolic and numeric scalars
Hipparchus - An efficient, general-purpose mathematics components library in the Java programming language
miniKanren - A tool for symbolic computation and logic programming
SymJava - A Java library for fast symbolic-numeric computation
JAS - Java Algebra System
jalgebra - An abstract algebra library for Java
COJAC - Numerical sniffing tool and Enriching number wrapper for Java
chebfun - Allows representing functions as Chebyshev polynomials, for easy symbolic differentiation (or integration)
horeilly1101/deriv - Open source derivative calculator REST API (and Java library)

AD libraries

TensorFlow.FSharp: An eDSL for writing numerical models in F# with support for interactive tensor shape-checking
Stalin∇, a brutally optimizing compiler for the VLAD language, a pure dialect of Scheme with first-class automatic differentiation operators
Autograd - Efficiently computes derivatives of NumPy code
Myia - SCT based AD, adapted from Pearlmutter & Siskind's "Reverse Mode AD in a functional framework"
JAX - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Dex - Research language for array processing in the Haskell/ML family
Nexus - Type-safe tensors, deep learning and probabilistic programming in Scala
Tangent - "Source-to-Source Debuggable Derivatives in Pure Python"
Grenade - composable, dependently typed, practical, and fast RNNs in Haskell
Lantern - a framework in Scala, based on delimited continuations and multi-stage programming
JAutoDiff - An Automatic Differentiation Library
DiffSharp, a functional AD library implemented in the F# language
Analitik - Algebraic language for the description of computing processes using analytical transformations

Special thanks

The following individuals have helped shape this project through their enthusiasm and thoughtful feedback. Please check out their work.

kotlingrad's People

Contributors

Stargazers

Watchers

kotlingrad's Issues

times() is not overloaded for Mat x Vec

In the current state of the library, it is impossible to multiply a m x n matrix with an n-dimensional vector. I've gotten around this by adding functions to convert between the types, however this is a construct I think should be supported by the library

I've glanced through the source, and it seems as though the inheritance relationship between these types is backwards. Mat is a subclass of Vec<Vec<E, Cols>, Rows>, however it seems as though Mat should be a top level type, while Vec is the special case where Cols=1.

Establish the connection with GF(2)

Kotlin∇ tries to build a mathematically sound type system capturing the relationship between functions and fields. This design should be described more formally, but in short there are two approaches:

The direct approach is to define Fun<X>: Field<X> where X is a member of some algebra over the reals, equipped with the usual arithmetic operators. When applying an operator, it is evaluated eagerly and you get back X immediately. This is a shallow eDSL, evaluated eagerly.
Another way is to treat the function as a member of a field whose elements are themselves functions, e.g. Fun<X>: Field<Fun<X>>. Instead of returning X, operators return Fun<X> by default, which can be evaluated by using invoke(...): X (X and Fun<X> are both fields).

The second representation comes from finite field theory, which has important implications for expression parsing and language design. It would be useful to understand this connection more deeply.

References

Update Viktor to 1.1.0

Is your feature request related to a problem? Please describe.
Viktor 1.0.1 must not be used because it depends on jcenter repository, which is deprecated, so it's very important to upgrade to Viktor 1.1.0 published on the Central.

Describe the solution you'd like
Upgrade Viktor from 1.0.1 to 1.1.0.

Typo in installation guide

gpr.user=<USERNAME>
grp.key=<PERSONAL_ACCESS_TOKEN>

The second line should be gpr....

Confused me for a while with unhelpful "Could not resolve" errors from gradle.

Typo in test

https://github.com/breandan/kotlingrad/blob/master/src/test/kotlin/edu/umontreal/kotlingrad/calculus/TestGradient.kt

val x = Var("x")
val y = Var("x")

I guess that should be

val x = Var("x")
val y = Var("y")

How to add library to project?

Is kotlingrad hosted in any maven repos (such as jcenter or central)? I'd like to use this but I can't find a source to import it from

Possible collaboration with Facebook?

@breandan Your project seems to be a true state of the art autodiff library! Facebook worked on bringing autodifferentiation to Kotlin last year: https://ai.facebook.com/blog/paving-the-way-for-software-20-with-kotlin/

Any news on what happened? They don't seem to have released a library yet.
it would be really nice if you joined forces, they might hire you and your library has a lot of technical merits, from quick look Kotlingrad might be the "best" autodiff library out there (although extensive benchmarks vs e.g JAX/XLA are missing)
Such a collaboration could help bring traction toward Kotlin for machine learning, especially if Facebook made the revolutionnarily disruptive decision to fund GraalPython [0]

[0] Kotlin is interopperable with Python through GraalPython https://github.com/oracle/graalpython
finally you might find this blog interesting http://www.stochasticlifestyle.com/engineering-trade-offs-in-automatic-differentiation-from-tensorflow-and-pytorch-to-jax-and-julia/ (although you will probably not learn anything new from it :)

unrelated but I wonder what Ndarray implementation do you use? ND4J is by far the implementation with the most human resources, and it can easily use optimized backends such as openBLAS or better: Intel MKL https://github.com/eclipse/deeplearning4j/tree/master/nd4j

finally, a long term idea for KotlinGrad might be to develop a compiler plugin. For easing this process, arrow meta can be used https://github.com/arrow-kt/arrow-meta

Cross-compilation support for Java & OpenJDK 8

Currently, running ./gradlew test under Java 8 generates the following output:

$ ./gradlew test

...

FAILURE: Build failed with an exception.

* What went wrong:
java.lang.UnsupportedClassVersionError: org/openjfx/gradle/JavaFXPlugin has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
> org/openjfx/gradle/JavaFXPlugin has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 362ms

This seems to imply that support for Java 8 and OpenJDK 8 (which is still the mainstream java environment due to license issues) is discarded in favour of java 11.

Can we revise the gradle build file such that it supports cross-compiilation for both Java 8 and Java 11?

Formulas and diagrams in README not visible in dark mode

Hi 👋

I've got GitHub dark mode enabled.

When I look at the README, I'm not able to view the diagrams and formulas.

The formulas in the 'Math' column are black text on a transparent background
I can see the blue/red arrows in the dataflow graph, but the text is also black

When I look at the dataflow graphs directly, because they are SVGs GitHub renders them with a transparent background

Suggestions

SVG Images

Here are my suggestions for improving the visibility of the SVG images:

In the SVGs, add a white background (no more transparent backgrounds)
Create two different SVGs, one for dark mode, and another for light mode. Apply the appropriate GitHub theme context CSS class to each image.

Users can now specify the theme an image is displayed for in Markdown. Appending #gh-dark-mode-only or #gh-light-mode-only to the end of an image url will define whether it's only shown to viewers using a light or a dark GitHub theme.

Here's an example of a commit adding support for light/dark logos)

Formulas

For the formulas I see that they are images that are referenced using the GitHub API

https://github.com/breandan/kotlingrad/blob/0.4.5/README.md?plain=1#L113

https://github.com/breandan/kotlingrad/blob/0.4.5/README.md?plain=1#L134

Perhaps the GitHub syntax might support dark mode? https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions

Here are some examples:

Inline expression (using dollars): $\mathbf{A}(\mathbf{B})$

Block using $$:

$$\mathbf{A}(\mathbf{B})$$

Block using code fence math code fence block:

$$\mathbf{A}(\mathbf{B})$$

They look good to me! Screenshot of the above example

AD with heterogeneous types in function inputs & outputs

This is a a request for more information.

Currently, it is only possible to represent functions where all inputs and outputs share a single type. In future iterations, it is possible to extend support for building functions with varying input/output types and enforce constraints on both, using covariant and contravariant type bounds.

This sounds like an important limitation. 🤔

Curious: is there the same limitation in Lantern or Tangent ?

Can see reference to Nexus - very interesting.

When the shape of an N-dimensional array is known at compile-time, we can use type-level integer literals to ensure shape conforming tensor operations (inspired by Nexus and others).

"Kotlin𝛁 supports shape-safe tensor operations by encoding tensor rank as a parameter of the operand’s type signature."

It would very useful to have more details and (ideally) code examples to compare across all three frameworks and advise on future plans to advance. It would be a cool topic for a series of blog posts.

Migrate to context-free grammar

Following KMath [Nozik (2018)] Kotlin∇ currently implements Hirschfield et al.'s (2005) context-oriented programming style. We define a set of numerical contexts within which consumers must perform all operations (e.g. with(DoubleContext) {...}). As Nozik admits, this approach has certain disadvantages. It is also somewhat awkward to use e.g. in a notebook setting where code may span multiple cells. It should be possible to redefine the type system in a context-free manner to emulate the implicit type conversion semantics of the host language, without enclosing everything inside an explicit context.

Compile to Kaliningraph IR

Kaliningraph is a BLAS-backed graph library which supports computation graphs with interactive debugging and visualization. This would enable users to "see" backpropagation and visually step through the program while it is executing. It also supports constant propagation and has a sparse matrix representation, for optimized execution. We will need to implement an arithmetic circuit encoder to benefit from Miller's evaluator.

java.lang.module.ResolutionException in dependencies

I've add dependencies
implementation("ai.hypergraph:kotlingrad:0.4.7")
in Gradle project, build.gradle.

Compiled without error. When I tried to run:

> Task :run FAILED Error occurred during initialization of boot layer java.lang.module.ResolutionException: Modules guru.nidi.graphviz.kotlin and guru.nidi.graphviz export package guru.nidi.graphviz to module XXXX

where XXXX could be svgSalamander or com.ibm.icu.

I can figure out the problem of two modules java/kotlin graphviz both define package guru.nidi.graphviz.

I search on internet, without any clue.

Thanks for any help.

Port the library to Kotlin/Multiplatform

Describe the solution you'd like
Change the project model to Kotlin/Multiplatform and provide targets other than JVM.

The deployed Dokka HTML is empty

The deployed documentation site is empty (I'll try to investigate this on my own, just reporting).

http://breandan.net/kotlingrad/

Scala DSL

Purpose

Introducing a secondary tensor operation DSL (Domain Specific Language) written in & optimised for Scala language & various compilers (the most common of which are JVM based scalac 2.12+, other options are Scala.js & Dotty)

Despite not being a dependently typed language, Scala may be favourable for some reasons:

Scala is the preferred language of Apache Spark, Apache Flink, CHISEL (Constructing Hardware In Scala Embedded Language) and NVidia RAPIDS, all of which are important infrastructures for large scale grid learning.
Singleton types (shapeless / 2.13) allows arbitrary number of shape types to be generated on-demand, without relying on code generator or metaprogramming
Path-dependent types allows shape algebra to be defined for both symbols & numbers
Type inference by summoning implicits is more flexible for theorem proving than inheritance & delegation
Operator overloading & infix syntax allows DSL to be closer to math notations, e.g. vec1 dot_* vec2 can be easier to read than dotProduct(vec1, vec2)
Advanced type algebra can be expressed in macro & roadmap features. Specifically, the singleton-ops library and it’s associated discussions in SIP-23 have suggested that Scala 2.13 & Dotty will use a more direct & expressive approach

Usage

A PoC project has been submitted at:

https://github.com/tribbloid/shapesafe/tree/master

At this moment (06/07/2020) it only implemented very few operations (dot_product, concat) for double vectors. However most features in the following list has been proved and can be showcased by running gradle test on DoubleVectorSpec.scala.

Required & proven features:

Type-level shape representation for vector
Compile-time shape safety for operations with constant shapes (1)
Run-time shape safety for operations with variable shapes (2)
(1) can smoothly degrade to (2) for more flexible programming paradigm
Not relying on compile-time code generation or defining large number of compile-time classes, both of which may stress a JVM classloader
Not relying on esoteric compiler plugin, macro or unstable libraries. (the PoC only uses singleton-ops which is actively maintained by lightbend core team)

Required & unproven:

Type-level shape representation for matrix
Type-level shape representation for arbitrary tensor, with no limitation on its arity and dimensionality (ideally through using recursive generic types, the same technique that defined shapeless.HList)
Support for UInt & Float element data types
Support for more operations, particularly if used frequently by ANN

Nice to have:

Not relying on church type encoding (The only violation in my PoC is shapeles.Nat), which causes slow compilation
Compile-time & run-time shape safety for named tensor
Shape algebra defined for symbols instead of number supported by user-defined axioms & compile-time theorem proving
Dotty or Scala.js support

Not in the scope:

AutoDiff / Differentiable Programming: should be agnostic to DSL
Interoperability with Apache Spark, NVidia RAPIDS or any other library, models should manifest into different executions regardless of DSL being used
Shape safe compositor / pipeline API, too much work

How it fits into roadmap

The competition for supremacy of deep learning ecosystem is brutal and unforgiving. With torch & tensorflow dominating both research & development phase, people have little reasons to steer away from Python & a dynamically typed, procedurally validated scaffolding paradigm. But there are exceptions: the large scale, mission critical, complex systems in production, like autopilot and SLAM, most likely prefers spending much effort reinventing & maintaining a more verbose and arduous code base written in C++ or other low level languages. For these systems, demands for built-in correctness and predictability of execution far outweights the ability to write more concise code.

This is, IMHO, the market niche where kotlingrad can fit in: for mega-engineering rather than prototyping. In particular, to enable users to:

write provably valid neural architecture WITHOUT sanity test
if not possible, write neural architecture with lower test coverage that validates the part that cannot be proven, the 80-20 rule in test coverage is very real and account for most edge case failures in an architecture that lacks any type.
under the above premise, write short & easy to understand code

. in that order. My design & optimisation of DSLs should be consistent with this doctrine. The chosen of Scala & JVM as carrier should naturally put kotlingrad in the ecosystem of Apache Spark, and maybe of RISC-V on the long run, both of which are for large scale production.

unaryPlus overload for group elements

Is your feature request related to a problem? Please describe.

It would be nice to have explicit unary plus prefix op for group elements, etc.

In my usecase, it would be nice for code consistency (even assuming that this operator will just return this)

https://github.com/mipt-npm/kmath/blob/2723c376d9b123cb0d311411d6757d6fb8f379b7/kmath-kotlingrad/src/main/kotlin/kscience/kmath/kotlingrad/ScalarsAdapters.kt#L88

Describe the solution you'd like
Add unaryPlus for SFun.

Time traveling visual debugger

Debugging Kotlin∇ code within IntelliJ IDEA can be somewhat cumbersome due to the functional API structure (lots of deeply-nested stack traces and context switching). To facilitate more user-friendly debugging, we should add support for visual debugging by exposing Kaliningraph’s built-in graph visualization capabilities. For example, the user could attach a debugger to the computation graph by writing fn.debug(bindings), which would open a browser window and allow them to replay the rewriting steps, see numerical values flowing through the nodes and step forward and backward through time.

Alternatively, a better approach might be to reuse Buche, which is specifically designed for visual debugging. In this case, we would simply need to attach to the browser using a background process then write the logs to e.g. standard output. Perhaps it is possible to get Buche to play together with Kotlin/JS somehow, need to investigate this approach further to understand its complexity as compared with Kweb rendering.

Could not resolve dependencies even with maven repo added.

My build.gradle.kts file is like

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    java
    kotlin("jvm") version "1.3.50"
}

group = "..."
version = "..."

repositories {
    maven("https://maven.pkg.github.com/breandan/kotlingrad")
    mavenCentral()
}

dependencies {
    implementation(kotlin("stdlib-jdk8"))
    implementation("edu.umontreal:kotlingrad:0.1")
    testCompile("junit", "junit", "4.12")
}

configure<JavaPluginConvention> {
    sourceCompatibility = JavaVersion.VERSION_1_8
}
tasks.withType<KotlinCompile> {
    kotlinOptions.jvmTarget = "1.8"
}

and it fails to resolve edu.umontreal:kotlingrad:0.1.

Add `.idea/codeStyles` to the repository

Is your feature request related to a problem? Please describe.
I don't know how to use the IntelliJ IDEA formatter to format the project.

Describe the solution you'd like
Track .idea/codeStyles folder in the Git repository.

Unable to resolve dependencies

When using

    <dependency>
      <groupId>ai.hypergraph</groupId>
      <artifactId>kotlingrad</artifactId>
      <version>0.4.7</version>
    </dependency>

I'm unable to resolve any ai types.

If I switch to

    <dependency>
      <groupId>ai.hypergraph</groupId>
      <artifactId>kotlingrad-jvm</artifactId>
      <version>0.4.7</version>
    </dependency>

instead, I get failures to resolve kotlin-bom

Unresolved dependency: 'org.jetbrains.kotlin:kotlin-bom:jar:1.6.10'

Currently it seems impossible to use this project with the current pom setup. Do you have an example of an external project using kotlingrad where this works?

Publish roadmap and contributing doc?

I'm curious if your visions include making it a feature-complete NN training framework?

What will be the master plan? Integrating with Torch/TF/MXNet or build hardware-level compilation framework from scratch?

Also, what is the standard & code of conduct for contributions from the community?

I'm totally convinced of its capability and believe it can fit into the missing link between horizontal AI (Apache Spark/Apache Flink) and vertical AI (domain-specific hardware, Chisel/FIRRTL)

Jar file of 0.3.5 not published on jitpack

See:

https://jitpack.io/com/github/breandan/kotlingrad/0.3.5/

Compared to:

https://jitpack.io/com/github/breandan/kotlingrad/0.3.4

Publish Dokka generated API documentation

Is your feature request related to a problem? Please describe.
My API exposes some Kotlingrad types, and my Dokka site lacks links to Kotlingrad's one.

Describe the solution you'd like
Publish Dokka generated static site to breandan.github.io/kotlingrad

Gradual typing

Currently, Kotlin∇ requires declaring all type and shape information at compile time. It should be possible to relax this constraint to support more flexible prototyping. To implement this feature, we would need to introduce an untyped class hierarchy to the DSL. Bidirectional operator overloads could be used to translate to and from the typed and untyped hierarchies. The semantics for these conversions must be thought out more carefully, but essentially would not be dissimilar to the existing operators for dimension switching, i.e. between scalar, vector and matrix functions.

References

Gradually typed symbolic expressions, Broman and Siek (2018)

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

breandan / kotlingrad Goto Github PK

kotlingrad's Introduction

Kotlin∇: Type-safe Symbolic Differentiation for the JVM

Table of contents

Introduction

Features

Usage

Installation

Gradle

Maven

Jupyter Notebook

Notation

Higher-Rank Derivatives

Higher-order derivatives

Shape safety

Example

Variable capture

Visualization tools

Dataflow graphs

Plotting

Loss curves

Testing

How?

Operator overloading

First-class functions

Multi-stage programming

Algebraic data types

Extension functions

Multiple dispatch

Shape-safe tensor operations

Intermediate representation

Property delegation

Experimental ideas

Church encoding

Type classes

Type arithmetic

Grammar

UML Diagram

Comparison

References

Automatic differentiation

Complexity

Differentiable programming

Calculus

Computer algebra

Symbolic mathematics

Neural networks

Type systems

Domain-specific languages

Automated testing

AD libraries

Special thanks

kotlingrad's People

Contributors

Stargazers

Watchers

Forkers

kotlingrad's Issues

References

Suggestions

SVG Images

Formulas

Purpose

Usage

How it fits into roadmap

References

Recommend Projects

Recommend Topics

Recommend Org