joelberkeley / spidr
Accelerated machine learning with dependent types
License: Apache License 2.0
In Acquisition, is there a simpler way (e.g. using S) to require that batch_size is positive, while actually passing in the batch size and not batch_size - 1, which would be a weird API?
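As a sketch of one option (acquire is a hypothetical stand-in, not spidr's API), base's Data.Nat provides IsSucc, which can be taken as an erased auto-implicit proof, so callers pass the batch size itself:

import Data.Nat

-- Take the batch size directly, with a compile-time proof it is positive.
acquire : (batchSize : Nat) -> {auto 0 ok : IsSucc batchSize} -> String
acquire batchSize = "batch of " ++ show batchSize

ok : String
ok = acquire 3  -- type checks; `acquire 0` is rejected at compile time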
This is for completeness, since it's perfectly reasonable to do this without any data.
When a Tensor is used more than once, e.g. x in x = 1; y = x + x, a new XlaOp is created for every usage, rather than reusing the XlaOp. This wastes time in constructing, and likely compiling, the graph. It may also use more memory than necessary. Refactor the internals of Tensor and Tensor ops such that values are reused.
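One possible shape of the fix, sketched with stand-ins (Node, XlaOp and build are hypothetical, not spidr's internals): give each Tensor node an identifier and cache the op built for it, so shared subterms are constructed exactly once.

import Data.SortedMap

data XlaOp = MkXlaOp Int

record Node where
  constructor MkNode
  nodeId : Int

build : SortedMap Int XlaOp -> Node -> (SortedMap Int XlaOp, XlaOp)
build cache node = case lookup node.nodeId cache of
  Just op => (cache, op)                  -- reuse the op built earlier
  Nothing =>
    let op = MkXlaOp node.nodeId          -- construct once ...
    in (insert node.nodeId op cache, op)  -- ... and record it for later usages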
See idris-lang/Idris2#53 for status on Idris2 support for building docs
ArrayLike can accept any data type, including Vects (e.g. an ArrayLike [3, 4, 5] Double is the same as an ArrayLike [3, 4] (Vect 5 Double)). This may make type inference awkward, with no known benefit. Is it possible and beneficial to restrict the definition?
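One way to restrict it, sketched with hypothetical names (Dtype, rep): index arrays by a closed datatype of scalar dtypes rather than an arbitrary Type, so element types like Vect 5 Double are excluded by construction.

import Data.Vect

data Dtype = F64 | I32

rep : Dtype -> Type
rep F64 = Double
rep I32 = Int

ArrayLike : List Nat -> Dtype -> Type
ArrayLike []          dtype = rep dtype
ArrayLike (n :: rest) dtype = Vect n (ArrayLike rest dtype)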
Integer width is relevant at a number of points in the chain of constructing and using Tensors (Integer or Int or Int32 etc.).
Strictly orthogonal Broadcastable constructors may make implementations easier.
In some cases it's useful to be able to say "this thing works for this shape and any shapes isomorphic to it". Isomorphic shapes include those with extra or fewer dimensions of length 1, e.g. [3, 1, 2], [3, 2] and [1, 3, 2]. It is essentially a cross between Squeezable and a subset of Broadcastable.
A use case I came across was ClosedFormDistribution for Gaussian. This is implemented for event shape [1], but could equally be implemented for [] or [1, 1], with something like

Isomorphic shape [] => ClosedFormDistribution shape Gaussian where
  ...

and we'd presumably reshape within the implementation as appropriate. Similar could be done for GaussianProcess so that we don't need targets = [1].
Do we want a dedicated operator for element-wise multiplication, division etc., so that x * y is always the mathematical version, and readers can differentiate between that and, say, x *# y for element-wise multiplication?
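A minimal sketch of the operator itself, with plain Lists standing in for Tensors (the *# name is just a suggestion):

infixl 9 *#

-- Element-wise (Hadamard) product; (*) stays reserved for the mathematical one.
(*#) : Num a => List a -> List a -> List a
(*#) = zipWith (*)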
I suspect there is well-founded logic behind what shapes can and can't be broadcast to others. It would be really nice to have Broadcastable use that logic, so that we know it's complete and why (and perhaps even prove it's complete?), as well as other potential niceties like better type inference, unique values for each pair of from and to shapes, etc.
Scalar Tensor values have a natural interpretation as numerics. We could implement the numeric interfaces Num, Neg, Abs etc. for these.
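A minimal sketch with a stand-in scalar wrapper rather than the real Tensor; Neg, Abs etc. would follow the same pattern:

data Scalar : Type -> Type where
  MkScalar : a -> Scalar a

Num a => Num (Scalar a) where
  (MkScalar x) + (MkScalar y) = MkScalar (x + y)
  (MkScalar x) * (MkScalar y) = MkScalar (x * y)
  fromInteger = MkScalar . fromInteger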
So that we can begin to use broadcasting, add the basic definition of broadcasting, such that code type checks for adding an axis of any size: shape -> n :: shape for any n.
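A hedged sketch of what that might look like (spidr's actual Broadcastable may differ): a Nest constructor adds one leading axis of any length n.

data Broadcastable : (from : List Nat) -> (to : List Nat) -> Type where
  Same : Broadcastable shape shape
  Nest : Broadcastable shape (n :: shape)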
Run Poplibs in the CI run. This helps towards being able to call Poplibs from Idris.
The Cholesky factor can be reused for inference and for calculating the marginal likelihood. Write an implementation that takes this into account so it's not recalculated.
This may help https://gregorygundersen.com/blog/2019/09/12/practical-gp-regression/
Requires #105
see title
Many Tensor tests don't test with inf and nan values. This is particularly true for hard-coded test cases (often arrays). All ops that use Double should have such tests.
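The kind of hard-coded cases to include, as a sketch (edgeCases is a hypothetical name):

inf : Double
inf = 1.0 / 0.0

nan : Double
nan = 0.0 / 0.0

edgeCases : List Double
edgeCases = [-inf, -1.0, -0.0, 0.0, 1.0, inf, nan]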
The standard operator for in-place division is /= in other languages. In Idris, that operator is used for inequality. How do we resolve this?
Add at least two Idris 2 tests to CI. These can be of really simple functions, like \x => x + 1 and \x => -x, so that it's easy to add tests for new Idris functionality.
Write and implement a tutorial on Gaussian process inference and how it is designed in spidr. We can do this as a LaTeX literate file, including equations for the code we're using, then build it and publish it to the website if possible.
Run XLA in the CI run. This helps towards being able to call XLA from Idris.
The XLA docs explain how a tensor of shape [n, m] can be broadcast to [p, n, m] or [n, p, m] when the axes to match with are specified. Do we want to support this? As of writing, this kind of broadcasting is possible in spidr by expanding dims then broadcasting.
The author didn't know how to choose operator precedence when defining new operators. As such, they were all guesses. Go through all of them and find a good precedence for each. Associativity should also be checked, though there was less guessing involved there.
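For reference, both precedence and associativity are set in Idris fixity declarations, so this is a matter of revisiting lines like the following (operators here are hypothetical):

infixl 8 +.  -- left-associative, precedence 8
infixr 9 *.  -- right-associative, precedence 9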
Call a simple C function (e.g. \x => x + 1) from Idris 2.
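A minimal Idris 2 FFI sketch, assuming a C library libinc exposing int inc(int x) { return x + 1; } (the library and function names are made up for illustration):

%foreign "C:inc,libinc"
inc : Int -> Int

main : IO ()
main = printLn (inc 1)  -- prints 2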
Towards the end goal of running the backend from Idris, it would help to be able to run an example from pure C.
Idris version number in package description file is a string because the compiler won't accept it in the format illustrated in the docs. Have raised this with Idris devs: idris-lang/Idris2#1373
The Compiler module should only be used to provide implementations for Tensor functionality. It shouldn't be importable by client code.
Support complex numbers (integral and/or floating point depending on what XLA supports).
It may be possible to use dependent types to implement Einstein summation (as per tf.einsum) with no runtime overhead and a type-checked index grammar ("...ij -> ...ji" etc.).
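As a tiny illustration of the flavour, "ij -> ji" is already expressible as a shape-checked operation on nested Vects; a dependently typed einsum would generalise the index grammar.

import Data.Vect

transposeIJ : {n : Nat} -> Vect m (Vect n a) -> Vect n (Vect m a)
transposeIJ = transpose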
This would introduce gradient descent algorithms into spidr
Requires #67. Possibly not needed if BNNs or neural processes cover this already.
In order to call into Poplibs from Idris, we want to install Poplibs as part of the CI run
As a user, it would be nice to be able to write scalar tensors using numeric literals. This would be possible if we implemented fromInteger and fromDouble. Bear in mind that this may confuse the type checker, as e.g. 1 * 2 will be ambiguous.
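A sketch with a stand-in type (Scalar64 is hypothetical): implementing Prelude's FromDouble is what lets literals like 0.5 elaborate to scalar values, and fromInteger works analogously via Num.

data Scalar64 = MkScalar64 Double

FromDouble Scalar64 where
  fromDouble = MkScalar64

half : Scalar64
half = 0.5  -- elaborates to fromDouble 0.5

For ambiguous cases like 1 * 2, the usual escape hatch is an explicit ascription with the.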
It may be better if we autogenerate the C wrapper for XLA. SWIG is one option for this.
The likelihood mean is unused in the calculation of the GP posterior. This is error-prone and counterintuitive. Options include using it, or only accepting the likelihood variance in the arguments
Via variational inference or warped GPs, or some other method. Possibly not needed if BNNs or neural processes cover this already
When mathematical ops fail, should they return an Error or do something else? How does XLA handle failure for maths ops?
Idris is unable to resolve the shapes for the matrix multiplication operator @@, and specifying the shapes is highly unergonomic.
Research how to implement autograd. We could do this by hand-writing the derivative of each op, though it would be best if we could ensure that grad can only be used if the entire composed function has gradients implemented throughout.
One possibility is to add an additional data constructor for Tensor, where the first pattern corresponds to the op on the value, and the latter to the derivative of the op on the value. Totality may then ensure derivatives are available throughout. Additionally, this may work well with functions of multiple variables.
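A speculative sketch of that idea (all names hypothetical; this is closer to forward-mode dual numbers than the reverse mode of the reference below, but it shows how totality could enforce derivative coverage):

data Tensor : Type where
  Val  : Double -> Tensor
  Dual : (value : Double) -> (derivative : Double) -> Tensor

-- Every op must cover both constructors to be total, so a derivative
-- must be supplied for each op.
square : Tensor -> Tensor
square (Val x)     = Val (x * x)
square (Dual x dx) = Dual (x * x) (2 * x * dx)  -- chain rule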
References
Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation
This is a major milestone in the project, where we can run some backend code from Idris. The example can be of any size, as long as it's possible to verify it's working.
Probabilistic programming seems to be an increasingly popular approach to programming with probability distributions. Determine what that might look like in Idris, and whether it's worth adopting.
TensorFlow allows broadcasting to dimensions of length 0, e.g. 1 :: t -> 0 :: t. Do we allow this and, if not, should we?
It would be illustrative to compare a Gaussian process implementation to a neural network implementation with uncertainty estimates, from both a design perspective and a performance perspective.
Implement a ProbabilisticModel for whatever was implemented in #62, and use it in the Bayesian optimization tutorial for the failure data. Whether it's appropriate for that usage is probably not too important right now. An alternative is a neural process.
Usages of Empiric and ProbabilisticModel require the target shape to be specified even when it can be derived from the distribution event shape.
Note: the compiler balks if the explicit target shape does not match that in the distribution, which implies it knows they need to be the same shape.
If we implement standard category-theoretic interfaces, such as Functor, Foldable and Semigroup, this will make spidr more easily usable in the Idris ecosystem, as well as provide the standard benefits of such interfaces.
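A minimal sketch with a stand-in container (Boxed is hypothetical; the real implementations would be for spidr's own types):

data Boxed : Type -> Type where
  MkBoxed : List a -> Boxed a

Functor Boxed where
  map f (MkBoxed xs) = MkBoxed (map f xs)

Semigroup a => Semigroup (Boxed a) where
  (MkBoxed xs) <+> (MkBoxed ys) = MkBoxed (zipWith (<+>) xs ys)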