# PyAutoDiff
PyAutoDiff automatically compiles NumPy code using Theano's powerful symbolic engine, allowing users to take advantage of features like mathematical optimization, GPU acceleration, and automatic differentiation.
This library is under active development. Features may break or change.
### Decorators
PyAutoDiff provides simple decorators for compiling arbitrary NumPy functions and their derivatives. For most users, these will be the primary interface to autodiff.
```python
from autodiff import function, gradient

# -- compile a Theano function
@function
def f(x):
    return x ** 2

print f(5.0)  # 25.0

# -- compile a function returning the gradient
@gradient
def f(x):
    return x ** 2

print f(5.0)  # 10.0

# -- compile a function returning the gradient only with respect
#    to a specific input
@gradient(wrt='y')
def f(x, y):
    return x * y

print f(3.0, 5.0)  # 3.0
```
Users can call a higher-level optimization interface that wraps SciPy minimization routines (currently L-BFGS-B, nonlinear conjugate gradient, and Newton-CG), using autodiff to compute the required derivatives and Hessian-vector products.
```python
import numpy as np
from autodiff.optimize import fmin_l_bfgs_b

# -- a trivial least-squares minimization problem
def fn(x):
    y = np.arange(3.0)
    return ((x - y) ** 2).mean()

x_opt = fmin_l_bfgs_b(fn, init_args=np.zeros(3))
print x_opt  # [0.0, 1.0, 2.0]
```
### Symbolic

The `Symbolic` class allows more general tracing of NumPy objects through (potentially) multiple functions. Users should call its `trace` method on any functions and arguments, followed by either the `compile_function` or `compile_gradient` method, in order to get a compiled Theano function.

Critically, `Symbolic` can compile functions not only of existing arguments and results, but of any NumPy object referenced while tracing. The following example traces objects through three different functions and ultimately compiles a function of an existing argument, a global variable, and a local variable via autodiff's `tag` mechanism:
```python
import numpy as np
import theano.tensor
from autodiff import Symbolic, tag

# -- a vanilla function
def f1(x):
    return x + 2

# -- a function referencing a global variable
y = np.random.random(10)
def f2(x):
    return x * y

# -- a function with a local variable
def f3(x):
    z = tag(np.ones(10), 'local_var')
    return (x + z) ** 2

# -- create a general symbolic tracer and apply it to the three functions
x = np.random.random(10)
tracer = Symbolic()

out1 = tracer.trace(f1, x)
out2 = tracer.trace(f2, out1)
out3 = tracer.trace(f3, out2)

# -- compile a function representing f(x, y, z) = out3
new_fn = tracer.compile_function(inputs=[x, y, 'local_var'],
                                 outputs=out3)

# -- compile the gradient of f(x) = out3, with respect to y
fn_grad = tracer.compile_gradient(inputs=x,
                                  outputs=out3,
                                  wrt=y,
                                  reduction=theano.tensor.sum)

assert np.allclose(new_fn(x, y, np.ones(10)), f3(f2(f1(x))))
```
Autodiff classes are also available (the decorators are simply convenient ways of automatically wrapping functions in classes). In addition to the `function` and `gradient` decorators/classes shown here, a Hessian-vector product class and decorator are also available.
```python
from autodiff import Function, Gradient

def fn(x):
    return x ** 2

f = Function(fn)  # compile the function
g = Gradient(fn)  # compile the gradient of the function

print f(5.0)  # 25.0
print g(5.0)  # 10.0
```
The `Function` class and `@function` decorator use Theano to compile the target function. PyAutoDiff has support for all NumPy operations with Theano equivalents and limited support for many Python behaviors (see caveats).
The `Gradient` class and `@gradient` decorator compile functions which return the gradient of the target function. The target function must be scalar-valued. A `wrt` keyword may be passed to the class or decorator to indicate which variables should be differentiated; otherwise, all arguments are used.
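For instance, a minimal sketch of passing `wrt` to the `Gradient` class, mirroring the `@gradient(wrt='y')` decorator example above:

```python
from autodiff import Gradient

def fn(x, y):
    return x * y

# differentiate only with respect to y; d(x * y)/dy = x
g = Gradient(fn, wrt='y')
print(g(3.0, 5.0))  # expected: 3.0
```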
The `HessianVector` class and `@hessian_vector` decorator compile functions that return the product of an argument's Hessian and an arbitrary vector (or tensor). The vectors must be provided to the resulting function with the `_tensors` keyword argument.
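A minimal sketch of that interface, assuming the `hessian_vector` decorator is importable from the top-level `autodiff` package like the other decorators:

```python
import numpy as np
from autodiff import hessian_vector  # assumed import path

@hessian_vector
def fn(x):
    return (x ** 2).sum()

x = np.arange(3.0)
v = np.ones(3)

# the Hessian of sum(x ** 2) is 2 * I, so the Hessian-vector product is 2 * v
print(fn(x, _tensors=v))  # expected: [ 2.  2.  2.]
```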
The `autodiff.optimize` module wraps some SciPy minimizers, automatically compiling functions to compute the derivatives and Hessian-vector products that the minimizers require in order to optimize an arbitrary function.
The `Symbolic` class is used for general-purpose symbolic tracing, usually through multiple functions. It creates `Function` instances as necessary to trace different variables, and has `compile_function()` and `compile_gradient()` methods to get the compiled functions corresponding to a traced set of operations.
PyAutoDiff replaces many variables with symbolic Theano versions. This can cause problems, because some Theano functions do not support symbolic arguments. To resolve this, autodiff provides a `constant()` modifier, which instructs PyAutoDiff to construct a non-symbolic (and therefore constant) version of a variable. Most of the time, users will not have to call `constant()` -- it is only necessary in certain cases.
For example, the following functions will compile, because the `axis` argument `1` is loaded as a constant, even when bound to a variable `a`:
```python
import numpy as np
from autodiff import constant, function

m = np.ones((3, 4))

@function
def fn_1(x):
    return x.sum(axis=1)

@function
def fn_2(x):
    a = 1
    return x.sum(axis=a)

print fn_1(m)
print fn_2(m)
```
However, the decorated function's arguments are always assumed to be symbolic. Therefore, the following function will fail, because the `axis` argument is the symbolic variable `a` and `tensor.sum` does not accept symbolic arguments:
```python
@function
def bad_fn(x, a):
    return x.sum(axis=a)

print bad_fn(m, 1)  # error
```
By calling `constant()` appropriately, we can convert the symbolic variable back to a constant `int`. Now the function will compile:
```python
@function
def good_fn(x, a):
    return x.sum(axis=constant(a))

print good_fn(m, 1)
```
PyAutoDiff tracing makes it relatively easy to access a function's symbolic inputs and outputs, allowing Theano to compile the function with ease. However, advanced users may wish to access the symbolic representations of other variables, including variables local to the function. Autodiff stores symbolic variables by the `id` of the corresponding Python object, which may not always be available to the user. Instead, users can manually tag symbolic variables with arbitrary keys, as the following example demonstrates:
```python
from autodiff import function, tag

@function
def local_fn(x):
    y = tag(x + 2, 'y_var')
    z = y * 3
    return z

local_fn(10.0)  # call local_fn to trace and compile it

# access the symbolic version of the function's local variable 'y',
# tagged as 'y_var'
y_sym = local_fn.s_vars['y_var']
```
Tagging is especially useful in combination with autodiff's `Symbolic` class, as it allows tracing and compiling functions of purely local variables. An example of this behavior can be found in the `Symbolic` section above.
### Caveats

Pay attention to dtypes -- they are locked when Theano compiles a function. In particular, note the following (see the sketch after this list):

- The gradient of an integer argument is defined as zero.
- Theano only supports `float32` operations on the GPU.
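A minimal sketch of the integer-dtype behavior, using two separately decorated gradients since each function's dtypes are locked when it is compiled:

```python
from autodiff import gradient

@gradient
def g_float(x):
    return x ** 2

@gradient
def g_int(x):
    return x ** 2

print(g_float(3.0))  # 6.0 -- gradient of x ** 2 at a float argument
print(g_int(3))      # 0 -- the gradient of an integer argument is defined as zero
```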
Autodiff supports most forms of advanced indexing. One notable exception is the combination of index arrays and slices, which Theano does not recognize (at the time of this writing). For example:

```python
x[[1, 2, 3], 2:]  # not supported
```
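By contrast, plain integer-array indexing falls under the supported forms; a minimal sketch (the `take_rows` function is just for illustration):

```python
import numpy as np
from autodiff import function

@function
def take_rows(x):
    # advanced indexing with a list of row indices, no slice mixed in
    return x[[1, 2, 3]]

m = np.arange(20.0).reshape(4, 5)
print(take_rows(m))  # rows 1, 2, and 3 of m
```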
Generally, PyAutoDiff supports any NumPy operation with a Theano equivalent. You will probably get unexpected results if you use more general Python operations like control flow tools (`for`, `if/else`, `try/except`, etc.) or iterators without understanding how Theano handles them.
When PyAutoDiff prepares to compile, it calls the Python function one time in order to find out what it does. With the exception of NumPy arrays and numbers, whatever happens on that first run is locked into the compiled function: the length of every `for` loop, the selected branch of every `if/else` statement, even the axis of every `np.sum(axis=my_var)`.
In the current version of PyAutoDiff, there is a way to avoid this problem, but at the cost of significantly more expensive calculations. If an autodiff class is instantiated with the keyword `use_cache=False`, it will not cache its compiled functions and will therefore reevaluate all control flow statements at every call. However, it will also call the NumPy function, compile a Theano function, and call the Theano function on every call -- meaning functions will take at least twice as long to run, and possibly more. This should only be used as a last resort if more clever designs are simply not possible.
As a rule of thumb: if the code you're writing doesn't operate directly on a NumPy array, then there's a good chance it won't behave as you expect.
Here is an example of compilation "locking" a control flow, and how to set `use_cache` to avoid it:
```python
from autodiff import Function

def loop_mult(x, N):
    y = 0
    for i in range(N):
        y += x
    return y

f = Function(loop_mult)
print f(2, 4)  # 8
print f(2, 5)  # also 8! The loop is locked in the compiled function.

g = Function(loop_mult, use_cache=False)
print g(2, 4)  # 8
print g(2, 5)  # 10, but a much slower calculation than the cached version.
```
### Acknowledgements

- James Bergstra, for bringing PyAutoDiff to light.
- Travis Oliphant, for posting a very early version of numba that provided the inspiration and starting point for this project.
- The entire Theano team.