aesara-devs / aeppl
Tools for an Aesara-based PPL.
Home Page: https://aeppl.readthedocs.io
License: MIT License
We need to add support for mixtures defined using the IfElse Op.
The following is a simple example that isn't supported:
import aesara.tensor as at
from aesara.ifelse import ifelse
from aeppl.joint_logprob import factorized_joint_logprob
srng = at.random.RandomStream(seed=2320)
I_rv = srng.bernoulli(0.5, name="I")
X_rv = srng.normal(0, 1, name="X")
Y_rv = srng.gamma(0.5, 0.5, name="Y")
Z_rv = ifelse(I_rv, X_rv, Y_rv)
Z_rv.name = "Z"
z_vv = Z_rv.clone()
i_vv = I_rv.clone()
logp_parts = factorized_joint_logprob({Z_rv: z_vv, I_rv: i_vv})
I'm happy to work on the docstrings if you find this task tedious.
Let me know, @brandonwillard!
Please provide a minimal, self-contained, and reproducible example.
Line 92 in 18275e2
Please provide the full traceback of any errors.
AssertionError: sigma > 0
Apply node that caused the error: Assert{msg='sigma > 0'}(Elemwise{Composite{((i0 + (i1 * sqr(i2))) - log(i3))}}[(0, 2)].0, All.0)
Please provide any additional information below.
Sampling with PyMC (master) fails with assertion errors and does not treat AssertionError as a divergent sample.
Expected Behaviour
return -inf
Possible Solution
This snippet solved my issue:
import aesara
import numpy as np

aesara.assert_op.Assert = lambda name: (
    lambda res, *cond: aesara.tensor.switch(
        aesara.tensor.all(aesara.tensor.stack([c.all() for c in cond])),
        res,
        -np.inf,
    )
)
python -c "import aesara; print(aesara.config)"
Working in pymc-devs/pymc#5438, I realized that the pattern that we need to match for marginalized RVs is not too bad:
import aesara.tensor as at
from aeppl import joint_logprob
weights_rv = at.random.dirichlet([1, 1], name="weights")
components = at.stack([at.random.normal(name="c1"), at.random.normal(name="c2")], axis=-1)
mix_rv = at.sum(components * weights_rv, axis=-1)
mix_rv.name = "mix"
# weights can also just be a deterministic, only thing that matters is that it adds up to 1
weights_vv = weights_rv.clone()
mix_vv = mix_rv.clone()
joint_logprob({weights_rv: weights_vv, mix_rv: mix_vv})
import aesara.tensor as at
import aeppl
from aeppl.transforms import TransformValuesOpt, DEFAULT_TRANSFORM
p_rv = at.random.beta(1, 1, name="p")
p_vv = p_rv.clone()
tr = TransformValuesOpt({p_vv: DEFAULT_TRANSFORM})
logp_dict = aeppl.factorized_joint_logprob({p_rv: p_vv}, extra_rewrites=tr)
logp_dict
assert p_vv in logp_dict # raises
Noted by @kc611
It seems natural that we could compute the logprob graph for several independent variables (e.g, if we have more than one likelihood term, or disconnected "potential" terms).
import aesara.tensor as at
from aeppl import joint_logprob

x_rv = at.random.uniform(0, 1, name="x")
y_rv = at.random.uniform(0, 5, name="y")
z_rv = at.random.uniform(0, 10, name="z")
x = x_rv.clone()
y = y_rv.clone()
z = z_rv.clone()
joint_logprob(
[x_rv, y_rv, z_rv],
{x_rv: x, y_rv: y, z_rv: z},
)
However, I am afraid this could lead to double counting issues...
Also, right now we can stack RVs, but I guess this is limited by the shapes of the RVs:
joint_logprob(
at.stack([x_rv, y_rv, z_rv]),
{x_rv: x, y_rv: y, z_rv: z},
)
The HalfNormal in aeppl seems to accommodate the loc parameter, which means the default transform should be Interval[loc, None], assuming the loc just shifts the cutoff point of the HalfNormal away from zero.
This was pointed out during the aeppl integration of this particular test:
https://github.com/pymc-devs/pymc3/blob/main/pymc3/tests/test_transforms.py#L114
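As a plain-Aesara sketch (not aeppl's transform API) of what the shifted transform amounts to:

import aesara.tensor as at

loc = at.scalar("loc")
value = at.scalar("value")

# Interval[loc, None] is just a log transform shifted by `loc`:
forward = at.log(value - loc)     # maps (loc, inf) onto the real line
backward = at.exp(forward) + loc  # recovers `value` for value > loc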
The implementation of the Stickbreaking transform concatenates along the default axis instead of the last axis (axis=-1).
Line 277 in 9d9ffb7
Please provide a minimal, self-contained, and reproducible example.
An MWE can be as follows:
import aesara
import aesara.tensor as at
import numpy as np
value = at.matrix("value")
# Forward
log_value = at.log(value)
shift = at.sum(log_value, -1, keepdims=True) / value.shape[-1]
value = log_value[..., :-1] - shift
# Backward
value = at.concatenate([value, -at.sum(value, -1, keepdims=True)])
exp_value_max = at.exp(value - at.max(value, -1, keepdims=True))
result = exp_value_max / at.sum(exp_value_max, -1, keepdims=True)
result = aesara.function([value],[result])
result(np.array([[0.09,0.01,0.9],[0.09,0.01,0.9]])) # Not sure about the exact values
Please provide the full traceback of any errors.
TypeError: Bad input argument to aesara function with name "/tmp/ipykernel_8640/3915568923.py:1" at index 0 (0-based).
Backtrace when that variable is created:
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 532, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2899, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2944, in _run_cell
return runner(coro)
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3170, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3361, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/home/kc611/anaconda3/envs/pymcv4/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_8640/4089953573.py", line 9, in <module>
value = at.concatenate([value, -at.sum(value, -1, keepdims=True)])
Non-unit value on shape on a broadcastable dimension.
Please provide any additional information below.
Whereas concatenating along the last axis gives a working implementation:
import aesara
import aesara.tensor as at
import numpy as np
value = at.matrix("value")
# Forward
log_value = at.log(value)
shift = at.sum(log_value, -1, keepdims=True) / value.shape[-1]
value = log_value[..., :-1] - shift
# Backward
value = at.concatenate([value, -at.sum(value, -1, keepdims=True)], axis = -1) # Concatenating along axis = -1
exp_value_max = at.exp(value - at.max(value, -1, keepdims=True))
result = exp_value_max / at.sum(exp_value_max, -1, keepdims=True)
result = aesara.function([value],[result])
result(np.array([[0.09,0.01,0.9],[0.09,0.01,0.9]])) # Not sure about the exact values
# array([[0.23974926, 0.22131646, 0.53893428],
# [0.23974926, 0.22131646, 0.53893428]])
cc @ricardoV94 (Not sure which one of you guys pointed this out)
The logprob graph for truncated distributions is straightforward, not very different from that of censored distributions that we have already implemented.
However we need a new Op / graph to represent / generate truncated draws from an arbitrary RV.
Symbolically, we can take truncated draws from any RV by doing rejection sampling until all desired draws are in the range lower/upper. This, however, can be incredibly slow if the truncation interval corresponds to a very small mass of the original distribution.
If we have an inverse cdf function we can more easily obtain random draws by drawing from a uniform distribution in the truncation range and taking the inverse cdf of those points.
Finally, we might have specialized Ops / graphs that could be dispatched for an RV (for instance, if one has already implemented a TruncatedNormal based on the SciPy distribution).
I think we could use a hierarchical dispatch strategy to go from the most to the least specialized forms, and perhaps wrap the result in an OpFromGraph for easy logprob parsing. Here is a pseudo-code suggestion of how this could be implemented:
def truncate(rv, lower, upper):
    # Try to dispatch on a specialized graph/Op first
    try:
        truncated = _truncate(rv, lower, upper)
    except NotImplementedError:
        # Try to use the [log]cdf and icdf if they are implemented for the given RV
        try:
            icdf = _icdf(rv)
            cdf_lower = at.exp(_logcdf(rv, lower))
            cdf_upper = at.exp(_logcdf(rv, upper))
            uniform = at.random.uniform(cdf_lower, cdf_upper, size=rv.shape)
            truncated = icdf(uniform)
        except NotImplementedError:
            # Default to a slow while-loop `Scan` rejection-sampling graph
            ...
    # Wrap `truncated` in a custom OpFromGraph that can be easily parsed
    # for the logprob component?
    ...
This is just a quick sketch, happy to hear other ideas.
The entire "ignore_logprob" feature should really be removed, or, if anything, reimplemented in some non-tag
-based way. In order to do this, we need to do something about its use in mixture_replace
, though.
Originally posted by @brandonwillard in #72 (comment)
We have a PR open for an AePPL conda-forge recipe here: conda-forge/staged-recipes#15766.
Consider the following scenario in which a value variable is the output of a graph (i.e. non-"atomic"):
import aesara
import aesara.tensor as at
from aeppl import joint_logprob
Y_rv = at.random.normal(0, 1, size=(2,), name="Y")
y_vv = Y_rv.clone()
y_vv.name = "y"
y_vv_2 = 2 * y_vv
y_vv_2.name = "y_2"
logprob_non_atomic = joint_logprob({Y_rv: y_vv_2})
aesara.dprint([logprob_non_atomic, y_vv_2])
# Sum{acc_dtype=float64} [id A] ''
# |Assert{msg='sigma > 0'} [id B] 'y_2_logprob'
# |Elemwise{sub,no_inplace} [id C] ''
# | |Elemwise{sub,no_inplace} [id D] ''
# | | |Elemwise{mul,no_inplace} [id E] ''
# | | | |InplaceDimShuffle{x} [id F] ''
# | | | | |TensorConstant{-0.5} [id G]
# | | | |Elemwise{pow,no_inplace} [id H] ''
# | | | |Elemwise{true_div,no_inplace} [id I] ''
# | | | | |Elemwise{sub,no_inplace} [id J] ''
# | | | | | |Elemwise{mul,no_inplace} [id K] 'y_2'
# | | | | | | |InplaceDimShuffle{x} [id L] ''
# | | | | | | | |TensorConstant{2} [id M]
# | | | | | | |y [id N]
# | | | | | |InplaceDimShuffle{x} [id O] ''
# | | | | | |TensorConstant{0} [id P]
# | | | | |InplaceDimShuffle{x} [id Q] ''
# | | | | |TensorConstant{1} [id R]
# | | | |InplaceDimShuffle{x} [id S] ''
# | | | |TensorConstant{2} [id T]
# | | |InplaceDimShuffle{x} [id U] ''
# | | |Elemwise{log,no_inplace} [id V] ''
# | | |TensorConstant{2.5066282746310002} [id W]
# | |InplaceDimShuffle{x} [id X] ''
# | |Elemwise{log,no_inplace} [id Y] ''
# | |TensorConstant{1} [id R]
# |All [id Z] ''
# |Elemwise{gt,no_inplace} [id BA] ''
# |TensorConstant{1} [id R]
# |TensorConstant{0.0} [id BB]
# Elemwise{mul,no_inplace} [id BC] 'y_2'
# |InplaceDimShuffle{x} [id BD] ''
# | |TensorConstant{2} [id M]
# |y [id N]
The graph output demonstrates that the non-"atomic" y_vv_2 has been cloned in the joint_logprob output (i.e. the IDs of the y_vv_2 term and the equivalent sub-graph in logprob_non_atomic are not equal).
It looks like y_vv_2 is being cloned when rvs_to_value_vars is called here.
While this behavior isn't necessarily a bug, it can be confusing and sometimes complicate things for callers of joint_logprob (e.g. aesara-devs/aehmc#25 (comment)), so we should prevent this cloning whenever possible.
Right now, transforms expect RandomVariable inputs and fail otherwise. It would be useful if they were more flexible and allowed, for instance, RVs / MeasurableVariables defined via OpFromGraph.
Here is an example that currently fails with MixtureRVs:
import aesara.tensor as at
from aeppl import joint_logprob
from aeppl.transforms import TransformValuesOpt, LogOddsTransform
idx_rv = at.random.bernoulli(0.5, name='idx')
mixture1 = at.random.beta(100, 1, name='mixture_comp1')
mixture2 = at.random.beta(1, 100, name='mixture_comp2')
mixture_rv = at.stack(mixture1, mixture2)[idx_rv] # MixtureRV is artificially created via `at.stack` to trigger `OpFromGraph`
mixture_rv.name = 'mixture'
idx_vv = idx_rv.clone()
mixture_vv = mixture_rv.clone()
transform_opt = TransformValuesOpt({mixture_vv: LogOddsTransform()})
logp = joint_logprob(
{mixture_rv: mixture_vv, idx_rv: idx_vv},
extra_rewrites=transform_opt,
)
ERROR (aesara.graph.opt): Optimization failure due to: transform_values
ERROR (aesara.graph.opt): node: mixture-mixture{inline=True}(ScalarFromTensor.0, mixture_comp1, mixture_comp2, RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F3233008E40>), RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F3232C05040>), RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F3232BA4820>))
ERROR (aesara.graph.opt): TRACEBACK:
ERROR (aesara.graph.opt): Traceback (most recent call last):
File "/home/ricardo/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/graph/opt.py", line 2025, in process_node
replacements = lopt.transform(fgraph, node)
File "/home/ricardo/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/graph/opt.py", line 1187, in transform
return self.fn(*args, **kwargs)
File "/home/ricardo/Documents/Projects/aeppl/aeppl/transforms.py", line 140, in transform_values
new_op = _create_transformed_rv_op(node.op, transform)()
File "/home/ricardo/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/tensor/random/op.py", line 124, in __init__
self.name = name or getattr(self, "name")
AttributeError: 'TransformedMixtureRV' object has no attribute 'name'
logp.eval({idx_vv: 0, mixture_vv: 10})
# array(-inf)
In contrast, this works:
idx_rv = at.random.bernoulli(0.5, name='idx')
mixture = at.random.beta([100, 1], [1, 100], name='mixture_components')
mixture_rv = mixture[idx_rv]
mixture_rv.name = 'mixture'
idx_vv = idx_rv.clone()
mixture_vv = mixture_rv.clone()
transform_opt = TransformValuesOpt({mixture_vv: LogOddsTransform()})
logp = joint_logprob(
{mixture_rv: mixture_vv, idx_rv: idx_vv},
extra_rewrites=transform_opt,
)
logp.eval({idx_vv: 0, mixture_vv: 10})
# array(-6.09256229)
There is nothing special about the transform rewrite that requires RandomVariable inputs, since transforms work on the value variables and not on the respective RVs, except for the base logprob, which by definition is available for any MeasurableVariable.
This cropped up while working in #22, even though value transforms are pretty useless there.
import aesara.tensor as at
import aeppl
x_rv = at.random.normal(name='x')
y_rv = at.random.normal(x_rv, 1, name="y")
y_vv = y_rv.clone()
logprob_dict = aeppl.factorized_joint_logprob({y_rv: y_vv})
logprob_dict
KeyError Traceback (most recent call last)
<ipython-input-6-2e07a6eebca4> in <module>
7 y_vv = y_rv.clone()
8
----> 9 logprob_dict = aeppl.factorized_joint_logprob({y_rv: y_vv})
~/.local/lib/python3.8/site-packages/aeppl/joint_logprob.py in factorized_joint_logprob(rv_values, warn_missing_rvs, extra_rewrites, **kwargs)
165 continue
166
--> 167 q_rv_value_vars = [
168 replacements[q_rv_var]
169 for q_rv_var in outputs
~/.local/lib/python3.8/site-packages/aeppl/joint_logprob.py in <listcomp>(.0)
166
167 q_rv_value_vars = [
--> 168 replacements[q_rv_var]
169 for q_rv_var in outputs
170 if not getattr(q_rv_var.tag, "ignore_logprob", False)
KeyError: x
In most of the local optimizers we return a node of a specific class (e.g. TransformedRV, MixtureRV). We should be explicit about this in the type hints, instead of using the more generic Optional[List[Node]].
This is a follow up on a discussion that started in #22 (comment)
I believe that whenever aeppl cannot provide a rewrite for a given graph, the safest way to proceed is to raise an exception. In the following snippet the returned output is not a "joint_logprob" graph at all. The slightly obscure warning that is raised right now seems insufficient. Users should not be expected to know which kinds of graphs aeppl will or will not be able to parse, nor to investigate the graph to see if some terms happen to be missing.
import aesara.tensor as at
import aeppl
x_rv = at.random.normal(name="x")
y_rv = at.cos(x_rv)
z_rv = at.random.normal(name='z') # Just so that the returned output is not `None`
y_vv = y_rv.clone()
z_vv = z_rv.clone()
aeppl.joint_logprob({y_rv: y_vv, z_rv: z_vv})
UserWarning: Found a random variable that was neither among the observations nor the conditioned variables: normal_rv{0, (0, 0), floatX, False}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FDFC27D6D60>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{0.0}, TensorConstant{1.0})
I suggested in that PR to add a simple check at the end of factorized_joint_logprob of the form:
missing_value_terms = set(original_values.values()) - set(logprob_vars.keys())
if missing_value_terms:
raise NotImplementedError(
f"The logprob terms for the following value variables could not be derived: {missing_value_terms}"
)
SciPy's implementation might not be the best to follow, so let's revisit the implementation of this logpdf.
Originally posted by @ricardoV94 in #1 (comment)
Copying comments from this exploratory #102 PR:
This PR is a proof of concept for how we could handle composite RVs that require nested aeppl rewrites. The goal is to figure out if this approach is too limited / cumbersome and, if so, what could work better.
Roughly, most of our rewrites come in two parts: (1) they only trigger for variables that have value variables assigned to them, and (2) they replace the original graph with a measurable form to which _logprob can be safely dispatched.
In step 2, we often need inputs that are themselves measurable variables, but which don't have value variables assigned to them:
Lines 283 to 287 in 05d0c68
If these inputs are RandomVariables things are fine, as those are measurable by default. Otherwise, we might still be dealing with things that could be measurable... but we will never know because they don't (and shouldn't) have value variables and their corresponding rewrites won't be triggered due to step 1.
The illustrating example in this PR is a scalar mixture that has a clipped variable as one of its components:
import aesara.tensor as at
from aeppl import joint_logprob

x = at.clip(at.random.normal(loc=1), 0.5, 1.5); x.name = "x"
y = at.random.beta(1, 2, size=None, name="y")
comps = at.stack(x, y); comps.name = "comps"
idxs = at.random.bernoulli(0.4, size=None, name="idxs")
mix = comps[idxs]; mix.name = "mix"
mix_vv = mix.clone()
idxs_vv = idxs.clone()
logp = joint_logprob({idxs: idxs_vv, mix: mix_vv})
I have also added a test example with nested scalar mixtures.
The suggestion is that any rewrite that depends on other inputs being measurable can assign a temporary value variable UPSTREAM_VALUE to those inputs. This value variable is automatically discarded by PreserveRVMappings when such variables are converted to measurable ones by their own rewrites.
This would work for some current and future rewrites like #26, but not for everything.
For example, nested rewrites that depend on the direct manipulation / creation of value variables, such as inc_subtensor and Scans, would not work. This would also not work when the _logprob dispatched function depends on having RandomVariables as inputs, as happens with non-scalar mixtures:
Line 354 in 05d0c68
Also, depending on the order in which rewrites are called, this may not always work, because the attribution of the UPSTREAM_VALUE is not a visible graph change, and I guess the Equilibrium optimization will stop if, after passing through all nodes, none is changed.
We now have a good framework for easily implementing the HMC-related RandomVariable transforms (or, as the ML people have (re)labeled them, "normalizing flows").
We can start by creating a single Aesara rewrite (e.g. using aesara.graph.opt.local_optimizer) that replaces transformed RandomVariables with an intermediate form in the "sample-space" graph (i.e. the first input to joint_logprob). That intermediate form could be a new RandomVariable type (e.g. TransformedRV) with a _logprob dispatch that implements the change-of-variables formula (including the Jacobian). This new RandomVariable type would essentially be a wrapper around the original RandomVariable that only serves to provide a distinct _logprob dispatch.
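As a minimal illustration of the change-of-variables formula such a _logprob dispatch would implement (a hand-written sketch, not the proposed API), consider Y = exp(X) with X ~ Normal(0, 1):

import numpy as np
import aesara.tensor as at

y = at.scalar("y")
x = at.log(y)  # the inverse transform
# logp_Y(y) = logp_X(f_inv(y)) + log|d f_inv(y) / dy|
base_logp = -0.5 * x**2 - np.log(np.sqrt(2.0 * np.pi))  # standard normal logpdf
logp_y = base_logp - at.log(y)  # log-Jacobian of the inverse is -log(y)

logp_y.eval({y: 1.0})  # ~-0.9189, the lognormal(0, 1) logpdf at 1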
There was some work documented here already: pymc-devs/pymc#4396
Would be a great feature to include in the library.
CC'ing @manthehunted on the off-chance that he/she is still interested in this
Our mixture logprob graphs are tailored for univariate mixtures: they either rely on rv_pull_down rewrites that only work for univariate random variables, or assume a one-to-one mapping between the shape of the value variable and the shape of each random variable component, which is not the case for multivariate distributions.
Line 335 in 3331081
Lines 344 to 347 in 3331081
Lines 317 to 320 in 3331081
The meta information present in RandomVariable.[ndim_supp, ndims_params] and the logic in aesara.tensor.random.utils.broadcast_params should give us the tools to infer the right base log-probability shape.
This could also be aided by aesara-devs/aesara#695
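As a rough sketch of the shape inference mentioned above (assuming broadcast_params behaves as it does inside RandomVariable.perform), for a multivariate normal with ndims_params == [1, 2]:

import numpy as np
from aesara.tensor.random.utils import broadcast_params

# A (3, 2) mean has batch shape (3,); a (2, 2) covariance has batch shape ().
# Broadcasting the params against each other yields the batch shape the
# base logprob should have.
mu = np.zeros((3, 2))
cov = np.eye(2)
bcast_mu, bcast_cov = broadcast_params([mu, cov], ndims_params=[1, 2])
assert bcast_cov.shape == (3, 2, 2)  # batch shape (3,) inferred from `mu`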
Currently, Scan log-probability support only handles cases in which the MeasurableVariable is created inside the body/step function of the Scan, and not when the body/step function simply references a MeasurableVariable that is being iterated over by the Scan.
For example, the following is not supported:
import aesara
import aesara.tensor as at
from aeppl.joint_logprob import factorized_joint_logprob
srng = at.random.RandomStream(seed=2320)
N = 10
Y_rv = srng.normal(0, 1, size=N, name="Y")
def step_fn(y_t):
return y_t
Y_1T_rv, _ = aesara.scan(
fn=step_fn,
sequences=[Y_rv],
strict=True,
)
y_vv = Y_1T_rv.clone()
y_vv.name = "y"
logp_parts = factorized_joint_logprob({Y_1T_rv: y_vv})
This example is very trivial, but, if we change step_fn so that it performs a supported, measurable operation on y_t (e.g. indexing a mixture), it wouldn't work for the same reason.
When a value is assigned to Scan output terms like Y_1T_rv, we could "push" the relevant sequences inputs into the step function. In other words, we could construct the type of graph we currently handle.
Working from the example above, we would rewrite the Scan into something like the following:
# Apply a rewrite like `local_rv_size_lift` to get properly `size`-broadcasted parameters
# in a new variable `new_Y_rv`
mu_bcast, sigma_bcast = new_Y_rv.owner.inputs[3:]
def new_step_fn(mu_t, sigma_t):
return Y_rv.owner.op(mu_t, sigma_t, name="Y[t]")
new_Y_1T_rv, _ = aesara.scan(
fn=new_step_fn,
sequences=[mu_bcast, sigma_bcast],
strict=True,
)
Both #26 and #22 raise the question of how to deal with automatic RVs whose value variable results from an intermediate broadcast of the base RV:
loc = at.vector("loc")
y_base_rv = at.random.normal(0, 1, name="y_base_rv", size=2)
y_rv = loc + y_base_rv
y_val = y_rv.clone()
logp = joint_logprob(y_rv, {y_rv: y_val})
logp.eval({y_val: [0, 0, 0, 0], loc: [0, 0, 0, 0]})
The computed logp is only correct when loc is of size==2.
This might be even more complicated if the y_base_rv inputs are themselves variable (meaning it may itself not have a fixed size).
I see two ways we could deal with this; one would be to check that y_base_rv.shape == y_val.shape in the returned logprob.
The following snippet is failing locally:
import aesara
import aesara.tensor as at
from aeppl import joint_logprob
x = at.scalar('x')
beta_rv = at.random.normal(0, 1, name='beta')
y_rv = at.random.normal(beta_rv*x, 1, name='y')
beta = beta_rv.type()
y = y_rv.type()
logp = joint_logprob(y_rv, {beta_rv: beta, y_rv: y})
logp_fun = aesara.function([x, beta, y], logp)
---------------------------------------------------------------------------
UnusedInputError Traceback (most recent call last)
<ipython-input-1-ea0e0eb814fa> in <module>
11
12 logp = joint_logprob(y_rv, {beta_rv: beta, y_rv: y})
---> 13 logp_fun = aesara.function([x, beta, y], logp)
~/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/compile/function/__init__.py in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
335 # note: pfunc will also call orig_function -- orig_function is
336 # a choke point that all compilation must pass through
--> 337 fn = pfunc(
338 params=inputs,
339 outputs=outputs,
~/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/compile/function/pfunc.py in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
522 inputs.append(si)
523
--> 524 return orig_function(
525 inputs,
526 cloned_outputs,
~/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/compile/function/types.py in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys)
1970 try:
1971 Maker = getattr(mode, "function_maker", FunctionMaker)
-> 1972 m = Maker(
1973 inputs,
1974 outputs,
~/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/compile/function/types.py in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph, output_keys, name)
1573
1574 # Check if some input variables are unused
-> 1575 self._check_unused_inputs(inputs, outputs, on_unused_input)
1576
1577 # Make a list of (SymbolicInput|SymblicInputKits, indices,
~/Documents/Projects/aeppl/venv/lib/python3.8/site-packages/aesara/compile/function/types.py in _check_unused_inputs(self, inputs, outputs, on_unused_input)
1745 )
1746 elif on_unused_input == "raise":
-> 1747 raise UnusedInputError(msg % (inputs.index(i), i.variable, err_msg))
1748 else:
1749 raise ValueError(
UnusedInputError: aesara.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: x.
To make this error into a warning, you can pass the parameter on_unused_input='warn' to aesara.function. To disable it completely, use on_unused_input='ignore'.
This way, it works as expected:
logp = joint_logprob(y_rv, {beta_rv: beta, y_rv: y, x:x})
It's a small suggestion, but I think showing an actual logp evaluation can make it much more intuitive what the library is about, especially for people who are not familiar with Aesara/Theano.
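For instance, something like this at the end of the introductory example (a sketch; the exact number assumes a standard normal):

import aesara.tensor as at
from aeppl import joint_logprob

x_rv = at.random.normal(0, 1, name="x")
x_vv = x_rv.clone()
logp = joint_logprob({x_rv: x_vv})

# An actual evaluation makes the point of the library concrete:
logp.eval({x_vv: 0.0})  # array(-0.91893853), the standard normal logpdf at 0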
During discussion of pymc-devs/pymc#5155, we (@ricardoV94 and I) also talked a bit about the API for aeppl.factorized_joint_logprob and how it deals with random variables that are not specified at all.
I think it might be better to change the API a bit so that all random variables in the graph have to be specified somehow. That would make it more future-proof (e.g. for implementing marginalization using a closed-form solution or some kind of numerical integration), and also make it closer to the math notation and harder for users to shoot themselves in the foot.
Let's say we have a graph like this:
import aesara
import aesara.tensor as at
import aeppl
a = at.random.normal()
b = at.random.normal(loc=a)
a_val = at.dscalar()
b_val = at.dscalar()
We could now be interested in those values (math notation):
# joint logp
P(a = a_val, b = b_val)
# some conditional logps
P(a = a_val | b = b_val) = P(a = a_val)
P(b = b_val | a = a_val)
# marginal logps
P(a = a_val) = \int_b_val P(a = a_val | b = b_val) P(b = b_val) # integral doesn't do anything...
P(b = b_val) = \int_a_val P(b = b_val | a = a_val) P(a = a_val) # this one does
# Random variables whose expectation is the marginal logp (but that might converge *really* slowly or even not at all in many cases)
P_rv(a = a_val | b = B), where B ~ b # not sure actually how to write this down properly
P_rv(b = b_val | a = A), where A ~ a
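To make the marginal concrete, here is a quick numerical check done outside aeppl with NumPy/SciPy (for this model a ~ Normal(0, 1) and b | a ~ Normal(a, 1), so marginally b ~ Normal(0, sqrt(2))):

import numpy as np
from scipy import stats

b_val = 0.5

# Closed-form marginal: b ~ Normal(0, sqrt(2))
marginal_logp = stats.norm(0, np.sqrt(2)).logpdf(b_val)

# Crude Monte Carlo estimate of \int P(b = b_val | a) P(a) da
a_draws = np.random.default_rng(0).normal(0, 1, size=100_000)
mc_logp = np.log(np.mean(stats.norm(a_draws, 1).pdf(b_val)))

np.testing.assert_allclose(marginal_logp, mc_logp, atol=1e-2)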
If we call aeppl.factorized_joint_logprob(all_vars), we get all the conditional logps: {key_var: P(key_var = key_val | all_remaining) for (key_var, key_val) in all_vars}, which seems fine to me (although I might only be interested in some of those in the first place, but I think this is fine anyway).
If we call aeppl.factorized_joint_logprob({b: b_val}) however, we actually get the random variable instead of the marginal logp, even though the call looks almost like the math notation of the marginal logp.
We could make this choice explicit by switching this use case to an API like this:
aeppl.factorized_joint_logprob({b: b_val}, sample=[a])
This could later be extended to marginalization using something like these:
aeppl.factorized_joint_logprob({b: b_val}, marginalize=[a]) # closed form solution only
aeppl.factorized_joint_logprob({b: b_val}, marginalize=[a], integration_options={a: "gauss_hermite(21)"}) # some numerical integration
I didn't have time to figure out the source of the problem, but the test suite in PyMC started to fail in multiple places with aeppl 0.0.16:
https://github.com/pymc-devs/pymc/pull/5185/checks
https://github.com/pymc-devs/pymc/pull/5159/checks
When dealing with transformed variables, we first back-transform them so that they can be safely used as inputs to other downstream variables here:
Line 145 in 18275e2
However, we later have to forward-transform them again in order to compute the Jacobian term:
Lines 86 to 87 in 18275e2
This can lead to unnecessary numerical instability in models with complex transformations. The Interval transform is one such case, and it led to a PyMC performance regression documented in pymc-devs/pymc#5088:
import aesara
import aesara.tensor as at
from aesara.compile.mode import Mode
from aeppl.transforms import IntervalTransform
a, b, x = at.scalars('a', 'b', 'x')
intvl = IntervalTransform(args_fn=lambda *args: (a, b))
graph = intvl.forward(intvl.backward(x))
graph_fn = aesara.function([a, b, x], graph, mode=Mode().excluding('fusion'))
aesara.dprint(graph_fn)
Elemwise{Sub}[(0, 0)] [id A] '' 9
|Elemwise{Log}[(0, 0)] [id B] '' 7
| |Elemwise{sub,no_inplace} [id C] '' 5
| |Elemwise{Add}[(0, 0)] [id D] '' 4
| | |Elemwise{mul,no_inplace} [id E] '' 1
| | | |Elemwise{sigmoid,no_inplace} [id F] '' 0
| | | | |x [id G]
| | | |b [id H]
| | |Elemwise{Mul}[(0, 0)] [id I] '' 3
| | |Elemwise{Sub}[(0, 1)] [id J] '' 2
| | | |TensorConstant{1.0} [id K]
| | | |Elemwise{sigmoid,no_inplace} [id F] '' 0
| | |a [id L]
| |a [id L]
|Elemwise{Log}[(0, 0)] [id M] '' 8
|Elemwise{Sub}[(0, 1)] [id N] '' 6
|b [id H]
|Elemwise{Add}[(0, 0)] [id D] '' 4
The graph is far from optimized and I don't see an elegant way to rewrite it away. This is probably true for the Simplex transform as well.
One alternative would be to make the original value variable an additional input of the TransformedRV, enabling us to use it directly when computing the Jacobian term. This would also be future-proof, in that we wouldn't need to add specialized rewrites for new transforms that may be added in aeppl or in PyMC.
We need to add support for mixtures defined using the Elemwise + Switch Op.
The following is a simple example that isn't supported:
import aesara.tensor as at
from aeppl.joint_logprob import factorized_joint_logprob
srng = at.random.RandomStream(seed=2320)
I_rv = srng.bernoulli(0.5, size=10, name="I")
X_rv = srng.normal(0, 1, name="X")
Y_rv = srng.gamma(0.5, 0.5, size=10, name="Y")
Z_rv = at.switch(I_rv, X_rv, Y_rv)
Z_rv.name = "Z"
z_vv = Z_rv.clone()
i_vv = I_rv.clone()
logp_parts = factorized_joint_logprob({Z_rv: z_vv, I_rv: i_vv})
This is leading to failing tests in pymc-devs/pymc#5170
from aeppl.transforms import Simplex
import aesara.tensor as at
x = at.fmatrix('x')
assert x.dtype == "float32"
forw_x = Simplex().forward(x)
assert forw_x.dtype == "float32" # Raises
It happens because of this division by shape which is by default int64:
Line 268 in 808236f
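A possible fix (a sketch, not the actual patch) would be to cast the integer shape to the value's dtype before dividing:

import aesara.tensor as at

x = at.fmatrix("x")
log_x = at.log(x)
# `x.shape[-1]` is int64; casting it first keeps the division in float32
shift = at.sum(log_x, -1, keepdims=True) / x.shape[-1].astype(x.dtype)
assert shift.dtype == "float32"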
Potential terms are arbitrary terms that are added to the logprob graph without "values".
Would something like this make sense?
import numpy as np
import aesara.tensor as at
from aeppl import joint_logprob

x_rv = at.random.normal()
potential = at.switch(x_rv < 0, -np.inf, 0)
x = x_rv.clone()
joint_logprob([x_rv, potential], {x_rv: x, potential: None})
We can add support for Scan Ops so that joint_logprob would cover nearly arbitrary time series.
A lot of the relevant Scan rewrite code was written for symbolic-pymc in pymc-devs/symbolic-pymc#113 and pymc-devs/symbolic-pymc#114, so we can take some of the code/ideas from there and adapt them to aeppl.
See the Gitter conversation about this idea here. (NB: If you use https://develop.element.io/ you can also turn on LaTeX rendering in the "Labs" settings.)
Also, see these tests for explicit examples of log-likelihood generation for a simple DLM and HMM in symbolic-pymc.
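As an example of the kind of time-series graph this would cover, here is a Gaussian random walk built with Scan (a sketch of the target use case, not of the rewrite itself):

import aesara
import aesara.tensor as at

srng = at.random.RandomStream(seed=2320)

def step_fn(y_tm1):
    # each step is a normal centered on the previous value
    return srng.normal(y_tm1, 1.0)

Y_1T_rv, _ = aesara.scan(
    fn=step_fn,
    outputs_info=[at.as_tensor(0.0)],
    n_steps=10,
)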
With a new local_optimizer addition to RVSinker, we could add support for affine/linear transforms of RandomVariables.
In other words, the following could be made to work:
import aesara.tensor as at
from aeppl import joint_logprob
a, b = at.scalars("ab")
Y_rv = a * at.random.normal(0, 1, name="Y") + b
y_val = Y_rv.type()
logp = joint_logprob(Y_rv, {Y_rv: y_val})
I don't understand how to generate a graph with multiple automatic transforms:
import aesara
import aesara.tensor as at
from aesara.graph.opt import in2out, out2in
from aeppl import joint_logprob
from aeppl.transforms import transform_logprob
# Does not matter if I set ignore_newtrees to True, or if I use out2in
transform_opt = in2out(transform_logprob, ignore_newtrees=False)
l_rv = at.random.uniform(0, 1, name='l_rv')
u_rv = at.random.uniform(2, 4, name='u_rv')
# l_rv = at.random.normal(name='l_rv')
# u_rv = at.random.normal(name='u_rv')
y_rv = at.random.uniform(l_rv, u_rv, name='y_rv')
l = l_rv.type()
l.name = 'l'
u = u_rv.type()
u.name = 'u'
y = y_rv.type()
y.name='y'
logp = joint_logprob(y_rv, {y_rv:y, l_rv:l, u_rv:u}, extra_rewrites=transform_opt)
logp.eval({l: -1, u:1, y:0})
AttributeError: 'numpy.random._generator.Generator' object has no attribute 'uniform_interval'
I changed this line so that the name would not be a tuple:
Line 232 in b179a6e
cls_dict["name"] = f"{rv_name}_{trans_name}"
But otherwise the error comes from the same source, i.e., trying to access the rng_fn of the uniform_interval RV.
At the moment, the default RVTransform.log_jac_det method isn't taking the absolute value of the Jacobian determinant. See here.
That line should be changed to return at.log(at.abs(at.nlinalg.det(at.atleast_2d(jacobian(phi_inv, [value]))))).
See #82 (comment).
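A quick numeric illustration of why the absolute value matters (plain NumPy, just to show the failure mode): for the decreasing transform y = -x the Jacobian determinant is -1, so omitting the absolute value produces nan:

import numpy as np

with np.errstate(invalid="ignore"):
    assert np.isnan(np.log(-1.0))   # log(det(J)) without the abs
assert np.log(np.abs(-1.0)) == 0.0  # log|det(J)| is the correct term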
Please provide a minimal, self-contained, and reproducible example.
import aesara.tensor as at
import aeppl
cov = at.eye(2)
mu_rv = at.random.normal(size=2, name='mu')
mu_broadcasted = at.broadcast_arrays(mu_rv, cov[..., -1])[0]
# mu_broadcasted = mu_rv # No error if this line is uncommented
x_rv = at.random.multivariate_normal(mu_broadcasted, cov, name='x')
mu_vv = mu_rv.clone()
x_vv = x_rv.clone()
logp = aeppl.factorized_joint_logprob({mu_rv: mu_vv, x_rv: x_vv})
aeppl/joint_logprob.py:161: UserWarning: Found a random variable that was neither among the observations nor the conditioned variables: normal_rv{0, (0, 0), floatX, False}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F51B7F3EC80>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{(2,) of 0.0}, TensorConstant{(2,) of 1.0})
warnings.warn(
logp[x_vv].eval({mu_vv: [0, 0], x_vv: [0, 0]})
UnusedInputError Traceback (most recent call last)
/tmp/ipykernel_55923/2490478877.py in <module>
----> 1 logp[x_vv].eval({mu_vv: [0, 0], x_vv: [0, 0]})
~/miniconda3/envs/aeppl/lib/python3.9/site-packages/aesara/graph/basic.py in eval(self, inputs_to_values)
548 inputs = tuple(sorted(inputs_to_values.keys(), key=id))
549 if inputs not in self._fn_cache:
--> 550 self._fn_cache[inputs] = function(inputs, self)
551 args = [inputs_to_values[param] for param in inputs]
552
~/miniconda3/envs/aeppl/lib/python3.9/site-packages/aesara/compile/function/__init__.py in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
335 # note: pfunc will also call orig_function -- orig_function is
336 # a choke point that all compilation must pass through
--> 337 fn = pfunc(
338 params=inputs,
339 outputs=outputs,
~/miniconda3/envs/aeppl/lib/python3.9/site-packages/aesara/compile/function/pfunc.py in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
522 inputs.append(si)
523
--> 524 return orig_function(
525 inputs,
526 cloned_outputs,
~/miniconda3/envs/aeppl/lib/python3.9/site-packages/aesara/compile/function/types.py in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys)
1971 try:
1972 Maker = getattr(mode, "function_maker", FunctionMaker)
-> 1973 m = Maker(
1974 inputs,
1975 outputs,
~/miniconda3/envs/aeppl/lib/python3.9/site-packages/aesara/compile/function/types.py in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph, output_keys, name)
1574
1575 # Check if some input variables are unused
-> 1576 self._check_unused_inputs(inputs, outputs, on_unused_input)
1577
1578 # Make a list of (SymbolicInput|SymblicInputKits, indices,
~/miniconda3/envs/aeppl/lib/python3.9/site-packages/aesara/compile/function/types.py in _check_unused_inputs(self, inputs, outputs, on_unused_input)
1746 )
1747 elif on_unused_input == "raise":
-> 1748 raise UnusedInputError(msg % (inputs.index(i), i.variable, err_msg))
1749 else:
1750 raise ValueError(
UnusedInputError: aesara.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: mu.
To make this error into a warning, you can pass the parameter on_unused_input='warn' to aesara.function. To disable it completely, use on_unused_input='ignore'.
First seen in pymc-devs/pymc#5382 (comment)
Please provide a minimal, self-contained, and reproducible example.
import aeppl
import aesara.tensor as at
import aesara
with aesara.config.change_flags(floatX="float32", warn_float64="raise"):
aeppl.logprob(at.random.normal(), at.constant(1.))
Please provide the full traceback of any errors.
/home/ferres/.miniconda3/envs/pymc/lib/python3.9/site-packages/aeppl/logprob.py(75)normal_logprob()
-> - at.log(np.sqrt(2.0 * np.pi))
(Pdb) l
70 def normal_logprob(op, values, *inputs, **kwargs):
71 (value,) = values
72 mu, sigma = inputs[3:]
73 res = (
74 -0.5 * at.pow((value - mu) / sigma, 2)
75 -> - at.log(np.sqrt(2.0 * np.pi))
76 - at.log(sigma)
77 )
78 res = Assert("sigma > 0")(res, at.all(at.gt(sigma, 0.0)))
79 return res
Please provide any additional information below.
python -c "import aesara; print(aesara.config)"
Seems more consistent to use a Feature dictionary to track this type of graph information vs. adding it to the tag of the nodes.
Is there any downside?
We could try to wrap some method from scipy.integrate and use it to convolve continuous variables in a joint graph of z = x + y where neither variable {x, y} is being conditioned on.
I don't know how feasible this would be, since SciPy requires a callable. Can we use something like OpFromGraph?
Obviously, we can (and should) implement the optimized graphs for variables that have a simple analytical form, such as the convolution of two Gaussians (which I think symbolic-pymc could already do).
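Here is a rough numerical sketch (SciPy only, outside Aesara) of the convolution integral for z = x + y with two standard normals, whose closed form is Normal(0, sqrt(2)):

import numpy as np
from scipy import integrate, stats

def z_pdf(z):
    # p_Z(z) = \int p_X(x) p_Y(z - x) dx
    integrand = lambda x: stats.norm(0, 1).pdf(x) * stats.norm(0, 1).pdf(z - x)
    return integrate.quad(integrand, -np.inf, np.inf)[0]

np.testing.assert_allclose(z_pdf(0.3), stats.norm(0, np.sqrt(2)).pdf(0.3), rtol=1e-6)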
The Aesara pretty-printers do not indent their results. We could really use some good column-aware indentation to make the print output readable.
For example, the IPython pretty printing library has a convenient means of making parts of a string "breakable" (e.g. I used it here).
Just noticed it when creating a new environment for the repo.
Unnecessary warnings appear to be emitted by test_joint_logprob_subtensor. In at least one instance, it looks like they're triggered by RandomStateType objects.
The current transforms rewrite looks for all RVs that have a standard TransformedRV and automatically converts those (and the respective value variables). To plug aeppl into PyMC3 in the future, we probably need something more flexible, where we specify which transforms we want to apply to which value variables.
This should also allow us to apply transforms to value variables that are associated with dynamically derived RVs.
For example:
import aesara.tensor as at
from aeppl import joint_logprob
shift_rv = at.random.normal(0, 1)
y_rv = at.random.exponential(1) + shift_rv
shift = shift_rv.clone()
y = y_rv.clone()
logp = joint_logprob(y_rv, {y_rv: y, shift_rv: shift}, transforms={y: Interval(shift_rv, None)})
I think the logic could still be implemented as a custom "rewrite", I just used a new keyword to simplify the example.
We probably need to collapse each factor to a scalar to avoid double counting issues like this:
import aesara.tensor.random as atr
from aeppl import joint_logprob
#%%
x_rv = atr.normal(name="x")
y_rv = atr.normal(x_rv, 1, size=2, name="y")
x = x_rv.clone()
y = y_rv.clone()
logprob = joint_logprob([y_rv], {x_rv: x, y_rv: y})
eval1 = logprob.eval({x: 0, y: [0, 0]}).sum()
eval1 # -3.6757541328186907
#%%
x_rv = atr.normal(name="x")
y_rv1 = atr.normal(x_rv, 1, name="y1")
y_rv2 = atr.normal(x_rv, 1, name="y2")
x = x_rv.clone()
y1 = y_rv1.clone()
y2 = y_rv2.clone()
logprob = joint_logprob([y_rv1, y_rv2], {x_rv: x, y_rv1: y1, y_rv2: y2})
eval2 = logprob.eval({x: 0, y1: 0, y2: 0})
eval2 # array(-2.7568156)
This is also more easily dealt with if we work with something like #47, where we can decide how to combine terms after we collect them.
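A minimal sketch of the kind of collapsing that could avoid this, starting from the per-variable factors returned by factorized_joint_logprob:

import aesara.tensor.random as atr
from aeppl import factorized_joint_logprob

x_rv = atr.normal(name="x")
y_rv = atr.normal(x_rv, 1, size=2, name="y")
x = x_rv.clone()
y = y_rv.clone()

# Reduce each factor to a scalar before combining, so that one factor
# cannot be broadcast against another and counted multiple times
logps = factorized_joint_logprob({x_rv: x, y_rv: y})
logprob = sum(logp.sum() for logp in logps.values())

logprob.eval({x: 0, y: [0, 0]})  # array(-2.75681559), matching the second example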
Currently, Scan does not have a pprint implementation.
I wonder whether it would be possible to rewrite the logp graphs to marginalize over finite discrete variables, indicated by the user (not necessarily all that are in the graph).
x_rv = at.random.bernoulli(0.7)
y_rv = at.random.normal([0, 1], [1, 1])[x_rv]
y = y_rv.type()
logp = joint_logprob(y_rv, {y_rv: y}, marginalize={x_rv})
Its logp would be p(y_rv=y | x_rv=0) * p(x_rv=0) + p(y_rv=y | x_rv=1) * p(x_rv=1).
This is straightforward(ish) if the marginalization happens just above the requested variable (e.g., y_rv), but gets more complicated if it happens at the top of a deeper graph.
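A quick numerical version of that formula for the example above (plain NumPy/SciPy; P(x_rv=0) = 0.3 selects the first component):

import numpy as np
from scipy import stats

y = 0.3
p_y = 0.3 * stats.norm(0, 1).pdf(y) + 0.7 * stats.norm(1, 1).pdf(y)
marginal_logp = np.log(p_y)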
The idea would be to derive the logprob for the following type of graphs
y_rv = at.max(at.random.uniform(0, 1, size=3)) # or min
y_rv = at.sort(at.random.uniform(0, 1, size=3))
y_rv = at.sort(at.random.uniform(0, 1, size=3))[idx] # max / min correspond to idx==-1 or idx==0
https://en.wikipedia.org/wiki/Order_statistic#Probabilistic_analysis
This might be a bit far-fetched, and it may be difficult to find a good general solution that goes beyond a few simple cases (e.g., order statistics with non-i.i.d. RVs):
y_rv = at.max(at.random.uniform([0, 1], [2, 3]))
y_rv = at.max(at.stack([at.random.uniform(0, 1), at.random.normal(0, 1)]))
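For the simplest case above there is a well-known closed form we could target first: for Y = max(U_1, ..., U_n) with U_i ~ Uniform(0, 1) i.i.d., F_Y(y) = y**n, so the logpdf is log(n) + (n - 1) * log(y). A quick check in NumPy:

import numpy as np

n, y = 3, 0.8
logp = np.log(n) + (n - 1) * np.log(y)
# pdf = n * y**(n - 1) = 3 * 0.64 = 1.92
np.testing.assert_allclose(np.exp(logp), 1.92)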
Since joint_logprob will be a likely point of entry for most people, we should make its signature more explicit and find a way to reuse parts of the docstring from factorized_joint_logprob.
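One low-tech way to do the docstring reuse (a sketch; the wrapper body here is hypothetical):

from aeppl import factorized_joint_logprob

def joint_logprob(rv_values, **kwargs):
    """Compute the sum of the factorized joint log-probability."""
    logps = factorized_joint_logprob(rv_values, **kwargs)
    ...

# Append the shared parameter documentation at import time
joint_logprob.__doc__ += "\n\n" + (factorized_joint_logprob.__doc__ or "")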
import numpy as np
import aesara
import aesara.tensor as at
import aeppl
from aeppl.transforms import TransformValuesOpt, LogOddsTransform
init = at.random.beta(1, 1)
innov, _ = aesara.scan(
fn=lambda prev_innov: at.random.beta(prev_innov * 10, (1 - prev_innov) * 10),
outputs_info=[init],
n_steps=10,
)
init_vv = init.clone()
innov_vv = innov.clone()
tr = TransformValuesOpt({
init_vv: LogOddsTransform(),
innov_vv: LogOddsTransform(),
})
logp = aeppl.joint_logprob({init: init_vv, innov: innov_vv}, extra_rewrites=tr)
# init_vv is properly transformed
logp.eval({init_vv: -0.5, innov_vv: np.full(10, 0.5)})
# array(7.22123237)
# innov_vv is not, eval raises ParameterValueError
logp.eval({init_vv: -0.5, innov_vv: np.full(10, -0.5)})
---------------------------------------------------------------------------
ParameterValueError Traceback (most recent call last)
scan_perform.pyx in aesara.scan.scan_perform.perform()
ParameterValueError: 0 <= value <= 1, alpha > 0, beta > 0