pyro-ppl / pyro Goto Github PK

Deep universal probabilistic programming with Python and PyTorch

License: Apache License 2.0

Python 99.35% Makefile 0.18% TeX 0.09% CSS 0.01% Shell 0.15% Dockerfile 0.04% C++ 0.18%

python pytorch machine-learning bayesian probabilistic-programming bayesian-inference variational-inference probabilistic-modeling deep-learning

pyro's Introduction

Getting Started | Documentation | Community | Contributing

Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notably, it was designed with these principles in mind:

Universal: Pyro is a universal PPL - it can represent any computable probability distribution.
Scalable: Pyro scales to large data sets with little overhead compared to hand-written code.
Minimal: Pyro is agile and maintainable. It is implemented with a small core of powerful, composable abstractions.
Flexible: Pyro aims for automation when you want it, control when you need it. This is accomplished through high-level abstractions to express generative and inference models, while allowing experts easy-access to customize inference.

Pyro was originally developed at Uber AI and is now actively maintained by community contributors, including a dedicated team at the Broad Institute. In 2019, Pyro became a project of the Linux Foundation, a neutral space for collaboration on open source software, open standards, open data, and open hardware.

For more information about the high level motivation for Pyro, check out our launch blog post. For additional blog posts, check out work on experimental design and time-to-event modeling in Pyro.

Installing

Installing a stable Pyro release

Install using pip:

pip install pyro-ppl

Install from source:

git clone [email protected]:pyro-ppl/pyro.git
cd pyro
git checkout master  # master is pinned to the latest release
pip install .

Install with extra packages:

To install the dependencies required to run the probabilistic models included in the examples/tutorials directories, please use the following command:

pip install pyro-ppl[extras]

Make sure that the models come from the same release version of the Pyro source code as you have installed.

Installing Pyro dev branch

For recent features you can install Pyro from source.

Install Pyro using pip:

pip install git+https://github.com/pyro-ppl/pyro.git

or, with the extras dependency to run the probabilistic models included in the examples/tutorials directories:

pip install git+https://github.com/pyro-ppl/pyro.git#egg=project[extras]

Install Pyro from source:

git clone https://github.com/pyro-ppl/pyro
cd pyro
pip install .  # pip install .[extras] for running models in examples/tutorials

Running Pyro from a Docker Container

Refer to the instructions here.

Citation

If you use Pyro, please consider citing:

@article{bingham2019pyro,
  author    = {Eli Bingham and
               Jonathan P. Chen and
               Martin Jankowiak and
               Fritz Obermeyer and
               Neeraj Pradhan and
               Theofanis Karaletsos and
               Rohit Singh and
               Paul A. Szerlip and
               Paul Horsfall and
               Noah D. Goodman},
  title     = {Pyro: Deep Universal Probabilistic Programming},
  journal   = {J. Mach. Learn. Res.},
  volume    = {20},
  pages     = {28:1--28:6},
  year      = {2019},
  url       = {http://jmlr.org/papers/v20/18-403.html}
}

pyro's People

Contributors

Stargazers

Watchers

Forkers

shyamalschandra sburne1 ruohoruotsi zaxtax diehumblex arvinds-ds pbaljeka kijungyoon solo-rey stephenra hoangcuong2011 w4nderlust neerajprad 19ai ibrahim85 codeaudit ameyc alejandroposada yalechang ajaytalati dbonadiman anthonymcqueen21 limberc pranavsubramani pgollakota amoliu caozq19 kustomzone hbcbh1999 treiden deansmith poppingtonic wsnedy jhwjhw0123 bayesianbrad hal2001 junweima fridex minddrummer joseeteixeira kfrankc ashimb9 pavelk2 mthomas4736 dl-yc darthsuogles guliisgreat phoebejinx huynhnguyen hieuqtran nagyist kcompher hxi ifarhankhan sabau snci dapoace arrmac lxw4939 cherifsy alfiyazi zilongzhong pavan-carbon joedihare clxdsjyx stemangiola duke24k bansicloud cp612sh samholt vyomshm g-wang bballamudi clonemvp andyzhuang brettkoonce rpongweb ianswebpage empia tsukukobaan okam mbrukman tchen0123 blackaceatzworg zahorecztibor ferdavid1 farizrahman4u misbah6317 null-a sojvai nahidalam lfthwjx devopsmi beaver-company neelshah18 wjh70301 chenyyx cinneesol intfrr awesome-python

pyro's Issues

Documentation

Documentation for existing features:

Tutorial-style documentation for mini-batching via iXrange

This probably falls under general documentation (#12), but I wanted to create a separate issue since I expect that this will be one of the biggest sticking points for people coming from other probabilistic programming languages. (It certainly is for me.)

Something I'd find particularly helpful: Examples that show how to transform models that operate on individual data points into models that operate on batches. As an important special case, this should include sequence models that operate on inputs of varying lengths, such that we need to use padding/sorting.

Related: Support PyTorch DataLoader #46

Add alpha-divergence minimization

A more general form of KL-divergence could be useful.
https://arxiv.org/pdf/1511.03243.pdf

Allows you to do things like estimation for GANs where the generator minimizes the f-divergence and the discriminator does a density ratio estimation.

fix support for Binomial

i added a support method for Binomial in order to test search, but it is only correct for the univariate (non-vectorized) case. in general, need to enumerate the set of 1/0 tensors of a given shape.

Variance reduction for ELBo for discrete variables

The current implementation of ELBo uses the naive likelihood ratio (aka reinforce, aka score function) estimator for non-reparameterized samples. This is very high variance. We should think about strategies for reducing variance, though implementation should probably be driven by need.

Possible methods:

Rao-blackwellize for independent data points in map_data (related to #13).
Integrate out downstream (or otherwise independent) choices,
Subtract running baseline.
Adaptive (nn) baseline. (I wonder how much of the other reduction methods can be handled by adaptive baselines?)

Support PT DataLoader for minibatching

This is a problem @olcayc ran into while batching. Typecasting, then wrapping the Tensor output of DataLoader causes type errors in pyro. Will need to root cause where this error is happening (I dont have the stack trace)
According to the examples, the workaround currently is to batch manually but we should support batching with DataLoader

Poutine stack behavior

Right now the entire poutine stack is applied at every Pyro primitive site, at least until a block is reached. Although the transparent flag takes care of many use cases we care about, it might be better if each poutine in the stack has to call the next one itself, so that otherwise they are never called at all. In the case where there are two or more non-transparent poutines in the stack, it seems wasteful to draw a sample from one and then immediately discard it at another (and potentially incorrect, if e.g. the stochastic function at the site produces exchangeable not but i.i.d. samples).

To this end, @ngoodman suggested the following changes:

we make available next_pyro_sample(), etc, functions that call the next lower poutine method. if it doesn't get called then lower poutine layers simply don't run; the return value is the prev_val (which no longer needs to be an explicit arg to sample, etc).

instead of having a transparent flag, the default behavior of the base poutine class would be to be transparent. we would provide a poutine.forward that provides the standard non-transparent methods.

The purpose of this issue is to discuss the details of potential changes to the poutine stack. See also #51 .

Update Distribution interface

per (pleasant, unheated) discussion with @eb8680, we could consider upgrading distribution objects to be callables (i.e. functions) that return a sample, and also have a scoring function attached. in particular, we could allow them to have args:

>>binomial(ps)  
[1]
>>binomial.sample(ps)
[1]
>>binomial.log_pdf(ps,[1])
ln(0.5)

there are reasons to like this: it simplifies distributions (don't need to construct and then use), and makes it more clear that stateful dists (eg CRP) are the ones that require a constructor. (presumably if we go this route, we will change the Binomial class to a provided binomial function.)

on the other hand, it requires making the signature to pyro.sample and pyro.observe more complex: they must take the fn args as input.

are there other complications to worry about?

if we do make this change, should a distribution automatically call pyro.sample when it is called in the context of a pyro program? i.e. should pyro.sample(binomial, args) be writable as binomial(args)?

graphviz poutine [wip] [rfc]

as a first step towards more advanced gradient estimators, i made a simple poutine that makes a visualization of the forward graph. since this code isn't ready for a PR yet (in particular it needs more testing and may be incomplete in various ways) please look here for now:

https://github.com/uber/pyro/blob/b37b51132976983074f616f74430024ec3b44931/pyro/poutine/viz_poutine.py
https://github.com/uber/pyro/blob/b37b51132976983074f616f74430024ec3b44931/pyro/poutine/__init__.py

for a simple example of what this generates for a simple model, with a few different graph variants,
look here:

https://github.com/uber/pyro/blob/b37b51132976983074f616f74430024ec3b44931/fullgraph.pdf
https://github.com/uber/pyro/blob/b37b51132976983074f616f74430024ec3b44931/skinnygraph.pdf
https://github.com/uber/pyro/blob/b37b51132976983074f616f74430024ec3b44931/graph.nofuncs.pdf
https://github.com/uber/pyro/blob/b37b51132976983074f616f74430024ec3b44931/examples/viz_example.py

the main purpose of this issue is to collect possible answers to the question:
what are the desired features for such a vizpoutine?

presumably, some amount of customizability would be desired, e.g.
-- the option to remove all unnamed intermediate nodes [done]
-- the option to remove all Functions from the graph, i.e. just show dependencies between named variables [done]
-- the option to plot a guide and model side by side
-- the option to encapsulate named modules in a single node
but what else?

also, especially in the context of stochastic objective functions, one will probably want similar capabilities that aren't necessarily embedded inside of a poutine. how best to do that?

the current implementation relies entirely on intercepting the call method in Function. is this sufficient? are there better ways to do this? what's a good set of example graphs to write unittests for?

related to #50, #42, #20

additional notes:
-- don't want to reinvent the wheel here, pytorch tensorboard stuff will happen/is happening
-- for now just need enough graph structure (however hackishly obtained) to develop/debug gradient estimators
-- for simplicity using graph modules like networkx here. definitely overkill but no need to overengineer this yet

onward to awesome gradient estimators!

build failing on python 3

our newly working travisCI tells me that the build is failing (as of d4f7929) for python 3. so we should fix that?

What's in a name?

We currently can assign names to each variable by using the index in the for loop over samples.

However, when we have minibatches this becomes tricky.

Even more tricky is the whole thing if we have internal structure unfolding within a single sample and names.

I think we should have a group-wide pow-wow to discuss our naming conventions and where we want to go with names as our register of data to inference algorithms.
There may be nice things we can do to reduce the wear-and-tear on the user who wants to just have this be taken care of without having to do extensive string-acrobatics.

Visualization in Pyro

Visualization is critical for understanding and debugging models and inference algorithms. What are our visualization desiderata and what features are necessary to support them?

It seems to me that eventually, we'll need support for automatic visualization of samples, scores, and summary statistics and their evolution in execution traces, trace and autograd graph structures, losses and individual loss terms and their evolution in inference algorithms, NaN detection, and parameter/parameter gradient summary statistics (variance, norm, etc). We may also want custom visualizations for specific primitive distributions, and for our model criticism stuff.

Given that we've settled on visdom as our primary visualization tool, what Pyro-specific things do we need to implement, and what sort of interface(s) should we provide? Should we evaluate other tools as well?

Some references:

webppl-viz
visdom
TensorBoard
TensorBoard and Edward
TensorBoard decoupled from the rest of TF: crayon and dmlc's fork
TensorBoard for PyTorch
Visualization in PyMC

automatically upgrade numbers and such to Variables?

in examples that we've mocked up, it is sometimes a drag to have to write things like Variable(torch.ones(1)). it might be feasible to have distributions automatically upgrade their parameters to torch Variables as needed, so that we could just write 1.0. (it's also possible this will cause collisions or confusion somehow.)

if we go this route we should consider automatically converting: raw numbers, python lists, torch tensors, and numpy arrays.

Automated test generation for inference

Along the lines of #16 we should follow webPPL's example closely and make the inference test interface more generic, automate test creation, and write test models independently of the tests themselves so that they can be reused. We could even reuse the test data/ground truths from webPPL and other projects where possible.

refinements to Marginal (and friends)

hypothetically, the abstraction we'll be encouraging is to have a variety of inference functions that take a model and return some representation of the posterior over traces. these trace-posteriors can then be consumed by (one? a family of?) marginal functions which return a function: args->distribution object (whose args match model args), and expectation functions which return reals.

currently the only such marginal function is the mostly-implemented one in Search.py, which is specific to search based posteriors. we should make one that takes VI based posteriors, and make sure the abstractions let us do what we want.

because we think of the marginal as a deterministic function from args to distributions (i think? do we ever want the stochasticity in approximations preserved?), it is reasonable to cache the return value. this in turn makes dynamic programming very easy to express. some issues to be dealt with: what do we do if the args to model are real valued? detect and avoid caching? interpolate? how do we make sure that stochastic (mutual) recursion works appropriately? see dp-cache in webppl.

dynamic models with irregular lengths

Currently we have no very good way of dealing with packed sequences, such as time series of irregular lengths which have been packed into tensors with zero-padding to be able to do minibatch sgd while masking them out.

Pytorch has helper functions for packed sequences, but we need to see how we intend to support such irregularities efficiently without having to process each sample separately.

This may interface weirdly with the mapdata issue I brought up ( #33 ).

Discuss!

Replacing or refactoring QueuePoutine

It's not clear that the queue poutine as currently written is the best abstraction for our use cases. For example, rather than the current division of labor between QueuePoutine and Search, we may want a different poutine operation in Search that returns an array of partial traces, rather than a single complete trace.

The purpose of this issue is to list a few situations where the QueuePoutine is/could be used and to collect our thoughts about what, if anything, should replace it.

@ngoodman suggested the following structure as an alternative:

poutine.escape(
    poutine.trace(
        poutine.replay(
            poutine.enumerater(model), next_trace)), ismarked)
where:

enumerater: keeps a mapping from names to supports, and either
returns the next support element or an end-of-support mark. also sets
a mark when it is used (for escape to use).

replay: replays against a trace, and only allows down-stack poutines to run
if it doesn't find a value.

trace: records the trace. it is transparent.

escape: when triggered by ismarked does a non-local exit, immediately calling
the exit_poutine stack. (so trace will return its partial trace. need to modify returns slightly.)

this then requires the search inference algorithm to keep a queue of partial traces,
on return from the poutines it keeps the current trace if it didn't hit support end,
and pushes the new trace if it isn't complete. (this is less efficient than the current
implementation which pushes all the extensions form a support at once.)

Automated test generation for distributions

The distributions should be tested more extensively and generically. At a minimum, there should be tests for sampling, scoring, CUDA compatibility, and support where appropriate. Sampling and scoring tests should work by comparing our implementations against their counterparts from numpy/scipy.

Most of our existing distributions have most of these, but writing new tests for each new distribution by hand can be time-consuming and error-prone. We should follow webPPL's example closely (see test-samplers, test-scorers, and test-statistics here) and generate our tests automatically given a distribution class, a "type" (e.g. discrete or continuous), a ground-truth sample function, a ground-truth score function, and any other necessary information (e.g. ground-truth summary statistics).

For each distribution, we should have the following tests generated automatically from a configuration file:

Sampler

Draw samples and ground-truth samples, compare with 2-sample tests
Use specialized tests (e.g. KS, chi^2, permutation) for distributions that support them (e.g. 1-d distributions or discrete distributions)

Scorer

Draw ground-truth samples and compare scorer with ground-truth scorer
Draw samples, compare scorer with ground-truth scorer

Moments

For distributions with analytically available moments, compute 3rd-party ground truth, otherwise estimate from 3rd-party ground-truth samples
Compare empirical moments to ground-truth analytical or empirical moments

Batching/vectorization

Sampling, scoring, moments: compare to list of non-batched
Check broadcasting and resizing semantics

Correctness across types

some distributions can have different return types, e.g. Categorical
Compare scorer across different types for same scores

bayesian neural nets

pyro should have a really beautiful, idiomatic way to describe bayesian neural nets.... currently the only way to do it is to construct the model net from raw tensors + tensor math (i.e. not using the predefined nn modules from pytorch). i've been thinking through some options.

here is the one i like best so far. This ony requires the addition of a pyro.random_module(module, prior) helper that intercepts the params of module and samples them with prior instead of registering them as parameters.

Comments on this approach, or alternatives, are welcome! If folks like this, then I or someone can add the helper and an example.

Abstract Trace_kernel class from which MCMC kernels will inherit.

Kernels take a trace and return a trace.

Refactor Search and KLqp with common Trace data structure

To increase modularity, inference algorithms should be rewritten to operate on reified execution traces.

pyro or Pyro?

see title

Refactor map_data

Vectorize for tensor inputs
Add name argument and record in traces

Automatic grouping of variables

Based on discussions with @stuhlmueller, blocking/exposing variables needs some way to group variables together
Example of manually constructing names to expose them.

From chat:
i think (3) would be pretty easy to address without a general solution for automatic naming
for example, suppose you could tag sample/observe statements like sample("char_123", dist, tag="char") and then later call block(model, expose={"tags": ["char"]}), or something like that

Related to #49, #37

klqp => elbo

issue to track/remind to do this before launch

Monte Carlo objectives for VI (IW-ELBO and friends)

Now that the poutine rebake is done, we should re-implement the importance weighted estimator for ELBo (i.e. the one for the IWAE). Should be straightforward.

Variance reduction for reparametrized ELBo

For the reparameterizable case (courtesy @eb8680):

iw-elbo implementation is stale

the importance-weighted elbo implmentation is out of synch with the rest of the system. it should be updated or (temporarily) removed.

re-factoring it in terms of the importance sampling inference method might be clarifying.

parents of sites within trace

We should have a field in trace that we call 'parents' at each sample site in order to have a neat way to capture conditional independence in pyro.

We just discussed how we could fill that without putting the information into names, it will probably involve traversing the autograd graph.

This will interact with #42 .

This will also interact with issues like doing causal inference and many other parts of the plans we have.

Children of sites would also be interesting to get but are easier to get via looking back.

add Categorical distribution

in addition to being generally useful, we need Categorical for the returned marginal distribution when doing systematic search. (see #2.)

Fix broken examples

Some of the examples are broken because of distribution interface changes. Identify and fix these broken examples.

Vectorized loss in KL_QP and global variables

I wrote a quick Bayesian logistic regression example for @ChunyuanLI and realized that the vectorized elbo is broken when some sample statements aren't batched, which happens when you have global variables like the weights in logistic regression. I added a tiny fix in that branch to make it run (just add a few expand_as calls), but rather than pushing that immediately I would prefer to look for a more permanent solution for Rao-Blackwellization sooner rather than later or else scrap it for now - doing it this way has caused a lot of other problems.

double check and integrate search inference

need to code-review and add tests for the systematic search implementation. i confirmed that it worked on a simple example, but then refactored a little bit without testing (bad developer!).

also, the Marginal method doesn't currently return a distribution as it should, because we hadn't implemented Categorical.

tasks for this are something like:

split out trace data structure and Marginal from Search.py
refactor in terms of new trace data structure (#19).
clean up Marginal to return a distribution (and in light of #23).
decide how to handle caching in this Marginal, if we keep it fix issue with keys.
add an Expectation function, too, since tests are easier with that.
add tests!

Draft Marginal and Trace Posterior interfaces

In my Monte Carlo branch, I found myself needing a unified interface for trace posteriors and marginal distributions along the lines of what we've been talking about since #6 and #23 . It would be great to get feedback on those in isolation before they end up in a PR.

Do they fulfill our needs and our original specifications?
Are they doomed to poor performance by construction?
Should optimization-based inference (KL_QP at the moment) return a trace posterior or something else?
Should trace posteriors be reified in lists, represented implicitly by generators, or take another form entirely?
Am I using memoization too aggressively?
Should Marginal hide all internal randomness with poutine.block?
How can we efficiently build histograms of tensors or other types that aren't naively usable with expected behavior as dictionary keys?
etc.

Push search example onto dev branch

It would be great to have an example for search in order to put the algorithm in context.

Cheers,

should observe be a special case of sample?

we could add an optional argument to sample that indicates an observed return value. i.e. pyro.sample("name", dist, observed_val=...). when observed_val is None then it acts as a sampler as usual, when there is an observed value, then it acts as an observed variable (thenpyro.observe would be an alias for pyro.sample.)

this makes it nicer to do models with flexible / missing data. it makes the poutine interface slightly simpler, though just moves the code for handling observes into a case of the sample method.

does it mess up any inference algorithms or use cases?

RuntimeError: differentiating stochastic functions requires providing a reward

Running variational inference on models that include discrete choices currently results in an error for me:

> python examples/bernoulli_gmm.py
dataset loaded
/opt/conda/lib/python3.6/site-packages/torch/autograd/_functions/basic_ops.py:34: UserWarning: self and other not broadcastable, but have the same number of elements.  Falling back to deprecated pointwise behavior.
  return a.sub(b)
/opt/conda/lib/python3.6/site-packages/torch/autograd/_functions/basic_ops.py:63: UserWarning: self and other not broadcastable, but have the same number of elements.  Falling back to deprecated pointwise behavior.
  return a.div(b)
/opt/conda/lib/python3.6/site-packages/torch/autograd/_functions/basic_ops.py:17: UserWarning: self and other not broadcastable, but have the same number of elements.  Falling back to deprecated pointwise behavior.
  return a.add(b)
Traceback (most recent call last):
  File "examples/bernoulli_gmm.py", line 77, in <module>
    loss_sample = grad_step(i, data[i])
  File "/data/pyro/infer/kl_qp.py", line 33, in __call__
    return self.step(*args, **kwargs)
  File "/data/pyro/infer/kl_qp.py", line 75, in step
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/stochastic_function.py", line 15, in _do_backward
    raise RuntimeError("differentiating stochastic functions requires "
RuntimeError: differentiating stochastic functions requires providing a reward

Versions:

Python 3.6.1 :: Anaconda custom (64-bit)
PyTorch 0.2.0_2

I observed the same behavior for PyTorch 0.1.x (without the UserWarnings).

(semi) automatic naming

Per discussion about #37 today, a major user pain that came up is manually creating names for inference. Something like:

for i  in (...):
  name_scope_1 = "name" + str(i)
    for j in (...):
      name_scope_2 = name_scope_1 + str(j)
        ...

A solution to this could be to have a function like named_enumerate(name, iterator) that would construct these names on the fly.

add Discrete distribution

this is useful for constructing marginals in particular.

it should be a very straightforward extension of Categorical (especially if implemented as a subclass?).

Clean up distributions library

Currently, to calculate the vectorized score, multinomial iterates through the batch to perform row-wise log-factorial. Ideally PT would support a way to do this efficiently, but otherwise, this should be written via chained tensor operators.

Improve non-default Tensor type support, remove setting global flag

Currently, to support GPU (or non-float) tensors, there is a global set_cuda method for pyro. This sets a global default tensor type for torch, when we should be careful to call type_as when constructing autograd Variables.

This is especially problematic in the distributions library where there are no type_as calls.

Here is an example demonstrating what I mean:

import torch
from torch.autograd import Variable
from pyro.distributions import DiagNormal

z = Variable(torch.randn(4))
mu = Variable(torch.randn(4))
sigma = torch.exp(Variable(torch.randn(4))

# Succeeds
DiagNormal().batch_log_pdf(z, mu=mu, sigma=sigma)

# Fails
DiagNormal().batch_log_pdf(z.cuda(), mu=mu.cuda(), sigma=sigma.cuda())

# Fails
DiagNormal().batch_log_pdf(z.double(), mu=mu.double(), sigma=sigma.double())

I know there are global flags:

# global tensor constructors become all cuda 
pyro.set_cuda()

# global float 
pyro.set_cpu()

But it's not very good practice to force users to set global torch tensor types just to get non-float tensor support in pyro distributions. Before wider release, I'd like to correct this behavior.

Should we garbage-collect local parameters sometimes?

Suppose we're doing variational inference in a model with global and local parameters. For example, in addition to parameters shared by all data points, there might be a per-data-point random variable with mean-field guide. My current understanding is that we'd approach this by naming the local parameters using a unique index, e.g. the position of the corresponding data point in the dataset. This way, we're actually getting distinct parameters for different local data points.

However, in large datasets, we'll probably encounter most data points only once. Does storing all of these local parameters affect performance? Will this substantially affect memory usage for some models?

This is probably not an important issue right now, but if it eventually turned out to be, here are some ideas:

There could be some way to mark parameters as ephemeral, so that they only persist until we "move on" to the next data point (or batch).
There could be a parameter store that uses a LRU cache with memory limit, so that infrequently used parameters get dropped when the limit is exceeded. This would do the right thing for small data sets, where we do want to store all the local parameters, and for hierarchical models, where it may be hard to specify in advance what exactly we want to persist.

(This came up while implementing a char-rnn example, where I wanted to fit a distinct softmax alpha for each text snippet. I'm currently sharing a single alpha between different data points, and doing a bunch of alpha-only update steps until it converges for the current data point, but should probably switch to using distinct names. However, one of the datasets I train on is randomly generated in part and thus effectively infinite, which made me wonder whether distinct names are the right way to go.)

mapdata over internal structure

We have interesting interactions between mapdata and structured models such as time-series models. Ibn an ideal world, we could just do something like mapdata with the parts of the model that are executed at each time-slice over the index 'time' and get a result.

I currently have some version of that using a for-loop, but when using for instance an RNN there are some ugly parts when it comes to sliding around hidden states and for loops are just inelegant.

In general I think this is worth a bigger discussion at one of our next meetings to get the right helper functions set up.

Update:
To clarify, I think this case is most interesting when the internal structure is not iid, so behaves a bit differently to our 'normal' mapdata.

add support methods for (at least) discrete distributions

we need these for Search to be useful.

Add ability to save and load param store

We need the ability to easily (and reasonably quickly) save the param store to disk, and re-load it.

This is useful in a long training run both for saving state for future analyses and for checkpointing in case of a crash.

Test vs Train

Currently, our objectives (i.e. KLqp) automatically do a step in the direction of the gradients.

However, for many cases, such as evaluating on a test-set, we need to also be able to just get the score out without taking a gradient step.

I suggest we change the structure of the infer functions a bit such that we can account for this.

The easiest hack would be to have infer.step and infer.score , where infer.step internally does infer.score plus the gradient.

Probably there are some more abstractions to think about here.

Cheers,

Refactor/rewrite test suite

Currently unit tests have no boilerplate reusability or modularity. Also, statistical failures happen much too often.

Poutine composition assistance

Currently, care must be taken when composing poutines, since they don't commute and different stacks may behave in subtly different and incorrect ways, e.g. the difference between trace(queue(f)) (used in Search) and queue(trace(f)) (sort of incoherent). The poutine stack mechanism should be changed so that it removes the burden of noticing these things from the user and enforces the composition rules that we have in our heads.

pyro.ai

Ideally, the website should have a main page with an "about" blurb and a quick install guide. The documentation should be accessible from the main page and live in a subdomain (eg. docs.pyro.ai)

implement chi objective for VI

it will be useful to have another objective in the mix for testing the system.

the chi objective will be a good starting point partly because it upper-bounds evidence (complementing the elbo lower bound) and tends toward over (rather than under) dispersion.

https://arxiv.org/pdf/1611.00328.pdf