stefanradev93 / bayesflow

A Python library for amortized Bayesian workflows using generative neural networks.

Home Page: https://bayesflow.org/

License: MIT License

Languages: Python 99.61%, Makefile 0.19%, Batchfile 0.20%

Topics: bayesian-inference, deep-learning, model-comparison, amortized-inference, simulation-based, invertible-networks, parameter-estimation, likelihood-free, uncertainty-quantification

bayesflow's Issues

handling of missing data

I would like to ask how we should handle missing data in BayesFlow.

E.g., I have a dataset X with N rows and 5 columns, the entries are i.i.d., and

X[i, j] ~ Binomial(1, params[j]), hence I have five parameters, j = 0, 1, 2, 3, 4.

Say that, for some reason, X[i, j] is missing whenever i mod 5 = j.

What should I put in X[i, j] in order to get the correct posterior?

I tried putting in a value like -1, but this does not give correct results.

I tried the version without missing data, and the output posterior is reasonable.

Thank you.

An empty checkpoint folder without memory.pkl crashes training

Hey everyone,

Great work on the new version of the package. The notebook examples are quite clear and easy to adapt.

In case you didn't know about this error: if you create an empty checkpoint folder ahead of running bayesflow.trainers.Trainer(), this seems to crash the function. You can let Trainer() create the folder for you, but this may be a bit annoying for some analysis pipelines. For example, this crashes for me when using the master branch:

os.makedirs('checkpoint_model1')
trainer = bf.trainers.Trainer(
    amortizer=amortizer, 
    generative_model=generative_model, 
    configurator=configurator,
    checkpoint_path='checkpoint_model1')
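
A pipeline-side workaround until this is fixed could look like the following sketch (assuming the crash stems from the pre-created folder lacking memory.pkl):

import os

# Remove a pre-created, still-empty checkpoint folder so that Trainer()
# can create and populate it itself
ckpt_path = 'checkpoint_model1'
if os.path.isdir(ckpt_path) and not os.listdir(ckpt_path):
    os.rmdir(ckpt_path)

trainer = bf.trainers.Trainer(
    amortizer=amortizer,
    generative_model=generative_model,
    configurator=configurator,
    checkpoint_path=ckpt_path)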

Cheers,

FAQ in README

Commonly asked questions should be added in their own section in the README.

Regarding your JOSS submission

Link to the JOSS review issue:
openjournals/joss-reviews#5702
This issue is regarding your submission of the software paper BayesFlow to JOSS. I was trying to run the tests in the tests folder using pytest. I am encountering an error caused by the pyproject.toml file (this line in particular: --cov BayesFlow). Could you please look into it and respond?

Experience Replay for all Trainers

Implement trainer.train_experience_replay for all trainers in BaseTrainer.

  • Adapt bayesflow.buffer.MemoryReplayBuffer to variable argument length
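
A minimal sketch of what variable argument length could look like (illustrative only, not the actual bayesflow.buffer implementation):

import numpy as np

class MemoryReplayBuffer:
    """Stores a variable number of arrays per simulation batch."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._storage = []

    def store(self, *arrays):
        # Accept any number of arrays (e.g., params and sim_data,
        # or model_indices, params, and sim_data)
        if len(self._storage) >= self.capacity:
            self._storage.pop(0)
        self._storage.append(arrays)

    def sample(self):
        idx = np.random.randint(len(self._storage))
        return self._storage[idx]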

Have you come across this situation?

Hi Stefan et al.,

I came across a situation where, no matter how many ACBs I stack up (I have tried up to 13 ACBs), the transformed latent variables Z do not perfectly follow N(0, 1), unlike the examples shown in the paper/your example code.

Have you come across this situation? Do you have any suggestions? (If the transformed latent variables are not trained to be perfectly N(0, 1), then the posterior draws from the inference phase wouldn't be valid, right?)

Thank you so much!
Best,
Shuwan

Implementation of an IdentityNetwork in case no summary network is used

In case the datasets have small sizes, it seems reasonable not to use any summary network. However, if we set summary_net = None, some problems concerning dimensions arise: raw data are stored in an array of shape (batch_size, n_obs, D = dimension of each observation), while summary networks have outputs of shape (batch_size, summary_stats_dim). Thus, it would be helpful to have an IdentityNetwork which simply concatenates a dataset of multiple observations into one single vector. The outputs of the IdentityNetwork would then represent the case in which we use no summary network, but match the required input dimension of the inference network.

My own attempt to implement such an IdentityNetwork resulted in the following error:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

from numba import njit
import tensorflow as tf

from bayesflow.networks import InvertibleNetwork, InvariantNetwork
from bayesflow.amortizers import SingleModelAmortizer
from bayesflow.trainers import ParameterEstimationTrainer
from bayesflow.diagnostics import *
from bayesflow.models import GenerativeModel

def prior(batch_size):
    """
    Samples from the prior 'batch_size' times.
    ----------
    
    Arguments:
    batch_size : int -- the number of samples to draw from the prior
    ----------
    
    Output:
    theta : np.ndarray of shape (batch_size, theta_dim) -- the samples batch of parameters
    """
    
    # Prior ranges for the simulator 
    # v_c ~ U(-7.0, 7.0)
    # a_c ~ U(0.1, 4.0)
    # t0 ~ U(0.1, 3.0)
    p_samples = np.random.uniform(low=(0.1, 0.1, 0.1, 0.1, 0.1), 
                                  high=(7.0, 7.0, 4.0, 4.0, 3.0), size=(batch_size, 5))
    return p_samples.astype(np.float32)


@njit
def diffusion_trial(v, a, ndt, zr, dt, max_steps):
    """Simulates a trial from the diffusion model."""

    n_steps = 0.
    x = a * zr

    # Simulate a single DM path
    while (x > 0 and x < a and n_steps < max_steps):

        # DDM equation
        x += v*dt + np.sqrt(dt) * np.random.normal()

        # Increment step
        n_steps += 1.0

    rt = n_steps * dt
    return rt + ndt if x > 0. else -rt - ndt

@njit
def diffusion_condition(n_trials, v, a, ndt, zr=0.5, dt=0.005, max_steps=1e4):
    """Simulates a diffusion process over an entire condition."""
    
    x = np.empty(n_trials)
    for i in range(n_trials):
        x[i] = diffusion_trial(v, a, ndt, zr, dt, max_steps)
    return x


@njit
def diffusion_2_conds(params, n_trials, dt=0.005, max_steps=1e4):
    """
    Simulates a diffusion process for 2 conditions with 5 parameters (v1, v2, a1, a2, ndt).
    """
    
    n_trials_c1 = n_trials[0]
    n_trials_c2 = n_trials[1]
    
    v1, v2, a1, a2, ndt = params
    rt_c1 = diffusion_condition(n_trials_c1, v1, a1, ndt,  dt=dt, max_steps=max_steps)
    rt_c2 = diffusion_condition(n_trials_c2, v2, a2, ndt, dt=dt, max_steps=max_steps)
    rts = np.concatenate((rt_c1, rt_c2))
    return rts


def batch_simulator(prior_samples, n_obs, dt=0.005, s=1.0, max_iter=1e4):
    """
    Simulate multiple diffusion_model_datasets.
    """
    
    n_sim = prior_samples.shape[0]
    sim_data = np.empty((n_sim, n_obs), dtype=np.float32)
    
    n1 = n_obs//2
    n2 = n_obs - n1
    
    # Simulate diffusion data
    for i in range(n_sim):
        sim_data[i] = diffusion_2_conds(prior_samples[i], (n1, n2))
        
    # Create condition labels
    cond_arr = np.stack(n_sim * [np.concatenate((np.zeros(n1), np.ones(n2)))] )
    sim_data = np.stack((sim_data, cond_arr), axis=-1)
    
    return sim_data

class IdentityNetwork(tf.keras.Model):
    """Implements a flattened identity mapping of inputs to outputs."""

    def call(self, x: tf.Tensor):
        """Performs a flattened identity mapping.
        
        Parameters
        ----------
        x : tf.Tensor
            Input of shape (batch_size, N, x_dim)
        
        Returns
        -------
        out : tf.Tensor
            Output of shape (batch_size, out_dim)
        """
        return tf.reshape(x, [x.shape[0], -1])
    
#summary_net = InvariantNetwork()
summary_net = IdentityNetwork()
inference_net = InvertibleNetwork({'n_params': 5})
amortizer = SingleModelAmortizer(inference_net, summary_net)

generative_model = GenerativeModel(prior, batch_simulator)

trainer = ParameterEstimationTrainer(
    network=amortizer, 
    generative_model=generative_model,
)

# Pre-simulated data (could be loaded from somewhere else)
n_sim = 5000
n_obs = 100
true_params, x = generative_model(n_sim, n_obs)

losses = trainer.train_offline(epochs=1, batch_size=64, params=true_params, sim_data=x)

ValueError: Exception encountered when calling layer "sequential" (type Sequential).

Input 0 of layer "dense" is incompatible with the layer: expected axis -1 of input shape to have value 103, but received input with shape (64, 203)

Call arguments received:
  • inputs=tf.Tensor(shape=(64, 203), dtype=float32)
  • training=None
  • mask=None

Guidelines on number of output and hidden units in LSTM

Scenario: Using an LSTM as summary network for uni-/multivariate time series data, with a flexible number of time series steps.

Can you give recommendations on the number of output units (summary_dim in https://github.com/stefanradev93/BayesFlow/blob/master/bayesflow/summary_networks.py#L123, default=10) and dimension of the LSTM hidden space (lstm_units, default=128)?

I would guess that summary_dim should be >= the number of parameters to make sure enough information can be preserved. lstm_units here has the task of summarizing the entire sequence before it passes through the final dense network (arguably a slightly different task than in, e.g., machine translation), so it should be fairly large too?

Softmax parameter cannot be recovered

Hi,
Thanks for providing this amazing tool! I have tried to fit a cognitive model with a softmax choice rule at the end, that is, P(option i) = exp(beta * Ui) / [sum over j of exp(beta * Uj)]. However, I cannot recover the parameter beta.
I tried several minimal examples with two options and found that exp(x1)/(exp(x1)+exp(x2)) seems to be treated similarly to x1/(x1+x2); so when you have a parameter beta in the latter case, it cannot be recovered.
Do you know what causes this and how to solve it? Thanks in advance!

Possible axis mislabelling in bayesflow.diagnostics.plot_calibration_curves

Hello all,

Thank you all for the marvelous work on BayesFlow.
I have been trying to understand the calibration curve output by bayesflow.diagnostics.plot_calibration_curves. It seems to me that the calibration curve should be a function of the predicted probability instead of the true probability. As defined in the expected_calibration_error function,

    # Loop for each model and compute calibration errs per bin
    for k in range(n_models):
        y_true = (m_true.argmax(axis=1) == k).astype(np.float32)
        y_prob = m_pred[:, k]
        prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=num_bins)

        # Compute ECE by weighting bin errors by bin size
        bins = np.linspace(0.0, 1.0, num_bins + 1)
        binids = np.searchsorted(bins[1:-1], y_prob)
        bin_total = np.bincount(binids, minlength=len(bins))
        nonzero = bin_total != 0
        cal_err = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))

        cal_errs.append(cal_err)
        probs.append((prob_true, prob_pred))
    return cal_errs, probs

Then, the true probability array is the first element of each tuple in the output list, not the second. Going back to the bayesflow.diagnostics.plot_calibration_curves function,

    # Determine n_subplots dynamically
    n_row = int(np.ceil(num_models / 6))
    n_col = int(np.ceil(num_models / n_row))
    cal_errs, cal_probs = expected_calibration_error(true_models, pred_models, num_bins)

    # Initialize figure
    if fig_size is None:
        fig_size = (int(5 * n_col), int(5 * n_row))
    fig, axarr = plt.subplots(n_row, n_col, figsize=fig_size)
    if n_row > 1:
        ax = axarr.flat

    # Plot marginal calibration curves in a loop
    if n_row > 1:
        ax = axarr.flat
    else:
        ax = axarr
    for j in range(num_models):
        # Plot calibration curve
        ax[j].plot(cal_probs[j][0], cal_probs[j][1], color=color)

If I did not misunderstand, the true probability is plotted on the x-axis while the predicted probability is plotted on the y-axis. However, the axis labels are,

        ax[j].set_xlabel("Predicted probability", fontsize=label_fontsize)
        ax[j].set_ylabel("True probability", fontsize=label_fontsize)

Did I miss anything here? Once again, thank you for the splendid work on BayesFlow!

Use Training Strategy Pattern in Trainer

The current implementation of bayesflow.trainers.Trainer implements several training strategies in one large class. This can be problematic for both maintainers and users, since

  1. Implementing a new strategy from outside bayesflow requires inheriting from the Trainer class
  2. Modifying small parts of a strategy outside of its input parameters can be difficult or impossible
  3. Maintaining a mental overview of the Trainer functionality is impeded

We can remedy these issues by implementing the trainer strategies explicitly as their own concept, and following composition over inheritance in the Trainer class.
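
A rough sketch of what composition could look like (the names are illustrative, not an existing BayesFlow API):

class OnlineStrategy:
    """One training strategy, encapsulated as its own object."""

    def __init__(self, iterations_per_epoch):
        self.iterations_per_epoch = iterations_per_epoch

    def run_epoch(self, amortizer, generative_model, batch_size):
        for _ in range(self.iterations_per_epoch):
            batch = generative_model(batch_size)
            amortizer.train_step(batch)  # placeholder for the actual update

class Trainer:
    """Composes a strategy instead of hard-coding it."""

    def __init__(self, amortizer, generative_model, strategy):
        self.amortizer = amortizer
        self.generative_model = generative_model
        self.strategy = strategy

    def train(self, epochs, batch_size):
        for _ in range(epochs):
            self.strategy.run_epoch(self.amortizer, self.generative_model, batch_size)

New strategies could then be added from outside bayesflow without touching or inheriting from Trainer.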

Parallelize Test Workflows

We can parallelize the test workflow in 2 ways:

  1. Over multiple environments (i.e., Python versions). This can be done either via parallelization directly in the GitHub workflow, or by telling tox to parallelize environments.
  2. Over multiple tests with pytest-xdist using the flag -n auto.

Drawbacks:

  • Parallelizing the tests changes the console output for the test workflow, since intermediate results cannot be shown until all tests have completed.

Trainer without generative model

Feature: Option to create a Trainer instance without a generative model for offline learning.

Use Case: Simulate the data outside of Python (e.g. C++/Matlab) but use the BayesFlow library in offline learning mode.
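
A sketch of the requested usage (hypothetical API, assuming the simulations are loaded from disk in the format expected by train_offline, and an amortizer defined as in the other examples):

import numpy as np
import bayesflow as bf

# Simulations produced outside Python (e.g., C++/Matlab), saved to disk
data = {
    'prior_draws': np.load('params.npy').astype(np.float32),
    'sim_data': np.load('sim_data.npy').astype(np.float32),
}

# No generative model attached; the trainer would support offline learning only
trainer = bf.trainers.Trainer(amortizer=amortizer, generative_model=None)
losses = trainer.train_offline(simulations_dict=data, epochs=10, batch_size=32)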

Cannot do offline training with summary network

Hello!

I am using BayesFlow for the first time and am having some issues doing offline training with a summary network (everything works fine if I exclude it and just supply the data to the inference network). Basically, I am simulating from an agent-based model in R and then bringing the data (13 parameters and 1921 output features) into Python so that it has the format required by BayesFlow.

{'prior_non_batchable_context': None, 'prior_batchable_context': None, 'prior_draws': array([[ 0.76718193, -0.87377334,  0.22366658, ..., -0.39844744,
        -0.35801412,  0.04897532],
       [ 1.30669447,  0.66729612,  1.7323655 , ..., -1.21459231,
         0.74958571, -0.25844436],
       [ 0.90680171,  1.29136708,  0.70601619, ..., -1.28499242,
         0.98797805,  0.6543645 ],
       ...,
       [-1.57379467, -0.29633931, -0.97821418, ..., -2.09261737,
        -0.91399776,  1.30333936],
       [ 0.45297908, -0.91439699,  0.79108664, ..., -0.16306107,
        -0.50990459, -0.60457136],
       [-0.80346153,  0.05948135, -1.28584127, ...,  0.532937  ,
         0.54166668, -0.12752289]]), 'sim_non_batchable_context': None, 'sim_batchable_context': None, 'sim_data': array([[ 0.44540442,  0.40116571, -0.54365648, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [-0.84753774, -1.34450615, -1.15397759, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [ 0.44540442,  0.40116571,  1.01192892, ..., -0.01023475,
        -0.01002833, -0.01001542],
       ...,
       [-0.19984821, -0.77635165, -1.15397759, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [-0.60475894, -0.00569413, -1.00236414, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [ 0.44540442,  0.40116571,  1.01192892, ..., -0.01023475,
        -0.01002833, -0.01001542]])}

Here is my configure_input script:

# define function that configures the input for the amortizer
def configure_input(forward_dict):
    # prepare placeholder dict
    out_dict = {}

    # add to keys
    out_dict["parameters"] = forward_dict["prior_draws"].astype(np.float32)
    out_dict["summary_conditions"] = forward_dict["sim_data"].astype(np.float32)
    out_dict["direct_conditions"] = None

    # return the output dictionary
    return out_dict

Then, if I run the following code:

summary_net = bf.networks.DeepSet()
inference_net = bf.networks.InvertibleNetwork(num_params = 13)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
trainer = bf.trainers.Trainer(amortizer = amortized_posterior, configurator = configure_input, memory = True)
offline_training = trainer.train_offline(simulations_dict = data, epochs = 2, batch_size = 32)

I get the following error:

ValueError: Exception encountered when calling layer 'sequential_549' (type Sequential).
                    
                    Input 0 of layer "dense_1530" is incompatible with the layer: expected min_ndim=2, found ndim=1. Full shape received: (64,)
                    
                    Call arguments received by layer 'sequential_549' (type Sequential):
                      • inputs=tf.Tensor(shape=(64,), dtype=float32)
                      • training=True
                      • mask=None

Do you know what the issue might be?

Blow-up of the loss during training

Hi Stefan,

I am currently finding that, for certain models (especially when the dataset is slightly larger, e.g., n_obs = 100), the loss sometimes blows up dramatically (say, about once every 40 epochs), so that the network has not converged after 100 epochs. Below is an example of such a blow-up (I simply printed the loss after each iteration):

..., -7.752036, -7.820761, -7.7039595, -7.6541286, -7.866893, -7.6999016, -7.7101583, -7.876931, -7.926651, 718.84845, 3543009.5, -1.1094699, 55.888943, 97.26874, 86.27302, 88.00319, 72.44775, 55.576897, 55.679665, 42.166874, 38.97898, 26.977058, 21.449322, 13.875741, 9.592149, 4.081973, 0.62280387, 0.8802163, -0.20799232, -0.26961386, -0.02687946, -1.470809, -0.33887288, -0.95399594, -1.1490471, -1.5857422, -1.2776537, -1.7386447, -1.5166757, -2.1432261, -2.240626, -2.8441887, -2.161209, -2.8351824, -2.9727767, -2.9907823, -3.5804815, -3.6286428, -3.659181, -3.566587, -3.8588657, -3.8712082, -3.9153986, -4.028116, -3.8262885, ...

Do I possibly need to change something in the optimizer settings, or is it the number/size of the affine coupling blocks that needs to be chosen more carefully? And are the default optimizer settings in the current BayesFlow package the same as those used for the BayesFlow paper?

Thanks a lot in advance!

Best regards
Zijian

error when importing bayesflow

Hi Stefan et al.,

I am very impressed by the BayesFlow work. However, when I try to use it myself, I get an error when importing bayesflow.

I have tried with PyCharm/Anaconda; they give me the same error message:

ValueError: Arg specs do not match: original=FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), new=FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), fn=<function ones_like_v2 at 0x000002D9E1F41A80>

Do you know what happened and how to solve this?

Thank you so much!
Best regards,
Shuwan

posterior mean outside the domain of the parameter

Hi Stefan,

I am testing some models and found the following: the prior of a parameter is Uniform(0.1, 0.9), but for a real data set, the posterior mean of that parameter is -0.6 and the posterior sd is 0.03. May I ask how that is possible, and is there any way we can solve it? E.g., this is a probability parameter, so should I use a logistic transformation?

Thank you.

Best
Tim

Question about handling missing data

Hi, I'm wondering whether it would be of any interest to include the ability for sequential or invariant networks to handle missing data. Some of our subjects have only partial data, but it would be a huge waste to just remove them. From the documentation, it's not straightforward to handle missing data. If the maintainers feel this would be useful, I am happy to help with implementing it.

Allow kwargs in trainer.train_offline

Currently, the signature for trainer.train_offline is train_offline(self, epochs, batch_size, *args). This means that keyword calls will fail. All of the following calls should work:

Calls without keywords

train_offline(10, 32, theta, x)
train_offline(10, 32, m_oh, x)
train_offline(10, 32, m_oh, theta, x)

Calls with keywords

train_offline(epochs=10, batch_size=32, params=theta, sim_data=x)
train_offline(epochs=10, batch_size=32, model_indices=m_oh, sim_data=x)
train_offline(epochs=10, batch_size=32, model_indices=m_oh, params=theta, sim_data=x)

Calls with keywords, arbitrary order within **kwargs

train_offline(epochs=10, batch_size=32, sim_data=x, params=theta)
train_offline(epochs=10, batch_size=32, sim_data=x, model_indices=m_oh)
train_offline(epochs=10, batch_size=32, params=theta, sim_data=x, model_indices=m_oh)
train_offline(epochs=10, batch_size=32, sim_data=x, model_indices=m_oh, params=theta)

No mixed calls need to be supported.
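
A minimal sketch of a signature that would support all of the above (illustrative; _dispatch_training is a hypothetical helper, not an existing BayesFlow function):

def train_offline(self, epochs, batch_size, *args, **kwargs):
    # Keyword path: pull the known argument names out of **kwargs
    model_indices = kwargs.pop('model_indices', None)
    params = kwargs.pop('params', None)
    sim_data = kwargs.pop('sim_data', None)
    # Positional path: keep the documented order
    if args:
        if len(args) == 3:
            model_indices, params, sim_data = args
        elif len(args) == 2:
            # (params, sim_data) for estimation; (model_indices, sim_data)
            # for model comparison, disambiguated by the trainer type
            params, sim_data = args
    return self._dispatch_training(epochs, batch_size, model_indices, params, sim_data)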

Contribution guide

I suggest adding a contribution guide (see e.g. this doc) to make it easier to contribute to the tool. In particular, this should cover:

  • From which branch to start (Development?)
  • The standard workflow (make issue, fork, make sure tests pass + add new ones, create pull request)

`test_time_series_transformer` occasionally fails

This check is not reliable enough:

# Test non-permutation invariant
assert not np.allclose(out, out_perm, atol=1e-5)

Turning down atol could resolve this, but I'm open to more constructive fixes. It could also be worth investigating whether this is a deeper problem with the TimeSeriesTransformer.

Free-form flows for SBI

The paper 'Free-form Flows: Make Any Architecture a Normalizing Flow' by Draxler et al. (2023) introduces a loss that allows any architecture to be used as an inference network for SBI. A new amortizer could implement the loss function and be tested on the benchmark tasks.

Link to paper: https://arxiv.org/abs/2310.16624

should I continue to try with more ACBs?

Hi Stefan et al.,

I am trying a model with 4 ACBs, and after training, the loss seemed to stop moving down. But in validation, the posterior samples don't get close to the true parameters.

So I wonder whether I should try more ACBs or something else? Do you have any suggestions?

Thank you so much!
Best,
Shuwan

Coin Sampling Optimizer

Coin sampling (Sharrock & Nemeth, 2023, [1]) is a learning-rate-free optimization scheme that performs on par with perfectly LR-tuned SVGD up to O(log N).

According to @louissharrock, coin sampling is currently implemented in JAX. If there were a TensorFlow implementation, we could use it directly within BayesFlow, provided it meets the tf.keras.optimizers.Optimizer API.

Within the BayesFlow pipeline, the optimizer is passed when we initiate the training loop. Here's a pointer for online training via train_online:

optimizer : tf.keras.optimizers.Optimizer or None
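
For reference, a custom optimizer is passed along the following lines (a sketch with a standard Keras optimizer, assuming a trainer configured as in the examples above; a coin-sampling optimizer conforming to the same interface could be dropped in the same way):

import tensorflow as tf

# Any object satisfying the tf.keras.optimizers.Optimizer interface
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-4)
losses = trainer.train_online(epochs=10, iterations_per_epoch=500,
                              batch_size=32, optimizer=optimizer)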

[1] https://proceedings.mlr.press/v202/sharrock23a

Summary network for time series data

Hi Stefan,

I would like to use a summary network to handle time series data coming from an ODE model. It is a toy model with one or two observed state(s), n_obs = 31 fixed for each dataset, and the dynamics are very simple. From your experience, do you have a proposal for a summary network that I should try? To me, it is more important that the network captures all the relevant information than that the output size is optimal. Thanks a lot!

Best regards
Zijian

offline training with varying sample size

In the tutorial on online training, I saw that we could pass a prior_N function to allow for varying sample sizes. I would like to ask how we can do the same for offline training. What should be the input of sim_data in the trainer.train_offline function? As suggested by others, this could be very helpful, since we can then use other software, like C++, to simulate data.

Thank you very much!

Data / Parameter Transformations

Add functionality for providing data/parameter transformations, such as scaling, normalization, noise injection, etc. to the trainers.
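
Until then, such transformations can be applied by hand inside a configurator; a minimal sketch (assuming prior_means and prior_stds were precomputed, e.g., via prior.estimate_means_and_stds()):

import numpy as np

def configurator(forward_dict):
    out_dict = {}
    # Parameter transformation: standardization
    params = forward_dict['prior_draws'].astype(np.float32)
    out_dict['parameters'] = (params - prior_means) / prior_stds
    # Data transformation: mild noise injection
    data = forward_dict['sim_data'].astype(np.float32)
    noise = np.random.normal(scale=0.01, size=data.shape).astype(np.float32)
    out_dict['summary_conditions'] = data + noise
    return out_dict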

x ~ scale * N(theta, 1) or x ~ N(theta, scale)? 🤔

Hello 👋 ,
I was trying to understand the benchmark datasets used to evaluate BayesFlow's performance. One thing that I noticed for a couple of datasets is the scaling factor in front of the distribution for sampling the observations. One example is linked below, which is the T.2 task from the sbibm paper.
Generating points for the original task would look like this:
theta ~ scale * N(0, 1) = N(0, scale)
x ~ N(theta, scale)
with scale being the same for both distributions, i.e., scale = 0.1.

Now your implementation is:
theta ~ scale * N(0, 1) (same as above)
x ~ scale * N(theta, 1) = N(scale * theta, scale)
which is equivalent to having a different prior compared to the T.2 task for the same data distribution, i.e.:
theta ~ scale^2 * N(0, 1) = N(0, scale^2)
x ~ N(theta, scale)

Maybe I'm also missing something, but I would be glad for clarification. I also wanted to ask what code you used for the multivariate dataset from part 3.3 of your paper?

x = scale * rng.normal(loc=theta, size=(n_obs, theta.shape[0], theta.shape[1]))

JOSS review: citations missing

When a non-amortized inference procedure does not create a computational bottleneck, approximate Bayesian computation (ABC) might be an appropriate tool. This is the case if a single data set needs to be analyzed, if an infrastructure for parallel computing is readily available, or if repeated re-fits of a model (e.g., cross-validation) are not desired.

I believe the paper should also cite ABCpy here: https://github.com/eth-cscs/abcpy, with paper here

Related to openjournals/joss-reviews#5702

Models loaded from checkpoints differ between the master and development branches

Trained models saved in the master branch and then loaded within the development branch do not give the same results.

I tested it using the Intro_Amortized_Posterior_Estimation notebook in a clean install of the master branch and saved the model by providing the checkpoint_path argument to the Trainer. Then I loaded the model within a clean install of the development branch, where I only skipped the online training cell.

All the diagnostic plots in the notebook changed significantly, most prominently the plot_recovery, as can be seen in the attached images.

Master branch
recovery_plot_master_branch

Development branch
recovery_plot_development_branch

However, retraining the model on the development branch gives the correct results, as on the master branch. Hence, it seems like the branches have different defaults somewhere, which would be interesting to track down.

Automatic Batching using `vmap`

Classes such as bayesflow.simulation.ContextGenerator or bayesflow.simulation.Prior allow the user to pass in a batched or an unbatched generation function, where typically, passing both is disallowed.

For quality of life (and possibly compatibility with #123) it would make sense to change this to a simple boolean flag, and automatically parallelize unbatched functions for the user with a variation of either JAX's or PyTorch's vmap function.

Example pseudo-code:

Replace this

class Prior:
    def __init__(self, batch_prior_fun=None, prior_fun=None):
        if (batch_prior_fun is None) == (prior_fun is None):
            raise ValueError
        
        self.prior = prior_fun
        self.batched_prior = batch_prior_fun

with

class Prior:
    def __init__(self, prior_fun, is_batched=True):
        self.prior = prior_fun if is_batched else vmap(prior_fun)

Since this is a breaking change, we would need a soft introduction of this feature including backward compatibility and deprecation messages.

Possible concerns:

  • vmap is backend-dependent, and not always compatible with all operations that could happen in prior_fun (e.g. data-dependent control flow), so allowing the user to disable this would be beneficial
  • sequential generation of batches is inherently incompatible with vmap, since vmap makes the stronger assumption that each batch item is independently processed

clarification of the relationship between the sample size in training and in inference

Thanks for the very interesting approach to simulation-based inference.

I have been testing this and would just like to ask the following:

Is it a requirement that the sample size in training is the same as in the real data?

E.g., in the example for parameter estimation, if I use the following:

losses = trainer.simulate_and_train_offline(n_sim=10000, epochs=75, batch_size=64, n_obs=10000)

i.e., each dataset has 10000 observations, then if I simulate a dataset of size 10000 and obtain the posterior, I get reasonable values.

But when I use the following to simulate a dataset of size 1000, I get the following true values and posterior:

true_params = prior(1)
print(true_params)
x = batch_simulator(true_params,1000)
param_samples = amortizer.sample(x, 10000)
print(param_samples.mean(0))
print(param_samples.std(0))

[[1.8459227 4.879591 0.42127568 0.44341898 2.0850098 ]] : true value
[ 1.9427828 14.766885 12.579591 -0.948793 1.0707246] : posterior mean
[ 0.9751236 18.545668 8.544211 0.651906 1.5197684] : posterior sd

Thank you.

ConfigurationError using TimeSeriesTransformer

Hi -

I'm looking to use BayesFlow for a model with a nested RL structure. In a previous post (#70 (comment)), I saw that you recommended using TimeSeriesTransformer instead of InvariantNetwork to ensure the model has memory. I also know that I want to pass along relevant variables related to context and other model calculations; therefore, each simulated trial outputs not only the predicted choices but also other variables like utilities (see the function FD2_trial). However, I'm having issues structuring my model so that it has the correct dimensions.

When I try to configure the trainer, I get the following error:

Could not carry out computations of generative_model -> configurator -> amortizer -> loss! Error trace:
Exception encountered when calling layer 'multi_head_attention_45' (type MultiHeadAttention).

In this tf.Variable creation, the initial value's shape ((3, 4, 32)) is not compatible with the explicitly supplied shape argument ((7, 4, 32)).

I tried to change the input_dim of the TimeSeriesTransformer, but I keep getting the same error. I've also tried various fixes within the configurator without success. Could you help me determine why this error is occurring and how I can fix it?

Thanks,
Blair

Here is the full set-up:

def FD2_trial(offer, norm0, alpha0, beta0, epsilon0, delta0, eta=0.8):

    # Minimum offer amount
    mn = 1

    alpha = alpha0
    beta = beta0
    epsilon = epsilon0
    delta = delta0

    def _FS_sim(offers=None, alpha=None, norm=None):
        norm_violation = norm - offers
        norm_violation = np.max([norm_violation, 0])
        return offers - alpha * norm_violation

    # Norm update (RW)
    norm = norm0 + epsilon * (offer - norm0)

    CV = _FS_sim(offers=offer, alpha=alpha, norm=norm)

    ao = np.max([offer - delta, mn])

    if _FS_sim(offers=ao, alpha=alpha, norm=norm) > 0:
        aFV = eta * _FS_sim(offers=ao, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ao - delta, mn]), alpha=alpha, norm=norm), 0)
    else:
        aFV = eta**2 * np.max(_FS_sim(offers=np.max([ao + delta, mn]), alpha=alpha, norm=norm), 0)

    ro = offer + delta

    if _FS_sim(offers=ro, alpha=alpha, norm=norm) > 0:
        rFV = eta * _FS_sim(offers=ro, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ro - delta, mn]), alpha=alpha, norm=norm), 0)
    else:
        rFV = eta**2 * np.max(_FS_sim(offers=np.max([ro + delta, mn]), alpha=alpha, norm=norm), 0)

    V = CV + (aFV - rFV)

    # Calculate probability of accepting the offer
    prob = 1 / (1 + np.exp(-beta * V))
    prob = 0.0001 + 0.9998 * prob
    prob_a = [1 - prob, prob]

    # Simulate choice
    action = np.random.choice([0, 1], p=prob_a)

    # Output is action, offer, utility, internal norm, and probability
    return action, offer, V, norm, prob

def FD2_prior():
    alpha = RNG.beta(2.0, 2.0)
    beta = RNG.gamma(7.5, 1.0)
    epsilon = RNG.beta(2.0, 2.0)
    delta = RNG.uniform(-2, 2)
    return np.hstack((alpha, beta, epsilon, delta))

PARAM_NAMES = [
    r"$\alpha$",
    r"$\beta$",
    r"$\epsilon$",
    r"$\delta$",
]

prior = bf.simulation.Prior(prior_fun=FD2_prior, param_names=PARAM_NAMES)

MIN_OBS = 30
MAX_OBS = 30
NUM_CONDITIONS = 1


def random_num_obs(min_obs=MIN_OBS, max_obs=MAX_OBS):
    return RNG.integers(low=min_obs, high=max_obs + 1)


def generate_offer_matrix(num_obs):
    offer_mat = np.zeros([num_obs, 2])
    for n in range(num_obs):
        offer_mat[n, 1] = np.random.randint(-2, 1)
        offer_mat[n, 0] = np.random.randint(0, 3)
    return offer_mat


context_gen = bf.simulation.ContextGenerator(
    non_batchable_context_fun=random_num_obs,
    batchable_context_fun=generate_offer_matrix,
    use_non_batchable_for_batchable=True,
)


def UG_experiment(theta, offer_mat, num_obs):
    out = np.zeros((num_obs, 5))
    offers = np.zeros(num_obs)
    norms = np.zeros(num_obs)
    offers[0] = 5   # First offer is always 5
    norms[0] = 10   # Initializing norms at 10
    for n in range(num_obs):
        out[n, :] = FD2_trial(offers[n], norm0=norms[n], alpha0=theta[0], beta0=theta[1], epsilon0=theta[2], delta0=theta[3])

        if n < (num_obs - 1):
            # Only calculate next offer/norm if there are more offers coming
            norms[n + 1] = out[n, 3]
            if out[n, 0] == 1:
                # If accept, next offer goes down by [0, 1, 2] (to min of 1)
                offers[n + 1] = max(offers[n] + offer_mat[n, 1], 1)
            else:
                # If reject, next offer goes up by [0, 1, 2] (to max of 9)
                offers[n + 1] = min(offers[n] + offer_mat[n, 0], 9)

    return out

simulator = bf.simulation.Simulator(simulator_fun=UG_experiment, context_generator=context_gen)

model = bf.simulation.GenerativeModel(prior=prior, simulator=simulator, name="FD2")

summary_net = bf.networks.TimeSeriesTransformer(input_dim=7, summary_dim=32, name="FD2_summary")

inference_net = bf.networks.InvertibleNetwork(
    num_params=len(prior.param_names),
    coupling_settings={"dense_args": dict(kernel_regularizer=None), "dropout": False},
    name="FD2_inference",
)

amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net, name="FD2_amortizer")

prior_means, prior_stds = prior.estimate_means_and_stds(n_draws=100000)
prior_means = np.round(prior_means, decimals=1)
prior_stds = np.round(prior_stds, decimals=1)

def configurator(forward_dict):

    # Prepare placeholder dict
    out_dict = {}

    # Extract simulated choices
    data = forward_dict["sim_data"].astype(np.float32)

    # Context
    context = forward_dict["sim_batchable_context"]

    # Concatenate
    out_dict["summary_conditions"] = np.c_[data, context].astype(np.float32)

    vec_num_obs = forward_dict["sim_non_batchable_context"] * np.ones((data.shape[0], 1))
    out_dict["direct_conditions"] = np.sqrt(vec_num_obs).astype(np.float32)

    # Get data-generating parameters
    params = forward_dict["prior_draws"].astype(np.float32)

    # Standardize parameters
    out_dict["parameters"] = (params - prior_means) / prior_stds

    return out_dict

trainer = bf.trainers.Trainer(
    generative_model=model, amortizer=amortizer, configurator=configurator, checkpoint_path="FD2_model"
)

Hierarchical models

Hi Stefan et al.,

Thank you for your research and for building this fantastic package. Amin Ghaderi and I are currently building and testing joint models of EEG and behavioural data for each experimental trial. This necessitates approximating posteriors from joint likelihoods on the single-trial level, so we are using BayesFlow. We will likely have a preprint ready within the next couple of weeks.

We are thinking about the future of using BayesFlow for joint modeling research (e.g., of neural and behavioural data). We often use hierarchical models, which we have implemented using JAGS, Stan, etc. (https://github.com/mdnunez/pyhddmjags).

Does BayesFlow currently have the ability to fit hierarchical models? And have you thought about this line of research? I suspect it would necessitate some chain of invertible blocks?

A simple hierarchical extension of the existing DDM example would be a model in which there are many experimental conditions c (e.g., 10), but there is also a hierarchical drift rate parameter, namely the mean of the condition drift rates:

Likelihood: choice-RT ~ DDM(drift[c], ...)
Prior: drift[c] ~ Normal(mu_drift, std_drift^2)
Hyperprior: mu_drift ~ Uniform(-7, 7), etc.

This isn't as simple as including an extra normal distribution in the simulator, because we want to estimate marginal posterior distributions for the 10 condition drifts delta[c], as well as marginal posterior distributions for the hierarchical parameters mu_drift and std_drift (a sketch of such a two-level prior is given below). I also believe there is a fundamental difference between the model above and this model:
Likelihood: choice-RT ~ Complicated_likelihood(mu_drift, std_drift^2, ...)
Prior: mu_drift ~ Uniform(-7, 7), etc.
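
To make the two-level structure concrete, the first model's prior could be simulated like this (a sketch; the hyperprior on std_drift is assumed for illustration):

import numpy as np

rng = np.random.default_rng()

def hierarchical_prior(n_conditions=10):
    # Hyperpriors
    mu_drift = rng.uniform(-7.0, 7.0)
    std_drift = rng.uniform(0.0, 2.0)  # assumed for illustration
    # Condition-level priors
    drifts = rng.normal(mu_drift, std_drift, size=n_conditions)
    # Both levels are estimation targets
    return np.concatenate(([mu_drift, std_drift], drifts))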

Kind regards,

Michael

limit on the size of input training data

Hi there,

May I ask if there is any limit on the size of the input training data?

For example, I want to use 50000 simulated datasets with the following shape: (50000, 3000, 45).

when I try to run

%%time
losses = trainer.train_offline(epochs=100, batch_size=64, params=true_params, sim_data=x)

I got error:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

Is that because of my GPU memory limit (20 GB)?

Actually, all (3000, 45) entries are integers only; is it possible to specify an integer array to save RAM?

Amortized point estimators

The paper "Likelihood-free parameter estimation with neural Bayes estimators" (Sainsbury-Dale, Zammit-Mangion, & Huser, 2023) enables neural amortized point estimation, which is generally faster than fully Bayesian neural inference with conditional generative models (current standard in BayesFlow).

Implementing amortized point estimators in BayesFlow would help with fast prototyping and sanity checks (and surely other tasks, too!).

Some BayesFlow pointers/proposals:

  • Implement the point estimation features in a class AmortizedPointEstimator in bayesflow.amortizers
  • Implement appropriate loss functions in bayesflow.losses (depending on the target quantity)
  • Implement a suitable inference network in bayesflow.inference_networks which takes the summary network output (e.g., DeepSet outputs) and regresses to the point estimate
  • Summary networks are already implemented in bayesflow.summary_networks and probably work out-of-the-box
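
A rough sketch of what the proposed amortizer could look like (illustrative names, not an existing BayesFlow API; the loss determines the target functional, e.g., a squared-error loss targets the posterior mean):

import tensorflow as tf

class AmortizedPointEstimator(tf.keras.Model):
    """Regresses summary-network outputs to point estimates."""

    def __init__(self, point_net, summary_net, loss_fn=tf.keras.losses.MeanSquaredError()):
        super().__init__()
        self.point_net = point_net      # e.g., a small MLP head
        self.summary_net = summary_net  # e.g., a DeepSet
        self.loss_fn = loss_fn

    def call(self, data):
        return self.point_net(self.summary_net(data))

    def point_loss(self, data, params):
        return self.loss_fn(params, self(data))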


Prior Decorators

The current design of bayesflow.simulation.Prior strongly resembles the decorator pattern. For small codebases, it may be beneficial to directly allow the usage of Priors as decorators.

Example:

@prior(is_batched=True)
def f(batch_size):
    return np.random.uniform(low=-2.0, high=2.0, size=batch_size)

This can likely be achieved alongside the current class-based usage pattern.
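
A minimal sketch of such a decorator (assuming the current bayesflow.simulation.Prior constructor):

from bayesflow.simulation import Prior

def prior(is_batched=False, **prior_kwargs):
    def wrapper(fun):
        if is_batched:
            return Prior(batch_prior_fun=fun, **prior_kwargs)
        return Prior(prior_fun=fun, **prior_kwargs)
    return wrapper

The decorated function is then replaced by a ready-to-use Prior instance, while the class-based pattern keeps working unchanged.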

how to determine the number of layers

Hi Stefan et al.,

I am curious how you determined the number of layers for the different models.

In the simulation studies, you mentioned that 6-10 ACBs should be enough for a moderate model setup. So instead of trying different numbers of ACBs and stacking them up, do you have any suggestions or a better way?

Thanks!
Best,
Shuwan

Refactor Trainer structure

Currently, there are three separate trainer classes:

  • bayesflow.trainers.ParameterEstimationTrainer,
  • bayesflow.trainers.ModelComparisonTrainer, and
  • bayesflow.trainers.MetaTrainer.

They share the same method names but act on different inputs/outputs as well as different generative models and network architectures. This issue aims at refactoring the trainer classes into a base class bayesflow.trainers.BaseTrainer with subclasses ParameterEstimationTrainer, ModelComparisonTrainer, and MetaTrainer. The introduction of a unified GenerativeModel interface has outsourced the forward inference process from the trainers to the generative model; this facilitates the refactoring, since the core difference between the trainers used to lie in their forward inference implementations. A sketch of the structure is given below.
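
A rough sketch of the proposed hierarchy (illustrative, with the shared loop in the base class):

class BaseTrainer:
    """Shared training loop; subclasses define the loss computation."""

    def __init__(self, network, generative_model):
        self.network = network
        self.generative_model = generative_model

    def train_online(self, epochs, iterations_per_epoch, batch_size):
        for _ in range(epochs):
            for _ in range(iterations_per_epoch):
                forward_dict = self.generative_model(batch_size)
                self._train_step(forward_dict)

    def _train_step(self, forward_dict):
        raise NotImplementedError

class ParameterEstimationTrainer(BaseTrainer):
    def _train_step(self, forward_dict):
        # compute the posterior loss from (params, sim_data) and update weights
        ...

class ModelComparisonTrainer(BaseTrainer):
    def _train_step(self, forward_dict):
        # compute the model-indices loss and update weights
        ...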
