
A Python library for amortized Bayesian workflows using generative neural networks.

Home Page: https://bayesflow.org/

License: MIT License

Topics: bayesian-inference, deep-learning, model-comparison, amortized-inference, simulation-based, invertible-networks, parameter-estimation, likelihood-free, uncertainty-quantification

bayesflow's Introduction

BayesFlow


Welcome to our BayesFlow library for efficient simulation-based Bayesian workflows! Our library enables users to create specialized neural networks for amortized Bayesian inference, which repay users with rapid statistical inference after a potentially longer simulation-based training phase.

For starters, check out some of our walk-through notebooks:

  1. Quickstart amortized posterior estimation
  2. Tackling strange bimodal distributions
  3. Detecting model misspecification in posterior inference
  4. Principled Bayesian workflow for cognitive models
  5. Posterior estimation for ODEs
  6. Posterior estimation for SIR-like models
  7. Model comparison for cognitive models
  8. Hierarchical model comparison for cognitive models

Documentation & Help

The project documentation is available at https://bayesflow.org. Please use the BayesFlow Forums for any BayesFlow-related questions and discussions, and GitHub Issues for bug reports and feature requests.

Installation

See INSTALL.rst for installation instructions.

Conceptual Overview

A cornerstone idea of amortized Bayesian inference is to employ generative neural networks for parameter estimation, model comparison, and model validation when working with intractable simulators whose behavior as a whole is too complex to be described analytically. The figure below presents a higher-level overview of neurally bootstrapped Bayesian inference.

Getting Started: Parameter Estimation

The core functionality of BayesFlow is amortized Bayesian posterior estimation, as described in our paper:

Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L., & Köthe, U. (2020). BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, available for free at: https://arxiv.org/abs/2003.06281.

However, since then, we have substantially extended the BayesFlow library such that it is now much more general and cleaner than what we describe in the above paper.

Minimal Example

import numpy as np
import bayesflow as bf

To introduce you to the basic workflow of the library, let's consider a simple 2D Gaussian model, from which we want to obtain posterior inference. We assume a Gaussian simulator (likelihood) and a Gaussian prior for the means of the two components, which are our only model parameters in this example:

def simulator(theta, n_obs=50, scale=1.0):
    return np.random.default_rng().normal(loc=theta, scale=scale, size=(n_obs, theta.shape[0]))

def prior(D=2, mu=0., sigma=1.0):
    return np.random.default_rng().normal(loc=mu, scale=sigma, size=D)

Then, we connect the prior with the simulator using a GenerativeModel wrapper:

generative_model = bf.simulation.GenerativeModel(prior, simulator, simulator_is_batched=False)

Next, we create our BayesFlow setup consisting of a summary and an inference network:

summary_net = bf.networks.SetTransformer(input_dim=2)
inference_net = bf.networks.InvertibleNetwork(num_params=2)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)

Finally, we connect the networks with the generative model via a Trainer instance:

trainer = bf.trainers.Trainer(amortizer=amortized_posterior, generative_model=generative_model)

We are now ready to train an amortized posterior approximator. For instance, to run online training, we simply call:

losses = trainer.train_online(epochs=10, iterations_per_epoch=1000, batch_size=32)

Prior to inference, we can use simulation-based calibration (SBC, https://arxiv.org/abs/1804.06788) to check the computational faithfulness of the model-amortizer combination on unseen simulations:

# Generate 500 new simulated data sets
new_sims = trainer.configurator(generative_model(500))

# Obtain 100 posterior draws per data set instantly
posterior_draws = amortized_posterior.sample(new_sims, n_samples=100)

# Diagnose calibration
fig = bf.diagnostics.plot_sbc_histograms(posterior_draws, new_sims['parameters'])

The histograms are roughly uniform and lie within the expected range for well-calibrated inference algorithms as indicated by the shaded gray areas. Accordingly, our neural approximator seems to have converged to the intended target.

As you can see, amortized inference on new (real or simulated) data is easy and fast. We can obtain a further 5000 posterior draws per simulated data set and quickly inspect how well the model can recover its parameters across the entire prior predictive distribution.

posterior_draws = amortized_posterior.sample(new_sims, n_samples=5000)
fig = bf.diagnostics.plot_recovery(posterior_draws, new_sims['parameters'])

For any individual data set, we can also compare the parameters' posteriors with their corresponding priors:

fig = bf.diagnostics.plot_posterior_2d(posterior_draws[0], prior=generative_model.prior)

We see clearly how the posterior shrinks relative to the prior for both model parameters as a result of conditioning on the data.

References and Further Reading

  • Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L., & Köthe, U. (2020). BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(4), 1452-1466.

  • Radev, S. T., Graw, F., Chen, S., Mutters, N. T., Eichel, V. M., Bärnighausen, T., & Köthe, U. (2021). OutbreakFlow: Model-based Bayesian inference of disease outbreak dynamics with invertible neural networks and its application to the COVID-19 pandemics in Germany. PLoS Computational Biology, 17(10), e1009472.

  • Bieringer, S., Butter, A., Heimel, T., Höche, S., Köthe, U., Plehn, T., & Radev, S. T. (2021). Measuring QCD splittings with invertible networks. SciPost Physics, 10(6), 126.

  • von Krause, M., Radev, S. T., & Voss, A. (2022). Mental speed is high until age 60 as revealed by analysis of over a million participants. Nature Human Behaviour, 6(5), 700-708.

Model Misspecification

What if we are dealing with misspecified models? That is, how faithful is our amortized inference if the generative model is a poor representation of reality? A modified loss function optimizes the learned summary statistics towards a unit Gaussian and reliably detects model misspecification at inference time.

In order to use this method, you only need to provide the summary_loss_fun argument to the AmortizedPosterior instance:

amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net, summary_loss_fun='MMD')

The amortizer knows how to combine its losses and you can inspect the summary space for outliers during inference.
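For instance, a minimal sketch of such an inspection, reusing the trained summary_net and the configured simulations new_sims from the minimal example (obs_data is a hypothetical placeholder for your configured observed data; dedicated diagnostic utilities may differ between versions):

# Embed simulated and observed data with the trained summary network
sim_embeddings = summary_net(new_sims["summary_conditions"])
obs_embeddings = summary_net(obs_data["summary_conditions"])

# Under a well-specified model, both embedding clouds should resemble a unit Gaussian;
# observed embeddings lying far away from the simulated ones hint at misspecification.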

References and Further Reading

  • Schmitt, M., Bürkner P. C., Köthe U., & Radev S. T. (2022). Detecting Model Misspecification in Amortized Bayesian Inference with Neural Networks. ArXiv preprint, available for free at: https://arxiv.org/abs/2112.08866

Model Comparison

BayesFlow can not only be used for parameter estimation, but also to perform approximate Bayesian model comparison via posterior model probabilities or Bayes factors. Let's extend the minimal example from before with a second model $M_2$ that we want to compare with our original model $M_1$:

def simulator(theta, n_obs=50, scale=1.0):
    return np.random.default_rng().normal(loc=theta, scale=scale, size=(n_obs, theta.shape[0]))

def prior_m1(D=2, mu=0., sigma=1.0):
    return np.random.default_rng().normal(loc=mu, scale=sigma, size=D)

def prior_m2(D=2, mu=2., sigma=1.0):
    return np.random.default_rng().normal(loc=mu, scale=sigma, size=D)

For the purpose of this illustration, the two toy models only differ with respect to their prior specification ($M_1: \mu = 0, M_2: \mu = 2$). We create both models as before and use a MultiGenerativeModel wrapper to combine them in a meta_model:

model_m1 = bf.simulation.GenerativeModel(prior_m1, simulator, simulator_is_batched=False)
model_m2 = bf.simulation.GenerativeModel(prior_m2, simulator, simulator_is_batched=False)
meta_model = bf.simulation.MultiGenerativeModel([model_m1, model_m2])

Next, we construct our neural network with a PMPNetwork for approximating posterior model probabilities:

summary_net = bf.networks.SetTransformer(input_dim=2)
probability_net = bf.networks.PMPNetwork(num_models=2)
amortized_bmc = bf.amortizers.AmortizedModelComparison(probability_net, summary_net)

We combine all previous steps with a Trainer instance and train the neural approximator:

trainer = bf.trainers.Trainer(amortizer=amortized_bmc, generative_model=meta_model)
losses = trainer.train_online(epochs=3, iterations_per_epoch=100, batch_size=32)

Let's simulate data sets from our models to check our networks' performance:

sims = trainer.configurator(meta_model(5000))

When feeding the data to our trained network, we almost immediately obtain posterior model probabilities for each of the 5000 data sets:

model_probs = amortized_bmc.posterior_probs(sims)

How good are these predicted probabilities in the closed world? We can have a look at the calibration:

cal_curves = bf.diagnostics.plot_calibration_curves(sims["model_indices"], model_probs)

Our approximator shows excellent calibration, with the calibration curve being closely aligned to the diagonal, an expected calibration error (ECE) near 0 and most predicted probabilities being certain of the model underlying a data set. We can further assess patterns of misclassification with a confusion matrix:

conf_matrix = bf.diagnostics.plot_confusion_matrix(sims["model_indices"], model_probs)

For the vast majority of simulated data sets, the "true" data-generating model is correctly identified. With these diagnostic results backing us up, we can proceed and apply our trained network to empirical data.

BayesFlow is also able to conduct model comparison for hierarchical models. See this tutorial notebook for an introduction to the associated workflow.

References and Further Reading

  • Radev S. T., D’Alessandro M., Mertens U. K., Voss A., Köthe U., & Bürkner P. C. (2021). Amortized Bayesian Model Comparison with Evidential Deep Learning. IEEE Transactions on Neural Networks and Learning Systems. doi:10.1109/TNNLS.2021.3124052, available for free at: https://arxiv.org/abs/2004.10629

  • Schmitt, M., Radev, S. T., & Bürkner, P. C. (2022). Meta-Uncertainty in Bayesian Model Comparison. In International Conference on Artificial Intelligence and Statistics, 11-29, PMLR, available for free at: https://arxiv.org/abs/2210.07278

  • Elsemüller, L., Schnuerch, M., Bürkner, P. C., & Radev, S. T. (2023). A Deep Learning Method for Comparing Bayesian Hierarchical Models. ArXiv preprint, available for free at: https://arxiv.org/abs/2301.11873

Likelihood Emulation

In order to learn the exchangeable (i.e., permutation invariant) likelihood from the minimal example instead of the posterior, you may use the AmortizedLikelihood wrapper:

likelihood_net = bf.networks.InvertibleNetwork(num_params=2)
amortized_likelihood = bf.amortizers.AmortizedLikelihood(likelihood_net)

This wrapper can interact with a Trainer instance in the same way as the AmortizedPosterior. Finally, you can also learn the likelihood and the posterior simultaneously by using the AmortizedPosteriorLikelihood wrapper and choosing your preferred training scheme:

joint_amortizer = bf.amortizers.AmortizedPosteriorLikelihood(amortized_posterior, amortized_likelihood)
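For example, a minimal training sketch reusing the generative model from the minimal example above; the joint amortizer plugs into the Trainer exactly like the posterior-only amortizer:

trainer = bf.trainers.Trainer(amortizer=joint_amortizer, generative_model=generative_model)
losses = trainer.train_online(epochs=10, iterations_per_epoch=1000, batch_size=32)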

Learning both densities enables us to approximate marginal likelihoods or perform approximate leave-one-out cross-validation (LOO-CV) for prior or posterior predictive model comparison, respectively.

References and Further Reading

Radev, S. T., Schmitt, M., Pratz, V., Picchini, U., Köthe, U., & Bürkner, P.-C. (2023). JANA: Jointly amortized neural approximation of complex Bayesian models. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 216, 1695-1706. (arXiv)(PMLR)

Support

This project is currently managed by researchers from Rensselaer Polytechnic Institute, TU Dortmund University, and Heidelberg University. It is partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Project 528702768). The project is further supported by Germany's Excellence Strategy -- EXC-2075 - 390740016 (Stuttgart Cluster of Excellence SimTech) and EXC-2181 - 390900948 (Heidelberg Cluster of Excellence STRUCTURES), as well as the Informatics for Life initiative funded by the Klaus Tschira Foundation.

Citing BayesFlow

You can cite BayesFlow along the lines of:

  • We approximated the posterior with neural posterior estimation and learned summary statistics (NPE; Radev et al., 2020), as implemented in the BayesFlow software for amortized Bayesian workflows (Radev et al., 2023a).
  • We approximated the likelihood with neural likelihood estimation (NLE; Papamakarios et al., 2019) without hand-crafted summary statistics, as implemented in the BayesFlow software for amortized Bayesian workflows (Radev et al., 2023b).
  • We performed simultaneous posterior and likelihood estimation with jointly amortized neural approximation (JANA; Radev et al., 2023a), as implemented in the BayesFlow software for amortized Bayesian workflows (Radev et al., 2023b).
  1. Radev, S. T., Schmitt, M., Schumacher, L., Elsemüller, L., Pratz, V., Schälte, Y., Köthe, U., & Bürkner, P.-C. (2023a). BayesFlow: Amortized Bayesian workflows with neural networks. The Journal of Open Source Software, 8(89), 5702.(arXiv)(JOSS)
  2. Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L., Köthe, U. (2020). BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(4), 1452-1466. (arXiv)(IEEE TNNLS)
  3. Radev, S. T., Schmitt, M., Pratz, V., Picchini, U., Köthe, U., & Bürkner, P.-C. (2023b). JANA: Jointly amortized neural approximation of complex Bayesian models. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 216, 1695-1706. (arXiv)(PMLR)

BibTeX:

@article{bayesflow_2023_software,
  title = {{BayesFlow}: Amortized {B}ayesian workflows with neural networks},
  author = {Radev, Stefan T. and Schmitt, Marvin and Schumacher, Lukas and Elsemüller, Lasse and Pratz, Valentin and Schälte, Yannik and Köthe, Ullrich and Bürkner, Paul-Christian},
  journal = {Journal of Open Source Software},
  volume = {8},
  number = {89},
  pages = {5702},
  year = {2023}
}

@article{bayesflow_2020_original,
  title = {{BayesFlow}: Learning complex stochastic models with invertible neural networks},
  author = {Radev, Stefan T. and Mertens, Ulf K. and Voss, Andreas and Ardizzone, Lynton and K{\"o}the, Ullrich},
  journal = {IEEE transactions on neural networks and learning systems},
  volume = {33},
  number = {4},
  pages = {1452--1466},
  year = {2020}
}

@inproceedings{bayesflow_2023_jana,
  title = {{JANA}: Jointly amortized neural approximation of complex {B}ayesian models},
  author = {Radev, Stefan T. and Schmitt, Marvin and Pratz, Valentin and Picchini, Umberto and K\"othe, Ullrich and B\"urkner, Paul-Christian},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = {1695--1706},
  year = {2023},
  volume = {216},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR}
}

bayesflow's People

Contributors

arrjon, daniel-habermann, dependabot[bot], elseml, han-ol, iosonomarco, kucharssim, larskue, levolz, luschumacher, marvinschmitt, paul-buerkner, rene-bucchia, rusty-electron, stefanradev93, thegialeo, vpratz, yannikschaelte, zimea


bayesflow's Issues

offline training with varying sample size

In the online training tutorial, I saw that we could pass a prior_N function to allow for a varying sample size. I would like to ask how we can do the same for offline training. What should the input of sim_data be in the trainer.train_offline function? As others have suggested, this could be very helpful, since we could then use other software (e.g., C++) to simulate the data.

Thank you very much!

Have you come across this situation?

Hi Stefan et al.,

I came across a situation where, no matter how many ACBs I stack up (I have tried up to 13 ACBs), the transformed latent variables Z do not perfectly follow N(0,1), as in the examples shown in the paper/your example code.

Have you come across this situation? Do you have any suggestions? (If the transformed latent variables are not trained to be perfectly N(0,1), then the posterior draws from the inference phase wouldn't be valid, right?)

Thank you so much!
Best,
Shuwan

posterior mean outside the domain of parameter

Hi Stefan,

I am testing some models. The prior of one parameter is Uniform(0.1, 0.9), but for a real data set the posterior mean of that parameter is -0.6 and the posterior sd is 0.03. May I ask how that is possible, and is there any way to solve it? E.g., since this is a probability parameter, should I use a logistic transformation?

Thank you.

Best
Tim

handling of missing data

I would like to ask what we should do about missing data in BayesFlow.

E.g., I have a dataset X with N rows and 5 columns, the rows are i.i.d., and

X[i, j] ~ Binomial(1, params[j]), hence I have five parameters, j = 0, 1, 2, 3, 4.

Say that, for some reason, X[i, j] is missing whenever i %% 5 = j.

What should I put in X[i, j] in order to obtain the correct posterior?

I tried putting in a value like -1, but that does not give correct results.

I tried the version without missing data, and the output posterior is reasonable.

Thank you.

Refactor Trainer structure

Currently, there are three separate trainer classes:

  • bayesflow.trainers.ParameterEstimationTrainer,
  • bayesflow.trainers.ModelComparisonTrainer, and
  • bayesflow.trainers.MetaTrainer.

They share the same method names but act on different inputs/outputs as well as different generative models and network architectures. This issue aims at refactoring the Trainer classes into a base class bayesflow.trainers.BaseTrainer with subclasses ParameterEstimationTrainer, ModelComparisonTrainer, and MetaTrainer (see the sketch below). The introduction of a unified GenerativeModel interface has moved the forward inference process out of the trainers and into the generative model. This facilitates refactoring the Trainer classes, since the core difference between them used to lie in their forward inference implementations.
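A rough sketch of the proposed hierarchy (beyond the class names already mentioned, all method names and loop details below are illustrative only):

from abc import ABC, abstractmethod

class BaseTrainer(ABC):
    """Owns the generic training loop; forward inference is delegated to the generative model."""

    def __init__(self, network, generative_model):
        self.network = network
        self.generative_model = generative_model

    def train_online(self, epochs, iterations_per_epoch, batch_size):
        losses = []
        for _ in range(epochs):
            for _ in range(iterations_per_epoch):
                batch = self.generative_model(batch_size)  # unified GenerativeModel interface
                losses.append(self._train_step(batch))
        return losses

    @abstractmethod
    def _train_step(self, batch):
        """Trainer-specific loss computation and gradient update."""


class ParameterEstimationTrainer(BaseTrainer):
    def _train_step(self, batch):
        ...  # posterior loss over (parameters, data)


class ModelComparisonTrainer(BaseTrainer):
    def _train_step(self, batch):
        ...  # classification loss over (model indices, data)


class MetaTrainer(BaseTrainer):
    def _train_step(self, batch):
        ...  # combined objective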

Parallelize Test Workflows

We can parallelize the test workflow in 2 ways:

  1. Over multiple environments (i.e., Python versions). This can be done either via parallelization directly in the GitHub workflow, or we can tell tox to parallelize environments.
  2. Over multiple tests with pytest-xdist using the flag -n auto.

Drawbacks:

  • Parallelizing the tests changes the console output for the test workflow, since intermediate results cannot be shown until all tests have completed.

Blow-up of the loss during training

Hi Stefan,

I am currently experiencing that, for certain models (especially when the size of the dataset is slightly larger, e.g. n_obs = 100), the loss sometimes blows up extremely (say, about once every 40 epochs), so that the network has not converged after 100 epochs. Below is an example of such a blow-up (I simply printed the loss after each iteration):

..., -7.752036, -7.820761, -7.7039595, -7.6541286, -7.866893, -7.6999016, -7.7101583, -7.876931, -7.926651, 718.84845, 3543009.5, -1.1094699, 55.888943, 97.26874, 86.27302, 88.00319, 72.44775, 55.576897, 55.679665, 42.166874, 38.97898, 26.977058, 21.449322, 13.875741, 9.592149, 4.081973, 0.62280387, 0.8802163, -0.20799232, -0.26961386, -0.02687946, -1.470809, -0.33887288, -0.95399594, -1.1490471, -1.5857422, -1.2776537, -1.7386447, -1.5166757, -2.1432261, -2.240626, -2.8441887, -2.161209, -2.8351824, -2.9727767, -2.9907823, -3.5804815, -3.6286428, -3.659181, -3.566587, -3.8588657, -3.8712082, -3.9153986, -4.028116, -3.8262885, ...

Do I possibly need to change something in the optimizer setting or is it the number/size of affine coupling blocks that needs to be chosen more carefully? And is the default optimizer setting in the current BayesFlow package the same as the ones used for the BayesFlow paper?

Thanks a lot in advance!

Best regards
Zijian

Question about handling missing data

Hi, I'm wondering whether it is of any interest to include the ability for sequential networks or invariant networks to handle missing data. Some of our subjects have only partial data, but it would be a huge waste to just remove them. From the documentation, it's not straightforward to handle missing data. If this is something the maintainers feel would be useful, I am happy to help with implementing it.

Hierarchical models

Hi Stefan et al.,

Thank you for your research and building this fantastic package. Amin Ghaderi and I are currently building and testing joint models of EEG and behavioural data for each experimental trial. This necessitates approximating posteriors from joint likelihoods on the single-trial level, so we are using BayesFlow. We will likely have a preprint ready within the next couple of weeks.

We are thinking about the future of using BayesFlow for joint modeling (e.g. neural and behavioural data) research. We often use hierarchical models, which we have often implemented using JAGS and Stan etc (https://github.com/mdnunez/pyhddmjags).

Does BayesFlow currently have the ability to fit hierarchical models? And have you thought about this line of research? I suspect it would necessitate some chain of invertible blocks?

A simple hierarchical model extension of the existing DDM example would be a model in which there are many experimental conditions c (e.g. 10). But there is also a hierarchical drift rate parameter, the mean of the condition drift rates:

Likelihood: choice-RT ~ DDM(drift[c], ...)
Prior: drift[c] ~ Normal(mu_drift, std_drift^2)
Hyperprior: mu_drift ~ Uniform(-7, 7), etc.

This isn't as simple as including an extra normal distribution in the simulator because we want to estimate marginal posterior distributions for the 10 condition drifts: delta[c], as well as marginal posterior distributions for the hierarchical parameters mu_drift and std_drift. I also believe there is a fundamental difference between the model above and this model:
Likelihood: choice-RT ~ Complicated_likelihood(mu_drift, std_drift^2, ...)
Prior: mu_drift ~ Uniform(-7, 7), etc.

Kind regards,

Michael

JOSS review: citations missing

When a non-amortized inference procedure does not create a computational bottleneck, approximate Bayesian computation (ABC) might be an appropriate tool. This is the case if a single data set needs to be analyzed, if an infrastructure for parallel computing is readily available, or if repeated re-fits of a model (e.g., cross-validation) are not desired.

I believe the paper should also cite ABCpy here: https://github.com/eth-cscs/abcpy, with paper here

Related to openjournals/joss-reviews#5702

Softmax parameter cannot be recovered

Hi,
Thanks for providing this amazing tool! I have tried to fit a cognitive model with Softmax as the choice rule at the end, that is, P(option i) = exp(beta * Ui) / [sum(exp(beta * Uj)) for all j]. However, I cannot recover the parameter beta.
I tried several minimal examples with two options and found that exp(x1)/(exp(x1)+exp(x2)) seems to be treated similarly to x1/(x1+x2); so when the parameter beta appears as in the latter case, it cannot be recovered.
Do you know what causes this and how to solve it? Thanks in advance!

Experience Replay for all Trainers

Implement trainer.train_experience_replay for all trainers in BaseTrainer.

  • Adapt bayesflow.buffer.MemoryReplayBuffer to variable argument length

Prior Decorators

The current design of bayesflow.simulation.Prior strongly resembles the decorator pattern. For small codebases, it may be beneficial to directly allow the usage of Priors as decorators.

Example:

@prior(is_batched=True)
def f(batch_size):
    return np.random.uniform(low=-2.0, high=2.0, size=batch_size)

This can likely be achieved alongside the current class-based usage pattern.
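A minimal sketch of such a decorator factory, assuming the current Prior constructor accepts prior_fun / batch_prior_fun keyword arguments (as shown in the vmap issue further down):

import bayesflow as bf

def prior(is_batched=False, **prior_kwargs):
    """Turns a plain sampling function into a bayesflow.simulation.Prior."""
    def wrapper(fun):
        if is_batched:
            return bf.simulation.Prior(batch_prior_fun=fun, **prior_kwargs)
        return bf.simulation.Prior(prior_fun=fun, **prior_kwargs)
    return wrapper

The decorated function f from the example above would then simply be a ready-to-use Prior instance.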

Regarding your JOSS submission

Link to the JOSS review issue:
openjournals/joss-reviews#5702
This issue is regarding your submission of the software paper BayesFlow to JOSS. I was trying to run the tests you mentioned in the tests folder using pytest. I am encountering an error because of the pyproject.toml file (this line in particular: --cov BayesFlow). Can you please look into it and respond?

Model loaded from Checkpoint differ in Master and Development branch

Trained models saved in the master branch and then loaded within the development branch do not give the same results.

I tested it using the Intro_Amortized_Posterior_Estimation-notebook in a clean install of the Master branch and saved the model by providing the checkpoint_path argument for the Trainer. Then I loaded the model within a clean install of the Development branch where I only skipped the online training cell.

All the diagnostics plots in the notebook changed significantly. Most prominently the plot_recovery as can be seen in the attached images.

[Attached image: recovery plot, Master branch]

[Attached image: recovery plot, Development branch]

However, retraining the model on the Development branch gives the correct results, as in the Master branch. Hence, it seems like the branches have different defaults somewhere, which would be interesting to know.

Trainer without generative model

Feature: Option to create a Trainer instance without a generative model for offline learning.

Use Case: Simulate the data outside of Python (e.g. C++/Matlab) but use the BayesFlow library in offline learning mode.
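A minimal sketch of the proposed usage (configure_input and externally_simulated_dict are hypothetical placeholders for your own configurator and a dictionary of pre-simulated parameters and data):

trainer = bf.trainers.Trainer(amortizer=amortized_posterior, configurator=configure_input)
losses = trainer.train_offline(simulations_dict=externally_simulated_dict, epochs=50, batch_size=32)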

clarification for relationship between the sample size in the training step and inference

Thanks for the very interesting approach to simulation-based inference.

I have been testing this and would just like to ask the following:

Is it a requirement that the sample size in training is the same as that of the real data?

e.g. in the example for parameter estimation, if I use the following:

losses = trainer.simulate_and_train_offline(n_sim=10000, epochs=75, batch_size=64, n_obs=10000)

i.e., the dataset has 10000 observations,

then if I simulate a dataset of size 10000 and obtain the posterior, I get reasonable values.

But when I use the following to simulate a dataset of size 1000, I get the following true values and posterior:

true_params = prior(1)
print(true_params)
x = batch_simulator(true_params,1000)
param_samples = amortizer.sample(x, 10000)
print(param_samples.mean(0))
print(param_samples.std(0))

[[1.8459227 4.879591 0.42127568 0.44341898 2.0850098 ]] : true value
[ 1.9427828 14.766885 12.579591 -0.948793 1.0707246] : posterior mean
[ 0.9751236 18.545668 8.544211 0.651906 1.5197684] : posterior sd

Thank you.

Data / Parameter Transformations

Add functionality for providing data/parameter transformations, such as scaling, normalization, noise injection, etc. to the trainers.
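Until such functionality exists, simple transformations can already be expressed in a custom configurator; a minimal sketch of parameter standardization (prior_means and prior_stds are assumed to have been estimated beforehand, e.g. via prior.estimate_means_and_stds()):

def configurator(forward_dict):
    out_dict = {}
    # Standardize parameters before they reach the inference network
    params = forward_dict["prior_draws"].astype(np.float32)
    out_dict["parameters"] = (params - prior_means) / prior_stds
    # Pass the raw simulations through to the summary network
    out_dict["summary_conditions"] = forward_dict["sim_data"].astype(np.float32)
    return out_dict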

Use Training Strategy Pattern in Trainer

The current implementation of bayesflow.trainers.Trainer implements several training strategies in one large class. This can be problematic for both maintainers and users, since

  1. Implementing a new strategy from outside bayesflow requires inheriting from the Trainer class
  2. Modifying small parts of a strategy outside of its input parameters can be difficult or impossible
  3. Maintaining a mental overview of the Trainer functionality is impeded.

We can remedy these issues by implementing the trainer strategies explicitly as their own concept, and following composition over inheritance in the Trainer class.
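A rough sketch of what composition over inheritance could look like here (all names are illustrative):

from typing import Protocol

class TrainingStrategy(Protocol):
    def run(self, trainer, **kwargs):
        """Runs one complete training regimen and returns the loss history."""

class OnlineStrategy:
    def run(self, trainer, epochs, iterations_per_epoch, batch_size):
        ...  # simulate fresh batches from trainer.generative_model each iteration

class OfflineStrategy:
    def run(self, trainer, simulations_dict, epochs, batch_size):
        ...  # iterate over pre-simulated data

class Trainer:
    def __init__(self, amortizer, generative_model=None):
        self.amortizer = amortizer
        self.generative_model = generative_model

    def train(self, strategy, **kwargs):
        # The strategy is injected (composition) rather than baked in via inheritance
        return strategy.run(self, **kwargs)

New strategies could then be added from outside bayesflow without subclassing or modifying the Trainer itself.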

Coin Sampling Optimizer

Coin sampling (Sharrock & Nemeth, 2023, [1]) is a learning-rate-free optimization scheme that performs on par with perfectly LR-tuned SVGD up to O(log N).

According to @louissharrock, coin sampling is currently implemented in JAX. If there were a TensorFlow implementation, we could use it directly within BayesFlow, provided it meets the tf.keras.optimizers.Optimizer API.

Within the BayesFlow pipeline, the optimizer is passed when we initiate the training loop. Here's a pointer for online training via train_online:

optimizer : tf.keras.optimizer.Optimizer or None
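So any optimizer conforming to that API should drop in directly; a usage sketch (the Adam settings are arbitrary, and a TensorFlow coin-sampling optimizer is hypothetical at this point):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-4)  # a coin-sampling optimizer would slot in the same way
losses = trainer.train_online(epochs=10, iterations_per_epoch=1000, batch_size=32, optimizer=optimizer)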

[1] https://proceedings.mlr.press/v202/sharrock23a

how to determine the number of layers

Hi Stefan et al.,

I am curious how you determined the number of layers for the different models.

In the simulation studies, you mentioned that 6-10 ACBs should be enough for a moderate model setup. So instead of trying different numbers of ACBs and stacking them up, do you have any suggestions or a better way?

Thanks!
Best,
Shuwan

Free-form flows for SBI

The paper 'Free-form Flows: Make Any Architecture a Normalizing Flow' by Draxler et al. (2023) introduces a loss that allows any architecture to be used as an inference network for SBI. A new amortizer could implement the loss function and be tested on benchmark tasks.

Link to paper: https://arxiv.org/abs/2310.16624

Cannot do offline training with summary network

Hello!

I am using BayesFlow for the first time, and having some issues doing offline training with a summary network (everything works fine if I exclude it and just supply the data to the inference network). Basically, I am simulating from an agent-based model in R, and then bringing the data (13 parameters and 1921 output features) into Python so that it has the format required by BayesFlow.

{'prior_non_batchable_context': None, 'prior_batchable_context': None, 'prior_draws': array([[ 0.76718193, -0.87377334,  0.22366658, ..., -0.39844744,
        -0.35801412,  0.04897532],
       [ 1.30669447,  0.66729612,  1.7323655 , ..., -1.21459231,
         0.74958571, -0.25844436],
       [ 0.90680171,  1.29136708,  0.70601619, ..., -1.28499242,
         0.98797805,  0.6543645 ],
       ...,
       [-1.57379467, -0.29633931, -0.97821418, ..., -2.09261737,
        -0.91399776,  1.30333936],
       [ 0.45297908, -0.91439699,  0.79108664, ..., -0.16306107,
        -0.50990459, -0.60457136],
       [-0.80346153,  0.05948135, -1.28584127, ...,  0.532937  ,
         0.54166668, -0.12752289]]), 'sim_non_batchable_context': None, 'sim_batchable_context': None, 'sim_data': array([[ 0.44540442,  0.40116571, -0.54365648, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [-0.84753774, -1.34450615, -1.15397759, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [ 0.44540442,  0.40116571,  1.01192892, ..., -0.01023475,
        -0.01002833, -0.01001542],
       ...,
       [-0.19984821, -0.77635165, -1.15397759, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [-0.60475894, -0.00569413, -1.00236414, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [ 0.44540442,  0.40116571,  1.01192892, ..., -0.01023475,
        -0.01002833, -0.01001542]])}

Here is my configure input script:

#define function that configures the input for the amortizer
def configure_input(forward_dict):
    #prepare placeholder dict
    out_dict = {}

    #add to keys
    out_dict["parameters"] = forward_dict["prior_draws"].astype(np.float32)
    out_dict["summary_conditions"] = forward_dict["sim_data"].astype(np.float32)
    out_dict["direct_conditions"] = None

    #return the output dictionary
    return out_dict

Then, if I run the following code:

summary_net = bf.networks.DeepSet()
inference_net = bf.networks.InvertibleNetwork(num_params = 13)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
trainer = bf.trainers.Trainer(amortizer = amortized_posterior, configurator = configure_input, memory = True)
offline_training = trainer.train_offline(simulations_dict = data, epochs = 2, batch_size = 32)

I get the following error:

ValueError: Exception encountered when calling layer 'sequential_549' (type Sequential).
                    
                    Input 0 of layer "dense_1530" is incompatible with the layer: expected min_ndim=2, found ndim=1. Full shape received: (64,)
                    
                    Call arguments received by layer 'sequential_549' (type Sequential):
                      • inputs=tf.Tensor(shape=(64,), dtype=float32)
                      • training=True
                      • mask=None

Do you know what the issue might be?

Possible axis mislabelling in bayesflow.diagnostics.plot_calibration_curves

Hello all,

Thank you all for the marvelous work on Bayesflow.
I have been trying to understand the calibration curve outputted by bayesflow.diagnostics.plot_calibration_curves. It seems to me that the calibration curve should be a function of predicted probability instead of the true probability. As defined in the expected_calibration_error function,

    # Loop for each model and compute calibration errs per bin
    for k in range(n_models):
        y_true = (m_true.argmax(axis=1) == k).astype(np.float32)
        y_prob = m_pred[:, k]
        prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=num_bins)

        # Compute ECE by weighting bin errors by bin size
        bins = np.linspace(0.0, 1.0, num_bins + 1)
        binids = np.searchsorted(bins[1:-1], y_prob)
        bin_total = np.bincount(binids, minlength=len(bins))
        nonzero = bin_total != 0
        cal_err = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))

        cal_errs.append(cal_err)
        probs.append((prob_true, prob_pred))
    return cal_errs, probs

Then, the true probability array should be the first element in the output list instead of the second element. Going back to the bayesflow.diagnostics.plot_calibration_curves function,

    # Determine n_subplots dynamically
    n_row = int(np.ceil(num_models / 6))
    n_col = int(np.ceil(num_models / n_row))
    cal_errs, cal_probs = expected_calibration_error(true_models, pred_models, num_bins)

    # Initialize figure
    if fig_size is None:
        fig_size = (int(5 * n_col), int(5 * n_row))
    fig, axarr = plt.subplots(n_row, n_col, figsize=fig_size)
    if n_row > 1:
        ax = axarr.flat

    # Plot marginal calibration curves in a loop
    if n_row > 1:
        ax = axarr.flat
    else:
        ax = axarr
    for j in range(num_models):
        # Plot calibration curve
        ax[j].plot(cal_probs[j][0], cal_probs[j][1], color=color)

If I did not misunderstand, the true probability is plotted on the x-axis while the predicted probability is plotted on the y-axis. However, the axis labels are,

        ax[j].set_xlabel("Predicted probability", fontsize=label_fontsize)
        ax[j].set_ylabel("True probability", fontsize=label_fontsize)

Did I miss anything here? Once again, thank you for the splendid work on Bayesflow!

x ~ scale * N(theta, 1) or x ~ N(theta, scale)? 🤔

Hello 👋 ,
I was trying to understand the benchmark datasets used to evaluate BayesFlow's performance. One thing that I noted for a couple of datasets is the scaling factor in front of the distribution for sampling the observations. One example is linked below, which is the T.2 task from the sbib paper.
Generating points for the original task would look like this:
theta ~ scale * N(0,1) = N(0, scale * 1)
x ~ N(theta, scale * 1)
with scale being the same for both distributions i.e., scale=0.1

Now your implementation is:
theta ~ scale * N(0,1) (same as above)
x ~ scale * N(theta, 1) = N(scale * theta, scale * 1)
which is equivalent to having a different prior compared to the T.2 task for the same data distribution, i.e.:
theta ~ scale^2 * N(0, 1) = N(0, scale^2)
x ~ N(theta, scale * 1)

Maybe I'm also missing something, but I would be glad for clarification. I also wanted to ask what code you used for the multivariate dataset from part 3.3 of your paper?

x = scale * rng.normal(loc=theta, size=(n_obs, theta.shape[0], theta.shape[1]))

Implementation of an IdentityNetwork in case no summary network is used

In case the datasets have small sizes, it seems reasonable not to use any summary network. However, if we set summary_net = None, some problems concerning dimensions arise: raw data are stored in an array of shape (batch_size, n_obs, D = dimension of each observation), while summary networks have outputs of shape (batch_size, summary_stats_dim). Thus, it would be helpful to have an IdentityNetwork which simply concatenates a dataset of multiple observations into one single vector. The outputs of the IdentityNetwork would then represent the case in which we use no summary network, but match the required input dimension of the inference network.

My own attempt to implement such an IdentityNetwork resulted in the following error:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

from numba import njit
import tensorflow as tf

from bayesflow.networks import InvertibleNetwork, InvariantNetwork
from bayesflow.amortizers import SingleModelAmortizer
from bayesflow.trainers import ParameterEstimationTrainer
from bayesflow.diagnostics import *
from bayesflow.models import GenerativeModel

def prior(batch_size):
    """
    Samples from the prior 'batch_size' times.
    ----------
    
    Arguments:
    batch_size : int -- the number of samples to draw from the prior
    ----------
    
    Output:
    theta : np.ndarray of shape (batch_size, theta_dim) -- the samples batch of parameters
    """
    
    # Prior ranges for the simulator 
    # v_c ~ U(-7.0, 7.0)
    # a_c ~ U(0.1, 4.0)
    # t0 ~ U(0.1, 3.0)
    p_samples = np.random.uniform(low=(0.1, 0.1, 0.1, 0.1, 0.1), 
                                  high=(7.0, 7.0, 4.0, 4.0, 3.0), size=(batch_size, 5))
    return p_samples.astype(np.float32)


@njit
def diffusion_trial(v, a, ndt, zr, dt, max_steps):
    """Simulates a trial from the diffusion model."""

    n_steps = 0.
    x = a * zr

    # Simulate a single DM path
    while (x > 0 and x < a and n_steps < max_steps):

        # DDM equation
        x += v*dt + np.sqrt(dt) * np.random.normal()

        # Increment step
        n_steps += 1.0

    rt = n_steps * dt
    return rt + ndt if x > 0. else -rt - ndt

@njit
def diffusion_condition(n_trials, v, a, ndt, zr=0.5, dt=0.005, max_steps=1e4):
    """Simulates a diffusion process over an entire condition."""
    
    x = np.empty(n_trials)
    for i in range(n_trials):
        x[i] = diffusion_trial(v, a, ndt, zr, dt, max_steps)
    return x


@njit
def diffusion_2_conds(params, n_trials, dt=0.005, max_steps=1e4):
    """
    Simulates a diffusion process for 2 conditions with 5 parameters (v1, v2, a1, a2, ndt).
    """
    
    n_trials_c1 = n_trials[0]
    n_trials_c2 = n_trials[1]
    
    v1, v2, a1, a2, ndt = params
    rt_c1 = diffusion_condition(n_trials_c1, v1, a1, ndt,  dt=dt, max_steps=max_steps)
    rt_c2 = diffusion_condition(n_trials_c2, v2, a2, ndt, dt=dt, max_steps=max_steps)
    rts = np.concatenate((rt_c1, rt_c2))
    return rts


def batch_simulator(prior_samples, n_obs, dt=0.005, s=1.0, max_iter=1e4):
    """
    Simulate multiple diffusion_model_datasets.
    """
    
    n_sim = prior_samples.shape[0]
    sim_data = np.empty((n_sim, n_obs), dtype=np.float32)
    
    n1 = n_obs//2
    n2 = n_obs - n1
    
    # Simulate diffusion data
    for i in range(n_sim):
        sim_data[i] = diffusion_2_conds(prior_samples[i], (n1, n2))
        
    # Create condition labels
    cond_arr = np.stack(n_sim * [np.concatenate((np.zeros(n1), np.ones(n2)))] )
    sim_data = np.stack((sim_data, cond_arr), axis=-1)
    
    return sim_data

class IdentityNetwork(tf.keras.Model):
    """Implements a flattened identity mapping of inputs to outputs."""

    def call(self, x: tf.Tensor):
        """Performs a flattened identity mapping.
        
        Parameters
        ----------
        x : tf.Tensor
            Input of shape (batch_size, N, x_dim)
        
        Returns
        -------
        out : tf.Tensor
            Output of shape (batch_size, out_dim)
        """
        return tf.reshape(x, [x.shape[0], -1])
    
#summary_net = InvariantNetwork()
summary_net = IdentityNetwork()
inference_net = InvertibleNetwork({'n_params': 5})
amortizer = SingleModelAmortizer(inference_net, summary_net)

generative_model = GenerativeModel(prior, batch_simulator)

trainer = ParameterEstimationTrainer(
    network=amortizer, 
    generative_model=generative_model,
)

# Pre-simulated data (could be loaded from somewhere else)
n_sim = 5000
n_obs = 100
true_params, x = generative_model(n_sim, n_obs)

losses = trainer.train_offline(epochs=1, batch_size=64, params=true_params, sim_data=x)

ValueError: Exception encountered when calling layer "sequential" (type Sequential).

Input 0 of layer "dense" is incompatible with the layer: expected axis -1 of input shape to have value 103, but received input with shape (64, 203)

Call arguments received:
  • inputs=tf.Tensor(shape=(64, 203), dtype=float32)
  • training=None
  • mask=None

Allow kwargs in trainer.train_offline

Currently, the signature for trainer.train_offline is train_offline(self, epochs, batch_size, *args). This means that keyword calls will fail. All of the following calls should work:

Calls without keywords

train_offline(10, 32, theta, x)
train_offline(10, 32, m_oh, x)
train_offline(10, 32, m_oh, theta, x)

Calls with keywords

train_offline(epochs=10, batch_size=32, params=theta, sim_data=x)
train_offline(epochs=10, batch_size=32, model_indices=m_oh, sim_data=x)
train_offline(epochs=10, batch_size=32, model_indices=m_oh, params=theta, sim_data=x)

Calls with keywords, arbitrary order within **kwargs

train_offline(epochs=10, batch_size=32, sim_data=x, params=theta)
train_offline(epochs=10, batch_size=32, sim_data=x, model_indices=m_oh)
train_offline(epochs=10, batch_size=32, params=theta, sim_data=x, model_indices=m_oh)
train_offline(epochs=10, batch_size=32, sim_data=x, model_indices=m_oh, params=theta)

No mixed calls need to be supported.
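One possible signature sketch that would accept both the positional and the keyword forms above (purely illustrative, not the actual implementation; _train_offline stands in for the existing internal logic):

def train_offline(self, epochs, batch_size, *args, **kwargs):
    # Keyword form: pick out the named data arrays if provided
    model_indices = kwargs.get("model_indices")
    params = kwargs.get("params")
    sim_data = kwargs.get("sim_data")

    # Positional form: fall back to the old *args convention
    if sim_data is None and args:
        *rest, sim_data = args
        if len(rest) == 2:
            model_indices, params = rest
        elif len(rest) == 1:
            # A single leading array is either model indices or parameters,
            # depending on the trainer type; dispatch is trainer-specific
            (params,) = rest

    return self._train_offline(epochs, batch_size, model_indices, params, sim_data)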

Automatic Batching using `vmap`

Classes such as bayesflow.simulation.ContextGenerator or bayesflow.simulation.Prior allow the user to pass in a batched or an unbatched generation function, where typically, passing both is disallowed.

For quality of life (and possibly compatibility with #123) it would make sense to change this to a simple boolean flag, and automatically parallelize unbatched functions for the user with a variation of either JAX's or PyTorch's vmap function.

Example pseudo-code:

Replace this

class Prior:
    def __init__(self, batch_prior_fun=None, prior_fun=None):
        if (batch_prior_fun is None) == (prior_fun is None):
            raise ValueError
        
        self.prior = prior_fun
        self.batched_prior = batch_prior_fun

with

class Prior:
    def __init__(self, prior_fun, is_batched=True):
        self.prior = prior_fun if is_batched else vmap(prior_fun)

Since this is a breaking change, we would need a soft introduction of this feature including backward compatibility and deprecation messages.

Possible concerns:

  • vmap is backend-dependent, and not always compatible with all operations that could happen in prior_fun (e.g. data-dependent control flow), so allowing the user to disable this would be beneficial
  • sequential generation of batches is inherently incompatible with vmap, since vmap makes the stronger assumption that each batch item is independently processed

Contribution guide

I suggest adding a contribution guide (see e.g. this doc) to make it easier to contribute to the tool. In particular, this should cover:

  • From which branch to start (Development?)
  • The standard workflow (make issue, fork, make sure tests pass + add new ones, create pull request)

should I continue to try with more ACBs?

Hi Stefan et al.,

I am trying a model with 4 ACBs, and after training, the loss seems to not be going down any more. But in validation, the posterior samples don't get close to the true parameters.

So I wonder if I should try with more ACBs or anything? Do you have some suggestions?

Thank you so much!
Best,
Shuwan

limit of training data for input.

Hi there,

May I ask if there is any limit on the input training data?

For example, I want to use 50000 simulated data sets with the following shape: (50000, 3000, 45).

When I try to run

%%time
losses = trainer.train_offline(epochs=100, batch_size=64, params=true_params, sim_data=x)

I get the error:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

Is that because of my GPU memory limit (20 GB)?

Also, all (3000, 45) entries are integers only; is it possible to specify an integer array to save RAM?

ConfigurationError using TimeSeriesTransformer

Hi -

I'm looking to use BayesFlow for a model with a nested RL structure. In a previous post (#70 (comment)), I saw that you recommended using TimeSeriesTransformer instead of InvariantNetwork to ensure the model has memory. I also know that I want to pass along relevant variables related to context and other model calculations. Therefore, each simulated trial outputs not only the predicted choices but also other variables like utilities (see the function FD2_trial). However, I'm having issues structuring my model so that it has the correct dimensions.

When I try to configure the trainer, I get the following error:

Could not carry out computations of generative_model ->configurator -> amortizer -> loss! Error trace:
Exception encountered when calling layer 'multi_head_attention_45' (type MultiHeadAttention).

In this tf.Variable creation, the initial value's shape ((3, 4, 32)) is not compatible with the explicitly supplied shape argument ((7, 4, 32)).

I tried to change the input_dim of the TimeSeriesTransformer, but I keep getting the same error. I've also tried various fixes within the configurator without success. Could you help me determine why this error is occurring and how I can fix it?

Thanks,
Blair

Here is the full set-up:

def FD2_trial(offer, norm0, alpha0, beta0, epsilon0, delta0, eta=0.8):

    # Minimum offer amount
    mn = 1

    alpha = alpha0
    beta = beta0
    epsilon = epsilon0
    delta = delta0

    @staticmethod
    def _FS_sim(offers=None, alpha=None, norm=None):

        norm_violation = norm - offers
        norm_violation = np.max([norm_violation, 0])

        return offers - alpha * norm_violation

    # Norm update (RW)
    norm = norm0 + epsilon * (offer - norm0)

    CV = _FS_sim(offers=offer, alpha=alpha, norm=norm)

    ao = np.max([offer - delta, mn])

    if _FS_sim(offers=ao, alpha=alpha, norm=norm) > 0:
        aFV = eta * _FS_sim(offers=ao, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ao - delta, mn]), alpha=alpha, norm=norm), 0)
    else:
        aFV = eta**2 * np.max(_FS_sim(offers=np.max([ao + delta, mn]), alpha=alpha, norm=norm), 0)

    ro = offer + delta

    if _FS_sim(offers=ro, alpha=alpha, norm=norm) > 0:
        rFV = eta * _FS_sim(offers=ro, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ro - delta, mn]), alpha=alpha, norm=norm), 0)
    else:
        rFV = eta**2 * np.max(_FS_sim(offers=np.max([ro + delta, mn]), alpha=alpha, norm=norm), 0)

    V = CV + (aFV - rFV)

    # calculate probability of accepting offer:
    prob = 1 / (1 + np.exp(-beta * V))
    prob = 0.0001 + 0.9998 * prob
    prob_a = [1 - prob, prob]

    # Simulate choices
    action = np.random.choice([0, 1], p=prob_a)

    # Output is action, offer, utilities, internal norms, and probabilities
    return action, offer, V, norm, prob

def FD2_prior():

    alpha = RNG.beta(2.0, 2.0)
    beta = RNG.gamma(7.5, 1.0)
    epsilon = RNG.beta(2.0, 2.0)
    delta = RNG.uniform(-2, 2)

    return np.hstack((alpha, beta, epsilon, delta))

PARAM_NAMES = [
    r"$alpha$",
    r"$beta$",
    r"$epsilon$",
    r"$delta$",
]

prior = bf.simulation.Prior(prior_fun=FD2_prior, param_names=PARAM_NAMES)

MIN_OBS = 30
MAX_OBS = 30
NUM_CONDITIONS = 1


def random_num_obs(min_obs=MIN_OBS, max_obs=MAX_OBS):
    return RNG.integers(low=min_obs, high=max_obs + 1)


def generate_offer_matrix(num_obs):

    offer_mat = np.zeros([num_obs, 2])

    for n in range(num_obs):
        offer_mat[n, 1] = np.random.randint(-2, 1)
        offer_mat[n, 0] = np.random.randint(0, 3)

    return offer_mat


context_gen = bf.simulation.ContextGenerator(
    non_batchable_context_fun=random_num_obs,
    batchable_context_fun=generate_offer_matrix,
    use_non_batchable_for_batchable=True,
)


def UG_experiment(theta, offer_mat, num_obs):

    out = np.zeros((num_obs, 5))
    offers = np.zeros(num_obs)
    norms = np.zeros(num_obs)
    offers[0] = 5  # First offer is always 5
    norms[0] = 10  # Initializing norms at 10
    for n in range(num_obs):
        out[n, :] = FD2_trial(offers[n], norm0=norms[n], alpha0=theta[0], beta0=theta[1], epsilon0=theta[2], delta0=theta[3])

        if n < (num_obs - 1):
            # Only calculate next offer/norm if there are more offers coming
            norms[n+1] = out[n, 3]
            if out[n, 0] == 1:
                # If accept, next offer goes down by [0, 1, 2] (to min of 1)
                offers[n+1] = max(offers[n] + offer_mat[n, 1], 1)
            else:
                # If reject, next offer goes up by [0, 1, 2] (to max of 9)
                offers[n+1] = min(offers[n] + offer_mat[n, 0], 9)

    return out

simulator = bf.simulation.Simulator(simulator_fun=UG_experiment, context_generator=context_gen)

model = bf.simulation.GenerativeModel(prior=prior, simulator=simulator, name="FD2")

summary_net = bf.networks.TimeSeriesTransformer(input_dim=7, summary_dim=32, name="FD2_summary")

inference_net = bf.networks.InvertibleNetwork(
    num_params=len(prior.param_names),
    coupling_settings={"dense_args": dict(kernel_regularizer=None), "dropout": False},
    name="FD2_inference",
)

amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net, name="FD2_amortizer")

prior_means, prior_stds = prior.estimate_means_and_stds(n_draws=100000)
prior_means = np.round(prior_means, decimals=1)
prior_stds = np.round(prior_stds, decimals=1)

def configurator(forward_dict):

    # Prepare placeholder dict
    out_dict = {}

    # Extract simulated choices
    data = forward_dict["sim_data"].astype(np.float32)

    # Context
    context = forward_dict['sim_batchable_context']

    # Concatenate
    out_dict["summary_conditions"] = np.c_[data, context].astype(np.float32)

    vec_num_obs = forward_dict["sim_non_batchable_context"] * np.ones((data.shape[0], 1))
    out_dict["direct_conditions"] = np.sqrt(vec_num_obs).astype(np.float32)

    # Get data generating parameters
    params = forward_dict["prior_draws"].astype(np.float32)

    # Standardize parameters
    out_dict["parameters"] = (params - prior_means) / prior_stds

    return out_dict

trainer = bf.trainers.Trainer(
    generative_model=model, amortizer=amortizer, configurator=configurator, checkpoint_path="FD2_model"
)

error when importing bayesflow

Hi Stefan et al.,

I am very impressed by the BayesFlow work. However, when I try to use it myself, I get an error when importing bayesflow.

I have tried with PyCharm and Anaconda; they give me the same error message:

ValueError: Arg specs do not match: original=FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), new=FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), fn=<function ones_like_v2 at 0x000002D9E1F41A80>

Do you know what happened and how to solve this?

Thank you so much!
Best regards,
Shuwan

FAQ in README

Commonly asked questions should be added in their own section in the README.

Summary network for time series data

Hi Stefan,

I would like to use a summary network to handle time series data coming from an ODE model. It is a toy model with one or two observed state(s), n_obs=31 is fixed for each dataset, and the described dynamics are very simple. From your experience, do you maybe have a proposal for a summary network that I should try out? To me, it is more important that the network is sufficient to capture all the relevant information than having the optimal output size. Thanks a lot!

Best regards
Zijian

`test_time_series_transformer` occasionally fails

This check is not reliable enough:

# Test non-permutation invariant
assert not np.allclose(out, out_perm, atol=1e-5)

Turning down atol could resolve this, but I'm open to more constructive fixes. Could also be worth investigating if this is not a deeper problem with the TimeSeriesTransformer.

Amortized point estimators

The paper "Likelihood-free parameter estimation with neural Bayes estimators" (Sainsbury-Dale, Zammit-Mangion, & Huser, 2023) enables neural amortized point estimation, which is generally faster than fully Bayesian neural inference with conditional generative models (current standard in BayesFlow).

Implementing amortized point estimators in BayesFlow would help with fast prototyping and sanity checks (and surely other tasks, too!).

Some BayesFlow pointers/proposals:

  • Implement the point estimation features in a class AmortizedPointEstimator in bayesflow.amortizers (a rough sketch follows after this list)
  • Implement appropriate loss functions in bayesflow.losses (depending on the target quantity)
  • Implement a suitable inference network in bayesflow.inference_networks which takes the summary network output (e.g., DeepSet outputs) and regresses to the point estimate
  • Summary networks are already implemented in bayesflow.summary_networks and probably work out-of-the-box
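A very rough sketch of what such an amortizer could look like (all names and the choice of an MSE loss are illustrative, not a proposal for the final API):

import tensorflow as tf

class AmortizedPointEstimator(tf.keras.Model):
    """Maps summary-network outputs to point estimates of the parameters."""

    def __init__(self, point_net, summary_net):
        super().__init__()
        self.point_net = point_net      # e.g., a small MLP regressing to num_params outputs
        self.summary_net = summary_net  # e.g., a DeepSet, reused as-is

    def call(self, input_dict):
        summaries = self.summary_net(input_dict["summary_conditions"])
        return self.point_net(summaries)

    def compute_loss(self, input_dict):
        estimates = self(input_dict)
        # MSE targets the posterior mean; other losses (e.g., quantile loss) target other functionals
        return tf.reduce_mean(tf.square(estimates - input_dict["parameters"]))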

Some links:

Guidelines on number of output and hidden units in LSTM

Scenario: Using an LSTM as summary network for uni-/multivariate time series data, with a flexible number of time series steps.

Can you give recommendations on the number of output units (summary_dim in https://github.com/stefanradev93/BayesFlow/blob/master/bayesflow/summary_networks.py#L123, default=10) and dimension of the LSTM hidden space (lstm_units, default=128)?

I would guess that summary_dim should be >= number of parameters to make sure enough information can be preserved. lstm_units has here the task of summarizing the entire sequence prior to passing through the final dense network (arguably a slightly different task than in e.g. machine translation tasks), therefore it should be fairly large too?

An empty checkpoint folder without memory.pkl crashes training

Hey everyone,

Great work on the new version of the package. The notebook examples are quite clear and easy to adapt.

In case you didn't know about this error, if you create an empty checkpoint folder ahead of running bayesflow.trainers.Trainer() this seems to crash the function. You can let Trainer() create the folder for you, but this may be a bit annoying for some analysis pipelines. For example, this will crash for me when using the master branch:

os.makedirs('checkpoint_model1')
trainer = bf.trainers.Trainer(
    amortizer=amortizer, 
    generative_model=generative_model, 
    configurator=configurator,
    checkpoint_path='checkpoint_model1')

Cheers,
