stefanradev93 / bayesflow Goto Github PK
View Code? Open in Web Editor NEWA Python library for amortized Bayesian workflows using generative neural networks.
Home Page: https://bayesflow.org/
License: MIT License
A Python library for amortized Bayesian workflows using generative neural networks.
Home Page: https://bayesflow.org/
License: MIT License
Hi Stefan,
do you have any recommendations for how to best store the trained networks when using the parameter estimation workflow notebook?
Best regards
Zijian
I would like to ask what should we do for missing data in BayesFlow
E.g. I have a dataset X with N row and 5 columns, and they are i.i.d., and
X [ I , j ] ~ binomial (1, params[ j ] ) , hence I got five parameters , j = 0, 1, 2, 3, 4
say if for some reasons, X[ I , j ] is missing when I %% 5 = j
what should I put in X [ I , j ] in order to get correct posterior?
I tried putting value like -1 will not get correct results
I tried the version without missing and the output posterior is reasonable.
Thank you.
Currently, the following classes for meta are defined directly in Meta_Workflow.ipynb
and should to be integrated into bayesflow.networks
:
InvariantCouplingNet
ConditionalCouplingLayer
with InvariantCouplingNet unitsInvariantBayesFlow
Hey everyone,
Great work on the new version of the package. The notebook examples are quite clear and easy to adapt.
In case you didn't know about this error, if you create an empty checkpoint folder ahead of running bayesflow.trainers.Trainer()
this seems to crash the function. You can let Trainer()
create the folder for you, but this may be a bit annoying for some analysis pipelines. For example, this will crash for me when using the master
branch:
os.makedirs('checkpoint_model1')
trainer = bf.trainers.Trainer(
amortizer=amortizer,
generative_model=generative_model,
configurator=configurator,
checkpoint_path='checkpoint_model1')
Cheers,
Commonly asked questions should be added in their own section in the README.
Add units tests for attention.py
module and extend tests for summary_networks.py
module.
Link to the joss review issue :
openjournals/joss-reviews#5702
This issue is regarding your submission of the software paper BayesFlow to JOSS. I was trying to run the tests you mentioned in the tests folder using pytest. I am encountering the error because of the pyproject.toml file(this line in particular --cov BayesFlow). Can you please look into it once and respond.
Implement trainer.train_experience_replay
for all trainers in BaseTrainer
.
bayesflow.buffer.MemoryReplayBuffer
to variable argument lengthHi Stefan et al.,
I came across a situation that no matter how many ACBs I have stacked up (I have tried up to 13 ACBs), the transformed latent variables Zs are not perfectly following N(0,1) as the examples shown in paper/your example code.
Has this situation come across to you? Do you have any suggestions (if the transformed latent variables are not perfectly trained to be N(0,1), then the posterior draws from the inference phrase wouldn't be valid, right?)
Thank you so much!
Best,
Shuwan
In case the datasets have small sizes, it seems reasonable not to use any summary network. However, if we set summary_net = None, some problems concerning dimensions arise: Raw data are stored in an array of shape (batch_size, n_obs, D = dimension of each observation), while summary networks have outputs of shape (batch_size, summary_stats_dim). Thus, it would be helpful to have a IdentityNetwork which simply concatenates a dataset of multiple observations to one single vector. The outputs of the IdentityNetwork would then represent the case in which we use no summary network, but match the required input dimension of the inference network.
An own try to implement such an IdentityNetwork resulted in the following error:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from numba import njit
import tensorflow as tf
from bayesflow.networks import InvertibleNetwork, InvariantNetwork
from bayesflow.amortizers import SingleModelAmortizer
from bayesflow.trainers import ParameterEstimationTrainer
from bayesflow.diagnostics import *
from bayesflow.models import GenerativeModel
def prior(batch_size):
"""
Samples from the prior 'batch_size' times.
----------
Arguments:
batch_size : int -- the number of samples to draw from the prior
----------
Output:
theta : np.ndarray of shape (batch_size, theta_dim) -- the samples batch of parameters
"""
# Prior ranges for the simulator
# v_c ~ U(-7.0, 7.0)
# a_c ~ U(0.1, 4.0)
# t0 ~ U(0.1, 3.0)
p_samples = np.random.uniform(low=(0.1, 0.1, 0.1, 0.1, 0.1),
high=(7.0, 7.0, 4.0, 4.0, 3.0), size=(batch_size, 5))
return p_samples.astype(np.float32)
@njit
def diffusion_trial(v, a, ndt, zr, dt, max_steps):
"""Simulates a trial from the diffusion model."""
n_steps = 0.
x = a * zr
# Simulate a single DM path
while (x > 0 and x < a and n_steps < max_steps):
# DDM equation
x += v*dt + np.sqrt(dt) * np.random.normal()
# Increment step
n_steps += 1.0
rt = n_steps * dt
return rt + ndt if x > 0. else -rt - ndt
@njit
def diffusion_condition(n_trials, v, a, ndt, zr=0.5, dt=0.005, max_steps=1e4):
"""Simulates a diffusion process over an entire condition."""
x = np.empty(n_trials)
for i in range(n_trials):
x[i] = diffusion_trial(v, a, ndt, zr, dt, max_steps)
return x
@njit
def diffusion_2_conds(params, n_trials, dt=0.005, max_steps=1e4):
"""
Simulates a diffusion process for 2 conditions with 5 parameters (v1, v2, a1, a2, ndt).
"""
n_trials_c1 = n_trials[0]
n_trials_c2 = n_trials[1]
v1, v2, a1, a2, ndt = params
rt_c1 = diffusion_condition(n_trials_c1, v1, a1, ndt, dt=dt, max_steps=max_steps)
rt_c2 = diffusion_condition(n_trials_c2, v2, a2, ndt, dt=dt, max_steps=max_steps)
rts = np.concatenate((rt_c1, rt_c2))
return rts
def batch_simulator(prior_samples, n_obs, dt=0.005, s=1.0, max_iter=1e4):
"""
Simulate multiple diffusion_model_datasets.
"""
n_sim = prior_samples.shape[0]
sim_data = np.empty((n_sim, n_obs), dtype=np.float32)
n1 = n_obs//2
n2 = n_obs - n1
# Simulate diffusion data
for i in range(n_sim):
sim_data[i] = diffusion_2_conds(prior_samples[i], (n1, n2))
# Create condition labels
cond_arr = np.stack(n_sim * [np.concatenate((np.zeros(n1), np.ones(n2)))] )
sim_data = np.stack((sim_data, cond_arr), axis=-1)
return sim_data
class IdentityNetwork(tf.keras.Model):
"""Implements a flattened identity mapping of inputs to outputs."""
def call(self, x: tf.Tensor):
"""Performs a flattened identity mapping.
Parameters
----------
x : tf.Tensor
Input of shape (batch_size, N, x_dim)
Returns
-------
out : tf.Tensor
Output of shape (batch_size, out_dim)
"""
return tf.reshape(x, [x.shape[0], -1])
#summary_net = InvariantNetwork()
summary_net = IdentityNetwork()
inference_net = InvertibleNetwork({'n_params': 5})
amortizer = SingleModelAmortizer(inference_net, summary_net)
generative_model = GenerativeModel(prior, batch_simulator)
trainer = ParameterEstimationTrainer(
network=amortizer,
generative_model=generative_model,
)
# Pre-simulated data (could be loaded from somewhere else)
n_sim = 5000
n_obs = 100
true_params, x = generative_model(n_sim, n_obs)
losses = trainer.train_offline(epochs=1, batch_size=64, params=true_params, sim_data=x)
ValueError: Exception encountered when calling layer "sequential" (type Sequential).
Input 0 of layer "dense" is incompatible with the layer: expected axis -1 of input shape to have value 103, but received input with shape (64, 203)
Call arguments received:
• inputs=tf.Tensor(shape=(64, 203), dtype=float32)
• training=None
• mask=None
Add Project 528702768 and move below Citations
Scenario: Using an LSTM as summary network for uni-/multivariate time series data, with a flexible number of time series steps.
Can you give recommendations on the number of output units (summary_dim
in https://github.com/stefanradev93/BayesFlow/blob/master/bayesflow/summary_networks.py#L123, default=10) and dimension of the LSTM hidden space (lstm_units
, default=128)?
I would guess that summary_dim
should be >= number of parameters to make sure enough information can be preserved. lstm_units
has here the task of summarizing the entire sequence prior to passing through the final dense network (arguably a slightly different task than in e.g. machine translation tasks), therefore it should be fairly large too?
https://github.com/stefanradev93/BayesFlow/blob/749c0a6a04ed157eb3505796a4571e73405f5dc6/bayesflow/simulation.py#LL745C1-L745C53 - this should be self.prior_is_batched = self.prior.is_batched
, if I am not mistaken
Hi,
Thanks for providing this amazing tool! I have tried to fit a cognitive model with Softmax as the choice rule in the end. That is, P(option i) = exp(beta * Ui) / [sum(exp(beta * Uj)) for all j]; However, I cannot recover parameter beta;
I tried several minimal example with two options; and find that it seems exp(x1)/(exp(x1)+exp(x2)) is considered similarly as x1/(x1+x2); so when you have parameter beta in the latter case, it won't be able to be recovered;
Do you know what causes this and how to solve it? Thanks in advance!
Hello all,
Thank you all for the marvelous work on Bayesflow.
I have been trying to understand the calibration curve outputted by bayesflow.diagnostics.plot_calibration_curves. It seems to me that the calibration curve should be a function of predicted probability instead of the true probability. As defined in the expected_calibration_error function,
# Loop for each model and compute calibration errs per bin
for k in range(n_models):
y_true = (m_true.argmax(axis=1) == k).astype(np.float32)
y_prob = m_pred[:, k]
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=num_bins)
# Compute ECE by weighting bin errors by bin size
bins = np.linspace(0.0, 1.0, num_bins + 1)
binids = np.searchsorted(bins[1:-1], y_prob)
bin_total = np.bincount(binids, minlength=len(bins))
nonzero = bin_total != 0
cal_err = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))
cal_errs.append(cal_err)
probs.append((prob_true, prob_pred))
return cal_errs, probs
Then, the true probability array should be the first element in the output list instead of the second element. Going back to the bayesflow.diagnostics.plot_calibration_curves function,
# Determine n_subplots dynamically
n_row = int(np.ceil(num_models / 6))
n_col = int(np.ceil(num_models / n_row))
cal_errs, cal_probs = expected_calibration_error(true_models, pred_models, num_bins)
# Initialize figure
if fig_size is None:
fig_size = (int(5 * n_col), int(5 * n_row))
fig, axarr = plt.subplots(n_row, n_col, figsize=fig_size)
if n_row > 1:
ax = axarr.flat
# Plot marginal calibration curves in a loop
if n_row > 1:
ax = axarr.flat
else:
ax = axarr
for j in range(num_models):
# Plot calibration curve
ax[j].plot(cal_probs[j][0], cal_probs[j][1], color=color)
If I did not misunderstand, the true probability is plotted in the x-axis while the predicted probability is plotted in the y-axis. However, the axis labels are,
ax[j].set_xlabel("Predicted probability", fontsize=label_fontsize)
ax[j].set_ylabel("True probability", fontsize=label_fontsize)
Did I miss anything here? Once again, thank you for the splendid work on Bayesflow!
The current implementation of bayesflow.trainers.Trainer
implements several training strategies in one large class. This can be problematic for both maintainers and users, since
bayesflow
requires inheriting from the Trainer
classWe can remedy these issues by implementing the trainer strategies explicitly as their own concept, and following composition over inheritance in the Trainer class.
We can parallelize the test workflow in 2 ways:
tox
to parallelize environments.pytest-xdist
using the flag -n auto
.Drawbacks:
Feature: Option to create a Trainer
instance without a generative model for offline learning.
Use Case: Simulate the data outside of Python (e.g. C++/Matlab) but use the BayesFlow library in offline learning mode.
Hello!
I am using BayesFlow for the first time, and having some issues doing offline training with a summary network (everything works fine if I exclude it and just supply the data to the inference network). Basically, I am simulating from an agent-based model in R, and then bringing the data (13 parameters and 1921 output features) into Python so that it has the format required by BayesFlow.
{'prior_non_batchable_context': None, 'prior_batchable_context': None, 'prior_draws': array([[ 0.76718193, -0.87377334, 0.22366658, ..., -0.39844744,
-0.35801412, 0.04897532],
[ 1.30669447, 0.66729612, 1.7323655 , ..., -1.21459231,
0.74958571, -0.25844436],
[ 0.90680171, 1.29136708, 0.70601619, ..., -1.28499242,
0.98797805, 0.6543645 ],
...,
[-1.57379467, -0.29633931, -0.97821418, ..., -2.09261737,
-0.91399776, 1.30333936],
[ 0.45297908, -0.91439699, 0.79108664, ..., -0.16306107,
-0.50990459, -0.60457136],
[-0.80346153, 0.05948135, -1.28584127, ..., 0.532937 ,
0.54166668, -0.12752289]]), 'sim_non_batchable_context': None, 'sim_batchable_context': None, 'sim_data': array([[ 0.44540442, 0.40116571, -0.54365648, ..., -0.01023475,
-0.01002833, -0.01001542],
[-0.84753774, -1.34450615, -1.15397759, ..., -0.01023475,
-0.01002833, -0.01001542],
[ 0.44540442, 0.40116571, 1.01192892, ..., -0.01023475,
-0.01002833, -0.01001542],
...,
[-0.19984821, -0.77635165, -1.15397759, ..., -0.01023475,
-0.01002833, -0.01001542],
[-0.60475894, -0.00569413, -1.00236414, ..., -0.01023475,
-0.01002833, -0.01001542],
[ 0.44540442, 0.40116571, 1.01192892, ..., -0.01023475,
-0.01002833, -0.01001542]])}
Here is my configure input
script:
#define function that configures the input for the amortizer
def configure_input(forward_dict):
#prepare placeholder dict
out_dict = {}
#add to keys
out_dict["parameters"] = forward_dict["prior_draws"].astype(np.float32)
out_dict["summary_conditions"] = forward_dict["sim_data"].astype(np.float32)
out_dict["direct_conditions"] = None
#return the output dictionary
return out_dict
Then, if I run the following code:
summary_net = bf.networks.DeepSet()
inference_net = bf.networks.InvertibleNetwork(num_params = 13)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
trainer = bf.trainers.Trainer(amortizer = amortized_posterior, configurator = configure_input, memory = True)
offline_training = trainer.train_offline(simulations_dict = data, epochs = 2, batch_size = 32)
I get the following error:
ValueError: Exception encountered when calling layer 'sequential_549' (type Sequential).
Input 0 of layer "dense_1530" is incompatible with the layer: expected min_ndim=2, found ndim=1. Full shape received: (64,)
Call arguments received by layer 'sequential_549' (type Sequential):
• inputs=tf.Tensor(shape=(64,), dtype=float32)
• training=True
• mask=None
Do you know what the issue might be?
This would allow users to use the more flexible conda dependency solver over pip, which is helpful for large projects. See here for instructions.
Hi Stefan,
I am currently experiencing that for certain models (especially when the size of the dataset is slightly larger, e.g. n_obs = 100), the loss sometimes blows up extremely (say, about once every 40 epochs) so that the network is not converged after 100 epochs. Below is an example of such a blow-up (I simply printed the loss after each iteration):
..., -7.752036, -7.820761, -7.7039595, -7.6541286, -7.866893, -7.6999016, -7.7101583, -7.876931, -7.926651, 718.84845, 3543009.5, -1.1094699, 55.888943, 97.26874, 86.27302, 88.00319, 72.44775, 55.576897, 55.679665, 42.166874, 38.97898, 26.977058, 21.449322, 13.875741, 9.592149, 4.081973, 0.62280387, 0.8802163, -0.20799232, -0.26961386, -0.02687946, -1.470809, -0.33887288, -0.95399594, -1.1490471, -1.5857422, -1.2776537, -1.7386447, -1.5166757, -2.1432261, -2.240626, -2.8441887, -2.161209, -2.8351824, -2.9727767, -2.9907823, -3.5804815, -3.6286428, -3.659181, -3.566587, -3.8588657, -3.8712082, -3.9153986, -4.028116, -3.8262885, ...
Do I possibly need to change something in the optimizer setting or is it the number/size of affine coupling blocks that needs to be chosen more carefully? And is the default optimizer setting in the current BayesFlow package the same as the ones used for the BayesFlow paper?
Thanks a lot in advance!
Best regards
Zijian
Hi Stefan et al.,
I am so impressed by the bayesflow work. However, when I try to implement it myself, I got an error when importing the bayesflow.
I have tried with Pycharm/Anaconda, they gave me the same error message:
ValueError: Arg specs do not match: original=FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), new=FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), fn=<function ones_like_v2 at 0x000002D9E1F41A80>
Do you know what happened and how to solve this?
Thank you so much!
Best regards,
Shuwan
Hi Stefan,
I am testing some models, however, I found that the prior of a parameter is Uniform (0.1, 0.9). but for a real data set, for one parameter the posterior mean is -0.6 and posterior sd is 0.03, may I ask how is that possible and any way we can solve it? e.g. this is a probability parameter so I use logistic transformation?
Thank you.
Best
Tim
Hi, I'm wondering whether it is of any interest to include the ability for sequential networks or invariant networks to handle missing data. Some of our subjects have only partial data, but it would be a huge waste to just remove them. From the documentation, it's not straightforward to handle missing data. If this is something the maintainers feel like would be useful I am down to help with implementing this.
Currently, the signature for trainer.train_offline
is train_offline(self, epochs, batch_size, *args)
. This means that keyword calls will fail. All of the following calls should work:
Calls without keywords
train_offline(10, 32, theta, x)
train_offline(10, 32, m_oh, x)
train_offline(10, 32, m_oh, theta, x)
Calls with keywords
train_offline(epochs=10, batch_size=32, params=theta, sim_data=x)
train_offline(epochs=10, batch_size=32, model_indices=m_oh, sim_data=x)
train_offline(epochs=10, batch_size=32, model_indices=m_oh, params=theta, sim_data=x)
Calls with keywords, arbitrary order within **kwargs
train_offline(epochs=10, batch_size=32, sim_data=x, params=theta)
train_offline(epochs=10, batch_size=32, sim_data=x, model_indices=m_oh)
train_offline(epochs=10, batch_size=32, params=theta, sim_data=x, model_indices=m_oh)
train_offline(epochs=10, batch_size=32, sim_data=x, model_indices=m_oh, params=theta)
No mixed calls need to be supported.
I suggest adding a contribution guide (see e.g. this doc) to make it easier to contribute to the tool. In particular, this should cover:
This check is not reliable enough:
# Test non-permutation invariant
assert not np.allclose(out, out_perm, atol=1e-5)
Turning down atol
could resolve this, but I'm open to more constructive fixes. Could also be worth investigating if this is not a deeper problem with the TimeSeriesTransformer
.
The paper 'Free-form Flows: Make Any Architecture a Normalizing Flow' by Draxler et al. 2023 introduces a loss that allows to use any architecture as inference network for sbi. A new amortizer could implement the loss function and be tested on benchmark tasks.
Link to paper: https://arxiv.org/abs/2310.16624
Hi Stefan et al.,
I am trying a model with 4 ACBs, and after training, the loss seemed to not moving down any more. But in validation, the posterior samples don't get close to the truth parameters.
So I wonder if I should try with more ACBs or anything? Do you have some suggestions?
Thank you so much!
Best,
Shuwan
Coin sampling (Sharrock & Nemeth, 2023, [1]) is a learning-rate-free optimization scheme that performs on par with perfectly LR-tuned SVGD up to O(log N).
According to @louissharrock, coin sampling is currently implemented in JAX. If there was a TensorFlow implementation, we could directly use it within BayesFlow if it meets the tf.keras.optimizer.Optimizer
API.
Within the BayesFlow pipeline, the optimizer is passed when we initiate the training loop. Here's a pointer for online training via train_online
:
BayesFlow/bayesflow/trainers.py
Line 371 in f5f7a6a
Hi Stefan,
I would like to use a summary network to handle time series data coming from an ODE model. It is a toy model with one or two observed state(s), n_obs=31 fixed for each dataset and the described dynamics is very simple. From your experience, do you maybe have a proposal for a summary network that I should try out? To me, it is more important that the network is sufficient to capture all the relevant information than having the optimal output size. Thanks a lot!
Best regards
Zijian
In tutorial -- online training, I saw that we could put prior_N function to allow for varying sample size. I would like to ask how we can do the same for offline training? what should be the input of sim_data in the trainer.train_offine function? As suggested by others, this could be very helpful since we can use other software to simulate data like C++
Thank you very much!
Add functionality for providing data/parameter transformations, such as scaling, normalization, noise injection, etc. to the trainers.
Hello 👋 ,
I was trying to understand the benchmark datasets used to evaluate BayesFlow's performance. One thing that I noted for a couple of datasets is the scaling factor in front of the distribution for sampling the observations. One example is linked below, which is the T.2 task from the sbib paper.
Generating points for the original task would look like this:
theta ~ scale * N(0,1) = N(0, scale * 1)
x ~ N(theta, scale * 1)
with scale being the same for both distributions i.e., scale=0.1
Now your implementation is:
theta ~ scale * N(0,1) (same as above)
x ~ scale * N(theta, 1) = N(scale * theta, scale * 1)
which is equivalent to having a different prior compared to the T.2 task for the same data distribution, i.e.:
theta ~ scale^2 * N(0, 1) = N(0, scale^2)
x ~ N(theta, scale * 1)
maybe im also missing something but i would be glad for clarification. I also wanted to ask what code you used for the multivariate dataset from your paper part 3.3?
Easily findable instructions on how to install the tool would be good (I could not find any ...)
Line 83 in 45389ad
I believe the paper should also cite ABCpy here: https://github.com/eth-cscs/abcpy, with paper here
Related to openjournals/joss-reviews#5702
Trained models saved in the master branch and then loaded within the development branch do not give the same results.
I tested it using the Intro_Amortized_Posterior_Estimation-notebook in a clean install of the Master branch and saved the model by providing the checkpoint_path
argument for the Trainer
. Then I loaded the model within a clean install of the Development branch where I only skipped the online training cell.
All the diagnostics plots in the notebook changed significantly. Most prominently the plot_recovery
as can be seen in the attached images.
However, retrain the model on the Development branch gives the correct results like in the Master branch. Hence, it seems like the branches have different defaults somewhere, which would be interesting to know.
Classes such as bayesflow.simulation.ContextGenerator
or bayesflow.simulation.Prior
allow the user to pass in a batched or an unbatched generation function, where typically, passing both is disallowed.
For quality of life (and possibly compatibility with #123) it would make sense to change this to a simple boolean flag, and automatically parallelize unbatched functions for the user with a variation of either JAX's or PyTorch's vmap
function.
Example pseudo-code:
Replace this
class Prior:
def __init__(self, batch_prior_fun=None, prior_fun=None):
if (batch_prior_fun is None) == (prior_fun is None):
raise ValueError
self.prior = prior_fun
self.batched_prior = batch_prior_fun
with
class Prior:
def __init__(self, prior_fun, is_batched=True):
self.prior = prior_fun if is_batched else vmap(prior_fun)
Since this is a breaking change, we would need a soft introduction of this feature including backward compatibility and deprecation messages.
Possible concerns:
prior_fun
(e.g. data-dependent control flow), so allowing the user to disable this would be beneficialThanks for the very interesting approach for simulation based inference
I have been testing on this and just would like to ask the following:
Is it a requirement that the sample size in the training is the sample as the real data?
e.g. in the example for parameter estimation, if I use the following:
losses = trainer.simulate_and_train_offline(n_sim=10000, epochs=75, batch_size=64, n_obs=10000)
i.e. the dataset has 10000 observation
then if I simulate a dataset with size 10000 and to obtain posterior, I got some reasonable value.
but when I just use the following simulate a dataset with size 1000, I got the following true value and posterior
true_params = prior(1)
print(true_params)
x = batch_simulator(true_params,1000)
param_samples = amortizer.sample(x, 10000)
print(param_samples.mean(0))
print(param_samples.std(0))
[[1.8459227 4.879591 0.42127568 0.44341898 2.0850098 ]] : true value
[ 1.9427828 14.766885 12.579591 -0.948793 1.0707246] : posterior mean
[ 0.9751236 18.545668 8.544211 0.651906 1.5197684] : posterior sd
Thank you.
Hi -
I'm looking to use BayesFlow for a model with a nested RL structure. In a previous post (#70 (comment)), I saw that you recommended using TimeSeriesTransformer
instead of InvariantNetwork
to ensure the model has memory. I also know that I want to pass along relevant variables related to context and other model calculations. Therefore, each simulated trial outputs not only the predicted choices but also other variables like utilities (see the function FD2_trial
). However, I'm having issues structuring my model so that it has the correct dimensions.
When I try to configure the trainer, I get the following error:
Could not carry out computations of generative_model ->configurator -> amortizer -> loss! Error trace:
Exception encountered when calling layer 'multi_head_attention_45' (type MultiHeadAttention).
In this
tf.Variable
creation, the initial value's shape ((3, 4, 32)) is not compatible with the explicitly suppliedshape
argument ((7, 4, 32)).
I tried to change the input_dim
of the TimeSeriesTransformer
, but I keep getting the same error. I've also tried various fixes within the configurator
without success. Could you help me determine why this error is occurring and how I can fix it?
Thanks,
Blair
Here is the full set-up:
def FD2_trial(offer, norm0, alpha0, beta0, epsilon0, delta0, eta = 0.8):
#Minimum offer amount
mn = 1
alpha = alpha0
beta = beta0
epsilon = epsilon0
delta = delta0
@staticmethod
def _FS_sim(offers=None, alpha=None, norm=None):
norm_violation = norm-offers
norm_violation = np.max([norm_violation, 0])
return offers - alpha*norm_violation
# Norm update (RW)
norm = norm0 + epsilon * (offer-norm0)
CV = _FS_sim(offers=offer, alpha=alpha, norm=norm)
ao = np.max([offer-delta, mn])
if _FS_sim(offers=ao, alpha=alpha, norm=norm) > 0:
aFV = eta * _FS_sim(offers=ao, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ao-delta, mn]), alpha=alpha, norm=norm), 0)
else:
aFV = eta**2 * np.max(_FS_sim(offers=np.max([ao+delta, mn]), alpha=alpha, norm=norm), 0)
ro = offer+delta
if _FS_sim(offers=ro, alpha=alpha, norm=norm) > 0:
rFV = eta * _FS_sim(offers=ro, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ro-delta,mn]), alpha=alpha, norm=norm), 0)
else:
rFV = eta**2 * np.max(_FS_sim(offers=np.max([ro+delta, mn]), alpha=alpha, norm=norm), 0)
V = CV + (aFV - rFV)
# calculate probability of accepting offer:
prob = 1 / ( 1 + np.exp(-beta*V))
prob = 0.0001 + 0.9998 * prob
prob_a = [1-prob, prob]
# Simulate choices
action = np.random.choice([0, 1], p=prob_a)
# Output is action, offer, utilities, internal norms, and probabilities
return action, offer, V, norm, prob
def FD2_prior():
alpha = RNG.beta(2.0, 2.0)
beta = RNG.gamma(7.5,1.0)
epsilon = RNG.beta(2.0, 2.0)
delta = RNG.uniform(-2, 2)
return np.hstack((alpha, beta, epsilon, delta))
PARAM_NAMES = [
r"$alpha$",
r"$beta$",
r"$epsilon$",
r"$delta$",
]
prior = bf.simulation.Prior(prior_fun=FD2_prior, param_names=PARAM_NAMES)
MIN_OBS = 30
MAX_OBS = 30
NUM_CONDITIONS = 1
def random_num_obs(min_obs=MIN_OBS, max_obs=MAX_OBS):
return RNG.integers(low=min_obs, high=max_obs + 1)
def generate_offer_matrix(num_obs):
offer_mat = np.zeros([num_obs,2])
for n in range(num_obs):
offer_mat[n,1] = np.random.randint(-2,1)
offer_mat[n,0] = np.random.randint(0,3)
return offer_mat
context_gen = bf.simulation.ContextGenerator(
non_batchable_context_fun=random_num_obs,
batchable_context_fun=generate_offer_matrix,
use_non_batchable_for_batchable=True,
)
def UG_experiment(theta, offer_mat, num_obs):
out = np.zeros((num_obs, 5))
offers = np.zeros(num_obs)
norms = np.zeros(num_obs)
offers[0] = 5 # First offer is always 5
norms[0] = 10 # Initializing norms at 10
for n in range(num_obs):
out[n, :] = FD2_trial(offers[n], norm0 = norms[n], alpha0 = theta[0], beta0 = theta[1], epsilon0 = theta[2], delta0 = theta[3])
if n < (num_obs - 1):
# Only calculate next offer/norm if there are more offers coming
norms[n+1] = out[n,3]
if out[n, 0] == 1:
# If accept, next offer goes down by [0, 1, 2] (to min of 1)
offers[n+1] = max(offers[n]+offer_mat[n,1],1)
else:
# If reject, next offer goes up by [0, 1, 2] (to max of 9)
offers[n+1] = min(offers[n]+offer_mat[n,0],9)
return out
simulator = bf.simulation.Simulator(simulator_fun=UG_experiment, context_generator=context_gen)
model = bf.simulation.GenerativeModel(prior=prior, simulator=simulator, name="FD2")
summary_net = bf.networks.TimeSeriesTransformer(input_dim=7, summary_dim=32, name="FD2_summary")
inference_net = bf.networks.InvertibleNetwork(
num_params=len(prior.param_names),
coupling_settings={"dense_args": dict(kernel_regularizer=None), "dropout": False},
name="FD2_inference",
)
amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net, name="FD2_amortizer")
prior_means, prior_stds = prior.estimate_means_and_stds(n_draws=100000)
prior_means = np.round(prior_means, decimals=1)
prior_stds = np.round(prior_stds, decimals=1)
def configurator(forward_dict):
# Prepare placeholder dict
out_dict = {}
# Extract simulated choices
data = forward_dict["sim_data"].astype(np.float32)
# Context
context = forward_dict['sim_batchable_context']
# Concatenate
out_dict["summary_conditions"] = np.c_[data, context].astype(np.float32)
vec_num_obs = forward_dict["sim_non_batchable_context"] * np.ones((data.shape[0], 1))
out_dict["direct_conditions"] = np.sqrt(vec_num_obs).astype(np.float32)
# Get data generating parameters
params = forward_dict["prior_draws"].astype(np.float32)
# Standardize parameters
out_dict["parameters"] = (params - prior_means) / prior_stds
return out_dict
trainer = bf.trainers.Trainer(
generative_model=model, amortizer=amortizer, configurator=configurator, checkpoint_path="FD2_model"
)
Hi Stefan et al.,
Thank you for your research and building this fantastic package. Amin Ghaderi and I are currently building and testing joint models of EEG and behavioural data for each experimental trial. This necessitates approximating posteriors from joint likelihoods on the single-trial level, so we are using BayesFlow. We will likely have a preprint ready within the next couple of weeks.
We are thinking about the future of using BayesFlow for joint modeling (e.g. neural and behavioural data) research. We often use hierarchical models, which we have often implemented using JAGS and Stan etc (https://github.com/mdnunez/pyhddmjags).
Does BayesFlow currently have the ability to fit hierarchical models? And have you thought about his line of research? I suspect it would necessitate some chain of invertible blocks?
A simple hierarchical model extension of the existing DDM example would be a model in which there are many experimental conditions c (e.g. 10). But there is also a hierarchical drift rate parameter, the mean of the condition drift rates:
Likelihood: choice-RT ~ DDM(drift[c], ...)
Prior: drift[c] ~ Normal(mu_drift, std_drift^2)
Hyperprior: mu_drift ~ Uniform(-7, 7), etc.
This isn't as simple as including an extra normal distribution in the simulator because we want to estimate marginal posterior distributions for the 10 condition drifts: delta[c], as well as marginal posterior distributions for the hierarchical parameters mu_drift and std_drift. I also believe there is a fundamental difference between the model above and this model:
Likelihood: choice-RT ~ Complicated_likelihood(mu_drift, std_drift^2, ...)
Prior: mu_drift ~ Uniform(-7, 7), etc.
Kind regards,
Michael
Hi there,
May I ask if there is any limit for input training data?
for example I want to put 50000 simulated data in the following size: (50000,3000,45)
when I try to run
%%time losses = trainer.train_offline(epochs=100, batch_size=64, params=true_params, sim_data=x)
I got error:
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
is that because of my GPU limit (20G)?
actually all (3000,45) are integer only, is it possible to specify integer array to save ram?
The paper "Likelihood-free parameter estimation with neural Bayes estimators" (Sainsbury-Dale, Zammit-Mangion, & Huser, 2023) enables neural amortized point estimation, which is generally faster than fully Bayesian neural inference with conditional generative models (current standard in BayesFlow).
Implementing amortized point estimators in BayesFlow would help with fast prototyping and sanity checks (and surely other tasks, too!).
Some BayesFlow pointers/proposals:
AmortizedPointEstimator
in bayesflow.amortizers
bayesflow.losses
(depending on the target quantity)bayesflow.inference_networks
which takes the summary network output (e.g., DeepSet outputs) and regresses to the point estimatebayesflow.summary_networks
and probably work out-of-the-boxSome links:
NeuralEstimators
package (R): https://github.com/msainsburydale/NeuralEstimatorsNeuralEstimators.jl
package (Julia): https://github.com/msainsburydale/NeuralEstimators.jlThe scatterplot function bayesflow.diagnostics.true_vs_estimated()
displays incorrect values for the explained variance R^2
, i.e. values greater than 1.0 and less than -1.0.
Implement the methods from https://arxiv.org/abs/2301.11873 in the library. We need:
HierarchicalNetwork
in summary_networks.py
PMPNetwork
with probability outputs and twirls.EvidentialNetwork
in terms of generality.3.9 / 3.10 → 3.10 / 3.11
The current design of bayesflow.simulation.Prior
strongly resembles the decorator pattern. For small codebases, it may be beneficial to directly allow the usage of Prior
s as decorators.
Example:
@prior(is_batched=True)
def f(batch_size):
return np.random.uniform(low=-2.0, high=2.0, size=batch_size)
This can likely be achieved alongside the current class-based usage pattern.
Hi Stefan et al.,
I am curious that how did you determine the number of layers for the different models?
In the simulation studies, you mentioned like 6-10 ACBs should be enough for moderate model setup. So instead of trying different #of ACBs and keep stacking up, do you have any suggestions/better way?
Thanks!
Best,
Shuwan
Currently, there are three separate trainer classes:
bayesflow.trainers.ParameterEstimationTrainer
,bayesflow.trainers.ModelComparisonTrainer
, andbayesflow.trainers.MetaTrainer
.They share the same method names but act on different inputs/outputs as well as generative models and network architectures. This issue aims at refactoring the Trainer classes into a metaclass bayesflow.trainers.BaseTrainer
with subclasses ParameterEstimationTrainer
, ModelComparisonTrainer
, and MetaTrainer
. The introduction of a unified GenerativeModel
interface has outsourced the forward inference process from the trainers to the generative model. This circumstance facilitates refactoring the trainer class since the core difference used to lie in the forward inference implementations.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.