openpharma / brms.mmrm Goto Github PK
View Code? Open in Web Editor NEWR package to run Bayesian MMRMs using {brms}
Home Page: https://openpharma.github.io/brms.mmrm/
License: Other
R package to run Bayesian MMRMs using {brms}
Home Page: https://openpharma.github.io/brms.mmrm/
License: Other
I saw that a manuscript based on the preprint on SBC was accepted and will soon be published in Bayesian Analysis:
@Article{Modrak2024,
author = {Martin Modr{\'a}k and Angie H. Moon and Shinyoung Kim and Paul B{\"u}rkner and Niko Huurre and Kate\v{r}ina Faltejskov{\'a} and Andrew Gelman and Aki Vehtari},
journal = {Bayesian Analysis},
title = {{S}imulation-{B}ased {C}alibration {C}hecking for {B}ayesian {C}omputation: {T}he {C}hoice of {T}est {Q}uantities {S}hapes {S}ensitivity},
year = {2024},
volume = {forthcoming},
doi = {10.1214/23-BA1404},
url = {https://doi.org/10.1214/23-BA1404},
}
We could add this reference to the respective package vignette.
I noticed that the authors refer to SBC "checking":
The term in the literature is “simulation-based calibration”; here we have added the word “checking” to emphasize that these methods do not themselves produce calibration; rather, they measure departure from calibration.
I think it would be helpful (for clarity) to refer to SBC "checking" in the vignette too.
EDIT: just keeping this as a backup. I don't think we need to change to this direction just yet.
We currently have a brms
-powered MMRM prototype, and it can analyze and summarize treatment effects of change from baseline outcomes. However, there are still roadblocks:
hypothesis()
output probabilities of the form, "Prob(treatment effect > threshold)". In addition, I know my team will ask for something the call "effect size", which is the marginal posterior of treatment effect / sigma
. These and other posterior summaries are important for inference, but brms
does not seem to have a way to do this for arbitrary parameterizations. The fitted-values-based workaround at #20 is unorthodox (might not be well-received) and is computationally inefficient.Given the above, I would like to begin a pure Stan prototype that focuses on the parameterizations where informative priors will already make sense. I am thinking of two cases in particular:
For both (1) and (2), the parameters are meaningful and interpretable enough that it is feasible to assign informative priors. And I think they cover most of the kinds of informative priors we would want to assign. Admittedly, they are unorthodox because there is no intercept term etc., but I think the ability to set informative priors provides plenty of justification. There might be other ways to set informative priors, e.g. transformations of variables or functional priors, but that's quite a bit of a hurdle, and it would create opportunities for error / more explaining we would have to do.
All, please look at the authors list, copyright/license, and acknowledgements/references in the DESCRIPTION, LICENSE, LICENSE.md, and README.Rmd files. Please let me know if I have missed anything that should be included or if I should omit or change anything mentioned. (Let's plan to cite the Novartis vignettes when they are public and a citation is available.)
cc @kkmann, @yonicd, @andrew-bean, @chstock, @danleibovitz, @weberse2
brms
allows priors to be unspecified or partially specified. It would be great to have a brms.mmrm
function to get the full prior given a dataset, formula, and partially specified prior. brms::prior_summary()
requires an existing brms
object.
Using brms
to fit an MMRM on simulated data, I see:
library(brms)
library(brms.mmrm)
library(dplyr)
data <- brm_simulate()$data %>%
rename(
CHG = response,
AVISIT = time,
TRT01P = group,
USUBJID = patient
) %>%
mutate(BASE = rnorm(n = n()))
formula <- brmsformula(
CHG ~ 1 + AVISIT + BASE + BASE:AVISIT + TRT01P + TRT01P:AVISIT,
autocor = ~unstr(time = AVISIT, gr = USUBJID),
sigma ~ AVISIT
)
prior <- prior(normal(0, 5), class = Intercept) +
prior(normal(0, 2), class = b) +
prior(normal(0, 2), class = Intercept, dpar = sigma) +
prior(normal(0, log(4.0) / 1.64), class = b, dpar = sigma)
output <- brm(
formula = formula,
prior = prior,
data = data,
chains = 1,
iter = 50,
refresh = 0L
)
> print(output)
Family: gaussian
Links: mu = identity; sigma = log
Formula: CHG ~ 1 + AVISIT + BASE + BASE:AVISIT + TRT01P + TRT01P:AVISIT
autocor ~ unstr(time = AVISIT, gr = USUBJID)
sigma ~ AVISIT
Data: data (Number of observations: 800)
Draws: 1 chains, each with iter = 50; warmup = 25; thin = 1;
total post-warmup draws = 25
Correlation Structures:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
cortime(1,2) -0.21 0.01 -0.23 -0.20 1.25 4 NA
cortime(1,3) -0.56 0.00 -0.56 -0.55 1.54 3 NA
cortime(2,3) 0.18 0.00 0.18 0.19 1.83 2 2
cortime(1,4) 0.04 0.00 0.04 0.05 1.70 13 3
cortime(2,4) -0.05 0.01 -0.07 -0.04 1.77 3 6
cortime(3,4) -0.11 0.01 -0.12 -0.10 1.89 2 NA
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.04 0.00 1.04 1.04 1.63 6 2
sigma_Intercept -0.84 0.00 -0.85 -0.83 2.26 2 10
AVISIT2 0.32 0.01 0.31 0.33 1.04 8 NA
AVISIT3 0.69 0.00 0.68 0.69 1.84 7 NA
AVISIT4 0.93 0.00 0.92 0.93 1.70 3 NA
BASE 0.02 0.00 0.02 0.03 1.04 9 NA
TRT01P2 -1.74 0.00 -1.74 -1.73 1.49 2 NA
AVISIT2:BASE 0.04 0.00 0.04 0.05 2.36 2 2
AVISIT3:BASE -0.02 0.00 -0.03 -0.02 1.05 11 NA
AVISIT4:BASE -0.71 0.00 -0.71 -0.71 1.32 7 NA
AVISIT2:TRT01P2 -1.01 0.00 -1.02 -1.00 2.32 2 NA
AVISIT3:TRT01P2 -0.42 0.00 -0.43 -0.41 1.16 8 NA
AVISIT4:TRT01P2 0.48 0.00 0.47 0.49 1.04 9 NA
sigma_AVISIT2 0.63 0.00 0.62 0.63 1.04 7 NA
sigma_AVISIT3 -2.11 0.01 -2.12 -2.10 2.26 2 5
sigma_AVISIT4 1.18 0.00 1.17 1.18 0.97 10 NA
Most of the parameterization is clear enough, but I do not understand why there is a sigma_Intercept
parameter in addition to the more understandable residual standard deviation parameters sigma_AVISIT2
, sigma_AVISIT3
, and sigma_AVISIT4
. I expected to see a sigma_AVISIT1
parameter instead of sigma_Intercept
because residual variances do not seem like a place to have a treatment effect parameterization. Anyone care to enlighten me?
As discussed recently, when borrowing from external data that is misaligned w.r.t. the visit schedule, linear (additive) mixed models might be helpful to derive priors for MMRMs. In practice, this challenge could occur quite frequently and might be something to be addressed in a future vignette.
The attached file includes a first (very preliminary) exploration of models with spline-based smooths fitted using brms::brm()
and mgcv::gam()
, and marginal means computed with the {emmeans}
package.
Bayesian linear (additive) mixed model with spline-based smooths.pdf
Related to #53.
c.f. #51 (comment). I started a vignette at https://rconsortium.github.io/brms.mmrm/articles/methods.html with the goal of explaining every part of our method, and it is sparse on how emmeans
transforms posterior samples of model coefficients into marginal means. I understand the emmeans
docs and https://www.jstatsoft.org/article/view/v069i01 correctly, the current behavior seems reasonable:
I think this means each marginal mean is an average over the predictions at a given combination of study arm and time point, where continuous nuisance variables are set to the mean and categorical nuisance variables are averaged in a weighted way according to how levels are represented in the data. I think it would be good to dig into the code, e.g. https://github.com/rvlenth/emmeans/blob/master/R/emmeans.R#L332-L389, to figure out if this is true. And I have trouble wrapping my head around categorical nuisance variables, especially when there is more than 1.
Unstructured correlation was introduced in brms
version 2.19.0 (https://github.com/paul-buerkner/brms/blob/master/NEWS.md)
Since this package depends on brms::unstr
, should we not require brms >= 2.19.0? Otherwise, users with older versions of brms will get a warning on package load and then an error when brm_model
is called.
# remotes::install_version("brms", version = "2.18.0")
stopifnot(packageVersion("brms") < "2.19.0")
library(brms.mmrm)
# Warning message:
# object ‘unstr’ is not exported by 'namespace:brms'
library(dplyr)
library(magrittr)
set.seed(0L)
raw_data <- brm_simulate_simple(
n_group = 3,
n_patient = 100,
n_time = 4
) %>%
extract2("data") %>%
brm_simulate_continuous(c("biomarker1", "biomarker2", "biomarker3"))
data <- brm_data(
data = raw_data,
outcome = "response",
role = "response",
group = "group",
patient = "patient",
time = "time",
covariates = c("biomarker1", "biomarker2"),
level_control = "group_1",
level_baseline = "time_1"
)
formula <- brm_formula(
data = data,
intercept = TRUE,
effect_baseline = FALSE,
effect_group = TRUE,
effect_time = TRUE,
interaction_baseline = FALSE,
interaction_group = TRUE
)
model <- brm_model(data = data, formula = formula, refresh = 0)
# Error in unstr(time = time, gr = patient) :
# could not find function "unstr"
Considering #40, I think it is reasonable to change the parameterization of the model to accommodate whatever kind of informative prior the user wants. I think we could have wrappers for common cases: one with separate parameters for all the group means, and another with parameters for placebo group means and treatment effects. The wrappers could set up the data, formulas, and independent component-wise mixture priors.
Toeplitz, AR(1), etc. Should be straightforward to just supply the right brms
functions to the formula in brm_formula()
.
In our last meeting we decided to add a vignette that provides exemplary comparisons of Bayesian MMRM fits, obtained by brms.mmrm::brm_model()
, and frequentist MMRM fits, obtained by mmrm::mmrm()
, using one or two of the identified datasets. There is now a new branch where this vignette will be developed.
@kkmann, @chstock, @yonicd, @andrew-bean: any objections to transferring this repo to http://github.com/openpharma/brms.mmrm?
Use emmeans
and implement one or two basic plots. I aim to draft this before the next developer meeting on 2023-04-23.
Arguments like response
, group
, and time
are repeated in functions brm_formula()
and brm_summary()
. This means the user needs to copy them from one to the other, and there is room for error when these sets of arguments are not perfectly consistent.
When we were talking about the package specification, @kkmann mentioned the concept of "roles" from the tidy models universe. Metadata, such as which column is the treatment group indicator, should belong with the data, and it should only have to be declared once. We can implement functions to assign these roles as attributes and make the result a classed tibble
. We could also throw in some preprocessing functionality like completing and arranging the data. Seems like a good time to also test how well brms
handles missing values.
Suggested by @chstock. I picture a model with subgroup and subgroup/treatment interaction and subgroup/treatment/time interaction with the following summaries:
We would also want a test of treatment/subgroup interaction as a whole, c.f. openpharma/mmrm#164.
I think this issue deserves to be addressed eventually, but the base case and informative priors are more important.
Looking at an MMRM run from the Novartis vignettes:
mmrm_mac_model <- bf( CHG ~ 0 + AVISIT + BASE + BASE:AVISIT + TRT01P + TRT01P:AVISIT + ( 0 + AVISIT + BASE + BASE:AVISIT | STUDYID),
autocor=~unstr(time=AVISIT, gr=USUBJID), sigma ~ AVISIT,
center=FALSE)
mmrm_mac_prior <- prior(normal(0, 1), class=b) +
prior(normal(0, 1), class=b, dpar=sigma) +
prior(normal(0, 0.5), class=sd, group=STUDYID) +
prior(lkj(2), class=cor, group=STUDYID)
histfit <- brm(
formula=mmrm_mac_model,
prior=mmrm_mac_prior,
data = historical_studies,
drop_unused_levels=FALSE,
seed = 234525,
control = list(adapt_delta=0.90),
chains = 1,
iter = 500
)
There is a lot I would like to simplify here. To me, the formula is a leaky abstraction and requires a thorough understanding of the model. Even more so with the prior specification: not only does the syntax require deep knowledge of the parameterization, it also literally writes user-provided Stan code, which seems dangerous for users who are not experts in Bayesian methods or MMRMs. I would like to seriously tighten up the encapsulation while also preserving the flexibility and helping users set informative priors correctly on the correct fixed effects.
I am thinking of something like an interface for a formula and/or prior builder which accepts a dataset and the user's choice of model. Beyond that, I have not wrapped my head around how exactly it would work, and I would be happy to take suggestions.
Would help to check:
coda::as.mcmc()
/brms
/emmeans
uses commas to delimit treatment and time in the column names of marginal means.brm_mariginal_summaries()
and brm_marginal_probabilities()
are long and tidy, which is nice, but they are not in a form convenient to paste into clinical study reports. It would be good to have downstream functions to help with the latter, maybe using gt
or Tplyr
or xtable
.
With the basic package wrapper in place, I think this is a good place to start with the meaningful content.
@kkmann, @yonicd, and @chstock, I know we talked about using brms
itself to simulate data. However, just for this first go-round, I think I would prefer to use ordinary R. Once we have that, we will have the beginnings of an interface. Then after we can fit a brms
model, it will be easier to go back and rewrite the internals of the simulation function to use brms
. Sound reasonable?
README Usage section and maybe a package vignette discussing different parameterizations.
I fitted MMRMs to the orthodont dataset using both mmrm::mmrm()
and brms.mmrm::brm_model()
. Apologies, the reprex has become a bit lengthy. Of note, the endpoint here is a response
, not a chg
-variable. I noted a few things that are summarized below:
Prerequisites
packages <- c("tidyr", "dplyr", "stringr", "tictoc", "nlme")
invisible(lapply(packages, library, character.only = T))
library(brms.mmrm) # version brms.mmrm_0.0.2.9001
library(mmrm) # mmrm_0.3.3
set.seed(123L)
Data
# data
data(Orthodont)
# age variable: brms.mmrm needs a character or factor 'time' variable
Orthodont$age_fct <- as.character(Orthodont$age)
Orthodont$age_fct[which(Orthodont$age_fct == "8")] <- "08" # changed as otherwise age = "10" becomes the reference level
Orthodont$age_fct <- factor(Orthodont$age_fct)
levels(Orthodont$age_fct)
#> [1] "08" "10" "12" "14"
# sex variable: reference level changed since `level_control` in brm_data() did not seem be effective
levels(Orthodont$Sex)
#> [1] "Male" "Female"
Orthodont$Sex <- relevel(Orthodont$Sex, ref = "Female")
levels(Orthodont$Sex)
#> [1] "Female" "Male"
lapply(Orthodont, class) |> unlist()
#> distance age Subject1 Subject2 Sex age_fct
#> "numeric" "numeric" "ordered" "factor" "factor" "factor"
head(Orthodont)
#> Grouped Data: distance ~ age | Subject
#> distance age Subject Sex age_fct
#> 1 26.0 8 M01 Male 08
#> 2 25.0 10 M01 Male 10
#> 3 29.0 12 M01 Male 12
#> 4 31.0 14 M01 Male 14
#> 5 21.5 8 M02 Male 08
#> 6 22.5 10 M02 Male 10
mmrm
mmrm_fit <- mmrm(
formula = distance ~ age_fct + Sex * age_fct + us(age_fct | Subject),
data = Orthodont,
control = mmrm_control(method = "Satterthwaite")
)
summary(mmrm_fit)
#> mmrm fit
#>
#> Formula: distance ~ age_fct + Sex * age_fct + us(age_fct | Subject)
#> Data: Orthodont (used 108 observations from 27 subjects with maximum 4
#> timepoints)
#> Covariance: unstructured (10 variance parameters)
#> Method: Satterthwaite
#> Vcov Method: Asymptotic
#> Inference: REML
#>
#> Model selection criteria:
#> AIC BIC logLik deviance
#> 434 447 -207 414
#>
#> Coefficients:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 21.1818 0.7017 25.0000 30.189 < 2e-16 ***
#> age_fct10 1.0455 0.6155 25.0000 1.699 0.101793
#> age_fct12 1.9091 0.6068 25.0000 3.146 0.004241 **
#> age_fct14 2.9091 0.6729 25.0000 4.323 0.000215 ***
#> SexMale 1.6932 0.9115 25.0000 1.858 0.075038 .
#> age_fct10:SexMale -0.1080 0.7995 25.0000 -0.135 0.893671
#> age_fct12:SexMale 0.9347 0.7883 25.0000 1.186 0.246904
#> age_fct14:SexMale 1.6847 0.8741 25.0000 1.927 0.065385 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Covariance estimate:
#> 08 10 12 14
#> 08 5.4155 2.7168 3.9102 2.7102
#> 10 2.7168 4.1848 2.9272 3.3172
#> 12 3.9102 2.9272 6.4557 4.1307
#> 14 2.7102 3.3172 4.1307 4.9857
brms.mmrm
# Pre-process dataset
brms.mmrm_data <- brm_data(
data = Orthodont,
outcome = "distance",
role = "response",
group = "Sex",
time = "age_fct",
patient = "Subject",
baseline = NULL,
level_control = "Female", # an effect for 'sexMale' is shown for both "Male" and "Female"
level_baseline = "08" # an error occurs if set to NULL
)
# Model formula
brms.mmrm_formula <- brm_formula(
data = brms.mmrm_data,
intercept = TRUE,
effect_baseline = FALSE,
effect_group = TRUE,
effect_time = TRUE,
interaction_baseline = TRUE,
interaction_group = TRUE,
correlation = "unstructured"
)
print(brms.mmrm_formula)
#> distance ~ age_fct + Sex + Sex:age_fct + unstr(time = age_fct, gr = Subject)
#> sigma ~ 0 + age_fct
# Model fit
options(mc.cores = 3)
tic()
brms.mmrm_fit <- brm_model(
data = brms.mmrm_data,
formula = brms.mmrm_formula,
chains = 3,
iter = 2000,
refresh = 0,
cores = 3
)
toc()
#> 155.63 sec elapsed
summary(brms.mmrm_fit)
#> Family: gaussian
#> Links: mu = identity; sigma = log
#> Formula: distance ~ age_fct + Sex + Sex:age_fct + unstr(time = age_fct, gr = Subject)
#> sigma ~ 0 + age_fct
#> Data: data (Number of observations: 108)
#> Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 3000
#>
#> Correlation Structures:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> cortime(X08,X10) 0.46 0.14 0.15 0.71 1.00 1766 1808
#> cortime(X08,X12) 0.55 0.13 0.25 0.77 1.00 1865 2199
#> cortime(X10,X12) 0.43 0.15 0.08 0.69 1.00 2007 2106
#> cortime(X08,X14) 0.39 0.16 0.05 0.66 1.00 1675 1901
#> cortime(X10,X14) 0.62 0.12 0.34 0.81 1.00 1907 1897
#> cortime(X12,X14) 0.62 0.12 0.35 0.81 1.00 2109 2260
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 21.17 0.71 19.76 22.62 1.00 1516 1838
#> age_fctX10 1.04 0.69 -0.34 2.44 1.00 1273 1540
#> age_fctX12 1.90 0.69 0.50 3.24 1.00 1420 1871
#> age_fctX14 2.89 0.75 1.36 4.40 1.01 1214 1615
#> SexMale 1.71 0.93 -0.13 3.54 1.00 1546 1955
#> age_fctX10:SexMale -0.10 0.90 -1.86 1.72 1.00 1227 1675
#> age_fctX12:SexMale 0.94 0.90 -0.78 2.77 1.00 1419 1944
#> age_fctX14:SexMale 1.70 0.98 -0.23 3.60 1.00 1185 1602
#> sigma_age_fctX08 0.85 0.14 0.60 1.13 1.00 2121 2342
#> sigma_age_fctX10 0.71 0.13 0.46 0.99 1.00 2002 2045
#> sigma_age_fctX12 0.92 0.14 0.68 1.20 1.00 1949 2055
#> sigma_age_fctX14 0.78 0.13 0.55 1.07 1.00 1843 2039
Some observations and possible issues
group
variable through level_control
in brm_data()
did not seem have an effect.role = "response"
in brm_data()
, I understood from the documentation that no level_baseline = ...
is required, which makes sense to me. However, I received an error when trying level_baseline = NULL
. (With effect_baseline = FALSE
and effect_time = TRUE
in brm_formula()
the parameterization looked good.)Perhaps we can briefly discuss this at the next meeting.
I figured it might be good to align on scoping async. Here are a few thoughts - curious to hear your thoughts.
~ AVISIT*TRT01P
models (fully saturated without covariates) there is a linear bijection between coefficients and marginal means - this can be used to pull back a prior on the marginal means to the parameter scale. Sensible priors will be correlated on the parameter scale, even if they are not on the marginal means scale. Some degree of correlation on the marginal means scale is, however, likely as well.brms
but possible via the stanvar
argument to brm
.In a nutshell, the key problem to me seems to enable easy and transparent prior specification in a number of different borrowing use-cases (tbd). This would probably only be feasible to implement when restricting the number of supported models to an absolute minimum. For anything out of scope working directly with brms or rstan are still options.
Once scoping etc is clear we could think about hard coding the models to get rid of the brms dependency and facilitate parallel processing.
We talked about specifying a joint prior on the placebo means or treatment effects, then translating it to an induced prior on any parameterization of regression coefficients. The literature calls this approach "conditional means priors" (CMPs), or BCJ priors after the authors of https://www.jstor.org/stable/2291571, and the topic comes up in prior elicitation. Sources:
The method itself is exactly what you would expect: for a model of the form g(E(y|data)) = X*b
for link function g()
, and a vector of conditional means m
such that g(m) = P*b
given invertible matrix P
, we specify independent priors on the components of m
and simply consider the induced prior on b = P_inverse * g(m)
. The components of m
become the true parameters of the model, and the regression coefficients in b
are just transformed parameters. The full augmented model looks like:
y ~ MVN(g_inverse(X*b), Sigma) # likelihood
b = P_inverse * g(m) # transformation
m ~ p(...) # prior
Sigma ~ LKJ(...) # prior
Since the link g()
is the identity function, the model collapses down to this:
y ~ MVN(Q*m, Sigma) # likelihood
m ~ p(...) # prior
Sigma ~ LKJ(...) # prior
where Q = X * P_inverse
. In other words, our true parameters are m
and Sigma
. When we do MCMC, we draw joint posterior samples of (m
, Sigma
). This seems to be the most feasible (and recommended) way to handle CMPs. The alternative is to analytically transform the joint prior on m
to a correlated joint prior on b
beforehand, which seems extremely messy and error-prone even in a specialized case like ours. Then we would need to hack the Stan code of brms
to implement a custom likelihood family because of the correlated priors on the components of b
, which is also be extremely messy and error-prone. So unfortunately, I think CMPs force us away from brms
.
Similar to #28, my company's TFLs report something we have been calling "effect size", which is the marginal posterior of the treatment difference divided by the residual standard deviation for that time point. (I can loop back to check whether this is still necessary to report.) I thought it would be easy to calculate, but it is challenging to get posterior samples of the residual standard deviations. In the reference vignettes, we have fixed effects for time instead of marginal standard deviation parameters, and they can be negative. (Which, as an aside, seems strange for an MMRM.)
So far, the package is small and early enough that I have been making commits directly to the main branch. Going forward, I will use pull requests, and I will tag reviewers as needed. Is there anyone who would like to always be tagged as a reviewer? I would like this to be a collaborative to the extent that people are interested in helping to write the code.
In wlandau/asa-biop-swe-wg@87f253c, I drafted a workstream page similar to https://github.com/RConsortium/asa-biop-swe-wg/blob/main/mmrm_R_package.qmd / https://rconsortium.github.io/asa-biop-swe-wg/mmrm_R_package.html. Any objections if I submit a PR?
Working on #8, I am struggling with the hypothesis()
function in brms
. From the Novartis vignettes and package documentation, it looks like hypothesis()
requires us to state the exact linear combinations of formula parameters which correspond to the marginal means we want. In the current setup, this seems extremely difficult to do in the general case.
As a workaround, we could drop brms::hypothesis()
and even emmeans::emmeans()
from the postprocessing and instead directly derive the marginal means using the model matrix. According to https://discourse.mc-stan.org/t/model-matrix-from-brms-model/29411/2, we could get the model matrix using brms::make_standata()
. At that point, we could use the model matrix to transform the posterior samples of model parameters to posterior samples of the marginal means. Everything else downstream is straightforward, except possibly choosing what to do about baseline covariate fixed effects, c.f. #10 (comment) and #10 (comment).
It would probably be good to have a detailed comparison of a non-informative brms fit vs. at least mmrm to make sure brms is working fine here.
Consistency in citations of methodological literature (in vignettes and elsewhere in the package documentation) seems desirable. I suggest to take all references from a central BibTeX file (inst/bibliography.bib
) and use one citation style (inst/asa.csl
) throughout the package. Literature references are currently used in vignettes, the DESCRIPTION
, README.Rmd
, brm_package.R(\Rd)
and maybe at other places. I'd be happy to go through the package and prepare a PR.
brm_prior_simple()
should accommodate all covariance structures supported by brms.mmrm
.
For #40, I think we are heading toward the following kind of workflow:
I will prototype this in a branch. It will be tricky because brms.mmrm
thus far has been designed to use an ordinary data/formula and makes a lot of assumptions there.
Looks like trouble compiling the model: https://github.com/RConsortium/brms.mmrm/actions/runs/5648046496/job/15299396175
As part of #12, it would be good to run a simulation-based calibration study. Looking over the brm_simulate()
function, I think it needs some work first:
brms
: https://openpharma.github.io/brms.mmrm/articles/methods.html.brms
. Ideally, we would want to be able to supply an object from brms::get_prior()
and get a simulated dataset. For the purposes of SBC, the simulation should be implemented with custom R code and not brms
.model.matrix()
converts each factor into a set of dichotomous covariates anyway.As for simulation scenarios, I think it would be good to try:
Any other simulation scenarios come to mind?
I followed the Novartis vignettes to set priors for the MMRM, but I am not sure if if I am setting them correctly. Code to set up the model.
Example interface code:
When I run this code, I see the following for brms::prior_summary(model)
:
prior class coef group resp dpar nlpar lb ub source
normal(0, 100) b user
normal(0, 100) b group2 (vectorized)
normal(0, 100) b time2 (vectorized)
normal(0, 100) b time2:group2 (vectorized)
normal(0, 100) b time3 (vectorized)
normal(0, 100) b time3:group2 (vectorized)
normal(0, 100) b time4 (vectorized)
normal(0, 100) b time4:group2 (vectorized)
normal(0, 100) b sigma user
normal(0, 100) b time2 sigma (vectorized)
normal(0, 100) b time3 sigma (vectorized)
normal(0, 100) b time4 sigma (vectorized)
normal(0, 100) Intercept user
normal(0, 100) Intercept sigma user
When I try to include an LKJ prior in the code, I get:
Error: The following priors do not correspond to any model parameter:
L ~ lkj(1)
Function 'get_prior' might be helpful to you.
Why are the correlation parameters missing? I had thought the autocor
argument would be enough:
For the brms
-based prototype, I am struggling with hypothesis()
for the reasons I describe at #19 (comment). One way around this is to do inference on patient-level fitted values as opposed to the model parameters themselves. If beta
is the vector of model coefficients and X
is the model matrix, then mu = X %*% beta
is the vector of fitted values for each patient and timepoint. Now, let g()
be a function that accepts the vector mu
and returns the marginal response at each treatment group and time point. If beta_samples
has the posterior samples of beta
, then mu_samples = g(X %*% beta_samples)
has valid posterior samples of the marginal means, which makes it straightforward to do any kind of posterior inference by treatment group and time point. I am told this technique is similar to the "OBS Margins" post-processing method in SAS.
Advantages:
Disadvantages:
I would like to revisit #92 (reply in thread). When I sketched #93, I realized just how chaotic a fully manual approach would make the interface. It is actually super disruptive if the user and the package would have to maintain two different sets of data and two different formulas. I think it is worth revisiting the contrasts()
interface to determine whether:
(1) does not seem like a big deal if it doesn't work out, since the treatment-effect-like version has a more reasonable correlation structure anyway. For (2), an alternative would be to let brm_data()
do proportional averaging for nuisance factors. This could actually help all kinds of informative prior scenarios, even custom ones, and it could make marginal mean calculations easier.
Then comes the separate problem of making a friendlier interface for informative priors.
If the time
variable in the data has levels "visit 1"
, "visit 2"
, etc, then brms
will remove spaces when it generates names for the b_sigma
parameters, e.g. "b_sigma_timevisit1"
. This could make it difficult for issues like #29. Indeed, brms
seems to use make.names()
a lot internally.
I propose to use make.names()
to sanitize the group and time variables in brm_data()
, then assign the unique levels in role attributes.
One dataset enabling
data set containing studies with same/similar intervention and different ones/only control.
simulated + save to pkg.
To continue @chstock's great work from #71, I think we can align closer with https://openpharma.github.io/mmrm/latest-tag/articles/mmrm_review_methods.html. We can analyze the BCVA dataset, and we can discuss our results relative to the ones from https://openpharma.github.io/mmrm/latest-tag/articles/mmrm_review_methods.html#marginal-treatment-effect-estimates-comparison.
I propose a codeless vignette that writes out the model in full detail and describes the methodology behind the emmeans
-based post processing. I have not done this yet because both brms
and emmeans
are black boxes, but I hope to understand more. Fully spelled-out descriptions of the methods, plus empirical comparisons with mmrm
and SAS, will be really important for helping users in the life sciences decide whether to trust the package.
c.f. rvlenth/emmeans#425. Should wait for the next emmeans
CRAN release though.
I think we will be ready for a first CRAN submission after #36 and #43. As we discussed, this would help users try the package, which might even help with validation. As the disclaimer in the README states, the package is not validated, so I think the version number could be something like 0.0.1 instead of 1.0.0. How does that all sound?
Following up on #93, we could have different end-to-end workflows for specific informative prior use cases, such as successive differences (e.g. #92). We could begin with brm_data()
, have specific functions like brm_data_sdif()
to create case-specific classed datasets, and then make functions like brm_formula()
and brm_transform_marginal()
could become S3 generics with case-specific methods. We could also have a new S3 generic for informative prior specification which would only apply to those specific cases.
Would be good to talk about the visualizations to include in the package.
The current version of my company's Bayesian MMRM standard has optional functionality to report summaries averaged over a subset of time points. (Or maybe it's averaged over all time points, I do not remember. I will need to loop back and check.) I am personally not a fan of this feature, but it might be necessary for internal adoption. Could be implemented in brm_summary()
.
One tricky part of brms
is compilation. This is okay for a single data analysis, but for simulation studies, it would be burdensome to compile the model again with every analysis. Opening this thread to discuss.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.