paul-buerkner / brms Goto Github PK

View Code? Open in Web Editor NEW

1.3K 1.3K 179.0 264.04 MB

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan

Home Page: https://paul-buerkner.github.io/brms/

License: GNU General Public License v2.0

R 88.38% Stan 4.18% TeX 7.44%

bayesian-inference brms multilevel-models r-package stan statistical-models

brms's People

Contributors

Stargazers

Watchers

Forkers

npp97-field berenz mages paulhendricks obaidpervaizgill lixiangchun aforren1 joshnr13 hoardboard anhnguyendepocen sandy4321 vmikk mikejacktzen seanmlb elaine50596 peter1328 famuvie statwonk mejihero lemna zhaoxiaohe bnicenboim jmorten jaykimbravekjh jrosen48 jgabry imsrgadich kevinmickey csetraynor tjmahr tehillao radovankavicky gapdata perlatex nemochina2008 bbolker jeremylea joyceylee edenhuangsh kauttoj m-macaskill dragonfly-science asiftechie vkehayas giserh guhjy philipp-neubauer nyiuab mashedpoteto pebsconsulting conormm rishisadhir jayrobwilliams erge324 rdinnager lnsongxf stjordanis kieranjmartin achetverikov mcol drew-sauve snaketron limor-raviv robindang0573 nliced andicyim datasciences donaldrwilliams maj-biostat yhw1234 karthy257 vlucet rafeemusabbir liyi3344520 ococrook joyneelm ikerbg92 ufangyang asael697 rvlenth rumenick danielyao12 brendanf jflournoy adamrauk-lilly singmann wds15 lthsieh andrjohns pangshuang mutlusun vickyyang-forester hsbadr zhaoxia413 martinmodrak seabbs gotticketsto360tour standardgalactic lauken13 moarchives

brms's Issues

predictions of ordinal variables

One of the best features of this package is its extensive support for ordinal regression. Thank you for this.

However, two issues. First, is there anyway to "predict" anything other than classes from, say, a cumulative ordinal regression? POLR and Ordinal both support the ability to do this (type='prob' or 'cum prob'), and since my categories actually have "value" in a sense this ability would be very useful to me. Or perhaps we can do something equivalent by working through the classes currently being predicted?

Second, when using meanfield VB, I am getting "log of 0" errors when using probit for the initiating values. Logit works fine. All output values / categories are between -1 and +2 and have been converted to ordered factors. If this sounds like a bug, please let me know and I'll try to post a toy that can reproduce it.

Again, many thanks.

Expand random effects structures for categorical models

Currently, random effects are assumed to be the same across all outcome values of a categorical model, which doesn't really makes sense given that fixed effects are estimated separately for each outcome value. A user thus suggested to expand the random effects structures of categorical models to fit separate random effects for each level of the outcome.

Zero-one inflated beta distribution

Is there a way to use zero-one inflated beta distribution in brms? My variable has lots of zeros and (some) ones, with other values between 0 and 1. I think the zero-one inflated beta distribution is the right distribution for my problem.

Implement mixture models

Stan allows to fit mixture models in a flexible way as described in Section 10 of the Stan 2.9.0 Reference Manual. Regarding their implementation in brms, one has to think about (1) what kind of mixture models to allow and (2) how to specify them within the brm function. Regarding (2) it might be a good idea to implement a new addition argument mixture for the LHS of formula, in which the type of mixture could be specified.

Speed question

I have a question about the speed. I can run models like this:

brm(formula = y ~ x1 + x2 + x3 + (1 | subcat),
data = df,
family = "gaussian",
n.chains = 4,
n.cluster = 4)

with 10,000 data points or less, and it runs in 10 minutes or so. But when I increase the size to 40,000, it takes almost 10 hours to run! Am I doing something wrong, or this is normal? I don't run out of memory and the machine has 4 cpus. How can I speed it up?

Error using . in formula

When I try to use '.' in a formula, I get the following error:

> system.time (bb2 <- brm (Species ~ ., data=ir2,
+                          family="bernoulli", n.chains=3, n.iter=3000, n.warmup=600,
+                          prior=c(set_prior ("normal (0, 5)"))))
Error in terms.formula(formula) : '.' in formula and no 'data' argument

I can spell out all of the columns individually, but the '.' notation is quite useful. I believe brm is using a Formula call under the hood that doesn't supply the 'data' argument.

Poisson family

I get this error when I choose poisson instead of gaussian family:

Start sampling
Error in rstan::sflist2stanfit(rmNULL(x$fit)) :
'sflist' should have at least 1 element
Calls: brm ->
In addition: Warning messages:
1: In FUN(X[[i]], ...) :
chain 1 did not contain samples and was removed from the fitted model
2: In FUN(X[[i]], ...) :
chain 2 did not contain samples and was removed from the fitted model
3: In FUN(X[[i]], ...) :
chain 3 did not contain samples and was removed from the fitted model
4: In FUN(X[[i]], ...) :
chain 4 did not contain samples and was removed from the fitted model
Execution halted

Add sparse matrix multiplication in Stan code

Sparse design matrices frequently appear in models fitted with brms, for instance when using multivariate syntax. Although sparse matrix multiplication isn't very "pretty" in Stan, it shouldn't be too difficult to implement and hopefully speeds up computation for some models.

Prediction seems to use a lot of RAM

I create a categorical model (target has four levels) with 12 variables, two of which are categorical. I'm running this on my employer's wimpy Windows laptop with 8 GB of RAM -- about 2 GB free -- and if I try predicting 45K test cases, it takes forever to eventually run out of RAM. If I predict 6K test cases, it does that in about 30 seonds and with very little RAM. (So I've resorted to a foreach loop to do 6,000 at a time.)

This seems like a problem, since it's routine to score a lot more data than is used for training.

Error in zero_inflated_beta model, when there is a non-relevant column with type 'logical'

This is a strange error I was getting, and it took me some time to figure it out. My dataframe had a column with type logic (say x3, with all NA values). When I ran ZIB model with trait in the formula, I would get:

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels

And I was not using that column in my formula! If I removed trait from the equation it would run though.

I figured out when I remove the offending column, it runs.

Here is my brm call:

brm(formula = d ~ 0 + trait * (x1 + x2) + (0 + trait | item), 
    data = df,
    family = 'zero_inflated_beta',
    chains = 2,
    cluster = 2,
    iter = 1000, 
    warmup = 300) -> fitted_model

Add support for non-linear mixed models

As brms generates Stan code on the fly, it should be possible to translate non-linear model formulae into Stan code, given that Stan knows the occuring functions (such as exp or log; possibly under another name) or that I have defined these functions in the 'functions' block of the Stan code. Definitely more of a long term goal, though.

posterior_samples: rename chains -> chain

posterior_samples(brmsfit, add.chains = T) returns a data.frame with additional columns "chain_s_" and "iter". As "chains" is an identifier variable, it would be more memorable to call the column "chain" (singular).

Truncated distributions

It would be nice if brms would allow to specify truncation for a specified distribution. Stan reference manual specify it in the following way, where L is lower and U is upper bound.

 data {
      int<lower=0> N;
      real U;
      real L;
      real<lower=L,upper=U> y[N];
    }
    parameters {
real mu;
      real<lower=0> sigma;
    }
    model {
      for (n in 1:N)
        y[n] ~ normal(mu,sigma) T[L,U];
    }

Validate self-defined Stan functions

Recently, rstan has implemented the option to include self-defined Stan functions that are written in separate .txt files with the #include operator. Using rstan::expose_stan_functions, it is then possible to call these functions from R and to test them more rigorously.

newdata prediction with forked/hurdle families fails

I tested it only with hurdle models, but I think it's something general to forked families in brms:::melt_data.
You add a missing value dummy column for the response, but then you use model.frame without na.action = na.pass and end up with an empty column.
But if I "fix" this while debugging, I run into the next error from brms:::fitted_hurdle, which complains that data$N_trait is not set. I think you probably wanted to set it in brms:::amend_newdata at the end?

I'll send you the fit below via email.

m1hg = brm(value ~ 0 + (0 + trait | target) + (0 + trait | rater) + trait * situation * gFaktor, gesamt_st, family = hurdle_gamma, warmup = 260, iter = 400, chains = 2)
summary(m1hg)
newd = unique(gesamt_unknown[, c("situation","gFaktor")])
predicted = fitted(m1hg, newdata = newd, re_formula = NA, summary = T)

Error in `$<-.data.frame`(`*tmp*`, "response", value = logical(0)) : 
  replacement has 0 rows, data has 880
> traceback()
8: stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
       "replacement has %d rows, data has %d"), N, nrows), domain = NA)
7: `$<-.data.frame`(`*tmp*`, "response", value = logical(0))
6: `$<-`(`*tmp*`, "response", value = logical(0))
5: melt_data(data, family = family, effects = effects)
4: update_data(newdata, family = fit$family, effects = ee, et$group, 
       na.action = na.pass, drop.unused.levels = FALSE)
3: amend_newdata(newdata, fit = object, re_formula = re_formula, 
       allow_new_levels = allow_new_levels)
2: fitted.brmsfit(m1hnb, newdata = newd, re_formula = NA, summary = T)
1: fitted(m1hnb, newdata = newd, re_formula = NA, summary = T)

Predict function doesn't work in developmental version

I get this error when I want to predict new data:

Error: C stack usage 8037119 is too close to the limit

but when I revert to Cran version, it works fine.

User-defined function/distributions ?

Wouldn't it be nice if some interface could allow the (Stan ?) definition of user functions / distributions to be passed to the functions{} paragraph of the Stan code implementing a model ?

Case in point : I followed the advice given in the (now closed) issue #46, which was essentially to give a name to the linear part of a nonlinear model in the formula argument of brm and to define it in the nonlinear argument ; an example :

brm(Y ~ LinPred + log(k+(1-k)*exp(beta*X)), data=...,
    nonlinear = list(LinPred ~ A + B + (1|C), k ~ 1, beta ~ 1),
    prior=c(set_prior("cauchy(0,3)", nlpar="beta")
            set_prior("uniform(0,1)", nlpar="k", lb=0, ub=1),
            set_prior("uniform(-100,100)", nlpar="LinPred", lb=-100, ub=100)))

The last prior definition is a nonsense, since LinPred is a (deterministic) function of the independent variables and the corrsponding (fixed effect) parameters. What should have been writtent would be something along the lines of set_prior("flat()", nlpar="LinPred"), which would be meaningful if we could pass the Stan snippet :

functions {
    real flat_log(real x) {
        return 0.0;
    }
}

in the Stan model code.

Other uses : simplifying the writing of complex nonlinear functions, implementing alternate parametrization of various distributions, etc...

What do you think ?

Question regarding stan model

I'm trying to see if I can speed up my runs. They take up to many days for larger 3 or 4 level models.

I was reading on the stan manual that when it comes to picking row m of matrix M, it's much more efficient to use an array of row_vectors (page 40). I see that you use this technique in defining the data, but it seems like it's not used in parameters. For example, in parameters block, I have:

matrix[N_1,K_1] pre_1

wouldn't it be better to do something like:

row_vector[K_1] pre_1[N_1]

as later you do:

for (i in 1:N_1) {
r_1[i] <- sd_1 .* (to_vector(pre_1[i])); # scale REs
}

Is there a reason for using matrix for pre_1, etc?

Estimate prediction larger than 90% quantile

I'm getting predictions for my Zerio-Inflated Beta model, and for some predictions I see:

     Estimate  Est.Error 10%ile 25%ile 75%ile 90%ile
        (dbl)      (dbl)  (dbl)  (dbl)  (dbl)  (dbl)
1 0.009530440 0.03391364      0      0      0      0
2 0.010388541 0.03790067      0      0      0      0
3 0.007963558 0.03169906      0      0      0      0
4 0.008344872 0.03136680      0      0      0      0
5 0.008264070 0.03016002      0      0      0      0
6 0.008028368 0.03091140      0      0      0      0

My guess is that the Estimate is the mean of the posterior, right? Wouldn't it make more sense to use median of the posterior as an estimate for ZIB model?

My predict line is:

testing_pred = predict(object = zib_fit, newdata = testing, probs = c(0.1, 0.25, 0.75, 0.9))

error if newdata has the class "brms.frame"

As discussed via mail. I haven't traced the internals very thoroughly but I found a workaround: removing the brms.frame class, which stuck with me when I made the newdata object one way, but not another. If it's present, you expect that you can safely query the terms attribute which got lost in the process.
Pretty esoteric scenario, but anyway:

fit <- brm(rating ~ treat + period + carry + (1|subject), 
           data = inhaler, iter = 30, warmup = 10)

fixef_vars = all.vars(delete.response(terms(lme4::nobars(fit$formula))))
predictor_data = fit$data[, fixef_vars]
newd = unique(predictor_data)[1:10,]

predicted = fitted(fit, newdata = newd, re_formula = NA, summary = T, scale = "response")
# ## Error in if (attr(attr(data, "terms"), "response")) { : 
# argument is of length zero
traceback()
#6: model.response(data)
#5: unname(model.response(data))
#4: make_standata(new_formula, data = newdata, family = fit$family, 
#                  autocor = fit$autocor, partial = fit$partial, control = control)
#3: amend_newdata(newdata, fit = object, re_formula = re_formula, 
#                  allow_new_levels = allow_new_levels)
#2: fitted.brmsfit(fit, newdata = newd, re_formula = NA, summary = T, 
#                   scale = "response")
#1: fitted(fit, newdata = newd, re_formula = NA, summary = T, scale = "response")

# this works
fixef_vars = delete.response(terms(lme4::nobars(fit$formula)))
predictor_data = model.frame(fixef_vars, data = fit$data)
newd2 = unique(predictor_data)[1:10,]

predicted = fitted(fit, newdata = newd2, re_formula = NA, summary = T, scale = "response")


attributes(newd2)
attributes(newd)
## the latter also has the class "brms.frame"

# this fixes it
class(newd) = "data.frame"
predicted = fitted(fit, newdata = newd, re_formula = NA, summary = T, scale = "response")

Add support for generalized additive mixed models

The basic goal is to have the functionality of gam and gamm from the mgcv and gamm4 packages.

Actually this has been on my to do list for quite a while now, but somehow I forgot to make an issue for it.

Print updates during sampling

Not an issue but a feature that would be nice: the ability to view sampling progress (as with the "refresh" option in a stan() call). Apologies if this is already implemented and I've just missed it.

Predict multilevel models with interaction between two levels fails

Thanks for the great work!

My model looks like this:

brm(formula = y ~ x1 + x2 + x3 +
(1 | cat:subcat) +
(x1 + x2 + x3 || cat),
data = df,
family = "gaussian",
n.chains = 4,
n.cluster = 4,
cluster_type = "FORK") -> model1

It runs fine, but when I try to predict it using:

new_pred = predict(object = model1, newdata = newData, probs = c(0.005, 0.995))

It give an error:

Error in get(gnames[i], newdata): object 'cat:subcat' not found.

I tried creating a column "cat:subcat", but it gives another error:

Error in amend_newdata(newdata, fit = object, re_formula = re_formula, : levels 813, 1346, 1397, 2517, 2003, 1439, 1373, 1329, 2322, 2247, 919, 2320, 1451, 1146, 2578, ..., 2679 of grouping factor 1346 not found in the fitted model.

What's the solution?

Also, it seems like we can't use "1 | cat/subcat" to denote the hierarchy. Is "1 | cat:subcat" gonna do the same thing?

Prediction of categorical getting error in make_standata

In R, using brms 0.6.0, I create a model:

br1 <- brm (FLAG ~ Age + [[...left out rest of formula...]], data=x2.train, refresh=20,
                     prior=set_prior ("normal (0, 6)"), family="categorical",
                     n.chains=3, n.cluster=3)

And I can do:

predict (br1)

which works, but

predict (br1, x2.train)

results in an error

Error in if (max(standata$ncat) == 2) message("Only 2 levels detected so that family     'bernoulli' might be a more efficient choice.") : 
  missing value where TRUE/FALSE needed

with stack traceback:

1: predict(br1, xyz)
2: predict.brmsfit(br1, xyz)
3: amend_newdata(newdata, fit = object, re_formula = re_formula, allow_new_levels =     allow_new_levels)
4: make_standata(fit$formula, data = newdata, family = fit$family, autocor =        
    fit$autocor, partial = fit$partial, newdata

I do notice that FLAG has no NA's in predict.brmsfit, but is entirely NA's in amend_newdata, which I think might be how the predict works, but might also be wrong.

Wide CI for prediction of zero-inflated beta family

Here's my brm call:

brm(formula = d ~ p + (1 || item), 
    data = training,
    family = "zero_inflated_beta",
    chains = 2,
    cluster = 2,
    iter = 1000, 
    warmup = 300) -> zib_fit

The model runs with no problems, and it seems like it is converged (Rhat = 1, the two chains mix well, and posterior distribution of parameters look similar and normal-like). Here's the summary of zib_fit:

Family: zero_inflated_beta (logit) 
Formula: d ~ p + (1 || item) 
   Data: training (Number of observations: 8971) 
Samples: 2 chains, each with iter = 1000; warmup = 300; thin = 1; 
         total post-warmup samples = 1400
   WAIC: Not computed

Random Effects: 
~item (Number of levels: 1330) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     1.43      0.07     1.29     1.58        195    1

Fixed Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -1.11      0.11    -1.32    -0.89        419 1.01
p             5.54      0.22     5.12     5.96        315 1.00

Family Specific Parameters: 
    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
phi     1.64      0.07     1.51      1.8        848    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

However, when I use the model to predict for new data:

testing_pred = predict(object = zib_fit, newdata = testing, probs = c(0.05, 0.95))

the predictions look like this

Estimate    Est.Error   5%ile   95%ile
1   0.03223522  0.17344704  0.00000000  0.00000000
2   0.2166267   0.3436864   0.0000000   0.9736827
3   0.09275476  0.27608277  0.00000000  0.98260804
4   0.08670104  0.26223131  0.00000000  0.91945058
5   0.09281734  0.27890228  0.00000000  0.98422182
6   0.2148495   0.3208814   0.0000000   0.9330208

Prediction interval is most of the times as large as the range of response (and sometimes zero)!
By the way, what is Est.Error?

Two more questions:

What would be appropriate priors for zero-inflated beta model?
Where can I find the estimate of the probability of zero in this model?

Summary function with waic = TRUE

In large models, the Summary function takes a very long time (and tons of memory) to run, with "waic = TRUE" option. Why?

Question regarding nested models

Let's say my data is such that I have 10 cats and within each cat, a variable number of subcats, and in each subcat, a variable number of observations. I've been writing my model like this so far:

brm(formula = y ~ x1 + x2 + x3 +
                 (1 | cat:subcat) + (1 | cat),
    data = df,
    family = "gaussian")

What if I write it like this?

brm(formula = y ~ x1 + x2 + x3 +
                 (1 | subcat) + (1 | cat),
    data = df,
    family = "gaussian")

Is this still nested? Do I need to specify cat:subcat at all?

Allow estimating missing values in data

Currently, rows in data that contain missing values (i.e., NAs) are removed and thus completely ignored, because Stan does not allow NAs in the data. However, it would certainly be nice to have a JAGS like behavior of estimating missing responses automatically.

One option is to add some Stan code for estimating missing responses as described in Section 8 of the Stan 2.9.0 Reference Manual. Another possibility is to use the predict method with newdata within the brm function to estimate missing responses after fitting the model. The former would have the advantage that users could easily take the Stan code dealing with missing data and generalize it beyond what brms offers. The latter would be somewhat cleaner overall and also much easier to implement.

Ability to specify (Stan) types and priors for nonlinear parameters

As far as I understand, the nonlinear parameters are supposed to be real scalars (or positve scalars for the RE standard deviations).

I recently stumbled upon a case where I would have loved to specify a custom type and size : I wanted to model the effect of some variable T as monotonous (i. e either constantly increasing or constantly decreasing). The conventional solution is to use the "ordinal factor" type of R and an orthogonal polynomial basis ; this is valid for "equally spaced" levels of the ordinal variable and gives difficut-to-interpret results.

An obvious solution is to factor the effect as the product of a "shape" parameter and a size parameter. This is trivially done in Stan by declaring the shape as a simplex and the size as a real. Stan runs on my example gave me reasonable (and enlightening !) answers.

As far as I understand, this cannot be done in brms, for lack of ability :

to declare the shape as a simplex,
to pass the size of the simplex as a parameter
and to give it a"reasonable prior.

I feel that this could be a reasonable and useful addition to brms capabilities.

Install problem

Great software, but the latest version does not upgrade or install for me, even clean. The error seems to arise in this section:

1112: #include <rstan/rstaninc.hpp>
1113: /**
1114: * Define Rcpp Module to expose stan_fit's functions to R.
1115: */
1116: RCPP_MODULE(stan_fit4model19b4ffa5ef5_gaussian_identity__brms_model_mod){
1117: Rcpp::class_<rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model,
1118: boost::random::ecuyer1988> >("stan_fit4model19b4ffa5ef5_gaussian_identity__brms_model")
1119: // .constructorRcpp::List()
1120: .constructor<SEXP, SEXP>()
1121: // .constructor<SEXP, SEXP>()
1122: .method("call_sampler",
1123: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::call_sampler)
1124: .method("param_names",
1125: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::param_names)
1126: .method("param_names_oi",
1127: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::param_names_oi)
1128: .method("param_fnames_oi",
1129: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::param_fnames_oi)
1130: .method("param_dims",
1131: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::param_dims)
1132: .method("param_dims_oi",
1133: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::param_dims_oi)
1134: .method("update_param_oi",
1135: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::update_param_oi)
1136: .method("param_oi_tidx",
1137: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::param_oi_tidx)
1138: .method("grad_log_prob",
1139: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::grad_log_prob)
1140: .method("log_prob",
1141: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::log_prob)
1142: .method("unconstrain_pars",
1143: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::unconstrain_pars)
1144: .method("constrain_pars",
1145: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::constrain_pars)
1146: .method("num_pars_unconstrained",
1147: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::num_pars_unconstrained)
1148: .method("unconstrained_param_names",
1149: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::unconstrained_param_names)
1150: .method("constrained_param_names",
1151: &rstan::stan_fit<model19b4ffa5ef5_gaussian_identity__brms_model_namespace::model19b4ffa5ef5_gaussian_identity__brms_model, boost::random::ecuyer1988>::constrained_param_names)
1152: ;
1153: }
1154:
1155: // declarations
1156: extern "C" {
1157: SEXP file19b42a4479df( ) ;
1158: }
1159:
1160: // definition
1161:
1162: SEXP file19b42a4479df( ){
1163: return Rcpp::wrap("gaussian(identity) brms-model");
1164: }
1165:
1166:
Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! Warning message:
running command 'make -f "C:/PROGRA~~1/R/R-32~~1.3/etc/x64/Makeconf" -f "C:/PROGRA~~1/R/R-32~~1.3/share/make/winshlib.mk" -f "C:/Users/Alien/Documents/.R/Makevars" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="file19b42a4479df.dll" WIN=64 TCLBIN=64 OBJECTS="file19b42a4479df.o"' had status 127
Error : unable to load R code in package 'brms'
ERROR: lazy loading failed for package 'brms'

Allow to compute pointwise linear predictor values

Right now, the linear predictor is computed for all observations at once using matrix algebra, which is reasonable fast, because we can rely on the C code of the base package.

However, this approach leads to problems for models containing many observation, as the resulting linear predictor matrix is of dimension Nsamples x Nobservations, which will be pretty big for a few thousand samples and probably hundreds of thousands of observations. Thus, it would be good to have an option of computing the linear predictor (and related results) separately for each observation.

In fact, this is already possible using the newdata argument but desperately slow, as the code is not optimized for this scenario (e.g., the same checks are repeated multiple times).

Perform some data transformations inside the Stan code

This is a specific use-case that may not rank highly, but I hope to demo brms and Bayesian modeling in a lecture format and would love to be able to have brm generate Stan code that I could show without having to wait for Stan/C++ to compile it, then for it to be fit (which will be shorter than the Stan/C++ compile).

So I'm thinking something like a stancode_only=TRUE argument would be a nice thing to have for this situation and general learning. (Not sure how it should be handled. On the one hand, brm will output a structure of type brmsfit which stancode takes as an argument. But a brmsfit that's not a fit doesn't make sense. Maybe stancode could accept brmsfit or formula and if offered a formula would do what I'm suggesting.)

Make a "tidy" method for brmsfit objects

See https://github.com/dgrtwo/broom.

Difficulty with specification of random (linear) effects in nonlinear models.

I have trouble unterstanding how to add a nonlinear effect to an (already working) mixed linear model.

Working example : this compiles and gives sensible results :

brm(log(Duree)~Classe+Arcade+ReTt+NoTf+NoWO+(1|Etudiant), data=brmsData)

First attempt :

brm(log(Duree)~Classe+Arcade+ReTt+NoTf+NoWO+(1|Etudiant)+(-log(km+(1-km)*exp(-betaExp*Exp))),
data=brmsData, nonlinear=list(km~1,betaExp~1),prior=c(set_prior("cauchy(0,3)",nlpar="km"),
set_prior("cauchy(0,3)",nlpar="betaExp")))
Erreur : Random effects in non-linear models should be specified in the 'nonlinear' argument.

Well... second attempt :

brm(log(Duree)~Classe+Arcade+ReTt+NoTf+NoWO+(1|Etudiant)+(-log(km+(1-km)*exp(-betaExp*Exp))),
data=brmsData, nonlinear=list(km~1,betaExp~1,1~Etudiant),
prior=c(set_prior("cauchy(0,3)",nlpar="km"),set_prior("cauchy(0,3)",nlpar="betaExp"),
set_prior("cauchy(0,10)",nlpar="1")))
Erreur : Random effects in non-linear models should be specified in the 'nonlinear' argument.

Ah.

Added an "Intercept" column of ones to the dataset. Then :

brm(log(Duree)~0+Intercept+Classe+Arcade+ReTt+NoTf+NoWO+log(km+(1-km)*exp(-betaExp*Exp)),
data=brmsData, nonlinear=list(km~1,betaExp~1,Intercept~Etudiant),
prior=c(set_prior("cauchy(0,3)",nlpar="km"),set_prior("cauchy(0,3)",nlpar="betaExp"),
set_prior("cauchy(5,3)", nlpar="Intercept")))
SYNTAX ERROR, MESSAGE(S) FROM PARSER:


ERROR at line 35

 33:      for (n in 1:N) { 
 34:        // compute non-linear predictor 
 35:        eta[n] <- (0 + eta_Intercept[n] + C[n, 1] + C[n, 2] + C[n, 3] + C[n, 4] + C[n, 5] + log(eta_km[n] +); 
                                                                                                               ^
 36:      } 

PARSER EXPECTED: <expression>
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file1f1447d26961' due to the above error.

And I'm stuck... The error seems unrelated to the previous errors, but I'm not sure.

Any ideas ?

Implement multivariate student-t models

Using the multi_student_t function class of Stan, it shouldn't be too difficult to implement multivariate student-t models, given that the R code for multivariate normal models is already in place.

What families have ARMA effect implemented?

When I try Poission or Negative Binomial families, it seems like ARMA effect is not yet implemented for them. I was wondering, what families have ARMA effects implemented?

Error: ARMA effects for family negbinomial are not yet implemented

Slowness compared to rstanarm

I love brms, and am currently writing a blog post about it. Just discovered rstanarm, which is similar, but brms is better in most every case. The one thing that it lacks is that it seems to be slower to run. When I do a logistic regression on the last two Iris classifications, rstanarm runs in about 7 or 8 seconds from the time I hit return, while brms takes 30-50 seconds.

Not a show-stopper, and I imagine the gap is mainly due to fixed-cost overhead, so matters less and less on larger problems, but it's noticeable. Also, if both brms and rstanarm are loaded, brm also appears to run a bit more slowly than it does when that's not the case. (The latter isn't an important use case, except that folks might have both loaded when comparing them.)

Restrict fixed effects parameters when using bounded priors

I don't manage to use uniform priors in brm. Consider this (admittedly quite silly) example. Is there something I missing or is that a bug?

library(brms)
data("cars")
lm(dist ~ speed, data = cars)
### Call:
### lm(formula = dist ~ speed, data = cars)
###
### Coefficients:
### (Intercept)        speed  
###     -17.579        3.932 

test1 <- brm(dist ~ speed, data = cars, family = gaussian(link = "identity"), prior = set_prior("normal(0, 10)", class = "b")) # works
test2 <- brm(dist ~ speed, data = cars, family = gaussian(link = "identity"), prior = set_prior("uniform(2, 4)", class = "b", coef = "speed")) # doesn't work

Compiling the C++ model
starting worker pid=14386 on localhost:11575 at 13:39:28.187
starting worker pid=14412 on localhost:11575 at 13:39:28.570

SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1).
  [1] "Error : Rejecting initial value:"                                                                     
  [2] "  Log probability evaluates to log(0), i.e. negative infinity."                                       
  [3] "  Stan can't start sampling from this initial value."                                                 
  [4] "Rejecting initial value:"                                                                             
  [5] "  Log probability evaluates to log(0), i.e. negative infinity."                                       
  [6] "  Stan can't start sampling from this initial value."                                                                                                                       
...                                    
[300] "  Stan can't start sampling from this initial value."                                                 
[301] "Initialization between (--2, -2) failed after 100 attempts. "                                         
[302] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."
error occurred during calling the sampler; sampling not done
here are whatever error messages were returned
[[1]]
Stan model 'gaussian(identity) brms-model' does not contain samples.

[[2]]
Stan model 'gaussian(identity) brms-model' does not contain samples.

Warning message:
In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug

Add support for zero-inflated and hurdle models

The following models should be implemented:

Additional tasks:

add documentation

Implement offsets

Offsets are a basic feature of many model fitting functions such as lm, glm, or glmer.

Therefore, it would be nice to also have them in brms.

Autoregressive ordered probit models

A user asked if it is possible to model autoregressive effects of the latent response variable in an ordinal (cumulative) model. The model formulation can be found here: https://mediatum.ub.tum.de/doc/1097570/file.pdf

Modeling autoregressive effects of the observered ordered response is already possible using the cor_arr function, but it would be a great addition to allow autoregressive effects also for the latent variable.

Add full support for multivariate models

Up to now, multivariate responses are only supported if they are assumed to be (multi-) normally distributed.

It is not yet possible to incorporate multiple responses with different error distributions in the same model similar to what MCMCglmm does.

Before actually writing new code for this feature, it needs to be clarified how the dependencies of the responses are most appropriately and efficiently modeled in Stan.

Autoregressive effects of residuals

Autoregressive (AR) effects as they are currently implemented refer to effects of the response on itself. Some other packages (such as nlme) use the AR term to refer to autoregression of the residuals.

Thus, it would be nice to have AR effects of the residuals in addition to AR effects of the response.

Spatial correlation structures

We should think about implementing spatial correlations similar to the correlation structures that can be modeled with nlme.

Add Phylogenetic Correlation Structures

I would like to do comparative analysis with brms but it needs phylogenetic correlation structures.

Some examples at ?ape::corClasses

Question regarding prediction of new factors in multilevel models

Let's say I have such a model:

brm(formula = y ~ x1 + x2 + x3 +
                 (1 | cat:subcat) + (x1 + x2 + x3 || cat),
    data = df,
    family = "gaussian")

I know that the predict function does not support having new factors. But how can I get the best estimate for a new "subcat", given that it belongs to a given "cat"? I imagine since we already have estimated the posterior distribution for "cat:subcat" random variable, we should be able to use its mean value for a new "subcat".

Error Compiling

Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! In file included from file39a401aeace.cpp:826:
In file included from /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rstan/include/rstan/rstaninc.hpp:3:
In file included from /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rstan/include/rstan/stan_fit.hpp:80:
/Library/Frameworks/R.framework/Versions/3.2/Resources/library/rstan/include/rstan/io/rlist_ref_var_context_factory.hpp:4:10: fatal error: 'stan/interface/var_context_factory/dump_factory.hpp' file not found

include <stan/interface/var_context_factory/dump_factory.hpp>

1 error generated.
make: *** [file39a401aeace.o] Error 1
In addition: Warning message:
running command '/Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB file39a401aeace.cpp 2> file39a401aeace.cpp.err.txt' had status 1

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] brms_0.5.0 ggplot2_1.0.1 rstan_2.7.0-1 inline_0.3.14 Rcpp_0.12.1

loaded via a namespace (and not attached):
[1] digest_0.6.8 MASS_7.3-44 grid_3.2.2 plyr_1.8.3
[5] gtable_0.1.2 stats4_3.2.2 magrittr_1.5 scales_0.3.0
[9] stringi_0.5-5 reshape2_1.4.1 devtools_1.9.1 proto_0.3-10
[13] tools_3.2.2 stringr_1.0.0 munsell_0.4.2 parallel_3.2.2
[17] abind_1.4-3 colorspace_1.2-6 memoise_0.2.1 gridExtra_2.0.0

Automated marginal effect plots

As discussed via email, it would be cool to get automatic marginal effect plots to explore models quickly.

In some cases my lack of knowledge about brms internals and formula programming is a bit crippling (ie. not yet a perfect procedure to choose which variables to include in newdata object, nothing yet when random effects aren't to be marginalised).
I also summarise ordinal outcomes myself in a way that may be "impure" and if it's ok, should probably go in the predict method.

Here's what I have so far:
https://gist.github.com/rubenarslan/80ecc20be0e0600b41dc

Implement 2PL item response models

Using the multivariate formula syntax currently in use for multivariate linear models as well as zero-inflated and hurdle models, it should be relatively easy to specifiy (binomial) 2PL IRT models.
The Stan code for it shouldn't be too difficult either.

Improve sampling speed of models containing only fixed effects

It appears that there is some room for improvement with regard to sampling speed when fitting models containing only fixed effects (see also #24). From what I have seen, this seems to be an issue primarily for highly correlated predictors, but it might be as well more general.