drizopoulos / jmbayes2

Extended Joint Models for Longitudinal and Survival Data

Home Page: https://drizopoulos.github.io/JMbayes2/

R 63.25% C++ 32.96% Stan 0.03% C 0.06% HTML 3.70%
mixed-models longitudinal-analysis survival-models multi-state competing-risks prediction-model personalized-medicine precision-medicine

jmbayes2's Introduction

JMbayes2: Extended Joint Models for Longitudinal and Time-to-Event Data


The package JMbayes2 fits joint models for longitudinal and time-to-event data. It can accommodate multiple longitudinal outcomes of different types (e.g., continuous, dichotomous, ordinal, counts) under different distributions, i.e., Gaussian, Student's t, Gamma, Beta, unit Lindley, censored Normal, Binomial, Poisson, Negative Binomial, and Beta-Binomial. For the event-time process, right-, left-, and interval-censored data can be handled, while competing-risks, multi-state, and recurrent-event processes are also covered.

JMbayes2 fits joint models using Markov chain Monte Carlo algorithms implemented in C++. Besides the main modeling function, the package also provides a number of functions to summarize and visualize the results.

Installation

JMbayes2 can be installed from CRAN:

install.packages("JMbayes2")

The development version can be installed from GitHub:

# install.packages("remotes")
remotes::install_github("drizopoulos/jmbayes2")

Minimal Example

To fit a joint model in JMbayes2, we first fit the mixed-effects models for the longitudinal outcomes and a Cox or accelerated failure time (AFT) model for the event process separately. The mixed models need to be fitted with the lme() function from the nlme package or the mixed_model() function from the GLMMadaptive package. The Cox or AFT model needs to be fitted with the coxph() or survreg() function from the survival package. The resulting model objects are then passed as arguments to the jm() function, which fits the corresponding joint model. We illustrate this procedure for a joint model with three longitudinal outcomes using the PBC dataset:

# Cox model for the composite event death or transplantation
pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)

# a linear mixed model for log serum bilirubin
fm1 <- lme(log(serBilir) ~ year * sex, data = pbc2, random = ~ year | id)

# a linear mixed model for the prothrombin time
fm2 <- lme(prothrombin ~ year * sex, data = pbc2, random = ~ year | id)

# a mixed effects logistic regression for ascites
fm3 <- mixed_model(ascites ~ year + sex, data = pbc2,
                   random = ~ year | id, family = binomial())

# the joint model that links all sub-models
jointFit <- jm(CoxFit, list(fm1, fm2, fm3), time_var = "year",
                n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)
summary(jointFit)

jmbayes2's People

Contributors

drizopoulos, gpapageorgiou, pedromafonso


jmbayes2's Issues

predict() fails if longitudinal data are not pre-sorted by both ID and time_var

Here is an example adapted from the JMbayes2 vignettes:

fm1 <- lme(log(serBilir) ~ ns(year, 3) * sex, data = pbc2,
           random = ~ ns(year, 3) | id, control = lmeControl(opt = 'optim'))

fm2 <- lme(prothrombin ~ ns(year, 2) * sex, data = pbc2,
           random = ~ ns(year, 2) | id, control = lmeControl(opt = 'optim'))

fm3 <- mixed_model(ascites ~ year * sex, data = pbc2,
                   random = ~ year | id, family = binomial())

pbc2.id$event <- as.numeric(pbc2.id$status != "alive")
CoxFit <- coxph(Surv(years, event) ~ drug + age, data = pbc2.id)

jointFit <- jm(CoxFit, list(fm1, fm2, fm3), time_var = "year",
               control = list(n_iter = 1000L,   # TOTAL number of iterations (includes n_burnin)
                              n_burnin = 700L,  # burn-in iterations
                              n_thin = 10L))

t0 <- 5
ND <- pbc2[pbc2$id %in% c(25, 93), ]
ND <- ND[ND$year <= t0, ]
ND$status2 <- 0
ND$years <- t0

## Scramble the rows in 'ND' ... if we skip this step, predict() works as intended
ND <- ND[sample(1:nrow(ND), nrow(ND), FALSE), ]

predLong1 <- predict(jointFit,
                     newdata = ND,
                     type = "mean_subject",
                     type_pred = "response",
                     return_newdata = TRUE,
                     n_samples = 15)

Error in simulate_REs(Data, mcmc, control) :
Mat::elem(): incompatible matrix dimensions: 2x1 and 5x1
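A possible workaround sketch, assuming the failure is caused purely by the row ordering of the new data (which is what the issue title suggests): sort the new data by the grouping and time variables before calling predict().

# sort by subject id and the time variable, then predict again
ND <- ND[order(ND$id, ND$year), ]
predLong1 <- predict(jointFit, newdata = ND,
                     type = "mean_subject", type_pred = "response",
                     return_newdata = TRUE, n_samples = 15)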

Heterogeneous Longitudinal Residuals

Does JMbayes2 accommodate heterogeneous residual variance in the longitudinal model -- e.g., residuals that are larger for men than for women? I am implementing this in nlme::lme() as follows:

weights = varIdent(form = ~ 1 | Sex)

If JMbayes2 does accommodate this, where do I find the posterior estimates of the group-specific variance terms in the model output?

Thank you
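For reference, a minimal sketch of the lme() specification the question refers to, using the pbc2 data shipped with JMbayes2 (sex-specific residual variances); whether jm() carries these variance terms through to the joint model is exactly what is being asked here.

library(JMbayes2)  # loads nlme and the pbc2 data
fm_het <- lme(log(serBilir) ~ year * sex, data = pbc2,
              random = ~ year | id,
              weights = varIdent(form = ~ 1 | sex))
# group-specific variance ratios estimated by lme()
fm_het$modelStruct$varStruct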

Dynamic predictions: longitudinal outcome axis scale issue.

Hi,

I have found a possible issue with plot.predict_jm(): currently it uses the range of values from the predict function for the ylim values. I have a biomarker with a valid range of 1-50000 (or 1-11 on the log scale), and I use the log scale of that variable in the joint model. On the log scale, plot.predict_jm() produces a plot that is easy to interpret. However, if I use fun_long = exp to plot on the original biomarker scale, the resulting ylim makes the plot hard to interpret.

It would be good if you could override the ylim for the longitudinal model(s) by specifying a custom range, so the plot looks like the corrected version attached (exp-fixed).

Many Thanks.

Default baseline hazard knots format and placement when Bsplines_degree > 0L

Hi {JMbayes2} team - thank you for the impressive work on this package!

Related to #24, but I was noticing strange results (and totally different compared to {JM}) when using the baseline hazards that were not piecewise exponential. It was with an applied dataset where events only started occurring after 1 month (and maximum follow-up was 6 months, with administrative censoring). I tried to reproduce a similar set up based on the pbc2 data below:

set.seed(20220715)
library(JM)
#> Loading required package: MASS
#> Loading required package: nlme
#> Loading required package: splines
#> Loading required package: survival
library(JMbayes2)
#> Loading required package: GLMMadaptive
#> 
#> Attaching package: 'GLMMadaptive'
#> The following object is masked from 'package:MASS':
#> 
#>     negative.binomial
#> 
#> Attaching package: 'JMbayes2'
#> The following object is masked from 'package:MASS':
#> 
#>     area

# Suppose events only start after 3 years
ids_3y <- with(pbc2.id, id[which(years >= 3)])
pbc2.id_3y <- subset(pbc2.id, id %in% ids_3y)
pbc2_3y <- subset(pbc2, id %in% ids_3y)

# Fit submodels - second example from JM::jointModel()
lmeFit <- lme(log(serBilir) ~ year * drug, random = ~ year | id, data = pbc2_3y)
coxFit <- coxph(Surv(years, status2) ~ drug, data = pbc2.id_3y, x = TRUE)

# Fit with JM
pbc_JM <- jointModel(
  lmeObject = lmeFit,
  survObject = coxFit,
  method = "spline-PH-aGH",
  timeVar = "year",
  ord = 3L, # quadratic B-splines
  lng.in.kn = 9L # internal knots, so 10 "base_hazard_segments"
)

# Check knots (boundary knots repeated ord = 3L times)
pbc_JM$control$knots
#> $`1`
#>  [1]  0.01347527  0.01347527  0.01347527  4.21202497  5.18946446  5.73978754
#>  [7]  6.51626328  7.21443434  8.19789727  8.98368196 10.03121235 12.16679444
#> [13] 14.30566203 14.30566203 14.30566203

# Fit JMbayes2 with defaults
pbc_JMbayes2_m1 <- jm(
  Mixed_objects = lmeFit,
  Surv_object = coxFit,
  time_var = "year",
  Bsplines_degree = 2L, # quadratic, the default
  base_hazard_segments = 10L # also default
)

# Check knots
pbc_JMbayes2_m1$control$knots
#> $`1`
#>  [1]  0.9237761  2.0389333  3.1540905  4.2692476  5.3844048  6.4995619
#>  [7]  7.6147191  8.7298762  9.8450334 10.9601906 12.0753477 13.1905049
#> [13] 14.3056620 15.4208192 16.5359763

# Fit JMbayes2 with knots as formatted by JM
pbc_JMbayes2_m2 <- jm(
  Mixed_objects = lmeFit,
  Surv_object = coxFit,
  time_var = "year",
  knots = pbc_JM$control$knots,
  Bsplines_degree = 2L
)

# Compare coefficients
cbind.data.frame(
  "JM" = coef(pbc_JM, "Event"),
  "JMbayes2 (own knots)" = c(
    coef(pbc_JMbayes2_m1)$gammas,
    coef(pbc_JMbayes2_m1)$association
  ),
  "JMbayes2 (JM knots)" = c(
    coef(pbc_JMbayes2_m2)$gammas,
    coef(pbc_JMbayes2_m2)$association
  )
)
#>                       JM JMbayes2 (own knots) JMbayes2 (JM knots)
#> drugD-penicil 0.09590339            0.0196705          0.07466035
#> Assoct        1.09028319            0.6905448          1.12973032

Created on 2022-07-15 by the reprex package (v2.0.1)

So beyond the results being very different, there is also:

  1. Some of the knots from pbc_JMbayes2_m1 are placed outside of the follow-up window. If you try the same code with Bsplines_degree = 3L, the first knot will even be placed at a negative timepoint!
  2. The format of the knots (I think created here) does not seem to be the one required when passed to splineDesign() here. Based on what JM::jointModel() does, it would seem you need the boundary knots to be repeated Bsplines_degree + 1L times. Somehow though, things still work out with the defaults if you have data with events starting from the beginning of follow-up (if you re-run the above example with the original pbc2 datasets, results are the same for all three models..)

I am using the latest version of {JM}, and the latest development version of {JMbayes2}. Thanks in advance!
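As a side note on the knot-format point above, here is a minimal, hedged illustration (plain splines::splineDesign(), not JMbayes2 internals) of why the boundary knots need to be repeated ord = degree + 1 times for the B-spline basis to be evaluable over the whole follow-up window:

library(splines)
x <- seq(0, 10, length.out = 5)
inner <- c(2.5, 5, 7.5)
ord <- 3                                          # quadratic B-splines
knots_rep <- c(rep(0, ord), inner, rep(10, ord))
round(splineDesign(knots_rep, x, ord = ord), 3)   # evaluable at x = 0 and x = 10
# without repeating the boundary knots, evaluation at the boundaries fails
knots_plain <- c(0, inner, 10)
try(splineDesign(knots_plain, x, ord = ord))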

No MCMC samples for shared random effects?

Is it possible to access the MCMC samples for the shared random effects of a joint model?

I'm assuming that the "b" element of the "mcmc" list of a "jm" object would be the place to find them. However, this is the only element of the list that does not have the right dimensions.

I'm interested in the fitted values of the model over the different MCMC samples. In a different issue it says to construct them using the design matrices and posterior means. However, the "Mean" element of the "statistics" component of a "jm" object does not equal the "b" element, so is it just one random sample?

Thanks!

# Standard Example
pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)
fm1 <- lme(log(serBilir) ~ year * sex, data = pbc2, random = ~ year | id)
jointFit <- jm(CoxFit, list(fm1), time_var = "year",
                n_iter = 1200L, n_burnin = 200L, n_thin = 5L)
str(jointFit$mcmc$b)
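A hedged suggestion, based on the save_random_effects control flag that appears in a later issue on this page (I have not verified this against the documentation): the per-iteration random-effects draws may only be stored when that flag is switched on.

jointFit_b <- jm(CoxFit, list(fm1), time_var = "year",
                 control = list(n_iter = 1200L, n_burnin = 200L, n_thin = 5L,
                                save_random_effects = TRUE))
str(jointFit_b$mcmc$b)   # check whether it now has one slice per retained draw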

Question: Identifiability of Wlong_std_alphas and W_std_gammas

If I understand correctly, the log-hazard function at time t is:
log Hazard(t) = Association Effects at t + Baseline log Hazard at t + Survival Effects at t

As I understand the model outputs, both the association effects and the baseline hazard are modeled with a centering term:

  • Association Effect (t) = LongResponse(t) * (U(t) %*% alphas) - Wlong_std_alphas
  • Baseline log Hazard (t) = W0_H(t) %*% bs_gammas - W_std_gammas

Both of these centering terms are constants in the model.

My question is: how are Wlong_std_alphas and W_std_gammas made to be identifiable in estimation?

I understand that they have separate interpretations:

  • W_std_gammas is essentially an intercept term for the baseline hazard spline
  • Wlong_std_alphas "zeros out" the association effects so that they do not interfere with W_std_gammas

Unexpected behavior when specifying a narrow prior

Thank you in advance for considering my feedback.

I am wanting to fit a model where I restrict the prior on one of the longitudinal model betas.

[More specifically, I want to use the baseline value of the response as a predictor, but I want to use the response value itself in the association function. So, I would like the prior on the baseline covariate intercept term to be N(1, \epsilon).]

When I attempt to assign a narrow prior to one of the betas (we'll call it beta1):

  • The model converges much more slowly. Not a big surprise. In fact, it doesn't converge.
  • The standard deviation of the posterior samples for beta1 is indeed small, however
  • The posterior mean estimate for beta1 is nowhere close to the vicinity of the prior mean
  • Posterior estimates for several other parameters are also greatly affected (even in different longitudinal sub-models)

I have coded up an example using the competing risks vignette, which I attach as a text file for reference. The only thing I have changed from the vignette is the prior on 'year' in the prothrombin sub-model.
MWE_Priors.txt

The vignette version returns the following outputs:

#> Longitudinal Outcome: log(serBilir) (family = gaussian, link = identity)
#>                   Mean  StDev    2.5%   97.5%      P   Rhat
#> (Intercept)     1.1528 0.1214  0.9134  1.3938 0.0000 1.0005
#> poly(year, 2)1 25.8670 2.8550 20.2399 31.3946 0.0000 1.0051
#> poly(year, 2)2  0.8610 2.1699 -3.4219  5.1293 0.6863 1.0018
#> drugD-penicil  -0.1588 0.1623 -0.4785  0.1566 0.3298 1.0004
#> p(,2)1         -1.8612 2.8932 -7.6179  3.7252 0.5235 1.0023
#> p(,2)2         -0.9272 2.4912 -5.7397  3.9439 0.7163 1.0000
#> sigma           0.3410 0.0153  0.3156  0.3747 0.0000 1.0041
#> 
#> Longitudinal Outcome: prothrombin (family = gaussian, link = identity)
#>                       Mean  StDev    2.5%   97.5%      P   Rhat
#> (Intercept)        10.6320 0.2094 10.2283 11.0407 0.0000 1.0000
#> year                0.2839 0.0769  0.1345  0.4360 0.0000 1.0002
#> drugD-penicil      -0.0914 0.2943 -0.6689  0.4837 0.7545 0.9999
#> year:drugD-penicil -0.0164 0.1024 -0.2184  0.1845 0.8700 1.0002
#> sigma               1.0820 0.0254  1.0367  1.1358 0.0000 1.0012

But the output from my example (using the same number of MCMC samples but with no expectation of converging any time soon) are:

Longitudinal Outcome: log(serBilir) (family = gaussian, link = identity)
                  Mean   StDev     2.5%   97.5%      P    Rhat
(Intercept)     8.4432 10.6819   0.6581 27.4133 0.0098 14.2345
poly(year, 2)1  8.1195 10.0260  -9.3468 20.8925 0.6495  6.3008
poly(year, 2)2 -1.6317  2.8288  -7.2137  4.0336 0.5223  1.1792
drugD-penicil  10.4412 15.3757  -0.6011 37.9708 0.8107 12.1046
p(,2)1          0.3375  5.1752 -10.0762  8.8311 0.8420  2.5454
p(,2)2         -1.3391  3.0385  -7.3115  4.6914 0.6405  1.1741
sigma          12.6094 17.2553   0.3212 42.7289 0.0000 15.1153

Longitudinal Outcome: prothrombin (family = gaussian, link = identity)
                        Mean   StDev      2.5%     97.5%      P   Rhat
(Intercept)           5.1614 10.2129   -7.9874   20.0370 0.7867 5.9491
year               1741.1538  0.0124 1741.1292 1741.1785 0.0000 1.0310
drugD-penicil       -12.5322 14.0377  -31.9819    6.7102 0.3557 6.7871
year:drugD-penicil   -3.8517  4.7470  -10.9456    3.4351 0.4045 5.9394
sigma              2758.4391 14.2312 2730.2376 2786.2170 0.0000 1.1861

In general, I think I understand that the sampler is not really built for this kind of application, and that the narrow variance on the beta1 prior is causing the sampler to explore the parameter space very inefficiently.

I guess the key issue that sticks with me -- and the point I want to bring to your attention -- is: how does the posterior sampling stray so far from the prior in the first place? The prior on 'year' for the prothrombin submodel has a mean of 0.244 and a variance of 0.001 (precision = 1000). It makes no sense for the MCMC sampler to be exploring values near 1741. I know from shorter runs of my example model that the 'year' parameter estimate goes astray very early in the posterior sampling (in the first 110 samples), but then the prior precision takes over and limits exploration of the parameter space for 'year'. It feels like the prior is not used for an initial cycle of 100+ samples, and by the time the prior precision takes over, it is too late: the MCMC sampler is in the wrong region of the parameter space.

And an important related question is: how do I ever know when a prior (even a default prior) is too specific?

Again, thank you for thinking about this. I am hopeful my feedback can help improve the MCMC sampler.
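One diagnostic sketch that may help show where the chain wanders early on. This assumes the installed version provides a ggtraceplot() method with a parm argument (the argument values below are an assumption, not taken from the documentation); jointFit_narrow is a placeholder for the fit with the narrow prior.

# trace plots of the fixed effects of the longitudinal submodels (assumed API)
ggtraceplot(jointFit_narrow, parm = "betas")
# and of the residual standard deviations
ggtraceplot(jointFit_narrow, parm = "sigmas")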

Extracting log-likelihood

The help document for "jm" says that an object of class "jm" will contain log-likelihood (conditional and marginal) for posterior samples. So, for example, for a fitted model named 'JMfit', the components JMfit$logLik and JMfit$mlogLik will contain the posterior likelihoods.

When I run the competing risks model from the vignette, the log-likelihoods are not part of the output.

Is there a way to retrieve the posterior log-likelihoods?

Coverage probability seems too high for fixed effects parameters of the longitudinal submodels

Hello,
I am trying to compare the available packages for joint models with multivariate longitudinal outcome and a terminal event through simulation studies.
I am getting surprising results with JMbayes2: the coverage probabilities for the fixed effects of the longitudinal submodels are very close to 1 instead of 0.95 (based on the 2.5% and 97.5% quantiles given in the summary of the jm() output). Am I doing something wrong?
Please find attached a code that illustrates the issue in a simple case (one marker with random intercept and slope and a terminal event) but I have similar results with multivariate longitudinal data.
Thanks,
Denis Rustand

library(joineR) # for data simulation
library(JMbayes2)
nmod <- 50  # number of models
nsub <- 100 # number of individuals

# store coverage of the true value (1) of each model parameter in "res_list"
# to compute coverage probabilities
res_list <- replicate(nmod, rep(NA, 4), simplify = FALSE)
for (i in 1:nmod) {
  # association between longitudinal and survival processes is set to zero
  data <- simjoint(nsub, model = "intslope", gamma = 0)

  CoxFit <- coxph(Surv(survtime, cens) ~ ctsx + binx, data = data$survival)
  fm1 <- lme(Y ~ time + ctsxl + binxl, random = ~ time | id, data = data$longitudinal)
  JMB <- jm(CoxFit, fm1, time_var = "time")
  JMB2 <- summary(JMB)

  # 1 if the true value is within the credible interval, 0 otherwise
  res_list[[i]][1] <- ifelse(1 >= JMB2$Outcome1["(Intercept)", "2.5%"] & 1 <= JMB2$Outcome1["(Intercept)", "97.5%"], 1, 0) # intercept
  res_list[[i]][2] <- ifelse(1 >= JMB2$Outcome1["time", "2.5%"] & 1 <= JMB2$Outcome1["time", "97.5%"], 1, 0) # slope
  res_list[[i]][3] <- ifelse(1 >= JMB2$Outcome1["binxl", "2.5%"] & 1 <= JMB2$Outcome1["binxl", "97.5%"], 1, 0) # binary covariate
  res_list[[i]][4] <- ifelse(1 >= JMB2$Outcome1["ctsxl", "2.5%"] & 1 <= JMB2$Outcome1["ctsxl", "97.5%"], 1, 0) # continuous covariate
}

Coverage <- rowMeans(matrix(unlist(res_list), ncol = nmod))
# (I only look at the fixed effects from the longitudinal submodel)
names(Coverage) <- c("Y_intercept", "Y_slope", "Y_binx", "Y_ctsx") # intercept, slope, binary covariate, continuous covariate
print(Coverage) # coverage probabilities

Error encountered with jm function when supplying a survreg model

I was trying out other types of survival models with your example and ran into this issue.

library(JMbayes2)
#> Warning: package 'JMbayes2' was built under R version 4.1.2
#> Loading required package: survival
#> Loading required package: nlme
#> Loading required package: GLMMadaptive
#> Warning: package 'GLMMadaptive' was built under R version 4.1.2
#> Loading required package: splines

# AFT model for the composite event death or transplantation
pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
AFT <- survreg(Surv(years, status2) ~ sex, data = pbc2.id, dist = "loglogistic")

# a linear mixed model for log serum bilirubin
fm1 <- lme(log(serBilir) ~ year * sex, data = pbc2, random = ~ year | id)

# a linear mixed model for the prothrombin time
fm2 <- lme(prothrombin ~ year * sex, data = pbc2, random = ~ year | id)

# a mixed effects logistic regression model for ascites
fm3 <- mixed_model(ascites ~ year + sex, data = pbc2,
                   random = ~ year | id, family = binomial())

# the joint model that links all sub-models
jointFit <- jm(AFT, list(fm1, fm2, fm3), time_var = "year",
               n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)
#> Error in checkForRemoteErrors(val): 3 nodes produced errors; first error: matrix multiplication: incompatible matrix dimensions: 4680x1 and 2x1
summary(jointFit)
#> Error in summary(jointFit): object 'jointFit' not found

Created on 2022-02-15 by the reprex package (v2.0.1)

Numerical differentiation

The numerical differentiation here

JMbayes2/R/help_functions.R

Lines 990 to 1007 in 5859d7e

if (direction_i == "both") {
    if (is.list(time)) {
        t1 <- lapply(time, function (t) t + eps_i)
        t2 <- lapply(time, function (t) pmax(t - eps_i, 0))
    } else {
        t1 <- time + eps_i
        t2 <- pmax(time - eps_i, 0)
    }
    e <- 2 * eps_i
} else {
    t1 <- time
    if (is.list(time)) {
        t2 <- lapply(time, function (t) pmax(t - eps_i, 0))
    } else {
        t2 <- pmax(time - eps_i, 0)
    }
    e <- eps_i
}

may give the wrong result when the point is close to zero. This is illustrated below:

test_func <- function(f, x, direction, eps = .001){
  if(direction == "both"){
    x1 <- x + eps
    delta <- 2 * eps
  } else if (direction == "backwards"){
    x1 <- x
    delta <- eps
  }

  x2 <- pmax(x - eps, 0)
  (f(x1) - f(x2)) / delta
}

f <- function(x) x^2 + x

test_func(f, 1:2, direction = "both") # c(2 * 1 + 1, 2 * 2 + 1)
test_func(f, 1:2, direction = "backwards") # ~ correct

test_func(f, 0, direction = "both") # wrong
test_func(f, 0, direction = "backwards") # wrong
test_func(f, 1e-5, direction = "both") # wrong
test_func(f, 1e-5, direction = "backwards") # wrong

The close to zero evaluation may happen if the time scale is small and there is no left-truncation. A fix can be something like:

fix_func <- function(f, x, direction, eps = .001){
  if(direction == "both"){
    x1 <- x + eps
    delta <- ifelse(x <= eps, eps, 2 * eps)
  } else if (direction == "backwards"){
    if(any(x <= eps))
      stop("'backwards' with times less than eps")
    x1 <- x
    delta <- eps
  }

  x2 <- ifelse(x <= eps, x, x - eps) # safe
  (f(x1) - f(x2)) / delta
}

fix_func(f, 1:2, direction = "both") # c(2 * 1 + 1, 2 * 2 + 1)
fix_func(f, 1:2, direction = "backwards") # ~ correct

fix_func(f, 0, direction = "both") # ~ correct
fix_func(f, 0, direction = "backwards") # error
fix_func(f, 1e-5, direction = "both") # ~ correct
fix_func(f, 1e-5, direction = "backwards") # error

The error can be replaced by forward numerical differentiation. An example where this may be an issue is given below. Perhaps it is something else (e.g. the default prior?) in which case I apologize!

library(JMbayes2)
sum(pbc2$year == 0) # quite a few zeros
#R> [1] 312
max(tapply(pbc2$year, pbc2$id, min)) # everybody has a time zero measurement
#R> [1] 0

# only keep three measurements per individual
pbc2 <- pbc2[order(pbc2$id, pbc2$year), ]
set.seed(1)
to_keep <- tapply(1:NROW(pbc2), pbc2$id, function(indices)
  if(length(indices) < 4) indices else
    c(indices[1], sample(indices, 2L)))
to_keep <- unlist(to_keep)
pbc2 <- pbc2[to_keep, ]

# fit on the original data
fm_org <- lme(fixed = log(serBilir) ~ year * sex, random = ~ year | id,
              data = pbc2)

fCox_org <- coxph(Surv(years, status2) ~ drug + age, data = pbc2.id)

fForms <- list("log(serBilir)" = ~ slope(log(serBilir)))

fit_org <- jm(fCox_org, fm_org, time_var = "year",
              functional_forms = fForms, n_chains = 1L,
              n_iter = 26000L, n_burnin = 1000L)

# squeeze the time scale
squeeze_fac <- max(pbc2$year) * 2
pbc2_squeeze <- transform(pbc2, year = year / squeeze_fac)
pbc2.id_squeeze <- transform(pbc2.id, years = years / squeeze_fac)

fm_squeeze <- lme(fixed = log(serBilir) ~ year * sex, random = ~ year | id,
                  data = pbc2_squeeze)

fCox_squeeze <- coxph(Surv(years, status2) ~ drug + age, data = pbc2.id_squeeze)

fit_squeeze <- jm(fCox_squeeze, fm_squeeze, time_var = "year",
                  functional_forms = fForms, n_chains = 1L,
                  n_iter = 26000L, n_burnin = 1000L)

# not that similar
summary(fit_org)$Survival
#R>                             Mean      StDev        2.5%       97.5%       P
#R> drugD-penicil        -0.05857478 0.32620558 -0.68630913  0.59358824 0.84824
#R> age                   0.06668607 0.01584273  0.03640867  0.09912225 0.00008
#R> slope(log(serBilir)) 11.24150868 1.56719955  8.37510852 14.42183870 0.00000
summary(fit_squeeze)$Survival
#R>                             Mean      StDev       2.5%     97.5%       P
#R> drugD-penicil        -0.06562505 0.33071585 -0.6943179 0.5990626 0.82672
#R> age                   0.06890149 0.01584006  0.0385639 0.1012742 0.00000
#R> slope(log(serBilir))  0.47809285 0.06831340  0.3465517 0.6176610 0.00000

# slope should have changed with a factor 1 / squeeze_fac
summary(fit_org)$Survival["slope(log(serBilir))", ] / squeeze_fac
#R>                           Mean      StDev      2.5%     97.5% P
#R> slope(log(serBilir)) 0.4482367 0.06248952 0.3339437 0.5750472 0
summary(fit_squeeze)$Survival["slope(log(serBilir))", ]
#R>                           Mean     StDev      2.5%    97.5% P
#R> slope(log(serBilir)) 0.4780929 0.0683134 0.3465517 0.617661 0

# the output from lme seems correct
rbind(fm_org$coefficients$fixed * c(1, squeeze_fac, 1, squeeze_fac),
      fm_squeeze$coefficients$fixed)
#R>      (Intercept)     year  sexfemale year:sexfemale
#R> [1,]   0.8088588 5.862404 -0.2818363      -2.880438
#R> [2,]   0.8088588 5.862406 -0.2818362      -2.880439

JMBayes2 results don't seem to match JMBayes results

Hi Dimirtris,
Great to see this new package. I was just testing it out and I found that results from JMBayes and JMBayes2 don't match when the same sub-models are used as arguments.

A reprex:
library(JMbayes)
lmeFit.aids <- lme(CD4 ~ obstime * drug, random = ~ obstime | patient, data = aids)
survFit.aids <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
JMB1_model <- jointModelBayes(lmeFit.aids, survFit.aids, timeVar = "obstime")
library(JMbayes2)
JMB2_model <- jm(survFit.aids, lmeFit.aids, time_var = "obstime")

Looking at the results:
[screenshots of the jointModelBayes and jm summary outputs]

As you can see, the results are quite different. I note that there are fewer parameters in JMbayes2, so perhaps I should not expect the same results? However, in my own data the Assoct parameter changed from a significantly negative number to a significantly positive number, so I thought this issue was worth raising.
Many thanks!

calibration_metrics Thoriz argument error

I appear to get an error message when explicitly specifying both Dt = NULL and a numeric Thoriz argument in the calibration_metrics function, or even just the Thoriz argument on its own without Dt.

calibration_metrics(JMB2,  newdata = newdata, Tstart=i, Dt=NULL, Thoriz=8)
Error in calibration_plot(object, newdata, Tstart, Dt = Dt, df_ns = df_ns,  : 
  either 'Thoriz' or 'Dt' must be non null.

I think this small bug is caused by the Thoriz argument not being passed on in the call to the calibration_plot function. I think explicitly adding this argument to line 408 would solve the issue.

    comps <- calibration_plot(object, newdata, Tstart, Dt = Dt, df_ns = df_ns, Thoriz=Thoriz,
                              plot = FALSE)

Many thanks,
Harry

question: interval censored outcomes?

Hi @drizopoulos!

This software is incredible. Thank you for developing it. JMbayes allows users to pass models for interval-censored outcomes fit with survreg into the joint model fitting function. Does JMbayes2 allow this? I apologize if this is stated in the documentation somewhere and I missed it.

Nested / multilevel random-effects query

Dear Prof Rizopoulos,

Is it possible for lme to handle multiple sources of random effects in a multi-level/nested random-effects model, say not only individuals but also different studies?

When I try to specify the mixed-effects model via lme I can either use the || notation or say random = list(Study = ~ 1, USUBJID = ~ ns(Time, 2)). When I go to fit the jm, I receive the error
Error in mat[id, ] <- b : number of items to replace is not a multiple of replacement length (fitting with only the individual ID term is fine)

I assume the jm function cannot handle nested random-effects at this time. What would you advise, keeping the Study variable as a main effect only in the lme and/or using some strata/cluster/frailty term in the survival submodel?

Many thanks,
Harry

Request: Matrix inputs

The jm() function cannot handle matrix input variables, such as generated using splines. It produces an error that the input dimensions do not match the dimensions of the object.

More specifically, consider three workflows using natural splines -- i.e., ns(variable, df):

  1. Dataset -> Fit coxph and longitudinal models using ns(variable, df) in the formula -> Run jm()
  2. Dataset -> Create spline variable using ns(variable, df) -> Fit coxph and longitudinal models using spline variable -> Run jm()
  3. Dataset -> Create spline variable using ns(variable, df) -> Fit coxph and longitudinal models without spline variable -> Run jm()

Workflow #1 runs just fine.
Workflow #2 returns multiple warnings about incorrect dimensions.
Workflow #3 also returns multiple warnings, in spite of the fact that the spline variable is not an input to the model.

Note: the lme() and coxph() submodels fit correctly. It is just jm() that appears to have an indexing problem.

The following is a MWE:

# Cox model for the composite event death or transplantation
pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
CoxFit <- coxph(Surv(years, status2) ~ sex + ns(albumin, 2), data = pbc2.id)

# a linear mixed model for log serum bilirubin
fm1 <- lme(log(serBilir) ~ year * sex + ns(albumin, 2), data = pbc2, random = ~ year | id)

# the joint model that links all sub-models
jointFit <- jm(CoxFit, fm1, time_var = "year",
               n_iter = 500L, n_burnin = 200L, n_thin = 5L)



# Cox model for the composite event death or transplantation
newpbc2id <- pbc2.id
newpbc2id$albuminSpl <- ns(newpbc2id$albumin, 2)
CoxFit <- coxph(Surv(years, status2) ~ sex + albuminSpl, data = newpbc2id)

# a linear mixed model for log serum bilirubin
newpbc <- pbc2
newpbc$albuminSpl <- ns(newpbc$albumin, 2)
fm1 <- lme(log(serBilir) ~ year * sex + albuminSpl, data = newpbc, random = ~ year | id)

# the joint model that links all sub-models
jointFit <- jm(CoxFit, fm1, time_var = "year",
               n_iter = 500L, n_burnin = 200L, n_thin = 5L)



# Cox model for the composite event death or transplantation
CoxFit <- coxph(Surv(years, status2) ~ sex, data = newpbc2id)

# a linear mixed model for log serum bilirubin
fm1 <- lme(log(serBilir) ~ year * sex, data = newpbc, random = ~ year | id)

# the joint model that links all sub-models
jointFit <- jm(CoxFit, fm1, time_var = "year",
               n_iter = 500L, n_burnin = 200L, n_thin = 5L)

ordinal and continuous time-dependent covariates with survival outcome in the same joint model

Hi @drizopoulos

Thank you for constantly continuing with the development of the JMbayes2 package. It has been extremely useful for my work.

I was wondering if there is a way to fit joint models for survival outcomes where one longitudinal outcome is ordinal and the others are either continuous or binomial? This would be without having to use the expanded dataset created to fit the continuation-ratio model for the ordinal outcome when fitting the models for the other longitudinal outcomes. I tried that and got an error stating that the same version of the dataset has to be used for each of the longitudinal sub-models.

Also, when I fit the joint model with a continuation ratio for the ordinal outcome I get the following error.

Error in mlogLik_jm(res_thetas, statistics$Mean[["b"]], statistics$post_vars, :
  log_det_sympd(): given matrix is not symmetric positive definite

Is there a way to correct for this error from either the mixed_model or the jm functions?

Any advice on this would be greatly appreciated.

Difference in coefficient values between jm and jointModelBayes

Hello!
First, I want to thank you for the great and helpful work!!
I used the JMbayes package before in a project, and now I have started using JMbayes2 since I need to use more than one longitudinal marker. Since I am using almost the same data in both projects, I have noticed that the jm function gives results similar to jointModelBayes, but with a difference of a factor of 10, in opposite directions, for the current-value and cumulative association structures. For the current value, using jointModelBayes I got α = 0.13, whereas using jm α = 0.013; the value obtained by jointModelBayes is the one closer to what is known in the literature in my case. For the cumulative association structure, using jointModelBayes α = 0.01, while using jm α = 0.18.
I would appreciate your feedback on this problem.

Error in predict function in JMbayes2

Thank you , Here is the error

predLong <- predict(jFit_CR, newdata = ND, return_newdata = TRUE, times = seq(6.5, 15, length = 25))
Error in split.default(x, g) : first argument must be a vector

predEvent <- predict(jFit_CR, newdata = ND, return_newdata = TRUE, process = "event")
Error in split.default(x, g) : first argument must be a vector

Residual plots

Hi,

How can I plot residuals of a jm model (JMbayes2)?

Thanks

Estimated p-value is either 0.000 or 0.6667

Hi, I used the JMbayes2 jm function to fit a joint model (n_iter = 20000L, n_burnin = 12000L, n_thin = 5L), and the estimated p-values in the survival model were all either 0.000 or 0.667. I was wondering if there is anything wrong with the model. Thank you so much!

Strange results with piecewise constant baseline hazard

Dear Professor Rizopoulos,

I have been using the joint modeling packages you have developed a lot, firstly the JM package and currently the JMbayes2 package. I would like to thank you for the important work you have done in this research area.

However, I have faced a problem with the JMbayes2 package due to the baseline hazard estimation. I have fitted joint models with the baseline hazard set as piecewise constant with 2 knots (3 hazard segments; implemented by adding control = list(Bsplines_degree = 0, base_hazard_segments = 3)). The results were plausible as long as I used one particular endpoint, but when I started fitting models with a different endpoint, all the results (depending on the exposure variable) became very strange.

To be clear, I am investigating the association between different longitudinally measured nutritional variables and the risk of type 1 diabetes. This problematic endpoint (type 1 diabetes, T1D) compared to the first one I used with plausible behavior (islet autoimmunity, IA) has

  • fewer events (n=89/N=5673 for T1D vs. n=243/N=5523 for IA),
  • the first event does not occur until after 0.79 years of follow-up (vs. after 0.1 years for IA), and
  • all censored observations occur at the end of follow-up, based on data originating from the register (vs. immediately after the start of follow-up for IA).

For example, the obtained hazard ratios (95% CI) for energy intake are 23.2 (19.0, 28.2) with piecewise constant, 2.18 (1.71, 2.78) with piecewise linear, and 1.46 (1.04, 2.02) with piecewise quadratic baseline hazard with 3 hazard segments. The last one agrees with the result obtained with jointModel function from the JM-package, where piecewise constant baseline hazard with knots at 1.75 and 3.75 years (in 5.75-year follow-up) was used.

However, I noticed that JMbayes2 may fit the piecewise constant baseline hazard between the first and the last censored or observed event time (ignoring the time before the first event?), since the boundary knots of the baseline hazard seem to be placed at 0.79 and 5.75 years for the type 1 diabetes outcome. Here are the outputs relating to the baseline hazards of the piecewise constant, piecewise linear, and piecewise quadratic models (notice that the follow-up starts from time point 0):

1. Piecewise constant baseline hazard (with 2 knots)

summary(jmod_diab_ENERC_total_MJ_cons3)$control$knots$`1`
[1] 0.7876454 2.4417636 4.0958818 5.7500000
jmod_diab_ENERC_total_MJ_cons3$statistics$Mean$bs_gammas
bs_gammas_1 bs_gammas_2 bs_gammas_3
  -9.439852  -11.944761  -14.200867

2. Piecewise linear baseline hazard (with 2 knots)

summary(jmod_diab_ENERC_total_MJ_lin3)$control$knots$`1`
[1] -0.8664727  0.7876454  2.4417636  4.0958818  5.7500000  7.4041182
jmod_diab_ENERC_total_MJ_lin3$statistics$Mean$bs_gammas
bs_gammas_1 bs_gammas_2 bs_gammas_3 bs_gammas_4
  -6.573006   -6.478030   -6.723222   -7.118328

3. Piecewise quadratic baseline hazard (with 2 knots)

summary(jmod_diab_ENERC_total_MJ_quadr3)$control$knots$`1`
[1] -2.5205909 -0.8664727  0.7876454  2.4417636  4.0958818  5.7500000  7.4041182  9.0582364
jmod_diab_ENERC_total_MJ_quadr3$statistics$Mean$bs_gammas
bs_gammas_1 bs_gammas_2 bs_gammas_3 bs_gammas_4 bs_gammas_5
  -6.670896   -6.299837   -6.161903   -6.251574   -6.398656

But if I manually change the endpoint time of one child (censored or observed; this was done for a child having exposure data only from the first measurement at time 0, so I did not modify the exposure data) to time 0.01, I obtain a more rational result of HR: 1.40 (95% CI: 1.01, 1.88), and the same baseline hazard output for the piecewise constant hazard becomes:

summary(jmod_diab_ENERC_total_MJ_cons3_test1)$control$knots$`1`
[1] 0.010000 1.923333 3.836667 5.750000
jmod_diab_ENERC_total_MJ_cons3_test1$statistics$Mean$bs_gammas
bs_gammas_1 bs_gammas_2 bs_gammas_3
  -6.409399   -6.210152   -6.125433

So, my question is: is the baseline hazard estimation going wrong with the piecewise constant hazard? Is it ignoring the time before the first observed or censored endpoint in the estimation?

Thank you in advance for the help!

With best regards, Essi Syrjälä
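A hedged workaround sketch, consistent with the knots argument used in other issues on this page: supply the knots explicitly so the first segment starts at time 0 rather than at the first event time. The submodel names below (CoxFit_T1D, lmeFit_T1D) and the time variable are placeholders for the user's actual models.

jmod_cons3_knots0 <- jm(CoxFit_T1D, lmeFit_T1D, time_var = "time",
                        Bsplines_degree = 0L,
                        knots = list(c(0, 1.916667, 3.833333, 5.75)))
summary(jmod_cons3_knots0)$control$knots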

Confidence interval for tvAUC

Hi Dimitris,
It would be super useful to have the option to return not only the AUC from the tvAUC function, but also a 95% CI. Do you have any plans of adding this in the future? I am comparing the performance of two models, but it is quite difficult to judge the difference between the AUCs without knowing their precision. I guess I could do a bootstrap as the simplest solution, but wanted to hear your thoughts before trying that.
Cheers, Adam
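Until such an option exists, a rough subject-level bootstrap sketch along the lines the poster mentions; jointFit, newdata, the id column, and the Tstart/Dt values are placeholders, and the 'auc' element name of the tvAUC() result is an assumption.

set.seed(1)
boot_auc <- replicate(200, {
  ids <- sample(unique(newdata$id), replace = TRUE)
  nd_b <- do.call(rbind, lapply(seq_along(ids), function (k) {
    d <- newdata[newdata$id == ids[k], ]
    d$id <- k   # give each resampled subject a unique id
    d
  }))
  tvAUC(jointFit, newdata = nd_b, Tstart = 5, Dt = 3)$auc
})
quantile(boot_auc, c(0.025, 0.975))   # percentile 95% interval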

Difference in estimates from longitudinal model for mixed_model() and output jm()

Dear Dimitris Rizopoulos,

I have noticed that the estimates for the longitudinal model fitted with mixed_model() are different from the ones in the output after jm().

Example from the reference manual:


# [1] Fit the mixed model using lme().
fm1 <- lme(fixed = log(serBilir) ~ year * sex + I(year^2) +
             age + prothrombin, random = ~ year | id, data = pbc2)
summary(fm1)

# [2] Fit a Cox model, specifying the baseline covariates to be included in the
# joint model.
fCox1 <- coxph(Surv(years, status2) ~ drug + age, data = pbc2.id)
# [3] The basic joint model is fitted using a call to jm() i.e.,
joint_model_fit_1 <- jm(fCox1, fm1, time_var = "year",
                        n_chains = 1L, n_iter = 11000L, n_burnin = 1000L
                        )
summary(joint_model_fit_1)

Questions:

  1. I am assuming this difference is due to the frequentist versus Bayesian nature of the models. Which model estimates should I report? The ones from mixed_model() or the longitudinal outcome from the jm() output?
  2. In the aforementioned example the estimates from both models are quite similar, but in the model I am working on they differ wildly. For example, I get an intercept of -7.85 with mixed_model() -- which did converge -- but an intercept of 102.35 in the longitudinal-outcome output after fitting the joint model with jm(). Something similar happens for the effect of time. Could this be because the joint model did not converge, and what approaches could I take in that case? Or could it be due to something else?

I tried to keep my example as short as possible but I can of course provide more details. It's more a general question about the differences in estimates than my example per se.

Kind regards,

Els

tvROC output

This is a question on the interpretation of the output of the tvROC function in the dynamic predictions vignette. In the example the ROC output presents cut-offs, sensitivity, specificity, along with some other accuracy metrics.

In this example, those metrics are given for each cut-off threshold. In row 39, the asterisk indicates the optimal Youden index, i.e., giving equal weight to sensitivity and specificity and maximising both. Is the interpretation here that the 'optimal' cut-off which maximises the model's discriminative ability is a risk threshold of 0.73, using longitudinal information up to 5 years and predicting at 8 years? Additionally, this cut-off column is a threshold on the predicted event risk and not on the biomarker -- is this correct?

Can you also explain the interpretation of the qSN, qSP, and qOverall indexes?

Thank you

what does pos_ylab_long mean ?

I am trying to control the position of the y-axis labels between the two y-axes; however, I don't know what the parameters mean. I have tried lots of different numbers, and it doesn't work.

Could you tell me what numbers I should use?

outcome 1: values limits 20 -100

outcome 2: values limits 0.2 - 10

Thanks very much
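A hedged guess, based on the call pos_ylab_long = c(0.1, 20) that appears in another issue on this page: the vector seems to give one label position per longitudinal outcome, expressed in that outcome's own units. For outcomes spanning 20-100 and 0.2-10, the midpoints would then be roughly as below (predLong and predEvent are placeholders for your prediction objects).

plot(predLong, predEvent, outcomes = 1:2, pos_ylab_long = c(60, 5))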

Question / Issue: Calculating conditional longitudinal predictions (when some subjects have missing data)

Thank you in advance!

Context:

  • I want to produce mean posterior residuals for the longitudinal response.
  • Conditional predictions using predict() do not match those I hand-calculate using: y^{pred} = X\beta + Zb
  • For some subjects, my longitudinal data table only has a missing value for the response. This is because some subjects have no data after baseline, and baseline is a predictor variable. I do not know whether this is complicating issues, but I mention it in case.

Problem:

Predictions from predict() do not match predictions that I hand-calculate as: y^{pred} = X\beta + Zb, where I am using the mean posterior estimates for b. I know that predict() is drawing from MCMC samples, but I think this approach should be okay for my purpose (finding mean posterior residuals). Am I missing something?
While I wonder whether predict() is correctly calculating predicted values, I have reached my limit in R coding as I try to trace the difference in methods.
I should note that the problem with my actual data/model is more extreme than in the attached MWE.

Other Issue:

Am I using the correct technique to fit the model given the missing responses?

  • I prep the data by taking all responses (including baseline), and then I replaced baseline responses with NAs.
  • In fitting the lme() submodel, I use na.action = na.exclude
  • This ensures that all subjects (with their predictors) appear in the longitudinal dataset, even though some have no observed longitudinal responses.

Again, thank you for your response.

MWE (R script saved as .txt):
MWE_predict.txt

Wide variation in jm's performance

Dear Drizopoulos,

I am trying to investigate the wide variation in jm's performance that a user of our HPC system reported.
The differences in run time across environments are very large and, at least to me and other colleagues, this should not happen.

Below I list the run times for the different tests I have done, followed by some details of the jm call.

User laptop: Corei9 32GB - Reported that jm runs in 4h
HPC : Intel Xeon Platinum 48 cores (no Hyperthreading) - Slurm as job scheduler and allocate resources via cgroup
Test with 4 cores 64GB RAM - 22h
Test with 16 cores 128GB RAM - I killed after 22h of execution
Drained a node to run without slurm: Entire node of 48 cores and 762GB RAM - I killed after 24h (it was still running)

To try to simulate the user laptop, I have run in cloud as well:
r4 instance with 1 core and 32GB RAM - Finished in 5h
r4 instance with 2 cores (2 threads per core) - Finished in 17.8h

The run times are very different even for servers with similar configurations. Also, it seems that when the number of cores increases, the runtime increases a lot. Even when given an entire server, it takes a long time to run compared with the user's laptop, and the Intel Xeon Platinum is a better processor than the Core i9.

I am not an expert in R or in JM. My guess is that it is something like a cost function that varies at each step of each execution.

The code being run is:
MCMC summary:
chains: 4
iterations per chain: 27500
burn-in per chain: 2500

pacman::p_load(JMbayes2, splines, nlme, survival, dplyr, install = FALSE)
E607 <- readRDS("objects/E607.rds")
CHHiP <- E607 %>%
  select(trialno, PSAValue_tdc, Tmax, cens_endpt_NonCR0_1, tstart, phase, age, arm,
         ISUP_gleasGG, clint_cat_2, hthintd_2) # %>%
print(head(CHHiP))

ID_E607 <- readRDS("objects/ID_E607.rds")
SuC_01 <- readRDS("objects/SuC_01.rds")
LME_full_ME_noPhase_pdDiag_ML <- readRDS("objects/LME_full_ME_noPhase_pdDiag_ML.rds")

(fForms <- readRDS("objects/fForms.rds"))

JMB2_tstart5_v0.23_GitHub <- jm(Surv_object = SuC_01,
                                Mixed_objects = LME_full_ME_noPhase_pdDiag_ML,
                                time_var = "tstart", functional_forms = fForms,
                                n_iter = 27500L, n_burnin = 2500L, n_chains = 4L)

saveRDS(JMB2_tstart5_v0.23_GitHub, "objects/JMB2_tstart5_v0.23_GitHub.rds")

The same code was run on the different servers listed above.
The data and code were on local fast NVMe storage, and the script was executed with the command below:
Rscript --vanilla JMB2_tstart5_v0.23_GitHub.r --tmp=./

R and library versions:
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] dplyr_1.0.8 JMbayes2_0.2-0 GLMMadaptive_0.8-5 nlme_3.1-155
[5] survival_3.2-13

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8 magrittr_2.0.3 MASS_7.3-55 tidyselect_1.1.2
[5] munsell_0.5.0 colorspace_2.0-2 lattice_0.20-45 R6_2.5.1
[9] rlang_1.0.2 fansi_1.0.3 parallel_4.1.0 grid_4.1.0
[13] gtable_0.3.0 pacman_0.5.1 utf8_1.2.2 cli_3.2.0
[17] coda_0.19-4 ellipsis_0.3.2 matrixStats_0.61.0 tibble_3.1.6
[21] lifecycle_1.0.1 crayon_1.5.1 Matrix_1.4-0 gridExtra_2.3
[25] purrr_0.3.4 ggplot2_3.3.5 vctrs_0.4.0 glue_1.6.2
[29] compiler_4.1.0 pillar_1.7.0 generics_0.1.2 scales_1.1.1
[33] pkgconfig_2.0.3

Any information on why there is such a huge difference in runtime, even when using a better CPU and more cores, would be much appreciated.
I know some libraries are designed for specific CPUs, numbers of cores, etc.; I am not sure about jm.

Kind Regards
Jader
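A hedged aside: if the installed version exposes a cores control option for jm() (this is an assumption on my part; check ?jm), pinning the number of worker processes explicitly may make timings more comparable across machines, since Slurm/cgroup CPU limits and hyper-threading can otherwise interact badly with parallel chains.

JMB2_fit <- jm(Surv_object = SuC_01, Mixed_objects = LME_full_ME_noPhase_pdDiag_ML,
               time_var = "tstart", functional_forms = fForms,
               n_iter = 27500L, n_burnin = 2500L, n_chains = 4L,
               control = list(cores = 4L))   # assumed control option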

Error when save_random_effects = TRUE and missing longitudinal data

First of all... JMBayes2 is awesome. Thank you! It is really useful!

Issue:
If some subjects have no data for the longitudinal response, I can still fit a model in JMBayes2.
However, if I also specify save_random_effects = TRUE, then jm() returns the following error (see MWE below):

Error in dim(x) <- sapply(dnames_b, length) :
dims [product 1555] do not match the length of object [1560]

To create a MWE, I modified the code from the competing risks vignette (https://drizopoulos.github.io/JMbayes2/articles/Competing_Risks.html) by replacing longitudinal responses for Subject 1 with NA. Then, in the longitudinal models, I set na.action to na.exclude.

pb_exclude <- pbc2 %>% mutate(prothrombin = replace(prothrombin, id == 1, NA),
                              serBilir = replace(serBilir, id == 1, NA))

fm1_exclude <- lme(log(serBilir) ~ poly(year, 2) * drug,
                   data = pb_exclude,
                   random = ~ poly(year, 2) | id,
                   na.action = na.exclude)
fm2_exclude <- lme(prothrombin ~ year * drug,
                   data = pb_exclude,
                   random = ~ year | id,
                   na.action = na.exclude)

jFit_CR <- jm(CoxFit_CR, list(fm1_exclude, fm2_exclude), time_var = "year",
              functional_forms = CR_forms,
              control = list(n_iter = 250L,
                             n_burnin = 150L,
                             n_thin = 1L,
                             save_random_effects = TRUE))

Note: the model fits fine if save_random_effects = FALSE.

jm() with only random intercept possible?

Dear Dimitris Rizopoulos,

I have noticed that mixed_model() with only a random intercept returns an error: jm() does not currently work when you have a single longitudinal outcome and only random intercepts.

However, in the jm() function there is the argument zero_ind: a list with integer vectors indicating which coefficients are set to zero in the calculation of the value term. This can be used to include for example only the random intercept; default is NULL.

Could you give an example of how to include only the random intercept term?

Many thanks in advance.

Kind regards,

Els

jm iteration progress bar

Is there scope to add a progress bar to the fitting of jm, as there was in JMbayes::jointModelBayes(..., control = list(verbose = TRUE)), or a way to see/print on the fly how many MCMC iterations have been performed at periodic intervals?

Thanks, Harry

Sum of cumulative risk more than one for dynamic prediction.

Hello!
I have been experimenting with your newly added dynamic predictions for competing risks and have results I do not understand. When extending the prediction time to 25 in the example provided, the sum of the cumulative risks adds up to more than one.
Here is the code I use:
predLong <- predict(jFit_CR, newdata = ND, return_newdata = TRUE, times = seq(6.5, 25, length = 25))
predEvent <- predict(jFit_CR, newdata = ND, return_newdata = TRUE, process = "event", times = seq(6.5, 25, length = 25))
plot(predLong, predEvent, outcomes = 1:2, pos_ylab_long = c(0.1, 20))

sessioninfo:
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

other attached packages:
[1] JMbayes2_0.2-0 GLMMadaptive_0.8-2 nlme_3.1-151 survival_3.1-12

Best regards
Anders Norvik

survival no events in the interval

Hello
I am very interested in your package for looking at biomarker(s) associated with death. I have a cohort of participants who all died between 50 and 360 days of follow-up. For those, I have ~50 markers with heterogeneity in sampling time.
I was able to run your joint model as described in https://drizopoulos.github.io/JMbayes2/articles/Dynamic_Predictions.html
but I fail when running the survival analyses:

roc <- tvROC(jointFit, newdata = mydata, Tstart = t0, Dt = 60, cores = 1L)

where t0 = 300 days (some participants died after 330-360 days). I have tried multiple values of t0 and Dt...

Error message:
Error in tvROC.jm(jointFit, newdata = marker_df_wide2, Tstart = t0, Dt = 60, :
  it seems that there are no events in the interval [Tstart, Thoriz).

I would be happy to share the data via mp if you are interested
thank you!
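A quick pre-check sketch (base R only): count how many events actually fall in (Tstart, Tstart + Dt] before calling tvROC(). The column names event_time and died below are placeholders for the variables in 'mydata'.

t0 <- 300; Dt <- 60
with(mydata, sum(event_time > t0 & event_time <= t0 + Dt & died == 1))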

Errors in tvROC

Hi
I tried to evaluate my model using tvROC, and it shows the following error message.

fit_lme1 <- lme(log(TOTCHL) ~ Time + log(AGE),random = ~ Time | ID_d, data = data.train)
fit_lme2 <- mixed_model(SMOKER ~ Time + log(AGE), random = ~ Time | ID_d, data = data.train,family = binomial())
fit_lme3 <- mixed_model(HXDIAB ~ Time + log(AGE), random = ~ Time | ID_d, data = data.train, family = binomial())
fit_lme4 <- lme( log(HDLCHL) ~ Time + log(AGE), random = ~ Time | ID_d, data = data.train )
fit_lme5 <- lme( log(SBP) ~ Time + log(AGE) + I(RXHYP), random = ~ Time | ID_d, data = data.train)
fit_cox <- coxph(Surv(LENYASCVD, ASCVD) ~ log(AGE) , data = data.train )
fit_jm_wm <- jm(fit_cox, list(fit_lme1, fit_lme2, fit_lme3, fit_lme4, fit_lme5), time_var = "Time", n_chains = 1L)
> tvROC(fit_jm_wm, newdata = data.test, Tstart = 5, Dt = 10, cores = 1L)
Time-dependent Sensitivity and Specificity for the Joint Model fit_jm_wm
At time: 15
Using information up to time: 5 (422 subjects still at risk)
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

> head(data.train )
ID_d LENYASCVD ASCVD AGE Time RXHYP HXDIAB SMOKER TOTCHL HDLCHL SBP
1 4 16.766667 0 46 0.0 0 0 1 206.0013 39.52000 145.3333
2 4 16.766667 0 46 2.8 0 1 1 188.0012 47.00029 149.3333
3 4 16.766667 0 46 6.2 0 1 1 210.0000 40.00000 136.0000
4 7 7.666667 1 61 0.0 0 0 0 179.0011 39.52000 184.0000
5 7 7.666667 1 61 2.7 0 0 0 177.0011 44.00027 162.0000
6 7 7.666667 1 61 7.1 0 0 0 163.0000 36.00000 194.0000

> head(data.test)
ID_d LENYASCVD ASCVD AGE Time RXHYP HXDIAB SMOKER TOTCHL HDLCHL SBP
7 23 6.666667 0 63 0.0 0 0 1 172.0011 54.92800 113.3333
8 23 6.666667 0 63 2.7 0 0 1 150.0009 51.00031 106.6667
9 23 6.666667 0 63 6.0 0 0 1 155.0000 75.00000 128.0000
15 37 11.000000 1 62 0.0 1 0 0 158.0010 36.63100 116.0000
16 37 11.000000 1 62 2.7 1 0 1 168.0010 47.00029 136.0000
17 37 11.000000 1 62 5.8 1 0 1 177.0000 57.00000 238.6667

Using random slopes but with fixed intercept

Hi! I was wondering what happens behind the scenes when running a model with random slopes but a fixed intercept? The results from {JM} and {JMbayes2} are quite different in the example below using the pbc2 data:

set.seed(202207151)
library(JM)
#> Loading required package: MASS
#> Loading required package: nlme
#> Loading required package: splines
#> Loading required package: survival
library(JMbayes2)
#> Loading required package: GLMMadaptive
#> 
#> Attaching package: 'GLMMadaptive'
#> The following object is masked from 'package:MASS':
#> 
#>     negative.binomial
#> 
#> Attaching package: 'JMbayes2'
#> The following object is masked from 'package:MASS':
#> 
#>     area

# Fit submodel with random slopes but fixed intercept
lmeFit <- lme(log(serBilir) ~ ns(year, 2) * drug, random = ~ 0 + ns(year, 2) | id, data = pbc2)
coxFit <- coxph(Surv(years, status2) ~ drug, data = pbc2.id, x = TRUE)

# JM fit
pbc_JM <- jointModel(
  lmeObject = lmeFit,
  survObject = coxFit,
  method = "piecewise-PH-aGH",
  timeVar = "year",
  lng.in.kn = 10L
)

# JMbayes2 fit
pbc_JMbayes2 <- jm(
  Mixed_objects = lmeFit,
  Surv_object = coxFit,
  time_var = "year",
  Bsplines_degree = 0L,
  knots = list(c(0, pbc_JM$control$knots, max(pbc2.id$years) + 1))
)

# Compare coefficients
cbind.data.frame(
  "JM" = coef(pbc_JM, "Event"),
  "JMbayes2" = c(
    coef(pbc_JMbayes2)$gammas,
    coef(pbc_JMbayes2)$association
  )
)
#>                       JM   JMbayes2
#> drugD-penicil 0.01240778 -0.0389509
#> Assoct        1.34275739  0.7221391

Created on 2022-07-15 by the reprex package (v2.0.1)

The coefficients for the survival submodels are different, but so too are the variance-covariance matrices of the random effects (despite having the same dimensions). I also used splines in the example above because using a linear random slope with a fixed intercept yields an error:

# Using just linear random slope
update(
  object = pbc_JMbayes2,
  Mixed_objects = update(lmeFit, fixed. = . ~ year * drug, random = ~ 0 + year | id)
)
#> Error in jm(Surv_object = coxFit, Mixed_objects = update(lmeFit, fixed. = . ~ : jm() does not currently work when you have a single longitudinal outcome and only random intercepts.

Note I get the same results as model pbc_JM using rstanarm::stan_jm() as below:

library(rstanarm)
options(mc.cores = 3)

pbc_stan <- stan_jm(
  formulaLong = log(serBilir) ~ ns(year, 2) * drug + (0 + ns(year, 2) | id),
  dataLong = pbc2,
  formulaEvent = survival::Surv(years, status2) ~ drug,
  dataEvent = pbc2.id,
  time_var = "year",
  basehaz = "piecewise",
  basehaz_ops = list("knots" = pbc_JM$control$knots),
  chains = 3,
  cores = 3
)

Thanks again in advance!

question JMbayes2 package

Hi drizopoulos
Should the JMbayes2 package have two longitudinal variables? Would the output of the following code be wrong?

modelcox <- coxph(Surv(f.time, ckd) ~ gender + age + tg + fbs, data = data.wide, model = TRUE)
fit.lm <- lme(TSH ~ time, random = ~ time | id, data = dataLong)
ffm <- list("TSH" = ~ value(TSH) + slope(TSH))
fit.jm <- jm(modelcox, fit.lm, functional_forms = ffm, time_var = "time", n_iter = 10000L)

thanks
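If the intent is to include two longitudinal markers, the pattern from the package README above is to pass a list of mixed models. A hedged sketch, with FT4 a placeholder name for a second marker in the same dataset:

fit.lm2 <- lme(FT4 ~ time, random = ~ time | id, data = dataLong)
ffm2 <- list("TSH" = ~ value(TSH) + slope(TSH), "FT4" = ~ value(FT4))
fit.jm2 <- jm(modelcox, list(fit.lm, fit.lm2), functional_forms = ffm2,
              time_var = "time", n_iter = 10000L)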

Error in checkForRemoteErrors(val) : 19 nodes produced errors

I am trying to fit the joint model to a large dataset with more than 130k data lines, and I get the following error:
Error in checkForRemoteErrors(val) :
19 nodes produced errors; first error: addition: incompatible matrix dimensions: 18620x1 and 20501x1
Calls: jm ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Execution halted

Do you know how I can solve the problem? Thank you so much!

Cannot install the package ("JMbayes")

Unable to install the package ("JMbayes") in R 4.1.3 on macOS 12.3 or Windows 11.
Error: package or namespace load failed for 'JMbayes': .onLoad failed in loadNamespace() for 'rjags'.

Possible ID bug

Hi. I think I found a small bug that occurs when the longitudinal grouping variable is named anything other than id. I made a reproducible example:

library(JMbayes)
library(JMbayes2)

# let's imagine the ID variable is named something else, e.g. "UID"
pbc2$UID <- pbc2$id
# remove the original
pbc2$id <- NULL

lmefit <- lme(fixed = log(serBilir) ~ year * sex, random =  ~ year | UID, data = pbc2)
coxfit <- coxph(Surv(years, status2) ~ drug + age, data = pbc2.id, x = TRUE, model = TRUE)

# JMbayes runs fine
#jm1 <- jointModelBayes(lmefit, coxfit, timeVar = "year")
# JMbayes2 gets confused
jm2 <- jm(coxfit, lmefit, time_var = "year")

Output: (screenshot of the resulting error, 2021-01-18)

Error when running predict() following the example in the competing risks vignette

Thank you for the great work/article published on 2021-12-19.

When running the example according to this article,

predLong <- predict(jFit_CR, newdata = ND, return_newdata = TRUE,
                    times = seq(6.5, 15, length = 25))
predEvent <- predict(jFit_CR, newdata = ND, return_newdata = TRUE,
                     process = "event")

It shows: Error in split.default(x, g) : first argument must be a vector

According to the discussion in JMbayes, performing dynamic predictions from a competing-risks mvJMbayes object is not supported, so I am very excited that this is implemented in JMbayes2. It would be wonderful to be able to run through the example code and then apply it to other data.

BTW, the JMbayes2 version I used is 0.1.8.
Thank you very much for your time.

function 'Rcpp_precious_remove' not provided by package 'Rcpp'

Dear @drizopoulos
Thanks for your JMbayes2.
I updated to 0.1-7; however, there is an error:

jointFit1 <- jm(CoxFit, fm1, time_var = "year")
Error in logLik_jm(thetas, model_data, model_info, control) : 
  function 'Rcpp_precious_remove' not provided by package 'Rcpp'

I also reinstalled Rcpp, but the error persists.
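In case it helps, this is roughly the reinstall sequence I tried (the advice usually given for this error, which as far as I understand appears when a package was built against a newer Rcpp than the one currently loaded):

# update Rcpp itself (the 'Rcpp_precious_remove' symbol was added in Rcpp 1.0.7)
install.packages("Rcpp")
# then reinstall JMbayes2 and restart the R session before refitting
install.packages("JMbayes2")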
Best,
Jian-Guo

non-linear mixed models for longitudinal data?

Hi Dimitris and team,

first of all, thanks for sharing this nice package with the community; the interface looks really elegant and flexible!

One question: we are often interested in fitting parametric non-linear models for the longitudinal data (for better interpretability, extrapolation, etc.). Any chance that the nlme() function could eventually be supported in JMbayes2 in addition to lme()?
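To make the request concrete, here is a sketch of the kind of longitudinal fit we have in mind (the data set, variable names, and starting values below are purely illustrative):

library(nlme)

# an exponential-decay trajectory for a biomarker, fitted with nlme() rather
# than lme(); 'longData', 'conc', 'time', and 'id' are illustrative names
fm_nl <- nlme(conc ~ A * exp(-k * time),
              data = longData,
              fixed = A + k ~ 1,
              random = A + k ~ 1 | id,
              start = c(A = 10, k = 0.5))

# ideally an object like fm_nl could then be passed to jm() together with a
# coxph() fit, in the same way an lme() fit is passed now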

Thanks for your thoughts,
cheers
Daniel

Missing values and NaN's not allowed if 'na.rm' is FALSE

I tried to run a joint model with the package "JMbayes2" and it returned this error:

missing values and NaN's not allowed if 'na.rm' is FALSE
Calls: jm ... summary.mcmc.list -> t -> apply -> FUN -> quantile.default
Execution halted

But I do not have any missing values in my data set. I also tried other datasets and they ran well. I was wondering what could trigger this error when there are no missing data. Thank you so much!

Suggestion for input check

Dear @drizopoulos ,

The code implicitly assumes that the data set for the longitudinal model is already ordered by idVar and time_var. This is relevant when the data set contains missing values as in the following example:

# Cox model for the composite event death or transplantation
pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)

# a linear mixed model for log serum cholesterol, fitted on randomly ordered rows
set.seed(123)
pbc2_ro <- pbc2[sample(seq_len(nrow(pbc2)), nrow(pbc2)), ]
fm1 <- lme(log(serChol) ~ year * sex, data = pbc2_ro, random = ~ year | id,
           na.action = na.omit)

# the joint model throws an error
jointFit <- jm(CoxFit, list(fm1), time_var = "year",
               n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)

Error in mat[id, ] <- b :
number of items to replace is not a multiple of replacement length

The information about which observations to omit comes from the longitudinal model (the na.action attribute refers to the unordered data) and is applied to the reordered data inside jm(). This messes up the idVar levels and leads to an incorrect reduction of unq_idL:

dataL <- dataL[order(idL, dataL[[time_var]]), ]

NAs_FE_dataL <- lapply(mf_FE_dataL, attr, "na.action")

JMbayes2/R/jm.R, lines 113 to 124 at commit 7ae452e:

idL <- mapply2(exclude_NAs, NAs_FE_dataL, NAs_RE_dataL,
               MoreArgs = list(id = idL))
idL[] <- lapply(idL, match, table = unq_id)
# the index variable idL_lp is to be used to subset the random effects of each outcome
# such that to calculate the Zb part of the model as rowSums(Z * b[idL_lp, ]). This
# means that for outcomes that miss some subjects, we recode the id variable from 1
# until n', where n' is the number of subjects available for the respective outcome
idL_lp <- lapply(idL, function (x) match(x, unique(x)))
# the unique values of idL is to be used in specifying which subjects have which outcomes
# this is relevant in the calculation of the log density / probability mass function
# for the longitudinal outcomes
unq_idL <- lapply(idL, unique)

Maybe you could include a more explicit check of this condition and a more informative error message?
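In the meantime, a workaround that seems to avoid the mismatch (given the cause described above) is to sort the longitudinal data by id and time before fitting the mixed model, so that the na.action attribute already refers to the row order that jm() produces internally. A minimal sketch using the objects from the example above (pbc2_sorted and fm1_sorted are just new names for the sorted versions):

# sort by id and time first, then fit the mixed model on the sorted data
pbc2_sorted <- pbc2_ro[order(pbc2_ro$id, pbc2_ro$year), ]
fm1_sorted <- lme(log(serChol) ~ year * sex, data = pbc2_sorted,
                  random = ~ year | id, na.action = na.omit)

# with the pre-sorted data the joint model runs without the error above
jointFit <- jm(CoxFit, list(fm1_sorted), time_var = "year",
               n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)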
Best, Alex

Error in plot.window: need finite 'ylim' values

Thank you for this great package.

I encountered the following error when the plot function is called with a prediction that includes a biomarker with NAs:

Error in plot.window(...) : need finite 'ylim' values

Example:
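# CoxFit below is assumed to be a coxph() fit for the event process, defined as
# in the package examples (its definition is omitted here to keep the example short)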

fm1 <- lme(log(serBilir) ~ ns(year, 3) * sex, data = pbc2,
           random = ~ ns(year, 3) | id, control = lmeControl(opt = 'optim'))
fm4 <- lme(log(serChol) ~ ns(year, 2) * sex, data = pbc2, na.action = na.exclude,
           random = ~ ns(year, 2) | id, control = lmeControl(opt = 'optim'))


jointFit <- jm(CoxFit, list(fm1, fm4), time_var = "year")

t0 <- 5
ND <- pbc2[pbc2$id %in% c(25, 93), ]
ND <- ND[ND$year < t0, ]
ND$status2 <- 0
ND$years <- t0

predLong2 <- predict(jointFit, newdata = ND,
                     times = seq(t0, 12, length.out = 51),
                     return_newdata = TRUE)

predSurv <- predict(jointFit, newdata = ND, process = "event",
                    times = seq(t0, 12, length.out = 51),
                    return_newdata = TRUE)

cols <- c('#F25C78', '#D973B5')
plot(predLong2, predSurv, outcomes = 1:2, subject = 93,
     fun_long = list(exp, exp),
     fun_event = function (x) 1 - x,
     ylab_event = "Survival Probabilities",
     ylab_long = c("Serum Bilirubin", "Serum Chol"),
     bg = '#132743', col_points = cols, col_line_long = cols,
     col_line_event = '#F7F7FF', col_axis = "white", 
     fill_CI_long = c("#F25C7880", "#D973B580"),
     fill_CI_event = "#F7F7FF80",
     pos_ylab_long = c(1.9, 1.9))

Thank you very much for your time.

Interaction beween longitudinal outcomes

Hello,

Many thanks for such an amazing package and for the way you help people understand it.

My question is about interactions between splines. I want to fit a bivariate joint model with two continuous longitudinal outcomes, each modelled with splines. How can I introduce an interaction term between these two outcomes? Should I fit a bivariate linear mixed model with an interaction before building the joint model?
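To make the setup concrete, here is roughly what I have so far, sketched on the pbc2 data (serBilir and prothrombin stand in for my two outcomes, and CoxFit is assumed to be a coxph() fit for the event process, as in the earlier examples); the interaction I am asking about is not yet included:

# the two longitudinal outcomes, each modelled with a natural cubic spline in time
fm1 <- lme(log(serBilir) ~ ns(year, 2) * sex, data = pbc2,
           random = ~ ns(year, 2) | id, control = lmeControl(opt = "optim"))
fm2 <- lme(prothrombin ~ ns(year, 2) * sex, data = pbc2,
           random = ~ ns(year, 2) | id, control = lmeControl(opt = "optim"))

# the bivariate joint model, so far without any interaction between the outcomes
jointFit <- jm(CoxFit, list(fm1, fm2), time_var = "year")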

Many thanks for reading,

Best,

Lisa

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) AND Error in quantile.default(newX[, i], ...)

Dear Prof Rizopoulos,

I have been reading and following the development of the joint model package that you have created with great enthusiasm. First, please allow me to say thank you for the work that you have done. I am a PhD student researching the application of the joint model in predicting end-stage kidney disease using clinicopathological data with death as a competing risk.

I have reviewed the JMbayes2 vignettes closely and modified my dataset and code to "mimic" the tutorial.

library preparation

libraries <- c("tidyverse", "data.table", "skimr", "mice", "JMbayes2")
installed <- installed.packages()[, "Package"]
new.libs <- libraries[!(libraries %in% installed)]
if (length(new.libs)) install.packages(new.libs, repos = "http://cran.csiro.au",
                                       dependencies = TRUE)
lib.ok <- sapply(libraries, require, character.only = TRUE)

survival model

train1_dtCR <- train1_dt %>%
  select(-time_eskd, -status_eskd) %>%
  mutate(status_compete = factor(status_compete, levels = c(0, 1, 2),
                                 labels = c("censored", "eskd", "death"))) %>%
  JMbayes2::crisk_setup(data = .,
                        statusVar = "status_compete",
                        censLevel = "censored",
                        nameStrata = "CR")

jm_surv_model <- coxph(Surv(time = time_compete, event = status2) ~
                         (sex + age_init + gn + gn * age_init) * strata(CR),
                       data = train1_dtCR)
summary(jm_surv_model)

mixed model

albcreat_model <- lme(albcreat_ratio ~ relyear + I(relyear^2) +
                        I(relyear^3) + age_init + gn + relyear:age_init +
                        relyear:gn, random = ~ relyear | id,
                      data = longtrain1_dt,
                      control = list(maxIter = 100, opt = "optim"))

albumin_model <- lme(albumin ~ relyear + I(relyear^2) + age_init + gn +
                       relyear:age_init + relyear:gn + sex + relyear:sex,
                     random = ~ relyear | id, data = longtrain1_dt,
                     control = list(maxIter = 100, opt = "optim"))

adjca_model <- lme(adjca ~ relyear + age_init + gn +
                     relyear:age_init + relyear:gn + sex + relyear:sex,
                   random = ~ relyear | id, data = longtrain1_dt,
                   control = list(maxIter = 100, opt = "optim"))

phosphate_model <- lme(phosphate ~ relyear + I(relyear^2) + age_init + gn +
                         relyear:age_init + relyear:gn + sex + relyear:sex,
                       random = ~ relyear | id, data = longtrain1_dt,
                       control = list(maxIter = 100, opt = "optim"))

bicarb_model <- lme(bicarb ~ relyear + I(relyear^2) + age_init + gn +
                      relyear:age_init + relyear:gn + sex + relyear:sex,
                    random = ~ relyear | id, data = longtrain1_dt,
                    control = list(maxIter = 100, opt = "optim"))

egfr_model <- lme(egfr ~ relyear + I(relyear^2) + age_init + relyear:age_init +
                    sex + relyear:sex, random = ~ relyear | id,
                  data = longtrain1_dt,
                  control = list(maxIter = 100, opt = "optim"))

joint model

base_CR_forms <- list(
  "albumin" = ~ value(albumin):CR,
  "albcreat_ratio" = ~ value(albcreat_ratio):CR,
  "bicarb" = ~ value(bicarb):CR,
  "adjca" = ~ value(adjca):CR,
  "egfr" = ~ value(egfr):CR,
  "phosphate" = ~ value(phosphate):CR)

joint_model_tangri <- jm(Surv_object = jm_surv_model,
                         Mixed_objects = list(albumin_model, bicarb_model, adjca_model,
                                              egfr_model, phosphate_model),
                         time_var = "relyear",
                         functional_forms = base_CR_forms,
                         n_iter = 25000L, n_burnin = 5000L, n_thin = 5L)

After approximately 12 hours of analysis, I got these error messages:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Error in quantile.default(newX[, i], ...) :
missing values and NaN's not allowed if 'na.rm' is FALSE

After reviewing Stack Overflow, the RStudio Community, and Google:

  1. There are no NAs, NaNs, or infinite values in my dataset (either the short, id-level version or the long version).
  2. In one RStudio Community thread, a similar issue in another package was resolved by adjusting the order in which the libraries are attached; JMbayes2 is the last library attached in my session.
  3. I saw a similar issue with JMbayes; the error remains even after installing your GitHub version.

Thank you in advance for your help and consideration.

Sincerely,
Daniel Christiadi
