lcmm's People

Contributors

bettinagruen · cecileproust-lima · marisdus · sambrilleman · vivianephilipps

lcmm's Issues

Using LMER fixed effects in LCMM

Hello Dr. Proust-Lima and lcmm-team,

I am building an LCMM model and trying to understand how to select the covariates for the model. Is it feasible to first select covariates using LMER with anova (add a candidate covariate, compare the log-likelihoods between models, retain it if the improvement is significant), and then use the same covariates when constructing the LCMM model?
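For concreteness, a minimal sketch of that workflow under stated assumptions (y, time, x1, id and dat are hypothetical names; this only illustrates the mechanics, not a recommended selection strategy):

library(lme4)
library(lcmm)

# Step 1: compare nested linear mixed models (both fitted by ML) with and
# without the candidate covariate, via a likelihood-ratio test.
m0 <- lmer(y ~ time + (time | id), data = dat, REML = FALSE)
m1 <- lmer(y ~ time + x1 + (time | id), data = dat, REML = FALSE)
anova(m0, m1)   # retain x1 if the improvement in log-likelihood is significant

# Step 2: carry the retained covariates over to the latent class model.
m_lcmm <- hlme(y ~ time + x1, mixture = ~ time, random = ~ time,
               subject = "id", ng = 2, data = dat)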

I would be very grateful for any advice or links on covariate selection. The examples that I find online do not go into detail on how the covariates are selected, but are focused on the definition and usage of the covariates. Thank you.

Regards,
Nikhil.

Interpretation of output - # parameters, df for max. log-likelihood, best goodness of fit statistic

Dear Dr. Proust-Lima,

We are currently trying to classify participants on daily mood questionnaires using lcmm (thank you so much for creating such a helpful tool for analysis).

As we are relatively new to this type of analysis, we had a few quick questions regarding our output:

  1. Is there a way to find out the number of degrees of freedom that the model used? Or what is the best way to interpret the significance of the maximum log-likelihood statistic?
  2. What do the “number of parameters” refer to in the statistical model output? (Is this in any way related to the number of levels in our dependent categorical variable?)
  3. Is there any methodological best practice for which Goodness-of-fit statistic to use for choosing the appropriate number of classes per model (i.e. AIC vs BIC vs Log-likelihood)?
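For reference, a small sketch of where these quantities appear in lcmm output (m1 and m2 stand for two hypothetical fitted models): the number of estimated parameters is reported as npm, and the log-likelihood and BIC can be compared side by side with summarytable().

# npm is the number of estimated parameters; the log-likelihood and BIC are
# reported alongside it for each candidate model.
summarytable(m1, m2)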

Thank you in advance!

Unable to specify B vector in hlme using certain sampling functions

In the following code, specifying B using certain commands throws the error "The model specified in B should be of class hlme". The B vectors all have the same length (6) and are of class numeric.

It is particularly strange, as rnorm(6) and rnorm(6, 0, 1) are equivalent calls. Not sure what to do about this.

Thank you in advance!

library(lcmm)

#
# Create data
#
eps <- 1.0

t <- seq(1.0, 10.0, 1.0)
y1 <- 2*t + 3 + rnorm(length(t), 0.0, eps)
y2 <- -t + 1 + rnorm(length(t), 0.0, eps)

df <- data.frame("t" = t,
                 "y" = c(y1, y2),
                 "patient" = c(rep("1", length(t)), rep("2", length(t))))

#
# Train a gmm without specifying initial parameters B
#
gmm.1 <- hlme(fixed = y~t, mixture=~-1+t, random=~1,
            data = df,
            subject = "patient",
            ng = 2)

#
# For the same model, can specify B using rep
#
gmm.2 <- hlme(fixed = y~t, mixture=~-1+t, random=~1,
                       data = df,
                       subject = "patient",
                       B = rep(1.0, 6),
                       ng = 2)

#
# Can specify B using rnorm if mean and variance are specified
#
gmm.3 <- hlme(fixed = y~t, mixture=~-1+t, random=~1,
              data = df,
              subject = "patient",
              B = rnorm(6, 0, 1),
              ng = 2)

#
# Can't specify B using rnorm if mean and variance are not specified.
# Throws error "The model specified in B should be of class hlme"
#
gmm.4 <- hlme(fixed = y~t, mixture=~-1+t, random=~1,
              data = df,
              subject = "patient",
              B = rnorm(6),
              ng = 2)


#
# Can't specify B using runif
# Throws error "The model specified in B should be of class hlme"
#
gmm.5 <- hlme(fixed = y~t, mixture=~-1+t, random=~1,
              data = df,
              subject = "patient",
              B = runif(6),
              ng = 2)
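
#
# Workaround sketch: pre-compute the starting values before passing them to B.
# The explanation is suggested by the hlme code quoted in the next issue: any
# one-argument call passed to B, such as rnorm(6) or runif(6), is mistaken for
# the random(model) syntax. Evaluating the vector into an object first means B
# receives plain numeric values rather than a call.
#
binit <- rnorm(6)
gmm.6 <- hlme(fixed = y~t, mixture=~-1+t, random=~1,
              data = df,
              subject = "patient",
              B = binit,
              ng = 2)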

Sessioninfo:

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lcmm_1.7.8 survival_2.41-3

loaded via a namespace (and not attached):
[1] compiler_3.4.1 Matrix_1.2-11 tools_3.4.1 splines_3.4.1 grid_3.4.1 lattice_0.20-35

Generation of random starting values

Dear Dr Proust-Lima,

I’ve been working with the lcmm package to derive latent class trajectories and your package is great and has been very useful and well-used! However, I have a question for which I could not find an answer in the documentation of the package.

I would like to know more about the process behind the generation of random starting values.

I cannot locate the function random(), which seems to be used within the gridsearch function, either in base R or in your lcmm package; it appears in the line

mc$B <- substitute(random(minit), environment())

An extract from the hlme function code that I found is:

#gestion de B=random(mod)
Brandom <- FALSE
if(length(cl$B)==2)
{
    if(class(eval(cl$B[[2]]))!="hlme")
        stop("The model specified in B should be of class hlme")
    if(as.character(cl$B[1])!="random")
        stop("Please use random() to specify random initial values")
    ...
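
For intuition only, a conceptual sketch of what "random initial values generated from minit" can look like (this perturbation scheme is an illustration only, not necessarily the one implemented in the package; m1 is a hypothetical one-class hlme fit, and the perturbation sd is arbitrary):

# Illustrative only: build one replicate of starting values by perturbing the
# one-class estimates stored in m1$best, then pass them through B.
set.seed(123)
binit <- m1$best + rnorm(length(m1$best), mean = 0, sd = 0.5)
m2 <- hlme(y ~ time, mixture = ~ time, random = ~ time, subject = "id",
           ng = 2, data = dat, B = binit)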

I have also read the reference provided, Biernacki et al. (2003), on variants of the EM algorithm, but it is still not much clearer to me how this is implemented; the accompanying vignette discusses the iterative Marquardt algorithm for parameter estimation, which belongs to the Newton-Raphson family.

I’m not sure my question is clear enough. I can try to make it clearer if needed of course.

Thanks in advance for your help.

Best regards
Hannah

Suggestion for parallelized gridsearch

Dear Cécile, dear Viviane,

The lcmm package is great. Thanks for your work!

Lately, I have been working a lot with jointlcmm models. Fitting more complex joint models is computationally very expensive, and performing a grid search can take much time. As I get impatient easily, I want to suggest a parallelized version of gridsearch, which can reduce the waiting time quite a bit (depending on the available hardware).

The proposed gridsearch.parallel function call does not differ from that of gridsearch except for the cl argument expecting a cluster created using parallel::makeCluster.

gridsearch.parallel <- function(m,rep,maxiter,minit,cl=NULL)
{
  if(!is.null(cl)){
    clusterSetRNGStream(cl)
    mc <- match.call()$m
    mc$maxiter <- maxiter
    assign("minit",eval(minit))
    
    clusterCall(cl, function () require(lcmm))
    clusterExport(cl, list("mc", "maxiter", "minit", as.character(as.list(mc[-1])$data)), envir = environment())
    
    cat("Be patient, grid search is running ...\n")
    
    models <- parLapply(cl, 1:rep, function(X)
    {
      mc$B <- substitute(random(minit),parent.frame(n=2))
      return(do.call(as.character(mc[[1]]),as.list(mc[-1])))
    }
    )
    
    llmodels <- sapply(models,function(x){return(x$loglik)})
    
    kmax <- which.max(llmodels)
    mc$B <- models[[kmax]]$best
    mc$maxiter <- NULL
    
    cat("Search completed, performing final estimation\n")
    
    return(do.call(as.character(mc[[1]]),as.list(mc[-1])))
  }
  return(do.call(gridsearch, as.list(match.call()[2:5])))
}

Evaluation

Consider the example from the gridsearch man page, but using 250 replications instead of 50. The execution time was roughly halved using all cores of a machine with four physical cores (average speed-up factors of 2.1, 1.9, and 1.6 with four, three, or two cores respectively, compared to the non-parallelized gridsearch; see below).

m1 <- hlme(Y ~ Time * X1, random =~ Time, subject = 'ID', ng = 1, 
           data = data_hlme)

library(parallel)

times <- matrix(NA, 10, 5)
for(i in 1:10){
  times[i,1] <- system.time(
    gridsearch(rep = 250, maxiter = 10, minit = m1, hlme(Y ~ Time * X1,
      mixture =~ Time, random =~ Time, classmb =~ X2 + X3, subject = 'ID',
      ng = 2, data = data_hlme))
  )[3]
  for(j in 1:4){
      times[i,j+1] <- system.time({
        cl <- makeCluster(j)
        gridsearch.parallel(rep = 250, maxiter = 10, minit = m1, hlme(Y ~ Time * X1,
          mixture =~ Time, random =~ Time, classmb =~ X2 + X3, subject = 'ID',
          ng = 2, data = data_hlme), cl=cl)
        stopCluster(cl)
      })[3]
  }
}

[Figure: execution times of gridsearch vs. gridsearch.parallel with two to four cores]
As there is some overhead for the initialization of the clusters, I would expect the relative increase in speed for computationally more expensive models to be higher than in the presented example.

Best,
Raphael

categorical or numeric time points and LSmeans

Dear Prof. Cécile and your team,

I am using the lcmm package to classify patients based on their trajectory patterns. The outcome is a continuous variable and we do not want to consider any covariate except time (week: 0, 1, 2, 4, 8, 12, 16, 20, 24). The question is: can week in our model be treated as numeric or categorical?

lcmm.fit<-hlme(outcome~week,random=~week,mixture=~week,subject="ID",data=dataset,ng=3)

Another question: could we report the LS-means at each time point for each identified subgroup? In the MMRM approach we have the option to report LS-means, but MMRM treats the time point as categorical, so I wonder whether lcmm has a similar option. Thanks a lot.
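
As an illustration of both points, a hedged sketch reusing the call above (whether week should be numeric or categorical is a modelling choice): wrapping week in factor() makes it categorical, and predictY() then gives class-specific predicted means at each observed week, which is the closest LS-means-like summary I am aware of.

# Sketch: categorical time via factor(week); class-specific predicted means at
# each week are then obtained with predictY().
fit.cat <- hlme(outcome ~ factor(week), mixture = ~ factor(week), random = ~ week,
                subject = "ID", data = dataset, ng = 3)
newdat <- data.frame(week = c(0, 1, 2, 4, 8, 12, 16, 20, 24))
predictY(fit.cat, newdata = newdat, var.time = "week")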

Best wishes,
Bochao Jia

convergence and test

Hello,
I am trying to use your package lcmm in R and would appreciate any help here.

I find something strange.
I started with a simple random-effects model with one group/cluster, y ~ t1 + t2 + t3, where t1 = time, t2 = t^2 and t3 = t^3, as follows:

hlme(y ~ t1 + t2 + t3, random = ~t1 + t2 + t3, subject = 'id', ng = 1, data = d)

and the model converges. But when I try it with lme from the nlme package or lmer from the lme4 package, the model does not converge.

In addition, when I compare this model to a simpler model y ~ t1 + t2, I get:

                 G    loglik npm      BIC %class1
fit.g1.mlt.quad  1 -6405.571  10 12861.96     100
fit.g1.mlt.cub   1 -6389.833  15 12855.89     100
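
For reference, a minimal sketch of the arithmetic behind the likelihood-ratio comparison implied by that table (numbers copied from the output above):

# LRT of the cubic against the quadratic model.
LR <- 2 * (-6389.833 - (-6405.571))      # = 31.476
df <- 15 - 10                             # 5 additional parameters
pchisq(LR, df = df, lower.tail = FALSE)   # p-value well below 0.001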

so the LRT suggests that the cubic model is preferred (t3 != 0), but in the summary of the cubic model the cubic term is not significant:
coef Se Wald p-value
int 37.03277 1.98134 18.691 0.00000
t 1.34681 1.09698 1.228 0.21954
t2 -0.19229 0.16652 -1.155 0.24819
t3 0.00799 0.00779 1.026 0.30498

How come?
Do you understand why?

Thank you!
Ronen

can not fix the parameters in variance-covariance matrix

It seems that we can use the argument "posfix" to fix some parameters at a specified value, but when the variance-covariance matrix is NON-diagonal this argument does not work (the parameters will still be estimated in the matrix). Could you please let me know what to do in this case?

Use predictions from a joint model externally

Dear Professor Proust-Lima,

Is there a possibility to use a model fitted on a training set (let's say training_set) outside of the global workspace in which it was fitted? More specifically, I would like to apply my model mod to a test set testing_set, which contains data similar to that of the training set but is located somewhere else (let's say on a computer/workspace of a colleague).

So the question is whether it is possible to compute predictions in another workspace just by transferring the fitted model and running it on a test set there, without the original training set being present.

I've tried to experiment with this, but unfortunately I received the (logical) error message
Error in eval(model$call$data) : object 'training_set' not found
after using this piece of code:
predicted <- dynpred(mod, testing_set, landmark = c(unique(testing_set$week)), var.time = "week", horizon = c(1:12), draws = F)
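
One possible workaround, sketched with hypothetical file names (the error suggests dynpred() re-evaluates the data name stored in the model call, so that object has to exist, under the same name, wherever predictions are computed):

# In the original workspace: bundle the model together with its training data.
saveRDS(list(model = mod, data = training_set), "mod_bundle.rds")

# In the colleague's workspace: restore both, recreating the name that
# mod$call$data refers to, then call dynpred() as before.
bundle <- readRDS("mod_bundle.rds")
mod <- bundle$model
training_set <- bundle$data
predicted <- dynpred(mod, testing_set, landmark = unique(testing_set$week),
                     var.time = "week", horizon = 1:12, draws = FALSE)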

Thanks in advance,
OJ

No standard errors in summary

Often, with my data/models, I'm getting estimated coefficients but no standard errors in the summary output. Now and then I find a variation where I get them, but I can't work out what's going on. Any thoughts about why the se column would be empty?

Example of my call and output:

lcmm(fixed = levelf ~ schoolf * (adj.qtr.ctr + adj.qtr.ctr.2), 
    random = ~adj.qtr.ctr + adj.qtr.ctr.2, subject = "nlearnid", 
    ng = 1, link = "2-quant-splines", cor = AR(adj.qtr.ctr), 
    data = mins2[mins2$epa == "EPA 1", ], verbose = TRUE)
 
Statistical Model: 
     Dataset: mins2[mins2$epa == "EPA 1", ] 
     Number of subjects: 32 
     Number of observations: 136 
     Number of latent classes: 1 
     Number of parameters: 23  
     Link function: Quadratic I-splines with nodes  
2 7  
 
Iteration process: 
     Maximum number of iteration reached without convergence 
     Number of iterations:  100 
     Convergence criteria: parameters= 1.6e-25 
                         : likelihood= 1.2e-10 
                         : second derivatives= 1 
 
Goodness-of-fit statistics: 
     maximum log-likelihood: -151.47  
     AIC: 348.94  
     BIC: 382.65  
 
 
Maximum Likelihood Estimates: 
 
Fixed effects in the longitudinal model:

                              coef Se Wald p-value
intercept (not estimated)        0                
schoolf1                  -0.04442                
schoolf2                   0.29200                
schoolf3                   0.12510                
adj.qtr.ctr                0.18931                
adj.qtr.ctr.2              0.05396                
schoolf1:adj.qtr.ctr       0.08899                
schoolf2:adj.qtr.ctr       0.21295                
schoolf3:adj.qtr.ctr      -0.05349                
schoolf1:adj.qtr.ctr.2     0.05636                
schoolf2:adj.qtr.ctr.2    -0.11592                
schoolf3:adj.qtr.ctr.2    -0.06072                


Variance-covariance matrix of the random-effects:
              intercept adj.qtr.ctr adj.qtr.ctr.2
intercept       3.73018                          
adj.qtr.ctr    -0.98194     0.37123              
adj.qtr.ctr.2   0.01732    -0.02758       0.00536

                              coef Se
AR correlation parameter: -0.28498   
AR standard error:         1.08409   

Residual standard error (not estimated) = 1

Parameters of the link function:

               coef Se Wald p-value
I-splines1 -1.09554                
I-splines2  1.24164                
I-splines3  1.91268                
I-splines4  1.40211                

How to evaluate the model accuracy for a specific outcome?

I am working on a model for patient deterioration based on repeated measures of vital signs. I am trying to discover trajectories in these vital signs that would indicate deterioration / risk of deterioration. One option is to use jointlcmm and use deterioration as time to event parameter. However, I would also be interested to leave the outcome (deterioration) out of the model to see if the classes/trajectories identified by the lcmm function are a good indication of deterioration. Or would that be a wrong use of the model?

Lo-Mendell-Rubin tests and bootstrap likelihood ratio tests

Dear Dr Proust-Lima,

Thank you for making the R package lcmm – it is great work! I'm using the lcmm package to study latent classes of treatment response in a longitudinal study.
I would like to use Lo-Mendell-Rubin tests and bootstrap likelihood ratio tests to determine the optimal number of classes, but these tests are not implemented in the current package. Is there a way I can perform the tests using other packages, or do I have to code them myself?

Best regards,
Anne

R package: Latent Class Growth Analysis or Growth Mixture Modeling

Dear Cécile,

within the framework of my PhD Programme, I'm doing research on the development and course of depressive symptoms in children and adolescents.

In one of my research projects I aim to identify differences in longitudinal change among unobserved groups according to their developmental trajectory of depressive symptom severity (CES-DC). For this purpose, I'm looking for an R package applying Latent Class Growth Analysis (LCGA) or Growth Mixture Modeling (GMM) (Jung & Wickrama, 2008; Nagin, 1999).

Other packages such as the k-means longitudinal clustering approach (R package kml) are highly flexible and easy to administer, but I'm looking for a model-based approach to classify my sample into distinct trajectory groups. Are you aware of an LCGA or GMM R package, or is your R package 'lcmm' suitable for the research interest described above?

Best wishes from Berlin
Jascha

Number of latent classes not matching up to defined 'ng'

Dear Doctor Proust-Lima,
I am currently conducting research for a telephone health coaching service, which seeks to improve health outcomes.

I am looking to use the multlcmm function (from lcmm package) to classify participants based on their difference across 2 time points on a collection of outcomes measures (bmi, physical activity, waist circumference, fruit/vegetable consumption). Outcome variables are all continuous.

When running this model:
GHS_LCM5 <- multlcmm(bmi + pa + waistc + fruit + veg ~ 1 + time , random = ~ 1 + time, mixture = ~ 1 + time, ng = 5, subject="enrolment_id", data = GHS_LPA4, link = "linear", B = GHS_LCM1)

The summary from this model is:
 G   loglik npm      BIC  %class1  %class2 %class3 %class4  %class5
 5  -115063  30 230375.6 10.53021 85.67201       0       0 3.797781

Classes 3 and 4 do not seem to have any cases assigned, which is not something I've encountered with other clustering packages such as mclust.

I'm unsure if the issues arise from my model specifications, setup of data, or even choice of function, so any advice would be much appreciated.

Many thanks,
Joe

Applying lcmm on data with missing values

Dear Cécile,

thanks a lot for developing lcmm and being so responsive here on GitHub!

I have another question regarding your package. What do you suggest for applying lcmm when you have missing data (assuming MAR holds) in a continuous, non-Gaussian outcome? Are appropriate missing-data techniques included in lcmm, or should I impute the data (e.g. with 'mice') before running lcmm? How can I be most efficient and end up with robust, less biased estimates?

Best wishes
Jascha

Switching problem for latent classes

Dear Dr. Proust: I hope you are well. I am writing because I have some questions about the label-switching problem for latent classes. I simulated a database (covariates, fixed effects, random effects, latent process, measurement error, intercept associated with the class-specific probability, two latent classes, ordinal markers and thresholds) and then estimated the parameters with the lcmm function, with the goal of recovering the parameters used to create the database. In approximately 30 percent of the simulations I found a different ordering of the classes, and the intercept of the class-membership probability came out with a similar magnitude but the opposite sign.
Dr. Proust, does the internal code of lcmm handle this problem? If there is label switching, does it affect only the multinomial logistic model, or does it also affect the measurement model, in particular the parameters associated with the class-specific fixed effects? (For example, with 2 classes, the parameters of class 1 correspond to class 2 and vice versa.)

thank you so much for your help,

Ivonne Rentería

PredictY

Hi,
When I tried to use predictY in 1.8.1, it gives me:
Error in olddata[,v]: type 'closure' is not subsettable.
Thank you

Iteration process

Dear Dr. Proust,
My data consist of 70 subjects and each has 909 weeks of mood assessments (MOOD, scale 1-6). In total, there are 63630 observations. My goal is to derive groups based on these trajectories.
I'm running the following model:

lcmm (MOOD~ week , random = ~week , subject = 'ID', mixture = ~ week, ng = 3, idiag = TRUE, data = dataset, link="splines")

I was able to use this code on a smaller dataset. Now I have this:

“Iteration process: The program stopped abnormally. No results can be displayed“

Is this happening because of the number of observations? Am I choosing the wrong model or coding something wrong?

Thanks,
Joao

Questions about lcmm()

Dear Dr. Proust-Lima and lcmm-team,

I am a doctoral student in the psychology department of the University of Córdoba (Spain).
First of all, thank you for creating this wonderful R package and secondly for all examples, feedback and help that you and your lcmm team offer us.
Secondly, I would like to use the R lcmm package to find trajectory classes in longitudinal data with 4 repeated, structured measurement time points (every 6 months) of high school student victimization levels (N = 734). Victimization levels for each wave were calculated as follows: students were asked who the victims are in their class. The number of nominations received by each student was counted and divided by the number of possible nominators to account for differences in classroom size, and proportional scores for each student were calculated by class. The victimization score ranges from -1.14 to 5.20 and is a continuous variable. One of my questions relates to this: should I transform the negative scores into positive ones?

On the other hand, since victimization is a score with a non-Gaussian distribution, I fit a latent process mixed model (lcmm function). At this point I would like to show you my script, because I have doubts about whether I have written it correctly: my time variable is coded as time 1 = 0, time 2 = 1, time 3 = 2, time 4 = 3, and I'm not sure if I can use it as I did:

linear1 <- lcmm (z_score_vict ~ Time, random = ~ -1, subject = 'student_ID', data = Variables)
splines1 <- lcmm (z_score_vict ~ Time, random = ~ -1, subject = 'student_ID', data = Variables, link = "splines")

Finally, following other examples that you offer, I have chosen the splines model as the best option to fit the next models with ng = k classes. However, I do not understand the exact meaning of this kind of analysis. I have looked on the Internet but cannot find anything to help me understand what exactly the "splines" link function means.

Thank you very much for your support!
Kind regards,
Ana

Why don't we use successive ordinal thresholds when we build the "Binit" vector (vector of initial values)?

Dear Dr. Proust,

When I replicated the following routine from section 6.3 of the lcmm paper,
mord0 <- lcmm(CESD ~ age65 * male, random =~ -1, subject = 'ID', data = paquid, link = 'thresholds')
binit <- NULL
binit[1:6] <- mspl$best[1:6]
binit[7:56] <- mord0$best[4:53]
mord <- lcmm(CESD ~ age65 * male, random =~ age65, subject = 'ID', data = paquid, link = 'thresholds', B = binit)

the list of initial values for the threshold parameters is not strictly increasing (mord0$best[4:53]). Why don't we use something like binit[7:56] <- mord0$estimlink[i,2]?

with thanks for any guidance you can offer,

Ivonne Rentería

BIC calculation

Dear Dr Proust-Lima,

I used the lcmm package recently and found it very useful. But I noticed that when calculating the BIC, the package uses the number of subjects as n rather than the number of observations. Is there a reason for this?

Best regards,
Kate

Error in multlcmm with prior latent class

Hi,

Specifying a prior latent class works well for me with the hlme function. However, if I use the same prior information with the multlcmm function, it does not work. The function does not return any error, but it does not use the prior information given in the call. The prior values are 1, 2, 3 (assuming 3 clusters) and 0 (for subjects whose class should be imputed). However, multlcmm imputes the class of the subjects whose prior is 1, 2 or 3, instead of the subjects whose prior is 0.

Could you please check ? Thank you so much !

Bochao Jia

How do we get increasing thresholds parameters for ordinal markers?

Dear Dr. Proust,

When I replicated the following routine from section 6.3 of the lcmm paper,

mord0 <- lcmm(CESD ~ age65*male, random = ~-1, subject = 'ID', link = 'thresholds', data = paquid)

the list of threshold parameters is not strictly increasing.

                coef      Se    Wald  p-value

thresh. parm1 -0.99852 0.08843 -11.291 0.00000
thresh. parm2 0.56689 0.02055 27.591 0.00000
thresh. parm3 0.49852 0.01874 26.597 0.00000
thresh. parm4 0.45676 0.01793 25.469 0.00000
...
thresh. parm20 0.27593 0.02446 11.282 0.00000
thresh. parm21 0.25844 0.02555 10.115 0.00000
thresh. parm22 0.20718 0.02655 7.803 0.00000
...
thresh. parm48 0.30197 0.14983 2.015 0.04386
thresh. parm49 0.35615 0.17626 2.021 0.04332
thresh. parm50 0.45527 0.22627 2.012 0.04421

with thanks for any guidance you can offer,

Ivonne Rentería

hlme: argument subject numeric

Version 1.8.1

bmi_long from LCTMtools

data(bmi_long)
bmi_long$id <- as.numeric(bmi_long$id)
model1 <- lcmm::hlme(fixed = bmi ~ 1 + age + I(age^2),
                     mixture = ~1 + age + I(age^2),
                     random = ~-1,
                     ng = 5,
                     nwg = FALSE, data = bmi_long[1:200,],
                     subject = "id")

Error in lcmm::hlme(fixed = bmi ~ 1 + age + I(age^2), mixture = ~1 + age + : The argument subject must be numeric

class(bmi_long$id)
[1] "numeric"

issue replicated with 2 other datasets

Release your package to R CRAN

Hi,

I wonder whether you could update your package on CRAN? Currently, version 1.8.1 has errors in some functions: for example, the prior option does not work correctly in Jointlcmm and multlcmm. If you could upload a new version of the package to CRAN, our colleagues could use it more easily. Thanks.

random effects when the lcmm outcome is ordinal

Dear Prof. Proust-Lima, I'm using your lcmm package to identify distinct trajectories of an ordinal longitudinal outcome and was hoping that you would be able to advise me. I have used this code:
h4<-lcmm(tricat~age,subject='ID',mixture=~age,ng=4,idiag=TRUE,data=hivv,
link="thresholds")
where tricat is the ordinal outcome. People entered the cohort at different ages, and are followed-up for different lengths of time. In one model there are 1000 people, in the other there are 3000, in both groups the median number of 6-monthly follow-up visits is 25.

I'm happy with the results, as the groups identified are distinct, the patterns make sense, and the posterior probabilities are quite high. But I am worried that I should perhaps have included a random-effects term. I've been trying to understand the implications of not using the random argument, but have not quite been able to get my head around it. I did try running the larger model with random = ~age, but it didn't converge. I also tried random = ~1, but that produced odd, not particularly convincing groups. Does it seem acceptable to you to run the model without random effects?
with thanks for any guidance you can offer,
Branwen

Edit: Am I right in thinking that by not including random effects, what I have effectively done is group-based trajectory analysis, rather than latent class mixed modeling?

lcmm package with 5classes

Hello Ms Cécile, I am a student in nutrition and food hygiene.
I am working on diet trajectories using the lcmm package in R. I came across your paper "Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm" and a website where you described the analysis (lcmm V1.7.8 on CRAN · a79e70b), which has helped me to understand the analysis process. I followed the steps as described in that paper, but I still encountered several issues:

1. I want to generate 5 classes of diet trajectories. All the models I have tried so far do not converge; when I summarise the models, the percentages in some classes are NA, and I only get one straight line when I plot the model.

2. When I try three classes the model converges, but according to the literature the number of classes should be 5, just like for BMI in your paper.

I am asking myself what the problem could be. I would appreciate any suggestions you can give me.
Thank you.

Sincerely yours

Testing class membership predictors using a 3-step approach possible in lcmm?

Dear lcmm team,

I would like to perform GMM on longitudinal data with up to 4 repeated unstructured measurement time points of an outcome measuring people’s performance in activities of daily living. Once I have found the number of different classes within the data, I will be interested in the predictors of the class membership. To test them, I would like to use a 3-step approach in order to account for the classification errors in the class allocations (https://www.researchgate.net/publication/228389117_Latent_Class_Modeling_with_Covariates_Two_Improved_Three-Step_Approaches & https://www.statmodel.com/examples/webnotes/AuxMixture_submitted_corrected_webnote.pdf).

According to the paper by Asparouhov and Muthén, the 3 steps are (Chapter 2):

  1. Do a Latent Class Analysis using only the latent class indicator.
    -> As a simple first example I used an hlme model with three classes:

lin1 <- hlme(SCIM_raw_num ~ days_SCIM_adm,
random=~ days_SCIM_adm,
mixture=~ days_SCIM_adm,
ng = 3, B =lin0, subject='id', data=data_long)

  2. Create the most likely class variable for each individual (N), its classification error (p) and its uncertainty rates (q; and logits k).
    -> I was able to extract the most likely class variable and to calculate the corresponding uncertainty rates using the formulas presented in the mentioned paper:

#Most likely class membership
data_pred <- data_long
data_pred$id <- as.character(data_pred$id)

data_N <- as.data.frame(lin1$pprob[,1:2])
data_N$id <- as.character(data_N$id)
data_pred$N <- factor( data_N$class[sapply(data_pred$id, function(x) which(data_N$id==x))] )

#Number of persons per class
N1 <- 92
N2 <- 270
N3 <- 485

#Posterior classification table = classification errors:
p11 <- 0.7711
p12 <- 0.0694
p13 <- 0.1596
p21 <- 0.0572
p22 <- 0.8986
p23 <- 0.0442
p31 <- 0.0416
p32 <- 0.0261
p33 <- 0.9323

#uncertainty rates
q11 <- p11*N1/(p11*N1+p21*N2+p31*N3)
q12 <- p21*N2/(p11*N1+p21*N2+p31*N3)
q13 <- p31*N3/(p11*N1+p21*N2+p31*N3)
q21 <- p12*N1/(p12*N1+p22*N2+p32*N3)
q22 <- p22*N2/(p12*N1+p22*N2+p32*N3)
q23 <- p32*N3/(p12*N1+p22*N2+p32*N3)
q31 <- p13*N1/(p13*N1+p23*N2+p33*N3)
q32 <- p23*N2/(p13*N1+p23*N2+p33*N3)
q33 <- p33*N3/(p13*N1+p23*N2+p33*N3)

#logits of uncertainty rates = needed for Step 3
k11 <- log(q11/q13)
k12 <- log(q12/q13)
k13 <- log(q13/q13)
k21 <- log(q21/q23)
k22 <- log(q22/q23)
k23 <- log(q23/q23)
k31 <- log(q31/q33)
k32 <- log(q32/q33)
k33 <- log(q33/q33)
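
A small aside on step 2 (a sketch using the lcmm function postprob()): the class sizes and the p_kl values hard-coded above can be read off the fitted model rather than typed in by hand.

# Prints, among other things, the number of subjects assigned to each class and
# the mean posterior probabilities within each assigned class.
postprob(lin1)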

  3. Use the most likely class variable as the latent class indicator variable (its uncertainty rates should be fixed at the measurement-error logits k obtained in step 2). In this step the predictors of class membership should also be included in the model.
    -> For this step I would be very happy about your support on how to do it in the lcmm package. Do you have any suggestions? And does the above 3-step approach also work for models fitted with the lcmm() function and non-linear link functions?

Thank you very much for your support!
Best regards, Jsabel

Multilevel Mixture for Binary dependent variable

I need to estimate a multilevel mixture model (or latent class mixed model) and my dependent variable is binary distributed.
The conditional part of the model, more or less, should be of the form logit(Y_ijc) = \alpha_c + \beta_c X_ijc + u_jc,
whereas the overall model is, as usual, something like p(y|x) = \sum_c \pi_c p_c(y|x, \theta_c), where \pi_c are the mixture weights and \theta_c is the vector of parameters.

Basically, I'm expecting one \alpha and one \beta for each latent group as well as a vector of random intercepts u_i.

I'm not able to find the way to build a similar model using lcmm package.
Thanks for your time.

Adding legend = NULL to the plot of the baseline hazard disrupts sequencing of the dots

This bug is pretty obscure, and it took me a while to work out exactly what was producing the error. But basically, when I specify legend = NULL in the plotting argument, then any ... arguments specified after that are incorrectly sequenced (i.e. their position in the matched call is incorrect).

This means that the use of the match.call function here - and in the subsequent lines of code below that - returns either an error or the incorrect output.

I think the better approach might be to use dots instead of match.call(). I will submit a pull request to that effect.

----------- start reprex ------------

library(lcmm)

m3 <- Jointlcmm(fixed= Ydep1~Time*X1,mixture=~Time,random=~Time,
                classmb=~X3,subject='ID',survival = Surv(Tevent,Event)~ X1+mixture(X2),
                hazard='3-quant-splines',hazardtype='PH',ng=3,data=data_lcmm,
                B=c(0.7576, 0.4095, -0.8232, -0.2737, 0, 0, 0, 0.2838, -0.6338, 
                    2.6324, 5.3963, -0.0273, 1.3979, 0.8168, -15.041, 10.164, 10.2394, 
                    11.5109, -2.6219, -0.4553, -0.6055, 1.473, -0.0383, 0.8512, 0.0389, 
                    0.2624, 1.4982))

# returns an error
plot(m3, which = "hazard", legend = NULL, col = rep(1,3))

# works ok
plot(m3, which = "hazard", col = rep(1,3), legend = NULL)

----------- end reprex ------------

lcmm "crashing" with non numeric subject

Hi Cécile,

I'm not sure if you are the right person to contact :)

I have noticed a small bug in the lcmm package. If you do not cast the subject variable to numeric, the code stalls and I have to restart R in order to continue.

Perhaps it could automatically raise a warning, or stop immediately?

Otherwise I am very much enjoying the package!

Leading minor is not positive definite

Dear Dr Proust-Lima,

Thank you so much for creating this package - it has been well used!

I'm currently running a quadratic model with 2-4 time points per individual (approx 72000) and various subsets keep crashing while others run, despite having the exact same code.

The error message is "Error in chol.default(vbb) : the leading minor of order xx is not positive definite" . Do you have any advice on how to fix this?

Many thanks,

Charlotte

strange results in hlme()

Hello,

I am attempting to fit a series of latent class growth models using the hlme() function and am observing some strange results. Unfortunately, I can't provide the data nor the output due to work restrictions, however I hope I can at least describe my setup.

Sample Size = ~3000 patients
Time = Measured in Months (Values range from -24 to 24)
Outcome = A continuous measure, which ranges from 0 to 14.

Each patient has a variable number of time points at which the outcome was measured. The time points can be unique to a patient too (e.g. patient A has -4, 3, 5, patient B has -14, -12, -4, 5, 20).

I'm attempting to fit the model using this syntax:

model1 <- hlme(data=foo, outcome ~ time + time_squared, random=~time + time_squared, mixture=~time+time_squared, subject='id', ng=3)

I vary ng from 2 to 4. Time_squared is a quadratic time variable I've created. I believe the way I have specified this model is equivalent to poly(time, degree=2, raw=TRUE)

The issue comes about when I look at the results for the fixed effects. If I specify, say ng=3, and look at the fixed effects for the longitudinal model, I see that the intercept, slope, and quadratic terms are identical for latent class 1 and 2, while class 3 has unique estimates. This also happens for ng=2, and ng=4. I'm wondering what this indicates? Is it a local maxima issue? I have tried to use gridsearch, and specify initial starting values using the approach in the paper for your package to no avail. If I remove the random effect for slopes, and re-run the model, I get unique estimates (e.g. each latent class has a different intercept and slope in the fixed effects part)

e.g.

model2 <- hlme(data=foo, outcome ~ time + time_squared, random=~1, mixture=~time+time_squared, subject='id', ng=3)

Any insight is appreciated.
Best,

How to specify mixture in models with multiple classes?

One of the model parameters in the case of multiple classes is 'mixture'. From the manual of the package and a Google search, it did not become clear to me what this parameter does exactly and how to specify it.

In the examples in the manual, this parameter is always the same as the 'random' parameter, but I wonder whether that is always the case. Say I want to create a model with (heart rate ~ measurementNr + age + gender + hasDiabetes), where measurementNr is the time variable (integer), age is an integer, gender is male/female and hasDiabetes is y/n. Should I include only measurementNr in mixture, or also any of the other covariates?
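
For illustration only, a hedged sketch of the distinction (hypothetical names; heart_rate, id and dat are not from any real dataset): covariates listed in mixture get one coefficient per latent class, while covariates left out of mixture keep a single coefficient common to all classes.

# Sketch: only the time trend (measurementNr) is class-specific here; age,
# gender and hasDiabetes have common effects across classes.
m <- hlme(heart_rate ~ measurementNr + age + gender + hasDiabetes,
          mixture = ~ measurementNr,
          random = ~ measurementNr,
          subject = "id", ng = 2, data = dat)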

Could you please clarify what the 'mixture' parameter does/means?

Free variance between classes

Hi,
I'm running the following model to find groups in trajectories of weekly sickness absence:

Model1 <- lcmm(sum_week ~ week_count, mixture = ~week_count, random = ~week_count, subject = "id", ng = 2, data = week, nwg = F, idiag = F, link = "linear")

Where sum_week is a numeric value from 1-5 (sick-days per week), and week_count is my time variable (from week 1 to 52 during the year).

I'm still trying to find the best model for my dataset and want to relax the between-group variance. Specifically, I want to allow the variances of both the intercept and the slope to differ by class. However, I am not sure how to specify this using the lcmm function.

As far as I can tell, the 'nwg' boolean does something similar by allowing the variance-covariance matrix to be proportional (rather than identical) across classes, but I'm not sure whether this generates a model with "free" variance between classes?

Best,
Charlotte

Latent variables as exposures in distal models

Dear Cécile Proust-Lima,

I am considering how to incorporate the uncertainty of the latent variables in distal models with latent variables as the exposure as discussed by Nylund-Gibson et al. 2019 (Prediction from Latent Classes: A Demonstration of Different Approaches to Include Distal Outcomes in Mixture Models, https://doi.org/10.1080/10705511.2019.1590146). Is it possible to use the lcmm package to analyze the association between latent classes and a distal outcome via either the ML three step approach or the BCH method (using the terminology from Nylund-Gibson 2019 )?

Thanks,
Kim

Question about the package lcmm

I used your R package "lcmm" to run the code from "Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm" in RStudio on Mac and Windows, and in RStudio Server through a web terminal client (Minerva). The R code I ran is from chapter 6.5, the Jointlcmm examples. I am sure the code I ran on the different systems was the same as what you have shown, but the results of summarytable(mj4b) were different: the results from R on Mac and Windows were the same as yours on page 43, while the result from Minerva was different. Please see the attachment for the differing result of summarytable(mj4b). I think there may be some computational issue in this package in the conversion between C++ and R. Could you please check this issue? Thank you very much.
results
lcmm.pdf

linear growth mixture model

Dear all,

I am very new to LCMM, so I want to apologize in advance if my questions are silly or this is not the right place to ask.

My goal is to fit a (linear) growth mixture model, with the restriction that one class has a fixed mean growth of zero.

I followed the recently published working paper by Wardenaar (2020) and unfortunately, I could not find answers to some questions and encountered some problems.

I used the code as follows:
gmm1 <- hlme(y~ time, subject = "ID", random=~1, ng = 1, data = data)
gmm2 <- gridsearch(rep = 100, maxiter = 10, minit = gmm1, hlme(y ~ time, subject = "ID", random=~1, ng = 2, data = data , mixture = ~ time, nwg=T))
gmm3 <- gridsearch(rep = 100, maxiter = 10, minit = gmm1, hlme(y ~ time, subject = "ID", random=~1, ng = 3, data = data, mixture = ~ time, nwg=T))
summarytable(gmm1, gmm2, gmm3)
Some remarks:
-My data is in long format
-time is coded from 0 to 3 as I have 4 measurement points and assume a linear growth.
-Y is the outcome variable measured on a 5-point Likert Scale via several items (there are missing values)
-ID is numeric ranging from 1 to 243

  1. I get the following error message:
    Error in hlme(ITL_STEM ~ Index1, subject = "ID", random = ~1, ng = 1, :
    The argument subject must be numeric
    My ID variable is numeric. I uploaded the data in R via SPSS. Is there anything else that needs special attention here? Sorry if this is a silly question, but I don't understand what I'm missing.

  2. How can I add an argument so that one class has a fixed growth of zero?

  3. How can I get the information which subject is in which latent class?
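
On question 3, a minimal sketch (reusing the gmm2 fit above): the pprob element of a fitted hlme object holds, for each subject, the most likely class and the posterior class-membership probabilities.

# First two columns: subject identifier and most likely class.
head(gmm2$pprob[, 1:2])
table(gmm2$pprob[, 2])   # class sizes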

I’m sorry for taking your time. Any help is much appreciated.
Very best
Daniela

Unable to compute dynamic probabilities with Jointlcmm after using the TimeDepVar command

In the following code, we first specify a joint model by using the Jointlcmm function. Afterwards we take a small sampleset for which we want to compute dynamic probabilities.

When fitting this model without "Var1" as TimeDepVar (a time-dependent variable indicating an intermediate time corresponding to a change in the risk of the event; it equals the end of the course if not present), the predictions can be computed without any problems.

However, when this argument is included, it becomes impossible to compute the predictions, as we keep receiving the same error:

Error in dynpred(joint_fit, sampleset, landmark = c(weeks), var.time = "Time", :
object 'Xevt' not found

Not sure what goes wrong here, actually: is it the variable itself, something with the coding, or maybe the construction of the data itself? (A small fictive subset is attached.)
jmset_mini.zip
Thank you in advance :)

# Fit the joint model timedepvar with  the indication the moment of risk during treatment
joint_fit <- Jointlcmm(Marker ~ poly(Time, degree = 1, raw = FALSE),
                       random = ~ poly(Time, degree = 1, raw = FALSE), TimeDepVar = "Var1",
                       survival = Surv(Time_to_Event, Event) ~ Marker + Treat + Var1, 
                       hazard = "Weibull", subject = "ID", data = jmset_mini,  ng= 1, verbose = FALSE)


# Take a small sample set
sampleset <- jmset_mini[jmset_mini$ID == 7,]
rownames(sampleset) <- NULL
sampleset <- sampleset[,c("ID", "Time", "Marker", "Treat", "Var1",
                          "Event", "Time_to_Event")]

# Specify which weeks we want to see
y <- seq(1:5)
weeks <- sampleset[y,'Time']

# Spit out the dynamic predictions and plug them into a dataframe
dynp_plot1 <- dynpred(joint_fit, sampleset, landmark = c(weeks),
                      var.time = "Time",
                      horizon = c(12), draws = F) # predicting 12 weeks ahead

Is it possible to have a link equal to 'cnorm' in Stata 'traj' pkg?

Hi,

I'm trying to implement group-based multi-trajectory modelling (Nagin et al 2016) with the multlcmm function in the lcmm package, because I want to use the constructed model to predict new cases. However, I find the data were better fitted by the Stata package [traj](https://www.andrew.cmu.edu/user/bjones/index.htm) (Nagin et al 2016).

The code was:
traj, multgroups(4) ///
    var1(bio1*) indep1(log_days*) model1(cnorm) min1(0) max1(100) order1(2 2 2 2) ///
    var2(bio2*) indep2(log_days*) model2(cnorm) min2(0) max2(100) order2(2 2 2 2)

The output graph is as below:
[Figure: traj output — observed and predicted group trajectories]
Fig Notes: colored lines showed the observed trajectories for each patient. The black smoothed lines were the predicted trajectories for each group and were plotted with the estimates from 'multtrajplot'.

I tried to use link='3-quant-splines' in multlcmm function.
The code is:

formula <- 'I(log_days) + I(log_days^2)'
link <- '3-quant-splines'
m1 <- multlcmm(as.formula(paste0("bio1 + bio2~", formula)),
           random=as.formula(paste0("~", formula)),
           ng=1,
           subject = 'numID',
           data=dat,
           link = c(link, link))
m <- multlcmm(as.formula(paste0("bio1 + bio2~", formula)),
               random=as.formula(paste0("~", formula)),
               mixture=as.formula(paste0("~", formula)),
               ng=4,
               subject = 'numID',
               data=dat,
               link = c(link, link),
               B = random(m1))

However, the output figure is:
[Figure: multlcmm output — predicted trajectories for the 4 groups]

There are two questions with the above result:

  1. The 4 output groups were not that different from each other visually.
  2. The estimated trajectories for each group did not fit the data well, especially for Group1.

Therefore, my question is: is it possible to have a multlcmm model with a link equivalent to cnorm in the Stata traj package?

Question about finding the best link function

Dear Dr. Proust-Lima and lcmm-team,

I would like to use the R package lcmm to find classes of trajectories in longitudinal data with up to 4 repeated, unstructured measurement time points of an outcome measuring people's performance in activities of daily living. The outcome is a sum score of several items and ranges from 0 to 100. In a previous analysis, the sum score was transformed from an ordinal into an interval/continuous sum score using Rasch Measurement Model theory. Still, I am unsure whether I should fit a Latent Class Linear Mixed Model (hlme function) or a Latent Process Mixed Model (lcmm function). As I found in your work, the lcmm function is used for ordinal, bounded or curvilinear outcomes, or outcomes with non-Gaussian random effects and errors. Could you explain what you mean by curvilinear? And is there a way to know whether the random effects and errors are Gaussian or not?

Since my outcome is at least bounded, I tried to go on using the lcmm function and to find the best link function by comparing the discrete AICs that should be provided in the respective model summaries (as described in your paper Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: the R package lcmm). Unfortunately, these discrete AIC values do not show up in my model summaries, no matter which link function I test (I am using lcmm package version 1.8.1). Could you advise me on how to solve this problem?

Thank you very much for your support!
Best regards, Jsabel

Matching the mathematical model with the R code for lcmm

Dear Dr. Proust-Lima,

I am running your lcmm package on a sample of 500 older adults with up to 12 longitudinal assessments that created a 4-level ordinal outcome. I am running the model without any covariates.
Based on my understanding, I am using a version of formula number 8 in your 2017 article (Proust-Lima, C., Philipps, V., & Liquet, B. (2017). Estimation of extended mixed models using latent classes and latent processes: the R package lcmm. Journal of Statistical Software, 78(2), 1-56. doi:10.18637/jss.v078.i02)
\Lambda_i(t) | (c_i = g) = X_{L1i}(t)^T \beta + X_{L2i}(t)^T \alpha_g + Z_i(t_{ij})^T u_{ig} + \omega_i(t_{ij})

My best model in R is a model with a random intercept and a quadratic effect of age.
(I think that because my outcome has only 4 levels and the maximum follow-up is 12 assessments, adding a random slope did not improve my model; I mean that there is not enough variability in the slopes to be modeled.)
m <- lcmm(outcome4 ~ poly (age, degree = 2, raw = TRUE),
random = ~ 1,
mixture = ~ poly (age, degree = 2, raw = TRUE),
subject = "ID",
data = long,
link = "thresholds",ng=4)

I have two questions that I would greatly appreciate your advice on:

Question 1: Did I specify my model in R correctly? I want to fit an lcmm model with a random intercept and a quadratic effect of age.

Question 2: How does my lcmm model in R map onto formula (8) in your article? Which parameters did I estimate and which did I not?

Kind regards,
Maryam

95% confidence limits in plot of baseline hazard

I am wondering whether it is possible to include confidence limits in the plot of the baseline hazard, for Jointlcmm objects?

For example, could 95% confidence limits be added somehow to the example below?

In the documentation, I couldn't find anything explaining how to do it...

Thanks in advance.

--------- start reprex -----------

library(lcmm)

m3 <- Jointlcmm(fixed= Ydep1~Time*X1,mixture=~Time,random=~Time,
classmb=~X3,subject='ID',survival = Surv(Tevent,Event)~ X1+mixture(X2),
hazard='3-quant-splines',hazardtype='PH',ng=3,data=data_lcmm,
B=c(0.7576, 0.4095, -0.8232, -0.2737, 0, 0, 0, 0.2838, -0.6338, 
2.6324, 5.3963, -0.0273, 1.3979, 0.8168, -15.041, 10.164, 10.2394, 
11.5109, -2.6219, -0.4553, -0.6055, 1.473, -0.0383, 0.8512, 0.0389, 
0.2624, 1.4982))

plot(m3,which="baselinerisk",bty="l")

--------- end reprex -----------

Comparing lmer and LCMM (single class) models

Hello Dr. Proust-Lima and lcmm-team,

I am trying to compare the models generated by LCMM and LMER. My understanding is that the models should be very similar when the class count is set to 1 for LCMM. However, as seen in the example below, the model summaries are quite different (particularly the coefficients).

Additionally, the predictions obtained using predictY of LCMM and predict.merMod of LMER are different as well.

I would be grateful for pointers on the gap in my understanding.

import::from(lme4, lmer)
import::from(lcmm, lcmm)

set.seed(3114)

#Data from: http://www.danmirman.org/gca/WordLearnEx.txt
df <- read.csv('~/path/to/file/dat.txt', sep='\t')
t <- poly(unique(df$Block), 2)
df[,paste("ot", 1:2, sep="")] <- t[df$Block, 1:2]

# Model: random effects subject, gives average trend line for entire cohort
m1 <- lme4::lmer(Accuracy ~ TP + 
             ot1 + ot2 + (ot1 + ot2| Subject),
             data=df, REML=FALSE)

# Model: random effects subject, but also you have latent class (groups)
# So if you set class = 1, ideally that should give you average trend line for entire cohort
m2 <- lcmm::lcmm(Accuracy ~ TP + ot1 + ot2,
           random = ~ 1 + ot1 + ot2,
           subject='Subject',
           ng=1,
           data=df,
           link='linear')

Output of m1 (lmer)

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Accuracy ~ TP + ot1 + ot2 + (ot1 + ot2 | Subject)
   Data: df
 
     AIC      BIC   logLik deviance df.resid
  -330.3   -282.7    176.2   -352.3      549
 
Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.7530 -0.5315  0.1351  0.5544  2.5082
 
Random effects:
 Groups   Name        Variance Std.Dev. Corr      
 Subject  (Intercept) 0.010784 0.10385            
          ot1         0.014975 0.12237  -0.33      
          ot2         0.007908 0.08893  -0.30 -0.80
 Residual             0.024806 0.15750            
Number of obs: 560, groups:  Subject, 56
 
Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.82462    0.02140  38.534
TPLow       -0.03922    0.02974  -1.319
ot1          0.28685    0.02665  10.763
ot2         -0.10908    0.02417  -4.513
 
Correlation of Fixed Effects:
      (Intr) TPLow  ot1  
TPLow -0.695              
ot1   -0.132  0.000      
ot2   -0.095  0.000 -0.242

Output of m2 (lcmm)

General latent class mixed model
     fitted by maximum likelihood method
 
lcmm::lcmm(fixed = Accuracy ~ TP + ot1 + ot2, random = ~1 + ot1 +
    ot2, subject = "Subject", ng = 1, link = "linear", data = df)
 
Statistical Model:
     Dataset: df
     Number of subjects: 56
     Number of observations: 560
     Number of latent classes: 1
     Number of parameters: 11  
     Link function: linear  
 
Iteration process:
     Convergence criteria satisfied
     Number of iterations:  16
     Convergence criteria: parameters= 3.6e-06
                         : likelihood= 3e-05
                         : second derivatives= 2.5e-07
 
Goodness-of-fit statistics:
     maximum log-likelihood: 176.16  
     AIC: -330.32  
     BIC: -308.05  
 
 
Maximum Likelihood Estimates:
 
Fixed effects in the longitudinal model:
 
                              coef      Se   Wald p-value
intercept (not estimated)        0                      
TPLow                     -0.24903 0.19822 -1.256 0.20900
ot1                        1.82129 0.17986 10.126 0.00000
ot2                       -0.69255 0.15520 -4.462 0.00001
 
 
Variance-covariance matrix of the random-effects:
          intercept      ot1     ot2
intercept   0.43474                
ot1        -0.17001  0.60370        
ot2        -0.11070 -0.35185 0.31882
 
Residual standard error (not estimated) = 1
 
Parameters of the link function:
 
                         coef      Se   Wald p-value
Linear 1 (intercept)  0.82462 0.02193 37.599 0.00000
Linear 2 (std err)    0.15750 0.00527 29.895 0.00000
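
One way to reconcile the two outputs numerically (an observation from the numbers above, not an official explanation): with link = 'linear', lcmm reports the fixed effects on a standardized latent-process scale, and the two link parameters map that scale back to the outcome; multiplying the lcmm coefficients by the estimated scale parameter reproduces the lmer estimates.

# Rescale the lcmm fixed effects by the estimated link scale (values copied
# from the two summaries above).
link_scale <- 0.15750
round(c(TPLow = -0.24903, ot1 = 1.82129, ot2 = -0.69255) * link_scale, 5)
#    TPLow      ot1      ot2
# -0.03922  0.28685 -0.10908   (cf. lmer: -0.03922, 0.28685, -0.10908)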

random intercept and random slope - selection of the best model

Dear Dr. Proust-Lima,

Thank you for your very helpful lcmm package!

I am running your lcmm model on a sample of 500 older adults with up to 12 longitudinal assessments that form a 4-level ordinal outcome. My first model does not have any covariates: I used one of the codes in your article and only added age. The model with 8 latent classes has the smallest BIC. Question 1: I am not sure how to use the average posterior probabilities to improve my model selection. The latent classes in the 8-class model do not make much clinical sense, and the average posterior probabilities are between 0.65 and 0.88. Is there a cut-point below which the average posterior probabilities can be considered too small for the model to create robust latent classes? For example, the model with three latent classes has much higher average posterior probabilities (0.93, 0.93, 0.93). However, I believe there is more variability in my data set than can be captured with only 3 latent classes.

Question 2: I also have a question about including both a random intercept and a random slope in my model. Would you please check my model below and see whether I am including both a random intercept and a random slope? If not, how would I modify my model to do so? Also, how do I decide whether both of these random effects are needed to model my data set?
m <- lcmm(4 level ordinal outcome ~ age75, mixture = ~age75, random = ~ -1, subject = "ID", data = long, link = "thresholds", ng=8)
where long$age75 <- (long$age-75)/10 (youngest participants of my study are 75 years old)
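
A hedged sketch for question 2 (same call as above with only the random part changed; outcome4 stands in here for the 4-level ordinal outcome): random = ~ -1 specifies no random effects at all, a random intercept alone would be random = ~ 1, and a random intercept plus a random slope on age75 would be written as below.

# Random intercept and random slope on age75 (the intercept is included by
# default when a covariate is listed in the random formula).
m_ris <- lcmm(outcome4 ~ age75, mixture = ~age75, random = ~age75,
              subject = "ID", data = long, link = "thresholds", ng = 8)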

Question 3: Would you please help me understand what random = ~ -1 means? I ran the model with random = ~ +1 to figure out what is going on. The model took much longer to run, it assigned all participants to 7 latent classes, and the 8th class remained empty.

Please let me know if my questions are not clear.

I greatly appreciate your input and looking forward to hearing from you.

Best regards,
Maryam

spline prediction with hlme and predictY

Dear Dr Proust-Lima,

I am exploring heterogeneity in response to a treatment using a spline within the 'hlme' function. I encountered an issue when computing the predictions of the model for a new set of 'x' values. For simplicity, I illustrate the issue here using simulated data without response heterogeneity (ng=1); this permits fitting the model with both 'lmer' and 'hlme' and then comparing the predictions (see the reproducible example 'lcmm_test2.txt' attached):

[Figure: comparison of predictions]

On this plot we have:

  • In grey: the simulated data
  • In red: the predictions obtained using the function predict.merMod from the lme4 package
  • In blue: the predictions obtained using the predictY function
  • In yellow: the sample means

You suggested in an offline discussion that this might be due to the definition of the range of the spline and thus to explicitly define it when calling the spline function 'ns'. I thus tried to explicitly provide knots and Boundary.knots:

kk <- quantile(dt$x, prob=seq(0,1,len=5))[2:4]
rr <- range(dt$x)
hm1 <- hlme(y~ns(x,knots=kk, Boundary.knots=rr),subject="id",ng=1,data=dt)

but I get the following error:
Be patient, hlme is running ...
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 500, 3, 2

Then I tried to explicitly provide the basis. This worked for the call to the 'hlme' function but I finally get an error with the predictY function when trying to predict for a new set of 'x' values:

mybasis<- ns(dt$x,knots=kk, Boundary.knots=rr)
hm1 <- hlme(y~mybasis,subject="id",ng=1,data=dt)
mynewbasis <- predict(mybasis, newdt2$x)
predictY(hm1, newdata=mynewbasis, var.time="x")

but get the following Error:
newdata should at least include the following covariates:
mybasis NA
Error in predictY.hlme(hm1, newdata = mynewbasis, var.time = "x") :
see above

It seems that there are two problems here. First, predictY does not find the expected variable, and second, it is not clear to me what should be provided as 'var.time', because here we no longer have a simple time variable such as 'x'.

Can you please advise on how we should parametrize the 'hlme' and 'predictY' functions when using a spline as a covariate (see the reproducible example 'lcmm_test2.txt' attached) to obtain the correct predictions?

Thank you in advance for your time,

Kind regards,

Hugues.

lcmm_test2.txt

Convergence criterion fixed at a unique value

I’m working with repeated measures (n~ 14300, ~20 time points).
I'm trying to use a latent class mixed model to describe the evolution of a continuous marker (body mass index) according to age (age + age² + a binary cohort effect).

I have some convergence issues related to the “derivative” criterion, which takes the value 1.01 for every model tested so far.

I've tried several things:
- tested several initial B values,
- increased the rep and niter values,
- split my data and ran the model on each part to locate a possible error in the data.

The derivative criterion is always 1.01.

Any idea where I could look?

Thanks,

##
DATA$age <- DATA$age - 40  # minimal value
DATA$age_squared <- DATA$age * DATA$age
M1 <- hlme(fixed = BMI ~ age + age_squared + c_effect,
           random = ~ age + age_squared, subject = "PROJ_GEST_NTT",
           data = DATA[,])

M2_Hcc <- gridsearch(hlme(fixed = BMI ~ age + age_squared + c_effect,
                          random = ~ age + age_squared, subject = "PROJ_GEST_NTT",
                          data = DATA[,],
                          ng = 2,
                          mixture = ~ age + age_squared),
                     rep = 200, maxiter = 20,
                     minit = M1)

plot for mean prediction

Dear Dr. Proust,

I am trying to partition patients' trajectories of a biomarker using your lcmm package, and I am having trouble plotting mean predictions with the "plot" command.

My dataset "testdata" includes longitudinal data on the biomarker for ~1000 patients coming from 2 different centers, where the outcome "biomarker" is a continuous variable, the dummy variable "center" is used to adjust for a center effect, and the variable "interval_year" is the time variable in the latent class model. Patients' trajectories are assumed to follow the quadratic class-specific mixed model below.

Here are my R scripts:
m1 = hlme(fixed = biomarker ~ center + interval_year + I(interval_year^2), random = ~interval_year, subject = "Study_ID", ng = 1, idiag = FALSE, nwg = FALSE, cor = AR(interval_year), data = testdata)

# Number of latent classes is assumed to be 4.
# Use the grid search method to estimate parameters and avoid local maxima.
m = gridsearch(hlme(fixed = biomarker ~ center + interval_year + I(interval_year^2), mixture = ~ interval_year + I(interval_year^2), random = ~interval_year, subject = "Study_ID", ng = 4, idiag = FALSE, nwg = FALSE, cor = AR(interval_year), data = testdata), rep = 10, maxiter = 50, minit = m1)

## I am able to plot residuals
plot(m)

## Failed to plot mean predictions
plot(m, which = "fit", var.time = "interval_year", ylab = "biomarker", xlab = "follow-up (by year)")
After running the above plot call for mean predictions, I got the error message Error in data[, var.time] : object of type 'symbol' is not subsettable

Do you have any idea why plotting for prediction cannot work in my case? Thanks a lot!
