Dear Professor Frank,
I am trying to run survfit using cph object according to the syntax mentioned here: https://www.rdocumentation.org/packages/rms/versions/5.1-1/topics/survfit.cph
My data is in a counting style format as it has TVC’s, however, it is ignoring the subject identifier “id” vector mentioned. Survfit is fitting the model by treating all observations of the same subject to be different from each other (as if it were different subjects) despite specifying the id vector. That is if there are 10 subjects with 3 time stops (1,2,3) for each subject, it is giving 30 different survival curves rather than 10 survival curves each belonging to one subject. I am really not sure what am I doing wrong?
My code is as follows:
library(survival)
install.packages("rms", dependencies = TRUE)
library(rms)
'01. input modelling data'
MOD_DS <- read.csv("V:/CoxPH_SMPL_FOR_R.csv",header=TRUE)
'02. run Cox regression using TVC'
COX_MODEL_V01 <- cph(Surv(TIME_LAG,TIME, BAD) ~ V1+ V2 + V3, data =MOD_DS,y=TRUE,x=TRUE,surv=TRUE)
'03. store covariate values to score accounts by the model' 'The covariates dataset must be in the same format as the modelling ds. It should have a BAD variable as well, though it could be coded to any dummy value as it is not used by survfit'
covs_FOR_SCORING <- read.csv("V:/covs_FOR_SCORING.csv",header=TRUE)
'04. Define the subject identifier vector'
id <- covs_FOR_SCORING[["sub_id"]]
'05. Run survfit using cph'
SCORED_POP <- summary (survfit(COX_MODEL_V01, newdata = covs_FOR_SCORING, individual = FALSE,conf.type =("none"), se.fit = F, id , type="counting"))
As noted in step 05 above, I am using id vector which uses the sub_id mentioned in the newdata = covs_FOR_SCORING dataset but the code runs ignoring the subject identifier. The number of rows in the id vector = number of rows in newdata.
P.S. I don’t want to use the survfit with coxph object as I have a dataset with 120K subjects and it takes 4 hours to fit my model on it. The cph survfit runs in 1 min!!!!
I am mentioning an example dataset for your perusal and debugging.
- please use this an input for modelling using Cox cph
- please use the same dataset for getting survival estimates through survfit
<--------------------------------------- Example dataset --------------------------------------------->
1 |
2 |
3 |
4 |
5 |
6 |
id |
death |
age |
lt_age |
time0 |
time1 |
1 |
1 |
20 |
0 |
0 |
1 |
2 |
0 |
21 |
0 |
0 |
1 |
3 |
0 |
19 |
0 |
0 |
1 |
3 |
1 |
19 |
16.05686 |
1 |
7 |
4 |
0 |
22 |
0 |
0 |
1 |
4 |
0 |
22 |
18.59216 |
1 |
7 |
4 |
1 |
22 |
22 |
7 |
10 |
5 |
0 |
20 |
0 |
0 |
1 |
5 |
0 |
20 |
16.90196 |
1 |
7 |
5 |
0 |
20 |
20 |
7 |
10 |
6 |
0 |
24 |
0 |
0 |
1 |
6 |
0 |
24 |
20.28235 |
1 |
7 |
6 |
0 |
24 |
24 |
7 |
10 |
6 |
1 |
24 |
26.73464 |
10 |
13 |
7 |
0 |
28 |
8.42884 |
0 |
2 |
7 |
0 |
28 |
19.57116 |
2 |
5 |
7 |
0 |
28 |
23.66275 |
5 |
7 |
7 |
0 |
28 |
25.28652 |
7 |
8 |
7 |
0 |
28 |
28 |
8 |
10 |
7 |
1 |
28 |
31.19041 |
10 |
13 |
library(survival)
install.packages("rms", dependencies = TRUE)
library(rms)
'01. input modelling data (please input the above dataset'
MOD_DS <- read.csv("V:/sample_ds_for_surv_cph.csv",header=TRUE)
'02. run Cox regression using TVC'
COX_MODEL_V01 <- cph(Surv(time0,time1, death) ~ age + lt_age , data =MOD_DS,y=TRUE,x=TRUE,surv=TRUE)
'03. store covariate values to score accounts by the model'
covs_FOR_SCORING <- read.csv("V:/MOD_DS .csv",header=TRUE)
id <- covs_FOR_SCORING[["id"]]
'04. score using survfit cph'
SCORED_POP <- summary(survfit(COX_MODEL_V01, newdata = covs_FOR_SCORING, individual = FALSE,conf.type =("none"),se.fit = F, id ,type="counting"))
'05. score using survfit coxph'
COX_MODEL_V01 <- coxph(Surv(time0,time1, death) ~ age + lt_age, data = MOD_DS)
SCORED_POP <- summary(survfit(COX_MODEL_V01, newdata = covs_FOR_SCORING,id=id, individual = FALSE,conf.type =("none"),se.fit = F ))
O/P from survfit coxph is different from O/P from survfit cph. The former has 7 survival curves, one for each subject. The latter has 20 survival curves.
survfit.cph output
time n.risk n.event survival1 survival2 survival3 survival4 survival5 survival6 survival7
1 7 1 0.829 0.929 0.62 0 0.971 0 0
7 5 1 0.829 0.929 0.62 0 0.971 0 0
10 4 1 0.829 0.929 0.62 0 0.971 0 0
13 2 2 0.829 0.929 0.62 0 0.971 0 0
survival8 survival9 survival10 survival11 survival12 survival13 survival14 survival15
0.829 0 0 0.996 0 0 0 0.962
0.829 0 0 0.996 0 0 0 0.962
0.829 0 0 0.996 0 0 0 0.962
0.829 0 0 0.996 0 0 0 0.962
survival16 survival17 survival18 survival19 survival20
2.73e-42 0 0 0 0
2.73e-42 0 0 0 0
2.73e-42 0 0 0 0
2.73e-42 0 0 0 0
survfit.coxph output
1
1
time n.risk n.event survival
1.000 7.000 1.000 0.829
2
time n.risk n.event survival
1.000 7.000 1.000 0.929
3
time n.risk n.event survival
1 7 1 0.620
7 5 1 0.401
4
time n.risk n.event survival
1 7 1 0.971
7 5 1 0.831
10 4 1 0.623
5
time n.risk n.event survival
1 7 1 0.829
7 5 1 0.608
10 4 1 0.384
6
time n.risk n.event survival
1 7 1 0.996
7 5 1 0.920
10 4 1 0.768
13 2 2 0.109
7
time n.risk n.event survival
1 7 1 0.962
7 5 1 0.943
10 4 1 0.878
13 2 2 0.307