Template-kod från Projekt 29 som förhoppningsvis kan återanvändas.
title: "External validation"
author: "Erik Bulow"
date: '2019-11-13'
output:
html_document:
toc: true
toc_float: true
df_print: paged
code_folding: show
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, cache = TRUE)
Start
Install and attatch some useful packages
pkgs <- c("tidyverse", "doParallel", "pROC", "rms", "givitiR")
# Install
pkgsinst <- setdiff(pkgs, rownames(installed.packages()))
if (length(pkgsinst)) install.packages(pkgsinst)
# Attatch
purrr::walk(pkgs, ~suppressPackageStartupMessages(library(., character.only = TRUE)))
Load the exported model to validate (the R object previsously sent, probably with another path).
# Load the exported model object
load("../cache/fit_export.RData")
Set random seed for reproducability:
Prepare data
Inclusion/exlusion
Those were the inclusions/exklusions from Sweden. It might not be necessary to filter out on BMI, hospital and education however. Those variables are not used in the model.
Additional filter to ages 35-99 years to match the Swedish cohort.
knitr::include_graphics("../graphs/flowchart.png")
Variables
Outcome
The outcome variable is boolean (or numeric) indicating wether the patient died of any cause within 90 days after THA (TRUE
/1) or not (FALSE
/0). We did not have any censoring in Sweden. I guess we can simply drop cases where status is unknown?
Predictors
The data to evaluate should look like this:
Baseline variables
- P_Gender: Kvinna/Man = Female/Male
- P_ASA: level 1-3
- P_Age: 35 - 99
Comorbidities
Comorbidities are identified by Elixhauser and Charlson comorbidity during one year before surgery. This is done by ICD-10 codes in Sweden but other versions exist. If ICD-10-codes are used those might be identified by regular expressions in the table below. Those are based on codes like "C123". Hence no puntctuation (not "C12.3"). The regular expressions might be modified to allow possible other patterns. Regular expressoins below are just combined from differnet groups (separated by "|"). They could be rewritten for clarity. If you don't see any regular expressions, click the small arrow in the upper right corner of the table. The grepl
function (base R) can be used to identify ICD-10-codes based on the regular expressions.
load("../cache/categorization.RData")
comorbidities <-
categorization %>%
map_df(as.character) %>%
filter(new %in% names(fit_export$data)[grepl("c_", names(fit_export$data))]) %>%
separate(old, letters, sep = "\\|", fill = "right") %>%
pivot_longer(letters, values_drop_na = TRUE) %>%
mutate(value = trimws(value)) %>%
separate(value, c("index", "name"), extra = "merge") %>%
mutate(name = gsub("_", " ", name))
CCI <-
comorbidities %>%
filter(index == "CCI") %>%
left_join(coder::charlson_icd10, c(name = "group"))
ECI <-
comorbidities %>%
filter(index == "ECI") %>%
left_join(coder::elix_icd10, c(name = "group"))
comorb_defs <-
bind_rows(CCI, ECI) %>%
select(new, regex) %>%
mutate(regex = sprintf("(%s)", gsub("^", "", regex, fixed = TRUE))) %>%
group_by(new) %>%
summarise(regex = paste(regex, collapse = "|"))
comorb_defs
Example data
Let's assume we now have some data (I will use the Swedish data just as an example).
y <- fit_export$y
X <- fit_export$data
Validation of model as is
# Tibble with observed and predicted outcome
obspred <-
tibble(
obs = y,
pred = predict(fit_export, X, type = "response")
)
# ROC curve
ROC <- pROC::roc(obspred, "obs", "pred", direction = "<")
# Estimate CI for AUC based on bootstrapping
# Use parallel processing to speed up the process
doParallel::registerDoParallel()
AUCci <-
pROC::ci.auc(
ROC,
method = "bootstrap",
boot.stratified = FALSE,
parallel = TRUE
)
# Check calibration. Note that devel should actually be "internal" for this example but I use
# "external", since that's what you will use for the UK data.
calibration <-
givitiR::givitiCalibrationBelt(
obspred$obs,
obspred$pred,
devel = "external"
)
Results
For this example we had AUC:
A calibration belt plot might be illustrated as:
plot(calibration, xlim = c(0, 0.03), ylim = c(0, 0.06))
Re-calibrated intercept
Method 2 from table 1 in Steyerberg 2004.
Z <- predict(fit_export, X, type = "response")
# Refit the intercept using Z = a + Xb from above as offset
fit2 <- glm(y ~ 1, offset = Z)
# Same calibration and validation as above
obspred2 <- tibble(obs = y, pred = predict(fit2, type = "response"))
ROC2 <- pROC::roc(obspred2, "obs", "pred", direction = "<")
AUCci2 <- pROC::ci.auc(ROC2, method = "bootstrap", boot.stratified = FALSE, parallel = TRUE)
calibration2 <- givitiR::givitiCalibrationBelt(obspred2$obs, obspred2$pred, devel = "external")
Results
AUCci2
plot(ROC2)
plot(calibration2, xlim = c(0, 0.03), ylim = c(0, 0.06))
Re-calibration of intercenpt and calibration slope
Method 3 from table 1 in Steyerberg 2004.
fit3 <- glm(y ~ 1 + Z)
obspred3 <- tibble(obs = y, pred = predict(fit3, type = "response"))
ROC3 <- pROC::roc(obspred3, "obs", "pred", direction = "<")
AUCci3 <- pROC::ci.auc(ROC3, method = "bootstrap", boot.stratified = FALSE, parallel = TRUE)
calibration3 <- givitiR::givitiCalibrationBelt(obspred3$obs, obspred3$pred, devel = "external")
Results
AUCci3
plot(ROC3)
plot(calibration3, xlim = c(0, 0.03), ylim = c(0, 0.06))
Export data to Sweden
ROC
If it would be OK to export coordinates for ROC-plots I would recommend this code to extract only the minimal data needed:
roc_plot_coords <-
data.frame(
specificities = ROC$specificities,
sensitivities = ROC$sensitivities
)
roc3_plot_coords <-
data.frame(
specificities = ROC3$specificities,
sensitivities = ROC3$sensitivities
)
AUC with CI
The text output from AUCci
and AUCci3
should be enough. Hence, the same character string that gets printed above (but now stored in an object).
AUCci_print <- capture.output(AUCci)
AUCci3_print <- capture.output(AUCci3)
Export objects
Save objects above to file export.RData
(in the current working directory).
save(
roc_plot_coords,
roc3_plot_coords,
AUCci_print,
AUCci3_print,
file = "export.RData"
)
Calibration plots
Help function
This is a simple help function to make a clean calibration belt plot and save it as TIFF (in the curent working directory):
makeplot <- function(x, file_name = deparse(substitute(x))) {
tiff(
paste0(file_name, ".tiff"),
1024, 1024, pointsize = 36,
compression = "lzw"
)
tcks <- seq(.0, .1, .01)
plot(
x,
xlim = c(0, .06),
ylim = c(0, .08),
xlab = "Predicted probabilities [%]",
ylab = "Observed probabilities [%]",
main = NULL,
table = FALSE,
polynomialString = FALSE,
pvalueString = FALSE,
nString = FALSE,
mar = c(5, 4, 0, 0) + 0.1,
xaxt = "n",
yaxt = "n"
)
abline(v = .03, lty = "dashed", col = "darkgreen", lwd = 3)
axis(1, at = tcks, lab = sprintf("%.0f", tcks * 100), las = TRUE)
axis(2, at = tcks, lab = sprintf("%.0f", tcks * 100), las = TRUE)
dev.off()
}
Make and save figures
makeplot(calibration)
makeplot(calibration3)
ROC plots
If it not possible to export data for ROC-plot, here is some code to make a figure and save it as TIFF (in the current working directory) instead:
roc_plot_coords %>%
ggplot(aes(1 - specificities, sensitivities)) +
geom_path(size = 2) +
geom_abline(intercept = 0, slope = 1, color = "grey", linetype = 2) +
theme_minimal() +
theme(
legend.position = c(1, 0),
legend.justification = c(1, 0),
legend.title = element_blank()
)
ggsave(
"roc.tiff",
height = 10,
width = 10,
unit = "cm",
dpi = 900,
compression = "lzw"
)