When using cross-validation to pick the hyperparameter, the propensity scores have ver

Cyclops' cross-validation does not pick very optimal hyperparameters for CohortMethod (closed)

ohdsi commented on June 12, 2024

Cyclops' cross-validation does not pick very optimal hyperparameters for CohortMethod

from cohortmethod.

Comments (6)

msuchard commented on June 12, 2024

This does disturb me. Is there a convenient way for me to reproduce the data and PS model? Fitting decisions to consider:

auto- vs. grid-search (auto-search seems to work better in my hands)
number of folds (10 is pretty standard, but data may be very noisy)
number of replicates (default is 1, but data may be very noisy)
how we pick the hold-out set at each hyperparameter (I am concerned that I coded a very silly choice here)
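
The first three decisions above map onto arguments of createControl. A minimal configuration sketch, assuming a Cyclops version whose createControl accepts cvType, cvRepetitions, and seed; the values are illustrative, not recommendations:

```r
library(Cyclops)

# Illustrative settings only: grid search instead of auto-search,
# 10 folds, 10 replicates, and a fixed seed so runs are comparable.
control <- createControl(cvType = "grid",
                         fold = 10,
                         cvRepetitions = 10,
                         seed = 666,
                         noiseLevel = "silent")

# Laplace prior whose variance is chosen by cross-validation.
prior <- createPrior("laplace", useCrossValidation = TRUE)
```

Switching cvType back to "auto" while keeping the other arguments fixed gives a like-for-like comparison of the two search strategies.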

schuemie commented on June 12, 2024

Below is some simple test code that reproduces the problem (I think). I'm using a train-test split, so we should be able to rule out the overfitting hypothesis.

library("Cyclops")
library("pROC")


predictOnTest <- function(fit,test){
    betas <- coef(fit)
    intercept <- betas[1]
    betas <- betas[2:length(betas)]
    betas <- data.frame(beta = as.numeric(betas),covariateId = as.numeric(names(betas)))
    prediction <- merge(test$covariates,betas)
    prediction$value = prediction$covariateValue * prediction$beta 
    prediction <- aggregate(value ~ rowId,data=prediction,sum)
    prediction$value = prediction$value + intercept
    link <- function(x) {
        return(1/(1+exp(-x)))
    }
    prediction$value = link(prediction$value)
    return(prediction)
}

evaluate <- function(fit,test){
    pred <- predictOnTest(fit,test)
    # Use the outcomes of the test set passed in rather than the global 'data'
    predVsTruth <- merge(pred,test$outcomes[,c("rowId","y")])
    auc <- roc(response = predVsTruth$y, predictor = predVsTruth$value)$auc
    writeLines(paste("Variance =",fit$variance,", AUC=",as.character(auc)))   
}

ntest <- 1000
ntrain <- 1000

data <- simulateData(nstrata=1,nrows=ntest+ntrain,ncovars=2000,model="logistic")
test <- list(outcomes = data$outcomes[1:ntest,], covariates = data$covariates[data$covariates$rowId %in% data$outcomes$rowId[1:ntest],])
train <- list(outcomes = data$outcomes[(ntest+1):(ntest+ntrain),], covariates = data$covariates[data$covariates$rowId %in% data$outcomes$rowId[(ntest+1):(ntest+ntrain)],])
cyclopsData <- convertToCyclopsData(train$outcomes,train$covariates,modelType = "lr",addIntercept = TRUE)


prior <- createPrior("laplace", useCrossValidation = TRUE)
control <- createControl(lowerLimit=0.01, upperLimit=10, fold=5, noiseLevel = "silent")
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)
evaluate(fit,test)

prior <- createPrior("laplace", useCrossValidation = TRUE)
control <- createControl(noiseLevel = "silent")
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)
evaluate(fit,test)

prior <- createPrior("laplace", variance = 0.1)
control <- createControl(noiseLevel = "silent")
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)
evaluate(fit,test)

prior <- createPrior("laplace", variance = 1)
control <- createControl(noiseLevel = "silent")
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)
evaluate(fit,test)

prior <- createPrior("laplace", variance = 10)
control <- createControl(noiseLevel = "silent")
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)
evaluate(fit,test)

msuchard commented on June 12, 2024

It appears that we should reconsider our cross-validation selection criterion. We are currently attempting to maximize the predicted (log) likelihood of the hold-out data. Using the following evaluation functions

predictOnTest <- function(fit,test){
    betas <- coef(fit)
    intercept <- betas[1]
    betas <- betas[2:length(betas)]
    betas <- data.frame(beta = as.numeric(betas),covariateId = as.numeric(names(betas)))
    prediction <- merge(test$covariates,betas)
    prediction$value = prediction$covariateValue * prediction$beta 
    prediction <- aggregate(value ~ rowId,data=prediction,sum)
    prediction$value = prediction$value + intercept
    link <- function(x) {
        return(1/(1+exp(-x)))
    }
    prediction$xBeta = prediction$value
    prediction$value = link(prediction$value)
    return(prediction)
}

evaluate <- function(fit,test){
    pred <- predictOnTest(fit,test)
    # Use the outcomes of the test set passed in rather than the global 'data'
    predVsTruth <- merge(pred,test$outcomes[,c("rowId","y")])
    auc <- roc(response = predVsTruth$y, predictor = predVsTruth$value)$auc    
    predLogLik <- sum(predVsTruth$y * predVsTruth$xBeta) - sum(log(1 + exp(predVsTruth$xBeta)))
    writeLines(paste("Variance =",fit$variance,", AUC=",as.character(auc), "PL=", predLogLik))   
}

evaluate() now also reports the predicted (log) likelihood of the test dataset. CV does seem to be doing a reasonable job finding a maximum when cvRepetitions is increased to, say, 10.
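
One caveat when computing this quantity by hand: log(1 + exp(x)) overflows to Inf once x is large. Below is a minimal base R sketch of a numerically stable version of the same predictive log-likelihood. The helper names log1pExp and predictiveLogLik are hypothetical and are not part of Cyclops.

```r
# Predictive log-likelihood of a logistic model on hold-out data:
#   sum_i [ y_i * xBeta_i - log(1 + exp(xBeta_i)) ]

# Stable log(1 + exp(x)), using the identity
#   log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))
log1pExp <- function(x) {
  pmax(x, 0) + log1p(exp(-abs(x)))
}

predictiveLogLik <- function(y, xBeta) {
  sum(y * xBeta - log1pExp(xBeta))
}

# Toy hold-out data with two extreme linear predictors
y <- c(1, 0, 1, 0)
xBeta <- c(2, -1, 800, -800)

# The direct formula overflows because exp(800) is Inf
naive <- sum(y * xBeta) - sum(log(1 + exp(xBeta)))
stable <- predictiveLogLik(y, xBeta)

print(c(naive = naive, stable = stable))
```

For moderate linear predictors both forms agree; the difference only matters when a weakly regularized fit produces very large values of xBeta.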

For reproducibility, I have been using

set.seed(666)

What easy-to-compute criterion should we be using if we want to maximize discrimination (AUC)?
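
For reference, the AUC itself is cheap to compute on a hold-out set from ranks, since it equals the normalized Mann-Whitney U statistic. The sketch below is plain base R; rankAuc is a hypothetical helper, and this is not how Cyclops scores hold-out folds.

```r
# AUC via the Mann-Whitney U statistic:
#   AUC = (sum of ranks of positives - nPos * (nPos + 1) / 2) / (nPos * nNeg)
# rank() assigns average ranks to ties, so tied scores count as 0.5.
rankAuc <- function(y, score) {
  nPos <- sum(y == 1)
  nNeg <- sum(y == 0)
  stopifnot(nPos > 0, nNeg > 0)
  r <- rank(score)
  (sum(r[y == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Small example: one of the four positive/negative pairs is mis-ordered
y <- c(0, 0, 1, 1)
score <- c(0.1, 0.4, 0.35, 0.8)
auc <- rankAuc(y, score)
print(auc)
```

The cost is dominated by the sort inside rank(), so it is O(n log n) per hold-out fold. Unlike the log likelihood, it is insensitive to calibration, which may or may not be desirable for a propensity model.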

msuchard commented on June 12, 2024

We should also be excluding the intercept term from regularization via

prior <- createPrior("laplace", exclude = c(0), useCrossValidation = TRUE)

schuemie commented on June 12, 2024

Hi Marc,

I'm not sure what exactly solved the problem, but it is solved now. The optimal likelihood also leads to good AUC, although not optimal. But optimizing on AUC seems like it's not a very good idea anyway. After fitting a propensity score using cross-validation (using the default settings, so the auto approach), we now get covariate balance.

My guess is that fixing of the folds leads to more stable prediction of performance at grid points, and therefore better estimation of the optimal variance. Anyway, I'm closing this issue.

(It would be nice if we could use parallelization to speed up the cross-validation though ;-) )

schuemie commented on June 12, 2024

Just an afterthought: unstable cross-validation estimates due to rerandomization of the folds would mostly affect evaluations of large prior variances; for smaller variances performance would be more stable because everything shrinks towards 0. This would bias the optimization to select smaller variances, which is what I think I observed.
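
To make the fixed versus re-randomized fold distinction concrete, here is a toy base R sketch. It swaps in closed-form ridge linear regression for the Laplace-prior logistic model so that it runs without Cyclops; all names (ridgeFit, cvError, makeFolds) and the simulated data are hypothetical. Note that the ridge penalty lambda acts like an inverse prior variance, so a small lambda corresponds to a large variance.

```r
set.seed(666)

# Simulated data: 3 informative covariates out of 20. Covariates and noise
# have mean zero in expectation, so no intercept is fitted.
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 3), rep(0, p - 3))
y <- as.vector(x %*% beta + rnorm(n))

# Closed-form ridge estimate for one penalty value
ridgeFit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Mean squared hold-out error for one penalty, given a fold assignment
cvError <- function(x, y, lambda, foldId) {
  errors <- sapply(sort(unique(foldId)), function(k) {
    holdOut <- foldId == k
    b <- ridgeFit(x[!holdOut, , drop = FALSE], y[!holdOut], lambda)
    mean((y[holdOut] - x[holdOut, , drop = FALSE] %*% b)^2)
  })
  mean(errors)
}

# Random assignment of n rows to equally sized folds
makeFolds <- function(n, fold) sample(rep_len(seq_len(fold), n))

lambdas <- 10^seq(-2, 3, length.out = 10)

# Fixed folds: one assignment shared by every grid point
fixedFolds <- makeFolds(n, 10)
fixedCurve <- sapply(lambdas, function(l) cvError(x, y, l, fixedFolds))

# Re-randomized folds: a fresh assignment at every grid point
randomCurve <- sapply(lambdas, function(l) cvError(x, y, l, makeFolds(n, 10)))

bestFixed <- lambdas[which.min(fixedCurve)]
print(data.frame(lambda = lambdas, fixed = fixedCurve, random = randomCurve))
```

With fixed folds, differences between grid points come only from the hyperparameter. With re-randomized folds, each grid point also carries its own partition noise, which is the instability suspected above.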
