
An R package for performing new-user cohort studies in an observational database in the OMOP Common Data Model.

Home Page: https://ohdsi.github.io/CohortMethod


CohortMethod


CohortMethod is part of HADES.

Introduction

CohortMethod is an R package for performing new-user cohort studies in an observational database in the OMOP Common Data Model.

Features

  • Extracts the necessary data from a database in OMOP Common Data Model format.
  • Uses a large set of covariates for both the propensity and outcome model, including for example all drugs, diagnoses, procedures, as well as age, comorbidity indexes, etc.
  • Uses large-scale regularized regression to fit the propensity and outcome models.
  • Includes functions for trimming, stratifying, matching, and weighting on propensity scores.
  • Includes diagnostic functions, including propensity score distribution plots and plots showing covariate balance before and after matching and/or trimming.
  • Supported outcome models are (conditional) logistic regression, (conditional) Poisson regression, and (conditional) Cox regression.

Screenshots

[Figures: propensity (preference score) distribution; covariate balance plot]

Technology

CohortMethod is an R package, with some functions implemented in C++.

System Requirements

Requires R (version 3.6.0 or higher). Installation on Windows requires RTools. Libraries used in CohortMethod require Java.

Installation

  1. See the instructions here for configuring your R environment, including RTools and Java.

  2. In R, use the following commands to download and install CohortMethod:

install.packages("remotes")
remotes::install_github("ohdsi/CohortMethod")
  3. Optionally, run this to check if CohortMethod was correctly installed:
connectionDetails <- createConnectionDetails(dbms="postgresql",
                                             server="my_server.org",
                                             user = "joe",
                                             password = "super_secret")

checkCmInstallation(connectionDetails)

Where dbms, server, user, and password need to be changed to the settings for your database environment. Type

?createConnectionDetails

for more details on how to configure your database connection.

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available:

Support

Contributing

Read here how you can contribute to this package.

License

CohortMethod is licensed under Apache License 2.0

Development

CohortMethod is being developed in RStudio.

Development status

CohortMethod is actively being used in several studies and is ready for use.

Acknowledgements

  • This project is supported in part through the National Science Foundation grant IIS 1251151.

cohortmethod's People

Contributors

aki-nishimura, anthonysena, approximateidentity, azimov, fanbu1995, fdefalco, k-m-li, louisahsmith, msuchard, mvankessel-emc, phillipsundin, schuemie, sirpoovey

cohortmethod's Issues

Variable case inconsistency

Line 848 in PsFunctions.R has inconsistent case in the name of a variable. It is currently
beforeMatchingsumComparator - should it be beforeMatchingSumComparator?

A minor issue but I noticed this when working through the comparative cohort analysis workflow and the SqlRender function converting camel case to snake case bit me with this issue.
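To illustrate why the capitalization matters, here is a minimal sketch of a camelCase-to-snake_case conversion. The helper `camelToSnake` is hypothetical and written in base R for illustration; it is not the SqlRender implementation, only an approximation of the same idea.

```r
# Hypothetical helper (not the SqlRender function): insert an underscore
# before every upper-case letter, then convert everything to lower case.
camelToSnake <- function(x) {
  tolower(gsub("([A-Z])", "_\\1", x))
}

# The lower-case "s" in "sum" means no word boundary is detected there,
# so the two spellings map to different snake_case names.
current <- camelToSnake("beforeMatchingsumComparator")
proposed <- camelToSnake("beforeMatchingSumComparator")
print(current)
print(proposed)
```

Any code that looks up the snake_case column by its expected name would then miss the misspelled variant.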

Too many expressions in query list

Hi Martijn,

In Oracle, the maximum number of expressions in a list is 1000. If length(nsaids) in the example of "Single studies using the CohortMethod package" is greater than 1000, an error will occur. How do we deal with this issue?

One way could be:
split the list into several sub-lists with length <= 100 and combine them with OR clauses in the query condition.

Thanks,

Zuoyi
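The chunking idea proposed above could be sketched as follows. This is only an illustration in base R: the function name `buildChunkedInClause`, its arguments, and the column name in the example are assumptions, not part of CohortMethod or SqlRender.

```r
# Hypothetical helper: split a long ID vector into chunks and join the
# resulting IN (...) lists with OR, so no single list exceeds chunkSize.
buildChunkedInClause <- function(column, ids, chunkSize = 1000) {
  stopifnot(length(ids) > 0, chunkSize >= 1)
  # Chunk index per ID: 1, 1, ..., 2, 2, ...
  chunkIndex <- ceiling(seq_along(ids) / chunkSize)
  chunks <- split(ids, chunkIndex)
  clauses <- vapply(chunks, function(chunk) {
    # format() avoids scientific notation for large numeric IDs.
    idList <- paste(format(chunk, scientific = FALSE, trim = TRUE), collapse = ", ")
    sprintf("%s IN (%s)", column, idList)
  }, character(1))
  sprintf("(%s)", paste(clauses, collapse = " OR "))
}

small <- buildChunkedInClause("drug_concept_id", 1:5, chunkSize = 2)
print(small)

# 1500 IDs with the default chunk size yield two IN lists joined by one OR.
large <- buildChunkedInClause("drug_concept_id", 1:1500)
```

The outer parentheses keep the OR-ed lists together when the clause is combined with other conditions using AND.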

Error reproducing vignette in postgresql

Hi everyone,

When I try running the vignette in postgresql, I get the following error at 35% completion when calling getDbCohortData:

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: operator does not exist: text + character varying
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Position: 954)

Any suggestions for how to deal with this?

Gratefully,
Trevor

CohortMethod plots: allow users to add titles?

I would like to add a title to the various plots, one that lets me name the treated/comparator groups and, if applicable, state which analysis the graph refers to. I have achieved this in the past by saving the plot object and then adding ggtitle() before saving it, but this can cause some annoying formatting issues, particularly if the title is long or multi-lined. Would it be possible to have a 'title' parameter on our graphs that formats better?
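Until such a parameter exists, one possible workaround for long titles is to wrap the text before passing it to ggtitle(). The helper `wrapTitle` below is a hypothetical base R sketch, and the example title is made up.

```r
# Hypothetical helper: break a long title into lines shorter than `width`
# characters so it does not run off the plot.
wrapTitle <- function(title, width = 60) {
  paste(strwrap(title, width = width), collapse = "\n")
}

wrapped <- wrapTitle(
  "Target cohort versus comparator cohort: preference score distribution after matching",
  width = 40
)
cat(wrapped, "\n")
```

With a saved plot object, the wrapped text could then be applied as `plot + ggplot2::ggtitle(wrapTitle(myTitle))`.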

Add flow chart of population / attrition diagram

Create a diagram showing how many patients are 'lost' due to the different filtering steps, stratified by treatment status. At least these steps should be distinguished:

  1. Having at least washout period amount of observation time
  2. Requiring the first exposure to be after the washout period
  3. Prior outcome
  4. Matching / trimming
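The counts behind such a diagram could be computed along these lines. This is a sketch on made-up data: the column names, the per-person flags, and the function `computeAttrition` are all assumptions for illustration, not the package's data model.

```r
# Hypothetical person-level data: one row per person, one logical flag
# per filtering step (TRUE = person passes that step).
persons <- data.frame(
  treatment                 = c(1, 1, 1, 1, 0, 0, 0, 0),
  hasWashout                = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  firstExposureAfterWashout = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE),
  noPriorOutcome            = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE),
  matched                   = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)
steps <- c("hasWashout", "firstExposureAfterWashout", "noPriorOutcome", "matched")

# Apply the steps cumulatively and count who remains in each group.
computeAttrition <- function(persons, steps) {
  countGroups <- function(label, remaining) {
    data.frame(step = label,
               treated = sum(remaining & persons$treatment == 1),
               comparator = sum(remaining & persons$treatment == 0))
  }
  remaining <- rep(TRUE, nrow(persons))
  rows <- list(countGroups("Original cohorts", remaining))
  for (step in steps) {
    remaining <- remaining & persons[[step]]
    rows[[length(rows) + 1]] <- countGroups(step, remaining)
  }
  do.call(rbind, rows)
}

attritionTable <- computeAttrition(persons, steps)
print(attritionTable)
```

Each row of the table corresponds to one box in the flow chart, with the difference between consecutive rows giving the number of persons lost at that step.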

Covariate names are too long

Currently covariate names can be hundreds of characters long, too long for visualizations and for readable tables. Maybe we should (also) generate abbreviated names?
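One simple way to generate abbreviated names is plain truncation with an ellipsis, sketched below. The helper `shortenName` and the example names are hypothetical; a real solution might abbreviate more intelligently.

```r
# Hypothetical helper: truncate names longer than maxLength and mark the
# cut with "..."; shorter names are returned unchanged.
shortenName <- function(name, maxLength = 40) {
  tooLong <- nchar(name) > maxLength
  name[tooLong] <- paste0(substr(name[tooLong], 1, maxLength - 3), "...")
  name
}

covariateNames <- c(
  "Age group: 65-69",
  "Condition occurrence record observed during 365d on or prior to cohort index: some very long concept name"
)
shortNames <- shortenName(covariateNames, maxLength = 40)
print(shortNames)
```

Truncation can make distinct covariates share the same short name, so the full name (or the covariate ID) should stay available alongside it.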

Comparing Cohorts with CohortMethod

I am curious about your thoughts on the virtues of using CohortMethod as a general comparator tool for cohorts. I have a project where I am not comparing outcome differences between groups but rather want to see how the groups differ.

It strikes me that CM does this already in its intermediate steps. Any caveats I should consider?

Patrick and I have chatted about a comparison view for Heracles, but in this case I am looking more for a rank-ordered list of concepts.

Dependencies missing

When installing this package with devtools, the dependencies

  • ohdsi/DatabaseConnector
  • ohdsi/SqlRender
  • ohdsi/FeatureExtraction

are missing. This means that devtools::install_github("ohdsi/CohortMethod") fails.
Installing the dependencies by hand allows devtools to install CohortMethod.

Bug: CohortMethod.sql, varchar field in cohort_covariate_ref table not large enough?

This field doesn't seem to be large enough for all cases:

https://github.com/OHDSI/CohortMethod/blob/master/inst/sql/sql_server/CohortMethod.sql#L154

When that gets translated to redshift (at least in my case), the field becomes varchar(256). But this isn't long enough for some cases at this line in the CohortMethod.sql code:

https://github.com/OHDSI/CohortMethod/blob/master/inst/sql/sql_server/CohortMethod.sql#L681

That field can end up too long if the concept_name field is long. For example, for concept_id 439181, we have that concept_name is: "Cortex contusion without open intracranial wound AND with prolonged loss of consciousness (more than 24 hours) without return to pre-existing conscious level" which ends up being too long (total length of concatenated string is 274).

I have code that replicates this if anyone is curious (I doubt it), but I'd have to clean it a bit so I'll only do that if someone's interested. I guess the field size should be increased (though the "max" there is a little ominous) or maybe the field should just be changed to "text"? I'm not sure what the best solution is myself since I'm still not too comfortable with all the inner workings of SqlRender, etc.

Error on getPsModel

Let me know if I can provide any additional details.

propensityModel <- getPsModel(results, cohortData)
Error in abs(cfs$coefficient) :
non-numeric argument to mathematical function
In addition: Warning message:
In merge.ffdf(ff::as.ffdf(cfs), cohortData$covariateRef, by.x = "id", :
No match found, returning NULL as ffdf can not contain 0 rows

traceback
function (x = NULL, max.lines = getOption("deparse.max.lines"))
{
if (is.null(x) && !is.null(x <- get0(".Traceback", envir = baseenv()))) {
}
else if (is.numeric(x))
x <- .Internal(traceback(x))
n <- length(x)
if (n == 0L)
cat(gettext("No traceback available"), "\n")
else {
for (i in 1L:n) {
label <- paste0(n - i + 1L, ": ")
m <- length(x[[i]])
if (!is.null(srcref <- attr(x[[i]], "srcref"))) {
srcfile <- attr(srcref, "srcfile")
x[[i]][m] <- paste0(x[[i]][m], " at ", basename(srcfile$filename),
"#", srcref[1L])
}
if (m > 1)
label <- c(label, rep(substr(" ", 1L,
nchar(label, type = "w")), m - 1L))
if (is.numeric(max.lines) && max.lines > 0L && max.lines <
m) {
cat(paste0(label[1L:max.lines], x[[i]][1L:max.lines]),
sep = "\n")
cat(label[max.lines + 1L], " ...\n")
}
else cat(paste0(label, x[[i]]), sep = "\n")
}
}
invisible(x)
}
<bytecode: 0x000000001505bed0>
<environment: namespace:base>

plotPS enhancements: adding information about equipoise

On the propensity score distribution plot, could we add some statistics like:

  • total cohort size in treated/comparator (perhaps "n=xxx" on the legend)
  • % of treated/comparator group in clinical equipoise (e.g. % with 0.4<=PrefScore<=0.6)
  • % of treated/comparator group without overlap
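The first two statistics could be computed roughly as follows. This sketch assumes a data frame with columns `treatment` (1/0) and `propensityScore`, and it assumes the usual definition of the preference score, in which the logit of the propensity score is shifted by the logit of the proportion treated. The function `computeEquipoise` is hypothetical, and the overlap statistic from the third bullet is not covered.

```r
# Hypothetical helper: group sizes and percentage of each group whose
# preference score falls within the equipoise bounds.
computeEquipoise <- function(ps, bounds = c(0.4, 0.6)) {
  proportionTreated <- mean(ps$treatment == 1)
  # Assumed definition: logit(preference) = logit(propensity) - logit(proportion treated)
  preferenceScore <- plogis(qlogis(ps$propensityScore) - qlogis(proportionTreated))
  inEquipoise <- preferenceScore >= bounds[1] & preferenceScore <= bounds[2]
  data.frame(
    group = c("treated", "comparator"),
    n = c(sum(ps$treatment == 1), sum(ps$treatment == 0)),
    percentInEquipoise = 100 * c(mean(inEquipoise[ps$treatment == 1]),
                                 mean(inEquipoise[ps$treatment == 0]))
  )
}

# Made-up scores; with equal group sizes the preference score equals the
# propensity score.
ps <- data.frame(
  treatment = c(1, 1, 1, 0, 0, 0),
  propensityScore = c(0.5, 0.55, 0.9, 0.5, 0.1, 0.2)
)
equipoise <- computeEquipoise(ps)
print(equipoise)
```

The resulting numbers could then be pasted into the legend labels of the plot.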

Error When Running CM Analysis

I received this error when running CM (tough to get after 9.72 hours! :) ):

Error in UseMethod("open") :
no applicable method for 'open' applied to an object of class "data.frame"

Here is the full output, and below is my study configuration:

Connecting using Oracle driver

Constructing treatment and comparator cohorts
Executing multiple queries. This could take a while
|==================================================================================================================================| 100%
Analysis took 0.822 secs
Fetching data from server
Loading took 2.56 secs
Constructing default covariates
|==================================================================================================================================| 100%
Analysis took 9.72 hours
Done
Fetching data from server
Loading took 4.11 mins
Removing redundant covariates
Normalizing covariates

Constructing outcomes
Executing multiple queries. This could take a while
|==================================================================================================================================| 100%
Analysis took 6.27 secs
Done
Fetching data from server
Loading took 0.306 secs
Error in UseMethod("open") :
no applicable method for 'open' applied to an object of class "data.frame"
In addition: Warning message:
In lowLevelQuerySql.ffdf(connection, sql) :
Data has zero rows, returning an empty data frame

Study configuration (I did not use any excluded concepts, which I realize is not good, but I don't think that is the cause of the error):

covarSettings <- createCovariateSettings(useCovariateDemographics = TRUE,
                                         useCovariateConditionOccurrence = TRUE,
                                         useCovariateConditionOccurrence365d = TRUE,
                                         useCovariateConditionOccurrence30d = TRUE,
                                         useCovariateConditionOccurrenceInpt180d = TRUE,
                                         useCovariateConditionEra = TRUE,
                                         useCovariateConditionEraEver = TRUE,
                                         useCovariateConditionEraOverlap = TRUE,
                                         useCovariateConditionGroup = TRUE,
                                         useCovariateDrugExposure = TRUE,
                                         useCovariateDrugExposure365d = TRUE,
                                         useCovariateDrugExposure30d = TRUE,
                                         useCovariateDrugEra = TRUE,
                                         useCovariateDrugEra365d = TRUE,
                                         useCovariateDrugEra30d = TRUE,
                                         useCovariateDrugEraEver = TRUE,
                                         useCovariateDrugEraOverlap = TRUE,
                                         useCovariateDrugGroup = TRUE,
                                         useCovariateProcedureOccurrence = TRUE,
                                         useCovariateProcedureOccurrence365d = TRUE,
                                         useCovariateProcedureOccurrence30d = TRUE,
                                         useCovariateProcedureGroup = TRUE,
                                         useCovariateObservation = TRUE,
                                         useCovariateObservation365d = TRUE,
                                         useCovariateObservation30d = TRUE,
                                         useCovariateObservationCount365d = TRUE,
                                         useCovariateMeasurement365d = TRUE,
                                         useCovariateMeasurement30d = TRUE,
                                         useCovariateMeasurementCount365d = TRUE,
                                         useCovariateMeasurementBelow = TRUE,
                                         useCovariateMeasurementAbove = TRUE,
                                         useCovariateConceptCounts = TRUE,
                                         useCovariateRiskScores = TRUE,
                                         useCovariateRiskScoresCharlson = TRUE,
                                         useCovariateRiskScoresDCSI = TRUE,
                                         useCovariateRiskScoresCHADS2 = TRUE,
                                         useCovariateInteractionYear = FALSE,
                                         useCovariateInteractionMonth = FALSE,
                                         deleteCovariatesSmallCount = 100)

cohortMethodData <- getDbCohortMethodData(connectionDetails,
                                          cdmDatabaseSchema = cdmDatabaseSchema,
                                          oracleTempSchema = resultsDatabaseSchema,
                                          targetId = 1082,
                                          comparatorId = 1081,
                                          indicationConceptIds = c(),
                                          washoutWindow = 183,
                                          indicationLookbackWindow = 183,
                                          studyStartDate = "",
                                          studyEndDate = "",
                                          outcomeIds = 1080,
                                          outcomeConditionTypeConceptIds = c(),
                                          exposureDatabaseSchema = resultsDatabaseSchema,
                                          exposureTable = "CATH_STUDY",
                                          outcomeDatabaseSchema = resultsDatabaseSchema,
                                          outcomeTable = "CATH_STUDY",
                                          excludeDrugsFromCovariates = FALSE,
                                          covariateSettings = covarSettings,
                                          cdmVersion = cdmVersion)

Add parameter checking to getDbCohortData

I was accidentally passing an empty character vector as indication_concept_ids to getDbCohortData (https://github.com/OHDSI/CohortMethod/blob/master/R/DataLoadingSaving.R#L114) and the only reason I noticed was because the AUC came out larger than 1. In fact, the default parameter of c() does exactly the same thing. Wouldn't it be better if we had a parameter check to make sure that indication_concept_ids is a non-empty vector of integers? Maybe the default parameter should be something legal or should be removed entirely?

I also suspect that if you pass an indication_concept_ids vector with invalid ids (i.e. a number which does not show up in the database), this might have the same effect (assuming there is a SQL call somewhere with "... WHERE indication_concept_id IN (####)"). If #### is an id that doesn't show up in the database, then that might have the same effect as having no number there at all (which results in a messed up AUC). Maybe there needs to be a check deeper somewhere? Or maybe that's too complex and in that situation we can't save the user from himself.
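A check of the kind suggested here might look like the sketch below. The function `checkConceptIds` is hypothetical and only validates the argument itself; it cannot tell whether the IDs actually occur in the database.

```r
# Hypothetical validator: fail early unless conceptIds is a non-empty
# vector of whole numbers.
checkConceptIds <- function(conceptIds, name = "indicationConceptIds") {
  if (is.null(conceptIds) || length(conceptIds) == 0) {
    stop(name, " must contain at least one concept ID")
  }
  if (!is.numeric(conceptIds) || anyNA(conceptIds) ||
      any(conceptIds != round(conceptIds))) {
    stop(name, " must be a vector of integer concept IDs")
  }
  invisible(conceptIds)
}

# Returns TRUE if the check raises an error, FALSE otherwise.
failsCheck <- function(x) {
  tryCatch({
    checkConceptIds(x)
    FALSE
  }, error = function(e) TRUE)
}

print(failsCheck(c()))            # default value from the issue
print(failsCheck(character(0)))   # empty character vector
print(failsCheck(c(439926, 21001432)))
```

Calling the validator at the top of the data-fetching function would turn the silent bad AUC into an immediate, readable error.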

Run CohortMethod on an exemplar study, and document in a vignette

To see if the package has the functionality we need to do studies, we need to replicate an existing study. We'll probably do the classic coxibs-vs-non-selective-NSAIDs-for-UGIB study. Once we've done that, we can capture the process of doing the study in a vignette using knitr.

Unit tests fail on Travis-CI

Some tests in test-parameterSweep.R fail on Travis-CI (and have been commented out), but succeed on my Mac OS X install and (I assume) under @schuemie via Windows.

Create a simple simulation framework for generating example data

For the vignette in issue #8 we need simulated data that looks just like the real thing. It should be relatively straightforward to generate simulated data as a cohortData object based on the observed statistics in the exemplar study.
Basically, we'll follow these steps:

  1. Generate covariate data, sampling from observed prevalences.
  2. Generate treatment status using the generated covariates and observed betas in the PS model.
  3. Generate outcomes using the generated covariates, treatment status, and observed betas in the outcome model.
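The three steps above can be sketched in base R as follows. Everything here is an assumption for illustration: the function name `simulateCohortData`, the use of binary covariates, logistic models for both treatment and outcome, and the made-up prevalences and betas. The real framework would plug in the observed statistics and produce a cohortData object.

```r
# Hypothetical simulator following the three steps: covariates, then
# treatment from a propensity model, then outcomes from an outcome model.
simulateCohortData <- function(n, prevalences, psIntercept, psBetas,
                               outcomeIntercept, outcomeBetas, treatmentEffect) {
  k <- length(prevalences)
  # Step 1: binary covariates; column j is sampled with prevalence j.
  covariates <- matrix(rbinom(n * k, size = 1, prob = rep(prevalences, each = n)),
                       nrow = n, ncol = k)
  # Step 2: treatment status from a logistic propensity model.
  psLinear <- as.vector(psIntercept + covariates %*% psBetas)
  treatment <- rbinom(n, size = 1, prob = plogis(psLinear))
  # Step 3: outcomes from a logistic outcome model (assumed model type).
  outcomeLinear <- as.vector(outcomeIntercept + covariates %*% outcomeBetas) +
    treatmentEffect * treatment
  outcome <- rbinom(n, size = 1, prob = plogis(outcomeLinear))
  list(covariates = covariates, treatment = treatment, outcome = outcome)
}

set.seed(123)
sim <- simulateCohortData(n = 1000,
                          prevalences = c(0.1, 0.3, 0.5),
                          psIntercept = -0.5,
                          psBetas = c(0.5, -0.3, 0.8),
                          outcomeIntercept = -2,
                          outcomeBetas = c(0.2, 0.4, -0.1),
                          treatmentEffect = 0.7)
print(colMeans(sim$covariates))
print(mean(sim$treatment))
```

Because the true treatment effect is a known input, data simulated this way can also be used to check whether the estimation pipeline recovers it.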

New parameter to restrict cohorts to common time period

When comparing two treatments, it is possible to select treatments that were not both available during the same time period. During the non-overlapping time they do not represent valid counterfactual comparisons. For example, when comparing two drugs, one approved in 2013 and another approved in 2014, the only valid time to base a comparison on would be 2014 onward, because the second treatment wasn't available in 2013 (and any propensity score model would flag this if INDEX_YEAR were included in the model). A proposed solution: provide an analysis parameter to 'limit cohorts to period of overlapping calendar time', which would restrict the data to start at the maximum of the minimum cohort start dates of the two cohorts and to run through the minimum of the maximum time-at-risk end dates.
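A minimal sketch of the proposed restriction, on made-up data. The column names (`treatment`, `cohortStartDate`, `riskEndDate`) and the function `restrictToCommonPeriod` are assumptions; the sketch only filters on cohort start date and does not truncate time-at-risk.

```r
# Hypothetical helper: keep only persons whose cohort start falls in the
# calendar period during which both treatments were observed.
restrictToCommonPeriod <- function(cohorts) {
  treated <- cohorts[cohorts$treatment == 1, ]
  comparator <- cohorts[cohorts$treatment == 0, ]
  # Maximum of the two minimum cohort start dates.
  periodStart <- max(min(treated$cohortStartDate), min(comparator$cohortStartDate))
  # Minimum of the two maximum time-at-risk end dates.
  periodEnd <- min(max(treated$riskEndDate), max(comparator$riskEndDate))
  inPeriod <- cohorts$cohortStartDate >= periodStart &
    cohorts$cohortStartDate <= periodEnd
  cohorts[inPeriod, , drop = FALSE]
}

cohorts <- data.frame(
  treatment = c(1, 1, 1, 0, 0),
  cohortStartDate = as.Date(c("2013-03-01", "2014-02-01", "2014-06-01",
                              "2014-01-15", "2014-08-01")),
  riskEndDate = as.Date(c("2013-09-01", "2014-12-31", "2015-03-01",
                          "2014-10-01", "2015-01-31"))
)
restricted <- restrictToCommonPeriod(cohorts)
print(restricted)
```

In this example the treated person who started in 2013, before the comparator was observed at all, is dropped.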

Error when indicationConceptIds is null

DataLoadingSaving.R runs a query to summarize #indicated_cohorts, but that table only exists if indicationConceptIds is not null.

I think line 182 has to be changed to: if (indicationConceptIds[1] != ""){

Error running getDbCohortData: cohort_definition_id

I get an error when running getDbCohortData with the default parameters. The issue seems to stem from this commit: dfe0a4d

Here's the code that reproduces the error. The code runs without problems for me on this commit: f8e5a84 (assuming you uncomment the code in the test case corresponding to the change in function name).

library(SqlRender)
library(CohortMethod)

setwd("/tmp")

connectionDetails <- createConnectionDetails(
    dbms = "redshift",
    server = "omop-datasets.cqlmv7nlakap.us-east-1.redshift.amazonaws.com/truven",
    user = Sys.getenv("USER"),
    password = Sys.getenv("MYPGPASSWORD"),
    schema = "mslr_cdm4",
    port = "5439")

# Works on commit f8e5a848b9f55f61785fac1aa1d9e50d97f2628d
#cohortdata <- getDbCohortDataObject(
#    connectionDetails,
#    cdmSchema = connectionDetails$schema,
#    resultsSchema = connectionDetails$schema)

# Does not work on master branch.
cohortdata <- getDbCohortData(
    connectionDetails,
    cdmSchema = connectionDetails$schema,
    resultsSchema = connectionDetails$schema)

Here is the error message:

DBMS:
redshift

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: column c1.cohort_definition_id does not exist)

SQL:
INSERT INTO raw_cohort (cohort_id, person_id, cohort_start_date, cohort_end_date, observation_period_end_date)
SELECT DISTINCT raw_cohorts.cohort_id,
  raw_cohorts.person_id,
  raw_cohorts.cohort_start_date,
  raw_cohorts.cohort_end_date
  AS cohort_end_date,
  op1.observation_period_end_date
  AS observation_period_end_date
FROM (



        SELECT CASE
                WHEN c1.cohort_definition_id = 755695
                    THEN 1
                WHEN c1.cohort_definition_id = 739138
                    THEN 0
                ELSE - 1
                END AS cohort_id,
            c1.subject_id as person_id,
            min(c1.cohort_start_date) AS cohort_start_date,
            min(c1.cohort_end_date) AS cohort_end_date
        FROM mslr_cdm4.drug_era c1
        WHERE c1.cohort_definition_id in (755695,739138)
        GROUP BY c1.cohort_definition_id,
            c1.subject_id

    ) raw_cohorts
INNER JOIN mslr_cdm4.observation_period op1
    ON raw_cohorts.person_id = op1.person_id 

INNER JOIN (
    SELECT person_id,
        condition_start_date AS indication_date
    FROM mslr_cdm4.condition_occurrence
    WHERE condition_concept_id IN (
            SELECT descendant_concept_id
            FROM mslr_cdm4.concept_ancestor
            WHERE ancestor_concept_id IN (439926)
            )
    ) indication
    ON raw_cohorts.person_id = indication.person_id
  AND raw_cohorts.cohort_start_date <= ( indication.indication_date +  183)
  AND raw_cohorts.cohort_start_date >= indication.indication_date

WHERE raw_cohorts.cohort_start_date >= ( op1.observation_period_start_date +  183)
    AND raw_cohorts.cohort_start_date <= op1.observation_period_end_date

Change computeCovariateBalance function to deal with strata

Currently computeCovariateBalance() only computes the overall covariate balance, which is fine when having performed 1-on-1 matching, but pointless when having performed variable ratio matching or stratification.

Need to change the function to compute means per stratum, and aggregate.
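One possible way to aggregate, shown here for a single covariate, is to weight the within-stratum means by stratum size and then compute the standardized difference. This is a sketch of one approach, not necessarily the one the package should adopt: the function `computeBalance`, the column names, the weighting scheme, and the use of the overall pooled standard deviation are all assumptions, and every stratum is assumed to contain both groups.

```r
# Hypothetical helper: crude and stratum-weighted standardized difference
# in means for one covariate.
computeBalance <- function(data) {
  treated <- data$treatment == 1
  pooledSd <- sqrt((var(data$covariateValue[treated]) +
                      var(data$covariateValue[!treated])) / 2)
  crude <- (mean(data$covariateValue[treated]) -
              mean(data$covariateValue[!treated])) / pooledSd

  # Per-stratum means, weighted by the stratum's share of all persons.
  strata <- split(data, data$stratumId)
  weights <- vapply(strata, nrow, numeric(1)) / nrow(data)
  meanTreated <- vapply(strata, function(s) mean(s$covariateValue[s$treatment == 1]),
                        numeric(1))
  meanComparator <- vapply(strata, function(s) mean(s$covariateValue[s$treatment == 0]),
                           numeric(1))
  stratified <- (sum(weights * meanTreated) - sum(weights * meanComparator)) / pooledSd
  c(crude = crude, stratified = stratified)
}

# Made-up data: within each stratum the groups are balanced, but the
# treated are over-represented in the high-prevalence stratum.
data <- data.frame(
  stratumId = rep(c(1, 2), each = 12),
  treatment = c(rep(1, 8), rep(0, 4), rep(1, 4), rep(0, 8)),
  covariateValue = c(1, 1, 1, 1, 1, 1, 0, 0,  1, 1, 1, 0,
                     1, 0, 0, 0,  1, 1, 0, 0, 0, 0, 0, 0)
)
balance <- computeBalance(data)
print(balance)
```

The crude difference is clearly non-zero while the stratified difference vanishes, which is the situation the overall balance computation currently misreports.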

ERROR: relation "cov_m_below" does not exist

I'm trying to run the R code generated from Atlas for some simple cohorts. Everything runs fine up until I run this code:

> cohortMethodData <- getDbCohortMethodData(connectionDetails = connectionDetails,
+                                               cdmDatabaseSchema = cdmDatabaseSchema,
+                                               oracleTempSchema = resultsDatabaseSchema,
+                                               targetId = 12,
+                                               comparatorId = 11,
+                                               outcomeIds = 56,
+                                               studyStartDate = "",
+                                               studyEndDate = "",
+                                               exposureDatabaseSchema = resultsDatabaseSchema,
+                                               exposureTable = exposureTable,
+                                               outcomeDatabaseSchema = resultsDatabaseSchema,
+                                               outcomeTable = outcomeTable,
+                                               cdmVersion = cdmVersion,
+                                               excludeDrugsFromCovariates = FALSE,
+                                               firstExposureOnly = FALSE,
+                                               removeDuplicateSubjects = TRUE,
+                                               washoutPeriod = 365,
+                                               covariateSettings = covariateSettings)
Connecting using PostgreSQL driver

Constructing treatment and comparator cohorts
  |=========================================================================================================================================================================================| 100%
Analysis took 0.809 secs
Fetching cohorts from server
Fetching cohorts took 1.14 secs
Constructing default covariates
  |============================================================================================                                                                                             |  49%Error executing SQL: Error in .local(conn, statement, ...): execute JDBC update query failed in dbSendUpdate (ERROR: relation "cov_m_below" does not exist
  Position: 829)

An error report has been created at  /tmp/errorReport.txt
Error in value[[3L]](cond) : no loop for break/next, jumping to top level

What is cov_m_below?

createPs returns NaN

If I run the script below, the auc that is returned is NaN. It's not the same problem as in issue #24 because in this case getdrugfromindication() is actually returning some ids.

library(SqlRender)
library(Cyclops)
library(CohortMethod)


# Login info.
connectionDetails <- createConnectionDetails(
    dbms = "redshift",
    user = Sys.getenv("USER"),
    password = Sys.getenv("MYPGPASSWORD"), 
    server = "omop-datasets.cqlmv7nlakap.us-east-1.redshift.amazonaws.com/truven",
    schema = "mslr_cdm4",
    port = "5439")


# The function that does the analysis.
test <- function() {
    drug_concept_id <- 1342001
    drug_concept_name <- "Enalaprilat"
    comparator_drug_concept_id <- 974166
    comparator_drug_concept_name <- "Hydrochlorothiazide"
    indication_concept_id <- 21001432
    indication_concept_name <- "Hypertension"

    lowBackPain = 194133

    # Get SNOMED-CT drug_concept_id from indication.
    drug_indication_concept_ids <- getdrugfromindication(
        connectionDetails,
        indication_concept_id)

    num_ids <- length(unique(drug_indication_concept_ids))
    print(num_ids)

    # Cohort Method.
    cohortdata <- getDbCohortData(
        connectionDetails,
        cdmSchema = connectionDetails$schema,
        resultsSchema = connectionDetails$schema,
        targetDrugConceptId = drug_concept_id,
        comparatorDrugConceptId = comparator_drug_concept_id,
        indicationConceptIds = drug_indication_concept_ids)

    num_persons <- length(unique(cohortdata$cohorts$personId))
    print(num_persons)
    num_covariates <- length(unique(cohortdata$covariates$covariateId))
    print(num_covariates)

    ps <- createPs(
        cohortdata,
        lowBackPain)

    auc <- computePsAuc(ps)
    print(auc)

    return(auc)
}


getdrugfromindication <- function(connectionDetails, indication_concept_id) {
    sql <- "
    SELECT DISTINCT
        c2.concept_id
    FROM (
        SELECT
            *
        FROM vocabulary.concept
        WHERE
            concept_id = @indication_concept_id
        ) t1 INNER JOIN vocabulary.concept_relationship cr1
            ON t1.concept_id = cr1.concept_id_1
        INNER JOIN vocabulary.concept c1
            ON cr1.concept_id_2 = c1.concept_id
            AND c1.vocabulary_id = 1
        INNER JOIN vocabulary.concept_ancestor ca1
            ON c1.concept_id = ca1.ancestor_concept_id
        INNER JOIN vocabulary.concept c2
            ON ca1.descendant_concept_id = c2.concept_id
            AND c2.vocabulary_id = 1
    ;
    "

    sql <- renderSql(
        sql = sql,
        indication_concept_id = indication_concept_id)$sql

    conn <- connect(connectionDetails)
    data <- dbGetQuery(conn, sql)
    dbDisconnect(conn)

    data$concept_id
}


auc <- test()

Add includeCovariateIds parameter to createPs function

Sometimes you want to include only a subset of the covariates.

Currently, createPs only allows removal of covariates using the 'excludeCovariateIds' parameter, but I think it would also be helpful to have an 'includeCovariateIds' parameter. The default behavior, if null, would be to include all covariates. If non-null, the covariates used would be restricted to those in the list.
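The requested semantics could be sketched like this, using a plain data frame in place of the package's covariate storage. The function `filterCovariates` and the example data are hypothetical.

```r
# Hypothetical filter: NULL for includeCovariateIds means "keep all";
# excludeCovariateIds is applied afterwards.
filterCovariates <- function(covariates,
                             includeCovariateIds = NULL,
                             excludeCovariateIds = NULL) {
  keep <- rep(TRUE, nrow(covariates))
  if (!is.null(includeCovariateIds)) {
    keep <- keep & covariates$covariateId %in% includeCovariateIds
  }
  if (!is.null(excludeCovariateIds)) {
    keep <- keep & !(covariates$covariateId %in% excludeCovariateIds)
  }
  covariates[keep, , drop = FALSE]
}

covariates <- data.frame(
  rowId = c(1, 1, 2, 2, 3),
  covariateId = c(10, 20, 10, 30, 20),
  covariateValue = 1
)
allCovariates <- filterCovariates(covariates)
subset <- filterCovariates(covariates, includeCovariateIds = c(10, 20))
both <- filterCovariates(covariates, includeCovariateIds = c(10, 20),
                         excludeCovariateIds = 20)
print(subset)
```

Applying the exclusion after the inclusion lets the two parameters be combined without ambiguity.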

Change schema calls to allow different SQL Server schemas inside of database

SQL Server has both databases and schemas. Currently our SQL assumes the schema is 'dbo'. We can remove this assumption by requiring SQL Server users to put 'DBName.SchemaName' in the string for the cdmSchema and resultsSchema parameters. Then, when this code is rendered for SQL Server, it will work flexibly for any database/schema, and when translated to Oracle/Postgres, the schema (if .dbo) could be removed and not included.

Will make this change after we have a stable version working, because it will change all the current calls to getDbCohortData used in our development.

Add era construction function

Often we'd like to define our treatment and comparator cohorts not as a single drug, but a combination (e.g. drug classes comprising multiple drugs). In that case it is important to construct correct eras (periods of non-overlapping continuous use of the drug) for those combinations of exposures. SQL for performing this task is floating around OHDSI, but should ideally be captured in a function. This could live in the CohortMethod package for now, although it is a function that is generic to all methods.
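As an illustration of what such a function would do, the sketch below merges overlapping exposure periods per person into eras. It is written in base R on an in-memory data frame rather than in SQL; the function `buildEras`, the column names, and the optional `gapDays` tolerance are assumptions.

```r
# Hypothetical era builder: merge a person's overlapping exposures into
# non-overlapping eras. gapDays allows small gaps to be bridged.
buildEras <- function(exposures, gapDays = 0) {
  exposures <- exposures[order(exposures$personId, exposures$startDate), ]
  perPerson <- lapply(split(exposures, exposures$personId), function(p) {
    # Latest end date seen so far, row by row.
    runningEnd <- as.Date(cummax(as.numeric(p$endDate)), origin = "1970-01-01")
    # A new era begins when an exposure starts after all previous ends.
    newEra <- c(TRUE, p$startDate[-1] > runningEnd[-nrow(p)] + gapDays)
    eraId <- cumsum(newEra)
    data.frame(
      personId = p$personId[1],
      eraStartDate = as.Date(as.numeric(tapply(as.numeric(p$startDate), eraId, min)),
                             origin = "1970-01-01"),
      eraEndDate = as.Date(as.numeric(tapply(as.numeric(p$endDate), eraId, max)),
                           origin = "1970-01-01")
    )
  })
  eras <- do.call(rbind, perPerson)
  rownames(eras) <- NULL
  eras
}

# Made-up exposures to two drugs of the same class.
exposures <- data.frame(
  personId = c(1, 1, 1, 2),
  startDate = as.Date(c("2020-01-01", "2020-01-20", "2020-06-01", "2020-03-01")),
  endDate = as.Date(c("2020-01-31", "2020-02-15", "2020-06-30", "2020-03-10"))
)
eras <- buildEras(exposures)
print(eras)
```

Person 1's first two exposures overlap and collapse into a single era, while the June exposure forms its own era.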

Add parameter to allow for sampling of T and C in the data fetch step

Use case: sometimes, during initial feasibility, it may be useful to sample from T and C to fit a propensity score model and execute diagnostics to assess the adequacy of a study, prior to implementing a full study for the outcome of interest. Sampling T/C can reduce the data size and the wait time associated with computing feature extraction and data download.

I think the parameter should be added to the function getDbCohortMethodData.

The PatientLevelPrediction package has an analogous parameter in getPLPData called 'sampleSize', seen here.
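A rough sketch of the sampling behavior, assuming the sample size applies to each group separately (the issue does not say whether it is per group or overall). The function `sampleCohorts` and the column names are hypothetical, and a real implementation would sample on the server before feature extraction.

```r
# Hypothetical sampler: keep at most sampleSize persons from each of the
# target and comparator groups; smaller groups are kept whole.
sampleCohorts <- function(cohorts, sampleSize) {
  rowsByGroup <- split(seq_len(nrow(cohorts)), cohorts$treatment)
  keep <- unlist(lapply(rowsByGroup, function(rows) {
    if (length(rows) > sampleSize) sample(rows, sampleSize) else rows
  }))
  cohorts[sort(keep), , drop = FALSE]
}

set.seed(42)
cohorts <- data.frame(
  personId = 1:13,
  treatment = c(rep(1, 10), rep(0, 3))
)
sampled <- sampleCohorts(cohorts, sampleSize = 5)
print(table(sampled$treatment))
```

Sampling changes the treated-to-comparator ratio, so propensity scores fitted on the sample are mainly useful for diagnostics rather than for the final estimate.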

Make sure we correctly handle all sources for exposures and outcomes

getDbCohortData needs to correctly handle fetching exposures from

  1. drug_exposure
  2. drug_era
  3. cohort (either within the CDM schema or a separate schema)

and handle fetching outcomes from

  1. condition_occurrence
  2. condition_era
  3. cohort (either within the CDM schema or a separate schema)

In all scenarios, the function should select the appropriate variable names, and use type_concept_id fields when available.

I think currently this is implemented consistently for all scenarios.

Cyclops' cross-validation does not pick very optimal hyperparameters for CohortMethod

When using cross-validation to pick the hyperparameter, the propensity score models have very low AUCs and do not lead to good covariate balance. Simply picking hyperparameter=0.1 gives much better results. We need to figure out if this is due to overfitting (i.e. the cross-validation is correct), a mismatch between optimization functions, or something else.
