
peterbanda / ehr-ohdsi-processor


OHDSI data processor used to generate various features from EHRs (Electronic Health Records). It was developed and deployed for EHR Morbidity DREAM Challenge.

Home Page: https://www.synapse.org/#!Synapse:syn18405991/wiki/589657

License: Apache License 2.0

Language: Scala 100.00%

Topics: ehr, ohdsi, akka


EHR OHDSI Processor

  • Single-pass EHR (Electronic Health Record) feature generation and extraction pipeline using OHDSI standardized tables/csvs and concepts.

  • The processor is implemented with Akka streaming technology, which provides memory-efficient, fast, and scalable asynchronous processing. The core abstraction is a so-called flow: an in-and-out foldable stream attached to a csv source whose values are prefiltered by matching date-time intervals. Flows for different features are bundled together, with the parsed input values broadcast, zipped, and unzipped accordingly.

  • Features are generated based on JSON-like settings in application.conf, which can be freely altered between runs without touching or recompiling the code.

  • Supported feature types are:

    • Counts - number of records per date interval.

    • Distinct Counts - number of distinct values within a given column (e.g., concept ids and care sites) per date interval.

    • Concept Category Counts - number of records whose concept ids for a given column belong to a given category per date interval.

    • Concept Category Exist Flags - binary flag indicating whether there exists at least one record with a concept id for a given column belonging to a given category per date interval.

    • Duration - duration from the first occurrence calculated per date interval.

    • Sums - sum of values for a given column and date interval; used purely for drug quantity.

    • Time-lag Features - records (for each table/csv file) are sorted by date, then time lags are calculated and compacted to mean, std, min, max, and dominant relative differential frequency (grouped to 5 bins: -2,-1,0,1,2) indicating the prevalent direction/acceleration of record dates.

    • Comorbidity Scores - linear comorbidity measure calculated based on Elixhauser's categories per date interval (reported to have good predictability for short-term mortality). Two versions with different weights were used: AHRQ and van Walraven.

    • Dynamically Calculated Comorbidity Scores - instead of using fixed weights as above, weights are dynamically calculated based on differential relative ratios of patients who died within 6 months vs. others. Two versions were implemented (see below): the first one uses the same categories as Elixhauser, the second one takes into account only serious conditions.

    • Non-Aggregate (Static) Features - these features are directly generated from person.csv and include gender (concepts binarized), age_at_last_visit, year_of_birth, and month_of_birth.

  • Additionally, the processor supports custom date intervals counted from the last visit (for each person), such as last 6 months, last 5-3 years, etc. For each such date interval a set of features is generated.
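To make the interval-based features above more concrete, here is a minimal sketch of how a count, a distinct count, time-lag statistics, and a linear comorbidity score could be computed for one person. It uses plain Scala collections rather than Akka streams, so it is not a single-pass implementation and does not reflect the processor's actual code. All names (`Record`, `inInterval`, `lagStats`, `linearScore`), the day-based interval bounds, the use of a population standard deviation, and the example weights are assumptions made for illustration only. The relative differential frequency bins of the time-lag features are not covered.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object IntervalFeaturesSketch {
  // Hypothetical minimal record: one row of an OHDSI table for a single person.
  final case class Record(date: LocalDate, conceptId: Long)

  // Keeps records dated within [lastVisit - fromDaysBack, lastVisit - toDaysBack].
  // For example, fromDaysBack = 180 and toDaysBack = 0 approximates "last 6 months".
  def inInterval(records: Seq[Record], lastVisit: LocalDate,
                 fromDaysBack: Long, toDaysBack: Long): Seq[Record] = {
    val start = lastVisit.minusDays(fromDaysBack)
    val end = lastVisit.minusDays(toDaysBack)
    records.filter(r => !r.date.isBefore(start) && !r.date.isAfter(end))
  }

  // Counts: number of records in the (already filtered) interval.
  def count(records: Seq[Record]): Int = records.size

  // Distinct counts: number of distinct concept ids in the interval.
  def distinctCount(records: Seq[Record]): Int =
    records.map(_.conceptId).distinct.size

  final case class LagStats(mean: Double, std: Double, min: Long, max: Long)

  // Time lags: sort by date, take day differences between consecutive records,
  // then summarize. Returns None when fewer than two records are available.
  def lagStats(records: Seq[Record]): Option[LagStats] = {
    val dates = records.map(_.date).sortBy(_.toEpochDay)
    val lags = dates.zip(dates.drop(1)).map { case (a, b) => ChronoUnit.DAYS.between(a, b) }
    if (lags.isEmpty) None
    else {
      val mean = lags.sum.toDouble / lags.size
      // Population variance (divides by n); an assumption of this sketch.
      val variance = lags.map(l => math.pow(l - mean, 2)).sum / lags.size
      Some(LagStats(mean, math.sqrt(variance), lags.min, lags.max))
    }
  }

  // Linear comorbidity score: sum of the weights of the categories present.
  // Categories without a weight contribute 0. The weights are supplied by the caller.
  def linearScore(present: Set[String], weights: Map[String, Double]): Double =
    present.toSeq.map(c => weights.getOrElse(c, 0.0)).sum
}
```

A call such as `inInterval(records, lastVisit, 180, 0)` followed by `count`, `distinctCount`, and `lagStats` would yield one feature group for one interval; repeating this for each configured interval mirrors the "set of features per date interval" idea described above.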

Prerequisite

  • JDK 1.8 or higher

Build

To create an executable jar with all dependencies, run

sbt assembly

This will produce a file such as ehr-ohdsi-processor-assembly-0.4.1.jar

Usage

1. Feature Generation

  • basic feature generation
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -i=<input_folder> -o=<output_file_name>
  • or without an output file (features.csv in the input folder will be used)
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -i=<input_folder>
  • note the optional 'mode' option
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -mode=features -i=<input_folder> -o=<output_file_name>
  • feature generation using custom features, concept categories, or date intervals passed via '-Dconfig.file'
java -Xms10g -Xmx10g -Xss1M -Dconfig.file=<my_custom_application.conf> -jar ehr-ohdsi-processor-assembly-0.4.1.jar -i=<input_folder> -o=<output_file_name>
  • feature generation with time-lag based features
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -with_time_lags= -i=<input_folder> -o=<output_file_name>
  • feature generation with time-lag based features and export of the dynamic scores' weights
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -with_time_lags= -i=<input_folder> -o=<output_file_name> -o-dyn_score_weights=<weight_file_name>
  • feature generation with time-lag based features and import of the dynamic scores' weights
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -with_time_lags= -i=<input_folder> -o=<output_file_name> -i-dyn_score_weights=<weight_file_name>
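When the processor is launched from another Scala program or script, the command lines above can be assembled programmatically. The following is a small sketch under that assumption; the helper `featureCommand` and its parameter names are hypothetical, and only options shown in the usage examples above are emitted.

```scala
object FeatureCommandSketch {
  // Assembles the java command line for feature generation as a list of arguments.
  // Hypothetical helper: it only combines options documented in the usage examples.
  def featureCommand(
    jar: String,
    inputFolder: String,
    outputFile: Option[String] = None,
    withTimeLags: Boolean = false,
    configFile: Option[String] = None,
    heap: String = "10g"
  ): Seq[String] = {
    // JVM options go before -jar; -Dconfig.file is added only when a custom config is given.
    val jvmArgs = Seq("java", s"-Xms$heap", s"-Xmx$heap", "-Xss1M") ++
      configFile.map(f => s"-Dconfig.file=$f").toSeq
    // Application options go after the jar name.
    val appArgs = Seq("-jar", jar, "-mode=features") ++
      (if (withTimeLags) Seq("-with_time_lags=") else Seq.empty[String]) ++
      Seq(s"-i=$inputFolder") ++
      outputFile.map(f => s"-o=$f").toSeq
    jvmArgs ++ appArgs
  }
}
```

The resulting `Seq[String]` can be passed to the standard `scala.sys.process` API (after `import scala.sys.process._`, calling `.!` on the sequence runs it and returns the exit code).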

2. Standardization

  • standardization with comma delimited input files (no spaces)
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -mode=std -i=<input_files> -o=<output_folder_name>
  • or without an output folder (the respective input folders will be used)
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -mode=std -i=<input_files>
  • standardization with additional stats output (the generated file is '-std.stats')
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -mode=std -ostats= -i=<input_files> -o=<output_folder_name>
  • standardization using explicitly passed stats as input (means + stds)
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -mode=std -istats=<input_stats_file> -i=<input_files> -o=<output_folder_name>
  • standardization including the time-lag based features
java -Xms10g -Xmx10g -Xss1M -jar ehr-ohdsi-processor-assembly-0.4.1.jar -mode=std -with_time_lags= -i=<input_files> -o=<output_folder_name>
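The stats mentioned above consist of means and standard deviations. Assuming the standardization is an ordinary z-score transform (an assumption; the exact formula used by the processor is not stated here), the idea can be sketched for a single feature column as follows. The names `Stats`, `computeStats`, and `standardize` are hypothetical, and the population standard deviation is chosen only for simplicity.

```scala
object StandardizationSketch {
  // Per-column statistics, analogous to the "means + stds" passed via -istats.
  final case class Stats(mean: Double, std: Double)

  // Computes mean and population standard deviation of one feature column.
  def computeStats(values: Seq[Double]): Stats = {
    require(values.nonEmpty, "at least one value is needed")
    val mean = values.sum / values.size
    val variance = values.map(v => (v - mean) * (v - mean)).sum / values.size
    Stats(mean, math.sqrt(variance))
  }

  // Z-score: (value - mean) / std. A constant column (std == 0) maps to zeros
  // to avoid division by zero; this handling is a choice made for this sketch.
  def standardize(values: Seq[Double], stats: Stats): Seq[Double] =
    if (stats.std == 0.0) values.map(_ => 0.0)
    else values.map(v => (v - stats.mean) / stats.std)
}
```

Keeping the statistics separate from the transform mirrors the two usage patterns above: stats can be computed from the input files themselves, or supplied explicitly so that several files are standardized with the same means and standard deviations.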

3. Changing the logging configuration

  • by passing a logback file
-Dlogback.configurationFile=<path_to_logback.xml>
