Coder Social home page Coder Social logo

conceptql_spec's Introduction

ConceptQL Specification

ConceptQL (pronounced concept-Q-L) is a high-level language that allows researchers to unambiguously define their research algorithms. Over the last 7 years, it has been running in production as the heart of our study-building software, Jigsaw

In that time its implementation has evolved considerably, but it has remained consistent with its original goals.

Motivation for ConceptQL

Outcomes Insights has built a vast library of research algorithms and applied those algorithms to large databases of claims data. Early into building the library, we realized we had to overcome two major issues:

  1. Methods sections of research papers commonly use natural language to specify the criteria used to build cohorts from a claims database.
    • Algorithms defined in natural language are often imprecise, open to multiple interpretations, and generally difficult to reproduce.
    • Researchers could benefit from a language that removes the ambiguity of natural language while increasing the reproducibility of their research algorithms.
  2. Querying against claims databases is often difficult.
    • Hand-coding algorithms to extract cohorts from datasets is time-consuming, error-prone, and opaque.
    • Researchers could benefit from a language that allows algorithms to be defined at a high-level and then gets translated into the appropriate queries against a database.

We developed ConceptQL to address these two issues.

We have written a tool that can read research algorithms defined in ConceptQL. The tool can create a diagram for the algorithm which makes it easy to visualize and understand. The tool can also translate the algorithm into a SQL query which runs against data structured in the Generalized Data Model (GDM). The purpose of the GDM is to standardize the format and content of observational data, so standardized applications, tools and methods can be applied to them.

For instance, using ConceptQL we can take a statement that looks like this:

[ "icd9", "412" ]

And generate a diagram that looks like this:


Example 1

["icd9","412"]

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412

And generate SQL that looks like this:

SELECT *
FROM gdm_data.clinical_codes AS cc
WHERE cc.clinical_code_concept_id IN (
  SELECT id
  FROM concepts
  WHERE vocabulary_id = 'ICD9CM'
    AND concept_code = '412'
)

As stated above, one of the goals of ConceptQL is to make it easy to assemble fairly complex queries without having to roll up our sleeves and write raw SQL. To accommodate this complexity, ConceptQL itself has some complexities of its own. That said, we believe ConceptQL will help researchers define, hone, and share their research algorithms.

ConceptQL Overview

What ConceptQL Looks Like

Perhaps seeing an example will be the quickest way to get a sense of a language. Here is a trivial example to whet your appetite. The example is in YAML, but could just as easily be in JSON or any other markup language capable of representing nested sets of heterogeneous arrays and hashes. In fact, the ConceptQL "language" is a just set of nested arrays and hashes. If you squint, it might even look a bit like Lisp. Indeed, the nested arrays are S-expressions.

ConceptQL is a domain-specific, declarative "language" that represents search criteria with some set operations and temporal operations to glue those criteria together.

# Example 1: A simple example in YAML
# This is just a simple array with the first element being 'icd9cm' and the second being '412'
# This example will search the clinical_codes table for all conditions that match the ICD-9 diagnosis code 412.
---
- icd9cm
- 412

ConceptQL Diagrams

Reading ConceptQL in YAML or JSON can be difficult. It is often easier to explore ConceptQL using directed graphs. For instance, the diagram for the simple example listed in YAML above is:


Example 2 - All Conditions Matching MI

["icd9","412"]

All Conditions Matching MI

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412

Each oval depicts an "operator", or rather, a ConceptQL expression. An arrow between a pair of operators indicates that the records from the operator on the tail of the arrow pass on to the operator at the head of the arrow. A simple example should help here:


Example 3 - First Office Visit Per Patient

["first",["cpt","99214"]]

First Office Visit Per Patient

person_id criterion_id criterion_domain start_date end_date source_value
2 58127 procedure_occurrence 2009-07-03 2009-07-03 99214
5 38313 procedure_occurrence 2008-12-14 2008-12-14 99214
6 48992 procedure_occurrence 2009-09-12 2009-09-12 99214
7 63797 procedure_occurrence 2008-08-13 2008-08-13 99214
8 55871 procedure_occurrence 2008-02-27 2008-02-27 99214
9 45498 procedure_occurrence 2009-09-11 2009-09-11 99214
10 48752 procedure_occurrence 2009-05-09 2009-05-09 99214
11 73107 procedure_occurrence 2008-06-17 2008-06-17 99214
12 71635 procedure_occurrence 2008-02-11 2008-02-11 99214
13 74458 procedure_occurrence 2008-01-24 2008-01-24 99214

The diagram above reads "get all procedures that match the CPT 99214 (Office Visit) and then filter them down to the first occurrence (by date) for each person". More succinctly, "get the first office visit for each patient". The diagram is much more terse than that and to accurately read the diagram, you need a lot of implicit knowledge about how each operator operates. Fortunately, this document will impart that knowledge to you.

Please note that all of the diagrams end with an arrow pointing at nothing. We'll explain why soon.

Think of Records as a Stream

ConceptQL diagrams have the "leaf" operators at the top and "trunk" operators at the bottom. You can think of the records of a ConceptQL statement as a flowing stream of data. The leaf operators, or operators that gather records out of the database, act like tributaries. The records flow downwards and either join with other records, or filter out other records until the streams emerge at the bottom of the diagram. Think of each arrow as a stream of records, flowing down through one operator to the next.

The trailing arrow in the diagrams serves as a reminder that every ConceptQL statement yields a stream of records.

Streams Have Types

You might have noticed that the operators and edges in the diagrams often have a color. That color represents what "type" of stream the operator or edge represents. There are several types in ConceptQL, and you'll notice they are somewhat correlated with the tables found in OMOP's CDM v4.0:

  • condition_occurrence
    • red
  • death
    • brown
  • drug_exposure
    • purple
  • person
    • blue
  • procedure_occurrence
    • green

Each stream has a vocabulary of origin (essentially, the vocabulary used to pull records from the GDM database). Based on that origin, each stream will have a particular type. The stream carries this type information as it moves through each operator. When certain operators, particularly set and temporal operators, need to perform filtering, they can use this type information to determine how to best filter a stream. There will be much more discussion about types woven throughout this document. For now, it is sufficient to know that each stream has a type.

You'll also notice that the trailing arrow(s) at the end of the diagrams indicate which types of streams are ultimately passed on at the end of a ConceptQL statement.

Why Types?

At its inception, ConceptQL was developed to query OMOP's CDM v4.0 and its associated tables. Each table represented a certain "type" of data (e.g. condition_occurrence, procedure_occurrence, visit_occurrence, etc) and ConceptQL was designed to treat each type of data as distinct from the other types.

Now that ConceptQL runs against the GDM, types are far less important, but are still maintained to help users distinguish between the kinds of data a ConceptQL statement is querying.

What are Streams Really?

Though thinking in "streams" is helpful, on a few occasions we need to know what's going on under the hood.

Every table in the GDM structure has a surrogate key column (an ID column). When we execute a ConceptQL statement, the "streams" that are generated by the statement are just sets of these IDs for rows that matched the ConceptQL criteria. So each stream is just a set of IDs that point back to some rows in one of the GDM tables. When a stream has a "type" it is really just that the stream contains records associated with its vocabulary of origin.

So when we execute this ConceptQL statement, the resulting "stream" is all IDs from the patients table where the patient is male:


Example 4 - All Male Patients

["gender","Male"]

All Male Patients

person_id criterion_id criterion_domain start_date end_date source_value
1 1 person 1923-05-01 1923-05-01 00013D2EFD8E45D1
2 2 person 1943-01-01 1943-01-01 00016F745862898F
4 4 person 1941-06-01 1941-06-01 00021CA6FF03E670
5 5 person 1936-08-01 1936-08-01 00024B3D2352D2D0
6 6 person 1943-10-01 1943-10-01 0002DAE1C81CC70D
7 7 person 1922-07-01 1922-07-01 0002F28CE057345B
8 8 person 1935-09-01 1935-09-01 000308435E3E5B76
12 12 person 1929-06-01 1929-06-01 00048EF1F4791C68
14 14 person 1934-05-01 1934-05-01 00052705243EA128
16 16 person 1934-01-01 1934-01-01 0007E57CC13CE880

When we execute this ConceptQL statement, the resulting "stream" is all clinical_codes IDs that match ICD-9CM 250.01:


Example 5 - All Condition Occurrences that match ICD-9CM 250.01

["icd9","250.01"]

All Condition Occurrences that match ICD-9CM 250.01

person_id criterion_id criterion_domain start_date end_date source_value
136 15609 condition_occurrence 2009-04-06 2009-04-06 25001
131 17030 condition_occurrence 2010-10-05 2010-10-05 25001
131 18099 condition_occurrence 2010-02-05 2010-02-05 25001
82 22080 condition_occurrence 2008-02-13 2008-02-13 25001
137 23953 condition_occurrence 2008-10-02 2008-10-02 25001
249 25525 condition_occurrence 2009-05-27 2009-05-27 25001
83 27582 condition_occurrence 2009-04-29 2009-04-29 25001
14 28741 condition_occurrence 2009-06-27 2009-06-27 25001
17 30214 condition_occurrence 2010-03-21 2010-03-21 25001
187 31594 condition_occurrence 2009-11-16 2009-11-16 25001

Generally, it is helpful to just think of those queries generating a "stream of people" or a "stream of conditions" and not worry about the table of origin or the fact that they are just IDs.

When a ConceptQL statement is executed, it yields a final set of streams that are just all the IDs that passed through all the criteria. What is done with that set of IDs is up to the user who assembled the ConceptQL statement. If a user gathers all 250.01 Conditions, they will end up with a set of clinical_codes IDs. They could take those IDs and do all sorts of things like:

  • Gather the first and last date of occurrence per person
  • Count the number of occurrences per person
  • Count number of persons with the condition
  • Count the total number of occurrences for the entire population

This kind of aggregation and analysis is beyond the scope of ConceptQL. ConceptQL will get you the IDs of the rows you're interested in, it's up to other parts of the calling system to determine what you do with them.

Selection Operators

Selection operators are search for specific records within the data, e.g. searching for a diagnosis of an old myocardial infarction (ICD-9-CM 412) is a selection. Selection operators are always leaf operators, meaning no operators "feed" into a selection operator.

There are several basic types of selection operators.

Vocabulary Operator

A Vocabulary operator is any selection operator that selects records based on codes from a specific vocabulary. Below is a list of the most common vocabulary operators available in ConceptQL:

Operator Name Stream Type Arguments Returns
cpt4 procedure_occurrence 1 or more CPT codes All records whose source_value match any of the CPT codes
icd9cm condition_occurrence 1 or more ICD-9CM codes All records whose source_value match any of the ICD-9 codes
icd9_procedure procedure_occurrence 1 or more ICD-9 procedure codes All records whose source_value match any of the ICD-9 procedure codes
icd10cm condition_occurrence 1 or more ICD-10 All records whose source_value match any of the ICD-10 codes
hcpcs procedure_occurrence 1 or more HCPCS codes All records whose source_value match any of the HCPCS codes
loinc observation 1 or more LOINC codes All records whose source_value match any of the LOINC codes
rxnorm drug_exposure 1 or more RxNorm IDs All records whose drug_concept_id match any of the RxNorm IDs
snomed condition_occurrence 1 or more SNOMED codes All records whose source_value match any of the SNOMED codes

Person-related Operators

Person operators generate person records, or records that are derived from the table containing patient demographics. The start_date and end_date for a person-based record is the patient's birth date, as explained in more detail in temporal operators and person streams.

Gender Operator

This person operator selects people by gender. Currently, available genders are Male, Female, or Unknown.


Example 6 - Gathers all female person records

["gender","Female"]

Gathers all female person records

person_id criterion_id criterion_domain start_date end_date source_value
3 3 person 1936-09-01 1936-09-01 0001FDD721E223DC
9 9 person 1976-09-01 1976-09-01 000345A39D4157C9
10 10 person 1938-10-01 1938-10-01 00036A21B65B0206
11 11 person 1934-02-01 1934-02-01 000489E7EAAD463F
13 13 person 1936-07-01 1936-07-01 0004F0ABD505251D
15 15 person 1936-03-01 1936-03-01 00070B63745BE497
17 17 person 1919-09-01 1919-09-01 0007F12A492FD25D
18 18 person 1919-10-01 1919-10-01 000A005BA0BED3EA
19 19 person 1942-07-01 1942-07-01 000B4662348C35B4
26 26 person 1939-12-01 1939-12-01 0010D6F80D245D62

Race Operator

This person operator selects people by race. Available races are defined in the Race vocabulary.


Example 7 - Gathers all persons indicated as white

["race","White"]

Gathers all persons indicated as white

person_id criterion_id criterion_domain start_date end_date source_value
1 1 person 1923-05-01 1923-05-01 00013D2EFD8E45D1
2 2 person 1943-01-01 1943-01-01 00016F745862898F
3 3 person 1936-09-01 1936-09-01 0001FDD721E223DC
5 5 person 1936-08-01 1936-08-01 00024B3D2352D2D0
7 7 person 1922-07-01 1922-07-01 0002F28CE057345B
8 8 person 1935-09-01 1935-09-01 000308435E3E5B76
9 9 person 1976-09-01 1976-09-01 000345A39D4157C9
11 11 person 1934-02-01 1934-02-01 000489E7EAAD463F
12 12 person 1929-06-01 1929-06-01 00048EF1F4791C68
13 13 person 1936-07-01 1936-07-01 0004F0ABD505251D

Ethnicity Operator

This person operator selects people by ethnicity. Available ethnicities are defined in the Ethnicity vocabulary.


Example 8 - Gathers all persons indicated as Hispanic

["ethnicity","Hispanic"]

Gathers all persons indicated as Hispanic

person_id criterion_id criterion_domain start_date end_date source_value
4 4 person 1941-06-01 1941-06-01 00021CA6FF03E670
22 22 person 1942-08-01 1942-08-01 000D6D88463D8A76
72 72 person 1933-04-01 1933-04-01 002354398A00234E
76 76 person 1938-03-01 1938-03-01 00242FF5E7A3F2D9
181 181 person 1939-11-01 1939-11-01 005E0AB5172E715F

Person Operator

This person operator selects all patient records.


Example 9 - Gathers all persons indicated as Hispanic

["person"]

Gathers all persons indicated as Hispanic

person_id criterion_id criterion_domain start_date end_date source_value
1 1 person 1923-05-01 1923-05-01 00013D2EFD8E45D1
2 2 person 1943-01-01 1943-01-01 00016F745862898F
3 3 person 1936-09-01 1936-09-01 0001FDD721E223DC
4 4 person 1941-06-01 1941-06-01 00021CA6FF03E670
5 5 person 1936-08-01 1936-08-01 00024B3D2352D2D0
6 6 person 1943-10-01 1943-10-01 0002DAE1C81CC70D
7 7 person 1922-07-01 1922-07-01 0002F28CE057345B
8 8 person 1935-09-01 1935-09-01 000308435E3E5B76
9 9 person 1976-09-01 1976-09-01 000345A39D4157C9
10 10 person 1938-10-01 1938-10-01 00036A21B65B0206

Additional Selection Operators

Death Operator

This operator pulls all death records from the death table.


Example 10

["death"]

person_id criterion_id criterion_domain start_date end_date source_value
16 1 death 2010-12-01 2010-12-01
26 2 death 2009-07-01 2009-07-01
65 3 death 2009-01-01 2009-01-01
119 4 death 2008-09-01 2008-09-01
166 5 death 2008-07-01 2008-07-01
181 6 death 2010-07-01 2010-07-01
190 7 death 2009-09-01 2009-09-01
191 8 death 2009-09-01 2009-09-01
201 9 death 2008-05-01 2008-05-01
215 10 death 2008-01-01 2008-01-01

Information Periods Operator

This operator pulls all information period records from the information_periods table.


Example 11

["information_periods"]

person_id criterion_id criterion_domain start_date end_date source_value
1 1 observation_period 2008-01-01 2010-12-31
1 2 observation_period 2008-01-01 2010-12-31
1 3 observation_period 2008-01-01 2010-12-31
1 4 observation_period 2008-01-01 2010-12-31
2 5 observation_period 2008-01-01 2010-12-31
2 6 observation_period 2008-01-01 2010-12-31
2 7 observation_period 2009-01-01 2010-12-31
2 793 observation_period 2008-01-01 2010-12-31
3 8 observation_period 2008-01-01 2010-12-31
3 9 observation_period 2008-01-01 2010-12-31

Utilization Operators

Utilization operators are selection operators that generate records related to admissions.

SNF Operator

This utilization operator generates records that are associated with Skilled Nursing Facilities (SNF).


Example 12

["snf"]

No Results found.


Hospice Operator

This utilization operator generates records that are associated with Hospice Facilities.


Example 13

["hospice"]

No Results found.


Hospitalization Operator

This utilization operator generates records that are associated with Hospitalizations.


Example 14

["hospitalization"]

person_id criterion_id criterion_domain start_date end_date source_value
134 1 condition_occurrence 2009-11-23 2009-11-30
247 2 condition_occurrence 2008-02-28 2008-03-04
68 3 condition_occurrence 2008-04-19 2008-04-23
247 4 condition_occurrence 2008-03-25 2008-03-29
66 5 condition_occurrence 2009-08-04 2009-08-06
95 6 condition_occurrence 2008-09-03 2008-09-12
247 7 condition_occurrence 2008-08-14 2008-08-21
202 8 condition_occurrence 2008-08-18 2008-08-20
247 9 condition_occurrence 2008-07-13 2008-07-18
153 10 condition_occurrence 2010-01-29 2010-02-08

All Other Operators i.e. Mutation Operators

Virtually all other operators add, remove, filter, or otherwise alter streams of records. They are discussed in this section.

Set Operators

Because streams represent sets of records, it makes sense to include operators that operate on sets

Union Operator

  • Takes any number of upstream operators and aggregates their streams
    • Unions together streams with identical types
      • Think of streams with the same type flowing together into a single stream
      • We're really just gathering the union of all IDs for identically-typed streams
    • Streams with the different types flow along together concurrently without interacting
      • It does not make sense to union, say, condition_occurrences with procedure_occurrences, so streams with different types won't mingle together, but will continue to flow downstream in parallel

Example 15 - Two streams of the same type (condition_occurrence) joined into a single stream

["union",["icd9","412"],["icd9","250.01"]]

Two streams of the same type (condition_occurrence) joined into a single stream

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
136 15609 condition_occurrence 2009-04-06 2009-04-06 25001
66 16171 condition_occurrence 2009-07-25 2009-07-25 412


Example 16 - Two streams of the same type (condition_occurrence) joined into a single stream, then a different stream (procedure_occurrence) flows concurrently

["union",["union",["icd9","412"],["icd9","250.01"]],["cpt","99214"]]

Two streams of the same type (condition_occurrence) joined into a single stream, then a different stream (procedure_occurrence) flows concurrently

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
136 15609 condition_occurrence 2009-04-06 2009-04-06 25001
66 16171 condition_occurrence 2009-07-25 2009-07-25 412


Example 17 - Two streams of the same type (condition_occurrence) joined into a single stream, along with a different stream (procedure_occurrence) flows concurrently (same as above example)

["union",["icd9","412"],["icd9","250.01"],["cpt","99214"]]

Two streams of the same type (condition_occurrence) joined into a single stream, along with a different stream (procedure_occurrence) flows concurrently (same as above example)

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
136 15609 condition_occurrence 2009-04-06 2009-04-06 25001
66 16171 condition_occurrence 2009-07-25 2009-07-25 412

Except Operator

This operator takes two sets of incoming streams, a left-hand stream and a right-hand stream. The operator matches like-type streams between the left-hand and right-hand streams. The operator removes any records in the left-hand stream if they appear in the right-hand stream. The operator passes only records for the left-hand stream downstream. The operator discards all records in the right-hand stream. For example:


Example 18 - All males who are not white

["except",{"left":["gender","male"],"right":["race","white"]}]

All males who are not white

person_id criterion_id criterion_domain start_date end_date source_value
4 4 person 1941-06-01 1941-06-01 00021CA6FF03E670
6 6 person 1943-10-01 1943-10-01 0002DAE1C81CC70D
21 21 person 1932-08-01 1932-08-01 000C7486B11E7030
22 22 person 1942-08-01 1942-08-01 000D6D88463D8A76
50 50 person 1941-08-01 1941-08-01 001AEDD510C92C87
68 68 person 1954-11-01 1954-11-01 0021D4CDAFC0609F
69 69 person 1925-05-01 1925-05-01 00225409819CF5F6
71 71 person 1931-01-01 1931-01-01 0022CC1943D038E2
72 72 person 1933-04-01 1933-04-01 002354398A00234E
75 75 person 1932-08-01 1932-08-01 00241223F034F97B

If the left-hand stream has no types that match the right-hand stream, the left-hand stream passes through unaffected:


Example 19 - All Conditions that are MI

["except",{"left":["icd9","412"],"right":["cpt","99214"]}]

All Conditions that are MI

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412

And just to show how multiple streams behave:


Example 20 - Passes two streams downstream: a stream of Conditions that are MI (this stream is completely unaffected by the right hand stream) and a stream of People that are Male but not White

["except",{"left":["union",["icd9","412"],["gender","Male"]],"right":["race","White"]}]

Passes two streams downstream: a stream of Conditions that are MI (this stream is completely unaffected by the right hand stream) and a stream of People that are Male but not White

person_id criterion_id criterion_domain start_date end_date source_value
4 4 person 1941-06-01 1941-06-01 00021CA6FF03E670
6 6 person 1943-10-01 1943-10-01 0002DAE1C81CC70D
21 21 person 1932-08-01 1932-08-01 000C7486B11E7030
22 22 person 1942-08-01 1942-08-01 000D6D88463D8A76
50 50 person 1941-08-01 1941-08-01 001AEDD510C92C87
68 68 person 1954-11-01 1954-11-01 0021D4CDAFC0609F
69 69 person 1925-05-01 1925-05-01 00225409819CF5F6
71 71 person 1931-01-01 1931-01-01 0022CC1943D038E2
72 72 person 1933-04-01 1933-04-01 002354398A00234E
75 75 person 1932-08-01 1932-08-01 00241223F034F97B

Discussion About Set Operators

Why should we allow two different types of streams to continue downstream concurrently?

This feature lets us do interesting things, like find the first occurrence of either an MI or Death as in the example below. Throw in a few more criteria and you could find the first occurrence of all censoring events for each patient


Example 21 - First occurrence of either MI or Death for each patient

["first",["union",["icd9","412"],["death"]]]

First occurrence of either MI or Death for each patient

person_id criterion_id criterion_domain start_date end_date source_value
11 82614 condition_occurrence 2008-10-29 2008-10-29 412
16 1 death 2010-12-01 2010-12-01
17 27269 condition_occurrence 2008-08-25 2008-08-25 412
26 2 death 2009-07-01 2009-07-01
38 40556 condition_occurrence 2010-02-12 2010-02-12 412
54 74606 condition_occurrence 2008-06-05 2008-06-05 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
65 3 death 2009-01-01 2009-01-01
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
73 52843 condition_occurrence 2008-11-16 2008-11-16 412

Time-oriented Operators

All records in a stream carry a start_date and end_date with them. All temporal comparisons of streams use these two date columns. Each record in a stream derives its start and end date from its corresponding row in its table of origin.

If a record comes from a table that only has a single date value, the record derives both its start_date and end_date from that single date, e.g. a death record derives both its start_date and end_date from its corresponding row's observation_date.

The person stream is a special case. Person records use the person's date of birth as the start_date and end_date. This may sound strange, but we will explain below why this can be useful.

Relative Temporal Operators

When looking at a set of records for a person, perhaps we want to select just the chronologically first or last record. Or maybe we want to select the 2nd record or 2nd to last record. Relative temporal operators provide this type of filtering. Relative temporal operators use a record's start_date to determine chronological order.

Nth Occurrence Operator

  • Takes a two arguments: the stream to select from and an integer argument
  • For the integer argument
    • Positive numbers mean 1st, 2nd, 3rd occurrence in chronological order
      • e.g. 1 => first
      • e.g. 4 => fourth
    • Negative numbers mean 1st, 2nd, 3rd occurrence in reverse chronological order
      • e.g. -1 => last
      • e.g. -4 => fourth from last
    • 0 is undefined?
  • Has an optional parameter: unique
    • Setting the unique parameter to true will de-duplicate all upstream records by patient_id, domain, and source_code, creating a set of unique codes per patient. Then it will find the Nth occurrence from that set of unique codes.

Example 22 - For each patient, select the Condition that represents the second occurrence of an MI

["occurrence",2,["icd9","412"]]

For each patient, select the Condition that represents the second occurrence of an MI

person_id criterion_id criterion_domain start_date end_date source_value
17 46165 condition_occurrence 2009-04-30 2009-04-30 412
88 73803 condition_occurrence 2009-05-30 2009-05-30 412
152 84298 condition_occurrence 2010-11-22 2010-11-22 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
187 87583 condition_occurrence 2010-12-26 2010-12-26 412
212 34052 condition_occurrence 2009-10-07 2009-10-07 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
214 83257 condition_occurrence 2009-11-12 2009-11-12 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412
223 39893 condition_occurrence 2008-10-31 2008-10-31 412

In the example shown below, the unique parameter under the Nth Occurrence operator in the algorithm will de-duplicate all upstream records and create a new set of records limited to the first 410.00, the first 410.01, the first 250.00, and the first 250.01 ICD-9 codes for a patient. The occurrence parameter will find the nth of those unique codes. In the example below, the algorithm will find the second of three unique codes for each patient. This algorithm ensures that this patient has at least two of the ICD-9 codes listed in the ICD-9 operator. As a reminder, the First, Last, and Nth Occurrence operators return only one record per patient. If there are multiple records with the same date that meet the requirements of the First, Last, and Nth Occurrence, an arbitrary row is returned.


Example 23

["occurrence",2,["icd9","410.00","410.01","250.00","250.01"],{"unique":true}]

person_id criterion_id criterion_domain start_date end_date source_value
2 57897 condition_occurrence 2009-05-16 2009-05-16 25000
7 69706 condition_occurrence 2008-05-12 2008-05-12 25000
8 90541 condition_occurrence 2009-03-01 2009-03-03 25000
9 84726 condition_occurrence 2010-09-12 2010-09-12 25000
11 35227 condition_occurrence 2008-12-08 2008-12-08 25000
12 43800 condition_occurrence 2009-05-21 2009-05-21 25000
13 57738 condition_occurrence 2008-11-22 2008-11-22 25000
14 28741 condition_occurrence 2009-06-27 2009-06-27 25001
15 68873 condition_occurrence 2010-11-18 2010-11-18 25000
17 73012 condition_occurrence 2008-04-26 2008-04-26 25000

First Operator


Example 24 - For each patient, select the Condition that represents the first occurrence of an MI

["first",["icd9","412"]]

For each patient, select the Condition that represents the first occurrence of an MI

person_id criterion_id criterion_domain start_date end_date source_value
11 82614 condition_occurrence 2008-10-29 2008-10-29 412
17 27269 condition_occurrence 2008-08-25 2008-08-25 412
38 40556 condition_occurrence 2010-02-12 2010-02-12 412
54 74606 condition_occurrence 2008-06-05 2008-06-05 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
73 52843 condition_occurrence 2008-11-16 2008-11-16 412
77 58165 condition_occurrence 2010-10-06 2010-10-06 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412

Last Operator


Example 25 - For each patient, select the Condition that represents the last occurrence of an MI

["last",["icd9","412"]]

For each patient, select the Condition that represents the last occurrence of an MI

person_id criterion_id criterion_domain start_date end_date source_value
11 82614 condition_occurrence 2008-10-29 2008-10-29 412
17 46165 condition_occurrence 2009-04-30 2009-04-30 412
38 40556 condition_occurrence 2010-02-12 2010-02-12 412
54 74606 condition_occurrence 2008-06-05 2008-06-05 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
73 52843 condition_occurrence 2008-11-16 2008-11-16 412
77 58165 condition_occurrence 2010-10-06 2010-10-06 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 73803 condition_occurrence 2009-05-30 2009-05-30 412

Date Literals

For situations where we need to represent pre-defined date ranges, we can use "date literal" operators.

Date Range Operator

  • Takes a hash with two elements: { start: <date-format>, end: <date-format> }
  • Creates an inclusive, continuous range of dates defined by a start and end date

day

  • Takes a single argument: <date-format>
  • Represents a single day
  • Shorthand for creating a date range that starts and ends on the same date
  • Not yet implemented

What is <date-format>?

Dates follow these formats:

  • "YYYY-MM-DD"
    • Four-digit year, two-digit month with leading 0s, two-digit day with leading 0s
  • "START"
    • Represents the first date of information available from the data source
  • "END"
    • Represents the last date of information available from the data source.

Temporal Comparison Operators

As described above, each record carries a start and end date, defining its own date range. It is through these date ranges that we are able to do temporal filtering of streams via temporal operators.

Temporal operators take two incoming sets of records, a set of left-hand records (LHR) and a set of right-hand records (RHR). A temporal operator groups records by person, then compares dates each person's LHR against dates in their RHR. Only LHR are passed along. RHR are discarded after the comparison.

If a person only has LHR or RHR but not both, the Temporal Operator does not pass any LHR records along.

Each operator performs a different type of comparison between LHR and RHR dates. All operators have some options which are best described later.

During Operator

This is a temporal operator. It passes along any LHR that have a start_date and end_date completely contained within a RHR start_date and end_date.


Example 26 - All MIs for the year 2010

["during",{"left":["icd9","412"],"right":["date_range",{"start":"2010-01-01","end":"2010-12-31"}]}]

All MIs for the year 2010

person_id criterion_id criterion_domain start_date end_date source_value
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
90 33940 condition_occurrence 2010-05-26 2010-05-26 412
38 40556 condition_occurrence 2010-02-12 2010-02-12 412
239 49718 condition_occurrence 2010-02-12 2010-02-12 412
217 56467 condition_occurrence 2010-06-22 2010-06-23 412
77 58165 condition_occurrence 2010-10-06 2010-10-06 412
110 62592 condition_occurrence 2010-06-27 2010-06-27 412
152 84298 condition_occurrence 2010-11-22 2010-11-22 412
187 87583 condition_occurrence 2010-12-26 2010-12-26 412
109 89334 condition_occurrence 2010-07-06 2010-07-06 412

Contains Operator

This is a temporal operator. It passes along any LHR that have a start_date and end_date which completely contain an RHR start_date and end_date.


Example 27 - All MIs occurring on 2010-06-22

["contains",{"left":["icd9","412"],"right":["date_range",{"start":"2010-06-22","end":"2010-06-22"}]}]

All MIs occurring on 2010-06-22

person_id criterion_id criterion_domain start_date end_date source_value
217 56467 condition_occurrence 2010-06-22 2010-06-23 412

Any Overlap Operator

This is a temporal operator. It passes along any LHR whose date range overlaps in any way with a RHR's date range. This diagram attempts to demonstrate all L records that would qualify as having "any_overlap" with an R record.


Example 28 - All MIs that overlap with the month of 2010-06

["any_overlap",{"left":["icd9","412"],"right":["date_range",{"start":"2010-06-01","end":"2010-06-30"}]}]

All MIs that overlap with the month of 2010-06

person_id criterion_id criterion_domain start_date end_date source_value
217 56467 condition_occurrence 2010-06-22 2010-06-23 412
110 62592 condition_occurrence 2010-06-27 2010-06-27 412

Before Operator

This is a temporal operator. Its behavior warrants a bit of explanation.

Imagine events A-A-B-A-B-A. How many A's come before a B? One could argue that only two A's come before the first B, so only the first two A's qualify. However, there are three A's that ultimately come before a B.

The Before operator uses the latter approach, so all LHR records are compared against the start_date of the last RHR record.


Example 29 - All MIs that occurred before a patient's last case of diabetes (250.01)

["before",{"left":["icd9","412"],"right":["icd9","250.01"]}]

All MIs that occurred before a patient's last case of diabetes (250.01)

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
184 24281 condition_occurrence 2008-01-07 2008-01-07 412
17 27269 condition_occurrence 2008-08-25 2008-08-25 412
17 46165 condition_occurrence 2009-04-30 2009-04-30 412
152 82988 condition_occurrence 2008-08-16 2008-08-16 412
214 83257 condition_occurrence 2009-11-12 2009-11-12 412
109 89334 condition_occurrence 2010-07-06 2010-07-06 412
214 89463 condition_occurrence 2008-02-17 2008-02-17 412

After Operator

This is a temporal operator. Its behavior warrants a bit of explanation.

Imagine events A-A-B-A-B-A. How many A's come after a B? One could argue that only one A come before the last B, so only the last A qualifies. However, there are two A's that ultimately come after a B.

The After operator uses the latter approach, so all LHR records are compared against the end_date of the first RHR record.


Example 30 - All MIs that occurred after a patient's first case of diabetes (250.01)

["after",{"left":["icd9","412"],"right":["icd9","250.01"]}]

All MIs that occurred after a patient's first case of diabetes (250.01)

person_id criterion_id criterion_domain start_date end_date source_value
96 20748 condition_occurrence 2008-09-21 2008-09-24 412
223 38682 condition_occurrence 2008-07-28 2008-07-28 412
223 39893 condition_occurrence 2008-10-31 2008-10-31 412
77 58165 condition_occurrence 2010-10-06 2010-10-06 412
190 61423 condition_occurrence 2009-04-09 2009-04-09 412
110 62592 condition_occurrence 2010-06-27 2010-06-27 412
214 83257 condition_occurrence 2009-11-12 2009-11-12 412
152 84298 condition_occurrence 2010-11-22 2010-11-22 412
187 87583 condition_occurrence 2010-12-26 2010-12-26 412

Working Around Before/After Behavior

If this is not the behavior you desire, use one of the relative time operators to select which RHR should be the one used to do comparison


Example 31 - All MIs that occurred before a patient's first case of diabetes (250.01)

["before",{"left":["icd9","412"],"right":["first",["icd9","250.01"]]}]

All MIs that occurred before a patient's first case of diabetes (250.01)

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
184 24281 condition_occurrence 2008-01-07 2008-01-07 412
17 27269 condition_occurrence 2008-08-25 2008-08-25 412
17 46165 condition_occurrence 2009-04-30 2009-04-30 412
152 82988 condition_occurrence 2008-08-16 2008-08-16 412
109 89334 condition_occurrence 2010-07-06 2010-07-06 412
214 89463 condition_occurrence 2008-02-17 2008-02-17 412
187 90638 condition_occurrence 2009-01-31 2009-01-31 412

Time Manipulation Operators

Time Window Operator

There are situations when the date columns associated with a record should have their values shifted forward or backward in time to make a comparison with another set of dates. This is where the Time Window operator is used. It has the following properties:

  • Takes 2 arguments
    • First argument is the stream on which to operate
    • Second argument is a hash with two keys: [:start, :end] each with a value in the following format: "(-?\d+[dmy])+"
      • Both start and end must be defined, even if you are only adjusting one of the dates
    • Some examples
      • 30d => 30 days
      • 20 => 20 days
      • d => 1 day
      • 1y => 1 year
      • -1m => -1 month
      • 10d3m => 3 months and 10 days
      • -2y10m-3d => -2 years, +10 months, -3 days
    • The start or end value can also be '', '0', or nil
      • This will leave the date unaffected
    • The start or end value can also be the string 'start' or 'end'
      • 'start' represents the start_date for each record
      • 'end' represents the end_date for each record
      • See the example below

Example 32 - All Diagnoses of Diabetes (ICD-9 250.01) within 30 days of an MI

["during",{"left":["icd9","250.01"],"right":["time_window",["icd9","412"],{"start":"-30d","end":"30d"}]}]

All Diagnoses of Diabetes (ICD-9 250.01) within 30 days of an MI

person_id criterion_id criterion_domain start_date end_date source_value
96 46366 condition_occurrence 2008-09-13 2008-09-13 25001
223 81480 condition_occurrence 2008-07-03 2008-07-03 25001


Example 33 - Shift the window for all MIs back by 200 years

["time_window",["icd9","412"],{"start":"-200y","end":"-200y"}]

Shift the window for all MIs back by 200 years

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 1808-03-22 1808-03-23 412
177 507 condition_occurrence 1809-06-13 1809-06-16 412
230 523 condition_occurrence 1808-03-14 1808-03-21 412
161 963 condition_occurrence 1809-10-25 1809-10-29 412
60 986 condition_occurrence 1809-07-19 1809-07-22 412
81 1405 condition_occurrence 1809-01-28 1809-01-30 412
88 1572 condition_occurrence 1809-01-03 1809-01-09 412
213 15005 condition_occurrence 1810-02-07 1810-02-07 412
66 16171 condition_occurrence 1809-07-25 1809-07-25 412
220 20660 condition_occurrence 1809-10-31 1809-10-31 412


Example 34 - Expand the dates for all MIs to a window ranging from 2 months and 2 days prior to 1 year and 3 days after the MI

["time_window",["icd9","412"],{"start":"-2m-2d","end":"3d1y"}]

Expand the dates for all MIs to a window ranging from 2 months and 2 days prior to 1 year and 3 days after the MI

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-01-20 2009-03-26 412
177 507 condition_occurrence 2009-04-11 2010-06-19 412
230 523 condition_occurrence 2008-01-12 2009-03-24 412
161 963 condition_occurrence 2009-08-23 2010-11-01 412
60 986 condition_occurrence 2009-05-17 2010-07-25 412
81 1405 condition_occurrence 2008-11-26 2010-02-02 412
88 1572 condition_occurrence 2008-11-01 2010-01-12 412
213 15005 condition_occurrence 2009-12-05 2011-02-10 412
66 16171 condition_occurrence 2009-05-23 2010-07-28 412
220 20660 condition_occurrence 2009-08-29 2010-11-03 412


Example 35 - Collapse all 412 date ranges down to just the date of admission by leaving start_date unaffected and setting end_date to start_date

["time_window",["icd9","412"],{"start":"","end":"start"}]

Collapse all 412 date ranges down to just the date of admission by leaving start_date unaffected and setting end_date to start_date

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-22 412
177 507 condition_occurrence 2009-06-13 2009-06-13 412
230 523 condition_occurrence 2008-03-14 2008-03-14 412
161 963 condition_occurrence 2009-10-25 2009-10-25 412
60 986 condition_occurrence 2009-07-19 2009-07-19 412
81 1405 condition_occurrence 2009-01-28 2009-01-28 412
88 1572 condition_occurrence 2009-01-03 2009-01-03 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412


Example 36 - Nonsensical, but allowed: swap the start_date and end_date for a range

["time_window",["icd9","412"],{"start":"end","end":"start"}]

Nonsensical, but allowed: swap the start_date and end_date for a range

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-23 2008-03-22 412
177 507 condition_occurrence 2009-06-16 2009-06-13 412
230 523 condition_occurrence 2008-03-21 2008-03-14 412
161 963 condition_occurrence 2009-10-29 2009-10-25 412
60 986 condition_occurrence 2009-07-22 2009-07-19 412
81 1405 condition_occurrence 2009-01-30 2009-01-28 412
88 1572 condition_occurrence 2009-01-09 2009-01-03 412
213 15005 condition_occurrence 2010-02-07 2010-02-07 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412

Trim Date Start Operator

Like temporal comparison operators, this operator takes left and right hand streams of records.

The operator finds the most recent end_date in the RHR and uses that date to "trim" the start_dates of each LHR record.

There are four possible scenarios:

  • There is no corresponding RHR for the LHR
    • LHR row is passed through unaffected
  • RHR end_date is before LHR start_date
    • LHR row is passed through unaffected
  • RHR end_date is after LHR end_date
    • LHR row is completely discarded
  • RHR end_date falls between LHR start_date and end_date
    • LHR's start_date is set to RHR's end_date

Example 37 - Create time windows starting at 1980 and ending when patient is 50

["trim_date_start",{"left":["time_window",["person"],{"end":"+50y"}],"right":["date_range",{"start":"1980-01-01","end":"1980-01-01"}]}]

Create time windows starting at 1980 and ending when patient is 50

person_id criterion_id criterion_domain start_date end_date source_value
2 2 person 1980-01-01 1993-01-01 00016F745862898F
3 3 person 1980-01-01 1986-09-01 0001FDD721E223DC
4 4 person 1980-01-01 1991-06-01 00021CA6FF03E670
5 5 person 1980-01-01 1986-08-01 00024B3D2352D2D0
6 6 person 1980-01-01 1993-10-01 0002DAE1C81CC70D
8 8 person 1980-01-01 1985-09-01 000308435E3E5B76
9 9 person 1980-01-01 2026-09-01 000345A39D4157C9
10 10 person 1980-01-01 1988-10-01 00036A21B65B0206
11 11 person 1980-01-01 1984-02-01 000489E7EAAD463F
13 13 person 1980-01-01 1986-07-01 0004F0ABD505251D

Trim Date End Operator

Like temporal comparison operators, this operator takes left and right hand streams of records.

The operator finds the earliest start_date in the RHR and uses that date to "trim" the end_dates of each LHR record.

There are four possible scenarios:

  • There is no corresponding RHR for the LHR
    • LHR row is passed through unaffected
  • RHR start_date is after LHR end_date
    • LHR row is passed through unaffected
  • RHR start_date is before LHR start_date
    • LHR row is completely discarded
  • RHR start_date falls between LHR start_date and end_date
    • LHR's end_date is set to RHR's start_date

Example 38 - Create time windows starting at patient's birth and ending when patient is 50 or 1980, what ever is earlier

["trim_date_end",{"left":["time_window",["person"],{"end":"+50y"}],"right":["date_range",{"start":"1980-01-01","end":"1980-01-01"}]}]

Create time windows starting at patient's birth and ending when patient is 50 or 1980, what ever is earlier

person_id criterion_id criterion_domain start_date end_date source_value
1 1 person 1923-05-01 1973-05-01 00013D2EFD8E45D1
2 2 person 1943-01-01 1980-01-01 00016F745862898F
3 3 person 1936-09-01 1980-01-01 0001FDD721E223DC
4 4 person 1941-06-01 1980-01-01 00021CA6FF03E670
5 5 person 1936-08-01 1980-01-01 00024B3D2352D2D0
6 6 person 1943-10-01 1980-01-01 0002DAE1C81CC70D
7 7 person 1922-07-01 1972-07-01 0002F28CE057345B
8 8 person 1935-09-01 1980-01-01 000308435E3E5B76
9 9 person 1976-09-01 1980-01-01 000345A39D4157C9
10 10 person 1938-10-01 1980-01-01 00036A21B65B0206

Temporal Comparison Operator Options

Sometimes it is difficult to reason through Time Window when working with temporal comparison operators. It would be nice if a more intuitive language could be used to describe some common temporal relationships.

We've added a few parameters, primarily for the Before and After operators, that will help with temporal comparisons.

  • within
    • Takes same date adjustment format as time_window, e.g. 30d or 2m or 1y-3d
    • The start_date and end_date of the RHR are adjusted out in each direction by the amount specified and the LHR must pass the original temporal comparison and then fall within the window created by the adjustment
  • at_least
    • Takes same date adjustment format as time_window, e.g. 30d or 2m or 1y-3d
    • The start_date and end_date of the RHR are adjusted out in each direction by the amount specified and the LHR must pass the original temporal comparison and then fall outside the window created by the adjustment

Let's see them in action:

Prescriptions After a Office Visit - Find all prescriptions of interest occurring within three days after an office visit


Example 39

["after",{"left":["ndc","61392070054","65084025214","65726040125",{"label":"Prescriptions of Interest"}],"right":["cpt","99214",{"label":"Office Visit"}],"within":"3d"}]

person_id criterion_id criterion_domain start_date end_date source_value
207 11312 drug_exposure 2010-08-13 2010-08-13 61392070054

Walk through of example above:

  • Pull some prescriptions of interest into LHR
  • Pull some office visits into RHR
  • Compare LHR against RHR, enforcing that the LHR' start_date falls after the RHR' end_date
  • Enforce that any remaining LHR' start_date falls within an RHR's (start_date - 3 days) and (end_date + 3 days)

Find all Heart Attacks Probably Resulting in Death -- Does anyone die within a year of a heart attack?


Example 40

["before",{"left":["icd9","410.00","410.01","410.10","410.11"],"right":["death"],"within":"1y"}]

person_id criterion_id criterion_domain start_date end_date source_value
191 42977 condition_occurrence 2009-05-21 2009-05-21 41010

Walk through of example above:

  • Pull hospitalizations into LHR
  • Pull death records into RHR
  • Compare LHR against RHR, enforcing that the LHR' end_date falls before the RHR' start_date
  • Enforce that any remaining LHR row's end_date falls within an RHR row's (start_date - 1 year) and (end_date + 1 year)

Example 41 - Find all diagnosis of heart attack at least 1 week after a diagnosis of diabetes

["after",{"left":["icd9","410.00","410.01","410.10","410.11",{"label":"Heart Attack Dx"}],"right":["icd9","250.01",{"label":"Diabetes Dx"}],"at_least":"1w"}]

Find all diagnosis of heart attack at least 1 week after a diagnosis of diabetes

person_id criterion_id criterion_domain start_date end_date source_value
58 79480 condition_occurrence 2009-02-24 2009-02-24 41011

This example illustrates how at_least works.

Considerations

Currently, temporal comparisons are done with an inner join between the LHR relation and the RHR relation. This has some interesting effects:

  • If more than one RHR row matches with an LHR row, multiple copies of the LHR row will end up in the downstream records
    • Should we limit the LHR to only unique rows, essentially de-duping the downstream records?
  • If the same row appears in both the LHR and RHR relation, it is likely the row will match itself (e.g. a row occurs during itself and contains itself etc.)
    • This is a bit awkward and perhaps we should skip joining rows against each other if they are identical (i.e. have the same criterion_id and criterion_type)?

Temporal Operators and Person Streams

Person streams carry a patient's date of birth in their start and end date columns. This makes them almost useless when they are part of the L stream of a temporal operator. But person streams are useful as the R stream. By Time Windowing the patient's date of birth, we can filter based on the patient's age like so:


Example 42 - All MIs that occurred after a male patient's 50th birthday

["after",{"left":["icd9","412"],"right":["time_window",["gender","Male"],{"start":"50y","end":"50y"}]}]

All MIs that occurred after a male patient's 50th birthday

person_id criterion_id criterion_domain start_date end_date source_value
190 61423 condition_occurrence 2009-04-09 2009-04-09 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
197 21914 condition_occurrence 2009-05-07 2009-05-07 412
149 28518 condition_occurrence 2008-04-07 2008-04-07 412
239 49718 condition_occurrence 2010-02-12 2010-02-12 412
239 47130 condition_occurrence 2009-08-04 2009-08-04 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
214 89463 condition_occurrence 2008-02-17 2008-02-17 412
214 83257 condition_occurrence 2009-11-12 2009-11-12 412
187 90638 condition_occurrence 2009-01-31 2009-01-31 412

Episode Operator

There are rare occasions when we'd like to stitch a set of events together into a span of time, such as a set of prescriptions events into a span of time we'd consider a patient to be taking a certain medication. ConceptQL provides the Episode operator for such an occasion.


Example 43 - Episodes of diabetes diagnoses

["episode",["icd9","250.00"],{"gap_of":"90"}]

Episodes of diabetes diagnoses

person_id criterion_id criterion_domain start_date end_date source_value
2 0 episode 2009-05-11 2009-07-31
3 0 episode 2009-11-17 2009-11-17
5 0 episode 2009-06-28 2009-06-28
7 0 episode 2008-04-23 2008-06-09
7 0 episode 2008-10-02 2008-11-30
7 0 episode 2009-07-12 2009-09-13
7 0 episode 2010-04-22 2010-07-18
8 0 episode 2008-10-17 2008-10-17
8 0 episode 2009-03-01 2009-03-03
8 0 episode 2009-07-16 2009-07-16

One side-effect of this operator is that the criterion_domain is set to "episode" and each row no longer contains a reference back to a criterion_id or criterion_table because one or more records are folded into a single row after passing through the episode operator. Often, the episode operator is best-suited to act as the RHR of a temporal operator.

Concurrent Within Operator

This operator compares a set of incoming streams. A record is only passed along if there is a corresponding record in each of the other streams that occurs within the specified time frame.

For instance, say the operator is passed two streams, A and B, and the time frame is 30 days before or after. If a record in A occurs within 30 days of a record in B, both the record from A and the record from B get passed along.

The impetus for this operator was to be able to find procedures that occur within a certain period of time from a diagnosis and to be able to get both the procedure record and the diagnosis record as output.


Example 44 - Diabetes diagnosis within 30 days of an office visit

["concurrent_within",["cpt_or_hcpcs","99214"],["icd9","250.00"],{"start":"-30d","end":"30d"}]

Diabetes diagnosis within 30 days of an office visit

person_id criterion_id criterion_domain start_date end_date source_value
19 62873 procedure_occurrence 2008-05-10 2008-05-10 99214
77 52511 condition_occurrence 2008-10-26 2008-10-26 25000
83 52455 condition_occurrence 2010-10-26 2010-10-26 25000
86 68057 procedure_occurrence 2009-04-21 2009-04-21 99214
95 29773 procedure_occurrence 2008-12-07 2008-12-07 99214
98 56125 procedure_occurrence 2009-11-16 2009-11-18 99214
99 94263 condition_occurrence 2009-06-27 2009-06-27 25000
105 85026 condition_occurrence 2008-03-07 2008-03-07 25000
108 91221 procedure_occurrence 2009-06-05 2009-06-05 99214
110 92651 procedure_occurrence 2010-05-08 2010-05-08 99214

Inline-Filter Operators

There are a several operators that filter an incoming stream before passing it along.

Place of Service Filter Operator

Often in claims data, an event is reported with a place of service via the CMS place of service codes, such as "inpatient hospital" (21) or "outpatient hospital" (22). For these kinds of queries, ConceptQL has the Place of Service Filter


Example 45

["place_of_service_filter",22,["icd9","250.00"]]

No Results found.


Our sample data for the examples in this document does not contain any place of service information so there are no records after applying this filter.

Provenance Operator

For better or worse, provenance, or rather information about the origin of claims data, is some times an important factor in research algorithms. Often, paper authors will describe using "inpatient records", by which they mean using claims that come from "the inpatient file" from a set of claims data. GDM tracks the provenance of data as part of its ETL process, and ConceptQL includes an operator to select rows that have a particular provenance, aptly called Provenance.


Example 46

["provenance","inpatient",["icd9","250.00"]]

person_id criterion_id criterion_domain start_date end_date source_value
68 19 condition_occurrence 2008-04-19 2008-04-23 25000
207 106 condition_occurrence 2010-04-28 2010-05-02 25000
131 170 condition_occurrence 2008-03-22 2008-03-23 25000
243 205 condition_occurrence 2009-08-20 2009-08-23 25000
230 231 condition_occurrence 2008-03-30 2008-04-02 25000
32 257 condition_occurrence 2010-08-30 2010-09-07 25000
79 267 condition_occurrence 2009-08-09 2009-08-24 25000
230 284 condition_occurrence 2008-03-01 2008-03-06 25000
168 297 condition_occurrence 2010-02-14 2010-02-18 25000
37 310 condition_occurrence 2009-01-19 2009-01-20 25000

Provider Filter Operator

There are times when we'd like to captures events only when a particular provider specialty was involved. ConceptQL provides the Provider Filter operator for this. This operator is a bit bare-bones at the moment in that it requires users to enter in the concept ID of the specialty they are seeking, rather than providing a list of known specialties.


Example 47

["provider_filter",["icd9","250.00"],{"specialties":"12301023"}]

No Results found.


Our sample data for the examples in this document does not contain any provider specialty information so there are no records after applying this filter.

One In Two Out Operator

A very common pattern in algorithms is to consider a condition to be valid if:

  • the condition is seen just once in the inpatient file, or
  • the condition is present twice in the outpatient file, with
    • a minimum separation, normally of a 30 days, between the two occurrences

We have created an operator that handles this pattern: One In Two Out


Example 48 - First of diabetes diagnoses, either found once in the inpatient file, or twice in the outpatient file with a minimum of 30 days between outpatient diagnoses

["one_in_two_out",["icd9","250.00"],{"outpatient_minimum_gap":"30d"}]

First of diabetes diagnoses, either found once in the inpatient file, or twice in the outpatient file with a minimum of 30 days between outpatient diagnoses

person_id criterion_id criterion_domain start_date end_date source_value
2 76985 condition_occurrence 2009-07-31 2009-07-31 25000
7 38732 condition_occurrence 2008-06-09 2008-06-09 25000
8 90541 condition_occurrence 2009-03-01 2009-03-03 25000
9 84726 condition_occurrence 2010-09-12 2010-09-12 25000
11 35227 condition_occurrence 2008-12-08 2008-12-08 25000
12 43800 condition_occurrence 2009-05-21 2009-05-21 25000
13 57738 condition_occurrence 2008-11-22 2008-11-22 25000
14 24380 condition_occurrence 2009-09-13 2009-09-13 25000
15 68873 condition_occurrence 2010-11-18 2010-11-18 25000
17 72739 condition_occurrence 2008-05-16 2008-05-16 25000

The One In Two Out operator yields a single row per patient. The row is the earliest confirmed event from the operator's upstream set of events.

The One In Two Out operator uses the file from which a row came from to determine if the row represents an "inpatient" record or an "outpatient" record. The place of service associated with a record is not taken into account. Nothing about the associated visit is considered either.

condition_occurrence rows with provenance related to of "inpatient" files are considered inpatient records. All other records are considered outpatient.

Non-condition_occurrence rows are removed from the stream.

In order to simplify how sets of inpatient and outpatient records are compared to each other temporally, we collapse each incoming row's date range to a single date. Users can choose which date to use for both inpatient and outpatient records.

  • Inpatient Length of Stay
    • Enforces a minimum length of stay for inpatient records
    • If given a whole number greater than 0, requires that the patient had been hospitalized for at least that many days
    • Optional
    • Default: empty
    • If left empty, length of stay is ignored
    • Length of Stay (LOS) is calculated as:
      • If (end_date - start_date) < 2, then LOS is 1
      • else LOS is (end_date - start_date) + 1
  • Inpatient Return Date
    • Determines which date from an inpatient record should be used as the date in which the associated condition "occurred"
    • Options
      • Admit Date
      • Discharge Date
    • Optional
    • Defaults to discharge date (end_date of the given condition_occurrence)
      • The discharge date is preferred over the admit date because only at discharge can we say for certain a patient was diagnosed with a given condition. We cannot be certain a patient had the condition at the time of admission
  • Outpatient Minimum Gap
    • Requires that the second, confirming outpatient diagnosis must occur at least this many days after the initial diagnosis
    • Uses same time adjustment syntax as time_window, e.g. 1d3m10y
    • Required
    • Default: 30d
  • Outpatient Maximum Gap
    • Requires that the second, confirming outpatient diagnosis must occur at within this many days after the initial diagnosis
    • Uses same time adjustment syntax as time_window, e.g. 1d3m10y
    • Optional
    • Default: empty
    • When empty, the confirming diagnosis may occur at any time after the minimum_gap to be considered valid
    • This option is useful when attempting to limit confirming diagnoses for acute diseases, e.g. if attempting to find confirmation of the flu, it would be advisable to set this option in order to limit confirmation to distinct periods when a patient had the flu
  • Outpatient Event to Return
    • Determines which event return should two outpatient records meet all criteria
      • In most situations, using the initial outpatient event is desired, particularly when the algorithm is used to record an exposure of interest or outcome
      • However, returning the initial outpatient event when the algorithm is used to determine an index event can introduce immortal time bias. In these cases, using the confirming event will avoid immortal time bias
    • Options
      • Initial Event
      • Confirming Event
    • Optional
    • Defaults to Initial Event

Person Filter Operator

Often we want to filter out a set of records by people. For instance, say we wanted to find all MIs for all males. We'd use the Person Filter operator for that. Like the Except operator, it takes a left-hand stream and a right-hand stream.

Unlike the Except operator, the Person Filter operator will use all types of all streams in the right-hand side to filter out records in all types of all streams on the left hand side.


Example 49 - All MI Conditions for people who are male

["person_filter",{"left":["icd9","412"],"right":["gender","Male"]}]

All MI Conditions for people who are male

person_id criterion_id criterion_domain start_date end_date source_value
190 61423 condition_occurrence 2009-04-09 2009-04-09 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
197 21914 condition_occurrence 2009-05-07 2009-05-07 412
149 28518 condition_occurrence 2008-04-07 2008-04-07 412
239 49718 condition_occurrence 2010-02-12 2010-02-12 412
239 47130 condition_occurrence 2009-08-04 2009-08-04 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
214 89463 condition_occurrence 2008-02-17 2008-02-17 412
214 83257 condition_occurrence 2009-11-12 2009-11-12 412
187 90638 condition_occurrence 2009-01-31 2009-01-31 412

But we can get crazier. The right-hand side doesn't have to be a person stream. If a non-person stream is used in the right-hand side, the person_filter will cast all right-hand streams to person first and use the union of those streams:


Example 50 - All MI Conditions for people who had an office visit at some point in the data

["person_filter",{"left":["icd9","412"],"right":["cpt","99214"]}]

All MI Conditions for people who had an office visit at some point in the data

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412
96 20748 condition_occurrence 2008-09-21 2008-09-24 412


Example 51 - All MI Conditions for people who had an office visit at some point in the data (an explicit representation of what's happening in the diagram above)

["person_filter",{"left":["icd9","412"],"right":["person",["cpt","99214"]]}]

All MI Conditions for people who had an office visit at some point in the data (an explicit representation of what's happening in the diagram above)

person_id criterion_id criterion_domain start_date end_date source_value
131 172 condition_occurrence 2008-03-22 2008-03-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
230 523 condition_occurrence 2008-03-14 2008-03-21 412
161 963 condition_occurrence 2009-10-25 2009-10-29 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
81 1405 condition_occurrence 2009-01-28 2009-01-30 412
88 1572 condition_occurrence 2009-01-03 2009-01-09 412
66 16171 condition_occurrence 2009-07-25 2009-07-25 412
220 20660 condition_occurrence 2009-10-31 2009-10-31 412
96 20748 condition_occurrence 2008-09-21 2008-09-24 412


Example 52 - All MI Conditions for people who are Male OR had an office visit at some point in the data

["person_filter",{"left":["icd9","412"],"right":["union",["cpt","99214"],["gender","Male"]]}]

All MI Conditions for people who are Male OR had an office visit at some point in the data

person_id criterion_id criterion_domain start_date end_date source_value
234 22299 condition_occurrence 2008-09-01 2008-09-01 412
177 48526 condition_occurrence 2008-09-23 2008-09-23 412
177 507 condition_occurrence 2009-06-13 2009-06-16 412
106 29318 condition_occurrence 2009-02-27 2009-02-27 412
110 62592 condition_occurrence 2010-06-27 2010-06-27 412
184 24281 condition_occurrence 2008-01-07 2008-01-07 412
190 61423 condition_occurrence 2009-04-09 2009-04-09 412
60 986 condition_occurrence 2009-07-19 2009-07-22 412
197 21914 condition_occurrence 2009-05-07 2009-05-07 412
149 28518 condition_occurrence 2008-04-07 2008-04-07 412

And don't forget the left-hand side can have multiple types of streams:


Example 53 - Yields two streams: a stream of all MI Conditions for people who are Male and a stream of all office visit Procedures for people who are Male

["person_filter",{"left":["union",["icd9","412"],["cpt","99214"]],"right":["gender","Male"]}]

Yields two streams: a stream of all MI Conditions for people who are Male and a stream of all office visit Procedures for people who are Male

person_id criterion_id criterion_domain start_date end_date source_value
71 31789 procedure_occurrence 2009-07-12 2009-07-12 99214
68 35465 procedure_occurrence 2008-11-08 2008-11-08 99214
68 35191 procedure_occurrence 2008-06-30 2008-06-30 99214
68 35154 procedure_occurrence 2009-03-26 2009-03-26 99214
68 33110 procedure_occurrence 2010-08-13 2010-08-13 99214
68 32826 procedure_occurrence 2009-07-22 2009-07-22 99214
68 49092 procedure_occurrence 2008-10-28 2008-10-28 99214
68 88881 procedure_occurrence 2010-03-17 2010-03-17 99214
68 80436 procedure_occurrence 2009-11-13 2009-11-13 99214
146 37311 procedure_occurrence 2009-08-17 2009-08-17 99214

Comparison Operators

These operators compare incoming streams of records in special ways to produce a set of records.

Filter Operator

Much like temporal operators, this operator takes left and right hand streams. It passes along LHR that have a RHR with a matching person_id, criterion_id, and criterion_domain. Essentially, it checks to see if a LHR is also present in the right hand stream.

Match Operator

This operator is nearly identical to the Filter operator but primarily is used internally. There are additional options available to this operator that are not exposed in the Jigsaw UI. Users are advised to use the Filter operator as opposed to this one.

Co-Reported Operator

Claims data often reports diagnoses on the same "line" as a procedure. This is done for billing reasons, essentially the provider is saying "I preformed this procedure due to these conditions present in the patient". ConceptQL has an operator that specifically targets this kind of relationship: Co-Reported

The operator takes two or more streams. Events in each stream must be reported at the same time in order to be passed through the operator. In the case of GDM this means that events that share the same context ID.


Example 54 - Diabetes diagnosis reported as part the reason for an office visit

["co_reported",["icd9","250.00"],["cpt","99214"]]

Diabetes diagnosis reported as part the reason for an office visit

person_id criterion_id criterion_domain start_date end_date source_value
25 93217 condition_occurrence 2008-09-18 2008-09-18 25000
25 93220 procedure_occurrence 2008-09-18 2008-09-18 99214
31 87695 condition_occurrence 2008-06-17 2008-06-17 25000
31 87696 procedure_occurrence 2008-06-17 2008-06-17 99214
40 81973 condition_occurrence 2009-05-01 2009-05-01 25000
40 81993 procedure_occurrence 2009-05-01 2009-05-01 99214
44 44702 condition_occurrence 2008-01-10 2008-01-10 25000
44 44715 procedure_occurrence 2008-01-10 2008-01-10 99214
58 35093 condition_occurrence 2009-06-15 2009-06-15 25000
58 35095 procedure_occurrence 2009-06-15 2009-06-15 99214

Note that both the diagnosis and procedure are pass on through the operator.

label Option and Its Features

Any ConceptQL operator can be assigned a label. The label simply provides a way to apply a brief description to an operator, generally, what kind of records the operator is producing.

Any operator with a label inserts its label into the set of records.

For instance, here are some diabetes diagnoses with a label applied:


Example 55

["icd9","250.00",{"label":"diabetes"}]

person_id criterion_id criterion_table criterion_domain start_date end_date source_value source_vocabulary_id label
68 19 clinical_codes condition_occurrence 2008-04-19 2008-04-23 25000 ICD9CM diabetes
207 106 clinical_codes condition_occurrence 2010-04-28 2010-05-02 25000 ICD9CM diabetes
131 170 clinical_codes condition_occurrence 2008-03-22 2008-03-23 25000 ICD9CM diabetes
243 205 clinical_codes condition_occurrence 2009-08-20 2009-08-23 25000 ICD9CM diabetes
230 231 clinical_codes condition_occurrence 2008-03-30 2008-04-02 25000 ICD9CM diabetes
32 257 clinical_codes condition_occurrence 2010-08-30 2010-09-07 25000 ICD9CM diabetes
79 267 clinical_codes condition_occurrence 2009-08-09 2009-08-24 25000 ICD9CM diabetes
230 284 clinical_codes condition_occurrence 2008-03-01 2008-03-06 25000 ICD9CM diabetes
168 297 clinical_codes condition_occurrence 2010-02-14 2010-02-18 25000 ICD9CM diabetes
37 310 clinical_codes condition_occurrence 2009-01-19 2009-01-20 25000 ICD9CM diabetes

The most recent upstream label is the one that is output:


Example 56

["first",["icd9","250.00",{"label":"won't show"}],{"label":"1st diabetes"}]

person_id criterion_id criterion_table criterion_domain start_date end_date source_value source_vocabulary_id label
2 89752 clinical_codes condition_occurrence 2009-05-11 2009-05-11 25000 ICD9CM 1st diabetes
3 93400 clinical_codes condition_occurrence 2009-11-17 2009-11-17 25000 ICD9CM 1st diabetes
5 89717 clinical_codes condition_occurrence 2009-06-28 2009-06-28 25000 ICD9CM 1st diabetes
7 15942 clinical_codes condition_occurrence 2008-04-23 2008-04-23 25000 ICD9CM 1st diabetes
8 95135 clinical_codes condition_occurrence 2008-10-17 2008-10-17 25000 ICD9CM 1st diabetes
9 24929 clinical_codes condition_occurrence 2010-08-01 2010-08-01 25000 ICD9CM 1st diabetes
11 73103 clinical_codes condition_occurrence 2008-06-17 2008-06-17 25000 ICD9CM 1st diabetes
12 15443 clinical_codes condition_occurrence 2009-04-20 2009-04-20 25000 ICD9CM 1st diabetes
13 19968 clinical_codes condition_occurrence 2008-03-05 2008-03-05 25000 ICD9CM 1st diabetes
14 77991 clinical_codes condition_occurrence 2009-06-05 2009-06-05 25000 ICD9CM 1st diabetes

The idea behind this is to create hints about what is being output:


Example 57

["first",["union",["icd9","250.00",{"label":"diabetes"}],["icd9","401.9",{"label":"hypertension"}]]]

person_id criterion_id criterion_table criterion_domain start_date end_date source_value source_vocabulary_id label
1 1065 clinical_codes condition_occurrence 2010-03-12 2010-03-13 4019 ICD9CM hypertension
2 57517 clinical_codes condition_occurrence 2009-03-30 2009-03-30 4019 ICD9CM hypertension
3 93400 clinical_codes condition_occurrence 2009-11-17 2009-11-17 25000 ICD9CM diabetes
5 38310 clinical_codes condition_occurrence 2008-12-14 2008-12-14 4019 ICD9CM hypertension
6 76645 clinical_codes condition_occurrence 2010-02-23 2010-02-23 4019 ICD9CM hypertension
7 15942 clinical_codes condition_occurrence 2008-04-23 2008-04-23 25000 ICD9CM diabetes
8 55867 clinical_codes condition_occurrence 2008-02-27 2008-02-27 4019 ICD9CM hypertension
9 27621 clinical_codes condition_occurrence 2009-10-12 2009-10-19 4019 ICD9CM hypertension
11 73103 clinical_codes condition_occurrence 2008-06-17 2008-06-17 25000 ICD9CM diabetes
12 34937 clinical_codes condition_occurrence 2008-09-17 2008-09-17 4019 ICD9CM hypertension

Recall Operator

If a algorithm is particularly complex, or has a stream of records that are used more than once, it can be helpful to break the algorithm into a set of sub-algorithms. This can be done using the label options and the Recall operator. Any operator that has a label can be accessed via the Recall operator.

  • Takes 1 argument
    • The "label" of an operator from which you'd like to pull the exact same set of records

A stream must be have a label applied to it before Recall can use it.


Example 58 - Save away a stream of records to build the 1 inpatient, 2 outpatient pattern used in claims data algorithms

[["one_in_two_out",["except",{"left":["union",["icd9","330.0","330.1","330.2","330.3","330.8","330.9","331.0","331.11","331.19","331.2","331.3","331.4","331.5","331.6","331.7","331.81","331.82","331.83","331.89","331.9","332.0","333.4","333.5","333.7","333.71","333.72","333.79","333.85","333.94","334.0","334.1","334.2","334.3","334.4","334.8","334.9","335.0","335.10","335.11","335.19","335.20","335.21","335.22","335.23","335.24","335.29","335.8","335.9","338.0","340","341.0","341.1","341.20","341.21","341.22","341.8","341.9","345.00","345.01","345.10","345.11","345.2","345.3","345.40","345.41","345.50","345.51","345.60","345.61","345.70","345.71","345.80","345.81","345.90","345.91","347.00","347.01","347.10","347.11","649.40","649.41","649.42","649.43","649.44","768.70","768.71","768.72","768.73","780.31","780.32","780.33","780.39","784.3"],["icd10cm","E75.00","E75.01","E75.02","E75.09","E75.10","E75.11","E75.19","E75.23","E75.25","E75.26","E75.29","E75.4","F84.2","G10","G11.0","G11.1","G11.2","G11.3","G11.4","G11.8","G11.9","G12.0","G12.1","G12.20","G12.21","G12.22","G12.23","G12.24","G12.25","G12.29","G12.8","G12.9","G13.2","G13.8","G20","G21.4","G24.01","G24.02","G24.09","G24.2","G24.8","G25.4","G25.5","G25.81","G30.0","G30.1","G30.8","G30.9","G31.01","G31.09","G31.1","G31.2","G31.81","G31.82","G31.83","G31.84","G31.85","G31.89","G31.9","G32.81","G35","G36.1","G36.8","G36.9","G37.0","G37.1","G37.2","G37.3","G37.4","G37.5","G37.8","G37.9","G40.001","G40.009","G40.011","G40.019","G40.101","G40.109","G40.111","G40.119","G40.201","G40.209","G40.211","G40.219","G40.301","G40.309","G40.311","G40.319","G40.401","G40.409","G40.411","G40.419","G40.501","G40.509","G40.801","G40.802","G40.803","G40.804","G40.811","G40.812","G40.813","G40.814","G40.821","G40.822","G40.823","G40.824","G40.89","G40.901","G40.909","G40.911","G40.919","G40.A01","G40.A09","G40.A11","G40.A19","G40.B01","G40.B09","G40.B11","G40.B19","G47.411","G47.419","G47.421","G47.429","G80.3","G89.0","G91.0","G91.1","G91.2","G91.3","G91.4","G91.8","G91.9","G93.7","G93.89","G93.9","G94","O99.350","O99.351","O99.352","O99.353","O99.354","O99.355","P91.60","P91.61","P91.62","P91.63","R41.0","R41.82","R47.01","R56.00","R56.01","R56.1","R56.9"],{"label":"neuro dxs"}],"right":["co_reported",["recall","neuro dxs"],["drg","020","021","022","023","024","025","026","027","028","029","030","031","032","033","034","035","036","037","038","039","040","041","042","052","053","054","055","056","057","058","059","060","061","062","063","064","065","066","067","068","069","070","071","072","073","074","075","076","077","078","079","080","081","082","083","084","085","086","087","088","089","090","091","092","093","094","095","096","097","098","099","100","101","102","103"]]}],{"outpatient_event_to_return":"Confirming Event","outpatient_minimum_gap":"30d","inpatient_return_date":"Discharge Date","outpatient_maximum_gap":"365d"}]]

Save away a stream of records to build the 1 inpatient, 2 outpatient pattern used in claims data algorithms

person_id criterion_id criterion_domain start_date end_date source_value
2 397 condition_occurrence 2009-04-18 2009-04-18 7843
7 27644 condition_occurrence 2008-09-21 2008-09-30 3320
14 35530 condition_occurrence 2009-07-04 2009-07-04 3310
15 60361 condition_occurrence 2010-09-01 2010-09-01 78039
19 63713 condition_occurrence 2009-01-09 2009-01-09 3310
28 131 condition_occurrence 2008-04-30 2008-04-30 3310
35 34819 condition_occurrence 2009-05-02 2009-05-02 340
40 33044 condition_occurrence 2010-05-31 2010-05-31 3319
42 72000 condition_occurrence 2008-08-28 2008-08-28 3310
44 39145 condition_occurrence 2008-08-31 2008-08-31 3310

Appendix A - Experimental Operators

These are operators that are currently implemented but are subject to change or removal in the future.

Numeric Filter Operator

This is an inline filter operator. Passes along any record that contains a value_as_number that falls between the given options. Records with NULL for value_as_number are discarded.

Equal Operator

Like a temporal comparison operator, takes left and right hand streams and compares the value in the value_as_number column. Passes along LHR where value_as_number matches a RHR record.

LHR with NULL for value_as_number are discarded.

Appendix B - Algorithm Showcase

Here are some algorithms from OMOP's Health Outcomes of Interest turned into ConceptQL statements to give more examples. I truncated some of the sets of codes to help ensure the diagrams didn't get too large.

Acute Kidney Injury - Narrow Definition and diagnostic procedure

  • ICD-9CM of 584
  • AND
    • ICD-9 procedure codes of 39.95 or 54.98 within 60 days after diagnosis
  • AND NOT
    • A diagnostic code of chronic dialysis any time before initial diagnosis
      • V45.1, V56.0, V56.31, V56.32, V56.8

Example 59

["during",{"left":["except",{"left":["icd9","584"],"right":["after",{"left":["icd9","584"],"right":["icd9","V45.1","V56.0","V56.31","V56.32","V56.8"]}]}],"right":["time_window",["icd9_procedure","39.95","54.98"],{"start":"0","end":"60d"}]}]

No Results found.


Mortality after Myocardial Infarction #3

  • Person Died
  • And Occurrence of 410* prior to death
  • And either
    • MI diagnosis within 30 days prior to 410
    • MI therapy within 60 days after 410

Example 60

["during",{"left":["before",{"left":["icd9","410*"],"right":["death"]}],"right":["union",["time_window",["union",["cpt","0146T","75898","82554","92980","93010","93233","93508","93540","93545"],["icd9_procedure","00.24","36.02","89.53","89.57","89.69"],["loinc","10839-9","13969-1","18843-3","2154-3","33204-9","48425-3","49259-5","6597-9","8634-8"]],{"start":"-30d","end":"0"}],["time_window",["union",["cpt","0146T","75898","82554","92980","93010","93233"],["icd9_procedure","00.24","36.02","89.53","89.57","89.69"]],{"start":"","end":"60d"}]]}]

No Results found.


Appendix C - History of ConceptQL and Its Evolution

ConceptQL was originally developed to query data from the OMOP Common Data Model (CDM) v4.0. This meant that ConceptQL was able to take simple statements like [ "icd9", "412" ] and determine the proper table to query (in this case condition_occurrence) along with the correct source_code and vocabulary_id to search for within that table.

As the OMOP CDM continued to evolve and as its ETL requirements continued to spiral, Outcomes Insights, Inc developed the GDM and ConceptQL was adapted to support querying both data models. Eventually support for OMOP's CDM was removed from ConceptQL because it was no longer used in any production applications of ConceptQL. However, this experience has demonstrated that ConceptQL can be adapted to support other data models should the need arise.

ConceptQL was originally implemented to work against PostgreSQL, but was adapted to work against MSSQL, Oracle, and Impala. We have since refocused ConceptQL to work against PostgreSQL, but we have shown that, with a modest amount of effort, ConceptQL can be made to support most RDBMS systems that support most features found in SQL:2003, such as window functions. Most SQL generated by ConceptQL is fairly "vanilla" in that it does not rely on special quirks or query hints specific to a particular RDBMS in order to function.

conceptql_spec's People

Contributors

aguynamedryan avatar

Stargazers

Matt Erbst avatar timelyportfolio avatar

Watchers

 avatar timelyportfolio avatar James Cloos avatar Mark Danese avatar  avatar  avatar

Forkers

jenniferduryea

conceptql_spec's Issues

Overhaul Spec

The specification document is years out of date and needs to be updated to reflect ConceptQL's current status and abilities

Review updates made on Anvil 16.11.07

We need to update the spec with the following information:

  • information on provenance, place of service, and provider (specialty) filter
  • add provenance filter short cuts
  • talk about using at_least parameters in other temporal operators
  • talk about deprecating certain operators (and talk about removing some operators if not already done so)

Tagging @aguynamedryan for comment and updates.

Create Entries for All Operators

As part of outcomesinsights/jigsaw-diagram-editor#946, we need to provide a description for each operator. For those that were not already described in the document, we'll revise Appendix A and place them in there.

Review algorithm for DRG OMOP

Hi Jen

I create the algorithm 'DRG cancer indexdate - test' in JigSaw, we will probably need you to review this with us. In addition, could we set up a one-hour meeting with her to discuss this algorithm since it is not so straightforward? Below please find the analysis plan, #1 to #3 are related to this algorithm. Thanks
Hsingwen

  1. Inclusion criteria:
  • Identify with the 1st mysuppressive chemo with any of these cancer (breast cancer, lung cancer, colorectal cancer, NHL, ovarian cancer) in the 1yr prior and 1 month post the index date. Cancer will be defined as 1 IP or 2 OPs.
  • Patients of age >=18 years
  1. Exclusion criteria:
  • Patients have any myelosuppressive chemo in the 1yr prior to the index date.
  • Patients have multiple cancers 1yr prior and 1 month post the index date. We’ll still use the 1 IP or 2 OP (of the same cancer type) to define patient to be of a cancer type. Then look for patients with multiple cancer type.
  1. The 1st chemo will be the index date. The index date of the cohort will be from 2012-2016 for DRG, OPTUM and MKSC. 1y prior to index date is baseline.

  2. The outcomes are overall neulasta use and onpro use within 14 days after index date. we will need to know the % of pt had overall neulasta/onpro among the cohort.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.