In #77, @chrisknoll posed an interesting way of viewing Capr: potentially through R6, as a pure OOP system. The purpose of this post is to a) understand the benefits of switching to R6 for Capr, and b) consider the impact of R6 within HADES: when and where to use it, why it is beneficial, and ultimately whether it even matters.
I am hoping to get some feedback or thoughts from others @chrisknoll, @azimov, @ablack3, @anthonysena, @schuemie. I know a post was made a while back referencing this same topic.
Thoughts on OOP in context of Capr
Currently Capr is written in S4. This was done for two (at this point, flimsy) reasons:
1. R6 was not available at the time. R still used Reference Classes as its pure OOP system. I explored this in early Capr development but opted to go the S4 route.
2. S4 maintains the "feel" of R, whereas R6 is more amenable to programmers coming from the Java and Python worlds. S4 is a stricter version of S3, which does a better job of working in a functional programming pipeline. When Capr was originally created it was intended to heavily leverage the pipe operator %>%; however, this proved to be rather awkward.
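To make the difference in "feel" concrete, here is a minimal sketch using only base R (the methods package). It contrasts an S4 class, where behaviour lives in generic functions and objects are copied on modification, with a Reference Class, where methods belong to the object and change it in place. The names EraSketch, EraRefSketch and addDays are made up for illustration and are not part of Capr.

```r
library(methods)

# S4: a formal class with typed slots. Behaviour lives in generic functions.
# "EraSketch" and "addDays" are hypothetical names, not Capr API.
setClass("EraSketch", slots = c(eraDays = "integer"))

setGeneric("addDays", function(x, n) standardGeneric("addDays"))
setMethod("addDays", "EraSketch", function(x, n) {
  x@eraDays <- x@eraDays + as.integer(n)  # modifies a local copy
  x                                       # the caller receives a new object
})

s4_era <- new("EraSketch", eraDays = 30L)
# Ordinary function calls, so they nest (or pipe) like any other R function
s4_longer <- addDays(addDays(s4_era, 7), 7)
# s4_era itself is unchanged (copy-on-modify semantics)

# The slot type is enforced at construction time
bad <- tryCatch(new("EraSketch", eraDays = "thirty"), error = function(e) e)

# Reference Classes: methods belong to the object and mutate it in place
EraRef <- setRefClass("EraRefSketch",
  fields = list(eraDays = "integer"),
  methods = list(
    addDays = function(n) {
      eraDays <<- eraDays + as.integer(n)  # updates the field on this object
      invisible(.self)                     # returning .self allows chaining
    }
  )
)

rc_era <- EraRef$new(eraDays = 30L)
rc_era$addDays(7)$addDays(7)  # rc_era itself is now changed
```

The S4 version slots into a functional pipeline because every call returns a new value, while the Reference Class version reads like Java or Python method chaining. R6 behaves like the second half in this respect.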
Resources that give context on S4 and R6 can be found in chapters 14, 15, and 16 of Advanced R, while the strengths of S4 can be found here.
Starting with Capr v2, there was an intentional effort to transition the feel of the code away from piping and towards nested functions. Constructing a cohort should hence feel like building the UI of a shinyDashboard. A dashboard requires a header, sidebar, and body; within each section the user defines the look by adding text, outputs, boxes, etc. A cohort definition is similar: the user constructs the sections of the definition, namely the entry, attrition, exit, and era. An example of what Capr code should look like now is shown below:
library(Capr)

cd <- cohort(
  # entry event (i.e. primary criteria)
  entry = entry(
    drugExposure(metformin, male()), # index query of metformin users who are male
    observationWindow = continuousObservation(priorDays = 365L), # 365 min prior obs
    primaryCriteriaLimit = "All" # use all index events
  ),
  # attrition to index event (i.e. inclusion rules)
  attrition = attrition(
    # no t1d any time prior
    'no t1d' = withAll( # start group
      exactly( # start criteria (i.e. count)
        x = 0,
        query = conditionOccurrence(t1d),
        aperture = duringInterval(
          startWindow = eventStarts(a = -Inf, b = 0, index = "startDate")
        )
      )
    ),
    expressionLimit = "All" # include all events for attrition
  ),
  # exit when the person leaves the cohort
  exit = exit(
    endStrategy = drugExit(
      conceptSet = metformin,
      surveillanceWindow = 30L
    ) # create metformin era to determine exit
  ),
  # era logic on how to collapse multiple events
  era = era(eraDays = 30L) # 30 days of metformin use builds an era (bit redundant)
)
If Capr were to switch to R6, the syntax would look more like this:
cd <- cohort$new(
  # entry event (i.e. primary criteria)
  entry = entry$new(
    # list of queries or single query
    list(
      drugExposure$new(
        conceptSet = metformin,
        attributes = list(
          male$new()
        )
      )
    ),
    observationWindow = continuousObservation$new(prior = 365L),
    primaryCriteriaLimit = limit$new(type = "All"),
    additionalCriteria = NULL, # placeholder
    qualifyingLimit = limit$new(type = "All")
  ),
  # attrition to index event (i.e. inclusion rules)
  attrition = attrition$new(
    list(
      group$new(
        name = 'no t1d occurrence',
        type = "all",
        int = NULL, # placeholder
        criteria = criteria$new(
          type = "exactly",
          int = 0, # exactly 0 occurrences, matching x = 0 in the S4 example
          query = conditionOccurrence$new(
            conceptSet = t1d
          ),
          aperture = aperture$new(
            startWindow = eventStarts$new(a = -Inf, b = 0, index = "startDate"),
            ignoreObservationPeriod = FALSE # placeholder
          )
        )
      )
    )
  ),
  # exit when the person leaves the cohort
  exit = exit$new(
    endStrategy = drugExit$new(
      conceptSet = metformin,
      surveillanceWindow = 30L
    ),
    censoringCriteria = NULL # placeholder
  ),
  # era logic on how to collapse multiple events
  era = era$new(
    eraDays = 30L
  )
)
Each class has a new() method (its constructor) where we describe its details. Classes can have further methods, such as JSON coercion, a SQL builder, a print method, and plot functions. This would be quite nice. I am conscious of not overlapping too much with CirceR.
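As a rough sketch of what "a class with its own methods" could look like, here is a small R6 class with a constructor, a list coercion that could feed a JSON serializer, and a print method. It assumes the R6 package is installed. The class name EraSketch, the toList() method and the output field names are hypothetical; they are not Capr or CirceR API.

```r
library(R6)  # assumption: the R6 package is installed

# Hypothetical class for illustration only; not part of Capr or CirceR.
EraSketch <- R6Class("EraSketch",
  public = list(
    eraDays = NULL,

    # Constructor: runs when EraSketch$new(...) is called
    initialize = function(eraDays = 0L) {
      stopifnot(is.numeric(eraDays), length(eraDays) == 1, eraDays >= 0)
      self$eraDays <- as.integer(eraDays)
    },

    # Coerce to a plain list, ready to hand to a JSON serializer
    toList = function() {
      list(eraDays = self$eraDays)
    },

    # Custom print method, picked up automatically when the object is printed
    print = function(...) {
      cat("<EraSketch> eraDays:", self$eraDays, "\n")
      invisible(self)
    }
  )
)

era30 <- EraSketch$new(eraDays = 30L)
era30$print()
str(era30$toList())
```

Validation, coercion and display all travel with the object, which is the encapsulation argument for R6. The trade-off is that calls use $ rather than plain functions.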
Thoughts on R6
My main hesitation with R6 is that it removes the "feel" of R. R works best in S3, when you take advantage of its "pipe-ability" and functional programming attributes. Forcing R code into a pure OOP system may turn away tidyverse programmers trying to enter the OHDSI software space. I think there is a legitimate fear here given the design of the DARWIN software, which is quite tidyverse-heavy. Not that it's any sort of competition.
Having said this, I am beginning to realize the benefits of using R6, particularly if we begin to think about complex objects (circe definitions) and pipelines (Strategus modules). Having strictly encapsulated objects makes it easier to run a complex routine across a network.
This post has gone on way too long (and I have accidentally deleted it twice), but maybe it starts a conversation about how to think about the HADES codebase as it becomes more and more complex :)