
🎯 :closed_book: Targeted Learning in R: A Causal Data Science Handbook

Home Page: https://tlverse.org/tlverse-handbook

License: Creative Commons Attribution 4.0 International

Makefile 0.14% CSS 0.21% TeX 50.17% Shell 0.09% R 49.18% HTML 0.21%

targeted-learning causal-inference machine-learning tlverse statistics biostatistics causal-data-science causal-machine-learning data-science

tlverse-handbook's Introduction

The `tlverse` handbook

Welcome to the GitHub repository for drafts of Targeted Learning in R: A Causal Data Science Handbook, by Mark van der Laan, Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, and Alan Hubbard. This draft is a work in progress, and updates are made available frequently. The draft has been made publicly available to solicit feedback (and errata) transparently from the interested community. The book is built using RStudio's bookdown R package; see bookdown.org for more information on how to use bookdown. The online book is automatically built and deployed via GitHub Actions and served at https://tlverse.org/tlverse-handbook by GitHub Pages.

tlverse-handbook's Issues

re-styling boxed information

See https://bookdown.org/yihui/rmarkdown-cookbook/custom-blocks.html

Website no longer accessible?

Hi,

As the book is listed on bookdown.org as an external book and we check regularly, I noticed that the URL https://tlverse.org/tlverse-handbook is no longer reachable.

Just sharing so you are aware.

Question about chapter 8.9 Variable Importance Analysis with OIT

Hello @imalenica,

I am new to tlverse and to dynamic and optimal individualized treatment regimes, and I have a question about variable importance analysis with OIT. In particular, how should I interpret the list of mean outcomes under the optimal individualized treatment (i.e. vim_results) when I modify your example to include two categorical covariates in the dataset?

Here is the code for the two categorical variables:

```r
# two categorical variables to compare the importance
data$W1 <- ifelse(data$W1 < quantile(data$W1)[2], 1,
  ifelse(data$W1 < quantile(data$W1)[3], 2, 3)
)
data$W2 <- ifelse(data$W1 < quantile(data$W1)[3], 1,
  ifelse(data$W1 < quantile(data$W1)[1], 2, 3)
)

node_list <- list(
  W = c("W3", "W4"),
  A = c("W1", "W2", "A"),
  Y = "Y"
)

tmle_spec_vim <- tmle3_mopttx_vim(
  V = c("W3", "W4"),
  type = "blip2",
  learners = learner_list,
  maximize = FALSE,
  method = "SL",
  complex = TRUE,
  realistic = FALSE,
  contrast = "linear"
)

vim_results2 <- tmle3_vim(tmle_spec_vim, data, node_list, learner_list,
  adjust_for_other_A = TRUE
)

View(vim_results2)
```

I get the following values in the list vim_results2:
with A = W2, psi_transformed = -0.25093618
with A = W1, psi_transformed = 0.08352115
with A = A, psi_transformed = 0.01225883

Does this mean that W2 is more important than W1? If so, why?
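
One detail in the recoding step above may be worth checking before interpreting the estimates: as posted, the second line builds W2 from the already-recoded W1 rather than from W2 itself. If the intent was to bin each covariate on its own quantiles, a small helper along these lines could be used. This is only a sketch on simulated data; `bin3` and the toy data frame are hypothetical and are not part of the handbook or of tmle3mopttx.

```r
# Hypothetical helper: three bins split at the 25% and 50% quantiles,
# mirroring the cut points used for W1 in the post above.
bin3 <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5))
  # right = FALSE gives the intervals [-Inf, q25), [q25, q50), [q50, Inf)
  cut(x, breaks = c(-Inf, q, Inf), right = FALSE, labels = FALSE)
}

set.seed(42)
toy <- data.frame(W1 = rnorm(200), W2 = rnorm(200)) # simulated stand-in data

# Reference result from the nested ifelse() used for W1 in the post
w1_ifelse <- ifelse(toy$W1 < quantile(toy$W1)[2], 1,
  ifelse(toy$W1 < quantile(toy$W1)[3], 2, 3)
)

toy$W1 <- bin3(toy$W1)
toy$W2 <- bin3(toy$W2) # binned from W2's own quantiles, not from W1

table(toy$W1)
table(toy$W2)
```

Binning each column before overwriting any of them avoids accidentally deriving one categorical variable from another.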

Examples

Suggestion for extra contents in the TMLE chapter

Hello,

First of all, thank you for all your work, it is really an outstanding contribution that helps applied researchers to better understand how to implement TMLE in practice.

I would like to suggest two ideas of extensions for the TMLE chapter that may help me (and probably others) to apply TMLE in different contexts:

Applying TMLE with multiple treatments: I am referring here to Professor van der Laan's answer on his blog (https://vanderlaan-lab.org/2020/12/03/tmle-for-multi-level-treatments-and-methods-for-sensitivity-analysis/). I felt I understood how to implement it in practice, but an example would help to avoid unintended mistakes.
Applying TMLE to (matched) case-control studies: Similarly, I went through Professor Sherri's Ph.D. thesis and some articles, but a specific example would help to make things clearer.

Once again thanks a lot for your packages/book!

chapter 10, issue concerning the auxiliary covariate for the estimation of the NDE

Many thanks for all your work.

I would like to point out an issue concerning the re-parametrization used in the auxiliary covariate C_Y(Q_Z, g)(O) in Chapter 10.

There is a discrepancy between the re-parametrization in Chapter 10
p(A=0|Z,W)/p(A=1|Z,W) g(0|W)/g(1|W)
and that in the cited references,
p(A=0|Z,W)/p(A=1|Z,W) g(1|W)/g(0|W) (in Zheng and van der Laan 2012 for example)

I hope this will be useful
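
For reference, the two forms above can be checked with Bayes' rule. The sketch below assumes that the quantity being re-parametrized is the mediator density ratio p(Z|A=0,W)/p(Z|A=1,W); that assumption is not stated in the post itself. Under it, the factor involving g comes out as g(1|W)/g(0|W), which agrees with the form quoted from the cited references.

```latex
\begin{align*}
\frac{p(Z \mid A = 0, W)}{p(Z \mid A = 1, W)}
  &= \frac{p(A = 0 \mid Z, W)\, p(Z \mid W) / g(0 \mid W)}
          {p(A = 1 \mid Z, W)\, p(Z \mid W) / g(1 \mid W)} \\
  &= \frac{p(A = 0 \mid Z, W)}{p(A = 1 \mid Z, W)}
     \cdot \frac{g(1 \mid W)}{g(0 \mid W)}
\end{align*}
```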

Broken PDF link

The link (https://tlverse.org/tlverse-handbook/handbook.pdf) in the "Download" button is broken. We should either remove the button or fix the link.

Unrendered equation in chapter 4

Hi,

Thanks for this wonderful handbook!

In chapter 4 (see link to line in the Rmd file below), the closing bracket for \psi{} is missing:

tlverse-handbook/04-roadmap.Rmd

Line 575 in 7001319

\psi_{\text{ATE} &= \E_0(Y(1) - Y(0)) \\ \nonumber

Screenshot below of this link: https://tlverse.org/tlverse-handbook/roadmap.html#missingness

deploying via GitHub Actions

There's been a surge of migration from deploying bookdown via Travis to deploying via GitHub Actions, which has also resulted in a collapse in support for using Travis to this end (note: this change is largely due to management decisions at Travis). To limit future overhead, we should consider moving to GitHub Actions, which is supported by a combination of the tic package, the usethis package, and ROpenSci's documentation.

Travis package installs not working

The package installation lines specified in .travis.yml are being ignored. This is easily verified by noting that the section "Installing package dependencies" is conspicuously absent from recent (any?) build logs.

Error message in 6.3.4 Counterfactual predictions

Hi, thanks very much for this really wonderful resource.

I've been working through Ch.6 and have been getting the following error when attempting to predict counterfactual outcome values in section 6.3.4:

Error in private$.predict(task) : object 'preds' not found

Not sure if I'm doing something wrong here, but I noticed that another user appeared to be trying a workaround (see here, where the authors initially subset on treatment level before making predictions, which isn't right for their purpose and suggests to me that they hit the same issue).

Thanks very much for reading and any help would be hugely appreciated!

`tmle3mopttx` with missing baseline

It seems that the version of tmle3mopttx tagged in the DESCRIPTION file (specifically, this version) doesn't fully support working with missing data, as identified in this Travis build. This only affects the last example given in the chapter, which is turned off in this PR in order to restore the automated builds. Not urgent to fix, but just recording here for bookkeeping.

Error in Chapter 9.7.4 Example with the WASH Benefits Data

Hi tlverse Team,

Awesome work on this documentation. I like the many examples. However, when trying to run the example in Chapter 9.7.4, I get the following error:

Error in tmle3::point_tx_likelihood(tmle_task, learner_list) :
W is subject to censoring, this isn't supported yet

I played around and realised that the tmle3 fit works fine as long as the confounder "momheight" is excluded or only the first 1,000 rows of the data are used. So I assume the problem lies in the data itself?
Help with fixing this would be very much appreciated.

R version: 4.3.1
R Studio version: 2023.06.0+421
on MacOS 13.5
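
The error message above mentions censoring in W, which suggests (though the post does not confirm it) that some baseline covariate contains missing values. A quick way to check is to count `NA`s per covariate, as in this minimal base-R sketch. The toy data frame and its values are invented for illustration; only the column name `momheight` comes from the post.

```r
# Toy stand-in for the real data; all values here are made up.
toy <- data.frame(
  momage = c(25, 31, 28, 22, 35),
  momheight = c(150.2, 149.0, 148.7, NA, NA),
  tr = c(1, 0, 1, 0, 1),
  whz = c(-0.3, 0.1, -1.2, 0.4, -0.8)
)
node_list <- list(W = c("momage", "momheight"), A = "tr", Y = "whz")

# Count missing values in each baseline covariate
na_counts <- colSums(is.na(toy[, node_list$W, drop = FALSE]))
print(na_counts[na_counts > 0])

# The same check restricted to the first rows can explain why a subset runs
head_na <- colSums(is.na(head(toy, 3)[, node_list$W, drop = FALSE]))
print(head_na)

# If missing covariates are the cause, the handbook's TMLE chapter handles
# them with tmle3::process_missing() before fitting. Shown as a comment only,
# since it needs tmle3 and the real data:
# processed <- tmle3::process_missing(washb_data, node_list)
# washb_data <- processed$data
# node_list <- processed$node_list
```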

Refer to `create_github_token()` instead of `browse_github_pat()`

tlverse-handbook/02-tlverse.Rmd

Line 268 in 48a232a

- Use `usethis::browse_github_pat()` to create a Personal Access Token.
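
A possible replacement for that step is sketched below. It assumes a recent usethis release in which `create_github_token()` is available, and it pairs it with `gitcreds::gitcreds_set()` to store the token. The wrapper name `setup_github_pat` is hypothetical, and both calls are interactive, so the function is defined here but not run.

```r
# Hypothetical wrapper around the two interactive steps.
setup_github_pat <- function() {
  if (!requireNamespace("usethis", quietly = TRUE) ||
    !requireNamespace("gitcreds", quietly = TRUE)) {
    stop("Please install the 'usethis' and 'gitcreds' packages first.")
  }
  # Opens the GitHub "new personal access token" page in a browser
  usethis::create_github_token()
  # Prompts for the token and saves it in the git credential store
  gitcreds::gitcreds_set()
}

# Run interactively:
# setup_github_pat()
```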

CATE estimation

Hello,
I am new to the tmle3 package and have a question: is it possible to estimate the conditional average treatment effect (CATE) conditional on a covariate X? If so, could you please share an example?

Best regards,

Juan
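
To make the target of the question concrete, here is a toy illustration of what a CATE by a binary covariate X looks like. It is not TMLE and does not use tmle3: it fits a plain linear regression with a treatment-by-covariate interaction on simulated, randomized data, and every name and number in it is made up for the example.

```r
set.seed(1)
n <- 5000
X <- rbinom(n, 1, 0.5) # binary effect modifier
A <- rbinom(n, 1, 0.5) # randomized binary treatment
# True CATE is 0.5 when X = 0 and 1.5 when X = 1
Y <- 1 + A * (0.5 + 1.0 * X) + rnorm(n)

fit <- lm(Y ~ A * X)
b <- coef(fit)

# CATE(x) = E[Y | A = 1, X = x] - E[Y | A = 0, X = x]
cate <- c(
  X0 = unname(b["A"]),
  X1 = unname(b["A"] + b["A:X"])
)
print(round(cate, 2))
```

With observational data, confounding adjustment would be needed on top of this, which is where a targeted estimator applied within strata of X becomes relevant.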

Travis deploy

Steps to auto-deploy book to the gh-pages branch via Travis:

The user named in https://github.com/tlverse/tlverse-handbook/blob/master/_deploy.sh#L8-L9 (currently me) has to set up a GitHub PAT with repo scope.
The above need not be done manually; instead, use the travis R package to generate such a PAT via travis::travis_set_pat(). Alternatively, one can manually add a GITHUB_PAT environment variable in the "settings" tab of the repository's page on Travis after creating a personal access token on GitHub.
Once the above is done, Travis will have the necessary permissions to push directly to the gh-pages branch on a successful build via the _deploy.sh script.

Note: the repo must be public for this process to work.

PDF TO-DOs

Sectioning
R code chunks: size, color, etc.
Reformat table + figure printing with options in _common.R, using knitr::is_latex_output()
Adding callouts to important topics in sections
Find and fix graphics that need to be higher resolution
New chapters start on new page (via \clearpage)
Consistent color palette throughout
R code output outside page
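
For the table and figure item in the list above, one way to branch on the output format is sketched here. The helper names `table_format` and `print_table` are hypothetical and the actual contents of `_common.R` are not shown in this issue; only `knitr::is_latex_output()` and `knitr::kable()` are real knitr functions.

```r
# Hypothetical helpers that could live in _common.R.
table_format <- function() {
  latex <- requireNamespace("knitr", quietly = TRUE) &&
    knitr::is_latex_output()
  if (latex) "latex" else "html"
}

print_table <- function(df, caption = NULL) {
  if (table_format() == "latex") {
    # booktabs rules usually look cleaner in the PDF
    knitr::kable(df, format = "latex", booktabs = TRUE, caption = caption)
  } else {
    knitr::kable(df, format = "html", caption = caption)
  }
}

# Example call inside a chunk:
# print_table(head(mtcars), caption = "First rows of mtcars")
```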

HTML silenced in PDF output

We'll eventually have to remove HTML output to compile a working PDF with all information included. For now, the fix below works but silences output:

Error: Functions that produce HTML output found in document targeting latex output.
Please change the output type of this document to HTML. Alternatively, you can allow
HTML output in non-HTML formats by adding this option to the YAML front-matter of
your rmarkdown file:

  always_allow_html: true

Note however that the HTML output will not be visible in non-HTML formats.
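
An alternative to silencing the output is to give each HTML-only chunk a static fallback for the PDF. The sketch below assumes the chunk can choose between two objects at knit time; `html_or_fallback` is a hypothetical helper, and `knitr::is_html_output()` is the only knitr call it relies on.

```r
# Hypothetical helper: return the HTML-only object when knitting to HTML,
# otherwise return a static fallback that LaTeX can render.
html_or_fallback <- function(html_obj, fallback) {
  knitting_html <- requireNamespace("knitr", quietly = TRUE) &&
    knitr::is_html_output()
  if (knitting_html) html_obj else fallback
}

# Example call with placeholder values
result <- html_or_fallback("interactive widget", "static summary table")
print(result)
```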

tmle3mopttx exercise problem -- simplified rule

Hi,

I am attempting to work through the exercise at the end of this chapter. I can get a TMLE value for the optimal ITR, and also for the realistic optimal ITR, but when I attempt to get the simplified rule, xgboost appears to throw errors.

#a simpler rule
tmle_spec2 <- tmle3_mopttx_blip_revere(
V = c("momedu", "floor", "asset_refrig"), type = "blip2",
learners = learner_list,
maximize = TRUE, complex = FALSE, realistic = FALSE
)

fit <- tmle3(tmle_spec2, data = washb_data, node_list, learner_list)

Error in xgboost::xgb.DMatrix(Xmat) :
xgb.DMatrix does not support construction from double
Error in if (nrow(Xmat) > 0) { : argument is of length zero
Failed on predict
Error in self$compute_step() :
Error in if (nrow(Xmat) > 0) { : argument is of length zero

I can provide all of my code if that is helpful.
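
Both messages in the log above are consistent with a one-column covariate matrix having been dropped to a plain numeric vector somewhere before it reaches xgboost. That is only a guess about the cause, not something the post confirms. The base-R behaviour itself can be reproduced with the sketch below; the matrix is invented and only borrows two column names from the post.

```r
# A small numeric matrix standing in for a covariate matrix
X <- matrix(as.numeric(1:10),
  nrow = 5,
  dimnames = list(NULL, c("momedu", "floor"))
)

# Selecting one column drops the matrix to a plain double vector...
one_col_vec <- X[, "momedu"]
# ...unless drop = FALSE is used
one_col_mat <- X[, "momedu", drop = FALSE]

print(is.matrix(one_col_vec))
print(is.matrix(one_col_mat))

# nrow() of a vector is NULL, so a test like `nrow(x) > 0` fails inside if()
msg <- tryCatch(
  if (nrow(one_col_vec) > 0) "has rows",
  error = function(e) conditionMessage(e)
)
print(msg)
```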

Error in installing the package "tlverse"

When I tried to install the package "tlverse" following the guide in the handbook, I got the following error:

devtools::install_github("tlverse/tlverse")
Using github PAT from envvar GITHUB_PAT. Use gitcreds::gitcreds_set() and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
Downloading GitHub repo tlverse/tlverse@HEAD
Downloading GitHub repo tlverse/sl3@HEAD
Skipping 1 packages not available: imputeMissings
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/2w/q3wdqmw51qv5hnlzh8v38yzm0000gn/T/Rtmpf07HCd/remotes1548c2debaab5/tlverse-sl3-6544257/DESCRIPTION’ ...
─ preparing ‘sl3’:
✔ checking DESCRIPTION meta-information ...
─ installing the package to process help pages
-----------------------------------
ERROR: dependency ‘imputeMissings’ is not available for package ‘sl3’
─ removing ‘/private/var/folders/2w/q3wdqmw51qv5hnlzh8v38yzm0000gn/T/RtmpZwcMYJ/Rinst158946ac27ba7/sl3’
-----------------------------------
ERROR: package installation failed
Error: Failed to install 'tlverse' from GitHub:
Failed to install 'sl3' from GitHub:
! System command 'R' failed

The core problem appears to be that the package "imputeMissings" has been removed from CRAN. As a result, the package "sl3", which depends on it, cannot be installed.
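
A possible workaround, assuming the package is still in the CRAN archive, is to install the archived release first and then retry the tlverse installation. The wrapper name below is hypothetical. The sketch relies on `remotes::install_version()`, which, when no version is given, is expected to fall back to the most recent archived release. The function is defined but not run here because it needs network access.

```r
# Hypothetical wrapper: install a package that is no longer on CRAN proper
# from the CRAN archive, using the remotes package.
install_archived_dependency <- function(pkg = "imputeMissings") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("Please install the 'remotes' package first.")
  }
  remotes::install_version(pkg)
}

# Run interactively, then retry the original command:
# install_archived_dependency()
# devtools::install_github("tlverse/tlverse")
```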

TMLE chapter, exercise 2 (stratified TMLE) with IST data doesn't work

Error has been replicated by multiple people, with updated packages used.

```r
library(data.table) # fread()
library(dplyr) # %>% and mutate()
library(sl3)
library(tmle3)

## ----tmle3-ex2----------------------------------------------------------------
ist_data <- fread(
  paste0(
    "https://raw.githubusercontent.com/tlverse/deming2019-workshop/",
    "master/data/ist_sample.csv"
  )
)

ist <- ist_data %>% mutate(REGION = as.factor(REGION))

## ----tmle3-node-list----------------------------------------------------------

node_list <- list(
  W = c(
    "RDELAY", "RCONSC", "SEX", "AGE",
    "RSLEEP", "RATRIAL", "RCT", "RVISINF",
    "RHEP24", "RASP3", "RSBP", "RDEF1",
    "RDEF2", "RDEF3", "RDEF4", "RDEF5",
    "RDEF6", "RDEF7", "RDEF8", "STYPE",
    "RXHEP", "REGION", "MISSING_RATRIAL_RASP3", "MISSING_RHEP24"
  ),
  A = "RXASP",
  Y = "DRSISC"
)

## ----tmle3-ate-spec-----------------------------------------------------------

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

## ----tmle3-learner-list-------------------------------------------------------

lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmfast <- make_learner(Lrnr_glm_fast)

# define metalearner appropriate to data types

metalearner <- make_learner(
  Lrnr_solnp,
  loss_function = loss_loglik_binomial,
  learner_function = metalearner_logistic_binomial
)

sl_Y <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glmfast),
  metalearner = metalearner
)
sl_A <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glmfast),
  metalearner = metalearner
)

sl_Delta <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glmfast),
  metalearner = metalearner
)

learner_list <- list(A = sl_A, delta_Y = sl_Delta, Y = sl_Y)

## ----tmle3-spec-fit-----------------------------------------------------------

tmle_fit <- tmle3(ate_spec, ist, node_list, learner_list)
print(tmle_fit)

## ----tmle3-spec-summary-------------------------------------------------------

# move REGION from the adjustment set W to the stratification node V
node2 <- node_list
node2$V <- "REGION"
node2$W <- setdiff(node_list$W, node2$V)

ist2 <- ist

tmle_spec <- tmle_stratified(ate_spec)
stratified_fit <- tmle3(tmle_spec, ist2, node2, learner_list)
```

ERROR(S):
stratified_fit <- tmle3(tmle_spec, ist2, node2, learner_list)
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 44 of j is 47 which is outside the column number range [1,ncol=46]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 44 of j is 47 which is outside the column number range [1,ncol=46]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 44 of j is 47 which is outside the column number range [1,ncol=46]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=42]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 41 of j is 44 which is outside the column number range [1,ncol=43]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 41 of j is 44 which is outside the column number range [1,ncol=43]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 41 of j is 44 which is outside the column number range [1,ncol=43]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 39 of j is 42 which is outside the column number range [1,ncol=40]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=41]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=41]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=41]
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 36 of j is 37 which is outside the column number range [1,ncol=36]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 37 of j is 38 which is outside the column number range [1,ncol=37]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 37 of j is 38 which is outside the column number range [1,ncol=37]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 37 of j is 38 which is outside the column number range [1,ncol=37]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in self$subset_covariates(task) :
Task missing the following covariates expected by Lrnr_solnp_TRUE_TRUE_FALSE_1e-05: Lrnr_glm_fast_TRUE_Cholesky
Failed on predict
Error in self$compute_step() : Error in self$subset_covariates(task) :
Task missing the following covariates expected by Lrnr_solnp_TRUE_TRUE_FALSE_1e-05: Lrnr_glm_fast_TRUE_Cholesky

Minor post-update break

It looks like the PR at #3 is broken due to an update in the tmle3mopttx package following initial completion of the last draft: https://travis-ci.org/tlverse/tlverse-handbook/builds/555540233#L7033. It looks like the chunk that's broken is at https://github.com/tlverse/tlverse-handbook/blob/master/06-tmle3mopttx.Rmd#L651. Not sure how/what the fix is but just an FYI @podTockom.

Formatting for CRC

See https://yihui.org/en/2018/08/bookdown-crc/