tlverse / tlverse-handbook

Targeted Learning in R: A Causal Data Science Handbook

Home Page: https://tlverse.org/tlverse-handbook

License: Creative Commons Attribution 4.0 International


tlverse-handbook's Introduction

The tlverse handbook Booklet

Welcome to the GitHub repository for drafts of Targeted Learning in R: A Causal Data Science Handbook, by Mark van der Laan, Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, and Alan Hubbard. This draft is a work in progress, and updates are made available frequently. The draft is publicly available so that the interested community can provide feedback (and errata) transparently. The book is built using RStudio's bookdown R package; see bookdown.org for more information on how to use bookdown. The online book is automatically built and deployed via GitHub Actions and served at https://tlverse.org/tlverse-handbook by GitHub Pages.

tlverse-handbook's People

Contributors

imalenica, jeremyrcoyle, nhejazi, rachaelvp


tlverse-handbook's Issues

Question about chapter 8.9 Variable Importance Analysis with OIT

Hello @imalenica,

I am new to tlverse and to dynamic and optimal individualized treatment regimes, and I have a question about Variable Importance Analysis with OIT. In particular, I would like to know how to interpret the list of mean outcomes under the optimal individualized treatment (i.e. vim_results) when I modify your example to include two categorical covariates in the dataset.

Here is the code for the two categorical variables:

`
# two categorical variables to compare the importance
data$W1 <- ifelse(data$W1 < quantile(data$W1)[2], 1,
  ifelse(data$W1 < quantile(data$W1)[3], 2, 3)
)
data$W2 <- ifelse(data$W1 < quantile(data$W1)[3], 1,
  ifelse(data$W1 < quantile(data$W1)[1], 2, 3)
)

node_list <- list(
  W = c("W3", "W4"),
  A = c("W1", "W2", "A"),
  Y = "Y"
)

tmle_spec_vim <- tmle3_mopttx_vim(
  V = c("W3", "W4"),
  type = "blip2",
  learners = learner_list,
  maximize = FALSE,
  method = "SL",
  complex = TRUE,
  realistic = FALSE,
  contrast = "linear"
)

vim_results2 <- tmle3_vim(tmle_spec_vim, data, node_list, learner_list,
  adjust_for_other_A = TRUE
)

View(vim_results2)
`

I get the following values in the list vim_results2:
with A = W2, psi_transformed = -0.25093618
with A = W1, psi_transformed = 0.08352115
with A = A, psi_transformed = 0.01225883

Does this mean that W2 is more important than W1? If so, why?
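For reference, the three-level recoding in the snippet above can also be written with `cut()`, computing each variable's cut points once from that variable's own original values. The following is a minimal sketch on simulated data, not the handbook's dataset; `to_three_levels` and `toy` are hypothetical names. Note that the posted snippet derives `W2` from the already-recoded `W1`; the sketch assumes the intent was to bin `W2` by its own quantiles.

```r
# Hypothetical helper: bin a numeric vector into 3 levels at its own
# 25% and 50% quantiles (the same cut points as quantile(x)[2] and [3]).
to_three_levels <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.50), names = FALSE)
  # right = FALSE gives intervals [a, b), matching the strict "<" comparisons
  as.integer(cut(x, breaks = c(-Inf, q, Inf), right = FALSE))
}

set.seed(1)
toy <- data.frame(W1 = rnorm(200), W2 = rnorm(200)) # simulated stand-in data

# Compute the categories from the ORIGINAL columns, stored under new names,
# so that no variable is overwritten before it is used.
toy$W1_cat <- to_three_levels(toy$W1)
toy$W2_cat <- to_three_levels(toy$W2)

table(toy$W1_cat)
table(toy$W2_cat)
```

Storing the categories in new columns keeps the continuous values available, which makes it easier to check that each categorical variable was built from the intended source.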

Suggestion for extra contents in the TMLE chapter

Hello,

First of all, thank you for all your work. It is an outstanding contribution that helps applied researchers better understand how to implement TMLE in practice.

I would like to suggest two possible extensions to the TMLE chapter that may help me (and probably others) apply TMLE in different contexts:

Once again thanks a lot for your packages/book!

chapter 10, issue concerning the auxiliary covariate for the estimation of the NDE

Many thanks for all your work.

I would like to point out an issue concerning the re-parametrization used in the auxiliary covariate C_Y(Q_Z, g)(O) in Chapter 10.

There is a discrepancy between the re-parametrization in Chapter 10
p(A=0|Z,W)/p(A=1|Z,W) g(0|W)/g(1|W)
and that in the cited references,
p(A=0|Z,W)/p(A=1|Z,W) g(1|W)/g(0|W) (in Zheng and van der Laan 2012 for example)
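
The two forms can be checked against Bayes' rule. Assuming the re-parametrization is meant to re-express the mediator density ratio p(Z|A=0,W)/p(Z|A=1,W) in terms of treatment mechanisms (an assumption about the chapter's intent, not a quotation from it), and writing g(a|W) = p(A=a|W), the ratio works out as:

```latex
\begin{align*}
\frac{p(Z \mid A=0, W)}{p(Z \mid A=1, W)}
  &= \frac{p(A=0 \mid Z, W)\, p(Z \mid W) \,/\, g(0 \mid W)}
          {p(A=1 \mid Z, W)\, p(Z \mid W) \,/\, g(1 \mid W)} \\
  &= \frac{p(A=0 \mid Z, W)}{p(A=1 \mid Z, W)}
     \cdot \frac{g(1 \mid W)}{g(0 \mid W)}
\end{align*}
```

Under that assumption the second factor is g(1|W)/g(0|W), which agrees with the form attributed above to the cited reference.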

I hope this will be useful

Travis package installs not working

The package-installation lines specified in .travis.yml are being ignored. This is easy to verify: the section "Installing package dependencies" is conspicuously absent from recent (any?) build logs.

Error message in 6.3.4 Counterfactual predictions

Hi, thanks very much for this really wonderful resource.

I've been working through Ch.6 and have been getting the following error when attempting to predict counterfactual outcome values in section 6.3.4:

Error in private$.predict(task) : object 'preds' not found

Not sure if I'm doing something wrong here, but I noticed that another user appeared to be trying a workaround (see here, where the authors initially subset on treatment level before making predictions, which isn't right for their purpose and suggests to me that they hit the same issue!).

Thanks very much for reading and any help would be hugely appreciated!

JO
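
For context, counterfactual prediction generally means fitting the outcome model on all observations and then predicting on copies of the data in which the treatment column is set to one fixed level for everyone, rather than subsetting rows by observed treatment. Below is a minimal sketch of that idea in base R, using `glm` on simulated data in place of the handbook's sl3 learners. The variable names `W`, `A`, `Y` and the data-generating model are assumptions made for illustration only.

```r
set.seed(42)
n <- 500
W <- rnorm(n)                            # a single simulated covariate
A <- rbinom(n, 1, plogis(0.5 * W))       # binary treatment
Y <- rbinom(n, 1, plogis(-0.5 + A + W))  # binary outcome
dat <- data.frame(W = W, A = A, Y = Y)

# Fit the outcome regression on ALL rows, not on a treatment subset
fit <- glm(Y ~ A + W, data = dat, family = binomial())

# Counterfactual copies: same covariates, treatment fixed at 1 or at 0
dat_A1 <- transform(dat, A = 1)
dat_A0 <- transform(dat, A = 0)

# One predicted outcome probability per observation under each level
pred_A1 <- predict(fit, newdata = dat_A1, type = "response")
pred_A0 <- predict(fit, newdata = dat_A0, type = "response")

# Plug-in estimate of the average difference between the two levels
mean(pred_A1 - pred_A0)
```

Subsetting to treated rows before predicting would return predictions for only part of the sample, which is why it does not serve the same purpose.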

`tmle3mopttx` with missing baseline

It seems that the version of tmle3mopttx tagged in the DESCRIPTION file (specifically, this version) doesn't fully support working with missing data, as identified in this Travis build. This only affects the last example given in the chapter, which is turned off in this PR in order to restore the automated builds. Not urgent to fix, but just recording here for bookkeeping.

Error in Chapter 9.7.4 Example with the WASH Benefits Data

Hi tlverse Team,

Awesome work on this documentation; I like the many examples. However, when trying to run the example in Chapter 9.7.4, I get the following error:

Error in tmle3::point_tx_likelihood(tmle_task, learner_list) :
W is subject to censoring, this isn't supported yet

I played around and realised that the tmle3 fit works fine as long as the confounder "momheight" is excluded, or only the first 1,000 rows of the data are used. So I assume the problem lies in the data itself?
Help with fixing this would be very much appreciated.

R version: 4.3.1
RStudio version: 2023.06.0+421
on macOS 13.5
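
One way to investigate whether the data are the cause is to count missing values per covariate, since an error about a node being "subject to censoring" would be consistent with `NA`s in that column. That is an assumption about the trigger, not something confirmed in this report. The sketch below uses a small simulated data frame, not the WASH Benefits data; `toy` and its values are hypothetical. It also shows one common handling strategy, median imputation plus a missingness indicator.

```r
set.seed(7)
toy <- data.frame(
  momage    = round(rnorm(10, mean = 25, sd = 4)),
  momheight = round(rnorm(10, mean = 150, sd = 5))
)
toy$momheight[c(8, 10)] <- NA # inject missing values late in the data

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Row positions of the missing values; a subset of early rows may contain none
which(is.na(toy$momheight))

# Indicator recording where values were observed, then median imputation
toy$delta_momheight <- as.integer(!is.na(toy$momheight))
toy$momheight[is.na(toy$momheight)] <- median(toy$momheight, na.rm = TRUE)

anyNA(toy)
```

If the missing values all sit beyond some row, a fit on the rows before it would succeed while the full data would not, which matches the pattern described above.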

CATE estimation

Hello,
I am new to the tmle3 package and have a question: is it possible to estimate the conditional average treatment effect (CATE) given a covariate X? If so, could you please share an example?

Best regards,

Juan

Travis deploy

Steps to auto-deploy book to the gh-pages branch via Travis:

  • The user named in https://github.com/tlverse/tlverse-handbook/blob/master/_deploy.sh#L8-L9 (currently me) has to set up a GitHub PAT with repo scope.
  • The above need not be done manually; instead, use the travis R package to generate such a PAT via travis::travis_set_pat(). Alternatively, one can manually add a GITHUB_PAT environment variable in the "settings" tab of the repository's page on Travis after creating a personal access token on GitHub.
  • Once the above is done, Travis will have the necessary permissions to push directly to the gh-pages branch on a successful build via the _deploy.sh script.

Note: the repo must be public for this process to work.
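
Since the deploy step depends on the GITHUB_PAT environment variable, a quick check from R can confirm that it is visible before a build is attempted. This is a minimal sketch; `check_pat` is a hypothetical helper and is not part of the travis package or of the repository's scripts.

```r
# Hypothetical pre-deploy check: stop early if the token variable is unset
check_pat <- function(var = "GITHUB_PAT") {
  if (!nzchar(Sys.getenv(var, unset = ""))) {
    stop("Environment variable ", var, " is not set; deploy cannot push.")
  }
  invisible(TRUE)
}

# Report the status without aborting the session
pat_ok <- tryCatch(check_pat(), error = function(e) {
  message(conditionMessage(e))
  FALSE
})
pat_ok
```

The check only tests that the variable is non-empty; it does not verify that the token is valid or that it carries the repo scope.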

PDF TO-DOs

  • Sectioning
  • R code chunks: size, color, etc.
  • Reformat table + figure printing with options in _common.R, using knitr::is_latex_output()
  • Adding callouts to important topics in sections
  • Find and fix graphics that need to be higher resolution
  • New chapters start on new page (via \clearpage)
  • Consistent color palette throughout
  • R code output outside page
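
The table-formatting item could take roughly the following shape. This is a sketch only: `print_table` is a hypothetical helper, not something taken from the repository's _common.R. It relies on `knitr::is_latex_output()` to detect a PDF build and on `knitr::kable()` to render the table.

```r
library(knitr)

# Hypothetical helper: choose the table format from the current output target
print_table <- function(df, caption = NULL) {
  if (knitr::is_latex_output()) {
    # PDF build: LaTeX table with booktabs rules
    knitr::kable(df, format = "latex", booktabs = TRUE, caption = caption)
  } else {
    # HTML build (or interactive use): plain HTML table
    knitr::kable(df, format = "html", caption = caption)
  }
}

tab <- print_table(head(mtcars[, 1:3], 3), caption = "First rows of mtcars")
tab
```

Routing every table through one helper keeps the format decision in a single place, so chapters do not need their own output-type checks.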

HTML silenced in PDF output

We'll eventually have to remove HTML output to compile a working PDF with all information included. For now, the fix below works but silences output:

Error: Functions that produce HTML output found in document targeting latex output.
Please change the output type of this document to HTML. Alternatively, you can allow
HTML output in non-HTML formats by adding this option to the YAML front-matter of
your rmarkdown file:

  always_allow_html: true

Note however that the HTML output will not be visible in non-HTML formats.

tmle3mopttx exercise problem -- simplified rule

Hi,

I am attempting to work through the exercise at the end of this chapter. While I can get a TMLE value of the optimal ITR, and also of the realistic optimal ITR, when I attempt to get the simplified rule, xgboost appears to throw errors.

# a simpler rule
tmle_spec2 <- tmle3_mopttx_blip_revere(
  V = c("momedu", "floor", "asset_refrig"), type = "blip2",
  learners = learner_list,
  maximize = TRUE, complex = FALSE, realistic = FALSE
)

fit <- tmle3(tmle_spec2, data = washb_data, node_list, learner_list)

Error in xgboost::xgb.DMatrix(Xmat) :
xgb.DMatrix does not support construction from double
Error in if (nrow(Xmat) > 0) { : argument is of length zero
Failed on predict
Error in self$compute_step() :
Error in if (nrow(Xmat) > 0) { : argument is of length zero

I can provide all of my code if that is helpful.
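
Both messages are what one would see if a plain numeric vector, rather than a matrix, reached xgboost: `nrow()` of a vector is `NULL`, so a condition like `if (nrow(Xmat) > 0)` fails with "argument is of length zero". Whether that is what happens inside the package here is not established by this report. The base R sketch below only illustrates the mechanism, with `Xmat` and its column names used as stand-ins.

```r
Xmat <- matrix(as.numeric(1:6),
  nrow = 3,
  dimnames = list(NULL, c("momedu", "floor"))
)

# Selecting one column drops the matrix down to a plain double vector
x_vec <- Xmat[, "momedu"]
is.matrix(x_vec) # FALSE
nrow(x_vec)      # NULL

# The condition from the error message then has length zero
cond <- nrow(x_vec) > 0
length(cond)     # 0

# drop = FALSE keeps the two-dimensional structure
x_mat <- Xmat[, "momedu", drop = FALSE]
is.matrix(x_mat) # TRUE
dim(x_mat)       # 3 1
```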

Error in installing the package "tlverse"

When I tried to install the package "tlverse" following the guide in the handbook, I encountered the following error:

devtools::install_github("tlverse/tlverse")
Using github PAT from envvar GITHUB_PAT. Use gitcreds::gitcreds_set() and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
Downloading GitHub repo tlverse/tlverse@HEAD
Downloading GitHub repo tlverse/sl3@HEAD
Skipping 1 packages not available: imputeMissings
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/2w/q3wdqmw51qv5hnlzh8v38yzm0000gn/T/Rtmpf07HCd/remotes1548c2debaab5/tlverse-sl3-6544257/DESCRIPTION’ ...
─ preparing ‘sl3’:
✔ checking DESCRIPTION meta-information ...
─ installing the package to process help pages
-----------------------------------
ERROR: dependency ‘imputeMissings’ is not available for package ‘sl3’
─ removing ‘/private/var/folders/2w/q3wdqmw51qv5hnlzh8v38yzm0000gn/T/RtmpZwcMYJ/Rinst158946ac27ba7/sl3’
-----------------------------------
ERROR: package installation failed
Error: Failed to install 'tlverse' from GitHub:
Failed to install 'sl3' from GitHub:
! System command 'R' failed

The core problem appears to be that the package "imputeMissings" has been removed from CRAN. As a result, the package "sl3", which depends on it, cannot be installed.

TMLE chapter, exercise 2 (stratified TMLE) with IST data doesn't work

The error has been reproduced by multiple people using up-to-date packages.

`## ----tmle3-ex2----------------------------------------------------------------
library(data.table)
library(dplyr)
library(sl3)
library(tmle3)

ist_data <- fread(
  paste0(
    "https://raw.githubusercontent.com/tlverse/deming2019-workshop/",
    "master/data/ist_sample.csv"
  )
)

# REGION is used below as the stratification variable, so make it a factor
ist <- ist_data %>% mutate(REGION = as.factor(REGION))

## ----tmle3-node-list----------------------------------------------------------

node_list <- list(
  W = c(
    "RDELAY", "RCONSC", "SEX", "AGE",
    "RSLEEP", "RATRIAL", "RCT", "RVISINF",
    "RHEP24", "RASP3", "RSBP", "RDEF1",
    "RDEF2", "RDEF3", "RDEF4", "RDEF5",
    "RDEF6", "RDEF7", "RDEF8", "STYPE",
    "RXHEP", "REGION", "MISSING_RATRIAL_RASP3", "MISSING_RHEP24"
  ),
  A = "RXASP",
  Y = "DRSISC"
)

## ----tmle3-ate-spec-----------------------------------------------------------

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

## ----tmle3-learner-list-------------------------------------------------------

lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmfast <- make_learner(Lrnr_glm_fast)

# define metalearner appropriate to data types
metalearner <- make_learner(
  Lrnr_solnp,
  loss_function = loss_loglik_binomial,
  learner_function = metalearner_logistic_binomial
)

sl_Y <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glmfast),
  metalearner = metalearner
)
sl_A <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glmfast),
  metalearner = metalearner
)
sl_Delta <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glmfast),
  metalearner = metalearner
)

learner_list <- list(A = sl_A, delta_Y = sl_Delta, Y = sl_Y)

## ----tmle3-spec-fit-----------------------------------------------------------

tmle_fit <- tmle3(ate_spec, ist, node_list, learner_list)
print(tmle_fit)

## ----tmle3-spec-summary-------------------------------------------------------

# move REGION from the covariates W to the stratification node V
node2 <- node_list
node2$V <- "REGION"
node2$W <- setdiff(node_list$W, node2$V)

ist2 <- ist

tmle_spec <- tmle_stratified(ate_spec)
stratified_fit <- tmle3(tmle_spec, ist2, node2, learner_list)
`

ERROR(S):
stratified_fit <- tmle3(tmle_spec, ist2, node2, learner_list)
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 44 of j is 47 which is outside the column number range [1,ncol=46]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 44 of j is 47 which is outside the column number range [1,ncol=46]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 44 of j is 47 which is outside the column number range [1,ncol=46]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=42]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 41 of j is 44 which is outside the column number range [1,ncol=43]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 41 of j is 44 which is outside the column number range [1,ncol=43]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 41 of j is 44 which is outside the column number range [1,ncol=43]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 39 of j is 42 which is outside the column number range [1,ncol=40]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=41]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=41]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 40 of j is 43 which is outside the column number range [1,ncol=41]
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in ind_ref_mat[as.numeric(x), , drop = FALSE] :
incorrect number of dimensions
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 36 of j is 37 which is outside the column number range [1,ncol=36]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 37 of j is 38 which is outside the column number range [1,ncol=37]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 37 of j is 38 which is outside the column number range [1,ncol=37]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 37 of j is 38 which is outside the column number range [1,ncol=37]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in [.data.table(X, , which(!is.na(coef)), drop = FALSE, with = FALSE) :
Item 43 of j is 46 which is outside the column number range [1,ncol=45]
Error in self$subset_covariates(task) :
Task missing the following covariates expected by Lrnr_solnp_TRUE_TRUE_FALSE_1e-05: Lrnr_glm_fast_TRUE_Cholesky
Failed on predict
Error in self$compute_step() : Error in self$subset_covariates(task) :
Task missing the following covariates expected by Lrnr_solnp_TRUE_TRUE_FALSE_1e-05: Lrnr_glm_fast_TRUE_Cholesky
