🎯 Targeted Learning of the Causal Effects of Stochastic Interventions

Home Page: https://tlverse.org/tmle3shift

License: GNU General Public License v3.0

R/`tmle3shift`

Targeted Learning of the Causal Effects of Stochastic Interventions

Authors: Nima Hejazi, Jeremy Coyle, and Mark van der Laan

What’s `tmle3shift`?

tmle3shift is an adapter/extension R package in the tlverse ecosystem that supports estimation of a target parameter defined as the mean counterfactual outcome under a posited shift of the natural value of a continuous-valued intervention, using the formalism of stochastic treatment regimes. As an adapter package, tmle3shift builds upon the core tlverse grammar introduced by tmle3, a general framework that supports the implementation of a range of TMLE parameters through a unified interface. For a detailed description of the target parameter, TML estimator, and algorithm implemented in tmle3shift, see Díaz and van der Laan (2012) and Díaz and van der Laan (2018). For a general discussion of the framework of targeted minimum loss-based estimation and the role this methodology plays in statistical and causal inference, the canonical references are van der Laan and Rose (2011) and van der Laan and Rose (2018).
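To make the target parameter concrete, the following is a minimal base R sketch. It is not the TML estimator implemented in tmle3shift: it simulates data and computes a simple plug-in (substitution) estimate of the mean counterfactual outcome under an additive shift delta. The data-generating process, the shift value, and the linear outcome model are all assumptions chosen for illustration.

```r
# Simulated data (assumed): W is a baseline covariate, A a continuous
# exposure, and Y the outcome.
set.seed(4197)
n <- 5000
W <- rbinom(n, 1, 0.5)
A <- rnorm(n, mean = 2 * W, sd = 1)
Y <- 1 + 0.5 * A + W + rnorm(n)

# Posited additive shift of the natural exposure value (assumed value).
delta <- 0.5

# Outcome regression; correctly specified here only by construction.
q_fit <- lm(Y ~ A + W)

# Plug-in estimate: predict the outcome at A + delta, then average.
shifted <- data.frame(A = A + delta, W = W)
psi_plugin <- mean(predict(q_fit, newdata = shifted))

# True value in this simulation: 1 + 0.5 * (E[A] + delta) + E[W],
# with E[A] = 2 * 0.5 and E[W] = 0.5.
psi_true <- 1 + 0.5 * (2 * 0.5 + delta) + 0.5
```

Because the outcome model is linear in A, the plug-in estimate equals the observed mean outcome plus the fitted slope times delta. A TML estimator adds a targeting step that this sketch omits.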

Building on the original work surrounding the TML estimator for the aforementioned target parameter, tmle3shift additionally implements a set of techniques for variable importance analysis, allowing for a sequence of mean counterfactual outcomes, estimated under a sequence of posited shifts, to be summarized via a working marginal structural model (MSM). The goal of this work is to build upon the tlverse framework and the estimation methodology implemented for a single mean counterfactual outcome in order to introduce an end-to-end methodology for variable importance analyses.
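The idea of summarizing a sequence of shift-specific estimates with a working MSM can be sketched in base R as follows. This is an illustration under stated assumptions, not the tmle3shift procedure: the estimates are simple plug-in means from a simulated data set, the grid of shifts is arbitrary, and the working MSM is an unweighted linear model in delta.

```r
# Simulated data (assumed), with a linear outcome regression.
set.seed(11)
n <- 5000
W <- rbinom(n, 1, 0.5)
A <- rnorm(n, mean = 2 * W, sd = 1)
Y <- 1 + 0.5 * A + W + rnorm(n)
q_fit <- lm(Y ~ A + W)

# Grid of posited shifts (assumed values).
delta_grid <- seq(-1, 1, by = 0.5)

# One plug-in counterfactual mean per shift in the grid.
psi_grid <- vapply(delta_grid, function(delta) {
  mean(predict(q_fit, newdata = data.frame(A = A + delta, W = W)))
}, numeric(1))

# Working MSM: project the estimates onto a line in delta. The slope
# summarizes how the counterfactual mean changes per unit of shift.
msm_fit <- lm(psi_grid ~ delta_grid)
msm_coefs <- coef(msm_fit)
```

This unweighted least-squares projection carries no inference; it only illustrates how a single slope can summarize the trend across shifts.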

Installation

You can install the development version of tmle3shift from GitHub via remotes with

remotes::install_github("tlverse/tmle3shift")

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.

Citation

After using the tmle3shift R package, please cite the following:

    @software{hejazi2021tmle3shift-rpkg,
      author = {Hejazi, Nima S and Coyle, Jeremy R and {van der Laan}, Mark
        J},
      title = {{tmle3shift}: {Targeted Learning} of the Causal Effects of
        Stochastic Interventions},
      year = {2021},
      howpublished = {\url{https://github.com/tlverse/tmle3shift}},
      note = {{R} package version 0.2.0},
      url = {https://doi.org/10.5281/zenodo.4603372},
      doi = {10.5281/zenodo.4603372}
    }

Related

R/txshift - An R package providing an independent implementation of the same TML estimation procedure and statistical methodology made available here, without reliance on the tlverse grammar provided by tmle3.

Funding

The development of this software was supported in part through a grant from the National Institutes of Health: T32 LM012417-02.

License

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.

References

Díaz, Iván, and Mark J van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541–49.

———. 2018. “Stochastic Treatment Regimes.” In Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies, 167–80. Springer Science & Business Media.

van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.

———. 2018. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. Springer Science & Business Media.

tmle3shift's Issues

Variable importance for nominal variables with few categories

Theoretically, it should be sound to perform variable importance assessment based on a grid of counterfactual shift values with nominal variables; however, in practice, such variables (even when converted via as.numeric) have few unique values. This triggers a downstream bug: sl3's Variable_Type classifies them as categorical rather than continuous. The bug is non-trivial to track down and can be distressing to users. A simple but naive workaround is to add mean-zero noise to such variables so that they appear to have more than 20 or so unique values, which is sufficient to trick sl3 into recognizing the variable as continuous. For example, the variable u below has only 4 (ordered) categories and will be recognized as categorical:

n <- 10000
u_idx <- runif(n)
u <- rep(NA, n)
u[u_idx <= 0.1] <- "A"
u[u_idx > 0.1 & u_idx <= 0.3] <- "B"
u[u_idx > 0.3 & u_idx <= 0.95] <- "C"
u[u_idx > 0.95] <- "D"
u <- as.numeric(as.factor(u))

To have it recognized as continuous, one could implement

u <- u + runif(n, -0.001, 0.001)

which yields far more unique values than the original u while leaving each value unchanged in expectation, since the added noise has mean zero.
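A quick way to check the effect of the jitter is to count unique values before and after. The sketch below uses base R only and does not call sl3; the seed and the use of sample() to draw the four codes are assumptions made for brevity, and the "20 or so" cutoff is the figure quoted in this issue rather than a verified sl3 constant.

```r
set.seed(2718)
n <- 10000

# Four ordered categories coded 1 to 4, with the same probabilities as
# the snippet above (0.1, 0.2, 0.65, 0.05).
u <- as.numeric(sample(1:4, n, replace = TRUE,
                       prob = c(0.1, 0.2, 0.65, 0.05)))

# Mean-zero jitter, small relative to the gap between adjacent codes.
u_jittered <- u + runif(n, -0.001, 0.001)

n_unique_before <- length(unique(u))
n_unique_after <- length(unique(u_jittered))

# The jitter is small enough that rounding recovers the original codes.
recovered <- all(round(u_jittered) == u)
```

Keeping the noise much smaller than the spacing between codes means the original categories can still be recovered by rounding.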

warnings in tests

I ran the test suite with options(warn = 2) and it generated some failures:

── Error (test-bound.R:61:3): bounds are being respected in submodel ───────────────────────────
Error in `max(Q_submodel)`: (converted from warning) no non-missing arguments to max; returning -Inf
Backtrace:
    ▆
 1. └─testthat::expect_lte(max(Q_submodel), 1 - Q_bound_level) at test-bound.R:61:2
 2.   └─testthat::quasi_label(enquo(object), label, arg = "object")
── Error (test-marginal_structural.R:171:1): (code run outside of `test_that()`) ───────────────────────────
Error in `learner$predict_fold(learner_task, fold_number)`: (converted from warning) Lrnr_density_semiparametric_NULL_NULL is not cv-aware: self$predict_fold reverts to self$predict
Backtrace:
     ▆
  1. └─R6 fit_tmle3(tmle_task, targeted_likelihood, msm, updater) at test-marginal_structural.R:171:0
  2.   └─tmle3 initialize(...)
  3.     └─private$.tmle_fit(max_it)
  4.       └─self$updater$update(self$likelihood, self$tmle_task)
  5.         └─base::lapply(...)
  6.           └─tmle3 FUN(X[[i]], ...)
  7.             └─tmle_param$estimates(tmle_task, update_fold)
  8.               └─self$clever_covariates(tmle_task, fold_number)
  9.                 └─self$observed_likelihood$get_likelihoods(...)
 10.                   └─self$get_likelihood(tmle_task, nodes[[1]], fold_number)
 11.                     └─self$initial_likelihood$get_likelihood(tmle_task, node, fold_number)
 12.                       └─likelihood_factor$get_likelihood(tmle_task, fold_number)
 13.                         └─self$get_density(tmle_task, fold_number)
 14.                           └─learner$predict_fold(learner_task, fold_number)
── Error (test-stratified_intevention.R:61:1): (code run outside of `test_that()`) ───────────────────────────
Error in `ED * private$.targeted_components`: (converted from warning) longer object length is not a multiple of shorter object length
Backtrace:
    ▆
 1. └─R6 fit_tmle3(tmle_task, targeted_likelihood, tmle_param, updater) at test-stratified_intevention.R:61:0
 2.   └─tmle3 initialize(...)
 3.     └─private$.tmle_fit(max_it)
 4.       └─self$updater$update(self$likelihood, self$tmle_task)

It might be best to design the tests or re-write the underlying code so as not to produce these warnings.
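As one possible direction for the first failure, the sketch below shows a guarded maximum that avoids the "no non-missing arguments to max" warning on empty input, plus a small helper for checking whether an expression warns. Both safe_max and emits_warning are hypothetical names introduced for illustration; neither is part of tmle3shift or testthat.

```r
# Hypothetical helper: return NA instead of warning when the input is
# empty or contains only missing values.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) {
    return(NA_real_)
  }
  max(x)
}

# Hypothetical helper: TRUE if evaluating expr signals a warning.
# The argument is evaluated lazily inside tryCatch.
emits_warning <- function(expr) {
  tryCatch({
    expr
    FALSE
  }, warning = function(w) TRUE)
}

empty <- numeric(0)
raw_warns <- emits_warning(max(empty))
safe_warns <- emits_warning(safe_max(empty))
```

Whether returning NA is the right behavior depends on why Q_submodel is empty in the test; an empty input may itself indicate a bug worth fixing upstream.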

Implementing shift guards that always move intervention

There exists a proposal for a stochastic intervention that moves the natural value of the treatment as far as possible given the observed data. Implementing such a shift requires evaluating the ratio of the post-intervention treatment density to the empirical treatment density and identifying the maximum (in magnitude) shift for each stratum defined by the baseline covariates. An initial implementation is available on this branch, but its efficiency needs improvement.

Implementing original TMLE algorithm

It would be desirable to implement the original version of the algorithm for the TMLE of a counterfactual mean under a stochastic intervention, as described in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4117410/

Support censored outcomes

See https://github.com/tlverse/tmle3/blob/missing-outcome/tests/testthat/test-missing_outcome.R

Compatible with tmle3mediate?

Hello @nhejazi, I'm new to TMLE and attempting mediation analysis with continuous data. I came across your packages, tmle3mediate and tmle3shift, and believe they could be valuable for my work. However, I've encountered an issue with their support for mediators or continuous data, as they require generating a Spec term for each. It seems that I'm unable to include both Specs in a single estimation task. If this is correct, do you have any guidance on capturing causal effects in such scenarios?

Compute MSM parameters via delta method

The machinery built into tmle3 for the delta method should be useful in fleshing out tmle3_Spec_vimshift_delta to fit a joint TMLE over the shifted counterfactual means and then compute the MSM parameters from the individual TML estimates. This relies on updated implementations of

Target MSM parameters directly

To estimate the parameters of a working MSM, it's also possible to fit a TMLE to directly target the parameters of the specified model. An initial implementation has been begun in the following

Comparing two stochastic interventions

It would be useful to compare the mean counterfactual outcomes under two posited values of a stochastic intervention. This would be a fairly simple application of the delta method, similar to how the ATE is computed by applying Param_delta to two treatment-specific means.
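The delta-method calculation for such a contrast can be sketched in base R, independent of tmle3 and Param_delta. The point estimates and influence curve values below are simulated placeholders, not output from any estimator; in practice they would come from two fitted TMLEs on the same sample.

```r
set.seed(99)
n <- 2000

# Placeholder point estimates under two shifts (assumed values).
psi_a <- 2.25
psi_b <- 2.00

# Placeholder estimated influence curve values, correlated because both
# estimates would be computed from the same observations.
eif_a <- rnorm(n)
eif_b <- 0.8 * eif_a + rnorm(n, sd = 0.5)

# Delta method for f(psi_a, psi_b) = psi_a - psi_b: the gradient is
# (1, -1), so the influence curve of the contrast is eif_a - eif_b.
psi_diff <- psi_a - psi_b
eif_diff <- eif_a - eif_b
se_diff <- sqrt(var(eif_diff) / n)

# Wald-type 95% confidence interval for the contrast.
ci_diff <- psi_diff + c(-1, 1) * qnorm(0.975) * se_diff
```

Working with the difference of influence curves, rather than adding the two variances, accounts for the correlation between the two estimates.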

shift functions that respect bounds

A new shift-guard function on the add-shift-guard branch finds the bounds for the shifted treatment with respect to the baseline covariates W, as given in the relevant book chapter.
Two new shift functions, similar to shift_additive and shift_additive_inv, work in a similar way but also respect the bounds provided by shift-guard.
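A bounded additive shift of this kind might look like the base R sketch below. It is not the code on the add-shift-guard branch: the function names are hypothetical, W is assumed discrete, the bound is taken to be the largest observed treatment value within each stratum of W, and values that cannot be shifted within the bound are left unchanged. Only the forward shift is shown.

```r
# Hypothetical guard: upper bound of A within each stratum of a discrete
# W, taken here as the stratum-specific observed maximum (an assumption).
upper_bound_by_stratum <- function(A, W) {
  ave(A, W, FUN = max)
}

# Hypothetical bounded additive shift: move by delta only when the
# shifted value stays within the stratum's bound; otherwise keep A.
shift_additive_guarded <- function(A, W, delta) {
  upper <- upper_bound_by_stratum(A, W)
  ifelse(A + delta <= upper, A + delta, A)
}

# Simulated data (assumed): the support of A depends on W.
set.seed(5)
n <- 1000
W <- rbinom(n, 1, 0.5)
A <- runif(n, min = 0, max = 1 + W)

A_shifted <- shift_additive_guarded(A, W, delta = 0.25)
```

With this rule, observations near the top of their stratum's range keep their natural value, so the shifted treatment never leaves the observed support.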

Loose test against classic implementation

The txshift package provides an implementation of this same estimation procedure in the case of a simple shift (i.e., shift_additive) without reliance on tmle3 machinery. The independent procedure is used as the basis for this test, which has been loosened as of f83eabf. This test was previously passing under the (original) more stringent criteria as of 355c29f. It seems rather unlikely that a change in this package would have caused this test to break (since no commits near or after 355c29f altered shift_additive); instead, it's likely this was caused by updates to dependencies, which do not currently run reverse dependency checks. This should be further investigated.
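The difference between a stringent and a loosened comparison can be illustrated with base R's all.equal, which compares numeric values up to a relative tolerance. The two estimates and both tolerances below are placeholder values, not results from tmle3shift or txshift, and the actual test may use a different comparison.

```r
# Placeholder estimates standing in for the two implementations' output.
est_a <- 2.2513
est_b <- 2.2498

# Stringent check: passes only if the relative difference is below 1e-6.
strict_ok <- isTRUE(all.equal(est_a, est_b, tolerance = 1e-6))

# Loosened check: tolerates a relative difference of up to 1e-2.
loose_ok <- isTRUE(all.equal(est_a, est_b, tolerance = 1e-2))
```

Recording the observed discrepancy alongside the tolerance makes it easier to tell later whether a dependency update widened the gap.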