hete's Introduction

HETE - A toolbox for heterogeneous treatment effects

Install

devtools::install_github("wlattner/hete")

Usage

library(tidyverse)
library(hete)

data(gotv)

df <- gotv %>%
  filter(treatment %in% c("Control", "Neighbors")) %>%
  mutate(treatment = ifelse(treatment == "Control", 0, 1))

m <- hete_single(voted ~ . | treatment, data = df, est = random_forest)
p <- predict(m, df)

Notes

This package relies heavily on partial function application so that its components fit together. A few standard function signatures are used throughout:

  • estimator/base learner: function(x, y) -> S3. This roughly corresponds to a model in R. The function takes a design matrix x and a vector of outcomes y, and returns an S3 object that has a predict method.

  • hete estimator: function(x, y, tmt) -> S3. Like the estimator above, but with an additional treatment indicator, tmt. This interface matters when working with some of the ensemble models or the cross-validation tools.
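The two signatures above can be sketched in base R as follows. This is a minimal illustration, not the package's own code: `ols`, `ols_no_intercept`, `two_model`, and `two_model_ols` are hypothetical names, and plain closures stand in for partial application (something like `purrr::partial` would serve the same purpose).

```r
# Hypothetical base learner with one extra tuning argument (intercept).
ols <- function(x, y, intercept = TRUE) {
  x <- as.matrix(x)
  if (intercept) x <- cbind(1, x)
  fit <- stats::lm.fit(x, y)
  structure(list(coef = fit$coefficients, intercept = intercept), class = "ols")
}

# predict implementation for the S3 object returned by ols().
predict.ols <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  if (object$intercept) newdata <- cbind(1, newdata)
  drop(newdata %*% object$coef)
}

# Partial application: fixing `intercept` leaves the standard
# estimator signature function(x, y) -> S3.
ols_no_intercept <- function(x, y) ols(x, y, intercept = FALSE)

# Hypothetical hete estimator: fits one base learner per group and
# predicts the difference between the two fitted models.
two_model <- function(x, y, tmt, est) {
  x <- as.matrix(x)
  structure(
    list(
      m_tmt = est(x[tmt == 1, , drop = FALSE], y[tmt == 1]),
      m_ctl = est(x[tmt == 0, , drop = FALSE], y[tmt == 0])
    ),
    class = "two_model"
  )
}

predict.two_model <- function(object, newdata, ...) {
  predict(object$m_tmt, newdata) - predict(object$m_ctl, newdata)
}

# Fixing `est` leaves the hete estimator signature function(x, y, tmt) -> S3.
two_model_ols <- function(x, y, tmt) two_model(x, y, tmt, est = ols)
```

Because every component reduces to one of the two signatures once its extra arguments are fixed, ensemble or cross-validation code only has to know how to call `function(x, y)` or `function(x, y, tmt)` and then `predict`.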

References

Talks

  1. The Power of Persuasion Modeling, Strata + Hadoop World, 2017.

References

  1. Taddy, M., et al. (2015). A nonparametric Bayesian analysis of heterogeneous treatment effects in digital experimentation. arXiv: 1412.8563

  2. Siegel, E. (2011). Uplift Modeling: Predictive Analytics Can't Optimize Marketing Decisions Without It. Prediction Impact White Paper.

  3. Feller, A. and Holmes, C. (2009). Beyond Toplines: Heterogeneous Treatment Effects in Randomized Experiments.

  4. Athey, S. and Imbens, G. (2016). The Econometrics of Randomized Experiments. arXiv: 1607.00698

  5. Hill, J. (2010). Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics, 1-24.

  6. Grimmer, J., Messing, S., and Westwood, S. (2016). Estimating Heterogeneous Treatment Effects and the Effects of Heterogeneous Treatments with Ensemble Methods.

  7. Wager, S. and Athey, S. (2016). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. arXiv: 1510.04342

  8. Athey, S. and Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. PNAS 113(24):7353-7360.

  9. Athey, S., Tibshirani, J. and Wager, S. (2016). Solving Heterogeneous Estimating Equations with Gradient Forests. arXiv: 1610.01271

  10. Imai, K. and Ratkovic, M. (2013). Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation. The Annals of Applied Statistics 7(1): 443-470.

  11. Imai, K. and Strauss, A. (2011). Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with Applications to the Optimal Planning of the Get-Out-the-Vote Campaign. Political Analysis 19: 1-19.

  12. Qian, M. and Murphy, S. (2011). Performance Guarantees for Individualized Treatment Rules. Ann Stat 39(2): 1180-1210.

  13. Muller, J., Reshef, D., Du, G. and Jaakkola, T. (2016). Learning Optimal Interventions. arXiv: 1606.05027.

  14. Radcliffe, N. and Surry, P. (2011). Real-World Uplift Modeling with Significance-Based Uplift Trees. Stochastic Solutions White Paper.

hete's People

Contributors

danningc, wlattner

hete's Issues

Plot method for `hete_model`

Implement a plot method for hete_model with an option to plot the uplift curve or the binned treatment effect. To do this, the original training data may need to be saved with the model object.
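As a rough sketch of what the binned treatment effect could compute before plotting: units are grouped into equal-sized bins by predicted treatment effect, and the observed difference in mean outcome between treated and control units is reported per bin. `binned_te` and its arguments are hypothetical and not part of the package.

```r
# Hypothetical helper: observed treatment effect within bins of the
# predicted treatment effect. tmt must be coded 0/1.
binned_te <- function(pred_te, y, tmt, n_bins = 5) {
  # Rank-based bins, so ties in pred_te cannot produce duplicate breaks.
  bin <- ceiling(rank(pred_te, ties.method = "first") * n_bins / length(pred_te))
  rows <- lapply(sort(unique(bin)), function(b) {
    in_bin <- bin == b
    data.frame(
      bin = b,
      mean_pred_te = mean(pred_te[in_bin]),
      # Difference in mean outcome between treated and control units;
      # NaN if a bin contains only one of the two groups.
      observed_te = mean(y[in_bin & tmt == 1]) - mean(y[in_bin & tmt == 0]),
      n = sum(in_bin)
    )
  })
  do.call(rbind, rows)
}
```

A plot method could then draw `observed_te` against `bin`. Since this needs `y` and `tmt` alongside the predictions, it shows why the training data may have to be stored with the model object.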

Consistent naming

Use consistent names and abbreviations: tmt for treatment, ctl for control, te for treatment effect, est for estimator.

Implement flip trick

This is another transformed outcome method:

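library(gbm)  # provides gbm.fit() and the predict method for gbm objects

# Fit the flip-trick model on every fold except fold_id and return
# held-out predictions for fold_id. y and tmt must both be coded 0/1.
hete_flip_trick <- function(fold_id, x, y, tmt, folds) {
  x_train <- x[folds != fold_id, , drop = FALSE]
  x_test <- x[folds == fold_id, , drop = FALSE]

  y_train <- y[folds != fold_id]
  y_test <- y[folds == fold_id]

  tmt_train <- tmt[folds != fold_id]
  tmt_test <- tmt[folds == fold_id]

  # the "flip trick", see "Uplift modeling for clinical trial data":
  # y_star is 1 for treated responders and for control non-responders
  y_star_train <- ifelse(y_train == tmt_train, 1, 0)

  m <- gbm.fit(x_train, y_star_train, distribution = "bernoulli", n.trees = 1000)
  test_preds <- predict(m, as.matrix(x_test), type = "response",
                        n.trees = m$n.trees)
  # un-flip: 2 * P(y_star = 1 | x) - 1 estimates the treatment effect
  # when treatment and control are equally likely
  test_preds <- (2 * test_preds) - 1

  test_df <- data.frame(
    predicted_te = test_preds,
    observed_y = y_test,
    treatment = tmt_test
  )

  return(test_df)
}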
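The un-flip step `(2*test_preds) - 1` only recovers the treatment effect when units are assigned to treatment with probability 1/2. A small worked example, using made-up response rates and a hypothetical helper `flip_uplift`, shows the arithmetic:

```r
# Hypothetical helper: the value recovered by un-flipping, given the true
# response rates under treatment (p1) and control (p0) and the
# probability of being treated (p_tmt).
flip_uplift <- function(p1, p0, p_tmt) {
  # P(y_star = 1): treated responders plus control non-responders.
  p_flip <- p_tmt * p1 + (1 - p_tmt) * (1 - p0)
  2 * p_flip - 1
}

flip_uplift(p1 = 0.3, p0 = 0.1, p_tmt = 0.5)  # 0.2, equal to p1 - p0
flip_uplift(p1 = 0.3, p0 = 0.1, p_tmt = 0.7)  # -0.04, no longer p1 - p0
```

With unbalanced assignment, the training data would need to be reweighted or resampled to a 50/50 split before the flipped outcome is modeled.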

Differentiate between binary and continuous outcomes

For some methods, such as hete_split and hete_single, the type of outcome does not matter much: the behavior is determined by the model the user supplies. hete_x, when used for a binary outcome, needs two binary models and two continuous models. The first step models the response in the treatment and control groups; the second step models the treatment effect in the two groups. The behavior of hete_tot also differs between the two tasks.

Expose all estimators used in `hete_x`

This model fits a total of four models. We currently accept a single base estimator and use it for all four steps. One benefit mentioned in the paper is the ability to use a different model for each step, for example a more flexible model for the treatment condition with more units.
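One possible shape for such an interface is sketched below, assuming the four-model procedure is the X-learner of Künzel et al. `x_learner`, its argument names, and the `lm_est` base learner are hypothetical, and a constant weight (the treated fraction) stands in for an estimated propensity score.

```r
# Illustrative base learner: function(x, y) -> S3 object with predict.
lm_est <- function(x, y) {
  fit <- stats::lm(y ~ ., data = data.frame(y = y, x))
  structure(list(fit = fit), class = "lm_est")
}

predict.lm_est <- function(object, newdata, ...) {
  as.numeric(stats::predict(object$fit, newdata = data.frame(newdata)))
}

# Hypothetical X-learner taking one estimator per step; unspecified
# estimators fall back to est_y_tmt, matching the single-estimator case.
x_learner <- function(x, y, tmt,
                      est_y_tmt, est_y_ctl = est_y_tmt,
                      est_te_tmt = est_y_tmt, est_te_ctl = est_y_ctl) {
  is_tmt <- tmt == 1
  x_tmt <- x[is_tmt, , drop = FALSE]
  x_ctl <- x[!is_tmt, , drop = FALSE]

  # Step 1: outcome models, fit separately in each group.
  m_tmt <- est_y_tmt(x_tmt, y[is_tmt])
  m_ctl <- est_y_ctl(x_ctl, y[!is_tmt])

  # Step 2: impute unit-level effects using the other group's outcome
  # model, then model those imputed effects within each group.
  d_tmt <- y[is_tmt] - predict(m_ctl, x_tmt)
  d_ctl <- predict(m_tmt, x_ctl) - y[!is_tmt]
  te_tmt <- est_te_tmt(x_tmt, d_tmt)
  te_ctl <- est_te_ctl(x_ctl, d_ctl)

  structure(list(te_tmt = te_tmt, te_ctl = te_ctl, w = mean(is_tmt)),
            class = "x_learner")
}

# Combine the two effect models; w (the treated fraction) weights the
# model fit on control units.
predict.x_learner <- function(object, newdata, ...) {
  object$w * predict(object$te_ctl, newdata) +
    (1 - object$w) * predict(object$te_tmt, newdata)
}
```

For a binary outcome, `est_y_tmt` and `est_y_ctl` would be classifiers returning probabilities, while `est_te_tmt` and `est_te_ctl` would stay regression models, since the imputed effects are continuous.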

Uplift should be cumulative

In uplift.R:

  random_lift <- ate * frac

  # we want to order the scores from highest to lowest
  qts <- stats::quantile(pred_te, probs = rev(frac))
  model_lift <- purrr::map_dbl(qts, model_lift, y = y, tmt = tmt, pred_te = pred_te)
  # the first one must be 0
  model_lift[1] <- 0

We should also multiply model_lift by the population fraction.
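A minimal sketch of a cumulative version, under the assumption that the lift at each fraction is the observed effect among the top-scored units scaled by that fraction. `uplift_curve` is a hypothetical standalone function, not the code in uplift.R.

```r
# Hypothetical cumulative uplift curve. tmt must be coded 0/1.
uplift_curve <- function(pred_te, y, tmt, frac = seq(0, 1, by = 0.1)) {
  # Order units from highest to lowest predicted treatment effect.
  ord <- order(pred_te, decreasing = TRUE)
  y <- y[ord]
  tmt <- tmt[ord]
  n <- length(y)
  ate <- mean(y[tmt == 1]) - mean(y[tmt == 0])

  model_lift <- vapply(frac, function(f) {
    k <- round(f * n)
    if (k == 0) return(0)  # no units targeted, so no lift
    y_top <- y[seq_len(k)]
    tmt_top <- tmt[seq_len(k)]
    # The effect is undefined if the top-k units miss one of the groups.
    if (!any(tmt_top == 1) || !any(tmt_top == 0)) return(NA_real_)
    # Observed effect among the top-k units, scaled by the population
    # fraction so the curve is cumulative.
    (mean(y_top[tmt_top == 1]) - mean(y_top[tmt_top == 0])) * f
  }, numeric(1))

  data.frame(frac = frac, random_lift = ate * frac, model_lift = model_lift)
}
```

With this scaling, both curves start at 0 and meet at the average treatment effect when the whole population is targeted.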

Generate synthetic data

Add a method or two for generating synthetic data. Most papers, especially the ones proposing ensemble methods, include a simulation study that evaluates the algorithm's performance in cases where the researcher knows the true treatment effect for each unit in the training data.

  1. Grimmer, J., Messing, S., & Westwood, S. J. (2017). Estimating heterogeneous treatment effects and the effects of heterogeneous treatments with ensemble methods. Political Analysis, 1-22.
  2. Künzel, S., Sekhon, J., Bickel, P., & Yu, B. (2017). Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning. arXiv preprint arXiv:1706.03461.
  3. Powers, S., Qian, J., Jung, K., Schuler, A., Shah, N. H., Hastie, T., & Tibshirani, R. (2017). Some methods for heterogeneous treatment effect estimation in high-dimensions. arXiv preprint arXiv:1707.00102.
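A generator along these lines might look like the sketch below. The data-generating process is an arbitrary illustration, not one taken from the papers above, and `sim_hete` with its arguments is hypothetical.

```r
# Hypothetical generator for a randomized experiment with a known,
# covariate-dependent treatment effect.
sim_hete <- function(n, p = 5, noise_sd = 1, seed = NULL) {
  stopifnot(p >= 2)
  if (!is.null(seed)) set.seed(seed)

  x <- matrix(stats::rnorm(n * p), nrow = n,
              dimnames = list(NULL, paste0("x", seq_len(p))))
  # Randomized assignment with probability 1/2.
  tmt <- stats::rbinom(n, 1, 0.5)

  # Baseline outcome and true unit-level effect; both are assumptions
  # chosen only to make the effect vary with x1.
  baseline <- x[, 1] + 0.5 * x[, 2]
  true_te <- 1 + 2 * (x[, 1] > 0)

  y <- baseline + tmt * true_te + stats::rnorm(n, sd = noise_sd)
  data.frame(x, tmt = tmt, y = y, true_te = true_te)
}

d <- sim_hete(500, seed = 42)
head(d)
```

Keeping `true_te` in the returned data frame lets an evaluation compare predicted effects with the truth directly, for example through the mean squared error between the two.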
