wlattner / hete

Heterogeneous Treatment Effects

License: MIT License

R 100.00%

ab-testing causal-inference experiments r statistics

hete's Issues

The uplift curve still looks wrong

Function(s) to build and compare several models on same training data

Set a reasonable default estimator

Tools to aid in model interpretation

Explore ways for users to dive into the inner workings of the model. Options include tools like lime, pdp, and ice (local surrogate explanations, partial dependence plots, and individual conditional expectation plots).

Vignette walking through an analysis of the Gerber Green GOTV data

Implement flip trick

This is another transformed outcome method:

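hete_flip_trick <- function(fold_id, x, y, tmt, folds) {
  # split covariates, outcome, and treatment into training and held-out fold;
  # drop = FALSE keeps x two-dimensional when it has a single column
  x_train <- x[folds != fold_id, , drop = FALSE]
  x_test <- x[folds == fold_id, , drop = FALSE]

  y_train <- y[folds != fold_id]
  y_test <- y[folds == fold_id]

  tmt_train <- tmt[folds != fold_id]
  tmt_test <- tmt[folds == fold_id]

  # the "flip trick", see "Uplift modeling for clinical trial data":
  # the label is 1 for treated responders and control non-responders
  # (assumes y and tmt are both coded 0/1)
  y_star_train <- ifelse(y_train == tmt_train, 1, 0)

  m <- gbm::gbm.fit(x_train, y_star_train, distribution = "bernoulli", n.trees = 1000)
  test_preds <- predict(m, as.matrix(x_test), type = "response",
                        n.trees = m$n.trees)
  # un-flip: with 50/50 assignment, 2 * P(y_star = 1 | x) - 1 is the treatment effect
  test_preds <- (2 * test_preds) - 1

  test_df <- data.frame(
    predicted_te = test_preds,
    observed_y = y_test,
    treatment = tmt_test
  )

  return(test_df)
}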
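
To see why the un-flip step recovers a treatment effect, here is a minimal sketch on simulated data. It swaps gbm for base R `glm()` so it runs without extra packages, and it assumes a binary outcome and 50/50 random assignment. The data-generating process and variable names are made up for illustration.

```r
# Flip trick on simulated data, using glm() in place of gbm (an assumption
# made only to keep the example dependency-free).
set.seed(1)
n <- 20000
x <- rnorm(n)
tmt <- rbinom(n, 1, 0.5)                 # 50/50 random assignment
# true effect on P(y = 1): +0.2 when x > 0, otherwise 0
p <- 0.3 + 0.2 * tmt * (x > 0)
y <- rbinom(n, 1, p)

# flipped label: 1 for treated responders and control non-responders
y_star <- as.integer(y == tmt)

d <- data.frame(y_star = y_star, pos = as.integer(x > 0))
m <- glm(y_star ~ pos, data = d, family = binomial())

# un-flip: 2 * P(y_star = 1 | x) - 1 estimates the treatment effect
pred_te <- 2 * predict(m, newdata = data.frame(pos = c(0, 1)),
                       type = "response") - 1
names(pred_te) <- c("x <= 0", "x > 0")
pred_te
```

The estimate for `x > 0` should land near 0.2 and the estimate for `x <= 0` near 0, since P(y_star = 1 | x) = 0.5 * P(y = 1 | treated, x) + 0.5 * P(y = 0 | control, x) under equal assignment probabilities.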

Plot method for `hete_model`

Implement a plot method for hete_model with an option to plot the uplift curve or the binned treatment effect. To do this, the original training data may need to be saved with the model object.

Summary method for `hete_model`

Add a method or two for generating synthetic data. Most papers, especially those proposing ensemble methods, include a simulation study that evaluates the algorithm's performance when the researcher knows the true treatment effect for each unit in the training data.

Grimmer, J., Messing, S., & Westwood, S. J. (2017). Estimating heterogeneous treatment effects and the effects of heterogeneous treatments with ensemble methods. Political Analysis, 1-22.
Künzel, S., Sekhon, J., Bickel, P., & Yu, B. (2017). Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning. arXiv preprint arXiv:1706.03461.
Powers, S., Qian, J., Jung, K., Schuler, A., Shah, N. H., Hastie, T., & Tibshirani, R. (2017). Some methods for heterogeneous treatment effect estimation in high-dimensions. arXiv preprint arXiv:1707.00102.
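
A synthetic data generator along these lines might look like the sketch below. `simulate_hete_data` and its arguments are hypothetical names, not part of hete, and the data-generating process is an arbitrary choice rather than one taken from the papers above. The point is only that the true unit-level effect is returned alongside the data.

```r
# Hypothetical helper (not part of hete): simulate a randomized experiment
# in which the true treatment effect of every unit is known.
simulate_hete_data <- function(n = 1000, p_tmt = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x1 <- rnorm(n)
  x2 <- rbinom(n, 1, 0.5)
  tmt <- rbinom(n, 1, p_tmt)            # random assignment
  true_te <- 1 + 2 * x2 + 0.5 * x1      # known unit-level treatment effect
  y <- 0.5 * x1 + tmt * true_te + rnorm(n)
  data.frame(x1, x2, tmt, y, true_te)
}

sim <- simulate_hete_data(n = 5000, seed = 42)

# sanity check: the difference in means should be close to the average true effect
ate_hat <- mean(sim$y[sim$tmt == 1]) - mean(sim$y[sim$tmt == 0])
c(estimated = ate_hat, truth = mean(sim$true_te))
```

Because `true_te` is stored per row, any estimator's predictions can be scored directly against it, for example with mean squared error.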

Pollinated transformed outcome forests

Print method for `hete_model`

hete_tot breaks when the outcome is a factor

Uplift should be cumulative

in uplift.R:

  random_lift <- ate * frac

  # we want to order the scores from highest to lowest
  qts <- stats::quantile(pred_te, probs = rev(frac))
  model_lift <- purrr::map_dbl(qts, model_lift, y = y, tmt = tmt, pred_te = pred_te)
  # the first one must be 0
  model_lift[1] <- 0

We should also multiply model_lift by the population fraction.
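
One way the cumulative version could work is sketched below. `cumulative_uplift` is a hypothetical stand-in written for this note, not the code in uplift.R. It assumes `tmt` is coded 0/1 and that a higher `pred_te` means a larger predicted effect.

```r
# Hypothetical sketch: cumulative uplift, scaled by the population fraction.
cumulative_uplift <- function(y, tmt, pred_te, frac = seq(0, 1, by = 0.1)) {
  ate <- mean(y[tmt == 1]) - mean(y[tmt == 0])
  # order units from highest to lowest predicted treatment effect
  ord <- order(pred_te, decreasing = TRUE)
  y <- y[ord]
  tmt <- tmt[ord]
  n <- length(y)

  model_lift <- vapply(frac, function(f) {
    k <- round(f * n)
    if (k == 0) return(0)                 # the first point must be 0
    top_y <- y[seq_len(k)]
    top_tmt <- tmt[seq_len(k)]
    if (!any(top_tmt == 1) || !any(top_tmt == 0)) return(NA_real_)
    # effect among the top-k units, multiplied by the population fraction
    (mean(top_y[top_tmt == 1]) - mean(top_y[top_tmt == 0])) * f
  }, numeric(1))

  data.frame(frac = frac, random_lift = ate * frac, model_lift = model_lift)
}

set.seed(2)
n <- 1000
x <- rnorm(n)
tmt <- rbinom(n, 1, 0.5)
y <- tmt * (x > 0) + rnorm(n)             # effect only when x > 0
uplift_curve <- cumulative_uplift(y, tmt, pred_te = x)
uplift_curve
```

With this scaling the model curve and the random curve start at 0 and meet at the overall ATE when the whole population is targeted, which makes the two directly comparable.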

Turn strata slides into narrative/vignette

Wrappers for popular estimators

randomForest
ranger
xgboost
caret
bart

predict(model, newdata = NULL) should return predictions for training data

This is the typical behavior of models in R.
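
A minimal sketch of that behavior, using a toy two-model estimator rather than hete's own classes. `fit_toy_split` and the `toy_split` class are hypothetical, and how `hete_model` stores its data is not assumed here. The idea is to keep the training covariates on the fitted object and fall back to them when `newdata` is `NULL`.

```r
# Toy estimator (hypothetical, not hete's API): one lm per treatment arm.
fit_toy_split <- function(x, y, tmt) {
  d <- data.frame(x, y = y)
  structure(
    list(
      m_tmt = lm(y ~ ., data = d[tmt == 1, , drop = FALSE]),
      m_ctl = lm(y ~ ., data = d[tmt == 0, , drop = FALSE]),
      x = x                                # keep the training covariates
    ),
    class = "toy_split"
  )
}

predict.toy_split <- function(object, newdata = NULL, ...) {
  # fall back to the stored training data, as most R models do
  if (is.null(newdata)) newdata <- object$x
  unname(predict(object$m_tmt, newdata = newdata) -
           predict(object$m_ctl, newdata = newdata))
}

set.seed(3)
x <- data.frame(x1 = rnorm(200))
tmt <- rbinom(200, 1, 0.5)
y <- x$x1 + 2 * tmt + rnorm(200)

m <- fit_toy_split(x, y, tmt)
length(predict(m))                         # one prediction per training row
identical(predict(m), predict(m, newdata = x))
```

The cost is that the fitted object grows with the training data, the same trade-off that applies to saving data for a plot method.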

Expose all estimators used in `hete_x`

This model fits a total of four models. We currently accept a single base estimator and use it for all four steps. One benefit mentioned in the paper is the ability to use a different model for each step, for example a more flexible model for the treatment condition that has more units.
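
As a rough illustration of what exposing all four estimators could look like, the sketch below implements the X-learner steps from Künzel et al. with one argument per model. `x_learner`, `fit_lm`, `fit_quad`, and the argument names are hypothetical and do not reflect `hete_x`'s actual signature. Each estimator is assumed to be a function of `(x, y)` that returns an object with a working `predict()` method.

```r
# Hypothetical sketch of an X-learner with one estimator argument per step.
fit_lm <- function(x, y) lm(y ~ ., data = data.frame(x, y = y))

x_learner <- function(x, y, tmt,
                      est_y_tmt = fit_lm, est_y_ctl = fit_lm,
                      est_te_tmt = fit_lm, est_te_ctl = fit_lm,
                      propensity = mean(tmt)) {
  force(propensity)
  x_t <- x[tmt == 1, , drop = FALSE]; y_t <- y[tmt == 1]
  x_c <- x[tmt == 0, , drop = FALSE]; y_c <- y[tmt == 0]

  # step 1: model the response separately in each arm
  m_t <- est_y_tmt(x_t, y_t)
  m_c <- est_y_ctl(x_c, y_c)

  # step 2: impute unit-level effects, then model them in each arm
  d_t <- y_t - predict(m_c, newdata = x_t)
  d_c <- predict(m_t, newdata = x_c) - y_c
  te_t <- est_te_tmt(x_t, d_t)
  te_c <- est_te_ctl(x_c, d_c)

  # combine the two effect models, weighting by the propensity score
  function(newdata) {
    unname(propensity * predict(te_c, newdata = newdata) +
             (1 - propensity) * predict(te_t, newdata = newdata))
  }
}

set.seed(4)
n <- 2000
x <- data.frame(x1 = rnorm(n))
tmt <- rbinom(n, 1, 0.5)
y <- x$x1 + tmt * (1 + x$x1) + rnorm(n)    # true effect is 1 + x1

# a more flexible response model for the treated arm only
fit_quad <- function(x, y) lm(y ~ poly(x1, 2), data = data.frame(x, y = y))
predict_te <- x_learner(x, y, tmt, est_y_tmt = fit_quad)
pred <- predict_te(x)
cor(pred, 1 + x$x1)
```

Defaulting every argument to the same estimator keeps the current single-estimator usage working while still allowing per-step overrides.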

Superlearner

Try `parsnip` package for consistent interface to ml models

https://github.com/tidymodels/parsnip

Implement broom methods

https://github.com/tidyverse/broom

tidy: return the uplift curve
glance: return the predicted ate and the q score
augment: add .pred_te column to dataframe

Consistent naming

Use consistent names and abbreviations, tmt for treatment, ctl for control, te for treatment effect, est for estimator.

hete_single example has wrong parameter order for uplift

The example for hete_single passes the arguments to uplift in the wrong order; it should be uplift(outcome, treatment, predicted).

Differentiate between binary and continuous outcomes

For some methods, such as hete_split and hete_single, the type of outcome does not matter much; the behavior is determined by the model supplied by the user. hete_x, when used for a binary outcome, needs two binary models and two continuous models: the first step models the response in the treatment and control groups, and the second step models the treatment effect in the two groups. The behavior of hete_tot also differs between the two tasks.
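
One possible starting point is a small helper that classifies the outcome and converts it to numeric before any arithmetic. The sketch below uses hypothetical names (`outcome_type`, `as_numeric_outcome`, `transformed_outcome`), none of which are part of hete. The transformed outcome shown is the standard formula for a known assignment probability `p`; whether `hete_tot` computes it exactly this way is an assumption.

```r
# Hypothetical helpers (not part of hete) for dispatching on outcome type.
outcome_type <- function(y) {
  vals <- unique(stats::na.omit(y))
  if (is.factor(y) || is.logical(y)) {
    if (length(vals) <= 2) return("binary")
    stop("factor outcomes must have exactly two levels")
  }
  if (is.numeric(y)) {
    if (all(vals %in% c(0, 1))) return("binary")
    return("continuous")
  }
  stop("unsupported outcome type: ", class(y)[1])
}

as_numeric_outcome <- function(y) {
  # for a two-level factor, the second level becomes 1
  if (is.factor(y)) return(as.integer(y) - 1L)
  as.numeric(y)
}

# standard transformed outcome; its conditional mean is the treatment effect
transformed_outcome <- function(y, tmt, p = 0.5) {
  as_numeric_outcome(y) * (tmt - p) / (p * (1 - p))
}

y_factor <- factor(c("no", "yes", "yes", "no"))
tmt <- c(1, 1, 0, 0)

outcome_type(y_factor)                  # "binary"
outcome_type(c(0.5, 1.2, 3))            # "continuous"
transformed_outcome(y_factor, tmt)      # 0, 2, -2, 0
```

Converting the factor up front avoids applying arithmetic to a factor, which R does not support, and the returned type could be used to pick a binary or continuous model for each step.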