SparseSC

SparseSC is a package that implements an ML-enhanced version of Synthetic Control methodologies, introducing penalties on both the feature and unit weights. Specifically, it optimizes these two equations:

$$\gamma_0 = \left\Vert Y_T - W\cdot Y_C \right\Vert_F^2 + \lambda_V \left\Vert V \right\Vert_1 $$

$$\gamma_1 = \left\Vert X_T - W\cdot X_C \right\Vert_V + \lambda_W \left\Vert W - J/c \right\Vert_F^2 $$

by optimizing $\lambda_V$ and $\lambda_W$ using cross validation within the control units in the usual way, and where:

  • $X_T$ and $X_C$ are matrices of features (covariates and/or pre-treatment outcomes) for the treated and control units, respectively

  • $Y_T$ and $Y_C$ are matrices of post-treatment outcomes for the treated and control units, respectively

  • $W$ is a matrix of weights such that $W \cdot (X_C \left |Y_C \right)$ forms a matrix of synthetic controls for all units

  • $V$ is a diagonal matrix of weights applied to the covariates / pre-treatment outcomes.

Note that all matrices are formatted with one row per unit and one column per feature / outcome. Breaking down the two main equations, we have:

  • $\left\Vert Y_T - W\cdot Y_C \right\Vert_F^2$ is the out-of-sample squared prediction error (i.e. the squared Frobenius Norm), measured within the control units under cross validation

  • $\left\Vert V \right\Vert_1$ represents how much the model depends on the features

  • $\left\Vert X_T - W\cdot X_C \right\Vert_V$ is the difference between synthetic and observed units in the feature space weighted by $V$. Specifically, $\left\Vert A \right\Vert_V = AVA^T$

  • $\left\Vert W - J/c \right\Vert_F^2$ is the difference between the fitted weights and a simple ($1/N_c$) weighted average of the control units.
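As a concrete illustration, the two objectives can be evaluated with NumPy for given $W$, $V$, and penalty values. This is only a sketch with made-up shapes and names, not the package's internals; the $V$-weighted term is reduced to a scalar by taking the trace of $A V A^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
N_t, N_c, K, T_post = 3, 10, 5, 4  # treated units, controls, features, post periods

X_T, X_C = rng.normal(size=(N_t, K)), rng.normal(size=(N_c, K))
Y_T, Y_C = rng.normal(size=(N_t, T_post)), rng.normal(size=(N_c, T_post))
W = np.full((N_t, N_c), 1.0 / N_c)   # weights mapping controls to synthetic units
V = np.diag(rng.uniform(size=K))     # diagonal feature-importance matrix
lam_V, lam_W = 0.1, 0.1

# gamma_0: squared prediction error (Frobenius norm) plus L1 penalty on V
gamma_0 = np.linalg.norm(Y_T - W @ Y_C, "fro") ** 2 + lam_V * np.abs(np.diag(V)).sum()

# gamma_1: V-weighted feature discrepancy plus penalty shrinking W toward 1/N_c
D = X_T - W @ X_C
gamma_1 = np.trace(D @ V @ D.T) + lam_W * np.linalg.norm(W - 1.0 / N_c, "fro") ** 2
```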

Typically this is used to estimate causal effects from binary treatments on observational panel (longitudinal) data. The functions fit() and fit_fast() provide basic fitting of the model. If you are estimating treatment effects, fitting and diagnostics are handled by estimate_effects().

Though the fitting methods do not require such structure, the typical setup is where we have panel data of an outcome variable Y for T time periods for N observation units (customers, computers, etc.). We may additionally have some baseline characteristics X about the units. In the treatment effect setting, we will also have a discrete change in treatment status (e.g. some policy change) at time T0 for a select group of units. When there is treatment, we can think of the pre-treatment data as [X, Y_pre] and post-treatment data as [Y_post].
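For concreteness, a minimal synthetic panel in this layout might be constructed as follows (N, T, T0, the treated set, and the effect size are all placeholders, not package requirements):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, T0 = 100, 20, 15          # units, total periods, treatment period
K = 3                           # baseline covariates

X = rng.normal(size=(N, K))                  # baseline characteristics
Y = rng.normal(size=(N, T)).cumsum(axis=1)   # outcome panel, one row per unit

treated_unit_idx = np.arange(10)             # first 10 units get treated at T0
Y[treated_unit_idx, T0:] += 1.0              # add a constant treatment effect

# Pre-treatment data [X, Y_pre] and post-treatment data [Y_post]:
pre = np.hstack([X, Y[:, :T0]])
Y_post = Y[:, T0:]
```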

```python
import numpy as np
import SparseSC

# Mark each unit's first treatment period (NaN = never treated):
unit_treatment_periods = np.full((N,), np.nan)
unit_treatment_periods[treated_unit_idx] = T0

# Fit the model and estimate effects:
fitted_estimates = SparseSC.estimate_effects(Y, unit_treatment_periods, ...)

# Print a summary of the model, including effect size estimates,
# p-values, and confidence intervals:
print(fitted_estimates)

# Extract model attributes:
fitted_estimates.pl_res_post.avg_joint_effect.p_value
fitted_estimates.pl_res_post.avg_joint_effect.CI

# Access the fitted Synthetic Controls model:
fitted_model = fitted_estimates.fit
```

See the docs for more details.

Overview

See here for more info on Synthetic Controls. In essence, it is a type of matching estimator. For each unit, it finds a weighted average of untreated units that is similar on key pre-treatment data. The goal of Synthetic Controls is to find out which variables are important to match on (the V matrix) and then, given those, to find a vector of per-unit weights that combines the control units into its synthetic control. The synthetic control acts as the counterfactual for a unit, and the estimate of a treatment effect is the difference between the observed outcome in the post-treatment period and the synthetic control's outcome.
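The counterfactual construction can be sketched in a few lines of NumPy. The weights and outcomes below are random placeholders standing in for fitted values:

```python
import numpy as np

rng = np.random.default_rng(2)
N_c, T_post = 20, 5
Y_C_post = rng.normal(size=(N_c, T_post))       # control outcomes, post-period
y_treated_post = rng.normal(size=T_post) + 0.5  # observed treated outcome

# Given per-unit weights w (nonnegative, summing to 1) for one treated unit,
# the synthetic control is the weighted average of control outcomes:
w = rng.dirichlet(np.ones(N_c))                 # illustrative weights
synthetic = w @ Y_C_post                        # counterfactual trajectory
effect = y_treated_post - synthetic             # per-period effect estimate
```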

SparseSC makes a number of changes to Synthetic Controls. It uses regularization and feature learning to avoid overfitting, ensure uniqueness of the solution, automate researcher decisions, and allow for estimation on large datasets. See the docs for more details.

The main choices to make are:

  1. The solution structure
  2. The model-type

SparseSC Solution Structure

The first choice is whether to calculate all of the high-level parameters (V, its regularization parameter, and the regularization parameters for the weights) on the main matching objective or whether to get approximate/fast estimates of them using non-matching formulations. The options are:

  • Full joint (done by fit()): We optimize over v_pen, w_pen, and V so that the resulting SCs for the controls have the smallest squared prediction error on Y_post.
  • Separate (done by fit_fast()): We note that we can efficiently estimate w_pen on the main matching objective since, given V, we can reformulate the weight-finding problem as a Ridge Regression and use efficient LOO cross-validation (e.g. RidgeCV) to estimate w_pen. We estimate V using an alternative, non-matching objective (such as a MultiTaskLasso using [X, Y_pre] to predict Y_post). This setup also allows for feature generation to select the match space. There are two variants depending on how we handle v_pen:
    • Mixed: Choose v_pen based on the resulting down-stream main matching objective.
    • Fully separate: Choose v_pen based on the approximate objective (e.g., MultiTaskLassoCV).

The Fully Separate solution is fast and often quite good, so we recommend starting there and, if need be, advancing to the Mixed and then Fully Joint optimizations.
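The separate strategy can be caricatured with scikit-learn. This is only a sketch of the idea under made-up data, not SparseSC's implementation: a MultiTaskLasso stands in for the non-matching objective that yields feature importances, and a RidgeCV stands in for the efficient LOO choice of the weight penalty:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLassoCV, RidgeCV

rng = np.random.default_rng(3)
N_c, K, T_post = 50, 8, 4
X_C = rng.normal(size=(N_c, K))                      # control features [X, Y_pre]
Y_C_post = X_C[:, :2] @ rng.normal(size=(2, T_post)) \
    + 0.1 * rng.normal(size=(N_c, T_post))           # post outcomes driven by 2 features

# Step 1: learn feature importances (the diagonal of V) from a
# non-matching objective: a MultiTaskLasso predicting Y_post from features.
lasso = MultiTaskLassoCV(cv=5).fit(X_C, Y_C_post)
V_diag = np.abs(lasso.coef_).sum(axis=0)             # one importance per feature

# Step 2: given V, picking the weight penalty is a Ridge problem with
# efficient leave-one-out cross-validation over candidate penalties.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_C * np.sqrt(V_diag), Y_C_post)
w_pen = ridge.alpha_
```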

Model types

There are two main model-types (corresponding to different cuts of the data) that can be used to estimate treatment effects.

  1. Retrospective: The goal is to minimize the squared prediction error of the control units on Y_post, and the full pre-history of the outcome is used as features in fitting. This is the default and was used in the description above.
  2. Prospective: We make an artificial split in time before any treatment actually happens (Y_pre=[Y_train, Y_test]). The goal is to minimize the squared prediction error of all units on Y_test, and Y_train is used as features in fitting.

Given the same features, the two will only differ when there is a non-trivial number of treated units. In that case the prospective model may provide lower prediction error for the treated units, though at the cost of less pre-history data used for fitting. When there is a trivial number of treated units, the retrospective design will be the most efficient.
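The two cuts of the data can be illustrated with array slicing (the panel, T0, and the 80/20 split fraction here are arbitrary placeholders):

```python
import numpy as np

T0, train_frac = 15, 0.8
Y = np.arange(100 * 20, dtype=float).reshape(100, 20)   # toy panel: N=100, T=20

# Retrospective: features are the full pre-history, target is Y_post.
features_retro, target_retro = Y[:, :T0], Y[:, T0:]

# Prospective: split the pre-period itself into train/test before treatment.
T_split = int(T0 * train_frac)                          # artificial split point
features_pro, target_pro = Y[:, :T_split], Y[:, T_split:T0]
```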

See more details about these and two additional model types (Prospective-restrictive, and full) at the docs.

Fitting a synthetic control model

Documentation

You can read these online at Read the Docs. See there for:

  • Custom Donor Pools
  • Parallelization
  • Constraining the V matrix to be in the unit simplex
  • Performance Notes for fit()
  • Additional Performance Considerations for fit()
  • Full parameter listings

To build the documentation see docs/dev_notes.md.

Citation

Brian Quistorff, Matt Goldman, and Jason Thorpe (2020) Sparse Synthetic Controls: Unit-Level Counterfactuals from High-Dimensional Data, Microsoft Journal of Applied Research, 14, pp.155-170.

Installation

The easiest way to install SparseSC is with:

pip install git+https://github.com/microsoft/SparseSC.git

Additional commands to run tests and examples are in the makefile.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Contributors

bquistorff, cclauss, chronofanz, dependabot[bot], jdthorpe, microsoft-github-policy-service[bot], mrgoldman


sparsesc's Issues

more detailed build instructions?

Would you perhaps be able to write more detailed build instructions? My build consistently fails every time and I'm not sure why.

`estimate_effects` sometimes hangs indefinitely when using `n_multi`

If I have the n_multi parameter set for multithreading (for example, n_multi=os.cpu_count()) when calling estimate_effects, it sometimes hangs indefinitely. I have a dataset where estimate_effects typically takes about 10 seconds, but occasionally (maybe 1 in 20 times) it hangs and I have to kill the process. I haven't been able to figure out why it's happening.

Not urgent because the workaround is to just run it single threaded. But I'm curious if anyone has experienced similar issues or knows why it might be happening.

Register in the PyPI repo

Hi SparseSC team,

First of all, thanks for developing this package for the open-source community.

I wonder what the team's timeline is for registering this package in the public PyPI repo?

Thanks
Sam

Retry path-optimal V

It used to provide path-dependent answers, but that might have been because of old bugs.

Inference + robustness checks

It would be great to add methods for inference based on placebos (permutation tests) and robustness checks based on LOO (see Abadie 2021). Inference is really key and standard in SC studies.

Allow for time-series variable weights

Another way to deal with lots of predictors, if they come from a set of time series, is to have a constant weight per time series rather than per individual predictor. This is what MSCMT (an R library) does.

Allow differing treatment dates

Provide code to do the naive solution. See if there are any speedups (start looking for new penalty parameters at the old best point).

MultiTaskLassoCV got an unexpected keyword argument 'normalize'

Hi, I'm getting an issue when trying to run this code. As far as I know the data is correctly formatted. The type error relates to the LassoCV backend; could someone help me please?

I am running it on both Python 3.7.9 and 3.11.2, so hopefully it's not a Python version issue.


Calculation of the most "representative" subset

Can you pick a set of units that is the most representative? Naively, one could use the estimated W matrix, but that's not quite right. You could rotate through dropping each unit and see which one is the most important for the MSE of Y_post, or do a forward step-wise search path.

Implement wait and reconnect options for AzureBatch.run

The call to SparseSC.utils.AzureBatch.run() waits for the batch job to complete by default. However, this is not necessary, as once created the batch job is independent of the local process. Also, the current API doesn't allow reconnecting to a job after accidentally quitting Python while run() is running. This is an easy fix (I did it a lot during development) that needs to be implemented in the API.

Fix NameError in estimate_effects

Hello!

I believe that the variable name change introduced in this commit also introduced a bug. The variable was renamed from cv_folds to cf_folds but there are still two instances of cv_folds in estimate_effects. This is causing a NameError. I have also added a PR if you want to just change to the new name: #35

Add more examples

  • Pandas multiindex -> block matrices
  • AA test (power calc)
  • dictionary of transformations of Y_pre (full trend, recent trend, normalizing level)
  • Multitask (normalize and then add extra Y_pre and Y_post)
  • Setting one or both lambdas to 0

Allow arbitrary subset of donors for each SC

There are two common scenarios where one would like to not use all donors for a SC.

  1. SUTVA violation: There might be local effects that cause "contamination" between units (e.g. if units are physical and border each other).
  2. Interpolation bias: You might want to disallow donors that have very different X since they might be included in the SC if there are other offsetting units.

I think the best way to enable this is to allow the user to pass in a valid-donor matrix with binary values.
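A sketch of what such a matrix might look like, built from the two exclusion rules above (all names and thresholds are hypothetical; this is a feature proposal, not the package's API):

```python
import numpy as np

rng = np.random.default_rng(4)
N, N_c = 6, 10
X = rng.normal(size=(N, 1))                  # unit covariates
X_C = rng.normal(size=(N_c, 1))              # donor covariates
neighbors = rng.integers(0, 2, size=(N, N_c)).astype(bool)  # toy adjacency (SUTVA)

# Entry [i, j] = 1 if control j may serve as a donor for unit i:
# exclude neighbors (contamination) and donors too far away in X
# (interpolation bias), using an arbitrary distance cutoff of 2.0.
dist = np.abs(X - X_C.T)                     # pairwise covariate distance
valid_donor = ((dist < 2.0) & ~neighbors).astype(int)
```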

Provide standard graphs

Typical graphs include:

  • Actual and SC for a unit
  • "Cloud" of effects for controls and treated (maybe do density if too many)
