CausalKit is a Python package designed for students and researchers alike. It offers a simple approach to economic and statistical analysis, with an emphasis on ease of use, interpretability, and causal inference.
- Simple API with R-like syntax for linear regression
- Model display functions for easy comparison of models or interpretation of results
- Intuitive interfaces for common causal inference methods such as IV, panel methods and logit models.
- Simple library architecture that makes it easy to add custom models, which then reuse existing functionality such as the model display functions and R-like syntax.
pip install causalkit
import pandas as pd
from src.models.linreg import LinReg
data = pd.read_csv("data.csv")
model = LinReg(df=data,
               outcome="outcome_col",
               independent=["independent_col1",
                            "independent_col2"],
               standard_error_type='hc0')
model.summary(content_type='static')
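The `standard_error_type='hc0'` argument above presumably selects White's heteroskedasticity-robust (HC0) standard errors. That reading is an assumption, since the option is not documented here. As a minimal sketch of what HC0 computes, independent of CausalKit's own implementation, the pure-Python example below fits a one-regressor model and returns both the classical and the HC0 standard error of the slope. The function name `slope_standard_errors` and the toy data are hypothetical.

```python
import math


def slope_standard_errors(x, y):
    """Fit y = a + b*x by OLS; return (b, classical SE, HC0 SE) for the slope b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]          # regressor deviations from its mean
    s_xx = sum(d * d for d in dx)

    b = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / s_xx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

    # Classical SE: assumes every observation has the same error variance.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / s_xx)

    # HC0 SE: each observation contributes its own squared residual,
    # with no degrees-of-freedom correction.
    se_hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / s_xx
    return b, se_classical, se_hc0


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.4]
b, se_classical, se_hc0 = slope_standard_errors(x, y)
print(f"slope={b:.4f}  classical SE={se_classical:.4f}  HC0 SE={se_hc0:.4f}")
```

The two standard errors share the same point estimate and differ only in how the residuals are weighted, which is why switching the standard error type changes inference but not the coefficients.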
- Ergonomic API commands for linear regression, with support for fixed effects, IV, and more.
- To use all columns in a dataset, simply call:
import pandas as pd
from src.models.linreg import LinReg
data = pd.read_csv("data.csv")
model = LinReg(df=data,
               outcome="outcome",
               independent=["."])
- Please see the 2__regression_commands_walkthrough.ipynb in /notebooks for more details & functionality.
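The `independent=["."]` shorthand mirrors R's `y ~ .` formula convention, where the dot stands for every column other than the outcome. How CausalKit resolves it internally is not shown here, so the helper below is only a sketch of the idea: `expand_independent` and the column names are hypothetical and not part of the package.

```python
def expand_independent(columns, outcome, independent):
    """Resolve independent-variable names, treating ["."] as 'all other columns'.

    Hypothetical helper for illustration; not CausalKit's actual code.
    """
    if independent == ["."]:
        # Keep the original column order, excluding only the outcome.
        return [c for c in columns if c != outcome]

    missing = [c for c in independent if c not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return list(independent)


columns = ["outcome", "age", "income", "education"]
all_others = expand_independent(columns, "outcome", ["."])
subset = expand_independent(columns, "outcome", ["age"])
print(all_others)  # ['age', 'income', 'education']
print(subset)      # ['age']
```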
- Stargazer-based regression outputs for easy comparison of models.
import pandas as pd
from src.models.linreg import LinReg
from src.displays.display_linear import display_models
data = pd.read_csv("data.csv")
model = LinReg(df=data,
               outcome="outcome_col",
               independent=["independent_col1",
                            "independent_col2"],
               standard_error_type='hc0')
model_2 = LinReg(df=data,
                 outcome="outcome_col",
                 independent=["independent_col1",
                              "independent_col2",
                              "independent_col3"],
                 standard_error_type='hc0')
display_models([model, model_2])
- Interactive regression outputs to remind you what a stats term means. Just call the .summary() method on your model object and hover over the terms to see their definitions.
- Automatic dropping of NA values when a model is instantiated
- IV regression with Angrist data as an example
- T-test for difference in means
- Random effects and mixed effects models
- Common model diagnostics and assumption checks (linearity, normality of residuals, homoscedasticity, and absence of multicollinearity). This could include plots (such as QQ plots and residual vs. fitted value plots) and statistical tests.
- Regularization (Lasso, Ridge, Elastic Net)
- GLMs (Poisson, negative binomial, multinomial)
- Bootstrap and resampling methods (bootstrap standard errors and confidence intervals, permutation tests)
- Build more causal inference method interfaces (matching, regression discontinuity, synthetic control, etc.)
- Interactive interaction explorer (interactive visualization with sliders for continuous variables and dropdowns for categorical variables)
- Refactor the Fixed Effects class to handle high-dimensional fixed effects efficiently, add a method to support the within estimator, and integrate the tabmat library for efficient matrix operations.
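For readers unfamiliar with the within estimator mentioned in the last item, the sketch below shows the underlying idea for a single regressor: subtract each group's mean from `x` and `y`, then run OLS on the demeaned values. It is a minimal pure-Python illustration, not the planned CausalKit implementation; the function name `within_slope` and the data are hypothetical.

```python
from collections import defaultdict


def within_slope(groups, x, y):
    """Fixed-effects (within) slope for one regressor.

    Demeans x and y inside each group, then fits OLS through the origin
    on the demeaned data. Hypothetical helper for illustration only.
    """
    # Accumulate per-group sums of x, sums of y, and counts.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        totals[g][0] += xi
        totals[g][1] += yi
        totals[g][2] += 1

    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sum_x, sum_y, n = totals[g]
        dx = xi - sum_x / n   # deviation from the group mean of x
        dy = yi - sum_y / n   # deviation from the group mean of y
        num += dx * dy
        den += dx * dx
    if den == 0.0:
        raise ValueError("x has no within-group variation")
    return num / den


# Two groups with different intercepts (10 and -5) but a common slope of 2.
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10 + 2 * xi for xi in x[:3]] + [-5 + 2 * xi for xi in x[3:]]
slope = within_slope(groups, x, y)
print(round(slope, 6))  # 2.0
```

Demeaning removes the group-specific intercepts without estimating one dummy coefficient per group, which is what makes this approach attractive when the number of groups is large.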