grantmcdermott / etwfe Goto Github PK
View Code? Open in Web Editor NEWExtended two-way fixed effects
Home Page: https://grantmcdermott.com/etwfe/
License: Other
Extended two-way fixed effects
Home Page: https://grantmcdermott.com/etwfe/
License: Other
Hi, this is a really cool package, and I'm trying to make it work for my dataset. Please note that I'm a relative beginner in R.
Brief description of my data:
Brief description of my problems
.Dtreat
variable is class is not supported."My code:
First, I impute some values for NAs of controls
suppressPackageStartupMessages(library(mice))
missing_vars <- c("age", "p_n_w_al", "p_eenp_hh", "p_ink_hi", "p_00_14_jr", "g_woz")
df_missing <- tk_cbs_test_12_21_80_t12_hosp[, missing_vars]
mids_obj <- mice(df_missing)
df_imputed <- complete(mids_obj)
tk_cbs_test_12_21_80_t12_hosp_imp <- cbind(tk_cbs_test_12_21_80_t12_hosp[, -which(names(tk_cbs_test_12_21_80_t12_hosp) %in% missing_vars)], df_imputed)
Second, create first.treat variable that is the same for all treated neighbourhoods
tk_cbs_test_12_21_80_t12_hosp_imp$first.treat <- ifelse(tk_cbs_test_12_21_80_t12_hosp_imp$evertr == 1, 2015, 0)
Third, convert all controls to numeric variables where this was not yet the case
tk_cbs_test_12_21_80_t12_hosp_imp$p_eenp_hh <- as.numeric(tk_cbs_test_12_21_80_t12_hosp_imp$p_eenp_hh)
tk_cbs_test_12_21_80_t12_hosp_imp$p_00_14_jr <- as.numeric(tk_cbs_test_12_21_80_t12_hosp_imp$p_00_14_jr)
Fourth, specify the model
twoway_FE_12_21_80_co =
etwfe(
fml = p_prrp ~ treated + urban + age + p_n_w_al + p_eenp_hh + p_ink_hi + p_00_14_jr + g_woz,
tvar = jaar.x,
gvar = first.treat,
ivar = buurt2021,
data = tk_cbs_test_12_21_80_t12_hosp_imp
)
# Have a look at the results
twoway_FE_12_21_80_co
Fifth, recover the Average Treatment Effect (ATE)
emfx(twoway_FE_12_21_80_co)
I hope you can help me out! Regardless of the answer, many thanks for this cool package.
Dear Grant,
Thank you very much for creating a great package. I have a query regarding the behaviour of the xvar
argument for heterogeneous treatment effects:
In the case of a categorical xvar
with more than two levels, I noticed what might be some unexpected behaviour when the variables are demeaned. The demeaned dataset created by etwfe() seems to have a number of variables with NA
column names that don't appear to enter the estimation. As an example, I have augmented the quickstart example on the package homepage to estimate heterogeneous treatment effects across quartiles of lpop
:
library(data.table)
data("mpdta", package = "did")
setDT(mpdta)
# Create quartiles of lpop
mpdta[, lpop_quartile := cut(lpop,
breaks = quantile(lpop, probs = 0:4/4),
labels = 1:4, include.lowest=TRUE)]
mod =
etwfe(
fml = lemp ~ 0,
tvar = year,
gvar = first.treat,
data = mpdta,
xvar = lpop_quartile,
vcov = ~countyreal
)
# etwfe_dat() is etwfe with the final line set to return(data) instead of return(est)
mod_dat = etwfe_dat(
fml = lemp ~ 0,
tvar = year,
gvar = first.treat,
data = mpdta,
xvar = lpop_quartile,
vcov = ~countyreal
)
> colnames(mod_dat)
[1] "year" "countyreal" "lpop" "lemp" "first.treat" "treat"
[7] "lpop_quartile" ".Dtreat" ".Dtreated_cohort" "lpop_quartile_dm" NA NA
> head(mod_dat)
year countyreal lpop lemp first.treat treat lpop_quartile .Dtreat .Dtreated_cohort lpop_quartile_dm NA NA
1 2003 8001 5.896761 8.461469 2007 1 4 FALSE 1 -0.2356021 -0.3089005 0.7277487
2 2004 8001 5.896761 8.336870 2007 1 4 FALSE 1 -0.2356021 -0.3089005 0.7277487
3 2005 8001 5.896761 8.340217 2007 1 4 FALSE 1 -0.2356021 -0.3089005 0.7277487
4 2006 8001 5.896761 8.378161 2007 1 4 FALSE 1 -0.2356021 -0.3089005 0.7277487
5 2007 8001 5.896761 8.487352 2007 1 4 TRUE 1 -0.2356021 -0.3089005 0.7277487
6 2003 8019 2.232377 4.997212 2007 1 1 FALSE 1 -0.2356021 -0.3089005 -0.2722513
The two NA columns appear to be other levels of lpop_quartile
after demeaning. Perhaps I am misunderstanding the process here; I wonder would you be able to confirm if this is the expected behaviour of the function?
Also, I notice that the process for demeaning the `xvar' is relative to time rather than the usual group demeaning that appears in the case of covariates in Wooldridge (2021):
Lines 334 to 341 in 55a032a
I wonder if you could also confirm that this is as expected? Could you provide some intuition as to why a different approach is required here if this is the case?
Many thanks again for providing such an excellent package, and I very much appreciate your help on this.
Dear Grant,
I hope you are doing fine.
I wondered whether you think it would be worth incorporating anticipation effects for the estimator (this would be kind of similar to what Callaway and Sant'Anna have).
It would be generally possible, and Wooldridge writes about that in the paper.
I don't know how this could be implemented for the ETWFE and would have to think about it, but do you think it would be worth digging?
Best, Frederic
First off, thank you for creating the package.
At the moment, more than one covariate leads to an error. The problem is caused when creating the crtls_dm variable in line 109 in etwfe.R, the string has to be split to get the names right, e.g.,
ctrls_dm = paste0(ctrls[[1]], " ", "")), "_dm")
I have not had time to check whether this can be fixed here or would lead to downstream problems. Will try to take a look tomorrow.
Dear Grant,
The marginaleffects::marginaleffects
part in emfx
becomes very slow in large datasets (> 500k), especially when the number of periods increases. At some point, it does not run in a sensible time anymore (running time explodes exponentially).
Currently, etwfe
runs marginaleffects
over the entire dat
dataset. But that is actually not necessary.
The results will be (almost) equivalent if you first collapse the data set for all period-cohort combinations and take weighted means!
I added in my fork the part below that collapses the data. With some simulated data (1 mio. obs, 10 periods), adding this block improves running time by a factor of 4 (100 sec. vs 400 sec.) and this only increases exponentially with larger datasets.
What do you think about this? Greetings !
# define formulas (needs an "if-else" with / without xvar)
form_count = stats::as.formula(paste(".", " ~", gvar, "+", tvar, "+", xvar))
form_data = stats::as.formula(paste(".", " ~", gvar, "+", tvar, "+", xvar, "+ .Dtreat"))
# calculate weights
dat_weights = aggregate(form_count, data = subset(dat, .Dtreat == 1), FUN = length)[c(gvar, tvar, xvar, ".Dtreat")]
names(dat_weights)[names(dat_weights) == ".Dtreat"] = "n"
# collapse the data
dat = aggregate(form_data, data = subset(dat, .Dtreat == 1), FUN = mean, na.rm = TRUE)
# merge the weights onto the collapsed data
dat = merge(dat, dat_weights, all.x = T)
# [...]
mfx = marginaleffects::marginaleffects(
object,
newdata = dat, # the collapsed data
wts = "n", # the cohort-period cell size as weights
variables = ".Dtreat",
by = c(by_var, xvar),
...
Is it possible to include weights in the etwfe function?
Hey Grant,
Hey @vincentarelbundock (maybe you can help me too because this is in the end a question on marginaleffects
rather than etwfe
)
I'm using the xvar
argument so far to calculate effects for different groups of a categorical variable. This works great.
As an example, I can calculate a treatment effect for education = 1, 2, or 3 and I get three treatment effects for each of these groups from emfx
. This would be something like this:
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % .Dtreat. educ
.Dtreat mean(TRUE) - mean(FALSE) 1.1529 0.010874 106.02 < 2.22e-16 1.1315 1.1742 TRUE 1
.Dtreat mean(TRUE) - mean(FALSE) 2.1092 0.010739 196.41 < 2.22e-16 2.0882 2.1303 TRUE 2
.Dtreat mean(TRUE) - mean(FALSE) 3.1726 0.009533 332.82 < 2.22e-16 3.1540 3.1913 TRUE 3
However, for a continous variable, I guess most researchers want to report something different for a linear model: Simply an intercept and the interaction. Assuming I want to interact the treatment with distance
:
etwfe(fml = y ~ 0, tvar = T, gvar = G, xvar = distance, data = dat)
Then etwfe
gives
GLM estimation, Dep. Var.: y
Observations: 1,000
Fixed-effects: G: 19, T: 3
Standard-errors: Clustered (G)
Estimate Std. Error t value Pr(>|t|)
.Dtreat:G::1:T::2:distance 0.014189 0.010905 1.301083 2.0964e-01
.Dtreat:G::1:T::3:distance -0.178339 0.011416 -15.622212 6.5175e-12 ***
.Dtreat:G::2:T::2:distance -0.061419 0.010905 -5.631918 2.4157e-05 ***
.Dtreat:G::2:T::3:distance 0.419388 0.011416 36.737625 < 2.2e-16 ***
.Dtreat:G::3:T::3:distance -0.140821 0.011416 -12.335705 3.2324e-10 ***
...
which is good so far and emfx
reports the marginal effects for different values of distance
.
Term Contrast .Dtreat ldist_air Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
.Dtreat mean(TRUE) - mean(FALSE) TRUE 8.88 -0.0372 6.59 -0.00564 0.996 0.0 -12.96 12.88
.Dtreat mean(TRUE) - mean(FALSE) TRUE 6.69 132.9161 24.82 5.35418 <0.001 23.5 84.26 181.57
.Dtreat mean(TRUE) - mean(FALSE) TRUE 8.24 1.7932 11.79 0.15216 0.879 0.2 -21.31 24.89
.Dtreat mean(TRUE) - mean(FALSE) TRUE 9.45 -7.0808 6.12 -1.15642 0.248 2.0 -19.08 4.92
.Dtreat mean(TRUE) - mean(FALSE) TRUE 9.06 -2.2928 6.43 -0.35667 0.721 0.5 -14.89 10.31
--- 79 rows omitted. See ?print.marginaleffects ---
.Dtreat mean(TRUE) - mean(FALSE) TRUE 9.10 -2.8199 6.39 -0.44118 0.659 0.6 -15.35 9.71
.Dtreat mean(TRUE) - mean(FALSE) TRUE 8.25 8.4341 7.29 1.15744 0.247 2.0 -5.85 22.72
.Dtreat mean(TRUE) - mean(FALSE) TRUE 9.82 120.9891 16.57 7.30238 <0.001 41.7 88.52 153.46
.Dtreat mean(TRUE) - mean(FALSE) TRUE 8.34 6.9434 5.00 1.38910 0.165 2.6 -2.85 16.74
.Dtreat mean(TRUE) - mean(FALSE) TRUE 9.25 -9.4350 9.73 -0.97017 0.332 1.6 -28.50 9.63
But it's a linear model and I would like to see from emfx
simply an Intercept and Slope Coefficient. Sadly, I am not an experts on the marginaleffects
package. Do you know how I could achieve that? (And if you think this would be useful I could also try to implement and push it).
All the best, Frederic
For unbalanced panels, but also just as an additional option.
Hi Grant,
Thanks for making this package. I would like to use this package on my Ubuntu workstation, but it gives me the following error.
I have tried reinstalling R and this package, but the same error occurs. Would there be an easy way to fix it?
Thanks again for the package, and I would very much appreciate your help on this issue. Many thanks in advance!
min(gvar) < min (tvar)
, as with the mpdta dataset in the README, that's a "never treated" value (in the sense of being able to recover a treatment effect).max(gvar) > max(tvar)
. This is the case even when using the "notyet" treated control, which clearly isn't right.Sorry to be a pain and submit another issue, but I wanted to keep track of it.
I just realized when replying to the other issue that the confidence interval of the aggregate effects is weirdly off when using fe = 'none'.
Based on the example mpdta:
mod =
etwfe(
fml = lemp ~ lpop, # outcome ~ controls
tvar = year, # time variable
gvar = first.treat, # group variable
data = mpdta, # dataset
vcov = ~countyreal, # vcov adjustment (here: clustered)
fe = "none"
)
emfx(mod, type = "simple")
Term Contrast .Dtreat Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 .Dtreat mean(dY/dX) 1 -0.05063 0.0125 -4.051 5.0988e-05 8.418 8.694
emfx(mod, type = "event")
Term Contrast event Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 .Dtreat mean(dY/dX) 0 -0.03321 0.01337 -2.485 0.01296119 8.418 8.694
2 .Dtreat mean(dY/dX) 1 -0.05735 0.01715 -3.344 0.00082628 5.871 6.147
3 .Dtreat mean(dY/dX) 2 -0.13787 0.03079 -4.478 7.5344e-06 5.163 5.462
4 .Dtreat mean(dY/dX) 3 -0.10954 0.03232 -3.390 0.00069966 5.264 5.513
Term Contrast first.treat Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 .Dtreat mean(dY/dX) 2004 -0.08763 0.02305 -3.802 0.00014351 5.283 5.557
2 .Dtreat mean(dY/dX) 2006 -0.02128 0.01859 -1.145 0.25240119 5.895 6.162
3 .Dtreat mean(dY/dX) 2007 -0.04595 0.01797 -2.557 0.01055532 8.418 8.694
I'll try to also take a look at this over the weekend.
Much thanks for the package! I was in search of a tool to implement the nonlinear DiD approach described by Wooldridge (2022) and this is exactly what I was was hoping to find.
Before applying etwfe()
and emfx()
to my data, I tried to run the example code provided by the README and vignette. I was unable to replicate any emfx()
results due to the following error (and additional warning):
Error: The function supplied to the `transform_pre` argument must accept two numeric vectors of predicted probabilities of length 0, and return a single numeric value or a numeric vector
of length 0, with no missing value.
In addition: Warning message:
In formals(fun) : argument is not a function
I'm not sure if this is a potential general issue with the recent marginaleffects
update or if it is specific to me. I have placed the entirety of my code and output below in addition to a traceback of the error and my session information.
Thanks in advance for any help!
# Load library:
library(etwfe)
# Load data:
data("mpdta", package = "did")
# Inspect data:
head(mpdta)
# Estimate model:
mod =
etwfe(
fml = lemp ~ lpop, # outcome ~ controls
tvar = year, # time variable
gvar = first.treat, # group variable
data = mpdta, # dataset
vcov = ~countyreal # vcov adjustment (here: clustered)
)
# Inspect model object:
mod
# Get aggregated ATT:
emfx(mod)
> # Load library:
> library(etwfe)
>
> # Load data:
> data("mpdta", package = "did")
>
> # Inspect data:
> head(mpdta)
year countyreal lpop lemp first.treat treat
866 2003 8001 5.896761 8.461469 2007 1
841 2004 8001 5.896761 8.336870 2007 1
842 2005 8001 5.896761 8.340217 2007 1
819 2006 8001 5.896761 8.378161 2007 1
827 2007 8001 5.896761 8.487352 2007 1
937 2003 8019 2.232377 4.997212 2007 1
>
> # Estimate model:
> mod =
+ etwfe(
+ fml = lemp ~ lpop, # outcome ~ controls
+ tvar = year, # time variable
+ gvar = first.treat, # group variable
+ data = mpdta, # dataset
+ vcov = ~countyreal # vcov adjustment (here: clustered)
+ )
>
> # Inspect model object:
> mod
OLS estimation, Dep. Var.: lemp
Observations: 2,500
Fixed-effects: first.treat: 4, year: 5
Varying slopes: lpop (first.treat: 4), lpop (year: 5)
Standard-errors: Clustered (countyreal)
Estimate Std. Error t value Pr(>|t|)
.Dtreat:first.treat::2004:year::2004 -0.021248 0.021728 -0.977890 3.2860e-01
.Dtreat:first.treat::2004:year::2005 -0.081850 0.027375 -2.989963 2.9279e-03 **
.Dtreat:first.treat::2004:year::2006 -0.137870 0.030795 -4.477097 9.3851e-06 ***
.Dtreat:first.treat::2004:year::2007 -0.109539 0.032322 -3.389024 7.5694e-04 ***
.Dtreat:first.treat::2006:year::2006 0.002537 0.018883 0.134344 8.9318e-01
.Dtreat:first.treat::2006:year::2007 -0.045093 0.021987 -2.050907 4.0798e-02 *
.Dtreat:first.treat::2007:year::2007 -0.045955 0.017975 -2.556568 1.0866e-02 *
.Dtreat:first.treat::2004:year::2004:lpop_dm 0.004628 0.017584 0.263184 7.9252e-01
... 6 coefficients remaining (display them with summary() or use argument n)
... 10 variables were removed because of collinearity (.Dtreat:first.treat::2006:year::2004, .Dtreat:first.treat::2006:year::2005 and 8 others [full set in $collin.var])
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.537131 Adj. R2: 0.87167
Within R2: 8.449e-4
>
> # Get aggregated ATT:
> emfx(mod)
Error: The function supplied to the `transform_pre` argument must accept two numeric vectors of predicted probabilities of length 0, and return a single numeric value or a numeric vector
of length 0, with no missing value.
In addition: Warning message:
In formals(fun) : argument is not a function
Code:
# Error traceback:
traceback()
Output:
> # Error traceback:
> traceback()
11: stop(format_message(string = string, ..., line_length = line_length,
indent = indent), call. = call.)
10: format_alert(..., type = "error")
9: insight::format_error(msg)
8: safefun(hi = predicted_hi, lo = predicted_lo, y = predicted,
n = .N, term = term, cross = cross, wts = marginaleffects_wts_internal,
tmp_idx = tmp_idx)
7: `[.data.table`(out, , `:=`("estimate", safefun(hi = predicted_hi,
lo = predicted_lo, y = predicted, n = .N, term = term, cross = cross,
wts = marginaleffects_wts_internal, tmp_idx = tmp_idx)),
by = idx)
6: out[, `:=`("estimate", safefun(hi = predicted_hi, lo = predicted_lo,
y = predicted, n = .N, term = term, cross = cross, wts = marginaleffects_wts_internal,
tmp_idx = tmp_idx)), by = idx]
5: get_contrasts(structure(list(nobs = 2500L, nobs_origin = 2500L,
fml = lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003)/lpop_dm,
call = fixest::feols(fml = Fml, data = data, vcov = ..1,
notes = FALSE), call_env = <environment>, method = "feols",
method_type = "feols", fml_all = list(linear = lemp ~ .Dtreat:i(first.treat,
i.year, ref = 0, ref2 = 2003)/lpop_dm, fixef = ~first.treat +
first.treat[[lpop]] + year + year[[lpop]]), fml_no_xpd = lemp ~
.Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003)/lpop_dm |
first.treat[lpop] + year[lpop], fixef.tol = 1e-06,
fixef.iter = 10000, nparams = 31, fixef_vars = c("first.treat",
"year"), fixef_terms = c("first.treat", "first.treat[[lpop]]",
"year", "year[[lpop]]"), slope_flag = c(1L, 1L), slope_flag_reordered = c(1L,
1L), slope_variables_reordered = list(lpop = c(5.8967609333053,
5.8967609333053, 5.8967609333053, 5.8967609333053, 5.8967609333053,
2.23237719795055, 2.23237719795055, 2.23237719795055, 2.23237719795055,
2.23237719795055, 1.29828248379668, 1.29828248379668, 1.29828248379668,
1.29828248379668, 1.29828248379668, 3.32625829499766, 3.32625829499766,
3.32625829499766, 3.32625829499766, 3.32625829499766, 6.24790553432335,
6.24790553432335, 6.24790553432335, 6.24790553432335, 6.24790553432335,
2.08081559723298, 2.08081559723298, 2.08081559723298, 2.08081559723298,
...
4: do.call("get_contrasts", args)
3: comparisons(model, newdata = newdata, variables = variables,
vcov = vcov, conf_level = conf_level, type = type, wts = wts,
hypothesis = hypothesis, df = df, by = by, eps = eps, transform_pre = slope,
cross = FALSE, internal_call = TRUE, ...)
2: marginaleffects::slopes(object, newdata = dat, wts = "N", variables = ".Dtreat",
by = by_var, ...)
1: emfx(mod)
Code:
# Session information:
sessionInfo()
Output:
> # Session information:
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 12.0.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] etwfe_0.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 lattice_0.20-41 fixest_0.10.4 zoo_1.8-8 marginaleffects_0.10.0 grid_4.0.4 nlme_3.1-152 backports_1.2.1
[9] dreamerr_1.2.3 data.table_1.14.2 checkmate_2.0.0 generics_0.1.0 sandwich_3.0-2 Formula_1.2-4 tools_4.0.4 numDeriv_2016.8-1.1
[17] compiler_4.0.4 insight_0.19.0
Allow users to easily specify which of the following three equivalent models they want:
# varying slopes (current default)
feols(
lemp ~ .Dtreat : i(first_treat, i.year, ref = 0, ref2 = 2003) / lpop_dm |
first_treat[lpop] + year[lpop],
data = mpdta, vcov = ~countyreal
)
# fes only
feols(
lemp ~ .Dtreat : i(first_treat, i.year, ref = 0, ref2 = 2003) / lpop_dm +
lpop + # changed
i(first_treat, lpop, ref = 0) + # changed
i(year, lpop, ref = 2003) | # changed
first_treat + year, # changed
data = mpdta, vcov = ~countyreal
)
# no fes at all
feols(
lemp ~ .Dtreat : i(first_treat, i.year, ref = 0, ref2 = 2003) / lpop_dm +
lpop + # changed
i(first_treat, lpop, ref = 0) + # changed
i(year, lpop, ref = 2003) + # changed
i(first_treat, ref = 0) + i(year, ref = 2003), # changed
data = mpdta, vcov = ~countyreal
)
Complication: Need to double check multiple controls...
Hi @grantmcdermott,
just noticed a possible bug?
If the data being used already has a variable named "group" then the emfx()
function does not seem to aggregate the effects.
In the example below, I first run the same example that you have on your website, so emfx()
works as expected, and then modify the data adding a variable named group. Note that this new variable is not used in the etwfe()
function, and yet emfx()
does not aggregate the results.
Cheers,
Mario
data("mpdta", package = "did")
head(mpdta)
mod =
etwfe::etwfe(
fml = lemp ~ lpop, # outcome ~ controls
tvar = year, # time variable
gvar = first.treat, # group variable
data = mpdta, # dataset
vcov = ~countyreal # vcov adjustment (here: clustered)
)
etwfe::emfx(mod)
_Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
.Dtreat mean(TRUE) - mean(FALSE) TRUE -0.0506 0.0125 -4.05 <0.001 14.3 -0.0751 -0.0261_
mpdta$group <- mpdta$countyreal
mod2 =
etwfe::etwfe(
fml = lemp ~ lpop, # outcome ~ controls
tvar = year, # time variable
gvar = first.treat, # group variable
data = mpdta, # dataset
vcov = ~countyreal # vcov adjustment (here: clustered)
)
etwfe::emfx(mod2)
_Group Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
8001 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.09431 0.0330 -2.8579 0.00426 7.9 -0.1590 -0.029634
8019 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.02163 0.0334 -0.6473 0.51746 1.0 -0.0871 0.043870
8023 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.00310 0.0474 -0.0654 0.94785 0.1 -0.0961 0.089886
8029 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.04333 0.0193 -2.2501 0.02444 5.4 -0.0811 -0.005587
8041 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.10128 0.0382 -2.6539 0.00796 7.0 -0.1761 -0.026483
--- 181 rows omitted. See ?print.marginaleffects ---
55103 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.05469 0.0224 -2.4459 0.01445 6.1 -0.0985 -0.010865
55109 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.00621 0.0205 -0.3027 0.76208 0.4 -0.0464 0.033973
55123 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.03744 0.0191 -1.9587 0.05015 4.3 -0.0749 0.000024
55131 .Dtreat mean(TRUE) - mean(FALSE) TRUE 0.01769 0.0269 0.6585 0.51024 1.0 -0.0350 0.070355
55137 .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.04484 0.0202 -2.2177 0.02658 5.2 -0.0845 -0.005210
Columns: group, term, contrast, .Dtreat, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo_
So thrilled for this and exactly what I was looking for yesterday! I'm wondering whether controls are required, as I can't seem to get your example to run without specifying controls (I'm starting very simply with my own analysis). All is well with your example:
> library(etwfe)
> data("mpdta", package = "did")
> mod =
+ etwfe(
+ fml = lemp ~ lpop, # outcome ~ controls
+ tvar = year, # time variable
+ gvar = first.treat, gref=0, # group variable (with bespoke ref. level)
+ data = mpdta, # dataset
+ vcov = ~countyreal # vcov adjustment (here: clustered)
+ )
> mod
OLS estimation, Dep. Var.: lemp
Observations: 2,500
Fixed-effects: first.treat: 4, year: 5
Varying slopes: lpop (first.treat: 4), lpop (year: 5)
Standard-errors: Clustered (countyreal)
Estimate Std. Error t value Pr(>|t|)
.Dtreat:first.treat::2004:year::2004 -0.021248 0.021728 -0.977890 3.2860e-01
.Dtreat:first.treat::2004:year::2005 -0.081850 0.027375 -2.989963 2.9279e-03 **
.Dtreat:first.treat::2004:year::2006 -0.137870 0.030795 -4.477097 9.3851e-06 ***
.Dtreat:first.treat::2004:year::2007 -0.109539 0.032322 -3.389024 7.5694e-04 ***
.Dtreat:first.treat::2006:year::2006 0.002537 0.018883 0.134344 8.9318e-01
.Dtreat:first.treat::2006:year::2007 -0.045093 0.021987 -2.050907 4.0798e-02 *
.Dtreat:first.treat::2007:year::2007 -0.045955 0.017975 -2.556568 1.0866e-02 *
.Dtreat:first.treat::2004:year::2004:lpop_dm 0.004628 0.017584 0.263184 7.9252e-01
... 6 coefficients remaining (display them with summary() or use argument n)
... 10 variables were removed because of collinearity (.Dtreat:first.treat::2006:year::2004, .Dtreat:first.treat::2006:year::2005 and 8 others [full set in $collin.var])
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.537131 Adj. R2: 0.87167
Within R2: 8.449e-4
If I drop the controls altogether, it throws an error:
> mod =
etwfe(
fml = lemp, # outcome ~ controls
tvar = year, # time variable
gvar = first.treat, gref=0, # group variable (with bespoke ref. level)
data = mpdta, # dataset
vcov = ~countyreal # vcov adjustment (here: clustered)
)
Error in etwfe(fml = lemp, tvar = year, gvar = first.treat, gref = 0, :
object 'lemp' not found
I also tried a few other iterations, e.g., fml = lemp ~
or fml = lemp ~ 1
but these also generate errors.
I know this is all in development, and am really thankful for the resource. Thanks again.
Dear Grant,
Thanks for your amazing work. Wooldridge's estimator allows interacting a binary treatment dummy with a time-constant covariate (say, gender - male / female). Then, you get the effect separately for all groups.
I tried to implement the estimator including this feature on my own, but it's buggy and occasionally the standard errors explode (Too small groups etc.)
Are you planning to include this interaction-option in your package, or did you even already implement it?
Thanks and best, Frederic
When I follow the vignette and run the function emfx, I get the following error: Error in variable_classes[[v$name]] : subscript out of bounds
Something like the following, adapted from subset
, internally. Would add a few microseconds in overhead, but it handles both expressions and strings.
nse_func = function(vars, data) {
nl = as.list(seq_along(data))
names(nl) = names(data)
vars = eval(substitute(vars), nl, parent.frame())
if (is.numeric(vars)) vars = names(data)[vars]
# Simple return object to demonstrate
head(data[, vars, drop = FALSE], 2)
}
# Handles both strings and expressions
nse_func("mpg", mtcars)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
nse_func(mpg, mtcars)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
# Vectors are fine too (although not expected for etwfe)
nse_func(c("mpg", "cyl"), mtcars)
#> mpg cyl
#> Mazda RX4 21 6
#> Mazda RX4 Wag 21 6
nse_func(c(mpg, cyl), mtcars)
#> mpg cyl
#> Mazda RX4 21 6
#> Mazda RX4 Wag 21 6
nse_func(c(mpg:cyl, hp), mtcars)
#> mpg cyl hp
#> Mazda RX4 21 6 110
#> Mazda RX4 Wag 21 6 110
nse_func(names(mtcars), mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Created on 2022-10-15 with reprex v2.0.2
C.f. With the jwdid
Poisson model at the very bottom of this post.
The main results are exactly the same.
library(etwfe)
data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)
# poisson model
pmod = etwfe(emp ~ lpop, tvar=year, gvar=first.treat, gref=0, data=mpdta,
vcov=~countyreal, family = "poisson", fe = "none")
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and eight others have been removed because of collinearity (see $collin.var).
fixest::etable(pmod, digits = 7, signif.code = NA)
#> pmod
#> Dependent Var.: emp
#>
#> Constant 2.484455 (0.0774941)
#> lpop 1.039625 (0.0168640)
#> lpop x first.treat = 2004 -0.0132716 (0.0404687)
#> lpop x first.treat = 2006 -0.0811734 (0.0330063)
#> lpop x first.treat = 2007 0.0162899 (0.0250988)
#> lpop x year = 2004 -0.0066118 (0.0045127)
#> lpop x year = 2005 -0.0077402 (0.0059153)
#> lpop x year = 2006 -0.0090772 (0.0065360)
#> lpop x year = 2007 -0.0035862 (0.0054845)
#> first.treat = 2004 0.2361076 (0.2079180)
#> first.treat = 2006 0.5927050 (0.1770658)
#> first.treat = 2007 -0.1440501 (0.1223770)
#> year = 2004 -0.0105493 (0.0209058)
#> year = 2005 0.0112893 (0.0270397)
#> year = 2006 0.0453898 (0.0291563)
#> year = 2007 0.0542584 (0.0280050)
#> .Dtreat x first.treat = 2004 x year = 2004 -0.0309557 (0.0176208)
#> .Dtreat x first.treat = 2004 x year = 2005 -0.0662240 (0.0255833)
#> .Dtreat x first.treat = 2004 x year = 2006 -0.1329713 (0.0237253)
#> .Dtreat x first.treat = 2004 x year = 2007 -0.1170228 (0.0225006)
#> .Dtreat x first.treat = 2006 x year = 2006 -0.0090494 (0.0244213)
#> .Dtreat x first.treat = 2006 x year = 2007 -0.0681486 (0.0251113)
#> .Dtreat x first.treat = 2007 x year = 2007 -0.0399916 (0.0168249)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2004 0.0118798 (0.0071285)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2005 0.0209935 (0.0119250)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2006 0.0409103 (0.0095714)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2007 0.0250348 (0.0112873)
#> .Dtreat x lpop_dm x first.treat = 2006 x year = 2006 0.0375837 (0.0136183)
#> .Dtreat x lpop_dm x first.treat = 2006 x year = 2007 0.0453804 (0.0164515)
#> .Dtreat x lpop_dm x first.treat = 2007 x year = 2007 -0.0109675 (0.0072256)
#> ________________________________________ ______________________
#> S.E.: Clustered by: countyreal
#> Observations 2,500
#> Squared Cor. 0.95698
#> Pseudo R2 0.95262
#> BIC 287,872.5
But the marginal effects are different.
emfx(pmod, type = "event")
#> Term Contrast event Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 .Dtreat mean(dY/dX) 0 -22.849 16.17 -1.41348 0.15751423 -54.53 8.834
#> 2 .Dtreat mean(dY/dX) 1 3.654 41.81 0.08739 0.93035790 -78.30 85.607
#> 3 .Dtreat mean(dY/dX) 2 -71.521 22.06 -3.24264 0.00118430 -114.75 -28.291
#> 4 .Dtreat mean(dY/dX) 3 -97.798 25.45 -3.84321 0.00012144 -147.67 -47.923
#>
#> Model type: etwfe
#> Prediction type: response
@vincentarelbundock Do you have any ideas? (Sorry for tagging and running, but wanted to get this down quickly before I have to head out!)
In case you missed it: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4516518
Fixest's aggregate command may be an efficient alternative for inference in cases when marginaleffects
takes a long time. Consider the MWE below:
library(etwfe)
data("mpdta", package="did")
mod = etwfe(
fml = lemp ~ lpop,
tvar = year,
gvar = first.treat,
data = mpdta,
vcov = ~countyreal
)
emfx(mod)
aggregate(mod, c("ATT"="^.Dtreat:first.treat::[0-9]{4}:year::[0-9]{4}$"))
emfx
returns an estimate of -0.05062703
with std. error of 0.01249979
, and aggregate
returns the same exact values in this case.
FYI, marginaleffects
0.8.1 is on its way to CRAN.
We can just grab the "xvar" variable from the model attributes (and this should probably be the default behaviour if xvar was specified in the preceding etwfe
call).
Hi Scott, thanks for the great package.
I am using etwfe in a Poisson regression with an exposure variable.
etwfe() works but emfx() returns an error message.
Example using mpdta data and the lpop variable as the exposure.
data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)
etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson", offset = ~log(lpop)
) |>
emfx()
Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata` argument. If
this does not work, you can file a report on the Github Issue Tracker:
https://github.com/vincentarelbundock/marginaleffects/issues
It looks like a problem with the marginaleffects
package. Let me know if there is a simple workaround or if I am mistaken.
Hi Grant, thanks a lot for the package!
I succesfully run my model with etwfe()
, but when i try to aggregate it by the emfx
command, i get the following error:
> emfx(teste, type = "calendar")
Error: Models of class "etwfe" are not supported. Supported model classes include:
afex_aov, amest, betareg, bglmerMod, bigglm, biglm, blmerMod, bracl, brglmFit, brmsfit, brnb, clm, clogit, coxph, crch,
fixest, flac, flic, gam, Gam, gamlss, geeglm, glimML, glm, glmerMod, glmmPQL, glmmTMB, glmrob, glmx, gls, hetprob, hurdle,
hxlr, iv_robust, ivpml, ivreg, lm, lm_robust, lme, lmerMod, lmerModLmerTest, lmrob, lmRob, loess, logistf, lrm, mblogit,
mclogit, MCMCglmm, mhurdle, mira, mlogit, multinom, negbin, nls, ols, oohbchoice, orm, phyloglm, phylolm, plm, polr,
Rchoice, rlmerMod, rq, scam, selection, speedglm, speedlm, stanreg, svyolr, tobit, tobit1, truncreg, zeroinfl
New modeling packages can usually be supported by `marginaleffects` if they include a working `predict()` method. If you
believe that this is the case, please file a feature request on Github:
https://github.com/vincentarelbundock/marginaleffects/issues
(Note to self: See email correspondence with @fpretis.)
Dear Professor
Greetings, I've been reading your etwfe code but I'm having a little problem with the gvar variable, my treatment is cumulative percentage km2 of protected areas (percentacumestricta) and outcome (RLI) these variables are continuous and I have data from 2000 to 2014 by country (ID) -year (t).
treated=1 if percentacumestricta >17
I created gvar variable:
bysort ID: egen min_year=min(year) if treated==1
bysort ID: egen First_treated=max(min_year)
replace First_treated=0 if First_treated==.
tab year First_treated
However I couldnยดt understand how you create your gvar, Could you explain me that please?
I really appreciate your comments
Thanks for your time
All the best
Hi @grantmcdermott ,
First of all, thank you for providing this valuable package. It has been instrumental in helping me understand the implementation details of the Extended Two-Way Fixed Effects method developed by Wooldridge.
I do, however, have a question regarding the computation of the simple ATT using a never treated control group. In my experiments, I have noticed significant differences between the results obtained from the etwfe function and those obtained from the Callaway-Sanntanna estimator, as well as when I manually run the etwfe procedure.
After executing the etwfe function, I computed the simple ATT using the marginaleffects::slopes() function and obtained an estimate of -0.0404, whereas the estimate I obtained using emfx() was -0.00166.
I must be overlooking some important detail. To provide further context, I have attached the details of my experiments below.
Thank you in advance!
library(did)
library(etwfe)
library(broom)
library(stringr)
library(marginaleffects)
library(fixest)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
options(knitr.kable.NA = "-")
# Data -------------------------------------------------------------------------
mpdta <- did::mpdta %>%
mutate(
.Dtreat = treat * (year >= first.treat),
time_to_event = ifelse(first.treat != 0, year - first.treat, -1000)
)
head(mpdta)
#> year countyreal lpop lemp first.treat treat .Dtreat time_to_event
#> 866 2003 8001 5.896761 8.461469 2007 1 0 -4
#> 841 2004 8001 5.896761 8.336870 2007 1 0 -3
#> 842 2005 8001 5.896761 8.340217 2007 1 0 -2
#> 819 2006 8001 5.896761 8.378161 2007 1 0 -1
#> 827 2007 8001 5.896761 8.487352 2007 1 1 0
#> 937 2003 8019 2.232377 4.997212 2007 1 0 -4
# Callaway-Sant'Anna - Never Treated Control -----------------------------------
cs_never <- att_gt(
yname = "lemp",
tname = "year",
idname = "countyreal",
gname = "first.treat",
data = mpdta,
control_group = "nevertreated"
)
cs_never_simple <- aggte(cs_never, type = "simple") %>%
tidy() %>%
select(type, cs_never = estimate)
cs_never_dynamic <- aggte(cs_never, type = "dynamic") %>%
tidy() %>%
select(type, event = event.time, cs_never = estimate)
cs_never <- tidy(cs_never) %>%
mutate(type = "group_time") %>%
select(type, group, time, cs_never = estimate) %>%
mutate(
group = as.character(group),
time = as.character(time)
)
cs_never_res <- bind_rows(cs_never, cs_never_simple, cs_never_dynamic)
# Callaway-Sant'Anna - Not Yet Treated Control ---------------------------------
cs_notyet <- att_gt(
yname = "lemp",
tname = "year",
idname = "countyreal",
gname = "first.treat",
data = mpdta,
control_group = "notyettreated"
)
cs_notyet_simple <- aggte(cs_notyet, type = "simple") %>%
tidy() %>%
select(type, cs_notyet = estimate)
cs_notyet_dynamic <- aggte(cs_notyet, type = "dynamic") %>%
tidy() %>%
select(type, event = event.time, cs_notyet = estimate)
cs_notyet <- tidy(cs_notyet) %>%
mutate(type = "group_time") %>%
select(type, group, time, cs_notyet = estimate) %>%
mutate(
group = as.character(group),
time = as.character(time)
)
cs_notyet_res <- bind_rows(cs_notyet, cs_notyet_simple, cs_notyet_dynamic)
# Extended TWFE - Never Treated Control ----------------------------------------
etwfe_never <- etwfe(
fml = lemp ~ 1,
tvar = year,
gvar = first.treat,
data = mpdta,
ivar = countyreal,
cgroup = "never"
)
# compute simple ATT manually
slopes(
model = etwfe_never,
newdata = mpdta %>% filter(.Dtreat == 1),
variable = ".Dtreat",
by = ".Dtreat"
) %>%
print()
#>
#> Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|)
#> .Dtreat mean(TRUE) - mean(FALSE) 1 -0.0404 0.0196 -2.06 0.0396
#> S 2.5 % 97.5 %
#> 4.7 -0.0789 -0.00193
#>
#> Columns: term, contrast, .Dtreat, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
etwfe_never_simple <- emfx(etwfe_never, "simple") %>%
tidy() %>%
mutate(type = "simple") %>%
select(type, etwfe_never = estimate)
etwfe_never_dynamic <- emfx(etwfe_never, "event") %>%
tidy() %>%
mutate(type = "dynamic") %>%
select(type, event, etwfe_never = estimate) %>%
filter(event < 2000)
etwfe_never <- tidy(etwfe_never) %>%
mutate(
type = "group_time",
group = str_extract(term, "(?<=::)[0-9]{4}(?=:year)"),
time = str_extract(term, "(?<=year::)[0-9]{4}")
) %>%
select(type, group, time, etwfe_never = estimate)
etwfe_never_res <- bind_rows(
etwfe_never, etwfe_never_simple, etwfe_never_dynamic
)
# Extended TWFE - Not Yet Treated Control --------------------------------------
etwfe_notyet <- etwfe(
fml = lemp ~ 1,
tvar = year,
gvar = first.treat,
data = mpdta,
ivar = countyreal,
cgroup = "notyet"
)
etwfe_notyet_simple <- emfx(etwfe_notyet, "simple") %>%
tidy() %>%
mutate(type = "simple") %>%
select(type, etwfe_notyet = estimate)
etwfe_notyet_dynamic <- emfx(etwfe_notyet, "event") %>%
tidy() %>%
mutate(type = "dynamic") %>%
select(type, event, etwfe_notyet = estimate) %>%
filter(event < 2000)
etwfe_notyet <- tidy(etwfe_notyet) %>%
mutate(
type = "group_time",
group = str_extract(term, "(?<=::)[0-9]{4}(?=:year)"),
time = str_extract(term, "(?<=year::)[0-9]{4}")
) %>%
select(type, group, time, etwfe_notyet = estimate)
etwfe_notyet_res <- bind_rows(
etwfe_notyet, etwfe_notyet_simple, etwfe_notyet_dynamic
)
# Extended TWFE Manual - Never Treated Control ---------------------------------
manual_never <- feols(
lemp ~
treat:i(first.treat, i.time_to_event, ref = 0, ref2 = -1) |
first.treat + year,
data = mpdta,
cluster = ~countyreal
)
manual_never_simple <- slopes(
model = manual_never,
newdata = mpdta %>% filter(time_to_event >= 0),
variables = "treat",
by = "treat",
) %>%
tidy() %>%
mutate(type = "simple") %>%
select(type, manual_never = estimate)
manual_never_dynamic <- slopes(
model = manual_never,
newdata = mpdta %>% filter(treat == 1),
variables = "treat",
by = "time_to_event",
) %>%
tidy() %>%
mutate(type = "dynamic") %>%
select(type, event = time_to_event, manual_never = estimate)
manual_never <- tidy(manual_never) %>%
mutate(
type = "group_time",
group = str_extract(term, "(?<=::)[0-9]{4}(?=:time)"),
time = str_extract(term, "(?<=time_to_event::)[-0-9]{1,2}"),
time = as.numeric(group) + as.numeric(time),
time = as.character(time)
) %>%
select(type, group, time, manual_never = estimate)
manual_never_res <- bind_rows(
manual_never, manual_never_simple, manual_never_dynamic
)
# Extended TWFE Manual - Not Yet Treated Control -------------------------------
manual_notyet <- feols(
lemp ~
.Dtreat:i(first.treat, i.time_to_event, ref = 0, ref2 = -1) |
first.treat + year,
data = mpdta,
cluster = ~countyreal
)
#> The variables '.Dtreat:first.treat::2006:time_to_event::-3', '.Dtreat:first.treat::2006:time_to_event::-2' and three others have been removed because of collinearity (see $collin.var).
manual_notyet_simple <- slopes(
model = manual_notyet,
newdata = mpdta %>% filter(.Dtreat == 1),
variables = ".Dtreat",
by = ".Dtreat",
) %>%
tidy() %>%
mutate(type = "simple") %>%
select(type, manual_notyet = estimate)
manual_notyet_dynamic <- slopes(
model = manual_notyet,
newdata = mpdta %>% filter(.Dtreat == 1),
variables = ".Dtreat",
by = "time_to_event",
) %>%
tidy() %>%
mutate(type = "dynamic") %>%
select(type, event = time_to_event, manual_notyet = estimate)
manual_notyet <- tidy(manual_notyet) %>%
mutate(
type = "group_time",
group = str_extract(term, "(?<=::)[0-9]{4}(?=:time)"),
time = str_extract(term, "(?<=time_to_event::)[-0-9]{1,2}"),
time = as.numeric(group) + as.numeric(time),
time = as.character(time)
) %>%
select(type, group, time, manual_notyet = estimate)
manual_notyet_res <- bind_rows(
manual_notyet, manual_notyet_simple, manual_notyet_dynamic
)
# Combine ----------------------------------------------------------------------
results <- full_join(
x = cs_never_res,
y = cs_notyet_res,
by = join_by(type, group, time, event)
) %>%
full_join(
y = etwfe_never_res,
by = join_by(type, group, time, event)
) %>%
full_join(
y = etwfe_notyet_res,
by = join_by(type, group, time, event)
) %>%
full_join(
y = manual_never_res,
by = join_by(type, group, time, event)
) %>%
full_join(
y = manual_notyet_res,
by = join_by(type, group, time, event)
)
# Simple ATT -------------------------------------------------------------------
results %>%
filter(type == "simple") %>%
select(cs_never:manual_notyet, -event) %>%
knitr::kable(digits = 4, format = "markdown")
cs_never | cs_notyet | etwfe_never | etwfe_notyet | manual_never | manual_notyet |
---|---|---|---|---|---|
-0.04 | -0.0398 | -0.0017 | -0.0477 | -0.04 | -0.0477 |
# Dynamic ATT ------------------------------------------------------------------
results %>%
filter(type == "dynamic") %>%
select(event, cs_never:manual_notyet) %>%
arrange(event) %>%
knitr::kable(digits = 4, format = "markdown")
event | cs_never | cs_notyet | etwfe_never | etwfe_notyet | manual_never | manual_notyet |
---|---|---|---|---|---|---|
-4 | - | - | 0.0000 | - | 0.0033 | - |
-3 | 0.0305 | 0.0298 | 0.0234 | - | 0.0250 | - |
-2 | -0.0006 | -0.0024 | 0.0228 | - | 0.0245 | - |
-1 | -0.0245 | -0.0243 | -0.0015 | - | 0.0000 | - |
0 | -0.0199 | -0.0189 | -0.0214 | -0.0311 | -0.0199 | -0.0311 |
1 | -0.0510 | -0.0536 | -0.0484 | -0.0522 | -0.0510 | -0.0522 |
2 | -0.1373 | -0.1363 | -0.1373 | -0.1361 | -0.1373 | -0.1361 |
3 | -0.1008 | -0.1008 | -0.1008 | -0.1047 | -0.1008 | -0.1047 |
# Group-Time ATT ---------------------------------------------------------------
results %>%
filter(type == "group_time") %>%
select(group, time, cs_never:manual_notyet, -event) %>%
knitr::kable(digits = 4, format = "markdown")
group | time | cs_never | cs_notyet | etwfe_never | etwfe_notyet | manual_never | manual_notyet |
---|---|---|---|---|---|---|---|
2004 | 2004 | -0.0105 | -0.0194 | -0.0105 | -0.0194 | -0.0105 | -0.0194 |
2004 | 2005 | -0.0704 | -0.0783 | -0.0704 | -0.0783 | -0.0704 | -0.0783 |
2004 | 2006 | -0.1373 | -0.1363 | -0.1373 | -0.1361 | -0.1373 | -0.1361 |
2004 | 2007 | -0.1008 | -0.1008 | -0.1008 | -0.1047 | -0.1008 | -0.1047 |
2006 | 2004 | 0.0065 | -0.0026 | 0.0065 | - | 0.0028 | - |
2006 | 2005 | -0.0028 | -0.0019 | 0.0038 | - | - | - |
2006 | 2006 | -0.0046 | 0.0047 | -0.0008 | 0.0025 | -0.0046 | 0.0025 |
2006 | 2007 | -0.0412 | -0.0412 | -0.0375 | -0.0392 | -0.0412 | -0.0392 |
2007 | 2004 | 0.0305 | 0.0298 | 0.0305 | - | 0.0338 | - |
2007 | 2005 | -0.0027 | -0.0024 | 0.0278 | - | 0.0311 | - |
2007 | 2006 | -0.0311 | -0.0311 | -0.0033 | - | - | - |
2007 | 2007 | -0.0261 | -0.0261 | -0.0294 | -0.0431 | -0.0261 | -0.0431 |
2006 | 2003 | - | - | - | - | -0.0038 | - |
2007 | 2003 | - | - | - | - | 0.0033 | - |
Created on 2023-06-21 with reprex v2.0.2
... so that it automatically invokes marginaleffects::comparisons
under the hood for emfx
. (i.e. evaluates mean(.Dtreat==1
- mean(.Dtreat==0)instead of
mean(dYdX)`).
This won't make a difference for linear models, but will slightly change the results for non-linear families. So it's a bug fix.
Ironically, this was the default in the first iterations of the package. But I changed it whilst troubleshooting the jwdid
difference in #4.
Hi Grant,
Thanks for sharing this package - it looks really cool! Naturally, I am wondering if I can wild (cluster) bootstrap this. =)
After taking a very brief (really very, very brief) glance at both code and the Wooldridge paper, my very topline understanding of the estimation procedure is that etwfe
implements a linear regression / glm with a set of interacted covariates and a few estimated properties (e.g. equation 5.7 in the Wooldridge paper attached below, where
After estimating the 'full' model, one then computes marginal effects, e.g. for the effect on a specific cohort, you compute the marginal effect for a given cohort. If we assume that there are
Here's a quick formulaic example to (hopefully) clarify:
We estimate a linear model
via OLS.
We then compute the average marginal effect at the mean for
Inference for the marginal effect then follows a standard
Do I get this right? If inference can be squeezed into the fwildclusterboot
work with etwfe
- but only under the assumption that
Best, Alex
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.