aahrens1 / ddml

Double/Debiased Machine Learning implementation for Stata

License: MIT License

Stata 98.08% TeX 0.01% R 1.91%

machine-learning causal-inference stata

ddml's Issues

add "simple" Stacking within Stata

Saved ATE/ATET/ATEU results

All estimation results are saved in associative arrays. These are currently uniquely identified by the model name (the Mata object), the specification (number, "ss" or "mse") and the resample (number, "mn" or "md").

This isn't enough to uniquely identify the results for the interactive model because there are three flavours: ATE, ATET and ATEU. Current behaviour is to save the last one estimated, so if you e.g. estimate ATE and then ATET, the saved ATE will be overwritten.

May need to add an additional associative array key to deal with this.
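
The overwrite described above, and the effect of an extra key, can be sketched in a few lines. This is an illustrative Python analogue only, not ddml's Mata code: the dictionary stands in for the associative array, and the key layout, the function name `save_result`, and the stored values are all assumptions made for the example.

```python
# Hypothetical sketch: a dict stands in for the Mata associative array.
# Key parts and stored values are illustrative, not ddml internals.

# Three-part key (model, specification, resample): estimands collide.
three_part = {}
three_part[("m0", "ss", "mn")] = "ATE estimate"
three_part[("m0", "ss", "mn")] = "ATET estimate"  # replaces the ATE entry


def save_result(store, model, spec, rep, estimand, value):
    """Store one result under a four-part key that includes the estimand."""
    store[(model, spec, rep, estimand)] = value


# Four-part key: ATE and ATET are kept side by side.
four_part = {}
save_result(four_part, "m0", "ss", "mn", "ATE", "ATE estimate")
save_result(four_part, "m0", "ss", "mn", "ATET", "ATET estimate")

print(len(three_part))  # number of entries that survive with the short key
print(len(four_part))   # number of entries that survive with the long key
```

With the estimand in the key, estimating ATET after ATE no longer replaces the earlier entry.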

return weights when used with pystacked

Can the DDML command be used for mediator and moderator analysis?

Can the DDML command be used for mediator and moderator analysis? If so, please provide some examples.
Thank you!

Flexible IV output

Estimation output has the Y learner and D learner but is missing the DH learner. From the help file:

Min MSE DDML model, specification 7
y-E[y|X]  = Y2_pystacked_1                         Number of obs   =      2217
D-E[D|X,Z]= Dhat_pystacked_1
------------------------------------------------------------------------------
       share |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.0444965   .0046333    -9.60   0.000    -.0535775   -.0354154
------------------------------------------------------------------------------

optimal iv model: fix LIE bug

Add support for weights

crossfit needs to accommodate weights in parsing of estimation strings.

Error: Cross-fitting fold 1 RMSPE_cv not found

The example code in the "Interactive model--ATE and ATET estimation" section of the ddml help file does not run.

This is the example code:
'''
webuse cattaneo2, clear

global Y bweight
global D mbsmoke
global X prenatal1 mmarried fbaby mage medu
set seed 42

ddml init interactive, kfolds(5) reps(5)

ddml E[Y|X,D]: pystacked $Y $X, type(reg) methods(ols gradboost)
ddml E[D|X]: pystacked $D $X, type(class) methods(logit gradboost)

ddml crossfit

ddml estimate
ddml estimate, atet
'''

Running this code produces a "Cross-fitting fold 1 RMSPE_cv not found" error, probably because of a problem with how the ols and logit options are passed.

Too many combos

The number of combinations of learners can explode, especially when more than 2 conditional expectations are being estimated.

We should probably have some kind of "nocombos" or "ssonly" option so that the only results that are reported are the shortstack results.

Might want to add a warning if the number of combinations is large, or maybe have the number of combinations reported by ddml describe, or maybe report it as part of the crossfitting output.
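
To make the growth concrete: the number of combinations is the product of the number of learners per conditional expectation. The Python sketch below is illustrative only. The equation labels, learner names, and the warning threshold are assumptions, and it does not reflect how ddml itself enumerates specifications.

```python
from itertools import product
from math import prod

# Hypothetical learners per conditional expectation (names are illustrative).
learners = {
    "E[Y|X]": ["Y1", "Y2", "Y3"],
    "E[D|X,Z]": ["D1", "D2", "D3"],
    "E[Z|X]": ["Z1", "Z2", "Z3"],
}


def n_combos(learners_by_eq):
    """Number of specifications = product of learner counts per equation."""
    return prod(len(names) for names in learners_by_eq.values())


def all_combos(learners_by_eq):
    """Enumerate every combination, one learner per equation."""
    return list(product(*learners_by_eq.values()))


WARN_THRESHOLD = 20  # arbitrary cut-off chosen for this sketch

n = n_combos(learners)
if n > WARN_THRESHOLD:
    print(f"warning: {n} learner combinations; consider shortstack-only output")
```

Three learners in each of three equations already give 27 specifications, and adding a fourth equation with three learners would give 81.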

Renaming options

optimaliv to ivhd?
ddml sample to something else? (nb: both ddml init and ddml sample take reps(.) and kfolds(.) options)

Flexible IV + multiple Ds

The allcombos code for flexible IV doesn't work properly when there are multiple endogenous regressors.

The problem is that the D and D_h learners need to be paired together, but this isn't being respected. Say there are 2 D variables, D1 and D2. As written, the code can pair a learner for D1 with a D_h learner for D2.

The fix for this would be a bit messy, and multiple endogenous regressors is not a common specification, so for now I've added a check that disallows multiple D variables with flexible IV.
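
The pairing problem can be sketched as follows. This is a simplified Python illustration, not the allcombos code: the learner names and the (variable, learner) tuple layout are assumptions made for the example.

```python
from itertools import product

# Hypothetical learners, each tagged with the D variable it belongs to.
d_learners = [("D1", "D1_rf"), ("D2", "D2_rf")]
dh_learners = [("D1", "D1_rf_h"), ("D2", "D2_rf_h")]


def naive_pairs(d, dh):
    """Cross every D learner with every D_h learner, ignoring the variable."""
    return [(a[1], b[1]) for a, b in product(d, dh)]


def valid_pairs(d, dh):
    """Keep only pairs whose D and D_h learners refer to the same variable."""
    return [(a[1], b[1]) for a, b in product(d, dh) if a[0] == b[0]]


print(naive_pairs(d_learners, dh_learners))  # includes cross-variable pairs
print(valid_pairs(d_learners, dh_learners))  # same-variable pairs only
```

The naive cross product includes pairs such as a D1 learner with a D2 D_h learner, which is the mismatch described above. Filtering on the variable name removes them.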

How to add interactive variables and estimate interactive effects in "Interactive model--ATE and ATET estimation" method?

The help document for the ddml command does not give any information on how to add interactive variables and estimate interactive effects in the "Interactive model--ATE and ATET estimation" method.

I want to analyse interactive effects using ddml. Could you give me some advice?

Thanks very much!

How to interpret the coefficients of the ddml command output table?

Below is the code and output table for the analysis using the ddml command. What is the meaning of the coefficients of the variables in the output table? How can they be interpreted?

*varsVG9
local varsVG9 Per_2021_w co_live disease nhQ6a_1_c1 nhQ1a2 nhQ1a3 nhQ1a6 village_kind cjtotal2021_h cjedu_sec

global D1 nhkind2
global D2 cjgovin
global Y Targeting_Errors_Esub
global X `varsVG9'

set seed 44

ddml init partial, kfolds(4) 

local trees = 500

ddml E[Y|X]: pystacked $Y $X, type(class) method(rf) cmdopt1(n_estimators(`trees'))

ddml E[D|X]: pystacked $D1 $X, type(class) method(rf) cmdopt1(n_estimators(`trees'))
ddml E[D|X]: pystacked $D2 $X, type(reg) method(rf) cmdopt1(n_estimators(`trees'))


ddml crossfit
ddml estimate, robust

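------------------------------------------------------------------------------
             |               Robust
Targe~s_Esub | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     nhkind2 |  -.0457739   .0213447    -2.14   0.032    -.0876087   -.0039391
     cjgovin |  -.0809603   .0389666    -2.08   0.038    -.1573335   -.0045871
       _cons |  -.0108885   .0074808    -1.46   0.146    -.0255505    .0037736
------------------------------------------------------------------------------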
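
For context, a hedged sketch of what these coefficients estimate, assuming the standard partially linear model that `ddml init partial` targets, with two treatment variables D_1 (nhkind2) and D_2 (cjgovin). The notation is introduced here for illustration and is not taken from the post:

```latex
\begin{aligned}
Y &= \theta_1 D_1 + \theta_2 D_2 + g(X) + U, \qquad E[U \mid D_1, D_2, X] = 0,\\
\tilde{Y} &= Y - \hat{E}[Y \mid X], \qquad \tilde{D}_j = D_j - \hat{E}[D_j \mid X], \quad j = 1, 2,\\
\tilde{Y} &= \alpha + \theta_1 \tilde{D}_1 + \theta_2 \tilde{D}_2 + \varepsilon .
\end{aligned}
```

Under these assumptions, the reported coefficients on nhkind2 and cjgovin are the estimates of \theta_1 and \theta_2 from regressing the cross-fitted residual of Y on the cross-fitted residuals of D_1 and D_2. Each is read like a linear regression coefficient after flexibly controlling for X. The _cons row is the intercept \alpha of that residual regression and is usually not of interest.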

re-sampling: multiple runs with different folds

Allow for non-binary D with `interactiveiv`

With a multivalued ordered treatment (D), the interactiveiv estimator can be interpreted as estimating the average causal response introduced in Angrist and Imbens (1995, JASA). This follows from arguments in Frolich (2007, JoE).

Currently, an error is thrown if D is not binary:

. ddml crossfit, shortstack finalest(nnls1) nostdstack
error - interactiveiv model supported only for D=0 or D=1

It would be better not to throw an error.

@thomaswiemann and I have discussed this for the ddml R package and the change was made a couple of months ago and uploaded to CRAN.

use "E[y|x]" instead of "yeq" in multi-line syntax

how to predict in test partition

Hi, it is not clear to me how to predict in a test partition (svar==2).

Tried unsuccessfully:

. ddml init partial if svar ==1, kfolds(2)
warning - model m0 already exists
all existing model results and variables will
be dropped and model m0 will be re-initialized

. ddml E[Y|X]: pystacked $Y `covars', type(reg) method(rf)
Learner Y1_pystacked added successfully.

. ddml E[D|X]: pystacked $D `covars', type(reg) method(rf)
Learner D1_pystacked added successfully.

. ddml crossfit
Cross-fitting E[y|X] equation: sales
Cross-fitting fold 1 2 ...completed cross-fitting
Cross-fitting E[D|X] equation: price
Cross-fitting fold 1 2 ...completed cross-fitting

. predict double yhat if svar ==2
error: data in memory has changed since last -pystacked- call
you are not allowed to change data in memory between -pystacked- fit and -predict-
r(198);
