aahrens1 / ddml
Double/Debiased Machine Learning implementation for Stata
License: MIT License
All estimation results are saved in associative arrays. These are currently uniquely identified by the model name (the Mata object), the specification (number, "ss" or "mse") and the resample (number, "mn" or "md").
This isn't enough to uniquely identify the results for the interactive model because there are three flavours: ATE, ATET and ATEU. Current behaviour is to save the last one estimated, so if you e.g. estimate ATE and then ATET, the saved ATE will be overwritten.
May need to add an additional associative array key to deal with this.
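The overwrite described above can be illustrated with a minimal sketch. It uses a Python dictionary as a stand-in for the Mata associative array; the function name `save_result`, the model name "m0" and the stored values are hypothetical and are not taken from the ddml source.

```python
def save_result(store, model, spec, resample, value, flavour=None):
    """Store an estimate; include the flavour in the key only if given."""
    key = (model, spec, resample)
    if flavour is not None:
        key = key + (flavour,)
    store[key] = value
    return key

# Current behaviour: three-part key, so the ATET result replaces the ATE result.
three_part = {}
save_result(three_part, "m0", "ss", "mn", {"flavour": "ATE"})
save_result(three_part, "m0", "ss", "mn", {"flavour": "ATET"})

# Possible fix: the flavour becomes a fourth key, so both results survive.
four_part = {}
save_result(four_part, "m0", "ss", "mn", {"flavour": "ATE"}, flavour="ATE")
save_result(four_part, "m0", "ss", "mn", {"flavour": "ATET"}, flavour="ATET")

print(len(three_part), len(four_part))  # prints "1 2"
```

With the three-part key only the last flavour estimated remains in the store; with the extra key each flavour gets its own entry.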
Can the DDML command be used for mediator and moderator analysis? If so, please provide some examples.
Thank you!
Estimation output has the Y learner and D learner but is missing the DH learner. From the help file:
Min MSE DDML model, specification 7
y-E[y|X] = Y2_pystacked_1 Number of obs = 2217
D-E[D|X,Z]= Dhat_pystacked_1
------------------------------------------------------------------------------
share | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
price | -.0444965 .0046333 -9.60 0.000 -.0535775 -.0354154
------------------------------------------------------------------------------
crossfit needs to accommodate weights in parsing of estimation strings.
The example code in the "Interactive model--ATE and ATET estimation" section of the ddml help file has a mistake.
This is the example code:
```
webuse cattaneo2, clear
global Y bweight
global D mbsmoke
global X prenatal1 mmarried fbaby mage medu
set seed 42
ddml init interactive, kfolds(5) reps(5)
ddml E[Y|X,D]: pystacked $Y $X, type(reg) methods(ols gradboost)
ddml E[D|X]: pystacked $D $X, type(class) methods(logit gradboost)
ddml crossfit
ddml estimate
ddml estimate, atet
```
Running this code produces a "Cross-fitting fold 1 RMSPE_cv not found" error, probably because of a problem with how the ols and logit methods are called.
The number of combinations of learners can explode, especially when there are more than 2 conditional expectations being estimated.
We should probably have some kind of "nocombos" or "ssonly" option so that the only results that are reported are the shortstack results.
Might want to add a warning if the number of combinations is large, or maybe have the number of combinations reported by ddml describe, or maybe report it as part of the crossfitting output.
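To show how quickly the count grows, here is a minimal sketch. It assumes every learner for one conditional expectation can be combined with every learner for the others, so the total is the product of the learner counts. The equation labels, learner names and the warning threshold of 50 are hypothetical.

```python
from math import prod

def n_combinations(learners_per_equation):
    """Number of learner combinations: the product of the counts per equation."""
    return prod(len(learners) for learners in learners_per_equation.values())

def too_many(learners_per_equation, threshold=50):
    """True if the combination count exceeds a (hypothetical) warning threshold."""
    return n_combinations(learners_per_equation) > threshold

learners = ["ols", "lasso", "rf", "gradboost"]  # hypothetical learner names

two_equations = {"E[Y|X]": learners, "E[D|X]": learners}
three_equations = {"E[Y|X,Z]": learners, "E[D|X,Z]": learners, "E[Z|X]": learners}

print(n_combinations(two_equations))    # prints 16
print(n_combinations(three_equations))  # prints 64
print(too_many(three_equations))        # prints True
```

A count like this could feed a warning or be reported next to the other model details, whichever of the options above is chosen.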
Rename optimaliv to ivhd?
Rename ddml sample to something else? (NB: both ddml init and ddml sample take reps(.) and kfolds(.) options.)
The allcombos code for flexible IV doesn't work properly when there are multiple endogenous regressors.
The problem is that the D and D_h learners need to be paired together, but this isn't being respected. Say there are 2 D variables, D1 and D2. As written, the code can pair a learner for D1 with a D_h learner for D2.
The fix for this would be a bit messy, and multiple endogenous regressors is not a common specification, so for now I've added a check that disallows multiple D variables with flexible IV.
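The pairing constraint can be sketched as follows. This is only an illustration and not the allcombos implementation: it assumes each D_h learner belongs to exactly one D variable, and all learner names are hypothetical.

```python
from itertools import product

# Hypothetical learner names, keyed by the endogenous variable they belong to.
d_learners = {"D1": ["D1_a", "D1_b"], "D2": ["D2_a"]}
dh_learners = {"D1": ["D1h_a"], "D2": ["D2h_a", "D2h_b"]}

def paired_choices(var):
    """All (D learner, D_h learner) pairs for a single variable."""
    return list(product(d_learners[var], dh_learners[var]))

def valid_combos():
    """Pick one same-variable pair per D variable, then combine across variables."""
    per_variable = [paired_choices(var) for var in sorted(d_learners)]
    return list(product(*per_variable))

combos = valid_combos()
print(len(combos))  # prints 4
print(combos[0])    # prints (('D1_a', 'D1h_a'), ('D2_a', 'D2h_a'))
```

Because the pairs are built per variable before being combined, a D1 learner is never matched with a D_h learner for D2.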
The help file for the ddml command does not explain how to add interaction terms and estimate interaction effects in the "Interactive model--ATE and ATET estimation" section.
I want to analyze interaction effects using ddml. Could you give me some advice?
Thanks very much!
Below is the code and output table for the analysis using the ddml command. What is the meaning of the coefficients of the variables in the output table? How can they be interpreted?
*varsVG9
local varsVG9 Per_2021_w co_live disease nhQ6a_1_c1 nhQ1a2 nhQ1a3 nhQ1a6 village_kind cjtotal2021_h cjedu_sec
global D1 nhkind2
global D2 cjgovin
global Y Targeting_Errors_Esub
global X `varsVG9'
set seed 44
ddml init partial, kfolds(4)
local trees = 500
ddml E[Y|X]: pystacked $Y $X, type(class) method(rf) cmdopt1(n_estimators(`trees'))
ddml E[D|X]: pystacked $D1 $X, type(class) method(rf) cmdopt1(n_estimators(`trees'))
ddml E[D|X]: pystacked $D2 $X, type(reg) method(rf) cmdopt1(n_estimators(`trees'))
ddml crossfit
ddml estimate, robust
Targe~s_Esub | Coefficient | Robust std. err. | z | P>z | [95% conf. | interval] |
---|---|---|---|---|---|---|
nhkind2 | -.0457739 | .0213447 | -2.14 | 0.032 | -.0876087 | -.0039391 |
cjgovin | -.0809603 | .0389666 | -2.08 | 0.038 | -.1573335 | -.0045871 |
_cons | -.0108885 | .0074808 | -1.46 | 0.146 | -.0255505 | .0037736 |
With a multivalued ordered treatment (D), the interactiveiv estimator can be interpreted as estimating the average causal response introduced in Angrist and Imbens (1995, JASA). This follows from arguments in Frölich (2007, JoE).
Currently, an error is thrown if D is not binary:
. ddml crossfit, shortstack finalest(nnls1) nostdstack
error - interactiveiv model supported only for D=0 or D=1
It would be better not to throw an error.
@thomaswiemann and I have discussed this for the ddml R package and the change was made a couple of months ago and uploaded to CRAN.
Hi, it is not clear to me how to predict on a test partition (svar==2).
Tried unsuccessfully:
. ddml init partial if svar ==1, kfolds(2)
warning - model m0 already exists
all existing model results and variables will
be dropped and model m0 will be re-initialized
. ddml E[Y|X]: pystacked $Y `covars', type(reg) method(rf)
Learner Y1_pystacked added successfully.
. ddml E[D|X]: pystacked $D `covars', type(reg) method(rf)
Learner D1_pystacked added successfully.
. ddml crossfit
Cross-fitting E[y|X] equation: sales
Cross-fitting fold 1 2 ...completed cross-fitting
Cross-fitting E[D|X] equation: price
Cross-fitting fold 1 2 ...completed cross-fitting
. predict double yhat if svar ==2
error: data in memory has changed since last -pystacked- call
you are not allowed to change data in memory between -pystacked- fit and -predict-
r(198);
The DDML command is fantastic. Can you develop a corresponding R package?
Thank you!
How can the problem "crossfitting fold 1 unrecognized command" be solved?
SSIA, as they used to say.