dragonnet's Issues
IHDP Dataset query
Hello @claudiashi57,
Could you tell me why there are 50 CSVs in the IHDP dataset? Does it make sense to combine all the CSVs into one big CSV and run the analysis on that?
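Not the author, but each of the 50 CSVs appears to be one simulated replication over the same units, so the usual pattern is to run the estimator on each file separately and average estimates (or errors) across replications rather than concatenating them. A minimal sketch of that pattern on synthetic stand-in data (the column names `treatment`/`y_factual` and the effect size 4.0 are illustrative assumptions, not the real files):

```python
import numpy as np
import pandas as pd

def naive_ate(df, t_col="treatment", y_col="y_factual"):
    # Difference in mean observed outcomes between treated and control rows
    treated = df.loc[df[t_col] == 1, y_col].mean()
    control = df.loc[df[t_col] == 0, y_col].mean()
    return treated - control

# Per-replication evaluation: estimate on each simulated file separately,
# then average across the 50 replications.
rng = np.random.default_rng(0)
estimates = []
for _ in range(50):  # stand-in for looping over dat/ihdp_npci_*.csv
    t = rng.integers(0, 2, size=747)
    y = 4.0 * t + rng.normal(size=747)  # true effect is 4.0 by construction
    estimates.append(naive_ate(pd.DataFrame({"treatment": t, "y_factual": y})))
print(float(np.mean(estimates)))
```

Concatenating the files instead would mix 50 draws of the simulated outcomes for the same covariates, which changes what is being estimated.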
Query about NPCI data
Could you please explain how you used NPCI to generate the 1k data files mentioned in the paper?
I tried to ask the author and he suggested getting back to you: vdorie/npci#2
These files would be useful for replicating your results and comparing them to our own approach.
Question about binary/real valued outcome
Never mind — I had an issue understanding something in the text of the paper and made a bigger deal of it than it was. Thank you.
Upgrade for imports and functions
- The import keras.optimizers has been updated to tensorflow.keras.optimizers.
- y_scaler.inverse_transform requires a 2-dimensional matrix, so a reshape is required.
- tf.random.set_random_seed() has been updated to tf.random.set_seed(i).
- The lr parameter is deprecated in the optimizers, for both Adam and SGD; it is replaced with learning_rate.
- from keras.engine.topology import Layer has been updated to from tensorflow.keras.layers import Layer.
- The script at src/experiment/run_ihdp.sh is updated to make it more generic.
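To make the reshape point concrete, here is a minimal sketch with a scikit-learn StandardScaler standing in for y_scaler (the TensorFlow-specific renames are summarized in comments, since running them would require a TensorFlow install):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# TF 2.x renames from the list above, for reference:
#   keras.optimizers              -> tensorflow.keras.optimizers
#   keras.engine.topology.Layer   -> tensorflow.keras.layers.Layer
#   tf.random.set_random_seed(i)  -> tf.random.set_seed(i)
#   Adam(lr=...) / SGD(lr=...)    -> Adam(learning_rate=...) / SGD(learning_rate=...)

y = np.array([1.0, 2.0, 3.0, 4.0])
y_scaler = StandardScaler().fit(y.reshape(-1, 1))
y_scaled = y_scaler.transform(y.reshape(-1, 1))

# inverse_transform expects a 2-D array: reshape a flat prediction vector
# to (n_samples, 1) before calling it, then flatten back if needed.
y_back = y_scaler.inverse_transform(y_scaled.reshape(-1, 1)).ravel()
print(np.allclose(y_back, y))
```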
ihdp data indices
Hi,
Thanks for sharing your interesting work. I am trying to work through some of the results of the paper
I noticed the column indices mentioned in idhp_data.py:
binfeats = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
contfeats = [i for i in range(25) if i not in binfeats]
They do not match the columns in the CSV files contained in the dat folder (e.g. ihdp_npci_1.csv).
Can you please advise whether these are the correct reference files?
Thanks in advance.
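One way to sanity-check hard-coded indices like these is to infer binary versus continuous columns from the data itself and compare against the list in the script. A sketch on a toy covariate frame (the column names and layout here are made up, not the real CSV schema):

```python
import numpy as np
import pandas as pd

def split_bin_cont(df):
    # Detect binary columns by their observed values rather than relying on
    # hard-coded indices, which may not match a given CSV's layout.
    binfeats = [i for i, c in enumerate(df.columns)
                if set(df[c].unique()) <= {0, 1}]
    contfeats = [i for i in range(df.shape[1]) if i not in binfeats]
    return binfeats, contfeats

# Toy covariate matrix: 3 continuous and 2 binary columns
rng = np.random.default_rng(1)
x = pd.DataFrame({
    "x0": rng.normal(size=100),
    "x1": rng.integers(0, 2, size=100),
    "x2": rng.normal(size=100),
    "x3": rng.integers(0, 2, size=100),
    "x4": rng.normal(size=100),
})
print(split_bin_cont(x))  # → ([1, 3], [0, 2, 4])
```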
about GPIO with wiringpi
There is no longer a site where I can get WiringPi to work with my Raspberry Pi's GPIO. Could you please create the repository again?
Query about the IHDP data folder
According to the original paper (Hill 2011), there are 747 units (139 treated, 608 control). In the dat folder there are 50 CSVs.
Which is the original CSV?
Also, why are there 50 CSVs? Are they simulated?
Correct test_size in train_test_split of ihdp_main.py to reproduce in-sample and out-sample paper results
From the documentation:
Note: the default code uses all the data for prediction and estimation. If you want the in-sample or out-of-sample error: (i) change the train_test_split criteria in ihdp_main.py; (ii) rerun the neural-net training; (iii) run ihdp_ate.py with the appropriate in-sample and out-of-sample data.
From paper:
We randomly split the data into test/validation/train with proportion 63/27/10 and report the in sample and out of sample estimation errors.
Is this split correct (train size 10% vs. test size 63%)?
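If the paper's 63/27/10 test/validation/train proportions are the target, one way to reproduce them is two successive train_test_split calls: first hold out 63% as test, then split the remaining 37% into validation (27% of the total) and train (10% of the total). A sketch, assuming 747 rows and scikit-learn's train_test_split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

idx = np.arange(747)  # 747 IHDP units

# Stage 1: hold out 63% of all rows as the test set.
rest, test = train_test_split(idx, test_size=0.63, random_state=0)

# Stage 2: of the remaining 37%, take 0.27/0.37 as validation so that the
# validation set is 27% of the total; the rest (10% of the total) is train.
train, val = train_test_split(rest, test_size=0.27 / 0.37, random_state=0)
print(len(test), len(val), len(train))
```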
Demo notebook on simulated examples to check correctness of implementation?
Hi,
thanks for putting the code together!
I tried to train dragonnet on toy examples (Kang-Schafer) with a treatment effect of 0. Various choices of hidden layers, batch sizes, etc. yield estimates that are completely off (ATE = -60). Did you test your implementation on toy examples from the literature, or on your own simulated ground-truth data, to verify that it works as intended?
Some example code
import numpy as np
import pandas as pd
import empirical_calibration as ec

np.random.seed(123)
simulation = ec.data.kang_schafer.Simulation(size=2000)
t = simulation.treatment.reshape(-1, 1)
x = simulation.transformed_covariates
y = simulation.outcome.reshape(-1, 1)
print(t.shape, x.shape, y.shape)

# Use causalml to show other methods work as intended
def _ks_df(size, seed=None):
    if seed:
        np.random.seed(seed)
    simulation = ec.data.kang_schafer.Simulation(size=size)
    df = pd.DataFrame(
        np.column_stack([
            simulation.treatment, simulation.covariates,
            simulation.transformed_covariates, simulation.outcome
        ]))
    df.columns = [
        "treatment", "z1", "z2", "z3", "z4", "x1", "x2", "x3", "x4", "outcome"
    ]
    return df

df = _ks_df(size=1000)

from causalml.inference.meta import XGBTRegressor
xg = XGBTRegressor(random_state=42)
te, lb, ub = xg.estimate_ate(df[["x1", "x2", "x3", "x4"]],
                             df["treatment"], df["outcome"])
print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
When using dragonnet via the acic_main functions, the estimates are entirely off, essentially equal to the very naive method of taking outcome differences (ATE estimates of roughly -20).
test_outputs, train_outputs = acic_main.train_and_predict_dragons(
    t, y, x,
    targeted_regularization=True,
    output_dir="",
    dragon="dragonnet",
    knob_loss=models.dragonnet_loss_binarycross,
    ratio=1.,
    val_split=0.2, batch_size=64, hidden_size_multiplier=2, verbose=False)
Doing this 10 times for a sample size of n=5000, the ATE estimates (by method) are as follows:
Similarly for targeted_regularization=False. Do you have notebooks/documentation on running this on such toy examples to verify the implementation?
Interested in the Table Results
Hi Claudia,
Your work is interesting! I have a small point of confusion about Table 1: why are there two TARNet results?
I noticed that the stats in the upper section are cited from the original papers, and your own test results are attached in the bottom section. All the algorithms in the upper section use the data provided in the TARNet paper (which I assume is the widely used simulated IHDP data), while you use your own simulated IHDP dataset. Is that why you run a baseline (TARNet) test on your own simulated dataset, and hence why we see two TARNet results in Table 1?
Thanks for any reply in advance!
Regards,
Hechuan
Precisions concerning the ITE computation
Hi Claudia,
Thanks for your work. Could you please elaborate on the way the ITE is computed in the semi-parametric estimation file?
In particular I don't understand how the
Regards,
Armand
Multiple treatments
Hi, thank you for your work. It is very interesting. I am currently trying to adapt your work to my problem, but my problem has several possible treatments, and I am having difficulty generalizing some of the equations. Could you kindly provide some guidance on this?
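Not the authors, but one common generalization is to replace the binary treatment head with a K-class propensity head and fit one outcome head per treatment arm; pairwise effects then come from contrasts between heads. A rough multi-arm sketch with linear models standing in for the neural heads (all names and the simulated effect sizes are illustrative; dragonnet itself would train the heads jointly and feed the propensity head into its targeted regularization):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n, d, k = 600, 4, 3
x = rng.normal(size=(n, d))
t = rng.integers(0, k, size=n)                       # treatment in {0, 1, 2}
y = x[:, 0] + np.array([0.0, 1.0, 2.0])[t] + rng.normal(size=n)

# K-class propensity head: g_a(x) = P(T = a | X = x)
propensity = LogisticRegression(max_iter=200).fit(x, t)

# One outcome head per arm: Q_a(x) = E[Y | T = a, X = x]
heads = [LinearRegression().fit(x[t == a], y[t == a]) for a in range(k)]

# Pairwise ATE of arm 2 vs arm 0, averaged over the sample
ate_20 = float(np.mean(heads[2].predict(x) - heads[0].predict(x)))
print(ate_20)
```

With the simulated effects above, the arm-2-vs-arm-0 contrast should recover a value near 2.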
Code does not match with description in paper
Thank you for sharing your code-base publicly. The idea presented in the paper is interesting. There are, however, several disparities between this code-base and the paper; these include:
- Not only do make_tarnet and make_dragonnet share the same code, but the same objective function is also used to learn the parameters of TARNet and Dragonnet. Therefore, the results must be the same.
- It is mentioned in the paper that: "To find the relevant parts of X, first, train a deep net to predict T. Then remove the final (predictive) layer. Finally, use the activation of the remaining net as features for predicting the outcome." However, the code is implemented such that both the outcome loss and the cross-entropy loss are optimized in the same objective function.
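For contrast, the two-stage recipe quoted from the paper can be sketched as follows, with synthetic data and scikit-learn models standing in for the actual Keras nets (in the released code the two losses are instead optimized jointly, end to end):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, d = 500, 5
x = rng.normal(size=(n, d))
t = (x[:, 0] + rng.normal(size=n) > 0).astype(int)
y = x[:, 0] + 2.0 * t + rng.normal(size=n)

# Stage 1: train a net to predict T.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                    random_state=0).fit(x, t)

# Stage 2: remove the final (predictive) layer; keep the hidden activations
# (ReLU is MLPClassifier's default activation).
hidden = np.maximum(0.0, x @ clf.coefs_[0] + clf.intercepts_[0])

# Stage 3: use those activations as features for the outcome model.
outcome = LinearRegression().fit(np.column_stack([hidden, t]), y)
print(hidden.shape)
```

In the joint version, by contrast, a single loss combining the outcome error and the treatment cross-entropy shapes the shared representation during one training run.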