This repository is an application of the differentially private penalized logistic regression method proposed in the following paper:
"Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases." Yu, F., M. Rybar, C. Uhler, S. Fienberg. (2014) Proceedings of the 2014 International Conference on Privacy in Statistical Databases.
The paper is available here.
The example case/control genotype data (./data/anticase_genotypes_Nov09_interaction1_MAF025_1 and ./data/case_genotypes_Nov09_interaction1_MAF025_1) were generated by HAP-SAMPLE with the following options:
- Population: CEU
- Source for SNPs: Chrom9_Chrom13_snps.txt
- Disease Model File: AR_chrom9_chrom13_Nov09_interaction1_MAF025.txt
- Simulation Type: Case/Control
- Number of Cases: 1000
- Number of Controls: 1000
- Average breaks per cM: 1
- Output Format: SNPs v. Individuals
The disease model file describes a disease with 2 causative SNPs that have additive effects. For more details about the disease model, see Malaspinas & Uhler (2010).
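As a rough illustration of how two genotype matrices like these might be combined for regression, the sketch below stacks cases and controls into one individuals-by-SNPs design matrix with a 0/1 outcome. It assumes each matrix has SNPs in rows and individuals in columns, with genotypes already coded as 0/1/2. The toy matrices and the name `stack_case_control` are hypothetical and not part of this repository, and the actual HAP-SAMPLE files may need different parsing.

```r
# Hypothetical helper: combine case and control genotype matrices
# (SNPs in rows, individuals in columns) into a design matrix and labels.
stack_case_control <- function(case_geno, control_geno) {
  stopifnot(nrow(case_geno) == nrow(control_geno))  # same SNPs in both
  x <- rbind(t(case_geno), t(control_geno))         # individuals x SNPs
  y <- c(rep(1L, ncol(case_geno)), rep(0L, ncol(control_geno)))
  list(x = x, y = y)
}

# Toy data (an assumption): 3 SNPs, 4 cases and 5 controls, coded 0/1/2.
set.seed(1)
case_geno    <- matrix(sample(0:2, 3 * 4, replace = TRUE), nrow = 3)
control_geno <- matrix(sample(0:2, 3 * 5, replace = TRUE), nrow = 3)
dat <- stack_case_control(case_geno, control_geno)
dim(dat$x)    # 9 individuals by 3 SNPs
table(dat$y)  # 5 controls (0) and 4 cases (1)
```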
In R:
source("./simulation_for_paper.R")
source("./analyze_simulation_for_paper.R")
Make sure that MATLAB and CVX are installed.
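A quick pre-flight check from R can catch a missing MATLAB installation before the simulation starts. The sketch below only tests whether an executable named `matlab` is on the PATH, which is an assumption about how MATLAB is launched on your system. The helper name `has_command` is hypothetical, and CVX itself is a MATLAB toolbox that cannot be detected this way.

```r
# Hypothetical pre-flight check: is a given executable on the PATH?
# Sys.which() returns "" for commands it cannot find.
has_command <- function(cmd) {
  unname(nzchar(Sys.which(cmd)))
}

if (!has_command("matlab")) {
  message("No 'matlab' executable found on the PATH; the CVX step may fail.")
}
```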
- analyze_simulation_for_paper.R: analyze simulation results generated by simulation_for_paper.R and make plots.
- simulation_for_paper.R: perform differentially private elastic-net penalized logistic regression multiple times and save the results to an RData file. Also perform non-private penalized logistic regression and save the results to an RData file.
- analyze_hapsample.R: (helper script) load HapSample data into the workspace.
- run_cvx.R: (helper script) a wrapper for running CVX for MATLAB in R.
- opt_elastic_net_for_R.m: (helper script) a MATLAB function that uses CVX for optimization.
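To make the optimization target more concrete, here is a minimal base-R sketch of an elastic-net penalized logistic regression objective with an optional linear perturbation term, which is the general shape of objective-perturbation approaches to differential privacy. It is not the code in opt_elastic_net_for_R.m. The parameterization (`lambda`, `alpha`), the outcome coding in {-1, +1}, and the function name are assumptions, and the noise vector `b` is taken as given because its distribution and scaling must follow the paper.

```r
# Hypothetical sketch: elastic-net penalized logistic loss with an optional
# linear perturbation term sum(b * w) / n. With b = 0 it is non-private.
elastic_net_objective <- function(w, x, y, lambda, alpha,
                                  b = rep(0, length(w))) {
  n <- nrow(x)
  margins <- y * as.vector(x %*% w)    # y must be coded as -1 / +1
  loss <- mean(log1p(exp(-margins)))   # average logistic loss
  penalty <- lambda * (alpha * sum(abs(w)) + (1 - alpha) / 2 * sum(w^2))
  loss + penalty + sum(b * w) / n
}

# Toy check (made-up data): at w = 0 the loss is log(2) and the
# penalty and perturbation terms vanish.
x <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 3)
y <- c(1, -1, 1)
elastic_net_objective(c(0, 0), x, y, lambda = 0.1, alpha = 0.5)
```

An objective of this form could be passed to a general-purpose optimizer for small problems, but the L1 term is not differentiable at zero, which is one reason to use a convex solver such as CVX.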
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin14.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lattice_0.20-29 R.matlab_3.1.1 glmnet_1.9-8 Matrix_1.1-4
loaded via a namespace (and not attached):
[1] digest_0.6.8 grid_3.1.2 gtable_0.1.2 plyr_1.8.1
[5] proto_0.3-10 R.methodsS3_1.6.1 R.oo_1.18.0 R.utils_1.34.0
[9] Rcpp_0.11.4
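Before running the scripts, it may help to confirm that the packages listed as attached in the session information above are installed. The sketch below is one way to do that. The helper name `missing_packages` is hypothetical, and it checks only for presence, not for the specific versions shown.

```r
# Packages listed as attached in the session information above.
required <- c("lattice", "R.matlab", "glmnet", "Matrix")

# Hypothetical helper: return the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_packages(required)  # character(0) when everything is installed
```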