Coder Social home page Coder Social logo

dp-via-dro's Introduction

Differential Privacy via Distributionally Robust Optimization

GitHub supplement of the paper Differential Privacy via Distributionally Robust Optimization (DPviaDRO) by Aras Selvi, Huikang Liu, and Wolfram Wiesemann (2023). The paper has undergone a major revision, which is completed on the 23rd of May, 2024. All links are up-to-date, but we also include the earlier version of the manuscript in this repository.

The preprint is available on Optimization Online and arXiv. The paper was presented at the Robust Optimization Webinars, Rotman Young Scholar Seminar Series, SIAM Conference on Optimization, LBS Transatlantic Doctoral Conference, INFORMS Annual Meeting (2023), PGMO Days, University of Amsterdam, Georgia Tech ISyE Junior Researcher Workshop, and Imperial College Business School.

Update: This paper received a runner-up award for the 2023 INFORMS Optimization Society (IOS) Student Paper Prize, as well as a first place at Imperial College Business School best PhD paper in Analytics and Operations.

Description

The following is a guide to use this repository. The folder "GitHub Appendices" includes the additional experiments and discussions that are supplied with the GitHub repository. We now explain the codes in the following.

Data Independent Noise

This folder includes the C++ codes for the data independent noise optimization.

Data Dependent Noise

This folder includes the C++ codes for the data dependent noise optimization.

Multi-dimensional Data Independent Noise

This folder includes the C++ codes for the 2D query optimization (both directly optimizing it, or computing the 2D guarantees of the product of two 1-D distributions) that we document in the folder "GitHub Appendices" and file "GitHub_supplement.pdf" (pages 11-13). Note that this is a preliminary study, and it is not included in the main paper.

Dataset Cook

This folder has two Python3 Jupyter Notebooks which shows how we cleaned datasets examples: one for naïve Bayes, and another one for proximal coordinate descent.

Visualization

This folder has examples of Python3 visualizations that were used to generate Figures 3 and 4.

HPC Codes

This folder includes C++ compilation commands on the Imperial HPC's Linux Terminal as well as an example .PBS jobs file.

ML Algorithms Julia

Implementations of ML algorithms (NB, PCD) in Julia. analytic_gaussian.jl has the Analytic Gaussian mechanism's Julia implementation (we adopted the original Python implementation in Julia). compare.jl has the standard logistic regression (LR) implementation in case one wants to benchmark the non-private optimal LR classifier (uses MOSEK solver). grid_ell1_prep_main.jl, grid_ell1_prep_ub.jl, grid_ell1_prep_lb.jl are the files generating the code for Tables 1,4,5. interpret_results.jl includes the analysis (p-values, in/out-sample comparison, etc.) of the DP naïve Bayes methods. main.jl is the main DP-NB function including the Gaussian, truncated Laplace, and our optimised data independent mechanism, main_analytic.jl is the same for analytic Gaussian (added later on, hence in another function), and main_data_dependent.jl is the data dependent noise mechanism version which also prepares files for noise distributions to be optimized in C++ with the help of the function in read_distributions.jl. NB.jl has helper functions for the naïve Bayes method, including the non-noisy counts/statistics. The noisy versions are computed in the file sensitivites.jl. The file PCD_functions.jl includes the PCD iteration functions to be called in the iterations of the DP PCD algorithm. PCD_histogram.jl gives the histogram in the GitHub Additional Appendices. PCD_interpret_results ... .jl are the files we used to interpret PCD results. PCD.jl is the main function that runs the DP PCD algorithm, and PCD_multi.jl is the same which uses the data dependent noise (that was prepared in C++). sample.jl includes the functions that allow us to sample noise from various distributions. train_test.jl splits datasets into training and test-sets. The files visualize_bounds ... .jl were used to generate Figure 2.

Final Notes

The following scripts are also available upon request:

  • The C++ codes that we supplied are the simple implementation of our cutting-plane algorithms. We have several modifications that were used on the HPC to run parallel jobs.
  • Different (including hand-crafted) loss functions, and corresponding analytical "speeding up" stages for the cutting-plane method.
  • We shared how we cleaned datasets for two specific datasets. Similar codes for each of the datasets we used, and the final data itself are available upon request.
  • We provided some examples of HPC commands; however, for a specific PBS file, or a specific algorithm's parallelised implementation (to be able to run on the cluster computers), please get in touch with us.
  • We also have the codes of more visualizations that we did and did not report.

Thank You

Thank you for your interest in our work. If you found this work useful in your research and/or applications, please star this repository and cite:

@article{DPviaDRO,
  title={Differential Privacy via Distributionally Robust Optimization},
  author={Selvi, Aras and Liu, Huikang and Wiesemann, Wolfram},
  journal={arXiv preprint 2304.12681},
  year={2023}
}

Please contact Aras ([email protected]) if you encounter any issues using the scripts. For any other comment or question, please do not hesitate to contact us:

Aras Selvi ([email protected])

Huikang Liu ([email protected])

Wolfram Wiesemann ([email protected])

dp-via-dro's People

Contributors

selvi-aras avatar

Stargazers

 avatar  avatar Mahrokh Ghoddousiboroujeni avatar  avatar

Watchers

Linwei Sang avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.