Coder Social home page Coder Social logo

shazi4399 / federated-boosted-dp-trees Goto Github PK

View Code? Open in Web Editor NEW

This project forked from samuel-maddock/federated-boosted-dp-trees

0.0 0.0 0.0 61.95 MB

Code for the CCS'22 paper "Federated Boosted Decision Trees with Differential Privacy"

Shell 0.14% Python 99.86%

federated-boosted-dp-trees's Introduction

Code for "Federated Boosted Decision Trees with Differential Privacy"

This repository contains code for the ACM CCS'22 paper "Federated Boosted Decision Trees with Differential Privacy"

Reference

If the code and/or paper contained in this repository were useful to you please consider citing this work:

@inproceedings{maddock2022federated,
  author = {Maddock, Samuel and Cormode, Graham and Wang, Tianhao and Maple, Carsten and Jha, Somesh}, 
  title = {Federated Boosted Decision Trees with Differential Privacy}, 
  year = {2022}, 
  isbn = {9781450394505}, 
  publisher = {Association for Computing Machinery}, 
  booktitle = {Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security}, 
  address = {New York, NY, USA}, url = {https://doi.org/10.1145/3548606.3560687}, doi = {10.1145/3548606.3560687}, 
  pages = {2249โ€“2263}, 
  location = {Los Angeles, CA, USA}, series = {CCS '22} 
}

Outline

The code is split into two components

  • experiments - Main code for running and plotting experiments
  • federated_gbdt - Main code for the private GBDT model

In order to run any of the code, datasets need to be downloaded and placed in the data directory. We use the following datasets in our experiments: Credit 1, Credit 2, Adult, Bank, Nomao, Higgs

In order to generate plots and tables as in the paper see "Paper Experiments, Plots and Tables"

In order to replicate the main figures in the paper from scratch see "Replication Instructions"

Framework

The code structure of federated_gbdt is as follows

  • core
    • binning- Contains quantile sketching code from the FEVERLESS implementation
    • dp_multiq - Central DP quantiles (not used in the paper)
    • moments_accountant - TensorFlow Privacy Moments Accountant (RDP)
    • pure_ldp - LDP protocols (not used in paper)
    • baseline_constants.py - Contains constants for the FEVERLESS implementation of quantile sketching
    • loss_functions.py - Contains loss functions used in the GBDT algorithm
    • plotting.py - Debugging code
  • models
    • base
      • jit_functions.py - Numba functions for computing GBDT quantities needed for training (split scores and weights)
      • tree_node.py - Contains the DecisionNode class reworked from the FEVERLESS implementation
      • tree_base.py - Base tree implementation
    • gbdt
      • components
        • index_sampler.py- Contains the IndexSampler class for managing which features/observations a tree uses during training
        • privacy_accountant.py - Contains the PrivacyAccountant class for managing DP during training of a PrivateGBDT
        • split_candidate_manager.py - Manages the various methods used to propose split candidates
        • train_monitor.py - Monitors various training statistics of a PrivateGBDT model
      • private_gbdt.py - Contains the main model PrivateGBDT class

Paper Experiments, Plots and Table

All experiments were run with 15 iterations in total (3 iterations over 5 different train-test sets). Code for running experiments is in experiments/paper_experiments/paper_experiments.py and plotting in experiments/paper_experiments/paper_plotter.py

The following methods in paper_experiments.py corresponds to the following figures/tables in the paper:

  • dp_split_methods_with_update_methods - Corresponds to Figure 1 (a,b,c), Table 2 in main text, Figures 7-10 and Table 7-10 in the Appendix
  • dp_split_candidate_methods - Corresponds to Figure 2 (a,b,c) and Table 3 in the main text, Figure 11 and 12 in the Appendix
  • feature_interaction_experiments - Corresponds to Figure 3
  • dp_ebm_experiment - Corresponds to Figure 4
  • batched_boosting - Corresponds to Figure 5, Table 4 in the main text, Figure 13 in the Appendix
  • comparisons_experiment - Corresponds to Figure 6 in the main text and Figures 14-18 in the Appendix

The associated figures are already generated and present in experiments/paper_experiments/paper_plots. To recreate them the following methods in experiment_plotter.py can be used:

  • plot_split_methods_with_update - Figure 1(a,b,c)
  • plot_split_candidates - Figure 2(a,b,c)
  • plot_k_way - Figure 3
  • plot_ebm_comparisons - Figure 4
  • plot_low_eps_bb - Figure 5
  • plot_comparisons - Figure 6
  • table_split_methods_with_update - Table 2
  • table_split_candidate - Table 3
  • table_low_eps_bb - Table 4

Plots and tables for the Appendix can be recreated via the following (although they are already present in paper_plots):

  • appendix_E1 - Figures 7,8,9,10
  • appendix_E1_table - Tables 7,8,9,10
  • appendix_E2 - Figure 12
  • appendix_E3 - Not used
  • appendix_E4 - Figure 13
  • appendix_E5- Figure 14, 15, 16, 17, 18

Replication Instructions

As mentioned above all plots and figures from the main paper and appendix can be replicated from the .csv files by running the appropriate methods in paper_plotter.py

As all experiments in the paper are repeated over 15 iterations they are usually too slow to replicate within a reasonable amount of time. Instead, to roughly replicate an experiment from scratch additional code is provided in experiments/replication_experiments and in particular the experiment_replicator.py file

This contains an ExperimentReplicator class with a replicate(figure_num) method. Most replication experiments have been designed to run on the Credit 1 dataset in ~30 minutes depending on the device. Most run on a single test-train seed over 3 iterations.

The experiments/replication_experiments folder already contains data and replication figures for all 6 figures presented in the main paper. You can also generate appendix figures by changing the dataset that is passed to ExperimentReplicator.replicate

Benchmark replication times performed on a Macbook Air M1:

  • Fig 1(a,b,c): ~30 mins
  • Fig 2(a,b,c): ~45 mins
  • Fig 3: ~20 mins
  • Fig 4: ~10 mins
  • Fig 5: ~15 mins
  • Fig 6: ~25 mins

Acknowledgements

  • Part of the tree structure implementation is based on the public implementation of the FEVERLESS paper with code repo here
  • We make extensive use of the autodp library by Yu-Xiang Wang to verify privacy accounting
  • Part of our privacy accountant uses the RDP moments accountant implemented in TensorFlow Privacy
  • Although not used in our paper, the code supports using datasets from the Penn Machine Learning Benchmarks (PMLB)

federated-boosted-dp-trees's People

Contributors

samuel-maddock avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.