ifca-advanced-computing / pycanon

pyCANON is a Python library and CLI to assess the values of the parameters associated with the most common privacy-preserving techniques.

License: Apache License 2.0

anonymization data-privacy k-anonymity l-diversity privacy-preserving data-analysis data-science datascience t-closeness anonymity

pyCANON

pyCANON is a Python library and CLI to assess the values of the parameters associated with the most common privacy-preserving techniques via anonymization.

Authors: Judith Sáinz-Pardo Díaz and Álvaro López García (IFCA - CSIC).

Installation

We recommend using Python 3 with virtualenv:

virtualenv .venv -p python3
source .venv/bin/activate

Then run the following command to install the library and all its requirements:

pip install pycanon

To also install the optional functionality for generating PDF reports, install the library as follows (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "pycanon[PDF]"

Documentation

The pyCANON documentation is hosted on Read the Docs.

Getting started

Example using the adult dataset:

import pandas as pd
from pycanon import anonymity, report

FILE_NAME = "adult.csv"
QI = ["age", "education", "occupation", "relationship", "sex", "native-country"]
SA = ["salary-class"]
DATA = pd.read_csv(FILE_NAME)

# Calculate k for k-anonymity:
k = anonymity.k_anonymity(DATA, QI)

# Print the anonymity report:
report.print_report(DATA, QI, SA)
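
To make the meaning of the computed parameters concrete, here is a minimal sketch of what k and a distinct ℓ represent, written directly with pandas on a toy table. It is not pyCANON's implementation; the helper names `min_class_size` and `min_distinct_sensitive` and the toy data are assumptions made for illustration only.

```python
import pandas as pd


def min_class_size(df, quasi_identifiers):
    """Size of the smallest equivalence class: the k of k-anonymity."""
    return int(df.groupby(quasi_identifiers).size().min())


def min_distinct_sensitive(df, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any equivalence class."""
    return int(df.groupby(quasi_identifiers)[sensitive].nunique().min())


# Toy data (hypothetical): two equivalence classes over ("age", "sex").
toy = pd.DataFrame(
    {
        "age": ["<30", "<30", "<30", "30+", "30+"],
        "sex": ["F", "F", "F", "M", "M"],
        "salary-class": ["<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
    }
)

k = min_class_size(toy, ["age", "sex"])
l = min_distinct_sensitive(toy, ["age", "sex"], "salary-class")
print(k, l)
```

Records that share the same values for all quasi-identifiers form an equivalence class; k is the size of the smallest class, and the distinct ℓ is the smallest number of different sensitive values seen within a class.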

Description

pyCANON lets you check whether the following privacy-preserving techniques are satisfied and computes the value of the parameters associated with each of them.

| Technique | pyCANON function | Parameters | Notes |
|---|---|---|---|
| k-anonymity | k_anonymity | k: int | |
| (α, k)-anonymity | alpha_k_anonymity | α: float, k: int | |
| ℓ-diversity | l_diversity | ℓ: int | |
| Entropy ℓ-diversity | entropy_l_diversity | ℓ: int | |
| Recursive (c, ℓ)-diversity | recursive_c_l_diversity | c: int, ℓ: int | Not calculated if ℓ = 1 |
| Basic β-likeness | basic_beta_likeness | β: float | |
| Enhanced β-likeness | enhanced_beta_likeness | β: float | |
| t-closeness | t_closeness | t: float | For numerical attributes, the one-dimensional Earth Mover's Distance (EMD) is used. For categorical attributes, the "Equal Distance" metric is used. |
| δ-disclosure privacy | delta_disclosure | δ: float | |
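
The two distances mentioned for t-closeness can be sketched as follows, using their usual definitions from the t-closeness literature: the ordered-distance EMD for numerical attributes and the equal distance for categorical ones. This is an illustrative sketch, not pyCANON's code; the function names `ordered_emd` and `equal_distance` are assumptions, and both functions expect two probability distributions given over the same list of values.

```python
def ordered_emd(p, q):
    """One-dimensional EMD with ordered distance.

    p and q are probability lists over the same m sorted values.
    The result is the sum of absolute cumulative differences, divided by m - 1.
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q must have the same length, at least 2")
    cumulative = 0.0
    total = 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        total += abs(cumulative)
    return total / (len(p) - 1)


def equal_distance(p, q):
    """EMD when every pair of distinct categories is at distance 1."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))


# Nine ordered values, uniform in the whole table; one equivalence class
# contains only the three smallest values.
overall = [1 / 9] * 9
first_class = [1 / 3] * 3 + [0.0] * 6
emd_example = ordered_emd(first_class, overall)
print(emd_example)  # approximately 0.375
```

Under these definitions, t is the largest distance between the sensitive-attribute distribution of any equivalence class and its distribution in the whole table.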

More information can be found in this paper.

In addition, a report can be obtained that includes information on the equivalence classes and the usefulness of the data. For the latter, three classically used metrics are implemented (as defined in the documentation): average equivalence class size, classification metric, and discernability metric.
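
As a rough illustration of two of these utility metrics, the sketch below uses definitions commonly given in the literature: the average equivalence class size as the number of records divided by (number of classes × k), and the discernability metric as the sum of squared class sizes when no records are suppressed. These formulas and the helper names are assumptions for illustration; the pyCANON documentation remains the reference for the exact definitions it implements. The classification metric is not sketched here.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    counts = Counter(tuple(row[qi] for qi in quasi_identifiers) for row in rows)
    return list(counts.values())


def average_class_size(sizes, k):
    """Number of records divided by (number of classes * k); 1.0 is the ideal."""
    return sum(sizes) / (len(sizes) * k)


def discernability_metric(sizes):
    """Sum of squared class sizes (no suppressed records assumed)."""
    return sum(size * size for size in sizes)


# Toy records (hypothetical): one class of three records and one of two.
rows = [
    {"age": "<30", "sex": "F"},
    {"age": "<30", "sex": "F"},
    {"age": "<30", "sex": "F"},
    {"age": "30+", "sex": "M"},
    {"age": "30+", "sex": "M"},
]

sizes = equivalence_class_sizes(rows, ["age", "sex"])
k = min(sizes)
c_avg = average_class_size(sizes, k)
dm = discernability_metric(sizes)
print(sorted(sizes), k, c_avg, dm)
```

Both metrics grow as equivalence classes become larger than k requires, which is why they are used as a proxy for the information lost through generalization.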

Citation

If you are using pyCANON you can cite it as follows:

@article{sainzpardo2022pycanon,
   title={A Python library to check the level of anonymity of a dataset},
   author={S{\'a}inz-Pardo D{\'\i}az, Judith and L{\'o}pez Garc{\'\i}a, {\'A}lvaro},
   journal={Scientific Data},
   volume={9},
   number={1},
   pages={785},
   year={2022},
   publisher={Nature Publishing Group UK London}}

Acknowledgments

The authors acknowledge funding from the European Union - NextGenerationEU (Regulation EU 2020/2094), through CSIC’s Global Health Platform (PTI+ Salud Global), and the support of the project AI4EOSC “Artificial Intelligence for the European Open Science Cloud”, which has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement number 101058593.

pycanon's Issues

AttributeError: 'str' object has no attribute 'columns'

Hello :) Thank you for the new library, it is handy to have all the privacy-preserving techniques together.

I would like to share a minor issue I encountered when trying to replicate the example you provided with the adult.csv dataset in the "Getting started" section. The anonymity function imported at the beginning seems to treat adult.csv as a string rather than as a CSV file.

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/user/.venv/lib/python3.8/site-packages/pycanon/anonymity/_k_anonymity.py", line 41, in k_anonymity
    aux_functions.check_qi(data, quasi_ident)
  File "/Users/user/.venv/lib/python3.8/site-packages/pycanon/anonymity/utils/aux_functions.py", line 65, in check_qi
    cols = data.columns
AttributeError: 'str' object has no attribute 'columns'
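
The traceback suggests that `check_qi` reads `data.columns`, so the function was most likely given the file name itself rather than a loaded table. The Getting started example avoids this by calling `pd.read_csv` before `k_anonymity`. A minimal sketch of a guard for calling code is shown below; `as_dataframe` is a hypothetical helper, not part of pyCANON, and the temporary CSV file exists only to make the example self-contained.

```python
import os
import tempfile

import pandas as pd


def as_dataframe(data):
    """Hypothetical helper: accept a DataFrame or a CSV path, return a DataFrame."""
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, (str, os.PathLike)):
        return pd.read_csv(data)
    raise TypeError(f"expected a DataFrame or a CSV path, got {type(data).__name__}")


# Write a tiny CSV to a temporary directory and load it through the helper.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "toy.csv")
    with open(csv_path, "w", encoding="utf-8") as handle:
        handle.write("age,sex\n30,F\n41,M\n")
    loaded = as_dataframe(csv_path)

print(list(loaded.columns))
```

The returned object has a `columns` attribute, so it can be passed wherever a loaded table is expected.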
