The privugger from itu-square

Creating documentation page

We should create and make publicly available a documentation page with the documentation of the API as well as tutorials and examples. Ideally, we should use Read the Docs.

Installation guide in the documentation

We should have a pip-based installation guide in the documentation. Then, we should remove the line sys.path.append(os.path.join("../../..")) from the notebooks.

Adding support for different inference backends

So far, we perform inference using pymc3. However, in some situations (e.g., when there are no conditions) it might be more efficient to sample directly using libraries like scipy.stats. Given that we have specification datatypes for defining distributions (#5), support for different backends can be added. The infer method should use the appropriate backend depending on the analysis to perform.
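
A minimal sketch of what a direct-sampling backend could look like when there are no conditions: independent prior samples are drawn with scipy.stats and pushed through the program. The names `sample_forward` and `average_age`, and the chosen priors, are hypothetical and only serve as illustration; they are not part of the library.

```python
import numpy as np
from scipy import stats


def sample_forward(program, priors, draws=10_000, seed=0):
    """Push independent prior samples through `program` (no conditioning)."""
    rng = np.random.default_rng(seed)
    # One array of `draws` samples per input, drawn directly with scipy.stats.
    inputs = {name: dist.rvs(size=draws, random_state=rng)
              for name, dist in priors.items()}
    return inputs, program(**inputs)


def average_age(alice, bob):
    # Hypothetical target program: releases the mean of two ages.
    return (alice + bob) / 2


# Hypothetical priors, given as frozen scipy.stats distributions.
priors = {"alice": stats.norm(loc=35, scale=10),
          "bob": stats.norm(loc=40, scale=5)}
inputs, output = sample_forward(average_age, priors)
```

Since no observation constrains the model, the output samples are exact draws from the output distribution and no MCMC is needed.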

Hide packages structure from the `import` declarations

Currently, we need way too many imports to access the different packages in the library, e.g., data structures, distributions, etc. It would be better to have just one import with everything.

Support vectorized observations

Commit 126c7a0 adds support for 1d (integer) vector observations. This is useful for analyzing programs whose output is a 1d vector.

We must extend the enhancement to vectors of other types, e.g., float.
Additionally, we could consider extending observations to vectors of higher dimensions.

Creating specification datatypes for the inputs of the programs

We should support defining the input parameters of the program to analyze using a specification format. Dictionary-like formats such as JSON are good candidates.

## Spec (step 1)
names = {
    'dist': Uniform,
    'range': (0, 50),
    # ...
}
ages = {
    'dist': Normal,
    'mu': 0,
    'sigma': 1,
    # ...
}
# ...

These specifications will be used to create input datatypes as follows:

import privug as pv

pv_ds=pv.Dataset(input_spec=[names,ages], var_names=['names','ages',...])
# or
p1=pv.float(ages)
# or
p2=pv.int(names)

These will be finally used by the inference method together with the target program, e.g.:

pv.infer(target_program,pv_ds)
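
To make the JSON idea concrete, here is a rough sketch of how such a specification could be parsed and turned into input samples using only the standard library. The JSON layout (distribution names as strings) and the helpers `draw` and `sample_inputs` are assumptions made for illustration, not the library's API.

```python
import json
import random

# Hypothetical JSON encoding of the specifications above; distribution names
# are strings so that the spec can live in a plain JSON file.
SPEC_JSON = """
{
  "names": {"dist": "Uniform", "range": [0, 50]},
  "ages":  {"dist": "Normal", "mu": 0, "sigma": 1}
}
"""


def draw(spec, rng):
    """Draw one value from a single-variable specification dictionary."""
    if spec["dist"] == "Uniform":
        low, high = spec["range"]
        return rng.uniform(low, high)
    if spec["dist"] == "Normal":
        return rng.gauss(spec["mu"], spec["sigma"])
    raise ValueError(f"unsupported distribution: {spec['dist']!r}")


def sample_inputs(specs, draws, seed=0):
    """Return a dictionary mapping each variable name to a list of samples."""
    rng = random.Random(seed)
    return {name: [draw(spec, rng) for _ in range(draws)]
            for name, spec in specs.items()}


specs = json.loads(SPEC_JSON)
samples = sample_inputs(specs, draws=1000)
```

Keeping the distribution name as a string means the same file could be consumed by any backend, each mapping the name to its own distribution object.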

Enhance the README file in the repository

We should add example cases where using privugger is useful, as well as links to the documentation page.

Adding support for plotting the pymc3 model

Perhaps by separating the function that builds the model from the one that runs the inference.

Running multiple times with pymc3 backend throws an error

When running infer multiple times with the pymc3 backend we get an error. It seems to be related to re-defining the pymc3 model more than once.

add_observation not working if using concatenate in input_spec

Since the distribution resulting from pv.concatenate has no name, the method add_observation crashes.

Support for concatenating and stacking distributions

It should be possible to concatenate and stack distributions over different axes.

Discuss new tests

Expand documentation with tutorial examples

We should add more examples of using privugger that serve as a guide for users of the library.

Support for attacker synthesis

We should have a module for attacker synthesis based on the work by @Pluttodk.

Implement a get_model function for the pymc3 backend

Implement a kl-divergence estimator using kde and approximate integration
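
One possible shape for such an estimator, as a sketch only: both densities are estimated with a Gaussian KDE (using a normal-reference rule-of-thumb bandwidth) and the integral of p log(p/q) is approximated with the trapezoidal rule on a shared grid. The function names, the bandwidth rule, the grid size, and the clipping constant `eps` are all assumptions, not decisions taken in this issue.

```python
import numpy as np


def gaussian_kde_pdf(samples, grid):
    """Evaluate a Gaussian KDE of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def kl_divergence_kde(p_samples, q_samples, grid_size=512, eps=1e-12):
    """Estimate KL(P || Q) in nats from 1d samples of P and Q."""
    low = min(np.min(p_samples), np.min(q_samples))
    high = max(np.max(p_samples), np.max(q_samples))
    grid = np.linspace(low, high, grid_size)
    # Clip the densities away from zero so that the logarithm stays finite.
    p = np.clip(gaussian_kde_pdf(p_samples, grid), eps, None)
    q = np.clip(gaussian_kde_pdf(q_samples, grid), eps, None)
    integrand = p * np.log(p / q)
    # Trapezoidal rule, written out explicitly.
    return float(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(grid)))


rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=2000)
posterior = rng.normal(1.0, 1.0, size=2000)  # stand-in for posterior samples
kl_same = kl_divergence_kde(prior, rng.normal(0.0, 1.0, size=2000))
kl_shift = kl_divergence_kde(posterior, prior)
```

For two unit-variance normals whose means differ by 1, the exact divergence is 0.5 nats, so `kl_shift` should land in that neighbourhood, while `kl_same` should be close to 0. The KDE smoothing biases the estimate, which is worth keeping in mind when writing tests for it.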

Add requirements.txt file

Allow renaming the output distribution

We can easily allow the user to rename the output distribution by adding a name parameter to the Program constructor.

Use `inferencedata` for all backends

Built-in functions for privacy risk queries

Visualizing a distribution
Querying probability
Querying standard statistics
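
As a sketch of what the probability and statistics queries could compute, assuming the posterior is available as a 1d array of samples (visualization is left out here). The helper names `probability` and `summary` are hypothetical, and the normal samples below are only a stand-in for a real trace.

```python
import numpy as np


def probability(samples, predicate):
    """Monte Carlo estimate of P(predicate(X)) from samples of X."""
    return float(np.mean(predicate(np.asarray(samples))))


def summary(samples, credible_mass=0.95):
    """Mean, standard deviation and an equal-tailed credible interval."""
    samples = np.asarray(samples, dtype=float)
    tail = (1 - credible_mass) / 2
    low, high = np.quantile(samples, [tail, 1 - tail])
    return {"mean": float(samples.mean()),
            "std": float(samples.std(ddof=1)),
            "interval": (float(low), float(high))}


rng = np.random.default_rng(1)
posterior_age = rng.normal(35.0, 2.0, size=20_000)  # stand-in samples

# Probability that the attacker's belief is within one year of the true age 35.
p_near = probability(posterior_age, lambda a: np.abs(a - 35) < 1)
stats_age = summary(posterior_age)
```

A query such as `p_near` is a direct privacy risk measure: the larger it is, the more accurately the attacker can guess the secret.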

Adding support for multivariate distributions in the prior

Recurrent calls to analyses fail

When we run the analysis more than once on two different programs, the analysis doesn't seem to work well.

Concatenate without Deterministic node

fix requirements warning

Adding function to compute KL divergence

This is a child issue of #4.

Sampling from prior (in pymc3 backend)

It might be useful to be able to sample from the prior, in order to compare leakage between the prior and the posterior.

Changing the name of the library to privug instead of privugger?

Since we are developing a library, it seems more appropriate to call it privug. What do you think?

Add a Program datatype

Update `opendp` notebook

The opendp library has updated its API. We should update our notebook using opendp.

Add a notebook example working on location data

`pv.Constant` + `pv.concatenate` for continuous variables does not work well

@CorentinPhilippe-Taylor noticed that when concatenating a pv.Constant random variable which is supposed to be continuous, pymc tries to use NUTS for that variable. Then it crashes as the gradient is 0.

Discuss input API for estimators of leakage measures

re variable not found

Hi, I was running the examples from the tutorial.
I got to this point:

trace   = pv.infer(program, 
                   cores=4,
                   draws=10_000,
                   method='pymc3')

This runs for a while but then fails with:

/python3.8/site-packages/privugger/data_structures/program.py", line 54, in add_observation
    vals = re.search(cons, constraints)
NameError: name 're' is not defined

I guessed that the import of the regex module was probably missing, so I tried adding it.
Then I ran it again, to which I got:

python3.8/site-packages/privugger/data_structures/program.py", line 121, in inner
    pm.Normal(f"cons_{i}", distribution,        precision, observed=value)
NameError: name 'pm' is not defined

Could some imports be missing, or am I doing something wrong?

Add new module for privacy measures

We will add a module with functions to compute several privacy measures.

This survey contains a wide variety of metrics we can consider supporting.

Adding backend with pyro

Moving names of random variables to declaration

requirements.txt outdated in develop branch

Simply running pip install -r requirements.txt in the develop branch does not properly install all required dependencies. Here are a few entries we should add (possibly others as well):

numpy==1.22.4
pymc==4.1.5

Create a front page for documentation

Alter /docs/index.rst so that the front page and nav bar look more like num.pyro.ai

Create functions for computing quasi-identifier histograms
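
One common reading of a quasi-identifier histogram, sketched below under that assumption: group the records by their combination of quasi-identifier values and count how many groups (equivalence classes) there are of each size. The function name and the toy records are hypothetical.

```python
from collections import Counter


def quasi_identifier_histogram(records, quasi_identifiers):
    """Map each equivalence-class size to the number of classes of that size."""
    # Group records by their combination of quasi-identifier values.
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return Counter(class_sizes.values())


# Toy dataset, made up for illustration.
records = [
    {"zip": "2300", "age": 34, "diagnosis": "A"},
    {"zip": "2300", "age": 34, "diagnosis": "B"},
    {"zip": "2300", "age": 51, "diagnosis": "A"},
    {"zip": "2450", "age": 51, "diagnosis": "C"},
]
histogram = quasi_identifier_histogram(records, ["zip", "age"])
```

The smallest key of the histogram is the k for which the dataset is k-anonymous with respect to the chosen quasi-identifiers; here two records are unique on (zip, age), so k is 1.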

Analyzing `lambda` or `def` functions directly

We should add support for analyzing a function that is not provided via a .py file.
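
A possible way to accept both forms, sketched with the standard library only: if the target is already callable it is used as is, otherwise it is treated as a path to a .py file and the function is loaded from it. The name `resolve_program` and the default function name `"program"` are assumptions for illustration.

```python
import importlib.util
from pathlib import Path


def resolve_program(target, function_name="program"):
    """Return a callable from a function object or from a path to a .py file."""
    if callable(target):
        return target  # a lambda or def passed directly
    path = Path(target)
    # Load the file as a module and look up the requested function in it.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function_name)


double = resolve_program(lambda x: 2 * x)
```

With a dispatcher like this, the rest of the analysis only ever sees a callable, regardless of how the user supplied the program.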

Adding support for conditions

We should add support for conditions on random variables (modulo the selected backend).
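
For a pure sampling backend, one simple (if inefficient) way to support conditions is rejection sampling: draw from the prior, run the program, and keep only the samples that satisfy the condition. The sketch below assumes this approach; the helper name, the priors, and the program are hypothetical.

```python
import numpy as np


def condition_by_rejection(inputs, output, condition):
    """Keep only the joint samples whose output satisfies `condition`."""
    mask = condition(output)
    return {name: values[mask] for name, values in inputs.items()}, output[mask]


rng = np.random.default_rng(2)
inputs = {"alice": rng.normal(35, 10, size=50_000),
          "bob": rng.normal(40, 5, size=50_000)}
output = (inputs["alice"] + inputs["bob"]) / 2  # hypothetical program

# Condition on the released average lying in [44, 46].
posterior, kept = condition_by_rejection(
    inputs, output, lambda o: (o >= 44) & (o <= 46))
```

Observing a high average shifts the belief about alice's age upwards, which is exactly the kind of leakage the analysis is meant to expose. Rejection sampling wastes most draws when the condition is rare, which is one reason an MCMC backend remains useful.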

Plotting mutual information given some parameter

Support hierarchical models

Code base refactoring

Make sure we use snake case everywhere
Split the data_structures class into different classes within the data_structures module
Rename method.py to inference.py

Porting pymc3 backend to pymc

This was triggered by pymc3 not supporting the new ARM64 architecture in (some) modern laptops.

Automatic inference (and program lifting) based on specifications

Related to the program transformation work that @RasmusCarl did, we should have an infer method that takes as input a Python program and a specification (as defined in #5), and returns the inferred trace. Something like:

import privug as pv

pv.infer(target_program, input_spec)

Create an easy way to install it via pip

Creating unit tests

We should create the infrastructure for unit testing.
Then we should add several unit tests checking the correctness of probability queries and privacy measures (#4).
- The tests should take into account the analytical expected value and variance of the estimators.
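
As an illustration of the last point, here is a sketch of a test whose tolerance is derived from the analytical variance of the estimator rather than picked by hand. The estimator `estimate_probability` is a hypothetical stand-in for a library query.

```python
import math
import random
import unittest


def estimate_probability(sample, predicate):
    """Monte Carlo estimate of P(predicate(X)) from a list of samples."""
    return sum(1 for x in sample if predicate(x)) / len(sample)


class ProbabilityQueryTest(unittest.TestCase):
    def test_uniform_tail_probability(self):
        rng = random.Random(0)
        n = 20_000
        sample = [rng.uniform(0, 50) for _ in range(n)]
        estimate = estimate_probability(sample, lambda x: x > 40)

        # Analytical value and standard error of the estimator:
        # X ~ Uniform(0, 50), so P(X > 40) = 0.2, and the estimate is a
        # Binomial(n, p) proportion with variance p * (1 - p) / n.
        p = 0.2
        standard_error = math.sqrt(p * (1 - p) / n)
        self.assertLess(abs(estimate - p), 5 * standard_error)
```

The test can be run with `python -m unittest`. Tying the tolerance to the standard error keeps the test meaningful when the number of draws changes, and fixing the seed keeps it deterministic.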
