csong27 / membership-inference

Code for Membership Inference Attack against Machine Learning Models (in Oakland 2017)

Python 100.00%

machine-learning privacy

membership-inference's Issues

About the training set

I want to know whether train_feat_file and train_label_file are the training set of the target model. Should I build a target classification model myself? Thank you.

Attack model for already trained model

Hi,
I am not an AI expert and I need to train your attack model. I am a little confused here.
Why do you train a target model? Isn't the target model the one provided by an MLaaS platform, or at least one the attacker knows nothing about?
In other words, I have a trained model with inputs and outputs. I just need to train the shadow model and the attack model. Do I still need to train the target model again?
If not, what do I have to change in the code?
Thanks
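
For readers with the same question, here is a minimal sketch of how attack training data is usually assembled from shadow models in this kind of attack: outputs on a shadow model's own training records are labeled 1 ("in") and outputs on its held-out records are labeled 0 ("out"). The function name build_attack_dataset, the toy_shadow model, and the data layout are illustrative assumptions, not code from this repository.

```python
from typing import Callable, List, Sequence, Tuple

Record = Sequence[float]
Model = Callable[[Record], List[float]]  # maps a record to class probabilities


def build_attack_dataset(
    shadow_models: Sequence[Model],
    shadow_splits: Sequence[Tuple[Sequence[Record], Sequence[Record]]],
) -> Tuple[List[List[float]], List[int]]:
    """Label shadow-model outputs as member (1) or non-member (0)."""
    features: List[List[float]] = []
    labels: List[int] = []
    for predict, (train_records, held_out_records) in zip(shadow_models, shadow_splits):
        for record in train_records:        # records the shadow model was trained on
            features.append(list(predict(record)))
            labels.append(1)
        for record in held_out_records:     # records the shadow model never saw
            features.append(list(predict(record)))
            labels.append(0)
    return features, labels


def toy_shadow(record: Record) -> List[float]:
    """Hypothetical stand-in for a trained shadow model (two classes)."""
    p = min(max(record[0], 0.0), 1.0)
    return [p, 1.0 - p]


attack_x, attack_y = build_attack_dataset(
    [toy_shadow],
    [([[0.9], [0.8]], [[0.6]])],  # (training records, held-out records)
)
```

Under this setup an already trained target model would only be queried at attack time; whether the repository's script can skip target training without changes is not confirmed here.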

About some questions

Hello, Dr. Song. I read your paper "Membership Inference Attacks Against Machine Learning Models" the other day. I am very interested in it, but I have two questions. First, your attack requires the confidence values output by the target model. What if the output does not include confidence values? Second, in the example you use, a certain patient's clinical record was used to train a model associated with a disease (for example, to determine the appropriate medicine dosage). If I input this person's information and the model outputs a medicine dosage, then this person must suffer from the disease, and there is no need for a membership inference attack. So what is the significance of the attack in this setting? I would appreciate your reply.

About the data

Could anybody explain what 'train_feat' and 'train_label' mean? Do they serve as the training data of the target model? If so, how can I run the experiment on other datasets, e.g., CIFAR?
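
A minimal sketch of one way to prepare such files, assuming (this is not confirmed by the thread) that the script expects a comma-separated text file with one flattened feature row per record and a second file with one integer label per line. The file names, the scaling to [0, 1], and the random stand-in images are hypothetical; real CIFAR images would replace the random ones.

```python
import csv
import os
import random
import tempfile


def flatten_image(image):
    """Flatten an H x W x C nested list into one row of floats in [0, 1]."""
    return [channel / 255.0 for row in image for pixel in row for channel in pixel]


def write_feature_and_label_files(images, labels, feat_path, label_path):
    """Write one CSV row of features per image and one integer label per line."""
    with open(feat_path, "w", newline="") as feat_file:
        writer = csv.writer(feat_file)
        for image in images:
            writer.writerow(["%.6f" % value for value in flatten_image(image)])
    with open(label_path, "w") as label_file:
        for label in labels:
            label_file.write("%d\n" % label)


# Stand-in for CIFAR-10: four random 32x32 RGB images with labels in 0..9.
rng = random.Random(0)
images = [
    [[[rng.randrange(256) for _ in range(3)] for _ in range(32)] for _ in range(32)]
    for _ in range(4)
]
labels = [rng.randrange(10) for _ in images]

out_dir = tempfile.mkdtemp()
feat_path = os.path.join(out_dir, "train_feat.csv")    # hypothetical file name
label_path = os.path.join(out_dir, "train_label.csv")  # hypothetical file name
write_feature_and_label_files(images, labels, feat_path, label_path)
```

Each 32x32 RGB image becomes a row of 3072 values, which suits a fully connected model; a convolutional model would need the rows reshaped back into images after loading.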

Testing the attack model

Hello Dr. Song,
I was checking your paper and the code. I found that two files are saved after the target model is trained: target_model.npz and attack_test_data.npz. Could you please tell me which of these files the attack model uses to evaluate the attack?

Thanks in advance.
Shuvo

About the Sample Algorithm introduced on purchase dataset

Hi, Dr. Song. I am now working on privacy in federated learning, and I have read the article "Membership Inference Attacks Against Machine Learning Models". It is very kind of you to share the source code.

However, my reproduction of the attack on the purchase dataset does not work well. Could you share the details of the sampling algorithm used to build the simplified purchase dataset? I found that each commodity is represented by dept, category, company, and brand. I have the following questions:

1. Is this code for the attack on the purchase dataset?
2. The primary key that identifies a commodity is composed of four parts. Do you remember which commodity features you used as the simplified primary key of each commodity?
3. I found that you used 600 features from the commodity list. Is my understanding correct: after randomly sampling the purchase dataset, we randomly choose 600 columns of the matrix that records which commodities each user has bought, then run k-means clustering and give each input its cluster as a label?

Thank you for all your assistance.

How to convert input_var to a string matrix if train_feat_file contains strings instead of floats?

Hi csong, I am new to Tensor. I wanted to try your code, but my dataset contains string data instead of floating-point values. How should I modify the code in my case? Could you please help? Thanks
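
One common option is to leave the model code alone and convert the string columns to numbers before writing the feature file. A minimal sketch using one-hot encoding follows; the helper names fit_categories and one_hot_encode are made up for illustration, and the sketch assumes every column is categorical.

```python
def fit_categories(rows):
    """Collect the sorted distinct string values seen in each column."""
    n_cols = len(rows[0])
    return [sorted({row[col] for row in rows}) for col in range(n_cols)]


def one_hot_encode(rows, categories):
    """Replace each string with a block of 0.0/1.0 indicators, one per category."""
    encoded = []
    for row in rows:
        vector = []
        for value, column_values in zip(row, categories):
            vector.extend(1.0 if value == candidate else 0.0 for candidate in column_values)
        encoded.append(vector)
    return encoded


rows = [["red", "small"], ["blue", "large"], ["red", "large"]]
categories = fit_categories(rows)            # [['blue', 'red'], ['large', 'small']]
features = one_hot_encode(rows, categories)  # rows of floats, ready to save as CSV
```

A value that was not seen when fitting the categories is encoded as all zeros for its column, so the categories should be fitted on the full dataset before it is split.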

How to optimize the parameter

Hello, Song.
I tried your code on the UCI Adult (salary) dataset and CIFAR-10. On the Adult dataset I got results similar to yours; on CIFAR-10, however, I got very low accuracy. I guess some parameters need tuning, such as batch_size or epochs.
So I wonder whether you would be willing to publish your parameters. Another question: for images, should I replace the NN used for the target and shadow models with a CNN or something else?
Thank you, I really appreciate your work.

What about ML-Leaks?

Hello Song
You may already know of a newer paper named ML-Leaks (arXiv:1806.01246), which builds on your work. What is your opinion of it? Do you think ML-Leaks is a more effective attack?
Thank you.

Theano and Lasagne versions

Hi,

Could you please let me know the Theano and Lasagne versions used in the code?

Thanks.

About Algorithm 1 Data Synthesis Using the Target Model

Hi Dr. Song,

Thank you for providing the source code of the paper. I have been reading the paper and reproducing its experiments. However, I found that this code and other implementations (such as ml-leaks, cyphercat, and mia) build the shadow models' training sets simply from records of the same dataset (like CIFAR-10) that are disjoint from the target's training set, or by replacing k features. This may differ somewhat from the original algorithm in the paper.

I implemented "Algorithm 1: Data synthesis using the target model" myself in PyTorch. I generate a random tensor of size (1, 3, 32, 32) for the CIFAR-10 dataset and use the two phases, search and sample, as in the paper's algorithm. The code is below:

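import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# gen_class_tensor, nn_predict_proba, nn_predict and rand_tensor are helper
# functions that are not shown in this post.

def data_synthesize(net, trainset_size, fix_class, initial_record, k_max,
                    in_channels, img_size, batch_size, num_workers, device):
    """
    Synthesize a data record using the target model (Algorithm 1 in the paper).
    """
    # Initialize X_tensor with an initial_record, with size of (1, in_channels, img_size, img_size)
    X_tensor = initial_record
    X_new_tensor = X_tensor  # last accepted record
    # Generate y_tensor with a size matching X_tensor's
    y_tensor = gen_class_tensor(trainset_size, fix_class)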

    y_c_current = 0         # target models probability of fixed class
    j = 0                   # consecutive rejections counter
    k = k_max               # search radius
    max_iter = 100          # max iter number
    conf_min = 0.1          # min probability cutoff to consider a record member of the class
    rej_max = 5             # max number of consecutive rejections
    k_min = 1               # min radius of feature perturbation

    for _ in range(max_iter):

        dataset = TensorDataset(X_tensor, y_tensor)
        dataloader = DataLoader(dataset=dataset, batch_size=batch_size, num_workers=num_workers, shuffle=True)

        y_c = nn_predict_proba(net, dataloader, device, fix_class)

        # Phase 1: Search
        if y_c >= y_c_current:
            # Phase 2: Sample
            if y_c > conf_min and fix_class == torch.argmax(nn_predict(net, dataloader, device), dim=1):
                return X_tensor

            X_new_tensor = X_tensor
            y_c_current = y_c  # renew variables
            j = 0
        else:
            j += 1
            if j > rej_max:  # many consecutive rejects
                k = max(k_min, int(np.ceil(k / 2)))
                j = 0
        X_tensor = rand_tensor(X_new_tensor, k, in_channels, img_size, trainset_size)

    return X_tensor, y_c

However, the prediction probability it generates is very low, around 0.1. Could you please give me some guidance on the data synthesis algorithm, or update the uploaded code? Thanks in advance for your patience!

Best wishes!

Yantong
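
For comparison, here is a small self-contained sketch of the search-and-sample loop on a toy problem. It follows the paper's pseudocode as I read it, including the extra sampling step (return the record only with probability y_c) that the code above omits. The binary features, the toy_target model, and all parameter values are assumptions chosen for illustration, not the authors' settings.

```python
import math
import random


def synthesize(target_predict, target_class, n_features, k_max, k_min=1,
               conf_min=0.8, rej_max=5, iter_max=1000, rng=None):
    """Hill-climb on the target model's confidence for target_class."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n_features)]  # random initial record
    x_star, y_star = x, 0.0        # best accepted record and its confidence
    rejections, k = 0, k_max
    for _ in range(iter_max):
        probs = target_predict(x)  # query the target model
        y_c = probs[target_class]
        if y_c >= y_star:          # search phase: accept the proposal
            is_top = max(range(len(probs)), key=probs.__getitem__) == target_class
            # sample phase: return the record with probability y_c
            if y_c > conf_min and is_top and rng.random() < y_c:
                return x
            x_star, y_star, rejections = x, y_c, 0
        else:
            rejections += 1
            if rejections > rej_max:  # many consecutive rejects: shrink the radius
                k = max(k_min, math.ceil(k / 2))
                rejections = 0
        # propose a new record by flipping k random features of the accepted one
        x = list(x_star)
        for i in rng.sample(range(n_features), k):
            x[i] = 1 - x[i]
    return None  # failed to synthesize


def toy_target(x):
    """Hypothetical two-class model: class 1 is likelier when more bits are set."""
    score = sum(x) - len(x) / 2.0
    p1 = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p1, p1]


record = synthesize(toy_target, target_class=1, n_features=20, k_max=8,
                    rng=random.Random(0))
```

On image inputs with thousands of real-valued pixels, randomly perturbing a handful of features may move the confidence far less than in this toy case, so a low final probability is not necessarily a bug in the loop itself.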

about Chiron

Hello Dr. Song. I noticed that you are the author of Chiron. Could I download Chiron and use it to evaluate my research? If so, please let me know where I can get it. Thank you.
By the way, I think your membership inference attack is really good work. However, the accuracy is not high when the number of classes c is small, for example on MNIST or CIFAR-10. I think some improvement could be made in the future. What is your opinion? Do you have such a plan?
Thank you.

About Target Data

Hello,

Excuse my lack of knowledge but I am failing to run your project.

The code requires a file target_data.npz and I am not sure if I should create that file or whether it should be provided.

Thank you.