csong27 / membership-inference
Code for Membership Inference Attack against Machine Learning Models (in Oakland 2017)
I want to know whether train_feat_file and train_label_file are the training set of the target model. Should I build a target classification model myself? Thank you.
Hi,
I am not an AI expert and I need to train your attack model, but I am a little confused.
Why do you train a target model? Isn't the target model the one provided by the MLaaS platform, or at least one the attacker knows nothing about?
In other words, I already have a trained model with inputs and outputs, so I only need to train the shadow models and the attack model. Do I still need to train the target model again?
If not, what do I have to change in the code?
Thanks
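For readers following this question, the step that turns shadow models into attack training data can be sketched as below. This is a minimal illustration, not the repository's code: each shadow model is queried on records it was trained on (labelled 1, "in") and on records it never saw (labelled 0, "out"), and the resulting prediction vectors become the attack model's training set. The names `build_attack_dataset`, `query_model`, and `toy_shadow_model` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Record = Sequence[float]
Probs = List[float]


def build_attack_dataset(
    query_model: Callable[[Record], Probs],
    member_records: Sequence[Record],
    nonmember_records: Sequence[Record],
) -> Tuple[List[Probs], List[int]]:
    """Pair each prediction vector with a membership label (1 = in, 0 = out)."""
    features: List[Probs] = []
    labels: List[int] = []
    for record in member_records:
        features.append(query_model(record))
        labels.append(1)
    for record in nonmember_records:
        features.append(query_model(record))
        labels.append(0)
    return features, labels


# Toy stand-in for a trained shadow model (an assumption for illustration):
# it is more confident on the records it "memorised" during training.
memorised = {(0.0, 1.0), (1.0, 0.0)}


def toy_shadow_model(record: Record) -> Probs:
    return [0.9, 0.1] if tuple(record) in memorised else [0.6, 0.4]


attack_x, attack_y = build_attack_dataset(
    toy_shadow_model,
    member_records=[(0.0, 1.0), (1.0, 0.0)],
    nonmember_records=[(1.0, 1.0)],
)
```

With an existing black-box target, only the shadow models and the attack model would need training under this sketch; a locally trained target is useful mainly when ground-truth membership is needed to measure attack accuracy.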
Hello, Dr. Song. I read your paper "Membership Inference Attacks Against Machine Learning Models" the other day and I am very interested in it, but I have two questions. First, your attack requires the confidence values output by the target model. What if the model does not output confidence values? Second, in your example, a patient's clinical record was used to train a model that determines the appropriate medicine dosage for a disease. If I input this person's information and the model outputs a dosage, then this person must suffer from the disease, so there seems to be no need for a membership inference attack. What, then, is the significance of the attack? I would appreciate your reply.
Could anybody explain what 'train_feat' and 'train_label' mean? Do they serve as the training data of the target model? If so, how can I run the experiment on other datasets, e.g. CIFAR?
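Whatever file format the script expects, running the experiment on another dataset usually starts by partitioning the records into disjoint pools for the target and shadow models. The sketch below shows one way to do that on record indices. It is an assumption about a reasonable setup, not a description of the repository, and `split_indices` and its parameters are hypothetical names.

```python
import random
from typing import Dict, List


def split_indices(n_records: int, target_train: int, target_test: int,
                  seed: int = 0) -> Dict[str, List[int]]:
    """Shuffle record indices and cut them into three disjoint pools."""
    if target_train + target_test > n_records:
        raise ValueError("target splits are larger than the dataset")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = target_train + target_test
    return {
        "target_train": indices[:target_train],   # members of the target model
        "target_test": indices[target_train:cut],  # non-members, for evaluation
        "shadow_pool": indices[cut:],              # data for the shadow models
    }


splits = split_indices(n_records=100, target_train=30, target_test=30)
```

The index lists can then be used to slice the feature and label arrays of any dataset before saving them in whatever format the training script reads.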
Hello Dr. Song,
I was reading your paper and the code. I found that two files are saved after the target model is trained: target_model.npz and attack_test_data.npz. Could you tell me which of these files the attack model uses to evaluate the attack?
Thanks in advance.
Shuvo
Hi, Dr. Song. I am working on privacy in federated learning and have read the article Membership Inference Attacks Against Machine Learning Models. Thank you for sharing the source code.
However, my reproduction of the attack on the purchase dataset does not work well. Could you share the details of the sampling algorithm used to build the simplified purchase dataset? I found that each commodity is represented by dept, category, company, and brand.
I also have the following questions:
Thank you for all your assistance.
Hi csong, I am new to Tensor. I wanted to try your code, but my dataset contains string data instead of floating-point values. How should I modify the code in my case? Could you please help? Thanks.
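One common way to handle string columns is to convert them to numeric vectors before they reach the model, so the training code itself does not need to change. The sketch below one-hot encodes categorical string columns in plain Python. It assumes every column is categorical, and the names `fit_vocabularies` and `one_hot_encode` are hypothetical, not part of the repository.

```python
from typing import Dict, List, Sequence


def fit_vocabularies(rows: Sequence[Sequence[str]]) -> List[Dict[str, int]]:
    """Build one value -> index mapping per column, in sorted order."""
    n_cols = len(rows[0])
    vocabularies = []
    for col in range(n_cols):
        values = sorted({row[col] for row in rows})
        vocabularies.append({value: i for i, value in enumerate(values)})
    return vocabularies


def one_hot_encode(rows: Sequence[Sequence[str]],
                   vocabularies: List[Dict[str, int]]) -> List[List[float]]:
    """Replace each string by a one-hot slot and concatenate the slots."""
    encoded = []
    for row in rows:
        vector: List[float] = []
        for value, vocab in zip(row, vocabularies):
            slot = [0.0] * len(vocab)
            slot[vocab[value]] = 1.0  # raises KeyError on unseen values
            vector.extend(slot)
        encoded.append(vector)
    return encoded


rows = [("red", "small"), ("blue", "large"), ("red", "large")]
vocabs = fit_vocabularies(rows)
features = one_hot_encode(rows, vocabs)
```

The vocabularies should be fitted once and reused for the target, shadow, and attack data so that every model sees the same feature layout.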
Hello, Song.
I tried your code on the UCI Adult salary dataset and CIFAR-10. I get results similar to yours on the Adult dataset, but on CIFAR-10 I get very low accuracy. I guess some parameters need tuning, such as batch_size or epochs.
So I wonder whether you would be willing to publish your parameters. Another question: for images, should I replace the NN used for the target and shadow models with a CNN or another architecture?
Thank you, I really appreciate your work.
Hello Song
You may know of a newer paper, ML-Leaks (arXiv:1806.01246), which builds on your work. What is your opinion of it? Do you think ML-Leaks is more effective?
Thank you.
Hi,
Could you please let me know the Theano and Lasagne versions used in the code?
Thanks.
Hi Dr. Song,
Thank you for providing the source code of the paper. I have been reading the paper and reproducing its experiments. However, I found that in the code, and in other implementations such as ml-leaks, cyphercat, and mia, the training data for the shadow models is simply drawn from records of the same dataset (e.g. CIFAR-10) that are disjoint from the target's training data, or produced by replacing k features. This seems to differ somewhat from the original algorithm in the paper.
I implemented Algorithm 1 (Data synthesis using the target model) myself in PyTorch. I generate a random tensor of size (1, 3, 32, 32) for the CIFAR-10 dataset and use the two phases, search and sample, as in the paper's algorithm. The code is below:
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# gen_class_tensor, nn_predict_proba, nn_predict and rand_tensor are helper
# functions that are not shown in this post.


def data_synthesize(net, trainset_size, fix_class, initial_record, k_max,
                    in_channels, img_size, batch_size, num_workers, device):
    """
    It is a function to synthesize data
    """
    # Initialize X_tensor with an initial_record, with size of (1, in_channels, img_size, img_size)
    X_tensor = initial_record
    # Generate y_tensor with the size equivalent to X_tensor's
    y_tensor = gen_class_tensor(trainset_size, fix_class)
    y_c_current = 0  # target model's probability of fixed class
    j = 0  # consecutive rejections counter
    k = k_max  # search radius
    max_iter = 100  # max iter number
    conf_min = 0.1  # min probability cutoff to consider a record member of the class
    rej_max = 5  # max number of consecutive rejections
    k_min = 1  # min radius of feature perturbation
    for _ in range(max_iter):
        dataset = TensorDataset(X_tensor, y_tensor)
        dataloader = DataLoader(dataset=dataset, batch_size=batch_size, num_workers=num_workers, shuffle=True)
        y_c = nn_predict_proba(net, dataloader, device, fix_class)
        # Phase 1: Search
        if y_c >= y_c_current:
            # Phase 2: Sample
            if y_c > conf_min and fix_class == torch.argmax(nn_predict(net, dataloader, device), dim=1):
                return X_tensor
            X_new_tensor = X_tensor
            y_c_current = y_c  # renew variables
            j = 0
        else:
            j += 1
            if j > rej_max:  # many consecutive rejects
                k = max(k_min, int(np.ceil(k / 2)))
                j = 0
        X_tensor = rand_tensor(X_new_tensor, k, in_channels, img_size, trainset_size)
    return X_tensor, y_c
However, the prediction probability of the records it generates is very low, around 0.1. Could you please give me some guidance on the data synthesis algorithm, or update the uploaded code? Thanks in advance for your patience!
Best wishes!
Yantong
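For comparison, the search-and-sample loop described in this post can be sketched without PyTorch on a toy problem. This is a simplified illustration under stated assumptions, not the paper's or the repository's implementation: records are binary feature vectors, `toy_predict` is a made-up two-class model, the acceptance test `rng.random() < conf` is my reading of the sampling phase, and all function and parameter names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional


def perturb(record: List[int], k: int, rng: random.Random) -> List[int]:
    """Flip k randomly chosen binary features of the record."""
    new = list(record)
    for idx in rng.sample(range(len(new)), k):
        new[idx] = 1 - new[idx]
    return new


def synthesize(predict: Callable[[List[int]], List[float]], n_features: int,
               target_class: int, k_max: int, rng: random.Random,
               k_min: int = 1, conf_min: float = 0.6, rej_max: int = 5,
               max_iter: int = 1000) -> Optional[List[int]]:
    x = [rng.randint(0, 1) for _ in range(n_features)]
    best_x, best_conf = x, 0.0
    rejections, k = 0, k_max
    for _ in range(max_iter):
        probs = predict(x)
        conf = probs[target_class]
        if conf >= best_conf:  # search phase: accept the proposal
            is_top = max(range(len(probs)), key=probs.__getitem__) == target_class
            # sample phase: return with probability equal to the confidence
            if conf > conf_min and is_top and rng.random() < conf:
                return x
            best_x, best_conf, rejections = x, conf, 0
        else:
            rejections += 1
            if rejections > rej_max:  # too many rejects: shrink the radius
                k = max(k_min, math.ceil(k / 2))
                rejections = 0
        x = perturb(best_x, k, rng)
    return None  # synthesis failed within max_iter


def toy_predict(record: List[int]) -> List[float]:
    """Made-up model: class 1 probability is the fraction of ones."""
    p1 = sum(record) / len(record)
    return [1.0 - p1, p1]


result = synthesize(toy_predict, n_features=16, target_class=1,
                    k_max=4, rng=random.Random(0))
```

In the posted code `conf_min` is 0.1, so the loop can return as soon as the class probability exceeds 0.1 and the class is the arg max. On a 10-class problem that threshold is close to a uniform guess, which may explain the low confidence observed; a higher `conf_min` is one thing to try.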
Hello Dr. Song. I noticed that you are the author of Chiron. Could I download Chiron and use it to evaluate my research? If so, please let me know where I can get it. Thank you.
By the way, I think your membership inference work is really good. However, the attack accuracy is not high when the number of classes is small, for example on MNIST or CIFAR-10. I think there is room for improvement. What is your opinion, and do you plan to work on this?
Thank you.
Hello,
Excuse my lack of knowledge, but I am failing to run your project.
The code requires a file target_data.npz, and I am not sure whether I should create that file or whether it should be provided.
Thank you.