aranganath / adversarial

Python 0.94% Jupyter Notebook 99.06%

adversarial's Introduction

Adversarial Sample Detection and Classification with Clustering

The class ClusteringAdvClassifier implements a scikit-learn-style classifier (with the fit, predict, and score methods implemented) that combines another trained model (provided by the user) with clustering algorithms to detect and classify adversarial samples. The basic principle is to detect high-magnitude changes in the output of the provided model, indicated by an output falling in a different cluster than regular outputs of the same class. The classifier determines which output cluster a sample's output is expected to fall into by also clustering the sample in the input space and predicting the output cluster from the input cluster. The training procedure works as follows:

Cluster the inputs to the classifier based on the provided labels (this applies an SVM with an RBF kernel to create a round boundary around the data).
Run the provided training samples through the provided model, which is assumed to be already trained.
Cluster the outputs of the provided model, using the same provided labels.

By training the clustering algorithm with the same labels for the input and output spaces, we can infer where we expect the output of a sample to be clustered from the cluster the sample falls into in the input space. If the sample's output does not land in the output cluster predicted by the input cluster, we flag it as suspicious. For now, suspicious samples are classified simply by returning the classification given by the clustering algorithm rather than the one given by the model. Any sample not flagged as suspicious is classified by the provided model.

The intuition behind this method is that the objective of an adversarial attack is to change the image pixels so that the underlying model misclassifies the image, with minimal changes to the pixels themselves. We therefore assume that adversarial images differ relatively little from natural/clean images, whereas the difference in model output must be at least large enough to cause the image to be misclassified.
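The decision rule described above can be sketched as follows. This is a minimal illustration, not the ClusteringAdvClassifier code: the function name and arguments are hypothetical, and it assumes that "the classification given by the clustering algorithm" means the label from the input-space clustering.

```python
import numpy as np

def classify_with_flags(model_pred, input_cluster_pred, output_cluster_pred):
    """Combine three per-sample label arrays into final labels and a suspicion mask.

    model_pred          -- labels from the user-provided model
    input_cluster_pred  -- labels from the clusterer fitted on the inputs
    output_cluster_pred -- labels from the clusterer fitted on the model outputs
    (All three names are hypothetical; they are not taken from the repo.)
    """
    model_pred = np.asarray(model_pred)
    input_cluster_pred = np.asarray(input_cluster_pred)
    output_cluster_pred = np.asarray(output_cluster_pred)

    # A sample is suspicious when its output lands in a different cluster
    # than the one predicted by its input-space cluster.
    suspicious = input_cluster_pred != output_cluster_pred

    # Assumption: suspicious samples take the input-space clustering label;
    # all other samples keep the model's label.
    final = np.where(suspicious, input_cluster_pred, model_pred)
    return final, suspicious

final, suspicious = classify_with_flags([3, 7, 1], [3, 2, 1], [3, 7, 1])
```

Here only the second sample is flagged, because its input falls in cluster 2 while its output falls in cluster 7, so it receives the clustering label instead of the model's.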

In this repo

The main testing and visualization of the Adversarial Sample Detection and Classification model is done in the Adversarial_Sample_Detection_with_Clustering Jupyter notebook. Many other notebooks were used earlier to create visualizations that helped motivate our solution, along with some other ideas we have looked at. They can largely be ignored for now and will likely be removed later.

There is also a LayerwiseClustering class, which does not fully function as a classifier right now but was used to create clusters at all major layers of the provided model. We used it to visualize clusters at each layer, and it could in principle be modified so that our classifier looks at more than just the input and output layers. The visualization of each layer can be seen in the View_Sample_At_Each_Layer Jupyter notebook.

model.py defines a small VGG network that we use for testing, and train_model.py is a script that we use for training and saving a model. train_model.py trains both a base network, used as the underlying model for the clustering adversarial classifier, and an adversarially trained version of the same model, used as a baseline for comparison with existing adversarial sample classification methods.

adversarial's Issues

Using cleverhans to generate adversarial attacks

Now that we have the functionality to create the k-means clusters, we need to train a network that can reach up to 99% accuracy.

A few things that need to be worked on:

Change the colors for each class.
Change the color for each cluster centroid.
Add a legend for each class.

Once that is done, we have to create adversarial attacks. We can use cleverhans to do this! (Personally, I have limited knowledge of it.)

Exploring k-means class-wise

We need to find out what the k-means cluster is for each class.

So this is what needs to be done:

Apply the k-means clustering algorithm to each class instead of the entire dataset. This means:

- We need the clusters for each class, each with its own batch size. So we need to iterate through the classes (0-9) and cluster each one. (You may vary the k value for each class and see how the clusters look.)
- Plot those images and see what these clusters look like.
- Do the same thing for their labels.

We will discuss more as we explore the above.
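As a rough sketch of the class-wise clustering described above, the following runs a plain Lloyd's k-means separately on the samples of each class. It is not the code in clustering.py; the function names, the fixed iteration count, and the `k_per_class` mapping are assumptions made for illustration.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        assign = nearest_centroid(points, centroids)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids, nearest_centroid(points, centroids)

def kmeans_per_class(images, labels, k_per_class):
    """Cluster each class on its own; k_per_class maps class label -> k."""
    flat = images.reshape(len(images), -1)  # one row of pixels per image
    result = {}
    for label in np.unique(labels):
        result[int(label)] = kmeans(flat[labels == label], k_per_class[int(label)])
    return result
```

Calling `kmeans_per_class(images, labels, {c: 3 for c in range(10)})` would give three centroids per digit class; each centroid can be reshaped back to the image shape and plotted to see what the cluster looks like.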

Create TSNE points and create the distribution

Do this for one class.

Create the TSNE map for one class of images
Create the adversarial attack.
Plot the TSNE map of the adversarial attack on the same plot.
Also plot the map of the labels of the images (form a network).

The main idea is to see how far the adversarial attack is from the cluster.
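One simple way to put a number on "how far" is to compare a sample's distance from the cluster centroid with the typical distance of the cluster's own members. The sketch below assumes Euclidean distance in whatever space the points are given in (raw pixels, model outputs, or 2-D embedded coordinates); the function name is hypothetical.

```python
import numpy as np

def distance_from_cluster(cluster_points, sample):
    """Return (distance to centroid, distance relative to the mean member distance)."""
    centroid = cluster_points.mean(axis=0)
    # How far the cluster's own members typically sit from the centroid.
    member_dists = np.linalg.norm(cluster_points - centroid, axis=1)
    sample_dist = np.linalg.norm(sample - centroid)
    return sample_dist, sample_dist / member_dists.mean()

clean = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
dist, ratio = distance_from_cluster(clean, np.array([4.0, 5.0]))
```

A ratio well above 1 means the sample lies further from the centroid than the cluster's members usually do.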

Understanding the results

Here are the next steps to be followed:

Identify which images have been classified correctly by the clustering algorithm and which images have been classified by the cluster.
I have coded up the DCT from scratch (I think this one is correct). You should find it in the dct_transform_implementation.ipynb notebook. (Yes, it is very slow. As a side task, you can try to improve the speed.) Use it to compute the DCT of each image, and note any difference between the scipy DCT and this implementation.
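For the speed-up side task, one common approach is to replace the explicit loops with a matrix product. The sketch below is not the notebook's implementation; it assumes the unnormalised DCT-II convention, y[k] = 2 * sum_i x[i] * cos(pi * k * (2i + 1) / (2n)), which is also the convention of scipy's type-2 DCT when no normalisation is requested. The notebook's own scaling may differ, so a constant factor between the two results would not by itself indicate a bug.

```python
import numpy as np

def dct2_naive(x):
    """Unnormalised DCT-II of a 1-D signal using explicit loops (slow reference)."""
    n = len(x)
    out = np.zeros(n)
    for k in range(n):
        for i in range(n):
            out[k] += 2.0 * x[i] * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return out

def dct_matrix(n):
    """n x n matrix M with M[k, i] = 2 cos(pi k (2i + 1) / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return 2.0 * np.cos(np.pi * k * (2 * i + 1) / (2 * n))

def dct2_image(img):
    """2-D DCT-II of an image: transform along the rows axis, then the columns axis."""
    rows, cols = img.shape
    return dct_matrix(rows) @ img @ dct_matrix(cols).T

x = np.array([1.0, 2.0, 3.0, 4.0])
same = np.allclose(dct_matrix(4) @ x, dct2_naive(x))  # both paths agree on a 1-D signal
```

The matrix for a given image size can be built once and reused for every image in the dataset, which removes the per-pixel Python loops.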

Using the SVMs to separate the clusters effectively

The idea is to use an SVM to separate the clusters from each other.

What needs to be done:

Use the SVM to find the plane of separation between a class and the rest of the classes.
Once the separation is done, plot all the points on a TSNE plot.
Then create the adversarial examples.
Replot the samples on the TSNE plot.
See where each adversarial example lies with respect to the true image cluster.

New process

Follow these steps in order. They cover both the baseline and the cluster approach.
Step 1: For both: Train on clean images (not separately); use the script and save the weights.
Step 2: For adversarial examples: Create them using the model trained on clean images.
Step 3: For baseline: Train on the adversarial images created in step 2 (training set only, a single epsilon).
Step 4: For baseline: Test on adversarial images created as in step 2 (test samples created with the same epsilon).
Step 5: For baseline: Test on adversarial images created as in step 2 (test samples created with a different epsilon).
Step 6: For cluster: Create DCT examples from the step 2 images and from the clean set.
Step 7: For cluster: Perform clustering on the DCT examples.
Step 8: For cluster: Test the clustering on DCT examples of the adversarial test set (same epsilon).
Step 9: For cluster: Test the clustering on DCT examples of the adversarial test set (different epsilon).

Afterwards: run TSNE and plot the results.
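To make the role of epsilon in step 2 concrete, here is a minimal sketch of a fast-gradient-sign (FGSM) style attack. Two things here are assumptions and not taken from the repo: the attack type (the notes only mention cleverhans and an epsilon), and the stand-in model, a logistic regression with hypothetical weights `w` and bias `b` in place of the small VGG network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """FGSM against a logistic-regression model p = sigmoid(x @ w + b).

    x: (n, d) inputs in [0, 1]; y: (n,) labels in {0, 1}.
    """
    p = sigmoid(x @ w + b)
    # Gradient of the binary cross-entropy loss with respect to the input.
    grad_x = (p - y)[:, None] * w[None, :]
    # Step by epsilon in the direction that increases the loss,
    # then clip back to the valid pixel range.
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

w = np.array([1.0, -1.0])  # hypothetical weights of the "clean" trained model
x = np.array([[0.5, 0.5]])
y = np.array([1.0])

# One adversarial set per epsilon: one value for steps 3, 4 and 8,
# and a different value for steps 5 and 9.
adv_sets = {eps: fgsm(x, y, w, 0.0, eps) for eps in (0.1, 0.2)}
```

Each pixel moves by at most epsilon, so a larger epsilon gives a stronger but more visible perturbation.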

Exploration of k-means

We need to implement a Median clustering algorithm. For now,

Run the k-means clustering program (clustering.py)
Find out what the clusters look like (plot them and see how they look)

I will add more issues as we proceed with this.
