Coder Social home page Coder Social logo

heewon-hailey / cyberattack-analysis Goto Github PK

View Code? Open in Web Editor NEW
2.0 3.0 0.0 91 KB

implement of ML-based anomaly detection models to identify cyberattacks from NetFlow data

Jupyter Notebook 100.00%
python numpy pandas scikit-learn pca seaborn isolationforest cyberattack netflow anomaly-detection

cyberattack-analysis's Introduction

Cyberattack Detecting Models

About the project

It aims to implement two unsupervised machine learning models, Cluster-based Local Outlier Factors and Isolation Forest, to identify malicious attacks.

Dataset

A collection of traffic flow (NetFlow) data is used. The dataset is collected from from 01/12/12 03:24:33 to 10/12/12 23:24:32 and contains botnet traffic and normal traffic. Training and test dataset for unsupervised models consist of 14 features; timestamp, duration, protocol, source IP, source port, direction, destination IP, destination port, state, source type service, destination type service, total packets, transmitted bytes from both sides and transmitted bytes from source. In addition to the 14 features, the other datasets include true labels.

Training Dataset Valid Dataset Test Dataset
13,882,035 940,062 1,053,845

Identified Attack Scenarios

Several abnormal traffic patterns were observed in the test dataset from Splunk. Firstly, each traffic of UDP and ICMP protocol take up 21% and 39% in the total traffic flow. In general, UDP volume contributes less than 9% while less than 1% of ICMP volume occurs [1]. Second, port 53 was used the most through UDP and its traffic significantly increased for a short time. Besides, the traffic volume via ICMP protocol soared regularly with a similar pattern over time. Lastly, there was frequent volume change via port 25 which is the most often used in TCP protocol. According to the observed pattern, the test data could include port scan attacks, ICMP flood attack in Distributed Denial of Service (DDoS) and spamming. Therefore, I focuse on these attacks to implement machine learning models and detect them.

Feature Generation Methods

No. Method Applied Techniques (in order)
1 A A1. PCA (optimal dimension = 3)
A2. Embedded Selection (Information Entropy)
2 A+B B. Aggregation by time-based clustering
A1. PCA (optimal dimension = 5)
A2. Embedded Selection(Information Entropy)
3 A+B+C B. Aggregation of features by time-based clustering
C. Pearson's correlation coefficient-based filtering
A1. PCA (optimal dimension = 4)
A2. Embedded Selection (Information Entropy)

*Standard Scaler and Normalizer are applied

Detecting Algorithms

Two well-known anomaly detection algorithms, Cluster-based Local Outlier Factor (CBLOF) and Isolation Forest (IF), are considered for classifiers.

No. Model Tuned Params
1 CBLOF n_clusrers = 8
contamination = 0.1
2 IF n_estimators = 100
max_samples = 0.25
contamination = 0.1

Total 6 combinated models (3 Feature Generation Methods * 2 Detecting Algorithms) are examined to compare detected attacks.

Version

Pandas 1.3.1
Numpy 1.19.5
Matplotlib 3.4.2

References

[1] D. Lee, B. E. Carpenter and N. Brownlee. (2010). Observations of UDP to TCP Ratio and Port Numbers. 2010 Fifth International Conference on Internet Monitoring and Protection, Barcelona, pp. 99-104, doi: 10.1109/ICIMP.2010.20.

cyberattack-analysis's People

Contributors

heewon-hailey avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.