
catch-the-phisher

PhishingClassifier

Business Problem:

 Phishing is the fraudulent attempt to obtain sensitive data, such as usernames and passwords, by disguising a fake website
 as a trustworthy one. Identifying phishing websites therefore improves internet security.

Target Variable: Result

If Result = 1, the website is legitimate (not phishing).
If Result = -1, the website is a phishing website.

The data has 30 predictor attributes and one response variable with binary classes. None of the 2456 records has a missing value in any column. 1362 records are trustworthy websites (majority class) and 1094 records are phishing websites (minority class).

[Image: Resultcount]
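The record count, missing-value count and class balance above can be rechecked with a short script. This is a minimal sketch using only the standard library. It assumes the dataset is a CSV file with a header row and a Result column, and it treats an empty or whitespace-only cell as missing; the function name summarize is illustrative, not part of the project.

```python
import csv
from collections import Counter


def summarize(lines, target="Result"):
    """Return (record count, missing-cell count, per-class counts) for CSV text.

    `lines` can be an open file or any iterable of CSV lines with a header row.
    An empty or whitespace-only cell is counted as missing.
    """
    reader = csv.DictReader(lines)
    n_records = 0
    n_missing = 0
    class_counts = Counter()
    for row in reader:
        n_records += 1
        # Only inspect the declared columns; short rows yield None for absent cells.
        n_missing += sum(
            1 for name in reader.fieldnames
            if row[name] is None or not row[name].strip()
        )
        # Result is stored as text in the CSV, e.g. "1" or "-1".
        class_counts[int(row[target])] += 1
    return n_records, n_missing, class_counts
```

For example, with a hypothetical file phishing.csv opened via open("phishing.csv", newline=""), summarize should report 2456 records, 0 missing cells, 1362 records with Result = 1 and 1094 with Result = -1 if the file matches the description above.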

The predictors are stored as int64, but they are nominal (categorical codes such as -1, 0 and 1). Therefore quantitative analyses such as correlation have little or no significance.

Exploratory Data Analysis

Since the predictors are categorical, they are plotted as bivariate count plots in which the class label determines the color of each bar.

[Figures: features1, features2 (count plots of each predictor, colored by Result)]

The plots show that features such as SSLfinal_State, web_traffic, Domain_registeration_length and URL_of_Anchor differ most between the two classes, and almost every feature shows at least some difference between them. The model should capture these differences.
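Each bar in these count plots is a cell of a feature-by-class frequency table, and the share of phishing records for each feature value is one simple way to quantify the class differences the plots show. This is a minimal standard-library sketch. It assumes the rows are dicts mapping column names to integer codes; the function names are illustrative. With pandas, pd.crosstab(df[feature], df["Result"]) builds the same table.

```python
from collections import Counter, defaultdict


def class_frequency_table(rows, feature, target="Result"):
    """Count how often each value of `feature` occurs within each target class.

    Returns {feature_value: {class_label: count}}, i.e. the bar heights of a
    bivariate count plot.
    """
    table = defaultdict(Counter)
    for row in rows:
        table[row[feature]][row[target]] += 1
    return {value: dict(counts) for value, counts in table.items()}


def phishing_share(table):
    """Fraction of phishing records (Result == -1) for each feature value."""
    shares = {}
    for value, counts in table.items():
        total = sum(counts.values())
        shares[value] = counts.get(-1, 0) / total
    return shares
```

A feature whose phishing share differs sharply between its values (for instance close to 0 for one value and close to 1 for another) separates the classes well, while equal shares across values mean the feature carries little class information.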

The chi-squared test of independence is used to check whether each categorical feature is related to Result. The null hypothesis is H0: the feature is independent of Result, and the significance level is set to 0.05. If the test's p-value is greater than 0.05, we fail to reject H0, meaning the test finds no statistically significant relationship between the feature and Result. For the features 'Statistical_report', 'Iframe', 'popUpWidnow', 'on_mouseover', 'Submitting_to_email', 'HTTPS_token', 'port', 'Favicon' and 'having_At_Symbol' we fail to reject H0, so they are dropped.
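The test and the drop rule can be sketched in plain Python. This is a minimal version under stated assumptions: rows are dicts of integer codes, the statistic is Pearson's chi-squared without Yates' continuity correction, and the p-value uses the closed-form chi-squared survival function for integer degrees of freedom. The README does not say which library was used. scipy.stats.chi2_contingency is a common choice, and it applies Yates' correction by default when there is 1 degree of freedom, so its p-values can differ slightly from this sketch. All function names here are illustrative.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_independence(rows, feature, target="Result"):
    """Pearson chi-squared test of independence; returns (statistic, dof, p_value)."""
    observed = Counter((row[feature], row[target]) for row in rows)
    n = sum(observed.values())
    row_totals, col_totals = Counter(), Counter()
    for (value, label), count in observed.items():
        row_totals[value] += count
        col_totals[label] += count
    stat = 0.0
    for value in row_totals:
        for label in col_totals:
            expected = row_totals[value] * col_totals[label] / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    if dof == 0:
        return stat, dof, 1.0  # a constant column carries no information
    return stat, dof, chi2_sf(stat, dof)


def select_features(rows, features, target="Result", alpha=0.05):
    """Keep features with p <= alpha; drop those where we fail to reject H0."""
    kept, dropped = [], []
    for feature in features:
        _, _, p = chi2_independence(rows, feature, target)
        (kept if p <= alpha else dropped).append(feature)
    return kept, dropped
```

Since Result has two classes, each test here has (number of feature values - 1) degrees of freedom, so only df = 1 or df = 2 arises for features coded with two or three values.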

Finally, the Random Forest and XGBoost algorithms performed well and are used for prediction.

Accuracies

Random Forest shows an accuracy of 96%.

XGBoost shows an accuracy of 96.38%.
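The README does not say how these accuracies were measured. Below is a minimal scikit-learn sketch of one common setup. The stratified 80/20 hold-out split, 100 trees, the fixed seed and the function name evaluate_random_forest are all illustrative assumptions, not the author's configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_random_forest(X, y, test_size=0.2, seed=42):
    """Fit a random forest on a stratified train split and return (model, test accuracy)."""
    # Stratifying keeps the legitimate/phishing ratio the same in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    # Accuracy = fraction of held-out records whose predicted class matches Result.
    return model, accuracy_score(y_test, model.predict(X_test))
```

An XGBoost model would follow the same pattern with xgboost's XGBClassifier. Recent versions of XGBClassifier expect class labels 0 and 1, so Result = -1 would need to be remapped (for example to 0) before fitting.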

Flask API testing

When a user sends input data in JSON format to http://127.0.0.1:5000/predict_api from an API testing tool such as Postman or Thunder Client (a VS Code extension), the API converts the prediction to JSON with jsonify and prints it in the console.
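The endpoint can also be called from Python instead of Postman. This is a minimal standard-library sketch. It assumes the endpoint accepts a POST request with a JSON body and replies with JSON; the README names the URL but not the HTTP method or the response format. The function names are illustrative.

```python
import json
import urllib.request

# Endpoint from the README; POST with a JSON body is an assumption.
API_URL = "http://127.0.0.1:5000/predict_api"


def build_request(features, url=API_URL):
    """Build a POST request whose body is the feature dict encoded as JSON."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict_api(features, url=API_URL, timeout=10.0):
    """Send the features to the running Flask app and decode its JSON reply."""
    with urllib.request.urlopen(build_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The Content-Type header matters because Flask's request.get_json() only parses bodies sent as application/json by default. With the app running locally, call_predict_api(features) returns whatever JSON the endpoint produces.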

Example input data (JSON):

{
    "having_IP_Address": -1,
    "URL_Length": 1,
    "Shortining_Service": 1,
    "double_slash_redirecting": -1,
    "Prefix_Suffix": -1,
    "having_Sub_Domain": -1,
    "SSLfinal_State": -1,
    "Domain_registeration_length": -1,
    "Request_URL": 1,
    "URL_of_Anchor": -1,
    "Links_in_tags": 1,
    "SFH": -1,
    "Abnormal_URL": -1,
    "Redirect": 0,
    "RightClick": 1,
    "age_of_domain": -1,
    "DNSRecord": -1,
    "web_traffic": -1,
    "Page_Rank": -1,
    "Google_Index": 1,
    "Links_pointing_to_page": 1
}
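The example carries the 21 features left after the chi-squared step. Before prediction, the server has to turn such a payload into a vector in the column order the model was trained on. This is a minimal validation sketch under two assumptions the README does not confirm: the training column order matches the order of the keys in the example, and every feature takes one of the codes -1, 0 or 1. FEATURE_ORDER and to_feature_vector are hypothetical names.

```python
# Hypothetical: assumes the model was trained with columns in this order.
FEATURE_ORDER = [
    "having_IP_Address", "URL_Length", "Shortining_Service",
    "double_slash_redirecting", "Prefix_Suffix", "having_Sub_Domain",
    "SSLfinal_State", "Domain_registeration_length", "Request_URL",
    "URL_of_Anchor", "Links_in_tags", "SFH", "Abnormal_URL", "Redirect",
    "RightClick", "age_of_domain", "DNSRecord", "web_traffic", "Page_Rank",
    "Google_Index", "Links_pointing_to_page",
]
# Assumed set of valid codes for every feature.
ALLOWED_VALUES = {-1, 0, 1}


def to_feature_vector(payload):
    """Validate a JSON payload (as a dict) and return its values in FEATURE_ORDER."""
    missing = [name for name in FEATURE_ORDER if name not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")
    unexpected = sorted(set(payload) - set(FEATURE_ORDER))
    if unexpected:
        raise ValueError(f"unexpected features: {unexpected}")
    vector = []
    for name in FEATURE_ORDER:
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value not in ALLOWED_VALUES:
            raise ValueError(f"{name} must be -1, 0 or 1, got {value!r}")
        vector.append(value)
    return vector
```

Rejecting unknown or missing keys up front gives the caller a clear error instead of a silently misaligned feature vector.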

Output Screen

[Image: Resultcount]

Contributors

sarikamohan08