mdtanvirhossaintusher / credit-card-fraud-detection

A model for binary classification of credit card data as fraudulent or legitimate

License: MIT License

Python 0.04% Jupyter Notebook 99.96%
binaryclassification decision-tree-classifier gradient-boosting machine-learning random-forest-classifier

Credit-Card-Fraud-Detection

Overview

A binary classification model that classifies credit card transactions as fraudulent or legitimate.

Data Collection

The data is already available here. The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, of which 492 out of 284,807 are fraudulent. The dataset is therefore highly imbalanced.
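The degree of imbalance can be checked directly from the two counts quoted above. A minimal sketch using only those numbers:

```python
# Class counts quoted for the raw dataset (before duplicate removal).
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492

legit = TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS
fraud_share = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS  # a fraction in [0, 1], not a percentage

print(f"legitimate transactions: {legit}")
print(f"fraud share: {fraud_share:.4%}")  # the % format spec multiplies by 100
print(f"legit-to-fraud ratio: {legit / FRAUD_TRANSACTIONS:.0f} : 1")
```

This gives a fraud share of about 0.17%, roughly one fraudulent transaction for every 578 legitimate ones.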

Data Preprocessing

The raw dataset initially contains 1,081 duplicate rows. After removing them, 283,726 observations remain, of which the positive (fraudulent) class makes up only 0.1667% and the negative (legitimate) class 99.8333%. To reduce the class imbalance, both undersampling and oversampling were tried: oversampling performs well, whereas undersampling performs poorly.
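The README does not say which resampling method the notebook uses. The sketch below assumes plain random resampling (duplicating minority-class rows for oversampling, discarding majority-class rows for undersampling), written with the standard library on rows represented as tuples whose last element is the label (1 = fraudulent, 0 = legitimate). The toy rows are hypothetical; libraries such as imbalanced-learn offer equivalent `RandomOverSampler` and `RandomUnderSampler` classes, as well as SMOTE.

```python
import random
from collections import Counter


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence (order preserved)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def class_percentages(rows):
    """Return {label: percentage of rows}; the label is the last field of each row."""
    counts = Counter(row[-1] for row in rows)
    return {label: 100 * n / len(rows) for label, n in counts.items()}


def group_by_label(rows):
    groups = {}
    for row in rows:
        groups.setdefault(row[-1], []).append(row)
    return groups


def random_oversample(rows, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    groups = group_by_label(rows)
    target = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(g)
        # Sample with replacement to make up each class's shortfall.
        balanced.extend(rng.choices(g, k=target - len(g)))
    rng.shuffle(balanced)
    return balanced


def random_undersample(rows, seed=0):
    """Randomly drop rows of larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    groups = group_by_label(rows)
    target = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, target))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy rows: (feature_1, feature_2, label); the first row is duplicated.
rows = [(0.1, 5.0, 0), (0.1, 5.0, 0), (0.3, 2.0, 0), (0.7, 1.0, 0), (0.2, 4.0, 0), (0.9, 9.0, 1)]
unique = drop_duplicates(rows)
print(class_percentages(unique))
print(Counter(r[-1] for r in random_oversample(unique)))
print(Counter(r[-1] for r in random_undersample(unique)))
```

Undersampling discards most legitimate rows, which is one plausible reason it performs poorly on a dataset this skewed. Resampling is normally applied to the training split only, so that duplicated fraud rows cannot end up in both the training and test data.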

Model Training

The dataset is used to train Decision Tree, Random Forest, and Gradient Boosting classifiers. Each model's performance, training time, and shortcomings are reported.

Result Analysis

The table below shows the per-class precision, recall, and F1-score, together with training and testing accuracy, for the three models.

Model               Precision           Recall              F1-score            Accuracy
                    Fraud   Non-fraud   Fraud   Non-fraud   Fraud   Non-fraud   Training   Testing
Decision Tree       0.53    1.00        0.53    1.00        0.53    1.00        1.00       0.9993
Random Forest       0.94    1.00        0.53    1.00        0.68    1.00        1.00       0.9997
Gradient Boosting   0.03    1.00        0.80    0.99        0.07    0.99        0.9849     0.985
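Two quick checks help when reading the table. F1 is the harmonic mean of precision and recall, so the fraud-class F1 values can be recomputed from the two columns before them. And because about 99.83% of the deduplicated rows are legitimate, a trivial model that always predicts "legitimate" would already score roughly that accuracy. The sketch assumes the test split has about the same class balance as the full deduplicated data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Fraud-class precision and recall as reported in the table.
reported = {
    "Decision Tree": (0.53, 0.53),
    "Random Forest": (0.94, 0.53),
    "Gradient Boosting": (0.03, 0.80),
}

for model, (p, r) in reported.items():
    print(f"{model:<18} fraud F1 ~ {f1_score(p, r):.2f}")

# Accuracy of a classifier that always predicts "legitimate",
# using the fraud share reported after duplicate removal.
FRAUD_SHARE = 0.001667
baseline_accuracy = 1 - FRAUD_SHARE
print(f"always-legitimate baseline accuracy ~ {baseline_accuracy:.4f}")
```

The recomputed Decision Tree and Random Forest values match the table; for Gradient Boosting the result (about 0.06) differs slightly from 0.07, likely because the inputs are themselves rounded to two decimals. The baseline of about 0.9983 sits just below the Decision Tree and Random Forest testing accuracies and above Gradient Boosting's, which is why the fraud-class precision and recall say more about these models than accuracy does.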

V14 is the most important feature for all three models: Decision Tree, Random Forest, and Gradient Boosting.

The Random Forest grid search fits 2 folds for each of 12 candidates, totalling 24 fits, while the Gradient Boosting grid search fits 2 folds for each of 9 candidates, totalling 18 fits.

The Random Forest search tried n_estimators of 25, 50, and 75 and max_depth of 10, 20, 30, and 40; it performed best with max_depth = 30 and n_estimators = 75. The Gradient Boosting search tried n_estimators of 20, 25, and 30 and max_depth of 2, 3, and 4; it performed best with max_depth = 4 and n_estimators = 30.
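The fit counts follow from the grid sizes: the number of candidates is the product of the number of values tried per hyperparameter, and every candidate is fitted once per cross-validation fold. A minimal sketch using the grids listed above; the dictionary keys mirror scikit-learn's `n_estimators` and `max_depth` parameter names.

```python
from itertools import product

# Hyperparameter grids as listed in this README.
RF_GRID = {"n_estimators": [25, 50, 75], "max_depth": [10, 20, 30, 40]}
GB_GRID = {"n_estimators": [20, 25, 30], "max_depth": [2, 3, 4]}
N_FOLDS = 2


def candidates(grid):
    """Expand a parameter grid into every combination, as an exhaustive grid search does."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


for name, grid in [("Random Forest", RF_GRID), ("Gradient Boosting", GB_GRID)]:
    combos = candidates(grid)
    print(f"{name}: {N_FOLDS} folds x {len(combos)} candidates = {N_FOLDS * len(combos)} fits")
```

In scikit-learn, passing these dictionaries as `param_grid` to `GridSearchCV` with `cv=2` performs the same enumeration, and the winning combination is exposed as `best_params_`.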

Gradient Boosting has a lower mean test score than Random Forest, but a higher mean fit time.

Relation between Estimators and Training Time

The figure shows a positive correlation between training time and the number of estimators for the Random Forest classifier.

[Figure: Positive Correlation]
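To put a number on the relationship in the figure, one could compute Pearson's correlation coefficient over (number of estimators, training time) pairs, for example the `n_estimators` values and the matching `mean_fit_time` entries from the grid search results. The helper below is a plain-Python sketch; the timing values in it are made-up placeholders, not measurements from the notebook.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder timings in seconds, NOT measured values: substitute the
# mean fit time recorded for each n_estimators setting.
n_estimators = [25, 50, 75]
fit_seconds = [10.0, 19.5, 30.2]
print(f"r = {pearson_r(n_estimators, fit_seconds):.3f}")
```

A value of r close to +1 indicates that training time grows roughly linearly with the number of estimators, which is what one would expect since a random forest fits one tree per estimator.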
