rcwylie / fraudulent_payments

This project focuses on card fraud detection within financial data. It works through data preprocessing, four model options (Logistic Regression, Random Forest, Gradient Boosting, Neural Networks), and performance assessment metrics to help determine the best model for the job. Hyperparameter tuning and visualizations are included.

Jupyter Notebook 100.00%
ai fraud-detection ml neural-network random-forest

Fraudulent_payments

Overview

The aim of the project was to build several models and select the best-performing one. The best model reached an AUC of 0.99 and an accuracy of 99.4%, which was deemed a success. The primary stages of the script are as follows:

Data Cleaning and Pre-processing: The script starts by cleaning and pre-processing the dataset to ensure data quality and consistency.

Feature Selection: Feature selection is performed using the Mutual Information method. This step aims to identify the most relevant features for the classification task.

Model Comparison: The script evaluates and compares the performance of different classification models. The following models are included:

  • Logistic Regression
  • Random Forest
  • XGBoost
  • Artificial Neural Network (ANN)

Details

Data Cleaning and Pre-processing

The initial phase involves data cleaning and pre-processing to handle missing values and outliers and to ensure data consistency. A clean dataset is an essential foundation for accurate model building.
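The cleaning step described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the `clean_frame` helper, the column names (`amount`, `channel`, `is_fraud`) and the choices of median imputation and quantile clipping are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_frame(df: pd.DataFrame, target: str,
                lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    """Return a cleaned copy: duplicates dropped, gaps filled, outliers clipped."""
    out = df.drop_duplicates().copy()

    # Numeric feature columns only; the target label is never imputed or clipped.
    numeric_cols = [c for c in out.select_dtypes(include="number").columns
                    if c != target]
    for col in numeric_cols:
        # Fill missing values with the median, which is robust to outliers.
        out[col] = out[col].fillna(out[col].median())
        # Clip extreme values to the chosen quantiles (winsorizing).
        low, high = out[col].quantile([lower_q, upper_q])
        out[col] = out[col].clip(lower=low, upper=high)

    # Give missing categorical values an explicit placeholder label.
    for col in out.select_dtypes(exclude="number").columns:
        out[col] = out[col].fillna("unknown")
    return out


# Toy data with hypothetical column names, not taken from the real dataset.
raw = pd.DataFrame({
    "amount": [10.0, 12.0, np.nan, 11.0, 5000.0, 12.0],
    "channel": ["web", "app", None, "web", "app", "app"],
    "is_fraud": [0, 0, 0, 0, 1, 0],
})
cleaned = clean_frame(raw, target="is_fraud")
print(cleaned)
```

Clipping rather than dropping outliers keeps every row, which matters when the fraud class is rare; whether that trade-off suits the real dataset is a judgment call.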

Feature Selection with Mutual Information

Mutual Information is used to assess how informative each feature is about the target variable. Features with higher mutual information are considered more relevant and are retained, while less informative features are discarded.
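As an illustration of this step, the sketch below ranks features by mutual information with scikit-learn's `mutual_info_classif` and keeps the top `k` with `SelectKBest`. The synthetic data, the value `k=3` and the `feature_i` names are assumptions for the example; the notebook may choose its cut-off differently.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic, imbalanced stand-in for the fraud data (assumption, not the real set).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Fix the random seed so the nearest-neighbour MI estimate is reproducible.
score_func = partial(mutual_info_classif, random_state=0)

# Keep the k features with the highest estimated mutual information.
selector = SelectKBest(score_func=score_func, k=3).fit(X, y)
X_selected = selector.transform(X)

# Show every feature ranked by its score, most informative first.
ranking = sorted(zip(feature_names, selector.scores_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
print("Kept columns:", selector.get_support(indices=True))
```

Mutual information scores are non-negative and only meaningful relative to each other, so a top-`k` or threshold rule is needed to decide what counts as "less informative".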

Model Comparison

The script proceeds to build and evaluate four distinct classification models, each with its own strengths and characteristics. The models are benchmarked and compared using Area Under the Curve (AUC) to determine which one performs best for this classification task.
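The comparison above can be sketched as a loop that fits each model and scores it on a held-out split. This is an illustrative outline under several assumptions: the data is synthetic, AUC is taken to mean ROC AUC, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and `MLPClassifier` stands in for the ANN. The hyperparameters are placeholders, not the tuned values from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the fraud data (assumption).
X, y = make_classification(
    n_samples=3000, n_features=12, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)
# Stratify so both splits keep the same share of the rare class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

models = {
    # Scale inputs for the models that are sensitive to feature magnitude.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Stand-in for XGBoost, used here to avoid an extra dependency.
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    # Small multilayer perceptron as a stand-in for the ANN.
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=42)),
}

auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC AUC needs a score per sample, so use the positive-class probability.
    fraud_probability = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, fraud_probability)

best_name = max(auc_by_model, key=auc_by_model.get)
for name, auc in sorted(auc_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
print("Best model by AUC:", best_name)
```

AUC is computed from predicted probabilities rather than hard labels, so it does not depend on a decision threshold and is less affected by class imbalance than plain accuracy.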

Data used: https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022/

References:

  1. Banking error rate - https://assets.teradata.com/resourceCenter/downloads/CaseStudies/CaseStudy_EB9821_Danske_Bank_Fights_Fraud.pdf

  2. Fraud Classification Principles
     • Fraud Detection Methods - https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00573-8
     • Fraud Detection Methods - https://www.kaggle.com/code/juanjosmorenogiraldo/bank-fraud-detection-using-gbm#3-%7C-Data-Preprocessing
     • Fraud Detection Methods - https://trenton3983.github.io/files/projects/2019-07-19_fraud_detection_python/2019-07-19_fraud_detection_python.html

  3. Feature Selection
     • Information Gain Method - https://jovian.com/poduguvenu/feature-selection-using-information-gain

  4. Modelling Methods
     • Random Forest - https://www.kaggle.com/code/hassanamin/credit-card-fraud-detection-using-random-forest#Using-Scikit-learn-to-split-data-into-training-and-testing-sets
     • XGBoost and feature selection - https://domino.ai/blog/credit-card-fraud-detection-using-xgboost-smote-and-threshold-moving
     • ANN Node Optimization - https://www.analyticsvidhya.com/blog/2021/09/a-comprehensive-guide-on-neural-networks-performance-optimization/
