Coder Social home page Coder Social logo

mmsaki / suspicious_transactions Goto Github PK

View Code? Open in Web Editor NEW
1.0 2.0 0.0 6.27 MB

Applying SQL skills to analyze historical credit card transactions and consumption patterns in order to identify possible fraudulent transactions.

Jupyter Notebook 100.00%
postgresql python sql plotly query entity-relationship-diagram transaction-analysis transaction-data suspicious-transactions

suspicious_transactions's Introduction

Suspicious Transactions

Table of Contents

  1. Data Modeling
  2. Part 1: SQL
  3. Part 2: Visual Analysis
  4. Results

Data Modeling

  • Create an entity relationship diagram (ERD) by inspecting the provided CSV files.

    Data Model

Part 1: SQL

Some fraudsters hack a credit card by making several small transactions (generally less than $2.00), which are typically ignored by cardholders.

  • How can you isolate (or group) the transactions of each cardholder?

    -- Part1 : Grouping transactions of each card returning card numbers only
    SELECT card
    FROM transaction 
    GROUP BY card;
    
    -- Part1 : Grouping by card and joining cardholder name, cardholder id and their total transaction amounts
    SELECT card_holder.name, credit_card.cardholder_id, transaction.card, sum(amount) as total_spent
    FROM transaction 
    LEFT JOIN credit_card
    on transaction.card = credit_card.card
    LEFT JOIN card_holder
    on credit_card.cardholder_id = card_holder.id
    GROUP BY transaction.card, credit_card.cardholder_id, card_holder.name;
  • Count the transactions that are less than $2.00 per cardholder.

    -- Part1 : Count transactions less than $2 per cardholder
    SELECT card, COUNT(amount) as tx_less_than_2
    FROM transaction
    WHERE amount < 2
    GROUP BY card
    ORDER BY tx_less_than_2 DESC;
    • Output: Small transactions count of less than $2

    • Question: Is there any evidence to suggest that a credit card has been hacked? Explain your rationale.

      • Answer:
        • Yes, Possily. With the assumption of the small transaction hack, cards with the highest count of transations less than $2 have possibly been hacked.
    • Take your investigation a step futher by considering the time period in which potentially fraudulent transactions are made.

      -- Part1 : Count transactions less than $2 per cardholder group by date
        SELECT date, card, COUNT(amount) as tx_less_than_2
        FROM transaction
        WHERE amount < 2
        GROUP BY card, date
        ORDER BY tx_less_than_2 DESC;
  • What are the top 100 highest transactions made between 7:00 am and 9:00 am?

    -- Part1 : Top 100 highest transactions made between 7:00am and 9:00am
    SELECT date, amount
    FROM transaction
    WHERE EXTRACT(HOUR FROM date) BETWEEN '07' AND '08'
    ORDER BY amount DESC
    FETCH FIRST 100 ROWS ONLY;
  • Question: Do you see any anomalous transactions that could be fraudulent

    • Answer:
      • No
  • Question: Is there a higher number of fraudulent transactions made during this time frame versus the rest of the day?

    • Answer:
      • No, the data does not show any correlation to higher number of fradulent transactions.
  • If you answered yes to the previous question, explain why you think there might be fraudulent transactions during this time frame.

  • What are the top 5 merchants prone to being hacked using small transactions?

    -- Part1 : Top 5 merchants prone to being hacked using small transactions
    SELECT id_merchant, count(amount) as number_of_small_tx
    FROM transaction
    WHERE amount < 2
    GROUP BY id_merchant
    ORDER BY count(amount) DESC
    LIMIT 5;

Part 2: Visual Analysis

The two most important customers of the firm may have been hacked. Verify if there are any fraudulent transactions in their history. For privacy reasons, you only know that their cardholder IDs are 2 and 18.

  • Using hvPlot, create a line plot representing the time series of transactions over the course of the year for each cardholder separately.

    • Cardholder 2 Cardholder 2 plot

    • Cardholder 18 Cardholder 18 plot

  • Next, to better compare their patterns, create a single line plot that contains both card holders' trend data.

    Combined cardholder plots

  • Question: What difference do you observe between the consumption patterns? Does the difference suggest a fraudulent transaction? Explain your rationale.

    • Answer:
      • Cardholder 2 has a small dollar amount spending habit while Cardholder 18 has a larger dollar amount spending habit. The difference in spending amounts can indicate fradulent activity on the card if the spending amount is significantly different from their normal spending. For Example, Cardholder 18 has transactions of less than $2, different from their usual larger dollar amount spendidng habit. This could suggest fradulent activity, however this data alone is inconclusive.
  • Using Plotly Express, create a box plot, representing the expenditure data from January 2018 to June 2018 for cardholder ID 25.

    Cardholder 25 January to June bar plot

  • Question: Are there any outliers for cardholder ID 25? How many outliers are there per month?

    • Answer:
      • Yes. There are multiple putliers at least once a month
  • Question: Do you notice any anomalies? Describe your observations and conclusions.

    • Answer:
      • Yes, there is anomalies. Cardholder 25 has extremely large dollar amount transactions and extremely low dollar amount transactions.

Results

suspicious_transactions's People

Contributors

mmsaki avatar

Stargazers

 avatar

Watchers

 avatar  avatar

suspicious_transactions's Issues

Fix link

Output link has been miss matched for transactions grouped by date.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.