Customer Purchase Behavior Prediction

Project Development Journal

`Problem Statement`

We will build a customer purchase behavior prediction system on the dataset described below. The work covers data preprocessing, feature engineering, and selecting, training, and evaluating a suitable ensemble model.

`Dataset Description`

For this project I chose the TATA: Online Retail Dataset. The dataset ships the same data in 2 formats, csv and xlsx; I proceed with the .csv file. It contains 541909 rows and 8 columns. The description and purpose of each column are as follows: -

InvoiceNo: Unique ID of the order invoice.
StockCode: Inventory code of the product.
Description: Title of the product.
Quantity: Number of units of the product ordered.
InvoiceDate: Date of the order.
UnitPrice: Price of a single unit of the product.
CustomerID: Unique ID of the ordering customer.
Country: Country where the customer resides.

`Assumption`

Since we have to predict customer purchase behavior from the given features, we first need to derive the relevant customer characteristics from the dataset.

`Data Pre-processing`

Missing Value Handling: - The 'CustomerID' column contained NaN values. Because 'CustomerID' is the unique identifier of each customer, a row with NaN in this column cannot be attributed to any customer. Those rows were therefore deleted.
Column Dtype Conversion: - 'CustomerID' was stored as float64. Since it is a unique identifier, I converted it to int.
Feature Engineering: - The dataset has 4372 unique customer IDs, and we need per-customer characteristics. Only a few columns, such as CustomerID and Country, relate directly to customers; the others, such as StockCode and Description, describe products rather than customers. So we derived new customer-level features:
- Revenue_given: Each customer bought different quantities of products at their unit prices. 'Revenue_given' tells how much the customer has spent on orders.
- Frequency: How frequently the customer bought products from the company.
- Recency: How many days have passed since the customer's last purchase.
- United Kingdom Or Not: Most customers are from the UK, and every other country has far fewer customers. So we derived a feature indicating whether the customer is from the UK.
After creating these customer-level features, we built a new dataframe from them and dropped the remaining columns.
Handling Outliers: - Outliers were present in the numeric columns and were handled.
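The preprocessing and feature engineering steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the project's actual code: the function name `build_customer_features`, the helper column `LineTotal`, counting Frequency as the number of distinct invoices, and measuring Recency from the latest date in the data are all assumptions. The outlier step is left out because the method used is not stated here.

```python
import pandas as pd

def build_customer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse invoice-level rows into one row per customer."""
    # Drop rows that cannot be attributed to a customer, then fix the dtype.
    df = df.dropna(subset=["CustomerID"]).copy()
    df["CustomerID"] = df["CustomerID"].astype(int)
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Hypothetical helper column: money spent on one invoice line.
    df["LineTotal"] = df["Quantity"] * df["UnitPrice"]

    # Assumption: recency is measured from the latest date in the data.
    reference_date = df["InvoiceDate"].max()

    features = df.groupby("CustomerID").agg(
        Revenue_given=("LineTotal", "sum"),
        Frequency=("InvoiceNo", "nunique"),  # assumption: distinct invoices
        LastPurchase=("InvoiceDate", "max"),
        Country=("Country", "first"),
    )
    features["Recency"] = (reference_date - features["LastPurchase"]).dt.days
    features["United Kingdom Or Not"] = (
        features["Country"] == "United Kingdom"
    ).astype(int)
    return features.drop(columns=["LastPurchase", "Country"]).reset_index()

# Tiny made-up sample, only to show the shape of the result.
demo = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1"],
    "Quantity": [2, 1, 3, 5, 1],
    "InvoiceDate": ["2011-12-01", "2011-12-01", "2011-12-09",
                    "2011-11-29", "2011-12-05"],
    "UnitPrice": [2.5, 4.0, 1.0, 2.0, 9.9],
    "CustomerID": [17850.0, 17850.0, 17850.0, 12583.0, None],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "France", "United Kingdom"],
})
customer_features = build_customer_features(demo)
print(customer_features)
```

The result has one row per customer and 5 columns (CustomerID plus the four derived features), which matches the column count of final_dataset.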

FileName	Extension	Rows	Columns
Online_Retail_Data_Set	csv	541909	8
final_dataset	csv	3616	5

`EDA & Data Visualization`

At first I tried plotting the characteristics extracted for each customer. But the relationships between those variables for individual customers do not reveal anything that distinguishes one customer, or one group of customers, from the others.

We plotted the pairwise relationships between the variables using pairplots. Even when the datapoints are distinguished by UK residency, they look crowded. By plain visual inspection we cannot draw a boundary between them to cluster the data, and we see no noticeable relationship, such as a linear or exponential one, between the variables. So we have to extract features using feature extraction techniques.

`Dimensionality Reduction`

I used PCA to project the features into a lower dimension and extracted 2 features. Labeling these features X1 and X2, I plotted the points. Unfortunately, plain visual inspection still could not find any differences that separate the datapoints.
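For illustration, a 2-component PCA can be sketched with NumPy alone. This is only a sketch: the project may well have used a library implementation, and standardizing the features before the projection is an assumption. The function name `pca_2d` and the random stand-in data are hypothetical.

```python
import numpy as np

def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project standardized features onto their first two principal components."""
    # Assumption: standardize first so no single feature's scale dominates.
    # (A constant column would divide by zero; the demo data has none.)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_std, full_matrices=False)
    return X_std @ Vt[:2].T

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))  # stand-in for the 4 customer features
X_pca = pca_2d(X_demo)              # columns play the role of X1 and X2
print(X_pca.shape)
```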

`Model Selection`

As the dataset is unlabeled, this is an unsupervised problem. So we chose clustering, grouping customers with similar behavior into the same segment.

`Dataset Splitting`

We could split the dataset into train and test sets. But because the data is unlabeled, we could not verify clustering accuracy on a test set: we do not know the ground-truth label, or which cluster each sample belongs to. That is why we use the whole dataset as the training set and evaluate the model with unsupervised performance metrics.

`Ensemble Model Implementation`

We chose to implement an ensemble algorithm from scratch using 2 individual clustering algorithms. The final cluster of each sample is then decided by maximum (majority) voting.

K-Means
K-Medoids

The stopping criterion for both algorithms is convergence: each algorithm inside the ensemble runs until its centroids (K-Means) or medoids (K-Medoids) are found and stop changing.
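One way such an ensemble could look is sketched below. It is not the project's implementation. The text does not say how the cluster ids of the two algorithms are matched or how a disagreement between only two voters is resolved, so both parts are assumptions: ids are aligned by pairing each medoid with its closest centroid, and a tie goes to the nearest averaged center. The `max_iter` cap is an added safeguard, and all function names are hypothetical.

```python
from itertools import permutations
import numpy as np

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, rng, max_iter=300):
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):  # safeguard; normally exits at convergence
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def kmedoids(X, k, rng, max_iter=300):
    medoids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):  # safeguard; normally exits at convergence
        labels = assign(X, medoids)
        new_medoids = medoids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            # Medoid: the member with the smallest total distance to the rest.
            pairwise = np.linalg.norm(
                members[:, None, :] - members[None, :, :], axis=2)
            new_medoids[j] = members[pairwise.sum(axis=1).argmin()]
        if np.allclose(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids

def ensemble_cluster(X, k, seed=0):
    rng = np.random.default_rng(seed)
    km_labels, centroids = kmeans(X, k, rng)
    kd_labels, medoids = kmedoids(X, k, rng)
    # Cluster ids are arbitrary, so map each k-medoids cluster to a k-means
    # cluster by minimizing total center distance (brute force, small k only).
    best = min(
        permutations(range(k)),
        key=lambda p: sum(
            np.linalg.norm(medoids[j] - centroids[p[j]]) for j in range(k)),
    )
    kd_aligned = np.array(best)[kd_labels]
    # Assumed tie-break: two voters tie whenever they disagree, so such a
    # point goes to the nearest averaged (centroid + medoid) / 2 center.
    inverse = np.argsort(best)  # k-means id -> k-medoids id
    merged = (centroids + medoids[inverse]) / 2
    return np.where(km_labels == kd_aligned, km_labels, assign(X, merged))

rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(6, 0.5, (50, 2))])
labels = ensemble_cluster(X_demo, k=2)
print(np.bincount(labels))
```

With only two voters, the vote matters solely where the algorithms disagree, so the tie-break rule effectively decides every contested point. An odd number of base clusterers would make plain majority voting well defined.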

`Experimentation and Evaluation`

We experiment with our own implementation of the model on the dataset, clustering it into different numbers of groups. To judge which number of groups gives good clustering, we use the "Silhouette Coefficient" as the evaluation metric. A value close to +1 means well-separated clusters, a value close to 0 means overlapping clusters with no clear separation, and a value close to -1 means samples have likely been assigned to the wrong cluster.
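As a sketch of how the metric works: for each sample, let a be the mean distance to the other members of its own cluster and b the smallest mean distance to the members of any other cluster; the sample's silhouette is (b - a) / max(a, b), and the coefficient is the mean over all samples. The NumPy version below is an illustrative implementation with a hypothetical function name and made-up data, not the code used for the results in this journal.

```python
import numpy as np

def silhouette_coefficient(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette over all samples (needs at least 2 clusters)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    cluster_ids = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # by convention a singleton cluster scores 0
        # a: mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0, so it drops out).
        a = dists[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == c].mean()
                for c in cluster_ids if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(6, 0.5, (30, 2))])
good_labels = np.repeat([0, 1], 30)    # one label per blob
shuffled_labels = np.tile([0, 1], 30)  # deliberately mixes the two blobs
score_good = silhouette_coefficient(X_demo, good_labels)
score_bad = silhouette_coefficient(X_demo, shuffled_labels)
print(score_good, score_bad)
```

The labeling that follows the two blobs should score close to +1, while the mixed labeling should land near 0.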

Number of Clusters	Silhouette Coefficient
2	0.67909
3	0.54980
4	0.48623

As we increase the number of clusters, the silhouette coefficient moves further from +1. So we proceed with 2 clusters, which gives the highest silhouette score.

`Post Clustering Analysis`

Now, in the scatter plot of the features after applying PCA, the datapoints clearly fall into different regions according to their respective clusters.

After clustering, we can see a clear separation between the datapoints even in the original features, before extraction. The "Revenue_given vs Frequency" and "Revenue_given vs Recency" plots both show a clear separation. In contrast, the "Frequency vs Recency" scatter plot does not separate the clusters. The categorical variable "United Kingdom Or Not" is present in both clusters and, to me, shows no significant distinguishing characteristic.

`Observation`

Customers who share the same type of purchasing behavior fall within the same cluster.
The customers are mainly differentiated by the revenue they contribute to the company.
The other features, frequency and recency, are scattered across all clusters.
We think our newly extracted feature, whether the customer is from the UK or not, does not make any noticeable impact.
The dimensionality reduction technique, PCA, combined the existing features into new ones that contributed well to model training.

`Model Deployment`

I deployed the clustering model on Hugging Face. There you can input the required features and get the cluster for a customer. Check out the deployment here.

neloy-barman / customer-purchase-behavior-prediction-