- Class: ECOMMERCE - IS334.N21.TMCL
- Lecturer: Mr. Do Duy Thanh
- This is my E-commerce course project, built only to research and learn about Machine Learning and to understand how Machine Learning applies to E-commerce.
In today's highly competitive business landscape, retaining customers is crucial for the success of an organization. This project builds a customer churn prediction model using several machine learning algorithms, including Logistic Regression, Random Forest, and Decision Tree.
Content: Each row represents a customer, and each column contains a customer attribute, as described in the column metadata.
The data set includes information about:
- Customers who left within the last month – the column is called Churn
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers – gender, age range, and if they have partners and dependents
Source: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
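Before a model can use the columns listed above, the text columns have to be turned into numbers. The following is a minimal pandas sketch, not the notebook's actual pipeline. It assumes a few column names from the Kaggle CSV (`customerID`, `tenure`, `Contract`, `MonthlyCharges`, `TotalCharges`, `Churn`) and uses a tiny hand-made frame instead of the real file. The `preprocess` helper is hypothetical. It also assumes that `TotalCharges` may be read as text with blank entries, which is worth checking against the real file.

```python
import pandas as pd

# Tiny hand-made frame; column names assumed to match the Kaggle CSV.
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "tenure": [1, 34, 0],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "MonthlyCharges": [29.85, 56.95, 20.0],
    "TotalCharges": ["29.85", "1889.5", " "],  # blank string for a brand-new customer
    "Churn": ["No", "No", "Yes"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn raw churn rows into numeric features."""
    out = df.drop(columns=["customerID"])  # an ID carries no predictive signal
    # Non-numeric text (e.g. " ") becomes NaN; assume those customers have been billed 0.0 so far.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce").fillna(0.0)
    out["Churn"] = (out["Churn"] == "Yes").astype(int)  # target: 1 = churned, 0 = stayed
    # One-hot encode the categorical column (only Contract in this small example).
    return pd.get_dummies(out, columns=["Contract"], dtype=int)


features = preprocess(raw)
print(features)
```

In the real dataset, the other Yes/No and multi-valued service columns would be encoded the same way. Deciding which of them are worth keeping is the feature engineering question discussed below.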
- Imbalanced data: In practice, the data we collect is rarely balanced between classes (stayed/churned/...). This is expected: if as many customers churned as stayed, the organization would be on the verge of bankruptcy. Imbalance is a serious problem because a model can look accurate while completely ignoring the minority ('churn') class.
- E.g.: Suppose we have 100 samples: 99 of class 'non churn' and 1 of class 'churn'. A model that predicts 'non churn' for every sample is 99% accurate, which looks nearly perfect, but in reality it is useless because it never identifies the customer who churns.
- There are many ways to address this problem. The simplest is to change the evaluation metric: instead of 'accuracy', use metrics that focus on the minority class, such as precision, recall, and F1-score.
- Feature Engineering: The problem requires domain expertise in the field we are working on. We need to understand the features of the data and which of them affect the desired output (churn).
- Model selection: there are many candidate methods for this problem, such as Logistic Regression, K-Nearest Neighbors, other classification algorithms (e.g. Decision Tree, Random Forest), neural networks, and so on.
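The 100-sample example and the change of metric can be made concrete with a short scikit-learn sketch. The first part reproduces the predictor that always says 'non churn'. The second part compares the three algorithms named at the top (Logistic Regression, Decision Tree, Random Forest) by cross-validated F1 on the churn class. The sketch has some important limitations:

* It uses synthetic data from `make_classification` with an assumed 90/10 class split, not the real dataset.
* The hyperparameters are arbitrary.
* `class_weight="balanced"` is only one of the possible remedies, not necessarily what the notebook does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Part 1: 99 'non churn' (0) samples and 1 'churn' (1) sample.
y_true = [0] * 99 + [1]
y_pred = [0] * 100  # a "model" that always predicts 'non churn'

baseline_accuracy = accuracy_score(y_true, y_pred)
baseline_recall = recall_score(y_true, y_pred)  # share of churners actually caught
baseline_f1 = f1_score(y_true, y_pred, zero_division=0)
print(f"accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f} f1={baseline_f1:.2f}")
# prints: accuracy=0.99 recall=0.00 f1=0.00

# Part 2: compare candidate models by F1 instead of accuracy.
# Synthetic stand-in for churn data with roughly 10% positives (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)

# class_weight="balanced" up-weights the rare class during training (one possible remedy).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, class_weight="balanced", random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42),
}

scores = {}
for name, model in models.items():
    # scoring="f1" scores the positive (churn) class; folds are stratified for classifiers.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {scores[name]:.3f}")
```

Part 1 shows why accuracy alone is misleading: the useless predictor scores 0.99 accuracy but 0 recall and 0 F1. Part 2 ranks the models on the churn class only, so a model cannot win just by predicting the majority class.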
Details of data preprocessing, model training, and evaluation are documented in the notebook; please refer to it.
- The deep learning model notebook is an extra. Deep neural networks are complicated, and in practice simpler models are often more effective. I have not fully understood it yet and I am not sure it works correctly and effectively; it may overfit without my noticing. So I do not recommend relying on this one.
- The E-commerce course project also required a dashboard to visualize the data. The Dashboard project is here: https://github.com/HungPham2002/Dashboard_ECommerce