This project aims to predict customer churn using machine learning techniques. Customer churn is the phenomenon where customers stop using a company's products or services. By predicting which customers are likely to churn, businesses can take proactive measures to retain them, improving profitability.
The objective of this project is to build a machine learning pipeline using ZenML that predicts whether a customer will churn based on various features such as account length, international plan, voice mail plan, call details, and other customer-related metrics. The pipeline includes data ingestion, data cleaning, model training, model evaluation, and deployment. The deployed model can be used to make real-time predictions through a Streamlit web application.
This repository demonstrates how ZenML helps you build and deploy machine learning pipelines in several ways:
- By integrating with tools like MLflow for deployment, tracking and more
- By allowing you to build and deploy your machine learning pipelines easily
Let's jump into the Python packages needed. Within the Python environment of your choice, run:
pip install -r requirements.txt
Starting with version 0.20.0, ZenML comes bundled with a React-based dashboard that lets you inspect stacks, stack components, and pipeline DAGs. To access it, launch the ZenML server and dashboard locally; this requires the optional dependencies for the ZenML server:
pip install "zenml[server]"
zenml init
zenml up
Install the MLflow integration using ZenML:
zenml integration install mlflow -y
- Steps: ingest_data.py, clean_data.py, model_train.py, evaluation.py
- data_cleaning.py: DataPreprocess, DataDivision
- Building the model on the train and test datasets
- src/evaluation.py: defines MSE, RMSE, and R2 score
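The three metrics named above can be sketched in plain Python. This is a minimal illustration of the standard formulas, not the contents of the repository's evaluation.py; the function names `mse`, `rmse`, and `r2_score` are assumptions chosen for this example.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy labels (1 = churned, 0 = stayed); the values are illustrative only.
    actual = [1, 0, 1, 1]
    predicted = [1, 0, 0, 1]
    print(mse(actual, predicted), rmse(actual, predicted), r2_score(actual, predicted))
```

In practice a library such as scikit-learn provides equivalent functions; the hand-written versions here only make the definitions explicit.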
- ingest_data: This step will ingest the data and create a DataFrame.
- clean_data: This step will clean the data and remove the unwanted columns.
- train_model: This step will train the model and save the model using MLflow autologging.
- evaluation: This step will evaluate the model and save the metrics, using MLflow autologging, into the artifact store.
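To make the cleaning stage concrete, here is a minimal sketch of dropping unwanted columns and dividing the data into train and test sets. It uses plain lists of dictionaries in place of a DataFrame so that it runs without extra dependencies. The helper names `drop_columns` and `split_rows`, the column names, and the 80/20 split are assumptions for illustration, not the repository's actual code.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def drop_columns(rows: List[Row], unwanted: Sequence[str]) -> List[Row]:
    """Return copies of the rows without the unwanted columns."""
    unwanted_set = set(unwanted)
    return [
        {key: value for key, value in row.items() if key not in unwanted_set}
        for row in rows
    ]


def split_rows(
    rows: List[Row], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Shuffle deterministically, then split into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical rows; "phone_number" stands in for an unwanted column.
    data = [
        {"account_length": 100 + i, "phone_number": f"555-{i:04d}", "churn": i % 2}
        for i in range(10)
    ]
    cleaned = drop_columns(data, ["phone_number"])
    train, test = split_rows(cleaned)
    print(len(train), len(test))
```

Fixing the seed keeps the split reproducible between pipeline runs, which makes metric changes easier to attribute to the model rather than to the data division.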
This project can only be run with a ZenML stack that includes an MLflow experiment tracker and an MLflow model deployer as components. To configure a new stack with these two components, run:
zenml integration install mlflow -y
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
zenml model-deployer register mlflow --flavor=mlflow
zenml stack register mlflow_stack -a default -o default -d mlflow -e mlflow_tracker --set
We have another pipeline, deployment_pipeline.py, that extends the training pipeline and implements a continuous deployment workflow. It ingests and processes input data, trains a model, and then (re)deploys the prediction server that serves the model if it meets our evaluation criteria. The criterion we have chosen is a configurable threshold on the MSE of the training. The first four steps of the pipeline are the same as above, but we have added the following ones:
- deployment_trigger: This step checks whether the newly trained model meets the criteria set for deployment.
- model_deployer: This step deploys the model as a service using MLflow (if the deployment criteria are met).
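The decision made by the deployment trigger can be sketched as a single comparison against a configurable threshold. This is a plain-Python illustration, not the ZenML step itself; the `DeploymentTriggerConfig` class, its `max_mse` field, and the default value of 0.5 are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentTriggerConfig:
    """Hypothetical configuration: the largest MSE we are willing to deploy."""

    max_mse: float = 0.5  # illustrative default, not taken from the project


def deployment_trigger(
    mse: float, config: DeploymentTriggerConfig = DeploymentTriggerConfig()
) -> bool:
    """Return True when the model's MSE is at or below the threshold."""
    return mse <= config.max_mse


if __name__ == "__main__":
    print(deployment_trigger(0.2))  # below the threshold, so deploy
    print(deployment_trigger(0.8))  # above the threshold, so keep the old model
```

Because MSE is an error measure, lower is better, so the comparison is `<=`. A trigger based on accuracy would flip the comparison to `>=`.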
In the deployment pipeline, ZenML's MLflow tracking integration logs the hyperparameter values, the trained model itself, and the model evaluation metrics, as MLflow experiment tracking artifacts, into the local MLflow backend. This pipeline also launches a local MLflow deployment server to serve the latest MLflow model if its accuracy is above a configured threshold.
The MLflow deployment server runs locally as a daemon process that continues to run in the background after the example execution is complete. When a new pipeline run produces a model that passes the accuracy threshold validation, the pipeline automatically updates the currently running MLflow deployment server to serve the new model instead of the old one.
This inference pipeline allows you to use the deployed model to make predictions on new customer data in real-time.
Streamlit Application Setup:
- Create a Streamlit application to serve the model: We design the UI to allow users to input features such as account length, international plan, voice mail plan, call details, and other metrics.
- Loading the Deployed Model: Use ZenML's prediction service loader to load the deployed model. Integrate the model prediction functionality into the Streamlit app.
- Making Predictions: The user inputs customer data through the Streamlit UI. The app sends the data to the deployed model, which returns the churn prediction. The prediction is displayed to the user in the app.
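Between the form and the deployed model, the app has to turn the user's inputs into a numeric feature vector and turn the model's output back into a readable message. The sketch below shows one way to do that in plain Python, leaving out the Streamlit widgets and the ZenML service call. The feature names, their order, the yes/no encoding, and the 0.5 cut-off are all assumptions for illustration and would need to match the features the trained model actually expects.

```python
from typing import Dict, List

# Hypothetical feature order; the real order must match the training data.
FEATURE_ORDER = [
    "account_length",
    "international_plan",
    "voice_mail_plan",
    "total_day_calls",
]


def yes_no_to_float(value: object) -> float:
    """Encode a yes/no form answer as 1.0 or 0.0."""
    text = str(value).strip().lower()
    if text not in {"yes", "no"}:
        raise ValueError(f"expected 'yes' or 'no', got {value!r}")
    return 1.0 if text == "yes" else 0.0


def encode_inputs(form: Dict[str, object]) -> List[float]:
    """Convert raw form values into a feature vector in FEATURE_ORDER."""
    encoded = {
        "account_length": float(form["account_length"]),
        "international_plan": yes_no_to_float(form["international_plan"]),
        "voice_mail_plan": yes_no_to_float(form["voice_mail_plan"]),
        "total_day_calls": float(form["total_day_calls"]),
    }
    return [encoded[name] for name in FEATURE_ORDER]


def describe_prediction(prediction: float, threshold: float = 0.5) -> str:
    """Turn a numeric model output into a message for the UI."""
    if prediction >= threshold:
        return "This customer is likely to churn."
    return "This customer is unlikely to churn."


if __name__ == "__main__":
    form_values = {
        "account_length": 120,
        "international_plan": "Yes",
        "voice_mail_plan": "no",
        "total_day_calls": 95,
    }
    print(encode_inputs(form_values))
    print(describe_prediction(0.8))
```

Keeping the encoding in small functions like these, separate from the UI code, makes it possible to test the input handling without starting the Streamlit app or the prediction server.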