kunle-xy / cd0583-model-scoring-and-drift-using-evidently

This project forked from udacity/cd0583-model-scoring-and-drift-using-evidently

Monitoring machine learning models in production using Evidently.

License: MIT License

Monitoring machine learning models in production

In this tutorial, we will learn how to monitor machine learning models in production using Evidently, an open-source framework. In its developers' own words: "Evidently helps analyze and track data and ML model quality throughout the model lifecycle. You can think of it as an evaluation layer that fits into the existing ML stack."

Evidently helps generate:

  1. Interactive visual reports - Evidently generates interactive dashboards (.html files) from pandas DataFrames or .csv files. In general, 7 pre-built reports are available.
  2. Data and ML model profiling - JSON profiles that can be integrated with tools like MLflow and Airflow.
  3. Real-time monitoring - Evidently's monitors collect data and model metrics from a deployed ML service. This functionality can be used to build live dashboards.

Check out the Evidently README to learn more.

Model Scoring and Model Drift

We will use the UCI Bike Sharing Dataset and work with a regression model deployed on Heroku. Our focus is on data drift, which is one type of model drift. Evidently will help us with the following:

  1. Model Quality - Evaluate model quality using performance metrics and track when/where the model fails.
  2. Data Drift - Run statistical tests to compare the input feature distribution and visualize the data drift (if any).
  3. Target Drift - Assess how model predictions and target behavior change over time.
  4. Data Quality - Get data health and dig deeper into feature exploration.
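
As a rough illustration of the statistical comparison behind the Data Drift item, the sketch below computes a two-sample Kolmogorov-Smirnov statistic in plain Python. This is an assumption for illustration only: the tutorial does not say which tests Evidently applies, and the function name `ks_statistic` and the temperature readings are hypothetical, not taken from the dataset or from Evidently.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for value in set(ref) | set(cur):
        ref_cdf = bisect_right(ref, value) / len(ref)
        cur_cdf = bisect_right(cur, value) / len(cur)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


# Hypothetical feature values (illustrative only, not from the bike sharing data).
reference_temp = [2.0, 3.5, 4.1, 5.0, 5.2, 6.3, 7.7, 8.0]
current_temp = [9.1, 10.4, 11.0, 12.2, 12.9, 13.5, 14.8, 15.0]

print(ks_statistic(reference_temp, reference_temp))  # 0.0, identical samples
print(ks_statistic(reference_temp, current_temp))    # 1.0, samples do not overlap
```

A statistic near 0 means the two samples look alike, while a value near 1 means the feature distribution has shifted almost completely. A full test would also turn the statistic into a p-value before flagging drift.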

How to check drift within a production model?

Prerequisites -

  1. Ensure that you have a GitHub account and a Heroku account.
  2. Clone this repository -
git clone https://github.com/udacity/cd0583-model-scoring-and-drift-using-evidently.git

Repository Structure

  • static folder: contains the .html files generated in Heroku.
  • Procfile: declares the commands Heroku runs to start the app.
  • main.py: Python file containing the code to train the model and check the drift.
  • requirements.txt: lists the libraries to be installed in Heroku.
  • runtime.txt: specifies the Python runtime to be installed in Heroku.

The main.py file works on monitoring bike demand data. It involves the following steps:

  1. Read the data into a pandas dataframe.
  2. Build and train a regression model.
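  3. Use Evidently to evaluate model performance. To that end, we first define the column mapping as shown below:
    from evidently.pipeline.column_mapping import ColumnMapping

    # Tell Evidently which columns hold the target, the prediction, and the features.
    # The names on the right-hand side must already be defined.
    column_mapping = ColumnMapping()
    
    column_mapping.target = target
    column_mapping.prediction = prediction
    column_mapping.numerical_features = numerical_features
    column_mapping.categorical_features = categorical_features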
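  4. The following code computes the model performance and builds a dashboard:
    from evidently.dashboard import Dashboard
    from evidently.dashboard.tabs import RegressionPerformanceTab

    # Passing None as the current data builds the report from the reference data only.
    regression_perfomance_dashboard = Dashboard(tabs=[RegressionPerformanceTab()])
    regression_perfomance_dashboard.calculate(reference, 
                                            None,  
                                            column_mapping=column_mapping)
    regression_perfomance_dashboard.save("./static/index.html")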

    Note: We have implemented the code to evaluate performance for Week 1, 2, and 3.

  5. Data drift is calculated using the code below:
    from evidently.dashboard import Dashboard
    from evidently.dashboard.tabs import DataDriftTab

    column_mapping = ColumnMapping()
    
    column_mapping.numerical_features = numerical_features
    
    # Compare the reference data against the week 1 slice of the current data.
    data_drift_dashboard = Dashboard(tabs=[DataDriftTab()])
    data_drift_dashboard.calculate(reference, 
                                    current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'], 
                                    column_mapping=column_mapping)
    
    data_drift_dashboard.save("./static/data_drift_dashboard_after_week1.html")

    Note: We have implemented the code to calculate data drift for Week 1 and 2.
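
To make the per-week evaluation more concrete, here is a minimal sketch in plain Python that selects a time window and computes two error metrics of the kind a regression performance report summarizes (mean error and mean absolute error). It is not Evidently's implementation. The helper `window_errors` and the sample rows are hypothetical; the window bounds mirror the `.loc` slice in step 5, which in pandas includes both endpoints.

```python
from datetime import datetime


def window_errors(rows, start, end):
    """Return (mean error, mean absolute error) for rows with start <= timestamp <= end."""
    errors = [pred - actual for ts, actual, pred in rows if start <= ts <= end]
    if not errors:
        raise ValueError("no rows fall inside the requested window")
    mean_error = sum(errors) / len(errors)
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    return mean_error, mean_abs_error


# Hypothetical (timestamp, actual count, predicted count) rows, for illustration only.
rows = [
    (datetime(2011, 1, 28, 23), 40, 44),   # before the window, ignored
    (datetime(2011, 1, 29, 0), 30, 36),
    (datetime(2011, 2, 3, 12), 120, 110),
    (datetime(2011, 2, 7, 23), 25, 29),
    (datetime(2011, 2, 8, 0), 20, 20),     # after the window, ignored
]

week1 = window_errors(rows, datetime(2011, 1, 29, 0), datetime(2011, 2, 7, 23))
print(week1)  # (0.0, 6.666666666666667)
```

Note that the mean error is zero here even though every prediction is off: over- and under-predictions cancel out, which is why the absolute error is worth tracking alongside it.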

Deploying on Heroku

Follow the steps mentioned below:

  1. Log in to your Heroku account and create a new app.

  2. Choose a unique app name, leave the region as United States, and click Create App.

  3. In the Deployment method section, select GitHub (Connect to GitHub). In the Connect to GitHub section, search for your repository and connect to it.

    Note: This should be the forked repository, and NOT the original Udacity repository for this tutorial.

  4. In the Automatic deploys section, select main as the branch you want to deploy (unless you have created some other branch), and click Enable Automatic Deploys. Finally, click Deploy Branch.

  5. Click Open app or navigate to name-of-your-app.herokuapp.com to see the Regression Model Performance Report.

    Note: Replace name-of-your-app with the name you set in Step 2.

  6. To see the data drift dashboards generated by Evidently, navigate to name-of-your-app.herokuapp.com/data_drift_dashboard_after_week1.html.

    Replace week1 with week2 in the URL above to see data drift after week 2.

Contributors: abhiojha8
