ajnavneet / gaussiantimeseries_mlops_aws_deployment

A Gaussian time series model and an MLOps pipeline that uses AWS to deploy the model in a production environment.

License: MIT License

Time Series Gaussian Model and AWS Deployment

Business Objective

A time series is a sequence of data points ordered in time. Typically, time is the independent variable, and the primary goal is to make future forecasts. Time series data has various applications in everyday activities, such as:

  • Tracking daily, hourly, or weekly weather data
  • Monitoring changes in application performance
  • Visualizing real-time vitals in medical devices

Gaussian processes generalize the Gaussian probability distribution and underpin non-parametric machine learning algorithms for classification and regression. Where a Gaussian distribution describes random variables (scalars or vectors), a Gaussian process describes a distribution over functions: the function values at any finite set of inputs are jointly Gaussian.

Gaussian processes can be used for both classification and regression. This project fits a Gaussian process regressor to forecast the time series.
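
For reference, the standard textbook formulation of Gaussian process regression is written out below. The zero prior mean and the squared-exponential kernel are assumptions chosen for illustration; the project description does not state which mean function or kernel is used.

```latex
% Prior over functions, here with zero mean and a squared-exponential kernel
f(x) \sim \mathcal{GP}\big(0,\; k(x, x')\big), \qquad
k(x, x') = \sigma_f^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right)

% Observations with Gaussian noise
y_i = f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_n^2)

% Posterior predictive mean and variance at a new input x_*,
% where K_{ij} = k(x_i, x_j) and (k_*)_i = k(x_i, x_*)
\mu_* = k_*^\top \left(K + \sigma_n^2 I\right)^{-1} y, \qquad
\sigma_*^2 = k(x_*, x_*) - k_*^\top \left(K + \sigma_n^2 I\right)^{-1} k_*
```

The predictive variance is what distinguishes this approach from a point forecast: each prediction comes with its own uncertainty estimate.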

Deployment involves integrating a machine learning model into an existing production environment for making practical business decisions based on data. MLOps (Machine Learning Operations) is a framework for continuous delivery and deployment of machine learning models. It emphasizes automation and monitoring at all stages of ML system construction, including integration, testing, releasing, deployment, and infrastructure management.

In this project, we build an MLOps pipeline for the time series Gaussian model in Python and deploy it on Amazon Web Services (AWS), with a focus on cost optimization.


Data Description

The dataset is call-center data organized at a monthly level. Calls are categorized by domain, because the call center serves several domains. The dataset also includes external regressors, such as the number of channels and phone lines, which reflect the traffic predictions of in-house analysts and the available resources.

The dataset contains 132 rows and 8 columns, including:

  • Month
  • Healthcare
  • Telecom
  • Banking
  • Technology
  • Insurance
  • Number of Phone Lines
  • Number of Channels

Aim

  • Build a Gaussian model using the provided dataset.
  • Create an MLOps pipeline using the Amazon Web Services (AWS) platform to deploy the time series Gaussian model in a production environment.

Tech Stack

  • Language: Python
  • Libraries: Flask, pickle, pandas, numpy, matplotlib, seaborn, scikit-learn, scipy
  • Services: Flask, AWS, Docker, Lightsail, EC2

Approach

Data Preparation and Analysis

  1. Import Libraries and Load the Dataset
  2. Descriptive Analysis
  3. Data Pre-processing
    • Convert dates to numerical format
    • Set date as an index
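
The pre-processing steps above can be sketched with pandas as follows. This is a minimal sketch, not the project's code: the small DataFrame is a hypothetical stand-in for CallCenterData.xlsx, the month format and the helper column `t` are assumptions, and only two of the domain columns are shown.

```python
import pandas as pd

# Hypothetical stand-in for CallCenterData.xlsx; values are illustrative only.
df = pd.DataFrame({
    "Month": ["2010-01", "2010-02", "2010-03", "2010-04"],
    "Healthcare": [120, 135, 128, 150],
    "Telecom": [300, 310, 295, 320],
})

# Parse the month strings into timestamps (the format is an assumption).
df["Month"] = pd.to_datetime(df["Month"], format="%Y-%m")

# Numerical time axis: whole months elapsed since the first observation.
first = df["Month"].iloc[0]
df["t"] = (df["Month"].dt.year - first.year) * 12 + (df["Month"].dt.month - first.month)

# Use the date as the index so the frame behaves as a time series.
df = df.set_index("Month")
```

A numerical column such as `t` is useful later because a Gaussian process regressor expects numeric inputs rather than timestamps.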

Exploratory Data Analysis (EDA)

  1. Exploratory Data Analysis (EDA)

    • Data Visualization
  2. Check for Normality

    • Density plots
    • QQ-plots
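
A normality check of this kind could look roughly like the sketch below, which computes the data behind a density plot and a QQ-plot with SciPy. The series is synthetic (the real call volumes are not reproduced here), and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic monthly call volumes, 132 points to match the dataset length.
rng = np.random.default_rng(0)
calls = rng.normal(loc=500, scale=40, size=132)

# Density plot data: kernel density estimate evaluated on a grid.
kde = stats.gaussian_kde(calls)
grid = np.linspace(calls.min(), calls.max(), 200)
density = kde(grid)

# QQ-plot data: theoretical normal quantiles against the ordered sample,
# plus the least-squares line fitted through them and its correlation r.
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(calls, dist="norm")
```

Plotting `grid` against `density`, and `theoretical_q` against `ordered_vals`, with matplotlib gives the two plots. A value of `r` close to 1 indicates that the sample follows the normal line closely.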

Gaussian Process Model

  1. Gaussian Process Modeling

    • Initialize kernels
    • Perform train-test split
    • Create a Gaussian process regressor model
    • Fit the model
    • Generate predictions
    • Visualize results
  2. Difference Modeling

    • Create a residual column (difference)
    • Check for normality
    • Perform train-test split
    • Initialize kernel
    • Create a Gaussian model
    • Fit the model
    • Generate predictions on test data
    • Visualize the results
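
Both modeling steps can be sketched with scikit-learn's `GaussianProcessRegressor`. The sketch below is written under several assumptions: the series is synthetic, the kernel (constant times RBF plus white noise) and its starting hyperparameters are illustrative, the train-test split is chronological with the last 12 months held out, and `fit_gp` is a hypothetical helper, not a function from the repository.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Synthetic monthly series with trend, yearly seasonality and noise.
rng = np.random.default_rng(42)
t = np.arange(132, dtype=float).reshape(-1, 1)
y = 500 + 2.0 * t.ravel() + 30 * np.sin(2 * np.pi * t.ravel() / 12) + rng.normal(0, 5, 132)


def fit_gp(t_train, y_train):
    """Fit a GP regressor with an illustrative kernel (an assumption)."""
    kernel = ConstantKernel(1.0) * RBF(length_scale=10.0) + WhiteKernel(noise_level=1.0)
    model = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    return model.fit(t_train, y_train)


# 1. Model the series directly: train on the first 120 months, test on the last 12.
split = 120
level_model = fit_gp(t[:split], y[:split])
mean, std = level_model.predict(t[split:], return_std=True)

# 2. Difference modeling: fit the month-over-month change instead of the level.
diff = np.diff(y)   # 131 values; diff[i] = y[i + 1] - y[i]
t_diff = t[1:]      # time stamp of each difference
diff_model = fit_gp(t_diff[:split - 1], diff[:split - 1])
diff_mean, diff_std = diff_model.predict(t_diff[split - 1:], return_std=True)
```

The `mean` and `std` arrays can be plotted as a forecast line with an uncertainty band. Differencing removes the trend, which often makes the modeled values closer to normally distributed.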

Model Deployment

  1. Model Creation

    • Save the model in pickle format (.pkl)
  2. Flask Application

    • Create a Flask app
  3. EC2 Machine Setup

    • Create an instance on the AWS Management Console
    • Launch the instance
    • Install the PuTTY client for remote (SSH) access to the instance
  4. EC2 and Docker Setup

    • Follow the instructions in the 'install-docker.sh' file
  5. AWS CLI Installation

    • Refer to the steps in the 'install-aws-cli.sh' file
  6. Lightsail Installation

    • Follow the steps in the 'install-lightsail-cli.sh' file
  7. Upload Files to EC2 Machine

    • Method 1:
      • Upload the code file in zip format via AWS Console (Cloud Shell)
    • Method 2:
      • Create an S3 storage bucket
      • Copy the object URL and use it on the EC2 machine to download the code
      • Unzip the downloaded code folder
  8. Deployment

    • Follow the deployment instructions in 'lightsail-deployment.md'
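
The first two deployment steps (saving the model as a pickle file and serving it from Flask) might look like the sketch below. It is not the repository's App.py: the toy model, the file name `gaussian_model.pkl`, the `/predict` route and the JSON field names are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from flask import Flask, jsonify, request
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy model standing in for the trained time series model.
t = np.arange(24, dtype=float).reshape(-1, 1)
y = 100 + 3 * t.ravel()
model = GaussianProcessRegressor(kernel=RBF(5.0) + WhiteKernel(1.0), normalize_y=True).fit(t, y)

# Step 1: save the model in pickle format (hypothetical file name).
MODEL_PATH = os.path.join(tempfile.gettempdir(), "gaussian_model.pkl")
with open(MODEL_PATH, "wb") as fh:
    pickle.dump(model, fh)

# Step 2: a Flask app that loads the pickled model once at start-up.
app = Flask(__name__)
with open(MODEL_PATH, "rb") as fh:
    loaded_model = pickle.load(fh)


@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON such as {"t": [24, 25]} with numeric time steps.
    payload = request.get_json(force=True)
    steps = np.asarray(payload["t"], dtype=float).reshape(-1, 1)
    mean, std = loaded_model.predict(steps, return_std=True)
    return jsonify({"mean": mean.tolist(), "std": std.tolist()})
```

Inside a container the app would typically be started with a command such as `flask --app <module> run --host 0.0.0.0`. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.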

Project Structure

  • Input: CallCenterData.xlsx
  • MLPipeline: Contains functions organized into different Python files
  • Notebook: IPython notebook for the time series Gaussian model
  • Output: Gaussian model saved in a pickle format
  • App.py: Flask app configuration
  • Dockerfile: Docker image configuration
  • Engine.py: File that calls functions from MLPipeline
  • install-aws-cli.sh: Steps for AWS CLI installation
  • install-docker.sh: Steps for Docker installation
  • install-lightsail-cli.sh: Steps for Lightsail installation
  • lightsail-deployment.md: Readme file with Lightsail deployment instructions
  • requirements.txt: List of essential libraries with their versions
