MLOPS MEDICAL INSURANCE COSTS PREDICTION ⚱️

This is a personal MLOps project based on a Kaggle dataset for medical insurance costs prediction. It contains several AWS SageMaker pipelines covering everything from preprocessing to deployment, inference, and monitoring.

Feel free to ⭐ and clone this repo 😉

Tech Stack

Visual Studio Code, Jupyter Notebook, Python, Pandas, NumPy, Matplotlib, scikit-learn, Flask, Anaconda, Linux, AWS, Git

Project Structure

The project has been structured with the following folders and files:

  • .github/workflows: contains the CI/CD files (GitHub Actions)
  • aws_pipelines: AWS pipelines from preprocessing to deployment and monitoring
    • preprocessing_pipeline.py: data preprocessing
    • training_pipeline.py: model training
    • tuning_pipeline.py: model fine-tuning
    • evaluate_pipeline.py: model evaluation
    • register_pipeline.py: model registry
    • cond_register_pipeline.py: conditional model registry (based on an MAE threshold)
    • deployment_pipeline.py: model automatic deployment
    • manual_deployment_pipeline.py: model manual deployment (requires manual approval on AWS)
    • inference_pipeline.py: model automatic deployment and endpoint creation
    • data_quality_pipeline.py: model registry with data quality baseline
    • model_quality_pipeline.py: model registry with data and model quality baseline
    • monitoring_pipeline.py: data and model monitor schedules creation
  • data: raw and clean data
  • Notebooks: Exploratory Data Analysis
  • src: code_scripts for processing, training, evaluation, serving (Flask), lambda, inference and endpoint testing
  • .env_sample: sample environment variables
  • .flake8: flake8 configuration
  • .gitattributes: Git attributes
  • Makefile: install requirements, formatting, testing, linting, coverage report and cleanup
  • pyproject.toml: linting and formatting
  • requirements.txt: project requirements
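The endpoint-testing script in src is not reproduced in this README. As a minimal sketch, assuming a deployed endpoint that accepts CSV rows, a prediction request can be sent through boto3's `sagemaker-runtime` client and its `invoke_endpoint` call. The helper names (`build_csv_payload`, `predict`), the endpoint name, and the `text/csv` content type are hypothetical; the client is passed in so the helpers can be tried without AWS access.

```python
import csv
import io


def build_csv_payload(rows):
    """Serialize feature rows into a header-less CSV string (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def predict(client, endpoint_name, rows):
    """Send rows to a SageMaker endpoint and return the decoded response body.

    `client` is expected to be boto3.client("sagemaker-runtime"); the
    text/csv content type is an assumption about the deployed model.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(rows),
    )
    # "Body" is a streaming object; read() returns bytes.
    return response["Body"].read().decode("utf-8")
```

With real credentials, `client` would be `boto3.client("sagemaker-runtime")`, and each row must follow the feature order the deployed model was trained on.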

Project Description

The dataset was obtained from Kaggle and contains 1338 rows and 7 columns for predicting health insurance costs. An Exploratory Data Analysis was conducted to prepare the data for modeling. For modeling, the categorical features were encoded, TensorFlow was used to build the model, and a mean absolute error (MAE) threshold was chosen as the criterion for registering the model.
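The two steps named in the paragraph above, encoding the categorical features and gating registration on the MAE, can be sketched in plain Python. This is an illustration, not the project's code: it assumes the Kaggle columns are `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `charges`, uses one-hot encoding (the README does not say which encoding was used), and treats an MAE at or below the threshold as passing. The TensorFlow model itself is out of scope here.

```python
# Assumed levels for the string columns of the Kaggle insurance dataset.
CATEGORIES = {
    "sex": ["female", "male"],
    "smoker": ["no", "yes"],
    "region": ["northeast", "northwest", "southeast", "southwest"],
}
# Assumed numeric feature columns; the target column "charges" is excluded.
NUMERIC_COLUMNS = ["age", "bmi", "children"]


def encode_row(record):
    """One-hot encode the categorical columns; keep numeric columns as floats."""
    features = [float(record[name]) for name in NUMERIC_COLUMNS]
    for column, levels in CATEGORIES.items():
        value = record[column]
        if value not in levels:
            raise ValueError(f"unexpected {column!r} value: {value!r}")
        features.extend(1.0 if value == level else 0.0 for level in levels)
    return features


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def should_register(y_true, y_pred, mae_threshold):
    """Assumed rule: register only if the MAE does not exceed the threshold."""
    return mean_absolute_error(y_true, y_pred) <= mae_threshold
```

Inside a SageMaker pipeline, the same comparison is typically expressed with `ConditionLessThanOrEqualTo` wrapped in a `ConditionStep`.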

Project Set Up

The Python version used for this project is Python 3.10.

  1. Clone the repo (or download it as a zip file):

    git clone https://github.com/benitomartin/mlops-aws-insurance.git
  2. Create the virtual environment named main-env using Conda with Python version 3.10:

    conda create -n main-env python=3.10
    conda activate main-env
  3. Install the project dependencies listed in requirements.txt, either directly with pip or through the Makefile:

    pip install -r requirements.txt
    
    or
    
    make install

Additionally, an AWS account, credentials, and policies granting full access to SageMaker, S3, and Lambda are required for the project to function correctly. Make sure to configure the appropriate credentials to interact with AWS services.
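Before launching any pipeline, a quick local check can catch missing credentials. The sketch below is a rough, hypothetical helper (`find_aws_credentials`) that only looks for a few of the sources boto3 consults: the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` and `AWS_PROFILE` environment variables and the `~/.aws/credentials` file. It does not validate the credentials or the SageMaker, S3, and Lambda policies; `aws sts get-caller-identity` is the more reliable test.

```python
import os
from pathlib import Path


def find_aws_credentials(env=None, home=None):
    """List which common AWS credential sources appear to be configured.

    Hypothetical helper: checks presence only, not validity or permissions.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Static keys exported as environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    # A named profile selected through AWS_PROFILE.
    if env.get("AWS_PROFILE"):
        sources.append("profile:" + env["AWS_PROFILE"])
    # The shared credentials file written by `aws configure`.
    if (home / ".aws" / "credentials").is_file():
        sources.append("shared-credentials-file")
    return sources
```

An empty list suggests the credentials still need to be configured before running the pipelines.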

Pipeline Deployment

All pipelines, as well as the Model Registry and the endpoints, were deployed on AWS SageMaker. At the end of each pipeline script there is a line that must be uncommented to run it on AWS:

# Start the pipeline execution (if required)
evaluation_pipeline.start()

Additionally, the experiments were tracked on Comet ML.
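The uncommented `start()` call can be wrapped in a small helper. This is a hedged sketch, assuming each script builds a `sagemaker.workflow.pipeline.Pipeline` object and an IAM role ARN is available; `run_pipeline` is a hypothetical name, and whether the repository's scripts already call `upsert` is not shown here. `Pipeline.upsert`, `Pipeline.start`, and the returned execution's `wait` are part of the SageMaker Python SDK.

```python
def run_pipeline(pipeline, role_arn, wait=True):
    """Create or update a SageMaker pipeline definition, then start an execution.

    `pipeline` is expected to be a sagemaker.workflow.pipeline.Pipeline;
    this helper itself is hypothetical.
    """
    # Push the current definition to SageMaker (creates it if it does not exist).
    pipeline.upsert(role_arn=role_arn)
    # Launch a new execution; this is the call left commented out in the scripts.
    execution = pipeline.start()
    if wait:
        # Block while the execution runs.
        execution.wait()
    return execution
```

With `wait=False` the function returns immediately, and the execution can be monitored from SageMaker Studio or polled with `execution.describe()`.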
