This is a project for the Basics MLE module of a course. All .py scripts were tested on macOS; if you run into any issues on a different OS, please let me know.
The project does not require additional setup and can be run as is once cloned.
Upon completion of intermediate steps, the scripts print small and (sometimes) informative logs.
As a result, you will receive a .csv file with predictions and a .pth file of the latest trained model.
epam_hometask
├── data # Data files used for training and inference, including the complete dataset
│ ├── raw_data.csv
│ ├── inference_iris.csv
│ └── train_iris.csv
├── data_prep # Scripts used for data uploading and splitting into training and inference parts
│ ├── data_prep.py
│ └── __init__.py
├── inference # Scripts and Dockerfiles used for inference
│ ├── Dockerfile
│ ├── inference.py
│ └── __init__.py
├── models # Folder where trained models are stored
│ └── various model files
├── training # Scripts and Dockerfiles used for training
│ ├── Dockerfile
│ ├── train.py
│ └── __init__.py
├── results # Folder where final model and results are stored
│ ├── Outputs.csv
│ └── model files
├── utils.py # Utility functions and classes that are used in scripts
├── requirements.txt # All requirements for the project
├── settings.json # All configurable parameters and settings
└── README.md
To run training, you should first create an image for train.py. To create the image, run:
docker build -t train_img -f training/Dockerfile .
Then run this command to execute training:
docker run -it train_img /bin/bash
This will create a Docker image with the data copied in for training and output a trained model.
To run inference, you should first create an image for inference.py. To create the image, run:
docker build -t inference_img -f inference/Dockerfile .
Then run this command to execute inference:
docker run -it inference_img /bin/bash
This will create a Docker image with the data copied in for inference and output predictions. The script also runs automatically when the Docker container is created.
Alternatively, you can simply run the Python scripts to ensure that everything works as intended. The scripts should be run in the order demonstrated below to successfully build the model without errors:
- Run data_prep.py
- Run train.py
- Run inference.py
A successful run of data_prep.py is indicated by the creation of the data directory and three files inside it.
A successful run of train.py is indicated by the creation of the models directory with a model file, a checkpoint file, and a decoder file inside it.
A successful run of inference.py is indicated by the creation of the results directory with an outputs file inside it.
Running the data_prep.py script performs the following:
- Downloads data from the webpage;
- Saves the full dataset into the data directory as a .csv file. If the data directory does not exist, it is created;
- Splits the dataset into training and inference parts according to the test_size parameter in settings.json;
- Saves the training and inference datasets into the data directory as .csv files with the names specified in settings.json.
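The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual data_prep.py: the settings names and values are assumptions, and the Iris dataset bundled with scikit-learn stands in for the web download so the sketch runs offline.

```python
from pathlib import Path
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hypothetical settings mirroring settings.json (real keys/values may differ)
settings = {
    "test_size": 0.2,
    "train_name": "train_iris.csv",
    "inference_name": "inference_iris.csv",
}

# Create the data directory if it does not exist
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# The real script downloads the dataset from a webpage; here we load the
# bundled Iris dataset instead so the sketch has no network dependency.
df = load_iris(as_frame=True).frame
df.to_csv(data_dir / "raw_data.csv", index=False)

# Split into training and inference parts according to test_size
train_df, inference_df = train_test_split(
    df, test_size=settings["test_size"], random_state=42
)
train_df.to_csv(data_dir / settings["train_name"], index=False)
inference_df.to_csv(data_dir / settings["inference_name"], index=False)
```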
Running the train.py script performs the following:
- The training file from the data directory is preprocessed for modelling:
  - The target column is label encoded; the decoder is saved in the models directory for future use;
  - The data is split into training and validation parts;
  - The train and validation datasets are converted to DataLoaders;
- The model is trained and validated on the created DataLoaders;
- The model is saved in the models directory;
- The model checkpoint with the best performance is saved into the models directory;
- The F1 score of the best-performing checkpoint is printed out.
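A compressed sketch of this flow is below. It is not the project's train.py: the network architecture, file names, and hyperparameters are all placeholders, and the bundled Iris data stands in for the file produced by data_prep.py.

```python
import pickle
from pathlib import Path
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

models_dir = Path("models")
models_dir.mkdir(exist_ok=True)

# Load data (the real script reads the training .csv from the data directory)
frame = load_iris(as_frame=True).frame
X = frame.drop(columns="target").values.astype("float32")

# Label-encode the target and save the decoder for inference
encoder = LabelEncoder()
y = encoder.fit_transform(frame["target"])
with open(models_dir / "decoder.pkl", "wb") as f:
    pickle.dump(encoder, f)

# Split into training and validation parts and wrap them in DataLoaders
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
train_dl = DataLoader(
    TensorDataset(torch.tensor(X_tr), torch.tensor(y_tr)), batch_size=16, shuffle=True
)
val_dl = DataLoader(TensorDataset(torch.tensor(X_val), torch.tensor(y_val)), batch_size=16)

# Placeholder architecture; the real model is defined in utils.py / train.py
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    for xb, yb in train_dl:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()

# Validate and report the F1 score, then save the model weights
model.eval()
with torch.no_grad():
    preds = torch.cat([model(xb).argmax(1) for xb, _ in val_dl])
print("Validation F1:", f1_score(y_val, preds.numpy(), average="macro"))
torch.save(model.state_dict(), models_dir / "model.pth")
```

The real script additionally keeps a separate checkpoint of the best-performing epoch; this sketch saves only the final weights.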
Running the inference.py script performs the following:
- The inference file from the data directory is preprocessed for predictions;
- The model and the checkpoint with the best performance are loaded from the models directory;
- The inference data is passed into the model and the outputs are saved in the results directory as a .csv file.
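The load-and-predict pattern can be sketched as follows. Again, this is illustrative rather than the project's inference.py: the architecture and file names are assumptions, and the block first writes stand-in artifacts (normally produced by train.py) so it can run on its own.

```python
import pickle
from pathlib import Path
import torch
from torch import nn
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder

models_dir = Path("models")
models_dir.mkdir(exist_ok=True)
results_dir = Path("results")
results_dir.mkdir(exist_ok=True)


def make_model() -> nn.Module:
    # Placeholder architecture; must match the one saved by training
    return nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))


# Stand-in artifacts so this sketch is self-contained; in the project
# these files come from train.py.
torch.save(make_model().state_dict(), models_dir / "model.pth")
with open(models_dir / "decoder.pkl", "wb") as f:
    pickle.dump(LabelEncoder().fit(["setosa", "versicolor", "virginica"]), f)

# --- inference proper ---
model = make_model()
model.load_state_dict(torch.load(models_dir / "model.pth"))
model.eval()
with open(models_dir / "decoder.pkl", "rb") as f:
    decoder = pickle.load(f)

# The real script reads the inference .csv from the data directory;
# the bundled Iris features stand in for it here.
features = load_iris(as_frame=True).frame.drop(columns="target")
with torch.no_grad():
    preds = model(torch.tensor(features.values, dtype=torch.float32)).argmax(1).numpy()

# Decode predicted class indices back to labels and save the outputs
outputs = features.copy()
outputs["prediction"] = decoder.inverse_transform(preds)
outputs.to_csv(results_dir / "Outputs.csv", index=False)
```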