
Basics MLE Module Homework

This is a project for the Basics MLE module of a course. All .py scripts were tested on macOS; if you run into any issues on a different OS, please let me know.
The project does not require additional setup and can be run as is once cloned.
Upon completion of intermediate steps each script prints a short and (sometimes) informative log.
As a result you will receive a .csv file with predictions and a .pth file of the latest trained model.

Project Structure

epam_hometask
├── data                      # Data files used for training and inference, file containing complete dataset
│   ├── raw_data.csv
│   ├── inference_iris.csv
│   └── train_iris.csv
├── data_prep                # Scripts used for data uploading and splitting into training and inference parts
│   ├── data_prep.py
│   └── __init__.py           
├── inference                 # Scripts and Dockerfiles used for inference
│   ├── Dockerfile
│   ├── inference.py
│   └── __init__.py
├── models                    # Folder where trained models are stored
│   └── various model files
├── training                  # Scripts and Dockerfiles used for training
│   ├── Dockerfile
│   ├── train.py
│   └── __init__.py
├── results                    # Folder where final model and results are stored
│   ├── Outputs.csv
│   └── model files
├── utils.py                  # Utility functions and classes that are used in scripts
├── requirements.txt          # All requirements for the project
├── settings.json             # All configurable parameters and settings
└── README.md

How to run

Training

To run training, first build a Docker image for train.py:

docker build -t train_img -f training/Dockerfile .

Then run this command to execute training:

docker run -it train_img /bin/bash 

This builds a Docker image with the training data copied in and outputs a trained model.

Inference

To run inference, first build a Docker image for inference.py:

docker build -t inference_img -f inference/Dockerfile .

Then run this command to execute inference:

docker run -it inference_img /bin/bash 

This builds a Docker image with the inference data copied in and outputs predictions. The script also runs automatically when the Docker container is created.

Alternatively, you can simply run the Python scripts directly to verify that everything works as intended. To build the model successfully and avoid errors, run them in the order shown below:

  1. Run data_prep.py
  2. Run train.py
  3. Run inference.py

A successful run of data_prep.py is indicated by the creation of the data directory and 3 files inside it;
A successful run of train.py is indicated by the creation of the models directory with a model file, a checkpoint file and a decoder file inside it;
A successful run of inference.py is indicated by the creation of the results directory with an outputs file inside it.
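The sequence above can be sketched as a small driver that runs each script in order and verifies its expected artifacts before moving on. The script and artifact paths below follow the project layout shown earlier; this is an illustrative sketch, not part of the repository.

```python
# Sketch: run the pipeline scripts in order, checking that each stage
# produced its expected artifacts before starting the next one.
import subprocess
import sys
from pathlib import Path

# (script, artifacts expected after it finishes) -- paths from the layout above
PIPELINE = [
    ("data_prep/data_prep.py",
     ["data/raw_data.csv", "data/train_iris.csv", "data/inference_iris.csv"]),
    ("training/train.py", ["models"]),
    ("inference/inference.py", ["results/Outputs.csv"]),
]

def missing_artifacts(expected, root="."):
    """Return the subset of expected paths that do not exist under root."""
    return [p for p in expected if not (Path(root) / p).exists()]

def run_pipeline(root="."):
    for script, expected in PIPELINE:
        subprocess.run([sys.executable, script], check=True, cwd=root)
        missing = missing_artifacts(expected, root)
        if missing:
            raise RuntimeError(f"{script} did not produce: {missing}")

if __name__ == "__main__":
    run_pipeline()
```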

Information about each script

Data prep

Running data_prep.py script performs the following:

  1. Downloads data from the webpage;
  2. Saves the full dataset into the data directory as a .csv file. If the data directory does not exist, it is created;
  3. Splits the dataset into training and inference parts according to the test_size parameter in settings.json;
  4. Saves the training and inference datasets into the data directory as .csv files with names specified in settings.json;
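The steps above can be sketched as follows. The sketch uses scikit-learn's bundled Iris copy as a stand-in for the web download, and the settings keys (`test_size`, `train_file`, `inference_file`) are assumptions about what settings.json contains.

```python
# Sketch of data_prep.py: load the dataset, save the full copy, split it
# per settings.json, and save the two parts. File/key names are assumed.
import json
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def prepare_data(settings_path="settings.json", data_dir="data"):
    settings = json.loads(Path(settings_path).read_text())

    # Stand-in for "download from the webpage": sklearn's bundled Iris.
    iris = load_iris(as_frame=True)
    df = iris.frame.rename(columns={"target": "species"})
    df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)        # create data/ if absent
    df.to_csv(out / "raw_data.csv", index=False)  # full dataset

    train, inference = train_test_split(
        df, test_size=settings["test_size"], random_state=42
    )
    train.to_csv(out / settings.get("train_file", "train_iris.csv"), index=False)
    inference.to_csv(out / settings.get("inference_file", "inference_iris.csv"), index=False)
    return train, inference
```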

Training

Running train.py script performs the following:

  1. Training file from the data directory is preprocessed for modelling:
    • Target column is label encoded; the decoder is saved in the models directory for future use;
    • Data is split into training and validation parts;
    • Train and validation datasets are converted to DataLoaders;
  2. Model is trained and validated on the created DataLoaders;
  3. Model is saved in the models directory;
  4. Model checkpoint with the best performance is saved in the models directory;
  5. F1 score of the best performing checkpoint is printed out;
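A condensed sketch of this flow is below. The network architecture, optimizer settings, and file names (`decoder.pkl`, `best_checkpoint.pth`, `model.pth`) are illustrative assumptions, not the repository's exact code.

```python
# Sketch of train.py: label-encode the target, build train/validation
# DataLoaders, fit a small network, and checkpoint the best-F1 epoch.
import pickle
from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

def train(df, target="species", models_dir="models", epochs=20):
    out = Path(models_dir)
    out.mkdir(parents=True, exist_ok=True)

    # 1. Encode string labels; persist the decoder for inference.
    encoder = LabelEncoder()
    y = encoder.fit_transform(df[target])
    (out / "decoder.pkl").write_bytes(pickle.dumps(encoder))

    X = df.drop(columns=[target]).to_numpy(dtype="float32")
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    train_dl = DataLoader(TensorDataset(torch.tensor(X_tr), torch.tensor(y_tr)),
                          batch_size=16, shuffle=True)
    val_dl = DataLoader(TensorDataset(torch.tensor(X_val), torch.tensor(y_val)),
                        batch_size=16)

    # Small illustrative MLP (assumed architecture).
    model = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(),
                          nn.Linear(16, len(encoder.classes_)))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    best_f1 = 0.0
    for _ in range(epochs):
        model.train()
        for xb, yb in train_dl:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        # 2. Validate; checkpoint whenever the macro F1 improves.
        model.eval()
        with torch.no_grad():
            preds = torch.cat([model(xb).argmax(dim=1) for xb, _ in val_dl])
        score = f1_score(y_val, preds.numpy(), average="macro")
        if score >= best_f1:
            best_f1 = score
            torch.save(model.state_dict(), out / "best_checkpoint.pth")
    torch.save(model.state_dict(), out / "model.pth")
    print(f"Best validation F1: {best_f1:.3f}")
    return best_f1
```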

Inference

Running inference.py script performs the following:

  1. Inference file from the data directory is preprocessed for predictions;
  2. Model and the best-performing checkpoint are loaded from the models directory;
  3. Inference data is passed into the model and outputs are saved in the results directory as a .csv file;
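A sketch of this flow is below. The rebuilt architecture must match the trained one, and the file names (`decoder.pkl`, `best_checkpoint.pth`, `Outputs.csv`) follow the project layout but are assumptions about the actual code.

```python
# Sketch of inference.py: rebuild the network, load the best checkpoint
# and the saved label decoder, then write decoded predictions to a CSV.
import pickle
from pathlib import Path

import pandas as pd
import torch
from torch import nn

def predict(inference_csv, models_dir="models", results_dir="results"):
    df = pd.read_csv(inference_csv)
    X = torch.tensor(df.to_numpy(dtype="float32"))

    # Load the decoder saved during training, rebuild the same architecture.
    encoder = pickle.loads((Path(models_dir) / "decoder.pkl").read_bytes())
    model = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(),
                          nn.Linear(16, len(encoder.classes_)))
    model.load_state_dict(torch.load(Path(models_dir) / "best_checkpoint.pth"))
    model.eval()

    with torch.no_grad():
        preds = model(X).argmax(dim=1).numpy()
    df["prediction"] = encoder.inverse_transform(preds)  # back to label names

    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    df.to_csv(out / "Outputs.csv", index=False)
    return df
```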
