Evaluation of Batch Workload Characterization Techniques for Performance Modeling of Distributed Data Processing Systems

This repository contains all materials and code for my master's thesis, which evaluates workload similarity for big data processing platforms such as Apache Hadoop and Apache Spark. It uses Docker, Ansible, and Terraform to provide a consistent and reproducible development environment.

Key Components

  • Ansible Scripts: Scripts for HiBench deployment, configuration, and workload submission. These scripts are in the hibench folder.
  • Deployment: Terraform scripts for setting up the Azure HDInsight computing cluster. These resources are in the deployment folder.
  • HiBench Config: Configuration files and scripts for running the benchmarks of HiBench, a benchmarking suite for big data applications. These configurations are in the hibench/conf folder.
  • Systematic Literature Review Results & Layer Analysis: Layer data, Jupyter notebooks, and scripts for analyzing the layers used in performance models. These analyses are in the layer-analysis folder.
  • Similarity Metrics: Scripts and notebooks for calculating similarity metrics between workloads. These resources are in the hibench/results folder.
  • Experiment Results: Scripts and data for analyzing benchmark results. These files are in the hibench/results folder.

Prerequisites

Ensure you have Docker Desktop 4.13.0 or later installed on your system.

Getting Started

Setting Up the Development Environment

  1. Create and Open Docker Development Environment

Click on this link to create the dev environment and open the container shell with your IDE.

Inside the IDE Terminal:

Authenticate with Azure using the command below:

    az login

Specify the Azure subscription to use by replacing [SUBSCRIPTION ID] with your actual subscription ID:

    az account set --subscription [SUBSCRIPTION ID]

Verify the currently selected Azure subscription:

    az account list --query "[?isDefault]"
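
The subscription steps above can also be scripted. The following is a minimal sketch, not part of the repository: the SUBSCRIPTION_ID environment variable and the helper functions is_guid and select_subscription are hypothetical names, and the script assumes you have already run az login. It checks that the value looks like a GUID before passing it to az account set, then prints the ID of the subscription that is now active.

```shell
#!/bin/sh
# Sketch: select an Azure subscription and confirm which one is active.
# SUBSCRIPTION_ID, is_guid and select_subscription are hypothetical names
# introduced for illustration; they are not part of this repository.
set -eu

# Succeed if the argument has the GUID shape 8-4-4-4-12 (hex digits).
is_guid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
}

# Validate the ID, make it the active subscription, then print the active ID.
select_subscription() {
  if ! is_guid "$1"; then
    echo "error: '$1' does not look like a subscription ID" >&2
    return 1
  fi
  az account set --subscription "$1"
  az account show --query id --output tsv
}

# Only call Azure when a subscription ID was actually provided.
if [ -n "${SUBSCRIPTION_ID:-}" ]; then
  select_subscription "$SUBSCRIPTION_ID"
fi
```

Assuming the script is saved as select-subscription.sh (a hypothetical file name), it could be run as SUBSCRIPTION_ID=[SUBSCRIPTION ID] sh select-subscription.sh.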

Initialize Terraform with:

    make init

Deploying the Cluster

To deploy the cluster, execute the following command:

    make deploy

To destroy the cluster and clean up resources, use:

    make destroy

Makefile Commands

  • init: Initialize the Terraform configuration.
  • deploy: Deploy the infrastructure using Terraform.
  • destroy: Destroy the Terraform-managed infrastructure.
  • output: Retrieve the details of the HDInsight cluster from Terraform output.
  • destroy-force: Force delete the Azure resource group and clean up Terraform state files.
  • 4: Resize the HDInsight cluster to 4 worker nodes.
  • 8: Resize the HDInsight cluster to 8 worker nodes.
  • 12: Resize the HDInsight cluster to 12 worker nodes.
  • submit: Run a MapReduce job in the cluster.
  • setup: Set up the HiBench environment using Ansible.
  • ping: Ping all nodes in the HiBench inventory.
  • ssh: Establish an SSH connection to the endpoint.
  • ssh-wn0: Establish an SSH connection to worker node 0.
  • upload: Upload data to the remote server via SCP.
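
As an illustration of how the targets listed above might be chained, here is a minimal sketch of one end-to-end run. The order of the targets, the DRY_RUN and RUN_WORKFLOW variables, and the helper functions are assumptions made for this example; the Makefile itself does not prescribe a sequence. With DRY_RUN=1 the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of one possible end-to-end run using the Makefile targets above.
# The ordering, DRY_RUN, RUN_WORKFLOW and both functions are assumptions
# for illustration; they are not part of this repository.
set -eu

# Run a make target, or only print the command when DRY_RUN=1.
run_target() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "make $1"
  else
    make "$1"
  fi
}

# Assumed order: provision, inspect, configure, check, resize, run, tear down.
WORKFLOW="init deploy output setup ping 8 submit destroy"

# Run every target in WORKFLOW; set -e stops at the first failing target.
run_workflow() {
  for target in $WORKFLOW; do
    run_target "$target"
  done
}

# Execute only when explicitly requested, so sourcing has no side effects.
if [ "${RUN_WORKFLOW:-0}" = "1" ]; then
  run_workflow
fi
```

Running the script with RUN_WORKFLOW=1 DRY_RUN=1 is a cheap way to review the sequence before provisioning any billable Azure resources.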

Acknowledgments

Special thanks to my supervisor and all those who supported me throughout my Master's journey.
