
CSC547-DockerVM

This folder contains 7 experiments that evaluate the overhead and scalability of Docker containers and compare Docker containers with VMs, along with all related data-analysis scripts (e.g., format_sar_output.awk and the R script files).

The first experiment tests Docker scalability with the stress tool. To run it, you need stress and docker preinstalled on the host OS.

The commands_stress file records the commands used to run this experiment. Since the host OS has 8 CPU cores, we launch 8 Docker containers simultaneously.

To stress 4 CPU cores, use the command "stress -c 4 &". Then launch the containers with:

    for i in {1..8}; do
      date > output_$i && docker run sequenceiq/hadoop-docker:2.6.0 sh -c '/etc/bootstrap.sh && date && sleep 30 && date && /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0" -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0" 1000 teragen_out_dir; date' >> output_$i &
    done

This records the execution time of launching the Docker container and starting Hadoop, and the execution time of running the Teragen MapReduce job.

We ran the stress experiment 4 times, stressing 0, 4, 6, and 8 CPU cores, so the available CPU cores were 8, 4, 2, and 0, respectively. All recorded execution timestamps are in the subdirectories of "stress_docker": nostress_start8 means stressing 0 CPU cores, cpu4_stress_start8 means stressing 4, cpu6_stress_start8 means stressing 6, and cpu8_stress_start8 means stressing 8.

The calculated execution time durations are listed in the spreadsheet stress_docker/stress_cpu_time.ods.

Note that stressing all 8 CPU cores prevents the Teragen MapReduce job from executing, so the following command launches the containers and starts Hadoop without running the MapReduce job:

    for i in {1..8}; do
      date > output_$i && docker run sequenceiq/hadoop-docker:2.6.0 sh -c '/etc/bootstrap.sh && date' >> output_$i &
    done

stress_cpu.R uses the execution times to perform an exponential regression and to plot the figure stress_docker/stress_cpu.png.
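For reference, here is a minimal sketch of such an exponential regression in R; it assumes the durations have been transcribed from the spreadsheet, and the vector names and values below are placeholders, not the actual measurements:

    # Minimal sketch of an exponential fit; the numbers are placeholders
    # standing in for durations from stress_docker/stress_cpu_time.ods.
    stressed_cores <- c(0, 4, 6, 8)     # CPU cores kept busy by stress
    exec_time <- c(40, 55, 90, 200)     # hypothetical execution times in seconds

    # Fit y = a * exp(b * x) via linear regression on log(y).
    fit <- lm(log(exec_time) ~ stressed_cores)
    a <- exp(coef(fit)[[1]])
    b <- coef(fit)[[2]]

    png("stress_docker/stress_cpu.png")
    plot(stressed_cores, exec_time,
         xlab = "Stressed CPU cores", ylab = "Execution time (s)")
    curve(a * exp(b * x), add = TRUE)
    dev.off()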

The remaining 6 experiments use "sar" to evaluate the overhead. Specifically:

  - output_empty evaluates the overhead of the "sar" tool itself;
  - output_launch_docker evaluates the overhead of launching a Docker container;
  - output_launch_dockerhadoop evaluates the overhead of launching a Docker container and starting Hadoop inside it;
  - output_launch_dockerhadoopprogram evaluates the overhead of launching a Docker container, starting Hadoop, and running the Teragen MapReduce job;
  - output_launch_vm evaluates the overhead of launching a VM;
  - output_lauch_vmhadoop evaluates the overhead of launching a VM and starting Hadoop inside it.

Because of insufficient available resources, running the Teragen MapReduce job fails in the VM.

To use "sar", the sysstat package has to be installed first:

    For Ubuntu: sudo apt-get install sysstat
    For CentOS: yum install sysstat

"sar -u" collects CPU usage of all CPUs; "sar -r" collects memory statistics; "sar -b" collects I/O statistics; "sar -n DEV" collects network statistics.

Our experiments collected CPU, memory, I/O, and network statistics at an interval of 2 seconds, so the command looks like "sar -urb -n DEV 2 NumberOfTimes", where NumberOfTimes is an integer specifying how many samples to collect.

The number of samples (NumberOfTimes) collected for each experiment is:

    output_empty                        15
    output_launch_docker                30
    output_launch_dockerhadoop          30
    output_launch_dockerhadoopprogram   90
    output_launch_vm                    30
    output_lauch_vmhadoop               60

Each experiment was repeated 3 times, so 3 files of collected data are placed under each folder. We use the awk script "format_sar_output.awk" to separate the various metrics into different files. The command is "./format_sar_output.awk sar_metric_filename". It generates files with the name pattern "sar_metric_filename." + "cpu, mem, disk, network device" + "metric_name" + ".dat". These .dat files are also under the folder of each experiment.

parse_output_output_empty.R, parse_output_output_launch_docker.R, parse_output_output_launch_dockerhadoop.R, parse_output_output_launch_dockerhadoopprogram.R, parse_output_output_launch_vm.R, and parse_output_output_launch_vmhadoop.R are the R scripts that read the metric values from the .dat files and plot figures for each metric. The plotted figures are the .png files under each folder.
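For reference, here is a minimal sketch of the kind of plotting these parse*.R scripts do, assuming a whitespace-separated .dat file with the metric value in the second column; the file name and column layout below are assumptions, not the exact output of format_sar_output.awk:

    # Minimal sketch; the file name and two-column layout are assumptions.
    dat <- read.table("output_launch_docker/sar1.cpu.idle.dat",
                      header = FALSE, col.names = c("time", "idle"))
    elapsed <- seq(0, by = 2, length.out = nrow(dat))  # 2-second sampling interval

    png("output_launch_docker/cpu_idle.png")
    plot(elapsed, 100 - dat$idle, type = "l",
         xlab = "Elapsed time (s)", ylab = "CPU utilization (%)")
    dev.off()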

compare_docker_vm.R and compare_docker_vm_hadoop.R are the R scripts that compare the overhead of the VM with that of the Docker container. These two scripts use variables created by the parse*.R scripts and should only be executed after all of the parse*.R scripts have been run. docker_vm.png and docker_vm_hadoop.png are the two generated figures.
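As a sketch of this comparison step, assuming the parse*.R scripts have left per-run vectors such as docker_cpu and vm_cpu in the workspace (both variable names are hypothetical, not the actual ones in the scripts):

    # Minimal sketch; docker_cpu and vm_cpu are hypothetical numeric vectors
    # of CPU utilization sampled every 2 seconds by sar.
    png("docker_vm.png")
    plot(seq_along(docker_cpu) * 2, docker_cpu, type = "l", col = "blue",
         xlab = "Elapsed time (s)", ylab = "CPU utilization (%)")
    lines(seq_along(vm_cpu) * 2, vm_cpu, col = "red")
    legend("topright", legend = c("Docker", "VM"),
           col = c("blue", "red"), lty = 1)
    dev.off()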

Not all of the metrics are affected by Docker or the VM. The selected_png folder contains the figures that are most interesting.
