malawski / cloudworkflowsimulator
License: Other
Quick start: https://github.com/malawski/cloudworkflowsimulator/wiki/Running-simulations-with-CWS To (re-)build the project, run `ant` in the project root directory. Similarly, the unit and integration tests can be run with `ant test` in the root directory. All dependencies are included in the `lib` directory.
I will add support for caching in GlobalStorageManager. The idea is as follows:
@Mequrel - check & review
When all files are in the cache, CacheManager doesn't send STORAGE_ALL_BEFORE_TRANSFERS_COMPLETED.
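The cache-hit check described above could be sketched like this; CacheManager's fields and method names are hypothetical, since the class doesn't exist yet:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical CacheManager sketch: when every input file is already
// cached, the STORAGE_ALL_BEFORE_TRANSFERS_COMPLETED event is skipped.
public class CacheManager {
    private final Set<String> cachedFiles = new HashSet<>();

    public void putToCache(String file) {
        cachedFiles.add(file);
    }

    // True when no transfers are needed and the event can be skipped.
    public boolean allCached(List<String> inputFiles) {
        return cachedFiles.containsAll(inputFiles);
    }
}
```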
In some places we use task runtime estimation. Recently we added transfer time estimations. Now we must ensure that transfer time is included in task runtime in all those places.
Task runtime prediction is broken because it doesn't take latency into account. Moreover, DEFAULT_LATENCY is 1, which is far too big.
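A latency-aware transfer-time estimate could look like the sketch below; the class and method names are illustrative assumptions, and latency is taken to be in seconds:

```java
// Illustrative estimator: transfer time = per-request latency plus
// size divided by speed. Not existing CWS code.
public class TransferTimeEstimator {
    public static double estimate(double fileSize, double speed, double latency) {
        return latency + fileSize / speed;
    }
}
```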
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
All tasks started and finished
All transfer tasks started and finished
I will create the first implementation of storage support as described in the design doc:
https://docs.google.com/document/d/1OgbHgmZPTVXf5C-cYADFe3_HGL4PAf8X5DtrMUIZ8ww/edit?usp=sharing
Make the experiment runner storage-aware. Currently we do not use any storage in experiments.
There are many JUnit tests under the src/ directory tree. They should be moved to a test-only directory, e.g. test/.
The only concern I have regarding this ticket is whether those tests are real "tests" or just invocations of experiments. It is up to the assignee to determine this.
Create a GlobalStorageParams class for storing parameters, to enable reading them from a file or setting them as program attributes, or simply to separate business code from boilerplate.
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
All corrupted tasks were re-executed
All corrupted transfer tasks were re-executed
Currently transfer times are computed at the beginning of the transfer, e.g.:
double transferTime = file.getSize() / params.getWriteSpeed();
I will change it to be more interactive: files will be transferred in chunks. This will allow us to, e.g., simulate congestion or change speeds on the fly.
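A minimal sketch of per-chunk timing, assuming a fixed chunk size; CHUNK_SIZE and both methods are illustrative, not existing code:

```java
// Illustrative chunked-transfer timing: a transfer becomes a series of
// chunk events, so speed can change between chunks.
public class ChunkedTransfer {
    static final double CHUNK_SIZE = 1_000_000; // bytes per chunk (assumed)

    // Number of chunk events needed for a file of the given size.
    public static long numChunks(double fileSize) {
        return (long) Math.ceil(fileSize / CHUNK_SIZE);
    }

    // Time for a single chunk at the current (possibly updated) speed.
    public static double chunkTime(double chunkSize, double currentSpeed) {
        return chunkSize / currentSpeed;
    }
}
```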
We should reorganize the Java packages to make them more readable and maintainable. In particular, the cws.core.* package looks too big; there are many unrelated classes. It would be nice to create a few new fine-grained packages and move some classes there.
I got an NPE while running a DPDS simulation with these args:
-application GENOME -inputdir /home/piotr/studia/inz/dags -outputfile simulation_outs/simulation_out.csv -distribution pareto_unsorted -ensembleSize 20 -algorithm DPDS
application = GENOME
inputdir = /home/piotr/studia/inz/dags
outputfile = simulation_outs/simulation_out.csv
distribution = pareto_unsorted
ensembleSize = 20
scalingFactor = 1.000000
algorithm = DPDS
seed = 1368600708741
runtimeVariance = 0.000000
delay = 0.000000
failureRate = 0.000000
/home/piotr/studia/inz/dags/GENOME.n.50.2.dag
/home/piotr/studia/inz/dags/GENOME.n.100.1.dag
/home/piotr/studia/inz/dags/GENOME.n.100.5.dag
/home/piotr/studia/inz/dags/GENOME.n.100.7.dag
/home/piotr/studia/inz/dags/GENOME.n.200.0.dag
/home/piotr/studia/inz/dags/GENOME.n.50.8.dag
/home/piotr/studia/inz/dags/GENOME.n.50.0.dag
/home/piotr/studia/inz/dags/GENOME.n.100.4.dag
/home/piotr/studia/inz/dags/GENOME.n.100.6.dag
/home/piotr/studia/inz/dags/GENOME.n.50.4.dag
/home/piotr/studia/inz/dags/GENOME.n.100.0.dag
/home/piotr/studia/inz/dags/GENOME.n.50.6.dag
/home/piotr/studia/inz/dags/GENOME.n.50.1.dag
/home/piotr/studia/inz/dags/GENOME.n.200.1.dag
/home/piotr/studia/inz/dags/GENOME.n.50.3.dag
/home/piotr/studia/inz/dags/GENOME.n.50.5.dag
/home/piotr/studia/inz/dags/GENOME.n.50.7.dag
/home/piotr/studia/inz/dags/GENOME.n.100.2.dag
/home/piotr/studia/inz/dags/GENOME.n.100.3.dag
/home/piotr/studia/inz/dags/GENOME.n.400.0.dag
budget = 13.000000 995.000000 109.111111
deadline = 7350.000000 340414.000000 37007.111111
..Exception in thread "main" java.lang.NullPointerException
at cws.core.scheduler.EnsembleDynamicScheduler$JobComparator.compare(EnsembleDynamicScheduler.java:32)
at cws.core.scheduler.EnsembleDynamicScheduler$JobComparator.compare(EnsembleDynamicScheduler.java:1)
at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:611)
at java.util.PriorityQueue.siftUp(PriorityQueue.java:589)
at java.util.PriorityQueue.offer(PriorityQueue.java:291)
at java.util.PriorityQueue.add(PriorityQueue.java:268)
at java.util.AbstractQueue.addAll(AbstractQueue.java:189)
at cws.core.scheduler.EnsembleDynamicScheduler.moveAllJobsToPriorityQueue(EnsembleDynamicScheduler.java:67)
at cws.core.scheduler.EnsembleDynamicScheduler.scheduleJobs(EnsembleDynamicScheduler.java:53)
at cws.core.WorkflowEngine.jobFinished(WorkflowEngine.java:254)
at cws.core.WorkflowEngine.processEvent(WorkflowEngine.java:120)
at cws.core.cloudsim.CWSSimEntity.processEvent(CWSSimEntity.java:47)
at org.cloudbus.cloudsim.core.SimEntity.run(SimEntity.java:406)
at org.cloudbus.cloudsim.core.CloudSim.runClockTick(CloudSim.java:518)
at org.cloudbus.cloudsim.core.CloudSim.run(CloudSim.java:882)
at org.cloudbus.cloudsim.core.CloudSim.startSimulation(CloudSim.java:188)
at cws.core.cloudsim.CloudSimWrapper.startSimulation(CloudSimWrapper.java:32)
at cws.core.algorithms.DynamicAlgorithm.simulate(DynamicAlgorithm.java:131)
at cws.core.algorithms.TestRun.main(TestRun.java:264)
Currently tasks are executed as follows:
[retrieving all input files for task 1] [main computation of task 1] [uploading all output files for task 1] [retrieving all input files for task 2] ...
We can optimize this a little. For now we handle only the case where all input files for the next task already exist in the cache: we can execute that pending task immediately and upload the previous task's output files during its execution. Many workflows have wide "data distribution" elements, so we expect a considerable improvement in workflow running time, because such workflows often match this case during execution.
Important assumptions:
The idea of implementation is as follows:
We have two queues (figuratively rather than physically): one for the CPU (one core so far) and one for data transfers. When the CPU queue empties, the VM pushes upload tasks onto the data transfer queue; the scheduler is then informed and schedules the next task. Scheduling the next task consists of requesting its input files and putting the task into the CPU queue. The task stays in the queue until all input files are downloaded and available. If all files are already in the cache, the task starts immediately.
In the future we can enhance the process by allowing the scheduler to tell us beforehand which task will be next. We could then take advantage of idle data-transfer-queue time and download the needed files before the current task finishes.
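The two-queue idea above could be modeled roughly as follows; all class and method names are figurative, matching the description rather than real CWS classes:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Figurative two-queue model: one queue for the (single-core) CPU and
// one for data transfers. Tasks are identified by name for simplicity.
public class VmQueues {
    private final Queue<String> cpuQueue = new ArrayDeque<>();
    private final Queue<String> transferQueue = new ArrayDeque<>();

    // When the CPU queue empties, the VM pushes uploads of the finished
    // task's outputs onto the transfer queue.
    public void onCpuQueueEmpty(String finishedTask) {
        transferQueue.add("upload:" + finishedTask);
    }

    // The scheduler submits the next task; it waits in the CPU queue
    // until its input files are available.
    public void submit(String task) {
        cpuQueue.add(task);
    }

    public boolean cpuIdle() { return cpuQueue.isEmpty(); }
    public int pendingTransfers() { return transferQueue.size(); }
}
```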
Add scripts for viewing and filtering CSV files
As in the title. Integration tests currently run forever (> 30 min). Making them smaller shouldn't change their results. Ideally they should run in less than a few minutes.
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
We didn't run out of money before the end
We didn't exceed the deadline
I believe we should change the way we pass and parse simulation parameters. The idea is to use java.lang.System's properties, i.e. instead of setting the -seed=123 command-line param we'd set -Dcws.seed=123.
Pros?
Since last month I've added at least 5 new simulation parameters. With system properties it will be much easier to customize them.
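Reading a parameter this way is nearly a one-liner; SimulationProperties is a hypothetical helper, and only the cws.seed property name follows the proposed convention:

```java
// Sketch of reading a simulation parameter from a JVM system property
// (-Dcws.seed=123) instead of a hand-parsed CLI flag.
public class SimulationProperties {
    // Returns the value of -Dcws.seed, or the given default when unset.
    public static long seed(long defaultSeed) {
        return Long.getLong("cws.seed", defaultSeed);
    }
}
```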
Currently a VM always has a cache size of 0. We should enable parametrizing it for simulations.
We should log global storage state (e.g. number of transfers at any given time) for future auditing and integration tests.
We've already started using cloudsim.send(...) instead of the static CloudSim.send(...). There's also a protected method in SimEntity called SimEntity.send(...), which is basically a wrapper around CloudSim.send(...). We should deprecate it in CWSSimEntity and use cloudsim.send(...) instead.
There are many compiler warnings throughout the codebase. I'm familiarizing myself with the project, so while analyzing it I'll try to get rid of most of those warnings.
We should add storage summaries to the final CSV file. E.g. total number of bytes transferred, total time spent on transfers.
I will create a "Hello World CWS" wiki page describing a 5-minute tutorial on how to set up, run, and analyze a simple experiment.
Add commons-io.jar and refactor all silent resource closings.
Most parameters of schedulers and provisioners, like safety thresholds, are hardcoded as public static final fields in classes. It would be good to extract them and allow changing them without recompiling the whole project.
I'll add the Mockito mocking library to facilitate tests.
We should update the CloudSim library to the newest version. On their website they claim the updates are backward compatible and contain many bug fixes.
The reason for the update is that it may speed up the simulation process, I believe.
I will introduce a consistent formatting style in all Java files. To achieve that I will use Eclipse's XML formatter standard and create a wiki page with the coding conventions.
I suggest a simple Eclipse formatter with a few modifications:
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
All dependent input data transfers finished before a task started
All dependent output data transfers started after a task finished
Add an option --detailed-log (-dl) enabling a detailed log (i.e. for the Gantt chart). By default the detailed log should be disabled.
There are 10 tests failing due to an NPE in the global storage manager. The NPE is caused by tasks with either no inputs or no outputs.
Currently WorkflowEvent is an interface with constants defined for events. This is hard to maintain and invites hard-to-find bugs. I believe we can easily switch WorkflowEvent to an enum by taking advantage of our wrappers (CWSSimEntity & CloudSimWrapper). This will ease development a lot and give some compile-time type checking.
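A minimal sketch of the switch, with example members only (the real constant set lives in the existing WorkflowEvent interface):

```java
// Sketch: WorkflowEvent as an enum instead of int constants on an
// interface. The wrappers would translate enum values to CloudSim's
// int event tags at the boundary, e.g. via ordinal().
public enum WorkflowEvent {
    JOB_STARTED,
    JOB_FINISHED,
    STORAGE_ALL_BEFORE_TRANSFERS_COMPLETED
}
```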
For storage-aware algorithms to work properly we need to estimate file transfer times. It will be a straightforward estimation, as discussed at the team meeting.
The constructor of the VM class currently looks like this:
public VM(int mips, int cores, double bandwidth, double price, CloudSimWrapper cloudsim) {
It would be good to gather all VM params in one small class.
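A small value class gathering those arguments might look like the sketch below; VMParams is a proposed name, and the fields mirror the existing constructor:

```java
// Proposed VMParams value class: the VM constructor would then take
// (VMParams, CloudSimWrapper) instead of a long argument list.
public class VMParams {
    private final int mips;
    private final int cores;
    private final double bandwidth;
    private final double price;

    public VMParams(int mips, int cores, double bandwidth, double price) {
        this.mips = mips;
        this.cores = cores;
        this.bandwidth = bandwidth;
        this.price = price;
    }

    public int getMips() { return mips; }
    public int getCores() { return cores; }
    public double getBandwidth() { return bandwidth; }
    public double getPrice() { return price; }
}
```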
We should add integration tests to check whether the whole system works properly, for example that the simulation finishes, computation time is within some reasonable boundaries, etc.
task.getInputFiles() or task.getOutputFiles() should never return null; it's better to return an empty list.
Please also remove the null sanity checks from DAGDynamicScheduler.
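A null-safe accessor is a one-line change; Task here is a stripped-down illustration, not the real class:

```java
import java.util.Collections;
import java.util.List;

// Illustrative null-safe accessor: callers never see null, so the
// scheduler's null sanity checks can be removed.
public class Task {
    private List<String> inputFiles;

    public List<String> getInputFiles() {
        return inputFiles == null ? Collections.<String>emptyList() : inputFiles;
    }
}
```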
Now we can run CWS with the command:
java -cp 'lib/*:bin' cws.core.algorithms.TestRun --application GENOME --input-dir \
    /home/piotr/stu/dags --output-file simulation_out.csv --distribution pareto_unsorted \
    --algorithm SPSS --storage-manager global -s100 --storage-manager-read=1000000000 \
    --storage-manager-write=30000000 -es 20 --sc fifo --cs 1000000000
It'd be convenient to be able to use "1GiB" instead of "1000000000"
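A hypothetical parser for such sizes; the helper class and the binary-unit interpretation (1GiB = 2^30 bytes, which is close to but not exactly 10^9) are assumptions:

```java
// Illustrative size parser for CLI values like "1GiB" or "500".
public class SizeParser {
    public static long parseBytes(String s) {
        s = s.trim();
        long multiplier = 1L;
        if (s.endsWith("KiB")) {
            multiplier = 1L << 10;
            s = s.substring(0, s.length() - 3);
        } else if (s.endsWith("MiB")) {
            multiplier = 1L << 20;
            s = s.substring(0, s.length() - 3);
        } else if (s.endsWith("GiB")) {
            multiplier = 1L << 30;
            s = s.substring(0, s.length() - 3);
        }
        // Plain numbers are taken as bytes.
        return Long.parseLong(s.trim()) * multiplier;
    }
}
```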
The wiki tutorial should be updated, because the running scripts have changed.
I think we should not choose a complex framework like Spring; I'd rather pick something lightweight like Guice, because we don't have a big enterprise project. The eventual choice is up to the assignee.
The task also involves refactoring some basic dependencies in order to invert them using the container.
There are some integration tests that always fail. Some of them are flaky and fail only from time to time (I'm unable to reproduce which ones...). This should be fixed.
I've been struggling a lot trying to create a simple unit test. CloudSim is so test-unfriendly that I didn't manage to do it even after a few hours of work.
The solution is to create a wrapper class for CloudSim. After that we will start using wrapper.send(...) instead of CloudSim.send(...). This will allow us to test interactions between objects easily.
How could we implement it? We could add classes like:
public class CWSSimEntity extends SimEntity {
    private CloudSimWrapper cloudsim;
    // ...
}

public class CloudSimWrapper {
    // wraps the most important CloudSim methods
}
Then all classes that now inherit from SimEntity could easily be changed to inherit from CWSSimEntity (possibly with a different name). Classes not inheriting from SimEntity could get the reference from some DI container.
@Mequrel review this
Expose storage parameters to the CLI
I will implement a congestion model in the GlobalStorageManager. It will contain two parameters to simulate congestion:
I've found such code in some tests:
new EnsembleManager(dags, engine, cloudsim);
To me it looks weird, because the reference isn't held and yet everything works. I think an inversion of dependency is probably needed.
All occurrences I found have a note in the code:
// FIXME (_mequrel): looks awkward, a comment should be added or some logic inversed
Create a Gantt chart generator. There is an existing one that we should leverage in the new version if possible. Add detailed instructions on how to use that script. Such charts are indispensable both when analysing scheduling algorithms and when debugging the simulator.
Add validation during this process (or as a standalone script) to catch improper schedules, like end-time < start-time and so on.
Jobs should be labelled with the priority of the DAG job, see: https://github.com/malawski/cloudworkflowsimulator/blob/master/src/cws/core/log/WorkflowLog.java
Now, as we haven't started development yet, is the right moment to switch to the Maven build system. This small effort now should ease the development process in the future.
The DynamicProvisionerDynamicSchedulerTest test is ignored. I know it runs for dozens of minutes, but I vote for un-ignoring it, because it is the only "real" integration test we have.