malawski / cloudworkflowsimulator
License: Other
Quick start: https://github.com/malawski/cloudworkflowsimulator/wiki/Running-simulations-with-CWS To (re-)build the project, run `ant` in the project root directory. Similarly, the unit and integration tests can be run with `ant test` in the root directory. All dependencies are included in the `lib` directory.
I will add support for caching in GlobalStorageManager. The idea is as follows:
@Mequrel - check & review
When all files are in the cache, CacheManager doesn't send STORAGE_ALL_BEFORE_TRANSFERS_COMPLETED.
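The cache-hit check described above could be sketched like this; CacheManager's fields and method names are hypothetical, since the class doesn't exist yet:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical CacheManager sketch: when every input file is already
// cached, the STORAGE_ALL_BEFORE_TRANSFERS_COMPLETED event is skipped.
public class CacheManager {
    private final Set<String> cachedFiles = new HashSet<>();

    public void putToCache(String file) {
        cachedFiles.add(file);
    }

    // True when no transfers are needed and the event can be skipped.
    public boolean allCached(List<String> inputFiles) {
        return cachedFiles.containsAll(inputFiles);
    }
}
```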
In some places we use task runtime estimation. Recently we added transfer time estimations. Now we must ensure that transfer time is included in task runtime in all those places.
Task runtime prediction is broken because it doesn't take latency into account. Moreover, DEFAULT_LATENCY is 1, which is far too big.
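A latency-aware transfer-time estimate could look like the sketch below; the class and method names are illustrative assumptions, and latency is taken to be in seconds:

```java
// Illustrative estimator: transfer time = per-request latency plus
// size divided by speed. Not existing CWS code.
public class TransferTimeEstimator {
    public static double estimate(double fileSize, double speed, double latency) {
        return latency + fileSize / speed;
    }
}
```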
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
All tasks started and finished
All transfer tasks started and finished
I will create the first implementation of storage support as described in the design doc:
https://docs.google.com/document/d/1OgbHgmZPTVXf5C-cYADFe3_HGL4PAf8X5DtrMUIZ8ww/edit?usp=sharing
Make the experiment runner storage-aware. Currently we do not use any storage in experiments.
There are many JUnit tests under the src/ directory tree. They should be moved to a test-only directory, e.g. test/.
The only concern I have regarding this ticket is whether those tests are real "tests" or just invocations of experiments. It is up to the assignee to determine this.
Create a GlobalStorageParams class for storing parameters, to enable reading them from a file or setting them as program attributes, or simply to separate business code from boilerplate.
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
All corrupted tasks were re-executed
All corrupted transfer tasks were re-executed
Currently transfer times are computed at the beginning of the transfer, e.g.:
double transferTime = file.getSize() / params.getWriteSpeed();
I will change it to be more interactive: files will be transferred in chunks. This will allow us to, e.g., simulate congestion or change speeds on the fly.
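A minimal sketch of per-chunk timing, assuming a fixed chunk size; CHUNK_SIZE and both methods are illustrative, not existing code:

```java
// Illustrative chunked-transfer timing: a transfer becomes a series of
// chunk events, so speed can change between chunks.
public class ChunkedTransfer {
    static final double CHUNK_SIZE = 1_000_000; // bytes per chunk (assumed)

    // Number of chunk events needed for a file of the given size.
    public static long numChunks(double fileSize) {
        return (long) Math.ceil(fileSize / CHUNK_SIZE);
    }

    // Time for a single chunk at the current (possibly updated) speed.
    public static double chunkTime(double chunkSize, double currentSpeed) {
        return chunkSize / currentSpeed;
    }
}
```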
We should reorganize the Java packages to make them more readable and maintainable. In particular, the cws.core.* package looks too big; there are many unrelated classes. It would be nice to create a few new fine-grained packages and move some classes there.
I got an NPE while running a DPDS simulation with these args:
-application GENOME -inputdir /home/piotr/studia/inz/dags -outputfile simulation_outs/simulation_out.csv -distribution pareto_unsorted -ensembleSize 20 -algorithm DPDS
application = GENOME
inputdir = /home/piotr/studia/inz/dags
outputfile = simulation_outs/simulation_out.csv
distribution = pareto_unsorted
ensembleSize = 20
scalingFactor = 1.000000
algorithm = DPDS
seed = 1368600708741
runtimeVariance = 0.000000
delay = 0.000000
failureRate = 0.000000
/home/piotr/studia/inz/dags/GENOME.n.50.2.dag
/home/piotr/studia/inz/dags/GENOME.n.100.1.dag
/home/piotr/studia/inz/dags/GENOME.n.100.5.dag
/home/piotr/studia/inz/dags/GENOME.n.100.7.dag
/home/piotr/studia/inz/dags/GENOME.n.200.0.dag
/home/piotr/studia/inz/dags/GENOME.n.50.8.dag
/home/piotr/studia/inz/dags/GENOME.n.50.0.dag
/home/piotr/studia/inz/dags/GENOME.n.100.4.dag
/home/piotr/studia/inz/dags/GENOME.n.100.6.dag
/home/piotr/studia/inz/dags/GENOME.n.50.4.dag
/home/piotr/studia/inz/dags/GENOME.n.100.0.dag
/home/piotr/studia/inz/dags/GENOME.n.50.6.dag
/home/piotr/studia/inz/dags/GENOME.n.50.1.dag
/home/piotr/studia/inz/dags/GENOME.n.200.1.dag
/home/piotr/studia/inz/dags/GENOME.n.50.3.dag
/home/piotr/studia/inz/dags/GENOME.n.50.5.dag
/home/piotr/studia/inz/dags/GENOME.n.50.7.dag
/home/piotr/studia/inz/dags/GENOME.n.100.2.dag
/home/piotr/studia/inz/dags/GENOME.n.100.3.dag
/home/piotr/studia/inz/dags/GENOME.n.400.0.dag
budget = 13.000000 995.000000 109.111111
deadline = 7350.000000 340414.000000 37007.111111
..Exception in thread "main" java.lang.NullPointerException
at cws.core.scheduler.EnsembleDynamicScheduler$JobComparator.compare(EnsembleDynamicScheduler.java:32)
at cws.core.scheduler.EnsembleDynamicScheduler$JobComparator.compare(EnsembleDynamicScheduler.java:1)
at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:611)
at java.util.PriorityQueue.siftUp(PriorityQueue.java:589)
at java.util.PriorityQueue.offer(PriorityQueue.java:291)
at java.util.PriorityQueue.add(PriorityQueue.java:268)
at java.util.AbstractQueue.addAll(AbstractQueue.java:189)
at cws.core.scheduler.EnsembleDynamicScheduler.moveAllJobsToPriorityQueue(EnsembleDynamicScheduler.java:67)
at cws.core.scheduler.EnsembleDynamicScheduler.scheduleJobs(EnsembleDynamicScheduler.java:53)
at cws.core.WorkflowEngine.jobFinished(WorkflowEngine.java:254)
at cws.core.WorkflowEngine.processEvent(WorkflowEngine.java:120)
at cws.core.cloudsim.CWSSimEntity.processEvent(CWSSimEntity.java:47)
at org.cloudbus.cloudsim.core.SimEntity.run(SimEntity.java:406)
at org.cloudbus.cloudsim.core.CloudSim.runClockTick(CloudSim.java:518)
at org.cloudbus.cloudsim.core.CloudSim.run(CloudSim.java:882)
at org.cloudbus.cloudsim.core.CloudSim.startSimulation(CloudSim.java:188)
at cws.core.cloudsim.CloudSimWrapper.startSimulation(CloudSimWrapper.java:32)
at cws.core.algorithms.DynamicAlgorithm.simulate(DynamicAlgorithm.java:131)
at cws.core.algorithms.TestRun.main(TestRun.java:264)
Currently tasks are executed as follows:
[retrieving all input files for task 1] [main computation of task 1] [uploading all output files for task 1] [retrieving all input files for task 2] ...
We can optimize this a little. For now we handle only the case where all input files for the next task already exist in the cache: we can execute that pending task immediately and upload the previous task's output files during its execution. Many workflows have wide "data distribution" elements, so we expect a considerable improvement in workflow running time, because such workflows often match this case during execution.
Important assumptions:
The idea of implementation is as follows:
We have two queues (figuratively rather than physically): one for the CPU (one core so far) and one for data transfers. When the CPU queue empties, the VM pushes upload tasks onto the data transfer queue; the scheduler is then informed and schedules the next task. Scheduling the next task consists of requesting its input files and putting the task into the CPU queue. The task stays in the queue until all input files are downloaded and available. If all files are already in the cache, the task starts immediately.
In the future we can enhance the process by allowing the scheduler to tell us beforehand which task will be next. We could then take advantage of idle data-transfer-queue time and download the needed files before the current task finishes.
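The two-queue idea above could be modeled roughly as follows; all class and method names are figurative, matching the description rather than real CWS classes:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Figurative two-queue model: one queue for the (single-core) CPU and
// one for data transfers. Tasks are identified by name for simplicity.
public class VmQueues {
    private final Queue<String> cpuQueue = new ArrayDeque<>();
    private final Queue<String> transferQueue = new ArrayDeque<>();

    // When the CPU queue empties, the VM pushes uploads of the finished
    // task's outputs onto the transfer queue.
    public void onCpuQueueEmpty(String finishedTask) {
        transferQueue.add("upload:" + finishedTask);
    }

    // The scheduler submits the next task; it waits in the CPU queue
    // until its input files are available.
    public void submit(String task) {
        cpuQueue.add(task);
    }

    public boolean cpuIdle() { return cpuQueue.isEmpty(); }
    public int pendingTransfers() { return transferQueue.size(); }
}
```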
Add scripts for viewing and filtering CSV files
As in the title. Integration tests currently run forever (> 30 min). Making them smaller shouldn't change their results. Ideally they should run in less than a few minutes.
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
We didn't run out of money before the end
We didn't exceed the deadline
I believe we should change the way we pass and parse simulation parameters. The idea is to use java.lang.System's properties, i.e. instead of setting the -seed=123 command-line param we'd set -Dcws.seed=123.
Pros?
Since last month I've added at least 5 new simulation parameters. With system properties it will be much easier to customize them.
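Reading a parameter this way is nearly a one-liner; SimulationProperties is a hypothetical helper, and only the cws.seed property name follows the proposed convention:

```java
// Sketch of reading a simulation parameter from a JVM system property
// (-Dcws.seed=123) instead of a hand-parsed CLI flag.
public class SimulationProperties {
    // Returns the value of -Dcws.seed, or the given default when unset.
    public static long seed(long defaultSeed) {
        return Long.getLong("cws.seed", defaultSeed);
    }
}
```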
Currently a VM always has a cache size of 0. We should enable parametrizing it for simulations.
We should log global storage state (e.g. number of transfers at any given time) for future auditing and integration tests.
We've already started using cloudsim.send(...) instead of the static CloudSim.send(...). There's also a protected method in SimEntity called SimEntity.send(...), which is basically a wrapper around CloudSim.send(...). We should deprecate it in CWSSimEntity and use cloudsim.send(...) instead.
There are many compiler warnings throughout the codebase. I'm familiarizing myself with the project, so while analyzing it I'll try to get rid of most of those warnings.
We should add storage summaries to the final CSV file. E.g. total number of bytes transferred, total time spent on transfers.
I will create a "Hello World CWS" wiki page describing a 5-minute tutorial on how to set up, run, and analyze a simple experiment.
Add commons-io.jar and refactor all silent resource closings.
Most parameters of schedulers and provisioners, like safety thresholds, are hardcoded as public static final fields in classes. It would be good to extract them and allow changing them without recompiling the whole project.
I'll add the Mockito mocking library to facilitate tests.
We should update the CloudSim library to the newest version. On their website they claim the updates are backward compatible and contain many bug fixes.
The reason for the update is that it may speed up the simulation process, I believe.
I will introduce a consistent formatting style in all Java files. To achieve that I will use Eclipse's XML formatter standard and create a wiki page with the coding conventions.
I suggest a simple Eclipse formatter with a few modifications:
Create part of a sophisticated validator of output logs, ensuring the schedule was correct and meets the following conditions:
All dependent input data transfers finished before a task started
All dependent output data transfers started after a task finished
Add an option --detailed-log (-dl) enabling a detailed log (i.e. for the Gantt chart). By default the detailed log should be disabled.
There are 10 tests failing due to an NPE in the global storage manager. The NPE is caused by tasks with either no inputs or no outputs.
Currently WorkflowEvent is an interface with constants defined for events. This is hard to maintain and invites hard-to-find bugs. I believe we can easily switch WorkflowEvent to an enum by taking advantage of our wrappers (CWSSimEntity & CloudSimWrapper). This will ease development a lot and give some compile-time type checking.
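A minimal sketch of the switch, with example members only (the real constant set lives in the existing WorkflowEvent interface):

```java
// Sketch: WorkflowEvent as an enum instead of int constants on an
// interface. The wrappers would translate enum values to CloudSim's
// int event tags at the boundary, e.g. via ordinal().
public enum WorkflowEvent {
    JOB_STARTED,
    JOB_FINISHED,
    STORAGE_ALL_BEFORE_TRANSFERS_COMPLETED
}
```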
For storage-aware algorithms to work properly we need to estimate file transfer times. It will be a straightforward estimation, as discussed at the team meeting.
The constructor of the VM class currently looks like this:
public VM(int mips, int cores, double bandwidth, double price, CloudSimWrapper cloudsim) {
It would be good to gather all VM params in one small class.
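A small value class gathering those arguments might look like the sketch below; VMParams is a proposed name, and the fields mirror the existing constructor:

```java
// Proposed VMParams value class: the VM constructor would then take
// (VMParams, CloudSimWrapper) instead of a long argument list.
public class VMParams {
    private final int mips;
    private final int cores;
    private final double bandwidth;
    private final double price;

    public VMParams(int mips, int cores, double bandwidth, double price) {
        this.mips = mips;
        this.cores = cores;
        this.bandwidth = bandwidth;
        this.price = price;
    }

    public int getMips() { return mips; }
    public int getCores() { return cores; }
    public double getBandwidth() { return bandwidth; }
    public double getPrice() { return price; }
}
```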
We should add integration tests to check whether the whole system works properly, for example that the simulation finishes, computation time is within some reasonable boundaries, etc.
task.getInputFiles() or task.getOutputFiles() should never return null; it's better to return an empty list.
Please also remove the null sanity checks from DAGDynamicScheduler.
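A null-safe accessor is a one-line change; Task here is a stripped-down illustration, not the real class:

```java
import java.util.Collections;
import java.util.List;

// Illustrative null-safe accessor: callers never see null, so the
// scheduler's null sanity checks can be removed.
public class Task {
    private List<String> inputFiles;

    public List<String> getInputFiles() {
        return inputFiles == null ? Collections.<String>emptyList() : inputFiles;
    }
}
```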
Now we can run CWS with the command:
java -cp 'lib/*:bin' cws.core.algorithms.TestRun --application GENOME --input-dir \
    /home/piotr/stu/dags --output-file simulation_out.csv --distribution pareto_unsorted \
    --algorithm SPSS --storage-manager global -s100 --storage-manager-read=1000000000 \
    --storage-manager-write=30000000 -es 20 --sc fifo --cs 1000000000
It'd be convenient to be able to use "1GiB" instead of "1000000000"
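A hypothetical parser for such sizes; the helper class and the binary-unit interpretation (1GiB = 2^30 bytes, which is close to but not exactly 10^9) are assumptions:

```java
// Illustrative size parser for CLI values like "1GiB" or "500".
public class SizeParser {
    public static long parseBytes(String s) {
        s = s.trim();
        long multiplier = 1L;
        if (s.endsWith("KiB")) {
            multiplier = 1L << 10;
            s = s.substring(0, s.length() - 3);
        } else if (s.endsWith("MiB")) {
            multiplier = 1L << 20;
            s = s.substring(0, s.length() - 3);
        } else if (s.endsWith("GiB")) {
            multiplier = 1L << 30;
            s = s.substring(0, s.length() - 3);
        }
        // Plain numbers are taken as bytes.
        return Long.parseLong(s.trim()) * multiplier;
    }
}
```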
The wiki tutorial should be updated, because the running scripts have changed.
I think we should not choose a complex framework like Spring; I'd rather pick something lightweight like Guice, because we don't have a big enterprise project. The eventual choice is up to the assignee.
The task also involves refactoring some basic dependencies in order to invert them using the container.
There are some integration tests that always fail. Some of them are flaky and fail only from time to time (I'm unable to reproduce which ones...). This should be fixed.
I've been struggling a lot trying to create a simple unit test. CloudSim is so test-unfriendly that I didn't manage to do it even after a few hours of work.
The solution is to create a wrapper class for CloudSim. After that we will start using wrapper.send(...) instead of CloudSim.send(...). This will allow us to test interactions between objects easily.
How could we implement it? We could add classes like:
public class CWSSimEntity extends SimEntity {
    private CloudSimWrapper cloudsim;
    // ...
}

public class CloudSimWrapper {
    // wraps the most important CloudSim methods
}
Then all classes that now inherit from SimEntity could easily be changed to inherit from CWSSimEntity (possibly with a different name). Classes not inheriting from SimEntity could get the reference from some DI container.
@Mequrel review this
Expose storage parameters to the CLI
I will implement a congestion model in the GlobalStorageManager. It will contain two parameters to simulate congestion:
I've found such code in some tests:
new EnsembleManager(dags, engine, cloudsim);
To me it looks weird, because the reference isn't held and yet everything works. I think an inversion of dependency is probably needed.
All occurrences I found have a note in the code:
// FIXME (_mequrel): looks awkward, a comment should be added or some logic inversed
Create a Gantt chart generator. There is an existing one that we should leverage in the new version if possible. Add detailed instructions on how to use that script. Such charts are indispensable both when analysing scheduling algorithms and when debugging the simulator.
Add validation during this process (or as a standalone script) to catch improper schedules, like end-time < start-time and so on.
Jobs should be labelled with the priority of the DAG job, see: https://github.com/malawski/cloudworkflowsimulator/blob/master/src/cws/core/log/WorkflowLog.java
Now, as we haven't started development yet, is the right moment to switch to the Maven build system. This small effort now should ease the development process in the future.
The DynamicProvisionerDynamicSchedulerTest test is ignored. I know it runs for dozens of minutes, but I vote for un-ignoring it, because it is the only "real" integration test we have.