atlarge-research / opendc-simulator

Datacenter simulation toolkit for the OpenDC project

License: MIT License

Kotlin 99.49% Dockerfile 0.51%

opendc simulator datacenter tu-delft kotlin scheduling simulation-toolkit

opendc-simulator's Introduction

We are moving towards a mono-repo code model in v2.0 of OpenDC.
See the main repo for the latest version, which now has the code previously contained in this sub-repo embedded in it.

OpenDC Simulator

Collaborative Datacenter Simulation and Exploration for Everybody

Getting the source

Download the source code by running the following code in your command prompt:

$ git clone https://github.com/atlarge-research/opendc-simulator.git

or simply grab a copy of the source code as a Zip file.

Building

We use Gradle to build the source code. To run Gradle, enter the following in your command prompt:

$ ./gradlew build

To test the source code, run the following code in your command prompt:

$ ./gradlew test

License

The code is released under the MIT license. See the LICENSE.txt file.

opendc-simulator's People

Contributors


sacheendra mihaineacsu lfdversluis greengearx b1gb4dw0lf sanderronde isabella232

opendc-simulator's Issues

Implement Instrumentation API for Omega kernel

Depends on #11

Machine Timestamp

Hello,

I am working on the lab for distributed systems class.
Is there a way to get the machine which became idle the earliest?

Or should I add timestamps to the machines locally, or in StageScheduler when the DONE message is received? If you suggest the latter, how can I get the machine that sent the DONE message?

Thanks!

Add hard limit on simulations from database

An experiment could run indefinitely if its tasks never finish. This exhausts the thread pool the simulator is running on.
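One way to enforce such a limit is to bound the simulation loop by a maximum number of ticks. The following is only a minimal sketch: `RunResult` and `runWithLimit` are hypothetical names and are not part of the OpenDC code base.

```kotlin
// Hypothetical sketch: none of these names come from the OpenDC code base.

/** Outcome of a bounded simulation run. */
sealed class RunResult {
    data class Finished(val ticks: Long) : RunResult()
    data class LimitReached(val limit: Long) : RunResult()
}

/**
 * Advances the simulation one tick at a time until [isDone] reports completion
 * or [maxTicks] ticks have elapsed, whichever comes first.
 */
fun runWithLimit(maxTicks: Long, isDone: () -> Boolean, step: () -> Unit): RunResult {
    require(maxTicks >= 0) { "maxTicks must be non-negative" }
    var tick = 0L
    while (!isDone()) {
        // Stop instead of looping forever when the tasks never finish.
        if (tick >= maxTicks) return RunResult.LimitReached(maxTicks)
        step()
        tick++
    }
    return RunResult.Finished(tick)
}
```

A caller can then mark the experiment as aborted when it receives `LimitReached`, which frees the worker thread for the next experiment.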

Update build toolchain

This issue is concerned with updating the build toolchain of the project. At the moment, we are using the following versions:

Gradle 3.5
Currently, Gradle is at release 4.8, which provides native JUnit 5 support. This allows us to remove the current configuration and build plugins needed to support JUnit 5. In addition, the new Gradle versions add support for build caching.
Kotlin 1.2.21
Kotlin is currently at version 1.2.51 and provides a whole lot of new features. See https://blog.jetbrains.com/kotlin/2017/11/kotlin-1-2-released/
Dokka 0.9.15
The latest version of Dokka, 0.9.17, now supports the new plugin DSL from Gradle.

Create design for Instrumentation API

For the first public release, we need a design for an Instrumentation API which is able to measure data from the system in simulation and process the data that has been measured.

Goals

Allow user to measure data from the system in simulation
Allow user to transform the measured data
Allow user to use the data (save/store)

Design

Measurement
The measurement of data happens in an Instrument, which is similar to a Process and has access to the state of all entities in the model. These measurements can happen periodically, by making the instrument hold (sleep) between the measurements. The measurements are then emitted via the methods provided by the SendChannel interface.

val entity = ...
val instrument: Instrument<Int, Model> = {
    while(true) {
        send(entity.state.counter)
        hold(10)
    }
}

Installation
The Kernel interface will provide a method install(instrument: Instrument<T, M>): ReceiveChannel<T> that allows a user to install an instrument into a (running) system. The method will return a stream of measurements of type T (we use ReceiveChannel from kotlinx-coroutines to represent a stream).

Transformation and Collection
The idea is to use the channel returned by the install method to asynchronously transform the measurements into meaningful data:

val stream: ReceiveChannel<Int> = kernel.install(instrument)
val avg: Deferred<Int> = async(Confined) { stream.take(100).average() }

kernel.run(1000)
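The measure, transform and collect flow described above can be approximated with the Kotlin standard library alone. The sketch below substitutes a lazy Sequence for the ReceiveChannel of the proposal, so it is a stand-in and not the proposed API; `Counter`, `instrument` and `averageOfFirst` are hypothetical names.

```kotlin
// Stand-in sketch using kotlin.sequences instead of kotlinx-coroutines channels.
// `Counter`, `instrument` and `averageOfFirst` are hypothetical names.

/** A toy entity whose state changes as simulated time advances. */
class Counter(var value: Int = 0)

/**
 * Emits one measurement of [entity] per sampling period. The yield call plays
 * the role of send(), and advancing the entity stands in for hold(period).
 */
fun instrument(entity: Counter, period: Int): Sequence<Int> = sequence {
    while (true) {
        yield(entity.value)      // emit the current measurement
        entity.value += period   // pretend `period` ticks of simulation passed
    }
}

/** Transformation step: average of the first [n] measurements. */
fun averageOfFirst(stream: Sequence<Int>, n: Int): Double = stream.take(n).average()
```

Because the sequence is lazy, the infinite loop in the instrument only runs as far as the consumer asks, which mirrors how a channel consumer would take a bounded number of measurements.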

We're doing the Lab exercise for Distributed Systems at the VU, and we're having trouble creating a setup file for the SC18PlatformRunner. We've tried the website, but that failed (see atlarge-research/opendc-frontend#66). We've also found Sc18SetupParser, but it's somewhat hard to reverse-engineer a good setup from that.

Could you provide an example setup file? Or provide/link to documentation on the format of this setup file?

Cached traces cause interference

Trace objects are cached inside the simulator, so simulations that use the same logical trace reuse the same object and interfere with each other, possibly causing the simulator to run indefinitely due to #3 and #4.
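One possible remedy, sketched here under the assumption that the interference comes from mutable task state inside the cached trace, is to keep the cache but hand every simulation its own copy. `TraceTask` and `TraceCache` are hypothetical names, not classes from the simulator.

```kotlin
// Hypothetical sketch: assumes the shared, mutable part of a trace is per-task state.

/** A task in a trace; `remaining` is mutated while a simulation runs. */
data class TraceTask(val id: Int, var remaining: Long)

/** Caches loaded traces but never exposes the cached instances themselves. */
class TraceCache(private val loader: (String) -> List<TraceTask>) {
    private val cache = mutableMapOf<String, List<TraceTask>>()

    /** Returns a fresh copy of the trace, loading it only on first use. */
    fun get(name: String): List<TraceTask> =
        cache.getOrPut(name) { loader(name) }.map { it.copy() }
}
```

The expensive load still happens once per trace, while mutations made by one simulation stay invisible to the others.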

Tasks scheduled to machines without processing units

At the moment, a scheduler will schedule tasks to a machine without processing units, causing the system to run indefinitely because these tasks never finish.
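A scheduler could guard against this by filtering out machines without processing units before assigning tasks. The sketch below uses a simple round-robin policy for illustration; `Machine`, `Task` and `schedule` are hypothetical and the real OpenDC model classes may look different.

```kotlin
// Hypothetical types: the real OpenDC model classes may look different.
data class Machine(val name: String, val cores: Int)
data class Task(val id: Int)

/**
 * Assigns tasks round-robin, considering only machines that have at least one
 * processing unit. Returns null when no machine can run anything.
 */
fun schedule(tasks: List<Task>, machines: List<Machine>): Map<Task, Machine>? {
    val usable = machines.filter { it.cores > 0 }
    if (usable.isEmpty()) return null
    return tasks.withIndex().associate { (i, task) -> task to usable[i % usable.size] }
}
```

Returning null makes the "nothing can run" case explicit, so the caller can fail the experiment instead of waiting on tasks that will never finish.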

Simulator appears to crash on non-ASCII experiment name

A recent test revealed that the system crashed when a user used a non-ASCII character (an emoji) in the name of an experiment. All subsequent experiments remained in the QUEUED state until I dumped the and restarted the Docker, so it looks like the simulator was down.

Can no longer login?

I experimented with OpenDC last year and set up a couple of projects, but the login no longer seems to work?

Clicking 'Login with Google' in the top right opens the popup window and prompts for my Google username and password, but upon entering them, the popup disappears, the Login button remains unchanged, and no new options show up. Am I missing something here, or has there been a change?

Would love to use the online version because the port forwarding on the Docker version seems tricky! :-)

Interpolate task progress

Expected Behaviour

The progress of tasks updates every tick according to the speed of the machines.

Actual Behaviour

At the moment, the progress of tasks will only update at a specific interval (10 ticks) instead of per tick.
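Per-tick progress can be derived from the last recorded value rather than stored at every tick. This is a minimal sketch assuming a constant machine speed between updates; the function and parameter names are hypothetical and not taken from the OpenDC model.

```kotlin
// Hypothetical sketch; the parameter names are not taken from the OpenDC model.

/**
 * Work completed by a task at tick [now], given the work recorded at the last
 * update and the speed (work units per tick) of the machine running it.
 * The result is capped at [total] so a task never exceeds 100% progress.
 */
fun interpolateProgress(
    doneAtLastUpdate: Long,
    lastUpdate: Long,
    now: Long,
    speed: Long,
    total: Long
): Long {
    require(now >= lastUpdate) { "time must not move backwards" }
    return minOf(total, doneAtLastUpdate + (now - lastUpdate) * speed)
}
```

For example, a task with 100 units done at tick 10 on a machine that processes 5 units per tick has 115 units done at tick 13.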

Redesign core simulation API

This issue describes the redesign of the core simulation API for v1.1. The goal of the redesign is to provide a cleaner and more uniform core simulation API to the simulation authors.

The new simulation interface works as follows for simulation authors:

Create the entities in the model. Dynamic entities are represented as Processes and interact with the model environment by the interchange of messages:

class Ping : Process<Int, PingPongModel> {
    override val initialState = 0
    fun Context<Int, PingPongModel>.run() {
        while (true) {
            receive {
                if (message == "Pong") {
                    sender.send("Ping", delay = model.delay)
                    state += 1
                }
            }
        }
    }
}
...

Create an object that contains the model properties and keeps track of the entities in that model:

class PingPongModel {
    val entities: Pair<Entity, Entity> = Pair(Ping(), Pong())
    val delay: Int = 2
}

Create a model bootstrap that introduces the active entities in a model to the simulation kernel and the initial messages to get the system started:

val bootstrap: Bootstrap<PingPongModel> = Bootstrap.create { ctx ->
    val model = PingPongModel()
    val (ping, pong) = model.entities
    ctx.register(ping)
    ctx.register(pong)
    ctx.schedule("Pong", ping, pong)
    // A bare `return` is not allowed inside this lambda; the last expression is its result.
    model
}

Create a simulation kernel using the above bootstrap and start the simulation:

val kernel = OmegaKernelFactory.create(bootstrap)
kernel.run(1000)

In addition, we also clean up the development tree:

opendc-core - contains the core interfaces that are used to create a simulation model.
opendc-kernel-omega - contains the default kernel implementation to run the simulation models with.
opendc-stdlib - contains helpful generic procedures and structures for use in simulations.
opendc-model-odc - contains the default simulation model we use for simulating datacenters.

Implement Continuous Integration support

We should add support for Continuous Integration (e.g. Travis CI) in order to continuously test the code base.

Sending message to stopped process causes crash

During simulation, when trying to send a message to a Process that has already returned (stopped), the simulation kernel crashes with the following error:

kotlin.UninitializedPropertyAccessException: lateinit property continuation has not been initialized
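The exception type indicates that a lateinit property was read before it was assigned. A guard along the following lines would avoid the crash, assuming the kernel can simply drop messages addressed to a stopped process. `ProcessContext` and `deliver` are hypothetical names for illustration, not the kernel's actual classes.

```kotlin
// Hypothetical sketch: `ProcessContext` and `deliver` are illustrative names only.

class ProcessContext {
    /** Set when the process suspends to wait for a message. */
    lateinit var continuation: (String) -> Unit

    /** True once a continuation has been stored, false for a process that never suspended. */
    val isWaiting: Boolean
        get() = this::continuation.isInitialized
}

/** Delivers [message] if the process is waiting; otherwise drops it and returns false. */
fun deliver(context: ProcessContext, message: String): Boolean {
    // Checking isInitialized first avoids UninitializedPropertyAccessException.
    if (!context.isWaiting) return false
    context.continuation(message)
    return true
}
```

A nullable property would express the same state without lateinit and can also be reset to null when the process stops.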

Persist events instead of simulation state

This issue is concerned with refactoring the OpenDC simulation model to persist events instead of simulation state.

At the moment, at every second in the simulation, the state of machines and tasks is recorded and written to the database. This means that even when a task or machine never changes during the simulation, we still record all these states and write them to the database for this entity, so a single simulation writes a huge number of rows. A benefit of this approach is that the state at a single point in time can be retrieved quickly.

The model should be refactored so it writes events that occur to the database instead of these states. This means that entities that do not have any events associated with them will not occupy space in the database. A downside of this approach is that getting the state for a single point in time requires replaying the events starting from some checkpoint. This change will also require significant changes in the database as well as the current frontend.
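The replay step can be sketched as a fold over the recorded events. The event model below is hypothetical and only tracks task states; it assumes the caller passes the events recorded after the checkpoint.

```kotlin
// Hypothetical event model; the names are illustrative and not part of OpenDC.
enum class TaskState { QUEUED, RUNNING, FINISHED }

/** A state change of one task at a given simulation tick. */
data class TaskEvent(val tick: Long, val taskId: Int, val newState: TaskState)

/**
 * Rebuilds the state of every task at [tick] by replaying, in tick order, all
 * events up to and including that tick on top of [checkpoint].
 */
fun stateAt(
    tick: Long,
    events: List<TaskEvent>,
    checkpoint: Map<Int, TaskState> = emptyMap()
): Map<Int, TaskState> {
    val state = checkpoint.toMutableMap()
    events.filter { it.tick <= tick }
        .sortedBy { it.tick }
        .forEach { state[it.taskId] = it.newState }
    return state
}
```

A task that never changes contributes no events and therefore no rows, while the cost of a point-in-time query grows with the number of events since the checkpoint.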

See more about Event Sourcing here:

Docker build failure

The Docker image provided in the repository fails to build due to file permission errors.

Kernel resumes old continuation of frozen or crashed process

At the moment, the opendc-kernel-omega implementation may resume an old continuation of a process after crashing and receiving a new message. This will result in an IllegalStateException for resuming an already resumed continuation.

This issue is caused by the fact that the kernel does not clear the continuation after it has been used and assumes the continuation will always be correctly reset thereafter.

Offending lines can be found in

opendc-simulator/opendc-kernel-omega/src/main/kotlin/com/atlarge/opendc/omega/OmegaSimulation.kt

Lines 186 to 192 in ebd8c8e

    
    val context = envelope.destination.context ?: continue
    if (envelope.message !is Interrupt) {
        context.continuation.resume(envelope)
    } else {
        context.continuation.resumeWithException(envelope.message)
    }

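One way to rule out a double resume is to clear the stored continuation before resuming it. The sketch below uses only the kotlin.coroutines package from the standard library; `ProcessSlot` and its methods are hypothetical and not classes in opendc-kernel-omega.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical sketch of the fix; `ProcessSlot` is not a class in opendc-kernel-omega.

/** Holds the continuation of a suspended process, if any. */
class ProcessSlot<T> {
    private var continuation: Continuation<T>? = null

    /** Called when the process suspends to wait for the next message. */
    fun park(cont: Continuation<T>) {
        continuation = cont
    }

    /**
     * Takes the stored continuation and clears the slot before handing it back,
     * so the same continuation can never be resumed twice.
     */
    private fun take(): Continuation<T>? {
        val cont = continuation
        continuation = null
        return cont
    }

    /** Resumes the process with [value]; returns false if nothing was parked. */
    fun deliver(value: T): Boolean {
        val cont = take() ?: return false
        cont.resume(value)
        return true
    }

    /** Resumes the process with [error]; returns false if nothing was parked. */
    fun fail(error: Throwable): Boolean {
        val cont = take() ?: return false
        cont.resumeWithException(error)
        return true
    }
}
```

With this shape, a message that arrives for a crashed process finds an empty slot and is reported as undelivered instead of triggering an IllegalStateException.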