Eliciting event ties in ROSS to test for non-determinism


This model is designed to be absolutely terrible for a parallel discrete-event simulation (PDES) simulator that does not handle event ties deterministically. In such a simulator, events with different timestamps are still processed in timestamp order by the end of the simulation, but events that share the same timestamp may be processed in a different order from one run to the next.

This is not a problem if the model properly dithers its events with random values so that each event has a unique timestamp. Sometimes, however, especially in long-running simulations, RNG collisions occur despite efforts to avoid them, and two events end up with the same timestamp.

If a model uses only commutative operations on its state, this is not a big deal. For example, if every event carried a random value that the receiving LP (logical process) added to its running sum, and subtracted again during rollbacks, then the order in which events are processed, even tied events, would not matter.
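A minimal standalone C sketch of the commutative case follows. It is not code from this model or from ROSS: sum_state, sum_forward, sum_reverse and process_in_order are hypothetical names chosen for illustration. It uses integer values so the sum is exact; with floating-point values, different addition orders can differ in the last bits because of rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LP state for the commutative variant: a running sum.
   Integer values keep the arithmetic exact, so order cannot matter. */
typedef struct {
    long running_sum;
} sum_state;

/* Forward handler: add the value carried by the received event. */
static void sum_forward(sum_state *s, long value) {
    s->running_sum += value;
}

/* Reverse handler: undo sum_forward during a rollback. */
static void sum_reverse(sum_state *s, long value) {
    s->running_sum -= value;
}

/* Process n event values in the order given by idx[] and return the final sum. */
static long process_in_order(const long *values, const size_t *idx, size_t n) {
    sum_state s = {0};
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < n); /* each index must refer to one of the n values */
        sum_forward(&s, values[idx[i]]);
    }
    return s.running_sum;
}
```

Feeding the values {3, 7, 11} in any order gives 21, and calling sum_reverse once for every processed event brings running_sum back to 0, in whatever order the rollbacks happen.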

That is not the case if, instead of addition, we use an operation like the mean, which is what this model does. Every LP starts with a random value called its cur_rec_mean (current recursive mean). It sends a group of new events to random destinations, each event message containing a different random value.

When an LP receives an event, it averages its current recursive mean with the random value carried by that event. This is not an attempt to average all of the random values received, which would be order independent. Instead, it computes an average of an average of an average, and so on.
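The update rule above can be sketched in standalone C, outside of ROSS. Only cur_rec_mean comes from the model description; mean_state, mean_forward and final_mean are hypothetical names used for illustration, and this is not the model's actual event handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LP state for this model's non-commutative update. */
typedef struct {
    double cur_rec_mean; /* current recursive mean */
} mean_state;

/* On receipt of an event, average the current recursive mean with the
   event's random value: an average of an average of an average... */
static void mean_forward(mean_state *s, double value) {
    s->cur_rec_mean = (s->cur_rec_mean + value) / 2.0;
}

/* Apply n tied events, in the order given by idx[], to an LP whose
   recursive mean starts at `initial`, and return the final mean. */
static double final_mean(double initial, const double *values,
                         const size_t *idx, size_t n) {
    mean_state s = { initial };
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < n); /* each index must refer to one of the n values */
        mean_forward(&s, values[idx[i]]);
    }
    return s.cur_rec_mean;
}
```

Starting from a mean of 0.0, processing the tied values 0.5 and then 1.0 ends at 0.625, while processing 1.0 and then 0.5 ends at 0.5.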

This is strictly order dependent, and different orders can produce wildly different final values. It relies on the property:

Mean(Mean(A,B),C) != Mean(Mean(A,C),B), whenever B != C
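Expanding both sides, with Mean(X,Y) = (X+Y)/2, shows exactly how large the difference is:

```latex
\begin{aligned}
\mathrm{Mean}(\mathrm{Mean}(A,B),C) &= \tfrac{1}{2}\left(\tfrac{A+B}{2} + C\right) = \tfrac{A}{4} + \tfrac{B}{4} + \tfrac{C}{2} \\
\mathrm{Mean}(\mathrm{Mean}(A,C),B) &= \tfrac{1}{2}\left(\tfrac{A+C}{2} + B\right) = \tfrac{A}{4} + \tfrac{C}{4} + \tfrac{B}{2} \\
\mathrm{Mean}(\mathrm{Mean}(A,B),C) - \mathrm{Mean}(\mathrm{Mean}(A,C),B) &= \tfrac{C - B}{4}
\end{aligned}
```

More generally, if an LP starts at m_0 and processes values v_1, ..., v_n in that order, unrolling the recursion gives the weights below, so the most recently processed value always carries weight 1/2 and swapping any two distinct values changes the result:

```latex
m_k = \frac{m_{k-1} + v_k}{2}
\quad\Longrightarrow\quad
m_n = 2^{-n} m_0 + \sum_{i=1}^{n} 2^{-(n-i+1)}\, v_i
```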


To run this simulation you need ROSS. Symlink this model into the models folder of the ROSS source tree and build ROSS.

Once ROSS is built, an example execution in sequential mode is:

mpirun -n 1 models/eties/eties --synch=1 --extramem=1000 --nlp=4 --start-events=10 --end=10000

This runs the simulation in sequential mode with 4 LPs, each creating 10 starting events, until timestamp 10000. The final output (the running mean and running sums printed at the end) should be deterministic: the same every time the simulation is run.

To run the same simulation in parallel conservative mode on 4 PEs (processing elements):

mpirun -n 4 models/eties/eties --synch=2 --extramem=1000 --nlp=1 --start-events=10 --end=10000

This should also be deterministic and its output should match the sequential execution. If it doesn't, then there is a problem with deterministic ordering of forward events.

To run the same simulation in parallel optimistic mode on 4 PEs:

mpirun -n 4 models/eties/eties --synch=3 --extramem=1000 --nlp=1 --start-events=10 --end=10000

If ROSS does not handle event ties properly, either by rolling events back in a non-deterministic order or by replaying them forward in a non-deterministic order, then the output will neither match the sequential execution nor be consistent from run to run. The effect appears less frequently when there are few rollbacks or event ties, so extending the end time of the simulation produces more event ties over a longer period of time.

A helpful debugging tool is optimistic debug mode, --synch=4. It runs all events forward sequentially and then rolls them all back to the beginning of the simulation. If the LP state at the end of this process does not match the LP state at the beginning, then something is wrong, most likely the model-level reverse computation:

mpirun -n 1 models/eties/eties --synch=4 --extramem=1000 --nlp=4 --nkp=1 --start-events=10 --end=10000
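The kind of check --synch=4 performs can be illustrated with a standalone C approximation; this is not ROSS code, and the structs and function names (lp_state, event_msg, forward_handler, reverse_handler, naive_inverse, rollback_restores_state) are hypothetical. It assumes one possible way to reverse the recursive mean: the forward handler saves the previous mean in the event message (state saving), and the reverse handler restores it. Inverting the average algebraically is shown only for contrast, because floating-point rounding can make that inverse inexact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LP state and event message; only cur_rec_mean comes
   from the model description, the rest is illustrative. */
typedef struct {
    double cur_rec_mean;
} lp_state;

typedef struct {
    double value;      /* random value carried by the event */
    double saved_mean; /* LP's mean before this event (state saving) */
} event_msg;

/* Forward handler: remember the old mean in the message, then update. */
static void forward_handler(lp_state *s, event_msg *m) {
    m->saved_mean = s->cur_rec_mean;
    s->cur_rec_mean = (s->cur_rec_mean + m->value) / 2.0;
}

/* Reverse handler: restore the saved mean bit-for-bit. */
static void reverse_handler(lp_state *s, const event_msg *m) {
    s->cur_rec_mean = m->saved_mean;
}

/* Algebraic inverse of the update, for contrast only: rounding in
   (old + value) means this does not always recover the old mean. */
static double naive_inverse(double new_mean, double value) {
    return 2.0 * new_mean - value;
}

/* Optimistic-debug style check: process all events forward, roll every
   one back in reverse order, and compare with the starting state. */
static bool rollback_restores_state(double initial, event_msg *msgs, size_t n) {
    assert(msgs != NULL || n == 0);
    lp_state s = { initial };
    for (size_t i = 0; i < n; i++) forward_handler(&s, &msgs[i]);
    for (size_t i = n; i-- > 0;) reverse_handler(&s, &msgs[i]);
    return s.cur_rec_mean == initial;
}
```

For example, with an old mean of 1e-17 and an incoming value of 1.0, the sum rounds to 1.0, so naive_inverse returns 0.0 instead of 1e-17, whereas restoring saved_mean recovers the exact starting value.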

This model is also used to test a possible fix for this behavior in ROSS: an independent, deterministic RNG value that aids in processing event ties, so that tied events are processed in the correct order. If that feature is enabled, an additional debug option (set with a #define) prints the count of that RNG, which can be verified to be fully rolled back using --synch=4.
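This README does not describe how ROSS applies the tie-breaking RNG value, so the following is only an illustrative C sketch of the general idea. It assumes each event carries an extra key drawn from an independent, deterministic RNG stream, and that pending events are ordered by timestamp first and by that key second. The event_key struct, its fields, event_cmp and order_events are hypothetical and are not ROSS's API. Because tie-breaker values can themselves collide, the sketch also falls back to a sender LP id and a per-sender sequence number to keep the order total.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event key; ROSS's real event structure is not shown here. */
typedef struct {
    double ts;            /* event timestamp */
    unsigned long tie;    /* value from an independent, deterministic RNG */
    unsigned long src_lp; /* sender LP id, a last-resort fallback */
    unsigned long seq;    /* per-sender sequence number */
} event_key;

/* Total order: timestamp first, then tie-breaker, then sender and
   sequence, so tied events sort the same way regardless of arrival. */
static int event_cmp(const void *pa, const void *pb) {
    const event_key *a = pa, *b = pb;
    if (a->ts != b->ts) return a->ts < b->ts ? -1 : 1;
    if (a->tie != b->tie) return a->tie < b->tie ? -1 : 1;
    if (a->src_lp != b->src_lp) return a->src_lp < b->src_lp ? -1 : 1;
    if (a->seq != b->seq) return a->seq < b->seq ? -1 : 1;
    return 0;
}

/* Sort a batch of pending events into a deterministic processing order. */
static void order_events(event_key *ev, size_t n) {
    assert(ev != NULL || n == 0);
    if (n > 1) qsort(ev, n, sizeof *ev, event_cmp);
}
```

Sorting the same events from two different arrival orders yields the same processing order. For this to hold across rollbacks, the RNG draws that produce the tie values must themselves be reversed, which is what checking the RNG count under --synch=4 is meant to confirm.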

Contributors: nmcglo
