Coder Social home page Coder Social logo

rocoto's Introduction

Rocoto Workflow Management System

Introduction

Workflow Management is a concept that originated in the 1970's to handle business process management. Workflow management systems were developed to manage complex collections of business processes that need to be carried out in a certain way with complex interdependencies and requirements. Scientific Workflow Management is much newer, and is very much like its business counterpart, except that it is usually data oriented instead of process oriented. That is, scientific workflows are driven by the scientific data that "flows" through them. Scientific workflow tasks are usually triggered by the availability of some kind of input data, and a task's result is usally some kind of data that is fed as input to another task in the workflow. The individual tasks themselves are scientific codes that perform some kind of computation or retreive or store some type of data for a computation. So, whereas a business workflow is comprised of a diverse set of processes that have to be completed in a certain way, sometimes carried out by a machine, sometimes carried out by a human being, a scientific workflow is usually comprised of a set of computations that are driven by the availabiilty of input data.

Why Workflow Management?

The day when a scientist could conduct his or her numerical modeling and simulation research by writing, running, and monitoriing the progress of a modest Fortran code or two, is quickly becoming a distant memory. It is a fact that researchers now often have to make hundreds or thousands of runs of a numerical model to get a single result. In addition, each end-to-end "run" of the model often entails running many different codes for pre- and post-processing in addition to the model itself. And, in some cases, multiple models and their associated pre- and post-processing tasks are coupled together to build a larger, more complex model. The codes that comprise the end-to-end modeling systems often have complex interdependencies that dictate the order in which they can be run. And, in order to run the end-to-end system efficiently, concurrency must be used when dependencies allow it. The problem of scale and complexity is exacerbated by the fact that these codes are usually run on high performance machines that are notoriously difficult for scientists to use, and which tend to exhibit frequent failures. As machines get larger and larger, the failure rate of hardware and software components increases commensurately. Ad-hoc management of the execution of a complex modeling system is often difficult even for a single end-to-end run on a machine that never fails. Multiply that by the thousands of runs needed to perform a scientific experiment, in a hostile computing environment where hardware and facility outages are not uncommon, and you have a very challenging situation. For simulations that must run reliably in realtime, the situtation is almost hopeless. The traditional ad-hoc techniques for automating the execution of modeling systems (e.g. driver scripts, batch job chains or trees) do not provide sufficient fault tolerance for the scale and complexity of current and future workflows, nor are they reusable; each modeling system requires a custom automation system.

A Workflow Management System addresses the problems of complexity, scale, reliability, and reusability by providing two things:

  • A high-level means by which to describe the various codes that need to be run, along with their runtime requirements and interdependencies.
  • An automation engine for reliably managing the execution of the workflow

Prerequisites

Depending on how the components of a modeling system are designed and how existing software for running them is designed, some changes may be necessary to make use of a workflow management system. In order to take full advantage of the features offered by a workflow management system, the model system components must be well designed. In particular the following best practices should be followed:

Each workflow task must correctly check for its successful completion, and must return a non-zero exit status upon failure. An exit status of 0 means success, regardless of what actually happened. No workflow task should contain automation features. Automation is the workflow managmenet system's responsibility. A workflow management system cannot manage tasks or jobs that it is not aware of. Enable reuse of workflow tasks by using principles of modular design to build autonomous model components with well-defined interfaces for input and output that can be run stand-alone. Prefer the construction of small model components that do only one thing. It is easy to combine several small, well-designed, components together to build a larger, more complex workflow task. It is generally much more difficult to divide large, complex, model components into smaller ones to form multiple workflow tasks. Avoid combining serial and parallel processing in the same workflow task unless the serial processing is very short in duration.

Documenation

Detailed documentation is provided at http://christopherwharrop.github.io/rocoto/

rocoto's People

Contributors

christopherwharrop avatar christopherwharrop-noaa avatar samueltrahannoaa avatar samtrahan avatar t-brown avatar christinaholt avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.