
polusai / workflow-inference-compiler


A domain specific language for creating scientific pipelines

License: MIT License

Python 93.67% Shell 0.74% Common Workflow Language 4.23% Jupyter Notebook 1.26% Batchfile 0.10%

workflow-inference-compiler's Introduction

Sophios


Scientific computing can be difficult in practice due to various complex software issues. In particular, chaining software packages together into a computational pipeline is very error prone. Using the Common Workflow Language (CWL) helps greatly, but, as with many other workflow languages, users still need to specify explicitly how to connect inputs & outputs. Sophios allows users to specify computational protocols at a very high level of abstraction. It automatically infers almost all connections between inputs & outputs and compiles the result to CWL for execution.

Documentation

The documentation is available on readthedocs.

Quick Start

See the installation guide for more details, but:

For pip users:

pip install sophios

To execute the CWL workflows generated by sophios, cwltool and all of its underlying dependencies must be present on the system. Unfortunately, pip cannot resolve and install these dependencies. Please refer to the cwltool installation guide to prepare the system to run CWL workflows.
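One quick sanity check before running generated workflows is whether a `cwltool` executable is on PATH. The sketch below assumes that this is enough for a first check. It does not verify cwltool's other dependencies, so the installation guide still applies. The helper names are hypothetical and are not part of sophios.

```python
import shutil
import subprocess
from typing import Optional


def cwltool_available(executable: str = "cwltool") -> bool:
    """Return True if `executable` can be found on PATH (hypothetical helper)."""
    return shutil.which(executable) is not None


def cwltool_version(executable: str = "cwltool") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Some tools print their version on stderr, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()
```

For example, `print(cwltool_version() or "cwltool not found on PATH")` prints either the installed cwltool version or a short notice.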

For conda users / developers:

See the installation guide for developers

sophios --yaml ../workflow-inference-compiler/docs/tutorials/helloworld.wic --graphviz --run_local --quiet

Sophios is a Domain Specific Language (DSL) based on the Common Workflow Language. CWL is fantastic, but explicitly constructing the Directed Acyclic Graph (DAG) for a non-trivial workflow is not simple. Instead of writing raw CWL, users can write workflows in a much simpler yml DSL. For technical reasons, edge inference is far from unique, so users should always check that it actually produces the intended DAG.

Edge Inference

The key feature is that, in most cases, users do not need to specify any of the edges! They are inferred automatically from types, file formats, and naming conventions. For more information, see the user guide. If edge inference fails for some reason, there is a syntax for creating explicit edges.
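To make the idea concrete, here is a toy illustration of inferring edges from types, file formats, and names. It is not Sophios's actual algorithm. It assumes a linear list of steps and connects each input to the most recent upstream output whose type matches and whose format does not conflict, preferring an output with the same name. The `Port`/`Step` classes and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Port:
    name: str
    type: str                     # e.g. "File" or "Directory"
    format: Optional[str] = None  # None means "format unspecified"


@dataclass
class Step:
    name: str
    inputs: List[Port] = field(default_factory=list)
    outputs: List[Port] = field(default_factory=list)


def compatible(out: Port, inp: Port) -> bool:
    """An output can feed an input if the types match and the formats do not conflict."""
    if out.type != inp.type:
        return False
    return out.format is None or inp.format is None or out.format == inp.format


def infer_edges(steps: List[Step]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Map (step, input) -> (upstream step, output) for every input that can be matched."""
    edges: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for i, step in enumerate(steps):
        for inp in step.inputs:
            # Search upstream steps from the most recent to the oldest.
            for prev in reversed(steps[:i]):
                candidates = [o for o in prev.outputs if compatible(o, inp)]
                if not candidates:
                    continue
                # Prefer a name match; otherwise take the first compatible output.
                named = [o for o in candidates if o.name == inp.name]
                chosen = (named or candidates)[0]
                edges[(step.name, inp.name)] = (prev.name, chosen.name)
                break
    return edges
```

The "take the first compatible output" tie-break is arbitrary. Several outputs can match equally well, which is one reason inferred edges are not unique and the resulting DAG should be checked.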

Subworkflows

Subworkflows are very useful for creating reusable, composable building blocks. Recursive subworkflows are fully supported, and the edge inference algorithm has been carefully constructed to work across subworkflow boundaries.
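One simple way to picture inference across subworkflow boundaries is to flatten nested subworkflows into a single ordered sequence of path-qualified steps. An inference pass can then also see upstream steps that live in enclosing workflows. The sketch below assumes that a workflow is a list of step names or `(name, steps)` tuples. It illustrates the recursion only and is not how Sophios represents or processes subworkflows.

```python
from typing import List, Tuple, Union

# A workflow item is either a step name or a nested (subworkflow_name, items) tuple.
Node = Union[str, Tuple[str, list]]


def flatten(workflow: List[Node], prefix: str = "") -> List[str]:
    """Recursively flatten nested subworkflows into path-qualified step names."""
    flat: List[str] = []
    for item in workflow:
        if isinstance(item, tuple):
            sub_name, sub_steps = item
            # Descend into the subworkflow, extending the path prefix.
            flat.extend(flatten(sub_steps, f"{prefix}{sub_name}/"))
        else:
            flat.append(f"{prefix}{item}")
    return flat
```

Here the step names are hypothetical. `flatten(["load", ("preprocess", ["denoise", ("segment", ["threshold", "label"])]), "measure"])` yields `["load", "preprocess/denoise", "preprocess/segment/threshold", "preprocess/segment/label", "measure"]`.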

Explicit CWL

Since the yml DSL files are automatically compiled to CWL, users should not have to know any CWL. However, the yml DSL is secretly CWL that is simply missing almost all of the tags! In other words, the compiler merely adds missing information to the files, and so if the users know CWL, they are free to explicitly add the information themselves. Thus, the yml DSL is intentionally a leaky abstraction.
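The "only add what is missing" idea can be sketched as a recursive merge in which anything the user wrote explicitly wins over inferred values. This assumes that both the DSL file and the compiler's inferred tags are plain nested dicts. It is a minimal illustration, not Sophios's implementation, and the keys in the usage example are illustrative.

```python
from copy import deepcopy
from typing import Any, Dict


def fill_missing(user: Dict[str, Any], inferred: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `user` with keys absent from it filled in from `inferred`.

    Explicit user values are never overwritten; nested dicts are merged recursively.
    """
    result = deepcopy(user)
    for key, value in inferred.items():
        if key not in result:
            result[key] = deepcopy(value)
        elif isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = fill_missing(result[key], value)
    return result
```

For example, `fill_missing({"in": {"x": "explicit"}}, {"cwlVersion": "v1.2", "in": {"x": "inferred", "y": "inferred"}})` keeps the user's `x`, adds `y`, and adds `cwlVersion`.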

Python API

In addition to the underlying declarative YAML syntax, there is an API for writing WIC workflows in Python. The Python API follows the opposite philosophy: users should not have to know any CWL, and in fact all CWL features are hidden unless explicitly exposed.


workflow-inference-compiler's Issues

Management of Directories

Moving PolusAI/image-tools#466 to correct repo

Summary

Management of directories by the WIC needs to be investigated.

What is the current bug behavior?

After an initial investigation, it seems that any output directory hierarchy is ignored and the output directory is created directly in the working directory.
Intermediary directories created by WIC are also created directly in the working directory. Their names are built by prepending the input name to the step name (e.g. inpDirOmeConverter). As a result, running several workflows will eventually put all data in the same directory.

What is the expected correct behavior?

  • Output path hierarchies should not be ignored, and the full path should be created. Relative paths will not work in certain configurations.
  • Intermediary directories (created when inputs and outputs are linked) should be created inside a directory named after WORKFLOW_NAME, or something similar, to distinguish the outputs of different workflows.
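The expected behavior above could look roughly like the sketch below. It keeps the current `<input><Step>` naming but nests each intermediary directory under a per-workflow directory. It also creates the full requested output hierarchy, resolving relative paths against an explicit base directory. Both helper functions and the layout are hypothetical proposals, not existing WIC behavior.

```python
from pathlib import Path


def intermediate_dir(base: Path, workflow_name: str, step_name: str, input_name: str) -> Path:
    """Hypothetical layout: <base>/<workflow_name>/<input_name><step_name>."""
    # Same naming as today (e.g. inpDirOmeConverter), but namespaced per workflow.
    return Path(base) / workflow_name / f"{input_name}{step_name}"


def prepare_output_dir(path: str, base: Path) -> Path:
    """Create the full requested output hierarchy, resolving relative paths against `base`."""
    target = Path(path)
    if not target.is_absolute():
        target = Path(base) / target
    target = target.resolve()
    target.mkdir(parents=True, exist_ok=True)
    return target
```

With this layout, two workflows that both run an `OmeConverter` step on `inpDir` write to different directories, such as `wf1/inpDirOmeConverter` and `wf2/inpDirOmeConverter`.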

Error messages in WIC Python API

Moving PolusAI/image-tools#467 to correct repo

Description

When compilation fails through the WIC Python API, no helpful error message is reported back to the user.

Proposal

Running WIC directly on the command line provides better feedback, so let's look into integrating that feedback into the Python API.
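One possible shape for this proposal is to run the command-line compiler and re-raise any failure with its captured stderr, so that the Python caller sees the same message the CLI would print. The sketch below assumes that shelling out is acceptable. `CompileError` and `run_with_feedback` are hypothetical names, and the sketch does not reflect how the Python API currently invokes the compiler.

```python
import subprocess
from typing import List


class CompileError(RuntimeError):
    """Raised when the command-line compiler exits with a non-zero status (hypothetical)."""


def run_with_feedback(cmd: List[str]) -> str:
    """Run `cmd` and return its stdout; on failure, raise with the captured stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise CompileError(
            f"command {' '.join(cmd)!r} failed with exit code {result.returncode}:\n"
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

A call such as `run_with_feedback(["sophios", "--yaml", "helloworld.wic"])` would then surface the compiler's own error text instead of failing silently.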

Additional context

These problems come up repeatedly when using the WIC Python API to develop workflows.
