parsl / parsl

Parsl - a Python parallel scripting library

Home Page: http://parsl-project.org

License: Apache License 2.0

Languages: Python 97.75%, Shell 0.87%, Makefile 0.36%, C 0.09%, CSS 0.02%, HTML 0.90%, Roff 0.01%
Topics: hacktoberfest

parsl's Introduction

Parsl - Parallel Scripting Library


Parsl extends parallelism in Python beyond a single computer.

You can use Parsl just like Python's parallel executors but across multiple cores and nodes. However, the real power of Parsl is in expressing multi-step workflows of functions. Parsl lets you chain functions together and will launch each function as inputs and computing resources are available.

import parsl
from parsl import python_app


# Make functions parallel by decorating them
@python_app
def f(x):
    return x + 1

@python_app
def g(x, y):
    return x + y

# Start Parsl on a single computer
with parsl.load():
    # These functions now return Futures
    future = f(1)
    assert future.result() == 2

    # Functions run concurrently, can be chained
    f_a, f_b = f(2), f(3)
    future = g(f_a, f_b)
    assert future.result() == 7

Start with the configuration quickstart to learn how to tell Parsl how to use your computing resource, then explore the parallel computing patterns to determine how to use parallelism best in your application.

Quickstart

Install Parsl using pip:

$ pip3 install parsl

To run the Parsl tutorial notebooks you will need to install Jupyter:

$ pip3 install jupyter

Detailed information about setting up Jupyter with Python is available here.

Note: Parsl uses an opt-in model to collect usage statistics for reporting and improvement purposes. To understand what statistics are collected and how to enable collection, please refer to the usage tracking guide.
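A minimal sketch of opting in, assuming the usage_tracking flag on parsl.config.Config (recent releases may take an integer detail level rather than a boolean):

import parsl
from parsl.config import Config

# Usage tracking is off by default; opt in explicitly via the config.
with parsl.load(Config(usage_tracking=True)):
    pass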

Documentation

The complete parsl documentation is hosted here.

The Parsl tutorial is hosted on live Jupyter notebooks here.

For Developers

  1. Download Parsl:

    $ git clone https://github.com/Parsl/parsl
    
  2. Build and Test:

    $ cd parsl # navigate to the root directory of the project
    $ make   # show all available makefile targets
    $ make virtualenv # create a virtual environment
    $ source .venv/bin/activate # activate the virtual environment
    $ make deps # install python dependencies from test-requirements.txt
    $ make test # make (all) tests. Run "make config_local_test" for a faster, smaller test set.
    $ make clean # remove virtualenv and all test and build artifacts
    
  3. Install:

    $ cd parsl
    $ python3 setup.py install
    
  4. Use Parsl!

Requirements

Parsl is supported in Python 3.8+. Requirements can be found here. Requirements for running tests can be found here.

Code of Conduct

Parsl seeks to foster an open and welcoming environment - Please see the Parsl Code of Conduct for more details.

Contributing

We welcome contributions from the community. Please see our contributing guide.

parsl's People

Contributors

andrew-s-rosen, annawoodard, aymenfja, benclifford, benhg, btovar, cms21, colinthomas-z80, connorpigg, daheise, danielskatz, error-4u, garri1105, harichandra-prasath, hategan, khk-globus, kylechard, lgray, lhayhurst, lukaszlacinski, macintoshpie, ravihansa3000, rc-git, rjmello, sophie-bui, tjdasso, tphung3, wardlt, yadudoc, zhuozhaoli


parsl's Issues

quickstart: download needed?

On the quickstart part of the README, should we remove item 1 (download), since I think the pip install means that a manual download is not needed?

Usage Tracking and logging for Parsl project reporting

For reporting purposes we need to capture the following information

  • Hash of Username, hostname (ip)
  • Hash of Parsl script (if possible)
  • Count of apps launched by DFK
  • Site IDs used
  • Total run time

A UDP packet with this info could be sent from the Parsl script, then captured and logged on a server hosted on AWS.
Here are the components we'll need (see the sketch after this list):

  • Log info on the client side and send a UDP packet
  • Receive and log info to stable storage on our side.
  • Config option to opt-out of tracking
  • Documentation to clearly indicate the info collected and how to opt-out
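A hypothetical sketch of the client side (the function name and payload fields are illustrative, not Parsl's actual implementation):

import getpass
import hashlib
import json
import socket

def send_usage_report(server, port, app_count, site_ids, run_time):
    # Hash identifying fields so raw usernames/hostnames never leave the machine.
    uid = hashlib.sha256(
        (getpass.getuser() + socket.gethostname()).encode()
    ).hexdigest()
    payload = json.dumps({
        "uid": uid,
        "app_count": app_count,   # count of apps launched by the DFK
        "site_ids": site_ids,
        "run_time": run_time,
    }).encode()
    # UDP is fire-and-forget: a lost packet must never affect the workflow.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))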

Missing documentation from Libsubmit

When we split the libsubmit repo out of parsl, we also split out key documentation that Parsl needs.
We need to move this documentation back in, either via inter-project Sphinx links or by adding libsubmit back via git sub-repos.

Troubleshooting guide

We need a troubleshooting guide to help users understand and work around known issues:

  • No public ip when trying to do remote execution results in workers hanging.
  • Parsl does not autodetect public ip
  • Apps fail with engine failure errors when there's a python version mismatch between client and remote side.
  • Issues with serializing certain data structures (from @WardLT)
  • Parsl complains about missing boto3 when installed via pip

Output files from apps

If we have an app that produces no output files, we call such an app like:

<var:App_fu> = app_func ( .... )

However, if the app has outputs, the return is a tuple:

<var:App_fu>, [ <var:Data_fu> ... ] = app_func ( ...., outputs=[ Files ....])

When this was last discussed, one proposed solution was to simply return the app future in both cases, and have the data futures be a property of the app future. That would look like this:

# Single return
app_fu = app_func ( ... , outputs=['a.txt', 'b.txt'] )
# User has to unpack outputs separately 
x, y = app_fu.outputs

Right now, we have the outputs property added, but we continue to return a tuple of the app future and the list of data futures. I propose that these changes go into 0.3.0.

Splitting execution_providers out into a separate package

Other collaborators have expressed interest in sharing and building on the execution_providers component. This functionality is fairly self-contained and could easily be forked into a separate repo and package. Parsl will then depend on this separate package.

  • Split execution_providers into libsubmit repo
  • Create new pypi package libsubmit
  • Parsl to use libsubmit as a dependency

Circular dependency in Parsl

Reported by @dfj604

The following code will hang due to a circular dependency:

@App('python', dfk)
def App_A (x):
    import time
    time.sleep(0.2)
    return x*2

@App('python', dfk)
def AppSum (inputs=[]):
    return sum(inputs)

# Creating a list of App_Futures
app_futures = [ App_A(i) for i in range(0,2) ]
# Sum depends on the future results from the previous step
app_futures.extend([AppSum(inputs=app_futures)])
# However by extending app_futures with the future from AppSum, AppSum
# now depends on itself and will block forever.

print(app_futures)
[i.result() for i in app_futures]

While this may not be ideal user code, it is something Parsl should try to protect against.
Making a deep copy of the args and kwargs into immutable containers (say, tuples) could help; a sketch follows.
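A hypothetical sketch of that guard (illustrative helper, not Parsl's actual implementation):

def snapshot_args(args, kwargs):
    # Convert list arguments to tuples at submission time, so a later
    # app_futures.extend(...) by the caller cannot retroactively add an
    # app's own future to its dependency list.
    frozen_args = tuple(tuple(a) if isinstance(a, list) else a for a in args)
    frozen_kwargs = {k: tuple(v) if isinstance(v, list) else v
                     for k, v in kwargs.items()}
    return frozen_args, frozen_kwargs

With this, AppSum(inputs=app_futures) would capture a snapshot of the two existing futures, and extending the list afterwards would be harmless.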

README+ changes

I think the README should say what the current version is, and ideally, should include a high-level roadmap, maybe via a link to another .md file.

Swift-T exits as parsl pumps >1K jobs

This is an issue on branch swift_t_integrate at commit 83d21c7.

This is tested on Midway, and the following are steps to reproduce:

swift-t -l -n 6 executor.swift

Run the test with 100 tasks: This works.

python3 test.py -c 100

Run tests again with 1K tasks

python3 test.py -c 1000

This will fail with swift-t just exiting with no error messages and the parsl script hanging.

Support for S3 Files.

This issue is motivated by the need to execute workflows on AWS where data is often stored on S3.
We need the File types to support moving data to and from S3 as well as handle various auth mechanisms required to connect with S3 in a secure fashion. There are multiple problems that need to be tackled for this issue:

  1. How do we specify S3 files? Here's one possibility:
# Specifying an input file is easy
x = File("s3://parsl_dtest/foo.dat")

# Specifying an output file
# If foo.result is not in the CWD and is nested, this can get ugly
y = File("s3://parsl_dtest/foo.result")

fu = app_process(x, outputs=[y])
  2. Staging modes.
    Do S3 files get staged from the workers or from the client side? Are they going to be cached?
    Can staged files be shared between workers?

  3. Auth
    The auth model needs to be planned to handle the various modes by which AWS credentials can be supplied. We want to avoid credentials moving in plaintext between components, as well as discourage putting credentials in plaintext in workflow code.

Work on this requires a resolution of #9

This feature is required for SwiftSeq (@JasonJPitt)

Automatic retry for failed apps

In many failure cases Parsl could retry an app within some given parameters. We need to develop a model for specifying retry logic and representing different failure modes in parsl (e.g., app failure vs system failure).

Dominic and others have requested this feature.
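For reference, a retries knob along these lines did land in Parsl; a minimal sketch using the retries parameter on parsl.config.Config:

import parsl
from parsl import python_app
from parsl.config import Config

@python_app
def flaky():
    import random
    if random.random() < 0.5:
        raise RuntimeError("transient failure")
    return "ok"

# retries=2 re-runs a failed app up to two more times before the
# exception is surfaced through the future.
with parsl.load(Config(retries=2)):
    print(flaky().result())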

disconnections

What happens if a Parsl code starts an app, then disconnects?

For example, on an HPC system, Parsl is running on a front end node and launches a long-running HPC app, with another app that depends on it. While the HPC app is running, the front end node is restarted. The HPC app keeps running. Will the dependent app ever run?

name

How about PySwift.

Multisite support

This is to track ongoing work on the multisite branch.

Why is this important:

  1. This will allow us to manage apps with different resource requirements in a single workflow, such as apps that only need a core and should be executed on the login node vs. apps that run on multiple nodes with MPI.
  2. Lightweight apps executing on threads could perform flow control and launch heavy tasks.
  • Accept multiple executors in the dfk
  • Launch multiple executors in the DFK from the config definition
  • Support for multiple IPP executors
  • Attach specific tasks to sites via enhancements to the decorator to take a sites kwarg:
@App('bash', dfk, sites=['local'])
def foo(x):
    return 'echo $(({0} * 2))'.format(x)

@App('bash', dfk, sites='all')  # sites='all' is the default
def sleep_compute(x):
    return 'sleep {0}'.format(x)

Run directories

Workflows in production are often run in the same directory multiple times. Various log files, checkpoint files, submit scripts, config options, etc. are produced, and ideally they would be stored in a manner that allows the user to easily identify the files associated with a single workflow run. Having a run-specific directory would also help #39 and #48.

I propose that we have a run directory, named runNNN by default and placed in the current directory. Users would have the option to specify a different run directory path if they so choose; see the configuration sketch below.
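A sketch of how this surfaces in configuration, using the run_dir parameter that current Parsl exposes on parsl.config.Config (each run gets its own numbered subdirectory underneath):

import parsl
from parsl.config import Config

# Keep each workflow run's logs, checkpoints, and submit scripts together
# under its own per-run directory.
parsl.load(Config(run_dir="my_workflow_runs"))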

Request blocks from providers in response to workflow pressure.

Track available resources from each execution site, as well as workflow pressure to each site to make a determination about appropriately scaling to match workflow requirements.

This will need a few things:

  • Track outstanding tasks pending on a site
  • Implement an algorithm that uses task overflow counters and timers to balance responsiveness
    against effective measurement.
  • Integrate with provider interfaces to scale based on the task-pressure algorithm (see the sketch below)
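A configuration sketch of what this points toward, using the block-scaling parameters on current Parsl providers (treat the values as illustrative):

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            provider=LocalProvider(
                init_blocks=1,   # start with one block
                min_blocks=0,    # scale to zero when idle
                max_blocks=4,    # cap growth under heavy task pressure
            ),
        )
    ],
    strategy="simple",  # the scaling strategy that watches outstanding tasks
)
parsl.load(config)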

Support for containers

Support for running applications in containers. Make this a high level app type. Investigate support for reusing cached containers. Investigate support for containers as a sandbox.

Potentially we would want to support different container models for different systems. We'd want to define a single app that could then use different containers on each system.

Perhaps something like the following:

@App('container', dfk)
def foo(param1, param2):
    return {'singularity': {'container-uri': uri, 'config': foo, 'run': "param1 param2"},
            'docker': {'container-uri': uri, 'config': foo, 'run': "param1 param2"}}

Maybe better to wrap each container type as its own class?

Clarity on passing futures to apps

From conversation with @WardLT:

The current doc set lacks clarity on how arguments to app functions work. For example:

  • Explain with examples special keywords (stdout, stderr, inputs, outputs)
  • How futures can be passed as arguments
  • Clarify passing an arbitrary number of futures as a list via the special keyword inputs (see the sketch below)
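A minimal sketch of the inputs pattern those docs should spell out, assuming current Parsl semantics (futures passed via inputs are resolved to their results before the app body runs):

import parsl
from parsl import python_app

@python_app
def inc(x):
    return x + 1

@python_app
def total(inputs=()):
    # By the time the body runs, each future in `inputs` has been
    # replaced by its result.
    return sum(inputs)

with parsl.load():
    parts = [inc(i) for i in range(10)]   # an arbitrary number of futures
    print(total(inputs=parts).result())   # 55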

Workflow-level checkpointing

We need to provide the ability to record workflow state (e.g., task status, input/output, task command) and export it to an external store (e.g., database, file). Allow for restart from the checkpoint.
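A sketch using the checkpointing knobs that later landed in Parsl (checkpoint_mode on Config, get_all_checkpoints to resume; apps opt in to memoization with cache=True):

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.utils import get_all_checkpoints

@python_app(cache=True)   # checkpointing rides on app memoization
def expensive(x):
    return x ** 2

config = Config(
    checkpoint_mode="task_exit",              # record results as tasks finish
    checkpoint_files=get_all_checkpoints(),   # reuse checkpoints from prior runs
)

with parsl.load(config):
    print(expensive(7).result())   # computed once, replayed on restart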

Exception handling

Exception handling in Parsl should support the following cases:

  1. Exceptions raised or exit codes returned
  2. File outputs specified but missing
  3. Input futures failed
  4. Walltime exceeded
  5. [Later] Data staging failed

We should also support this same set for both Bash and Python apps.

Unknowns:

  1. Where in the code should an exception be raised/handled for asynchronous apps.
  2. Retries and retry condition checks.

Add CONTRIBUTING.md

Since the number of contributors is growing, we should add the usual CONTRIBUTING.md document (example, example) to specify project conventions. This is not a super exciting task, but I think it will reduce onboarding time. I think it's more common to have this on the top-level (and we could link to it from docs/devguide, where we already have some great info), but I don't think it matters either way.

Here's a starting point for things that should probably be included.

Reporting issues

  • Instructions for reporting issues

Contributing code

  • Coding style conventions (this should probably just be "follow PEP8", or we could link to, e.g., the Google style guide)
  • Link to a description of which docstring style convention to follow (example)
  • Point to linting config (example)
  • Add examples of how to configure autopep8/yapf or equivalent with emacs and vim
  • Instructions for running tests
  • Description (or link to one) of the development workflow (example)
  • Description of how code reviews should be assigned
  • Instructions for commit messages (could add a link for further reference like this)

Execution Providers

This is a placeholder to track progress on feature development towards building Execution Providers.

Execution Providers are interfaces to compute resources such as clouds, clusters with schedulers, and container orchestration systems. Such an interface provides a reliable method to start tasks, and most often a pilot mechanism that offers higher task dispatch rates.

As of this post we have the following:

  • Execution Provider interface
  • Slurm provider, tested on RCC/Midway and NERSC/Cori
  • AWS provider
  • Azure provider
  • Jetstream provider that partially complies.
  • PBS provider
  • Condor provider (@LincolnBryant)

Part of this work is being tracked on this google doc

new label needed for issues

Can we create a documentation label, so we can mark issues that are related to documentation? I either don't have permission to do this or can't figure out how in GitHub - or maybe both...

scriptDir vs script_dir inconsistencies.

The libsubmit documentation and the general configuration docs mention both script_dir and scriptDir. This inconsistency needs to be fixed in parsl, libsubmit, and dependent codes.

It might make sense to release this fix ahead of schedule as a point release.

Capture diagnostic information

Users would like to automate the capture of various levels of diagnostic information (sometimes used for provenance). Examples include: node info, OS level info, environment variables, scheduler info, code version, and resource usage. As a first cut we should provide the ability to capture arbitrary information via pre- and post-launch hooks. We can implement examples of these hooks to capture basic information such as that described above.
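A hypothetical sketch of what such a pre-launch hook could capture (illustrative only; these hooks are not an existing Parsl API):

import os
import platform

def capture_node_diagnostics():
    # Snapshot basic provenance info before launching an app.
    u = platform.uname()
    return {
        "node": {"system": u.system, "node": u.node,
                 "release": u.release, "machine": u.machine},
        "python": platform.python_version(),
        "env": dict(os.environ),                          # environment variables
        "scheduler_job": os.environ.get("SLURM_JOB_ID"),  # scheduler info, if any
    }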

publish as conda package

This would be useful to install as a conda package. Can you add that to your roadmap? Suggest you create a channel on anaconda.org to publish the conda package.

AppFutures do not update

AppFutures do not update when result() is called before the job has been passed to the executor.

Support Butler file objects

DESC Requirement

This depends on File object issues #9 and #38

  • Explore butler file objects.
  • How to determine if butler file objects were created.
  • Identify when a butler output is a collection?

This is still very vague and needs research into what is needed to support butler data objects in Parsl workflows. With all the unknowns this should be Parsl-0.5.0. Perhaps @annawoodard could help understand this space better.

Butler info: https://github.com/lsst/daf_butler

Bug

There seems to be a bug preventing the recursive Fibonacci test run with IPython Parallel from passing. Although it passed the Travis CI check, it's not working on my local machine.

Live visualization of execution graph

This feature is motivated by Hemant's request to enable visualization of the execution graph and more importantly simplify the user-flow involved in determining failures, retries and recovery.

Here's a breakdown in terms of priority:

  • Visualize the DAG with tasks as vertices and futures/dependencies as edges
  • Update the DAG periodically by polling the tasks data structure in the dfk
  • Color code the vertices, mapping the run states
  • Enable hover to show status / fail codes/ exceptions etc as appropriate
  • Enable editing of the task app args
  • Design error flows to describe automated healing and user-assisted healing.
  • Enable resetting of task state to indicate a retry request

Better reporting of module import failures in apps

From conversations with @WardLT:

While our documentation points out that all modules required by the app call should be imported by the user in the function body, it is easy to miss some. Better reporting of these errors would be useful to the user.

Upload to PyPI

Test and ensure the setup works correctly.

  1. Add travis.yml to confirm that builds and basic tests pass.
  2. Upload to PyPI

Futures have inconsistent behavior in bash app fn body

For example:

@App('bash', dfk)
def app1(inputs=[], outputs=[], stdout=None, stderr=None, mock=False):
    cmd_line = '''echo 'test' > {outputs[0]}'''

@App('bash', dfk)
def app2(inputs=[], outputs=[], stdout=None, stderr=None, mock=False):
    with open('somefile.txt', 'w') as f:
        f.write("%s\n" % inputs[0])   # <--- here inputs[0] is a DataFuture
    cmd_line = '''echo '{inputs[0]}' > {outputs[0]}'''

app1_future = app1(inputs=[],
                   outputs=["simple-out.txt"])
# app1_future.result()

app2_future = app2(inputs=[app1_future.outputs[0]],
                   outputs=["simple-out2.txt"])
app2_future.result()

One fix is to evaluate the fn body entirely at app execution time. This needs some work.

hide dfk

If 99% of users and usages don't use the dfk variable, but simply pass it in and out of various functions, we should hide it by default.

Can we do this in a backward-compatible way? (Make dfk an optional arg.)

This probably requires major documentation/tutorial changes.

Error when IPP is not installed

It would be nice if Parsl could catch this error and relay something useful to the user, like "install IPP".

Traceback (most recent call last):
  File "/home/chard/parsl-source/parsl/parsl/dataflow/start_controller.py", line 87, in __init__
    self.proc = subprocess.Popen(opts, stdout=stdout, stderr=stderr, preexec_fn=os.setsid)
  File "/home/chard/miniconda3/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/home/chard/miniconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ipcontroller': 'ipcontroller'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "simple-parsl.py", line 39, in <module>
    dfk = DataFlowKernel(config=ipp_config)
  File "/home/chard/parsl-source/parsl/parsl/dataflow/dflow.py", line 66, in __init__
    self.controller_proc = Controller(**self.config["controller"])
  File "/home/chard/parsl-source/parsl/parsl/dataflow/start_controller.py", line 91, in __init__
    raise ControllerErr(msg)
parsl.dataflow.error.ControllerErr: Controller init failed:Reason:IPPController failed to start: [Errno 2] No such file or directory: 'ipcontroller': 'ipcontroller'

File types

We currently don't have a file type; we use strings holding paths to represent files.
As a result, it is difficult to differentiate between real strings and strings representing files.
We need to be able to identify files, especially in the returns list, to make sure that the file was created/exists, and to raise an app failure if not.
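For reference, a sketch using the File type Parsl later grew (import path as in current Parsl; process is a hypothetical app):

from parsl.data_provider.files import File

# A File marks a value as a file dependency rather than an ordinary string,
# so the runtime can check that declared outputs were actually created.
inp = File("inputs/data.csv")
out = File("results/summary.txt")

fu = process(inputs=[inp], outputs=[out])   # `process` is a hypothetical app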

Enable passing env variables to Parsl apps

Here's some code that should work but is not currently supported:

@App('bash', dfk)
def foo(env={'key': 'value'}):
    return ''' echo $key '''   # <--- this should work
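One hypothetical way the proposed env kwarg could be lowered into the command line is to prefix the script with export statements (illustrative helper, not an existing Parsl API):

import shlex

def with_env(cmd, env):
    # Render each env entry as an export statement ahead of the command.
    exports = "; ".join(f"export {k}={shlex.quote(str(v))}" for k, v in env.items())
    return f"{exports}; {cmd}" if exports else cmd

print(with_env("echo $key", {"key": "value"}))   # export key=value; echo $key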

RFC - Bash App definition change

We currently use a magic variable "cmd_line" in bash apps to specify the command-line string to be executed. For example:

@App('bash', dfk)
def hello():
    cmd_line = "echo 'Hello world'"

This is bad for a few reasons:

  1. This is surprising behavior in python
  2. There's ambiguity in behavior if the user were to reassign to cmd_line
  3. Any values returned are certainly lost (again unusual behavior for python)
  4. Implementation-wise, we trace the value assigned to the magic variable, and this is computationally expensive.

I recommend that we switch to having bash apps return a string, with that string treated as the command line to execute. If we decide this change is acceptable, we should merge it for 0.3.0.
For example:

@App('bash', dfk)
def hello():
    return "echo 'Hello World'"

Support for application profiling

Support for application profiling with reporting on:

  1. CPU usage
  2. Memory usage
  3. Disk IO
  4. Target resource info (env, uname -a, etc.)

Perhaps we could collect this only for apps that run longer than N minutes, to reduce profiling overhead for trivial tasks.
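A hypothetical wrapper sketching the kind of measurement meant here, using the standard resource module (not an existing Parsl feature):

import functools
import resource
import time

def profiled(fn):
    # Report wall time and peak RSS for a wrapped function body.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        result = fn(*args, **kwargs)
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"wall={time.monotonic() - start:.1f}s "
              f"peak_rss={peak}")   # kB on Linux, bytes on macOS
        return result
    return wrapper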
