parsl / parsl

Parsl - a Python parallel scripting library

Home Page: http://parsl-project.org

License: Apache License 2.0

Languages: Python 97.75%, Shell 0.87%, Makefile 0.36%, C 0.09%, CSS 0.02%, HTML 0.90%, Roff 0.01%
Topics: hacktoberfest

parsl's Introduction

Parsl - Parallel Scripting Library


Parsl extends parallelism in Python beyond a single computer.

You can use Parsl just like Python's parallel executors but across multiple cores and nodes. However, the real power of Parsl is in expressing multi-step workflows of functions. Parsl lets you chain functions together and will launch each function as inputs and computing resources are available.

import parsl
from parsl import python_app


# Make functions parallel by decorating them
@python_app
def f(x):
    return x + 1

@python_app
def g(x, y):
    return x + y

# Start Parsl on a single computer
with parsl.load():
    # These functions now return Futures
    future = f(1)
    assert future.result() == 2

    # Functions run concurrently, can be chained
    f_a, f_b = f(2), f(3)
    future = g(f_a, f_b)
    assert future.result() == 7

Start with the configuration quickstart to learn how to tell Parsl how to use your computing resource, then explore the parallel computing patterns to determine how to use parallelism best in your application.

Quickstart

Install Parsl using pip:

$ pip3 install parsl

To run the Parsl tutorial notebooks you will need to install Jupyter:

$ pip3 install jupyter

Detailed information about setting up Jupyter with Python is available here.

Note: Parsl uses an opt-in model to collect usage statistics for reporting and improvement purposes. To understand what statistics are collected and how to enable collection, please refer to the usage tracking guide.
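A minimal sketch of opting in, assuming the usage_tracking flag on parsl.config.Config (recent releases may take an integer detail level rather than a boolean):

import parsl
from parsl.config import Config

# Usage tracking is off by default; opt in explicitly via the config.
with parsl.load(Config(usage_tracking=True)):
    pass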

Documentation

The complete parsl documentation is hosted here.

The Parsl tutorial is hosted on live Jupyter notebooks here.

For Developers

  1. Download Parsl:

    $ git clone https://github.com/Parsl/parsl
    
  2. Build and Test:

    $ cd parsl # navigate to the root directory of the project
    $ make   # show all available makefile targets
    $ make virtualenv # create a virtual environment
    $ source .venv/bin/activate # activate the virtual environment
    $ make deps # install python dependencies from test-requirements.txt
    $ make test # make (all) tests. Run "make config_local_test" for a faster, smaller test set.
    $ make clean # remove virtualenv and all test and build artifacts
    
  3. Install:

    $ cd parsl
    $ python3 setup.py install
    
  4. Use Parsl!

Requirements

Parsl is supported in Python 3.8+. Requirements can be found here. Requirements for running tests can be found here.

Code of Conduct

Parsl seeks to foster an open and welcoming environment - Please see the Parsl Code of Conduct for more details.

Contributing

We welcome contributions from the community. Please see our contributing guide.

parsl's People

Contributors

andrew-s-rosen, annawoodard, aymenfja, benclifford, benhg, btovar, cms21, colinthomas-z80, connorpigg, daheise, danielskatz, error-4u, garri1105, harichandra-prasath, hategan, khk-globus, kylechard, lgray, lhayhurst, lukaszlacinski, macintoshpie, ravihansa3000, rc-git, rjmello, sophie-bui, tjdasso, tphung3, wardlt, yadudoc, zhuozhaoli


parsl's Issues

quickstart: download needed?

On the quickstart part of the README, should we remove item 1 (download), since I think the pip install means that a manual download is not needed?

Usage Tracking and logging for Parsl project reporting

For reporting purposes we need to capture the following information

  • Hash of Username, hostname (ip)
  • Hash of Parsl script (if possible)
  • Count of apps launched by DFK
  • Site IDs used
  • Total run time

A UDP packet with this info could be sent from the Parsl script, then captured and logged on a server hosted on AWS.
Here are the components we'll need (see the sketch after this list):

  • Log info on the client side and send a UDP packet
  • Receive and log info to stable storage on our side.
  • Config option to opt-out of tracking
  • Documentation to clearly indicate the info collected and how to opt-out
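A hypothetical sketch of the client side (the function name and payload fields are illustrative, not Parsl's actual implementation):

import getpass
import hashlib
import json
import socket

def send_usage_report(server, port, app_count, site_ids, run_time):
    # Hash identifying fields so raw usernames/hostnames never leave the machine.
    uid = hashlib.sha256(
        (getpass.getuser() + socket.gethostname()).encode()
    ).hexdigest()
    payload = json.dumps({
        "uid": uid,
        "app_count": app_count,   # count of apps launched by the DFK
        "site_ids": site_ids,
        "run_time": run_time,
    }).encode()
    # UDP is fire-and-forget: a lost packet must never affect the workflow.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))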

Missing documentation from Libsubmit

When we split the libsubmit repo out of parsl, we also split out key documentation that Parsl needs.
We need to move this documentation back in, either via inter-project Sphinx links or by adding libsubmit back via git sub-repos.

Troubleshooting guide

We need a troubleshooting guide to help users understand and work around known issues:

  • No public ip when trying to do remote execution results in workers hanging.
  • Parsl does not autodetect public ip
  • Apps fail with engine failure errors when there's a python version mismatch between client and remote side.
  • Issues with serializing certain data structures (from @WardLT)
  • Parsl complains about missing boto3 when installed via pip

Output files from apps

If we have an app that produces no output files, we call such an app like:

<var:App_fu> = app_func ( .... )

However, if the app has outputs, the return is a tuple:

<var:App_fu>, [ <var:Data_fu> ... ] = app_func ( ...., outputs=[ Files ....])

When this was last discussed, one proposed solution was to simply return the app future in both cases, and have the data futures be a property of the app future. That would look like this:

# Single return
app_fu = app_func ( ... , outputs=['a.txt', 'b.txt'] )
# User has to unpack outputs separately 
x, y = app_fu.outputs

Right now, we have the outputs property added, but we continue to return a tuple of the app future and the list of data futures. I propose that these changes go into 0.3.0.

Splitting execution_providers out into a separate package

Other collaborators have expressed interest in sharing and building on the execution_providers component. This functionality is fairly self-contained and could easily be forked into a separate repo and package. Parsl will then depend on this separate package.

  • Split execution_providers into libsubmit repo
  • Create new pypi package libsubmit
  • Parsl to use libsubmit as a dependency

Circular dependency in Parsl

Reported by @dfj604

The following code will hang due to a circular dependency:

@App('python', dfk)
def App_A (x):
    import time
    time.sleep(0.2)
    return x*2

@App('python', dfk)
def AppSum (inputs=[]):
    return sum(inputs)

# Creating a list of App_Futures
app_futures = [ App_A(i) for i in range(0,2) ]
# Sum depends on the future results from the previous step
app_futures.extend([AppSum(inputs=app_futures)])
# However by extending app_futures with the future from AppSum, AppSum
# now depends on itself and will block forever.

print(app_futures)
[i.result() for i in app_futures]

While this may not be ideal user code, it is something Parsl should try to protect against.
Making a deep copy of the args and kwargs into immutable containers (say, tuples) could help; a sketch follows.
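A hypothetical sketch of that guard (illustrative helper, not Parsl's actual implementation):

def snapshot_args(args, kwargs):
    # Convert list arguments to tuples at submission time, so a later
    # app_futures.extend(...) by the caller cannot retroactively add an
    # app's own future to its dependency list.
    frozen_args = tuple(tuple(a) if isinstance(a, list) else a for a in args)
    frozen_kwargs = {k: tuple(v) if isinstance(v, list) else v
                     for k, v in kwargs.items()}
    return frozen_args, frozen_kwargs

With this, AppSum(inputs=app_futures) would capture a snapshot of the two existing futures, and extending the list afterwards would be harmless.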

README+ changes

I think the README should say what the current version is, and ideally, should include a high-level roadmap, maybe via a link to another .md file.

Swift-T exits as parsl pumps >1K jobs

This is an issue on branch swift_t_integrate at commit 83d21c7.

This is tested on Midway, and the following are steps to reproduce:

swift-t -l -n 6 executor.swift

Run the test with 100 tasks: This works.

python3 test.py -c 100

Run tests again with 1K tasks

python3 test.py -c 1000

This will fail with swift-t just exiting with no error messages and the parsl script hanging.

Support for S3 Files.

This issue is motivated by the need to execute workflows on AWS where data is often stored on S3.
We need the File types to support moving data to and from S3 as well as handle various auth mechanisms required to connect with S3 in a secure fashion. There are multiple problems that need to be tackled for this issue:

  1. How do we specify S3 files? Here's one possibility:
# Specifying an input file is easy
x = File("s3://parsl_dtest/foo.dat")

# Specifying an output file
# If foo.result is not in the CWD and is nested, this can get ugly
y = File("s3://parsl_dtest/foo.result")

fu = app_process(x, outputs=[y])
  2. Staging modes.
    Do S3 files get staged from the workers or from the client side? Are they going to be cached?
    Can staged files be shared between workers?

  3. Auth
    The auth model needs to be planned to handle the various modes by which AWS credentials can be supplied. We want to avoid credentials moving in plaintext between components, as well as discourage putting credentials in plaintext in workflow code.

Work on this requires a resolution of #9

This feature is required for SwiftSeq (@JasonJPitt)

Automatic retry for failed apps

In many failure cases Parsl could retry an app within some given parameters. We need to develop a model for specifying retry logic and representing different failure modes in parsl (e.g., app failure vs system failure).

Dominic and others have requested this feature.
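For reference, a retries knob along these lines did land in Parsl; a minimal sketch using the retries parameter on parsl.config.Config:

import parsl
from parsl import python_app
from parsl.config import Config

@python_app
def flaky():
    import random
    if random.random() < 0.5:
        raise RuntimeError("transient failure")
    return "ok"

# retries=2 re-runs a failed app up to two more times before the
# exception is surfaced through the future.
with parsl.load(Config(retries=2)):
    print(flaky().result())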

disconnections

What happens if a Parsl code starts an app, then disconnects?

For example, on an HPC system, Parsl is running on a front end node and launches a long-running HPC app, with another app that depends on it. While the HPC app is running, the front end node is restarted. The HPC app keeps running. Will the dependent app ever run?

name

How about PySwift.

Multisite support

This is to track ongoing work on the multisite branch.

Why is this important:

  1. This will allow us to manage apps with different resource requirements in a single workflow, such as apps that only need a core and should be executed on the login node vs. apps that run on multiple nodes with MPI.
  2. Lightweight apps executing on threads could perform flow control and launch heavy tasks.
  • Accept multiple executors in the dfk
  • Launch multiple executors in the DFK from the config definition
  • Support for multiple IPP executors
  • Attach specific tasks to sites via enhancements to the decorator to take a sites kwarg:
@App('bash', dfk, sites=['local'])
def foo(x):
    return 'echo $(({0} * 2))'.format(x)

@App('bash', dfk, sites='all')  # sites='all' is the default
def sleep_compute(x):
    return 'sleep {0}'.format(x)

Run directories

Workflows in production are often run in the same directory multiple times. Various log files, checkpoint files, submit scripts, config options, etc. are produced, and ideally they would be stored in a manner that allows the user to easily identify the files associated with a single workflow run. Having a run-specific directory would also help #39 and #48.

I propose that we have a run directory, named runNNN by default and placed in the current directory. Users would have the option to specify a different run directory path if they so choose; see the configuration sketch below.
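A sketch of how this surfaces in configuration, using the run_dir parameter that current Parsl exposes on parsl.config.Config (each run gets its own numbered subdirectory underneath):

import parsl
from parsl.config import Config

# Keep each workflow run's logs, checkpoints, and submit scripts together
# under its own per-run directory.
parsl.load(Config(run_dir="my_workflow_runs"))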

Request blocks from providers in response to workflow pressure.

Track available resources from each execution site, as well as workflow pressure to each site to make a determination about appropriately scaling to match workflow requirements.

This will need a few things:

  • Track outstanding tasks pending on a site
  • Implement an algorithm that uses task overflow counters and timers to balance responsiveness
    against effective measurement.
  • Integrate with provider interfaces to scale based on the task-pressure algorithm (see the sketch below)
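A configuration sketch of what this points toward, using the block-scaling parameters on current Parsl providers (treat the values as illustrative):

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            provider=LocalProvider(
                init_blocks=1,   # start with one block
                min_blocks=0,    # scale to zero when idle
                max_blocks=4,    # cap growth under heavy task pressure
            ),
        )
    ],
    strategy="simple",  # the scaling strategy that watches outstanding tasks
)
parsl.load(config)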

Support for containers

Support for running applications in containers. Make this a high level app type. Investigate support for reusing cached containers. Investigate support for containers as a sandbox.

Potentially we would want to support different container models for different systems. We'd want to define a single app that could then use different containers on each system.

Perhaps something like the following:

@App('container', dfk)
def foo(param1, param2):
    return {'singularity': {'container-uri': uri, 'config': foo, 'run': "param1 param2"},
            'docker': {'container-uri': uri, 'config': foo, 'run': "param1 param2"}}

Maybe better to wrap each container type as its own class?

Clarity on passing futures to apps

From conversation with @WardLT:

The current doc set lacks clarity on how arguments to app functions work. For example:

  • Explain with examples special keywords (stdout, stderr, inputs, outputs)
  • How futures can be passed as arguments
  • Clarify passing an arbitrary number of futures as a list via the special keyword inputs (see the sketch below)
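A minimal sketch of the inputs pattern those docs should spell out, assuming current Parsl semantics (futures passed via inputs are resolved to their results before the app body runs):

import parsl
from parsl import python_app

@python_app
def inc(x):
    return x + 1

@python_app
def total(inputs=()):
    # By the time the body runs, each future in `inputs` has been
    # replaced by its result.
    return sum(inputs)

with parsl.load():
    parts = [inc(i) for i in range(10)]   # an arbitrary number of futures
    print(total(inputs=parts).result())   # 55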

Workflow-level checkpointing

We need to provide the ability to record workflow state (e.g., task status, input/output, task command) and export it to an external store (e.g., database, file). Allow for restart from the checkpoint.
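A sketch using the checkpointing knobs that later landed in Parsl (checkpoint_mode on Config, get_all_checkpoints to resume; apps opt in to memoization with cache=True):

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.utils import get_all_checkpoints

@python_app(cache=True)   # checkpointing rides on app memoization
def expensive(x):
    return x ** 2

config = Config(
    checkpoint_mode="task_exit",              # record results as tasks finish
    checkpoint_files=get_all_checkpoints(),   # reuse checkpoints from prior runs
)

with parsl.load(config):
    print(expensive(7).result())   # computed once, replayed on restart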

Exception handling

Exception handling in Parsl should support the following cases:

  1. Exceptions raised or exit codes returned
  2. File outputs specified but missing
  3. Input futures failed
  4. Walltime exceeded
  5. [Later] Data staging failed

We should also support this same set for both Bash and Python apps.

Unknowns:

  1. Where in the code should an exception be raised/handled for asynchronous apps.
  2. Retries and retry condition checks.

Add CONTRIBUTING.md

Since the number of contributors is growing, we should add the usual CONTRIBUTING.md document (example, example) to specify project conventions. This is not a super exciting task, but I think it will reduce onboarding time. I think it's more common to have this on the top-level (and we could link to it from docs/devguide, where we already have some great info), but I don't think it matters either way.

Here's a starting point for things that should probably be included.

Reporting issues

  • Instructions for reporting issues

Contributing code

  • Coding style conventions (this should probably just be "follow PEP8", or we could link to, e.g., the Google style guide)
  • Link to a description of which docstring style convention to follow (example)
  • Point to linting config (example)
  • Add examples of how to configure autopep8/yapf or equivalent with emacs and vim
  • Instructions for running tests
  • Description (or link to one) of the development workflow (example)
  • Description of how code reviews should be assigned
  • Instructions for commit messages (could add a link for further reference like this)

Execution Providers

This is a placeholder to track progress on feature development towards building Execution Providers.

Execution Providers are interfaces to compute resources such as clouds, clusters with schedulers, and container orchestration systems. Such an interface provides a reliable method to start tasks, and most often a pilot mechanism that offers higher task dispatch rates.

As of this post we have the following:

  • Execution Provider interface
  • Slurm provider, tested on RCC/Midway and NERSC/Cori
  • AWS provider
  • Azure provider
  • Jetstream provider that partially complies.
  • PBS provider
  • Condor provider (@LincolnBryant)

Part of this work is being tracked on this google doc

new label needed for issues

Can we create a documentation label, so we can mark issues that are related to documentation? I either don't have permission to do this or can't figure out how in GitHub - or maybe both...

scriptDir vs script_dir inconsistencies.

The libsubmit documentation and the general configuration docs mention both script_dir and scriptDir. This inconsistency needs to be fixed in parsl, libsubmit, and dependent codes.

It might make sense to release this fix ahead of schedule as a point release.

Capture diagnostic information

Users would like to automate the capture of various levels of diagnostic information (sometimes used for provenance). Examples include: node info, OS level info, environment variables, scheduler info, code version, and resource usage. As a first cut we should provide the ability to capture arbitrary information via pre- and post-launch hooks. We can implement examples of these hooks to capture basic information such as that described above.
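A hypothetical sketch of what such a pre-launch hook could capture (illustrative only; these hooks are not an existing Parsl API):

import os
import platform

def capture_node_diagnostics():
    # Snapshot basic provenance info before launching an app.
    u = platform.uname()
    return {
        "node": {"system": u.system, "node": u.node,
                 "release": u.release, "machine": u.machine},
        "python": platform.python_version(),
        "env": dict(os.environ),                          # environment variables
        "scheduler_job": os.environ.get("SLURM_JOB_ID"),  # scheduler info, if any
    }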

publish as conda package

This would be useful to install as a conda package. Can you add that to your roadmap? Suggest you create a channel on anaconda.org to publish the conda package.

AppFutures do not update

AppFutures do not update when result() is called before the job has been passed to the executor.

Support Butler file objects

DESC Requirement

This depends on File object issues #9 and #38

  • Explore butler file objects.
  • How to determine if butler file objects were created.
  • Identify when a butler output is a collection?

This is still very vague and needs research into what is needed to support butler data objects in Parsl workflows. With all the unknowns this should be Parsl-0.5.0. Perhaps @annawoodard could help understand this space better.

Butler info: https://github.com/lsst/daf_butler

Bug

There seems to be a bug preventing the recursive Fibonacci test run with IPython Parallel from passing. Although it passed the Travis CI check, it's not working on my local machine.

Live visualization of execution graph

This feature is motivated by Hemant's request to enable visualization of the execution graph and more importantly simplify the user-flow involved in determining failures, retries and recovery.

Here's a breakdown in terms of priority:

  • Visualize the DAG with tasks as vertices and futures/dependencies as edges
  • Update the DAG periodically by polling the tasks data structure in the dfk
  • Color code the vertices, mapping the run states
  • Enable hover to show status / fail codes/ exceptions etc as appropriate
  • Enable editing of the task app args
  • Design error flows to describe automated healing and user-assisted healing.
  • Enable resetting of task state to indicate a retry request

Better reporting of module import failures in apps

From conversations with @WardLT:

While our documentation points out that all modules required by the app call should be imported by the user in the function body, it is easy to miss some. Better reporting of these errors would be useful to the user.

Upload to PyPI

Test and ensure the setup works correctly.

  1. Add travis.yml to confirm that builds and basic tests pass.
  2. Upload to PyPI

Futures have inconsistent behavior in bash app fn body

For example:

@App('bash', dfk)
def app1(inputs=[], outputs=[], stdout=None, stderr=None, mock=False):
    cmd_line = '''echo 'test' > {outputs[0]}'''

@App('bash', dfk)
def app2(inputs=[], outputs=[], stdout=None, stderr=None, mock=False):
    with open('somefile.txt', 'w') as f:
        f.write("%s\n" % inputs[0])   # <--- here inputs[0] is a DataFuture
    cmd_line = '''echo '{inputs[0]}' > {outputs[0]}'''

app1_future = app1(inputs=[],
                   outputs=["simple-out.txt"])
# app1_future.result()

app2_future = app2(inputs=[app1_future.outputs[0]],
                   outputs=["simple-out2.txt"])
app2_future.result()

One fix is to evaluate the fn body entirely at app execution time. This needs some work.

hide dfk

If 99% of users and usages don't use the dfk variable, but simply pass it in and out of various functions, we should hide it by default.

Can we do this in a backward-compatible way? (Make dfk an optional arg.)

This probably requires major documentation/tutorial changes.

Error when IPP is not installed

It would be nice if Parsl could catch this error and relay something useful to the user, like "install IPP".

Traceback (most recent call last):
  File "/home/chard/parsl-source/parsl/parsl/dataflow/start_controller.py", line 87, in __init__
    self.proc = subprocess.Popen(opts, stdout=stdout, stderr=stderr, preexec_fn=os.setsid)
  File "/home/chard/miniconda3/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/home/chard/miniconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ipcontroller': 'ipcontroller'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "simple-parsl.py", line 39, in <module>
    dfk = DataFlowKernel(config=ipp_config)
  File "/home/chard/parsl-source/parsl/parsl/dataflow/dflow.py", line 66, in __init__
    self.controller_proc = Controller(**self.config["controller"])
  File "/home/chard/parsl-source/parsl/parsl/dataflow/start_controller.py", line 91, in __init__
    raise ControllerErr(msg)
parsl.dataflow.error.ControllerErr: Controller init failed:Reason:IPPController failed to start: [Errno 2] No such file or directory: 'ipcontroller': 'ipcontroller'

File types

We currently don't have a file type; we use strings holding paths to represent files.
As a result, it is difficult to differentiate between real strings and strings representing files.
We need to be able to identify files, especially in the returns list, to make sure that the file was created/exists, and to raise an app failure if not.
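For reference, a sketch using the File type Parsl later grew (import path as in current Parsl; process is a hypothetical app):

from parsl.data_provider.files import File

# A File marks a value as a file dependency rather than an ordinary string,
# so the runtime can check that declared outputs were actually created.
inp = File("inputs/data.csv")
out = File("results/summary.txt")

fu = process(inputs=[inp], outputs=[out])   # `process` is a hypothetical app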

Enable passing env variables to Parsl apps

Here's some code that should work but is not currently supported:

@App('bash', dfk)
def foo(env={'key': 'value'}):
    return ''' echo $key '''   # <--- this should work
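One hypothetical way the proposed env kwarg could be lowered into the command line is to prefix the script with export statements (illustrative helper, not an existing Parsl API):

import shlex

def with_env(cmd, env):
    # Render each env entry as an export statement ahead of the command.
    exports = "; ".join(f"export {k}={shlex.quote(str(v))}" for k, v in env.items())
    return f"{exports}; {cmd}" if exports else cmd

print(with_env("echo $key", {"key": "value"}))   # export key=value; echo $key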

RFC - Bash App definition change

We currently use a magic variable "cmd_line" in bash apps to specify the command-line string to be executed. For example:

@App('bash', dfk)
def hello():
    cmd_line = "echo 'Hello world'"

This is bad for a few reasons:

  1. This is surprising behavior in python
  2. There's ambiguity in behavior if the user were to reassign to cmd_line
  3. Any values returned are certainly lost (again unusual behavior for python)
  4. Implementation-wise, we trace the value assigned to the magic variable, and this is computationally expensive.

I recommend that we switch to having bash apps return a string, with that string treated as the command line to execute. If we decide this change is acceptable, we should merge it for 0.3.0.
For example:

@App('bash', dfk)
def hello():
    return "echo 'Hello World'"

Support for application profiling

Support for application profiling with reporting on:

  1. CPU usage
  2. Memory usage
  3. Disk IO
  4. Target resource info (env, uname -a, etc.)

Perhaps we could collect this only for apps that run longer than N minutes, to reduce profiling overhead for trivial tasks.
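A hypothetical wrapper sketching the kind of measurement meant here, using the standard resource module (not an existing Parsl feature):

import functools
import resource
import time

def profiled(fn):
    # Report wall time and peak RSS for a wrapped function body.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        result = fn(*args, **kwargs)
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"wall={time.monotonic() - start:.1f}s "
              f"peak_rss={peak}")   # kB on Linux, bytes on macOS
        return result
    return wrapper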
