mikedacre / fyrd

21 stars · 21 watchers · 8 forks · 6.71 MB

Submit functions and shell scripts to Torque and Slurm clusters, or to local machines, using Python.

Home Page: https://fyrd.science

License: MIT License

Languages: Python 98.90%, Shell 1.10%

Topics: bioinformatics-pipeline, library, python, python2, python3, python3-library, slurm, slurm-cluster, torque

fyrd's People

Contributors: jbloom, mikedacre, takluyver

fyrd's Issues

Add a profile management script to bin

Profile management is currently only useful for relatively advanced Python users; we should write a management script that can call all of the config_file functions easily.
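A rough sketch of what such a script could look like, assuming config_file exposes profile getters (the function names used here are hypothetical):

#!/usr/bin/env python
"""Hypothetical sketch of a bin/fyrd-profile script."""
import argparse
from fyrd import config_file  # module named in this issue

def main():
    parser = argparse.ArgumentParser(description='Manage fyrd profiles')
    sub = parser.add_subparsers(dest='cmd')
    sub.add_parser('list', help='List all profiles')
    show = sub.add_parser('show', help='Show one profile')
    show.add_argument('name')
    args = parser.parse_args()
    if args.cmd == 'list':
        print(config_file.get_profiles())          # hypothetical function
    elif args.cmd == 'show':
        print(config_file.get_profile(args.name))  # hypothetical function

if __name__ == '__main__':
    main()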

Problem with array job id

Hi
In our queue system there are some array jobs with job ids like: 136374[].psk
When I run, for example:
job = fyrd.Job('ls .', profile='short')
this gives me:
~/anaconda3/lib/python3.5/site-packages/fyrd/queue.py in torque_queue_parser(user, partition)
--> 680 job_id = int(xmljob.find('Job_Id').text.split('.')[0])
    681 job_owner = xmljob.find('Job_Owner').text.split('@')[0]
    682 if user and job_owner != user:

ValueError: invalid literal for int() with base 10: '136374[]'
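A minimal fix sketch: strip the bracket suffix that torque appends to array-job ids before the integer conversion (assuming nothing else in the id needs the brackets):

# queue.py, torque_queue_parser: tolerate array-job ids like '136374[]'
raw_id = xmljob.find('Job_Id').text.split('.')[0]
job_id = int(raw_id.rstrip('[]'))  # '136374[]' -> 136374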

clean doesn't get all the files

On my computer, submitting a job creates four files. For instance, job = submit('ls -l') creates:

ls.0.cluster.err 
ls.0.cluster.out 
ls.0.cluster.sbatch
ls.0.cluster.script

However, running job.clean() only cleans up the .sbatch and .script files; the .err and .out files still exist after running job.clean().
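A sketch of a more thorough cleanup, assuming all four files share the job's base name as in the listing above:

import glob
import os

# Hypothetical sketch: remove every file this job produced, including
# the .err and .out files that job.clean() currently leaves behind.
for path in glob.glob('ls.0.cluster.*'):
    os.remove(path)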

Merge python-pipeline into this project

The python-pipeline project is an effort to make it easy to create complex pipelines with python. It isn't that useful outside of a multithreading environment, so it makes sense to merge it in here and implement native multithreading in that project through the cluster module.

Rather than keep it as a separate project, the pipeline package should be added as a separate package alongside cluster, to be used if the user wishes. However, it is important that its usage is not required in order to use the cluster package; i.e. pipeline should depend on cluster, but cluster should not depend on pipeline.

Add decorator definitions

I want folks to be able to add a simple decorator to a function to make it submit to the cluster when called.
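A minimal sketch of what that could look like, assuming fyrd.Job accepts a function plus args/kwargs and that Job.get() blocks and returns the result; the decorator itself is hypothetical:

import functools
import fyrd

def on_cluster(**job_kwargs):
    """Hypothetical decorator: run the wrapped function as a cluster job."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            job = fyrd.Job(func, args, kwargs, **job_kwargs)
            job.submit()
            return job.get()  # block until the job completes
        return wrapper
    return decorator

@on_cluster(profile='short')
def count_lines(path):
    return sum(1 for _ in open(path))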

Add `resubmit` method to Job class

Right now the flow of the Job submission process comes to a natural end at job completion, but there is no real need for that restriction.

Instead:

  • Option to resubmit should be instantly responsive for all failed jobs
  • It should ask for confirmation if job succeeded
  • It should refuse to continue (without raising an Exception) if the job is currently queued or running; the job must be cancelled first.

This should work naturally with the fix to Issue #4 so that the user can update attributes and then resubmit the job.
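A sketch of the method under those rules (the state names are assumptions, not fyrd's actual values):

# Hypothetical sketch of the proposed Job.resubmit method.
def resubmit(self, confirm=True):
    """Resubmit this job, following the rules above."""
    if self.state in ('queued', 'running'):
        print('Job is still in the queue; cancel it before resubmitting.')
        return self  # refuse, but do not raise
    if self.state == 'completed' and confirm:
        answer = input('Job already succeeded; resubmit anyway? [y/N] ')
        if answer.lower() != 'y':
            return self
    return self.submit()  # failed jobs resubmit immediately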

Create a 'Pool' class?

It might be a nice idea to make it possible to spawn a pool object, like the multiprocessing module, and communicate with it similarly. That would require a database and a daemon mode to work properly.
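An imagined usage sketch, mirroring multiprocessing.Pool (nothing here exists yet; every name is hypothetical):

pool = fyrd.Pool(profile='short', max_jobs=50)  # hypothetical class
results = pool.map(my_function, inputs)         # one cluster job per input
pool.close()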

Fix docstrings to work with sphinx

A number of my docstrings are formatted to work well with Python's help display, but they are not parsed correctly by Sphinx; all of the docstrings need to be updated so that the documentation is clear.
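For example, Sphinx parses reST field lists natively, so a docstring like the following renders cleanly (the function and parameter here are illustrative only):

def submit(self, wait_on_max_queue=True):
    """Submit this job to the cluster.

    :param wait_on_max_queue: Block until there is room in the queue.
    :returns: self, to allow method chaining.
    """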

New name?

pycluster, python-cluster, and similar names are already taken; we could use a better name for pip.

Remove srun from slurm scripts

There isn't a good reason to use srun anymore. Rather than recreating all of its functionality, I think it would be better to enforce a single task per fyrd job.

Calling Queue with no arguments throws error

For slurm, calling q = Queue() or q = Queue(user=None) raises an exception, even though None is the default value of user:

>>> q = cluster.Queue(user=None)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "cluster/queue.py", line 141, in __init__
     self._update()
   File "cluster/queue.py", line 311, in _update
     self.user):
   File "cluster/queue.py", line 641, in slurm_queue_parser
     outqueue.append((sid, sname, suser, spartition, sstate, snodelist,
UnboundLocalError: local variable 'snodelist' referenced before assignment
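A minimal fix sketch: bind snodelist (and any similarly conditional variables) to a default before the branch that sometimes skips the assignment, e.g. for pending jobs that have no allocated nodes yet:

# queue.py, slurm_queue_parser: default for jobs without a node list
snodelist = []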

torque script PBS directive error

Interesting project. In doing a quick test-drive, I've hit a PBS directive error. It might be a config error or even a torque version mismatch, but fyrd.get_cluster_environment() does return 'torque'. mem gets the complaint, but I believe walltime also needs to be prefixed by -l in the job script.

>>> j = fyrd.Job('ls ', ['.'])
>>> j.submit()
20161129 17:28:48.344 | WARNING --> Command qsub /home/icooke/ls.0.1493c48f.cluster.qsub failed with code 1, retrying.
20161129 17:28:49.348 | WARNING --> Command qsub /home/icooke/ls.0.1493c48f.cluster.qsub failed with code 1, retrying.
20161129 17:28:50.352 | WARNING --> Command qsub /home/icooke/ls.0.1493c48f.cluster.qsub failed with code 1, retrying.
20161129 17:28:51.359 | WARNING --> Command qsub /home/icooke/ls.0.1493c48f.cluster.qsub failed with code 1, retrying.
20161129 17:28:52.364 | CRITICAL --> qsub failed with code 1
-----------------------------------> stdout: 
-----------------------------------> stderr: qsub: directive error: mem=4000MB

And the script file

$ cat /home/icooke/ls.0.1493c48f.cluster.qsub
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS mem=4000MB
#PBS -q myqueue
#PBS -e /home/icooke/ls.0.1493c48f.cluster.err
#PBS -o /home/icooke/ls.0.1493c48f.cluster.out
#PBS walltime=04:00:00
mkdir -p $LOCAL_SCRATCH > /dev/null 2>/dev/null
cd /home/icooke
date +'%y-%m-%d-%H:%M:%S'
echo "Running ls.0.1493c48f"
ls .
exitcode=$?
echo Done
date +'%y-%m-%d-%H:%M:%S'
if [[ $exitcode != 0 ]]; then
    echo Exited with code: $exitcode >&2
fi
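If the diagnosis above is right, both offending directives just need the standard -l resource flag:

#PBS -l mem=4000MB
#PBS -l walltime=04:00:00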

slurm jobs not entering queue

After the fix in the pull request I just submitted, I now see that my slurm jobs are somehow not entering the queue. This worked for me last night, so I'm not sure why it has stopped working now:

>>> job = cluster.Job('ls')
>>> job.write()
>>> job.submit()
Job:ls.0<slurm:41472211(command:ls;args:None)SUBMITTED>
>>> job.wait()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "cluster/job.py", line 513, in wait
      self.queue.wait(self)
    File "cluster/queue.py", line 204, in wait
       '{} not in queue'.format(job))
    cluster.queue.QueueError: 41472211 not in queue

This problem arises despite the fact that the submitted job actually runs just fine and produces the expected output in the ls.0.cluster.out file.

Auto-update Job scripts when attributes are changed, up until the files are written

Right now job scripts are created at init. They should be stored in a format that allows them to be updated at any time prior to submission, i.e. changing the 'cores' keyword should update the script.

The best way to do this will be to move the string formatting into a write_script method of the Job.Script class that can be called at any time to overwrite the current scripts.
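A sketch of that approach, assuming the Script class keeps the raw template and its keywords around (attribute names are illustrative):

class Script(object):
    def __init__(self, file_name, template, **keywords):
        self.file_name = file_name
        self.template = template  # raw format string, kept for re-rendering
        self.keywords = keywords  # e.g. cores=4; editable until written

    def write_script(self):
        """(Re)render the template and overwrite the script on disk."""
        with open(self.file_name, 'w') as fout:
            fout.write(self.template.format(**self.keywords))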

Create pandas function/class test file

Create a test function that writes out a pandas .py file and then submits a function from it.

I am worried that classes and complex functions will fail to pickle with the current methods.

To simulate a real-life use case, we need to write out a .py file with a pandas-utilizing function, submit it to the cluster, and then get the dataframe back again, all in a test.
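A rough sketch of the test, simplified to define the function inline rather than writing out a separate .py file (fyrd.Job usage follows the examples above; Job.get() returning the unpickled result is assumed):

import pandas as pd
import fyrd

def make_frame(n):
    """Function whose pandas return value must survive the round trip."""
    return pd.DataFrame({'x': range(n), 'y': [i * 2 for i in range(n)]})

def test_pandas_roundtrip():
    job = fyrd.Job(make_frame, (10,), profile='short')
    job.submit()
    df = job.get()  # unpickled on the way back
    assert isinstance(df, pd.DataFrame)
    assert len(df) == 10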
