luispedro / jug
Parallel programming with Python
Home Page: https://jug.readthedocs.io
License: MIT License
I had a long message typed out, and then I realized what the problem was :(
Depending on what you intend the config file to be able to express, this could be a bug, or it's just a documentation error.
This is from http://jug.readthedocs.io/en/latest/configuration.html?highlight=config
[main]
jugdir=%(jugfile).jugdata
jugfile=jugfile.py
[status]
cache=off
[execute]
aggressive-unload=False
pdb=False
wait-cycle-time=12
nr-wait-cycles=150
This yields the error:
%(jugfile).jugdata
Traceback (most recent call last):
File "/usr/local/bin/jug", line 9, in <module>
load_entry_point('Jug==1.4.0+git', 'console_scripts', 'jug')()
File "/usr/local/lib/python3.5/dist-packages/Jug-1.4.0+git-py3.5.egg/jug/jug.py", line 449, in main
options = parse(argv[1:])
File "/usr/local/lib/python3.5/dist-packages/Jug-1.4.0+git-py3.5.egg/jug/options.py", line 302, in parse
'jugfile': cmdline.jugfile[:-3],
ValueError: unsupported format character 'j' (0x6a) at index 11
If you change
jugdir=%(jugfile).jugdata
to:
jugdir=jugfile.jugdata
everything works fine.
A fix for this would be to drop the %()-style notation and switch this, and the documentation, to Python's str.format notation.
the result would be a config file that looks like this:
[main]
jugfile=jugfile.py
jugdir={jugfile}.{date}.jugdata
and your implementation in the source would look like:
cmdline.jugdir = cmdline.jugdir.format(
    date=datetime.now().strftime('%Y-%m-%d'),
    jugfile=cmdline.jugfile[:-3])
IMO this is cleaner from both a config-file perspective and a code-readability perspective.
For reference it used to be:
cmdline.jugdir = cmdline.jugdir % {
    'date': datetime.now().strftime('%Y-%m-%d'),
    'jugfile': cmdline.jugfile[:-3],
}
That is a format I've never used and know nothing about, so maybe it's a quick fix?
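For what it's worth, the immediate cause of the ValueError is that Python's %-style dict formatting requires a conversion character (such as s) after the closing parenthesis, so the documented example is missing an s. A minimal stdlib sketch:

```python
# Python's %-style dict formatting needs a conversion character ('s'
# here) after the key; without it, '.' is parsed as a precision spec
# and the next letter triggers "unsupported format character".
template_fixed = '%(jugfile)s.jugdata'
print(template_fixed % {'jugfile': 'primes'})  # primes.jugdata

template_broken = '%(jugfile).jugdata'
try:
    template_broken % {'jugfile': 'primes'}
except ValueError as e:
    print('ValueError:', e)
```

So `jugdir=%(jugfile)s.jugdata` in the config file would likely also work without changing jug's code.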
The front page of the documentation states, among other things, that Python 3 is unsupported (even though the changelog just above it has an entry indicating support was added). Additionally, the "What's New" section is over a year old, making it seem like jug is under less active development than it really is.
I'm finding myself wanting to clean up the outputs of only some functions but not others.
Since --target already exists in other commands, would it be a viable option here?
I tried the default example given in the docs and I get the following error. Can you please help me?
Traceback (most recent call last):
File "/storage/work/dua143/anaconda2/bin/jug", line 11, in <module>
sys.exit(main())
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/jug.py", line 278, in main
cmdapi.run(options.subcommand, options=options, store=store, jugspace=jugspace)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/subcommands/__init__.py", line 271, in run
return cmd(*args, **kwargs)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/subcommands/__init__.py", line 161, in __call__
return self.run(*args, **kwargs)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/subcommands/execute.py", line 86, in run
store, jugspace = init(options.jugfile, options.jugdir, store=store)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/jug.py", line 97, in init
jugfile_contents = open(jugfile).read()
IOError: [Errno 2] No such file or directory: 'primes.py'
Hi, I wrote a small piece of python code on a macbook which has 10 + 1 tasks. 10 similar sleeping task and 1 joint task. I am running it by 4 processes and found that sometime Jug cannot assign task correctly. It happens like one process does all tasks along and, at the same time, another 3 share the 11 tasks.
Running jug.hooks.exit_checks.exit_env_vars() when the specific JUG_* environment variables are not defined has no effect. It should therefore be safe to run at every execution.
Could the behaviour introduced by this hook become a permanent feature, such that you can always influence jug execution through JUG_* variables without having to explicitly register the hook in every jugfile?
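A minimal stdlib sketch of the kind of always-safe check being proposed (the variable name JUG_MAX_TASKS is illustrative, not necessarily jug's actual one):

```python
import os
import sys

def exit_if_env_requests(tasks_executed):
    # Hypothetical sketch: this is a no-op unless the environment
    # variable is set, so it would be safe to call on every execution.
    limit = os.environ.get('JUG_MAX_TASKS')
    if limit is not None and tasks_executed >= int(limit):
        sys.exit(0)
```

With no JUG_MAX_TASKS set, the call does nothing; setting it caps the number of tasks a worker runs.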
When I compute the value of primes with value(primes100), the following error emerges:
AttributeError: 'NoneType' object has no attribute 'can_load' in the value function.
Any help will be much appreciated! The variable store is always None.
I want to run a simple script on a cluster whose network transmission rate is relatively low. Processes in my script, if created with the multiprocessing module, have little communication with each other.
However, since jug needs I/O over the network during execution, will it create a lot of overhead here?
Hi, first I'd like to thank you guys for creating jug, which is very convenient to use for embarrassingly parallel problems.
However, I have found a bug using jug on a cluster (PBS). I used the file backend and I did find race conditions. The simple task I used for testing is this:
import os
import hashlib
from jug import TaskGenerator

myID = 'procid.%s.%s' % (str(os.getpid()), os.environ['HOSTNAME'])

def createdir(d):
    if not os.path.isdir(d):
        os.mkdir(d)
        os.mkdir('%s/%s' % (d, myID))
    else:
        os.mkdir(d + '.' + myID)

@TaskGenerator
def runjob(cmd):
    hash = hashlib.md5(str(cmd)).hexdigest()
    createdir(hash)

cmds = [c.strip() for c in open(joblistfile,'rt').read().splitlines() if c.strip() and not c.strip().startswith('#')]
for cmd in range(1024):
    runjob(str(cmd))
I submitted two instances in two different nodes.
I found that in the directory created by the task there were dir names containing ".procid.", which means a race condition did occur.
The reason, I think, is that "O_EXCL is broken on NFS file systems," as described in the GNU/Linux open(2) manpage. I would suggest using flufl.lock, which is safe over NFS:
http://pythonhosted.org/flufl.lock/docs/using.html
Thank you very much for your attention to the problem.
-Zheng Qu
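For context, the link-based approach that flufl.lock takes can be sketched as follows (helper names are mine, not flufl.lock's API): link(2) is atomic even over NFS, whereas O_EXCL historically was not.

```python
import os

def try_lock(lockfile, claimfile):
    # Write a uniquely-named claim file, then atomically hard-link it
    # to the shared lock name; link(2) is atomic even over NFS.
    with open(claimfile, 'w') as f:
        f.write(str(os.getpid()))
    try:
        os.link(claimfile, lockfile)
        return True          # we own the lock
    except FileExistsError:
        return False         # someone else holds it
    finally:
        os.unlink(claimfile)
```

The claim file must have a per-process unique name (e.g. hostname plus PID) so that two workers never collide before the link step.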
Hi, I defined a class which has an __init__. Some methods in this class are decorated with @TaskGenerator. These methods cannot get the class instance correctly. I tried two ways to solve it and neither works.
Would you mind telling me the correct way to do it?
I'd like to debug errors that occur when I run a jug script. However, adding the --pdb
flag doesn't seem to have any effect.
$ jug execute --debug --pdb jug-init-inducing-fixedhyp.py
...
CRITICAL:root:Could not import file 'jug-init-inducing-fixedhyp.py' (error: too many values to unpack (expected 4))
Traceback (most recent call last):
File ...
Am I misunderstanding the intended use of the --pdb
option?
(Many thanks for the software, it's great)
I've run the primes example given in the docs; the HPC I'm using has 5 nodes and is PBS-based.
qsub: invalid option --t
Here's the error
jug execute "/home/jaideep.gedi.17cse/qgis/test.py" | qsub -t 1-100
qsub: invalid option -- 't'
usage: qsub [-a date_time] [-A account_string] [-c interval]
[-C directive_prefix] [-e path] [-f ] [-h ] [-I [-X]] [-j oe|eo] [-J X-Y[:Z]]
[-k o|e|oe] [-l resource_list] [-m mail_options] [-M user_list]
[-N jobname] [-o path] [-p priority] [-P project] [-q queue] [-r y|n]
[-S path] [-u user_list] [-W otherattributes=value...]
[-v variable_list] [-V ] [-z] [script | -- command [arg1 ...]]
qsub --version
Exception ignored in: <_io.TextIOWrapper name='' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
Please look into this..
There seems to be a weird interaction when you run jug cleanup while jobs are still running (though I could be wrong). In addition, when a job is ended by SIGTERM, jobs seem to enter the completed stage and never finish.
Reported on the jug-users mailing list:
I have a code that requires multiple merge points. At these merge points, a file is written, and the next section of the code uses that file. To address this, I've used jug.barrier(). Since I would like to submit this as a batch job, I created a script run_jug.sh, which will run multiple jug processes on a single node. In that script, I use sleep-until to prevent the batch job from ending early. The problem I'm encountering is that sleep-until seems to be satisfied when all jug processes hit the first jug.barrier(), and as such the batch job ends early.
I confirm that this is a bona fide bug.
Hello, is it possible to call an equivalent of jug execute
from code? The context is that I want to configure tasks with values from other libs (e.g. one parameter would be the number of workers) and possibly use something like joblib to start jug workers. Thanks.
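As a workaround (this is not a jug API), one can shell out to the CLI from code; the helper below just builds and runs the command line:

```python
import subprocess

def jug_execute_cmd(jugfile, extra_args=()):
    # Build the 'jug execute' command line; any extra CLI flags can be
    # appended by the caller.
    return ['jug', 'execute', jugfile, *extra_args]

def run_jug_execute(jugfile, extra_args=()):
    # Launch a jug worker as a child process (requires jug on PATH).
    return subprocess.run(jug_execute_cmd(jugfile, extra_args)).returncode
```

Several such calls can then be started in parallel (e.g. via joblib or multiprocessing), since jug workers coordinate through the shared store rather than through their parent process.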
I am implementing a distributed calculation system with jug. I want to know whether it is possible to pass options to the jugfile from the command line of jug(1), or elsewhere, rather than having the jugfile read them itself.
I just realized there is an up-to-date documentation on http://jug.readthedocs.org/en/latest/
Previously, duckduckgo [1] referred me to http://pythonhosted.org/Jug/
This old version is also linked on PyPI: https://pypi.python.org/pypi/Jug ("Package documentation")
Consider the following jugfile:
import random
import jug
from jug.io import NoLoad

@jug.TaskGenerator
def gauss(i):
    return random.gauss(0, 1)

@jug.TaskGenerator
def load_and_square(t):
    data = jug.value(t.t)
    return data**2

@jug.TaskGenerator
def sum_squares(nums):
    return sum(nums)

ts = [load_and_square(NoLoad(gauss(i))) for i in range(3)]
sum_squares(ts)
The command jug invalidate --target jugfile.gauss (likewise for load_and_square and sum_squares) behaves as expected. However, if you drop into a jug shell (for example because only a subset of the tasks should be invalidated) and use the invalidate() builtin, you encounter unexpected behavior:
$ jug shell
In [1]: get_tasks()
Out[1]:
[Task(jugfile.gauss, args=(0,), kwargs={}),
Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214168>,), kwargs={}),
Task(jugfile.gauss, args=(1,), kwargs={}),
Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e2141f8>,), kwargs={}),
Task(jugfile.gauss, args=(2,), kwargs={}),
Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214288>,), kwargs={}),
Task(jugfile.sum_squares, args=([Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214168>,), kwargs={}), Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e2141f8>,), kwargs={}), Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214288>,), kwargs={})],), kwargs={})]
In [2]: invalidate(get_tasks()[0])
Building task DAG... (only performed once)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/tmp/jugfile.py in <module>
----> 1 invalidate(get_tasks()[0])
~/.envs/jug-debug/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/shell.py in _invalidate(t)
135 '''
136 from ..task import alltasks
--> 137 return invalidate(alltasks, reverse_cache, t)
138
139 def _get_tasks():
~/.envs/jug-debug/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/shell.py in invalidate(tasklist, reverse, task)
67 for t in tasklist:
68 for d in t.dependencies():
---> 69 reverse.setdefault(d.hash(), []).append(t)
70 queue = [task]
71 seen = set()
AttributeError: 'NoLoad' object has no attribute 'hash'
Since this error occurs with any task, it seems like the NoLoad "poisons" the whole DAG. It does not happen with Task.invalidate().
I often find myself using the following code pattern:
if condition:
    TMPDIR = "/some/temp"
else:
    TMPDIR = "/tmp"

@TaskGenerator
def func(item):
    use_item(item, TMPDIR)
    ...
This comes up when I want to have variables that are used by the function but that, if changed, should not result in different tasks.
Do you have any suggestion for how to handle this use case?
An alternative could be to have a special argument that can be modified without generating new tasks.
Do you see the following as a reasonable proposal?
@TaskGenerator
def func(item, __jugbypass):
    tmpdir = __jugbypass["TMPDIR"]
    use_item(item, tmpdir)
    ...

if condition:
    TMPDIR = "/some/temp"
else:
    TMPDIR = "/tmp"

func(item, __jugbypass={"TMPDIR": TMPDIR})
The main advantage of the second example is the ability to reuse code across jugfiles.
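The calling convention of the proposal can be sketched in pure Python (this does not implement the hashing exclusion itself, which would need support inside jug; the decorator name is mine):

```python
import functools

def with_bypass(f):
    # Hypothetical sketch: values passed under the reserved keyword
    # '__jugbypass' reach the function but, under the proposal, would
    # be excluded from the task hash, so changing them would not
    # create new tasks.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        bypass = kwargs.pop('__jugbypass', {})
        return f(*args, __jugbypass=bypass, **kwargs)
    return wrapper

@with_bypass
def func(item, __jugbypass=None):
    return (item, __jugbypass.get('TMPDIR', '/tmp'))
```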
Could this be 80 characters so it will fit on a standard terminal window?
I seem to be hitting a few problems with Tasklets. I've implemented an example of what I'm talking about based upon the documentation:
import jug

def square(i):
    return i**2

def gen_list(n):
    return [i for i in range(n)]

def select_first(t):
    return t[0]

result = jug.Task(gen_list, 5)
result0 = jug.Tasklet(select_first, result)
next = jug.Task(square, result0)
Running jug status
on this code gets me the following error:
File "/home/jrporter/miniconda2/envs/std3/bin/jug", line 11, in <module>
load_entry_point('Jug==1.6.7+git', 'console_scripts', 'jug')()
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/jug.py", line 290, in main
cmdapi.run(options.subcommand, options=options, store=store, jugspace=jugspace)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/__init__.py", line 271, in run
return cmd(*args, **kwargs)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/__init__.py", line 161, in __call__
return self.run(*args, **kwargs)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/status.py", line 263, in run
return _status_nocache(options)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/status.py", line 223, in _status_nocache
elif t.can_run():
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/task.py", line 140, in can_run
if not hasattr(dep, '_result') and not dep.can_load():
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/task.py", line 421, in can_load
return self.base.can_load()
AttributeError: 'function' object has no attribute 'can_load'
If I comment out the last line (next = jug.Task(square, result0)
), it "works":
Failed Waiting Ready Complete Active Task name
-------------------------------------------------------------------------------------
0 0 1 0 0 test-tasklets.gen_list
......................................................................................
0 0 1 0 0 Total
My obvious interpretation of this is that the Tasklet object doesn't have a can_load method (which, it seems, should return True if its dependencies can be loaded?), but perhaps I'm doing something wrong.
Could you advise me on how to proceed? I have read the docs page several times and can't figure out what I'm doing wrong, if anything.
Ola @luispedro ,
during my simulations I came across this.
I assume that a lot of workers were doing small tasks, and that jug then exits the loop prematurely here, as all upnext tasks can be loaded in the meantime. This loop is outside the wait-cycles loop, and hence I assume that only the outer execution loop captures this case, but only twice before exiting.
Well, I know that a jug task should be of decent size for a reason, but I would still consider this something to think about, if I am not mistaken in my assumptions.
The number of actually running workers was about 80, and 14 of them exited prematurely like this. Everything else works like a charm, no exceptions occurred.
Complete stdout error log http://pastebin.com/vjTYHthD of one worker
Jugfile http://pastebin.com/hqwMCzXM
gridjug.grid_jug(
    jugfile=BOND_PERCOLATION_JUGFILE,
    jugdir=BOND_PERCOLATION_JUGDIR,
    jug_args=[
        '--aggressive-unload',
    ],
    jug_nworkers=400,
    keep_going=True,
    verbose=True,
    **GRIDMAP_PARAMS
)
When one supplies a task with several keyword arguments, jug stores them in a dictionary.
Subsequently, jug computes the hash by iterating over the dictionary.
This implicitly orders the dictionary when updating the hash of the task.
However, the order of keyword arguments (or of a dictionary) is undefined in Python < 3.6 (cf. PEP 468). Hence an implicit change in the order of the supplied keyword arguments leads to recomputation of the task (if that particular order has not been computed before).
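A straightforward remedy is to iterate over the keyword arguments in sorted order when hashing; a minimal stdlib sketch (not jug's actual hashing code):

```python
import hashlib
import pickle

def stable_kwargs_hash(kwargs):
    # Sorting the keys makes the digest independent of the order in
    # which the keyword arguments were supplied.
    h = hashlib.sha1()
    for key in sorted(kwargs):
        h.update(pickle.dumps((key, kwargs[key])))
    return h.hexdigest()
```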
-------------------------------------------------------------------------------------
0 0 1 0 0 jugfile.function
0 0 1 0 0 jugfile.function_other
.....................................................................................
0 0 2 2 0 Total
Running jug invalidate --target jugfile.function
invalidates everything.
I am not exactly sure what the purpose of NoLoad is, but when it is used, tasks are not aware of their own dependencies. I am not sure if this is intended behavior or not, but I thought I'd report it.
For example, consider:
import jug
from jug.utils import jug_execute
from jug.io import NoLoad
import random

@jug.TaskGenerator
def gauss():
    return random.gauss(0, 1)

@jug.TaskGenerator
def write_map_ts(ts):
    import os
    outfilename = os.path.abspath('./gauss.txt')
    with open(outfilename, 'w') as f:
        for t in ts:
            data = jug.value(t.t)
            f.write(str(data)+'\n')
    return outfilename

map_ts = [gauss() for i in range(10)]
f = write_map_ts([NoLoad(t) for t in map_ts])
jug_execute(['cat', f])
produces the dependency graph:
Note that although the true dependency graph is gauss <- write_map_ts <- jug_execute, the link between gauss and write_map_ts is interrupted by NoLoad.
Would solving this issue be as simple as implementing dependencies for NoLoad?
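Conceptually, yes: a wrapper that exposes its wrapped task as a dependency would let the DAG builder see through it. A minimal sketch of the idea (a stand-in class, not jug's actual NoLoad implementation):

```python
class NoLoadSketch:
    # Illustrates the suggested fix: keep a reference to the wrapped
    # task and report it as a dependency instead of hiding it, while
    # still not loading its result automatically.
    def __init__(self, t):
        self.t = t

    def dependencies(self):
        yield self.t
```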
Hi,
first of all, jug is a great module to distribute tasks on a cluster without too much overhead and it already helps me a lot.
While using it, I found that one cannot read results which contain empty strings in Python 2.7.6 (not tested with Python 3). Here is a minimal working example:
File jugtest.py
from jug import Task

def func():
    return ''

t = Task(func)
File jugtest_result.py
import jug
jug.init('jugtest.py', 'jugtest.jugdata')
import jugtest
print(jugtest.t.value())
Output:
$ jug execute jugtest.py
Executed Loaded Task name
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 0 jugtest.func
.............................................................................................................................................................................................................
1 0 Total
$ python jugtest_result.py
Traceback (most recent call last):
File "jugtest_result.py", line 5, in <module>
print(jugtest.t.value())
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/task.py", line 113, in value
return self.result
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/task.py", line 108, in _get_result
if not hasattr(self, '_result'): self.load()
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/task.py", line 150, in load
self._result = self.store.load(self.hash())
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 228, in load
return decode_from(input)
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/backends/encode.py", line 192, in decode_from
return pickle.load(stream)
EOFError
This is not a big issue because I can just output a dummy string, but it would be great if this could be fixed or documented so other users don't run into it again.
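Plain pickle has no trouble with empty strings, which suggests the EOFError originates in jug's encode/decode layer (perhaps skipping the write when the payload is empty) rather than in pickle itself; a quick stdlib check:

```python
import io
import pickle

# An empty string round-trips through pickle without issue.
buf = io.BytesIO()
pickle.dump('', buf)
buf.seek(0)
assert pickle.load(buf) == ''
```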
Ok, so take for example
def _buildXY():
    tan_train = [Task(np.random.randn, 100000, 528) for i in range(7)]
    tan_test = [Task(np.random.randn, 100000, 528) for i in range(5)]

    # xs, ys - lists of arrays
    def pad(xs, ys):
        maxT = max( x.shape[0] for x in xs )
        x = np.zeros((len(xs), maxT, xs[0].shape[1]))
        y = np.zeros((len(ys), maxT, ys[0].shape[1]))
        for ii in xrange(len(xs)):
            t = xs[ii].shape[0]
            x[ii,:t,:] = xs[ii]
            y[ii,:t,:] = ys[ii]
        return x, y

    y_train = [np.random.randn(100000, 5) for i in range(7)]
    y_test = [np.random.randn(100000, 5) for i in range(5)]
    train = Task(pad, tan_train, y_train)
    test = Task(pad, tan_test, y_test)
    return train[0], train[1], test[0], test[1]

_buildXY()
Running jug execute
gives me the error:
CRITICAL:root:Exception while running tangent_lstm.pad: error return without exception set
Traceback (most recent call last):
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/bin/jug", line 5, in <module>
main()
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 415, in main
execute(options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 254, in execute
execution_loop(tasks, options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 163, in execution_loop
t.run(debug_mode=options.debug)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/task.py", line 95, in run
self.store.dump(self._result, name)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 139, in dump
encode_to(object, output)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/encode.py", line 77, in encode_to
write(object, stream)
SystemError: error return without exception set
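A possible culprit: "SystemError: error return without exception set" from pickle under Python 2 is a symptom commonly associated with very large objects, and the padded array here is sizeable; back-of-the-envelope arithmetic:

```python
# The padded training array is 7 x 100000 x 528 float64 values,
# i.e. roughly 2.75 GiB, which is around the size where Python 2's
# cPickle was known to fail when serializing a single huge string.
n, maxT, d = 7, 100000, 528
bytes_per_float64 = 8
size_gib = n * maxT * d * bytes_per_float64 / 1024**3
print(round(size_gib, 2))  # ~2.75
```

If that is the cause, splitting the result into smaller tasks (e.g. one padded sequence per task) would likely avoid the failure.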
RQ is based on Redis and used by many; how is jug better?
Hi!
jug status
does not show me the number of running tasks. I'm using Python 3.4.
Is it possible that this is the same bug that caused #10? I noticed that all the lockfiles start with b'...
For example, test_jug.py:
from jug import Task
import numpy as np
Task(np.random.randn, 100, 20)
Then,
$ rm -rf test_jug.jugdata/ && jug execute test_jug.py
Executed Loaded Task name
-----------------------------------------------------------------------------------------
1 0 None.randn
.........................................................................................
1 0 Total
I'm on jug 1.1.
It would be nice if it were possible to pass options to the webstatus app, like the host to listen on and the port.
Something like:
--ip=<Unicode>
The IP address the webstatus server will listen on.
Default: 'localhost'
--port=<Int>
The port the webstatus server will listen on.
Default: 8080
Sometimes you might already be using that port for something else, and it would be nice to be able to have a separate instance.
Hi all,
While using and developing jug_schedule I've come across several design issues that boil down to how jug handles locks:
Locks in jug are created once and left for the entire duration of execution.
They are always removed when a task finishes, even if it fails, unless processes or nodes crash, in which case they are left on the filesystem and perceived by jug as "Running".
The consequence of removing locks on failures is that every time a new jug execute is launched, the failing task is retried.
The ideal solution here would be a keep-alive mechanism identical to what is present in NGLess, in addition to leaving the lock behind in case of failures.
Some brainstormed alternatives (with caveats):
These provide sufficient information to know whether a task failed, but are still insufficient to distinguish running from crashed.
The lock would still be left behind, so for all practical purposes the task wouldn't be retried until jug cleanup is used.
Extras:
- jug cleanup removes all locks, regardless of their state. jug cleanup --failed-only could make sense here.
- jug status could display a useful Failed column.

When trying the examples, e.g., examples/decrypt/printjugresults.py:
$ python3 printjugresults.py
Traceback (most recent call last):
File "printjugresults.py", line 2, in <module>
jug.init('jugfile', 'jugdata')
File ".../lib/python3.4/site-packages/jug/jug.py", line 386, in init
jugfile_contents = open(jugfile).read()
FileNotFoundError: [Errno 2] No such file or directory: 'jugfile'
So I change the jug.init call to use jugfile.py instead:
$ python3 printjugresults.py
Traceback (most recent call last):
File "printjugresults.py", line 4, in <module>
results = jug.task.value(jugfile.fullresults)
File ".../lib/python3.4/site-packages/jug/task.py", line 446, in value
return elem.value()
File ".../lib/python3.4/site-packages/jug/task.py", line 111, in value
return self.result
File ".../lib/python3.4/site-packages/jug/task.py", line 106, in _get_result
if not hasattr(self, '_result'): self.load()
File ".../lib/python3.4/site-packages/jug/task.py", line 147, in load
assert self.can_load()
AssertionError
Then I change jug.init to also use jugfile.jugdata, and:
$ python3 printjugresults.py
Jug is perfect for embarassingly parallel applications.
-- Luis Pedro Coelho
Password was 'qwert'
According to the documentation, the full names should not be necessary, but that does not seem to be the case.
When using
subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, creationflags=CREATE_NEW_CONSOLE)
to run the jug process, the subprocess only starts after checking status.
The same thing happens when running with
subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, shell=True)
The problem doesn't appear when using
subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT)
Would you mind helping?
Hi @luispedro, is there any chance of fixing the order of tasks in repeated calls of jug status? Or am I missing something? I am using a setup like
watch jug status
and for now, every new call to jug status produces a new order of tasks.
Could just be a Linux thing (not sure what you develop on), but the tildes are not expanded in the function read_configuration_file(fp=None). This results in never passing the IOError check and, thus, the jugrc file never being loaded (at least on my machine).
I think you have to use path.expanduser(fp) to replace fp in the path.exists check and the open(fp) call.
Sadly, when I do that and get past the IOError check, I get the error:
File "/usr/local/lib/python3.5/dist-packages/Jug-1.4.0+git-py3.5.egg/jug/options.py", line 299, in parse
'jugfile': cmdline.jugfile[:-3],
ValueError: unsupported format character 'j' (0x6a) at index 11
So clearly I'm missing something in the bigger picture!
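The suggested tilde-expansion fix can be sketched like this (read_configuration_file is the function named above; the body here is illustrative, not jug's actual code):

```python
import os.path

def read_configuration_file(fp=None):
    # Expand '~' before the existence check so paths such as
    # '~/.jugrc' are actually found.
    if fp is None:
        return None
    fp = os.path.expanduser(fp)
    if not os.path.exists(fp):
        return None
    return open(fp)
```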
If jugdir is not provided on the command line to jug execute, and there is no jug config file, the program errors out, since this happens prior to the call to init() which handles the missing jugdir.
Just define a default:
parser.add_option('--jugdir',
                  action='store',
                  dest='jugdir',
                  default='jugdir',  # define default value
                  help='Where to save intermediate results')
Hello, I was wondering whether there is a reason why CompoundTaskGenerator is not in __init__.py.
Is this feature still working? Or is it deprecated, still in progress, or just forgotten?
Great project, by the way; it has been very useful!
Hi,
First of all, thanks for this great lib! I am thinking about moving some of my code to it. However, I have a problem here (Python 3.3.5) with jug status. For example, if I run the prime numbers example, jug status primes.py always outputs:
Waiting Ready Finished Running Task name
----------------------------------------------------------------------------------------------------------------------------
0 99 0 0 primes.is_prime
............................................................................................................................
0 99 0 0 Total
Even though the computations are finished, as I can see by doing jug shell primes.py.
I did look around in jug/subcommands/status.py, and by commenting out this line:
https://github.com/luispedro/jug/blob/master/jug/subcommands/status.py#L186
I get the correct behaviour.
Thanks!
Hi Luis,
I am trying to install a version of jug I can hack on as I work on my projects. I downloaded the git repository and ran python setup.py install, which appeared to be successful. However, when I now run jug status, I get the error:
$ jug status
CRITICAL:root:Could not import file 'status' (error: [Errno 2] No such file or directory: 'status')
Traceback (most recent call last):
File "/home/jrporter/miniconda3/envs/sched2/lib/python3.9/site-packages/Jug-2.1.1.post0-py3.9.egg/jug/jug.py", line 101, in init
with open(jugfile) as jfile:
FileNotFoundError: [Errno 2] No such file or directory: 'status'
Maybe some kind of misprocessing of the argv?
Anyway, is this not the preferred way to do a development installation? Or did I make a mistake somewhere?
Thanks!
Running on Python 3.4, and IPython 3.2.1, the following error occurs when dropping to pdb/ipython:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../bin/jug", line 5, in <module>
main()
File ".../lib/python3.4/site-packages/jug/jug.py", line 427, in main
execute(options)
File ".../lib/python3.4/site-packages/jug/jug.py", line 266, in execute
execution_loop(tasks, options)
File ".../lib/python3.4/site-packages/jug/jug.py", line 198, in execution_loop
debugger = IPython.core.debugger.Pdb(colors)
File ".../lib/python3.4/site-packages/IPython/core/debugger.py", line 264, in __init__
self.set_colors(color_scheme)
File ".../lib/python3.4/site-packages/IPython/core/debugger.py", line 272, in set_colors
self.color_scheme_table.set_active_scheme(scheme)
File ".../lib/python3.4/site-packages/IPython/utils/coloransi.py", line 176, in set_active_scheme
scheme_test = scheme.lower()
AttributeError: 'LazyConfigValue' object has no attribute 'lower'
I have been using a decorator to provide the functionality of iteratetask instead of the function.
The reason I like this is that it specifies how to handle the output (i.e. how many items are in the tuple) closer to the place the output is produced (i.e. at the end of the function) rather than where the workflow is specified (which is usually a different file).
So it looks like:
@IterateTask(n=3)
@jug.TaskGenerator
def do_thing(args):
    return a, b, c
I've been using the following for a few weeks without incident:
import jug
from functools import wraps

def IterateTask(n):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            return jug.iteratetask(f(*args, **kwargs), n=n)
        return wrapper
    return decorator
Is this something you'd be interested in adding? Is there any reason I haven't foreseen why this isn't a good idea?
I recently had an issue with a few of my jug processes being killed and therefore locks being kept on some active tasks. Normally, I would just delete the lock files in the jugdir, but I was using the Redis backend, so this was not possible. As suggested in the docs, I ran jug cleanup, expecting only the locks to be cleared; I was surprised to find the completed jobs cleared as well.
Is there any way to have jug cleanup only clear the locks? If not, can there be an option for this?
Result files from computations that are no longer relevant have a performance impact on actions that require traversing jugfile.jugdir.
This happens often if jugfile.py is executed and then modified in ways that lead to the definition of new, independent or unrelated tasks.
Since running jug cleanup removes locks as well as all dangling results, it is unsafe to run while jug execute is active.
Having --keep-locks would make this action safer.
In order for jug cleanup to be race-condition free, .jugdir should be traversed for results first; the graph of tasks should be refreshed afterwards, and only then should files be removed.
Otherwise a result may exist for a task that was not yet available.
In certain circumstances, when the jug executor is run for long periods of time, memory appears not to be released from previous tasks. This leads to an eventual OOM when many tasks are executed.
For example, if you have a situation like this:
import resource
import numpy as np
from jug import TaskGenerator

@TaskGenerator
def make_array(i):
    print('make_array memory footprint is %0.2f MB' %
          (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2))
    a = np.random.randint(0, 512, size=(128, 128, 4))
    return a

@TaskGenerator
def process_array(a):
    print('process_array memory footprint is %0.2f MB' %
          (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2))
    a = a * 10
    return a

list_of_arrays = [make_array(i) for i in range(1000)]
processed_arrays = [process_array(a) for a in list_of_arrays]
Then you get the following output:
make_array memory footprint is 28.96 MB
make_array memory footprint is 29.64 MB
make_array memory footprint is 30.14 MB
[...]
make_array memory footprint is 529.29 MB
make_array memory footprint is 529.79 MB
process_array memory footprint is 530.30 MB
process_array memory footprint is 530.81 MB
[...]
process_array memory footprint is 1029.42 MB
process_array memory footprint is 1029.93 MB
process_array memory footprint is 1030.43 MB
process_array memory footprint is 1030.93 MB
Executed Loaded Task name
------------------------------------------------
1000 0 jugfile.make_array
1000 0 jugfile.process_array
.............................................................................
2000 0 Total
I imagine that this has something to do with the way finished jobs are serialized, and that there is a reference to that data hanging around somewhere?
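As a possible mitigation (not a fix for the underlying reference), jug's documented aggressive-unload option asks the executor to unload results after each task; it can be set on the command line or in the configuration file:

```
$ jug execute --aggressive-unload

# or in the jug configuration file
[execute]
aggressive-unload=True
```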
$ cat jugfile.py
from jug import TaskGenerator

@TaskGenerator
def add(a, b):
    return a + b

@TaskGenerator
def minus(a, b):
    return a - b

i = -1
for j in range(2):
    minus(add(j, i), i)
    i = j
$ jug execute
Executed Loaded Task name
-----------------------------------------------
2 0 jugfile.add
2 0 jugfile.minus
...............................................
4 0 Total
$ jug shell
In [1]: from jug.task import alltasks
In [2]: a = alltasks[2]
In [3]: a
Out[3]: Task(jugfile.add, args=(1, 0), kwargs={})
In [4]: invalidate(a)
Building task DAG... (only performed once)
In [5]: [t.can_load() for t in alltasks]
Out[5]: [True, True, False, True]
$ jug execute
Executed Loaded Task name
-----------------------------------------------
1 1 jugfile.add
0 2 jugfile.minus
...............................................
1 3 Total
I expected one jugfile.minus to also be invalidated.
At first I thought this was an issue with Task.invalidate(), but using jug.subcommands.shell.invalidate (the one available in jug shell) produced the same results.
jug invalidate --target jugfile.add behaves as expected and invalidates everything.
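The behavior I expected can be sketched as a recursive walk over reverse dependencies (a toy model of invalidation, not jug's implementation):

```python
# Toy model of dependency-aware invalidation: invalidating a task should
# also invalidate every task that (transitively) consumes its result.
# deps maps each task name to the list of tasks it depends on.

def invalidate(target, deps, valid):
    valid.discard(target)
    # walk dependents: any still-valid task whose inputs include the
    # invalidated task must be invalidated as well
    for task, inputs in deps.items():
        if target in inputs and task in valid:
            invalidate(task, deps, valid)

# roughly the graph from the jugfile above: each minus consumes one add
deps = {'add0': [], 'add1': [], 'minus0': ['add0'], 'minus1': ['add1']}
valid = {'add0', 'add1', 'minus0', 'minus1'}
invalidate('add1', deps, valid)
# valid is now {'add0', 'minus0'}: minus1 was invalidated along with add1
```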
Came across this error today:
CRITICAL:root:Exception while running tangent_lstm.test_lstm: maximum recursion depth exceeded while calling a Python object
Traceback (most recent call last):
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/bin/jug", line 5, in <module>
main()
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 415, in main
execute(options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 254, in execute
execution_loop(tasks, options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 163, in execution_loop
t.run(debug_mode=options.debug)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/task.py", line 95, in run
self.store.dump(self._result, name)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 139, in dump
encode_to(object, output)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/encode.py", line 77, in encode_to
write(object, stream)
RuntimeError: maximum recursion depth exceeded while calling a Python object
I'm getting this error when creating a Task that wraps a function which returns a Keras model.
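For context, this is the generic failure mode of serializing deeply nested object graphs, which complex models can trigger; a minimal reproduction, independent of jug and Keras:

```python
import pickle

# Pickle walks object graphs recursively, so a sufficiently deep nesting
# exhausts the interpreter's recursion limit -- the same error jug hits
# when it tries to serialize the task's result in encode_to().
def deeply_nested(depth):
    obj = []
    for _ in range(depth):
        obj = [obj]
    return obj

try:
    pickle.dumps(deeply_nested(100000))
    failed = False
except RecursionError:
    failed = True
# failed is True
```

A common workaround is to save the model to disk inside the task (e.g. with the model's own save method) and return the file path instead of the model object; raising sys.setrecursionlimit can also help for moderately deep objects.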
Somewhere in the documentation it says that tasks should last at least 2 or 3 seconds. In order to tune this (e.g. via map_step and reduce_step in mapreduce), it would be nice to have a report of task timings.
This could be an extra column in the summary printed by every worker, but it would also be nice to have it accessible through the results of a task (which would allow for automatic tuning).
Many many thanks for this awesome tool! It's great. =)
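Until such a report exists, one hypothetical workaround is to wrap the task body and record wall-clock time alongside the result (timed below is an illustration, not part of jug's API):

```python
import time
from functools import wraps

def timed(f):
    # Returns (result, elapsed_seconds) so the timing is stored with the
    # result and can later feed automatic tuning of map_step/reduce_step.
    @wraps(f)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = f(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper

@timed
def work(n):
    return sum(range(n))

result, elapsed = work(1000)
# result == 499500; elapsed is a small non-negative float
```

In real use, the @timed wrapper would sit underneath @TaskGenerator so the timing travels with the serialized result.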