luispedro / jug
Parallel programming with Python
Home Page: https://jug.readthedocs.io
License: MIT License
I had a long message typed out, and then I realized what the problem was :(
Depending on what you intend the config file to be able to express, this could be a bug, or it's just a documentation error.
This is from http://jug.readthedocs.io/en/latest/configuration.html?highlight=config
[main]
jugdir=%(jugfile).jugdata
jugfile=jugfile.py
[status]
cache=off
[execute]
aggressive-unload=False
pdb=False
wait-cycle-time=12
nr-wait-cycles=150
This yields the error:
%(jugfile).jugdata
Traceback (most recent call last):
File "/usr/local/bin/jug", line 9, in <module>
load_entry_point('Jug==1.4.0+git', 'console_scripts', 'jug')()
File "/usr/local/lib/python3.5/dist-packages/Jug-1.4.0+git-py3.5.egg/jug/jug.py", line 449, in main
options = parse(argv[1:])
File "/usr/local/lib/python3.5/dist-packages/Jug-1.4.0+git-py3.5.egg/jug/options.py", line 302, in parse
'jugfile': cmdline.jugfile[:-3],
ValueError: unsupported format character 'j' (0x6a) at index 11
If you change
jugdir=%(jugfile).jugdata
to:
jugdir=jugfile.jugdata
everything works fine.
A fix for this would be to drop the %()-style notation and switch this, and the documentation, to Python's str.format notation.
the result would be a config file that looks like this:
[main]
jugfile=jugfile.py
jugdir={jugfile}.{date}.jugdata
and your implementation in the source would look like:
cmdline.jugdir = cmdline.jugdir.format(
    date=datetime.now().strftime('%Y-%m-%d'),
    jugfile=cmdline.jugfile[:-3])
IMO this is cleaner from both a config-file perspective and a code-readability perspective.
For reference it used to be:
cmdline.jugdir = cmdline.jugdir % {
    'date': datetime.now().strftime('%Y-%m-%d'),
    'jugfile': cmdline.jugfile[:-3],
}
That is a format I've never used and know nothing about, so maybe it's a quick fix?
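For what it's worth, the immediate cause of the ValueError is that Python's %-style dict formatting requires a conversion character (such as s) after the closing parenthesis, so the documented example is missing an s. A minimal stdlib sketch:

```python
# Python's %-style dict formatting needs a conversion character ('s'
# here) after the key; without it, '.' is parsed as a precision spec
# and the next letter triggers "unsupported format character".
template_fixed = '%(jugfile)s.jugdata'
print(template_fixed % {'jugfile': 'primes'})  # primes.jugdata

template_broken = '%(jugfile).jugdata'
try:
    template_broken % {'jugfile': 'primes'}
except ValueError as e:
    print('ValueError:', e)
```

So `jugdir=%(jugfile)s.jugdata` in the config file would likely also work without changing jug's code.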
The front page of the documentation states, among other things, that Python 3 is unsupported (even though the changelog just above it has an entry indicating support was added). Additionally, the "What's New" section is over a year old, making it seem like jug is under less active development than it really is.
I'm finding myself wanting to clean up the outputs of only some functions but not others.
Since --target already exists in other commands, would it be a viable option here?
I tried the default example given in the docs and I get the following error. Can you please help me?
Traceback (most recent call last):
File "/storage/work/dua143/anaconda2/bin/jug", line 11, in <module>
sys.exit(main())
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/jug.py", line 278, in main
cmdapi.run(options.subcommand, options=options, store=store, jugspace=jugspace)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/subcommands/__init__.py", line 271, in run
return cmd(*args, **kwargs)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/subcommands/__init__.py", line 161, in __call__
return self.run(*args, **kwargs)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/subcommands/execute.py", line 86, in run
store, jugspace = init(options.jugfile, options.jugdir, store=store)
File "/storage/work/dua143/anaconda2/lib/python2.7/site-packages/jug/jug.py", line 97, in init
jugfile_contents = open(jugfile).read()
IOError: [Errno 2] No such file or directory: 'primes.py'
Hi, I wrote a small piece of python code on a macbook which has 10 + 1 tasks. 10 similar sleeping task and 1 joint task. I am running it by 4 processes and found that sometime Jug cannot assign task correctly. It happens like one process does all tasks along and, at the same time, another 3 share the 11 tasks.
Running jug.hooks.exit_checks.exit_env_vars() when the specific JUG_* environment variables are not defined has no effect. It should therefore be safe to run at every execution.
Could the behaviour introduced by this hook become a permanent feature, such that you can always influence jug execution through JUG_* variables without having to explicitly register the hook in every jugfile?
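A minimal stdlib sketch of the kind of always-safe check being proposed (the variable name JUG_MAX_TASKS is illustrative, not necessarily jug's actual one):

```python
import os
import sys

def exit_if_env_requests(tasks_executed):
    # Hypothetical sketch: this is a no-op unless the environment
    # variable is set, so it would be safe to call on every execution.
    limit = os.environ.get('JUG_MAX_TASKS')
    if limit is not None and tasks_executed >= int(limit):
        sys.exit(0)
```

With no JUG_MAX_TASKS set, the call does nothing; setting it caps the number of tasks a worker runs.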
When I compute the value of primes with value(primes100), the following error emerges:
AttributeError: 'NoneType' object has no attribute 'can_load' in the value function.
Any help will be much appreciated! The variable store is always None.
I want to run a simple script on a cluster whose network transmission rate is relatively low. Processes in my script, if created with the multiprocessing module, have little communication with each other.
However, since jug needs I/O over the network during execution, will it create a lot of overhead here?
Hi, first I'd like to thank you guys for creating jug, which is very convenient to use for embarrassingly parallel problems.
However, I have found a bug using jug on a cluster (PBS). I used the file backend and I did find race conditions. The simple task I used for testing is this:
import os
import hashlib
from jug import TaskGenerator

myID = 'procid.%s.%s' % (str(os.getpid()), os.environ['HOSTNAME'])

def createdir(d):
    if not os.path.isdir(d):
        os.mkdir(d)
        os.mkdir('%s/%s' % (d, myID))
    else:
        os.mkdir(d + '.' + myID)

@TaskGenerator
def runjob(cmd):
    hash = hashlib.md5(str(cmd)).hexdigest()
    createdir(hash)

cmds = [c.strip() for c in open(joblistfile,'rt').read().splitlines() if c.strip() and not c.strip().startswith('#')]
for cmd in range(1024):
    runjob(str(cmd))
I submitted two instances in two different nodes.
I found that in the directory created by the task there were dir names containing ".procid.", which means a race condition did occur.
The reason, I think, is that "O_EXCL is broken on NFS file systems," as described in the GNU/Linux open(2) manpage. I would suggest using flufl.lock, which is safe over NFS:
http://pythonhosted.org/flufl.lock/docs/using.html
Thank you very much for your attention to the problem.
-Zheng Qu
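For context, the link-based approach that flufl.lock takes can be sketched as follows (helper names are mine, not flufl.lock's API): link(2) is atomic even over NFS, whereas O_EXCL historically was not.

```python
import os

def try_lock(lockfile, claimfile):
    # Write a uniquely-named claim file, then atomically hard-link it
    # to the shared lock name; link(2) is atomic even over NFS.
    with open(claimfile, 'w') as f:
        f.write(str(os.getpid()))
    try:
        os.link(claimfile, lockfile)
        return True          # we own the lock
    except FileExistsError:
        return False         # someone else holds it
    finally:
        os.unlink(claimfile)
```

The claim file must have a per-process unique name (e.g. hostname plus PID) so that two workers never collide before the link step.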
Hi, I defined a class which has an __init__. Some methods in this class are decorated with @TaskGenerator. These methods cannot get the class instance correctly. I tried two ways to solve it and neither works.
Would you mind telling me the correct way to do it?
I'd like to debug errors that occur when I run a jug script. However, adding the --pdb
flag doesn't seem to have any effect.
$ jug execute --debug --pdb jug-init-inducing-fixedhyp.py
...
CRITICAL:root:Could not import file 'jug-init-inducing-fixedhyp.py' (error: too many values to unpack (expected 4))
Traceback (most recent call last):
File ...
Am I misunderstanding the intended use of the --pdb
option?
(Many thanks for the software, it's great)
I've run the primes example given in the docs; the HPC I'm using has 5 nodes and is PBS-based.
qsub: invalid option --t
Here's the error
jug execute "/home/jaideep.gedi.17cse/qgis/test.py" | qsub -t 1-100
qsub: invalid option -- 't'
usage: qsub [-a date_time] [-A account_string] [-c interval]
[-C directive_prefix] [-e path] [-f ] [-h ] [-I [-X]] [-j oe|eo] [-J X-Y[:Z]]
[-k o|e|oe] [-l resource_list] [-m mail_options] [-M user_list]
[-N jobname] [-o path] [-p priority] [-P project] [-q queue] [-r y|n]
[-S path] [-u user_list] [-W otherattributes=value...]
[-v variable_list] [-V ] [-z] [script | -- command [arg1 ...]]
qsub --version
Exception ignored in: <_io.TextIOWrapper name='' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
Please look into this..
There seems to be a weird interaction when you run jug cleanup while jobs are still running (though I could be wrong). In addition, when a job is ended by SIGTERM, jobs seem to enter the completed stage and never finish.
Reported on the jug-users mailing list:
I have a code that requires multiple merge points. At these merge points, a file is written, and the next section of the code uses that file. To address this, I've used jug.barrier(). Since I would like to submit this as a batch job, I created a script run_jug.sh, which will run multiple jug processes on a single node. In that script, I use sleep-until to prevent the batch job from ending early. The problem I'm encountering is that sleep-until seems to be satisfied when all jug processes hit the first jug.barrier(), and as such the batch job ends early.
I confirm that this is a bona fide bug.
Hello, is it possible to call an equivalent of jug execute
from code? The context is that I want to configure tasks with values from other libs (e.g. one parameter would be the number of workers) and possibly use something like joblib to start jug workers. Thanks.
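As a workaround (this is not a jug API), one can shell out to the CLI from code; the helper below just builds and runs the command line:

```python
import subprocess

def jug_execute_cmd(jugfile, extra_args=()):
    # Build the 'jug execute' command line; any extra CLI flags can be
    # appended by the caller.
    return ['jug', 'execute', jugfile, *extra_args]

def run_jug_execute(jugfile, extra_args=()):
    # Launch a jug worker as a child process (requires jug on PATH).
    return subprocess.run(jug_execute_cmd(jugfile, extra_args)).returncode
```

Several such calls can then be started in parallel (e.g. via joblib or multiprocessing), since jug workers coordinate through the shared store rather than through their parent process.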
I am implementing a distributed calculation system with jug. I want to know whether it is possible to pass options to the jugfile from the command line of jug(1), or elsewhere, rather than having the jugfile read them itself.
I just realized there is an up-to-date documentation on http://jug.readthedocs.org/en/latest/
Previously, duckduckgo [1] referred me to http://pythonhosted.org/Jug/
This old version is also linked on PyPI: https://pypi.python.org/pypi/Jug ("Package documentation")
Consider the following jugfile:
import random
import jug
from jug.io import NoLoad

@jug.TaskGenerator
def gauss(i):
    return random.gauss(0, 1)

@jug.TaskGenerator
def load_and_square(t):
    data = jug.value(t.t)
    return data**2

@jug.TaskGenerator
def sum_squares(nums):
    return sum(nums)

ts = [load_and_square(NoLoad(gauss(i))) for i in range(3)]
sum_squares(ts)
The command jug invalidate --target jugfile.gauss (likewise for load_and_square and sum_squares) behaves as expected. However, if you drop into a jug shell (for example because only a subset of the tasks should be invalidated) and use the invalidate() builtin, you encounter unexpected behavior:
$ jug shell
In [1]: get_tasks()
Out[1]:
[Task(jugfile.gauss, args=(0,), kwargs={}),
Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214168>,), kwargs={}),
Task(jugfile.gauss, args=(1,), kwargs={}),
Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e2141f8>,), kwargs={}),
Task(jugfile.gauss, args=(2,), kwargs={}),
Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214288>,), kwargs={}),
Task(jugfile.sum_squares, args=([Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214168>,), kwargs={}), Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e2141f8>,), kwargs={}), Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214288>,), kwargs={})],), kwargs={})]
In [2]: invalidate(get_tasks()[0])
Building task DAG... (only performed once)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/tmp/jugfile.py in <module>
----> 1 invalidate(get_tasks()[0])
~/.envs/jug-debug/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/shell.py in _invalidate(t)
135 '''
136 from ..task import alltasks
--> 137 return invalidate(alltasks, reverse_cache, t)
138
139 def _get_tasks():
~/.envs/jug-debug/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/shell.py in invalidate(tasklist, reverse, task)
67 for t in tasklist:
68 for d in t.dependencies():
---> 69 reverse.setdefault(d.hash(), []).append(t)
70 queue = [task]
71 seen = set()
AttributeError: 'NoLoad' object has no attribute 'hash'
Since this error occurs with any task, it seems like the NoLoad "poisons" the whole DAG. It does not happen with Task.invalidate().
I often find myself using the following code pattern:
if condition:
    TMPDIR = "/some/temp"
else:
    TMPDIR = "/tmp"

@TaskGenerator
def func(item):
    use_item(item, TMPDIR)
    ...
This comes up when I want to have variables that are used by the function but that, if changed, should not result in different tasks.
Do you have any suggestion for how to handle this use case?
An alternative could be to have a special argument that can be modified without generating new tasks.
Do you see the following as a reasonable proposal?
@TaskGenerator
def func(item, __jugbypass):
    tmpdir = __jugbypass["TMPDIR"]
    use_item(item, tmpdir)
    ...

if condition:
    TMPDIR = "/some/temp"
else:
    TMPDIR = "/tmp"

func(item, __jugbypass={"TMPDIR": TMPDIR})
The main advantage of the second example is the ability to reuse code across jugfiles.
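The calling convention of the proposal can be sketched in pure Python (this does not implement the hashing exclusion itself, which would need support inside jug; the decorator name is mine):

```python
import functools

def with_bypass(f):
    # Hypothetical sketch: values passed under the reserved keyword
    # '__jugbypass' reach the function but, under the proposal, would
    # be excluded from the task hash, so changing them would not
    # create new tasks.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        bypass = kwargs.pop('__jugbypass', {})
        return f(*args, __jugbypass=bypass, **kwargs)
    return wrapper

@with_bypass
def func(item, __jugbypass=None):
    return (item, __jugbypass.get('TMPDIR', '/tmp'))
```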
Could this be 80 characters so it will fit on a standard terminal window?
I seem to be hitting a few problems with Tasklets. I've implemented an example of what I'm talking about based upon the documentation:
import jug

def square(i):
    return i**2

def gen_list(n):
    return [i for i in range(n)]

def select_first(t):
    return t[0]

result = jug.Task(gen_list, 5)
result0 = jug.Tasklet(select_first, result)
next = jug.Task(square, result0)
Running jug status
on this code gets me the following error:
File "/home/jrporter/miniconda2/envs/std3/bin/jug", line 11, in <module>
load_entry_point('Jug==1.6.7+git', 'console_scripts', 'jug')()
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/jug.py", line 290, in main
cmdapi.run(options.subcommand, options=options, store=store, jugspace=jugspace)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/__init__.py", line 271, in run
return cmd(*args, **kwargs)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/__init__.py", line 161, in __call__
return self.run(*args, **kwargs)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/status.py", line 263, in run
return _status_nocache(options)
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/status.py", line 223, in _status_nocache
elif t.can_run():
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/task.py", line 140, in can_run
if not hasattr(dep, '_result') and not dep.can_load():
File "/home/jrporter/miniconda2/envs/std3/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/task.py", line 421, in can_load
return self.base.can_load()
AttributeError: 'function' object has no attribute 'can_load'
If I comment out the last line (next = jug.Task(square, result0)
), it "works":
Failed Waiting Ready Complete Active Task name
-------------------------------------------------------------------------------------
0 0 1 0 0 test-tasklets.gen_list
......................................................................................
0 0 1 0 0 Total
My obvious interpretation of this is that the Tasklet object doesn't have a can_load method (which, it seems, should return True if its dependencies can be loaded?), but perhaps I'm doing something wrong.
Could you advise me on how to proceed? I have read the docs page several times and can't figure out what I'm doing wrong, if anything.
Ola @luispedro ,
during my simulations I came across this.
I assume that a lot of workers were doing small tasks, and that jug then exits the loop prematurely here, as all upnext tasks can be loaded in the meantime. This loop is outside the wait-cycles loop, and hence I assume that only the outer execution loop captures this case, but only twice before exiting.
Well, I know that a jug task should be of decent size for a reason, but I would still consider this something to think about, if I am not mistaken in my assumptions.
The number of actually running workers was about 80, and 14 of them exited prematurely like this. Everything else works like a charm, no exceptions occurred.
Complete stdout error log http://pastebin.com/vjTYHthD of one worker
Jugfile http://pastebin.com/hqwMCzXM
gridjug.grid_jug(
    jugfile=BOND_PERCOLATION_JUGFILE,
    jugdir=BOND_PERCOLATION_JUGDIR,
    jug_args=[
        '--aggressive-unload',
    ],
    jug_nworkers=400,
    keep_going=True,
    verbose=True,
    **GRIDMAP_PARAMS
)
When one supplies a task with several keyword arguments, jug stores them in a dictionary.
Subsequently, jug computes the hash by iterating over the dictionary.
This implicitly orders the dictionary when updating the hash of the task.
However, the order of keyword arguments (or of a dictionary) is undefined in Python < 3.6 (cf. PEP 468). Hence an implicit change in the order of the supplied keyword arguments leads to recomputation of the task (if that particular order has not been computed before).
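A straightforward remedy is to iterate over the keyword arguments in sorted order when hashing; a minimal stdlib sketch (not jug's actual hashing code):

```python
import hashlib
import pickle

def stable_kwargs_hash(kwargs):
    # Sorting the keys makes the digest independent of the order in
    # which the keyword arguments were supplied.
    h = hashlib.sha1()
    for key in sorted(kwargs):
        h.update(pickle.dumps((key, kwargs[key])))
    return h.hexdigest()
```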
-------------------------------------------------------------------------------------
0 0 1 0 0 jugfile.function
0 0 1 0 0 jugfile.function_other
.....................................................................................
0 0 2 2 0 Total
Running jug invalidate --target jugfile.function
invalidates everything.
I am not exactly sure what the purpose of NoLoad is, but when it is used, tasks are not aware of their own dependencies. I am not sure if this is intended behavior or not, but I thought I'd report it.
For example, consider:
import jug
from jug.utils import jug_execute
from jug.io import NoLoad
import random

@jug.TaskGenerator
def gauss():
    return random.gauss(0, 1)

@jug.TaskGenerator
def write_map_ts(ts):
    import os
    outfilename = os.path.abspath('./gauss.txt')
    with open(outfilename, 'w') as f:
        for t in ts:
            data = jug.value(t.t)
            f.write(str(data)+'\n')
    return outfilename

map_ts = [gauss() for i in range(10)]
f = write_map_ts([NoLoad(t) for t in map_ts])
jug_execute(['cat', f])
produces the dependency graph:
Note that although the true dependency graph is gauss <- write_map_ts <- jug_execute, the link between gauss and write_map_ts is interrupted by NoLoad.
Would solving this issue be as simple as implementing dependencies for NoLoad?
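Conceptually, yes: a wrapper that exposes its wrapped task as a dependency would let the DAG builder see through it. A minimal sketch of the idea (a stand-in class, not jug's actual NoLoad implementation):

```python
class NoLoadSketch:
    # Illustrates the suggested fix: keep a reference to the wrapped
    # task and report it as a dependency instead of hiding it, while
    # still not loading its result automatically.
    def __init__(self, t):
        self.t = t

    def dependencies(self):
        yield self.t
```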
Hi,
first of all, jug is a great module to distribute tasks on a cluster without too much overhead and it already helps me a lot.
While using it, I found that one cannot read results which contain empty strings in Python 2.7.6 (not tested with Python 3). Here is a minimal working example:
File jugtest.py
from jug import Task

def func():
    return ''

t = Task(func)
File jugtest_result.py
import jug
jug.init('jugtest.py', 'jugtest.jugdata')
import jugtest
print(jugtest.t.value())
Output:
$ jug execute jugtest.py
Executed Loaded Task name
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 0 jugtest.func
.............................................................................................................................................................................................................
1 0 Total
$ python jugtest_result.py
Traceback (most recent call last):
File "jugtest_result.py", line 5, in <module>
print(jugtest.t.value())
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/task.py", line 113, in value
return self.result
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/task.py", line 108, in _get_result
if not hasattr(self, '_result'): self.load()
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/task.py", line 150, in load
self._result = self.store.load(self.hash())
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 228, in load
return decode_from(input)
File "/home/feurerm/virtualenvs/2016_dataset_metric_learning/local/lib/python2.7/site-packages/jug/backends/encode.py", line 192, in decode_from
return pickle.load(stream)
EOFError
This is not a big issue because I can just output a dummy string, but it would be great if this could be fixed or documented so other users don't run into it again.
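Plain pickle has no trouble with empty strings, which suggests the EOFError originates in jug's encode/decode layer (perhaps skipping the write when the payload is empty) rather than in pickle itself; a quick stdlib check:

```python
import io
import pickle

# An empty string round-trips through pickle without issue.
buf = io.BytesIO()
pickle.dump('', buf)
buf.seek(0)
assert pickle.load(buf) == ''
```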
Ok, so take for example
def _buildXY():
    tan_train = [Task(np.random.randn, 100000, 528) for i in range(7)]
    tan_test = [Task(np.random.randn, 100000, 528) for i in range(5)]

    # xs, ys - lists of arrays
    def pad(xs, ys):
        maxT = max( x.shape[0] for x in xs )
        x = np.zeros((len(xs), maxT, xs[0].shape[1]))
        y = np.zeros((len(ys), maxT, ys[0].shape[1]))
        for ii in xrange(len(xs)):
            t = xs[ii].shape[0]
            x[ii,:t,:] = xs[ii]
            y[ii,:t,:] = ys[ii]
        return x, y

    y_train = [np.random.randn(100000, 5) for i in range(7)]
    y_test = [np.random.randn(100000, 5) for i in range(5)]
    train = Task(pad, tan_train, y_train)
    test = Task(pad, tan_test, y_test)
    return train[0], train[1], test[0], test[1]

_buildXY()
Running jug execute
gives me the error:
CRITICAL:root:Exception while running tangent_lstm.pad: error return without exception set
Traceback (most recent call last):
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/bin/jug", line 5, in <module>
main()
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 415, in main
execute(options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 254, in execute
execution_loop(tasks, options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 163, in execution_loop
t.run(debug_mode=options.debug)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/task.py", line 95, in run
self.store.dump(self._result, name)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 139, in dump
encode_to(object, output)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/encode.py", line 77, in encode_to
write(object, stream)
SystemError: error return without exception set
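A possible culprit: "SystemError: error return without exception set" from pickle under Python 2 is a symptom commonly associated with very large objects, and the padded array here is sizeable; back-of-the-envelope arithmetic:

```python
# The padded training array is 7 x 100000 x 528 float64 values,
# i.e. roughly 2.75 GiB, which is around the size where Python 2's
# cPickle was known to fail when serializing a single huge string.
n, maxT, d = 7, 100000, 528
bytes_per_float64 = 8
size_gib = n * maxT * d * bytes_per_float64 / 1024**3
print(round(size_gib, 2))  # ~2.75
```

If that is the cause, splitting the result into smaller tasks (e.g. one padded sequence per task) would likely avoid the failure.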
RQ is based on Redis and used by many; how is jug better?
Hi!
jug status
does not show me the number of running tasks. I'm using Python 3.4.
Is it possible that this is the same bug that caused #10? I noticed that all the lockfiles start with b'...
For example, test_jug.py:
from jug import Task
import numpy as np
Task(np.random.randn, 100, 20)
Then,
$ rm -rf test_jug.jugdata/ && jug execute test_jug.py
Executed Loaded Task name
-----------------------------------------------------------------------------------------
1 0 None.randn
.........................................................................................
1 0 Total
I'm on jug 1.1.
It would be nice if it were possible to pass options to the webstatus app, like the host to listen on and the port.
Something like:
--ip=<Unicode>
The IP address the webstatus server will listen on.
Default: 'localhost'
--port=<Int>
The port the webstatus server will listen on.
Default: 8080
Sometimes you might already be using that port for something else, and it would be nice to be able to have a separate instance.
Hi all,
While using and developing jug_schedule I've come across several design issues that boil down to how jug handles locks:
Locks in jug are created once and left for the entire duration of execution.
They are always removed when a task finishes, even if it fails, unless processes or nodes crash, in which case they are left on the filesystem and perceived by jug as "Running".
The consequence of removing locks on failures is that every time a new jug execute is launched, the failing task is retried.
The ideal solution here would be a keep-alive mechanism identical to what is present in NGLess, in addition to leaving the lock behind in case of failures.
Some brainstormed alternatives (with caveats):
These provide sufficient information to know whether a task failed, but are still insufficient to distinguish running from crashed.
The lock would still be left behind, so for all practical purposes the task wouldn't be retried until jug cleanup is used.
Extras:
- jug cleanup removes all locks, regardless of their state. jug cleanup --failed-only could make sense here.
- jug status could display a useful Failed column.

When trying the examples, e.g., examples/decrypt/printjugresults.py:
$ python3 printjugresults.py
Traceback (most recent call last):
File "printjugresults.py", line 2, in <module>
jug.init('jugfile', 'jugdata')
File ".../lib/python3.4/site-packages/jug/jug.py", line 386, in init
jugfile_contents = open(jugfile).read()
FileNotFoundError: [Errno 2] No such file or directory: 'jugfile'
So I change the jug.init call to use jugfile.py instead:
$ python3 printjugresults.py
Traceback (most recent call last):
File "printjugresults.py", line 4, in <module>
results = jug.task.value(jugfile.fullresults)
File ".../lib/python3.4/site-packages/jug/task.py", line 446, in value
return elem.value()
File ".../lib/python3.4/site-packages/jug/task.py", line 111, in value
return self.result
File ".../lib/python3.4/site-packages/jug/task.py", line 106, in _get_result
if not hasattr(self, '_result'): self.load()
File ".../lib/python3.4/site-packages/jug/task.py", line 147, in load
assert self.can_load()
AssertionError
Then I change jug.init to also use jugfile.jugdata, and:
$ python3 printjugresults.py
Jug is perfect for embarassingly parallel applications.
-- Luis Pedro Coelho
Password was 'qwert'
According to the documentation, the full names should not be necessary, but that does not seem to be the case.
When using
subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, creationflags=CREATE_NEW_CONSOLE)
to run the jug process, the subprocess only starts after checking status.
The same thing happens when running with
subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, shell=True)
The problem doesn't appear when using
subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT)
Would you mind helping?
Hi @luispedro, is there any chance of fixing the order of tasks in repeated calls of jug status? Or am I missing something? I am using a setup like
watch jug status
and for now, every new call to jug status produces a new order of tasks.
Could just be a Linux thing (not sure what you develop on), but the tildes are not expanded in the function read_configuration_file(fp=None). This results in never passing the IOError check and, thus, the jugrc file never being loaded (at least on my machine).
I think you have to use path.expanduser(fp) to replace fp in the path.exists check and the open(fp) call.
Sadly, when I do that and get past the IOError check, I get the error:
File "/usr/local/lib/python3.5/dist-packages/Jug-1.4.0+git-py3.5.egg/jug/options.py", line 299, in parse
'jugfile': cmdline.jugfile[:-3],
ValueError: unsupported format character 'j' (0x6a) at index 11
So clearly I'm missing something in the bigger picture!
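The suggested tilde-expansion fix can be sketched like this (read_configuration_file is the function named above; the body here is illustrative, not jug's actual code):

```python
import os.path

def read_configuration_file(fp=None):
    # Expand '~' before the existence check so paths such as
    # '~/.jugrc' are actually found.
    if fp is None:
        return None
    fp = os.path.expanduser(fp)
    if not os.path.exists(fp):
        return None
    return open(fp)
```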
If jugdir is not provided on the command line to jug execute, and there is no jug config file, the program errors out, since this happens prior to the call to init() which handles the missing jugdir.
Just define a default:
parser.add_option('--jugdir',
                  action='store',
                  dest='jugdir',
                  default='jugdir',  # define default value
                  help='Where to save intermediate results')
Hello, I was wondering whether there is a reason why CompoundTaskGenerator is not in __init__.py.
Is this feature still working? Or is it deprecated, still in progress, or just forgotten?
Great project, by the way; it has been very useful!
Hi,
First of all, thanks for this great lib! I am thinking about moving some of my code to it. However, I have a problem here (Python 3.3.5) with jug status. For example, if I run the prime numbers example, jug status primes.py always outputs:
Waiting Ready Finished Running Task name
----------------------------------------------------------------------------------------------------------------------------
0 99 0 0 primes.is_prime
............................................................................................................................
0 99 0 0 Total
Even though the computations are finished, as I can see by doing jug shell primes.py.
I did look around in jug/subcommands/status.py, and by commenting out this line:
https://github.com/luispedro/jug/blob/master/jug/subcommands/status.py#L186
I get the correct behaviour.
Thanks!
Hi Luis,
I am trying to install a version of jug I can hack on as I work on my projects. I downloaded the git repository and ran python setup.py install, which appeared to be successful. However, when I now run jug status, I get the error:
$ jug status
CRITICAL:root:Could not import file 'status' (error: [Errno 2] No such file or directory: 'status')
Traceback (most recent call last):
File "/home/jrporter/miniconda3/envs/sched2/lib/python3.9/site-packages/Jug-2.1.1.post0-py3.9.egg/jug/jug.py", line 101, in init
with open(jugfile) as jfile:
FileNotFoundError: [Errno 2] No such file or directory: 'status'
Maybe some kind of misprocessing of the argv?
Anyway, is this not the preferred way to do a development installation? Or did I make a mistake somewhere?
Thanks!
Running on Python 3.4, and IPython 3.2.1, the following error occurs when dropping to pdb/ipython:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../bin/jug", line 5, in <module>
main()
File ".../lib/python3.4/site-packages/jug/jug.py", line 427, in main
execute(options)
File ".../lib/python3.4/site-packages/jug/jug.py", line 266, in execute
execution_loop(tasks, options)
File ".../lib/python3.4/site-packages/jug/jug.py", line 198, in execution_loop
debugger = IPython.core.debugger.Pdb(colors)
File ".../lib/python3.4/site-packages/IPython/core/debugger.py", line 264, in __init__
self.set_colors(color_scheme)
File ".../lib/python3.4/site-packages/IPython/core/debugger.py", line 272, in set_colors
self.color_scheme_table.set_active_scheme(scheme)
File ".../lib/python3.4/site-packages/IPython/utils/coloransi.py", line 176, in set_active_scheme
scheme_test = scheme.lower()
AttributeError: 'LazyConfigValue' object has no attribute 'lower'
I have been using a decorator to provide the functionality of iteratetask instead of the function.
The reason I like this is that it specifies how to handle the output (i.e. how many items are in the tuple) closer to the place the output is produced (i.e. at the end of the function) rather than where the workflow is specified (which is usually a different file).
So it looks like:
@IterateTask(n=3)
@jug.TaskGenerator
def do_thing(args):
    return a, b, c
I've been using the following for a few weeks without incident:
import jug
from functools import wraps

def IterateTask(n):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            return jug.iteratetask(f(*args, **kwargs), n=n)
        return wrapper
    return decorator
Is this something you'd be interested in adding? Is there any reason I haven't foreseen why this isn't a good idea?
I recently had an issue with a few of my jug processes being killed and therefore locks being kept on some active tasks. Normally, I would just delete the lock files in the jugdir, but I was using the Redis backend, so this was not possible. As suggested in the docs, I ran jug cleanup, expecting only the locks to be cleared; I was surprised to find the completed jobs cleared as well.
Is there any way to have jug cleanup only clear the locks? If not, can there be an option for this?
Result files from computations that are no longer relevant have a performance impact on actions that require traversing jugfile.jugdir.
This happens often if jugfile.py is executed and then modified in ways that lead to the definition of new, independent or unrelated tasks.
Since running jug cleanup removes locks as well as all dangling results, it is unsafe to run while jug execute is active.
Having --keep-locks would make this action safer.
In order for jug cleanup to be race-condition free, .jugdir should be traversed for results first; the graph of tasks should be refreshed afterwards, and only then should files be removed.
Otherwise a result may exist for a task that was not yet available.
In certain circumstances, when the jug executor is run for long periods of time, memory appears not to be released from previous tasks. This leads to an eventual OOM when many tasks are executed.
For example, if you have a situation like this:
import resource
import numpy as np
from jug import TaskGenerator

@TaskGenerator
def make_array(i):
    print('make_array memory footprint is %0.2f MB' %
          (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2))
    a = np.random.randint(0, 512, size=(128, 128, 4))
    return a

@TaskGenerator
def process_array(a):
    print('process_array memory footprint is %0.2f MB' %
          (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2))
    a = a * 10
    return a

list_of_arrays = [make_array(i) for i in range(1000)]
processed_arrays = [process_array(a) for a in list_of_arrays]
Then you get the following output:
make_array memory footprint is 28.96 MB
make_array memory footprint is 29.64 MB
make_array memory footprint is 30.14 MB
[...]
make_array memory footprint is 529.29 MB
make_array memory footprint is 529.79 MB
process_array memory footprint is 530.30 MB
process_array memory footprint is 530.81 MB
[...]
process_array memory footprint is 1029.42 MB
process_array memory footprint is 1029.93 MB
process_array memory footprint is 1030.43 MB
process_array memory footprint is 1030.93 MB
Executed Loaded Task name
------------------------------------------------
1000 0 jugfile.make_array
1000 0 jugfile.process_array
.............................................................................
2000 0 Total
I imagine that this has something to do with the way finished jobs are serialized, and that there is a reference to that data hanging around somewhere?
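As a possible mitigation (not a fix for the underlying reference), jug's documented aggressive-unload option asks the executor to unload results after each task; it can be set on the command line or in the configuration file:

```
$ jug execute --aggressive-unload

# or in the jug configuration file
[execute]
aggressive-unload=True
```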
$ cat jugfile.py
from jug import TaskGenerator

@TaskGenerator
def add(a, b):
    return a + b

@TaskGenerator
def minus(a, b):
    return a - b

i = -1
for j in range(2):
    minus(add(j, i), i)
    i = j
$ jug execute
Executed Loaded Task name
-----------------------------------------------
2 0 jugfile.add
2 0 jugfile.minus
...............................................
4 0 Total
$ jug shell
In [1]: from jug.task import alltasks
In [2]: a = alltasks[2]
In [3]: a
Out[3]: Task(jugfile.add, args=(1, 0), kwargs={})
In [4]: invalidate(a)
Building task DAG... (only performed once)
In [5]: [t.can_load() for t in alltasks]
Out[5]: [True, True, False, True]
$ jug execute
Executed Loaded Task name
-----------------------------------------------
1 1 jugfile.add
0 2 jugfile.minus
...............................................
1 3 Total
I expected one jugfile.minus to also be invalidated.
At first I thought this was an issue with Task.invalidate(), but using jug.subcommands.shell.invalidate (the one available in jug shell) produced the same results.
jug invalidate --target jugfile.add behaves as expected and invalidates everything.
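The behavior I expected can be sketched as a recursive walk over reverse dependencies (a toy model of invalidation, not jug's implementation):

```python
# Toy model of dependency-aware invalidation: invalidating a task should
# also invalidate every task that (transitively) consumes its result.
# deps maps each task name to the list of tasks it depends on.

def invalidate(target, deps, valid):
    valid.discard(target)
    # walk dependents: any still-valid task whose inputs include the
    # invalidated task must be invalidated as well
    for task, inputs in deps.items():
        if target in inputs and task in valid:
            invalidate(task, deps, valid)

# roughly the graph from the jugfile above: each minus consumes one add
deps = {'add0': [], 'add1': [], 'minus0': ['add0'], 'minus1': ['add1']}
valid = {'add0', 'add1', 'minus0', 'minus1'}
invalidate('add1', deps, valid)
# valid is now {'add0', 'minus0'}: minus1 was invalidated along with add1
```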
Came across this error today:
CRITICAL:root:Exception while running tangent_lstm.test_lstm: maximum recursion depth exceeded while calling a Python object
Traceback (most recent call last):
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/bin/jug", line 5, in <module>
main()
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 415, in main
execute(options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 254, in execute
execution_loop(tasks, options)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 163, in execution_loop
t.run(debug_mode=options.debug)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/task.py", line 95, in run
self.store.dump(self._result, name)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 139, in dump
encode_to(object, output)
File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/encode.py", line 77, in encode_to
write(object, stream)
RuntimeError: maximum recursion depth exceeded while calling a Python object
I'm getting this error when creating a Task that wraps a function which returns a Keras model.
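For context, this is the generic failure mode of serializing deeply nested object graphs, which complex models can trigger; a minimal reproduction, independent of jug and Keras:

```python
import pickle

# Pickle walks object graphs recursively, so a sufficiently deep nesting
# exhausts the interpreter's recursion limit -- the same error jug hits
# when it tries to serialize the task's result in encode_to().
def deeply_nested(depth):
    obj = []
    for _ in range(depth):
        obj = [obj]
    return obj

try:
    pickle.dumps(deeply_nested(100000))
    failed = False
except RecursionError:
    failed = True
# failed is True
```

A common workaround is to save the model to disk inside the task (e.g. with the model's own save method) and return the file path instead of the model object; raising sys.setrecursionlimit can also help for moderately deep objects.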
Somewhere in the documentation it says that tasks should last at least 2 or 3 seconds. In order to tune this (e.g. via map_step and reduce_step in mapreduce), it would be nice to have a report of task timings.
This could be an extra column in the summary printed by every worker, but it would also be nice to have it accessible through the results of a task (which would allow for automatic tuning).
Many many thanks for this awesome tool! It's great. =)
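Until such a report exists, one hypothetical workaround is to wrap the task body and record wall-clock time alongside the result (timed below is an illustration, not part of jug's API):

```python
import time
from functools import wraps

def timed(f):
    # Returns (result, elapsed_seconds) so the timing is stored with the
    # result and can later feed automatic tuning of map_step/reduce_step.
    @wraps(f)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = f(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper

@timed
def work(n):
    return sum(range(n))

result, elapsed = work(1000)
# result == 499500; elapsed is a small non-negative float
```

In real use, the @timed wrapper would sit underneath @TaskGenerator so the timing travels with the serialized result.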