materialsproject / fireworks
The Fireworks Workflow Management Repo.
Home Page: https://materialsproject.github.io/fireworks
License: Other
Try the following:
Note: I renamed this method from "get_links" recently and made some changes due to weird output in the original function. You can revert to a version of FW from before today if you want to see what the function was doing before my changes.
I have seen this test fail often and suspect there is something wrong with the test itself (particularly as it is a timed test). Here is an example of it failing when I had only made changes to docs and docstrings ( https://travis-ci.org/materialsproject/fireworks/builds/60772474 ).
======================================================================
FAIL: test_rerun_timed_fws (fireworks.core.tests.test_launchpad.WorkflowFireworkStatesTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/materialsproject/fireworks/fireworks/core/tests/test_launchpad.py", line 710, in test_rerun_timed_fws
self.assertEqual(fw_state, fw_cache_state)
nose.proxy.AssertionError: 'RUNNING' != 'READY'
- RUNNING
+ READY
As the title says, running lpad webgui
seems to open the same page in two different tabs or windows.
Is there a mechanism for using configurations one might have in ~/.fireworks
automatically when doing things like
import fireworks
launchpad = fireworks.LaunchPad()
instead of having to put in the hostname, database name, username, password for a non-local MongoDB? I know that using lpad
from the command line automatically uses these configurations, but if there's a reasonable way of telling fireworks.LaunchPad
to do the same thing that would be awesome.
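FireWorks does provide a `LaunchPad.auto_load()` classmethod that is meant to resolve the same config files the `lpad` command uses. A minimal sketch, assuming `auto_load` behaves that way (the fallback handling is my own addition, not part of the API):

```python
def load_launchpad():
    """Return a LaunchPad built from the standard FireWorks config
    locations (my_launchpad.yaml / LAUNCHPAD_LOC), or None if FireWorks
    is not installed or no usable config can be loaded."""
    try:
        from fireworks import LaunchPad  # assumes FireWorks is installed
        return LaunchPad.auto_load()     # same resolution as `lpad`
    except Exception:
        return None

lp = load_launchpad()
```

With a config in place, this avoids spelling out hostname, database name, username, and password in every script.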
Add benchmark(s) for running workflows
It would be really helpful to have access to the launcher_dir
used, so that if data is stored in that directory it can be explored programmatically.
This is a major feature request. After struggling to make a particular workflow run across multiple resources, I am of the view that fireworks badly needs a resource-context settings module. The principle that resources should be separate from tasks is a good concept in theory, but in practice it is somewhat difficult. For example, a particular command may have alternatives or completely different names on different resources, or some resources may require special settings (you are allowed to use 1 process on one resource, but 8 on another). matlab can be named "matlab" on one resource and "matlab2009" on another. Yes, people can write if-else statements to cope with these, but that rapidly becomes unmanageable. E.g., if I need to use mpirun on one resource and aprun on another, I basically have to put if-else statements in every FireTask that uses mpi/aprun. This is also extremely unscalable: if I add a new resource, I have to remember to modify every single if-else statement in my potentially thousands of FireTasks.
My proposal is to encapsulate resource settings. For example, something basic would look like:
class ResourceA(dict):
    def __init__(self):
        self["matlab"] = "matlab2009"
        self["vasp"] = ["mpirun", "-np", "8", "vasp"]

    def at_resource(self):
        # Some test for being at the resource.
        return True
This would be supported by a decorator for FireTasks. E.g.,

@supported_resources([ResourceA])
class FireTaskA(FireTaskBase):
    ...

Its __init__ needs to check that it is being run at a supported resource. It will then set some default _fw_vars which are resource-based. Within FireTaskA, wherever you need a particular variable setting you can write

matlab_cmd = self["_fw_vars"]["matlab"]
Note: This is somewhat inspired by Fabric's env settings module. I am not saying it is the best implementation out there, and certainly the above is just a skeleton of how such a system would work. The actual implementation of course needs more strategic thinking.
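The proposed decorator could be fleshed out along these lines. Everything here (supported_resources, _fw_vars, at_resource) is hypothetical, following the skeleton above; none of it is part of the real FireWorks API:

```python
class ResourceA(dict):
    """Hypothetical resource-settings container from the proposal."""
    def __init__(self):
        super().__init__()
        self["matlab"] = "matlab2009"

    def at_resource(self):
        # A real check might inspect the hostname or environment variables.
        return True


def supported_resources(resources):
    """Class decorator: at init time, find which declared resource we are
    running on and stash its settings under the task's _fw_vars key."""
    def decorate(cls):
        orig_init = cls.__init__

        def __init__(self, *args, **kwargs):
            orig_init(self, *args, **kwargs)
            for res_cls in resources:
                res = res_cls()
                if res.at_resource():
                    self["_fw_vars"] = dict(res)
                    break
            else:
                raise RuntimeError("Not running at a supported resource")

        cls.__init__ = __init__
        return cls
    return decorate


@supported_resources([ResourceA])
class FireTaskA(dict):  # stand-in for FireTaskBase, which is dict-like
    pass
```

A task instance then reads `self["_fw_vars"]["matlab"]` and gets the resource-appropriate command without any if-else statements.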
Hi
How do I access stored_data for when a Firework returns a FWAction and stores the data?
Thanks
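One way to get at the stored data, sketched under the assumption that the FireWorks object model exposes it as launch.action.stored_data (which matches my reading of the code, but treat the attribute chain as an assumption):

```python
def get_stored_data(lp, fw_id):
    """Return stored_data from the most recent launch of fw_id, or None.
    `lp` is a LaunchPad (or anything with get_fw_by_id)."""
    fw = lp.get_fw_by_id(fw_id)
    if not getattr(fw, "launches", None):
        return None
    action = fw.launches[-1].action  # the FWAction the task returned
    return action.stored_data if action else None
```

From the command line, something like `lpad get_fws -i <fw_id> -d all` should also show the launch data, including stored_data.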
The subprocess
module in Python 3 returns a byte sequence for stdout and stderr. The problem is that the functions _parse_jobid
and _parse_njobs
from the common_adapter.py
module treat the output like a str, because that is the standard for Python 2, but on Python 3 those functions fail when they receive bytes as input:
return len(output_str.split('\n'))-1
TypeError: a bytes-like object is required, not 'str'
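A minimal sketch of a fix: normalize the output to str before parsing, so the same code works on both Python 2 and 3. The helper and function names here are hypothetical, not the actual common_adapter.py code:

```python
def ensure_str(output):
    """Decode subprocess output to text if it arrived as bytes."""
    if isinstance(output, bytes):
        return output.decode("utf-8", errors="replace")
    return output

def parse_njobs(output):
    # Mirrors the failing expression: len(output_str.split('\n')) - 1
    return len(ensure_str(output).split("\n")) - 1
```

An alternative is to open the subprocess with universal_newlines=True (text mode), so stdout/stderr are already str.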
For the same firework id there may be many different launches. How can one trace the results of the superseded launches through the workflow's topological graph?
In some cases, we may not have the arguments in order or this may not be possible. In these cases, it would be nice to have a kwargs
argument to allow a dictionary to be passed through and unpacked on the other side.
Consider this situation:
a user is using the PyTask to carry out an operation and, as a part of that, transfers a python object that has been serialized in JSON format (with json or jsonpickle) as an input to that PyTask. Then (it seems, based on my own testing), upon starting the job, Fireworks will also de-serialize the input object to the PyTask, resulting in errors. Perhaps Fireworks should make an effort not to alter the format of inputs to PyTasks?
Actually, after looking at my problem more, it doesn't seem like this is correct, and I am likely doing something else wrong.
I have a serious need for one particular feature.
As far as I can see, there is no way for me to figure out the mapping between a particular firework and its job id on a queue. There is a really roundabout way of looking at launch_dirs and doing qstat -f to inspect the directories, but it really shouldn't be this hard.
Knowing the specific queue id is very useful. For example, I just had two FW randomly die on me on the queue and I want to check what FW they correspond to so that I can restart them. The problem is that all the FWs are flagged as RUNNING, even though some of them have died.
Looking at the code, it seems that the reservation id is set only in reserve mode. Is there any reason not to do it for both reserve and non-reserve mode?
P.S. Calling the private _set_reservation_id is probably not a good idea. If that is supposed to be called outside of the Launchpad module, you'd probably want to make that a public method.
I need a way to pause workflows. For example, my workflows run for weeks at a time. But say I encounter scheduled maintenance on a resource in the middle of a week. I need a way to tell the workflow to pause after the current FW is done, i.e. any new additions are automatically set to a state called PAUSED until rerun_fws is called.
I am trying to see if current fireworks already allows this. But I am a bit unclear on the meanings behind all the states. Some are obvious (e.g., COMPLETED and RUNNING), but what's the difference between WAITING and READY? A glossary of all these states should be in the docs (and the code).
Running "rlaunch rapidfire" on a single node uncovered a long delay (8-10 sec.) between each task, for a large workflow of 10,000 items (5,000 sequences of 2 items). The delay had 2 sources: (1) a hostname lookup, which wasn't being cached -- this was immediately fixed, and accounted for ~5 sec. of the delay (2) a performance problem with the update after the job was launched.
To reproduce, use the script from this gist https://gist.github.com/dangunter/9939755 as build_wf.py and run
mkdir abcd
python build_wf.py --output abcd --type sequence --tasks 10000
lpad add abcd/fw_sequence_10000.yaml
rlaunch rapidfire
and note the pause between tasks.
Using rlaunch singleshot --offline from a qlaunch -r singleshot yields the following error:
"Traceback (most recent call last):
File "${HOME}/.local/lib/python2.7/site-packages/fireworks/core/rocket.py", line 172, in run
lp.log_message(logging.INFO, "Task started: %s." % t.fw_name)
AttributeError: 'NoneType' object has no attribute 'log_message'"
From line 69-70 of fireworks/scripts/rlaunch_run.py :
if args.command == 'singleshot' and args.offline:
launchpad = None
This makes sense; it should be reading from the FW.json file. However, on lines 172 and 193 of fireworks/core/rocket.py, it looks for a launchpad anyway for logging and raises an error:
lp.log_message(logging.INFO, "Task started: %s." % t.fw_name).
I was able to correct this by adding "if lp:" guards to the logging calls, but I don't know how to write a .patch file to submit the correction to the developers.
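The guard described above would look something like this. This is a sketch of the idea, not the actual diff against rocket.py; the wrapper function is hypothetical:

```python
import logging

def log_task_started(lp, task_name):
    """Log a task start, tolerating the offline case where lp is None.
    In offline singleshot mode the launchpad is deliberately set to None,
    so DB-backed logging must be skipped rather than crash."""
    if lp:
        lp.log_message(logging.INFO, "Task started: %s." % task_name)
```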
There are absolute paths that cause pip to fail installing.
/Users/ajain/Documents/code_matgen/fireworks/scripts/lpad
/Users/ajain/Documents/code_matgen/fireworks/scripts/mlaunch
/Users/ajain/Documents/code_matgen/fireworks/scripts/qlaunch
/Users/ajain/Documents/code_matgen/fireworks/scripts/rlaunch
If you change them to relative paths (i.e. remove the string /Users/ajain/Documents/code_matgen/ from those lines), it installs fine.
Hi, I created some PKGBUILDs
https://gist.github.com/rmorgans/76c8285832744ec66325
All tests pass with these versions, some of which are greater than those in requirements.txt in the git repo.
python2 2.7.9-1
python2-yaml 3.11-2
python2-pymongo 2.8-3
python2-jinja 2.7.3-1
python2-six 1.9.0-1
python2-monty 0.6.4-1
python2-dateutil 2.4.1-1
Hope these are useful.
In general, I find the command line option naming very inconsistent.
For example, a lot of the firework- or workflow-specific commands have _fws or _wfs suffixes, but reignite and archive don't; get_qid as well.
In general, I think the list of commands is getting unwieldy (not to mention the lpad_run.py script is getting super long). I think a reorg might be in order, e.g., splitting the admin stuff, the fireworks stuff, and the workflow stuff into three separate scripts.
When running in reservation mode, even if the q submit script fails for whatever reason (e.g. a bad qadapter), the FW is still set to RESERVED. And correcting the qadapter and redoing qlaunch does not result in the FW running.
This really should not be how it works. The state should be set to reserved only upon success of the q submission.
I can't do python setup.py develop --user, for example. Without the if __name__ guard, it works.
Right now, FW does not track rerun history. If a particular FW has fizzled and that fw is rerun with lpad rerun_fws, the launch history is not stored for the past fizzled attempts. This should be tracked so that people can easily debug any errors (e.g., checking if the failures are consistent across rerun attempts).
I am facing the need for detailed information about the whole workflow during the execution of a Task. In my case this is limited to simple tasks that clean up or store data from all the FWs launched during the WF, but I think it could be generally useful to be able to access detailed information about the workflow at the run_task level.
At the moment, it is possible to do that by reading the FW id from the FW.json file and instantiating a LaunchPad with the auto_load function to query the DB. However, there is a potential flaw in this approach, since the launchpad can be passed as an option to the q/m/rlaunch commands and there is no way to know that inside the Task.
There could probably be some workaround, but I would prefer to add a cleaner way of accessing those data. I can think of two different ways of doing that and would like some feedback.
Thanks
I get an error in following command:
$ qlaunch singleshot
2016-09-14 16:03:28,156 INFO moving to launch_dir /home/python/queue_tests
2016-09-14 16:03:28,162 INFO submitting queue script
2016-09-14 16:03:28,246 ERROR ----|vvv|----
2016-09-14 16:03:28,246 ERROR Could not parse job id following sbatch due to error a bytes-like object is required, not 'str'...
2016-09-14 16:03:28,249 ERROR Traceback (most recent call last):
File "/home/python/sf_box/fireworks/fireworks/user_objects/queue_adapters/common_adapter.py", line 200, in submit_to_queue
job_id = self._parse_jobid(p.stdout.read())
File "/home/python/sf_box/fireworks/fireworks/user_objects/queue_adapters/common_adapter.py", line 71, in _parse_jobid
for l in output_str.split("\n"):
TypeError: a bytes-like object is required, not 'str'
2016-09-14 16:03:28,250 ERROR ----|^^^|----
2016-09-14 16:03:28,250 ERROR ----|vvv|----
2016-09-14 16:03:28,250 ERROR Error writing/submitting queue script!
2016-09-14 16:03:28,256 ERROR Traceback (most recent call last):
File "/home/python/sf_box/fireworks/fireworks/queue/queue_launcher.py", line 133, in launch_rocket_to_queue
launchpad.set_reservation_id(launch_id, reservation_id)
File "/home/python/sf_install/python/3.5.2/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/python/sf_install/python/3.5.2/lib/python3.5/site-packages/monty-0.9.1-py3.5.egg/monty/os/__init__.py", line 32, in cd
yield
File "/home/python/sf_box/fireworks/fireworks/queue/queue_launcher.py", line 130, in launch_rocket_to_queue
raise RuntimeError('queue script could not be submitted, check queue '
RuntimeError: queue script could not be submitted, check queue script/queue adapter/queue server status!
2016-09-14 16:03:28,256 ERROR ----|^^^|----
But I get the correct result:
$ls
FW_job-2315017.error FW.json fw_test.yaml logging my_launchpad.yaml
FW_job-2315017.out FW_submit.script howdy.txt my_fworker.yaml my_qadapter.yaml
$ cat howdy.txt
howdy, your job launched successfully!
$ cat FW_job-2315017.out
2016-09-14 16:03:23,198 INFO Hostname/IP lookup (this will take a few seconds)
2016-09-14 16:03:23,201 INFO Launching Rocket
2016-09-14 16:03:24,258 INFO RUNNING fw_id: 1 in directory: /home/queue_tests
2016-09-14 16:03:24,265 INFO Task started: ScriptTask.
2016-09-14 16:03:24,283 INFO Task completed: ScriptTask
2016-09-14 16:03:24,590 INFO Rocket finished
$ cat my_qadapter.yaml
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -w /home/python/queue_tests/my_fworker.yaml -l /home/python/queue_tests/my_launchpad.yaml singleshot
ntasks: 1
cpus_per_task: 1
ntasks_per_node: 1
walltime: '00:02:00'
queue: null
account: null
job_name: null
logdir: /home/python/queue_tests/logging
pre_rocket: null
post_rocket: null
#You can override commands by uncommenting and changing the following lines:
#_q_commands_override:
#submit_cmd: my_qsubmit
#status_cmd: my_qstatus
#You can also supply your own template by uncommenting and changing the following line:
#template_file: /full/path/to/template
Am I making a mistake?
Both pip install and pip install --upgrade will only give you version 0.198.
Installing from the GitHub source code gives you version 0.66.
I have gone through the quick start and had a couple of problems I thought I would pass along. The first one is solved, but the second I am still confused about.
At present, serialization of NumPy arrays proceeds through conversion to strings. As a fallback for objects that have no other serialization strategy, this is OK. However, it is far from optimal for NumPy arrays. A much better strategy might be to cannibalize https://github.com/dattalab/mongowrapper .
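For comparison, a simple structured alternative to string conversion can be sketched as below. The dtype/shape/data field names are my own choice for illustration, not a FireWorks or mongowrapper convention:

```python
import numpy as np

def array_to_doc(arr):
    """Serialize a NumPy array to a JSON/BSON-friendly dict, preserving
    dtype and shape instead of flattening the array to a string."""
    return {
        "dtype": str(arr.dtype),
        "shape": list(arr.shape),
        "data": arr.ravel().tolist(),
    }

def doc_to_array(doc):
    """Inverse of array_to_doc: rebuild the array losslessly."""
    return np.array(doc["data"], dtype=doc["dtype"]).reshape(doc["shape"])
```

This round-trips exactly for numeric dtypes, and the stored document remains queryable in MongoDB.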
lpad get_wfs is extremely slow, regardless of what kind of display mode is being used (other than ids). The main reason is that the method reconstitutes the entire Workflow/Firework from the database before doing a dump to dict. For the purposes of getting info, this is very inefficient, especially for large workflows.
For example, I have a workflow that comprises >150 Fireworks (each firework spawned a few others, and this continued over many iterations). Doing a simple lpad get_wfs -d more -i 2 (single workflow only) took 14 secs, with several tens of MB transferred (each launch is pretty big given that the launch contains new FWs with specs). The result is the same with "-d less". 14 secs is an eternity to wait for a single command to check status.
I have optimized the "-d less" mode to query only for the info required, and that reduced the query time from 14 secs to <0.5 secs. I am sure the "-d more" display mode can be similarly optimized.
In general, I would suggest distinguishing the query for information from the query to reconstitute whole fireworks objects. The former can be limited to only the info needed, and the latter is rarely used (e.g., why would you want to reconstitute a whole Workflow object from the database, except in the rare case where you want to modify it?). The LaunchPad should have additional methods such as get_wf_summary(list of fw_ids, mode) optimized for info access instead of object access.
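The separation of info queries from object reconstitution can be sketched with a plain pymongo-style projection. The collection and field names here (state, name, updated_on) are assumptions about the FireWorks schema, not verified against it:

```python
def get_fw_summaries(fireworks_coll, fw_ids):
    """Fetch only the summary fields for a set of fireworks, instead of
    pulling and reconstituting whole Firework documents."""
    projection = {"fw_id": 1, "state": 1, "name": 1, "updated_on": 1,
                  "_id": 0}
    return list(fireworks_coll.find({"fw_id": {"$in": fw_ids}},
                                    projection))
```

Because only a handful of small fields cross the wire, this kind of query stays fast even when each launch document is large.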
(as reported by several users)
pymongo API seems to have changed
python: 2.7.11
fireworks: 1.2.9 (develop mode)
for the command qlaunch -r rapidfire --nlaunches infinite -m 1 --sleep 100 -b 10000
I got :
my environment
usage: qlaunch [-h] [-rh [REMOTE_HOST [REMOTE_HOST ...]]]
[-rc REMOTE_CONFIG_DIR [REMOTE_CONFIG_DIR ...]]
[-ru REMOTE_USER] [-rp REMOTE_PASSWORD] [-rs] [-d DAEMON]
[--launch_dir LAUNCH_DIR] [--logdir LOGDIR] [--loglvl LOGLVL]
[-s] [-r] [-l LAUNCHPAD_FILE] [-w FWORKER_FILE]
[-q QUEUEADAPTER_FILE] [-c CONFIG_DIR]
{singleshot,rapidfire} ...
qlaunch: error: unrecognized arguments: -r rapidfire
then I removed -r rapidfire, then I typed qlaunch --nlaunches infinite -m 1 --sleep 100 -b 10000
I got :
my environment
Traceback (most recent call last):
File "/lustre/home/umjzhh-1/kl_me2/virtenv_kl_me2/bin/qlaunch", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/scripts/qlaunch", line 6, in <module>
qlaunch()
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/scripts/qlaunch_run.py", line 173, in qlaunch
do_launch(args)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/scripts/qlaunch_run.py", line 52, in do_launch
reserve=args.reserve, strm_lvl=args.loglvl, timeout=args.timeout)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/queue/queue_launcher.py", line 174, in rapidfire
start_time = datetime.now()
AttributeError: 'module' object has no attribute 'now'
Now I found 2 solutions, in fireworks/queue/queue_launcher.py:
1. replace import datetime with from datetime import datetime, or
2. replace datetime.now() with datetime.datetime.now()
lpad get_fws
lpad rerun_fws -s RESERVED
and similar commands for fireworks
error message of lpad get_fws
kl_me2 environment
/lustre/home/umjzhh-1/kl_me2/codes/pymatgen/pymatgen/io/vasp/sets_deprecated.py:460: DeprecationWarning: __init__ is deprecated
All vasp input sets have been replaced by equivalents pymatgen.io.sets. Will be removed in pmg 4.0.
return DictVaspInputSet(name, loadfn(filename), **kwargs)
Traceback (most recent call last):
File "/lustre/home/umjzhh-1/kl_me2/virtenv_kl_me2/bin/lpad", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/scripts/lpad", line 6, in <module>
lpad()
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/scripts/lpad_run.py", line 863, in lpad
args.func(args)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/scripts/lpad_run.py", line 208, in get_fws
fw = lp.get_fw_by_id(id)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/launchpad.py", line 316, in get_fw_by_id
return Firework.from_dict(self.get_fw_dict_by_id(fw_id))
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 148, in _decorator
m_dict = func(self, *new_args, **kwargs)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/firework.py", line 325, in from_dict
state, created_on, fw_id, updated_on=updated_on)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/firework.py", line 218, in __init__
tasks] # put tasks in a special location of the spec
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 161, in _decorator
m_dict = func(self, *args, **kwargs)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 133, in _decorator
m_dict = recursive_dict(m_dict)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 76, in recursive_dict
return {recursive_dict(k, preserve_unicode): recursive_dict(v, preserve_unicode) for k, v in obj.items()}
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 76, in <dictcomp>
return {recursive_dict(k, preserve_unicode): recursive_dict(v, preserve_unicode) for k, v in obj.items()}
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 79, in recursive_dict
return [recursive_dict(v, preserve_unicode) for v in obj]
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 70, in recursive_dict
return recursive_dict(obj.as_dict(), preserve_unicode)
File "/lustre/home/umjzhh-1/kl_me2/codes/custodian/custodian/vasp/jobs.py", line 347, in as_dict
default_vasp_input_set=self.default_vis.as_dict(),
AttributeError: 'dict' object has no attribute 'as_dict'
error message of lpad rerun_fws -s RESERVED
l_me2 environment
Are you sure? This will modify 40 entries. (Y/N)Y
/lustre/home/umjzhh-1/kl_me2/codes/pymatgen/pymatgen/io/vasp/sets_deprecated.py:460: DeprecationWarning: __init__ is deprecated
All vasp input sets have been replaced by equivalents pymatgen.io.sets. Will be removed in pmg 4.0.
return DictVaspInputSet(name, loadfn(filename), **kwargs)
2016-06-03 20:31:30,242 DEBUG Processed fw_id: ...
...
2016-06-03 20:31:31,194 INFO Also rerunning duplicate fw_id: 11131
2016-06-03 20:31:31,233 DEBUG Processed fw_id: 3940
2016-06-03 20:31:31,276 INFO Also rerunning duplicate fw_id: 11110
2016-06-03 20:31:31,323 DEBUG Processed fw_id: 3947
2016-06-03 20:31:31,375 INFO Also rerunning duplicate fw_id: 11103
2016-06-03 20:31:31,416 DEBUG Processed fw_id: 3968
2016-06-03 20:31:30,242 DEBUG Processed fw_id: ...
Traceback (most recent call last):
File "/lustre/home/umjzhh-1/kl_me2/virtenv_kl_me2/bin/lpad", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/scripts/lpad", line 6, in <module>
lpad()
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/scripts/lpad_run.py", line 863, in lpad
args.func(args)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/scripts/lpad_run.py", line 387, in rerun_fws
lp.rerun_fw(int(f))
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/launchpad.py", line 966, in rerun_fw
updated_ids = wf.rerun_fw(fw_id)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/firework.py", line 836, in rerun_fw
self.rerun_fw(child_id, updated_ids))
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/firework.py", line 830, in rerun_fw
m_fw._rerun()
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/launchpad.py", line 1219, in _rerun
self.full_fw._rerun()
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/launchpad.py", line 1309, in full_fw
self._get_launch_data(launch_field)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/launchpad.py", line 1320, in _get_launch_data
fw = self.partial_fw # assure stage 1
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/launchpad.py", line 1302, in partial_fw
self._fw = Firework.from_dict(data)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 148, in _decorator
m_dict = func(self, *new_args, **kwargs)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/firework.py", line 325, in from_dict
state, created_on, fw_id, updated_on=updated_on)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/core/firework.py", line 218, in __init__
tasks] # put tasks in a special location of the spec
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 161, in _decorator
m_dict = func(self, *args, **kwargs)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 133, in _decorator
m_dict = recursive_dict(m_dict)
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 76, in recursive_dict
return {recursive_dict(k, preserve_unicode): recursive_dict(v, preserve_unicode) for k, v in obj.items()}
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 76, in <dictcomp>
return {recursive_dict(k, preserve_unicode): recursive_dict(v, preserve_unicode) for k, v in obj.items()}
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 79, in recursive_dict
return [recursive_dict(v, preserve_unicode) for v in obj]
File "/lustre/home/umjzhh-1/kl_me2/codes/fireworks/fireworks/utilities/fw_serializers.py", line 70, in recursive_dict
return recursive_dict(obj.as_dict(), preserve_unicode)
File "/lustre/home/umjzhh-1/kl_me2/codes/custodian/custodian/vasp/jobs.py", line 347, in as_dict
default_vasp_input_set=self.default_vis.as_dict(),
AttributeError: 'dict' object has no attribute 'as_dict'
There are many pyc files in the delivery (tar.gz), created with Python 3.3, which cause errors when running on other versions. If we delete all the pycs, they are recreated and everything works well.
Webgui fails to start with the above error:
C:\Users\kpoman>python c:\Python34\Lib\site-packages\fireworks\scripts\lpad_run.py webgui
Process Process-1:
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\django\utils\translation\trans_real.py", line 186, in _fetch
app_configs = reversed(list(apps.get_app_configs()))
File "C:\Python34\lib\site-packages\django\apps\registry.py", line 137, in get_app_configs
self.check_apps_ready()
File "C:\Python34\lib\site-packages\django\apps\registry.py", line 124, in check_apps_ready
raise AppRegistryNotReady("Apps aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python34\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
File "C:\Python34\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "C:\Python34\lib\site-packages\django\core\management\__init__.py", line 115, in call_command
return klass.execute(*args, **defaults)
File "C:\Python34\lib\site-packages\django\core\management\base.py", line 331, in execute
translation.activate('en-us')
File "C:\Python34\lib\site-packages\django\utils\translation\__init__.py", line 145, in activate
return _trans.activate(language)
File "C:\Python34\lib\site-packages\django\utils\translation\trans_real.py", line 225, in activate
_active.value = translation(language)
File "C:\Python34\lib\site-packages\django\utils\translation\trans_real.py", line 209, in translation
default_translation = _fetch(settings.LANGUAGE_CODE)
File "C:\Python34\lib\site-packages\django\utils\translation\trans_real.py", line 189, in _fetch
"The translation infrastructure cannot be initialized before the "
django.core.exceptions.AppRegistryNotReady: The translation infrastructure cannot be initialized before the apps registry is ready. Check that you don't make non-lazy gettext calls at import time.
C:\Users\kpoman>
This apparently is caused by some incompatibility on the way django starts, introduced in django >=1.7
Up to this point, I have been able to interact nicely with the code in IPython, constructing the tasks and establishing their dependencies. Finally, I launch them and can inspect the results.
Now that I am moving to using a Sun Grid Engine queue, I am a little confused as to how I can do this. The documentation discusses how to do this with YAML files. However, I like the interactivity afforded me by IPython and would like to continue writing code in Python.
I see that there is a rapidfire
function for the queue, but I am unclear on how to get it to work. In particular, how can I set up the CommonAdapter
so that it will launch this job from Python without going through a yaml file directly?
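A pure-Python setup might look roughly like the sketch below. It assumes `CommonAdapter` accepts the queue type and template parameters as keyword arguments and that `fireworks.queue.queue_launcher.rapidfire` takes a launchpad, fworker, and qadapter; the queue name and walltime values are placeholders, and none of this has been run against a real SGE cluster:

```python
def launch_sge_rapidfire():
    """Sketch: queue-launch from Python instead of YAML files.
    Imports are deferred so this shows the shape without requiring a
    configured FireWorks installation at import time."""
    from fireworks import LaunchPad, FWorker
    from fireworks.user_objects.queue_adapters.common_adapter import CommonAdapter
    from fireworks.queue.queue_launcher import rapidfire

    launchpad = LaunchPad.auto_load()
    fworker = FWorker()
    qadapter = CommonAdapter(
        q_type="SGE",                        # queue flavor
        rocket_launch="rlaunch singleshot",  # what each queue job runs
        queue="all.q",                       # hypothetical queue name
        walltime="01:00:00",
    )
    rapidfire(launchpad, fworker, qadapter, nlaunches=1)
```

The keyword arguments play the same role as the keys in my_qadapter.yaml, so anything you would put in the YAML file should be expressible here.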
I want a way to transfer data between FireTasks. Right now, there is a way to do so between Fireworks. But even with mod_spec or add_spec, these changes do not get updated in the spec until all tasks are done.
I really need task-level recovery for Fireworks. For example, let's say I have a Firework that runs:
Task 1: Setup input
Task 2: Run VASP [Really expensive]
Task 3: Transfer Calc (takes < 1min)
Task 4: Setup new firework (takes seconds)
Let's say for some reason, Transfer Calc fails (e.g., because there was a temporary IO outage). When I rerun the firework, I think the Firework should automatically know that Task 1 and Task 2 have completed successfully and only proceed to do Task 3.
Note that of course I know I can separate every single task into a Firework, but it is just too dumb to wait in the queue just to do a transfer that takes < 1min.
How would you go about extending the workflow spec to allow make-style dependencies? It would be nice to have a general way for a job to be executed only if the inputs have a date later than the output files.
This could also be extended to the way you define links in the specs section.
One way would be to write a task-specific FireTask, but a general mechanism in the spec would be more useful.
Is this easy to do?
Right now, if a q submission fails, I only get the following:
RuntimeError: queue script could not be submitted, check queue adapter and queue server status!
which is rather unhelpful and impossible to debug. At the minimum, fireworks should spit out the actual error message from the q submission command.
Looking at the documentation for PyTask (https://pythonhosted.org/FireWorks/pytask.html), it looks like it's intended that it should be able to return a FWAction for message-passing etc, but I've been unable to make this work. Looking at script_task.py it looks like it will never return the output.
Inserting something like
elif isinstance(output, FWAction):
    return output
at the end of the PyTask definition makes it function as I understand it should.
Am I missing something in how I should be using this?
Add a benchmark for loading large and/or multiple workflows
Port 5000 is a common one to use. There should be a more graceful fallback than just crashing (e.g. bump the port number and try again).
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Process Process-1:
Traceback (most recent call last):
File "/zopt/conda/envs/nanshenv/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/zopt/conda/envs/nanshenv/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/flask/app.py", line 772, in run
run_simple(host, port, self, **options)
File "/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/werkzeug/serving.py", line 625, in run_simple
inner()
File "/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/werkzeug/serving.py", line 603, in inner
passthrough_errors, ssl_context).serve_forever()
File "/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/werkzeug/serving.py", line 512, in make_server
passthrough_errors, ssl_context)
File "/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/werkzeug/serving.py", line 440, in __init__
HTTPServer.__init__(self, (host, int(port)), handler)
File "/zopt/conda/envs/nanshenv/lib/python2.7/SocketServer.py", line 420, in __init__
self.server_bind()
File "/zopt/conda/envs/nanshenv/lib/python2.7/BaseHTTPServer.py", line 108, in server_bind
SocketServer.TCPServer.server_bind(self)
File "/zopt/conda/envs/nanshenv/lib/python2.7/SocketServer.py", line 434, in server_bind
self.socket.bind(self.server_address)
File "/zopt/conda/envs/nanshenv/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 48] Address already in use
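A graceful fallback could look something like the sketch below: probe ports upward from the default until a bind succeeds, then hand that port to the web server. This is an illustration, not existing FireWorks code, and note the inherent race between probing and the server's own bind:

```python
import socket


def find_free_port(start=5000, max_tries=20):
    # try the default port first, then bump it on "Address already in use"
    # instead of crashing; the caller rebinds, so a small race remains
    for port in range(start, start + max_tries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", port))
            return port
        except OSError:
            continue
        finally:
            s.close()
    raise RuntimeError("no free port in [%d, %d)" % (start, start + max_tries))
```

The webgui launcher could then print the port it actually chose rather than assuming 5000.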
Instead of a fixed tracker, I would like a base class for monitoring launches (à la the monitor function of Custodian handlers). There is no limit to what a monitor can do; tracking the last few lines of a file is just one special case.
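A sketch of what such a base class might look like, with the current fixed tracker recast as one subclass. All names here are assumptions for illustration, not existing FireWorks API:

```python
import collections
import os


class LaunchMonitor(object):
    # hypothetical base class: a rocket would call check() periodically
    # while a launch is running and record whatever it returns
    def check(self, launch_dir):
        raise NotImplementedError


class LastLinesTracker(LaunchMonitor):
    # the existing tracker behavior becomes just one monitor subclass
    def __init__(self, filename, num_lines=25):
        self.filename = filename
        self.num_lines = num_lines

    def check(self, launch_dir):
        # return the last num_lines lines of the tracked file
        with open(os.path.join(launch_dir, self.filename)) as f:
            return list(collections.deque(f, maxlen=self.num_lines))
```

Other subclasses could parse output for error signatures, watch memory usage, or signal the rocket to abort, which is closer to what Custodian's monitors do.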
Hi,
I just noticed that the PBS script template doesn't have a variable defined for setting the required per-process memory. The line for setting the process memory in PBS looks something like this:
#PBS -l pmem=1000mb
It would be great if you guys could add a 'pmem' variable to the PBS template.
regards
Kiran Mathew
I am working with FireWorks 1.2.7, MPenv 0.1 (sjtu branch), and Python 2.7.11.
When I run the following code in Python:
from fireworks.fw_config import QUEUEADAPTER_LOC, CONFIG_FILE_DIR, FWORKER_LOC, LAUNCHPAD_LOC
print QUEUEADAPTER_LOC, '\n', CONFIG_FILE_DIR, '\n', FWORKER_LOC, '\n', LAUNCHPAD_LOC
I see:
None
$HOME/<ENV_NAME>/config/config_SjtuPi
$HOME/<ENV_NAME>/config/config_SjtuPi/my_fworker.yaml
$HOME/<ENV_NAME>/config/config_SjtuPi/my_launchpad.yaml
I found the relevant code in fireworks/fireworks/scripts/qlaunch_run.py: lines 35-38 source the queue adapter file as my_qadapter.yaml, but line 106 sets the default QUEUEADAPTER_LOC to my_queueadapter.yaml, which doesn't exist (see also lines 143-159 of fireworks/fireworks/fw_config.py). I don't know whether my_qadapter.yaml will work for the qlaunch command. For now, I have made a soft link my_queueadapter.yaml -> my_qadapter.yaml, which makes QUEUEADAPTER_LOC print normally.
It isn't obvious to me from the spec that this is intended. I would expect that all directories would be at the same level as if one had run launch_rocket
multiple times. Instead, it appears that the directories become deeply nested. This seems to be caused by multiple calls to os.chdir
. If this is indeed intended and expected behavior, I would expect at the minimum that we would change back to the original directory after a run.
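The "change back to the original directory" behavior can be made automatic with a small context manager wrapped around each launch. This is a sketch of one possible approach, not code from the repo:

```python
import os
from contextlib import contextmanager


@contextmanager
def pushd(path):
    # change into a launch directory and always restore the original
    # working directory afterwards, even if the launch raises
    prev = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev)
```

Running each rocket inside `with pushd(launch_dir): ...` would keep successive rapidfire launch directories siblings rather than nesting them.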
There is a serious bug in the links validation. Or to be more exact, there isn't links validation as far as I can see.
For example, starting from a blank launchpad, the following code will generate a workflow that can be inserted without error, but is actually invalid and get_wfs raises an error.
from fireworks import FireWork, Workflow
from fireworks.user_objects.firetasks.script_task import ScriptTask
fws = []
for i in xrange(5):
    fw = FireWork([ScriptTask(script="echo %d" % i)], fw_id=i)
    fws.append(fw)
wf = Workflow(fws, links_dict={0: [1, 2, 3], 1: [4], 2: [100]})
wf.to_file("testwf.yaml")
Even worse, if you replace the final 2: [100] with 2: [5], there is no error during workflow creation or insertion, and get_wfs shows that 2 is now linked to fw_id 5. After reindexing, the correct mapping should be 2: 6, and 6 is non-existent.
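A validation pass at Workflow construction time could catch both cases. A minimal sketch (a hypothetical helper, not existing FireWorks code) that checks every parent and child in the links dict against the known fw_ids:

```python
def validate_links(fw_ids, links_dict):
    # hypothetical validator: every parent and child in links_dict must
    # refer to a fw_id that actually exists in the workflow
    known = set(fw_ids)
    for parent, children in links_dict.items():
        if parent not in known:
            raise ValueError("unknown parent fw_id: %r" % (parent,))
        for child in children:
            if child not in known:
                raise ValueError("unknown child fw_id: %r" % (child,))
```

Called from Workflow.__init__, this would reject the 2: [100] link above at creation time instead of letting an invalid workflow reach the database.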
Hi,
First of, great library!
There is only one thing I would like to do which I cannot get to work: load a workflow from a file and execute it as part of a Python script.
For example I would like to do something like:
from fireworks import Firework, LaunchPad, ScriptTask
from fireworks.core.rocket_launcher import launch_rocket
# set up the LaunchPad and reset it
launchpad = LaunchPad()
launchpad.reset('', require_password=False)
fw = Firework.from_file("/home/chris/dev/fireworks/fw_tutorials/org_wf.yaml")
# store workflow and launch it locally
launchpad.add_wf(fw)
launch_rocket(launchpad)
However I get the following error:
Traceback (most recent call last):
File "WorkflowEngine.py", line 12, in
fw = Firework.from_file("/levaux/tests/fireworkstest/org_wf.yaml")
File "/usr/local/lib/python2.7/dist-packages/fireworks/utilities/fw_serializers.py", line 253, in from_file
return cls.from_format(f.read(), f_format=f_format)
File "/usr/local/lib/python2.7/dist-packages/fireworks/utilities/fw_serializers.py", line 228, in from_format
yaml.load(f_str, Loader=Loader)))
File "/usr/local/lib/python2.7/dist-packages/fireworks/utilities/fw_serializers.py", line 145, in _decorator
m_dict = func(self, _new_args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/fireworks/core/firework.py", line 311, in from_dict
tasks = m_dict['spec']['_tasks']
KeyError: u'spec'
I can however load and execute simpler single fireworks like: fw_test.yaml
Thanks in advance.
Cheers
Chris
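For what it's worth, the KeyError is consistent with org_wf.yaml being a serialized Workflow rather than a single Firework: a workflow file has top-level 'fws' and 'links' keys, while a single firework has a top-level 'spec' key, and Firework.from_dict reads 'spec' directly. Loading the file with Workflow.from_file instead of Firework.from_file should avoid the error. The sketch below uses hypothetical stand-in functions (not the FireWorks API) just to illustrate the two formats:

```python
def firework_from_dict(d):
    # simplified stand-in for Firework.from_dict: reads d['spec'] directly,
    # which is exactly where the reported KeyError comes from
    return d["spec"]["_tasks"]


def load_any(d):
    # hypothetical dispatcher: a serialized Workflow has top-level
    # 'fws'/'links' keys; a single Firework has a top-level 'spec' key
    if "fws" in d and "links" in d:
        return [firework_from_dict(fw) for fw in d["fws"]]
    return firework_from_dict(d)
```

This also explains why the simpler fw_test.yaml loads fine: it is a single-firework serialization with a 'spec' key.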
The following code does not work correctly:
from fireworks import FireWork, Workflow, FWorker, LaunchPad
from fireworks.core.rocket_launcher import rapidfire
from fireworks.user_objects.firetasks.script_task import ScriptTask
# define four individual FireWorks used in the Workflow
task1 = ScriptTask.from_str('echo "Task 1"')
task2 = ScriptTask.from_str('echo "Task 2"')
spec = {'_category': 'cluster1'}
fw1 = FireWork(task1, fw_id=1, name='Task 1', spec=spec)
fw2 = FireWork(task2, fw_id=2, name='Task 2', spec=spec)
# assemble Workflow from FireWorks and their connections by id
workflow = Workflow([fw1, fw2])
# store workflow and launch it locally
launchpad = LaunchPad.auto_load()
launchpad.add_wf(workflow)
lpad get_fws -i 1 -d all
would show:
[
{
"fw_id": 1,
"state": "READY",
"name": "Task 1",
"created_on": "2014-03-21T17:35:57.614476",
"spec": {
"_tasks": [
{
"use_shell": true,
"_fw_name": "ScriptTask",
"script": [
"echo \"Task 2\""
]
}
],
"_category": "cluster1"
}
},
{
"fw_id": 2,
"state": "READY",
"name": "Task 2",
"created_on": "2014-03-21T17:35:57.614507",
"spec": {
"_tasks": [
{
"use_shell": true,
"_fw_name": "ScriptTask",
"script": [
"echo \"Task 2\""
]
}
],
"_category": "cluster1"
}
}
]
Note how the tasks in FW 1 got changed to the tasks in FW 2. This happens because both fireworks share the same spec dict, which FW 2's initialization mutates in place. I suggest copying spec in FireWork.__init__()
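The proposed fix can be demonstrated without FireWorks at all. Below is a minimal stand-in for FireWork.__init__ (not the real class) that deep-copies the caller's spec instead of aliasing and mutating it:

```python
import copy


class FireWorkSketch(object):
    # minimal stand-in for FireWork.__init__ illustrating the proposed fix:
    # deep-copy the caller's spec instead of mutating the shared dict
    def __init__(self, tasks, spec=None, fw_id=-1):
        self.spec = copy.deepcopy(spec) if spec else {}
        self.spec["_tasks"] = list(tasks)
        self.fw_id = fw_id
```

With the copy in place, passing the same spec dict to two fireworks no longer lets the second initialization overwrite the first one's tasks, and the caller's dict stays untouched.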
As mentioned by Shyue, qlaunch rapidfire will create many empty directories that can greatly exceed the number of fireworks in the database. If fireworks are not added to the DB by the time the jobs start running in the queue, you will get empty directories all over your file system.