nhoffman / bioscons
Extends the scons build tool for reproducible workflows in bioinformatics.
This is handy for submitting srun jobs through salloc, where the srun commands must all be run with --ntasks=1 for things to work properly. It's annoying to have to set this flag on every single Command.
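A minimal sketch of the idea in plain Python: the hypothetical helper with_srun below prefixes a shell action with a shared tuple of srun arguments, so --ntasks=1 is specified once instead of on every Command. The helper name and its default_args/extra_args parameters are illustrative assumptions, not part of bioscons.

```python
import shlex

# Hypothetical environment-wide default: every job step gets exactly one task.
DEFAULT_SRUN_ARGS = ("--ntasks=1",)


def with_srun(action, default_args=DEFAULT_SRUN_ARGS, extra_args=()):
    """Prefix a shell action with srun and a shared set of arguments.

    default_args is meant to be set once (for example per environment);
    extra_args allows per-command additions.
    """
    parts = ["srun", *(shlex.quote(a) for a in (*default_args, *extra_args)), action]
    return " ".join(parts)
```

For example, with_srun("hostname") yields "srun --ntasks=1 hostname", and per-command options such as a memory request can be passed through extra_args.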
Specifically,
targs = Targets(locals().values())
targs.show_extras("outdir")
fails with the following trace:
╰➤ sc output
scons: Reading SConscript files ...
AttributeError: 'Targets' object has no attribute 'targets':
File "/home/csmall/code/slurm-scons-debug/SConstruct", line 53:
tgts = fileutils.Targets(locals().values())
File "/home/csmall/pythedge/local/lib/python2.7/site-packages/bioscons/fileutils.py", line 58:
self.targets = self.update(objs) if objs else set()
File "/home/csmall/pythedge/local/lib/python2.7/site-packages/bioscons/fileutils.py", line 68:
self.targets.update(
The following code, however, works:
tgts = fileutils.Targets()
tgts.update(locals().values())
tgts.show_extras(outdir)
It seems that for update to run successfully, there must already be a targets attribute.
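The trace points at the cause: __init__ assigns self.targets from the return value of self.update(), but update() itself reads self.targets, which does not exist yet. (Even if it did, an update() that mutates a set in place would presumably return None.) A minimal sketch of one fix, assuming update() only needs to add items to a set (the real bioscons method also filters SCons nodes, which is omitted here), creates the empty set first and then calls update():

```python
class Targets:
    """Simplified stand-in for bioscons.fileutils.Targets (illustrative only)."""

    def __init__(self, objs=None):
        # Create the attribute before update() tries to use it.
        self.targets = set()
        if objs:
            self.update(objs)

    def update(self, objs):
        # Mutate in place; callers should not rely on a return value.
        self.targets.update(objs)
```

With this ordering, Targets(locals().values()) and the two-step Targets() followed by update() end up in the same state.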
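When using an action with multiple steps (either ';' or '&&'), only the first command is timed.
This follows from how the shell parses the line: /usr/bin/time cmd1 && cmd2 is read as (/usr/bin/time cmd1) && cmd2, so time only ever sees the first command: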
/usr/bin/time --verbose ls /mnt/disk11 && date
Command being timed: "ls /mnt/disk11"
But wrapping both commands in bash -c times both:
/usr/bin/time --verbose bash -c 'ls /mnt/disk2/ && date'
Command being timed: "bash -c ls /mnt/disk2/ && date"
The command that bioscons runs (which looks like it should time both) is:
srun -J "/usr/bin/time" bash -c '/usr/bin/time --verbose --output output/sample.test.time ls /mnt/disk11 && date '
But timing is only picking up the first command:
Command being timed: "ls /mnt/disk11"
Double and single quotes behave the same way, both in the SConstruct and at the command line.
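In the srun line above, /usr/bin/time sits inside the bash -c string, so that inner shell parses it exactly like the first example: (/usr/bin/time ... ls /mnt/disk11) && date. A minimal sketch of a fix, using a hypothetical helper timed_command, hands the whole action to its own bash -c process so that time measures that single child:

```python
import shlex


def timed_command(action, time_output, time_bin="/usr/bin/time"):
    """Return a command in which time measures the whole action.

    The action runs in one bash -c process, so '&&' and ';' are
    interpreted by that child shell rather than by the shell that
    invokes time.
    """
    inner = "bash -c " + shlex.quote(action)
    return f"{time_bin} --verbose --output {shlex.quote(time_output)} {inner}"
```

timed_command("ls /mnt/disk11 && date", "output/sample.test.time") gives /usr/bin/time --verbose --output output/sample.test.time bash -c 'ls /mnt/disk11 && date'. If that string is then embedded in another single-quoted bash -c '...' (as in the srun line), it needs a second round of quoting, for example shlex.quote(cmd), because the nested single quotes would otherwise end the outer string early.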
We should put together a collection of "cookbook" example scripts so novices can get up to speed quickly. I have made an "examples" directory.
In examples/SConstruct.drop_seq there are things I don't understand; I would love some help with the parts marked with ??.
complement to Targets.show_extras(outdir)
The version number used to be defined from the svn revision; we need something else now.
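One common replacement for svn revisions is git describe. The sketch below, with the hypothetical function name get_version, asks git for a tag-based description of the current commit and falls back to a fixed string when git or the repository is unavailable. Tools such as setuptools_scm automate the same idea; which approach bioscons should adopt is an open choice.

```python
import subprocess


def get_version(repo_dir=".", fallback="0.0.0"):
    """Return the output of git describe (nearest tag, commit count, short hash)."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--dirty", "--always"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed, repo_dir missing, or not a git checkout
        return fallback
    return result.stdout.strip() or fallback
```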
The docs should include a note to use python setup.py install --standard-lib if the user wants the docs to build.
If you rename your README to README.rst, it will render nicely on GitHub!
The current behavior is to define slurm_queue when creating the Environment; it should be possible to use different queues within the same script.
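A minimal sketch of per-command queues, assuming the queue ends up as srun's --partition option (how bioscons actually maps slurm_queue to srun flags is not shown here). The hypothetical class QueueDefaults keeps the environment-level value as a default and lets each command override it; the queue names in the example are also made up.

```python
import shlex


class QueueDefaults:
    """Hypothetical holder for an environment-wide default Slurm queue."""

    def __init__(self, slurm_queue=None):
        self.slurm_queue = slurm_queue

    def srun_command(self, action, slurm_queue=None):
        # A per-command value wins; otherwise fall back to the default.
        queue = slurm_queue or self.slurm_queue
        parts = ["srun"]
        if queue:
            parts.append("--partition=" + shlex.quote(queue))
        parts.append(action)
        return " ".join(parts)
```

For example, QueueDefaults("campus").srun_command("hostname", slurm_queue="largenode") would submit to largenode while other commands keep using campus.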
The top-level SConstruct should require a venv, define PATH explicitly, and include targets for tests, Sphinx docs, and publication.
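The first two requirements can be sketched with the standard library alone. The function names in_virtualenv and explicit_path, and the particular system directories, are illustrative assumptions:

```python
import os
import sys


def in_virtualenv():
    """True when running under a venv (or a legacy virtualenv)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(
        sys, "real_prefix"
    )


def explicit_path(venv_dir, system_dirs=("/usr/local/bin", "/usr/bin", "/bin")):
    """Build a PATH with the venv's bin directory first, then fixed system dirs."""
    return os.pathsep.join([os.path.join(venv_dir, "bin"), *system_dirs])
```

In an SConstruct, one could print a message and call SCons's Exit(1) when in_virtualenv() is false, then pass ENV={'PATH': explicit_path(os.environ['VIRTUAL_ENV'])} to Environment(); VIRTUAL_ENV is set by the venv activate script.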
It seems like it should be pretty easy to capture the result of the super() call in _SlurmCommand, call Precious on it, and then return it.
Is there any reason not to do this? I understand that Precious forces a backup of the file to be created while things are being worked on, so I can see disk space being a possible consideration, but that should never require more than twice the space otherwise needed (and only in the extreme case of everything being built in parallel).
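A minimal sketch of the proposal as a mixin, since the internals of _SlurmCommand are not shown here: the hypothetical PreciousCommandMixin overrides Command(), lets the parent class build the targets, marks them precious, and returns them unchanged. According to the SCons documentation, Precious tells SCons not to remove a target before rebuilding it.

```python
class PreciousCommandMixin:
    """Mark every target created through Command() as precious.

    Intended to be listed ahead of a class that provides Command() and
    Precious(), such as an SCons environment (hypothetical usage).
    """

    def Command(self, target, source, action, **kw):
        result = super().Command(target, source, action, **kw)
        # Keep SCons from deleting these targets before they are rebuilt.
        self.Precious(result)
        return result
```

Usage would look like class PreciousEnvironment(PreciousCommandMixin, SlurmEnvironment), where the base class name stands in for whichever environment class bioscons exposes.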
Consider per-module dependencies on biopython
% scons -n
scons: Reading SConscript files ...
NameError: name 'basestring' is not defined:
File "/mnt/disk2/molmicro/working/ngh2/2017-11-29-test-scons-ncores/SConstruct", line 13:
action='date > $TARGET'
File "/mnt/disk2/molmicro/working/ngh2/2017-11-29-test-scons-ncores/bioscons-env/lib/python3.6/site-packages/bioscons/slurm.py", line 142:
if isinstance(action, basestring) and use_cluster and self.use_cluster:
Not sure how this was missed by 2to3...
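basestring only exists in Python 2. For Python 3-only code, isinstance(action, str) is enough; if both versions must be supported, a small shim like the following works. The name is_shell_action is illustrative, not the bioscons function.

```python
try:
    string_types = (basestring,)  # Python 2
except NameError:
    string_types = (str,)  # Python 3: basestring no longer exists


def is_shell_action(action):
    """True if action is a command string rather than a function or list."""
    return isinstance(action, string_types)
```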
Some of us have noticed that scons occasionally thinks files have changed and need to be rebuilt, even when this should not be the case. This is particularly annoying with long-running jobs, or jobs with some degree of randomness, since all downstream targets may then be rebuilt unnecessarily.
After some snooping around, I've found that this only seems to happen when running on the cluster, and seems specifically related to the parent scons process not seeing the changes to the file (or filesystem), and consequently reading an incorrect (presumably null) MD5 hash.
This problem can be solved by appending an action to the end of the command string that ensures the file exists before returning. Ideally, a flag on SlurmEnvironment would turn this behavior on, with the current behavior as the default. It should also be possible to turn it on or off for a specific Command, and to specify the maximum wait time.
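A minimal sketch of the waiting step. Instead of a shell fragment appended to the command string (where every shell $ would need escaping as $$ for SCons), this swaps in a Python function action, which SCons calls as func(target, source, env) in the parent scons process and treats a nonzero return value as failure. The names wait_for_files and make_wait_action, and the defaults, are hypothetical.

```python
import os
import time


def wait_for_files(paths, max_wait=60.0, interval=1.0):
    """Poll until every path exists or max_wait seconds pass; return missing paths."""
    deadline = time.monotonic() + max_wait
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(interval)
        missing = [p for p in missing if not os.path.exists(p)]
    return missing


def make_wait_action(max_wait=60.0, interval=1.0):
    """Build an SCons-style function action that waits for all targets."""

    def wait_action(target, source, env):
        # SCons passes target nodes; str(node) gives the file path.
        missing = wait_for_files([str(t) for t in target], max_wait, interval)
        return 1 if missing else 0

    return wait_action
```

A Command could then use a list of actions such as [shell_command, make_wait_action(max_wait=120)], and a SlurmEnvironment flag could decide whether the wait action is appended by default.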