gkno_launcher's People

Contributors

alistairnward, chapmanb, chmille4, pezmaster31

gkno_launcher's Issues

Spaces on the command line - Version 2 update

Update the command line parser to allow text strings to be included on the command line that contain spaces. Currently, the command line is split on spaces, so any value with a space breaks the parser.
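One possible approach is to tokenise the way a POSIX shell does, so that quoted values survive as a single token. The following is a minimal sketch using Python's standard `shlex` module. The function name `split_command_line` and the `--read-group` argument are hypothetical and are not taken from the gkno parser.

```python
import shlex


def split_command_line(text):
    """Split a command string into tokens, keeping quoted values intact."""
    # shlex.split follows POSIX shell quoting rules, so "sample one"
    # remains a single token instead of being cut at the space.
    return shlex.split(text)


# Hypothetical argument names, used only for illustration.
command = '--read-group "sample one" --threads 4'
tokens = split_command_line(command)
naive_tokens = command.split(' ')

print(tokens)
print(naive_tokens)
```

Splitting on spaces alone breaks the quoted value into two tokens, whereas the shlex-based split keeps it together.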

The following utilities have version numbers that could not be determined by gkno: javac

[09:56 user@cluster gkno_launcher]$ ./gkno build

===============================
  Boston College gkno package

  version: 0.106
  date:    November 2013
===============================

Reading in command line arguments...done.

Checking dependencies...failed.
----------------------------------------
The following utilities have version numbers that could not be
determined by gkno:
        javac

This indicates a likely bug or as-yet-unseen oddity.
Please contact the gkno development team to report this issue.  Thanks.

----------------------------------------


================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================

And here's info about my javac:
javac -v

Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM Corp 2000, 2008. All rights reserved.

Output files

Check that the number of output files is equal to the number of inputs. If n files are supplied to a non-greedy task, the task will be run n times, generating n outputs. If n inputs are supplied on the command line together with m outputs, and m differs from n, the outputs cannot be matched to the runs. Ensure that this case is detected and reported.
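The check could look something like the sketch below. It assumes that a greedy task consumes all inputs in a single run and produces one output, and that an empty output list means the names are generated automatically; both are assumptions made for illustration, and `check_output_count` is a hypothetical helper rather than existing gkno code.

```python
def check_output_count(inputs, outputs, greedy=False):
    """Return the expected number of outputs, or raise if they do not match.

    Assumption: a greedy task runs once over all inputs (one output),
    while a non-greedy task runs once per input (n outputs).
    """
    expected = 1 if greedy else len(inputs)
    # An empty list is treated as "no outputs named by the user".
    if outputs and len(outputs) != expected:
        raise ValueError(
            "expected %d output file(s) but %d were supplied"
            % (expected, len(outputs))
        )
    return expected


runs = check_output_count(["a.fq", "b.fq", "c.fq"], ["a.bam", "b.bam", "c.bam"])
print(runs)
```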

fastq-tangram failed

Hello,

I'm trying to analyze mobile element insertion sites with the gkno fastq-tangram pipeline.
However, the sort-bam task failed because its input file does not exist.

MosaikAligner appends '.bam' to the filename given by the '--out' parameter.

  • stdout/stderr of gkno pipe fastq-tangram.
$ ./gkno pipe fastq-tangram --input-path ./resources/tutorial/current --output-path ./test_output --fasta-reference chr20_fragment.fa --mobile-element-fasta mobile_element_sequences.fa --merged-reference-fasta chr20_fragment_moblist.fa --ann-paired-end pe.100.01.ann --ann-single-end se.100.005.ann --fastq mutated_genome_1.fq --fastq2 mutated_genome_2.fq --tangram-directory tangram-files --histogram-file tangram-files/hist.dat --library-file tangram-files/lib_table.dat --sequencing-technology illumina --bam-list bam_list.txt --bam mutated_genome.bam --vcf mutated_genome.vcf --processors 1 --hash-size 10 --special-reference-hashes 10 --special-reference-prefix chr20_fragment_moblist_10 --region 20

======================================================
  Boston College gkno package

  version:    1.20.1
  date:       June 2014
  git commit: 39bf3e44b2bffd3e51b6284f27abef4739b4f1ee
======================================================

Reading in command line arguments...done.
Checking instance information...done.
Assigning command line arguments to graph nodes...done.
Checking for commands to execute at command line...done.

Workflow:
     build-tangram-reference (tangram-index):                            Create an indexed reference
                                                                         file including the mobile
                                                                         elements
     merge-fasta (concatenate-files):                                    Join multiple files
     build-reference (mosaik-build-reference):                           Build the Mosaik reference
     build-jump-database (mosaik-jump):                                  Generate the jump database for
                                                                         a Mosaik reference
     create-sequence-dictionary (picard-create-sequence-dictionary):     Generate a dictionary
                                                                         containing all of the sequences
                                                                         in the input reference fasta.
     index-fasta (samtools-index-fasta):                                 Generate an index for a
                                                                         reference fasta file.
     generate-mosaik-parameters (premo):                                 Determine MosaikAligner
                                                                         parameters based on read and
                                                                         fragment length
     build-read-archive (mosaik-build-fastq):                            Build the Mosaik read archive
     align (mosaik-aligner-special):                                     Pairwise alignment of a read
                                                                         archive with additional
                                                                         'special' reference sequences.
                                                                         The special sequences must all
                                                                         have a common prefix and
                                                                         alignment to them will be shown
                                                                         in the ZA tags. No primary
                                                                         alignments to the 'special'
                                                                         sequences will occur.
     sort-bam (bamtools-sort):                                           Sort a BAM file
     mark-duplicates (dedup):                                            Mark duplicate reads in a BAM
                                                                         file (University of Michigan).
     index-duplicate-marked (bamtools-index):                            Index a BAM file.
     generate-bam-list (generate-file-list):                             Generate a text file containing
                                                                         a list of files
     scan-bam-files (tangram-scan):                                      Generate a histogram of the
                                                                         fragment length distributions
                                                                         of the input libraries.
     detect-mei (tangram-detect):                                        Detect and genotype structural
                                                                         variation events.
     index-bam (bamtools-index):                                         Index a BAM file.

Logging gkno usage with ID: pipes/fastq-tangram...done.

Executing makefile: make -j 1 --file fastq-tangram.make...
Executing task: build-tangram-reference...completed successfully.
Executing task: merge-fasta...completed successfully.
Executing task: build-reference...completed successfully.
Executing task: build-jump-database...completed successfully.
make: Warning: File `test_output/chr20_fragment_moblist_10_positions.jmp' has modification time 0.0084 s in the future
Executing task: create-sequence-dictionary...completed successfully.
Executing task: index-fasta...completed successfully.
Executing task: generate-mosaik-parameters...completed successfully.
Executing task: build-read-archive...completed successfully.
Executing task: align...completed successfully.
Executing task: sort-bam...make: *** [/path/to/gkno/gkno_launcher-1.20.1-g39bf3e44b2/test_output/mutated_genome_sorted.bam] Error 1
.failed

gkno failed to complete successfully.  Please check the output files to identify the cause of the problem.

================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================
  • generated files
$ ls -1 test_output/
chr20_fragment_moblist_10_keys.jmp
chr20_fragment_moblist_10_meta.jmp
chr20_fragment_moblist_10_positions.jmp
chr20_fragment_moblist.dat
chr20_fragment_moblist.dict
chr20_fragment_moblist.fa
chr20_fragment_moblist.fa.fai
fastq-tangram_mosaikParameters.json
fastq-tangram.stderr
fastq-tangram.stdout
mutated_genome.bam.bam
mutated_genome.bam.stat
tangram-reference.dat
$

There is 'mutated_genome.bam.bam' instead of 'mutated_genome.bam'.
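One way to avoid the doubled extension would be to strip it before handing the name to a tool that appends its own. This is only a sketch of the idea, not the actual fix in gkno; `strip_extension` is a hypothetical helper.

```python
def strip_extension(filename, extension):
    """Remove a trailing extension so a tool that appends it adds it only once."""
    if filename.endswith(extension):
        return filename[: -len(extension)]
    return filename


# The value passed to '--out' would be the stem; the aligner then adds '.bam'.
stem = strip_extension("mutated_genome.bam", ".bam")
final_name = stem + ".bam"
print(final_name)
```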

Include processor information - Version 2 update

If the number of processors used by a task is known, store this information in the graph. Ultimately, when figuring out how to break the pipeline into subphases/divisions etc, tasks with different processor requirements can be separated in order to maximise efficiency.

argument --region in ./gkno pipe fastq-short-variants

Hi,

I would like to report the following issue, which occurs when running the fastq-short-variants pipeline with a command line such as:

./gkno pipe fastq-short-variants
--fasta-reference resources/tutorial/current/chr20_fragment.fa
--fastq resources/tutorial/current/mutated_genome_set2_1.fq
--fastq2 resources/tutorial/current/mutated_genome_set2_2.fq
--ann-single-end resources/tutorial/current/se.100.005.ann
--ann-paired-end resources/tutorial/current/pe.100.01.ann
--special-reference-hashes [0]
--sequencing-technology [illumina]
--hash-size [4]
--output-path /Users/apple/gkno_launcher/NGS

I am receiving this error msg:
ERROR: A required command line argument is missing.

DETAILS: The task 'call-short-variants' requires the argument '--region (-r)' to be set, but it
has not been specified on the command line. This argument cannot be set using a
pipeline argument and consequently must be set using the syntax:

     ./gkno pipe <pipeline name> --call-short-variants [--region <value>] [options]

     This argument is described as the following: <chrom>:<start_position>..<end_position>.
     Limit analysis to the specified region, 0-base coordinates, end_position not included
     (same as BED format).

However, --region is not listed among the required arguments. Is there a mistake in my command line that causes this error?

P.S.: when I run ./gkno pipe fastq-short-variants --help, I get this error message at the end:
The following tasks can have parameters modified:
Traceback (most recent call last):
  File "/Users/apple/gkno_launcher/src/gkno.py", line 484, in <module>
    main()
  File "/Users/apple/gkno_launcher/src/gkno.py", line 231, in main
    gknoHelp.specificPipelineUsage(pipelineGraph, config, gknoConfig, runName, toolConfigurationFilesPath, instanceName)
  File "/Users/apple/gkno_launcher/src/gkno/helpClass.py", line 571, in specificPipelineUsage
    isHidden = self.availableTools[associatedTool][1]
KeyError: u'mosaik-aligner-special'

Best regards,

Kiz

Wildcard

Consider the following example:

gkno bwa -q *_1.fq -q2 *_2.fq

The wildcards will be evaluated by bash and fed to gkno, however the order in which the lists appear cannot be guaranteed. For this to work, each X_1.fq must be paired with the correct X_2.fq. Either include logic in configuration file instructing gkno to look for shared patterns, or provide a general check. If multiple arguments are given multiple values on the command line, check to see if there is any common text between the values and order accordingly.
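The "common text" check could be sketched as follows. It assumes the mates differ only in a fixed suffix (`_1.fq` and `_2.fq`, as in the example above); `pair_by_common_prefix` is a hypothetical helper, not part of gkno.

```python
def pair_by_common_prefix(first, second, suffix1="_1.fq", suffix2="_2.fq"):
    """Pair each X_1.fq with X_2.fq, whatever order the shell supplied them in."""
    mates = {}
    for name in second:
        if not name.endswith(suffix2):
            raise ValueError("unexpected file name: %s" % name)
        # Index the second list by the text shared with its mate.
        mates[name[: -len(suffix2)]] = name

    pairs = []
    for name in sorted(first):
        if not name.endswith(suffix1):
            raise ValueError("unexpected file name: %s" % name)
        stem = name[: -len(suffix1)]
        if stem not in mates:
            raise ValueError("no mate found for %s" % name)
        pairs.append((name, mates[stem]))
    return pairs


pairs = pair_by_common_prefix(["b_1.fq", "a_1.fq"], ["a_2.fq", "b_2.fq"])
print(pairs)
```

Raising an error when a mate is missing also catches the case where one list has more files than the other.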

Build connecting to local server?

I am trying to build gkno and I am getting the following error:

./gkno build
fatal: unable to connect to github.com:
github.com[0: 192.30.252.131]: errno=Connection timed out

Clone of 'git://github.com/gkno/configurationClass.git' into submodule path 'src/configurationClass' failed
Traceback (most recent call last):
  File "/home/bcantarel/seqprg/gkno_launcher/src/gkno.py", line 11, in <module>
    import networkx as nx
ImportError: No module named networkx

gkno build : some programs of Tangram were not built

Hello,

I downloaded the gkno file sets and built them.
"gkno build" seemed to succeed, but tangram_merge and tangram_scan were not built.

  • stdout/stderr of gkno build.
$ gkno build
Initialized empty Git repository in /path/to/gkno_launcher/src/configurationClass/.git/
remote: Reusing existing pack: 1051, done.
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 1058 (delta 1), reused 0 (delta 0)
Receiving objects: 100% (1058/1058), 712.06 KiB | 491 KiB/s, done.
Resolving deltas: 100% (512/512), done.

======================================================
  Boston College gkno package

  version:    1.0.4
  date:       May 2014
  git commit: 6f52694cea6048617aa31afb4bc201aa302c8510
======================================================

Checking dependencies...done.
Initializing component data...done.
Building tools:
  bamtools...done.
  blast...done.
  freebayes...done.
  gatk...done.
  glia...done.
  jellyfish...done.
  libStatGen...done.
  bamUtil...done.
  fastQValidator...done.
  qplot...done.
  verifyBamID...done.
  mosaik...done.
  musket...done.
  mutatrix...done.
  ogap...done.
  picard...done.
  premo...done.
  Rufus...done.
  samtools...done.
  scissors...done.
  seqan...done.
  snpEff...done.
  tabix...done.
  tangram...done.
  vcflib...done.
Fetching default resources:
  tutorial:
    Downloading files... 100%
    Unpacking files...done.
  • Tangram programs that were built.
$ ls -1 tools/Tangram/bin/
tangram_bam
tangram_detect
tangram_filter.pl
tangram_index
tangram_view_scan_file.py
  • Rebuilding Tangram.
$ pushd tools/Tangram/src/
$ make clean
$ make
Building Tangram
=========================================================
- Building in OutSources
  * compiling ssw_cpp.cpp
  * compiling ssw.c
  * compiling Fasta.cpp
  * compiling split.cpp
  * compiling md5.c
  * compiling TGM_BamHeader.c
  * compiling TGM_Error.c

- Building in TangramBam
  * compiling hashes_collection.cpp
  * compiling special_hasher.cpp
  * compiling tangram_bam.cpp
  * compiling ConvertHashTableOutToIn.c
  * compiling seq_converter.c
  * compiling SR_BamHeader.c
  * compiling SR_BamInStream.c
  * compiling SR_BamMemPool.c
  * compiling SR_Error.c
  * compiling SR_HashRegionTable.c
  * compiling SR_InHashTable.c
  * compiling SR_OutHashTable.c
  * compiling SR_QueryRegion.c
  * compiling SR_Reference.c
  * linking /path/to/gkno_launcher/tools/Tangram/bin/tangram_bam

- Building in TangramScan
  * compiling TGM_ReadPairScan.c
  * compiling TGM_ReadPairScanGetOpt.c
  * compiling TGM_ReadPairScanMain.c
  * compiling TGM_LibInfo.c
  * compiling TGM_FragLenHist.c
  * compiling TGM_GetOpt.c
  * compiling TGM_Utilities.c
  * compiling TGM_BamInStream.c
  * compiling TGM_BamPairAux.c
  * compiling TGM_BamMemPool.c
  * linking /path/to/gkno_launcher/tools/Tangram/bin/tangram_scan

- Building in TangramMerge
  * compiling TGM_MergeLibGetOpt.c
  * compiling TGM_MergeLibMain.c
  * compiling TGM_LibInfo.c
  * compiling TGM_FragLenHist.c
  * compiling TGM_GetOpt.c
  * compiling TGM_Utilities.c
  * linking /path/to/gkno_launcher/tools/Tangram/bin/tangram_merge

- Building in TangramDetect
  * compiling TGM_Aligner.cpp
  * compiling TGM_BamPair.cpp
  * compiling TGM_Cluster.cpp
  * compiling TGM_Detector.cpp
  * compiling TGM_FirstMapThread.cpp
  * compiling TGM_FragLenTable.cpp
  * compiling TGM_Genotype.cpp
  * compiling TGM_LibTable.cpp
  * compiling TGM_PairAttrbt.cpp
  * compiling TGM_Parameters.cpp
  * compiling TGM_Printer.cpp
  * compiling TGM_Reference.cpp
TGM_Reference.cpp: In function ‘int ks_getuntil2(kstream_t*, int, kstring_t*, int*, int)’:
TGM_Reference.cpp:28: warning: comparison between signed and unsigned integer expressions
  * compiling TGM_RescuePartial.cpp
  * compiling TGM_SecondMapThread.cpp
TGM_SecondMapThread.cpp: In member function ‘bool Tangram::SecondMapThread::SecondFilterSpecial(bool&, Tangram::RescuePartial&, uint8_t&, const s_align*, uint8_t, const Tangram::PrtlAlgnmnt&, const Tangram::RefRegion&, const int8_t*, int)’:
TGM_SecondMapThread.cpp:457: warning: comparison between signed and unsigned integer expressions
TGM_SecondMapThread.cpp:458: warning: comparison between signed and unsigned integer expressions
  * compiling TGM_Tangram.cpp
  * compiling TGM_GetOpt.c
  * compiling TGM_Utilities.c
  * linking /path/to/gkno_launcher/tools/Tangram/bin/tangram_detect

- Building in TangramIndex
  * compiling TGM_IndexRef.cpp
  * compiling TGM_RefParameters.cpp
  * linking /path/to/gkno_launcher/tools/Tangram/bin/tangram_index

- Building in TangramTools
  * copying /path/to/gkno_launcher/tools/Tangram/bin/tangram_filter.pl /path/to/gkno_launcher/tools/Tangram/bin/tangram_view_scan_file.py

$ popd
  • After rebuilding: Tangram programs that were built.
$ ls -1 tools/Tangram/bin/
tangram_bam
tangram_detect
tangram_filter.pl
tangram_index
tangram_merge
tangram_scan
tangram_view_scan_file.py
$
  • Version of Tangram.
$ pushd tools/Tangram
$ git log | head -6
commit 1fb2e76028eaffe4894a9fab00a71bf5e8c91011
Author: Jiantao Wu <[email protected]>
Date:   Sun Feb 9 00:39:29 2014 -0800

    Update the readme file

$

Failed task argument parsing

I'm unable to specify individual task arguments when running a pipeline. Following a similar example in the docs, I tried adding this argument to the fastq-vcf-se pipeline:

--variant-call "--use-best-n-alleles 5"

I get this error:

Assigning command line arguments to tasks...done.
Checking the command line arguments...

ERROR:   Missing value for argument: --use-best-n-alleles (-n)

DETAILS: The argument '--use-best-n-alleles' was specified for task 'variant-call' and it
         expects a value of type 'integer', but no value was provided.  Please check the command
         line.

================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================

Best,
Carlos

Premo fails to build in Debian Wheezy

This should probably go into the premo repository. I'm opening it here because I found it while building gkno.

$ cat logs/build_premo.err 
/home/cborroto/src/gkno_launcher/tools/premo/src/libs/bamtools/internal/index/BamStandardIndex_p.cpp: In member function ‘void BamTools::Internal::BamStandardIndex::WriteLinearOffsets(const int&, BamTools::Internal::BaiLinearOffsetVector&)’:
/home/cborroto/src/gkno_launcher/tools/premo/src/libs/bamtools/internal/index/BamStandardIndex_p.cpp:958:89: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/home/cborroto/src/gkno_launcher/tools/premo/src/libs/bamtools/internal/io/BamHttp_p.cpp: In member function ‘bool BamTools::Internal::BamHttp::SendRequest(size_t)’:
/home/cborroto/src/gkno_launcher/tools/premo/src/libs/bamtools/internal/io/BamHttp_p.cpp:396:66: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/home/cborroto/src/gkno_launcher/tools/premo/src/libs/bamtools/internal/io/TcpSocket_p.cpp: In member function ‘std::string BamTools::Internal::TcpSocket::ReadLine(int64_t)’:
/home/cborroto/src/gkno_launcher/tools/premo/src/libs/bamtools/internal/io/TcpSocket_p.cpp:336:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp: In member function ‘void FastqReader::close()’:
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp:69:9: error: conditional expression between distinct pointer types ‘gzFile’ and ‘FILE* {aka _IO_FILE*}’ lacks a cast
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp: In member function ‘bool FastqReader::isOpen() const’:
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp:100:14: error: conditional expression between distinct pointer types ‘gzFile’ and ‘FILE* {aka _IO_FILE*}’ lacks a cast
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp: In member function ‘bool FastqReader::open(const string&)’:
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp:137:5: error: conditional expression between distinct pointer types ‘gzFile’ and ‘FILE* {aka _IO_FILE*}’ lacks a cast
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp: In member function ‘bool FastqReader::isOpen() const’:
/home/cborroto/src/gkno_launcher/tools/premo/src/app/fastqreader.cpp:101:1: warning: control reaches end of non-void function [-Wreturn-type]
make[2]: *** [src/app/CMakeFiles/PremoApp.dir/fastqreader.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [src/app/CMakeFiles/PremoApp.dir/all] Error 2
make: *** [all] Error 2

This is on a Debian Wheezy (the recently released stable version) system with cmake 2.8.9 and gcc 4.7.2.

Thanks for all your help,
Carlos
PS: Could you share what system you use? I really want to get gkno to build and work. I know how hard it is to support every possible combination of development tools and libraries.

gkno build troubleshooting

Hi guys,
I just started building gkno on my personal MacBook. The build process seemed OK, but some tools were not built:
verifyBamID, musket, rufus, jellyfish, mutatrix.
Log files:

  • build_jellyfish.err:
    fatal: destination path 'yaggo' already exists and is not an empty directory.
  • build_musket.err:
    kmer.cpp: In member function ‘Kmer Kmer::set_base(uint8_t, int)’:
    kmer.cpp:133: warning: comparison is always true due to limited range of data type
    In file included from thread.h:7,
    from option.h:21,
    from main.cpp:1:
    barrier.h:19: error: ‘pthread_barrier_t’ does not name a type
    barrier.h: In constructor ‘MyBarrier::MyBarrier(int)’:
    barrier.h:10: error: ‘barrier’ was not declared in this scope
    barrier.h:10: error: ‘pthread_barrier_init’ was not declared in this scope
    barrier.h: In destructor ‘MyBarrier::~MyBarrier()’:
    barrier.h:13: error: ‘barrier’ was not declared in this scope
    barrier.h:13: error: ‘pthread_barrier_destroy’ was not declared in this scope
    barrier.h: In member function ‘void MyBarrier::wait()’:
    barrier.h:16: error: ‘barrier’ was not declared in this scope
    barrier.h:16: error: ‘pthread_barrier_wait’ was not declared in this scope
    main.cpp: In function ‘void ParaKmerEC(ProgramOptions&)’:
    main.cpp:650: warning: iteration variable ‘readIdx’ is unsigned
    make: *** [main.o] Error 1
    make: *** Waiting for unfinished jobs....
    In file included from thread.h:7,
    from thread.cpp:1:
    barrier.h:19: error: ‘pthread_barrier_t’ does not name a type
    barrier.h: In constructor ‘MyBarrier::MyBarrier(int)’:
    barrier.h:10: error: ‘barrier’ was not declared in this scope
    barrier.h:10: error: ‘pthread_barrier_init’ was not declared in this scope
    barrier.h: In destructor ‘MyBarrier::~MyBarrier()’:
    barrier.h:13: error: ‘barrier’ was not declared in this scope
    barrier.h:13: error: ‘pthread_barrier_destroy’ was not declared in this scope
    barrier.h: In member function ‘void MyBarrier::wait()’:
    barrier.h:16: error: ‘barrier’ was not declared in this scope
    barrier.h:16: error: ‘pthread_barrier_wait’ was not declared in this scope
    make: *** [thread.o] Error 1

  • build_mutatrix.err:
    In file included from Fasta.h:19,
    from Fasta.cpp:9:
    LargeFileSupport.h:12: error: ‘__off64_t’ does not name a type
    make[1]: *** [Fasta.o] Error 1
    make: *** [fastahack/Fasta.o] Error 2
    make: *** Waiting for unfinished jobs....
    LeftAlign.cpp:679:26: warning: '&&' within '||' [-Wlogical-op-parentheses]
    && (indel.insertion && indel.position == referenceSequence.size()
    ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    LeftAlign.cpp:679:26: note: place parentheses around the '&&' expression to silence this warning
    && (indel.insertion && indel.position == referenceSequence.size()
    ^
    ( )
    1 warning generated.
    clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated
    ld: warning: option -s is obsolete and being ignored
    i686-apple-darwin11-llvm-g++-4.2: -lm: linker input file unused because linking not done
    i686-apple-darwin11-llvm-g++-4.2: -ltabix: linker input file unused because linking not done
    i686-apple-darwin11-llvm-g++-4.2: -lz: linker input file unused because linking not done
    i686-apple-darwin11-llvm-g++-4.2: -lm: linker input file unused because linking not done
    i686-apple-darwin11-llvm-g++-4.2: -ltabix: linker input file unused because linking not done
    i686-apple-darwin11-llvm-g++-4.2: -lz: linker input file unused because linking not done
    i686-apple-darwin11-llvm-g++-4.2: -lm: linker input file unused because linking not done

  • build_rufus.err:
    cc1plus: error: unrecognized command line option "-std=gnu++0x"
    cc1plus: error: unrecognized command line option "-std=gnu++0x"
    make: *** [bin/ModelDist2] Error 1
    make: *** Waiting for unfinished jobs....
    make: *** [bin/Overlap19] Error 1

  • build_VerifyBamID:
    Main.cpp:23:20: error: values.h: No such file or directory
    make[1]: *** [../obj/Main.o] Error 1
    make: *** [src] Error 2

Do you have any idea why? Has anyone encountered these issues before?

Best regards

Kiz

Dependency checking

Not all dependencies are checked up front. Update the list of checked dependencies and test on multiple operating systems.
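An up-front check could be as simple as looking each executable up on the PATH. The sketch below uses the standard `shutil.which`; the list of tool names is illustrative only and is not gkno's actual dependency list.

```python
import shutil


def find_missing_tools(required):
    """Return the names in `required` that cannot be found on the PATH."""
    return [name for name in required if shutil.which(name) is None]


# Illustrative list; the real set of dependencies would come from gkno.
missing = find_missing_tools(["git", "make", "javac"])
if missing:
    print("Missing dependencies: " + ", ".join(missing))
else:
    print("All listed dependencies were found.")
```

This only confirms that an executable exists; version checks would still need to run each tool and parse its output.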

Version 2 upgrade

Work on updating to version 2 is well underway. This is primarily a cleaning up of the configuration files and increasing the ease of use. The main upgrade to functionality comes in the ability to construct a pipeline configuration file using other pipeline configuration files. Consequently, pipelines can become units that are then used to construct larger pipelines.

Remaining tasks:

  1. Export parameter sets.
  2. Inputting multiple sets of argument values.
  3. Implement task streaming.
  4. Allow arguments to accept a command to evaluate instead of a value.
  5. Delete intermediate files at the earliest opportunity.
  6. Check for required files before executing.

Imported arguments

If a pipeline imports arguments from a contained task, not all of the arguments will have nodes built. If arguments are set that do not have pipeline nodes defined, check that the nodes are built.

R plotting

Check for ggplot2 and melt. This is not necessarily something to test at compile time, but when a pipeline with plotting is executed. A warning at compile time would also be useful, though.

pygraphviz

pygraphviz is required for outputting visualisations of the pipeline. Check for presence and provide warnings if a plot is requested, but pygraphviz is not available.
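A possible shape for this check is sketched below, using the standard `importlib.util.find_spec` to test for the module without importing it. The function names and the warning text are hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def can_plot(plot_requested, module="pygraphviz"):
    """Warn and return False if a plot was requested but the module is missing."""
    if not plot_requested:
        return False
    if not module_available(module):
        sys.stderr.write(
            "WARNING: a pipeline plot was requested, but %s is not installed. "
            "No plot will be produced.\n" % module
        )
        return False
    return True


print(can_plot(True))
```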

gkno update

Do not crash gkno update just because a single tool fails to update. Provide better error handling.
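One way to keep going after a failure is to catch the error per tool and report all failures at the end. This is a sketch under the assumption that each tool exposes a callable updater; `update_all` and the two example updaters are hypothetical.

```python
def update_all(tools):
    """Run every updater and collect failures instead of stopping at the first."""
    failed = {}
    for name, update in tools.items():
        try:
            update()
        except Exception as error:  # deliberately broad: one tool must not stop the rest
            failed[name] = str(error)
    return failed


def update_ok():
    """Stand-in for an updater that succeeds."""
    return None


def update_broken():
    """Stand-in for an updater that fails."""
    raise RuntimeError("build failed")


failures = update_all({"bamtools": update_ok, "mosaik": update_broken, "tabix": update_ok})
for name, reason in failures.items():
    print("Failed to update %s: %s" % (name, reason))
```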

call-short-variants 'does not exist'

Hi!
I would like to report this issue:
gkno_launcher apple$ ./gkno pipe call-short-variants

Usage: gkno pipe [options]

 <pipeline name>:
      align-pe:                      Align paired-end fastq files using Mosaik.
      align-pe-special:              Align fastq files using Mosaik. In this version additional 'special'
                                     reference sequences are included (usually mobile element insertions)
                                     in the reference.
      align-se:                      Align single-end fastq files using Mosaik (additional processing
                                     steps included).
      align-se-special:              Align single-end fastq files using Mosaik. In this version
                                     additional 'special' reference sequences are included (usually
                                     mobile element insertions) in the reference.
      bam-tangram:                   Build Mosaik reference files and align fastq files using Mosaik
                                     (additional processing steps included).
      build-moblist-reference:       Concatenate reference fasta files and generate Mosaik reference and
                                     jump database.
      fastq-short-variants:          Starting from fasta and fastq files, prepare MOSAIK reference files,
                                     align reads and call short variants.
      fastq-tangram:                 Starting from fasta and fastq files, prepare MOSAIK reference files,
                                     align reads and search for mobile element insertions (MEIs).
      jellyfish-get-fasta-hashes:    Generate hashes for a reference fasta.
      mei-confirm:                   Check MEI calls using local graph alignment.
      merge-bam:                     Merge together all bam files for a sample and mark duplicate reads.
      run-test:                      Test pipeline to ensure proper operation.
      simulation:                    Simulate reads from one reference, then align reads to a different
                                     reference.
      simulation-call:               Simulate reads from one reference, then align reads to a different
                                     reference and search for short variants.

 The following pipelines have been identified as experimental, so should be used with caution:

      rufus:                         Compare parent and mutant strains..

 The following pipelines have malformed configuration files, so are currently unusable:

      detect-mei:                    Starting from fasta and fastq files, prepare MOSAIK reference files,
                                     align reads and search for mobile element insertions (MEIs).

Additional messages

ERROR: Requested pipeline 'call-short-variants' does not exist. Check available pipelines in usage above.

Hide pipeline argument

Allow the pipeline configuration to define command line arguments that are not displayed in the help message. This would allow the parameter sets to access the argument and thus define arguments that the user can't touch. This is useful for many pipelines, for example, performing an intersection to generate the unique portion of a file requires setting the flag '--invert'. This should always be set, so can be given an argument in the configuration file and the default value can be set to 'set', so that it is always used. When viewing the help, the message isn't cluttered with this information.
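As a rough sketch, the help routine could filter on a per-argument flag. The `hidden` key and the dictionary layout below are assumptions made for illustration and do not reflect the actual gkno configuration format.

```python
def visible_arguments(arguments):
    """Return the argument names that should appear in the help message."""
    return sorted(
        name
        for name, attributes in arguments.items()
        if not attributes.get("hidden", False)
    )


# Hypothetical pipeline arguments: '--invert' is always set and never shown.
arguments = {
    "--in": {"description": "Input file."},
    "--invert": {"description": "Invert the intersection.", "hidden": True, "default": "set"},
}
shown = visible_arguments(arguments)
print(shown)
```

The hidden argument stays in the dictionary, so parameter sets can still refer to it.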

Picard

Move from SVN mirror to github repo.

Comma separated lists - Version 2 update

Allow multiple values to be given to a command as a list of values separated by commas. Ensure tools that expect comma separated lists (e.g. Wham), get the expected values. In particular, a list of files, or repeated use of an argument can be used in gkno to specify multiple files, but Wham requires that the files are concatenated into a comma separated list and this single value supplied on the command line.
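The two ways of passing multiple values could be sketched as below. The `comma_separated` switch and the `--bam` argument name are hypothetical; the actual argument names for Wham or any other tool would come from its configuration.

```python
def format_values(argument, values, comma_separated=False):
    """Build command-line tokens for an argument that has several values."""
    if comma_separated:
        # Tools that expect a single value: join everything with commas.
        return [argument, ",".join(values)]
    # Otherwise repeat the argument once per value.
    tokens = []
    for value in values:
        tokens.extend([argument, value])
    return tokens


joined = format_values("--bam", ["a.bam", "b.bam"], comma_separated=True)
repeated = format_values("--bam", ["a.bam", "b.bam"])
print(joined)
print(repeated)
```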

Issues with Mosaik build and update

Hi,

I ran into two issues with Mosaik. On my system, CentOS 6.4, Mosaik failed to build until I installed zlib-static and glibc-static. The issue in gkno is that "./gkno build" didn't report this failed build. I only found the problem when I got this error while trying to run "./gkno run-test":

ERROR:   Missing executable files

DETAILS: The following executable files are not available, but are required:

         /home/cborroto/src/gkno_launcher/tools/MOSAIK/bin/MosaikBuild
         /home/cborroto/src/gkno_launcher/tools/MOSAIK/bin/MosaikJump
         /home/cborroto/src/gkno_launcher/tools/MOSAIK/bin/MosaikAligner

================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================

Also, when I tried "./gkno update", Mosaik would fail and point me to the .out and .err log files. The problem is that they were empty, with no output whatsoever. This is probably related to why a failed build is not detected.

After some poking around I was able to fix the issue with update. I modified Mosaik's doUpdate in 'src/gkno/conf/gknoTools.py' by changing the line:

    if not self.doPlatformExports():

to:

    self.doPlatformExports()
    if not self.BLD_PLATFORM:

I think the problem is that, at least in Python 2.6, the value assigned in the last expression of a function is not returned (a function without an explicit return statement returns None), so this code from doPlatformExports never returns True:

  def doPlatformExports(self):
    pl='linux'
    if sys.platform == 'darwin':
      pl='macosx'
      if sys.maxsize > 2**32:
        pl='macosx64'
    self.BLD_PLATFORM = pl

I guess you could also make sure doPlatformExports returns True once self.BLD_PLATFORM is properly set. I can't tell which would be better.

Thanks,
Carlos
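The second option mentioned in this report, having doPlatformExports signal success explicitly, might look like the minimal sketch below. The class name PlatformSketch is hypothetical and stands in for the tool class in gknoTools.py; the method body follows the code quoted in the post, with only the return statement added.

```python
import sys


class PlatformSketch(object):
    """Hypothetical stand-in for the tool class that owns doPlatformExports."""

    BLD_PLATFORM = None

    def doPlatformExports(self):
        pl = 'linux'
        if sys.platform == 'darwin':
            pl = 'macosx'
            if sys.maxsize > 2**32:
                pl = 'macosx64'
        self.BLD_PLATFORM = pl
        # Without this line the method returns None, which is falsy, so
        # "if not self.doPlatformExports():" would always be taken.
        return True


tool = PlatformSketch()
if not tool.doPlatformExports():
    print("platform export failed")
else:
    print("platform: " + tool.BLD_PLATFORM)
```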

Issue with "data type (unknown) for field 'description'"

Hi,

I ran into this issue after building gkno, no matter what I try to do I get:

$ ./gkno

===============================
  Boston College gkno package

  version: 0.69
  date:    April 2013
===============================


ERROR:   Malformed configuration file: concatenate-files.json

DETAILS: The data type (unknown) for field 'description' in tool 'concatenate-files' is
         inconsistent with that expected (string).  Please check the configuration file and
         remove/repair invalid fields.

================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================

I see issue #1 might be related, but I couldn't figure out why it is not working for me. I'm running CentOS 6.4 with Python 2.6.6.

Thanks,
Carlos

./gkno build failure on OS X 10.9.2

I encountered an error running ./gkno build on an up-to-date 16GB Mac running OS X 10.9.2, with the dependencies CMake and Ant installed via Homebrew. It's MOSAIK that I wish to use, so I will build that on its own, but I thought you might be interested in this output. Let me know if I've missed something obvious.

bede-rmbp:gkno_launcher bede$ ./gkno  build

======================================================
  Boston College gkno package

  version:    0.164
  date:       March 2014
  git commit: e0401ed280228cd38ef29e599b7afc5ff92e541c
======================================================

Checking dependencies...done.
Initializing component data...ERROR: See logs/submodule_update.* files for more details.

================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================
bede-rmbp:gkno_launcher bede$ cd logs
bede-rmbp:logs bede$ ls
submodule_update.err submodule_update.out
bede-rmbp:logs bede$ cat *
Cloning into 'bamtools'...
Cloning into 'intervaltree'...
Cloning into 'vcflib'...
Cloning into 'fastahack'...
Cloning into 'filevercmp'...
Cloning into 'fsom'...
Cloning into 'intervaltree'...
Cloning into 'multichoose'...
Cloning into 'smithwaterman'...
Cloning into 'tabixpp'...
fatal: Needed a single revision
Unable to find current revision in submodule path 'tools/gatk'
Submodule path 'tools/freebayes/bamtools': checked out 'b307a397f7d818d0fa064b91229e312707e43951'
Submodule path 'tools/freebayes/intervaltree': checked out 'd151b487804861dc9f932e9f1fe4f8c499673cec'
Submodule path 'tools/freebayes/vcflib': checked out '060b734035ddb16884a756a97406a32d44f72fef'
Submodule 'fastahack' (git://github.com/ekg/fastahack.git) registered for path 'fastahack'
Submodule 'filevercmp' (https://github.com/ekg/filevercmp.git) registered for path 'filevercmp'
Submodule 'fsom' (git://github.com/ekg/fsom.git) registered for path 'fsom'
Submodule 'intervaltree' (git://github.com/ekg/intervaltree.git) registered for path 'intervaltree'
Submodule 'multichoose' (git://github.com/ekg/multichoose.git) registered for path 'multichoose'
Submodule 'smithwaterman' (git://github.com/ekg/smithwaterman.git) registered for path 'smithwaterman'
Submodule 'tabixpp' (git://github.com/ekg/tabixpp.git) registered for path 'tabixpp'
Submodule path 'tools/freebayes/vcflib/fastahack': checked out 'c68cebb4f2e5d5d2b70cf08fbdf1944e9ab2c2dd'
Submodule path 'tools/freebayes/vcflib/filevercmp': checked out '378bae779b897f1506043c1b647a5e5f8031ea4b'
Submodule path 'tools/freebayes/vcflib/fsom': checked out 'f688ef24bbe230ec46c7be3de8bf5e67fdbc1b4d'
Submodule path 'tools/freebayes/vcflib/intervaltree': checked out '1290744283cef8076bb8a2968d4899b7228435f4'
Submodule path 'tools/freebayes/vcflib/multichoose': checked out '73d35daa18bf35729b9ba758041a9247a72484a5'
Submodule path 'tools/freebayes/vcflib/smithwaterman': checked out 'd9724f156c07cf16d00d251bebc20c6383eb6bde'
Submodule path 'tools/freebayes/vcflib/tabixpp': checked out 'c2d6c12eb827805fb13db4bab20f74b212b8b6d0'

Argument order

Check that all arguments in the 'argument order' list in tool configuration files are valid tool arguments.
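One way such a check could be written is sketched below. It assumes the tool's arguments and its 'argument order' list are both available as lists of argument names; the function name and the example argument names are hypothetical. The sketch also reports the converse case, arguments that are missing from the order list.

```python
def check_argument_order(arguments, argument_order):
    """Compare a tool's arguments with its 'argument order' list.

    Returns (unknown, missing):
      unknown - entries in the order list that are not tool arguments
      missing - tool arguments that do not appear in the order list
    """
    valid = set(arguments)
    ordered = set(argument_order)
    unknown = [name for name in argument_order if name not in valid]
    missing = [name for name in arguments if name not in ordered]
    return unknown, missing


# Hypothetical tool with one mistyped entry in its order list.
unknown, missing = check_argument_order(
    ["--in", "--out", "--region"],
    ["--in", "--ouput", "--region"],
)
print("not valid tool arguments:", unknown)
print("absent from argument order:", missing)
```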

Intermediate file deletion - Version 2 update

Reimplement the capability to delete intermediate files as soon as possible during pipeline execution. Standard behaviour in make is to wait until the end, which can fill up disk space with unnecessary files.
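The earliest point at which a file can be removed is after the last task that reads it. A small sketch of that bookkeeping follows, assuming the pipeline is available as an ordered list of (task name, inputs, outputs) with unique task names; this data layout and the function name are assumptions, not gkno's internal representation.

```python
def deletion_schedule(tasks, keep):
    """Work out after which task each intermediate file can be deleted.

    tasks: ordered list of (name, inputs, outputs)
    keep:  set of filenames that must never be deleted
    Only files produced by an earlier task count as intermediate, so
    files supplied by the user are never scheduled for deletion.
    """
    produced = set()
    last_use = {}
    for name, inputs, outputs in tasks:
        for filename in inputs:
            if filename in produced:
                last_use[filename] = name  # later readers overwrite earlier ones
        produced.update(outputs)

    schedule = {name: [] for name, _, _ in tasks}
    for filename, name in last_use.items():
        if filename not in keep:
            schedule[name].append(filename)
    for files in schedule.values():
        files.sort()
    return schedule


tasks = [
    ("align", ["reads.fq"], ["raw.bam"]),
    ("sort", ["raw.bam"], ["sorted.bam"]),
    ("index", ["sorted.bam"], ["sorted.bam.bai"]),
]
print(deletion_schedule(tasks, keep={"sorted.bam"}))
```

Files that are produced but never read by a later task are treated as final outputs here and are left alone.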

Not passing optional argument --fastq2 to pipeline fastq-bam breaks

Hi,

I found that '--fastq2 (-q2)' is labeled under 'Optional pipeline specific arguments' in the pipeline 'fastq-bam'. However, if this argument is not provided, the execution breaks.

$ ~/src/gkno_launcher/gkno pipe fastq-bam --hash-size 10 --fasta ~/src/gkno_launcher/resources/tutorial/current/test_genome.fa --fastq ~/src/gkno_launcher/resources/tutorial/current/simulated_reads_1.fq --ann-se ~/src/gkno_launcher/resources/tutorial/current/se.100.005.ann --ann-pe ~/src/gkno_launcher/resources/tutorial/current/pe.100.01.ann --known-sites ~/src/gkno_launcher/resources/tutorial/current/test_genome.dbSNP.snps.sites.vcf

===============================
  Boston College gkno package

  version: 0.89
  date:    June 2013
===============================

Checking tool configuration files...
     bamleftalign.json...done.
     vcflib.json...done.
     mason.json...done.
     michigan-bam-utilities.json...done.
     samtools.json...done.
     freebayes.json...done.
     454-tools.json...done.
     sequence-index-1000g.json...done.
     mosaik.json...done.
     tabix.json...done.
     concatenate-files.json...done.
     bgzip.json...done.
     bamtools.json...done.
     premo.json...done.
     md5.json...done.
     picard.json...done.
     generate-file-list.json...done.
     ogap.json...done.
     gzip.json...done.
     merge-vcf-files.json...done.
     gatk.json...done.
     tangram.json...done.

Checking pipeline configuration file...done.
Reading in command line arguments...done.

Workflow:
     build-reference (mosaik-build-reference):                           Build the Mosaik reference
     build-jump-database (mosaik-jump):                                  Generate the jump database for
                                                                         a Mosaik reference
     index-fasta (samtools-index-fasta):                                 Generate an index for a
                                                                         reference fasta file.
     create-sequence-dictionary (picard-create-sequence-dictionary):     Generate a dictionary
                                                                         containing all of the sequences
                                                                         in the input reference fasta.
     generate-mosaik-parameters (premo):                                 Determine MosaikAligner
                                                                         parameters based on read and
                                                                         fragment length
     build-read-archive (mosaik-build-fastq):                            Build the Mosaik read archive
     align (mosaik-aligner):                                             Pairwise alignment of a read
                                                                         archive
     sort-primary-bam (bamtools-sort):                                   Sort a BAM file
     sort-multiple-bam (bamtools-sort):                                  Sort a BAM file
     index-primary-bam (bamtools-index):                                 Index a BAM file
     count-covariates (gatk-count-covariates):                           Count covariates
     recalibrate-bq (gatk-recalibrate-bq):                               Recalibrate base qualities
     mark-duplicates (picard-mark-duplicates):                           Mark duplicate reads.
     filter-bam (bamtools-filter):                                       Filter a BAM file on many
                                                                         parameters or combinations of
                                                                         parameters.
     realign-gaps (ogap):                                                Realigns alignments optimized
                                                                         to open gaps in low-entropy
                                                                         sequence.
     left-align-indels (bamleftalign):                                   Left-aligns and merges the
                                                                         insertions and deletions in all
                                                                         alignments in stdin.  Iterates
                                                                         until each alignment is stable
                                                                         through a left-realignment step.
     index-final-bam (bamtools-index):                                   Index a BAM file

Assigning command line arguments to tasks...done.
Checking the command line arguments...done.
Checking instance information...done.
Checking multiple runs information...done.
Traceback (most recent call last):
  File "/home/cborroto/src/gkno_launcher/src/gkno/gkno.py", line 467, in <module>
    main()
  File "/home/cborroto/src/gkno_launcher/src/gkno/gkno.py", line 335, in main
    pl.toolLinkage(task, tool, tl.argumentInformation[tool], make.arguments, iLoop.usingInternalLoop, iLoop.tasks, iLoop.numberOfIterations, verbose)
  File "/home/cborroto/src/gkno_launcher/src/gkno/pipelines.py", line 888, in toolLinkage
    for value in arguments[currentTargetTask][0][currentTargetArgument]: self.values[0].append(value)
KeyError: u'-fq2'

Best,
Carlos
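The traceback above ends in a dictionary lookup for an argument that was never set. As an illustration only, and not the project's actual fix, a lookup that tolerates unset optional arguments could look like the sketch below. It assumes the structure suggested by the traceback (task name, then iteration index, then argument, giving a list of values); the function name and the '-fq' argument are hypothetical.

```python
def linked_values(arguments, task, argument, iteration=0):
    """Return the values set for `argument` in `task`, or [] if it is unset.

    arguments: dict mapping task name -> list (one dict per iteration)
               mapping argument -> list of values. This layout is assumed.
    """
    per_iteration = arguments.get(task, [])
    if iteration >= len(per_iteration):
        return []
    return list(per_iteration[iteration].get(argument, []))


# Only the first fastq file was supplied, so '-fq2' has no entry.
arguments = {"build-read-archive": [{"-fq": ["simulated_reads_1.fq"]}]}
print(linked_values(arguments, "build-read-archive", "-fq"))
print(linked_values(arguments, "build-read-archive", "-fq2"))
```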

Argument order

Ensure that all arguments are in the argument order in tool configuration files.

tangram pipeline failed

It seems that the pipeline configuration of tangram is not correct?

./gkno pipe fastq-tangram -r resources/homo_sapiens/current/human_reference_v37_decoys.fa -mr resources/homo_sapiens/current/mobile_element_sequences.fa -q /raid/sonpham/pairs/originalbam/fastq/pe/LID115260_MERGE1_1.fastq -q2 /raid/sonpham/pairs/originalbam/fastq/pe/LID115260_MERGE1_2.fastq -srh 50 -sr moblist -rg chr1

Output:

ERROR: Invalid attribute in pipeline configuration file: help group

DETAILS: The configuration file for pipeline 'fastq-tangram' contains the general attribute
'help group'. This is an unrecognised attribute which is not permitted. The general
attributes allowed in a pipeline configuration file are:

     categories:     <type 'list'>, required = True
     description:    <type 'str'>, required = True
     developmental:  <type 'bool'>, required = False
     hide in help:   <type 'bool'>, required = False
     nodes:          <type 'list'>, required = True
     parameter sets: <type 'list'>, required = True
     tasks:          <type 'dict'>, required = True

     Please remove or correct the invalid attribute in the configuration file.

TERMINATED: Errors in configurationClass. See specific error messages above for resolution.

Failing to correctly parse cmake version in RedHat/CentOS 6

Hi,

It seems that the output of 'cmake --version' on CentOS 6.4 is not being parsed correctly; see below.

$ ./gkno build

===============================
  Boston College gkno package

  version: 0.69
  date:    April 2013
===============================

Reading in command line arguments...done.

Checking dependencies...failed.
    Not up-to-date:
        cmake    minimum version: 2.6.4    found version: 2.6


gkno (and its components) require a few 3rd-party utilities
to either build or run properly.  To obtain/update the utilities
listed above, check your system's package manager or search the
web for download instructions.


================================================================================================
  TERMINATED: Errors found in running gkno.  See specific error messages above for resolution.
================================================================================================

$ cmake --version
cmake version 2.6-patch 4

Thanks,
Carlos
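The output shown above uses the form '2.6-patch 4' rather than '2.6.4'. A sketch of a parser that accepts both forms follows; the function names are hypothetical and this is not how gkno itself parses version strings.

```python
import re


def parse_version(text):
    """Extract a version tuple from text such as 'cmake version 2.6-patch 4'.

    Returns None if the text contains no version number.
    """
    match = re.search(r"(\d+(?:\.\d+)*)(?:-patch\s+(\d+))?", text)
    if match is None:
        return None
    parts = [int(part) for part in match.group(1).split(".")]
    if match.group(2) is not None:
        # Treat '-patch N' as one more version component.
        parts.append(int(match.group(2)))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Compare version tuples after padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    found = found + (0,) * (width - len(found))
    minimum = minimum + (0,) * (width - len(minimum))
    return found >= minimum


found = parse_version("cmake version 2.6-patch 4")
print(found, meets_minimum(found, (2, 6, 4)))
```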

Output filename conflicts

A pipeline can be constructed from tasks that are themselves pipelines. If the contained pipelines take the same input data, they can generate the same output filenames (assuming that the filenames are built by gkno). Implement a check to ensure that the filenames are amended to avoid conflicts.
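A simple form of such a check is sketched below. It assumes the generated outputs are available as (task name, filename) pairs with unique task names, and it resolves a clash by inserting the task name before the file extension. Both the data layout and the renaming scheme are assumptions chosen for illustration.

```python
import os
from collections import Counter


def amend_output_names(outputs):
    """Rename output files that more than one task would write.

    outputs: list of (task, filename) pairs.
    Returns a list of (task, filename) pairs in the same order, where
    clashing filenames have the task name inserted before the extension.
    """
    counts = Counter(filename for _, filename in outputs)
    amended = []
    for task, filename in outputs:
        if counts[filename] > 1:
            stem, extension = os.path.splitext(filename)
            filename = "%s.%s%s" % (stem, task, extension)
        amended.append((task, filename))
    return amended


outputs = [
    ("call-a", "sample.vcf"),
    ("call-b", "sample.vcf"),
    ("align", "sample.bam"),
]
print(amend_output_names(outputs))
```

An amended name could in principle clash with another existing filename, so a fuller implementation would repeat the check on the result.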

Handle quotation marks - Version 2 update

Some tool arguments should be supplied in quotation marks. In particular, some text values may contain spaces and so it is prudent to include them in quotation marks on the command line.
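For values written into a shell command line, Python's standard shlex.quote (available from Python 3.3) adds quotation marks only where they are needed. The sketch below assumes a command is described as an executable plus a list of (flag, value) pairs; the function name, the 'some-tool' executable and the flags are hypothetical.

```python
import shlex


def render_command(executable, arguments):
    """Render a shell command line, quoting values that need it.

    arguments: list of (flag, value) pairs; value is None for a bare flag.
    """
    parts = [executable]
    for flag, value in arguments:
        parts.append(flag)
        if value is not None:
            parts.append(shlex.quote(value))
    return " ".join(parts)


command = render_command("some-tool", [("--name", "my sample"), ("--verbose", None)])
print(command)
# Splitting the rendered line with shell rules recovers the original value.
print(shlex.split(command))
```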

Multiple outputs - Version 2 update

Include instructions to check that all required outputs from a rule have been generated. A standard makefile rule that lists several targets treats each target independently rather than as joint outputs of a single recipe invocation, so a new rule needs to be included for the additional outputs.
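One possible shape for the extra rules is sketched below: the first output carries the real recipe, and each additional output gets a rule that depends on the first and fails if the file is absent. The function name and the example command are hypothetical, and the generated makefile text is only one of several ways to express this.

```python
def make_rules(outputs, inputs, command):
    """Generate makefile text for a task with several outputs.

    The first output is the primary target and carries the command.
    Every other output depends on the primary target and has a recipe
    that only checks the file exists, stopping make if it does not.
    """
    primary, others = outputs[0], outputs[1:]
    lines = ["%s: %s" % (primary, " ".join(inputs)), "\t" + command]
    for extra in others:
        lines.append("%s: %s" % (extra, primary))
        lines.append("\t@test -f $@ || (echo 'missing output: $@' && exit 1)")
    return "\n".join(lines) + "\n"


print(make_rules(["a.bam", "a.bam.bai"], ["in.fq"], "some-tool in.fq"))
```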
