openwdl / wdl
Workflow Description Language - Specification and Implementations
Home Page: https://www.openwdl.org/
License: BSD 3-Clause "New" or "Revised" License
Dynamically sizing HDD & SSD volumes is a great feature of WDL + Cromwell that gives you cost savings in certain cloud environments. A current limitation is that you cannot call `size` on an `Array[File]` type or `Array[Array[...[File]...]]` types, which forces you to either guess the required disk size or build logic into your workflow upstream for determining file sizes.
Some practical considerations: other compound types containing `File`, i.e. `Map[String,File]`, `Pair[File,File]`, etc.
In the spec, a command part option named `quote` is listed: https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#command-part-options
However, although the spec goes on to have a subsection explaining the other four options (`sep`, `true`, `false`, and `default`), there appears to be no documentation of what is meant by the `quote` option.
It would be nice to get methods to iterate over maps with scatter. It would also be good to have access to `map.keySet` and `map.valueSet` to be able to process the keys or values of a map separately.
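The requested accessors map directly onto dict views; a minimal Python sketch of the desired semantics (the names `keySet` and `valueSet` come from the proposal above, not from existing WDL):

```python
# Hypothetical semantics of map.keySet / map.valueSet, sketched in Python.
sample_files = {"tumor": "t.bam", "normal": "n.bam"}

keys = sorted(sample_files.keys())      # roughly what map.keySet would return
values = sorted(sample_files.values())  # roughly what map.valueSet would return

# "scatter over a map" would then be equivalent to iterating key/value pairs:
pairs = [(k, v) for k, v in sorted(sample_files.items())]
```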
To make life easier for authors doing string checks like in https://gatkforums.broadinstitute.org/wdl/discussion/10354/multiple-backends-for-cromwell#latest
Eg:
File f
Boolean a = endsWith(basename(f), ".suffix")
Boolean b = startsWith(basename(f), "prefix.")
Boolean c = contains(basename(f), ".middle.")
Draft implementation: https://github.com/openwdl/wdl/commits/133-string-find/
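For reference, the three proposed checks have direct Python string-method analogues, which is roughly the behavior being asked for (a sketch; the WDL names above are the proposal, not current spec, and the path is made up):

```python
import os.path

f = "/data/prefix.middle.suffix"   # hypothetical input path
name = os.path.basename(f)

a = name.endswith(".suffix")    # WDL: endsWith(basename(f), ".suffix")
b = name.startswith("prefix.")  # WDL: startsWith(basename(f), "prefix.")
c = ".middle." in name          # WDL: contains(basename(f), ".middle.")
```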
Broad's GDAC often deals with sparsely and inconsistently populated data, and many analyses support a range of acceptable inputs. For example, our methylation preprocessor will correlate methylation data to expression data if it is available for a sample set. However, this means that in addition to optional inputs (already supported by WDL), output files are also optional, or at least conditional upon inputs. Currently cromwell raises an error when an expected output is not present, and I don't believe there is syntax to support it in WDL.
It would be nice to be able to specify that an output is optional. One solution could be to mirror to the optional input syntax, and I could specify optional outputs like this:
task my_task {
... inputs, commands, etc...
output {
File required_output = "file1"
File optional_output? = "file2"
}
}
Having this would eliminate the need to write additional logic to create/handle blank files.
There's currently no literal for this concept, inviting a variety of `if`-based hackery around the problem.
Please add the ability to do nested scatter loops. The current single sample workflow runs HaplotypeCaller in parallel by scattering over interval. I would like to add a scattering by sample step so it can handle multiple samples.
(@kcibul - another Jeff Special)
The status of our spec(s) is super confusing.
There's "Draft 1 (closed)" which predates our team. There's "Draft 2 (open)". There's the master branch. There's the develop branch. The "Draft 2" on master doesn't match what's in develop all the time. There's also only one real implementation of the spec, and it's not even complete.
I recognize the desire on our past selves' part to be official about everything, but we could simplify this a lot by just having a spec that's allowed to change at will until if/when that actually becomes a problem.
Hi
Thanks for releasing this WDL script. I noticed the bwa parameters are: bwa_commandline="bwa mem -K 100000000 -p -v 3 -t 16 $bash_ref_fasta"
What is the `-K` flag for? It is not defined in the bwa manual. Also, is `-M` missing?
Thanks
Matt
There is barely a mention of how workflow outputs work in the SPEC.md file.
The only mention of it is here: https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#outputs
Acceptance criteria: write a better explanation of how workflow outputs work
It says:
"Tasks define all their outputs as declarations at the top of the task definition."
It should say:
"Tasks define all their inputs as declarations at the top of the task definition."
We are converting the WDL scripts to CWL, and there seem to be some discrepancies regarding the correct way to format arguments within the command line with boolean values.
While working on converting the WDL script to CWL using a wdl2cwl converter (https://github.com/common-workflow-language/wdl2cwl), we found that if a boolean value exists for a particular argument, errors were being raised. There is an issue regarding the interpolation of the argument within the command line.
According to the WDL HaplotypeCaller_3.6 script, if an argument takes in a boolean and its default value is `false`, then the command line includes both the argument and the value `false` at the end.
For example, examining this piece of code:
-allowNonUniqueKmersInRef ${default="false" allowNonUniqueKmersInRef}
With a default value of false, the script would return
-allowNonUniqueKmersInRef false
if given no value (using the default), when in reality it should return nothing, as only a true value actually causes anything to appear on the command line. An "invalid argument value false" error is returned in this case. The WDL script does not seem to reflect that. We suspect that the interpolation should instead look like
${true='-allowNonUniqueKmersInRef' false='' Boolean bool}
as shown in the documentation for WDL (https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#integer-lengtharrayx). When converted to CWL, the command line output does not run if done the current way. If false, the argument should not exist on the command line, and if true, the command line should contain
-allowNonUniqueKmersInRef
The `true` and `false` values should not appear on the command line.
There also seem to be problems regarding arguments with array inputs. According to the documentation (https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--input_prior) for something such as `kmerSize`, with an array of `[10, 25]`, the command line should look like
-kmerSize 10 -kmerSize 25
However, the output is `-kmerSize [10,25]`, which is not understood by the program as it is not formatted correctly. According to the WDL documentation, the argument should use `Array[String] prefix(String, Array[X])` so that the prefix is repeated for each value. In conversion from WDL to CWL, these problems cause errors that make the program fail.
The format of the WDL seems to be incorrect, if we are understanding the code correctly. Is it possible that there exists a different version of the WDL code for HaplotypeCaller?
Perhaps we aren't understanding the code correctly. Any clarification is appreciated! Thank you!
Until #25 is fixed, the gatk wrappers should not be using optional arrays, such as the intervals files. Instead they should use this bash workaround, with something similar to:
Array[String]? foo
command {
  if [ -n "${sep=',' foo}" ]; then
    FLAG="--prefix=${sep=',' foo}"
  else
    FLAG=''
  fi
  echo $FLAG
}
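The intent of the workaround above — emit the flag only when the array is non-empty — can be stated as a plain function (Python purely for illustration; `--prefix` is just the example flag from the snippet):

```python
def build_flag(foo):
    """Mimic the bash workaround: ${sep=',' foo} joins elements with commas,
    and the flag is emitted only when the joined string is non-empty."""
    joined = ",".join(foo or [])
    return "--prefix=" + joined if joined else ""
```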
@danbills commented on Thu Jul 06 2017
It would be nice to restrict types to specific values so that users can only specify one of a limited number of a values.
Per this forum post
@cjllanwarne commented on Mon Jul 10 2017
Could be a nice bring-along with broadinstitute/cromwell#2283
@patmagee commented on Tue Aug 29 2017
Just wondering if there has been any discussion regarding this.
@katevoss commented on Tue Aug 29 2017
@patmagee not to my knowledge but I can start the ball rolling.
@geoffjentry & @mcovarr, any opinions on supporting enums?
@geoffjentry commented on Tue Aug 29 2017
I'm leery but not necessarily against it. I'm also generally the one with the most conservative opinion in terms of adding WDL syntax, so view that take as a lower bound of acceptance :)
@patmagee what were you thinking in terms of the syntax?
Pinging @vdauwera so she's abreast of this convo.
@vdauwera commented on Wed Aug 30 2017
FWIW Enum support would definitely be very valuable to us in GATK-world, and is likely to be useful in general.
My one caveat would be that they should be easier to work with than they were in Queue (friendly elbow poke at @kshakir).
@patmagee commented on Fri Sep 01 2017
@geoffjentry I share your concern about adding any sort of syntax to the spec, and my first inclination is to fake enumeration support. I'm not exactly sure how to do that, though, nor am I sure whether it's possible.
The cleanest way I can think of implementing would be maybe something like a Java style enum:
enum MyEnum {
"A","B","C"
}
workflow wf {
MyEnum thisIsMyEnum
}
Another way we could do it would be to define an Enum type in a workflow like so:
workflow wf {
  #This would get overridden at run time, but the value would need to be validated
  Enum greeting = ["HELLO","GOODBYE"]
}
or
workflow wf {
  #Don't override anything, but validate it
  Enum["HELLO","GOODBYE"] greeting
}
@katevoss commented on Thu Sep 28 2017
@geoffjentry sounds like this is more of a WDL feature request, shall I move it to the WDL repo?
@geoffjentry commented on Thu Sep 28 2017
https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#true-and-false
The docs indicate that you do not need a "true" and a "false", but in fact both are required.
I would like to be able to get a list of 3 kinds of files in a directory
*.png
*.txt
*.html
and store this list into a single output variable. I know the glob() function can be used to find each type, but that means 3 glob() invocations in the output{} section of my WDL, and there is no way to paste them together.
I suppose I could do `ls *.png *.txt *.html >> outputs.lst` in the command{} section, and then in my output{} section do `Array[File] outputs = read_lines("outputs.lst")`.
But it would be cleaner to be able to do something like
Array[File] outputs = glob("*.png") + glob("*.txt") + glob("*.html")
or
Array[File] outputs = concat(glob("*.png"), glob("*.txt"), glob("*.html"))
or variants.
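The desired concatenation semantics are easy to state with Python's glob module (a sketch of the requested behavior, not WDL syntax; the scratch directory and file names are made up):

```python
import glob
import os
import tempfile

# Set up a scratch directory holding the three kinds of files plus a decoy.
d = tempfile.mkdtemp()
for name in ["a.png", "b.txt", "c.html", "d.log"]:
    open(os.path.join(d, name), "w").close()

# The request: concatenate three glob results into one output list.
outputs = sorted(
    glob.glob(os.path.join(d, "*.png"))
    + glob.glob(os.path.join(d, "*.txt"))
    + glob.glob(os.path.join(d, "*.html"))
)
names = [os.path.basename(p) for p in outputs]
```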
Needs Refinement
We want to be able to define types for the values of objects. One suggestion was something like the following (note: `struct` is used here as a possible replacement for `object`; see below):
struct MyType {
o_f: File
x: Array[String]
}
MyType foo = read_object(...)
It will coerce to the types it expects and if it can't that's a failure.
Open questions:
- whether to add this as a new type (`struct` above), or replace `object` entirely
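The proposed coercion rule ("coerce to the expected types, fail if you can't") can be sketched like this (Python for illustration; the field names come from the `MyType` example above, and `coerce_struct` is a hypothetical helper):

```python
def coerce_struct(raw, schema):
    """Coerce each field of a parsed object to its declared type;
    raise if a field is missing or cannot be converted."""
    out = {}
    for field, typ in schema.items():
        if field not in raw:
            raise ValueError("missing field: " + field)
        out[field] = typ(raw[field])  # attempt the coercion, e.g. str -> int
    return out

# Mirrors: struct MyType { o_f: File, x: Array[String] }
schema = {"o_f": str, "x": list}
record = coerce_struct({"o_f": "data.bin", "x": ["a", "b"]}, schema)
```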
For Mutect we use a lot of sub-workflows, and it's tedious to pass a large set of parameters repeatedly. For example:
import "single_sample.wdl" as SingleSample
workflow MultiSample {
#these are the parameters of the single-sample subworkflow
File param1
File param2
. . .
Int param20
#other parameters
scatter (sample in samples) {
call SingleSample.single_sample {
input:
param1 = param1,
param2 = param2,
. . .
param20 = param20
}
}
}
In this case, the same list of parameters is copied 5 times: the single-sample task inputs, the single-sample workflow inputs, the single-sample workflow's call to the task, the multi-sample workflow inputs, and the multi-sample workflow's call to the single-sample workflow. It would be really nice to be able to encapsulate all these parameters, eg
params MyParams {
File param1
File param2
. . .
Int param20
}
task MyTask {
MyParams params
command {
java -jar ${params.gatk} -R ${params.reference} -L ${params.intervals} . . .
}
}
workflow MyWorkflow {
MyParams params
call MyTask { input: params = params }
}
Consider a WDL command section with 2 or more Unix statements. Cromwell currently reports that such commands "succeed" even when one or more of its Unix statements fail (i.e. return a non-zero status code), as long as the final statement returns a zero status code (e.g. echo "Done").
This is fragile, and I think it's far more common to want Cromwell to report a failure in such cases than to silently ignore them.
One workaround is to chain all statements together into one effective line, using && to separate each statement, but this is less flexible and readable (uglier). To make the common case more robust and readable, it would be helpful to have the generated script file begin with `set -eo pipefail` (or perhaps even `set -euo pipefail`).
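The difference is easy to demonstrate by running the same two-statement script with and without the proposed prologue (Python driving bash here purely for illustration; assumes bash is on PATH):

```python
import subprocess

body = 'false\necho "Done"\n'  # first statement fails, last one succeeds

# Without the prologue, bash reports the exit code of the LAST statement,
# so the overall script "succeeds" despite the failure.
plain = subprocess.run(["bash", "-c", body], capture_output=True)

# With `set -eo pipefail`, the first failing statement aborts the script.
strict = subprocess.run(["bash", "-c", "set -eo pipefail\n" + body],
                        capture_output=True)
```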
I am following the instructions here:
https://github.com/broadinstitute/wdl#getting-started-with-wdl
I have downloaded cromwell-0.19.jar.
I created the example "hello.wdl".
The README indicates that there is an "inputs" command to see the inputs to the wdl.
When I run "java -jar cromwell.jar", it indicates that there are just two commands "run" and "server".
It appears that the "inputs" command has moved to "wdltool".
I downloaded wdltool from https://github.com/broadinstitute/wdltool/releases
$ java -jar wdltool.jar inputs hello.wdl
{
"test.hello.name": "String"
}
The documentation should be updated accordingly.
Also at the bottom of this first example, it indicates:
Since the hello task returns a File, the result is a file that contains the string "hello world!" in it.
It would be worth adding a demonstration of that:
$ cat /home/user/test/c1d15098-bb57-4a0e-bc52-3a8887f7b439/call-hello/stdout8818073565713629828.tmp
hello world!
We need to decide (and then treat appropriately) the scope of what we're supporting in the WDL universe. For example, are we supporting, and thus maintaining, pywdl?
It looks like task outputs work in a different way depending on whether you reference them inside the scatter block (like with dec, where I get an integer from inc.increment) or outside of it (in which case I get an array from inc.increment):
workflow wf {
Array[Int] integers = [1,2,3,4,5]
scatter (i in integers) {
call inc {input: i=i}
call dec {input: i=inc.increment} #inc.increment is integer
}
call sum {input: ints = inc.increment} #inc.increment is an array
}
This is very confusing. Maybe it is possible to clarify this behaviour in the docs?
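What is being described is WDL's implicit gathering: inside the scatter block each `inc.increment` is a single value, while outside the block the same name denotes the array of all shards. A Python sketch of what the engine effectively does (illustration only, not a proposal):

```python
integers = [1, 2, 3, 4, 5]

inc_increment = []               # the gathered outputs of `call inc`
dec_results = []
for i in integers:               # scatter (i in integers)
    inc = i + 1                  # inside the block: a single Int
    inc_increment.append(inc)
    dec_results.append(inc - 1)  # call dec {input: i = inc.increment}

# Outside the block, the same name is the gathered Array[Int]:
total = sum(inc_increment)       # call sum {input: ints = inc.increment}
```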
We have a bunch of awesome tools for WDL! Unfortunately I miss the roundup all the time.
It'd be nice to make the link more prominent, perhaps as one of the highlighted pieces?
I understand this is a little more noisy; I was thinking that perhaps listing these things vertically may help.
Anyway I crafted this professional drawing to be more specific
It appears that the command part options example for true and false given in the SPEC (https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#true-and-false) is broken.
First of all, the parser does not seem to allow commas between command options (the `list` macro is invoked without the `:comma` second argument; https://github.com/broadinstitute/wdl/blob/develop/parsers/grammar.hgr#L388).
Secondly, it does not appear to be possible to assert a type of a command part by prefixing the expression with the type as is done in this example:
For example, ${true='--enable-foo', false='--disable-foo' Boolean yes_or_no} would evaluate to either --enable-foo or --disable-foo based on the value of yes_or_no.
Because the grammar does not seem to accept a type specification here, it is not clear to me when and how the `true`/`false` key values should be interpreted. In this example it would be clear that the type of `yes_or_no` is `Boolean` because the expression is just a simple identifier, but if the expression were more complex (such as a function call) it would be harder to determine its type without understanding the execution environment.
We have come across this while trying to improve the wdl2cwl converter (https://github.com/common-workflow-language/wdl2cwl) to handle more complex command expressions, and in order to add the `true` and `false` handling we will need to understand when they are meant to be invoked.
After reading the docs and some of the example scripts (such as the WDL for HaplotypeCaller), it seems possible that we are meant to use the `true` and `false` options (each with a default value of `""`) whenever the type of the expression is `Boolean` - but that they do not apply when the type is `Boolean?`, as in that case the `default` option is used instead and the `true`/`false` values are simply stringified, as in this example (from https://github.com/broadinstitute/wdl/blob/develop/scripts/wrappers/gatk/WDLTasks_3.6/HaplotypeCaller_3.6.wdl#L141):
-allelesTrigger ${default="false" useAllelesTrigger}
which appears to evaluate to `-allelesTrigger false` if `useAllelesTrigger` is `false` or unset/null, and `-allelesTrigger true` if `useAllelesTrigger` is `true`.
If the `true` and `false` options apply to both `Boolean` and `Boolean?`, and we are meant to apply the default (`""`) values of the `true` and `false` options as the docs suggest -- in other words, if the above line were equivalent to:
-allelesTrigger ${true="" false="" default="false" useAllelesTrigger}
then I would expect this to evaluate to `-allelesTrigger false` only when `useAllelesTrigger` is unset/null, and to `-allelesTrigger` when `useAllelesTrigger` is `true` or `false`.
Clarification of the documentation would be helpful!
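Our reading of the spec, stated as an explicit rule (this is the interpretation we are asking to have confirmed, not documented behavior):

```python
def interpolate(value, true_s=None, false_s=None, default=None):
    """Suspected ${...} substitution rule for Boolean expressions:
    - if the value is unset (None), substitute `default`, stringified;
    - else if true=/false= options are given, substitute the matching one;
    - else stringify the boolean itself ("true"/"false")."""
    if value is None:
        return str(default).lower() if default is not None else ""
    if true_s is not None or false_s is not None:
        return (true_s or "") if value else (false_s or "")
    return "true" if value else "false"
```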
The grammar currently only allows `meta` in tasks. We'd love to be able to explicitly track versioning & provenance of workflows themselves by embedding some identifiers in a workflow-level meta block, something like the following:
...
workflow finishQuickly {
meta {
revision: "1.0.0"
url: "https://path/to/my/repo/finishQuickly.wdl"
}
call Usain
}
I have a `Map[String,File]` from which I want to extract the values into an `Array[File]`, equivalent to e.g. the Java expression `List<File> list = map.values()`.
Justification: I need to provide multiple files to a task and I don't want to have to hardcode separate arguments for each. I can't just use an Array at the workflow inputs level because I need to be able to call on one of the files specifically in other tasks.
Worked out use case:
{
"DoStuffWithKnownSitesWf.known_sites_VCFs_map": { "dbsnp": "dbsnp_138.vcf", "mills": "mills_indels.vcf", "other": "other_sites.vcf" },
"DoStuffWithKnownSitesWf.known_sites_indices_map": { "dbsnp": "dbsnp_138.vcf.idx", "mills": "mills_indels.vcf.idx", "other": "other_sites.vcf.idx" }
}
task SomeTool {
Array[File] known_sites_VCFs
Array[File] known_sites_indices
command {
doSomething -knownSites ${sep=" -knownSites " known_sites_VCFs}
}
}
task SomeOtherTool {
File dbSNP_VCF
File dbSNP_index
command {
doSomethingElse --dbsnp ${dbSNP_VCF}
}
}
workflow DoStuffWithKnownSitesWf {
Map[String, File] known_sites_VCFs_map
Map[String, File] known_sites_indices_map
call SomeTool {
input:
known_sites_VCFs = known_sites_VCFs_map.values,
known_sites_indices = known_sites_indices_map.values
}
call SomeOtherTool {
input:
dbSNP_VCF = known_sites_VCFs_map["dbsnp"],
dbSNP_index = known_sites_indices_map["dbsnp"],
}
}
The expression `dbSNP_VCF = known_sites_VCFs_map["dbsnp"]` already works perfectly. But there's currently no way to do a straightforward `known_sites_VCFs = known_sites_VCFs_map.values`. This is the feature request. The actual syntax can of course be different.
Bonus points for making the keys available as well, though I don't have an immediate use case in mind.
Draft implementation: https://github.com/openwdl/wdl/tree/43-map-values
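The requested `map.values` reads naturally as the dict analogue (Python sketch of the desired semantics; the `.values` accessor syntax is the proposal above):

```python
known_sites_VCFs_map = {
    "dbsnp": "dbsnp_138.vcf",
    "mills": "mills_indels.vcf",
    "other": "other_sites.vcf",
}

# known_sites_VCFs = known_sites_VCFs_map.values   (the feature request)
known_sites_VCFs = list(known_sites_VCFs_map.values())

# dbSNP_VCF = known_sites_VCFs_map["dbsnp"]        (already works in WDL)
dbSNP_VCF = known_sites_VCFs_map["dbsnp"]
```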
As mentioned a couple of times, pair dereferencing is not intuitive (you have to look it up in the docs to find it). Perhaps we could additionally support array-style dereferencing?
Pair[Int, Int] p = (100, 22)
Int one_hundred = p[0]
Int twenty_two = p[1]
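For comparison, Python tuples behave exactly as the proposal suggests (illustration only):

```python
p = (100, 22)        # Pair[Int, Int] p = (100, 22)
one_hundred = p[0]   # proposed p[0], instead of p.left
twenty_two = p[1]    # proposed p[1], instead of p.right
```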
As the definition of globbing can vary from shell to shell, and language to language, the WDL spec should really clearly define its version of globbing rather than just giving examples.
Obviously, there are the most basic patterns `*`, `?`, and `[...]`, though the only examples given use `*`.
However, as this seems to be the only form of pattern matching available in WDL, the addition of brace expansion, character classes, and bash-style extended globbing would be excellent.
Allow for WDL like this:
task x {
command {...}
output {
String output = "foobar"
}
}
The output is called `output`, which currently does not work: the WDL parser thinks it is a keyword.
I would like there to be support for the following type of scenario:
task foo {
Array[String] bar
Array[String]? baz
command {
something.py --input ${sep=' ' bar} ${"--optionalInput" + ${sep=' ' baz}}
}
...
As far as I can tell there isn't a way to do this now.
I have the following toy workflow, which I invoke with `java -jar cromwell-25.jar run example.wdl empty_inputs.json`. It reads a tsv that has either one column or two, and scatters a task over each row of the tsv. The task prints the second column if it is present. When the input `fake.tsv` has one column, everything is fine. However, when it has two columns, e.g.
1</TAB>1
2</TAB>2
it fails with "Could not construct array of type WdlMaybeEmptyArrayType(WdlOptionalType(WdlIntegerType)) with this value: List(WdlInteger(1), WdlInteger(2))". (Side question: why is it trying to make a list out of values in two different scattered rows?) Using `Int?` in the conditional instead of `Int` does not make a difference.
Another bizarre twist: if instead of reading in from a file I hardcode the array, the error persists when each row of the array has the same number of columns, but goes away when some rows have two columns and some do not. That is: `Array[Array[Int]] table = [[1,1,1], [2,2]]` works, but `Array[Array[Int]] table = [[1,1], [2,2]]` gives the same error as above.
task printInt {
Int? int
command { echo "${int}" > out.txt }
output { File out = "out.txt" }
}
workflow optional {
Array[Array[Int]] table = read_tsv("fake.tsv")
scatter (row in table) {
if (length(row) == 2) {
Int int = row[1]
}
call printInt {input: int=int }
}
}
Very often I need to slice an array. For instance, in RNA-Seq experiments I have tsv files with the first column as a condition and all subsequent columns as GSM ids for samples. It would be useful to get all elements of an array except the first. I would love to have something either like Scala's head/tail or like Python's slice access for arrays.
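The requested head/tail and slice operations, in Python terms (a sketch of the desired semantics; the row contents are made up):

```python
row = ["condition", "GSM101", "GSM102", "GSM103"]

head = row[0]      # the condition column
tail = row[1:]     # all GSM ids: everything except the first element
middle = row[1:3]  # an arbitrary slice
```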
@jsotobroad commented on Fri Sep 22 2017
It would be nice to have a way of summing up a list of floats. If you have a scatter task whose outputs you then want to gather, it's not straightforward to dynamically size that gather task. This is what we currently do (and if it's stupid, let us know!):
scatter (bam in list_of_bams) {
  call MakeBam {
    input:
      bam = bam
  }
  Float mapped_bam_size = size(MakeBam.output_bam, "GB")
}
call SumFloats {
  input:
    sizes = mapped_bam_size
}
.....
task SumFloats {
Array[Float] sizes
command <<<
python -c "print ${sep="+" sizes}"
>>>
output {
Float total_size = read_float(stdout())
}
runtime {
docker: "python:2.7"
preemptible: preemptible_tries
}
}
@geoffjentry commented on Fri Sep 22 2017
@jsotobroad Please file this in the wdl repo, not the cromwell repo
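The `SumFloats` trick above works because `${sep="+" sizes}` expands to a literal arithmetic expression that the embedded Python then evaluates. The expansion can be sketched as (Python; the sizes values are made up):

```python
sizes = [1.5, 2.25, 3.0]  # hypothetical mapped_bam_size values

# ${sep="+" sizes} expands to the string "1.5+2.25+3.0", and
# `python -c "print 1.5+2.25+3.0"` then evaluates that expression.
expression = "+".join(str(s) for s in sizes)
total_size = eval(expression)  # what read_float(stdout()) would see
```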
The WDL spec specifies that the values in the KV pairs of meta and parameter_meta blocks are strings, but the grammar allows for expressions. This leads to a confusing use of RuntimeAttribute as part of the AST, as evidenced by the discussion around the comment in wdl4s #53.
There doesn't appear to be a reason to use an expression in these blocks (paging @kcibul - do you disagree?), so let's remove that capability, which will also allow us to clean up wdl4s.
This ticket requires both updates to the spec/grammar here as well as cleaning up wdl4s.
It is useful to have files created by tasks/workflows reflect the name of the thing that created them, like
task Foo {
command {
echo "blah blah blah" > Foo_output.txt
...
}
...
}
But right now this name has to be hardcoded into the WDL. It would be cleaner and more robust (e.g. to changes in the workflow or task name) if it could instead be referenced by introspection, such as ${__name__} or something, rather than duplicating the task name as a hardcode in multiple places within the various sections of the WDL body.
I would like to run the `PublicPairedSingleSampleWf_160927` workflow, but I cannot find where to look for the sample data referenced in the `input.json`. Could it be mentioned in the README?
runtime should be treated the same as "command" or "output"
Example:
runtime { docker: "broadinstitute/picard"}
Thanks.
The timestamp was incremented to follow the newest version, but the doc wasn't actually updated. The changes are minimal but should be noted, and the supported versions should be updated.
PublicPairedSingleSampleWf_160927.options.json only includes 2 of the 4 us-central1
zones:
"zones": "us-central1-b us-central1-c"
I'm not sure if there is a rationale for limiting the zones that Cromwell can use to run tasks. The 4 Compute Engine zones in `us-central1` are:
(Tagging @kcibul as I don't know if folks actually look in here regularly)
`Object` is a WDL type, although as far as I can tell it is not defined in the WDL spec at all. One can kind of infer what it is from mentions, but I didn't see any concrete explanation.
This WDL forum post presents a use-case for why a contains style function on arrays could be very useful. It also reflects on the downsides of current workarounds in WDL.
It would be helpful to have a function called "contains" or "exists" which returns a Boolean depending on whether a given value exists/is contained within a given array.
Draft implementation: https://github.com/openwdl/wdl/tree/117-contains
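The requested function is plain membership testing; in Python terms (a sketch; `contains` is the proposed WDL name, and the sample values are made up):

```python
def contains(array, value):
    """Proposed WDL contains(Array[X], X) -> Boolean."""
    return value in array

samples = ["tumor", "normal"]
has_tumor = contains(samples, "tumor")      # expected True
has_control = contains(samples, "control")  # expected False
```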
Truncated from @cjllanwarne
I believe the order of precedence should be:
- Inputs provided by the inputs JSON
- Inputs specified explicitly by the call (i.e. in the workflow doing `call foo {input: inputInt = 6}`)
- The default value in the task
Right now the order appears to be:
- Inputs specified explicitly by the call (i.e. in the workflow doing `call foo {input: inputInt = 6}`)
- The default value in the task
- Inputs provided by the inputs JSON
Full context: see the conversation from the Cromwell repo
It seems like this should be possible but it currently isn't (from what I can tell).
At the Mint meeting I heard these will all be public in the HCA repo (the Skylab repo) and available for anyone to take. These relate to single-cell RNA-Seq. Contacts are the Mint team.
E.g. 10X pipeline WDL
At the least, point users to the repo in our README.
In issue #89 I mentioned a use case of wanting to get a list of 3 kinds of files in a directory
*.png
*.txt
*.html
and store this list into a single output variable. The glob() function is not expressive enough to generate such a list with a single call, but a regular expression-based function would be.
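A regular-expression version of the request, sketched in Python (an illustration of the desired expressiveness, not proposed WDL syntax; the scratch directory is made up):

```python
import os
import re
import tempfile

d = tempfile.mkdtemp()
for name in ["a.png", "b.txt", "c.html", "d.log"]:
    open(os.path.join(d, name), "w").close()

# One pattern matching all three extensions at once, where glob() needs three calls.
pattern = re.compile(r".*\.(png|txt|html)$")
outputs = sorted(n for n in os.listdir(d) if pattern.match(n))
```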
I just used the attached wdl file (saved as .txt so github would attach it) in IntelliJ, and the highlighter seems to find an error with the first '?' in the file. The error pops up as: "WdlTokenType.IDENTIFIER or WdlTokenType.LSQUARE expected" for the '?'.
Any subsequent '?' don't show an error.
The tutorial scripts featured on the website should be mirrored here. They can live in `scripts/tutorials/` (`scripts/` will be created by https://github.com/broadinstitute/wdl/pull/36 when it's merged).