mlst-nf's People

Contributors

dfornika

mlst-nf's Issues

Pipeline fails on low-quality assembly

QUAST fails when given an assembly with no contigs longer than 500 bp, and that failure causes the whole pipeline to fail. Because one poor-quality sample can crash a full run, the pipeline would be more robust if it could handle a single low-quality sample without crashing.
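One way to keep a single bad sample from failing the run is to check each assembly before it reaches QUAST. The following is a minimal Python sketch of such a pre-check, not code from mlst-nf. The helper names `contig_lengths` and `has_contig_at_least` are hypothetical, the input is assumed to be uncompressed FASTA, and the 500 bp default mirrors the limit described above (whether QUAST's own boundary is inclusive is an assumption here).

```python
def contig_lengths(lines):
    """Yield the length of each record in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Sequence lines add to the current record's length.
            length += len(line)
    if length is not None:
        yield length


def has_contig_at_least(lines, min_length=500):
    """Return True if any contig is at least `min_length` bp long.

    Hypothetical pre-QC check: a sample that returns False could be
    flagged as low quality and skip QUAST instead of failing the run.
    """
    return any(n >= min_length for n in contig_lengths(lines))
```

Because it accepts any iterable of lines, an open file handle works directly, e.g. `with open("sample-X.fa") as fh: ok = has_contig_at_least(fh)`. How the pipeline would branch on the result is left open.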

Make input QC optional

There are cases where we run this pipeline on the outputs of another pipeline (generally BCCDC-PHL/routine-assembly). That pipeline may already perform QC on its outputs, so running essentially the same QC on the inputs of this pipeline would be redundant.

Add a --skip_input_qc flag that causes the QUAST analysis on the input assemblies to be skipped.

Remove `versioned_outdir` param

The versioned_outdir param hasn't proven to be useful, and it clutters up our publishDir directives.

Remove the versioned_outdir param.

Add support for `--collect_outputs`

We currently generate only a separate output directory for each sample, but it would be convenient to also collect the sequence types for all samples into a single .csv file. The user should be able to specify a prefix for the collected outputs using a `--collected_outputs_prefix` flag, whose default value is `collected`.
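A minimal Python sketch of the collection step, assuming each sample directory contains a `<sample_id>_sequence_type.csv` file with one identical header row (as in the per-sample layouts shown in the versioned-output issue). The function name `collect_sequence_types` and the `<prefix>_sequence_type.csv` output name are hypothetical, not the pipeline's actual implementation.

```python
import csv
from pathlib import Path


def collect_sequence_types(outdir, prefix="collected"):
    """Concatenate per-sample sequence-type CSVs into one file.

    Assumed layout: <outdir>/<sample_id>/<sample_id>_sequence_type.csv,
    each file starting with the same header row.
    Returns the path of the collected file.
    """
    outdir = Path(outdir)
    # Materialize the list before writing, so the new file is never globbed.
    per_sample = sorted(outdir.glob("*/*_sequence_type.csv"))
    collected_path = outdir / f"{prefix}_sequence_type.csv"
    header = None
    with open(collected_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in per_sample:
            with open(path, newline="") as fh:
                rows = list(csv.reader(fh))
            if not rows:
                continue  # skip empty files rather than failing the run
            if header is None:
                header = rows[0]
                writer.writerow(header)
            elif rows[0] != header:
                raise ValueError(f"Unexpected header in {path}")
            writer.writerows(rows[1:])  # data rows only
    return collected_path
```

Writing the header once and checking it against every file makes a schema mismatch fail loudly instead of producing a silently misaligned table.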

`parse_alleles.py` fails when no alleles included in mlst output

Command error:
  Traceback (most recent call last):
    File "/home/dfornika/.nextflow/assets/BCCDC-PHL/mlst-nf/bin/parse_alleles.py", line 78, in <module>
      main(args)
    File "/home/dfornika/.nextflow/assets/BCCDC-PHL/mlst-nf/bin/parse_alleles.py", line 29, in main
      num_alleles = len(mlst[sample]['alleles'])
  TypeError: object of type 'NoneType' has no len()

The JSON output from mlst was:

{
   "sample-X.fa" : {
      "scheme" : "-",
      "sequence_type" : "-",
      "alleles" : null,
      "filename" : "sample-X.fa"
   }
}
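The traceback shows that `mlst[sample]['alleles']` is `null` (Python `None`) when mlst reports no scheme (`"scheme" : "-"`), so `len()` fails. A minimal sketch of a defensive guard, assuming the JSON layout shown above; `count_alleles` is a hypothetical helper, not the script's actual code. The structure of a non-null `alleles` value is not shown in this issue, so the guard relies only on it supporting `len()`.

```python
import json


def count_alleles(mlst_json_text):
    """Return {sample: number of alleles} from mlst's JSON output.

    A missing or null "alleles" value (seen when no scheme is detected)
    is treated as zero alleles instead of calling len(None).
    """
    mlst = json.loads(mlst_json_text)
    counts = {}
    for sample, result in mlst.items():
        alleles = result.get("alleles") or {}
        counts[sample] = len(alleles)
    return counts
```

Any downstream code that iterates over the alleles would need the same guard; the sample could then be reported with sequence type `-` rather than crashing the run.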

Adopt nf-core conventions

In anticipation of integrating with tools and platforms such as Seqera Platform, we'd like to evaluate what it would take to adopt the nf-core conventions in our existing pipelines. Since this is a fairly simple pipeline, it's a good candidate for conversion to the nf-core conventions.

Add optional versioned output directory

The pipeline currently creates one output directory per sample and publishes all outputs there. For example:

publishDir "${params.outdir}/${sample_id}", mode: 'copy', pattern: "${sample_id}_mlst.json"

When combining this pipeline with others, it may be useful to encapsulate the outputs from this pipeline in a sub-directory that is named with the pipeline name and version.

So by default, we would create outputs with this structure:

.
├── sample-01
│   ├── sample-01_alleles.csv
│   └── sample-01_sequence_type.csv
├── sample-02
│   ├── sample-02_alleles.csv
│   └── sample-02_sequence_type.csv
└── sample-03
    ├── sample-03_alleles.csv
    └── sample-03_sequence_type.csv

...but when running with a --versioned_outdir flag, we would produce:

.
├── sample-01
│   └── mlst-nf-v0.1-output
│       ├── sample-01_alleles.csv
│       └── sample-01_sequence_type.csv
├── sample-02
│   └── mlst-nf-v0.1-output
│       ├── sample-02_alleles.csv
│       └── sample-02_sequence_type.csv
└── sample-03
    └── mlst-nf-v0.1-output
        ├── sample-03_alleles.csv
        └── sample-03_sequence_type.csv

...then a subsequent analysis could produce similar outputs alongside:

.
├── sample-01
│   ├── mlst-nf-v0.1-output
│   │   └── sample-01_mlst.csv
│   └── routine-assembly-v0.2-output
│       ├── sample-01_bakta.gbk
│       └── sample-01_unicycler.fa
├── sample-02
│   ├── mlst-nf-v0.1-output
│   │   └── sample-02_mlst.csv
│   └── routine-assembly-v0.2-output
│       ├── sample-02_bakta.gbk
│       └── sample-02_unicycler.fa
└── sample-03
    ├── mlst-nf-v0.1-output
    │   └── sample-03_mlst.csv
    └── routine-assembly-v0.2-output
        ├── sample-03_bakta.gbk
        └── sample-03_unicycler.fa
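The proposed change amounts to inserting one optional path component between the sample directory and the file name. A small Python sketch of that path logic, assuming the `<pipeline>-v<version>-output` naming shown in the trees; `publish_path` and its parameters are hypothetical, and in the pipeline itself this logic would live in the `publishDir` directive.

```python
from pathlib import PurePosixPath


def publish_path(outdir, sample_id, filename, versioned=False,
                 pipeline_name="mlst-nf", pipeline_version="0.1"):
    """Return where a per-sample output file would be published.

    Default:   <outdir>/<sample_id>/<filename>
    Versioned: <outdir>/<sample_id>/<pipeline_name>-v<pipeline_version>-output/<filename>
    """
    path = PurePosixPath(outdir) / sample_id
    if versioned:
        # Optional sub-directory named after the pipeline and its version.
        path /= f"{pipeline_name}-v{pipeline_version}-output"
    return path / filename
```

Leaving `versioned=False` as the default matches the proposal that the versioned layout be opt-in, so existing runs keep their current directory structure.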
