nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.

Home Page: https://nf-co.re/raredisease

License: MIT License

HTML 0.64% Python 2.88% Nextflow 96.02% Groovy 0.29% Shell 0.16%
nf-core nextflow workflow pipeline wgs wes variant-calling snv structural-variants variant-annotation

raredisease's Issues

Add feature/MarkDuplicates

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Add this finishing touch to the mapping subworkflow so that the preprocessing of BAM files is complete before branching out into other tools, e.g. variant callers.

Describe alternatives you've considered

Additional context

Add bcftools/annotate

Description of feature

Hello 👋, we use this to add additional annotations after vcfanno, and to include a header with software and case info.

Create an interactive chart to use during pipeline development

To complement the project board, we would like an interactive chart that reflects the progress of the development work and that can easily be modified, e.g. when we want to include more tools.
nf-core recommends LucidChart or Google Drawings for this task. For the moment we are going with Google Drawings.

Draft overview of future pipeline

nf-raredisease

This overview is based on the WGS/WES rare disease pipeline (MIP) currently in use at Clinical Genomics Stockholm. It outlines the basic functionality and modules that we would like from a pipeline specialised in calling, annotating and scoring variants relevant for rare disease patients.

Overview

Fastq files are prepared for variant calling by alignment with bwa-mem/bwa-mem2 followed by MarkDuplicates. From this point the workflow splits into an SNV/indel part and an SV part.

SNV/Indels

SNV/indels are primarily called with DeepVariant/GLnexus, with the possibility of turning on the GATK HaplotypeCaller workflow. These two callsets can be combined into one for maximum sensitivity. Vcfanno annotates the callset with population allele frequencies (gnomAD) and predicted pathogenicity (CADD). Common variation is removed from the callset and CADD scores are calculated for indels. VEP is used for transcript annotation, including annotation with ClinVar, SpliceAI and pLI scores. The SNV/indels are split into a clinical callset and a research callset based on a bed file with genes of interest. Finally, the variants are ranked for predicted pathogenicity based on their annotations as well as their modes of inheritance.
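The clinical/research split described above can be sketched as follows. This is an illustrative Python sketch, not pipeline code: `split_callset` and the pair-based variant representation are assumptions; the real pipeline works on VEP-annotated VCF records and a bed file of panel genes.

```python
def split_callset(variants, panel_genes):
    """Split annotated variants into a clinical and a research callset.

    variants: iterable of (variant_id, gene) pairs (simplified stand-in
    for annotated VCF records); panel_genes: genes of interest from the
    bed file.
    """
    clinical, research = [], []
    for variant_id, gene in variants:
        # clinical callset: restricted to the genes-of-interest panel
        if gene in panel_genes:
            clinical.append(variant_id)
        else:
            research.append(variant_id)
    return clinical, research
```

Note that in practice the research callset typically contains the full set of variants; the sketch keeps the two sets disjoint only for simplicity.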

SV

We use Cnvnator, Manta, (Delly) and Tiddit to call structural variants. Using SVDB we combine the variants into one callset, and using a local frequency database we remove common variants and sequencing/calling artefacts. The callset is annotated with vcfanno and VEP, followed by a split into a clinical callset and a research callset. The SVs are then ranked in the same manner as the SNVs.

But wait, there's more

Aside from SNVs and SVs, the pipeline identifies and visualizes runs of homozygosity/autozygosity as well as UPDs. Also included are the identification and annotation of pathogenic STRs with ExpansionHunter and Stranger. SMNCopyNumberCaller is used to diagnose patients with spinal muscular atrophy.

The tools mentioned here are not set in stone and we are certainly open to adding and changing tools as we continue development. Below is a list of tools used in the workflow.

Bcftools
BedTools
BWA
CADD
Chanjo
Chromograph
Cnvnator
Cyrius
Delly
Deepvariant
Expansionhunter
FastQC
GATK
GENMOD
Gffcompare
Glnexus
Manta
MultiQC
Peddy
PicardTools
PLINK
Rhocall
Sambamba
Samtools
SMNCopyNumberCaller
Stranger
Svdb
Telomerecat
Tiddit
Upd
Vcf2cytosure
Vcfanno
VEP

Bcftools norm

Is your feature request related to a problem? Please describe

Normalize and split multi allelic variants using bcftools norm prior to annotation

Describe the solution you'd like

Incorporate the bcftools norm module from nf-core modules.
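To illustrate the decomposition step, here is a minimal Python sketch of splitting a multi-allelic record into one record per ALT allele. It is a simplification: real `bcftools norm -m-` also rewrites genotypes and per-allele INFO fields and can left-align indels, all omitted here.

```python
def split_multiallelic(vcf_line):
    """Split one multi-allelic VCF data line into one line per ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    rest = fields[5:]
    # one output record per comma-separated ALT allele
    return ["\t".join([chrom, pos, vid, ref, a] + rest) for a in alt.split(",")]
```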

Describe alternatives you've considered

We could use vt decompose and normalize

Additional context

Add svdb/merge to pipeline

Description of feature

Add this to call_structural_variants.nf to combine VCFs from manta, cnvpytor, tiddit

Include a default variant catalog file

Maybe a default file for variant_catalog (in case the user doesn't provide one) should still be added? What do you think? If this should be included in this merge, I can try to look into how this could be done in prepare_genome.nf.

It's not a bad idea. However, I think we can go ahead and merge this one and add that option in a small PR later. We could bundle it with the pipeline or have it as a URL: https://raw.githubusercontent.com/Illumina/ExpansionHunter/master/variant_catalog/hg19/variant_catalog.json
There has also been a discussion about adding a download workflow which would automatically download all the references.

Originally posted by @jemten in #51 (comment)
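The fallback discussed above could work along these lines; `resolve_variant_catalog` is a hypothetical helper (in Nextflow, the same effect would come from a params default in prepare_genome.nf), and the default URL is the one mentioned in the discussion.

```python
# hg19 catalog shipped with ExpansionHunter (URL from the discussion above)
DEFAULT_CATALOG_URL = (
    "https://raw.githubusercontent.com/Illumina/ExpansionHunter/"
    "master/variant_catalog/hg19/variant_catalog.json"
)

def resolve_variant_catalog(user_catalog=None):
    """Return the user-supplied variant catalog if given, else the bundled default."""
    return user_catalog if user_catalog else DEFAULT_CATALOG_URL
```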

RevertSam-GATK

RevertSam

Produce unmapped BAM (uBAM) from aligned BAM

New modules required: Sentieon

Here is a list of Sentieon tools that are relevant for the pipeline and for which issues have been opened in https://github.com/nf-core/modules.

Another tool that might be relevant but for which there is no open issue at the moment:

  • WgsMetricsAlgo

Add TIDDIT/cov module

Our in-house pipeline (MIP) uses this tool and we want to add this to the nextflow pipeline. It's not part of nf-core modules yet: nf-core/modules#792.

Once the module is added to nf-core/modules, it'll be added to the subworkflow qc_bam.nf

Adding read groups to meta

It would be good to add read_group to meta so that bwa_mem2 can use it, as well as other future programs (e.g. peddy needs it).

I have tested adding the line:
meta.read_group = "'@RG\tID:" + row.sample + "_" + row.fastq_1.split('/')[-1].split('R1*.fastq')[0] + "_" + row.lane + "\tPL:ILLUMINA\tSM:" + row.sample.split('_')[0] + "'"
in subworkflows/local/input_check.nf.
But it creates issues when GLnexus needs to combine the different channels again (see nextflow log).
This problem does, however, not arise with
meta.read_group = "'@RG\tID:myid\tPL:ILLUMINA\tSM:" + row.sample.split('_')[0] + "'"

The problem arises both with a single sample and with multiple samples in the samplesheet.
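For reference, the shape of the read-group string can be sketched in Python; `make_read_group` is a hypothetical helper, and the ID scheme (sample plus lane) is just one way to keep IDs unique per run, not the pipeline's final convention.

```python
def make_read_group(sample, lane, platform="ILLUMINA"):
    """Build a read-group string for bwa-mem2's -R option.

    The literal backslash-t ("\\t") is intentional: bwa expands it
    itself, so the string must not contain real tab characters.
    """
    rg_id = f"{sample}_{lane}"
    sm = sample.split("_")[0]  # sample name without any suffix after '_'
    return f"@RG\\tID:{rg_id}\\tPL:{platform}\\tSM:{sm}"
```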

Parse input vcf to check for normalization

Is your feature request related to a problem? Please describe

We need to know that the input VCFs used in, for example, the annotation process have been decomposed.

Describe the solution you'd like

Write a small script that parses the header and checks for the bcftools norm command.
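A minimal sketch of such a check. It relies on bcftools recording its invocation in the header as a `##bcftools_normCommand=...` line, which is bcftools' usual convention but should be verified against the bcftools version in use; `has_norm_in_header` is an illustrative name.

```python
def has_norm_in_header(header_lines):
    """Return True if a bcftools norm command is recorded in the VCF header.

    header_lines: iterable of '##'-prefixed header lines from the VCF.
    """
    return any(line.startswith("##bcftools_normCommand=") for line in header_lines)
```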

Describe alternatives you've considered

Additional context

Java memory issue on SLURM

Check Documentation

I have checked the following places for your error:

Description of the bug

Steps to reproduce

Steps to reproduce the behaviour:

  1. nextflow run nf-core/raredisease -profile test,singularity,hasta,dev_prio -r dev (-c customconf.conf )
  2. See error:

Without customconf.conf

[dd/f36687] NOTE: Process `NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)` terminated with an error exit status (134) -- Execution is retried (1)
WARN: Input tuple does not match input set cardinality declared by process `NFCORE_RAREDISEASE:RAREDISEASE:DEEPVARIANT_CALLER:GLNEXUS` -- offending value: [id:caseydonkey]
Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)` terminated with an error exit status (134)

Command executed:

  picard \
      -Xmx6g \
      MarkDuplicates \
      --CREATE_INDEX \
      -I 1234N.bam \
      -O 1234N_sorted.bam \
      -M 1234N_sorted.MarkDuplicates.metrics.txt

  cat <<-END_VERSIONS > versions.yml
  MARKDUPLICATES:
      markduplicates: $(echo $(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
  END_VERSIONS

Command exit status:
  134

Command output:
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  #  Internal Error (g1PageBasedVirtualSpace.cpp:43), pid=211157, tid=211219
  #  guarantee(rs.is_reserved()) failed: Given reserved space must have been reserved already.
  #
  # JRE version:  (11.0.9.1) (build )
  # Java VM: OpenJDK 64-Bit Server VM (11.0.9.1-internal+0-adhoc..src, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
  # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
  #
  # An error report file with more information is saved as:
  # hs_err_pid211157.log
  #
  #

Command error:
  /usr/local/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
  /usr/local/bin/picard: line 66: 211157 Aborted                 /usr/local/bin/java -Xmx6g -jar /usr/local/share/picard-2.25.7-0/picard.jar MarkDuplicates "--CREATE_INDEX" "-I" "1234N.bam" "-O" "1234N_sorted.bam" "-M" "1234N_sorted.MarkDuplicates.metrics.txt"

With customconf.conf:

process {
    withName: PICARD_MARKDUPLICATES {
        memory = 5.GB
    }
}
Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)` terminated with an error exit status (1)

Command executed:

  picard \
      -Xmx5g \
      MarkDuplicates \
      --CREATE_INDEX \
      -I 1234N.bam \
      -O 1234N_sorted.bam \
      -M 1234N_sorted.MarkDuplicates.metrics.txt

  cat <<-END_VERSIONS > versions.yml
  MARKDUPLICATES:
      markduplicates: $(echo $(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
  END_VERSIONS

Command exit status:
  1

Command output:
  Error occurred during initialization of VM
  Could not reserve enough space for 5242880KB object heap

Command error:
  /usr/local/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory

Expected behaviour

Successful completion of the analysis

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware: HPC, hasta
  • Executor: slurm
  • OS: CentOS
  • Version: 7

Nextflow Installation

  • Version: 21.04.3.5560

Container engine

  • Engine: singularity
  • version: 3.1.1-1.el7

Quick fix that solves the problem until a more elegant solution is found:

modules/nf-core/modules/picard/markduplicates/main.nf:
avail_mem = task.memory.giga - 2
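The idea behind the quick fix, in a language-neutral sketch (the actual change is the one-line Groovy edit above): reserve headroom between the scheduler's memory limit and the JVM heap, since the JVM also needs off-heap memory and SLURM kills the job when the cgroup limit is exceeded. `heap_gb` is an illustrative helper.

```python
def heap_gb(task_memory_gb, headroom_gb=2):
    """JVM -Xmx value in GB: task memory minus headroom for off-heap JVM usage.

    Never go below 1 GB so small allocations still get a usable heap.
    """
    return max(1, task_memory_gb - headroom_gb)
```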

Next related issue

A similar error occurs for bamqc.

Additional context

For the first error, markduplicates:

nextflow-customconf.log
nextflow-no-customconf.log

Add picardtools collecthsmetrics to BamQC subworkflow

Our in-house pipeline uses this tool and we want to add this to the nextflow pipeline. It's not part of nf-core modules yet: nf-core/modules#793.

Once the module is added to nf-core/modules, it'll be added to the subworkflow qc_bam.nf

EDIT: The module is part of nf-core/modules now. Please go ahead and add it to the subworkflow qc_bam.nf

VEP

Is your feature request related to a problem? Please describe

Add VEP from nf-core modules

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Gens preprocessing

Description of feature

Add preprocessing for Gens to the pipeline.

  • GATK CollectReadCounts added to nfcore/modules
  • GATK DenoiseReadCounts added to nfcore/modules
  • Gens perl-scripts added as a local module
  • Local subworkflow added
  • Subworkflow added to main workflow

Create subworkflow to prepare indices

Is your feature request related to a problem? Please describe

The pipeline currently rebuilds the bwamem2 index on every run. To save resources, there should be a check for existing indices that can be used instead.

Describe the solution you'd like

This subworkflow should 1) check for existing reference index files and 2) allow the re-use of indices in different downstream processes.
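The existence check could work along these lines. The extension list reflects the files `bwa-mem2 index` writes to the best of our knowledge (verify against your bwa-mem2 version), and `missing_bwamem2_index_files` is a hypothetical helper, not pipeline code.

```python
import os

# Files produced by `bwa-mem2 index <fasta>` (assumed; check your bwa-mem2 version)
BWAMEM2_EXTS = (".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac")

def missing_bwamem2_index_files(fasta_path):
    """Return the index files that are absent; [] means the index is reusable."""
    return [fasta_path + ext for ext in BWAMEM2_EXTS
            if not os.path.exists(fasta_path + ext)]
```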

Describe alternatives you've considered

Additional context

Mitochondria workflow

We have agreed to use the mitochondria workflow currently implemented at GATK best practices.

The following steps are included. Modules already exist for some of them; all modules need to be included in a subworkflow. We plan to have the mitochondrial subworkflow run by default, but to have the possibility to turn it off and also to turn off the calling of variants for the autosomes.

  • Samtools subsampling [nf-core/raredisease] #49
  • RevertSam [nf-core/raredisease]#106
  • SamtoFastq [nf-core/raredisease]#107
  • BWA
  • GATK MergeBamAlignment
  • Picard MarkDuplicates
  • Haplocheck [nf-core/raredisease]#111
  • call variants with GATK Mutect2
  • Picard LiftoverVCF
  • GATK Mergevcfs [nf-core/raredisease]#113
  • GATK4 [FilterMutectCalls] [nf-core/raredisease]#115
  • GATK Filterblacklist
  • annotation with HmtNote
  • annotation with VEP. Because it requires a database, special care has to be taken to run it offline. Cf. nf-core/sarek.
  • bcftools query and
  • bcftools view to prepare input for haplogrep2
  • call mitochondrial haplogroup with haplogrep2 (To be included in workflow)
  • detect mitochondrial deletions with eKLIPse (not in bioconda) (To be included in workflow) (to be checked)

This list can be modified as new issues are created and new modules are added.

Test dataset including mtDNA: https://github.com/nf-core/test-datasets/tree/raredisease

update current module versions

Is your feature request related to a problem? Please describe

Samtools and MultiQC are outdated.

Describe the solution you'd like

Update their versions 😃

Describe alternatives you've considered

Additional context

add tiddit/sv

Description of feature

In MIP, we combine the callsets from manta, tiddit/sv, and cnvnator using svdb. We should add tiddit/sv.

Add Vcfanno to nf-core modules

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Update the way module versions are emitted

Is your feature request related to a problem? Please describe

nf-core/modules updated the way versions are emitted: from <software>.version.txt to versions.yml. This allows emitting multiple versions in cases where a module or subworkflow uses multiple tools. Updated documentation here.

This pipeline has not been updated accordingly yet.

Describe the solution you'd like

Update the subworkflows and the main workflow accordingly 😄

Describe alternatives you've considered

Additional context

SamtoFastQ

SamtoFastQ

Convert SAM or BAM file to FastQ

refactor alignment modules

Description of feature

Currently, this code snippet lives in the raredisease.nf script, but when there are more mappers/aligners in the picture we should hide the logic away in a bigger subworkflow with switches for which tool to use. This way we can declutter the raredisease.nf script.

if (params.aligner == 'bwamem2') {
        ALIGN_BWAMEM2 (
            INPUT_CHECK.out.reads,
            PREPARE_GENOME.out.bwamem2_index
        )
...

turns into...

if (aligner == 'bwamem2') {
        ALIGN_BWAMEM2 (
            reads,
            bwamem2_index
        )
...

stowed in the bigger subworkflow 👍 - where aligner is a value defined in the take: block.

Add vcfanno to the annotation workflow

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Describe alternatives you've considered

Additional context
