nf-cmgg / germline

A nextflow pipeline for calling and annotating small germline variants from short DNA reads for WES and WGS data

Home Page: https://nf-cmgg.github.io/germline/

License: MIT License

HTML 0.93% Nextflow 96.36% Groovy 2.32% Dockerfile 0.39%
annotation cram genotyping germline germline-variant-calling ngs ngs-pipeline short-reads

germline's People

Contributors

matthdsm, nvnieuwk

Forkers

nvnieuwk

germline's Issues

Add file check for SDF folder

Description of the bug

Currently the pipeline does not check whether the SDF folder exists. A missing folder is only detected by the first process that needs it, so an upfront check should be added.
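
A minimal sketch of such an upfront check, written in Python purely for illustration (the pipeline itself is Nextflow). The function name `check_sdf_folder` is hypothetical and not part of the pipeline:

```python
from pathlib import Path


def check_sdf_folder(sdf_path: str) -> Path:
    """Fail early if the SDF folder is missing or is not a directory."""
    sdf = Path(sdf_path)
    if not sdf.is_dir():
        # Raise before any process starts instead of failing mid-run.
        raise FileNotFoundError(f"SDF folder not found or not a directory: {sdf}")
    return sdf
```

Running this kind of check during input validation would surface the problem before any compute is spent.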

Don't overwrite multiqc in the outdir

Description of the bug

When the pipeline is run multiple times with the same outdir, the multiqc folder is overwritten, which effectively deletes all statistics for the previously created VCFs.
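
One possible approach, sketched in Python for illustration only, is to give every run its own MultiQC folder. The `multiqc_<timestamp>` naming scheme and the function name are assumptions, not something the pipeline currently does:

```python
from datetime import datetime
from pathlib import Path


def multiqc_outdir(outdir: str, now: datetime) -> Path:
    """Return a run-specific MultiQC folder inside outdir (hypothetical layout)."""
    # A timestamp keeps reports from earlier runs side by side with new ones.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(outdir) / f"multiqc_{stamp}"
```

For example, `multiqc_outdir("results", datetime.now())` gives a folder name that changes with every run, so earlier reports stay in place.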

Improve scattering even more

Description of feature

Improve scattering so that the coverage is also taken into account before the variant calling.
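
As a rough illustration of what coverage-aware scattering could look like, the Python sketch below balances regions over chunks by length times mean coverage, using a greedy "heaviest region to lightest chunk" strategy. The input tuple layout and the weight formula are assumptions made for this example:

```python
import heapq


def scatter_by_coverage(regions, scatter_count):
    """Distribute regions over scatter_count chunks, balancing covered bases.

    regions: list of (contig, start, end, mean_coverage) tuples.
    Returns a list of chunks, each a list of regions.
    """

    def weight(region):
        # Length times mean coverage, a rough proxy for the work involved.
        return (region[2] - region[1]) * region[3]

    # Min-heap of (current load, chunk index): the lightest chunk pops first.
    heap = [(0.0, i) for i in range(scatter_count)]
    heapq.heapify(heap)
    chunks = [[] for _ in range(scatter_count)]
    # Place the heaviest regions first so small ones can even out the loads.
    for region in sorted(regions, key=weight, reverse=True):
        load, idx = heapq.heappop(heap)
        chunks[idx].append(region)
        heapq.heappush(heap, (load + weight(region), idx))
    return chunks
```

With one deeply covered region and three shallow ones split over two chunks, the deep region ends up alone and the shallow ones share the other chunk.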

Haplotypecaller is overprovisioned

Description of the bug

Haplotypecaller is set to process_medium, after which the settings are overridden in modules.conf to increase the time and set the CPUs to 1.

This causes the job to request too much memory.
I suggest setting it to process_single, which should be enough for single chunks (pending execution report).

Allow multiple bam per sample

Description of feature

Allow multiple BAM inputs per sample to account for replicates, sequencing repeats, and general extra coverage.
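
A small Python sketch of the grouping step this would need: collect every file listed for the same sample before further processing. The column names `sample` and `bam` are hypothetical and may not match the real samplesheet:

```python
from collections import defaultdict


def group_bams_by_sample(rows):
    """Collect every BAM path listed for the same sample name."""
    grouped = defaultdict(list)
    for row in rows:
        # Several rows may share a sample name; keep all of their files.
        grouped[row["sample"]].append(row["bam"])
    return dict(grouped)
```

Each sample then maps to a list of inputs that could be merged or processed together downstream.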

Fix `dump` statements not working correctly with `set`

Description of the bug

When a channel is dumped, it is emitted at that point and is not passed on to the `set` that follows. This should be fixed!

Ideally, also agree on a common convention for dump tag names.

Add ROI bed and callable region inputs to samplesheet instead of single BED

Description of feature

  • Add ROI BED and callable region inputs to the samplesheet instead of a single BED
  • Create intersect with callable regions and ROI if the ROI is given
  • Create callable regions with mosdepth if no callable regions were given
  • Create a param that can be used to specify a common ROI BED that will be used for all samples that have no ROI BED in their samplesheet
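
The intersect step in the list above can be sketched as follows. This is a Python illustration for a single contig, assuming both inputs are sorted, non-overlapping, half-open (start, end) intervals as in BED. The function names are hypothetical:

```python
def intersect_intervals(roi, callable_regions):
    """Intersect two sorted, non-overlapping lists of (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(roi) and j < len(callable_regions):
        start = max(roi[i][0], callable_regions[j][0])
        end = min(roi[i][1], callable_regions[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if roi[i][1] < callable_regions[j][1]:
            i += 1
        else:
            j += 1
    return result


def regions_to_call(callable_regions, roi=None):
    """Use the intersection when an ROI is given, else all callable regions."""
    if roi is None:
        return list(callable_regions)
    return intersect_intervals(roi, callable_regions)
```

A sample without an ROI falls back to its callable regions, which mirrors the "if the ROI is given" condition in the list.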

Add dockerfile

Description of feature

Add a dockerfile containing all the requirements for the pipeline, so we can run it as a single "tool" on nomad/galaxy/whatever

Pipeline stalls after genomicsdbimport

Description of the bug

When running a multi-family run, the pipeline stalls after genomicsdbimport until all families have been run through genomicsdbimport.

System information

dev version on the 4th of January 2023

⚠️ Fix all join mismatches!

Description of the bug

Several join mismatches have been detected:

  • Join when individuals got scattered (possibly after haplotypecaller since all these ran fine)
  • Join when families got scattered (possibly after genomicsdbimport since most of those processes ran fine)
  • Join when families aren't scattered (possibly before genomicsdbimport since the pipeline didn't get further than genotypegvcfs)
  • ... (hopefully nowhere else)
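
To help track down mismatches like the ones listed above, the keys on both sides of a join can be compared before joining. The Python sketch below only illustrates the idea and is not pipeline code. It assumes each channel item is a (key, value) pair:

```python
def join_mismatches(left, right):
    """Return the keys that appear on only one side of a join.

    left, right: iterables of (key, value) pairs.
    Returns (only_in_left, only_in_right) as sorted lists.
    """
    left_keys = {key for key, _ in left}
    right_keys = {key for key, _ in right}
    return sorted(left_keys - right_keys), sorted(right_keys - left_keys)
```

Two empty lists mean every key has a partner. Anything else points at the items that would be dropped or would stall the join.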

Re-add ROI support for WES

Description of feature

ROI support has for now been disabled due to the addition of goleft_indexsplit. This should be solved in a new creative way

Automatic scatter count detection

Description of feature

Implement automatic scatter count detection based on the BED file. This way the pipeline will run in the most efficient way
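
One way this could work, sketched in Python under stated assumptions: sum the bases covered by the BED file and divide by a target chunk size. The parameters `bases_per_chunk` and `max_scatter` are hypothetical tuning knobs, not existing pipeline params:

```python
import math


def total_bases(bed_lines):
    """Sum the interval lengths in BED-formatted lines (0-based, half-open)."""
    total = 0
    for line in bed_lines:
        # Skip blank lines and BED header or comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        total += int(fields[2]) - int(fields[1])
    return total


def auto_scatter_count(bed_lines, bases_per_chunk, max_scatter):
    """Pick a scatter count so each chunk holds about bases_per_chunk bases."""
    count = math.ceil(total_bases(bed_lines) / bases_per_chunk)
    # Always use at least one chunk and never exceed the configured maximum.
    return max(1, min(count, max_scatter))
```

A small WES target would then get few chunks, while a WGS BED would be capped at `max_scatter`.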

Scattering not working as it's supposed to

Description of the bug

goleft_indexsplit creates more regions than it is supposed to when the reference contains more contigs than the scatter count. A possible fix is to go back to BED files to determine the regions: merge all alt contigs into one BED file and create separate BED files for all other regions.
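
The proposed grouping can be sketched in Python as below. How alt contigs are recognised is an assumption here: anything not in a caller-supplied set of primary contigs is treated as alt. The group name `alt_contigs` is also hypothetical:

```python
def group_contigs(contig_lengths, primary_contigs):
    """Return one region group per primary contig plus one group for the rest.

    contig_lengths: dict mapping contig name to length.
    primary_contigs: set of contig names that get a BED file of their own.
    """
    groups = {}
    other = []
    for contig, length in contig_lengths.items():
        region = (contig, 0, length)  # whole contig as a BED-style interval
        if contig in primary_contigs:
            groups[contig] = [region]
        else:
            other.append(region)
    if other:
        # All remaining (alt) contigs share a single BED file.
        groups["alt_contigs"] = other
    return groups
```

This bounds the number of region files by the number of primary contigs plus one, regardless of how many alt contigs the reference has.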

Fix PED error

Description of the bug

Caused by:
  Process `CMGG_CMGGGERMLINE:CMGGGERMLINE:ADD_PED_HEADER:RTGTOOLS_PEDFILTER (Proband)` terminated with an error exit status (1)

Command executed:

  rtg pedfilter \
      --vcf \
      Proband.samples.tsv \
  | rtg bgzip  - > Proband.vcf.gz
  
  
  cat <<-END_VERSIONS > versions.yml
  "CMGG_CMGGGERMLINE:CMGGGERMLINE:ADD_PED_HEADER:RTGTOOLS_PEDFILTER":
      rtgtools: $(echo $(rtg version | head -n 1 | awk '{print $4}'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_TMPDIR as environment variable will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  Error: Conflicting PED definitions of sex for individual -9
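
The log does not show the PED content, so the cause is not confirmed. One plausible explanation (an assumption) is that `-9` is used as a placeholder for both the father and the mother, which makes it look like a single individual with two sexes. A Python sketch to detect that situation before running rtg pedfilter, assuming the standard PED sex coding (1 = male, 2 = female) and `0` as the missing-parent value:

```python
def conflicting_sex_ids(ped_rows, missing=("0",)):
    """Find IDs that a PED file implies are both male and female.

    ped_rows: iterable of (family, individual, father, mother, sex) strings.
    missing: parent IDs that mean "unknown" and are skipped.
    """
    implied = {}
    for _family, individual, father, mother, sex in ped_rows:
        if sex in ("1", "2"):
            implied.setdefault(individual, set()).add(sex)
        # Being listed as a father implies male, as a mother implies female.
        if father not in missing:
            implied.setdefault(father, set()).add("1")
        if mother not in missing:
            implied.setdefault(mother, set()).add("2")
    return sorted(ind for ind, sexes in implied.items() if len(sexes) > 1)
```

If this returns `-9` for the samplesheet PED, replacing the placeholder with `0` would be worth trying.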

Add DRS fetching

Description of feature

Add a module that fetches data through DRS whenever a row in the samplesheet only contains a sample name

Add better code comments

Description of feature

Some pieces of code can be confusing at first glance. Add more comments throughout the code.

Add "validation" subworkflow

Description of feature

Add a validation subworkflow that runs RTGtools and/or hap.py to generate a validation report. It should be optional, for example enabled with a flag.

Better support for single samples

Description of feature

Add the possibility to leave the family and PED out of the samplesheet, which marks the sample as not being part of a family.

Lower the label of genotypegvcfs

Description of the bug

The label of genotypegvcfs is currently set on process_high. This is way too high, lower this!
