wgsnano

Whole Genome Sequencing by Nanopore data analysis

Introduction

nf-core-wgsnano is a bioinformatics best-practice analysis pipeline for Nanopore Whole Genome Sequencing.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible.

Pipeline summary

  1. Basecalling (Dorado) - with GPU run option. Optional for pod5/fast5 formats.
  2. Basecalling QC (PycoQC)
  3. Alignment (Dorado with minimap2)
  4. Merge all aligned bam files into a single file (samtools)
  5. Haplotyping and phased variants calling (PEPPER-Margin-DeepVariant)
  6. Methylation calls extraction from bam to bed files (modkit) - optional step.
  7. Depth calculation (mosdepth)
  8. MultiQC (MultiQC) for Basecalling (PycoQC) and Depth (mosdepth)

Quick Start

  1. Install Nextflow (>=22.10.1)

  2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (this pipeline can NOT be run with conda). This step is not required when running the pipeline on the WashU RIS cluster.

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run dhslab/nf-core-wgsnano -profile test,YOURPROFILE(S) --outdir <OUTDIR>

  4. Start running your own analysis!

    nextflow run dhslab/nf-core-wgsnano --input samplesheet.csv --fasta <FASTA> -profile <docker/singularity/podman/shifter/charliecloud/institute> --outdir <OUTDIR>

Usage

Required parameters:

  1. Input: samplesheet.csv - This file provides directory/file paths for fast5|pod5|bam reads along with their metadata. It can be specified in a configuration file or supplied directly as a command-line parameter using --input path/to/samplesheet.csv. An example of the samplesheet is available at assets/samplesheet.csv.
  2. Reference genome FASTA file, specified either in a configuration file or with the --fasta path/to/genome.fasta command-line parameter.
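
As an alternative to passing both required parameters on the command line, they can be collected in a parameter file and handed to Nextflow with its `-params-file` option. The following is a minimal sketch, not the pipeline's documented method: the file name `wgsnano-params.yaml` is hypothetical, the paths are placeholders, and the keys `input` and `fasta` are assumed to mirror the `--input` and `--fasta` options described above.

```shell
#!/bin/sh
set -eu

# Hypothetical file name; the paths below are placeholders to replace.
params_file="wgsnano-params.yaml"

# Write the two required parameters as YAML key/value pairs.
cat > "$params_file" <<'EOF'
input: path/to/samplesheet.csv
fasta: path/to/genome.fasta
EOF

# Assemble the launch command. <YOURPROFILE> and <OUTDIR> are left as
# placeholders, exactly as in the Quick Start commands.
launch_cmd="nextflow run dhslab/nf-core-wgsnano -params-file $params_file -profile <YOURPROFILE> --outdir <OUTDIR>"

# Print the command for review instead of executing it.
echo "$launch_cmd"
```

Keeping the parameters in a file makes a run easier to repeat and to record alongside the results.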

Pipeline Execution and Customization Parameters:

These parameters customize the workflow sequence and entry points, and include options specific to the Dorado and PEPPER components of the pipeline. For details, read the usage documentation.

Running a pipeline test on an LSF cluster (configured for the WashU RIS cluster environment)

1) Directly from GitHub:

NXF_HOME=${PWD}/.nextflow LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(ghcr.io/dhslab/docker-nextflow)" nextflow run dhslab/nf-core-wgsnano -r dev -profile test,ris,dhslab --outdir results

Notice that three profiles are used here:

  1. test -> provides the input and fasta paths for the test run
  2. ris -> sets the general configuration for the RIS LSF cluster
  3. dhslab -> sets the lab-specific cluster configuration

2) Alternatively, clone the repository and run the pipeline from the local directory:

git clone https://github.com/dhslab/nf-core-wgsnano.git
cd nf-core-wgsnano/
chmod +x bin/*
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(ghcr.io/dhslab/docker-nextflow)" "export NXF_HOME=${PWD}/.nextflow ; nextflow run main.nf -profile test,ris,dhslab --outdir results"

Notes:

  • The pipeline is developed and optimized to run on the WashU RIS (LSF) HPC cluster, but it could be deployed in any HPC environment supported by Nextflow.
  • The pipeline does NOT support conda because some of the tools it uses are not available as conda packages.
  • The pipeline can NOT be fully tested on a personal computer because the basecalling step is computationally intensive even for small test files. For testing/development purposes, the pipeline can be run in stub (dry-run) mode (see below).
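
A stub run can be sketched with Nextflow's `-stub-run` option, which executes each process's stub block in place of its real script. The example below is an illustration under stated assumptions rather than the project's documented command: the environment variables `WGSNANO_PROFILE`, `WGSNANO_OUTDIR` and `RUN_STUB` are hypothetical names introduced here, and the `docker` default profile is only a placeholder for whichever container profile is available locally.

```shell
#!/bin/sh
set -eu

# Hypothetical environment variables with placeholder defaults.
profile="${WGSNANO_PROFILE:-docker}"
outdir="${WGSNANO_OUTDIR:-results_stub}"

# -stub-run asks Nextflow to run process stubs instead of the real commands.
stub_cmd="nextflow run dhslab/nf-core-wgsnano -profile test,${profile} -stub-run --outdir ${outdir}"

# Execute only when explicitly requested and Nextflow is installed;
# otherwise print the command so it can be reviewed first.
if [ "${RUN_STUB:-0}" = "1" ] && command -v nextflow >/dev/null 2>&1; then
    # Intentionally unquoted: the string is split into words to form the command.
    $stub_cmd
else
    echo "$stub_cmd"
fi
```

Setting `RUN_STUB=1` before invoking the script launches the stub run; leaving it unset only prints the command.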

nf-core-wgsnano's Issues

Add multi-threads to samtools sort

Description of the bug

Add multi-threads to samtools sort
