wfi (WhoFlu IRMA)

Snakemake pipeline for running IRMA. Designed for influenza and RSV Illumina sequencing data.

Note: This pipeline is not ready for general use. It generally works, but it is very brittle.

Requirements:

  • Linux distribution (or another Unix-like system such as macOS)
  • Conda
  • Snakemake
  • Cutadapt
  • IRMA
  • R
  • R packages: ggplot2, dplyr, stringr, tidyr, cowplot, gridExtra, optparse, furrr

Automatic Installation - Experimental

To install wfi, use the automatic installation script:

wget https://raw.githubusercontent.com/ammaraziz/wfi/master/tools/auto_install.sh
bash auto_install.sh

The script is currently in a testing phase. Please create an issue if you run into any problems.

Manual Installation

  1. Install miniconda: https://docs.conda.io/en/latest/miniconda.html
  2. Install snakemake: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html
    conda install -n base -c conda-forge mamba
    mamba install -c bioconda -c conda-forge snakemake-minimal
    
  3. Install cutadapt and biopython:
    mamba install -c bioconda cutadapt
    mamba install -c conda-forge biopython 
    
  4. Install R (>3.6 should work) and R packages:
    mamba install -c conda-forge r-base 
    mamba install -c r r-ggplot2 r-dplyr r-tidyr r-cowplot r-gridExtra r-optparse r-furrr
    

Note: installing R packages through conda is troublesome for some users. If that is the case for you, install them manually from within R.
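
If the conda route fails, the manual install mentioned in the note can be run from the shell. This is a minimal sketch, assuming Rscript is on your PATH. The helper name install_r_packages and the CRAN mirror URL are assumptions (any mirror should work); the package list is taken from the requirements and step 4 above.

```shell
# Package list taken from the requirements and step 4 above.
R_PKGS='"ggplot2", "dplyr", "stringr", "tidyr", "cowplot", "gridExtra", "optparse", "furrr"'

# Hypothetical helper: install the packages from CRAN inside R.
# The mirror URL is an assumption; replace it with your preferred mirror.
install_r_packages() {
    Rscript -e "install.packages(c(${R_PKGS}), repos = 'https://cloud.r-project.org')"
}

# Usage (uncomment to run):
# install_r_packages
```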

  5. Final step! Download the latest wfi release (see the Releases section of the GitHub repository).
    • Put the downloaded .zip file in /home/$USERNAME/bin/
    • Uncompress the .zip file
    • Navigate to /wfi/bin/
    • Uncompress flu-amd.zip
    • Copy the RSV module from /wfi/bin/custom_modules/ to /wfi/bin/flu-amd/IRMA_RES/modules/
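
The last three bullets can be scripted once the release has been uncompressed. The sketch below is not part of wfi. The helper name finish_wfi_install is hypothetical, and it assumes that flu-amd.zip extracts to a flu-amd/ folder (as the paths above imply) and that custom_modules/ contains only the RSV module.

```shell
# Hypothetical helper: finish the manual install inside an extracted wfi folder.
# $1 is the extracted wfi directory, e.g. "$HOME/bin/wfi". The exact folder
# name depends on the release archive and is an assumption here.
finish_wfi_install() {
    wfi_root="${1%/}"
    bin_dir="$wfi_root/bin"

    [ -f "$bin_dir/flu-amd.zip" ] || {
        echo "flu-amd.zip not found in $bin_dir" >&2
        return 1
    }

    # Uncompress IRMA (flu-amd) next to the archive.
    unzip -q "$bin_dir/flu-amd.zip" -d "$bin_dir" || return 1

    # Copy everything under custom_modules/ into IRMA's module folder.
    # Assumption: custom_modules/ holds only the RSV module.
    cp -R "$bin_dir/custom_modules/." "$bin_dir/flu-amd/IRMA_RES/modules/"
}

# Usage, after uncompressing the release in ~/bin:
# finish_wfi_install "$HOME/bin/wfi"
```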

In the future the installation process will be simplified, I promise! If you run into any issues, please email me and I will be happy to assist.


Usage

To use the pipeline, follow these steps:

  1. Open wfi_config.yaml and modify it as appropriate:

Params           Values            Information
input_dir        path              Input directory: location of the raw fastq files
output_dir       path              Output directory: location to write results (the same directory where the config sits)
second_assembly  True/False        Set to True if you suspect mixtures. This increases run time substantially
subset           True/False        Set to True if you are only sequencing HA/NA/MP, otherwise leave as False
trim_prog        standard/tile     Trimming program to use: standard (cutadapt) or tile (bbduk)
trim_org         h1/h3             Influenza only: flu subtype
technology       illumina/ont/pgm  Sequencing technology used; changes the module used by IRMA
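
As an illustration, the snippet below writes an example config that uses the parameters listed above. It is only a sketch: the file name wfi_config.example.yaml, the flat key: value layout, and every value are assumptions, so compare it with the wfi_config.yaml shipped in the release before copying anything. The directory paths end with a '/', as the troubleshooting section requires.

```shell
# Write an example config next to the real one for comparison.
# The file name, layout, and all values are placeholders (assumptions).
cat > wfi_config.example.yaml <<'EOF'
input_dir: /path/to/fastq/
output_dir: /path/to/output/
second_assembly: False
subset: False
trim_prog: standard
trim_org: h3
technology: illumina
EOF
```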
  2. Check that snakemake is installed. An error here means snakemake was not found or is not installed.
    % snakemake --version
    5.10.0
  3. Test the pipeline with a dry run. This prints all the commands that will be run. Look for errors (shown in red).
    % snakemake -np
  4. Run the pipeline, using the -j option to specify the number of cores to use.
    % snakemake -j 8
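
The check, dry run, and real run can be chained so that the pipeline only starts if the earlier steps succeed. The wrapper below is a sketch rather than part of wfi. The function name run_wfi is hypothetical, and it assumes you call it from the directory that holds the Snakefile and wfi_config.yaml.

```shell
# Hypothetical wrapper around the three snakemake commands above.
# $1 is the number of cores (defaults to 8, as in the example).
run_wfi() {
    cores="${1:-8}"

    # Step 2: make sure snakemake can be found.
    if ! command -v snakemake >/dev/null 2>&1; then
        echo "snakemake not found: install it or activate its conda environment" >&2
        return 1
    fi
    snakemake --version || return 1

    # Step 3: dry run, stop here if snakemake reports a problem.
    snakemake -np || {
        echo "dry run failed, fix the errors before running the pipeline" >&2
        return 1
    }

    # Step 4: real run.
    snakemake -j "$cores"
}

# Usage:
# run_wfi 8
```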

Output structure:

  1. The pipeline writes correctly formatted (renamed) assemblies to:

{output_dir}/assemblies/rename/

  2. Sorted by subtype, which is most likely the desired output:

{output_dir}/assemblies/rename/type/FLU{A|B}

  3. IRMA assembly specific files, see: https://wonder.cdc.gov/amd/flu/irma/output.html

{output_dir}/assemblies/{sampleID}/

  4. Files for depth and summary info are located in:

{output_dir}/assemblies/{sampleID}/figures/
{output_dir}/assemblies/{sampleID}/tables/
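
To locate these folders for a given run, the path templates above can be expanded in the shell. This is a small sketch. The function name wfi_output_paths is hypothetical, and it only builds the paths listed above without checking that they exist.

```shell
# Hypothetical helper: print the output locations for one sample.
# $1 is output_dir (with or without a trailing '/'), $2 is the sample ID.
wfi_output_paths() {
    out="${1%/}"      # drop one trailing slash so paths are not doubled
    sample="$2"
    printf '%s\n' \
        "$out/assemblies/rename/" \
        "$out/assemblies/rename/type/" \
        "$out/assemblies/$sample/" \
        "$out/assemblies/$sample/figures/" \
        "$out/assemblies/$sample/tables/"
}

# Usage:
# wfi_output_paths /path/to/output/ sampleA
```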

Dependencies

  • BLAT for the match step
  • LABEL, which also packages certain resources used by IRMA:
    • Sequence Alignment and Modeling System (SAM) for both the rough align and sort steps
    • Shogun Toolbox, which is an essential part of LABEL, is used in the sort step
  • SSW for the final assembly step, download our minor modifications to SSW
  • samtools for BAM-SAM conversion as well as BAM sorting and indexing
  • GNU Parallel for single node parallelization
  • R and these R packages: optparse, ggplot2, dplyr, tidyr, stringr, cowplot, gridExtra

Troubleshooting problems:

  1. Error regarding path directories: check that the input and output directories you have specified end with a '/'.

  2. Error "Nothing to be done": check the config file and make sure you have changed the input/output directories.

  3. A job crashed. What do I do? There are two options. Either delete the output directory so snakemake can run everything again, or find out where it crashed and delete the whole folder for that sample. For example, IRMA sometimes produces errors: find the sample that crashed, go to assemblies, delete the corresponding {sampleID} folder, and rerun snakemake.

  4. I'm very confused, I need more help, or I've screwed something up badly: shoot me an email.
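
Two of the fixes above are easy to script. The helpers below are a sketch and not part of wfi. The names with_trailing_slash and reset_sample are hypothetical, and reset_sample permanently deletes the sample folder, so check the path before running it.

```shell
# Hypothetical helper: print the path with a trailing '/' added if missing.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Hypothetical helper: delete the assembly folder of one crashed sample
# so that snakemake rebuilds it on the next run.
# $1 is output_dir, $2 is the sample ID.
reset_sample() {
    output_dir="$(with_trailing_slash "$1")"
    sample_id="$2"
    [ -n "$sample_id" ] || { echo "sample ID required" >&2; return 1; }

    target="${output_dir}assemblies/${sample_id}"
    [ -d "$target" ] || { echo "no such folder: $target" >&2; return 1; }
    rm -rf -- "$target"
}

# Usage:
# reset_sample /path/to/output/ sampleA && snakemake -j 8
```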

Contact Ammar via email: [email protected] or go bug him in person for help at any time.
