broadinstitute / funpipe

A Python 3 library for building best-practice fungal genomic analysis pipelines

License: MIT License

Shell 0.13% Python 17.64% Perl 0.65% R 0.88% Dockerfile 0.32% HTML 76.84% Jupyter Notebook 3.53%
genetics genomics bioinformatics-pipeline fungal python-library python3 infectious-disease

funpipe's Introduction

FunPipe: a Python library for building best-practice fungal genomic analysis pipelines

FunPipe is a Python library designed for efficient implementation of bioinformatic tools and pipelines for fungal genomic analysis. It contains wrapper functions for popular tools, customized functions for specific analysis tasks, and command line tools built on those functions. The package is being developed to facilitate fungal genomics, but many of its functions are applicable to other genomic analyses as well.

Synopsis

  • funpipe: directory containing the Python library
  • scripts: command line tools and established pipelines
  • tests: unit tests
  • docs: API documentation
  • README.md: this file
  • setup.py: pip setup script
  • conda_env.yml: spec file for setting up conda environment
  • Dockerfile: build file for the Docker image
  • requirements.txt: requirements for building the Sphinx docs (not the requirements of this package)
  • LICENSE: MIT license

Installation

It is recommended to install funpipe via conda, as it automatically sets up all required bioinformatic tools. This is especially useful on servers or clusters where you do not have root privileges. Check that conda is available in your environment by running which conda. If it is not, install the Python 3.7 version of conda first.

HTTP errors sometimes occur while creating the conda environment. If that happens, rerun conda env create -f conda_env.yml to continue creating the environment.
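
The two precautions above (confirming conda is on PATH and rerunning the create command after an HTTP error) could be automated along these lines. This is a minimal sketch, not part of funpipe: the function names, the default of three attempts, and the injectable `run` parameter are all assumptions made for illustration.

```python
import shutil
import subprocess


def conda_available():
    """Return True if a `conda` executable is found on PATH (like `which conda`)."""
    return shutil.which("conda") is not None


def create_env_with_retry(env_file="conda_env.yml", attempts=3, run=subprocess.run):
    """Rerun `conda env create -f <env_file>` until it exits with status 0.

    `attempts` and the `run` hook are assumptions of this sketch.
    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    cmd = ["conda", "env", "create", "-f", env_file]
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f"conda env create failed after {attempts} attempts")
```

Calling `create_env_with_retry()` from the cloned funpipe directory would rerun the command up to three times. The `run` parameter exists only so the retry logic can be exercised without conda installed.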

# clone this repo
git clone git@github.com:broadinstitute/funpipe.git

# setup conda environment
cd funpipe

conda env create -f conda_env.yml # this will take about 10 min
conda env list  # verify the new environment was created

# activate funpipe environment
conda activate funpipe

# the latest stable version of funpipe is already installed in this environment
# to use the latest development version from this clone instead, run
pip install .

# deactivate the environment when done
conda deactivate


# to completely remove the environment
conda remove -n funpipe --all

Note:

  • diamond=0.9.22 uses the boost library, which depends on Python 2.7. This conflicts with funpipe's Python version, so run diamond via Docker instead.

Docker adds some overhead, but it provides a consistent environment, including the operating system. This is especially useful when running funpipe in the cloud.

To use docker:

# Pull the funpipe Docker image
docker pull broadinstitute/funpipe:latest

# Run analysis interactively
docker run --rm -v $path_to_data:/data -t broadinstitute/funpipe \
    /bin/bash -c "/scripts/vcf_qc_metr.py \
        -p prefix --jar /bin/GenomeAnalysisTK.jar \
        --fa /data/reference.fa
    "

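When the same docker run invocation has to be launched for many samples, it can help to assemble the argument list in Python. The sketch below is illustrative only: `docker_vcf_qc_command` is a hypothetical helper, not a funpipe function. The script flags (-p, --jar, --fa) and the in-container paths are copied from the example above, and the host data directory is assumed to be mounted at /data.

```python
import shlex


def docker_vcf_qc_command(data_dir, prefix, reference="reference.fa",
                          image="broadinstitute/funpipe"):
    """Build a `docker run` argument list that runs vcf_qc_metr.py.

    `data_dir` should be an absolute host path; it is mounted at /data
    inside the container (an assumption based on the --fa path above).
    """
    inner_args = [
        "/scripts/vcf_qc_metr.py",
        "-p", prefix,
        "--jar", "/bin/GenomeAnalysisTK.jar",
        "--fa", f"/data/{reference}",
    ]
    # Quote each argument so the string is safe to hand to `bash -c`.
    inner = " ".join(shlex.quote(arg) for arg in inner_args)
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data",
        "-t", image,
        "/bin/bash", "-c", inner,
    ]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. Building a list rather than one shell string avoids quoting problems when a path or prefix contains spaces.
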
You can also build the Docker image from scratch using the Dockerfile:

cd funpipe
docker build -t funpipe .

Install with pip

This approach is for advanced users who prefer not to use conda and want to integrate funpipe into their existing working environment. Before installing with pip, make sure the following bioinformatic tools (or the subset you need) are properly installed and added to your PATH. Paths to Java tools (JARs) need to be specified when invoking the functions that use them.

Requirements

  • Python >= 3.7
  • Bioinformatic tool collections (can be installed automatically via conda, as described above):
    • Basic functions:
      • samtools>=1.9
      • bwa>=0.7.8
      • gatk>=3.8
      • picard>=2.18.17
    • Phylogenetics:
      • raxml>=8.2.12
      • readseq>=2.1.30
    • CNV:
      • breakdancer>=1.4.5
      • cnvnator>=0.3
    • Microbiome:
      • pilon>=1.23
      • diamond>=0.9.22
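
Before a pip installation it may be worth checking which of these tools are already on PATH. The following is a minimal sketch, not part of funpipe: `missing_tools` is a hypothetical helper, and the executable names in the usage note are assumptions, since the command a package installs does not always match the package name. It checks presence only, not versions, and Java tools distributed as JARs are not found this way.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` defaults to shutil.which and is a parameter only so the
    check can be tested without the tools installed.
    """
    return [name for name in tools if which(name) is None]
```

For example, `missing_tools(["samtools", "bwa", "cnvnator", "diamond"])` returns the subset of those commands that still needs to be installed or added to PATH.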

To install with pip:

# install latest stable release
pip install funpipe

# install a specific version
pip install funpipe==0.1.0

To install the latest development version from source:

git clone git@github.com:broadinstitute/funpipe.git
cd funpipe
pip install .

Major analysis pipelines/tools:

  • Quality control modules
    • Reference genome quality evaluation with Pilon.
    • FASTQ quality control with fastqc.
    • BAM quality control using Picard.
    • VCF quality control using GATK VariantEval.
  • Variant Annotation with snpEff.
  • Genomic Variation
    • Coverage analysis
    • Mating type analysis
    • Copy number variation with CNVnator
  • Phylogenetic analysis
    • Dating analysis with BEAST.
    • Phylogenetic tree with FastTree, RAxML and IQTREE.
  • GWAS analysis with GEMMA.

The scripts below run each of the above pipelines. Use <toolname> -h to see the manual for each one.

##### Quality control #####
run_pilon.py          # Evaluate reference genome quality with pilon
fastqc.py             # Fastq quality control
bam_qc_metr.py        # Quality control of BAMs
vcf_qc_metr.py        # Quality control of VCFs

##### Variant Annotation #####
run_snpeff.py         # Annotate genomic variants with snpEff

##### Phylogenetic analysis #####
phylo_analysis.py     # Phylogenetic analysis

##### Genomic Variations #####
coverage_analysis.py  # Hybrid coverage and ploidy analysis

You can also use our APIs to build your own customized analysis scripts or pipelines. The docs will be available here: https://funpipe.readthedocs.io

funpipe's People

Contributors

sizheqiu, xiaoli0

funpipe's Issues

WDL for automation

Make pipeline into WDL for ease of execution and deployment across platforms.

replace R scripts

Replace coverage_barplot.R and qc_metr_plot.R with scripts from covisr and samplyzer.

dep_per_win.pl illegal division

dep_per_win.pl -m /gsap/garage-fungal/Crypto_neoformans_seroD_B454/analysis/JEC21_NCBI/batch1_AD_coverage/ploidy_profiles/529.indels_realigned.sorted.depth.gz -p 529.indels_realigned.sorted --window 5000 --faidx /gsap/garage-fungal/Crypto_neoformans_seroD_B454/assembly/NCBI_H99_JEC21.fasta.fai
Traceback (most recent call last)

Illegal division by zero at /cil/shed/sandboxes/xiaoli/fungal-pipeline/scripts/dep_per_win.pl line 71.

publication

Add a citation section once the manuscript is published.

Make python package

Add setup functionalities, make this repo a python package; deploy to PyPI

Include examples in documentation

Need to add an easy-to-use example for users, including use cases and functions for importing the examples directly. The test data could be reused for this purpose.

Add popgenome analysis

Use the R package PopGenome as an efficient way to perform population genomics analyses such as Fst, pi, etc.
