This project forked from bigdatagenomics/cannoli

Cannoli

Distributed execution of bioinformatics tools on Apache Spark. Apache 2 licensed.

Hacking Cannoli

Install

To build Cannoli from source with Apache Maven

$ mvn install

Installing Cannoli

Cannoli is available in Conda via Bioconda, https://bioconda.github.io/

$ conda install cannoli

Cannoli is available in Homebrew via Brewsci/bio, https://github.com/brewsci/homebrew-bio

$ brew install brewsci/bio/cannoli

Cannoli is available in Docker via BioContainers, https://biocontainers.pro

$ docker pull quay.io/biocontainers/cannoli:{tag}

Find {tag} on the tag search page, https://quay.io/repository/biocontainers/cannoli?tab=tags
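The {tag} substitution above can be sketched as a small shell helper. This is only an illustration: the function name cannoli_image and the value EXAMPLE_TAG are hypothetical placeholders, and a real tag still has to be looked up on the tag search page.

```shell
#!/bin/sh
# Build the BioContainers image reference for a given tag.
# cannoli_image is a hypothetical helper name, not part of Cannoli.
cannoli_image() {
  tag="${1:-}"
  if [ -z "$tag" ]; then
    echo "usage: cannoli_image <tag>" >&2
    return 1
  fi
  printf 'quay.io/biocontainers/cannoli:%s\n' "$tag"
}

# Print the pull command instead of running it, so the sketch has no side effects.
# EXAMPLE_TAG is a placeholder, not a real Cannoli release tag.
echo "docker pull $(cannoli_image "EXAMPLE_TAG")"
```

Replace EXAMPLE_TAG with a tag copied from the tag search page and run the printed command to pull the image.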

Using Cannoli interactively from the shell

Use cannoli-shell to run the Cannoli interactive shell. It is based on the ADAM shell, which in turn extends the Apache Spark shell.

Use a wildcard import from ADAMContext to add implicit methods to SparkContext for loading alignments, features, fragments, genotypes, reads, sequences, slices, variant contexts, or variants, such as sc.loadPairedFastqAsFragments below.

Use a wildcard import from Cannoli to add implicit methods to the genomic datasets loaded by ADAM for calling external commands, such as reads.alignWithBwaMem below.

$ ./bin/cannoli-shell \
    <spark-args>

scala> import org.bdgenomics.adam.ds.ADAMContext._
import org.bdgenomics.adam.ds.ADAMContext._

scala> import org.bdgenomics.cannoli.Cannoli._
import org.bdgenomics.cannoli.Cannoli._

scala> import org.bdgenomics.cannoli.BwaMemArgs
import org.bdgenomics.cannoli.BwaMemArgs

scala> val args = new BwaMemArgs()
args: org.bdgenomics.cannoli.BwaMemArgs = org.bdgenomics.cannoli.BwaMemArgs@54234569

scala> args.indexPath = "hg38.fa"
args.indexPath: String = hg38.fa

scala> args.sampleId = "sample"
args.sampleId: String = sample

scala> val reads = sc.loadPairedFastqAsFragments("sample1.fq", "sample2.fq")
reads: org.bdgenomics.adam.ds.fragment.FragmentRDD = RDDBoundFragmentRDD with 0 reference
sequences, 0 read groups, and 0 processing steps

scala> val alignments = reads.alignWithBwaMem(args)
alignments: org.bdgenomics.adam.ds.read.AlignmentRecordRDD = RDDBoundAlignmentRecordRDD with
0 reference sequences, 0 read groups, and 0 processing steps

scala> alignments.saveAsParquet("sample.alignments.adam")

Running Cannoli from the command line

To run Cannoli commands from the command line, use cannoli-submit.

Note the -- argument separator between Spark arguments and Cannoli command arguments.

$ ./bin/cannoli-submit --help

                              _ _ 
                             | (_)
   ___ __ _ _ __  _ __   ___ | |_ 
  / __/ _` | '_ \| '_ \ / _ \| | |
 | (_| (_| | | | | | | | (_) | | |
  \___\__,_|_| |_|_| |_|\___/|_|_|

Usage: cannoli-submit [<spark-args> --] <cannoli-args>

Choose one of the following commands:

CANNOLI
        bcftoolsCall : Call variant contexts with bcftools call.
     bcftoolsMpileup : Call variants from an alignment dataset with bcftools mpileup.
        bcftoolsNorm : Normalize variant contexts with bcftools norm.
   bedtoolsIntersect : Intersect the features in a feature dataset with Bedtools intersect.
              blastn : Align DNA sequences in a sequence dataset with blastn.
              bowtie : Align paired-end reads in a fragment dataset with Bowtie.
             bowtie2 : Align paired-end reads in a fragment dataset with Bowtie 2.
    singleEndBowtie2 : Align unaligned single-end reads in an alignment dataset with Bowtie 2.
              bwaMem : Align paired-end reads in a fragment dataset with bwa mem.
             bwaMem2 : Align paired-end reads in a fragment dataset with Bwa-mem2.
           freebayes : Call variants from an alignment dataset with Freebayes.
                 gem : Align paired-end reads in a fragment dataset with GEM-Mapper.
          magicBlast : Align paired-end reads in a fragment dataset with Magic-BLAST.
            minimap2 : Align paired-end reads in a fragment dataset with Minimap2.
        longMinimap2 : Align long reads in a sequence dataset with Minimap2.
   singleEndMinimap2 : Align unaligned single-end reads in an alignment dataset with Minimap2.
     samtoolsMpileup : Call variants from an alignment dataset with samtools mpileup.
                snap : Align paired-end reads in a fragment dataset with SNAP.
              snpEff : Annotate variant contexts with SnpEff.
                star : Align paired-end reads in a fragment dataset with STAR-Mapper.
       singleEndStar : Align unaligned single-end reads in an alignment dataset with STAR-Mapper.
              unimap : Align paired-end reads in a fragment dataset with Unimap.
          longUnimap : Align long reads in a sequence dataset with Unimap.
     singleEndUnimap : Align unaligned single-end reads in an alignment dataset with Unimap.
                 vep : Annotate variant contexts with Ensembl VEP.
         vtNormalize : Normalize variant contexts with vt normalize.

CANNOLI TOOLS
     interleaveFastq : Interleaves two FASTQ files.
         sampleReads : Sample reads from interleaved FASTQ format.
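
The usage line above, cannoli-submit [<spark-args> --] <cannoli-args>, can be illustrated with a short sketch that splits an argument list at the first --. This is not the logic cannoli-submit itself uses; split_args is a hypothetical helper, and the arguments in the example call are illustrative values only.

```shell
#!/bin/sh
# Split "[<spark-args> --] <cannoli-args>" at the first "--".
# Illustration only; split_args is a hypothetical helper, not part of Cannoli.
split_args() {
  spark_args=""
  cannoli_args=""

  # First pass: without a "--", every argument is a Cannoli argument.
  has_separator=0
  for arg in "$@"; do
    if [ "$arg" = "--" ]; then has_separator=1; fi
  done

  # Second pass: arguments before the first "--" go to Spark, the rest to Cannoli.
  past_separator=0
  for arg in "$@"; do
    if [ "$has_separator" -eq 1 ] && [ "$past_separator" -eq 0 ]; then
      if [ "$arg" = "--" ]; then
        past_separator=1
      else
        spark_args="${spark_args:+$spark_args }$arg"
      fi
    else
      cannoli_args="${cannoli_args:+$cannoli_args }$arg"
    fi
  done

  printf 'spark: %s\n' "$spark_args"
  printf 'cannoli: %s\n' "$cannoli_args"
}

split_args --master local -- bwaMem in.adam out.adam
```

When no -- is present, as in cannoli-submit --help, the sketch treats every argument as a Cannoli argument, matching the optional brackets in the usage line.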

External commands wrapped by Cannoli should be installed on each executor node in the cluster:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwaMem \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    -sample_id sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -add_files

Alternatively, external commands can be run using Docker:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwaMem \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    -sample_id sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -use_docker \
    -image quay.io/biocontainers/bwa:0.7.17--hed695b0_7 \
    -add_files

External commands can also be run using Singularity:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwaMem \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    -sample_id sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -use_singularity \
    -image quay.io/biocontainers/bwa:0.7.17--hed695b0_7 \
    -add_files
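
For the first option above, where the external command is installed directly on each executor node, a minimal check that could be run on a node might look like the following. It assumes the bwaMem command needs an executable named bwa on the PATH; check_command is a hypothetical helper and not part of Cannoli.

```shell
#!/bin/sh
# Report whether an external command is available on this node's PATH.
# check_command is a hypothetical helper; which names to check depends on
# the Cannoli commands you plan to run (bwa is assumed here for bwaMem).
check_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

check_command bwa || echo "install bwa on this node, or use -use_docker or -use_singularity"
```

The check only inspects the node it runs on, so it would need to be repeated on every executor node in the cluster.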
