
cannoli's Introduction

Cannoli

Distributed execution of bioinformatics tools on Apache Spark. Apache 2 licensed.

Maven Central API Documentation

Cannoli project logo

Hacking Cannoli


To build

$ mvn install

Installing Cannoli

Cannoli is available in Conda via Bioconda, https://bioconda.github.io/

$ conda install cannoli

Cannoli is available in Homebrew via Brewsci/bio, https://github.com/brewsci/homebrew-bio

$ brew install brewsci/bio/cannoli

Cannoli is available in Docker via BioContainers, https://biocontainers.pro

$ docker pull quay.io/biocontainers/cannoli:{tag}

Find {tag} on the tag search page, https://quay.io/repository/biocontainers/cannoli?tab=tags

Using Cannoli interactively from the shell

To run the Cannoli interactive shell, use cannoli-shell. It is based on the ADAM shell, which in turn extends the Apache Spark shell.

A wildcard import from ADAMContext adds implicit methods to SparkContext for loading alignments, features, fragments, genotypes, reads, sequences, slices, variant contexts, or variants, such as sc.loadPairedFastqAsFragments below.

A wildcard import from Cannoli adds implicit methods for calling external commands on the genomic datasets loaded by ADAM, such as reads.alignWithBwaMem below.

$ ./bin/cannoli-shell \
    <spark-args>

scala> import org.bdgenomics.adam.ds.ADAMContext._
import org.bdgenomics.adam.ds.ADAMContext._

scala> import org.bdgenomics.cannoli.Cannoli._
import org.bdgenomics.cannoli.Cannoli._

scala> import org.bdgenomics.cannoli.BwaMemArgs
import org.bdgenomics.cannoli.BwaMemArgs

scala> val args = new BwaMemArgs()
args: org.bdgenomics.cannoli.BwaMemArgs = org.bdgenomics.cannoli.BwaMemArgs@54234569

scala> args.indexPath = "hg38.fa"
args.indexPath: String = hg38.fa

scala> args.sampleId = "sample"
args.sampleId: String = sample

scala> val reads = sc.loadPairedFastqAsFragments("sample1.fq", "sample2.fq")
reads: org.bdgenomics.adam.ds.fragment.FragmentDataset = RDDBoundFragmentDataset with 0 reference
sequences, 0 read groups, and 0 processing steps

scala> val alignments = reads.alignWithBwaMem(args)
alignments: org.bdgenomics.adam.ds.read.AlignmentDataset = RDDBoundAlignmentDataset with
0 reference sequences, 0 read groups, and 0 processing steps

scala> alignments.saveAsParquet("sample.alignments.adam")
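
Aligned reads are not limited to Parquet output. As a minimal sketch, assuming the saveAsSam method that ADAM provides on alignment datasets, the same alignments could instead be written to a single SAM file:

scala> alignments.saveAsSam("sample.alignments.sam", asSingleFile = true)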

Running Cannoli from the command line

To run Cannoli commands from the command line, use cannoli-submit.

Note the -- argument separator between Spark arguments and Cannoli command arguments.

$ ./bin/cannoli-submit --help

                              _ _ 
                             | (_)
   ___ __ _ _ __  _ __   ___ | |_ 
  / __/ _` | '_ \| '_ \ / _ \| | |
 | (_| (_| | | | | | | | (_) | | |
  \___\__,_|_| |_|_| |_|\___/|_|_|

Usage: cannoli-submit [<spark-args> --] <cannoli-args>

Choose one of the following commands:

CANNOLI
        bcftoolsCall : Call variant contexts with bcftools call.
     bcftoolsMpileup : Call variants from an alignment dataset with bcftools mpileup.
        bcftoolsNorm : Normalize variant contexts with bcftools norm.
   bedtoolsIntersect : Intersect the features in a feature dataset with Bedtools intersect.
              blastn : Align DNA sequences in a sequence dataset with blastn.
              bowtie : Align paired-end reads in a fragment dataset with Bowtie.
             bowtie2 : Align paired-end reads in a fragment dataset with Bowtie 2.
    singleEndBowtie2 : Align unaligned single-end reads in an alignment dataset with Bowtie 2.
              bwaMem : Align paired-end reads in a fragment dataset with bwa mem.
     singleEndBwaMem : Align unaligned single-end reads in an alignment dataset with bwa mem.
             bwaMem2 : Align paired-end reads in a fragment dataset with Bwa-mem2.
    singleEndBwaMem2 : Align unaligned single-end reads in an alignment dataset with Bwa-mem2.
           freebayes : Call variants from an alignment dataset with Freebayes.
                 gem : Align paired-end reads in a fragment dataset with GEM-Mapper.
          magicBlast : Align paired-end reads in a fragment dataset with Magic-BLAST.
            minimap2 : Align paired-end reads in a fragment dataset with Minimap2.
        longMinimap2 : Align long reads in a sequence dataset with Minimap2.
   singleEndMinimap2 : Align unaligned single-end reads in an alignment dataset with Minimap2.
     samtoolsMpileup : Call variants from an alignment dataset with samtools mpileup.
                snap : Align paired-end reads in a fragment dataset with SNAP.
              snpEff : Annotate variant contexts with SnpEff.
                star : Align paired-end reads in a fragment dataset with STAR-Mapper.
       singleEndStar : Align unaligned single-end reads in an alignment dataset with STAR-Mapper.
              unimap : Align paired-end reads in a fragment dataset with Unimap.
          longUnimap : Align long reads in a sequence dataset with Unimap.
     singleEndUnimap : Align unaligned single-end reads in an alignment dataset with Unimap.
                 vep : Annotate variant contexts with Ensembl VEP.
         vtNormalize : Normalize variant contexts with vt normalize.

CANNOLI TOOLS
     interleaveFastq : Interleaves two FASTQ files.
         sampleReads : Sample reads from interleaved FASTQ format.
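
For example, the interleaveFastq tool combines two paired FASTQ files into a single interleaved FASTQ file suitable for the paired-end aligners above. A hedged sketch of an invocation follows; the positional argument order is an assumption, so check the tool's --help output:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    interleaveFastq \
    sample1.fq \
    sample2.fq \
    sample.ifq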

External commands wrapped by Cannoli should be installed to each executor node in the cluster:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwaMem \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    -sample_id sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -add_files

or they can be run using Docker:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwaMem \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    -sample_id sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -use_docker \
    -image quay.io/biocontainers/bwa:0.7.17--h5bf99c6_8 \
    -add_files

or they can be run using Singularity:

$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwaMem \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    -sample_id sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -use_singularity \
    -image https://depot.galaxyproject.org/singularity/bwa:0.7.17--h5bf99c6_8 \
    -add_files

cannoli's People

Contributors

dependabot[bot], fnothaft, heuermh, waltermblair


cannoli's Issues

FileNotFoundException while loading file from HDFS

I'm getting FileNotFoundException while running bwa via cannoli.
The command is
./bin/cannoli-submit --master yarn-cluster --driver-memory 2g --executor-memory 4g -- bwa hdfs://master.hdp:8020/genomics/SampleFastqFile1.fastq hdfs://master.hdp:8020/opt/121.sam sample -index /Data/HumanBase/hg19/hg19.fa

The error,
17/08/29 16:49:03 INFO ContainerManagementProtocolProxy: Opening proxy : slave3.hdp:45454
17/08/29 16:49:04 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave.hdp:37022) with ID 1
17/08/29 16:49:04 INFO BlockManagerMasterEndpoint: Registering block manager slave.hdp:39003 with 2.7 GB RAM, BlockManagerId(1, slave.hdp, 39003)
17/08/29 16:49:08 INFO AMRMClientImpl: Received new token for : slave1.hdp:45454
17/08/29 16:49:08 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
17/08/29 16:49:12 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave3.hdp:39136) with ID 2
17/08/29 16:49:12 INFO BlockManagerMasterEndpoint: Registering block manager slave3.hdp:41459 with 2.7 GB RAM, BlockManagerId(2, slave3.hdp, 41459)
17/08/29 16:49:12 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/08/29 16:49:12 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/08/29 16:49:12 INFO ADAMContext: Loading hdfs://master.hdp:8020/opt/original_samfile_gvs_sorted.bam.adam as Parquet containing Fragments.
Command body threw exception:
java.io.FileNotFoundException: Couldn't find any files matching hdfs://master.hdp:8020/opt/original_samfile_gvs_sorted.bam.adam
17/08/29 16:49:13 INFO Bwa: Overall Duration: 12.83 secs
17/08/29 16:49:13 ERROR ApplicationMaster: User class threw exception: java.io.FileNotFoundException: Couldn't find any files matching hdfs://master.hdp:8020/opt/original_samfile_gvs_sorted.bam.adam
java.io.FileNotFoundException: Couldn't find any files matching hdfs://master.hdp:8020/opt/original_samfile_gvs_sorted.bam.adam
at org.bdgenomics.adam.rdd.ADAMContext.getFsAndFilesWithFilter(ADAMContext.scala:1354)
at org.bdgenomics.adam.rdd.ADAMContext.loadAvroPrograms(ADAMContext.scala:1152)
at org.bdgenomics.adam.rdd.ADAMContext.loadParquetFragments(ADAMContext.scala:2512)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadFragments$1.apply(ADAMContext.scala:2904)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadFragments$1.apply(ADAMContext.scala:2882)
at scala.Option.fold(Option.scala:157)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.ADAMContext.loadFragments(ADAMContext.scala:2882)
at org.bdgenomics.cannoli.cli.Bwa.run(Bwa.scala:114)
at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
at org.bdgenomics.cannoli.cli.Bwa.run(Bwa.scala:101)
at org.bdgenomics.cannoli.cli.Cannoli.apply(Cannoli.scala:103)
at org.bdgenomics.cannoli.cli.Cannoli$.main(Cannoli.scala:41)
at org.bdgenomics.cannoli.cli.Cannoli.main(Cannoli.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
17/08/29 16:49:13 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.FileNotFoundException: Couldn't find any files matching hdfs://master.hdp:8020/opt/original_samfile_gvs_sorted.bam.adam)
17/08/29 16:49:13 INFO SparkContext: Invoking stop() from shutdown hook
17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/08/29 16:49:13 INFO ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 17/08/29 16:49:13 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 17/08/29 16:49:13 INFO SparkUI: Stopped Spark web UI at http://172.16.2.124:45250 17/08/29 16:49:13 INFO YarnAllocator: Driver requested a total number of 0 executor(s). 17/08/29 16:49:13 INFO YarnClusterSchedulerBackend: Shutting down all executors 17/08/29 16:49:13 INFO YarnClusterSchedulerBackend: Asking each executor to shut down 17/08/29 16:49:13 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 17/08/29 16:49:13 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/08/29 16:49:13 INFO MemoryStore: MemoryStore cleared 17/08/29 16:49:13 INFO BlockManager: BlockManager stopped 17/08/29 16:49:13 INFO BlockManagerMaster: BlockManagerMaster stopped 17/08/29 16:49:13 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 17/08/29 16:49:13 INFO SparkContext: Successfully stopped SparkContext 17/08/29 16:49:13 INFO ShutdownHookManager: Shutdown hook called 17/08/29 16:49:13 INFO ShutdownHookManager: Deleting directory /var/hadoop/yarn/local/usercache/root/appcache/application_1504003590244_0003/spark-efeec220-d6bd-49bd-aacc-fe9c1492ab61 17/08/29 16:49:13 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
17/08/29 16:49:13 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

How to fix this?

Compile errors with latest ADAM 0.23.0-SNAPSHOT version

...
[ERROR] cannoli/src/main/scala/org/bdgenomics/cannoli/Bwa.scala:199: error: value copy
is not a member of org.bdgenomics.adam.rdd.read.AlignmentRecordRDD
[ERROR] possible cause: maybe a semicolon is missing before `value copy'?
[ERROR]      .copy(recordGroups = RecordGroupDictionary(Seq(RecordGroup(sample, sample))))
[ERROR]       ^
[ERROR] cannoli/src/main/scala/org/bdgenomics/cannoli/Bwa.scala:203: error: value copy
is not a member of org.bdgenomics.adam.rdd.read.AlignmentRecordRDD
[ERROR]      output.copy(sequences = sequences)
[ERROR]             ^
[ERROR] cannoli/src/main/scala/org/bdgenomics/cannoli/util/QuerynameGrouper.scala:32:
error: class FragmentRDD is abstract; cannot be instantiated
[ERROR]    new FragmentRDD(apply(rdd.rdd),
[ERROR]    ^
[ERROR] three errors found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------

BCF not yet supported

I'm getting the following error while running freebayes using cannoli.
My command:
ADAM_MAIN=org.bdgenomics.cannoli.Cannoli adam-submit --jars /opt/cannoli/target/cannoli_2.10-0.1-SNAPSHOT.jar -- freebayes -freebayes_reference hdfs://master.hdp:8020/Data/HumanBase/hg19/hg19.fa hdfs://master.hdp:8020/opt/small5.adam hdfs://master.hdp:8020/opt/sample.genotypes.adam
The error:
Exception in thread "main" java.lang.AssertionError: assertion failed: BCF not yet supported

cannoli-assertion failed-bcf not yet supported

Read in Fragment not found

Hi,
I am getting this error after running cannoli command with bwa.
Command: ./cannoli-submit bwa hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/SRR1517848.adam hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/Homos_sep12.adam SRR1517848 -index hdfs://ip-10-48-3-5.ips.local:8020/genomics/CannoliTest/Homo_sapience_assembly18.fasta

Error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, ip-10-48-3-65.ips.local, executor 3): org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'readInFragment' not found.
at org.apache.parquet.avro.AvroIndexedRecordConverter.getAvroField(AvroIndexedRecordConverter.java:128)
at org.apache.parquet.avro.AvroIndexedRecordConverter.(AvroIndexedRecordConverter.java:89)
at org.apache.parquet.avro.AvroIndexedRecordConverter.(AvroIndexedRecordConverter.java:64)
at org.apache.parquet.avro.AvroCompatRecordMaterializer.(AvroCompatRecordMaterializer.java:34)
at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:138)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:130)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:179)
at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:201)
at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:168)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
17/10/03 05:

Jenkins failures due to missing publish_scaladoc.sh

...
+ '[' 2.10 == 2.10 ']'
+ <https://amplab.cs.berkeley.edu/jenkins/job/cannoli/HADOOP_VERSION=2.6.0,SCALAVER=2.10,SPARK_VERSION=1.6.1,label=centos/ws/scripts/publish-scaladoc.sh>
/tmp/hudson6502393491935054868.sh: line 26: <https://amplab.cs.berkeley.edu/jenkins/job/cannoli/HADOOP_VERSION=2.6.0,SCALAVER=2.10,SPARK_VERSION=1.6.1,label=centos/ws/scripts/publish-scaladoc.sh>: No such file or directory
Build step 'Execute shell' marked build as failure
Recording test results
Publishing Scoverage XML and HTML report...

FreeBayes Issue

Hi, I am trying to run freebayes with Cannoli. I am having the following issue:

Driver stacktrace:
17/10/03 06:04:23 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at VariantContextRDD.scala:349, took 20.385264 s
Command body threw exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 1.0 failed 4 times, most recent failure: Lost task 4.3 in stage 1.0 (TID 15, ip-10-48-3-12.ips.local, executor 4): java.io.IOException: Cannot run program "freebayes": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.bdgenomics.adam.rdd.GenomicRDD$$anonfun$13.apply(GenomicRDD.scala:544)
at org.bdgenomics.adam.rdd.GenomicRDD$$anonfun$13.apply(GenomicRDD.scala:517)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 20 more

Driver stacktrace:
17/10/03 06:04:23 WARN spark.ExecutorAllocationManager: No stages are running, but numRunningTasks != 0
17/10/03 06:04:23 INFO cli.Freebayes: Overall Duration: 32.89 secs
Exception in thread "main" 17/10/03 06:04:23 WARN scheduler.TaskSetManager: Lost task 7.1 in stage 1.0 (TID 14, ip-10-48-3-64.ips.local, executor 2): TaskKilled (killed intentionally)

NoSuchMethodError when running cannoli locally

I am working on adding MACS2 to cannoli. When running cannoli locally, I get the following error:

$ ADAM_MAIN=org.bdgenomics.cannoli.Cannoli \
    adam-submit \
    --jars target/cannoli-spark2_2.11-0.1-SNAPSHOT.jar \
    -- \
    macs2 -t test.bed

Using ADAM_MAIN=org.bdgenomics.cannoli.Cannoli
Using SPARK_SUBMIT=/Applications/spark-2.1.1-bin-hadoop2.7/bin/spark-submit
2017-06-30 12:44:28 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Command body threw exception:
java.lang.NoSuchMethodError: org.bdgenomics.adam.rdd.ADAMContext.loadFeatures(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;)Lorg/bdgenomics/adam/rdd/feature/FeatureRDD;
Exception in thread "main" java.lang.NoSuchMethodError: org.bdgenomics.adam.rdd.ADAMContext.loadFeatures(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;)Lorg/bdgenomics/adam/rdd/feature/FeatureRDD;
	at org.bdgenomics.cannoli.Macs2.run(Macs2.scala:111)
	at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
	at org.bdgenomics.cannoli.Macs2.run(Macs2.scala:94)
	at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:126)
	at org.bdgenomics.cannoli.Cannoli$.main(Cannoli.scala:36)
	at org.bdgenomics.cannoli.Cannoli.main(Cannoli.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I was working with @devin-petersohn to debug this. We tried two methods.

First method

  1. Copy adam-assembly-spark2_2.11-0.23.0-SNAPSHOT.jar from adam/adam-assembly/target into the cannoli repo.
  2. In cannoli, run adam-shell --jars target/cannoli-spark2_2.11-0.1-SNAPSHOT.jar,adam-assembly-spark2_2.11-0.23.0-SNAPSHOT.jar
  3. In adam-shell, try
scala> import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.rdd.ADAMContext._

scala> val x = sc.loadFeatures("test.bed")

This worked; sc.loadFeatures did not cause any errors.

Second method

We then tried adding the additional jar into the original command, which errored.

$ ADAM_MAIN=org.bdgenomics.cannoli.Cannoli \
    adam-submit \
    --jars target/cannoli-spark2_2.11-0.1-SNAPSHOT.jar,adam-assembly-spark2_2.11-0.23.0-SNAPSHOT.jar \
    -- \
  macs2 -t test.bed

Using ADAM_MAIN=org.bdgenomics.cannoli.Cannoli
Using SPARK_SUBMIT=/Applications/spark-2.1.1-bin-hadoop2.7/bin/spark-submit
2017-06-30 12:45:33 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Command body threw exception:
java.lang.NoSuchMethodError: org.bdgenomics.adam.rdd.ADAMContext.loadFeatures(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;)Lorg/bdgenomics/adam/rdd/feature/FeatureRDD;
Exception in thread "main" java.lang.NoSuchMethodError: org.bdgenomics.adam.rdd.ADAMContext.loadFeatures(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;)Lorg/bdgenomics/adam/rdd/feature/FeatureRDD;
	at org.bdgenomics.cannoli.Macs2.run(Macs2.scala:111)
	at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
	at org.bdgenomics.cannoli.Macs2.run(Macs2.scala:94)
	at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:126)
	at org.bdgenomics.cannoli.Cannoli$.main(Cannoli.scala:36)
	at org.bdgenomics.cannoli.Cannoli.main(Cannoli.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Any suggestions on where to go from here? Thank you!

Tidy up FreeBayes

  • bump to quay.io/ucsc_cgl/freebayes:1.1.0--f7597a50b4449d4a963ace565a6ca7040ae10118
  • mount reference file into docker container
  • support copying files to executor
  • run in -i mode
  • specify -region when invoking FreeBayes not necessary
  • VCF comes out mis-ordered
  • use Avocado read/contig filtering mechanism (bigdatagenomics/avocado#254)

Jenkins compile error due to upstream 0.23.0-SNAPSHOT changes

...
[ERROR] <https://amplab.cs.berkeley.edu/jenkins/job/cannoli/HADOOP_VERSION=2.3.0,SCALAVER=2.10,SPARK_VERSION=2.0.0,label=centos/ws/cli/src/main/scala/org/bdgenomics/cannoli/cli/Freebayes.scala>:86: error: not enough arguments for constructor VCFOutFormatter: (conf: org.apache.hadoop.conf.Configuration)org.bdgenomics.adam.rdd.variant.VCFOutFormatter.
[ERROR] Unspecified value parameter conf.
[ERROR]     implicit val uFormatter = new VCFOutFormatter
[ERROR]                               ^
[ERROR] <https://amplab.cs.berkeley.edu/jenkins/job/cannoli/HADOOP_VERSION=2.3.0,SCALAVER=2.10,SPARK_VERSION=2.0.0,label=centos/ws/cli/src/main/scala/org/bdgenomics/cannoli/cli/Freebayes.scala>:103: error: could not find implicit value for parameter xFormatter: org.bdgenomics.adam.rdd.OutFormatter[org.bdgenomics.adam.models.VariantContext]
[ERROR]     val output: VariantContextRDD = input.pipe[VariantContext, VariantContextRDD, BAMInFormatter](freebayesCommand)
[ERROR]                                                                                                  ^
[ERROR] <https://amplab.cs.berkeley.edu/jenkins/job/cannoli/HADOOP_VERSION=2.3.0,SCALAVER=2.10,SPARK_VERSION=2.0.0,label=centos/ws/cli/src/main/scala/org/bdgenomics/cannoli/cli/SnpEff.scala>:89: error: not enough arguments for constructor VCFOutFormatter: (conf: org.apache.hadoop.conf.Configuration)org.bdgenomics.adam.rdd.variant.VCFOutFormatter.
[ERROR] Unspecified value parameter conf.
[ERROR]     implicit val uFormatter = new VCFOutFormatter
[ERROR]                               ^
[ERROR] <https://amplab.cs.berkeley.edu/jenkins/job/cannoli/HADOOP_VERSION=2.3.0,SCALAVER=2.10,SPARK_VERSION=2.0.0,label=centos/ws/cli/src/main/scala/org/bdgenomics/cannoli/cli/SnpEff.scala>:104: error: could not find implicit value for parameter xFormatter: org.bdgenomics.adam.rdd.OutFormatter[org.bdgenomics.adam.models.VariantContext]
[ERROR]     val output: VariantContextRDD = input.pipe[VariantContext, VariantContextRDD, VCFInFormatter](snpEffCommand)
[ERROR]                                                                                                  ^
[ERROR] four errors found

Cannoli Command

Hi,
I want to align my FASTQ file against a reference genome. I indexed the reference genome with bwa and now have five different files:
Homo_sapiens_assembly18.fasta.sa
Homo_sapiens_assembly18.fasta.ann
Homo_sapiens_assembly18.fasta.amb
Homo_sapiens_assembly18.fasta.pac
Homo_sapiens_assembly18.fasta.bwt
Which one should I use when writing the cannoli command?

Any suggestion will be helpful.

Thanks

Cannoli-BWA Command Failure // Re Open

Hi @fnothaft
I put the bwa executable on the $PATH on each worker node.
I also put the index files on each worker node.

Command:
./cannoli-submit --driver-memory 3g -- bwa -bwa_path /home/rokshan.jahan/adamproject/bwa-master/bwa hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/SRR1517974.fastq hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/SRR1517974.adam SRR1517974 -index /home/rokshan.jahan/adamproject/reference/ref/Homo_sapiens_assembly38.fasta -force_load_ifastq

But now I am having this error:

17/10/22 22:16:02 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
17/10/22 22:16:02 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.7 KB, free 529.9 MB)
17/10/22 22:16:02 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 15 ms
17/10/22 22:16:02 INFO serialization.ADAMKryoRegistrator: Did not find Spark internal class. This is expected for Spark 1.
17/10/22 22:16:02 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 377.4 KB, free 529.5 MB)
17/10/22 22:16:03 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Thread-10,5,main]
java.lang.IllegalArgumentException: Found read name SRR1517974.2 2/1 ending in /1 despite first-of-pair flag being set
at org.bdgenomics.adam.converters.FastqRecordConverter.readNameSuffixAndIndexOfPairMustMatch(FastqRecordConverter.scala:65)
at org.bdgenomics.adam.converters.FastqRecordConverter.parseReadInFastq(FastqRecordConverter.scala:87)
at org.bdgenomics.adam.converters.FastqRecordConverter.parseReadPairInFastq(FastqRecordConverter.scala:138)
at org.bdgenomics.adam.converters.FastqRecordConverter.convertFragment(FastqRecordConverter.scala:235)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadInterleavedFastqAsFragments$1$$anonfun$apply$16.apply(ADAMContext.scala:2206)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadInterleavedFastqAsFragments$1$$anonfun$apply$16.apply(ADAMContext.scala:2206)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.bdgenomics.adam.rdd.fragment.InterleavedFASTQInFormatter.write(InterleavedFASTQInFormatter.scala:71)
at org.bdgenomics.adam.rdd.InFormatterRunner.run(InFormatter.scala:27)
at java.lang.Thread.run(Thread.java:748)
17/10/22 22:16:03 INFO storage.DiskBlockManager: Shutdown hook called
17/10/22 22:16:03 INFO util.ShutdownHookManager: Shutdown hook called
17/10/22 22:16:03 INFO util.ShutdownHookManager: Dele

Factor out docker/mapping code

In #25 we add the ability to run BWA through Docker, and to mount reference files locally on the executors. We should factor this code out from BWA into a general pattern that we can use across all of the tools in cannoli.

It should support the matrix of the following (see the sketch after the list):

  • Running tools natively or running tools from a docker container
  • Mounting index files local to the executor, mounting index files into the docker container, or not doing anything (i.e., index/ref files are on a globally mounted file system)
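
A rough sketch of what such a factored-out configuration could look like, purely as an illustration; the names ToolConfig, Executor, and IndexHandling are hypothetical and not the actual Cannoli API:

// Hypothetical sketch only, capturing the run-mode matrix described above.
sealed trait Executor
case object Native extends Executor                   // run the tool natively on the executor node
case class Docker(image: String) extends Executor     // run the tool from a docker container

sealed trait IndexHandling
case object CopyToExecutor extends IndexHandling      // copy/mount index files local to the executor
case object MountIntoContainer extends IndexHandling  // mount index files into the docker container
case object SharedFileSystem extends IndexHandling    // index/ref files live on a globally mounted file system

case class ToolConfig(
  executable: String,
  executor: Executor,
  indexHandling: IndexHandling)

// Example: BWA run from a docker image, with the index on a shared file system.
val bwaConfig = ToolConfig("bwa", Docker("fnothaft/bwa:debug-3"), SharedFileSystem)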

Bowtie error: reads file does not look like a FASTQ file

@chowbina reported here:

I get the following error while running bowtie using ADAM 0.22 and Spark 2.1

$ ADAM_MAIN=org.bdgenomics.cannoli.Cannoli \
  adam-submit \
  --jars target/cannoli-spark2_2.11-0.1-SNAPSHOT.jar \
  -- \
  bowtie -single -bowtie_index chr20.250k interleaved_fastq_sample1.ifq bowtie.sam
Using ADAM_MAIN=org.bdgenomics.cannoli.Cannoli
Using SPARK_SUBMIT=/Users/sudhir-chowbina/Documents/Softwares/spark//bin/spark-submit
2017-04-05 11:59:39 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: reads file does not look like a FASTQ file
Command: bowtie-align --wrapper basic-0 -S chr20.250k - 
2017-04-05 11:59:48 WARN  BlockManager:66 - Putting block rdd_2_0 failed due to an exception
2017-04-05 11:59:48 WARN  BlockManager:66 - Block rdd_2_0 could not be removed as it was not found on disk or in memory
2017-04-05 11:59:48 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Piped command List(bowtie, -S, chr20.250k, -) exited with error code 1.
	at org.bdgenomics.adam.rdd.OutFormatterRunner.hasNext(OutFormatter.scala:37)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-04-05 11:59:48 WARN  TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: Piped command List(bowtie, -S, chr20.250k, -) exited with error code 1.
	at org.bdgenomics.adam.rdd.OutFormatterR

Properly handle sequence and read group metadata from BWA

We write out empty dictionaries:

scala> println(sc.loadAlignments("LP6005441-DNA_A01.bwa.adam").recordGroups)
RecordGroupDictionary()

scala> println(sc.loadAlignments("LP6005441-DNA_A01.bwa.adam").sequences)
SequenceDictionary{}
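
Until this is addressed, one possible interim workaround is to attach the metadata after alignment. A minimal sketch, assuming an ADAM release that provides sc.loadSequenceDictionary and a replaceSequences method on alignment datasets (method names and availability vary across ADAM versions; hg38.dict is a hypothetical sequence dictionary file):

scala> val dict = sc.loadSequenceDictionary("hg38.dict")

scala> val withSequences = sc.loadAlignments("LP6005441-DNA_A01.bwa.adam").replaceSequences(dict)

scala> println(withSequences.sequences)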

Error with snpEff

Hi,
I am trying to use cannoli with snpEff.
Here is my command -
./cannoli-submit snpEff hdfs://ip-10-48-3-5.ips.local:8020/genomics/VCF_FILE/SRR1517974.vcf hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/SRR1.vcf -database GRCh38.82

I am getting this error -
17/11/07 12:28:10 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-48-3-12.ips.local:37891 (size: 24.7 KB, free: 530.0 MB)
17/11/07 12:28:11 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-48-3-12.ips.local, executor 1): java.lang.ArrayIndexOutOfBoundsException: -2147483648
at org.bdgenomics.adam.util.PhredUtils$.phredToSuccessProbability(PhredUtils.scala:68)
at org.bdgenomics.adam.util.PhredUtils$.phredToLogProbability(PhredUtils.scala:56)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$formatNonRefGenotypeLikelihoods$1.apply(VariantContextConverter.scala:863)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$formatNonRefGenotypeLikelihoods$1.apply(VariantContextConverter.scala:862)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:156)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:156)
at org.bdgenomics.adam.converters.VariantContextConverter.formatNonRefGenotypeLikelihoods(VariantContextConverter.scala:862)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$80.apply(VariantContextConverter.scala:1739)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$80.apply(VariantContextConverter.scala:1734)
at scala.Option.fold(Option.scala:157)
at org.bdgenomics.adam.converters.VariantContextConverter.org$bdgenomics$adam$converters$VariantContextConverter$$convert$2(VariantContextConverter.scala:1734)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$makeGenotypeFormatFn$1.apply(VariantContextConverter.scala:1787)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$makeGenotypeFormatFn$1.apply(VariantContextConverter.scala:1787)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$5.apply(VariantContextConverter.scala:318)
at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$5.apply(VariantContextConverter.scala:317)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.bdgenomics.adam.converters.VariantContextConverter.convert(VariantContextConverter.scala:317)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1$$anonfun$apply$14.apply(ADAMContext.scala:2017)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1$$anonfun$apply$14.apply(ADAMContext.scala:2017)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Any suggestion will be helpful

Thanks!

Default Docker images to those on BioContainers registry

Given that homebrew-science has given up the ghost, the Linuxbrew-based Docker images I had been building and hosting on Docker Hub are no longer reliable going forward.

Docker images on the BioContainers registry on Quay.io are built from recipes on Bioconda. Bowtie, bowtie2, bwa, freebayes, and snpeff are all there.

Is there any magic in the Docker images we've been using for various workflows (e.g. fnothaft/bwa:debug-3) that might be missing from the BioContainers images?

Cannoli-BWA Command Failure

Hi,
I am using this command :

./cannoli-submit --driver-memory 3g -- bwa hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/SRR1518011.fastq hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/SRR1518011.adam SRR1518011 -index hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/hg38.fasta -sequence_dictionary hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/hg38.dict -force_load_ifastq

Error :

Driver stacktrace:
17/10/21 23:05:39 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at ADAMRDDFunctions.scala:165, took 8.503789 s
Command body threw exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, ip-10-48-3-12.ips.local, executor 1): java.io.IOException: Cannot run program "bwa": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.bdgenomics.adam.rdd.GenomicRDD$$anonfun$13.apply(GenomicRDD.scala:544)
at org.bdgenomics.adam.rdd.GenomicRDD$$anonfun$13.apply(GenomicRDD.scala:517)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 16 more

Any suggestion will be helpful!
