workflows's People

Contributors

almussel, dependabot[bot], fnothaft, heuermh, jpfeil, jvivian

workflows's Issues

Create single machine FASTQ->VCF script

It would be great to have a single-machine script showing how to go from FASTQ to VCF. I've done something similar for the RNA-seq pipelines at Treehouse so that partners can get started easily:

https://github.com/UCSC-Treehouse/pipelines/blob/master/Makefile

I'd be happy to test, iterate on, and develop a similar Makefile if you kick-start it by sharing your command-line history plus a link to a publicly available FASTQ sample. Ideally, this same sample would also have been run through some other pipelines for comparison.

Features/requirements for Be The Match collaboration

For lack of a better place for this, our collaboration with Be The Match will require:

  • Download BAM files from S3, transform to ADAM Avro+Parquet, and upload to S3 (transform_alignments)
  • Download ADAM Avro+Parquet alignments for multiple samples from S3, update record groups to prevent collision, merge into a single multi-sample ADAM Avro+Parquet alignments data set, and upload to S3 (merge_alignments)
  • Report BAM file sizes, single-sample ADAM Avro+Parquet alignments file sizes, and merged ADAM Avro+Parquet alignments file size
  • Download VCF files from S3, transform to ADAM Avro+Parquet variants and genotypes, and upload to S3 (transform_variants, transform_genotypes)
  • Download ADAM Avro+Parquet variants for multiple samples, merge into a single sites-only ADAM Avro+Parquet variants data set, and upload to S3 (merge_variants)
  • Download ADAM Avro+Parquet genotypes for multiple samples, merge into a single multi-sample ADAM Avro+Parquet genotypes data set, and upload to S3 (merge_genotypes)
  • Report VCF file sizes, single-sample ADAM Avro+Parquet variants and genotypes file sizes, and merged ADAM Avro+Parquet variants and genotypes file sizes
  • Notebook with queries comparing access performance for native files on S3 vs. transformed data sets on S3
  • Documentation on how to run this stuff
  • Short manuscript on transformation process, storage requirements, and access performance

There hasn't been an ask for realigning reads, recalling variants, annotating variants with SnpEff, or joint genotyping yet, but there could be in the near future.
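The size-reporting items in the list above could be sketched roughly as follows. This is only an illustration and makes some assumptions: the data has already been downloaded to local disk, a BAM or VCF is a single file, and an ADAM Avro+Parquet data set is a directory of part files. The function names `total_size_bytes` and `size_report` are hypothetical and are not part of the workflows package.

```python
import os


def total_size_bytes(path):
    """Return the size of a file, or the summed size of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    # Assumption: a Parquet data set is a directory tree of part files.
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def size_report(paths):
    """Map each label (e.g. "bam", "adam") to its total on-disk size in bytes."""
    return {label: total_size_bytes(path) for label, path in paths.items()}
```

Calling `size_report` with a mapping of labels to local paths (for example one entry for the BAM file and one for the transformed data set directory) returns the sizes side by side, which is enough to compare native and transformed storage requirements.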

Doc commands have incorrect syntax

The doc command blocks aren't copy-and-pasteable due to extraneous $ symbols, e.g.

To run locally, we invoke the following command:

$ bdg-deca \
$   --targets <regions> \
$   --samples <manifest> \
$   --output-dir <path-to-save> \
$   --memory <memory-in-GB> \
$   --run-local \
$   file:<toil-jobstore-path>
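One possible way to clean up such blocks in bulk is to strip the leading prompt marker from each line. The sketch below is a hypothetical helper, not something that exists in the repository; it assumes the marker is a `$` followed by a space (or a bare `$` on its own line) and leaves lines such as `$HOME/...` untouched.

```python
import re

# A "$" at the start of a line, followed by one space or by the end of the line.
_PROMPT = re.compile(r"^\$(?: |$)")


def strip_prompts(block):
    """Remove a leading shell prompt marker from every line of a command block."""
    return "\n".join(_PROMPT.sub("", line) for line in block.splitlines())
```

Applied to the block quoted above, this leaves `bdg-deca \` on the first line and the indented options on the continuation lines, so the result can be pasted into a shell as a single command.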

No call_cannoli function

bwa_alignment for cannoli fails at import time:


from bdgenomics.workflows.tools.spark_tools import call_adam, \
    call_cannoli, \
    call_conductor, \
    MasterAddress, \
    HDFS_MASTER_PORT, \
    SPARK_MASTER_PORT

ImportError: cannot import name call_cannoli
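To narrow down a failure like this, it can help to check which of the imported names a module actually defines. The following is a minimal diagnostic sketch using only the standard library; `missing_names` is a hypothetical helper, and the module path to pass in is the one from the import statement above.

```python
import importlib


def missing_names(module_name, names):
    """Return the names from `names` that the module does not define.

    Raises ImportError if the module itself cannot be imported.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]
```

In an environment where the package is installed, calling `missing_names("bdgenomics.workflows.tools.spark_tools", ["call_adam", "call_cannoli", "call_conductor"])` should list `call_cannoli` if that function is indeed absent from the module.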
