
jsa-aerial / aerobio


Extensible full DAG streaming computation server with services and jobs for RNA-Seq, Tn-Seq, WG-Seq and Term-Seq.

License: MIT License

Clojure 59.26% R 3.19% JavaScript 0.04% Ruby 6.90% Perl 2.01% Python 28.60%
clojure genome-sequencing pipeline-framework pipelines rna-seq streaming-data term-seq tn-seq wg-seq

aerobio's People

Contributors

jsa-aerial

Forkers

davidalphafox

aerobio's Issues

Completion emails (any?) should always include relevant EID

From Stephen:

I am running some more samples for Eddie, and one feature I would like to request is that the email saying a phase was successful should include the EID. I'm running some of the plasmids in parallel, and it's not clear from the email which sample is which.

Generate experiment sheets?

Is it plausible to do a good job of generating the various experiment description sheets? See #4 for supporting information.

Split off flow graph to separate lib

pgmgraph and friends should be a separate library!! Additionally, the tool and job db operations (create, delete, update) should be part of that library and removed from the server (which is a crazy place for them).

Flow graph should be rewritten using Specter

While issue #2 speaks to the need to split this off into a separate lib, which would then be a dependency, this issue is a marker indicating the need to rewrite the (rather ugly) custom graph rewriting code to use Specter. Specter has vastly cleaner general-purpose rewrite code for nested data structures of all kinds, including graphs, trees, and maps. This is exactly what the flow graph recursive rewriting code needs.

Change status progress report indication

Currently, status reports for a run in progress are output as [done: [list of nodes that have finished]]. The 'done' label is confusing, since the run as a whole has not finished; a label indicating run progress would be more accurate.

Validator misses sample name/id check

Sample names and IDs in the SampleSheet [Data] section must not contain underscores (_), as they are the delimiter used by the conversion software (bcl2fastq and bcl-convert) to demarcate sequencer conversion information - NOT sample names/ids. An underscore in a sample name or ID can result in 'missing input fastqs' during experiment barcode demultiplexing.
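
A minimal sketch of such a check, not Aerobio's actual validator. It assumes the sample sheet uses the standard Illumina layout, with a [Data] section whose header row contains Sample_ID and Sample_Name columns; the function name find_underscore_samples is hypothetical.

```python
import csv
import io


def find_underscore_samples(sheet_text):
    """Return (column, value) pairs from the [Data] section that contain '_'.

    Hypothetical helper: assumes Sample_ID / Sample_Name column headers.
    """
    lines = sheet_text.splitlines()

    # Locate the [Data] section header; the rows after it form a CSV table.
    start = None
    for i, line in enumerate(lines):
        if line.strip().split(",")[0].strip().lower() == "[data]":
            start = i + 1
            break
    if start is None:
        return []

    # Collect rows until the next section header or the end of the file.
    data_lines = []
    for line in lines[start:]:
        if line.strip().startswith("["):
            break
        if line.strip():
            data_lines.append(line)

    bad = []
    for row in csv.DictReader(io.StringIO("\n".join(data_lines))):
        for col in ("Sample_ID", "Sample_Name"):
            value = (row.get(col) or "").strip()
            if "_" in value:
                bad.append((col, value))
    return bad
```

A validator could report every returned pair to the user before conversion starts, rather than failing later at demultiplexing.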

Make `replicates` the default; add new `combined` modifier

Running with replicates enabled is the most common need. Combination runs are less likely to be used or desired, though they can still happen. So, change the default behavior to be replicate oriented, keep the explicit replicates modifier for backward compatibility, and add a new combined modifier to indicate that a combination run is desired.
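
The proposed behavior can be sketched as a small resolution function. This is only an illustration of the rule described above: the function name resolve_run_mode and the representation of modifiers as a list of strings are assumptions, not Aerobio's command syntax, and treating the two modifiers as mutually exclusive is a design choice of this sketch.

```python
def resolve_run_mode(modifiers):
    """Pick the run mode from a list of modifier strings (hypothetical API).

    - no modifier      -> 'replicates' (the proposed new default)
    - 'replicates'     -> 'replicates' (accepted for backward compatibility)
    - 'combined'       -> 'combined'
    - both at once     -> error (assumed here to be contradictory)
    """
    mods = {m.strip().lower() for m in modifiers}
    if "replicates" in mods and "combined" in mods:
        raise ValueError("'replicates' and 'combined' are mutually exclusive")
    return "combined" if "combined" in mods else "replicates"
```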

Replicate IDs - allow more than a single character

It looks like replicate IDs can only be a single character, which limits the number of possible replicates in an experiment. For large (e.g. human population) studies there might be hundreds of samples in one group, so it would be nice to have replicate IDs that can be more than a single character. A workaround for now is to run things in batches of 26 and name the replicates a-z.
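
If multi-character IDs were allowed, one possible naming scheme simply extends a-z in spreadsheet-column style (a, ..., z, aa, ab, ...). The sketch below only illustrates that scheme; the function name replicate_label is hypothetical, and it says nothing about how Aerobio parses replicate IDs.

```python
import string


def replicate_label(n):
    """Map a zero-based replicate index to a label: 0 -> 'a', 26 -> 'aa'."""
    if n < 0:
        raise ValueError("replicate index must be non-negative")
    label = ""
    n += 1  # switch to 1-based for bijective base-26 numbering
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = string.ascii_lowercase[rem] + label
    return label
```

With this scheme the first 26 labels match the existing single-character convention, so batches already named a-z would keep their IDs.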

config.clj file missing?

Hi, I'm trying to see if I can install the package on my MacBook Pro. It looks like config.clj is missing? I wonder if you have it, or if it's not necessary for running. Thank you!

Use GTFs in TnSeq analysis

calc_fitness and aggregate both use gbk parsing to determine annotations, but this has two problems:

  1. It introduces a dependency on bioperl and biopython - nothing else in these scripts uses them
  2. Neither of them properly parses gbks with multiple locus entries - say, for a whole genome and some associated plasmids

Using GTFs:

  • eliminates these dependencies - making installation simpler
  • simplifies the 'parse' - basically it is just a delimited-text read (GTF is tab-separated) and a pick of fields
  • easy to create GTFs with multiple locus entries (the 'chromosome' field) from multiple gbks
  • gbks can be kept simple - single locus per gbk
  • runs involving a strain with whole genome and associated plasmids become simple to accommodate
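
To illustrate how simple the GTF 'parse' can be, here is a minimal sketch using only the Python standard library. It assumes the standard nine tab-separated GTF columns; the function name read_gtf_features and the shape of the returned records are hypothetical, not the interface used by calc_fitness or aggregate.

```python
import csv

# Standard GTF column order.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def read_gtf_features(handle, feature="gene"):
    """Read rows of one feature type from an open GTF file handle."""
    records = []
    # QUOTE_NONE keeps the double quotes inside the attribute column literal.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines, comment lines, and malformed rows.
        if not fields or fields[0].startswith("#") or len(fields) < 9:
            continue
        row = dict(zip(GTF_COLUMNS, fields))
        if row["feature"] != feature:
            continue
        records.append({
            "locus": row["seqname"],  # the 'chromosome' field
            "start": int(row["start"]),
            "end": int(row["end"]),
            "strand": row["strand"],
            "attribute": row["attribute"],
        })
    return records
```

Because the locus comes straight from the first column, a GTF that concatenates a genome and its plasmids needs no special handling: each record already says which locus it belongs to.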

Validate no repeating replicates

  • Make sure that no replicate names repeat in Exp-SampleSheet.csv. This applies to RNASeq, TNSeq and TermSeq.

  • Make sure that SampleName/IDs do not repeat in SampleSheet. bcl2fastq will catch this, but it has no good way of relaying the error to Aerobio...
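
Both checks above reduce to finding repeated values in a list of names. A minimal sketch, assuming the names have already been extracted from the relevant sheet column (the function name find_repeats is hypothetical):

```python
from collections import Counter


def find_repeats(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(name.strip() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)
```

Returning the offending names, rather than a plain true/false, lets the validator tell the user exactly which entries to fix.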
