swabseq-analysis's People

Contributors

dependabot[bot], kathryn-explorable, kkovary, robotoer

Forkers

kkovary

swabseq-analysis's Issues

Add classification for control success / control fail

The following changes should be made to the Sample Categorization plot in the qc_report:

  1. Add control success / control fail logic for positions A1 and B1 so that it's easy to see if there was an issue
  2. Update color scheme so that COVID positive and COVID negative classifications are more striking, along with control wells.

Test wrapping the Python script in the R script using reticulate

The pipeline currently has two major scripts: countAmpliconsAWS.R and dict_align.py.

countAmpliconsAWS.R runs dict_align.py near the beginning of the pipeline to align and count the amplicons. When dict_align.py finishes, it saves its output as results.csv, which countAmpliconsAWS.R then loads into memory for downstream analysis. This write/read step takes extra time; it would be better to keep the results in memory instead of writing them to disk and reading them back in.

The reticulate library for R provides an R interface to Python that may allow us to bypass this write/read step (https://rstudio.github.io/reticulate/). I haven't used this library yet, but I'm interested in trying it out to see if it improves speed.
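
As a minimal sketch of what the Python side could look like, the alignment step can be exposed as an importable function that returns its rows instead of writing results.csv. The actual interface of dict_align.py is not shown here, so the function name count_amplicons, the input shapes, and the example sequences are all hypothetical.

```python
from collections import Counter


def count_amplicons(reads, amplicons):
    """Count exact matches of reads against a dictionary of known amplicons.

    reads:     iterable of (barcode, sequence) tuples (hypothetical input shape)
    amplicons: dict mapping sequence -> amplicon name (hypothetical)

    Returns a list of row dicts rather than writing a CSV file, so a caller
    (for example R via reticulate) can keep the results in memory.
    """
    counts = Counter()
    for barcode, seq in reads:
        name = amplicons.get(seq)
        if name is not None:  # reads with no dictionary match are skipped
            counts[(barcode, name)] += 1
    return [
        {"barcode": b, "amplicon": a, "count": n}
        for (b, a), n in sorted(counts.items())
    ]


# Illustrative data only; not taken from a real run.
example_reads = [("BC01", "ACGT"), ("BC01", "ACGT"), ("BC01", "TTTT"), ("BC02", "GGCC")]
example_amplicons = {"ACGT": "S2", "GGCC": "RPP30"}
rows = count_amplicons(example_reads, example_amplicons)
```

From R, reticulate::source_python() on a file containing such a function would make it callable directly, with the return value converted to an R object. Whether that is actually faster than the CSV round trip would still need to be measured.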

version suggestion

ARG SERVER_VERSION=local+container

From Jamie: "I recommend not setting this here and force that the value gets passed in directly and force an error. This will ensure that the health check will have the right version."
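
Declaring the ARG without a default does not by itself fail the build when the value is missing, so the check has to happen somewhere. A rough sketch of enforcing it on the application side, assuming the build argument is exported to the container as an environment variable of the same name (the health check payload shape and both function names are hypothetical):

```python
import os


def get_server_version(env=None):
    """Return SERVER_VERSION, raising if it was never passed in."""
    env = os.environ if env is None else env
    version = env.get("SERVER_VERSION", "").strip()
    if not version:
        # Fail loudly instead of falling back to a default such as
        # "local+container", so the health check never reports a wrong version.
        raise RuntimeError("SERVER_VERSION must be passed in explicitly")
    return version


def health_check(env=None):
    """Hypothetical health check payload that includes the version."""
    return {"status": "ok", "version": get_server_version(env)}
```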

Only return necessary files

The QA/QC PDF is necessary, and LIMS_results.csv contains the per-DNA-barcode results. The run_info.csv data is already in the PDF, and the other outputs are not needed to QA/QC the run.

Use distribution based model to adjust RPP30 threshold in water control wells

At the moment, the water control wells use a fixed RPP30 threshold (>10 counts), but this will lead to a high number of control failures. Instead, we will use a threshold based on the distribution of RPP30 reads in the run.

The implicit assumption is that RPP30 reads come from a mixture of distributions: RPP30 present and RPP30 absent. For samples, we check whether the reads could plausibly come from the left tail of the RPP30-present distribution; for negative controls, we check whether the reads could plausibly come from the right tail of the RPP30-absent distribution.
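
One possible way to turn this into thresholds is sketched below. It is not the pipeline's implementation: it assumes a two-component Gaussian mixture on log10(count + 1), fitted with plain EM, and a tail cutoff of z = 2.33 (roughly the one-sided 99th percentile of a normal distribution). The function names, the choice of transform, the cutoff, and the example counts are all illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200, min_sd=0.1):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # start one component at each extreme
    start_sd = max((hi - lo) / 4.0, min_sd)
    sds = [start_sd, start_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and sd of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids a collapsed component
    return weights, means, sds


def rpp30_thresholds(counts, z=2.33):
    """Return (control_max, sample_min) in raw counts for one run."""
    logs = [math.log10(c + 1) for c in counts]
    _, means, sds = fit_two_gaussians(logs)
    absent, present = sorted(range(2), key=lambda k: means[k])
    # Right tail of the RPP30-absent component: water controls above this fail.
    control_max = 10 ** (means[absent] + z * sds[absent]) - 1
    # Left tail of the RPP30-present component: samples below this fail.
    sample_min = 10 ** (means[present] - z * sds[present]) - 1
    return control_max, sample_min


def water_control_passes(count, control_max):
    return count <= control_max


# Illustrative counts only: ten low (RPP30 absent) and ten high (RPP30 present).
run_counts = [0, 1, 2, 3, 5, 8, 12, 4, 2, 6,
              800, 1200, 1500, 2000, 950, 1100, 3000, 1700, 1300, 2500]
control_max, sample_min = rpp30_thresholds(run_counts)
```

The cutoff for water controls then moves with each run's background level instead of sitting at a fixed count. A real implementation might prefer a count-based distribution over a Gaussian on log counts; that choice is left open here.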

run bcl2fastq without compression

We could shave off ~30 seconds or so by adding the argument --no-bgzf-compression to bcl2fastq to convert bcl files to fastq files instead of fastq.gz files.

Normally fastq.gz is preferred since fastq files are so large, but since we're deleting the run files after analysis this is not an issue, and decompression takes a while.

I haven't tested this out yet but I'm curious if it improves speed.

New arguments for pipeline

I've added two new arguments to the pipeline that I think could be useful:

--season

  • Here we can specify winter, spring, summer, or fall
  • This allows the pipeline to pull in the correct forward and reverse barcode information so that the plots in the PDF file are correct

--debug

  • There are some outputs and plots that take extra time to generate and may not always be necessary.
    • We can discuss if this is how we want to move forward or if we just want all analyses to be done all of the time.
  • If --debug TRUE, the pipeline will carry out these extra steps if there is a potential issue with the run.
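
For illustration, the two arguments described above could be parsed as in the following sketch. It uses Python's argparse purely as an example; the pipeline's real argument handling is not shown in this issue, and the defaults, help strings, and the parse_bool helper are assumptions.

```python
import argparse


def parse_bool(text):
    """Accept TRUE/FALSE style values, as in `--debug TRUE`."""
    value = text.strip().lower()
    if value in ("true", "t", "1", "yes"):
        return True
    if value in ("false", "f", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected TRUE or FALSE, got {text!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical pipeline arguments")
    parser.add_argument(
        "--season",
        choices=["winter", "spring", "summer", "fall"],
        help="selects which forward and reverse barcode set to load",
    )
    parser.add_argument(
        "--debug",
        type=parse_bool,
        default=False,  # assumed default: skip the slower extra outputs
        help="generate the extra outputs and plots when a run looks problematic",
    )
    return parser


args = build_parser().parse_args(["--season", "winter", "--debug", "TRUE"])
```

Restricting --season to a fixed set of choices means a typo is rejected at startup rather than silently loading the wrong barcode information.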
