
biokanga_align_paper's People

Contributors

rsuchecki


biokanga_align_paper's Issues

Number of simulated reads not precisely as requested

The number of reads generated by some of the tools managed by rnftools is not precisely as requested, presumably because each read is output based on some probability calculation that takes into account the size of the input genome assembly.

We can either accept that this number is imprecise or sanitise the rnftools output. I would opt for leaving it as is, since the same input is passed to each aligner. In addition, we may decide to report results as percentages anyway.

number_of_read_tuples=${nsimreads},
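If results are reported as percentages, small deviations from the requested read count matter less. A minimal sketch of that idea follows, assuming the per-read evaluation categories have already been extracted into a list; the function name and the category labels in the usage note are hypothetical placeholders, not rnftools output.

```python
from collections import Counter


def category_percentages(categories):
    """Return {category: percentage of all reads} for an iterable of labels."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        # No reads at all: nothing to report.
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}
```

For example, `category_percentages(["correct", "correct", "wrong", "unaligned"])` reports half of the reads as correct regardless of how many reads the simulator actually emitted.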

Optimal and/or comparable settings for all aligners

We are currently using default settings for each of the aligners. This is a good starting point, as it illustrates how the tools perform out-of-the-box. As the defaults vary widely, in the next step we may need to tailor the alignment settings to allow for a fairer, more direct comparison between the aligners. For example, allow up to 3 mismatches per 100bp, enable or disable indels, soft-clipping etc.

An alternative would be to try to tailor each aligner's settings to be optimal by some standard, but this could be very expensive in time and resources. On the other hand, if explored via a parameter sweep within some reasonable limits, this may be a very good use of the implemented framework, providing answers to the question of what parameters one should use with each of the aligners (if one trusts that the simulation sufficiently reflects the properties of real input data).

Proposal: use a constrained parameter sweep and evaluate based on the proportions of reads aligned correctly, aligned wrongly, and left unaligned. Among the explored ranges we should be able to select settings that allow for a fairer comparison of the tools.
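The proposal could be sketched roughly as below. This is only an illustration under stated assumptions: the parameter names and ranges are hypothetical and do not correspond to any particular aligner's options, and running the aligners and counting outcomes is left out entirely.

```python
from itertools import product


def sweep_grid(param_ranges):
    """Yield one settings dict per combination of the given parameter values."""
    names = sorted(param_ranges)
    for values in product(*(param_ranges[name] for name in names)):
        yield dict(zip(names, values))


def proportions(n_correct, n_wrong, n_unaligned):
    """Return (correct, wrong, unaligned) as fractions of all reads."""
    total = n_correct + n_wrong + n_unaligned
    if total == 0:
        raise ValueError("no reads to evaluate")
    return (n_correct / total, n_wrong / total, n_unaligned / total)


# Hypothetical constrained ranges, for illustration only.
ranges = {
    "max_mismatches_per_100bp": [1, 2, 3],
    "allow_indels": [False, True],
}
settings = list(sweep_grid(ranges))
```

Each dict in `settings` would then be translated into the corresponding options of one aligner, and the three proportions recorded per run so that comparable settings can be picked afterwards.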

Report locations of aligned reads along chromosomes

This should allow us to identify hot-spots where some aligners tend to place reads incorrectly. The required information is already generated:

rnftools sam2es -i out.bam -o - | awk -v OFS="\t" '$1 !~ /^#/ {category[$7]++} END {for (k in category) print k, category[k]}' > summary

but in addition to summarising the results we should plot the raw info along chromosomes, perhaps as a density plot.
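One way to prepare such a plot is to count reads per fixed-size window along each chromosome. The sketch below assumes the per-read records have already been reduced to `(chromosome, position, category)` tuples; how these fields map onto the sam2es output is not assumed here, and the function name, window size, and category labels are hypothetical.

```python
from collections import Counter


def bin_positions(records, window=100_000):
    """Count reads per (chromosome, window start, category).

    records: iterable of (chromosome, position, category) tuples.
    window:  bin width in bp; the window start is found by floor division.
    """
    bins = Counter()
    for chrom, pos, category in records:
        start = (pos // window) * window
        bins[(chrom, start, category)] += 1
    return bins
```

The resulting counts can be passed to any plotting tool; windows with an unusually high count of incorrectly placed reads would be the hot-spot candidates.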

Missed the boat with respect to core functionality?

From the paper outline proposed in 'readme.md', I am unsure of the rationale used to select specific capabilities of BioKanga for publication. Remember that the core functionalities targeted in the proposed paper are those that were incorporated into BioKanga back in about 2011. I think we have missed the boat on BioKanga's basic alignment capabilities, and the paper should either be a straight methods publication with no experimental justification, or concentrate on the most recently incorporated features, such as the localised haplotype detection, which clearly differentiates BioKanga from all other competing bioinformatics toolsets.
