biokanga_align_paper's Issues
Report locations of aligned reads along chromosomes
This should allow us to identify hot-spots where some aligners tend to place reads incorrectly. The required information is already generated (line 355 in bfa0663), but in addition to summarising the results we should plot the raw information along chromosomes, perhaps as density.
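As a rough illustration of the density idea, read start positions could be binned into fixed-size windows per chromosome before plotting. This is only a sketch: the function name `read_density`, the window size and the example records are assumptions, not part of the existing pipeline.

```python
from collections import Counter

def read_density(positions, window=1_000_000):
    """Count reads per fixed-size window along each chromosome.

    positions: iterable of (chromosome, 1-based start position) pairs.
    Returns {chromosome: Counter({window_index: read_count})}.
    """
    counts = {}
    for chrom, pos in positions:
        # Window 0 covers positions 1..window, window 1 the next stretch, etc.
        counts.setdefault(chrom, Counter())[(pos - 1) // window] += 1
    return counts

# Hypothetical records, e.g. start positions of incorrectly placed reads
misplaced = [("chr1", 150), ("chr1", 980), ("chr1", 2_400), ("chr2", 10)]
density = read_density(misplaced, window=1_000)
```

Computing this separately for correctly and incorrectly placed reads, per aligner, would give the per-window counts needed to spot hot-spots along each chromosome.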
Number of simulated reads not precisely as requested
The number of reads generated by some of the tools managed by rnftools is not precisely as requested, presumably because each read is output based on some probability calculation that takes the size of the input genome assembly into consideration.
We can either accept that this number is not precise or sanitise the rnftools output. I would opt for leaving it as is, as the same input is passed to each aligner. In addition, we may decide to report results as percentages anyway.
Line 147 in 819fdcd
Optimal and/or comparable settings for all aligners
We are currently using default settings for each of the aligners. This is a good starting point, as it illustrates how the tools perform out of the box. As the defaults vary widely, in the next step we may need to tailor the alignment settings to allow for a fairer, more direct comparison between the aligners. For example, allow up to 3 mismatches per 100 bp, enable or disable indels, soft-clipping, etc.
An alternative would be to tailor each aligner's settings to be optimal by some standard, but this could be very expensive in time and resources. On the other hand, if explored via a parameter sweep set within some reasonable limits, this may be a very good use of the implemented framework, providing answers to questions on what parameters one should use with each of the aligners (if one trusts that the simulation sufficiently reflects properties of real input data).
Proposal: use a constrained parameter sweep and evaluate based on the proportions of reads aligned correctly, aligned wrongly and left unaligned. Among the explored ranges we should be able to select ones which allow for a fairer comparison of the tools.
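The proposal could be sketched roughly as follows: enumerate a constrained grid of settings, then summarise each run as proportions so that runs with slightly different read counts stay comparable. The parameter names and ranges here are purely hypothetical placeholders, not actual flags of any aligner, and the read counts are made up for illustration.

```python
from itertools import product

def sweep_grid(ranges):
    """Yield one settings dict per combination of the given parameter values."""
    names = list(ranges)
    for values in product(*(ranges[name] for name in names)):
        yield dict(zip(names, values))

def outcome_proportions(correct, wrong, unaligned):
    """Express read counts as proportions of all reads passed to the aligner."""
    total = correct + wrong + unaligned
    if total == 0:
        raise ValueError("no reads to summarise")
    return {
        "correct": correct / total,
        "wrong": wrong / total,
        "unaligned": unaligned / total,
    }

# Hypothetical parameter names and ranges (assumptions for illustration only)
ranges = {"max_mismatches_per_100bp": [1, 2, 3], "allow_indels": [False, True]}
grid = list(sweep_grid(ranges))

# Made-up counts for a single run
summary = outcome_proportions(correct=80, wrong=15, unaligned=5)
```

Each settings dict in `grid` would correspond to one aligner run, and the resulting summaries could then be compared across aligners within the same constrained ranges.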
Resource allocation to be adjusted based on the size of the input files
Currently, the requirements specified for processes in conf/requirements.config are not sufficiently specific, and we end up over-allocating for smaller genomes.
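One possible approach, shown here only as a sketch, is to scale each request with the size of the input file and clamp it between a floor and a cap. The function name, the scaling factor and the limits below are all assumptions chosen for illustration; real values would need to be measured per process.

```python
import math

def memory_request_gb(input_bytes, gb_per_input_gb=4.0, floor_gb=2, cap_gb=128):
    """Scale a memory request linearly with input size, clamped to [floor_gb, cap_gb].

    All defaults are hypothetical and would need tuning per process.
    """
    input_gb = input_bytes / 1e9
    scaled = math.ceil(input_gb * gb_per_input_gb)
    return max(floor_gb, min(cap_gb, scaled))

small = memory_request_gb(100_000_000)        # small genome: falls back to the floor
medium = memory_request_gb(3_000_000_000)     # scales with input size
huge = memory_request_gb(100_000_000_000)     # very large input: limited by the cap
```

The same rule could be applied to CPU count or wall time, so that small genomes no longer inherit allocations sized for the largest inputs.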
Missed the boat with respect to core functionality?
From the proposed paper outline in 'readme.md', I am unsure of the rationale used to select specific capabilities of BioKanga for publication. Remember that the core functionalities targeted in the proposed paper are those which were incorporated into BioKanga back in about 2011. I think we have missed the boat on the basic alignment capabilities of BioKanga, and the paper should either be a straight methods publication, with no experimental justification, or concentrate on the most recently incorporated features, such as the localised haplotype detection, which clearly differentiates BioKanga from all other competing bioinformatics toolsets.