csiro-crop-informatics / biokanga_align_paper

Nextflow 100.00%

biokanga_align_paper's Issues

Report locations of aligned reads along chromosomes

This should allow us to identify hot spots where some aligners tend to place reads incorrectly. The required information is already generated:

biokanga_align_paper/main.nf

Line 355 in bfa0663

    
             rnftools sam2es -i out.bam -o - | awk -vOFS="\t" '\$1 !~ /^#/ {category[\$7]++};END{for(k in category) {print k,category[k]}}' > summary

but in addition to summarising the results, we should plot the raw information along chromosomes, perhaps as a density plot.
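
As a rough illustration of the density idea, the sketch below counts reads per fixed-width bin along each chromosome. It assumes the chromosome and position of each read of interest have already been extracted into `(chromosome, position)` pairs; that input shape, the function name `bin_positions` and the toy data are all hypothetical, and no particular column layout of the `rnftools sam2es` output is assumed.

```python
from collections import Counter

def bin_positions(records, bin_size=1_000_000):
    """Count reads per fixed-width bin along each chromosome.

    records: iterable of (chromosome, 1-based position) pairs
             (hypothetical input shape, not the rnftools format).
    Returns {chromosome: Counter({bin_start: count})}, bin starts 0-based.
    """
    counts = {}
    for chrom, pos in records:
        # Map the 1-based position onto the start of its bin.
        bin_start = ((pos - 1) // bin_size) * bin_size
        counts.setdefault(chrom, Counter())[bin_start] += 1
    return counts

# Toy positions of incorrectly placed reads, invented for illustration.
misplaced = [("chr1", 10), ("chr1", 950), ("chr1", 1001), ("chr2", 2500)]
density = bin_positions(misplaced, bin_size=1000)

for chrom in sorted(density):
    for start in sorted(density[chrom]):
        print(chrom, start, density[chrom][start], sep="\t")
```

The per-bin counts could then be passed to any plotting tool, one track per aligner, to make hot spots visible.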

Number of simulated reads not precisely as requested

The number of reads generated by some of the tools managed by rnftools does not exactly match the number requested, presumably because each read is output according to some probability calculation that takes the size of the input genome assembly into account.

We can either accept that this number is imprecise or sanitise the rnftools output. I would opt for leaving it as is, since the same input is passed to each aligner. In addition, we may decide to report results as percentages anyway.
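
To show why percentages sidestep the imprecise read counts, here is a minimal sketch that normalises per-category counts by the total actually observed. The category labels and the counts are invented for illustration and are not taken from the pipeline's output.

```python
def as_percentages(category_counts):
    """Convert per-category read counts to percentages of the observed total."""
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {k: 100.0 * v / total for k, v in category_counts.items()}

# Two hypothetical runs: 10,000 reads requested in both cases,
# but the second simulator produced slightly fewer (9,990).
run_a = {"correct": 9000, "wrong": 600, "unaligned": 400}
run_b = {"correct": 8991, "wrong": 599, "unaligned": 400}

for name, run in (("run_a", run_a), ("run_b", run_b)):
    pct = as_percentages(run)
    print(name, {k: round(v, 2) for k, v in pct.items()})
```

Because each run is normalised by its own total, small deviations from the requested read count do not distort the comparison between aligners.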

biokanga_align_paper/main.nf

Line 147 in 819fdcd

number_of_read_tuples=${nsimreads},

Optimal and/or comparable settings for all aligners

We are currently using default settings for each of the aligners. This is a good starting point, as it illustrates how the tools perform out of the box. As the defaults vary widely, in the next step we may need to tailor the alignment settings to allow for a fairer, more direct comparison between the aligners. For example, allow up to 3 mismatches per 100 bp, enable or disable indels, soft-clipping, etc.

An alternative would be to try to tailor each aligner's settings to be optimal by some standard, but this could be very expensive in time and resources. On the other hand, if explored via a parameter sweep set within some reasonable limits, this may be a very good use of the implemented framework, providing answers to questions on what parameters one should use with each of the aligners (if one trusts that the simulation sufficiently reflects properties of real input data).

Proposal: use a constrained parameter sweep and evaluate based on the proportion of reads aligned correctly, aligned wrongly, and left unaligned. Among the explored ranges we should be able to select ones that allow for a fairer comparison of the tools.
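
A constrained sweep of this kind might be enumerated as in the sketch below. The parameter names are generic placeholders based on the examples above (mismatches per 100 bp, indels, soft-clipping), not actual command-line flags of any aligner, and the constraint shown is an arbitrary example of how a combination could be excluded.

```python
from itertools import product

def sweep(grid, is_allowed=lambda params: True):
    """Yield every parameter combination in the grid that passes the constraint."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        if is_allowed(params):
            yield params

def proportions(correct, wrong, unaligned):
    """Proportion of reads aligned correctly, aligned wrongly, and unaligned."""
    total = correct + wrong + unaligned
    return {
        "correct": correct / total,
        "wrong": wrong / total,
        "unaligned": unaligned / total,
    }

# Hypothetical ranges; real limits would be chosen per aligner.
grid = {
    "max_mismatches_per_100bp": [1, 2, 3],
    "allow_indels": [False, True],
    "soft_clipping": [False, True],
}

# Example constraint (an assumption): skip soft-clipping when indels are off.
settings = list(sweep(grid, lambda p: p["allow_indels"] or not p["soft_clipping"]))
print(len(settings), "combinations to run")
```

Each surviving combination would be translated into the corresponding options of a given aligner, run on the same simulated reads, and scored with `proportions`.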

Resource allocation to be adjusted based on the size of the input files

Currently, the requirements specified for processes in conf/requirements.config are not sufficiently specific, and we end up over-allocating for smaller genomes.
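
One possible way to think about size-aware allocation is a simple scaling rule with a floor and a ceiling, sketched below. The function name, the scaling factor and the limits are all hypothetical placeholders, not values measured for any aligner; in the pipeline itself such a rule would presumably be expressed through dynamically evaluated process directives rather than in Python.

```python
def memory_request_gb(input_bytes, gb_per_input_gb=4.0, floor_gb=2.0, ceiling_gb=128.0):
    """Scale a memory request linearly with input size, within fixed limits.

    All default values are placeholders and would need to be calibrated
    per process from observed resource usage.
    """
    input_gb = input_bytes / 1024**3
    return min(ceiling_gb, max(floor_gb, input_gb * gb_per_input_gb))

GIB = 1024**3
for size in (100 * 1024**2, 1 * GIB, 15 * GIB, 100 * GIB):
    print(f"{size / GIB:8.2f} GiB input -> {memory_request_gb(size):6.1f} GB requested")
```

The floor keeps small genomes from being starved, while the ceiling stops very large inputs from requesting more than a node can offer.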

Missed the boat with respect to core functionality?

From the proposed paper outline in 'readme.md', I am unsure of the rationale used to select the specific capabilities of BioKanga for publication. Remember that the core functionalities targeted in the proposed paper are those that were incorporated into BioKanga back in about 2011. I think we have missed the boat on the basic alignment capabilities of BioKanga, and the paper should either be a straight methods publication with no experimental justification, or concentrate on the most recently incorporated features, such as the localised haplotype detection, which clearly differentiates BioKanga from all other competing bioinformatics toolsets.
