Suggestion: Improved logging (hisat2 issue, closed)

ewels commented on July 22, 2024
Suggestion: Improved logging

from hisat2.

Comments (14)

infphilo commented on July 22, 2024

Thank you for your suggestion, @ewels

The suggested format looks great to me! I'll try to incorporate it in the next version of HISAT2. I'll be out of the country for the rest of this week and next week, so sorry for the brief response.

ewels commented on July 22, 2024

Ok brilliant - thanks for the log changes and explanation 👍

Regarding the input filenames - no problem, I'll just take the log filename for now and hope that people name their logs after their samples :)

Phil

infphilo commented on July 22, 2024

Thank you again for your suggestion, and for developing MultiQC, a very powerful tool! :-)

ewels commented on July 22, 2024

Hi @infphilo,

I've just written the new HISAT2 MultiQC module to work with the output from --new-summary, so it's now available in v1.1dev.

Thanks again,

Phil

infphilo commented on July 22, 2024

Thank you for your great work, @ewels !

ewels commented on July 22, 2024

Fantastic, thanks! Looking forward to it..

lcolladotor commented on July 22, 2024

+1 ^^

infphilo commented on July 22, 2024

It took me such a long time to implement your suggested output format due to multiple (very exciting) projects, job hunting, grant writing, etc.

How about the following summary output format?
-- single-end reads --
Summary stats:
Total reads: 1000000
Aligned 0 time: 956 (0.10%)
Aligned 1 time: 957987 (95.80%)
Aligned >1 times: 41057 (4.11%)
Overall alignment rate: 99.90%

-- paired-end reads --
Summary stats:
Total pairs: 1000000
Aligned concordantly 0 time: 1116 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (4.57%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%

I also implemented a new option, --summary-file, to output the summary to a file (in addition to stderr).

ewels commented on July 22, 2024

Hi @infphilo,

No problem - I know the feeling! Thanks for looking into this.

The output you suggest looks great... A couple of minor suggestions:

  • Could you change Summary stats: to HISAT2 Summary stats:? The addition of the specific HISAT2 string makes the output a lot easier to find programmatically.
  • If it's possible to print the input filenames that would be great. Some users concatenate stderr from multiple samples, then it's nice to have the input sample associated with the summary stats.
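
To illustrate why a tool-specific header helps, here is a minimal Python sketch that splits concatenated stderr into one block per summary. It assumes the header line reads `HISAT2 summary stats:` (matched case-insensitively, since the exact wording may differ); `split_summaries` is a hypothetical helper, not part of HISAT2 or MultiQC.

```python
import re

# Assumed header text; only whole lines consisting of the header are matched.
HEADER = re.compile(r"^HISAT2 summary stats:[ \t]*$", re.IGNORECASE | re.MULTILINE)


def split_summaries(log_text):
    """Return the text that follows each header, up to the next header or end of input."""
    matches = list(HEADER.finditer(log_text))
    blocks = []
    for i, match in enumerate(matches):
        # A block ends where the next header starts, or at the end of the log.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        blocks.append(log_text[match.end():end].strip())
    return blocks


if __name__ == "__main__":
    example = (
        "some unrelated stderr output\n"
        "HISAT2 summary stats:\n"
        "Total reads: 10\n"
        "HISAT2 Summary stats:\n"
        "Total reads: 20\n"
    )
    for block in split_summaries(example):
        print(block)
```

Without input filenames in the output, the blocks can only be told apart by their order in the log, which is why the second suggestion matters for concatenated logs.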

Cheers,

Phil

ewels commented on July 22, 2024

P.S. A question: one of the plots I'd like to make for MultiQC is a stacked bar graph showing how all of the input read pairs are aligned (e.g. like this one): what proportion are not aligned at all, what proportion have >1 alignment, and so on. However, it's not entirely clear to me how the numbers from your paired-end output can be summed:

| Category | Number of Reads | Running Total |
|---|---|---|
| Total pairs | 1000000 | |
| Aligned concordantly 0 time | 1116 | 1116 |
| Aligned concordantly 1 time | 965412 | 966528 |
| Aligned concordantly >1 times | 33472 | 1000000 |
| Aligned discordantly 1 time | 51 | 1000051 |
| Total unpaired reads | 2130 | |
| Aligned 0 time | 1057 | 1001108 |
| Aligned 1 time | 1057 | 1002165 |
| Aligned >1 times | 16 | 1002181 |

I assume that this is because read pairs can be assigned to multiple categories. Or are some numbers sub-categories of others (e.g. unpaired reads)? Is there a way to put this together into a stacked bar plot that you recommend?

Cheers,

Phil

ewels commented on July 22, 2024

Ok, looking at this a little longer. I guess Aligned discordantly 1 time is part of Aligned concordantly 0 time, which is why the top part doesn't sum to 1000000. So I can subtract one from the other to make a new category and everything should add up.
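
The subtraction described above can be checked with a few lines of Python. This is only a sketch using the counts from the paired-end example, and it rests on the assumption stated above that discordantly aligned pairs are counted inside "Aligned concordantly 0 time"; the variable and category names are illustrative.

```python
# Counts from the paired-end example output.
total_pairs = 1_000_000
concordant_0 = 1116
concordant_1 = 965_412
concordant_multi = 33_472
discordant_1 = 51

# Assumption: discordant pairs are a subset of "Aligned concordantly 0 time",
# so removing them leaves pairs that aligned neither way.
not_aligned_as_pair = concordant_0 - discordant_1

# Non-overlapping categories suitable for a stacked bar plot.
categories = {
    "Not aligned as a pair": not_aligned_as_pair,
    "Aligned concordantly 1 time": concordant_1,
    "Aligned concordantly >1 times": concordant_multi,
    "Aligned discordantly 1 time": discordant_1,
}

# The categories should now account for every input pair exactly once.
assert sum(categories.values()) == total_pairs
print(not_aligned_as_pair)  # 1065
```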

Still a bit confused about where the 2130 Total unpaired reads come from though. Are they part of the 1000000 read pair input? Or did the input FastQ files somehow have 1000000 paired-end reads and 2130 single-end reads mixed together?

How does the Overall alignment rate take this into account? Presumably you have to come to a total number of aligned reads to calculate this.

Apologies if I'm being slow here..

Phil

infphilo commented on July 22, 2024

Thank you - I just changed the log a bit as follows:

HISAT2 summary stats:
Total pairs: 1000000
Aligned concordantly or discordantly 0 time: 1065 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (0.01%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%

Below is a breakdown of how the numbers relate. Note that Total unpaired reads is twice the number of pairs that aligned neither concordantly nor discordantly.

Total pairs (1000000) = Aligned concordantly or discordantly 0 time + Aligned concordantly 1 time + Aligned concordantly >1 times + Aligned discordantly 1 time

Total unpaired reads (2130) = 2 * Aligned concordantly or discordantly 0 time

Overall alignment rate is number of aligned reads / number of total reads
= (2 * (Aligned concordantly 1 time + Aligned concordantly >1 times + Aligned discordantly 1 time) + Aligned 1 time + Aligned >1 times) / (2 * Total pairs)
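
As a quick check of the relationships above, the following Python sketch parses the example summary and recomputes the overall alignment rate. The parsing regex and the `parse_counts` helper are assumptions for illustration, not how HISAT2 or MultiQC actually process the output.

```python
import re

SUMMARY = """\
HISAT2 summary stats:
Total pairs: 1000000
Aligned concordantly or discordantly 0 time: 1065 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (0.01%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%
"""

# "label: count" with an optional trailing "(xx.xx%)"; the rate line has no integer count and is skipped.
LINE = re.compile(r"^\s*(.+?):\s+(\d+)(?:\s+\([\d.]+%\))?\s*$")


def parse_counts(text):
    """Map each count line's label to its integer value."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(SUMMARY)
unaligned_pairs = counts["Aligned concordantly or discordantly 0 time"]
aligned_pairs = (
    counts["Aligned concordantly 1 time"]
    + counts["Aligned concordantly >1 times"]
    + counts["Aligned discordantly 1 time"]
)
aligned_unpaired = counts["Aligned 1 time"] + counts["Aligned >1 times"]

# The two identities stated above.
assert counts["Total pairs"] == unaligned_pairs + aligned_pairs
assert counts["Total unpaired reads"] == 2 * unaligned_pairs

# Aligned reads over total reads, counting two reads per pair.
rate = (2 * aligned_pairs + aligned_unpaired) / (2 * counts["Total pairs"])
print(f"{rate:.2%}")  # 99.95%
```

For a stacked bar plot, the four pair-level categories sum to Total pairs, and the three unpaired categories subdivide the reads from the unaligned pairs.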

infphilo commented on July 22, 2024

I think adding input file names is also a good idea, but we'd like to keep the summary output minimal for now. We might add the additional information in a later version of HISAT2.

ewels commented on July 22, 2024

No problem at all, thanks for HISAT2! Open-source bioinformatics is great 😁 🌟
