
Comments (7)

infphilo avatar infphilo commented on July 22, 2024

The MANUAL file is outdated, sorry. There are no --end-to-end or --local options/modes in HISAT2. I'd like to have soft-clipping as the default behavior. When reads span introns with very small anchors (a couple of bases), soft-clipping is often used to represent such small anchors.

from hisat2.

FelixKrueger avatar FelixKrueger commented on July 22, 2024

Thanks for the information. We were just a bit surprised to find soft-clipped reads because we had assumed soft-clipping was not happening.

Just to understand the issue a bit better: instead of reporting, say, a 2bp match across a splice junction, would HISAT2 rather soft-clip it than call it a splice junction? If I went ahead and used --sp 1000,1000 to prevent soft-clipping, would it then do the right thing and give it a CIGAR string like 2M5000N100M?

We recently looked a little into the effects of soft-clipping and found that it may add a lot of repetitive (and probably wrong) alignments to a data set (https://sequencing.qcfail.com/articles/soft-clipping-of-reads-may-add-potentially-unwanted-alignments-to-repetitive-regions/). This might not be so relevant for RNA-Seq, but for regular DNA alignments it might well add quite a bit of noise.

Thanks a lot, Kind regards, Felix


infphilo avatar infphilo commented on July 22, 2024

Thanks for your comments! Using --sp 1000,1000 reports 2M5000N100M only when the splice junction is supported by some other reads with long anchors (>= 15bp). It won't report 2S100M. (I edited this a bit as I misunderstood your question first.)

Bowtie2 and HISAT2 are quite different when it comes to the use of soft-clipping. I'd like to see how HISAT2 works with/without soft-clipping for your DNA analysis part in which you used Bowtie2 (--end-to-end and --local modes). HISAT2 with soft-clipping should be more conservative than Bowtie2's local mode, I think (less sensitive alignment and more unique alignments than Bowtie2). BTW, that's a really nice description!
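To make the two CIGAR strings from this exchange concrete, here is a minimal Python sketch that tells a soft-clipped alignment (2S100M) apart from a spliced one (2M5000N100M). It is not part of HISAT2; the helper names (`parse_cigar`, `soft_clipped_bases`, `intron_lengths`, `query_length`) are made up for illustration, and the only thing assumed is the standard SAM CIGAR operator set.

```python
import re

# Standard SAM CIGAR operators; S = soft clip, N = skipped region (intron).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples; '*' means unavailable."""
    if cigar == "*":
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # The parsed pieces must reproduce the input, otherwise it is malformed.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def soft_clipped_bases(cigar):
    """Total number of read bases that were soft-clipped."""
    return sum(n for n, op in parse_cigar(cigar) if op == "S")

def intron_lengths(cigar):
    """Lengths of all skipped regions (N), i.e. putative introns."""
    return [n for n, op in parse_cigar(cigar) if op == "N"]

def query_length(cigar):
    """Read length implied by the CIGAR (ops that consume the read)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")

if __name__ == "__main__":
    for cigar in ("2S100M", "2M5000N100M"):
        print(cigar, soft_clipped_bases(cigar), intron_lengths(cigar),
              query_length(cigar))
```

Both strings describe the same 102bp read: in the first, the 2bp anchor is clipped; in the second, it is aligned on the far side of a 5000bp intron.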


FelixKrueger avatar FelixKrueger commented on July 22, 2024

I can certainly re-run the DNA samples with HISAT2, will let you know the outcome!


FelixKrueger avatar FelixKrueger commented on July 22, 2024

HISAT2 on genomic sequences comparison.pdf

Hi Daehwan,

I have now run the same 100bp Input sample with HISAT2 with and w/o soft-clipping, and with and w/o splice junction models. For this data the splice junctions didn't make much of a difference (which is good). I didn't dig very deep into any kind of analysis, but I can report on a few things I found (see also the slides attached):

  • The most striking difference between Bowtie2 and HISAT2 mapping was the overall mapping efficiency (slide 1): the rate of Unaligned reads with HISAT2 was nearly 3 times higher than with Bowtie2, while the rate of multiple alignments dropped dramatically.
  • Soft-clipping in HISAT2 does not lead to a lot of extra 'peaks' in the data. If anything, it almost looks like there are more regions with more reads in end-to-end (--sp 1000,1000) mode (slide 2).
  • When you look at some regions it appears that HISAT2 is behaving nicely, e.g. in the region of satellite repeats on chrX in slide 3 that gains lots of extra reads in Bowtie2 local mode.
  • There are quite a lot of regions that lose coverage compared to Bowtie2 (in either mode), such as in slide 4. This can often be seen in regions with many predicted genes (Gm or RIKEN...), or close to regions that generally look dodgy, e.g. close to gaps etc.

Altogether I was quite surprised to see the rather big overall differences between Bowtie2 and HISAT2. When you look in more detail, though, the two actually agree very well in the vast majority of the genome and only differ in some regions that look sort of dodgy to me (even though I haven't investigated this any further). My gut feeling is that HISAT2 is more trustworthy in these regions, but again I can't base this on facts. If you would like to follow any of this up, I could share the data with you on an FTP site if need be.

Do you think it would be possible to give users the option to choose between soft-clipping (which you would like to use as the default) and no soft-clipping, such as an option --end-to-end or --no-softclipping (even if this just sets a very high soft-clipping penalty behind the scenes)? It really isn't very obvious that you would need to use --sp 1000,1000 or the like. Just a thought.

Cheers, Felix
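For anyone wanting to reproduce the with/without soft-clipping comparison described above, the following is a rough Python sketch that assembles the corresponding hisat2 command lines. The `hisat2_command` helper, the index name and the file names are hypothetical; `-x`, `-U`, `-S`, `--sp` and `--no-spliced-alignment` are existing HISAT2 options, but mapping "w/o splice junction models" to `--no-spliced-alignment` is an assumption, not something stated in this thread.

```python
import shlex

def hisat2_command(index, reads, out_sam, allow_softclip=True, spliced=True):
    """Build an argv list for a single-end hisat2 run (hypothetical helper)."""
    cmd = ["hisat2", "-x", index, "-U", reads, "-S", out_sam]
    if not allow_softclip:
        # Very high max,min soft-clipping penalties, as suggested in the thread.
        cmd += ["--sp", "1000,1000"]
    if not spliced:
        # Assumption: this is what "w/o splice junction models" refers to.
        cmd.append("--no-spliced-alignment")
    return cmd

if __name__ == "__main__":
    # Placeholder index and file names; substitute your own.
    for softclip in (True, False):
        name = "softclip" if softclip else "no_softclip"
        cmd = hisat2_command("genome_index", "input_100bp.fastq.gz",
                             f"{name}.sam", allow_softclip=softclip)
        print(shlex.join(cmd))  # pass cmd to subprocess.run(...) to execute
```

Building the command as a list keeps the penalty string `1000,1000` as a single argument and avoids shell-quoting problems when the list is handed to `subprocess.run`.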


infphilo avatar infphilo commented on July 22, 2024

Hi Felix,

Thank you again for this detailed information. I'll think more about this analysis to see how I can make HISAT2 better.

As you suggested, I will provide a new option, --no-softclipping, in the next release.

Thanks,
Daehwan


FelixKrueger avatar FelixKrueger commented on July 22, 2024

Great, thanks for your quick feedback! Felix

