Comments (10)

infphilo commented on July 22, 2024

Thank you for reporting this bug, which I fixed soon after releasing 2.0.1-beta. The fixed version will be available sometime in Feb.

The fixed version will report the following alignment for the read you provided:
0 0 MT 16503 255 61M14S * 0 0 GGTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCGCGATGGATCACAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-14 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:61 YT:Z:UU NH:i:1

from hisat2.

mjafin commented on July 22, 2024

Great, thanks for the update. We look forward to the new release, as we'd like to be able to run Qualimap to summarise data quality.

mjafin commented on July 22, 2024

@infphilo do you have a hunch when the new version might be out?

infphilo commented on July 22, 2024

I'm trying to release a new version within three weeks, thanks.

mmterpstra commented on July 22, 2024

Hi @infphilo,

The poorly reproducible error "CIGAR M operator maps off end of reference" also happens for me (version 2.0.3-beta against the GATK bundle reference). Testing showed that aligning only this read pair does not reproduce the problem. So far it has happened only once, and I am reporting it just for the record, so do not spend too much time on it.
For now I'll run java -jar picard.jar CleanSam ... to work around the problem.

This is the affected read pair (SAM records from the incorrect output):

M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 73  GL000220.1  117973  255 48M43688N103M   =   117973  0   ATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGA AAABBFABAFBFGGGGFGGBGFGHGHDFGC4GHHHCGAEEGFGHHHHHFFFFGHHHGGHFEHFEHGCECEEEECAFGGCCGHF4CEEEGGFGGGHGHGFFFHGFHFE/?1?GHGGAD0F1FGHFHFHHGHGHGHFFGHFFGFFGHHHGGGA AS:i:-10    XN:i:0  XM:i:11 XO:i:0  XG:i:0  NM:i:11 MD:Z:41C5T94N0N0N0N0N0N0N0N0N0  YT:Z:UP XS:A:-  NH:i:1
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 133 GL000220.1  117973  0   *   =   117973  0   NNNTTGNNTNNNNNNNCNNNNNNNNNNNNNNNNGNNNNGNNNNNNANNNNNNNANNAGNNNCGTNNNNNNGAANNNCNNNCCNCCACCNNCCAGTTNTNNNTNTNGTNANNTNGCCCNNNNTGNGNNNCNGCCNAGNACNNACANCAAATA !!!>1>!!>!!!!!!!A!!!!!!!!!!!!!!!!A!!!!A!!!!!!A!!!!!!!A!!BB!!!B?>!!!!!!B??!!!/!!!?/!???F/!!?/>FFH!?!!!?!/!??!?!!/!?/?F!!!!/<!?!!!?!??F!?>!>>!!.>>!<<CEGF YT:Z:UP YF:Z:NS

Aligning this read pair on its own gives the correct alignment:

$ hisat2 -q -x bundle/2.8/b37//hisat2/2.0.3-beta-goolf-1.7.20/human_g1k_v37_decoy -1 <(echo -e '@M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618\nATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGA\n+\nAAABBFABAFBFGGGGFGGBGFGHGHDFGC4GHHHCGAEEGFGHHHHHFFFFGHHHGGHFEHFEHGCECEEEECAFGGCCGHF4CEEEGGFGGGHGHGFFFHGFHFE/?1?GHGGAD0F1FGHFHFHHGHGHGHFFGHFFGFFGHHHGGGA'| perl -we 'my @lines=<>;print join("",@lines) x 100000;') -2 <(echo -e '@M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618\nNNNTTGNNTNNNNNNNCNNNNNNNNNNNNNNNNGNNNNGNNNNNNANNNNNNNANNAGNNNCGTNNNNNNGAANNNCNNNCCNCCACCNNCCAGTTNTNNNTNTNGTNANNTNGCCCNNNNTGNGNNNCNGCCNAGNACNNACANCAAATA\n+\n!!!>1>!!>!!!!!!!A!!!!!!!!!!!!!!!!A!!!!A!!!!!!A!!!!!!!A!!BB!!!B?>!!!!!!B??!!!/!!!?/!???F/!!?/>FFH!?!!!?!/!??!?!!/!?/?F!!!!/<!?!!!?!??F!?>!>>!!.>>!<<CEGF'| perl -we 'my @lines=<>;print join("",@lines) x 100000;') -S Miseq.sam --threads 1
$ perl -wne 'BEGIN{our $last1="";our $last2="";};my $cur=$_;if($.%2==1){if($last1 ne $cur){print $cur;};$last1=$cur;}else{if($last2 ne $cur){print $cur;}; $last2=$cur;}' Miseq.sam | tail
@SQ SN:GL000200.1   LN:187035
@SQ SN:GL000193.1   LN:189789
@SQ SN:GL000194.1   LN:191469
@SQ SN:GL000225.1   LN:211173
@SQ SN:GL000192.1   LN:547496
@SQ SN:NC_007605    LN:171823
@SQ SN:hs37d5   LN:35477943
@PG ID:hisat2   PN:hisat2   VN:2.0.3-beta   CL:"hisat2/2.0.3-beta-foss-2016a/bin/hisat2-align-s --wrapper basic-0 -q -x bundle/2.8/b37//hisat2/2.0.3-beta-goolf-1.7.20/human_g1k_v37_decoy -S Miseq.sam --threads 1 -1 /dev/fd/63 -2 /dev/fd/62"
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 73  GL000220.1  117973  255 48M43688N94M9S  =   117973  0   ATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGA AAABBFABAFBFGGGGFGGBGFGHGHDFGC4GHHHCGAEEGFGHHHHHFFFFGHHHGGHFEHFEHGCECEEEECAFGGCCGHF4CEEEGGFGGGHGHGFFFHGFHFE/?1?GHGGAD0F1FGHFHFHHGHGHGHFFGHFFGFFGHHHGGGA AS:i:-21    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:41C5T94    YT:Z:UP XS:A:-  NH:i:1
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 133 GL000220.1  117973  0   *   =   117973  0   NNNTTGNNTNNNNNNNCNNNNNNNNNNNNNNNNGNNNNGNNNNNNANNNNNNNANNAGNNNCGTNNNNNNGAANNNCNNNCCNCCACCNNCCAGTTNTNNNTNTNGTNANNTNGCCCNNNNTGNGNNNCNGCCNAGNACNNACANCAAATA !!!>1>!!>!!!!!!!A!!!!!!!!!!!!!!!!A!!!!A!!!!!!A!!!!!!!A!!BB!!!B?>!!!!!!B??!!!/!!!?/!???F/!!?/>FFH!?!!!?!/!??!?!!/!?/?F!!!!/<!?!!!?!??F!?>!>>!!.>>!<<CEGF YT:Z:UP YF:Z:NS
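
For readers less familiar with Perl: the filter in the second command appears to print a line only when it differs from the previous line of the same parity (odd or even line number), which collapses the 100000 identical read pairs into a single pair. A rough Python equivalent, as an illustrative sketch only (the function name is hypothetical and not part of HISAT2 or the original report):

```python
def drop_repeats_by_parity(lines):
    """Keep a line only if it differs from the previous line with the same
    line-number parity (odd lines compared to odd, even to even)."""
    last_seen = {0: None, 1: None}  # last line seen for even / odd line numbers
    kept = []
    for number, line in enumerate(lines, start=1):
        parity = number % 2
        if line != last_seen[parity]:
            kept.append(line)
        last_seen[parity] = line
    return kept


if __name__ == "__main__":
    # Toy input: two header lines followed by the same read pair three times.
    demo = ["@HD\n", "@PG\n"] + ["mate1\n", "mate2\n"] * 3
    print("".join(drop_repeats_by_parity(demo)), end="")
```

Because the mates alternate line by line, comparing within each parity removes the repeated pairs while still showing any pair whose alignment differs from the one before it.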

infphilo commented on July 22, 2024

I reopened this issue. Sorry that it still exists. I'll build an extensive set of test cases of my own, fix the issue, and make sure this problem won't happen again.

infphilo commented on July 22, 2024

Hi @mmterpstra,

Sorry that it took a while for me to get back to you. I just realized this may be a bug in Picard, not in HISAT2.

First, the sequence length of GL000220 is 161802 bp.
Second, the rightmost coordinate of the alignment (not including the soft clip on the right end of the read) is 117973 + 48 + 43688 + 94 - 1 = 161802, which is the position of the last base of the sequence. The alignment does not extend beyond the sequence.

Would you like to double-check if I did this calculation correctly?

Thanks,
Daehwan

mmterpstra commented on July 22, 2024

Hi @infphilo,

Sorry for the delay, I do not work on Mondays.

You probably missed this: the SAM line containing the error has this CIGAR (at the same mapping position):
48M43688N103M (search this page for it with Ctrl + F or Apple + F)

Applying your calculation to that:
117973 + 48 + 43688 + 103 - 1 = 161811, which is 9 bases past the end of the 161802 bp reference, so it maps off the reference.
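
The calculation can be checked for any CIGAR string with a few lines of code. This is a minimal sketch, not taken from HISAT2 or Picard: it assumes the standard SAM rule that only the M, D, N, = and X operations consume reference bases, and the function name is made up for illustration. The position, CIGAR strings and reference length are the ones quoted in this thread.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_end(pos, cigar):
    """Return the 1-based coordinate of the last reference base covered by
    an alignment that starts at 1-based position `pos`."""
    span = sum(int(length) for length, op in CIGAR_OP.findall(cigar)
               if op in REF_CONSUMING)
    return pos + span - 1


GL000220_LENGTH = 161802  # sequence length quoted earlier in the thread

if __name__ == "__main__":
    for cigar in ("48M43688N94M9S", "48M43688N103M"):
        end = reference_end(117973, cigar)
        verdict = "off the reference" if end > GL000220_LENGTH else "within the reference"
        print(cigar, end, verdict)
```

The soft clip (9S) is ignored because it does not consume reference bases, which is why the correctly trimmed alignment ends exactly on the last base while the 103M version overshoots.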

I tried to reproduce it with a simple alignment (only the read pair mentioned), but that produces the correctly trimmed read (48M43688N94M9S).

It has happened in only 1 of the 9 samples tested so far, so it might not be a common problem. For now I suggest closing this thread; users who still experience the problem can share their dataset with you for bug-fixing, or clean their SAM files with the Picard tool: java -jar picard.jar CleanSam ...

But thanks for the response,
Best Regards M. M. Terpstra

infphilo commented on July 22, 2024

I'm sorry I overlooked the other alignment. Fortunately, I was able to reproduce the issue by adding the parameter "--known-splicesite-infile splicesites.txt", where the file splicesites.txt contains the following single line:
chrUn_gl000220 118019 161708 +
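
The coordinates in that line are consistent with the intron in the reported alignment (POS 117973, CIGAR starting with 48M43688N). The sketch below derives them, assuming the convention described in the HISAT2 manual that columns 2 and 3 hold the zero-based positions of the bases flanking the intron on the left and right. The function name is hypothetical, and the chromosome name and strand are simply copied from the line above.

```python
import re


def splice_sites(chrom, pos, cigar, strand):
    """List (chrom, left_flank, right_flank, strand) for every N operation,
    with zero-based flanking positions, given a 1-based alignment start."""
    ref = pos - 1  # zero-based coordinate of the next reference base
    sites = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            # last exon base before the intron, first exon base after it
            sites.append((chrom, ref - 1, ref + length, strand))
        if op in "MDN=X":  # operations that consume reference bases
            ref += length
    return sites


if __name__ == "__main__":
    for site in splice_sites("chrUn_gl000220", 117973, "48M43688N103M", "+"):
        print(" ".join(str(field) for field in site))
```

Under these assumptions the output matches the splice-site line exactly, so the known splice site is the same junction that HISAT2 reported for this read.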

I fixed the bug - the fix is already in the master branch.

Thanks,
Daehwan

mmterpstra commented on July 22, 2024

Great! Thanks!

Maybe this also happened with novel splice sites detected on the fly, which would explain why you needed to specify --known-splicesite-infile splicesites.txt to compensate for that.

PS: fixed with commit 902db4a for cross referencing.
