Comments (10)
Thank you for reporting this bug, which I fixed soon after releasing 2.0.1-beta. The fixed version will be available sometime in Feb.
The fixed version will report the following alignment for the read you provided:
0 0 MT 16503 255 61M14S * 0 0 GGTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCGCGATGGATCACAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-14 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:61 YT:Z:UU NH:i:1
from hisat2.
Great, thanks for the update. We look forward to the new release, as we'd like to be able to run Qualimap to summarise data quality.
from hisat2.
@infphilo do you have a hunch when the new version might be out?
from hisat2.
I'm trying to release a new version within three weeks, thanks.
from hisat2.
Hi @infphilo,
The error "CIGAR M operator maps off end of reference" also happens for me, although it is poorly reproducible (version 2.0.3-beta against the GATK bundle reference). Testing showed that aligning only this read pair does not reproduce the problem. So far it has happened only once, and I mention it just for the record, so do not spend too much time on it.
For now I'll run java -jar picard.jar CleanSam ... to work around the problem.
This is the affected read pair (SAM format, taken from the incorrect output):
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 73 GL000220.1 117973 255 48M43688N103M = 117973 0 ATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGA AAABBFABAFBFGGGGFGGBGFGHGHDFGC4GHHHCGAEEGFGHHHHHFFFFGHHHGGHFEHFEHGCECEEEECAFGGCCGHF4CEEEGGFGGGHGHGFFFHGFHFE/?1?GHGGAD0F1FGHFHFHHGHGHGHFFGHFFGFFGHHHGGGA AS:i:-10 XN:i:0 XM:i:11 XO:i:0 XG:i:0 NM:i:11 MD:Z:41C5T94N0N0N0N0N0N0N0N0N0 YT:Z:UP XS:A:- NH:i:1
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 133 GL000220.1 117973 0 * = 117973 0 NNNTTGNNTNNNNNNNCNNNNNNNNNNNNNNNNGNNNNGNNNNNNANNNNNNNANNAGNNNCGTNNNNNNGAANNNCNNNCCNCCACCNNCCAGTTNTNNNTNTNGTNANNTNGCCCNNNNTGNGNNNCNGCCNAGNACNNACANCAAATA !!!>1>!!>!!!!!!!A!!!!!!!!!!!!!!!!A!!!!A!!!!!!A!!!!!!!A!!BB!!!B?>!!!!!!B??!!!/!!!?/!???F/!!?/>FFH!?!!!?!/!??!?!!/!?/?F!!!!/<!?!!!?!??F!?>!>>!!.>>!<<CEGF YT:Z:UP YF:Z:NS
Testing this read pair on its own gives the correct alignment:
$ hisat2 -q -x bundle/2.8/b37//hisat2/2.0.3-beta-goolf-1.7.20/human_g1k_v37_decoy -1 <(echo -e '@M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618\nATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGA\n+\nAAABBFABAFBFGGGGFGGBGFGHGHDFGC4GHHHCGAEEGFGHHHHHFFFFGHHHGGHFEHFEHGCECEEEECAFGGCCGHF4CEEEGGFGGGHGHGFFFHGFHFE/?1?GHGGAD0F1FGHFHFHHGHGHGHFFGHFFGFFGHHHGGGA'| perl -we 'my @lines=<>;print join("",@lines) x 100000;') -2 <(echo -e '@M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618\nNNNTTGNNTNNNNNNNCNNNNNNNNNNNNNNNNGNNNNGNNNNNNANNNNNNNANNAGNNNCGTNNNNNNGAANNNCNNNCCNCCACCNNCCAGTTNTNNNTNTNGTNANNTNGCCCNNNNTGNGNNNCNGCCNAGNACNNACANCAAATA\n+\n!!!>1>!!>!!!!!!!A!!!!!!!!!!!!!!!!A!!!!A!!!!!!A!!!!!!!A!!BB!!!B?>!!!!!!B??!!!/!!!?/!???F/!!?/>FFH!?!!!?!/!??!?!!/!?/?F!!!!/<!?!!!?!??F!?>!>>!!.>>!<<CEGF'| perl -we 'my @lines=<>;print join("",@lines) x 100000;') -S Miseq.sam --threads 1
$ perl -wne 'BEGIN{our $last1="";our $last2="";};my $cur=$_;if($.%2==1){if($last1 ne $cur){print $cur;};$last1=$cur;}else{if($last2 ne $cur){print $cur;}; $last2=$cur;}' Miseq.sam | tail
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@SQ SN:NC_007605 LN:171823
@SQ SN:hs37d5 LN:35477943
@PG ID:hisat2 PN:hisat2 VN:2.0.3-beta CL:"hisat2/2.0.3-beta-foss-2016a/bin/hisat2-align-s --wrapper basic-0 -q -x bundle/2.8/b37//hisat2/2.0.3-beta-goolf-1.7.20/human_g1k_v37_decoy -S Miseq.sam --threads 1 -1 /dev/fd/63 -2 /dev/fd/62"
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 73 GL000220.1 117973 255 48M43688N94M9S = 117973 0 ATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGA AAABBFABAFBFGGGGFGGBGFGHGHDFGC4GHHHCGAEEGFGHHHHHFFFFGHHHGGHFEHFEHGCECEEEECAFGGCCGHF4CEEEGGFGGGHGHGFFFHGFHFE/?1?GHGGAD0F1FGHFHFHHGHGHGHFFGHFFGFFGHHHGGGA AS:i:-21 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:41C5T94 YT:Z:UP XS:A:- NH:i:1
M01997:246:000000000-ANBW6_NNNNNN:1:2112:22392:4618 133 GL000220.1 117973 0 * = 117973 0 NNNTTGNNTNNNNNNNCNNNNNNNNNNNNNNNNGNNNNGNNNNNNANNNNNNNANNAGNNNCGTNNNNNNGAANNNCNNNCCNCCACCNNCCAGTTNTNNNTNTNGTNANNTNGCCCNNNNTGNGNNNCNGCCNAGNACNNACANCAAATA !!!>1>!!>!!!!!!!A!!!!!!!!!!!!!!!!A!!!!A!!!!!!A!!!!!!!A!!BB!!!B?>!!!!!!B??!!!/!!!?/!???F/!!?/>FFH!?!!!?!/!??!?!!/!?/?F!!!!/<!?!!!?!??F!?>!>>!!.>>!<<CEGF YT:Z:UP YF:Z:NS
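For readers less used to Perl, the filter applied to Miseq.sam above can be sketched in Python. It prints a line only when it differs from the previous line of the same parity (odd or even line number), which collapses the 100000 repeated mate pairs into one pair. The function name is hypothetical, and this is only an illustration of the one-liner's logic, not part of HISAT2.

```python
def collapse_alternating_repeats(lines):
    """Yield a line only if it differs from the previous line of the same parity."""
    last = {0: None, 1: None}  # last line seen at even / odd line numbers
    for number, line in enumerate(lines, start=1):
        parity = number % 2
        if line != last[parity]:
            yield line
        last[parity] = line


# Hypothetical miniature input: two header lines, then a repeated mate pair.
demo = ["@SQ a\n", "@PG b\n", "mate1\n", "mate2\n", "mate1\n", "mate2\n"]
collapsed = list(collapse_alternating_repeats(demo))
print("".join(collapsed), end="")
```

Because mates alternate line by line, comparing each line with the one two positions earlier is enough to detect the repeats.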
from hisat2.
I reopened this issue. Sorry that the problem still exists. I'll build an extensive set of test cases, fix it, and make sure it does not happen again.
from hisat2.
Hi @mmterpstra,
Sorry that it took a while for me to get back to you. I just realized this may be a bug in picard, not in HISAT2.
First, the sequence length of GL000220 is 161802 bps.
Second, the rightmost coordinate of the alignment (not including the soft clip on the right end of the read) is 117973 + 48 + 43688 + 94 - 1 = 161802, which is the location of the last base of the sequence. It does not map beyond the sequence.
Would you like to double-check if I did this calculation correctly?
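One way to double-check this arithmetic is a small Python sketch that sums the reference-consuming CIGAR operations (M, D, N, = and X in the SAM specification) and adds them to POS. The function name is hypothetical. The reference length and both CIGAR strings are the ones quoted in this thread for the read at position 117973.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_end(pos, cigar):
    """Return the 1-based coordinate of the last reference base covered."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


REF_LEN = 161802  # length of GL000220.1 as stated in this thread

for cigar in ("48M43688N94M9S", "48M43688N103M"):
    end = reference_end(117973, cigar)
    status = "within reference" if end <= REF_LEN else "maps off end of reference"
    print(cigar, end, status)
```

Soft clips (S) are skipped because they do not consume reference bases, which is why the 94M9S alignment ends exactly on the last base while the 103M alignment overshoots by nine.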
Thanks,
Daehwan
from hisat2.
Hi @infphilo,
Sorry, I do not work on Mondays.
You probably missed this: the SAM line containing the error (in my earlier comment) has the following CIGAR at the same mapping position:
48M43688N103M
Applying your calculation to that:
117973 + 48 + 43688 + 103 - 1 = 161811
so it maps off the reference.
I tried to reproduce it with a simple alignment (only the read pair mentioned), but that produces the correctly trimmed read (48M43688N94M9S).
It also happens in only 1 of the 9 samples tested so far, so it might not be a common problem. For now I'd suggest closing this thread; users still experiencing the problem can share their dataset with you for bug-fixing, or run the Picard tool for cleaning SAM files: java -jar picard.jar CleanSam ...
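To find affected records in a sample before cleaning it, a minimal Python sketch along the following lines could scan an uncompressed SAM file, read the reference lengths from the @SQ header lines, and report every mapped record whose alignment ends past its reference. The function names and the abbreviated demo records (named "good" and "bad", with SEQ and QUAL set to "*") are hypothetical. This only detects such records and does not soft-clip them the way CleanSam does.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_end(pos, cigar):
    """Return the 1-based coordinate of the last reference base covered."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def off_reference_records(lines):
    """Yield (read_name, ref_name, end, ref_len) for alignments past the reference end."""
    ref_len = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                ref_len[tags["SN"]] = int(tags["LN"])
            continue
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, rname, pos, cigar = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5])
        if flag & 0x4 or cigar == "*" or rname not in ref_len:
            continue  # unmapped read or unknown reference
        end = reference_end(pos, cigar)
        if end > ref_len[rname]:
            yield name, rname, end, ref_len[rname]


# Abbreviated, hypothetical records built from the CIGAR strings in this thread.
demo = [
    "@SQ\tSN:GL000220.1\tLN:161802\n",
    "good\t73\tGL000220.1\t117973\t255\t48M43688N94M9S\t=\t117973\t0\t*\t*\n",
    "bad\t73\tGL000220.1\t117973\t255\t48M43688N103M\t=\t117973\t0\t*\t*\n",
]
hits = list(off_reference_records(demo))
print(hits)
```

For a real file, the same generator can be fed an open file handle, for example `off_reference_records(open("Miseq.sam"))`.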
But thanks for the response,
Best Regards M. M. Terpstra
from hisat2.
I'm sorry I overlooked the other alignment. Fortunately, I was able to reproduce the issue with an additional parameter, "--known-splicesite-infile splicesites.txt", where the file splicesites.txt contains the following single line:
chrUn_gl000220 118019 161708 +
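The two coordinates in that line can be related to the alignment reported earlier (POS 117973, CIGAR 48M43688N94M9S) with a short Python sketch. It assumes the splice-site file columns are: chromosome, zero-based position of the last exonic base left of the intron, zero-based position of the first exonic base right of the intron, and strand. That reading of the format is an assumption to check against the HISAT2 manual, and the function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def intron_flanks_zero_based(pos, cigar):
    """Return (left_flank, right_flank) in 0-based coordinates for each N operation."""
    flanks = []
    ref = pos  # 1-based coordinate of the next reference base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            left = ref - 2            # last exonic base before the intron, 0-based
            right = ref + length - 1  # first exonic base after the intron, 0-based
            flanks.append((left, right))
        if op in REF_CONSUMING:
            ref += length
    return flanks


flanks = intron_flanks_zero_based(117973, "48M43688N94M9S")
print(flanks)
```

Under that assumption the intron implied by the CIGAR coincides with the single entry in splicesites.txt.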
I fixed the bug - the fix is already in the master branch.
Thanks,
Daehwan
from hisat2.
Great! Thanks!
Maybe this also happened with novel splice sites detected on the fly, and that's why you needed to specify --known-splicesite-infile splicesites.txt to compensate for that.
PS: fixed with commit 902db4a for cross referencing.
from hisat2.