benlangmead / bowtie2

A fast and sensitive gapped read aligner

License: GNU General Public License v3.0

Makefile 0.58% C++ 81.34% C 1.43% Perl 13.90% Python 1.46% Shell 1.00% CMake 0.29%
bioinformatics read-aligners genomics c-plus-plus

bowtie2's Introduction

Overview

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Obtaining Bowtie2

Bowtie 2 is available from various package managers, notably Bioconda. With Bioconda installed, you should be able to install Bowtie 2 with conda install bowtie2.

Containerized versions of Bowtie 2 are also available via the Biocontainers project (e.g. via Docker Hub).

You can also download Bowtie 2 sources and binaries from the "releases" tab on this page. Binaries are available for Linux, Mac OS X, and Windows. By using the SIMDe project, Bowtie 2 also supports the following architectures: ARM64, PPC64, and s390x. If you plan to compile Bowtie 2 yourself, make sure you have at least the zlib library and header files installed. See the Building from source section of the manual for details.

Getting started

Looking to try out Bowtie 2? Check out the Bowtie 2 UI (currently in beta).

Alignment

bowtie2 takes a Bowtie 2 index and a set of sequencing read files and outputs a set of alignments in SAM format.

"Alignment" is the process by which we discover how and where the read sequences are similar to the reference sequence. An "alignment" is a result from this process, specifically: an alignment is a way of "lining up" some or all of the characters in the read with some characters from the reference in a way that reveals how they're similar. For example:

  Read:      GACTGGGCGATCTCGACTTCG
             |||||  |||||||||| |||
  Reference: GACTG--CGATCTCGACATCG

Dashes represent gaps, and vertical bars show where aligned characters match.
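As a minimal sketch of how the row of vertical bars relates to the two aligned rows, the following Python snippet recomputes it for the example above. The function name match_line is hypothetical and is not part of Bowtie 2.

```python
def match_line(read_row: str, ref_row: str) -> str:
    """Return '|' where aligned characters match and ' ' elsewhere.

    Both rows must already be padded with '-' for gaps, so they have
    equal length. A gap never counts as a match.
    """
    if len(read_row) != len(ref_row):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "|" if r == f and r != "-" else " "
        for r, f in zip(read_row, ref_row)
    )


read_row = "GACTGGGCGATCTCGACTTCG"
ref_row = "GACTG--CGATCTCGACATCG"
print("Read:     ", read_row)
print("          ", match_line(read_row, ref_row))
print("Reference:", ref_row)
```

This only displays an alignment that has already been computed; finding the alignment is the aligner's job.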

We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long stretches of As (AAAAAAAAA etc.) and the read sequence is a short stretch of As (AAAAAAA), we cannot know for certain exactly where in the sea of As the read originated.

Examples

# Aligning unpaired reads
bowtie2 -x example/index/lambda_virus -U example/reads/longreads.fq

# Aligning paired reads
bowtie2 -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq

Building an index

bowtie2-build builds a Bowtie 2 index from a set of DNA sequences. bowtie2-build outputs a set of 6 files with suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. For a large index, these suffixes end in .bt2l instead of .bt2. These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence FASTA files are no longer used by Bowtie 2 once the index is built.
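A small sketch of the naming scheme described above, which can be handy when checking that an index is complete before aligning. The helper expected_index_files is a hypothetical name used only for illustration.

```python
def expected_index_files(basename: str, large: bool = False) -> list[str]:
    """List the six file names that make up an index with this basename.

    Small indexes use the .bt2 extension and large indexes use .bt2l,
    as described in the text above.
    """
    ext = "bt2l" if large else "bt2"
    parts = ["1", "2", "3", "4", "rev.1", "rev.2"]
    return [f"{basename}.{part}.{ext}" for part in parts]


for name in expected_index_files("example/index/lambda_virus"):
    print(name)
```

Combining this with os.path.exists would give a quick completeness check, assuming the index was written under that basename.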

Bowtie 2's .bt2 index format is different from Bowtie 1's .ebwt format, and they are not compatible with each other.

Examples

# Building a small index
bowtie2-build example/reference/lambda_virus.fa example/index/lambda_virus

# Building a large index
bowtie2-build --large-index example/reference/lambda_virus.fa example/index/lambda_virus

Index inspection

bowtie2-inspect extracts information from a Bowtie 2 index about what kind of index it is and what reference sequences were used to build it. When run without any options, the tool will output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns). It can also be used to extract just the reference sequence names using the -n/--names option or a more verbose summary using the -s/--summary option.

Examples

# Inspecting a lambda_virus index (small index) and outputting the summary
bowtie2-inspect --summary example/index/lambda_virus

# Inspecting the entire lambda virus index (large index)
bowtie2-inspect --large-index example/index/lambda_virus

Publications

Bowtie 2 Papers

Related Publications

Related Work

Check out the Bowtie 2 UI, a Shiny front end to the Bowtie 2 command line.

bowtie2's People

Contributors

alienzj, benlangmead, bmwiedemann, bwlang, cbrueffer, ch4rr0, christopherwilks, extemporaneousb, hamilcare, infphilo, jeffhussmann, jmarshall, junaruga, mr-c, mtojek, nathanweeks, nsoranzo, petehaitch, pkubaj, rpetit3, sameerd, sfiligoi, sjackman, val-antonescu, vejnar, wasade, wookietreiber

bowtie2's Issues

Incorrect sam template length field for reads where a mate completely overlaps another.

When a read pair aligns such that one mate completely overlaps the other, the template length field of the SAM output (the 9th field, counting from 1) is incorrect.

As per the SAM specifications, the template length should be the number of bases from the leftmost coordinate with respect to the reference to the rightmost coordinate with respect to the reference. If the mate is in the reverse direction, the template length should be given a negative value. If the mate is in forward direction, the template length should be given a positive value.

In the following example SAM, the expected template length is 170bp. However, bowtie2 reports the template length as +/- 332bp.

Output Sam: templateLenTest.sam

@HD VN:1.0  SO:unsorted
@SQ SN:HIV1B-nef    LN:621
@PG ID:bowtie2  PN:bowtie2  VN:2.2.2    CL:"bowtie2-align-s --wrapper basic-0 --local -x templateLenTest.ref.fasta -S templateLenTest.sam -1 templateLenTest.1.fq -2 templateLenTest.2.fq"
read/1  99  HIV1B-nef   294 41  81S170M =   296 332 TGTATTATTTTTTTTTTCAAGCAGAAGACGGCATACGAGATCTAGTACGGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGCTAATTTACTCCCAAAAAAGACAAGATATCCTTGATCTGTGGGTCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTTCAAGCTAGTACCAGTTGAGCCAGAGAAGGTAGAAG HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH AS:i:340    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:170    YS:i:336    YT:Z:CP
read/2  147 HIV1B-nef   296 41  168M83S =   294 -332    GGCTAATTTACTCCCAAAAAAGACAAGATATCCTTGATCTGTGGGTCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTTCAAGCTAGTACCAGTTGAGCCAGAGAAGGTAGAAGCTGTCTCTTATACACATCTGACGCTGCCGACGACACCTTACGTGTAGATCTCGGAGATAGCCGCAACATTAAAAAAAAAAAAA HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH AS:i:336    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:168    YS:i:340    YT:Z:CP
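The expected value of 170 can be checked from the POS and CIGAR fields of the two records above. The following is a sketch that follows the definition quoted in this report (leftmost to rightmost reference coordinate); the function names are hypothetical, and soft-clipped bases are treated as not covering the reference.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_end(pos: int, cigar: str) -> int:
    """1-based rightmost reference coordinate covered by an alignment."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def template_span(pos1: int, cigar1: str, pos2: int, cigar2: str) -> int:
    """Unsigned distance from the leftmost to the rightmost aligned base."""
    left = min(pos1, pos2)
    right = max(ref_end(pos1, cigar1), ref_end(pos2, cigar2))
    return right - left + 1


# read/1: POS 294, CIGAR 81S170M; read/2: POS 296, CIGAR 168M83S
print(template_span(294, "81S170M", 296, "168M83S"))  # 170
```

Both mates end at reference position 463, so the span runs from 294 to 463, which is 170 bases.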

Reference Fasta: templateLenTest.ref.fasta

>HIV1B-nef
ATGGGTGGCAAGTGGTCAAAACGTAGTGTGGTTGGATGGCCTACTGTAAGGGAAAGAATGAGACGAGCTGAGCCAGCAGCAGATGGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTAACAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCAGACCTCAGGTACCTTTAAGACCAATGACTTACAAGGGAGCTTTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTTACTCCCAAAAAAGACAAGATATCCTTGATCTGTGGGTCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTTCAAGCTAGTACCAGTTGAGCCAGAGAAGGTAGAAGAGGCCAATGAAGGAGAGAACAACAGCTTGTTACACCCTATGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTGTTAGTGTGGAAGTTTGACAGCCGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCCGGAGTACTACAAGGACTGCTGA

Mate1 Fastq: templateLenTest.1.fq

@read/1  Adapter contamination for 81bp followed by exact match for 170bp to HIV1B
TGTATTATTTTTTTTTTCAAGCAGAAGACGGCATACGAGATCTAGTACGGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGCTAATTTACTCCCAAAAAAGACAAGATATCCTTGATCTGTGGGTCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTTCAAGCTAGTACCAGTTGAGCCAGAGAAGGTAGAAG
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Mate2 Fastq: templateLenTest.2.fq

@read/2  Adapter contamination for 83bp followed by Exact match to HIV1B-nef for 168M.  Read in reverse direction.
TTTTTTTTTTTTTAATGTTGCGGCTATCTCCGAGATCTACACGTAAGGTGTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTCTACCTTCTCTGGCTCAACTGGTACTAGCTTGAAGCACCATCCAAAGGTCAGTGGATATCTGATCCCTGGCCCTGGTGTGTAGTTCTGCCAATCAGGGAAGTAGCCTTGTGTGTGGTAGACCCACAGATCAAGGATATCTTGTCTTTTTTGGGAGTAAATTAGCC
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Platform Stack:
Bowtie v2.2.2
OS: Ubuntu 12.10
kernel: 3.8.0-44-generic

Steps to Reproduce

  • Copy and paste contents of templateLenTest.1.fq, templateLenTest.2.fq, and templateLenTest.ref.fasta to files in the same directory. Either use the same file names, or alter the following commands accordingly.
  • Via the commandline, go to the directory containing the test files.
  • Run Bowtie2 v2.2.2 with these commands via commandline:
bowtie2-build templateLenTest.ref.fasta templateLenTest.ref.fasta

bowtie2 --local -x templateLenTest.ref.fasta -S templateLenTest.sam -1 templateLenTest.1.fq -2 templateLenTest.2.fq

Truncated fastq output with --un

When using --un with fastq input, some records are truncated in the unaligned output, with the following record starting without a newline. The total characters in the truncated records always seems to be 8192, suggesting a buffer size limit issue.
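One way to locate such a record is to scan the --un output for the first 4-line FASTQ record that is no longer well formed. This is only a sketch: first_malformed_record is a hypothetical helper, and it assumes the common layout in which each record occupies exactly four lines.

```python
import io


def first_malformed_record(handle):
    """Return (record_number, reason) for the first bad record, else None.

    Assumes 4-line FASTQ records. Once one record is damaged, later
    records are misframed, so only the first problem is reported.
    """
    number = 0
    while True:
        rec = [handle.readline() for _ in range(4)]
        if rec[0] == "":  # clean end of file
            return None
        number += 1
        header, seq, sep, qual = (line.rstrip("\n") for line in rec)
        if not header.startswith("@"):
            return number, "header does not start with '@'"
        if not sep.startswith("+"):
            return number, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return number, "sequence and quality lengths differ"


# Second record starts without a newline after a truncated quality string.
damaged = "@r1\nACGT\n+\nII@r2\nAC\n+\nII\n"
print(first_malformed_record(io.StringIO(damaged)))
```

For a real file, open it in text mode and pass the file object instead of the StringIO stand-in.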

bowtie2-build issue

Hi, I encounter the following problem whenever I try bowtie2-build, bowtie2-build-s, or bowtie2-build-l on Windows:

"bowtie2-build-s" has stopped working
"bowtie2-build-l" has stopped working

However, it works when I use bowtie2-build-s-debug or bowtie2-build-l-debug. Everything here was done on the lambda phage example provided in the bowtie2 folder. Any idea why the "debug" build is needed?

BITS=32 under MinGW

We might need to fail gracefully under MinGW in case the build cannot be accomplished. This is very low priority since it relates only to the 32-bit build.

Mappings Below Minimum Score Output

I have found reads in the SAM file with score below the threshold.

The command is

bowtie2 -q -N 1 -L 5 -i C,3 --gbar 1 --mp 1,1 --rdg 0,1 --rfg 0,1 --score-min L,-1 --end-to-end --norc -x /home/indexes/libraryA -U CRISPR.fastq --quiet -S CRISPR.sam --un notAligned.fasta

I see CIGAR entries like 7M3I4M2I5M which have a score below -1, if calculated according to the documentation.
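A sketch of the arithmetic behind this claim, assuming the gap penalty form given in the Bowtie 2 manual (a gap of length N costs open + N * extend) and treating I as a reference gap (--rfg) and D as a read gap (--rdg). The function gap_penalty is a hypothetical helper; mismatch penalties are not visible in an M operation and are ignored here, so the result is an upper bound on the score.

```python
import re


def gap_penalty(cigar, rdg=(5, 3), rfg=(5, 3)):
    """Sum gap penalties implied by a CIGAR string.

    rdg and rfg are (open, extend) pairs; the defaults mirror the
    Bowtie 2 defaults of 5,3. Each gap of length n costs open + n * extend.
    """
    total = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "D":    # bases missing from the read: read gap
            total += rdg[0] + n * rdg[1]
        elif op == "I":  # extra bases in the read: reference gap
            total += rfg[0] + n * rfg[1]
    return total


# Settings from the command above: --rdg 0,1 --rfg 0,1
print(-gap_penalty("7M3I4M2I5M", rdg=(0, 1), rfg=(0, 1)))  # -5
```

With these settings the two insertions alone cost 3 + 2 = 5, so the end-to-end score can be at most -5, which is below the stated threshold of -1.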

Was 2.3.0 retagged?

Hi, I'm a maintainer for homebrew and looking to get bowtie 2.3.0 packaged. Homebrew/homebrew-science#4753 has a checksum that differs from what I'm currently getting while downloading bowtie. Was version 2.3.0 re-tagged?

==> Installing bowtie2 
==> Downloading https://github.com/BenLangmead/bowtie2/archive/v2.3.0.tar.gz
==> Downloading from https://codeload.github.com/BenLangmead/bowtie2/tar.gz/v2.3.0
==> Verifying bowtie2-2.3.0.tar.gz checksum
Error: SHA256 mismatch
Expected: 7ff24321e3e726c4d5f1baa0c46cceeb3611215de5c3bf59abfe89e5046c6628
Actual: 9804fddf36233f3f92c11e2250224de3395790cf35c8280c66387075df078221
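To compare a downloaded tarball against either checksum independently of Homebrew, the SHA-256 digest can be recomputed locally. This is a minimal sketch using Python's standard hashlib; sha256_of_file and the example file name in the comment are placeholders.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (path is a placeholder for the downloaded archive):
# print(sha256_of_file("bowtie2-2.3.0.tar.gz"))
```

Reading in chunks keeps memory use flat for large archives.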

Not multi-threading during build

I am having multi-threading issues. I'm running this line
bowtie2-build -f --noref --threads 19 ./Genes.fasta GENESDB
and checked the flux system.
It indicates that it is NOT using multiple threads (if it were the %CPU would be 1900).

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24190 tname 20 0 0.140t 0.139t 1392 R 100.0 14.1 368:08.11 bowtie2-build-l

I have a 123GB fasta file of genes, so it can take weeks to finish (I've run it for weeks and flux kills the job before it finishes). It would really help to have it multi-thread on this.

FASTA parser

Bowtie2 seems to incorrectly parse FASTA record definitions with extra greater-than character(s) appearing in them. According to the accepted consensus, FASTA definitions occupy whole lines and only the first '>' symbol on a line serves as the definition line indicator.

E.g.:

$ cat myreads.fa
>A <B> <C>
NNNNNNNNNN
$ bowtie2 -x myref --no-hd -fU myreads.fa
Warning: skipping read 'A <B' because length (1) <= # seed mismatches (0)
Warning: skipping read 'A <B' because it was < 2 characters long
Warning: skipping read ' <C' because length (1) <= # seed mismatches (0)
Warning: skipping read ' <C' because it was < 2 characters long
3 reads; of these:
  3 (100.00%) were unpaired; of these:
    3 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
A   4   *   0   0   *   *   0   0   N   I   YT:Z:UU YF:Z:LN
    4   *   0   0   *   *   0   0   C   I   YT:Z:UU YF:Z:LN
2   4   *   0   0   *   *   0   0   NNNNNNNNNN  IIIIIIIIII  YT:Z:UU YF:Z:NS
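For comparison, here is a sketch of the behavior this report expects, in which only a '>' at the start of a line begins a new record. It is not Bowtie 2's parser; parse_fasta is a hypothetical function written for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (definition, sequence) tuples.

    Only a '>' in the first column starts a record; any later '>' or '<'
    characters on that line are kept as part of the definition.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


print(parse_fasta(">A <B> <C>\nNNNNNNNNNN\n"))
# [('A <B> <C>', 'NNNNNNNNNN')]
```

Under this reading, myreads.fa contains one read of length 10 rather than three reads.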

All output goes to stderr

All output of bowtie2 goes to stderr, even stuff like:

31645137 reads; of these:
31645137 (100.00%) were unpaired; of these:
31613688 (99.90%) aligned 0 times
30390 (0.10%) aligned exactly 1 time
1059 (0.00%) aligned >1 times
0.10% overall alignment rate

Shouldn't this go to stdout instead? It took me two days to figure out why my logs were empty.
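Until that changes, a wrapper script can capture the two streams separately so the summary ends up in a log. The sketch below uses Python's subprocess module; run_capture is a hypothetical helper, and the child process is a stand-in that writes one line to each stream, not bowtie2 itself.

```python
import subprocess
import sys


def run_capture(cmd):
    """Run cmd and return its (stdout, stderr) as text."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout, done.stderr


# Stand-in child process (not bowtie2) that writes to both streams.
child = [
    sys.executable,
    "-c",
    "import sys; print('alignments'); print('summary', file=sys.stderr)",
]
out, err = run_capture(child)
print(repr(out), repr(err))
```

In a shell pipeline the equivalent is to redirect stderr to a file with 2>, so that a log built only from stdout does not come up empty.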

Match Bonus causes segfault in local mode

Below is a scenario in which certain reads, apparently those that exercise the match bonus code, cause bowtie2 to segfault. If I run the code in debug mode, I get an assertion failure, apparently related to the match bonus (--ma) being non-zero. If I set --ma to zero (0), the alignment runs, but the alignments are suspect, since this changes behavior that is documented as:

"The best possible score in local mode equals the match bonus times the length of the read. This happens when there are no differences between the read and the reference."

Which would result in a score of zero.

The two read files (renamed to .txt for upload) are the smallest set of paired reads that I could get to recreate the issue.

Here's my attempt at isolating the issue:

Version

[root@cmp001 bowtie2-2.2.9]# ./bowtie2 --version
/root/bowtie2-2.2.9/bowtie2-align-s version 2.2.9
64-bit
Built on localhost.localdomain
Thu Apr 21 18:36:37 EDT 2016
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

Setting environment

[root@cmp001 bowtie2-2.2.9]# BOWTIE2_INDEXES=/data/aligner/bowtie2/indexes/HOMO_SAPIEN/hg19
[root@cmp001 bowtie2-2.2.9]# export BOWTIE2_INDEXES

Attempted Alignment

[root@cmp001 bowtie2-2.2.9]# ./bowtie2 --local --very-fast --ma 2 --mp 6 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-0.6,-0.6 --skip 0 --trim5 0 --trim3 0 --phred33 --mm -x hg19 -1 ~/segfault-reads-1 -2 ~/segfault-reads-2 -S /tmp/out.sam
(ERR): bowtie2-align died with signal 11 (SEGV)

Debugging Attempted Alignment

[root@cmp001 bowtie2-2.2.9]# ./bowtie2 --debug --local --very-fast --ma 2 --mp 6 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-0.6,-0.6 --skip 0 --trim5 0 --trim3 0 --phred33 --mm -x hg19 -1 ~/segfault-reads-1 -2 ~/segfault-reads-2 -S /tmp/out.sam
Warning: Running in debug mode.  Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
assert_gt: expected (0) > (0)
aligner_swsse_loc_u8.cpp:314
bowtie2-align-s-debug: aligner_swsse_loc_u8.cpp:314: TAlScore SwAligner::alignGatherLoc8(int&, bool): Assertion `0' failed.
(ERR): bowtie2-align died with signal 6 (ABRT)

Setting --ma to zero

[root@cmp001 bowtie2-2.2.9]# ./bowtie2 --debug --local --very-fast --ma 0 --mp 6 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-0.6,-0.6 --skip 0 --trim5 0 --trim3 0 --phred33 --mm -x hg19 -1 ~/segfault-reads-1 -2 ~/segfault-reads-2 -S /tmp/out.sam
Warning: Running in debug mode.  Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
256 reads; of these:
  256 (100.00%) were paired; of these:
    85 (33.20%) aligned concordantly 0 times
    171 (66.80%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    85 pairs aligned concordantly 0 times; of these:
      6 (7.06%) aligned discordantly 1 time
    ----
    79 pairs aligned 0 times concordantly or discordantly; of these:
      158 mates make up the pairs; of these:
        89 (56.33%) aligned 0 times
        62 (39.24%) aligned exactly 1 time
        7 (4.43%) aligned >1 times
82.62% overall alignment rate

Running with --ma set to zero, without debugging (yields poor alignment)

[root@cmp001 bowtie2-2.2.9]# ./bowtie2 --local --very-fast --ma 0 --mp 6 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-0.6,-0.6 --skip 0 --trim5 0 --trim3 0 --phred33 --mm -x hg19 -1 ~/segfault-reads-1 -2 ~/segfault-reads-2 -S /tmp/out.sam
256 reads; of these:
  256 (100.00%) were paired; of these:
    85 (33.20%) aligned concordantly 0 times
    171 (66.80%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    85 pairs aligned concordantly 0 times; of these:
      6 (7.06%) aligned discordantly 1 time
    ----
    79 pairs aligned 0 times concordantly or discordantly; of these:
      158 mates make up the pairs; of these:
        89 (56.33%) aligned 0 times
        62 (39.24%) aligned exactly 1 time
        7 (4.43%) aligned >1 times
82.62% overall alignment rate

segfault-reads-1.txt
segfault-reads-2.txt

Better Input Checking for Minimum Score

I used --score-min L,18 --end-to-end, but the software did not stop with an error saying that the maximum possible score in --end-to-end mode is 0. This could be improved.
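The kind of check being requested could look roughly like the sketch below. It assumes the function types described in the Bowtie 2 manual (C constant, L linear, S square root, G natural log), a maximum end-to-end score of 0, and a maximum local score of match bonus times read length. The function names are hypothetical, and because the report gives only "L,18", the coefficient is assumed to be 0 in the example.

```python
import math


def min_score(func, const, coeff, read_len):
    """Evaluate a --score-min style function for a given read length."""
    if func == "C":
        return const
    if func == "L":
        return const + coeff * read_len
    if func == "S":
        return const + coeff * math.sqrt(read_len)
    if func == "G":
        return const + coeff * math.log(read_len)
    raise ValueError(f"unknown function type: {func!r}")


def max_score(read_len, local, match_bonus=2):
    """Best possible score: 0 end-to-end, match_bonus * length in local mode."""
    return match_bonus * read_len if local else 0


def threshold_is_reachable(func, const, coeff, read_len, local):
    """True if at least a perfect alignment could meet the threshold."""
    return min_score(func, const, coeff, read_len) <= max_score(read_len, local)


# --score-min L,18 with --end-to-end (coefficient assumed to be 0)
print(threshold_is_reachable("L", 18, 0, 50, local=False))  # False
```

A front-end check like this could emit an error before alignment starts instead of silently reporting no alignments.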

TopHat2 (Bowtie2) hangs at Generating SAM header

My understanding is that this concerns a Bowtie2 call in TopHat2; with essentially default parameters:

$ tophat -p 2 -r 56 /mnt/data/AGPv3/AGPv3 s_1_1_paired-trimmed.fq.gz s_1_2_paired-trimmed.fq.gz

the process stalls at "Generating SAM header", consuming 100% of one CPU but never (over several days) proceeding:

[2015-02-21 08:37:44] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2015-02-21 08:37:44] Checking for Bowtie
          Bowtie version:    2.2.4.0
[2015-02-21 08:37:44] Checking for Bowtie index files (genome)..
[2015-02-21 08:37:44] Checking for reference FASTA file
[2015-02-21 08:37:44] Generating SAM header for /mnt/data/AGPv3/AGPv3

I'm pretty sure it's stuck, not just taking a long time on this pretty fast Xeon machine. I'd appreciate any tips on what to alter to get it past this stage!

Making Small Indexes

Since TopHat doesn't accept bt2l index files, is there a way to override bowtie2-build and force it to build the smaller *.bt2 index files? I tried

bowtie2-build-s hg19.fa

But it still generated a large index. I am using bowtie2-2.2.4.

problems compiling on snow leopard MacOSX

Hello,

Just cloned bowtie2 to a Snow Leopard MacOSX with g++ from homebrew. After typing make, I get the following error:

$ make
/usr/local/homebrew/bin/g++ -O3 -m64 -msse2  -funroll-loops -g3 -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \
                -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`/usr/local/homebrew/bin/g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE  -DBOWTIE_MM  -DBOWTIE2 -DNDEBUG -Wall \
                 -I third_party \
                -o bowtie2-build-s bt2_build.cpp \
                ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp diff_sample.cpp bowtie_build_main.cpp \
                -lpthread
bt2_idx.h:449:suffix or operands invalid for `popcnt'
bt2_idx.h:449:suffix or operands invalid for `popcnt'
reference.cpp: In member function 'int BitPairReference::getStretch(uint32_t*, size_t, size_t, size_t) const':
reference.cpp:456:12: warning: variable 'origBufOff' set but not used [-Wunused-but-set-variable]
   uint64_t origBufOff = bufOff;
            ^
reference.cpp:450:7: warning: variable 'binarySearched' set but not used [-Wunused-but-set-variable]
  bool binarySearched = false;
       ^
make: *** [bowtie2-build-s] Error 1

Any ideas of what to try next? Thank you,
Paul

Bowtie2-build fails if any all-N sequences are present

When bowtie2-build encounters a sequence of all Ns (which may occur when processing repeat-masked contigs), it fails with the following error:
*** glibc detected *** bowtie2-build: double free or corruption (out): 0x00000000046805c0 ***

It appears this issue has been documented by others as well - a google search of the error yielded this result: ibest/ARC#33

Bowtie2 local alignment issue when reference seqs are shorter than 20bp

I am trying Bowtie2 local alignment (bowtie2 --local) to align fastq reads to a set of reference sequences. I have thousands of very short reference sequences (I treat them as many short chromosomes). I observed that bowtie2 --local was not able to align any reads to the reference sequences shorter than 20bp, whereas it worked fine for reference sequences above 21bp. I wonder if there is any solution.

Error in bowtie2-build if multiple threads used

If I use the --threads option with bowtie2-build, an error eventually occurs:

Settings:
  Output files: "test.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  test.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 1485
Using parameters --bmax 1114 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 1114 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 5; iterating...
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 741.625 (target: 1113)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 8
Getting block 2 of 8
Getting block 4 of 8
Could not open file for reading a reference graph: "test.0.saGetting block 5 of 8
"
  Reserving size (1114) for bucket 5
  Reserving size (1114) for bucket 4
  Reserving size (1114) for bucket 2
Getting block 6 of 8
Getting block 7 of 8
Getting block 3 of 8
  Calculating Z arrays for bucket 2
  Calculating Z arrays for bucket 5
  Reserving size (1114) for bucket 6
Getting block 8 of 8
  Entering block accumulator loop for bucket 5:
  Calculating Z arrays for bucket 6
  Calculating Z arrays for bucket 4
  bucket 5: 10%
  Entering block accumulator loop for bucket 2:
  Reserving size (1114) for bucket 8
  bucket 5: 20%
  Reserving size (1114) for bucket 1
  Reserving size (1114) for bucket 7
  Entering block accumulator loop for bucket 4:
  bucket 5: 30%
  Reserving size (1114) for bucket 3
  Calculating Z arrays for bucket 1
  Calculating Z arrays for bucket 7
  bucket 2: 10%
  bucket 4: 10%
  Entering block accumulator loop for bucket 1:
  bucket 2: 20%
  Entering block accumulator loop for bucket 7:
  Calculating Z arrays for bucket 3
  bucket 5: 40%
  Entering block accumulator loop for bucket 6:
  bucket 2: 30%
  Calculating Z arrays for bucket 8
  bucket 4: 20%
  bucket 5: 50%
  bucket 1: 10%
  bucket 7: 10%
  Entering block accumulator loop for bucket 8:
  bucket 1: 20%
  bucket 7: 20%
  bucket 8: 10%
  bucket 2: 40%
  bucket 4: 30%
  Entering block accumulator loop for bucket 3:
  bucket 8: 20%
  bucket 2: 50%
  bucket 6: 10%
  bucket 4: 40%
  bucket 5: 60%
  bucket 3: 10%
  bucket 2: 60%
  bucket 7: 30%
  bucket 1: 30%
  bucket 5: 70%
  bucket 3: 20%
  bucket 1: 40%
  bucket 6: 20%
  bucket 4: 50%
  bucket 8: 30%
  bucket 2: 70%
  bucket 7: 40%
  bucket 1: 50%
  bucket 6: 30%
  bucket 4: 60%
  bucket 8: 40%
  bucket 3: 30%
  bucket 1: 60%
  bucket 7: 50%
  bucket 4: 70%
  bucket 2: 80%
  bucket 3: 40%
  bucket 8: 50%
  bucket 6: 40%
  bucket 7: 60%
  bucket 3: 50%
  bucket 4: 80%
  bucket 5: 80%
  bucket 2: 90%
  bucket 3: 60%
  bucket 8: 60%
  bucket 6: 50%
  bucket 2: 100%
  bucket 8: 70%
  Sorting block of length 866 for bucket 2
  bucket 1: 70%
  bucket 5: 90%
  bucket 8: 80%
  (Using difference cover)
  bucket 3: 70%
  bucket 7: 70%
  bucket 1: 80%
  bucket 8: 90%
  bucket 6: 60%
  bucket 5: 100%
  bucket 3: 80%
  bucket 4: 90%
  bucket 6: 70%
  bucket 8: 100%
  bucket 3: 90%
  Sorting block of length 449 for bucket 8
  (Using difference cover)
  bucket 7: 80%
  bucket 1: 90%
  bucket 4: 100%
  bucket 6: 80%
  Sorting block of length 973 for bucket 4
  bucket 3: 100%
  (Using difference cover)
  bucket 6: 90%
  bucket 7: 90%
  Sorting block of length 853 for bucket 5
  (Using difference cover)
  Sorting block of length 909 for bucket 3
  (Using difference cover)
  bucket 1: 100%
  bucket 6: 100%
  bucket 7: 100%
  Sorting block of length 439 for bucket 1
  (Using difference cover)
  Sorting block of length 761 for bucket 6
  (Using difference cover)
  Sorting block of length 683 for bucket 7
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 867 for bucket 2
  Sorting block time: 00:00:00
  Sorting block time: 00:00:00
  Sorting block time: 00:00:00
  Sorting block time: 00:00:00
Returning block of 440 for bucket 1
  Sorting block time: 00:00:00
Returning block of 974 for bucket 4
Returning block of 450 for bucket 8
Returning block of 910 for bucket 3
Returning block of 854 for bucket 5
  Sorting block time: 00:00:00
  Sorting block time: 00:00:00
Returning block of 762 for bucket 6
Returning block of 684 for bucket 7
Total time for call to driver() for forward index: 00:00:00
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 --threads 10 test.fa test 
Deleting "test.3.bt2" file written during aborted indexing attempt.
Deleting "test.4.bt2" file written during aborted indexing attempt.
Deleting "test.1.bt2" file written during aborted indexing attempt.
Deleting "test.2.bt2" file written during aborted indexing attempt.

However, if I remove the --threads option, it works:

Settings:
  Output files: "test.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  test.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 1485
Using parameters --bmax 1114 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 1114 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 5; iterating...
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 1; iterating...
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 741.625 (target: 1113)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 8
  Reserving size (1114) for bucket 1
  Calculating Z arrays for bucket 1
  Entering block accumulator loop for bucket 1:
  bucket 1: 10%
  bucket 1: 20%
  bucket 1: 30%
  bucket 1: 40%
  bucket 1: 50%
  bucket 1: 60%
  bucket 1: 70%
  bucket 1: 80%
  bucket 1: 90%
  bucket 1: 100%
  Sorting block of length 439 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 440 for bucket 1
Getting block 2 of 8
  Reserving size (1114) for bucket 2
  Calculating Z arrays for bucket 2
  Entering block accumulator loop for bucket 2:
  bucket 2: 10%
  bucket 2: 20%
  bucket 2: 30%
  bucket 2: 40%
  bucket 2: 50%
  bucket 2: 60%
  bucket 2: 70%
  bucket 2: 80%
  bucket 2: 90%
  bucket 2: 100%
  Sorting block of length 866 for bucket 2
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 867 for bucket 2
Getting block 3 of 8
  Reserving size (1114) for bucket 3
  Calculating Z arrays for bucket 3
  Entering block accumulator loop for bucket 3:
  bucket 3: 10%
  bucket 3: 20%
  bucket 3: 30%
  bucket 3: 40%
  bucket 3: 50%
  bucket 3: 60%
  bucket 3: 70%
  bucket 3: 80%
  bucket 3: 90%
  bucket 3: 100%
  Sorting block of length 1035 for bucket 3
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 1036 for bucket 3
Getting block 4 of 8
  Reserving size (1114) for bucket 4
  Calculating Z arrays for bucket 4
  Entering block accumulator loop for bucket 4:
  bucket 4: 10%
  bucket 4: 20%
  bucket 4: 30%
  bucket 4: 40%
  bucket 4: 50%
  bucket 4: 60%
  bucket 4: 70%
  bucket 4: 80%
  bucket 4: 90%
  bucket 4: 100%
  Sorting block of length 731 for bucket 4
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 732 for bucket 4
Getting block 5 of 8
  Reserving size (1114) for bucket 5
  Calculating Z arrays for bucket 5
  Entering block accumulator loop for bucket 5:
  bucket 5: 10%
  bucket 5: 20%
  bucket 5: 30%
  bucket 5: 40%
  bucket 5: 50%
  bucket 5: 60%
  bucket 5: 70%
  bucket 5: 80%
  bucket 5: 90%
  bucket 5: 100%
  Sorting block of length 969 for bucket 5
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 970 for bucket 5
Getting block 6 of 8
  Reserving size (1114) for bucket 6
  Calculating Z arrays for bucket 6
  Entering block accumulator loop for bucket 6:
  bucket 6: 10%
  bucket 6: 20%
  bucket 6: 30%
  bucket 6: 40%
  bucket 6: 50%
  bucket 6: 60%
  bucket 6: 70%
  bucket 6: 80%
  bucket 6: 90%
  bucket 6: 100%
  Sorting block of length 761 for bucket 6
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 762 for bucket 6
Getting block 7 of 8
  Reserving size (1114) for bucket 7
  Calculating Z arrays for bucket 7
  Entering block accumulator loop for bucket 7:
  bucket 7: 10%
  bucket 7: 20%
  bucket 7: 30%
  bucket 7: 40%
  bucket 7: 50%
  bucket 7: 60%
  bucket 7: 70%
  bucket 7: 80%
  bucket 7: 90%
  bucket 7: 100%
  Sorting block of length 683 for bucket 7
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 684 for bucket 7
Getting block 8 of 8
  Reserving size (1114) for bucket 8
  Calculating Z arrays for bucket 8
  Entering block accumulator loop for bucket 8:
  bucket 8: 10%
  bucket 8: 20%
  bucket 8: 30%
  bucket 8: 40%
  bucket 8: 50%
  bucket 8: 60%
  bucket 8: 70%
  bucket 8: 80%
  bucket 8: 90%
  bucket 8: 100%
  Sorting block of length 449 for bucket 8
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 450 for bucket 8
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 2117
fchr[G]: 3176
fchr[T]: 4236
fchr[$]: 5940
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4196453 bytes to primary EBWT file: test.1.bt2
Wrote 1492 bytes to secondary EBWT file: test.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 5940
    bwtLen: 5941
    sz: 1485
    bwtSz: 1486
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 372
    offsSz: 1488
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 31
    numLines: 31
    ebwtTotLen: 1984
    ebwtTotSz: 1984
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:00:00
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
  Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 1485
Using parameters --bmax 1114 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 1114 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 741.625 (target: 1113)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 8
  Reserving size (1114) for bucket 1
  Calculating Z arrays for bucket 1
  Entering block accumulator loop for bucket 1:
  bucket 1: 10%
  bucket 1: 20%
  bucket 1: 30%
  bucket 1: 40%
  bucket 1: 50%
  bucket 1: 60%
  bucket 1: 70%
  bucket 1: 80%
  bucket 1: 90%
  bucket 1: 100%
  Sorting block of length 1020 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 1021 for bucket 1
Getting block 2 of 8
  Reserving size (1114) for bucket 2
  Calculating Z arrays for bucket 2
  Entering block accumulator loop for bucket 2:
  bucket 2: 10%
  bucket 2: 20%
  bucket 2: 30%
  bucket 2: 40%
  bucket 2: 50%
  bucket 2: 60%
  bucket 2: 70%
  bucket 2: 80%
  bucket 2: 90%
  bucket 2: 100%
  Sorting block of length 469 for bucket 2
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 470 for bucket 2
Getting block 3 of 8
  Reserving size (1114) for bucket 3
  Calculating Z arrays for bucket 3
  Entering block accumulator loop for bucket 3:
  bucket 3: 10%
  bucket 3: 20%
  bucket 3: 30%
  bucket 3: 40%
  bucket 3: 50%
  bucket 3: 60%
  bucket 3: 70%
  bucket 3: 80%
  bucket 3: 90%
  bucket 3: 100%
  Sorting block of length 975 for bucket 3
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 976 for bucket 3
Getting block 4 of 8
  Reserving size (1114) for bucket 4
  Calculating Z arrays for bucket 4
  Entering block accumulator loop for bucket 4:
  bucket 4: 10%
  bucket 4: 20%
  bucket 4: 30%
  bucket 4: 40%
  bucket 4: 50%
  bucket 4: 60%
  bucket 4: 70%
  bucket 4: 80%
  bucket 4: 90%
  bucket 4: 100%
  Sorting block of length 469 for bucket 4
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 470 for bucket 4
Getting block 5 of 8
  Reserving size (1114) for bucket 5
  Calculating Z arrays for bucket 5
  Entering block accumulator loop for bucket 5:
  bucket 5: 10%
  bucket 5: 20%
  bucket 5: 30%
  bucket 5: 40%
  bucket 5: 50%
  bucket 5: 60%
  bucket 5: 70%
  bucket 5: 80%
  bucket 5: 90%
  bucket 5: 100%
  Sorting block of length 677 for bucket 5
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 678 for bucket 5
Getting block 6 of 8
  Reserving size (1114) for bucket 6
  Calculating Z arrays for bucket 6
  Entering block accumulator loop for bucket 6:
  bucket 6: 10%
  bucket 6: 20%
  bucket 6: 30%
  bucket 6: 40%
  bucket 6: 50%
  bucket 6: 60%
  bucket 6: 70%
  bucket 6: 80%
  bucket 6: 90%
  bucket 6: 100%
  Sorting block of length 964 for bucket 6
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 965 for bucket 6
Getting block 7 of 8
  Reserving size (1114) for bucket 7
  Calculating Z arrays for bucket 7
  Entering block accumulator loop for bucket 7:
  bucket 7: 10%
  bucket 7: 20%
  bucket 7: 30%
  bucket 7: 40%
  bucket 7: 50%
  bucket 7: 60%
  bucket 7: 70%
  bucket 7: 80%
  bucket 7: 90%
  bucket 7: 100%
  Sorting block of length 912 for bucket 7
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 913 for bucket 7
Getting block 8 of 8
  Reserving size (1114) for bucket 8
  Calculating Z arrays for bucket 8
  Entering block accumulator loop for bucket 8:
  bucket 8: 10%
  bucket 8: 20%
  bucket 8: 30%
  bucket 8: 40%
  bucket 8: 50%
  bucket 8: 60%
  bucket 8: 70%
  bucket 8: 80%
  bucket 8: 90%
  bucket 8: 100%
  Sorting block of length 447 for bucket 8
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 448 for bucket 8
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 2117
fchr[G]: 3176
fchr[T]: 4236
fchr[$]: 5940
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4196453 bytes to primary EBWT file: test.rev.1.bt2
Wrote 1492 bytes to secondary EBWT file: test.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 5940
    bwtLen: 5941
    sz: 1485
    bwtSz: 1486
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 372
    offsSz: 1488
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 31
    numLines: 31
    ebwtTotLen: 1984
    ebwtTotSz: 1984
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00

This makes me very sad...

Bowtie 2 manual claims --score-min default in --local mode is G,20,8, but is actually G,0,10

The bowtie 2 manual entry for --score-min claims:
"... The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8"

However, when --local (or one of the *-local presets) is specified and --score-min G,20,8 is provided, the results are different than when --score-min is left off the command line.

Upon inspection of the source code, I found within scoring.h:

#define DEFAULT_MIN_CONST_LOCAL (0.0f)

#define DEFAULT_MIN_LINEAR_LOCAL (10.0f)

These constants are used to initialize the costMin variable in aligner_seed_policy.cpp.

Providing --score-min G,0,10 in local mode makes the results consistent with the default behavior.
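To see how far apart the two settings are, here is a minimal sketch. It assumes the manual's function syntax, in which G,A,B means f(x) = A + B * ln(x) with x the read length. The helper name min_score_log is hypothetical and not part of Bowtie 2.

```python
import math

def min_score_log(const, coeff, read_len):
    """Evaluate a 'G,<const>,<coeff>' threshold: const + coeff * ln(read_len)."""
    return const + coeff * math.log(read_len)

for read_len in (50, 100, 150):
    documented = min_score_log(20, 8, read_len)   # G,20,8 as stated in the manual
    in_source = min_score_log(0, 10, read_len)    # G,0,10 as found in scoring.h
    print(f"len={read_len}: G,20,8 -> {documented:.1f}, G,0,10 -> {in_source:.1f}")
```

The two functions only meet where ln(x) = 10, so for ordinary read lengths G,20,8 is the stricter threshold. That would explain why passing it explicitly changes the results.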

option --dovetail or --no-dovetail

I found that the online manual for bowtie 2.2.9 documents a --dovetail option, but the help message of the bowtie2 program lists no --dovetail, only a --no-dovetail option.
I also found that the 'gDovetailMatesOK' is false by default in file bt2_search.cpp#L10.

May I set 'gDovetailMatesOK' to true by enabling --dovetail?

Perhaps the online manual or the program help message should be updated.

Not aligning real-simple matches

I have a full file of 3M ribosomal genes and proteins that I align my reads to first; I then align the unmatched reads (--un-gz) to my protein database.

I noticed I still got reads that map at 100% ID on BLASTN to ribosomal RNAs slipping through into the unmatched file, though the gene that they map to on BLASTN is in the index/database. To verify this, I added a PolyA "gene" of 300 As to the database/index. There are still many poly A reads (e.g., see below) ending up in the "not matched" file, with similar reads ending up in the "matched" file. This tells me bowtie is NOT aligning reads correctly - missing many real matches.

@Uniq1129;size=4207;
CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

Bowtie2 character encoding and format error

Although bowtie2 detects errors in the fastq file, it does not give enough information to determine where the error is.
For example, an error like this:
"Saw ASCII character 0 but expected 33-based Phred qual.
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) "
is thrown if there is a flipped bit somewhere in the fastq file or the fastq file is incorrectly formatted (e.g. missing read seq or quality).
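Until the error message carries a position, a small pre-check can locate the offending record. The following is a sketch only: it assumes the common four-lines-per-record FASTQ layout (no wrapped sequences), and find_fastq_problem is a hypothetical helper, not something bowtie2 provides.

```python
def find_fastq_problem(lines, phred_offset=33):
    """Return (line_number, message) for the first malformed record, else None.

    Assumes four lines per record; line numbers are 1-based.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            return (start + 1, "truncated record (fewer than 4 lines)")
        header, seq, plus, qual = record
        if not header.startswith("@"):
            return (start + 1, "header does not start with '@'")
        if not plus.startswith("+"):
            return (start + 3, "separator line does not start with '+'")
        if len(seq) != len(qual):
            return (start + 4, "sequence and quality lengths differ")
        for ch in qual:
            # Valid Phred+33 quality characters lie between the offset and '~' (126).
            if ord(ch) < phred_offset or ord(ch) > 126:
                return (start + 4, f"quality character with ASCII code {ord(ch)}")
    return None
```

Calling it on an open file handle, e.g. find_fastq_problem(open("reads.fq")), returns the line to inspect, or None if no problem of these kinds is found.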

Make Bowtie2 architecture independent

Hi @BenLangmead, any plan to make Bowtie2 architecture independent? Bowtie2 won't even compile for other architectures, due to hard-coded x86 SSE optimizations. It would be useful to have an option to turn this off if running this on non-x86 hardware platforms, or add in optimisations for other architectures, i.e., ARM.

bowtie2 uses high sys %Cpu

Hi.

bowtie2 uses a high share of sys %Cpu.
Does bowtie2 make system calls that consume a lot of CPU?

#top
Tasks: 1537 total,   5 running, 1532 sleeping,   0 stopped,   0 zombie
%Cpu(s): 75.7 us, 22.1 sy,  0.0 ni,  2.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 10567477+total, 96510118+free, 15082620 used, 76563960 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 10307786+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
178984 root      20   0 7494520 2.401g   3952 S  3463  0.2   1492:21 bowtie2-align-s
178955 root      20   0 7494520 2.431g   4064 S  3461  0.2   1491:43 bowtie2-align-s
179004 root      20   0 7494520 2.411g   4012 S  3459  0.2   1491:40 bowtie2-align-s
178928 root      20   0 7494520 3.101g   3952 S  3453  0.3   1489:54 bowtie2-align-s
178949 root      20   0  556796 526120   2064 R  66.1  0.0  27:38.28 samtools
178977 root      20   0  556796 525160   2100 R  64.8  0.0  27:46.40 samtools
179001 root      20   0  556796 525632   1964 R  64.8  0.0  27:45.89 samtools
178922 root      20   0  556796 524080   1952 R  64.5  0.0  27:44.98 samtools
181966 root      20   0   57680   5688   3508 R   1.6  0.0   0:00.43 top
181728 root       0 -20       0      0      0 S   0.7  0.0   0:01.82 kworker/117:2H
  5625 root      20   0   19684   3076   2400 S   0.3  0.0   0:08.69 irqbalance

# uptime
 15:06:31 up  2:22,  1 user,  load average: 144.45, 144.30, 137.25
->no overload thread

# bowtie2 --version
/usr/hpc-bio/bowtie/bin/bowtie2-align-s version 2.2.9
64-bit
Built on localhost.localdomain
Thu Apr 21 18:36:37 EDT 2016
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}


OS:CentOS  7.2
This happened on both kernel versions.
# uname -a
Linux R930 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
# uname -a
Linux R930 4.4.36-1.el7.elrepo.x86_64 #1 SMP Fri Dec 2 10:57:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

bowtie2-align-s vs bowtie2-align-l

The new release introduces 2 new binaries,
bowtie2-align-l
bowtie2-align-s
instead of bowtie2-align.
Could you please note somewhere in the documentation what the difference is?
Wouldn't it be better to keep the old name for the default use case to avoid problems with other tools such as tophat?
Thank you.

mapping bias for multimapped reads

This is an issue reported by Levi Teitz over email. I am adding it here to make sure we do not forget about it and that it is investigated during the next release cycle.

Hi Ben,

Thanks for the quick response. I am sure that the difference in coverage isn't being caused by differences in the sequences. I aligned the files using -k 3 mode, and then went through the resulting sam file and generated a list of all reads that both mapped once to each of the g regions and that had identical alignment scores at both read pairs in all three of those alignments. I then looked through the original alignment files (the ones with the bias, where each read has only one alignment) and counted how many reads from that list were aligned to each of the g regions. The numbers were slightly different for each, but around 11500 mapped to each of g1 and g2, and around 6000 mapped to g3. (Each of the regions is ~300 kb, and the reads are 100 bp paired-end reads, so the total read counts confirm that this isn't missing a significant number of the reads that multimap. Also, I don't think it should matter, but g1 and g2 are on the same strand, and g3 is on the opposite strand.)

I also compared the reads that map to g3 when using default parameters and the reads that map to g3 when using --non-deterministic. In each case around 6000 map to g3 as I mentioned above, but only ~1400 of those two sets of 6000 overlap, which is approximately what the expected number would be if it's random.
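The "expected if random" figure can be checked with quick arithmetic. This sketch rests on two assumptions that are not stated in the email: the pool is the roughly 29,000 listed reads (11,500 + 11,500 + 6,000), and each run sends a read to g3 independently with the same observed probability.

```python
def expected_overlap(total_reads, picked_per_run):
    """Expected size of the intersection of two independent random subsets."""
    p = picked_per_run / total_reads   # chance a read lands on g3 in one run
    return total_reads * p * p         # chance it lands on g3 in both runs

total = 11500 + 11500 + 6000           # approximate counts quoted above
print(round(expected_overlap(total, 6000)))
```

Under these assumptions the expected overlap is on the order of 1,200 reads, which is in the same range as the ~1400 observed.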

Let me know if you want any more information, and thanks again.

Levi

From: Ben Langmead
Sent: Tuesday, March 17, 2015 9:52 PM
To: Levi Shmuel Teitz
Cc: Valentin Antonescu

Hi Levi,

Thanks for this report. CC'ing Valentin Antonescu who helps support Bowtie tools. I've battled imperfect randomness at various points in the Bowtie 2 development process. Sounds like we should take a look at this example.

Before we get too far into it, though, you call these regions "nearly" identical. Are you sure that differences in coverage can't be explained by differences in the sequences g1, g2 and g3?

Best,
Ben

On Mar 17, 2015, at 7:23 PM, Levi Shmuel Teitz wrote:

Dear Dr. Langmead,

I am a graduate student at MIT, and I have been using Bowtie2 (Version 2.1.1) in my research. I've recently been using it to align Illumina reads to the human Y chromosome, and I noticed that in some of the resulting files, there was a serious bias in the locations of multimapping reads. Specifically, there are three nearly identical regions on the Y (g1, g2, and g3); the mean depth of the areas around them was around 7.5, but g3 had a mean depth of 5.3 and g1 and g2 both had a mean depth of around 8.7. This means that the correct number of reads are being mapped to the three g regions in total; there is just a bias against g3. In addition, I ran several tests and determined that reads that map equally well to all three g regions are being mapped to g3 at a much lower frequency. (The alternative would be that reads that actually map to differences between the g regions are the cause of this pattern.) This bias also persisted when I tried rerunning Bowtie2 with the --non-deterministic option. This issue also happened in only some, but not all, of the files I was working with, but the bias was always associated with specific files, regardless of the options used while running Bowtie2.

This seems to me to be a serious problem with Bowtie2; any application which depends on the random assignment of multimapped reads could return false results. Is this an issue that you've encountered before? If not, is this something that can be fixed? I'd be glad to give more details and give what help I can to figure this out.

remove scaffold and other unplaced sequence before mapping ?

Hi,
I downloaded reference genomes from Ensembl (fasta format).
But there are lots of sequences with name "dna:scaffold": https://github.com/CTLife/TEMP/tree/master/RefGenomes

For example, in Mouse_GRCm38 (mm10), should everything except chromosomes 1-19, Mt, X and Y be removed before mapping? https://github.com/CTLife/TEMP/blob/master/RefGenomes/Mouse_GRCm38.p4.txt

Similarly, Human_GRCh38.p5 (hg38), https://github.com/CTLife/TEMP/blob/master/RefGenomes/Human_GRCh38.p5.txt, has 516 sequences. Apart from chromosomes 1-22, Mt, X and Y, should the others (such as CHR_HG2241_PATCH and KI270728.1) be removed before mapping?

Replace std::sort with std::stable_sort

We use std::sort() inside EList::sortPortion(), but different implementations of std::sort() can give different orderings when there are ties. We should replace it with std::stable_sort() and benchmark to ensure that we don't lose much performance.

Ben

Strange sam format with bowtie2

Hello Ben,

I tested bowtie2 on one of my samples and got this error message when I tried to convert the SAM file to BAM:

[sam_read1] reference 'NGACTTTGACCAGGACCAGGTC' is recognized as '*'. Parse error at line 1: invalid CIGAR operation Aborted

The SAM file contains this at the beginning:

0 ACTTCTTCTGTGACTTGGCCCC instantiate aligner_seed.cpp:99 18 CCCCTCTGATCAAACTTTCCTG instantiate aligner_seed.cpp:99 36 CCTGCTCAGATGCAATGATCAA instantiate aligner_seed.cpp:99 0 GGGGCCAAGTCACAGAAGAAGT instantiate aligner_seed.cpp:99 18 CAGGAAAGTTTGATCAGAGGGG instantiate aligner_seed.cpp:99 36 TTGATCATTGCATCTGAGCAGG instantiate aligner_seed.cpp:99 instantiateSeeds aligner_seed.cpp:387 0 CCAGGACAGCTCTGATGATGCA instantiate aligner_seed.cpp:99 18

Any idea ?

Keep /1 and /2 at end of reads

Hi,

I know it's possible to figure out the reads with the flag, but I'm wondering if you might be able to point where in the source code the read names are printed, so that I could modify it to keep the /1 and /2, rather than removing these from the names?
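As a workaround that needs no source changes, the suffix can be put back from the FLAG field after alignment. This is a sketch, not the bowtie2 code path asked about: restore_mate_suffix is a hypothetical helper, and it relies only on the SAM FLAG bits 0x40 (first in pair) and 0x80 (second in pair).

```python
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template

def restore_mate_suffix(sam_line):
    """Append /1 or /2 to the read name of one SAM alignment line."""
    if sam_line.startswith("@"):          # header lines pass through unchanged
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & FIRST_IN_PAIR:
        fields[0] += "/1"
    elif flag & SECOND_IN_PAIR:
        fields[0] += "/2"
    # Unpaired reads (neither bit set) keep their original name.
    return "\t".join(fields) + "\n"
```

Applied line by line to the SAM output, this leaves every other field untouched.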

Thanks in advance!

Incorrect Alignments and CIGAR Strings

The read sequence is ATGCCCAGGTGCTGAAGCCCC. Bowtie 2 maps this to the guide RNA reference sequence ATGAACAGGTTCCGCAGCGG (all of the reference sequences in this analysis are 20 nucleotides long) and gives the mapping a CIGAR of 17M1I3M in the SAM file. This makes no sense. If I use the Needleman-Wunsch algorithm between these two sequences, it correctly identifies many more mismatches.

Experimental       1 ATGCCCAGGTGCTGAAGCCCC     21
                     |||..|||||.|.|.|||.. 
Library            1 ATGAACAGGTTCCGCAGCGG-     20

The best alignment is actually with the reference sequence TGCCCAGGTGCTGAAGCCCC, which Bowtie 2 does not report.
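The numbers in this report can be checked mechanically. The sketch below uses hypothetical helper names and is unrelated to Bowtie 2's own scoring. It computes how many read and reference bases the CIGAR consumes, counts the edits it implies, and confirms that the unreported reference is an exact substring of the read.

```python
import re

def cigar_lengths(cigar):
    """Return (read_bases, ref_bases) consumed by a CIGAR string."""
    read_len = ref_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in "MIS=X":      # operations that consume read bases
            read_len += n
        if op in "MDN=X":      # operations that consume reference bases
            ref_len += n
    return read_len, ref_len

def count_edits(read, ref, cigar):
    """Count mismatches inside M blocks plus inserted and deleted bases."""
    i = j = edits = 0
    for count, op in re.findall(r"(\d+)([MIDS])", cigar):
        n = int(count)
        if op == "M":
            edits += sum(a != b for a, b in zip(read[i:i + n], ref[j:j + n]))
            i, j = i + n, j + n
        elif op == "I":
            edits, i = edits + n, i + n
        elif op == "D":
            edits, j = edits + n, j + n
        else:                  # "S": soft-clipped read bases are skipped, not scored
            i += n
    return edits

read = "ATGCCCAGGTGCTGAAGCCCC"
reported_ref = "ATGAACAGGTTCCGCAGCGG"
better_ref = "TGCCCAGGTGCTGAAGCCCC"

print(cigar_lengths("17M1I3M"))                     # read and reference bases consumed
print(count_edits(read, reported_ref, "17M1I3M"))   # edits implied by the reported CIGAR
print(read[1:] == better_ref)                       # is the other reference an exact match?
```

For these sequences the CIGAR is consistent with a 21-base read against a 20-base reference, but it implies 7 mismatches plus one inserted base, while the read minus its first base matches the second reference exactly.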

This could be the same problem identified previously in #57.

Bowtie2 hangs when loading reverse index

I have an index over a 2.6 G fasta file, which I currently have stored in the smaller format '-s'. I'm basically unable to get past the last line below in the bowtie2 timing:

/usr/local/bin/bowtie2 -t --ma 8 --mp 8 --rdg 3,1 --rfg 5,2 -D 500 -R 10 -L 27 --local --score-min G,50,50 -p 8 -f -x WB-16s-RDP_11.3/wb16s -U s1_p0.br.fasta -S s1_p0.br.sam

Time loading reference: 00:00:01
Time loading forward index: 00:00:03

And it never gets to the next line, which I know to be the loading of the reverse index. How can I debug this? The process has been sitting at 100% CPU and 2.9G memory for over a day now. I've tried rebuilding bowtie2 (which made no difference) and building a smaller index to test the threading (which works as expected).

I'm now trying to build the original index forcing it to use --large-index.

thanks, jim

paired end mode: is --un-conc reporting incompatible with -k mode?

I often use -k mode to find multiple valid alignments per read. I also often use the --un-conc-gz reporting mode to capture non-concordantly aligned read pairs for subsequent analysis. By design, the pair of non-aligned read files have identical numbers of reads in identical matching order, so they are still valid mate pair files.

But when I use -k with a value greater than one during a paired end alignment, the --un-conc files end up with unequal numbers of reads. Clearly some reads are only being written to one of the un-aligned files, breaking the rule for valid mate pair files. In tests, the mate-2 file ends up with more reads, but neither file has the expected number of reads -- given the summary alignment count details reported by Bowtie2.

Are these two modes incompatible, or is finding secondary alignments somehow corrupting what gets reported as non-concordant read pairs? Thanks!
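A quick way to find where the two files diverge is to walk them in parallel and report the first record whose names disagree. This is a sketch with hypothetical helper names, assuming four-line FASTQ records and read names that differ at most by a trailing /1 or /2.

```python
import gzip
from itertools import zip_longest

def read_names(path):
    """Yield read names from a FASTQ file (plain or .gz), four lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:
                yield line[1:].strip().split(" ")[0]

def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so that mate names can be compared."""
    return name[:-2] if name.endswith(("/1", "/2")) else name

def first_mate_mismatch(path1, path2):
    """Return the 0-based record index where the mate files first disagree, else None."""
    pairs = zip_longest(read_names(path1), read_names(path2))
    for idx, (a, b) in enumerate(pairs):
        # A None on either side means one file ran out of records first.
        if a is None or b is None or strip_mate_suffix(a) != strip_mate_suffix(b):
            return idx
    return None
```

The returned index points at the first read written to only one of the two files, which should help narrow down which alignments trigger the problem.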

build issues with 2.2.7, 2.2.8 and TBB

Hi,
I seem to be having some trouble building with WITH_TBB=1
This is for 2.2.7, I could also not build 2.2.6 with TBB though.

Here is what I get when I do WITH_TBB=1 make
TLDR: the build is broken with numerous error: ‘tthread’ was not declared in this scope

and

Here is what I get when I do WITH_TBB=0 make
TLDR: the build goes fine

Can you give me any tips for solving this?

sorry

Sorry, I posted this in the wrong place.

Checksum change

Installing bowtie2 with Homebrew currently fails due to a mismatch between the SHA256 checksum recorded by the Homebrew team and the checksum produced by bowtie2-2.2.6:

==> Installing bowtie2 from homebrew/homebrew-science
==> Downloading https://github.com/BenLangmead/bowtie2/archive/v2.2.6.tar.gz
==> Downloading from https://codeload.github.com/BenLangmead/bowtie2/tar.gz/v2.2.6
######################################################################## 100.0%
Error: SHA256 mismatch
Expected: fb4d09a96700cc929e8191659ee8509bb2f19816235322d1f012338d4a177358
Actual: 06d584040d9ce457873c59e4a5889aafe1a5f74ada207793335765d7abdf4eeb
Archive: /Library/Caches/Homebrew/bowtie2-2.2.6.tar.gz
To retry an incomplete download, remove the file above.

Is this intended? We just need to get confirmation before changing the checksum on our side.
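Anyone who wants to confirm which of the two digests a downloaded archive actually has can recompute it locally. A minimal sketch using Python's standard hashlib; the function name and the chunk size are arbitrary choices.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it on the cached archive named in the log above and comparing the result with the "Expected" and "Actual" lines shows whether the local download matches either value.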

build broken with 2.2.9, 2.3.0, 2.3.1 and TBB

Hi @BenLangmead, @val-antonescu, building the 2.2.9 release WITH_TBB=1 still seems to be broken:
log of failed build WITH_TBB=1
and
log of successful build WITH_TBB=0

Here's a small excerpt of the error log:

blockwise_sa.h:218:5: error: looser throw specifier for ‘KarkkainenBlockwiseSA<TStr>::~KarkkainenBlockwiseSA() noexcept (false) [with TStr = S2bDnaString]’
     ~KarkkainenBlockwiseSA()
     ^
In file included from bt2_idx.h:42:0,
                 from bt2_build.cpp:27:
blockwise_sa.h:178:7: error:   overriding ‘virtual InorderBlockwiseSA<S2bDnaString>::~InorderBlockwiseSA() noexcept’
 class InorderBlockwiseSA : public BlockwiseSA<TStr> {

Bioconda not up to date with bowtie2 2.2.9

Hello,
To keep up to date with the latest packages I use Bioconda to update all my bioIT packages, which I personally find very useful.
I just realized that bowtie2 is still at version 2.2.8 in the Bioconda repository, whereas release 2.2.9 has been out since April 2016.

I'm just wondering if bowtie 2 is no longer promoted on the Bioconda website and if I should download the latest releases directly from your website?

Thanks

Installation of bowtie2 on MacOS using terminal

I'm trying to install bowtie2 (latest version) on MacOS using the terminal. It's not working and gives the following errors:
-bash: bowtie2: command not found
-bash: bowtie2-build: command not found
etc...
I'm using the following platform:
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin16).
Can anyone guide me through installing bowtie2 on MacOS?

Error reading RefRecord 'first' flag

Hi everyone !

I'm having some trouble using bowtie2 through tophat2.

I'm trying to align a bunch of reads against my assembly:


Loutre:~$ tophat2 --num-threads 25 --max-intron-length 50000 -o \
> /media/loutre/SUZUKII/annotation/tophat_out \
> /media/loutre/SUZUKII/annotation/indexes/cleaned_canu3_suzukii \
> '/media/loutre/SUZUKII/annotation/evidences/suz_ant.R1.fastq.gz'\
> ,'/media/loutre/SUZUKII/annotation/evidences/suz_ovi.R1.fastq.gz'\
> ,'/media/loutre/SUZUKII/annotation/evidences/suz_pro.R1.fastq.gz'\
> ,'/media/loutre/SUZUKII/annotation/evidences/suz_tar.R1.fastq.gz'\
>  '/media/loutre/SUZUKII/annotation/evidences/suz_ant.R2.fastq.gz'\
> ,'/media/loutre/SUZUKII/annotation/evidences/suz_ovi.R2.fastq.gz'\
> ,'/media/loutre/SUZUKII/annotation/evidences/suz_pro.R2.fastq.gz'\
> ,'/media/loutre/SUZUKII/annotation/evidences/suz_tar.R2.fastq.gz'

This command seems to work at first, but after approximately 2 hours, it throws a bowtie2 error message:

[2016-08-16 13:31:56] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2016-08-16 13:31:56] Checking for Bowtie
          Bowtie version:    2.1.0.0
[2016-08-16 13:31:56] Checking for Samtools
        Samtools version:    0.1.19.0
[2016-08-16 13:31:56] Checking for Bowtie index files (genome)..
[2016-08-16 13:31:56] Checking for reference FASTA file
[2016-08-16 13:31:56] Generating SAM header for /media/loutre/SUZUKII/annotation/indexes/cleaned_canu3_suzukii
    format:      fastq
    quality scale:   phred33 (default)
[2016-08-16 13:31:56] Preparing reads
     left reads: min. length=100, max. length=100, 180811539 kept reads (47437 discarded)
    right reads: min. length=100, max. length=100, 180039152 kept reads (819824 discarded)
[2016-08-16 15:05:07] Mapping left_kept_reads to genome cleaned_canu3_suzukii with Bowtie2 
    [FAILED]
Error running bowtie:
Error reading RefRecord 'first' flag
Error: Encountered internal Bowtie 2 exception (#1)
Command: /usr/bin/bowtie2-align -q -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 25 --sam-no-hd -x /media/loutre/SUZUKII/annotation/indexes/cleaned_canu3_suzukii - 

Am I doing something wrong ?

Thanks for helping,

Cheers,

Roxane

Multiseed length and policy string parsing

Certain command line arguments that are passed via policy string to SeedAlignmentPolicy::parseString are parsed in an order-specific manner, and can lead to unintended behavior.

Specifically, the multiseed length parameter can be set by both the SEED and SEEDLEN policy string arguments. The SEED policy is set by the -N and --multiseed command-line arguments; SEEDLEN policy is set by the -L command-line argument. If a SEED policy string appears after the SEEDLEN string, it will overwrite the multiseed length parameter specified by SEEDLEN policy. The effect of this is that the -N and -L command-line arguments must be given in the proper order, or else the multiseed length parameter will be set to the default value (22) rather than the value specified by the -L command-line argument.

To recreate this behavior:
bowtie2 -N 1 -L 10 --verbose gives the following policy string: SEED=0,22;DPS=15;ROUNDS=2;IVAL=S,1,1.15;SEED=1;SEEDLEN=10. The multiseed length variable (multiseedLen) is set by both SEED parameters and the SEEDLEN parameter. It is first set to 22 as specified in the first SEED policy string, then to 22 again as a result of the handling of the SEED string, and finally to 10 as per the SEEDLEN parameter from the -L command-line option. multiseedLen value is 10 and the multiseed alignment proceeds as expected.

bowtie2 -L 10 -N 1 --verbose gives the following policy string: SEED=0,22;DPS=15;ROUNDS=2;IVAL=S,1,1.15;SEEDLEN=10;SEED=1. The multiseed length variable (multiseedLen) is set first to 22 as specified in the first SEED policy string, then to 10 by the SEEDLEN parameter. The final SEED parameter results in multiseedLen being overwritten with the default value of 22. This is the result of the if(ctoks.size() >= 2) check (aligner_seed_policy.cpp, line 560), which evaluates as false and resets the multiseedLen value. As a result, multiseedLen value is 22 rather than 10 as specified on the command line. The multiseed alignment will attempt to extract seeds of length 22, which differs from the expected behavior.
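The two cases above can be reproduced with a toy model of the parsing. To be clear, this is a simplified sketch of the behavior as described here, not the code in aligner_seed_policy.cpp. The function name is hypothetical and only the SEED and SEEDLEN keys are modeled.

```python
DEFAULT_SEED_LEN = 22  # default multiseed length quoted in this report

def parse_policy(policy):
    """Toy model of the order-dependent SEED/SEEDLEN handling described above."""
    seed_mms, seed_len = 0, DEFAULT_SEED_LEN
    for setting in policy.split(";"):
        key, _, value = setting.partition("=")
        parts = value.split(",")
        if key == "SEED":
            seed_mms = int(parts[0])
            # Models the reported behavior: a SEED entry without a length
            # falls back to the default, discarding any earlier SEEDLEN.
            seed_len = int(parts[1]) if len(parts) >= 2 else DEFAULT_SEED_LEN
        elif key == "SEEDLEN":
            seed_len = int(parts[0])
        # Other keys (DPS, ROUNDS, IVAL, ...) are ignored in this sketch.
    return seed_mms, seed_len

n_then_l = "SEED=0,22;DPS=15;ROUNDS=2;IVAL=S,1,1.15;SEED=1;SEEDLEN=10"
l_then_n = "SEED=0,22;DPS=15;ROUNDS=2;IVAL=S,1,1.15;SEEDLEN=10;SEED=1"
print(parse_policy(n_then_l))  # -N before -L
print(parse_policy(l_then_n))  # -L before -N
```

In this model the first string ends with a seed length of 10 and the second with 22, matching the two outcomes described.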

I have verified this behavior in bowtie2-2.2.4 as well as the current repository version.

Fix: I don't have a specific fix as I am not a project member, nor am I very familiar with the bowtie2 code base. However, placing the -L argument at the end of the command will work. Alternatively, one can comment out aligner_seed_policy.cpp line 560 without any obvious ill effects.

P.S. The handling of the SEED parameter also gives rise to another unintended problem. In local mode the default seed length is 20. Passing the command-line arguments --local -N 1 will produce a policy string ending with SEED=1, resulting in the multiseed length being reset to 22 rather than the local-mode default of 20.

bowtie2 core dumps on references with long homopolymer runs

I'm aligning (in batches) around 20TB of read data against several thousand microbial genomes. Some of these batches fail with core dumps after a very long runtime (around 10x as long as those that are successful.) I've tried looking into why only certain batches fail, and what I've found is that the genomes it fails on are those which contain long (likely incorrect) homopolymeric repeats. One example is:

gi|257136525|ref|NZ_GG699286.1| Xanthomonas campestris pv. vasculorum NCPPB702 genomic scaffold scf_7293_715, whole genome shotgun sequence

Examples of the homopolymeric stretches:

WARNING: Sequence ID gi|257136525|ref|NZ_GG699286.1| contains a homopolymer run (T) of length 45972
WARNING: Sequence ID gi|257136529|ref|NZ_GG699290.1| contains a homopolymer run (A) of length 131072
WARNING: Sequence ID gi|257136550|ref|NZ_GG699311.1| contains a homopolymer run (T) of length 51385
WARNING: Sequence ID gi|257136550|ref|NZ_GG699311.1| contains a homopolymer run (A) of length 262144
WARNING: Sequence ID gi|257136567|ref|NZ_GG699328.1| contains a homopolymer run (A) of length 61064

Obviously these are incorrect sequences, but many entries like this still appear in the public entries and cause bowtie2 to fail. When I replace them with Ns, bowtie2 runs to completion. Is this a known issue with bowtie2?

(I'm using bowtie2-2.2.4)
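The workaround mentioned above, replacing long homopolymer runs with Ns before indexing, could be sketched as follows. This is a minimal Python illustration, not part of bowtie2; the function name mask_homopolymers and the min_run threshold are assumptions, and a suitable threshold would have to be chosen for the data at hand.

```python
import re


def mask_homopolymers(seq, min_run=1000):
    """Replace every run of one repeated base (A, C, G or T) that is at
    least min_run long with the same number of N characters.

    Hypothetical helper; min_run is an arbitrary illustrative threshold.
    """
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    # One base followed by at least (min_run - 1) copies of itself.
    pattern = re.compile(r"([ACGTacgt])\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


if __name__ == "__main__":
    example = "ACGT" + "A" * 10 + "CG"
    print(mask_homopolymers(example, min_run=5))
```

Because each run is replaced by the same number of Ns, sequence lengths and coordinates are unchanged.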

Multithreading issue, unpaired mode

I'm currently trying to use bowtie2 to align paired reads merged in a single file, and to save time I'm trying to use the multithreading option of bowtie2 (-p/--threads). So far, both on my personal machine and on the cloud solution we're using, bowtie2-align-l (called by bowtie2) runs on a single core for hours and hours (the merged reads file is >= 10GB).

Edit: OK, I think it's my mistake. I've tried with small files and it works nicely; our read files are just too big. You can close the issue.

Spaces in executable path or file paths

There are a couple of places in the bowtie wrapper Perl script that execute a command using Perl's open:

open(BT, "$cmd |")

which passes $cmd to the shell. Unfortunately, if the executable is in a location where the path contains spaces, or if any of the paths to data files contain spaces, the command fails. Could someone fix this?
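The underlying problem is that an unquoted command string is split on whitespace by the shell. The idea of quoting each argument before joining can be shown in Python (the wrapper itself is Perl, so this is only an illustration of the technique, not a patch). The function name build_shell_command and the example paths are hypothetical.

```python
import shlex


def build_shell_command(args):
    """Join arguments into one shell-safe string, quoting any argument
    that contains spaces or other shell metacharacters."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical paths containing spaces.
    args = ["/opt/my tools/bowtie2-align-s", "-x", "my index/ref", "-U", "reads.fq"]
    command = build_shell_command(args)
    print(command)
    # Splitting the quoted string the way a POSIX shell would recovers
    # the original argument list.
    print(shlex.split(command) == args)
```

An alternative that avoids the shell entirely is to pass the arguments as a list, as subprocess.run does in Python; Perl's list form of open is the analogous approach.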

bowtie2-align-s sometimes hangs indefinitely when multiple threads are used

See the attached file for example data. This works as expected:

bowtie2 --threads 2 --very-sensitive --end-to-end -x reference.fasta -1 R1.fastq -2 R2.fastq -U merged.fastq -S output.sam

However, adding the --reorder flag causes this command to hang indefinitely, consuming 100% of CPU:

bowtie2 --threads 2 --reorder --very-sensitive --end-to-end -x reference.fasta -1 R1.fastq -2 R2.fastq -U merged.fastq -S output.sam

If I remove "-U merged.fastq", it works (although it produces different results, of course). If I remove --reorder, it works. If I remove "--threads 2" it works. It's the combination of multiple threads, --reorder, and a merged fastq that triggers this bug.

bowtie2_threads_example.zip

Extracting read pairs that have been concordantly aligned exactly 1 time

I have the following alignment summary:

Multiseed full-index search: 00:02:04
771613 reads; of these:
  771613 (100.00%) were paired; of these:
    63890 (8.28%) aligned concordantly 0 times
    573948 (74.38%) aligned concordantly exactly 1 time
    133775 (17.34%) aligned concordantly >1 times
    ----
    63890 pairs aligned concordantly 0 times; of these:
      29942 (46.86%) aligned discordantly 1 time
    ----
    33948 pairs aligned 0 times concordantly or discordantly; of these:
      67896 mates make up the pairs; of these:
        40102 (59.06%) aligned 0 times
        10516 (15.49%) aligned exactly 1 time
        17278 (25.45%) aligned >1 times
97.40% overall alignment rate

I want to extract read pairs that have been concordantly aligned exactly 1 time.

I can get those pairs that have been aligned concordantly either exactly 1 time or more than 1 time using the following command:

samtools view -h in.bam | grep "YT:Z:CP" | wc -l

This gives 1415446 = 2 x 573948 + 2 x 133775

I have also tried

samtools view -h in.bam | grep "YT:Z:CP" | grep -v "XS:i" | wc -l

This gives 1123839, but the expected result is 1147896 (2 x 573948).
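Note that 1123839 is an odd number, so the line-by-line grep must be dropping one mate of some pairs while keeping the other. One possible alternative is to apply the filter per pair rather than per line. The Python sketch below keeps a pair only if both mates carry YT:Z:CP and neither carries an XS:i tag. It is an assumption, not something confirmed in this thread, that the absence of XS:i on both mates corresponds to bowtie2's "aligned concordantly exactly 1 time" category, so the resulting count may still differ from the summary. The function name unique_concordant_pairs is hypothetical.

```python
from collections import defaultdict


def unique_concordant_pairs(sam_lines):
    """Return names of read pairs in which both mates are tagged YT:Z:CP
    and neither mate has an XS:i tag (a heuristic, see the note above)."""
    mates = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary and supplementary records
            continue
        # Optional tags start after the 11 mandatory SAM fields.
        mates[fields[0]].append(fields[11:])
    kept = []
    for name, tag_lists in mates.items():
        if len(tag_lists) != 2:
            continue
        if all(
            "YT:Z:CP" in tags and not any(t.startswith("XS:i:") for t in tags)
            for tags in tag_lists
        ):
            kept.append(name)
    return kept


if __name__ == "__main__":
    import sys

    # Example: samtools view -h in.bam | python this_script.py
    print(len(unique_concordant_pairs(sys.stdin)))
```

The printed number counts pairs, so it should be compared with 573948 rather than with 1147896.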
