kaist-ina / bwa-meme
BWA-MEME: Faster BWA-MEM2 using learned-index
Home Page: https://ina.kaist.ac.kr/projects/bwa-meme/
License: MIT License
Hi there,
When running conda install -c conda-forge -c bioconda bwa-meme, conda (or mamba) can't find the package. The same happens with the search command. Is the recipe not available anymore?
I tried to build from source, but I'm on macOS and it is not straightforward. There are some posts related to bwa-mem2 that suggest some solutions, but I didn't manage to get it to work yet. So a conda-based solution would save some headaches.
With thanks!
I'm finding it takes ~25 minutes to load the various components of the indexes; without -7 it is only a couple of minutes.
The loading of the core reference files up to the following message runs at ~100% CPU:
* Reading reference genome..
* Binary seq file = /home/kr525/rds/hpc-work/data/ref/Homo_sapiens_assembly38.fasta.0123
* Reference genome size: 6434693834 bp
* Done reading reference genome !!
The following section runs at 5-15% CPU, indicating disk wait:
[M::memoryAllocLearned::LEARNED] Reading kmer SA index File to memory
[M::memoryAllocLearned::LEARNED] Reading ref2sa index File to memory
Is there anything obvious relating to the file reading that could account for this?
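If the slow phase is random-access bound, sequentially pre-reading the index files into the page cache before launching may help. A minimal sketch, assuming ordinary files on a local filesystem; prewarm is a hypothetical helper and the file names are placeholders:

```shell
# Sequentially read each file once so that later random access hits the OS
# page cache instead of demand-paging from disk in random order.
prewarm() {
  for f in "$@"; do
    cat -- "$f" > /dev/null && echo "warmed: $f"
  done
}
# e.g. prewarm ref.fasta.suffixarray_uint64 ref.fasta.ref2sa_packed
```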
I expect this is unrelated, but I did notice that ref.suffixarray_uint64 can be compressed with gzip -1 for a ~50% reduction in size. The decompression cost at compression level 1 is negligible compared to the disk latency (and will be more cost-effective for systems with IOPS accounting).
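The compress-then-stream idea can be sketched with synthetic data; sa.bin here stands in for ref.suffixarray_uint64:

```shell
# Compress with gzip -1 (fastest level, cheapest decompression), then
# decompress as a stream at load time instead of reading the raw file.
head -c 1000000 /dev/zero > sa.bin        # stand-in for the real index file
gzip -1 -c sa.bin > sa.bin.gz
echo "original: $(wc -c < sa.bin) bytes, compressed: $(wc -c < sa.bin.gz) bytes"
gzip -dc sa.bin.gz | wc -c                # stream-decompress; prints the original size
rm -f sa.bin sa.bin.gz
```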
Hi,
the version number when invoking bwa-meme version is still 1.0.5.
g++: error: unrecognized command line option ‘-msse’
g++: error: unrecognized command line option ‘-msse2’
g++: error: unrecognized command line option ‘-msse3’
g++: error: unrecognized command line option ‘-mssse3’
g++: error: unrecognized command line option ‘-msse4.1’
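These unrecognized -msse* options usually mean g++ is targeting a non-x86 architecture (e.g. Apple Silicon), where the x86 SIMD flags simply don't exist. A hypothetical helper illustrating the distinction:

```shell
# x86 SIMD flags (-msse*, -mavx*) are only understood by compilers targeting
# x86; on arm64/aarch64 hosts g++ rejects them with exactly this error.
simd_hint() {
  case "$1" in
    x86_64)        echo "x86 target: -msse*/-mavx* flags are valid" ;;
    arm64|aarch64) echo "ARM target: x86 SIMD flags are not recognized" ;;
    *)             echo "unknown arch: $1" ;;
  esac
}
simd_hint "$(uname -m)"
```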
Hi,
I'm trying to make the index for the human genome FASTA GRCh38.p14, and I've set up this command:
bwa-meme index -a meme -t 16 -p bwa-meme/bwa-meme_GRCh38.p14_genomic.fna fasta/GRCh38.p14_genomic.fna
But it still uses a single core for processing. Can you help me, please?
Hello! We faced a difficulty when we tried to install BWA-MEME on a Linux server (bwa-mem2 is already installed).
We followed the installation steps and got errors at make -j32 arch=avx512. It showed many warnings and two errors.
The two errors are:
src/LearnedIndex_seeding.cpp:468:5: error: ‘__builtin_expect_with_probability’ was not declared in this scope
src/LearnedIndex_seeding.cpp:565:5: error: ‘__builtin_expect_with_probability’ was not declared in this scope
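As far as I know, __builtin_expect_with_probability was introduced in GCC 9, so older compilers fail with exactly this error. A quick sketch for checking the compiler's major version; gcc_major is a hypothetical helper:

```shell
# __builtin_expect_with_probability requires GCC >= 9 (an assumption based on
# GCC release notes); extract the major version to check the installed compiler.
gcc_major() {
  # works on `gcc -dumpversion`-style strings, e.g. "8.4.0" -> 8, "12" -> 12
  echo "${1%%.*}"
}
gcc_major "$(gcc -dumpversion 2>/dev/null || echo 0)"
```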
How can we solve the problem? Thank you.
Is the very large value of -K 100000000 (100 million) used for a specific reason? Initially this was to prevent variability when specifying different numbers of threads. The parabricks comparison command indicates 10000000 (10 million). Would a 10 million value work without any detriment to run time?
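Since -K fixes the number of input bases consumed per batch (which is what makes the output independent of thread count), the batch count implied by a given value is simple arithmetic. A sketch with hypothetical totals:

```shell
# Batch count implied by a -K value; the read total below is an assumption
# (~30x human WGS worth of read bases), not a measured figure.
total_bp=90000000000
chunk_bp=100000000     # -K 100000000
batches=$(( (total_bp + chunk_bp - 1) / chunk_bp ))   # ceiling division
echo "$batches batches"
```

A smaller -K (e.g. 10 million) means ~10x more, smaller batches; whether that hurts run time depends on per-batch overhead.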
I built a Docker container, then downloaded the references provided and ran the lowest-memory-requirement mode. The hardware is a Mac M-series with 64 GB RAM; allowable RAM is 50 GB.
docker warning: WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
bwa-meme_mode1 version
Looking to launch executable "/opt/conda/bin/bwa-meme_mode1.sse42", simd = _mode1.sse42
Launching executable "/opt/conda/bin/bwa-meme_mode1.sse42"
Identical to BWA-MEM2 2.2
BWA-MEME v1.0.4
MEME mode 1: uses 38GB for index size in runtime
bwa-meme_mode1 mem -7 -Y -t 1 Homo_sapiens_assembly38.fasta 20A0012672-20A0012672_57977-WGS_R1_001.fastq.gz 20A0012672-20A0012672_57977-WGS_R2_001.fastq.gz -o 20A0012672_bwa-meme.sam
[0000] read_chunk: 10000000, work_chunk_size: 10000024, nseq: 68388
[0000][ M::kt_pipeline] read 68388 sequences (10000024 bp)...
[0000] Reallocating initial memory allocations!!
[0000] Calling mem_process_seqs.., task: 0
[0000] 1. Calling kt_for - worker_bwt
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault
real 0m50.682s
user 0m32.737s
sys 0m17.065s
Should I compile it for the Apple M-series? Please give me some instructions to do that.
I understand the reasons for not including everything in the main recipe, but it would be very useful to provide the tools for training.
I'm assuming that training is required to use human GRCh38 (with alts suitable for bwakit). This would make the tool far more attractive for groups with diverse species/build requirements.
BTW, I'm actually looking at this to do a bake-off against NVIDIA Parabricks, specifically as that implementation doesn't allow for use of bwa-postalt.js or any other modifications, due to the heavy cost/effort required to change it.
Hi there,
I used conda: conda create -n meme -c bioconda -c conda-forge bwa-meme=1.0.6
It doesn't throw any error. Then I activate my env and run bwa-meme; the output is ERROR: fail to find the right executable. I can't figure out why this happens.
With thanks!
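The wrapper dispatches to a SIMD-specific binary (the logs elsewhere show names like bwa-meme_mode1.sse42 and bwa-meme_mode3.avx512bw), and this error suggests no binary matching the CPU was found. A hypothetical sketch of that kind of dispatch, using /proc/cpuinfo-style flag names:

```shell
# Pick the best SIMD level present in a CPU flags string; pick_simd is a
# hypothetical illustration of the wrapper's selection logic, not its source.
pick_simd() {
  flags=" $1 "
  for level in avx512bw avx2 sse4_2; do
    case "$flags" in *" $level "*) echo "$level"; return ;; esac
  done
  echo "none"
}
pick_simd "fpu sse4_2 avx avx2"
```

If none of the expected levels is reported (e.g. inside certain VMs or emulators), a dispatcher like this finds no matching executable.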
Hey,
due to lower memory on a different server, I was running bwa-meme (v1.0.6) mode1.
Samtools sort throws
[E::sam_parse1] SEQ and QUAL are of different length
I thought this was fixed in 1.0.6 but maybe only for mode3?
Best
Christo
Can you advise about compiling a Windows binary for BWA-MEME?
What are the resource requirements for the build_rmis_dna.sh script?
Dear BWA-MEME team:
When using a bacterial genome as reference, I get the following error (the genomes are highly fragmented and assembled from a metagenome):
(base) [jzhao399@login-phoenix-3 Competitive_mapping]$ build_rmis_dna.sh ./all_mags_rename.new.fasta
Training top-level pwl model layer
Training second-level linear model layer (num models = 268435456)
[2nd layer]Computing lower bound stats...
[2nd layer]Fixing empty models...
Computing last level errors...
Average gap: 0.8713072761893272
Total Partial model num: 0, Leaf of partial model num: 0
Total last layer model num: 268435456
Partial start at idx:268435456
Model build time: 103312 ms
Average model error: 1.914389257033157 (0.0000020606146782561575%)
Average model L2 error: 78.10451166276728
Average model log2 error: 1.9188643583795388
Max model log2 error: 7.189824558880018
Max model error on model 79172320: 146 (0.0001571518132585239%)
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', rmi_lib/src/codegen.rs:399:28
note: run with RUST_BACKTRACE=1
environment variable to display a backtrace
Any idea? I installed everything via conda.
Thanks,
Jianshu
Hello,
When I was using bwa-meme with -7 on some data, it turned out the quality and sequence were not the same length; but when I use bwa-meme without -7, or bwa, the error is gone.
How could that be? Thank you!
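To pinpoint the offending records, one can scan the SAM output for rows where SEQ (field 10) and QUAL (field 11) differ in length, which is exactly what samtools rejects. A sketch; find_bad_records is a hypothetical helper and out.sam a placeholder name:

```shell
# Print line number and read name of SAM records whose SEQ and QUAL lengths
# differ (skipping headers and records with QUAL set to '*').
find_bad_records() {
  awk -F'\t' '!/^@/ && $11 != "*" && length($10) != length($11) {
    print "line " NR ": " $1
  }' "$1"
}
# e.g. find_bad_records out.sam
```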
[0000] read_chunk: 80000000, work_chunk_size: 80000037, nseq: 531012
[0000][ M::kt_pipeline] read 531012 sequences (80000037 bp)...
[0000] Reallocating initial memory allocations!!
[0000] Calling mem_process_seqs.., task: 0
[0000] 1. Calling kt_for - worker_bwt
[0000] read_chunk: 80000000, work_chunk_size: 80000068, nseq: 530970
[0000][ M::kt_pipeline] read 530970 sequences (80000068 bp)...
[0000] 2. Calling kt_for - worker_aln
[0000] Inferring insert size distribution of PE reads from data, l_pac: 460349660, n: 531012
[0000][PE] # candidate unique pairs for (FF, FR, RF, RR): (20, 156214, 41, 27)
[0000][PE] analyzing insert size distribution for orientation FF...
[0000][PE] (25, 50, 75) percentile: (221, 707, 3516)
[0000][PE] low and high boundaries for computing mean and std.dev: (1, 10106)
[0000][PE] mean and std.dev: (1828.05, 2315.16)
[0000][PE] low and high boundaries for proper pairs: (1, 13401)
[0000][PE] analyzing insert size distribution for orientation FR...
[0000][PE] (25, 50, 75) percentile: (248, 299, 358)
[0000][PE] low and high boundaries for computing mean and std.dev: (28, 578)
[0000][PE] mean and std.dev: (303.99, 82.31)
[0000][PE] low and high boundaries for proper pairs: (1, 688)
[0000][PE] analyzing insert size distribution for orientation RF...
[0000][PE] (25, 50, 75) percentile: (324, 1223, 4529)
[0000][PE] low and high boundaries for computing mean and std.dev: (1, 12939)
[0000][PE] mean and std.dev: (2298.71, 2392.36)
[0000][PE] low and high boundaries for proper pairs: (1, 17144)
[0000][PE] analyzing insert size distribution for orientation RR...
[0000][PE] (25, 50, 75) percentile: (476, 1843, 6054)
[0000][PE] low and high boundaries for computing mean and std.dev: (1, 17210)
[0000][PE] mean and std.dev: (3128.30, 3213.03)
[0000][PE] low and high boundaries for proper pairs: (1, 22788)
[0000][PE] skip orientation FF
[0000][PE] skip orientation RF
[0000][PE] skip orientation RR
[0000] 3. Calling kt_for - worker_sam
[0000][ M::mem_process_seqs] Processed 531012 reads in 260.608 CPU sec, 32.703 real sec
[0000] Calling mem_process_seqs.., task: 1
[0000] 1. Calling kt_for - worker_bwt
[0000] read_chunk: 80000000, work_chunk_size: 80000073, nseq: 530864
[0000][ M::kt_pipeline] read 530864 sequences (80000073 bp)...
[E::sam_parse1] SEQ and QUAL are of different length
samtools sort: truncated file. Aborting
Dear developer:
bwa: Version: 0.7.17-r1188
BWA-MEME: v1.0.5
bwa flagstat of the BAM:
338883556 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
448166 + 0 supplementary
0 + 0 duplicates
330144737 + 0 mapped (97.42% : N/A)
338435390 + 0 paired in sequencing
169217695 + 0 read1
169217695 + 0 read2
322879394 + 0 properly paired (95.40% : N/A)
329460362 + 0 with itself and mate mapped
236209 + 0 singletons (0.07% : N/A)
5641738 + 0 with mate mapped to a different chr
2394586 + 0 with mate mapped to a different chr (mapQ>=5)
BWA-MEME flagstat of the BAM:
338883548 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
448158 + 0 supplementary
0 + 0 duplicates
330144743 + 0 mapped (97.42% : N/A)
338435390 + 0 paired in sequencing
169217695 + 0 read1
169217695 + 0 read2
322879718 + 0 properly paired (95.40% : N/A)
329460388 + 0 with itself and mate mapped
236197 + 0 singletons (0.07% : N/A)
5641548 + 0 with mate mapped to a different chr
2394572 + 0 with mate mapped to a different chr (mapQ>=5)
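For comparisons like the two flagstat outputs above, a small script can diff the leading counts line by line; flagstat_diff and the file names are placeholders:

```shell
# Compare two `samtools flagstat` outputs: report every line whose leading
# count differs between the two runs.
flagstat_diff() {
  # paste joins corresponding lines with a tab; compare the first token of each
  paste "$1" "$2" | awk -F'\t' '{
    split($1, a, " "); split($2, b, " ")
    if (a[1] != b[1]) print "line " NR ": " a[1] " vs " b[1]
  }'
}
# e.g. flagstat_diff bwa.flagstat bwa-meme.flagstat
```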
Hi,
I'm trying to compile bwa-meme from source with the command:
sudo make -j 32
but I get this error and I don't know how to resolve it. Can you help me, please?
I'm working on an e2-standard-32 Google Cloud machine type.
Hello, when I use version 1.0.6 of bwa-meme with something like:
" bwa-meme mem -M -a -t 8 $2 $3 $4 |mbuffer -m 8G |samtools sort --reference $2 -o $5.cram -O CRAM -@ 8 -"
it exits without any error report. How could that be?
I have generated the indexes and model, but when I try to run the command I see the following error:
It mentions a Homo_sapiens_assembly38.fasta.bwt.2bit.64 file, but I am using the original reference file.
Please help.
Thanks
-Raju
Some questions/comments on the index command. Compared to -a mem2, will -a meme skip the BWT build? bwa-meme index references bwa-mem2 and doesn't list meme as an option for -a. (Sorry, hopefully I'm more helpful than irritating.)
I used the command outlined in the Building pipeline with Samtools https://github.com/kaist-ina/BWA-MEME#building-pipeline-with-samtools
Looking to launch executable "/home/husamia/./BWA-MEME/bwa-meme_mode3.avx512bw", simd = _mode3.avx512bw
Launching executable "/home/husamia/./BWA-MEME/bwa-meme_mode3.avx512bw"
-----------------------------
Executing in AVX512 mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: /mnt/c/Research/Homo_sapiens_assembly38.fasta
* Entering FMI_search
Reading other elements of the index from files /mnt/c/Research/Homo_sapiens_assembly38.fasta
* Index prefix: /mnt/c/Research/Homo_sapiens_assembly38.fasta
* Read 0 ALT contigs
* Reading reference genome..
* Binary seq file = /mnt/c/Research/Homo_sapiens_assembly38.fasta.0123
* Reference genome size: 6434693834 bp
* Done reading reference genome !!
------------------------------------------
1. Memory pre-allocation for Chaining: 1419.8876 MB
2. Memory pre-allocation for BSW: 4792.3405 MB
[M::memoryAllocLearned::MEME] Reading Learned-index models into memory
[Learned-Config] MODE:3 SEARCH_METHOD: 1 MEM_TRADEOFF:1 EXPONENTIAL_SMEMSEARCH: 1 DEBUG_MODE:0 Num 2nd Models:268435456 PWL Bits Used:28
[M::memoryAllocLearned::MEME] Loading RMI model and Pac reference file took 66.232 sec
[M::memoryAllocLearned::MEME] Reading suffix array into memory
[M::memoryAllocLearned::MEME] Loading pos_packed file took 285.735 sec
[M::memoryAllocLearned::MEME] Generating SA, 64-bit Suffix and ISA in memory
[W::sam_hdr_create] Ignored @SQ SN:HLA-C*08:02:01:01 : bad or missing LN tag
[E::sam_hrecs_error] Malformed key:value pair at line 3253: "@SQ SN:HLA-C*08:02:01:01 "
[E::sam_hrecs_error] Malformed key:value pair at line 3253: "@SQ SN:HLA-C*08:02:01:01 "
samtools sort: failed to change sort order header to 'coordinate'
summary: 124 kiByte in 5min 40.9sec - average of 0.4 kiB/s
is the LICENSE file in the repo for bwa-meme? (It appears to be for bwa-mem2).
What is the exact license for bwa-meme? thanks.
Hi,
I had installed bwa-meme, but it failed at the bwa-mem2 mem step.
My index-building commands:
./bwa-mem2 index -a meme -t 32 human.fna ;
./build_rmis_dna.sh human.fna
All index file :
$ ls human.fna* -hal
lrwxrwxrwx 1 hudeneil hudeneil 45 Oct 4 10:37 human.fna
-rw-rw-r-- 1 hudeneil hudeneil 5.8G Oct 4 12:16 human.fna.0123
-rw-rw-r-- 1 hudeneil hudeneil 1.1K Oct 4 11:56 human.fna.amb
-rw-rw-r-- 1 hudeneil hudeneil 1.9K Oct 4 11:56 human.fna.ann
-rw-rw-r-- 1 hudeneil hudeneil 2.9G Oct 4 11:55 human.fna.bwt
-rw-rw-r-- 1 hudeneil hudeneil 742M Oct 4 11:56 human.fna.pac
-rw-rw-r-- 1 hudeneil hudeneil 29G Oct 4 15:05 human.fna.pos_packed
-rw-rw-r-- 1 hudeneil hudeneil 76G Oct 4 15:05 human.fna.possa_packed
-rw-rw-r-- 1 hudeneil hudeneil 29G Oct 4 15:05 human.fna.ref2sa_packed
-rw-rw-r-- 1 hudeneil hudeneil 1.5G Oct 4 12:13 human.fna.sa
-rw-rw-r-- 1 hudeneil hudeneil 47G Oct 4 15:05 human.fna.suffixarray_uint64
-rw-rw-r-- 1 hudeneil hudeneil 122 Oct 4 15:23 human.fna.suffixarray_uint64_data.h
-rw-rw-r-- 1 hudeneil hudeneil 8 Oct 4 15:23 human.fna.suffixarray_uint64_L0_PARAMETERS
-rw-rw-r-- 1 hudeneil hudeneil 1.2G Oct 4 15:24 human.fna.suffixarray_uint64_L1_PARAMETERS
-rw-rw-r-- 1 hudeneil hudeneil 6.0G Oct 4 15:24 human.fna.suffixarray_uint64_L2_PARAMETERS
drwxrwxr-x 5 hudeneil hudeneil 244 Oct 4 10:14 RMI
drwxrwxr-x 2 hudeneil hudeneil 258 Oct 4 15:24 rmi_data
Segmentation fault (core dumped)
There is the following message:
$ time ~/tools/BWA-MEME/bwa-mem2 mem -Y -K 100000000 -t 32 -7 ~/tools/BWA-MEME/human.fna S461.trim.fq -o S461.trim.fq.mem.sam
-----------------------------
Executing in AVX512 mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: /home/hudeneil/tools/BWA-MEME/human.fna
* Entering FMI_search
Reading other elements of the index from files /home/hudeneil/tools/BWA-MEME/human.fna
* Index prefix: /home/hudeneil/tools/BWA-MEME/human.fna
* Read 0 ALT contigs
* Reading reference genome..
* Binary seq file = /home/hudeneil/tools/BWA-MEME/human.fna.0123
* Reference genome size: 6224193392 bp
* Done reading reference genome !!
------------------------------------------
1. Memory pre-allocation for Chaining: 1419.8876 MB
2. Memory pre-allocation for BSW: 7667.7448 MB
Segmentation fault (core dumped)
How to solve this problem? Thank you.
Is it possible to build all chipset binaries and have the tool auto-select the correct one for the system it is running on in the same way as the original bwa-mem2 works?
https://github.com/bwa-mem2/bwa-mem2#installation
Without this I'm not convinced a bioconda version will be as useful.
Hello,
After git-cloning the code, I ran make -j32
and got:
...
In file included from src/bwtindex.cpp:43:
src/Learnedindex.h:35:27: note: initializing argument 1 of ‘void buildSAandLEP(char*, int)’
35 | void buildSAandLEP( char* prefix, int num_threads);
| ~~~~~~^~~~~~
src/bwamem_pair.cpp: In function ‘int mem_matesw_batch_pre(const mem_opt_t*, const bntseq_t*, const uint8_t*, const mem_pestat_t*, const mem_alnreg_t*, int, const uint8_t*, mem_alnreg_v*, mem_cache*, int, int32_t, int32_t&, int32_t&, int32_t)’:
src/bwamem_pair.cpp:1158:33: warning: '0' flag ignored with precision and ‘%d’ gnu_printf format [-Wformat=]
1158 | fprintf(stderr, "[0000][%0.4d] Re-allocating (doubling) seqBufRefs in %s\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/bwamem_pair.cpp:1175:33: warning: '0' flag ignored with precision and ‘%d’ gnu_printf format [-Wformat=]
1175 | fprintf(stderr, "[0000][%0.4d] Re-allocating (doubling) seqBufQers in %s\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/bwamem_pair.cpp:1192:33: warning: '0' flag ignored with precision and ‘%d’ gnu_printf format [-Wformat=]
1192 | fprintf(stderr, "[0000][%0.4d] Re-allocating seqPairs in %s\n", tid, __func__);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[1]: Leaving directory '/home/user/soft/BWA-MEME'
make: *** [Makefile:123: multi] Error 2
gcc version is 9.4.0, ubuntu1~20.04.1
What can be the problem?
Thank you in advance,
Adily
First, thank you for developing this nice tool.
I got an error while indexing human genome.
Step1. FASTA index
Step2. build rmis
build_rmis_dna.sh genome.fasta
Training top-level pwl model layer
Training second-level linear model layer (num models = 268435456)
[2nd layer]Computing lower bound stats...
[2nd layer]Fixing empty models...
Computing last level errors...
Average gap: 19.046705737228734
Total Partial model num: 41300812, Leaf of partial model num: 186417
Total last layer model num: 309549851
Partial start at idx:268249039
Model build time: 867524 ms
Average model error: 1494.9997627890018 (0.00002420435750882617%)
Average model L2 error: 42821253900152.66
Average model log2 error: 5.169317311164392
Max model log2 error: 18.635250483499124
Max model error on model 309539124: 407164 (0.006592069956143941%)
I hope you can help me. Thank you.