itbe-lab / ma

The Modular Aligner and The Modular SV Caller

License: MIT License

Python 7.96% C++ 70.53% C 19.32% CMake 2.04% Shell 0.05% NSIS 0.08% Makefile 0.02% HTML 0.01% Batchfile 0.01%

alignment bioinformatics sequence-alignment read-aligners fm-index structural-variation seeds

ma's People

animesh, yancheer

ma's Issues

parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '>'
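The error says the very first character the JSON parser saw was '>', which is how FASTA files begin. One plausible cause, assuming the -x option expects the JSON index description (as in the "-x dbs/ma.json" and "-x ma_db.json" commands in other reports here), is that a FASTA file was passed to -x instead. A minimal pre-flight check, using a hypothetical helper check_index_json that is not part of MA:

```python
import json


def check_index_json(path):
    """Parse `path` as JSON, failing early with a clear message if it looks
    like FASTA. Hypothetical helper: assumes maCMD -x expects a JSON file."""
    with open(path, encoding="utf-8") as fh:
        head = fh.read(4096).lstrip()
        if head.startswith(">"):  # '>' opens a FASTA header line
            raise ValueError(f"{path} starts with '>' and looks like FASTA, not JSON")
        fh.seek(0)  # rewind so the JSON parser sees the whole file
        try:
            return json.load(fh)
        except json.JSONDecodeError as err:
            raise ValueError(f"{path} is not valid JSON: {err}") from err
```

Only the first 4 KB are inspected before parsing, so accidentally pointing the check at a multi-gigabyte genome FASTA fails fast instead of reading the whole file.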

Index out of range exception for paired end reads

I was trying to run MA on paired-end short reads with:

$ maCMD -x dbs/ma.json -p Illumina_Paired -i datasets/short_read.qc_1.fq -m datasets/short_read.qc_2.fq -o outputs/metax.pe.sam

But I got the error below:

starting alignment.
Drop exception (different thread threw already): Index out of range
Drop exception (different thread threw already): Index out of range
Drop exception (different thread threw already): Index out of range
Throw exception
Error:
Index out of range

It works fine with a single-end file or with a comma-separated list of files.
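The report does not say what the two files contain, so the cause is unknown here. One thing worth ruling out first (a general paired-end sanity check, not a confirmed diagnosis for MA) is that both mate files contain the same number of records in the same order. A minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines) and read names that may end in /1 and /2; read_names and first_mate_mismatch are hypothetical helpers:

```python
import gzip
from itertools import zip_longest


def _open_fastq(path):
    # Transparently handle gzip-compressed input such as reads.fq.gz
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_names(path):
    """Yield the read name of every record, assuming 4-line FASTQ records."""
    with _open_fastq(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0 and line.strip():
                fields = line[1:].split()
                name = fields[0] if fields else ""
                # Drop the /1 or /2 mate suffix so that mates compare equal
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mate_mismatch(path1, path2):
    """Return a description of the first out-of-sync record, or None."""
    pairs = zip_longest(read_names(path1), read_names(path2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a is None or b is None:
            short = path1 if a is None else path2
            return f"record {n}: {short} ended early"
        if a != b:
            return f"record {n}: name {a!r} != {b!r}"
    return None
```

For the command above, this would be called as first_mate_mismatch("datasets/short_read.qc_1.fq", "datasets/short_read.qc_2.fq"); a None result means the two files are in sync.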

redirect sam to standard output for piping

Dear ModularAligner team,
Thank you very much for your great work on a new read aligner. To include it in existing pipelines, would it be possible to let maCMD write the SAM output to standard output instead of to a file? Downstream processing tools could then be piped directly, which would speed up the whole process.

Thank you very much!

Best regards,
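Until such an option exists, a common Unix workaround for a tool that only writes to a named file is to give it a named pipe (FIFO) instead of a regular file: a downstream consumer then reads the records as they are produced and nothing is stored on disk. Whether maCMD's -o accepts a FIFO is an untested assumption here. A minimal POSIX-only sketch; stream_via_fifo and its two callback arguments are hypothetical:

```python
import os
import tempfile
import threading


def stream_via_fifo(write_to_path, consume_line):
    """Run `write_to_path(path)` against a FIFO and pass each line it writes
    to `consume_line`, without an intermediate file on disk (POSIX only).

    In practice `write_to_path` would launch the aligner with "-o path"
    (an assumption, not a documented MA feature).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "out.sam")
        os.mkfifo(fifo)
        writer = threading.Thread(target=write_to_path, args=(fifo,))
        writer.start()
        # Opening for reading blocks until the writer opens the FIFO;
        # iteration ends when the writer closes it.
        with open(fifo) as stream:
            for line in stream:
                consume_line(line)
        writer.join()
```

For example, write_to_path could call subprocess.run with maCMD's -o set to the FIFO path and check=True, and consume_line could be sys.stdout.write so the records end up on standard output. If the writer fails before it ever opens the FIFO, the reader blocks forever, so a robust version would need a timeout.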

cigar string

Hi Arne and Markus,
Thank you for the tool; I really enjoy the speed of MA.
I have a pipeline that relies on the CIGAR string of the SAM file. MA outputs CIGAR strings like "2=3X2=1X1=4D5=1X7=3X4=1X5=1X6=1X1=4D2=3D10=3D2=3I1=1D5=2X4=2I", which are not compatible with my current pipeline. Is there a way, or are there instructions, to convert the CIGAR string to the conventional format produced by tools like minimap2?
Thank you,
Gorliver
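The string above uses the extended CIGAR operations '=' (sequence match) and 'X' (sequence mismatch), both defined in the SAM specification; many pipelines expect 'M' (alignment match, covering both) instead. The command in the "Multiple identical alignments" report further down passes --Use_M_in_CIGAR false to maCMD, which suggests an option for M-based CIGARs exists; that is an inference from that command, not checked documentation. For SAM files that are already written, a post-processing sketch (eqx_to_m and convert_sam_line are hypothetical helpers) merges runs of =, X, and M into a single M:

```python
import re

_CIGAR = re.compile(r"(\d+)([MIDNSHP=X])")


def eqx_to_m(cigar):
    """Convert a CIGAR using =/X into the M-based form, merging adjacent runs."""
    if cigar == "*":  # CIGAR unavailable (e.g. unmapped read): leave as-is
        return cigar
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    ops = []  # list of [length, op]
    for length, op in _CIGAR.findall(cigar):
        op = "M" if op in "=X" else op
        if ops and ops[-1][1] == op:
            ops[-1][0] += int(length)  # extend the previous run of the same op
        else:
            ops.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in ops)


def convert_sam_line(line):
    """Rewrite column 6 (CIGAR) of a SAM alignment line; keep headers as-is."""
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    fields[5] = eqx_to_m(fields[5])
    return "\t".join(fields) + "\n"
```

Applied to the CIGAR quoted above, eqx_to_m returns 9M4D35M4D2M3D10M3D2M3I1M1D11M2I. The query and reference lengths implied by the CIGAR are unchanged; only the per-base match/mismatch distinction is lost, which is exactly what an M-based CIGAR omits.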

Preset for PacBio CCS?

Hi!

Do you have parameter recommendations for long, accurate reads such as PacBio HiFi reads of ~15 kb at >Q20?

Thank you,
Armin

closely related species as a reference for seed

Hi,
I am working on a genome with an expected size of around 10 Gb.
I wonder whether it is possible to use a closely related genome as a seed for the alignment.

prebuilt binary for aarch64 platform

Thank you for developing such a useful tool. It works very well for me on x86 servers and computers. Recently, however, we got a server based on aarch64. I tried to compile and install MA on it but did not succeed. Will a prebuilt aarch64 binary be available for download or for installation via conda?

Best,
Zhi-Luo

Feature request: output BAM and/or print to STDOUT

Dear developers,

I am trying this out by mapping PromethION data against a very large (6x the size of the human genome) and fragmented genome assembly. It seems to perform quite well. Good job!

Could you consider adding an option to write a BAM file, gzip the SAM file, or simply print to STDOUT, leaving it to the user to save the alignments (for instance after processing them with samtools)?

In my case, this is mostly about saving disk space, but at least the last option would also make the program easier to integrate into arbitrary pipelines.

Cheers!
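As a stopgap for the disk-space part, SAM text compresses well with gzip. A minimal sketch (gzip_sam is a hypothetical helper, not part of MA) that compresses a finished SAM file in fixed-size chunks, so memory use stays flat even for very large outputs, and then removes the uncompressed original:

```python
import gzip
import os
import shutil


def gzip_sam(sam_path, chunk_size=1 << 20):
    """Compress `sam_path` to `sam_path + ".gz"` and delete the original.

    Returns the path of the compressed file.
    """
    gz_path = sam_path + ".gz"
    with open(sam_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)  # stream in 1 MiB chunks
    os.remove(sam_path)  # only reached if compression finished without error
    return gz_path
```

The compressed file can be read back line by line with gzip.open(path, "rt"). This still needs the full uncompressed SAM on disk temporarily, which is why native BAM or STDOUT output would be the better long-term fix.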

Can I use MA to align assembled contig to the reference genome?

Thank you for developing this nice aligner. What is the recommended range of query lengths for MA? Can it be used to align assembled contigs of up to 10 Mb to a reference genome? If so, which parameter settings are recommended?

test_aligner.py class issue?

Hi,

I'm testing some basic functions and trying to understand how to work with the ExecutionContext. I tried running the code in setupaligner.py but ran into the following issue:

$ python -c "from MA import *; test_aligner(); exit(0)"
seed set (q,r,l):
49 999999 2
0 100 25
26 126 24
51 151 24
76 176 24
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/MA/build/MA/setupaligner.py", line 37, in test_aligner
    for i, seeds in enumerate(harm_seeds):
TypeError: 'libMS.containers.Container' object is not iterable

The problem seems to come from this line: harm_seeds = harm_module.execute(socs, query, reference_index)

Maybe it's something obvious I'm missing.
Thanks.

Multiple identical alignments were produced in the output

I noticed that, on Nanopore data, MA sometimes produces a SAM file in which the same alignment occurs multiple times: the records share the same read, reference contig, start position, and even CIGAR string. The command line I used:

 maCMD -t 20 -x ma_db.json -p Nanopore -i barcode01.fq.gz -o barcode01.sam -s maxSpan -M 5 --Use_M_in_CIGAR false

One of the reads has six identical alignments (showing only read, ref_contig, and pos):

185-b77e-25cb88888888    GCF_000182965.3|NC_032094.1       171367
185-b77e-25cb88888888    GCF_000182965.3|NC_032094.1       171367
185-b77e-25cb88888888    GCF_000182965.3|NC_032094.1       171367
185-b77e-25cb88888888    GCF_000182965.3|NC_032094.1       171367
185-b77e-25cb88888888    GCF_000182965.3|NC_032094.1       171367
185-b77e-25cb88888888    GCF_000182965.3|NC_032094.1       171367

I am using version 1.1.4-d2d8fc1-0, installed with conda.

Is this expected? What could be causing this issue?
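Until the cause is known, exact duplicates can be dropped in post-processing. A minimal sketch (dedup_sam is a hypothetical helper) that keeps the first record for each combination of read name, reference contig, position, and CIGAR, the four fields the report says coincide; header lines pass through unchanged:

```python
def dedup_sam(lines):
    """Yield SAM lines, skipping alignment records whose QNAME, RNAME, POS,
    and CIGAR all match an earlier record. Header lines are kept."""
    seen = set()
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns 1 QNAME, 3 RNAME, 4 POS, 6 CIGAR (0-based 0, 2, 3, 5)
        key = (fields[0], fields[2], fields[3], fields[5])
        if key not in seen:
            seen.add(key)
            yield line
```

Usage: open barcode01.sam for reading and a new file for writing, then call dst.writelines(dedup_sam(src)). If the duplicates differ only in FLAG (for example one primary and one secondary), the first one encountered is kept. The seen set grows with the whole file; since aligner output is usually grouped by read, resetting it whenever QNAME changes would keep memory bounded.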

Does MA support memory mapping?

I have many samples to run, and the reference database is very large, so loading the database from disk into memory takes much more time than the alignment itself. Does MA support memory mapping to avoid loading the database into memory every time?

Distribution via bioconda?

Hi!

To make it more accessible, it would be great if you could distribute it via bioconda. It might get you more traction, because running "conda install ma" is easier than compiling it yourself.

Armin

How to run MA on short reads faster?

I am using MA for short-read alignment, which takes 20-30 minutes per sample. The results look great, but a 2-3x speedup would help a lot. Which parameter settings can speed up MA on short reads without losing too much accuracy?

compile error of cmake

Hi Arne and Markus,
I would like to install MA, but I get a compile error when building after the cmake step. My CMake version is 3.13.3.
Here is the output:
Cloning into 'MA'...
remote: Enumerating objects: 345, done.
remote: Counting objects: 100% (345/345), done.
remote: Compressing objects: 100% (166/166), done.
remote: Total 10050 (delta 208), reused 280 (delta 169), pack-reused 9705
Receiving objects: 100% (10050/10050), 7.53 MiB | 26.33 MiB/s, done.
Resolving deltas: 100% (7610/7610), done.
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
// CMAKE_ARGS:
// CMAKE_BUILD_TYPE = ""
// CMAKE_CXX_FLAGS =
// CMAKE_CXX_FLAGS_DEBUG = -g
// CMAKE_CXX_FLAGS_RELEASE = -O3 -DNDEBUG
// CMAKE_CXX_FLAGS_RELWITHDEBINFO = -O2 -g -DNDEBUG
// CMAKE_CXX_FLAGS_MINSIZEREL = -Os -DNDEBUG
Build type set to "Release"
-- Found PythonLibs: /opt/lib/libpython3.5m.a (found suitable version "3.5.1", minimum required is "3.5")
Found Python Libs
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
Found zlib
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
Include pybind11 ...
-- Found PythonInterp: /home/workspace/python/bin/python3.6 (found version "3.6.5")
-- Found PythonLibs: /opt/lib/libpython3.5m.a
-- pybind11 v2.3.dev0
Include libkswcpp ...
Include libMA ...
-- libMA: Build shared library with python support via local pybind11.
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- libMA: Filesystem support for GCC below version 8
-- libMA: Building shared library components that rely on zlib.
-- libMA: Version = 1.1.1-9d9c5a6-D
-- libMA: generating test index_generation.cpp
-- libMA: generating test random_alignments.cpp
-- libMA: generating test random_paired_alignments.cpp
-- Found PythonInterp: /home/workspace/python/bin/python3.6 (found suitable version "3.6.5", minimum required is "3.5")
-- libMA: generating test random_alignments.py
-- Could NOT find wxWidgets (missing: wxWidgets_LIBRARIES wxWidgets_INCLUDE_DIRS)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/workspace/software/MA/build
Scanning dependencies of target kswcpp
[ 1%] Building CXX object libs/kswcpp/CMakeFiles/kswcpp.dir/src/cpu_info.cpp.o
In file included from /home/workspace/software/MA/libs/kswcpp/src/cpu_info.cpp:7:0:
/home/workspace/software/MA/libs/kswcpp/inc/cpu_info.h: In constructor ‘CPU_Info::CPU_Info_Internal::CPU_Info_Internal()’:
/home/workspace/software/MA/libs/kswcpp/inc/cpu_info.h:64:36: error: expected ‘,’ before ‘)’ token
static_assert( sizeof(int) == 4 ); // implies 0x20 % sizeof( int ) == 0
^
/home/workspace/software/MA/libs/kswcpp/inc/cpu_info.h:64:36: error: expected string-literal before ‘)’ token
make[2]: *** [libs/kswcpp/CMakeFiles/kswcpp.dir/src/cpu_info.cpp.o] Error 1
make[1]: *** [libs/kswcpp/CMakeFiles/kswcpp.dir/all] Error 2
make: *** [all] Error 2

Did I miss something?
Many Thanks!
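The failing line is a static_assert without a message string. That single-argument form only exists since C++17 (C++11 and C++14 require the message), and GCC 4.8.5, identified at the top of the log, predates C++17 support, so the log points to a compiler that is too old rather than to CMake itself. The minimum GCC version MA needs is not stated here; the sketch below assumes GCC 7, the release from which GCC's C++17 language support is essentially complete, as the threshold. It scans a saved CMake log for the C++ compiler line; both helper names and the threshold are assumptions:

```python
import re

_COMPILER_LINE = re.compile(
    r"The CXX compiler identification is GNU (\d+)\.(\d+)(?:\.(\d+))?"
)


def gnu_cxx_version(cmake_log):
    """Return the GNU C++ compiler version found in a CMake log, or None."""
    match = _COMPILER_LINE.search(cmake_log)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def compiler_too_old(cmake_log, minimum=(7,)):
    """True if the log names a GNU C++ compiler older than `minimum`.

    The default minimum of GCC 7 is an assumption, not a documented MA requirement.
    """
    version = gnu_cxx_version(cmake_log)
    return version is not None and version < minimum
```

On the log above, gnu_cxx_version returns (4, 8, 5) and compiler_too_old returns True. Tuple comparison makes (7, 3, 0) pass against a minimum of (7,).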