mehrdadbakhtiari / advntr Goto Github PK

View Code? Open in Web Editor NEW

39.0 4.0 15.0 1.46 MB

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data

Home Page: http://advntr.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 70.73% Makefile 0.15% C++ 1.76% Shell 0.01% Cython 27.35%

structural-variation genomics bioinformatics genotype next-generation-sequencing

advntr's Introduction

adVNTR - A tool for genotyping VNTRs

adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio) and finds diploid repeating counts for VNTRs and identifies possible mutations in the VNTR sequences.

code-adVNTR, a tool specialized in detecting small indel variants within motifs using short reads is now available. This tool employs multiple motif Hidden Markov Models (HMMs) to identify small variants within motifs and estimate diploid repeat counts for VNTRs specifically in coding regions. For more details, please refer to this readme.

Installation

If you are using the conda packaging manager (e.g. miniconda or anaconda), you can install adVNTR from the bioconda channel:

conda config --add channels bioconda
conda install -c conda-forge -c bioconda advntr

adVNTR could be invoked from command line with advntr

Alternatively, you can install dependencies and install the adVNTR from source.

Data Requirements

In order to genotype VNTRs, you need to either train models for loci of interest or use pre-trained models (recommended):

To run adVNTR on trained VNTR models:
- Download vntr_data_recommended_loci.zip and extract it inside the project directory. This includes a set of pre-trained VNTR models in hg19 for Illumina (6719 loci) and Pacbio (8960 loci) sequencing data. You can use vntr_data_recommended_loci_hg38.zip for VNTRs in GRCh38.
- You can also download and use vntr_data_genic_loci.zip for 158522 VNTRs in hg19 that results in having much longer running time.
- Alternatively you can download the BED file for gene-proximal VNTRs gene_proximal_vntrs.bed for VNTRs in GRCh38.
- The VNTR coordinates (GRCh38) utilized for GIAB TR benchmarking can be downloaded from this link.

Alternatively, you can add model for custom VNTR. See Add Custom VNTR for more information about training models for custom VNTRs.

[Optional] For faster genotyping with adVNTR-NN, pretrained neural network models can be downloaded from here.

Execution:

Use following command to see the help for running the tool.

advntr --help

The program outputs the RU count genotypes of trained VNTRs. To specify a single VNTR by its ID use --vntr_id <id> option. The list of some known VNTRs and their ID is available at Disease-linked-VNTRs page in wiki.

See the demo below or Quickstart page to see an example data set with step-by-step genotyping commands.

Demo input in BAM format

--alignment_file specifies the alignment file containing mapped and unmapped reads:

    advntr genotype --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/

With --pacbio, adVNTR assumes the alignment file contains PacBio sequencing data:

    advntr genotype --alignment_file aligned_pacbio_reads.bam --working_directory ./log_dir/ --pacbio

Use --frameshift to find the possible frameshifts in VNTR:

    advntr genotype --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/ --frameshift

Documentation:

Documentation is available at advntr.readthedocs.io.

See Quickstart page to see an example data set with step-by-step genotyping commands.

Citation:

code-adVNTR:

Jonghun Park, Mehrdad Bakhtiari, Bernt Popp, Michael Wiesener, Vineet Bafna. Detecting tandem repeat variants in coding regions using code-adVNTR iScience vol. 25,8 104785. (2022),
adVNTR-NN (v1.4.0):

Mehrdad Bakhtiari, Jonghun Park, Yuan-Chun Ding, Sharona Shleizer-Burko, Susan L. Neuhausen, Bjarni V. Halldorsson, Kari Stefansson, Melissa Gymrek, Vineet Bafna. Variable Number Tandem Repeats mediate the expression of proximal genes Nature Communications 12, 2075 (2021).
Original publication (adVNTR):

Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. and Bafna, V., 2018. Targeted genotyping of variable number tandem repeats with adVNTR. Genome research vol. 28,11 (2018)

advntr's People

Contributors

Stargazers

Watchers

Forkers

nh13 radygenomics inambioinfo mldmort jong-hun-park bcade alreadydone hevmarriott sara-javadzadeh asleonard balasundar30 backupnoob01 noma-m wangdi2016

advntr's Issues

sqlite3.OperationalError: unable to open database file

I tried to add my own model using this:

advntr addmodel -r /rd/tansj/vntr/vntr_data/dm6.fa -c chr2L -p GTTGTCTTTTTGGTGGTTGGTTCTTCTGTTGTCTTATGGGTTGTTGTCTTACGAGTGGTTGGCTCCTGAGTTGTCCTTTCAGTTGTTTCCCTCTTTGTAGTTGGCTCTCTAGTAGTCTTTGTGGTTGATGTTTTGCGAGTAGTTGGTTCTTCGGTAGGCTTTAAAGTAGTTGATTTCTTAGTTGTGGGTTCATGAGTGC -s 9898540 -e 9902555

it is a VNTR in fruitfly genome. But I got some errors:

Traceback (most recent call last):
File "/home/tansj/program/miniconda2/bin/advntr", line 6, in
sys.exit(advntr.main.main())
File "/home/tansj/program/miniconda2/lib/python2.7/site-packages/advntr/main.py", line 118, in main
add_model(args, addmodel_parser)
File "/home/tansj/program/miniconda2/lib/python2.7/site-packages/advntr/advntr_commands.py", line 141, in add_model
vntr_id = get_largest_id_in_database() + 1
File "/home/tansj/program/miniconda2/lib/python2.7/site-packages/advntr/models.py", line 178, in get_largest_id_in_database
db = sqlite3.connect(settings.TRAINED_MODELS_DB)
sqlite3.OperationalError: unable to open database file

I did not use sqlite3 before. We have mysql. Is there something I need to do before running advntr?

OSError: [Errno 24] Too many open files

Hi, apparently I have a multiprocess error when running adVNTR ; the command :

advntr genotype --alignment_file $bam --working_directory $advntrDir/$bam_name --vntr_id 25561 --pacbio --frameshift -m ../hg19_selected_VNTRs_Pacbio.db -t $Ncpu

the error :

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 14991 reads
Process Process-1009:
Traceback (most recent call last):
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 264, in check_if_pacbio_read_spans_vntr
self.check_if_flanking_regions_align_to_str(str(read.seq).upper(), length_distribution, spanning_reads)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 260, in check_if_flanking_regions_align_to_str
spanning_reads.append(read_str[left_align[3]:right_align[3]+flanking_region_size])
File "", line 2, in append
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
self._connect()
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
File "/opt/miniconda3/envs/gene36/bin/advntr", line 11, in
sys.exit(main())
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/main.py", line 121, in main
genotype(args, genotype_parser)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/advntr_commands.py", line 101, in genotype
genome_analyzier.find_repeat_counts_from_pacbio_alignment_file(input_file)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/genome_analyzer.py", line 103, in find_repeat_counts_from_pacbio_alignment_file
copy_numbers = self.vntr_finder[vid].find_repeat_count_from_pacbio_alignment_file(alignment_file, reads)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 477, in find_repeat_count_from_pacbio_alignment_file
mapped_spanning_reads = self.get_spanning_reads_of_aligned_pacbio_reads(alignment_file)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 329, in get_spanning_reads_of_aligned_pacbio_reads
p.start()
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/popen_fork.py", line 65, in _launch
parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files

Is there a db file of hg38 matching pacbio data?

Hello, I have bam file of pacbio hg38 here, which db file should I use?

vcf Output-Filter is all . instead of PASS

In running adVNTR I am concerned since the FILTER column for the output vcf file is all "." for the n=6,719 loci.
Please advise: This should all be "PASS", instead no?? Does this mean that these calls did not PASS?
The header of adVNTR output vcf file is below and input code as per program:
advntr genotype -m hg19_selected_VNTRs_Illumina.db -f FILENAME.bam --of vcf -o FILE.vcf

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FILENAME
chr1 700243 . CGAGGCGGGAAGATCACTTGATATCAGGAGTCGAGGCGGGAAGATCACTTGACGTCAGGAGTT . . . END=700306;VID=156;RU=CGAGGCGGGAAGATCACTTGACATCAGGAGTT;RC=2GT:DP:SR:FR:ML 0/0:18:6:12:0.9995
chr1 777293 . AGAAATGCTTCTTTAGAAATGCTTCTTT . . . END=777321;VID=174;RU=AGAAATGCTTCTTT;RC=2 GT:DP:SR:FR:ML 0/0:32:18:14:1.0000
chr1 900683 . TTTATTTAGTTGTTTATTTATTTAGTTATTTATCTTATTTATTGAGGGG . . . END=900732;VID=239;RU=TTTATTTAGTTATTTA;RC=3 GT:DP:SR:FR:ML 0/0:28:15:13:1.0000
chr1 910475 . GAGAAAGAAAAAAGAGAGAAGAGAGAGAGGAGAAGAGAGAAAAGAAGAGAAGAGAGAAGAGAAGAGAAGGAAAGAGAGAGAGAGAATAAGAAAAGGAAGAAAGAAAAAGAAAAGATAGAA

sqlite3.OperationalError: disk I/O error

Hi Mehrdad,

I'm trying to run advntr on some PacBio reads mapped to hg38 and I'm using the hg38 models (hg38_selected_VNTRs_Illumina.db) however I get a sqlite3.OperationalError when running with the following options:

advntr genotype --alignment_file HG00733.sorted.bam --working_directory $PWD --pacbio -r hg38.no_alt.fa --models vntr_data/hg38_selected_VNTRs_Illumina.db --outfile advntr.vcf -
-outfmt vcf
Using Theano backend.
Traceback (most recent call last):
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/bin/advntr", line 11, in <module>
    sys.exit(main())
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/lib/python3.6/site-packages/advntr/__main__.py", line 134, in main
    genotype(args, genotype_parser)
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/lib/python3.6/site-packages/advntr/advntr_commands.py", line 104, in genotype
    reference_vntrs = load_unique_vntrs_data()
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/lib/python3.6/site-packages/advntr/models.py", line 140, in load_unique_vntrs_data
    right_flanking, repeats, scaled_score FROM vntrs''')
sqlite3.OperationalError: disk I/O error

All the files referenced in the command exist. I tried redownloading the VNTR models in case they were corrupted. I get the same error if I use the hg19 ones. Any suggestions? Thanks.

Output with --haploid option

When --haploid option is given, the output format should be changed accordingly.

addmodel error

Hi,
I tried to build database for my VNTRs, the following error occurred:
Traceback (most recent call last):
File "/rd1/laixh/soft/anaconda2/envs/snakemake/bin/advntr", line 33, in
sys.exit(load_entry_point('advntr==1.5.0', 'console_scripts', 'advntr')())
File "/rd1/laixh/soft/anaconda2/envs/snakemake/lib/python3.10/site-packages/advntr-1.5.0-py3.10-linux-x86_64.egg/advntr/main.py", line 149, in main
add_model(args, addmodel_parser)
File "/rd1/laixh/soft/anaconda2/envs/snakemake/lib/python3.10/site-packages/advntr-1.5.0-py3.10-linux-x86_64.egg/advntr/advntr_commands.py", line 208, in add_model
ref_vntr.init_from_vntrseek_data()
File "/rd1/laixh/soft/anaconda2/envs/snakemake/lib/python3.10/site-packages/advntr-1.5.0-py3.10-linux-x86_64.egg/advntr/reference_vntr.py", line 45, in init_from_vntrseek_data
repeat_segments = self.find_repeat_segments(corresponding_region_in_ref)
File "/rd1/laixh/soft/anaconda2/envs/snakemake/lib/python3.10/site-packages/advntr-1.5.0-py3.10-linux-x86_64.egg/advntr/reference_vntr.py", line 82, in find_repeat_segments
model = build_reference_repeat_finder_hmm(patterns, copies=self.estimated_repeats)
File "/rd1/laixh/soft/anaconda2/envs/snakemake/lib/python3.10/site-packages/advntr-1.5.0-py3.10-linux-x86_64.egg/advntr/hmm_utils.py", line 674, in build_reference_repeat_finder_hmm
model.bake()
File "pomegranate/hmm.pyx", line 731, in pomegranate.hmm.HiddenMarkovModel.bake
File "/rd1/laixh/soft/anaconda2/envs/snakemake/lib/python3.10/site-packages/networkx/classes/reportviews.py", line 194, in getitem
return self._nodes[n]
KeyError: 0

Any Suggestion?

Change version of h5py to 2.10.0

Hi there,

adVNTR will not work unless the h5py version is changed to 2.10.0. Perhaps the conda environment could be updated to reflect this?

Help understanding outputs

Hi,
I have used the foll. command to run advntr on my PCR-free WGS data from Illumina-

advntr genotype --threads 4 --expansion --coverage 5 --alignment_file mybam.sorted.bam --models /advntr_hg38_selected_VNTRs_Illumina/hg38_selected_VNTRs_Illumina.db --working_directory .

I get the foll. results-

unmapped.fasta (476257 reads)
keywords.txt (0 b)
filtering_out.txt (0 b)
log (shown below)

My questions-
Why only the unmapped reads were used by the tool?
How do I generate vcf and bed format output?
Can a tab delimited file with predicted GT per VNTR be generated?
Is there a way to visualize the reads leading to genotypes- static like REVEIWER or interactive like IGV?

Best,
Hasna

README update for VCF

README update is required for VCF output

Can't run adVNTR genotype - unexpected keyword argument 'max_prob'

Hi there,

I have been trying to run adVNTR (latest version) via conda and every time it gives me this error:

Traceback (most recent call last):

File "/users/k19042397/miniconda2/envs/adVNTR/bin/advntr", line 11, in <modu$
sys.exit(main())
File "/users/k19042397/miniconda2/envs/adVNTR/lib/python3.6/site-packages/ad$
genotype(args, genotype_parser)
File "/users/k19042397/miniconda2/envs/adVNTR/lib/python3.6/site-packages/ad$
genome_analyzier.find_repeat_counts_from_alignment_file(input_file, averag$
File "/users/k19042397/miniconda2/envs/adVNTR/lib/python3.6/site-packages/ad$
average_coverage, update)
File "/users/k19042397/miniconda2/envs/adVNTR/lib/python3.6/site-packages/ad$
retval = func(*args, **kwargs)
File "/users/k19042397/miniconda2/envs/adVNTR/lib/python3.6/site-packages/ad$
max_prob=0) # No probability for the estimated genotype
TypeError: init() got an unexpected keyword argument 'max_prob'

I do not know whether this is a disagreement between init GenotypeResult, which does not have max_prob assigned and defined further down, only max_likelihood, or if it is a problem with my options. I assigned a float to the coverage option and played around with the options and it still gives me this.

Any help would be much appreciated.

All the best,

Heather

Error running advntr --help

Hi,
I was trying to setup advntr using conda and running into to this error. Any help is appreciated.
Thank you

(/home/aravind/advntrenv) aravind@cn61:~$ advntr --help
Traceback (most recent call last):
File "/home/aravind/advntrenv/bin/advntr", line 7, in
from advntr.main import main
File "/home/aravind/advntrenv/lib/python2.7/site-packages/advntr/main.py", line 6, in
from advntr.advntr_commands import genotype, view_model, add_model, del_model
File "/home/aravind/advntrenv/lib/python2.7/site-packages/advntr/advntr_commands.py", line 7, in
from advntr.genome_analyzer import GenomeAnalyzer
File "/home/aravind/advntrenv/lib/python2.7/site-packages/advntr/genome_analyzer.py", line 8, in
from advntr.vntr_finder import VNTRFinder
File "/home/aravind/advntrenv/lib/python2.7/site-packages/advntr/vntr_finder.py", line 14, in
from advntr.hmm_utils import *
File "/home/aravind/advntrenv/lib/python2.7/site-packages/advntr/hmm_utils.py", line 3, in
from pomegranate import DiscreteDistribution, State
File "/home/aravind/advntrenv/lib/python2.7/site-packages/pomegranate/init.py", line 38, in
from .hmm import *
File "pomegranate/hmm.pyx", line 13, in init pomegranate.hmm
File "/home/aravind/advntrenv/lib/python2.7/site-packages/networkx/init.py", line 68, in
import networkx.utils
File "/home/aravind/advntrenv/lib/python2.7/site-packages/networkx/utils/init.py", line 2, in
from networkx.utils.decorators import *
File "/home/aravind/advntrenv/lib/python2.7/site-packages/networkx/utils/decorators.py", line 7, in
from decorator import decorator
File "/home/aravind/advntrenv/lib/python2.7/site-packages/decorator.py", line 162
print('Error in generated code:', file=sys.stderr)
^
SyntaxError: invalid syntax

Genotype error

Hi Mehrdad,

I've installed adVNTR-1.4.1 on Linux, when I running samples in batch, some of them produce the fllowing error:

  File "/home/sjzhang/anaconda2/envs/SV_ENV/bin/advntr", line 11, in <module>
    sys.exit(main())
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/__main__.py", line 134, in main
    genotype(args, genotype_parser)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/advntr_commands.py", line 124, in genotype
    genome_analyzier.find_repeat_counts_from_alignment_file(input_file, average_coverage, args.update)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/genome_analyzer.py", line 241, in find_repeat_counts_from_alignment_file
    average_coverage, update)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/vntr_finder.py", line 707, in find_repeat_count_from_alignment_file
    selected_reads = self.select_illumina_reads(alignment_file, unmapped_filtered_reads, update)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/vntr_finder.py", line 658, in select_illumina_reads
    hmm = self.get_vntr_matcher_hmm(read_length=read_length)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/vntr_finder.py", line 119, in get_vntr_matcher_hmm
    vntr_matcher = self.build_vntr_matcher_hmm(copies, flanking_region_size)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/vntr_finder.py", line 101, in build_vntr_matcher_hmm
    vntr_matcher = get_read_matcher_model(left_flanking_region, right_flanking_region, patterns, copies)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/hmm_utils.py", line 529, in get_read_matcher_model
    repeats_matcher = get_variable_number_of_repeats_matcher_hmm(patterns, copies, vpaths)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/hmm_utils.py", line 476, in get_variable_number_of_repeats_matcher_hmm
    model = get_constant_number_of_repeats_matcher_hmm(patterns, copies, vpaths)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/hmm_utils.py", line 405, in get_constant_number_of_repeats_matcher_hmm
    transitions, emissions = build_profile_hmm_for_repeats(patterns, settings.MAX_ERROR_RATE)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/advntr/profile_hmm.py", line 170, in build_profile_hmm_for_repeats
    alignment = AlignIO.read(StringIO(stdout), "clustal")
  File "/home/sjzhang/anaconda2/envs/SV_ENV/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 428, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle

Here is my command:
advntr genotype -t 2 --alignment_file 6814D.marked.realigned.recal.bam --working_directory 6814D -m hg38_selected_VNTRs_Illumina.db -of vcf -o 6814D_vntr.vcf

Could you please help me figure out what the issue is? Any help is greatly appreciated!

subset hg19_genic_VNTRs.db

Can you please clarify advise? For genotyping a specific subset of VNTRs from the n=158,522 VNTRs loci (i.e. a subset of the hg19_genic_VNTRs.db within vntr_data_genic_loci.zip):

Can the VNTRs be subsetted by specific VNTRs of interest within, if I specify chromosome and coordinates??
This input code could then just include this subsetted reference file...correct?
advntr genotype -m vntr_data subsettedfile.db -f filename.bam --working_directory ./ -of vcf -o filename_advntr.vcf
Thanks.

only None copy numbers in the output file

I tried adVNTR on PacBio data and the output bed file has only None values in R1 and R2 columns. It basically does not genotype any tandem repeat region.

`hg19_chromosomes` is not described

hg19_chromosomes/CombinedHG19_Reference.fa is hard-coded in many places (ex. src/reference_vntr.py)
chr9.fasta is hard-coded in a supplementary script
chromosome names are defined in settings.py, and not using a sequence dictionary (.dict) or fasta index (.fai).

Describe vntr_data

Dumping the current DB, it seems like it wouldn't be hard to describe the necessary columns: id,nonoverlapping,chromosome,ref_start,gene_name,annotation,pattern,left_flanking,right_flanking,repeats.

Also, having where the database exists would be nice on the command line.

FileNotFoundError: [Errno 2] No such file or directory: 'makeblastdb'

I right now don't have the time to investigate this further but I already wanted to report the error.

It's perhaps a dependency that needs to be installed?

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 1066443 reads
Traceback (most recent call last):
  File "advntr.py", line 60, in <module>
    run_advntr()
  File "advntr.py", line 49, in run_advntr
    genotype(args, genotype_parser)
  File "/home/wdecoster/bin/adVNTR/src/advntr_commands.py", line 73, in genotype
    genome_analyzier.find_repeat_counts_from_alignment_file(input_file)
  File "/home/wdecoster/bin/adVNTR/src/genome_analyzer.py", line 71, in find_repeat_counts_from_alignment_file
    vntr_reads = self.get_vntr_filtered_reads_map(unmapped_reads_file)
  File "/home/wdecoster/bin/adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/wdecoster/bin/adVNTR/src/genome_analyzer.py", line 33, in get_vntr_filtered_reads_map
    read_ids = self.vntr_finder[vid].filter_reads_with_keyword_matching(self.working_dir, read_file, illumina)
  File "/home/wdecoster/bin/adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/wdecoster/bin/adVNTR/src/vntr_finder.py", line 88, in filter_reads_with_keyword_matching
    empty_db = make_blast_database(read_file, blast_db_name)
  File "/home/wdecoster/bin/adVNTR/src/blast_wrapper.py", line 13, in make_blast_database
    call(['makeblastdb'] + make_db_args)
  File "/home/wdecoster/anaconda3/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/wdecoster/anaconda3/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/home/wdecoster/anaconda3/lib/python3.6/subprocess.py", line 1333, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'makeblastdb'

Cheers,
Wouter

vntr data does not exists

The link in the readme for the VNTR data does not work: https://cseweb.ucsd.edu/~mbakhtia/adVNTR/vntr_data.zip/

cannot create my custom database

hi, i installed the latest version of adVNTR （v1.32） using miniconda: conda install -c bioconda advntr, and erverything is well but i cannot add (create) a new .db
ohh! i created a new folder : vntr_data

my code is :
advntr addmodel -r sus.fa -p GGGGTCATG -s 14859 -e 14889 -c chr1

error info:
Traceback (most recent call last):
File "/home/miniconda2/envs/py36/bin/advntr", line 11, in
sys.exit(main())
File "/home/miniconda2/envs/py36/lib/python3.6/site-packages/advntr/main.py", line 138, in main
add_model(args, addmodel_parser)
File "/home/miniconda2/envs/py36/lib/python3.6/site-packages/advntr/advntr_commands.py", line 184, in add_model
create_vntrs_database(settings.TRAINED_MODELS_DB)
File "/home/miniconda2/envs/py36/lib/python3.6/site-packages/advntr/models.py", line 115, in create_vntrs_database
''')
sqlite3.OperationalError: disk I/O error

it will generate a new file (hg19_selected_VNTRs_Illumina.db) under vntr_data, if change the permission to 777 (chmod 777 *)
,another error encounterd:

Traceback (most recent call last):
File "/home/miniconda2/envs/py36/bin/advntr", line 11, in
sys.exit(main())
File "/home/miniconda2/envs/py36/lib/python3.6/site-packages/advntr/main.py", line 138, in main
add_model(args, addmodel_parser)
File "/home/miniconda2/envs/py36/lib/python3.6/site-packages/advntr/advntr_commands.py", line 186, in add_model
vntr_id = get_largest_id_in_database() + 1
File "/home/miniconda2/envs/py36/lib/python3.6/site-packages/advntr/models.py", line 207, in get_largest_id_in_database
cursor.execute('''SELECT MAX(id) FROM vntrs''')
sqlite3.OperationalError: disk I/O error

although，I saw the same question, I didn't get any exact solution。
Thank you in advance

Contig tag in VCF header

Since mergeSTR tool requires contig tag to merge VCF files, it is better to add contig tag in VCF header, which is not required.

trained VNTR models for Nanopore data

Hi, when I download vntr_data_recommended_loci.zip , there are two files ( illumina and pacbio ). If I have Nanopore data, can I choose the pacbio file or do you have a specific nanopore file that I can find ?

VCF output

Output a vcf with REF and ALT sequences for each loci

adVNTR doesn't output the correct genotype on the CSTB sample data

The current version (1.4.1) doesn't output the correct genotype on the CSTB sample data.

This is because some reads are filtered out and caused by the commit 163a756.

IndexError when running `advntr addmodel`

@mehrdadbakhtiari

Issue:
I get an IndexError when I try to run advntr addmodel. What can be done to fix this error?

Version:
adVNTR 1.3.3 (downloaded using conda)
(Note: I have tried to install the latest GitHub version but it fails to build because of a Cython CompileError.)

Command:

advntr addmodel \
-r ~/[...]/hg38_no-alt-no-random.fa \
-c 18 -s 57024495 -e 57024955 \
-p GTTATGTCCCTTATAGAATTTCACATATCCCTATTCTCAGACAGGAATAGGGATATGTGAGCTATGATA \
-g WDR7 \
-m vntr_data/hg38_selected_VNTRs_Illumina.db

Output:

Searching reference genome for regions with shared kmers with VNTR. It takes a few hours for human genome
Traceback (most recent call last):
  File "/home/jeffr/miniconda3/envs/advntr-test/bin/advntr", line 11, in <module>
    sys.exit(main())
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/__main__.py", line 138, in main
    add_model(args, addmodel_parser)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/advntr_commands.py", line 193, in add_model
    scaled_recruitment_score = vntr_finder.train_classifier_threshold(args.reference)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/vntr_finder.py", line 687, in train_classifier_threshold
    hmm = self.get_vntr_matcher_hmm(read_length=read_length)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/vntr_finder.py", line 87, in get_vntr_matcher_hmm
    vntr_matcher = self.build_vntr_matcher_hmm(copies, flanking_region_size)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/vntr_finder.py", line 69, in build_vntr_matcher_hmm
    vntr_matcher = get_read_matcher_model(left_flanking_region, right_flanking_region, patterns, copies)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/hmm_utils.py", line 485, in get_read_matcher_model    model = get_suffix_matcher_hmm(left_flanking_region)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/home/jeffr/miniconda3/envs/advntr-test/lib/python3.6/site-packages/advntr/hmm_utils.py", line 317, in get_suffix_matcher_hmm    model.add_transition(unit_start, delete_states[0], delete_error)
IndexError: list index out of range

Including Keras as a requirement

Since vntr_finder imports keras.models, Keras should be added to the requirement?

advntr not working with hg19_genic_VNTRs.db

Got advntr working through conda eg:
advntr genotype --alignment_file NS5_NA12878-PcrFree-Hg19-BWA_chr20.bam --working_directory ./log_dir/ -of bed -o NS5_NA12878-PcrFree-Hg19-BWA_chr20_VNTRstandard.bed

this works fine, however giving advntr bigger db (hg19_genic_VNTRs.db) as described in readme produces following error:
advntr genotype --alignment_file ../NS5_NA12878-PcrFree-Hg19-BWA_chr20.bam --working_directory ./log_dir/ -m ./vntr_data/hg19_genic_VNTRs.db

None
70183
None
70184
None
70185
None
Traceback (most recent call last):
File "/home/ptedder/anaconda2/envs/ariba371/bin/advntr", line 11, in
sys.exit(main())
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/main.py", line 128, in main
genotype(args, genotype_parser)
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/advntr_commands.py", line 121, in genotype
genome_analyzier.find_repeat_counts_from_alignment_file(input_file, average_coverage, args.update)
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/genome_analyzer.py", line 129, in find_repeat_counts_from_alignment_file
average_coverage, update)
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/vntr_finder.py", line 681, in find_repeat_count_from_alignment_file
return self.get_ru_count_with_coverage_method(pattern_occurrences, total_counted_vntr_bp, average_coverage)
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/ptedder/anaconda2/envs/ariba371/lib/python3.6/site-packages/advntr/vntr_finder.py", line 624, in get_ru_count_with_coverage_method
estimate = [int(pattern_occurrences / (float(average_coverage) * haplotypes))] * 2
TypeError: float() argument must be a string or a number, not 'NoneType'

Support CRAM input

Users must specify the reference genome if the input alignment file is in CRAM format

Issues running advntr genotype

Hi,

After installing the package, I'm having difficulty getting the genotype command to run on some pacbio bam files.

Running the following;

advntr genotype -p -a NA07019_SC034768.bam --update -m vntr_data/hg19_VNTRs.db -o bed --working_directory test/

Produces;

Traceback (most recent call last):
File "/opt/miniconda2/bin/advntr", line 11, in
sys.exit(main())
File "/opt/miniconda2/lib/python2.7/site-packages/advntr/main.py", line 121, in main
genotype(args, genotype_parser)
File "/opt/miniconda2/lib/python2.7/site-packages/advntr/advntr_commands.py", line 91, in genotype
reference_vntrs = load_unique_vntrs_data()
File "/opt/miniconda2/lib/python2.7/site-packages/advntr/models.py", line 125, in load_unique_vntrs_data
right_flanking, repeats, scaled_score FROM vntrs''')
sqlite3.OperationalError: no such table: vntrs

I'm at a bit of a loss here as to what is the issue. Any help is greatly appreciated.

Non-human data

If this software can be used in nonhuman data? How can we extend it to other species?

outformat

For an adVNTR output in "vcf format:, is the command just --outfmt vcf
I am getting bed and text to output, but not vcf...Not clear. If you can advise...Thanks.
advntr genotype -m vntr_data/hg19_selected_VNTRs_Illumina.db -f BAM --working_directory ./ -of text -o NAME

IndexError: cannot do a non-empty take from an empty axes.

I get the following error when running adVNTR. After, I modified the code printed out false_scores in vntr_finder.py on line 197 and I get[]. Any help would be greatly appreciated.

Traceback (most recent call last):
  File "adVNTR/advntr.py", line 58, in <module>
    run_advntr()
  File "adVNTR/advntr.py", line 49, in run_advntr
    genotype(args, genotype_parser)
  File "...adVNTR/src/commands.py", line 73, in genotype
    genome_analyzier.find_repeat_counts_from_alignment_file(input_file)
  File "...adVNTR/src/genome_analyzer.py", line 74, in find_repeat_counts_from_alignment_file
    copy_number = self.vntr_finder[vid].find_repeat_count_from_alignment_file(alignment_file, unmapped_reads)
  File "...adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "...adVNTR/src/vntr_finder.py", line 643, in find_repeat_count_from_alignment_file
    selected_reads = self.select_illumina_reads(alignment_file, unmapped_filtered_reads)
  File "...adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "...adVNTR/src/vntr_finder.py", line 575, in select_illumina_reads
    min_score_to_count_read = self.get_min_score_to_select_a_read(hmm, alignment_file, read_length)
  File "...adVNTR/src/vntr_finder.py", line 219, in get_min_score_to_select_a_read
    score = self.calculate_min_score_to_select_a_read(hmm, alignment_file)
  File "...adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "...adVNTR/src/vntr_finder.py", line 199, in calculate_min_score_to_select_a_read
    score = numpy.percentile(false_scores, 100 - settings.SCORE_SELECTION_PERCENTILE)
  File ".../lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile
    interpolation=interpolation)
  File ".../lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce
    r = func(a, **kwargs)
  File ".../lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File ".../lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File ".../lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

Clear warning outputs

Description

adVNTR has some warning messages related to Keras.
Currently, the default output is set to standard output, and so the warning messages break the output format.

vcf header

In trying to merge adVNTR output files across multiple samples using bcftools merge, I get an error:
[W::vcf_parse] FILTER ' .' is not defined in the header
[E::vcf_parse] Could not add dummy header for FILTER ' .'

Is FILTER " ." not meant to be included...No FILTER "PASS" in my output...Only "."
Should I just edit vcf file header to add ##FILTER=<ID=., Description="xyzzy"
##FILTER=<ID=PASS,Description="All filters passed">
Would you suggest an alternative way to merge files across samples? Other than bcftools..?

Settings.py needs to be better described

When running adVNTR, I was surprised when I got the error IOError: [Errno 2] No such file or directory: 'IOError: [Errno 2] No such file or directory: 'blast_tmp/d139ad54-4941-449a-94b9-2b3bcda61b69219902_query.fasta'. I would have expected at least some of the settings in settings.py to be on the command line, and at least a yaml/json/hocon configuration file.

Output Files location and readme

Hello,

I was trying to test the adVNTR tools, it installed without any issue and I obtained vntr_data following instructions in docs. Below is my command that I ran
advntr genotype --alignment_file Bams/Samples.bam --working_directory Bams/

I don't seem to find any information about the output file. Can you please provide some information on where to find output file as well brief description to understand the output file.

Thanks,
Nick

:ref:`add-custom-vntr-label` not found in the README.rst

It would be great how to add custom VNTRs to use adVNTR.

Haplotype call for VNTRs on chrY

For VNTRs on chrY, we can first check if we can call homozygous calls for VNTRs on chrY and output the haplotype only if the call is confident.

Only Coding VNTRs were genotyped

Hello Mehrdad,

We started analyzing the adVNTR output bed files and it turned out that it only contains coding or near coding VNTRs. Is it default to only use coding VNTRs ? Is there is a specific flag that needs to be set to turned on/off such behavior ?

Thanks,
Nick

muscle is not listed as a requirement

See MUSCLE_DIR in settings.py. It failed fairly far into the execution saying that it could find the muscle directory (is it the executable or directory???)

Support for python-3.7

All required packages have released an update to support python-3.7, so we can support it as well.

Genotype Error

Hi!
The following error always occur when I run genotype command on pacbio bam file:
Traceback (most recent call last):
File "/soft/Anaconda/anaconda3/bin/advntr", line 11, in
sys.exit(main())
File "/soft/Anaconda/anaconda3/lib/python3.6/site-packages/advntr/main.py", line 134, in main
genotype(args, genotype_parser)
File "/soft/Anaconda/anaconda3/lib/python3.6/site-packages/advntr/advntr_commands.py", line 114, in genotype
genome_analyzier.find_repeat_counts_from_pacbio_alignment_file(input_file)
File "/soft/Anaconda/anaconda3/lib/python3.6/site-packages/advntr/genome_analyzer.py", line 214, in find_repeat_counts_from_pacbio_alignment_file
self.print_genotype(vid, copy_numbers)
File "/soft/Anaconda/anaconda3/lib/python3.6/site-packages/advntr/genome_analyzer.py", line 31, in print_genotype
self.print_genotype_in_vcf(vntr_id, genotype_result)
File "/soft/Anaconda/anaconda3/lib/python3.6/site-packages/advntr/genome_analyzer.py", line 101, in print_genotype_in_vcf
if genotype_result.copy_numbers is None:
AttributeError: 'tuple' object has no attribute 'copy_numbers'

command： advntr genotype --alignment_file NA12878.bam --models hg19_selected_VNTRs_Pacbio.db --working_directory new_log --outfile advntr_new --outfmt vcf --pacbio

adNVTR using assembly-level data

Hi,

Not an issue, I'm just wondering if adVNTR suits for assembly-level data generated from PacBio HiFi reads?

Thanks!

Consider creating a conda environment yaml

You may want to consider sharing a conda environment.yaml](https://conda.io/docs/user-guide/tasks/manage-environments.html#sharing-an-environment) or recipe for better reproducibility (versioning) and easier installation of dependencies.

why is python >= 3.7 skipped during build?

why is python >= 3.7 skipped in meta.yaml during build?

Reading VNTR IDs from a file instead of command line

Suggested by Nick in #17

python3

Hi,

I'd like to try your tool, looks really interesting. I noticed the README describes Python2.7, but is there any reason it wouldn't work for Python3?

Cheers,
Wouter

Error with genotype

Hi,

I've been trying to follow the quickstart guide to test adVNTR using the sample data provided. However, the following error keeps occuring

Using Theano backend.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/bin/advntr", line 11, in
sys.exit(main())
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/main.py", line 134, in main
genotype(args, genotype_parser)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/advntr_commands.py", line 124, in genotype
genome_analyzier.find_repeat_counts_from_alignment_file(input_file, average_coverage, args.update)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/genome_analyzer.py", line 241, in find_repeat_counts_from_alignment_file
average_coverage, update)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/vntr_finder.py", line 707, in find_repeat_count_from_alignment_file
selected_reads = self.select_illumina_reads(alignment_file, unmapped_filtered_reads, update)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/vntr_finder.py", line 658, in select_illumina_reads
hmm = self.get_vntr_matcher_hmm(read_length=read_length)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/vntr_finder.py", line 119, in get_vntr_matcher_hmm
vntr_matcher = self.build_vntr_matcher_hmm(copies, flanking_region_size)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/vntr_finder.py", line 101, in build_vntr_matcher_hmm
vntr_matcher = get_read_matcher_model(left_flanking_region, right_flanking_region, patterns, copies)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/hmm_utils.py", line 529, in get_read_matcher_model
repeats_matcher = get_variable_number_of_repeats_matcher_hmm(patterns, copies, vpaths)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/hmm_utils.py", line 476, in get_variable_number_of_repeats_matcher_hmm
model = get_constant_number_of_repeats_matcher_hmm(patterns, copies, vpaths)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/hmm_utils.py", line 405, in get_constant_number_of_repeats_matcher_hmm
transitions, emissions = build_profile_hmm_for_repeats(patterns, settings.MAX_ERROR_RATE)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profiler.py", line 8, in wrapper
retval = func(*args, **kwargs)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/advntr/profile_hmm.py", line 169, in build_profile_hmm_for_repeats
stdout, stderr = muscle_cline(stdin=data)
File "/home/users/astar/gis/ramul/miniconda3/envs/advntrenv/lib/python2.7/site-packages/Bio/Application/init.py", line 531, in call
stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'muscle -clwstrict', message 'Invalid command line'

command: advntr genotype --vntr_id 301645 --alignment_file CSTB_2_5_testdata.bam --working_directory working_dir

A similar error also occurs when running the addmodel command. I'm at quite a loss so any help would be deeply appreciated.

Build Data Base for other species

Hi!
I want to use advntr to generate VNTR data base for other species like no human primate and fruit fly. I have couple of questions:

Can I use command below to generate my own PacBio VNTR dataset?
advntr addmodel --reference reference.fa --chromosome chrosome --pattern Unit --start start --end end
What dose the scaled_score mean in the VNTR datesets and what dose this score used for?

Best，
Le

Convenient installation with PyPI and conda

Suggested by @wdecoster