illumina / expansionhunter

A tool for estimating repeat sizes

expansionhunter's Introduction

Expansion Hunter: a tool for estimating repeat sizes

There are a number of regions in the human genome consisting of repetitions of a short unit sequence (commonly a trimer). Such repeat regions can expand to a size much larger than the read length and thereby cause disease. Fragile X syndrome, ALS, and Huntington's disease are well-known examples.

Expansion Hunter aims to estimate the sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, or are fully contained in each repeat.
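
The three read categories mentioned above can be illustrated with a coordinate-only sketch. This is not how ExpansionHunter classifies reads internally (the papers listed under Method describe the actual approach); the function name classify_read and the 0-based, half-open coordinate convention are assumptions made for this example.

```python
def classify_read(read_start, read_end, repeat_start, repeat_end):
    """Classify a read by how its aligned interval relates to a repeat.

    All coordinates are 0-based and half-open. Hypothetical helper for
    illustration only; it looks at coordinates, not at sequence content.
    """
    # No overlap with the repeat at all.
    if read_end <= repeat_start or read_start >= repeat_end:
        return "outside"
    # The read covers the whole repeat plus some flank on both sides.
    if read_start < repeat_start and read_end > repeat_end:
        return "spanning"
    # The read lies entirely inside the repeat.
    if read_start >= repeat_start and read_end <= repeat_end:
        return "in-repeat"
    # Remaining case: the read overlaps one repeat boundary only.
    return "flanking"


if __name__ == "__main__":
    repeat = (100, 150)
    for read in [(90, 160), (90, 120), (110, 140), (0, 50)]:
        print(read, classify_read(*read, *repeat))
```

Spanning reads bound the repeat size directly, while flanking and in-repeat reads only give partial evidence, which is why repeats longer than the read length are harder to size.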

Linux and macOS operating systems are currently supported.

License

Expansion Hunter is provided under the terms and conditions of the PolyForm Strict License 1.0.0. It relies on several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.

Documentation

Installation instructions, usage guide, and description of file formats are contained in the docs folder.

Companion tools and resources

A genome-wide STR catalog containing polymorphic repeats with similar properties to known pathogenic and functional STRs
REViewer, a tool for visualizing alignments of reads in regions containing tandem repeats

Method

The method is described in the following papers:

Egor Dolzhenko, Joke van Vugt, Richard Shaw, Mitch Bekritsky, and others, Detection of long repeat expansions from PCR-free whole-genome sequence data, Genome Research 2017
Egor Dolzhenko, Viraj Deshpande, Felix Schlesinger, Peter Krusche, Roman Petrovski, and others, ExpansionHunter: A sequence-graph based tool to analyze variation in short tandem repeat regions, Bioinformatics 2019

expansionhunter's Issues

Warning messages and output

Hi, there,

We tried the program on our regions of interest and got some warning messages:

has low weighed purity score of 0.5

What does this mean?

Also, you have probably explained this somewhere and I may have missed it, but the program also prints:

Reasign flanking to spanning

How can we use this information?

Thanks.

Removed repeats NIPA1, GLS and RFC1

Hello,

I'm currently updating from v3 to v4 and found that the repeats NIPA1, GLS and RFC1 are missing from the variant catalog of the new release. Why were they removed?

Best regards,
Leon

Request: Expose the minimum weighted purity filter value

This was available for v2.x.x and no longer for 3.x.x. I think the relevant lines are here:

ExpansionHunter/region_analysis/LocusAnalyzer.cpp

Lines 190 to 191 in e2a9c5b

    
    const bool isFirstReadInrepeat = weightedPurityCalculator.score(read.sequence()) >= 0.90;
    const bool isSecondReadInrepeat = weightedPurityCalculator.score(mate.sequence()) >= 0.90;
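
To make the request concrete, here is a rough sketch of what a purity threshold does. It is an unweighted simplification and not the WeightedPurityCalculator from the repository: the functions purity and is_in_repeat are hypothetical, reverse-complement matching is ignored, and only the 0.90 cutoff is taken from the lines quoted above. Exposing the value would amount to making min_purity a user-settable parameter.

```python
def purity(sequence, unit):
    """Best fraction of bases matching a perfect tiling of `unit`.

    Every rotation of the unit is tried so that a read starting
    mid-unit is not penalized. Unweighted, illustration only.
    """
    if not sequence or not unit:
        raise ValueError("sequence and unit must be non-empty")
    best = 0.0
    for shift in range(len(unit)):
        matches = sum(
            1
            for i, base in enumerate(sequence)
            if base == unit[(i + shift) % len(unit)]
        )
        best = max(best, matches / len(sequence))
    return best


def is_in_repeat(sequence, unit, min_purity=0.90):
    """Apply a purity cutoff; 0.90 mirrors the hard-coded value above."""
    return purity(sequence, unit) >= min_purity


if __name__ == "__main__":
    read = "CAGCAGCAGCATCAGCAGCAG"  # one mismatch in 21 bases
    print(purity(read, "CAG"), is_in_repeat(read, "CAG"))
    print(is_in_repeat(read, "CAG", min_purity=0.99))
```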

To add gene ARX to variant catalog

Hi,
I want to add the ARX gene to the variant catalog. The repeat regions are chrX:25031766-25031814 (repeat unit NGC) and chrX:25031646-25031682 (repeat unit NGC), but there is a long sequence between the two repeat regions (long sequence: ccctgcgccgtccggccgttccccgggccgcgcggTTGGCGGTGGCGGCGGAGGGGCCTCCCCGCGTGGACccgccgtggccgt).
Could you help me add this gene?
Thanks in advance
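
One possible starting point, offered as an untested sketch rather than a validated catalog entry: the catalog format allows several repeats separated by fixed sequence within one LocusStructure, with ReferenceRegion and VariantType given as lists. The script below assembles such an entry from the coordinates and sequence quoted in this issue. The variant IDs ARX_1 and ARX_2, the choice to uppercase the interruption, and the assumption that the quoted coordinates and repeat unit are already in the orientation and convention the catalog expects are all assumptions that would need checking against the reference genome.

```python
import json

# Values copied from the issue text; the lower-coordinate repeat goes first.
first_repeat = ("chrX", 25031646, 25031682)
second_repeat = ("chrX", 25031766, 25031814)
interruption = (
    "ccctgcgccgtccggccgttccccgggccgcgcgg"
    "TTGGCGGTGGCGGCGGAGGGGCCTCCCCGCGTGGAC"
    "ccgccgtggccgt"
).upper()

# Sanity check: the fixed sequence should fill the gap between the repeats.
gap = second_repeat[1] - first_repeat[2]
gap_matches = gap == len(interruption)

entry = {
    "LocusId": "ARX",
    "LocusStructure": f"(NGC)*{interruption}(NGC)*",
    "ReferenceRegion": [
        "{}:{}-{}".format(*first_repeat),
        "{}:{}-{}".format(*second_repeat),
    ],
    "VariantId": ["ARX_1", "ARX_2"],  # hypothetical IDs
    "VariantType": ["Repeat", "Repeat"],
}

if __name__ == "__main__":
    print("gap between repeats:", gap, "| interruption length:", len(interruption))
    print("lengths agree:", gap_matches)
    print(json.dumps([entry], indent=4))
```

The length check is the useful part: if the interrupting sequence does not exactly fill the gap between the two reference regions, either the coordinates or the sequence need adjusting before the entry is used.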

Repeat Expansion for Mutation in FMR1 gene

Hi,

I am trying to determine the repeat expansion in the FMR1 gene for an affected patient, but I am not sure whether a BAM file aligned to the hg19 reference is sufficient, because ExpansionHunter was unable to report the repeat count from that BAM. Is there a specific command or modification that would help me get it?

Request: Add LocusId to the output VCF as an INFO field

Assertion `!frag[0].name.empty() && !frag[1].name.empty()' failed

Options:

--sex female --min-score -0.1 --min-baseq 20 --min-anchor-mapq 0 --region-extension-length 10000 --read-depth 30

Assertion produced by ExpansionHunter

ExpansionHunter: src/irr_counting.cc:332: int CountAlignedIrr(const BamFile&, const Parameters&, const AlignPairs&, std::map<std::basic_string<char>, int>&, const std::vector<std::vector<std::basic_string<char> > >&, std::vector<RepeatAlign>*): Assertion `!frag[0].name.empty() && !frag[1].name.empty()' failed.

Request to append the ATTTC expansion in DAB1 to the variant catalog

Hey all, thank you for this great tool!

I would like to ask if you could add the ATTTC-Repeat expansion (https://www.ncbi.nlm.nih.gov/books/NBK541729/ , https://onlinelibrary.wiley.com/doi/full/10.1002/humu.23704) in the DAB1 gene to the variant catalog (grch37/hg19)?

Thank you very much in advance!

Request to add custom variant catalogue for NIPA1

I am interested in analysing the NIPA1 locus in WGS data using ExpansionHunter (publication reference: Tazelaar et al., 2019. Association of NIPA1 repeat expansions with amyotrophic lateral sclerosis in a large international cohort.). Are you able to provide assistance with defining a custom catalogue entry to analyse this locus? Thanks, Melissa.

Looking for a variant catalog for C. elegans

I would like to use ExpansionHunter on C. elegans WGS data, and I'm wondering if you are aware of a variant catalog (in JSON format) for this model organism.

I know that the Caenorhabditis elegans Natural Diversity Resource (CeNDR) has variant data in VCF format. Is there a helper script for converting this information into a custom variant catalog?

output vcf file

Hi, ExpansionHunter developers,

I noticed that in the output vcf file, the GT field was labeled as 1, instead of ./., 0/0, 0/1, 1/1. What does this mean?

Thanks.

George

Getting error message

Hi,
I am trying to look for SCA repeats in my whole-genome data, but while running the tool I get this error:
"2019-11-11T14:50:27,[size_in_units = 23 is outside of allowed range (0,22)]"

The program then exits without generating any output.

Cannot process offtarget mates for locus_n because repeat unit is not set

Hello,

I have an issue about the off targets. At some point the program stops and throws this error. But when I check the catalog.json file the entry looks fine:
{
    "ReferenceRegion": "chr8:16007595-16007646",
    "VariantType": "Repeat",
    "LocusStructure": "(TATC)*",
    "LocusId": "locus_n",
    "OfftargetRegions": [
        "chr14:27990612-27990623",
        "chr7:54347739-54348473",
        "chrX:77460138-77460149",
        "chr7:67149722-67149781",
        "chr8:20573715-20573769"
    ]
}

I got the error for another entry as well:
{
    "ReferenceRegion": "chr8:16001967-16002012",
    "VariantType": "Repeat",
    "LocusStructure": "(AC)*",
    "LocusId": "locus_m",
    "OfftargetRegions": [
        "chr1:179020154-179020196",
        "chrX:12681696-12681747",
        "chr14:89416169-89416251",
        "chr21:37817465-37818005",
        "chr2:878549-878569"
    ]
}

When I remove the off-target region "chr21:37817465-37818005", it works. I'm not sure what the problem with that region is, though.

Thanks a lot

Best,

Volkan

Request to add PABPN1 to variant catalog

Hi,

I want to find PABPN1 repeat expansions, but I saw this repeat is not in the variant catalog. Is it possible to get the correct entry for hg19 for this locus?
Thanks in advance!

Advanced options

Hi Egor

What is the difference between the dag-aligner and path-aligner and the seeking and streaming analysis modes under the advanced options? I would like to understand when it may be useful to specify these options.

Thanks,
Melissa

bad_alloc for early regions

I get a bad_alloc after about 20 minutes when the catalog includes at least one region with very small start/stop values, for example:

    {
        "LocusId": "test",
        "LocusStructure": "(TAACCC)*",
        "ReferenceRegion": "chr1:9999-10500",
        "VariantType": "Repeat"
    }
]

The magic number seems to be 10991 or less, and the error seems to occur while parsing the catalog JSON, since the repeat detection step doesn't even start.

Edit: Sorry, 10991 still works but takes MUCH longer to parse than regions at higher coordinates. 9999 definitely fails.

High RAM usage in "seeking" mode.

Hello,

I've been running ExpansionHunter in "seeking" mode for a few hundred variants, and I see that the RAM usage keeps increasing and goes up to around 14-16 GB. I was wondering if this can be reduced or controlled in some manner.

About locus length

Hello,

I have a question about the repeat locus coordinates. When I extend the repeat coordinates by even 2 bp upstream and downstream relative to the coordinates given in your variant catalog, I see a change of around 20-30 repeat units for some samples. Could you please help me interpret such results?

Regards,
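
As a small aid for this kind of experiment, the sketch below computes how many whole repeat units a reference region holds and how many bases are left over. It does not explain the 20-30 unit difference reported above; it only shows that shifting the boundaries by 2 bp moves them off the unit boundaries. The function name reference_units, the example coordinates, and the 0-based half-open interpretation of the region string are assumptions.

```python
def reference_units(region, unit):
    """Return (whole_units, leftover_bases) for a 'contig:start-end' region.

    Assumes end - start is the region length (0-based, half-open).
    Hypothetical helper for illustration.
    """
    _contig, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    if end <= start:
        raise ValueError(f"empty or inverted region: {region}")
    return divmod(end - start, len(unit))


if __name__ == "__main__":
    # Made-up coordinates: a region holding exactly ten CAG units,
    # then the same region widened by 2 bp on each side.
    print(reference_units("chr1:1000-1030", "CAG"))
    print(reference_units("chr1:998-1032", "CAG"))
```

A non-zero leftover means the region boundaries cut into flanking sequence or into a partial unit, which is worth knowing before comparing genotypes produced with different coordinates.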

`analyzers.size() == 1'

Hello,

I'm running into the issue pasted below:

"ExpansionHunter: /opt/conda/conda-bld/expansionhunter_1602022417547/work/sample_analysis/HtsSeekingSampleAnalysis.cpp:186: void ehunter::{anonymous}::analyzeReadPair(ehunter::AnalyzerFinder&, const ehunter::Read&, const ehunter::Read&, const AlignmentStatsCatalog&): Assertion `analyzers.size() == 1' failed.
ExpansionHunter --reads sample.bam --hg38.fa --variant-catalog EH.json --output-prefix EH_out/sample"

The strange thing is it happens at a certain entry in the json file:

{
    "ReferenceRegion": "chr7:87547571-87547601",
    "VariantType": "RareRepeat",
    "LocusStructure": "(AAAGGAAGGGAAGGGAAGGG)*",
    "LocusId": "ABCB1_38",
    "OfftargetRegions": [
        "chr20:18695680-18695731",
        "chr4:137894908-137895913",
        "chr3:132044263-132045046",
        "chr7:87547667-87547860"
    ]
},

I'm not quite sure about this error and I couldn't find anything about it.

Thanks a lot

Volkan
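
One thing that can be checked mechanically in the entry above is whether any off-target region sits close to the reference region itself. The sketch below flags off-target regions on the same contig that fall within a window around the reference region. Whether such proximity is what triggers the assertion is not established here; the helper names and the 1000 bp window (chosen to resemble a region extension length) are assumptions.

```python
def parse_region(region):
    """Split 'contig:start-end' into (contig, start, end)."""
    contig, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return contig, start, end


def offtargets_near_reference(entry, extension=1000):
    """List off-target regions overlapping the extended reference region.

    `extension` is an assumed window size in bp, not a value read
    from ExpansionHunter.
    """
    ref_contig, ref_start, ref_end = parse_region(entry["ReferenceRegion"])
    window_start, window_end = ref_start - extension, ref_end + extension
    flagged = []
    for region in entry.get("OfftargetRegions", []):
        contig, start, end = parse_region(region)
        if contig == ref_contig and start < window_end and end > window_start:
            flagged.append(region)
    return flagged


entry = {
    "ReferenceRegion": "chr7:87547571-87547601",
    "VariantType": "RareRepeat",
    "LocusStructure": "(AAAGGAAGGGAAGGGAAGGG)*",
    "LocusId": "ABCB1_38",
    "OfftargetRegions": [
        "chr20:18695680-18695731",
        "chr4:137894908-137895913",
        "chr3:132044263-132045046",
        "chr7:87547667-87547860",
    ],
}

if __name__ == "__main__":
    print(offtargets_near_reference(entry))
```

For this entry the last off-target region starts 66 bp downstream of the reference region, so it is the one flagged. Removing it and rerunning would be a quick way to test whether it is related to the failure.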

Details on repeat unit of interrupted and alternate alleles

The two most common issues we have right now pertain to annotating the repeat units. In particular A) the precise repeat unit (for RFC1) present in a repeat that matches and B) interrupted- versus non-interrupted allele expansions as in ATXN1.

The latter (B) is perhaps to some extent covered by repeat purity, and may be solved by exposing/using it, but automatically recovering the longest uninterrupted pure sub-stretch would be useful.

It would be helpful for screening to be able to see already from the VCF if the discovered RFC1 alleles were AAGGG or AAAAG - or one of the other slightly more rare versions - and the zygosity to tell if the expanded locus was homozygous normal - or pathologic.

Install error "[stats/CMakeFiles/stats.dir/all] Error 2"

Hi,

I got the following error message when typing make.

Scanning dependencies of target reads
[ 39%] Building CXX object reads/CMakeFiles/reads.dir/Read.cpp.o
[ 40%] Building CXX object reads/CMakeFiles/reads.dir/ReadPairs.cpp.o
[ 41%] Linking CXX static library libreads.a
[ 41%] Built target reads
Scanning dependencies of target stats
[ 41%] Building CXX object stats/CMakeFiles/stats.dir/LocusStats.cpp.o
In file included from /home/software/boost_1_64_0/boost/numeric/ublas/vector.hpp:21:0,
                 from /home/software/boost_1_64_0/boost/numeric/ublas/matrix.hpp:18,
                 from /home/software/boost_1_64_0/boost/accumulators/statistics/covariance.hpp:22,
                 from /home/software/boost_1_64_0/boost/accumulators/statistics.hpp:13,
                 from /home/niuyw/software/ExpansionHunter/stats/LocusStats.hh:29,
                 from /home/niuyw/software/ExpansionHunter/stats/LocusStats.cpp:22:
/home/software/boost_1_64_0/boost/numeric/ublas/storage.hpp: In member function ‘void boost::numeric::ublas::unbounded_array<T, ALLOC>::serialize(Archive&, unsigned int)’:
/home/software/boost_1_64_0/boost/numeric/ublas/storage.hpp:299:18: error: ‘make_array’ is not a member of ‘boost::serialization’
             ar & serialization::make_array(data_, s);
                  ^
/home/software/boost_1_64_0/boost/numeric/ublas/storage.hpp: In member function ‘void boost::numeric::ublas::bounded_array<T, N, ALLOC>::serialize(Archive&, unsigned int)’:
/home/software/boost_1_64_0/boost/numeric/ublas/storage.hpp:494:18: error: ‘make_array’ is not a member of ‘boost::serialization’
             ar & serialization::make_array(data_, s);
                  ^
In file included from /home/software/boost_1_64_0/boost/accumulators/statistics/covariance.hpp:22:0,
                 from /home/software/boost_1_64_0/boost/accumulators/statistics.hpp:13,
                 from /home/niuyw/software/ExpansionHunter/stats/LocusStats.hh:29,
                 from /home/niuyw/software/ExpansionHunter/stats/LocusStats.cpp:22:
/home/software/boost_1_64_0/boost/numeric/ublas/matrix.hpp: In member function ‘void boost::numeric::ublas::c_matrix<T, M, N>::serialize(Archive&, unsigned int)’:
/home/software/boost_1_64_0/boost/numeric/ublas/matrix.hpp:5977:18: error: ‘make_array’ is not a member of ‘boost::serialization’
             ar & serialization::make_array(data_, N);
                  ^
make[2]: *** [stats/CMakeFiles/stats.dir/LocusStats.cpp.o] Error 1
make[1]: *** [stats/CMakeFiles/stats.dir/all] Error 2
make: *** [all] Error 2

The boost version is 1.64.0. cmake version is 3.8.0-rc4. gcc version is 4.9.3.

Do you have any ideas?

Bests,
Yiwei Niu

Realigned BAM visualization

Hello,

I was trying to visualize the realigned reads, but it seems like the format is off somehow - IGV is not able to visualize the reads. I can't seem to spot why. Any suggestions?

Thank you in advance!

Is it possible to identify potential disease-associated STR loci genome-wide using ExpansionHunter?

Hi,

I have about 200 WGS samples from patients with a neurogenetic disease and 200 control samples. I want to use ExpansionHunter to test whether this disease is associated with specific STR loci.

I know ExpansionHunter uses a pre-defined set of loci as its reference, but I wonder if it is possible to use it genome-wide, say with the simple repeat track from UCSC. If so, could you give me some advice on transforming the track into the format that ExpansionHunter needs?

I really appreciate any help you can provide.

Bests,
Yiwei Niu
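
A minimal sketch of such a conversion, assuming the track has first been reduced to four tab-separated columns (contig, start, end, repeat unit). That four-column layout, the function name track_to_catalog, the LocusId naming scheme, and the unit-length filter are all assumptions; the real UCSC table has more columns, and the coordinate convention should be checked against the catalog documentation in the docs folder. Only the output field names follow the catalog entries shown elsewhere on this page.

```python
import json


def track_to_catalog(lines, max_unit_len=6):
    """Turn 'contig<TAB>start<TAB>end<TAB>unit' records into catalog entries.

    Records with long or non-ACGT units are skipped. Hypothetical helper.
    """
    catalog = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        contig, start, end, unit = line.split("\t")[:4]
        unit = unit.upper()
        if len(unit) > max_unit_len or set(unit) - set("ACGT"):
            continue
        catalog.append(
            {
                "LocusId": f"{contig}_{start}_{end}",
                "LocusStructure": f"({unit})*",
                "ReferenceRegion": f"{contig}:{start}-{end}",
                "VariantType": "Repeat",
            }
        )
    return catalog


if __name__ == "__main__":
    example = [
        "#contig\tstart\tend\tunit",
        "chr1\t100\t130\tcag",
        "chr2\t5\t500\tACGTACGTAC",  # dropped: unit longer than 6 bp
    ]
    print(json.dumps(track_to_catalog(example), indent=4))
```

A genome-wide catalog built this way can contain a very large number of loci, so it is worth starting with a filtered subset to gauge runtime and memory.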

Running with an unexpected Error

Hi,
First, I'd like to thank you for this project, as it seems to fit what I'm doing right now. I'm looking at a region where the reference itself is highly repetitive, and I'm not sure how many of the repeats my reads contain.
I'm running this on hg38-aligned files and ran into problems. The variant catalog file didn't contain the region that I am interested in, so it gives an error like this:

2019-03-01T18:21:45,[Could not recover the mate of E00387:228:HN7TFCCXY:2:1205:22912:64245/1]

When I created my own JSON catalog, I got this error:

2019-03-01T17:51:07,[Definition of variant ID=XXX;classification=SmallVariant/Swap;ReferenceLocus=(4):-;optionalRefNode=None is inconsistent]

I'm not sure what this means exactly. The "ReferenceRegion" I provided is 'chr5:486858-486869', but based on this error I don't think it is being parsed properly. Could you explain what it means for a definition to be 'inconsistent'?

Thank you very much
Bowei

Recommendations for setting --min-score when the repeat is inexact

For many larger repeats, the repeat unit isn't repeated exactly. Besides a few base differences, there are also length differences (one or two more or fewer bases). For mismatches alone, I assume it makes sense to set --min-score to some reasonable value based on how different the sequences are (I'd like advice on how to determine this value). I am also wondering whether ExpansionHunter will handle length differences. I will send you the repeat via email (egor-dolzhenko).

I could align the known repeat units in that large VNTR to determine a similarity score similar to the weighted score ExpansionHunter uses.

Is it possible to specify the log file?

Dear ExpansionHunter team,

Is it possible to specify the log file (ideally in append mode) or to log to stdout/stderr?
Thank you very much for your answer.

Anne-Sophie

ExpansionHunter for PCR-not-free protocol

Hi! Thank you for your great tool!
I have the same question as here. Could I use ExpansionHunter for samples prepared with a PCR-not-free protocol? I'm using the Nextera DNA Flex protocol for WGS. I've tried running ExpansionHunter and it works; the test.log and test.json.txt files are attached. How can I tell whether ExpansionHunter is working correctly?

Expansion Hunter v2.5.5

Originally posted by @golubnikova in #35 (comment)

Skipping low coverage genes

Hi,

I noticed that genes with a coverage below 10 are skipped in the calling. However, when I used your other tool, GraphAlignmentViewer, I noticed that the images sometimes show a correct repeat size. That's why I wanted to ask whether it is possible to lower the threshold of 10, since some images show a proper repeat size even though the loci are skipped by ExpansionHunter.

PCR-free WGS? Illumina SeqLab specific TruSeq Nano High Throughput library?

Hello Egor,
this is a really interesting project. I'm trying to work out whether we can apply it to our data. I noticed that in your methods you state that some of the samples were prepared using Illumina SeqLab specific TruSeq Nano High Throughput library kits, which is what was used in our WGS, in combination with Illumina HiSeqX sequencing. What was your experience with the samples prepared using Illumina SeqLab specific TruSeq Nano High Throughput libraries? Did your method work despite the initial PCR step?
Thank you!

EHv4 link error on MacOSX: "Undefined symbols for architecture x86_64: "testing::internal::g_linked_ptr_mutex"

On MacOSX I'm seeing this link error when building master:

wm4d1-03b:~/p1/bin/ExpansionHunter-repo/build $ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: ~/project1/bin/ExpansionHunter-repo/build/googletest-download
[ 11%] Performing update step for 'googletest'
[ 22%] No patch step for 'googletest'
[ 33%] No configure step for 'googletest'
[ 44%] No build step for 'googletest'
[ 55%] No install step for 'googletest'
[ 66%] No test step for 'googletest'
[ 77%] Completed 'googletest'
[100%] Built target googletest
-- Found Boost: /usr/local/lib/cmake/Boost-1.73.0/BoostConfig.cmake (found suitable version "1.73.0", minimum required is "1.4") found components: program_options filesystem regex date_time system
-- Found Boost: /usr/local/lib/cmake/Boost-1.73.0/BoostConfig.cmake (found suitable version "1.73.0", minimum required is "1.5") found components: program_options filesystem system
-- Configuring done
-- Generating done
-- Build files have been written to: ~/p1/bin/ExpansionHunter-repo/build

wm4d1-03b:~/p1/bin/ExpansionHunter-repo/build $ make
Scanning dependencies of target zlib
[  1%] Creating directories for 'zlib'
[  1%] Performing download step (git clone) for 'zlib'
-- zlib download command succeeded.  See also ~/p1/bin/ExpansionHunter-repo/build/thirdparty/zlib/src/zlib-stamp/zlib-download-*.log
[  2%] No update step for 'zlib'
[  2%] No patch step for 'zlib'
[  3%] Performing configure step for 'zlib'
Checking for gcc...
Building static library libz.a version 1.2.8 with gcc.
Checking for off64_t... No.
Checking for fseeko... Yes.
Checking for strerror... Yes.
Checking for unistd.h... Yes.
Checking for stdarg.h... Yes.
Checking whether to use vs[n]printf() or s[n]printf()... using vs[n]printf().
Checking for vsnprintf() in stdio.h... Yes.
Checking for return value of vsnprintf()... Yes.
Checking for attribute(visibility) support... Yes.
[  4%] Performing build step for 'zlib'
inflate.c:1507:61: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
    if (strm == Z_NULL || strm->state == Z_NULL) return -1L << 16;
                                                        ~~~ ^
1 warning generated.
[  4%] Performing install step for 'zlib'
-- zlib install command succeeded.  See also ~/p1/bin/ExpansionHunter-repo/build/thirdparty/zlib/src/zlib-stamp/zlib-install-*.log
[  5%] Completed 'zlib'
[  5%] Built target zlib
Scanning dependencies of target htslib
[  6%] Creating directories for 'htslib'
[  6%] Performing download step (git clone) for 'htslib'
-- htslib download command succeeded.  See also ~/p1/bin/ExpansionHunter-repo/build/thirdparty/htslib/src/htslib-stamp/htslib-download-*.log
[  7%] No update step for 'htslib'
[  7%] No patch step for 'htslib'
[  8%] No configure step for 'htslib'
[  8%] Performing build step for 'htslib'
hts.c:48:5: warning: unused function 'ks_getc' [-Wunused-function]
    KSTREAM_INIT2(, BGZF*, bgzf_read, 65536)
    ^
./htslib/kseq.h:152:2: note: expanded from macro 'KSTREAM_INIT2'
        __KS_INLINED(__read)
        ^
./htslib/kseq.h:68:20: note: expanded from macro '__KS_INLINED'
        static inline int ks_getc(kstream_t *ks) \
                          ^
1 warning generated.
cram/cram_io.c:3096:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:311: note: expanded from macro 'itf8_put'
  ...=0xf0|(((v)>>28)&0xff),(c)[1]=((v)>>20)&0xff,(c)[2]=((v)>>12)&0xff,(c)[3]=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~^~~~~~~~~~~~~~~~~
cram/cram_io.c:3096:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:345: note: expanded from macro 'itf8_put'
  ...=((v)>>20)&0xff,(c)[2]=((v)>>12)&0xff,(c)[3]=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~~~~~~^~~~~
cram/cram_io.c:3096:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:367: note: expanded from macro 'itf8_put'
  ...=((v)>>12)&0xff,(c)[3]=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~~~~~~^~~~~
cram/cram_io.c:3096:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:388: note: expanded from macro 'itf8_put'
  ...=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~~~~~^~~~~
cram/cram_io.c:3154:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:311: note: expanded from macro 'itf8_put'
  ...=0xf0|(((v)>>28)&0xff),(c)[1]=((v)>>20)&0xff,(c)[2]=((v)>>12)&0xff,(c)[3]=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~^~~~~~~~~~~~~~~~~
cram/cram_io.c:3154:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:345: note: expanded from macro 'itf8_put'
  ...=((v)>>20)&0xff,(c)[2]=((v)>>12)&0xff,(c)[3]=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~~~~~~^~~~~
cram/cram_io.c:3154:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:367: note: expanded from macro 'itf8_put'
  ...=((v)>>12)&0xff,(c)[3]=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~~~~~~^~~~~
cram/cram_io.c:3154:8: warning: implicit conversion from 'int' to 'char' changes value from 255 to -1 [-Wconstant-conversion]
        cp += itf8_put(cp, -2);
              ^~~~~~~~~~~~~~~~
cram/cram_io.h:97:388: note: expanded from macro 'itf8_put'
  ...=((v)>>4)&0xff,(c)[4]=(v)&0xf,5))
     ~~~~~~~~~^~~~~
8 warnings generated.
[  8%] Performing install step for 'htslib'
[  9%] Completed 'htslib'
[  9%] Built target htslib
Scanning dependencies of target graphtools
[ 10%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/GaplessAligner.cpp.o
[ 10%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/GappedAligner.cpp.o
[ 11%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/GraphAlignment.cpp.o
[ 11%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/GraphAlignmentOperations.cpp.o
[ 12%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/KmerIndex.cpp.o
[ 12%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/KmerIndexOperations.cpp.o
[ 13%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/LinearAlignment.cpp.o
[ 13%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/LinearAlignmentOperations.cpp.o
[ 14%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/Operation.cpp.o
[ 15%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/OperationOperations.cpp.o
[ 15%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/PinnedAligner.cpp.o
[ 16%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/TracebackMatrix.cpp.o
[ 16%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/TracebackRunner.cpp.o
[ 17%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphalign/dagAligner/PenaltyMatrix.cpp.o
[ 17%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/Graph.cpp.o
[ 18%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/GraphBuilders.cpp.o
[ 18%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/GraphCoordinates.cpp.o
[ 19%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/GraphOperations.cpp.o
[ 19%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/GraphReferenceMapping.cpp.o
[ 20%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/Path.cpp.o
[ 21%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/PathFamily.cpp.o
[ 21%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/PathFamilyOperations.cpp.o
[ 22%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphcore/PathOperations.cpp.o
[ 22%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphio/AlignmentWriter.cpp.o
[ 23%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphio/GraphJson.cpp.o
[ 23%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphutils/DepthTest.cpp.o
[ 24%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphutils/IntervalBuffer.cpp.o
[ 24%] Building CXX object thirdparty/graph-tools-master/CMakeFiles/graphtools.dir/src/graphutils/SequenceOperations.cpp.o
[ 25%] Linking CXX static library libgraphtools.a
[ 25%] Built target graphtools
Scanning dependencies of target region_spec
[ 25%] Building CXX object region_spec/CMakeFiles/region_spec.dir/LocusSpecification.cpp.o
[ 26%] Building CXX object region_spec/CMakeFiles/region_spec.dir/VariantSpecification.cpp.o
[ 27%] Linking CXX static library libregion_spec.a
[ 27%] Built target region_spec
Scanning dependencies of target common
[ 28%] Building CXX object common/CMakeFiles/common.dir/Common.cpp.o
[ 29%] Building CXX object common/CMakeFiles/common.dir/CountTable.cpp.o
[ 29%] Building CXX object common/CMakeFiles/common.dir/GenomicRegion.cpp.o
[ 30%] Building CXX object common/CMakeFiles/common.dir/HtsHelpers.cpp.o
[ 30%] Building CXX object common/CMakeFiles/common.dir/Parameters.cpp.o
[ 31%] Building CXX object common/CMakeFiles/common.dir/Reference.cpp.o
[ 31%] Building CXX object common/CMakeFiles/common.dir/ReferenceContigInfo.cpp.o
[ 32%] Linking CXX static library libcommon.a
[ 32%] Built target common
Scanning dependencies of target filtering
[ 32%] Building CXX object filtering/CMakeFiles/filtering.dir/GraphVariantAlignmentStats.cpp.o
[ 33%] Building CXX object filtering/CMakeFiles/filtering.dir/OrientationPredictor.cpp.o
[ 33%] Linking CXX static library libfiltering.a
[ 33%] Built target filtering
Scanning dependencies of target stats
[ 34%] Building CXX object stats/CMakeFiles/stats.dir/LocusStats.cpp.o
[ 34%] Building CXX object stats/CMakeFiles/stats.dir/ReadSupportCalculator.cpp.o
[ 35%] Building CXX object stats/CMakeFiles/stats.dir/WeightedPurityCalculator.cpp.o
[ 35%] Linking CXX static library libstats.a
[ 35%] Built target stats
Scanning dependencies of target input
[ 36%] Building CXX object input/CMakeFiles/input.dir/CatalogLoading.cpp.o
[ 37%] Building CXX object input/CMakeFiles/input.dir/GraphBlueprint.cpp.o
[ 37%] Building CXX object input/CMakeFiles/input.dir/LocusSpecDecoding.cpp.o
[ 38%] Building CXX object input/CMakeFiles/input.dir/ParameterLoading.cpp.o
[ 38%] Building CXX object input/CMakeFiles/input.dir/RegionGraph.cpp.o
[ 39%] Building CXX object input/CMakeFiles/input.dir/SampleStats.cpp.o
[ 39%] Linking CXX static library libinput.a
[ 39%] Built target input
Scanning dependencies of target genotyping
[ 40%] Building CXX object genotyping/CMakeFiles/genotyping.dir/AlignMatrix.cpp.o
[ 41%] Building CXX object genotyping/CMakeFiles/genotyping.dir/AlleleChecker.cpp.o
[ 41%] Building CXX object genotyping/CMakeFiles/genotyping.dir/RepeatGenotype.cpp.o
[ 42%] Building CXX object genotyping/CMakeFiles/genotyping.dir/SmallVariantGenotype.cpp.o
[ 42%] Building CXX object genotyping/CMakeFiles/genotyping.dir/SmallVariantGenotyper.cpp.o
[ 43%] Building CXX object genotyping/CMakeFiles/genotyping.dir/StrAlign.cpp.o
[ 43%] Building CXX object genotyping/CMakeFiles/genotyping.dir/OneAlleleStrGenotyper.cpp.o
[ 44%] Building CXX object genotyping/CMakeFiles/genotyping.dir/TwoAlleleStrGenotyper.cpp.o
[ 44%] Building CXX object genotyping/CMakeFiles/genotyping.dir/FragLogliks.cpp.o
[ 45%] Building CXX object genotyping/CMakeFiles/genotyping.dir/StrGenotyper.cpp.o
[ 45%] Building CXX object genotyping/CMakeFiles/genotyping.dir/AlignMatrixFiltering.cpp.o
[ 46%] Linking CXX static library libgenotyping.a
[ 46%] Built target genotyping
Scanning dependencies of target reads
[ 47%] Building CXX object reads/CMakeFiles/reads.dir/Read.cpp.o
[ 48%] Building CXX object reads/CMakeFiles/reads.dir/ReadPairs.cpp.o
[ 48%] Linking CXX static library libreads.a
[ 48%] Built target reads
Scanning dependencies of target classification
[ 48%] Building CXX object classification/CMakeFiles/classification.dir/AlignmentClassifier.cpp.o
[ 49%] Building CXX object classification/CMakeFiles/classification.dir/ClassifierOfAlignmentsToVariant.cpp.o
[ 49%] Linking CXX static library libclassification.a
[ 49%] Built target classification
Scanning dependencies of target alignment
[ 50%] Building CXX object alignment/CMakeFiles/alignment.dir/AlignmentFilters.cpp.o
[ 50%] Building CXX object alignment/CMakeFiles/alignment.dir/AlignmentTweakers.cpp.o
[ 51%] Building CXX object alignment/CMakeFiles/alignment.dir/GreedyAlignmentIntersector.cpp.o
[ 51%] Building CXX object alignment/CMakeFiles/alignment.dir/HighQualityBaseRunFinder.cpp.o
[ 52%] Building CXX object alignment/CMakeFiles/alignment.dir/OperationsOnAlignments.cpp.o
[ 52%] Building CXX object alignment/CMakeFiles/alignment.dir/SoftclippingAligner.cpp.o
[ 53%] Linking CXX static library libalignment.a
[ 53%] Built target alignment
Scanning dependencies of target region_analysis
[ 54%] Building CXX object region_analysis/CMakeFiles/region_analysis.dir/LocusAnalyzer.cpp.o
[ 54%] Building CXX object region_analysis/CMakeFiles/region_analysis.dir/LocusFindings.cpp.o
[ 55%] Building CXX object region_analysis/CMakeFiles/region_analysis.dir/RepeatAnalyzer.cpp.o
[ 55%] Building CXX object region_analysis/CMakeFiles/region_analysis.dir/SmallVariantAnalyzer.cpp.o
[ 56%] Building CXX object region_analysis/CMakeFiles/region_analysis.dir/VariantAnalyzer.cpp.o
[ 56%] Building CXX object region_analysis/CMakeFiles/region_analysis.dir/VariantFindings.cpp.o
[ 57%] Linking CXX static library libregion_analysis.a
[ 57%] Built target region_analysis
Scanning dependencies of target sample_analysis
[ 57%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/AnalyzerFinder.cpp.o
[ 58%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/GenomeMask.cpp.o
[ 58%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/GenomeQueryCollection.cpp.o
[ 59%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/HtsFileSeeker.cpp.o
[ 59%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/HtsFileStreamer.cpp.o
[ 60%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/HtsSeekingSampleAnalysis.cpp.o
[ 60%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/HtsStreamingSampleAnalysis.cpp.o
[ 61%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/IndexBasedDepthEstimate.cpp.o
[ 61%] Building CXX object sample_analysis/CMakeFiles/sample_analysis.dir/MateExtractor.cpp.o
[ 62%] Linking CXX static library libsample_analysis.a
[ 62%] Built target sample_analysis
Scanning dependencies of target output
[ 63%] Building CXX object output/CMakeFiles/output.dir/BamletWriter.cpp.o
[ 63%] Building CXX object output/CMakeFiles/output.dir/JsonWriter.cpp.o
[ 64%] Building CXX object output/CMakeFiles/output.dir/VcfHeader.cpp.o
[ 64%] Building CXX object output/CMakeFiles/output.dir/VcfWriter.cpp.o
[ 65%] Building CXX object output/CMakeFiles/output.dir/VcfWriterHelpers.cpp.o
[ 65%] Linking CXX static library liboutput.a
[ 65%] Built target output
Scanning dependencies of target ExpansionHunter
[ 66%] Building CXX object CMakeFiles/ExpansionHunter.dir/src/ExpansionHunter.cpp.o
[ 66%] Linking CXX executable ExpansionHunter
ld: warning: direct access in function 'boost::wrapexcept<boost::program_options::validation_error>::rethrow() const' from file '/usr/local/lib/libboost_program_options-mt.a(value_semantic.o)' to global weak symbol 'typeinfo for boost::wrapexcept<boost::program_options::validation_error>' from file 'input/libinput.a(ParameterLoading.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'boost::filesystem::detail::directory_iterator_construct(boost::filesystem::directory_iterator&, boost::filesystem::path const&, unsigned int, boost::system::error_code*)' from file '/usr/local/lib/libboost_filesystem-mt.a(directory.o)' to global weak symbol 'boost::system::detail::is_generic_value(int)::gen' from file 'input/libinput.a(SampleStats.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'boost::system::detail::system_error_category::default_error_condition(int) const' from file '/usr/local/lib/libboost_filesystem-mt.a(operations.o)' to global weak symbol 'boost::system::detail::is_generic_value(int)::gen' from file 'input/libinput.a(SampleStats.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
[ 66%] Built target ExpansionHunter
Scanning dependencies of target gtest
[ 66%] Building CXX object googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 67%] Linking CXX static library ../../lib/libgtest.a
[ 67%] Built target gtest
Scanning dependencies of target gmock
[ 68%] Building CXX object googletest-build/googlemock/CMakeFiles/gmock.dir/src/gmock-all.cc.o
[ 68%] Linking CXX static library ../../lib/libgmock.a
[ 68%] Built target gmock
Scanning dependencies of target gmock_main
[ 69%] Building CXX object googletest-build/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
[ 69%] Linking CXX static library ../../lib/libgmock_main.a
[ 69%] Built target gmock_main
Scanning dependencies of target gtest_main
[ 70%] Building CXX object googletest-build/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[ 70%] Linking CXX static library ../../lib/libgtest_main.a
[ 70%] Built target gtest_main
Scanning dependencies of target GenomicRegionTest
[ 71%] Building CXX object common/tests/CMakeFiles/GenomicRegionTest.dir/GenomicRegionTest.cpp.o
[ 71%] Linking CXX executable GenomicRegionTest
[ 71%] Built target GenomicRegionTest
Scanning dependencies of target CountTableTest
[ 71%] Building CXX object common/tests/CMakeFiles/CountTableTest.dir/CountTableTest.cpp.o
[ 72%] Linking CXX executable CountTableTest
[ 72%] Built target CountTableTest
Scanning dependencies of target StrGenotyperTest
[ 72%] Building CXX object genotyping/tests/CMakeFiles/StrGenotyperTest.dir/StrGenotyperTest.cpp.o
[ 73%] Linking CXX executable StrGenotyperTest
[ 73%] Built target StrGenotyperTest
Scanning dependencies of target RepeatGenotypeTest
[ 74%] Building CXX object genotyping/tests/CMakeFiles/RepeatGenotypeTest.dir/RepeatGenotypeTest.cpp.o
[ 75%] Linking CXX executable RepeatGenotypeTest
[ 75%] Built target RepeatGenotypeTest
Scanning dependencies of target StrAlignMatrixTest
[ 75%] Building CXX object genotyping/tests/CMakeFiles/StrAlignMatrixTest.dir/AlignMatrixTest.cpp.o
[ 76%] Linking CXX executable StrAlignMatrixTest
[ 76%] Built target StrAlignMatrixTest
Scanning dependencies of target StrAlignTest
[ 76%] Building CXX object genotyping/tests/CMakeFiles/StrAlignTest.dir/StrAlignTest.cpp.o
[ 77%] Linking CXX executable StrAlignTest
[ 77%] Built target StrAlignTest
Scanning dependencies of target AlleleCheckerTest
[ 77%] Building CXX object genotyping/tests/CMakeFiles/AlleleCheckerTest.dir/AlleleCheckerTest.cpp.o
[ 78%] Linking CXX executable AlleleCheckerTest
[ 78%] Built target AlleleCheckerTest
Scanning dependencies of target FragLogliksTest
[ 79%] Building CXX object genotyping/tests/CMakeFiles/FragLogliksTest.dir/FragLogliksTest.cpp.o
[ 79%] Linking CXX executable FragLogliksTest
Undefined symbols for architecture x86_64:
  "testing::internal::g_linked_ptr_mutex", referenced from:
      testing::internal::linked_ptr_internal::depart() in FragLogliksTest.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [genotyping/tests/FragLogliksTest] Error 1
make[1]: *** [genotyping/tests/CMakeFiles/FragLogliksTest.dir/all] Error 2
make: *** [all] Error 2

wm4d1-03b:~/p1/bin/ExpansionHunter-repo/build $ cd ..
wm4d1-03b:~/p1/bin/ExpansionHunter-repo $ git pull
Already up to date.
wm4d1-03b:~/p1/bin/ExpansionHunter-repo $ cmake --version
cmake version 3.18.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).
wm4d1-03b:~/p1/bin/ExpansionHunter-repo $ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin19.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Tool/Option to help identify OffTargetRegions

Hi!

I would like to use this tool for things other than the 3 examples. Currently to do this, the user needs to create a specs json file that includes OffTargetRegions.

Are there any plans to offer a tool/option such that users can explore potential off target regions similar to Supplementary Figure 4? (http://biorxiv.org/content/early/2016/12/19/093831)

Alternatively, could you provide a file for all the known repeat expansion genes, such as those reported in https://www.ncbi.nlm.nih.gov/pubmed/17417937?

At the moment, Supplementary Section 3.2 is the only guidance available to a user who would like to explore a different locus.

Thanks!

Regards,
Monkol

Is the reported ATXN2 repeat unit correct?

I have noticed that the repeat unit reported for ATXN2 in the output JSON file is GCT. Is this correct? In the literature it is referred to as a CAG repeat.
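One way to check whether the two names can describe the same repeat: a motif read from the opposite strand is its reverse complement, and a tandem repeat can be written starting from any position within the unit. The sketch below tests whether two units are equivalent under those two operations. The helper names are illustrative and not part of ExpansionHunter, and the sketch makes no claim about how ExpansionHunter chooses the unit it reports.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def rotations(unit: str) -> set:
    """All cyclic shifts of a repeat unit, e.g. CAG -> {CAG, AGC, GCA}."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def same_repeat_unit(a: str, b: str) -> bool:
    """True if b is a rotation of a or of a's reverse complement."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    return b in rotations(a) | rotations(reverse_complement(a))


print(same_repeat_unit("GCT", "CAG"))  # True: revcomp(GCT) = AGC, a rotation of CAG
print(same_repeat_unit("GCT", "GAA"))  # False
```

Under these assumptions GCT and CAG name the same tandem repeat, read from opposite strands with a shifted starting position.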

Error while running BAM file

Hello,

I am getting the following error when running my BAM file:

"2019-03-01T15:47:50,[Starting Expansion Hunter v3.0.0-rc2]
2019-03-01T15:47:50,[Analyzing sample Prajwal_Wagh_aligned.sorted]
2019-03-01T15:47:50,[Read length is set to 70]
2019-03-01T15:47:50,[Initializing reference /mnt/NGS/Human_Exome_hg19/hg19.fa]
2019-03-01T15:47:50,[Loading variant catalog from disk /home/ngs/Downloads/ExpansionHunter-v3.0.0-rc2-linux_x86_64/variant_catalog/variant_catalog_hg19.json]
2019-03-01T15:47:50,[Running sample analysis]
2019-03-01T15:47:50,[Depth is set to 19.6855]
2019-03-01T15:47:52,[size_in_units = 25 is outside of allowed range (0,24)]"

Command I am using
"/home/ngs/Downloads/ExpansionHunter-v3.0.0-rc2-linux_x86_64/bin/ExpansionHunter --reads /home/ngs/Downloads/RepeatHMM/bin/Prajwal_Wagh_aligned.sorted.bam --reference /mnt/NGS/Human_Exome_hg19/hg19.fa --variant-catalog /home/ngs/Downloads/ExpansionHunter-v3.0.0-rc2-linux_x86_64/variant_catalog/variant_catalog_hg19.json --output-prefix PW_new_TRD --sex male"

Please help me find the correct command; it's urgent.

Compilation time error.

Hi,
We are encountering compilation errors on our Red Hat 7.4 server using gcc/6.1.0, cmake/3.15.3, and boost/1.72.0 (compiled with the same gcc). The error is as follows:

.
.
.
[ 46%] Building CXX object workflow/CMakeFiles/workflow.dir/VariantFindings.cpp.o
[ 46%] Building CXX object workflow/CMakeFiles/workflow.dir/WorkflowBuilder.cpp.o
/gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/WorkflowBuilder.cpp: In function ‘std::shared_ptr<ehunter::GraphSmallVariantAnalyzer> ehunter::createSmallVariantAnalyzer(const std::shared_ptr<ehunter::GraphModel>&, const ehunter::VariantSpecification&)’:
/gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/WorkflowBuilder.cpp:93:108: error: call of overloaded ‘make_shared(std::shared_ptr<ehunter::GraphSmallVariant>&, const string&, ehunter::VariantSubtype, const boost::optional&)’ is ambiguous
smallVariant, variantSpec.id(), variantSpec.classification().subtype, variantSpec.optionalRefNode());
^
In file included from /gpfs/share/apps/gcc/6.1.0/include/c++/6.1.0/memory:82:0,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/WorkflowBuilder.hh:24,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/WorkflowBuilder.cpp:22:
/gpfs/share/apps/gcc/6.1.0/include/c++/6.1.0/bits/shared_ptr.h:632:5: note: candidate: std::shared_ptr<_Tp1> std::make_shared(_Args&& ...) [with _Tp = ehunter::GraphSmallVariantAnalyzer; _Args = {std::shared_ptr<ehunter::GraphSmallVariant>&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, ehunter::VariantSubtype, const boost::optional&}]
make_shared(_Args&&... __args)
^~~~~~~~~~~
In file included from /gpfs/share/apps/boost/1.72.0/include/boost/smart_ptr/make_shared.hpp:14:0,
from /gpfs/share/apps/boost/1.72.0/include/boost/archive/detail/helper_collection.hpp:28,
from /gpfs/share/apps/boost/1.72.0/include/boost/archive/detail/basic_iarchive.hpp:28,
from /gpfs/share/apps/boost/1.72.0/include/boost/serialization/vector.hpp:25,
from /gpfs/share/apps/boost/1.72.0/include/boost/accumulators/statistics/density.hpp:28,
from /gpfs/share/apps/boost/1.72.0/include/boost/accumulators/statistics.hpp:14,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/stats/LocusStats.hh:29,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/LocusFindings.hh:28,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/LocusAnalyzer.hh:28,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/WorkflowBuilder.hh:29,
from /gpfs/share/apps/expansionhunter/raw/ExpansionHunter/workflow/WorkflowBuilder.cpp:22:
/gpfs/share/apps/boost/1.72.0/include/boost/smart_ptr/make_shared_object.hpp:248:87: note: candidate: typename boost::detail::sp_if_not_array<T>::type boost::make_shared(Args&& ...) [with T = ehunter::GraphSmallVariantAnalyzer; Args = {std::shared_ptr<ehunter::GraphSmallVariant>&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, ehunter::VariantSubtype, const boost::optional&}; typename boost::detail::sp_if_not_array<T>::type = boost::shared_ptr<ehunter::GraphSmallVariantAnalyzer>]
template< class T, class... Args > typename boost::detail::sp_if_not_array< T >::type make_shared( Args && ... args )
^~~~~~~~~~~
make[2]: *** [workflow/CMakeFiles/workflow.dir/WorkflowBuilder.cpp.o] Error 1
make[1]: *** [workflow/CMakeFiles/workflow.dir/all] Error 2
make: *** [all] Error 2

We have tried different combinations of gcc/4.8, boost/1.63.0, and other cmake versions, and each time a compilation error occurs at a different stage. I would appreciate your help in resolving this issue.

Thanks

(AAAAG|AAGGG)*

I suspect there would be quite a bit of interest to have an option for switched repeat units. Have you had a look at this already? Other suggestions for good workarounds for RFC1? I'm currently just using (AAGGG)*... Cheers!

Lots of Alleles

Hi guys,

I'm trying to understand the VCF output of ExpansionHunter, and I have some questions. I'm profiling the (autosomal) Huntington repeat, and I'm seeing several alleles, when I would expect just two.

For example,

$ grep -i htt *.vcf
proband.vcf:4 3076603 HTT     CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG      <STR20>,<STR80> .       PASS    SVTYPE=STR;END=3076660;REF=19;RL=57;RU=CAG      GT:SO:SP:CN:CI       0/1/2:SPANNING/SPANNING/INREPEAT:2/11/14:19/20/80:././65-93
parentA.vcf:4 3076603 HTT     CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG      <STR16>,<STR17>,<STR20> .       PASS    SVTYPE=STR;END=3076660;REF=19;RL=57;RU=CAG  GT:SO:SP:CN:CI   0/1/2/3:SPANNING/SPANNING/SPANNING/SPANNING:3/1/10/12:19/16/17/20:./././.
control.vcf:4 3076603 HTT     CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG      <STR17>,<STR20>,<STR22>,<STR23> .       PASS    SVTYPE=STR;END=3076660;REF=19;RL=57;RU=CAG   GT:SO:SP:CN:CI  1/2/3/4:SPANNING/SPANNING/SPANNING/SPANNING:12/1/2/5:17/20/22/23:./././.

Is ExpansionHunter really trying to say that the proband has three alleles (0/1/2) —and parentA and the control have four? Am I misinterpreting this?

<STR20> (in the proband) seems pretty close to the reference of 19 repeats. Are they in fact the same, so that 0/1/2 really should be 0/2 (i.e., one haplotype with the normal 19 repeats and the other with 80)?

Specific context: The proband seems to have 80 repeats, which would fall into the disease-causing range, and that fits. But does the parent have an expansion too?
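To read records like these side by side, it can help to split the sample column into one entry per listed allele. The sketch below is a minimal parser that assumes the layout shown in the records above: a colon-separated FORMAT string and a sample string whose fields each hold one "/"-separated value per allele. The function name is hypothetical, and the sketch does not interpret what each field means or say how many alleles are real.

```python
def parse_str_sample(format_field: str, sample_field: str) -> list:
    """Split a FORMAT/sample pair into one dict per listed allele."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample columns have different lengths")

    # Each field carries one value per allele, separated by "/".
    per_key = {key: value.split("/") for key, value in zip(keys, values)}
    n_alleles = len(per_key[keys[0]])
    if any(len(v) != n_alleles for v in per_key.values()):
        raise ValueError("fields do not all list the same number of alleles")

    return [{key: per_key[key][i] for key in keys} for i in range(n_alleles)]


# Sample column copied from the proband record in the post.
records = parse_str_sample(
    "GT:SO:SP:CN:CI",
    "0/1/2:SPANNING/SPANNING/INREPEAT:2/11/14:19/20/80:././65-93",
)
for record in records:
    print(record)
```

For the proband this yields three entries, with CN values 19, 20 and 80 paired with SP values 2, 11 and 14.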

Mean argument is 0

Hello,

version: v3.0.0-rc1-linux_x86_64

I get this error when trying to run ExpansionHunter. Illumina HiSeq 2000 WGS data.

2019-01-16T08:59:49,[Error in function boost::math::poisson_distribution<d>::poisson_distribution: Mean argument is 0, but must be > 0 !]
Any Ideas?

Thanks
Matt

"Invalid contig name X" using example data

Hi all, thanks so much for providing your program. I'm looking forward to running it. I just gave it a shot using the provided example data and got this error. I've tried it a few different ways to no avail.

Input:
./build/ExpansionHunter --reads example/input/variants.bam --reference example/input/reference.fa --variant-catalog variant_catalog/grch37/variant_catalog.json --output-prefix example

Output:
2020-01-28T00:29:27,[Starting Expansion Hunter v3.2.0]
2020-01-28T00:29:27,[Workflow parameter object is initialized with HeuristicParameters(regionExtensionLength=1000, qualityCutoffForGoodBaseCall=20, skipUnaligned=true, alignerType=dag-aligner, kmerLenForAlignment=14, paddingLength=10, seedAffixTrimLength=14)]
2020-01-28T00:29:27,[Analyzing sample variants]
2020-01-28T00:29:27,[Initializing reference example/input/reference.fa]
[fai_load] build FASTA index.
2020-01-28T00:29:27,[Loading variant catalog from disk variant_catalog/hg19/variant_catalog.json]
2020-01-28T00:29:27,[Invalid contig name chrX]

I also tried switching to variant_catalog/grch37/variant_catalog.json, producing a slightly different error message (Invalid contig name X instead of Invalid contig name chrX) and also tried providing a full reference genome rather than the example one (which didn't change anything). I haven't tried using my own bams because I don't feel that should change anything.

Now wondering whether I'm not actually supposed to be using the example variant catalogs. Any feedback would be helpful.

Thank you!
Lee-kai
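
A quick way to spot this kind of mismatch before running the tool is to compare the contig names used in the catalog with those in the reference index. The sketch below assumes a catalog whose "ReferenceRegion" values look like "contig:start-end" (a string or a list of strings) and a samtools-style .fai index whose first tab-separated column is the contig name. The function names and the inline demo data are made up for illustration.

```python
def catalog_contigs(catalog: list) -> set:
    """Collect contig names from the ReferenceRegion field of each locus."""
    contigs = set()
    for locus in catalog:
        regions = locus["ReferenceRegion"]
        if isinstance(regions, str):
            regions = [regions]
        for region in regions:
            # Split on the last ":" so contig names containing ":" survive.
            contigs.add(region.rsplit(":", 1)[0])
    return contigs


def fai_contigs(fai_text: str) -> set:
    """Contig names are the first tab-separated column of a .fai index."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def missing_contigs(catalog: list, fai_text: str) -> list:
    """Contigs named in the catalog but absent from the reference index."""
    return sorted(catalog_contigs(catalog) - fai_contigs(fai_text))


# Hypothetical demo data: the catalog says "X", the reference says "chrX".
demo_catalog = [
    {"LocusId": "demo1", "ReferenceRegion": "X:100-200"},
    {"LocusId": "demo2", "ReferenceRegion": ["chr4:10-20", "chr4:30-40"]},
]
demo_fai = "chr4\t1000\t6\t60\t61\nchrX\t1000\t1030\t60\t61\n"
print(missing_contigs(demo_catalog, demo_fai))  # ['X']
```

In practice the catalog would be loaded with json.load and the index text read from the .fai file next to the reference FASTA. A non-empty result means the catalog and the reference use different contig names.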

I would like to use in plant, de novo identify the repeat by

Hi,
I would like to use this tool on a plant genome with WGS data. The variant catalog file is a required input, and I think the following strategy could produce one:
step1: identify the repeats with a tool like Tandem Repeats Finder; the output is a .txt file containing the repeats.
step2: extract the basic structure with Python scripts, e.g. CGGCGGCGG --> (CGG)*
step3: write the result as a JSON array:
[
  {
    "LocusId": "ref_rep1",
    "LocusStructure": "(CAG)*",
    "ReferenceRegion": "1:462-522",
    "VariantType": "Repeat"
  },
  {
    "LocusId": "ref_rep2",
    "LocusStructure": "(CAGT)*CGTTG(CGG)*",
    "ReferenceRegion": "1:1593-1624",
    "VariantType": ["Repeat", "Repeat"]
  },
  {
    "LocusId": "ref_rep3",
    "LocusStructure": "(TGGGCAGCAGTA)*",
    "ReferenceRegion": "1:4731-4910",
    "VariantType": "Repeat"
  }
]
Sometimes it is hard to obtain annotation for a plant genome, but it is easy to obtain the repeats from a reference genome using tools like Tandem Repeats Finder. Can this approach be used for identification while keeping as much information as possible? And is there any way to make the variant catalog file directly from a reference genome (.fasta)?
Thank you!
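
Step 3 of the strategy above could be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical list of (contig, start, end, motif) tuples standing in for parsed repeat-finder output, the coordinates are assumed to already follow the convention the catalog expects, and only loci with a single repeat are handled. The field names are the ones used in the JSON in this post.

```python
import json


def make_catalog(repeats: list) -> list:
    """Turn (contig, start, end, motif) tuples into single-repeat catalog entries."""
    catalog = []
    for index, (contig, start, end, motif) in enumerate(repeats, start=1):
        catalog.append(
            {
                "LocusId": f"ref_rep{index}",
                "LocusStructure": f"({motif})*",
                "ReferenceRegion": f"{contig}:{start}-{end}",
                "VariantType": "Repeat",
            }
        )
    return catalog


# Hypothetical input mirroring two of the loci listed in the post.
repeats = [
    ("1", 462, 522, "CAG"),
    ("1", 4731, 4910, "TGGGCAGCAGTA"),
]
catalog = make_catalog(repeats)
print(json.dumps(catalog, indent=2))
```

Serializing with json.dumps avoids hand-written syntax problems such as a trailing comma after the last entry.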

Specifying offtarget regions for loci with multiple variants

Just in case this isn't a lot of work to support - for variant catalog loci with multiple variants (like HTT, FXN, etc.) it would be nice to be able to specify off-targets for one or more of the variants.
For example, I tried adding them as a list of lists, but got a type error about expecting string instead of array:

{
      "LocusId": "FXN-chr9-69037286-69037304-GAA",
      "LocusStructure": "(A)*(GAA)*",
      "ReferenceRegion": [
        "9:69037261-69037286",
        "9:69037286-69037304"
      ],
      "VariantId": [
        "FXN_A",
        "FXN"
      ],
      "VariantType": [
        "Repeat",
        "RareRepeat"
      ],
      "OfftargetRegions": [
        [], [
          "chr2:220546033-220546610",
          "chr5:127247161-127247640",
          "chrX:51621350-51621856",
          "chr1:101657701-101658187",
          "chr13:102161416-102161881",
          "chr7:37848005-37848522",
          "chrY:25645531-25646013",
          "chr7:84690949-84691442",
          "chrUn_KN707747v1_decoy:1062-2074",
          "chr6:50708070-50708556",
          "chrY:24024122-24024600"
        ]
      ]
    }

Also a few places in https://github.com/Illumina/ExpansionHunter/blob/master/docs/04_VariantCatalogFiles.md say "VariantStatus" which I think is meant to say "VariantType"

Problem related to the pre-installed software and packages

Good day!

Is it possible to support the case where the prerequisites (for example, the htslib library) are already installed on the system? The "rigid" bundled installation of these packages leads to version incompatibility problems.

Best wishes.
p.s. the same issue was created in the REViewer repo: Illumina/REViewer#12

Enhance index search

Some software generates a prefix.bai index rather than prefix.bam.bai.
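
As a workaround on the user side, a wrapper script can look for the index under both naming conventions before launching the tool. The sketch below is a hypothetical helper and does not describe how ExpansionHunter or htslib locate index files. It returns the first existing candidate, or None if neither is present.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the index next to a BAM file under either naming convention."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),   # prefix.bam.bai
        bam.with_suffix(".bai"),   # prefix.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If only prefix.bai exists, one option is to create a symbolic link named prefix.bam.bai that points to it.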

PCR-Not-Free Samples

Hi guys,

ExpansionHunter seems like an awesome tool, and I'd love to use it. The WGS samples I have, though, have been prepared with six cycles of PCR.

I know in your paper you discuss analyzing 12 samples prepared with a PCR step. It seems like the results from these samples were mixed?

What's your ultimate advice on samples prepared with PCR: can ExpansionHunter be used reliably on them? Just on repeats with low GC content? Is there anything I can do computationally, like using Picard's MarkDuplicates, to get good results?

Region extension length

Hi,
Your tool is very useful, thank you very much! I'm trying to use it for exome sequencing data and it seems to work. However, I see some differences in the number of samples with a "pass" if I change the region extension length. What exactly is the meaning of this parameter? From reading the documentation, my best guess is that it will only look for reads within -X and +X of this region. If the parameter is set to 1000, does this mean looking at reads at -1000 and +1000 of the ROI, or rather -500 and +500 of the ROI?
Thanks in advance,
Jeroen

[Invalid contig name chr1]

Hey,
I am having trouble with incompatible contig names (for example, chr1 versus 1).
The problem appeared right after I resolved the "Invalid contig name X" error.

I've tried everything I could think of, from changing the reference genome (37, 38, and the local test reference.fa) to using my own BAM file as input for the tests.
I don't know exactly where I am going wrong.

Repeat on Forward and reverse strand

Hello,

Thank you for Expansion Hunter and for extending it in the newer versions. It is indeed a great tool!

I have a few questions about how EH handles repeats on the forward and reverse strands.
a) For a CAG repeat (defined as region 5586155-5586227, hg38, per the variant catalog), if I run Expansion Hunter with the locus ID as CTG (and the same region as above), I expect to get similar results. My logic is that CTG exists on the reverse strand.
b) I did try that. For some of my samples I get similar results, but for a few samples the results change drastically.

So my question is how EH handles such a scenario. Is running those scenarios even feasible in your opinion?
Please correct me if I am running the program incorrectly.

Thank you!
Regards,
Ashwani

EHv4 link error on CentOS7: "undefined reference to boost::system::detail::generic_category_ncx()"

Trying to build master on CentOS 7, but running into a link error with Boost:

analysis-node:~/p1/bin/ExpansionHunter-repo/build 4956 2 $ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/disks/data-disk/project1/bin/ExpansionHunter-repo/build/googletest-download
[ 11%] Performing update step for 'googletest'
[ 22%] No configure step for 'googletest'
[ 33%] No build step for 'googletest'
[ 44%] No install step for 'googletest'
[ 55%] No test step for 'googletest'
[ 66%] Completed 'googletest'
[100%] Built target googletest
-- Found Boost: /usr/local/include (found suitable version "1.68.0", minimum required is "1.4") found components:  program_options filesystem regex date_time system
-- Found Boost: /usr/local/include (found suitable version "1.68.0", minimum required is "1.5") found components:  program_options filesystem system
-- Configuring done
-- Generating done
-- Build files have been written to: ~/p1/bin/ExpansionHunter-repo/build
analysis-node:~/p1/bin/ExpansionHunter-repo/build 4957 0 $ make
[ 16%] Built target graphtools
[ 21%] Built target zlib
[ 25%] Built target htslib
[ 27%] Built target region_spec
[ 32%] Built target common
[ 33%] Built target filtering
[ 35%] Built target stats
[ 39%] Built target input
[ 46%] Built target genotyping
[ 48%] Built target reads
[ 49%] Built target classification
[ 53%] Built target alignment
[ 57%] Built target region_analysis
[ 62%] Built target sample_analysis
[ 65%] Built target output
[ 65%] Linking CXX executable ExpansionHunter
input/libinput.a(ParameterLoading.cpp.o): In function `boost::system::generic_category()':
ParameterLoading.cpp:(.text._ZN5boost6system16generic_categoryEv[_ZN5boost6system16generic_categoryEv]+0x5): undefined reference to `boost::system::detail::generic_category_ncx()'
collect2: error: ld returned 1 exit status
make[2]: *** [ExpansionHunter] Error 1
make[1]: *** [CMakeFiles/ExpansionHunter.dir/all] Error 2
make: *** [all] Error 2

Request to add SAMD12 (Ishiura et al 2018)

Hi, I am planning to analyze the complex repeats associated with FAME. Could you kindly help me create the JSON file for the SAMD12 gene containing the TTTTA and TTTCA repeats? Thanks

output vcf file

Sorry, I accidentally closed the issue. The region we look at is on chr7, which is supposed to be diploid. Here is part of the record

GT:SO:SP:CN:CI 1:FLANKING:6:22:22-22

Thanks.

George

"Encountered empty query"

With master 274903d and also tag v3.1.2, neither a Nanopore-based BAM (NGMLR, also minimap2) nor an Illumina 150 bp BAM (bwa mem, also minimap2) seems to work at all in ExpansionHunter.

With the Nanopore-based BAM no error is shown, but the resulting VCF is empty apart from the header. There are just warnings(?) like:

2020-01-12T21:22:54,[Skipping 1cfb1d7c-1277-4ed6-875e-65dfa75272d8/2 because it is unpaired]
2020-01-12T21:22:54,[Skipping 219c7422-e73c-4d63-9097-a01bd7facce9/2 because it is unpaired]
2020-01-12T21:22:54,[Skipping 9d91b9b5-f2d1-4b5d-9aa1-6f91b4ab1a01/2 because it is unpaired]
2020-01-12T21:22:54,[Skipping locus TCF4 due to low coverage]

With the Illumina based BAM the search stops immediately with "Encountered empty query".

$ ExpansionHunter --reads kitID_hg38_bwamem_v2.bam --reference /sf_Genome/hg38/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna --output-prefix kitID_hg38 --variant-catalog variants.json --log-level=trace

2020-01-12T21:08:56,[Starting Expansion Hunter v3.2.0]
2020-01-12T21:08:56,[Workflow parameter object is initialized with HeuristicParameters(regionExtensionLength=1000, qualityCutoffForGoodBaseCall=20, skipUnaligned=true, alignerType=dag-aligner, kmerLenForAlignment=14, paddingLength=10, seedAffixTrimLength=14)]
2020-01-12T21:08:56,[Analyzing sample 60820188482512_hg38_bwamem_v2]
2020-01-12T21:08:56,[Initializing reference /sf_Genome/hg38/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna]
2020-01-12T21:08:56,[Loading variant catalog from disk variants.json]
2020-01-12T21:08:56,[Running sample analysis in seeking mode]
2020-01-12T21:08:57,[Encountered empty query for A00910:49:HW7Y5DSXX:3:2537:3278:34632/1]

	const bool isFirstReadInrepeat = weightedPurityCalculator.score(read.sequence()) >= 0.90;
	const bool isSecondReadInrepeat = weightedPurityCalculator.score(mate.sequence()) >= 0.90;
