illumina / hap.py
Haplotype VCF comparison tools
License: Other
At
hap.py/src/c++/lib/variant/VariantReader.cpp
Line 735 in 6c907ce
`ad` is allocated with `adcount` elements, whereas the function in
hap.py/src/c++/lib/tools/BCFHelpers.cpp
Line 532 in 6c907ce
writes `values.size()` elements into `ad` (`values` is a vector holding the elements of the AD entry of a sample), causing a buffer overflow.
This could be solved either by exiting on an incorrect AD field, by truncating the interpreted AD fields so that they have the correct length, or by making the allocated size of `ad` match the size of the actual AD fields (I haven't actually looked through the code to see how this code path is used and what would be appropriate).
This bug can be shown to happen with the following VCF (I've exaggerated the number of AD fields to make the problem happen). It should segfault on the next `delete []` statement due to heap corruption:
##fileformat=VCFv4.1
##contig=<ID=chr1,length=10000>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
chr1 2 . AAT T 1 PASS . GT:AD 1/1:1,1,1,1,2,3,2,4,1,3,5,5,3,2,12,1,3,3,1
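A minimal sketch of the safer behavior, in Python for illustration (the function and names are hypothetical, not hap.py's actual code): clamp the number of AD values copied to the allocated count instead of trusting the record.

```python
def read_ad(values, adcount):
    """Copy at most `adcount` AD values into a fixed-size buffer.

    `values` plays the role of the vector read from the sample's AD
    entry; `adcount` is the allocated size. Extra values are truncated
    (with a warning) instead of being written past the end of `ad`.
    """
    ad = [0] * adcount  # fixed-size allocation, mimicking VariantReader.cpp
    if len(values) > adcount:
        print("[W] too many AD fields: got %d, expected at most %d"
              % (len(values), adcount))
    for i, v in enumerate(values[:adcount]):  # clamp instead of overflow
        ad[i] = v
    return ad
```

With the exaggerated record above (19 AD values against a buffer sized for the declared alleles), an unclamped copy loop writes past the end of the buffer; the clamped version truncates and warns.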
It seems there is a poorly documented dependency for hap.py:
[ 89%] Linking CXX executable ../../../bin/vcfhdr2json
Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_intervalbuffer.cpp.o
Linking CXX executable ../../../bin/fastainfo
[ 90%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_intervallist.cpp.o
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/fastainfo] Error 1
make[1]: *** [src/c++/main/CMakeFiles/fastainfo.dir/all] Error 2
make[2]: *** [bin/vcfhdr2json] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [src/c++/main/CMakeFiles/vcfhdr2json.dir/all] Error 2
[ 92%] [ 92%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_main.cpp.o
Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_locationinfo.cpp.o
[ 94%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_popcount.cpp.o
Linking CXX executable ../../../bin/gvcf2bed
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/gvcf2bed] Error 1
make[1]: *** [src/c++/main/CMakeFiles/gvcf2bed.dir/all] Error 2
[ 95%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_refvar.cpp.o
[ 96%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_stringutil.cpp.o
Linking CXX executable ../../../bin/dipenum
[ 97%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_variantprocessing.cpp.o
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/dipenum] Error 1
make[1]: *** [src/c++/main/CMakeFiles/dipenum.dir/all] Error 2
[ 98%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_variants.cpp.o
Linking CXX executable ../../../bin/preprocess
Linking CXX executable ../../../bin/hapenum
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/preprocess] Error 1
make[1]: *** [src/c++/main/CMakeFiles/preprocess.dir/all] Error 2
[100%] Building CXX object src/c++/test/CMakeFiles/test_haplotypes.dir/test_variantstatistics.cpp.o
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/hapenum] Error 1
make[1]: *** [src/c++/main/CMakeFiles/hapenum.dir/all] Error 2
Linking CXX executable ../../../bin/hapcmp
Linking CXX executable ../../../bin/blocksplit
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/hapcmp] Error 1
make[1]: *** [src/c++/main/CMakeFiles/hapcmp.dir/all] Error 2
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/blocksplit] Error 1
make[1]: *** [src/c++/main/CMakeFiles/blocksplit.dir/all] Error 2
Linking CXX executable ../../../bin/vcfcheck
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/vcfcheck] Error 1
make[1]: *** [src/c++/main/CMakeFiles/vcfcheck.dir/all] Error 2
Linking CXX executable ../../../bin/alleles
Linking CXX executable ../../../bin/validatevcf
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/alleles] Error 1
make[1]: *** [src/c++/main/CMakeFiles/alleles.dir/all] Error 2
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/validatevcf] Error 1
make[1]: *** [src/c++/main/CMakeFiles/validatevcf.dir/all] Error 2
Linking CXX executable ../../../bin/multimerge
Linking CXX executable ../../../bin/xcmp
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/multimerge] Error 1
make[1]: *** [src/c++/main/CMakeFiles/multimerge.dir/all] Error 2
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/xcmp] Error 1
make[1]: *** [src/c++/main/CMakeFiles/xcmp.dir/all] Error 2
Linking CXX executable ../../../bin/roc
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/roc] Error 1
make[1]: *** [src/c++/main/CMakeFiles/roc.dir/all] Error 2
Linking CXX executable ../../../bin/scmp
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/scmp] Error 1
make[1]: *** [src/c++/main/CMakeFiles/scmp.dir/all] Error 2
Linking CXX executable ../../../bin/quantify
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/quantify] Error 1
make[1]: *** [src/c++/main/CMakeFiles/quantify.dir/all] Error 2
Linking CXX executable ../../../bin/test_haplotypes
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/test_haplotypes] Error 1
make[1]: *** [src/c++/test/CMakeFiles/test_haplotypes.dir/all] Error 2
make: *** [all] Error 2
Traceback (most recent call last):
File "install.py", line 298, in <module>
main()
File "install.py", line 283, in main
build_haplotypes(source_dir, build_dir, args)
File "install.py", line 157, in build_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /tmp/buildn0adKc && make -j24' returned non-zero exit status 2
I've tried to compile it under CentOS Linux release 7.2.1511.
Does anyone know this problem? Thanks in advance!
In our deep dive into hap.py with DeepVariant, we've noticed that hap.py includes in its confident calls for comparison any multi-base event (e.g., a deletion) that merely overlaps the confident regions, rather than requiring that event to be fully contained within them. Internally we require an event to be fully contained within the confident regions to be considered confident. We wanted to double-check here (a) that hap.py intends this behavior, and (b) whether overlap (rather than containment) is the recommended approach according to the GA4GH benchmarking recommendations.
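For reference, the two candidate rules can be stated precisely. This is a generic sketch with half-open intervals, not hap.py's implementation:

```python
def overlaps(event, region):
    """True if half-open intervals [event) and [region) share at least one base."""
    return event[0] < region[1] and region[0] < event[1]

def contained(event, region):
    """True if half-open interval [event) lies entirely within [region)."""
    return region[0] <= event[0] and event[1] <= region[1]

# Example: a 10 bp deletion whose last 7 bases extend past the end of a
# confident region. The "overlap" rule includes it; "containment" excludes it.
deletion = (100, 110)
confident = (50, 103)
```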
Occasionally variant callers will produce output that is incoherent, but hap.py sometimes does not detect this type of error and can produce incorrect true positives. For instance, given input like:
Truth VCF:
11 34133117 TTCCA TACCA,T 1/2
Query VCF:
11 34133117 TTCCA T 0/1
11 34133118 T A 0/1
hap.py will proceed normally and emit a 'TP' result. However, the variants in the query VCF make conflicting assertions about the state of position 34133118 (the first variant asserts that it's ref + deleted, while the second asserts that it's ref + 'A'; they can't both be correct).
It seems that hap.py should produce at minimum an FP/FN result for the above situation, or, even better, an error value of some sort indicating that the input was invalid.
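One way to surface such inputs is to flag query records whose reference spans overlap, and then subject those to a phase-aware consistency check. A generic sketch (hypothetical helper names, not hap.py code):

```python
def ref_span(pos, ref):
    """Half-open reference interval covered by a VCF record."""
    return (pos, pos + len(ref))

def overlapping_pairs(records):
    """Return adjacent pairs of (pos, ref, alt) records whose reference
    spans overlap.

    Overlapping spans are not automatically inconsistent (two het calls
    may sit on opposite haplotypes), but they are exactly the cases that
    need a consistency check instead of being silently accepted.
    Only adjacent records after sorting are compared, which is enough
    for this example.
    """
    out = []
    recs = sorted(records)
    for a, b in zip(recs, recs[1:]):
        sa, sb = ref_span(a[0], a[1]), ref_span(b[0], b[1])
        if sa[0] < sb[1] and sb[0] < sa[1]:
            out.append((a, b))
    return out

# The query records from the example: the deletion's span covers the SNP.
query = [(34133117, "TTCCA", "T"), (34133118, "T", "A")]
```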
How do I provide a sample name on the hap.py command line so it can run vcfeval on a multi-sample VCF?
Here is the error I get currently:
2017-05-11 21:35:33,701 ERROR Error running rtg tools / vcfeval. Return code was 1, output: / Error: No sample name provided but calls is a multi-sample VCF.
2017-05-11 21:35:33,702 ERROR Traceback (most recent call last):
2017-05-11 21:35:33,702 ERROR File "/opt/hap.py/bin/hap.py", line 417, in
2017-05-11 21:35:33,702 ERROR main()
2017-05-11 21:35:33,702 ERROR File "/opt/hap.py/bin/hap.py", line 394, in main
2017-05-11 21:35:33,702 ERROR tempfiles += Haplo.vcfeval.runVCFEval(args.vcf1, args.vcf2, output_name, args)
2017-05-11 21:35:33,702 ERROR File "/opt/hap.py/lib/python27/Haplo/vcfeval.py", line 136, in runVCFEval
2017-05-11 21:35:33,703 ERROR raise Exception("Error running rtg tools / vcfeval. Return code was %i, output: %s / %s \n" % (rc, o, e))
2017-05-11 21:35:33,703 ERROR Exception: Error running rtg tools / vcfeval. Return code was 1, output: / Error: No sample name provided but calls is a multi-sample VCF.
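As a workaround, the calls VCF can be cut down to a single sample column before running hap.py. A stdlib-only sketch (the helper is hypothetical, not part of hap.py); `bcftools view -s SAMPLE calls.vcf.gz` achieves the same thing:

```python
def subset_sample(vcf_lines, sample):
    """Reduce a multi-sample VCF (given as text lines) to one sample column.

    Keeps all meta-information lines, and keeps columns 1-9
    (CHROM..FORMAT) plus only the named sample's column.
    """
    keep = None
    out = []
    for line in vcf_lines:
        if line.startswith("##"):
            out.append(line)
        elif line.startswith("#CHROM"):
            cols = line.rstrip("\n").split("\t")
            keep = cols.index(sample)  # raises ValueError if sample absent
            out.append("\t".join(cols[:9] + [sample]))
        else:
            cols = line.rstrip("\n").split("\t")
            out.append("\t".join(cols[:9] + [cols[keep]]))
    return out
```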
Are there any plans to support symbolic SVs, or SVs in breakend notation? Properly comparing such variants requires haplotype reconstruction, as equivalent representations can be quite different. For example, the CHM1 and CHM13 truth sets from Huddleston 2016 report duplication events as INS, whereas most short-read SV callers will report the same event as a DUP. In the case of tandem duplication expansions, the nominal variant positions can be non-overlapping (even hundreds of base pairs apart) but still result in the same haplotype, and thus are just different representations of the same variant.
I'm trying to install hap.py on my laptop, and things went OK until the unit tests ran.
I ran this command:
python install.py ~/hap.py-install
This seems to be the relevant part of the message:
chrS:12-12 12-12:A . 0/1 0/1
chrS:13-13 13-13:C . 0/1 0/1
chrS:14-14 14-14:A 14-14:C 0/1 0/2 2/1
Unexpected variant calls: chrS:14-14 14-14:A 14-14:C 0/1 0/2 2/1
Assertion failed: (false), function test_method, file /Users/dchurch/Documents/source/hap.py/src/c++/test/test_allelesplit.cpp, line 197.
unknown location:0: fatal error in "testVariantPrimitiveSplitter": signal: SIGABRT (application abort requested)
/Users/dchurch/Documents/source/hap.py/src/c++/test/test_align.cpp:298: last checkpoint
and
*** 1 failure detected in test suite "Master Test Suite"
Boost Unit tests FAILED!
Traceback (most recent call last):
File "install.py", line 315, in <module>
main()
File "install.py", line 311, in main
test_haplotypes(source_dir, python_shebang, args)
File "install.py", line 191, in test_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /Users/dchurch/hap.py-install && /Users/dchurch/Documents/source/hap.py/src/sh/run_tests.sh' returned non-zero exit status 1
Any help would be appreciated.
Thanks for providing this in an image on Docker Hub. Would it be possible to create another build with the --with-rtgtools option and tag it with :rtgtools or some such?
Thanks for a great tool - can I also use this for benchmarking haploid genomes?
Thank you,
Hi, this is the first time I've used hap.py, and I'm using it in an SGE environment. Surprisingly, even with small variant files (about 40k variant positions from WES) as input, it uses a very large amount of memory (up to nearly 50 GB). I'm not sure if I've used the right arguments.
python hap.py ref.snp.vcf.gz sample.snp.vcf.gz -o output -f region.bed -r hg19.fasta --force-interactive
And the SGE arguments are:
-l num_proc=4,vf=10G
Any solution to this issue? Thanks.
Hi hap.py authors:
I think I may have run into a bug in hap.py, but I'm not 100% certain. Here's the example:
query:
20 57777379 . C CT 0 . . GT 0/1
20 57777394 . TA T 0 . . GT 0/1
20 57777395 . A T 0 . . GT 0/1
truth:
20 57777395 rs3899824 A T 50 PASS . GT:DP:ADALL:AD:GQ:IGT:IPS:PS 1|1:639:13,200:12,179:293:1/1:.:HOMVAR
ref:
CCTTTTTTTTTTTTTTTAAAAAAAAAAACACATCTAAT starting at 57777378
hap.py output
20 57777379 . C CT 0 . BS=57777379;Regions=CONF,TS_contained GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 0/1:FP:lm:i1_5:INDEL:het:0
20 57777394 . TA T 0 . BS=57777379;Regions=CONF,TS_contained GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 0/1:FP:lm:d1_5:INDEL:het:0
20 57777395 . A T 50 . BS=57777379;Regions=CONF,TS_contained GT:BD:BK:BI:BVT:BLT:QQ 1/1:FN:am:tv:SNP:homalt:. 0/1:FP:am:tv:SNP:het:0
I know this representation isn't ideal, but what is happening here is:
-- the query has 0/1 for A/T @ 57777395, which is 50% of the right answer for the truth hom-alt.
-- the query has a het C=>CT followed by a het TA=>T, which in effect replaces one of the As in the run of Ts with one more T. This has the same effect on the haplotype as the A=>T SNP, and since both are het this should produce two matching haplotypes.
My view is that hap.py should emit no FPs and no FNs here, as these two representations are in fact identical. Seems like a bug to me.
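The claimed equivalence can be checked mechanically by applying each call set to the reference fragment quoted above (a quick sketch; `apply_variants` is a hypothetical helper taking 1-based VCF positions):

```python
REF = "CCTTTTTTTTTTTTTTTAAAAAAAAAAACACATCTAAT"  # starts at 57777378
START = 57777378

def apply_variants(ref, start, variants):
    """Apply (pos, ref_allele, alt_allele) edits to `ref`, right to left,
    so that earlier coordinates remain valid."""
    seq = ref
    for pos, r, a in sorted(variants, reverse=True):
        i = pos - start
        assert seq[i:i + len(r)] == r, "REF mismatch at %d" % pos
        seq = seq[:i] + a + seq[i + len(r):]
    return seq

# Query haplotype carrying the het insertion and het deletion
hap_query = apply_variants(REF, START, [(57777379, "C", "CT"),
                                        (57777394, "TA", "T")])
# Truth haplotype carrying the hom-alt SNP
hap_truth = apply_variants(REF, START, [(57777395, "A", "T")])
assert hap_query == hap_truth  # identical haplotype sequences
```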
Here's how I run hap.py:
docker run -it -v "${ROOT}:${ROOT}" -v "${OUTPUT_DIR}:${OUTPUT_DIR}" pkrusche/hap.py /opt/hap.py/bin/hap.py "${TRUTH_VCF}" "${NEW_LABELER_VCF}" --preprocess-truth --no-adjust-conf-regions -f "${TRUTH_BED}" -r "${REF%.gz}" -o "${OUTPUT_DIR}/new_labeler.happy.output" -l ${REGION}
hap.py relies on importing some libraries from a relative path. Unfortunately this path is calculated relative to the invoked path of hap.py, and not the actual location of the file, causing errors when using a symlink to hap.py. This can be fixed with this trivial change:
Before (hap.py:37
):
scriptDir = os.path.abspath(os.path.dirname(__file__))
After:
scriptDir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
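A quick demonstration of the difference (a sketch using a temporary directory and a symlink; assumes the platform supports symlinks):

```python
import os
import tempfile

d = tempfile.mkdtemp()
real = os.path.join(d, "real")
os.mkdir(real)
target = os.path.join(real, "hap.py")
open(target, "w").close()
link = os.path.join(d, "hap.py.link")
os.symlink(target, link)

# Without realpath: the symlink's own directory (wrong place for imports)
without = os.path.abspath(os.path.dirname(link))
# With realpath: the directory of the actual file
with_rp = os.path.abspath(os.path.dirname(os.path.realpath(link)))
print(without)
print(with_rp)
```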
Reference // This is from the reference file cowpox.fa
gi_30844336_ref_NC_003663.2_:27650..27749
CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTGACATCTGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA
Truth // ACATC is replaced by TACGAATTG
gi_30844336_ref_NC_003663.2_ 27700 . ACATC TACGAATTG 200 . . GT 1/1
CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTG ACATC TGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // reference (cowpox.fa)
CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTG TACGAATTG TGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // truth (mutatedcowpox.vcf)
BWA // variant calling by BWA
gi_30844336_ref_NC_003663.2_ 27698 . TG TGTACG 306.478 . AB=0.809524;ABP=20.4855;AC=1;AF=0.5;AN=2;AO=17;CIGAR=1M4I1M;DP=21;DPB=55;DPRA=0;EPP=3.13803;EPPR=5.18177;GTI=0;LEN=4;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=2.76283;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=479;QR=123;RO=4;RPL=5;RPP=9.26925;RPPR=11.6962;RPR=12;RUN=1;SAF=13;SAP=13.3567;SAR=4;SRF=1;SRP=5.18177;SRR=3;TYPE=ins GT:DP:RO:QR:AO:QA:GL 0/1:21:4:123:17:479:-37.0656,0,-5.05395
gi_30844336_ref_NC_003663.2_ 27701 . CATC ATTG 421.462 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=17;CIGAR=2X1M1X;DP=17;DPB=17;DPRA=0;EPP=4.1599;EPPR=0;GTI=0;LEN=4;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=26.5627;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=505;QR=0;RO=0;RPL=6;RPP=6.20364;RPPR=0;RPR=11;RUN=1;SAF=13;SAP=13.3567;SAR=4;SRF=0;SRP=0;SRR=0;TYPE=complex GT:DP:RO:QR:AO:QA:GL 1/1:17:0:0:17:505:-45.7407,-5.11751,0
CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGT TG A CATC TGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // reference
CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGT TGTACG A ATTG TGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // BWA
Truth CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTG TACGAATTG TGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // reference
BWA CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGT TGTACG A ATTG TGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // BWA
Truth CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTGTACGAATTGTGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // reference
BWA CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTGTACGAATTGTGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA // BWA
HAP.PY
gi_30844336_ref_NC_003663.2_ 27698 . T TGTAC 306.478 . BS=27698 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 0/1:FP:lm:i1_5:INDEL:het:306.478
gi_30844336_ref_NC_003663.2_ 27700 . ACATC TACGAATTG 200 . BS=27698 GT:BD:BK:BI:BVT:BLT:QQ 1/1:FN:lm:c1_5,tv:INDEL:homalt:. ./.:.:.:.:NOCALL:nocall:0
gi_30844336_ref_NC_003663.2_ 27701 . C A 421.462 . BS=27698 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:lm:tv:SNP:homalt:421.462
gi_30844336_ref_NC_003663.2_ 27702 . A T 421.462 . BS=27698 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:lm:tv:SNP:homalt:421.462
gi_30844336_ref_NC_003663.2_ 27704 . C G 421.462 . BS=27698 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:lm:tv:SNP:homalt:421.462
// The variants called are equivalent, but hap.py counted them as a false negative. I hope this issue can be fixed.
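The claimed equivalence can be verified by splicing both call sets into the reference fragment (a quick sketch; VCF positions converted to 0-based indices relative to the fragment start at 27650):

```python
# Reference fragment gi_30844336_ref_NC_003663.2_:27650..27749 (cowpox.fa)
REF = ("CGTTGATTGCAATGGCTATACATGTATATCCGTTATTTGATCTAATGTTG"
       "ACATCTGAACCGGATTCTAGCAGTAAAGACACTAGAGATTGTTTATTATA")

# Truth: 27700 ACATC -> TACGAATTG  (index 27700 - 27650 = 50)
truth = REF[:50] + "TACGAATTG" + REF[55:]

# Caller: 27698 TG -> TGTACG (index 48) and 27701 CATC -> ATTG (index 51),
# spliced right to left so earlier indices stay valid
called = REF[:48] + "TGTACG" + REF[50:51] + "ATTG" + REF[55:]

# Sanity-check the REF alleles, then compare the resulting haplotypes
assert REF[50:55] == "ACATC" and REF[48:50] == "TG" and REF[51:55] == "CATC"
assert truth == called  # same haplotype, two representations
```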
Running hap.py produces several warnings and errors. The positions always involve variants with at least one '*' in the ALT column. It seems hap.py does not handle asterisks in the ALT column. Is hap.py compatible with VCFv4.2 or later?
Error and warnings below
Best wishes
Martin
2018-02-27 12:15:39,660 WARNING [W] too many AD fields at chrX:111680917 max_ad = 2 retrieved: 3
2018-02-27 12:15:39,661 WARNING [W] too many AD fields at chrX:117985447 max_ad = 2 retrieved: 3
2018-02-27 12:15:41,130 WARNING [W] too many AD fields at chrX:119640487 max_ad = 2 retrieved: 3
2018-02-27 12:15:41,131 WARNING [W] too many AD fields at chrX:119640489 max_ad = 2 retrieved: 3
2018-02-27 12:15:41,302 ERROR One of the preprocess jobs failed
2018-02-27 12:15:41,302 ERROR Traceback (most recent call last):
2018-02-27 12:15:41,302 ERROR File "/usr/local/bin/hap.py", line 511, in
2018-02-27 12:15:41,303 ERROR main()
2018-02-27 12:15:41,303 ERROR File "/usr/local/bin/hap.py", line 366, in main
2018-02-27 12:15:41,303 ERROR "QUERY")
2018-02-27 12:15:41,303 ERROR File "/usr/local/bin/pre.py", line 203, in preprocess
2018-02-27 12:15:41,303 ERROR haploid_x=gender == "male")
2018-02-27 12:15:41,303 ERROR File "/usr/local/lib/python27/Haplo/partialcredit.py", line 213, in partialCredit
2018-02-27 12:15:41,304 ERROR raise Exception("One of the preprocess jobs failed")
2018-02-27 12:15:41,304 ERROR Exception: One of the preprocess jobs failed
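A small stdlib-only sketch (hypothetical helper, not hap.py code) that lists the records carrying the VCFv4.2 spanning-deletion allele, so they can be inspected or filtered out before running hap.py:

```python
def spanning_deletion_records(vcf_lines):
    """Yield (CHROM, POS) for records whose ALT includes the '*' allele.

    '*' is the VCFv4.2 spanning-deletion allele, which is what the
    warnings above point at; listing the affected records makes it easy
    to inspect or pre-filter them.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if "*" in cols[4].split(","):
            yield cols[0], int(cols[1])
```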
Hi,
I'm getting the following error messages:
2017-06-01 14:09:41,816 ERROR Input file NA12878.diploidSV.vcf.gz} does not exist.
2017-06-01 14:09:41,817 ERROR Traceback (most recent call last):
2017-06-01 14:09:41,817 ERROR File "/opt/hap.py/bin/hap.py", line 460, in
2017-06-01 14:09:41,817 ERROR main()
2017-06-01 14:09:41,817 ERROR File "/opt/hap.py/bin/hap.py", line 224, in main
2017-06-01 14:09:41,817 ERROR raise Exception("Input file %s does not exist." % args.vcf2)
2017-06-01 14:09:41,817 ERROR Exception: Input file NA12878.diploidSV.vcf.gz} does not exist.
The commands I'm running are:
#!/bin/bash -x
HAPPY=/opt/hap.py/bin/hap.py
export HGREF=/home/neuro/Public/RefSeqIndexAllPrograms/GATK/hg19_1stM_unmask_ran_all.fa
WORKDIR=/path/to/NA12878
TRUTH=${WORKDIR}/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
QUERY=${WORKDIR}/NA12878.diploidSV.vcf.gz
${HAPPY} ${TRUTH} ${QUERY} -o test -r ${HGREF}
The header for both files:
zcat NA12878.diploidSV.vcf.gz | head
##fileformat=VCFv4.1
##fileDate=20170412
##source=GenerateSVCandidates 0.29.6
##reference=file:///home/neuro/Public/RefSeqIndexAllPrograms/GATK/hg19_1stM_unmask_ran_all.fa
##contig=<ID=chrM,length=16571>
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
zcat HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz | head
##fileformat=VCFv4.2
##fileDate=20160824
##CL=vcffilter -i filtered-phase-transfer.vcf.gz -o - --javascript "ensureFormatHeader("##FORMAT=<ID=PS,Number=1,Type=String,Description=\"Phase set for GT\">"); function record() {if(INTEGRATION.GT=="1/1") { INTEGRATION.IPS="."; INTEGRATION.PS="HOMVAR"; INTEGRATION.GT="1|1";} else {if((INTEGRATION.GT=="0/1" || INTEGRATION.GT=="1/2" || INTEGRATION.GT=="2/1" || INTEGRATION.GT=="1/0") ) {if(INTEGRATION.IPS.length>1) {INTEGRATION.PS=INTEGRATION.IPS; INTEGRATION.GT=INTEGRATION.IGT;} else {INTEGRATION.PS=".";};} else { if((INTEGRATION.IPS.length<2)) { INTEGRATION.IPS=".";} INTEGRATION.PS="PATMAT";};};}"
##RUN-ID=99570df1-3bbc-46df-b8e8-0839daa980e3
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
The files are definitely in the working directory, so I'm not quite sure why it can't find the query file. Any help would be much appreciated.
Cheers,
Thuong
Hi, I simulated reads with variants on A. thaliana. Then I ran a variant caller, and when I went to benchmark the calls I got the error below. The golden VCF produced by the simulator does not have samples, because the simulator created the variants itself, but hap.py does not like that:
2018-06-11 11:40:13,580 ERROR CalledProcessError: Command 'vcfcheck /gatk/required_files/A.tha_variant_01_golden.vcf --check-bcf-errors 1' returned non-zero exit status 1
Is there any way I can modify the .vcf or pass an option to hap.py to make it work? Thank you very much.
This is a part of the golden.vcf:
##fileformat=VCFv4.1
##reference=/A.tha.nuclear.fa
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=VMX,Number=1,Type=String,Description="SNP is Missense in these Read Frames">
##INFO=<ID=VNX,Number=1,Type=String,Description="SNP is Nonsense in these Read Frames">
##INFO=<ID=VFX,Number=1,Type=String,Description="Indel Causes Frameshift">
##INFO=<ID=WP,Number=A,Type=Integer,Description="Ploidy indicator">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INS,Description="Insertion of novel sequence">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=CNV,Description="Copy number variable region">
##ALT=<ID=TRANS,Description="Translocation">
##ALT=<ID=INV-TRANS,Description="Inverted translocation">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 513 . C T . PASS WP=1/0
1 542 . ATG A . PASS WP=0/0
1 573 . C T . PASS WP=1/0
1 643 . G A . PASS WP=0/0
1 659 . A G . PASS WP=0/1
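One option is to add a sample column to the golden VCF before passing it to hap.py. The following is a sketch of that (the helper is hypothetical, and it assumes, which may not hold for every simulator, that the WP INFO field encodes the genotype, e.g. WP=1/0; records without WP get a missing GT):

```python
import re

def add_sample_column(vcf_lines, sample="SIMULATED"):
    """Add a FORMAT column and one sample column to a sample-less VCF,
    deriving GT from the simulator's WP INFO field (an assumption)."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            out.append(line)
        elif line.startswith("#CHROM"):
            # Declare GT before extending the header line
            out.append('##FORMAT=<ID=GT,Number=1,Type=String,'
                       'Description="Genotype">')
            out.append(line + "\tFORMAT\t" + sample)
        else:
            info = line.split("\t")[7]
            m = re.search(r"WP=([\d|/]+)", info)
            gt = m.group(1) if m else "./."
            out.append(line + "\tGT\t" + gt)
    return out
```

Whether WP=0/0 records should instead be dropped entirely is worth checking against the simulator's documentation.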
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/quantify] Error 1
make[1]: *** [src/c++/main/CMakeFiles/quantify.dir/all] Error 2
make: *** [all] Error 2
Traceback (most recent call last):
File "install.py", line 332, in
main()
File "install.py", line 317, in main
build_haplotypes(source_dir, build_dir, args)
File "install.py", line 186, in build_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /tmp/buildPXIbnx && make -j64' returned non-zero exit status 2
Kindly help in resolving this...
I experienced this error:
[W] Found a variant with more 4 > 2 (max) alt alleles. These become no-calls.
log:
2018-02-14 09:30:18,372 ERROR Command 'quantify /tmp/hap.py.result.jryogM.vcf.gz -o /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.roc.tsv -r /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa --threads 8 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.vcf.gz --roc-regions '*'' returned non-zero exit status -11
2018-02-14 09:30:18,372 ERROR Traceback (most recent call last):
2018-02-14 09:30:18,372 ERROR File "build/bin/hap.py", line 511, in <module>
2018-02-14 09:30:18,373 ERROR main()
2018-02-14 09:30:18,373 ERROR File "build/bin/hap.py", line 496, in main
2018-02-14 09:30:18,373 ERROR qfy.quantify(args)
2018-02-14 09:30:18,373 ERROR File "/mnt/home/ian/hap.py/build/bin/qfy.py", line 129, in quantify
2018-02-14 09:30:18,374 ERROR strat_fixchr=args.strat_fixchr)
2018-02-14 09:30:18,374 ERROR File "/mnt/home/ian/hap.py/build/lib/python27/Haplo/quantify.py", line 178, in run_quantify
2018-02-14 09:30:18,375 ERROR subprocess.check_call(run_str, shell=True, stdout=tfo, stderr=tfe)
2018-02-14 09:30:18,375 ERROR File "/mnt/home/ian/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
2018-02-14 09:30:18,377 ERROR raise CalledProcessError(retcode, cmd)
2018-02-14 09:30:18,377 ERROR CalledProcessError: Command 'quantify /tmp/hap.py.result.jryogM.vcf.gz -o /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.roc.tsv -r /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa --threads 8 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.vcf.gz --roc-regions '*'' returned non-zero exit status -11
stdout:
bespin1 Wed Feb 14 09:27 hap.py $build/bin/hap.py \
> /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/svanalyzer_union_171212_v0.5.0_annotated.with_chr.vcf \
> /mnt/yard2/ian/structural_variants/longranger2.0-vcfs/HG2_combined.sorted.vcf.gz \
> -r /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa \
> -o /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy \
> --roc QUAL -V \
> --logfile /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/happy.log \
> --threads 8
[W] overlapping records at chr1:1285401 for sample 1
[W] variant at chr1:79708114 has more than one base of reference padding
[W] Variants that have >1 base of reference padding: 142
[W] Variants that overlap on the reference allele: 2765
[I] Total VCF records: 74216
[I] Non-reference VCF records: 66671
[W] Symbolic / SV ALT alleles at chr1:649703
[W] overlapping records at chr1:829171 for sample 0
[W] variant at chr1:20164759 has more than one base of reference padding
[W] Variants that have >1 base of reference padding: 253
[W] Variants that overlap on the reference allele: 291
[W] Variants that have symbolic ALT alleles: 6177
[I] Total VCF records: 32009
[I] Non-reference VCF records: 31880
Any idea what is going on? Since /tmp/hap.py.result.jryogM.vcf.gz is a temporary file, I can't check on it after the run. Is there a way to retain intermediates?
I'm running hap.py like this:
hap.py patient_3_strelka.vcf.gz patient_3_varscan.vcf.gz --reference /b37/human_g1k_v37.fasta -o ${NAME}_str_var
My input vcfs are sorted, bgzipped, and tabix indexed.
I'm running it on a 64 core server with 500GB RAM. It's running CentOS Linux release 7.3.1611.
Gcc is version 4.9.2. I upgraded from 4.8 this morning; after upgrading gcc I uninstalled and re-installed hap.py using Bioconda.
It runs for a while and then produces the following error messages:
2017-11-17 14:18:42,736 ERROR Command 'quantify /tmp/4487.1.all.q/hap.py.result.WAYuiy.vcf.gz -o _str_var.roc.tsv -r /b37/human_g1k_v37.fasta --threads 64 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v _str_var.vcf.gz --roc-regions '*'' returned non-zero exit status 1
2017-11-17 14:18:42,736 ERROR Traceback (most recent call last):
2017-11-17 14:18:42,736 ERROR File "/home/wid/miniconda3/envs/patrick/bin/hap.py", line 460, in
2017-11-17 14:18:42,737 ERROR main()
2017-11-17 14:18:42,737 ERROR File "/home/wid/miniconda3/envs/patrick/bin/hap.py", line 445, in main
2017-11-17 14:18:42,737 ERROR qfy.quantify(args)
2017-11-17 14:18:42,737 ERROR File "/home/wid/miniconda3/envs/patrick/bin/qfy.py", line 129, in quantify
2017-11-17 14:18:42,737 ERROR strat_fixchr=args.strat_fixchr)
2017-11-17 14:18:42,737 ERROR File "/home/wid/miniconda3/envs/patrick/lib/python27/Haplo/quantify.py", line 177, in run_quantify
2017-11-17 14:18:42,738 ERROR subprocess.check_call(run_str, shell=True, stdout=tfo, stderr=tfe)
2017-11-17 14:18:42,738 ERROR File "/home/wid/miniconda3/envs/patrick/lib/python2.7/subprocess.py", line 186, in check_call
2017-11-17 14:18:42,738 ERROR raise CalledProcessError(retcode, cmd)
2017-11-17 14:18:42,739 ERROR CalledProcessError: Command 'quantify /tmp/4487.1.all.q/hap.py.result.WAYuiy.vcf.gz -o _str_var.roc.tsv -r /data/Phil/ref_phil/GATK_resource/b37/human_g1k_v37.fasta --threads 64 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v _str_var.vcf.gz --roc-regions '*'' returned non-zero exit status 1
This seems to have something to do with the ROCs. I added the --no.roc flag to the command, but it failed the same way.
I have hap.py v0.3.10 installed on Red Hat Enterprise Linux Server (release 7.2.1511), but it takes ~10 hours to complete even the example dataset suggested here. With gVCFs, it runs into several errors (the major ones being "Exception when running <function preprocessWrapper at 0x23a6cf8>" and "IOError: [Errno 23] Too many open files in system").
Looking at the log output from running the example dataset, preprocess consumes ~0.6 hours and xcmp consumes >8 hours. The log file is attached here, and the 'time taken' lines are shown below. The hap.py documentation says it should take <15 minutes to finish the job for gVCFs, but even the example dataset takes far longer to complete. Any idea where the problem arises? I have also tried hap.py v0.3.9, and it had the same problem.
2017-08-23 11:49:01,434 INFO blocksplit for chr21 -- time taken 1.23
2017-08-23 12:24:37,917 INFO preprocess for chr21:36330398-38937043 -- time taken 2136.40
2017-08-23 12:24:43,907 INFO preprocess for chr21:26860115-28424230 -- time taken 2142.38
2017-08-23 12:24:46,338 INFO preprocess for chr21:43622442-44642556 -- time taken 2144.65
2017-08-23 12:24:48,196 INFO preprocess for chr21:38937044-41083006 -- time taken 2146.59
2017-08-23 12:24:50,242 INFO preprocess for chr21:23383456-25351788 -- time taken 2148.72
2017-08-23 12:24:52,227 INFO preprocess for chr21:16812555-18980058 -- time taken 2150.71
2017-08-23 12:24:52,309 INFO preprocess for chr21:46758486-2147483647 -- time taken 2150.63
2017-08-23 12:24:54,070 INFO preprocess for chr21:41083007-42084376 -- time taken 2152.39
2017-08-23 12:24:54,710 INFO preprocess for chr21:21943268-23383455 -- time taken 2153.19
2017-08-23 12:24:54,789 INFO preprocess for chr21:32571313-34319568 -- time taken 2153.11
2017-08-23 12:24:54,874 INFO preprocess for chr21:44642557-46758485 -- time taken 2153.19
2017-08-23 12:24:54,933 INFO preprocess for chr21:28424231-30006304 -- time taken 2153.41
2017-08-23 12:24:55,033 INFO preprocess for chr21:34319569-36330397 -- time taken 2153.36
2017-08-23 12:24:55,060 INFO preprocess for chr21:42084377-43622441 -- time taken 2153.38
2017-08-23 12:24:55,087 INFO preprocess for chr21:25351789-26860114 -- time taken 2153.57
2017-08-23 12:24:55,103 INFO preprocess for chr21:30006305-32571312 -- time taken 2153.59
2017-08-23 12:24:55,384 INFO preprocess for chr21:1-9605451 -- time taken 2153.86
2017-08-23 12:24:55,666 INFO preprocess for chr21:18980059-21943267 -- time taken 2154.14
2017-08-23 12:24:55,770 INFO preprocess for chr21:11198117-16812554 -- time taken 2154.25
2017-08-23 12:24:56,185 INFO preprocess for chr21:10044887-10657837 -- time taken 2154.66
2017-08-23 12:24:56,865 INFO preprocess for chr21:9605452-10044886 -- time taken 2155.34
2017-08-23 12:24:58,042 INFO preprocess for chr21:10657838-11198116 -- time taken 2156.52
2017-08-23 12:25:06,894 INFO preprocess for /tmp/query.pppjVPot.vcf.gz -- time taken 2170.67
2017-08-23 12:25:08,313 INFO blocksplit for chr21 -- time taken 1.37
2017-08-23 20:56:50,889 INFO xcmp for chunk chr21:43203367-43990133 -- time taken 30702.43
2017-08-23 20:57:52,969 INFO xcmp for chunk chr21:9955614-10086540 -- time taken 30764.49
2017-08-23 20:58:39,561 INFO xcmp for chunk chr21:47554728-2147483647 -- time taken 30811.09
2017-08-23 20:58:40,135 INFO xcmp for chunk chr21:18700617-19485076 -- time taken 30811.65
2017-08-23 20:58:47,228 INFO xcmp for chunk chr21:11141681-14362815 -- time taken 30818.74
2017-08-23 20:58:56,355 INFO xcmp for chunk chr21:19485077-20200412 -- time taken 30827.87
2017-08-23 20:59:02,967 INFO xcmp for chunk chr21:15659651-16522182 -- time taken 30834.48
2017-08-23 20:59:20,403 INFO xcmp for chunk chr21:41601933-42337467 -- time taken 30851.92
2017-08-23 20:59:22,483 INFO xcmp for chunk chr21:24204974-25028671 -- time taken 30853.98
2017-08-23 20:59:26,725 INFO xcmp for chunk chr21:35268047-36463156 -- time taken 30858.27
2017-08-23 20:59:38,237 INFO xcmp for chunk chr21:15268599-15659650 -- time taken 30869.75
2017-08-23 20:59:39,899 INFO xcmp for chunk chr21:17778067-18700616 -- time taken 30871.41
2017-08-23 20:59:44,890 INFO xcmp for chunk chr21:46922203-47554727 -- time taken 30876.42
2017-08-23 20:59:47,690 INFO xcmp for chunk chr21:34242631-35268046 -- time taken 30879.23
2017-08-23 20:59:49,259 INFO xcmp for chunk chr21:28700929-29512110 -- time taken 30880.77
2017-08-23 20:59:55,445 INFO xcmp for chunk chr21:9849286-9903489 -- time taken 30886.95
2017-08-23 20:59:58,195 INFO xcmp for chunk chr21:10191719-10637115 -- time taken 30889.72
2017-08-23 21:00:01,130 INFO xcmp for chunk chr21:9903490-9955613 -- time taken 30892.64
2017-08-23 21:00:05,232 INFO xcmp for chunk chr21:16522183-17778066 -- time taken 30896.74
2017-08-23 21:00:05,295 INFO xcmp for chunk chr21:11101337-11141680 -- time taken 30896.81
2017-08-23 21:00:06,175 INFO xcmp for chunk chr21:10795606-10854430 -- time taken 30897.68
2017-08-23 21:00:07,350 INFO xcmp for chunk chr21:36463157-37557574 -- time taken 30898.91
2017-08-23 21:00:09,297 INFO xcmp for chunk chr21:26685873-27755314 -- time taken 30900.80
2017-08-23 21:00:09,297 INFO xcmp for chunk chr21:45567792-46267390 -- time taken 30900.82
2017-08-23 21:00:09,671 INFO xcmp for chunk chr21:20200413-20929776 -- time taken 30901.20
2017-08-23 21:00:11,343 INFO xcmp for chunk chr21:9477179-9568735 -- time taken 30902.86
2017-08-23 21:00:11,477 INFO xcmp for chunk chr21:1-9477178 -- time taken 30902.99
2017-08-23 21:00:11,744 INFO xcmp for chunk chr21:25795687-26685872 -- time taken 30903.26
2017-08-23 21:00:12,442 INFO xcmp for chunk chr21:10762268-10795605 -- time taken 30903.97
2017-08-23 21:00:12,568 INFO xcmp for chunk chr21:21758584-22405450 -- time taken 30904.08
2017-08-23 21:00:13,251 INFO xcmp for chunk chr21:38362658-39058109 -- time taken 30904.79
2017-08-23 21:00:14,191 INFO xcmp for chunk chr21:43990134-44509792 -- time taken 30905.71
2017-08-23 21:00:14,291 INFO xcmp for chunk chr21:42337468-43203366 -- time taken 30905.85
2017-08-23 21:00:15,321 INFO xcmp for chunk chr21:30489401-32023024 -- time taken 30906.87
2017-08-23 21:00:15,618 INFO xcmp for chunk chr21:20929777-21758583 -- time taken 30907.13
2017-08-23 21:00:15,773 INFO xcmp for chunk chr21:10854431-11053798 -- time taken 30907.30
2017-08-23 21:00:15,869 INFO xcmp for chunk chr21:39058110-39964967 -- time taken 30907.39
2017-08-23 21:00:16,056 INFO xcmp for chunk chr21:40929749-41601932 -- time taken 30907.61
2017-08-23 21:00:16,127 INFO xcmp for chunk chr21:9568736-9701279 -- time taken 30907.65
2017-08-23 21:00:16,360 INFO xcmp for chunk chr21:29512111-30489400 -- time taken 30907.91
2017-08-23 21:00:16,899 INFO xcmp for chunk chr21:22405451-23197628 -- time taken 30908.41
2017-08-23 21:00:17,223 INFO xcmp for chunk chr21:27755315-28700928 -- time taken 30908.73
2017-08-23 21:00:17,746 INFO xcmp for chunk chr21:10086541-10191718 -- time taken 30909.26
2017-08-23 21:00:17,747 INFO xcmp for chunk chr21:10637116-10727483 -- time taken 30909.27
2017-08-23 21:00:17,898 INFO xcmp for chunk chr21:39964968-40929748 -- time taken 30909.45
2017-08-23 21:00:17,983 INFO xcmp for chunk chr21:32023025-33039875 -- time taken 30909.53
2017-08-23 21:00:18,052 INFO xcmp for chunk chr21:25028672-25795686 -- time taken 30909.57
2017-08-23 21:00:18,153 INFO xcmp for chunk chr21:9701280-9849285 -- time taken 30909.68
2017-08-23 21:00:18,260 INFO xcmp for chunk chr21:37557575-38362657 -- time taken 30909.78
2017-08-23 21:00:18,273 INFO xcmp for chunk chr21:46267391-46922202 -- time taken 30909.82
2017-08-23 21:00:18,330 INFO xcmp for chunk chr21:14362816-14798906 -- time taken 30909.84
2017-08-23 21:00:18,339 INFO xcmp for chunk chr21:14798907-15268598 -- time taken 30909.85
2017-08-23 21:00:18,406 INFO xcmp for chunk chr21:10727484-10762267 -- time taken 30909.95
2017-08-23 21:00:18,448 INFO xcmp for chunk chr21:33039876-34242630 -- time taken 30909.99
2017-08-23 21:00:18,458 INFO xcmp for chunk chr21:44509793-45567791 -- time taken 30910.01
2017-08-23 21:00:19,136 INFO xcmp for chunk chr21:23197629-24204973 -- time taken 30910.65
2017-08-23 21:00:19,276 INFO xcmp for chunk chr21:11053799-11101336 -- time taken 30910.79
Log file:
NA12878_example_happy.txt
hap.py fails for VCFs due to the presence of star ALT alleles, and I am still trying to figure out how best to deal with this. In the process, I found a bug in error reporting, where variant positions appear to use zero-indexing instead of one-indexing. For example, in the error log snippet below, 1:15273 should actually be 1:15274. I'm using hap.py v0.3.10.
2018-05-15 14:58:27,833 ERROR Preprocess command preprocess /tmp/tmpkHm0Sh.vcf.gz:* -o /tmp/input.akWCP_.prep.vcf.gz -V 1 -L 0 -r ref_genome/grch37/human_g1k_v37.fasta failed. Outputs are here /tmp/stdout3Yq7WQ.log / /tmp/stderrimbr59.log
2018-05-15 14:58:27,834 ERROR [W] too many AD fields at 1:15273 max_ad = 2 retrieved: 3
2018-05-15 14:58:27,834 ERROR [W] too many AD fields at 1:20226 max_ad = 2 retrieved: 3
2018-05-15 14:58:27,836 ERROR [W] too many AD fields at 1:54711 max_ad = 2 retrieved: 4
2018-05-15 14:58:27,836 ERROR [W] too many AD fields at 1:54723 max_ad = 2 retrieved: 3
2018-05-15 14:58:27,836 ERROR [W] too many AD fields at 1:82132 max_ad = 2 retrieved: 4
2018-05-15 14:58:27,841 ERROR Exception when running <function preprocessWrapper at 0x2aaad1e11cf8>:
2018-05-15 14:58:27,841 ERROR ------------------------------------------------------------
2018-05-15 14:58:27,841 ERROR Traceback (most recent call last):
2018-05-15 14:58:27,842 ERROR File "/dddd/software/happy-0.3.10/lib/python27/Tools/parallel.py", line 72, in parMapper
2018-05-15 14:58:27,851 ERROR return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2018-05-15 14:58:27,851 ERROR File "/dddd/software/happy-0.3.10/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2018-05-15 14:58:27,884 ERROR subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2018-05-15 14:58:27,884 ERROR File "/gpfs/gpfs1/software/Python-2.7.11/lib/python2.7/subprocess.py", line 540, in check_call
2018-05-15 14:58:27,897 ERROR raise CalledProcessError(retcode, cmd)
2018-05-15 14:58:27,897 ERROR CalledProcessError: Command 'preprocess /tmp/tmpkHm0Sh.vcf.gz:* -o /tmp/input.akWCP_.prep.vcf.gz -V 1 -L 0 -r ref_genome/grch37/human_g1k_v37.fasta' returned non-zero exit status -6
2018-05-15 14:58:27,898 ERROR ------------------------------------------------------------
2018-05-15 14:58:27,909 ERROR One of the preprocess jobs failed
2018-05-15 14:58:27,909 ERROR Traceback (most recent call last):
2018-05-15 14:58:27,909 ERROR File "/dddd/software/happy-0.3.10/bin/hap.py", line 511, in <module>
2018-05-15 14:58:27,917 ERROR main()
2018-05-15 14:58:27,917 ERROR File "/dddd/software/happy-0.3.10/bin/hap.py", line 366, in main
2018-05-15 14:58:27,918 ERROR "QUERY")
2018-05-15 14:58:27,918 ERROR File "/dddd/software/happy-0.3.10/bin/pre.py", line 203, in preprocess
2018-05-15 14:58:27,925 ERROR haploid_x=gender == "male")
2018-05-15 14:58:27,926 ERROR File "/dddd/software/happy-0.3.10/lib/python27/Haplo/partialcredit.py", line 214, in partialCredit
2018-05-15 14:58:27,926 ERROR raise Exception("One of the preprocess jobs failed")
2018-05-15 14:58:27,926 ERROR Exception: One of the preprocess jobs failed
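Until star alleles are handled natively, one workaround is to drop records whose ALT column contains the spanning-deletion allele `*` before passing the VCF to hap.py. A minimal pure-Python sketch of that filter (the column layout follows the VCF spec; streaming through bcftools would be the usual production route):

```python
def drop_star_allele_records(vcf_lines):
    """Yield VCF lines, skipping records where any ALT allele is '*'.

    Header lines (starting with '#') pass through unchanged.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        alts = fields[4].split(",")  # ALT is the 5th column
        if "*" in alts:
            continue  # spanning-deletion allele: skip the whole record
        yield line

# Example: the second record carries a '*' ALT and is removed.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t15274\t.\tGA\tG\t50\tPASS\t.\n",
    "1\t15274\t.\tG\tA,*\t50\tPASS\t.\n",
]
kept = list(drop_star_allele_records(records))
```

Note that dropping whole records also discards the non-star alleles at those sites, so this is a lossy workaround rather than a fix.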
Trying to install on a CentOS 7.2 machine using
sudo python install.py ~/hap.py-install --with-rtgtools
I got this error:
Linking CXX executable ../../../bin/multimerge
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/multimerge] Error 1
make[1]: *** [src/c++/main/CMakeFiles/multimerge.dir/all] Error 2
Linking CXX executable ../../../bin/alleles
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/alleles] Error 1
make[1]: *** [src/c++/main/CMakeFiles/alleles.dir/all] Error 2
Linking CXX executable ../../../bin/test_haplotypes
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/test_haplotypes] Error 1
make[1]: *** [src/c++/test/CMakeFiles/test_haplotypes.dir/all] Error 2
Linking CXX executable ../../../bin/scmp
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/scmp] Error 1
make[1]: *** [src/c++/main/CMakeFiles/scmp.dir/all] Error 2
Linking CXX executable ../../../bin/quantify
/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [bin/quantify] Error 1
make[1]: *** [src/c++/main/CMakeFiles/quantify.dir/all] Error 2
make: *** [all] Error 2
Traceback (most recent call last):
File "install.py", line 298, in <module>
main()
File "install.py", line 283, in main
build_haplotypes(source_dir, build_dir, args)
File "install.py", line 157, in build_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /tmp/buildIMEVcD && make -j32' returned non-zero exit status 2
install.py failed during the Multimerge test.
Source version: current (commit d51d111)
#./install.py --python system /usr/hpc-bio/hap.py
...
...
...
Running Multimerge test (1)
Adding file '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../data/merge1.vcf.gz' / sample 'NA12877'
Adding file '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../data/merge2.vcf.gz' / sample 'NA12878'
Writing '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../data/merge1.vcf.gz:NA12877' as sample 'NA12877'
Writing '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../data/merge2.vcf.gz:NA12878' as sample 'NA12878'
Multimerge test SUCCEEDED.
Running Multimerge data import test
Adding file '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../data/import_errors.vcf.gz' / sample 'NA12877'
Writing '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../data/import_errors.vcf.gz:NA12877' as sample 'NA12877'
[W] Found a variant with more 3 > 2 (max) alt alleles. These become no-calls.
Multimerge test SUCCEEDED.
Running Multimerge test (2)
Output is in /tmp/multimerge.nKc2BbJ3L8.vcf
Adding file '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../../example/multimerge/hap_alleles_1.vcf.gz' / sample ''
Adding file '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../../example/multimerge/hap_alleles_2.vcf.gz' / sample ''
Writing '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../../example/multimerge/hap_alleles_1.vcf.gz:' as sample 'hap_alleles_1.vcf'
Writing '/nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../../example/multimerge/hap_alleles_2.vcf.gz:' as sample 'hap_alleles_2.vcf'
[W::vcf_parse_format] FORMAT 'GT' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'GT' is not defined in the header, assuming Type=String
Contig chr1 is not known
35a35,40
> chr1 19472002 . GACACACACACACACACACAC G 0 . . GT 0/1 0/1
> chr1 19472022 . C CAC 0 . . GT ./. 0/1
> chr1 19472022 . C CAC 0 . . GT 0/1 ./.
> chr1 31755330 . GTATCTATC G 0 . . GT 0/1 0/1
> chr1 31755334 . CTATC C 0 . . GT ./. 0/1
> chr1 31755334 . CTATC C 0 . . GT 0/1 ./.
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##reference=hg19
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##INFO=<ID=END,Number=.,Type=Integer,Description="SV end position">
##INFO=<ID=IMPORT_FAIL,Number=.,Type=Flag,Description="Flag to identify variants that could not be imported.">
##FORMAT=<ID=AGT,Number=1,Type=String,Description="Genotypes at ambiguous locations">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=A,Type=Integer,Description="Allele Depths">
##FORMAT=<ID=ADO,Number=.,Type=Integer,Description="Summed depth of non-called alleles.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT hap_alleles_1.vcf hap_alleles_2.vcf
Multimerge test (2) FAILED. diff -I^# /tmp/multimerge.nKc2BbJ3L8.vcf /nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/../../example/multimerge/hap_alleles_leftshifted.vcf
Multimerge tests FAILED!
Traceback (most recent call last):
File "./install.py", line 334, in <module>
main()
File "./install.py", line 330, in main
test_haplotypes(source_dir, python_shebang, args)
File "./install.py", line 199, in test_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /usr/hpc-bio/hap.py && /nfs/git/dna/benchmarking-tools/tools/hap.py/src/sh/run_tests.sh' returned non-zero exit status 1
Hi Happy authors,
We've been using hap.py for our internal DeepVariant evaluations and recently did a deep dive into some of the FPs and FNs, and discovered what looks like a bug in hap.py. Here's the situation (these are using the b37 truth from GIAB for HG002, if that's useful):
Candidate variant call:
20 16392574 . C CGTGT 0 . . GT 1/0
truth:
20 16392574 . CGTGTGTGT CGTGTGTGTGTGT,C 50 PASS . GT:DP:ADALL:AD:GQ:IGT:IPS:PS 1|2:259:0,33,47:2,43,61:297:2|1:.:PATMAT
happy vcf output:
20 16392574 . CGTGTGTGT C 50 . BS=16392574;Regions=CONF,TS_contained GT:BD:BK:BI:BVT:BLT:QQ 0/1:FN:lm:d6_15:INDEL:het:. ./.:.:.:.:NOCALL:nocall:0
20 16392574 . C CGTGT 0 . BS=16392574;Regions=CONF,TS_contained GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 0/1:FP:lm:i1_5:INDEL:het:0
20 16392582 . T TGTGT 50 . BS=16392574;Regions=CONF,TS_contained GT:BD:BK:BI:BVT:BLT:QQ 0/1:FN:lm:i1_5:INDEL:het:. ./.:.:.:.:NOCALL:nocall:0
In this situation we have a multi-allelic het-alt in truth and a candidate variant that contains only one of the two alleles (C=>CGTGT is the same as CGTGTGTGT=>CGTGTGTGTGTGT, since we can remove the suffix GTGTGTGT to produce C=>CGTGT). However, in the hap.py VCF output you can see it reports one FN (for the deletion allele), one FP for the hom-ref assignment of the C=>CGTGT allele, and one extra, incorrect FN allele T=>TGTGT. This last allele is actually the same as C=>CGTGT, but it has been placed at an unusual position (16392582 instead of 16392574) because the reference in this region is a long run of GT dinucleotides.
It's unclear to us whether this is just a problem with the VCF emitter, but if it affects the actual TP/FP/FN calculations then it's definitely much more serious.
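For reference, the allele identity described above can be checked by trimming the shared suffix of REF and ALT (the suffix-trimming half of standard variant normalisation; full left-alignment through the GT repeat would additionally need the reference sequence). Under that sketch, the truth insertion allele reduces to exactly the candidate call:

```python
def trim_shared_suffix(pos, ref, alt):
    """Remove the common suffix of REF and ALT, keeping at least one base each.

    This is only the suffix-trimming half of normalisation; left-shifting the
    T=>TGTGT representation back to 16392574 would also require the reference.
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref, alt

# Truth allele 1 and the candidate call denote the same insertion:
truth = trim_shared_suffix(16392574, "CGTGTGTGT", "CGTGTGTGTGTGT")
candidate = (16392574, "C", "CGTGT")
```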
All the best,
Mark
When neither the -f nor the -T option is specified on the command line, i.e. no BED file is provided as input, hap.py filters out all variants from the truth VCF file.
This should not be the case; we should be able to compare two VCF files without needing any region definition.
Here is what the summary looks like:
Benchmarking Summary:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt METRIC.Recall METRIC.Precision METRIC.Frac_NA TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
INDEL ALL 0 0 0 906937 0 906937 0 0.0 NaN 1.0 . NaN . 1.806528
INDEL PASS 0 0 0 906937 0 906937 0 0.0 NaN 1.0 . NaN . 1.806528
SNP ALL 0 0 0 3820642 0 3820642 0 0.0 NaN 1.0 . 2.025957 . 1.592023
SNP PASS 0 0 0 3820642 0 3820642 0 0.0 NaN 1.0 . 2.025957 . 1.592023
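The all-UNK summary above follows mechanically from the metric definitions: with an empty evaluation region every query call is UNK, so nothing is assessed. A sketch of the arithmetic (these formulas match the standard definitions; treat them as illustrative, not hap.py's exact code path):

```python
def summary_metrics(truth_tp, truth_fn, query_total, query_fp, query_unk):
    """Recall, precision and Frac_NA from hap.py-style summary counts."""
    recall = truth_tp / (truth_tp + truth_fn) if truth_tp + truth_fn else 0.0
    assessed = query_total - query_unk            # query calls inside regions
    precision = (assessed - query_fp) / assessed if assessed else float("nan")
    frac_na = query_unk / query_total if query_total else 0.0
    return recall, precision, frac_na

# The INDEL row above: everything UNK -> recall 0, precision NaN, Frac_NA 1
recall, precision, frac_na = summary_metrics(0, 0, 906937, 0, 906937)
```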
While I understand that a copy of hg19 with 'chr' prefixed to the chromosome names is required for passing the integration tests, it seems that chromosome names are required to start with 'chr' even for general analysis. For instance, when running hap.py such as:
~/hap.py-install/bin/hap.py test_input.vcf.gz test_calls.vcf.gz -r ~/resources/GRCh38.p2/GRCh38.p2.fa -o happytest
I get an error like:
...
[fai_fetch_seq] The sequence "chr10" not found
...
There's no contig called 'chr10' in the VCFs or in the supplied reference FASTA; is hap.py perhaps accidentally looking for contigs that start with 'chr' even when it's not performing internal testing?
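One way to narrow this down is to compare the contig names declared in the VCF header against the FASTA index (.fai) before running hap.py. A stdlib sketch (`check_contigs` is a hypothetical helper, not part of hap.py):

```python
import re

def check_contigs(vcf_header_lines, fai_lines):
    """Return VCF header contigs that are absent from the FASTA .fai index."""
    vcf_contigs = set()
    for line in vcf_header_lines:
        m = re.match(r"##contig=<ID=([^,>]+)", line)
        if m:
            vcf_contigs.add(m.group(1))
    # .fai is tab-separated; the first column is the sequence name
    fasta_contigs = {line.split("\t")[0] for line in fai_lines if line.strip()}
    return sorted(vcf_contigs - fasta_contigs)

# A 'chr'-prefixed VCF contig against an unprefixed reference shows up here:
missing = check_contigs(["##contig=<ID=chr10,length=133797422>"],
                        ["10\t133797422\t52\t60\t61\n"])
```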
Hi, I am not able to install hap.py. I ran:
git clone https://github.com/sequencing/hap.py
mkdir hap.py-build
cd hap.py-build
# Run CMake
cmake ../hap.py
# Build
make
make then fails with the error below. Can anyone help me with the installation?
jsoncpp.cpp:(.text.ZN4Json6Reader15StructuredErrorC2EOS1[ZN4Json6Reader15StructuredErrorC5EOS1]+0x45): undefined reference to `std::__cxx11::basic_string<char, std::char_traits, std::allocator >::basic_string(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&&)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/c++/main/CMakeFiles/xcmp.dir/build.make:111: bin/xcmp] Error 1
make[1]: *** [CMakeFiles/Makefile2:304: src/c++/main/CMakeFiles/xcmp.dir/all] Error 2
make: *** [Makefile:128: all] Error 2
Hi there
pre.py may have a small regex issue:
The default arg --fixchr only transforms chromosome names in the body of a VCF file (the CHROM column); it does not transform chromosome names in the contig lines of the header.
The command pre.py uses is:
bcftools view [my.vcf.gz] | perl -pe 's/^([0-9XYM])/chr$1/' | perl -pe s/chrMT/chrM/ | bcftools view -o /tmp/tmp.vcf.gz -O z
This works fine for the body, but it ignores the header lines: the contig info still uses the b37 chromosome format (as below). This causes errors when I use a BED file in hg19 chr format (e.g. tabix index creation fails).
eg:
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG001
chr1 239339 rs184451216 A G 50 PASS platforms=1;platformnames=10X;datasets=1;datasetnames=10XChromium;callsets=1;c
chr1 239482 rs201702841 G T 50 PASS platforms=2;platformnames=10X,Illumina;datasets=2;datasetnames=10XChromium,HiS
chr1 240147 rs202156508 C T 50 PASS platforms=2;platformnames=10X,Illumina;datasets=2;datasetnames=10XChromium,HiS
chr1 240898 rs14799093
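Applying the same substitution to the header would close this gap. A minimal sketch that mirrors the perl one-liners above but also rewrites `##contig=<ID=...>` lines (illustrative only, not the patched pre.py):

```python
import re

def fixchr_line(line):
    """Mirror pre.py's body renaming, but cover ##contig header lines too.

    Body rule (as in the perl one-liners): s/^([0-9XYM])/chr$1/, then
    chrMT -> chrM. Header rule: the same renaming applied to the contig ID.
    """
    if line.startswith("##contig=<ID="):
        line = re.sub(r"(##contig=<ID=)([0-9XYM])", r"\1chr\2", line)
    elif not line.startswith("#"):
        line = re.sub(r"^([0-9XYM])", r"chr\1", line)
    return line.replace("chrMT", "chrM")

fixed = [fixchr_line(l) for l in
         ["##contig=<ID=1,length=249250621>",
          "##contig=<ID=MT,length=16569>",
          "1\t239339\trs184451216\tA\tG\t50\tPASS\t."]]
```

(`bcftools annotate --rename-chrs` with a name-mapping file is another way to rename both header and body consistently.)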
The default Ubuntu-based Docker image is built using the default Python libraries from Ubuntu's repo, which for the pysam library are very outdated. This breaks the Tools/bamstats.py module, which calls pysam.idxstats(). In the outdated version this function returns a list instead of the string that the module expects, leading to an AttributeError.
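A defensive workaround, independent of the pinned pysam version, is to normalise the return value before parsing it; older pysam releases returned a list of lines where newer ones return a single string. `normalize_idxstats` is a hypothetical helper for illustration, not the actual bamstats.py code:

```python
def normalize_idxstats(result):
    """Coerce pysam.idxstats()-style output to a single string.

    Older pysam versions returned a list of lines; newer ones return a str.
    """
    if isinstance(result, bytes):
        result = result.decode()
    if isinstance(result, (list, tuple)):
        return "".join(line if isinstance(line, str) else line.decode()
                       for line in result)
    return result

# Both historical shapes parse identically afterwards:
as_list = normalize_idxstats(["chr1\t1000\t42\t0\n", "*\t0\t0\t7\n"])
as_str = normalize_idxstats("chr1\t1000\t42\t0\n*\t0\t0\t7\n")
```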
I am trying to run som.py using the Docker file provided here. I keep getting bcftools errors that appear to be related to masking in the reference .fa file. Running vcfcheck, I get the same sequences between the VCF and reference file, but one is upper case and one is lower case, so it says they don't match. I think this may be due to the version of bcftools.
Here is the bcftools error:
2018-05-07 21:18:27,692 ERROR Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::hts_idx_push] unsorted positions on sequence #1: 36937065 followed by 36937059
index: failed to create index for "/tmp/tmpv2p9Tv/normalized_truth.vcf.gz"
2018-05-07 21:18:27,692 ERROR Traceback (most recent call last):
2018-05-07 21:18:27,692 ERROR File "/opt/hap.py/bin/som.py", line 779, in
2018-05-07 21:18:27,693 ERROR main()
2018-05-07 21:18:27,693 ERROR File "/opt/hap.py/bin/som.py", line 276, in main
2018-05-07 21:18:27,693 ERROR args.ref)
2018-05-07 21:18:27,693 ERROR File "/opt/hap.py/lib/python27/Tools/bcftools.py", line 220, in preprocessVCF
2018-05-07 21:18:27,693 ERROR runBcftools("index", "-t", output)
2018-05-07 21:18:27,693 ERROR File "/opt/hap.py/lib/python27/Tools/bcftools.py", line 50, in runBcftools
2018-05-07 21:18:27,693 ERROR ". Return code was %i, output: %s / %s \n" % (rc, o, e))
2018-05-07 21:18:27,693 ERROR Exception: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::hts_idx_push] unsorted positions on sequence #1: 36937065 followed by 36937059index: failed to create index for "/tmp/tmpv2p9Tv/normalized_truth.vcf.gz"
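The indexing failure itself comes from out-of-order positions (36937065 followed by 36937059), which is separate from the case question below: tabix/bcftools indexing requires records sorted by position within each contig. A quick stdlib check for that condition (a sketch; `bcftools sort` would be the usual fix):

```python
def first_unsorted(vcf_lines):
    """Return (chrom, prev_pos, pos) at the first out-of-order record, else None."""
    last = {}  # chrom -> last POS seen on that contig
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom in last and pos < last[chrom]:
            return (chrom, last[chrom], pos)
        last[chrom] = pos
    return None

# Reproduces the positions from the error message above:
hit = first_unsorted(["1\t36937065\t.\tA\tG\t.\t.\t.",
                      "1\t36937059\t.\tC\tT\t.\t.\t."])
```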
Here is the output of vcfcheck using the reference .fa and vcf:
mismatched reference G should be g at chr1:115252609
mismatched reference C should be c at chr13:28589495
mismatched reference C should be c at chr13:28589509
mismatched reference AAG should be Aag at chr13:28602226
mismatched reference C should be c at chr13:28611669
mismatched reference A should be a at chr13:28611710
mismatched reference CTTT should be cttt at chr17:7578711
mismatched reference A should be a at chr17:7578837
mismatched reference AGCAGAGGCTTAAGGAGGAGGAAGAAGACAAGAAACGCAAAGAGGAGGAGGAG should be AGCAGAGGCTTAAGGAGGAGGAAGAAGACAAGAAACGCAAAgaggaggaggag at chr19:13054564
mismatched reference C should be c at chr19:33792631
mismatched reference T should be t at chr20:51296340
mismatched reference G should be g at chr21:36206932
mismatched reference T should be t at chr21:43606997
mismatched reference TAAA should be Taaa at chr7:148504906
mismatched reference A should be a at chr7:148508481
mismatched reference C should be c at chr7:148525904
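If soft-masked (lower-case) reference bases are the only difference, a case-insensitive comparison turns every entry above into a match. A sketch of that check (illustrative; whether vcfcheck should fold case is a separate design question):

```python
def ref_matches(vcf_ref, fasta_ref, ignore_case=True):
    """Compare a VCF REF allele with the corresponding reference slice.

    Soft-masking encodes repeats as lower case without changing the base,
    so folding case treats 'Aag' and 'AAG' as the same sequence.
    """
    if ignore_case:
        return vcf_ref.upper() == fasta_ref.upper()
    return vcf_ref == fasta_ref

strict = ref_matches("AAG", "Aag", ignore_case=False)  # the reported mismatch
folded = ref_matches("AAG", "Aag")                     # matches when folded
```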
Chr prefixed truth test successful
Chr prefix detection tests SUCCEEDED!
2017-04-18 13:00:40,482 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-04-18 13:00:40,483 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
Variant decomposition test Hap.py v0.3.7 from /home/mcgaugheyd/hap.py-install/bin
2017-04-18 13:00:41,274 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-04-18 13:00:41,275 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-04-18 13:00:41,518 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-04-18 13:00:41,519 ERROR coercing to Unicode: need string or buffer, NoneType found
2017-04-18 13:00:41,519 ERROR Traceback (most recent call last):
2017-04-18 13:00:41,519 ERROR File "/home/mcgaugheyd/hap.py-install/bin/hap.py", line 460, in <module>
2017-04-18 13:00:41,519 ERROR main()
2017-04-18 13:00:41,519 ERROR File "/home/mcgaugheyd/hap.py-install/bin/hap.py", line 199, in main
2017-04-18 13:00:41,519 ERROR if not os.path.exists(args.ref):
2017-04-18 13:00:41,519 ERROR File "/home/mcgaugheyd/anaconda2/lib/python2.7/genericpath.py", line 26, in exists
2017-04-18 13:00:41,519 ERROR os.stat(path)
2017-04-18 13:00:41,520 ERROR TypeError: coercing to Unicode: need string or buffer, NoneType found
hap.py failed!
Variant decomposition test FAILED!
Traceback (most recent call last):
File "install.py", line 334, in <module>
main()
File "install.py", line 330, in main
test_haplotypes(source_dir, python_shebang, args)
File "install.py", line 199, in test_haplotypes
subprocess.check_call(to_run, shell=True)
File "/home/mcgaugheyd/anaconda2/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /home/mcgaugheyd/hap.py-install && /usr/local/bin/hap.py/src/sh/run_tests.sh' returned non-zero exit status 1
The install fails as shown below; do you have any idea what is wrong?
The dependencies are all installed and I ran
sudo -H python install.py ~/hap.py-install
I could later install with the 'make' method, but I do not quite understand the purpose of the initial, failed install step.
Thanks for your feedback
...
Skipping Locations.INDEL -- not present on both sides
Chr prefixed truth test successful
Chr prefix detection tests SUCCEEDED!
2017-10-29 10:25:42,256 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-10-29 10:25:42,257 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
Variant decomposition test Hap.py v0.3.10 from /home/luna.kuleuven.be/u0002316/hap.py-install/bin
2017-10-29 10:25:42,827 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-10-29 10:25:42,828 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-10-29 10:25:42,975 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-10-29 10:25:42,976 ERROR Please specify a valid reference path using -r.
2017-10-29 10:25:42,976 ERROR Traceback (most recent call last):
2017-10-29 10:25:42,976 ERROR File "/home/luna.kuleuven.be/u0002316/hap.py-install/bin/hap.py", line 511, in <module>
2017-10-29 10:25:42,976 ERROR main()
2017-10-29 10:25:42,976 ERROR File "/home/luna.kuleuven.be/u0002316/hap.py-install/bin/hap.py", line 211, in main
2017-10-29 10:25:42,976 ERROR raise Exception("Please specify a valid reference path using -r.")
2017-10-29 10:25:42,976 ERROR Exception: Please specify a valid reference path using -r.
hap.py failed!
Variant decomposition test FAILED!
Traceback (most recent call last):
File "install.py", line 298, in <module>
main()
File "install.py", line 294, in main
test_haplotypes(source_dir, python_shebang, args)
File "install.py", line 170, in test_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /home/luna.kuleuven.be/u0002316/hap.py-install && /opt/biotools/hap.py/src/sh/run_tests.sh' returned non-zero exit status 1
I downloaded one of the vcf files for NA12877 from a Public Dataset in Basespace (see screenshot).
https://basespace.illumina.com/analyses/18540672?preview=False&projectId=16095081
Ran it against hap.py and got the values below:
i=NA12877-rep1_S1.genome.vcf.gz && sudo docker run -it -v $PWD:/data/input -v $GROUP/PlatinumGenomes/:/data/PlatinumGenomes pkrusche/hap.py /opt/hap.py/bin/hap.py /data/PlatinumGenomes/hg19/8.0.1/NA12877/NA12877.vcf.gz /data/input/$i -o /data/test
HAPPY
Benchmarking Summary:
TRUTH.TOTAL QUERY.TOTAL METRIC.Recall METRIC.Precision METRIC.Frac_NA
Locations.INDEL 591048 622986 0.833987 0.781263 0
Locations.SNP 3592710 3226918 0.889341 0.989879 0
Are these numbers correct? I was expecting SNP recall to be on the order of 0.98 and SNP precision to be around 0.998.
I'm trying to run a comparison of the GIAB HG001 reference VCF against some local data, and am coming up with consistent crashes in hap.py. Unfortunately, the error messages don't point towards a specific cause.
Are there any instructions for how to isolate the root cause of problems like this? Is there a way of turning on debugging messages, or really any message that could point to the cause of this error?
2017-01-23 16:45:14,544 ERROR ------------------------------------------------------------
2017-01-23 16:45:14,544 ERROR Traceback (most recent call last):
2017-01-23 16:45:14,544 ERROR File "/opt/hap.py/lib/python27/Tools/parallel.py", line 71, in parMapper
2017-01-23 16:45:14,544 ERROR return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2017-01-23 16:45:14,544 ERROR File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 64, in preprocessWrapper
2017-01-23 16:45:14,545 ERROR subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2017-01-23 16:45:14,545 ERROR File "/opt/python27/lib/python2.7/subprocess.py", line 541, in check_call
2017-01-23 16:45:14,545 ERROR raise CalledProcessError(retcode, cmd)
2017-01-23 16:45:14,546 ERROR CalledProcessError: Command 'preprocess /tmp/tmpVFXClX.vcf.gz:* -l 4:46200230-49670140 -o /tmp/input.4:46200230-496701409qciL6.prep.vcf.gz -V 1 -L 0 -r /ssd/reference/fda/nih.GRCh37.fasta' returned non-zero exit status 1
2017-01-23 16:45:14,546 ERROR ------------------------------------------------------------
2017-01-23 16:45:33,072 ERROR One of the preprocess jobs failed
2017-01-23 16:45:33,072 ERROR Traceback (most recent call last):
2017-01-23 16:45:33,072 ERROR File "/opt/local/bin/hap.py", line 419, in <module>
2017-01-23 16:45:33,072 ERROR main()
2017-01-23 16:45:33,073 ERROR File "/opt/local/bin/hap.py", line 306, in main
2017-01-23 16:45:33,073 ERROR args.gender) # same gender as truth above
2017-01-23 16:45:33,073 ERROR File "/opt/hap.py/bin/pre.py", line 202, in preprocess
2017-01-23 16:45:33,073 ERROR haploid_x=gender == "male")
2017-01-23 16:45:33,073 ERROR File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 201, in partialCredit
2017-01-23 16:45:33,073 ERROR raise Exception("One of the preprocess jobs failed")
2017-01-23 16:45:33,073 ERROR Exception: One of the preprocess jobs failed
Does hap.py compare gVCFs of various forms? Aside from the standard form on page 24 of the gVCF spec, where a symbolic ALT allele <*> indicates a hom-ref block, GATK uses a different symbolic ALT allele, <NON_REF>. Some gVCFs mark no-call blocks rather than hom-ref blocks, and some mark both. Does hap.py accept all these forms of gVCFs? Thanks a lot.
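For context, the block styles the question refers to can be told apart from the ALT field and the genotype alone. A rough classifier (the tokens follow the VCF spec and the GATK convention; how hap.py internally treats each form is exactly what the question asks):

```python
def classify_gvcf_record(alt, gt):
    """Loosely classify a gVCF record by its ALT field and genotype."""
    symbolic = {"<*>", "<NON_REF>"}  # spec-style and GATK-style ref blocks
    alts = [a for a in alt.split(",") if a != "."]
    if not alts or all(a in symbolic for a in alts):
        return "no-call block" if "." in gt else "hom-ref block"
    return "variant record"

kinds = [classify_gvcf_record("<*>", "0/0"),        # spec-style hom-ref block
         classify_gvcf_record("<NON_REF>", "0/0"),  # GATK-style hom-ref block
         classify_gvcf_record("<NON_REF>", "./."),  # no-call block
         classify_gvcf_record("T,<NON_REF>", "0/1")]
```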
Hi,
Is it possible to use hap.py to make allele-only comparisons (i.e. can the genotype check be disabled)? I know som.py does this, but I'd like to keep hap.py's haplotype resolution, only without the genotype checking.
vcf-eval appears to support this with the --squash-ploidy option.
Thanks.
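For reference, vcfeval's --squash-ploidy behaves roughly like collapsing each genotype to its set of called alleles before matching, so 0/1 and 1/1 become equivalent. A sketch of that collapsing step (an illustration of the idea, not hap.py or vcfeval code):

```python
def squash_ploidy(gt):
    """Reduce a genotype string to its set of non-ref called allele indices.

    '0/1' and '1/1' both squash to {1}, so an allele-only comparison treats
    them as a match; missing alleles ('.') are ignored.
    """
    alleles = gt.replace("|", "/").split("/")
    return {int(a) for a in alleles if a not in (".", "0")}

match = squash_ploidy("0/1") == squash_ploidy("1/1")     # genotype ignored
mismatch = squash_ploidy("0/1") == squash_ploidy("0/2")  # different alleles
```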
Is there a URL/pointer to a prebuilt Docker image of hap.py with hg19?
And given an input vcf.gz/vcf.gz.tbi pair of files, how should one run hap.py with such a prebuilt Docker image?
Thanks
I was trying to use hap.py with Google DeepVariant, with the following command.
docker run -it -v `pwd`:/data pkrusche/hap.py /opt/hap.py/bin/hap.py \
/data/quickstart-testdata/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz \
"/data/${FINAL_OUTPUT_VCF}" \
-f /data/quickstart-testdata/test_nist.b37_chr20_100kbp_at_10mb.bed \
-r "/data/${REF}" \
-o "/data/${OUTPUT_DIR}/happy.output" \
--engine=vcfeval \
-l chr20:10000000-10010000
However, I get the following error:
docker: Error response from daemon: error while creating mount source path '/homes/pyk12/Desktop/DeepVariant': mkdir /homes: read-only file system.
I am not sure whether I have installed Docker or hap.py incorrectly, or have made some other mistake. I have tested Docker with docker run hello-world
and it runs fine. Do you have any idea what the problem is?
I am using Ubuntu 16.04 and I used docker pull pkrusche/hap.py
to get the image. Following is the output of docker info.
Containers: 39
Running: 0
Paused: 0
Stopped: 39
Images: 41
Server Version: 17.06.2-ce
Storage Driver: aufs
Root Dir: /var/snap/docker/common/var-lib-docker/aufs
Backing Filesystem: xfs
Dirs: 139
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-119-generic
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.55 GiB
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 16
Goroutines: 26
System Time: 2018-04-25T16:14:55.308825278+01:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
I got these two errors while trying to run:
python2 hap.py/bin/hap.py
example/happy/PG_NA12878_chr21.vcf.gz
example/happy/NA12878_chr21.vcf.gz
-f example/happy/PG_Conf_chr21.bed.gz
-o test
I'm using hg19 from UCSC; could that be the issue?
2018-01-19 12:33:41,959 WARNING [DEBUG] Error at hap.py/src/c++/lib/diploidgraphs/DiploidReference.cpp:287
2018-01-19 12:33:41,959 WARNING Cannot find matching haplotype pairs at chr21:37213761-37213821
2018-01-19 12:33:41,959 WARNING xcmp(_Z10error_stopPKciS0_z+0x10b)[0x542420]
2018-01-19 12:33:41,959 WARNING xcmp(_ZN10haplotypes16DiploidReference9setRegionEPKcllRKNSt7__cxx114listIN7variant8VariantsESaIS6_EEEi+0x964)[0x55b97c]
2018-01-19 12:33:41,959 WARNING xcmp(_ZN10haplotypes14DiploidCompare9setRegionEPKcllRKNSt7__cxx114listIN7variant8VariantsESaIS6_EEEii+0x28b)[0x5563d5]
2018-01-19 12:33:41,959 WARNING xcmp[0x53cf8e]
2018-01-19 12:33:41,959 WARNING xcmp(main+0x2458)[0x54007f]
2018-01-19 12:33:41,959 WARNING /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f4000842830]
2018-01-19 12:33:41,959 WARNING xcmp(_start+0x29)[0x53c399]
2018-01-19 12:33:41,959 WARNING
2018-01-19 12:33:41,959 WARNING [DEBUG END]
2018-01-19 12:33:41,959 WARNING
2018-01-19 12:33:41,959 WARNING [DEBUG] Error at hap.py/src/c++/lib/diploidgraphs/DiploidReference.cpp:181
2018-01-19 12:33:41,960 WARNING Too many het nodes (21) at chr21:37868998-37869302
2018-01-19 12:33:41,960 WARNING xcmp(_Z10error_stopPKciS0_z+0x10b)[0x542420]
2018-01-19 12:33:41,960 WARNING xcmp(_ZN10haplotypes16DiploidReference9setRegionEPKcllRKNSt7__cxx114listIN7variant8VariantsESaIS6_EEEi+0x2a3)[0x55b2bb]
2018-01-19 12:33:41,960 WARNING xcmp(_ZN10haplotypes14DiploidCompare9setRegionEPKcllRKNSt7__cxx114listIN7variant8VariantsESaIS6_EEEii+0x28b)[0x5563d5]
2018-01-19 12:33:41,960 WARNING xcmp[0x53cf8e]
2018-01-19 12:33:41,960 WARNING xcmp(main+0x2458)[0x54007f]
2018-01-19 12:33:41,960 WARNING /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f4000842830]
2018-01-19 12:33:41,960 WARNING xcmp(_start+0x29)[0x53c399]
2018-01-19 12:33:41,960 WARNING
2018-01-19 12:33:41,960 WARNING [DEBUG END]
I have a file (attached, repro.vcf.gz) which will consistently cause a segmentation fault in preprocess when running pre.py.
The problem can be reproduced with the latest pkrusche/hap.py Docker image by placing the attached repro.vcf.gz file, along with hg19.fa and its .fai index, in the current working directory and running the following command:
$ docker run -it -v `pwd`:/data pkrusche/hap.py /opt/hap.py/bin/pre.py -r /data/hg19.fa /data/repro.vcf.gz /data/out.vcf.gz
Output:
2018-01-24 10:35:39,664 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2018-01-24 10:35:39,670 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[I] Total VCF records: 39000
[I] Non-reference VCF records: 39
2018-01-24 10:35:58,838 ERROR Preprocess command preprocess /tmp/tmpZtDJn6.vcf.gz:* -l chr16:1-2147483647 -o /tmp/input.chr16:1-2147483647AV1SRT.prep.vcf.gz -V 1 -L 1 -r /data/hg19.fa failed. Outputs are here /tmp/stdout5XGXK7.log / /tmp/stderryoI6H2.log
2018-01-24 10:35:58,839 ERROR Segmentation fault (core dumped)
2018-01-24 10:35:58,839 ERROR Exception when running <function preprocessWrapper at 0x7f7be157e9b0>:
2018-01-24 10:35:58,839 ERROR ------------------------------------------------------------
2018-01-24 10:35:58,839 ERROR Traceback (most recent call last):
2018-01-24 10:35:58,839 ERROR File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2018-01-24 10:35:58,840 ERROR return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2018-01-24 10:35:58,840 ERROR File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2018-01-24 10:35:58,840 ERROR subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2018-01-24 10:35:58,840 ERROR File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2018-01-24 10:35:58,841 ERROR raise CalledProcessError(retcode, cmd)
2018-01-24 10:35:58,841 ERROR CalledProcessError: Command 'preprocess /tmp/tmpZtDJn6.vcf.gz:* -l chr16:1-2147483647 -o /tmp/input.chr16:1-2147483647AV1SRT.prep.vcf.gz -V 1 -L 1 -r /data/hg19.fa' returned non-zero exit status 139
2018-01-24 10:35:58,841 ERROR ------------------------------------------------------------
Traceback (most recent call last):
File "/opt/hap.py/bin/pre.py", line 395, in <module>
main()
File "/opt/hap.py/bin/pre.py", line 391, in main
preprocessWrapper(args)
File "/opt/hap.py/bin/pre.py", line 241, in preprocessWrapper
args.somatic_allele_conversion)
File "/opt/hap.py/bin/pre.py", line 203, in preprocess
haploid_x=gender == "male")
File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 214, in partialCredit
raise Exception("One of the preprocess jobs failed")
Exception: One of the preprocess jobs failed
The file is about as small as it can be while still consistently reproducing the issue; if I remove some lines from the end, it only segfaults occasionally.
Running the preprocess program in GDB (a locally compiled binary, no longer using the Docker image) indicates that it is crashing in htslib code:
$ gdb --args preprocess repro.vcf.gz:* -o output.vcf.gz -V 1 -L 1 -r hg19.fa
GNU gdb (GDB) Red Hat Enterprise Linux 7.8.2-38.el7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from preprocess...done.
(gdb) run
Starting program: /opt/hap.py/bin/preprocess repro.vcf.gz:\* -o output.vcf.gz -V 1 -L 1 -r hg19.fa
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00000000005504d8 in vcf_parse_format (s=<error reading variable: Cannot access memory at address 0x7fffff7fef88>, h=<error reading variable: Cannot access memory at address 0x7fffff7fef80>,
v=<error reading variable: Cannot access memory at address 0x7fffff7fef78>, p=<error reading variable: Cannot access memory at address 0x7fffff7fef70>,
q=<error reading variable: Cannot access memory at address 0x7fffff7fef68>) at vcf.c:1883
1883 vcf.c: No such file or directory.
(gdb) where 6
#0 0x00000000005504d8 in vcf_parse_format (s=<error reading variable: Cannot access memory at address 0x7fffff7fef88>, h=<error reading variable: Cannot access memory at address 0x7fffff7fef80>,
v=<error reading variable: Cannot access memory at address 0x7fffff7fef78>, p=<error reading variable: Cannot access memory at address 0x7fffff7fef70>,
q=<error reading variable: Cannot access memory at address 0x7fffff7fef68>) at vcf.c:1883
#1 0x0000000000552dd1 in vcf_parse (s=0x8a5fd0, h=0x8a5150, v=0x8b7660) at vcf.c:2377
#2 0x0000000000540274 in _reader_fill_buffer (files=0x8a5f70, reader=0x8a6b00) at synced_bcf_reader.c:527
#3 0x0000000000540577 in _reader_next_line (files=0x8a5f70) at synced_bcf_reader.c:591
#4 0x00000000005407f3 in bcf_sr_next_line (files=0x8a5f70) at synced_bcf_reader.c:633
#5 0x00000000004ae219 in variant::VariantReader::advance() ()
(More stack frames follow...)
I have a question related to calculating local match statistics. The paper suggests:
TP.LM = QUERY.TP + FP.GT + FP.AL
FN.LM = TRUTH.FN - FP.GT - FP.AL
FP.LM = QUERY.FP - FP.GT - FP.AL
It seems that the FP.AL are counted relative to the query. If I try to use those numbers relative to the truth set (for example, to calculate recall with local matching), I get numbers that don't make sense, simply because the local matches aren't 1-to-1. Perhaps this is simply a feature request, but is there a way to get FN.AL counts? Or am I interpreting something wrong?
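For reference, the quoted formulas translate directly into a small sketch (`local_match_counts` is illustrative, not hap.py code):

```python
def local_match_counts(query_tp, query_fp, truth_fn, fp_gt, fp_al):
    """Derive local-match (LM) counts from hap.py's genotype-level
    counts, following the three formulas quoted above."""
    tp_lm = query_tp + fp_gt + fp_al
    fn_lm = truth_fn - fp_gt - fp_al
    fp_lm = query_fp - fp_gt - fp_al
    recall_lm = float(tp_lm) / (tp_lm + fn_lm) if (tp_lm + fn_lm) else 0.0
    precision_lm = float(tp_lm) / (tp_lm + fp_lm) if (tp_lm + fp_lm) else 0.0
    return tp_lm, fn_lm, fp_lm, recall_lm, precision_lm
```

As the issue points out, because FP.GT and FP.AL are counted against the query, the derived FN.LM (and hence the recall) is only an approximation when local matches are not 1-to-1.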
Downloaded hg19/8.0.1/NA12878/NA12878.vcf.gz and its index file from the Platinum Genomes FTP site. Running on a small test.vcf.gz file like so:
sudo docker run -it -v $PWD:/data/input -v $GROUP/PlatinumGenomes/:/data/PlatinumGenomes pkrusche/hap.py /opt/hap.py/bin/hap.py /data/PlatinumGenomes/hg19/8.0.1/NA12878/NA12878.vcf.gz /data/input/test.vcf.gz -o /data/test
2015-12-02 10:16:54,779 WARNING Reference sequence check failed! Please ensure that truth and query VCF use the same reference sequence as hap.py. XCMP may fail if this is not the case, and the results will not be accurate.
2015-12-02 10:16:54,779 WARNING No calls for location chr2 in query!
2015-12-02 10:16:54,779 WARNING No calls for location chr3 in query!
2015-12-02 10:16:54,779 WARNING No calls for location chr4 in query!
2015-12-02 10:16:54,779 WARNING No calls for location chr5 in query!
2015-12-02 10:16:54,779 WARNING No calls for location chr6 in query!
2015-12-02 10:16:54,779 WARNING No calls for location chr7 in query!
2015-12-02 10:16:54,779 WARNING No calls for location chr8 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr9 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr10 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr11 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr12 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr13 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr14 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr15 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr16 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr17 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr18 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr19 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr20 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr21 in query!
2015-12-02 10:16:54,780 WARNING No calls for location chr22 in query!
2015-12-02 10:16:56,399 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:56,453 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:56,548 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:56,632 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:56,685 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:56,800 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,102 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,400 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'CS' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'Context' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'DP' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'MQ0' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'NS' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'REF' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,400 WARNING [W::vcf_parse] INFO 'SB' is not defined in the header, assuming Type=String
2015-12-02 10:16:57,643 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,747 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,828 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,942 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,992 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:57,992 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,074 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,329 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,345 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,377 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,451 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,508 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,622 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,626 WARNING Could not parse the header line: "##vcfProcessLog=<InputVCF=<>, InputVCFSource=<>, InputVCFVer=<1.1>, InputVCFParam=<> InputVCFgeneAnno=<>>"
2015-12-02 10:16:58,627 ERROR Cannot read sample names from truth input file
2015-12-02 10:16:58,627 ERROR Traceback (most recent call last):
2015-12-02 10:16:58,627 ERROR File "/opt/hap.py/bin/hap.py", line 804, in <module>
2015-12-02 10:16:58,627 ERROR main()
2015-12-02 10:16:58,628 ERROR File "/opt/hap.py/bin/hap.py", line 575, in main
2015-12-02 10:16:58,628 ERROR raise Exception("Cannot read sample names from truth input file")
2015-12-02 10:16:58,628 ERROR Exception: Cannot read sample names from truth input file
My script:
export HGREF=/mypath/hs37d5.fasta
hap.py /path/ctrl.vcf /path/query.vcf -o happy.out -r /path/hs37d5.fasta
g++ version gcc/gcc-7.2.0/bin/g++
Python 2.7.13 :: Anaconda custom (64-bit)
but I still got these errors:
2017-12-28 11:39:56,825 ERROR [stderr] regex_error
2017-12-28 11:39:57,010 ERROR Command 'quantify /tmp/hap.py.result.kQ8D10.vcf.gz -o happy.out12268.roc.tsv -r hs37d5.fasta --threads 32 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v happy.out12268.vcf.gz --roc-regions ''' returned non-zero exit status 1
2017-12-28 11:39:57,010 ERROR Traceback (most recent call last):
2017-12-28 11:39:57,010 ERROR File "/nextcode/sge_software/hap/hapv0.3.10/bin/hap.py", line 511, in
2017-12-28 11:39:57,021 ERROR main()
2017-12-28 11:39:57,022 ERROR File "/nextcode/sge_software/hap/hapv0.3.10/bin/hap.py", line 496, in main
2017-12-28 11:39:57,022 ERROR qfy.quantify(args)
2017-12-28 11:39:57,022 ERROR File "/nextcode/sge_software/hap/hapv0.3.10/bin/qfy.py", line 129, in quantify
2017-12-28 11:39:57,023 ERROR strat_fixchr=args.strat_fixchr)
2017-12-28 11:39:57,023 ERROR File "/nextcode/sge_software/hap/hapv0.3.10/lib/python27/Haplo/quantify.py", line 177, in run_quantify
2017-12-28 11:39:57,027 ERROR subprocess.check_call(run_str, shell=True, stdout=tfo, stderr=tfe)
2017-12-28 11:39:57,027 ERROR File "/nextcode/sge_software/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
2017-12-28 11:39:57,050 ERROR raise CalledProcessError(retcode, cmd)
2017-12-28 11:39:57,051 ERROR CalledProcessError: Command 'quantify /tmp/hap.py.result.kQ8D10.vcf.gz -o happy.out12268.roc.tsv -r hs37d5.fasta --threads 32 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v happy.out12268.vcf.gz --roc-regions ''' returned non-zero exit status 1
[...]
*** 2391 failures detected in test suite "Master Test Suite"
Boost Unit tests FAILED!
Traceback (most recent call last):
File "install.py", line 315, in <module>
main()
File "install.py", line 311, in main
test_haplotypes(source_dir, python_shebang, args)
File "install.py", line 191, in test_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /opt/hap.py && /opt/hap.py-source/src/sh/run_tests.sh' returned non-zero exit status 1
The command '/bin/sh -c HG19=/opt/hap.py-data/hg19.fa python install.py /opt/hap.py' returned a non-zero code: 1
Any ideas?
Hi,
I have installed hap.py through Bioconda and tried to run the program according to the README:
hap.py hap.py/example/happy/PG_NA12878_chr21.vcf.gz hap.py/example/happy/NA12878_chr21.vcf.gz -r /mnt/hds/proj/bioinfo/MIP_ANALYSIS/references_4.0/Homo_sapiens.GRCh37.d5.fasta -f hap.py/example/happy/PG_Conf_chr21.bed.gz -o test
I got the following error:
2017-09-04 14:10:07,886 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-09-04 14:10:07,897 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[W] overlapping records at chr21:10993857 for sample 0
[W] Symbolic / SV ALT alleles at chr21:15847469
[W] Variants that overlap on the reference allele: 144
[W] Variants that have symbolic ALT alleles: 14
[I] Total VCF records: 65402
[I] Non-reference VCF records: 65402
Contig chr21 is not known
2017-09-04 14:10:21,531 ERROR Command 'gvcf2bed /tmp/truth.ppCDsZ5w.vcf.gz -r /mnt/hds/proj/bioinfo/MIP_ANALYSIS/references_4.0/Homo_sapiens.GRCh37.d5.fasta -o /tmp/tmpqpBTVv.bed -T hap.py/example/happy/PG_Conf_chr21.bed.gz' returned non-zero exit status 1
2017-09-04 14:10:21,531 ERROR Traceback (most recent call last):
2017-09-04 14:10:21,531 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/bin/hap.py", line 460, in <module>
2017-09-04 14:10:21,532 ERROR main()
2017-09-04 14:10:21,532 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/bin/hap.py", line 282, in main
2017-09-04 14:10:21,532 ERROR conf_temp = Haplo.gvcf2bed.gvcf2bed(args.vcf1, args.ref, args.fp_bedfile, args.scratch_prefix)
2017-09-04 14:10:21,532 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/lib/python27/Haplo/gvcf2bed.py", line 38, in gvcf2bed
2017-09-04 14:10:21,544 ERROR subprocess.check_call(cmdline, shell=True)
2017-09-04 14:10:21,545 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/lib/python2.7/subprocess.py", line 186, in check_call
2017-09-04 14:10:21,561 ERROR raise CalledProcessError(retcode, cmd)
2017-09-04 14:10:21,561 ERROR CalledProcessError: Command 'gvcf2bed /tmp/truth.ppCDsZ5w.vcf.gz -r /mnt/hds/proj/bioinfo/MIP_ANALYSIS/references_4.0/Homo_sapiens.GRCh37.d5.fasta -o /tmp/tmpqpBTVv.bed -T hap.py/example/happy/PG_Conf_chr21.bed.gz' returned non-zero exit status 1
I downloaded gvcf2bed and tried to run the same command again and got a new error.
2017-09-04 14:25:11,100 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2017-09-04 14:25:11,104 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[W] overlapping records at chr21:10993857 for sample 0
[W] Symbolic / SV ALT alleles at chr21:15847469
[W] Variants that overlap on the reference allele: 144
[W] Variants that have symbolic ALT alleles: 14
[I] Total VCF records: 65402
[I] Non-reference VCF records: 65402
usage: gvcf2bed [-h] -I INPUT -O OUTPUT [-s SAMPLE] [-q QUALITY]
[-nq NON_VARIANT_QUALITY] [-b]
gvcf2bed: error: argument -I/--input is required
2017-09-04 14:25:12,072 ERROR Command 'gvcf2bed /tmp/truth.ppfH1zZb.vcf.gz -r /mnt/hds/proj/bioinfo/MIP_ANALYSIS/references_4.0/Homo_sapiens.GRCh37.d5.fasta -o /tmp/tmpCeSiNo.bed -T hap.py/example/happy/PG_Conf_chr21.bed.gz' returned non-zero exit status 2
2017-09-04 14:25:12,072 ERROR Traceback (most recent call last):
2017-09-04 14:25:12,072 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/bin/hap.py", line 460, in <module>
2017-09-04 14:25:12,072 ERROR main()
2017-09-04 14:25:12,073 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/bin/hap.py", line 282, in main
2017-09-04 14:25:12,073 ERROR conf_temp = Haplo.gvcf2bed.gvcf2bed(args.vcf1, args.ref, args.fp_bedfile, args.scratch_prefix)
2017-09-04 14:25:12,073 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/lib/python27/Haplo/gvcf2bed.py", line 38, in gvcf2bed
2017-09-04 14:25:12,073 ERROR subprocess.check_call(cmdline, shell=True)
2017-09-04 14:25:12,073 ERROR File "/mnt/hds/proj/bioinfo/SERVER/miniconda/envs/rnaseq/lib/python2.7/subprocess.py", line 186, in check_call
2017-09-04 14:25:12,074 ERROR raise CalledProcessError(retcode, cmd)
2017-09-04 14:25:12,074 ERROR CalledProcessError: Command 'gvcf2bed /tmp/truth.ppfH1zZb.vcf.gz -r /mnt/hds/proj/bioinfo/MIP_ANALYSIS/references_4.0/Homo_sapiens.GRCh37.d5.fasta -o /tmp/tmpCeSiNo.bed -T hap.py/example/happy/PG_Conf_chr21.bed.gz' returned non-zero exit status 2
It seems like this gvcf2bed only supports Python 3, but I was unable to install hap.py under Python 3. This probably isn't causing the error, but which Python version do you recommend for running hap.py?
Hi, I am comparing two VCF files and get the following errors:
[W] overlapping records at 1:6799149 for sample 0
[W] Variants that overlap on the reference allele: 1346
[I] Total VCF records: 4029532
[I] Non-reference VCF records: 4029532
[W] overlapping records at 1:8149316 for sample 0
[W] Variants that overlap on the reference allele: 464
[I] Total VCF records: 4003155
[I] Non-reference VCF records: 4003155
2018-01-10 12:31:59,960 WARNING No calls for location MT in query!
2018-01-10 12:34:22,279 WARNING [W] too many AD fields at 1:49573365 max_ad = 6 retrieved: 7
2018-01-10 12:34:22,279 WARNING *** Error in `xcmp': free(): invalid next size (fast): 0x0000000002c96e90 ***
2018-01-10 12:34:22,280 WARNING ======= Backtrace: =========
2018-01-10 12:34:22,280 WARNING /usr/lib64/libc.so.6(+0x7d023)[0x7efdc5c05023]
2018-01-10 12:34:22,280 WARNING xcmp(_ZN7variant13VariantReader7advanceEv+0x211f)[0x56eac3]
2018-01-10 12:34:22,280 WARNING xcmp(main+0x2222)[0x533839]
2018-01-10 12:34:22,280 WARNING /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7efdc5ba9b15]
2018-01-10 12:34:22,280 WARNING xcmp[0x52fd41]
2018-01-10 12:34:22,280 WARNING ======= Memory map: ========
2018-01-10 12:34:22,280 WARNING 00400000-007a0000 r-xp 00000000 00:25 20026347337 tools/hap.py-build/bin/xcmp
...
...
2018-01-10 12:47:17,996 WARNING ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
2018-01-10 12:47:17,996 ERROR Exception when running <function xcmpWrapper at 0x7fa633357668>:
2018-01-10 12:47:17,996 ERROR ------------------------------------------------------------
2018-01-10 12:47:17,997 ERROR Traceback (most recent call last):
2018-01-10 12:47:17,997 ERROR File "tools/hap.py-build/lib/python27/Tools/parallel.py", line 71, in parMapper
2018-01-10 12:47:17,997 ERROR return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2018-01-10 12:47:17,997 ERROR File "tools/hap.py-build/lib/python27/Haplo/xcmp.py", line 69, in xcmpWrapper
2018-01-10 12:47:17,997 ERROR subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2018-01-10 12:47:17,997 ERROR File "/home/cc439/anaconda2/envs/hornet/lib/python2.7/subprocess.py", line 186, in check_call
2018-01-10 12:47:17,997 ERROR raise CalledProcessError(retcode, cmd)
2018-01-10 12:47:17,997 ERROR CalledProcessError: Command 'xcmp /tmp/truth.ppooeVs5.vcf.gz /tmp/query.ppR_g6QK.vcf.gz -l 3:46861889-55529583 -o /tmp/result.3:46861889-55529583yNJudr.bcf -r refdata-b37-2.1.0/fasta/genome.fa -f 0 -n 16768 --expand-hapblocks 30 --window 50 --no-hapcmp 0 --qq QUAL' returned non-zero exit status -6
2018-01-10 12:47:17,997 ERROR ------------------------------------------------------------
Any possible reasons for this?
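The "too many AD fields" warning matches the AD buffer overflow described at the top of this page: BCFHelpers writes more AD values than VariantReader allocated. Until that is fixed in the C++ code, a hedged workaround is to truncate each sample's AD entry to the expected length (REF plus one value per ALT) before running hap.py; `truncate_ad` below is a hypothetical pre-filter, not part of hap.py:

```python
def truncate_ad(vcf_line):
    """Trim each sample's AD entry to at most 1 + number of ALT
    alleles (one depth for REF plus one per ALT), dropping excess
    values that would overflow hap.py's fixed-size AD buffer."""
    fields = vcf_line.rstrip("\n").split("\t")
    if vcf_line.startswith("#") or len(fields) < 10:
        return vcf_line  # headers / lines without samples pass through
    n_alleles = 1 + len(fields[4].split(","))  # REF + ALTs
    fmt = fields[8].split(":")
    if "AD" not in fmt:
        return vcf_line
    ad_idx = fmt.index("AD")
    for i in range(9, len(fields)):
        sample = fields[i].split(":")
        if ad_idx < len(sample):
            ad = sample[ad_idx].split(",")
            sample[ad_idx] = ",".join(ad[:n_alleles])
            fields[i] = ":".join(sample)
    return "\t".join(fields)
```

Applied to the exaggerated record from the bug report above, the 19-element AD list would be cut down to the two values the single-ALT site can legally carry.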
Hi,
I'm running qfy.py on a VCF produced by vcfeval in ga4gh mode. The run takes around two hours, after which it fails. The command line and the stderr are as follows:
python hap.py/bin/qfy.py --report-prefix combined --write-vcf --write-counts --verbose --type ga4gh --threads 8 --stratification stratification.tsv --roc-delta 20 --roc QQ --reference human_g1k_v37_decoy.fasta combined.vcf.gz
2018-03-12T17:39:59.736617253Z 2018-03-12 17:39:59,089 ERROR [stderr] Added region file 'truseq.bed' as 'truseq' (214115 intervals)
2018-03-12T17:39:59.800729266Z 2018-03-12 17:39:59,666 ERROR [stderr] Killed
2018-03-12T17:39:59.800761058Z 2018-03-12 17:39:59,728 ERROR Command 'quantify combined.vcf.gz -o combined.roc.tsv -r human_g1k_v37_decoy.fasta --threads 8 --output-vtc 0 --output-rocs 1 --type ga4gh --qq QQ --roc-delta 20.000000 --clean-info 1 --fix-chr-regions 0 -v combined.vcf.gz -R 'truseq:truseq.bed' --roc-regions '*'' returned non-zero exit status 137
2018-03-12T17:39:59.801256247Z 2018-03-12 17:39:59,750 ERROR Traceback (most recent call last):
2018-03-12T17:39:59.801259753Z 2018-03-12 17:39:59,761 ERROR File "/opt/hap.py/bin/qfy.py", line 378, in <module>
2018-03-12T17:39:59.872980577Z 2018-03-12 17:39:59,865 ERROR main()
2018-03-12T17:39:59.873145682Z 2018-03-12 17:39:59,872 ERROR File "/opt/hap.py/bin/qfy.py", line 374, in main
2018-03-12T17:39:59.873228664Z 2018-03-12 17:39:59,873 ERROR quantify(args)
2018-03-12T17:39:59.890464487Z 2018-03-12 17:39:59,873 ERROR File "/opt/hap.py/bin/qfy.py", line 129, in quantify
2018-03-12T17:39:59.890505413Z 2018-03-12 17:39:59,873 ERROR strat_fixchr=args.strat_fixchr)
2018-03-12T17:39:59.890519038Z 2018-03-12 17:39:59,874 ERROR File "/opt/hap.py/lib/python27/Haplo/quantify.py", line 178, in run_quantify
2018-03-12T17:39:59.911044306Z 2018-03-12 17:39:59,910 ERROR subprocess.check_call(run_str, shell=True, stdout=tfo, stderr=tfe)
2018-03-12T17:39:59.911065951Z 2018-03-12 17:39:59,910 ERROR File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2018-03-12T17:39:59.951891424Z 2018-03-12 17:39:59,951 ERROR raise CalledProcessError(retcode, cmd)
2018-03-12T17:39:59.965796603Z 2018-03-12 17:39:59,964 ERROR CalledProcessError: Command 'quantify combined.vcf.gz -o combined.roc.tsv -r human_g1k_v37_decoy.fasta --threads 8 --output-vtc 0 --output-rocs 1 --type ga4gh --qq QQ --roc-delta 20.000000 --clean-info 1 --fix-chr-regions 0 -v combined.vcf.gz -R 'truseq:truseq.bed' --roc-regions '*'' returned non-zero exit status 137
I'd be happy to share the input files if that would help.
P.S. I truncated the file paths from the stderr just for simplicity's sake.
Thanks
I see in the documentation for som.py that it outputs confidence intervals for recall and precision using the Jeffreys method for binomial proportions. These are calculated by the script https://github.com/Illumina/hap.py/blob/master/src/python/Tools/ci.py.
As far as I can see, there isn't an option to output confidence intervals from hap.py (please correct me if I'm wrong!). In future versions, would it be possible to output this information to the summary.csv file?
In the meantime, would the above Tools/ci.py script be suitable for calculating confidence intervals for the recall and precision output by hap.py, or would you recommend a different method?
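For context, the Jeffreys interval for x successes in n trials is the central (1 − α) credible interval of a Beta(x + 1/2, n − x + 1/2) posterior. The sketch below is self-contained (it is not the Tools/ci.py implementation) and uses a simple numerical Beta quantile, assuming 0 < x < n so the density vanishes at both endpoints:

```python
import math

def _beta_pdf(t, a, b, log_beta):
    # Beta(a, b) density at t, computed in log space for stability
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

def _beta_ppf(q, a, b, steps=1500):
    """Quantile of Beta(a, b) via bisection on a numerically
    integrated CDF (adequate when a, b > 1, i.e. 1 <= x <= n - 1)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    def cdf(x):
        if x <= 0.0:
            return 0.0
        h = x / steps
        return min(1.0, h * sum(_beta_pdf(i * h, a, b, log_beta)
                                for i in range(1, steps)))
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def jeffreys_interval(x, n, alpha=0.05):
    """Jeffreys (1 - alpha) interval for a binomial proportion:
    the alpha/2 and 1 - alpha/2 quantiles of Beta(x + 1/2, n - x + 1/2)."""
    a, b = x + 0.5, n - x + 0.5
    return _beta_ppf(alpha / 2, a, b), _beta_ppf(1 - alpha / 2, a, b)
```

A common refinement, which a production implementation should add, pins the lower bound to 0 when x = 0 and the upper bound to 1 when x = n.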
I tried to install on Ubuntu 16.04 and installed all the dependencies from the Dockerfile,
with numexpr 2.6.2, pandas 0.20.3 and numpy 1.13.1.
The install failed during the tests; the relevant log follows. Is it because som.py uses the deprecated
method sort, which is no longer supported by pandas?
/usr/bin/python /home/jianfeng.qian/hap.py-install/bin/som.py /home/jianfeng.qian/graph_geno/hap.py/src/sh/../../example/sompy/PG_admix_truth_snvs.vcf.gz /home/jianfeng.qian/graph_geno/hap.py/src/sh/../../example/sompy/strelka_admix_snvs.vcf.gz -o /tmp/sompy.L2qFyDafe7 -P --feature-table hcc.strelka.indel
2017-07-14 02:58:03,424 WARNING Missing feature I.QSI_NT
2017-07-14 02:58:03,429 WARNING Missing feature I.RC
2017-07-14 02:58:03,430 WARNING Missing feature I.IC
2017-07-14 02:58:03,430 WARNING Missing feature I.IHP
2017-07-14 02:58:03,430 WARNING Missing feature S.1.BCN50
2017-07-14 02:58:03,430 WARNING Missing feature S.2.BCN50
2017-07-14 02:58:03,430 WARNING Missing feature S.1.FDP50
2017-07-14 02:58:03,431 WARNING Missing feature S.2.FDP50
2017-07-14 02:58:03,431 WARNING Missing feature S.1.TAR
2017-07-14 02:58:03,431 WARNING Missing feature S.2.TAR
2017-07-14 02:58:03,431 WARNING Missing feature S.1.TIR
2017-07-14 02:58:03,431 WARNING Missing feature S.2.TIR
2017-07-14 02:58:03,432 WARNING Missing feature S.1.TOR
2017-07-14 02:58:03,432 WARNING Missing feature S.2.TOR
2017-07-14 02:58:17,032 ERROR 'DataFrame' object has no attribute 'sort'
2017-07-14 02:58:17,038 ERROR Traceback (most recent call last):
2017-07-14 02:58:17,038 ERROR File "/home/jianfeng.qian/hap.py-install/bin/som.py", line 779, in <module>
2017-07-14 02:58:17,038 ERROR main()
2017-07-14 02:58:17,038 ERROR File "/home/jianfeng.qian/hap.py-install/bin/som.py", line 500, in main
2017-07-14 02:58:17,039 ERROR tps.sort(["CHROM", "POS"], inplace=True)
2017-07-14 02:58:17,039 ERROR File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.3-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 3081, in __getattr__
2017-07-14 02:58:17,041 ERROR return object.__getattribute__(self, name)
2017-07-14 02:58:17,041 ERROR AttributeError: 'DataFrame' object has no attribute 'sort'
som.py failed!
Som.py test FAILED!
Traceback (most recent call last):
File "install.py", line 325, in <module>
main()
File "install.py", line 321, in main
test_haplotypes(source_dir, python_shebang, args)
File "install.py", line 190, in test_haplotypes
subprocess.check_call(to_run, shell=True)
File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /home/jianfeng.qian/hap.py-install && /home/jianfeng.qian/graph_geno/hap.py/src/sh/run_tests.sh' returned non-zero exit status 1
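The traceback above points at a pandas API change: `DataFrame.sort()` was deprecated in pandas 0.17 and removed in 0.20, so the call in som.py raises `AttributeError` under pandas 0.20.3. A minimal sketch of the modern equivalent (illustrative data, not the actual som.py frame):

```python
# DataFrame.sort() no longer exists in pandas >= 0.20;
# sort_values() is the replacement for sorting by columns.
import pandas as pd

tps = pd.DataFrame({"CHROM": ["chr2", "chr1", "chr1"], "POS": [5, 9, 3]})

# Old (fails on pandas >= 0.20): tps.sort(["CHROM", "POS"], inplace=True)
tps.sort_values(["CHROM", "POS"], inplace=True)
```

So the fix would be to change `tps.sort(["CHROM", "POS"], inplace=True)` at som.py line 500 to use `sort_values`, or to pin an older pandas version.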