Comments (3)
Thanks! Fixing this is on our backlog; it can also be fixed using a one-liner to replace contig=<ID=... with contig=<ID=chr.
I have not seen issues with tabix indexing; are you using tabix or bcftools to do this?
from hap.py.
Hi Peter
When using the -R option in pre.py (which is also the -R option in bcftools), the bcftools call built into pre.py may fail if the contig information has not been converted. With the -T option, however, no error is reported.
This is because, for -R, bcftools requires that "sequence names must match exactly, chr20 is not the same as 20". So if the VCF file uses b37 naming but the BED file uses hg19 naming, -R can fail, and the --fixchr option in pre.py does not solve this because of the small regex issue I mentioned. A possible fix is to add the one-liner regex you mentioned (replacing contig=<ID=... with contig=<ID=chr) after "bcftools view [my.vcf.gz] | perl -pe 's/^([0-9XYM])/chr$1/' | perl -pe s/chrMT/chrM/" in pre.py.
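The renaming described above can be sketched in Python. This is only an illustration and not the code in pre.py: the helper name add_chr_prefix is hypothetical, and the contig rule simply assumes that header IDs should follow the same [0-9XYM] pattern as the two perl substitutions quoted above.

```python
import re

# Assumption: contig names starting with a digit, X, Y or M get a "chr"
# prefix, mirroring the perl substitutions quoted in the comment above.
_RECORD = re.compile(r"^([0-9XYM])")
_CONTIG = re.compile(r"^(##contig=<ID=)([0-9XYM])")


def add_chr_prefix(line):
    """Hypothetical helper: convert one VCF line from b37 to hg19-style names."""
    if line.startswith("##contig=<ID="):
        # Header line: rename the contig ID so it matches the renamed records.
        line = _CONTIG.sub(r"\1chr\2", line)
    elif not line.startswith("#"):
        # Record line: prefix the CHROM column.
        line = _RECORD.sub(r"chr\1", line)
    # Same final step as the second perl call: chrMT becomes chrM.
    return line.replace("chrMT", "chrM", 1)
```

Applied to every line of the decompressed VCF, this would keep the ##contig header entries and the CHROM column consistent with each other. Lines that already carry a chr prefix are left unchanged.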
The options I used and the resulting error messages:
pre.py -r ucsc.hg19.fasta -R NA12878highconf_intersect.bed --fixchr --verbose HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz HG01_norm.vcf
2018-01-24 11:24:47,875 INFO Preprocessing HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
[I] Total VCF records: 3775119
[I] Non-reference VCF records: 3775119
2018-01-24 11:25:50,286 INFO [I] X chromosome appears to not be haploid -- assuming this is a female sample
2018-01-24 11:25:50,467 INFO bcftools view HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz | perl -pe 's/^([0-9XYM])/chr$1/' | perl -pe s/chrMT/chrM/ | bcftools view -o /tmp/tmpdDKjMz.vcf.gz -O z
2018-01-24 11:27:39,492 INFO bcftools index -t /tmp/tmpdDKjMz.vcf.gz
2018-01-24 11:27:58,617 INFO bcftools view /tmp/tmpdDKjMz.vcf.gz -R intersect.bed -o /tmp/tmpT1YLtb.vcf.gz -O z
2018-01-24 11:28:08,194 INFO bcftools index -t /tmp/tmpT1YLtb.vcf.gz
Traceback (most recent call last):
  File "hap/hapv0.3.10/bin/pre.py", line 395, in <module>
    main()
  File "hap/hapv0.3.10/bin/pre.py", line 391, in main
    preprocessWrapper(args)
  File "hap/hapv0.3.10/bin/pre.py", line 241, in preprocessWrapper
    args.somatic_allele_conversion)
  File "hap/hapv0.3.10/bin/pre.py", line 192, in preprocess
    sample=sample)
  File "hap/hapv0.3.10/lib/python27/Tools/bcftools.py", line 219, in preprocessVCF
    runBcftools("index", "-t", output)
  File "hap/hapv0.3.10/lib/python27/Tools/bcftools.py", line 49, in runBcftools
    ". Return code was %i, output: %s / %s \n" % (rc, o, e))
Exception: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::hts_idx_push] unsorted positions on sequence #5: 114453605 followed by 114453591
index: failed to create index for "/tmp/tmpT1YLtb.vcf.gz"
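The index step fails because two consecutive records on the same contig are out of position order. A minimal sketch for locating the first such pair in a VCF is shown below. The function name first_unsorted is hypothetical, and the sketch assumes tab-separated VCF lines supplied as text.

```python
def first_unsorted(lines):
    """Return (chrom, previous_pos, pos) for the first out-of-order record, else None."""
    last_chrom, last_pos = None, None
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        # Only compare records that sit on the same contig.
        if chrom == last_chrom and pos < last_pos:
            return chrom, last_pos, pos
        last_chrom, last_pos = chrom, pos
    return None
```

For a compressed file, the lines could be supplied from gzip.open(path, "rt"). Running the check on the intermediate file written by the bcftools view -R step would show which records ended up in the wrong order.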
When I change -R to -T, it runs fine and no error messages are reported.
pre.py -r ucsc.hg19.fasta -T NA12878highconf_intersect.bed --fixchr --verbose HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz HG01_norm.vcf
2018-01-31 13:32:25,865 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2018-01-31 13:32:25,962 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[I] Total VCF records: 3775119
[I] Non-reference VCF records: 3775119
Hi,
Thank you for creating this great tool. I seem to be running into a similar error when I use -R; when I switch to -T, it works.
Here is the error I see when I use -R:
2018-12-27 17:57:51,640 INFO bcftools index -t /tmp/tmpMivC9J.vcf.gz
2018-12-27 17:57:52,010 INFO bcftools view /tmp/tmpMivC9J.vcf.gz -R //BED/target.bed -o /tmp/tmpKINXFk.vcf.gz -O z
2018-12-27 17:58:19,926 INFO bcftools index -t /tmp/tmpKINXFk.vcf.gz
2018-12-27 17:58:20,037 ERROR Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::hts_idx_push] unsorted positions on sequence #9: 2595270 followed by 2595248
index: failed to create index for "/tmp/tmpKINXFk.vcf.gz"
In my VCF, I have a deletion called twice in the sample:
chr17 2595248 . CCGCCACGCCCCCGCCCCGCCCCCGCCCCCGCCA C 686.21 PASS AC=0;AF=0.00;AN=2;BaseQRankSum=-2.093;ClippingRankSum=-0.046;DP=13;FS=2.282;MQ=69.65;MQ0=0;MQRankSum=-1.337;QD=12.04;ReadPosRankSum=1.890;SOR=0.976;VQSLOD=2.35;culprit=FS GT:AD:DP:GQ:PL:TP 0|0:13,0:13:43:0,43,535:3
chr17 2595270 . CCGCCCCCGCCA C 316.31 PASS AC=0;AF=0.00;AN=2;BaseQRankSum=-3.319;ClippingRankSum=0.494;DP=11;FS=0.000;MQ=69.56;MQ0=0;MQRankSum=-0.454;QD=7.53;SOR=0.586;VQSLOD=5.51;culprit=FS GT:AD:DP:GQ:PL:TP 0|0:11,0:11:40:0,40,445:3
chr17 2595281 . A C 689.83 VQSRTrancheSNP99.90to100.00 AC=2;AF=1.00;AN=2;BaseQRankSum=2.898;ClippingRankSum=-0.282;DP=12;FS=20.203;MQ=70.00;MQ0=0;MQRankSum=-1.941;NEGATIVE_TRAIN_SITE;QD=18.64;ReadPosRankSum=-5.165;SOR=4.307;VQSLOD=-6.317e+00;culprit=ReadPosRankSum GT:AD:DP:GQ:PL:TP 1|1:1,11:12:31:435,31,0:9
When I removed chr17:2595270 CCGCCCCCGCCA>C from the input VCF, it worked with -R.
I'm not exactly sure what the issue is; it seems perfectly valid to represent large indels in multiple rows, especially if phased.
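To make the overlap concrete, the reference span of each record can be computed from POS and the length of REF. The sketch below uses the three records quoted above. The helper names are hypothetical, and it only shows that the two deletions cover overlapping reference positions; it does not by itself explain the bcftools behaviour.

```python
def ref_span(pos, ref):
    """Return the 1-based inclusive (start, end) covered by a record's REF allele."""
    return pos, pos + len(ref) - 1


def spans_overlap(a, b):
    """True if two inclusive (start, end) intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# POS and REF copied from the chr17 records above.
first = ref_span(2595248, "CCGCCACGCCCCCGCCCCGCCCCCGCCCCCGCCA")
second = ref_span(2595270, "CCGCCCCCGCCA")
snv = ref_span(2595281, "A")

print(first, second, snv)
print(spans_overlap(first, second))  # the two deletions share reference bases
```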
Thanks,
Rohan