pre.py header contig INFO issue about hap.py HOT 3 CLOSED

illumina commented on August 24, 2024
pre.py header contig INFO issue

from hap.py.

Comments (3)

pkrusche commented on August 24, 2024

Thanks! Fixing this is on our backlog; it can also be fixed using a one-liner to replace contig=<ID=... with contig=<ID=chr.
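The header rewrite mentioned above could be sketched roughly as follows. This is only an illustration, not code from pre.py: the function name `fix_contig_header` is hypothetical, and the choice to rename only contigs starting with `[0-9XYM]` (and to map `MT` to `chrM`) is an assumption that mirrors the perl substitutions pre.py applies to the record lines.

```python
import re

# Matches the ID field of a VCF contig header line such as
# "##contig=<ID=20,length=63025520>". Only names starting with a digit,
# X, Y or M are renamed (assumption: same character class as the record regex).
CONTIG_ID = re.compile(r"^(##contig=<ID=)([0-9XYM][^,>]*)")


def fix_contig_header(line):
    """Add a 'chr' prefix to the ID of a ##contig header line.

    Lines that are not contig headers, or that already carry a
    'chr' prefix, are returned unchanged.
    """
    def rename(match):
        name = match.group(2)
        if name == "MT":  # b37 mitochondrial name -> hg19 style
            name = "M"
        return match.group(1) + "chr" + name

    return CONTIG_ID.sub(rename, line)


if __name__ == "__main__":
    for example in ("##contig=<ID=20,length=63025520>",
                    "##contig=<ID=MT,length=16569>",
                    "##contig=<ID=chr1,length=249250621>"):
        print(fix_contig_header(example))
```

Applied to every header line before the records are rewritten, this would keep the contig declarations consistent with the renamed CHROM column.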

I have not seen issues with tabix indexing; are you using tabix or bcftools to do this?


m891115891117 commented on August 24, 2024

Hi Peter,
When using the -R option in pre.py (which maps to the -R option in bcftools), the bcftools build bundled with pre.py may fail if the contig information has not been transformed. With the -T option, however, no error is reported.
This is because the bcftools -R option requires that "sequence names must match exactly, chr20 is not the same as 20". So if the VCF file uses b37 naming but the BED file uses hg19 naming, -R may cause the problem, and the --fixchr option in pre.py will not solve it because of the small regex issue I mentioned. To solve this, maybe just add the one-liner regex you mentioned (replacing contig=<ID=... with contig=<ID=chr) after "bcftools view [my.vcf.gz] | perl -pe 's/^([0-9XYM])/chr$1/' | perl -pe s/chrMT/chrM/" in pre.py.
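Since -R needs the names to match exactly, a quick pre-flight comparison of BED and VCF contig names can show whether a mismatch like the one described above is present. The following is a minimal sketch under stated assumptions: the helper names (`vcf_contigs`, `bed_contigs`, `missing_from_vcf`) are hypothetical and not part of hap.py or bcftools, and the inputs are plain lists of text lines that the caller has already read from the files.

```python
import re


def vcf_contigs(header_lines):
    """Collect contig names declared in ##contig=<ID=...> header lines."""
    names = set()
    for line in header_lines:
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match:
            names.add(match.group(1))
    return names


def bed_contigs(bed_lines):
    """Collect the contig names used in the first column of a BED file."""
    names = set()
    for line in bed_lines:
        # Skip blank lines and the optional BED comment/track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def missing_from_vcf(header_lines, bed_lines):
    """Return BED contig names that have no exact match in the VCF header."""
    return sorted(bed_contigs(bed_lines) - vcf_contigs(header_lines))


if __name__ == "__main__":
    header = ["##fileformat=VCFv4.2", "##contig=<ID=20,length=63025520>"]
    bed = ["chr20\t100\t200"]
    print(missing_from_vcf(header, bed))
```

A non-empty result means the BED file refers to contigs the VCF header does not declare under the same name, which is the b37 versus hg19 situation described in this comment.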

The options I used and the resulting error messages:
pre.py -r ucsc.hg19.fasta -R NA12878highconf_intersect.bed --fixchr --verbose HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz HG01_norm.vcf

2018-01-24 11:24:47,875 INFO Preprocessing HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
[I] Total VCF records: 3775119
[I] Non-reference VCF records: 3775119
2018-01-24 11:25:50,286 INFO [I] X chromosome appears to not be haploid -- assuming this is a female sample
2018-01-24 11:25:50,467 INFO bcftools view HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz | perl -pe 's/^([0-9XYM])/chr$1/' | perl -pe s/chrMT/chrM/ | bcftools view -o /tmp/tmpdDKjMz.vcf.gz -O z
2018-01-24 11:27:39,492 INFO bcftools index -t /tmp/tmpdDKjMz.vcf.gz
2018-01-24 11:27:58,617 INFO bcftools view /tmp/tmpdDKjMz.vcf.gz -R intersect.bed -o /tmp/tmpT1YLtb.vcf.gz -O z
2018-01-24 11:28:08,194 INFO bcftools index -t /tmp/tmpT1YLtb.vcf.gz
Traceback (most recent call last):
  File "hap/hapv0.3.10/bin/pre.py", line 395, in <module>
    main()
  File "hap/hapv0.3.10/bin/pre.py", line 391, in main
    preprocessWrapper(args)
  File "hap/hapv0.3.10/bin/pre.py", line 241, in preprocessWrapper
    args.somatic_allele_conversion)
  File "hap/hapv0.3.10/bin/pre.py", line 192, in preprocess
    sample=sample)
  File "hap/hapv0.3.10/lib/python27/Tools/bcftools.py", line 219, in preprocessVCF
    runBcftools("index", "-t", output)
  File "hap/hapv0.3.10/lib/python27/Tools/bcftools.py", line 49, in runBcftools
    ". Return code was %i, output: %s / %s \n" % (rc, o, e))
Exception: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::hts_idx_push] unsorted positions on sequence #5: 114453605 followed by 114453591
index: failed to create index for "/tmp/tmpT1YLtb.vcf.gz"

When I change -R to -T, it runs fine and no error messages are reported.

pre.py -r ucsc.hg19.fasta -T NA12878highconf_intersect.bed --fixchr --verbose HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz HG01_norm.vcf

2018-01-31 13:32:25,865 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2018-01-31 13:32:25,962 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[I] Total VCF records: 3775119
[I] Non-reference VCF records: 3775119


rohandavidg commented on August 24, 2024

Hi,

Thank you for creating this great tool. I seem to be experiencing a similar error when I use -R; however, when I switch to -T, it works.

Here is the error I see when I use -R:
2018-12-27 17:57:51,640 INFO bcftools index -t /tmp/tmpMivC9J.vcf.gz
2018-12-27 17:57:52,010 INFO bcftools view /tmp/tmpMivC9J.vcf.gz -R //BED/target.bed -o /tmp/tmpKINXFk.vcf.gz -O z
2018-12-27 17:58:19,926 INFO bcftools index -t /tmp/tmpKINXFk.vcf.gz
2018-12-27 17:58:20,037 ERROR Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::hts_idx_push] unsorted positions on sequence #9: 2595270 followed by 2595248
index: failed to create index for "/tmp/tmpKINXFk.vcf.gz"

In my VCF, I have a deletion called twice in the sample:
chr17 2595248 . CCGCCACGCCCCCGCCCCGCCCCCGCCCCCGCCA C 686.21 PASS AC=0;AF=0.00;AN=2;BaseQRankSum=-2.093;ClippingRankSum=-0.046;DP=13;FS=2.282;MQ=69.65;MQ0=0;MQRankSum=-1.337;QD=12.04;ReadPosRankSum=1.890;SOR=0.976;VQSLOD=2.35;culprit=FS GT:AD:DP:GQ:PL:TP 0|0:13,0:13:43:0,43,535:3
chr17 2595270 . CCGCCCCCGCCA C 316.31 PASS AC=0;AF=0.00;AN=2;BaseQRankSum=-3.319;ClippingRankSum=0.494;DP=11;FS=0.000;MQ=69.56;MQ0=0;MQRankSum=-0.454;QD=7.53;SOR=0.586;VQSLOD=5.51;culprit=FS GT:AD:DP:GQ:PL:TP 0|0:11,0:11:40:0,40,445:3
chr17 2595281 . A C 689.83 VQSRTrancheSNP99.90to100.00 AC=2;AF=1.00;AN=2;BaseQRankSum=2.898;ClippingRankSum=-0.282;DP=12;FS=20.203;MQ=70.00;MQ0=0;MQRankSum=-1.941;NEGATIVE_TRAIN_SITE;QD=18.64;ReadPosRankSum=-5.165;SOR=4.307;VQSLOD=-6.317e+00;culprit=ReadPosRankSum GT:AD:DP:GQ:PL:TP 1|1:1,11:12:31:435,31,0:9

When I removed chr17:2595270 CCGCCCCCGCCA>C from the input VCF, it worked with -R.
I'm not exactly sure what the issue is; it seems perfectly valid to have large indels represented in multiple rows, especially if phased.

Thanks,
Rohan
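To see where the order breaks in a file like the intermediate VCF above, a small scan for records that start before their predecessor on the same contig can help. This is a minimal sketch, not part of hap.py: the function name `find_unsorted` is hypothetical, and it only inspects the CHROM and POS columns. As a possible explanation (an assumption, not a confirmed diagnosis for this report), the bcftools documentation notes that overlapping regions passed with -R can yield duplicated, out-of-order positions in the output, which a long deletion spanning more than one target region might trigger.

```python
def find_unsorted(vcf_lines):
    """Return (chrom, previous_pos, pos) for each out-of-order record.

    A record is reported when it is on the same contig as the record
    immediately before it but has a smaller POS, which is the condition
    the indexer complains about.
    """
    problems = []
    prev_chrom, prev_pos = None, None
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split(None, 2)
        chrom, pos = fields[0], int(fields[1])
        if chrom == prev_chrom and pos < prev_pos:
            problems.append((chrom, prev_pos, pos))
        prev_chrom, prev_pos = chrom, pos
    return problems


if __name__ == "__main__":
    records = [
        "##fileformat=VCFv4.2",
        "chr17\t2595270\t.\tCCGCCCCCGCCA\tC",
        "chr17\t2595248\t.\tCCGCCACGCC\tC",
        "chr17\t2595281\t.\tA\tC",
    ]
    print(find_unsorted(records))
```

Running this on the decompressed output of the `bcftools view ... -R` step would list each position pair of the kind named in the `hts_idx_push` message.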
