donfreed / mosaic-from-barcodes
Detect mosaic variants using barcode information
License: MIT License
See these lines.
Hi Don,
I am trying to run the test data with your code, and I have run into some issues. For example, when I search for MOSAIC==1 I get the following two lines:
chr1 2454204 . A G 1392.69 PASS 0.31224843345631653 3 1.0 12 53 1|1
chr1 2454204 . A G 357.487 PASS 0.31224843345631653 3 1.0 12 53 1|0
where I printed fields 1-7 plus the following INFO fields (in this order): MOSAICP, PHASEDP, PHASETP, RO, AO, followed by the genotype.
I would have thought that the first line would have been skipped based on line 300. When I look at the raw data (NA12878_WGS_210_phased_variants.vcf.gz) this variant is listed as follows:
chr1 2454204 . A G 1392.69 PASS NS=1;DP=63;DPB=63.0;AC=1;AN=2;AF=0.5;RO=12;AO=53;PRO=0.0;PAO=0.0;QR=531;QA=1926;PQR=0.0;PQA=0.0;SRF=8;SRR=5;SAF=28;SAR=22;SRP=4.51363;SAP=4.57376;AB=0.793651;ABP=50.1967;RUN=1;RPP=37.059;RPPR=4.51363;RPL=11.0;RPR=39.0;EPP=14.1282;EPPR=16.5402;DPRA=0.0;ODDS=41.5985;GTI=0;TYPE=snp;CIGAR=1X;NUMALT=1;MEANALT=1.0;LEN=1;MQM=59.1;MQMR=60.0;PAIRED=0.94;PAIREDR=1.0;technology.ILLUMINA=1.0;MUMAP_REF=58.75;MUMAP_ALT=59.6226;MMD=1.23391;RESCUED=5;NOT_RESCUED=60;AON=0;AOP=53;HAPLOCALLED=0;RON=0;ROP=12 GT:DP:RO:QR:AO:QA:GL:BX:PS:PQ:JQ 1|1:63:13:531:50:1926:-154.378,0.0,-29.1628:CTGTGTCTCTTAGCTT-1_75;AGTTCCCGTCTCTATT-1_65;ATCTACTCAAGCTGTT-1_75;ACGGGCTAGGAGGACG-1_75;TAGGCGCAGACCTAAA-1_75;ACCTAAGCAAGTTCTG-1_75;AACAAGATCTGTATGG-1_75;CACGTAAAGCTCTGCG-1_75;TTGGGTAGTAAAGGAG-1_75;CGCACTTCAGACGCCT-1_75;CAGTACATCTGCGGGT-1_75;TCGGTAACAACTGCGC-1_75,TTGTCCGGTCTCGTCT-1_60_75;ATAGCGTTCGCCTGAG-1_75;ACTGGGCGTTCGATAC-1_75;CCTCAGTCAAACGGGT-1_75;TACTAGGCATGCCTAA-1_60;TCTCTAAAGGGTACGT-1_44_70;ATGTAGCAGCAATCTC-1_70_75;CCTTACGAGTTTGGCT-1_75;TTCCCAGGTCGCATCG-1_70;AATCCAGTCTGCGGGT-1_60;CAGTACATCTGCGGGT-1_65;GAAGTTCGTACACCGC-1_75_75;TCCCATGGTTGGAACG-1_75_44;AGGGCAAGTGTCACAT-1_65;CTGTGTCTCTTAGCTT-1_75;GGAAGTGGTAAGGGAA-1_75;GGAGAACGTCTGTCCT-1_60;GGACTTATCCGTCATC-1_75;TGTGGTAAGGGATAAG-1_75;CAACGTATCCCATAAG-1_44;CTAACTTTCGCGCCAA-1_44;TAGCGTAGTTTAAGCC-1_70;GCTCTGTGTTCGGTGC-1_65;CGAGAAGCATGGTAAA-1_75;TGTACGACAGGCATAG-1_75;CCAAGATTCTAGCTAG-1_44;TGTCACCTCCTGCCAT-1_75;TCGCTACTCTACGGGC-1_75;TAAGAGAAGAGTTGGC-1_65_75;GATGGTTTCGGCCAGT-1_75;TTGGGTAGTAAAGGAG-1_75;GCCGACATCCAGGCGT-1_65;GAGACCCAGGCATTCT-1_75;CTTTCGGTCATTTGCT-1_75;CAGCTAACAAGAACCG-1_75;AATCCAGCAGGGTACA-1_75;CGCCGAACAAGACACG-1_75;GCGCTGAAGGCTTATC-1_44;AGCTCCTTCATGACCA-1_70;CGGCTAGAGTCTCTCC-1_44;CATGCCTAGAGGTCTG-1_70;TGTGGTACACGTTGGC-1_70:2078699:255:255
chr1 2454204 . A G 357.487 PASS NS=1;DP=30;DPB=30.0;AC=1;AN=1;AF=0.5;RO=12;AO=53;PRO=0.0;PAO=0.0;QR=373;QA=800;PQR=0.0;PQA=0.0;SRF=4;SRR=5;SAF=11;SAR=10;SRP=3.25157;SAP=3.1137;AB=0.0;ABP=0.0;RUN=1;RPP=26.2761;RPPR=3.25157;RPL=3.0;RPR=18.0;EPP=3.94093;EPPR=9.04217;DPRA=0.0;ODDS=82.3144;GTI=0;TYPE=snp;CIGAR=1X;NUMALT=1;MEANALT=1.0;LEN=1;MQM=59.0476;MQMR=60.0;PAIRED=0.904762;PAIREDR=1.0;technology.ILLUMINA=1.0;MUMAP_REF=58.75;MUMAP_ALT=59.6226;MMD=1.23391;RESCUED=5;NOT_RESCUED=60;AON=0;AOP=53;HAPLOCALLED=1;RON=0;ROP=12 GT:DP:RO:QR:AO:QA:GL:BX:PS:PQ:JQ 1|0:30:9:373:21:800:-38.2033,0.0:CTGTGTCTCTTAGCTT-1_75;AGTTCCCGTCTCTATT-1_65;ATCTACTCAAGCTGTT-1_75;ACGGGCTAGGAGGACG-1_75;TAGGCGCAGACCTAAA-1_75;ACCTAAGCAAGTTCTG-1_75;AACAAGATCTGTATGG-1_75;CACGTAAAGCTCTGCG-1_75;TTGGGTAGTAAAGGAG-1_75;CGCACTTCAGACGCCT-1_75;CAGTACATCTGCGGGT-1_75;TCGGTAACAACTGCGC-1_75,TTGTCCGGTCTCGTCT-1_60_75;ATAGCGTTCGCCTGAG-1_75;ACTGGGCGTTCGATAC-1_75;CCTCAGTCAAACGGGT-1_75;TACTAGGCATGCCTAA-1_60;TCTCTAAAGGGTACGT-1_44_70;ATGTAGCAGCAATCTC-1_70_75;CCTTACGAGTTTGGCT-1_75;TTCCCAGGTCGCATCG-1_70;AATCCAGTCTGCGGGT-1_60;CAGTACATCTGCGGGT-1_65;GAAGTTCGTACACCGC-1_75_75;TCCCATGGTTGGAACG-1_75_44;AGGGCAAGTGTCACAT-1_65;CTGTGTCTCTTAGCTT-1_75;GGAAGTGGTAAGGGAA-1_75;GGAGAACGTCTGTCCT-1_60;GGACTTATCCGTCATC-1_75;TGTGGTAAGGGATAAG-1_75;CAACGTATCCCATAAG-1_44;CTAACTTTCGCGCCAA-1_44;TAGCGTAGTTTAAGCC-1_70;GCTCTGTGTTCGGTGC-1_65;CGAGAAGCATGGTAAA-1_75;TGTACGACAGGCATAG-1_75;CCAAGATTCTAGCTAG-1_44;TGTCACCTCCTGCCAT-1_75;TCGCTACTCTACGGGC-1_75;TAAGAGAAGAGTTGGC-1_65_75;GATGGTTTCGGCCAGT-1_75;TTGGGTAGTAAAGGAG-1_75;GCCGACATCCAGGCGT-1_65;GAGACCCAGGCATTCT-1_75;CTTTCGGTCATTTGCT-1_75;CAGCTAACAAGAACCG-1_75;AATCCAGCAGGGTACA-1_75;CGCCGAACAAGACACG-1_75;GCGCTGAAGGCTTATC-1_44;AGCTCCTTCATGACCA-1_70;CGGCTAGAGTCTCTCC-1_44;CATGCCTAGAGGTCTG-1_70;TGTGGTACACGTTGGC-1_70:2078699:25:25
It looks as though the homozygote entry is HAPLOCALLED=0 while the heterozygote entry is HAPLOCALLED=1. Shouldn't the heterozygote get the MOSAIC INFO tag of 1 while the homozygote is skipped?
Thanks in advance for any insight.
Best,
KMS
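The per-record summary described in this issue (fixed columns, selected INFO fields, then the genotype) can be reproduced with a short script. This is a minimal sketch, not code from the repository: `parse_info`, `summarize`, and `is_homozygous` are hypothetical helper names, the record is an abbreviated version of the heterozygous line quoted above, and the homozygosity check is only one plausible reading of the filter the reporter expected.

```python
import re


def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def summarize(record_line, keys=("MOSAICP", "PHASEDP", "PHASETP", "RO", "AO")):
    """Return columns 1-7, the requested INFO values, and the first sample's GT."""
    cols = record_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    # FORMAT keys (column 9) describe the colon-separated sample values (column 10).
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return cols[:7] + [str(info.get(k, ".")) for k in keys] + [sample.get("GT", ".")]


def is_homozygous(gt):
    """True for diploid-or-higher genotypes whose called alleles are all identical."""
    alleles = re.split(r"[|/]", gt)
    return len(alleles) > 1 and len(set(alleles)) == 1 and "." not in alleles


# Abbreviated record (assumption: most INFO and FORMAT fields omitted for brevity).
record = "chr1\t2454204\t.\tA\tG\t357.487\tPASS\tRO=12;AO=53;HAPLOCALLED=1\tGT:DP\t1|0:30"
row = summarize(record, keys=("HAPLOCALLED", "RO", "AO"))
print("\t".join(row), "homozygous" if is_homozygous(row[-1]) else "heterozygous")
```

Running both quoted records through `is_homozygous` makes it easy to confirm which of the two entries a homozygote filter would be expected to drop.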
The current implementation has a very long runtime. Profiling shows that the program spends most of its time in phasing.py::get_haplotypes. The majority of that time is spent retrieving values from the sparse matrix using indices.
Possible solutions are to change the data representation (move to a different type of matrix) or use a more efficient method for iteration.
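The second option can be illustrated with a small, self-contained comparison. This is a sketch under stated assumptions: the issue does not show how `get_haplotypes` stores its matrix, so a plain dictionary keyed by `(row, col)` stands in for the sparse matrix, and both function names are hypothetical. The first function looks up every cell by index, including the empty ones; the second visits only the stored entries.

```python
def row_sums_by_index(matrix, n_rows, n_cols):
    """Slow pattern: one indexed lookup per cell, n_rows * n_cols lookups in total."""
    sums = [0] * n_rows
    for i in range(n_rows):
        for j in range(n_cols):
            sums[i] += matrix.get((i, j), 0)
    return sums


def row_sums_by_entries(matrix, n_rows):
    """Faster pattern: a single pass over the stored (non-zero) entries only."""
    sums = [0] * n_rows
    for (i, _j), value in matrix.items():
        sums[i] += value
    return sums


# Toy sparse matrix (hypothetical data): 3 stored values in a 4 x 1000 grid.
matrix = {(0, 5): 1, (0, 900): 1, (3, 42): -1}
slow = row_sums_by_index(matrix, n_rows=4, n_cols=1000)
fast = row_sums_by_entries(matrix, n_rows=4)
print(slow == fast, fast)
```

The indexed version performs 4,000 lookups here, while the entry-based version performs 3. If the matrix is a SciPy sparse matrix (an assumption), the analogous change would be to convert it once with `tocoo()` and loop over its `row`, `col`, and `data` arrays rather than indexing individual elements.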