mosaic-from-barcodes's People

Contributors

donfreed

Forkers

cdarby douym

mosaic-from-barcodes's Issues

Parsing homozygotes in main.py

Hi Don,

I am trying to run the test data with your code, and I have run into some issues. For example, when I search for MOSAIC==1 I get the following two lines:

chr1	2454204	.	A	G	1392.69	PASS	0.31224843345631653	3	1.0	12	53	1|1
chr1	2454204	.	A	G	357.487	PASS	0.31224843345631653	3	1.0	12	53	1|0

where I printed fields 1-7, then the following INFO fields in this order: MOSAICP, PHASEDP, PHASETP, RO, AO, and finally the genotype.

I would have thought that the first line would have been skipped based on line 300. When I look at the raw data (NA12878_WGS_210_phased_variants.vcf.gz) this variant is listed as follows:

chr1	2454204	.	A	G	1392.69	PASS	NS=1;DP=63;DPB=63.0;AC=1;AN=2;AF=0.5;RO=12;AO=53;PRO=0.0;PAO=0.0;QR=531;QA=1926;PQR=0.0;PQA=0.0;SRF=8;SRR=5;SAF=28;SAR=22;SRP=4.51363;SAP=4.57376;AB=0.793651;ABP=50.1967;RUN=1;RPP=37.059;RPPR=4.51363;RPL=11.0;RPR=39.0;EPP=14.1282;EPPR=16.5402;DPRA=0.0;ODDS=41.5985;GTI=0;TYPE=snp;CIGAR=1X;NUMALT=1;MEANALT=1.0;LEN=1;MQM=59.1;MQMR=60.0;PAIRED=0.94;PAIREDR=1.0;technology.ILLUMINA=1.0;MUMAP_REF=58.75;MUMAP_ALT=59.6226;MMD=1.23391;RESCUED=5;NOT_RESCUED=60;AON=0;AOP=53;HAPLOCALLED=0;RON=0;ROP=12	GT:DP:RO:QR:AO:QA:GL:BX:PS:PQ:JQ	1|1:63:13:531:50:1926:-154.378,0.0,-29.1628:CTGTGTCTCTTAGCTT-1_75;AGTTCCCGTCTCTATT-1_65;ATCTACTCAAGCTGTT-1_75;ACGGGCTAGGAGGACG-1_75;TAGGCGCAGACCTAAA-1_75;ACCTAAGCAAGTTCTG-1_75;AACAAGATCTGTATGG-1_75;CACGTAAAGCTCTGCG-1_75;TTGGGTAGTAAAGGAG-1_75;CGCACTTCAGACGCCT-1_75;CAGTACATCTGCGGGT-1_75;TCGGTAACAACTGCGC-1_75,TTGTCCGGTCTCGTCT-1_60_75;ATAGCGTTCGCCTGAG-1_75;ACTGGGCGTTCGATAC-1_75;CCTCAGTCAAACGGGT-1_75;TACTAGGCATGCCTAA-1_60;TCTCTAAAGGGTACGT-1_44_70;ATGTAGCAGCAATCTC-1_70_75;CCTTACGAGTTTGGCT-1_75;TTCCCAGGTCGCATCG-1_70;AATCCAGTCTGCGGGT-1_60;CAGTACATCTGCGGGT-1_65;GAAGTTCGTACACCGC-1_75_75;TCCCATGGTTGGAACG-1_75_44;AGGGCAAGTGTCACAT-1_65;CTGTGTCTCTTAGCTT-1_75;GGAAGTGGTAAGGGAA-1_75;GGAGAACGTCTGTCCT-1_60;GGACTTATCCGTCATC-1_75;TGTGGTAAGGGATAAG-1_75;CAACGTATCCCATAAG-1_44;CTAACTTTCGCGCCAA-1_44;TAGCGTAGTTTAAGCC-1_70;GCTCTGTGTTCGGTGC-1_65;CGAGAAGCATGGTAAA-1_75;TGTACGACAGGCATAG-1_75;CCAAGATTCTAGCTAG-1_44;TGTCACCTCCTGCCAT-1_75;TCGCTACTCTACGGGC-1_75;TAAGAGAAGAGTTGGC-1_65_75;GATGGTTTCGGCCAGT-1_75;TTGGGTAGTAAAGGAG-1_75;GCCGACATCCAGGCGT-1_65;GAGACCCAGGCATTCT-1_75;CTTTCGGTCATTTGCT-1_75;CAGCTAACAAGAACCG-1_75;AATCCAGCAGGGTACA-1_75;CGCCGAACAAGACACG-1_75;GCGCTGAAGGCTTATC-1_44;AGCTCCTTCATGACCA-1_70;CGGCTAGAGTCTCTCC-1_44;CATGCCTAGAGGTCTG-1_70;TGTGGTACACGTTGGC-1_70:2078699:255:255
chr1	2454204	.	A	G	357.487	PASS	NS=1;DP=30;DPB=30.0;AC=1;AN=1;AF=0.5;RO=12;AO=53;PRO=0.0;PAO=0.0;QR=373;QA=800;PQR=0.0;PQA=0.0;SRF=4;SRR=5;SAF=11;SAR=10;SRP=3.25157;SAP=3.1137;AB=0.0;ABP=0.0;RUN=1;RPP=26.2761;RPPR=3.25157;RPL=3.0;RPR=18.0;EPP=3.94093;EPPR=9.04217;DPRA=0.0;ODDS=82.3144;GTI=0;TYPE=snp;CIGAR=1X;NUMALT=1;MEANALT=1.0;LEN=1;MQM=59.0476;MQMR=60.0;PAIRED=0.904762;PAIREDR=1.0;technology.ILLUMINA=1.0;MUMAP_REF=58.75;MUMAP_ALT=59.6226;MMD=1.23391;RESCUED=5;NOT_RESCUED=60;AON=0;AOP=53;HAPLOCALLED=1;RON=0;ROP=12	GT:DP:RO:QR:AO:QA:GL:BX:PS:PQ:JQ	1|0:30:9:373:21:800:-38.2033,0.0:CTGTGTCTCTTAGCTT-1_75;AGTTCCCGTCTCTATT-1_65;ATCTACTCAAGCTGTT-1_75;ACGGGCTAGGAGGACG-1_75;TAGGCGCAGACCTAAA-1_75;ACCTAAGCAAGTTCTG-1_75;AACAAGATCTGTATGG-1_75;CACGTAAAGCTCTGCG-1_75;TTGGGTAGTAAAGGAG-1_75;CGCACTTCAGACGCCT-1_75;CAGTACATCTGCGGGT-1_75;TCGGTAACAACTGCGC-1_75,TTGTCCGGTCTCGTCT-1_60_75;ATAGCGTTCGCCTGAG-1_75;ACTGGGCGTTCGATAC-1_75;CCTCAGTCAAACGGGT-1_75;TACTAGGCATGCCTAA-1_60;TCTCTAAAGGGTACGT-1_44_70;ATGTAGCAGCAATCTC-1_70_75;CCTTACGAGTTTGGCT-1_75;TTCCCAGGTCGCATCG-1_70;AATCCAGTCTGCGGGT-1_60;CAGTACATCTGCGGGT-1_65;GAAGTTCGTACACCGC-1_75_75;TCCCATGGTTGGAACG-1_75_44;AGGGCAAGTGTCACAT-1_65;CTGTGTCTCTTAGCTT-1_75;GGAAGTGGTAAGGGAA-1_75;GGAGAACGTCTGTCCT-1_60;GGACTTATCCGTCATC-1_75;TGTGGTAAGGGATAAG-1_75;CAACGTATCCCATAAG-1_44;CTAACTTTCGCGCCAA-1_44;TAGCGTAGTTTAAGCC-1_70;GCTCTGTGTTCGGTGC-1_65;CGAGAAGCATGGTAAA-1_75;TGTACGACAGGCATAG-1_75;CCAAGATTCTAGCTAG-1_44;TGTCACCTCCTGCCAT-1_75;TCGCTACTCTACGGGC-1_75;TAAGAGAAGAGTTGGC-1_65_75;GATGGTTTCGGCCAGT-1_75;TTGGGTAGTAAAGGAG-1_75;GCCGACATCCAGGCGT-1_65;GAGACCCAGGCATTCT-1_75;CTTTCGGTCATTTGCT-1_75;CAGCTAACAAGAACCG-1_75;AATCCAGCAGGGTACA-1_75;CGCCGAACAAGACACG-1_75;GCGCTGAAGGCTTATC-1_44;AGCTCCTTCATGACCA-1_70;CGGCTAGAGTCTCTCC-1_44;CATGCCTAGAGGTCTG-1_70;TGTGGTACACGTTGGC-1_70:2078699:25:25

It looks as though the homozygote entry is HAPLOCALLED=0 while the heterozygote entry is HAPLOCALLED=1. Shouldn't the heterozygote get the MOSAIC INFO tag of 1 while the homozygote is skipped?
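To make the expected behaviour concrete, here is a minimal sketch of how a homozygous call could be detected and skipped from the GT field of each record. It is not the code in main.py; the helper names (`parse_info`, `genotype_alleles`, `is_homozygous`, `summarize`) are hypothetical, and the two records are abridged from the lines quoted above to just the fields the check needs.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=63;AC=1' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        # Flag-style entries have no '=' and are stored as True.
        info[key] = value if value else True
    return info


def genotype_alleles(format_field, sample_field):
    """Return the allele strings of the GT entry, e.g. '1|0' -> ['1', '0']."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    gt = values[keys.index("GT")]
    return gt.replace("|", "/").split("/")


def is_homozygous(alleles):
    """True when there are at least two alleles and all are identical."""
    return len(alleles) > 1 and len(set(alleles)) == 1


def summarize(line):
    """Extract position, genotype and HAPLOCALLED from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    alleles = genotype_alleles(cols[8], cols[9])
    return {
        "pos": (cols[0], int(cols[1])),
        "alleles": alleles,
        "homozygous": is_homozygous(alleles),
        "haplocalled": info.get("HAPLOCALLED"),
    }


# Abridged versions of the two records quoted in this issue.
records = [
    "chr1\t2454204\t.\tA\tG\t1392.69\tPASS\tDP=63;HAPLOCALLED=0\tGT:DP\t1|1:63",
    "chr1\t2454204\t.\tA\tG\t357.487\tPASS\tDP=30;HAPLOCALLED=1\tGT:DP\t1|0:30",
]

summaries = [summarize(r) for r in records]
# Keep only the records that a "skip homozygotes" rule would let through.
kept = [s for s in summaries if not s["homozygous"]]
```

With these two records, only the `1|0` entry would remain in `kept`. The check compares the alleles in GT directly rather than relying on HAPLOCALLED, since the issue only observes that the two happen to differ here.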

Thanks in advance for any insight.

Best,
KMS

Very long runtime

The current implementation has a very long runtime. Profiling shows that the program spends most of its time in phasing.py::get_haplotypes, and the majority of that time goes to retrieving values from the sparse matrix by index.

Possible solutions are to change the data representation (move to a different type of matrix) or use a more efficient method for iteration.
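As a rough illustration of the second option, the sketch below contrasts per-element index lookups with iterating directly over the stored entries of a row in a compressed sparse row (CSR) layout. The issue does not say which matrix type or library get_haplotypes uses, so this is a hand-rolled, pure-Python stand-in and every name in it (`to_csr`, `row_entries`, `lookup`) is hypothetical. If the project uses scipy.sparse (an assumption), its CSR matrices expose analogous `indptr`, `indices` and `data` arrays.

```python
def to_csr(dense):
    """Build CSR arrays (indptr, indices, data) from a list of rows."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)
                data.append(value)
        # indptr[i + 1] marks where row i's stored entries end.
        indptr.append(len(indices))
    return indptr, indices, data


def row_entries(csr, row):
    """Return an iterator of (column, value) over one row's stored entries."""
    indptr, indices, data = csr
    start, end = indptr[row], indptr[row + 1]
    return zip(indices[start:end], data[start:end])


def lookup(csr, row, col):
    """Single-element access: scans the row's stored entries for `col`."""
    for c, v in row_entries(csr, row):
        if c == col:
            return v
    return 0


dense = [
    [0, 1, 0, 0],
    [2, 0, 0, 3],
    [0, 0, 0, 0],
]
csr = to_csr(dense)
n_rows, n_cols = len(dense), len(dense[0])

# Pattern 1: one indexed lookup per (row, col) pair, zeros included.
slow = [
    (r, c, lookup(csr, r, c))
    for r in range(n_rows)
    for c in range(n_cols)
    if lookup(csr, r, c) != 0
]

# Pattern 2: visit only the stored entries of each row.
fast = [(r, c, v) for r in range(n_rows) for c, v in row_entries(csr, r)]
```

Both patterns produce the same (row, column, value) triples, but the second touches each stored entry once, so its cost grows with the number of non-zero values rather than with the full matrix size. Whether this applies to get_haplotypes depends on its actual access pattern.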
