genetronhealth / uvc

UVC, a very accurate small-variant caller (https://doi.org/10.1093/bib/bbab458)

License: BSD 3-Clause "New" or "Revised" License

Languages: C++ 96.68%, Makefile 0.48%, C 0.94%, Shell 1.57%, Python 0.34%

uvc's People

Contributors

genetronhealth, zhaoxiaofei

uvc's Issues

Parameters or preprocessing steps to improve uvc performance

Hello,

I used Mutect2, VarDict, and VarScan to call variants on matched tumour-normal BAM files containing reads that were UMI-collapsed with fgbio's CallDuplexConsensusReads and stitched with Illumina's Pisces/Gemini. I ran UVC on reads that had their UMIs extracted and placed in the read QNAME before being realigned to hg19 (but with no UMI collapsing or read stitching).

The variants called by Mutect2 (with FilterMutectCalls applied) agree well with the gold standard variant list for this matched tumour-normal pair. The variants predicted by VarScan with basic filters applied (VAF > 0.05, somatic_status=Somatic) also agree well with this gold standard list. VarDict labels many variants as passing, but it also captures many of the gold standard variants.

UVC captures 3 fewer variants than Mutect2/VarDict and 5 fewer than VarScan. I am wondering whether I should change any parameters when running UVC or perform additional preprocessing steps on my tumour-normal BAM files. Below is an outline of my current workflow:

Current preprocessing steps:

  1. picard FastqToSam
  2. fgbio ExtractUMIsFromBam (get reads in originalName#UMI format)
  3. picard SamToFastq
  4. bwa (for alignment)

Current uvc script:
uvc='/software/uvc/0.14.2.15f4adc/bin/uvcTN.sh'
export UVC_BIN_EXE_FULL_NAME=/software/uvc/0.14.2.15f4adc/bin/uvc-1

$uvc ${REF_GENOME} ${TUMOUR_BAM} ${NORMAL_BAM} \
${OUTPUT_PATH}/colo829_uvc.vcf "COLO829_S8,COLO829_BL_S7"

Apologies for the long description, thank you for your advice.
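To quantify a gap like the one described above, each call set can be compared with the gold standard by exact (CHROM, POS, REF, ALT) keys. The following is a minimal Python sketch under that assumption, and the helper names are hypothetical. Because matching is exact, indels represented differently by different callers (left-aligned or not, split or merged multi-allelic records) count as misses. Normalizing every VCF the same way first (for example with bcftools norm) makes small differences in counts easier to trust.

```python
from typing import Iterable, List, Set, Tuple

# (CHROM, POS, REF, ALT): an exact-match key, so indels must be normalized
# identically in every VCF for the comparison to be meaningful.
VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str], pass_only: bool = False) -> Set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines.

    Header and blank lines are skipped; multi-allelic ALT fields are split so
    each allele gets its own key. With pass_only=True, records whose FILTER
    is not PASS are ignored.
    """
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts, _qual, filt = line.rstrip("\n").split("\t")[:7]
        if pass_only and filt != "PASS":
            continue
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_to_gold(
    calls: Set[VariantKey], gold: Set[VariantKey]
) -> Tuple[float, List[VariantKey], List[VariantKey]]:
    """Return (recall, gold variants missed by the caller, calls absent from the gold set)."""
    recall = len(calls & gold) / len(gold) if gold else 0.0
    return recall, sorted(gold - calls), sorted(calls - gold)


# Toy example: two gold standard variants, only one of which was called.
gold = variant_keys(["chr1\t100\t.\tA\tG\t.\tPASS\t.", "chr1\t200\t.\tC\tT\t.\tPASS\t."])
uvc_calls = variant_keys(["chr1\t100\t.\tA\tG\t50\tPASS\t."])
recall, missed, extra = compare_to_gold(uvc_calls, gold)
print(recall, missed)  # 0.5 [('chr1', 200, 'C', 'T')]
```

Listing the missed keys, rather than only counting them, shows whether the "missing" variants are genuinely absent from the UVC output or merely represented differently.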

How to interpret the differing values in SRR7757440_SRR7757439_TN.vcf.gz and SRR7757440_uvc1.vcf.gz

Hi Zhao,

I have some questions about the result.

How should I interpret the different values between SRR7757440_SRR7757439_TN.vcf.gz [a] and SRR7757440_uvc1.vcf.gz [b], such as 'QUAL' and 'FILTER', and especially 'SomaticQ'?
[a] 3 16306504 . C T 60 PASS SOMATIC;SomaticQ=60;TLODQ=78;NLODQ=60;NLODV=A;TNBQF=251,26,0,19;TNCQF=177,48,0,62;tDP=50250;tADR=50127,78;nDP=58463;nADR=58416,6;
[b] 3 16306504 . C T 49 Q50 ANY_VAR;SomaticQ=49;TLODQ=49;NLODQ=57;NLODV=;TNBQF=0,-75,0,4;TNCQF=0,-75,0,49;tDP=50250;tADR=50127,78;nDP=0;nADR=0,0;

(##INFO=<ID=SomaticQ,Number=A,Type=Float,Description="Somatic quality of the variant, the PHRED-scale probability that this variant is not somatic.">)
The SomaticQ value of [a] suggests that this variant is not somatic, with a high PHRED-scaled probability (60, i.e. 1 - 10^-6), but the allele depths ([a] tADR=50127,78) show that this variant may be a low-frequency mutation.
Am I misunderstanding these values?

thanks
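Read literally, the SomaticQ header line above defines the value as the PHRED-scaled probability that the variant is not somatic. The standard PHRED conversion is P = 10^(-Q/10). Under that reading, SomaticQ=60 corresponds to P(not somatic) = 10^-6, which means high confidence that the call is somatic rather than a high probability that it is not. The sketch below applies the conversion and computes alt-allele fractions from tADR and nADR. For record [a], the tumor fraction is 78/50205, about 0.16%, and the normal fraction is 6/58422, about 0.01%. This is a generic helper rather than UVC code, and the function names are hypothetical.

```python
from typing import Dict, Union


def phred_to_prob(q: float) -> float:
    """Probability encoded by a PHRED-scaled value Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse a VCF INFO string into a dict; flags such as SOMATIC map to True."""
    out: Dict[str, Union[str, bool]] = {}
    for item in info.strip(";").split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def alt_fraction(depths: str) -> float:
    """Alt-allele fraction from a 'ref,alt' depth pair such as tADR=50127,78."""
    ref, alt = (int(x) for x in depths.split(",")[:2])
    total = ref + alt
    return alt / total if total else 0.0


# INFO of record [a], abbreviated to the fields used here.
info_a = parse_info("SOMATIC;SomaticQ=60;tDP=50250;tADR=50127,78;nDP=58463;nADR=58416,6;")
# Probability that the call is NOT somatic, under the header's definition.
print(phred_to_prob(float(info_a["SomaticQ"])))
# Tumor and normal alt-allele fractions.
print(alt_fraction(info_a["tADR"]), alt_fraction(info_a["nADR"]))
```

The same conversion applied to record [b] (SomaticQ=49) gives 10^-4.9, roughly 1.3e-5.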

UVC results

Hi!

I have a tumor-only sample, and UVC reported the following variants, all at the same position. Is UVC unable to determine which of these representations is correct? I also want to detect indels. Are there any filters or metrics I should check to ensure that these variants are not artifacts?

chr1	2494541	.	A	AG	11.000000	Q20	ANY_VAR;SomaticQ=11;TLODQ=11;NLODQ=119;NLODV=<NONE>;TNBQF=0,-70,0,11;TNCQF=0,0,0,0;tbDP=1812;tDP=1283;tAD=1276,3;t2DP=0;t2AD=0,0;RU=G;RC=6;R3X2=2494535,1,1,2494547,2,1	GT:GQ:HQ:FT:FTS:_A_:DP:AD:bDP:bAD:c2DP:c2AD:_Aa:APDP:APXM:_Ab:APLRID:APLRI:APLRP:_Ac:ALRPxT:ALRIT:ALRIt:ALRPt:ALRBt:_AQ:aMQs:AMQs:a1BQf:A1BQf:a1BQr:A1BQr:_ADPf:aDPff:ADPff:aDPfr:ADPfr:_ADPr:aDPrf:ADPrf:aDPrr:ADPrr:_ALP:aLP1:ALP1:aLP2:ALP2:aLPL:ALPL:_ARP:aRP1:ARP1:aRP2:ARP2:aRPL:ARPL:_ALB:aLB1:aLB2:ALB2:aLBL:ALBL:_ARB:aRB1:aRB2:ARB2:aRBL:ARBL:_ALI:aLI1:aLI2:ALI2:aLIr:ALIr:_ARI:aRI1:aRI2:ARI2:aRIf:ARIf:_AX:aBQ2:ABQ2:aPF2:APF2:aP1:AP1:aP2:AP2:_Ax:aPF1:aLIT:aRIT:aP3:aNC:_BDP:bDPf:BDPf:bDPr:BDPr:_BT:bTAf:BTAf:bTAr:BTAr:bTBf:BTBf:bTBr:BTBr:_CDP1:cDP1f:CDP1f:cDP1r:CDP1r:cDP12f:CDP12f:cDP12r:CDP12r:_CDP2:cDP2f:CDP2f:cDP2r:CDP2r:_DDP:DDP1:dDP1:DDP2:dDP2:_ea:aBQ:a2BQf:a2BQr:a2XM2:a2BM2:aBQQ:_eb:bMQ:aAaMQ:bNMQ:bNMa:bNMb:bMQQ:_eB:bIAQb:bIADb:bIDQb:_eC:cIAQf:cIADf:cIDQf:cIAQr:cIADr:cIDQr:_eE:bIAQ:cIAQ:bTINQ:cTINQ:_eQ1:cPCQ1:cPLQ1:cVQ1:gVQ1:_eQ2:cPCQ2:cPLQ2:cVQ2:cMmQ:dVQinc:_CDP1vx:cDP1v:CDP1v:cDP1w:CDP1w:cDP1x:CDP1x:_CDP2vx:cDP2v:CDP2v:cDP2w:CDP2w:cDP2x:CDP2x:_f1:CONTQ:nPF:nNFA:nAFA:nBCFA:_g1:VTI:VTD:cVQ1M:cVQ2M:cVQAM:cVQSM:_g2:gapNf:gapNr:gapSeq:gapbAD1:gapcAD1:gcAD2:gcAD3:_g3:bDPa:cDP0a:gapSa:_h1:bHap:cHap:c2Hap:_i1:vHGQ:vAC:vNLODQ:note	
./1:0:0,0:.:cbDup-81:_A_:1283:1276,3:1812:1805,3:0:0,0:_Aa:2312,7,5,7,5,0,59,4,2110,0,0,0:24294,120,337758,12,7,5,700,500:_Ab:28,7,5,5:119719,814,285713,1486:102372,207765,7,5:_Ac:7,6:474,370,564,438:73,98,96,128:38,43,88,93:150,175,369,394:_AQ:136362,300:136962:50798,120:51062:28344,73:28453:_ADPf:770,2:773,0:444,1:445,0:_ADPr:692,1:696,0:368,1:370,0:_ALP:1154,3:1158:1010,3:1014:110978,203:111307:_ARP:1238,4:1246:1162,4:1170:188947,483:190066:_ALB:2009,5:1953,3:1961:443459,829:444794:_ARB:2009,5:1985,5:1995:798967,2025:803779:_ALI:519,1:457,0:457:812,2:815:_ARI:1333,3:1148,1:1153:1462,3:1469:_AX:2274,5:2284:227400,500:228400:2009,5:2019:2274,5:2284:_Ax:227400,500:119238,102:278777,449:2111,5:2236,5:_BDP:939,2:942,0:866,1:870,0:_BT:220737,475:221410:210704,117:211453:5069,46:5138:6490,19:6601:_CDP1:672,2:675,0:604,1:608,0:672,2:675,0:604,1:608,0:_CDP2:0,0:0,0:0,0:0,0:_DDP:0,0:0,0:0,0:0,0:_ea:34,38:50624,120:28245,73:0,0:0,0:49113,133:_eb:61,61:1,-1:0,3:3,10:4,3:59,64:_eB:57969,36:1763,3:33,40:_eC:8073,0:190,0:48,0:7500,0:176,0:48,0:_eE:34870,24:-30,-24:199588,415:37,37:_eQ1:68,20:95,11:92,11:59,11:_eQ2:68,73:85,88:0,0:29,13:0,0:_CDP1vx:112850,247:113260,0:112850,247:113260:127506,248:128063:_CDP2vx:0,0:0,0:0,0:0:1,1:3:_f1:51,0:5,5:900,900,262,271,262,160:251,262,249,259,260,252,246,256,255:262,271,254,160,0,0,0,0,0,0:_g1:6,12:<LR>,<LI1>:11,0:0,0:<LI1>,<LD1>:G,G:_g2:1:1:G,G:2,1:2,1:0,0:0,0:_g3:1805,3:1276,3:,G:_h1:.:.:.:_i1:119:0,0:0,82:.

chr1	2494541	.	AG	A	6.183890	Q10	ANY_VAR;SomaticQ=-20;TLODQ=-20;NLODQ=119;NLODV=<NONE>;TNBQF=0,-68,0,0;TNCQF=0,0,0,0;tbDP=1812;tDP=1283;tAD=1276,4;t2DP=0;t2AD=0,0;RU=G;RC=6;R3X2=2494535,1,1,2494547,2,1	GT:GQ:HQ:FT:FTS:_A_:DP:AD:bDP:bAD:c2DP:c2AD:_Aa:APDP:APXM:_Ab:APLRID:APLRI:APLRP:_Ac:ALRPxT:ALRIT:ALRIt:ALRPt:ALRBt:_AQ:aMQs:AMQs:a1BQf:A1BQf:a1BQr:A1BQr:_ADPf:aDPff:ADPff:aDPfr:ADPfr:_ADPr:aDPrf:ADPrf:aDPrr:ADPrr:_ALP:aLP1:ALP1:aLP2:ALP2:aLPL:ALPL:_ARP:aRP1:ARP1:aRP2:ARP2:aRPL:ARPL:_ALB:aLB1:aLB2:ALB2:aLBL:ALBL:_ARB:aRB1:aRB2:ARB2:aRBL:ARBL:_ALI:aLI1:aLI2:ALI2:aLIr:ALIr:_ARI:aRI1:aRI2:ARI2:aRIf:ARIf:_AX:aBQ2:ABQ2:aPF2:APF2:aP1:AP1:aP2:AP2:_Ax:aPF1:aLIT:aRIT:aP3:aNC:_BDP:bDPf:BDPf:bDPr:BDPr:_BT:bTAf:BTAf:bTAr:BTAr:bTBf:BTBf:bTBr:BTBr:_CDP1:cDP1f:CDP1f:cDP1r:CDP1r:cDP12f:CDP12f:cDP12r:CDP12r:_CDP2:cDP2f:CDP2f:cDP2r:CDP2r:_DDP:DDP1:dDP1:DDP2:dDP2:_ea:aBQ:a2BQf:a2BQr:a2XM2:a2BM2:aBQQ:_eb:bMQ:aAaMQ:bNMQ:bNMa:bNMb:bMQQ:_eB:bIAQb:bIADb:bIDQb:_eC:cIAQf:cIADf:cIDQf:cIAQr:cIADr:cIDQr:_eE:bIAQ:cIAQ:bTINQ:cTINQ:_eQ1:cPCQ1:cPLQ1:cVQ1:gVQ1:_eQ2:cPCQ2:cPLQ2:cVQ2:cMmQ:dVQinc:_CDP1vx:cDP1v:CDP1v:cDP1w:CDP1w:cDP1x:CDP1x:_CDP2vx:cDP2v:CDP2v:cDP2w:CDP2w:cDP2x:CDP2x:_f1:CONTQ:nPF:nNFA:nAFA:nBCFA:_g1:VTI:VTD:cVQ1M:cVQ2M:cVQAM:cVQSM:_g2:gapNf:gapNr:gapSeq:gapbAD1:gapcAD1:gcAD2:gcAD3:_g3:bDPa:cDP0a:gapSa:_h1:bHap:cHap:c2Hap:_i1:vHGQ:vAC:vNLODQ:note	
./1:0:0,0:.:aPositionL-53|cbDup-79:_A_:1283:1276,4:1812:1805,4:0:0,0:_Aa:2312,7,5,7,5,0,59,4,2110,0,0,0:24294,120,337758,12,7,5,700,500:_Ab:28,7,5,5:119719,814,285713,1486:102372,207765,7,5:_Ac:7,6:474,370,564,438:73,98,96,128:38,43,88,93:150,175,369,394:_AQ:136362,300:136962:50798,144:51062:28344,36:28453:_ADPf:770,1:773,0:444,0:445,0:_ADPr:692,3:696,0:368,1:370,0:_ALP:1154,1:1158:1010,1:1014:110978,126:111307:_ARP:1238,4:1246:1162,4:1170:188947,636:190066:_ALB:2009,5:1953,5:1961:443459,506:444794:_ARB:2009,5:1985,5:1995:798967,2787:803779:_ALI:519,0:457,0:457:812,1:815:_ARI:1333,4:1148,4:1153:1462,4:1469:_AX:2274,5:2284:227400,500:228400:2009,5:2019:2274,5:2284:_Ax:227400,500:119238,66:278777,721:2111,5:2236,5:_BDP:939,1:942,0:866,3:870,0:_BT:220737,198:221410:210704,632:211453:5069,23:5138:6490,92:6601:_CDP1:672,1:675,0:604,3:608,0:672,1:675,0:604,3:608,0:_CDP2:0,0:0,0:0,0:0,0:_DDP:0,0:0,0:0,0:0,0:_ea:34,35:50624,143:28245,35:0,0:0,0:49113,118:_eb:61,61:1,-1:0,5:3,13:4,3:59,62:_eB:57969,37:1763,4:33,36:_eC:8073,0:190,0:48,0:7500,0:176,0:48,0:_eE:34870,10:-30,-20:199588,546:37,37:_eQ1:68,11:95,2:92,0:59,2:_eQ2:68,71:85,92:0,0:29,13:0,0:_CDP1vx:112850,163:113260,0:112850,163:113260:127506,309:128063:_CDP2vx:0,0:0,0:0,0:0:1,1:3:_f1:51,0:5,5:900,900,262,261,250,160:251,262,241,259,260,289,246,250,243:250,261,242,160,0,0,0,0,0,0:_g1:6,9:<LR>,<LD1>:11,0:0,0:<LI1>,<LD1>:G,G:_g2:1:1:G,G:1,3:1,3:0,0:0,0:_g3:1805,4:1276,4:,G:_h1:.:.:.:_i1:119:0,0:0,82:.


chr1	2494541	.	A	C	6.183890	Q10	ANY_VAR;SomaticQ=-20;TLODQ=-20;NLODQ=112;NLODV=<NONE>;TNBQF=0,-62,0,0;TNCQF=0,0,0,0;tbDP=1820;tDP=1286;tAD=1276,9;t2DP=0;t2AD=0,0;RU=G;RC=6;R3X2=2494534,1,1,2494541,6,1	GT:GQ:HQ:FT:FTS:_A_:DP:AD:bDP:bAD:c2DP:c2AD:_Aa:APDP:APXM:_Ab:APLRID:APLRI:APLRP:_Ac:ALRPxT:ALRIT:ALRIt:ALRPt:ALRBt:_AQ:aMQs:AMQs:a1BQf:A1BQf:a1BQr:A1BQr:_ADPf:aDPff:ADPff:aDPfr:ADPfr:_ADPr:aDPrf:ADPrf:aDPrr:ADPrr:_ALP:aLP1:ALP1:aLP2:ALP2:aLPL:ALPL:_ARP:aRP1:ARP1:aRP2:ARP2:aRPL:ARPL:_ALB:aLB1:aLB2:ALB2:aLBL:ALBL:_ARB:aRB1:aRB2:ARB2:aRBL:ARBL:_ALI:aLI1:aLI2:ALI2:aLIr:ALIr:_ARI:aRI1:aRI2:ARI2:aRIf:ARIf:_AX:aBQ2:ABQ2:aPF2:APF2:aP1:AP1:aP2:AP2:_Ax:aPF1:aLIT:aRIT:aP3:aNC:_BDP:bDPf:BDPf:bDPr:BDPr:_BT:bTAf:BTAf:bTAr:BTAr:bTBf:BTBf:bTBr:BTBr:_CDP1:cDP1f:CDP1f:cDP1r:CDP1r:cDP12f:CDP12f:cDP12r:CDP12r:_CDP2:cDP2f:CDP2f:cDP2r:CDP2r:_DDP:DDP1:dDP1:DDP2:dDP2:_ea:aBQ:a2BQf:a2BQr:a2XM2:a2BM2:aBQQ:_eb:bMQ:aAaMQ:bNMQ:bNMa:bNMb:bMQQ:_eB:bIAQb:bIADb:bIDQb:_eC:cIAQf:cIADf:cIDQf:cIAQr:cIADr:cIDQr:_eE:bIAQ:cIAQ:bTINQ:cTINQ:_eQ1:cPCQ1:cPLQ1:cVQ1:gVQ1:_eQ2:cPCQ2:cPLQ2:cVQ2:cMmQ:dVQinc:_CDP1vx:cDP1v:CDP1v:cDP1w:CDP1w:cDP1x:CDP1x:_CDP2vx:cDP2v:CDP2v:cDP2w:CDP2w:cDP2x:CDP2x:_f1:CONTQ:nPF:nNFA:nAFA:nBCFA:_g1:VTI:VTD:cVQ1M:cVQ2M:cVQAM:cVQSM:_g2:gapNf:gapNr:gapSeq:gapbAD1:gapcAD1:gcAD2:gcAD3:_g3:bDPa:cDP0a:gapSa:_h1:bHap:cHap:c2Hap:_i1:vHGQ:vAC:vNLODQ:note	
./1:0:0,0:.:aBQXM-23|aAlignR-58|bcDup-66:_A_:1286:1276,9:1820:1799,19:0:0,0:_Aa:2292,7,0,7,5,0,61,4,2014,0,0,0:24306,120,334770,12,7,0,700,0:_Ab:7,28,0,0:120620,814,281999,1466:99237,196979,6,5:_Ac:6,7:476,372,564,438:74,99,96,128:39,44,87,92:155,180,364,389:_AQ:134982,2280:137442:46755,340:47130:27754,23:27786:_ADPf:756,17:773,0:445,1:447,0:_ADPr:683,13:698,0:367,7:374,0:_ALP:913,6:919:793,6:799:91887,511:92398:_ARP:1168,0:1168:1109,0:1109:172273,369:172642:_ALB:1745,6:1696,6:1702:370291,2099:372390:_ARB:1745,6:1723,6:1729:710393,1440:711833:_ALI:516,4:429,4:433:812,8:821:_ARI:1316,23:1130,19:1151:1439,30:1471:_AX:2008,6:2014:201937,552:202566:1958,32:1993:2251,38:2292:_Ax:207075,748:120105,500:276296,5338:2038,34:2218,34:_BDP:933,11:944,0:866,8:876,0:_BT:218767,3168:221935:210528,2160:213253:5046,115:5161:6220,370:6670:_CDP1:672,4:676,0:604,5:610,0:624,1:625,0:562,0:562,0:_CDP2:0,0:0,0:0,0:0,0:_DDP:0,0:0,0:0,0:0,0:_ea:33,11:47921,386:27963,27:204748,1648:218840,2733:41234,11:_eb:61,61:1,-1:0,2:3,9:5,3:59,74:_eB:48131,0:1419,0:35,0:_eC:7187,0:171,0:48,0:7216,0:170,0:48,0:_eE:42175,0:-42,-42:199494,1185:9,9:_eQ1:68,44:91,19:91,0:59,0:_eQ2:68,83:83,83:0,0:29,25:0,0:_CDP1vx:126272,505:126780,0:127081,1251:128335:127082,506:127689:_CDP2vx:0,0:0,0:0,0:0:1,1:3:_f1:51,0:40,40:900,900,177,197,215,160:168,241,169,177,201,177,177,177,169:215,197,205,160,0,0,0,0,0,0:_g1:0,1:A,C:0,0:0,0:T,C:,:_g2:.:.:.:.:.:.:.:_g3:1799,19:1276,9:,:_h1:.:.:.:_i1:112:0,0:84,0:.
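To compare the three records reported at chr1:2494541 side by side, they can be grouped by position, with QUAL, FILTER, and tumor allele depths (tAD) tabulated for each. The following is a hypothetical triage sketch in Python. It only reorganizes fields already present in the records, using abbreviated copies of their first eight columns. It does not decide which representation is correct or whether a call is an artifact.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def info_fields(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value (flags get an empty string)."""
    pairs = (item.partition("=") for item in info.split(";") if item)
    return {key: value for key, _, value in pairs}


def summarize_sites(vcf_lines: List[str]) -> Dict[Tuple[str, int], List[dict]]:
    """Group records by (CHROM, POS), keeping fields that help compare them.

    Within each site, records are ordered by QUAL, highest first (ties keep
    their input order).
    """
    sites: Dict[Tuple[str, int], List[dict]] = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
        fields = info_fields(info)
        ref_depth, alt_depth = (int(x) for x in fields.get("tAD", "0,0").split(",")[:2])
        depth = ref_depth + alt_depth
        sites[(chrom, int(pos))].append({
            "allele": f"{ref}>{alt}",
            "qual": float(qual),
            "filter": filt,
            "alt_depth": alt_depth,
            "vaf": alt_depth / depth if depth else 0.0,
            "repeat": (fields.get("RU"), fields.get("RC")),
        })
    for recs in sites.values():
        recs.sort(key=lambda r: r["qual"], reverse=True)
    return dict(sites)


# First eight columns of the three records above, with INFO abbreviated.
records = [
    "chr1\t2494541\t.\tA\tAG\t11.000000\tQ20\tANY_VAR;SomaticQ=11;tDP=1283;tAD=1276,3;RU=G;RC=6",
    "chr1\t2494541\t.\tAG\tA\t6.183890\tQ10\tANY_VAR;SomaticQ=-20;tDP=1283;tAD=1276,4;RU=G;RC=6",
    "chr1\t2494541\t.\tA\tC\t6.183890\tQ10\tANY_VAR;SomaticQ=-20;tDP=1286;tAD=1276,9;RU=G;RC=6",
]
for rec in summarize_sites(records)[("chr1", 2494541)]:
    print(rec["allele"], rec["qual"], rec["filter"], rec["alt_depth"], f"{rec['vaf']:.4f}", rec["repeat"])
```

The repeat column carries the RU and RC values through unchanged. Their names suggest a repeat unit and a repeat count, but that reading is an assumption rather than something documented here.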

Preprocessing BAMs to get originalName#UMI format

Hello,

To get my reads into the originalName#UMI format in a BAM file, I am running:

  1. picard FastqToSam
  2. fgbio ExtractUMIsFromBam (get reads in originalName#UMI format)
  3. picard SamToFastq
  4. bwa (for alignment)

However, when I run fgbio ExtractUMIsFromBam with --annotate-read-names set to true, the UMI is appended to the QNAME with a + instead of a #. Although I can reformat the BAMs, are there alternative preprocessing steps or tools that should be used before running UVC?
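If reformatting is acceptable, the separator can be rewritten in a streaming pass over SAM text. The following is a minimal sketch. It assumes UVC only needs the QNAME to end in #UMI, as described above, and that the original read name contains no + of its own after which the UMI could be misidentified. The script name fix_qname.py and its "-" argument are hypothetical.

```python
import sys


def fix_qname(sam_line: str, old_sep: str = "+", new_sep: str = "#") -> str:
    """Rewrite a QNAME of the form 'name+UMI' to 'name#UMI' in one SAM text line.

    Header lines (starting with '@') are returned unchanged. Only the last
    old_sep in the QNAME is replaced; records without it are left as-is.
    """
    if sam_line.startswith("@"):
        return sam_line
    qname, tab, rest = sam_line.partition("\t")
    name, sep, umi = qname.rpartition(old_sep)
    if not sep:
        return sam_line
    return f"{name}{new_sep}{umi}{tab}{rest}"


def main() -> None:
    """Stream SAM text from stdin to stdout, fixing each QNAME."""
    for line in sys.stdin:
        sys.stdout.write(fix_qname(line))


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Only read stdin when explicitly asked with a '-' argument.
    main()
```

For example (file names hypothetical): samtools view -h input.bam | python3 fix_qname.py - | samtools view -b -o output.bam -
Running it on the unaligned BAM produced by ExtractUMIsFromBam, before SamToFastq, should let the corrected names carry through alignment, although this has not been verified against UVC itself.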

Tumor only variant calling

Hi!

Do you have any recent benchmarks for tumor-only variant calling? I was about to try Mutect and PureCN when I saw your implementation.

Thanks,
Konstantinos