novoCaller is a Bayesian de novo variant calling algorithm that uses read-level information both from the pedigree and from unrelated samples. The method was extensively tested on large trio sequencing studies, where it consistently achieved over 98% sensitivity while providing significantly higher specificity than other well-known methods at the same sensitivity.
The method works with a VCF file alone (first layer), but it can also inspect the reads in the BAM files and compute probabilities from strand-specific (forward and reverse) reads (second layer).
The software was written with VCFv4.1 in mind. For each variant call location in the VCF file, it looks at the FORMAT field, finds the AD element, and extracts the corresponding value from every sample.
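To illustrate the AD lookup described above, here is a minimal Python sketch. It is not the code in novoCaller.cpp; the function name `extract_ad` and the example line are hypothetical, and it assumes a standard tab-separated VCF data line with FORMAT in the ninth column and samples from the tenth column onward.

```python
def extract_ad(vcf_line):
    """Return one AD tuple per sample from a single VCF data line.

    Returns None if the FORMAT column has no AD key; a sample whose
    AD value is missing is reported as None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")       # e.g. ["GT", "AD", "DP"]
    if "AD" not in format_keys:
        return None
    ad_index = format_keys.index("AD")

    depths = []
    for sample in fields[9:]:                # sample columns start at column 10
        parts = sample.split(":")
        if ad_index >= len(parts) or parts[ad_index] in ("", "."):
            depths.append(None)
            continue
        depths.append(tuple(
            int(x) if x != "." else None
            for x in parts[ad_index].split(",")
        ))
    return depths


# Made-up example line: three samples with GT:AD:DP fields.
line = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,9:21\t0/0:20,0:20\t0/0:18,0:18"
print(extract_ad(line))
```

A VCF whose FORMAT column lacks AD gives `None` here, which may be worth checking before starting a long first-layer run.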
g++ novoCaller.cpp -o novoCaller
./novoCaller \
-I <path to vcf file> \
-O <path to output file for layer 1> \
-T <path to file containing sample IDs of the trios, the IDs are in the order:parent1(TAB)parent2(TAB)proband> \
-X <put 1 if you want to run on X chromosome as well, 0 otherwise> \
-P <threshold on posterior probability. Calls are made if the PP is above threshold. Use a low value like 0.005 so that a large number of calls are made for the second layer> \
-E <threshold on the ExAC allele frequency, e.g. 0.0001>
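The file passed to -T can be produced with a few lines of Python. This is a sketch under assumptions: the helper `write_trio_ids` and the sample IDs are hypothetical, and it assumes one trio per line, which the option description above implies but does not state outright. The IDs must match the sample names in the VCF header.

```python
import os
import tempfile


def write_trio_ids(path, trios):
    """Write one trio per line as parent1<TAB>parent2<TAB>proband."""
    with open(path, "w") as handle:
        for parent1, parent2, proband in trios:
            handle.write("\t".join((parent1, parent2, proband)) + "\n")


# Hypothetical sample IDs, written to a temporary directory for the demo.
trios = [("FAM1_father", "FAM1_mother", "FAM1_child")]
out_path = os.path.join(tempfile.mkdtemp(), "trio_ids.txt")
write_trio_ids(out_path, trios)

with open(out_path) as handle:
    print(repr(handle.read()))
```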
python -W ignore novoCallerBAM.py \
-I <path to the output file from previous step (the file given in -O option)> \
-U <path to a file containing paths to the bam files from unrelated samples> \
-T <path to a file containing paths to the bam files of the trio> \
-O <path to the output file for the second layer>
The -W ignore option suppresses the log-of-zero warnings.
./novoCaller -I ./all_calls.vep.vcf -O step1_out.txt -T trio_ids.txt -X 1 -P 0.005 -E 0.008
python -W ignore novoCallerBAM.py -I step1_out.txt -U de_novo_unrelated_bams.txt -T de_novo_case_bams.txt -O denovo_calls.txt
The columns of denovo_calls.txt are: rank, chromosome, position, reference allele, alternative allele, AF (in samples), rhos, priors, PP (posterior probability), AF_unrelated, gene_name(s).
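The second-layer output can be filtered by posterior probability with a short script. The sketch below rests on assumptions the README does not confirm: that the file is tab-separated and that each of the eleven listed columns occupies exactly one field. The column keys, the function `read_calls`, and the two example rows are made up for illustration.

```python
import csv
import io

# Hypothetical keys for the eleven columns, in the order listed above.
COLUMNS = ["rank", "chrom", "pos", "ref", "alt", "af_samples",
           "rhos", "priors", "pp", "af_unrelated", "genes"]


def read_calls(handle, min_pp=0.0):
    """Return rows whose posterior probability is at least min_pp."""
    calls = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue                      # skip lines with an unexpected layout
        record = dict(zip(COLUMNS, row))
        try:
            record["pp"] = float(record["pp"])
        except ValueError:
            continue                      # skip a header or a non-numeric PP
        if record["pp"] >= min_pp:
            calls.append(record)
    return calls


# Made-up rows standing in for denovo_calls.txt.
example = io.StringIO(
    "1\t1\t12345\tA\tG\t0.01\t0.1\t0.5\t0.97\t0.0\tGENE1\n"
    "2\t2\t67890\tC\tT\t0.02\t0.1\t0.5\t0.40\t0.0\tGENE2\n"
)
kept = read_calls(example, min_pp=0.9)
print([(c["chrom"], c["pos"], c["pp"]) for c in kept])
```

If the real file uses a different delimiter or spreads rhos or priors over several fields, the length check skips every row, so an empty result suggests the layout assumption is wrong.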
Find our paper here
dbmi-bgmnovocaller's Issues
Providing example files
Hi developers of novoCaller,
I tried running the first layer of novoCaller with the following command, but the program kept running for over 24 hours without generating any output. I am new to bioinformatics, so please correct me if I made any mistakes.
Command:
novoCaller -I input.vcf -O step_1_out.txt -T sample_id.txt -X 1 -P 0.005 -E 0.008
vcf:
example.vcf.gz
sample ID file:
sample_id.txt
It would be very helpful if you could provide example files for the program.
Thanks a lot!
Marcus
Pysam error when running novoCallerBAM.py
Hi,
When trying to run the python script using the bam files and the output of step one, I get this error:
Traceback (most recent call last):
File "/nfs/projects/refractory_epilepsy/Novocaller_test/novoCaller/novoCallerBAM.py", line 802, in <module>
runner(outfilename,initial_filename,unrelated_filename,trio_filename)
File "/nfs/projects/refractory_epilepsy/Novocaller_test/novoCaller/novoCallerBAM.py", line 731, in runner
PP,ADfs,ADrs,ADfs_U,ADrs_U,rho_f_new,rho_r_new,prior_L_new,AF_unrel = PP_calc(trio_samfiles,unrelated_samfiles,chrom,pos,REF,ALT,allele_freq,MQ_thresh,BQ_thresh)
File "/nfs/projects/refractory_epilepsy/Novocaller_test/novoCaller/novoCallerBAM.py", line 628, in PP_calc
ADfs,ADrs = get_all_ADs_combined(unrelated_samfiles,chrom,pos,REF,ALT,MQ_thresh,BQ_thresh)
File "/nfs/projects/refractory_epilepsy/Novocaller_test/novoCaller/novoCallerBAM.py", line 312, in get_all_ADs_combined
ADf,ADr = get_ADs_combined(samfile,chrom,position_actual,REF,ALT,MQ_thresh,BQ_thresh)
File "/nfs/projects/refractory_epilepsy/Novocaller_test/novoCaller/novoCallerBAM.py", line 303, in get_ADs_combined
ADf,ADr = get_ADs(samfile,chrom,position_actual,REF[0],MQ_thresh,BQ_thresh)
File "/nfs/projects/refractory_epilepsy/Novocaller_test/novoCaller/novoCallerBAM.py", line 149, in get_ADs
SP=samfile.pileup("chr"+CC, position, position+1)
File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig chr1
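The traceback shows the script calling samfile.pileup("chr"+CC, ...), so the error would be expected when the BAM header names its contigs "1", "2", ... rather than "chr1", "chr2", .... Below is a minimal sketch of how a contig name could be matched against the names actually present in a BAM header. The helper `resolve_contig` is hypothetical and not part of novoCallerBAM.py; with pysam, the list of names is available as the `references` attribute of an open `AlignmentFile`, but plain lists are used here so the example runs without any BAM file.

```python
def resolve_contig(chrom, references):
    """Return the spelling of chrom ("1" or "chr1") found in references.

    references is a list of contig names, e.g. the `references`
    attribute of an open pysam.AlignmentFile.
    """
    available = set(references)
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    for candidate in ("chr" + bare, bare):
        if candidate in available:
            return candidate
    raise ValueError("contig %s not found under either naming style" % chrom)


# Made-up headers: one without and one with the "chr" prefix.
print(resolve_contig("1", ["1", "2", "X"]))
print(resolve_contig("1", ["chr1", "chr2", "chrX"]))
```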
unrelated file
It seems I put this question on the wrong page, sorry. I should have posted it at https://github.com/dbmi-bgm/granite, but I cannot find the Issues button on the granite page.
Hi, I have some questions about using novoCaller; maybe someone can help me figure them out:
- Can novoCaller detect de novo indels?
- When calling DNVs, do the unrelated files refer to a) trios independent of the trio being analyzed; b) samples with unrelated phenotypes that are not supposed to carry a DNV at this position; or c) samples that are independent of each other and of the trio being analyzed?
- How many unrelated samples does novoCaller prefer?
- We know that sequencing errors and GATK may produce more than one DNV call at an identical position across many trios, which reflects a sequencing error rather than a real DNV. If I use many other trios as unrelated samples, will the erroneous DNV calls in those trios at this position increase or decrease the DNV probability?
- I saw the example VCF file. Is it necessary to put the child in the first genotype column (the 10th column of the VCF file)? I ask because I have many trios; if the child must be in the 10th column, I would have to create many large VCF files, each containing similar unrelated samples, one for each trio.