
variant_filter's Introduction

Variant Filtering Script - R & Bash

A set of scripts for filtering multi-sample VCF files to generate candidate variant lists

Main command:

./variant_filtering.sh --project my_project \
--input /data/Alignments/your_merged_vcf.vcf \
--output /home/user/ \
--config variant_filtering.config \
--temp TRUE

1. variant_filtering_results.tsv

Output description table

| Column Name | Description |
| --- | --- |
| Id | Variant Id number |
| CHROM | Chromosome number |
| POS | Variant locus position |
| rsID | rsID, or the position again if not known |
| REF | Reference base |
| ALT | Alternate base |
| QUAL | QUAL score |
| GENE | Gene symbol |
| TYPE | Type of region (exonic, splicing, etc.) |
| AA | Amino acid change, listed for multiple transcripts; look up the appropriate one |
| CONSEQUENCE | Functional change or consequence (stop gain etc.) |
| X1000G | Rarity in 1000 Genomes |
| EXAC | Rarity in the ExAC consortium |
| CADD | CADD score (variant damage prediction) |
| HET_val | Number of samples called with a het alt call at this position |
| HOM_val | Number of samples called with a hom alt call at this position |
| MISS_rate | The percentage rate of sites failing to genotype (missing rate) |
| AC_all | Alt allelic counts for this site (post filtering); a homozygous call adds 2 to the count |
| AC_nsyn | Same as AC_all but only for nonsynonymous changes |
| AC_trnc | Same as AC_all but only for truncating changes (stop gain, frameshifts) |
| AC_other | Same as AC_all but only for all other types of changes |
| AggAF_Trunc | Aggregated allele frequency of truncating mutations based on 1000 Genomes or ExAC |
| AggAF_nsyn | Same as AggAF_Trunc but for nonsynonymous mutations |
| RVIS_Pct | Residual variant intolerance score percentile; higher is more tolerant |
| GDIS_Phred | Genetic damage intolerance score, Phred scaled; higher is more tolerant |
| Sample_Id | Columns from this point are genotyping columns: 0 = ref, 1 = het, 2 = hom, -9 = missing |
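
To illustrate how the genotype codes relate to the summary columns, here is a minimal sketch that derives het and hom counts, a missing rate, and an alt allele count from one variant's genotype codes. The function name `summarise_genotypes` is hypothetical and not part of the scripts. The sketch assumes the missing rate is taken across samples for a single variant, and it ignores the post-filtering step that the real AC_all applies.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original scripts.
# Arguments: one genotype code per sample (0 = ref, 1 = het, 2 = hom, -9 = missing).
summarise_genotypes() {
    printf '%s\n' "$@" | awk '
        $1 == 1  { het++ }     # heterozygous alt call
        $1 == 2  { hom++ }     # homozygous alt call
        $1 == -9 { miss++ }    # failed to genotype
        END {
            ac = het + 2 * hom                      # a hom call adds 2 to the count
            miss_rate = (NR > 0) ? 100 * miss / NR : 0
            printf "HET_val=%d HOM_val=%d MISS_rate=%.1f AC=%d\n", het, hom, miss_rate, ac
        }'
}

# Five samples: ref, het, hom, missing, het
summarise_genotypes 0 1 2 -9 1
# prints: HET_val=2 HOM_val=1 MISS_rate=20.0 AC=4
```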

2. variant_filtering_GeneAC.tsv

  • Separate file containing only the AC_* columns from variant_filtering_results.tsv
  • Used for statistical analysis of case/controls for all genes present in the variant file

3. variant_filtering_results_AD.tsv

  • Allelic depth ratio matrix
  • Describes the proportion of reads supporting the ALT allele out of all REF and ALT reads
  • Calculated as ALT reads / (REF reads + ALT reads)
  • Used as a filtering step in the variant_filtering.config file: a variant is retained if at least one sample passes the threshold
  • Multi-allelic variants are omitted from this filter for ease of parsing the data
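
The ratio described above can be sketched in a few lines of Bash and awk. The function names `ad_ratio` and `passes_ad_threshold` are hypothetical. Treating a pass as "ratio at least the threshold" and reporting `NA` for zero coverage are both assumptions, not confirmed behaviour of the scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original scripts.

# Allelic depth ratio = ALT reads / (REF reads + ALT reads).
ad_ratio() {
    local ref_reads="$1" alt_reads="$2"
    awk -v ref="$ref_reads" -v alt="$alt_reads" 'BEGIN {
        total = ref + alt
        if (total == 0) { print "NA"; exit }   # no coverage: ratio undefined
        printf "%.3f\n", alt / total
    }'
}

# Succeeds if the ratio is at least min_ratio (assumed direction of the filter).
# Usage: passes_ad_threshold REF_READS ALT_READS MIN_RATIO
passes_ad_threshold() {
    local ratio
    ratio="$(ad_ratio "$1" "$2")"
    [ "$ratio" != "NA" ] && awk -v r="$ratio" -v t="$3" 'BEGIN { exit !(r >= t) }'
}

ad_ratio 30 10    # 10 ALT reads out of 40 total, prints 0.250
```

Since a variant is retained if one sample passes, a per-variant filter would call `passes_ad_threshold` for each sample and keep the variant on the first success.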

4. R_log.txt

  • Log file of each filtering step in the R filtering script
  • Denotes start times and variant counts at each stage
  • Declares the number of multi-allelic variants carried forward
  • Reports the number of variants for which AC_all was not equal to sum(AC_trnc + AC_nsyn + AC_other). This should be a very low proportion of total variants and is due to unusual variant type/consequence combinations
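
The AC consistency check in the last bullet can be reproduced outside R with a short awk sketch. The function name `count_ac_mismatches` is hypothetical. It assumes a tab-separated file whose header row contains the AC_all, AC_nsyn, AC_trnc and AC_other column names from the output description table.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original scripts.
# Counts data rows where AC_all differs from AC_nsyn + AC_trnc + AC_other.
# Usage: count_ac_mismatches results.tsv
count_ac_mismatches() {
    awk -F '\t' '
        NR == 1 {                      # map header names to column indexes
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["AC_all"]) != $(col["AC_nsyn"]) + $(col["AC_trnc"]) + $(col["AC_other"]) { bad++ }
        END { print bad + 0 }          # "+ 0" prints 0 when nothing mismatched
    ' "$1"
}
```

Looking columns up by header name rather than by position keeps the sketch independent of how many sample columns follow.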

5. variantfilter.log

  • Log file for the bash portion of the variant filtering script
  • Contains the parameters used and declares which thresholds were specified in the variant_filtering.config file
  • Mainly just for error checks and confirming the script has completed

6. variant_comphet-biallelic_results.tsv

  • Calculates sets of variants that are potentially compound heterozygous
  • Sites are retained if two or more occur in the same sample and gene simultaneously
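
The retention rule above can be sketched as a grouping step. The function name `find_comphet_candidates` and the three-column input layout (sample, gene, variant Id, one row per het call) are assumptions for illustration; the actual script works from the genotype matrix.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original scripts.
# Input: tab-separated rows of sample, gene, variant Id (one row per het call).
# Output: sample, gene and a comma-separated Id list for every sample/gene
# pair carrying two or more variants.
find_comphet_candidates() {
    awk -F '\t' '
        {
            key = $1 FS $2
            n[key]++
            ids[key] = ids[key] (n[key] > 1 ? "," : "") $3
        }
        END {
            for (key in n)
                if (n[key] >= 2) print key FS ids[key]
        }
    ' "$1" | sort    # awk array order is unspecified, so sort for stable output
}
```

This only flags candidates: without phasing or parental genotypes, two variants in the same gene may still sit on the same haplotype.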

Required dependencies and software:

Required R Libraries:
