SpliceAI-test

Introduction

This repository contains Python and shell scripts that test Illumina/SpliceAI to check how well it actually predicts splice sites.

The project simply runs SpliceAI on all alternative isoforms from GENCODE v19.

Instructions

  1. It requires:

    1. a Python 2 interpreter that satisfies the requirements specified in setup.py
    2. a Python 3 interpreter with minwoooj/lab-modules installed

    I therefore recommend using conda virtual environments.

  2. Original source code: https://basespace.illumina.com/s/5u6ThOblecrh

  3. Additional source code and modifications to the original source appear in the commit history.

What follows is the original README.md of Illumina/SpliceAI.


SpliceAI: A deep learning-based tool to identify splice variants

This package annotates genetic variants with their predicted effect on splicing, as described in Jaganathan et al., Cell 2019 (in press).

Installation

The simplest way to install SpliceAI is through pip:

pip install spliceai

Alternatively, SpliceAI can be installed from the GitHub repository:

git clone https://github.com/Illumina/SpliceAI.git
cd SpliceAI
python setup.py install

SpliceAI requires tensorflow>=1.2.0, which is best installed separately via pip: pip install tensorflow. See the TensorFlow website for other installation options.

Usage

SpliceAI can be run from the command line:

spliceai -I input.vcf -O output.vcf -R genome.fa -A annotations.txt

# or you can pipe the input and output VCFs
cat input.vcf | spliceai -R genome.fa -A annotations.txt > output.vcf

Options:

  • -I: Input VCF with variants of interest.
  • -O: Output VCF with SpliceAI predictions SpliceAI=ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL included in the INFO column (see table below for details). Only SNVs and simple INDELs (ref or alt must be a single base) within genes are annotated. Variants in multiple genes have separate predictions for each gene.
  • -R: Reference genome fasta file.
  • -A: Gene annotation file. Alternatively, pass grch37 or grch38 to use the GENCODE canonical annotation files included with the package. To create custom annotation files, use spliceai/annotations/grch37.txt in the repository as a template.
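The invocation above can also be assembled from a script, which is convenient when many VCFs are processed in a loop. The following is a minimal sketch, not part of SpliceAI itself: the helper name build_spliceai_command is hypothetical, and only the -I, -O, -R and -A options documented above are used.

```python
import shlex


def build_spliceai_command(input_vcf, output_vcf, reference_fasta, annotation="grch37"):
    """Return the argument list for the spliceai call documented above.

    Hypothetical helper: it only assembles the arguments and does not run anything.
    """
    return [
        "spliceai",
        "-I", input_vcf,        # input VCF with variants of interest
        "-O", output_vcf,       # output VCF with SpliceAI predictions
        "-R", reference_fasta,  # reference genome fasta file
        "-A", annotation,       # annotation file, or grch37 / grch38
    ]


cmd = build_spliceai_command("input.vcf", "output.vcf", "genome.fa")
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires spliceai to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.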

Note: The annotations for all possible SNVs within genes are available here for download.
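The -O description restricts annotation to SNVs and simple INDELs, where ref or alt must be a single base. A rough pre-filter for that rule might look like the sketch below. The function name is_simple_variant is hypothetical, and the restriction to the letters A, C, G and T is an assumption added here to reject symbolic alleles; the sketch does not check whether the variant lies within a gene.

```python
VALID_BASES = set("ACGT")  # assumption: plain nucleotide alleles only


def is_simple_variant(ref, alt):
    """Return True for SNVs and simple INDELs (ref or alt is a single base)."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt:
        return False
    # Reject symbolic or ambiguous alleles such as <DEL>, *, or N.
    if not (set(ref) <= VALID_BASES and set(alt) <= VALID_BASES):
        return False
    return len(ref) == 1 or len(alt) == 1


for ref, alt in [("C", "T"), ("CAG", "C"), ("C", "CAG"), ("CA", "TG")]:
    print(ref, alt, is_simple_variant(ref, alt))
```

Here the first three variants pass (one SNV, one deletion, one insertion) and the last one fails because both alleles are longer than one base.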

Details of SpliceAI INFO field:

ID      Description
ALLELE  Alternate allele
SYMBOL  Gene symbol
DS_AG   Delta score (acceptor gain)
DS_AL   Delta score (acceptor loss)
DS_DG   Delta score (donor gain)
DS_DL   Delta score (donor loss)
DP_AG   Delta position (acceptor gain)
DP_AL   Delta position (acceptor loss)
DP_DG   Delta position (donor gain)
DP_DL   Delta position (donor loss)

The delta score of a variant ranges from 0 to 1 and can be interpreted as the probability that the variant is splice-altering. The paper provides a detailed characterization of the 0.2 (high recall/likely pathogenic), 0.5 (recommended/pathogenic), and 0.8 (high precision/pathogenic) cutoffs. Delta position gives the location where splicing changes relative to the variant position (positive values are downstream of the variant, negative values are upstream).
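To make the field layout concrete, the sketch below splits one SpliceAI annotation into named fields and compares it with the 0.5 cutoff. It is an illustration under stated assumptions rather than SpliceAI's own code: the names parse_spliceai and max_delta_score are hypothetical, taking the maximum of the four delta scores as the variant-level score is an assumption made here, and only a single prediction is handled (a variant in multiple genes carries one prediction per gene). It is written for Python 3.

```python
from collections import namedtuple

FIELDS = ["ALLELE", "SYMBOL", "DS_AG", "DS_AL", "DS_DG", "DS_DL",
          "DP_AG", "DP_AL", "DP_DG", "DP_DL"]
SpliceAIRecord = namedtuple("SpliceAIRecord", FIELDS)


def parse_spliceai(value):
    """Parse one 'ALLELE|SYMBOL|DS_...|DP_...' annotation into a record."""
    prefix = "SpliceAI="
    if value.startswith(prefix):
        value = value[len(prefix):]
    parts = value.split("|")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    scores = [float(x) for x in parts[2:6]]    # the four delta scores
    positions = [int(x) for x in parts[6:10]]  # the four delta positions
    return SpliceAIRecord(parts[0], parts[1], *scores, *positions)


def max_delta_score(record):
    """Assumption: summarize a variant by its largest delta score."""
    return max(record.DS_AG, record.DS_AL, record.DS_DG, record.DS_DL)


record = parse_spliceai("SpliceAI=T|RYR1|0.22|0.00|0.91|0.70|-107|-46|-2|90")
print(record.SYMBOL, max_delta_score(record), max_delta_score(record) >= 0.5)
# prints: RYR1 0.91 True
```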

Examples

A sample input file and the corresponding output file can be found at examples/input.vcf and examples/output.vcf respectively (grch37 annotation). The output SpliceAI=T|RYR1|0.22|0.00|0.91|0.70|-107|-46|-2|90 for the variant 19:38958362 C>T can be interpreted as follows:

  • The probability that the position 19:38958255 is used as a splice acceptor increases by 0.22.
  • The probability that the position 19:38958360 is used as a splice donor increases by 0.91.
  • The probability that the position 19:38958452 is used as a splice donor decreases by 0.70.
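The positions in this interpretation follow from adding each delta position to the variant coordinate. The arithmetic can be checked with a short sketch; the function name affected_positions is hypothetical and the numbers are taken from the example output above.

```python
def affected_positions(variant_pos, dp_ag, dp_al, dp_dg, dp_dl):
    """Convert delta positions (offsets from the variant) into coordinates."""
    return {
        "acceptor_gain": variant_pos + dp_ag,
        "acceptor_loss": variant_pos + dp_al,
        "donor_gain": variant_pos + dp_dg,
        "donor_loss": variant_pos + dp_dl,
    }


# Variant 19:38958362 C>T with DP_AG=-107, DP_AL=-46, DP_DG=-2, DP_DL=90
positions = affected_positions(38958362, -107, -46, -2, 90)
print(positions["acceptor_gain"])  # 38958255
print(positions["donor_gain"])     # 38958360
print(positions["donor_loss"])     # 38958452
```

The acceptor-loss position (38958316) is omitted from the interpretation above because its delta score is 0.00.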

Contact

Kishore Jaganathan: [email protected]
