PSST

Polygenic SNP Search Tool Version 2.0

Graphical Overview

[Figure: Workflow]

Overview:

The Polygenic SNP Search Tool is an open-source pipeline that identifies multiple SNPs associated with diseases, including SNPs that modify the penetrance of other SNPs. This pipeline identifies:

  • Asserted pathogenic SNPs
  • SNPs identified by Genome-wide Association Studies (GWAS)

It crosses these SNPs with databases and datasets such as ClinVar, SRA, and GEO, and then constructs a report describing multiple genetic variants associated with diseases.

Dependencies:

Biopython

Usage:

The main script psst.sh takes as input a text file in which each line is a unique SNP rs accession, together with either a second text file containing unique SRA accessions or a FASTQ file. It outputs a TSV file describing which SNPs are present in the SRA datasets.


Usage: psst.sh [-h description and usage] [-s SRA accessions] [-n SNP accessions]
               [-f FASTQ file] [-d working directory] [-e email for Entrez]
               [-t threads] [-p max number of child processes]
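psst.sh is a shell script, so the following is only a hypothetical Python mirror of the options listed above. It is meant to show how the inputs relate to each other. The sketch makes -n and -e required, and it treats -s and -f as mutually exclusive because the script takes either SRA accessions or a FASTQ file. The defaults, the required flags, and the name build_parser are assumptions and are not taken from psst.sh.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical Python mirror of the psst.sh options (-h is added automatically)."""
    parser = argparse.ArgumentParser(prog="psst", description="Polygenic SNP Search Tool")
    parser.add_argument("-n", dest="snps", required=True, help="SNP accessions file")
    # The script reads either SRA accessions or a FASTQ file, so exactly one is accepted.
    reads = parser.add_mutually_exclusive_group(required=True)
    reads.add_argument("-s", dest="sra", help="SRA accessions file")
    reads.add_argument("-f", dest="fastq", help="FASTQ file")
    # Defaults below are assumptions, not values taken from psst.sh.
    parser.add_argument("-d", dest="workdir", default=".", help="working directory")
    parser.add_argument("-e", dest="email", required=True, help="email for Entrez")
    parser.add_argument("-t", dest="threads", type=int, default=1, help="threads")
    parser.add_argument("-p", dest="max_procs", type=int, default=1,
                        help="max number of child processes")
    return parser
```

As an example, build_parser().parse_args(["-n", "snps.txt", "-f", "reads.fastq", "-e", "me@example.com"]) leaves args.sra as None. Passing both -s and -f causes argparse to exit with a usage error.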
               

The PSST pipeline works as follows:

  1. Extracts flanking sequences for the SNP accessions and creates a FASTA file containing these flanking sequences.

  2. Creates a BLAST database out of the SNP flanking sequences.

  3. Runs Magic-BLAST to align the reads of each phenotype-associated SRA dataset against the SNP flanking-sequence BLAST database.

  4. From the Magic-BLAST alignments, determines which SNPs are contained in the SRA datasets using a statistical heuristic.
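Steps 1 to 3 can be sketched in Python as shown below. This is not PSST's code. The helpers write_flanks_fasta, makeblastdb_cmd, magicblast_cmd, and run_steps_1_to_3 are hypothetical. The sketch assumes the flanking sequences have already been retrieved into a dict keyed by accession, and it uses only standard makeblastdb and magicblast flags. Step 4 is omitted because the README does not describe the statistical heuristic.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def write_flanks_fasta(flanks: Dict[str, str], path: Path) -> None:
    """Step 1: write one FASTA record per SNP, using the accession as the ID."""
    with path.open("w") as handle:
        for accession, sequence in flanks.items():
            handle.write(f">{accession}\n{sequence}\n")


def makeblastdb_cmd(fasta: Path, db_name: str) -> List[str]:
    """Step 2: build a nucleotide BLAST database from the flanking sequences."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "nucl", "-out", db_name]


def magicblast_cmd(sra_accession: str, db_name: str, out: Path, threads: int = 1) -> List[str]:
    """Step 3: align the reads of one SRA run against the flank database (SAM output)."""
    return [
        "magicblast", "-sra", sra_accession, "-db", db_name,
        "-num_threads", str(threads), "-out", str(out),
    ]


def run_steps_1_to_3(flanks: Dict[str, str], sra_accessions: List[str], workdir: Path) -> None:
    # Hypothetical driver; requires BLAST+ (makeblastdb) and Magic-BLAST on PATH.
    fasta = workdir / "snp_flanks.fasta"
    db_name = str(workdir / "snp_flanks")
    write_flanks_fasta(flanks, fasta)
    subprocess.run(makeblastdb_cmd(fasta, db_name), check=True)
    for accession in sra_accessions:
        subprocess.run(magicblast_cmd(accession, db_name, workdir / f"{accession}.sam"), check=True)
```

The command builders return argument lists rather than running them, so each step can be checked or logged before anything is executed.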

See the file breast-ovarian_cancer.tsv for an example output file.

Disease Clustering:

Disease types from the ClinVar database were grouped into categories, such as assorted metabolic diseases and breast cancer, to examine the relationship between human variations and phenotypes.

  1. Diseases were found by manually exploring the ClinVar dataset.

  2. An online search was performed to cross-check whether each disease found was metabolic or cancer related.

  3. Diseases that did not match were eliminated, and the matching diseases were moved into a separate file.
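The README describes this grouping as a manual process. For illustration only, steps 2 and 3 could be approximated by keyword matching on ClinVar condition names. The categories, the keyword lists, the tab-separated output, and the names categorize and write_matches are all assumptions. Keyword matching is also a much cruder check than the online cross-check described above.

```python
import csv
from pathlib import Path
from typing import List, Optional

# Hypothetical keyword lists standing in for the manual online cross-check.
CATEGORY_KEYWORDS = {
    "cancer": ("cancer", "carcinoma", "neoplasm", "tumor"),
    "metabolic": ("metabolic", "metabolism"),
}


def categorize(disease: str) -> Optional[str]:
    """Return the first category whose keywords appear in the disease name, else None."""
    name = disease.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return None


def write_matches(diseases: List[str], out: Path) -> int:
    """Drop non-matching diseases and write the rest, with their category, as TSV."""
    kept = 0
    with out.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for disease in diseases:
            category = categorize(disease)
            if category is None:
                continue  # not a match: eliminated
            writer.writerow([disease, category])
            kept += 1
    return kept
```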

Future Additions

  • Add a Bayesian inference variant-calling rule for small numbers of NGS datasets. Our current heuristic runs fast on a large number of datasets, but for a small number of datasets a Bayesian inference rule would be better, and we would not lose much in run time.
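The README does not specify the Bayesian rule. One standard form is a per-SNP genotype posterior computed from the reference and alternate read counts in one dataset. This sketch makes the following assumptions:

  • Each read independently supports the alternate allele with probability e, 0.5, or 1 - e for genotypes 0/0, 0/1, and 1/1, where e is an assumed sequencing error rate.
  • Genotype priors are uniform unless supplied.

The name genotype_posterior is hypothetical, and the sketch does not cover how read counts would be extracted from the Magic-BLAST alignments.

```python
import math
from typing import Dict, Optional

# Expected fraction of reads carrying the alternate allele under each genotype.
ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_posterior(ref_reads: int, alt_reads: int, error_rate: float = 0.01,
                       priors: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Posterior probability of each genotype at one SNP in one dataset."""
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")
    if priors is None:
        priors = {g: 1.0 / len(ALT_FRACTION) for g in ALT_FRACTION}  # assumed uniform prior
    log_post = {}
    for genotype, fraction in ALT_FRACTION.items():
        # Probability that a single read shows the alternate allele,
        # allowing for sequencing errors in either direction.
        p_alt = fraction * (1 - error_rate) + (1 - fraction) * error_rate
        # Binomial log-likelihood; the n-choose-k term is identical for every
        # genotype, so it cancels during normalization and is omitted.
        log_lik = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        log_post[genotype] = log_lik + math.log(priors[genotype])
    # Normalize in log space (log-sum-exp) to avoid underflow at high coverage.
    peak = max(log_post.values())
    weights = {g: math.exp(v - peak) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}
```

For 12 reference reads and 10 alternate reads at the default error rate, almost all of the posterior mass falls on the heterozygous genotype 0/1. This per-dataset computation is cheap, which fits the README's point that the extra cost would be small when only a few datasets are analyzed.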

Contributors

sean-la, anmolv07, dcgenomics, chipmash

Issues

Issue with get_var_flanks.py

Running the command below produces the following error:

python get_var_flanks.py -i gene_snp_accessions.txt -e <email> -o gene_snp_flanks.txt

Here, gene_snp_accessions.txt contains the names of the gene's SNPs with 0.01 <= MAF <= 0.5, <email> is the email address, and gene_snp_flanks.txt is the output file.

Traceback (most recent call last):
  File "get_var_flanks.py", line 86, in <module>
    main(sys.argv)
  File "get_var_flanks.py", line 82, in main
    flanking_sequences = get_var_flanking_sequences(accessions,email)
  File "get_var_flanks.py", line 12, in get_var_flanking_sequences
    for record in records:
  File "/home/moralesdia/.local/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 286, in parse
    self.parser.Parse(text, False)
  File "/home/moralesdia/.local/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 390, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Invalid uid rs1010447 at position=0

To fix this problem, remove the 'rs' prefix from each accession in the input SNP file.
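The prefix can be stripped before get_var_flanks.py is run. This minimal sketch assumes one accession per line. normalize_snp_ids and strip_rs_prefix are hypothetical helpers and are not part of the repository. The sketch rejects anything other than an optional rs prefix followed by digits, because the traceback shows Entrez refusing 'rs1010447' as a UID.

```python
import re
from pathlib import Path
from typing import List

# Optional "rs" prefix (any case) followed by the numeric dbSNP ID.
RS_ACCESSION = re.compile(r"^(?:rs)?(\d+)$", re.IGNORECASE)


def normalize_snp_ids(lines: List[str]) -> List[str]:
    """Turn entries such as 'rs1010447' into numeric UIDs such as '1010447'."""
    uids = []
    for line in lines:
        token = line.strip()
        if not token:
            continue  # skip blank lines
        match = RS_ACCESSION.match(token)
        if match is None:
            raise ValueError(f"not an rs accession: {token!r}")
        uids.append(match.group(1))
    return uids


def strip_rs_prefix(src: Path, dst: Path) -> None:
    """Write a copy of src with one numeric UID per line."""
    uids = normalize_snp_ids(src.read_text().splitlines())
    dst.write_text("".join(f"{uid}\n" for uid in uids))
```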

New Issue with get_var_flanks.py

This is the latest error I get when running get_var_flanks.py. This time, the entries in mTOR_snp_accessions.txt do not have the 'rs' prefix.

Command:
python get_var_flanks.py -i mTOR_snp_accessions.txt -e <email> -o mTOR_snp_flanks.txt

Result:

Traceback (most recent call last):
  File "get_var_flanks.py", line 86, in <module>
    main(sys.argv)
  File "get_var_flanks.py", line 82, in main
    flanking_sequences = get_var_flanking_sequences(accessions,email)
  File "get_var_flanks.py", line 12, in get_var_flanking_sequences
    for record in records:
  File "/home/moralesdia/.local/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 286, in parse
    self.parser.Parse(text, False)
  File "/home/moralesdia/.local/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 390, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Empty id list - nothing todo
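The traceback does not show why the ID list ended up empty. One defensive option is to validate the input file before any Entrez request, so that an empty or malformed file fails with a clear message. This sketch assumes one numeric ID per line, and load_uid_list is a hypothetical name. The byte-order-mark and blank-line handling target common causes of this kind of failure and are not a diagnosis of this report.

```python
from pathlib import Path
from typing import List


def load_uid_list(path: Path) -> List[str]:
    """Read one numeric dbSNP ID per line, failing loudly on empty or malformed input."""
    # "utf-8-sig" drops a leading byte-order mark if an editor added one.
    text = path.read_text(encoding="utf-8-sig")
    # Universal-newline reading and splitlines() cope with CRLF endings; blank lines are skipped.
    uids = [line.strip() for line in text.splitlines() if line.strip()]
    if not uids:
        raise ValueError(f"{path} contains no SNP IDs; refusing to query Entrez with an empty id list")
    bad = [uid for uid in uids if not uid.isdigit()]
    if bad:
        raise ValueError(f"{path} has non-numeric SNP IDs, e.g. {bad[:3]}")
    return uids
```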
