
bio-ontology-research-group / deepsvp


Prioritizing Copy Number Variants (CNV) using Phenotype and Gene Functional Similarity

License: GNU General Public License v3.0

Python 83.32% Shell 14.22% Common Workflow Language 2.11% Dockerfile 0.35%
copy-number-variation ontology structural-variation

deepsvp's Introduction

DeepSVP

DeepSVP is a computational method to prioritize structural variants (SVs) involved in genetic diseases by combining genomic information with information about gene functions. It incorporates phenotypes linked to genes, functions of gene products, gene expression in individual cell types, and anatomical sites of expression, and systematically relates them to their phenotypic consequences through ontologies and machine learning.

Training dataset

We train and evaluate our method using human SVs collected from the dbVar database.

Annotation data sources (integrated in the candidate SV prediction workflow)

We integrated annotations from the following ontologies:

  • Gene Ontology (GO)
  • Uber-anatomy ontology (UBERON)
  • Mammalian Phenotype Ontology (MP)
  • Human Phenotype Ontology (HPO)

The ontology embeddings are generated with DL2vec, which converts different types of Description Logic axioms into a graph representation and then generates an embedding for each node and edge type.
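As a rough illustration of the graph-based idea described above, the following sketch turns a few simplified axioms into labelled edges and samples a walk that alternates node and edge labels. It is not DL2vec's actual code: the triple format, the relation names (has_function, has_phenotype), the GENE:1 identifier, and both function names are assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical, already-flattened axioms as (subject, relation, object).
# "A SubClassOf B" becomes (A, "subClassOf", B);
# "A SubClassOf (R some B)" becomes (A, R, B).
AXIOMS = [
    ("GO:0006915", "subClassOf", "GO:0012501"),
    ("GENE:1", "has_function", "GO:0006915"),
    ("GENE:1", "has_phenotype", "HP:0001324"),
]


def build_graph(axioms):
    """Map each axiom to a labelled, directed edge."""
    graph = defaultdict(list)
    for subject, relation, obj in axioms:
        graph[subject].append((relation, obj))
    return graph


def random_walk(graph, start, length, rng):
    """Walk the graph, emitting node and edge labels alternately."""
    walk = [start]
    node = start
    for _ in range(length):
        edges = graph.get(node)
        if not edges:
            break  # dead end: no outgoing edges
        relation, node = rng.choice(edges)
        walk.extend([relation, node])
    return walk


graph = build_graph(AXIOMS)
walk = random_walk(graph, "GENE:1", 2, random.Random(0))
print(walk)
```

A corpus of such walks could then be passed to a word-embedding model, so that both nodes and edge labels receive vectors.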

We collected genomic features using the public tool AnnotSV (v2.2).

Installation

Using pip version 20.3.1:

pip install deepsvp

Alternatively, create a dedicated Conda environment (e.g. named "deepsvp-py38-pip2031"):

conda create -n deepsvp-py38-pip2031 python=3.8 pip=20.3.1
conda activate deepsvp-py38-pip2031
pip3 install deepsvp
pip3 install networkx
pip3 install torch
pip3 list
conda deactivate
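After installation, a quick check from Python can confirm that the packages are visible in the active environment. This is a minimal sketch; it assumes the import names match the pip package names used above (deepsvp, networkx, torch), and missing_packages is a helper name introduced here.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Packages installed by the commands above (import names assumed).
    missing = missing_packages(["deepsvp", "networkx", "torch"])
    print("missing:", missing or "none")
```

Run it inside the activated Conda environment, before calling conda deactivate.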

Running the DeepSVP prediction model

  • Download the data files and place the uncompressed contents in a folder named "data":
mkdir DeepSVP/          ;# /path_of_your_DeepSVP_repository/
cd DeepSVP
wget "https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/data.zip"
unzip data.zip
cd data                 ;# /path_of_your_DeepSVP_data_repository/
wget "https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/experiments.zip"   # can be very long
unzip experiments.zip
  • Download and install the required AnnotSV (v2.3) tool in the "data" folder:
cd /path_of_your_DeepSVP_data_repository/
git clone git@github.com:lgmgeo/AnnotSV.git --branch v2.3
cd AnnotSV/
make PREFIX=. install
make DESTDIR= PREFIX=. install-human-annotation
cd ..
  • Add genomic features to your VCF input file (/path_and_name_of_your_vcf_input_file/) using AnnotSV (v2.3):

e.g. /path_and_name_of_your_vcf_input_file/ = ./input.vcf

e.g. /path_and_name_of_your_annotsv_output_file/ = ./data/output.annotsv.annotated.tsv

export ANNOTSV=/path_of_your_DeepSVP_data_repository/AnnotSV
$ANNOTSV/bin/AnnotSV -SVinputFile ./input.vcf -genomeBuild GRCh38 -outputFile ./data/output.annotsv.annotated.tsv

Your annotated VCF file (./data/output.annotsv.annotated.tsv) should be placed in the data folder (/path_of_your_DeepSVP_data_repository/).
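Before running DeepSVP, it can help to confirm that the annotated file contains the columns the tool reads. The sketch below checks the header row of a tab-separated file. Only the 1000g_AF column name is taken from this page (it appears in the traceback of an issue reported further down); the complete list of required columns is not documented here, and missing_columns is a helper name introduced for this example.

```python
import csv
import os
import tempfile


def missing_columns(tsv_path, required=("1000g_AF",)):
    """Return the required column names absent from the TSV header row."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]


# Demo on a throwaway file with a made-up header.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.annotated.tsv")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("col_a\tcol_b\tcol_c\n")
    demo_missing = missing_columns(demo_path)

print(demo_missing)  # → ['1000g_AF']
```

For a real run, call missing_columns("./data/output.annotsv.annotated.tsv") and extend the required tuple with any other columns you know your DeepSVP version expects.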

  • Run the command deepsvp --help to display help and parameters:
Usage: deepsvp [OPTIONS]
      
     DeepSVP: A phenotype-based tool to prioritize caustive CNV using WGS data
     and Phenotype/Gene Functional Similarity
  
Options:
    -d, --data-root TEXT      Data root folder  [required]
    -i, --in-file TEXT        Annotated Input file  [required]
    -p, --hpo TEXT            List of phenotype ids separated by commas
                              [required]
    -maf, --maf_filter FLOAT  Allele frequency filter using gnomAD and 1000G
                              default<=0.01
    -m, --model_type TEXT     Ontology model, one of the following (go , mp ,
                              hp, cl, uberon, union), default=mp
    -ag, --aggregation TEXT   Aggregation method for the genes within CNV (max
                              or mean) default=max
    -o, --outfile TEXT        Output result file
    --help                    Show this message and exit.        
  • Run the example (with your own HPO terms):
    deepsvp -d data/ -i output.annotsv.annotated.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt

Or run the example with the deepsvp-py38-pip2031 Conda Environment:

conda activate deepsvp-py38-pip2031
deepsvp -d data/ -i $your_annotsv_output.annotated.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
conda deactivate

Or, to use cwl-runner, edit the input file path in the example YAML file deepsvp.yaml and then run:

cwl-runner deepsvp.cwl deepsvp.yaml 
|========                        | 25% Reading the input phenotypes...
|================                | 50% Phenotype prediction... 
|========================        | 75% CNV Prediction... 
|================================| 100% DONE! You can find the prediction results in the output file: example_output.txt
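The direct deepsvp invocations shown earlier can also be assembled from Python, which makes it easier to validate the inputs first. This is a sketch under stated assumptions: the flags and allowed values are copied from the help text above, while build_deepsvp_command and its validation rules (for example, that HPO IDs are "HP:" followed by seven digits) are introduced here and are not part of DeepSVP.

```python
import re
import shlex

HPO_ID = re.compile(r"HP:\d{7}")
MODEL_TYPES = {"go", "mp", "hp", "cl", "uberon", "union"}
AGGREGATIONS = {"max", "mean"}


def build_deepsvp_command(data_root, in_file, hpo_terms, model_type="mp",
                          maf=0.01, aggregation="max", outfile=None):
    """Build the argument list for a deepsvp run (hypothetical helper)."""
    bad = [term for term in hpo_terms if not HPO_ID.fullmatch(term)]
    if bad:
        raise ValueError(f"malformed HPO ids: {bad}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type}")
    if aggregation not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {aggregation}")
    cmd = ["deepsvp", "-d", data_root, "-i", in_file,
           "-p", ",".join(hpo_terms), "-m", model_type,
           "-maf", str(maf), "-ag", aggregation]
    if outfile:
        cmd += ["-o", outfile]
    return cmd


cmd = build_deepsvp_command(
    "data/", "output.annotsv.annotated.tsv",
    ["HP:0003701", "HP:0001324"],
    model_type="cl", outfile="example_output.txt",
)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once deepsvp is installed and on the PATH.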

Output:

The script outputs a ranking and a score for each candidate causative CNV.
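To make the effect of the aggregation option (max or mean over the genes within a CNV) concrete, the sketch below aggregates per-gene scores into one score per CNV and ranks the CNVs. It only illustrates the option as described in the help text; it is not DeepSVP's implementation, and the function names, CNV labels, and scores are invented for the example.

```python
from statistics import mean


def aggregate_cnv_scores(gene_scores_by_cnv, method="max"):
    """Collapse the per-gene scores of each CNV into a single score."""
    if method not in ("max", "mean"):
        raise ValueError(f"unknown aggregation: {method}")
    reducer = max if method == "max" else mean
    return {cnv: reducer(scores)
            for cnv, scores in gene_scores_by_cnv.items() if scores}


def rank_cnvs(cnv_scores):
    """Sort CNVs from highest to lowest score."""
    return sorted(cnv_scores.items(), key=lambda item: item[1], reverse=True)


# Made-up per-gene scores for two hypothetical CNVs.
scores = {"cnv_a": [0.2, 0.9], "cnv_b": [0.6, 0.7]}
ranked_max = rank_cnvs(aggregate_cnv_scores(scores, "max"))
ranked_mean = rank_cnvs(aggregate_cnv_scores(scores, "mean"))
print([cnv for cnv, _ in ranked_max])
print([cnv for cnv, _ in ranked_mean])
```

With these made-up numbers the two methods rank the CNVs differently: max favours the CNV containing the single highest-scoring gene, while mean favours the CNV whose genes score well on average.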

Scripts

  • Details on predicting pathogenic variants and the comparison with other methods can be found in the experiment folder.
  • annotations.sh: script to annotate the variants.
  • data_preprocessing.py: script to preprocess the annotations and features.
  • pheno_model.py: script to get the DL2vec score using the trained model.
  • deepsvp_training.py: script to train and test the model, with hyperparameter optimization.
  • BWA_GATK.sh: script to run the GATK workflow on the input FASTQ files for the real samples; run on the KAUST Supercomputing IBEX cluster.
  • run_Manta.sh: script to generate a VCF with the structural variants (SVs); we used Manta to identify the candidate SVs. Run on the KAUST Supercomputing IBEX cluster.

Final notes

For any questions or comments please contact: [email protected]

deepsvp's People

Contributors

azzatha, coolmaksat, dependabot[bot], leechuck, lgmgeo

deepsvp's Issues

raise KeyError(key) from err KeyError: '1000g_AF'

Hi, am back after calling and annotating CNV.

When running DeepSVP now, I get to 75% and immediately crash to this traceback:

Traceback (most recent call last):
  File "/Users/alessihd/opt/miniconda3/envs/py36/bin/deepsvp", line 8, in <module>
    sys.exit(main())
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/deepsvp/main.py", line 119, in main
    maf_filter)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/deepsvp/main.py", line 290, in func_wrapper
    value = func(*args, **kwargs)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/deepsvp/main.py", line 460, in load_cnv_model
    data['1000g_AF'].fillna(0, inplace=True)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/alessihd/opt/miniconda3/envs/py36/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: '1000g_AF'

This seems to be because my annotated output has no 1000g_AF column. Unfortunately, after further diving into the DeepSVP codebase, almost none of the columns (i.e., NumPromoters, TriS_CGscore) match up with what came from the AnnotSV annotation.

Has AnnotSV changed their labels? Any ideas would be appreciated. Thanks!

Help for install

Hi,

I would like to install DeepSVP with the following command:

pip3 install deepsvp

but I have this error message:

Collecting tensorflow==2.3.0
  Could not find a version that satisfies the requirement tensorflow==2.3.0 (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.12.2, 1.12.3, 1.13.1, 1.13.2, 1.14.0)
No matching distribution found for tensorflow==2.3.0

Actually, I've had difficulties with the following requirement:

pip3 install tensorflow==2.3.0   
Collecting tensorflow==2.3.0
  Could not find a version that satisfies the requirement tensorflow==2.3.0 (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.12.2, 1.12.3, 1.13.1, 1.13.2, 1.14.0)
No matching distribution found for tensorflow==2.3.0

Thank you for any help you can provide me,
Véronique

Are my VCF files formatted incorrectly?

Hi, this might be a bad issue to post here because it's mainly with AnnotSV I guess, but if you could help it would be much appreciated.

I am getting this error when running annotation.sh:

No SV to annotate in the SVinputFile - Exit without error.

I have tried multiple vcf files to no avail. Thanks.

edit: just to note, the file size of the vcf is not 0.

Help for install

Hi,

I would like to install DeepSVP with the following command:

conda create -n deepsvp-py38-pip2031 python=3.8 pip=20.3.1
conda activate deepsvp-py38-pip2031
pip3 install deepsvp==1.0.3

but I have this error message:

 ERROR: Command errored out with exit status 1:
     command: /home/wangzh/miniconda3/envs/deepsvp-py38-pip2031/bin/python3.8 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-joadgm7k/smart-open_bb4492a42a8742308626c897d6dcc1fb/setup.py'"'"'; __file__='"'"'/tmp/pip-install-joadgm7k/smart-open_bb4492a42a8742308626c897d6dcc1fb/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-4069swc8
         cwd: /tmp/pip-install-joadgm7k/smart-open_bb4492a42a8742308626c897d6dcc1fb/
    Complete output (1 lines):
    error in smart_open setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6.*'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Thank you for any help you can provide me

installation problem

Hi, I am having a problem during installation.
I followed the steps by creating a conda environment and running pip install deepsvp. However, an error was thrown:
error in smart_open setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6.*'
After googling, I am still not sure how to fix this issue. Do you have any suggestions?

Best

Shirley
