seffnet / seffnet

Network representation learning on drug-target-side effects-indication graphs for side effect prediction

Home Page: https://seffnet.readthedocs.io

License: MIT License

Python 11.61% Jupyter Notebook 88.04% HTML 0.32% Dockerfile 0.03%
machine-learning network-representation-learning side-effects

seffnet's Introduction

SEffNet

SEffNet (Side Effect Network embeddings) is a tool that optimizes, trains, and evaluates predictive models for biomedical networks that contain drug-, target- and side effect-information using different network representation learning methods in an attempt to understand the causes of side effects.

This package was developed during the master's thesis of Rana Aldisi.

Structure

  • notebooks: Notebooks that were used for training and evaluating the models, and for interpreting the prediction model
  • resources: The graphs and materials that are used for training and testing

Installation

seffnet can be installed on Python 3.7+ from the latest code on GitHub with:

$ pip install git+https://github.com/seffnet/seffnet.git

Usage

Using the predictive model

If you've installed seffnet locally, you can use the default model from the GitHub repository with:

from seffnet.default_predictor import predictor

# Find new relations for a given entity based on its CURIE
results = predictor.find_new_relations(curie='pubchem.compound:5095')
...

You can get the embeddings for phenotype entities with

import itertools as itt
from seffnet.default_predictor import predictor

phenotype_to_embedding = {
    node_data['identifier']: predictor.embeddings[node_id]
    for node_id, node_data in predictor.node_id_to_info.items()
    if node_data['namespace'] == 'umls'
}
# could use sklearn.metrics.pairwise.cosine_similarity on the values in this dict
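
The comment above suggests comparing phenotype embeddings by cosine similarity. The following is a minimal, dependency-free sketch of that idea. The `toy_embeddings` dict is a made-up stand-in for `phenotype_to_embedding` (its identifiers and vectors are hypothetical), and returning 0.0 for zero vectors is a convention chosen here, not something the package defines.

```python
import itertools as itt
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


# Hypothetical stand-in for phenotype_to_embedding
toy_embeddings = {
    'C0000001': [1.0, 0.0, 0.0],
    'C0000002': [0.0, 1.0, 0.0],
    'C0000003': [2.0, 0.0, 0.0],
}

# Similarity for every unordered pair of phenotypes
pairwise = {
    (a, b): cosine_similarity(toy_embeddings[a], toy_embeddings[b])
    for a, b in itt.combinations(sorted(toy_embeddings), 2)
}
```

With the real dictionary, `sklearn.metrics.pairwise.cosine_similarity` applied to the stacked values would compute the same quantity for all pairs at once.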

You can use the default model in the CLI:

$ seffnet predict pubchem.compound:5095

You can predict on new chemicals via their SMILES strings based on their similarity to chemicals included in the network. Warning: we haven't benchmarked how well this actually works yet.

$ seffnet predictc "C1=CC=C(C=C1)C2=CC=C(C=C2)CCO"

Rebuilding the resources

You can rebuild all the graphs and maps created for this project by running the following:

$ seffnet rebuild

Note that this command requires the RDKit package to be installed in your environment.

Model training and evaluation

You can train an NRL model using the following:

$ seffnet train --input-path ./resources/basic_graphs/fullgraph_with_chemsim.edgelist --evaluation --method node2vec
  • For further CLI options and parameters, use --help or -h

Optimizing hyperparameters

Network representation learning models can be optimized with:

$ seffnet optimize --input-path ./resources/basic_graphs/fullgraph_with_chemsim.edgelist --method node2vec
  • For further CLI options and parameters, use --help or -h

Web Application

The web application allows users to get results from the model programmatically. Make sure the extra dependencies have been installed as well using the [web] extra. Unfortunately, this doesn't work when installing directly from GitHub, so see the setup.cfg for the Flask dependencies.

$ pip install -e .[web]

Run development server with:

$ seffnet web --host localhost --port 5000

Run through docker with:

$ docker-compose up

As an example, you can check the chemicals predicted to interact with HDAC6 at http://localhost:5000/predict/uniprot:Q9UBN7?results_type=chemical.
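
Since the web application is meant for programmatic access, a small helper for building request URLs can be handy. This sketch only constructs the URL shown in the example above. The helper name `build_predict_url` is hypothetical, and the format of the response is not documented here, so no parsing is attempted.

```python
from urllib.parse import quote, urlencode


def build_predict_url(curie, results_type=None, host='localhost', port=5000):
    """Build the URL of the /predict endpoint for a given CURIE."""
    # Keep ':' and '.' unescaped so CURIEs stay readable in the path
    url = f'http://{host}:{port}/predict/{quote(curie, safe=":.")}'
    if results_type is not None:
        url += '?' + urlencode({'results_type': results_type})
    return url


url = build_predict_url('uniprot:Q9UBN7', results_type='chemical')
print(url)
```

With the development server running, the resulting URL could be fetched with `urllib.request.urlopen(url)` or any other HTTP client.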

seffnet's People

Contributors

aldisirana, cthoyt, ddomingof

seffnet's Issues

Validation of chemical embedding imputation

For each chemical, impute its embedding (after throwing away the 1.0 similarity to itself).

Scenario 1: Global Losses

Calculate a loss of that embedding against all other embeddings. This would allow for the goodness of different imputation procedures to be compared, but isn't so easy to assess on its own.

Scenario 2: Standard Evaluation

How good is the imputed embedding at predicting the real edges that were already in the network? Report true positive rate and false negative rate.

Negative sampling could also be used to allow reporting of MCC, ROC-AUC, PR-AUC, and other metrics.
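
As a rough illustration of the metrics named in this scenario, the sketch below computes the true positive rate, false negative rate, and MCC from binary labels. The label vectors are toy data. Without sampled negatives only the first two rates are defined, which is why MCC needs the negative sampling mentioned above.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def true_positive_rate(tp, fn):
    """Fraction of real edges that were recovered."""
    return tp / (tp + fn) if tp + fn else 0.0


def false_negative_rate(tp, fn):
    """Fraction of real edges that were missed."""
    return fn / (tp + fn) if tp + fn else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Toy data: four real edges followed by four sampled negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
tpr = true_positive_rate(tp, fn)
fnr = false_negative_rate(tp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```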

Update imports

I'm getting two warnings from sklearn that we should prepare to fix:

/Users/cthoyt/.virtualenvs/seffnet/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.linear_model.logistic module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
  warnings.warn(message, FutureWarning)
/Users/cthoyt/.virtualenvs/seffnet/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.21.3 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.

Implement imputation of chemical embeddings

  1. Use element-wise average of all other chemicals weighted by similarity
  2. Do a "validation" where for each chemical you calculate the embedding then use a loss function to see how far away it is from the real one
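
The two steps above can be sketched as follows. Two assumptions are made here that the issue does not specify: the weighted average is normalized by the sum of the similarities, and Euclidean distance is used as the loss. The chemical names and vectors are toy values.

```python
import math


def impute_embedding(similarities, embeddings):
    """Similarity-weighted element-wise average of known embeddings.

    similarities maps each other chemical to its similarity with the
    target chemical (the 1.0 self-similarity is assumed to be removed).
    """
    total = sum(similarities.values())
    if total == 0:
        raise ValueError('no similar chemicals to impute from')
    dimension = len(next(iter(embeddings.values())))
    imputed = [0.0] * dimension
    for chemical, similarity in similarities.items():
        for i, value in enumerate(embeddings[chemical]):
            imputed[i] += similarity * value
    return [x / total for x in imputed]


def euclidean_loss(u, v):
    """Distance between the imputed and the real embedding."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy data: impute C from A and B, then compare with its real embedding
embeddings = {'A': [1.0, 0.0], 'B': [0.0, 1.0], 'C': [0.5, 0.5]}
similarities_to_c = {'A': 0.75, 'B': 0.25}
imputed_c = impute_embedding(similarities_to_c, embeddings)
loss_c = euclidean_loss(imputed_c, embeddings['C'])
```

Repeating this for every chemical and averaging the losses would give a single number for comparing imputation procedures.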

Update requirements (install fail on dateutil)

Got an error during installation

(seffnet) sichom@sichom-Precision-T7600:~/projects/seffnet$ seffnet web
Traceback (most recent call last):
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (python-dateutil 2.8.1 (/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages), Requirement.parse('python-dateutil<2.8.1,>=2.1; python_version >= "2.7"'), {'botocore'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sichom/software/miniconda/envs/seffnet/bin/seffnet", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3251, in <module>
    @_call_aside
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 585, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (python-dateutil 2.8.1 (/home/sichom/software/miniconda/envs/seffnet/lib/python3.7/site-packages), Requirement.parse('python-dateutil<2.8.1,>=2.1; python_version >= "2.7"'), {'botocore'})

Refactor CLI

  • Click options that are reused can be saved as variables at the top of the file
  • No business logic in the CLI functions! Only handling user/file input/output. That means that there should be a wrapper function that takes care of all of the if statements living somewhere else. This is the case for optimize and train
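
One way the two points above could look in practice is sketched below. The option names mirror the flags used in the usage section, but `run_training` and `run_optimization` are hypothetical wrappers with placeholder bodies, not functions from the package.

```python
import click
from click.testing import CliRunner

# Reusable options, defined once at the top of the module
input_path_option = click.option('--input-path', required=True, help='Path to the edge list')
method_option = click.option('--method', default='node2vec', show_default=True, help='NRL method')


def run_training(input_path, method, evaluation):
    """Hypothetical wrapper: all branching lives here, not in the CLI."""
    mode = 'train+evaluate' if evaluation else 'train'
    return f'{mode} {method} on {input_path}'


def run_optimization(input_path, method):
    """Hypothetical wrapper for hyperparameter optimization."""
    return f'optimize {method} on {input_path}'


@click.group()
def main():
    """Toy command group used only for this sketch."""


@main.command()
@input_path_option
@method_option
@click.option('--evaluation', is_flag=True, help='Also evaluate the model')
def train(input_path, method, evaluation):
    """Only pass user input through and print the result."""
    click.echo(run_training(input_path, method, evaluation))


@main.command()
@input_path_option
@method_option
def optimize(input_path, method):
    """Only pass user input through and print the result."""
    click.echo(run_optimization(input_path, method))


# The thin commands can be exercised without a shell
result = CliRunner().invoke(main, ['train', '--input-path', 'g.edgelist', '--evaluation'])
```

Keeping the wrappers free of Click also means they can be called directly from notebooks and unit tests.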

Online embedding of new chemicals

What happens when we want to make a prediction about a new chemical that isn't already in the graph?

  1. Given a PubChem compound identifier, look up the SMILES string. Maybe also consider other identifier-to-SMILES resolvers, or allow direct input of SMILES
  2. Convert that SMILES string to a descriptor vector, same as how the graph was created.
  3. Calculate similarities to all chemicals in the graph using the same cutoff as before
  4. Do some random walks for this new chemical
  5. Stick those random walks in the Word2Vec model and generate the embedding

I think this means we need a more robust way of storing the fingerprints of all of the chemicals in the given graph, and also to store the word2vec model in addition to all of the other things
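
Steps 3 and 4 can be illustrated without any chemistry dependencies. In this sketch, fingerprints are assumed to be stored as sets of on-bit indexes, Tanimoto similarity stands in for whatever measure was used to build the graph, and the walks are plain uniform random walks rather than node2vec's biased walks. The node names, fingerprints, and cutoff are all toy values.

```python
import random


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def attach_new_chemical(graph, fingerprints, new_id, new_bits, cutoff):
    """Add new_id to the adjacency dict, linked to sufficiently similar chemicals."""
    neighbors = [
        chemical
        for chemical, bits in fingerprints.items()
        if tanimoto(new_bits, bits) >= cutoff
    ]
    graph[new_id] = set(neighbors)
    for chemical in neighbors:
        graph[chemical].add(new_id)
    return neighbors


def random_walks(graph, start, n_walks, walk_length, rng):
    """Generate uniform random walks that all begin at the start node."""
    walks = []
    for _ in range(n_walks):
        walk = [start]
        while len(walk) < walk_length:
            options = sorted(graph[walk[-1]])
            if not options:
                break  # dead end: keep the shorter walk
            walk.append(rng.choice(options))
        walks.append(walk)
    return walks


# Toy graph: two chemicals sharing one protein target
graph = {'chem:1': {'prot:1'}, 'chem:2': {'prot:1'}, 'prot:1': {'chem:1', 'chem:2'}}
fingerprints = {'chem:1': {1, 2, 3, 4}, 'chem:2': {7, 8, 9}}
linked = attach_new_chemical(graph, fingerprints, 'chem:new', {1, 2, 3, 5}, cutoff=0.5)
walks = random_walks(graph, 'chem:new', n_walks=5, walk_length=4, rng=random.Random(0))
```

The resulting walks are what would be fed to the stored Word2Vec model in step 5, which is why both the fingerprints and the model need to be persisted.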

Use CURIEs instead of node identifiers

You spent a lot of effort to assign numeric identifiers to each node (which we then treat as strings)... would it be possible to skip all of that and just use the CURIEs directly? Or do these numbers correspond to indexes in some matrices somewhere?
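
If the numbers are only needed as matrix row indexes, one possible arrangement is to key everything by CURIE and derive the indexes on demand. This is a sketch of that idea, not a description of how the package currently stores nodes. The third CURIE in the list is a made-up example.

```python
def parse_curie(curie):
    """Split a CURIE into its prefix and local identifier."""
    prefix, identifier = curie.split(':', 1)
    return prefix, identifier


curies = [
    'pubchem.compound:5095',
    'uniprot:Q9UBN7',
    'umls:C0000001',  # hypothetical identifier
]

# Sorting makes the index assignment reproducible across runs
curie_to_index = {curie: index for index, curie in enumerate(sorted(curies))}
index_to_curie = {index: curie for curie, index in curie_to_index.items()}
```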

Add weight to edges from SIDER

  • Update Bio2BEL package to propagate through frequency information
  • Make a mapping function from those frequencies to continuous real number space.
  • Make new export of SIDER edge list including weighting (this will need a new format, right?)

We'll probably have to do some iteration through several different functions to see which works best
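
As a starting point for that iteration, here is one candidate mapping. It assumes frequency annotations arrive either as qualitative labels or as percentages and percentage ranges. The numeric values assigned to the labels, the log-scale squashing, and the tab-separated output line are all placeholders to experiment with, not an established format.

```python
import math

# Hypothetical frequencies for qualitative labels; values are placeholders
LABEL_TO_FREQUENCY = {
    'very rare': 0.0001,
    'rare': 0.001,
    'infrequent': 0.01,
    'frequent': 0.1,
    'very frequent': 0.5,
}


def parse_frequency(raw):
    """Map an annotation to a fraction in [0, 1], or None if unrecognized.

    Handles labels ("rare"), percentages ("12%"), and ranges
    ("1-5%", "1% to 5%"), using the midpoint for ranges.
    """
    text = raw.strip().lower()
    if text in LABEL_TO_FREQUENCY:
        return LABEL_TO_FREQUENCY[text]
    if '%' in text:
        body = text.replace('%', '').replace(' to ', '-')
        try:
            parts = [float(part) for part in body.split('-')]
        except ValueError:
            return None
        return sum(parts) / len(parts) / 100.0
    return None


def frequency_to_weight(frequency, floor=1e-4):
    """Squash a frequency onto [0, 1] on a log scale (floor maps to 0)."""
    clipped = min(max(frequency, floor), 1.0)
    return 1.0 - math.log10(clipped) / math.log10(floor)


def format_weighted_edge(source, target, weight):
    """Render one line of a tab-separated weighted edge list."""
    return f'{source}\t{target}\t{weight:.3f}'


# Toy edge with hypothetical node names
line = format_weighted_edge('drug:1', 'phenotype:1', frequency_to_weight(parse_frequency('1%')))
```

Swapping `frequency_to_weight` for a linear or rank-based function only requires changing that one function, which keeps comparisons between mappings cheap.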

Add weight to edges from DrugBank

Check if DrugBank sucks in nominal IC50 or Ki values. If so, we're done. Otherwise:

  • Map drugs to ChEMBL identifiers
  • Map drugs' targets to ChEMBL identifiers
  • Look up pChEMBL values for drug-target interactions using the ChEMBL Python package
  • Generate new edge list using that as the weighting
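
If nominal IC50 or Ki values are available instead, they can be put on the same scale, since pChEMBL is the negative base-10 logarithm of the molar activity value. The sketch below assumes the input is given in nanomolar. The function name is hypothetical.

```python
import math


def nanomolar_to_pchembl(value_nm):
    """Convert an activity value in nM to the pChEMBL scale (-log10 of molar)."""
    if value_nm <= 0:
        raise ValueError('activity value must be positive')
    # 1 nM is 1e-9 M, so -log10(value_nm * 1e-9) equals 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


weight_potent = nanomolar_to_pchembl(10.0)    # 10 nM
weight_weak = nanomolar_to_pchembl(1000.0)    # 1 µM
```

Higher values mean stronger binding, so the result can be used directly as an edge weight or rescaled first.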

Seed shouldn't be set during optimization
