Drug Sensitivity Prediction From Cell Line-Based Pharmacogenomics Data: Guidelines for Developing Machine Learning Models

Table of contents

  1. Installation
  2. Datasets
  3. Experiments
  4. Citation

Installation

Requirements

  • Python 3
  • Conda

To get the PGx Guidelines source files, clone the repository:

git clone https://github.com/bhklab/PGx_Guidelines

Conda environment

All packages required to run the PGx Guidelines experiments are specified in the environment subdirectory. To install them, run the following command:

conda env create -f PGx.yml

This command creates the PGxG environment.

Once the packages have been installed into the environment, you can load them using conda activate.

Datasets

Download datasets

All datasets used in the PGx Guidelines experiments are publicly available in PSet format via the ORCESTRA platform:

https://www.orcestra.ca/pset/stats

Preprocess and load datasets

After downloading the PSet objects, extract the molecular and pharmacological data in R using the scripts provided in the Preprocess data subdirectory.

To load all datasets and the area above the dose-response curve (AAC) data, run LoadAllPSets.R.
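To make the AAC measure concrete, here is a minimal sketch that computes a normalised area above a viability curve with the trapezoidal rule over log10 concentration. It is an illustration only: the function name aac_trapezoid and the toy values are hypothetical, and the R scripts in this repository may derive AAC differently (for example, from a fitted curve rather than from raw points).

```python
import math


def aac_trapezoid(concentrations_um, viabilities):
    """Normalised area above a viability curve (hypothetical helper).

    concentrations_um: tested concentrations (any positive unit).
    viabilities: viability fractions, expected in [0, 1].
    Returns a value in [0, 1]; larger means more sensitive.
    """
    pairs = sorted(zip(concentrations_um, viabilities))
    if len(pairs) < 2:
        raise ValueError("need at least two dose points")
    xs = [math.log10(c) for c, _ in pairs]
    # Clamp viabilities so noisy measurements stay within [0, 1].
    ys = [min(max(v, 0.0), 1.0) for _, v in pairs]
    area_above = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        # Trapezoid on (1 - viability), i.e. the area above the curve.
        area_above += width * ((1.0 - ys[i - 1]) + (1.0 - ys[i])) / 2.0
    return area_above / (xs[-1] - xs[0])


# Toy dose-response data, invented for illustration.
doses = [0.01, 0.1, 1.0, 10.0]
viability = [1.0, 0.9, 0.5, 0.2]
print(round(aac_trapezoid(doses, viability), 3))
```

Dividing by the tested log-concentration range keeps the value between 0 and 1, which makes drugs tested over different ranges easier to compare.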

To load log-transformed and truncated IC50 values, run IC50Loading_logtruncated.R.
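As a rough illustration of what log-transforming and truncating IC50 values can look like, the sketch below caps each IC50 at the maximum tested concentration and then takes log10. Both the capping rule and the name log_truncate_ic50 are assumptions made for this example; the exact procedure is defined in IC50Loading_logtruncated.R and may differ.

```python
import math


def log_truncate_ic50(ic50, max_tested_conc):
    """Cap an IC50 at the maximum tested concentration, then take log10.

    Hypothetical helper: the truncation rule here is an assumption,
    not necessarily the rule used by the repository's R script.
    """
    if ic50 <= 0 or max_tested_conc <= 0:
        raise ValueError("concentrations must be positive")
    truncated = min(ic50, max_tested_conc)
    return math.log10(truncated)


# Toy values, invented for illustration (same unit for both arguments).
for raw in (0.1, 3.0, 250.0):
    print(raw, "->", log_truncate_ic50(raw, max_tested_conc=10.0))
```

Truncation matters because extrapolated IC50 values beyond the tested range can be arbitrarily large and would otherwise dominate a regression loss.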

The tissueType_encoding.csv file contains a one-hot encoding of tissue types, which is appended to the molecular profiles to adjust for tissue type.
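The idea of appending a one-hot tissue encoding to molecular profiles can be sketched as follows. The cell line names, tissue labels, and both helper functions are hypothetical; the layout of tissueType_encoding.csv itself is not assumed here.

```python
def one_hot_tissues(tissue_by_cell_line):
    """Return (sorted tissue names, {cell_line: one-hot list})."""
    tissues = sorted(set(tissue_by_cell_line.values()))
    index = {tissue: i for i, tissue in enumerate(tissues)}
    encoded = {}
    for cell_line, tissue in tissue_by_cell_line.items():
        row = [0] * len(tissues)
        row[index[tissue]] = 1
        encoded[cell_line] = row
    return tissues, encoded


def append_tissue_covariates(profiles, encoded):
    """Concatenate each molecular profile with its tissue one-hot vector."""
    return {
        cell_line: list(values) + encoded[cell_line]
        for cell_line, values in profiles.items()
        if cell_line in encoded  # skip cell lines without a tissue label
    }


# Toy data, invented for illustration.
tissue_by_cell_line = {"CL1": "lung", "CL2": "breast", "CL3": "lung"}
profiles = {"CL1": [2.1, 0.3], "CL2": [1.7, 0.9], "CL3": [0.4, 1.2]}

tissues, encoded = one_hot_tissues(tissue_by_cell_line)
augmented = append_tissue_covariates(profiles, encoded)
print(tissues)
print(augmented["CL1"])
```

With the tissue indicator columns included as features, a linear model can absorb tissue-level shifts in drug response instead of attributing them to individual genes.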

Running the R scripts generates the final datasets in .tsv format. Place them in a new subdirectory named Data_All:

mkdir Data_All

Once this subdirectory contains all the data files, you can re-run the PGx Guidelines experiments. Alternatively, the preprocessed files are also available on Zenodo.
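For readers who want to inspect the generated files before running the experiments, here is a minimal sketch that reads every .tsv file in Data_All using only the Python standard library. It assumes each file has a header row and row names in the first column, which may not match the files produced by the R scripts; the function names are hypothetical.

```python
import csv
from pathlib import Path


def read_tsv_matrix(path):
    """Read a TSV with a header row and row names in the first column.

    Returns (column names, {row name: list of floats or None}).
    Empty cells and "NA" are mapped to None.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        columns = header[1:]
        rows = {}
        for record in reader:
            if not record:
                continue
            rows[record[0]] = [
                float(cell) if cell not in ("", "NA") else None
                for cell in record[1:]
            ]
    return columns, rows


def load_data_dir(data_dir="Data_All"):
    """Load every .tsv file in data_dir, keyed by file name without suffix."""
    return {
        path.stem: read_tsv_matrix(path)
        for path in sorted(Path(data_dir).glob("*.tsv"))
    }


if Path("Data_All").is_dir():
    for name, (columns, rows) in load_data_dir().items():
        print(name, len(rows), "rows x", len(columns), "columns")
```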

Experiments

Run univariable analysis

Each R script includes the code needed to load its required libraries and datasets.

Run the script that matches the analysis you want:

  • all tissues (solid and non-solid):
Rscript biomarker_analysis_alltissues.R "$@"
  • solid tissues only (non-solid tissues excluded):
Rscript biomarker_analysis_solidonly.R "$@"
  • solid tissues only, with log-transformed IC50 values:
Rscript biomarker_analysis_log.R "$@"
  • solid tissues only, with truncated IC50 values:
Rscript biomarker_analysis_truncated.R "$@"
  • solid tissues only, with truncated and log-transformed IC50 values:
Rscript biomarker_analysis_truncated_log.R "$@"
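The general shape of a univariable analysis is to score each molecular feature against the drug response one at a time. The sketch below uses Pearson correlation as the per-feature statistic. This is an assumption for illustration: the R scripts above may use a different statistic or adjust for covariates, and the gene names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def rank_features(features, response):
    """Score every feature on its own and sort by absolute correlation.

    Features with undefined correlation (constant values) go last.
    """
    scores = [(name, pearson(values, response)) for name, values in features.items()]
    return sorted(scores, key=lambda item: (math.isnan(item[1]), -abs(item[1])))


# Toy data, invented for illustration: one value per cell line.
response = [0.1, 0.4, 0.5, 0.9]
features = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 1.0, 4.0, 3.0],
    "geneC": [1.0, 1.0, 1.0, 1.0],
}

for name, r in rank_features(features, response):
    print(name, r)
```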

Within-domain

For this analysis, we provide the following Python scripts, each submitted through its batch script with sbatch:

  • Ridge Regression: Within-Ridge-aac.py and Within-Ridge-ic50.py
sbatch ridge-wjob-aac.bs
sbatch ridge-wjob-ic50.bs
  • Elastic Net: Within-EN-aac.py and Within-EN-ic50.py
sbatch en-wjob-aac.bs
sbatch en-wjob-ic50.bs
  • Random Forest: Within-RF-aac.py and Within-RF-ic50.py
sbatch rf-wjob-aac.bs
sbatch rf-wjob-ic50.bs
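A within-domain evaluation trains and tests on cell lines from the same dataset. The sketch below shows one way to do this: closed-form ridge regression scored by k-fold cross-validation with Pearson correlation. It is not the code in the Within-Ridge scripts. The function names, the synthetic data, the fold count, and the choice of metric are all assumptions made for the example.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def kfold_pearson(X, y, alpha=1.0, n_splits=5, seed=0):
    """Mean Pearson correlation between predictions and held-out responses."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        coef, intercept = ridge_fit(X[train_idx], y[train_idx], alpha)
        pred = X[test_idx] @ coef + intercept
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))


# Synthetic stand-in for one dataset: 100 cell lines, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=100)

cv_score = kfold_pearson(X, y, alpha=1.0)
print("mean held-out Pearson r:", cv_score)
```

In practice the regularisation strength alpha would itself be tuned, ideally inside each training fold so that the held-out fold never influences model selection.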

Cross-domain

For this analysis, we provide Jupyter notebooks for Ridge Regression (Ridge.ipynb), Elastic Net (ElasticNet.ipynb), and Random Forest (RandomForest.ipynb).

For the Deep Neural Network experiments, we provide Python scripts in the DNN subdirectory. Before running them, create directories to store logs, models, and results, and add your local paths to these directories to PGxGRun.bs:

mkdir logs
mkdir models
mkdir results
sbatch PGxGRun.bs

We have also provided randomly generated hyperparameter settings in filelistF10Uniquev1.
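Randomly generated hyperparameter settings of this kind are typically produced by sampling each hyperparameter independently and discarding duplicates. The sketch below illustrates that idea only. The search space, parameter names, and the number of settings are hypothetical and are not taken from filelistF10Uniquev1.

```python
import random

# Hypothetical search space, invented for illustration.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_units": [64, 128, 256, 512],
    "dropout": [0.1, 0.3, 0.5],
    "epochs": [50, 100, 200],
}


def sample_unique_settings(space, n_settings, seed=0):
    """Draw n_settings distinct hyperparameter combinations at random."""
    keys = sorted(space)
    total = 1
    for key in keys:
        total *= len(space[key])
    if n_settings > total:
        raise ValueError("search space has fewer combinations than requested")
    rng = random.Random(seed)  # fixed seed keeps the list reproducible
    seen = set()
    settings = []
    while len(settings) < n_settings:
        choice = tuple(rng.choice(space[key]) for key in keys)
        if choice not in seen:
            seen.add(choice)
            settings.append(dict(zip(keys, choice)))
    return settings


settings = sample_unique_settings(SEARCH_SPACE, n_settings=10)
for setting in settings:
    print(setting)
```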

We have provided the model objects for the best settings of DNN experiments on Zenodo.
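The cross-domain setting in this section trains a model on one dataset and evaluates it on another. A minimal sketch of that workflow is shown below: restrict both datasets to their shared features, fit on the source, and score on the target. The gene names, synthetic data, ridge model, and Pearson metric are assumptions for illustration and do not reproduce the notebooks or DNN scripts.

```python
import numpy as np


def align_features(genes_a, X_a, genes_b, X_b):
    """Keep only features present in both datasets, in the same order."""
    shared = sorted(set(genes_a) & set(genes_b))
    idx_a = [genes_a.index(gene) for gene in shared]
    idx_b = [genes_b.index(gene) for gene in shared]
    return shared, X_a[:, idx_a], X_b[:, idx_b]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def cross_domain_pearson(X_src, y_src, X_tgt, y_tgt, alpha=1.0):
    """Train on the source dataset, report Pearson r on the target dataset."""
    coef, intercept = fit_ridge(X_src, y_src, alpha)
    pred = X_tgt @ coef + intercept
    return float(np.corrcoef(pred, y_tgt)[0, 1])


# Synthetic stand-ins for two datasets with partly overlapping genes.
rng = np.random.default_rng(1)
genes_src = ["g1", "g2", "g3", "g4"]
genes_tgt = ["g3", "g2", "g5", "g1"]
weights = {"g1": 1.0, "g2": -2.0, "g3": 0.5, "g4": 0.0, "g5": 0.0}

X_src = rng.normal(size=(80, 4))
X_tgt = rng.normal(size=(40, 4))
y_src = X_src @ np.array([weights[g] for g in genes_src]) + 0.1 * rng.normal(size=80)
y_tgt = X_tgt @ np.array([weights[g] for g in genes_tgt]) + 0.1 * rng.normal(size=40)

shared, X_src_shared, X_tgt_shared = align_features(genes_src, X_src, genes_tgt, X_tgt)
score = cross_domain_pearson(X_src_shared, y_src, X_tgt_shared, y_tgt)
print(shared, score)
```

Real datasets differ in more than their feature sets (assays, dose ranges, cell line panels), so cross-domain scores are usually lower than within-domain ones; the synthetic example shares one generating model and therefore does not show that gap.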

CTRPv2 vs. GDSCv1

For this analysis, we have provided the Jupyter notebook GDSCv1.ipynb.

Impact of non-solid cell lines

For this analysis, we provide the Jupyter notebook SolidandnonSolid.ipynb. To run the random subset experiment, run the SNRidge-aac.py script:

python SNRidge-aac.py

Citation

    author = {Sharifi-Noghabi, Hossein and Jahangiri-Tazehkand, Soheil and Smirnov, Petr and Hon, Casey and Mammoliti, Anthony and Nair, Sisira Kadambat and Mer, Arvind Singh and Ester, Martin and Haibe-Kains, Benjamin},
    title = "{Drug sensitivity prediction from cell line-based pharmacogenomics data: guidelines for developing machine learning models}",
    journal = {Briefings in Bioinformatics},
    year = {2021},
    month = {08},
    issn = {1477-4054},
    doi = {10.1093/bib/bbab294},
    url = {https://doi.org/10.1093/bib/bbab294},
    note = {bbab294},
    eprint = {https://academic.oup.com/bib/advance-article-pdf/doi/10.1093/bib/bbab294/39679532/bbab294.pdf},
}
