paccmann_rl

Pipeline to reproduce the results of the PaccMann^RL paper.

Description

This repo provides a conda environment and instructions to reproduce the pipeline described in the manuscript:

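  1. Train a multimodal drug sensitivity predictor (source code: https://github.com/PaccMann/paccmann_predictor)
  2. Train a generative model for omic profiles, also known as the PVAE (source code: https://github.com/PaccMann/paccmann_omics)
  3. Train a generative model for molecules, also known as the SVAE (source code: https://github.com/PaccMann/paccmann_chemistry)
  4. Train PaccMann^RL (source code: https://github.com/PaccMann/paccmann_generator)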

Requirements

  • conda>=3.7
  • The following data from here:
    • The processed train/test splits from the folder splitted_data
    • The processed gene expression data from GDSC: data/gene_expression/gdsc-rnaseq_gene-expression.csv
    • The processed SMILES of the drugs from GDSC: data/smiles/gdsc.smi
    • A pickled SMILESLanguage object (data/smiles_language_chembl_gdsc_ccle.pkl)
    • A pickled list of genes representing the panel considered in the paper (data/2128_genes.pkl)
    • A pickled pandas DataFrame containing expression values and metadata for the cell lines considered in the paper (data/gdsc_transcriptomics_for_conditional_generation.pkl)
  • The git repos linked in the previous section

NOTE: please refer to the README.md and to the manuscript for details on the datasets used and the preprocessing applied.
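
A quick way to sanity-check the pickled inputs after downloading is to load them and look at their type and size. The snippet below is a minimal sketch, not part of the repo: describe_pickle is a hypothetical helper name. Unpickling only works if the classes of the stored objects are importable in the active environment (for example pandas for the DataFrame), and pickles should only be loaded from a trusted source.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle and return a (type name, size) summary.

    Hypothetical helper for illustration. Loading a pickle can execute
    arbitrary code, so only use it on files from a trusted source.
    """
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)
    # len() works for lists and DataFrames; report None for anything else.
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # File names taken from the requirements list; missing files are skipped.
    for name in (
        "2128_genes.pkl",
        "gdsc_transcriptomics_for_conditional_generation.pkl",
    ):
        path = Path("data") / name
        if path.exists():
            print(name, describe_pickle(path))
```

For the gene panel, which is described above as a pickled list, the first element of the summary should be list.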

Setup

Install the environment

Create a conda environment:

conda env create -f conda.yml

Activate the environment:

conda activate paccmann_rl

Download data

Download the data listed in the requirements section. From now on, we assume that it is stored in a folder called data in the root of the repository, following this structure:

data
├── 2128_genes.pkl
├── gdsc-rnaseq_gene-expression.csv
├── gdsc.smi
├── gdsc_transcriptomics_for_conditional_generation.pkl
├── smiles_language_chembl_gdsc_ccle.pkl
└── splitted_data
    ├── gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv
    ├── gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv
    ├── tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv
    ├── tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv
    ├── test_chembl_22_clean_1576904_sorted_std_final.smi
    └── train_chembl_22_clean_1576904_sorted_std_final.smi

1 directory, 11 files

NOTE: no worries, the data folder is in the .gitignore.
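
Before starting a long training run it can help to confirm that the layout above is complete. The following is a minimal sketch under that assumption: the file names are copied from the listing, while missing_data_files is a hypothetical helper that is not shipped with the repo.

```python
from pathlib import Path

# Relative paths copied from the directory listing above.
EXPECTED_FILES = [
    "2128_genes.pkl",
    "gdsc-rnaseq_gene-expression.csv",
    "gdsc.smi",
    "gdsc_transcriptomics_for_conditional_generation.pkl",
    "smiles_language_chembl_gdsc_ccle.pkl",
    "splitted_data/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv",
    "splitted_data/gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv",
    "splitted_data/tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv",
    "splitted_data/tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv",
    "splitted_data/test_chembl_22_clean_1576904_sorted_std_final.smi",
    "splitted_data/train_chembl_22_clean_1576904_sorted_std_final.smi",
]


def missing_data_files(data_root="data"):
    """Return the expected files that are not present under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected data files are in place.")
```

Run it from the root of the repository so that the relative data path resolves correctly.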

Clone the repos

To get the scripts that run each of the components, create a code folder and clone the repos into it:

mkdir code && cd code && \
  git clone https://github.com/PaccMann/paccmann_predictor && \
  git clone https://github.com/PaccMann/paccmann_omics && \
  git clone https://github.com/PaccMann/paccmann_chemistry && \
  git clone https://github.com/PaccMann/paccmann_generator && \
  cd ..

NOTE: no worries, the code folder is in the .gitignore.

Pipeline

Everything is now set to run the full pipeline.

NOTE: the workload required to run the full pipeline is intensive, and running all the steps on a desktop or laptop might not be straightforward. For this reason, we also provide pretrained models that can be downloaded and used to run the different steps.

NOTE: in the following, we assume a folder models has been created in the root of the repository. No worries, the models folder is in the .gitignore.
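
The last step of the pipeline reads the three component models from ./models/svae, ./models/pvae and ./models/paccmann. A small check like the one below shows which of them are already available, whether trained locally or downloaded. It is a sketch only: model_status is a hypothetical helper, and it assumes that pretrained models are unpacked into the same subfolder names.

```python
from pathlib import Path

# Subfolder names read by the PaccMann^RL training command.
MODEL_NAMES = ("paccmann", "pvae", "svae")


def model_status(models_root="models"):
    """Create models_root if needed and report which model folders exist."""
    root = Path(models_root)
    root.mkdir(parents=True, exist_ok=True)
    return {name: (root / name).is_dir() for name in MODEL_NAMES}


if __name__ == "__main__":
    for name, present in model_status().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Any model reported as missing has to be trained with the corresponding step below, or downloaded, before running PaccMann^RL.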

Multimodal drug sensitivity predictor

(paccmann_rl) $ python ./code/paccmann_predictor/examples/train_paccmann.py \
    ./data/splitted_data/gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv \
    ./data/splitted_data/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv \
    ./data/gdsc-rnaseq_gene-expression.csv \
    ./data/gdsc.smi \
    ./data/2128_genes.pkl \
    ./data/smiles_language_chembl_gdsc_ccle.pkl \
    ./models/ \
    ./code/paccmann_predictor/examples/example_params.json paccmann

PVAE

(paccmann_rl) $ python ./code/paccmann_omics/examples/train_vae.py \
    ./data/splitted_data/tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv \
    ./data/splitted_data/tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv \
    ./data/2128_genes.pkl \
    ./models/ \
    ./code/paccmann_omics/examples/example_params.json pvae

SVAE

(paccmann_rl) $ python ./code/paccmann_chemistry/examples/train_vae.py \
    ./data/splitted_data/train_chembl_22_clean_1576904_sorted_std_final.smi \
    ./data/splitted_data/test_chembl_22_clean_1576904_sorted_std_final.smi \
    ./data/smiles_language_chembl_gdsc_ccle.pkl \
    ./models/ \
    ./code/paccmann_chemistry/examples/example_params.json svae

PaccMann^RL

(paccmann_rl) $ python ./code/paccmann_generator/examples/train_paccmann_rl.py \
    ./models/svae \
    ./models/pvae \
    ./models/paccmann \
    ./data/smiles_language_chembl_gdsc_ccle.pkl \
    ./data/gdsc_transcriptomics_for_conditional_generation.pkl \
    ./code/paccmann_generator/examples/example_params.json \
    paccmann_rl breast

NOTE: this will create a biased_models folder containing the conditional generator and the baseline SMILES generator used. In this case: breast_paccmann_rl and baseline. No worries, the biased_models folder is in the .gitignore.
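
The four commands above form a chain: the first three each train a model under ./models/, and the last one consumes all three. The sketch below assembles the same argument lists in Python so that they can be inspected or launched in order. It is an illustration rather than part of the repo: pipeline_commands is a hypothetical helper, and the parameter name site for the final argument (breast above) is an assumption.

```python
import sys

SPLIT = "data/splitted_data"
TCGA_ID = "242870585127480531622270373503581547167"
CHEMBL = "chembl_22_clean_1576904_sorted_std_final.smi"
LANGUAGE = "data/smiles_language_chembl_gdsc_ccle.pkl"
GENES = "data/2128_genes.pkl"


def _script(repo, name):
    return f"code/{repo}/examples/{name}"


def _params(repo):
    return f"code/{repo}/examples/example_params.json"


def pipeline_commands(python=sys.executable, site="breast"):
    """Return (step name, argv) pairs in the order they have to run."""
    return [
        ("paccmann", [
            python, _script("paccmann_predictor", "train_paccmann.py"),
            f"{SPLIT}/gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv",
            f"{SPLIT}/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv",
            "data/gdsc-rnaseq_gene-expression.csv", "data/gdsc.smi",
            GENES, LANGUAGE, "models/",
            _params("paccmann_predictor"), "paccmann",
        ]),
        ("pvae", [
            python, _script("paccmann_omics", "train_vae.py"),
            f"{SPLIT}/tcga_rnaseq_train_fraction_0.9_id_{TCGA_ID}_seed_42.csv",
            f"{SPLIT}/tcga_rnaseq_test_fraction_0.1_id_{TCGA_ID}_seed_42.csv",
            GENES, "models/", _params("paccmann_omics"), "pvae",
        ]),
        ("svae", [
            python, _script("paccmann_chemistry", "train_vae.py"),
            f"{SPLIT}/train_{CHEMBL}", f"{SPLIT}/test_{CHEMBL}",
            LANGUAGE, "models/", _params("paccmann_chemistry"), "svae",
        ]),
        # Depends on the three models trained (or downloaded) above.
        ("paccmann_rl", [
            python, _script("paccmann_generator", "train_paccmann_rl.py"),
            "models/svae", "models/pvae", "models/paccmann", LANGUAGE,
            "data/gdsc_transcriptomics_for_conditional_generation.pkl",
            _params("paccmann_generator"), "paccmann_rl", site,
        ]),
    ]


if __name__ == "__main__":
    for step, argv in pipeline_commands():
        # Dry run: print each command instead of launching the training.
        print(f"[{step}] " + " ".join(argv))
```

To actually launch a step, pass its argument list to subprocess.run(argv, check=True) from the root of the repository, with the paccmann_rl environment active. Passing a list instead of a single shell string also avoids problems with line continuations.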

References

If you use paccmann_rl in your projects, please cite the following:

@misc{born2019paccmannrl,
    title={PaccMann^RL: Designing anticancer drugs from transcriptomic data via reinforcement learning},
    author={Jannis Born and Matteo Manica and Ali Oskooei and Joris Cadow and María Rodríguez Martínez},
    year={2019},
    eprint={1909.05114},
    archivePrefix={arXiv},
    primaryClass={q-bio.BM}
}
