paccmann / paccmann_rl

Code pipeline for the PaccMann^RL in iScience: https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6

License: MIT License

deep-learning generative-models de-novo-drug-design drug-discovery transcriptomics drug-sensitivity

paccmann_rl's Introduction

DISCLAIMER:

This code provides the TensorFlow implementation of PaccMann as described in our paper in Molecular Pharmaceutics.

PaccMann

paccmann is a package for drug sensitivity prediction and is the core component of the repo.

The package provides a toolbox of learning models for IC50 prediction from a drug's chemical properties and the gene expression of tissue-specific cell lines.

Citation

Please cite us as follows:

@article{oskooei2018paccmann,
  title={PaccMann: Prediction of anticancer compound sensitivity with multi-modal attention-based neural networks},
  author={Oskooei, Ali and Born, Jannis and Manica, Matteo and Subramanian, Vigneshwari and S{\'a}ez-Rodr{\'\i}guez, Julio and Mart{\'\i}nez, Mar{\'\i}a Rodr{\'\i}guez},
  journal={arXiv preprint arXiv:1811.06802},
  year={2018}
}

@article{manica2019paccmann,
  title={Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders},
  author={Manica, Matteo and Oskooei, Ali and Born, Jannis and Subramanian, Vigneshwari and Saez-Rodriguez, Julio and Rodriguez Martinez, Maria},
  journal={Molecular Pharmaceutics},
  year={2019},
  doi={10.1021/acs.molpharmaceut.9b00520},
  note={PMID: 31618586}
}

Installation

Setup of the virtual environment

We strongly recommend working inside a virtual environment (venv).

Create the environment:

python3 -m venv venv

Activate it:

source venv/bin/activate

Module installation

The module can be installed either in editable mode:

pip3 install -e .

Or as a normal package:

pip3 install .

Model training

Models can be trained using the script bin/training_paccmann that is installed together with the module. Check the examples for a quick start. For more details see the help of the training command by typing training_paccmann -h:

usage: training_paccmann [-h] [-save_checkpoints_steps 300]
                         [-eval_throttle_secs 60] [-model_suffix]
                         [-train_steps 10000] [-batch_size 64]
                         [-learning_rate 0.001] [-dropout 0.5]
                         [-buffer_size 20000] [-number_of_threads 1]
                         [-prefetch_buffer_size 6]
                         train_filepath eval_filepath model_path
                         model_specification_fn_name params_filepath
                         feature_names

Run training of a `paccmann` model.

positional arguments:
  train_filepath        Path to train data.
  eval_filepath         Path to eval data.
  model_path            Path where the model is stored.
  model_specification_fn_name
                        Model specification function. Pick one of the
                        following: ['dnn', 'rnn', 'scnn', 'sa', 'ca', 'mca'].
  params_filepath       Path to model params. Dictionary with parameters
                        defining the model.
  feature_names         Comma separated feature names. Select from the
                        following: ['smiles_character_tokens',
                        'smiles_atom_tokens', 'fingerprints_256',
                        'fingerprints_512', 'targets_10', 'targets_20',
                        'targets_50', 'selected_genes_10',
                        'selected_genes_20', 'cnv_min', 'cnv_max', 'disrupt',
                        'zigosity', 'ic50', 'ic50_labels'].

optional arguments:
  -h, --help            show this help message and exit
  -save_checkpoints_steps 300, --save-checkpoints-steps 300
                        Steps before saving a checkpoint.
  -eval_throttle_secs 60, --eval-throttle-secs 60
                        Throttle seconds between evaluations.
  -model_suffix , --model-suffix 
                        Suffix for the trained moedel.
  -train_steps 10000, --train-steps 10000
                        Number of training steps.
  -batch_size 64, --batch-size 64
                        Batch size.
  -learning_rate 0.001, --learning-rate 0.001
                        Learning rate.
  -dropout 0.5, --dropout 0.5
                        Dropout to be applied to set and dense layers.
  -buffer_size 20000, --buffer-size 20000
                        Buffer size for data shuffling.
  -number_of_threads 1, --number-of-threads 1
                        Number of threads to be used in data processing.
  -prefetch_buffer_size 6, --prefetch-buffer-size 6
                        Prefetch buffer size to allow pipelining.
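
To illustrate how the positional arguments fit together, the following sketch assembles a training_paccmann command line in Python and checks the model name and feature names against the lists printed in the help above. The helper name build_training_command is hypothetical and every path is a placeholder; only the argument order, the choices, and the --batch-size flag are taken from the help text.

```python
import shlex

# Choices copied from the help text above.
MODEL_FNS = ['dnn', 'rnn', 'scnn', 'sa', 'ca', 'mca']
FEATURES = [
    'smiles_character_tokens', 'smiles_atom_tokens', 'fingerprints_256',
    'fingerprints_512', 'targets_10', 'targets_20', 'targets_50',
    'selected_genes_10', 'selected_genes_20', 'cnv_min', 'cnv_max',
    'disrupt', 'zigosity', 'ic50', 'ic50_labels',
]


def build_training_command(train, evaluation, model_path, model_fn,
                           params, features, batch_size=64):
    """Return the argv list for a training_paccmann call (hypothetical helper)."""
    if model_fn not in MODEL_FNS:
        raise ValueError(f'unknown model specification: {model_fn}')
    unknown = [name for name in features if name not in FEATURES]
    if unknown:
        raise ValueError(f'unknown feature names: {unknown}')
    return [
        'training_paccmann',
        train, evaluation, model_path, model_fn, params,
        ','.join(features),  # feature names are passed comma separated
        '--batch-size', str(batch_size),
    ]


# Placeholder paths; replace them with real files before running anything.
command = build_training_command(
    'path/to/train_data', 'path/to/eval_data', 'path/to/model', 'mca',
    'path/to/params.json',
    ['smiles_character_tokens', 'selected_genes_20', 'ic50'],
)
print(shlex.join(command))
```

The printed string can be pasted into a shell once the placeholders point to real data; validating the names first avoids starting a run that argparse or the data pipeline would reject later.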

paccmann_rl's Issues

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

Thanks for fixing the previous issue. I tried again, but the error is back.

I followed all the steps in the tutorial.

I checked that the model is assigned to CUDA, but I still get the following error:

Error message:

Traceback (most recent call last):
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 316, in <module>
    main()
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 222, in main
    cell_line, epoch, params['batch_size']
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_generator\paccmann_generator\reinforce.py", line 400, in policy_gradient
    latent_z, remove_invalid=True
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_generator\paccmann_generator\reinforce.py", line 315, in get_smiles_from_latent
    temperature=self.temperature
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_chemistry\paccmann_chemistry\models.py", line 382, in generate
    temperature=temperature
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_chemistry\paccmann_chemistry\models.py", line 209, in generate_from_latent
    output, hidden, stack = self(input_token, hidden, stack)
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_chemistry\paccmann_chemistry\stack_rnn.py", line 106, in forward
    embedded_input = self.encoder(input_token.to(self.device))
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\nn\functional.py", line 1724, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

Process finished with exit code 1

About gene expression input to the model

Hi, thank you for your amazing work.

However, I am quite confused about the gene expression input to the models (both the PaccMann predictor and the RL generator).
1. In the PaccMann predictor
Your recent model uses RNA-seq data, but the dataset you uploaded (~400 cell lines) does not cover all cell lines in GDSC (~1000 cell lines). Some of the 2,128 selected genes are also missing.

Can you explain how the model handles the expression values of missing genes, and how it handles data points whose cell line is missing from the cell line to gene expression dictionary?

2. In PaccMann RL (generator)
As you mention in the README, you used RNA-seq gene expression data for the whole framework.
However, the input gene expression for conditional generation (the pickle file) appears to be RMA-normalized gene expression.
I think so for the reasons given above: the RMA data covers most of the cell lines (985) and contains the 2,128 selected genes.

You mention in the paper that the PVAE is trained on TCGA RNA-seq data, so there might be a discrepancy when RMA gene expression is encoded with the PVAE encoder.
Can you explain the exact source of the pickle file (gdsc_transcriptomics_for_conditional_generation.pkl) and why that pickle file is not used in the other part (the PaccMann predictor)?

RuntimeError: Expected object of scalar type float but got scalar type __int64 for sequence element 1.

During RL model training, the following error occurs (I hit it at epoch 6).

From my debugging, the error seems to occur when the generator.generate function produces no molecules while generating SMILES for validation (mols_numerical is an empty tensor).

How can I fix it?

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 313, in <module>
    main()
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 237, in main
    epoch, params['eval_batch_size'], cell_line
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl_cpu\lib\site-packages\paccmann_generator\reinforce.py", line 280, in generate_compounds_and_evaluate
    latent_z, remove_invalid=True
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl_cpu\lib\site-packages\paccmann_generator\reinforce.py", line 322, in get_smiles_from_latent
    ) for mol_num in iter(mols_numerical)
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl_cpu\lib\site-packages\paccmann_generator\reinforce.py", line 322, in
    ) for mol_num in iter(mols_numerical)
RuntimeError: Expected object of scalar type float but got scalar type __int64 for sequence element 1.
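
One way to guard against the situation described in this issue is to skip decoding when the generator returns no molecules. The sketch below is plain Python with hypothetical names (decode_batch, decode_fn, a toy vocabulary); it is not the code in reinforce.py, and it assumes that an empty batch should simply yield an empty list of SMILES.

```python
def decode_batch(mols_numerical, decode_fn):
    """Decode token-index sequences, tolerating an empty batch.

    `mols_numerical` is any sequence of token-index sequences and
    `decode_fn` maps one sequence to a SMILES string (both hypothetical).
    """
    if len(mols_numerical) == 0:
        # Nothing was generated for this batch: return early instead of
        # handing an empty container to downstream tensor code.
        return []
    return [decode_fn(mol_num) for mol_num in mols_numerical]


# Toy vocabulary for illustration only.
vocabulary = {1: 'C', 2: 'O', 3: 'N'}


def to_smiles(indexes):
    return ''.join(vocabulary[index] for index in indexes)


empty_result = decode_batch([], to_smiles)
normal_result = decode_batch([[1, 1, 2], [3, 1]], to_smiles)
print(empty_result)
print(normal_result)
```

Whether an empty batch should be skipped, resampled, or penalized in the reward is a design decision for the training loop; the early return only prevents the crash.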

RuntimeError: cudnn RNN backward can only be called in training mode

I ran into the error shown below.

I think this is because the models are set to eval mode. I switched all models to train mode, but it still does not work.

self.predictor.train()
self.encoder.train()
self.generator.train()

rl_loss.backward()

Which model do I need to switch to train mode for this to work?

C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\nn\modules\container.py:100: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
Traceback (most recent call last):
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 314, in <module>
    main()
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 220, in main
    cell_line, epoch, params['batch_size']
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\paccmann_generator\reinforce.py", line 442, in policy_gradient
    rl_loss.backward()
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\seungho.kuk\anaconda3\envs\paccmann_rl\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cudnn RNN backward can only be called in training mode

SMILESLanguage object initialization problem

I got the following error; it seems to be an initialization problem with the SMILESLanguage object.

In train_paccmann_rl.py, the SMILES language object is created with SMILESLanguage.load() from the SMILES language path.

I confirmed that an object created this way does not go through the __init__ function.

As a result, attributes such as SMILESLanguage.special_indexes and SMILESLanguage.transform_smiles are never created.

I think the code should be changed so that the object is first created and then loaded.

Traceback (most recent call last):
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 317, in <module>
    main()
  File "C:/Users/seungho.kuk/Desktop/Python_project/paccmann_rl/code/paccmann_generator/examples/train_paccmann_rl.py", line 223, in main
    cell_line, epoch, params['batch_size']
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_generator\paccmann_generator\reinforce.py", line 406, in policy_gradient
    latent_z, remove_invalid=True
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_generator\paccmann_generator\reinforce.py", line 327, in get_smiles_from_latent
    ) for mol_num in iter(mols_numerical)
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_generator\paccmann_generator\reinforce.py", line 327, in
    ) for mol_num in iter(mols_numerical)
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_datasets\pytoda\smiles\smiles_language.py", line 637, in token_indexes_to_smiles
    for token_index in token_indexes
  File "C:\Users\seungho.kuk\Desktop\Python_project\paccmann_rl\code\paccmann_datasets\pytoda\smiles\smiles_language.py", line 639, in
    if token_index not in self.special_indexes
AttributeError: 'SMILESLanguage' object has no attribute 'special_indexes'

Process finished with exit code 1
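
The "create first, then load" idea proposed in this issue can be sketched with a toy class. ToyLanguage, load_without_init, and load_after_init are hypothetical names and not part of pytoda; the sketch assumes the stored state lacks an attribute that only __init__ sets, which may or may not be the actual cause here.

```python
import json


class ToyLanguage:
    """Stand-in for a tokenizer class; not the pytoda SMILESLanguage."""

    def __init__(self):
        self.token_to_index = {'<PAD>': 0, '<START>': 1}
        # Attribute that only __init__ creates.
        self.special_indexes = {0: '<PAD>', 1: '<START>'}

    @classmethod
    def load_without_init(cls, payload):
        # Mimics deserializers that restore state without running __init__.
        instance = cls.__new__(cls)
        instance.__dict__.update(json.loads(payload))
        return instance

    @classmethod
    def load_after_init(cls, payload):
        # Create the object first, then overwrite the stored fields.
        instance = cls()
        instance.__dict__.update(json.loads(payload))
        return instance


# State saved without `special_indexes`, e.g. by an older class version.
payload = json.dumps({'token_to_index': {'<PAD>': 0, '<START>': 1, 'C': 2}})

broken = ToyLanguage.load_without_init(payload)
fixed = ToyLanguage.load_after_init(payload)
print(hasattr(broken, 'special_indexes'))  # False
print(hasattr(fixed, 'special_indexes'))   # True
```

Running __init__ before restoring the stored fields keeps every attribute the current class version expects, while the stored vocabulary still overrides the defaults.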

model weight doesn't match

I followed the tutorial in your repo, but it does not work. I got the following error message.

Command: python code\paccmann_generator\examples\IC50\train_paccmann_rl.py models\svae models\pvae models\paccmann data\gdsc_transcriptomics_for_conditional_generation.pkl code\paccmann_generator\examples\IC50\example_params.json paccmann_rl breast

RuntimeError: Error(s) in loading state_dict for TeacherVAE:
Missing key(s) in state_dict: "encoder.embedding.weight", "encoder.backward_stackgru.stack_controls_layer.weight", "encoder.backward_stackgru.stack_controls_layer.bias", "encoder.backward_stackgru.stack_input_layer.weight", "encoder.backward_stackgru.stack_input_layer.bias", "encoder.backward_stackgru.embedding.weight", "encoder.backward_stackgru.gru.weight_ih_l0", "encoder.backward_stackgru.gru.weight_hh_l0", "encoder.backward_stackgru.gru.bias_ih_l0", "encoder.backward_stackgru.gru.bias_hh_l0", "encoder.backward_stackgru.gru.weight_ih_l1", "encoder.backward_stackgru.gru.weight_hh_l1", "encoder.backward_stackgru.gru.bias_ih_l1", "encoder.backward_stackgru.gru.bias_hh_l1", "decoder.embedding.weight", "decoder.output_layer.weight", "decoder.output_layer.bias".
Unexpected key(s) in state_dict: "encoder.encoder.weight", "encoder.decoder.weight", "encoder.decoder.bias", "encoder.gru.weight_ih_l0_reverse", "encoder.gru.weight_hh_l0_reverse", "encoder.gru.bias_ih_l0_reverse", "encoder.gru.bias_hh_l0_reverse", "encoder.gru.weight_ih_l1_reverse", "encoder.gru.weight_hh_l1_reverse", "encoder.gru.bias_ih_l1_reverse", "encoder.gru.bias_hh_l1_reverse", "decoder.encoder.weight", "decoder.decoder.weight", "decoder.decoder.bias", "decoder.gru.weight_ih_l0_reverse", "decoder.gru.weight_hh_l0_reverse", "decoder.gru.bias_ih_l0_reverse", "decoder.gru.bias_hh_l0_reverse", "decoder.gru.weight_ih_l1_reverse", "decoder.gru.weight_hh_l1_reverse", "decoder.gru.bias_ih_l1_reverse", "decoder.gru.bias_hh_l1_reverse".
size mismatch for encoder.gru.weight_ih_l1: copying a param with shape torch.Size([768, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
size mismatch for decoder.gru.weight_ih_l1: copying a param with shape torch.Size([768, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).

I presume that there is a problem with the model or model information you provided.
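
A mismatch like this can be narrowed down by comparing parameter names and shapes before loading. The sketch below is plain Python; compare_state_dicts is a hypothetical helper, the handful of keys is taken from the error message above, and shapes that the message does not report are left as None.

```python
def compare_state_dicts(checkpoint_shapes, model_shapes):
    """Report key and shape differences between two {name: shape} mappings."""
    checkpoint_keys = set(checkpoint_shapes)
    model_keys = set(model_shapes)
    return {
        # Expected by the model but absent from the checkpoint.
        'missing': sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but unknown to the model.
        'unexpected': sorted(checkpoint_keys - model_keys),
        # Shared names whose shapes disagree.
        'size_mismatch': sorted(
            key for key in checkpoint_keys & model_keys
            if checkpoint_shapes[key] != model_shapes[key]
        ),
    }


# Subset of the names in the error message; None marks an unreported shape.
checkpoint = {
    'encoder.encoder.weight': None,
    'encoder.gru.weight_ih_l0_reverse': None,
    'encoder.gru.weight_ih_l1': (768, 512),
}
model = {
    'encoder.embedding.weight': None,
    'decoder.output_layer.weight': None,
    'encoder.gru.weight_ih_l1': (768, 256),
}

report = compare_state_dicts(checkpoint, model)
for category, names in report.items():
    print(category, names)
```

With PyTorch, such mappings could be built from the tensors in a state dict, for example {name: tuple(tensor.shape) for name, tensor in state_dict.items()}. In this issue the report would point to a checkpoint saved from a different architecture (bidirectional GRU keys such as "..._reverse") than the one the current code constructs.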