License: Other

DeepCoy - Generating Property-Matched Decoy Molecules Using Deep Learning

This repository contains our implementation of Generating Property-Matched Decoy Molecules Using Deep Learning (DeepCoy).

If you found DeepCoy useful, please cite our paper:

Imrie F, Bradley AR, Deane CM. Generating property-matched decoy molecules using deep learning. Bioinformatics. 2021

@article{Imrie2021DeepCoy,
    author = {Imrie, Fergus and Bradley, Anthony R and Deane, Charlotte M},
    title = "{Generating property-matched decoy molecules using deep learning}",
    journal = {Bioinformatics},
    year = {2021},
    month = {02},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btab080},
    url = {https://doi.org/10.1093/bioinformatics/btab080},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btab080/36297301/btab080.pdf},
}

Acknowledgements

We thank the authors of Constrained Graph Variational Autoencoders for Molecule Design for releasing their code. The code in this repository is based on their source code release (link). If you find this code useful, please consider citing their work.

Requirements

This code was tested in Python 3.6 with Tensorflow 1.10.

A YAML file listing all install requirements is provided. The environment can be set up with conda:

conda env create -f DeepCoy-env.yml
conda activate DeepCoy-env

To run our model with the subgraph reweighted loss function, you need to download the subgraph frequency data from http://opig.stats.ox.ac.uk/resources. This is not required for generating molecules with the pretrained models, but is advised when training new models.

Data Extraction

We have prepared two training datasets based on different physicochemical properties. Both were created from a subset of the ZINC dataset.

To preprocess these datasets, go to the data directory and run prepare_data.py:

python prepare_data.py

Running DeepCoy

To train and generate molecules using DeepCoy, use:

python DeepCoy.py --dataset zinc --config '{"number_of_generation_per_valid": 100, "num_epochs": 10, "epoch_to_generate": 10, "train_file": "data/molecules_zinc_dekois_train.json", "valid_file": "data/molecules_zinc_dekois_valid.json", "subgraph_freq_file": "./freq_dict_zinc_250k_smarts.pkl"}'

To train and generate molecules using DeepCoy without the subgraph reweighted loss function, use:

python DeepCoy.py --dataset zinc --config '{"number_of_generation_per_valid": 100, "num_epochs": 10, "epoch_to_generate": 10, "train_file": "data/molecules_zinc_dekois_train.json", "valid_file": "data/molecules_zinc_dekois_valid.json", "use_subgraph_freq": false}'
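The two training commands above differ only in the JSON passed to --config. As a minimal sketch, that JSON can be built from a Python dictionary rather than typed by hand. The helper name build_command and the BASE_CONFIG constant are hypothetical and not part of the repository; the keys and values are copied from the commands above.

```python
import json
import shlex

# Keys and values copied from the README commands; the helper itself is hypothetical.
BASE_CONFIG = {
    "number_of_generation_per_valid": 100,
    "num_epochs": 10,
    "epoch_to_generate": 10,
    "train_file": "data/molecules_zinc_dekois_train.json",
    "valid_file": "data/molecules_zinc_dekois_valid.json",
}


def build_command(overrides, dataset="zinc"):
    """Return a shell-safe DeepCoy.py command line for the given config overrides."""
    config = {**BASE_CONFIG, **overrides}
    # json.dumps writes Python False as JSON false, matching the README commands.
    config_json = json.dumps(config)
    parts = ["python", "DeepCoy.py", "--dataset", dataset, "--config", shlex.quote(config_json)]
    return " ".join(parts)


with_reweighting = build_command({"subgraph_freq_file": "./freq_dict_zinc_250k_smarts.pkl"})
without_reweighting = build_command({"use_subgraph_freq": False})
print(with_reweighting)
print(without_reweighting)
```

Building the string with json.dumps and shlex.quote avoids quoting mistakes when the configuration is edited, for example when pointing train_file at a different dataset.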

To generate molecules with a pretrained model, use:

python DeepCoy.py --restore models/DeepCoy_DUDE_model_e09.pickle --dataset zinc --config '{"generation": true, "number_of_generation_per_valid": 1000, "batch_size": 1, "train_file": "data/molecules_zinc_dekois_valid.json", "valid_file": "data/molecules_zinc_dekois_valid.json", "output_name": "output/DeepCoy_generated_decoys_zinc_dekois_valid.txt"}'

The output is of the following format:

Input molecule (SMILES) Generated molecule (SMILES)
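A minimal sketch of reading such a file follows. It assumes one pair per line with the two SMILES strings separated by whitespace, which is how the format line above reads but is not verified against the code. The function name read_pairs is hypothetical and the SMILES in the example are made up.

```python
from collections import defaultdict
import io


def read_pairs(handle):
    """Group generated SMILES by the input SMILES they were generated for."""
    decoys_by_input = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        input_smiles, generated_smiles = fields
        decoys_by_input[input_smiles].append(generated_smiles)
    return dict(decoys_by_input)


# Made-up SMILES, only to show the shape of the data.
example = io.StringIO("CCO CCN\nCCO CCC\nc1ccccc1 c1ccncc1\n")
pairs = read_pairs(example)
print(pairs)
```

For a real run, the StringIO object would be replaced by an open file handle on the path given as output_name.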

Further configuration options are listed in the default_params function in DeepCoy.py.

Evaluation

A script to evaluate the generated molecules and prepare a set of decoys is provided in the evaluation directory. You can specify either a single file or a directory containing multiple files to process.

python select_and_evaluate_decoys.py --data_path PATH_TO_INPUT_FILE/DIRECTORY --output_path PATH_TO_OUTPUT --dataset_name dude --num_decoys_per_active 50 >> decoy_selection_log.txt

The input should be in the following format:

Active molecule (SMILES) Possible decoy molecule (SMILES)

Pretrained Models and Generated Molecules

We provide two pretrained models based on different physicochemical properties (as described in our paper).

Due to GitHub file size constraints, these need to be downloaded from http://opig.stats.ox.ac.uk/resources:

models/DeepCoy_DUDE_model_e09.pickle
models/DeepCoy_DEKOIS_model_e10.pickle

In addition, we provide a model that incorporates phosphorus:

models/DeepCoy_DUDE_phosphorus_model_e10.pickle

Generated molecules can also be downloaded from the OPIG website.

Examples

An example Jupyter notebook demonstrating the use of DeepCoy to generate and select decoy molecules can be found in the examples directory.

Contact (Questions/Bugs/Requests)

Please submit a GitHub issue or contact either Fergus Imrie or the Oxford Protein Informatics Group (OPIG).

deepcoy's Issues

Number of properties problem

Hi,

In your article, you mentioned that you compute 27 unique properties.

But in the calc_props_all function of your script decoy_utils.py, I can only find 26 properties. I have checked many times, and it still looks like 26.

Any suggestions? thanks!

error in creation of the conda env

Hello There,

I work on MacOSX Catalina with Anaconda.

After I ran

conda env create --file DeepCoy-env.yml

I received the following error:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - gstreamer==1.14.0=hb453b48_1
  - icu==58.2=h9c2bf20_1
  - grpcio==1.16.1=py36hf8bcb03_1
  - openssl==1.1.1a=h7b6447c_0
  - bzip2==1.0.6=h14c3975_5
  - pillow==5.3.0=py36h34e0f95_0
  - pyqt==5.9.2=py36h22d08a2_1
  - rdkit==2018.09.1.0=py36h71b666b_1
  - cython==0.29=py36he6710b0_0
  - libgfortran-ng==7.3.0=hdf63c60_0
  - cudnn==7.2.1=cuda9.2_0
  - mkl_fft==1.0.6=py36h7dd41cf_0
  - sqlite==3.25.3=h7b6447c_0
  - pandas==0.23.4=py36h04863e7_0
  - protobuf==3.6.1=py36he6710b0_0
  - jpeg==9b=h024ee3a_2
  - tensorflow-gpu==1.10.0=hf154084_0
  - libpng==1.6.35=hbc83047_0
  - pixman==0.34.0=hceecf20_3
  - py-boost==1.65.1=py36hf484d3e_4
  - tensorflow-base==1.10.0=gpu_py36had579c0_0
  - numpy==1.15.4=py36h1d66e8a_0
  - libstdcxx-ng==8.2.0=hdf63c60_1
  - zlib==1.2.11=h7b6447c_3
  - mkl_random==1.0.1=py36h4414c95_1
  - xz==5.2.4=h14c3975_4
  - cupti==9.2.148=0
  - dbus==1.13.12=h746ee38_0
  - readline==7.0=ha6073c6_4
  - libgcc-ng==8.2.0=hdf63c60_1
  - libedit==3.1.20170329=h6b74fdf_2
  - c-ares==1.15.0=h7b6447c_1
  - cudatoolkit==9.2=0
  - gst-plugins-base==1.14.0=hbbd80ab_1
  - numpy-base==1.15.4=py36h81de0dd_0
  - tk==8.6.8=hbc83047_0
  - cairo==1.14.12=h8948797_3
  - libprotobuf==3.6.1=hd408876_0
  - libxml2==2.9.8=h26e45fe_1
  - ncurses==6.1=hf484d3e_0
  - python==3.6.7=h0371630_0
  - libxcb==1.13=h1bed415_1
  - scipy==1.1.0=py36hfa4b5c9_1
  - fontconfig==2.13.0=h9420a91_0
  - libffi==3.2.1=hd88cf55_4
  - libboost==1.65.1=habcd387_4
  - expat==2.2.6=he6710b0_0
  - freetype==2.9.1=h8a8886c_1
  - libuuid==1.0.3=h1bed415_2
  - tensorflow==1.10.0=gpu_py36hcebf108_0
  - scikit-learn==0.20.1=py36h4989274_0
  - _tflow_1100_select==0.0.1=gpu
  - tensorboard==1.10.0=py36hf484d3e_0
  - kiwisolver==1.1.0=py36he6710b0_0
  - tornado==6.0.3=py36h7b6447c_0
  - glib==2.56.2=hd408876_0
  - matplotlib==3.0.2=py36h5429711_0
  - libtiff==4.0.9=he85c1e1_2
  - sip==4.19.13=py36he6710b0_0
  - qt==5.9.7=h5867ecd_1
  - pcre==8.42=h439df22_0

What could be the possible solution to this problem?

Thanks in advance
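Every spec in the list above carries a build string (the part after the version), and several of the packages, such as libgcc-ng and libstdcxx-ng, are Linux-only. That suggests the environment file was exported on Linux, so conda on macOS cannot find matching builds. A common workaround, not something the repository documents, is to drop the build strings and let conda pick builds for the local platform. The sketch below shows only the string handling; the function name strip_build is hypothetical, and Linux-only or GPU packages (for example cudatoolkit, cudnn, tensorflow-gpu) would probably still need to be removed or replaced by hand.

```python
def strip_build(spec):
    """Turn 'name==version=build' (or 'name=version=build') into 'name=version'."""
    # Drop the leading list marker and split on '=', ignoring the empty piece from '=='.
    parts = [part for part in spec.strip().lstrip("- ").split("=") if part]
    if len(parts) < 2:
        return parts[0] if parts else ""
    name, version = parts[0], parts[1]
    return f"{name}={version}"


# Three lines taken from the error message above.
for line in (
    "  - numpy==1.15.4=py36h1d66e8a_0",
    "  - rdkit==2018.09.1.0=py36h71b666b_1",
    "  - cudatoolkit==9.2=0",
):
    print(strip_build(line))
```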

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

I got this error when I tried to generate decoys:

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'
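For context, tf.ConfigProto is a TensorFlow 1.x name; in TensorFlow 2.x it is only available as tf.compat.v1.ConfigProto. The README states that the code was tested with TensorFlow 1.10, so this error most likely means a 2.x install is being imported. The sketch below checks a version string without importing TensorFlow; the function names are hypothetical.

```python
def major_version(version):
    """Return the major component of a version string such as '1.10.0'."""
    return int(version.split(".")[0])


def has_top_level_config_proto(version):
    """tf.ConfigProto exists at the top level only in TensorFlow 1.x."""
    return major_version(version) < 2


for candidate in ("1.10.0", "2.4.1"):
    print(candidate, has_top_level_config_proto(candidate))
```

In an actual environment, the string to check would come from tf.__version__ after import tensorflow as tf.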

Add new atom type and failed when training

Hi, I found that your model's atom dictionary doesn't include phosphorus, so I added it. I didn't modify bucket_sizes, because I don't know what it does:

    elif dataset=='zinc':
        return { 'atom_types': ['Br1(0)', 'C4(0)', 'Cl1(0)', 'F1(0)', 'H1(0)', 'I1(0)',
                'N2(-1)', 'N3(0)', 'N4(1)', 'O1(-1)', 'O2(0)', 'S2(0)','S4(0)', 'S6(0)', 'P5(0)'],
                 'maximum_valence': {0: 1, 1: 4, 2: 1, 3: 1, 4: 1, 5:1, 6:2, 7:3, 8:4, 9:1, 10:2, 11:2, 12:4, 13:6, 14:5},
                 'number_to_atom': {0: 'Br', 1: 'C', 2: 'Cl', 3: 'F', 4: 'H', 5:'I', 6:'N', 7:'N', 8:'N', 9:'O', 10:'O', 11:'S', 12:'S', 13:'S', 14:'P'},
                 'bucket_sizes': np.array([28,31,33,35,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,53,55,58,84]) 
               }

But training failed. Is there anything else that needs to be modified?
Can you help me? Thanks!

Possible wrong version of tensorflow in DeepCoy-env.yml

Hi,

When running the examples to generate molecules with the pretrained models, or running the example Jupyter Notebook (both in a newly made conda environment setup with the DeepCoy-env.yml file) I get a tensorflow error:

Input:

(DeepCoy-env) XXXX@XXXX:~/DeepCoy$ python3 DeepCoy.py --restore models/DeepCoy_DUDE_model_e09.pickle --dataset zinc --config '{"generation": true, "number_of_generation_per_valid": 1000, "batch_size": 1, "train_file": "data/molecules_zinc_decoys_valid.json", "valid_file": "data/molecules_zinc_dekois_valid.json", "output_name": "output/DeepCoy_generated_decoys_zinc_dekois_valid.txt"}'

Output:

Traceback (most recent call last):
  File "DeepCoy.py", line 1129, in <module>
    model = DenseGGNNChemModel(args)
  File "DeepCoy.py", line 55, in __init__
    super().__init__(args)
  File "/DeepCoy/GGNN_DeepCoy.py", line 80, in __init__
    self.make_model()
  File "/DeepCoy/GGNN_DeepCoy.py", line 126, in make_model
    self.prepare_specific_graph_model()
  File "DeepCoy.py", line 196, in prepare_specific_graph_model
    cell = tf.nn.rnn_cell.DropoutWrapper(cell,
AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'

When trying to debug, it looks as though tf.nn.rnn_cell was moved into tf.contrib from version 1.0 (this contrib usage seems to be used in line 195 and 209 of DeepCoy also?)

Changing lines 195, 196 and 209, 210 from:

cell = tf.contrib.rnn.GRUCell(new_h_dim)
cell = tf.nn.rnn_cell.DropoutWrapper(cell,

to:

cell = tf.contrib.rnn.GRUCell(new_h_dim)
cell = tf.contrib.rnn.DropoutWrapper(cell,

fixes this particular error, but then the error below is produced from the same input:
Output:

File "DeepCoy.py", line 1129, in <module>
    model = DenseGGNNChemModel(args)
  File "DeepCoy.py", line 55, in __init__
    super().__init__(args)
  File "/DeepCoy/GGNN_DeepCoy.py", line 80, in __init__
    self.make_model()
  File "/DeepCoy/GGNN_DeepCoy.py", line 142, in make_model
    "_encoder")
  File "DeepCoy.py", line 307, in compute_final_node_representations_with_residual
    h = self.weights['node_gru'+scope_name+str(iter_idx)](acts, h)[1]                       # [b*v, h]
  File "/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 713, in __call__
    output, new_state = self._cell(inputs, state, scope)
  File /.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 150, in __call__
    [inputs, state], 2 * self._num_units, True, 1.0))
  File "/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 1044, in _linear
    _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
  File "/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 653, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable graph_model/gru_scope_encoder0/gru_cell/gates/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 1044, in _linear
    _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
  File "/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 150, in __call__
    [inputs, state], 2 * self._num_units, True, 1.0))
  File "/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 713, in __call__
    output, new_state = self._cell(inputs, state, scope)

> /.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py(653)_get_single_variable()
-> name, "".join(traceback.format_list(tb))))
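The TensorFlow paths in the second traceback sit under /.local/lib/python3.6/site-packages, which is the per-user site-packages location rather than a conda environment. This suggests, without confirming it, that Python is importing a different TensorFlow install from the one pinned in DeepCoy-env.yml. The sketch below checks whether a module file lives under the user site directory; the function names are hypothetical and the example paths are made up.

```python
import os
import site


def is_under(path, directory):
    """True if path lies inside directory (both compared as absolute paths)."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    return os.path.commonpath([path, directory]) == directory


def loaded_from_user_site(module_file, user_site=None):
    """True if module_file is located in the per-user site-packages directory."""
    user_site = user_site or site.getusersitepackages()
    return is_under(module_file, user_site)


# Made-up paths that mirror the layout seen in the traceback.
user_site = "/home/user/.local/lib/python3.6/site-packages"
print(loaded_from_user_site(user_site + "/tensorflow/__init__.py", user_site))
print(loaded_from_user_site(
    "/home/user/miniconda3/envs/DeepCoy-env/lib/python3.6/site-packages/tensorflow/__init__.py",
    user_site,
))
```

In a live session the first argument would be tf.__file__ after import tensorflow as tf. If it does resolve to the user site, setting the environment variable PYTHONNOUSERSITE=1 before starting Python stops that directory from being added to the import path.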

IndexError: index 28 is out of bounds for axis 1 with size 28 while generating the decoys

Hello!
I want to use DeepCoy for decoy generation based on 3172 active structures obtained from ChEMBL. I successfully installed DeepCoy and generated output decoys for the P38-alpha_example from the Jupyter notebook and for a small subset of my own actives (the first 10 ligands of my 3062 actives).

Nonetheless, when I try to generate decoys for all 3062 of my own actives, I get an IndexError:
IndexError: index 28 is out of bounds for axis 1 with size 28

All my input files, commands and terminal output including the error are available at: https://box.fu-berlin.de/s/6WiPi8ejTkJpysD

I know that NumPy starts counting at 0 and that 28 is outside the range 0-27, but I do not know why this error occurs only with the larger set, as I do not see any difference in file format between the input files for the 10 and the 3062 ligands. I also ensured that both files (.smi / .json) contain the same number of ligands (i.e. that all molecules from the .smi file could be read and processed by prepare_data.py to create the .json file).

I hope you can help me to solve the issue
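Since the first 10 ligands work and the full set does not, one generic way to narrow this down is to run the molecules one at a time and record which inputs raise. The sketch below is not DeepCoy code: find_failures is a hypothetical helper, and toy_process is a stand-in for whatever per-molecule step is being tested.

```python
def find_failures(items, process):
    """Run process on each item alone and collect (index, item, error type) for failures."""
    failures = []
    for index, item in enumerate(items):
        try:
            process(item)
        except Exception as error:  # broad on purpose: the goal is only to locate the input
            failures.append((index, item, type(error).__name__))
    return failures


def toy_process(smiles):
    # Stand-in for the real per-molecule step; raises on one made-up condition.
    if len(smiles) > 5:
        raise IndexError("index out of bounds")


print(find_failures(["CCO", "CCCCCCCC", "CCN"], toy_process))
```

Knowing which molecules fail makes it possible to compare them with the ones that succeed, for example in atom count or element types.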

Range Error

Hello! When I use prepare_data.py to process my own training dataset, I get the error below:

Range Error
idx
Violation occurred on line 168 in file /opt/conda/conda-bld/rdkit_1540176401003/work/Code/GraphMol/ROMol.cpp
Failed Expression: 31 < 31