
optimal-pse-lab / deepdock


Code related to: O. Mendez-Lucio, M. Ahmad, E.A. del Rio-Chanona, J.K. Wegner, "A Geometric Deep Learning Approach to Predict Binding Conformations of Bioactive Molecules"

License: MIT License

Python 18.33% Jupyter Notebook 80.68% Shell 0.12% Dockerfile 0.86%

deepdock's People

Contributors

antonioe89, omendezlucio


deepdock's Issues

Constant additive term in MDN

sigma = F.elu(self.z_sigma(C))+1.1

Hi,
first of all thanks for sharing the code.

Just a simple question: what is the meaning of the +1.1 and +1 in the outputs for sigma and mu in the mixture density network? Is it some kind of prior knowledge you incorporate into the model, or simply some numerical regularization?

Thanks in advance for your answer.
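
A note on what the shift does numerically (why exactly 1.1 was chosen is of course a question for the authors): ELU is bounded below by -1, so F.elu(x) + 1 is strictly positive and F.elu(x) + 1.1 stays above 0.1, a common way of keeping mixture-density scale parameters positive and bounded away from zero. A minimal check, assuming nothing beyond PyTorch itself:

import torch
import torch.nn.functional as F

x = torch.linspace(-10.0, 10.0, steps=9)
# ELU(x) > -1 for all x, so the shifted outputs are strictly positive:
print((F.elu(x) + 1.1).min())  # ~0.1000: sigma is kept above 0.1
print((F.elu(x) + 1.0).min())  # ~4.5e-05: close to 0 but still positive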

MSMS error

I have set the MSMS path as
export MSMS_BIN=/,,,/MSMS
but I got an error. Could you please help me solve it?
(screenshot attached: WeChat Image_20230601165033)

Unable to load data for training example

RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.

This error is thrown when I try to run the second cell in the Train_DeepDock.ipynb example.
I have tried using the Archive Manager to open the dataset.tar but it somehow fails to extract. May I know what is contained within the tar files? Is it a regular tar archive, or some other file renamed with a .tar extension?
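
In case it helps others hitting the same thing, a quick, purely illustrative way to check whether the file is a plain tar archive or some other format saved with a .tar extension (the file name below is just a placeholder):

import tarfile

path = "dataset.tar"
print(tarfile.is_tarfile(path))       # True -> a regular tar archive
if tarfile.is_tarfile(path):
    with tarfile.open(path) as tf:
        print(tf.getnames()[:10])     # peek at the first few member names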

I am currently using torch-geometric 2.0.3, which is later than the version in requirements.txt, but unfortunately my RTX 30 series hardware only supports CUDA 11+ (CUDA 10 won't work), so I am unable to use older versions of torch/torch-geometric.

May I know if there are any other ways I could get my hands on the training data?

Thank you.

Model Retraining

Hi,

Great work!

I have a question about the model retraining.

To retrain the model on a new dataset, how should the data be structured? In its current state, DeepDock loads an entire .tar archive (which I suppose contains protein and ligand structures) for training. However, we did not manage to decompress it, so it is unclear how the files should be organised to train the algorithm de novo (i.e., on a new training set).

Thank you!

dataset download issue

Hi,
I found that the training set and test set data downloaded by data/get_deepdock_data.sh are corrupted. Could you update the dataset?
Thanks.
David.
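
In case it helps narrow down whether the download or the extraction is the problem, a small checksum helper (purely illustrative; this thread does not show any reference checksums shipped with the repo) to compare a re-downloaded copy against the first one:

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Return the MD5 hex digest of a file, read in chunks to avoid
    # loading the whole archive into memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

print(md5sum("dataset_deepdock_pdbbind_v2019_16K.tar"))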

Error while building docker container

Hi,

I've tried to build the Docker container and I get this error:

Step 22/26 : RUN ["wget", "-O", "reduce.gz", "http://kinemage.biochem.duke.edu/php/downlode-3.php?filename=/../downloads/software/reduce31/reduce.3.23.130521.linuxi386.gz"]
---> Running in d2006b089496
--2021-12-07 13:27:22--  http://kinemage.biochem.duke.edu/php/downlode-3.php?filename=/../downloads/software/reduce31/reduce.3.23.130521.linuxi386.gz
Resolving kinemage.biochem.duke.edu (kinemage.biochem.duke.edu)... 40.76.186.240
Connecting to kinemage.biochem.duke.edu (kinemage.biochem.duke.edu)|40.76.186.240|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-12-07 13:27:22 ERROR 404: Not Found.

The command 'wget -O reduce.gz http://kinemage.biochem.duke.edu/php/downlode-3.php?filename=/../downloads/software/reduce31/reduce.3.23.130521.linuxi386.gz' returned a non-zero code: 8
Looking forward to testing deepdock ^^

S

libcusparse issue

I got an error about libcusparse when I ran Train_DeepDock.
Could you please help me solve this issue?
(screenshot attached: Screenshot from 2022-12-16 08-40-35)

Large Scale Screening

Hi,

I have a question about large scale screening workflow.

Could you please suggest which would be the optimal usage of DeepDock for large scale screening?

Specifically, thousands of molecules all stored within the same .mol2 file, screened against a single protein .pdb. If you could tell us which functions we have to run, and in what order, it would be really helpful.

Thank you.
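
For what it's worth, a minimal sketch of a sequential screening loop built only from the calls that appear in Score_example.ipynb (quoted in a later issue): score_compound is called once per ligand against a precomputed target .ply. The mol2-splitting helper, the file names, and the assumption that model, target_ply and device are already set up as in that example are ours, not part of DeepDock:

from rdkit import Chem
from deepdock.DockingFunction import score_compound

def iter_mol2(path):
    # Yield RDKit molecules from a multi-molecule Tripos .mol2 file by
    # splitting on the @<TRIPOS>MOLECULE record (our own helper).
    with open(path) as f:
        blocks = f.read().split("@<TRIPOS>MOLECULE")[1:]
    for block in blocks:
        mol = Chem.MolFromMol2Block("@<TRIPOS>MOLECULE" + block,
                                    sanitize=False, cleanupSubstructures=False)
        if mol is not None:
            yield mol

scores = []
for mol in iter_mol2("ligands.mol2"):   # hypothetical multi-molecule file
    scores.append(score_compound(mol, target_ply, model,
                                  dist_threshold=3., seed=123, device=device))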

installation and examples

Hi,

I downloaded the Docker image as in:
docker pull omendezlucio/deepdock
docker run -it omendezlucio/deepdock:latest
then copied/pasted the source from
https://github.com/OptiMaL-PSE-Lab/DeepDock/blob/main/examples/Score_example.ipynb
into a Python script:

from rdkit import Chem
import deepdock
from deepdock.models import *
from deepdock.DockingFunction import score_compound
from deepdock.DockingFunction import calculate_atom_contribution
import numpy as np
import torch
np.random.seed(123)
torch.cuda.manual_seed_all(123)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
ligand_model = LigandNet(28, residual_layers=10, dropout_rate=0.10)
target_model = TargetNet(4, residual_layers=10, dropout_rate=0.10)
model = DeepDock(ligand_model, target_model, hidden_dim=64, n_gaussians=10, dropout_rate=0.10, dist_threhold=7.).to(device)
checkpoint = torch.load(deepdock.path[0]+'/../Trained_models/DeepDock_pdbbindv2019_13K_minTestLoss.chk', map_location=torch.device(device))
model.load_state_dict(checkpoint['model_state_dict'])
target_ply = deepdock.path[0]+'/../data/1z6e_protein.ply'
real_mol = Chem.MolFromMol2File(deepdock.path[0]+'/../data/1z6e_ligand.mol2',sanitize=False, cleanupSubstructures=False)
score = score_compound(real_mol, target_ply, model, dist_threshold=3., seed=123, device=device)
score

I called the script test.py and ran:
python3 test.py

This runs for a few seconds without any errors or warnings, and then exits without giving me any output.
Shouldn't the final command (score) print the score to stdout?
Speaking of commands, are there any plans to write documentation for deepdock?
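
A likely explanation for the missing output, independent of DeepDock itself: in a notebook the last expression of a cell is echoed automatically, but in a plain script a bare expression like score on its own line prints nothing, so an explicit print is needed:

print("DeepDock score:", score)   # score computed as in the script above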

When inside Docker I cannot run the Jupyter notebook, as there is no browser in this image.
Therefore I tried to install from source as outlined on the webpage.
I run:

git clone https://github.com/OptiMaL-PSE-Lab/DeepDock.git
cd DeepDock/
conda create --name mydd
conda activate mydd
git submodule update --init --recursive
conda install -c conda-forge rdkit=2019.09.1
pip install -r requirements.txt

The last command starts running, and then I get:
[...]
Looking in links: https://pytorch-geometric.com/whl/torch-1.4.0.html, https://pytorch-geometric.com/whl/torch-1.4.0.html, https://pytorch-geometric.com/whl/torch-1.4.0.html, https://pytorch-geometric.com/whl/torch-1.4.0.html
Collecting torch==1.4.0
Using cached torch-1.4.0-cp38-cp38-manylinux1_x86_64.whl (753.4 MB)
Collecting torch-scatter==2.0.4+cu101
Using cached https://data.pyg.org/whl/torch-1.4.0/torch_scatter-2.0.4%2Bcu101-cp38-cp38-linux_x86_64.whl (10.6 MB)
Discarding https://data.pyg.org/whl/torch-1.4.0/torch_scatter-2.0.4%2Bcu101-cp38-cp38-linux_x86_64.whl (from https://pytorch-geometric.com/whl/torch-1.4.0.html): Requested torch-scatter==2.0.4+cu101 from https://data.pyg.org/whl/torch-1.4.0/torch_scatter-2.0.4%2Bcu101-cp38-cp38-linux_x86_64.whl (from -r requirements.txt (line 3)) has inconsistent version: filename has '2.0.4+cu101', but metadata has '2.0.4'
ERROR: Could not find a version that satisfies the requirement torch-scatter==2.0.4+cu101 (from versions: latest+cpu, latest+cu92, latest+cu100, latest+cu101, 0.3.0, 1.0.2, 1.0.3, 1.0.4, 1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.3.0, 1.3.1, 1.3.2, 1.4.0, 2.0.2, 2.0.3, 2.0.3+cpu, 2.0.3+cu100, 2.0.3+cu101, 2.0.3+cu92, 2.0.4, 2.0.4+cpu, 2.0.4+cu100, 2.0.4+cu101, 2.0.4+cu92, 2.0.5, 2.0.6, 2.0.7, 2.0.8, 2.0.9)
ERROR: No matching distribution found for torch-scatter==2.0.4+cu101

Any suggestions on how to deal with that?
(My system is Ubuntu 20.04.4.)

Thanks!
Michael

Dataload error while running training_DeepDock.ipynb

Hello, I am trying to run training_DeepDock.ipynb.
With my environment (pytorch=1.10.2 and pyg=2.0.3), I get the following error while loading the preprocessed data:

Traceback (most recent call last):
  File "/db2/users/kyuhyunlee/DeepDock_test/train_deepdock.py", line 21, in <module>
    db_complex = PDBbind_complex_dataset(data_path=deepdock.__path__[0]+'/../data/dataset_deepdock_pdbbind_v2019_16K.tar',
  File "/db2/users/kyuhyunlee/git_repos/DeepDock/deepdock/utils/data.py", line 60, in __init__
    self.data = [i for i in self.data if not np.isnan(i[1].x.numpy().min())]
  File "/db2/users/kyuhyunlee/git_repos/DeepDock/deepdock/utils/data.py", line 60, in <listcomp>
    self.data = [i for i in self.data if not np.isnan(i[1].x.numpy().min())]
  File "/db2/users/kyuhyunlee/anaconda3/envs/py39_default/lib/python3.9/site-packages/torch_geometric/data/data.py", line 642, in x
    return self['x'] if 'x' in self._store else None
  File "/db2/users/kyuhyunlee/anaconda3/envs/py39_default/lib/python3.9/site-packages/torch_geometric/data/data.py", line 357, in __getattr__
    raise RuntimeError(
RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.

Is there any way to fix this? I know DeepDock requires older versions of pytorch and pytorch_geometric as prerequisites, but it would be nice to be able to use DeepDock with the latest versions of pytorch and pyg.

Screening power does not match

Hi,
Could you please release your Score_decoys_screening_CASF2016.csv file? I tried your checkpoint with your screening power calculation notebook scripts but cannot get the same result as shown in your paper. The code I used for the screening success rate is from CASF-16, and I checked the example data (e.g. AutoDock Vina) and got the correct results for them.
Best

Ply Files

Hi,

Once again, great work!

I have a question about the ply files used by DeepDock.

We noticed that there are ".ply" files inside the data folder. We did not understand whether these files are computed on the fly during DeepDock scoring or whether they should be computed separately before scoring. Could you elaborate?

Thank you!
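
For reference, the examples and later issues suggest the .ply files are target surface meshes generated ahead of scoring by compute_inp_surface. A minimal sketch, with the import path inferred from a traceback quoted in a later issue and the output naming not verified here (file names are the example ones):

from deepdock.prepare_target.computeTargetMesh import compute_inp_surface

target_filename = "1z6e_protein.pdb"    # protein structure (example file)
ligand_filename = "1z6e_ligand.mol2"    # bound ligand used to locate the pocket
compute_inp_surface(target_filename, ligand_filename, dist_threshold=10)
# expected to write the binding-site surface mesh (.ply) used by the scoring functions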

Charge file not generated/found

FileNotFoundError: [Errno 2] No such file or directory: '1z6e_protein_temp_out.csv'

Running the cell with compute_inp_surface(target_filename, ligand_filename, dist_threshold=10) in the Docking_example.ipynb throws this error.

Is there something that needs to be done with the multivalue binary in APBS to return the tmp_file_base+"_out.csv"?

The MULTIVALUE_BIN environment variable points to the exact path of the multivalue binary, and all the steps prior to this one work and produce output files, such as tmp_file_base + ".csv".

Also, may I ask what these three lines of code are actually doing?
multivalue = multivalue_bin + " %s %s %s"
make_multivalue = multivalue % (tmp_file_base+".csv", tmp_file_base+".dx", tmp_file_base+"_out.csv")
os.system(make_multivalue)

Thank you.
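
For context, a commented reading of those three lines, based on how the APBS multivalue tool is generally invoked (our interpretation, not confirmed by the maintainers; multivalue_bin, tmp_file_base and the os import are as in the original function): multivalue samples a .dx electrostatic potential grid at a list of coordinates and writes the sampled values to a CSV.

multivalue = multivalue_bin + " %s %s %s"                    # command template: <coords.csv> <grid.dx> <out.csv>
make_multivalue = multivalue % (tmp_file_base + ".csv",      # input: coordinates to sample (e.g. surface points)
                                tmp_file_base + ".dx",       # input: APBS electrostatic potential grid
                                tmp_file_base + "_out.csv")  # output: potential sampled at each coordinate
os.system(make_multivalue)                                   # run the external multivalue binary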

GPU utilization is very low

First, I changed the input data from type Data to type HeteroData (containing target and ligand) and replaced the dataloader with a datalistloader, in order to train the model on multiple GPUs. However, GPU utilization is extremely low during training, for example 15%-20% per GPU when I use 4 GPUs. I also trained the model on a single GPU, where GPU utilization is just 30%, only a little higher than in the multi-GPU case.
For multi-GPU, batch_size is 12; for single GPU, batch_size is 3.

Wrong RMSD calculation?

I noticed there is a problem with the docking pose generation script you provided, Docking_CASF2016_CoreSet.ipynb.

In the dock_compound function, there is this line:
result['rmsd'] = Chem.rdMolAlign.AlignMol(opt_mol, real_mol, atomMap=list(zip(opt.noHidx,opt.noHidx)))

This means you are aligning opt_mol to real_mol before calculating the RMSD; when I delete this line, the docking poses become much worse than before. This kind of alignment is only appropriate for the molecule generation task, not for the docking task, because it ignores the rotational and translational error of the ligand pose.
You should write the conformer to a file and then use obrms (from Open Babel) to calculate the real, reliable RMSD.
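
A minimal sketch of what the unaligned comparison could look like (our own helper, using the same heavy-atom index list opt.noHidx mentioned above; obrms from Open Babel remains the more standard route):

import numpy as np

def unaligned_rmsd(opt_mol, real_mol, heavy_idx):
    # RMSD between the docked and crystal poses in their original frames,
    # i.e. without any superposition, over the mapped heavy atoms.
    a = opt_mol.GetConformer().GetPositions()[heavy_idx]
    b = real_mol.GetConformer().GetPositions()[heavy_idx]
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))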

I reproduced your experiment: using your code as-is, the success rate is about 62%, matching the reported result.
After I removed the alignment line and redid the experiment, the success rate dropped to 41% (the percentage of docking poses within 2 Å RMSD of the crystal structure).

About screening power calculation

Your article "https://www.nature.com/articles/s42256-021-00409-9" is great; thank you very much for sharing the source code and test results. But I have some questions about the CASF-2016 screening power results.
In your ForwardScreeningPower_Deepdocks_3A.out:
The best ligand is found among top 1% candidates for 25 cluster(s); success rate = 43.9%
The best ligand is found among top 5% candidates for 35 cluster(s); success rate = 61.4%
The best ligand is found among top 10% candidates for 47 cluster(s); success rate = 82.5%

Why do the results show a different number of clusters instead of a uniform 57 clusters?
Looking forward to your reply, thanks!
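
For what it's worth (our reading, not an official reply): the percentages are consistent with a fixed total of 57 clusters, with the per-line count being the numerator, i.e. the number of clusters whose best ligand ranked within the given top fraction:

for found, label in [(25, "top 1%"), (35, "top 5%"), (47, "top 10%")]:
    print(f"{label}: {found}/57 = {found / 57:.1%}")
# top 1%: 25/57 = 43.9%   top 5%: 35/57 = 61.4%   top 10%: 47/57 = 82.5%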

Error when using another Ligand

I tried to dock different ligands with the proteins in the repo. Using the files in your examples, docking against 2br1_protein.pdb and 2wtv_protein.pdb worked, but docking against 1z6e_protein.pdb failed.
The error message:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    compute_inp_surface(target_filename, ligand_filename, dist_threshold=10)
  File "/DeepDock/deepdock/prepare_target/computeTargetMesh.py", line 70, in compute_inp_surface
    structure = structures[0] # 'structures' may contain several proteins in this case only one.
  File "/opt/conda/lib/python3.6/site-packages/Bio/PDB/Entity.py", line 45, in __getitem__
    return self.child_dict[id]
KeyError: 0

It seems like something went wrong when generating 1z6e_protein_15A.pdb.
My ligand file is attached: new1.mol2.zip
Thank you!
