Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
Home Page: https://arxiv.org/abs/2210.01776
License: MIT License
Exception: ('RDKit could not read the molecule ', 'data/3d_sdf/1eby_01.sdf')
Explicit valence for atom # 48 H, 4, is greater than permitted
could not process mol
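The "Explicit valence ... greater than permitted" failure means RDKit's sanitization rejects the molecule as drawn in the SDF (here a hydrogen with four bonds, usually a file problem). One commonly used workaround, sketched below under the assumption that the structure is otherwise usable, is to read without sanitization and then sanitize while skipping the valence/property check; `read_mol_forgiving` is an illustrative helper name, not part of the repo:

```python
def read_mol_forgiving(path):
    """Try a normal sanitized read first; on failure, re-read with
    sanitize=False and run a partial sanitization that skips the
    valence check (SANITIZE_PROPERTIES). Illustrative workaround only;
    the underlying file may still need manual fixing."""
    from rdkit import Chem  # imported here so the helper degrades gracefully

    mol = Chem.MolFromMolFile(path)  # returns None if sanitization fails
    if mol is None:
        mol = Chem.MolFromMolFile(path, sanitize=False)
        if mol is not None:
            Chem.SanitizeMol(
                mol,
                Chem.SanitizeFlags.SANITIZE_ALL
                ^ Chem.SanitizeFlags.SANITIZE_PROPERTIES,
            )
    return mol
```

Whether the resulting molecule is chemically sensible still has to be checked by hand; skipping sanitization hides the problem rather than fixing it.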
When training a model myself, what is the difference between the score model and the confidence model? Why are these two models needed?
Is the code in 'datasets.geom' used for pretraining?
https://anonymous.4open.science/r/DiffDock/datasets/pdbbind.py
from datasets.geom import GeomNoiseTransform, Geom
The Zenodo dataset link doesn't work in China. Please add an alternate download site if possible, e.g. a Google Drive mirror.
I followed the README, but after running inference on the whole dataset the log says 80 complexes failed. Is something wrong, or is this expected behavior?
Hi, this is great work. When I run inference, I get the following error:
radius molecule: mean 7.5976667404174805, std 0.0, max 7.5976667404174805
distance protein-mol: mean 12.849800109863281, std 0.0, max 12.849800109863281
rmsd matching: mean 0.0, std 0.0, max 0
common t schedule [1. 0.5]
Size of test dataset: 1
0it [00:00, ?it/s]Failed on ['data/7rfw_receptor.pdb____data/7rfw_ligand.mol2'] index 2 is out of bounds for axis 0 with size 2
1it [01:09, 70.00s/it]
Failed for 1 complexes
Skipped 0 complexes
Results are in results/user_predictions_small
[2]- Killed python -m inference --protein_path data/7rfw_receptor.pdb --ligand data/7rfw_ligand.mol2 --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
I installed DiffDock from GitHub and fair-esm from pip, and unzipped esm into the DiffDock directory.
When I try to run extract.py as given in the README, I get the following error:
HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
Traceback (most recent call last):
File "/home/cadd/DiffDock-main/esm/scripts/extract.py", line 136, in
main(args)
File "/home/cadd/DiffDock-main/esm/scripts/extract.py", line 77, in main
dataset, collate_fn=alphabet.get_batch_converter(args.truncation_seq_length), batch_sampler=batches
TypeError: get_batch_converter() takes 1 positional argument but 2 were given
$python -m inference --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
Precomputing and saving to cache SO(3) distribution table
Precomputing and saving to cache torus distribution table
100%|█████████████████████████████████████████| 201/201 [00:45<00:00, 4.39it/s]
100%|█████████████████████████████████████████| 201/201 [00:55<00:00, 3.59it/s]
/home/xzhang/projects/DiffDock/utils/torus.py:39: RuntimeWarning: invalid value encountered in divide
score_ = grad(x, sigma[:, None], N=100) / p_
Reading molecules and generating local structures with RDKit (unless --keep_local_structures is turned on).
0it [00:00, ?it/s]rdkit coords could not be generated without using random coords. using random coords now.
6it [00:01, 3.74it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 100%|██████████████████████████| 6/6 [00:02<00:00, 2.81it/s]
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings863911206/heterographs.pkl
Number of complexes: 6
radius protein: mean 33.793853759765625, std 14.15740966796875, max 53.81545639038086
radius molecule: mean 6.548925876617432, std 3.7833714485168457, max 14.822683334350586
distance protein-mol: mean 59.33454513549805, std 22.522035598754883, max 75.89938354492188
rmsd matching: mean 0.0, std 0.0, max 0
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
Reading molecules and generating local structures with RDKit (unless --keep_local_structures is turned on).
0it [00:00, ?it/s]rdkit coords could not be generated without using random coords. using random coords now.
6it [00:01, 3.53it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 100%|██████████████████████████| 6/6 [00:02<00:00, 2.82it/s]
loading data from memory: data/cache_torsion_allatoms/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_atomRad5_atomMax8_esmEmbeddings863911206/heterographs.pkl
Number of complexes: 6
radius protein: mean 33.793853759765625, std 14.15740966796875, max 53.81545639038086
radius molecule: mean 6.218267917633057, std 2.776658773422241, max 12.076842308044434
distance protein-mol: mean 58.96320724487305, std 22.422767639160156, max 75.24571990966797
rmsd matching: mean 0.0, std 0.0, max 0
common t schedule [1. 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35
0.3 0.25 0.2 0.15 0.1 0.05]
Size of test dataset: 6
0it [00:00, ?it/s]/home/xzhang/miniconda3/envs/diffdock/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
To report the issue, try enable logging via setting the envvariable export PYTORCH_JIT_LOG_LEVEL=manager.cpp
(Triggered internally at /opt/conda/conda-bld/pytorch_1659484809662/work/torch/csrc/jit/codegen/cuda/manager.cpp:237.)
sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
5it [03:45, 30.19s/it]Failed on ['data/PDBBind_processed/6mo8/6mo8_protein_processed.pdb____data/PDBBind_processed/6hld/6hld_ligand.mol2'] Invariant Violation
no eligible neighbors for chiral center
Violation occurred on line 213 in file Code/GraphMol/FileParsers/MolFileStereochem.cpp
Failed Expression: nbrScores.size()
RDKIT: 2022.09.1
BOOST: 1_78
6it [04:24, 44.08s/it]
Failed for 1 complexes
Skipped 0 complexes
Results are in results/user_predictions_small
There is only one file in folder "index5_data-PDBBind_processed-6mo8-6mo8_protein_processed.pdb____data-PDBBind_processed-6hld-6hld_ligand.mol2", but there are 41 files in the other 5 folders. Did I run the inference correctly? Thanks
Thanks for your repo. When I followed your README and tried to create a conda env using "conda env create", I got:
Pip subprocess error:
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [73 lines of output]
failed
CondaEnvException: Pip failed
confidence_train.py: error: unrecognized arguments: --inf_sched_alpha 1 --inf_sched_beta 1 --tr_sigma_min 0.1 --tr_sigma_max 34 --rot_sigma_min 0.03 --rot_sigma_max 1.55
Hi,
Would you please give some hints about the calculation process of the translation score?
Thanks in advance.
Line 50 in f6094e7
While going through the steps described in README I am getting an error. See below for the full trace...
[It is on macOS Monterey, Version 12.6 (21G115)]
BTW: It should be
python scripts/extract.py esm2_t33_650M_UR50D ../data/pdbbind_sequences.fasta embeddings_output --repr_layers 33 --include per_tok
instead of
python scripts/extract.py esm2_t33_650M_UR50D data/pdbbind_sequences.fasta embeddings_output --repr_layers 33 --include per_tok
Full trace:
(diffdock) $ python -m inference --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings863911206/heterographs.pkl
Number of complexes: 0
/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:257: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/ryszard/repos/DiffDock/inference.py", line 81, in <module>
test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions,
File "/Users/ryszard/repos/DiffDock/datasets/pdbbind.py", line 111, in __init__
print_statistics(self.complex_graphs)
File "/Users/ryszard/repos/DiffDock/datasets/pdbbind.py", line 361, in print_statistics
print(f"{name[i]}: mean {np.mean(array)}, std {np.std(array)}, max {np.max(array)}")
File "<__array_function__ internals>", line 180, in amax
File "/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2793, in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File "/Users/ryszard/miniconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
(diffdock) $
Hi, when I run inference the process is killed, and no other explanation is offered, so I don't know where to start debugging. Thank you for your time. Here is my log:
~/my_prj/diffdock/DiffDock$ python -m inference --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small --inference_steps 2 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings492466589/heterographs.pkl
Number of complexes: 5
radius protein: mean 29.789531707763672, std 12.012960433959961, max 53.81545639038086
radius molecule: mean 6.223214149475098, std 3.113785982131958, max 12.253068923950195
distance protein-mol: mean 65.75000762939453, std 19.175703048706055, max 76.23456573486328
rmsd matching: mean 0.0, std 0.0, max 0
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
loading data from memory: data/cache_torsion_allatoms/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_atomRad5_atomMax8_esmEmbeddings492466589/heterographs.pkl
Number of complexes: 5
radius protein: mean 29.789531707763672, std 12.012960433959961, max 53.81545639038086
radius molecule: mean 6.45449686050415, std 3.5746219158172607, max 13.436868667602539
distance protein-mol: mean 65.57249450683594, std 19.06147575378418, max 75.89845275878906
rmsd matching: mean 0.0, std 0.0, max 0
common t schedule [1. 0.5]
Size of test dataset: 5
**0it [00:00, ?it/s]Killed**
Hi,
Thank you for the great project!
I followed the README to try the project but ran into a confusing issue:
"Failed on [xxx] Cannot create a consistent method resolution order (MRO) for bases Batch, Batch"
when I run "Using the provided model weights for evaluation," i.e.,
python -m inference --protein_ligand_csv data/testset_csv.csv --out_dir results/user_predictions_testset --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
Would you mind giving some comments/suggestions to solve this problem?
Before using DiffDock for inference, would you suggest running the same obabel and reduce preprocessing steps on the receptor that were used in EquiBind? I didn't see anything mentioned in the documentation, but I still saw some references to the same _protein_obabel_reduce filenames, so I figured I'd ask.
Hi,
I performed docking with the inference protocol and obtained several result files named like rank1_confidence-0.22.sdf.
Now I want to evaluate the results based on the confidence score, but I couldn't find any cutoff values for it.
Is there a cutoff or rule-of-thumb value for result selection?
Sincerely,
Thank you for the great project.
I have tried to dock the histamine H1 receptor (protein) with diphenhydramine (ligand), but I got a strange pose for the ligand.
This file is complex after docking.
H1_diphenhydramine_DiffDock.pdb.zip
After that, I tried to use Schrödinger to visualize the output in more detail and got the result below.
The problem is that the ligand is completely deformed, and the original ligand is no longer recognizable (as you can see in the image above).
Do you have any comments/suggestions to solve this problem?
LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening
https://pubs.acs.org/doi/10.1021/acs.jcim.0c00155
Thanks for the great work!
We noticed a potentially major typo in the code.
DiffDock/models/score_model.py
Line 298 in f8d67b5
center_edge_index[0] will give us data['ligand'].batch (as defined below)
DiffDock/models/score_model.py
Line 403 in f8d67b5
But this doesn't make sense here, since we don't need the features from the graph of protein/ligand pair 0 to make a prediction for protein/ligand pair 1. lig_node_attr has dimension n_atoms × n_features, meaning that pair 1 would get features from pair 0.
A typo like this should have a large effect on the results, but somehow it doesn't look like it does, so we are confused. Maybe our understanding is not correct.
Wei
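To make the batching concern above concrete: pooling a flattened [n_atoms × n_features] node matrix into per-complex predictions should only mix nodes that share the same batch index. A minimal pure-Python sketch of that invariant (lists stand in for tensors; this is not the repo's actual pooling code):

```python
def scatter_mean(node_feats, batch, num_graphs):
    """Average node features per graph according to a batch index vector,
    the way a batched graph library pools a flattened node matrix.
    Each output row depends only on nodes whose batch index matches it."""
    dim = len(node_feats[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feat, graph_id in zip(node_feats, batch):
        counts[graph_id] += 1
        for j, value in enumerate(feat):
            sums[graph_id][j] += value
    return [[s / c for s in row] for row, c in zip(sums, counts)]


# Nodes 0 and 1 belong to complex 0, node 2 to complex 1:
pooled = scatter_mean([[2.0], [4.0], [10.0]], [0, 0, 1], 2)
```

If the edge index used for pooling were wrong, features would leak across complexes, which is exactly the cross-pair contamination the comment describes.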
Hi, very interesting model. But here are a few suggestions from us drug design developers:
I saw issue #13 was very similar, but this doesn't seem to be the same problem; I have the updated version of pdbbind.py.
Reading molecules and generating local structures with RDKit
75%|█████████████████████████████▎ | 2315/3080 [01:20<00:18, 42.04it/s]rdkit coords could not be generated without using random coords. using random coords now.
75%|█████████████████████████████▎ | 2317/3080 [02:35<00:51, 14.90it/s]
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/DiffDock/inference.py", line 81, in <module>
test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions,
File "/home/ubuntu/DiffDock/datasets/pdbbind.py", line 102, in __init__
self.inference_preprocessing()
File "/home/ubuntu/DiffDock/datasets/pdbbind.py", line 203, in inference_preprocessing
generate_conformer(mol)
File "/home/ubuntu/DiffDock/datasets/process_mols.py", line 276, in generate_conformer
AllChem.MMFFOptimizeMolecule(mol, confId=0)
ValueError: Bad Conformer Id
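"Bad Conformer Id" typically appears when an optimization step runs after conformer embedding has already failed (RDKit's embed returns -1 on failure, so there is no conformer 0 to optimize). A hedged sketch of the guard/retry pattern, with `embed` and `optimize` as stand-ins for `AllChem.EmbedMolecule` / `AllChem.MMFFOptimizeMolecule` so the shape of the fix is visible without RDKit:

```python
def generate_conformer_safely(embed, optimize, max_tries=3):
    """Only optimize when embedding actually produced a conformer.
    `embed(use_random_coords=...)` stands in for AllChem.EmbedMolecule,
    which returns the new conformer id, or -1 on failure; calling the
    optimizer after a failed embed is what raises 'Bad Conformer Id'."""
    for attempt in range(max_tries):
        conf_id = embed(use_random_coords=attempt > 0)  # retry with random coords
        if conf_id >= 0:  # embedding succeeded
            optimize(conf_id)
            return conf_id
    return -1  # caller must handle total failure explicitly
```

The actual fix in the codebase would be to check the return value of the embedding call before `MMFFOptimizeMolecule(mol, confId=0)` is invoked.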
Why can't pip find the required versions for many of the packages?
Running on the example works fine.
Running on new structures in a CSV file with full paths, etc., I get this error (even though I'm not running on PDBBind data):
Run cmd:
python -m inference --protein_ligand_csv data/diffdock_paths.csv --out_dir results/glycans_local --inference_steps 20 --samples_per_complex 40 --batch_size 10
File "/home/jadolfbr/.conda/envs/diffdock/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jadolfbr/.conda/envs/diffdock/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jadolfbr/DiffDock/inference.py", line 81, in <module>
test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions,
File "/home/jadolfbr/DiffDock/datasets/pdbbind.py", line 102, in __init__
self.inference_preprocessing()
File "/home/jadolfbr/DiffDock/datasets/pdbbind.py", line 208, in inference_preprocessing
mol.RemoveAllConformers()
AttributeError: 'ValueError' object has no attribute 'RemoveAllConformers'
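This AttributeError suggests a caught exception object ended up stored where a Mol was expected, so a later call like `mol.RemoveAllConformers()` hits a `ValueError` instance instead of a molecule. A sketch of the guarding pattern that avoids this, with `parse` as an illustrative stand-in for the RDKit reader (not the repo's actual function):

```python
def load_all(parse, paths):
    """Guarded molecule loading: skip entries that fail to parse instead
    of reusing the caught exception object as if it were a Mol (which is
    what produces "'ValueError' object has no attribute
    'RemoveAllConformers'"). `parse` stands in for an RDKit reader."""
    mols, skipped = [], []
    for path in paths:
        try:
            mols.append(parse(path))
        except ValueError as err:
            skipped.append((path, str(err)))  # record the failure, don't keep err as a Mol
    return mols, skipped
```

Downstream code then iterates over `mols` only, and the skipped list can be reported at the end of preprocessing.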
Hello! Really great paper and a very nice interface for production. Much simpler than EquiBind, and much easier to run on individual molecules or sets of molecules.
I was wondering if there is any way to constrain the docking around a pocket as well as the overall flexibility of input ligands while using the option to use input structures. Sometimes for larger molecules, it is necessary to keep the input ligand mostly resembling the input structure.
Thanks.
Hello,
Thanks for your awesome work.
I met the following error when training the model:
loading data from memory: data/cache_torsion/limit0_INDEXtimesplit_no_lig_overlap_train_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings/heterographs.pkl
Number of complexes: 16271
radius protein: mean 35.324798583984375, std 10.894360542297363, max 140.3852081298828
radius molecule: mean 7.464962482452393, std 3.125143051147461, max 28.649322509765625
distance protein-mol: mean 12.960723876953125, std 6.1995849609375, max 70.93856811523438
rmsd matching: mean 0.5280884495331837, std 0.5165252914283075, max 6.0405902977233294
loading data from memory: data/cache_torsion/limit0_INDEXtimesplit_no_lig_overlap_val_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings/heterographs.pkl
Number of complexes: 955
radius protein: mean 35.945213317871094, std 11.460107803344727, max 92.777587890625
radius molecule: mean 7.608007431030273, std 3.1059141159057617, max 21.896770477294922
distance protein-mol: mean 13.32070541381836, std 6.682881832122803, max 54.33257293701172
rmsd matching: mean 0.5830034745709728, std 0.6322264833323166, max 5.280259896200438
Model with 20248214 parameters
Starting training...
Run name: big_score_model
0%| | 0/1017 [00:13<?, ?it/s]
Traceback (most recent call last):
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/chenshoufa/workspace/DiffDock/train.py", line 158, in <module>
main_function()
File "/home/chenshoufa/workspace/DiffDock/train.py", line 153, in main_function
train(args, model, optimizer, scheduler, ema_weights, train_loader, val_loader, t_to_sigma, run_dir)
File "/home/chenshoufa/workspace/DiffDock/train.py", line 35, in train
train_losses = train_epoch(model, train_loader, optimizer, device, t_to_sigma, loss_fn, ema_weights)
File "/home/chenshoufa/workspace/DiffDock/utils/training.py", line 128, in train_epoch
raise e
File "/home/chenshoufa/workspace/DiffDock/utils/training.py", line 105, in train_epoch
tr_pred, rot_pred, tor_pred = model(data)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/data_parallel.py", line 70, in forward
outputs = self.parallel_apply(replicas, inputs, None)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chenshoufa/workspace/DiffDock/models/score_model.py", line 299, in forward
global_pred = self.final_conv(lig_node_attr, center_edge_index, center_edge_attr, center_edge_sh, out_nodes=data.num_graphs)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chenshoufa/workspace/DiffDock/models/score_model.py", line 88, in forward
out = self.batch_norm(out)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chenshoufa/anaconda3/envs/diffdock/lib/python3.9/site-packages/e3nn/nn/_batchnorm.py", line 178, in forward
torch.cat(new_means, out=self.running_mean)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
I followed all the instructions in the README, and cloned and evaluated three times using the checkpoint you provided, with the same evaluation command as in the README. The dataset I am using is Version 2 on Zenodo. The percentage of RMSDs below 2 Å is always 17%, filtered it is between 32% and 34%, and top-5 filtered between 38% and 40%. In the paper the results are all higher than these. The full results are below:
run_times_std 21.73
run_times_mean 47.04
steric_clash_fraction 8.37
self_intersect_fraction 0.58
mean_rmsd 19.127588073855097
rmsds_below_2 17.09366391184573
rmsds_below_5 50.874655647382916
rmsds_percentile_25 2.68
rmsds_percentile_50 4.91
rmsds_percentile_75 9.19
mean_centroid 16.57
centroid_below_2 54.66
centroid_below_5 77.22
centroid_percentile_25 0.8
centroid_percentile_50 1.75
centroid_percentile_75 4.3
top5_steric_clash_fraction 6.61
top5_self_intersect_fraction 0.0
top5_rmsds_below_2 31.68
top5_rmsds_below_5 69.97
top5_rmsds_percentile_25 1.68
top5_rmsds_percentile_50 3.17
top5_rmsds_percentile_75 5.73
top5_centroid_below_2 67.77
top5_centroid_below_5 85.4
top5_centroid_percentile_25 0.59
top5_centroid_percentile_50 1.28
top5_centroid_percentile_75 2.59
top10_steric_clash_fraction 6.89
top10_self_intersect_fraction 0.0
top10_rmsds_below_2 38.29
top10_rmsds_below_5 73.83
top10_rmsds_percentile_25 1.52
top10_rmsds_percentile_50 2.74
top10_rmsds_percentile_75 5.21
top10_centroid_below_2 70.52
top10_centroid_below_5 86.78
top10_centroid_percentile_25 0.51
top10_centroid_percentile_50 1.09
top10_centroid_percentile_75 2.37
filtered_self_intersect_fraction 0.55
filtered_steric_clash_fraction 2.48
filtered_rmsds_below_2 32.78
filtered_rmsds_below_5 59.5
filtered_rmsds_percentile_25 1.63
filtered_rmsds_percentile_50 3.48
filtered_rmsds_percentile_75 7.91
filtered_centroid_below_2 62.26
filtered_centroid_below_5 79.89
filtered_centroid_percentile_25 0.55
filtered_centroid_percentile_50 1.26
filtered_centroid_percentile_75 3.3
top5_filtered_self_intersect_fraction 4.68
top5_filtered_steric_clash_fraction 4.68
top5_filtered_rmsds_below_2 39.94
top5_filtered_rmsds_below_5 73.0
top5_filtered_rmsds_percentile_25 1.45
top5_filtered_rmsds_percentile_50 2.55
top5_filtered_rmsds_percentile_75 5.25
top5_filtered_centroid_below_2 69.97
top5_filtered_centroid_below_5 86.5
top5_filtered_centroid_percentile_25 0.46
top5_filtered_centroid_percentile_50 1.05
top5_filtered_centroid_percentile_75 2.43
top10_filtered_self_intersect_fraction 4.13
top10_filtered_steric_clash_fraction 4.13
top10_filtered_rmsds_below_2 42.7
top10_filtered_rmsds_below_5 75.76
top10_filtered_rmsds_percentile_25 1.4
top10_filtered_rmsds_percentile_50 2.43
top10_filtered_rmsds_percentile_75 4.78
top10_filtered_centroid_below_2 72.45
top10_filtered_centroid_below_5 87.6
top10_filtered_centroid_percentile_25 0.45
top10_filtered_centroid_percentile_50 0.96
top10_filtered_centroid_percentile_75 2.15
no_overlap_run_times_std 21.73
no_overlap_run_times_mean 47.04
no_overlap_steric_clash_fraction 12.05
no_overlap_self_intersect_fraction 0.49
no_overlap_mean_rmsd 13.51542289628426
no_overlap_rmsds_below_2 6.09375
no_overlap_rmsds_below_5 32.29166666666667
no_overlap_rmsds_percentile_25 4.09
no_overlap_rmsds_percentile_50 7.82
no_overlap_rmsds_percentile_75 20.88
no_overlap_mean_centroid 11.02
no_overlap_centroid_below_2 33.66
no_overlap_centroid_below_5 56.94
no_overlap_centroid_percentile_25 1.48
no_overlap_centroid_percentile_50 3.52
no_overlap_centroid_percentile_75 19.7
no_overlap_top5_steric_clash_fraction 9.03
no_overlap_top5_self_intersect_fraction 0.0
no_overlap_top5_rmsds_below_2 13.19
no_overlap_top5_rmsds_below_5 52.08
no_overlap_top5_rmsds_percentile_25 2.62
no_overlap_top5_rmsds_percentile_50 4.82
no_overlap_top5_rmsds_percentile_75 8.93
no_overlap_top5_centroid_below_2 47.22
no_overlap_top5_centroid_below_5 70.14
no_overlap_top5_centroid_percentile_25 1.11
no_overlap_top5_centroid_percentile_50 2.24
no_overlap_top5_centroid_percentile_75 5.99
no_overlap_top10_steric_clash_fraction 9.03
no_overlap_top10_self_intersect_fraction 0.0
no_overlap_top10_rmsds_below_2 16.67
no_overlap_top10_rmsds_below_5 55.56
no_overlap_top10_rmsds_percentile_25 2.45
no_overlap_top10_rmsds_percentile_50 4.19
no_overlap_top10_rmsds_percentile_75 8.05
no_overlap_top10_centroid_below_2 49.31
no_overlap_top10_centroid_below_5 72.22
no_overlap_top10_centroid_percentile_25 0.96
no_overlap_top10_centroid_percentile_50 2.06
no_overlap_top10_centroid_percentile_75 5.64
no_overlap_filtered_self_intersect_fraction 0.0
no_overlap_filtered_steric_clash_fraction 4.86
no_overlap_mean_filtered_rmsds 12.111509165359777
no_overlap_filtered_rmsds_below_2 15.28
no_overlap_filtered_rmsds_below_5 38.89
no_overlap_filtered_rmsds_percentile_25 2.73
no_overlap_filtered_rmsds_percentile_50 6.79
no_overlap_filtered_rmsds_percentile_75 16.45
no_overlap_mean_filtered_centroid 9.79313355364492
no_overlap_filtered_centroid_below_2 41.67
no_overlap_filtered_centroid_below_5 61.11
no_overlap_filtered_centroid_percentile_25 0.94
no_overlap_filtered_centroid_percentile_50 2.82
no_overlap_filtered_centroid_percentile_75 14.3
no_overlap_top5_filtered_self_intersect_fraction 6.25
no_overlap_top5_filtered_steric_clash_fraction 6.25
no_overlap_top5_filtered_rmsds_below_2 22.92
no_overlap_top5_filtered_rmsds_below_5 56.94
no_overlap_top5_filtered_rmsds_percentile_25 2.11
no_overlap_top5_filtered_rmsds_percentile_50 4.28
no_overlap_top5_filtered_rmsds_percentile_75 9.13
no_overlap_top5_filtered_centroid_below_2 51.39
no_overlap_top5_filtered_centroid_below_5 72.22
no_overlap_top5_filtered_centroid_percentile_25 0.8
no_overlap_top5_filtered_centroid_percentile_50 1.86
no_overlap_top5_filtered_centroid_percentile_75 6.24
no_overlap_top10_filtered_self_intersect_fraction 6.94
no_overlap_top10_filtered_steric_clash_fraction 6.94
no_overlap_top10_filtered_rmsds_below_2 25.69
no_overlap_top10_filtered_rmsds_below_5 60.42
no_overlap_top10_filtered_rmsds_percentile_25 1.98
no_overlap_top10_filtered_rmsds_percentile_50 3.96
no_overlap_top10_filtered_rmsds_percentile_75 7.83
no_overlap_top10_filtered_centroid_below_2 54.86
no_overlap_top10_filtered_centroid_below_5 73.61
no_overlap_top10_filtered_centroid_percentile_25 0.8
no_overlap_top10_filtered_centroid_percentile_50 1.74
no_overlap_top10_filtered_centroid_percentile_75 5.46
Could you please take a look? I am not sure if this is caused by changes to the original code.
command: /usr/bin/python2.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/user/Documents/CONDA/DiffDock/DiffDock/esm/setup.py'"'"'; file='"'"'/home/user/Documents/CONDA/DiffDock/DiffDock/esm/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps --user --prefix=
cwd: /home/user/Documents/CONDA/DiffDock/DiffDock/esm/
Complete output (6 lines):
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: option --user not recognized
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python2.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/user/Documents/CONDA/DiffDock/DiffDock/esm/setup.py'"'"'; file='"'"'/home/user/Documents/CONDA/DiffDock/DiffDock/esm/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps --user --prefix= Check the logs for full command output.
When I run the training code on the small score model, it throws an exception during the preprocessing
loading complexes: 3%|▎ | 493/16379 [43:30<23:21:46, 5.29s/it]
Traceback (most recent call last):
File "anaconda3/envs/diffdock/lib/python3.10/site-packages/scipy/optimize/_differentialevolution.py", line 1116, in _calculate_population_energies
calc_energies = list(
File "anaconda3/envs/diffdock/lib/python3.10/site-packages/scipy/_lib/_util.py", line 407, in __call__
return self.f(x, *self.args)
File "diffdock/datasets/conformer_matching.py", line 60, in score_conformation
SetDihedral(self.mol.GetConformer(self.probe_id), r, values[i])
ValueError: Bad Conformer Id
This is the exception thrown in the conformer matching file. I am not sure if it is only me or an issue in the code. Could you please take a look? Have you seen this exception before?
I installed this in a fresh environment using the provided environment.yml, but it fails on inference. I attached the two files (renamed to .txt for upload).
The esm embedding step works:
(diffdock) $ HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
Downloading: "https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt" to esm/model_weights/.cache/torch/hub/checkpoints/esm2_t33_650M_UR50D.pt
Downloading: "https://dl.fbaipublicfiles.com/fair-esm/regression/esm2_t33_650M_UR50D-contact-regression.pt" to esm/model_weights/.cache/torch/hub/checkpoints/esm2_t33_650M_UR50D-contact-regression.pt
Transferred model to GPU
Read data/prepared_for_esm.fasta with 2 sequences
Processing 1 of 1 batches (2 sequences)
I get the following error when executing this command in the root dir of the project. I did not download the data and used single .sdf and .pdb files, which according to the README should work. data/esm2_output contains two .pt files: 1cbr_protein.pdb_chain_0.pt and 1cbr_protein.pdb_chain_1.pt
(diffdock) $ python -m inference --ligand_path examples/1cbr_ligand.sdf --protein_path examples/1cbr_protein.pdb --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10
Traceback (most recent call last):
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/inference.py", line 16, in <module>
from datasets.pdbbind import PDBBind
File "/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/datasets/pdbbind.py", line 22, in <module>
from utils.utils import read_strings_from_txt
File "/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/utils/utils.py", line 12, in <module>
from torch_geometric.nn.data_parallel import DataParallel
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/__init__.py", line 3, in <module>
from .sequential import Sequential
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/sequential.py", line 8, in <module>
from torch_geometric.nn.conv.utils.jit import class_from_module_repr
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/conv/__init__.py", line 25, in <module>
from .spline_conv import SplineConv
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/conv/spline_conv.py", line 16, in <module>
from torch_spline_conv import spline_basis, spline_weighting
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_spline_conv/__init__.py", line 11, in <module>
torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
AttributeError: 'NoneType' object has no attribute 'origin'
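For context on the last two lines of this traceback: torch_spline_conv asks importlib for the spec of its compiled extension and reads .origin without checking for None, so the AttributeError generally means the compiled extension could not be located for the installed PyTorch/CUDA combination. A small sketch of that failure mode (the module name in the example is made up for illustration):

```python
import importlib.machinery

# PathFinder().find_spec returns None when a module cannot be located;
# torch_spline_conv reads .origin on that result without checking,
# which produces exactly the AttributeError in the traceback above.
spec = importlib.machinery.PathFinder().find_spec("definitely_not_installed_module")
print(spec)  # None -> spec.origin would raise AttributeError

# A defensive check before loading a compiled extension (sketch):
if spec is None:
    print("compiled extension missing: reinstall torch-spline-conv "
          "with wheels matching your torch/CUDA version")
```

In other words, the import error is a symptom: the torch-spline-conv binary wheel does not match the installed torch build, so its compiled module is simply absent from the environment.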
I have been trying to get DiffDock installed on a Windows server so I can test it with our structures and ligands of interest.
I have been running into difficulties. After getting everything installed and all the prerequisite packages working, I get the following errors/failures. They occur whether I use an .sdf or a .mol2 file for the ligand. And even when I include a SMILES code and the program supposedly completes, the results make no sense: the molecule basically blows apart, or is nowhere near the target PDB.
'''
python -m inference --protein_ligand_csv data/protein_ligand_trial3_csv.csv --out_dir results/user_predictions_small3 --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
loading data from memory: data/cache_torsion\limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings3467677806\heterographs.pkl
Number of complexes: 1
radius protein: mean 25.799917221069336, std 0.0, max 25.799917221069336
radius molecule: mean 3.531266689300537, std 0.0, max 3.531266689300537
distance protein-mol: mean 11.676636695861816, std 0.0, max 11.676636695861816
rmsd matching: mean 0.0, std 0.0, max 0
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
loading data from memory: data/cache_torsion_allatoms\limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_atomRad5_atomMax8_esmEmbeddings3467677806\heterographs.pkl
Number of complexes: 1
radius protein: mean 25.799917221069336, std 0.0, max 25.799917221069336
radius molecule: mean 3.7641730308532715, std 0.0, max 3.7641730308532715
distance protein-mol: mean 11.22496223449707, std 0.0, max 11.22496223449707
rmsd matching: mean 0.0, std 0.0, max 0
common t schedule [1. 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35
0.3 0.25 0.2 0.15 0.1 0.05]
Size of test dataset: 1
0it [00:00, ?it/s]C:\Users\XXXXXXX\Miniconda3\envs\diffdock4\lib\site-packages\e3nn\o3_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: torch::jit::fuser::cuda::compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
To report the issue, try enable logging via setting the envvariable export PYTORCH_JIT_LOG_LEVEL=manager.cpp
(Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\manager.cpp:244.)
sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
C:\Users\XXXXXXXX\DiffDock-main\utils\torsion.py:60: RuntimeWarning: invalid value encountered in true_divide
rot_vec = rot_vec * torsion_updates[idx_edge] / np.linalg.norm(rot_vec) # idx_edge!
Failed on ['data/trial-3/Ap_GST_Phi2.pdb____data/trial-3/L_glufosinate.mol2'] linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 2).
1it [00:07, 7.58s/it]
Failed for 1 complexes
Skipped 0 complexes
Results are in results/user_predictions_small3
'''
So, OK, maybe I'll just try using SMILES representations instead. When I use the isomeric SMILES string I get the same error. I am showing just the warning and error portions.
"""
C:\Users\XXXXXXX\Miniconda3\envs\diffdock4\lib\site-packages\e3nn\o3_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: torch::jit::fuser::cuda::compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
To report the issue, try enable logging via setting the envvariable export PYTORCH_JIT_LOG_LEVEL=manager.cpp
(Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\manager.cpp:244.)
sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
C:\Users\XXXXXXX\DiffDock-main\utils\torsion.py:60: RuntimeWarning: invalid value encountered in true_divide
rot_vec = rot_vec * torsion_updates[idx_edge] / np.linalg.norm(rot_vec) # idx_edge!
Failed on ['data/trial-1/Ap_GST_Phi2.pdb__C(CC(=O)NC@@HC(=O)NCC(=O)O)C@@HN'] linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 2).
1it [00:07, 7.16s/it]
Failed for 1 complexes
Skipped 0 complexes
Results are in results/user_predictions_small1
"""
However, when I list the ligand as a canonical SMILES string, it completes:
"""
python -m inference --protein_ligand_csv data/protein_ligand_trial3_csv.csv --out_dir results/user_predictions_small3 --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
Reading molecules and generating local structures with RDKit
1it [00:00, 22.30it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.97it/s]
loading data from memory: data/cache_torsion\limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings3792232284\heterographs.pkl
Number of complexes: 1
radius protein: mean 25.799917221069336, std 0.0, max 25.799917221069336
radius molecule: mean 5.837835311889648, std 0.0, max 5.837835311889648
distance protein-mol: mean 11.18027114868164, std 0.0, max 11.18027114868164
rmsd matching: mean 0.0, std 0.0, max 0
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
Reading molecules and generating local structures with RDKit
1it [00:00, 27.60it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.88it/s]
loading data from memory: data/cache_torsion_allatoms\limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_atomRad5_atomMax8_esmEmbeddings3792232284\heterographs.pkl
Number of complexes: 1
radius protein: mean 25.799917221069336, std 0.0, max 25.799917221069336
radius molecule: mean 6.153195858001709, std 0.0, max 6.153195858001709
distance protein-mol: mean 11.254096031188965, std 0.0, max 11.254096031188965
rmsd matching: mean 0.0, std 0.0, max 0
common t schedule [1. 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35
0.3 0.25 0.2 0.15 0.1 0.05]
Size of test dataset: 1
0it [00:00, ?it/s]C:\Users\XXXXXXX\Miniconda3\envs\diffdock4\lib\site-packages\e3nn\o3_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: torch::jit::fuser::cuda::compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
To report the issue, try enable logging via setting the envvariable export PYTORCH_JIT_LOG_LEVEL=manager.cpp
(Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\manager.cpp:244.)
sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
1it [00:57, 57.72s/it]
Failed for 0 complexes
Skipped 0 complexes
Results are in results/user_predictions_small3
"""
You'll notice that even when it successfully completes, I still get the FALLBACK warning.
I don't know what's going on. Many of the ligands I am interested in are chiral compounds where one isomer is active and the other is not, and I want to investigate the differences between their interactions.
Thanks for any assistance.
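On the "invalid value encountered in true_divide" warning from utils\torsion.py and the subsequent linalg.svd convergence failure: the division there is by the norm of the rotation axis, which can be zero for a degenerate bond, and the resulting NaNs then poison the downstream SVD. A minimal numpy sketch of the kind of guard that avoids this (the function name and epsilon are mine, not DiffDock's code):

```python
import numpy as np

def scaled_torsion_axis(rot_vec, update, eps=1e-12):
    # torsion.py computes rot_vec * update / ||rot_vec||; if the two atoms
    # defining the rotatable bond coincide, the norm is 0 and the division
    # yields NaN, which can later make linalg.svd fail to converge.
    norm = np.linalg.norm(rot_vec)
    if norm < eps:
        return np.zeros_like(rot_vec)  # degenerate axis: apply no rotation
    return rot_vec * update / norm

print(scaled_torsion_axis(np.zeros(3), 1.5))                 # all zeros, no NaN
print(scaled_torsion_axis(np.array([0.0, 0.0, 2.0]), 1.5))   # unit axis scaled by update
```

If you see this warning, it usually points at a degenerate ligand geometry in the input (overlapping atoms) rather than at the model itself, which is consistent with the .sdf/.mol2 inputs failing while a clean RDKit-generated conformer succeeds.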
Hi, I tried running the model for a protein-ligand complex with the following commands:
python datasets/esm_embedding_preparation.py --protein_path /brahma_hd/a7_allosteric/docking/7ekt/diffdock/7ekt.pdb --out_file data/prepared_for_esm.fasta
git clone https://github.com/facebookresearch/esm
cd esm
pip install -e .
cd ..
HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
python -m inference --out_dir /brahma_hd/a7_allosteric/docking/7ekt/diffdock/ --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise --protein_path /brahma_hd/a7_allosteric/docking/7ekt/diffdock/7ekt.pdb --ligand /brahma_hd/a7_allosteric/docking/7ekt/diffdock/EQ04.mol2
But I get this error when running inference:
Traceback (most recent call last):
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/biggin/b196/scro4068/opt/DiffDock/inference.py", line 16, in <module>
from datasets.pdbbind import PDBBind
File "/biggin/b196/scro4068/opt/DiffDock/datasets/pdbbind.py", line 22, in <module>
from utils.utils import read_strings_from_txt
File "/biggin/b196/scro4068/opt/DiffDock/utils/utils.py", line 12, in <module>
from torch_geometric.nn.data_parallel import DataParallel
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/__init__.py", line 3, in <module>
from .sequential import Sequential
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/sequential.py", line 8, in <module>
from torch_geometric.nn.conv.utils.jit import class_from_module_repr
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/conv/__init__.py", line 25, in <module>
from .spline_conv import SplineConv
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/conv/spline_conv.py", line 16, in <module>
from torch_spline_conv import spline_basis, spline_weighting
File "/biggin/b196/scro4068/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_spline_conv/__init__.py", line 11, in <module>
torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
AttributeError: 'NoneType' object has no attribute 'origin'
Any ideas?
Best regards,
Franco
It seems the instructions in the README are wrong, because --ligand_path does not exist. I tried using --ligand instead, but that throws a different error.
python -m inference --ligand_path examples/1cbr_ligand.sdf --protein_path examples/1cbr_protein.pdb --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10
100%|████████████████████████████████████████████████████████████████████████████████████████| 201/201 [01:33<00:00, 2.14it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 201/201 [02:04<00:00, 1.61it/s]
/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/utils/torus.py:38: RuntimeWarning: invalid value encountered in divide
score_ = grad(x, sigma[:, None], N=100) / p_
usage: inference.py [-h] [--config CONFIG] [--protein_ligand_csv PROTEIN_LIGAND_CSV] [--protein_path PROTEIN_PATH]
[--ligand LIGAND] [--out_dir OUT_DIR] [--esm_embeddings_path ESM_EMBEDDINGS_PATH] [--save_visualisation]
[--samples_per_complex SAMPLES_PER_COMPLEX] [--model_dir MODEL_DIR] [--ckpt CKPT]
[--confidence_model_dir CONFIDENCE_MODEL_DIR] [--confidence_ckpt CONFIDENCE_CKPT]
[--batch_size BATCH_SIZE] [--cache_path CACHE_PATH] [--no_random] [--no_final_step_noise] [--ode]
[--inference_steps INFERENCE_STEPS] [--num_workers NUM_WORKERS] [--sigma_schedule SIGMA_SCHEDULE]
[--actual_steps ACTUAL_STEPS] [--keep_local_structures]
inference.py: error: unrecognized arguments: --ligand_path examples/1cbr_ligand.sdf
Error with --ligand
python -m inference --ligand examples/1cbr_ligand.sdf --protein_path examples/1cbr_protein.pdb --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10
Reading molecules and generating local structures with RDKit
0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/duerr/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/inference.py", line 81, in <module>
test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions,
File "/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/datasets/pdbbind.py", line 102, in __init__
self.inference_preprocessing()
File "/share/lcbcsrv5/lcbcdata/duerr/PhD/08_Code/DiffDock/datasets/pdbbind.py", line 208, in inference_preprocessing
mol.RemoveAllConformers()
The Zenodo dataset has two versions; which one should be used?
Hi, congratulations on your work, it is a very interesting approach and the results are amazing!
I was able to run the PDBbind examples, but I see the following error with other input files:
Failed on ['data/protein.pdb____data/ligands/10005.sdf'] tensor_type->scalarType().has_value() INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1659484809662/work/torch/csrc/jit/codegen/cuda/type_promotion.cpp":111, please report a bug to PyTorch. Missing Scalar Type information
Do you have any idea what might be wrong?
Is there a way to include an existing ligand in a PDB structure and dock a new ligand? This is necessary to block off the pocket of the originally bound ligand in the protein PDB.
An example is an ATP/GTP-bound pocket that is inaccessible to a second ligand we would try to dock using DiffDock.
If not, is it possible to dock more than one ligand to the protein simultaneously?
The Zenodo dataset proteins are different from the original PDBBind. Can you explain how these were processed? I did not see any details in the paper or code.
Thanks!
Whenever I try to use DiffDock (I followed the instructions for creating the conda environment, and all of that seemed to work),
I get the error mentioned in the title. To be more specific, it looks like this:
$ python -m inference --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10
Traceback (most recent call last):
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/work/scratch/b_mayer/DiffDock/inference.py", line 16, in <module>
from datasets.pdbbind import PDBBind
File "/work/scratch/b_mayer/DiffDock/datasets/pdbbind.py", line 22, in <module>
from utils.utils import read_strings_from_txt
File "/work/scratch/b_mayer/DiffDock/utils/utils.py", line 12, in <module>
from torch_geometric.nn.data_parallel import DataParallel
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/__init__.py", line 3, in <module>
from .sequential import Sequential
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/sequential.py", line 8, in <module>
from torch_geometric.nn.conv.utils.jit import class_from_module_repr
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/conv/__init__.py", line 25, in <module>
from .spline_conv import SplineConv
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_geometric/nn/conv/spline_conv.py", line 16, in <module>
from torch_spline_conv import spline_basis, spline_weighting
File "/work/scratch/b_mayer/miniconda3/envs/diffdock/lib/python3.9/site-packages/torch_spline_conv/__init__.py", line 11, in <module>
torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
AttributeError: 'NoneType' object has no attribute 'origin'
I found that if I open a Python console in the DiffDock folder and do
from datasets.pdbbind import PDBBind
I get the same error as above:
>>> from datasets.pdbbind import PDBBind
(same traceback as above, ending in AttributeError: 'NoneType' object has no attribute 'origin')
Hi, could you shed some light on the error "LM embeddings for complex... did not have the right length for the protein"?
I'm trying to run on a single protein-ligand complex, and I'm providing a prepared protein PDB and ligand SDF. I can see in the code where this is generated (line 322 in c32ec5b), but I don't understand what causes that if statement to be true.
Here is the exact error:
python -m inference --protein_path data/XXX.pdb --ligand data/YYY.sdf --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
Reading molecules and generating local structures with RDKit
1it [00:00, 23.30it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 0%| | 0/1 [00:00<?, ?it/s]LM embeddings for complex data/XXX.pdb____data/YYY.sdf did not have the right length for the protein. Skipping data/XXX.pdb____data/YYY.sdf.
loading complexes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.47s/it]
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings2279325814/heterographs.pkl
Number of complexes: 0
/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/_methods.py:257: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/cluster/home/slochowe/explorations/DiffDock/inference.py", line 81, in <module>
test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions,
File "/cluster/home/slochowe/explorations/DiffDock/datasets/pdbbind.py", line 111, in __init__
print_statistics(self.complex_graphs)
File "/cluster/home/slochowe/explorations/DiffDock/datasets/pdbbind.py", line 376, in print_statistics
print(f"{name[i]}: mean {np.mean(array)}, std {np.std(array)}, max {np.max(array)}")
File "<__array_function__ internals>", line 180, in amax
File "/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2793, in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File "/cluster/home/slochowe/anaconda3/envs/diffdock/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
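One way to narrow down such a length mismatch is to compare a rough residue count from the PDB against the sequence length of the saved ESM embedding. This is only a sketch assuming a standard single-altloc PDB; the helper name and the commented torch.load usage are illustrative, not DiffDock's actual check:

```python
def count_ca_residues(pdb_path):
    # One residue per unique (chain, resSeq+iCode) that has a CA atom.
    # A complex is skipped when the protein's residue count disagrees with
    # the ESM embedding's sequence length (e.g. altlocs, insertion codes,
    # or nonstandard residues handled differently by the two pipelines).
    seen = set()
    with open(pdb_path) as fh:
        for line in fh:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                seen.add((line[21], line[22:27]))  # chain id, resSeq + iCode
    return len(seen)

# Compare with the embedding, e.g. (assumed file layout):
# emb = torch.load("data/esm2_output/XXX.pdb_chain_0.pt")
# print(count_ca_residues("data/XXX.pdb"), emb["representations"][33].shape[0])
```

If the two numbers differ, regenerating the FASTA with esm_embedding_preparation.py from the exact same PDB you pass to inference, and rerunning extract.py, is the first thing to try.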
Dear authors,
first of all let me congratulate you on the great work!
I would like to ask you about the use of the ligand hydrogens. In the paper you say the final model does not use hydrogens for the score model; did this bring a significant improvement? And doesn't the improvement come simply from there being fewer atoms to align in the RMSD computation?
Just to make sure I understand where the ligand hydrogens are lost: they are used only in the input node features, and the network does not predict their poses at all, right? So to obtain them from DiffDock, the best I can do is run DiffDock and then an external protonation tool?
Thank you very much in advance for any reply!
Petr
With AlphaFold, when pLDDT is, say, above 70, you can gain some trust in the prediction. For DiffDock, what is a confidence range in which you would "trust" the results?
When I follow the README, at the following step:
(diffdock) icer@ubuntu:~/my_prj/diffdock/DiffDock$ HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
Downloading: "https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt" to esm/model_weights/.cache/torch/hub/checkpoints/esm2_t33_650M_UR50D.pt
I found that the download hangs here and cannot continue.
Hi, when I follow this command:
HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
it gives an error:
Traceback (most recent call last):
  File "/home/icer/my_prj/diffdock/DiffDock/esm/scripts/extract.py", line 137, in <module>
    main(args)
  File "/home/icer/my_prj/diffdock/DiffDock/esm/scripts/extract.py", line 74, in main
    dataset = FastaBatchedDataset.from_file(args.fasta_file)
  File "/home/icer/my_prj/diffdock/DiffDock/esm/esm/data.py", line 39, in from_file
    with open(fasta_file, "r") as infile:
FileNotFoundError: [Errno 2] No such file or directory: 'data/prepared_for_esm.fasta'
Hello,
Thank you for the great work. I was trying the example out of the box. Namely (I managed to successfully clone the repo & install the conda environment). Then, I ran:
python datasets/esm_embedding_preparation.py --protein_ligand_csv data/protein_ligand_example_csv.csv --out_file data/prepared_for_esm.fasta
git clone https://github.com/facebookresearch/esm
cd esm
pip install -e .
cd ..
HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
However, on the last line of code, I received the error:
(diffdock) akshat@Akshat:~/Downloads/DiffDock$ HOME=esm/model_weights python esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok
Traceback (most recent call last):
File "/home/akshat/Downloads/DiffDock/esm/scripts/extract.py", line 12, in <module>
from esm import Alphabet, FastaBatchedDataset, ProteinBertModel, pretrained, MSATransformer
File "/home/akshat/Downloads/DiffDock/esm/esm/pretrained.py", line 15, in <module>
from esm.model.esm2 import ESM2
ModuleNotFoundError: No module named 'esm.model'
I would appreciate some advice.
Thank you so much! :)
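One plausible cause of this ModuleNotFoundError (an assumption on my part, not confirmed here) is that an older pip-installed fair-esm release, which predates the esm.model subpackage, shadows the freshly cloned repo on sys.path. A quick sketch to see which esm Python actually resolves:

```python
import importlib.util

# Print where the `esm` package is resolved from: if it points into
# site-packages rather than the cloned repo, a pip-installed fair-esm
# (possibly without esm.model) is shadowing the editable install.
spec = importlib.util.find_spec("esm")
if spec is None:
    print("no esm package found on sys.path")
else:
    print("esm resolves to:", spec.origin)
```

If the printed path is in site-packages, uninstalling the pip copy (or reinstalling the cloned repo with pip install -e . from the esm directory) should make the expected package win.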
Hello, I have been trying to use DiffDock with glufosinate specifically. It has a chiral center where one of the substituents is hydrogen. The SMILES codes are below.
R-glufosinate CP(=O)(CCC@HN)O
L-glufosinate CP(=O)(CCC@@HN)O
When I converted these SMILES codes to SDF and ran DiffDock with the SDFs, the results were all of the same chirality. Is there some way to define the chirality differently? I am now running it with the SMILES code itself rather than the SDF to see whether there is a difference.
Thanks
The docs state that you don't need a CSV and can use --ligand, but this is not the case:
optional arguments:
-h, --help show this help message and exit
--out_file OUT_FILE
--protein_ligand_csv PROTEIN_LIGAND_CSV
Path to a .csv specifying the input as described in the main README
--protein_path PROTEIN_PATH
Path to a single PDB file. If this is not None then it will be used instead of the --protein_ligand_csv
Thanks for the great work.
When I do "python -m evaluate --model_dir workdir/paper_score_model --ckpt best_ema_inference_epoch_model.pt --confidence_ckpt best_model_epoch75.pt --confidence_model_dir workdir/paper_confidence_model --run_name DiffDockInference --inference_steps 20 --split_path data/splits/timesplit_test --samples_per_complex 40 --batch_size 10"
I got the following message
"ModuleNotFoundError: No module named 'datasets.pdbbind' "
I think we need an empty file named "__init__.py" under the datasets folder.
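If the missing package marker is indeed the cause, creating it is a one-liner; a sketch (run from the DiffDock repository root):

```python
from pathlib import Path

# Make `datasets` an importable package by adding an empty __init__.py.
# (Assumes the current working directory is the DiffDock repo root.)
init_file = Path("datasets") / "__init__.py"
init_file.parent.mkdir(exist_ok=True)
init_file.touch(exist_ok=True)
print(init_file.exists())  # True
```

Equivalently, a plain `touch datasets/__init__.py` from the shell does the same thing.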
Hi,
Is there any way we can generate an animation (.gif) of the predicted results? Thanks in advance.
Hi, thank you for your great work!
When I run the training code, several batches output 'WARNING: weird torch_cluster error, skipping batch', and this happens for around 1/5 of all steps per epoch. The training data use the split in the repo; is this expected, or is something wrong with my training process? Also, the README sets the number of training epochs to 850; may I ask how many epochs the model actually needs to reach performance similar to the paper's?
As the code is written (--inference_steps 20 --samples_per_complex 40 --batch_size 10), it takes about 12 minutes/run to complete on a Mac M1 with 32 GB RAM. I have tried to minimize these values but have gotten less accurate results. Can you provide some insight into which parameters can be minimized without losing significant accuracy?