rf_vina_enhance

This is a course project that enhances a docking scoring function. I take deltavinarf20 as the baseline and add more non-bonded interaction information, including hydrogen_bond, water_bridge, halogen_bond, salt_bridge, pi_cation_interaction and pi_stack.

A quick setup

For a quick setup, install the dependencies listed in requirements.txt (NOTE: DO NOT FORGET TO MODIFY THE SYSTEM PATH). Several codebases also need to be installed beforehand:

deltavina https://github.com/chengwang88/deltavina.git is the repo of deltavinarf20; it provides only the model inference code.
vina4dv https://github.com/chengwang88/vina4dv.git is a fork of AutoDock Vina, which is required by deltavina.
plip https://github.com/pharmai/plip.git is a tool that extracts the non-bonded interactions between a protein and its ligand.

Feature preparation proceeds as follows.

Step 1, convert protein.pdb to protein.pdbqt using mgltools:

pythonsh /path-you-install-mgltools/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py -r xxxx_protein.pdb -o xxxx_protein.pdbqt

Step 2, calculate the vina score and the deltavinarf20 score of each complex in the PDBbind refined-set using the deltavina repo:

/path-you-install-deltavina/deltavina/bin/dvrf20.py -r xxxx_protein.pdb -l xxxx_ligand.mol2
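Steps 1 and 2 have to be repeated for every complex in the dataset. The following is a minimal sketch of how the two command lines could be generated per complex. It assumes each complex sits in its own folder named after its id and containing xxxx_protein.pdb and xxxx_ligand.mol2; the function name build_commands is hypothetical and not part of this repo, and the install paths are the same placeholders as in the commands above.

```python
from pathlib import Path

# Placeholders copied from the commands above; replace with real install paths.
PREPARE_RECEPTOR = (
    "/path-you-install-mgltools/MGLToolsPckgs/AutoDockTools/"
    "Utilities24/prepare_receptor4.py"
)
DVRF20 = "/path-you-install-deltavina/deltavina/bin/dvrf20.py"


def build_commands(dataset_dir):
    """Return (pdb_id, step1_cmd, step2_cmd) for every complex folder found.

    Assumed layout: <dataset_dir>/<id>/<id>_protein.pdb and <id>_ligand.mol2.
    """
    jobs = []
    for folder in sorted(Path(dataset_dir).iterdir()):
        pdb_id = folder.name
        protein = folder / f"{pdb_id}_protein.pdb"
        ligand = folder / f"{pdb_id}_ligand.mol2"
        if not (protein.is_file() and ligand.is_file()):
            continue  # not a complex folder (e.g. index/), or files missing
        # Step 1: receptor conversion to pdbqt with mgltools.
        step1 = ["pythonsh", PREPARE_RECEPTOR, "-r", str(protein),
                 "-o", str(protein.with_suffix(".pdbqt"))]
        # Step 2: vina and deltavinarf20 scoring with the deltavina script.
        step2 = [DVRF20, "-r", str(protein), "-l", str(ligand)]
        jobs.append((pdb_id, step1, step2))
    return jobs
```

Each returned command is a plain argument list, so it can be passed to subprocess.run(cmd, check=True) once the placeholder paths point at real installations.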

Step 3, extract the interaction features with plip by running plip_extract_feature.py (this also parses the XML file generated by plip)
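To illustrate the XML parsing part of this step, here is a minimal sketch that counts interactions per type. It is not the code of plip_extract_feature.py. It assumes the plip report layout in which each bindingsite element holds an interactions element with one group per interaction type (hydrogen_bonds, salt_bridges, and so on), and it assumes that simple counts are used as features. The SAMPLE string is hand-written for illustration, not real plip output.

```python
import xml.etree.ElementTree as ET

# Group tags under <interactions> (assumed plip report layout),
# mapped to the feature names used in this project.
GROUP_TO_FEATURE = {
    "hydrogen_bonds": "hydrogen_bond",
    "water_bridges": "water_bridge",
    "halogen_bonds": "halogen_bond",
    "salt_bridges": "salt_bridge",
    "pi_cation_interactions": "pi_cation_interaction",
    "pi_stacks": "pi_stack",
}


def count_interactions(xml_text):
    """Count interactions of each type, summed over all binding sites."""
    root = ET.fromstring(xml_text)
    counts = {feature: 0 for feature in GROUP_TO_FEATURE.values()}
    for site in root.iter("bindingsite"):
        interactions = site.find("interactions")
        if interactions is None:
            continue
        for group_tag, feature in GROUP_TO_FEATURE.items():
            group = interactions.find(group_tag)
            if group is not None:
                counts[feature] += len(group)  # one child per interaction
    return counts


# Hand-written sample, NOT real plip output.
SAMPLE = """<report>
  <bindingsite id="1">
    <interactions>
      <hydrogen_bonds>
        <hydrogen_bond id="1"/>
        <hydrogen_bond id="2"/>
      </hydrogen_bonds>
      <salt_bridges>
        <salt_bridge id="1"/>
      </salt_bridges>
      <pi_stacks/>
    </interactions>
  </bindingsite>
</report>"""

print(count_interactions(SAMPLE))
```

Interaction types that do not occur in a complex stay at zero, so every complex yields a feature vector of the same length.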

Step 4, extract the vina score features by running scoring_core_deltavinarf20.py

Final step, run the Jupyter notebook scoring_model.ipynb, which does the following:

combine the plip features with the vina score features as model input
prepare the ground truth from refined-set/index/INDEX_refined_data.2019
train models using sklearn
run inference and calculate the MSE loss
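The feature combination and the loss computation can be sketched in a few lines of plain Python. This is an illustration only, not the notebook's code: the function names are hypothetical, all feature values and labels are invented, and a constant mean-value predictor stands in for the sklearn models that the notebook trains.

```python
def combine_features(plip_features, vina_features):
    """Concatenate both feature vectors for ids present in both tables."""
    shared = sorted(set(plip_features) & set(vina_features))
    return {i: plip_features[i] + vina_features[i] for i in shared}


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between labels and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented values for illustration; ids are placeholders, not PDB entries.
plip = {"aaaa": [2, 0, 1], "bbbb": [0, 1, 0], "cccc": [1, 1, 1]}
vina = {"aaaa": [-7.5], "bbbb": [-6.0]}
labels = {"aaaa": 6.0, "bbbb": 4.0}

inputs = combine_features(plip, vina)  # "cccc" is dropped: no vina features
y_true = [labels[i] for i in inputs]

# Stand-in for a trained model: always predict the mean label.
mean_label = sum(y_true) / len(y_true)
y_pred = [mean_label for _ in y_true]

print(inputs)
print(mean_squared_error(y_true, y_pred))
```

Joining on the complex id keeps only complexes for which both feature sets exist, which is one plausible source of the "cleaning" that reduces the dataset size.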

Data Requirements

PDBbind (http://www.pdbbind.org.cn/) is a dataset derived from the PDB database. It contains a set of protein-ligand pairs, each identified by an id (like 1a28). For each pair, the complex with the same id is taken from the PDB database, the receptor is separated from the ligand, and the pair is associated with an experimentally measured binding affinity pKd (-log Kd or -log Ki, the fourth column in refined-set/index/INDEX_refined_data.2019).
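As a sketch of how the ground truth could be read from the index file, the snippet below takes the id from the first column and the affinity from the fourth column of each row. It assumes whitespace-separated columns and that lines starting with # are comments; the meaning of the other columns is not used. The sample rows are invented for illustration and are not real PDBbind entries, and both function names are hypothetical.

```python
import math


def pkd_from_molar(k_molar):
    """Convert a dissociation or inhibition constant in mol/L to -log10(K)."""
    return -math.log10(k_molar)


def parse_index(lines):
    """Map complex id (column 1) to binding affinity (column 4)."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed comment/blank lines
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed row, skip
        labels[fields[0]] = float(fields[3])
    return labels


# Invented rows for illustration, NOT real PDBbind entries.
SAMPLE_INDEX = [
    "# id  col2  col3  -logKd/Ki  Kd/Ki",
    "aaaa  2.00  2007  2.00  Kd=10mM",
    "bbbb  1.80  2012  9.00  Ki=1nM",
]

labels = parse_index(SAMPLE_INDEX)
print(labels)
print(pkd_from_molar(10e-3))  # 10 mM expressed in mol/L
```

With a real file, the lines could come from open("refined-set/index/INDEX_refined_data.2019"), and the resulting dictionary can be joined with the feature tables by id.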

After cleaning, the training set (from refined-set) includes 3602 complexes and the test set (from core-set) includes 263 complexes.
