rf_vina_enhance

This is a course project that enhances a docking scoring function. I take deltavinarf20 as the baseline and add more non-bonded interaction information, including hydrogen_bond, water_bridge, halogen_bond, salt_bridge, pi_cation_interaction and pi_stack.

A quick setup

For a quick setup, install the dependencies in requirements.txt (NOTE: DO NOT FORGET TO MODIFY SYSTEM PATH). You also need to preinstall several codebases:

  • deltavina (https://github.com/chengwang88/deltavina.git) is the repo of deltavinarf20; it provides only the model inference code.
  • vina4dv (https://github.com/chengwang88/vina4dv.git) is a fork of AutoDock Vina, which is required by deltavina.
  • plip (https://github.com/pharmai/plip.git) is a tool that extracts non-bonded interactions between a ligand and protein pair.

For feature preparation:

Step 1, convert protein.pdb to protein.pdbqt using mgltools:

pythonsh /path-you-install-mgltools/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py -r xxxx_protein.pdb -o xxxx_protein.pdbqt

Step 2, calculate vina score and deltavinarf20 score of each complex in PDBbind refined-set using deltavina repo:

/path-you-install-deltavina/deltavina/bin/dvrf20.py -r xxxx_protein.pdb -l xxxx_ligand.mol2
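Steps 1 and 2 have to be repeated for every complex in the dataset. The following is a minimal sketch of how the two command lines could be assembled per complex. It assumes each complex sits in its own folder named after its id, containing xxxx_protein.pdb and xxxx_ligand.mol2; the function name build_commands and its parameters are hypothetical and are not part of this repo.

```python
from pathlib import Path


def build_commands(dataset_dir, mgltools_dir, deltavina_dir):
    """Return (pdb_id, step1_argv, step2_argv) for every complete complex folder.

    Assumption: dataset_dir/<id>/<id>_protein.pdb and <id>_ligand.mol2 exist.
    """
    prepare = (Path(mgltools_dir) / "MGLToolsPckgs" / "AutoDockTools"
               / "Utilities24" / "prepare_receptor4.py")
    dvrf20 = Path(deltavina_dir) / "deltavina" / "bin" / "dvrf20.py"

    commands = []
    for complex_dir in sorted(Path(dataset_dir).iterdir()):
        pdb_id = complex_dir.name
        protein = complex_dir / f"{pdb_id}_protein.pdb"
        ligand = complex_dir / f"{pdb_id}_ligand.mol2"
        if not (protein.is_file() and ligand.is_file()):
            continue  # skip folders such as index/ and incomplete entries
        # Step 1: receptor conversion with mgltools
        step1 = ["pythonsh", str(prepare), "-r", str(protein),
                 "-o", str(protein.with_suffix(".pdbqt"))]
        # Step 2: vina score and deltavinarf20 score
        step2 = [str(dvrf20), "-r", str(protein), "-l", str(ligand)]
        commands.append((pdb_id, step1, step2))
    return commands
```

Each argv list can then be passed to subprocess.run(argv, check=True), so a failing complex stops the batch instead of silently producing missing features.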

Step 3, extract features with plip by running plip_extract_feature.py (this also involves parsing the XML file generated by plip)
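As an illustration of the XML parsing part of Step 3, the sketch below counts interactions per type in a plip XML report. It is not the code of plip_extract_feature.py: the container tag names (hydrogen_bonds, pi_stacks, and so on) are assumed from plip's XML report layout, and the actual script may derive different features than simple counts.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: container tag in the plip XML report -> feature name
# used in this README. Check against a real report before relying on it.
INTERACTION_TAGS = {
    "hydrogen_bonds": "hydrogen_bond",
    "water_bridges": "water_bridge",
    "halogen_bonds": "halogen_bond",
    "salt_bridges": "salt_bridge",
    "pi_cation_interactions": "pi_cation_interaction",
    "pi_stacks": "pi_stack",
}


def count_interactions(xml_text):
    """Count child elements of each interaction container, summed over all binding sites."""
    root = ET.fromstring(xml_text)
    counts = {name: 0 for name in INTERACTION_TAGS.values()}
    for container_tag, name in INTERACTION_TAGS.items():
        for container in root.iter(container_tag):
            counts[name] += len(list(container))
    return counts
```

Interaction types missing from a report keep a count of 0, so every complex yields a feature vector of the same length.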

Step 4, extract the vina score features by running scoring_core_deltavinarf20.py

Final step, run the Jupyter notebook scoring_model.ipynb to

  1. combine the plip features with the vina score features as model input
  2. prepare the ground truth from refined-set/index/INDEX_refined_data.2019
  3. train models using sklearn
  4. run inference and calculate the MSE loss
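Items 1 and 4 of the list above can be sketched in plain Python. This is not the notebook's code: it assumes both feature sets are stored as dictionaries keyed by complex id, and the names combine_features and mean_squared_error are hypothetical. The sklearn training step is left out; its predictions would be passed in as y_pred.

```python
def combine_features(plip_features, vina_features):
    """Join two {pdb_id: [numbers]} dicts on the ids present in both.

    Vina score features come first, followed by the plip features.
    """
    shared_ids = sorted(set(plip_features) & set(vina_features))
    return {
        pdb_id: list(vina_features[pdb_id]) + list(plip_features[pdb_id])
        for pdb_id in shared_ids
    }


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Joining on the shared ids drops any complex for which one of the two feature extractions failed, which keeps the model input rectangular.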

Data Requirements

PDBbind (http://www.pdbbind.org.cn/) is a dataset derived from the PDB database. It contains a set of protein-ligand pairs, each identified by an id (like 1a28). For each id, it takes the complex with the same id from the PDB database, separates the receptor from the ligand, and records the experimentally measured binding affinity pKd (-logKi/Kd, the fourth column in refined-set/index/INDEX_refined_data.2019).
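A minimal sketch of reading the ground truth from the index file follows. It assumes that the file is whitespace-separated, that the first column is the complex id, and that header lines start with "#"; only the position of the affinity (fourth column) is stated above. The function name parse_index is hypothetical.

```python
def parse_index(lines):
    """Map complex id -> binding affinity (fourth whitespace-separated column)."""
    affinities = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed comment/header convention
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed row, nothing to read
        affinities[fields[0]] = float(fields[3])
    return affinities
```

The function accepts any iterable of lines, so an open file handle for refined-set/index/INDEX_refined_data.2019 can be passed directly.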

After cleaning, the train set (from refined-set) includes 3602 complexes and the test set (from core-set) includes 263 complexes.
