DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening

License: MIT

Official code for the paper "DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening", accepted at Neural Information Processing Systems (NeurIPS) 2023. The code is currently a raw version and will be updated as soon as possible. If you have any questions, feel free to contact [email protected]

Requirements

Same as Uni-Mol.

The rdkit version should be 2022.9.5.

Data and checkpoints

https://drive.google.com/drive/folders/1zW1MGpgunynFxTKXC2Q4RgWxZmg6CInV?usp=sharing

The folder currently includes the training data, the trained checkpoint, and the test data for DUD-E.

Training data

The training dataset is included in the Google Drive folder as train_no_test_af.zip. It contains the following files:


dict_pkt.txt: dictionary for pocket atom types

dict_mol.txt: dictionary for molecule atom types

train.lmdb: train dataset

valid.lmdb: validation dataset

Use py_scripts/lmdb_utils.py to read the LMDB files. The keys in each LMDB record and their descriptions are listed below:


"atoms": "atom types for each atom in the ligand" 

"coordinates": "3D coordinates for each atom in the ligand, generated by RDKit. The maximum number of conformations is 10"

"pocket_atoms": "atom types for each atom in the pocket"

"pocket_coordinates": "3D coordinates for each atom in the pocket"

"mol": "RDKit molecule object for the ligand"

"smi": "SMILES string for the ligand"

"pocket": "PDB ID of the pocket"
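
As a rough illustration of the record layout above, the sketch below iterates over an LMDB file and reports which of the documented keys a record lacks. It assumes the Uni-Mol convention that each value is a pickled Python dict and that the database is a single file (`subdir=False`); neither assumption is confirmed by this README. `iter_lmdb_records` and `missing_keys` are hypothetical helper names, not functions from py_scripts/lmdb_utils.py. The reader needs the third-party `lmdb` package, and unpickling the "mol" entry most likely requires RDKit to be installed.

```python
import pickle
from typing import Any, Dict, Iterator, List, Tuple

# Keys documented in this README for train.lmdb / valid.lmdb records.
EXPECTED_KEYS = (
    "atoms",
    "coordinates",
    "pocket_atoms",
    "pocket_coordinates",
    "mol",
    "smi",
    "pocket",
)


def iter_lmdb_records(path: str) -> Iterator[Tuple[bytes, Dict[str, Any]]]:
    """Yield (key, record) pairs, ASSUMING each value is a pickled dict."""
    import lmdb  # third-party (pip install lmdb); imported lazily

    env = lmdb.open(
        path,
        subdir=False,   # assumption: the .lmdb path is a single file
        readonly=True,
        lock=False,
        readahead=False,
    )
    try:
        with env.begin() as txn:
            for key, value in txn.cursor():
                yield key, pickle.loads(value)
    finally:
        env.close()


def missing_keys(record: Dict[str, Any]) -> List[str]:
    """Return the documented keys that are absent from a record."""
    return [key for key in EXPECTED_KEYS if key not in record]
```

A typical use would be to loop over `iter_lmdb_records("train.lmdb")` and print `missing_keys(record)` for the first few records to confirm the layout before training. Only unpickle files from a source you trust, since `pickle.loads` can execute arbitrary code.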

The dataset is compiled from the PDBbind dataset. It combines authentic protein-ligand complexes with complexes generated by HomoAug, a technique for augmenting data with homology-based transformations.

Test data

DUD-E

DUD-E
├── gene id
│   ├── receptor.pdb
│   ├── crystal_ligand.mol2
│   ├── actives_final.ism
│   ├── decoys_final.ism
│   ├── mols.lmdb (containing all actives and decoys)
│   ├── pocket.lmdb
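
Before running the test script, it can help to confirm that every target directory matches the layout above. The following is a minimal sketch using only the standard library. The file names are taken from the tree above, while `find_incomplete_targets` is a hypothetical helper that is not part of this repository.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the DUD-E layout shown above.
DUDE_FILES = (
    "receptor.pdb",
    "crystal_ligand.mol2",
    "actives_final.ism",
    "decoys_final.ism",
    "mols.lmdb",
    "pocket.lmdb",
)


def find_incomplete_targets(root: str) -> Dict[str, List[str]]:
    """Map each target directory name to the expected files it lacks."""
    report: Dict[str, List[str]] = {}
    for target in sorted(Path(root).iterdir()):
        if not target.is_dir():
            continue  # skip stray files at the top level
        missing = [name for name in DUDE_FILES if not (target / name).exists()]
        if missing:
            report[target.name] = missing
    return report
```

An empty dictionary returned by `find_incomplete_targets("DUD-E")` means every target directory contains all six entries.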

LIT-PCBA

lit_pcba
├── target name
│   ├── PDBID_protein.mol2
│   ├── PDBID_ligand.mol2
│   ├── actives.smi
│   ├── inactives.smi
│   ├── mols.lmdb (containing all actives and inactives)
│   ├── pocket.lmdb
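
Because the structure files in this layout are prefixed with a PDB ID, one way to collect them is to match each protein file with the ligand file that shares its prefix. The sketch below assumes that a target directory may hold one or more such pairs and that the file names follow the `PDBID_protein.mol2` / `PDBID_ligand.mol2` pattern exactly. `pair_structures` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import Dict, Tuple

PROTEIN_SUFFIX = "_protein.mol2"
LIGAND_SUFFIX = "_ligand.mol2"


def pair_structures(target_dir: str) -> Dict[str, Tuple[Path, Path]]:
    """Map PDB ID -> (protein file, ligand file) for one target directory."""
    target = Path(target_dir)
    pairs: Dict[str, Tuple[Path, Path]] = {}
    for protein in sorted(target.glob("*" + PROTEIN_SUFFIX)):
        # Strip the suffix to recover the PDB ID prefix.
        pdb_id = protein.name[: -len(PROTEIN_SUFFIX)]
        ligand = target / (pdb_id + LIGAND_SUFFIX)
        if ligand.exists():  # keep only complete protein/ligand pairs
            pairs[pdb_id] = (protein, ligand)
    return pairs
```

Proteins without a matching ligand file are skipped silently; a stricter variant could raise an error instead.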

Data preprocessing

See py_scripts/write_dude_multi.py.

HomoAug

Please refer to the HomoAug directory for details.

Train

bash drugclip.sh

Test

bash test.sh

Retrieval

bash retrieval.sh

In the Google Drive folder, you can find example pocket.lmdb and mols.lmdb files under the retrieval directory.

Citation

If you find our work useful, please cite our paper:

@inproceedings{gao2023drugclip,
    author = {Gao, Bowen and Qiang, Bo and Tan, Haichuan and Jia, Yinjun and Ren, Minsi and Lu, Minsi and Liu, Jingjing and Ma, Wei-Ying and Lan, Yanyan},
    title = {DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening},
    booktitle = {NeurIPS 2023},
    year = {2023},
    url = {https://openreview.net/forum?id=lAbCgNcxm7},
}
