Official implementation for Learning Invariant Molecular Representation in Latent Discrete Space (NeurIPS 2023)

License: MIT License

imold's Introduction

Learning Invariant Molecular Representation in Latent Discrete Space

This repository is the official implementation of our paper:

Learning Invariant Molecular Representation in Latent Discrete Space

Xiang Zhuang, Qiang Zhang*, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen* (* denotes correspondence)

Advances in Neural Information Processing Systems (NeurIPS) 2023

Environment

To run the code successfully, the following dependencies need to be installed:

Python                     3.8      
torch                      1.10.1
torch_geometric            2.0.4
torch_scatter              2.0.9
torch_cluster              1.6.0
torch_sparse               0.6.13
torch_spline_conv          1.2.1
rdkit_pypi                 2022.9.5
vector_quantize_pytorch    1.0.7
ogb                        1.3.6

This repository also depends on GOOD and DrugOOD. Please follow the installation instructions provided by each package.
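Because the versions above are pinned, it can help to compare them against the active environment before training. The following is a minimal sketch, not part of the repository: the helper `check_versions` is hypothetical, and it assumes the names in the list above match the installed distribution names closely enough for `importlib.metadata` to resolve them.

```python
from importlib import metadata

# Pinned versions copied from the dependency list above.
# Assumption: these names resolve as installed distribution names.
PINNED = {
    "torch": "1.10.1",
    "torch_geometric": "2.0.4",
    "torch_scatter": "2.0.9",
    "torch_cluster": "1.6.0",
    "torch_sparse": "0.6.13",
    "torch_spline_conv": "1.2.1",
    "rdkit_pypi": "2022.9.5",
    "vector_quantize_pytorch": "1.0.7",
    "ogb": "1.3.6",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Local build tags such as "+cu113" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_versions(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the pinned version. The Python version itself is not checked here.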

Data

The data used in the experiments can be downloaded from the following sources:

  1. GOOD
  2. DrugOOD
    • Download from the link.
    • Extract the downloaded file and save the contents in the drugood-data-chembl30 directory.

An example of the folder hierarchy after adding the data files:

├── data
│   ├── GOODHIV
│   ├── GOODPCBA
│   ├── GOODZINC
├── drugood-data-chembl30
│   ├── lbap_core_ec50_assay.json
│   └── ...
├── models
│   ├── model.py
│   └── ...
├── run.py
└── README.md
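The layout above can be verified with a short script before launching a run. This is a sketch under stated assumptions: `missing_paths` is a hypothetical helper, and the path list simply mirrors the example hierarchy, so entries for datasets you do not use can be removed.

```python
from pathlib import Path

# Paths taken from the example hierarchy above; trim to the datasets you use.
EXPECTED = [
    "data/GOODHIV",
    "data/GOODPCBA",
    "data/GOODZINC",
    "drugood-data-chembl30/lbap_core_ec50_assay.json",
    "models/model.py",
    "run.py",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


# Check the current working directory, assumed to be the repository root.
for path in missing_paths("."):
    print(f"missing: {path}")
```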

Running Script

Training

python run.py --dataset GOODZINC --domain scaffold --shift concept --num_e 4000 --bs 256 --gamma 0.5 --inv_w 0.01 --reg_w 0.5 --gpu 0 --exp_name ZINC --exp_id scaffold-concept

Running parameters and descriptions are as follows:

Parameter   Description                      Choices
dataset     name of dataset                  GOODHIV, GOODZINC, GOODPCBA, ic50_assay, ic50_scaffold, ic50_size, ec50_assay, ec50_scaffold, ec50_size
domain      environment-splitting strategy   scaffold, size (only needs to be specified for GOOD datasets)
shift       type of distribution shift       covariate, concept (only needs to be specified for GOOD datasets)
num_e       codebook size                    -
bs          batch size                       -
gamma       threshold $\gamma$               -
inv_w       $\lambda_1$                      -
reg_w       $\lambda_2$                      -
gpu         which GPU to use                 -
exp_name    experiment name                  -
exp_id      experiment ID                    -
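To make the parameter table concrete, here is a hedged `argparse` sketch that mirrors the flags and choices listed above. It is not the parser used in `run.py`: the functions `build_parser` and `parse`, the argument types, and the rule that rejects GOOD datasets without `--domain` and `--shift` are assumptions inferred from the table.

```python
import argparse

GOOD_DATASETS = ["GOODHIV", "GOODZINC", "GOODPCBA"]
DRUGOOD_DATASETS = [
    "ic50_assay", "ic50_scaffold", "ic50_size",
    "ec50_assay", "ec50_scaffold", "ec50_size",
]


def build_parser():
    """Hypothetical parser mirroring the documented flags (types are assumed)."""
    p = argparse.ArgumentParser(description="Sketch of the documented flags")
    p.add_argument("--dataset", required=True,
                   choices=GOOD_DATASETS + DRUGOOD_DATASETS)
    p.add_argument("--domain", choices=["scaffold", "size"])
    p.add_argument("--shift", choices=["covariate", "concept"])
    p.add_argument("--num_e", type=int, help="codebook size")
    p.add_argument("--bs", type=int, help="batch size")
    p.add_argument("--gamma", type=float, help="threshold gamma")
    p.add_argument("--inv_w", type=float, help="lambda_1")
    p.add_argument("--reg_w", type=float, help="lambda_2")
    p.add_argument("--gpu", type=int, help="which GPU to use")
    p.add_argument("--exp_name", help="experiment name")
    p.add_argument("--exp_id", help="experiment ID")
    return p


def parse(argv):
    """Parse argv and enforce the GOOD-only requirement from the table."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.dataset in GOOD_DATASETS and (args.domain is None or args.shift is None):
        parser.error("--domain and --shift must be given for GOOD datasets")
    return args


args = parse("--dataset GOODZINC --domain scaffold --shift concept "
             "--num_e 4000 --bs 256".split())
print(args.dataset, args.domain, args.shift, args.num_e, args.bs)
```

Passing the argument list explicitly, rather than reading `sys.argv`, keeps the sketch usable from a notebook or a test.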

Evaluation

The training hyperparameters for each dataset are listed in the Appendix, and the corresponding checkpoints are available on the release page.

python eval.py --dataset GOODZINC --domain scaffold --shift concept --load_path checkpoint/GOODZINC-scaffold-concept.pkl

The load_path parameter specifies the path of the checkpoint to load.

Citation

If you use or extend our work, please cite the paper as follows:

@InProceedings{zhuang2023learning,
  title={Learning Invariant Molecular Representation in Latent Discrete Space},
  author={Xiang Zhuang and Qiang Zhang and Keyan Ding and Yatao Bian and Xiao Wang and Jingsong Lv and Hongyang Chen and Huajun Chen},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

imold's Issues

Cannot reproduce the results on DrugOOD.

I ran your code on DrugOOD with 10 different seeds (0 ~ 9), but the results are significantly lower than those reported in your paper. For example, the ROC-AUC on ec50_assay and ec50_size are:

  • ec50_assay: 76.997 73.57 71.06 72.65 75.16 73.07 71.84 71.92 73.67 76.86
  • ec50_size : 61.71 61.93 60.12 61.95 62.50 61.92 57.81 62.97 61.85 64.20

These results are all below those reported in your paper, even though I used the same hyperparameters provided in the Appendix. Does this mean that your proposed method is sensitive to the random seed?
Thanks!
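For a comparison with published numbers, the ten runs quoted in this issue can be summarized as a mean and a sample standard deviation. The sketch below uses only the values listed above; the helper `summarize` is hypothetical and no reference values from the paper are assumed.

```python
from statistics import mean, stdev

# ROC-AUC values for seeds 0 to 9, copied verbatim from the issue above.
runs = {
    "ec50_assay": [76.997, 73.57, 71.06, 72.65, 75.16,
                   73.07, 71.84, 71.92, 73.67, 76.86],
    "ec50_size": [61.71, 61.93, 60.12, 61.95, 62.50,
                  61.92, 57.81, 62.97, 61.85, 64.20],
}


def summarize(values):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(values), stdev(values)


for name, values in runs.items():
    m, s = summarize(values)
    print(f"{name}: {m:.2f} +/- {s:.2f} over {len(values)} seeds")
```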

requests.exceptions.ConnectionError:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='drive.google.com', port=443): Max retries exceeded with url: /uc?id=1GNc0HUee5YQH4Vtlk8ZbDjyJBYTEyabo (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc3f5ebb0a0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

Can you tell me how to solve it?

Running Error

Dear Authors,

When I run this code, an error occurs. Could you give me some advice?

[Screenshot]

About Siam Projection

self.simsiam_proj = nn.Sequential(nn.Linear(emb_dim, emb_dim * 2),

Thanks for sharing the code!

But I am confused about one part: It seems that the projection layer in the Encoder isn't used anywhere in the code. Did I miss something?

About Evaluation

Dear Authors,

I have finished the training process. However, I cannot find the saved model.

Is the model saved in dump/0813-ZINC/scaffold-concept/params.pkl?

But when I run the following command, an error occurs:

python eval.py --dataset GOODZINC --domain scaffold --shift concept --load_path dump/0813-ZINC/scaffold-concept/params.pkl

[Screenshot]
