deepgraphlearning / torchdrug

1.4K 30.0 198.0 2.69 MB

A powerful and flexible machine learning platform for drug discovery

Home Page: https://torchdrug.ai/

License: Apache License 2.0

Python 88.62% C++ 5.84% Cuda 5.27% HTML 0.20% Dockerfile 0.07%

pytorch deep-learning graph-neural-networks drug-discovery

torchdrug's Issues

pretrained model serving

Is there any plan to serve pretrained models directly through the API? Readily available pretrained models (e.g., large-scale molecular representation models) would be of great utility to many users and would generally reduce waste.

See the Hugging Face transformers library as an example. There is vast demand for this type of interface.

How are missing values treated in regression model training for property prediction?

Hi,
I tried to build a property prediction model using the OPV dataset. See code below.
Training a GIN model on all 8 tasks fails because of missing values in the 4 subtasks ending in _extrapolated.
However, training does not stop even when all values become NaN.
When the 4 subtasks with missing values are excluded, training works fine.

How does torchdrug deal with missing values in subtasks?

I'm asking as I would like to find out how robust multitask GIN models are to data sparsity.
See "Effect of missing data on multitask prediction methods".

import torch
from torchdrug import core, data, datasets, tasks, models

dataset = datasets.OPV("~/molecule-datasets/")
train_set, valid_set, test_set = dataset.split()
print(f"# Train/Valid/Test: {len(train_set)}/{len(valid_set)}/{len(test_set)}")

model = models.GIN(
    input_dim=dataset.node_feature_dim,
    hidden_dims=[300, 300, 300, 300],
    short_cut=True,
    batch_norm=True,
    concat_hidden=True,
)
subtasks = (
    "gap",
    "homo",
    "lumo",
    "spectral_overlap",
    # "homo_extrapolated", # task contains nan values
    # "lumo_extrapolated", # task contains nan values
    # "gap_extrapolated", # task contains nan values
    # "optical_lumo_extrapolated", # task contains nan values
)
task = tasks.PropertyPrediction(
    model, task=subtasks, criterion="mse", metric=("mae", "rmse"), verbose=1
)
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(
    task,
    train_set,
    valid_set,
    test_set,
    optimizer,
    gpus=[0],
    batch_size=256,
)
solver.train(num_epoch=3)
solver.save("opv_gin_property_prediction.pth")
solver.evaluate("valid")
solver.evaluate("test")

Thanks!
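
One common way to make a multitask regression loss tolerate missing labels is to skip NaN targets when averaging. The sketch below only illustrates that arithmetic in plain Python. It is not TorchDrug's implementation, and the function name masked_mse and the toy values are hypothetical.

```python
import math


def masked_mse(preds, targets):
    """Per-task mean squared error that skips samples whose target is NaN.

    preds and targets are lists of rows, one row per molecule and one
    column per subtask. A subtask with no labeled samples yields NaN
    instead of raising.
    """
    num_tasks = len(targets[0])
    losses = []
    for t in range(num_tasks):
        errors = [
            (p[t] - y[t]) ** 2
            for p, y in zip(preds, targets)
            if not math.isnan(y[t])  # ignore missing labels
        ]
        losses.append(sum(errors) / len(errors) if errors else float("nan"))
    return losses


nan = float("nan")
preds = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.5, nan], [2.0, 4.0]]  # second task is unlabeled for the first molecule
print(masked_mse(preds, targets))  # [0.625, 0.0]
```

Without the mask, a single NaN target would propagate through the average and turn the whole loss into NaN, which is consistent with the symptom described above.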

Customized target for retrosynthesis

Hi, thanks for sharing this repo!

How can I input an arbitrary target/product for retrosynthesis analysis? What target format does the model require besides SMILES? The notebook performs prediction on the USPTO dataset; I would like to know how to apply this model to targets outside USPTO.

Thanks!!

`import torchdrug` switches the matplotlib backend to Agg

Running import torchdrug switches the matplotlib backend to Agg, so figures can't be shown. Normally matplotlib uses module://backend_interagg to show figures. Agg is a non-GUI backend, so why is it selected? In graph.py I found the visualize function that shows figures, but I can't find why it changes the backend. Hope to get your reply! Thanks!

Tip:
I found a workaround.
After all the imports, add import matplotlib followed by matplotlib.use('module://backend_interagg').

MOSES dataset: AttributeError: 'NoneType' object has no attribute 'edge_list'

I tried following the tutorial of GCPN. But when I execute model = models.RGCN(input_dim=dataset.node_feature_dim, num_relation=dataset.num_bond_type, hidden_dims=[256, 256, 256, 256], batch_norm=False), I got the following error:

Traceback (most recent call last):
  File "gcpn.py", line 13, in <module>
    num_relation=dataset.num_bond_type,
  File "/opt/conda/lib/python3.8/site-packages/torchdrug/data/dataset.py", line 168, in num_bond_type
    return len(self.bond_types)
  File "/opt/conda/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
    result = self.func(obj)
  File "/opt/conda/lib/python3.8/site-packages/torchdrug/data/dataset.py", line 183, in bond_types
    bond_types.update(graph.edge_list[:, 2].tolist())
AttributeError: 'NoneType' object has no attribute 'edge_list'

I tried the MOSES dataset. In torchdrug/data/dataset.py, line 183, we have

for graph in self.data:
    bond_types.update(graph.edge_list[:, 2].tolist())

The program complains that graph is None, which suggests that self.data itself contains invalid entries. So I think there may be a bug in RGCN when interpreting the dataset. I don't know the internals, though; could someone help solve this problem?

To reproduce this problem, I used nvidia docker:

nvidia-docker run -it --name=xxxxx  nvcr.io/nvidia/pytorch:21.06-py3 /bin/bash
conda install -c milagraph -c conda-forge torchdrug

and then follow the tutorial.
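
The traceback indicates that at least one entry of self.data is None when the bond types are collected. A defensive version of that loop could skip such entries and report their positions. The sketch below assumes that behavior is acceptable and uses a hypothetical Graph stand-in with plain tuples; it is not the library's code.

```python
from collections import namedtuple

# Hypothetical stand-in for a molecule graph: each edge_list row is
# (node_in, node_out, bond_type), mirroring the column read in the traceback.
Graph = namedtuple("Graph", ["edge_list"])


def collect_bond_types(graphs):
    """Gather bond types while skipping entries that are None."""
    bond_types = set()
    skipped = []
    for index, graph in enumerate(graphs):
        if graph is None:
            skipped.append(index)  # remember which entries were missing
            continue
        bond_types.update(edge[2] for edge in graph.edge_list)
    return bond_types, skipped


graphs = [Graph([(0, 1, 0), (1, 2, 1)]), None, Graph([(0, 1, 2)])]
bond_types, skipped = collect_bond_types(graphs)
print(sorted(bond_types), skipped)  # [0, 1, 2] [1]
```

Reporting the skipped indices makes it easier to find out which input rows produced the empty entries in the first place.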

KeyError: 'graph' in Tutorials: Property Prediction

When running the following code

from torchdrug import utils
from torch.nn import functional as F
samples = []
categories = set()
for sample in valid_set:
    sample.pop("graph")
    category = tuple(sample.values())
    if category not in categories:
        categories.add(category)
        samples.append(sample)
samples = data.graph_collate(samples)
samples = utils.cuda(samples)

preds = F.sigmoid(task.predict(samples))
targets = task.target(samples)

titles = []
for pred, target in zip(preds, targets):
    pred = ", ".join(["%.2f" % p for p in pred])
    target = ", ".join(["%d" % t for t in target])
    titles.append("predict: %s\ntarget: %s" % (pred, target))
graph = samples["graph"]
graph.visualize(titles, figure_size=(3, 3.5), num_row=1)

The following error occurred:

Traceback (most recent call last):
  File "/home/ibmc-2/Projects/MNIST/mnist_data/5-2.py", line 46, in <module>
    preds = F.sigmoid(task.predict(samples))
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/tasks/property_prediction.py", line 104, in predict
    graph = batch["graph"]
KeyError: 'graph'

Since I'm just getting started, the question may be absurd. Thank you
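
A likely cause, assuming each sample is a plain dict as the traceback suggests: sample.pop("graph") removes the graph from the same dict that is later appended to samples, so the collated batch no longer has a "graph" key. The sketch below builds the category tuple without mutating the sample. The sample dicts and label names are hypothetical stand-ins.

```python
# Hypothetical stand-ins for dataset samples: each is a dict with a
# "graph" entry plus one entry per task label.
valid_set = [
    {"graph": "mol-0", "FDA_APPROVED": 1, "CT_TOX": 0},
    {"graph": "mol-1", "FDA_APPROVED": 1, "CT_TOX": 0},
    {"graph": "mol-2", "FDA_APPROVED": 0, "CT_TOX": 1},
]

samples = []
categories = set()
for sample in valid_set:
    # Build the label tuple from every entry except the graph,
    # leaving the sample itself untouched.
    category = tuple(v for k, v in sample.items() if k != "graph")
    if category not in categories:
        categories.add(category)
        samples.append(sample)

print(len(samples), all("graph" in s for s in samples))  # 2 True
```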

Is there or will there be support for fine-tuning based on the property prediction task (i.e., weight transfer functionality)?

How to use the generation model to optimize specific molecules

Hi!
How can I use the generation model to optimize specific molecules? For example, I have trained a GCPN generation model for QED and logP, "gcpn_zinc250k_1epoch_finetune.pkl", and I have the SMILES of one or more known molecules. I want to generate molecules with better QED and logP properties, starting from my own molecules, through this generation model. How can I achieve this?

What's more, how is the output on the official website (Tutorials: Molecule Generation) produced? I don't get such output during or after training.

The results are as follows:


(5.63, 'CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=C(I)C(C)(C)C')
(5.60, 'CCC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC(C)(C)CCC')
(5.44, 'CC=CC=CC=CC(Cl)=CC=CC=CC=CC=CC=C(C)C=CC=CC=C(C)C=CC(Br)=CC=CCCC')
(5.35, 'CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=CC=C(CC)C(C)C')
...

Thank you very much!

Saving model in Get Started guide: TypeError: Object of type Subset is not JSON serializable

Following the Get Started guide, the following error is thrown

import json

with open("clintox_gin.json", "w") as fout:
    json.dump(solver.config_dict(), fout)
solver.save("clintox_gin.pth")

TypeError: Object of type Subset is not JSON serializable
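
The error means json.dump met an object it has no encoder for. As a stopgap, the standard library lets the caller supply a fallback through the default argument. The sketch below uses a hypothetical Subset stand-in and a hypothetical describe helper; it records only the class name, so the resulting file could not be used to rebuild the dataset.

```python
import json


class Subset:
    """Hypothetical stand-in for an object json cannot encode."""

    def __init__(self, indices):
        self.indices = indices


def describe(obj):
    """Fallback used by json for objects it cannot serialize natively."""
    return {"class": type(obj).__name__}


config = {"batch_size": 256, "train_set": Subset([0, 1, 2])}
text = json.dumps(config, default=describe)
print(text)  # {"batch_size": 256, "train_set": {"class": "Subset"}}
```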

Trying to work with MOSES Dataset

dataset.num_bond_type
Traceback (most recent call last):
  File "", line 1, in
  File "/torchdrug/torchdrug/data/dataset.py", line 168, in num_bond_type
    return len(self.bond_types)
  File "torchdrug/torchdrug/utils/decorator.py", line 21, in __get__
    result = self.func(obj)
  File "torchdrug/torchdrug/data/dataset.py", line 183, in bond_types
    bond_types.update(graph.edge_list[:, 2].tolist())
AttributeError: 'NoneType' object has no attribute 'edge_list'

ChEMBLFiltered fails with FileNotFoundError

import torch
from torchdrug import data, datasets
dataset = datasets.ChEMBLFiltered("~/molecule-datasets/")

fails with

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/datasets/chembl_filtered.py in _load_chembl_with_labels_dataset(root_path)
    103 def _load_chembl_with_labels_dataset(root_path):
    104     # 1. load folds and labels
--> 105     f=open(os.path.join(root_path, 'folds0.pckl'), 'rb')
    106     folds=pickle.load(f)
    107     f.close()

FileNotFoundError: [Errno 2] No such file or directory: './temp/chem_dataset/dataset/chembl_filtered/raw/folds0.pckl'

However, file folds0.pckl is written to
~/molecule-datasets/chem_dataset/dataset/chembl_filtered/raw/folds0.pckl

ChEMBLFiltered seems to download all MoleculeNet related datasets.
Is that intended behaviour?

ZINC2m IsADirectoryError in utils.get_line_count(path)

import torch
from torchdrug import core, datasets, tasks, models

dataset = datasets.ZINC2m("~/molecule-datasets/", node_feature="pretrain", edge_feature="pretrain")

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/datasets/zinc2m.py in __init__(self, path, verbose, **kwargs)
     44             reader = csv.reader(fin)
     45             if verbose:
---> 46                 reader = iter(tqdm(reader, "Loading %s" % path, utils.get_line_count(path)))
     47             smiles_list = []
     48 

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/utils/file.py in get_line_count(file_name, chunk_size)
    112     """
    113     count = 0
--> 114     with open(file_name, "rb") as fin:
    115         chunk = fin.read(chunk_size)
    116         while chunk:

IsADirectoryError: [Errno 21] Is a directory: '/username/molecule-datasets/'
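
The traceback shows get_line_count opening its argument in binary mode and reading it in chunks, and here the argument is the dataset directory rather than a file. The sketch below is a self-contained line counter in the same style that rejects directories with a clearer message. The guard, the error text and the default chunk size are assumptions, not the library's code.

```python
import os
import tempfile


def get_line_count(file_name, chunk_size=8192 * 1024):
    """Count newline characters in a file, reading it in binary chunks."""
    if os.path.isdir(file_name):
        raise ValueError("Expected a file but got a directory: %s" % file_name)
    count = 0
    with open(file_name, "rb") as fin:
        chunk = fin.read(chunk_size)
        while chunk:
            count += chunk.count(b"\n")
            chunk = fin.read(chunk_size)
    return count


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "smiles.csv")
    with open(path, "w") as fout:
        fout.write("smiles\nCCO\nc1ccccc1\n")
    print(get_line_count(path))  # 3
    try:
        get_line_count(root)
    except ValueError:
        print("rejected directory")
```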

graph.visualize(titles, figure_size=(3, 3.5), num_row=1) can not show the generated image

I changed plt.switch_backend("agg") in graph.py to plt.switch_backend("TkAgg"), but the image still isn't shown. What's the correct way?

ImportError: DLL load failed: The specified module could not be found

The line command = ['ninja', '-v'] in ~\Lib\site-packages\torch\utils\cpp_extension.py (line 1631) reports an error. I then changed it to command = ['ninja', '--version'], and another error occurred.

How can I fix it?

How to add custom data?

Please add a description or tutorial on how to load custom data.

I would like to use the clinical photosensitivity (PIH) data published by Schmidt et al., Chem. Res. Toxicol. 2019, 32, 2338−2352.
The data can be downloaded as the supplementary Excel file tx9b00338_si_001.xls.
Table S1 contains a column with the SMILES, the PIH value, and a Set column, which indicates the splits.

Many thanks
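
As a library-independent first step, a table like this can be grouped by its split column before it is handed to any dataset class. The sketch below assumes Table S1 has been exported to CSV with columns named SMILES, PIH and Set. The column names, the split values and the rows shown are made up for illustration, and read_splits is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of the table exported as CSV; the rows are made up.
table = io.StringIO(
    "SMILES,PIH,Set\n"
    "CCO,0,train\n"
    "c1ccccc1,1,train\n"
    "CC(=O)O,0,test\n"
)


def read_splits(handle, smiles_field="SMILES", target_field="PIH", split_field="Set"):
    """Group (smiles, target) pairs by the value of the split column."""
    splits = defaultdict(list)
    for row in csv.DictReader(handle):
        splits[row[split_field]].append((row[smiles_field], float(row[target_field])))
    return dict(splits)


splits = read_splits(table)
print({name: len(rows) for name, rows in splits.items()})  # {'train': 2, 'test': 1}
```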

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

This issue happens in torch 1.9.0 + python 3.7, but not in torch 1.8.0 + python 3.7.

Upgrading PyTorch from 1.4 to 1.5 has made the code work fine.

Upgrading PyTorch from 1.4 to 1.5 has made the code work fine.
Thanks.

Originally posted by @sleeper2173 in #12 (comment)

I ran into the same problem. Could you specify how you upgraded PyTorch? I tried conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.1 -c pytorch but then got the error

File "/opt/conda/lib/python3.8/site-packages/torch_scatter/__init__.py", line 18, in <module>
    raise RuntimeError(
RuntimeError: Expected PyTorch version 1.4 but found version 1.5.

Then I tried uninstalling torch_scatter and installing it again, but that didn't help.
I'm working in the NVIDIA Docker image, started with nvidia-docker run -it --name torchDrug -v ~/shared:/home/shared nvcr.io/nvidia/pytorch:21.06-py3 /bin/bash, and I ran conda install -c milagraph -c conda-forge torchdrug after starting the container.

How to install torchdrug in conda without updating too many dependencies?

Hi,

I've installed torchdrug via conda. The installation updated the already installed pytorch and pytorch_scatter. I don't know the internals of torchdrug yet, but it seems to me that it installs (or updates) too many packages. If some of the dependencies are already installed, wouldn't it be safer not to update them while installing torchdrug?

JIT crash when running `generalized_spmm` and `_rspmm` on macOS

Whenever any model encounters functions like functional.generalized_spmm() or functional.generalized_rspmm() it crashes with a long stack trace saying that there are problems when JIT compiling the C++ code:

RuntimeError: Error building extension 'spmm': [1/3] c++ -MMD -MF spmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_clang\" -DPYBIND11_STDLIB=\"_libcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1002\" -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/TH -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/THC -isystem /Users/migalkin/opt/miniconda3/envs/nbf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -g -Ofast -fopenmp -c /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cpp -o spmm.o 
FAILED: spmm.o 
c++ -MMD -MF spmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_clang\" -DPYBIND11_STDLIB=\"_libcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1002\" -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/TH -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/THC -isystem /Users/migalkin/opt/miniconda3/envs/nbf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -g -Ofast -fopenmp -c /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cpp -o spmm.o 
clang: error: unsupported option '-fopenmp'
[2/3] c++ -MMD -MF rspmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_clang\" -DPYBIND11_STDLIB=\"_libcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1002\" -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/TH -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/THC -isystem /Users/migalkin/opt/miniconda3/envs/nbf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -g -Ofast -fopenmp -c /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cpp -o rspmm.o 
FAILED: rspmm.o 
c++ -MMD -MF rspmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_clang\" -DPYBIND11_STDLIB=\"_libcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1002\" -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/TH -isystem /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torch/include/THC -isystem /Users/migalkin/opt/miniconda3/envs/nbf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -g -Ofast -fopenmp -c /Users/migalkin/opt/miniconda3/envs/nbf/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cpp -o rspmm.o 
clang: error: unsupported option '-fopenmp'
ninja: build stopped: subcommand failed.

It seems that the problem is in the unsupported -fopenmp compiler flag for clang.

Torchdrug version is 0.1.0 h6151fa9. Python 3.8, torch 1.8.1.
I am running the code on macOS 11.2.1 (M1 CPU in the x86 compatibility mode, but it shouldn't matter I guess). clang version is:

Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

The conda environment has the following packages:

attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
boost                     1.74.0           py38h692b87f_3    conda-forge
boost-cpp                 1.74.0               hff03dee_4    conda-forge
bzip2                     1.0.8                h0d85af4_4    conda-forge
ca-certificates           2021.5.30            h033912b_0    conda-forge
cairo                     1.16.0            he43a7df_1008    conda-forge
certifi                   2021.5.30        py38h50d1736_0    conda-forge
cffi                      1.14.6           py38h9688ba1_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
coverage                  5.5              py38h96a0964_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
decorator                 4.4.2                      py_0    conda-forge
easydict                  1.9                      pypi_0    pypi
fontconfig                2.13.1            h10f422b_1005    conda-forge
freetype                  2.10.4               h4cff582_1    conda-forge
future                    0.18.2           py38h50d1736_3    conda-forge
gettext                   0.19.8.1          h7937167_1005    conda-forge
greenlet                  1.1.1            py38ha048514_0    conda-forge
icu                       68.1                 h74dc148_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
jbig                      2.1               h0d85af4_2003    conda-forge
jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   hbcb3906_0    conda-forge
kiwisolver                1.3.2            py38h12bbefe_0    conda-forge
lcms2                     2.12                 h577c468_0    conda-forge
lerc                      2.2.1                h046ec9c_0    conda-forge
libblas                   3.9.0              11_osx64_mkl    conda-forge
libcblas                  3.9.0              11_osx64_mkl    conda-forge
libcxx                    12.0.1               habf9029_0    conda-forge
libdeflate                1.7                  h35c211d_5    conda-forge
libffi                    3.3                  h046ec9c_2    conda-forge
libgfortran               5.0.0           9_3_0_h6c81a4c_23    conda-forge
libgfortran5              9.3.0               h6c81a4c_23    conda-forge
libglib                   2.68.4               hd556434_0    conda-forge
libiconv                  1.16                 haf1e3a3_0    conda-forge
liblapack                 3.9.0              11_osx64_mkl    conda-forge
libpng                    1.6.37               h7cec526_2    conda-forge
libprotobuf               3.16.0               hcf210ce_0    conda-forge
libtiff                   4.3.0                h1167814_1    conda-forge
libwebp-base              1.2.1                h0d85af4_0    conda-forge
libxml2                   2.9.12               h93ec3fd_0    conda-forge
llvm-openmp               12.0.1               hda6cdc1_1    conda-forge
lz4-c                     1.9.3                he49afe7_1    conda-forge
markupsafe                2.0.1            py38h96a0964_0    conda-forge
matplotlib                3.4.3            py38h50d1736_0    conda-forge
matplotlib-base           3.4.3            py38hc7d2367_0    conda-forge
mkl                       2021.3.0           h08c4f10_555    conda-forge
more-itertools            8.8.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.2                  h2e338ed_4    conda-forge
networkx                  2.6.2              pyhd8ed1ab_0    conda-forge
ninja                     1.10.2               h9a9d8cb_0    conda-forge
numpy                     1.21.2           py38h49b9922_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjpeg                  2.4.0                h6e7aa92_1    conda-forge
openssl                   1.1.1l               h0d85af4_0    conda-forge
packaging                 21.0               pyhd8ed1ab_0    conda-forge
pandas                    1.3.2            py38ha53d530_0    conda-forge
pcre                      8.45                 he49afe7_0    conda-forge
pillow                    8.3.1            py38hee640a0_0    conda-forge
pip                       21.2.4             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               hbcb3906_0    conda-forge
pluggy                    0.13.1           py38h50d1736_4    conda-forge
py                        1.10.0             pyhd3deb0d_0    conda-forge
pycairo                   1.20.1           py38h53d24c6_0    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pytest                    6.2.5            py38h50d1736_0    conda-forge
pytest-cov                2.12.1             pyhd8ed1ab_0    conda-forge
python                    3.8.11               h88f2d9e_1
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.9.0           cpu_py38h0529baa_2    conda-forge
pytorch-cpu               1.9.0           cpu_py38he781eb1_2    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
pyyaml                    5.4.1                    pypi_0    pypi
rdkit                     2021.03.5        py38h0bd8f9b_0    conda-forge
readline                  8.1                  h05e3726_0    conda-forge
reportlab                 3.5.68           py38hf6ac518_0    conda-forge
scipy                     1.7.1            py38hd329d04_0    conda-forge
setuptools                57.4.0           py38h50d1736_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h35c211d_1    conda-forge
sqlalchemy                1.4.23           py38h96a0964_0    conda-forge
sqlite                    3.36.0               h23a322b_0    conda-forge
tbb                       2021.3.0             h940c156_0    conda-forge
tk                        8.6.11               h5dbffcc_1    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
torch                     1.8.1                    pypi_0    pypi
torch-scatter             2.0.8                    pypi_0    pypi
torchdrug                 0.1.0                  h6151fa9    milagraph
tornado                   6.1              py38h96a0964_1    conda-forge
tqdm                      4.62.2             pyhd8ed1ab_0    conda-forge
typing_extensions         3.10.0.0           pyha770c72_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xz                        5.2.5                haf1e3a3_1    conda-forge
zlib                      1.2.11            h7795811_1010    conda-forge
zstd                      1.5.0                h582d3a0_0    conda-forge

What would be the way to fix this?
Thanks!

(Google Colab Installation) ImportError: cannot import name 'data' from 'torchdrug' (unknown location)

Dear sir:
I am trying to use torchdrug on Google Colab. Here is my code.

 !pip install rdkit-pypi
 !git clone https://github.com/DeepGraphLearning/torchdrug
 !pip install -r /content/torchdrug/requirements.txt
 !python /content/torchdrug/setup.py install

All seems to be well.
But when I run

import torchdrug as td
from torchdrug import data

edge_list = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]]
graph = data.Graph(edge_list, num_node=6)
graph.visualize()

It then shows:

ImportError                               Traceback (most recent call last)
<ipython-input-10-bbad14b8be21> in <module>()
      1 import torch
      2 import torchdrug as td
----> 3 from torchdrug import data, datasets, core, models, tasks
      4 get_ipython().magic('matplotlib inline')
      5 

ImportError: cannot import name 'data' from 'torchdrug' (unknown location)

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

How can I solve this problem?

AttributeError: 'USPTO50k' object has no attribute 'lazy'

Running the retrosynthesis tutorial, the following error occurred

import torchdrug
print(torchdrug.__version__)

0.1.0

from torchdrug import datasets

reaction_dataset = datasets.USPTO50k("~/Projects/drugs/molecule-datasets/",
                                     node_feature="center_identification",
                                     kekulize=True)
synthon_dataset = datasets.USPTO50k("~/Projects/drugs/molecule-datasets/", as_synthon=True,
                                    node_feature="synthon_completion",
                                    kekulize=True)

from torchdrug.utils import plot

for i in range(2):
    sample = reaction_dataset[i]
    reactant, product = sample["graph"]
    reactants = reactant.connected_components()[0]
    products = product.connected_components()[0]
    plot.reaction(reactants, products)

AttributeError: 'USPTO50k' object has no attribute 'lazy'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/zq/yt7gftfj7_x_11psy591dtz00000gn/T/ipykernel_6673/2093549361.py in <module>
      2 
      3 for i in range(2):
----> 4     sample = reaction_dataset[i]
      5     reactant, product = sample["graph"]
      6     reactants = reactant.connected_components()[0]

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/data/dataset.py in __getitem__(self, index)
    138     def __getitem__(self, index):
    139         if isinstance(index, int):
--> 140             return self.get_item(index)
    141 
    142         index = self._standarize_index(index, len(self))

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/data/dataset.py in get_item(self, index)
    127 
    128     def get_item(self, index):
--> 129         if self.lazy:
    130             item = {"graph": data.Molecule.from_smiles(self.smiles_list[index], **self.kwargs)}
    131         else:

AttributeError: 'USPTO50k' object has no attribute 'lazy'
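
The traceback shows get_item reading self.lazy on an instance where that attribute was never set. One defensive pattern is a class-level default, so that subclasses with their own loading code still have a value to fall back on. The sketch below uses hypothetical classes to illustrate the pattern; it is not a description of how the library resolved this.

```python
class MoleculeDataset:
    # Class-level default, so subclasses that never set `lazy` still work.
    lazy = False

    def __init__(self, items):
        self.data = items

    def get_item(self, index):
        if self.lazy:
            # Lazy mode would build the item on demand; omitted in this sketch.
            raise NotImplementedError
        return {"graph": self.data[index]}

    def __getitem__(self, index):
        return self.get_item(index)


class ReactionDataset(MoleculeDataset):
    def __init__(self, items):
        # Deliberately does not touch `lazy`; the class default applies.
        self.data = items


dataset = ReactionDataset(["reactant>>product"])
print(dataset[0])  # {'graph': 'reactant>>product'}
```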

G2Gs retrosynthesis performance can not be reproduced?

Hello,

Thanks for sharing this library! The reproduced retrosynthesis results of G2Gs are different from the paper. Does this mean that the results of the G2Gs paper are not reproducible?

These are the reported results from the Torchdrug tutorial:

top-1 accuracy: 0.47541
top-3 accuracy: 0.741803
top-5 accuracy: 0.827869
top-10 accuracy: 0.879098

How to use MOSES train/test/testSF dataset in Torchdrug

TorchDrug implements the MOSES dataset but doesn't distinguish between the train / test / testSF splits that MOSES has. To train GCPN on MOSES, I think the correct order is to first pretrain the model on the train split, then train it on the test / testSF splits, and finally generate the molecules. But how can this be done in TorchDrug? There's only one dataset named MOSES.

I ask because when I generate molecules with MOSES, the statistics don't look correct compared to other models on MOSES, especially the Scaf/Test property in the table, which checks whether the generated molecules share scaffolds with the test dataset. It's 0 for the GCPN model after training with TorchDrug, following the tutorial. I think the problem is that TorchDrug only uses the train dataset and not the test dataset. How can I explicitly use it? Thanks in advance!

Tutorial Goal-directed molecule generation: UnboundLocalError: local variable 'sascorer' referenced before assignment

In the tutorial "Goal-directed molecule generation", the following error occurred

UnboundLocalError: local variable 'sascorer' referenced before assignment

import torchdrug
print(torchdrug.__version__)

0.1.0

import os
import pickle
import torch
from torchdrug import core, datasets, models, tasks
from collections import defaultdict

# dataset = datasets.ZINC250k("~/Projects/drugs/molecule-datasets/", kekulize=True,
#                             node_feature="symbol")
filename = os.path.expanduser("~/Projects/drugs/molecule-datasets/zinc250k.pkl")
print(f"Loading {filename}")
with open(filename, "rb") as fin:
    dataset = pickle.load(fin)

model = models.RGCN(input_dim=dataset.node_feature_dim,
                    num_relation=dataset.num_bond_type,
                    hidden_dims=[256, 256, 256, 256], batch_norm=False)
task = tasks.GCPNGeneration(model, dataset.atom_types,
                            max_edge_unroll=12, max_node=38,
                            task="plogp", criterion="ppo",
                            reward_temperature=1,
                            agent_update_interval=3, gamma=0.9)

optimizer = torch.optim.Adam(task.parameters(), lr=1e-5)
solver = core.Engine(task, dataset, None, None, optimizer,
                     #gpus=(0,),
                     batch_size=16, log_interval=10)

filename = os.path.expanduser("~/Projects/drugs/graphgeneration/gcpn_zinc250k_1epoch.pkl")
solver.load(filename,
            load_optimizer=False)

# RL finetuning
solver.train(num_epoch=10)
filename = os.path.expanduser("~/Projects/drugs/graphgeneration/gcpn_zinc250k_1epoch_finetune.pkl")
solver.save(filename)

UnboundLocalError: local variable 'sascorer' referenced before assignment
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
/var/folders/zq/yt7gftfj7_x_11psy591dtz00000gn/T/ipykernel_7569/599880889.py in <module>
     39 
     40 # RL finetuning
---> 41 solver.train(num_epoch=10)
     42 filename = os.path.expanduser("~/Projects/drugs/graphgeneration/gcpn_zinc250k_1epoch_finetune.pkl")
     43 solver.save(filename)

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/core/engine.py in train(self, num_epoch, batch_per_epoch)
    141                     batch = utils.cuda(batch, device=self.device)
    142 
--> 143                 loss, metric = model(batch)
    144                 if not loss.requires_grad:
    145                     raise RuntimeError("Loss doesn't require grad. Did you define any loss in the task?")

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/tasks/generation.py in forward(self, batch)
    700                 metric.update(_metric)
    701             elif criterion == "ppo":
--> 702                 _loss, _metric = self.reinforce_forward(batch)
    703                 all_loss += _loss * weight
    704                 metric.update(_metric)

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/tasks/generation.py in reinforce_forward(self, batch)
    811         for task in self.task:
    812             if task == "plogp":
--> 813                 plogp = metrics.penalized_logP(graph)
    814                 metric["Penalized logP"] = plogp.mean()
    815                 metric["Penalized logP (max)"] = plogp.max()

~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/metrics/metric.py in penalized_logP(pred)
    117                 Chem.GetSymmSSSR(mol)
    118                 logp = Descriptors.MolLogP(mol)
--> 119                 sa = sascorer.calculateScore(mol)
    120             logp = (logp - logp_mean) / logp_std
    121             sa = (sa - sa_mean) / sa_std

UnboundLocalError: local variable 'sascorer' referenced before assignment
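
For context, an `UnboundLocalError` means the name is assigned somewhere inside the function (an `import` counts as an assignment), so Python treats it as local, but that assignment did not run before the name was used. The sketch below reproduces the pattern with a conditional import. It is only an illustration of the error class: whether `penalized_logP` in `metric.py` fails for this exact reason is an assumption, and `score_with_conditional_import` and `helper` are hypothetical names.

```python
def score_with_conditional_import(load_helper):
    # Any assignment inside a function, including `import ... as name`,
    # makes the name local for the whole function body.
    if load_helper:
        import math as helper  # stand-in for an optional helper module
    # If the branch above was skipped, `helper` is local but never assigned.
    return helper.sqrt(16.0)


def describe_failure():
    # Call the function without loading the helper and report what happens.
    try:
        score_with_conditional_import(load_helper=False)
    except UnboundLocalError as error:
        return type(error).__name__
    return "no error"
```

If the real cause is similar, the usual remedy is to make sure the import that defines the name actually succeeds, or to fail early with a clear `ImportError` instead of continuing.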

Running quick start RuntimeError: Ninja is required to load C++ extensions

Hello. I was running the "quickstart" code on Ubuntu 20.04 with torch 1.9.0, Python 3.8, and CUDA 11.1.

When I ran the following code:

optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer, gpus=[0],
                     batch_size=512)
solver.train(num_epoch=100)

I got this error:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_2887068/1406504193.py in <module>
----> 1 solver.train(num_epoch=100)

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/core/engine.py in train(self, num_epoch, batch_per_epoch)
    141                     batch = utils.cuda(batch, device=self.device)
    142 
--> 143                 loss, metric = model(batch)
    144                 if not loss.requires_grad:
    145                     raise RuntimeError("Loss doesn't require grad. Did you define any loss in the task?")

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/tasks/property_prediction.py in forward(self, batch)
     72         metric = {}
     73 
---> 74         pred = self.predict(batch, all_loss, metric)
     75 
     76         if all([t not in batch for t in self.task]):

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/tasks/property_prediction.py in predict(self, batch, all_loss, metric)
    103     def predict(self, batch, all_loss=None, metric=None):
    104         graph = batch["graph"]
--> 105         output = self.model(graph, graph.node_feature.float(), all_loss=all_loss, metric=metric)
    106         pred = self.linear(output["graph_feature"])
    107         return pred

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/models/gin.py in forward(self, graph, input, all_loss, metric)
     74 
     75         for layer in self.layers:
---> 76             hidden = layer(graph, layer_input)
     77             if self.short_cut and hidden.shape == layer_input.shape:
     78                 hidden = hidden + layer_input

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/layers/conv.py in forward(self, graph, input)
     89             update = checkpoint.checkpoint(self._message_and_aggregate, *graph.to_tensors(), input)
     90         else:
---> 91             update = self.message_and_aggregate(graph, input)
     92         output = self.combine(input, update)
     93         return output

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/layers/conv.py in message_and_aggregate(self, graph, input)
    338 
    339     def message_and_aggregate(self, graph, input):
--> 340         adjacency = utils.sparse_coo_tensor(graph.edge_list.t()[:2], graph.edge_weight,
    341                                             (graph.num_node, graph.num_node))
    342         update = torch.sparse.mm(adjacency.t(), input)

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/utils/torch.py in sparse_coo_tensor(indices, values, size)
    160         size (list): size of the tensor
    161     """
--> 162     return torch_ext.sparse_coo_tensor_unsafe(indices, values, size)
    163 
    164 

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/utils/torch.py in __getattr__(self, key)
     26     def __getattr__(self, key):
     27         if "module" not in self.__dict__:
---> 28             self.module = cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
     29                                              self.extra_ldflags, self.extra_include_paths, self.build_directory,
     30                                              self.verbose, **self.kwargs)

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1078                 verbose=True)
   1079     '''
-> 1080     return _jit_compile(
   1081         name,
   1082         [sources] if isinstance(sources, str) else sources,

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1291                             clean_ctx=clean_ctx
   1292                         )
-> 1293                     _write_ninja_file_and_build_library(
   1294                         name=name,
   1295                         sources=sources,

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_standalone)
   1372         with_cuda: Optional[bool],
   1373         is_standalone: bool = False) -> None:
-> 1374     verify_ninja_availability()
   1375     if IS_WINDOWS:
   1376         compiler = os.environ.get('CXX', 'cl')

~/anaconda3/envs/torchDrug/lib/python3.8/site-packages/torch/utils/cpp_extension.py in verify_ninja_availability()
   1428     '''
   1429     if not is_ninja_available():
-> 1430         raise RuntimeError("Ninja is required to load C++ extensions")
   1431 
   1432 

RuntimeError: Ninja is required to load C++ extensions

Ninja and the other required packages were installed.
Could you give me some advice on how to solve this problem?
Thank you!
Best wishes!
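
Since the report says Ninja was installed, one thing worth checking is whether the `ninja` executable is visible to the same Python process that runs the notebook (a Jupyter kernel can have a different `PATH` than the shell). The sketch below is a rough stand-in for the availability check seen in the traceback, written with the standard library only. `ninja_status` is a hypothetical helper, and the assumption that the check amounts to running `ninja --version` is not confirmed by the source.

```python
import shutil
import subprocess
import sys


def ninja_status():
    """Report whether `ninja` is runnable from this Python process."""
    path = shutil.which("ninja")  # None if ninja is not on this process's PATH
    runnable = False
    if path is not None:
        try:
            subprocess.check_output([path, "--version"])
            runnable = True
        except (OSError, subprocess.CalledProcessError):
            runnable = False
    return {"python": sys.executable, "ninja_path": path, "runnable": runnable}
```

Running `print(ninja_status())` inside the notebook shows which interpreter is in use and whether it can find Ninja. If `ninja_path` is `None` there but Ninja works in a terminal, the two environments differ.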

RuntimeError: shape mismatch: value tensor of shape [28] cannot be broadcast to indexing result of shape [0]

I tried to run the code of Molecule Generation in the Tutorials.

from torchdrug import datasets, core, models, tasks
from torch import optim

dataset = datasets.ZINC250k("~/molecule-datasets/", kekulize=True,
                            node_feature="symbol")
model = models.RGCN(input_dim=dataset.node_feature_dim,
                    num_relation=dataset.num_bond_type,
                    hidden_dims=[256, 256, 256, 256], batch_norm=False)
task = tasks.GCPNGeneration(model, dataset.atom_types, max_edge_unroll=12,
                            max_node=38, criterion="nll")
optimizer = optim.Adam(task.parameters(), lr = 1e-3)
solver = core.Engine(task, dataset, None, None, optimizer,
                     gpus=(0,), batch_size=128, log_interval=10)
solver.train(num_epoch=1)
solver.save("gcpn_zinc250k_1epoch.pkl")
results = task.generate(num_sample=32, max_resample=5)

And I got the error.

Traceback (most recent call last):
  File "./test.py", line 18, in <module>
    results = task.generate(num_sample=32, max_resample=5)
  File "/home/foo/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/foo/opt/anaconda3/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/tasks/generation.py", line 1369, in generate
    new_graph = self._apply_action(graph, off_policy, max_resample, verbose=1)
  File "/home/foo/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/foo/opt/anaconda3/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/tasks/generation.py", line 1248, in _apply_action
    bond_type[is_modified_edge] = edge_action[has_modified_edge]
RuntimeError: shape mismatch: value tensor of shape [28] cannot be broadcast to indexing result of shape [0]

My environment: torch==1.4.0, python==3.7.4, rdkit==2020.03.3
Thanks.

Some questions about KnowledgeBaseGraphAttentionNetwork model

Hi, I have some questions about the KnowledgeBaseGraphAttentionNetwork model in kbgat.py, where KBGAT inherits from GAT. The original paper also applies attention over different relation types, but I cannot find any attention for multi-type relations in kbgat.py. Am I missing something in the code, or does kbgat.py not fully implement the original paper? Thanks.

An issue on Retrosynthesis Tasks

synthon_optimizer = torch.optim.Adam(synthon_task.parameters(), lr=1e-3)
synthon_solver = core.Engine(synthon_task, synthon_train, synthon_valid,
                             synthon_test, synthon_optimizer,
                             gpus=[0], batch_size=128)
synthon_solver.train(num_epoch=10)
synthon_solver.evaluate("valid")
synthon_solver.save("g2gs_synthon_model.pth")

Error: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'

I tried to run the code in the Quick Start. When I got to this step,

optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
                     batch_size=1024)
solver.train(num_epoch=5)

I got the bug

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-46cb357a6598> in <module>
      2 solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
      3                      batch_size=1024)
----> 4 solver.train(num_epoch=5)

~/torch_drug_test/torchdrug/torchdrug/core/engine.py in train(self, num_epoch, batch_per_epoch)
    129         model.train()
    130 
--> 131         for epoch in self.meter(num_epoch):
    132             sampler.set_epoch(epoch)
    133 

~/torch_drug_test/torchdrug/torchdrug/core/meter.py in __call__(self, num_epoch)
    100                 logger.warning(pretty.separator)
    101                 logger.warning("Epoch %d end" % epoch)
--> 102             self.step()

~/torch_drug_test/torchdrug/torchdrug/core/meter.py in step(self)
     82         logger.warning("ETA: %s" % pretty.time(eta))
     83         logger.warning("max GPU memory: %.1f MiB" % (torch.cuda.max_memory_allocated() / 1e6))
---> 84         torch.cuda.reset_peak_memory_stats()
     85 
     86         logger.warning(pretty.line)

~/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torch/cuda/memory.py in reset_peak_memory_stats(device)
    236     """
    237     device = _get_device_index(device, optional=True)
--> 238     return torch._C._cuda_resetPeakMemoryStats(device)
    239 
    240 

AttributeError: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'

It's likely because I'm using the CPU install, and it was easy enough for me to comment out the line

torch.cuda.reset_peak_memory_stats()

in
/torchdrug/core/meter.py. So perhaps one could just add an if statement to see if the GPU is enabled?
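
The guard suggested above could look roughly like the sketch below. It relies only on `is_available()` and `reset_peak_memory_stats()`, both of which exist in `torch.cuda`. The module is passed in as an argument so the sketch runs without a GPU build. `reset_peak_memory_stats_if_available` and `cpu_only_cuda` are hypothetical names, not part of torchdrug.

```python
from types import SimpleNamespace


def reset_peak_memory_stats_if_available(cuda):
    """Reset peak-memory counters only when a CUDA device is usable.

    `cuda` is expected to be the `torch.cuda` module.
    """
    if cuda.is_available():
        cuda.reset_peak_memory_stats()
        return True
    return False


def _missing_reset():
    # Mimics the failure from the traceback above on a CPU-only install.
    raise AttributeError("no attribute '_cuda_resetPeakMemoryStats'")


# Stand-in for a CPU-only build: is_available() reports False.
cpu_only_cuda = SimpleNamespace(is_available=lambda: False,
                                reset_peak_memory_stats=_missing_reset)
```

In real code the call would be `reset_peak_memory_stats_if_available(torch.cuda)`. On a CPU-only install it returns `False` without touching the missing attribute.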

Broadly speaking, is this code repo robust for CPU users or is it targeted at GPU only?

Functionality for Explainable Graph Neural Networks?

Hi!

Are you going to add functionality for the interpretation of GNN models to torchdrug?

There are benchmark datasets (Benchmarks for interpretation of QSAR models)
and a whole range of methods (Explainability in Graph Neural Networks: A Taxonomic Survey).
Unfortunately, I haven't seen a method that directly combines explainability and uncertainty quantification (like evidential deep learning).
That would really help our medicinal chemists understand why a model made a decision and how certain the model is about it.

Thanks

PackageNotFoundError: Packages missing in current channels: torchdrug

I ran conda install -c milagraph -c conda-forge torchdrug to install torchdrug, but it shows this error. Can you tell me what causes this problem? Thank you. My environment: torch==1.8, python==3.6

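The example does not run correctly

The td.CARBON used on the project's introduction page is never imported or defined, so the example cannot demonstrate the project's functionality.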

PCQM4M-LSC dataset

Dear Torchdrug team,

Can you please add a reader for the PCQM4M-LSC dataset?
See https://ogb.stanford.edu/kddcup2021/pcqm4m/

Many thanks!

Shape mismatch error in the GraphConv layer

torchdrug/torchdrug/layers/conv.py

Lines 154 to 174 in 5bf0a50

    
    def message_and_aggregate(self, graph, input):
        node_in, node_out = graph.edge_list.t()[:2]
        node_in = torch.cat([node_in, torch.arange(graph.num_node, device=graph.device)])
        node_out = torch.cat([node_out, torch.arange(graph.num_node, device=graph.device)])
        edge_weight = torch.cat([graph.edge_weight, torch.ones(graph.num_node, device=graph.device)])
        degree_in = graph.degree_in + 1
        degree_out = graph.degree_out + 1
        edge_weight = edge_weight / (degree_in[node_in] * degree_out[node_out]).sqrt()
        adjacency = utils.sparse_coo_tensor(torch.stack([node_in, node_out]), edge_weight,
                                            (graph.num_node, graph.num_node))
        update = torch.sparse.mm(adjacency.t(), input)
        if self.edge_linear:
            edge_input = graph.edge_feature.float()
            if self.edge_linear.in_features > self.edge_linear.out_features:
                edge_input = self.edge_linear(edge_input)
            edge_weight = edge_weight.unsqueeze(-1)
            edge_update = scatter_add(edge_input * edge_weight, graph.edge_list[:, 1], dim=0,
                                      dim_size=graph.num_node)
            if self.edge_linear.in_features <= self.edge_linear.out_features:
                edge_update = self.edge_linear(edge_update)
            update += edge_update

In the message_and_aggregate() method of GraphConv class, the edge list is concatenated with self-loops while the edge input isn't. This will cause a shape mismatch error in this line.

torchdrug/torchdrug/layers/conv.py

Lines 170 to 171 in 5bf0a50

    
            edge_update = scatter_add(edge_input * edge_weight, graph.edge_list[:, 1], dim=0,
                                      dim_size=graph.num_node)
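
The mismatch described above can be checked with plain shape arithmetic. The sketch below assumes `graph.edge_feature` has one row per original edge, and that `edge_weight` has one entry per original edge plus one per node after the self-loop concatenation, as in the quoted code. `graphconv_shapes` and `can_broadcast` are hypothetical helpers that apply the usual broadcasting rule.

```python
def graphconv_shapes(num_node, num_edge, edge_feature_dim):
    # edge_weight gains one entry per node (the self-loops), then unsqueeze(-1).
    edge_weight_shape = (num_edge + num_node, 1)
    # edge_input comes straight from graph.edge_feature, without self-loops.
    edge_input_shape = (num_edge, edge_feature_dim)
    return edge_input_shape, edge_weight_shape


def can_broadcast(shape_a, shape_b):
    """Trailing dimensions must be equal, or one of them must be 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True
```

For a graph with 4 nodes and 6 edges, `edge_input` has 6 rows and `edge_weight` has 10, so `edge_input * edge_weight` cannot broadcast. The shapes only agree when the graph has no nodes, which is why the product fails in practice.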

Json dump of solver.config_dict() fails with TypeError: Object of type Subset is not JSON serializable | in train_set

torchdrug.version: 0.1.1

Code

import json
import torch
from torchdrug import core, datasets, tasks, models

dataset = datasets.OPV("~/molecule-datasets/")
train_set, valid_set, test_set = dataset.split()
print(f"# Train/Valid/Test: {len(train_set)}/{len(valid_set)}/{len(test_set)}")

model = models.GIN(
    input_dim=dataset.node_feature_dim,
    hidden_dims=[300, 300, 300, 300],
    short_cut=True,
    batch_norm=True,
    concat_hidden=True,
)
subtasks = (
    "gap",
    "homo",
    "lumo",
    "spectral_overlap",
)
task = tasks.PropertyPrediction(
    model, task=subtasks, criterion="mse", metric=("mae", "rmse"), verbose=1
)
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(
    task,
    train_set,
    valid_set,
    test_set,
    optimizer,
    gpus=[0],
    batch_size=256,
)
with open("opv_gin.json", "w") as fout:
    json.dump(solver.config_dict(), fout)

Error

Traceback (most recent call last):
  File "~/opv-gin-property-prediction.py", line 40, in <module>
    json.dump(solver.config_dict(), fout)
  File "~/opt/anaconda3/envs/drugs/lib/python3.8/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "~/opt/anaconda3/envs/drugs/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "~/opt/anaconda3/envs/drugs/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "~/opt/anaconda3/envs/drugs/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "~/opt/anaconda3/envs/drugs/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Subset is not JSON serializable

Output JSON File

opv_gin.json

{
    "class": "core.Engine",
    "task": {
        "class": "tasks.PropertyPrediction",
        "model": {
            "class": "models.GIN",
            "input_dim": 69,
            "hidden_dims": [
                300,
                300,
                300,
                300
            ],
            "edge_input_dim": null,
            "num_mlp_layer": 2,
            "eps": 0,
            "learn_eps": false,
            "short_cut": true,
            "batch_norm": true,
            "activation": "relu",
            "concat_hidden": true,
            "readout": "sum"
        },
        "task": [
            "gap",
            "homo",
            "lumo",
            "spectral_overlap"
        ],
        "criterion": "mse",
        "metric": [
            "mae",
            "rmse"
        ],
        "verbose": 1
    },
    "train_set":

Writing of the file opv_gin.json stops with the error above.
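
As a possible workaround on the caller's side, the configuration can be serialized to a string before anything is written, with a fallback for objects `json` cannot encode. The sketch below uses only the standard library. `FakeSubset` is a hypothetical stand-in for the `Subset` object in the report, `dump_config` is a hypothetical helper, and using `repr` as the fallback is an assumption, not torchdrug's behaviour.

```python
import io
import json


class FakeSubset:
    """Hypothetical stand-in for an object that json cannot encode."""


def dump_config(config, fout):
    # Encode fully in memory first, so nothing is written if encoding fails.
    # `default=repr` turns unknown objects into their string representation.
    text = json.dumps(config, indent=4, default=repr)
    fout.write(text)
    return text
```

With this, `dump_config(solver.config_dict(), fout)` would write a complete file in which the dataset entries appear as strings. The output is then useful for logging but cannot be used to reconstruct the datasets.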

An error:TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

I get an error when I try to run the following test code.

from torchdrug import data

edge_list = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]]
graph = data.Graph(edge_list, num_node=6)
graph = graph.cuda()

# the subgraph induced by nodes 2, 3 & 4

subgraph = graph.subgraph([2, 3, 4])

The error is:
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

How can I fix it?

The error when run tutorial of retrosynthesis

Dear everyone,

I have installed torchdrug correctly and then followed the tutorial https://torchdrug.ai/docs/tutorials/retrosynthesis.html
When I run the code below:

from torchdrug import datasets

reaction_dataset = datasets.USPTO50k("D:/test/molecule-datasets/",
                                     node_feature="reaction_reaction_identification",
                                     kekulize=True)
synthon_dataset = datasets.USPTO50k("D:/test/molecule-dataset/", as_synthon=True,
                                    node_feature="synthon_completion",
                                    kekulize=True)

The following error occurs:

Loading D:/test/molecule-datasets/data_processed.csv: 100%|██████████| 50017/50017 [00:00<00:00, 92358.37it/s]
Constructing molecules from SMILES:   0%|          | 0/50016 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "F:/workdir/pycharm/Retrosynthesis/main.py", line 5, in <module>
    kekulize=True)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\core\core.py", line 282, in wrapper
    return init(self, *args, **kwargs)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\datasets\uspto50k.py", line 63, in __init__
    **kwargs)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\data\dataset.py", line 112, in load_csv
    self.load_smiles(smiles, targets, verbose=verbose, **kwargs)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\data\dataset.py", line 232, in load_smiles
    mol = data.Molecule.from_molecule(mol, **kwargs)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\data\molecule.py", line 183, in from_molecule
    func = R.get("features.atom.%s" % name)
  File "D:\soft\Anaconda3\envs\py37\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\core\core.py", line 208, in get
    raise KeyError("Can't find `%s` in `%s`" % (key, ".".join(keys[:i])))
KeyError: "Can't find `reaction_reaction_identification` in `features.atom`"

What is the problem? Could you help me solve it?
Thanks.

UnboundLocalError: local variable 'sascorer' referenced before assignment in Tutorials: Molecule Generation

Hi, when I get to the part "Goal-Directed Molecule Generation with Reinforcement Learning: GCPN":

import torch
import pickle
from torchdrug import core, datasets, models, tasks
from torch import nn, optim
from collections import defaultdict

with open("/home/ibmc-2/Projects/torchdrug/zinc250k.pkl", "rb") as fin:
    dataset = pickle.load(fin)

model = models.RGCN(input_dim=dataset.node_feature_dim,
                    num_relation=dataset.num_bond_type,
                    hidden_dims=[256, 256, 256, 256], batch_norm=False)
task = tasks.GCPNGeneration(model, dataset.atom_types,
                            max_edge_unroll=12, max_node=38,
                            task="plogp", criterion="ppo",
                            reward_temperature=1,
                            agent_update_interval=3, gamma=0.9)


optimizer = optim.Adam(task.parameters(), lr=1e-5)
solver = core.Engine(task, dataset, None, None, optimizer,
                     gpus=(0,), batch_size=16, log_interval=10)

solver.load("/home/ibmc-2/Projects/torchdrug/gcpn_zinc250k_1epoch.pkl",
             load_optimizer=False)

The above part works normally, but an error is reported when running the fine-tuning step:

solver.train(num_epoch=10)
solver.save("/home/ibmc-2/Projects/torchdrug/gcpn_zinc250k_1epoch_finetune.pkl")

as follows:

solver.train(num_epoch=10)
19:16:03   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
19:16:03   Epoch 0 begin
/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/data/molecule.py:103: UserWarning: Try to apply masks on molecules with stereo bonds. This may produce invalid molecules. To discard stereo information, call `mol.bond_stereo[:] = 0` before applying masks.
  warnings.warn("Try to apply masks on molecules with stereo bonds. This may produce invalid molecules. "
/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torch/nn/functional.py:1698: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Traceback (most recent call last):
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-451bfe7c53dc>", line 1, in <module>
    solver.train(num_epoch=10)
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/core/engine.py", line 143, in train
    loss, metric = model(batch)
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/tasks/generation.py", line 702, in forward
    _loss, _metric = self.reinforce_forward(batch)
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/tasks/generation.py", line 813, in reinforce_forward
    plogp = metrics.penalized_logP(graph)
  File "/home/ibmc-2/anaconda3/envs/td/lib/python3.8/site-packages/torchdrug-0.1.0-py3.8.egg/torchdrug/metrics/metric.py", line 119, in penalized_logP
    #sa = sascorer.calculateScore(mol)
UnboundLocalError: local variable 'sascorer' referenced before assignment

By the way, is it normal that UserWarning: Unknown value Tl (or Mn, or Cr) appears when downloading the ClinTox dataset? Thank you!

undefined symbol: _ZNK2at6Tensor6deviceEv

I installed torchdrug with

conda install -c milagraph -c conda-forge torchdrug

I've performed the verification for pytorch https://pytorch.org/get-started/locally/#linux-verification

This is the error I get when I try import torchdrug:

OSError:  <redacted>/.venv/lib/python3.8/site-packages/torch_scatter/_scatter_cuda.so: undefined symbol: _ZNK2at6Tensor6deviceEv

Operating system: ArchLinux

Is it correct to have warnings of unknown value in feature.py?

How can I get the SMILES that lead to an RDKit error during data loading?

Like:

Constructing molecules from SMILES:  22%|██▏       | 305/1417 [00:00<00:02, 465.19it/s]
RDKit ERROR: [11:20:56] Explicit valence for atom # 7 O, 3, is greater than permitted
[11:20:56] Explicit valence for atom # 7 O, 3, is greater than permitted

Constructing molecules from SMILES:  33%|███▎      | 473/1417 [00:01<00:03, 308.42it/s]
~/opt/anaconda3/envs/drugs/lib/python3.8/site-packages/torchdrug/data/feature.py:37: UserWarning: Unknown value `As`
  warnings.warn("Unknown value `%s`" % x)

Constructing molecules from SMILES:  56%|█████▌    | 788/1417 [00:06<00:11, 55.14it/s]
RDKit ERROR: [11:21:02] Explicit valence for atom # 3 O, 3, is greater than permitted

There are arsenic-based drugs. Are these correctly processed in feature.py?
See for example: https://go.drugbank.com/categories/DBCAT001515
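
One way to locate the offending SMILES, assuming the dataset is built from a plain list of SMILES strings, is to parse each string individually and collect the ones that fail. RDKit's `Chem.MolFromSmiles` returns `None` when parsing or sanitization fails, which is the case for the "Explicit valence" errors above. In the sketch below the parser is passed in as an argument so the helper can be tried without RDKit. `find_unparsable` is a hypothetical helper, not part of torchdrug.

```python
def find_unparsable(smiles_list, parse):
    """Return (index, smiles) pairs for which `parse` returns None.

    `parse` is expected to be `rdkit.Chem.MolFromSmiles`.
    """
    failures = []
    for index, smiles in enumerate(smiles_list):
        if parse(smiles) is None:
            failures.append((index, smiles))
    return failures
```

With RDKit installed, `find_unparsable(smiles_list, Chem.MolFromSmiles)` would list the positions and strings that trigger an error. The `Unknown value` warnings come from feature encoding rather than parsing, so this check does not flag those molecules.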

What all molecule generation models can be used?

Are GCPN and GraphAF the only two graph generative models available? Can I integrate other models such as ZINC, ORGAN, and JT-VAE for molecule generation?

How to determine the "node_feature" and "edge_feature"?

Hi all,

I have run into a question that I cannot find answered in the tutorials or documentation.
Is there a list of allowed values for "node_feature" and "edge_feature"?
In some places "node_feature" is "default", "pretrain", or another value. Is there any difference between these values, and where can I find the full list of candidates?

Thank you! Looking forward to your reply!

Best,

How to load an existing checkpoint and train it for more epochs

For example, after training a model for 3 epochs and saving it as model_3epoch.pkl, how can I load model_3epoch.pkl and train it for more epochs?

Besides that, if we train model_3epoch.pkl on the training dataset, can we then load it and continue training on the test dataset?

The question was asked here at first.

TorchDrug for Ubuntu 18.04 without Anaconda

What is the simplest way to install it with pip on Ubuntu 18.04 please?

I have successfully installed all requirements, however, it can't find the RDKit:

ModuleNotFoundError: No module named 'rdkit'

To install it, I used: sudo apt-get install python-rdkit librdkit1 rdkit-data as per RDKit

Any ideas on how to run this awesome library without Anaconda?

How to evaluate molecule generation models?

Hi torchdrug team, thank you for the awesome project! I am playing with molecule generation models, and am interested in trying to reproduce the benchmarks posted here: https://torchdrug.ai/docs/benchmark/generation.html

I am able to follow the tutorial for molecule generation: https://torchdrug.ai/docs/tutorials/generation.html

But I found that there was no mention of how we can evaluate models once they are fully trained. Is there any evaluator class or oracle that can be called to obtain the metrics as in your benchmark?

Additionally, do you have any advice on how to set the hyperparameters to fairly reproduce/compare to the GCPN or GraphAF papers?

Error installing OSError undefined symbols

Background

I am facing installation issues. Note, my installation is in a fresh conda environment, and my only manual installs were numpy and pytorch before running

conda install -c milagraph -c conda-forge torchdrug

Details below, and happy to provide more info. Thanks!

System and Conda Env Info

System information
Ubuntu 20.04.2 LTS, 64-bit
No GPU

python --version
> Python 3.8.8

conda --version
> conda 4.10.3

conda list
># Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             4.5                       1_gnu  
alsa-lib                  1.2.3                h516909a_0    conda-forge
boost                     1.74.0           py38hc10631b_3    conda-forge
boost-cpp                 1.74.0               h312852a_4    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2021.5.30            ha878542_0    conda-forge
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2021.5.30        py38h578d9bd_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
cpuonly                   1.0                           0    pytorch
cycler                    0.10.0                     py_2    conda-forge
dbus                      1.13.6               h48d8840_2    conda-forge
decorator                 5.0.9              pyhd8ed1ab_0    conda-forge
expat                     2.4.1                h9c3ff4c_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
glib                      2.68.3               h9c3ff4c_0    conda-forge
glib-tools                2.68.3               h9c3ff4c_0    conda-forge
greenlet                  1.1.0            py38h709712a_0    conda-forge
gst-plugins-base          1.18.4               hf529b03_2    conda-forge
gstreamer                 1.18.4               h76c114f_2    conda-forge
icu                       68.1                 h58526e2_0    conda-forge
intel-openmp              2021.3.0          h06a4308_3350  
jbig                      2.1               h7f98852_2003    conda-forge
jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
kiwisolver                1.3.1            py38h1fd1430_1    conda-forge
krb5                      1.19.2               hcc1bbae_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.35.1               h7274673_9  
lerc                      2.2.1                h9c3ff4c_0    conda-forge
libblas                   3.9.0           11_linux64_openblas    conda-forge
libcblas                  3.9.0           11_linux64_openblas    conda-forge
libclang                  11.1.0          default_ha53f305_1    conda-forge
libdeflate                1.7                  h7f98852_5    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgfortran-ng            11.1.0               h69a702a_8    conda-forge
libgfortran5              11.1.0               h6c583b3_8    conda-forge
libglib                   2.68.3               h3e27bee_0    conda-forge
libgomp                   9.3.0               h5101ec6_17  
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           11_linux64_openblas    conda-forge
libllvm11                 11.1.0               hf817b99_2    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.17          pthreads_h8fe5266_1    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     13.3                 hd57d9b9_0    conda-forge
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtiff                   4.3.0                hf544144_1    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.2.0                h7f98852_2    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.12               h72842e0_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markupsafe                2.0.1            py38h497a2fe_0    conda-forge
matplotlib                3.4.2            py38h578d9bd_0    conda-forge
matplotlib-base           3.4.2            py38hcc49a3a_0    conda-forge
mkl                       2021.3.0           h06a4308_520  
mysql-common              8.0.25               ha770c72_2    conda-forge
mysql-libs                8.0.25               hfa10184_2    conda-forge
ncurses                   6.2                  he6710b0_1  
networkx                  2.5                        py_0    conda-forge
ninja                     1.10.2               h4bd325d_0    conda-forge
nspr                      4.30                 h9c3ff4c_0    conda-forge
nss                       3.67                 hb5efdd6_0    conda-forge
numpy                     1.21.1           py38h9894fe3_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1k               h7f98852_0    conda-forge
pandas                    1.3.1            py38h1abd341_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    8.3.1            py38h8e6f84c_0    conda-forge
pip                       21.2.3             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycairo                   1.20.1           py38hf61ee4a_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.12.3           py38h578d9bd_7    conda-forge
pyqt-impl                 5.12.3           py38h7400c14_7    conda-forge
pyqt5-sip                 4.19.18          py38h709712a_7    conda-forge
pyqtchart                 5.12             py38h7400c14_7    conda-forge
pyqtwebengine             5.12.1           py38h7400c14_7    conda-forge
python                    3.8.11          h12debd9_0_cpython  
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.4.0               py3.8_cpu_0  [cpuonly]  pytorch
pytorch_scatter           2.0.4            py38h9235441_1    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
qt                        5.12.9               hda022c4_4    conda-forge
rdkit                     2021.03.4        py38hf8acc3d_0    conda-forge
readline                  8.1                  h27cfd23_0  
reportlab                 3.5.68           py38hadf75a6_0    conda-forge
setuptools                52.0.0           py38h06a4308_0  
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlalchemy                1.4.22           py38h497a2fe_0    conda-forge
sqlite                    3.36.0               hc218d9a_0  
tk                        8.6.10               hbc83047_0  
torchaudio                0.4.0                      py38    pytorch
torchdrug                 0.1.0                  h39ad8c7    milagraph
torchvision               0.5.0                  py38_cpu  [cpuonly]  pytorch
tornado                   6.1              py38h497a2fe_1    conda-forge
tqdm                      4.62.0             pyhd8ed1ab_0    conda-forge
wheel                     0.36.2             pyhd3eb1b0_0  
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.5.0                ha95c52a_0    conda-forge

Error output

Here is the error:

(torchdrug) 1_Projects $ python
Python 3.8.11 (default, Aug  3 2021, 15:09:35) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchdrug
Traceback (most recent call last):
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torch_scatter/__init__.py", line 12, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torch/_ops.py", line 106, in load_library
    ctypes.CDLL(path)
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torch_scatter/_version.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torchdrug/__init__.py", line 1, in <module>
    from . import patch
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torchdrug/patch.py", line 12, in <module>
    from torchdrug import core, data
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torchdrug/core/__init__.py", line 2, in <module>
    from .engine import Engine
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torchdrug/core/engine.py", line 10, in <module>
    from torchdrug import data, core, utils
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torchdrug/data/__init__.py", line 1, in <module>
    from .graph import Graph, PackedGraph, cat
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torchdrug/data/graph.py", line 9, in <module>
    from torch_scatter import scatter_add, scatter_min
  File "/home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torch_scatter/__init__.py", line 21, in <module>
    raise OSError(e)
OSError: /home/murph213/anaconda3/envs/torchdrug/lib/python3.8/site-packages/torch_scatter/_version.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
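
The undefined `torch::jit` symbol suggests that the `torch_scatter` binary was built against a different PyTorch than the 1.4.0 listed above. This is an inference from the traceback, not a confirmed diagnosis. Below is a small sketch for printing the installed versions without importing the packages, so the check itself cannot hit the same `OSError`. `installed_versions` is a hypothetical helper, and the distribution names passed to it are assumptions about how the packages are registered in the environment.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your environment.
    found = installed_versions(["torch", "torch-scatter", "torchdrug"])
    for name, version in found.items():
        print(f"{name}: {version or 'not found'}")
```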

ImportError: No module named 'torch_ext'

Hi, I was running the "quickstart" code on Windows 10, with torch = 1.8.0, python = 3.7, and cuda = 10.2.

The problems happened when I tried training the model in Jupyter:

optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer, gpus=[0],
                     batch_size=512)
solver.train(num_epoch=100)

This produced the following error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19744/1406504193.py in <module>
----> 1 solver.train(num_epoch=100)

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\core\engine.py in train(self, num_epoch, batch_per_epoch)
    141                     batch = utils.cuda(batch, device=self.device)
    142 
--> 143                 loss, metric = model(batch)
    144                 if not loss.requires_grad:
    145                     raise RuntimeError("Loss doesn't require grad. Did you define any loss in the task?")

d:\conda\envs\torchdrug\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\tasks\property_prediction.py in forward(self, batch)
     72         metric = {}
     73 
---> 74         pred = self.predict(batch, all_loss, metric)
     75 
     76         if all([t not in batch for t in self.task]):

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\tasks\property_prediction.py in predict(self, batch, all_loss, metric)
    103     def predict(self, batch, all_loss=None, metric=None):
    104         graph = batch["graph"]
--> 105         output = self.model(graph, graph.node_feature.float(), all_loss=all_loss, metric=metric)
    106         pred = self.linear(output["graph_feature"])
    107         return pred

d:\conda\envs\torchdrug\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\models\gin.py in forward(self, graph, input, all_loss, metric)
     74 
     75         for layer in self.layers:
---> 76             hidden = layer(graph, layer_input)
     77             if self.short_cut and hidden.shape == layer_input.shape:
     78                 hidden = hidden + layer_input

d:\conda\envs\torchdrug\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\layers\conv.py in forward(self, graph, input)
     89             update = checkpoint.checkpoint(self._message_and_aggregate, *graph.to_tensors(), input)
     90         else:
---> 91             update = self.message_and_aggregate(graph, input)
     92         output = self.combine(input, update)
     93         return output

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\layers\conv.py in message_and_aggregate(self, graph, input)
    339     def message_and_aggregate(self, graph, input):
    340         adjacency = utils.sparse_coo_tensor(graph.edge_list.t()[:2], graph.edge_weight,
--> 341                                             (graph.num_node, graph.num_node))
    342         update = torch.sparse.mm(adjacency.t(), input)
    343         if self.edge_linear:

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\utils\torch.py in sparse_coo_tensor(indices, values, size)
    160         size (list): size of the tensor
    161     """
--> 162     return torch_ext.sparse_coo_tensor_unsafe(indices, values, size)
    163 
    164 

d:\conda\envs\torchdrug\lib\site-packages\torchdrug-0.1.0-py3.7.egg\torchdrug\utils\torch.py in __getattr__(self, key)
     28             self.module = cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
     29                                              self.extra_ldflags, self.extra_include_paths, self.build_directory,
---> 30                                              self.verbose, **self.kwargs)
     31         return getattr(self.module, key)
     32 

d:\conda\envs\torchdrug\lib\site-packages\torch\utils\cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1089         is_python_module,
   1090         is_standalone,
-> 1091         keep_intermediates=keep_intermediates)
   1092 
   1093 

d:\conda\envs\torchdrug\lib\site-packages\torch\utils\cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1315         return _get_exec_path(name, build_directory)
   1316 
-> 1317     return _import_module_from_library(name, build_directory, is_python_module)
   1318 
   1319 

d:\conda\envs\torchdrug\lib\site-packages\torch\utils\cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
   1697 def _import_module_from_library(module_name, path, is_python_module):
   1698     # https://stackoverflow.com/questions/67631/how-to-import-a-module-given-the-full-path
-> 1699     file, path, description = imp.find_module(module_name, [path])
   1700     # Close the .so file after load.
   1701     with file:

d:\conda\envs\torchdrug\lib\imp.py in find_module(name, path)
    294         break  # Break out of outer loop when breaking out of inner loop.
    295     else:
--> 296         raise ImportError(_ERR_MSG.format(name), name=name)
    297 
    298     encoding = None

ImportError: No module named 'torch_ext'

The same code works well in Colab and I suspect this is because I couldn't install rdkit-pypi and installed rdkit on conda instead.
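
Since the failing frames are in PyTorch's JIT extension loader (`torch.utils.cpp_extension.load`) rather than in RDKit, one thing worth ruling out is a missing build toolchain. JIT-compiled extensions need Ninja and a C++ compiler on the PATH (MSVC's `cl` on Windows). This is a minimal sketch, not a confirmed diagnosis; `find_build_tools` is a hypothetical helper and the candidate tool names are assumptions.

```python
import shutil
import sys


def find_build_tools(candidates=None):
    """Return {tool name: resolved path, or None if not on PATH}."""
    if candidates is None:
        # Assumed defaults: MSVC on Windows, common C++ compilers elsewhere.
        compilers = ["cl"] if sys.platform == "win32" else ["c++", "g++", "clang++"]
        candidates = ["ninja"] + compilers
    return {tool: shutil.which(tool) for tool in candidates}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

Run it from the same environment as the Jupyter kernel, since that kernel's PATH may differ from the one in a terminal.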

Fail to train synthon completion in retrosynthesis: Pre-condition Violation

Following the tutorial in the docs, I got an error on this line.

synthon_solver = core.Engine(synthon_task, synthon_train, synthon_valid, synthon_test, synthon_optimizer, gpus=[0], batch_size=128)

The error message is:

****
Pre-condition Violation
bgnIdx not connected to begin atom of bond
Violation occurred on line 292 in file /opt/conda/conda-bld/rdkit_1603173682698/work/Code/GraphMol/Bond.cpp
Failed Expression: getOwningMol().getBondBetweenAtoms(getBeginAtomIdx(), bgnIdx) != nullptr
****

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhuzhaoc/torchdrug/torchdrug/core/engine.py", line 143, in train
    loss, metric = model(batch)
  File "/home/zhuzhaoc/.local/envs/ogb/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zhuzhaoc/torchdrug/torchdrug/tasks/retrosynthesis.py", line 596, in forward
    pred, target = self.predict_and_target(batch, all_loss, metric)
  File "/home/zhuzhaoc/torchdrug/torchdrug/tasks/retrosynthesis.py", line 993, in predict_and_target
    graph2, node_in_target2, node_out_target2, bond_target2, stop_target2 = self.all_stop(reactant, synthon)
  File "/home/zhuzhaoc/.local/envs/ogb/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/zhuzhaoc/torchdrug/torchdrug/tasks/retrosynthesis.py", line 586, in all_stop
    graph, feature_valid = self._update_molecule_feature(graph)
  File "/home/zhuzhaoc/torchdrug/torchdrug/tasks/retrosynthesis.py", line 385, in _update_molecule_feature
    mols = graphs.to_molecule(ignore_error=True)
  File "/home/zhuzhaoc/torchdrug/torchdrug/data/molecule.py", line 788, in to_molecule
    bond.SetStereoAtoms(*stereo_atoms[j])
RuntimeError: Pre-condition Violation
        bgnIdx not connected to begin atom of bond
        Violation occurred on line 292 in file Code/GraphMol/Bond.cpp
        Failed Expression: getOwningMol().getBondBetweenAtoms(getBeginAtomIdx(), bgnIdx) != nullptr
        RDKIT: 2020.09.1
        BOOST: 1_73

This is because some molecules in USPTO50k have stereo bonds.
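
To gauge how many reactions might be affected, a rough heuristic is to look for the SMILES directional bond symbols `/` and `\`, which encode double-bond stereochemistry. The sketch below is only a string-level screen, not a substitute for RDKit's stereo perception. `has_stereo_bond_markers` and `count_stereo` are hypothetical helper names.

```python
def has_stereo_bond_markers(smiles: str) -> bool:
    """Return True if the SMILES string contains directional bond symbols."""
    return "/" in smiles or "\\" in smiles


def count_stereo(smiles_list):
    """Count how many SMILES strings carry directional bond symbols."""
    return sum(has_stereo_bond_markers(s) for s in smiles_list)


# trans- and cis-1,2-difluoroethene carry markers; ethanol does not.
examples = ["F/C=C/F", "F/C=C\\F", "CCO"]
print(count_stereo(examples))  # prints 2
```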

RDKit Error when preparing the USPTO dataset

I followed the instructions to install TorchDrug and hit a RuntimeError when preparing the USPTO-50k dataset:
from torchdrug import datasets
reaction_dataset = datasets.USPTO50k("~/molecule-datasets/", node_feature="center_identification", kekulize=True)

The error log was:

reaction_dataset = datasets.USPTO50k("~/data/molecule-datasets/",
node_feature="center_identification",
kekulize=True)
Loading /home/masa/data/molecule-datasets/data_processed.csv: 100%|█| 50017/50017 [00:00<00:00, 11396
Constructing molecules from SMILES: 100%|█████████████████████| 50016/50016 [03:31<00:00, 236.07it/s]
Computing reaction centers: 0%| | 0/50016 [00:00<?, ?it/s]/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/data/graph.py:411: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
return match.nonzero().flatten()
[21:52:07]

Pre-condition Violation
getNumImplicitHs() called without preceding call to calcImplicitValence()
Violation occurred on line 188 in file /home/conda/feedstock_root/build_artifacts/rdkit_1629841762512/work/Code/GraphMol/Atom.cpp
Failed Expression: d_implicitValence > -1

Computing reaction centers: 0%| | 0/50016 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "", line 1, in
  File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/core/core.py", line 282, in wrapper
    return init(self, *args, **kwargs)
  File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/datasets/uspto50k.py", line 83, in init
    reactants, products = process_fn(reactant, product)
  File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/datasets/uspto50k.py", line 142, in _get_reaction_center
    reactant_hs = torch.tensor([atom.GetTotalNumHs() for atom in reactant.to_molecule().GetAtoms()])
  File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/data/molecule.py", line 332, in to_molecule
    Chem.AssignStereochemistry(mol)
RuntimeError: Pre-condition Violation
        getNumImplicitHs() called without preceding call to calcImplicitValence()
        Violation occurred on line 188 in file Code/GraphMol/Atom.cpp
        Failed Expression: d_implicitValence > -1
        RDKIT: 2021.03.5
        BOOST: 1_74

My env setting was python=3.7, torch=1.7.1, CUDA=11.0, and RDKit=2021.03.5.

Loss doesn't require grad. Did you define any loss in the task?

I have defined the MSE loss, but I still get this error.

Here is the code:

train_set, valid_set, test_set, dataset = load_dataset('Caco-2-Permeability.csv')
model = models.GIN(input_dim=dataset.node_feature_dim,
                   hidden_dims=[256, 256, 256, 256],
                   short_cut=True, batch_norm=True, concat_hidden=True)
task = tasks.PropertyPrediction(model, task=(),
                                criterion="mse", metric="rmse")
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
                     gpus=[0], batch_size=1024)
solver.train(num_epoch=100)
solver.evaluate("valid")
solver.save("clintox_gin_infograph.pth")
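
One thing that stands out, offered as a guess rather than a confirmed diagnosis: the snippet passes `task=()`. The `PropertyPrediction.forward` excerpt in the `torch_ext` traceback earlier on this page contains the check `if all([t not in batch for t in self.task])`. Python's `all` is true for an empty list, so with no task names that branch is taken even when the batch holds labels. The sketch below reproduces only that check; the batch contents and the label name `"Y"` are made up for illustration.

```python
def no_task_in_batch(task_names, batch):
    """Same expression as the check quoted from property_prediction.py."""
    return all([t not in batch for t in task_names])


batch = {"graph": object(), "Y": 1.23}  # hypothetical batch with one label column

print(no_task_in_batch((), batch))      # True: all([]) is True, nothing was checked
print(no_task_in_batch(("Y",), batch))  # False: the label column is found
```

If that is what happens here, passing the label column name(s) of the CSV as `task` would be the first thing to try.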

    def message_and_aggregate(self, graph, input):
        node_in, node_out = graph.edge_list.t()[:2]
        node_in = torch.cat([node_in, torch.arange(graph.num_node, device=graph.device)])
        node_out = torch.cat([node_out, torch.arange(graph.num_node, device=graph.device)])
        edge_weight = torch.cat([graph.edge_weight, torch.ones(graph.num_node, device=graph.device)])
        degree_in = graph.degree_in + 1
        degree_out = graph.degree_out + 1
        edge_weight = edge_weight / (degree_in[node_in] * degree_out[node_out]).sqrt()
        adjacency = utils.sparse_coo_tensor(torch.stack([node_in, node_out]), edge_weight,
                                            (graph.num_node, graph.num_node))
        update = torch.sparse.mm(adjacency.t(), input)
        if self.edge_linear:
            edge_input = graph.edge_feature.float()
            if self.edge_linear.in_features > self.edge_linear.out_features:
                edge_input = self.edge_linear(edge_input)
            edge_weight = edge_weight.unsqueeze(-1)
            edge_update = scatter_add(edge_input * edge_weight, graph.edge_list[:, 1], dim=0,
                                      dim_size=graph.num_node)
            if self.edge_linear.in_features <= self.edge_linear.out_features:
                edge_update = self.edge_linear(edge_update)
            update += edge_update