Perhaps this is because I come from a TensorFlow/Keras background and am not familiar with PyTorch, but the main entry point of the code for inference and for training is not clear to me. I've looked at EXAMPLE.ipynb, and the 9th cell has this code:
for batch in data_loader:
    adjacency_matrix, node_features, distance_matrix, y = batch
    batch_mask = torch.sum(torch.abs(node_features), dim=-1) != 0
    output = model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)
    . . .
What's supposed to go in the dots? For training: I wasn't able to find any obvious (to me) optimizer in the other scripts in the repo. For inference: should I pass the output to the to_predict method of the GraphTransformer class? Does to_predict return an Nx1 array of point estimates of (for example) logS solubility, where N is the number of molecules in the batch?
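To make the training question concrete, here is my rough guess at how the loop might be completed. This is only a sketch assuming a standard PyTorch setup; the Adam optimizer, the learning rate, and the MSE loss are my assumptions, not anything I found in the repo:

import torch
import torch.nn.functional as F

# model and data_loader as defined earlier in EXAMPLE.ipynb.
# Assumed optimizer and learning rate -- not taken from the repo.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for batch in data_loader:
    adjacency_matrix, node_features, distance_matrix, y = batch
    batch_mask = torch.sum(torch.abs(node_features), dim=-1) != 0
    output = model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)

    # Assumed regression loss (e.g. for logS); the repo may intend something else.
    loss = F.mse_loss(output.view(-1), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

For inference I would expect to wrap the forward pass in model.eval() and torch.no_grad(), but I don't know whether output is already the prediction or whether it still has to go through to_predict.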
For comparison, here is the kind of complete, single-call example I have in mind (taken from a different project):

main.Main(data=sol_data, # provided data (SMILES, property)
data_name=data_name, # dataset's name
data_units='', # property's SI units
bayopt_bounds=bounds, # bounds constraining the Bayesian search of neural architectures
k_fold_number = 10, # number of k-folds used for cross-validation
augmentation = True, # SMILES augmentation
outdir = "./data/", # directory for outputs (plots + .txt files)
bayopt_n_epochs = 10, # number of epochs for training during Bayesian search
bayopt_n_rounds = 25, # number of architectures to sample during Bayesian search
bayopt_on = True, # use Bayesian search
n_gpus = 1, # number of GPUs to be used
patience = 25, # number of epochs with no improvement after which training will be stopped
n_epochs = 100) # maximum number of epochs for training
Maybe that's a bit too formal, but when I pasted it into a separate notebook and ran it on an AWS GPU, it ran without issues. Is there a complete example like that for this repo?