
materialsvirtuallab / megnet


Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals

License: BSD 3-Clause "New" or "Revised" License

Python 16.17% Jupyter Notebook 81.03% Makefile 0.34% HTML 0.68% CSS 0.25% Batchfile 0.31% Shell 0.02% R 0.63% Procfile 0.01% JavaScript 0.58%
deep-learning graph-networks keras machine-learning materials-science tensorflow

megnet's People

Contributors

a-ws-m, chc273, dependabot-preview[bot], dependabot[bot], dgaines2, lauri-codes, pre-commit-ci[bot], shyuep, wardlt


megnet's Issues

Is it possible to predict many molecules at the same time?

Hi, I read your example and source code. They all predict the property of one structure at a time. Is it possible to predict many molecules at the same time?

Because, you know, there is a 'multiprocess' parameter in the keras model.predict function. It would help us a lot when we handle big data.

thanks
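A minimal sketch of one workaround, assuming a trained MEGNetModel named model and a list of pymatgen structures (predict_graph and graph_converter.convert are the calls that predict_structure chains together, per a traceback further down this page):

# Convert each structure to a graph once, then predict from the graphs.
# `model` and `structures` are assumed to exist; this avoids re-running
# the converter if you predict repeatedly.
graphs = [model.graph_converter.convert(s) for s in structures]
predictions = [model.predict_graph(g) for g in graphs]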

On the use of training molecule data for megnet

Dear Megnet Developer.

I'm using megnet and appreciate your development of it. I want to train the model on my own molecular data set. How should I process my data set, and in what format should the molecular data be?

Sincerely.

MEGNetLayer edge output is not differentiable

My model architecture is very simple and follows the code you provided in the 'Customized Graph Network Models' section. I only changed out[2] to out[1] in order to get the edge output. I get an error when I try to fit the model.

code:

# Imports added for completeness (the original snippet omitted them).
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K
from megnet.layers import MEGNetLayer

def keras_model():
    K.clear_session()
    n_atom_feature = 5
    n_bond_feature = 16
    n_global_feature = 1
    # Define model inputs
    int32 = 'int32'
    x1 = Input(shape=(None, n_atom_feature))    # atom feature placeholder
    x2 = Input(shape=(None, n_bond_feature))    # bond feature placeholder
    x3 = Input(shape=(None, n_global_feature))  # global feature placeholder
    x4 = Input(shape=(None,), dtype=int32)      # bond index1 placeholder
    x5 = Input(shape=(None,), dtype=int32)      # bond index2 placeholder
    x6 = Input(shape=(None,), dtype=int32)      # atom_ind placeholder
    x7 = Input(shape=(None,), dtype=int32)      # bond_ind placeholder
    xs = [x1, x2, x3, x4, x5, x6, x7]
    # Pass the inputs to the MEGNetLayer layer.
    # Here the list is the hidden units + the output unit;
    # you can have others like [n1] or [n1, n2, n3, ...] if you want.
    out = MEGNetLayer([32, 16], [32, 16], [32, 16], pool_method='mean', activation='relu')(xs)
    # The output is a tuple of new graphs V, E and u.
    # Since u is a per-structure quantity,
    # we can directly use it to predict a per-structure property.
    out = Dense(1)(out[1])
    # Set up the model and compile it!
    model = Model(inputs=xs, outputs=out)
    model.compile(loss='mae', optimizer='adam')
    return model

error:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

Thanks.
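For context, the edge output here is a per-bond tensor, so Dense(1) on out[1] yields one value per bond rather than per structure. A hedged sketch of pooling over the bond axis first (this only addresses the shape mismatch; it is not a confirmed fix for the gradient error):

from tensorflow.keras.layers import Lambda
import tensorflow.keras.backend as K

# Average the per-bond features into a single vector per graph before the
# final Dense readout. Sketch only; K.mean is differentiable, unlike the
# ops listed in the error message (argmax, round, eval).
edge_pooled = Lambda(lambda t: K.mean(t, axis=1, keepdims=True))(out[1])
out = Dense(1)(edge_pooled)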

Got nan in loss

@chc273

Hi, I created a training script for learning from qm9's xyz files.
The following directory reproduces the issue:
https://github.com/nd-02110114/megnet/tree/repro/repro

If the number of data points is small (100), this script works 👍
However, if the number of data points is large (1000+),
the script doesn't work... I get nan in the loss.

What should I do to avoid getting nan in the loss?
Are this script's hyperparameter settings bad...?

A sample log is below.

Epoch 1/10

 1/25 [>.............................] - ETA: 2:54 - loss: 0.0612
 2/25 [=>............................] - ETA: 1:31 - loss: 0.0318
 3/25 [==>...........................] - ETA: 1:03 - loss: 0.0257
 4/25 [===>..........................] - ETA: 49s - loss: 0.0241
 5/25 [=====>........................] - ETA: 40s - loss: 0.0212
 6/25 [======>.......................] - ETA: 33s - loss: 0.0182
 7/25 [=======>......................] - ETA: 29s - loss: 0.0159
 8/25 [========>.....................] - ETA: 25s - loss: 0.0145
 9/25 [=========>....................] - ETA: 22s - loss: nan
10/25 [===========>..................] - ETA: 20s - loss: nan
11/25 [============>.................] - ETA: 17s - loss: nan
12/25 [=============>................] - ETA: 15s - loss: nan
13/25 [==============>...............] - ETA: 14s - loss: nan
14/25 [===============>..............] - ETA: 12s - loss: nan
15/25 [=================>............] - ETA: 10s - loss: nan
16/25 [==================>...........] - ETA: 9s - loss: nan
17/25 [===================>..........] - ETA: 8s - loss: nan
18/25 [====================>.........] - ETA: 7s - loss: nan
19/25 [=====================>........] - ETA: 5s - loss: nan
20/25 [=======================>......] - ETA: 4s - loss: nan
21/25 [========================>.....] - ETA: 3s - loss: nan
22/25 [=========================>....] - ETA: 2s - loss: nan
23/25 [==========================>...] - ETA: 1s - loss: nan
24/25 [===========================>..] - ETA: 0s - loss: nan
25/25 [==============================] - 24s 952ms/step - loss: nan - val_loss: nan
I0819 07:17:08.106158 140691671836480 callbacks.py:272] Nan loss found!
I0819 07:17:09.059619 140691671836480 callbacks.py:295] No weights were loaded
I0819 07:17:09.905934 140691671836480 callbacks.py:275] Now lr is 0.0005000000237487257.
Epoch 2/10

 1/25 [>.............................] - ETA: 5:16 - loss: nan
 2/25 [=>............................] - ETA: 2:43 - loss: nan
 3/25 [==>...........................] - ETA: 1:52 - loss: nan
 4/25 [===>..........................] - ETA: 1:25 - loss: nan
 5/25 [=====>........................] - ETA: 1:09 - loss: nan
 6/25 [======>.......................] - ETA: 57s - loss: nan
 7/25 [=======>......................] - ETA: 48s - loss: nan
 8/25 [========>.....................] - ETA: 42s - loss: nan
 9/25 [=========>....................] - ETA: 37s - loss: nan
10/25 [===========>..................] - ETA: 33s - loss: nan
11/25 [============>.................] - ETA: 29s - loss: nan
12/25 [=============>................] - ETA: 25s - loss: nan
13/25 [==============>...............] - ETA: 22s - loss: nan
14/25 [===============>..............] - ETA: 19s - loss: nan
15/25 [=================>............] - ETA: 16s - loss: nan
16/25 [==================>...........] - ETA: 14s - loss: nan
17/25 [===================>..........] - ETA: 12s - loss: nan
18/25 [====================>.........] - ETA: 10s - loss: nan
19/25 [=====================>........] - ETA: 8s - loss: nan
20/25 [=======================>......] - ETA: 7s - loss: nan
21/25 [========================>.....] - ETA: 5s - loss: nan
22/25 [=========================>....] - ETA: 4s - loss: nan
23/25 [==========================>...] - ETA: 2s - loss: nan
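A sketch of one mitigation that recurs in this tracker, scaling the targets before training (the StandardScaler usage mirrors the 'loss is Nan' issue below; the import path is an assumption):

from megnet.utils.preprocessing import StandardScaler  # path assumed

# Scale extensive targets before training; unscaled extensive targets are a
# common source of NaN losses. `structures` and `targets` are the training data.
scaler = StandardScaler.from_training_data(structures, targets, is_intensive=False)
model.target_scaler = scaler
model.train(structures, targets, epochs=10)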

predict_structure triggering retracing

When trying to do predictions, I get a lot of warnings about function retracing. Is this a warning that should be ignored, or should I be doing something different to prevent this?

def predict(pred_model, structure):
    return pred_model.predict_structure(structure)
predict_verify = functools.partial(predict, model)
test_results = test_data.join(test_data['structure'].apply(predict_verify))

WARNING:tensorflow:5 out of the last 5 calls to <function Model.make_predict_function.<locals>.predict_function at 
0x7f273dab04c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to 
passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument 
shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization
/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

Python Files Missing in PyPI Distribution

I am having trouble installing megnet from PyPI, and it seems like there's an issue with the whl file provided via PyPI. It does not seem to contain any Python files:

[screenshot of the wheel contents omitted]

I am not sure how you are doing your publishing to PyPI, but there seems to be an issue in your deployment code.

Superconducting Critical temperature model

Great work so far! I have been thinking about creating a model predicting the superconducting critical temperature based on the crystal structure. So far, however, I am having trouble finding appropriate training data... Supercon's dataset is the best I have found (https://supercon.nims.go.jp/supercon/material_menu). Have you considered developing such a model yourself, and if so, what have been the main issues you faced? Thanks for all the work you have done.

loss is Nan

I use all features for training, but the loss is NaN from the beginning. What's the reason? Is there a solution?

structures = [mol_from_file(molpath + filename + '.mol', file_format='mol') for filename in filenames]
targets = df['rou(exp)'].values

# split data
train_structures = structures[:1500]
test_structures = structures[1500:]
train_targets = targets[:1500]
test_targets = targets[1500:]

# build model
mg = MolecularGraph(atom_features=['element', 'chirality', 'formal_charge', 'ring_sizes',
                                   'hybridization', 'donor', 'acceptor', 'aromatic'],
                    bond_features=['bond_type', 'same_ring', 'spatial_distance', 'graph_distance'],
                    distance_converter=GaussianDistance(np.linspace(0, 4, 20), 0.5),
                    known_elements=['C', 'H', 'N', 'O'])
model = MEGNetModel(27, 2, 32, nblocks=3, lr=1e-20, n1=64, n2=64, n3=64, embedding_dim=64, npass=3, ntarget=1, graph_converter=mg)

INTENSIVE = True  # U0 is an extensive quantity
scaler = StandardScaler.from_training_data(train_structures, train_targets, is_intensive=INTENSIVE)
model.target_scaler = scaler

model.train(train_structures, train_targets, batch_size=1, epochs=50, verbose=2,
            callbacks=[ReduceLRUponNan()], scrub_failed_structures=False, prev_model=None,
            lr_scaling_factor=0.5, patience=500)  # In reality, use epochs>1000

Specie references in megnet cause attribute errors when calling `predict_structure`

AttributeError                            Traceback (most recent call last)
<ipython-input-47-a3eadd274445> in <module>()
----> 1 df_test["megent_pred"] = model.predict_structure(df_test["final_structure"]).ravel()

3 frames
/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in predict_structure(self, structure)
    311             predicted target value
    312         """
--> 313         graph = self.graph_converter.convert(structure)
    314         return self.predict_graph(graph)
    315 

/usr/local/lib/python3.7/dist-packages/megnet/data/graph.py in convert(self, structure, state_attributes)
    243             state_attributes or getattr(structure, "state", None) or np.array([[0.0, 0.0]], dtype="float32")
    244         )
--> 245         atoms = self.get_atom_features(structure)
    246         index1, index2, _, bonds = get_graphs_within_cutoff(structure, self.nn_strategy.cutoff)
    247 

/usr/local/lib/python3.7/dist-packages/megnet/data/graph.py in get_atom_features(structure)
    122             List of atomic numbers
    123         """
--> 124         return np.array([i.specie.Z for i in structure], dtype="int32").tolist()
    125 
    126     def __call__(self, structure: Structure) -> Dict:

/usr/local/lib/python3.7/dist-packages/megnet/data/graph.py in <listcomp>(.0)
    122             List of atomic numbers
    123         """
--> 124         return np.array([i.specie.Z for i in structure], dtype="int32").tolist()
    125 
    126     def __call__(self, structure: Structure) -> Dict:

AttributeError: 'Structure' object has no attribute 'specie'

Using PyPI releases:

pymatgen                      2022.0.10
megnet                        1.2.8

@shyuep
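A hedged sketch of a workaround, assuming the failures come from disordered structures (whose pymatgen sites expose species rather than specie):

# Keep only ordered structures before predicting; pymatgen's
# Structure.is_ordered is True when every site has a single species.
ordered = df_test[df_test["final_structure"].map(lambda s: s.is_ordered)]
preds = [model.predict_structure(s).ravel() for s in ordered["final_structure"]]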

Atomic embeddings for QM9 models

When reading atomic embeddings from QM9 models (MEGNet-simple) from files, for example:

model_h = MEGNetModel.from_file('../mvl_models/qm9-2018.6.1/H.hdf5')
embedding_layer = [i for i in model_h.layers if i.name.startswith('embedding')][0]
embedding = embedding_layer.get_weights()[0]
print('Embedding matrix dimension is ', embedding.shape)

One obtains the following:
Embedding matrix dimension is (9, 16)

How is this matrix embedding atoms H,C,N,O,F if the atomic number goes up to 9 (F)?

Training with save_checkpoint=True disables validation metrics logging

I've recently been training some models with MEGNet and trying to use TensorBoard to track the model metrics. At first I was very confused as to why I wasn't seeing the validation metrics in the output -- the MEGNet ModelCheckpointMAE callback was reporting improvements to the val_mae as expected, so I knew that I'd passed the validation correctly. I did some digging and found this. I understand the logic, but I don't think hiding the validation data from Keras should be default behaviour because it prevents other callbacks that track validation metrics from working as expected.

I also checked the code for the ModelCheckpointMAE callback and I noticed that the validation MAE is manually computed.
The logs argument to on_epoch_end already includes pre-computed metrics, as long as the model was compiled with those metrics. You can see in the TensorBoard callback code that it simply pulls the pre-computed validation metrics from this parameter. So it may be more efficient to ensure that the model is compiled with the mae metric by default and then pull its value from logs; this would also resolve the issue of validation metrics being computed twice.
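A minimal sketch of the approach described above, reading the pre-computed metric from logs in a custom callback (the names here are illustrative; it assumes the model was compiled with metrics=["mae"] and given validation data):

from tensorflow.keras.callbacks import Callback

class ValMAELogger(Callback):
    # Read the pre-computed validation MAE from `logs` instead of
    # recomputing it; Keras fills `logs` with compiled metrics.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        val_mae = logs.get("val_mae")
        if val_mae is not None:
            print(f"epoch {epoch}: val_mae={val_mae:.4f}")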

Documentation, functionality help

Hi, not sure where to ask this. I am interested in using this code to predict the binding energy of a system. Is this possible? If this functionality needs to be developed, which source files would need editing?

Also, is there more comprehensive documentation for megnet (like pymatgen's)?

Wrong version of libprotobuf

I'm hitting this error:

[libprotobuf FATAL external/protobuf_archive/src/google/protobuf/stubs/common.cc:68] This program requires version 3.7.0 of the Protocol Buffer runtime library, but the installed version is 3.6.1.  Please update your library.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "google/protobuf/any.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  This program requires version 3.7.0 of the Protocol Buffer runtime library, but the installed version is 3.6.1.  Please update your library.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "google/protobuf/any.pb.cc".)

...which is weird because pip and conda report having 3.7.0 installed. Any idea why it's importing the wrong version?

predict_structure only producing first target when ntargets=2

Hello,
I was wondering how to get the other predictions when ntargets is greater than 1. When I do model.predict_structure(structure), I seem to get an array containing only the first prediction. Do I need to set a different target_scaler to be able to do this? If so, what should it be?

Training with global state

I'm impressed with your work and am trying to train temperature-dependent data.
When a model is generated via MEGNetModel, the 'nfeat_global' and 'global_embedding_dim' options can customize it.

But when I tried to train the model, it seemed that
only structures (input) and targets (output) were needed, with no global states.

How can I train the model with the global states as input?
(Or how can I combine the global states with the structures in the feed-forward?)
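A sketch based on the converter source quoted elsewhere on this page (graph.py falls back to getattr(structure, "state", None) when building a graph), so one apparent route is attaching the state to each structure before training; whether this is the intended API is an assumption:

import numpy as np

# Attach a per-structure global state (e.g. temperature) that the graph
# converter will pick up; the second component is a padding value.
for structure, temperature in zip(structures, temperatures):
    structure.state = np.array([[temperature, 0.0]], dtype="float32")
model.train(structures, targets, epochs=100)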

Suggestion for improvement in default parameters

Hello,

I'm raising this issue in the hope of improving megnet's default behavior. I have run into a very reproducible training failure for a binary classification task (e.g. MEGNetModel(..., is_classification=True)). The default training behavior often leads to predicting NaN values due to exploding gradients. To make training more robust, the clipnorm parameter (typically between 1 and 5) can be set for the optimizer (Adam), which clips the gradient. This could be added to opt_params in the init of the MEGNetModel class for the binary classification case to make the default behavior more robust. I'm not sure if improving the default behavior is of interest to you, but I hope this helps!
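A sketch of that suggestion, assuming MEGNetModel forwards optimizer keyword arguments through the opt_params dict mentioned above:

# Clip gradient norms to stabilize binary-classification training.
# `graph_converter` is assumed to be defined as in the other examples here.
model = MEGNetModel(100, 2, graph_converter=graph_converter,
                    is_classification=True,
                    opt_params={"clipnorm": 2.0})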

Another issue I noted was that the callback class ReduceLRUponNan led to an error for me, raising "FailedPreconditionError: Attempting to use uninitialized value." I did not have a chance to address this issue since clipnorm solved my problem, but it seems the model parameters are not initialized after loading weights.

best regards,
Geun Ho Gu

TypeError while training a new MEGNetModel from structures

I was trying to run a new model as suggested by you:

from megnet.models import MEGNetModel
from megnet.data.crystal import CrystalGraph
import numpy as np
from pymatgen.core.structure import Structure

nfeat_bond = 100
r_cutoff = 5
gaussian_centers = np.linspace(0, r_cutoff + 1, nfeat_bond)
gaussian_width = 0.5
graph_converter = CrystalGraph(cutoff=r_cutoff)
model = MEGNetModel(graph_converter=graph_converter, centers=gaussian_centers, width=gaussian_width)

But got the following error:
TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (Dimension(96), 64). Consider casting elements to a supported type.

Please help

Data set process

Dear Megnet developers:
I want to redo your work of training the network. So, may I have the code for processing the datasets "mp.2018.6.1.json" and "QM9" that you provide in the paper?
Thanks.

Hardcoded reading of json as opposed to hdf5

In the README, it says you can do model = MEGNetModel.from_file('mvl_models/qm9-2018.6.1/HOMO.hdf5'). However, if I'm understanding it correctly, the from_file function here is hard-coded to read in a .json (line 339). Am I understanding this correctly?

What to do with negative values of Bulk Modulus and Shear Modulus?

Hi
When we set up the model for properties like bulk modulus and shear modulus, we take into account the log10(GPa) values. For some crystals, the values of bulk and shear moduli come out to be negative. The question is how do we take the logarithm? Do we take the absolute values before calculating the logarithm? Or do we take only those crystals which have positive values for these properties?
I will be grateful to you if you can answer.
Thanks
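One common approach, sketched under the assumption that non-positive moduli are unphysical artifacts rather than usable targets, is to drop them before taking the logarithm:

import numpy as np

# `k_values` is a hypothetical array of bulk moduli in GPa.
k_values = np.asarray(k_values)
mask = k_values > 0                 # discard non-positive moduli
targets = np.log10(k_values[mask])  # log10(GPa) targets, as in the paper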

Batch training error and NaN loss from the first epoch

Hi, I'm a beginner with deep learning.

I'm trying to use MEGNetModel (and my own model using MEGNet layers) for materials property prediction.

When I tried to train the model, I ran into some problems.
First, I get an error message when using MolecularBatchGenerator:

ValueError: Error when checking target: expected dense_8 to have shape (None, 1) but got array with shape (1, 4)

I thought this error was due to MolecularBatchGenerator, so I made my own graph batch generator (a generator with transposed targets).

def __getitem__(self, index: int) -> tuple:
        # Get the indices for this batch
        batch_index = self.mol_index[index * self.batch_size:(index + 1) * self.batch_size]

        # Get the inputs for each batch
        inputs = self._generate_inputs(batch_index)

        # Make the graph data
        inputs = self._combine_graph_data(*inputs)

        # Return the batch
        if self.targets is None:
            return inputs
        else:
            # get targets
            it = itemgetter(*batch_index)
            target_temp = itemgetter_list(self.targets, batch_index)
            target_temp = np.atleast_2d(target_temp).transpose()

            return inputs, expand_1st(target_temp)

(Everything else is the same except target_temp = np.atleast_2d(target_temp).transpose().)

After replacing MolecularBatchGenerator with this, I could start training.

However, when I start training the model, I get a nan loss from the first epoch, which is totally frustrating... actually, I get nan values for all metrics. At the start of training, their values are real numbers, but after several batches the model gives nan metrics.

I have heard of several methods for solving nan problems in deep learning and have tried most of them: output scaling, changing the loss function, using another optimizer, reducing (or increasing) the batch size, gradient clipping, and changing the model structure.

Do you have any other insights for solving the nan loss problem??
I've spent several days on this problem.. TT

On the error for the prediction

Dear megnet developers,

I'm interested in the crystal graph network and examined machine learning of elastic moduli with the help of the sample code on the megnet GitHub page.

I used the elastic moduli data from
https://figshare.com/articles/Graphs_of_materials_project/7451351

However, I get an error from the prediction of a new structure which has only 1 atom in the crystal cell:

InvalidArgumentError Traceback (most recent call last)
InvalidArgumentError: segment_ids should be the same size as dimension 0 of input.

The error seems to be a tensorflow error about the input structure for prediction, but
I simply passed a pymatgen Structure object made from a cif file.

I found the error always occurred when predicting a 1-atom crystal cell like "Si" with the trained crystal graph model:

model.predict_structure(st)

where st is a pymatgen Structure object for a crystal with only 1 atom in the unit cell.

If I can get a reply and a suggestion for the error, I'd be very grateful.

My python code is as follows:

import json
import gzip
import numpy as np
from megnet.data.mp import index_rep_from_structure, to_list, graph_to_inputs
import os
from pymatgen.core import Structure
import pandas as pd
from matplotlib import pyplot as plt

data_json_file = "../megnet/data/tests/mp_elas.json"
with open(data_json_file, 'r') as f:
    data = pd.read_json(f)

target_cols = ['K']

is_null = data[target_cols].apply(pd.isnull, axis=1)
rows_to_drop = data[is_null.values].index

print('target null number =', len(rows_to_drop))

data = data.drop(rows_to_drop, axis=0)
data = data.dropna(axis=0)

from pymatgen.io.cif import CifParser

for target_col in target_cols:
    plt.title(target_col)
    plt.hist(data[target_col], bins=50)
    plt.show()

structures = []
for i in range(len(data.index)):
    cif = CifParser.from_string(data.iloc[i]['structure'])
    st = cif.get_structures()[0]
    structures.append(st)
    if len(st.sites) == 1:
        print('sites is 1')
        graph = data.iloc[i]['graph']
        print(graph)
        name = st.formula
        print(name)

structures = pd.Series(structures)

from megnet.model import megnet_model
from megnet.data.graph import GaussianDistance
from megnet.data.crystal import CrystalGraph, structure2graph
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')

n_bond_feature = 10
n_global_feature = 2
gaussian_centers = np.linspace(0, 5, 10)
gaussian_width = 0.5

graph_convertor = CrystalGraph()
distance_convertor = GaussianDistance(gaussian_centers, gaussian_width)

targets = data[target_cols[0]]

kf = KFold(n_splits=5, shuffle=True)

X = structures
Y = targets

for train_index, test_index in kf.split(X, Y):
    print('train index =', train_index, 'test_index=', test_index)
    X_tra, X_test = X.iloc[train_index].tolist(), X.iloc[test_index].tolist()
    y_tra, y_test = Y.iloc[train_index].tolist(), Y.iloc[test_index].tolist()

model = megnet_model(n_bond_feature, n_global_feature,
                     graph_convertor=graph_convertor, distance_convertor=distance_convertor)
hist = model.train(X_tra, y_tra, epochs=20)

model.summary()

plt.rcParams['font.size'] = 20
plt.figure(figsize=(8, 6))
plt.plot(model.history.history['loss'], 'o-r', label='Train Loss')
plt.xlabel('Epoch')
plt.ylabel("Loss (a.u.)")
plt.legend(frameon=False)

pred_test = []
for i, st in enumerate(X_test):
    st_pre = model.predict_structure(st)
    pred_test.append(st_pre[0])
    print('ypred =', st_pre[0], 'ans=', y_test[i])

mse = mean_squared_error(y_test, pred_test)
print("KERAS REG TEST RMSE : %.2f" % (mse ** 0.5))

Model saving issues in JSON file format

Hi,

I am trying to save a MEGNet model via the train method (defined in the GraphModel class) as illustrated in the sample usage code. If I understand correctly, the train method calls the custom ModelCheckpointMAE, which then saves Keras hdf5 files in the callback directory by default.

I can see hdf5 files being generated as training proceeds, but I do not see any json files being created alongside them. It seems I need both the hdf5 and json files to load a trained MEGNet model via the from_file method, as per the sample notebook save_and_load_model.ipynb.

I debugged some codes, and it seems that ModelCheckpointMAE actually saves the model through standard tf.keras.Model.save().

self.model.save(filepath, overwrite=True)

Aren't we supposed to use the save_model method defined in the GraphModel class to save a model in json and hdf5 format?
I have been trying to work around the issue by changing self.model.save() to self.model.save_model(), but got AttributeError: 'Functional' object has no attribute 'save_model'.

I think tf.keras.callbacks can only access the Functional model, not encapsulating models like MEGNet. Can someone help me with saving and loading a MEGNet model during training? Thank you.
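A sketch of a manual workaround between or after training runs, calling the GraphModel method directly (the companion-file name follows from from_file reading filename + '.json', as shown in another traceback on this page):

# Train, then explicitly save both the weights (.hdf5) and the
# graph-converter config (.hdf5.json) via GraphModel.save_model.
model.train(train_structures, train_targets, epochs=10)
model.save_model("my_megnet_model.hdf5")
reloaded = MEGNetModel.from_file("my_megnet_model.hdf5")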

early stopping seems not to work?

I'm using megnet for a classification problem and set patience to 3. The training messages are below:

Epoch 10/10000
16/16 [==============================] - 341s 21s/step - loss: 0.3728 - val_loss: 0.4102
Epoch 11/10000
16/16 [==============================] - 343s 21s/step - loss: 0.3762 - val_loss: 0.3696
Epoch 12/10000
16/16 [==============================] - 344s 22s/step - loss: 0.3728 - val_loss: 0.3583
Epoch 13/10000
16/16 [==============================] - 342s 21s/step - loss: 0.3702 - val_loss: 0.3748
INFO:megnet.callbacks:val_acc does not improve after 3, stopping the fitting...
Epoch 14/10000
16/16 [==============================] - 339s 21s/step - loss: 0.3723 - val_loss: 0.3999
INFO:megnet.callbacks:val_acc does not improve after 3, stopping the fitting...
Epoch 15/10000
16/16 [==============================] - 344s 21s/step - loss: 0.3748 - val_loss: 0.3903
INFO:megnet.callbacks:val_acc does not improve after 3, stopping the fitting...
Epoch 16/10000
16/16 [==============================] - 339s 21s/step - loss: 0.3739 - val_loss: 0.3549
INFO:megnet.callbacks:val_acc does not improve after 3, stopping the fitting...
Epoch 17/10000

So the stopping didn't stop?
thanks
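For comparison, a sketch using the stock Keras callback instead of megnet's built-in patience (it assumes model.train forwards callbacks to Keras fit, as the log10K example further down this page does):

from tensorflow.keras.callbacks import EarlyStopping

# Stop as soon as val_loss fails to improve for 3 epochs.
model.train(train_structures, train_targets,
            validation_structures, validation_targets,
            epochs=10000,
            callbacks=[EarlyStopping(monitor="val_loss", patience=3)])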

On the use of training data for megnet

Dear Megnet Developer.

I'm using megnet and appreciate your development of it.

Now, I use the pre-trained model of logK (/mp-2018.6.1/log10K.hdf5),

but since I updated megnet (and pymatgen) via pip, the error below has occurred.
(With old versions of megnet and pymatgen, it worked.)

Maybe some parts of my environment are old and that causes the error, but I'd
be very happy to get some advice.

Sincerely.

Here is my code:
##################################
from megnet.models import MEGNetModel
from megnet.data.graph import GaussianDistance
from megnet.data.crystal import CrystalGraph
from sklearn.model_selection import train_test_split

model_pretrained = MEGNetModel.from_file("../megnet/mvl_models/mp-2018.6.1/log10K.hdf5")
model_pretrained.model.summary()
########################################

and the error:


TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
      5
      6
----> 7 model_pretrained = MEGNetModel.from_file("../megnet/mvl_models/mp-2018.6.1/log10K.hdf5")
      8 model_pretrained.model.summary()

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/megnet/models.py in from_file(cls, filename)
    299             GraphModel
    300         """
--> 301         configs = loadfn(filename + '.json')
    302         from keras.models import load_model
    303         from megnet.layers import _CUSTOM_OBJECTS

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/monty/serialization.py in loadfn(fn, *args, **kwargs)
     81         if "cls" not in kwargs:
     82             kwargs["cls"] = MontyDecoder
---> 83         return json.load(fp, *args, **kwargs)
     84
     85

~/.conda/envs/mukimi_okuno/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
--> 296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    297
    298

~/.conda/envs/mukimi_okuno/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    359     if parse_constant is not None:
    360         kw['parse_constant'] = parse_constant
--> 361     return cls(**kw).decode(s)

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/monty/json.py in decode(self, s)
    254     def decode(self, s):
    255         d = json.JSONDecoder.decode(self, s)
--> 256         return self.process_decoded(d)
    257
    258

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/monty/json.py in process_decoded(self, d)
    246
    247             return {self.process_decoded(k): self.process_decoded(v)
--> 248                     for k, v in d.items()}
    249         elif isinstance(d, list):
    250             return [self.process_decoded(x) for x in d]

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/monty/json.py in <dictcomp>(.0)
    246
    247             return {self.process_decoded(k): self.process_decoded(v)
--> 248                     for k, v in d.items()}
    249         elif isinstance(d, list):
    250         return [self.process_decoded(x) for x in d]

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/monty/json.py in process_decoded(self, d)
    236                             if not k.startswith("@")}
    237                 if hasattr(cls_, "from_dict"):
--> 238                     return cls_.from_dict(data)
    239                 elif np is not None and modname == "numpy" and classname == "array":
    240

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/megnet/data/graph.py in from_dict(cls, d)
    170             d.update({'nn_strategy': nn_strategy_obj})
    171             return super().from_dict(d)
--> 172         return super().from_dict(d)
    173
    174

~/.conda/envs/mukimi_okuno/lib/python3.7/site-packages/monty/json.py in from_dict(cls, d)
    124         decoded = {k: MontyDecoder().process_decoded(v) for k, v in d.items()
    125                    if not k.startswith("@")}
--> 126         return cls(**decoded)
    127
    128     def to_json(self):

TypeError: __init__() got an unexpected keyword argument 'atom_convertor'

the unit of Gibbs free energy is kJ/mol or eV?

Hi,
I saw the error for the Gibbs free energy is 0.012 eV,
but the qm9 pretrained notebook lists the values below:
mu -0.008 0.000
alpha 13.127 13.210
HOMO -10.557 -10.550
LUMO 3.241 3.186
gap 13.622 13.736
R2 35.975 35.364
ZPVE 1.215 1.218
U0 -17.166 -17.172
U -17.353 -17.286
H -17.420 -17.389
G -16.107 -16.152
Cv 6.427 6.469
omega1 3151.626 3151.708

So is the unit of the Gibbs free energy here eV or kJ/mol?

thanks

how many GPUs did you use for training?

Really great work, but...

“In our work, we use dedicated GPU resources to train MEGNet models with 100,000 crystals/molecules. It is recommended that you do the same.”

I found a Tesla P100 12G can't handle the qm9 training...
and if I use CPU mode, 128G of memory is not enough.

So I wonder how many GPUs I need to train a model?

thanks.

How to make reference log10K.hdf5 file

Dear Megnet developers

I enjoyed megnet and appreciate the release of its source code.
I tried to train the log of the elastic modulus (log10K) with megnet using
almost default parameter values; the data (mp_elas.json) is divided
80:10:10 (train, validation, test) and I verified the model performance on the test data.
The resulting test MAE (log10 K) is 0.097, which is less accurate than the
reference paper (Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals).

My python script is as follows:

nfeat_bond = 10
nfeat_global = 2  # number of state degrees of freedom
gaussian_centers = np.linspace(0, 5, 10)
gaussian_width = 0.5
distance_convertor = GaussianDistance(gaussian_centers, gaussian_width)
graph_convertor = CrystalGraph(bond_convertor=distance_convertor)

targets = data[target_cols[0]]
X = structures  # mp_elas structure data
Y = targets     # mp_elas target elastic values

# split data into 80:10:10 (train, valid, test)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=1)
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=(10.0 / 90.0), random_state=1)

callback_list = [ManualStop(),
                 ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10),
                 ModelCheckpoint(filepath='./megnet_mp_elas2.h5', monitor='val_loss', save_best_only=True)]
megmodel = MEGNetModel(nfeat_bond, nfeat_global, graph_convertor=graph_convertor)
megmodel.model.summary()

# batch_size 24 (default); we stop at epoch 30
megmodel.train(X_train, Y_train, X_val, Y_val, batch_size=24, epochs=30, callbacks=callback_list)

I also found that the released model (log10K.hdf5) has a first bond-attribute dense layer
(dense_3 on input_2[0][0], before the first MEGNetLayer) with 6464 parameters, which differs from
my default setup:

# my layer structure (default use of MEGNetModel)
dense_3 (Dense) (None, None, 64) 704 input_2[0][0]

# log10K.hdf5
dense_3 (Dense) (None, None, 64) 6464 input_2[0][0]

What causes this difference in the dense_3 layer of MEGNetModel?

I'm a beginner in deep learning and might have made a simple mistake.
If I can get a reply or the python script that produced the released log10K.hdf5, I'd be very happy.

Sincerely,

Yukihiro Okuno.

how to set learning rate decay?

Hi, is it possible to set a learning rate decay in the model? If not, do you have a better way to implement the decay besides looping {train 1 epoch and save the model -> reset the lr}... thank you.
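A sketch using a standard Keras callback, assuming model.train forwards callbacks to Keras fit as other examples on this page do:

from tensorflow.keras.callbacks import LearningRateScheduler

def decay(epoch, lr):
    # Multiply the learning rate by 0.98 each epoch (arbitrary choice).
    return lr * 0.98

model.train(structures, targets, epochs=100,
            callbacks=[LearningRateScheduler(decay)])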

Error: converter or convertor and include_site

When I run the test_models.py script, I get "TypeError: __init__() got an unexpected keyword argument 'bond_converter'", which can be fixed by changing "converter" to "convertor".

However, then I get another error further down, namely: RuntimeError: get_all_neighbors() got an unexpected keyword argument 'include_site'. The same happens by the way when I run the crystal_example notebook.

Thanks for your help.

ReduceLRUponNan Callback causes tensorflow.python.framework.errors_impl.FailedPreconditionError

When training some models with an initially high learning rate, the ReduceLRUponNan kicks in and seems to properly load weights, but then fails next epoch due to the error above.
There is a minimal working example python script below and I have attached a conda environment file for reproducing the error, as well as the script's output when I run it on my machine.
I have a working solution to this in a fork, I will submit a PR after posting. It seems to arise from recompiling the model during training.

"""MWE showing learning rate reduction method instability."""
import numpy as np
import tensorflow as tf
from matminer.datasets import load_dataset
from megnet.data.crystal import CrystalGraph
from megnet.models import MEGNetModel
from sklearn.model_selection import train_test_split

RANDOM_SEED = 2021


def get_default_megnet_args(
    nfeat_bond: int = 10, r_cutoff: float = 5.0, gaussian_width: float = 0.5
) -> dict:
    gaussian_centers = np.linspace(0, r_cutoff + 1, nfeat_bond)
    graph_converter = CrystalGraph(cutoff=r_cutoff)
    return {
        "graph_converter": graph_converter,
        "centers": gaussian_centers,
        "width": gaussian_width,
    }


if __name__ == "__main__":
    # For reproducibility
    tf.random.set_seed(RANDOM_SEED)

    data = load_dataset("matbench_jdft2d")

    train, test = train_test_split(data, random_state=RANDOM_SEED)

    meg_model = MEGNetModel(**get_default_megnet_args(), lr=1e-2)

    meg_model.train(
        train["structure"],
        train["exfoliation_en"],
        test["structure"],
        test["exfoliation_en"],
        epochs=8,
        verbose=2,
    )

load_model segmentation fault

The installed packages all meet the version requirements, but an error is reported when "load_model" is called: a segmentation fault, which seems to be related to being forced to use the GPU.

Error in transfer_learning notebook

In notebooks/transfer_learning.ipynb, there is a line reading

embedding_layer_index = [i for i, j in enumerate(model.layers) if j.name.startswith('embedding')][0]

This throws an error of IndexError: list index out of range. I believe the line should be rewritten to be

embedding_layer_index = [i for i, j in enumerate(model.layers) if j.name.startswith('atom_embedding')][0]

Preprocessed QM9 dataset

Is it possible to request the preprocessed QM9 dataset (similar to the example provided)? When I read the official QM9 dataset with Pybel, the MolecularGraph converter reports quite a lot of errors (from rdkit). I have been struggling to provide the right QM9 dataset to run the model.

Thanks!

Segmentation fault issue possibly due to Keras module in TF2.0

I built off of your MEGNet implementation to create a node classification model. I've been getting occasional Segmentation Faults while running hyperparameter tuning on my model. It does not happen consistently and the stack traces are not the same, though they seem to be triggered from the call to fit_generator in train_from_graphs.

Here are stack traces from a few of these seg faults: [stack-trace screenshots omitted]

I think this could be an issue with Keras being imported as a standalone module, while there is a tf.keras module in Tensorflow 2.0. When examining the stack trace, I see a mix of both modules: python3.6/site-packages/tensorflow_core/python/keras/backend.py and python3.6/site-packages/keras/engine/training.py.

Is there a reason that you are still using the standalone Keras? I'm attempting to switch to tf.keras to eliminate any versioning issues, and so that I can build the MEGNet model dynamically and use Tensorflow 2.0's eager computation to more easily debug. However, this has turned out to not be trivial.

Furthermore, the following fatal error has also come up: [screenshot omitted]

I'm unsure if this is related to the Keras/tf.keras issue or a separate bug. It seems odd that the segment ids would not be increasing, as the bond indices are ordered when the graph is built.

I would appreciate any insight!
-Nicole

On the use of training molecule data for megnet

Excuse me. After I imported my data, I got the error below. Is it caused by tensorflow? Which versions of Python and tensorflow should I install? At present, I use Python 3.7 and tensorflow 2.0.0.
If you can help me solve the problem, I will be very grateful, and I look forward to your reply!

Sincerely.

Here are my code and error:
code:

from pymatgen import Molecule
import pandas as pd
from megnet.data.molecule import MolecularGraph
from megnet.models import MEGNetModel

mydata = pd.read_json('/home/chenjie/mytest.json')
structures = []
filenames = pd.read_excel('/home/chenjie/mydata/train_label.xlsx')['newfilename'].values
for filename in filenames:
    mol = Molecule.from_file('/home/chenjie/mydata/xyzfile/' + filename)
    structures.append(mol)
targets = mydata['Density'].tolist()

print(targets)

model = MEGNetModel(27, 2, 27, nblocks=1, lr=1e-2,
                    n1=4, n2=4, n3=4, npass=1, ntarget=1,
                    graph_converter=MolecularGraph())
model.train(structures, targets, epochs=1000, verbose=1)

error:
Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/chenjie/PycharmProjects/megnet-master/data-pro.py", line 4, in <module>
    from megnet.data.molecule import MolecularGraph
  File "/home/chenjie/PycharmProjects/megnet-master/megnet/data/molecule.py", line 18, in <module>
    from megnet.data.qm9 import ring_to_vector
  File "/home/chenjie/PycharmProjects/megnet-master/megnet/data/qm9.py", line 15, in <module>
    from megnet.data.graph import GaussianDistance
  File "/home/chenjie/PycharmProjects/megnet-master/megnet/data/graph.py", line 14, in <module>
    from keras.utils import Sequence
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/keras/utils/__init__.py", line 6, in <module>
    from . import conv_utils
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/keras/utils/conv_utils.py", line 9, in <module>
    from .. import backend as K
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/keras/backend/__init__.py", line 1, in <module>
    from .load_backend import epsilon
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/keras/backend/load_backend.py", line 90, in <module>
    from .tensorflow_backend import *
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 5, in <module>
    import tensorflow as tf
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow/__init__.py", line 98, in <module>
    from tensorflow_core import *
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow_core/__init__.py", line 40, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 959, in _find_and_load_unlocked
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow_core/python/__init__.py", line 63, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow_core/python/framework/framework_lib.py", line 30, in <module>
    from tensorflow.python.framework.sparse_tensor import SparseTensor
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow_core/python/framework/sparse_tensor.py", line 34, in <module>
    from tensorflow.python.ops import gen_sparse_ops
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_sparse_ops.py", line 4824, in <module>
    _op_def_lib = _InitOpDefLibrary(b"\n\253\001\n\031AddManySparseToTensorsMap...")  [long serialized op-def byte string omitted]
  File "/home/chenjie/anaconda3/envs/megnet-master/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_sparse_ops.py", line 3384, in _InitOpDefLibrary
    op_list.ParseFromString(op_list_proto_bytes)
google.protobuf.message.DecodeError: Error parsing message

Process finished with exit code 1

Example with MolecularGraphBatchGenerator?

This looks like a very convenient way to go from SMILES strings to feature vectors that MEGNet can use, but I'm not sure how to use it. I have a very simple script:

import pandas as pd
import tensorflow as tf
from megnet.data.molecule import MolecularGraphBatchGenerator
from megnet.models import MEGNetModel
from math import ceil

tf.random.set_random_seed(42)

df = pd.read_pickle('dataset/uvvis_train.pickle')
df = df[['Structure', 'mean_rj']]

batch_size = 100

train_generator = MolecularGraphBatchGenerator(
    mols=df['Structure'],
    targets=df['mean_rj'].values,
    molecule_format='smiles',
    batch_size=batch_size,
)

model = MEGNetModel(10, 2, nblocks=1, lr=1e-2,
                    n1=4, n2=4, n3=4, npass=1, ntarget=1)

n_steps_per_epoch = ceil(len(df) / batch_size)

model.fit_generator(train_generator, steps_per_epoch=n_steps_per_epoch, verbose=1, epochs=10)

However, this fails with the following error:

Traceback (most recent call last):
  File "/home/erictaw/PycharmProjects/experimental_uv_vis/models/megnet_test/megnet_test.py", line 26, in <module>
    model.fit_generator(train_generator, steps_per_epoch=n_steps_per_epoch, verbose=1, epochs=10)
  File "/home/erictaw/miniconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/erictaw/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/erictaw/miniconda3/lib/python3.6/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/home/erictaw/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1211, in train_on_batch
    class_weight=class_weight)
  File "/home/erictaw/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/home/erictaw/miniconda3/lib/python3.6/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (1, 5626, 44)

How do you use this generator, and why is there a dimension error?

Refactoring suggestions

Here are some changes I suggest for the organization:

  1. Instead of having the highly specific QM9 and MP packages, I suggest you build an interface which assembles the input based on a list of pymatgen Structures/Molecules and the target properties they wish to learn. This will allow people to use the MEGNet from conceivably any set of structures they wish to use.
  2. You can retain the data.mp and data.qm9 as specific examples of how to leverage on the classes in part 1 to build a model.

Tensorflow version compability

I had been successfully using your great package for months, but now it seems a dependency is off, and I can't seem to find how to get back on track.

I followed your molecule example notebook as inspiration, and as stated, at some point this worked. Because it happened so suddenly, I'm not sure if others are seeing similar dependency mismatches.

The meat of the code:

model = MEGNetModel(graph_converter=MolecularGraph(), centers=gaussian_centers, width=gaussian_width,
                    nfeat_node=27, nfeat_edge=27, nfeat_global=len(state_attributes[0]))

I use graph_converter=MolecularGraph() to convert structures into graphs and then...

model.train_from_graphs(train_graphs=graph_train, train_targets=target_train,
                        validation_graphs=graph_validation, validation_targets=target_validation)

The training gets through 3 epochs and then errors out. The errors are listed below by Python and TensorFlow version. Other warnings occur, but they do not stop the code from running.

Using python 3.8:
tf 2.4.1 TypeError: 'NoneType' object is not callable
tf 2.2.0 Error while reading resource variable Adam/beta_2_17667 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/Adam/beta_2_17667/N10tensorflow3VarE does not exist.
[[node Adam/Cast_3/ReadVariableOp (defined at /home/bransom/Programs/anaconda3/envs/tf-8/lib/python3.8/site-packages/megnet/models/base.py:222) ]] [Op:__inference_train_function_29030]

Function call stack:
train_function

Using python3.7
with tensorflow 1.x
TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (Dimension(96), 64). Consider casting elements to a supported type.
Apparently this is just a problem with tf 1.x

with tensorflow 2.x
Error while reading resource variable Adam/iter_124774 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/Adam/iter_124774/N10tensorflow3VarE does not exist.
[[node Adam/ReadVariableOp (defined at /home/bransom/Programs/anaconda3/envs/tf-2/lib/python3.7/site-packages/megnet/models/base.py:230) ]] [Op:__inference_distributed_function_134432]

Function call stack:
distributed_function

I have read online that this could possibly be fixed using tensorflow.Session, but I'm not sure where to incorporate that into the megnet base, since I'm using the prebuilt models. I have also read that you can import Adam from tf.keras.optimizers, which I have tried at various points in the megnet code without success.

Using Python 3.6:
with tensorflow 2.x, many modules appear to have been renamed within tensorflow, which causes many import errors

Any insight would be great, I've been at this for weeks.
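In case it is useful, the first sanity check I run is to print the installed versions and compare them against the versions pinned by the megnet release in use (a minimal sketch; I am assuming the installed megnet release exposes __version__, which recent ones do):

import tensorflow as tf
import megnet

# Compare these against megnet's setup.py / release notes; my understanding
# from the changelog is that megnet 1.x targets tf.keras on TensorFlow 2,
# while older 0.x releases used standalone Keras.
print("tensorflow:", tf.__version__)
print("megnet:", megnet.__version__)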

Question about notebooks

Hello, I have some questions:

  1. I tried to train my own data with the latest 'qm9_simple_model' uploaded in notebooks. It works! But the program trains molecular data, so why does it call crystal-related modules? How are they related?
    gc = CrystalGraph(bond_converter=GaussianDistance(np.linspace(0, 5, 100), 0.5), cutoff=4)
    model = MEGNetModel(100, 2, graph_converter=gc)
    What do the parameters 100 and 2 mean? (See the sketch below.)
  2. Is there any difference or connection between qm9_simple_model, molecule_example and qm9_example?

I am looking forward to your reply, thanks!
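For context on question 1: in the MEGNetModel signature the first two positional arguments are the edge and global feature dimensions, so the call can be written with keywords (a hedged sketch; CrystalGraph acts here as a generic structure-to-graph converter, which is why a "crystal" module appears in a molecular notebook):

import numpy as np
from megnet.data.crystal import CrystalGraph
from megnet.data.graph import GaussianDistance
from megnet.models import MEGNetModel

# 100 matches the number of Gaussian centers used for bond features,
# and 2 is the size of the default global state vector.
gc = CrystalGraph(bond_converter=GaussianDistance(np.linspace(0, 5, 100), 0.5), cutoff=4)
model = MEGNetModel(nfeat_edge=100, nfeat_global=2, graph_converter=gc)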

Reproduce the result in "Graph Networks as a Universal Machine Learning"

We are trying to reproduce the "formation energy" prediction result in "Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals", but we are unsure about the parameter settings.

Following the paper, in our experiment we set the converter as:

nfeat_bond = 100
r_cutoff = 4
gaussian_centers = np.linspace(0, r_cutoff + 1, nfeat_bond)
gaussian_width = 0.5
graph_converter = CrystalGraph(cutoff=r_cutoff)

and define the model as
model = MEGNetModel(graph_converter=graph_converter, loss="mae", lr=1e-3,
                    metrics=["mean_absolute_error"], dropout=0.5,
                    centers=gaussian_centers, width=gaussian_width,
                    nvocal=95, embedding_dim=16)

finally define the callbacks as
callbacks = [ReduceLRUponNan(patience=500), ManualStop()]

We train the model with 60,000 crystals from the dataset "mp.2018.6.1.json"; the remaining structures were divided equally between validation and test.

The training stops at the 522nd epoch; the best validation MAE is 0.099684 (at the 21st epoch) and the test MAE is 0.1886.

Could you share the parameter settings from your experiment, and any tricks that might help?
Thanks a lot.
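For completeness, the training call that goes with the setup above would look roughly like this (a hedged sketch; train_structures/train_targets and the validation lists are placeholders for the mp.2018.6.1 splits, and the epoch count is an assumption):

# Placeholders: train_structures/train_targets are the 60,000-crystal training
# split; val_structures/val_targets are the validation split described above.
model.train(train_structures, train_targets,
            validation_structures=val_structures,
            validation_targets=val_targets,
            callbacks=callbacks, epochs=1000, batch_size=128)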

Impossible to do classification training (ValueError)

Hello! I wanted to train a MEGNet model to classify a set of structures (some property is zero or not), so I prepared the training data as a column with string values 'zero' or 'nonzero'.

Then the model.train method (model is a MEGNetModel loaded from band_classification.hdf5) fails with
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).
Using bool values instead of strings also doesn't work.
If I train with 0 and 1, the trained model returns float values (near 0 or 1 if trained well enough), but that is not reliable, and I would like to do proper classification, ideally with more than two classes.
Could you please advise how to change the model properties to allow that?

Here is the function I use to make the model:

def gnn_model(n_targets=1):
    # Load the pretrained classification model and extract its element embedding
    model_form = MEGNetModel.from_file('band_classification.hdf5')
    embedding_layer = [i for i in model_form.layers if i.name.startswith('embedding')][0]
    embedding = embedding_layer.get_weights()[0]
    # print('Embedding matrix dimension is ', embedding.shape)
    model = MEGNetModel(100, 2, ntarget=n_targets)

    # Find the embedding layer's index among all the model layers
    embedding_layer_index = [i for i, j in enumerate(model.layers) if j.name.startswith('atom_embedding')][0]

    # Set the weights to our previous embedding
    model.layers[embedding_layer_index].set_weights([embedding])

    # Freeze the weights
    model.layers[embedding_layer_index].trainable = False
    return model
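One possible route (a hedged sketch, assuming the installed megnet version exposes the is_classification flag on MEGNetModel; labels must be encoded as integers, not strings):

from megnet.models import MEGNetModel

# Placeholders: string_labels and structures are the user's own data.
# Encode the string labels as integers first: 'zero' -> 0, 'nonzero' -> 1.
targets = [0 if label == 'zero' else 1 for label in string_labels]

# With is_classification=True the model ends in a sigmoid and trains with a
# binary cross-entropy loss, so predictions are class probabilities in [0, 1].
model = MEGNetModel(100, 2, is_classification=True)
model.train(structures, targets, epochs=100)

As far as I can tell, more than two classes would require one-hot targets with ntarget=n_classes and a custom softmax/cross-entropy head, which the prebuilt models do not provide out of the box.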

Getting atomistic forces

Hello, I've trained a MEGNet model on potential energies for a crystal system I am interested in, and it seems to be working very well. I was curious whether you could obtain per-atom forces from the model in order to run a molecular dynamics trajectory using ASE, similar to SchNetPack models (https://pubs.acs.org/doi/10.1021/acs.jctc.8b00908). Has this been tried with MEGNet before?
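For what it's worth, one model-agnostic fallback is to approximate forces by central finite differences on model.predict_structure (a rough sketch; it assumes the model predicts the total energy of a pymatgen structure in eV, so forces come out in eV/Angstrom):

import numpy as np

def finite_difference_forces(model, structure, delta=1e-3):
    # Approximate F = -dE/dx with a central difference for each atom and axis.
    # If the model predicts energy per atom, scale the result by len(structure).
    forces = np.zeros((len(structure), 3))
    for i in range(len(structure)):
        for axis in range(3):
            for sign in (1, -1):
                displaced = structure.copy()
                step = np.zeros(3)
                step[axis] = sign * delta
                displaced.translate_sites([i], step, frac_coords=False)
                energy = float(np.ravel(model.predict_structure(displaced))[0])
                forces[i, axis] -= sign * energy / (2 * delta)
    return forces

Wrapped in an ASE Calculator whose get_forces calls this function, it would be enough to drive a (slow) MD trajectory; analytic gradients through the model would be much faster.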

Is there a way to remove this warning -

When I run this piece of code, I get a warning message:

model = MEGNetModel(100, 2,
                    graph_converter=CrystalGraph(bond_converter=GaussianDistance(np.linspace(0, 5, 100), 0.5)))

model.train(train, target, epochs=1)

Here is the warning message I get -

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:437: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradient_tape/model_1/set2set_atom/Reshape_9:0", shape=(None,), dtype=int32), values=Tensor("gradient_tape/model_1/set2set_atom/Reshape_8:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradient_tape/model_1/set2set_atom/Cast:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)

We are trying to build a wrapper function around the megnet model, and it's important for us to suppress this warning.
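As a stopgap, the specific message can be silenced with the standard library warnings filter (hedged: this hides the message rather than fixing the underlying sparse-to-dense conversion):

import warnings

# Suppress only this IndexedSlices-to-dense-Tensor message,
# leaving other UserWarnings visible.
warnings.filterwarnings(
    "ignore",
    message=".*Converting sparse IndexedSlices.*",
    category=UserWarning,
)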

