hips / neural-fingerprint

Convolutional nets which can take molecular graphs of arbitrary size as input.

License: MIT License

Python 19.24% TeX 67.61% Shell 0.09% PostScript 9.22% HTML 3.84%

neural-fingerprint's Introduction

Neural Graph Fingerprints

This software package implements convolutional nets which can take molecular graphs of arbitrary size as input. These are useful for predicting the properties of novel molecules, and are designed to be a drop-in replacement for Morgan or ECFP fingerprints.
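To give a feel for how a convolutional net can accept graphs of any size, here is a minimal NumPy sketch of a neural graph fingerprint. It is a simplification for illustration and not the package's own code: it assumes one weight matrix per layer (no per-degree weights), uses tanh as the smooth nonlinearity, and feeds in random numbers where real atom features would go. The function and variable names are hypothetical.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax; subtracting the row maximum keeps exp() from overflowing.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def neural_fingerprint(atom_features, adjacency, hidden_weights, output_weights):
    """Toy neural graph fingerprint (illustrative sketch only).

    atom_features:  (n_atoms, n_features) array
    adjacency:      (n_atoms, n_atoms) 0/1 array without self-loops
    hidden_weights: list of (n_features, n_features) arrays, one per layer
    output_weights: list of (n_features, fp_length) arrays, one per layer
    """
    r = atom_features
    fingerprint = np.zeros(output_weights[0].shape[1])
    for H, W in zip(hidden_weights, output_weights):
        v = r + adjacency @ r      # each atom plus the sum of its neighbours
        r = np.tanh(v @ H)         # smooth update (tanh is an assumption here)
        fingerprint += softmax(r @ W).sum(axis=0)  # soft write into the fingerprint
    return fingerprint

# Toy input: a three-atom chain with made-up features and random weights.
rng = np.random.default_rng(0)
n_atoms, n_features, fp_length, n_layers = 3, 4, 8, 2
atom_features = rng.normal(size=(n_atoms, n_features))
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
hidden_weights = [rng.normal(size=(n_features, n_features)) for _ in range(n_layers)]
output_weights = [rng.normal(size=(n_features, fp_length)) for _ in range(n_layers)]
fp = neural_fingerprint(atom_features, adjacency, hidden_weights, output_weights)
```

Because every atom contributes through a sum, the output has the same length for any number of atoms and does not depend on the order in which the atoms are listed.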

The paper describing the algorithm used is:

Convolutional Networks on Graphs for Learning Molecular Fingerprints

David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams.

How to install

This package requires:

SciPy version >= 0.15.0
RDKit
Autograd (install with pip install autograd)
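A quick way to check the requirements above before running anything is to ask Python whether each package can be found. This is a small sketch using only the standard library; the helper name missing_packages is hypothetical, and the check looks at top-level import names only, not versions.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names of the three requirements listed above.
REQUIRED = ["scipy", "rdkit", "autograd"]
missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```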

Examples

This package includes a regression example and a visualization example in the examples directory.

Authors

This software was primarily written by David Duvenaud, Dougal Maclaurin, and Ryan P. Adams. Please feel free to submit any bugs or feature requests. We'd also love to hear about your experiences with this package in general. Drop us an email!

We want to thank Jennifer Wei for helpful contributions and advice, and Analog Devices International and Samsung Advanced Institute of Technology for their generous support.

TensorFlow and Theano implementations

A TensorFlow implementation of a closely related algorithm can be found at https://github.com/momeara/DeepSEA

A Theano implementation can be found at https://github.com/debbiemarkslab/neural-fingerprint-theano

neural-fingerprint's Issues

experiment scripts import error

Hello,
I was trying to run the script launch_experiments.py in the experiment_scripts directory and got this error:
Traceback (most recent call last):
  File "launch_experiments.py", line 11, in <module>
    from nfpexperiment import util
ImportError: No module named nfpexperiment

It seems that there is no module named nfpexperiment. Could you please provide this module?
Thanks in advance.

Syntax Errors, util.py not found, cannot import name 'logsumexp' from 'autograd.scipy.misc'

As title says, I've encountered a whole handful of issues attempting to use this software.

After building, I had to manually place packages and libs in my miniconda3 folder. Maybe I could've specified some parameters to install it correctly? Not sure. Here are the real problems:

Attempting to import neuralfingerprint from any IDE or location within terminal outside of neuralfingerprint folder results in: 'util.py' not found. Not a problem right? Just append the path of that folder to my sys.path....

After fixing the util.py problem, I get many syntax errors regarding print statements.
E.g.:
print "Total number of weights in the network:", num_weights
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Total number of weights in the network:", num_weights)?

After fixing those manually, I get the next problem:

File "/mnt/c/Users/came/Documents/neural-fingerprint-master/neural-fingerprint-master/neuralfingerprint/build_convnet.py", line 2, in <module>
from autograd.scipy.misc import logsumexp
ImportError: cannot import name 'logsumexp' from 'autograd.scipy.misc' (/home/azn/miniconda3/lib/python3.9/site-packages/autograd/scipy/misc.py)

I'm no longer interested in putting out fires so I figured I'd post here despite the forum being inactive for the last four years, fingers crossed y'all are still paying attention to this page!

Please let me know if there are any fixes for these issues or if y'all know what's going on here.

I believe it could just be a mismatch between Python 3 and Python 2. I'm not very thrilled about having to switch from Python 3 to Python 2 for this, so I figured it would be good to ask first.

Many thanks!
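One possible way to cope with a function that has moved between modules, such as logsumexp in the traceback above, is to try several import locations in turn. The sketch below uses only the standard library; the helper name first_importable is hypothetical, and the candidate list assumes that newer autograd releases expose logsumexp under autograd.scipy.special, which should be checked against the installed version.

```python
import importlib

def first_importable(candidates):
    """Return the first attribute found among (module_name, attribute) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module is not installed, try the next location
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("none of the candidates could be imported: %r" % (candidates,))

# Assumed locations of logsumexp, newest first (an assumption, not confirmed here).
LOGSUMEXP_CANDIDATES = [
    ("autograd.scipy.special", "logsumexp"),
    ("autograd.scipy.misc", "logsumexp"),
]
# With autograd installed, the lookup would be:
# logsumexp = first_importable(LOGSUMEXP_CANDIDATES)
```

This only addresses the moved import. The Python 2 print statements mentioned above would still need to be converted separately.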

Binary Cross Entropy instead of MSE

Hi,

Maybe a quite simple question, but in your regression example you pass neuralfingerprint.util.rmse as the nll_func via build_conv_deep_net to the build_standard_net function. However, I am not sure how util.rmse relates to the mean_squared_error function that is used in build_standard_net.

My goal is to adapt the example code so that I can do binary classification.
I tried to replace the default loss in build_standard_net with binary_cross_entropy, but I think I am missing something, because the results do not make sense.
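For reference, a binary cross-entropy loss with the same (predictions, targets) shape as a regression loss could look like the sketch below. It is written in plain NumPy and is not taken from the package: the name binary_cross_entropy follows the issue text, and it assumes the network outputs raw scores (logits) and the targets are 0/1 labels. For training with Autograd, the import would presumably need to be autograd.numpy instead.

```python
import numpy as np

def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy for raw scores (logits) and 0/1 targets.

    Both arguments are NumPy arrays of the same shape.
    """
    # Stable form of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logits):
    # log(1 + exp(z)) - z*y, where log(1 + exp(z)) = max(z, 0) + log1p(exp(-|z|)).
    per_example = (np.maximum(logits, 0.0)
                   - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_example.mean()

# A score of 0 means probability 0.5, so the loss is log(2) for either label.
uninformative = binary_cross_entropy(np.array([0.0, 0.0]), np.array([0.0, 1.0]))
# Confident and correct scores give a loss close to zero.
confident = binary_cross_entropy(np.array([10.0, -10.0]), np.array([1.0, 0.0]))
```

Note that a loss like this expects unscaled 0/1 labels, so any target normalization applied for regression would have to be reconsidered.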

Strange predictions

Hello. I'm trying neuralfingerprint and have run into strange behaviour.
When I apply the model to a CSV file that contains only these SMILES:
CCC
FCF
I get the result:
CCC,-3.4293943508031028
FCF,-2.6789522776231816

But when I put in only CCC, I get a different result:
CCC,-3.0120117325667533

If the input file contains the same molecule several times, it gives the same result for each copy:
CCC,-3.0120117325667533
CCC,-3.0120117325667533

The predictions are reproducible (they don't change between runs), but the exact values depend on the contents of the test CSV file. I am using your example.

smiles = read_smiles(task_params['experiment_data_file'])
result = predict_func(smiles)

I'm wondering if it is a bug or a feature.
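One mechanism that would produce exactly this pattern is normalization that uses statistics of the current batch. Whether that is what happens here is an assumption, not something confirmed in this thread. The NumPy sketch below uses a hypothetical batch_normalize function and made-up feature vectors to show that the value for one molecule changes with the other rows in the batch, while identical rows still agree with each other.

```python
import numpy as np

def batch_normalize(activations):
    """Normalize each column using the mean and std of the current batch."""
    mean = activations.mean(axis=0, keepdims=True)
    std = activations.std(axis=0, keepdims=True)
    return (activations - mean) / (std + 1.0)  # the +1.0 avoids division by zero

# Made-up feature vectors standing in for two molecules.
ccc = np.array([1.0, 2.0])
fcf = np.array([3.0, 0.0])

alone = batch_normalize(np.stack([ccc]))[0]        # CCC on its own
paired = batch_normalize(np.stack([ccc, fcf]))[0]  # CCC next to FCF
duplicated = batch_normalize(np.stack([ccc, ccc])) # CCC twice

print(alone, paired, duplicated)
```

If this is the cause, using normalization statistics stored at training time would make each prediction independent of the rest of the test file.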

Malaria Dataset

Hi, I am wondering whether the malaria dataset comes from real experiments or from calculations?

A bug in mol_graph.py

Hello,

I have run into an error in

mol_graph.py", line 79, in graph_from_smiles
raise ValueError("Could not parse SMILES string:", smiles)
ValueError: ('Could not parse SMILES string:', 'CCCCCCCCCCCCN(C)CC(=O)O')"

As described above, the "[N]" in the SMILES could not be parsed. I wonder whether this problem is mainly due to a limitation of the RDKit package.
Or do you have a better solution for this bug?

Best regards,
YJ
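A common workaround for unparsable SMILES is to filter them out before building graphs, so that one bad string does not stop the whole run. The sketch below is not part of the package: split_parsable and rdkit_parse are hypothetical helper names, and it relies on RDKit's Chem.MolFromSmiles returning None when a string cannot be parsed.

```python
def split_parsable(smiles_list, parse):
    """Split SMILES strings into (parsable, unparsable) using a parse function.

    parse should return None for strings it cannot handle.
    """
    good, bad = [], []
    for smiles in smiles_list:
        if parse(smiles) is not None:
            good.append(smiles)
        else:
            bad.append(smiles)
    return good, bad

def rdkit_parse(smiles):
    # Imported lazily so that split_parsable itself works without RDKit.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles)

# With RDKit installed, the call would be:
# good, bad = split_parsable(all_smiles, rdkit_parse)
```

Printing the rejected strings makes it easier to tell whether the problem lies in the data file or in the RDKit installation.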

The metric for mean predictive accuracy of neural fingerprints in Table 1

Is the metric MAE, MSE or RMSE?
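For readers comparing the three candidates, the sketch below shows how they are defined, using only the standard library. It does not answer which one the table uses; it only makes the difference concrete.

```python
import math

def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mse(pred, true):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean squared error, in the same units as the targets."""
    return math.sqrt(mse(pred, true))

# One error of size 3 among three predictions.
pred = [1.0, 2.0, 3.0]
true = [1.0, 2.0, 6.0]
print(mae(pred, true), mse(pred, true), rmse(pred, true))
```

RMSE and MSE penalize a single large error more heavily than MAE does, so the three numbers are not interchangeable when comparing against published results.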

Some details about hyperparameter optimization

I am so interested in your method that I have one more follow-up question:

Hyperparameter Optimization
"To optimize hyperparameters, we used random search. The hyperparameters
of all methods were optimized using 50 trials for each cross-validation fold. The
following hyperparameters were optimized: log learning rate, log of the initial weight scale, the log
L2 penalty, fingerprint length, fingerprint depth (up to 6), and the size of the hidden layer in the
fully-connected network. Additionally, the size of the hidden feature vector in the convolutional
neural fingerprint networks was optimized."

Could you give me some suggestions about hyperparameter optimization, or any empirical ranges for these parameters?

Best regards,
YJ
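The random search described in the quoted passage can be sketched with the standard library as follows. The parameter names mirror the quoted list, but every numeric range below is a placeholder chosen for illustration, not a value from the paper or the authors. Only the fingerprint depth limit of 6 and the 50 trials come from the quoted text. The function names are hypothetical.

```python
import math
import random

def sample_hyperparameters(rng):
    """Draw one random configuration. All ranges are illustrative placeholders."""
    return {
        "log_learning_rate": rng.uniform(-8.0, -2.0),
        "log_init_scale": rng.uniform(-6.0, -1.0),
        "log_l2_penalty": rng.uniform(-8.0, -1.0),
        "fp_length": rng.choice([16, 32, 64, 128]),
        "fp_depth": rng.randint(1, 6),  # "up to 6" as in the quoted text
        "hidden_layer_size": rng.choice([50, 100, 200]),
        "conv_width": rng.randint(5, 100),
    }

def random_search(objective, n_trials=50, seed=0):
    """Return the configuration with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = sample_hyperparameters(rng)
        score = objective(params)  # e.g. validation error of a trained model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for "train a model and return validation error".
best, score = random_search(lambda p: abs(p["fp_depth"] - 3), n_trials=50, seed=0)
```

In practice the objective would train the network on the training split of one cross-validation fold and return its validation error.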

Fingerprint BitVector to Int array

Currently, in the file rdkit_utils.py, a BitVector is obtained using RDKit, converted to a bit string, and then converted element by element to a NumPy array:

(AllChem.GetMorganFingerprintAsBitVect(
        m, fp_radius, nBits=fp_length)).ToBitString()

np.array([list(s) for s in A], dtype=int)

This can be written as

arr = np.zeros((fp_length,), dtype=int)  # filled in place by ConvertToNumpyArray
DataStructs.ConvertToNumpyArray(AllChem.GetMorganFingerprintAsBitVect(m, fp_radius, nBits=fp_length), arr)

as seen in
http://www.rdkit.org/Python_Docs/rdkit.DataStructs.cDataStructs-module.html#ConvertToNumpyArray

Utils not found

After installing neuralfingerprint (pip install -e /my_dir/), I tried to run the examples (regression.py), but I get a 'utils' not found error.
Can you please help? Thanks!