ani1x_datasets's Introduction

ANI1x_datasets

The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for organic molecules. Please download the actual data files from FigShare first: https://springernature.figshare.com/collections/The_ANI-1ccx_and_ANI-1x_data_sets_coupled-cluster_and_density_functional_theory_properties_for_molecules/4712477

This repository contains the scripts needed to access the ANI-1x data sets.

Required software

  • python>=3.5
  • numpy
  • h5py

Repository content

  • Python reader for the HDF5 dataset file
  • Interactive plots comparing the data distribution in the QM9, ANI-1, ANI-1x and ANI-1ccx datasets, shown as a parametric t-SNE projection of the first-layer activations of the ANI-1x model.
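
As a rough illustration of what reading the HDF5 file involves, the sketch below walks a mapping of per-formula groups and keeps only the conformers whose requested properties contain no NaN. It is not the repository's reader: `iter_conformers` is a hypothetical helper, and the key names `coordinates`, `atomic_numbers` and `wb97x_dz.energy` are assumptions about the file layout. The function accepts any mapping, so an open `h5py.File` should work as well as the small synthetic dictionary used here.

```python
import numpy as np


def iter_conformers(data, keys):
    """Yield (name, arrays) per group, keeping conformers with no NaN in `keys`.

    `data` is any mapping of group name -> mapping of property name -> array,
    for example an open h5py.File (assumed layout) or a plain dict.
    `keys` must name per-conformer properties (first axis = conformer index).
    """
    for name, group in data.items():
        # Skip groups that lack one of the requested properties entirely.
        if any(key not in group for key in keys):
            continue
        arrays = {key: np.asarray(group[key]) for key in keys}
        n_conf = arrays[keys[0]].shape[0]
        if n_conf == 0:
            continue
        # A conformer is kept only if every requested property is NaN-free.
        mask = np.ones(n_conf, dtype=bool)
        for arr in arrays.values():
            mask &= ~np.isnan(arr.reshape(n_conf, -1)).any(axis=1)
        if not mask.any():
            continue
        out = {key: arr[mask] for key, arr in arrays.items()}
        # Per-molecule data (assumed key name) is passed through unfiltered.
        if "atomic_numbers" in group:
            out["atomic_numbers"] = np.asarray(group["atomic_numbers"])
        yield name, out


# Synthetic placeholder values, not real ANI-1x data.
demo = {
    "C1H4": {
        "atomic_numbers": np.array([6, 1, 1, 1, 1]),
        "coordinates": np.zeros((3, 5, 3)),
        "wb97x_dz.energy": np.array([1.0, np.nan, 3.0]),
    },
    "H2O1": {
        "atomic_numbers": np.array([8, 1, 1]),
        "coordinates": np.zeros((2, 3, 3)),
        "wb97x_dz.energy": np.array([np.nan, np.nan]),
    },
}

kept = dict(iter_conformers(demo, ["coordinates", "wb97x_dz.energy"]))
for name, arrays in kept.items():
    print(name, arrays["coordinates"].shape)
```

With the real file, passing `h5py.File(path, "r")` in place of `demo` would be the natural substitution, assuming the groups are laid out as described above.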

...

If you use the ANI-1x datasets, please cite the following papers:

  • ANI-1x dataset

    Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less Is More: Sampling Chemical Space with Active Learning. J. Chem. Phys. 2018, 148 (24), 241733.
    https://doi.org/10.1063/1.5023802

  • ANI-1ccx dataset

    Smith, J. S.; Nebgen, B. T.; Zubatyuk, R.; Lubbers, N.; Devereux, C.; Barros, K.; Tretiak, S.; Isayev, O.; Roitberg, A. E. Approaching Coupled Cluster Accuracy with a General-Purpose Neural Network Potential through Transfer Learning. Nat. Commun. 2019, 10 (1), 2903.
    https://doi.org/10.1038/s41467-019-10827-4

  • wB97x/def2-TZVPP data

    Zubatyuk, R.; Smith, J. S.; Leszczynski, J.; Isayev, O. Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-in-Molecules Neural Network. Sci. Adv. 2019, 5 (8), eaav6490.
    https://doi.org/10.1126/sciadv.aav6490

ani1x_datasets's Issues

SMILES string

How do I access the SMILES string for any molecule?

ccsd(t) forces not included

Hello,

After using your example_loader file, it seems that the ccsd(t)_cbs.forces key is missing from the HDF5 file.
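
One way to check which property keys a file actually contains is to tally them across all groups. The sketch below is only an assumption-laden illustration: `key_coverage` is a hypothetical helper, the key names `wb97x_dz.forces` and `ccsd(t)_cbs.energy` in the synthetic data are assumed, and the function takes any mapping of groups (an open `h5py.File` should behave the same way).

```python
import numpy as np


def key_coverage(data):
    """Return {key: (groups containing it, groups with any finite value)}."""
    coverage = {}
    for group in data.values():
        for key in group.keys():
            values = np.asarray(group[key])
            present, populated = coverage.get(key, (0, 0))
            # Only floating-point arrays can hold NaN placeholders.
            if np.issubdtype(values.dtype, np.floating):
                has_data = bool(np.isfinite(values).any())
            else:
                has_data = values.size > 0
            coverage[key] = (present + 1, populated + int(has_data))
    return coverage


# Synthetic placeholder values, not real ANI-1x data.
demo = {
    "C1H4": {
        "wb97x_dz.forces": np.zeros((2, 5, 3)),
        "ccsd(t)_cbs.energy": np.array([1.0, np.nan]),
    },
    "H2O1": {
        "wb97x_dz.forces": np.zeros((1, 3, 3)),
        "ccsd(t)_cbs.energy": np.array([np.nan]),
    },
}

cov = key_coverage(demo)
for key, (present, populated) in sorted(cov.items()):
    print(f"{key}: in {present} groups, populated in {populated}")
print("ccsd(t)_cbs.forces" in cov)
```

Note that this reads every dataset into memory, so on the full release file it may be slow; restricting the loop to the keys of interest would be cheaper.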

ANI energies

Which key holds the energies on which the ANI models were trained? And which key holds the forces?

Missing Quadrupole Constants?

Hello!

Thank you so much for making the ANI-1x dataset available, it is a fantastic resource. I have a question regarding the availability of quadrupoles for molecules/conformers in the dataset. According to the paper, the 'wb97x_dz.quadrupole' key should contain an array of size $N_c \times 6$ where $N_c$ is the number of conformers per molecule. When I look at this array, a significant number of rows were full of nan. I ran the following code snippet:

import h5py
import numpy as np

# Open the release file read-only.
ani1x_data = h5py.File('ani-1x/ani1x-release.h5', 'r')
frac_quads_li = []
for i in ani1x_data.keys():
    # Load the (N_c, 6) quadrupole array for this molecule into memory.
    all_quads = ani1x_data[i]['wb97x_dz.quadrupole'][()]
    # Indices of conformers with at least one non-NaN quadrupole component.
    all_quads_sub = np.unique(np.argwhere(~np.isnan(all_quads))[:,0])
    frac_quads_li.append(float(len(all_quads_sub))/len(all_quads))
print(f'Avg Fraction Computed Quads: {round(np.average(frac_quads_li),3)}')
print(f'No Quad Count: {np.sum(np.array(frac_quads_li)==0.0)}/{len(frac_quads_li)}')

...and got the following result:

Avg Fraction Computed Quads: 0.215
No Quad Count: 1698/3114

So it appears that quite a few quadrupoles are all nan, with more than half of the molecules having no quadrupole information. When I ran the same analysis on 'wb97x_dz.dipole', I found that 181 molecules have no dipole values available for any conformer. I did not find anything in the publication or the GitHub repo that mentions these nan values (although I may have missed it). So I am wondering what happened to the dipoles/quadrupoles in these cases, and whether there is a version of the ANI-1x dataset that contains these values. If not, that is fine: I am happy to recalculate them, or to omit the corresponding conformers from the analysis I am trying to do. But if a shareable version with these additional values is available, I would appreciate it, as it would save me some time and compute.

Thank you for your time,

Marcus Schwarting

multiplicity of oxygen molecules

I noticed that some oxygen molecules appear in the ANI-1x dataset, such as the oxygen dimer below. I would like to confirm which multiplicity was used for these dimers in the Gaussian calculations: 1 or 5?

4

O 1.817700 0.308000 -0.269900
O 1.132200 -0.515000 0.285500
O -1.794600 -0.410600 -0.233300
O -1.155300 0.617600 0.217600
