
qmof's Introduction

QMOF Database

Overview

The Quantum MOF (QMOF) Database is a publicly available dataset of quantum-chemical properties for 20,000+ metal–organic frameworks (MOFs) and coordination polymers derived from high-throughput periodic density functional theory (DFT) calculations. The MOFs are all DFT-optimized and are derived from a variety of parent databases, including both experimental and hypothetical MOF databases.

Explore and Download the QMOF Database

To download the data underlying the QMOF Database (i.e. DFT-optimized geometries, energies, partial atomic charges, bond orders, atomic spin densities, magnetic moments, band gaps, charge densities, density of states, etc. as well as the raw VASP files), see the documentation below:

Downloading the QMOF Database

Interactively explore the dataset and more at the following link:

https://materialsproject.org/mofs

Follow the QMOF Database on Twitter (@QMOF_Database) if you want to be the first to know about the latest news and updates.

Updates

For a list of version-specific updates, see updates.md.

Citation

If you use the QMOF Database, please refer to the following publications. Both should be cited if you are using the dataset with 20k+ structures.

  • A.S. Rosen, S.M. Iyer, D. Ray, Z. Yao, A. Aspuru-Guzik, L. Gagliardi, J.M. Notestein, R.Q. Snurr. "Machine Learning the Quantum-Chemical Properties of Metal–Organic Frameworks for Accelerated Materials Discovery", Matter, 4, 1578-1597 (2021). DOI: 10.1016/j.matt.2021.02.015.
  • A.S. Rosen, V. Fung, P. Huck, C.T. O'Donnell, M.K. Horton, D.G. Truhlar, K.A. Persson, J.M. Notestein, R.Q. Snurr. "High-Throughput Predictions of Metal–Organic Framework Electronic Properties: Theoretical Challenges, Graph Neural Networks, and Data Exploration," npj Comput. Mat., 8, 112 (2022). DOI: 10.1038/s41524-022-00796-6.

Licensing

The data underlying the QMOF Database is made publicly available under a CC BY 4.0 license. This means you can copy it, share it, adapt it, and do whatever you like with it provided that you give appropriate credit and indicate any changes.

Contact

If you have any questions, you can reach the corresponding author at the e-mail listed here.

qmof's People

Contributors

andrew-s-rosen, janosh

qmof's Issues

Pymatgen import error with OFM

Moved from #16, which stated:
This is my problem: ImportError: cannot import name 'VoronoiCoordFinder' from 'pymatgen.analysis.structure_analyzer'.
Which version of the pymatgen package is needed? Has the function been removed? How can I solve this problem?
Thank you very much.

Hi @tuantuan-lin. I'm happy to try and help. Pymatgen recently introduced some backwards-incompatible changes with how functions are called, as noted here. I imagine this is the likely cause.

In order to help you, let's make sure we're talking about the same thing. I assume that you are referring to ofm_feature_generator.py, correct? And the problem appears upon calling the OrbitalFieldMatrix function from matminer? Please let me know the full traceback error (e.g. what lines it's failing on).

If this is the case, please try the following:

  1. Uninstall matminer.
  2. Uninstall pymatgen.
  3. Install the most recent version of pymatgen (pip install pymatgen).
  4. Clone the current matminer repository (git clone https://github.com/hackingmaterials/matminer.git).
  5. In the base directory of the downloaded matminer repository, install matminer via pip install .
  6. Try the code again.

As for the specific versions used in the QMOF Database paper, please see the "Additional Software and Hardware Details for Machine Learning" section (bottom of pg 8) in the Supporting Information. I used Pymatgen v.2020.12.3 with matminer v.0.6.4.
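
Before and after reinstalling, it can help to confirm which versions are actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration and is not part of the QMOF scripts.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages discussed in this issue.
for name in ("pymatgen", "matminer"):
    print(name, installed_version(name))
```

The printed versions can then be compared against the ones quoted above for the paper.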

Dimensonality typo

In the .json files of the QMOF database, one of the entries is named dimensonality instead of dimensionality.
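
Until the key is renamed, code that reads this entry can accept both spellings. This is a minimal sketch; the function name get_dimensionality is hypothetical, and it assumes you pass in whichever dictionary actually holds the key (the nesting inside the .json files is not described here).

```python
def get_dimensionality(entry: dict):
    """Read the dimensionality, accepting the correct or the misspelled key."""
    for key in ("dimensionality", "dimensonality"):
        if key in entry:
            return entry[key]
    return None  # neither spelling is present
```

Checking the correctly spelled key first means the same code keeps working once the typo is fixed.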

Update machine learning scripts for v3+

v3 of the QMOF database introduces new structures, properties, and new formatting of the tabulated data. The machine learning scripts have a few hard-coded instances tied to the format used in v1 of the QMOF database (and the sample encodings are for the smaller v1 database). This should be updated.

CHGCAR filenames

There are a few CHGCARs whose names don't perfectly match up with the corresponding MOFs (e.g. a period "." is present instead of an underscore "_"). There also may be a few (e.g. a dozen) CHGCARs whose structures have since been removed from the QMOF Database. Both of these issues will be fixed when the next batch of CHGCARs is uploaded.

about the qmof.json

I found that qmof.json does not contain the 'structure' key, so the code below fails.

import os
import json
from pymatgen.core import Structure

# ------Settings------#

struct_json_path = "E:\GANN-main\qmof_database\qmof_structure_data.json"  # path to structure json
cif_folder_path = "E:\GANN-main\qmof_database\qmof_cif"  # path to folder where CIFs will be stored
write_site_props = True  # if site properties should be written to CIF
only_ddec_charge = False  # set to True if you only want _atom_site_charge flags

# ------Settings------#

# Make new folder to store CIFs
if not os.path.exists(cif_folder_path):
    os.mkdir(cif_folder_path)

# Read in structure data
with open("E:\GANN-main\qmof_database\qmof.json") as f:
    qmof_struct_data = json.load(f)

# Loop over structures and write each one out to a CIF
qmof_structs = {}
for entry in qmof_struct_data:

    qmof_id = entry["qmof_id"]  # name for CIF
    print(f"Writing {qmof_id}")

    struct = Structure.from_dict(entry["structure"])  # Pymatgen structure
    cif_path = os.path.join(cif_folder_path, f"{qmof_id}.cif")  # path to write CIF
    struct.to(filename=cif_path)  # write CIF
    properties = dict(sorted(struct.site_properties.items()))  # fetch site properties

    # Overwrite CIF with site properties
    if write_site_props:
        new_cif = ""
        i = 0
        prop_lines = False

        with open(cif_path, "r") as f:
            for line in f:
                if "_atom_site_occupancy" in line:
                    new_cif += line
                    if only_ddec_charge:
                        new_cif += f" _atom_site_charge\n"
                    else:
                        for key in properties.keys():
                            new_cif += f" _atom_site_{key}\n"
                    prop_lines = True
                    continue

                if i == len(struct):
                    prop_lines = False

                if prop_lines:
                    new_cif += line.strip()
                    if only_ddec_charge:
                        new_cif += f"  {properties['pbe_ddec_charge'][i]}"
                    else:
                        for value_sets in properties.values():
                            new_cif += f"  {value_sets[i]}"
                    new_cif += "\n"
                    i += 1
                else:
                    new_cif += line

        # Write out new CIF
        with open(cif_path, "w") as f:
            f.write(new_cif)

Traceback (most recent call last):
  File "E:\GANN-main\qmof_database\scripts\make_cifs.py", line 27, in <module>
    struct = Structure.from_dict(entry["structure"])  # Pymatgen structure
KeyError: 'structure'

Can you tell me how to solve it?
Thank you for your reply.
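
One thing that stands out in the script as posted: struct_json_path points to qmof_structure_data.json, but the open() call reads qmof.json, which the report says has no 'structure' key. A possible way to catch this early is to validate the loaded entries before looping. The sketch below is illustrative only; require_structures and load_structure_entries are hypothetical helper names, and it assumes the file is a list of dictionaries, each with a "qmof_id", as in the loop above.

```python
import json


def require_structures(entries, source="<input>"):
    """Raise a descriptive KeyError if any entry lacks a 'structure' key."""
    missing = [e.get("qmof_id", "<unknown>") for e in entries if "structure" not in e]
    if missing:
        raise KeyError(
            f"{len(missing)} of {len(entries)} entries in {source} have no "
            f"'structure' key (first: {missing[0]}); is this the structure json?"
        )
    return entries


def load_structure_entries(path):
    """Load a json file and check that every entry carries a structure."""
    with open(path) as f:
        return require_structures(json.load(f), source=path)
```

With this in place, passing struct_json_path to load_structure_entries instead of hard-coding a second path would make the mismatch visible in the error message.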

Switch to openpyxl for xlsx reader in v4

In the qmof_database.zip folder, the get_subset_data.py script calls pandas' read_excel() function to read a .xlsx file. The underlying xlrd engine no longer supports reading of .xlsx, so line 65 should instead read: df_excel = pd.read_excel(os.path.join(parent_name, entry),index_col=0, header=None, sheet_name=None, engine='openpyxl') to call openpyxl instead. Will be updated in v4 of the QMOF database.
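
For scripts that may meet both old and new spreadsheet formats, the engine can be chosen from the file extension. This is a rough sketch rather than the code in get_subset_data.py; pick_excel_engine and read_all_sheets are hypothetical names, and pandas plus openpyxl are assumed to be installed when read_all_sheets is called.

```python
import os


def pick_excel_engine(path):
    """Choose a pandas read_excel engine from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".xlsx", ".xlsm"):
        return "openpyxl"
    if ext == ".xls":
        return "xlrd"
    raise ValueError(f"Unsupported spreadsheet extension: {ext!r}")


def read_all_sheets(path):
    """Read every sheet into a dict of DataFrames, mirroring the call above."""
    import pandas as pd  # imported lazily so the helper above has no dependencies

    return pd.read_excel(
        path,
        index_col=0,
        header=None,
        sheet_name=None,
        engine=pick_excel_engine(path),
    )
```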

Some GMOFs have missing H

Some of the GMOFs (materials with 'gmof_' in the name) have missing H atoms in the linkers or odd CN bonds. These are due to errors in the original GMOF Database. Please exercise caution when using the GMOFs until this is resolved (ETA week of 9/13).

This issue is present for v9 and v10 but will be resolved in v11.
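
In the affected versions, one cautious option is to set the GMOF-derived entries aside before analysis. The sketch below relies only on the 'gmof_' naming convention mentioned above; split_gmofs is a hypothetical helper, and the names in the usage note are invented placeholders, not real database entries.

```python
def split_gmofs(names):
    """Separate names containing 'gmof_' from all other names."""
    gmofs = [n for n in names if "gmof_" in n]
    others = [n for n in names if "gmof_" not in n]
    return gmofs, others
```

For example, split_gmofs(["gmof_example-1", "other_example"]) returns (["gmof_example-1"], ["other_example"]).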

make_cifs.py error: struct = Structure.from_dict(entry["structure"]) raises KeyError: 'structure'

When I run make_cifs.py, an error occurs near the line properties = dict(sorted(struct.site_properties.items())) # fetch site properties. The failing line is struct = Structure.from_dict(entry["structure"]) # Pymatgen structure
KeyError: 'structure'

I can't solve it. Can you tell me what happened?

import os
import json
import pymatgen
from pymatgen.core import Structure

# ------Settings------#

struct_json_path = "E:\GANN-main\qmof_database\qmof.json"  # path to structure json
cif_folder_path = "E:\GANN-main\qmof_database\qmof_cif"  # path to folder where CIFs will be stored
write_site_props = True  # if site properties should be written to CIF
only_ddec_charge = False  # set to True if you only want _atom_site_charge flags

# ------Settings------#

# Make new folder to store CIFs
if not os.path.exists(cif_folder_path):
    os.mkdir(cif_folder_path)

# Read in structure data
with open("E:\GANN-main\qmof_database\qmof.json") as f:
    qmof_struct_data = json.load(f)

# Loop over structures and write each one out to a CIF
qmof_structs = {}
for entry in qmof_struct_data:

    qmof_id = entry["qmof_id"]  # name for CIF
    print(f"Writing {qmof_id}")

    struct = Structure.from_dict(entry["structure"])  # Pymatgen structure
    cif_path = os.path.join(cif_folder_path, f"{qmof_id}.cif")  # path to write CIF
    struct.to(filename=cif_path)  # write CIF
    properties = dict(sorted(struct.site_properties.items()))  # fetch site properties

    # Overwrite CIF with site properties
    if write_site_props:
        new_cif = ""
        i = 0
        prop_lines = False

        with open(cif_path, "r") as f:
            for line in f:
                if "_atom_site_occupancy" in line:
                    new_cif += line
                    if only_ddec_charge:
                        new_cif += f" _atom_site_charge\n"
                    else:
                        for key in properties.keys():
                            new_cif += f" _atom_site_{key}\n"
                    prop_lines = True
                    continue

                if i == len(struct):
                    prop_lines = False

                if prop_lines:
                    new_cif += line.strip()
                    if only_ddec_charge:
                        new_cif += f"  {properties['pbe_ddec_charge'][i]}"
                    else:
                        for value_sets in properties.values():
                            new_cif += f"  {value_sets[i]}"
                    new_cif += "\n"
                    i += 1
                else:
                    new_cif += line

        # Write out new CIF
        with open(cif_path, "w") as f:
            f.write(new_cif)

Make CHGCARs more accessible

There's no way around this, but the CHGCARs are very large. They are currently made available on Box, but this only allows the user to download one at a time. Some approach should be taken such that they can be readily accessed in bulk.

Bug with get_subset_data.py

The get_subset_data.py file does not correctly create a subset from the parent .json as is evident from an incorrect number of rows in the resulting DataFrame. Will be patched in v5.

atom_init.json: La/Ac/Lu/Lr

In the original atom_init.json file, La and Ac were placed in group 3 whereas Lu and Lr were placed in fictitious group 19. While perhaps slightly debatable, it is likely more appropriate for La and Ac to be in fictitious group 19 with Lu and Lr in group 3. This is also more consistent with the QMOF SI. The new atom_init.json adjusts for this, and benchmarking results will use this going forward.
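
The change amounts to swapping the group labels of four elements. The sketch below only restates that swap as two small dictionaries; it does not reproduce the actual layout of atom_init.json, and the names OLD_GROUP, NEW_GROUP, and changed_elements are made up for illustration.

```python
# Group assignments as described above (19 is the fictitious f-block group).
OLD_GROUP = {"La": 3, "Ac": 3, "Lu": 19, "Lr": 19}
NEW_GROUP = {"La": 19, "Ac": 19, "Lu": 3, "Lr": 3}


def changed_elements(old, new):
    """Return the sorted element symbols whose group assignment differs."""
    return sorted(el for el in old if old[el] != new.get(el))


print(changed_elements(OLD_GROUP, NEW_GROUP))
```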

Thanks to @Eric-Musa (FAIR-Chem/fairchem#296) for his thoughtful analysis of the initial elemental embeddings, which prompted me to look for this.

Issues encountered while parsing CIF

Dear authors,

When I run main.py, I get a UserWarning: "Issues encountered while parsing CIF: Some fractional co-ordinates rounded to ideal values to avoid issues with finite precision". Does this warning have any effect on the results of CGCNN?

Prepare for v2 and do some restructuring

The Figshare repository could use some minor restructuring to make it more amenable to future planned updates (including v2, to be released in a few days) and to continue increasing its user-friendliness. Same goes for this GitHub repo.

ConQuest Python script filename

In the README at qmof_database/initial_structures/structures/README.md in the QMOF Database, the instructions say to use download_and_remove_free_solvent.py but should instead say download_csd_cifs.py. The name was shortened but not updated in the README. This will be fixed in the next update.

'COO' object has no attribute 'format'

Dear authors,

When I run soap_matrix_generator.py, there is an error as follows.

Traceback (most recent call last):
  File "soap_matrix_generator.py", line 59, in <module>
    save_npz(soap_filename, soap_matrix)
  File "/home/liu/.local/lib/python3.6/site-packages/scipy/sparse/_matrix_io.py", line 56, in save_npz
    if matrix.format in ('csc', 'csr', 'bsr'):
AttributeError: 'COO' object has no attribute 'format'

Is something wrong with 'dscribe' or 'scipy.sparse'?
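
The class name 'COO' in the traceback suggests that the matrix is a COO array from the pydata sparse package rather than a SciPy sparse matrix, which is what SciPy's save_npz expects. Assuming that is the case, a possible workaround is to convert the array before saving. The helper name as_scipy_sparse is hypothetical; to_scipy_sparse() is the conversion method on 2D sparse.COO arrays.

```python
def as_scipy_sparse(matrix):
    """Convert a pydata-sparse COO array to SciPy CSR; pass other inputs through."""
    if hasattr(matrix, "to_scipy_sparse"):
        # pydata sparse: convert to a SciPy matrix, then to CSR for save_npz
        return matrix.to_scipy_sparse().tocsr()
    return matrix  # assumed to already be a SciPy sparse matrix
```

The call in the script would then become save_npz(soap_filename, as_scipy_sparse(soap_matrix)).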

Add a .csv of non-DFT property data

It would be nice to add a .csv of additional useful info for each structure. I'm thinking the following at minimum:

  1. Is it also in the CoRE 2019 database?
  2. Dimensionality of the framework (e.g. using pymatgen's get_dimensionality_larsen())
  3. Pore-based properties (e.g. using Zeo++)
  4. Initial and final lattice constants (and volume)
  5. Chemical formula

Figshare Link is Down

For some reason, Figshare is showing the QMOF Database as offline. 😡

I have reached out to support to resolve this issue and will make a mirror in the meantime.

Add .json files

.csv and .xlsx files are nice for those with less programming expertise, but it'd be extra convenient to host most of the important data in a single .json file. The .json format will allow for entries of different length, which is nice. This can also be used to store key VASP parameters so the user doesn't need to download the raw log files.

Make GitHub repo more amenable to user contributions

Self-explanatory. As brought up by @kjappelbaum, the GitHub page should make it more seamless to fork and make PRs. Will need to navigate file size limits and should be kept small here, but nonetheless should have some of the curated data from the Figshare. Also provide instructions on how to contribute.

EDIFF for 185 structures

For 185 structures, the VASP log files uploaded on Figshare are for EDIFF = 1e-4 rather than 1e-6. The corrected files will be uploaded in v9 of the QMOF Database.
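
To check which convergence setting a given set of downloaded files corresponds to, the EDIFF tag can be read from the input text. This is a rough sketch that assumes INCAR-style text with one "TAG = value" pair per line; read_ediff is a hypothetical helper, not part of the QMOF scripts.

```python
import re

# Match "EDIFF = <number>" at the start of a line; EDIFFG does not match
# because the tag name must be followed directly by optional spaces and "=".
_EDIFF = re.compile(r"^\s*EDIFF\s*=\s*([0-9.+\-eEdD]+)", re.MULTILINE)


def read_ediff(incar_text):
    """Return EDIFF as a float from INCAR-style text, or None if it is not set."""
    match = _EDIFF.search(incar_text)
    if match is None:
        return None
    # Fortran-style exponents (1d-6) are normalized before conversion.
    return float(match.group(1).replace("d", "e").replace("D", "E"))
```

A returned value of 1e-4 rather than 1e-6 would indicate one of the affected structures.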
