
mir-group / flare

278 stars · 20 watchers · 64 forks · 173.15 MB

An open-source Python package for creating fast and accurate interatomic potentials.

Home Page: https://mir-group.github.io/flare

License: MIT License

Python 59.55% C++ 39.01% CMake 1.23% Shell 0.16% C 0.05%
Topics: kokkos

flare's Introduction


NOTE: This is the latest release, 1.3.3, which includes significant changes compared to the previous version 0.2.4. Please check the updated tutorials and documentation via the links below.

FLARE: Fast Learning of Atomistic Rare Events

FLARE is an open-source Python package for creating fast and accurate interatomic potentials.

Major Features

Note:

We implement the sparse GP, with all of its kernels and descriptors, in C++ with a Python interface.

We implement the full GP, mapped GP, RBCM, squared exponential kernel, and 2+3-body descriptors in Python.

Please do NOT mix the two implementations.

Documentation and Tutorials

Documentation of the code can be accessed here: https://mir-group.github.io/flare

Applications using FLARE and gallery

Google Colab Tutorials

FLARE (ACE descriptors + sparse GP). This tutorial shows how to run flare with ACE and SGP on energy and force data, demoing "offline" training on the MD17 dataset and "online" on-the-fly training of a simple aluminum force field. All training runs are configured with YAML files.

FLARE (LAMMPS active learning). This tutorial demonstrates new functionality for running active learning entirely within LAMMPS, with LAMMPS running the dynamics, allowing arbitrarily complex molecular dynamics workflows while maintaining a simple interface. It also demonstrates how to use the C++ API directly from Python through pybind11. Finally, there is a simple demonstration of phonon calculations with FLARE using phonopy.

FLARE (ACE descriptors + sparse GP) with LAMMPS. This tutorial shows how to compile LAMMPS with the FLARE pair style and uncertainty compute code, and how to use LAMMPS for Bayesian active learning and uncertainty-aware molecular dynamics.

Compute thermal conductivity from FLARE and the Boltzmann transport equation. This tutorial shows how to use a FLARE (LAMMPS) potential to compute lattice thermal conductivity via the Boltzmann transport equation, with Phono3py for the force-constant calculations and Phoebe for the thermal conductivities.

Using your own customized descriptors with FLARE. This tutorial shows how to attach your own descriptors to the FLARE sparse GP model and run training and testing.

All the tutorials take a few minutes to run on a normal desktop computer or laptop (excluding installation time).

Installation

Pip installation

Please check the installation guide here. This will take a few minutes on a normal desktop computer or laptop.

Developer's installation guide

For developers, please check the installation guide.

Compiling LAMMPS

See documentation on compiling LAMMPS with FLARE

Troubleshooting

If you have problems compiling or installing the code, please check the FAQs to see if your problem is covered. Otherwise, please open an issue or contact us.

System requirements

Software dependencies

  • GCC 9
  • Python 3
  • pip>=20

MKL is recommended but not required. All other software dependencies are taken care of by pip.

The code is built and tested with GitHub Actions using the GCC 9 compiler. (You can find a summary of recent builds here.) Other C++ compilers may work, but we cannot guarantee this.

Operating systems

flare++ is tested on a Linux operating system (Ubuntu 20.04.3), but should also be compatible with Mac and Windows operating systems. If you run into issues running the code on Mac or Windows, please post to the issue board.

Hardware requirements

There are no non-standard hardware requirements to download the software and train simple models; the introductory tutorial can be run on a single CPU. To train large models (10k+ sparse environments), we recommend using a compute node with at least 100 GB of RAM.

Tests

We recommend running unit tests to confirm that FLARE is running properly on your machine. We have implemented our tests using the pytest suite. You can call pytest from the command line in the tests directory.

Instructions (either DFT package will suffice):

pip install pytest
cd tests
pytest

References

If you use FLARE++, including B2 descriptors, the NormalizedDotProduct kernel, and the sparse GP, please cite the following paper:

[1] Vandermause, J., Xie, Y., Lim, J. S., Owen, C. J. & Kozinsky, B. Active learning of reactive Bayesian force fields: application to heterogeneous hydrogen-platinum catalysis dynamics. Nat. Commun. 13, 5183 (2022). https://www.nature.com/articles/s41467-022-32294-0

If you use the FLARE active learning workflow, the full Gaussian process, or the 2-body/3-body kernels in your research, please cite the following paper:

[2] Vandermause, J., Torrisi, S. B., Batzner, S., Xie, Y., Sun, L., Kolpak, A. M. & Kozinsky, B. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events. npj Comput. Mater. 6, 20 (2020). https://doi.org/10.1038/s41524-020-0283-z

If you use the FLARE LAMMPS pair style or MGP (mapped Gaussian process), please cite the following paper:

[3] Xie, Y., Vandermause, J., Sun, L. et al. Bayesian force fields from active learning for simulation of inter-dimensional transformation of stanene. npj Comput. Mater. 7, 40 (2021). https://doi.org/10.1038/s41524-021-00510-y

If you use FLARE PyLAMMPS for training, please cite the following paper:

[4] Xie, Y., Vandermause, J., Ramakers, S., Protik, N. H., Johansson, A. & Kozinsky, B. Uncertainty-aware molecular dynamics from Bayesian active learning for phase transformations and thermal transport in SiC. npj Comput. Mater. 9, 36 (2023).

If you use the FLARE LAMMPS Kokkos pair style with GPU acceleration, please cite the following paper:

[5] Johansson, A., Xie, Y., Owen, C. J., Soo, J., Sun, L., Vandermause, J. & Kozinsky, B. Micron-scale heterogeneous catalysis with Bayesian force fields from first principles and active learning. arXiv:2204.12573 (2022).

flare's People

Contributors

aaronchen0316, aldogl, anjohan, cjowen1, claudiozeni, dmclark17, jonpvandermause, kylebystrom, niklundgren, nw13slx, simonbatzner, smheidrich, spahng, stevetorr, yuuuxie


flare's Issues

is_std_in_bound() method from OTF could be moved to a new file

Related to issue 14: #14

Because is_std_in_bound() is a convenience method for OTF (or the in-development TrajectoryTrainer), it would eliminate redundancy to have it live outside the OTF class. If we end up splitting the predict functions into a separate file, this function would be a natural fit there (given that it diagnoses the result of a prediction on a structure).

Tutorials for OTF, MFF, and MFF pairstyle

Before officially releasing the first version of the code, it would be great to include detailed tutorials in the documentation explaining how to use key features (especially otf, mff, and the mff pairstyle in LAMMPS).

Parallel environment setup

We currently use mpirun, but we should allow the user to define their own MPI environment, such as mpirun, mpiexec, or srun.
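A minimal sketch of what this could look like (the mpi_command parameter and the build_dft_command helper are hypothetical, not existing flare API):

def build_dft_command(dft_exec, n_procs, mpi_command="mpirun"):
    """Assemble a parallel DFT launch command.

    mpi_command may be e.g. "mpirun", "mpiexec", or "srun".
    """
    if mpi_command == "srun":
        # srun takes -n for the task count
        return f"srun -n {n_procs} {dft_exec}"
    # mpirun and mpiexec both accept -np
    return f"{mpi_command} -np {n_procs} {dft_exec}"

# Example:
# build_dft_command("pw.x < scf.in > scf.out", 32, mpi_command="srun")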

Kernel Benchmarks

Before trying to make any kernel optimizations, I thought it would be good to build a suite of benchmarks so we can easily and consistently measure any performance gains.

I am thinking a two-phase benchmark would work well:

  • A setup script that constructs kernel inputs (atomic environments and the required parameters) and writes them to disk in a language-agnostic form, so the data can be used by multiple implementations of the kernel.
  • A Python/Numba implementation test that reads in the data and times the computation of the kernels (see the sketch below).
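A minimal sketch of the timing phase, assuming the setup script has written the kernel inputs to kernel_inputs.npz (the file name and the kernel function signature here are illustrative):

import time
import numpy as np

def benchmark_kernel(kernel_fn, path="kernel_inputs.npz", repeats=10):
    """Time a kernel implementation on pre-generated inputs."""
    data = np.load(path, allow_pickle=True)
    envs1, envs2, hyps = data["envs1"], data["envs2"], data["hyps"]

    start = time.perf_counter()
    for _ in range(repeats):
        for e1 in envs1:
            for e2 in envs2:
                kernel_fn(e1, e2, hyps)
    elapsed = time.perf_counter() - start
    print(f"{kernel_fn.__name__}: {elapsed / repeats:.4f} s per pass")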

OTF restart module/method

Occasionally an OTF run is interrupted (either by a bug, or because the user wants to stop the training and change the conditions) and needs to restart from the middle. It would be good to have a restart module, or a method inside the otf module.
I have a script that does this from when I was training my stanene system. If you have written a better wrapped module, that would be great. A rough sketch of what a checkpoint could look like is below.
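A minimal checkpointing sketch (the attribute names on the OTF object, such as curr_step, are hypothetical):

import pickle

def save_checkpoint(otf_run, path="otf_checkpoint.pkl"):
    """Dump the pieces needed to resume an OTF run."""
    state = {
        "gp": otf_run.gp,                # trained GP model
        "structure": otf_run.structure,  # current atomic configuration
        "step": otf_run.curr_step,       # MD step counter (hypothetical name)
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(otf_run, path="otf_checkpoint.pkl"):
    """Restore a previously saved OTF state."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    otf_run.gp = state["gp"]
    otf_run.structure = state["structure"]
    otf_run.curr_step = state["step"]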

Serialization and representation (e.g. __str__) methods for various objects

Methods for serializing certain objects which are passed between models (e.g. atomic environments, structures), or even the models themselves, would be useful. The advantage over pickled objects is that serialized objects can be more human-readable (and I understand that pickled objects carry some security risks).

One example application: JSON objects are easily storable in certain database architectures. This might be relevant for e.g. FOOGA in the near future if we want to automate the process of training GP models for different datasets, as it would let us store them more easily.

In my development branch I have done this for the AtomicEnvironment object. There are ways we could standardize this or easily implement it across the codebase (e.g. by using Monty, which provides an object type that allows for effortless JSON serialization of different Python objects); a sketch follows below.
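A minimal sketch of the Monty-style as_dict / from_dict pattern (the class and its fields are an illustrative stand-in, not AtomicEnvironment's real attribute set):

import json
import numpy as np

class AtomicEnvironmentStub:
    """Illustrative stand-in; not the real AtomicEnvironment fields."""
    def __init__(self, positions, species, cutoff):
        self.positions = np.asarray(positions)
        self.species = list(species)
        self.cutoff = cutoff

    def as_dict(self):
        # Convert arrays to lists so the dict is JSON-serializable.
        return {
            "positions": self.positions.tolist(),
            "species": self.species,
            "cutoff": self.cutoff,
        }

    @classmethod
    def from_dict(cls, d):
        return cls(d["positions"], d["species"], d["cutoff"])

# Human-readable JSON round trip:
env = AtomicEnvironmentStub([[0.0, 0.0, 0.0]], ["H"], 5.0)
env2 = AtomicEnvironmentStub.from_dict(json.loads(json.dumps(env.as_dict())))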

update_L_alpha should be parallelized

set_L_alpha gives the option to compute the covariance matrix in parallel. It would be helpful to have the same option for update_L_alpha, especially for large GPs; see the sketch below.
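A minimal sketch of how the new covariance rows could be farmed out with multiprocessing (the kernel and training_data names are illustrative, and the kernel function must be picklable):

from multiprocessing import Pool

def _kernel_row(args):
    """Covariance of one new environment against the training set."""
    env_new, training_data, kernel, hyps = args
    return [kernel(env_new, env_old, hyps) for env_old in training_data]

def new_cov_rows(new_envs, training_data, kernel, hyps, n_cpus=4):
    tasks = [(env, training_data, kernel, hyps) for env in new_envs]
    with Pool(n_cpus) as pool:
        return pool.map(_kernel_row, tasks)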

VASP parsers and utility

It would be helpful to (short term) provide built-in methods to parse VASP files for model training, and (long term) to support a VASP interface for OTF runs.

ASE interface unit test

Open a new branch. Tasks:

  1. calculator
  2. OTF MD
  3. test the interface with different DFT calculators

VASP (potentially more) Parser Utility Functions

One feature that would help accelerate the GP-from-AIMD pipeline is a set of helper functions that parse DFT outputs (like VASP's) and turn them into a file of serialized structures decorated with force information. A second helper function could generate atomic environments from a .json file. Using libraries like pymatgen for parsing would be extremely welcome here, as they have high-quality, externally maintained parsers for VASP. These wrappers would be simple to implement and useful; a sketch follows below.

Relevant to #19.
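A minimal sketch using pymatgen's Vasprun parser (Vasprun and ionic_steps are real pymatgen API; the serialized frame format is illustrative):

import json
from pymatgen.io.vasp.outputs import Vasprun

def vasprun_to_frames(path="vasprun.xml"):
    """Parse an AIMD vasprun.xml into force-decorated frames."""
    run = Vasprun(path, parse_dos=False, parse_eigen=False)
    frames = []
    for step in run.ionic_steps:
        struct = step["structure"]
        frames.append({
            "cell": struct.lattice.matrix.tolist(),
            "species": [str(site.specie) for site in struct],
            "positions": struct.cart_coords.tolist(),
            "forces": step["forces"],
        })
    return frames

# with open("frames.json", "w") as f:
#     json.dump(vasprun_to_frames(), f)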

To / From methods for Structure object & ASE Atoms

@YuuuuXie may have already done this, but:

  1. A static method for turning ASE Atoms objects into FLARE Structures would help when generating structures using ASE.
  2. A method for turning FLARE Structures into ASE Atoms objects would also be nice.

Both of these would enhance user pre-processing flexibility; a sketch follows below.
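A minimal sketch of the two conversions, assuming a FLARE Structure constructed from cell, species, and positions (the import path and the attribute names on the FLARE side are assumptions from the 0.x-era code base):

from ase import Atoms
from flare.struc import Structure  # assumed 0.x-era path; may differ

def from_ase_atoms(atoms):
    return Structure(
        cell=atoms.get_cell().array,
        species=atoms.get_chemical_symbols(),
        positions=atoms.get_positions(),
    )

def to_ase_atoms(structure):
    return Atoms(
        symbols=structure.species_labels,  # assumed attribute
        positions=structure.positions,     # assumed attribute
        cell=structure.cell,
        pbc=True,
    )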

Flare io.py

Write a flare.io module: an input/output module specifically for flare data structures. It should take md_trajectory_to/from_file from vasp_util.py and move them here.

Continuous Integration

Travis or CircleCI would be useful to look into for continuous integration, so we can reduce friction in our pull requests and development.

Unit tests should skip instead of fail for users without QE

As is, the tests which involve calling Espresso fail if the PWSCF command is not found.
It would be nicer if, when the PWSCF command is not detected, the unit tests that call QE were skipped rather than failed. This may be more informative to the user, since there are other reasons a call to QE could fail that would require debugging.
We could also print a message encouraging the user to fix their environment variable. A sketch using pytest's skip mechanism is below.
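A minimal sketch using pytest's real skipif marker (the PWSCF_COMMAND environment variable name follows the issue; the test body is illustrative):

import os
import pytest

needs_qe = pytest.mark.skipif(
    not os.environ.get("PWSCF_COMMAND"),
    reason="PWSCF command not found; set PWSCF_COMMAND to run QE tests.",
)

@needs_qe
def test_qe_call():
    ...  # tests that invoke Quantum ESPRESSO go here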

naming convention for new branches

This idea is courtesy of @dmclark17. To keep branches organized and easy to navigate, I propose we adhere to the following convention for naming branches:

type of change/branch owner/description of change

Some examples:
bug/jon/gp-hotfix
feature/jon/cool-new-kernel
docs/jon/env-docstrings
etc.

Job hangs due to memory issue?

The flare code can hang in a SLURM job on Odyssey. It could be related to the memory setup; specifying the memory for SLURM and raising the stack size limit can help:

#SBATCH --mem-per-cpu=6000
ulimit -s unlimited

But we should look at memory profiling for the code at some point.

Z_to_element method

Would be a three-liner in util.py to have a rough version, inverting the existing _element_to_Z dictionary:

_Z_to_element = {z: elt for elt, z in _element_to_Z.items()}

def Z_to_element(Z):
    return _Z_to_element[Z]

Would be useful for mapping 'coded species' integers to species names as @nw13slx mentioned in the comments for issue 28. I'll add this later once I learn how to use branches (or somebody else is welcome to add it, no need to wait for me :) )
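A self-contained version for illustration (the real _element_to_Z in util.py covers the full periodic table; only a fragment is shown here):

# Fragment of the element-to-atomic-number map, for illustration only.
_element_to_Z = {"H": 1, "He": 2, "C": 6, "Si": 14}
_Z_to_element = {z: elt for elt, z in _element_to_Z.items()}

def Z_to_element(Z):
    """Map an atomic number back to its chemical symbol."""
    return _Z_to_element[Z]

assert Z_to_element(14) == "Si"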

mc kernel issue for test_mff.py

@YuuuuXie, I think the import statement in test_mff.py needs to be updated, as it references the mc_kernels module, which has since moved/changed:

[screenshot of the resulting import error]

np Array typehints in Sphinx

NumPy arrays behave strangely in the Sphinx documentation when used as type hints; this has something to do with their 'mock import' in the configuration. I will look into this and see if it can be fixed so that they render correctly in Sphinx.

Predict methods in OTF should be moved outside of class

Hat tip to Lixin and Yu, who discovered problem (1) below after laborious debugging:

  1. Apparently parallelization in Python fails when multiple processes operate on the same instance of the same class object.
  2. Moving the predict functions out of OTF also keeps the class itself smaller and focused, while freeing the predict functions up for other purposes.

For instance, the gp_from_aimd module I'm developing has great cause to use the predict functions, and to avoid duplicating code, having them in a different file allows them to be called without an OTF instance. I have implemented this in my development branch for reviving gp_from_aimd, and a sketch of the refactor follows below. Lixin has done the same in hers, so one of us will push it eventually.
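A minimal sketch of the refactor (assuming the 0.x-era AtomicEnvironment and GaussianProcess APIs; signatures may differ). Because the module-level function touches no OTF instance, it can safely be farmed out to multiple processes:

from flare.env import AtomicEnvironment  # assumed 0.x-era path

def predict_on_atom(structure, atom_index, gp):
    """Predict force components and uncertainties for one atom.

    Safe to call from multiprocessing workers: no shared OTF state.
    """
    env = AtomicEnvironment(structure, atom_index, gp.cutoffs)
    forces, stds = [], []
    for d in range(1, 4):  # x, y, z force components
        mean, var = gp.predict(env, d)
        forces.append(mean)
        stds.append(var ** 0.5)
    return forces, stds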

adding npool option to QE otf jobs

It would be helpful to have the option to parallelize efficiently over multiple compute nodes (i.e. with the -npool flag) for large, expensive OTF simulations. Should be an easy fix: just add an option in otf.py to use run_dft_npool instead of run_dft_par.

small bug in update_db method of gp

#27 snuck a call to set_L_alpha() into the update_db method of the GP. This leads to inefficiencies when constructing large GP models from hundreds of structures, since the covariance matrix is recomputed from scratch every time a new structure is added to the training set. Eliminating the call to set_L_alpha resolves the issue.

output files formatted as simply as possible

We should make as many output files as possible use a simple column format, which can be easily read by numpy.loadtxt; a sketch of the round trip is below.

@YuuuuXie @jonpvandermause
Could you please put together a list of outputs that can be formatted this way? E.g. hyperparameters, MAE/likelihood at each step, ...
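A minimal sketch of the round trip (the column names are illustrative):

import numpy as np

# Write: one row per training step, whitespace-separated columns.
steps = np.arange(5)
mae = np.linspace(0.5, 0.1, 5)
likelihood = np.linspace(-100.0, -50.0, 5)
np.savetxt(
    "training_log.txt",
    np.column_stack([steps, mae, likelihood]),
    header="step mae likelihood",  # written with a leading '#', so loadtxt skips it
)

# Read it straight back:
data = np.loadtxt("training_log.txt")
steps, mae, likelihood = data.T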
