
mir-group / flare

278 stars · 20 watchers · 64 forks · 173.15 MB

An open-source Python package for creating fast and accurate interatomic potentials.

Home Page: https://mir-group.github.io/flare

License: MIT License

Python 59.55% C++ 39.01% CMake 1.23% Shell 0.16% C 0.05%
Topics: kokkos

flare's Introduction


NOTE: This is the latest release, 1.3.3, which includes significant changes compared to the previous version 0.2.4. Please check the updated tutorials and documentation via the links below.

FLARE: Fast Learning of Atomistic Rare Events

FLARE is an open-source Python package for creating fast and accurate interatomic potentials.

Major Features

Note:

We implement the sparse GP, with all of its kernels and descriptors, in C++ with a Python interface.

We implement the full GP, mapped GP, RBCM, squared exponential kernel, and 2+3-body descriptors in Python.

Please do NOT mix the two implementations.

Documentation and Tutorials

Documentation of the code can be accessed here: https://mir-group.github.io/flare

Applications using FLARE and gallery

Google Colab Tutorials

FLARE (ACE descriptors + sparse GP). This tutorial shows how to run flare with ACE and SGP on energy and force data, demoing "offline" training on the MD17 dataset and "online" on-the-fly training of a simple aluminum force field. All training runs are configured with YAML files.

FLARE (LAMMPS active learning). This tutorial demonstrates new functionality for running active learning entirely within LAMMPS, with LAMMPS running the dynamics, allowing arbitrarily complex molecular dynamics workflows while maintaining a simple interface. It also demonstrates how to use the C++ API directly from Python through pybind11. Finally, there is a simple demonstration of phonon calculations with FLARE using phonopy.

FLARE (ACE descriptors + sparse GP) with LAMMPS. This tutorial shows how to compile LAMMPS with the FLARE pair style and uncertainty compute code, and how to use LAMMPS for Bayesian active learning and uncertainty-aware molecular dynamics.

Compute thermal conductivity from FLARE and the Boltzmann transport equation. This tutorial shows how to use a FLARE (LAMMPS) potential to compute lattice thermal conductivity via the Boltzmann transport equation, with Phono3py for the force-constant calculations and Phoebe for the thermal conductivities.

Using your own customized descriptors with FLARE. This tutorial shows how to attach your own descriptors to the FLARE sparse GP model and run training and testing.

All the tutorials take a few minutes to run on a normal desktop computer or laptop (excluding installation time).

Installation

Pip installation

Please check the installation guide here. This will take a few minutes on a normal desktop computer or laptop.

Developer's installation guide

For developers, please check the installation guide.

Compiling LAMMPS

See documentation on compiling LAMMPS with FLARE

Troubleshooting

If you have problems compiling or installing the code, please check the FAQs to see if your problem is covered. Otherwise, please open an issue or contact us.

System requirements

Software dependencies

  • GCC 9
  • Python 3
  • pip>=20

MKL is recommended but not required. All other software dependencies are taken care of by pip.

The code is built and tested with GitHub Actions using the GCC 9 compiler. (You can find a summary of recent builds here.) Other C++ compilers may work, but we cannot guarantee this.

Operating systems

flare++ is tested on a Linux operating system (Ubuntu 20.04.3), but should also be compatible with Mac and Windows operating systems. If you run into issues running the code on Mac or Windows, please post to the issue board.

Hardware requirements

There are no non-standard hardware requirements to download the software and train simple models; the introductory tutorial can be run on a single CPU. To train large models (10k+ sparse environments), we recommend using a compute node with at least 100 GB of RAM.

Tests

We recommend running unit tests to confirm that FLARE is running properly on your machine. We have implemented our tests using the pytest suite. You can call pytest from the command line in the tests directory.

Instructions (either DFT package will suffice):

pip install pytest
cd tests
pytest

References

If you use FLARE++, including B2 descriptors, the NormalizedDotProduct kernel, and the sparse GP, please cite the following paper:

[1] Vandermause, J., Xie, Y., Lim, J. S., Owen, C. J. & Kozinsky, B. Active learning of reactive Bayesian force fields: application to heterogeneous hydrogen-platinum catalysis dynamics. Nat. Commun. 13, 5183 (2022). https://www.nature.com/articles/s41467-022-32294-0

If you use the FLARE active learning workflow, the full Gaussian process, or the 2-body/3-body kernels in your research, please cite the following paper:

[2] Vandermause, J., Torrisi, S. B., Batzner, S., Xie, Y., Sun, L., Kolpak, A. M. & Kozinsky, B. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events. npj Comput. Mater. 6, 20 (2020). https://doi.org/10.1038/s41524-020-0283-z

If you use the FLARE LAMMPS pair style or MGP (mapped Gaussian process), please cite the following paper:

[3] Xie, Y., Vandermause, J., Sun, L. et al. Bayesian force fields from active learning for simulation of inter-dimensional transformation of stanene. npj Comput. Mater. 7, 40 (2021). https://doi.org/10.1038/s41524-021-00510-y

If you use FLARE PyLAMMPS for training, please cite the following paper:

[4] Xie, Y., Vandermause, J., Ramakers, S., Protik, N. H., Johansson, A. & Kozinsky, B. Uncertainty-aware molecular dynamics from Bayesian active learning for phase transformations and thermal transport in SiC. npj Comput. Mater. 9, 36 (2023).

If you use the FLARE LAMMPS Kokkos pair style with GPU acceleration, please cite the following paper:

[5] Johansson, A., Xie, Y., Owen, C. J., Soo, J., Sun, L., Vandermause, J. & Kozinsky, B. Micron-scale heterogeneous catalysis with Bayesian force fields from first principles and active learning. arXiv:2204.12573 (2022).

flare's People

Contributors

aaronchen0316, aldogl, anjohan, cjowen1, claudiozeni, dmclark17, jonpvandermause, kylebystrom, niklundgren, nw13slx, simonbatzner, smheidrich, spahng, stevetorr, yuuuxie


flare's Issues

is_std_in_bound() method from OTF could be moved to a new file

Related to issue 14: #14

Because is_std_in_bound() is a convenience method for OTF (or the in-development TrajectoryTrainer), it would eliminate redundancy to have it live outside the OTF class. If we end up splitting the predict functions into a separate file, this function would be a natural fit there (given that it diagnoses the result of a prediction on a structure).

Tutorials for OTF, MFF, and MFF pairstyle

Before officially releasing the first version of the code, it would be great to include detailed tutorials in the documentation explaining how to use key features (especially otf, mff, and the mff pairstyle in LAMMPS).

Parallel environment setup

We currently use mpirun, but we should allow the user to define their own MPI environment, such as mpirun, mpiexec, or srun.
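A minimal sketch of what this could look like (the mpi_command parameter and the build_dft_command helper are hypothetical, not existing flare API):

def build_dft_command(dft_exec, n_procs, mpi_command="mpirun"):
    """Assemble a parallel DFT launch command.

    mpi_command may be e.g. "mpirun", "mpiexec", or "srun".
    """
    if mpi_command == "srun":
        # srun takes -n for the task count
        return f"srun -n {n_procs} {dft_exec}"
    # mpirun and mpiexec both accept -np
    return f"{mpi_command} -np {n_procs} {dft_exec}"

# Example:
# build_dft_command("pw.x < scf.in > scf.out", 32, mpi_command="srun")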

Kernel Benchmarks

Before trying to make any kernel optimizations, I thought it would be good to build a suite of benchmarks so we can easily and consistently measure any performance gains.

I am thinking a two-phase benchmark would work well:

  • A setup script that constructs kernel inputs (atomic environments and the required parameters) and writes them to disk in a language-agnostic form, so the data can be used by multiple implementations of the kernel.
  • A Python/Numba implementation test that reads in the data and times the computation of the kernels (see the sketch below).
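A minimal sketch of the timing phase, assuming the setup script has written the kernel inputs to kernel_inputs.npz (the file name and the kernel function signature here are illustrative):

import time
import numpy as np

def benchmark_kernel(kernel_fn, path="kernel_inputs.npz", repeats=10):
    """Time a kernel implementation on pre-generated inputs."""
    data = np.load(path, allow_pickle=True)
    envs1, envs2, hyps = data["envs1"], data["envs2"], data["hyps"]

    start = time.perf_counter()
    for _ in range(repeats):
        for e1 in envs1:
            for e2 in envs2:
                kernel_fn(e1, e2, hyps)
    elapsed = time.perf_counter() - start
    print(f"{kernel_fn.__name__}: {elapsed / repeats:.4f} s per pass")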

OTF restart module/method

Occasionally an OTF run is interrupted (either by a bug, or because the user wants to stop the training and change the conditions) and needs to restart from the middle. It would be good to have a restart module, or a method inside the otf module.
I have a script that does this from when I was training my stanene system. If you have written a better wrapped module, that would be great. A rough sketch of what a checkpoint could look like is below.
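A minimal checkpointing sketch (the attribute names on the OTF object, such as curr_step, are hypothetical):

import pickle

def save_checkpoint(otf_run, path="otf_checkpoint.pkl"):
    """Dump the pieces needed to resume an OTF run."""
    state = {
        "gp": otf_run.gp,                # trained GP model
        "structure": otf_run.structure,  # current atomic configuration
        "step": otf_run.curr_step,       # MD step counter (hypothetical name)
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(otf_run, path="otf_checkpoint.pkl"):
    """Restore a previously saved OTF state."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    otf_run.gp = state["gp"]
    otf_run.structure = state["structure"]
    otf_run.curr_step = state["step"]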

Serialization and representation (e.g. __str__) methods for various objects

Methods for serializing certain objects which are passed between models (e.g. atomic environments, structures), or even the models themselves, would be useful. The advantage over pickled objects is that serialized objects can be more human-readable (and I understand that pickled objects carry some security risks).

One example application: JSON objects are easily storable in certain database architectures. This might be relevant for e.g. FOOGA in the near future if we want to automate the process of training GP models for different datasets, as it would let us store them more easily.

In my development branch I have done this for the AtomicEnvironment object. There are ways we could standardize this or easily implement it across the codebase (e.g. by using Monty, which provides an object type that allows for effortless JSON serialization of different Python objects); a sketch follows below.
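A minimal sketch of the Monty-style as_dict / from_dict pattern (the class and its fields are an illustrative stand-in, not AtomicEnvironment's real attribute set):

import json
import numpy as np

class AtomicEnvironmentStub:
    """Illustrative stand-in; not the real AtomicEnvironment fields."""
    def __init__(self, positions, species, cutoff):
        self.positions = np.asarray(positions)
        self.species = list(species)
        self.cutoff = cutoff

    def as_dict(self):
        # Convert arrays to lists so the dict is JSON-serializable.
        return {
            "positions": self.positions.tolist(),
            "species": self.species,
            "cutoff": self.cutoff,
        }

    @classmethod
    def from_dict(cls, d):
        return cls(d["positions"], d["species"], d["cutoff"])

# Human-readable JSON round trip:
env = AtomicEnvironmentStub([[0.0, 0.0, 0.0]], ["H"], 5.0)
env2 = AtomicEnvironmentStub.from_dict(json.loads(json.dumps(env.as_dict())))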

update_L_alpha should be parallelized

set_L_alpha gives the option to compute the covariance matrix in parallel. It would be helpful to have the same option for update_L_alpha, especially for large GPs; see the sketch below.
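A minimal sketch of how the new covariance rows could be farmed out with multiprocessing (the kernel and training_data names are illustrative, and the kernel function must be picklable):

from multiprocessing import Pool

def _kernel_row(args):
    """Covariance of one new environment against the training set."""
    env_new, training_data, kernel, hyps = args
    return [kernel(env_new, env_old, hyps) for env_old in training_data]

def new_cov_rows(new_envs, training_data, kernel, hyps, n_cpus=4):
    tasks = [(env, training_data, kernel, hyps) for env in new_envs]
    with Pool(n_cpus) as pool:
        return pool.map(_kernel_row, tasks)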

VASP parsers and utility

It would be helpful to (short term) provide built-in methods to parse VASP files for model training, and (long term) to support a VASP interface for OTF runs.

ASE interface unit test

Open a new branch. Tasks:

  1. calculator
  2. OTF MD
  3. test the interface with different DFT calculators

VASP (potentially more) Parser Utility Functions

One feature that would help accelerate the GP-from-AIMD pipeline is a set of helper functions that parse DFT outputs (like VASP's) and turn them into a file of serialized structures decorated with force information. A second helper function could generate atomic environments from a .json file. Using libraries like pymatgen for parsing would be extremely welcome here, as they have high-quality, externally maintained parsers for VASP. These wrappers would be simple to implement and useful; a sketch follows below.

Relevant to #19.
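A minimal sketch using pymatgen's Vasprun parser (Vasprun and ionic_steps are real pymatgen API; the serialized frame format is illustrative):

import json
from pymatgen.io.vasp.outputs import Vasprun

def vasprun_to_frames(path="vasprun.xml"):
    """Parse an AIMD vasprun.xml into force-decorated frames."""
    run = Vasprun(path, parse_dos=False, parse_eigen=False)
    frames = []
    for step in run.ionic_steps:
        struct = step["structure"]
        frames.append({
            "cell": struct.lattice.matrix.tolist(),
            "species": [str(site.specie) for site in struct],
            "positions": struct.cart_coords.tolist(),
            "forces": step["forces"],
        })
    return frames

# with open("frames.json", "w") as f:
#     json.dump(vasprun_to_frames(), f)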

To / From methods for Structure object & ASE Atoms

@YuuuuXie may have already done this, but:

  1. A static method for turning ASE Atoms objects into FLARE Structures would help when generating structures using ASE.
  2. A method for turning FLARE Structures into ASE Atoms objects would also be nice.

Both of these would enhance user pre-processing flexibility; a sketch follows below.
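A minimal sketch of the two conversions, assuming a FLARE Structure constructed from cell, species, and positions (the import path and the attribute names on the FLARE side are assumptions from the 0.x-era code base):

from ase import Atoms
from flare.struc import Structure  # assumed 0.x-era path; may differ

def from_ase_atoms(atoms):
    return Structure(
        cell=atoms.get_cell().array,
        species=atoms.get_chemical_symbols(),
        positions=atoms.get_positions(),
    )

def to_ase_atoms(structure):
    return Atoms(
        symbols=structure.species_labels,  # assumed attribute
        positions=structure.positions,     # assumed attribute
        cell=structure.cell,
        pbc=True,
    )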

Flare io.py

Write a flare.io module: an input/output module specifically for flare data structures. It should take md_trajectory_to/from_file from vasp_util.py and move them here.

Continuous Integration

Travis or CircleCI would be useful to look into for continuous integration, so we can reduce friction in our pull requests and development.

Unit tests should skip instead of fail for users without QE

As is, the tests which involve calling Espresso fail if the PWSCF command is not found.
It would be nicer if, when the PWSCF command is not detected, the unit tests that call QE were skipped rather than failed. This may be more informative to the user, since there are other reasons a call to QE could fail that would require debugging.
We could also print a message encouraging the user to fix their environment variable. A sketch using pytest's skip mechanism is below.
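A minimal sketch using pytest's real skipif marker (the PWSCF_COMMAND environment variable name follows the issue; the test body is illustrative):

import os
import pytest

needs_qe = pytest.mark.skipif(
    not os.environ.get("PWSCF_COMMAND"),
    reason="PWSCF command not found; set PWSCF_COMMAND to run QE tests.",
)

@needs_qe
def test_qe_call():
    ...  # tests that invoke Quantum ESPRESSO go here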

naming convention for new branches

This idea is courtesy of @dmclark17. To keep branches organized and easy to navigate, I propose we adhere to the following convention for naming branches:

type of change/branch owner/description of change

Some examples:
bug/jon/gp-hotfix
feature/jon/cool-new-kernel
docs/jon/env-docstrings
etc.

Job hangs due to memory issue?

The flare code can hang in a SLURM job on Odyssey. It could be related to the memory setup; specifying the memory for SLURM and raising the stack size limit can help:

#SBATCH --mem-per-cpu=6000
ulimit -s unlimited

But we should look at memory profiling for the code at some point.

Z_to_element method

Would be a three-liner in util.py to have a rough version, inverting the existing _element_to_Z dictionary:

_Z_to_element = {z: elt for elt, z in _element_to_Z.items()}

def Z_to_element(Z):
    return _Z_to_element[Z]

Would be useful for mapping 'coded species' integers to species names as @nw13slx mentioned in the comments for issue 28. I'll add this later once I learn how to use branches (or somebody else is welcome to add it, no need to wait for me :) )
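A self-contained version for illustration (the real _element_to_Z in util.py covers the full periodic table; only a fragment is shown here):

# Fragment of the element-to-atomic-number map, for illustration only.
_element_to_Z = {"H": 1, "He": 2, "C": 6, "Si": 14}
_Z_to_element = {z: elt for elt, z in _element_to_Z.items()}

def Z_to_element(Z):
    """Map an atomic number back to its chemical symbol."""
    return _Z_to_element[Z]

assert Z_to_element(14) == "Si"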

mc kernel issue for test_mff.py

@YuuuuXie, I think the import statement in test_mff.py needs to be updated, as it references the mc_kernels module, which has since moved/changed:

[screenshot of the resulting import error]

np Array typehints in Sphinx

NumPy arrays behave strangely in the Sphinx documentation when used as type hints; this has something to do with their 'mock import' in the configuration. I will look into this and see if it can be fixed so that they render correctly in Sphinx.

Predict methods in OTF should be moved outside of class

Hat tip to Lixin and Yu, who discovered problem (1) below after laborious debugging:

  1. Apparently parallelization in Python fails when multiple processes operate on the same instance of the same class object.
  2. Moving the predict functions out of OTF also keeps the class itself smaller and focused, while freeing the predict functions up for other purposes.

For instance, the gp_from_aimd module I'm developing has great cause to use the predict functions, and to avoid duplicating code, having them in a different file allows them to be called without an OTF instance. I have implemented this in my development branch for reviving gp_from_aimd, and a sketch of the refactor follows below. Lixin has done the same in hers, so one of us will push it eventually.
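A minimal sketch of the refactor (assuming the 0.x-era AtomicEnvironment and GaussianProcess APIs; signatures may differ). Because the module-level function touches no OTF instance, it can safely be farmed out to multiple processes:

from flare.env import AtomicEnvironment  # assumed 0.x-era path

def predict_on_atom(structure, atom_index, gp):
    """Predict force components and uncertainties for one atom.

    Safe to call from multiprocessing workers: no shared OTF state.
    """
    env = AtomicEnvironment(structure, atom_index, gp.cutoffs)
    forces, stds = [], []
    for d in range(1, 4):  # x, y, z force components
        mean, var = gp.predict(env, d)
        forces.append(mean)
        stds.append(var ** 0.5)
    return forces, stds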

adding npool option to QE otf jobs

It would be helpful to have the option to parallelize efficiently over multiple compute nodes (i.e. with the -npool flag) for large, expensive OTF simulations. Should be an easy fix: just add an option in otf.py to use run_dft_npool instead of run_dft_par.

small bug in update_db method of gp

#27 snuck a call to set_L_alpha() into the update_db method of the GP. This leads to inefficiencies when constructing large GP models from hundreds of structures, since the covariance matrix is recomputed from scratch every time a new structure is added to the training set. Eliminating the call to set_L_alpha resolves the issue.

output files formatted as simply as possible

We should make as many output files as possible use a simple column format, which can be easily read by numpy.loadtxt; a sketch of the round trip is below.

@YuuuuXie @jonpvandermause
Could you please put together a list of outputs that can be formatted this way? E.g. hyperparameters, MAE/likelihood at each step, ...
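A minimal sketch of the round trip (the column names are illustrative):

import numpy as np

# Write: one row per training step, whitespace-separated columns.
steps = np.arange(5)
mae = np.linspace(0.5, 0.1, 5)
likelihood = np.linspace(-100.0, -50.0, 5)
np.savetxt(
    "training_log.txt",
    np.column_stack([steps, mae, likelihood]),
    header="step mae likelihood",  # written with a leading '#', so loadtxt skips it
)

# Read it straight back:
data = np.loadtxt("training_log.txt")
steps, mae, likelihood = data.T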
