Coder Social home page Coder Social logo

lj_surrogates's Introduction

LJ_surrogates

This repository accompanies this manuscript: Using physical property surrogate models to perform multi-fidelity global optimization of force field parameters

Drivers used to prepare parameter sets, process data inputs, build surrogates and run optimizations are found in LJ_surrogates. Specific files used to set up experiments in the paper are found in projects/pure-only/integrated and projects/mixture-only. Data files containing the output of all simulations (i.e. force fields and physical property calculation results) used in the optimization, as well any relevant benchmarking outputs, can be found in data

Multi-fidelity optimization workflow

An example surrogate-building workflow (the one used in the paper) is provided in projects/pure-only/intergated/optimize.py. This workflow is intended to run on HPC resources (using GPU resources while requesting simulations using OpenFF Evaluator, and using a single CPU resource to perform the optimization and surrogate building. In order to adapt this code to specific HPC resources, the IntegratedOptimizer.setup_server method may need to be edited; see the OpenFF Evaluator Documentation. The SurrogateDESearchOptimizer is initialized with the input forcefield and input experimental physical property dataset. The optimize method drives the multifidelity optimization, and takes the following inputs: param_range: Relative ranges of each parameter to build the initial parameter space box.

smirks: List of vdw SMIRKS types to be optimized

max_simulations: The maximum number of simulated data sets that the optimizer will request (initial simulations + additional simulations during the optimization process)

initial_samples: The number of initial parameter sets to create with LHS sampling and simulate, before the initial surrogate model is built. The initial force field is also included, but does not count in initial_samples (So requesting 9 initial samples will leave you with 9 + 1 = 10 parameter datasets to build surrogates with initially)

n_workers: The number of workers for the Evaluator server to request from HPC resources

use_cached_data: Whether to use already existing data (from a previous checkpoint) to build surrogates. Counts against max_simulations (i.e., if you supply 10 dataset/force field pairs in the cached data and you set max_simulation=15, the optimizer will only call the simulation level 5 more times.

cached_data_location: The filepath to the cached data that should be used. Cached data should be in the form of the directory passed to collate_physical_property_data.

The script in the example will perform the N=10 optimization described in the paper.

Building Surrogates

The function to build surrogates are containing in the surrogates/collate_data.py module. Specifically, the collate_physical_property_data function will create a ParameterSetDataMultiplex object which contains the experimental reference data, simulated parameter sets, physical properties simulated from those parameter sets, and surrogates built from that data. The surrogates, built with botorch, expect N OpenFF Evaluator PhysicalPropertyDataSet objects (see here) as an input (each containing M physical properties) as training outputs, and N parameter vectors (taken from initially supplied .offxml SMIRNOFF spec force field files).

To use this function, you should supply directory: A directory containing N estimated physical property datasets, named estimated_data_set_{n}.json and N corresponding force fields, labeled force_field_{n}.offxml

smirks: a list of the SMIRKS types to build surrogate models for

initial_forcefield: An .offxml file used to extract the initial parameters; can be any simulated forcefield in the dataset.

properties_filepath: A path to a .json file containing the reference physical properties.

device: Which device to use to build the surrogates. Currently, 'cpu' is recommended.

Optimizers

Modifications to the optimization algorithm can be made by creating a class that inherits the base IntegratedOptimizer class from the LJ_surrogates/sampling/integrated_optimization.py module (e.g. FooOptimizer(IntegratedOptimizer)) and overwriting the optimize method.

Bayesian inference

An example notebook demonstrating the use of physical property surrogates to do Bayesian inference for a test problem is available in projects/examples/argon_single

lj_surrogates's People

Contributors

ocmadin avatar simonboothroyd avatar

Stargazers

Pavan Behara avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.