
openforcefield / openff-evaluator


A physical property evaluation toolkit from the Open Force Field Consortium.

Home Page: https://docs.openforcefield.org/projects/evaluator

License: MIT License

Languages: Python 99.95%, Shell 0.05%
Topics: openforcefield, physical-properties, forcefield, force-field, validation, assessment, openmm, thermoml

openff-evaluator's Introduction


The Open Force Field toolkit

The Open Force Field Toolkit, built by the Open Force Field Initiative, is a Python toolkit for the development and application of modern molecular mechanics force fields based on direct chemical perception and rigorous statistical parameterization methods.

The toolkit currently covers two main areas that we have committed to maintaining stably throughout their lifetimes.

Note: Prior to version 0.9.0, this toolkit and its associated repository were named openforcefield and used different import paths. For details on this change and migration instructions, see the release notes of version 0.9.0.

Documentation

Documentation for the Open Force Field Toolkit is hosted at readthedocs. Example notebooks are available in the examples/ directory and also hosted on the Open Force Field website.

How to cite

Please cite the OpenFF Toolkit using the Zenodo record of the latest release or the version that was used. The BibTeX reference of the latest release can be found here.

Installation

The Open Force Field Toolkit (openff-toolkit) is a Python toolkit, and supports Python 3.9 through 3.11.

Installing via Mamba/Conda

Detailed installation instructions can be found here.

Force Fields

Two major force field development efforts have been undertaken by the Open Force Field Initiative, with results hosted in separate repositories.

  • The Open Force Fields repository, which features the Parsley and Sage force field lines. These are the Open Force Field Initiative's efforts toward building new force fields. The initial parameters are taken from smirnoff99Frosst, but software and data produced by the Initiative's efforts have been used to refit parameter values and add new SMIRKS-based parameters.
  • The smirnoff99Frosst repository, which is descended from AMBER's parm99 force field as well as Merck-Frosst's parm@frosst. This line of force fields does not aim to alter parameter values, but is instead a test of accurately converting an atom type-based force field to the SMIRNOFF format.

Force fields from both of these packages are available in their respective GitHub repositories and also as conda packages. Tables detailing the individual file names/versions within these force field lines are in the README of each repository. By default, installing the Open Force Field Toolkit using conda or the single-file toolkit installers will also install these conda packages. A plugin architecture is provided so that other force field developers can produce Python/conda packages that the Open Force Field Toolkit can import as well.

Toolkit features

The SMIRKS Native Open Force Field (SMIRNOFF) format

This repository provides tools for using the SMIRKS Native Open Force Field (SMIRNOFF) specification, which currently supports an XML representation for force field definition files.

By convention, files containing XML representations of SMIRNOFF force fields carry .offxml extensions.

Example SMIRNOFF .offxml force field definitions can be found in openff/toolkit/data/test_forcefields/. These force fields are for testing only, and we neither record versions of these files, nor do we guarantee their correctness or completeness.

Working with SMIRNOFF parameter sets

SMIRNOFF force fields can be parsed by the ForceField class, which offers methods including create_openmm_system for exporting to OpenMM and create_interchange for exporting to other formats (GROMACS, Amber, LAMMPS) via Interchange.

# Load a molecule into the OpenFF Molecule object
from openff.toolkit import Molecule
from openff.toolkit.utils import get_data_file_path
sdf_file_path = get_data_file_path('molecules/ethanol.sdf')
molecule = Molecule.from_file(sdf_file_path)

# Create an OpenFF Topology object from the molecule
from openff.toolkit import Topology
topology = Topology.from_molecules(molecule)

# Load the latest OpenFF force field release: version 2.1.0, codename "Sage"
from openff.toolkit import ForceField
forcefield = ForceField('openff-2.1.0.offxml')

# Create an OpenMM system representing the molecule with SMIRNOFF-applied parameters
openmm_system = forcefield.create_openmm_system(topology)

# Create an Interchange object for representations in other formats
interchange = forcefield.create_interchange(topology)

Detailed examples of using SMIRNOFF with the toolkit can be found in the documentation.

Frequently asked questions (FAQ)

See FAQ.md for answers to a variety of common problems, such as:

  • Why do I need to provide molecules corresponding to the components of my system, or a Topology with bond orders?
  • Can I use an Amber, CHARMM, or GROMACS topology/coordinate file as a starting point for applying a SMIRNOFF force field?
  • What if I am starting from a PDB file?

Contributors

For a partial list of contributors, see the GitHub Contributors page. Others whose work constitutes significant contributions but did not make it into the git history include Shuzhe Wang.

openff-evaluator's People

Contributors

aehogan, darelbeida, dependabot[bot], dotsdl, j-wags, jaimergp, jaketanderson, jeff231li, jthorton, lilyminium, mattwthompson, ocmadin, pavankum, pre-commit-ci[bot], simonboothroyd, yoshanuikabundi


openff-evaluator's Issues

A little bug in unit toolkit (Tutorial 04)

In cell [16] of the tutorial, the import currently reads:

# Reduce the default number of molecules
from evaluator.properties import Density, EnthalpyOfVaporization

I think it should be:

# Reduce the default number of molecules
from openff.evaluator.properties import Density, EnthalpyOfVaporization

It works on my remote computer!

Executable for SSH tunnelling?

Being able to monitor the Dask dashboard remotely is a useful feature. I can do this currently with:

function dashboard {
    address=$1
    port=$2
    dest=${3:-hpc3}
    ssh -N -f -L 127.0.0.1:${port}:${address}:${port} ${dest}
    url="127.0.0.1:${port}/status"
    echo "Dashboard started at ${url}"
}

and it's called with the below when I want to tunnel to the hpc3 cluster (dest has a default argument of hpc3):

dashboard 10.240.58.91 8080

Do you think it'd be possible to add this as a command-line tool? If using subprocess, you may need to set shell=True. I'm not sure how complex it would be to deal with the presence or absence of SSH configs, and I'm not familiar with creating command-line tools much beyond the simple function above. I think a well-documented command-line tool for this would be super helpful for anyone not familiar with dask, ssh, or tunnelling, or who just hasn't memorised the syntax.
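A rough sketch of what such a tool might look like (hypothetical and untested; note that passing the arguments as a list avoids needing shell=True):

#!/usr/bin/env python
"""Hypothetical dashboard-tunnelling helper mirroring the shell function above."""
import argparse
import subprocess


def main():
    parser = argparse.ArgumentParser(description="Tunnel a remote Dask dashboard over SSH.")
    parser.add_argument("address", help="Dashboard address on the remote network")
    parser.add_argument("port", help="Dashboard port, e.g. 8080")
    parser.add_argument("dest", nargs="?", default="hpc3", help="SSH host (default: hpc3)")
    args = parser.parse_args()

    # -N: no remote command, -f: background after auth, -L: local port forward.
    subprocess.run(
        ["ssh", "-N", "-f", "-L",
         f"127.0.0.1:{args.port}:{args.address}:{args.port}", args.dest],
        check=True,
    )
    print(f"Dashboard started at 127.0.0.1:{args.port}/status")


if __name__ == "__main__":
    main()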

I/O issues when running paprika protocol on NFS, Lustre mount

I've created this issue as an anchor for an ongoing troubleshooting session. Any solution(s) to this issue will be documented here.

From @jeff231li:

The issue I'm having is an I/O issue. One example I encounter is when a dask-worker has opened a *.dcd file at the start of a simulation. As the simulation progresses, the dask-worker needs to open this file and append the current snapshot of the system to it. However, it spits out an error because it is trying to append to an empty file (for some reason the initial header was not written to the file by OpenMM). I tried using os.flush() and f.flush(), but the same problem occurs. Note that this only happens in some systems (43 host-guest systems in total) and occurs randomly. We can discuss this further tomorrow, but I think it is a problem with the way TSCC is set up, either hardware limitations or an OS issue (due to high-volume I/O).

We met today in a live session to troubleshoot. Some details:

  1. He is using paprika to perform host-guest free energy simulations on the paprika-integration branch.
  2. Using the DaskPBSBackend on TSCC for compute.
  3. Using Lustre filesystem or NFS mount for full run, both working_directory and storage_directory.

Seeing issues such as the following.

Running on NFS mount

#!/usr/bin/env python
import json
import os
import shutil

from distributed import Adaptive
from openforcefield.typing.engines import smirnoff

from propertyestimator import unit
from propertyestimator.properties import HostGuestBindingAffinity
from propertyestimator.protocols.paprika import OpenMMPaprikaProtocol
from propertyestimator.backends import QueueWorkerResources, DaskPBSBackend
from propertyestimator.client import PropertyEstimatorClient, PropertyEstimatorOptions
from propertyestimator.datasets.taproom import TaproomDataSet
from propertyestimator.server import PropertyEstimatorServer
from propertyestimator.storage import LocalFileStorage
from propertyestimator.utils import setup_timestamp_logging, get_data_filename
from propertyestimator.utils.serialization import TypedJSONEncoder
from propertyestimator.workflow import WorkflowOptions


class CustomAdaptive(Adaptive):
    """A temporary work-around to attempt to fix
    https://github.com/dask/distributed/issues/3154
    """

    async def recommendations(self, target: int) -> dict:
        """
        Make scale up/down recommendations based on current state and target
        """
        await self.cluster
        return await super(CustomAdaptive, self).recommendations(target)


def _get_modified_schema(workflow_options):
    default_schema = HostGuestBindingAffinity.get_default_paprika_simulation_workflow_schema(workflow_options)

    host_guest_protocol = OpenMMPaprikaProtocol('host_guest_free_energy_$(orientation_replicator)')
    host_guest_protocol.schema = default_schema.protocols[host_guest_protocol.id]

    host_guest_protocol.equilibration_timestep = 1 * unit.femtosecond
    host_guest_protocol.number_of_equilibration_steps = 200000

    host_protocol = OpenMMPaprikaProtocol('host')
    host_protocol.schema = default_schema.protocols[host_protocol.id]

    host_protocol.equilibration_timestep = 1 * unit.femtosecond
    host_protocol.number_of_equilibration_steps = 200000

    default_schema.protocols[host_guest_protocol.id] = host_guest_protocol.schema
    default_schema.protocols[host_protocol.id] = host_protocol.schema

    return default_schema

def main():

    setup_timestamp_logging()

    # Load in the force field
    force_field = smirnoff.ForceField('smirnoff99Frosst-1.1.0.offxml',
                                      get_data_filename('forcefield/tip3p.offxml'))

    # Load in the data set, retaining only a specific host / guest pair.
    host = ['acd', 'bcd']
    # guest = 'bam'

    data_set = TaproomDataSet()

    data_set.filter_by_host_identifiers(*host)
    # data_set.filter_by_guest_identifiers(guest)

    # Set up the server object which runs the calculations.
    working_directory = 'working_directory'
    storage_directory = 'storage_directory'

    # Remove any existing data.
    if os.path.isdir(working_directory):
        shutil.rmtree(working_directory)

    queue_resources = QueueWorkerResources(number_of_threads=1,
                                           number_of_gpus=1,
                                           preferred_gpu_toolkit=QueueWorkerResources.GPUToolkit.CUDA,
                                           per_thread_memory_limit=4 * unit.gigabyte,
                                           wallclock_time_limit="08:00:00")

    setup_script_commands = [
        'source /home/jsetiadi/.bashrc',
        'conda activate propertyestimator',
        'cd /projects/gilson-kirkwood/jsetiadi/propertyestimator/full-taproom'
    ]

    calculation_backend = DaskPBSBackend(minimum_number_of_workers=1,
                                         maximum_number_of_workers=48,
                                         resources_per_worker=queue_resources,
                                         queue_name='gpu-condo',
                                         setup_script_commands=setup_script_commands,
                                         adaptive_interval='1000ms',
                                         resource_line='nodes=1:ppn=2:gpuTitan',
                                         adaptive_class=CustomAdaptive)

    # Set up a backend to cache simulation data in.
    storage_backend = LocalFileStorage(storage_directory)

    # Spin up the server object.
    PropertyEstimatorServer(calculation_backend=calculation_backend,
                            storage_backend=storage_backend,
                            working_directory=working_directory)

    # Request the estimate of the host-guest binding affinity.
    options = PropertyEstimatorOptions()
    options.allowed_calculation_layers = ['SimulationLayer']

    workflow_options = WorkflowOptions(convergence_mode=WorkflowOptions.ConvergenceMode.NoChecks)
    workflow_schema = _get_modified_schema(workflow_options)

    options.workflow_options = {'HostGuestBindingAffinity': {'SimulationLayer': workflow_options}}
    options.workflow_schemas = {'HostGuestBindingAffinity': {'SimulationLayer': workflow_schema}}

    estimator_client = PropertyEstimatorClient()

    request = estimator_client.request_estimate(property_set=data_set,
                                                force_field_source=force_field,
                                                options=options)

    # Wait for the results.
    results = request.results(True, 3600)

    # Save the result to file.
    with open('results.json', 'wb') as file:

        json_results = json.dumps(results, sort_keys=True, indent=2,
                                  separators=(',', ': '), cls=TypedJSONEncoder)

        file.write(json_results.encode('utf-8'))


if __name__ == "__main__":
    main()

Gives:

18:18:38.913 INFO     An exception was raised: working_directory/SimulationLayer/e34f73be-3bc7-42f0-82b1-c74f02147912/135262ac-c38f-4108-8c76-4bae641adb51:host_guest_free_energy_1 - An unhandled exception occurred: ['Traceback (most recent call last):\n', '  File "/projects/gilson-kirkwood/jsetiadi/propertyestimator/paprika_integration/propertyestimator/workflow/workflow.py", line 1264, in _execute_protocol\n    output_dictionary = protocol.execute(directory, available_resources)\n', '  File "/projects/gilson-kirkwood/jsetiadi/propertyestimator/paprika_integration/propertyestimator/protocols/paprika.py", line 685, in execute\n    error = self._setup(\'\', available_resources)\n', '  File "/projects/gilson-kirkwood/jsetiadi/propertyestimator/paprika_integration/propertyestimator/protocols/paprika.py", line 554, in _setup\n    result = self._solvate_windows(directory, available_resources)\n', '  File "/projects/gilson-kirkwood/jsetiadi/propertyestimator/paprika_integration/propertyestimator/protocols/paprika.py", line 254, in _solvate_windows\n    reference_structure_path)\n', '  File "/projects/gilson-kirkwood/jsetiadi/propertyestimator/paprika_integration/propertyestimator/protocols/paprika.py", line 754, in _add_dummy_atoms\n    self._solvated_system_xml_paths[index])\n', '  File "/projects/gilson-kirkwood/jsetiadi/anaconda3_tscc/envs/propertyestimator/lib/python3.7/site-packages/paprika/setup.py", line 398, in add_dummy_atoms\n    reference_structure = pmd.load_file(reference_pdb, structure=True)\n', '  File "/projects/gilson-kirkwood/jsetiadi/anaconda3_tscc/envs/propertyestimator/lib/python3.7/site-packages/parmed/formats/registry.py", line 162, in load_file\n    if filename.startswith(\'http://\') or filename.startswith(\'https://\')\\\n', "AttributeError: 'NoneType' object has no attribute 'startswith'\n"]

Running on Lustre filesystem

#!/usr/bin/env python
import json
import os
import shutil

from distributed import Adaptive
from openforcefield.typing.engines import smirnoff

from propertyestimator import unit
from propertyestimator.properties import HostGuestBindingAffinity
from propertyestimator.protocols.paprika import OpenMMPaprikaProtocol
from propertyestimator.backends import QueueWorkerResources, DaskPBSBackend
from propertyestimator.client import PropertyEstimatorClient, PropertyEstimatorOptions
from propertyestimator.datasets.taproom import TaproomDataSet
from propertyestimator.server import PropertyEstimatorServer
from propertyestimator.storage import LocalFileStorage
from propertyestimator.utils import setup_timestamp_logging, get_data_filename
from propertyestimator.utils.serialization import TypedJSONEncoder
from propertyestimator.workflow import WorkflowOptions

import logging
from importlib import reload
reload(logging)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.basicConfig(
    filename='propertyestimator.log',
    format='%(asctime)s %(message)s',
    datefmt='%Y-%m-%d %I:%M:%S %p',
)


class CustomAdaptive(Adaptive):
    """A temporary work-around to attempt to fix
    https://github.com/dask/distributed/issues/3154
    """

    async def recommendations(self, target: int) -> dict:
        """
        Make scale up/down recommendations based on current state and target
        """
        await self.cluster
        return await super(CustomAdaptive, self).recommendations(target)


def _get_modified_schema(workflow_options):
    default_schema = HostGuestBindingAffinity.get_default_paprika_simulation_workflow_schema(workflow_options)

    host_guest_protocol = OpenMMPaprikaProtocol('host_guest_free_energy_$(orientation_replicator)')
    host_guest_protocol.schema = default_schema.protocols[host_guest_protocol.id]

    host_guest_protocol.equilibration_timestep = 1 * unit.femtosecond
    host_guest_protocol.number_of_equilibration_steps = 200000
    host_guest_protocol.number_of_production_steps = 1000000
    host_guest_protocol.number_of_solvent_molecules = 2210

    host_protocol = OpenMMPaprikaProtocol('host')
    host_protocol.schema = default_schema.protocols[host_protocol.id]

    host_protocol.equilibration_timestep = 1 * unit.femtosecond
    host_protocol.number_of_equilibration_steps = 200000
    host_protocol.number_of_production_steps = 1000000
    host_protocol.number_of_solvent_molecules = 1500

    default_schema.protocols[host_guest_protocol.id] = host_guest_protocol.schema
    default_schema.protocols[host_protocol.id] = host_protocol.schema

    return default_schema

def main():

    setup_timestamp_logging()

    # Load in the force field
    force_field = smirnoff.ForceField('smirnoff99Frosst-1.1.0.offxml',
                                      get_data_filename('forcefield/tip3p.offxml'))

    # Load in the data set, retaining only a specific host / guest pair.
    # host = 'bcd'
    # guest = 'bam'

    data_set = TaproomDataSet()

    # data_set.filter_by_host_identifiers(host)
    # data_set.filter_by_guest_identifiers(guest)

    # Set up the server object which runs the calculations.
    working_directory = 'working_directory'
    storage_directory = 'storage_directory'

    # Remove any existing data.
    if os.path.isdir(working_directory):
        shutil.rmtree(working_directory)

    queue_resources = QueueWorkerResources(number_of_threads=1,
                                           number_of_gpus=1,
                                           preferred_gpu_toolkit=QueueWorkerResources.GPUToolkit.CUDA,
                                           per_thread_memory_limit=4 * unit.gigabyte,
                                           wallclock_time_limit="99:00:00")

    setup_script_commands = [
        'source /home/jsetiadi/.bashrc',
        'conda activate pe-paprika',
        'cd /oasis/tscc/scratch/jsetiadi/full-run',
        'echo "Using GPU no $CUDA_VISIBLE_DEVICES"'
    ]

    calculation_backend = DaskPBSBackend(minimum_number_of_workers=1,
                                         maximum_number_of_workers=12,
                                         resources_per_worker=queue_resources,
                                         queue_name='home-mgilson',
                                         setup_script_commands=setup_script_commands,
                                         adaptive_interval='1000ms',
                                         resource_line='nodes=1:ppn=3:gpu980',
                                         adaptive_class=CustomAdaptive)

    # Set up a backend to cache simulation data in.
    storage_backend = LocalFileStorage(storage_directory)

    # Spin up the server object.
    PropertyEstimatorServer(calculation_backend=calculation_backend,
                            storage_backend=storage_backend,
                            working_directory=working_directory)

    # Request the estimate of the host-guest binding affinity.
    options = PropertyEstimatorOptions()
    options.allowed_calculation_layers = ['SimulationLayer']

    workflow_options = WorkflowOptions(convergence_mode=WorkflowOptions.ConvergenceMode.NoChecks)
    workflow_schema = _get_modified_schema(workflow_options)

    options.workflow_options = {'HostGuestBindingAffinity': {'SimulationLayer': workflow_options}}
    options.workflow_schemas = {'HostGuestBindingAffinity': {'SimulationLayer': workflow_schema}}

    estimator_client = PropertyEstimatorClient()

    request = estimator_client.request_estimate(property_set=data_set,
                                                force_field_source=force_field,
                                                options=options)

    # Wait for the results.
    results = request.results(True, 10800)

    # Save the result to file.
    with open('results.json', 'wb') as file:

        json_results = json.dumps(results, sort_keys=True, indent=2,
                                  separators=(',', ': '), cls=TypedJSONEncoder)

        file.write(json_results.encode('utf-8'))


if __name__ == "__main__":
    main()

Gives:

18:48:08.021 INFO     An exception was raised: working_directory/SimulationLayer/9ab84d91-427c-42a5-8a7a-46c2256ca9b3/4d727f5c-7b6f-4f45-bdfa-3c11542978dd:filter_host/4d727f5c-7b6f-4f45-bdfa-3c11542978dd:host - An unhandled exception occurred: ['Traceback (most recent call last):\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/workflow/workflow.py", line 1264, in _execute_protocol\n    output_dictionary = protocol.execute(directory, available_resources)\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/paprika.py", line 699, in execute\n    error = self._setup(\'\', available_resources)\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/paprika.py", line 568, in _setup\n    result = self._solvate_windows(directory, available_resources)\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/paprika.py", line 268, in _solvate_windows\n    reference_structure_path)\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/paprika.py", line 757, in _add_dummy_atoms\n    result = build_solvated_complex_system.execute(window_directory, None)\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/forcefield.py", line 371, in execute\n    file.write(system_xml.encode(\'utf-8\'))\n', 'OSError: [Errno 5] Input/output error\n']
18:48:08.181 INFO     Finished server request 9ab84d91-427c-42a5-8a7a-46c2256ca9b3
18:48:11.303 INFO     Finished server request e028c196-0e92-419f-8368-c2c1a981d64c
18:48:13.447 INFO     An exception was raised:  - acd/release/windows/r002/simulations/npt_production: The simulation failed unexpectedly: ['Traceback (most recent call last):\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/simulation.py", line 732, in _simulate\n    self._write_checkpoint_file(current_step, context)\n', '  File "/home/jsetiadi/propertyestimator/propertyestimator/protocols/simulation.py", line 454, in _write_checkpoint_file\n    json.dump(checkpoint, file, cls=TypedJSONEncoder)\n', 'OSError: [Errno 5] Input/output error\n']

Possible workarounds

It may make sense in this case to create a scratch directory on each compute node's local storage at $TMPDIR for both the PropertyEstimatorServer and the dask-workers in setup_script_commands, then set the working_directory to point to that. This may avoid issues with rapid writes/reads on mounted network filesystems. More details in the TSCC docs.

Substitutions in the scripts above like the following may work well:

    setup_script_commands = [
        'source /home/jsetiadi/.bashrc',
        'conda activate propertyestimator',
        'mkdir -p $TMPDIR/jsetiadi/working_directory',
        'cd $TMPDIR/jsetiadi/working_directory'
    ]

working_directory = os.path.join(os.environ['TMPDIR'], 'working_directory')
os.makedirs(working_directory)

@jeff231li, can you give the above a shot and let us know here if this addresses the errors you are seeing?

Usage of daemon workers, dask config, and tests

Daemon workers and dask config

I tried to set up a DaskSLURMBackend and received this error:

distributed.worker - WARNING -  Compute Failed
Function:  _wrapped_function
args:      (<function ProtocolGraph._execute_protocol at 0x7f952bf3f8b0>, 'evaluator_working-data/SimulationLayer/f4683d534fda4bd6bb6d8722fafff554/456cf9ec8e1649c1a15d61042b188733_build_solvated_coordinates', '{"id": "456cf9ec8e1649c1a15d61042b188733|build_solvated_coordinates", "type": "BuildCoordinatesPackmol", "inputs": {".allow_merging": true, ".max_molecules": 2000, ".count_exact_amount": true, ".mass_density": {"value": 0.95, "unit": "g / ml", "@type": "openff.evaluator.unit.Quantity"}, ".box_aspect_ratio": [1.0, 1.0, 1.0], ".substance": {"components": [{"smiles": "CCCCCCCCCC", "role": {"value": "solv", "@type": "openff.evaluator.substances.components.Component.Role"}, "@type": "openff.evaluator.substances.components.Component"}, {"smiles": "CCCCCCCCCC", "role": {"value": "sol", "@type": "openff.evaluator.substances.components.Component.Role"}, "@type": "openff.evaluator.substances.components.Component"}], "amounts": {"CCCCCCCCCC{solv}": [{"value": 1.0, "@type": "openff.evaluator.substances
kwargs:    {'safe_exceptions': True, 'available_resources': <openff.evaluator.backends.backends.QueueWorkerResources object at 0x7f95353d5160>, 'registered_workflow_protocols': ['openff.evaluator.workflow.protocols.ProtocolGroup', 'openff.evaluator.protocols.analysis.AverageObservable', 'openff.evaluator.protocols.analysis.AverageDielectricConstant', 'openff.evaluator.protocols.analysis.AverageFreeEnergies', 'openff.evaluator.protocols.analysis.ComputeDipoleMoments', 'openff.evaluator.protocols.analysis.DecorrelateTrajectory', 'openff.evaluator.protocols.analysis.DecorrelateObservables', 'openff.evaluator.protocols.coordinates.BuildCoordinatesPackmol', 'openff.evaluator.protocols.coordinates.SolvateExistingStructure', 'openff.evaluator.protocols.coordinates.BuildDockedCoordinates', 'openff.evaluator.protocols.forcefield.BaseBuildSystem', 'openff.evaluator.protocols.forcefield.TemplateBuildSystem', 'openff.evaluator.protocols.forcefield.BuildSmirnoffSystem', 'openff.evaluator.protocols.forcefield.
Exception: AssertionError('daemonic processes are not allowed to have children')

This is addressed by Evaluator's documentation, which recommends setting up a dask distributed configuration that switches daemons off: https://openff-evaluator.readthedocs.io/en/stable/backends/daskbackends.html#configuration

However, IMO recommendations should be reserved for settings that optimize program execution, not ones that are essential for it to function at all. Needing to set a global configuration is also not ideal if you want to run dask with other configurations elsewhere. You can set environment variables in your script, but doing that for something essential to function is not ideal either.

Dask has a configuration context manager: https://docs.dask.org/en/stable/configuration.html#dask.config.set
Would it be possible to use this in Evaluator? Alternatively, would it be possible to emit a warning or error upon seeing known-bad configurations? While the Dask worker job dies on SLURM, the overall manager job continues, so it's easy to waste some compute time.
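For illustration, a minimal sketch of the context-manager approach (whether the setting would propagate to remotely launched jobqueue workers is something Evaluator would need to verify):

import dask
from distributed import Client, LocalCluster

# Scope the no-daemon setting to cluster construction instead of requiring a
# global ~/.config/dask/ file; nannies read this when spawning worker processes.
with dask.config.set({"distributed.worker.daemon": False}):
    cluster = LocalCluster(n_workers=2, processes=True)
    client = Client(cluster)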

Tests

I would have looked into this further out of interest, but was a bit stumped by the tests not working. The backend tests are currently skipped because of Travis. After removing the skip, they still didn't work, because GH runners aren't an HPC environment, I guess (https://github.com/lilyminium/openff-evaluator/tree/lily/dask-backends). Is OpenFF interested in bringing these tests back? The Dask-Jobqueue library sets up Docker images for testing.

Add `tidy` keyword to to_pandas?

I was surprised that .to_pandas converts to a wide format where each property type gets its own column and an imposed unit. I would have thought it more intuitive to convert to a tidier (long) format, i.e.

Instead of:

Index(['Id', 'Temperature (K)', 'Pressure (kPa)', 'Phase', 'N Components',
       'Component 1', 'Role 1', 'Mole Fraction 1', 'Exact Amount 1',
       'Component 2', 'Role 2', 'Mole Fraction 2', 'Exact Amount 2',
       'SolvationFreeEnergy Value (kJ / mol)',
       'SolvationFreeEnergy Uncertainty (kJ / mol)', 'Source'],
      dtype='object')

You could have:

Index(['Id', 'Temperature (K)', 'Pressure (kPa)', 'Phase', 'N Components',
       'Component 1', 'Role 1', 'Mole Fraction 1', 'Exact Amount 1',
       'Component 2', 'Role 2', 'Mole Fraction 2', 'Exact Amount 2',
       'Property type', 'Value', 'Value unit', 'Uncertainty', 'Uncertainty unit', 'Source'],
      dtype='object')

This would be more efficient memory-wise (edit: for mixed datasets), as you no longer have NaNs taking up a bunch of space, and it would also help in filtering by property type. When working directly with the dataframe, it would be much easier to see how many of each property type you have and to group by it.
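A rough sketch of what a tidy=True option could do, assuming the wide frame produced by the current to_pandas (uncertainty columns would need the same melt treatment):

import pandas as pd


def to_tidy(data_frame: pd.DataFrame) -> pd.DataFrame:
    # Melt the per-property '<Type> Value (<unit>)' columns into long format.
    value_columns = [c for c in data_frame.columns if " Value " in c]
    id_columns = [c for c in data_frame.columns if c not in value_columns]
    tidy = data_frame.melt(
        id_vars=id_columns,
        value_vars=value_columns,
        var_name="Property type",
        value_name="Value",
    )
    # Drop the NaN padding rows introduced by the wide format.
    return tidy.dropna(subset=["Value"])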

Cannot roundtrip to/from pandas

I'm using an old version of Evaluator (0.3.5), but looking at the code I don't think it's changed in the relevant parts.

I have converted my dataset to a dataframe for filtering, but I can't convert it back, because ExactAmounts are interpreted as floats. I have mixed Nones and integers in the column, which pandas interprets as float64; NaN is a float. A column of all Nones does not have this problem, because pandas does not convert None to NaN and keeps the column as object. The relevant code is here:

if not numpy.isclose(exact_amount, 0.0):
    substance.add_component(component, ExactAmount(exact_amount))

IMO code changes should go in from_pandas because then you can read from general CSV files.
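In the meantime, a user-side workaround might be pandas' nullable integer dtype, which keeps a mixed None/integer column integral instead of coercing it to float64 (a quick sketch; the column name matches the to_pandas output above):

import pandas as pd

df = pd.DataFrame({"Exact Amount 1": [None, 2, None]})
df["Exact Amount 1"] = df["Exact Amount 1"].astype("Int64")
print(df["Exact Amount 1"].dtype)  # Int64 (nullable); missing entries become pd.NA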

I also noticed this line:

for _, data_row in data_frame.iterrows():

This doesn't seem to have caused problems yet, but I would generally recommend changing this to itertuples. iterrows does not preserve column types, but converts each row into a Series (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html). This is another great way to change an integer into a float without realising. However, itertuples may be hard to work with, as there are spaces in the column headings.

Standard release process

1: Update the release history documentation file, using this template

# 0.2.2

This release modifies an example to show how to parameterize a solvated system, cleans up backend code, and makes several improvements to the README.

A richer version of these release notes with live links to API documentation is available on [our ReadTheDocs page](https://open-forcefield-toolkit.readthedocs.io/en/latest/releasehistory.html)

See our [installation instructions](https://open-forcefield-toolkit.readthedocs.io/en/latest/installation.html).

Please report bugs, request features, or ask questions through our [issue tracker](https://github.com/openforcefield/openforcefield/issues).

**Please note that there may still be some changes to the API prior to a stable 1.0.0 release.**

### Bugfixes

* PR #279: Cleanup of unused code/warnings in main package __init__
* PR #259: Update T4 Lysozyme + toluene example to show how to set up solvated systems
* PR #256 and PR #274: Add functionality to ensure that links in READMEs resolve successfully

Merge the above release summary into the release history file in a PR.

2: Cut the release on GitHub

  • Go to the Releases tab on the front page of the repo
  • Draft a new release:
    • Tag = `X.Y.Z` @ master
    • Title = `X.Y.Z [Descriptive Title]`
    • Copy the text from the release summary in part 1, converting it to Markdown if necessary.
    • You do NOT need to upload any files; the source code will automatically be added as a tar.gz.
    • Check the "This is a pre-release" box.

3: Trigger a new build on Omnia.

Note: Omnia builds take about 30 minutes to run. When you open a PR, the build will run, and you can check the bottom of the Travis logs for "package failed to build" listings. Some packages always fail (protons, assaytools), but propertyestimator shouldn't be among them. Ctrl-F for propertyestimator to ensure that it did build at all, though.

  • Create branch or fork of omnia-md/conda-recipes with changes to propertyestimator in meta.yaml:
    • git_tag set to match git release (This tag can also be a branch name)
    • version set to match git release (this will go into the conda package name)
    • build set to 0
    • any updated dependencies reflected under requirements:
    • If we want to push to special rc label: use extra.upload
  • Open PR to merge branch or fork into omnia-md master
    • PR should be called, e.g., [propertyestimator] 0.0.1 (label: rc)
    • No PR body text is needed
    • Travis will run on this PR (~30 minutes) and attempt to build the package. Under no conditions will the package built before the PR is merged be uploaded. This step is just to ensure that building doesn't crash.
    • If build is successful, PR should be reviewed and merged by omnia maintainers
    • Once merged into master, the package is built again on Travis and pushed to the channel set in meta.yaml (main, beta, or rc)
    • If we have upload: rc, we would install with conda install -c omnia/label/rc propertyestimator
  • Test omnia package
    • conda install -c omnia/label/rc openforcefield

4: Update the ReadTheDocs build versions

  • Trigger RTD build of latest (we need to do this to make RTD aware of the new tagged release)
  • Under Versions tab, add new release version to list of built versions and SAVE
  • Verify new version docs have built and pushed correctly
  • Under Admin | Advanced Settings: Set new release version as Default version to display and SAVE

5: Announce the release

Post something like this in #general

@channel We're pleased to announce the release of the Open Force Field Toolkit version 0.4.0! This release introduces updates to the SMIRNOFF spec, the ability to read SMIRNOFF 0.1 spec OFFXML files, and performance improvements for system creation. Detailed release notes are available at https://open-forcefield-toolkit.readthedocs.io/en/0.4.0/releasehistory.html

Conda packages available now on the `omnia` channel for both Linux and MacOS!
https://anaconda.org/omnia/openforcefield/files

Unable to install openff-evaluator on Windows 11

Hi,

I'm unable to install the openff-evaluator module on my Windows 11 64-bit desktop PC.

I created a new conda env called "openff-evaluator", tried installing openff-evaluator into it, and got the error shown in the attached screenshot.

I also tried installing from source but got the same result.

Is this a known issue?

Regards

dask port issue

It looks like if two users on the same HPC cluster submit jobs independently, the second user cannot spawn worker jobs and ends up with an "address already in use" error. There is an open issue here (dask/distributed#1926).

Calculate density expectation as M/<V> instead of M<1/V>

I'd argue that M/<V> rather than M<1/V> is the better quantity to calculate. Volume is the more fundamental thermodynamic quantity (dG/dP = V), and its numerical behavior is better (it doesn't overweight small values and underweight large ones). It's also easier to then use in the calculation of partial molar volume.

One argument against this is that the experiments are often reported in terms of density. However, it's still straightforward to convert the average molar volume to an average density and plug the density expectation into the Gaussian error model.
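A toy numpy comparison of the two estimators on synthetic volume samples (illustrative only; the numbers are made up):

import numpy as np

rng = np.random.default_rng(0)
volumes = rng.normal(30.0, 1.5, size=10_000)  # instantaneous box volumes, arbitrary units
total_mass = 500.0                            # total mass of the box, consistent units

density_current = total_mass * np.mean(1.0 / volumes)  # M<1/V>, the current estimator
density_proposed = total_mass / np.mean(volumes)       # M/<V>, the proposed estimator

# Jensen's inequality gives <1/V> >= 1/<V>, so the current estimator is biased high.
print(density_current, density_proposed)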

Add calculation of excess molar volume

Initial testing indicated that excess molar volume has enough numerical issues (too noisy, needing substantial time to converge) to make it not ideal for optimization. Even so, I think it is an important property to benchmark on going forward, especially since it is so straightforward to calculate (any simulation of heat of mixing can obtain the excess molar volume with no additional simulation work), and so it should be included in the Evaluator formalism.

It is also related to a key thermodynamic quantity: partial molar volumes completely capture the combined pressure and component-number response of the free energy, which is the fundamental function of interest, i.e. V_i = d²G/(dP dN_i).

Excess molar volumes are directly related to the partial molar volumes by V^E = Σ_i x_i (V_i − V_i^ideal), where V_i^ideal is just the molar volume of pure i under the same T and P conditions.
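A minimal sketch of that relation, with the molar volumes assumed to come from mixture and pure-component simulations at the same T and P:

def excess_molar_volume(mole_fractions, partial_molar_volumes, pure_molar_volumes):
    # V^E = sum_i x_i * (V_i - V_i^ideal)
    return sum(
        x * (v - v_ideal)
        for x, v, v_ideal in zip(mole_fractions, partial_molar_volumes, pure_molar_volumes)
    )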

Replace the internal AttributeXXX classes with attrs

From @SimonBoothroyd:

The current AttributeClass and Attribute classes were mainly created to be flexible representations of data classes to be used when speccing out the new class designs as part of the openff-evaluator refactoring.

While these classes are flexible, extensible and allow new classes to be specced out easily, complete with automatic serialization support, automatic docstrings and a degree of automatic validation, they are not particularly performant and could (and should) likely be replaced with, or built on top of, the existing attrs library.

Any replacement solution must provide the same level of flexibility, extensibility and automation, as well as being able to readily support polymorphic class designs.

Behavio(u)r spelling

Having keywords spelled behavior and classes spelled Behaviour seems ripe for spelling-induced errors. I'd recommend going with Behavior, as most of the world programs in American.

(e.g. merge_behavior=InequalityMergeBehaviour.LargestValue)
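One low-churn option (a hypothetical sketch, not the current API) would be to standardize on the American spelling and keep the other as a deprecated alias so existing scripts keep working:

class InequalityMergeBehavior:
    SmallestValue = "SmallestValue"
    LargestValue = "LargestValue"

# Deprecated alias retained for backwards compatibility.
InequalityMergeBehaviour = InequalityMergeBehavior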

Adding OpenCL support for dask scheduling

We've recently (thanks to @mrshirts) gotten access to some AMD GPUs, which it would be useful to run Evaluator on. After talking with @mattwthompson, it seems like Evaluator's interaction with OpenMM supports OpenCL:

# A platform which runs on GPUs has been requested.
platform_name = (
    "CUDA"
    if toolkit_enum == ComputeResources.GPUToolkit.CUDA
    else ComputeResources.GPUToolkit.OpenCL
)

However, Evaluator currently does not support using OpenCL when scheduling jobs with dask:

if resources_per_worker.number_of_gpus > 0:

    if (
        resources_per_worker.preferred_gpu_toolkit
        == ComputeResources.GPUToolkit.OpenCL
    ):
        raise ValueError("The OpenCL gpu backend is not currently supported.")

    if resources_per_worker.number_of_gpus > 1:
        raise ValueError("Only one GPU per worker is currently supported.")

Is it possible to add support for using Evaluator with OpenCL? This is not a particularly time-sensitive ask, but it would be good to have in the future.
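For reference, a sketch of how a worker might set up an OpenCL platform once the scheduling restriction is lifted; the platform and property names are standard OpenMM ones, while the device pinning mirrors what is done for CUDA and is an assumption:

from openmm import Platform

platform = Platform.getPlatformByName("OpenCL")
platform_properties = {
    "OpenCLDeviceIndex": "0",   # pin the worker to one device, as done for CUDA
    "OpenCLPrecision": "mixed",
}
# simulation = app.Simulation(topology, system, integrator, platform, platform_properties)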

Tutorial 04 - Training

When I run "!ForceBalance optimize.in" to start training, it shows the following:


Calculation started at 2022-02-02 11:40 AM
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
  File "/home/ylinb/anaconda3/envs/OpenFFCrystal/bin/ForceBalance.py", line 45, in Run_ForceBalance
    optimizer.Run()
  File "/home/ylinb/anaconda3/envs/OpenFFCrystal/lib/python3.8/site-packages/forcebalance/optimizer.py", line 322, in Run
    xk = self.OptTab[self.jobtype]()
KeyError: 'single'


Is there a specific dependency I need to install? I followed the installation instructions here:
https://openff-evaluator.readthedocs.io/en/latest/install.html

GPU usage with schema and merge orders

This is less of an issue and more of a question. In running an estimation with two schemas, SolvationFreeEnergy and HostGuestBindingAffinity, I've noticed that HostGuestBindingAffinity is able to make full use of all available dask workers, while SolvationFreeEnergy doesn't make full use of even a single worker. What surprised me, though, was that when I ran my code with the host_guest_data_set merged into the freesolv_data_set, and with the solvation_schema added to estimation_options before the host_guest_schema, the binding simulation was restricted to one GPU for the entire calculation.

I tried to fix this problem by switching two things: the line order of the schema additions and which data set is merged into the other. In swapping both of my original orders, the problem has completely gone away. The solvation and binding run simultaneously until the solvation is complete, at which point the binding calculation is able to utilize all four of my available workers.

So what I'm wondering is: is this part of normal operation? Do I need to make sure I load certain schemas first, or merge certain datasets into their counterparts and not vice versa? In the future I plan on testing the job with just one of my two fixes implemented, to see which one was actually responsible for correcting the GPU usage. Below is my code, with the relevant lines marked with asterisks.

    freesolv_data_set = PhysicalPropertyDataSet.from_pandas(molecule)
    
    host_guest_data_set = TaproomDataSet(
        #####
    )

*** freesolv_data_set.merge(host_guest_data_set)
    #FIXED VERSION:
    #host_guest_data_set.merge(freesolv_data_set)
    
    solvation_schema = SolvationFreeEnergy.default_simulation_schema(use_implicit_solvent=True)
    
    APR_settings = APRSimulationSteps(
        #####
    )
    host_guest_schema = HostGuestBindingAffinity.default_paprika_schema(
        simulation_settings=APR_settings,
        use_implicit_solvent=True,
        enable_hmr=False,
    )
    
    estimation_options = RequestOptions()
    estimation_options.calculation_layers = ["SimulationLayer"]
*** estimation_options.add_schema(
        "SimulationLayer", "SolvationFreeEnergy", solvation_schema
    )
*** estimation_options.add_schema(
        "SimulationLayer", "HostGuestBindingAffinity", host_guest_schema
    )
    #FIXED VERSION:
    #Swapped order of the two starred .add_schema methods to have host_guest_schema go first

    print("All schemas were added to estimation_options")

    # Create Pool of Dask Workers
    calculation_backend = DaskLocalCluster(
        number_of_workers=4,
        resources_per_worker=ComputeResources(
            number_of_threads=1,
            number_of_gpus=1,
            preferred_gpu_toolkit=ComputeResources.GPUToolkit.CUDA,
        ),
    )
    calculation_backend.start()

A little bug in unit toolkit (Tutorial 03)

In the "Plotting the Results" part, cell [7] originally has:

from openff.units import unit

It should be:

from openff.evaluator import unit

I guess there was an update to "unit". Same question as in Tutorial 01.

Running EvaluatorServer on a local machine for PBS jobs.

I'm running ForceBalance with Evaluator and I stumbled across a small technical issue. Using the script below, I launch the server in the background, and the process is killed after the ForceBalance run is complete. However, if the job crashes or gets terminated, the kill command will not be invoked, and PBS/Torque is not able to terminate processes running in the background. So the server continues to run; I checked this with the TSCC admins. The servers I spawned (that didn't get killed) overloaded a node, which they had to restart. Is there a way to run the EvaluatorServer on a local machine and send the dask-workers to a remote cluster? Or is there a simple solution to prevent the server from running after the PBS job is terminated?

# Start the estimation server.
python evaluator_FB.py &> server_console_output.log &
echo $! > save_pid.txt

sleep 60

# Run ForceBalance.
ForceBalance.py optimize.in &> force_balance.log

# Kill the server.
kill -9 `cat save_pid.txt`
rm save_pid.txt
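One possible direction (a hypothetical sketch, not a tested recipe) is to replace the shell wrapper with a small Python wrapper that ties the server's lifetime to the job: register cleanup with atexit and convert the SIGTERM that PBS sends on termination into a clean exit, so the server child is killed even when ForceBalance crashes:

#!/usr/bin/env python
"""Hypothetical wrapper tying the Evaluator server's lifetime to the PBS job."""
import atexit
import signal
import subprocess
import sys
import time

# Launch the server script in the background and register cleanup that also
# runs when PBS delivers SIGTERM, unlike a trailing `kill` in a job script.
server = subprocess.Popen(["python", "evaluator_FB.py"])
atexit.register(server.kill)
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))

time.sleep(60)  # give the server time to start

subprocess.run(["ForceBalance.py", "optimize.in"], check=False)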

Issue with dask worker restarting/continuing a job

I've just noticed an issue with Evaluator restarting/picking up a job. It gave a similar error to issue #224 about appending to an empty binary file, but it may be caused by something else. The log file for the worker of the failed job shows a 7:56 timestamp:

07:55:53.738 INFO     Running window 1 out of 59
07:55:53.889 INFO     Executing npt_equilibration
07:55:54.199 INFO     Setting up an openmm platform on GPU 0
07:56:00.527 INFO     No checkpoint files were found.
07:56:30.478 INFO     Protocol failed to execute: npt_equilibration
07:56:30.526 INFO     Protocol failed to execute: 45ab11237d9b4db899cb5418c9d5a5be|host_guest_free_energy_0
07:56:38.933 INFO     Executing 1307506edaf9477882944919aa4e8f65|host_guest_free_energy_1

Tracing back to the previous worker that dealt with this job indicates that it quit at 6:13 because it exceeded the maximum wallclock limit:

06:13:49.829 INFO     Executing npt_equilibration
06:13:49.831 INFO     Setting up an openmm platform on GPU 0
06:13:51.863 INFO     No checkpoint files were found.

and the files in the directory are

-rw-r--r-- 1 jsetiadi gibbs-group 548306 Jul 10 06:13 input.pdb
-rw-r--r-- 1 jsetiadi gibbs-group      0 Jul 10 06:13 openmm_statistics.csv 
-rw-r--r-- 1 jsetiadi gibbs-group      0 Jul 10 06:13 trajectory.dcd 

It looks like the worker dies before the DCDReporter/StateDataReporter could write anything to file. So then the next worker that picks up this job will definitely complain about appending to an empty trajectory.dcd.

As a potential fix, what do you think of checking not only whether trajectory.dcd exists, but also its file size? So changing

append_trajectory = os.path.isfile(self._local_trajectory_path)
dcd_reporter = app.DCDReporter(
    self._local_trajectory_path, 0, append_trajectory
)

to

append_trajectory = (
    os.path.isfile(self._local_trajectory_path)
    and os.path.getsize(self._local_trajectory_path) != 0
)
dcd_reporter = app.DCDReporter(
    self._local_trajectory_path, 0, append_trajectory
)

The same logic could potentially be applied to checkpoint_state.xml and checkpoint.json, so that if a checkpoint file is empty, that particular window simply restarts from the beginning.
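A small hypothetical helper could centralize that check for the trajectory and both checkpoint files:

import os

def _has_content(path):
    """Hypothetical helper: treat zero-byte trajectory/checkpoint files as absent."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

# e.g. append_trajectory = _has_content(self._local_trajectory_path)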

to/from_pandas does not roundtrip

I cannot create a ThermoMLDataSet from a pandas dataframe that was created from a dataset.

>>> df = dataset.to_pandas()
>>> ThermoMLDataSet.from_pandas(df)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/var/folders/rv/j6lbln6j0kvb5svxj8wflc400000gn/T/ipykernel_34462/444742739.py in <module>
      1 df = dataset.to_pandas()
----> 2 ThermoMLDataSet.from_pandas(df)

~/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/datasets/datasets.py in from_pandas(cls, data_frame)
    555         for match in property_header_matches:
    556 
--> 557             assert match
    558 
    559             property_type_string, property_unit_string = match.groups()

AssertionError: 

Diagnostics

It dies on matching ExcessMolarVolume Value (cm ** 3 / mol), because the match pattern does not allow asterisks.

>>> import re
>>> property_header_matches = {
            (header, re.match(r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+-/\s]*)\)$", header))
            for header in df
            if header.find(" Value ") >= 0
        }
>>> property_header_matches
{('Density Value (g / ml)',
  <re.Match object; span=(0, 22), match='Density Value (g / ml)'>),
 ('DielectricConstant Value ()',
  <re.Match object; span=(0, 27), match='DielectricConstant Value ()'>),
 ('EnthalpyOfMixing Value (kJ / mol)',
  <re.Match object; span=(0, 33), match='EnthalpyOfMixing Value (kJ / mol)'>),
 ('EnthalpyOfVaporization Value (kJ / mol)',
  <re.Match object; span=(0, 39), match='EnthalpyOfVaporization Value (kJ / mol)'>),
 ('ExcessMolarVolume Value (cm ** 3 / mol)', None)}

Suggestion

        property_header_matches = {
---            re.match(r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+-/\s]*)\)$", header)
+++            re.match(r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+*-/\s]*)\)$", header)
            for header in data_frame
            if header.find(" Value ") >= 0
        }

Or get rid of the check altogether, as new exciting units arise. (I notice no allowance for exponents, for example, even though kJ/mol and kJ mol^-1 should be equivalent.)
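A quick check that the amended pattern accepts the exponent form that currently fails:

import re

pattern = r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+*-/\s]*)\)$"
match = re.match(pattern, "ExcessMolarVolume Value (cm ** 3 / mol)")
print(match.groups())  # ('ExcessMolarVolume', 'cm ** 3 / mol')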

[Request] allow evaluator to work with non-_TransformedDict dictionaries

Here:

for parameter in labelled_molecule[parameter_key.tag].store.values():

.store is an attribute specific to the OpenFF Toolkit's _TransformedDict. Given that __iter__ is defined, I think you could make this more generic quite easily:

--- for parameter in labelled_molecule[parameter_key.tag].store.values():
+++ for parameter in labelled_molecule[parameter_key.tag].values():

Making it more generic opens the Evaluator workflow to all ParameterHandlers that do not return _TransformedDict subclasses but plain dicts, e.g. LibraryChargeHandler, or, more generically, a custom ParameterHandler plugin.

Importing ABCs from `collections` is deprecated

This will stop working soon:

/dfs6/pub/lilyw7/pydev/openff-evaluator/openff/evaluator/layers/layers.py:229: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  if len(results) > 0 and isinstance(results[0], collections.Iterable)
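The fix is a one-line change to the import location; collections.abc has been the canonical home since Python 3.3:

import collections.abc

def _first_result_is_iterable(results):
    # Forward-compatible version of the check quoted from layers.py above.
    return len(results) > 0 and isinstance(results[0], collections.abc.Iterable)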

Colab runs failing with import errors

Running the first tutorial on Colab fails with the following error:


NameError                                 Traceback (most recent call last)

<ipython-input-16-0eaab1ef33d9> in <module>()
----> 1 from openff.evaluator.datasets.thermoml import ThermoMLDataSet

11 frames

/usr/local/lib/python3.6/site-packages/openff/evaluator/datasets/thermoml/__init__.py in <module>()
----> 1 from .thermoml import ThermoMLDataSet
      2 
      3 from .plugins import register_thermoml_property, thermoml_property  # isort:skip
      4 
      5 __all__ = [ThermoMLDataSet, register_thermoml_property, thermoml_property]

/usr/local/lib/python3.6/site-packages/openff/evaluator/datasets/thermoml/thermoml.py in <module>()
     13 import requests
     14 
---> 15 from openff.evaluator import unit
     16 from openff.evaluator.datasets import (
     17     MeasurementSource,

/usr/local/lib/python3.6/site-packages/openff/evaluator/__init__.py in <module>()
     11 
     12 # Load the default plugins
---> 13 register_default_plugins()
     14 # Load in any found external plugins.
     15 register_external_plugins()

/usr/local/lib/python3.6/site-packages/openff/evaluator/plugins.py in register_default_plugins()
     19 
     20     # Import the default properties.
---> 21     importlib.import_module("openff.evaluator.properties")
     22 
     23     # Import the default layers

/usr/lib/python3.7/importlib/__init__.py in import_module(name, package)
    125                 break
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128 
    129 

/usr/local/lib/python3.6/site-packages/openff/evaluator/properties/__init__.py in <module>()
----> 1 from .binding import HostGuestBindingAffinity
      2 from .density import Density, ExcessMolarVolume
      3 from .dielectric import DielectricConstant
      4 from .enthalpy import EnthalpyOfMixing, EnthalpyOfVaporization
      5 from .solvation import SolvationFreeEnergy

/usr/local/lib/python3.6/site-packages/openff/evaluator/properties/binding.py in <module>()
      9 from openff.evaluator.layers import register_calculation_schema
     10 from openff.evaluator.layers.simulation import SimulationLayer, SimulationSchema
---> 11 from openff.evaluator.protocols import (
     12     analysis,
     13     coordinates,

/usr/local/lib/python3.6/site-packages/openff/evaluator/protocols/analysis.py in <module>()
     21     bootstrap,
     22 )
---> 23 from openff.evaluator.utils.openmm import openmm_quantity_to_pint, system_subset
     24 from openff.evaluator.utils.timeseries import (
     25     TimeSeriesStatistics,

/usr/local/lib/python3.6/site-packages/openff/evaluator/utils/openmm.py in <module>()
      8 import numpy
      9 from pint import UndefinedUnitError
---> 10 from simtk import openmm
     11 from simtk import unit as simtk_unit
     12 

/usr/local/lib/python3.6/site-packages/simtk/openmm/__init__.py in <module>()
     17         'lib': version.openmm_library_path, 'path': _path}
     18 
---> 19 from simtk.openmm.openmm import *
     20 from simtk.openmm.vec3 import Vec3
     21 from simtk.openmm.mtsintegrator import MTSIntegrator, MTSLangevinIntegrator

/usr/local/lib/python3.6/site-packages/simtk/openmm/openmm.py in <module>()
     62 
     63 
---> 64 class ios_base(object):
     65     thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
     66 

/usr/local/lib/python3.6/site-packages/simtk/openmm/openmm.py in ios_base()
     68         raise AttributeError("No constructor defined")
     69     __repr__ = _swig_repr
---> 70     erase_event = _openmm.ios_base_erase_event
     71     imbue_event = _openmm.ios_base_imbue_event
     72     copyfmt_event = _openmm.ios_base_copyfmt_event

NameError: name '_openmm' is not defined

Can we standardize small molecule force field parameterization through SystemGenerator?

@SimonBoothroyd: We're approaching a new release of OpenMM that would allow us to use the new openmmforcefields.generators.SystemGenerator to generate parameterized systems, including small molecules, using a common interface for SMIRNOFF, GAFF, and potentially future force field residue template generator plugins. Could this play some role in the future of the property evaluator, allowing us to standardize the way we implement plugins that provide other small molecule force field types?

Because SystemGenerator currently uses the OpenMM ForceField class under the hood, it's a stop-gap solution until we can more fully switch to an all-openforcefield-toolkit infrastructure. But the concept of isolating all external small molecule parameter assignment engines behind a single API that permits new force fields (like CGenFF, MMFF, and others) to be "plugged in" could be useful if we can somehow share the infrastructure among projects.

`FilterDuplicates` unintentionally selects values without uncertainty if multiple are present

The FilterDuplicates class will always select a measurement without a reported uncertainty from a set of duplicate measurements, if one exists. This behavior is due to the default behavior of the pandas.sort_values method, which is to use na_position='last', putting any measurements with no uncertainty value at the bottom of the list:

uncertainty_header = value_header.replace("Value", "Uncertainty")

property_data = component_data[component_data[value_header].notna()]

if uncertainty_header in component_data:
    property_data = property_data.sort_values(uncertainty_header)

Then when pandas.drop_duplicates is called, with keep='last' (intending to keep the measurement with the highest uncertainty), it will select a measurement with no uncertainty value.

property_data = property_data.drop_duplicates(
    subset=subset_columns, keep="last"
)

I believe this can be fixed easily by using property_data.sort_values(uncertainty_header, na_position='first'), and I will open a PR to make this change.
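A toy pandas demonstration of the behaviour and the fix (synthetic data):

import pandas as pd

df = pd.DataFrame({
    "Density Value (g / ml)": [0.99, 0.99, 0.99],
    "Density Uncertainty (g / ml)": [0.01, 0.02, None],
})

# Default na_position="last": the NaN-uncertainty row sorts to the bottom,
# so keep="last" retains the measurement *without* a reported uncertainty.
kept = df.sort_values("Density Uncertainty (g / ml)").drop_duplicates(
    subset=["Density Value (g / ml)"], keep="last"
)

# With na_position="first", the largest reported uncertainty is kept instead.
fixed = df.sort_values(
    "Density Uncertainty (g / ml)", na_position="first"
).drop_duplicates(subset=["Density Value (g / ml)"], keep="last")

print(kept, fixed, sep="\n")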

Support OpenMM-style force fields

It would be great if we could extend the supported force field types to include OpenMM-style force fields. The main benefit of this would be using the water models that come with OpenMM, rather than having to make an OFFXML representation of each model. It would also be useful for projects like QUBEKit, which use the OpenMM format for small molecules as well.

`from_pandas` and `from_json` methods of `PhysicalPropertyDataSet` object return different objects

The from_pandas method of the openff.evaluator.datasets.PhysicalPropertyDataSet object returns an instance of PhysicalPropertyDataSet, but the from_json method returns a dictionary. These methods should both return the same type of object to avoid confusion.

Example code

from openff.evaluator.datasets import PhysicalPropertyDataSet
import pandas


csv_file = 'alcohol-alkane-test.csv'
json_file = 'alcohol-alkane-test.json'

data_csv = pandas.read_csv(csv_file)
data_csv['Id'] = data_csv['Id'].astype('string')

pandas_dataset = PhysicalPropertyDataSet.from_pandas(data_csv)
json_dataset = PhysicalPropertyDataSet.from_json(json_file)

print(f'From Pandas: \n {type(pandas_dataset)}')
print(f'From json: \n {type(json_dataset)}')

Output

From Pandas: 
 <class 'openff.evaluator.datasets.datasets.PhysicalPropertyDataSet'>
From json: 
 <class 'dict'>

Conda env

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
absl-py                   0.13.0                   pypi_0    pypi
alabaster                 0.7.12                     py_0    conda-forge
alsa-lib                  1.2.3                h516909a_0    conda-forge
amberlite                 16.0                     pypi_0    pypi
ambertools                20.9                     pypi_0    pypi
argon2-cffi               20.1.0           py38h497a2fe_2    conda-forge
arpack                    3.7.0                hc6cf775_2    conda-forge
arviz                     0.11.2             pyhd8ed1ab_1    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
babel                     2.9.1              pyh44b312d_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
blas                      2.16                        mkl    conda-forge
bleach                    4.0.0              pyhd8ed1ab_0    conda-forge
blosc                     1.21.0               h9c3ff4c_0    conda-forge
bokeh                     2.3.3            py38h578d9bd_0    conda-forge
boost                     1.74.0           py38hc10631b_3    conda-forge
boost-cpp                 1.74.0               h312852a_4    conda-forge
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.2               h7f98852_0    conda-forge
ca-certificates           2021.5.30            ha878542_0    conda-forge
cachetools                4.2.2                    pypi_0    pypi
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
cerberus                  1.3.2                      py_0    conda-forge
certifi                   2021.5.30        py38h578d9bd_0    conda-forge
cffi                      1.14.6           py38ha65f79e_0    conda-forge
cftime                    1.5.0            py38hb5d20a5_0    conda-forge
chardet                   4.0.0            py38h578d9bd_1    conda-forge
charset-normalizer        2.0.0              pyhd8ed1ab_0    conda-forge
clang                     5.0                      pypi_0    pypi
click                     8.0.1            py38h578d9bd_0    conda-forge
cloudpickle               1.6.0                      py_0    conda-forge
clusterutils              0.3.1              pyhd8ed1ab_1    conda-forge
cmiles                    0.1.6                ha770c72_2    conda-forge
cmiles-base               0.1.6              pyhd8ed1ab_2    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
conda                     4.10.3           py38h578d9bd_0    conda-forge
conda-package-handling    1.7.3            py38h497a2fe_0    conda-forge
cryptography              3.4.7            py38ha5dfef3_0    conda-forge
cudatoolkit               11.1.74              h6bb024c_0    nvidia
curl                      7.78.0               hea6ffbf_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
cython                    0.29.24          py38h709712a_0    conda-forge
cytoolz                   0.11.0           py38h497a2fe_3    conda-forge
dask                      2021.8.1           pyhd8ed1ab_0    conda-forge
dask-core                 2021.8.1           pyhd8ed1ab_0    conda-forge
dask-jobqueue             0.7.3              pyhd8ed1ab_0    conda-forge
dataclasses               0.6                      pypi_0    pypi
dbus                      1.13.6               h48d8840_2    conda-forge
debugpy                   1.4.1            py38h709712a_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
deprecated                1.2.12                   pypi_0    pypi
distributed               2021.8.1         py38h578d9bd_0    conda-forge
dm-tree                   0.1.6                    pypi_0    pypi
docopt                    0.6.2                      py_1    conda-forge
docutils                  0.17.1           py38h578d9bd_0    conda-forge
entrypoints               0.3             py38h32f6830_1002    conda-forge
expat                     2.4.1                h9c3ff4c_0    conda-forge
fftw                      3.3.9                h27cfd23_1  
flatbuffers               1.12                     pypi_0    pypi
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
fsspec                    2021.7.0           pyhd8ed1ab_0    conda-forge
future                    0.18.2           py38h578d9bd_3    conda-forge
gast                      0.4.0                    pypi_0    pypi
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
glib                      2.68.4               h9c3ff4c_0    conda-forge
glib-tools                2.68.4               h9c3ff4c_0    conda-forge
google-auth               1.35.0                   pypi_0    pypi
google-auth-oauthlib      0.4.5                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
gpflow                    2.2.1                    pypi_0    pypi
gpytorch                  1.5.0                    pypi_0    pypi
greenlet                  1.1.1            py38h709712a_0    conda-forge
grpcio                    1.39.0                   pypi_0    pypi
gst-plugins-base          1.18.4               hf529b03_2    conda-forge
gstreamer                 1.18.4               h76c114f_2    conda-forge
h5py                      3.1.0                    pypi_0    pypi
hdf4                      4.2.15               h10796ff_3    conda-forge
hdf5                      1.10.6          nompi_h7c3c948_1111    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       68.1                 h58526e2_0    conda-forge
idna                      3.1                pyhd3deb0d_0    conda-forge
imagesize                 1.2.0                      py_0    conda-forge
importlib-metadata        4.6.4            py38h578d9bd_0    conda-forge
importlib_metadata        4.6.4                hd8ed1ab_0    conda-forge
importlib_resources       5.2.2              pyhd8ed1ab_0    conda-forge
intel-openmp              2021.3.0          h06a4308_3350  
ipykernel                 6.2.0            py38he5a9106_0    conda-forge
ipython                   7.26.0           py38he5a9106_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.3              pyhd3deb0d_0    conda-forge
jax                       0.2.19                   pypi_0    pypi
jaxlib                    0.1.70+cuda111           pypi_0    pypi
jbig                      2.1               h7f98852_2003    conda-forge
jedi                      0.18.0           py38h578d9bd_2    conda-forge
jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
joblib                    1.0.1                    pypi_0    pypi
jpeg                      9d                   h36c2ea0_0    conda-forge
jsonschema                3.2.0            py38h32f6830_1    conda-forge
jupyter                   1.0.0            py38h578d9bd_6    conda-forge
jupyter_client            6.1.12             pyhd8ed1ab_0    conda-forge
jupyter_console           6.4.0              pyhd8ed1ab_0    conda-forge
jupyter_core              4.7.1            py38h578d9bd_0    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_widgets        1.0.0              pyhd8ed1ab_1    conda-forge
keras                     2.6.0                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.3.1            py38h1fd1430_1    conda-forge
krb5                      1.19.2               hcc1bbae_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      2.2.1                h9c3ff4c_0    conda-forge
libarchive                3.5.1                hccf745f_2    conda-forge
libblas                   3.8.0                    16_mkl    conda-forge
libcblas                  3.8.0                    16_mkl    conda-forge
libclang                  11.1.0          default_ha53f305_1    conda-forge
libcurl                   7.78.0               h2574ce0_0    conda-forge
libdeflate                1.7                  h7f98852_5    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 11.1.0               hc902ee8_8    conda-forge
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libglib                   2.68.4               h3e27bee_0    conda-forge
libgomp                   11.1.0               hc902ee8_8    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.8.0                    16_mkl    conda-forge
liblapacke                3.8.0                    16_mkl    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libllvm11                 11.1.0               hf817b99_2    conda-forge
libnetcdf                 4.7.4           nompi_h56d31a8_107    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     13.3                 hd57d9b9_0    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsolv                   0.7.19               h780b84a_5    conda-forge
libssh2                   1.9.0                ha56f1ee_6    conda-forge
libstdcxx-ng              11.1.0               h56837e0_8    conda-forge
libtiff                   4.3.0                hf544144_1    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.42.0               h7f98852_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.12               h72842e0_0    conda-forge
libxslt                   1.1.33               h15afd5d_2    conda-forge
lj-surrogates             0.0.0                     dev_0    <develop>
llvmlite                  0.36.0           py38h4630a5e_0    conda-forge
locket                    0.2.0                      py_2    conda-forge
lxml                      4.6.3            py38hf1fe3a4_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     0.15.3           py38h2aa5da1_0    conda-forge
markdown                  3.3.4                    pypi_0    pypi
markupsafe                2.0.1            py38h497a2fe_0    conda-forge
matplotlib-base           3.4.3            py38hf4fb855_0    conda-forge
matplotlib-inline         0.1.2              pyhd8ed1ab_2    conda-forge
mdtraj                    1.9.6            py38hf01b267_1    conda-forge
mistune                   0.8.4           py38h497a2fe_1004    conda-forge
mkl                       2020.2                      256  
mmpbsa-py                 16.0                     pypi_0    pypi
mock                      4.0.3            py38h578d9bd_1    conda-forge
mpi                       1.0                       mpich    conda-forge
mpich                     3.3.2                h846660c_5    conda-forge
mpiplus                   v0.0.1          py38h32f6830_1002    conda-forge
msgpack-python            1.0.2            py38h1fd1430_1    conda-forge
multipledispatch          0.6.0                    pypi_0    pypi
mysql-common              8.0.25               ha770c72_2    conda-forge
mysql-libs                8.0.25               hfa10184_2    conda-forge
nbclient                  0.5.4              pyhd8ed1ab_0    conda-forge
nbconvert                 6.1.0            py38h578d9bd_0    conda-forge
nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nest-asyncio              1.5.1              pyhd8ed1ab_0    conda-forge
netcdf-fortran            4.5.3           nompi_hfef6a68_101    conda-forge
netcdf4                   1.5.6           nompi_py38hf887595_102    conda-forge
networkx                  2.6.2              pyhd8ed1ab_0    conda-forge
nose                      1.3.7           py38h32f6830_1004    conda-forge
notebook                  6.4.3              pyha770c72_0    conda-forge
nspr                      4.30                 h9c3ff4c_0    conda-forge
nss                       3.69                 hb5efdd6_0    conda-forge
numba                     0.53.1           py38h8b71fd7_1    conda-forge
numexpr                   2.7.3            py38h51da96c_0    conda-forge
numpy                     1.19.5                   pypi_0    pypi
numpydoc                  1.1.0                      py_1    conda-forge
numpyro                   0.7.2                    pypi_0    pypi
oauthlib                  3.1.1                    pypi_0    pypi
ocl-icd                   2.3.1                h7f98852_0    conda-forge
ocl-icd-system            1.0.0                         1    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openff-evaluator          0.3.4              pyhd8ed1ab_0    conda-forge
openff-forcefields        2.0.0              pyh6c4a22f_0    conda-forge
openff-toolkit            0.10.0             pyhd8ed1ab_0    conda-forge
openff-toolkit-base       0.10.0             pyhd8ed1ab_0    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openmm                    7.5.1            py38hafe6fa4_1    conda-forge
openmmtools               0.20.3             pyhd8ed1ab_0    conda-forge
openmoltools              0.8.7              pyhd8ed1ab_0    conda-forge
openssl                   1.1.1k               h7f98852_1    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 21.0               pyhd8ed1ab_0    conda-forge
packmol                   20.010               h6e990d7_0    conda-forge
packmol-memgen            1.1.0rc0                 pypi_0    pypi
pandas                    1.3.2            py38h43a58ef_0    conda-forge
pandoc                    2.14.1               h7f98852_0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parmed                    at20RC5+54.g5702a232fe.dirty          pypi_0    pypi
parso                     0.8.2              pyhd8ed1ab_0    conda-forge
partd                     1.2.0              pyhd8ed1ab_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pdb4amber                 1.7.dev0                 pypi_0    pypi
pdbfixer                  1.7                pyhd3deb0d_0    conda-forge
perl                      5.32.1          0_h7f98852_perl5    conda-forge
pexpect                   4.8.0            py38h32f6830_1    conda-forge
pickleshare               0.7.5           py38h32f6830_1002    conda-forge
pillow                    8.3.1            py38h8e6f84c_0    conda-forge
pint                      0.17               pyhd8ed1ab_0    conda-forge
pip                       21.2.4             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
prometheus_client         0.11.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.19             pyha770c72_0    conda-forge
prompt_toolkit            3.0.19               hd8ed1ab_0    conda-forge
protobuf                  3.17.3                   pypi_0    pypi
psutil                    5.8.0            py38h497a2fe_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycairo                   1.20.1           py38hf61ee4a_0    conda-forge
pycosat                   0.6.3           py38h497a2fe_1006    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pydantic                  1.8.2            py38h497a2fe_0    conda-forge
pydoe2                    1.3.0                    pypi_0    pypi
pygments                  2.10.0             pyhd8ed1ab_0    conda-forge
pymbar                    3.0.5            py38h5c078b8_2    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.12.3           py38h578d9bd_7    conda-forge
pyqt-impl                 5.12.3           py38h7400c14_7    conda-forge
pyqt5-sip                 4.19.18          py38h709712a_7    conda-forge
pyqtchart                 5.12             py38h7400c14_7    conda-forge
pyqtwebengine             5.12.1           py38h7400c14_7    conda-forge
pyro-api                  0.1.2                    pypi_0    pypi
pyro-ppl                  1.7.0                    pypi_0    pypi
pyrsistent                0.17.3           py38h497a2fe_2    conda-forge
pysocks                   1.7.1            py38h578d9bd_3    conda-forge
pytables                  3.6.1            py38hc386592_3    conda-forge
python                    3.8.10          h49503c6_1_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.10.0.dev20210824 py3.8_cuda11.1_cudnn8.0.5_0    pytorch-nightly
pytraj                    2.0.5                    pypi_0    pypi
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
pyyaml                    5.4.1            py38h497a2fe_1    conda-forge
pyzmq                     22.2.1           py38h2035c66_0    conda-forge
qt                        5.12.9               hda022c4_4    conda-forge
qtconsole                 5.1.1              pyhd8ed1ab_0    conda-forge
qtpy                      1.10.0             pyhd8ed1ab_0    conda-forge
rdkit                     2021.03.4        py38hf8acc3d_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
reportlab                 3.5.68           py38hadf75a6_0    conda-forge
reproc                    14.2.1               h36c2ea0_0    conda-forge
reproc-cpp                14.2.1               h58526e2_0    conda-forge
requests                  2.26.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.7.2                    pypi_0    pypi
ruamel_yaml               0.15.80         py38h497a2fe_1004    conda-forge
sander                    16.0                     pypi_0    pypi
scikit-learn              0.24.2                   pypi_0    pypi
scipy                     1.5.3            py38h828c644_0    conda-forge
seaborn                   0.11.2               hd8ed1ab_0    conda-forge
seaborn-base              0.11.2             pyhd8ed1ab_0    conda-forge
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                57.4.0           py38h578d9bd_0    conda-forge
six                       1.15.0                   pypi_0    pypi
smirnoff99frosst          1.1.0              pyh44b312d_0    conda-forge
smt                       1.0.0                    pypi_0    pypi
snappy                    1.1.8                he1b5a44_3    conda-forge
snowballstemmer           2.1.0              pyhd8ed1ab_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
sphinx                    4.1.2              pyh6c4a22f_1    conda-forge
sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_0    conda-forge
sqlalchemy                1.4.23           py38h497a2fe_0    conda-forge
sqlite                    3.36.0               h9cd32fc_0    conda-forge
statsmodels               0.12.2           py38h5c078b8_0    conda-forge
tabulate                  0.8.9                    pypi_0    pypi
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
tensorboard               2.6.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
tensorflow                2.6.0                    pypi_0    pypi
tensorflow-estimator      2.6.0                    pypi_0    pypi
tensorflow-probability    0.13.0                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
terminado                 0.11.1           py38h578d9bd_0    conda-forge
testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             2.2.0                    pypi_0    pypi
tk                        8.6.11               h21135ba_0    conda-forge
toolz                     0.11.1                     py_0    conda-forge
tornado                   6.1              py38h497a2fe_1    conda-forge
tqdm                      4.62.2             pyhd8ed1ab_0    conda-forge
traitlets                 5.0.5                      py_0    conda-forge
typing-extensions         3.7.4.3                  pypi_0    pypi
typing_extensions         3.10.0.0           pyha770c72_0    conda-forge
uncertainties             3.1.6              pyhd8ed1ab_0    conda-forge
urllib3                   1.26.6             pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
werkzeug                  2.0.1                    pypi_0    pypi
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
widgetsnbextension        3.5.1            py38h578d9bd_4    conda-forge
wrapt                     1.12.1                   pypi_0    pypi
xarray                    0.19.0             pyhd8ed1ab_1    conda-forge
xmltodict                 0.12.0                     py_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
yank                      0.25.2             pyhd8ed1ab_0    conda-forge
zeromq                    4.3.4                h9c3ff4c_0    conda-forge
zict                      2.0.0                      py_0    conda-forge
zipp                      3.5.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge

Example script and files are also attached.
dataset_issue.zip

Refactor out the use of CMILES

I notice that this project is the only one in our stack that currently (explicitly) uses CMILES. Given that it's no longer maintained and all of its functionality should be available in the toolkit, we should probably remove it altogether at some point.
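
For context, a sketch of the toolkit equivalent for the most common CMILES use (generating canonical mapped SMILES), assuming a recent openff-toolkit; the molecule is just an example:

# Replacing a typical cmiles call with the toolkit equivalent.
from openff.toolkit.topology import Molecule

molecule = Molecule.from_smiles("CCO")
mapped_smiles = molecule.to_smiles(mapped=True)  # canonical, atom-mapped SMILES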

Filtering w/o Pandas?

Have you considered filtering data sets without converting to pandas under the hood? It can be difficult to hold both representations in memory at the same time, especially as the dataframe is in wide format. This would also allow for progress bars. My intuition is that creating an index mask array from the properties directly and creating a new dataset from that would be substantially faster than converting to and from pandas.

Edit: happy to implement this and benchmark it, but I'm not sure how many users are relying on using dataframes for filtering.

Edit: It's also somewhat odd that the filters don't take units, given the emphasis placed on units elsewhere by OpenFF. While this is clearly because of the units imposed by the dataframe, it's not at all clear if you start the filtering with a dataset, e.g.

pressure_filter = FilterByPressure.apply(
    data_set,
    FilterByPressureSchema(minimum_pressure=101.224, maximum_pressure=101.426)
)
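
For comparison, a hypothetical sketch of mask-style filtering that iterates the properties directly and takes explicit units; filter_by_pressure here is illustrative, not an existing Evaluator API:

# A hypothetical sketch, not an existing Evaluator API: filter a
# PhysicalPropertyDataSet by iterating its properties directly, with
# explicit units instead of dataframe-imposed ones.
from openff.evaluator import unit
from openff.evaluator.datasets import PhysicalPropertyDataSet

def filter_by_pressure(data_set, minimum_pressure, maximum_pressure):
    filtered = PhysicalPropertyDataSet()
    filtered.add_properties(
        *(
            physical_property
            for physical_property in data_set.properties
            if physical_property.thermodynamic_state.pressure is not None
            and minimum_pressure
            <= physical_property.thermodynamic_state.pressure
            <= maximum_pressure
        )
    )
    return filtered

filtered_set = filter_by_pressure(
    data_set, 101.224 * unit.kilopascal, 101.426 * unit.kilopascal
)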

Problems with host-guest calculations -- paprika

I'm having some issues when running HG calculations with the paprika_integration branch, separate from issue #224, and I don't think I have the necessary expertise to fix these bugs (I don't understand dask).

Issue 1

With commit aef4e06, the calculation for releasing the host restraints is only performed once; before this, the release calculations were repeated for each HG system (redundant). Basically, when I run the calculations on a cluster, evaluator only succeeds in calculating one HG system while the rest fail to run. I also tried running this locally, and the log file shows that it fails when evaluator tries to add the energies from the different phases (add_per_orientation_free_energies). evaluator spits out the following error:

13:02:34.874 INFO     Exceptions were raised while executing batch d553ea07ca164dd79b3030e6f60fb860
13:02:34.874 INFO     e4ed6f0a0fa44553901012448561208e|add_per_orientation_free_energies_0 failed to execute.

Traceback (most recent call last):
  File "/home/openff-evaluator/openff/evaluator/workflow/protocols.py", line 1245, in _execute_protocol
    protocol.execute(directory, available_resources)
  File "/home/openff-evaluator/openff/evaluator/workflow/protocols.py", line 704, in execute
    self._execute(directory, available_resources)
  File "/home/openff-evaluator/openff/evaluator/protocols/miscellaneous.py", line 49, in _execute
    self.result += value
TypeError: unsupported operand type(s) for +=: 'UndefinedAttribute' and 'UndefinedAttribute'

So... the directory for the host-only calculation is not being funneled properly to the workers, and hence the add_per_orientation_free_energies protocol fails?

Issue 2

This problem is more subtle, as there is no error printed by evaluator. When I run the HG calculations without the commit mentioned above (i.e. aef4e06), the program runs as expected. However, when analyzing the free energies I noticed that the results (off-v1.2.0_bcd.zip) for bcd-m4t and bcd-m4c are exactly the same (-5.710330040754208 ± 0.6205644212487271 kcal/mol). I tried running the commands below inside the worker-logs directory

  • grep m4t *
  • grep m4c *

and found that nothing was returned for the m4c case. Thus, evaluator is only calculating the bcd-m4t system but is assigning the results it calculated for that system to bcd-m4c as well. Is this a bug in the task graph in evaluator?
(Note: I did not observe any duplicate results with calculations I did at the start of 2020)

RecursionError in LocalFileStorage

@property
def root_directory(self):
    """str: Returns the directory in which all stored objects are located."""
    return self.root_directory

This recursive call means the property is not actually usable:

>>> from openff.evaluator.storage import LocalFileStorage
>>> lfs = LocalFileStorage()
>>> lfs.root_directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lily/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/storage/localfile.py", line 21, in root_directory
    return self.root_directory
  File "/Users/lily/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/storage/localfile.py", line 21, in root_directory
    return self.root_directory
  File "/Users/lily/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/storage/localfile.py", line 21, in root_directory
    return self.root_directory
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded

My suggestion would be to simply not hide attributes behind getters/setters, as I think this adds maintenance burden and is liable to bugs such as the one reported here. (The attribute actually used is _root_directory.)
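
For completeness, the one-line fix would presumably look like this (a sketch against the snippet above):

@property
def root_directory(self):
    """str: The directory in which all stored objects are located."""
    # Return the backing private attribute rather than recursing into
    # the property itself.
    return self._root_directory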

Default keyword arguments result in error

In generate_default_metadata, the default parameter_gradient_keys=None results in an error.

@staticmethod
def generate_default_metadata(
    physical_property,
    force_field_path,
    parameter_gradient_keys=None,
    target_uncertainty=None,
):

~/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/workflow/workflow.py in generate_default_metadata(physical_property, force_field_path, parameter_gradient_keys, target_uncertainty)
    684         # Find only those gradient keys which will actually be relevant to the
    685         # property of interest
--> 686         relevant_gradient_keys = Workflow._find_relevant_gradient_keys(
    687             physical_property.substance, force_field_path, parameter_gradient_keys
    688         )

~/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/workflow/workflow.py in _find_relevant_gradient_keys(substance, force_field_path, parameter_gradient_keys)
    568 
    569         # noinspection PyTypeChecker
--> 570         if parameter_gradient_keys == UNDEFINED or len(parameter_gradient_keys) == 0:
    571             return []
    572 

TypeError: object of type 'NoneType' has no len()

Ideally, default argument values should be ones that "just work". I'd suggest defaulting to parameter_gradient_keys=UNDEFINED instead.
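
A sketch of the suggested change, assuming UNDEFINED comes from openff.evaluator.attributes as in the check quoted above:

from openff.evaluator.attributes import UNDEFINED

@staticmethod
def generate_default_metadata(
    physical_property,
    force_field_path,
    parameter_gradient_keys=UNDEFINED,  # was None, which breaks the len() check
    target_uncertainty=None,
):
    ...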

Getting Attribute error when running host-guest calculations on PBS

I'm running into a problem when I run Evaluator on a cluster with PBS for the pAPRika host-guest calculations. I checked the working folder and Evaluator does calculate the binding free energy. However, the program quits afterward (after all workers have completed their jobs) and spits out the following error:

Traceback (most recent call last):
  File "run_pbs.py", line 118, in <module>
    main()
  File "run_pbs.py", line 110, in main
    root_directory="workflow", calculation_backend=calculation_backend,
  File "/projects/gilson-kirkwood/jsetiadi/anaconda3_tscc/envs/evaluator/lib/python3.7/site-packages/distributed/client.py", line 225, in result
    raise exc.with_traceback(tb)
  File "/projects/gilson-onsager/jsetiadi/openff-evaluator/openff/evaluator/backends/dask.py", line 477, in _wrapped_function
    return_value = _Multiprocessor.run(function, *args, **kwargs) 
  File "/projects/gilson-onsager/jsetiadi/openff-evaluator/openff/evaluator/backends/dask.py", line 169, in run
    raise return_value[0]
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/projects/gilson-kirkwood/jsetiadi/anaconda3_tscc/envs/evaluator/lib/python3.7/multiprocessing/managers.py", line 234, in serve_client
    request = recv()
  File "/projects/gilson-kirkwood/jsetiadi/anaconda3_tscc/envs/evaluator/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/projects/gilson-onsager/jsetiadi/openff-evaluator/openff/evaluator/attributes/attributes.py", line 226, in __setstate__
    self._set_value(name, state[name])
  File "/projects/gilson-onsager/jsetiadi/openff-evaluator/openff/evaluator/attributes/attributes.py", line 166, in _set_value
    attribute._set_value(self, value)
  File "/projects/gilson-onsager/jsetiadi/openff-evaluator/openff/evaluator/attributes/attributes.py", line 361, in _set_value
    f"The {self._private_attribute_name[1:]} attribute can only accept "
ValueError: The value attribute can only accept values of type <class 'pint.measurement.Measurement'>
---------------------------------------------------------------------------

I don't get this error when I run the calculations on a local cluster. Any idea what's wrong?
host_guest_evaluator.zip

[Request] Add try/except to parsing _Compound.from_xml_node

Occasionally, parsing the IUPAC name fails. I'd like dataset creation to continue and skip that molecule, rather than dying completely. Things it has died on:

  • "graphite"
  • "(.+-.)-cis-3-hexenyl 2-methylbutyrate"

from DOIs (all from this bib file https://github.com/openforcefield/release-1-benchmarking/blob/master/physical_properties/physprop-benchmark-sources.bib):

  • 10.1021/je025641j
  • 10.1016/j.jct.2015.02.015

Relevant code in thermoml.py:

        if (
            smiles is None
            and len(common_identifier_nodes) > 0
            and common_identifier_nodes[0].text is not None
        ):
            # Convert the common name to a smiles pattern.
+           try:
+               smiles = cls.smiles_from_common_name(common_identifier_nodes[0].text)
+           except Exception:
+               return None  # or fall through to the logging `if` below

Solvation Free Energy calculations can generate lots of checkpoint data

I'm currently running a large set of solvation free energies with Evaluator, and it has generated a larger-than-expected amount of data (3.7 TB so far for 380 SFEs). This is mainly due to large (up to 30 GB) solvent checkpoint files (solvent*_checkpoint.nc, in the working-directory/SimulationLayer/request_name/*_conditional_group/*_run_yank folders). I think these checkpoint files can be deleted after each individual calculation finishes, but currently they are not deleted until the entire set of calculations finishes, so users running a large number of calculations could wind up with a problematic amount of intermediate data if they aren't aware.

Is it possible to clean up these checkpoint files after each individual calculation finishes?
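
In the meantime, a hypothetical workaround is to prune the checkpoints of finished calculations by hand; the glob pattern follows the paths described above and is an assumption about the layout:

# Delete YANK checkpoint files under finished calculation directories.
# Only run this on calculations that have completed.
import glob
import os

def remove_checkpoints(working_directory):
    pattern = os.path.join(working_directory, "**", "*_checkpoint.nc")
    for checkpoint_path in glob.glob(pattern, recursive=True):
        os.remove(checkpoint_path)

remove_checkpoints("working-directory/SimulationLayer")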

Including rationales, best practices, and pitfalls or open issues

At the moment, it looks as though we could use a place in the docs to provide rationales for those aspects of the procedures that have specific rationales and/or are rooted in an accepted "best practice".

Also, I think there may be room for notes on potential pitfalls (such as inadequate sampling of torsions in liquid-state simulations) and open issues (for example, whether to compute dielectric constants via dipole fluctuations vs. applied fields).

Progress bars?

Sometimes things take a really long time, and a progress bar is hugely helpful in planning your day. Would you be interested in adding progress bars to loopy actions such as from_file? I'd be happy to add some if desired.
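
As a sketch of what this could look like (tqdm is an assumption, and _parse_file stands in for the real per-file work):

# Wrap the per-file loop of a from_file-style routine so long-running
# parses report progress.
from tqdm import tqdm

def from_file(*file_paths):
    data_sets = []
    for file_path in tqdm(file_paths, desc="Parsing files"):
        data_sets.append(_parse_file(file_path))  # _parse_file is hypothetical
    return data_sets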

String representation of classes that hold configuration values

Hello,

I am (slowly) getting to know the code behind propertyestimator by running the examples interactively in a notebook. During that journey, I've been finding patterns relying on objects that could benefit from string representations (__repr__() and __str__()).

One example is propertyestimator.backends.backends.ComputeResources, which is the default option for the resources_per_worker keyword in propertyestimator.backends.DaskLocalClusterBackend. Without a string representation, the help()/? message for DaskLocalClusterBackend reads like this:

Init signature:
backends.DaskLocalClusterBackend(
    number_of_workers=1,
    resources_per_worker=<propertyestimator.backends.backends.ComputeResources object at 0x7f096b314dd8>,
)
Docstring:     
A property estimator backend which uses a dask `LocalCluster` to
run calculations.
Init docstring: Constructs a new DaskLocalClusterBackend
File:           ~/.local/anaconda/envs/openforcefield/lib/python3.7/site-packages/propertyestimator/backends/dask.py
Type:           type
Subclasses:     

This is not very informative in terms of default options. If ComputeResources had string reprs, the message could be printed like this:

Init signature:
backends.DaskLocalClusterBackend(
    number_of_workers=1,
    resources_per_worker=<propertyestimator.backends.backends.ComputeResources object with number_of_threads=1, number_of_gpus=0, preferred_gpu_toolkit=None, _gpu_device_indices=None at 0x7f096ab3b9b0>,
)
[...]

Example implementation:

class ComputeResourcesRepr(backends.ComputeResources):
    def __repr__(self):
        # Join the pickled state into "key: value" pairs for display.
        cfg = ', '.join(f'{k}: {v}' for k, v in self.__getstate__().items())
        return f'<{self.__module__}.{self.__class__.__name__} with {cfg} at 0x{id(self):02x}>'

Maybe there are more examples where this could be helpful, but I have yet to find them :) Would you welcome these changes as a PR once we list all potentially applicable classes?
