openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io

Home Page: http://openforcefield.org

License: MIT License

openff-toolkit's Introduction

The Open Force Field toolkit

The Open Force Field Toolkit, built by the Open Force Field Initiative, is a Python toolkit for the development and application of modern molecular mechanics force fields based on direct chemical perception and rigorous statistical parameterization methods.

The toolkit currently covers two main areas we have committed to stably maintain throughout their lifetimes:

Note: Prior to version 0.9.0, this toolkit and its associated repository were named openforcefield and used different import paths. For details on this change and migration instructions, see the release notes of version 0.9.0.

Documentation

Documentation for the Open Force Field Toolkit is hosted at readthedocs. Example notebooks are available in the examples/ directory and also hosted on the Open Force Field website.

How to cite

Please cite the OpenFF Toolkit using the Zenodo record of the latest release or the version that was used. The BibTeX reference of the latest release can be found here.

Installation

The Open Force Field Toolkit (openff-toolkit) is a Python toolkit, and supports Python 3.9 through 3.11.

Installing via Mamba/Conda

Detailed installation instructions can be found here.

Force Fields

Two major force field development efforts have been undertaken by the Open Force Field Initiative, with results hosted in separate repositories.

  • The Open Force Fields repository, which features the Parsley and Sage force field lines. These are the Open Force Field Initiative's efforts toward building new force fields. The initial parameters are taken from smirnoff99Frosst, but software and data produced by the Initiative's efforts have been used to refit parameter values and add new SMIRKS-based parameters.
  • The smirnoff99Frosst repository, which is descended from AMBER's parm99 force field as well as Merck-Frosst's parm@frosst. This line of force fields does not aim to alter parameter values, but is instead a test of accurately converting an atom type-based force field to the SMIRNOFF format.

Force fields from both of these packages are available in their respective GitHub repositories and also as conda packages. Tables detailing the individual file names/versions within these force field lines are in the README of each repository. By default, installing the Open Force Field toolkit using conda or the single-file toolkit installers will also install these conda packages. A plugin architecture is provided for other force field developers to produce python/conda packages that can be imported by the Open Force Field Toolkit as well.

Toolkit features

The SMIRKS Native Open Force Field (SMIRNOFF) format

This repository provides tools for using the SMIRKS Native Open Force Field (SMIRNOFF) specification, which currently supports an XML representation for force field definition files.

By convention, files containing XML representations of SMIRNOFF force fields carry .offxml extensions.

Example SMIRNOFF .offxml force field definitions can be found in openff/toolkit/data/test_forcefields/. These force fields are for testing only, and we neither record versions of these files, nor do we guarantee their correctness or completeness.

Working with SMIRNOFF parameter sets

SMIRNOFF force fields can be parsed by the ForceField class, which offers methods including create_openmm_system for exporting to OpenMM and create_interchange for exporting to other formats (GROMACS, Amber, LAMMPS) via Interchange.

# Load a molecule into the OpenFF Molecule object
from openff.toolkit import Molecule
from openff.toolkit.utils import get_data_file_path
sdf_file_path = get_data_file_path('molecules/ethanol.sdf')
molecule = Molecule.from_file(sdf_file_path)

# Create an OpenFF Topology object from the molecule
from openff.toolkit import Topology
topology = Topology.from_molecules(molecule)

# Load the latest OpenFF force field release: version 2.1.0, codename "Sage"
from openff.toolkit import ForceField
forcefield = ForceField('openff-2.1.0.offxml')

# Create an OpenMM system representing the molecule with SMIRNOFF-applied parameters
openmm_system = forcefield.create_openmm_system(topology)

# Create an Interchange object for representations in other formats
interchange = forcefield.create_interchange(topology)

Detailed examples of using SMIRNOFF with the toolkit can be found in the documentation.

Frequently asked questions (FAQ)

See FAQ.md for answers to a variety of common problems, such as:

  • Why do I need to provide molecules corresponding to the components of my system, or a Topology with bond orders?
  • Can I use an Amber, CHARMM, or GROMACS topology/coordinate file as a starting point for applying a SMIRNOFF force field?
  • What if I am starting from a PDB file?

Contributors

For a partial list of contributors, see the GitHub Contributors page. Others whose work constitutes significant contributions but did not make it into the git history include Shuzhe Wang.

openff-toolkit's People

Contributors

adalke, andrrizzi, bannanc, camizanette, cbayly13, davidlmobley, dependabot[bot], dgasmith, dotsdl, ethanholz, ijpulidos, j-wags, jaimergp, jchodera, joshhorton, jthorton, leeping, lilyminium, mattwthompson, ntbre, pavankum, pefrankel, pre-commit-ci[bot], richardjgowers, simonboothroyd, sukanyasasmal, trevorgokey, vtlim, yoshanuikabundi, ziyuanzhao2000

openff-toolkit's Issues

Raise exception when generic parameters are assigned

In SMIRNOFF99Frosst, generic parameters are set to fairly ridiculous values. For example, for the generic bond ([*:1]~[*:2], which matches any two atoms connected by any bond), the length is 4.0 angstroms and the force constant is 2,000 kcal/mol/angstrom**2. For our molecule sets we have eliminated molecules where these parameters are assigned; indeed, that is how we discovered which parameters were missing when developing smirnoff99Frosst to begin with. However, if these parameters are assigned to a molecule, the simulation will run without any kind of warning, and unless someone looks closely at the data they may never know there is a problem.

@davidlmobley and I agree that, as more people begin to use our force field, running quietly with these parameters is not ideal. Offline we discussed two options for how to handle this scenario:

  1. We could make the parameters slightly more reasonable and add a warning that generic parameters were assigned to the molecule.

  2. Default behavior is to have an exception raised when one of these parameters is assigned. There should be an option that allows users to run the simulation anyway where a warning would still be printed.

David and I think option 2 is the better solution, but this raises a couple of questions:

  • Currently generic parameters are assigned the number 1 in their category (i.e. b1 for Bonds and n1 for non-bonds). This numbering scheme comes from how we convert the "frocmodish" files into the smirnoff ffxml format, where each parameter is numbered in order. Should we maintain this convention and look at the parameter id, or should we assume the first parameter in any section is generic?

  • Should we keep the generic parameters as they are, or change the numbers to slightly more reasonable values, given that the default behavior will be to raise an error? If we change the values, what should we change them to?
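
Option 2 could be sketched roughly as follows. This is only an illustration, not toolkit code: the names check_generic_parameters, GenericParameterError and allow_generic are hypothetical, and the sketch assumes the first answer to the question above, namely that generic parameters are recognized by ids such as b1 and n1.

```python
import warnings


class GenericParameterError(Exception):
    """Raised when a generic (catch-all) parameter is assigned (hypothetical name)."""


# Assumption: generic parameters are recognized by their ids, following the
# "number 1 in each category" convention (b1 for Bonds, n1 for non-bonds).
GENERIC_IDS = frozenset({"b1", "n1"})


def check_generic_parameters(assigned_ids, generic_ids=GENERIC_IDS, allow_generic=False):
    """Return the sorted generic parameter ids found among assigned_ids.

    By default an exception is raised if any are found. With
    allow_generic=True a warning is issued instead, so the user can
    still run the simulation knowingly.
    """
    found = sorted(set(assigned_ids) & set(generic_ids))
    if not found:
        return found
    message = "Generic parameters were assigned: " + ", ".join(found)
    if not allow_generic:
        raise GenericParameterError(message)
    warnings.warn(message)
    return found
```

A caller would pass the ids of all parameters assigned to a molecule, and set allow_generic=True only to run in spite of the generic assignments.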

@cbayly13 could we get your input on this when you have time?

Remove requirement for unique atom names in reference molecules

Currently, generateTopology_from_oemol is used to turn reference molecules into Topology objects for matching against the provided OpenMM Topology when applying createSystem, and this builds a topology using atom names to create bonds. As a result, the reference OEMol objects must have unique atom names. We should revise how this is handled to remove that requirement.

Ultimately, I think we want to replace this method of generating topologies with the oemol_to_openmmTop in https://github.com/oess/oeommtools/blob/master/oeommtools/utils.py . However, I would have to check whether that will fix this specific issue.

Parsing issues with Chemical Environments

There are a few problems with chemical environments when it comes to substituted SMIRKS strings, such as "$ewg1". I wrote up a long commentary in the smarty issue tracker before remembering we're trying to move everything to this repository. Please see smarty issue #248.

Provide a ChemistryFixer for molecule prep assistance

(I thought we had an issue for this already, but I don't see one.)

There are two types of assistance we want to provide for dealing with chemistry of input molecules:

  1. Sanity checks that the chemistry makes sense (e.g. see #61 ) which will be applied to all inputs, even those prepared by experts/with expert tools
  2. Chemistry preparation assistance to attempt to ensure chemistry is probably correct

See #61 for some related discussion, and also this comment and following.

Possible usage might look like:

from openforcefield.typing.helpers import ChemistryFixer
fixed_topology = ChemistryFixer(topology)
...

Part of the reasoning here is that experts/workflows (e.g. Orion) will often want to be certain that molecules coming into ForceField are not being modified by ForceField, as they are already assumed to be correct. So the default preparation needs to be only sanity checks. But many users (e.g. see #61 ) need more than this, such as some attempt to take a molecule which may not be correct, or at least may not be completely specified (such as having missing bond orders), and "make it so", hence the need for ChemistryFixer.

Outdated Environment examples

Some of the environment examples are outdated, based on the first version of chemical environments which couldn't use SMIRKS to initiate them.

I'm going to combine the ipython notebooks in the example/chemicalEnvironments directory so they use the current code.

This is probably my fault for not updating the smarty examples with fixes in the chemical environments last fall.

Support constraints argument in ForceField.createSystem

Currently, calling createSystem with constraints other than None raises an exception. Since constraints have now been implemented, we could in principle support the standard OpenMM constraints specification (e.g. HBonds, HAngles) by "appending" the SMIRNOFF definition of the constraint to the stored force field specification.

Loading forcefield from ffxml in openforcefield/data/forcefield fails

I've tried to use the following script lines:

from openforcefield.typing.engines.smirnoff import forcefield
ff = forcefield.ForceField('forcefield/Frosst_AlkEtOH.ffxml')

This gets the following error:

OSError: Error reading file '/Users/bannanc/anaconda3/lib/python3.6/site-packages/openforcefield/typing/engines/smirnoff/data/forcefield/Frosst_AlkEtOH.ffxml': failed to load external entity "/Users/bannanc/anaconda3/lib/python3.6/site-packages/openforcefield/typing/engines/smirnoff/data/forcefield/Frosst_AlkEtOH.ffxml"

There is this #TODO in the loadFiles method in forcefield.py

It seems like this would be fixed by using the get_data_filename. If that works I'll put in a pull request today.

Allow configure VdW switching in ForceField

I couldn't find a way to call setUseSwitchingFunction and setSwitchingDistance through ForceField.createSystem. We can still work around it by manually configuring the force after the system has been created, but it would be nice to support it in NonbondedGenerator.createForce.
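
For reference, the manual workaround might look something like the sketch below. The helper name enable_vdw_switching is hypothetical. It relies only on System.getForces() and the two NonbondedForce setters named above, and it matches forces by class name so that the snippet itself does not need to import OpenMM.

```python
def enable_vdw_switching(system, switching_distance):
    """Enable the switching function on every NonbondedForce in an OpenMM System.

    switching_distance is passed straight to setSwitchingDistance, so it
    should be whatever OpenMM accepts there (a length quantity, or a
    plain number interpreted as nanometers).
    Returns the number of forces that were modified.
    """
    modified = 0
    for force in system.getForces():
        # Matching on the class name is a shortcut for this sketch; real
        # code would more likely use isinstance with openmm.NonbondedForce.
        if type(force).__name__ == "NonbondedForce":
            force.setUseSwitchingFunction(True)
            force.setSwitchingDistance(switching_distance)
            modified += 1
    return modified
```

This would be called on the System returned by createSystem, before building a Context from it.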

conda channel needs to be updated

We have some undergrads in the group performing gas phase minimizations to compare SMIRNOFF and other forcefields. I instructed our undergrad to download the openforcefield library and stop using her installation of smarty since the most recent version of SMIRNOFF needed the SMIRNOFF name support.

She followed the directions in the repo with conda install --yes -c omnia openforcefield and the version that was downloaded didn't support the name SMIRNOFF in the ffxml file.

Write API documentation

We should probably add true API documentation beyond the README.md.

Sphinx can produce nice API docs with customizable formatting, and is probably a good choice for this project as well. Here are some examples:

Would we be OK with readthedocs, or is that too unprofessional? Here are some examples:

We could alternatively push the docs to an S3 bucket like we do with yank, but we'd then have to integrate that into the openforcefield.org website content.

Provide alternate sets of monovalent ion parameters

As discussed in #53, the AMBER force fields provide three recommended sets of ion parameters PER WATER MODEL. As proposed here (#53 (comment)) we need to make these different sets available in FFXML format.

I've added one set in #53 as a default set, but we need the full range to be available. SMIRKS patterns can be pulled from the FFXML file added there. We probably want to make an automatic utility script that can convert frcmod files for AMBER ion parameters sets into ffxml files for the next time someone decides to invent new set(s) of AMBER ion parameters. We can do this by coding equivalences between the relevant strings (e.g. Li+) and SMIRKS patterns.
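
A rough sketch of the "coding equivalences" idea is shown below. Everything in it is an assumption for illustration: the ION_SMIRKS table, the ions_to_nonbonded_xml helper, and the rmin_half/epsilon attribute names are not taken from the existing FFXML file, and parsing of the frcmod file itself is left out.

```python
import xml.etree.ElementTree as ET

# Assumed equivalences between AMBER ion names and SMIRKS patterns
# (atomic number plus formal charge). The real table should be checked
# against the FFXML file added in #53.
ION_SMIRKS = {
    "Li+": "[#3+1:1]",
    "Na+": "[#11+1:1]",
    "K+": "[#19+1:1]",
    "Cl-": "[#17-1:1]",
}


def ions_to_nonbonded_xml(ion_parameters):
    """Build a NonbondedForce element from {ion name: (rmin_half, epsilon)}.

    The parameter values are whatever was read from the frcmod file;
    this function only translates ion names into SMIRKS patterns.
    """
    section = ET.Element("NonbondedForce")
    for name, (rmin_half, epsilon) in ion_parameters.items():
        if name not in ION_SMIRKS:
            raise KeyError(f"No SMIRKS equivalence defined for ion {name!r}")
        ET.SubElement(
            section,
            "Atom",
            smirks=ION_SMIRKS[name],
            rmin_half=str(rmin_half),
            epsilon=str(epsilon),
        )
    return section
```

The resulting element can be serialized with ET.tostring(section, encoding="unicode"). Raising on an unknown ion name, rather than skipping it, keeps a newly invented parameter set from being converted incompletely without anyone noticing.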

For consistency change AlkEtOH to AlkEthOH

We are not consistent in file names or references to the Alkanes, Ethers, and Alcohol sets. Some say AlkEtOH and others AlkEthOH. I like AlkEthOH; it is also what I think most of us have been saying out loud when we talk about the set. Not a big deal, but we should probably be consistent before we publish the smirnoff or smarty papers.

Versioning the SMIRNOFF specification

cc: openforcefield/smarty#175 openforcefield/smarty#42

One of the original ideas of splitting off SMIRNOFF was that we could version the specification and synchronize the GitHub releases. Since we're combining several tools in one repo now, we can avoid needing exact version synchronization, but we will still want to add a version number attribute and call the current version something. We could start with 0.1.5, since that was the last version of smarty that SMIRNOFF was included in.

I think we currently have a float version tag (openforcefield/smarty#42 (comment)), but I don't see this in the SMIRNOFF spec.

Instead of x.y.z for the spec, I think we are OK with x.y:

  • X: update for major, API-breaking changes
  • Y: update every time we add a new feature that doesn't break the API

where the z of x.y.z can be reserved for code bugfix updates.
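
To make the x.y proposal concrete, here is a minimal sketch of how a parser might compare spec versions. The function names are hypothetical, and the compatibility rule (same major version, file minor version no newer than the parser's) is one possible reading of the scheme above, not a decided policy.

```python
def parse_spec_version(text):
    """Parse an 'x.y' spec version string into a (major, minor) tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"Expected a version of the form 'x.y', got {text!r}")
    return int(parts[0]), int(parts[1])


def is_supported(file_version, parser_version):
    """Return True if a parser at parser_version can read a file at file_version.

    Assumed rule: a major bump breaks compatibility, while minor bumps
    only add features, so older files stay readable by newer parsers.
    """
    file_major, file_minor = parse_spec_version(file_version)
    parser_major, parser_minor = parse_spec_version(parser_version)
    return file_major == parser_major and file_minor <= parser_minor
```

Keeping the version as a string and comparing integer tuples distinguishes "0.10" from "0.1", which a float version tag cannot do.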

filter molecules utilities directory needs organization

I have noticed there are still a lot of intermediate files in the filter molecules directory. It also doesn't have very good documentation on the most recent uses of the scripts contained there.

I propose removing any intermediate molecule sets that we created when deciding what to use to test smarty and smirky but that weren't actually used for anything, and then updating the README to reflect what is actually included here and how we used it.

(as an aside, I'm working on making a "difficult" eMolecules set that only has molecules that are assigned a generic parameter. I could add the script for that here when I'm done if this is the place that makes the most sense for it.)

openforcefield fails when OEMol misses atom names

I was trying to parametrize an OEMol by using the SMIRNOFF FF. However the parametrization fails with errors:

Traceback (most recent call last):
  File "test.py", line 20, in <module>
    mol_top, mol_sys, mol_pos = create_system_from_molecule(mol_ff, ligand)
  File "/Users/gcalabro/local/miniconda2/envs/openeye/lib/python3.5/site-packages/smarty/forcefield_utils.py", line 101, in create_system_from_molecule
    system = forcefield.createSystem(topology, [mol], verbose=verbose)
  File "/Users/gcalabro/local/miniconda2/envs/openeye/lib/python3.5/site-packages/smarty/forcefield.py", line 950, in createSystem
    topology = _Topology(topology, molecules)
  File "/Users/gcalabro/local/miniconda2/envs/openeye/lib/python3.5/site-packages/smarty/forcefield.py", line 230, in __init__
    self._identifyMolecules()
  File "/Users/gcalabro/local/miniconda2/envs/openeye/lib/python3.5/site-packages/smarty/forcefield.py", line 361, in _identifyMolecules
    raise Exception(msg)
Exception: No provided molecule matches topology molecule:
Atom 0 0 pBace_lLigand 1_0

I was able to narrow down the error and it is related to missing string atom names in the passed OEMol. In particular the dictionary in the function generateTopologyFromOEMol is generated with just one entry. To overcome the problem I called the function:

oechem.OETriposAtomNames(molecule)

prior to parametrizing my molecule and the problem was fixed. I advise adding a check to the function generateTopologyFromOEMol that tests whether any atom name is empty and, if so, assigns Tripos atom names:

    if any([atom.GetName() == '' for atom in molecule.GetAtoms()]):
        oechem.OETriposAtomNames(molecule)

Set up openforcefield.org website

@mrshirts and I were talking yesterday and realized we don't have a place to send people that has the more general information about the OFF Group. The READMEs on the individual repositories tell you what is there and how to use it (or find more detailed examples). It would be nice to have a more general Wiki/website to link people not yet familiar with the project.

Michael asked me to write an issue and this seemed the most logical place to put it, but ideally I think this should be independent of any of the repos.

Decide how to handle water models with extra sites

I just realized that we currently have no support for TIPnP water models where n is higher than 3, because SMIRNOFF currently uses only an all-atom representation of molecules (though coarse-grained representations are planned). The format is planned to include virtual sites for off-center charges, but none are supported yet.

@jchodera - what's the easiest way to get in support for water models with more than three sites?

Certain molecules end up with modified protonation states prior to SMIRKS matching

Shuzhe Wang found that certain molecules appear to result in the application of a parameter which is not the expected one; for example, he provided an example (below) of a molecule containing trivalent nitrogens which ends up with a torsion involving tetravalent nitrogens. It appears that this results from an extra assignment of implicit hydrogens when we are preparing to get SMIRKS matches by switching to the desired aromaticity model (i.e. in https://github.com/open-forcefield-group/openforcefield/blob/master/openforcefield/typing/engines/smirnoff/forcefield.py#L105-L110 ); the extra call to OEAssignImplicitHydrogens appears to sometimes result in a change in protonation state from the desired (provided) protonation state, which is obviously not what we want.

The specific case Shuzhe provided was [H]c1c([H])c(N([H])S(=O)(=O)C([H])([H])[H])c([H])c([H])c1OC([H])([H])C([H])([H])N(C([H])([H])[H])C([H])([H])C([H])([H])c1c([H])c([H])c(N([H])S(=O)(=O)C([H])([H])[H])c([H])c1[H] and this ends up getting a SMIRKS match to [*:1]-[#6X4:2]-[#7X4:3]-[*:4].

Fix going up momentarily.

tests fail when installing new master on top of old conda distribution

Nothing works, and even the tests fail, after trying to install the new openforcefield master. My installation went like this:
conda create -c conda-forge -c omnia -n openff_2017sep python=3.5 openforcefield
source activate openff_2017sep
pip install jupyter
pip install OpenEye-toolkits
cd openforcefield
pip install -e ./

The tracebacks I get from the tests all end with the same KeyError on 'return self._nodes[n]'. For example:

ERROR: Check embedded atom parsing works

Traceback (most recent call last):
  File "/Users/bayly/BaylyData/projects/forcefield/openforcefield/openforcefield/tests/test_chemicalenvironment.py", line 213, in test_embedded_atoms_smirks
    env = ChemicalEnvironment(smirks)
  File "/Users/bayly/BaylyData/projects/forcefield/openforcefield/openforcefield/typing/chemistry/environment.py", line 473, in __init__
    if not self.isValid():
  File "/Users/bayly/BaylyData/projects/forcefield/openforcefield/openforcefield/typing/chemistry/environment.py", line 486, in isValid
    smirks = self._asSMIRKS()
  File "/Users/bayly/BaylyData/projects/forcefield/openforcefield/openforcefield/typing/chemistry/environment.py", line 711, in _asSMIRKS
    initialAtom = self.getAtoms()[0]
  File "/Users/bayly/miniconda/envs/openff_2017sep/lib/python3.5/site-packages/networkx/classes/reportviews.py", line 178, in __getitem__
    return self._nodes[n]
KeyError: 0

Allow easy swaps of parameter groups such as for water

As noted in #24 , we want to begin using standard water models. Currently, there seem to be two ways of adding a water model (such as TIP3P) neither of which seems ideal:
a) Add the water model into an existing FFXML file, such as SMIRNOFF99Frosst. But, this would mean the water model is not easily replaceable by another
b) Put the water into its own FFXML file. But since each FFXML file is applied independently to a system, it means it could only be applied to systems consisting only of water (which would then need to be re-combined with systems containing other molecules)

To some degree, this would be helped by adding support for including XML files within existing XML files, as requested in #19 . But, there may still be some complexities -- for example, we have to have a mechanism for ensuring the new content ends up in the right place in the hierarchical format.

@jchodera , insights? This also has some overlap with #25 since when we swap water models we also want to make sure the charges are going to come from the right place.

Initial import of SMARTY

For historical record:

This was an initial import of stable tools from the https://github.com/open-forcefield-group/smarty repo to preserve their modification histories.

To import, I followed a modification of this suggestion:

# Check out new repository (openforcefield)
git clone git@github.com:open-forcefield-group/openforcefield.git
cd openforcefield
# Check out repository to import from (smarty)
mkdir import
cd import
git clone https://github.com/open-forcefield-group/smarty
cd smarty
# Remove origin (for safety)
git remote rm origin
# Get a list of all files
git ls-files > ../files-to-remove
# MANUAL STEP: Edit ../files-to-remove manually, removing any entries you want to keep
# Now filter git repo and history to retain only files not deleted
# NOTE: Need to edit path below
# This will take some time
# BE VERY CAREFUL WHEN USING "rm -rf"!
git filter-branch --tree-filter 'rm -rf $(cat /absolute/path/to/files-to-remove)' -- --all
# Now pull from this repo
cd ../..
git remote add smarty import/smarty
git pull smarty master --allow-unrelated-histories
# Remove old repo
rm -rf import/smarty

Implement Library Charges

Major design decision:

Currently, the charge method is handled on a "by system" basis; that is, as an argument to createSystem. So, all the components of a System are charged via the same mechanism, such as AM1-BCC, and we have no mechanism to flag specific components of a system for special treatment. Often, this is what we want, but water poses challenges (see #24 ) -- typically, for existing water models, we want to assign library charges and only charge our solute (or non-water solvent) atoms via normal charging schemes like AM1-BCC.

How shall we achieve this? @jchodera , @bannanc , @andrizzi, thoughts? Some ideas:

  • Create a new "special section" in the SMIRNOFF format (kind of analogous to the BondChargeCorrections section) which allows SMIRKS patterns to specify use of library charges for specific groups
  • Allow chargeMethod to be a list rather than a string, where a different charge method can be used for different components of the system

Other possibilities?

smirnoff create system thrown off by atom names

command:
top, sys, xyz = create_system_from_molecule( forcefield, mol)

threw an exception on a mol which had a repeat atom name. Two issues:

  1. The exception was obfuscating:
Atom        0  HH31     0 ACE

note: Caitlin found an explicit test for atom name uniqueness which apparently did not fire.

  2. I don't think this problem should cause a failure. First, the atom name is just baggage, so why should it matter whether the names are unique? What if the input is a pdb file which has many repeated atom names? Second, if it is so important, why not just silently rename instead of failing, or at least have a settable flag to do so automatically if needed?
    nma.mol2.zip
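
The "silently rename" suggestion could be sketched as below. This works on a plain list of name strings rather than on an OEMol, and the function name and suffix scheme are hypothetical choices for the example.

```python
from collections import Counter


def make_atom_names_unique(names):
    """Return a new list in which every atom name is unique and non-empty.

    Names that are already unique are kept unchanged. Repeated names get
    a numeric suffix, and empty names are replaced by "X" plus a number.
    """
    counts = Counter(names)
    # Reserve the names that are already fine so generated names avoid them.
    taken = {name for name in names if name and counts[name] == 1}
    next_index = Counter()
    result = []
    for name in names:
        if name and counts[name] == 1:
            result.append(name)
            continue
        base = name or "X"
        # Increase the suffix until the candidate does not collide.
        while True:
            next_index[base] += 1
            candidate = f"{base}{next_index[base]}"
            if candidate not in taken:
                break
        taken.add(candidate)
        result.append(candidate)
    return result
```

A settable flag, as suggested above, could then decide whether repeated names trigger this renaming or an exception with a clearer message.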

Implement charge override in NonbondedForce

We should implement the ability to override partial charges for certain nonbonded types to allow water models and ions to be more easily implemented via self-contained XML files:

<NonbondedForce coulomb14scale="0.833333" lj14scale="0.5" sigma_unit="nanometers" epsilon_unit="kilojoules_per_mole">
   <!-- TIP3P water oxygen with charge override -->
   <Atom smirks="[#1]-[#8X2H2:1]-[#1]" sigma="0.31507524065751241" epsilon="0.635968" charge="-0.8476"/>
   <!-- TIP3P water hydrogen with charge override -->
   <Atom smirks="[#1:1]-[#8X2H2]-[#1]" sigma="0.1" epsilon="0" charge="+0.4238"/>
</NonbondedForce>

cc: #24 (comment)

@davidlmobley : Should these really be overrides, which has the ability to silently cause issues if we override some (but not all) atoms in a residue? Or should this be possible only if no OEMols are specified for that molecule?

Alternatively, we can simply check to make sure molecules end up with integral charges, which would allow us to catch most (but not all) override errors.
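
The integral-charge check could be as simple as the following sketch. The function name and the default tolerance are assumptions, and the check operates on a plain list of per-atom partial charges in units of the elementary charge.

```python
def has_integral_charge(partial_charges, tolerance=1e-4):
    """Return True if the partial charges sum to an integer within tolerance."""
    total = sum(partial_charges)
    return abs(total - round(total)) <= tolerance
```

With the override values from the XML above, [-0.8476, 0.4238, 0.4238] sums to zero and passes, while overriding the oxygen but not the hydrogens would normally fail. As noted, this catches most but not all errors: overrides that happen to sum to an integer would still pass.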

Remove generateTopologyFromOEMol?

In the OpenMM OpenEye stack, there's now support for roundtripping molecules from OEMol to OpenMM and back (see #47 (comment) ); should we remove the generateTopologyFromOEMol function in favor of using the functionality there? Though, we'd presumably need to make sure that the relevant toolkits are then conda installable.

The functionality there is, I believe, superior, in that it handles molecules that may consist of multiple residues, etc., whereas we currently handle only small molecules that consist of a single residue.

Modify ConstraintGenerator Class' labelForce() Function

I think the labelForce function of ConstraintGenerator class should be modified to be the same as labelForce methods in other force generator classes (in /typing/engines/smirnoff/forcefield.py).

In this class, unrollSMIRKSMatches is being called instead of getSMIRKSMatches_OEMol.

Errors when running the example code.

@davidlmobley and @jchodera

I was able to download this from the conda channel so at least that is working. However, I tried to run the example in the README as a first pass to check that things were working. I'm getting an error when I try to load the force field ffxml file

I wrote a script based on the example from the README

from openeye.oechem import *
from openforcefield.typing.engines.smirnoff import ForceField
from openforcefield.utils import get_data_filename, extractPositionsFromOEMol, generateTopologyFromOEMol

# Create OEMol of propanol
mol = OEMol()
smiles = "CCCO"
OEParseSmiles(mol, smiles)
OEAddExplicitHydrogens(mol)

# Get positions and topology in OpenMM-compatible format
topology = generateTopologyFromOEMol(mol)

# Load a SMIRNOFF small molecule forcefield for alkanes, ethers, and alcohols
forcefield = ForceField('Frosst_AlkEtOH_parmAtFrosst.ffxml')

# Create the OpenMM system, additionally specifying a list of OEMol objects for the unique molecules in the system
system = forcefield.createSystem(topology, [mol])

This is the error message I get when it tries to load the forcefield:

Error reading file '/[pythonpath]/openforcefield/typing/engines/smirnoff/data/Frosst_AlkEtOH_parmAtFrosst.ffxml': failed to load external entity "/[pythonpath]/openforcefield/typing/engines/smirnoff/data/Frosst_AlkEtOH_parmAtFrosst.ffxml"

I assume this is supposed to access [pythonpath]/openforcefield/data/forcefield/Frosst_AlkEtOH_parmAtFrosst.ffxml

If I copy the ffxml file to my local directory then I get an error when trying to create the system, with the following error:

Traceback (most recent call last):
  File "test_openforcefield.py", line 18, in <module>
    system = forcefield.createSystem(topology, [mol])
  File "/Users/cbanana/anaconda/lib/python2.7/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py", line 969, in createSystem
    topology = _Topology(topology, molecules)
  File "/Users/cbanana/anaconda/lib/python2.7/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py", line 197, in __init__
    self._identifyMolecules()
  File "/Users/cbanana/anaconda/lib/python2.7/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py", line 375, in _identifyMolecules
    raise Exception(msg)
Exception: No provided molecule matches topology molecule:
Atom        0           0

Add sanity checks on input OEmols and input topologies

In #59 (and #60 which resolves it) we are making some adjustments to the SMIRKS matching code to avoid scenarios where the substructure search might operate on an OEMol with a different formal charge or number of protons than the input OEMols. But really this is a two part problem:
a) The user might provide input molecules with unexpected or bad protonation states and/or formal charges, and
b) We want the substructure search to operate on the same OEMol that we are parameterizing.

In #60, we fix (b). However, our fixes do nothing to resolve issue (a); we should sanity check the input OEMols for obvious issues, at least issuing warnings. For example, if the input molecule has a number of implicit hydrogens different than the number of explicit hydrogens, that's a problem. If it has no formal charges, we probably need to assign them, and if the protonation state is different from what would normally be expected at neutral pH we may want to issue a warning. I'm checking with @cbayly13 about whether there are other tests he would suggest.

Add explicit icu 58.* as dependency?

I ran into a problem today where I had to explicitly tell conda to install openforcefield with icu 58.* (a dependency of lxml); otherwise version 56.1 got installed, which gave me this error:

Traceback (most recent call last):
  File "create_input_files.py", line 12, in <module>
    from openforcefield.typing.engines import smirnoff
  File "/cbio/jclab/home/andrrizzi/miniconda/envs/smirnoff/lib/python3.5/site-packages/openforcefield/typing/engines/smirnoff/__init__.py", line 3, in <module>
    from openforcefield.typing.engines.smirnoff.forcefield import *
  File "/cbio/jclab/home/andrrizzi/miniconda/envs/smirnoff/lib/python3.5/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py", line 32, in <module>
    import lxml.etree as etree
ImportError: libicui18n.so.58: cannot open shared object file: No such file or directory

The exact commands I used to solve the issue are

conda config --add channels omnia --add channels conda-forge
conda create --name smirnoff yank openforcefield packmol 'icu=58.*'

(the icu=58.* was not there before). This is not a problem when installing only openforcefield, but neither yank nor packmol requires icu, so this must be some sort of conda weirdness.

I'd modify the conda recipe to explicitly require version 58.* to avoid this problem.

migrate read_typelist from AtomTyper from smarty into this repo

The method read_typelist from AtomTyper is used in smarty and smirky to read in *.smarts files. I propose we move this function to utils.py in this repo. I don't think we will need the whole AtomTyper class as that only works with SMARTS strings to match single atoms and we have clearly moved away from that.

@davidlmobley @jchodera
I can add this in my current pull request if you agree.

How to best expose rdkit SMIRNOFF to the user?

It would be good to discuss and decide how to expose the rdkit version of SMIRNOFF to end users.

At the moment, I just have a separate class called forcefield_rdk and a user will call that class instead of forcefield if they wish to use the rdkit version, and I am putting all import statements in SMIRNOFF concerning openeye products into try blocks.
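One possible alternative to a separate class is to detect which toolkit is installed and dispatch on that. The following is only a sketch: the function names are hypothetical, and it checks only that the top-level `openeye` or `rdkit` package can be found, not that an OpenEye license is valid.

```python
import importlib.util


def available_toolkits(candidates=("openeye", "rdkit")):
    """Return the candidate top-level packages that can be found here."""
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]


def choose_toolkit(preferred=("openeye", "rdkit")):
    """Pick the first installed toolkit, or raise a clear error."""
    found = available_toolkits(preferred)
    if not found:
        raise ImportError(
            "No supported cheminformatics toolkit found; tried: "
            + ", ".join(preferred)
        )
    return found[0]
```

With something like this, a single ForceField class could call `choose_toolkit()` once and branch internally, so users would not need to know which backend is in use.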

Modeller compatibility with SMIRNOFF force fields

Is it already possible to use the OpenMM Modeller class with openforcefield-generated topologies? I have tried but not succeeded so far, and the host-guest system example uses only pdbfixer, not the Modeller class.

Implement our own Topology - what features does it need?

It looks like we're headed towards needing our own Topology object, because as we progress towards including virtual sites, OpenMM's Topology objects won't cover what we need. For example, we will need the ability to have topologies which allow for easy extraction of (a) which particles are physical (chemical), (b) the full set including virtual sites, and (c) utility functionality for extracting physical positions given a full set of positions, etc. This should also preserve bond orders, etc., to allow easy creation of OEMol or RDKit molecules corresponding to the components of the system, etc.
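To make requirements (a) through (c) concrete, here is a rough sketch of the kind of container this implies. All class and method names are hypothetical and exist only for illustration; this is not an existing API.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass
class Particle:
    name: str
    is_virtual: bool = False  # True for virtual sites, False for real atoms


@dataclass
class SketchTopology:
    """Hypothetical container holding physical atoms and virtual sites."""
    particles: List[Particle] = field(default_factory=list)
    # Each bond is (index_a, index_b, bond_order) over particle indices.
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)

    def physical_indices(self) -> List[int]:
        """(a) Indices of the particles that are real atoms."""
        return [i for i, p in enumerate(self.particles) if not p.is_virtual]

    def all_indices(self) -> List[int]:
        """(b) Indices of every particle, virtual sites included."""
        return list(range(len(self.particles)))

    def physical_positions(self, positions: Sequence[Position]) -> List[Position]:
        """(c) Drop virtual-site positions from a full position list."""
        if len(positions) != len(self.particles):
            raise ValueError("expected one position per particle")
        return [positions[i] for i in self.physical_indices()]


# A four-site water model: three atoms plus one virtual site.
water = SketchTopology(
    particles=[
        Particle("O"), Particle("H1"), Particle("H2"),
        Particle("M", is_virtual=True),
    ],
    bonds=[(0, 1, 1), (0, 2, 1)],
)
```

Storing bond orders alongside the particle list is what would allow OEMol or RDKit molecules to be rebuilt from the topology later.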

Some of this is discussed here

@jchodera , other thoughts on what this should do/be like?

Lingering issues relating to an RDKit implementation

As I understand it, @hjuinj is making fairly good progress towards an RDKit implementation (see #69): his prototype RDKit version gives identical parameter assignments for the majority of DrugBank, with only a couple of lingering issues on the RDKit end (both of which are quite significant, though):

  1. We currently rely on the MDL aromaticity model, which is not implemented in RDKit. We are unsure how to implement it ourselves, since the information on setting up a custom aromaticity model is minimal, so we need some guidance from the RDKit developers before we can proceed further (rdkit/rdkit#1622).

  2. RDKit doesn’t fully support input of molecules from mol2 format unless they have Corina atom types (though it PARTIALLY supports some molecules with SYBYL/TRIPOS types), but the tools commonly used in molecular simulations don’t assign Corina atom types, nor am I aware of an open source tool which does. This poses major problems for getting molecules INTO RDKit. The folks I’m aware of who have dealt with this so far have bypassed the issue either by (a) pre-processing all of their molecules with the OpenEye tools, since mol2 files output by the OpenEye tools are often (but not always) correctly handled by RDKit, or (b) reading mol2 files of their molecules with the OpenEye toolkits and then converting the OEMols into RDMols. Neither of these is a general solution, since someone wanting an RDKit version presumably does so because they don't have the OpenEye tools. The only general solution I see is to get RDKit to provide a better/more general mol2 reader, as bypassing mol2 entirely is not really a viable option.

There is also a third issue which doesn’t directly involve RDKit necessarily — how will we assign partial charges for small molecules in general? We need a conda-installable open source tool to assign AM1-BCC charges or similar.

Anyone interested in contributing here (in terms of discussion/ideas, or coding help) is welcome.

Speed up parameter assignment by reversing order?

Chris Bayly relays a point from Stan Wlodek at OpenEye -- we could make parameter assignment run substantially faster by simply reversing the order of our files and assigning parameters in "first one wins" order. That is, by processing the most specialized SMIRKS patterns first, we can assign most of the parameters and only have to assign generic parameters to relatively few or no cases, potentially giving substantial speedups.
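The idea can be illustrated with a toy sketch in which plain string predicates stand in for SMIRKS matching (the rule list and function names are hypothetical). Both functions return the same parameter, but the reversed scan stops at the first hit instead of testing every rule:

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[Callable[[str], bool], float]  # (matcher, parameter value)


def assign_last_wins(rules: List[Rule], item: str) -> Optional[float]:
    """Generic-to-specific order: test every rule, keep the last match."""
    value = None
    for matches, param in rules:
        if matches(item):
            value = param
    return value


def assign_first_wins(rules: List[Rule], item: str) -> Optional[float]:
    """Same rules scanned from the end: stop at the first match."""
    for matches, param in reversed(rules):
        if matches(item):
            return param
    return None


# Rules ordered from most generic to most specific, as in the current files.
rules: List[Rule] = [
    (lambda bond: True, 1.0),                   # generic fallback
    (lambda bond: bond.startswith("C"), 2.0),   # more specific
    (lambda bond: bond == "C-O", 3.0),          # most specific
]
```

This per-item loop is a simplification: a real substructure search matches one SMIRKS against a whole molecule at once, so the actual saving would depend on being able to skip atoms that already have parameters.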

I'll have to think about whether this could be implemented without major architectural changes. However, it IS worth noting that assignment of a full SMIRNOFF99Frosst force field to a nontrivial system can take a noticeable amount of time, and this seems like it could make it substantially faster. Compared to the cost of doing something like a free energy calculation, this cost is still trivial, but it might be worth investigating this at some point.

Add tip3p.ffxml to data/forcefield/?

Is there a file for this somewhere?

Having a separate file with the parameters for tip3p water would help setting up simulations in water solvents by simply doing

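from openforcefield.typing.engines.smirnoff import ForceField
from openforcefield.utils import get_data_filename

smirnoff_ffxml = get_data_filename('forcefield/smirnoff99Frosst.ffxml')
tip3p_ffxml = get_data_filename('forcefield/tip3p.ffxml')
ff = ForceField(smirnoff_ffxml, tip3p_ffxml)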

and avoid hardcoding the parameters in a Python script through ForceField.addParameter(). I can work on this today if you agree this is the way to go.

Move historical force field stuff from Bayly to private repo

Under the utilities within the conversion directory, there are historical files Chris Bayly used in making smirnoff99Frosst. These are of no practical value to anyone who wasn't involved in his process, so we should move them out of this public repo to a private one.

Remove requirement for OEMols of components if compatible Topologies provided

In the paper draft on SMIRNOFF, @jchodera remarks

Now that OpenMM Topology supports OEChem-compatible bond orders, we can remove the requirement that OEMol objects be provided!

This is true, and presumably we would make them optional as long as the Topology includes bond orders. It is irrelevant, though, until commonly used tools produce OpenMM Topology objects with bond orders. As far as I'm aware, so far only tools from OpenEye do this (e.g. in github.com/oess/oeommtools); @jchodera, are there others? If this is something people would use soon then I'm all for putting it in place quickly.
