mobleylab / blues

Applications of nonequilibrium candidate Monte Carlo (NCMC) to ligand binding mode sampling

Home Page: https://mobleylab-blues.readthedocs.io/en/latest

License: MIT License

Python 98.59% Shell 0.49% HTML 0.92%

blues's Introduction

`BLUES`: Binding modes of Ligands Using Enhanced Sampling

This package takes advantage of non-equilibrium candidate Monte Carlo moves (NCMC) to help sample between different ligand binding modes.

Citations

Publication

Gill, S; Lim, N. M.; Grinaway, P.; Rustenburg, A. S.; Fass, J.; Ross, G.; Chodera, J. D.; Mobley, D. L. “Binding Modes of Ligands Using Enhanced Sampling (BLUES): Rapid Decorrelation of Ligand Binding Modes Using Nonequilibrium Candidate Monte Carlo” - Journal of Physical Chemistry B. February 27, 2018

Preprints

BLUES v1 - ChemRxiv September 19, 2017
BLUES v2 - ChemRxiv September 25, 2017

Manifest

blues/ - Source code and example scripts for BLUES toolkit
devdocs/ - Class diagrams for developers
devtools/ - Developer tools and documentation for conda, travis, and issuing a release
images/ - Images/logo for repository
notebooks/ - Jupyter notebooks for testing/development

Prerequisites

BLUES is compatible with macOS/Linux and requires Python 3.6.

Install miniconda according to your system.

Installation

ReadTheDocs: Installation

Recommended: Install releases from conda

# Create a clean environment (python 3.6 is required)
conda create -n blues python=3.6
conda activate blues

# Install OpenEye toolkits and related tools first
conda install -c openeye/label/Orion -c omnia oeommtools
conda install -c openeye openeye-toolkits

# Install necessary dependencies
conda install -c omnia -c conda-forge openmmtools=0.15.0 openmm=7.4.2 numpy cython

conda install -c mobleylab blues

Install from source (if conda installation fails)

# Clone the BLUES repository
git clone https://github.com/MobleyLab/blues.git ./blues

# Install some dependencies
conda install -c omnia -c conda-forge openmmtools=0.15.0 openmm=7.4.2 numpy cython

# To use the SideChainMove class, OpenEye toolkits and related tools are required (requires an OpenEye license)
conda install -c openeye/label/Orion -c omnia oeommtools
conda install -c openeye openeye-toolkits

# Install BLUES package from the top directory
pip install -e .

# To validate your BLUES installation run the tests (need pytest)
cd blues/tests
pytest -v -s

Documentation

For documentation on the BLUES modules, see ReadTheDocs: Modules.
For a tutorial on how to use BLUES, see ReadTheDocs: Tutorial.

BLUES using NCMC

This package takes advantage of non-equilibrium candidate Monte Carlo moves (NCMC) to help sample between different ligand binding modes using the OpenMM simulation package. One goal for this package is to allow for easy additions of other moves of interest, which will be covered below.

Example Use

An example of how to set up a simulation sampling the binding modes of toluene bound to T4 lysozyme using NCMC and a rotational move can be found in examples/example_rotmove.py

Actually using BLUES

The integrator of BLUES contains the framework necessary for NCMC. Specifically, the integrator class calculates the work done during an NCMC move and controls the lambda scaling of parameters. The integrator that BLUES uses inherits from openmmtools.integrators.AlchemicalNonequilibriumLangevinIntegrator to keep track of the work done outside integration steps, allowing Monte Carlo (MC) moves to be incorporated together with the NCMC thermodynamic perturbation protocol. Currently the openmmtools.alchemy package is used to generate the lambda parameters for the ligand, allowing alchemical modification of the sterics and electrostatics of the system. The Simulation class in blues/simulation.py serves as a wrapper for running NCMC simulations.

Implementing Custom Moves

Users can implement their own MC moves into NCMC by inheriting from an appropriate blues.moves.Move class and constructing a custom move() method that only takes in an OpenMM context object as a parameter. The move() method then accesses the positions of that context, changes those positions, and updates the positions of that context. For example, if you would like to add a move that randomly translates a set of coordinates, the code would look similar to this pseudocode:

from blues.moves import Move

class TranslationMove(Move):
    def __init__(self, atom_indices):
        self.atom_indices = atom_indices

    def move(self, context):
        """Pseudocode for a move that translates atom_indices."""
        # Get the current positions from the context
        positions = context.getState(getPositions=True).getPositions(asNumpy=True)
        # RandomTranslation is a placeholder for a user-supplied function
        # that translates the selected coordinates
        positions[self.atom_indices] = RandomTranslation(positions[self.atom_indices])
        # Update the context with the full positions array, not only the moved subset
        context.setPositions(positions)
        return context

Combining Moves

Note: This feature has not been tested; use at your own risk.

If you're interested in combining moves together sequentially (say you'd like to perform a rotation and translation move together), you can leverage the functionality of existing Moves using the CombinationMove class instead of coding up a new Move class. CombinationMove takes in a list of instantiated Move objects, and its move() method performs the moves in either listed or reverse order. A combined rotation and translation move can therefore be done by passing in an instantiated TranslationMove (from the pseudocode example above) and RandomLigandRotation. One important, non-obvious point about the CombinationMove class is that, to ensure detailed balance is maintained, moves are done half the time in listed order and half the time in reverse order.

Versions:

Version 0.0.1: Basic BLUES functionality/package
Version 0.0.2: Maintenance release fixing a critical bug and improving organization as a package.
Version 0.0.3: Refactored BLUES functionality and design.
Version 0.0.4: Minor bug fixes plus a functionality problem on some GPU configs.
Version 0.1.0: Refactored move proposals, added Monte Carlo functionality, Smart Darting moves, and changed alchemical integrator.
Version 0.1.1: Features to boost move acceptance such as freezing atoms in the NCMC simulation and adding extra propagation steps in the alchemical integrator.
Version 0.1.2: Incorporation of SideChainMove functionality (Contributor: Kalistyn Burley)
Version 0.1.3: Improvements to simulation logging functionality and parameters for extra propagation.
Version 0.2.0: YAML support, API changes, custom reporters.
Version 0.2.1: Bug fix in alchemical correction term
Version 0.2.2: Bug fixes for OpenEye tests and restarting from the YAML; enhancements to the Logger and package installation.
Version 0.2.3: Improvements to Travis CI, fix in velocity syncing, and added tests for checking freezing selection.
Version 0.2.4: Added a simple test system (charged ethylene) which can run quickly on CPU.
Version 0.2.5: This contains numerous small changes/fixes since the v0.2.4 release, but most notably includes the introduction of water hopping as described at https://doi.org/10.26434/chemrxiv.12429464.v1

Acknowledgements

We would like to thank Patrick Grinaway and John Chodera for their basic code framework for NCMC in OpenMM (see https://github.com/choderalab/perses/tree/master/perses/annihilation), and John Chodera and Christopher Bayly for their helpful discussions.

blues's People

Forkers

amrhamedp zuzanaj tannerbobak nathanmlim msuruzhon redesignscience yunhuige sgill2 layeqa amdens-sci byun-jinyoung vincenzochen solidsnake1905 wutobias

blues's Issues

Enforce symmetry during NCMC move

In order to keep detailed balance (easily) during an NCMC move it's necessary to keep the propagation/perturbation steps during an NCMC move symmetric. We should probably enforce this symmetry as best as we can when users are using this code themselves, so they don't unwittingly alter the acceptance criteria. This probably involves at least these three things:

Include a default lambda switching protocol that is symmetric.
Include documentation warning about changing the lambda switching protocol and the importance of the symmetry
Include additional perturbation moves (like the rotational/smart darting moves) at a symmetric point in the NCMC move (And possibly a way to enforce it?).
I'll see about addressing these points in the coming days.
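
The symmetry requirement on the lambda switching protocol can be checked numerically. The sketch below is only an illustration: `lambda_sterics` is a hypothetical protocol chosen for the example, and `is_symmetric` is a helper name introduced here, not part of BLUES. It tests whether a protocol takes the same value at lambda and 1 - lambda.

```python
def lambda_sterics(lam):
    """Hypothetical symmetric protocol: 1 at both ends, falling to 0 at lambda = 0.5."""
    return min(1.0, abs(lam - 0.5) / 0.3)


def is_symmetric(protocol, n_points=101, tol=1e-12):
    """Return True if protocol(lam) matches protocol(1 - lam) on a grid over [0, 1]."""
    for i in range(n_points):
        lam = i / (n_points - 1)
        if abs(protocol(lam) - protocol(1.0 - lam)) > tol:
            return False
    return True


print(is_symmetric(lambda_sterics))   # symmetric protocol
print(is_symmetric(lambda lam: lam))  # a linear ramp is not symmetric
```

A check like this could run when a user supplies a custom protocol, so that an asymmetric choice is flagged before any simulation starts.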

oeommtools installation

I was trying to install oeommtools as instructed in the blues installation using
conda install -c openeye/label/Orion oeommtools
But, got the error "PackageNotFoundError: Packages missing in current channels: - oeommtools -> packmol ".

I switched to conda install -c openeye/label/Orion oeommtools -c omnia as instructed in Gaetano's github and it worked fine. There isn't any specific issue, but maybe changing the blues installation instructions might help anyone new.

Defining darts in protein systems (Smart Darting)

The way I handle the smart darting in my static toy system is to define a set point of space to be the center of my dart. Of course, this doesn't work when the system of interest moves around, like a protein in a solvent box. Ideally I'd like to use the OpenMM virtual site functionality to define the center of a dart. That way the optimized OpenMM code can update the virtual site locations and I can just retrieve those coordinates and update the dart locations.

My current approach right now is to extract the positions from the context and then calculate where the dart center should be using the weighted average of a given set of particle positions. This should work fine in principle, but I think it'd potentially be faster/more robust to let the OpenMM code do this through virtual sites. On the other hand it seems like you have to jump through a lot of hoops to add particles (even virtual ones) to a system with forces that you've already set up.

Any other thoughts/input on how best to handle defining a dart center would be appreciated.
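
The weighted-average approach described above can be sketched in plain Python. This is a minimal illustration under assumptions: `dart_center` is a hypothetical helper, positions are plain (x, y, z) tuples without units, and the weights could be masses or any other user choice.

```python
def dart_center(positions, weights):
    """Weighted average of 3D positions; both arguments are plain sequences."""
    if len(positions) != len(weights) or not positions:
        raise ValueError("need one weight per position and at least one position")
    total = float(sum(weights))
    if total == 0.0:
        raise ValueError("weights must not sum to zero")
    # Average each Cartesian component separately
    return tuple(
        sum(w * p[axis] for p, w in zip(positions, weights)) / total
        for axis in range(3)
    )


center = dart_center([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 3.0])
print(center)
```

In a real simulation the positions would first be extracted from the context, as the issue describes.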

SimNCMC.rotationalmove() rotates the positions at the start of the ncmc step

SimNCMC.rotationalmove() rotates the ligand positions from the start of the NCMC step, when it should use the positions of the ligand as they currently are after the perturbation/propagation steps.

Correct alchemical correction equation

The alchemical correction factor originally used the formula:
correction_factor = -1.0*((norm_newPE - alc_newPE) - (oldPE - alc_oldPE))*(1/nc_integrator.kT),
where
alc_newPE is the potential energy at the end of the NCMC iteration using softcore interactions,
alc_oldPE is the potential energy at the start of the NCMC iteration using softcore interactions,
and norm_newPE and oldPE refer to the potential energies at the end and start of the NCMC iteration, respectively, using the normal steric/electrostatic interactions (the kT factor casts the potential energy in units of kT).

The actual correction factor should be:
correction_factor = (alc_oldPE - oldPE + norm_newPE - alc_newPE)*(1/nc_integrator.kT)
With the semi-cycle worked out in the attached picture.
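
Expanding the two expressions above term by term suggests that they differ only in overall sign. The sketch below checks this numerically with made-up energies; the function names and values are illustrative assumptions, and kT is passed as a plain float rather than read from the integrator.

```python
def original_correction(norm_newPE, alc_newPE, oldPE, alc_oldPE, kT):
    """The formula as originally written in the code."""
    return -1.0 * ((norm_newPE - alc_newPE) - (oldPE - alc_oldPE)) * (1 / kT)


def corrected_correction(norm_newPE, alc_newPE, oldPE, alc_oldPE, kT):
    """The formula proposed as the actual correction factor."""
    return (alc_oldPE - oldPE + norm_newPE - alc_newPE) * (1 / kT)


# Arbitrary example energies (same units as kT)
args = dict(norm_newPE=-100.0, alc_newPE=-95.0, oldPE=-110.0, alc_oldPE=-104.0, kT=2.5)
old = original_correction(**args)
new = corrected_correction(**args)
print(old, new)
```

If the two values are always negatives of each other, the fix amounts to a sign flip in the acceptance criterion.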

Implement thermostatting test

Gilson pointed out we should be able to test whether the MC is working properly by turning off all thermostatting and running constant energy MD and using the MC to "thermostat" the system (e.g. without velocity randomization after moves). This is probably a good test to set up (a slow test to run locally, most likely) if we can implement it without too much wrangling.

Update docs for NonequilibriumExternalLangevinIntegrator to indicate it is vestigial

We need to update the docs for NonequilibriumExternalLangevinIntegrator to indicate it is currently vestigial and link to the discussion in #53 . It's good to have here in case we need it in the future, but wouldn't want it to appear it is currently used if it is not.

Refactor Smart Darting

Smart darting isn't refactored in #44 and should be changed to fit with how BLUES will now handle moves.

Refactor NCMC code to eliminate code duplication?

We've moved all of the nonequilibrium switching functionality from perses into NonequilibriumLangevinIntegrat.

All of the nonequilibrium integrators can be moved out of ncmc_switching.py and replaced by calls to openmmtools.integrators.NonequilibriumLangevinIntegrator with the appropriate integrator splittings:

For g-BAOAB, this would be R V O H O V R, where the Hamiltonian update happens in the middle to make the protocol symmetric. (We currently recommend this, neglecting shadow work and only including protocol work in the NCMC acceptance probability; this will be the topic of a paper we're trying to tackle ASAP.)
For VVVR, this is O V R H R V O
For velocity Verlet, this is V R H R V
For GHMC (Metropolized VVVR) with alchemical updates, you want O { V R H R V } O
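
Each splitting listed above reads the same forwards and backwards, with the Hamiltonian update H in the middle. A small helper can verify that property for any splitting string. This is a sketch only: `is_symmetric_splitting` is a name introduced here for illustration and is not part of openmmtools.

```python
def is_symmetric_splitting(splitting):
    """Return True if a splitting string reads the same forwards and backwards.

    Braces (used for the Metropolized block) are mirrored when reversing.
    """
    mirror = {"{": "}", "}": "{"}
    tokens = splitting.split()
    reversed_tokens = [mirror.get(tok, tok) for tok in reversed(tokens)]
    return tokens == reversed_tokens


for splitting in ("R V O H O V R", "O V R H R V O", "V R H R V", "O { V R H R V } O"):
    print(splitting, is_symmetric_splitting(splitting))
```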

@maxentile, @bas-rustenburg, and @patrickgrinaway were integral in putting this together, and can help you if you need more info.

Plan reorganization into package with supporting materials in other directories

We'll need to reorganize what's here into a package structure (i.e. see https://github.com/open-forcefield-group/smarty) with a setup.py which installs a main package under some name (see #1) which can be imported. I can help with this, but first we'll need to settle what should go where, etc.

I'm guessing this means that many of the current items would then get moved into an examples or development directory (or some into each?) because they may relate to specific tests, illustrations, or ongoing development. @sgill2 , what would be the best way to handle this?

Test blues on CPUs

@Limn1 reported problems running simulations on CPUs. I'll revisit this issue after #19 has been addressed.

Plan MoveMixtureEngine

Ultimately, we will want to combine various types of moves, either separately with different probabilities, or sometimes in combination (smart darting plus random rotation). Perhaps the separate application might look like this:

mover = MoveMixtureEngine([ligmove, [sidechainmove_val111, sidechainmove_leu113], loopmove], nstepsNC, probabilities=[0.5, 0.25, 0.25])
Simulation(mover, everything_else***)
Simulation.run()

Simulation should be able to take either a "plain" move, e.g. Simulation(ligmove) or a MoveMixtureEngine object.

But how would we alter this if we want to apply moves simultaneously?
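
The separate-application case proposed above could be sketched as follows. Everything here is hypothetical: the class body, the `select` method, and the `seed` parameter are assumptions made for illustration, since no MoveMixtureEngine exists yet. A nested list is treated as a group of moves that is chosen as one unit.

```python
import random


class MoveMixtureEngine:
    """Hypothetical sketch: pick one move (or group of moves) per iteration."""

    def __init__(self, moves, probabilities, seed=None):
        if len(moves) != len(probabilities):
            raise ValueError("need one probability per move entry")
        if abs(sum(probabilities) - 1.0) > 1e-8:
            raise ValueError("probabilities must sum to 1")
        self.moves = moves
        self.probabilities = probabilities
        self._rng = random.Random(seed)

    def select(self):
        """Return the moves to apply this iteration, always as a list."""
        choice = self._rng.choices(self.moves, weights=self.probabilities, k=1)[0]
        return choice if isinstance(choice, list) else [choice]


# Strings stand in for instantiated move objects
engine = MoveMixtureEngine(["ligmove", ["val111", "leu113"], "loopmove"],
                           [0.5, 0.25, 0.25], seed=0)
picks = [engine.select() for _ in range(200)]
```

Returning a list in every case would let the caller loop over the selected moves without checking whether a single move or a group was drawn.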

BLUES Logo colors altered on resize

Minor issue but when the logo was resized the colors were altered.
Resized:
Particularly, the white shadows surrounding the blue appear purple.

Rather than changing the resolution on the file itself, you should scale the original source image with html.
For example: <img src="https://cloud.githubusercontent.com/assets/11985776/23873871/d2b50abe-07f0-11e7-94ae-85256ee78edd.png" width="240">

Get travis testing framework going here

We need to get automated testing via the Travis-CI framework working here. I will attempt to do so. As per #10, and particularly the example shown here, my initial pass will just add a travis check that attempts to get the package, install, and make sure it does this properly: #10 (comment)

Isolate dependency on openeye tools

As we incorporated sidechain moves, we introduced a dependency on the OpenEye toolkits. We should isolate this so it is only a dependency if sidechain moves are utilized so that people can use the package (though not sidechain moves) without the OE toolkits.

Do new point release

We should do a new point release now that #44 is merged. 0.0.3? @nathanmlim , want to do this? You would use the "releases" tab and model it after the other ones. This SHOULD automatically trigger a new DOI via zenodo as well.

Disallow odd values for Simulation

We should enforce that the nsteps_nc option for blues.simulation.Simulation is even, so that the protocol is symmetric and detailed balance is maintained.
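
One possible form of this check is sketched below. The function name `validate_nsteps_nc` is hypothetical, and requiring a positive value is an extra assumption beyond what the issue asks for.

```python
def validate_nsteps_nc(nsteps_nc):
    """Raise ValueError unless nsteps_nc is a positive even integer."""
    # bool is a subclass of int, so reject it explicitly
    if not isinstance(nsteps_nc, int) or isinstance(nsteps_nc, bool):
        raise ValueError("nsteps_nc must be an integer")
    if nsteps_nc <= 0 or nsteps_nc % 2 != 0:
        raise ValueError("nsteps_nc must be a positive even number, got %d" % nsteps_nc)
    return nsteps_nc


validated = validate_nsteps_nc(1000)
```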

Water translation & rotation

I'm working on integrating the BLUES WaterTranslationRotationMove class and RandomLigandRotationMove class into one.

See code here: https://github.com/MobleyLab/blues/blob/waterscript/Water/WaterTranslationRotation.py

See if checking for nans is necessary

There's a conditional part of the log(accept) that checks whether the energy is NaN and immediately rejects the move. Going by NaN comparison logic, such moves shouldn't be accepted if a NaN pops up in the ≥ conditional, so checking for NaNs explicitly might not be needed.
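
The NaN behavior can be confirmed in plain Python, where every ordered comparison involving NaN evaluates to False. The sketch below uses a hypothetical `accept` function rather than the BLUES acceptance code.

```python
import math


def accept(log_random, log_acceptance):
    """Metropolis-style check written so that NaN is rejected implicitly."""
    return log_acceptance >= log_random


log_r = math.log(0.5)
print(accept(log_r, 0.0))           # ordinary case
print(accept(log_r, float("nan")))  # NaN never satisfies >=
```

The direction of the comparison matters here. A check written as "reject if log_random > log_acceptance" would let a NaN through, because that comparison is also False.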

Once repo is public, turn on Zenodo for DOIs

Once this is public, turn on Zenodo so that each release will automatically generate a new citeable DOI for the code.

Add a license to the repo

I'll add a license to the repository -- probably MIT, for consistency with OpenMM.

Exception: Particles not set for NC context

Offending line causes the exception below:

File "example.py", line 105, in runNCMC
alchemical_correction=True)
File "/mnt/SG1TB/Github/testenv/blues/blues/ncmc.py", line 417, in runSim
nc_stateinfo = nc_context.getState(True, False, False, False, False, periodic)
File "/home/limn1/anaconda3/envs/blues_test/lib/python3.5/site-packages/simtk/openmm/openmm.py", line 3969, in getState
self._getStateAsLists(getP, getV, getF, getE, getPa, getPd, enforcePeriodic, groups_mask)
File "/home/limn1/anaconda3/envs/blues_test/lib/python3.5/site-packages/simtk/openmm/openmm.py", line 3919, in _getStateAsLists
return _openmm.Context__getStateAsLists(self, getPositions, getVelocities, getForces, getEnergy, getParameters, getParameterDerivatives, enforcePeriodic, groups)
Exception: Particle positions have not been set

Line 416 is meant to create an empty positions array for the nc context, but the new versions of OpenMM cause the exception above to be raised.

Possible solution would be to just initialize an empty numpy array using the length of atoms from the md context?

Revert reversion to `alchemy`

#84 will revert to the old alchemy package rather than openmmtools.alchemy while we are sorting out the issues there. This is a reminder to revert back.

Example script uses relative import

The example script currently uses a relative import, so if a user copies the example script to another directory they'll run into an import error.
https://github.com/MobleyLab/blues/blob/master/blues/example.py#L16

Extend bond rotation moves to general bonds

We're close to having sidechain NCMC moves working, thanks to @khburley , but I just realized we should also eventually make it so the same type of move (under a different name) can be applied to arbitrary bond rotations to allow flexible selection of bonds for sampling enhancement. This is just a placeholder to remind us to do that eventually.

Provide a minimal example of a unit test

@sgill2 - I'd like to help get the automated testing framework running here. To do so, can you provide me with some sort of a minimal test which should run if blues is successfully installed which is rapid and either (a) produces some output you know is correct, or (b) just tests that the module loads properly?

Really, this could just be a series of commands which should run that wouldn't run if blues is not correctly installed. Then I will incorporate these into a test for Travis-CI and get the testing framework going on here, and you can then extend the tests as things proceed.

Add link to v0.0.4 in README.md

The Versions list in the README.md didn't get updated with the 0.0.4 release.

Refactor Model Class

At some point soon we're looking to refactor the model class so that it can modularly combine different types of moves by using a redefined Class structure and generators (See openforcefield).

0.0.1 release did not trigger Zenodo; I should do next release

@nathanmlim - LMK when we are ready for another release. For some reason the 0.0.1 release you just cut did not trigger deposition to Zenodo/assignment of a DOI. I will need to do another release to troubleshoot, but it would be silly to do another release which is exactly the same. So, tag me when anything significant and new is in so I can do another release to troubleshoot.

Bring in fix to accelerate alchemical relaxation

@sgill2 - as we're rapidly heading towards having more people using this, we should go ahead and bring in the fix which makes the alchemical portion dramatically faster by NOT using the long range correction. Can you go ahead and do that?

Example fails for me

I'm attempting to run the example and I get this error:

practice_run = SimNCMC(temperature=temperature, residueList=lig_atoms)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-0b85e377113d> in <module>()
----> 1 practice_run = SimNCMC(temperature=temperature, residueList=lig_atoms)

/Users/dmobley/anaconda/lib/python2.7/site-packages/blues-0.0.1-py2.7.egg/blues/ncmc.pyc in __init__(self, temperature, residueList, **kwds)
    163         niter:    int, number of iterations of NC/MD to perform in total
    164         """
--> 165         super().__init__(**kwds)
    166 #        print('testing1')
    167

TypeError: super() takes at least 1 argument (0 given)
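
The traceback above comes from a Python 2.7 installation, where super() must be called with explicit arguments; the zero-argument form is only available in Python 3. A minimal sketch of the explicit form follows, using hypothetical class names (Base and SimNCMCLike are illustrative stand-ins, not the BLUES classes):

```python
class Base(object):
    def __init__(self, **kwds):
        self.options = kwds


class SimNCMCLike(Base):
    """Hypothetical stand-in for the class in the traceback."""

    def __init__(self, temperature, residueList, **kwds):
        # Explicit form: valid on both Python 2.7 and Python 3
        super(SimNCMCLike, self).__init__(**kwds)
        self.temperature = temperature
        self.residueList = residueList


practice_run = SimNCMCLike(temperature=300, residueList=[1, 2, 3], niter=5)
```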

Fix verbose print statements

Currently the verbose printing of the integrator variables for debugging in the Simulation object doesn't work correctly for two reasons:

The verbose boolean is never specified, thus always defaults to the default of false.
https://github.com/MobleyLab/blues/blob/master/blues/simulation.py#L230-L235
The getWorkInfo() method specifies a str 'param' when it should reference the param variable
https://github.com/MobleyLab/blues/blob/master/blues/simulation.py#L152-L153

Allow addition of restraints

Currently it's difficult to add restraints to the BLUES simulations without changing the base code. It'd be good to think of a scheme to allow easier additions of restraints.

Think through architectural issues relating to Orion

In prep for our visit to OpenEye next week, there are a couple of things to think through relating to potentially getting this going on Orion. Here is some info from Chris Bayly:

Orion is the cloud-based computing arena that runs Floes (workflows), which consist of modular "units of work" cubes connected by data streams. So good preparation for Sam's work would be to have it as a workflow that logically consists of only successive high-level Python functions, where each function will become a "unit of work" cube and arguments/return values become data streams.

Right now the initial input to the first cube needs to be one or more molecules; we attach SD data to the molecule to get other plain old data in.

So I think we should now think through some architectural issues about how this would be used and how to modularize as part of workflows. Right now I think @sgill2 has mainly focused on the aspect of "given this prepared system, how do I sample this binding site or sites efficiently?" but we need to think a little bigger now.

Here are some key questions we should try to sort out this week:

If I have a series of ligands and some protein of interest, what would a typical workflow look like before applying blues?
How would we designate or determine what binding site or sites we want to explore for each ligand?
What would be passed into the actual blues run and what would it provide back? (i.e. what goes into the cube, what comes out?)
What final analysis/data would we report back to a user? (i.e. what is the analysis of the whole workflow?)
What will the user do with what is reported back and how do we facilitate that?

It seems like so far, @sgill2 has focused on the internals of item 3, plus a small amount of item 4 perhaps.

Some of this is obviously a bit premature as we have a lot of development/testing to do yet, but now's a good time to think about where we want to end up, with a finished product that can be applied to a wide range of systems fairly easily.

Here are some initial thoughts from my end, but I could use your input, @sgill2 and @Limn1 : It seems to me that we're going to want to enable a workflow where someone takes a library of ligands and docks them to a protein in a binding site or sites (or a whole protein) they specify, selects diverse docking poses, runs short MD simulations to identify diverse stable binding sites, then applies NCMC with rotational and smart darting moves to determine the equilibrium populations of these different binding modes.

If that's the case, then I would think item 1 would take in (a) a set of molecules representing the ligand library, (b) a protein molecule for docking into, and (c) a designation of the binding site(s) to dock into, provided by (i) several placements of a reference molecule in the protein or (ii) selection of the whole protein to dock into or (iii) designation of a region around a reference molecule. Presumably the first stage (cube) would then be to run docking into this setup and return diverse poses back.

The second stage (cube) might then be to take these diverse poses for each ligand, run short MD simulations, cluster them (?) and identify stable locations the ligand can remain over these simulations.

The third stage would then perhaps be blues and would take in (a) the locations the considered ligand can be, and (b) a starting placement of the ligand. What would be returned?

Thoughts appreciated. We can also discuss perhaps tomorrow.

Make simple but nontrivial test system with fast run time

We've needed a simple test system for basic BLUES which will allow us to (with minimal simulation effort) verify that rotational moves are in fact working correctly and yielding correct answers. We probably want a system we can in principle run to convergence with standard MD (to get the right answers which would be archived for comparison) but which will run very fast with BLUES.

In discussions we came up with an eight-atom test system we think may work, which would consist of a "ligand" made up of two atoms connected by a bond (with very weak partial charges, giving a slight dipole) in a "binding site" made up of six atoms at the centers of the faces of a cuboid (like a rectangle but 3D), with the end atoms slightly charged (so that the ligand has a preferential orientation). The cuboid would be somewhat longer than it is tall and wide so that the "ligand" primarily fits end-wise, with one direction being preferred over the other somewhat.

The dimensions of the box would need to be tuned carefully so that transitions between "binding modes" are possible but slow with standard MD, whereas with rotational moves it will be easy to switch between them. The purpose of the charges is just to break the symmetry slightly so that of the two "preferred" orientations, one has substantially higher population than the other.

@sgill2 was going to set this up.

Option to output NCMC simulation frames

To debug at the ncmc level it'd be helpful to have the ability to output frames from the ncmc simulation. Currently the only way to do this is to use the simulation.writeFrame() method, which isn't ideal for two reasons:

You'd have to go and modify the blues code directly
Using ParmEd (which is what simulation.writeFrame uses) to write out the files is very slow

To address this, I'm thinking of adding an option to the simulation opt dict to specify whether NCMC simulation frames are wanted and at what interval to output those frames using OpenMM reporters.

Glitch in README.md formatting

There is some sort of glitch in the README.md formatting where the prerequisites aren't displaying properly and the things right below that are misformatted.

Also, if it's easy, it would be nice to make the text wrap around the blues logo; if not, we should make it smaller.

Make repo public, do final cleanup first

We'll need to make this public (open source) soon, because that was my original intention, but also because the Travis-CI testing framework is free for open source projects only. It will give us our first 100 builds for free anyway, though that won't last that long (100 commits, roughly) so we need to open it up soon.

What do we need to do first? i.e. what should be cleaned up/better documented/fixed? We probably won't have much traffic until we publish on it or start advertising, but we'll certainly have SOME.

query about moves.py

On lines 215 and 216 of moves.py, the function OEPerceiveResidues is called twice in succession. Is there a particular reason for the current command structure? Just trying to understand the function.
OEPerceiveResidues(molecule, OEPreserveResInfo_All)
OEPerceiveResidues(molecule)

Create benchmarking "suite"; what should be in it?

We really need a benchmark set or suite where we have a couple of diverse systems we can use to check performance of different move proposal schemes/integrators/etc. We want to move away from running a few small simulations locally when we change something and seeing that acceptance roughly stays the same or gets better to actually knowing EXACTLY how much different approaches impact sampling efficiency on some set of systems. We want this to end up basically push-button, so we can just run some utility on our queue and get back an assessment of the current level of performance.

Obviously, we should include toluene in lysozyme since we've done so much with this already and it's easy to figure out exactly how to analyze the data to assess efficiency (number of transitions per time, convergence of populations, etc.) But what else should be on our tests? @nathanmlim - do you think we can get your initial test system to this stage too?

And, what should we test? I'd think we'd want to normally look at each system, and then for each system try varying the amount of relaxation done over some range (how broad a range?) and look at measures of sampling efficiency.

Select a package/repository name

@sgill2 - we'll need to pick a name for this repository and the package people would install. Particularly, imagine this repository becomes a python package (plus supporting material) which can take an arbitrary protein-ligand system and accurately predict and return the binding modes of a ligand, and their populations. What would this package be called? What would you conda install mytoolname?

I temporarily changed the repo name to "ligand_ncmc" but this is probably not the final answer. Clever acronyms are acceptable, as well as names inspired by mythology, analogy, or all sorts of other alternatives. Thinking along the lines of "smart darting" may help (i.e. if this were just smart darting, we'd call it smartdarting or something).

We should resolve this soon, as in the very near future we'll need to start reorganizing into something which can be organized/installed as a python package, etc., and for that we need a name.

Add at least a minimal example and update README.md to reflect

We need at least a basic README.md which says, minimally:

What this is for/what it does
How to install/what the prerequisites are
What the different files here are and what they are for (i.e. a manifest)
Gives at least a minimal usage example which is provided here

A fairly extensive example of a README.md can be found on SMARTY: https://github.com/open-forcefield-group/smarty/blob/master/README.md . For now, this one can be more minimal, but it at least needs to say what the different scripts here are for. (i.e., imagine I wanted to start doing detailed code review now to check how things are organized/commented/etc., and I wanted to start by doing a test application of your tools to a problem I'm interested in. I should be able to figure out how to do that from your README.md, @sgill2 .)

UnicodeDecodeError on installation?

Not sure why I'm only getting this issue when trying to clean install the package from the repo on TSCC

(dev) [limn1@tscc-gpu-9-6 blues]$ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 126, in <module>
    long_description=read('README.md'),
  File "setup.py", line 107, in read
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
  File "/home/limn1/anaconda3/envs/dev/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2199: ordinal not in range(128)
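
The byte 0xe2 is the lead byte of three-byte UTF-8 sequences such as curly quotes and dashes, so the README most likely contains one of those characters while the environment defaults to ASCII. One common workaround is to open the file with an explicit encoding. The sketch below is an assumption about how the read() helper in setup.py could be written, not the actual fix applied in the repository; the `base_dir` parameter is added here only to keep the example self-contained.

```python
import io
import os


def read(fname, base_dir="."):
    """Read a text file as UTF-8 regardless of the locale's default encoding."""
    with io.open(os.path.join(base_dir, fname), encoding="utf-8") as handle:
        return handle.read()
```

With an explicit encoding, the result no longer depends on locale settings on the cluster.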

Finish testing integrators and swap default integrator

In #50 (and, to a lesser extent, #53) we've been discussing changing the default integrator to NonequilibriumLangevinIntegrator. We need to do so, but first we need to finish testing:

@sgill2 has some questions he needs to answer (and re-test if needed) which are asked here: #50 (comment) (and could be answered on this issue) for tests on lysozyme, I believe.

@nathanmlim is going to test on a soluble epoxide hydrolase (?) system, cross-comparing this with the current integrator and report back.

Once we've tested a bit more carefully and documented the results it sounds like we should be able to switch, which will resolve issues with code duplication discussed in #50 and will hopefully also improve acceptance, etc.

Suppress warnings for setting particle sigma=1A

This is just a suggestion to fix a minor annoyance.

Using the new refactored framework, where we provide a parmed.Structure as input, there is an absurd amount of warning messages like:

particle 22339 has Lennard-Jones sigma = 0 (charge=0.417 e, sigma=0.0 nm, epsilon=0.0 kJ/mol); setting sigma=1A

This appears to be coming out of openmm.alchemy but I've been told these warnings are safe to ignore.

I'm not quite sure what the best way to address this is?
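
If these messages are emitted through Python's standard logging module (an assumption that has not been verified here), one option is a logging filter that drops only the matching records. The class and function names below are hypothetical, and the logger name passed in would need to match whichever module produces the warnings.

```python
import logging


class SigmaWarningFilter(logging.Filter):
    """Drop log records that mention the sigma=1A reassignment."""

    def filter(self, record):
        return "setting sigma=1A" not in record.getMessage()


def silence_sigma_warnings(logger_name):
    """Attach the filter to the named logger and return that logger."""
    logger = logging.getLogger(logger_name)
    logger.addFilter(SigmaWarningFilter())
    return logger
```

A filter keeps other warnings from the same logger visible, which raising the log level would not.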

Constraint handling in SimulationFactory.generateSystem()

I'm still trying to figure out the integrator differences in acceptance, but in the process I noticed that the constraints may not be handled appropriately in the generateSystem() method. https://github.com/MobleyLab/blues/blob/master/blues/ncmc.py#L234-L247

Specifically while generateSystem() takes a constraint argument, that argument is never actually used in the function itself, thus the constraints for systems generated this way will always be None.

Update certain tests

@nathanmlim remarked in #53 that he needs to update certain tests:

I'll raise an issue to change the tests I had written up (since they're bad anyways).

So, this is a reminder to do so.

Allowing flexibility in integrator choice

Currently the SimulationFactory.generateSimFromStruct() method doesn't allow for specification of integrators and an NCMCVVAlchemicalIntegrator is automatically used as the integrator. I think allowing an option to either specify and/or pass in an integrator would be beneficial.

NCMC Reporter and logger changes

There are a couple parts that are getting merged in PR#94 that should definitely be revisited later:

Initialization of the logger module and its settings is currently embedded in the example scripts.
- Needs to be moved into the core code somewhere.
_getSimulationInfo is still hardcoded to account for extra propagation steps only in the lambda range 0.2 -> 0.8.
- Will need to fix that so it is dynamic if we change the lambda range.
- Need to include a check to ensure that the range is symmetric?
Progress in the simulateNCMC phase just prints out work values and the step value.
- Will replace with an NCMC reporter I've been working on.
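
The symmetry check raised in the list above could look something like the sketch below. It treats "symmetric" as meaning the range is centered on lambda = 0.5, which is an interpretation rather than something stated in the issue, and `check_symmetric_range` is a hypothetical name.

```python
def check_symmetric_range(lambda_start, lambda_end, tol=1e-9):
    """Raise ValueError unless the range is centered on lambda = 0.5."""
    if not 0.0 <= lambda_start < lambda_end <= 1.0:
        raise ValueError("expected 0 <= lambda_start < lambda_end <= 1")
    # A range centered on 0.5 has endpoints that sum to 1
    if abs((lambda_start + lambda_end) - 1.0) > tol:
        raise ValueError(
            "range %g -> %g is not symmetric about 0.5" % (lambda_start, lambda_end)
        )
    return lambda_start, lambda_end


checked = check_symmetric_range(0.2, 0.8)
```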

Character encoding issues in README.md

There is an issue in the README.md file related to character encoding that results in this error during installation of blues. It can be bypassed by deleting the entire contents of the README.md file.