Comments (5)
Excellent point. This is certainly undesirable.
I like the idea of a multi-resolution output file format, with the large, full restart snapshots being written less frequently.
I felt an important feature of these NetCDF files and repex.py was robustness to early termination. Resuming from an existing NetCDF file should ideally be painless: the code should find the last "good" snapshot and resume from there.
I propose we allow the user to tune the checkpoint frequency separately from the frequency at which other properties are written. When a run is resumed, we should ERASE the data following the last checkpoint and resume from that point. This may cause some odd behavior, where running for a short time after a termination actually removes samples, but I think it gives the most robust behavior overall.
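A minimal sketch of the checkpoint/truncate-on-resume policy described above. It uses a hypothetical in-memory `RepexStore` class as a stand-in for the NetCDF file (the class, its fields, and the one-energy-per-iteration layout are all assumptions for illustration, not repex.py's actual storage code). Cheap per-iteration data is recorded every iteration, full restart snapshots only every `checkpoint_interval` iterations, and `resume()` discards everything after the last checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RepexStore:
    """Hypothetical in-memory stand-in for the repex NetCDF file.

    Per-iteration data (here just one energy per iteration) is recorded every
    iteration; a full restart snapshot is recorded only every
    `checkpoint_interval` iterations.
    """

    checkpoint_interval: int
    energies: List[float] = field(default_factory=list)
    checkpoints: Dict[int, dict] = field(default_factory=dict)  # iteration -> snapshot

    def write_iteration(self, iteration: int, energy: float, snapshot: dict) -> None:
        if iteration != len(self.energies):
            raise ValueError("iterations must be written in order")
        self.energies.append(energy)
        if iteration % self.checkpoint_interval == 0:
            self.checkpoints[iteration] = dict(snapshot)  # copy the full restart state

    def resume(self) -> Tuple[int, Optional[dict]]:
        """Erase data written after the last checkpoint.

        Returns the next iteration to run and the snapshot to restart from.
        """
        if not self.checkpoints:
            self.energies.clear()
            return 0, None
        last = max(self.checkpoints)
        del self.energies[last + 1:]  # samples after the checkpoint are discarded
        return last + 1, self.checkpoints[last]


store = RepexStore(checkpoint_interval=10)
for it in range(25):  # pretend the run is killed after iteration 24
    store.write_iteration(it, energy=float(it), snapshot={"iteration": it})
next_iteration, snapshot = store.resume()
print(next_iteration, len(store.energies), snapshot)  # 21 21 {'iteration': 20}
```

Here iterations 21 to 24 are thrown away on resume, which is exactly the "odd behavior" mentioned above: the samples are lost, but the file is guaranteed to be consistent with the snapshot the run restarts from.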
repex.py should also predict the final file sizes, so you can tell whether you are going to run out of storage.
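A rough size estimate could look like the sketch below. The storage layout is an assumption for illustration only: every iteration stores an `n_states x n_states` float64 energy matrix plus one int32 state index per replica, while checkpoints store single-precision positions and 3x3 box vectors for every replica. NetCDF headers and compression are ignored, and `estimate_file_size_bytes` is a hypothetical helper, not an existing repex.py function.

```python
import math
import shutil


def estimate_file_size_bytes(
    n_iterations: int,
    n_states: int,
    n_atoms: int,
    checkpoint_interval: int,
    position_bytes: int = 4,
    energy_bytes: int = 8,
) -> int:
    """Rough size estimate for an assumed layout (no headers, no compression)."""
    # Written every iteration: energy matrix (n_states x n_states)
    # plus one int32 state index per replica.
    per_iteration = n_states * n_states * energy_bytes + n_states * 4
    # Written only at checkpoints: positions (n_states x n_atoms x 3)
    # plus 3x3 box vectors per replica.
    per_checkpoint = n_states * (n_atoms * 3 + 9) * position_bytes
    # Checkpoints at iterations 0, interval, 2*interval, ...
    n_checkpoints = math.ceil(n_iterations / checkpoint_interval)
    return n_iterations * per_iteration + n_checkpoints * per_checkpoint


size = estimate_file_size_bytes(1000, 24, 10_000, checkpoint_interval=100)
free = shutil.disk_usage(".").free
print(f"estimated {size / 1e6:.1f} MB; {'fits' if size < free else 'will NOT fit'} on this disk")
```

Under these assumptions, 1000 iterations of 24 replicas with 10,000 atoms come to about 33.5 MB when checkpointing every 100 iterations, versus about 2.9 GB when positions are written every iteration, which is the motivation for separating the two frequencies.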
Finally, I like the idea of a "plug-in" system for computing the different properties to be included in the NetCDF file. Structuring it in a manner similar to the Reporter objects is a good idea.
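One way such a plug-in system might be organized is sketched below. Everything here is hypothetical: `PropertyReporter`, `MeanEnergyReporter`, `StateIndexReporter`, and `OutputManager` are illustrative names, and the replica data is reduced to plain dicts. Each reporter declares a name and its own output interval, and the manager calls it only on matching iterations, so new properties can be added without touching the main loop.

```python
from typing import Dict, List, Protocol


class PropertyReporter(Protocol):
    """Hypothetical plug-in interface: one reporter per stored property."""

    name: str
    interval: int  # compute/store every `interval` iterations

    def compute(self, iteration: int, replicas: List[dict]) -> object: ...


class MeanEnergyReporter:
    name = "mean_energy"

    def __init__(self, interval: int = 1):
        self.interval = interval

    def compute(self, iteration: int, replicas: List[dict]) -> float:
        return sum(r["energy"] for r in replicas) / len(replicas)


class StateIndexReporter:
    name = "state_index"

    def __init__(self, interval: int = 1):
        self.interval = interval

    def compute(self, iteration: int, replicas: List[dict]) -> List[int]:
        return [r["state"] for r in replicas]


class OutputManager:
    """Calls each registered reporter on its own schedule and stores the results."""

    def __init__(self, reporters: List[PropertyReporter]):
        self.reporters = list(reporters)
        self.data: Dict[str, Dict[int, object]] = {r.name: {} for r in self.reporters}

    def report(self, iteration: int, replicas: List[dict]) -> None:
        for reporter in self.reporters:
            if iteration % reporter.interval == 0:
                self.data[reporter.name][iteration] = reporter.compute(iteration, replicas)


manager = OutputManager([MeanEnergyReporter(interval=1), StateIndexReporter(interval=5)])
replicas = [{"energy": 1.0, "state": 0}, {"energy": 3.0, "state": 1}]
for it in range(10):
    manager.report(it, replicas)
print(sorted(manager.data["state_index"]))  # [0, 5]
```

In a real implementation, `OutputManager.report` would write into NetCDF variables rather than a dict; the point of the design is that each property owns both its computation and its output frequency.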
This feature will be very important as we start doing explicit solvent free energy calculations. This has recently been enabled by Peter's addition of a dispersion correction to the CustomNonbondedForce class.
I'll mark this high priority.
from brokenyank.
This might be something to punt until the 2.0 release. To me, it makes sense to fix all outstanding minor issues (e.g., installation cleanup, GPU issues, MPI issues) for the 1.0 release and keep a list of "harder" changes that will wait until 2.0.
Good point. It's not essential for explicit solvent free energy calculations if you have sufficient storage, so we can punt this.
The other idea I had for reducing file sizes is to output results only for specific thermodynamic states. For example, when I run repex for a conformational change, I don't actually care about what goes on at 500 K; I just want those states to be exchanging with the ambient-temperature states. Thus, it would make sense to simulate 300-500 K but output only 300-320 K.
I think this idea might be less useful for alchemical simulations, however.
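A small sketch of the state-filtering idea. It assumes a geometrically spaced temperature ladder (a common choice for temperature repex, assumed here rather than taken from repex.py); `geometric_temperature_ladder` and `states_to_output` are hypothetical helpers. All states would still be simulated and exchanged; only the write step would consult the filtered index list.

```python
from typing import List


def geometric_temperature_ladder(t_min: float, t_max: float, n_states: int) -> List[float]:
    """Temperatures (K) spaced geometrically from t_min to t_max (assumed ladder)."""
    if n_states == 1:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n_states - 1))
    return [t_min * ratio**i for i in range(n_states)]


def states_to_output(temperatures: List[float], t_low: float, t_high: float) -> List[int]:
    """Indices of thermodynamic states whose temperature lies in [t_low, t_high]."""
    return [i for i, t in enumerate(temperatures) if t_low <= t <= t_high]


# Simulate (and exchange among) all 12 states from 300 K to 500 K...
temperatures = geometric_temperature_ladder(300.0, 500.0, 12)
# ...but write per-state data only for the states between 300 K and 320 K.
written = states_to_output(temperatures, 300.0, 320.0)
print(written)  # [0, 1]
```

With 12 states, only the 300 K and roughly 314 K states would be stored, cutting per-state output by a factor of six in this example.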
We can certainly allow more flexibility in exactly which data are output every iteration. I'd still want to write out full-precision "checkpoints" every so often, so that the simulation can cleanly resume from them.
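A sketch of the precision split, assuming "full precision" means float64 and that frequent output is allowed to drop to float32 (both assumptions for illustration; `per_iteration_positions` and `checkpoint_positions` are hypothetical helpers). The compact copy halves storage at the cost of small rounding errors, while the checkpoint copy is bit-for-bit identical to the in-memory positions.

```python
import numpy as np


def per_iteration_positions(positions: np.ndarray) -> np.ndarray:
    """Compact copy for frequent output: single precision halves the storage."""
    return positions.astype(np.float32)


def checkpoint_positions(positions: np.ndarray) -> np.ndarray:
    """Checkpoint copy: keep full double precision so a resumed run starts from identical coordinates."""
    return positions.astype(np.float64, copy=True)


rng = np.random.default_rng(0)
positions = rng.normal(size=(24, 1000, 3))  # (replicas, atoms, xyz), float64
compact = per_iteration_positions(positions)
full = checkpoint_positions(positions)
print(compact.nbytes, full.nbytes)  # 288000 576000
```

Reduced precision is usually fine for analysis, but restarting from rounded coordinates would not reproduce the uninterrupted run, which is why the checkpoints keep the full-precision copy.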
Related Issues
- Feature request: Add support for GROMOS forcefields and small molecules via ATb
- YANK does not correctly handle the situation where all MPI processes are attached to GPU nodes
- Check everything with PyFlakes and (auto) pep8.
- Standardize our usage of delayed imports
- Multidimensional Repex and AMD
- Modify alchemy module to use group-based CustomNonbondedForce
- Allow systems to be set up using OpenMM Modeller facility
- Feature request: Protein mutations
- Speed up alchemical intermediate creation
- See if alchemical intermediates can be assigned global Context parameter to avoid the need to create and cache many Context objects
- Deprecate pyopenmm
- test_repex_mpi.py cuda platform
- test_repex_mpi.py hangs on context creation
- Use separate MBar?