maginngroup / cassandra Goto Github PK
View Code? Open in Web Editor NEWCassandra is a Monte Carlo package to conduct atomistic simulations.
Home Page: https://cassandra.nd.edu/
License: GNU General Public License v3.0
Cassandra is a Monte Carlo package to conduct atomistic simulations.
Home Page: https://cassandra.nd.edu/
License: GNU General Public License v3.0
Check off the items as they are completed. Feel free to add to the checklist if anything appears to be missing!
The README in Cassandra/Examples/NPT/water_spce is incorrect. It claims that the simulation is of 600 molecules, but it only has 90 molecules. It also refers to a file, npt.inp.chk, which does not exist, and it claims to start from that checkpoint file despite actually starting from npt.inp.xyz. The README may have been made for a different simulation from that provided in the examples, and may have other inaccuracies not listed here.
Compress large fragment files in the Cassandra distribution
@jshahOSU suggested that we might want to modify the ‘custom’ option for the Mixing_Rule section. Currently, the user has to specify all the cross interactions. We could provide a feature that only the parameters for the cross interaction of interest need to be specified. For the other cross interactions, the user-specified or default combining rules will apply.
The mcfgen script is not currently being automatically tested in any form.
It'd be good to include unit tests for several PDB files with species of different topologies (i.e. with rings, united atom, all atom, zeolites). The resulting MCF in these tests should be compared to accepted MCF.
Dear Developers,
I'm a Ph.D. student working on sorption in biopolymers. I've been doing molecular dynamics modeling on sorption in polymeric systems for past years (mainly with LAMMPS) and recently I've decided to migrate to Cassandra. first of all, I wanted to thank you for your clear user guide and examples which accelerated me in preparing my modeling tool. there are some questions regarding energy calculations in Cassandra that I find unclear and would be very thankful for If I can have your comments on it.
I start with my most important question:
1- I perform a MD + GCMC loop. first I add SPCE water with a specific chemical potential to my system with GCMC in Cassandra. afterward using a bash code I prepare the final product for GROMACS NPT. I repeat this loop to reach equilibrium. Since water molecule atoms move and vibrate during my MD section even when I use LINCS or RIGID algorithms to constrain the bonds. after some repetition, Cassandra finds the mismatch between mcf file and bond lengths in xyz and stops with an error. to solve this problem I can delete the bond and angle information from water spce. However, this results in previously bonded atoms in the same molecule to interact with lj and thus giving me very high energy and also problems with gcmc. To me, I can solve the problem by using intra-scaliing 0 0 0 0 instead of 0 0 0 1 to stop atoms interacting in the same molecule. (is this correct?) does this option neglect interaction within the same water molecule or all the interaction between different water molecules, too?
2-since all the above mentioned problems can happens in the solid structure of my polymer too can I use the same approach for my cellulose solid structure too?
3- strangely I found out that Cassandra gives me different energy values for wrap or unwrapped input files. If this is expected. should I always use unwrapped configurations for my simulations?
Again thanks for all your time and just wanted to add Cassandra is a very helpful and clear package and I'm happy to join your community
For a simulation with electrostatic interactions, the number of k-vectors in by a cubic box whose box length is X should be the same as those required by an orthogonal box whose box lenghts are all the same and equal to X.
The number of k-vectors is not the same between cubic and orthogonal boxes of same dimensions for the same system.
Steps to reproduce the behavior:
Any additional information here, attach relevant text or image files and URLs to external sites, publications , etc
The MCF files written by Cassandra during the fragment library generation do not contain the bond length tolerances that are (optionally) specified in the MCF file. Therefore, fragment library generation can fail for ring structures even if the PDB bond lengths are within the specified tolerance.
Cassandra should write the bond length tolerances to the fragment MCF files so that it can generate the fragment libraries without error in such cases.
With Python 2 is no longer being supported/updated, can the run_examples.py be converted to Python 3?
A conversion of run_examples.py to Python 3
When cloning, at least without any special flags:
$ git clone https://github.com/MaginnGroup/Cassandra.git
we get the entire git
history in the .git
hidden directory, which happens to be quite large. This may be due to some binaries being tracked in the history, which can balloon the size quickly. Here is a snapshot of my clone:
I will tag GOMC-WSU/GOMC#154 as the source of my deja vu (strangely enough, the histories are nearly the same size) and @justinGilmer, who noted this in GOMC, and @YounesN, who is probably responsible for fixing it there.
When simulating rigid solids, a PDB of the solid framework is required to setup a simulation. If CONECT records are found in this file, the script mcfgen.py generates an MCF with connectivity information, including bonds. This might be problematic when running a simulation for rigid solids, as the initial configuration is typically input as an XYZ file and the bonds of the framework atoms might not be equal to the nominal bond length. As a consequence, the function Compute_Molecule_Bond_Energy raises an error when it checks that the XYZ file and MCF bond lengths are consistent within some tolerance.
The documentation specifies that the PDB file for the rigid solid does not list CONECT information, so the mcfgen.py script will not include bond, angle, or dihedral sections in the force field template.
Perhaps adding a "--solid" flag to the command line interface of mcfgen to ignore the CONECT section of the PDB file if present. The generated ff template should only include intermolecular interactions and the final MCF should only have non-bonded parameters.
None
None
A minimal example of faujasite is provided. Included is a PDB with CONECT keywords. If the mcfgen.py script executed, the connectivity information will need to be input. Also the ring identification functions are executed.
Ring fragments with incorrect bond lengths in the starting structure (i.e., the physical bond length does not equal the length specified in the MCF file) are problematic. The following commentary would also apply to fixed angles. In V1.2, the (fragment generation, and end-use) simulations run without warning the user that the bond length is different than the specified bond length. In the version on the develop branch, the either simulation immediately exits with an error (broken bond).
I think it would be a useful feature if (either in Cassandra, or library_setup.py), we could "fix" incorrect bond lengths in rings. I'm open to suggestions regarding the best way to accomplish this end. I'm naively envisioning a solution whereby the coordinates are updated such that the provided configuration is updated to the closest configuration lying on the constraint surface. If integrated directly into the Cassandra executable, this could also "fix" initial configurations that have incorrect bond lengths.
We now have integrated the test suite into our development pipeline, which is a great first step. These are effectively integration tests. As the code continues to become more complicated we should also add unit tests. This will allow us to test the behavior of specific functions and increase our confidence that new code changes do not break existing behavior.
I need to do more research, but one option is pFUnit. An example of another scientific code that uses pFUnit is here. Look at the .pf
files to get an idea of how it is implemented.
Please reply with other options below!
The auxillary scripts should also contain version info that is consistent with the main code releases to help with bug/issue tracking. bump2version
can be used to manage the versioning.
Count and report the number of Widom insertions with core overlap for each test particle species/box combination. Output the totals to the log file underneath the chemical potentials.
Cassandra currently saves the trajectory output to an xyz
file. Though this is a general format, it contains minimal information about the trajectory, and contains no box information.
An alternative output format is the general simulation data object or GSD. This is the native output format for HOOMD-blue. Particles and topologies can vary from one frame to the next, it has a Python API, can be read by tools like freud and OVITO, and is a binary format that supports random frame access.
There is a C API, so I think with fortran-C interoperability we could use the format without too much difficulty.
I'd like to get some input from the other developers on this idea.
If fragments 1 and 2 are connected, only one connection should appear in the MCF file.
The connection is doubled.
# Fragment_Connectivity
2
1 1 2
2 2 1
Steps to reproduce the behavior:
mcfgen.py C4H10.pdb
in the Methane_Butane
example.The old Python 2 version of mcfgen.py distributed with V1.2 had the correct behavior:
$ diff C4H10.mcf v1.2_C4H10.mcf
68c68
< 2
---
> 1
70d69
< 2 2 1
For the custom mixing rule, it would be good to allow for Aij and Bij coefficients, in addition to sigma and epsilon. It will enable purely repulsive, purely attractive, or some balance of the two. I think, then, we should be able to model WCA potential as well.
I am curious to hear opinions for/against removing library_setup.py and moving this functionality entirely into the Cassandra executable.
Under this scenario, fragment libraries would be generated before the first step of any MC simulation. The user would be 'blind' to the fact that the fragment libraries are being generated; they would be discarded at the end of a given simulation. In my view, the benefits of this approach are that the user only has to run a single executable to perform a simulation. In my experience, the computational cost of fragment library generation is inconsequential and would thus not notably impact execution time under most use-cases.
I imagine providing options to save the fragment libraries to disk, and read fragment libraries from disk. This would preserve the current workflow for users who were interested and/or if there was some system for which fragment library generation was costly.
I'm interested in hearing opinions from more experienced developers/users as this would represent a substantial overhaul to the code.
If library_setup.py
is executed from a directory with a space in the path to the current location, it fails and exits with an error. Example where working directory is: /root/simulations/test sim
rm: cannot remove '/root/simulations/test': No such file or directory
rm: cannot remove 'sim/nvt.inp': No such file or directory
mv: target 'sim/nvt.inp' is not a directory
library_setup.py
should work correctly regardless of if there is a space in the directory path.
I have been running a simulation for ~2 billion steps without issue. When I restart to run for beyond 2 billion steps, the simulation says it is complete. This may be an issue with the size of numbers which the program can store.
Attached are the input files needed to check this problem. In R32_bmimpf6_fail.zip the input file results in the issue discussed above. In R32_bmimpf6_success.zip the input file is only set to run for a short amount of time and is successful at restarting.
R32_bmimpf6_fail.zip
R32_bmimpf6_success.zip
Running mcfgen.py with a .pdb without CONECT (for example, a rigid framework of a zeolite) should work.
When running mcfgen.py with the option -t
and the pdb without CONECT I get:
[hmcezar@bitz-desktop silicalite_ch4]$ python ~/Sofware/Cassandra/Cassandra/Scripts/MCF_Generation/mcfgen.py silicalite.pdb -t
Reading Modified PDB File...
*********Generation of Topology File*********
Summary
---- -----------------------------------
0 bonds
0 rings
Traceback (most recent call last):
File "/home/hmcezar/Sofware/Cassandra/Cassandra/Scripts/MCF_Generation/mcfgen.py", line 1420, in <module>
angleID()
File "/home/hmcezar/Sofware/Cassandra/Cassandra/Scripts/MCF_Generation/mcfgen.py", line 546, in angleID
nBonds = len(atomConnect[atom])
KeyError: 1
Steps to reproduce the behavior:
mcfgen.py -t
I will push a fix for this bug.
Enable Cassandra to read pregenerated trajectories from .xyz
and .H
files, compute properties from them, and perform Widom insertions in them.
In the nvt, npt, gcmc, and gemc driver subroutines, the time counter for coordinate writing should increment coord_time
by ncoord_freq
, which is the time interval in minutes between steps with coordinate writing given in the input file after keyword coord_freq
when the units given are minutes.
Instead of incrementing by ncoord_freq
, coord_time
increments by nthermo_freq
, which is the time interval in minutes between steps with thermodynamic property writing given in the input file after keyword prop_freq
when the units given are minutes.
Replace nthermo_freq
in these sections with ncoord_freq
Code copied and pasted from this section (it's the same in each of the abovementioned drivers):
write_flag = .FALSE.
IF (.NOT. timed_run) THEN
IF ( MOD(i_mcstep,ncoord_freq) == 0) write_flag = .TRUE.
ELSE
now_time = now_time - coord_time
IF(now_time .GT. ncoord_freq) THEN
coord_time = coord_time + nthermo_freq
write_flag = .TRUE.
END IF
END IF
Should we allow dashes in file names? This restriction occasionally causes confusion. I'm not opposed to disallowing dashes if there is a good reason. Otherwise, I think we should allow them. If we decide to make this change we need to make sure that dashes work with mcfgen.py
and library_setup.py
in addition to the core code.
@rwsmith7531 I know you have looked into this?
@ryangmullen @emarinri Thoughts?
Allow dashes in file names.
Add an explicit warning in documentation. Also add a more descriptive error message to the code.
Discussion spawned by #87.
This is in order to simplify the signature of the function when using many molecules.
Use Frenkel’s method to compute error bars with block averages. For a long simulation, it involves modifying the block size until a plateau appears in the standard deviation
The subroutine Minimum_Image_Separation should compute the minimum image distance between two atoms.
The subroutine Minimum_Image_Separation computes the distance from a central atom i to the image of another atom j in a cell centered on atom i. For cubic and orthogonal cells, the distance between atom i and the image of j in this cell will be the minimum image distance. However, for triclinic cells the image j in this cell may not be the closest image to i.
# Box_Info
1
cell_matrix
1.000000 0.500000 0.000000
0.000000 0.866025 0.000000
0.000000 0.000000 1.000000
2
# BOX: 1.000000 1.000000 1.000000 90. 90. 60.
LJ 0.500000 0.288675 0.000000
LJ 1.312500 0.757772 0.000000
3a. As input, the vector pointing from atom 1 to atom 2 is (13/24, 13/24, 0) in scaled coordinates and (0.812501, 0.469097, 0) in Cartesian coordinates. The length of this vector is 0.938195.
3b. In a triclinic cell centered on atom 1, atom 2 gets wrapped along lattice vectors a and b. The vector pointing from atom 1 to image 2 is (-11/24, -11/24, 0) in scaled coordinates and (-0.687499, -0.396928, 0) in Cartesian coordinates. The length of this vector is 0.793856. This is the distance that CASSANDRA finds.
3c. If we wrap atom 2 only using lattice vector a, the vector pointing from atom 1 to image 2 is (-11/24, 13/24, 0) in scaled coordinates and (-0.187499, 0.469097, 0) in Cartesian coordinates. The length of this vector is 0.505181. This is the distance that CASSANDRA should find.
Rather than wrapping atoms such that the scaled distance between atoms is on the range [-0.5, 0.5] along each lattice vector, we might could wrap atoms until the Cartesian component of the distance is on the range [-hbox, hbox] where hbox is half the distance between faces of an orthogonal cell constructed around atom i.
The current ring fragment identification functionality in mcfgen.py uses a custom algorithm to detect rings. This is not optimal, as there are established undirected graph cycle detection algorithms implemented in open source libraries.
Utilize networkx to detect rings.
None
A few months back I wrote an overhaul to mcfgen.py
and library_setup.py
. See numbers #32 and #33 for the code and reference. I have refrained from merging the changes into the main repository because I am concerned that the updated scripts have not yet had adequate testing. I propose we move forward with integrating these into the repository but first add pytest
-based unit testing on these two scripts. See #83 for further motivation.
This overhaul will also move us to python3
for both scripts. We have now included a python2
deprecation warning with both scripts for a couple of releases but I nonetheless propose that we continue to publish the python2
compatible versions of the scripts at least for another release or two, at which time they would be removed.
Add the description of the convention used to define box vectors to Cassandra documentation regarding the cell_matrix
method of specifying the simulation box dimensions.
Apply_PBC_Anint
in minimum_image_separation.f90
should apply periodic boundary conditions to all three box dimensions, with the box image in each dimension being dependent on the parent coordinate in the corresponding dimension.
Apply_PBC_Anint
, when applying PBC to cubic boxes, bases the y and z box images on the x parent coordinate, not the y or z parent coordinates.
Steps to reproduce the behavior:
Correct the dimension labels in the code for Apply_PBC_Anint
.
This bug was discovered when testing Cassandra's trajectory reader (still in development) on unwrapped trajectories from LAMMPS.
mcfgen.py
writes an MCF file without charges if the charge fields of the .ff
file are left blank. E.g., if no charges are specified a line of the Atom_Info
section of the MCF file appears as follows:
1 O O 15.999 LJ 78.208 3.166
rather than with charges:
1 O O 15.999 -0.82 LJ 78.208 3.166
In some cases it appears that "None" is written in place of the charges, e.g.,
1 O O 15.999 None LJ 78.208 3.166
mcfgen.py
should throw an error if no charges are provided since the charges are a required field of an MCF file.
The Cassandra error with no charges listed is:
Cannot have an atom of vdw type 148.000 in a box of vdw type LJ
This error occurred in subroutine Get_Atom_Info on step 0.
Fatal Error. Stopping program.
Thanks to @ShahResearchGroup for reporting this bug.
Update mcfgen, plot, libgen
Cassandra currently has problems with performing a number of Widom insertions exceeding the largest number storable as an INT32
, so the counters related to Widom insertions should be updated to INT64
.
Read the checkpoint of a NPT simulation and use it as a starting point of a GCMC simulation.
The program crashes without a clear error message in the log file, showing in STDOUT the message below
forrtl: severe (59): list-directed I/O syntax error, unit 100, file /path/to/file.chk
Steps to reproduce the behavior:
master
branch versioncheckpoint
By compiling with debug symbols and traceback, I was able to track the crash to line 225 of read_write_checkpoint.f90
:
READ(restartunit,*) initial_mcstep
I´m attaching an example below. The crash happens when I try to run gcmc_co2.inp
.
checkpoint_error.zip
The fragment generation fails for a cage structure. MCF and PDB file here. I had to modify this MCF file by hand some but I think it is correct.
There are several separate issues raised by this MCF file:
line_string
.line_array
.I think we should allow for longer lines (360 characters ?), and more items per line (60 ?).
Here is what I mean by "cage":
The number of trials for identity switch moves is not initialized to zero. This does not affect the behavior of the code as identity switch is not fully implemented yet. However, it may cause the log file to report that some (non-zero) number of identity switch moves were attempted.
In 'move_rotate.f90', the subroutine called 'Rotate_Molecule_Axis' (starting on line 362) allocates and computes one-dimensional arrays called 'dxrot', 'dyrot', and 'dzrot'. These arrays don't seem to be used for anything internal to the 'Rotate_Molecule_Axis' subroutine and are output to the 'dx', 'dy', and 'dz' arguments specified when the 'Rotate_Molecule_Axis' subroutine is called. The 'dx', 'dy', and 'dz' arrays are local to the 'Rotate' subroutine and don't appear to be utilized anywhere. Am I overlooking where these quantities are utilized in some fashion or are these arrays vestiges of earlier versions that can now be removed?
Harmonically Mapped Averaging (HMA) allows crystal properties (energy, pressure, heat capacity) to be computed more efficiently (more precision in the same time or the same precision in less time). Other benefits include reduced finite-size effects, reduced truncation effects, shorter decorrelation time and faster equilibration.
We'll be adding code to compute HMA properties.
HMA properties could be computed by dumping forces and coordinates to files, but the output would be huge.
HMA paper: https://doi.org/10.1103/PhysRevE.92.043303
HMA in LAMMPS paper: https://doi.org/10.1063/1.5129942
HMA for VASP paper: https://doi.org/10.1016/j.cpc.2020.107554
LAMMPS pull request: lammps/lammps#1503
The fragments generated for fused ring systems violate the specified fixed bond lengths. If the crankshaft move is applied to an atom that is connected to >2 other ring (not including exoring) atoms, it will cause one of the ring bonds to change length. This can be easily demonstrated by generating fragments for naphthalene (files here) and looking at the fragment bond lengths.
Currently unsure of the correct solution. One possibility that comes to mind is not applying the crankshaft move to ring atoms connected to > 2 other ring atoms. One concern with this approach is whether all the internal DOF would be properly sampled in that case.
Two examples are shown below. Orange and red atoms are RING
atoms; the orange atoms would be used for the crankshaft move whereas the red atoms would not.
Also related to #59 as the cage structure in that example suffers from the same problem.
Jindal noticed that the libary_setup.py generates as many fragment libraries as there are fragments in a molecule. This is fine for small molecules; however, for large molecules, it can consume considerable amount of space. Can we, instead, generate fragment libraries only for unique fragments?
PR #106 updates String_To_Int
to be read to INT64
instead of INT32
. The current behavior without this change is to overflow to negatives if the number in the input string is too large to be represented by INT32, and this occurs in some of the examples because they provide random number seeds that are large enough to cause this overflow. Until recently, this hasn't been very problematic because the overflow was consistent, but now it causes failures in the test suite for PR #106 , where the overflow was removed, causing the seeds provided in the input file to be the seeds that are actually effectively used, which is not the case where the overflow is present.
The examples in the test suite need to be corrected to provide smaller seeds that do not cause overflow so that they don't cause test suite failures when the update to String_To_Int
in PR #106 is made.
The ability to run NpT-GEMC with no box fluctuations for a single simulation box; e.g., one box is vapor (with box fluctuations) and another is a solid or pore structure (no box fluctuations).
The solution will likely require modification to input routines to allow for zero box fluctuations in one box and ensuring there are no other cascading changes. The box without box fluctuations should support cubic or non-cubic boxes. The solution should also include a unit test demonstrating the functionality.
I ran into an issue where I was trying to restart a simulation from a checkpoint file named equil-rst.out.chk
. I found the same issue addressed here: https://cassandra.nd.edu/index.php/forum/cassandra-software/300-problem-restarting-simulation.
I think it's okay to not allow dashes in the file names, but I think the default name for restart simulations (equil-rst
) should be changed to something like equil_rst
to fit this naming convention. In addition, improve the documentation to make it clear that dashes are not allowed.
Alternatively, the code could be refactored to allow dashes in the file names, but this may be a much more complicated solution.
Any additional information here, attach relevant text or image files and URLs to external sites, publications , etc
The documentation is not clear on the number of parameters a production simulation required for an NPT simulation . To clarify, it requires two arguments that control the acceptance output frequency to the log file of thermal and volume moves, respectively
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.