nicocvn / emdna
Energy minimization software for DNA/proteins complexes by the Olson lab at Rutgers
Home Page: https://nicocvn.github.io/emDNA/
Develop tests to be run upon installation of emDNA to check its functionality
@rty10 As I move on with the DNASim refactoring, I noticed a few things:
There is a sub-directory called dna_force_field_packager in the DNASim source tree; as far as I remember (and can tell), this is a standalone command-line tool that creates force-field data files to be used in emDNA. The way I see it, this would be better placed next to emDNA rather than in the DNASim source tree (which is supposed to be a library only). Would it be fine to relocate it? A broader question is how you deal with force fields when using emDNA (in terms of workflow): do you only use the one shipped with the tool, or do you use custom ones?
My intention is that, once the project is refactored, the only build products that get installed are the emDNA command-line tools. That would mean that the DNASim library itself does not get installed. I am fine with installing the test executables, as this is a convenient way to check that the build is correct.
There is quite a bit of cruft around single/double precision. I would suggest dropping single precision entirely because: a) I am not sure it was ever fully tested (float suffers from quite limited accuracy); b) realistically, I would expect anyone using this code to have access to a machine that supports double precision natively. So the question is: can we drop support for single precision?
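To make the accuracy point concrete, here is a small C++ sketch (not taken from the code base) that compares the decimal precision of float and double and shows how rounding error accumulates when 0.1 is added ten million times in each type. It assumes IEEE 754 float/double and a build without -ffast-math; sum_of_tenths and tenths_error are illustrative helpers only.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Decimal digits each type can hold without loss (6 and 15 for IEEE 754).
constexpr int kFloatDigits = std::numeric_limits<float>::digits10;
constexpr int kDoubleDigits = std::numeric_limits<double>::digits10;

// Add 0.1 to itself n times in precision T. The exact answer is n / 10,
// so the returned value exposes the accumulated rounding error.
template <typename T>
T sum_of_tenths(long n) {
    assert(n >= 0);
    T sum = 0;
    for (long i = 0; i < n; ++i) {
        sum += static_cast<T>(0.1);
    }
    return sum;
}

// Absolute error of sum_of_tenths<T>(n) with respect to the exact value.
template <typename T>
double tenths_error(long n) {
    return std::fabs(static_cast<double>(sum_of_tenths<T>(n)) - n / 10.0);
}
```

With n = 10,000,000 the float error is many orders of magnitude larger than the double error, because once the running sum is large, each added 0.1 is rounded to a coarse multiple of the float spacing at that magnitude.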
Do we have a set of emDNA tests? By that I mean a set of inputs and outputs computed with the "original" emDNA. This would be very helpful to make sure nothing breaks and that we are able to reproduce previous results.
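If such reference outputs exist, comparing against them with a tolerance is probably more robust than an exact diff, since refactoring can change the last few bits of floating-point results. A minimal sketch follows; matches_reference and its default tolerances are hypothetical (not part of emDNA), and parsing the emDNA output files into vectors of numbers is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical regression check: compare values produced by the refactored
// emDNA (e.g. step parameters or energies parsed from its output) against
// values stored from a run of the original emDNA.
// A value passes if it is within abs_tol OR within rel_tol of the reference,
// so both near-zero and large values are handled sensibly.
bool matches_reference(const std::vector<double>& computed,
                       const std::vector<double>& reference,
                       double rel_tol = 1e-8,
                       double abs_tol = 1e-10) {
    if (computed.size() != reference.size()) {
        return false;
    }
    for (std::size_t i = 0; i < computed.size(); ++i) {
        const double diff = std::fabs(computed[i] - reference[i]);
        const double scale = std::fabs(reference[i]);
        // Written as a negated "<=" so that a NaN difference fails the check.
        if (!(diff <= abs_tol || diff <= rel_tol * scale)) {
            return false;
        }
    }
    return true;
}
```

The tolerances would need to be chosen per quantity once we know how much the refactored code is allowed to drift from the original results.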
This is not yet pushed to any branch, but here are the required fixes:
googletest needs the shared CRT option (relevant for MSVC builds):
set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
FileHandler and OutputFileHandler contain non-portable code (getcwd, mkdir, ...) that does not seem to be used, so it should be removed
The DS_ASSERT macro uses __PRETTY_FUNCTION__, which is a GCC/Clang extension and is not portable (MSVC does not provide it)
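One possible portable approach is to pick the function-name macro per compiler and fall back to the standard __func__. The sketch below is only an assumption about how DS_ASSERT could be rewritten: DS_FUNCTION_NAME and DS_ASSERT_PORTABLE are hypothetical names, and the real macro's arguments may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Pick the most descriptive function-name identifier each compiler offers.
// __PRETTY_FUNCTION__ is a GCC/Clang extension and __FUNCSIG__ is MSVC-only;
// __func__ is standard C++11 and serves as the portable fallback.
#if defined(__GNUC__) || defined(__clang__)
#define DS_FUNCTION_NAME __PRETTY_FUNCTION__
#elif defined(_MSC_VER)
#define DS_FUNCTION_NAME __FUNCSIG__
#else
#define DS_FUNCTION_NAME __func__
#endif

// Hypothetical portable replacement for DS_ASSERT: on failure, report the
// condition, message, enclosing function and location, then abort.
#define DS_ASSERT_PORTABLE(cond, msg)                                   \
    do {                                                                \
        if (!(cond)) {                                                  \
            std::fprintf(stderr,                                        \
                         "Assertion failed: %s (%s)\n  in %s\n  at %s:%d\n", \
                         #cond, msg, DS_FUNCTION_NAME, __FILE__, __LINE__); \
            std::abort();                                               \
        }                                                               \
    } while (0)
```

The do/while(0) wrapper keeps the macro usable as a single statement, for example inside an unbraced if/else.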
This issue should be expanded to describe how Windows is supported, including installation notes
Circular DNA currently works by taking the reference frames of the structure, copying the first frame to the end, and then using "--hold-last-bp" during optimization.
Circular DNA options should be put in place so that the code generates this addition automatically, and a check should be added to the emDNA_DNAElectrostaticsParams / BpCollectionElectrostaticEnergy files to prevent two charges from occupying the same space.
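A rough sketch of what the automatic option could do, assuming a simplified frame type: BpFrame, close_circular and charged_frame_count are hypothetical and do not exist in emDNA/DNASim. It appends a copy of the first frame (the current manual step) and, for electrostatics, leaves the duplicated frame out of the charge sum so that no two charges sit at the same position. Holding the last frame fixed, as --hold-last-bp does, would still be up to the optimizer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal base-pair frame: an origin plus three orthonormal axes.
// The real frame type lives in DNASim and will differ.
struct BpFrame {
    std::array<double, 3> origin;
    std::array<std::array<double, 3>, 3> axes;
};

// Close a structure into a circle by appending a copy of the first frame,
// so that the last step links back to base pair 0.
std::vector<BpFrame> close_circular(std::vector<BpFrame> frames) {
    if (frames.empty()) {
        throw std::invalid_argument("cannot close an empty structure");
    }
    frames.push_back(frames.front());
    return frames;
}

// Number of frames whose charges should enter the electrostatic sum.
// In a closed structure the last frame is a copy of the first one, so
// counting it would place two charges at the same position; skip it.
std::size_t charged_frame_count(std::size_t n_frames, bool circular) {
    return (circular && n_frames > 0) ? n_frames - 1 : n_frames;
}
```

Excluding the duplicate outright (rather than special-casing individual pairs) also avoids double counting its interactions with every other frame.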
We can keep most of the documentation in the repo but it is usually a good idea to have a simple page other than the plain repo view.
See: https://pages.github.com/ > project site
The following text should be copied into a LICENSE file at the root of the repo:
@rty10 feel free to update/edit/shorten the list of names in the first line. I am not sure who added what to the current code base, so maybe other members should be added here (Stefjord?).
Copyright © 2014-2021 Nicolas Clauvelin, Wilma Olson, Robert Young
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
* Neither the name of “Rutgers, the State University of New Jersey” nor the names
of its contributors may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Originally DNASim was designed to be a cross-project library and it includes various dependencies that are not relevant to emDNA.
We should make sure that we only keep what is needed so that the build process is easier and code maintenance is simplified.
The following dependencies should be removed:
What we need to keep:
This is not very high priority, but @rty10 has mentioned that his code has failed to run on the cluster a few times, and only on specific nodes. After working with the computing center director and some googling, we found that it might be due to compiling with "-march=native", possibly in connection with AVX-512 extensions. The front-end nodes on the Rutgers cluster are newer machines, and when code compiled there is assigned to run on older machines, it crashes. This is just a theory; we don't know exactly what is producing the crash. Could one of the emDNA libraries (Alglib?) be causing it? It hasn't happened with the chromatin code before, so it shouldn't be the DNASim library.
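One way to test this theory is to run a tiny diagnostic on both a front-end node and a failing compute node. The sketch below is a suggestion, not existing emDNA code: it uses the GCC/Clang x86 builtin __builtin_cpu_supports to query the CPU at run time, and the __AVX512F__ macro (defined by the compiler when AVX-512F code generation is enabled, e.g. by -march=native on a host that has it) to tell whether the binary itself was compiled for AVX-512F. A binary containing AVX-512 instructions that runs on a CPU without them typically dies with an "Illegal instruction" (SIGILL) error.

```cpp
#include <cassert>
#include <cstdio>

// Run-time check: does the CPU executing this code support AVX-512F?
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets; other
// compilers or architectures fall back to false ("unknown").
bool cpu_has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;
#endif
}

// Compile-time check: was this binary built to use AVX-512F instructions?
constexpr bool built_with_avx512f() {
#ifdef __AVX512F__
    return true;
#else
    return false;
#endif
}

// One-line diagnostic to run on the front-end node and on a compute node.
void report_avx512() {
    std::printf("compiled for AVX-512F: %s, CPU supports AVX-512F: %s\n",
                built_with_avx512f() ? "yes" : "no",
                cpu_has_avx512f() ? "yes" : "no");
}
```

If a compute node reports "compiled: yes, CPU: no" (or crashes before printing), the mismatch is confirmed; building for a baseline architecture that matches the oldest nodes, instead of native, should then avoid it.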
The issue with Alglib is that it is not available in a form that can easily be imported into the repository: the user needs to download and install it. Due to license restrictions, I am not sure we can just import it and build it as part of the project.
There are a bunch of other well-maintained libraries providing L-BFGS minimization; here are a few:
So the plan is:
That will require defining a list of tests.
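Whichever library replaces Alglib, the core of L-BFGS is small, which also makes it easy to write tests against. For reference, here is a minimal self-contained sketch of the standard two-loop recursion (as described by Nocedal and Wright) combined with a simple Armijo backtracking line search. It is not Alglib's implementation nor that of any particular library; production libraries use a Wolfe line search and more safeguards. lbfgs_minimize, Objective and their parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// Objective callback: returns f(x) and writes the gradient of f at x into g.
using Objective = std::function<double(const Vec& x, Vec& g)>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Hypothetical minimal L-BFGS driver. Minimizes f starting from x (updated
// in place) and returns the number of iterations used. m is the number of
// stored correction pairs, gtol the gradient-norm stopping threshold.
int lbfgs_minimize(const Objective& f, Vec& x, std::size_t m = 5,
                   int max_iter = 200, double gtol = 1e-8) {
    const std::size_t n = x.size();
    Vec g(n), g_new(n), x_new(n), d(n);
    double fx = f(x, g);
    std::deque<Vec> s_hist, y_hist;  // oldest pair at the front
    std::deque<double> rho_hist;     // rho_k = 1 / (y_k . s_k)
    for (int it = 0; it < max_iter; ++it) {
        if (std::sqrt(dot(g, g)) < gtol) return it;

        // Two-loop recursion: d = -H * g, with H the implicit
        // inverse-Hessian approximation built from the stored pairs.
        Vec q = g;
        Vec alpha(s_hist.size());
        for (std::size_t k = s_hist.size(); k-- > 0;) {  // newest to oldest
            alpha[k] = rho_hist[k] * dot(s_hist[k], q);
            for (std::size_t i = 0; i < n; ++i) q[i] -= alpha[k] * y_hist[k][i];
        }
        double gamma = 1.0;  // initial scaling H0 = gamma * I
        if (!s_hist.empty()) {
            gamma = dot(s_hist.back(), y_hist.back()) / dot(y_hist.back(), y_hist.back());
        }
        for (std::size_t i = 0; i < n; ++i) q[i] *= gamma;
        for (std::size_t k = 0; k < s_hist.size(); ++k) {  // oldest to newest
            const double beta = rho_hist[k] * dot(y_hist[k], q);
            for (std::size_t i = 0; i < n; ++i) q[i] += (alpha[k] - beta) * s_hist[k][i];
        }
        for (std::size_t i = 0; i < n; ++i) d[i] = -q[i];

        double slope = dot(g, d);
        if (slope >= 0.0) {  // not a descent direction: restart with steepest descent
            s_hist.clear(); y_hist.clear(); rho_hist.clear();
            for (std::size_t i = 0; i < n; ++i) d[i] = -g[i];
            slope = -dot(g, g);
        }

        // Backtracking line search on the Armijo (sufficient decrease)
        // condition. If no trial step passes, the last one is accepted;
        // a real library would report a line-search failure instead.
        double step = 1.0, f_new = fx;
        for (int ls = 0; ls < 50; ++ls) {
            for (std::size_t i = 0; i < n; ++i) x_new[i] = x[i] + step * d[i];
            f_new = f(x_new, g_new);
            if (f_new <= fx + 1e-4 * step * slope) break;
            step *= 0.5;
        }

        // Keep the new pair only if it has positive curvature, which keeps
        // the inverse-Hessian approximation positive definite.
        Vec s(n), y(n);
        for (std::size_t i = 0; i < n; ++i) {
            s[i] = x_new[i] - x[i];
            y[i] = g_new[i] - g[i];
        }
        const double sy = dot(s, y);
        if (sy > 1e-12) {
            s_hist.push_back(s); y_hist.push_back(y); rho_hist.push_back(1.0 / sy);
            if (s_hist.size() > m) {
                s_hist.pop_front(); y_hist.pop_front(); rho_hist.pop_front();
            }
        }
        x = x_new; g = g_new; fx = f_new;
    }
    return max_iter;
}
```

A simple first test, for this sketch or for whichever library we adopt, is a convex quadratic with a known minimizer, e.g. f(x) = sum_i (i + 1)(x_i - i)^2, whose minimum is at x = (0, 1, 2, ...).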
Generate an installation guide and update the HTML documentation.
(Not sure how to handle this one at the moment ... will refine later)
I have a lot of resources on the formulation of the implementation (definitions of the matrices, Mathematica notebooks for validation, ...) including scanned notes for the Hessian expressions.
This should be added somewhere at some point
The CMake setup in the project is a bit outdated: a lot of settings are hard-coded and some antipatterns are sprinkled here and there.
The revised CMake setup should be able to support the main platforms (Win, Linux, macOS) and multiple compilers (GCC, Clang, MSVC).
Tasks: