nicocvn / emdna

Energy minimization software for DNA/protein complexes by the Olson lab at Rutgers

Home Page: https://nicocvn.github.io/emDNA/

CMake 1.59% C 0.74% C++ 85.36% Shell 0.80% Python 1.69% Mathematica 9.81%
biophysics chemistry physics

emdna's People

Contributors

nicocvn, rty10, stodolli

Forkers

rty10

emdna's Issues

Project refactoring questions

@rty10 As I move on with the DNASim refactoring, I noticed a few things:

  1. There is a sub-directory called dna_force_field_packager in the DNASim source tree; as far as I remember (and can tell), this is a standalone command line tool that creates force-field data files for use in emDNA. The way I see it, this belongs next to emDNA rather than in the DNASim source tree (which is supposed to be a library only). Would it be fine to relocate it? A broader question: how do you deal with force fields when using emDNA (in terms of workflow)? Do you only use the ones shipped with the tool, or do you use custom ones?

  2. My intention is that, once the project is refactored, the only build products that get installed are the emDNA command line tools. That would mean the DNASim library itself does not get installed. I am fine with installing the test executables, as this is a convenient way to check that the build is correct.

  3. There is quite a bit of cruft around single/double precision. I would suggest dropping single precision entirely because: a) I am not sure it was ever fully tested (floats suffer from quite limited accuracy); b) realistically, I would expect anyone using this code to have access to a machine that supports double precision natively. So the question is: can we drop support for single precision?

  4. Do we have a set of emDNA tests? By that I mean a set of inputs and outputs computed with the "original" emDNA. This would be very helpful to make sure nothing breaks and that we are able to reproduce previous results.
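
On point 3, the accuracy concern with single precision can be illustrated with a small, self-contained sketch. It is not taken from the DNASim code base; the function names are hypothetical and the workload (repeatedly adding a small step) is only a stand-in for the kind of accumulation an energy or gradient sum performs.

```cpp
#include <cmath>
#include <cstddef>

// Sum `n` copies of `step` using the floating-point type `Real`.
template <typename Real>
Real accumulate_steps(std::size_t n, Real step) {
    Real total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += step;
    }
    return total;
}

// Absolute error of the accumulated sum against the product n * step,
// with the reference value computed in double precision.
template <typename Real>
double accumulation_error(std::size_t n, double step) {
    const double reference = static_cast<double>(n) * step;
    const Real sum = accumulate_steps<Real>(n, static_cast<Real>(step));
    return std::fabs(static_cast<double>(sum) - reference);
}
```

Calling `accumulation_error<float>(1000000, 0.1)` and `accumulation_error<double>(1000000, 0.1)` side by side shows how much faster rounding error builds up in single precision for the same loop.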

Windows fixes

This is not yet pushed to any branch, but here are the required fixes:

  • googletest needs the option for shared CRT:

    set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
    
  • FileHandler and OutputFileHandler contain non-portable code (getcwd, mkdir, ...) that does not seem to be used, so it should be removed

  • the DS_ASSERT macro uses __PRETTY_FUNCTION__, which is not portable

This issue should be expanded to describe how Windows is supported, including installation notes.
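
One possible way to address the last point is to select the function-name facility per compiler. The sketch below is an assumption about how this could look, not the actual DS_ASSERT definition: `DS_FUNCTION_NAME`, `DS_ASSERT_SKETCH` and `where_am_i` are hypothetical names. `__FUNCSIG__` is the MSVC counterpart of the GCC/Clang `__PRETTY_FUNCTION__`, and `__func__` is the standard C++11 fallback.

```cpp
#include <cstdio>
#include <cstdlib>

// Pick the most descriptive function-name facility the compiler offers.
#if defined(_MSC_VER)
#  define DS_FUNCTION_NAME __FUNCSIG__          // MSVC
#elif defined(__GNUC__) || defined(__clang__)
#  define DS_FUNCTION_NAME __PRETTY_FUNCTION__  // GCC, Clang
#else
#  define DS_FUNCTION_NAME __func__             // standard C++11 fallback
#endif

// Hypothetical assertion macro: report the failed condition and abort.
#define DS_ASSERT_SKETCH(cond)                                         \
    do {                                                               \
        if (!(cond)) {                                                 \
            std::fprintf(stderr,                                       \
                         "assertion failed: %s\n  in %s (%s:%d)\n",    \
                         #cond, DS_FUNCTION_NAME, __FILE__, __LINE__); \
            std::abort();                                              \
        }                                                              \
    } while (false)

// Small helper showing what the selected facility expands to.
const char* where_am_i() {
    return DS_FUNCTION_NAME;
}
```

The exact string differs between compilers (a full signature with GCC, Clang and MSVC, the bare name with the fallback), so nothing should parse it; it is only meant for diagnostics.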

Incorporate Circular DNA

Circular DNA currently works by taking the reference frames of the structure, copying the first frame to the end, and then running the optimization with "--hold-last-bp".
Circular DNA options should be added so that the code generates this extra frame automatically, and a stipulation should be added to the emDNA_DNAElectrostaticsParams / BpCollectionElectrostaticEnergy files to prevent two charges from occupying the same position.
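
A minimal sketch of the two pieces described above, under stated assumptions: `Frame`, `close_circle` and `charge_sites` are hypothetical names and do not correspond to the actual emDNA or DNASim types. The first function appends the closing frame; the second skips that duplicated frame when collecting charge positions, so that no two charges coincide.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a base-pair reference frame.
struct Frame {
    std::array<double, 3> origin;  // frame origin
    std::array<double, 9> axes;    // row-major rotation matrix
};

// Append a copy of the first frame so that the last step closes the circle.
std::vector<Frame> close_circle(std::vector<Frame> frames) {
    if (!frames.empty()) {
        frames.push_back(frames.front());
    }
    return frames;
}

// Collect the positions that should carry a charge. For a closed structure
// the last frame duplicates the first one and is skipped.
std::vector<std::array<double, 3>> charge_sites(const std::vector<Frame>& frames,
                                                bool closed) {
    std::size_t count = frames.size();
    if (closed && count > 1) {
        --count;
    }
    std::vector<std::array<double, 3>> sites;
    sites.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        sites.push_back(frames[i].origin);
    }
    return sites;
}
```

The closed structure would still be minimized with the last base pair held fixed, as with the current manual workflow.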

License text

The following text should be copy/pasted into a LICENSE file at the root of the repo:

@rty10 feel free to update/edit/shorten the list of names in the first line ... Not sure who added what to the current code base so maybe other members should be added here (Stefjord?).

Copyright © 2014-2021 Nicolas Clauvelin, Wilma Olson, Robert Young All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this
  list of conditions and the following disclaimer in the documentation and/or
  other materials provided with the distribution.
* Neither the name of “Rutgers, the State University of New Jersey” nor the names
  of its contributors may be used to endorse or promote products derived from this software
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

DNASim dependencies cleanup

Goal

Originally DNASim was designed to be a cross-project library and it includes various dependencies that are not relevant to emDNA.

We should make sure that we only keep what is needed so that the build process is easier and code maintenance is simplified.

Dependencies

The following dependencies should be removed:

  • cereal: this is a serialization library and it is not used for emDNA
  • ode-0.13: this is a physics engine which was used in other projects for collision detection; for emDNA it is not used and can be removed
  • nanoflann: this one is not used in emDNA at the moment; however, should electrostatics be fully implemented it might be useful to speed up computation if a cut-off is used.

What we need to keep:

  • cereal: this provides the serialization feature
  • Eigen: this provides linear algebra for all computations in emDNA
  • tclap: this is used for the command line interface ... back when emDNA was originally implemented there were not many choices, but the situation has changed, so we could revisit this one

Updating to Eigen v3.3.X breaks sparse matrix code

The reason is unclear at the moment, so as part of #3 and #1 Eigen has only been updated to v3.2.9. It might be worth looking at the changelog at some point to better understand what changed on the sparse matrix side.

Fix compile issues on heterogeneous clusters

This is not very high priority, but @rty10 has mentioned that his code has failed to run on the cluster a few times, and only on specific nodes. After working with the computing center director and some googling, we found that it might be due to compiling with "-march=native" and may have to do with AVX-512 extensions. The front-end nodes on the Rutgers cluster are newer machines, and when the compiled code is assigned to run on older machines it crashes. This is just a theory; we don't know exactly what is producing the crash, but could one of the emDNA libraries (alglib??) be causing this? It hasn't really happened with the chromatin code before, so it shouldn't be the DNASim library.
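
To help test this theory, a small diagnostic could report which instruction sets the binary was compiled for and whether the node it runs on supports AVX-512. The sketch below is an assumption about how such a check could look, with hypothetical function names; it relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is only available on x86 targets, and it covers AVX-512 Foundation only.

```cpp
#include <string>

// Instruction-set extensions the compiler was allowed to use for this
// translation unit (for example through -march=native).
std::string compiled_isa() {
    std::string isa;
#if defined(__AVX512F__)
    isa += "avx512f ";
#endif
#if defined(__AVX2__)
    isa += "avx2 ";
#endif
#if defined(__AVX__)
    isa += "avx ";
#endif
    return isa.empty() ? std::string("baseline") : isa;
}

// True if the CPU executing this code reports AVX-512 Foundation support.
// Returns false when the check is not available (non-x86 or other compilers).
bool host_supports_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;
#endif
}

// False only when the binary was built with AVX-512 enabled and the current
// host does not support it.
bool avx512_build_matches_host() {
#if defined(__AVX512F__)
    return host_supports_avx512f();
#else
    return true;
#endif
}
```

Logging `compiled_isa()` and `host_supports_avx512f()` at start-up on both a front-end node and a failing node would show whether the two disagree. Building with an explicit, conservative `-march` value instead of `native` is the usual way to rule this out.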

Replace Alglib dependency

The issue with Alglib is that it is not available in a way that can easily be imported in the repository: the user needs to download it and install it. Due to license restrictions I am not sure we can just import it and build it as part of the project.

There are a bunch of other well-maintained libraries providing L-BFGS minimization; here are a few:

So the plan is:

  • replace Alglib
  • validate the new implementation against well-defined tests

That will require defining a list of tests.
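
As a starting point for such a list, the sketch below shows one shape a validation test could take: any candidate minimizer is wrapped behind a common signature and run on a problem with a known minimum. All names are hypothetical, and the fixed-step gradient descent is only a stand-in so the example runs on its own; it is neither Alglib nor an L-BFGS implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// An objective returns f(x) and writes the gradient into `grad`.
using Objective = std::function<double(const Vec&, Vec&)>;
// A minimizer takes an objective and a starting point, returns the minimum.
using Minimizer = std::function<Vec(const Objective&, Vec)>;

// Convex quadratic f(x) = sum_i w_i (x_i - t_i)^2, minimized at x = t.
Objective make_quadratic(Vec weights, Vec target) {
    return [weights, target](const Vec& x, Vec& grad) {
        double value = 0.0;
        grad.assign(x.size(), 0.0);
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double d = x[i] - target[i];
            value += weights[i] * d * d;
            grad[i] = 2.0 * weights[i] * d;
        }
        return value;
    };
}

// Stand-in minimizer: fixed-step gradient descent.
Vec gradient_descent(const Objective& f, Vec x) {
    Vec grad(x.size());
    for (int iter = 0; iter < 10000; ++iter) {
        f(x, grad);
        double norm2 = 0.0;
        for (double g : grad) norm2 += g * g;
        if (norm2 < 1e-20) break;  // gradient is numerically zero
        for (std::size_t i = 0; i < x.size(); ++i) x[i] -= 0.1 * grad[i];
    }
    return x;
}

// Validation check: does the minimizer recover the known minimum?
bool reaches_known_minimum(const Minimizer& minimize, double tol) {
    const Vec target{1.0, -2.0, 0.5};
    const Objective f = make_quadratic({1.0, 2.0, 4.0}, target);
    const Vec x = minimize(f, Vec(target.size(), 0.0));
    for (std::size_t i = 0; i < target.size(); ++i) {
        if (std::fabs(x[i] - target[i]) > tol) return false;
    }
    return true;
}
```

The same check could be run once with the current Alglib-based code and once with its replacement. Harder cases, such as reference emDNA inputs with stored outputs, would follow the same pattern.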

Notes, notebooks and other docs

(Not sure how to handle this one at the moment ... will refine later)

I have a lot of resources on the formulation of the implementation (definitions of the matrices, Mathematica notebooks for validation, ...) including scanned notes for the Hessian expressions.

This should be added somewhere at some point.

REAL_WIDTH

In DNASim_Defines.h the value of the REAL_WIDTH macro was changed in December 2018 from 10 to 6 ... @rty10 or @stodolli any ideas why?
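
For context on why the value matters, here is a small sketch that assumes REAL_WIDTH controls the number of significant digits used when real numbers are written as text. That is an assumption; the macro's actual role is not described here, and the function names below are hypothetical. Under that assumption, going from 10 to 6 digits increases the error introduced each time a value is written out and read back.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Format a value with the given number of significant digits.
std::string format_real(double value, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}

// Error introduced by writing a value as text and parsing it back.
double round_trip_error(double value, int digits) {
    return std::fabs(std::stod(format_real(value, digits)) - value);
}
```

Comparing `round_trip_error(x, 6)` with `round_trip_error(x, 10)` for typical step parameter values would show whether the change can affect the reproducibility of previous results.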

Revise and simplify CMake build system

The CMake setup in the project is a bit outdated: a lot of settings are hard-coded and some antipatterns are sprinkled here and there.

The revised CMake setup should be able to support the main platforms (Win, Linux, macOS) and multiple compilers (GCC, Clang, MSVC).

Tasks:

  • address #1 to simplify DNASim setup
  • revise DNASim CMake setup
  • revise emDNA CMake setup
