Coder Social home page Coder Social logo

patrickfuchs / buildh Goto Github PK

View Code? Open in Web Editor NEW
14.0 5.0 6.0 28.12 MB

:computer: Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters.

Home Page: https://buildh.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 90.70% Makefile 0.58% TeX 8.72%
python lipids order-parameters united-atom molecular-dynamics-simulation

buildh's Introduction

buildH

DOI DOI SWH License: BSD Binder Code CI Status Doc CI Status Documentation Status Powered by MDAnalysis buildH version on PyPI Anaconda-Server Badge

buildH_logo

Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters

Features

buildH can :

  • Reconstruct hydrogens from a united-atom structure file (pdb, gro) or trajectory (e.g. xtc).
  • Calculate the order parameters based on the reconstructed hydrogens.
  • Write new structure trajectory files with the reconstructed hydrogens.

buildH works in two modes :

  1. A slow mode when an output trajectory (in xtc format) is requested by the user. In this case, the whole trajectory including newly built hydrogens are written to this trajectory.
  2. A fast mode without any output trajectory.

In both modes, the order parameters are calculated. All calculations are accelerated with Numba. As a CPU cost indication, running buildH on a trajectory of 2500 frames with 128 POPC (without trajectory output) takes ~ 7' on a single core Xeon @ 3.60GHz.

Requirements

Python >= 3.6 is mandatory for running buildH.

buildH is written in Python 3 and needs the modules numpy, pandas, MDAnalysis and Numba.

Installation

Pip

A simple installation with pip will do the trick:

python3 -m pip install buildh

All dependencies (modules) will be installed automatically by pip.

Conda

buildH is also available through the Bioconda channel:

conda install buildh -c bioconda -c conda-forge

More details on installation here.

For installing a development version, see here.

Running buildH

Once installed, a simple invocation of the buildH command will run the program ($ represents the Unix prompt):

$ buildH
usage: buildH [-h] -c COORD [-t TRAJ] -l LIPID [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d DEFOP
              [-opx OPDBXTC] [-o OUT] [-b BEGIN] [-e END] [-pi PICKLE] [-igch3]
buildH: error: the following arguments are required: -c/--coord, -l/--lipid, -d/--defop

The minimal command for running buildH can resemble this:

$ buildH -c start_128popc.pdb -t popc0-25ns_dt1000.xtc -l Berger_POPC -d Berger_POPC.def

The different arguments mean the following: -c start_128popc.pdb is a pdb file with 128 POPC molecules, -t popc0-25ns_dt1000.xtc is a trajectory with 25 frames, -l Berger_POPC indicates the united-atom force field and the type of lipid to be analyzed, -d Berger_POPC.def indicates what C-H are considered for H building and order parameter calculation (the structure and trajectory files can be found here). The def file can be found here. The final order parameters averaged over the trajectory will be written to the default output name OP_buildH.out

Other detailed examples and Jupyter Notebooks can be found in the documentation at Read the Docs.

Important: sometimes, when performing MD, some molecules are split over periodic boundary conditions (PBC). buildH takes as input whole structures (pdb, gro, xtc, etc.). If broken molecules are supplied, it will most likely generate nonsense results. So it is up to the user to take care of making molecules whole before running buildH (e.g. by using a tool like trjconv in GROMACS with flag -pbc mol).

Invoking buildH with the -h flag will display some help to the screen and tell which lipids are supported.

$ buildH -h
usage: buildH [-h] [-v] -c COORD [-t TRAJ] -l LIPID [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d
              DEFOP [-opx OPDBXTC] [-o OUT] [-b BEGIN] [-e END] [-igch3]
[...]
The list of supported lipids (-l option) are: Berger_CHOL, Berger_DOPC, Berger_DPPC, Berger_POP, Berger_POPC, Berger_PLA, Berger_POPE, Berger_POPS, CHARMM36UA_DPPC, CHARMM36UA_DPUC, CHARMM36_POPC, GROMOS53A6L_DPPC, GROMOSCKP_POPC, GROMOSCKP_POPS. More documentation can be found at https://buildh.readthedocs.io.

Documentation

The full documentation is available at Read the Docs.

Contributors

  • Hubert Santuz
  • Amรฉlie Bรขcle
  • Pierre Poulain
  • Patrick Fuchs

License

buildH is licensed under the BSD License.

Contributing

If you want to report a bug, request a feature, or propose an improvement use the GitHub issue system.

Please, see also the CONTRIBUTING file.

Note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See the CODE_OF_CONDUCT file.

Citing buildH

If you use buildH for your research, please cite :

Santuz et al., (2021). buildH: Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters. Journal of Open Source Software, 6(65), 3521, https://doi.org/10.21105/joss.03521

buildh's People

Contributors

abacle avatar hublot avatar patrickfuchs avatar pierrepo avatar readthedocs-assistant avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

buildh's Issues

Why does it crash with file 200POPC.pdb ?

When ran on this file :

$ python3 ./add_hydrogens.py 200POPC.pdb
[...]
Traceback (most recent call last):
  File "./add_hydrogens.py", line 624, in <module>
    new_df_atoms = build_all_H(universe_woH, return_coors=True)
  File "./add_hydrogens.py", line 553, in build_all_H
    Hs_coor = buildHs_on_1C(atom)
  File "./add_hydrogens.py", line 454, in buildHs_on_1C
    helper1_coor = sel("name {0}".format(helper1_name))[0].position
  File "/home/fuchs/software/miniconda3/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 1804, in __getitem__
    return super(AtomGroup, self).__getitem__(item)
  File "/home/fuchs/software/miniconda3/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 475, in __getitem__
    return self.level.singular(self.ix[item], self.universe)
IndexError: index 0 is out of bounds for axis 0 with size 0

For the record, this file has 200 POPC and some TO (trioleins).

Using MDAnalysis data structure ?

Hi,

Following IRL discussion with @patrickfuchs, I'm trying to use MDAnalysis data structure to build the hydrogens.
This issue will resume my progress on it and see if it's doable.

Right know, I'm able to build easily the hydrogen with this code :

if __name__ == "__main__":
    # read coordinates in a pandas dataframe
    # list_df_residues = pdb2list_pandasdf_residues("POPC_only.pdb")
    # new_df_atoms = reconstruct_hydrogens3(list_df_residues)
    # print(pandasdf2pdb(new_df_atoms))

    u = MDAnalysis.Universe("POPC_only.pdb")

    for atom in u.atoms:
        if atom.name in dic_lipids.POPC:
            typeofH2build, helper1_name, helper2_name = dic_lipids.POPC[atom.name]

            #atom Object has a residue object. This residue object has a list of its atom we can select.
            #[0] is because select_atoms return a AtomGroup who contains only 1 atom
            helper1_coor = atom.residue.atoms.select_atoms("name {0}".format(helper1_name))[0].position
            helper2_coor = atom.residue.atoms.select_atoms("name {0}".format(helper2_name))[0].position
            # Build H(s).
            H1_coor, H2_coor = get_CH2_H(atom.position, helper1_coor, helper2_coor)

            H1_name, H2_name = get_name_H(atom.name) 

Pro :

  • it's very fast : ~4s on my computer
  • we skip the use of Dataframes

Cons:

  • I'm not able, yet, to recreate a new universe object with the newly hydrogens inserted at the right positions. I'm working on it.

Transform dic_lipids.py into json files

After thorough discussion, it will be better to transform the dic_lipids.py into json files.
This dic_lipids.py is used to reconstructed hydrogens from a united atom bilayer.

This will allow the users to add their own .json files for their own trajectories.
A regular database of json will be created in the buildH source code into a directory called lipids to provide a set of supported lipids.
We will add an option to the program for user to provided their own json.

Steps:

  • Convert the current dic_lipids.py into a series of .json files into buildh/lipids directory
  • Transform the code to read those json files
  • Provide a list of supported lipids when using the -h of the program (by parsing the directory)
  • Add an argument for users to provide their own .json files.
  • Integrate the lipids from NMRLipids@d0a2bc1 (issue #44)

Urgent things TODO before first release

  • Manage options w/wo output traj files.
  • Name of output file (containing OP) ;.out extension of this file ?
  • Change function quick_dic().
  • Make dic_lipid.POPC choosable by the user (use builtin function gettatr()).
  • Use ordered dic for dic_OP (so that the OP output is always in the same order).

Force python 3 detection

I realized that when installing MDAnalysis with python2.7, buildH doesn't issue any error but when it runs, it generates wrong results. So we should add a detection of python 3.7 (or higher).

Create a def_files directory

Provide a set of order parameters definitions files into a directory called def_files.
Users will pick them to calculate the order parameters.

Refactorizing

The main function buildH_calcOP.py gets big. It would be nice to refactorize it by adding functions.
Also, the script buildH_calcOP.py could be factorized by adding modules.

Specify the type of trajectory supported

Currently, we can support any trajectory and topology files that MDAnalysis support.
But, our argument option for trajectory is -x, --xtc wich implies that only xtc trajectory files are supported.

We should change it to a more general option name and specify it in the documentation

Create doc

I'm about to create a doc dir. I'll move there all the stuff related to doc (vectors.png, CHARMM36_POPC_validation), simplify the main README.md by move some of its content.
Once done, we can complete later the doc in a some appropriate format (sphynx?) and add notebooks to guide the user.

Update vectors.png

In the current version the yellow vectors don't fit the C-H bonds (drawn as sticks), although they should (and they do). That would be useful (in the long term) to show this also for the other types of H that we rebuild.

Be clearer when there is an error with the user supplied json file

Making a try with a toy (user supplied) json file:

$ buildH -c 3Conly.pdb -d 3Conly.def -lt Berger_3Conly.json  -l PO3C
usage: buildH [-h] -c COORD [-t TRAJ] -l LIPID [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d DEFOP [-opx OPDBXTC] [-o OUT]
              [-b BEGIN] [-e END] [-pi PICKLE]
buildH: error: Berger_3Conly.json is in a bad format.

This error message is not very precise. The user can't find where the error comes from in the json file.

When doing this by hand:

>>> f = open("Berger_3Conly.json")
>>> json.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 4 column 1 (char 61)

Here we can see that it comes from line 4 column 1. Easier to debug.
In fact the error was coming from a comma at the end of the last line :

{
    "resname": ["PO3C"],
    "C26": ["CH2", "C25", "C27"],
}

It makes me think to warn the user about this: don't put a comma on the last line!

Correspondance lipid name / residue name

By creating a notebook example, I realized we don't check the correspondance between the lipid name given with -l option and the residue name in the PDB. For example:

Constructing the system...
System has 28526 atoms
Traceback (most recent call last):
  File "/home/fuchs/software/miniconda3/envs/buildh/bin/buildH", line 33, in <module>
    sys.exit(load_entry_point('buildh', 'console_scripts', 'buildH')())
  File "/home/fuchs/papers/buildH_Amelie/buildH/buildh/cli.py", line 232, in main
    core.fast_build_all_Hs_calc_OP(universe_woH, begin, end, dic_OP, dic_lipid, dic_Cname2Hnames)
  File "/home/fuchs/papers/buildH_Amelie/buildH/buildh/core.py", line 466, in fast_build_all_Hs_calc_OP
    dic_lipids_with_indexes = make_dic_lipids_with_indexes(universe_woH,
  File "/home/fuchs/papers/buildH_Amelie/buildH/buildh/core.py", line 365, in make_dic_lipids_with_indexes
    first_lipid_residue = universe_woH.select_atoms(selection).residues[0]
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.8/site-packages/MDAnalysis/core/groups.py", line 540, in __getitem__
    return self.level.singular(self.ix[item], self.universe)
IndexError: index 0 is out of bounds for axis 0 with size 0

Rather unclear, isn't it ๐Ÿ˜† ???

The error above comes from the use of Berger_POPC instead of Berger_PLA with the -l option.

Additionnally, I'm thinking maybe we should also make a check between the argument passed to -l option and that passed to -d option. Imagine, for example, -l CHARMM-UA -d order_parameter_definitions_MODEL_Berger_POPC.def (could be solved in another issue)

Update doc about Numba

At least it should be stated in the main README that buildH is accelerated with Numba. Also a couple of pages in the docs directory will have to be updated, as well as in notebooks.

Update licence text

For some reason, the licence text is not what it should be, as listed here.

We should update the text licence appropriately.

Create sub directory in test_data

As discussed with @patrickfuchs, it will be better to have sub directories in test_data.
Currently, we only test a POPC with the forcefield Berger, but in the future, we could test several lipids and/or forcefields
Creating a sub-directory for each couple lipid-forcefield will help.
For now, we can create "POPC-Berger" and move the related files.

Document the use of the json files

Following #66 and the re-work of the documentation, we should explain in the buildh.md file the role and architecture of the json files for the lipid topologies.

Use logging module

For this kind of project, I'll advise to use the logging module instead of the print() function.

You will find here a small example : https://github.com/pierrepo/seq-to-first-iso/blob/master/seq_to_first_iso/seq_to_first_iso.py
First, import the module:

import logging

Then configure it:

# Set custom logger.
log = logging.getLogger(__name__)
log_handler = logging.StreamHandler()
log_formatter = logging.Formatter("[%(asctime)s] %(levelname)-8s: %(message)s",
                                  "%Y-%m-%d, %H:%M:%S")
log_handler.setFormatter(log_formatter)
log.addHandler(log_handler)
log.setLevel(logging.INFO)

Note that with logging you have the ability to print messages in the console and save them in a log file in the mean time (with just a few lines of extra configuration)!

The use it:

log.info("This an info message")
log.warning("This is a warning message")
log.error("This an error massage")
log.critical("This an error message that will stop the program!")

Add json files for other UA force fields and lipids

So far in dic_lipid.py, we have only one UA lipid : Berger POPC. We also have CHARMM36-AA POPC, but it's an AA force field. This lipid is there because it was used to test whether the reconstruction from buildH gave a good match compared to an all-atom force field such as CHARMM36-AA.
In this issue, I suggest to extend dic_lipids.py to :

We shall use autolipmap for building automatically these dictionnaries.

Update doc (Calculate OP on a subpart)

Update documentation to explain that OP is calculated only on the H described in .def file. For example, on the following .def file, OP is calculated only on the polar head :

beta1 POPC C5  H51
beta2 POPC C5  H52
alpha1 POPC C6  H61
alpha2 POPC C6  H62
g3_1 POPC C12 H121
g3_2 POPC C12 H122
g2_1 POPC C13 H131
g1_1 POPC C32 H321
g1_2 POPC C32 H322

Equations in the doc

In the section Statistics, I entered some equations in Latex format between $, e.g. $f(x) = x^2$. Unfortunately, Sphynx does not convert them into a real equation on readthedocs. I Tried to look at Sphynx doc, but it is not easy to tease things apart. Most of the examples are based on rst files, not markdown files.

Generation of xtc with one frame?

When the user provides a single pdb file without any trajectory (no use of -x), and the trajectory with H is requested with option -opx, buildH generates a pdb and an xtc. The xtc file contains only one frame, it's thus useless. Maybe we can check if -x is not used and -opx is used to generate only the pdb?

Long term goals

  • Unit tests
  • Optimization / possibly multithreading
  • Packaging
  • Documentation

Implement a way to use buildH as a simple module

It will be useful to have a way to use buildH as a module to be called in others python scripts

import buildh
buildh.main("popc.def","popc.pdb","popc.xtc","POPC","Berger_POPC","outputfile.txt")

Add a message that PBC are not taken (yet) into account

After a discussion with Hanne Antila, it looks like we can obtain "funny" results when the molecules are not whole (as produced by mdrun). So for now, it's probably useful to warn the user that he/she shall use trajectories with whole molecules (produced with trjconv with flag -pbc mol. In the future we can think about supporting molecules split at the boundaries. But this is low priority.

Check results between buildH & JMelcr program

I think we should add a subdir in Berger_test_case dir showing a validation of the order parameter calculated by buildH. I did the test on the traj of 25 frames, the mean diff is 0.000166585365853658. Not bad ๐Ÿ˜› !

Hydrogens pro-S and pro-R

In an NMR lipids post, we talked some time ago about being consistent regarding which hydrogens are pro-R and which are pro-S. So far if we buildH on, say C31, we build H311 and H312. So we add 1 and 2 suffixes. The idea would be for example to assign 1 to the pro-R hydrogen, and 2 to the pro-S.
This is something we can do without too much pain in buildH. I describe the strategy in a recent issue of NMRlipids IVb.
In fact, I did a check for Berger POPC. What matters is the order of the helpers in the .json files. So the "only thing" we have to do, if we want this, is to do a check when we create the json file.
So we can think about whether we implement this or not.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.