patrickfuchs / buildh Goto Github PK

View Code? Open in Web Editor NEW

14.0 5.0 6.0 28.12 MB

:computer: Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters.

Home Page: https://buildh.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 90.70% Makefile 0.58% TeX 8.72%

python lipids order-parameters united-atom molecular-dynamics-simulation

buildh's Introduction

buildH

Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters

Features

buildH can :

Reconstruct hydrogens from a united-atom structure file (pdb, gro) or trajectory (e.g. xtc).
Calculate the order parameters based on the reconstructed hydrogens.
Write new structure trajectory files with the reconstructed hydrogens.

buildH works in two modes :

A slow mode when an output trajectory (in xtc format) is requested by the user. In this case, the whole trajectory including newly built hydrogens are written to this trajectory.
A fast mode without any output trajectory.

In both modes, the order parameters are calculated. All calculations are accelerated with Numba. As a CPU cost indication, running buildH on a trajectory of 2500 frames with 128 POPC (without trajectory output) takes ~ 7' on a single core Xeon @ 3.60GHz.

Requirements

Python >= 3.6 is mandatory for running buildH.

buildH is written in Python 3 and needs the modules numpy, pandas, MDAnalysis and Numba.

Installation

Pip

A simple installation with pip will do the trick:

python3 -m pip install buildh

All dependencies (modules) will be installed automatically by pip.

Conda

buildH is also available through the Bioconda channel:

conda install buildh -c bioconda -c conda-forge

More details on installation here.

For installing a development version, see here.

Running buildH

Once installed, a simple invocation of the buildH command will run the program ($ represents the Unix prompt):

$ buildH
usage: buildH [-h] -c COORD [-t TRAJ] -l LIPID [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d DEFOP
              [-opx OPDBXTC] [-o OUT] [-b BEGIN] [-e END] [-pi PICKLE] [-igch3]
buildH: error: the following arguments are required: -c/--coord, -l/--lipid, -d/--defop

The minimal command for running buildH can resemble this:

$ buildH -c start_128popc.pdb -t popc0-25ns_dt1000.xtc -l Berger_POPC -d Berger_POPC.def

The different arguments mean the following: -c start_128popc.pdb is a pdb file with 128 POPC molecules, -t popc0-25ns_dt1000.xtc is a trajectory with 25 frames, -l Berger_POPC indicates the united-atom force field and the type of lipid to be analyzed, -d Berger_POPC.def indicates what C-H are considered for H building and order parameter calculation (the structure and trajectory files can be found here). The def file can be found here. The final order parameters averaged over the trajectory will be written to the default output name OP_buildH.out

Other detailed examples and Jupyter Notebooks can be found in the documentation at Read the Docs.

Important: sometimes, when performing MD, some molecules are split over periodic boundary conditions (PBC). buildH takes as input whole structures (pdb, gro, xtc, etc.). If broken molecules are supplied, it will most likely generate nonsense results. So it is up to the user to take care of making molecules whole before running buildH (e.g. by using a tool like trjconv in GROMACS with flag -pbc mol).

Invoking buildH with the -h flag will display some help to the screen and tell which lipids are supported.

$ buildH -h
usage: buildH [-h] [-v] -c COORD [-t TRAJ] -l LIPID [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d
              DEFOP [-opx OPDBXTC] [-o OUT] [-b BEGIN] [-e END] [-igch3]
[...]
The list of supported lipids (-l option) are: Berger_CHOL, Berger_DOPC, Berger_DPPC, Berger_POP, Berger_POPC, Berger_PLA, Berger_POPE, Berger_POPS, CHARMM36UA_DPPC, CHARMM36UA_DPUC, CHARMM36_POPC, GROMOS53A6L_DPPC, GROMOSCKP_POPC, GROMOSCKP_POPS. More documentation can be found at https://buildh.readthedocs.io.

Documentation

The full documentation is available at Read the Docs.

Contributors

Hubert Santuz
Amélie Bâcle
Pierre Poulain
Patrick Fuchs

License

buildH is licensed under the BSD License.

Contributing

If you want to report a bug, request a feature, or propose an improvement use the GitHub issue system.

Please, see also the CONTRIBUTING file.

Note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See the CODE_OF_CONDUCT file.

Citing buildH

If you use buildH for your research, please cite :

Santuz et al., (2021). buildH: Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters. Journal of Open Source Software, 6(65), 3521, https://doi.org/10.21105/joss.03521

buildh's People

Contributors

Stargazers

Watchers

Forkers

ohsollila nmrlipids hublot pierrepo readthedocs-assistant rnaimehaom

buildh's Issues

Create PyPI package and update procedure

Implement calculation of STEM

Add calculation of STandard Error of the Mean (STEM) like in calcOrderParameters.py from MATCH repository :
https://github.com/NMRLipids/MATCH/blob/f5fc0f7d46c84cc2a0328694a9c666f60315697c/scripts/calcOrderParameters.py#L159
Useful comment from Samuli : https://nmrlipids.blogspot.com/2019/04/nmrlipids-ivb-assembling-pe-pg-results.html?showComment=1559806601154#c455028859912859374

Why does it crash with file 200POPC.pdb ?

When ran on this file :

$ python3 ./add_hydrogens.py 200POPC.pdb
[...]
Traceback (most recent call last):
  File "./add_hydrogens.py", line 624, in <module>
    new_df_atoms = build_all_H(universe_woH, return_coors=True)
  File "./add_hydrogens.py", line 553, in build_all_H
    Hs_coor = buildHs_on_1C(atom)
  File "./add_hydrogens.py", line 454, in buildHs_on_1C
    helper1_coor = sel("name {0}".format(helper1_name))[0].position
  File "/home/fuchs/software/miniconda3/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 1804, in __getitem__
    return super(AtomGroup, self).__getitem__(item)
  File "/home/fuchs/software/miniconda3/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 475, in __getitem__
    return self.level.singular(self.ix[item], self.universe)
IndexError: index 0 is out of bounds for axis 0 with size 0

For the record, this file has 200 POPC and some TO (trioleins).

Improve PEP 257 compliance.

We should improve docstrings PEP 257 compliance. Especially for one-line docstrings.

We could use pydocstyle for this. pydocstyle is already installed in the conda environement (see binder/environment.yml).

Using MDAnalysis data structure ?

Hi,

Following IRL discussion with @patrickfuchs, I'm trying to use MDAnalysis data structure to build the hydrogens.
This issue will resume my progress on it and see if it's doable.

Right know, I'm able to build easily the hydrogen with this code :

if __name__ == "__main__":
    # read coordinates in a pandas dataframe
    # list_df_residues = pdb2list_pandasdf_residues("POPC_only.pdb")
    # new_df_atoms = reconstruct_hydrogens3(list_df_residues)
    # print(pandasdf2pdb(new_df_atoms))

    u = MDAnalysis.Universe("POPC_only.pdb")

    for atom in u.atoms:
        if atom.name in dic_lipids.POPC:
            typeofH2build, helper1_name, helper2_name = dic_lipids.POPC[atom.name]

            #atom Object has a residue object. This residue object has a list of its atom we can select.
            #[0] is because select_atoms return a AtomGroup who contains only 1 atom
            helper1_coor = atom.residue.atoms.select_atoms("name {0}".format(helper1_name))[0].position
            helper2_coor = atom.residue.atoms.select_atoms("name {0}".format(helper2_name))[0].position
            # Build H(s).
            H1_coor, H2_coor = get_CH2_H(atom.position, helper1_coor, helper2_coor)

            H1_name, H2_name = get_name_H(atom.name)

Pro :

it's very fast : ~4s on my computer
we skip the use of Dataframes

Cons:

I'm not able, yet, to recreate a new universe object with the newly hydrogens inserted at the right positions. I'm working on it.

Transform dic_lipids.py into json files

After thorough discussion, it will be better to transform the dic_lipids.py into json files.
This dic_lipids.py is used to reconstructed hydrogens from a united atom bilayer.

This will allow the users to add their own .json files for their own trajectories.
A regular database of json will be created in the buildH source code into a directory called lipids to provide a set of supported lipids.
We will add an option to the program for user to provided their own json.

Steps:

Convert the current dic_lipids.py into a series of .json files into buildh/lipids directory
Transform the code to read those json files
Provide a list of supported lipids when using the -h of the program (by parsing the directory)
Add an argument for users to provide their own .json files.
Integrate the lipids from NMRLipids@d0a2bc1 (issue #44)

Urgent things TODO before first release

Manage options w/wo output traj files.
Name of output file (containing OP) ;.out extension of this file ?
Change function quick_dic().
Make dic_lipid.POPC choosable by the user (use builtin function gettatr()).
Use ordered dic for dic_OP (so that the OP output is always in the same order).

Add GitHub CI badges

It would be nice to add CI badges for testing code and documentation.

Force python 3 detection

I realized that when installing MDAnalysis with python2.7, buildH doesn't issue any error but when it runs, it generates wrong results. So we should add a detection of python 3.7 (or higher).

CHARMM validation cleaning up

Move CHARMM_POPC.json to there in dir lipids.

Register buildH to Software Heritage

Software Heritage allows archiving research software. There're very few steps to follow to be registered:

https://www.softwareheritage.org/save-and-reference-research-software/?lang=fr

Order parameter formula

Add it 😉

Expand tests down to Python 3.6

BuildH should work with Python 3.6. We should perform tests also with this version.

Connect repo to Zenodo

@patrickfuchs
Could please connect the repo to Zendo with these guidelines?

Once done, can you try to publish a new release? Everything should be explained here: buildH/devtools/RELEASE.md

Create a def_files directory

Provide a set of order parameters definitions files into a directory called def_files.
Users will pick them to calculate the order parameters.

Refactorizing

The main function buildH_calcOP.py gets big. It would be nice to refactorize it by adding functions.
Also, the script buildH_calcOP.py could be factorized by adding modules.

Add an ignore build hydrogens flag

When one wants to use buildH to calculate OP only, without reconstructing Hs, from an all-atom trajectory.

Use mamba to create environement at Read the Docs

The feature flag CONDA_USES_MAMBA enables mamba instead of conda to create the environment at Read the Docs.

Unfortunately documentation is not very explicit on how to use feature flag...

Specify the type of trajectory supported

Currently, we can support any trajectory and topology files that MDAnalysis support.
But, our argument option for trajectory is -x, --xtc wich implies that only xtc trajectory files are supported.

We should change it to a more general option name and specify it in the documentation

Create doc

I'm about to create a doc dir. I'll move there all the stuff related to doc (vectors.png, CHARMM36_POPC_validation), simplify the main README.md by move some of its content.
Once done, we can complete later the doc in a some appropriate format (sphynx?) and add notebooks to guide the user.

Update vectors.png

In the current version the yellow vectors don't fit the C-H bonds (drawn as sticks), although they should (and they do). That would be useful (in the long term) to show this also for the other types of H that we rebuild.

Convert pandas dataframe to 2D-numpy array

According to https://stackoverflow.com/questions/28006793/pandas-dataframe-to-list-of-lists, I just have to use df.values to convert a pandas dataframe to a numpy 2D-array (and not np.array(df.values.tolist()) 😆 !!!).

Be clearer when there is an error with the user supplied json file

Making a try with a toy (user supplied) json file:

$ buildH -c 3Conly.pdb -d 3Conly.def -lt Berger_3Conly.json  -l PO3C
usage: buildH [-h] -c COORD [-t TRAJ] -l LIPID [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d DEFOP [-opx OPDBXTC] [-o OUT]
              [-b BEGIN] [-e END] [-pi PICKLE]
buildH: error: Berger_3Conly.json is in a bad format.

This error message is not very precise. The user can't find where the error comes from in the json file.

When doing this by hand:

>>> f = open("Berger_3Conly.json")
>>> json.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 4 column 1 (char 61)

Here we can see that it comes from line 4 column 1. Easier to debug.
In fact the error was coming from a comma at the end of the last line :

{
    "resname": ["PO3C"],
    "C26": ["CH2", "C25", "C27"],
}

It makes me think to warn the user about this: don't put a comma on the last line!

Correspondance lipid name / residue name

By creating a notebook example, I realized we don't check the correspondance between the lipid name given with -l option and the residue name in the PDB. For example:

Constructing the system...
System has 28526 atoms
Traceback (most recent call last):
  File "/home/fuchs/software/miniconda3/envs/buildh/bin/buildH", line 33, in <module>
    sys.exit(load_entry_point('buildh', 'console_scripts', 'buildH')())
  File "/home/fuchs/papers/buildH_Amelie/buildH/buildh/cli.py", line 232, in main
    core.fast_build_all_Hs_calc_OP(universe_woH, begin, end, dic_OP, dic_lipid, dic_Cname2Hnames)
  File "/home/fuchs/papers/buildH_Amelie/buildH/buildh/core.py", line 466, in fast_build_all_Hs_calc_OP
    dic_lipids_with_indexes = make_dic_lipids_with_indexes(universe_woH,
  File "/home/fuchs/papers/buildH_Amelie/buildH/buildh/core.py", line 365, in make_dic_lipids_with_indexes
    first_lipid_residue = universe_woH.select_atoms(selection).residues[0]
  File "/home/fuchs/software/miniconda3/envs/buildh/lib/python3.8/site-packages/MDAnalysis/core/groups.py", line 540, in __getitem__
    return self.level.singular(self.ix[item], self.universe)
IndexError: index 0 is out of bounds for axis 0 with size 0

Rather unclear, isn't it 😆 ???

The error above comes from the use of Berger_POPC instead of Berger_PLA with the -l option.

Additionnally, I'm thinking maybe we should also make a check between the argument passed to -l option and that passed to -d option. Imagine, for example, -l CHARMM-UA -d order_parameter_definitions_MODEL_Berger_POPC.def (could be solved in another issue)

Add some help on how to install the required packages with conda

Update doc about Numba

At least it should be stated in the main README that buildH is accelerated with Numba. Also a couple of pages in the docs directory will have to be updated, as well as in notebooks.

Implement -b -e flags to select different slices of a trajectory

Update licence text

For some reason, the licence text is not what it should be, as listed here.

We should update the text licence appropriately.

Implement cholesterol

In a recent discussion of the Databank project, I proposed to implement cholesterol in buildH.

Create sub directory in test_data

As discussed with @patrickfuchs, it will be better to have sub directories in test_data.
Currently, we only test a POPC with the forcefield Berger, but in the future, we could test several lipids and/or forcefields
Creating a sub-directory for each couple lipid-forcefield will help.
For now, we can create "POPC-Berger" and move the related files.

Document the use of the json files

Following #66 and the re-work of the documentation, we should explain in the buildh.md file the role and architecture of the json files for the lipid topologies.

Use logging module

For this kind of project, I'll advise to use the logging module instead of the print() function.

You will find here a small example : https://github.com/pierrepo/seq-to-first-iso/blob/master/seq_to_first_iso/seq_to_first_iso.py
First, import the module:

import logging

Then configure it:

# Set custom logger.
log = logging.getLogger(__name__)
log_handler = logging.StreamHandler()
log_formatter = logging.Formatter("[%(asctime)s] %(levelname)-8s: %(message)s",
                                  "%Y-%m-%d, %H:%M:%S")
log_handler.setFormatter(log_formatter)
log.addHandler(log_handler)
log.setLevel(logging.INFO)

Note that with logging you have the ability to print messages in the console and save them in a log file in the mean time (with just a few lines of extra configuration)!

The use it:

log.info("This an info message")
log.warning("This is a warning message")
log.error("This an error massage")
log.critical("This an error message that will stop the program!")

Add "Powered ny MDAnalysis" badge

We could add this badge

See https://www.mdanalysis.org/2021/04/11/powered-by-mda/

Check PEP257 for docstrings

As pointed by @pierrepo in #37 (comment), we should ensure buildH follow the PEP257 for docstrings.

For example, with one-line docstring, we should have:

"""Test for is_allHs_present()."""

Create Bioconda package and update procedure

Add Zenodo description file

Zenodo can be used to archive buildH source code.

Zenodo metadata can be defined in a file called .zenodo.json. Here's an example: https://github.com/pierrepo/autoclasswrapper/blob/master/.zenodo.json

For buildH, we need:

a title
author's ORCID
some keywords

Add json files for other UA force fields and lipids

So far in dic_lipid.py, we have only one UA lipid : Berger POPC. We also have CHARMM36-AA POPC, but it's an AA force field. This lipid is there because it was used to test whether the reconstruction from buildH gave a good match compared to an all-atom force field such as CHARMM36-AA.
In this issue, I suggest to extend dic_lipids.py to :

other UA force fields, especially CHARMM36-UA. Here's an example of trajectory : https://doi.org/10.5281/zenodo.1293774.
possibly other common lipids, such as DOPC, DPPC and DMPC.

We shall use autolipmap for building automatically these dictionnaries.

Remove TODO file and create (updated) issues on github

Update doc (Calculate OP on a subpart)

Update documentation to explain that OP is calculated only on the H described in .def file. For example, on the following .def file, OP is calculated only on the polar head :

beta1 POPC C5  H51
beta2 POPC C5  H52
alpha1 POPC C6  H61
alpha2 POPC C6  H62
g3_1 POPC C12 H121
g3_2 POPC C12 H122
g2_1 POPC C13 H131
g1_1 POPC C32 H321
g1_2 POPC C32 H322

Equations in the doc

In the section Statistics, I entered some equations in Latex format between $, e.g. $f(x) = x^2$ . Unfortunately, Sphynx does not convert them into a real equation on readthedocs. I Tried to look at Sphynx doc, but it is not easy to tease things apart. Most of the examples are based on rst files, not markdown files.

Generation of xtc with one frame?

When the user provides a single pdb file without any trajectory (no use of -x), and the trajectory with H is requested with option -opx, buildH generates a pdb and an xtc. The xtc file contains only one frame, it's thus useless. Maybe we can check if -x is not used and -opx is used to generate only the pdb?

Remove template tag from repo

This repo appears as a template:

There is no reason for this. @patrickfuchs could you please remove this in the settings?

Long term goals

Unit tests
Optimization / possibly multithreading
Packaging
Documentation

Install bumpversion to update easily version number

Advertise buildH

I create this issue to gather a few places where we can advertise buildH. Here are a few ideas:

First, we are cited on the NMRlipids blog ;
It'll be a priori cited in some future NMRlipids papers ;
We can add a link on our personal web page ;
We can publish a post on this GROMACS forum web page dedicated on third-party tools

Don't hesitate if you have other ideas!

Implement a way to use buildH as a simple module

It will be useful to have a way to use buildH as a module to be called in others python scripts

import buildh
buildh.main("popc.def","popc.pdb","popc.xtc","POPC","Berger_POPC","outputfile.txt")

Add a message that PBC are not taken (yet) into account

After a discussion with Hanne Antila, it looks like we can obtain "funny" results when the molecules are not whole (as produced by mdrun). So for now, it's probably useful to warn the user that he/she shall use trajectories with whole molecules (produced with trjconv with flag -pbc mol. In the future we can think about supporting molecules split at the boundaries. But this is low priority.

Write doc about command line options

Complete command_line_options.md.

Check results between buildH & JMelcr program

I think we should add a subdir in Berger_test_case dir showing a validation of the order parameter calculated by buildH. I did the test on the traj of 25 frames, the mean diff is 0.000166585365853658. Not bad 😛 !

Hydrogens pro-S and pro-R

In an NMR lipids post, we talked some time ago about being consistent regarding which hydrogens are pro-R and which are pro-S. So far if we buildH on, say C31, we build H311 and H312. So we add 1 and 2 suffixes. The idea would be for example to assign 1 to the pro-R hydrogen, and 2 to the pro-S.
This is something we can do without too much pain in buildH. I describe the strategy in a recent issue of NMRlipids IVb.
In fact, I did a check for Berger POPC. What matters is the order of the helpers in the .json files. So the "only thing" we have to do, if we want this, is to do a check when we create the json file.
So we can think about whether we implement this or not.

patrickfuchs / buildh Goto Github PK

buildh's Introduction

buildH

Features

Requirements

Installation

Pip

Conda

Running buildH

Documentation

Contributors

License

Contributing

Citing buildH

buildh's People

Contributors

Stargazers

Watchers

Forkers

buildh's Issues

Recommend Projects

Recommend Topics

Recommend Org