srnas / barnaba

Analyse Nucleic Acids Structure and Simulations with baRNAba

License: GNU General Public License v3.0

Python 32.41% Jupyter Notebook 67.59%
rna-structure python molecular-dynamics modeling rna rna-folding

Introduction

Barnaba is a tool for analyzing RNA three-dimensional structures and simulations. Barnaba uses MDTraj to read and write topology and trajectory files, so it supports several formats, including pdb, xtc, trr, dcd, binpos, netcdf, mdcrd, prmtop, and more. Barnaba has been developed by Sandro Bottaro with the crucial help of Giovanni Bussi, Giovanni Pinamonti, Sabine Reißer and Wouter Boomsma.

This is what you can do with Barnaba:

  1. Calculate eRMSD [1]
  2. Calculate RMSD after optimal alignment
  3. Search for single/double stranded RNA motifs in the PDB database or in simulations [1]
  4. Annotate PDB structures and trajectories with the Leontis-Westhof classification
  5. Produce dynamic secondary structure figures in SVG format
  6. Cluster nucleic acid structures using the eRMSD as the distance metric
  7. Calculate elastic network models for nucleic acids and nucleic acids/protein complexes [2]
  8. Calculate backbone and pucker torsion angles in a PDB structure or trajectory
  9. Back-calculate 3J scalar couplings from PDB structure or trajectory
  10. Score three-dimensional structures using eSCORE [1]

For bugs, questions or comments contact Sandro at sandro dot bottaro (guesswhat) gmail dot com

If you use Barnaba in your work, please cite the following paper:

@article{bottaro2019barnaba,
    title={Barnaba: software for analysis of nucleic acid structures and trajectories},
    author={Bottaro, Sandro and Bussi, Giovanni and Pinamonti, Giovanni and Rei{\ss}er, Sabine and Boomsma, Wouter and Lindorff-Larsen, Kresten},
    journal={RNA},
    volume={25},
    number={2},
    pages={219--231},
    year={2019},
    publisher={Cold Spring Harbor Lab}
}

The manuscript is also available on bioRxiv here: https://www.biorxiv.org/content/10.1101/345678v3

Requirements

Barnaba requires:
  • Python >= 3.6
  • Numpy
  • Scipy
  • Mdtraj 1.9
  • future

Barnaba requires mdtraj (http://mdtraj.org/) for manipulating structures and trajectories. To perform cluster analysis, scikit-learn is required too.

Required packages can be installed using pip, e.g.:

pip install mdtraj
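
Before installing anything, it can help to see which of the packages listed above are already importable. The following is a minimal sketch using only the standard library. The import names (`numpy`, `scipy`, `mdtraj`, `future`, `sklearn`) are assumed to correspond to the listed requirements, and the helper `missing_packages` is hypothetical, not part of Barnaba.

```python
import importlib.util

# Import names assumed to correspond to the requirements listed above.
REQUIRED = ["numpy", "scipy", "mdtraj", "future"]
OPTIONAL = ["sklearn"]  # scikit-learn, only needed for cluster analysis


def missing_packages(names):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_packages(REQUIRED))
    print("missing optional:", missing_packages(OPTIONAL))
```

Any name reported as missing can then be installed with pip as shown above.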

Installation

You can obtain the latest tagged version of barnaba using pip:

pip install barnaba

If you prefer to manage your dependencies with conda you can use:

conda install -c conda-forge barnaba

On MacOS, you can install the same tagged version using the python distributed with MacPorts:

sudo port install py36-barnaba

Just replace 36 with the python version that you prefer to use.

Alternatively, you can find the most recent version of barnaba on Github:

git clone https://github.com/srnas/barnaba.git

then move to the barnaba directory and run the command

pip install -e .

Usage

Barnaba can be used either as a Python library or as a command-line tool. A number of example notebooks can be found in the examples directory. The notebooks for conducting the analyses and producing the figures in the manuscript can be found in the manuscript_figures folder.

Alternatively, the command-line interface can be found in the bin directory. Here is a minimal how-to:

  1. minimal help: barnaba --help
  2. Calculate the ERMSD between structures

    barnaba ERMSD --ref ../test/data/sample1.pdb --pdb ../test/data/sample2.pdb

    trajectories can be provided as well, by specifying a topology file

    barnaba ERMSD --ref ../test/data/sample1.pdb --top ../test/data/sample1.pdb --trj ../test/data/samples.xtc

    other accepted options are shown in a function-specific help

    barnaba ERMSD --help

  3. Calculate the RMSD between structures

    barnaba RMSD --ref ../test/data/sample1.pdb --pdb ../test/data/sample2.pdb --dump

  4. Find single stranded motif

    barnaba SS_MOTIF --query ../test/data/GNRA.pdb --pdb ../test/data/1S72.pdb

  5. Find double stranded motif. l1 and l2 are the lengths of the two strands

    barnaba DS_MOTIF --query ../test/data/SARCIN.pdb --pdb ../test/data/1S72.pdb --l1 8 --l2 7

  6. Annotate structures/trajectories according to the Leontis/Westhof classification.

    barnaba ANNOTATE --pdb ../test/data/SARCIN.pdb

  7. Produce dynamic secondary-structure figures. It requires as input the files .pairing and .stacking produced with the ANNOTATE command.

    barnaba SEC_STRUCTURE --ann outfile.ANNOTATE.stacking.out outfile.ANNOTATE.pairing.out

  8. Calculate backbone/sugar/pseudorotation angles

    barnaba TORSION --pdb ../test/data/GNRA.pdb --backbone --sugar --pucker

  9. Calculate J-couplings

    barnaba JCOUPLING --pdb ../test/data/sample1.pdb

  10. Calculate elastic network models for RNA and predict SHAPE reactivity. NB: only works with PDB.

    barnaba ENM --pdb ../test/data/GNRA.pdb --shape

  11. Calculate relative positions between bases R_ij and G vectors for pairs within ellipsoidal cutoff

    barnaba DUMP --pdb ../test/data/GNRA.pdb --dumpG --dumpR

  12. Extract fragments from structures with a given sequence. NB: only works with PDB.

    barnaba SNIPPET --pdb ../test/data/1S72.pdb --seq NNGNRANN

  13. Calculate ESCORE

    barnaba ESCORE --ff ../test/data/1S72.pdb --pdb ../test/data/sample1.pdb
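
When the command-line tool is driven from a script, it can be convenient to assemble the arguments in Python. The sketch below builds the `barnaba ERMSD` invocation using only the flags shown in the how-to above (`--ref`, `--pdb`, `--top`, `--trj`). The helper names `build_ermsd_command` and `run_if_available` are hypothetical and not part of Barnaba.

```python
import shutil
import subprocess


def build_ermsd_command(ref, pdb=None, top=None, trj=None):
    """Build the argument list for `barnaba ERMSD`.

    Either a single structure (`pdb`) or a topology/trajectory pair
    (`top` and `trj`) must be given, mirroring the two examples above.
    """
    cmd = ["barnaba", "ERMSD", "--ref", ref]
    if pdb is not None:
        cmd += ["--pdb", pdb]
    elif top is not None and trj is not None:
        cmd += ["--top", top, "--trj", trj]
    else:
        raise ValueError("provide either pdb, or both top and trj")
    return cmd


def run_if_available(cmd):
    """Run the command if its executable is on PATH; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    cmd = build_ermsd_command(
        "../test/data/sample1.pdb", pdb="../test/data/sample2.pdb"
    )
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.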

References

[1] Bottaro, Sandro, Francesco Di Palma, and Giovanni Bussi. "The role of nucleobase interactions in RNA structure and dynamics." Nucleic Acids Research 42.21 (2014): 13306-13314.

[2] Pinamonti, Giovanni, et al. "Elastic network models for RNA: a comparative assessment with molecular dynamics and SHAPE experiments." Nucleic Acids Research 43.15 (2015): 7260-7269.

barnaba's People

Contributors

cclauss, giopina, giovannibussi, kntkb, sbottaro, sreisser, wouterboomsma

barnaba's Issues

output for different pdb models in ANNOTATE

Hi,

at the moment, ANNOTATE with the --pdb option outputs only the annotation for the first model, irrespective of the number of models in the file. Output for multiple models can be obtained by treating the pdb file as a trajectory (--trj *.pdb --top ..pdb).
I suggest that --pdb should also output every model, labelled with a 'Model' keyword (instead of 'Frame', as with --trj).

Cheers
Sabine
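
To decide whether the trajectory workaround described above is needed for a given file, one can count its MODEL records first. This is a minimal standard-library sketch. The helper `count_models` is hypothetical and does not reflect how Barnaba or mdtraj read PDB files.

```python
def count_models(pdb_lines):
    """Count MODEL records in a PDB file.

    A file without any MODEL record is treated as holding a single model.
    """
    n_models = sum(1 for line in pdb_lines if line[:6].strip() == "MODEL")
    return max(n_models, 1)


if __name__ == "__main__":
    example = [
        "MODEL        1",
        "ENDMDL",
        "MODEL        2",
        "ENDMDL",
    ]
    if count_models(example) > 1:
        print("multiple models: consider passing the file with --trj and --top")
```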

Barnaba usage for kink-turns in Python with custom pdb files

Hi,
I want to use Barnaba to compare kink-turns with each other and also with some other new patterns.

First, I would like to obtain eRMSD values between two kink-turns, so I relied on the file example_07_double_strande.ipynb from this git.

Here is the script that I used:

import barnaba as bb

query = "../test/data/1.txt" 
pdb = "../test/data/1S72.pdb" # Here 1S72.pdb is supposed to be replaced by another kink-turn file, but the bug already appeared with this setup.
(l1, l2) = (5, 7)
# call function. 
results = bb.ds_motif(query,pdb,l1=l1,l2=l2,bulges=0,threshold=0.7,out='motif')

I have attached both the error (error.txt) and the kink-turn pdb file (1.txt).
I think the problem comes from my file 1.txt, as I generated it with a custom parser from an RNA graph.

However, when I compare it with the git examples (SARCIN.pdb and 1S72.pdb, the latter of which can be reduced to only the lines containing ATOM), I cannot find any difference.
Could you please tell me if you see any problem with my 0.pdb file, or any other problem that can be fixed?

Thanks a lot and have a nice day,
Best,
Théo Boury

1.txt
error.txt
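
Since the file was written by a custom parser, one thing worth ruling out is a column misalignment: PDB ATOM records are fixed-width, so a file that looks identical by eye can still fail to parse. The sketch below checks the standard column positions (atom name in columns 13-16, residue number in 23-26, coordinates in 31-54). The helper `check_atom_records` is hypothetical, and passing this check does not guarantee that mdtraj or Barnaba will accept the file.

```python
def check_atom_records(lines):
    """Return (line_number, message) pairs for ATOM/HETATM records that do
    not parse under the fixed-column PDB layout."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if len(line.rstrip("\n")) < 54:
            problems.append((number, "record shorter than 54 columns"))
            continue
        if not line[12:16].strip():
            problems.append((number, "empty atom name in columns 13-16"))
        try:
            int(line[22:26])
        except ValueError:
            problems.append((number, "residue number not in columns 23-26"))
        # x, y and z each occupy an 8-character field starting at column 31.
        for label, start in (("x", 30), ("y", 38), ("z", 46)):
            try:
                float(line[start:start + 8])
            except ValueError:
                problems.append((number, label + " coordinate not in its field"))
    return problems


if __name__ == "__main__":
    # A whitespace-separated record, as a custom writer might produce.
    for number, message in check_atom_records(["ATOM 1 P G A 1 1.0 2.0 3.0"]):
        print(number, message)
```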

Testing

I'm not sure what's the proper way to run all tests.
Should I just execute each of the test_*.py ?
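
One possible approach, sketched below, is to collect the test_*.py files and hand them to a test runner in a single call. The sketch assumes that pytest is installed and that the tests live in a directory named test. Neither is confirmed here, and the tests may expect to be run from a particular working directory.

```python
import glob
import os
import subprocess
import sys


def find_test_files(test_dir):
    """Return the sorted list of test_*.py files in `test_dir`."""
    return sorted(glob.glob(os.path.join(test_dir, "test_*.py")))


def run_tests(test_dir):
    """Run all test files through pytest in one process.

    Returns pytest's exit code, or None if no test file was found.
    Assumes pytest is installed in the current environment.
    """
    files = find_test_files(test_dir)
    if not files:
        return None
    return subprocess.call([sys.executable, "-m", "pytest", *files])


if __name__ == "__main__":
    for path in find_test_files("test"):
        print(path)
```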

Sparse ENMs

This branch implements sparse ENMs, using tools for sparse matrices. The results are equivalent, but the calculation is significantly faster for large molecules. It might be worth integrating these functions into the new version as well.

all-atoms/backbone only RMSD

It would be nice to add the possibility to specify which atoms to include in RMSD calculation, either backbone-only or heavy atoms.

ENM example does not work

When I run the ENM example within jupyter I get the following error:

/Users/sandrobottaro/anaconda/lib/python2.7/site-packages/mdtraj/core/selection.pyc in __call__(self, selection)
352 lines = ["%s: %s" % (msg, selection),
353 " " * (12 + len("%s: " % msg) + e.loc) + "^^^"]
--> 354 raise ValueError('\n'.join(lines))
355
356 # Change ATOM in function bodies. It must bind to the arg

ValueError: Expected end of text (at char 11), (line:1, col:12): name "C1'" "C2" "P" "CA" "CB"

support mmCIF format

It would be nice to support mmCIF format.
One solution would be to add this feature to mdtraj, but I don't know how difficult it is. Otherwise one could convert from .cif to some format that is already supported by mdtraj.
Any ideas?

Possible issue with python3

I am trying to use barnaba with MacPorts python (I don't like conda).

(I actually just opened a pull request to include mdtraj in MacPorts, so all the requirements will be there soon. The long term plan is to include barnaba in MacPorts as well).

If I try to launch baRNAba.py from the command line with python2.7 it works:

macbook: (master) barnaba$ PYTHONPATH=/Users/bussi/barnaba/ python2.7 bin/baRNAba.py
usage: baRNAba.py [-h]
                  {ERMSD,RMSD,ESCORE,SS_MOTIF,DS_MOTIF,ANNOTATE,DUMP,TORSION,JCOUPLING,SNIPPET,ENM}
                  ...
baRNAba.py: error: too few arguments

However, if I launch it with python3.6 it does not work:

macbook: (master) barnaba$ PYTHONPATH=/Users/bussi/barnaba/ python3.6 bin/baRNAba.py
Traceback (most recent call last):
  File "bin/baRNAba.py", line 582, in <module>
    main()
  File "bin/baRNAba.py", line 552, in main
    outfile = filename(args)
  File "bin/baRNAba.py", line 540, in filename
    if(args.name == None):
AttributeError: 'Namespace' object has no attribute 'name'

I suspect there is a problem with py27 to py36 conversion, but my ignorance does not allow me to fix it (or to even be sure that this is the reason...). Notice that it only happens when passing no argument (e.g. .... bin/baRNAba.py ENM works).

Perhaps @wouterboomsma could have a look?

Thanks!

Giovanni
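
One plausible explanation, not confirmed in this thread, is a behaviour change in argparse: Python 2 treats subcommands as required and exits with "too few arguments", while Python 3 makes them optional, so calling the script without a subcommand yields a namespace that lacks the attributes defined on the subparsers. The sketch below reproduces this with a hypothetical parser. The `--name` option and the `command` destination are illustrative and not taken from baRNAba.py.

```python
import argparse


def build_parser(require_subcommand):
    """Build a small parser with one subcommand, mimicking the CLI layout."""
    parser = argparse.ArgumentParser(prog="baRNAba.py")
    # The `required` keyword of add_subparsers needs Python 3.7 or later.
    subparsers = parser.add_subparsers(dest="command", required=require_subcommand)
    ermsd = subparsers.add_parser("ERMSD")
    ermsd.add_argument("--name", default=None)
    return parser


if __name__ == "__main__":
    # With optional subcommands, an empty command line parses successfully,
    # but `name` was only defined on the ERMSD subparser and is missing.
    args = build_parser(require_subcommand=False).parse_args([])
    print(hasattr(args, "name"))

    # Guarding the access is an alternative to requiring a subcommand.
    print(getattr(args, "name", None))
```

With `require_subcommand=True`, the same empty command line makes argparse print a usage error and exit, which is closer to the Python 2.7 behaviour shown above.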

Correspondence between PDB chain/residue identifiers and `bb.annotate()` output

Thank you for this very useful library.

I would just like a bit more detail on the output of bb.annotate() and how it maps to the original PDB's residue and chain identifiers, i.e. I would like to be able to map the annotate output back to the original PDB.

I noticed that the chains are relabeled to have integer IDs (this seems to come from the way mdtraj handles PDB files). Now that mdtraj seems to be preserving the original chain IDs (mdtraj/mdtraj#1715), I'm investigating whether this can be added as a feature to this project.

Do the residue positions also get relabeled or do these correspond to the original PDB?
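
As a starting point for such a mapping, the original identifiers can be read straight from the PDB file in file order and then compared with the annotate output. The sketch below uses the standard fixed columns (residue name 18-20, chain 22, residue number 23-26), ignores insertion codes, and reads only the first model. The helper `residue_table` is hypothetical, and whether its order matches Barnaba's residue order is an assumption that should be verified on a known structure.

```python
def residue_table(pdb_lines):
    """List (chain_id, residue_number, residue_name) once per residue,
    in file order, for the first model only."""
    residues = []
    last = None
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())
        if key != last:
            residues.append(key)
            last = key
    return residues


if __name__ == "__main__":
    lines = [
        "ATOM      1  P     G A   5       1.000   2.000   3.000",
        "ATOM      2  P     U A   7       4.000   5.000   6.000",
    ]
    # Pair a sequential index with the original identifiers.
    for index, (chain, number, name) in enumerate(residue_table(lines)):
        print(index, chain, number, name)
```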

Thanks!

SEC_STRUCTURE problem

Is there a way to stop the RNA in the .svg image from collapsing on itself during minimisation (in the 2D image)?

[Question] Threshold used to define base stacking

I would like to annotate base stacking for molecular simulation structures. I understand that the criteria for base stacking (and pairing) were calibrated against high-resolution structures and, as described in the example notebook, might not be optimal for simulated structures.

I looked into the earlier paper of barnaba published in 2014 describing the distribution of base stacking parameters in Figure 2 and Figure SD3 (supporting information). I can see that the ρij is in the range of 0 - 4 angstroms for base stacking and the distribution could be quite different among base pairs.

The current ρij threshold is set to 2.5 angstroms, but would it make sense to increase this threshold to say ~4 to handle molecular simulation structures and low-resolution structures?

Use scikit-learn instead of sklearn

Hi,
I see you are working on support for the newest Python versions (3.11) and that you use scikit-learn there, but please note that the sklearn package name on PyPI is deprecated. Since 1 December 2022, installation attempts of current barnaba versions can sometimes throw an exception: https://pypi.org/project/sklearn/
This can be annoying in CI/CD environments, and it forces the use of temporary workarounds (like SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True).

Thanks!

Pyemma compatibility

I've created a function (get_gvecs_pyemma) that computes g-vectors and returns them in a format compatible with what is required by
pyemma.coordinates.featurizer.add_custom_function()
I've got it in a branch on my fork.
I'm not sure whether this would be useful for everybody or whether it is a bit out of scope at the moment.

(By the way, it is not very different from the already existing dump_gvec_traj: I just removed seq from the return values and changed the shape of the gvecs array.)

for RNA Analysis

I am using the code below to calculate the eRMSD of an RNA system from a GROMACS trajectory file, and I get the error shown after it.
Can anyone help me with this?

Thank you

import barnaba as bb
import mdtraj as md

# define trajectory and topology files
native="md.pdb"
traj = "md.xtc"
top = "md.pdb"
# calculate eRMSD between native and all frames in trajectory
ermsd = bb.ermsd(native,traj,topology=top)

import matplotlib.pyplot as plt
plt.xlabel("Frame")
plt.ylabel("eRMSD from native")
plt.plot(ermsd[::50])
plt.show()

plt.hist(ermsd,density=True,bins=50)
plt.xlabel("eRMSD from native")
plt.show()

Error-
MemoryError Traceback (most recent call last)
in ()
7
8 # calculate eRMSD between native and all frames in trajectory
----> 9 ermsd = bb.ermsd(native,traj,topology=top)
10
11 import matplotlib.pyplot as plt

/home/workstation/anaconda3/lib/python3.5/site-packages/barnaba/functions.py in ermsd(reference, target, cutoff, topology, residues_ref, residues_target)
56 traj = md.load(target)
57 else:
---> 58 traj = md.load(target,top=topology)
59
60 warn += "# Loaded target %s \n" % target

/home/workstation/anaconda3/lib/python3.5/site-packages/mdtraj/core/trajectory.py in load(filename_or_filenames, discard_overlapping_frames, **kwargs)
428 _assert_files_or_dirs_exist(filename_or_filenames)
429
--> 430 value = loader(filename, **kwargs)
431 return value
432

mdtraj/formats/xtc/xtc.pyx in xtc.load_xtc (mdtraj/formats/xtc/xtc.c:2766)()

mdtraj/formats/xtc/xtc.pyx in xtc.load_xtc (mdtraj/formats/xtc/xtc.c:2720)()

mdtraj/formats/xtc/xtc.pyx in xtc.XTCTrajectoryFile.read_as_traj (mdtraj/formats/xtc/xtc.c:4579)()

mdtraj/formats/xtc/xtc.pyx in xtc.XTCTrajectoryFile.read (mdtraj/formats/xtc/xtc.c:6037)()

MemoryError:
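
The traceback shows the MemoryError being raised while mdtraj reads the whole XTC file, so a common way around it is to avoid loading every frame at once, either by subsampling (`md.load` accepts a `stride` argument) or by processing the trajectory in chunks with `md.iterload`. The sketch below shows only the chunk-and-concatenate pattern, on stand-in data so that it runs without a trajectory. The helper `process_in_chunks` is hypothetical, and the Barnaba call in the comment is an assumption to check against the installed version.

```python
def process_in_chunks(chunks, per_chunk):
    """Apply `per_chunk` to each chunk and concatenate the per-frame results.

    Only one chunk needs to be held in memory at a time when `chunks`
    is a generator.
    """
    results = []
    for chunk in chunks:
        results.extend(per_chunk(chunk))
    return results


if __name__ == "__main__":
    # Stand-in data: three "chunks" of frames and a per-chunk function that
    # returns one value per frame.
    fake_chunks = iter([[0.1, 0.2], [0.3], [0.4, 0.5]])
    values = process_in_chunks(fake_chunks, lambda frames: [2 * f for f in frames])
    print(len(values))

    # With real data the call might look like this (not executed here):
    #   import mdtraj as md
    #   import barnaba as bb
    #   reference = md.load("md.pdb")
    #   chunks = md.iterload("md.xtc", top="md.pdb", chunk=1000)
    #   ermsd = process_in_chunks(chunks, lambda t: bb.ermsd_traj(reference, t))
    # That `bb.ermsd_traj` accepts trajectory objects in this way is an
    # assumption, not something shown in this thread.
```

Since the plot above already uses every 50th value, loading with a stride of 50 may be the simpler option if the full-resolution histogram is not needed.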

ENM Travis Fail

@giopina , ENM tests fail with Python 2.7 and 3.6 but not 3.5. Could it be related to precision/phase issues?

Modified residues

Hello,

Does Barnaba have the ability to read modified residues? And can it calculate features of these residues, such as eRMSD, RMSD and G-vectors? If yes, how?

Thank you,

Tia

Installation problem

Hi Sandro,

I'm having a problem installing barnaba in python3.9:
Installation initially seems fine:


pip install barnaba
Requirement already satisfied: barnaba in /home/sabine/.local/lib/python3.9/site-packages (0.1.7)
Requirement already satisfied: numpy in /home/sabine/anaconda3/envs/barnaba/lib/python3.9/site-packages (from barnaba) (1.22.3)
Requirement already satisfied: mdtraj in /home/sabine/.local/lib/python3.9/site-packages (from barnaba) (1.9.7)
Requirement already satisfied: future in /home/sabine/anaconda3/envs/barnaba/lib/python3.9/site-packages (from barnaba) (0.18.2)
Requirement already satisfied: scipy in /home/sabine/.local/lib/python3.9/site-packages (from barnaba) (1.8.0)
Requirement already satisfied: astunparse in /home/sabine/.local/lib/python3.9/site-packages (from mdtraj->barnaba) (1.6.3)
Requirement already satisfied: pyparsing in /home/sabine/.local/lib/python3.9/site-packages (from mdtraj->barnaba) (3.0.8)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/sabine/anaconda3/envs/barnaba/lib/python3.9/site-packages (from astunparse->mdtraj->barnaba) (0.37.1)
Requirement already satisfied: six<2.0,>=1.6.1 in /home/sabine/anaconda3/envs/barnaba/lib/python3.9/site-packages (from astunparse->mdtraj->barnaba) (1.16.0)

But when I try to run it, I get an error:


(barnaba) sabine@sabine-Swift-SF314-511:~/bin$ barnaba
Traceback (most recent call last):
  File "/home/sabine/.local/bin/barnaba", line 16, in <module>
    from barnaba import commandline
  File "/home/sabine/.local/lib/python3.9/site-packages/barnaba/__init__.py", line 3, in <module>
    from .functions import *
  File "/home/sabine/.local/lib/python3.9/site-packages/barnaba/functions.py", line 19, in <module>
    import mdtraj as md
  File "/home/sabine/.local/lib/python3.9/site-packages/mdtraj/__init__.py", line 55, in <module>
    from ._lprmsd import lprmsd
ImportError: /home/sabine/.local/lib/python3.9/site-packages/mdtraj/_lprmsd.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv

Do you have any idea how to fix this?

Cheers
Sabine

Torsion values in case of missing residue

Hi,
I am using Barnaba to extract backbone torsion angles from PDB 5YTT (chain B) (image and PDB file attached here).

It has a missing residue; however, barnaba gives values for the alpha torsion of residue U7 and the epsilon torsion of residue G5. These values should be absent.

I think barnaba is treating residues G5 and U7 as residues i and i+1.

Could you please suggest how I can avoid this behavior?

Best,
Mandar Kulkarni
5YTT_B.pdb.txt
torsion_val_missing_residue_RNA
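
Until this is handled in the library, one workaround is to detect gaps in the residue numbering and discard the torsions that span them afterwards. The sketch below relies on the standard definitions: alpha of residue i involves O3' of residue i-1, while epsilon and zeta of residue i involve atoms of residue i+1. The helper `torsions_across_gaps` is hypothetical, assumes a single chain, and is not part of Barnaba.

```python
def torsions_across_gaps(resids):
    """Map positions in `resids` to the backbone torsions that span a gap.

    A gap is any pair of consecutive entries whose residue numbers do not
    differ by exactly 1. Positions are 0-based indices into `resids`.
    """
    affected = {}
    for k in range(len(resids) - 1):
        if resids[k + 1] - resids[k] != 1:
            # The residue before the gap loses epsilon and zeta,
            # the residue after the gap loses alpha.
            affected.setdefault(k, set()).update({"epsilon", "zeta"})
            affected.setdefault(k + 1, set()).add("alpha")
    return affected


if __name__ == "__main__":
    # Residue numbers around the missing residue described above.
    for position, names in sorted(torsions_across_gaps([4, 5, 7, 8]).items()):
        print(position, sorted(names))
```

The returned positions can then be used to set the corresponding entries of the torsion array to NaN before further analysis.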

new release

@sbottaro can you make a new release on pip?

I think there are important fixes on ENM. Tests are failing with 0.1.5 but do work with master. (cc: @giopina)

Thanks!

Giovanni

Test failure

Some tests are failing on my laptop

> ======================================================================
> FAIL: test_dump.test_dump
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/Users/giopina/software/miniconda3/envs/barnaba/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>     self.test(*self.arg)
>   File "/Users/giopina/software/barnaba/test/test_dump.py", line 36, in test_dump
>     assert(filecmp.cmp("%s/dump_01.test.dat" % outdir,"%s/dump_01.test.dat" % refdir)==True)
> AssertionError
> 
> ======================================================================
> FAIL: test_ermsd.test_ermsd_4
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/Users/giopina/software/miniconda3/envs/barnaba/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>     self.test(*self.arg)
>   File "/Users/giopina/software/barnaba/test/test_ermsd.py", line 60, in test_ermsd_4
>     assert(filecmp.cmp("%s/ermsd_04.test.dat" % outdir,"%s/ermsd_04.test.dat" % refdir)==True)
> AssertionError
> 
> ----------------------------------------------------------------------
> 

BUG annotate conversion to dotbracket

Hi,
I found a case where an A-G and a C-U pair end up as basepairs in the dotbracket annotation (1S72). The annotations look fine, so there seems to be a bug in the conversion to dotbracket.
Cheers
Sabine
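
For comparison, here is a minimal sketch of a conversion that filters pairs by base identity before writing brackets. It is not Barnaba's implementation. The helper `to_dotbracket`, the choice of which pairs count as canonical (Watson-Crick plus G-U), and the restriction to nested pairs are all assumptions made for illustration.

```python
# Base combinations kept in the dot-bracket string (an assumption).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def to_dotbracket(sequence, pairs, canonical_only=True):
    """Convert 0-based (i, j) pairs into a dot-bracket string.

    Pairs are assumed to be nested (no pseudoknots). Each position is used
    at most once: a pair touching an already bracketed position is skipped.
    """
    symbols = ["."] * len(sequence)
    for i, j in sorted((min(p), max(p)) for p in pairs):
        if canonical_only and sequence[i] + sequence[j] not in CANONICAL:
            continue
        if symbols[i] != "." or symbols[j] != ".":
            continue
        symbols[i], symbols[j] = "(", ")"
    return "".join(symbols)


if __name__ == "__main__":
    # The A-G pair (2, 6) is dropped when only canonical pairs are kept.
    print(to_dotbracket("GAAAAAGUC", [(0, 8), (1, 7), (2, 6)]))
```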
