Coder Social home page Coder Social logo

strevol-mpi-mis / rafft Goto Github PK

View Code? Open in Web Editor NEW
4.0 2.0 1.0 85.53 MB

RAFFT: RNA structure and folding dynamics predictions using the fast Fourier transform

License: MIT License

Python 100.00%
rna-secondary-structure rna-folding fast-fourier-transform bioinformatics

rafft's Introduction

RAFFT is a folding tool that builds fast-folding paths for a given sequence. Starting from the completely unfolded structure, it quickly identifies stems with an FFT-based technique. Then, it forms the stem that improves the overall stability. Multiple folding paths can be explored and displayed. Therefore, given a sequence, the user will obtain several structures or folding paths.

Requirements

  • Vienna RNA package (version >= 2.4.17)
  • Numpy, scipy with python 3

Installation

To install RAFFT in your local python library

pip install . --user --upgrade

Then two command line tools will be available: rafft for the folding and rafft_kin for the kinetic analysis.

Alternative Implementation

Another more performant implementation of the core algorithm is available here: rafft-rs This implementation is able to generate output in a compatible format that can be used for the same kinetic analysis using rafft_kin. Please refer to this repository for documentation about differences in usage.

rafft-rs is also added as a submodule to this repository. Use git submodule init after cloning to pull its code.

Usage

For the examples, we used the Coronavirus frameshifting stimulation element obtained from RFAM.

To display only the final structures:

rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU  -ms 5

To display the visited/saved intermediates:

rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU  -ms 5  --traj

The algorithm has two critical parameters:

  • -ms <INT> is the number of saved structures at each folding step (default=1)
  • -n <INT> is the number of positional lag to search for stems (default=100)

The search of stems can be tuned using weights on base pairs types:

  • --GC <FLOAT> GC base pairs weight (default = 3.0)
  • --AU <FLOAT> AU base pairs weight (default = 2.0)
  • --GU <FLOAT> GU base pairs weight (default = 1.0)

Inputs

The input is one sequence in the standard input or a simple text file (it can be in Fasta format).

Outputs

For the trajectory output format: at each step, numbered from 0 to 3, the structures saved are given below the sequence with their stability (computed with Vienna RNA API).

GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU
# ---------0----------
..................................................................................    0.0
# ---------1----------
.....(((((((((((..........))))))))))).............................................  -14.0
..................................((((((((................))))))))................   -6.8
...................(((((.............................................)))))........   -6.4
..................................((((.......)))).................................   -5.5
(((((..............)))))..........................................................   -4.6
# ---------2----------
........((((((((..........))))))))(((((((((((.........))).))))))))................  -23.1
........((((((((..........))))))))((((((((..((........))..))))))))................  -20.9
...................(((((..........(((((((((((.........))).))))))))...)))))........  -18.8
........((((((((..........))))))))((((((((...((....)).....))))))))................  -18.7
.....(((((((((((.((.....)))))))))))))....................((((((.............))))))  -18.2
# ---------3----------
........((((((((.((.....))))))))))(((((((((((.........))).))))))))................  -24.0
........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................  -24.0
........((((((((..........))))))))(((((((((((.........))).))))))))................  -23.1
........((((((((.((.....))))))))))((((((((..((........))..))))))))................  -21.8
........((((((((..........))))))))((((((((..((........))..))))))))................  -20.9

The output of the final structures only is the following.

GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU
........((((((((.((.....))))))))))(((((((((((.........))).))))))))................  -24.0
........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................  -24.0
........((((((((..........))))))))(((((((((((.........))).))))))))................  -23.1
........((((((((.((.....))))))))))((((((((..((........))..))))))))................  -21.8
........((((((((..........))))))))((((((((..((........))..))))))))................  -20.9

Analysis

Fast-paths plot

To create the fast-folding path figures, one can use the utility utility/plot_path.py on rafft output:

It uses VARNA to produce the secondary structure representation, should be download directly from its website

cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 5 --traj > rafft.out
python ../utility/plot_path.py rafft.out -he 500 -wi 900 -rv 1 -o path_5.png

example/path_5.png

With 20 saved structures:

example/path_20.png

Kinetic trajectory

From the above fast-folding graph, one can produce kinetic trajectories. Starting from the completely unfolded structures, it simulates the folding process.

cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 20 --traj > rafft_20.out
rafft_kin rafft_20.out -o kinetic.png --plot -mt 40

example/kinetic.png

The output has the following form

[...]
<structure>                                                                   <population> <Energy> <Structure ID>
.....(((((((((((.((.....)))))))))))))((((((((.........))).)))))...................  0.009 -23.2 44
........((((((((..........))))))))(((((((((((.........))).))))))))................  0.011 -23.1 21
((((((((.......))).)))))(((.....(((((((((((((.........))).))))))))...))......)))..  0.014 -23.6 62
(((((.(((.......))))))))(((.....((((((((((..((........))..))))))))...))......)))..  0.014 -23.6 63
........((((((((.((.....))))))))))(((((((((((.........))).))))))))................  0.049 -24.0 42
........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................  0.049 -24.0 43
(((((.(((.......))))))))(((((.....(((((((((((.........))).))))))))..))))).........  0.063 -24.5 41
(((((.(((.......))))))))(((((.....(((((((((((((....)).))).))))))))..))))).........  0.063 -24.5 61
(((((.(((.......))))))))(((.....(((((((((((((.........))).))))))))....)).....)))..  0.168 -25.1 60
(((((.(((.......))))))))(((.....(((((((((((((.........))).))))))))...))......)))..  0.531 -25.8 59

Folding landscape

From the fast-folding graph, one can also draw a landscape using the multidimensional scaling algorithm to map the structures onto a plan. It tries to preserve as much as possible the base pair distance between structures.

cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 20 --traj > rafft_20.out
python ../utility/surface.py rafft_20.out -o landscape.png

(Initial and minimum energy structure are circled in black)

example/landscape.png

Usage as a package

The folding function and the kinetic function can both be called from the rafft package.

from rafft.rafft import fold
from rafft.rafft_kin import kinetics

seq = "GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU"
final_set_struct, trajectory = fold(seq, max_stack=20, traj=True)
traj_k, times, struct_list, equi_pop = kinetics(trajectory, 40, 32)
for st, nrj, prob, sid in equi_pop[::-1][:10]:
    print(f"{st} {prob:5.2f}")

Reproducibility of the benchmarks

The dataset curated we used for the benchmarks is in benchmarks_results/benchmark_cleaned_all_length.csv.

The benchmark results files (and associated script to produce them) are given in the following table (for details about those results, see the associated reference):

MethodfileNotes
RAFFTrafft_100n_50ms_best_nrj_scores.csv-n 100 -ms 50 (best energy)
rafft_100n_50ms_scores.csv-n 100 -ms 50 (best score)
rafft_200n_200ms_scores.csv-n 200 -ms 200 (best score)
MFEmfe_scores.csvbench_mfe.py
MLmxfold_scores.csvbench_mxfold.py

analysis.org and utils_analysis.py contain the pieces of script used to perform the analysis and the figures.

For the test case, we used the command line given in the Usage section above. Figures were derived from their output.

rafft's People

Contributors

fncnt avatar lemerleau avatar vaiteaopuu avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

hklz

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.