License: GNU General Public License v3.0

lociPARSE: a locality-aware invariant point attention model for scoring RNA 3D structures

by Sumit Tarafder and Debswapna Bhattacharya

[bioRxiv] [pdf]

Codebase for our locality-aware invariant Point Attention-based RNA ScorEr (lociPARSE).

Installation

pip install lociPARSE

Or

git clone https://github.com/Bhattacharya-Lab/lociPARSE.git
cd lociPARSE
pip install .

Typical installation time is less than a minute on a 64-bit Linux system.

Usage

Instructions for running lociPARSE:

from lociPARSE import lociparse
lp = lociparse()
score = lp.score("R1108.pdb")

Additional functionality

score.pMoL.show()        # Returns the pMoL value
score.pNuL.show()        # Returns a list of pNuL values
score.pNuL.show(1)       # Returns the pNuL value of the 1st nucleotide
score.save("score.txt")  # Saves the scores to "score.txt"

  1. Given an RNA PDB file "R1108.pdb" as input, lociPARSE predicts both the molecular-level lDDT (pMoL) score and the nucleotide-wise lDDT (pNuL) scores.

  2. Use the show() function to print the pMoL or pNuL values.

  3. Use save() to write the output to a filename of your choice (e.g., "score.txt"). The first line shows the pMoL score. Each subsequent line has two columns: column 1 is the nucleotide index in the PDB file and column 2 is the predicted nucleotide-wise lDDT (pNuL) score. A parsing sketch is shown below.

Inference for a typical RNA structure (~70 nucleotides) takes a few seconds.
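
The saved file can be parsed with a few lines of standard Python for downstream analysis. The sketch below only assumes the layout described above; the filename "score.txt" and all variable names are illustrative and not part of the lociPARSE API.

# Minimal parsing sketch for a saved lociPARSE score file, assuming the
# layout described above: line 1 holds the pMoL score and every following
# line holds "<nucleotide index> <pNuL>".
with open("score.txt") as f:
    rows = [line.split() for line in f if line.strip()]

pmol = float(rows[0][-1])  # pMoL assumed to be the last token on line 1
pnul = {int(idx): float(val) for idx, val in rows[1:]}  # index -> pNuL

print(f"pMoL: {pmol:.3f}")
print(f"Mean pNuL over {len(pnul)} nucleotides: {sum(pnul.values()) / len(pnul):.3f}")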

Datasets

  • The lists of IDs used in our training set, test sets, and the validation set for the ablation study are available here.
  • The training set and the test set of 30 independent RNAs were taken from trRosettaRNA.
  • CASP15 experimental structures and all submitted predictions were downloaded from CASP15.
  • The set of 60 non-redundant RNA targets (TS60) used for hyperparameter optimization was curated in-house. See https://doi.org/10.1093/biomethods/bpae047 for more details.

Training and evaluation materials

If you want to train or evaluate lociPARSE, please follow these initial steps:

  • Download the necessary materials from here and place them in the root directory (/lociPARSE)

    wget https://zenodo.org/records/12729167/files/Materials.tar.gz
    
  • Extract the Materials.tar.gz archive

    tar -xvzf Materials.tar.gz --strip-components=1
    
  • Make sure you have installed a torch version compatible with the CUDA version on your machine for GPU training (see https://pytorch.org/get-started/locally/ for details). A quick check is sketched below.
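
    A minimal check, using only standard PyTorch calls (nothing lociPARSE-specific), to confirm that the installed torch build can see your CUDA device:

    # Verify that the installed PyTorch build detects a CUDA-capable GPU.
    import torch

    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA version:", torch.version.cuda)
        print("Device:", torch.cuda.get_device_name(0))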

Training lociPARSE

If you wish to train lociPARSE from scratch on our training set, please follow these steps:

  • Download our training dataset Train.tar.gz from here and place it inside the Input/Dataset folder.

  • Extract the training dataset

    tar -xzvf Train.tar.gz
    
  • Run the following command to train the model

    chmod a+x lociPARSE_train.sh && ./lociPARSE_train.sh > log.txt
    

    It takes approximately 16 hours to finish feature generation and 50 epochs of training on a single A100 GPU.

  • The best model (selected by validation loss) will be saved inside the Model folder as "QAmodel_retrained.pt".

Evaluation of lociPARSE

If you want to reproduce the results reported in the paper from the provided predictions, follow these steps:

  • To generate Tables 1-6, please run the following commands one by one.

    cd Evaluate
    python3 QA_eval.py Test30_CASP15 0
    python3 QA_eval.py ARES_benchmark2 0
    
  • You will find the corresponding results inside the Evaluate/Results folder.

  • To generate Supplementary Figures S1-S2, please run the following commands.

     cd Evaluate
     python3 draw.py
    
  • The generated figures will be placed inside the Evaluate/Figures folder.

If you want to predict the scores with lociPARSE from scratch and re-evaluate them, follow these steps:

  • Download our test datasets Test.tar.gz and Ares_set.tar.gz from here and place them inside the Input/Dataset folder.

  • Extract the archives

    tar -xzvf Test.tar.gz
    
    tar -xzvf Ares_set.tar.gz
    
  • To predict and evaluate results on our two test sets Test30 and CASP15 (Tables 1-5), please run the following command.

    chmod a+x evaluate.sh && ./evaluate.sh Test30_CASP15 Model/QAmodel_lociPARSE.pt
    
  • To predict and evaluate results on the ARES benchmark set-2 (Table 6), please run the following command. (This will be slow due to the ~76k models in this test set.)

    chmod a+x evaluate.sh && ./evaluate.sh ARES_benchmark2 Model/QAmodel_Ares_set.pt
    
  • You will find the corresponding results inside the Evaluate/Results folder.
