sdfsfdx / e2efold

This project forked from ml4bio/e2efold

PyTorch implementation of "RNA Secondary Structure Prediction By Learning Unrolled Algorithms"

License: MIT License

E2Efold: RNA Secondary Structure Prediction By Learning Unrolled Algorithms

PyTorch implementation of RNA Secondary Structure Prediction By Learning Unrolled Algorithms [1]

[Paper] [Presentation] [Slides] [GaTech news] [Chinese news] [Chinese introduction] [Plain explanation]

Setup

Install the package

The environment that we use is given in environment.yml. To reproduce exactly the same environment, run the following command.

conda env create -f environment.yml

Then navigate to the root of this repository and run the following commands to install the e2efold package.

source activate rna_ss # activate the environment
pip install -e .

Data

Please download the RNA secondary structure data and put all the .tgz files in the /data folder. Then, from inside that folder, run:

tar -xzf rnastralign_all.tgz
tar -xzf rnastralign_all_600.tgz
tar -xzf archiveII_all.tgz

These files contain the processed data. For reference, the code used to preprocess the data is also provided in the /data folder.
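For readers who prefer to script this step, the extraction above can be sketched in Python with the standard tarfile module. The function name extract_archives and its return value are illustrative assumptions, not part of the e2efold package; only the three archive names come from the commands above.

```python
import tarfile
from pathlib import Path

# Archive names taken from the tar commands above.
ARCHIVES = ["rnastralign_all.tgz", "rnastralign_all_600.tgz", "archiveII_all.tgz"]


def extract_archives(data_dir, names=ARCHIVES):
    """Extract each named .tgz found in data_dir; return the names that were missing.

    Hypothetical helper for illustration; it is not shipped with e2efold.
    """
    data_dir = Path(data_dir)
    missing = []
    for name in names:
        archive = data_dir / name
        if not archive.is_file():
            # Record the gap instead of failing, so one call reports all missing files.
            missing.append(name)
            continue
        with tarfile.open(archive, "r:gz") as tar:
            # Equivalent to running `tar -xzf <name>` from inside data_dir.
            tar.extractall(path=data_dir)
    return missing
```

Calling extract_archives("data") from the repository root should leave the same folders as the shell commands; a non-empty return value lists archives that still need to be downloaded.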

Folder structure

Finally the project should have the following folder structure:

e2efold
|___e2efold  # source code
|___e2efold_productive  # productive code for handling new sequences
|___data  # data
    |___archiveII_all
    |___rnastralign_all_600
    |___rnastralign_all
    |___preprocess_archiveii.py  # just as a reference. no need to run.
    |......
|___models_ckpt  # trained models
|___results
|___experiment_archiveii
|___experiment_rnastralign
|___slides_and_articles    # slides and articles related to the project 
...
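A quick way to confirm this layout before running any experiment is to check that each listed folder exists. The following is a minimal sketch; the helper name missing_dirs is an assumption made for illustration, and the folder list is simply copied from the tree above.

```python
from pathlib import Path

# Folders copied from the tree above, relative to the repository root.
EXPECTED_DIRS = [
    "e2efold",
    "e2efold_productive",
    "data/archiveII_all",
    "data/rnastralign_all_600",
    "data/rnastralign_all",
    "models_ckpt",
    "results",
    "experiment_archiveii",
    "experiment_rnastralign",
    "slides_and_articles",
]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected folders that do not exist under repo_root.

    Hypothetical helper for illustration; it is not shipped with e2efold.
    """
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

An empty list from missing_dirs(".") means every folder in the tree is present; any names returned point to a skipped download or extraction step.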

Prediction for user's input sequence

To use our trained model directly to make predictions for any RNA sequence, please refer to the information in the /e2efold_productive folder.

Reproduce experimental results in the paper

To reproduce the experiments in our paper, please refer to the following steps:

Test with trained model

You can download the pretrained models and put the .pt files in the folder /models_ckpt.

RNAStralign

Navigate to the /experiment_rnastralign folder and run the following commands to test the model on the RNAStralign test set:

python e2e_learning_stage3.py -c config.json --test True
python e2e_learning_stage3_rnastralign_all_long.py -c config_long.json --test True

ArchiveII

Navigate to the /experiment_archiveii folder and run the following commands to test the model on the ArchiveII data. Note that the saved model was trained on the RNAStralign database.

# For sequences shorter than 600
python e2e_learning_stage3.py -c config.json

# For sequences from 600 to 1800 (the model does not perform well on long ArchiveII sequences)
python e2e_learning_stage3_rnastralign_all_long.py -c config_long.json

Reproduce the training process or re-train the model on a new dataset

The model is trained on the RNAStralign training set. To reproduce the training process, navigate to the experiment_rnastralign folder and run:

# For sequences shorter than 600
python e2e_learning_stage1.py -c config.json  # pre-train the score network
python e2e_learning_stage3.py -c config.json  # end-to-end training

# For sequences from 600 to 1800 
python e2e_learning_stage1_rnastralign_all_long.py -c config_long.json 
python e2e_learning_stage3_rnastralign_all_long.py -c config_long.json 
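The two stages are meant to run in order, with stage 1 pre-training the score network before the end-to-end stage 3. A small wrapper can make that ordering explicit and stop if the first stage fails. This is only a sketch: the function names training_commands and run_training are assumptions, while the script and config file names are taken from the commands above.

```python
import subprocess
import sys


def training_commands(long_sequences=False):
    """Return the stage 1 and stage 3 commands, in order, as argument lists."""
    if long_sequences:
        # Sequences from 600 to 1800.
        scripts = [
            "e2e_learning_stage1_rnastralign_all_long.py",
            "e2e_learning_stage3_rnastralign_all_long.py",
        ]
        config = "config_long.json"
    else:
        # Sequences shorter than 600.
        scripts = ["e2e_learning_stage1.py", "e2e_learning_stage3.py"]
        config = "config.json"
    return [[sys.executable, script, "-c", config] for script in scripts]


def run_training(workdir, long_sequences=False):
    """Run both stages inside workdir (the folder that holds the training scripts)."""
    for cmd in training_commands(long_sequences):
        # check=True raises CalledProcessError, so stage 3 never starts after a failed stage 1.
        subprocess.run(cmd, cwd=workdir, check=True)
```

Using sys.executable keeps both stages on the interpreter of the currently activated environment rather than whichever python happens to be first on the PATH.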

Given the training logic implemented in the above Python files, you can modify the data generator to re-train the model on other datasets. Our data generator is defined in e2efold/data_generator.py. One option is to define a subclass of the RNASSDataGenerator class.
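The subclassing pattern might look like the sketch below. Everything in it is an assumption made for illustration: the base class here is a stand-in, not the real RNASSDataGenerator, and its constructor arguments, the load_data hook, and the attribute names are hypothetical. Check e2efold/data_generator.py for the actual interface before adapting it. The example input format (sequence plus dot-bracket structure) is also just one possible choice for a new dataset.

```python
class RNASSDataGenerator:
    """Stand-in base class for illustration only.

    This is NOT the class from e2efold/data_generator.py; its constructor
    arguments and the load_data hook are hypothetical.
    """

    def __init__(self, data_dir, split):
        self.data_dir = data_dir
        self.split = split
        self.load_data()

    def load_data(self):
        raise NotImplementedError


def dot_bracket_to_pairs(structure):
    """Convert a dot-bracket string into a sorted list of (i, j) base-pair indices."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            # Each closing bracket pairs with the most recent unmatched opening one.
            pairs.append((stack.pop(), i))
    return sorted(pairs)


class DotBracketGenerator(RNASSDataGenerator):
    """Hypothetical subclass that replaces only the data-loading step."""

    def __init__(self, records, split="train"):
        # records: list of (sequence, dot-bracket structure) tuples held in memory.
        self.records = records
        super().__init__(data_dir=None, split=split)

    def load_data(self):
        self.seqs = [seq for seq, _ in self.records]
        self.pairs = [dot_bracket_to_pairs(db) for _, db in self.records]
        self.len = len(self.seqs)
```

The point of the pattern is that only the loading step changes, so the existing training scripts can keep consuming the generator in the same way, provided the subclass fills in whatever attributes and batching methods the real base class exposes.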

Citation

If you find this library useful in your research, please consider citing:

@article{chen2020rna,
  title={RNA Secondary Structure Prediction By Learning Unrolled Algorithms},
  author={Chen, Xinshi and Li, Yu and Umarov, Ramzan and Gao, Xin and Song, Le},
  journal={arXiv preprint arXiv:2002.05810},
  year={2020}
}

References

[1] Xinshi Chen*, Yu Li*, Ramzan Umarov, Xin Gao, Le Song. "RNA Secondary Structure Prediction By Learning Unrolled Algorithms." In International Conference on Learning Representations. 2020.
