uzh-dqbm-cmi / pridict

Prime editing guide RNA prediction

Home Page: https://pridict.it/

License: MIT License

Python 61.98% Jupyter Notebook 37.92% Rich Text Format 0.10%
crispr-cas9 deep-learning machine-learning prime-editing

pridict's Introduction

📣 📣 📣 Update: Check out PRIDICT2.0 from our updated study here. 📣 📣 📣

PRIDICT: PRIme editing guide RNA preDICTion

For accessing Supplementary Files, click here.

This repository contains a Python package for running trained PRIDICT (PRIme editing guide RNA preDICTion) models. The prieml package includes modules to set up and run PRIDICT models for predicting prime editing efficiency and product purity.

To run PRIDICT online, see our webapp.


Installation using Anaconda (Linux and Mac OS) 🐍

📣 PRIDICT can only be installed on Linux and Mac OS because the ViennaRNA package is not available for Windows 📣

The easiest way to install and manage Python packages across OS platforms is through Anaconda. Once Anaconda is installed, any package (even one not available on an Anaconda channel) can be installed using pip.

  • Install Anaconda.

  • Start a terminal and run:

    # clone PRIDICT repository
    git clone https://github.com/uzh-dqbm-cmi/PRIDICT.git
    # navigate into repository
    cd PRIDICT
    # create conda environment and install dependencies for PRIDICT (only has to be done before first run/install)
    # use pridict_linux for linux machine or pridict_mac for a macbook
    conda env create -f pridict_linux.yml # pridict_mac.yml for macbook
    # note that this step ('Solving environment:') can take a while (sometimes up to 45 min), but should eventually succeed.
    # if it doesn't succeed, try to remove viennarna from the .yml file and install it separately with 
    # conda install -c conda-forge -c bioconda viennarna
    
    
    # activate the created environment
    conda activate pridict
    
    ### ONLY on an M1 (or newer) Mac: additionally run the following conda install command (tensorflow)
    conda install conda-forge::tensorflow
    # optional (only if you encounter an error with libiomp5.dylib on Mac OS):
    pip uninstall numpy
    pip install numpy==1.22.1
    ###
    	
    
    # run desired PRIDICT command (manual or batch mode, described below)
    python pridict_pegRNA_design.py manual --sequence-name seq1 --sequence 'GCCTGGAGGTGTCTGGGTCCCTCCCCCACCCGACTACTTCACTCTCTGTCCTCTCTGCCCAGGAGCCCAGGATGTGCGAGTTCAAGTGGCTACGGCCGA(G/C)GTGCGAGGCCAGCTCGGGGGCACCGTGGAGCTGCCGTGCCACCTGCTGCCACCTGTTCCTGGACTGTACATCTCCCTGGTGACCTGGCAGCGCCCAGATGCACCTGCGAACCACCAGAATGTGGCCGC'
    # results are stored in 'predictions' folder
  • The PRIDICT environment only has to be installed once. Once it is installed, use the following commands to run PRIDICT again:

    # open Terminal/Command Line
    # navigate into repository
    # activate the created environment
    conda activate pridict
    # run desired PRIDICT command (manual or batch mode, described below)
    python pridict_pegRNA_design.py manual --sequence-name seq1 --sequence 'GCCTGGAGGTGTCTGGGTCCCTCCCCCACCCGACTACTTCACTCTCTGTCCTCTCTGCCCAGGAGCCCAGGATGTGCGAGTTCAAGTGGCTACGGCCGA(G/C)GTGCGAGGCCAGCTCGGGGGCACCGTGGAGCTGCCGTGCCACCTGCTGCCACCTGTTCCTGGACTGTACATCTCCCTGGTGACCTGGCAGCGCCCAGATGCACCTGCGAACCACCAGAATGTGGCCGC'
    # results are stored in 'predictions' folder
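
Before the first run it can help to confirm that the activated environment matches the requirements above. The following is a minimal sketch, not part of PRIDICT: the helper names are hypothetical, and it assumes that the ViennaRNA Python bindings are importable as `RNA` and that TensorFlow is importable as `tensorflow`.

```python
import importlib.util
import platform


def platform_supported(system_name=None):
    """Return True for Linux and Mac OS, the two platforms listed above."""
    # platform.system() reports "Darwin" on Mac OS.
    name = system_name or platform.system()
    return name in ("Linux", "Darwin")


def missing_modules(module_names=("RNA", "tensorflow")):
    """Return the module names that cannot be found in the active environment.

    The default names are assumptions about how the dependencies are imported.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

Calling `platform_supported()` and `missing_modules()` inside the activated `pridict` environment gives a quick indication of whether the installation step needs to be repeated.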

Running PRIDICT in 'manual' mode:

Required:

  • --sequence-name: name of the sequence (i.e. a unique id for the sequence)
  • --sequence: target sequence to edit, in quotes (format: "xxxxxxxxx(a/g)xxxxxxxxxx"; at least 100 bases upstream and downstream of the parentheses are needed; put unchanged edit-flanking bases outside of the parentheses, e.g. xxxT(a/g)Cxxx instead of xxx(TAC/TGC)xxx)
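
The format rules for --sequence can be checked before PRIDICT is started. The sketch below is a hypothetical helper, not part of PRIDICT; it only covers the "(original/edited)" replacement notation described above and assumes the sequence consists of A, C, G and T.

```python
import re

# Hypothetical helper: validates only the "(original/edited)" notation.
EDIT_PATTERN = re.compile(
    r"^([ACGT]*)\(([ACGT]+)/([ACGT]+)\)([ACGT]*)$", re.IGNORECASE
)


def check_edit_sequence(sequence, min_flank=100):
    """Return a list of format problems; an empty list means none were found."""
    match = EDIT_PATTERN.match(sequence.strip())
    if match is None:
        return ["expected bases, one '(original/edited)' group, then bases"]
    upstream, original, edited, downstream = match.groups()
    problems = []
    if len(upstream) < min_flank:
        problems.append(f"only {len(upstream)} bases upstream, need {min_flank}")
    if len(downstream) < min_flank:
        problems.append(f"only {len(downstream)} bases downstream, need {min_flank}")
    # Bases shared at either end of the edit belong outside the parentheses.
    first_same = original[0].upper() == edited[0].upper()
    last_same = original[-1].upper() == edited[-1].upper()
    if first_same or last_same:
        problems.append("unchanged bases inside parentheses; move them to the flanks")
    return problems
```

For example, `check_edit_sequence("A" * 100 + "(G/C)" + "T" * 100)` returns an empty list, while a sequence written as `...(TAC/TGC)...` is flagged because T and C are unchanged.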

Optional:

  • --output-dir: output directory where results are dumped on disk (default: ./predictions; directory must already exist before running)
  • --use-5folds: Use all 5-folds trained models. Default is to use fold-1 model
  • --cores: Number of cores to use for multiprocessing. Default value 0 uses all available cores.
  • --nicking: Additionally, design nicking guides for edit (PE3) with DeepSpCas9 prediction.
  • --ngsprimer: Additionally, design NGS primers for edit based on Primer3 design.

    python pridict_pegRNA_design.py manual --sequence-name seq1 --sequence 'GCCTGGAGGTGTCTGGGTCCCTCCCCCACCCGACTACTTCACTCTCTGTCCTCTCTGCCCAGGAGCCCAGGATGTGCGAGTTCAAGTGGCTACGGCCGA(G/C)GTGCGAGGCCAGCTCGGGGGCACCGTGGAGCTGCCGTGCCACCTGCTGCCACCTGTTCCTGGACTGTACATCTCCCTGGTGACCTGGCAGCGCCCAGATGCACCTGCGAACCACCAGAATGTGGCCGC'
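
Because the --output-dir directory must already exist, a small wrapper can create it and assemble the manual-mode command in one step. This is a sketch under stated assumptions: `build_manual_command` is a hypothetical helper, it uses only the flags documented above, and it assumes the command is later executed from the repository root.

```python
import os


def build_manual_command(sequence_name, sequence, output_dir="./predictions"):
    """Create the output directory and return the manual-mode argument list."""
    # The directory has to exist before PRIDICT runs, so create it here.
    os.makedirs(output_dir, exist_ok=True)
    return [
        "python", "pridict_pegRNA_design.py", "manual",
        "--sequence-name", sequence_name,
        "--sequence", sequence,
        "--output-dir", output_dir,
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)`; passing the arguments as a list avoids shell quoting issues with the parentheses in the sequence.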

Running in batch mode:

Required:

  • --input-fname: input file name - name of csv file that has two columns [editseq, sequence_name]. See batch_template.csv in the ./input folder

Optional:

  • --input-dir: directory where the input csv file is found on disk
  • --output-dir: directory on disk where to dump results (default: ./predictions)
  • --output-fname: output filename used for the saved results
  • --use-5folds: Use all 5-folds trained models. Default is to use fold-1 model
  • --cores: Number of cores to use for multiprocessing. Default value 0 uses all available cores.
  • --nicking: Additionally, design nicking guides for edit (PE3) with DeepSpCas9 prediction.
  • --ngsprimer: Additionally, design NGS primers for edit based on Primer3 design.

    python pridict_pegRNA_design.py batch --input-fname batch_example_file.csv --output-fname batchseqs
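
A batch input file with the two columns named above can be generated programmatically. The following is a minimal sketch; `write_batch_csv` is a hypothetical helper, and the column order is an assumption, so compare the result with batch_template.csv before running PRIDICT on it.

```python
import csv


def write_batch_csv(records, path):
    """Write (sequence_name, editseq) pairs to a csv file for batch mode."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["editseq", "sequence_name"])
        writer.writeheader()
        for sequence_name, editseq in records:
            writer.writerow({"editseq": editseq, "sequence_name": sequence_name})
```

Each `editseq` value uses the same "xxxxxxxxx(a/g)xxxxxxxxxx" format as the --sequence argument in manual mode, and each `sequence_name` should be unique.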

Citation

If you find our work useful in your research, please cite the following paper:

@article {Mathis et al.,
author = {Mathis, Nicolas and Allam, Ahmed and Kissling, Lucas and Marquart, Kim Fabiano and Schmidheini, Lukas and Solari, Cristina and Balázs, Zsolt and Krauthammer, Michael and Schwank, Gerald},
title = {Predicting prime editing efficiency and product purity by deep learning},
year = {2023},
doi = {10.1038/s41587-022-01613-7},
URL = { https://www.nature.com/articles/s41587-022-01613-7 },
journal = {Nature Biotechnology}
}

pridict's People

Contributors

lhentges, mathinic, orisenbazuru

pridict's Issues

How to specify a genomic alteration caused by a concurrent deletion and insertion.

Hi,

I've encountered an issue with pegRNA design for SFTPB121ins, as this specific mutation is caused by a concurrent deletion and insertion at the same position. I have provided the sequence below, including 110 bp flanking regions. The aim is to correct the disease variant "gaa" to the wild-type "c".

GGCCAAGGAGGCCATTTTCCAGGACACGATGAGGAAGTTCCTGGAGCAGGAGTGCAACGTCCTCCCCTTGAAGCTGCTCATGCCCCAGTGCAACCAAGTGCTTGACGACTACTTC(gaa/c)CCCTGGTCATCGACTACTTCCAGAACCAGACTGACTCAAACGGCATCTGTATGCACCTGGGCCTGTGCAAATCCCGGCAGCCAGAGCCAGAGCAGGAGCCAGGGATGTCA

I am unfortunately unable to queue the above as "replacement".

Any easy fix for this type of mutation?

Thanks
Jakob

output dir not functioning

Hey there again. I got one more for you.

If I pass --output-dir in my call, it doesn't write the results to that location on my computer. It also doesn't create a folder in the directory I am in. Maybe I am misunderstanding how this is supposed to be used?

Error Reporting for Batch mode

I am attempting to run PRIDICT in batch mode, and keep getting exceptions but I am not able to trace the error itself. I have attached my csv as an example... with this command
python pridict_pegRNA_design.py batch --input-fname design.csv --use_5folds --nicking --combine_results

design.csv

Calculating features took 19.3 seconds to run.
Deep model took 79.3 seconds to run.
-- Exception occured --
Length of values (2290) does not match length of index (459)
1it [00:28, 28.12s/it]
1it [00:07,  7.46s/it]
1it [00:28, 28.56s/it]

Calculating features took 18.8 seconds to run.
Deep model took 84.2 seconds to run.
-- Exception occured --
Length of values (2289) does not match length of index (459)
1it [00:02,  2.46s/it]

Calculating features took 18.9 seconds to run.
Deep model took 86.2 seconds to run.
-- Exception occured --
Length of values (2292) does not match length of index (459)
1it [00:02,  2.95s/it]
1it [00:01,  1.80s/it]

Calculating features took 29.7 seconds to run.
Deep model took 77.4 seconds to run.
-- Exception occured --
Length of values (3056) does not match length of index (612)
<<< joined row computation process
<<< joined row computation process
<<< joined row computation process
<<< joined row computation process
<<< joined row computation process
***
 Error :( Check your input format is compatible with PRIDICT! More information in input box on https://pridict.it/ ...
***

Question regarding PRIDICT Sequences Format and Source Code for Retraining

I am currently working on reproducing the results of the following work: 'Predicting the efficiency of prime editing guide RNAs in human cells.' To run the PRIDICT tool, the format of the sequence used as input should have the following characteristics: 'xxxxxxxxx(a/g)xxxxxxxxxx.' A minimum of 100 bases up and downstream of the brackets are needed. Unchanged edit-flanking bases should be placed outside of the brackets (e.g., xxxT(a/g)Cxxx instead of xxx(TAC/TGC)xxx).

I have noticed that in library 2 of the article's datasets, the PRIDICT sequence format given (Excel sheet) does not follow this criterion. I would like to know whether the sequences are extended during the training process. If they are, would it be possible to send me the source code to retrain the tool?

Thank you for your time!

poly-T filtering

Hi!
How does your model deal with poly-T stretches, does it filter them out or are the guides merely scored poorly?

Error running test case after conda install - tensorflow libflatbuffers.so.2

Installation using a fresh conda env, using the pridict_linux.yml, on Ubuntu cluster

python pridict_pegRNA_design.py manual --sequence-name seq1 --sequence 'GCCTGGAGGTGTCTGGGTCCCTCCCCCACCCGACTACTTCACTCTCTGTCCTCTCTGCCCAGGAGCCCAGGATGTGCGAGTTCAAGTGGCTACGGCCGA(G/C)GTGCGAGGCCAGCTCGGGGGCACCGTGGAGCTGCCGTGCCACCTGCTGCCACCTGTTCCTGGACTGTACATCTCCCTGGTGACCTGGCAGCGCCCAGATGCACCTGCGAACCACCAGAATGTGGCCGC'

File ".../lib/python3.10/site-packages/tensorflow/python/pywrap_tensorflow.py", line 62, in
from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: libflatbuffers.so.2: cannot open shared object file: No such file or directory
from tensorflow.python._pywrap_tensorflow_internal import *
