disopred's Introduction

DISOPRED RELEASE NOTES
======================

DISOPRED Version 3.1

Copyright 2014 D. Jones, D. Cozzetto & J. Ward. All Rights Reserved.

Here are some brief notes on using the DISOPRED 3.1 software.

Please see the LICENSE file for the license terms for the software.
Basically it is free to academic users as long as you don't want to sell
the software or, for example, store the results obtained with it in a
database and then try to sell the database. If you do wish to sell the
software or use it commercially, then please contact [email protected]
to discuss licensing terms.


What is new in DISOPRED3
========================

DISOPRED3 represents the latest release of our successful machine-learning
based approach to the detection of intrinsically disordered regions. The
method was originally trained on evolutionarily conserved sequence features
of disordered regions from missing residues in high-resolution X-ray structures.
DISOPRED2 mainly addressed the marked class imbalance between ordered and
disordered amino acids as well as the different sequence patterns associated
with terminal and internal disordered regions using SVMs.

DISOPRED3 extends the previous architecture with two independent predictors of
intrinsic disorder - a neural network and a nearest neighbour classifier - which
were trained to identify long intrinsically disordered regions using data from
the PDB and DisProt databases. The intermediate results are integrated by an
additional neural network.

To provide insights into the biological roles of proteins, DISOPRED3 also predicts
protein binding sites within disordered regions using an SVM that examines patterns
of evolutionary sequence conservation, positional information and amino acid
composition of putative disordered regions.


Installing DISOPRED3
====================

The program is supplied in source code form - some components must be
compiled before they can be used. On a standard Unix or Linux system,
DISOPRED can be compiled and installed from the src/ directory with:

make clean

make

make install

The process will place the executables in the DISOPRED bin/ directory, where
the script "run_disopred.pl" expects to find them. A copy of the svm-predict
program from the LIBSVM package Version 3.17 is also included for the prediction
of protein binding sites within disordered regions. Full details of LIBSVM,
including the licence, can be found at:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

You will additionally need to download the DISOPRED library into a directory
called dso_lib. From the main disopred directory, run:

wget http://bioinfadmin.cs.ucl.ac.uk/downloads/DISOPRED/dso_lib.tar.gz

tar -zxvf dso_lib.tar.gz

You must also set the DSO_LIB_PATH environment variable to the path of the
newly untarred dso_lib/ directory.
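
As a minimal sketch, assuming a POSIX shell and that the archive was unpacked
in the current directory, the variable could be set as shown here. The trailing
slash mirrors the value that run_disopred.pl builds for the same variable.

```shell
# Assumes the current directory is the main disopred directory,
# i.e. the one in which dso_lib.tar.gz was unpacked.
DSO_LIB_PATH="$(pwd)/dso_lib/"
export DSO_LIB_PATH
```

Adding these lines to a shell start-up file, with an absolute path in place of
$(pwd), would make the setting persistent.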

Configuring DISOPRED3
=====================

A simple Perl script called "run_disopred.pl" predicts intrinsically
disordered regions and the protein binding sites within them. The script assumes
that the NCBI BLAST binaries and appropriate sequence databases have been
installed locally. Their locations are specified through the variables:

my $NCBI_DIR = "/home/bin/blast-2.2.26/bin/"; # directory where the BLAST binaries are
my $SEQ_DB   = "/home/uniref/uniref90"; # the path to the formatdb'ed sequence database

The NCBI executables can be obtained from ftp://ftp.ncbi.nih.gov/blast

Suitable sequence data banks are available from ftp://ftp.ncbi.nih.gov/blast/db/
and ftp://ftp.ebi.ac.uk/pub/databases/uniprot/
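
Before running the script, it can help to confirm that the directory assigned
to $NCBI_DIR really holds the classic BLAST programs. The sketch below assumes
that the script needs blastpgp (the PSI-BLAST program of the classic package)
and makemat; check_blast_dir is a hypothetical helper name and is not part of
DISOPRED.

```shell
# check_blast_dir DIR
# Hypothetical helper: succeed only if DIR contains executable copies of
# the classic BLAST programs this sketch assumes DISOPRED needs.
check_blast_dir() {
    for exe in blastpgp makemat; do
        if [ ! -x "$1/$exe" ]; then
            echo "missing or not executable: $1/$exe" >&2
            return 1
        fi
    done
    return 0
}
```

With the default configuration above, the call would be
check_blast_dir /home/bin/blast-2.2.26/bin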

********************       IMPORTANT NOTE ON BLAST+       *****************
NCBI are encouraging users to switch over from the classic BLAST package
to the new BLAST+ package. On the one hand this is a cleaner and nicer
version of BLAST, but on the other hand, it omits some useful features.
In particular, BLAST+ no longer offers the facility to extract more precise
PSSM scores from checkpoint files in a "supported" way (i.e. using the
makemat utility for this purpose).

Eventually, we will probably switch over to BLAST+ as the preferred way of
searching for similar sequences, but for the time being no interface to
BLAST+ is provided.
***************************************************************************

The Perl script also expects to find the directories bin/, data/ and dso_lib/ alongside it.
If you need to move these directories elsewhere, please set the following variables
to the new full paths:

my $EXE_DIR = abs_path(join '/', dirname($0), "bin"); # the path of the bin directory
my $DATA_DIR = abs_path(join '/', dirname($0),"data"); # the path of the data directory
$ENV{DSO_LIB_PATH} = join '', abs_path("./dso_lib"), '/'; # the path of the library directory used by the nearest neighbour classifier

Running DISOPRED3
=================

The script "run_disopred.pl" requires as input a text file containing one
amino acid sequence for which predictions are sought. A few parameters can
be tuned from inside the script, including the PSI-BLAST search options and
the DISOPRED2 SVM specificity level. During the execution, a number of
temporary files will be generated (e.g. PSI-BLAST output files, the PSSM file,
the intermediate disordered residue prediction files, the input file to
svm-predict), which are identified by concatenating the input file name, the
process id of the Perl job and the numeric identifier for the host. These
files are removed after the final output has been generated in the same
directory as the input.
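
Because the script expects exactly one sequence per input file, a quick check
before launching a long PSI-BLAST search may save time. The sketch below
assumes that the input is FASTA-formatted, as in examples/example.fasta, and
counts header lines; has_single_sequence is a hypothetical helper name.

```shell
# has_single_sequence FILE
# Hypothetical helper: succeed only if FILE holds exactly one FASTA record,
# counted as lines starting with '>'. Assumes the input is FASTA-formatted.
has_single_sequence() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the count.
    n=$(grep -c '^>' "$1" || true)
    [ "$n" -eq 1 ]
}
```

For example: has_single_sequence examples/example.fasta && ./run_disopred.pl examples/example.fasta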

Here is the output of a successful DISOPRED run for the file examples/example.fasta:

./run_disopred.pl examples/example.fasta

Running PSI-BLAST search ...

Generating PSSM ...

Predicting disorder with DISOPRED2 ...

Running neural network classifier ...

Running nearest neighbour classifier ...

Combining disordered residue predictions ...

Predicting protein binding residues within disordered regions ...

Cleaning up ...

Finished

Disordered residue predictions in absolute-path/examples/example.diso

Protein binding disordered residue predictions in absolute-path/examples/example.pbdat


OUTPUT FILE FORMAT
==================

Results are saved in plain ASCII text format. Disordered region predictions are presented
in tabular format with four fields on each line representing the amino acid position, the
residue single letter code, the order/disorder assignment code, and the corresponding
confidence level. Ordered residues are marked with dots (.) and have scores in [0.00, 0.49];
disordered residues are labelled with asterisks (*) and are scored in [0.50, 1.00].
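
The four-field layout lends itself to simple post-processing. As a sketch,
assuming whitespace-separated fields and consecutive residue numbering, the
following function collapses the per-residue assignments into disordered
regions. The name disordered_regions is hypothetical, and skipping lines that
start with '#' is a precaution in case the file carries comment lines.

```shell
# disordered_regions FILE
# Print each run of consecutive residues marked '*' as START-END.
# Lines whose first field starts with '#', or that lack four fields,
# are skipped (assumption: the file may carry comment or header lines).
disordered_regions() {
    awk '
        $1 ~ /^#/ || NF != 4 { next }
        $3 == "*" { if (!inrun) { s = $1; inrun = 1 }; e = $1; next }
        inrun     { print s "-" e; inrun = 0 }
        END       { if (inrun) print s "-" e }
    ' "$1"
}
```

For example: disordered_regions examples/example.diso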

Putative disordered protein binding sites are annotated in a similar way, with one row for
each amino acid and four fields representing the sequence position, the single letter code,
the assignment code, and the confidence level. Ordered residues are labelled with dots (.)
and have no score associated, so the value in last field is "NA". Protein-binding disordered
residues are indicated by carets (^) and their confidence scores are in [0.50, 1.00], while
all other unstructured positions are tagged with dashes (-) and are scored in [0.00, 0.49].
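
A quick tally of the three assignment codes gives an overview of a binding-site
prediction file. The sketch below makes the same assumptions about field
separation and possible comment lines; summarise_pbdat is a hypothetical helper
name and the output labels are chosen for illustration only.

```shell
# summarise_pbdat FILE
# Count residues by assignment code:
#   '.' ordered, '^' protein-binding disordered, '-' other disordered.
# Lines whose first field starts with '#', or that lack four fields,
# are skipped (assumption: the file may carry comment or header lines).
summarise_pbdat() {
    awk '
        $1 ~ /^#/ || NF != 4 { next }
        $3 == "." { ordered++ }
        $3 == "^" { binding++ }
        $3 == "-" { other++ }
        END { printf "ordered=%d binding=%d other_disordered=%d\n", ordered, binding, other }
    ' "$1"
}
```

For example: summarise_pbdat examples/example.pbdat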


Citing DISOPRED3
================

Please cite:

Jones, D.T. and Cozzetto, D. (2014) DISOPRED3: Precise disordered region
predictions with annotated protein binding activity, Bioinformatics

disopred's Issues

docker image and code license

I was wondering whether a public Docker image of DISOPRED exists, or whether the code license allows building a Docker image and uploading it to a public registry such as quay.io or Docker Hub?

Thank you !
Eliza

Failing for long sequences

run_disopred.pl fails for the three titin variants A2ASS6, E9Q8K5, and E9Q8N1 (uniprot accession codes). These are very long sequences, >30000 aa. No non-standard amino acids can be found in the sequences. See the output below.

E9Q8N1
Running PSI-BLAST search ...

Generating PSSM ...

Predicting disorder with DISOPRED2 ...

/domus/h1/marklund/src/disopred3/disopred/bin/disopred2 /domus/h1/marklund/calc_disorder/wd/E9Q8N1 /domus/h1/marklund/calc_disorder/wd/E9Q8N1_26496_12accd0b.mtx /domus/h1/marklund/src/disopred3/disopred/data/ 5 
mv /domus/h1/marklund/calc_disorder/wd/E9Q8N1.diso /domus/h1/marklund/calc_disorder/wd/E9Q8N1.diso2 
Running neural network classifier ...

/domus/h1/marklund/src/disopred3/disopred/bin/diso_neu_net /domus/h1/marklund/src/disopred3/disopred/data/weights.dat.nmr_nonpdb /domus/h1/marklund/calc_disorder/wd/E9Q8N1_26496_12accd0b.mtx > /domus/h1/marklund/calc_disorder/wd/E9Q8N1.nndiso 
Running nearest neighbour classifier ...

/domus/h1/marklund/src/disopred3/disopred/bin/diso_neighb /domus/h1/marklund/calc_disorder/wd/E9Q8N1_26496_12accd0b.mtx /domus/h1/marklund/src/disopred3/disopred/data/dso.lst > /domus/h1/marklund/calc_disorder/wd/E9Q8N1.dnb 
Combining disordered residue predictions ...

/domus/h1/marklund/src/disopred3/disopred/bin/combine /domus/h1/marklund/src/disopred3/disopred/data/weights_comb.dat /domus/h1/marklund/calc_disorder/wd/E9Q8N1.diso2 /domus/h1/marklund/calc_disorder/wd/E9Q8N1.nndiso /domus/h1/marklund/calc_disorder/wd/E9Q8N1.dnb > /domus/h1/marklund/calc_disorder/wd/E9Q8N1.diso 
[/home/marklund/src/disopred3/disopred/run_disopred.pl] ERROR: Different numbers of elements in the profile data structure and the array of disordered region lengths

My Perl skills are just too weak to figure out what goes wrong, but the sequence length seems like a likely culprit. All other >55000 proteins in my dataset worked fine.

Replace svm-predict

Hi.

Can I replace the LIBSVM package Version 3.17 by the new version 3.23, released on July 15, 2018?

My best regards.

Duca.
