secretsanta's Introduction

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

Please note: SecretSanta is currently undergoing updates. In the meantime, please use it with R 3.4 and the previous releases of Bioconductor (3.5-3.6).

1. Background

The SecretSanta package provides an R interface for the integrative prediction of extracellular proteins that are secreted via classical pathways.

Secretome prediction often involves multiple steps. Typically, it starts with prediction of short signal peptides at the N-terminal end of a protein. Next, it is crucial to ensure the absence of motifs and domains that prevent the protein from being secreted despite the presence of the signal peptide. These sequences include transmembrane domains, short ER lumen retention signals, and mitochondria/plastid targeting signals.

Several command line tools and web interfaces exist to predict individual motifs and domains (SignalP, TargetP, TMHMM, WoLF PSORT, TOPCONS); however, an interface that combines their outputs in a single flexible workflow is lacking.

The SecretSanta package attempts to bridge this gap. It provides wrapper and parser functions around existing command line tools for prediction of signal peptides and protein subcellular localisation. The functions are designed to work together by producing standardized output. This allows the user to pipe results between individual predictors easily to create flexible custom pipelines and also to compare predictions between similar methods.

To speed up processing of large input FASTA files, the initial steps of the pipeline are automatically run in parallel when the number of input sequences exceeds a certain limit.

Taken together, SecretSanta provides a platform to build automated multi-step secretome prediction pipelines that can be applied to large protein sets to facilitate comparison of secretomes across multiple species or under various conditions.
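Since the switch to parallel processing depends on the number of input sequences, it can help to know that number before starting a run. The following is a minimal shell sketch, not part of SecretSanta: the function name `count_fasta_records` and the demo sequences are hypothetical, and the actual limit used by the package is not stated here.

```shell
#!/bin/sh
# Count FASTA records: every record starts with a header line beginning with '>'.
count_fasta_records() {
    # grep -c prints the number of matching lines; it exits with status 1
    # when the count is 0, so '|| true' keeps the function usable under 'set -e'.
    grep -c '^>' "$1" || true
}

# Demo on a small throwaway file (hypothetical sequences).
demo_fasta=$(mktemp)
printf '>seq1\nMKTLLLTLVV\n>seq2\nMASTQN\nLLKK\n>seq3\nMPPR\n' > "$demo_fasta"
echo "records: $(count_fasta_records "$demo_fasta")"
rm -f "$demo_fasta"
```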

Below is a summary of the main functionality:

  • manage_paths(): run tests with the external dependencies to ensure correct installation;
  • signalp(): predict signal peptides with SignalP 2.0, SignalP 3.0 or SignalP 4.1;
  • tmhmm(): predict transmembrane domains with TMHMM 2.0;
  • topcons(): parse predictions of transmembrane domains performed by TOPCONS2;
  • targetp(): predict subcellular localisation with TargetP 1.1;
  • wolfpsort(): predict subcellular localisation with WoLF PSORT;
  • check_khdel(): check C-terminal ER-retention signals;
  • m_slicer(): generate proteins with alternative translation start sites;
  • ask_uniprot(): fetch known subcellular location data from UniProtKB based on UniProt IDs.

Please see the pre-built vignette for detailed documentation and use-case scenarios.

Citation:

If you find SecretSanta useful for your work, please cite the following paper:

Anna Gogleva, Hajk-Georg Drost, Sebastian Schornack. SecretSanta: flexible pipelines for functional secretome prediction. Bioinformatics (2018). https://doi.org/10.1093/bioinformatics/bty088

2. External dependencies

SecretSanta relies on a set of existing command line tools to predict secreted proteins. Please install and configure them according to the instructions below. Due to limitations imposed by the external dependencies, some of the SecretSanta wrapper functions won't work on Windows or Mac, but they are fully functional on Linux. Please note that the signalp() wrapper can work with legacy versions of SignalP (2.0 and 3.0) as well as the most recent version (4.1). If your application does not require multiple SignalP versions, the respective version-specific installation instructions can be skipped.

2.1 Automatic installation of external dependencies

Download the external dependencies:

Place all the tarballs in a dedicated directory and run the installation script inside it.
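Before running the installation script, it may be worth confirming that the tarballs are actually present in the directory. The sketch below is an illustration only: the helper name `missing_tarballs` is hypothetical, and the file names are the ones used in the manual installation steps in section 2.2, which is an assumption about what the installation script expects. WoLF PSORT is not listed because the manual steps clone it from GitHub rather than unpack a tarball.

```shell
#!/bin/sh
# Report which of the expected tarballs are missing from a directory.
# File names are taken from the manual installation steps (section 2.2).
missing_tarballs() {
    dir=$1
    for tarball in \
        signalp-2.0.Linux.tar.Z \
        signalp-3.0.Linux.tar.Z \
        signalp-4.1.Linux.tar.Z \
        targetp-1.1b.Linux.tar.Z \
        tmhmm-2.0c.Linux.tar.gz
    do
        [ -f "$dir/$tarball" ] || echo "$tarball"
    done
}

# Demo: a throwaway directory holding only two of the five files,
# so the three remaining names are printed.
demo_dir=$(mktemp -d)
touch "$demo_dir/signalp-4.1.Linux.tar.Z" "$demo_dir/tmhmm-2.0c.Linux.tar.gz"
missing_tarballs "$demo_dir"
rm -rf "$demo_dir"
```

An empty output means every listed file was found.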

2.2 Manual installation of external dependencies

Tools for prediction of signal peptides and cleavage sites:
  • signalp-2.0

    tar -zxvf signalp-2.0.Linux.tar.Z
    cd signalp-2.0
    • Edit "General settings" at the top of the signalp file. Set value of 'SIGNALP' variable to be path to your signalp-2.0 directory. Other variables usually do not require changes. We will not use plotting functions from signalp, so gnuplot, ppmtogif and ghostview are not required. For more details please check signalp-2.0.readme.
    • Since we want to be able to run different versions of signalp, including the legacy ones, it is important to be able to discriminate between them. R is oblivious to shell aliases, so we will simply rename the signalp script:
    mv signalp signalp2
  • signalp-3.0

    tar -zxvf signalp-3.0.Linux.tar.Z
    cd signalp-3.0
    • Similar to signalp-2.0, edit "General settings" at the top of the signalp file. Set value of 'SIGNALP' variable to be path to your signalp-3.0 directory. Other variables usually do not require changes. For more details please check signalp-3.0.readme.
    • Rename signalp script to avoid further confusion between the versions:
    mv signalp signalp3
  • signalp-4.1 - the most recent version

    tar -zxvf signalp-4.1.Linux.tar.Z
    cd signalp-4.1
    • Edit "General settings" at the top of the signalp file. Set values for 'SIGNALP' and 'outputDir' variables. For more details please check signalp-4.1.readme.
    • Rename signalp script to avoid further confusion between the versions:
    mv signalp signalp4

Tools for prediction of protein subcellular localization:

  • targetp-1.1

    tar -zxvf targetp-1.1b.Linux.tar.Z
    cd targetp-1.1
    • Edit the paragraph labelled "GENERAL SETTINGS, customize" at the top of the targetp file. Set values for 'TARGETP' and 'TMP' variables. Ensure that the path to targetp does not exceed 60 characters, otherwise targetp-1.1 might fail.
  • WoLFPsort

    • Clone WoLFPsort
    git clone https://github.com/fmaguire/WoLFPSort.git
    cd WoLFPSort
    • Copy the binaries from the appropriate platform-specific binary directory ./bin/binByPlatform/binary-? to ./bin/
    • For more details please check the INSTALL file.
    • The most important script we need, runWolfPsortSummary, has a bulky name, so we will rename it to simply wolfpsort for future convenience:
    mv runWolfPsortSummary wolfpsort
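The 60-character limit mentioned for targetp-1.1 is easy to check before editing the settings. The following is a minimal sketch: the helper name `path_within_limit` and the example install location are hypothetical, and whether the limit applies to the directory or to the full path of the script is not specified above, so the sketch checks the full path to be conservative.

```shell
#!/bin/sh
# Return success when the given path is at most 60 characters long,
# the limit quoted above for targetp-1.1. A different limit can be
# passed as the second argument.
path_within_limit() {
    path=$1
    limit=${2:-60}
    [ "${#path}" -le "$limit" ]
}

# Hypothetical install location, used only for illustration.
targetp_script="/home/my_tools/targetp-1.1/targetp"
if path_within_limit "$targetp_script"; then
    echo "OK: path is ${#targetp_script} characters long"
else
    echo "Too long: consider a shorter install location"
fi
```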
Tools for prediction of transmembrane domains:
  • tmhmm-2.0

    tar -zxvf tmhmm-2.0c.Linux.tar.gz
    cd tmhmm-2.0c
    • Set correct path for Perl 5.x in the first line of bin/tmhmm and bin/tmhmmformat.pl scripts.
    • For more details please check the README file.
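The Perl path in the first line of bin/tmhmm and bin/tmhmmformat.pl can be edited by hand or with a small helper. The sketch below is one possible way to do it, under stated assumptions: the function name `set_perl_shebang` is hypothetical, the demo runs on a throwaway file rather than the real scripts, and any interpreter options on the original first line are dropped.

```shell
#!/bin/sh
# Replace the interpreter line (line 1) of a script.
# Uses the Perl found on PATH unless a path is given as the second argument.
set_perl_shebang() {
    script=$1
    perl_path=${2:-$(command -v perl)}
    [ -n "$perl_path" ] || { echo "perl not found on PATH" >&2; return 1; }
    tmp=$(mktemp)
    # Write the new first line, then everything from line 2 onward.
    { printf '#!%s\n' "$perl_path"; tail -n +2 "$script"; } > "$tmp"
    # Overwrite the contents in place so the file keeps its permissions.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Demo on a throwaway file standing in for bin/tmhmm.
demo_script=$(mktemp)
printf '#!/usr/local/bin/perl\nprint "hello\\n";\n' > "$demo_script"
set_perl_shebang "$demo_script" /usr/bin/perl
head -n 1 "$demo_script"
rm -f "$demo_script"
```

For a real installation the function would be called once per script, for example on bin/tmhmm and bin/tmhmmformat.pl inside the tmhmm-2.0c directory.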
Organise access to the external dependencies

The best option is to make all the external dependencies accessible from any location. This requires modifying the $PATH environment variable.

To make the change permanent, edit .profile:

# Open .profile:
gedit ~/.profile

Add a line with all the path exports. In this example all the dependencies are installed in the my_tools directory:

export PATH="/home/my_tools/signalp-4.1:\
/home/my_tools/signalp-2.0:\
/home/my_tools/signalp-3.0:\
/home/my_tools/targetp-1.1:\
/home/my_tools/tmhmm-2.0c/bin:\
/home/my_tools/WoLFPSort/bin:\
$PATH"

Reload .profile:

. ~/.profile

Reboot to make the changes visible to R. If you are using csh or tcsh, edit .login instead of .profile and use the setenv command instead of export.
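Once the PATH has been updated, a quick check from the shell can show whether each tool resolves. This is a minimal sketch and not a replacement for manage_paths(), which tests the dependencies from within R. The helper name `missing_tools` is hypothetical; the executable names follow the renaming steps above (signalp2, signalp3, signalp4, wolfpsort), with targetp and tmhmm assumed to keep their original names.

```shell
#!/bin/sh
# Print every tool name that cannot be resolved on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names follow the renaming steps in the installation instructions.
missing=$(missing_tools signalp2 signalp3 signalp4 targetp wolfpsort tmhmm)
if [ -z "$missing" ]; then
    echo "All external tools found on PATH"
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```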

3. Installation

To install the SecretSanta package:

library("devtools")
install_github("gogleva/SecretSanta")
library("SecretSanta")

Details about individual functions, pipeline assemblies and use-case scenarios are documented in the vignette. For short-form documentation please use:

?SecretSanta

Reporting bugs

Please raise an issue (preferred option) or email [email protected] to report bugs and unexpected behaviour.

secretsanta's Issues

signalp(version = 4) - is broken

Error in names(sp) <- c("gene_id", "Cmax", "Cpos", "Ymax", "Ypos", "Smax", :
'names' attribute [12] must be the same length as the vector [5]

fix cleavage position in signalp2 output

sp2 outputs something weird as Cpos in the NN results: values > 100
so we will use a different line to extract cleavage position:

'Max cleavage site probability:' - should work for both signalp2 and signalp3 outputs

helper functions

isolate helper functions and source all of them (e.g. crop_names)

re-test examples

upd examples, take into account all the new parameters and added functionality

pkgdown fails to update docs

Initialising site -----------------------------------------------------------------------------------
Copying '/home/anna/R/x86_64-pc-linux-gnu-library/3.4/pkgdown/assets/jquery.sticky-kit.min.js'
Copying '/home/anna/R/x86_64-pc-linux-gnu-library/3.4/pkgdown/assets/link.svg'
Copying '/home/anna/R/x86_64-pc-linux-gnu-library/3.4/pkgdown/assets/pkgdown.css'
Copying '/home/anna/R/x86_64-pc-linux-gnu-library/3.4/pkgdown/assets/pkgdown.js'
Building home ---------------------------------------------------------------------------------------
Error in xml2::xml_contents(heading)[[1]] : subscript out of bounds

Installation with tests fails

install('SecretSanta') 

Installing SecretSanta
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL  \
  '/home/anna/anna/Labjournal/SecretSanta' --library='/home/anna/R/x86_64-pc-linux-gnu-library/3.4' --install-tests 

Error: Command failed (127)

parse_signalp

revise examples, current example does not contain 'Signal peptide field'

feedback

  1. Vignette: fix typos and grammar.
  2. Vignette: figure numbering.
  3. Vignette: Fig 1: typo in target
  4. Incorporate installation script
  5. Run mode - not very intuitive argument, make Boolean and piper by default.
  6. Further work: higher level of abstraction for individual functions -> virtual wrapper.

Problem using targetp 2.0c linux

when running the targetp function, when the backend is pointing to targetp v2.0c, the code fails with the message:

flag provided but not defined: -N
Usage of targetp:
...usage statement for targetp...
Error in checkForRemoteErrors(val) :
one node produced an error: 'names' attribute [7] must be the same length as the vector [1]

Is this because SecretSanta will only work with the older version of targetp (v1.1)? I would have just tried that before posting here but I haven't been able to find an old version of targetp. Their website only provides v2.0c as far as I can see.

docker

  • try to put all the dependencies in a docker container
  • licence issue --> ask developers once again

fix citations

vignettes: check that citations are correct and adequately formatted
missing entries?

Error in .wrap_in_length_one_list_like_object

I have been trying to run the example for signalp4 and it seems to run well, except that the result is not returned. I got the same error with targetp.

step1_sp4 <- signalp(inp, version = 4,
organism = 'euk',
run_mode = "starter")

Output=

"Version used ... signalp4
Running signalp locally ...
2 sequences need to be truncated ...
Ok for single processing.
Submitted sequences... 100
Candidate sequences with signal peptides ... 3
Error in .wrap_in_length_one_list_like_object(value, names(x)[[i]], x) :
failed to coerce 'list(value)' to a AAStringSetList object of length 1"

docker (??)

Very nice package. Much appreciated.
There are third-party tools dependencies.
Do you plan to deliver a docker? Containers look like the natural solution for these package architectures.

Bests,
-A

abstract wrapper function

Higher level of abstraction for individual functions -> virtual wrapper/closure -> framework for integration of new tools? Would it make documentation more complex?
