LTRpred(ict)

A bioinformatics pipeline for meta-genome scale functional de novo annotation of LTR retrotransposons

Due to their enormous contribution to genome structure and genome evolution, transposable elements allow us to study fundamental mechanisms of phenotypic adaptation, diversification, and evolution. In particular, understanding how the genetic regulatory machinery recognizes and regulates transposable elements will enable us to systematically identify the key players and key processes that drive niche adaptation and species diversification on the genetic level.

The LTRpred pipeline aims to provide an integrated software framework for predicting potentially functional LTR retrotransposons in any genomic sequence of interest. First, LTRpred retrieves de novo annotations of retrotransposons via LTRharvest and LTRdigest. Second, it screens, filters, and annotates those predictions to identify potentially functional elements.

LTR transposons have the capacity to move to new sites in genomes through a copy-and-paste mechanism and by doing so are able to contribute generatively to genome evolution and environmental sensing on the genetic level. Hence, predicting the presence of LTR transposons within genomes as well as their capacity to perform this copy-and-paste strategy enables us to quantify the extent to which transposons shape the adaptation and evolution of life in general.

In particular, the following analyses can be performed with LTRpred:

De novo prediction and annotation

  • de novo prediction of LTR retrotransposons (nested, overlapping, or pure template) using LTRharvest and LTRdigest
  • annotation of predicted LTR retrotransposons using Dfam or Repbase as reference
  • solo LTR prediction based on specialized BLAST searches
  • LTR retrotransposon family clustering using vsearch
  • open reading frame prediction in LTR retrotransposons using usearch
  • age estimation of predicted LTR retrotransposons in Mya (not yet implemented)
  • CHH, CHG, CG, ... content quantification in predicted LTR retrotransposons
  • filtering for (potentially) functional LTR retrotransposons
  • quality assessment of input genomes used to predict LTR retrotransposons
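
Age estimation is listed above as not yet implemented. For orientation only, a common approach dates an element from the divergence between its two LTRs, which are identical at the time of insertion. The following is a minimal base R sketch of that idea and is not LTRpred functionality: the function name `ltr_age_mya`, its arguments, and the default substitution rate are assumptions made for illustration, and the rate in particular should be chosen for the lineage under study.

```r
# Sketch only: NOT part of LTRpred. Function and argument names are hypothetical.
# Estimate the insertion age of an element from the identity of its two LTRs.
ltr_age_mya <- function(ltr_similarity, subst_rate = 1.3e-8) {
  # ltr_similarity: percent identity between the 5' and 3' LTR (0-100)
  # subst_rate: substitutions per site per year (assumed placeholder value)
  p <- 1 - ltr_similarity / 100
  stopifnot(all(p >= 0), all(p < 0.75))   # Jukes-Cantor is undefined for p >= 0.75
  k <- -0.75 * log(1 - (4 / 3) * p)       # Jukes-Cantor corrected distance
  k / (2 * subst_rate) / 1e6              # both LTRs diverge, hence the factor 2
}

ages <- ltr_age_mya(c(100, 98, 90))
print(round(ages, 2))
```

Identical LTRs yield an age of zero, and the estimate grows as the two LTRs diverge. The result scales inversely with the assumed substitution rate, so estimates are only comparable when the same rate is used.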

Meta-Genomics Analyses

  • run LTRpred on entire kingdoms of life using only one command (see ?LTRpred.meta)
  • perform meta-genomics studies customized for LTR retrotransposons
  • cluster LTR retrotransposons within and between species
  • quantify the diversity space of LTR retrotransposons for entire kingdoms of life

Install

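# install the current version of LTRpred on your system
# (biocLite() has been retired by Bioconductor; BiocManager replaces it)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# 'remotes' is needed so that BiocManager can install from GitHub
BiocManager::install("remotes")
BiocManager::install("HajkD/LTRpred")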

Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:

https://github.com/HajkD/LTRpred/issues

NEWS

The current status of the package as well as a detailed history of the functionality of each version of LTRpred can be found in the NEWS section.

Tutorials

This tutorial introduces users to LTRpred:

Users can also read the tutorials from within R (or RStudio):

library(LTRpred)
browseVignettes("LTRpred")

In the LTRpred framework users can find:

De Novo Annotation Functions:

  • LTRpred() : Major pipeline to predict LTR retrotransposons in a given genome
  • LTRpred.meta() : Perform Meta-Analyses with LTRpred
  • meta.summarize() : Summarize (concatenate) all predictions of a LTRpred.meta() run
  • meta.apply() : Apply functions to meta data generated by LTRpred()
  • LTRharvest() : Run LTRharvest to predict putative LTR Retrotransposons
  • LTRdigest() : Run LTRdigest to predict putative LTR Retrotransposons

Sequence Clustering and Similarity Computations

  • CLUSTpred() : Cluster Sequences with VSEARCH
  • cluster.members() : Select members of a specific cluster
  • clust2fasta() : Export sequences of TEs belonging to the same cluster to fasta files
  • AllPairwiseAlign() : Compute all pairwise (global) alignments with VSEARCH
  • filter.uc() : Filter for cluster members
  • SimMatAbundance() : Compute histogram shape similarity between species

LTR Copy Number Estimation

  • ltr.cn() : Detect solo LTR copies of predicted LTR transposons
  • cn2bed() : Write copy number estimation results to BED file format.

Filter Functions

  • filter.jumpers() : Detect LTR retrotransposons that are potential jumpers
  • tidy.datasheet() : Select most important columns of 'LTRpred' output for further analytics

Import the Output Files of the Prediction Tools:

  • read.prediction() : Import the output of LTRharvest or LTRdigest
  • read.tabout() : Import information sheet returned by LTRdigest
  • read.orfs() : Read output of ORFpred()
  • read.seqs() : Import sequences of predicted LTR transposons
  • read.ltrpred() : Import the data sheet file generated by LTRpred()
  • read.uc() : Read file in USEARCH cluster format
  • read.blast6out() : Read file in blast6out format generated by USEARCH or VSEARCH
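
To give a feel for what a USEARCH cluster (UC) file contains, the sketch below parses a tiny hand-written example with base R and counts the members of each cluster. It is not the LTRpred implementation of `read.uc()`: the column labels are names chosen here for the ten tab-separated UC fields, and the records themselves are invented toy data.

```r
# Sketch only: plain base R, not the LTRpred implementation of read.uc().
# Column labels are descriptive names chosen here for the 10 UC fields.
uc_cols <- c("type", "cluster", "size", "perc_ident", "strand",
             "unused1", "unused2", "alignment", "query", "target")

# Hypothetical UC records: S = centroid (seed), H = hit, C = cluster summary
uc_text <- paste(
  "S\t0\t5200\t*\t*\t*\t*\t*\tte_1\t*",
  "H\t0\t5100\t97.5\t+\t0\t0\t5100M100D\tte_2\tte_1",
  "S\t1\t4800\t*\t*\t*\t*\t*\tte_3\t*",
  "C\t0\t2\t*\t*\t*\t*\t*\tte_1\t*",
  "C\t1\t1\t*\t*\t*\t*\t*\tte_3\t*",
  sep = "\n"
)

uc <- read.delim(text = uc_text, header = FALSE, col.names = uc_cols,
                 stringsAsFactors = FALSE)

# Members are seeds plus hits; including the C records would double count
members <- uc[uc$type %in% c("S", "H"), c("cluster", "query")]
cluster_sizes <- table(members$cluster)
print(cluster_sizes)
```

Restricting to `S` and `H` records gives one row per sequence, which is usually the more convenient shape for joining cluster membership back onto a prediction table.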

Export the Output Files of the Prediction Tools:

  • pred2bed() : Format LTR prediction data to BED file format
  • pred2fasta() : Save the sequence of the predicted LTR Transposons in a fasta file
  • pred2gff() : Format LTR prediction data to GFF3 file format
  • pred2annotation() : Match LTRharvest, LTRdigest, or LTRpred prediction with a given annotation file in GFF3 format
  • pred2csv() : Format LTR prediction data to CSV file format
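
When exporting to BED, the main pitfall is the coordinate convention: BED uses 0-based starts and exclusive ends, whereas GFF-style annotations are 1-based and inclusive. The sketch below illustrates the conversion in base R. It is not the LTRpred implementation of `pred2bed()`: the `pred` data frame, its column names, and the helper `to_bed()` are hypothetical stand-ins, and the input is assumed to use 1-based inclusive coordinates.

```r
# Sketch only: not the LTRpred implementation of pred2bed().
# 'pred' and its column names are hypothetical stand-ins for a prediction table.
pred <- data.frame(
  chromosome = c("Chr1", "Chr1", "Chr2"),
  start      = c(1001L, 25000L, 300L),   # assumed 1-based, inclusive
  end        = c(6200L, 31050L, 5600L),
  ID         = c("LTR_1", "LTR_2", "LTR_3"),
  strand     = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

to_bed <- function(x) {
  data.frame(
    chrom      = x$chromosome,
    chromStart = x$start - 1L,   # BED starts are 0-based
    chromEnd   = x$end,          # BED ends are exclusive, so the value is unchanged
    name       = x$ID,
    score      = 0L,             # placeholder; BED needs a score column before strand
    strand     = x$strand,
    stringsAsFactors = FALSE
  )
}

bed <- to_bed(pred)
bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

With this convention, `chromEnd - chromStart` equals the element length, which is a quick sanity check after any conversion.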

Analytics Tools:

  • ORFpred() : Open Reading Frame prediction in putative LTR transposons

Annotation and Validation:

  • dfam.query() : Annotation of de novo predicted LTR transposons via Dfam searches
  • read.dfam() : Import Dfam Query Output
  • repbase.clean() : Clean the initial Repbase database for BLAST
  • repbase.query() : Query the RepBase to annotate putative LTRs
  • repbase.filter() : Filter the Repbase query output

Methylation Context Estimation

  • motif.count() : Low level function to detect motifs in strings
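
As an illustration of what counting methylation contexts in a sequence involves, the following base R sketch counts CG, CHG, and CHH sites, where H stands for A, C, or T. It is not the LTRpred implementation of `motif.count()`: the helper `count_context()` is hypothetical, and only the forward strand of a toy sequence is scanned.

```r
# Sketch only: not the LTRpred implementation of motif.count().
# count_context() is a hypothetical helper; H stands for A, C or T (IUPAC).
count_context <- function(seq, pattern) {
  # The lookahead makes each match zero-width, so overlapping sites are all counted
  hits <- gregexpr(paste0("(?=", pattern, ")"), toupper(seq), perl = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

contexts <- c(CG = "CG", CHG = "C[ACT]G", CHH = "C[ACT][ACT]")
te_seq <- "CGCAGCTTCG"   # toy sequence, forward strand only
counts <- vapply(contexts, function(p) count_context(te_seq, p), integer(1))
print(counts)
```

For a strand-aware count, the same function could be applied to the reverse complement of the sequence and the two results added.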

Visualization Framework

  • plot_ltrsim_individual() : Plot the age distribution of predicted LTR transposons
  • plot_ltrwidth_individual() : Plot the width distribution of putative LTR transposons or LTRs for individual species
  • plot_ltrwidth_species() : Plot the width distribution of putative LTR transposons or LTRs for all species
  • plot_ltrwidth_kingdom() : Plot the width distribution of putative LTR transposons or LTRs for all kingdoms
  • plot_copynumber_individual() : Plot the copy number distribution of putative LTR transposons or LTRs for individual species
  • plot_copynumber_species() : Plot the copy number distribution of putative LTR transposons or LTRs for all species
  • plot_copynumber_kingdom() : Plot the copy number distribution of putative LTR transposons or LTRs for all kingdoms
  • plotLTRRange() : Plot Genomic Ranges of putative LTR transposons
  • PlotSimCount() : Plot LTR Similarity vs. predicted LTR count
  • plotSize() : Plot Genome size vs. LTR transposon count
  • plotSizeJumpers() : Plot Genome size vs. LTR transposon count for jumpers
  • plotFamily() : Visualize the Superfamily distribution of predicted LTR retrotransposons
  • plotDomain() : Visualize the Protein Domain distribution of predicted LTR retrotransposons
  • plotCN() : Plot correlation between LTR copy number and methylation context
  • plotCluster() : Plot correlation between Cluster Number and any other variable
  • PlotInterSpeciesCluster() : Plot inter species similarity between TEs (for a specific cluster)
  • PlotMainInterSpeciesCluster() : Plot inter species similarity between TEs (for the top n clusters)

Minor helper functions

  • bcolor() : Beautiful colors for plots
  • file.move() : Move folders from one location to another
  • get.pred.filenames() : Retrieve file names of files generated by LTRpred
  • get.seqs() : Quickly retrieve the sequences of a 'Biostrings' object
  • ws.wrap.path() : Wrap whitespace in paths
  • rename.fasta() : rename.fasta

Acknowledgement

I would like to thank the Paszkowski team for incredible support and motivating discussions that led to the realization of this project.
