Coder Social home page Coder Social logo

anamir's Introduction

anamiR

An integrated analysis R package of miRNA and mRNA profiiling

This document guides the user through all available functions of the anamiR package. anamiR aims to find potential miRNA-target gene interactions from both miRNA and mRNA expression data.

Traditional miRNA analysis methods use online databases to predict miRNA-target gene interactions. However, the inconsistent results make the interactions less reliable. To address this issue, anamiR integrates the whole expression analysis with expression data into a workflow including normalization, differential expression, correlation, and finally database intersection, to find more reliable interactions. Moreover, users can identify interactions from pathways or gene sets of interest.

Data Source

As shown in the workflow, not only samples of paired miRNA and mRNA expression data, but also phenotypic information on miRNA and mRNA sequences are required for the analysis. Since anamiR reads data in expression matrices, data sources are platform- and technology-independent. Particularly, expression data from microarrays or next generation sequencing are all acceptable for anamiR. However, this also means the raw data have to be formatted into expression matrices before using anamiR.

mRNA expression

Columns for samples. Rows for genes

GENE  SmapleA   SamplB ...
A     0.1       0.2
B    -0.5      -0.3
C     0.4       0.1

miRNA expression

Columns for samples. Rows for miRNAs

miRNA  SampleA  SampleB ...
A         0.1     0.5
B        -0.5     2.1
C         0.4     0.3

phenotype data

Cloumns for samples. Rows for feature name, including two groups, multiple groups, continuous data.

Feature  groupA  groupA  groupA ...
SampleA  123.33     A       A
SampleB  120.34     B       C
SampleC  121.22     A       B

Installation

anamiR is on Bioconductor and can be installed following standard installation procedure.

install.packages("BiocManager")
BiocManager::install("anamiR")

To use,

library(anamiR)

General Workflow

Basically there are six steps, corresponding to six R functions, to complete the whole analysis:

  1. Normalize expression data
  2. Find differentially expressed miRNAs and genes
  3. Convert miRNA annotation to the latest version
  4. Calculate the correlation coefficient between each miRNA and gene
  5. Predict and validate the intersection of miRNA-target gene interaction databases
  6. Functional analysis of genes of interest

alt text

  1. Import data
library(anamiR)
data(mrna)
data(mirna)
data(pheno.mirna)
data(pheno.mrna)
  1. SummarizedExperimaent object

Before entering the main workflow, we should put our data and phenotype information into “SummarizedExperiment” format first.

mirna_se <- SummarizedExperiment(assays = SimpleList(counts=mirna), colData = pheno.mirna)
mrna_se <- SummarizedExperiment(assays = SimpleList(counts=mrna), colData = pheno.mrna)
  1. Differential Expression analysis

Second, we will find differentially expressed genes and miRNAs. There are three statistical methods in this function. Here, we use “limma” for demonstration.

mirna_d <- differExp_discrete(se = mirna_se, class = "ER", method = "limma", log2 = FALSE, p_value.cutoff = 0.05, logratio = 0.5, p_adjust.method = "BH")
mrna_d <- differExp_discrete(se = mrna_se, class = "ER", method = "limma", log2 = FALSE, p_value.cutoff = 0.05, logratio = 0.5, p_adjust.method = "BH")

This function will delete genes and miRNAs (rows) that are not differentially expressed, and add another three columns to represent fold-change (log2), p-value, adjusted p-value.

  1. miRNA ID conversion

Before using the collected databases to find the intersection of potential miRNA-target gene interactions, we have to make sure all miRNA data are in the latest annotation version (miRBase 21). If not, we can use this step to do it.

mirna_21 <- miR_converter(data = mirna_d, remove_old = TRUE, original_version = 17, latest_version = 21)
  1. Correlation abalysis

Fourth, to find potential miRNA-target gene interactions, we have to combine the information from two differentially expressed datasets, which we obtain from “differExp_discrete”.

cor <- negative_cor(mrna_data = mrna_d, mirna_data = mirna_21, cut.off = -0.5, method = "pearson")

In the displayed list, each row is a potential interaction, and only the rows whose correlation coefficient is < cut.off would be kept in the list.

Note that in our assumption, miRNAs negatively regulate the expression of their target genes; that is, cut.off should typically be a negative decimal.

  1. Database Intersection

After correlation analysis, we have some potential interactions, and we use the “database_support” function to get information on whether there are databases that predict or validate these interactions.

sup <- database_support(cor_data = cor, Sum.cutoff = 2, org = "hsa")

In the output, the Sum column tells us the total hits in the 8 prediction databases and the Validate column tells us if each interaction has been validated.

  1. Heatmap visualization
heat_vis(cor, mrna_d, mirna_21)
  1. Functional analysis

Finally, after finding reliable miRNA-target gene interactions, we are also interested in the pathways that may be enriched by the expression of these genes.

enr <- enrichment(data_support = sup, per_time = 5000)

Note that for the parameter “per_time”, we chose 500 times for demonstration purposes. The default value is 5000 times.

The output from this data shows not only the P-value generated by the hypergeometric test, but also the empirical P-value, which is the value of the average permutation test in each pathway.

GSEA Workflow

Basically there are only two steps with two R functions, to complete the whole analysis:

  1. Find related miRNAs and genes in the possible enriched pathways.
  2. Find potential interactions from the above result.

alt text

  1. Load data
aa <- system.file("extdata", "GSE19536_mrna.csv", package = "anamiR")
mrna <- data.table::fread(aa, fill = T, header = T)
bb <- system.file("extdata", "GSE19536_mirna.csv", package = "anamiR")
mirna <- data.table::fread(bb, fill = T, header = T)
cc <- system.file("extdata", "pheno_data.csv", package = "anamiR")
pheno.data <- data.table::fread(cc, fill = T, header = T)
  1. transform data format
mirna_name <- mirna[["miRNA"]]
mrna_name <- mrna[["Gene"]]
mirna <- mirna[, -1]
mrna <- mrna[, -1]
mirna <- data.matrix(mirna)
mrna <- data.matrix(mrna)
row.names(mirna) <- mirna_name
row.names(mrna) <- mrna_name
pheno_name <- pheno.data[["Sample"]]
pheno.data <- pheno.data[, -1]
pheno.data <- as.matrix(pheno.data)
row.names(pheno.data) <- pheno_name

Extra steps

mirna <- mirna[, 50:70]
mrna <- mrna[, 50:70]
pheno.data <- as.matrix(pheno.data[50:70, ])
colnames(pheno.data) <- "ER"
  1. SummarizedExperiment object
mirna_21 <- miR_converter(mirna, original_version = 17)
mirna_se <- SummarizedExperiment(assays = SimpleList(counts=mirna_21), colData = pheno.data)
mrna_se <- SummarizedExperiment(assays = SimpleList(counts=mrna), colData = pheno.data)
  1. GSEA_ana

In the first step, we use the “GSEA_ana” function to find the pathways which are the most likely to be enriched in the given expression data.

table <- GSEA_ana(mrna_se = mrna_se, mirna_se = mirna_se, class = "ER", pathway_num = 2)
names(table)
miRNA_path1 <- table[[1]]
Gene_path1 <- table[[2]]

The result would be a list, in matrix form, of related genes and miRNAs for each pathway. Note that because it would take a few minutes to run GSEA_ana, here we use the pre-calculated data to show the output.

  1. GSEA_res

After doing GSEA analysis, we have selected miRNA and gene expression data for each enriched pathway. In the second step, the generated data would be put into “GSEA_res”.

result <- GSEA_res(table, pheno.data = pheno.data, class = "ER", DE_method = "limma", cor_cut = -0.3)
names(result)
result_path1 <- result[[1]]

This function calculates the P-value, fold-change, and correlation for each miRNA-gene pair and shows these values to the user.

anamir's People

Contributors

hpages avatar lshep avatar titaiwangms avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

anamir's Issues

multi_Differ does not produce any output!

Dear Wang,
I tried to use function multi_Differ provided in R-package "anamiR" but the function does not generate any results!

library("anamiR")

Use the internal dataset

data("mirna", package = "anamiR", envir = environment())
data("pheno.mirna", package = "anamiR", envir = environment())

SummarizedExperiment class

require(SummarizedExperiment)
mirna_se <- SummarizedExperiment(
assays = SimpleList(counts=mirna),
colData = pheno.mirna)

Finding differential miRNA from miRNA expression data with anova

multi_Differ(se = mirna_se, class = "Subtype", post_hoc = "scheffe.test")

No result is returned by the function!

Do you have any idea what the problem is?
Best regards,

Payam

Server Error

Hello,
Looks like there is some server problem. "anamir.cgm.ntu.edu.tw" looks down and i can not connect to server. I hope it will fixed as sson as possible.

Best,
Samed

Whats expression values are used for the calculation of negative correlation?

I have a list of differentially expressed miRNA and mRNA and would like to use anamiR for further analysis. But I am not sure what values it takes as input for negative correlation function. I assume it is the normalized count. Is it true? If yes, could you please clarify, if we need to normalize the miRNA and mRNA data with the same normalized method? and how does it takes care of data coming from different methods (such as small RNA-Seq and RNAseq), is it comparable

Versions of Databases?

Hello,

I really like your package anamiR and its been a great help with my analyses. I wanted to ask if the versions of the databases used in the database_support() function is listed? I noticed that one of my miRNA-mRNA pairs had an experimentally verified interaction in miRTarBase v7.0, but it was not annotated in the output from the database_support(). Is there any way to select which versions of the databases to query?

Thanks,

Jenny

A question of GSEA_ana

Hello, when I use the example code "table <- GSEA_ana(mrna_se = mrna_se, mirna_se = mirna_se, class = "ER", pathway_num = 2)", it shows error "Error in .local(drv, ...) : Failed to connect to database: Error: Can't connect to MySQL server on 'anamir.cgm.ntu.edu.tw' (110)", the function miR_converter also shows the same error.

what is the reason, and how to solve the problem?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.