frederickhuanglin / ancombc Goto Github PK

View Code? Open in Web Editor NEW

96.0 96.0 24.0 445 KB

Differential abundance (DA) and correlation analyses for microbial absolute abundance data

Home Page: https://www.nature.com/articles/s41467-020-17041-7

R 95.65% TeX 4.35%

ancom ancombc ancombc2 correlation differential-abundance-analysis microbiome normalization secom sequencing

ancombc's People

Contributors

Stargazers

Watchers

ancombc's Issues

Log fold change

Hi,

Very naive question: I wonder if negative log-Fold change are conserved in ANCOMBC output.

Cheers,

Global test results omit certain taxa

Hi,

I'm trying to identify taxa that are differentially abundant between different sequencing batches. I have two metadata columns, 'site' and 'kit'. My original otu_table has 663 samples and 3986 taxa. However, after running ANCOM-BC, the global test result table has only dimensions of 3834 x 6. Some taxa are not included in this result table. May I know why this may be the case? My command parameters are shown below.

column <- "kit"
out <- ancombc(phyloseq = phy_subset, formula = column,
p_adj_method = "BH", zero_cut = 1, lib_cut = 0,
group = column, struc_zero = FALSE, neg_lb = FALSE,
tol = 1e-5, max_iter = 1, conserve = F,
alpha = 0.05, global = TRUE)

Thanks in advance.

Filtering low abundance Taxa

Hello,

I have a count table of ASV with two populations: P1 and P2 (5 samples/group), with-out filtering ASVs by abundance:

out = ancombc(phyloseq = pseq_A1_raw, formula = "population",
p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000,
group = "population", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
max_iter = 100, conserve = TRUE, alpha = 0.05, global = FALSE)

However, I find some ASVs with a significant q_val, but when a see the corresponding row in the count table show very low differences, because they are low abundand ASVs.

For example:
P1_1 P1_2 P1_3 P1_4 P1_5 P2_1 P2_2 P2_3 P2_4 P2_5
ASV_2 2 2 11 1 0 0 0 0 0 0
[...]
ASV_726 1 43 0 15 1 0 0 0 0 0

I wonder if it would be okey to apply a filter to remove low abundand taxa? And when should I used before or after doing differential abundance with ancombc?

Thanks in advance

Global test with multiple group variables

Hello Mr. Developer,

I'm now working on an analysis project. There are 3 major environmental factors (e.g. A, B and C) which we think would affect the abundance of microbiomes. In one step I'd like to test the association between the abundance of microbiomes and 2 of the factors (say B and C), but I am not sure how I could write my formula properly (sorry that this question is rather basic, my apologies). Could I perform a global test on both factors in this case? If so, I'm still wondering if ANCOM-BC support multiple group variables as input.

Any help would be much appreciated and thank you in advance!

Pan

Understanding the Output

Hi Frederick,

Thanks for developing the tool for compositional data.
I am new to microbiome analysis and trying to understand the output result from ANCOM-BC

I was trying use the data to identify differentially abundant KOs from PICRUST2 output.

I have an abundance table for features as rows (KO dataset) with samples as columns. I have 3 treatment groups (treatA, treatB, treatC), with sample size of 10/group. I modified the it formula from the example for my dataset;

out = ancombc(phyloseq = physeq2, formula = "treatment", p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000, group = "treatment", struc_zero = TRUE, neg_lb = FALSE, tol = 1e-5, max_iter = 100, conserve = TRUE,
alpha = 0.05, global = TRUE)

The output has two result files; res and res_global. I am trying to understand these files, and having a bit of trouble.

res_global = is the output for comparing the overall features that are differentially abundant between the three groups in my case. Can we think of the W statistic as analogous to the F statistic in one-way ANOVA?
res = is the output for evaluating the pairwise comparisons for which I have attached a snapshot of the res$beta gives me file with three columns. beta being a coefficient based on which the W and SE are calculated.
a). Is the beta coefficient listed below the two columns treatB or treatC based comparison with treatA (treatB vs treatA; treatC vs treatA)? but there is no comparison between treatB vs treatC. Is this issue associated with the way I have setup the ancombc to run as mentioned above?
b). How do I interpret the first column the "Intercept"? I understand it is a result of applying linear regression to the differential abundance analysis.
c). How do I calculate the effect size for each feature?

out1.txt

If any of my questions are too basic, my apologies. Any help is much appreciated.
Thank you.
Shrez

Error in data_prep(..)

Hi Frederick,
Thanks for developing the methodology and the R package!

When I run the toy example from ?help (GlobalPatterns) data, it runs with no problems. However, when I plug in my data, it givees me this error message:


Error in data_prep(phyloseq, group, zero_cut, lib_cut, global = global) : 
  The group variable should have >= 2 categories.

Any help is much appreciated!

Best. Stan

inconsistent data structures

For my convenience, I transform parts of the result structure into one data frame. I noticed that when using one independent variable with two levels in the formula, the results for $res$diff_abn, $res$p_val, and $res$q_val can either be a simple vector when the parameter struc_zero = FALSE or they are a matrix with one column when struc_zero = TRUE. To me it makes most sense that they are always a matrix since this is the case when more than two levels are involved anyway. Either way, I believe that the outcome should be consistent independent of the struc_zero parameter.

Adjusting for covariates

Dear Huang,

I'm looking forward to trying out ANCOM-BC on our data, and was wondering if the option to adjust for covariates will be implemented in this method (as it is in ANCOMv2.1)?

Many thanks,

Roos

Incorrect labelling of groups in output?

Hello!

I have three treatment groups in the dataset I'm analyzing: Limit Fed, Ad Libitum, and High Fat. They are all listed as a "diet" variable, which i indicate in my code (see below). However, when I look at the output, I see the High Fat group listed twice with different numbers, and no Ad Libitum group. Am i inputing something incorrectly? I double checked my metadata and it seems correct. I have attached a sample out$res file for you to see.

here is the line of code I run:
P0Ancom = ANCOMBC::ancombc(phyloseq = phylum_data0, formula = "Diet", p_adj_method = "holm", zero_cut = 0.90, group = "Diet", struc_zero = TRUE, conserve = TRUE, neg_lb = FALSE, tol = 1e-5, max_iter = 100, alpha = 0.05)

P0pval = P0Ancom$res

P0stats.csv

Best,
Haley

Support for (Tree)SummarizedExperiment

ANCOMBC does currently support phyloseq. We are now actively working on an alternative framework - miaverse - based on the newer (Tree)SummarizedExperiment class, providing some enhanced capabilities. It would be relatively straightforward to add support for this class because they are closely related, phyloseq can always be converted to TSE mia::makeTreeSummarizedExperimentFromphyloseq and TSE to phyloseq in many cases (not always as it is a superset). An example is shown in Conversions between data formats in R.

Support for (Tree)SummarizedExperiment could be added ancombc function (and possibly others?).

Unclear output with four conditions

Dear Frederick,

first of all many thanks for this package! Just tested to detect taxa enriched by season using

bac <- pseq %>% tax_glom("Genus") %>% filter_taxa(.,function(x) mean(x) > 10, T)

out = ancombc(bac, formula = "season", p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000, group = "season", struc_zero = T, neg_lb = F, tol = 1e-5, max_iter = 100, conserve = T, alpha = 0.05, global = T)

I do get an output, which is however confusing: there is "seasonspring", "seasonsummer" and "seasonwinter" but no "seasonfall".

Also, I'd interpret that TRUE in res$diff_abn means that a given taxon is enriched under the specific condition (for example, if TRUE under seasonspring it's enriched in spring). But the results do not match other DA tools I've been testing.

Many thanks for your help!
Matthias

Comparison of 4 groups

Hello,

I am Ana, a Veterinary Student form Spain. I would like to compare the microbiome composition of sows given 4 different treatments, and see if there is any difference in the relative abundance. I performed ANCOMBC with global results, but I am not sure if that is the best way, nor how to know which of the 4 groups is the one having the significant differencial abundant data.

Thank you in advance for your help.
Ana

More informative error message in 'Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed'

Hi @FrederickHuangLin !

I've been working with the ANCOM-BC package for the past couple of days and it's very intuitive and interesting. However, I would like to know more about this error message :

'Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed'

If perhaps you could add an enhancement in the package such that the user knows more details about why they are getting such an error, it would be useful. I have referred gone through your conversation in this issue : #16 , but it's still a little unclear, and may not be applicable in this case. Given that I am using the latest version of ANCOM-BC, could you provide me with more details about this error?

Thank you so much! Looking forward to hearing from you.

Best,
Sharvari Narendra

Longitudinal analysis using ancombc

Dear lin:

About the analysis of the longitudinal data using ANCOM_BC;
I know that ANCOM can be used to analyses longitudinal data;
Whether ANCOM_BC would provide simialr function ?
or if we can use bias corrected data by ANCOM_BC to perform ANCOM ?

Thanks for your great work ~
best
huahui

Error Message

Hello! I am currently using ANCOMBC on a phyloseq object. When I run the function:

G0Ancom = ANCOMBC::ancombc(phyloseq = pig0, formula = "Diet", p_adj_method = "holm", zero_cut = 0.90, group = "Diet", struc_zero = TRUE, conserve = TRUE, neg_lb = FALSE, tol = 1e-5, max_iter = 100, alpha = 0.05)

I am getting this error:

Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed

It seems that this issue has been brought up before, but i am not using a small amount of taxa (I have 48 OTUs). Any advice on how to remedy this?

Thanks!

package not available for R version 3.6.1?

Hi,

I am trying to install ANCOMBC using the following command:

package ‘ANCOMBC’ is not available (for R version 3.6.1)

But then I get the following result:
package ‘ANCOMBC’ is not available (for R version 3.6.1)

Is there a way I could fix this?

Cheers,
Pablo

z-transformed covariates

Hi Frederick,

thank you for developing such amazing tool! I am using it for my studies on gut microbiome and stress and I was wondering if I could z-transform my covariate (stress) before runing ancombc() to better interpret the results.

Do you think it could work? my code is:

out = ancombc(phyloseq = phylum_data, formula = "stress + season + sex + age_cat", p_adj_method = "holm",
zero_cut = 0.90, lib_cut = 1000, group = NULL, struc_zero = FALSE, neg_lb = FALSE, tol = 1e-5, max_iter = 100,
conserve = TRUE, alpha = 0.05, global = FALSE)

but before of it, i guess I should z-transform the covariate in the data summary part or even before? should i do it before and then run the following?

summary_template =
list("stress" =
list("min" = ~ min(.data$stress, na.rm = TRUE),
"max" = ~ max(.data$stress, na.rm = TRUE),
"mean (sd)" = ~ mean_sd(.data$stress, na_rm = TRUE,
show_n = "never")),
"sex" =
list("F" = ~ n_perc0(.data$sex == "female", na_rm = TRUE),
"M" = ~ n_perc0(.data$sex == "male", na_rm = TRUE),
"NA" = ~ n_perc0(is.na(.data$sex))), and so on...

Thank you!

Cheers,

Simone

Another error message in ANCOM BC

Hi,

I have created a phyloseq object and try to run ANCOM BC on it - the phyloseq object contains three files, a tsv metadata file, and 2 qiime qza files - the taxa and the samples (table.qza).

When I try to run ANCOM BC on the phylum level (I have used phylum_data = aggregate_taxa(physeq_sopp, "Phylum"), with the following code
phylum_data_OW_Sarcoidosis_Control = subset_samples(phylum_data, (Category%in%c("Sarcoidosis", "Control")) & (SampleType=="OW"))

out_OW_Sarcoidosis_Control = ancombc(phyloseq = phylum_data_OW_Sarcoidosis_Control, formula = "Category",

                                 p_adj_method = "holm", zero_cut = 0.90, lib_cut = 500,

                                 group = "Category", struc_zero = TRUE, neg_lb = FALSE,

                                 tol = 1e-5, max_iter = 100, conserve = TRUE,

                                 alpha = 0.05, global = TRUE)

I get this error message:

Error in while (epsilon > tol & iterNum < max_iter) { :
missing value where TRUE/FALSE needed

and I cannot get around this. I have tried tweeking the neg_lb, tpl, and max_iter to extremes to no avail, and I can not see any errors in the metadatafile regarding the "Category" values.

Any idea what this means?

Does ANCOMBC supports only one variable of interest?

Hi developer,

Does ANCOMBC support only one continuous variable of interest?
For example, in the vignettes:

out = ancombc(phyloseq = phylum_data, formula = "age + nation + bmi_group",
p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000,
group = "nation", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
max_iter = 100, conserve = TRUE, alpha = 0.05, global = TRUE)

Is that possible, if I want to test the association between age(only one continuous variable) and each taxa？
Appreciate it!

Error message after adjusting formula

Hello,

Thank you for developing and sharing the ancombc package.

I have been running ancombc on a dataset using 2 covariates in the formula argument and it seems to be working great.
However, when I try to add a third covariate, I receive the following error message;

Error in vapply(seq(n_taxa), function(i) dnorm(Delta[i], delta, sqrt(nu0[i])),  : 
  values must be length 1,
 but FUN(X[[2]]) result is length 0
In addition: Warning messages:
1: In min(Delta, na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In max(Delta, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

Would you happen to have any advice on where this is coming from and how to remedy it?

Thank you!

ANCOM-BC global test and multiple comparisons

Dear all,
First of all, thanks for developing such a tool, it is proving very helpful in many of my current studies.
I have a couple of questions regarding its utilization in one of my recent study designs. I have a batch of samples with varying feature abundances; I have identified a total of 6 groups (C1 to C6) and I'd like to identify differentially abundant features across the dataset using ANCOM-BC. However, the test itself relies on comparisons against a reference, which I don't have. To avoid resorting to manually performing a batch of multiple comparisons each time defining a different group as a reference, I tried to do the following:

have ANCOM-BC define globally differentially-abundant features through the global test;
recover the identified features from a CLR-normalized table and plot their abundances by group.

I am however unsure about this. What does really the global test procedure do? Should I normalize the results by subtracting the sampling fraction from log-normalized feature abundances as the vignette shows?
Thanks for the advices!
Best regards

What to do with the group argument when your predictors are all continuous

Hello,

I apologize if this has been answered elsewhere, but it's not clear to me what I should do with the group = argument when all of my predictors are continuous. I would like to control for structural zeros if possible, but I don't think I can give a continuous variable to the group argument and I don't have any categorical predictors in my model.

Thanks!

error in evaluating the argument 'y' in selecting a method for function 'plot'

Hi,
I am running ANCOMBC to compare abundance levels between countries using the yes/no variable. I try to run the code to make a volcano plot but I get this error:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'y' in selecting a method for function 'plot': non-numeric argument to mathematical function.
I have attached a photo of this issue with the R code. I wonder what is wrong here.

Best,
Nermin

Diff_abn for first variable depends on the second variable reference level - expected?

Dear Frederick,

I have been happily using ANCOM-BC for a while and plan to keep using it in the future. However, something is puzzling me now that I'm working on a data set with two variables, of which the variable Status has 2 levels and the variable Treatment has 3 levels.

I ran ancombc with the formula "Status*Treatment". This gave me two of the three possible comparisons for Treatment, but the third comparison for the variable Treatment is also relevant in this case, so I repeated ancombc with the same formula but with the Treatment levels switched around so that another level was the reference, which achieved the goal.

However, I noticed that the 79 ASVs that were differentially abundant between Status levels in the first comparison only partially overlapped the 18 ASVs that were differentially abundant between Status level in the second comparison (5 ASVs in common). I switched around or omitted Treatment factor levels some more and saw that the ASVs differentially abundant between the Status levels depended heavily on which Treatment factor level was the reference.

Did I make a mistake somewhere? Is this expected behaviour and the comparison for the first factor have to be repeated with every level of the second one as reference (and vice versa)? Do you have any suggestions on how to proceed in cases like this and how to interpret these results? I would be grateful for any comment or advice!

With kind regards,
Marko

unused argument (prv_cut = 0.1)

Hi Dr. Lin,

I was running the example code in your vignettes and I got an error from ## 4.1 Run ancombc function:

out = ancombc(phyloseq = family_data, formula = "age + region + bmi_group", p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, group = "region", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5, max_iter = 100, conserve = TRUE, alpha = 0.05, global = TRUE),

the error message is :

Error in ancombc(phyloseq = family_data, formula = "age + region + bmi_group", : unused argument (prv_cut = 0.1),

I could not debug this error and all codes in the vignettes before ## 4.1 ran well, so I don't think I have missed something in the previous codes.

Here are my session info:

`R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggforce_0.3.3 limma_3.50.3 corrplot_0.92 DT_0.23
[5] ANCOMBC_1.4.0 qwraps2_0.5.2 magrittr_2.0.3 microbiome_1.16.0
[9] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
[13] readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[17] tidyverse_1.3.1 phyloseq_1.38.0

loaded via a namespace (and not attached):
[1] Rtsne_0.16 colorspace_2.0-3 ggsignif_0.6.3
[4] ellipsis_0.3.2 XVector_0.34.0 fs_1.5.2
[7] rstudioapi_0.13 farver_2.1.0 ggpubr_0.4.0
[10] fansi_1.0.3 lubridate_1.8.0 xml2_1.3.3
[13] codetools_0.2-18 splines_4.1.2 knitr_1.39
[16] polyclip_1.10-0 ade4_1.7-19 jsonlite_1.8.0
[19] nloptr_2.0.0 broom_0.8.0 cluster_2.1.3
[22] dbplyr_2.1.1 BiocManager_1.30.18 compiler_4.1.2
[25] httr_1.4.3 backports_1.4.1 assertthat_0.2.1
[28] Matrix_1.4-1 fastmap_1.1.0 cli_3.3.0
[31] tweenr_1.0.2 htmltools_0.5.2 tools_4.1.2
[34] igraph_1.3.1 gtable_0.3.0 glue_1.6.2
[37] GenomeInfoDbData_1.2.7 reshape2_1.4.4 Rcpp_1.0.8.3
[40] carData_3.0-5 Biobase_2.54.0 cellranger_1.1.0
[43] vctrs_0.4.1 Biostrings_2.62.0 rhdf5filters_1.6.0
[46] multtest_2.50.0 ape_5.6-2 nlme_3.1-157
[49] iterators_1.0.14 xfun_0.31 rbibutils_2.2.8
[52] rvest_1.0.2 lifecycle_1.0.1 gtools_3.9.2
[55] rstatix_0.7.0 zlibbioc_1.40.0 MASS_7.3-57
[58] scales_1.2.0 hms_1.1.1 parallel_4.1.2
[61] biomformat_1.22.0 rhdf5_2.38.1 yaml_2.3.5
[64] stringi_1.7.6 S4Vectors_0.32.4 foreach_1.5.2
[67] permute_0.9-7 caTools_1.18.2 BiocGenerics_0.40.0
[70] GenomeInfoDb_1.30.1 Rdpack_2.3 rlang_1.0.2
[73] pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.15
[76] lattice_0.20-45 Rhdf5lib_1.16.0 htmlwidgets_1.5.4
[79] tidyselect_1.1.2 plyr_1.8.7 R6_2.5.1
[82] IRanges_2.28.0 gplots_3.1.3 generics_0.1.2
[85] DBI_1.1.2 pillar_1.7.0 haven_2.5.0
[88] withr_2.5.0 mgcv_1.8-40 survival_3.3-1
[91] abind_1.4-5 RCurl_1.98-1.6 modelr_0.1.8
[94] crayon_1.5.1 car_3.0-13 KernSmooth_2.23-20
[97] utf8_1.2.2 tzdb_0.3.0 rmarkdown_2.14
[100] grid_4.1.2 readxl_1.4.0 data.table_1.14.2
[103] vegan_2.6-2 reprex_2.0.1 digest_0.6.29
[106] stats4_4.1.2 munsell_0.5.0 `

Is the ANCOMBC currently available?

Hi, Huang,

I am wondering if the ANCOMBC is currently available? I am really interested in trying it. I tried to install either the latest released or development version using code listed in the install section, but none of them worked for me.

Thanks!

Best regards,
Zhe

Confidence intervals from standard errors output by model?

Hi, I'm really excited about this method and package! Thanks for developing it.

I noticed in the original publication (https://doi.org/10.1038/s41467-020-17041-7) that log-fold changes were plotted with Bonferroni-corrected 95% confidence intervals (CIs) as the error bars, but the current release outputs standard errors (SEs) and not CIs directly. I haven't been able to find the documentation in the manuscript methods or in the vignette or package documentation for calculating CIs from the SEs - can a normal distribution be assumed to calculate those?

Thanks for any help!

Error in ancombc

Hi,

This package is very interesting. I tried it and found that it reported the error: "Error in eval(predvars, data, env) :
numeric 'envir' arg not of length one" when the data is sparse. I am wondering if you have any suggestions on it.

Shulei

Longitudinal analysis

Hi @FrederickHuangLin

I have seen the package and I find it very interesting. I have to do an analysis with longitudinal data and wanted to ask if there is any date for the new release containing longitudinal data analysis

Thanks in advance

Dolors Pelegrí

ANCOM v2.1 and ANCOMBC, difference?

Hi developer,
I am a little confused, what's the difference between ANCOM-BC and ANCOM v2.1?
I have been always using ANCOM v2.1, is it different from ANCOM-BC?

Thanks!!!!

Relation of W statistic and beta coefficient

Hi Huang,

I've been using your tool happily and I've read your paper (Lin & Peddada, 2020), the vignette, and the ANCOM-BC manual.

What is not entirely clear to me is how the effect size (beta coefficient of the log-linear model) relates to the ANOVA-like model proposed in Lin & Peddada, 2020. And how the significance of the effect is actually tested for.

Writing Eq(4) of Lin & Peddada, 2020 in log form, we have: log(Oijk) = log (cjk) + log(Θij) + eijk

From what I understand from the paper, this model could also be written in a log-linear form to explicitly account for the effect of a covariate: log(Oijk/cjk) = βi0 + βi1X + eijk

But the log-linear model equation is not written in the paper.

Then, for the W test statistic we have Eq(33) from the paper and the information in the manual: "W = beta/se" (standardized effect size).

So, how does W relates to the beta coefficient of the covariate X we're testing for ? How is the p-value of W computed ?

Kind regards,
Rudy

Foldchange value vs logFoldchange

Hi,
Is it correct to say that the log ratio between the tested conditions corresponds to the dataframe res$W=(beta/se)

In such case the Foldchange would be exp(1)^(abs(res$w) ?

So for example if res$w returns -1.363898358 than the FC will be exp(1)^(1.363898358) = -3.91 ?

Is it correct ?

Confusion about results interpretion

Hi Dr. Lin,

I met some confusions when I tried to interpret the ANCOM-BC results.

First, I am trying to compare the DA(Differential Abundant) phyla between "Positive" group and "Negative" group, I believe the "Negative" group was used as the reference group here in my analysis. According to the LFC( Log Fold Change) values, I want to conclude that phyla such as Halobacterota and Campilobacterota are more abundant in the "Positive" group, because their LFC values are positive. But when I checked their RA(relative abundance), I found the RA of Campilobacterota was higher in the "Negative" group when compared with the "Positive" group. Similarly, the LFC value for Bacteroidota is negative, which means it should be more abundant in the reference ("Negative") group, but the RA of Bacteroidota was higher in the "Positive" group than the "Negative" group. I know I am not supposed to compare the RA values directly, but I just want to make sure that it is possible to see such scenarios.

Second, I am trying to compare the DA phyla between "Treatment" group and "Control" group, I believe the "Control" group was used as the reference group here in my analysis. Here, I saw Thermoplasmatota has a negative LFC value, which should lead to the conclusion that" Thermoplasmatota is more abundant in the "Control" group". However, the RA of Thermoplasmatota in the "Control" group is 0.00%. Is this also because of the structural zeros?

I hope I have described my questions clearly, if not, please feel free to let me know.

Best regards,
Jinji Pang

which group the differential features are enriched

Hi Frederick,

Thanks for your great work.

In ANCOMBC, the global test is for testing whether there are some taxa that are differentially abundant in at least one group, and how can I determin which group the differential features are enriched.

Assume there are three groups A, B anc C, featureA is differential among these three groups after running ANCOMBC, which group (A, B or C) the featureA is enriched in? Is the group which has largest mean abundance of featureA? If it is, how to got the absolute abundance, is the Bias-adjusted abundances of the result represents the
corrected absolute abudance?

I'm sorry If my question are too basic. Any help is much appreciated.

Thanks for your attention.

Error with contrasts in ancombc

I call ancombc as follows with a phyloseq object called physeq.

result <-
  ANCOMBC::ancombc(
    phyloseq = physeq,
    formula = "treatment",
    group = "treatment",
    lib_cut = 500,
    p_adj_method = "holm",
    conserve = FALSE
  )

which leads to the error with traceback

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
9. stop("contrasts can be applied only to factors with 2 or more levels")
8. `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
7. model.matrix.default(mt, mf, contrasts)
6. model.matrix(mt, mf, contrasts)
5. lm(tformula, data = df)
4. FUN(X[[i]], ...)
3. lapply(seq_len(n_taxa), function(i) { df = data.frame(y = unlist(y[i, ]) - d, meta_data) return(lm(tformula, data = df)) })
2. para_est(y, meta_data, formula, tol, max_iter)
1. ANCOMBC::ancombc(phyloseq = physeq, formula = "treatment", group = "treatment", lib_cut = 500, p_adj_method = "holm", conserve = FALSE)

Treatment is a factor with two levels "baseline" "A" so I'm not sure where the problem is coming from.

Unfortunately, I can't share any data but I'm happy to answer any questions to solve this. I use this phyloseq object in other analyses so I generally think that the data is in the right shape.

sessionInfo()

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_BE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_BE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_BE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_BE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] lubridate_1.7.10 magrittr_2.0.1   phyloseq_1.34.0  nloptr_1.2.2.2   ANCOMBC_1.0.3    dplyr_1.0.4     

loaded via a namespace (and not attached):
 [1] Biobase_2.50.0      mustashe_0.1.2      tidyr_1.1.2         jsonlite_1.7.2      splines_4.0.4       foreach_1.5.1      
 [7] RcppParallel_5.0.3  assertthat_0.2.1    Rdpack_2.1.1        BiocManager_1.30.10 stats4_4.0.4        cellranger_1.1.0   
[13] renv_0.13.0         yaml_2.2.1          progress_1.2.2      pillar_1.5.0        lattice_0.20-41     glue_1.4.2         
[19] digest_0.6.27       XVector_0.30.0      rbibutils_2.0       picante_1.8.2       stringfish_0.15.0   colorspace_2.0-0   
[25] htmltools_0.5.1.1   Matrix_1.3-2        plyr_1.8.6          pkgconfig_2.0.3     microbiome_1.12.0   zlibbioc_1.36.0    
[31] purrr_0.3.4         scales_1.1.1        Rtsne_0.15          RApiSerialize_0.1.0 rcartocolor_2.0.0   tibble_3.1.0       
[37] mgcv_1.8-34         generics_0.1.0      IRanges_2.24.1      ggplot2_3.3.3       ellipsis_0.3.1      BiocGenerics_0.36.0
[43] cli_2.3.1           readxl_1.3.1        survival_3.2-7      crayon_1.4.1        evaluate_0.14       fansi_0.4.2        
[49] nlme_3.1-152        MASS_7.3-53.1       vegan_2.5-7         tools_4.0.4         data.table_1.14.0   prettyunits_1.1.1  
[55] hms_1.0.0           formatR_1.7         lifecycle_1.0.0     stringr_1.4.0       Rhdf5lib_1.12.0     S4Vectors_0.28.1   
[61] munsell_0.5.0       cluster_2.1.1       Biostrings_2.58.0   ade4_1.7-16         compiler_4.0.4      qs_0.23.6          
[67] rlang_0.4.10        rhdf5_2.34.0        grid_4.0.4          iterators_1.0.13    rhdf5filters_1.2.0  biomformat_1.18.0  
[73] rstudioapi_0.13     igraph_1.2.6        rmarkdown_2.7       gtable_0.3.0        codetools_0.2-18    multtest_2.46.0    
[79] reshape2_1.4.4      R6_2.5.0            knitr_1.31          utf8_1.1.4          permute_0.9-5       readr_1.4.0        
[85] ape_5.4-1           stringi_1.5.3       parallel_4.0.4      Rcpp_1.0.6          vctrs_0.3.6         tidyselect_1.1.0   
[91] xfun_0.21

Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed

Hi!

I am running ANCOMBC with a phyloseq object. And I got this error:

Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed

I know this error has been discussed more than once. But what makes it curious for me this time, is that it's definitely not due to small number of taxa. When I agglomerated my phyloseq object at genus level (204 taxa) or order level (44 taxa), the function run through no problem. But when I try to agglomerated my phyloseq object at family (67 taxa), it fails.

Any suggestions?

PS: I do have a fairly complicated formula which contains 4 terms. Still it doesn't make sense it worked for order level but not for family level.

User-friendly Result table

As posted in Issue #73:

Considering you are performing major updates to the package, have you considered to organize the final results more in a user-friendly table, a similar table to the one DESeq2 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html) produces, for example? It would summarize the results in a more consise fashion and would make the end-user life easier, that would not need to merge several tables.

Cheers,

André

transformation for plots

Dear Huang,

Thanks for providing your continuous support to ANCOM and ANCOM-BC users!

I'm wondering what is the best way to plot changes in abundances of DA features identified with ANCOM-BC. I used to plot clr-transformed counts on heatmaps when I was using ANCOM but now that I switched to ANCOM-BC I get very conflicting results regarding more or less abundant features in two conditions in comparison with ANCOM-BC results. For instance, ANCOM-BC identifies featureX as differentially more abundant in treatmentA vs treatmentB but when I plot the clr-transformed abundances I get the opposite pattern. This only happens for some features, the majority of them are consistent with ANCOMBC results, but sometimes I need to focus on those exceptions and haven't found a better way to plot them.

Any help is appreciated!

Thanks,
Anny

Unable to install

Thanks for making this package available!

I am unable to install your package on my system using Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.3 (2020-10-10). Installation produced the following warning:
Warning message: package ‘ANCOMBC’ is not available for this version of R

My version of R was 4.0.3. See full session info as following:

R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Registered S3 method overwritten by 'treeio':
  method     from
  root.phylo ape 


> if (!requireNamespace("BiocManager", quietly = TRUE))
+     install.packages("BiocManager")
> BiocManager::install("ANCOMBC")

Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.3 (2020-10-10)
Installing package(s) 'ANCOMBC'
Warning message:
package ‘ANCOMBC’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages 

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
  [1] Rtsne_0.15          colorspace_2.0-0    ggtree_2.2.4        ggsignif_0.6.0      class_7.3-17        ellipsis_0.3.1      rio_0.5.16         
  [8] htmlTable_2.1.0     XVector_0.30.0      base64enc_0.1-3     aplot_0.0.6         rstudioapi_0.13     farver_2.0.3        ggpubr_0.4.0       
 [15] graphlayouts_0.7.1  ggrepel_0.9.0       codetools_0.2-18    splines_4.0.3       knitr_1.30          polyclip_1.10-0     ade4_1.7-16        
 [22] Formula_1.2-4       jsonlite_1.7.2      phyloseq_1.34.0     broom_0.7.3         cluster_2.1.0       png_0.1-7           ggforce_0.3.2      
 [29] BiocManager_1.30.10 readr_1.4.0         compiler_4.0.3      rvcheck_0.1.8       backports_1.2.1     Matrix_1.3-0        lazyeval_0.2.2     
 [36] tweenr_1.0.1        htmltools_0.5.0     prettyunits_1.1.1   tools_4.0.3         igraph_1.2.6        gtable_0.3.0        glue_1.4.2         
 [43] reshape2_1.4.4      dplyr_1.0.2         Rcpp_1.0.5          carData_3.0-4       Biobase_2.50.0      cellranger_1.1.0    vctrs_0.3.6        
 [50] Biostrings_2.58.0   rhdf5filters_1.2.0  multtest_2.46.0     ape_5.4-1           nlme_3.1-151        iterators_1.0.13    ggraph_2.0.4       
 [57] xfun_0.19           stringr_1.4.0       openxlsx_4.2.3      lifecycle_0.2.0     fuzzyjoin_0.1.6     rstatix_0.6.0       zlibbioc_1.36.0    
 [64] MASS_7.3-53         scales_1.1.1        tidygraph_1.2.0     hms_0.5.3           parallel_4.0.3      biomformat_1.18.0   rhdf5_2.34.0       
 [71] RColorBrewer_1.1-2  yaml_2.2.1          curl_4.3            gridExtra_2.3       ggplot2_3.3.3       rpart_4.1-15        latticeExtra_0.6-29
 [78] stringi_1.5.3       S4Vectors_0.28.1    foreach_1.5.1       e1071_1.7-4         tidytree_0.3.3      checkmate_2.0.0     permute_0.9-5      
 [85] BiocGenerics_0.36.0 phylosmith_1.0.5    zip_2.1.1           rlang_0.4.10        pkgconfig_2.0.3     evaluate_0.14       lattice_0.20-41    
 [92] sf_0.9-6            purrr_0.3.4         Rhdf5lib_1.12.0     treeio_1.12.0       patchwork_1.1.1     htmlwidgets_1.5.3   tidyselect_1.1.0   
 [99] plyr_1.8.6          magrittr_2.0.1      R6_2.5.0            IRanges_2.24.1      generics_0.1.0      Hmisc_4.4-2         DBI_1.1.0          
[106] pillar_1.4.7        haven_2.3.1         foreign_0.8-81      mgcv_1.8-33         units_0.6-7         survival_3.2-7      abind_1.4-5        
[113] nnet_7.3-14         tibble_3.0.4        crayon_1.3.4        car_3.0-10          KernSmooth_2.23-18  microbiome_1.10.0   phylofactor_0.0.1  
[120] rmarkdown_2.6       viridis_0.5.1       jpeg_0.1-8.1        progress_1.2.2      grid_4.0.3          readxl_1.3.1        data.table_1.13.6  
[127] vegan_2.5-7         forcats_0.5.0       classInt_0.4-3      digest_0.6.27       tidyr_1.1.2         stats4_4.0.3        munsell_0.5.0      
[134] viridisLite_0.3.0   qiime2R_0.99.23

Variables 'change' after using ANCOMBC function

Hello dr Lin,

I am very interested in using ANCOMBC to analyse a human microbiome dataset.
Unfortunately, I am running into a problem using the ANCOMBC function.

I am not an experienced R user but I'll try to explain as best as I can.

e.g.: the variable “City” consisting of “Bur”, “Man”, “Kna”, “Tow”.
After running ancombc, the columns of the output tables are headed: CityMan, CityKna, CityTow.
The variable Bur has disappeared altogether. This happens for all the variables; the column name of the variable seems to replace one of the variables.
In the merged phyloseq file used for ancombc this doesn’t seem to be an issue:
sam_data list40x5 A data.frame with 40 rows and 5 columns
Town factor Factor with 4 levels: “Bur”, “Man”, “Kna”, “Tow”

I am sure I am missing something here and hopefully you can help.
Thank you very much in advance.
Regards, Ica

Using ANCOM-BC with known sampling fractions

Dear,

I find the ANCOM-BC approach to identify deferentially abundant taxa very interesting and was wondering if the estimated sample-specific offset term could be replaced by a known sampling fraction. The ecosystem absolute abundance can easily be determined through e.g. flow cytometry. Combined with the library sizes this allows for the calculation of sampling fractions. With quantitative microbiome profiling (https://www.nature.com/articles/nature24460) gaining popularity, an option to use the known sampling factions in the ANCOM-BC model seems very interesting to me.

Kind regards,
Kim

adjusting for random effects

Really nice contribution to the field. I am trying ANCOMBC with my own dataset. Is there currently a way to adjust for random effects as with ANCOM-II? I use the rand_formula = "~ 1 | random_effect_var" option with ANCOM-II.

ANCOM-BC / missing comparisons

Hello,

Thanks to have published the ANCOM-BC tool.

I wonder why ANCOM-BC doesn’t do all possible pairwise comparisons?

For example for the variables “nationality” and “BMI” in your Bioconductor webpage there is only any nationality versus “CE” and any BMI versus “Lean” (see screenshot below)? I think the choice of the versus group “CE” or “Lean” is selected alphabetically, right?

Cheers,
Jérémy Tournayre

Co-occurrence network with ANCOM-BC derived absolute abundances

Hello,

We have performed differential abundance analysis using ANCOM-BC specifying the model "case_status + batch", where "case_status" is our variable of interest and denotes what samples came from a case or control individual, while "batch" is a covariate and denotes what sequencing batch a sample belonged to.

We would now like to construct species to species co-occurrence networks of tested species. I would like to use estimated absolute abundances as input to the co-occurrence networks, but I'm unsure if this can be done, as I'm aware the estimated sampling fractions (and therefore the bias-adjusted abundances) are dependent on the model given to ANCOM-BC. In order to preserve any differences that would be present in absolute abundances of case vs control samples, would it make sense to just supply ANCOM-BC with the "batch" variable, compute estimated absolute abundances using the resulting sampling fractions, and use those abundances for input to network analysis (after any needed transformations or normalizations)?

Any info/suggestions is much appreciated!

Repeated measures & using bias-adjusted counts elsewhere?

Hello,
You mention in your paper that it is possible to use ANCOMBC with repeated-measures designs. I was wondering if this had been implemented in ANCOMBC already (perhaps as support for random effects)?

If not, is it reasonable to run ancombc(), take the bias-adjusted counts, and use them in, say, lme4::lmer() to take advantage of random effects?

Thanks

Help with out file

First, Thank you for this tool!!
It is very impressive...

I do have a question, I am not that code smart, so I have a question/request help.

In the final output "out" all the tables has as rownames the taxa information. I wonder if it is possible to get that output with the tax_table rownames information and not the actual taxa info. Thanks.

technical replicates

Hi again,

Many questions as a new user... Please let me know if you prefer to have e-mail conversations or you could enable the 'Discussion' feature for this repository; but issues are just fine for me.

I wonder how you suggest to handle technical replicates. As an example, I have multiple samples per patient in each group. In a linear mixed-effect model I could specify this as a random effect, e.g.,

lme4::lmer(abundance ~ group + (1 | patient), ...)

One possible way is to combine abundance tables using mean values, I suppose, but that seems like a huge loss of information. Thank you for any insights.

input ASV table

Hi Frederick,

What a great software to analyze differential abundance!
I just have a brief question regarding the input ASV table. I noticed that you stressed multiple times that the input should be an absolute count table instead of a relative abundance table. I am wondering if you would recommend using a rarefied table as input or just raw data?

Thank you!

Group variables and global test results

Dear ANCOMBC developer,

I am trying to run ANCOMBC with 5 confounders and 1 group variable with 2 levels as follows

fit.ancombc0 = ancombc(phyloseq = phydata, formula = "x1 +x2+x3+x4+x5+ group" ,
p_adj_method = "BH", zero_cut = 0.90, lib_cut = 1000,
struc_zero = TRUE, neg_lb = FALSE,
tol = 1e-5, max_iter = 100, conserve = TRUE,
group = group, alpha = 0.05, global = TRUE)

I get the following warning message and my res_global output is NULL

Warning message:
In data_prep(phyloseq, group, zero_cut, lib_cut, global = global) :
The multi-group comparison will be deactivated as the group variable has < 3 categories.

I am interested to find the Differentially abundant (DA) taxa between the two levels of my group variable. Does the above warning mean that ANCOMBC cannot provide DA taxa between two groups?

q_val of 0.000000000

Hi, thanks for your efforts in developing this package!

I had a doubt: I run ancombc using "fdr" adjustment and I have a lot of q-values of 0. I couldn't get exactly whether these are actually low q-values (so high significance) or an artefact from the test itself.
eg. this is the output I got (transformed to significance level), where 0 are marked as "?"
? * ** *** **** ns
634 46 20 7 7 2222

frederickhuanglin / ancombc Goto Github PK

ancombc's People

Contributors

Stargazers

Watchers

Forkers

ancombc's Issues

Recommend Projects

Recommend Topics

Recommend Org