Hi microshades team,
I've been noticing a bit of a bug in how relative abundances are calculated for microshades.
Issue description
That is, when using lower levels of taxonomy that might have unclassified taxa, microshades is calculating the relative abundance using only the reads that were classified at that level.
If microshading at the genus level, any taxa that aren't classified at the genus level are thrown out. Because of this, the relative abundances shown in the figure are not representative of the actual relative abundances. This also impacts the relative abundances shown for higher taxonomic levels.
Example
For example, if we had a community with 2 taxa in the family Ruminococcaceae:
1. Faecalibacterium (50%)
2. Unclassified at the genus level (50%)
Which in phyloseq results in a taxonomy table that has:
Family Genus
Ruminococcaceae Faecalibacterium
Ruminococcaceae <NA>
When this is plotted in microshades with subgroup=“Genus”, it throws out the <NA> values before calculating relative abundance. In this simple example community, then Feacalibacterium would be plotted as 100% taxa present.
Further documentation (easy to see the effects in a real data set)
I’ve attached an example knitted Rmarkdown example, where you can click through the different levels, and you can see major changes in relative abundance based on the subgroup level (pdf at the bottom). The knitted Rmd file also has some proposed workarounds.
Addressing this?
I think this is happening during speedyseq’s tax_glom function (in prep_mdf). If it is from speedyseq, would it be possible to have an argument that disables this behavior? Or could there be some sort of note in the documentation/examples telling users to handle taxa unclassified at lower taxonomic levels prior to using microshades?
Real dataset example
In these figures, particularly reference the "control" microbiomes on the far right. These are contaminated germ free mice, with very low diversity microbiomes, which is why it's so evident. On the family level microshades plot (the first one), it appears that peptostreptococcaceae makes up ~75% of some of the microbiomes. However, the genus level plot shows that they're 90% Turicibacter, which is not in the peptostreptococcaceae family. This is because the main ASV in peptostreptococcaceae was unclassified at the genus level, and it was thrown out for relative abundance calculations, resulting in a misleading relative abundance of Turicibacter presented. You can also see differences in the phylum-level relative abundances of Firmicutes and Bacteroidota in the samples labeled 74. I go into more depth in the knitted Rmd. I've attached it below as a pdf, but it lends itself to better reading (for tabbing through the figures) as html, so I can email that version if you'd like it.
Taxa-bar.pdf
Thanks again for this package! It's helped our lab make some fantastic plots. I just wanted to bring this behavior to your attention, since it confused me for quite a few days :)
Thanks,
John