astrobiomike / astrobiomike.github.io

Site to help biologists wade their way into bioinformatics.

Home Page: https://astrobiomike.github.io/

License: Other

Ruby 0.03% HTML 85.75% CSS 0.09% JavaScript 1.29% Shell 0.02% Python 0.10% TeX 0.01% SCSS 2.14% Jupyter Notebook 10.57%

astrobiomike.github.io's Introduction


Hi there!

I’m Mike Lee, a bioinformatician with NASA GeneLab and a research scientist with Blue Marble Space Institute of Science located at NASA’s Ames Research Center in northern California, USA. I focus primarily on microbial ecology and evolution in all kinds of different systems ranging from the bottoms of our oceans up to the International Space Station 👽

If interested, you can find publications here: microbialomics.org/research 🙂

astrobiomike.github.io <- bioinformatics for beginners
GToTree <- phylogenomics for all
Twitter
Google Scholar



astrobiomike.github.io's Issues

Phylogenetic tree construction

Dear Mike,
Thank you very much. I have finished the process with my own data. However, I still have two questions. First, there is no phylogenetic-tree construction code in your tutorial, and the phylogenetic tree is an important part of microbial diversity analysis. Second, how do I remove the unclassified data from the datasheet and re-analyze the microbial diversity, composition, etc.? Thank you very much.

Best wishes

Long

Error in installing sabre

Hi, I am trying to demultiplex my metabarcoding data. I found that "sabre" can do it. I have tried to install sabre, but I am facing some problems with the installation. Hoping for some guidance from you to solve the problem.

It shows an error at the "make" step.


decontam bug in subsetting fasta file when there are no contaminants

Hey @AstrobioMike, hope you are well. I ran across a situation in your full amplicon example that breaks the tutorial. Not sure if it's worth mentioning, but in the event that decontam doesn't reveal any contaminant sequences from the negative controls, subsetting the fasta file breaks. Here is an example of what I'm talking about. Sorry I don't have an elegant solution.

> vector_for_decontam <- c(rep(FALSE, 33), rep(TRUE, 3))
> tail(rownames(t(asv_tab)))
[1] "NORMAL-11b" "GNOTO-12b" "GNOTO-13b"  "KITNEG-KN1"
[5] "KITNEG-KN2" "KITNEG-KN3"
> contam_df <- isContaminant(t(asv_tab), neg=vector_for_decontam)
> table(contam_df$contaminant) # identified no contaminants
FALSE 
  585 
> unique(contam_df$contaminant)
[1] FALSE
> # getting vector holding the identified contaminant IDs
> contam_asvs <- row.names(contam_df[contam_df$contaminant == TRUE, ])
> contam_asvs
character(0)
> contam_asvs
character(0)
> asv_tax[row.names(asv_tax) %in% contam_asvs, ]
     domain phylum class order family genus species
> # making new fasta file
> contam_indices <- which(asv_fasta %in% paste0(">", contam_asvs))
> contam_indices
integer(0)
> dont_want <- sort(c(contam_indices, contam_indices + 1))
> print(dont_want)
numeric(0)
> asv_fasta_no_contam <- asv_fasta[-dont_want]
> asv_fasta_no_contam
character(0) # UH OHHHH -- in R, x[-numeric(0)] is the same as x[numeric(0)], an empty subset, so when no contaminants are found this step empties the whole fasta vector

New Topic Idea


What else do you think should be covered on the site? (This can be a request for something you are or did have trouble with, or it can be a suggestion for something you're comfortable with but think will help others!)

I would suggest adding this code snippet at the end so that the reader can see the result of replacing \r with \n:
wc -l gene_annotations_fixed.tsv

Page link: https://astrobiomike.github.io/unix/six-glorious-commands#tr
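The suggestion makes sense; a tiny self-contained demo (file names invented for illustration, not the tutorial's actual files) shows why `wc -l` is a good sanity check here:

```shell
# Build a demo file that uses carriage returns (\r) instead of newlines.
# wc -l counts only \n characters, so it reports 0 lines.
printf 'gene_1\tkinase\rgene_2\ttransporter\rgene_3\thypothetical\r' > demo_annotations.tsv
wc -l demo_annotations.tsv    # reports 0

# Swap every \r for \n, then count again.
tr '\r' '\n' < demo_annotations.tsv > demo_annotations_fixed.tsv
wc -l demo_annotations_fixed.tsv    # reports 3
```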

slight point

Regarding the Unix crash course / Variables and For loops / grep -w -A 1 ">9" genes.faa:
grep with -w doesn't treat only whitespace as a word boundary, but any "non-word-constituent character", i.e. anything that isn't alphanumeric or an underscore. This also means adding > to the pattern is redundant.

(I found this out by chance and then checked the man page. Possibly not universal(?))
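This behavior is easy to verify with a toy file (the headers below are invented for the demo):

```shell
# grep -w treats anything that is not [A-Za-z0-9_] as a word boundary,
# so the pattern "9" matches the header ">9" but not ">19" or ">90".
printf '>9\n>19\n>90\n' > headers.txt
grep -cw '9' headers.txt     # 1 match: only the ">9" line
grep -cw '>9' headers.txt    # also 1 match, so the ">" in the pattern is redundant
```

(This is how GNU grep documents -w; as the reporter notes, other implementations may differ slightly.)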

Genus/Family-based stacked bar charts

Hi,

So sorry to bother you with yet another silly query; you can ignore this if you like and I'll have absolutely no complaints whatsoever. I was wondering if you could quickly hint at where I should modify your R code to generate a genus-level or family-level stacked bar chart. I tried, for example, from the below step onwards (and left out the code for the class breakdown), but everything seems to go haywire after that. Also, I was wondering if you know whether your code would work outright with other databases, such as UNITE, that are maintained by the DADA2 developers. Thanks again a zillion in advance for your time.

genus_counts_tab <- otu_table(tax_glom(ASV_physeq, taxrank="genus"))

Tutorial on Filtering Host Reads - just a thought (New Topic Idea)

What else do you think should be covered on the site? (This can be a request for something you are or did have trouble with, or it can be a suggestion for something you're comfortable with but think will help others!)

Hi Mike - I always love browsing your tutorials for suggestions when analyzing data. So far, I have mostly worked with amplicon data, but I am now working with metagenomes. Because I couldn't separate the host (pelagic Sargassum) from its microbial associates, my DNA samples and consequently my data have both Sargassum and microbial reads. Another thing to take into account is that I do not have a host reference. I have searched for a tutorial, but nothing really fit the bill...at least not well.

I imagine many people deal with this. Do you think a tutorial on how to separate host reads from microbes without a reference genome is in the scope of your site?

Thank you!
Taruna

[JOSE Review] de novo genome assembly

Providing a VM/Binder/etc. would be helpful. I know these can take time to get
set up and working correctly. Perhaps a simpler alternative would be
showing how these tools could be installed using Bioconda?
I see you use it in the MAGs lesson.

Extreme loss of reads during several steps in dada2 pipeline.

Hi Mike! Thank you for posting that amazing tutorial on Amplicon Analysis. It has been so helpful for getting started in microbiome research. I had a question regarding loss of reads during the dada2 step and merging step.

I have read through both yours and Ben Callahan's tutorials and I am trying to figure out what is happening to cause such a decrease in reads during the dada_f and dada_r steps and merging steps. I've also attached my summary table of the reads at each step, so you can see the exact drop off that is occurring.

The sample set is a coral sample sequenced on Illumina at 2x250 bp. The primers have been removed, and I'm using a truncation length of 200 for the forward reads, 190 for the reverse reads, and a max EE of 5 for both. I also have the merging length set to 125. Any idea why the reads drop so significantly from the filtered stage to the dada_f, dada_r, and merging stages?

(attached: Summary_tab_filtration_scores.c.pdf)
Thanks,

Emily

suggestion

Thank you for the wonderful tutorials; you are clearly leveling the steep learning curve in bioinformatics. Any chance the contents menu on the right of a page could be set to auto-hide to enhance readability? That would be great.

Thanks again

Mistake, bug, or typo


Describe the mistake, bug, or typo here:
I have attached the screenshot.

If we want to append an output to a file, rather than overwrite it, we can use two of them instead, >>:

After this line the code is:
ls >> directory_contents.txt

I think after ls there should be a directory name, because what else is this code appending?
I'm very new to this; sorry if I'm wrong.

Link to the page containing the issue:
https://astrobiomike.github.io/unix/wild-redirectors#redirectors-and-wildcards

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
Just adding experiment/ before >> would suffice, I think.
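For reference, the difference between ">" and ">>" (and the reporter's suggested fix) can be reproduced in a scratch directory (names invented for the demo):

```shell
# Two files in a subdirectory to list.
mkdir -p experiment && touch experiment/a.txt experiment/b.txt

ls experiment/ > directory_contents.txt     # ">" overwrites: file holds 2 lines
ls experiment/ >> directory_contents.txt    # ">>" appends: file now holds 4 lines
wc -l directory_contents.txt
```

A bare `ls >> directory_contents.txt` is also valid shell, though: it appends a listing of the current working directory, which is presumably what the tutorial intends.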

Mistake when running the protocol

Describe the mistake, bug, or typo here:
Bad taxrank argument. Must be among the values of rank_names(physeq)

Link to the page containing the issue:
phyla_counts_tab <- otu_table(tax_glom(ASV_physeq, taxrank="Phylum"))

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
I recently followed your tutorial's R code on my computer. When I ran the above step, I got this error. I think the problem is due to a different version of phyloseq: my RStudio, R, and R packages are all different from those in the Binder, and the older versions are not easy to install following your tutorial. Could you please update the tutorial for the latest versions of RStudio, R, and the R packages? I have run into a lot of questions while working through this tutorial; unfortunately, the conflicts between different versions are not an easy problem for me. Thank you very much!

Deseq error

Describe the mistake, bug, or typo here:
Error in DESeqDataSetFromMatrix(count_tab, colData = sample_info_tab, :
ncol(countData) == nrow(colData) is not TRUE
Link to the page containing the issue:
Line 290 deseq_counts <- DESeqDataSetFromMatrix(count_tab, colData = sample_info_tab, design = ~type)
Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
Dear AstrobioMike,
Here, I got another error running my own data. I have searched for an answer; however, I still don't know why, since I just followed the code of the tutorial, and there was no problem with your data.
(Screenshots of my sample_info_tab and my count_tab are attached.)
The answer for a similar problem:
"colData should contain a row for each sample in the analysis, while countData should contain one column for each sample. Therefore the number of rows in colData should be the same as the number of columns in countData, and both should be equal to the number of samples in the study." https://www.biostars.org/p/418963/#418964
I don't know how to fix this error.
Thank you very much!

Best wishes

Long

[JOSE Review] DADA2 Full example workflow

You have a for loop in your example. Previous commands work well with copy/paste;
since this one does not, perhaps add some explanation to the learner about formatting?

Dada2 merging doubt

Hey Mike!

First of all, thank you a lot for your workflow for 16S data; I have learned a lot and it has helped me solve some issues. I have followed every step and get results comparable with those of a colleague who uses a different methodology, but I have noticed a big drop in retained reads at the merging step.
This is an output of the read count tracking tab:

dada2_input | filtered | dada_f | dada_r | merged | nochim | final_perc_reads_retained
26405 | 26405 | 20968 | 23301 | 2711 | 2630 | 10
12521 | 12521 | 6869 | 6914 | 680 | 680 | 5

I am using 2x100 bp reads generated using V3 and V4 primers. I have checked my raw data and it has fairly good quality. I have tried the same pipeline with and without trimming the primers, with similar results.
The best scenario is a case in which I get 79% retention with the following parameters: no primer trimming, maxEE=2,2, trimLeft=19,19, trimRight=10,10, maxQ=2.
This last case is even better than without trimming or filtering.

I find this a little odd, do you have any suggestions?

Thank you for your help!

De novo genome assembly

Hi Mike, I hope you are doing well.

I am new to bioinformatics. I have only one question: could I perform the genome assembly in the Mac terminal using the conda environment "de_novo_example"?

Thank you very much for your response

Error in running the process in my data

Describe the mistake, bug, or typo here:
Error in `$<-.data.frame`(`*tmp*`, color, value = character(0)) :
  replacement has 0 rows, data has 150

Link to the page containing the issue:
Line 283 sample_info_tab$color <- as.character(sample_info_tab$color)

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
Dear AstrobioMike,
Could you please help me fix the above problem? I have searched for a solution to this question and found one answer: "You need to check your variable name (column name); the reason for this problem is usually that the variable name in your function is different from the variable name (column name) in the data frame." However, I don't know how to adjust the parameters in your tutorial. The columns of my "sample_info_tab" include "group", "site", "Depth", and "Plant". May I run the following code before the step "sample_info_tab$color <- as.character(sample_info_tab$color)"?
names = c('group')
sample_info_tab[,names] <- sapply(sample_info_tab[,names] , factor)

Best wishes

Long

Broader taxonomy resolution after distinguishing ASVs

Hi Mike,

I noticed that you mention in a couple of places in the amplicon data section of your website that it may be possible to "go back and cluster [ASVs] into some form of OTU." Do you have any recommendations about how to do this that you would consider adding to the site or maybe can describe here?

I am working with a temporal dataset of 18S amplicon data and have about 160 ASVs. As I continue on the data analysis path, I am thinking about various ways to broaden the taxonomy a bit to pull out larger patterns in species occurrence. I also have thought about just grouping all of the "_sp" designated ASVs into their respective genus but there are some ASVs classified as "_sp." that occur at different times (winter species) than other "_sp." ASVs so I am trying to think of other ways to cluster that would still allow me to detect temporal patterns at a broader resolution. I hope that makes sense. Also thank you so much for all of the details on this site -- I am constantly returning to refresh my memory!

Thanks,
Diana

all the deposited samples from "Seafloor Basalts" paper seem to be the same (!)

Hey, hello Mike,
I was checking your dada2 microbiome tutorial and I started to play with the full data set from your paper:
The thing is that I downloaded all the files and after running some steps in my pipeline (qiime2/dada2), I got that every single sample has the exact same microbial composition. So I went and checked all the downloaded files and as far I can tell, they all are the same (!). I got them from NCBI and ENA to be sure about it. Just a few as an example:

seqkit stats *fastq.gz

file format type num_seqs sum_len min_len avg_len max_len Q1 Q2 Q3 sum_gap N50 Q20(%) Q30(%)
reads/PRJNA295345/SRR2397934/SRR2397934_1.fastq.gz FASTQ DNA 5887653 1750383981 32 297.3 301 300 300 300 0 300 97.29 92.14
reads/PRJNA295345/SRR2397934/SRR2397934_2.fastq.gz FASTQ DNA 5887653 1752648301 32 297.7 301 300 300 301 0 300 92.41 83.08
reads/PRJNA295345/SRR2398152/SRR2398152_1.fastq.gz FASTQ DNA 5887653 1750383981 32 297.3 301 300 300 300 0 300 97.29 92.14
reads/PRJNA295345/SRR2398152/SRR2398152_2.fastq.gz FASTQ DNA 5887653 1752648301 32 297.7 301 300 300 301 0 300 92.41 83.08
reads/PRJNA295345/SRR2398177/SRR2398177_1.fastq.gz FASTQ DNA 5887653 1750383981 32 297.3 301 300 300 300 0 300 97.29 92.14
reads/PRJNA295345/SRR2398177/SRR2398177_2.fastq.gz FASTQ DNA 5887653 1752648301 32 297.7 301 300 300 301 0 300 92.41 83.08
reads/PRJNA295345/SRR2398282/SRR2398282_1.fastq.gz FASTQ DNA 5887653 1750383981 32 297.3 301 300 300 300 0 300 97.29 92.14
reads/PRJNA295345/SRR2398282/SRR2398282_2.fastq.gz FASTQ DNA 5887653 1752648301 32 297.7 301 300 300 301 0 300 92.41 83.08
reads/PRJNA295345/SRR2398314/SRR2398314_1.fastq.gz FASTQ DNA 5887653 1750383981 32 297.3 301 300 300 300 0 300 97.29 92.14
reads/PRJNA295345/SRR2398314/SRR2398314_2.fastq.gz FASTQ DNA 5887653 1752648301 32 297.7 301 300 300 301 0 300 92.41 83.08

When I tried diff on subsets of the files, I found they are effectively the same sequences in different files, so it seems understandable why I got exactly the same microbial composition for every sample:

diff -y --suppress-common-lines SRR2397934_1.head SRR2398282_1.head | head
@SRR2397934.1 1 length=92                                     | @SRR2398282.1 1 length=92
+SRR2397934.1 1 length=92                                     | +SRR2398282.1 1 length=92
@SRR2397934.2 2 length=301                                    | @SRR2398282.2 2 length=301
+SRR2397934.2 2 length=301                                    | +SRR2398282.2 2 length=301
@SRR2397934.3 3 length=35                                     | @SRR2398282.3 3 length=35
+SRR2397934.3 3 length=35                                     | +SRR2398282.3 3 length=35
@SRR2397934.4 4 length=291                                    | @SRR2398282.4 4 length=291
+SRR2397934.4 4 length=291                                    | +SRR2398282.4 4 length=291
@SRR2397934.5 5 length=301                                    | @SRR2398282.5 5 length=301
+SRR2397934.5 5 length=301                                    | +SRR2398282.5 5 length=301
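A quick way to confirm byte-level duplication, rather than eyeballing diff output, is to checksum the files. This is a sketch on stand-in files; against the real data it would be something like `md5sum reads/PRJNA295345/*/*_1.fastq.gz` (use `md5` on macOS):

```shell
# Two stand-in "deposited" files with identical content.
printf '@read1\nACGT\n+\nIIII\n' > sampleA_1.fastq
cp sampleA_1.fastq sampleB_1.fastq

# Identical checksums across accessions mean byte-for-byte duplicates.
md5sum sampleA_1.fastq sampleB_1.fastq
```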


So that's it, I've been really confused about this the whole afternoon. What do you think?

[JOSE Review] General Comments

Nice work! Here is some overall feedback:

Nowhere is it made clear where to ask for help. It's probably unrealistic for you to
provide a high level of support for a site like this, but some links and explanation
about resources like Biostars, StackOverflow, etc.
would be a reasonable support option.

BASH and R lessons would benefit from some "thought" questions (i.e. what do you
think this does? What might happen if we try?). I think these two sections have
very helpful content, but to satisfy the pedagogical requirements of publication,
these sections need some clear statements of intended learning outcomes and
clear questions (summative and if possible formative) to help a learner self-assess.
Learners need to test to see if they understand a concept, and if they aren't understanding
they should get some clues on where to look if additional explanation
is needed.

There should be some explanation that the BASH and R sections are suited for novice
learners and that while these lessons may better help a learner understand the
Amplicon Analysis and Genomics tutorials, there is still a large learning gap
between those intros and full workflows.

Learning objectives are implicit in some places, but they could be more clearly stated.
The tutorials are thorough work, but without any kind of formative assessment,
it may be difficult for learners to transfer their understanding to a situation outside
the context of the tutorial.

hmm-sources-Campbell_et_al

Sorry to bother you. I have installed the HMMs with conda, but my HMM sources only include Ribosomal_RNAs, Protista_83, Bacteria_71, and Archaea_76. I have no idea how to add other sources, like Campbell_et_al and some others, to my local sources, so could you give me some suggestions about this?

[JOSE Review] R lessons

R Basics

Might it be a good idea to have screenshots of the RStudio environment?
(Consider using RStudio Cloud, or a VM.) Would a short video help here?
For example, people can use CyVerse VICE (a project I work on) to
launch Jupyter/RStudio/RShiny environments. It's still in its first release,
but I think it is working pretty well. I would be happy to talk if
there is interest in setting this up for you. (Follow-up: I see in the
DADA2 lesson you use Binder to launch RStudio, so maybe here too?)

Although RStudio is recommended, it's not clear why, since the tutorial does
not really cover the value-add of the RStudio environment.

DataCamp is listed as a recommendation for future learning. It is your choice on
what to recommend, but some have withdrawn recommendation due to
their handling of misconduct: https://carpentries.org/blog/2019/04/datacamp-response/

R Installing Packages
Bioconductor is mentioned later in the text, but there is not much of an introduction to it. Maybe a few sentences to make the concept clear (perhaps starting from your introduction
when you mention CRAN) that these are good places to search for packages.

Mistake, bug, or typo

Hi, thanks for the concise and clear doc about $PATH in https://astrobiomike.github.io/unix/modifying_your_path

I followed all the steps, but when typing ./what_time_is_it.sh from within my-bin,
I get the message
zsh: no such file or directory: ./what_time_is_it.sh

I assumed the problem was that it is a "zsh" shell. I changed it to a "bash" shell but do not know whether this was the problem, because... see below.

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
There are mistakes in the description: the correct spelling of the script file uses dashes, i.e. "what-time-is-it.sh", while in the description it is sometimes written with underscores, "what_time_is_it.sh".
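For what it's worth, the PATH mechanics can be reproduced from scratch like this (a sketch following the issue's names, not the tutorial's exact steps):

```shell
# Make a scripts directory holding a tiny executable script.
mkdir -p my-bin
printf '#!/bin/sh\ndate\n' > my-bin/what_time_is_it.sh
chmod +x my-bin/what_time_is_it.sh

# Running by relative path works as long as the file name matches exactly.
./my-bin/what_time_is_it.sh

# Prepending the directory to PATH lets us call the script by name alone;
# this works identically in bash and zsh.
export PATH="$PWD/my-bin:$PATH"
what_time_is_it.sh
```

A "no such file or directory" from either shell here almost always means the file name doesn't match exactly (e.g. dashes versus underscores), rather than a bash/zsh incompatibility.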

Regarding the full example workflow DADA2

Hi Mike! First of all, thanks for the tutorial and the clear info on your page!

I'm following both the DADA2 tutorial (https://benjjneb.github.io/dada2/tutorial.html) and your pipeline (https://astrobiomike.github.io/amplicon/dada2_workflow_ex#inferring-asvs).
I wanted to ask why, in your example, the ASV table creation comes before the merging of sequences?
It is the opposite in the DADA2 tutorial. With my limited experience in metabarcoding pipelines, I would tend to merge the reads first before inferring ASVs. Do you have any thoughts?

Thanks in advance and have a good day.
César

How to define vector for decontam in a merged sequence table?

I'm following your wonderful tutorial for DADA2 analysis of my datasets, and my heartfelt thanks for this noble initiative of yours. Recently I have had to run two batches of datasets independently through DADA2 and merge the sequence tables using the mergeSequenceTables function (as suggested in the DADA2 workflow for Big Data). When I do colnames(asv_tab), the output looks like the below. In the first technical batch, the first 4 samples represent the negative-control libraries, and in the second technical batch, the 92nd sample, "B1NEGATIVE", is the sole negative control. When defining the vector for decontam for the whole set of 150 samples, I would like to include the 92nd sample "B1NEGATIVE" as well. As in your tutorial, it's straightforward to flag the first 4 samples by doing vector_for_decontam <- c(rep(TRUE, 4), rep(FALSE, 146)), but I was wondering if there's a convenient way here to flag the 92nd sample "B1NEGATIVE" as well? I'm sure this is a silly thing, but unfortunately I'm quite hopeless with R, and I guess I have a sort of brain fog this morning. Another workaround I can think of is to load the data as phyloseq objects and run decontam after specifying, in the sample metadata file, an additional column stating whether each sample is a true sample or a control. I'm hoping to stick to your tutorial's way, as I personally feel it is a bit more convenient here; hence my silly request for a quick fix. Many thanks in advance!

[1] "B1EBNEGATIVE"          "B2T0NEGATIVE"  "B3T1NEGATIVE"  "B4TFNEGATIVE" 
[5] "Treat10T0"      "Treat10T1"      "Treat10TFINAL"  "Treat11T0"     
[9] "Treat11T1"      "Treat11TFINAL"  "Treat12T0"      "Treat12T1"     
[13] "Treat12TFINAL"  "Treat13T0"      "Treat13T1"      "Treat13TFINAL" 
[17] "Treat14T0"      "Treat14T1"      "Treat14TFINAL"  "Treat15T0"     
[21] "Treat15T1"      "Treat15TFINAL"  "Treat1T0"       "Treat1T1"      
[25] "Treat1TFINAL"   "Treat2T0"       "Treat2T1"       "Treat2TFINAL"  
[29] "Treat3T0"       "Treat3T1"       "Treat3TFINAL"   "Treat4T0"      
[33] "Treat4T1"       "Treat4TFINAL"   "Treat5T0"       "Treat5TFINAL"  
[37] "Treat6T0"       "Treat6T1"       "Treat6TFINAL"   "Treat7T0"      
[41] "Treat7T1"       "Treat7TFINAL"   "Treat8T0"       "Treat8T1"      
[45] "Treat9T0"       "Treat9T1"       "Treat9TFINAL"   "CCTRL10T0"    
[49] "CCTRL10T1"     "CCTRL10TFINAL" "CCTRL11T0"     "CCTRL11T1"    
[53] "CCTRL11TFINAL" "CCTRL12T0"     "CCTRL12T1"     "CCTRL12TFINAL"
[57] "CCTRL13T0"     "CCTRL13T1"     "CCTRL13TFINAL" "CCTRL14T0"    
[61] "CCTRL14T1"     "CCTRL14TFINAL" "CCTRL15T0"     "CCTRL15T1"    
[65] "CCTRL15TFINAL" "CCTRL1T0"      "CCTRL1T1"      "CCTRL1TFINAL" 
[69] "CCTRL2T0"      "CCTRL2T1"      "CCTRL2TFINAL"  "CCTRL3T0"     
[73] "CCTRL3T1"      "CCTRL3TFINAL"  "CCTRL4T0"      "CCTRL4T1"     
[77] "CCTRL5T0"      "CCTRL5T1"      "CCTRL5TFINAL"  "CCTRL6T0"     
[81] "CCTRL6T1"      "CCTRL6TFINAL"  "CCTRL7T0"      "CCTRL7T1"     
[85] "CCTRL7TFINAL"  "CCTRL8T0"      "CCTRL8T1"      "CCTRL8TFINAL" 
[89] "CCTRL9T0"      "CCTRL9T1"      "CCTRL9TFINAL"  "B1NEGATIVE"           
[93] "FTreat10T0"       "FTreat10T1"       "FTreat10T2"       "FTreat1T0"       
[97] "FTreat1T1"        "FTreat1T2"        "FTreat2T0"        "FTreat2T1"       
[101] "FTreat2T2"        "FTreat3T0"        "FTreat3T1"        "FTreat3T2"       
[105] "FTreat4T0"        "FTreat4T1"        "FTreat4T2"        "FTreat5T0"       
[109] "FTreat5T1"        "FTreat5T2"        "FTreat6T0"        "FTreat6T1"       
[113] "FTreat6T2"        "FTreat7T0"        "FTreat7T1"        "FTreat7T2"       
[117] "FTreat8T0"        "FTreat8T1"        "FTreat8T2"        "FTreat9T0"       
[121] "FTreat9T1"        "FTreat9T2"        "CTRL10T0"      "CTRL10T1"     
[125] "CTRL10T2"      "CTRL1T1"       "CTRL1T2"       "CTRL2T0"      
[129] "CTRL2T1"       "CTRL2T2"       "CTRL3T0"       "CTRL3T1"      
[133] "CTRL3T2"       "CTRL4T0"       "CTRL4T2"       "CTRL5T0"      
[137] "CTRL5T1"       "CTRL5T2"       "CTRL6T0"       "CTRL6T1"      
[141] "CTRL6T2"       "CTRL7T0"       "CTRL7T1"       "CTRL7T2"      
[145] "CTRL8T0"       "CTRL8T1"       "CTRL8T2"       "CTRL9T0"      
[149] "CTRL9T1"       "CTRL9T2"

Mistake, bug, or typo

Describe the mistake, bug, or typo here:
Link to Slack channel at bottom of page not working. Error is:
'ZEIT Now 1.0 has been shut down. As part of this, all ZEIT Now 1.0 Deployments are now inaccessible.'

Link to the page containing the issue:
https://astrobiomike.github.io/unix/getting-started
At bottom of page

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
Not good enough to help. Just wanted you to know!

[JOSE Review] BASH lesson

Setup instructions use curl, but that may not be installed on all computers;
perhaps include instructions for installing it.
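One way the setup instructions could hedge against a missing curl is a small availability check; the URL below is a placeholder, not the lesson's actual download link:

```shell
url="https://example.com/setup-files.tar.gz"   # placeholder for illustration

# Use whichever downloader exists; fail with a clear message otherwise.
if command -v curl > /dev/null; then
    echo "downloading with curl"
    # curl -L -o setup-files.tar.gz "$url"
elif command -v wget > /dev/null; then
    echo "downloading with wget"
    # wget -O setup-files.tar.gz "$url"
else
    echo "Please install curl or wget first (e.g. via apt, yum, or Homebrew)." >&2
    exit 1
fi
```

(The actual download commands are commented out so the sketch runs without touching the network.)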

Nano issue

Hello Mike,

Firstly, I would like to say thank you for the tutorials! They are very helpful. As for what I have noticed: loading the link to the Binder on the page for "An Introduction to Scripting", nano is not pre-loaded into the terminal. This means that you cannot proceed with the tutorial. I have tried looking up how to install nano in the notebook but have had no luck, so if it is something I am doing, please let me know! The error is: bash: nano: command not found

Link to the page containing the issue:
https://astrobiomike.github.io/unix/scripting#binder-available

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
I have been coming back on and off to do these lessons, so each time I would go to a previous page to load the Binder, which solves the issue, as nano is installed on the links prior to this one, such as https://astrobiomike.github.io/unix/getting_unix_env

HappyBelly workflow change: cutadapt vs bbmap/bbduk

Hi Mike -
I am getting back to a series of datasets that I had processed months back. I had been referencing your Happy Belly site (which is such a great help!). I've run these datasets through DADA2, but I noticed that you've changed your workflow from using bbmap/bbduk to using cutadapt. Any reason I shouldn't use bbmap/bbduk?

I have 3 data sets that I am merging (2 were run with 454 and one with PE MiSeq Illumina). My plan is to run them through DADA2 and then merge them before moving to phyloseq.

one example command I'd used:
bbmap/bbduk.sh in=BKG.14_R1.fq out=BKG.14_R1_trimmed.fq literal=AGRGTTTGATCMTGGCTCAG k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=404 maxlength=503 tbo

many thanks!
Lara

Error in # generating and visualizing the PCoA with phyloseq

Describe the mistake, bug, or typo here:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'unique': argument 1 is not a vector

Link to the page containing the issue:
Line 337-342

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
Dear Mike,
I am facing a new problem running this step.
This is the code from the tutorial:
plot_ordination(vst_physeq, vst_pcoa, color="char") +
geom_point(size=1) + labs(col="type") +
geom_text(aes(label=rownames(sample_info_tab), hjust=0.3, vjust=-0.4)) +
coord_fixed(sqrt(eigen_vals[2]/eigen_vals[1])) + ggtitle("PCoA") +
scale_color_manual(values=unique(sample_info_tab$color[order(sample_info_tab$char)])) +
theme_bw() + theme(legend.position="none")
and this is the code I revised based on my own data:
plot_ordination(vst_physeq, vst_pcoa, color="Plant") +
geom_point(size=1) + labs(col="group") +
geom_text(aes(label=rownames(sample_info_tab), hjust=0.3, vjust=-0.4)) +
coord_fixed(sqrt(eigen_vals[2]/eigen_vals[1])) + ggtitle("PCoA") +
scale_color_manual(values=unique(sample_info_tab$color[order(sample_info_tab$char)])) +
theme_bw() + theme(legend.position="none")
The error message is the same as quoted above.
I don't know how to fix it. I tried removing color="Plant" and labs(col="group"), but it still gave the same error.
Could you please help me fix this problem?

Best wishes
Long

Support for SILVA v138.1

I'm new to this sort of analysis, and thanks a zillion for this excellent tutorial! I was wondering if you could perhaps include steps for using the latest version of the SILVA database (v138.1), since this database version doesn't seem to be available through DECIPHER. I did the below instead of using DECIPHER, but (obviously) the rest of your R command lines then break. Perhaps this is too much to ask, but I would appreciate any quick-fix ideas you might have. Also, I'm a newbie to R, which is not helping things here either.

tax_info <- assignTaxonomy(seqtab.nochim, "/home/cvb/tools/dada2_db/silva_nr99_v138.1_train_set.fa", multithread=TRUE); save.image(file="tax-info.RData")
load("tax-info.RData")

Specifically, the below is what breaks

asv_tax <- t(sapply(tax_info, function(x) {
  m <- match(ranks, x$rank)
  taxa <- x$taxon[m]
  taxa[startsWith(taxa, "unclassified_")] <- NA
  taxa
}))
Error in x$rank : $ operator is invalid for atomic vectors

Bioconductor installation instructions

Describe the mistake, bug, or typo here:
The installation instructions for Bioconductor packages are based on the biocLite() function. However, Bioconductor packages are currently installed via the BiocManager package, see https://www.bioconductor.org/install/

Link to the page containing the issue:
https://github.com/AstrobioMike/AstrobioMike.github.io/blob/master/R/installing_packages.md

Suggested fix – if any, just reporting what's wrong is super-helpful either way :)
Update the Bioconductor installation instructions :)
