carpentries-lab / metagenomics-analysis
Data Processing and Visualization for Metagenomics
Home Page: https://carpentries-lab.github.io/metagenomics-analysis/
License: Other
Reviewer's comments:
Reviewer's comment: Most exercises are command-line based; adding more conceptual questions or discussion could help evaluate understanding and relevance of the topic better.
I think we must have an automated step
Although jumping between the Metagenomics and the "Introduction to the Command line for Genomics" lessons was good enough, there are a bunch of bash commands required to parse metagenomics files, mostly in the Diversity analysis lesson. I think teaching Shell could be better integrated; ideally, the relevant lessons from "Introduction to the command line for genomics" should be carried out using the data to be used in the metagenomics pipeline.
Reviewer's comment: It would be useful to provide the file cuatroc.biom, which is used in the episode Diversity Tackled With R (in case learners encounter issues creating it).
To do: Make sure that the file is in the hidden backup directory and put a note somewhere about it.
The R section needs more analysis; maybe include further examples from Phyloseq
Automating a Quality control Workflow
Replace this episode with a box at the end of the trimming episode and reference the genomics episode.
Paul and Abel can create a box describing software and references for Nanopore, and why Nanopore differs from Illumina
Nelly
Adapter files are not in the correct location. They must be moved to the right location, and this must be updated in Zenodo.
Software Carpentry already has a pipeline for 16S at https://nwu-eresearch.github.io/2017-10-24-ARC-16S/ and there are 4 Cienegas data available at MG-RAST that may be included
Reviewer's comment: It is quite a dense lesson and although the structure is appropriate for learners, I think it could be too much for people who have never used the command line. The exercises would definitely help ease them into the shell. I don't really have any practical suggestions other than considering splitting this part or maybe adding more practical exercises in between.
Reviewer's comment: Throughout the lesson, there are several small typos that need fixing. :)
To do: Review all episodes for typos and correct them.
Reviewer's comment: In the Data Tidiness episode, the "Structuring data in spreadsheets" section should also contain a definition of sample/observation versus variable.
To do: Add definition of sample/observation and variable.
Reviewer's comments:
1) In the Working with Files and Directories episode, the "Examining Files" section starts with the cat command, but it's not actually used. This might increase confusion, so it might need to be removed.
2) In the Working with Files and Directories episode, the "Details on the FASTQ format" section is rather disconnected from the rest of the episode. It does provide useful context, but it could either be added as an optional/note box or moved to the episode on analyzing the FASTQ files.
Reviewer's comments:
1) The Windows installation instructions for Git are not up-to-date (there are additional steps in the installer - as of 27/06/2022). Additionally, for the Mac instructions, the link to the software could be added directly alongside the helpful video (for convenience).
Option B: there are several links to the manual pages of the required software that are missing. This poses a challenge when trying to find the installation instructions (a clear case is the CheckM-genome software that requires a bit of an effort to find the install instructions). Additionally, the version of trimmomatic is listed as 0.38, but only versions 0.37/0.39/0.40 are available on the linked page. Finally, the link to the MaxBin2 software links back to the workshop page.
A local installation through conda was challenging, as the current versions led to multiple conflicts that could not be resolved. This was equally true both when installing each tool independently, and when using the provided yaml file. However, using the specification file (listed under Option A), it was possible to create a working environment. It may be useful to highlight the specification file also as an option to create a local instance (including the few commands that need to be executed as the final step), as well as for setting up the AWS instance.
The table "Software for Bash" might need some edits, e.g. removing or updating the "Available for" entries, and also moving the description of the KronaTools to the appropriate column.
In https://carpentries-incubator.github.io/metagenomics/07-Diversity-tackled-with-R/index.html, the exercise/solution block for exercise 2 seems to be misformatted.
Reviewer's comments:
1) The image https://carpentries-incubator.github.io/shell-metagenomics/fig/02-01-01.png does not have a sufficiently detailed alt text.
2) The image https://carpentries-incubator.github.io/shell-metagenomics/fig/02-02-01.svg does not have a sufficiently detailed alt text.
3) The alt text of image https://carpentries-incubator.github.io/shell-metagenomics/fig/02-05-01.png could be slightly more descriptive.
4) The image https://user-images.githubusercontent.com/67386612/118720027-ba433c00-b7ee-11eb-87e5-7496fde5763e.png should have a more detailed alt text (that being said, the Figure 1 caption should be sufficient for this).
5) The alt text of image https://carpentries-incubator.github.io/metagenomics/fig/03-02-03.png has some phrasing issues, and could also be enhanced.
6) The alt text of image https://carpentries-incubator.github.io/metagenomics/fig/03-03-01.png could be enriched with more information.
7) There is insufficient alt text in the images https://carpentries-incubator.github.io/metagenomics/fig/03-04-01.png and https://carpentries-incubator.github.io/metagenomics/fig/03-05-01.png
8) There is insufficient alt text in the image https://carpentries-incubator.github.io/metagenomics/fig/03-06-01.png
To do: Improve the alternative text in the mentioned images following the alternative text guidelines: UCSF: Accessible Images Best Practices
The Big Hack: Avoid these common alt-text mistakes.
Also review the alternative text in the rest of the images not mentioned here.
Reviewer's comments:
In the Writing Scripts and Working with Data episode, there is no proposed text to write in nano. Although it's understood that this allows for more creativity from the learners, it may be useful to add an example for guidance. Such an example will also assist in the flow of the section on "Writing files", as it is currently a bit unclear.
In the Writing Scripts and Working with Data episode, the sentence "You will learn more about writing scripts in a later lesson." links back to the same episode.
In the Writing Scripts and Working with Data episode, the "Transferring data between your local machine and the cloud" needs to be adapted to also fit the case of a local installation - or be provided with a possible alternative.
In the Writing Scripts and Working with Data episode, the section on "Versioning scripts with Git and GitHub" could cause confusion, given the target audience. Although knowledge of Git is undoubtedly a useful skill, it may not be easily connected here.
Reviewer's comments:
1) Compared to the content of the first three lessons, this lesson is much more dense, both in terms of content as well as in topics per episode. It could be considered to split some of the episodes into smaller ones.
2) This is the most dense lesson, I believe (at least the longest). I like the flow of the lesson and the contents but splitting it into two parts would be beneficial for the learners, in my honest opinion.
3) A more general comment is regarding the overall distribution of the load across the workshop. In its current form, it is a bit "heavier" towards the end. Although this is fully understandable, as a lot of the metagenomics-specific topics are tackled at that time, it may be useful to review, and possibly split, some of the episodes into smaller ones. Another approach would be to ensure that, in a 2-day workshop context, some of the first episodes of the Data processing and visualization for metagenomics lesson are tackled during Day 1, thus distributing the load a bit.
Claudia's suggestion:
Reviewer's comment: The overall content of this episode might be misleading, compared to the actual title of "Intro to R for metagenomics". Also, the dataset used (i.e. musicians) is not directly connected to metagenomics, so a more relevant toy dataset could be constructed for these purposes.
To do: Decide whether to change the name of the lesson or change the content so that it is more about metagenomics.
Reviewer's comment: Some of the figures in Taxonomic Analysis with R could be improved for color-blindness.
To do: Wait until the structure of the R episodes is improved and the code works to change the abundance graphs so they use a colorblind-friendly palette.
Reviewer's comments: As a suggestion, I noticed throughout the lesson the use of the period in variable names (v.examp) but in R it is also used in functions (as.logical()) and could be confusing. Variable names could be just single words to avoid confusion.
Reviewer's comments:
To do: Review all discussions in the lesson. If appropriate make them diagnostic and add solutions. Correct typos and phrasing.
Reviewer's comments:
Kraken-biom is used in the R section, so we need to introduce the biom format
The Diversity analysis required jumping from the command line on the remote computer to RStudio on each student's local computer. This brought incompatibilities in versions and OSs. It would be better to do as much as possible in the same interface and bring files to the local computer only when it is most likely to work for everyone (e.g. just the JPG figures from R)
Reviewer's comment: In the Examining Data on the NCBI SRA Database episode, the previous version of the Zenodo dataset is linked (sequencing dataset (from Okie, et al. 2020) adapted for this workshop), compared to the dataset linked in the Setup page.
To do: Put the permanent DOI
Mathematical formulae are displayed as images in Diversity Tackled with R without alternative text. These images are inaccessible for anyone visiting the lesson using a screen reader.
To make the lesson more accessible, and easier to maintain, the lesson template allows you to replace these images with MathJax elements. Follow the steps below to replace these images.
1) Add math: yes to the bottom of your _config.yml file.
2) Replace the images with the equations written in LaTeX and wrapped in double $ signs, e.g. $$E(y) = \beta_0 + \beta_1 \times x_1 + \beta_2 \times x_2.$$
For an example of what I describe above, you can browse the _config.yml and episode page source of the Multiple Linear Regression for Public Health lesson in the Incubator.
See the Lesson Example for further documentation of this feature of the lesson template.
We do not have a binning section, and it would be nice to have one
Reviewer's comment: In the Assessing Read Quality episode, the "Exercise 2: Looking at files metadata" could be better structured, in order for each answer to provide a different context to the original question.
Reviewer's comment: In the Data Tidiness section, it could be worth briefly mentioning confidentiality issues that may arise from storing and sharing the metadata, i.e. in the case of human samples or patents.
To do: Add a paragraph or box about confidentiality in the Data Tidiness episode.
Reviewer's comment: In Intro to R for metagenomics, some of the screenshots have a Spanish localization (as well as the names of a few variables). This is not an issue by itself, but it would be useful to be consistent across the material.
Reviewer's comment: I think the audience not only needs to "have some familiarity with biological concepts, such as that each living organism has a genome"; some understanding of what sequencing techniques are and what they can achieve would also be greatly beneficial. Otherwise, the introduction should elaborate more on that.
To do: Decide whether to modify the prerequisite or add an explanation about sequencing before the current Introduction in the Data Tidiness episode.
In the taxonomic assignment part we show how to download the minikraken database, but the kraken-db is the one used in the kraken command. Since we will not run that command anyway we should put the example of download for the kraken-db and not the minikraken db, or both.
Make the pipeline save the plots, and open them from a file so that they can be viewed.
As it is for other Data Carpentry lessons, a workshop overview to include:
Examples:
https://datacarpentry.org/ecology-workshop/
https://datacarpentry.org/genomics-workshop/
In this episode nothing is learned when the min, max and mean lines are run. Their purpose is not clear.
https://carpentries-incubator.github.io/metagenomics/09-diversity-analysis/index.html
Reviewer's comment: In the NCBI SRA section (last), some images are quite difficult to describe, especially those of the process of downloading data from SRA. Complete transcriptions could be provided for images that consist only of text.
To do: Provide complete transcription of images with text.
Instructor notes must be developed for the Intro to R and Data processing lessons here: https://carpentries-incubator.github.io/metagenomics-workshop/guide/index.html
Conclusions of the paper and from other studies should be included in the lesson as discussion topics and exercises.
The FAQ and data sites are still pointing to the genomics workshop
https://nselem.github.io/metagenomics-workshop/
These sites need to be adapted. We also need to find some place to credit the authors of the genomics lesson
Reviewer's comment:
To do: Remove the link from the exercise solution. Add links to all the datatypes. Put everything in English.
Reviewer's comment: The first two episodes have the exact same LO defined ("What types of data does the R language has?").
To do: Put that learning objective only in the appropriate episode and remove it from the other.
Reviewer's comments:
To do: Improve the sequence of steps and remove confusing mentions of the old run selector.
Reviewer's comments:
Reviewer's comment: In the Data Tidiness episode, the "Data about the experiment" might be a bit confusing, as it currently refers to three different templates (file, Here and the guide). It may be more practical to focus on one, and have the rest as optional choices (although they appear to have a significant overlap).
To do: Improve the explanation about README files so we refer only to one thing. And if we are mentioning the other options give them different names.
Data files have very long, unreadable names. Given the size of the shell window, it quickly becomes very difficult to follow when multiple arguments and files appear together in commands. I think it would be more didactic if filenames were shorter and more descriptive. I think a simple cp + mv would work at the beginning, when also teaching the importance of keeping raw data untouched.
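A minimal sketch of the cp + mv idea, assuming a scratch directory and made-up file names (the directory layout, the long names, and the short names s1_R1.fastq and s1_R2.fastq are all hypothetical, not the workshop's real files). The raw files are copied, never moved, so the originals stay untouched, and only the copies are renamed:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: a scratch directory stands in for the learner's home.
workdir="$(mktemp -d)"
mkdir -p "$workdir/raw_data" "$workdir/working"

# Placeholder files with long, made-up names (assumption: not the real data).
touch "$workdir/raw_data/SAMPLE_0001_long_run_identifier_R1.fastq"
touch "$workdir/raw_data/SAMPLE_0001_long_run_identifier_R2.fastq"

# Copy (do not move) the raw files, so raw_data keeps the originals.
cp "$workdir"/raw_data/*.fastq "$workdir/working/"

# Rename only the copies to short, descriptive names.
mv "$workdir/working/SAMPLE_0001_long_run_identifier_R1.fastq" "$workdir/working/s1_R1.fastq"
mv "$workdir/working/SAMPLE_0001_long_run_identifier_R2.fastq" "$workdir/working/s1_R2.fastq"

# Show both directories: long names in raw_data, short names in working.
ls "$workdir/raw_data" "$workdir/working"
```

Later commands in the lesson could then refer to the short names in the working directory, which keeps multi-argument commands readable in a narrow shell window.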
Factors are used in episode 9, so they should be explained in the R section.
The "Diversity analysis" and the "Taxonomic Assignation" lessons each lasted for over 1.5 hours. I think both of them should be split into two lessons.