
carpentries-lab / metagenomics-analysis

Data Processing and Visualization for Metagenomics

Home Page: https://carpentries-lab.github.io/metagenomics-analysis/

License: Other

Ruby 0.82% Makefile 7.96% R 7.24% Shell 4.76% Python 77.04% HTML 2.17%
lesson metagenomics english life-sciences carpentries-lab peer-reviewed stable

metagenomics-analysis's People

Contributors

aaren, aaronejaime, abbycabs, abought, abrahamavelar, bedxxe, betterlabmx, czirion, davidalberto, edderdaniel, erinbecker, fabel134, fmichonneau, jdblischak, jduckles, joaorodrigues, jsta, k8hertweck, katrinleinweber, kerchner, maneesha, mawds, maxim-belkin, nselem, pbanaszkiewicz, raynamharris, rgaiacs, tbekolay, vanessaarfer, wking

metagenomics-analysis's Issues

Improve Introducing the Shell

Reviewer's comments:

  1. In the Introducing the Shell episode, a consideration could be made for accessing the shell when having a local installation (i.e. not through ssh).
  2. A short disclaimer could be added about the differences in output between an AWS instance and a local instance (in order to avoid confusion).
  3. In the Introducing the Shell episode, the "Shortcut: Tab Completion" section assumes a different starting point from where the learners might be by that time.

Integrate Shell lesson within the metagenomics lesson

Although jumping between the Metagenomics lesson and the "Introduction to the Command line for Genomics" lesson worked well enough, several bash commands are needed to parse metagenomics files, mostly in the Diversity analysis lesson. I think teaching the shell could be better integrated: ideally, the relevant episodes from "Introduction to the command line for genomics" would use the same data as the metagenomics pipeline.

Provide files created during the lesson

Reviewer's comment: It would be useful to provide the file cuatroc.biom, which is used in the Diversity Tackled With R episode, in case learners encounter issues creating it.

To do: Make sure that the file is in the hidden backup directory and add a note about it in the episode.

Adapter files

Adapter files are not in the correct location. They must be moved to the right location, and the Zenodo record must be updated accordingly.

Intro to Command Line too dense

Reviewer's comment: It is quite a dense lesson, and although the structure is appropriate for learners, I think it could be too much for people who have never used the command line. The exercises would definitely help ease them into the shell. I don't really have any practical suggestions other than considering splitting this part or adding more practical exercises in between.

Typos

Reviewer's comment: Throughout the lesson, there are several small typos that need fixing. :)

To do: Review all episodes for typos and correct them.

Improve Working with Files and Directories

Reviewer's comments:
1) In the Working with Files and Directories episode, the "Examining Files" section starts with the cat command, but the command is never actually used. This might cause confusion, so it may need to be removed.
2) In the Working with Files and Directories episode, the "Details on the FASTQ format" section is rather disconnected from the rest of the episode. It does provide useful context, but it could either be turned into an optional note box or moved to the episode on analyzing FASTQ files.
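For context on these two points: a FASTQ file stores each read as four lines (an identifier line starting with `@`, the sequence, a separator line starting with `+`, and a quality string as long as the sequence). The sketch below is only an illustration, not lesson material: it builds a made-up two-read file in a temporary directory, actually uses `cat` to inspect it, and counts reads by dividing the line count by four.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no lesson files are touched.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1

# Hypothetical two-read FASTQ file: each record is exactly four lines
# (@identifier, sequence, + separator, per-base quality string).
cat > example.fastq << 'EOF'
@read_1
GATTACAGATTACA
+
IIIIIIIIIIIIII
@read_2
CCGGTTAACCGGTT
+
IIIIIHHHHHGGGG
EOF

# cat prints the whole file, which is fine for a tiny file like this one.
cat example.fastq

# Every record has four lines, so the number of reads is the line count / 4.
echo "reads: $(( $(wc -l < example.fastq) / 4 ))"
```

On real data, `head -n 4` is usually a better first look than `cat`, since FASTQ files can hold millions of reads.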

Required software setup section

Reviewer's comments:
1) The Windows installation instructions for Git are not up to date (the installer has additional steps as of 27/06/2022). Also, for the Mac instructions, a direct link to the software could be added for convenience, in addition to the helpful video.
2) Option B: several links to the manual pages of the required software are missing. This makes it hard to find the installation instructions (a clear case is the CheckM-genome software, whose installation instructions take some effort to find). Additionally, the version of Trimmomatic is listed as 0.38, but only versions 0.37/0.39/0.40 are available on the linked page. Finally, the link to the MaxBin2 software points back to the workshop page.
3) A local installation through conda was challenging, as the current versions led to multiple conflicts that could not be resolved. This was equally true when installing each tool independently and when using the provided YAML file. However, it was possible to create a working environment using the specification file (listed under Option A). It may be useful to highlight the specification file as an option for creating a local installation (including the few commands that need to be executed as the final step), as well as for setting up the AWS instance.
4) The table "Software for Bash" might need some edits, e.g. removing or updating the "Available for" entries and moving the description of KronaTools to the appropriate column.

Alternative text of images

Reviewer's comments:
1) The image https://carpentries-incubator.github.io/shell-metagenomics/fig/02-01-01.png does not have a sufficiently detailed alt text.
2) The image https://carpentries-incubator.github.io/shell-metagenomics/fig/02-02-01.svg does not have a sufficiently detailed alt text.
3) The alt text of image https://carpentries-incubator.github.io/shell-metagenomics/fig/02-05-01.png could be slightly more descriptive.
4) The image https://user-images.githubusercontent.com/67386612/118720027-ba433c00-b7ee-11eb-87e5-7496fde5763e.png should have a more detailed alt text (that said, the Figure 1 caption should be sufficient for this).
5) The alt text of image https://carpentries-incubator.github.io/metagenomics/fig/03-02-03.png has some phrasing issues and could also be improved.
6) The alt text of image https://carpentries-incubator.github.io/metagenomics/fig/03-03-01.png could be enriched with more information.
7) The alt text of the images https://carpentries-incubator.github.io/metagenomics/fig/03-04-01.png and https://carpentries-incubator.github.io/metagenomics/fig/03-05-01.png is insufficient.
8) The alt text of the image https://carpentries-incubator.github.io/metagenomics/fig/03-06-01.png is insufficient.

To do: Improve the alternative text of the images mentioned above, following the guidelines in "UCSF: Accessible Images Best Practices" and "The Big Hack: Avoid these common alt-text mistakes". Also review the alternative text of the remaining images not mentioned here.

Improve Writing Scripts and Working with Data

Reviewer's comments:

  1. In the Writing Scripts and Working with Data episode, there is no proposed text to write in nano. Although it's understood that this allows for more creativity from the learners, it may be useful to add an example for guidance. Such an example will also assist in the flow of the section on "Writing files", as it is currently a bit unclear.

  2. In the Writing Scripts and Working with Data episode, the sentence "You will learn more about writing scripts in a later lesson." links back to the same episode.

  3. In the Writing Scripts and Working with Data episode, the "Transferring data between your local machine and the cloud" section needs to be adapted to also cover a local installation, or an alternative needs to be provided.

  4. In the Writing Scripts and Working with Data episode, the section on "Versioning scripts with Git and GitHub" could confuse the target audience. Although Git is undoubtedly a useful skill, it does not connect easily to the rest of this episode.
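As one possible answer to the first comment, learners could type a short script like the one below into nano (for example saved as `read_counts.sh`; the name, like the sample file names, is only a suggestion). It loops over the FASTQ files in the current directory and reports how many reads each contains. So that the sketch runs anywhere, it first creates two tiny made-up FASTQ files in a temporary directory and writes the script with a heredoc instead of nano.

```shell
#!/usr/bin/env bash
# Throwaway directory with two tiny, made-up FASTQ files (1 and 2 reads).
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
printf '@r1\nACGT\n+\nIIII\n' > sampleA.fastq
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > sampleB.fastq

# The heredoc body is what a learner might type into nano as read_counts.sh.
cat > read_counts.sh << 'EOF'
#!/usr/bin/env bash
# Print each .fastq file name and its read count (4 lines per read).
for file in *.fastq
do
    lines=$(wc -l < "${file}")
    echo "${file} $(( lines / 4 ))"
done
EOF

# Run the script and keep its report in a file that learners can inspect.
bash read_counts.sh > read_counts.txt
cat read_counts.txt
```

Writing the report to a file also gives the "Writing files" section a concrete result to point to.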

Data processing and visualization too dense

Reviewer's comments:
1) Compared to the first three lessons, this lesson is much denser, both in content and in topics per episode. Splitting some of the episodes into smaller ones could be considered.
2) I believe this is the densest lesson (at least the longest). I like its flow and contents, but in my honest opinion, splitting it into two parts would benefit the learners.
3) A more general comment concerns the overall distribution of the load across the workshop. In its current form, the workshop is a bit "heavier" towards the end. Although this is fully understandable, since many of the metagenomics-specific topics are tackled at that point, it may be useful to review, and possibly split, some of the episodes into smaller ones. Another approach would be to ensure that, in a 2-day workshop, some of the first episodes of the Data processing and visualization for metagenomics lesson are covered on Day 1, distributing the load more evenly.

Claudia's suggestion:

  1. Move the ggplot section from Taxonomic Analysis with R to a new episode in Intro to R.
  2. Make a new episode to introduce phyloseq and build the phyloseq objects and arrangements needed for the diversity plots and taxonomic analysis.
  3. Keep only the diversity theory, the alpha diversity plot and the beta diversity plot in the Diversity episode.
  4. Keep only the abundance plots in the Taxonomic analysis episode.

Intro to R for metagenomics is misleading

Reviewer's comment: The overall content of this episode might be misleading, compared to the actual title of "Intro to R for metagenomics". Also, the dataset used (i.e. musicians) is not directly connected to metagenomics, so a more relevant toy dataset could be constructed for these purposes.

To do: Decide whether to rename the lesson or change its content to focus more on metagenomics.

Variable names in Intro to R

Reviewer's comment: Throughout the lesson, periods are used in variable names (e.g. v.examp), but in R periods also appear in function names (e.g. as.logical()), which could be confusing. As a suggestion, variable names could be single words to avoid confusion.

Improve discussions in Project Organization and Management lesson

Reviewer's comments:

  1. Several discussion exercises could be given a diagnostic purpose.
  2. In the Data Tidiness episode, the "Discussion 2" box has a few typos and phrasing issues that need to be addressed.
  3. In the Planning for NGS Projects episode, it would be useful to have some potential discussion "solutions" after "Discussion 1" box.

To do: Review all discussions in the lesson. Where appropriate, make them diagnostic and add solutions. Correct typos and phrasing.

Improve Trimming and Filtering episode

Reviewer's comments:

  1. In the Trimming and Filtering episode, it may be useful to provide the Trimmomatic adapter file (cp ~/.miniconda3/pkgs/trimmomatic-0.38-0/share/trimmomatic-0.38-0/adapters/TruSeq3-PE.fa .) as a download in the lesson material, in case the command fails because a different version is installed.
  2. Idea: in the Trimming and Filtering section, give an example of what a multi-line command looks like.
  3. Maybe explain Anaconda/Miniconda, since they appear in some commands. There is a box about conda, but the relationship between conda and Anaconda/Miniconda is not clear. To do: Check which episodes mention them and choose a place for this explanation.
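For the second idea, a long Trimmomatic call can be split over several lines by ending each line with a backslash, which tells the shell that the command continues on the next line. In the sketch below, the leading `echo` makes it a dry run that only prints the joined command, so it runs even where Trimmomatic is not installed. The sample file names are hypothetical and the trimming parameters are illustrative; they should be checked against the lesson's own command before use.

```shell
#!/usr/bin/env bash
# Each trailing backslash continues the command on the next line;
# nothing, not even a space, may follow the backslash.
# The leading "echo" turns this into a dry run that just prints the command.
echo trimmomatic PE \
    sample_R1.fastq.gz sample_R2.fastq.gz \
    sample_R1.trim.fastq.gz sample_R1un.trim.fastq.gz \
    sample_R2.trim.fastq.gz sample_R2un.trim.fastq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:40:15 SLIDINGWINDOW:4:20 MINLEN:35
```

The printed output is a single line, which shows that the shell treats the five physical lines as one command. Removing `echo` would run Trimmomatic for real.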

Add R to the remote computer

The Diversity analysis required jumping from the command line on the remote computer to RStudio on each student's local computer. This caused incompatibilities between versions and operating systems. It would be better to do as much as possible in the same interface and only bring files to the local computer when this is most likely to work for everyone (e.g. just the JPG figures from R).

Replace formula images with MathJax

Mathematical formulae are displayed as images in Diversity Tackled with R without alternative text. These images are inaccessible for anyone visiting the lesson using a screen reader.

To make the lesson more accessible, and easier to maintain, the lesson template allows you to replace these images with MathJax elements. Follow the steps below to replace these images.

  1. Add the line math: yes to the bottom of your _config.yml file.
  2. In the episode file, replace the images of formulae with the LaTeX equivalent, wrapped in $ signs, e.g.
$$E(y) = \beta_0 + \beta_1 \times x_1 + \beta_2 \times x_2.$$

For an example of what I describe above, you can browse the _config.yml and episode page source of the Multiple Linear Regression for Public Health lesson in the Incubator.

See the Lesson Example for further documentation of this feature of the lesson template.

Add confidentiality in Data Tidiness

Reviewer's comment: In the Data Tidiness section, it could be worth briefly mentioning the confidentiality issues that may arise from storing and sharing metadata, e.g. in the case of human samples or patents.

To do: Add a paragraph or box about confidentiality in the Data Tidiness episode.

Spanish names

Reviewer's comment: In Intro to R for metagenomics, some of the screenshots (as well as the names of a few variables) use Spanish localization. This is not an issue by itself, but it would be useful to be consistent across the material.

Workshop prerequisites

Reviewer's comment: I think the audience not only needs to "have some familiarity with biological concepts, such as that each living organism has a genome", but would also greatly benefit from some understanding of what sequencing techniques are and what they can achieve; otherwise, the introduction should elaborate more on that.

To do: Decide whether to modify the prerequisites or add an explanation about sequencing before the current Introduction in the Data Tidiness episode.

Database for taxonomic assignment

In the taxonomic assignment part, we show how to download the minikraken database, but kraken-db is the one used in the kraken command. Since we will not run that command anyway, we should show how to download kraken-db instead of (or in addition to) the minikraken database.

R does not display plots

Make the pipeline save the plots to files and open them from those files so that they can be viewed.

Image transcription in Examining Data on the NCBI SRA Database

Reviewer's comment: In the NCBI SRA section (the last one), some images are quite difficult to describe, especially those showing the process of downloading data from SRA. Complete transcriptions could be provided for images that consist only of text.

To do: Provide complete transcription of images with text.

Improve R datatypes

Reviewer's comment:

  1. In the R datatypes episode, there is a non-rendered link in the solution of Exercise 1.
  2. In the Types of data section, only the integer data type is hyperlinked. I don't know if this is on purpose, but maybe all data types could link to a more detailed description.
  3. In this Types of data section there is a mix of English and Spanish ("resultado <- "4 and 3 are not the same in Earth. In Mars maybe... ").

To do: Remove the link from the exercise solution. Add links to all the data types. Put everything in English.

Learning objectives in Intro to R

Reviewer's comment: The first two episodes have exactly the same learning objective (LO) defined ("What types of data does the R language has?").

To do: Put that learning objective only in the appropriate episode and remove it from the other.

Improve Examining Data on the NCBI SRA Database episode

Reviewer's comments:

  1. The SRA-Run section might need to be slightly restructured, mostly because it currently tries to address both the new and the legacy SRA-Run tool.

  2. Examining Data on the NCBI SRA Database can be a little overwhelming. There is too much information on the screen, but I don't know if anything can be done about it other than taking it slowly and making sure everyone follows. I didn't really understand the reason for using the "old RunSelector" instead of the new one; I imagine the old one will become deprecated at some point...

To do: Improve the sequence of steps and remove confusing mentions to the old run selector.

Improve Redirection

Reviewer's comments:

  1. In the Redirection episode, in "Exercise 1: Using grep", some additional context on the output might be useful (especially in the second example, whose output fills a long screen).
  2. In the Redirection episode, the "File extensions - part 2" part might be a bit confusing, since following the episode up to that point would not produce the expected error (unless the same command is run twice). It could be useful to clarify this (or slightly rephrase it).
  3. In the Redirection episode, the "Writing for loops" and "Using Basename in for loops" sections do not have an evaluation associated with them (e.g. an expected output file, or a discussion of how they can be further adapted). Unless these sections are critical for other parts of the workshop, it may be useful to make them optional content, keeping only the basename command as core content.
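As an example of the kind of checkable output suggested in the third comment, the sketch below uses `basename` inside a `for` loop to turn forward-read file names into sample names and writes them to a file that learners can compare against an expected result. The sample names are made up, and the files are empty placeholders created in a temporary directory rather than the lesson data.

```shell
#!/usr/bin/env bash
# Throwaway directory with empty placeholder files (hypothetical sample names).
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
touch sampleA_R1.fastq.gz sampleA_R2.fastq.gz sampleB_R1.fastq.gz sampleB_R2.fastq.gz

# Loop over the forward-read files only. basename removes any leading
# directory and the given suffix, leaving just the sample name.
for infile in *_R1.fastq.gz
do
    base=$(basename "${infile}" _R1.fastq.gz)
    echo "${base}"
done > sample_names.txt

# Expected output file: one sample name per line (sampleA, then sampleB).
cat sample_names.txt
```

Redirecting the whole loop with `done > sample_names.txt` writes the file once, so rerunning the loop does not append duplicate names.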

Improve Data about the experiment in Data Tidiness

Reviewer's comment: In the Data Tidiness episode, the "Data about the experiment" section might be a bit confusing, as it currently refers to three different templates (file, Here and the guide). It may be more practical to focus on one and present the rest as optional choices (although they appear to overlap significantly).

To do: Improve the explanation about README files so that it refers to only one template. If the other options are mentioned, give them distinct names.

Change name of Data Files

Data files have very long, unreadable names. Given the size of the shell window, commands quickly become very difficult to follow when they combine multiple arguments and files. I think it would be more didactic if filenames were shorter and more descriptive. A simple cp + mv at the beginning would work, and it would also be a chance to teach the importance of keeping raw data untouched.
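A minimal sketch of the cp + mv idea, assuming a `raw_data` directory and a `working` directory (both hypothetical, as is the long file name): the raw file is made read-only, a copy is placed in the working directory, and only the copy gets a short name. It runs in a temporary directory with an empty placeholder file.

```shell
#!/usr/bin/env bash
# Throwaway directory with a stand-in for one raw file (hypothetical long name).
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
mkdir -p raw_data working
touch raw_data/SAMPLE-1_S12_L001_R1_001.fastq.gz

# Make the raw file read-only so it cannot be overwritten by accident.
chmod a-w raw_data/SAMPLE-1_S12_L001_R1_001.fastq.gz

# Copy the raw file into the working directory, then give the copy a short name.
cp raw_data/SAMPLE-1_S12_L001_R1_001.fastq.gz working/
mv working/SAMPLE-1_S12_L001_R1_001.fastq.gz working/S1_R1.fastq.gz

ls raw_data working
```

The original name stays untouched in `raw_data`, so the mapping between short and original names remains recoverable.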
