The bioc-rnaseq from carpentries-incubator

I think more emphasis could also be put on how to use visualization tools for RNA-Seq data and analysis output. I am thinking in particular of iSEE or GeneTonic.

suggest episode re-order to 5. DE analysis, 6. Exploration of design matrices, 7. Re-visit practice data analysis, 8. Gene set

At the BioC2023 workshop, I jumped from episode 5 (DE analysis) to episode 8 to explore design matrices. My rationale was that episode 5 only shows the mechanics of how DESeq2 works and how to pull out contrasts. To that end we had to throw in a design matrix of some sort without really considering what the coefficients/contrasts were measuring. [edited out something I had wrong before!]

I think we could leave episode 5 mostly as it is, but gloss over what each coefficient/contrast is measuring beyond, in general terms, time differences or sex differences. Then use ExploreModelMatrix to figure out what each one is actually measuring, stepping from the simple designs to the more complex ones. Then write a new episode on how to analyze the dataset depending on which information/contrasts we want and how to pull them out. Show examples of using DESeq2's contrast = c("time","Day8","Day0"), but possibly with a single factor with 6 levels to make specific pairwise comparisons easy to pull out. And maybe even how to do an interaction test. [NOTE: when I live-coded this in class, I resorted to limma's makeContrasts() in order to get a numeric contrast vector to give to contrast =.]
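As a rough illustration of the single-factor idea above, here is a minimal base R sketch. It assumes a hypothetical 2 x 3 layout (two sexes, three time points, two replicates each); the Day4 level, the group naming scheme and the helper make_contrast() are assumptions for illustration, not part of the lesson. It only builds the design matrix and the numeric contrast vectors; the DESeq2 and limma calls are shown as comments because they need a fitted dataset.

```r
# Hypothetical sample table: 2 sexes x 3 time points x 2 replicates
samples <- expand.grid(
  rep  = 1:2,
  time = c("Day0", "Day4", "Day8"),
  sex  = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Collapse sex and time into one factor with 6 levels
samples$group <- factor(paste(samples$sex, samples$time, sep = "_"))

# Cell-means design: one column per group, no intercept
design <- model.matrix(~ 0 + group, data = samples)
colnames(design) <- levels(samples$group)

# Hypothetical helper: numeric contrast with +1 / -1 on the named columns
make_contrast <- function(design, plus, minus) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus]  <- 1
  v[minus] <- -1
  v
}

# Pairwise comparison: Day8 vs Day0 within females
female_day8_vs_day0 <- make_contrast(design, "Female_Day8", "Female_Day0")

# Interaction: (Male Day8 - Male Day0) - (Female Day8 - Female Day0)
interaction_day8 <- make_contrast(
  design,
  plus  = c("Male_Day8", "Female_Day0"),
  minus = c("Male_Day0", "Female_Day8")
)

print(female_day8_vs_day0)
print(interaction_day8)

# Not run here (needs a fitted DESeqDataSet 'dds' with design ~ 0 + group):
#   results(dds, contrast = c("group", "Female_Day8", "Female_Day0"))
#   results(dds, contrast = interaction_day8)  # one element per resultsNames(dds)
# limma equivalent for the numeric vector:
#   limma::makeContrasts(Female_Day8 - Female_Day0, levels = design)
```

With a cell-means design like this, every pairwise comparison is a +1/-1 pair of columns, which is why the 6-level factor makes specific comparisons easier to request than the main-effects-plus-interaction parameterization.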

In episode 5 I also added in an example of how to write out your results to a .csv or .tsv file (Examples in Day2_08-01-2023.R at https://uofi.box.com/s/o71rrvlfqb84seewl4n31wqkif13uy12)
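For readers without access to the linked script, writing results out might look roughly like the sketch below. It is not the code from Day2_08-01-2023.R; the toy data frame, the file names and the use of a temporary output directory are all assumptions for illustration. With real DESeq2 output, the data frame would typically come from as.data.frame(res).

```r
# Toy stand-in for a DE results table (values are made up for illustration)
res_df <- data.frame(
  gene           = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.2, -0.8, 0.1),
  padj           = c(0.04, 0.001, 0.9)
)

# Sort by adjusted p-value so the most significant genes come first
res_df <- res_df[order(res_df$padj), ]

# Hypothetical output location; in the lesson this would be the project's output/ folder
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

csv_file <- file.path(out_dir, "DE_results.csv")
tsv_file <- file.path(out_dir, "DE_results.tsv")

# Comma-separated
write.csv(res_df, file = csv_file, row.names = FALSE)

# Tab-separated, without quotes around strings
write.table(res_df, file = tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read back to confirm the round trip
csv_back <- read.csv(csv_file)
tsv_back <- read.delim(tsv_file)
```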

FINAL THOUGHT: We could also go from episode 1, experimental design, directly into ExploreModelMatrix. The rationale is that you need to understand how you will analyze your experiment before you actually conduct it, to make sure you don't confound anything and can answer the question you are trying to answer. ExploreModelMatrix doesn't need data, just hypothetical samples belonging to various experimental factors. However, this could possibly break the attendees' brains by having it so early!
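To make the "no data needed" point concrete, here is a small base R sketch that checks two hypothetical sample layouts for confounding before any sequencing is done. The factor names, level names and the helper is_full_rank() are assumptions for illustration. ExploreModelMatrix would give a visual version of the same check; its call is shown as a comment only.

```r
# Hypothetical layout 1: every treated sample is in batch b2 (fully confounded)
confounded <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 4)),
  batch     = factor(rep(c("b1", "b2"), each = 4))
)

# Hypothetical layout 2: both treatments are represented in both batches
balanced <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 4)),
  batch     = factor(rep(c("b1", "b2"), times = 4))
)

# Hypothetical helper: a design is estimable only if the model matrix has full column rank
is_full_rank <- function(sample_data, design_formula) {
  mm <- model.matrix(design_formula, data = sample_data)
  qr(mm)$rank == ncol(mm)
}

confounded_ok <- is_full_rank(confounded, ~ batch + treatment)
balanced_ok   <- is_full_rank(balanced, ~ batch + treatment)

print(confounded_ok)
print(balanced_ok)

# Not run here (requires the ExploreModelMatrix package):
#   ExploreModelMatrix::VisualizeDesign(sampleData = balanced,
#                                       designFormula = ~ batch + treatment)
```

In the confounded layout the batch and treatment columns of the model matrix are identical, so their effects cannot be separated no matter how much data is collected.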

Add explanatory text to experimental design episode

The experimental design episode contains several examples of experimental designs, but needs some explanatory text.

Add exercises to differential expression episode

Add exercises to the differential expression episode.

Provide an overview of RNA-seq

Write a short background/overview explaining how the data is generated, QC steps and gene expression quantification.

Save assembled SE object in output/ rather than in data/

For consistency with the suggestions made in episode 2, the assembled SE object in episode 3 should go in output/ rather than in data/.
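A minimal sketch of the change, assuming the object is saved with saveRDS(); the file name se.rds, the temporary project folder and the placeholder list standing in for the SummarizedExperiment are all assumptions made so the example runs without the lesson data.

```r
# Hypothetical project root; in the lesson this would be the RStudio project folder
project_dir <- file.path(tempdir(), "bioc-rnaseq-demo")
output_dir  <- file.path(project_dir, "output")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

# Placeholder for the assembled SummarizedExperiment object
se <- list(
  counts  = matrix(1:6, nrow = 3, dimnames = list(paste0("gene", 1:3), c("s1", "s2"))),
  samples = c("s1", "s2")
)

# Derived objects go in output/, leaving data/ for the raw downloaded files
se_path <- file.path(output_dir, "se.rds")
saveRDS(se, file = se_path)

# Later episodes would read the object back from the same place
se_reloaded <- readRDS(se_path)
```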

Not recognizing .Rmd files in episodes directory?

@lgatto @csoneson asking you this before kicking it upstairs. I tried to do the re-organization/collapse of the episodes, which involved renaming most of the episode .Rmd files, which I did with git mv. All seemed to go well, but when I made the pull request https://github.com/carpentries-incubator/bioc-rnaseq/actions/runs/5003856029/jobs/8965634091?pr=42 there was an error: "The /home/runner/work/bioc-rnaseq/bioc-rnaseq/episodes directory must have (R)markdown files". I get the same error when I try to build locally:

> sandpaper::serve()
Error: The C:/GitHubRepos/bioc-rnaseq/episodes directory must have (R)markdown files

The files all have the .Rmd extension, so what could be the problem?

Transition to workbench

Hi @zkamvar,

similarly to carpentries-incubator/bioc-intro#76 and carpentries-incubator/bioc-project#48, I would like to ask for your help with transitioning this repository to the new workbench format as well.

Thanks a lot in advance, and let me know if I can do anything to help!

All the best,
Charlotte

Replace UCSC with NCBI

In the "Finding the reference sequences" section of Episode 1, replace UCSC with NCBI, which is much more heavily used in the US.

Manually update workflow files.

Hello, I noticed that https://github.com/carpentries-incubator/bioc-rnaseq/actions/runs/5116496804 is failing due to an error in the JS of a dependency.

This will prevent updates to the package cache and the workflows from coming in. Please follow the instructions at https://carpentries.github.io/sandpaper-docs/update.html#via-r to update the workflows using the latest version of {sandpaper}.

Note: this is also an issue for the other bioc repositories as well.

Add authors

Need to add authors here https://github.com/carpentries-incubator/bioc-rnaseq/blob/main/AUTHORS

Collapse 4-5 Intro/Set up sections into 3

We currently have the github.io landing page titled "Summary and Setup", then episodes "Introduction and setup", "Introduction to RNA-seq" and "Setup". These should be condensed to be less redundant. Possible suggestion:

Landing page "Summary and Setup" has

#39
Overview of the lessons
Prerequisites
Directions on installing/updating R & RStudio and what packages to install. Attendees are supposed to do this ahead of the workshop, not during the workshop although instructors can be available prior to the workshop to help.

Episode 1: "Introduction to RNA-Seq" has:

Explain what RNA-seq is.
Describe some of the most common design choices that have to be made before running an RNA-seq experiment.
Provide an overview of the procedure to go from the raw data to the read count matrix that will be used for downstream analysis.
Show some common types of results and visualizations generated in RNA-seq analyses.

Episode 2: "RStudio Project and Experimental Data" has:

Instructions to set up RStudio project and directory structure (can reuse https://carpentries-incubator.github.io/bioc-intro/20-r-rstudio.html#getting-set-up)
Introduce example data set and download

Episode 3 "Experimental design"

Review larger experiment and see how to parameterize it in various ways
Subset down to 2 X 2 for rest of workshop
Read in subset data and put into SE

Episode 4: "Exploratory analysis and quality control" (previous episode 6)

Add explanatory text to data import episode

Contributing to episode 07-differential-expression

Add workflow diagram

Add a workflow diagram clarifying the overall analysis process and which type of data (counts, normalized/transformed values) is used for each step.

Contributing to episode 03-setup

FIXME pages don't exist

The following pages are linked in the CONTRIBUTING.md files of new lessons (I saw it in the bioc-rnaseq lesson), but the pages don't exist.

https://github.com/swcarpentry/FIXME
https://swcarpentry.github.io/FIXME.

Change color palette in barplot

Change the color palette in the barplot of library sizes in episode 4, so that male and female samples have different shades of the same color.

Add exercises to the exploratory data analysis episode

Add exercises to the exploratory data analysis episode.

Standard workflow for the "Introduction to RNA-seq" episode

Hello, everyone.

The "Introduction to RNA-seq" episode is currently empty, so I would like to contribute to it.

As far as I understood it, this episode should contain instructions on how to go from raw FASTQ files to a matrix of transcript abundances, including pre-processing steps (e.g., sequence QC, trimming adapters and low-quality sequences, etc).

However, as there are several options of software tools to use in each step of the pipeline, I think we should first agree on a workflow to use. I think we can build on the Bioc workflow package rnaseqGene. My suggested workflow would be:

QC, trimming and filtering with fastp
Quantification of transcript abundances with salmon
Data import with tximport - maybe this could go to the beginning of the Importing and annotating quantified data into R episode?

I'd love to hear what you all think.

Best,
Fabricio

paragraph introduction to the workshop

Add links to bioc-project

Add links to specific bioc-project sections in the relevant intro sections.

Same for bioc-intro

Add more exercises to gene set analysis episode

Also consider removing the challenge to convert a list representation of gene sets to a data frame representation, since we demonstrate exactly how this is done just before the challenge.

Update life cycle stage?

Is pre-alpha still the most appropriate description of the maturity of this lesson? Looking through the lesson site, the content looks quite mature, and if I recall correctly it has been taught a few times already? If so, it would be better marked as alpha or even beta, if you think it is ready for other Instructors to use.

Expand episode 1 to include experimental design & batch effects

Episode 1 does not cover what is listed in the questions, particularly "What are the different choices to consider when planning an RNA-seq experiment?". This episode needs to be greatly expanded to include information on experimental design considerations and batch effect avoidance in addition to sequencing and quantification options. See Harvard's excellent https://hbctraining.github.io/Intro-to-rnaseq-hpc-salmon-flipped/lessons/02_experimental_planning_considerations.html and credit them if re-using anything. I also have many slides/graphics that could be added.

To Do before BioC2023 workshop

List of things still needing to be done:

Should we switch to tidySummarizedExperiment?

It would be a bit of an overhaul, but we should consider switching everything over to tidySummarizedExperiment. This would help simplify the ggplot function coding. Would we also want to switch all transformations/subsetting to tidyverse instead of base R?

Contributing to episode 02-intro-to-rnaseq

Add intro to GRanges, or create an unranged SE

Either add a short introduction to GRanges in episode 3, or alternatively just put the information in the rowData. Later episodes use the chromosome information.

Put subheadings in Episode 1

All other episodes have subheadings; episode 1 does not.

Add more PCs to iSEE example OR add glimmaMDS

iSEE wasn't that useful with only 2 PCs in the sce object. We could also add in Glimma as a quick alternative to just the clustering (it gives a stand-alone html file, which can be more practical for some people than shiny).

Extras to add to episode 4

I added these extras to episode 4 at the BioC2023 workshop and they should be added in:

ggsave() for plots
distributions of expression values à la limma::plotDensities()
interactive MDS clustering à la Glimma::glimmaMDS()
Challenge question showing examples of PCA/MDS plots with outliers, batch effects and other problems, and asking attendees to evaluate them
Section on what to do if you have outliers or batch effects. If an outlier is removed, say to re-do all steps from the beginning (e.g. any gene filtering and scale factors will change slightly)

Add iSEE-related challenges

Add some specific exercises/tasks to the interactive exploration section in episode 4
Add DE results to rowData(dds) and explore again with iSEE in episode 5 (can use iSEEde once it is released)

Populate "Download Lesson Handout"

I tried to have sandpaper automatically create a "Download Lesson Handout", following @lgatto's attempt in bioc-intro, by adding options(sandpaper.handout = TRUE) to the sandpaper-main.yaml. However, this didn't seem to do anything. Googling led me to here, which says you also need to add purl = TRUE to all code chunks you want to include. I tried doing this for episode 3, but it also didn't seem to do anything when I built locally using sandpaper::serve(). However, after pushing all commits to this GitHub repo, merging the pull request and running the GitHub action, it did appear on the Episode sidebar of the github.io website.

Add purl=TRUE to all/almost all R code chunks
Check the resulting Lesson Handout for clarity/modifications needed

'Theme & variations' exercise - change significance threshold

Note from the results output that the default significance threshold with DESeq2 is adj.P=0.1
Figure out where this is set
Change it to 0.05 instead
(Much harder) challenge: figure out why the results are not necessarily the same as just setting a threshold on the adjusted p-value in the first set of results (since the independent filtering depends on the provided alpha)
Variant: set a min logFC threshold to test an interval null hypothesis, and rerun the analysis. Again, figure out why the results are not the same as setting a threshold on the logFC after running the default test (of a point null hypothesis)
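A possible starting point for this exercise, sketched on simulated data rather than the lesson dataset. It assumes DESeq2 is installed and uses makeExampleDESeqDataSet() with made-up settings (n, m, betaSD and the seed are assumptions); the object names are illustrative. The threshold is the alpha argument of results(), and the interval null is requested with lfcThreshold.

```r
library(DESeq2)

# Simulated two-condition dataset (not the lesson data); betaSD > 0 adds true effects
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)

# Default: independent filtering is optimized for alpha = 0.1
res_default <- results(dds)

# Tell results() the cutoff that will actually be used
res_alpha05 <- results(dds, alpha = 0.05)

# Counting padj < 0.05 in both: the numbers can differ, because the
# independent filtering threshold depends on the alpha that was supplied
n_default <- sum(res_default$padj < 0.05, na.rm = TRUE)
n_alpha05 <- sum(res_alpha05$padj < 0.05, na.rm = TRUE)

# Variant: test |log2FC| > 1 (interval null) instead of log2FC != 0
res_lfc1 <- results(dds, alpha = 0.05, lfcThreshold = 1)
n_lfc1   <- sum(res_lfc1$padj < 0.05, na.rm = TRUE)

# Post hoc filter on the point-null results, for comparison with n_lfc1
n_posthoc <- sum(res_alpha05$padj < 0.05 &
                   abs(res_alpha05$log2FoldChange) > 1, na.rm = TRUE)

print(c(default = n_default, alpha05 = n_alpha05,
        lfc1 = n_lfc1, posthoc = n_posthoc))
```

The post hoc count and the lfcThreshold count answer different questions: the former keeps genes whose estimated fold change exceeds 1 among those significantly different from zero, while the latter tests whether the true fold change exceeds 1.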

Add links to public FastQC reports

To illustrate a typical (and a not-so-good) RNA-seq FastQC report, see if we can find links to public ones.
