carpentries-incubator / bioc-rnaseq

Analysis and Interpretation of Bulk RNA-Seq Data using Bioconductor

Home Page: https://carpentries-incubator.github.io/bioc-rnaseq

License: Other

R 95.24% Shell 0.05% TeX 4.71%
lesson english carpentries-incubator bioconductor pre-alpha bulk-rna-seq rna-seq-analysis r life-sciences hacktoberfest

bioc-rnaseq's People

Contributors

almeidasilvaf, almutlue, csoneson, cutsort, djhshih, ghar1821, jdrnevich, jennyzadeh, jokergoo, lgatto, mblue9, rcastelo, shbrief, tobyhodges, zchikwambi, zkamvar

bioc-rnaseq's Issues

suggest episode re-order to 5. DE analysis, 6. Exploration of design matrices, 7. Re-visit practice data analysis, 8. Gene set

At the BioC2023 workshop, I jumped from episode 5 DE analysis to episode 8 to explore design matrices. My rationale was that episode 5 was just showing the mechanics of how DESeq2 works and how to pull out contrasts and to that end we had to throw in a design matrix of some sort without actually really considering what the coef/contrasts were measuring. [edited out something I had wrong before!]

I think we could leave episode 5 mostly as it is, but gloss over what each coef/contrast is measuring, other than generally time differences or sex differences. Then use ExploreModelMatrix to figure out what it was actually measuring, stepping from the simple to the more complex. Then write a new episode on how to analyze the dataset depending on which information/contrasts we want and how to pull them out. Show examples of using DESeq2's contrast = c("time","Day8","Day0"), but possibly with a single factor with 6 levels to easily pull out specific pairwise comparisons. And maybe even how to do an interaction test. [NOTE: when I live-coded this in class, I resorted to limma's makeContrasts() in order to get a numeric contrast vector to give to contrast =.]
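To make the "single factor with 6 levels" idea concrete, here is a minimal base-R sketch. It is not DESeq2 code: the sample table, the level names (e.g. Female_Day8) and the variable names are all hypothetical, and it only shows how the numeric contrast vectors mentioned above could be built by hand for a no-intercept design.

```r
# Hypothetical sample table: 2 sexes x 3 time points, 2 replicates each
samples <- expand.grid(
  replicate = 1:2,
  time = c("Day0", "Day4", "Day8"),
  sex = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Collapse sex and time into one factor with 6 levels
group_levels <- c("Female_Day0", "Female_Day4", "Female_Day8",
                  "Male_Day0", "Male_Day4", "Male_Day8")
samples$group <- factor(paste(samples$sex, samples$time, sep = "_"),
                        levels = group_levels)

# No-intercept design: one column (coefficient) per group
design <- model.matrix(~ 0 + group, data = samples)
colnames(design) <- group_levels

# Pairwise comparison: Day8 vs Day0 within females
day8_vs_day0_female <- setNames(numeric(ncol(design)), colnames(design))
day8_vs_day0_female["Female_Day8"] <- 1
day8_vs_day0_female["Female_Day0"] <- -1

# Interaction: (Day8 - Day0 in males) - (Day8 - Day0 in females)
interaction_day8 <- setNames(numeric(ncol(design)), colnames(design))
interaction_day8[c("Male_Day8", "Female_Day0")] <- 1
interaction_day8[c("Male_Day0", "Female_Day8")] <- -1

print(day8_vs_day0_female)
print(interaction_day8)
```

With one coefficient per group, every pairwise comparison is a vector with a single 1 and a single -1, which is probably easier to explain in class than contrasts in an interaction parameterization.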

In episode 5 I also added in an example of how to write out your results to a .csv or .tsv file (Examples in Day2_08-01-2023.R at https://uofi.box.com/s/o71rrvlfqb84seewl4n31wqkif13uy12)
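For reference, a minimal sketch of what such an export could look like in base R. The res_df data frame is a hypothetical stand-in for a results table already converted to a data frame, and the file names and the use of tempdir() are assumptions for illustration; this is not the code from the linked script.

```r
# Hypothetical stand-in for a DE results table converted to a data frame
res_df <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.5, -0.3, 2.1),
  padj = c(0.01, 0.60, NA)
)

# Write to a temporary directory so the example leaves no files behind
out_dir <- tempdir()
csv_file <- file.path(out_dir, "de_results.csv")
tsv_file <- file.path(out_dir, "de_results.tsv")

# Comma-separated; row.names = FALSE avoids an unnamed first column
write.csv(res_df, csv_file, row.names = FALSE)

# Tab-separated; quote = FALSE keeps gene names unquoted
write.table(res_df, tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)
```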

FINAL THOUGHT: Also could go from episode 1, experimental design, directly into ExploreModelMatrix. The rationale being you need to understand how you will analyze your experiment before you actually conduct your experiment to make sure you don't confound anything or can't answer the question you are trying to answer. ExploreModelMatrix doesn't need data, just hypothetical samples belonging to various experimental factors. However, this could possibly break the attendees' brains by having it so early!

Not recognizing .Rmd files in episodes directory?

@lgatto @csoneson asking you this before kicking it upstairs. I tried to do the re-organization/collapse of the episodes, which involved renaming most of the episode .Rmd files, which I did with git mv. All seemed to go well, but when I made the pull request (https://github.com/carpentries-incubator/bioc-rnaseq/actions/runs/5003856029/jobs/8965634091?pr=42) there was an error: "The /home/runner/work/bioc-rnaseq/bioc-rnaseq/episodes directory must have (R)markdown files". I get the same error when I try to build locally:

> sandpaper::serve()
Error: The C:/GitHubRepos/bioc-rnaseq/episodes directory must have (R)markdown files

The files all have .Rmd so what could be the problem?

Transition to workbench

Hi @zkamvar,

similarly to carpentries-incubator/bioc-intro#76 and carpentries-incubator/bioc-project#48, I would like to ask for your help with transitioning this repository to the new workbench format as well.

Thanks a lot in advance, and let me know if I can do anything to help!

All the best,
Charlotte

Replace UCSC with NCBI

In the "Finding the reference sequences" section of Episode 1, remove UCSC and replace it with NCBI, which is much more heavily used in the US.

Collapse 4-5 Intro/Set up sections into 3

We currently have the github.io landing page titled "Summary and Setup", then episodes "Introduction and setup", "Introduction to RNA-seq" and "Setup". These should be condensed to be less redundant. Possible suggestion:

Landing page "Summary and Setup" has

  • #39
  • Overview of the lessons
  • Prerequisites
  • Directions on installing/updating R & RStudio and what packages to install. Attendees are supposed to do this ahead of the workshop, not during the workshop although instructors can be available prior to the workshop to help.

Episode 1: "Introduction to RNA-Seq" has:

  • Explain what RNA-seq is.
  • Describe some of the most common design choices that have to be made before running an RNA-seq experiment.
  • Provide an overview of the procedure to go from the raw data to the read count matrix that will be used for downstream analysis.
  • Show some common types of results and visualizations generated in RNA-seq analyses.

Episode 2: "RStudio Project and Experimental Data" has:

Episode 3: "Experimental design" has:

  • Review the larger experiment and see how to parameterize it in various ways
  • Subset down to 2 X 2 for rest of workshop
  • Read in subset data and put into SE

Episode 4: "Exploratory analysis and quality control" (previous episode 6)

Add workflow diagram

Add a workflow diagram clarifying the overall analysis process and which type of data (counts, normalized/transformed values) is used for each step.

Change color palette in barplot

Change the color palette in the barplot of library sizes in episode 4, so that male and female samples have different shades of the same color.
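One possible way to do this, sketched with base graphics rather than whatever plotting code the episode currently uses. The library sizes, sample names and the two hex colors are made-up placeholders.

```r
# Hypothetical library sizes and sample annotation (not the lesson's data)
lib_size <- c(S1 = 21.5e6, S2 = 19.8e6, S3 = 23.1e6, S4 = 20.4e6)
sex <- c("Female", "Male", "Female", "Male")

# Two shades of blue: a light one for female, a dark one for male samples
shades <- c(Female = "#9ECAE1", Male = "#08519C")
bar_cols <- unname(shades[sex])

barplot(lib_size / 1e6, col = bar_cols,
        ylab = "Library size (millions of reads)")
legend("topright", legend = names(shades), fill = shades)
```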

Standard workflow for the "Introduction to RNA-seq" episode

Hello, everyone.

The Introduction to RNA-seq is currently empty, so I would like to contribute to it.

As far as I understood it, this episode should contain instructions on how to go from raw FASTQ files to a matrix of transcript abundances, including pre-processing steps (e.g., sequence QC, trimming adapters and low-quality sequences, etc).

However, as there are several options of software tools to use in each step of the pipeline, I think we should first agree on a workflow to use. I think we can build on the Bioc workflow package rnaseqGene. My suggested workflow would be:

  1. QC, trimming and filtering with fastp
  2. Quantification of transcript abundances with salmon
  3. Data import with tximport - maybe this could go to the beginning of the Importing and annotating quantified data into R episode?

I'd love to hear what you all think.

Best,
Fabricio

Add more exercises to gene set analysis episode

Also consider removing the challenge to convert a list representation of gene sets to a data frame representation, since we demonstrate exactly how this is done just before the challenge.
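For context, the conversion in question fits in a few lines of base R, which supports the point that it is too small for a separate challenge. The gene sets below are hypothetical, and the episode may use a different approach.

```r
# Hypothetical gene sets in list representation
gene_sets <- list(
  setA = c("gene1", "gene2", "gene3"),
  setB = c("gene2", "gene4")
)

# List -> data frame: one row per (gene set, gene) pair
gene_set_df <- data.frame(
  gene_set = rep(names(gene_sets), lengths(gene_sets)),
  gene = unlist(gene_sets, use.names = FALSE)
)

# Data frame -> list: split the gene column by gene set
gene_sets_again <- split(gene_set_df$gene, gene_set_df$gene_set)

print(gene_set_df)
```

A replacement challenge could start from the data frame form instead, for example asking learners to count how many sets each gene belongs to.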

Update life cycle stage?

Is pre-alpha still the most appropriate description of the maturity of this lesson? Looking through the lesson site, the content looks quite mature, and if I recall correctly it has been taught a few times already? If so, it would be better marked as alpha or even beta, if you think it is ready for other Instructors to use.

Expand episode 1 to include experimental design & batch effects

Episode 1 does not cover what is listed in the questions, particularly "What are the different choices to consider when planning an RNA-seq experiment?". This episode needs to be greatly expanded to include information on experimental design considerations and batch effect avoidance, in addition to sequencing and quantification options. See Harvard's excellent https://hbctraining.github.io/Intro-to-rnaseq-hpc-salmon-flipped/lessons/02_experimental_planning_considerations.html and credit them if re-using anything. I also have many slides/graphics that could be added.

To Do before BioC2023 workshop

List of things still needing to be done:

  • Put all required packages from each episode in learners/setup.md
  • Add "Download handouts" see: carpentries-incubator/bioc-intro@a9a8490
  • Remove beginning warnings from pages
  • Check on callout box types - varnish package
  • Check to see if SummarizedExperiment constructor checks genes/samples consistency
  • New subsection in Lesson 3 about general data provenance and reproducibility
  • Add discussion and/or coding challenge in lesson 3
  • Add short discussion of design matrix in lesson 4 and link to lesson 8 for more info
  • email attendees with set up instructions to be done BEFORE the workshop
  • Run through entire workshop as learner with brand new R/RStudio
  • Run through entire workshop as learner using BioC's Galaxy instance as backup: https://workshop.bioconductor.org/
  • Why is download_data.R run at the beginning of each episode? Add additional manipulations to it or have a second version for later episodes?
  • Online note place?
  • post-it notes and name tags

Should we switch to TidySummarizedExperiment?

It would be a bit of an overhaul, but we should consider switching everything over to tidySummarizedExperiment. This would help simplify the ggplot function coding. Would we also want to switch all transformations/subsetting to tidyverse instead of base R?

Add > PCs to iSEE example OR add glimmaMDS

iSEE wasn't that useful with only 2 PCs in the sce object. We could also add in Glimma as a quick alternative to just the clustering (it gives a stand-alone html file, which can be more practical for some people than shiny).

Extras to add to episode 4

I added these extras to episode 4 at the BioC2023 workshop and they should be added in:

  • ggsave() for plots
  • distributions of expression values à la limma::plotDensities()
  • interactive MDS clustering à la Glimma::glimmaMDS()
  • Challenge question showing examples of PCA/MDS plots with outliers, batch effects and other problems, and asking learners to evaluate them
  • Section on what to do if you have outliers or batch effects. If an outlier is removed, say to re-do all steps from the beginning (e.g. any gene filtering and scale factors will change slightly)

Add iSEE-related challenges

  • Add some specific exercises/tasks to the interactive exploration section in episode 4
  • Add DE results to rowData(dds) and explore again with iSEE in episode 5 (can use iSEEde once it is released)

Populate "Download Lesson Handout"

I tried to have sandpaper automatically create a "Download Lesson Handout", following @lgatto's attempt in bioc-intro, by adding options(sandpaper.handout = TRUE) to the sandpaper-main.yaml. However, this didn't seem to do anything. Googling led me to here, which says you also need to add purl = TRUE to all code chunks you want to include. I tried doing this for episode 3, but it also didn't seem to do anything when I built locally using sandpaper::serve(). However, after pushing all commits to this GitHub repo, merging the pull request and running the GitHub action, it did appear on the Episode sidebar of the github.io website.

  • Add purl=TRUE to all/almost all R code chunks
  • Check the resulting Lesson Handout for clarity/modifications needed

'Theme & variations' exercise - change significance threshold

  • Note from the results output that the default significance threshold with DESeq2 is adj.P=0.1
  • Figure out where this is set
  • Change it to 0.05 instead
  • (Much harder) challenge: figure out why the results are not necessarily the same as just setting a threshold on the adjusted p-value in the first set of results (since the independent filtering depends on the provided alpha)
  • Variant: set a min logFC threshold to test an interval null hypothesis, and rerun the analysis. Again, figure out why the results are not the same as setting a threshold on the logFC after running the default test (of a point null hypothesis)
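As background for the "(much harder)" item, here is a simplified base-R sketch of why filtering changes the adjusted p-values. It is not DESeq2's procedure: the p-values, mean counts and the fixed cutoff of 5 are hypothetical, whereas DESeq2's independent filtering picks its filter threshold to maximize the number of rejections at the given alpha (the alpha argument of results(), with lfcThreshold being the argument relevant to the variant).

```r
# Hypothetical raw p-values and mean counts for 8 genes
pvals <- c(0.001, 0.004, 0.012, 0.030, 0.045, 0.200, 0.500, 0.900)
mean_count <- c(500, 300, 3, 200, 150, 2, 1, 4)

# Benjamini-Hochberg adjustment over all genes
padj_all <- p.adjust(pvals, method = "BH")

# Adjustment after dropping low-count genes (fixed cutoff, for illustration only)
keep <- mean_count >= 5
padj_filtered <- rep(NA_real_, length(pvals))
padj_filtered[keep] <- p.adjust(pvals[keep], method = "BH")

# Fewer tests means a smaller multiple-testing penalty for the genes that remain
n_sig_all <- sum(padj_all < 0.05)
n_sig_filtered <- sum(padj_filtered < 0.05, na.rm = TRUE)

print(data.frame(pvals, mean_count, padj_all, padj_filtered))
print(c(all_genes = n_sig_all, after_filtering = n_sig_filtered))
```

Because the set of genes entering the adjustment depends on the filter, and the filter depends on alpha, rerunning with a different alpha is not the same as re-thresholding the first set of adjusted p-values.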
