tavareshugo / 2018-12-03-bioinformatics_for_biologists

Materials for the R sections of the "Bioinformatics for Biologists" course at Cambridge University

Home Page: https://tavareshugo.github.io/2018-12-03-bioinformatics_for_biologists/

2018-12-03-bioinformatics_for_biologists's Introduction

Introduction to data analysis with R

3-4 December 2018, Cambridge University Bioinformatics Training

Instructors: Hugo Tavares & Sandra Cortijo (Sainsbury Laboratory)


This is a general introduction to R for data analysis.

Our practicals will be very hands-on, focusing on the syntax you need to do data analysis in R, from data manipulation to visualisation. We will focus on tabular data, which is general enough to let you apply these skills to a wide range of problems.

Below, we provide links to detailed materials for your reference, many of which were developed by the Data Carpentry organisation.

If you have any queries, please post a new issue on our GitHub repository.


Setup

All necessary software and data will be available on the training machines at the Bioinformatics Training Room (Craik-Marshall Building).

However, you are welcome to use your own laptop, in which case you need to:

  • Download and install R (here)
  • Download and install RStudio (here)
  • Install the R package tidyverse (open RStudio and go to Tools > Install Packages)
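As an alternative to the Tools > Install Packages menu, the package can also be installed from the R console. This is a minimal sketch, assuming an internet connection and a default CRAN mirror:

```r
# Install the tidyverse collection of packages from CRAN (needed only once)
install.packages("tidyverse")

# Load it in each new R session before using its functions
library(tidyverse)
```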

Data Organisation in Spreadsheets

Digital data recording often starts in spreadsheet software (e.g. Excel). For effective data analysis, it is crucial to start with a well-structured and well-formatted dataset. For this reason, before diving into R, we will discuss common issues to consider when recording data in spreadsheets.

  • Download data for this lesson here
  • Find detailed materials here

Further reading:

Introduction to R

This lesson will cover the very basics of using R with RStudio.

Detailed reference materials:

exercises

Data manipulation and visualisation in R

This lesson covers functions for manipulating and summarising tabular data with the dplyr package, and introduces data visualisation with the ggplot2 package.
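To give a flavour of this workflow, here is a minimal sketch, assuming the dplyr and ggplot2 packages are installed. The `measurements` table and its column names are made up for illustration and are not part of the course data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy table, invented for this example
measurements <- data.frame(
  sample = c("a", "b", "c", "d"),
  group  = c("control", "control", "treated", "treated"),
  value  = c(1, 3, 5, 7)
)

# Manipulate and summarise: keep positive values, then average per group
group_means <- measurements %>%
  filter(value > 0) %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

# Visualise: one point per sample, split by group on the x-axis
p <- ggplot(measurements, aes(x = group, y = value)) +
  geom_point()
```

Printing `group_means` shows one row per group, and printing `p` draws the plot.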

Detailed reference materials:

Exploratory RNAseq data analysis in R

In this session we will apply the concepts learned so far to a worked example of an exploratory data analysis of transcriptomic data.

During the lesson, we will also learn a few more tricks in R, including:

Further reading:

Further resources

Extra materials/books:

2018-12-03-bioinformatics_for_biologists's Issues

Instructor notes

The exercises for the course have been compiled here.

Outline of things to cover:

  • Create Rproj and folders "data_output" and "scripts"
  • Intro
    • skip factors
    • exercises 1.1 and 1.2
  • data.frames
    • use read_csv() from the beginning to simplify things
    • don't spend too much time here; the main thing is to explain [rows, columns] for subsetting and $ for accessing a column
    • exercise 1.3
  • dplyr
    • skip spread/gather (covered in extra RNAseq lesson)
    • exercises 2.1, 2.2, 2.3
    • Do exercise 2.4 with students to save time
  • ggplot2:
    • skip themes and customisation (simply mention them at the end)
    • extra: see note below to mention factors
    • exercises 3.1-4 (if time is short do some exercises together)
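The [rows, columns] and $ notation from the data.frames item of the outline can be illustrated with a minimal base R sketch. The `animals` table is hypothetical and invented for this example. Note that it is a plain data.frame, whereas read_csv() returns a tibble, which can behave slightly differently when a single column is selected with square brackets.

```r
# Hypothetical toy data frame, invented for this example
animals <- data.frame(
  species = c("mouse", "rat", "vole"),
  weight  = c(20, 250, 35)
)

# [rows, columns]: leave one side empty to keep everything on that side
first_row <- animals[1, ]                  # row 1, all columns
two_cols  <- animals[, c("species", "weight")]  # all rows, named columns
heavy     <- animals[animals$weight > 30, ]     # rows matching a condition

# $ extracts a single column as a vector
weights <- animals$weight
```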

note: extra material for ggplot2 section

So that students intuitively understand factors, introduce them in the plotting
section.

For example:

When doing this plot:

surveys_complete %>% 
  ggplot(aes(sex, hindfoot_length)) +
  geom_boxplot()

What if we want to change the order of the x-axis labels to be "M" first?

Then we need to learn about factors, which are a special way that R has to
encode categorical variables.

Let's look at factors using a simple example first. Then go through the example
in the course materials here, but only its very first section.
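One possible simple example, using only base R, is sketched below. The `sex` vector is made up for illustration and is not the course dataset.

```r
# A character vector holding a categorical variable (invented values)
sex <- c("F", "M", "M", "F")

# By default, factor() orders the levels alphabetically
sex_factor <- factor(sex)
levels(sex_factor)

# The levels argument sets the order explicitly, here with "M" first
sex_reordered <- factor(sex, levels = c("M", "F"))
levels(sex_reordered)

# Internally, each value is stored as the integer position of its level
as.integer(sex_reordered)
```

The order of the levels is what ggplot2 uses to order a categorical axis, which is why changing it changes the plot.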

From there, jump back to the plotting problem and resolve it:

surveys_complete %>% 
  # convert sex to a factor with "M" as the first level
  mutate(sex = factor(sex, levels = c("M", "F"))) %>% 
  ggplot(aes(sex, hindfoot_length)) +
  geom_boxplot()

Exercise 3.4 applies this concept again.
