swcarpentry / r-novice-inflammation


Programming with R

Home Page: http://swcarpentry.github.io/r-novice-inflammation/

License: Other

R 99.95% Shell 0.05%
carpentries software-carpentry lesson r knitr rmarkdown data-visualisation data-wrangling data-visualization english programming stable

r-novice-inflammation's Introduction

r-novice-inflammation

The Carpentries teach foundational coding and data science skills to researchers worldwide. This GitHub repository generates the Software Carpentry lesson website "Introduction to R for non-programmers using inflammation data." The lesson website can be viewed at http://swcarpentry.github.io/r-novice-inflammation/. Making changes in this GitHub repository allows us to change the content of the lesson website.

The following people are maintainers for this lesson, and are responsible for determining which changes to incorporate into the lesson:

Alumni:

The goal of this lesson is to teach novice programmers to write modular code to perform a data analysis. R is used to teach these skills because it is a commonly used programming language in many scientific disciplines. However, the emphasis is not on teaching every aspect of R, but instead on language agnostic principles like automation with loops and encapsulation with functions (see Best Practices for Scientific Computing to learn more). This lesson is a translation of the Python version, and is also available in MATLAB.

The example used in this lesson analyzes a set of 12 data files with inflammation data collected from a trial for a new treatment for arthritis (the data was simulated). Learners are shown how it is better to create a function and apply it to each of the 12 files using a loop instead of using copy-paste to analyze the 12 files individually.
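
As a rough sketch of the pattern described above (one function, applied to every file in a loop), the following assumes header-less CSV files named like `inflammation-01.csv`. The `analyze` function, the `demo_dir` directory, and the tiny generated files are illustrative stand-ins, not the lesson's actual code or data.

```r
# Hypothetical demo data: write three small header-less CSV files to a temp dir
demo_dir <- file.path(tempdir(), "inflammation-demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  m <- matrix(i * (1:12), nrow = 3)  # 3 "patients" x 4 "days"
  write.table(m, file.path(demo_dir, sprintf("inflammation-%02d.csv", i)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# One function that encapsulates the analysis of a single file
analyze <- function(filename) {
  dat <- read.csv(file = filename, header = FALSE)
  colMeans(dat)  # average value per day (column)
}

# Apply the function to every matching file with a loop instead of copy-paste
filenames <- list.files(path = demo_dir,
                        pattern = "^inflammation-[0-9]{2}\\.csv$",
                        full.names = TRUE)
results <- list()
for (f in filenames) {
  results[[basename(f)]] <- analyze(f)
}
```

Adding a thirteenth file then requires no change to the code, which is the point of preferring the loop over twelve pasted copies.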

Contributing

We value your contributions. How to contribute to this lesson is outlined in CONTRIBUTING.md. If you have questions about our contributing guidelines, please create a new issue in the issues tab and one of the maintainers will respond.

Getting Help

Please see https://github.com/carpentries/lesson-example for instructions on formatting, building, and submitting lessons, or run make in this directory for a list of helpful commands.

If you have questions or proposals, please send them to the r-discuss mailing list.

r-novice-inflammation's People

Contributors

aaren, ateucher, bbolker, bisaloo, brynnelliott, chendaniely, cmd-ntrf, dhaine, diyadas, elichad, emilliman5, erinbecker, fmichonneau, gavinsimpson, gvwilson, haozeke, hdashnow, jainsley, jdblischak, karawoo, katrinleinweber, louisranjard, michaellevy, mlammens, natalie-robinson, steltenpower, stephenturner, tomwright01, valentina-s, zkamvar

r-novice-inflammation's Issues

Image naming

In working on #20, I'm finding that some of the filenames for generated figures are different. For example, in the loops lesson the images created in the loop have a numeric suffix following a hyphen, while previously it seems that there was no hyphen before the numeric suffix. As a result, there are some orphan images in the /fig folder, and some broken image links in the html and md files where the newly generated images weren't committed (e.g., the last few figures in https://github.com/swcarpentry/r-novice-inflammation/blob/gh-pages/03-loops-R.md). I'm not sure if this is due to different versions of knitr or rmarkdown, or ... ?

I've committed the newly named image files for that example (in ateucher@ceb3a79) so that the images appear in the md/html files. However, I haven't deleted the orphans, as I thought this should perhaps be taken on as a broader issue.

Do something with refs/objects.Rmd

This was created back before the new lesson-template. If we want to keep it, we should:

  • Rename it as a supplemental lesson and move it to the base directory.
  • Format it according to the new lesson-template.
  • Add it to index.md

Filename changes and addition to index for supplemental lessons

@tomwright01, after merging I realized there were two additional things I need you to do for your two new supplemental lessons.

  • Update the names of the files to the convention used in this repo: all lowercase, separated with a dash.
  • Include a mention of these lessons in index.md. I'd recommend having a new header below "More Resources" that says "Supplemental Lessons".

Sequence of teaching

I have a suggestion for the Schedule part of the title page. In my view the flow would be better if "Data types and structures" were introduced earlier in the lesson, although I see the benefit of starting with a practical example. The first six sections of this lesson could look like the following:

  1. Introduction to RStudio
  2. Analyzing Patient Data
  3. Data Types and Structures
  4. Addressing Data
  5. Reading and Writing CSV Files
  6. Understanding Factors

There's quite a bit of overlap between sections 2, 3, and 4, so if you agree to this, I volunteer to go through these sections and merge them into only two ("Addressing Data" would be dissolved into the other two sections).

Then the rest of the concepts can be introduced.

Reformat learning objectives

The beginning of each lesson has a list of learning objectives. These need to be reformatted to conform with the new lesson template, see here.

the use of spaces in subsetting

I proposed some changes to the subsetting lesson and included spaces that are not part of the Google R Style Guide, and so it was requested I remove them, which I did. After some discussion with Karl Broman I wanted to suggest that for the purposes of teaching we should not follow the style guide in this respect.

When I teach my peers subsetting in R I always tell them to put in spaces when they want all the rows or all the columns

THIS -> dataframe[ ,5]

instead of this -> dataframe[,5]

I talk to them about thinking, when they type that first bracket, "do I want all the rows?" If yes, then put a space, then a comma, then whatever you want for the columns (or vice versa: put in your subsetting stuff for the rows and then ask if you want all the columns; if yes, put a space).

I've found this helpful because it forces the learner to think about what is being indicated on both sides of the comma and helps prevent confusion about what goes where. Over time the learner may no longer need to do it (I still do), but I think it is a good learning tool.

I realize it's not in line with the style guide, and some of you won't like this, but I think it is worth considering, and that is how I plan to teach.

ducks from all the angry people throwing things
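
For anyone weighing this proposal, a small sketch showing that the extra space is purely a reading aid: R parses both spellings identically. The `dataframe` object here is a made-up 3 x 5 example, not the lesson data.

```r
# Hypothetical 3 x 5 data frame standing in for the lesson data
dataframe <- data.frame(matrix(1:15, nrow = 3))

with_space <- dataframe[ , 5]     # the space marks "all rows" explicitly
without_space <- dataframe[, 5]   # spelling without the extra space

same_columns <- identical(with_space, without_space)

# Same idea on the other side of the comma: "all columns" of row 2
same_rows <- identical(dataframe[2, ], dataframe[2,])
```

Both comparisons come out `TRUE`, so the choice is about teaching clarity rather than behavior.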

Idiosyncrasy of mean() on inflammation-01

This works as expected

# mean inflammation on day 7
mean(dat[, 7])

but this fails

# mean of measurements for patient 7
mean(dat[7,])

returns an error. That is because there is no header in this file and arbitrary field names are given.

Might be worthwhile to note that it makes no medical sense to calculate the mean of 24 measurements.
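
A hedged sketch of what may be going on here, using a made-up 8 x 8 data frame rather than the real file: selecting one column returns a plain numeric vector, whereas selecting one row returns a data frame, which `mean()` does not treat as numeric. If that is the cause, flattening the row first (or using `rowMeans()`) avoids the problem regardless of whether the file has a header.

```r
# Hypothetical stand-in for data read with header = FALSE
dat <- data.frame(matrix(1:64, nrow = 8))   # 8 "patients" x 8 "days"

col_mean <- mean(dat[, 7])        # a single column is a numeric vector

row_class <- class(dat[7, ])      # a single row is still a data.frame

# Flatten the row to a plain vector before averaging
row_mean <- mean(unlist(dat[7, ]))

# Or compute the mean for every patient (row) at once
all_row_means <- rowMeans(dat)
```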

Pass tests from check.py

Running make check runs tools/check.py. This highlights formatting issues that need to be resolved for complying with the lesson template.

You can run make check yourself. Alternatively, it is run daily on a remote server and the results are stored here. This is also a good way to see the very latest rendered versions of the lessons since I do not re-build the lessons after each PR (I wait until a few changes have been merged). See http://scf.rgaiacs.com/nightly/r-novice-inflammation/

Add discussion.md

See here for directions. notes.txt may contain some info to be included in this page. Once useful information has been extracted from notes.txt, it can be deleted.

Reformat challenges

The challenges need to be reformatted to conform to the new lesson template, see here. This could be tricky for those challenges that include R Markdown chunks. We'll probably just want to leave the chunks alone, but if someone thinks of a better idea that would be great.

car-speeds.csv?

I was going through the 01-supp-read-write-csv.html lesson, but I could not find a link to download the car-speeds.csv. Also, the file was not in the data folder in r-novice-inflammation-data.zip.

Could a copy of car-speeds.csv be put in the data folder in r-novice-inflammation-data.zip?

Thanks!

For loops and conditions vs. apply and logical subsetting

I am of the strong opinion that introducing for loops and if else statements early in the teaching program does more harm than good to R education. I understand that this is done out of a desire to maintain consistency between how Python and R are taught, but I would argue that the approach to teaching (and using) the two languages should be different.

I would argue that for loops and if else statements need to be shelved under the "Advanced" topics towards the end of the lesson, and that sections on the *apply family of functions and logical subsetting should be introduced instead. R is positioning itself as a vectorized language (even though under the hood it might be running highly optimized for loops in C++), and the R programmer is encouraged to think in terms of vectors and data frames. Introducing non-vectorized operators breaks that frame and positions R for failure due to an a priori lost argument on speed and efficiency.

Again, if you agree, I volunteer to handpick material on *apply operators from another lesson (I will not introduce any new concepts but rather repackage what is already in the very good Software Carpentry curriculum) and rework logical subsetting (and, perhaps, mention the vectorized ifelse() function) to cater for the need to teach implementation of conditional logic (branching) in R.

As I said, the loop and branching sections are good, but only as an advanced topic towards the end of the lesson material.
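
To make the contrast concrete, here is a minimal sketch of the two styles side by side. The vector `x` and the threshold of 5 are invented for illustration and do not come from the lesson.

```r
# Hypothetical measurements
x <- c(3, 8, 1, 12, 7)

# Loop with if/else, as currently taught
labels_loop <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 5) {
    labels_loop[i] <- "high"
  } else {
    labels_loop[i] <- "low"
  }
}

# Vectorized equivalent with ifelse()
labels_vec <- ifelse(x > 5, "high", "low")

# Logical subsetting instead of a filtering loop
high_values <- x[x > 5]

# sapply() instead of a loop that accumulates results
squares <- sapply(x, function(v) v^2)
```

The loop and the `ifelse()` call produce the same labels; the difference is in how much bookkeeping the learner has to write.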

Move solutions out of instructors.Rmd and delete that file

Transitioning to new template: solutions to exercises are in instructors.Rmd in the root directory. They should be moved to _episodes_rmd/*.Rmd (so that they'll be processed correctly), and any remaining content in instructors.Rmd should be moved to _extras/guide.md.

Column headers

Since we're using data frames, would it make sense to add column headers to the data?

Functional programming example in "Creating functions"?

In version 5.3 of "Creating functions," there is a great example of function composition for converting temperature from Fahrenheit to Celsius.

fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

# freezing point of water in Celsius
fahr_to_celsius(32.0)

However, it seems much easier and perhaps more common to use a functional approach and chain the two functions together, at least for the simple purpose described:

kelvin_to_celsius(fahr_to_kelvin(32.0))

Is there merit to showing both approaches?
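
A self-contained sketch comparing the two approaches. The bodies of `fahr_to_kelvin` and `kelvin_to_celsius` are assumed here from the standard conversion formulas, since the issue does not quote them.

```r
# Helper definitions (assumed; not quoted in this issue)
fahr_to_kelvin <- function(temp) {
  ((temp - 32) * (5 / 9)) + 273.15
}
kelvin_to_celsius <- function(temp_k) {
  temp_k - 273.15
}

# Approach 1: a named wrapper with an intermediate variable
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  kelvin_to_celsius(temp_k)
}

# Approach 2: nest the calls directly
wrapped <- fahr_to_celsius(32.0)
nested <- kelvin_to_celsius(fahr_to_kelvin(32.0))
same_result <- identical(wrapped, nested)
```

The nested call is shorter for one-off use, while the wrapper gives the conversion a reusable name.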

Use different variable name for slicing challenge

In the first lesson, there is an exercise on slicing. It is currently a bit confusing because the name of the vector to be sliced is element. This is awkward when explaining how to select specific elements of a vector. This should be updated to have a different variable name (and optionally the word in the character vector could also be changed to be consistent with the new variable name).

Reformat callout boxes

The lessons currently have asides which start with Tip:. These need to be reformatted to conform with the new lesson template, see here.

To find all the occurrences of these asides, you can run grep Tip: *Rmd

05 cmdline

For people using Windows, there is no mention of "R" being in the path before making calls to Rscript. If Rscript is called from the command line without R in the path, lesson 5 will only state that the command "Rscript" cannot be found.

Is there somewhere in the setup instructions that flags up the alteration of the PATH statement to include "R" before this lesson is undertaken?
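
One possible way for learners to check this from inside R before starting the lesson is sketched below. It only reports where the running installation keeps its executables and whether `Rscript` is reachable through PATH; it does not modify PATH, and the variable names are illustrative.

```r
# Directory holding the executables of the running R installation
r_bin <- R.home("bin")

# Expected location of Rscript (the executable has an .exe suffix on Windows)
rscript_name <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
rscript_path <- file.path(r_bin, rscript_name)

# Sys.which() returns an empty string when the command is not found on PATH
on_path <- nzchar(Sys.which("Rscript"))
```

If `on_path` is `FALSE`, the directory in `r_bin` is the one that would need to be added to PATH.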

Misleading wording in 02-func-R

The first Tip in the file starts with this sentence:

"One feature unique to R is that the return statement is not required."

This feature is far from unique to R. So-called "implicit returns" are found in a variety of languages; Erlang, Ruby, and Rust are just three examples.

I would prefer something like "In R, it is not necessary to include the return statement."
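
For reference, a minimal illustration of the behavior the Tip describes (the function names are made up): an R function returns the value of its last evaluated expression, so both definitions below behave the same.

```r
# Explicit return statement
add_one_explicit <- function(x) {
  return(x + 1)
}

# No return statement: the value of the last expression is returned
add_one_implicit <- function(x) {
  x + 1
}

same_value <- identical(add_one_explicit(2), add_one_implicit(2))
```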

Transfer glossary from gloss.md in bc repo to reference.md

The definitions in gloss.md in the bc repo that are referenced in this lesson need to be transferred to the reference file. This repo already contains a reference, named 06-R.md. Thus the following steps need to be taken:

  • Rename 06-R.md to reference.md
  • Copy the definitions in gloss.md that appear in the topic pages to reference.md
  • I'd recommend running something like grep gloss.md *Rmd to get a list of all the definitions used in the lessons
  • Change the definitions to the new format for use with pandoc
  • Update the citations in the topic pages to refer to reference.html instead of gloss.html

See here for the new format of the glossary.

Make sure all lessons have objectives and keypoints

Many lessons are missing keypoints in the yaml header of the Rmd files. One is also rather thin on objectives.

Objectives will be shown at the start of a lesson, to let participants know what to expect. Key points will appear at the end of the lesson, as a checkpoint so participants can check their understanding.

The following lessons are missing Key points:
(note these lesson numbers might change again with the new lesson template, so use the titles if unsure)

Main lessons:

  • 06 Best Practices for Writing R
  • 07 Dynamic Reports with knitr
  • 08 Making Packages in R (also objective is there, but possibly too brief?)

Supplementary lessons:

  • 09 Introduction to RStudio
  • 10 Addressing Data
  • 13 Data Types and Structures
  • 14 The Call Stack
  • 15 Loops in R

Here's an example of what it should look like:


---
title: "Analyzing Multiple Data Sets"
teaching: 30
exercises: 0
questions:
- "How can I do the same thing to multiple data sets?"
- "How do I write a `for` loop?"
objectives:
- "Explain what a `for` loop does."
- "Correctly write `for` loops to repeat simple calculations."
- "Trace changes to a loop variable as the loop runs."
- "Trace changes to other variables as they are updated by a `for` loop."
- "Use a function to get a list of filenames that match a simple pattern."
- "Use a `for` loop to process multiple files."
keypoints:
- "Use `for (variable in collection)` to process the elements of a collection one at a time."
- "The body of a `for` loop is surrounded by curly braces (`{}`)."
- "Use `length(thing)` to determine the length of something that contains other values."
- "Use `list.files(path = \"path\", pattern = \"pattern\", full.names = TRUE)` to create a list of files whose names match a pattern."

---

Reference

Glossary

shape (of an array)

An array’s dimensions, represented as a vector. For example, a 5×3 array’s shape is (5,3).

In R this should be dim(), not shape.
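
A short sketch of the R equivalent, using a made-up 5×3 matrix:

```r
# A 5 x 3 array of zeros
arr <- matrix(0, nrow = 5, ncol = 3)

# dim() returns the dimensions as an integer vector: rows first, then columns
dims <- dim(arr)
```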

Make it clear which episodes are optional

The optional lesson episodes need to be clearly marked as such. (They have to be in the _episodes_rmd directory in order to trigger conversion to Markdown.) This may require modifying the template to add a style...

Missing *.svg image files for rendering in html files

Many (perhaps all) of the *.svg image files used in the novice R lessons appear to have come from the novice python lessons. These images are not in the "r-novice-inflammation" repo, and so they do not appear in the html files. The easiest fix would be to copy those images into the "figure" subdirectory (from either "bc" or "python-novice-inflammation"). Is this approach best, given it will lead to duplicate files within the SWC universe of repos?

command line lesson (lesson 5) will differ depending on who is compiling the document

If I run make lesson-rmd I will always get a change in some of the chunks because the lesson calls functions like sessionInfo(). The difference comes from my computer running a different OS than the one the current lesson was generated on (which seems to have been OS X).

Is there a better way to handle this?

This might just be an issue we maintainers need to deal with...

Not best practice in the final inflammation example

In lesson 4, when the students get their R program to the point where they have a single command to generate all the plots, there is a bit of a problem. If they run the script a second time, they open the newly created pdf documents as well as the csv files and the csv reader gets a little upset being given a pdf file ;-)

In the example before this, where they run a command to generate the plot, the plot is put in the top level directory of the exercise and the data is in the data subdirectory. When they go on to automate this, the example reads the csv file from the data subdirectory and then also writes the pdf to the same directory. My fix (and I am no R programmer!) was to get the automatic program to read the csv filenames without the path, do the substitution to create the pdf filename, and then prepend data/ to the csv filename. This results in the PDF files all being in the working directory, whilst the csv files are read from the data subdirectory. Another alternative would be to make sure that only filenames with inflammation and csv are opened. I leave it to the experts to decide which is preferable.

Andrew

As an aside, a very nasty result of the R program is that if in the string replacement, the student mistypes csv (e.g. as cvs), all of their input files are clobbered as the pdf filename is the same as the csv filename i.e. they write to the input file - the first solution above will also fix that!
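
A minimal sketch combining both suggestions above, under assumed names: the `demo_root` directory and its empty placeholder files are invented for the demonstration, and the real lesson code may differ. Inputs are listed with an anchored pattern, output names are derived from the bare filenames, and a guard stops the script if any output name equals an input name.

```r
# Hypothetical layout: inputs live in data/, PDFs go to the working directory
demo_root <- file.path(tempdir(), "plots-demo")
dir.create(file.path(demo_root, "data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, "data",
                      c("inflammation-01.csv", "inflammation-02.csv",
                        "inflammation-01.pdf")))  # a leftover PDF from an earlier run

# Anchor the pattern so only inflammation CSV files are picked up
csv_names <- list.files(path = file.path(demo_root, "data"),
                        pattern = "^inflammation.*\\.csv$")

# Read from data/, but build the output names from the bare filenames
csv_paths <- file.path(demo_root, "data", csv_names)
pdf_names <- sub("\\.csv$", ".pdf", csv_names)

# Guard: stop if a mistyped pattern left any output name equal to an input name
stopifnot(!any(pdf_names %in% csv_names))
```

With this arrangement a typo such as `cvs` in the substitution trips the guard instead of overwriting the input data.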

Update YAML header for topic pages

Currently the YAML header in the topic pages is from the previous build system using jekyll. These need to be updated to the following format:


---
layout: page
title: Lesson Title
subtitle: Topic Title
minutes: 10

---

Each topic is decently long since these were designed before the implementation of the new lesson template, so the minutes should be set to something like 30 minutes (if you've taught these before, feel free to add more realistic numbers).

Check/remove unexpected images

make lesson-check reports:

Unexpected image files: 01-supp-factors-adjusting-levels-1.png, 01-supp-factors-dropping-levels-1.png, 01-supp-factors-gender-counts-1.png, 01-supp-factors-reordering-factors-1.png, 01-supp-factors-updating-factors-1.png, 02-func-R-rescale-test-1.png, 02-func-R-rescale-test-2.png, 03-loops-R-loop-analyze-5.png, 03-loops-R-loop-analyze-6.png, 03-loops-R-loop-analyze-7.png, 03-loops-R-loop-analyze-8.png, 03-loops-R-loop-analyze-9.png, logical_vectors_indexing2-1.png, python-operations-across-axes.svg

Supply "ignore.case=TRUE" in calls to the "sub" function

I suggest supplying "ignore.case=TRUE" in calls to the "sub" function in the section entitled "Making Choices" within the R course material.

The R function "sub" is used there to substitute the filename ending "csv" with "pdf". On Windows platforms, since file names are case insensitive, users are free to use capital letters. Unfortunately, pattern strings in the "sub" function are case sensitive by default. So it is necessary to accommodate file name extensions containing capital letters by supplying the argument "ignore.case=TRUE".
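
A small sketch of the difference, using invented filenames. The anchored pattern `"\\.csv$"` is an extra precaution added here so that only the extension is replaced; it is not part of the original suggestion.

```r
# Hypothetical filenames, one with an upper-case extension
filenames <- c("inflammation-01.csv", "inflammation-02.CSV")

# Default behavior: the pattern is case sensitive, so ".CSV" is left untouched
default_sub <- sub("csv", "pdf", filenames)

# With ignore.case = TRUE both extensions are replaced
case_insensitive <- sub("\\.csv$", ".pdf", filenames, ignore.case = TRUE)
```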

Add Challenge format info to PULL_REQUEST_TEMPLATE

Add some text to PULL_REQUEST_TEMPLATE to deal with common mistakes when submitting challenges, especially:

  • Challenge title
  • Answer format

Maybe just give a challenge template? Here's a "correct" example to help formulate this template.

> ## Slicing and re-assignment {.challenge}
>
> Using the inflammation data frame `dat` from above:
> Let's pretend there was something wrong with the instrument on the first five days for every second patient (#2, 4, 6, etc.), which resulted in the measurements being twice as large as they should be.
>
> 1. Write a vector containing each affected patient (hint: `? seq`)
> 2. Create a new data frame in which you halve the first five days' values in only those patients
> 3. Print out the corrected data frame to check that your code has fixed the problem
>

``` {r slicing-re-assignment-answer, eval=FALSE, include=FALSE}
whichPatients <- seq(2,40,2)
whichDays <- c(1:5)
dat2 <- dat
dat2[whichPatients,whichDays] <- dat2[whichPatients,whichDays]/2
(dat2)
```

Not sure if I should hold off on this until the new lesson template is done?

Use two-digit identifier for generated figures

@fmichonneau writes:

For the fig path, I think it has been inconsistent. Personally, I
think it's a good idea to give figures a prefix for each episode, so
we can see right away where the figures are coming from (and whether
some need to be deleted if they are not in use anymore). Therefore, in
the gapminder lesson, I set as a prefix the 2 digit number for the
episode (even if the lesson doesn't generate a figure), so each
episode should start with something like:

    ```{r, include=FALSE}
    source("../bin/chunk-options.R")
    knitr_fig_path("01-")
    ```

  1. Make this change to all .Rmd files.
  2. Clean out all old figures.

See also swcarpentry/r-novice-gapminder#166

Inconsistent challenge naming

Some challenge titles are pre-pended with "Challenge - ". This is applied inconsistently, and seems to be absent from other lessons in other languages.

e.g. in 01-starting-with-data.Rmd
> ## Loading data with headers {.challenge}
vs
> ## Challenge - Assigning values to variables {.challenge}

I suggest removing all "Challenge - ". Any objections?
ping @chendaniely

Change figures/ to fig/

The subdirectory with images is currently figures/. This needs to be changed to fig/. The knitr chunk options at the beginning of each file also need to be updated to reflect this, e.g. opts_chunk$set(fig.path = "figure/02-func-R-") would be changed to opts_chunk$set(fig.path = "fig/02-func-R-"). See Lesson Layout for more information.
