uoftcoders / rcourse

Reproducible Quantitative Methods for EEB

Home Page: https://uoftcoders.github.io/rcourse/

License: Other

HTML 2.09% R 47.94% TeX 49.97%
rstats teaching-materials coding ecology eeb evolutionary-biology

rcourse's People

Contributors

aays, james-s-santangelo, joelostblom, lcoome, linamnt, lwjohnst86, mbonsma, qulogic, saramati, smile4life

rcourse's Issues

Where will final assignments be created?

Assuming they should be under version control and on GitHub: do we create a repo on UofTCoders for each project and add the team members to their respective repos? Or do we have one of the members create a repo and share it with the others? If so, how do we decide which student gets the original repo (and thus full control, plus the potential perception that it is the lead repo)?

My thought:

  • We create each repo on UofTCoders.
  • Add each member as a contributor.
  • Set the master branch as protected (so no one can push to it).
  • Each member forks it.
  • Each member submits PRs to the main one to keep it updated.
  • Anyone can merge PRs.

This way, we know who's done what, we can help out more closely, we can intervene if necessary (e.g., to resolve file conflicts), and we can easily provide direct feedback on the project.

Others' opinions?

Upload draft of assignments for getting started with the data set

From @joelostblom on July 13, 2017 23:32

Since exactly which data sets will be used is not finalized, focus on what you know you will include in terms of getting set up with prodigenr, tidying data, and what assignments will be associated with this. It might be a bit tricky to measure since the students choose different data sets, so let me know if you think it is unclear and we can discuss other alternatives.

Copied from original issue: UofTCoders/council#168

What's the workflow for grading assignments?

We discussed this briefly before. Should students submit assignments on Blackboard, via email, or as pull requests on GitHub? GitHub would be nice just because it is easy to comment on specific lines of a pull request, but we would need many repositories so that students cannot see each other's work. I guess it works with inline comments in email as well, but it would not be as easy to render it nicely.

Upload draft of assignments for plotting lectures

From @joelostblom on July 13, 2017 23:28

I will include some introductory ggplot in the dplyr lecture. Maybe how to use scatter plots, histograms, and grouping by a factor (maybe hive/dotplots also). The idea is that this plotting lesson should be more advanced and could include topics like boxplots, violin plots, heatmaps, clustering, 2D histograms/KDEs, faceting, theming, geoms, interactivity, confidence intervals, smoothing, etc.

The second plotting lesson will not have any assignments, but will instead focus on the students plotting their own data, ideally in specific ways as defined by these lectures. Feel free to use some of the time in the second plotting lesson to cover more material if you feel like you need it.

Copied from original issue: UofTCoders/council#167

Create a DOI of the course content at end of class for future reference.

(This is for future reference and doesn't need to be dealt with now.)

This way, we can reference/cite the tagged final version used in this term for later years, change up the course as we see fit for other purposes (later years, for other universities, for a standalone workshop/course, etc), while still having the previous, citable course.

Thoughts? (again, for later reference. It's not important now)

Should we use data.table or dplyr (or something else) for the RQM course?

From @joelostblom on March 25, 2017 14:18

This is a long term item, I listed it for the May meeting (if we have a May meeting). But I'm putting down the details since this popped up in my head now, and we can discuss it here if you have the time.

I know data.table is generally faster than dplyr, but for the purposes of this class, I think we should almost exclusively care about what syntax is more intuitive to understand and what expertise already exists in the council.

I have limited experience in R, but for me data.table is more intuitive. I think this is largely because it has similarities to Python's pandas, and I don't know if someone with a different background would share my opinion.

@lwjohnst86 and @lcoome since you have the most R experience, I have assigned you to this issue. What do you use and what are your thoughts on this topic?


A discussion of the differences between data.table and dplyr can be found in this SO thread. There are several syntax examples from the authors of the respective packages. I'll list a couple here for those who don't want to read the thread:

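# dplyr (the diamonds data set ships with ggplot2)
library(dplyr)
library(ggplot2)

diamonds %>%
  filter(cut != "Fair") %>%
  group_by(cut) %>%
  summarize(
    AvgPrice = mean(price),
    MedianPrice = as.numeric(median(price)),
    Count = n()
  ) %>%
  arrange(desc(Count))

# data.table (same query; diamonds must first be converted to a data.table)
library(data.table)
diamonds <- as.data.table(diamonds)

diamonds[
  cut != "Fair",
  .(AvgPrice = mean(price),
    MedianPrice = as.numeric(median(price)),
    Count = .N
  ),
  by = cut
][
  order(-Count)
]

# Two shorter comparisons; DT is a data.table and DF a data frame,
# both with columns x, y, and z
DT[, sum(y), by=z]                       ## data.table
DF %>% group_by(z) %>% summarise(sum(y)) ## dplyr

DT[, if(any(x > 5L)) y[1L]-y[2L] else y[2L], by=z]                        ## data.table
DF %>% group_by(z) %>% summarise(if (any(x > 5L)) y[1L]-y[2L] else y[2L]) ## dplyr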

Copied from original issue: UofTCoders/council#132

Encourage students to use GitHub and Gitter as their primary ways of communication?

One of our grading criteria is continuous progress. We will already have access to their commit history and some level of discussion in issues, depending on how the students use it. Students will likely also use some sort of chat service to coordinate meetings etc.

It would be nice if students used GitHub and Gitter almost exclusively instead of email and standard chat apps, both because this will make them familiar with this workflow faster and because it will facilitate fair grading: we can follow their entire discussion and see who leads the group in terms of setting up meetings, making sure deadlines are met, etc.

Potential drawbacks:

  • Students feel anxious that what they discuss within the group can be viewed by their TAs (and everyone on the internet if the repo is public).
    • This is how they are likely to work in the future, and it is important to get exposed to it and to experience having their professional opinions associated with their name online.
    • They can always send emails to each other for matters they regard as confidential.
  • More work for us in grading the chat history.
    • We don't have to go through it in minute detail. It is more to see if someone really pulls all the weight in a team or if some teams don't communicate at all.

Finalize topic for what is now "Linear regression population dynamics models"

From @joelostblom on July 13, 2017 23:36

This is currently about linear regression since Martin mentioned this briefly and it ties in nicely with the Tuesday lecture also being on modelling. But I don't think @mbonsma needs this lecture to tie into her modelling content (correct me if I'm wrong), so feel free to change this to anything you like, for example if you feel there is a really important R/analytics/statistics concept that we should include. You can also shuffle your lectures and have the data set introduction here if you want students to think about it over the weekend and start working with their data set the following Tuesday.

Copied from original issue: UofTCoders/council#169

Follow up with Francois from CS

From @joelostblom on March 30, 2017 22:36

This is not of immediate concern.
If computer science is not interested, and EEB still wants it to be cross-appointed if it turns into a full course, I think we should involve Stats or SciNet, which EEB already has connections with. I have heard good things about SciNet, at least from the one student I know who has tried their course offerings =)

Copied from original issue: UofTCoders/council#135

Ask whether we can get MiKTeX and the necessary R LaTeX packages on the computers

I think it would be great if these could be preinstalled so that students can easily render PDF reports and see how easy it is once everything is set up. A strong point of markdown for me in the first place is that I can easily render PDFs and hand them in for any class assignment. For this purpose, it is not as useful to teach only how to generate markdown documents, since teachers will probably not accept those for assignments.
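
To make the dependency concrete, here is a minimal sketch of rendering a report to PDF once a LaTeX distribution is installed. It assumes the rmarkdown package is available; the helper names and the file name report.Rmd are hypothetical and only for illustration.

```r
# Check whether a LaTeX engine is on the PATH (MiKTeX and TeX Live both
# provide pdflatex); without one, PDF output cannot be produced.
latex_available <- function() {
  unname(nzchar(Sys.which("pdflatex")))
}

# Render an R Markdown file to PDF, failing early with a clear message.
render_pdf_report <- function(rmd_path) {
  if (!file.exists(rmd_path)) {
    stop("Cannot find ", rmd_path)
  }
  if (!latex_available()) {
    stop("No LaTeX installation found; install MiKTeX or TeX Live first")
  }
  rmarkdown::render(rmd_path, output_format = "pdf_document")
}

# Usage (not run; "report.Rmd" is a hypothetical file name):
# render_pdf_report("report.Rmd")
```

Running latex_available() on the lab computers would be a quick way to confirm whether the installation request went through.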

Setup guide

Decide on which R style guide to follow

From @joelostblom on July 16, 2017 17:46

From UofTCoders/council#164

@lwjohnst86 :
For the file naming, it might be best to encourage them to follow a standard that has been developed, rather than create their own. Like Google's R style guide https://google.github.io/styleguide/Rguide.xml

@joelostblom :
I agree that the students should follow an already developed style guideline, and I think it is important that we are consistent across the lectures as well.

I think it is trickier to agree on which style to follow, since R does not have official guidelines. I believe Google's recommendations are sound by and large, with the exception of using periods in object/variable and column names. This is confusing for me, coming from an object-oriented language, and I would prefer snake case (a.k.a. underscores: variable_name) in variable names and Pascal / upper camel case in column names (ColumnName). But, and I keep saying this, I'm happy to conform to other standards based on advice from the more experienced R users among us and discussion with the rest of the group.

However, I do believe underscores would be helpful for students if they are ever to read or write code in another language, where periods have special meaning. This coding style is not without precedent in the R community, e.g. Hadley Wickham's stat 405 course recommends underscores for variable names, as does this R style guide. The most compelling reason to use underscores in our case might be to stay consistent with the recommendations in the tidyverse guidelines (naturally similar to those from stat 405), since we will largely be using packages from the tidyverse and it would be confusing to mix these conventions with those from another style guide. The tidyverse favors underscores for objects, functions, and parameters, e.g. tbl_df, group_by, geom_point, facet_wrap, etc. The tidyverse style guide is the most exhaustive of the ones I have seen, and often includes justifications of the chosen style, which helps understanding and memorization. There is also a package, lintr, for automatically checking adherence to the tidyverse guidelines, and I believe this is the syntax checker that is built into RStudio.

Some additional info: Bioconductor recommends lower camel case. Some Swedish research shows that all three styles are common in existing CRAN packages. This is a helpful SO question, with several good answers.
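
Periods also have a special meaning inside R itself, which is a further argument for underscores. The sketch below uses only base R; the class name temperature and the variable names are made up for illustration.

```r
# S3 dispatch uses the period to join a generic and a class name, so
# print.temperature() is called automatically by print() for objects
# of class "temperature".
print.temperature <- function(x, ...) {
  cat(format(unclass(x)), "degrees C\n")
  invisible(x)
}

reading <- structure(21.5, class = "temperature")
print(reading)

# With snake case there is no such ambiguity: these names can only be
# ordinary objects, never S3 methods.
body_mass_g <- c(31.2, 28.7, 35.1)
mean_body_mass_g <- mean(body_mass_g)
```

A reader who sees print.temperature cannot tell from the name alone whether it is a method or just a function with a period in its name, whereas mean_body_mass_g is unambiguous.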

Copied from original issue: UofTCoders/council#175

Decide on which one or two datasets to use

As per our discussion with Martin, it would be simpler for us if we limited the choice of datasets they can use, so that troubleshooting on our end is easier. It would also help with grading, inter-team help/problem solving, etc.

Include working with time series data in dplyr?

I don't really have experience with time series, and I was initially thinking of not including it. However, it seems to be a prevalent data type to work with in ecology, and Martin mentioned that one of the databases is largely species abundance over time, so I think we should include this. I will add it to one of the assignments, probably the last one.
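
As a rough idea of what such an assignment question could look like, here is a small sketch that computes the year-over-year change in abundance per species with dplyr. The data frame, its column names, and the species codes are invented for illustration and are not taken from any of the candidate datasets.

```r
library(dplyr)

# Hypothetical survey records: one row per species per sampling year
surveys <- data.frame(
  year    = c(2001, 2001, 2002, 2002, 2003, 2003),
  species = c("DM", "PP", "DM", "PP", "DM", "PP"),
  count   = c(12, 5, 15, 9, 11, 14)
)

# Year-over-year change in abundance for each species;
# the first year of each species has no previous value, so it is NA
abundance_change <- surveys %>%
  group_by(species) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(change = count - dplyr::lag(count)) %>%
  ungroup()
```

This only needs group_by, arrange, and mutate, so it would fit after the core dplyr verbs have been covered.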

Lecture hall items to follow up on ~1 week before RQM classes start

From @joelostblom on May 11, 2017 2:50

  • Check that the temperature in Ramsay Wright 109 has been fixed: Yes
  • New RStudio installed in Carr Hall? Needs to be > 1.0 for notebooks
  • Did they fix NetSupport mass login support?
  • Is NetSupport now available in Ramsay Wright 109? Yes
  • Did they install the packages or do we have to install every time?
  • We should visit and test things out maybe a week or so before the semester starts, especially if we want to do in-class surveys. Done: nothing installed =(

Copied from original issue: UofTCoders/council#156

Style guide for R lectures and courses

See here for initial issue: UofTCoders/council#175

Include participation marks?

Like 5% or something. For random, non-assignment related tasks (e.g. for completing an important exercise in class or something).

Survey students about how challenging the course is?

Martin's mentioned that it is key to adjust the level of teaching to the students' ability, so that students don't feel either bored because it is too easy or lost because it is too hard. How do we want to keep an eye on students' opinions of the teaching level?

We will have the results from the assessments, and students can always contact us via email. Do we want to add a survey (maybe after each block) to actively encourage students to give their opinion about how challenging the course is?

Make list of required R-packages to be installed before the first class

From @joelostblom on May 5, 2017 17:53

We could also install these at the first class, or let the students install them as a learning experience. However, in order to get things running smoothly, I think we should install most, if not all, of them prior to the first class and test that things are working as expected.

Packages:

  • dplyr
  • prodigenr
  • ggplot2
  • rmarkdown
  • plotly?

@lwjohnst86 @lcoome what else is not part of a base RStudio install? Does it come with all of these?
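
One way to answer this on the lab computers themselves is to compare the list above against what is actually installed. The following is a minimal sketch in base R; the helper name find_missing_packages is hypothetical.

```r
# Packages listed in this issue
required_packages <- c("dplyr", "prodigenr", "ggplot2", "rmarkdown", "plotly")

# Return the packages from `required` that are not yet installed.
# `installed` defaults to everything found in the current library paths.
find_missing_packages <- function(required,
                                  installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

missing_packages <- find_missing_packages(required_packages)

# Uncomment to install whatever is missing (needs internet access):
# if (length(missing_packages) > 0) install.packages(missing_packages)
```

Running the same script before the first class would also double as the test that things are working as expected.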

Copied from original issue: UofTCoders/council#155

Discuss how many days students should have to complete assignments

From @joelostblom on July 13, 2017 23:8

Joel:
The dates are chosen so that each assignment is due on Monday the following week. Is this enough time if we hand out assignments during the Thursday lecture? I chose Monday since it would give us some time to briefly go over most of the assignments before the next class and briefly repeat general concepts if many students struggled with the same problem.

Thinking about it a little more, I believe a due date on Tuesday might be better. This gives students a chance to ask questions before and after class regarding the assignment, before they hand it in. It would fit particularly well if we make Tuesdays our office hours. It would also give us an additional day (Wednesday) to go over the assignments and bring up any key concepts in the following Thursday lecture. I will change to this in the next iteration unless there are opposing views.

lwjohnst86 a day ago Owner
I like this idea a lot

mbonsma a day ago Owner
So they would typically have five days to do the assignments? I think it should be a week and five days, unless they're quick.

joelostblom a day ago Member
I like the idea of a quick turn around (5 days) better, but let's discuss in person after we have a clearer idea of how involved the assignments will be.

joelostblom a minute ago Member
We talked briefly about this today. We agree that the due date should be on the same day that we have office hours so that students can come and ask last-minute questions.

We largely agreed that it would be better if students had 8 days rather than 3 days (excluding weekends) to hand in the assignment, especially since around 10 days seems to be standard and we don't want to scare students away by making them work on the weekends right away. The drawback of this is that it would be harder to follow up promptly if there are particular concepts that many students don't understand. We could remedy this with a socrates online quiz at the beginning of the following lecture (maybe during the break?), and repeat any concepts that are unclear for many students. We just don't want to add too much overhead.
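
The 3 and 8 working-day figures can be checked with a few lines of base R. This sketch assumes an assignment handed out on a Thursday and due on the following Tuesday or the Tuesday after that; the specific date (Thursday, 7 September 2017) is only an example and the helper name is hypothetical.

```r
# Count working days after the hand-out date, up to and including the
# due date. as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday.
count_weekdays <- function(handed_out, due) {
  days <- seq(handed_out + 1, due, by = "day")
  sum(!as.POSIXlt(days)$wday %in% c(0, 6))
}

handed_out <- as.Date("2017-09-07")   # an example Thursday
short_due  <- as.Date("2017-09-12")   # the following Tuesday
long_due   <- as.Date("2017-09-19")   # the Tuesday one week later

count_weekdays(handed_out, short_due)  # 3 working days
count_weekdays(handed_out, long_due)   # 8 working days
```

In calendar days these two options are 5 and 12 days, which matches the "five days" and "a week and five days" mentioned earlier in the thread.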

Copied from original issue: UofTCoders/council#165

Plan for the first class

This is for the first lecture, exactly one week from now!

This is what I imagine for the first class.

  1. Martin talks for ~15 min and then the stage is ours.
  2. I spend 10 min introducing the general concept of the class, why we are excited to teach it, and how we will work throughout the semester. I also talk a bit about the logistics of the class: that we will use GitHub and Blackboard, how assignments will be distributed, etc.
  3. I am in favor of walking through the syllabus by having each person take ~4 min to introduce themselves, what they will teach, why they are excited about reproducible/quantitative/programming skills and an open science workflow, and how learning these skills has helped them in their research.
  4. (We should be around 40 min into the class now. Have a break, or continue since it is a lightweight first class anyway? If we have a break, we can say a few words to Martin and then those who want to leave can do so.)
  5. I talk about intro to programming (~30 min)
  6. I talk about intro to RStudio and R Markdown (~30 min)
  7. Should I bring up anything about the first assignment?

I imagine we don't have to bring up every single item on the syllabus, like academic integrity etc, just mentioning that they need to read the syllabus should be sufficient?

How to form groups?

So the discussion is formalized here. What does everyone think about how the groups will be formed?

I recall we talked about having groups with varying skill sets involved. I like that approach, but how will we actually go about doing that? Randomly split them based on skills? Or some other criteria?
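
One possible way to do a random split that still mixes skill levels is sketched below in base R. It assumes we have some skill score per student (for example from a pre-course survey); the data, the column names, and the function name assign_groups are all hypothetical.

```r
# Assign students to groups so that each group mixes skill levels.
# Students are shuffled, sorted by skill, then dealt out like cards,
# so ties within a skill level are broken at random.
assign_groups <- function(students, n_groups) {
  shuffled <- students[sample(nrow(students)), , drop = FALSE]
  ordered <- shuffled[order(shuffled$skill), , drop = FALSE]
  ordered$group <- rep_len(seq_len(n_groups), nrow(ordered))
  ordered
}

# Hypothetical class list; skill is a self-reported score from 1 to 3
students <- data.frame(
  name  = paste0("student_", 1:9),
  skill = rep(1:3, each = 3)
)

set.seed(1)  # makes the random assignment reproducible
groups <- assign_groups(students, n_groups = 3)

# Cross-tabulate groups against skill levels to check the balance
table(groups$group, groups$skill)
```

With this toy class every group ends up with one student per skill level; with real, uneven data the groups would only be approximately balanced.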

Thoughts?

For modeling, use data to guide what we teach?

This is really for @mbonsma and me. Should we first decide on what one or two datasets we give them before completely fleshing out the lecture material? Makes it easier for us and easier for the students. So we can focus on teaching material that we think they will actually use in the project.
