uoftcoders / rcourse
Reproducible Quantitative Methods for EEB
Home Page: https://uoftcoders.github.io/rcourse/
License: Other
Assuming they should be under version control and on GitHub... Do we create a repo on UofTCoders for each project and add the team members to each their respective repo? Do we have one of the members create a repo and the other share? If so, how do we decide which student gets the original repo (thus full control and potential perception that it is the lead repo..)?
My thought:
This way, we know who has done what, we can help out more closely, we can intervene if necessary (e.g. to resolve any file conflicts), and we can easily provide direct feedback on the project.
Others' opinions?
From @joelostblom on July 13, 2017 23:32
Since exactly which data sets will be used is not finalized, focus on what you know you will include in terms of getting set up with prodigenR, tidying data, and what assignments will be associated with this. It might be a bit tricky to measure since the students choose different data sets, so let me know if you think it is unclear and we can discuss alternatives.
Copied from original issue: UofTCoders/council#168
We discussed this briefly before. Should students submit assignments on Blackboard, via email, or as pull requests on GitHub? GitHub would be nice just because it is easy to comment on specific lines of a pull request, but it would need many repositories so that students cannot see each other's work... I guess it works with inline comments in email as well, but it would not be as easy to render them nicely...
I'll have to test this out, but since the assignments will be uploaded to Blackboard as PDFs, the links should be explicit, so that students can click them.
(based on a comment from @QuLogic)
I would suggest a couple of folders named 'lectures' and 'assignments'. @lwjohnst86 Did you already have something in mind for this?
From @joelostblom on March 24, 2017 0:40
Copied from original issue: UofTCoders/council#117
For feedback on what was good etc
From @joelostblom on July 13, 2017 23:28
I will include some introductory ggplot in the dplyr lecture. Maybe how to use scatter plots, histograms, and grouping by a factor (maybe hive/dotplots also). The idea is that this plotting lesson should be more advanced and could include topics like boxplots, violin plots, heatmaps, clustering, 2D histograms/KDEs, faceting, theming, geoms, interactivity, confidence intervals, smoothing, etc.
The second plotting lesson will not have any assignments, but will instead focus on the students plotting their own data, ideally in specific ways as defined by these lectures. Feel free to use some of the time in the second plotting lesson to cover more material if you feel like you need it.
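To make the split between the introductory and the more advanced plotting material concrete, here is a minimal sketch. It assumes ggplot2 is installed and uses its bundled mpg data set, which is only a stand-in for whatever data the lectures end up using.

```r
library(ggplot2)

# Introductory level: scatter plot with points coloured by a factor.
scatter_plot <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

# More advanced level: violin plots with narrow boxplots on top,
# one facet per model year, and a non-default theme.
violin_plot <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_violin() +
  geom_boxplot(width = 0.1) +
  facet_wrap(~ year) +
  theme_minimal()

print(scatter_plot)
print(violin_plot)
```

The first plot only needs aesthetics and one geom, which fits alongside the dplyr lecture; the second layers geoms, facets, and a theme, which is closer to the topics listed for this lesson.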
Copied from original issue: UofTCoders/council#167
It is currently listed as Dec 9-20 in the syllabus, let's pick a day.
From @joelostblom on May 4, 2017 1:45
Copied from original issue: UofTCoders/council#151
Could a few of the questions be an expansion of the material taught during the lesson? Or do we want every concept to be explained in class and then the students work through it with their own data?
From @joelostblom on March 24, 2017 0:41
Are there restrictions in general for who we can recruit as a helper for the RQM tutorials?
Copied from original issue: UofTCoders/council#118
I think we discussed this previously. It would be useful to have a survey of the students' skill levels, familiarity with concepts, etc. for our own knowledge and use.
Depending on how we deal with forming groups, this survey could help with that.
From @joelostblom on July 13, 2017 23:39
Copied from original issue: UofTCoders/council#170
(This is for future reference and doesn't need to be dealt with now.)
This way, we can reference/cite the tagged final version used in this term for later years, change up the course as we see fit for other purposes (later years, for other universities, for a standalone workshop/course, etc), while still having the previous, citable course.
Thoughts? (again, for later reference. It's not important now)
From @joelostblom on March 25, 2017 14:18
This is a long term item, I listed it for the May meeting (if we have a May meeting). But I'm putting down the details since this popped up in my head now, and we can discuss it here if you have the time.
I know data.table is generally faster than dplyr, but for the purposes of this class, I think we should almost exclusively care about what syntax is more intuitive to understand and what expertise already exists in the council.
I have limited experience in R, but for me data.table is more intuitive. I think this is largely because it has similarities to Python Pandas, and I don't know if someone with a different background would share my opinion.
@lwjohnst86 and @lcoome since you have the most R experience, I have assigned you to this issue. What do you use and what are your thoughts on this topic?
A discussion of the differences between data.table and dplyr can be found in this SO thread. There are several syntax examples from the authors of the respective packages. I'll list a couple here for those who don't want to read the thread:
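# dplyr
library(dplyr)
library(ggplot2)  # provides the diamonds data set

diamonds %>%
  filter(cut != "Fair") %>%
  group_by(cut) %>%
  summarize(
    AvgPrice = mean(price),
    MedianPrice = as.numeric(median(price)),
    Count = n()
  ) %>%
  arrange(desc(Count))

# data.table
library(data.table)
diamonds <- as.data.table(diamonds)  # the bracket syntax below needs a data.table

diamonds[
  cut != "Fair",
  .(AvgPrice = mean(price),
    MedianPrice = as.numeric(median(price)),
    Count = .N
  ),
  by = cut
][
  order(-Count)
]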
DT[, sum(y), by=z] ## data.table
DF %>% group_by(z) %>% summarise(sum(y)) ## dplyr
DT[, if(any(x > 5L)) y[1L]-y[2L] else y[2L], by=z] ## data.table
DF %>% group_by(z) %>% summarise(if (any(x > 5L)) y[1L]-y[2L] else y[2L]) ## dplyr
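The one-line comparisons above rely on objects DT and DF that are not defined in the thread. A minimal sketch that makes the first comparison runnable, assuming both packages are installed and using a made-up table with columns x, y, and z:

```r
library(data.table)
library(dplyr)

# Hypothetical toy data; the column names x, y, z follow the snippets above.
DF <- data.frame(
  x = c(1L, 6L, 2L, 3L),
  y = c(10, 20, 30, 40),
  z = c("a", "a", "b", "b")
)
DT <- as.data.table(DF)

# Grouped sum with data.table, using the DT[i, j, by] form.
dt_sums <- DT[, .(total = sum(y)), by = z]

# The same grouped sum with dplyr.
dplyr_sums <- DF %>%
  group_by(z) %>%
  summarise(total = sum(y))

print(dt_sums)
print(dplyr_sums)
```

Both calls return one row per value of z. Naming the summary column (total here) in both versions makes the results easier to compare than the unnamed sum(y) in the one-liners.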
Copied from original issue: UofTCoders/council#132
From @joelostblom on March 25, 2017 12:42
Copied from original issue: UofTCoders/council#130
From @joelostblom on July 15, 2017 16:27
Copied from original issue: UofTCoders/council#174
One of our grading criteria is continuous progress. We will already have access to their commit history and some level of discussion in issues, depending on how the students use them. Students will likely also use some sort of chat service to coordinate meetings etc.
It would be nice if students used GitHub and Gitter almost exclusively instead of email and standard chat apps, both because this will make them familiar with this workflow faster and because it will facilitate fair grading, since we can follow their entire discussion and see who leads the group in terms of setting up meetings, making sure deadlines are met, etc.
From @joelostblom on July 13, 2017 23:36
This is currently about linear regression since Martin mentioned this briefly and it ties in nicely with the Tuesday lecture also being on modelling. But I don't think @mbonsma needs this lecture to tie into her modelling content (correct me if I'm wrong), so feel free to change this to anything you like, for example if you feel there is a really important R/analytics/statistics concept that we should include. You can also shuffle your lectures and have the data set introduction here if you want students to think about it over the weekend and start working with their data set the following Tuesday.
Copied from original issue: UofTCoders/council#169
From @joelostblom on March 30, 2017 22:36
This is not of immediate concern.
If computer science is not interested, and EEB still wants it to be cross-appointed if it turns into a full course, I think we should involve Stats or SciNet, which EEB already has connections with. I have heard good things about SciNet, at least from the one student I know who has tried their course offerings =)
Copied from original issue: UofTCoders/council#135
I think it would be great if these could be preinstalled so that students can easily render PDF reports and see how easy it is once everything is set up. A strong reason for me to use markdown in the first place is that I can easily render PDFs and hand them in for any class assignment. For this purpose, it is not as useful to teach only how to generate markdown documents, since teachers will probably not accept those for assignments.
From @joelostblom on July 16, 2017 17:46
@lwjohnst86 :
For the file naming, it might be best to encourage them to follow a standard that has been developed, rather than create their own. Like Google's R style guide https://google.github.io/styleguide/Rguide.xml
@joelostblom :
I agree that the students should follow an already developed style guideline, and I think it is important that we are consistent across the lectures as well.
I think it is trickier to agree on which style to follow, since R does not have official guidelines. I believe Google's recommendations are largely sound, with the exception of using periods in object/variable and column names. This is confusing for me, coming from an object-oriented language, and I would prefer snake case (a.k.a. underscores: variable_name) in variable names and Pascal / upper camel case in column names (ColumnName). But, and I keep saying this, I'm happy to conform to other standards based on advice from the more experienced R users among us and discussion with the rest of the group.
However, I do believe underscores would be helpful for students if they are ever to read or write code in another language, where periods have special meaning. This coding style is not without precedent in the R community, e.g. Hadley Wickham's stat 405 course recommends underscores for variable names, as does this R style guide. The most compelling reason to use underscores in our case might be to stay consistent with the recommendations in the tidyverse guidelines (naturally similar to those from stat 405), since we will largely be using packages from the tidyverse and it would be confusing to mix these conventions with those from another style guide. The tidyverse favors underscores for objects, functions, and parameters, e.g. tbl_df, group_by, geom_point, facet_wrap, etc. The tidyverse style guide is the most exhaustive of the ones I have seen, and often includes justifications of the chosen style, which helps understanding and memorization. There is also a package, lintr, for automatically checking adherence to the tidyverse guidelines, and I believe this is the syntax checker that is built into RStudio.
Some additional info: Bioconductor recommends lower camel case. Some Swedish research shows that all three styles are common in existing CRAN packages. This is a helpful SO question, with several good answers.
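As a rough illustration of the two naming styles proposed above (snake case for variables, upper camel case for columns), the following base R sketch checks names against simple regular expressions. The helper names and the patterns are assumptions made for this example, not part of any style guide, and lintr remains the more complete tool for real checks.

```r
# Hypothetical helpers: TRUE when a name follows the given style.
# snake_case: lowercase words or digits joined by single underscores.
is_snake_case <- function(x) grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", x)

# UpperCamelCase: one or more capitalised words, no separators.
is_upper_camel_case <- function(x) grepl("^([A-Z][a-z0-9]*)+$", x)

variable_names <- c("variable_name", "variable.name", "variableName")
column_names <- c("ColumnName", "column_name")

snake_ok <- is_snake_case(variable_names)
camel_ok <- is_upper_camel_case(column_names)

print(setNames(snake_ok, variable_names))
print(setNames(camel_ok, column_names))
```

Only variable_name passes the snake case check and only ColumnName passes the upper camel case check; the period-separated and lower camel case variants are flagged.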
Copied from original issue: UofTCoders/council#175
As per our discussion with Martin, it would be simpler for us if we limited the choice of datasets they can use, so that troubleshooting on our end is easier. It would also help with grading, inter-team help/problem solving, etc.
I don't really have experience with time series, and I was initially thinking of not including them. However, they seem to be a prevalent data type in ecology, and Martin mentioned that one of the databases is largely species abundance over time, so I think we should include this. I will add it to one of the assignments, probably the last one.
From @joelostblom on May 11, 2017 2:50
Copied from original issue: UofTCoders/council#156
See here for initial issue: UofTCoders/council#175
Originally from pull request UofTCoders/council#164
From @joelostblom on April 29, 2017 13:41
EEB430H1F T 2-4, RW109; TH 2-4, Carr Hall (St. Mike's)
From Martin:
The RW 109 lab should have R installed and you should be able to go and log in with your UTORID and see the set up. I do not know the Carr Hall lab so you’ll have to look into that. Best wishes,
Copied from original issue: UofTCoders/council#143
So that students can't get hold of the answers, should we make the repository private? Or keep them in an encrypted zip? Or make a separate repo?
From @joelostblom on July 15, 2017 16:25
Either in https://github.com/openjournals/jose, self-published on the UofTCoders website, or elsewhere.
Copied from original issue: UofTCoders/council#173
Like 5% or something. For random, non-assignment related tasks (e.g. for completing an important exercise in class or something).
Martin mentioned that it is key to adjust the level of teaching to the students' ability, so that students don't feel either bored because it is too easy or lost because it is too hard. How do we want to keep an eye on students' opinions of the teaching level?
We will have the results from the assessments, and students can always contact us via email. Do we want to add a survey (maybe after each block) to actively encourage students to give their opinion about how challenging the course is?
From @joelostblom on May 5, 2017 17:53
We could also install this at the first class, or let the students install them as a learning experience. However, in order to get things running smoothly, I think we should install most, if not all of them, prior to the first class and test that things are working as expected.
@lwjohnst86 @lcoome what else is not part of a base RStudio install? Does it come with all of these?
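One way to test a lab machine before the first class is to compare a list of required packages against what is already installed. The sketch below uses only base R. The package list is an assumption pieced together from packages mentioned in these discussions (rmarkdown is a guess based on the PDF rendering plans) and should be replaced with the final course list.

```r
# Assumed package list; replace with the packages the course actually needs.
course_packages <- c("dplyr", "data.table", "ggplot2", "lintr", "rmarkdown")

# Names of all packages installed in the current library paths.
installed <- rownames(installed.packages())

# Packages from the list that are not yet installed on this machine.
missing_packages <- setdiff(course_packages, installed)

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # install.packages(missing_packages)  # uncomment to install from CRAN
} else {
  message("All course packages are installed.")
}
```

Running this on each lab computer gives a quick answer to which packages still need installing, without changing anything unless the install line is uncommented.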
Copied from original issue: UofTCoders/council#155
From @joelostblom on March 25, 2017 11:49
Keep an eye on http://www.openaccessweek.org/ and see if they will post anything that we could mention in the course.
Copied from original issue: UofTCoders/council#129
From @joelostblom on July 13, 2017 23:8
Joel:
The dates are chosen so that each assignment is due on Monday the following week. Is this enough time if we hand out assignments during the Thursday lecture? I chose Monday since it would give us some time to briefly go over most of the assignments before the next class and briefly repeat general concepts if many students struggled with the same problem.
Thinking about it a little more, I believe a due date on Tuesday might be better. This gives students a chance to ask questions about the assignment before and after class, before they hand it in. It would fit particularly well if we make Tuesdays our office hours. It would also give us an additional day (Wednesday) to go over the assignments and bring up any key concepts in the following Thursday lecture. I will change to this in the next iteration unless there are opposing views.
lwjohnst86 a day ago Owner
I like this idea a lot
mbonsma a day ago Owner
So they would typically have five days to do the assignments? I think it should be a week and five days, unless they're quick.
joelostblom a day ago Member
I like the idea of a quick turn around (5 days) better, but let's discuss in person after we have a clearer idea of how involved the assignments will be.
joelostblom a minute ago Member
We talked briefly about this today. We agree that the due date should be on the same day that we have office hours so that students can come and ask last-minute questions.
We largely agreed that it would be better if students had 8 days rather than 3 days (excluding weekends) to hand in the assignment, especially since around 10 days seems to be standard and we don't want to scare students away by making them work on the weekends right away... The drawback of this is that it would be harder to follow up promptly if there are particular concepts that many students don't understand. We could remedy this with a socrates online quiz at the beginning of the following lecture (maybe during the break?), and repeat any concepts that are unclear for many students. We just don't want to add too much overhead...
Copied from original issue: UofTCoders/council#165
This is for the first lecture, exactly one week from now!
This is what I imagine for the first class.
I imagine we don't have to bring up every single item on the syllabus, like academic integrity etc, just mentioning that they need to read the syllabus should be sufficient?
So the discussion is formalized here. What does everyone think about how the groups will be formed?
I recall we talked about having groups with varying skill sets involved. I like that approach, but how will we actually go about doing that? Randomly split them based on skills? Or some other criteria?
Thoughts?
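As one possible way to split students at random while mixing skill levels, the sketch below sorts students by a survey score and deals them into groups in turn. The student names, the 1 to 5 skill score, and the number of groups are all made up for illustration; nothing here reflects a decision by the group.

```r
set.seed(430)  # fixed seed so the assignment can be reproduced

# Hypothetical survey results: one row per student, skill from 1 (novice) to 5.
students <- data.frame(
  name = paste0("student_", 1:12),
  skill = c(1, 5, 3, 2, 4, 1, 3, 5, 2, 4, 3, 2)
)
n_groups <- 4

# Shuffle first so that ties in skill are broken at random, then sort by skill.
shuffled <- students[sample(nrow(students)), ]
ranked <- shuffled[order(shuffled$skill), ]

# Deal students into groups in turn, like dealing cards, so that every
# group receives students from across the skill range.
ranked$group <- rep_len(seq_len(n_groups), nrow(ranked))

group_sizes <- table(ranked$group)
group_means <- tapply(ranked$skill, ranked$group, mean)

print(group_sizes)
print(group_means)
```

Dealing in a fixed order slightly favours the later groups, so reversing the order on every second pass (a "snake" draft) would balance the group means further if that matters.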
This is really for @mbonsma and me. Should we first decide on what one or two datasets we give them before completely fleshing out the lecture material? Makes it easier for us and easier for the students. So we can focus on teaching material that we think they will actually use in the project.
From @joelostblom on July 13, 2017 23:16
Copied from original issue: UofTCoders/council#166