datacamp / authoring
Home Page: https://authoring.datacamp.com/
License: Other
There doesn't appear to be a mention of avoiding back-to-back multiple choice exercises either. Given the discussion Greg led on good multiple choice exercises, I think we should revisit both of these pseudo-guidelines anyway. I'm going to proceed with instructors on the assumption that these are not strict guidelines, since they aren't included here.
In http://authoring.datacamp.com/courses/design/brainstorming-jargon.html it's maybe worth mentioning that "longitudinal data" is also called "repeated measures data" and "panel data".
From Dave Robinson's Stochastic Processes in R: the sample, cumsum, replicate, and accumulate functions. Link to student profiles.
If these don't match exactly, feel free to modify as needed in discussion:
This could be similar to the last exercise of the course or a last exercise in a chapter.
Last exercise in Chapter 2: Write a function that simulates 100 steps from a Markov chain of words, given a transition_matrix with row and column names.
Skills:
- sample with prob to find the next state in a transition matrix
- accumulate to save up many steps in a chain
Solution:
library(purrr)
# Take one step: sample the next state using the current state's row of
# transition probabilities, then return its name. The dots absorb the step
# index that accumulate() passes as a second argument.
simulate_step <- function(state, ...) {
  state <- sample(nrow(transition_matrix), 1, prob = transition_matrix[state, ])
  colnames(transition_matrix)[state]
}
accumulate(1:100, simulate_step, .init = "the")
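This solution assumes a transition_matrix already exists in the workspace. A toy matrix to test it against might look like this (a hypothetical example of mine, not taken from the course spec):
# Hypothetical 3-word transition matrix (each row sums to 1); illustration only
words <- c("the", "cat", "sat")
transition_matrix <- matrix(
  c(0.1, 0.6, 0.3,
    0.2, 0.2, 0.6,
    0.5, 0.3, 0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(words, words)
)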
Last exercise in the course: Write a function that, given a number of points, the width of a region, and the height of a region, generates that many points in a Poisson point process.
Use it to plot 50 points in the space of 10 x 10.
Skills required:
- rpois
- runif, twice
- plot to compare x and y (we'd give scaffolding).
Solution:
simulate_points <- function(density, x_width, y_width) {
  # The number of points is Poisson, with mean = density * area
  number <- rpois(1, density * x_width * y_width)
  # Given the count, point locations are uniform over the rectangle
  x <- runif(number, 0, x_width)
  y <- runif(number, 0, y_width)
  plot(x, y)
}
simulate_points(1, 10, 10)
Middle of chapter 2: Generate three steps in a Markov chain given a transition matrix and starting state.
Solution:
state2 <- sample(nrow(transition_matrix), 1, prob = transition_matrix[state, ])
state3 <- sample(nrow(transition_matrix), 1, prob = transition_matrix[state2, ])
state4 <- sample(nrow(transition_matrix), 1, prob = transition_matrix[state3, ])
Middle of chapter 4: Randomly generate 100 events in a one-dimensional Poisson process with a rate of 3 per second by simulating exponential waiting times. Find the distribution of how many events happen in the first 2 seconds.
Solution:
# Event times are the cumulative sum of exponential waiting times
cumsum(rexp(100, 3))
# Distribution of the number of events in the first 2 seconds
replicate(1000, sum(cumsum(rexp(100, 3)) <= 2))
Lesson 1.1 - Random walks: Imagine you were gambling with a friend, and betting on a coin. Each time you either lose one dollar or gain one dollar. This is a random walk: at any moment, it could go up or down one step. Simulate the steps with sample(), find the cumulative position with cumsum(), and visualize it with plot(), which looks a bit like a stock price graph (see the sketch after this outline).
Lesson 1.2 - Biased random walk: So far the random walk has been symmetrical, with an equal probability of gaining or losing.
Lesson 1.3 - Properties of a random walk: Where will a random walk end up after 10 steps? 100 steps? Use replicate() for simulating it.
Lesson 2.1 - Transition matrices
Lesson 2.2 - One step in a Markov chain: sample(2, 1, prob = transition_matrix[state, ]) lets you randomly step.
Lesson 2.3 - Accumulating steps in a chain: Use the purrr accumulate function to add up states into a chain.
Lesson 2.4 - Example: Markov chain of words
Later lessons cover using rexp to simulate exponential waiting times, taking exp_greater_5 <- exp_sample[exp_sample >= 5] and then examining the properties and distribution of exp_greater_5, plus sum(rexp(3, 5)), cumsum, and runif.
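To make the Lesson 1.1 skills concrete, here is a minimal sketch (my own illustration, not an exercise solution from the spec):
# Simulate a simple random walk: each step is -1 or +1 with equal probability
steps <- sample(c(-1, 1), 100, replace = TRUE)
position <- cumsum(steps)   # cumulative winnings over time
plot(position, type = "l")  # looks a bit like a stock price graph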
Course Description
Whether it's prices in the stock market, the number of visitors to a website, or the population of rabbits in a forest, many phenomena that we'd like to model with statistics involve numbers tracked over time. In this course, you'll be introduced to the field of stochastic processes, an area of probability studying systems that change over time. You'll learn about common statistical models such as random walks, Poisson processes, and Markov chains, and be introduced to the exponential and gamma distributions. These provide the fundamentals for many statistical methods common in finance, biology, and many other fields.
Learning Objectives
Prerequisites
For courses in the middle of a track, some instructors find it difficult to know where to begin their course.
I propose updating step three of the course spec so that it has 3 parts.
3.1: Write the first exercise, so you know where the course begins
3.2: Write the last exercise, so you know where the course ends (as before)
3.3: Write some exercises that detail how you get from the start to the end (as before)
Even for courses which aren't in the middle of the track, I think that it can be useful to know the starting point. (It should help clarify prerequisites, and if instructors are throwing in material that is too tricky, we'll discover the problem early.)
In the directory courses/design, this requires:
- template.md to describe this sub-step
- exercises-first.md with examples, etc.
- updating exercises-capstone.md and exercises-examples.md to renumber them
@richierocks commented on Wed Sep 20 2017
I wonder if we need different specs for different course types. We don't have an official course taxonomy, but I think there are three different types:
Code courses: These start from a set of packages and teach the syntax. e.g., Data Manipulation with dplyr, Data Visualization with ggplot2
Technique courses: These start from a technique and teach how to use it, e.g., Introduction to Time Series Analysis or Intro to Statistics with R: Multiple Regression
Problem courses: These start with a business problem (or scientific problem) and teach what the problem is and how to solve it. The forthcoming business analytics and health analytics courses fall into this category. Arguably, so do the case study courses like Exploratory Data Analysis in R: Case Study.
It might be useful to have a different course spec for each type of course, since the questions the instructor asks will be slightly different. (At least, the order they ask the questions may change.)
How should the specs change by course type?
@ncarchedi commented on Wed Sep 27 2017
It's a good point. Related to my comment here. Though I'd be more inclined to have a single generic/flexible format for course specs that can be adapted to each use case vs. having three different formats.
@gvwilson commented on Mon Oct 02 2017
Address after we have feedback on the first round.
There's some stuff on Shutterstock in the content wiki:
https://github.com/datacamp/content-wiki/blob/gitbook/docs/archive/using-images-in-courses.md
Should also mention things like
Need to distinguish between images in videos and images in coding exercises.
I'm not sure if this was resolved in another PR. There is an extra # in the title ("Interactive Exercise Title") at
http://authoring.datacamp.com/courses/exercises/normal-exercise.html
Verify that slides authoring is up to date.
State that we have access to Thinknum's datasets in the "Where can I find datasets?" section of
http://authoring.datacamp.com/courses/design/brainstorming-datasets.html
Maybe link to their list of datasets:
Not quite sure what the workflow is. Probably just get the CL to email Justin Zhen, but we can document this once we have a plan.
From @ismayc on Slack:
Looking over outline feedback (in the old course specs system), I'm wondering if it makes sense to have instructors clearly differentiate the data sources they plan to use in the slides/videos and those they plan to use in the exercises in the new course specs. It seems that when instructors have a good sense for this the course development goes a lot faster/easier.
Would be helpful for instructors to see that they have a maximum of 600 words per slide deck (with preference for closer to 500).
When the gitbook is running, the links to tab-vs-bullet-exercises.md on the tab exercise and bullet exercise pages don't work. The extension needs to be .html for gitbook to find the files (and there are typos in the file name on both pages).
Also on the same pages the links to R examples of tab/bullet exercises actually go to the SQL examples.
Happy to send a PR if you give me push access to the repo.
@machow commented on Wed Dec 13 2017
When two hash marks are used to indicate a header, teach uses it as an exercise title. When ---- is used to indicate a level 2 header, teach thinks it is demarcating a new exercise.
What is the relationship between the teach exercise grammar and markdown supposed to be?
Replace two hash marks with --- underneath.
@machow commented on Wed Dec 13 2017
Wait, I think I understand. It looks like ## and ---* define the opening and closing of an exercise block now. Might be useful to clarify in the authoring book, since it says that ## is a header, but does not mention how to separate exercises.
http://authoring.datacamp.com/courses/exercises/
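For concreteness, the separation described in the reply below looked roughly like this in the old syntax (reconstructed from memory, so treat the exact header fields as approximate):
--- type:NormalExercise lang:r xp:100 key:abc12345
## First exercise title
...exercise body...

--- type:NormalExercise lang:r xp:100 key:def67890
## Second exercise title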
@gor181 commented on Thu Dec 14 2017
Exactly: each exercise starts with --- type: (old syntax) or --- followed by a title (new syntax).
Moving this task there, thanks.
Feel free to close if this is not an issue :). A reader who copies the R TabExercise example into Teach will see the following. They would need to install dplyr in their requirements.sh to get the exercise to run. It might be helpful to either
option (2) is probably part of a bigger discussion with @rv2e (maybe as a content-tech-request?)
I got a good quote from instructor Melinda Higgins.
When designing introductory courses, if you are bored to tears with the material, you've probably pitched it at the right level. If you find it interesting, it's too hard.
This feels like it belongs in the documentation somewhere. (Course design step 3?)
@rasmusab commented on Thu Sep 28 2017
What I need to know as a CL is who the Instructor aims to make the course for, so that I, the instructor, and then the CD are all aligned on this. I feel that the learner profiles do not help me with this.
If I learn that
This course will give Jasmine a basic understanding of the Unix shell so that she can help her students solve the problems they encounter using the university's systems in their statistics courses.
I still know nothing about what the course instructor assumes about the Student, and even if I cross-reference Jasmine's bio, it doesn't tell me much about what tools/techniques the course instructor assumes she has experience with. Also, all the extra info, like that she did an "MBA at Georgia State" or that she "is partially deaf", feels distracting, as I don't know how to use it.
What I would like to see is a list, written in the course instructor's own words, of what background the course instructor assumes for the course. Parts of it could look like this:
This is useful because now I can point to this list and say "1 and 2 seem reasonable, but I don't think we should assume they know about POSIX standards."
@rasmusab commented on Mon Oct 02 2017
What I would like to see is something more akin to what I wrote in the course spec for my "foundations of Bayesian statistics in R" course.
@richierocks commented on Mon Oct 02 2017
Related: I think that 3 learner profiles might be too much for many courses. It's just difficult to write for multiple audiences.
Thinking about some upcoming courses:
I suppose that top-of-the-funnel intro courses will have more target audiences, but I think mostly we should be restricting the instructors to thinking about 1 or 2 student archetypes.
@rasmusab commented on Mon Oct 02 2017
But as they are currently written, no single learner profile (or even a tuple of them) matches my course. The "space" of our Students is so high dimensional (python/R, mac/win, probability/no-clue, regression/no-clue, etc.) that I don't see how a couple of student instances could possibly cover that space.
Question: Are the current profiles based on data?
@rasmusab commented on Mon Oct 02 2017
Business analytics courses all need two target students: someone with business experience who has been mostly using Excel and wants to know how to do things in R, and someone with some R experience who wants to understand what business problems are being solved.
So this seems like a pretty specific student profile.
Question: If none of the currently available student profiles is suitable for a course, can we make up new ones that match the type of audience we have in mind for that course?
David Mertz (Anaconda) reported the following:
The documentation at http://authoring.datacamp.com/courses/exercises/ does not match the Teach Editor well. I don't have much insight into what the underlying infrastructure actually does, or which is "less wrong".
Throughout the two, the Teach Editor inserts headers similar to *** =pre_exercise_code while the document describes headers like @pre_exercise_code. Nick has stated that these two are equivalent and that the headers have the same spelling in actual words (just not the surrounding DSL markup).
Within the Teach Editor, I see the following exercise types. Highlighted in bold are those that exist in one place but not the other.
The documentation describes:
There are a variety of places where using the template and/or the documentation does not produce an exercise that behaves as intended, but these are contained in GH issues in the courses-anaconda-ecosystem-1 repo, so I will not duplicate them here.
@vincentvankrunkelsven commented on Fri Jun 16 2017
LaTeX formulas in markdown should be escaped. E.g., _ must be escaped:
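A minimal illustration of the kind of escaping meant here (my own example; exact behavior depends on the Markdown renderer):
Unescaped, Markdown may treat the underscores as emphasis markers: $y_i = \beta_0 + \beta_1 x_i$
Escaped, the LaTeX survives the Markdown pass: $y\_i = \beta\_0 + \beta\_1 x\_i$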
@LoreDirick commented on Thu Sep 28 2017
What seems a bit weird to me in the listed outline is that it is unclear what is covered in one video versus another. What I like about the current outline structure is exactly that: I tell my instructors that "main" bullets are a video, and "sub" bullets add extra details to the video and to the exercises that follow, if the instructor already has those details. Also, as a CL, I can't tell from this outline how many videos will be in each chapter, or whether the instructor is trying to put too much in one video.
@yashasroy commented on Fri Sep 29 2017
I agree. I think this stems from the fact that these are example course specs for a course that does not contain videos. I think the idea going forward is that it will be down to the CD and the instructor to figure out how many videos will be in each chapter (and what to cover in them), with the formative assessments being used for guidance.
This is related to https://github.com/datacamp/example-course-specs/issues/9 and https://github.com/datacamp/example-course-specs/issues/12.
@gvwilson commented on Mon Oct 02 2017
Please close if the revised wording makes this clearer.
Update the mini-manuals, once we have formalized our policy.
@LoreDirick commented on Thu Sep 28 2017
For some courses it might make sense to create some cornerstone exercises. But for other courses (e.g., case study courses, and also my credit risk modeling course) the entire workflow is more sequential, and exercises will change while you work through the course and data set. In my credit risk course, this step would have been very hard for me at this stage. I could have created exercises here, but I would have had to redo them, which is a time drain.
@gvwilson commented on Mon Oct 02 2017
Good point - revisit after feedback from early adopters?
@gvwilson commented on Tue Oct 31 2017
Would it make sense to combine steps 3 and 4 (summative and formative assessments) and just have instructors write a representative subset of exercises that we then put in order? cc @richierocks
Currently the documentation is only relevant for Python MCQs:
test_mc(n, [msg1, msg2])
Add examples for R and Shell:
R:
test_mc(n, c(msg1, msg2))
Shell:
Ex() >> test_mc(n, [msg1, msg2])
First reported here
This seems pretty inconsistent with how we have instructors actually build exercises using lots of ___ throughout.
In this exercise, you will inspect the contents of a text file (data.txt) contained in the data/ directory and call the appropriate base R function to read it into a data frame.
type: NormalExercise
xp: 100
key: asdfa83435
@sample_code
@@data/data.txt
x y z
1 5 9
2 6 10
3 7 11
4 8 12
@@script.R
@solution
@@data/data.txt
x y z
1 5 9
2 6 10
3 7 11
4 8 12
@@script.R
read.table("data/data.txt", header = TRUE)
Add a link to https://help.github.com/articles/creating-an-issue (and maybe others on tagging, commenting, and closing issues?) to http://authoring.datacamp.com/courses/design/technical-help-resources.html
The screencast which existed on https://www.datacamp.com/teach/documentation#tab_creating_slides was very helpful.
Every project should include:
in its root directory (with appropriate edits to author names and project URLs).
The <title> in the pages generated by GitBook says "Welcome GitBook"; this should say something about DataCamp instead.
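If this is the legacy GitBook toolchain, the title can usually be set in book.json (a sketch; field support depends on the GitBook version in use):
{
  "title": "DataCamp Course Authoring"
}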
I've had multiple instructors ask how to include code in their slides. It would also be helpful to show how {{1}} can be added at the end of code blocks.
Are there any plans on adding the production of community tutorials and/or blogs to this webpage? Or does this fall outside the scope?
It sucks to keep up to date, and is already hugely outdated. Literally from the first version we deployed EOY 2015.
Move from content wiki to here, so instructors can see it
Currently in the internal content wiki, but we should make this instructor-facing.
https://github.com/datacamp/content-wiki/blob/gitbook/docs/archive/course-naming-rules.md
Marketing have advice on this too, so they ought to review anything we write here.
Try to limit to 1 page of documentation per step.
Possible format:
@rasmusab commented on Thu Sep 28 2017
While I really dig that the course instructor put some example exercises in the course outline, what happened to me when I started writing the formative assessments was that I started fully working on the course.
Like, 2-3 exercises per chapter, that's a lot of exercises to create, and to be able to create 3 exercises for each chapter you already need to have a pretty good idea of what the rest of the course is going to be, and to know what exercises and videos go before and after. Also, in many cases a single exercise (background info, description, etc.) can be much more extensive than the example exercises given here.
I'm not saying it's bad that the course instructors create exercises, I'm just saying that we're asking them to create a substantial part of their course already as part of the course outline. One way of mitigating this would be to ask them to write down 2-3 exercise stubs per chapter, where a stub is more of a high level description of the task.
@yashasroy commented on Fri Sep 29 2017
It's worth considering whether asking instructors to write 2-3 formative assessments per chapter is more work than the current system, where they have to create an exercise-by-exercise index (including videos) before they can actually start writing exercises. Often the course structure changes after the index is created anyway, making the index useless after it is approved. I think these course specs would be useful even after course launch, as a way for someone trying to maintain the course 6 months post-launch to quickly understand the original vision (in a way the current system's index doesn't quite capture, though you could argue the current outline does).
I also don't think it is a substantial part of the course. There is still a lot of work remaining in identifying how videos will fit into the course narrative and how to order the exercises, not to mention writing the remaining ~6 exercises per chapter (assuming 12-exercise chapters) and the slides/scripts.
Also, in many cases a single exercise (background info, description, etc.) can be much more extensive than the example exercises given here.
Maybe assignment text / background info should not be part of these formative assessments. Instead, the focus should be on the code and the instructions.
@rasmusab commented on Mon Oct 02 2017
For example, here is one of the exercises I created as part of the course specs. This clearly went overboard, but it could go overboard for other course instructors as well. The point is that making 2-3 exercises like this per chapter in practice means you've left the speccing stage and actually started working on the course:
Use rbeta to fit a beta-binomial model and interpret the result.
A Bayesian model which is quick and easy to fit in R is the binomial distribution. Recall that the assumptions of the binomial distribution are that the data is a count of successes (x) out of a number of trials (n), and that there is an underlying proportion of successes (p). To turn the binomial distribution into a fully Bayesian model, all we need to do is specify a prior distribution over p, and there are many different ways you can do this. However, if we limit ourselves to using a prior that is a Beta distribution, then it turns out there is a simple recipe that allows us to produce samples from the posterior distribution of p:
If the prior over p is Beta distributed with shape parameters prior_shape1 and prior_shape2, and the data we have are x successes and n - x failures, then the posterior distribution of p is also a Beta distribution, with shape parameters posterior_shape1 = prior_shape1 + x and posterior_shape2 = prior_shape2 + n - x. To produce samples from the resulting posterior you can then use the rbeta function, which takes the sample size as the first argument and the shape parameters as the second and third arguments.
# The prior
prior_shape1 <- 1
prior_shape2 <- 1
prior_p <- rbeta(100000, prior_shape1, prior_shape2)
The code to the right defines a Beta(1, 1) distribution and samples from this distribution (prior_p). Start by visualizing this prior with a histogram, plotting prior_p using the hist function.
hist(prior_p)
Right! A Beta(1, 1) is the same as a uniform distribution between 0 and 1, a reasonable prior when you have little information regarding the underlying proportion of success.
Now we have some data we want to update this prior with. Say we run a website and we just put up a banner advertising our latest product. Out of the first 100 webpage visitors, 32 click on the banner. Assuming the Beta(1, 1) prior, what is the likely underlying proportion of visitors clicking on the banner? Produce a sample from this posterior distribution and visualize it using the hist function.
clicks <- 32
visitors <- 100
posterior_p <- rbeta(10000, prior_shape1 + clicks, prior_shape2 + visitors - clicks)
hist(posterior_p)
If you take a quick look at the distribution you just plotted, what is the likely underlying proportion of visitors clicking on the banner?
Using the sample from the posterior distribution, calculate the probability that the proportion of visitors clicking is more than 25%.
sum(posterior_p > 0.25) / length(posterior_p)
@gvwilson commented on Mon Oct 02 2017
Agreed that this is getting the instructor to work on the course - as discussed Friday, it isn't extra work, and it gives us early feedback on feasibility.
@rasmusab commented on Mon Oct 02 2017
Right, what I mean is: we don't want other course instructors to make the same mistake I did, which was to start writing overly extensive exercises before the course spec is finalised. My fear is that the course instructor will put a lot of work into exercise specifics for many exercises that might then need to be changed, because the course instructor was still in the planning stage of the course.
That's why I think it could be good to have some wording that directs course instructors more towards writing shorter "exercise stubs" like you have in your example.
Lynne Williams: this was hard to answer early, would probably have been easier later on.
Course design process description and template in #6 should refer to profiles created by Marketing - switch to them once they're published.
@richierocks commented on Fri Nov 10 2017
In order for the CLs to write the requirements.sh/requirements.R file before handing off to the CD, it would be useful if the final step in the README was to produce a definitive list of the packages or other software required for the course.
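As a sketch of what that final step could yield for an R course (hypothetical package names, purely for illustration), the definitive list might translate directly into requirements.R:
# Hypothetical requirements.R distilled from the course spec's package list
install.packages(c("dplyr", "ggplot2", "purrr"))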
We should add a simplified human-readable explanation of the IP portion of our contracts to http://authoring.datacamp.com, modeled on https://creativecommons.org/licenses/by-sa/4.0/.
@ismayc commented on Mon Sep 25 2017
@ncarchedi commented on Wed Sep 27 2017
I don't expect this doc to take much longer on average than the overview/outline/index combo takes instructors now. Seems like we should be getting people through the course spec'ing phase within 4 weeks of signing the contract. Some will do it much faster, especially repeat instructors.
@gvwilson commented on Mon Oct 02 2017
Revisit after we have feedback from early adopters.
We should have information for instructors, especially open-course instructors, on writing requirements files.
For context, here is a note from one open-course instructor:
"Hi! I'm trying to make a course with Nipype and I'm curious what is the best way to make user defined environment variables that python will recognize. Nipype is essentially a wrapper for neuroimaging tools that are installed/accessible via the commandline. With a Dockerfile I can use the ENV
keyword, but since I'm restricted to having my changes in requirements.sh
, I'm wondering what's the best method to get the docker container to recognize environment variables."
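One possible workaround, sketched under the assumption that requirements.sh runs at image-build time and that the session shell sources /etc/profile.d/ at start-up (neither is confirmed in this thread):
# Inside requirements.sh (hypothetical sketch): an exported variable would not
# survive past the build step, so persist it where later shells can source it.
echo 'export NIPYPE_DATA_DIR=/home/repl/data' >> /etc/profile.d/course_env.sh
If the container doesn't source that file, setting the variable from Python via os.environ in the pre-exercise code may be a more reliable fallback.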
@rasmusab commented on Thu Sep 28 2017
Hey, these are my thoughts, so when I write should/must/has to, it means that in my humble opinion it should/must/has to.
The course specs could serve (at least) two purposes:
These are two different purposes, and while I think we should help with (2) in the sense that we give guidelines and suggestions, I don't think it should be the main purpose of the course specs. We are working with experienced teachers and instructors, and we should let them do their thing when it comes to how they plan/sketch/experiment/think about their course.
The course specs should be focussed on (1). Just because somebody passed an audition doesn't mean we automatically allow them to do any course. The course specs are for us to see if the course is something we actually want, that it's at the right level for our students, that it doesn't overlap too much/too little with other courses, etc.
For example, if an instructor likes to do concept maps, then great! But whether we should require all instructors to put concept maps in their course specs should depend on whether it helps CLs and CDs see what kind of course the instructor is making.
Another example: If an instructor likes to use learner profiles, then great! But the reason to include them in the course spec would be if it makes it easier for me as a CL to gauge whether the course is at a reasonable skill/knowledge level for our Students.
@yashasroy commented on Fri Sep 29 2017
We are working with experienced teachers and instructors, and we should let them do their thing when it comes to how they plan/sketch/experiment/think about their course.
In my experience, super experienced teachers and instructors can still struggle with creating interactive online content. I agree with you in principle that we shouldn't be too rigid, but having a process in place (that works! :) ) ensures course development is streamlined.
But I think your point also speaks to @ncarchedi's comment in https://github.com/datacamp/example-course-specs/issues/9 about how creating slides/scripts before exercises may be a better way for instructors to approach their courses.
@gvwilson commented on Mon Oct 02 2017
Please close if these points were addressed in the latest rewrite.
Show clearly that the order in which things are developed is not the order in which they are presented.
http://authoring.datacamp.com/courses/ still shows screenshots of the old Teach editor, which has since been entirely revamped.