
stat_rethinking_2022's Introduction

Statistical Rethinking (2022 Edition)

Instructor: Richard McElreath

Lectures: Pre-recorded and uploaded to a <Playlist>, two per week

Discussion: Online, Fridays 3pm-4pm Central European Time

Purpose

This course teaches data analysis, but it focuses on scientific models first. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with the high-dimensional, imperfect data that biologists and social scientists face.

Format

Online, flipped instruction. The lectures are pre-recorded. We'll meet online once a week for an hour to work through the solutions to the assigned problems.

We'll use the 2nd edition of my book, <Statistical Rethinking>. I'll provide a PDF of the book to enrolled students.

Registration: Please sign up via <[COURSE IS FULL SORRY]>. I've also set aside 100 audit tickets at the same link, for people who want to participate, but who don't need graded work and course credit.

Calendar & Topical Outline

There are 10 weeks of instruction. Links to lecture recordings will appear in this table. Weekly problem sets are assigned on Fridays and due the next Friday, when we discuss the solutions in the weekly online meeting.

Lecture playlist on Youtube: <Statistical Rethinking 2022>

| Week | Meeting date | Reading | Lectures |
| ---- | ------------ | ------- | -------- |
| Week 01 | 07 January | Chapters 1, 2 and 3 | [1] <The Golem of Prague> <(Slides)><br>[2] <Bayesian Inference> <(Slides)> |
| Week 02 | 14 January | Chapters 4 and 5 | [3] <Basic Regression> <(Slides)><br>[4] <Categories & Curves> <(Slides)> |
| Week 03 | 21 January | Chapters 5 and 6 | [5] <Elemental Confounds> <(Slides)><br>[6] <Good & Bad Controls> <(Slides)> |
| Week 04 | 28 January | Chapters 7, 8 and 9 | [7] <Overfitting> <(Slides)><br>[8] <Markov chain Monte Carlo> <(Slides)> |
| Week 05 | 04 February | Chapters 10 and 11 | [9] <Logistic and Binomial GLMs> <(Slides)><br>[10] <Sensitivity and Poisson GLMs> <(Slides)> |
| Week 06 | 11 February | Chapters 12 and 13 | [11] <Ordered Categories> <(Slides)><br>[12] <Multilevel Models> <(Slides)> |
| Week 07 | 18 February | Chapters 13 and 14 | [13] <Multi-Multilevel Models> <(Slides)><br>[14] <Correlated varying effects> <(Slides)> |
| Week 08 | 25 February | Chapter 14 | [15] <Social Networks> <(Slides)><br>[16] <Gaussian Processes> <(Slides)> |
| Week 09 | 04 March | Chapter 15 | [17] <Measurement Error> <(Slides)><br>[18] <Missing Data> <(Slides)> |
| Week 10 | 11 March | Chapters 16 and 17 | [19] <Beyond GLMs> <(Slides)><br>[20] <Horoscopes> <(Slides)> |

Coding

This course involves a lot of scripting. Students can engage with the material using either the original R code examples or one of several conversions to other computing environments. The conversions are not always exact, but they are fairly complete. Each option is listed below. I also list conversions <here>.

Original R Flavor

If you want to use the original R code examples from the print book, you need to install the rethinking R package. The code is all on GitHub at https://github.com/rmcelreath/rethinking/ and there are additional details about the package there, including information about using the more up-to-date cmdstanr instead of rstan as the underlying MCMC engine.

R + Tidyverse + ggplot2 + brms

The <Tidyverse/brms> conversion is very high quality and complete through Chapter 14.

Python: PyMC3 and NumPyro and more

The <Python/PyMC3> conversion is quite complete. There are also at least two NumPyro conversions: <NumPyro1> <NumPyro2>. And there is a <TensorFlow Probability> conversion.

Julia and Turing

The <Julia/Turing> conversion is not as complete, but is growing fast and presents the Rethinking examples in multiple Julia engines, including the great <TuringLang>.

Other

There are several other conversions. See the full list at https://xcelab.net/rm/statistical-rethinking/.

Homework and solutions

I will also post problem sets and solutions. Check the folders at the top of the repository.

stat_rethinking_2022's People

Contributors

rmcelreath


stat_rethinking_2022's Issues

Question about contrasts and correlation

Hi, thanks for being willing to help with my question from:
https://twitter.com/rg9119/status/1487179286532349952

Here is an attempt at Python code to generate a synthetic dataset of (height, sex) entries from correlated parameters (my impression is you know Python, so I hope that's ok!). It is based on p. 155 of the book (though partly inspired by Lecture 4 of this term's course).

I'm trying to understand the word of warning from:
https://www.youtube.com/watch?v=QiHKdvAbYII&t=1752s

as regards computing contrasts (and plotting their density) when using categorical variables for sex.

I hope my confusion hasn't overcomplicated my question.
abc.txt
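For what it's worth, the gist of that warning is that a contrast must be computed sample-by-sample from the posterior, not by differencing two marginal summaries. A minimal Python sketch with made-up normal "posterior samples" (the means and spreads are invented, not estimates from the book's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples for mean height by sex (invented numbers)
h_F = rng.normal(150, 2, size=10_000)  # posterior for female mean height (cm)
h_M = rng.normal(160, 2, size=10_000)  # posterior for male mean height (cm)

# Compute the contrast for each posterior sample, THEN summarize:
contrast = h_M - h_F
print(contrast.mean())                        # near 10
print(np.percentile(contrast, [5.5, 94.5]))   # 89% interval of the contrast
```

Summarizing each posterior first and then differencing the summaries throws away the spread of the contrast, which is exactly what the density plot is meant to show.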

External registration open?

D'oh! APOLOGIES! I now see the tweet I saw was for 2022 not 2023. Missed my chance. I'm blaming COVID.

<< Ignore the following -- and feel free to delete this "issue" if someone knows how >>

Sorry if I simply missed a memo but I'm hoping to register for the course. Opening it to external participants was mentioned in a tweet a couple of weeks ago, but I've not seen any follow-up.

Does anyone know if this is a possibility? And, if so, is there a link to the registration process?

Thanks very much for any leads!

Unable to replicate a plot in Lecture 09

I was unable to replicate the plot of the hypothetical effect of manipulating the perception of applicants' gender. Using the code provided in the repository, I got something like this:
[image]
instead of the one shown in the lecture:
[image]

Slide 34 and 35 in Lecture 18 (Missing data) are cyclic graphs

I think it was an oversight, but on slides 34 and 35 in Lecture 18, there is an edge $G^\ast\rightarrow M_g$. This has two problems:

  1. If the missingness mechanism is that the species $G$ is less studied, then the arrow should be $G\rightarrow M_g$ rather than $G^\ast\rightarrow M_g$.
  2. The edges $G^\ast\rightarrow M_g$ and $M_g\rightarrow G^\ast$ together form a cycle, which breaks everything.

Hopefully, you can correct the slides and the video without having to change everything. I don't have access to the 2nd edition of the textbook, but I wonder if there is also an error in there.

Phoenix Wright?!

I am taking the course right now and I have just come across the Phoenix Wright meme on colliders.
Thanks for bringing back good old memories! Loving the course so far!

This event has sold out online

Hi Richard McElreath,
I am a recent undergrad with an interest in applied statistics. I would like to learn more, but the website says "This event has sold out online". Could you please let me attend your lectures?

More dramatic gain from partial pooling

First of all, thanks for the book and the video course. The motivation behind multilevel models is clear: partial pooling is an "adaptive compromise" between no pooling and complete pooling. In the video for Lecture 12 (https://speakerdeck.com/rmcelreath/statistical-rethinking-2022-lecture-12?slide=40) you show the "gain" from partial pooling using cross-validation. But the cross-validation score of partial pooling is very similar to that of complete pooling.

  1. For this particular example, are there any other (more convincing?) arguments (other than cross-validation) to use partial pooling against complete pooling?
  2. Is it possible to create a "simple" example in which we observe a more "dramatic" U-shaped cross-validation line?

Thanks in advance.
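On both questions, one hedged way to see the gain is simulation: with known variances, the partial-pooling estimate is just each cluster mean shrunk toward the grand mean. The Python sketch below uses invented clusters (not one of the book's datasets); making sigma larger or n_obs smaller makes the advantage more dramatic, which speaks to question 2:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented setup: 60 clusters, true means drawn from N(0, tau),
# n_obs noisy observations per cluster with sd sigma.
n_clusters, n_obs, tau, sigma = 60, 5, 1.0, 2.0
true_means = rng.normal(0.0, tau, n_clusters)
data = true_means[:, None] + rng.normal(0.0, sigma, (n_clusters, n_obs))

no_pool = data.mean(axis=1)                       # each cluster on its own
complete_pool = np.full(n_clusters, data.mean())  # one grand mean for all

# Partial pooling with known variances: shrink each cluster mean toward
# the grand mean; the weight comes from the two variance components.
shrink = tau**2 / (tau**2 + sigma**2 / n_obs)
partial_pool = data.mean() + shrink * (no_pool - data.mean())

for name, est in [("no pool", no_pool),
                  ("complete pool", complete_pool),
                  ("partial pool", partial_pool)]:
    print(name, round(((est - true_means) ** 2).mean(), 3))
```

With these numbers partial pooling should roughly halve the mean squared error of no pooling, while complete pooling typically does worst because it ignores real variation across clusters.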

error still in Lecture 9 marginalized effect example?

Hi, I think you said that the error from lecture 9 is fixed, but I'm wondering if an error is still there on slides 75 and 76.

For example on slide 75, there's this code where link refers to model m2:

# simulate as if all apps from women
p_G1 <- link(m2,data=list(
    D=rep(1:6,times=apps_per_dept),
    N=rep(1,total_apps),
    G=rep(1,total_apps)))

But in the lecture code, the model now refers to mGD.

# simulate as if all apps from women
p_G1 <- link(mGD,data=list(
    D=rep(1:6,times=apps_per_dept),
    N=rep(1,total_apps),
    G=rep(1,total_apps)))

Add Chapter 5 to week 2?

Given the discussion of DAGs and categorical variables in the homework, would it make sense to add Chapter 5 to this week's reading on the README.md schedule?

Question about interpretation of the individual intercepts in m11.4

In model m11.4, each monkey has its own intercept but a common treatment effect. I am not sure how to interpret the individual intercept when the treatment variable uses index coding. Does the intercept indicate logit(p of pulling left) for an individual monkey when there is no treatment, and does this make sense when the smallest coding value of treatment is 1?

Sorry if the answer is obvious, but I haven't been able to wrap my head around this.

Thank you.
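One hedged way to think about it: under index coding there is no treatment value of 0, so the intercept alone never corresponds to an observable condition. It is the actor's baseline on the log-odds scale, and it only becomes a probability in combination with some treatment effect. A small Python sketch with invented parameter values (NOT the actual m11.4 estimates):

```python
import numpy as np

def inv_logit(x):
    # maps log-odds to probability
    return 1.0 / (1.0 + np.exp(-x))

# Invented posterior means, purely illustrative (not the m11.4 estimates):
a_actor = 0.5                                    # one actor's intercept (log-odds)
b_treatment = np.array([-0.1, 0.2, -0.3, 0.4])   # four treatment effects

# inv_logit(a_actor) alone would be ~0.62, but that condition is never
# observed: every prediction combines the intercept with a treatment effect.
p_left = inv_logit(a_actor + b_treatment)
print(p_left.round(3))
```

So the intercept is best read as a contribution to the linear predictor, not as the probability under a hypothetical "treatment 0".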

No residual plot code for Chapter 5 (Figure 5-4)

Hi professor @rmcelreath , thanks for the amazing book! Loving it!

I would like to know if it is possible to share some code for the residual plots of Chapter 5, more specifically Figure 5-4.

I am trying to replicate it, but finding some difficulties. I will keep trying anyway.

All the best,
Edu

wrong model in lecture 9 code?

I'm trying to reproduce what you did in lecture 9. Getting stuck at the marginal/counterfactual example.
Your (updated) code is this:

# simulate as if all apps from women

p_G1 <- link(m2,data=list(
D=rep(1:6,times=apps_per_dept),
N=rep(1,total_apps),
G=rep(1,total_apps)))

But m2 is the model from the simulated data, and if I understand correctly, here we are trying to mimic the real data. So I think it should be mGD. And indeed, in your script you have this:

OLD WRONG CODE!

#p_G1 <- link( mGD , data=list(N=dat$N,D=dat$D,G=rep(1,12)) )

I think mGD is the right model. Unfortunately, when I use mGD with the updated code above, my result figure doesn't look like yours. Instead of the main peak at around 0.1, I get the main peak at 0 and a minor one at 0.2. I'm not sure what's going on and am trying to figure things out. Any pointers appreciated.
Thanks!

which book to buy?

Bayesian Updating and Slide 2

Hi Richard, thanks so much for the book and the course. Though I wasn't able to register for your course, I've been following along with the videos. I have a question and I hope it is ok to ask it here...

In Lecture 2, you explain the grid method and have a slide about Bayesian updating. Your rules:

  1. State a causal model for how the observations arise, given each possible explanation.
  2. Count the ways the data could arise for each explanation.
  3. Relative plausibility is the relative value from (2).

Your example uses tossing the globe and seeing whether your finger lands on water or land, so (2) above is easy to compute using the binomial distribution. But how do you calculate (2) when the problem gets slightly more complicated (e.g., judging how plausible each candidate explanation is when fitting a line to a set of real-valued data points)?

Thanks in advance.
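One hedged answer: for continuous data, "counting the ways" in step (2) generalizes to evaluating a likelihood, so each candidate explanation assigns a density to the observed data instead of a count. A grid-approximation sketch in Python, fitting the slope of a line through the origin with a Gaussian likelihood (the data and the fixed noise sd are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented real-valued data from a line y = 2x + Gaussian noise (sd 0.5)
x = np.linspace(0, 1, 20)
y = 2.0 * x + rng.normal(0.0, 0.5, x.size)

# Grid over the slope. "Counting the ways" becomes "evaluating the
# likelihood": the Gaussian log-density plays the role that binomial
# counts play in the globe-tossing example.
slope_grid = np.linspace(0, 4, 201)
log_lik = np.array([np.sum(-0.5 * ((y - b * x) / 0.5) ** 2)
                    for b in slope_grid])
posterior = np.exp(log_lik - log_lik.max())   # flat prior over the grid
posterior /= posterior.sum()

print(slope_grid[posterior.argmax()])         # should be near the true slope of 2
```

The same recipe extends to an intercept and a noise parameter by gridding over them jointly, which is exactly where the grid method starts to become impractical and MCMC takes over.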

lppd CV equation (text p218)

Thank you for your great book, slides and YouTube lectures. I am working my way through your book (2nd edition).
The lppd CV equation on page 218 of the book and on the Lecture 7 slides looks inconsistent with the lppd equation on page 210 and the lppd IS equation on page 218. I think the "log" should be placed before "1/S". Am I wrong?
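The placement does matter: lppd averages the pointwise likelihood over posterior samples first and takes the log of that average. A numpy sketch with an arbitrary S-by-N matrix standing in for the pointwise likelihood values p(y_i | theta_s) (invented numbers, not from any model in the book):

```python
import numpy as np

rng = np.random.default_rng(3)

# S posterior samples (rows) by N observations (columns); the entries
# stand in for pointwise likelihood values p(y_i | theta_s).
S, N = 1000, 10
lik = rng.uniform(0.1, 1.0, (S, N))

# lppd = sum_i log( (1/S) * sum_s p(y_i | theta_s) ): log OUTSIDE the average
lppd = np.log(lik.mean(axis=0)).sum()

# Putting the log inside the average gives a systematically smaller
# number, by Jensen's inequality -- a different quantity altogether.
log_inside = np.log(lik).mean(axis=0).sum()
print(lppd, log_inside)
```

In real code one works with log-likelihoods and a log-sum-exp for numerical stability, but the ordering of log and average is the same.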

Question about multiplication in Bayesian Inference

Hi Richard @rmcelreath ,

Thanks for this great course! I have been reading chapter two of the book and I can't see why, for the marble example, successive multiplications produce the same result as a single number-of-ways calculation. In the example we are calculating the number of ways to draw, with replacement, a blue marble followed by a white and then another blue (BWB), given there are four marbles in the bag. I can't seem to get the same result when I do the calculation in three stages like this:

Posterior = (Number of ways / Total number of ways) * Prior

Assuming Prior is 1 at the beginning.

| p | Blue | White | Blue | Product |
| --- | --- | --- | --- | --- |
| 0.25 | 0.17 (1) | 0.50 (3) | 0.17 (1) | 0.0145 |
| 0.50 | 0.33 (2) | 0.33 (2) | 0.33 (2) | 0.0360 |
| 0.75 | 0.50 (3) | 0.17 (1) | 0.50 (3) | 0.0425 |

Calculate in one go:

| p | BWB |
| --- | --- |
| 0.25 | 0.15 (3) |
| 0.50 | 0.40 (8) |
| 0.75 | 0.45 (9) |

We can see that for p = 0.75, for example, doing successive draw-and-multiply steps gives a posterior probability of 0.0425, but doing it in one go in the second table gives 9 out of 20 ways = 0.45, which doesn't match 0.0425. Is the normalization (denominator) assumed to be 1 at every stage?

For the proportion-of-water globe example, I also tried to check whether calculating the probabilities one observation at a time gives the same product as a single calculation. If we had the sequence WWWLL:

| E | Likelihood | Posterior | How Posterior is calculated |
| --- | --- | --- | --- |
| W | $p$ | $2p$ | $(p \cdot 1)/\int_0^1 p \cdot 1 \,dp$ |
| WW | $p$ | $3p^2$ | $(p \cdot 2p)/\int_0^1 p \cdot 2p \,dp$ |
| WWW | $p$ | $4p^3$ | $(p \cdot 3p^2)/\int_0^1 p \cdot 3p^2 \,dp$ |
| WWWL | $1-p$ | $20p^3 - 20p^4$ | $((1-p) \cdot 4p^3)/\int_0^1 (1-p) \cdot 4p^3 \,dp$ |
| WWWLL | $1-p$ | $60p^3 - 120p^4 + 60p^5$ | $((1-p)(20p^3 - 20p^4))/\int_0^1 (1-p)(20p^3 - 20p^4) \,dp$ |

This is different from using the binomial distribution formula directly, which gives:

$\frac{5!}{2!\,3!} p^3 (1-p)^2 = 10p^3 - 20p^4 + 10p^5$

Same question in stack exchange: https://math.stackexchange.com/questions/4503794/bayesian-inference-multiplication
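The two calculations do agree once each step multiplies the prior by only that single observation's likelihood: the binomial coefficient (10 for WWWLL, or the combinatorial factor in the marble counts) multiplies every candidate equally, so it cancels when you normalize. A numpy sketch on the three-candidate grid from the marble example:

```python
import numpy as np

p = np.array([0.25, 0.50, 0.75])    # candidate proportions (marble example grid)
prior = np.ones_like(p)

# Sequential updating: multiply in one observation's likelihood at a
# time, renormalizing after each step.
posterior = prior.copy()
for obs in "WWWLL":                 # W = water, L = land
    posterior *= p if obs == "W" else (1 - p)
    posterior /= posterior.sum()

# One-shot updating: the binomial likelihood for 3 W in 5 tosses. The
# coefficient 10 = 5!/(3!2!) multiplies every candidate equally, so it
# cancels in the normalization and the two posteriors agree.
one_shot = 10 * p**3 * (1 - p)**2
one_shot /= one_shot.sum()
print(np.allclose(posterior, one_shot))  # True
```

The mismatch in the tables above comes from normalizing against "total number of ways" mid-stream while also carrying a prior: the per-stage denominators differ between the two routes, but they always cancel once the final vector is renormalized.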

Posterior probability does not integrate to 1

The last line of slide 61 (https://speakerdeck.com/rmcelreath/statistical-rethinking-2022-lecture-02?slide=61) and R code 3.2 in the book (and R code 2.3) use a standardization rule different from the one used for the prior probability.

As explained by the Overthinking box at page 35 of the book, prior is an array of ones, since the important property is that it integrates to one over p_grid. The sum of the values of prior is indeed much greater than 1 (20 in code 2.3, 1000 in code 3.2).

The standardization used for posterior instead guarantees that sum(posterior) == 1, while the integral over p_grid is less than one.

This is not relevant for the shape of the posterior curve, but the asymmetry bothers me. I believe the right statement to use in 3.2 is

posterior <- (posterior / sum(posterior))*length(posterior)

then sum(posterior) == sum(prior) and both their integrals over p_grid should be 1.
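The distinction between summing to 1 and integrating to 1 is easy to check numerically. A short numpy sketch using the globe-tossing grid (6 water in 9 tosses), showing that the sum-standardized vector must be rescaled by the number of grid points (equivalently, divided by the grid spacing) before its integral is approximately 1:

```python
import numpy as np

# Grid approximation of the globe-tossing posterior (6 water in 9 tosses)
n = 1000
p_grid = np.linspace(0, 1, n)
prior = np.ones(n)            # as a density it integrates to ~1, but it sums to n
likelihood = p_grid**6 * (1 - p_grid)**3
dx = p_grid[1] - p_grid[0]

# Standardizing by the sum makes the vector sum to 1, but its Riemann
# integral over p_grid is then only ~dx:
posterior = likelihood * prior
posterior /= posterior.sum()
print(posterior.sum())        # ~1.0 (sum, not integral)

# Rescaling by the number of grid points (equivalently, dividing by dx)
# recovers a proper density whose integral is ~1, matching the prior:
density = posterior * n
print((density * dx).sum())   # ~1.0 (now as an integral)
```

In practice the sum-standardized vector is what you sample from, so the rescaling only matters when you want to read the values as density heights.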
