
ds4humans's People

Contributors

joshclinton, nickeubank

ds4humans's Issues

Exercise: Descriptive / Prescriptive

  • Give them some prescriptive problems. Ask them to come up with descriptive questions they might want to answer to help.
  • Differentiate descriptive and prescriptive questions.
  • Ask whether data scientists have "privileged authority" to speak to either class of question.

If you wanna see where I'm at...

It occurs to me that because I'm mostly pushing my work directly to the main branch, you don't actually have a lot of visibility into what I'm up to (sorry!).

If you wanna see where I'm at, so far I've re-written all of these readings:

[image: list of re-written readings]

And I'm working on passive-prediction questions now.

I've taken a "fuck figuring out the ideal audience, just write it for the class I'm teaching now" approach, which means what's there assumes that the readers have taken a basic statistical modeling course and are taking an ML course concurrently. It's what I need right now, so I figure start there and then we can reshape/organize for the future.

But if you want to read through it, I'd be very curious what you think. It is finally starting to feel like this kinda vague framework I've been working to express for a while is coming together. It's definitely not "a social scientist does data science" book, but rather (I hope!) a real data science book that tries to put all perspectives on similar footing in terms of usefulness.

Kyle: soften motivation language a little in EDA rant?

Kyle:

In the EDA reading, you have one paragraph that I was reflecting on:

“This is problematic because any activity that involves data but lacks a clear motivation is doomed to be unending and unproductive. Data science has emerged precisely because our datasets are far too complex for us to understand directly; indeed, I would argue that the job of a data scientist can be summed up, in part, as a person who identifies meaningful patterns in our data and makes them comprehensible.”

My question on this is whether or not a clear motivation is necessary for a preliminary analysis/EDA? A data scientist might “explore” the data in the process of conducting an EDA (whether directed or undirected); won’t what matters be what they synthesize and then present to the stakeholder?

Me:

This is definitely the branch I'm exploring walking out on, and I recognize that in taking such a strong position, I'm sure I'm not quite right. But given our students are coming from the opposite extreme, I'm feeling out the more extreme position on the other side.

With that said, I am finding it pretty compelling. To some degree I think it depends on your definition of "clear motivation" and how narrow-precise an interpretation one takes. I don't expect people to dive into their data with most of their paper already written and a need for nothing but the values to plug into their tables (if they're the research-paper-writing sort); but I do think you need a clear sense of what matters in the sense of "what outcomes are problem-relevant? what independent variables might you have leverage to manipulate (if your goal is to be impactful)?"

Put differently, what you synthesize and present to the stakeholder is the conclusion or answer to a question, but the metric for whether something is substantively significant comes from the stakeholder's problem.

But your point is well taken and I'll keep thinking about it.

Kyle:

That perspective makes perfect sense. Erring on the side of overcorrection from many students’ current default of approaching an EDA like a treasure hunt without a map is reasonable. I agree it hinges around “how narrow-precise” the interpretation of what a “clear motivation” is. If there’s some wiggle room to make that a bit less narrow-precise, then the paradigm you present here feels very well-composed.

Let me know

If you wanna get on Zoom some time to talk about notebooks / how this book works, let me know.

Intro reading: Add rain/umbrella

Students don't quite grok the distinction between passive prediction and causal questions. Add the rain example? Umbrella use (passively) predicts rain (if we see one, we expect the other), but there's no causal relationship (if I were to manipulate umbrella use, it wouldn't cause rain / we wouldn't "predict" rain to start).
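The umbrella example could even be shown to students as a toy simulation (hypothetical data-generating process, invented probabilities, just for illustration): rain causes umbrella use, so umbrellas are a great passive predictor of rain, but forcing everyone to carry an umbrella changes nothing about the weather.

```python
import random

random.seed(42)

def simulate(intervene_umbrella=None, n=10_000):
    """Toy world: rain causes umbrella use; umbrella use never causes rain."""
    rainy = umbrella = rainy_and_umbrella = 0
    for _ in range(n):
        rain = random.random() < 0.3               # rain happens on its own
        if intervene_umbrella is None:
            umb = rain and random.random() < 0.9   # people react to rain
        else:
            umb = intervene_umbrella               # we force umbrella use
        rainy += rain
        umbrella += umb
        rainy_and_umbrella += rain and umb
    p_rain = rainy / n
    p_rain_given_umbrella = rainy_and_umbrella / umbrella if umbrella else 0.0
    return p_rain, p_rain_given_umbrella

# Observationally: P(rain | umbrella) is far above the base rate --
# umbrellas "predict" rain.
base_rate, rain_given_umbrella = simulate()

# Interventionally: forcing umbrella use leaves the rain rate unchanged.
rate_when_forced, _ = simulate(intervene_umbrella=True)
```

Seeing `rain_given_umbrella` near 1 while `rate_when_forced` stays at the base rate is exactly the passive-prediction vs. causal gap in two lines.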

Good correlation not causation example: blood pressure and surgery complications

Suppose high blood pressure predicts surgical complications. Maybe it's the high blood pressure itself; maybe people with three jobs who take public transit are just leading more stressful lives, and the stress drives both.

So blood pressure is a great predictor of complications; but would complications respond to blood-pressure drugs?

(And does that matter if we're just targeting patients for follow-up care?)
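The pure-confounding version of this example could be sketched as a simulation (entirely made-up numbers; in this toy model blood pressure has no direct effect at all, which is the extreme case):

```python
import random

random.seed(0)

def patient(bp_drug=False):
    """Toy model: latent stress raises both blood pressure and
    complication risk; blood pressure itself does nothing here."""
    stress = random.random()                    # unobserved confounder
    high_bp = (stress > 0.6) and not bp_drug    # drug lowers BP, not stress
    complication = random.random() < 0.1 + 0.4 * stress
    return high_bp, complication

def rates(n=50_000, bp_drug=False):
    comp_high = comp_low = n_high = n_low = 0
    for _ in range(n):
        hb, comp = patient(bp_drug)
        if hb:
            n_high += 1
            comp_high += comp
        else:
            n_low += 1
            comp_low += comp
    overall = (comp_high + comp_low) / n
    p_high = comp_high / n_high if n_high else float("nan")
    return p_high, comp_low / n_low, overall

# Observationally, high BP strongly predicts complications...
p_high, p_low, baseline = rates()

# ...but a drug that only lowers BP leaves the complication rate untouched.
_, _, treated = rates(bp_drug=True)
```

Great for prediction (and for targeting follow-up care), useless as an intervention target.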

added R tutorials

Will need to group them better so their list in the table-of-contents sidebar on the left isn't so long, but it was fun to put them in place!

intro: causal & why?

Adriane:

Passive prediction doesn't care about "why", right? Whether you understand why a patient is likely to get cancer or not is irrelevant to your task. With enough data, you can put anything in your model; you don't even need to think about whether it's a good idea to put it in the model. A good prediction is a good prediction regardless of your understanding. Causality begins to move you closer to questions of why: you likely have at least an inkling of an understanding in order to choose to manipulate a cause, or to think about a cause in the first place.

Potential Examples (with data)

2020 (or 2022) Exit Polls for a state - "Gender" gap?

  • Good for data wrangling, mutation, if else, missing data, univariate descriptives (mean, median, mode), data types
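A minimal pandas sketch of the kind of exercise this bullet imagines (invented toy rows, not actual exit-poll data): drop a missing response, recode with an if/else-style `np.where`, and compute a univariate descriptive by group.

```python
import numpy as np
import pandas as pd

# Toy exit-poll rows (invented values), with one missing response to handle.
exit_poll = pd.DataFrame({
    "gender": ["woman", "man", "woman", "man", "woman"],
    "dem_vote": [1, 0, 1, np.nan, 0],
})

# Missing-data handling first, then an if/else-style recode.
complete = exit_poll.dropna(subset=["dem_vote"]).copy()
complete["voted_dem"] = np.where(complete["dem_vote"] == 1, "yes", "no")

# Univariate descriptive: Democratic vote share by gender -- the "gender gap".
gap = complete.groupby("gender")["dem_vote"].mean()
```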

Pre-Election Polls: 2020 (but have all data since 2008) (but also used in Imai)

  • Good for looping/group_by, error/uncertainty, prediction
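The group_by/uncertainty piece could look something like this sketch (toy poll numbers, not the actual 2008–2020 files): compute poll error, then group by year and summarize its mean and spread.

```python
import pandas as pd

# Toy pre-election poll results (made-up numbers, just to show the pattern).
polls = pd.DataFrame({
    "year": [2008, 2008, 2012, 2012, 2016, 2016, 2020, 2020],
    "state": ["NC", "OH"] * 4,
    "dem_share": [49.9, 51.2, 48.5, 50.1, 46.8, 45.9, 49.3, 45.2],
    "actual_dem": [49.7, 51.5, 48.4, 50.7, 46.2, 43.6, 48.6, 45.2],
})

# Poll error by year: group, aggregate, and summarize uncertainty.
polls["error"] = polls["dem_share"] - polls["actual_dem"]
by_year = polls.groupby("year")["error"].agg(["mean", "std"])
```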

Election Returns: County level & state level over time

  • visualization, prediction, merging (demographics), long vs wide data? mapping?
  • segmentation? X different Americas (segmentation analysis of counties based on demographics & voting)
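For the long-vs-wide piece, a hedged pandas sketch (invented county values) of reshaping returns in both directions with `pivot` and `melt`:

```python
import pandas as pd

# Toy county-level returns in long format (invented values).
long = pd.DataFrame({
    "county": ["Durham", "Durham", "Wake", "Wake"],
    "year": [2016, 2020, 2016, 2020],
    "dem_share": [77.8, 80.3, 57.4, 62.3],
})

# Long -> wide: one row per county, one column per year.
wide = long.pivot(index="county", columns="year", values="dem_share")

# Wide -> long again with melt (after resetting the index).
back = wide.reset_index().melt(
    id_vars="county", var_name="year", value_name="dem_share"
)
```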

Covid incidence

  • Measurement issues, mapping? (Correlation with vote returns - correlation vs causality)

Trump Twitter data, House Twitter data

  • Text as data? classification, "sentiment analysis"

Federalist Papers (but from Imai)

  • Classification, prediction

Open-ended responses from JDC Covid Survey

  • classification/description

Survey data for segmentation study for NBC. ("6 voter types")

  • classification

Crime/Incarceration data?

Voter file data

  • North Carolina? All?
