nickeubank / ds4humans
Home Page: http://ds4humans.com
A place to throw things into something with structure.
Talk about how it's often best to start by specifying the ideal, then work back to what's feasible. Lots of student questions about this.
https://github.com/nickeubank/ds4humans/blob/main/30_questions/00_solving_the_right_problem.md
That way of thinking resonated. Add to passive prediction.
She's gonna write a competing book called "data science for cats".
No idea if it works. The first time I tried to use it in teaching, it went OK; the last time, I tried teaching causal inference first and then introducing it, and it did not go great. It feels meaningful to me, but... curious what you think when you have a chance.
It occurs to me that because I'm pushing most of my work directly to the main branch, you don't actually have a lot of visibility into what I'm up to (sorry!).
If you wanna see where I'm at, so far I've rewritten all of these readings:
And I'm working on passive-prediction questions now.
I've taken a "fuck figuring out the ideal audience, just write it for the class I'm teaching now" approach, which means what's there assumes that readers have taken a basic statistical modeling course and are taking an ML course concurrently. It's what I need right now, so I figure I'll start there, and then we can reshape/reorganize it for the future.
But if you want to read through it, I'd be very curious what you think. It is finally starting to feel like this kinda vague framework I've been working to express for a while is coming together. It's definitely not "a social scientist does data science" book, but rather (I hope!) a real data science book that tries to put all perspectives on similar footing in terms of usefulness.
Kyle:
In the EDA reading, you have one paragraph that I was reflecting on:
“This is problematic because any activity that involves data but lacks a clear motivation is doomed to be unending and unproductive. Data science has emerged precisely because our datasets are far too complex for us to understand directly; indeed, I would argue that the job of a data scientist can be summed up, in part, as a person who identifies meaningful patterns in our data and makes them comprehensible.”
My question is whether a clear motivation is necessary for a preliminary analysis/EDA. A data scientist might "explore" the data in the process of conducting an EDA (whether directed or undirected); won't what matters be what they synthesize and then present to the stakeholder?
Me:
This is definitely a branch I'm exploring walking out on, and I recognize that in taking such a strong position, I'm sure I'm not entirely right. But given that our students are coming from the opposite extreme, I'm feeling out the more extreme position on the other side.
With that said, I am finding it pretty compelling. To some degree, I think it depends on your definition of "clear motivation" and how narrowly or precisely one interprets it. I don't expect people to dive into their data with most of their paper already written, needing nothing but the values to plug into their tables (if they're the research-paper-writing sort); but I do think you need a clear sense of what matters, in the sense of: "What outcomes are problem-relevant? What independent variables might you have leverage to manipulate (if your goal is to be impactful)?"
Put differently, what you synthesize and present to the stakeholder is the conclusion or answer to a question, but the metric for whether something is substantively significant comes from the stakeholder's problem.
But your point is well taken and I'll keep thinking about it.
Kyle:
That perspective makes perfect sense. Erring on the side of overcorrection from many students' current default of approaching an EDA like a treasure hunt without a map is reasonable. I agree it hinges on "how narrow-precise" the interpretation of what a "clear motivation" is. If there's some wiggle room to make that a bit less narrow-precise, then the paradigm you present here feels very well-composed.
Where dem data come from
If you wanna get on Zoom sometime to talk about notebooks / how this book works.
Students don't quite grok the distinction between passive-prediction and causal questions. Add the rain example? Umbrella use (passively) predicts rain (if we see one, we expect the other), but umbrellas have no causal effect on rain; the causal arrow runs from rain to umbrellas (if I were to manipulate umbrella use, it wouldn't cause rain / we wouldn't "predict" rain to start).
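A toy simulation could make the rain example concrete for students. This is a minimal sketch under made-up assumptions (a 30% chance of rain, people carrying umbrellas on 90% of rainy days and 10% of dry days); the function names and probabilities are hypothetical and chosen only for illustration. Rain causes umbrella use, so observing an umbrella is informative about rain, but forcing everyone to carry one leaves the rain rate where it was.

```python
import random


def simulate_day(rng, force_umbrella=None):
    """Simulate one day in which rain causes umbrella use (never the reverse).

    If force_umbrella is given, we intervene and set umbrella use ourselves.
    """
    rain = rng.random() < 0.3  # assumed 30% chance of rain
    if force_umbrella is None:
        # Observational world: people mostly carry umbrellas when it rains.
        umbrella = rng.random() < (0.9 if rain else 0.1)
    else:
        # Intervention: umbrella use is set by us, independent of the weather.
        umbrella = force_umbrella
    return rain, umbrella


def rain_rate(days):
    """Share of simulated days on which it rained."""
    return sum(rain for rain, _ in days) / len(days)


rng = random.Random(0)
n = 100_000

# Passive prediction: does seeing an umbrella tell us anything about rain?
observed = [simulate_day(rng) for _ in range(n)]
p_rain_given_umbrella = rain_rate([d for d in observed if d[1]])
p_rain_given_no_umbrella = rain_rate([d for d in observed if not d[1]])

# Causal question: if we made everyone carry an umbrella, would it rain more?
forced = [simulate_day(rng, force_umbrella=True) for _ in range(n)]
p_rain_if_forced = rain_rate(forced)

print(f"P(rain | umbrella seen)       = {p_rain_given_umbrella:.2f}")
print(f"P(rain | no umbrella seen)    = {p_rain_given_no_umbrella:.2f}")
print(f"P(rain | everyone forced to)  = {p_rain_if_forced:.2f}")
```

Under these assumed probabilities, the analytic values are roughly 0.79 when an umbrella is seen, roughly 0.05 when none is seen, and 0.30 under the intervention: a strong passive predictor with zero causal effect.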
Suppose high blood pressure predicts complications. Maybe it's the high blood pressure itself; or maybe people with three jobs who take public transit are just leading more stressful lives, and that stress drives both their blood pressure and their complications.
So blood pressure is a great predictor of complications, but would complications respond to blood pressure drugs?
(And does that matter if we're just targeting patients for follow-up care?)
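The blood pressure case could get the same treatment. This is a sketch of one hypothetical data-generating process, not a claim about real medicine: it assumes stress raises both the chance of high blood pressure and the chance of complications, that blood pressure itself has no effect on complications, and that a hypothetical drug perfectly normalizes blood pressure. All probabilities are invented for illustration.

```python
import random


def simulate_patient(rng, give_drug=False):
    """Simulate one patient under a hypothetical confounded process.

    Stress drives BOTH blood pressure and complications; blood pressure
    itself has no effect on complications in this toy model.
    """
    stressed = rng.random() < 0.4  # assumed share of highly stressed patients
    high_bp = rng.random() < (0.7 if stressed else 0.2)
    if give_drug:
        high_bp = False  # assumed: the drug fully normalizes blood pressure
    complication = rng.random() < (0.3 if stressed else 0.05)
    return high_bp, complication


def complication_rate(patients):
    """Share of simulated patients who had a complication."""
    return sum(comp for _, comp in patients) / len(patients)


rng = random.Random(42)
n = 200_000

# Passive prediction: high blood pressure flags higher-risk patients.
untreated = [simulate_patient(rng) for _ in range(n)]
rate_high_bp = complication_rate([p for p in untreated if p[0]])
rate_low_bp = complication_rate([p for p in untreated if not p[0]])

# Causal question: does lowering everyone's blood pressure reduce complications?
rate_no_drug = complication_rate(untreated)
rate_drug = complication_rate([simulate_patient(rng, give_drug=True) for _ in range(n)])

print(f"Complications | high BP: {rate_high_bp:.3f}")
print(f"Complications | low BP:  {rate_low_bp:.3f}")
print(f"Complications, no drug:  {rate_no_drug:.3f}")
print(f"Complications, drug:     {rate_drug:.3f}")
```

In this toy world, high blood pressure still picks out patients worth prioritizing for follow-up care (a passive-prediction use), while the drug leaves the complication rate unchanged (a causal question with a "no" answer).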
Will need to group these better so the list in the table-of-contents sidebar on the left isn't so long, but fun to put in place!
Adriane:
Passive prediction doesn't care about "why", right? Whether you understand why a patient is likely to get cancer or not is irrelevant to your task. With enough data, you can put anything in your model; you don't even need to think about whether it's a good idea to put it in the model. A good prediction is a good prediction regardless of your understanding. Causality begins to move you closer to questions of why. You likely need at least an inkling of understanding in order to choose to manipulate a cause, or to think about a cause in the first place.
2020 (or 2022) Exit Polls for a state - "Gender" gap?
Pre-Election Polls: 2020 (but have all data since 2008) (but also used in Imai)
Election Returns: County level & state level over time
Covid incidence
Trump Twitter data, House Twitter data
Federalist Papers (but from Imai)
Open ended responses from JDC Covid Survey
Survey data for NBC segmentation study ("6 voter types")
Crime/Incarceration data?
Voter file data