Data Science Learning Resources

Programming

General

The Pragmatic Programmer (Book)

R

R for Data Science (Book)
Advanced R (Book)

Python

Machine Learning

General

Introduction to Statistical Learning (Book)
Applied Predictive Modeling (Book)
Elements of Statistical Learning (Book)
Computer Age Statistical Inference (Book)
Statistical Modeling: The Two Cultures (Paper)
Deep Learning (Book)
Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)

Unsupervised Modeling

ISLR: Ch. 10.3 Clustering Methods (Book chapter)
A K-Means Clustering Algorithm (Paper)
Generalized Low Rank Models (Paper)
Deep Learning Ch. 15 Autoencoders (Book chapter)
Hands-On Machine Learning with Scikit-Learn & TensorFlow: Ch. 15 Autoencoders (Book chapter | GitHub resource)
Sparse autoencoder (Andrew Ng CS294A lecture notes)
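
As a minimal sketch of the basic k-means idea, the standard-library Python below implements Lloyd's algorithm. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This is plain Lloyd iteration, not the Hartigan-Wong update scheme. The random initialisation and the function name `kmeans` are illustrative choices, not taken from the resources above.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points (one simple choice).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return labels, centroids
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the three points near the origin from the three near (10, 10). Because the result depends on initialisation, practical implementations usually run several random starts and keep the solution with the lowest within-cluster sum of squares.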

A/B Testing

Lessons from Running Thousands of A/B Tests (Online presentation with many references)
Online Controlled Experiments at Large Scale (Paper)
Peeking at A/B Tests (Paper)
Multi-armed Bandit (Online tutorial)
A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)
Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)
Evaluating Retrieval Performance using Clickthrough Data (Paper)
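
A fixed-horizon comparison of two conversion rates is often done with a pooled two-proportion z-test. A minimal sketch follows, assuming independent users and a sample size fixed in advance. Checking the p-value repeatedly as data arrive and stopping at the first "significant" result inflates the false-positive rate, which is the problem the peeking paper above addresses. The function name `two_proportion_ztest` is illustrative.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 (equal rates) both groups share one pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For instance, `two_proportion_ztest(200, 1000, 250, 1000)` compares a 20% and a 25% conversion rate on 1,000 users each.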

Multivariate Adaptive Regression Splines

Multivariate Adaptive Regression Splines (Friedman's original paper)
APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)
ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)
Notes on the earth package (Paper)
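
A fitted MARS model is a weighted sum of hinge functions max(0, x - t) and max(0, t - x), with products of hinges used for interaction terms. Knots t are chosen by a forward stepwise pass and then pruned by a backward pass. The sketch below only evaluates an additive (degree-1) model of that form. The knot at 3.0 and all coefficients are made-up values for illustration, and the knot search that would estimate them is not shown.

```python
def hinge(x, knot):
    """Right hinge: zero left of the knot, linear to the right."""
    return max(0.0, x - knot)


def mirror_hinge(x, knot):
    """Left (mirrored) hinge: linear left of the knot, zero to the right."""
    return max(0.0, knot - x)


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum(coef * basis(x, knot)) over (coef, basis, knot) terms."""
    return intercept + sum(coef * basis(x, knot) for coef, basis, knot in terms)


# Hypothetical fitted model: slope -0.5 left of x = 3, slope +1.5 right of it.
example_terms = [(1.5, hinge, 3.0), (-0.5, mirror_hinge, 3.0)]
```

The pair of mirrored hinges at one knot is what lets a MARS model change slope there while staying continuous: `mars_predict(5.0, 2.0, example_terms)` gives 5.0, and `mars_predict(1.0, 2.0, example_terms)` gives 1.0.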

K-Nearest Neighbor

k-Nearest neighbour classifiers (Paper)
APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)
ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)
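
The k-nearest-neighbour classifier predicts the majority label among the k training points closest to a query. A minimal sketch using Euclidean distance and a brute-force search over all training points follows. The function name `knn_predict` is illustrative, and real implementations typically use spatial indexes (for example k-d trees) and scale features first.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query and keep the first k.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tie, most_common keeps first-encountered order, i.e. the closer neighbour's label.
    return votes.most_common(1)[0][0]
```

With two well-separated groups labelled "a" and "b", a query near the first group is assigned "a". Choosing an odd k avoids ties in two-class problems.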

Random Forests

Gradient Boosting Machines

How to explain gradient boosting (Online tutorial)
Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)
Trevor Hastie - Data Science of GBM (2013) (slides)
Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)
Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)
Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)
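
For squared-error loss, the negative gradient of the loss with respect to the current predictions is the residual, so gradient boosting reduces to repeatedly fitting a weak learner to residuals and adding a shrunken copy of it to the model. A minimal sketch under that assumption follows. It uses one-dimensional inputs and depth-one trees ("stumps") as the weak learner. The names `fit_stump` and `gradient_boost` and the default settings are illustrative.

```python
def fit_stump(x, r):
    """Best single split on 1-D x for residuals r under squared error.
    Assumes x has at least two distinct values."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds keep both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Least-squares boosting: each stump is fit to the current residuals."""
    f0 = sum(y) / len(y)  # best constant model under squared error
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrinkage: add only a fraction of each stump's fit.
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + learning_rate * sum(s(xi) for s in stumps)
```

A smaller `learning_rate` usually needs more rounds but tends to generalise better; in practice both are tuned together, typically with cross-validation.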

Ensembles / Model Stacking / Super Learners

Ensemble Methods in Machine Learning (Paper)
Stacked Regressions (Paper)
Super Learner (Paper)
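
The core stacking idea is to combine base learners using weights chosen on out-of-fold (cross-validated) predictions rather than in-sample fits. A deliberately simplified sketch follows, assuming exactly two base learners on 1-D data and a convex weight found by grid search. The papers above fit the combination more generally (for example, with non-negativity constraints on several learners' weights). All function names here are hypothetical.

```python
def fit_mean(x, y):
    """Base learner A: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda xi: m


def fit_linear(x, y):
    """Base learner B: least-squares line y = a + b * x (x must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda xi: a + b * xi


def out_of_fold(fit, x, y, n_folds=5):
    """Predict each point with a model trained on the other folds."""
    preds = [0.0] * len(x)
    for k in range(n_folds):
        train = [i for i in range(len(x)) if i % n_folds != k]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in range(k, len(x), n_folds):  # indices belonging to fold k
            preds[i] = model(x[i])
    return preds


def stack_two(x, y, fit_a, fit_b, n_folds=5, grid_size=101):
    """Pick w in [0, 1] minimising out-of-fold MSE of w*A + (1 - w)*B,
    then refit both learners on all of the data."""
    pa = out_of_fold(fit_a, x, y, n_folds)
    pb = out_of_fold(fit_b, x, y, n_folds)

    def cv_mse(w):
        return sum((yi - (w * a + (1 - w) * b)) ** 2 for yi, a, b in zip(y, pa, pb)) / len(y)

    w = min((i / (grid_size - 1) for i in range(grid_size)), key=cv_mse)
    model_a, model_b = fit_a(x, y), fit_b(x, y)
    return w, lambda xi: w * model_a(xi) + (1 - w) * model_b(xi)
```

On data that are exactly linear, such as `y = 2x + 1`, the grid search puts all the weight on the linear learner (`w == 0.0`).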

Natural Language Processing / Text Mining

Text Mining with R (Book)
Probabilistic Topic Models (Paper)
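
A common starting point in text mining is tf-idf weighting, which scores a term highly when it is frequent in one document but rare across the collection. A minimal sketch follows. It assumes documents are already tokenised into lists of words and uses one common definition: tf is the term's count divided by the document's length, and idf is the natural log of (number of documents / number of documents containing the term). Other variants add smoothing or use different tf scalings, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (c / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, c in counts.items()
        })
    return weights
```

A word that appears in every document (such as "the") gets idf = ln(1) = 0, so its tf-idf is zero no matter how often it occurs.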

Tuning

Feature Engineering

Feature Selection

Feature Selection with the Boruta Package (Paper)
APM: Ch. 19 An Introduction to Feature Selection (Book chapter)

Machine Learning Interpretability

H2O.ai Machine Learning Interpretability Resources (GitHub resources)
Patrick Hall's Awesome Machine Learning Interpretability Resources (GitHub resources)
Interpretable Machine Learning (Book)
Visualizing the Feature Importance for Black Box Models (Paper)
A Simple and Effective Model-Based Variable Importance Measure (Paper)
Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (Paper)
pdp: An R Package for Constructing Partial Dependence Plots (Paper)
"Why Should I Trust You?": Explaining the Predictions of Any Classifier (Paper)
A Unified Approach to Interpreting Model Predictions (Paper)
Consistent Individualized Feature Attribution for Tree Ensembles (Paper)
On the Art and Science of Machine Learning Explanations (Paper)
Explanation in artificial intelligence: Insights from the social sciences (Paper)
Please Stop Permuting Features: An Explanation and Alternatives (Paper)
A Stratification Approach to Partial Dependence for Codependent Variables (Paper)
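
Partial dependence and individual conditional expectation (ICE) curves both probe a fitted model by varying one feature over a grid. An ICE curve does this for a single observation while holding its other features fixed, and the partial dependence curve is the average of the ICE curves. A minimal sketch follows, assuming the model is any function that maps a feature list to a number. The names `ice_curves` and `partial_dependence` are illustrative.

```python
def ice_curves(predict, X, feature, grid):
    """For each row, predictions as column `feature` takes each value in `grid`."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)  # copy so the original row is left unchanged
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, X, feature, grid):
    """Partial dependence: the average of the ICE curves at each grid value."""
    curves = ice_curves(predict, X, feature, grid)
    return [sum(c[j] for c in curves) / len(curves) for j in range(len(grid))]


# Toy model with a pure interaction between features 0 and 1.
def toy_model(r):
    return r[0] * r[1]


X_toy = [[0, 1], [0, -1]]
```

For this toy model, the ICE curves for feature 0 over the grid [0, 1, 2] slope up for one row and down for the other. Their average, the partial dependence curve, is flat at zero. This illustrates how averaging can hide an interaction that the individual curves reveal.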

Auto ML

Benchmarking

The Design and Analysis of Benchmark Experiments (Paper)
Szilard Pafka's ML Benchmarking Research (GitHub resources)

