Foundational papers to read across several areas of ML and statistics: interpolation and double descent, robust estimation, distribution-free inference, distribution shift and domain adaptation, optimization of over-parameterized models, and implicit regularization.
- Bartlett et al. 2020 - Benign Overfitting in Linear Regression
- Bartlett, Montanari & Rakhlin 2021 - Deep learning: a statistical viewpoint
- Belkin, Hsu & Xu 2019 - Two models of double descent for weak features
- Nakkiran et al. 2021 - Optimal Regularization Can Mitigate Double Descent
- Nakkiran 2019 - More Data Can Hurt for Linear Regression: Sample-wise Double Descent
- Belkin 2021 - Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- Belkin et al. 2019 - Reconciling modern machine learning and the bias-variance trade-off
- Hastie et al. 2020 - Surprises in High-Dimensional Ridgeless Least Squares Interpolation
- Mei & Montanari 2021 - The generalization error of random features regression: Precise asymptotics and double descent curve
- Xu & Hsu 2019 - On the number of variables to use in principal component regression
- Muthukumar et al. 2019 - Harmless interpolation of noisy data in regression
- Dar, Muthukumar & Baraniuk 2021 - A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
- Wyner et al. 2017 - Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers
- Belkin, Ma & Mandal 2018 - To Understand Deep Learning We Need to Understand Kernel Learning
- Zhang et al. 2017 - Understanding deep learning requires rethinking generalization
- Neyshabur et al. 2018 - Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
- Nakkiran et al. 2020 - The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
- Yang et al. 2020 - Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
- Lugosi & Mendelson 2019 - Mean estimation and regression under heavy-tailed distributions: a survey
- Prasad et al. 2018 - Robust Estimation via Robust Gradient Estimation
- Diakonikolas et al. 2019 - Robust Estimators in High Dimensions without the Computational Intractability
- Lai, Rao & Vempala 2016 - Agnostic Estimation of Mean and Covariance
- Diakonikolas et al. 2019 - Sever: A Robust Meta-Algorithm for Stochastic Optimization
- Gao et al. 2018 - Robust Estimation and Generative Adversarial Nets
- Minsker 2013 - Geometric median and robust estimation in Banach spaces
- Donoho & Liu 1988 - The "Automatic" Robustness of Minimum Distance Functionals
- Donoho & Liu 1991 - Geometrizing Rates of Convergence
- Hopkins, Li & Zhang 2020 - Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization
- Lecué et al. 2020 - Robust classification via MOM minimization
- Bates et al. 2021 - Distribution-Free, Risk-Controlling Prediction Sets
- Lei et al. 2018 - Distribution-Free Predictive Inference for Regression
- Gupta, Podkopaev & Ramdas 2020 - Distribution-free binary classification: prediction sets, confidence intervals and calibration
- Cauchois et al. 2020 - Robust Validation: Confident Predictions Even When Distributions Shift
- Barber et al. 2020 - The limits of distribution-free conditional predictive inference
- Barber et al. 2019 - Predictive inference with the jackknife+
- Tibshirani et al. 2019 - Conformal Prediction Under Covariate Shift
- Angelopoulos et al. 2020 - Uncertainty Sets for Image Classifiers using Conformal Prediction
- Ben-David et al. 2010 - A theory of learning from different domains
- Ganin et al. 2016 - Domain-Adversarial Training of Neural Networks
- Zhao et al. 2019 - On Learning Invariant Representations for Domain Adaptation
- Shimodaira 2000 - Improving predictive inference under covariate shift by weighting the log-likelihood function
- Saerens, Latinne & Decaestecker 2002 - Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure
- Courty et al. 2015 - Optimal Transport for Domain Adaptation
- Johansson, Sontag & Ranganath 2019 - Support and Invertibility in Domain-Invariant Representations
- Heinze-Deml & Meinshausen 2017 - Conditional Variance Penalties and Domain Shift Robustness
- Arjovsky et al. 2020 - Invariant Risk Minimization
- Rosenfeld, Ravikumar & Risteski 2021 - The Risks of Invariant Risk Minimization
- Peters, Bühlmann & Meinshausen 2015 - Causal inference using invariant prediction: identification and confidence intervals
- Azizzadenesheli et al. 2019 - Regularized Learning for Domain Adaptation under Label Shifts
- Liu, Zhu & Belkin 2020 - Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
- Dauphin et al. 2014 - Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
- Du et al. 2019 - Gradient Descent Finds Global Minima of Deep Neural Networks
- Soltanolkotabi, Javanmard & Lee 2018 - Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
- Arora, Cohen & Hazan 2018 - On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
- Kleinberg, Li & Yuan 2018 - An Alternative View: When Does SGD Escape Local Minima?
- Allen-Zhu, Li & Song 2018 - A Convergence Theory for Deep Learning via Over-Parameterization
- Jacot, Gabriel & Hongler 2018 - Neural Tangent Kernel: Convergence and Generalization in Neural Networks
- Arora et al. 2019 - Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
- Arora et al. 2019 - On Exact Computation with an Infinitely Wide Neural Net
- Lee et al. 2017 - Deep Neural Networks as Gaussian Processes
- Matthews et al. 2018 - Gaussian Process Behaviour in Wide Deep Neural Networks
- Chizat, Oyallon & Bach 2018 - On Lazy Training in Differentiable Programming
- Woodworth et al. 2019 - Kernel and Rich Regimes in Overparametrized Models
- Soudry et al. 2017 - The Implicit Bias of Gradient Descent on Separable Data
- Ji & Telgarsky 2018 - Risk and parameter convergence of logistic regression
- Telgarsky 2013 - Margins, Shrinkage, and Boosting
- Gunasekar et al. 2018 - Characterizing Implicit Bias in Terms of Optimization Geometry
- Ali, Kolter & Tibshirani 2018 - A Continuous-Time View of Early Stopping for Least Squares
- Ali, Kolter & Tibshirani 2020 - The Implicit Regularization of Stochastic Gradient Flow for Least Squares
- Srebro, Rennie & Jaakkola 2004 - Maximum-Margin Matrix Factorization
- D'Amour et al. 2020 - Underspecification Presents Challenges for Credibility in Modern Machine Learning
- Santurkar et al. 2018 - How Does Batch Normalization Help Optimization?
- Srivastava et al. 2014 - Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- Li et al. 2018 - Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
- Bergstra & Bengio 2012 - Random Search for Hyper-Parameter Optimization
- Hara, Saitoh & Shouno 2017 - Analysis of dropout learning regarded as ensemble learning
- Frankle & Carbin 2019 - The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Frankle et al. 2020 - Stabilizing the Lottery Ticket Hypothesis
- Hooker & Mentch 2019 - Please Stop Permuting Features: An Explanation and Alternatives
- Mentch & Zhou 2020 - Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success
- Mentch & Zhou 2021 - Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest