This repository contains links to references (books, courses, etc.) that are useful for learning statistics and machine learning (as well as some neighboring topics). References for background material such as linear algebra, calculus/analysis/measure theory, probability theory, etc., are usually not included.
The level of the references starts from advanced undergraduate stats/math/CS and in some cases goes up to the research level. The books are often standard references and textbooks, used at leading institutions. In particular, several of the books are used in the standard curriculum of the PhD program in Statistics at Stanford University (where I learned from them as well), as well as at the University of Pennsylvania (where I work). The goal is to benefit students, researchers seeking to enter new areas, and lifelong learners.
For each topic, materials are listed in rough order from basic to advanced.
The list is highly subjective and incomplete, reflecting my own preferences, interests, and biases. For instance, there is an emphasis on theoretical material. Most of the references included here are ones that I have at least partially (and sometimes extensively) studied and found helpful. Others are on my to-read list. Several topics are omitted due to lack of expertise (e.g., causal inference, Bayesian statistics, time series, sequential decision-making, functional data analysis, biostatistics, ...).
The links are to freely available author copies where those exist, and to online marketplaces otherwise (you are encouraged to search for the best price).
How to use these materials to learn: To be an efficient researcher, certain core material must be mastered. However, there is far too much specialized knowledge for anyone to know it all. Fortunately, it is often enough to know what types of results/methods/tools are available, and where to find them. When they are needed, they can be recalled and used.
Please feel free to contact me with suggestions.
- Casella & Berger: Statistical Inference (2nd Edition) - Possibly the best introduction to the principles of statistical inference at an advanced undergraduate level. Mathematically rigorous but not technical. Covers key ideas and tools for constructing and evaluating estimators:
- Data reduction (sufficiency, likelihood principle),
- Methods for finding estimators (method of moments, maximum likelihood estimation, Bayes estimators), methods for evaluating estimators (mean squared error, bias and variance, best unbiased estimators, loss function optimality),
- Hypothesis testing (likelihood ratio tests, power), confidence intervals (pivotal quantities, coverage),
- Asymptotics (consistency, efficiency, bootstrap, robustness).
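As a small illustration of the estimator-evaluation ideas above (bias, variance, mean squared error), here is a hedged NumPy sketch, with arbitrary constants, comparing the biased maximum likelihood estimator of a normal variance to the unbiased sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0          # true variance
n, reps = 10, 20000   # small n makes the bias visible

# Draw many samples of size n and compute both estimators on each.
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
mle = x.var(axis=1, ddof=0)        # divides by n:   biased, E[mle] = (n-1)/n * sigma2
unbiased = x.var(axis=1, ddof=1)   # divides by n-1: unbiased

# The MLE's bias is -sigma2/n = -0.4 here; the unbiased estimator's is ~0.
print(mle.mean() - sigma2, unbiased.mean() - sigma2)
```

The simulation recovers the textbook fact that the MLE trades a small downward bias for slightly lower variance.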
- Wasserman: All of Statistics: A Concise Course in Statistical Inference - A panoramic overview of statistics; mathematical but proofs are omitted. Covers material overlapping with ESL, TSH, TPE (abbreviations defined below), and other books in this list.
- Cox: Principles of Statistical Inference - Covers a number of classical principles and ideas such as pivotal inference, ancillarity, conditioning, including famous paradoxes. Light on math, but containing deep thoughts.
- Hastie, Tibshirani & Friedman: The Elements of Statistical Learning - The bible of modern statistical methodology, with comprehensive coverage from linear methods to kernels, basis expansions, trees/forests, model selection, high-dimensional methods, etc. Emphasizes ideas over math. Free on the authors' website. Known as "ESL".
- Lehmann & Casella: Theory of Point Estimation, 2nd Edition - Solid mathematically rigorous overview of point estimation theory. Known as "TPE".
- Lehmann & Romano: Testing Statistical Hypotheses, 4th Edition - A complement to TPE; covers the theory of inference (hypothesis tests and confidence intervals). Known as "TSH".
- van der Vaart: Asymptotic Statistics - Covers classical fixed-dimensional asymptotics.
- Candes: Theory of Statistics, STAT 300C Lecture Notes - Modern statistical theory: sparsity, detection thresholds, multiple testing, false discovery rate control, Benjamini-Hochberg procedure, model selection, conformal prediction, etc.
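As a concrete taste of one topic from these notes, here is a sketch of the Benjamini-Hochberg step-up procedure (the function name and example p-values are made up for illustration):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= k*q/m, then reject the k smallest p-values.
    below = ranked <= np.arange(1, m + 1) * q / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max() + 1
        reject[order[:k]] = True
    return reject

# Illustrative p-values: the step-up rule rejects the six smallest here.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.32, 0.9]))
```

Note the step-up character of the rule: 0.039 exceeds its own threshold 3(0.1)/8, but is still rejected because a larger p-value further down the sorted list falls below its threshold.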
This section is the most detailed one, as it is the closest to my research.
- Tsybakov: Introduction to Nonparametric Estimation - The first two chapters contain many core results and techniques in nonparametric estimation, including lower bounds (Le Cam, Fano, Assouad).
- Weissman, Ozgur, Han: Stanford EE 378 Course Materials. Lecture Notes - Possibly the most comprehensive set of materials on information theoretic lower bounds, including estimation and testing (Ingster's method) with examples given in high-dimensional problems, optimization, etc.
- Johnstone: Gaussian estimation: Sequence and wavelet models - Beautiful overview of estimation in Gaussian noise (shrinkage, wavelet thresholding, optimality). Rigorous and deep, has challenging exercises.
- Duchi: Lecture Notes on Statistics and Information Theory - Eclectic topics in modern statistical learning, at the interface of stats and ML: intro to information theory tools, PAC-Bayes, minimax lower bounds (estimation and testing), probabilistic prediction, calibration, online game playing, online optimization, etc.
- Bach: Learning Theory from First Principles
- RJ Tibshirani: Lecture Notes on Advanced Topics in Statistical Learning: Spring 2023 - Overview of a variety of important and modern topics in statistical machine learning. Some topics are advanced and hard to find summarized elsewhere, e.g., conformal prediction under distribution shift and calibration.
- van der Vaart: Semiparametric Statistics, Chapter III of Lectures at Ecole d'Ete de Probabilites de Saint-Flour XXIX, 1999 - Concise and mathematically rigorous introduction to key ideas in semiparametrics. Defines tangent sets and spaces, differentiable paths and score functions, differentiable maps and influence functions, efficiency, etc.
- Kosorok: Introduction to Empirical Processes and Semiparametric Inference - Detailed and rigorous introduction to semiparametrics, also containing the required background from empirical process theory (and necessary math background, such as topics from functional analysis). A number of detailed examples are presented, which greatly aid appreciating the power of the theory.
- Bickel, Klaassen, Ritov, Wellner: Efficient and Adaptive Estimation for Semiparametric Models - Thorough and rigorous, but also heavy, treatise on semiparametrics, including some required background on local asymptotic normality. The first few chapters present the general theory and can be focused on during a first reading.
- Anderson: An Introduction to Multivariate Statistical Analysis - Standard reference on multivariate statistical analysis (OLS, LDA, PCA, factor analysis, MANOVA). Describes practical methods with mathematical rigor. Beautifully written.
- Politis, Romano, Wolf: Subsampling - Canonical reference for the powerful resampling methodology of subsampling.
- van der Vaart, Wellner: Weak convergence and empirical processes - Thorough and mathematically fully rigorous (sometimes technically heavy) book on empirical processes; key reference when working in the area.
High-dimensional (mean field, proportional limit) asymptotics; random matrix theory (RMT) for stats+ML
- Mei: Lecture Notes for Mean Field Asymptotics in Statistical Learning - Good overview of various techniques in the area: replica methods, Gaussian comparison inequalities/Convex Gaussian Minimax Theorem, Stieltjes transforms for random matrices, and approximate message passing (AMP). Several applications to stats+ML are presented.
- Couillet & Debbah: Random Matrix Methods for Wireless Communications - The first section is a good overview of the most commonly used RMT techniques and results for stats+ML. Strikes an ideal balance between rigor and clarity (statements are rigorous and detailed proof sketches are presented, but some of the most technical proof components are omitted, with references to papers given).
- Bai & Silverstein: Spectral Analysis of Large Dimensional Random Matrices - A standard reference in the field, with citable results stated at full generality, and with proofs. Nonetheless, requires filling in details of calculations/arguments, which can take a lot of effort for students.
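A quick way to see the kind of result these books prove is to check the edges of the Marchenko-Pastur law numerically; here is a minimal sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4000, 1000                 # aspect ratio gamma = p/n = 1/4
gamma = p / n

# Sample covariance matrix of i.i.d. standard normal data.
X = rng.standard_normal((n, p))
eig = np.linalg.eigvalsh(X.T @ X / n)

# Marchenko-Pastur law: for identity population covariance, the spectrum
# concentrates on [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2].
lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(eig.min(), eig.max(), (lo, hi))   # extreme eigenvalues near 0.25 and 2.25
```

Even at these moderate dimensions, the extreme eigenvalues sit within a few percent of the asymptotic edges, which is the bulk-spectrum phenomenon the proportional-limit asymptotics describe.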
- Peck et al: Statistics - A Guide to the Unknown - Engaging essays about applications of statistics in diverse areas: public policy, science & tech, bio & medicine, business & industry, hobbies & recreation. Elementary (minimal to no prerequisites), and written in a way that "draws you in".
- Morton et al: Public Policy and Statistics: Case Studies from RAND
- Peck et al.: Statistical Case Studies: A Collaboration Between Academe and Industry Student Edition. Instructor Edition.
- Shalev-Shwartz & Ben-David: Understanding Machine Learning: From Theory to Algorithms - Good single reference source of core machine learning theory ideas and results.
- Srebro: Computational and Statistical Learning Theory - Great course materials on Statistical/PAC learning, online learning, crypto lower bounds.
- Orabona: A Modern Introduction to Online Learning
- Courses at DeepLearning.AI. Their intro deep learning course with Andrew Ng is great.
- Andrej Karpathy's Neural Networks: Zero to Hero video lectures. 100% coding-based, hands-on tutorial on implementing basic autodiff, neural nets, language models, and a small version of GPT.
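In the spirit of those lectures (though this is not Karpathy's actual code), here is a minimal sketch of reverse-mode autodiff on scalars, the core idea implemented in the first video:

```python
class Value:
    """A scalar that tracks its gradient via reverse-mode autodiff."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically sort the computation graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

x, y = Value(2.0), Value(3.0)
z = x * y + x          # z = 8;  dz/dx = y + 1 = 4,  dz/dy = x = 2
z.backward()
print(z.data, x.grad, y.grad)
```

The same two ingredients, local derivatives at each node plus a reverse topological sweep, underlie autograd in PyTorch and JAX.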
- DeepMind x UCL | Deep Learning Lecture Series 2021 Videos
- Prince: Understanding Deep Learning. Free PDF book, code notebooks, and slides available on author website.
- ML Safety Course at Center for AI Safety. See video lectures on Youtube.
- Elad Hazan's AI Safety Course at Princeton
This is subject to active development and research. There is no complete reference.
- The corresponding sections in the Understanding Deep Learning book. See also the associated tutorial posts: LLMs; Transformers 1, 2, 3; Training and fine-tuning; Inference
- Raschka: LLMs from Scratch
- UC Berkeley course Understanding Large Language Models: Foundations and Safety
- Transformer Mechanistic Interpretability: Transformer Circuits. Additional notes.
- Dobriban's course materials for STAT 991 - Contains detailed references to materials on uncertainty quantification for ML, including conformal prediction/predictive inference and calibration.
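As a concrete taste of the conformal prediction material referenced there, here is a sketch of split conformal prediction intervals on synthetic data (the model and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise; any fitted model could be plugged in below.
x = rng.uniform(-1, 1, size=2000)
y = 2 * x + rng.normal(scale=0.5, size=2000)

# Split into a proper training half and a calibration half.
x_tr, y_tr, x_cal, y_cal = x[:1000], y[:1000], x[1000:], y[1000:]
slope = np.sum(x_tr * y_tr) / np.sum(x_tr ** 2)   # least squares through the origin
predict = lambda t: slope * t

# Conformity scores on the calibration set and their adjusted quantile.
alpha = 0.1
scores = np.abs(y_cal - predict(x_cal))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
qhat = np.sort(scores)[k - 1]

# The interval [predict(x) - qhat, predict(x) + qhat] has marginal
# coverage >= 1 - alpha for exchangeable data, regardless of the model fit.
x_new = rng.uniform(-1, 1, size=5000)
y_new = 2 * x_new + rng.normal(scale=0.5, size=5000)
coverage = np.mean(np.abs(y_new - predict(x_new)) <= qhat)
print(qhat, coverage)
```

The distribution-free coverage guarantee holds even if the fitted model is badly misspecified; a poor fit only inflates the interval width, not the coverage.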
- Boyd and Vandenberghe: Convex Optimization - Good user- and algorithm-focused book on convex optimization. Mathematically rigorous and clean, but does not go deep in the theory.
- Nesterov: Introductory Lectures on Convex Optimization: A Basic Course - A deep dive into convex optimization theory, including optimality results.
- Bottou, Curtis, Nocedal: Optimization Methods for Large-Scale Machine Learning - Good introductory review focusing on scalable first order methods, such as SGD and variance-reduced methods. Has some proofs.
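A minimal sketch of plain SGD, the workhorse method this review analyzes, on a synthetic least-squares problem (constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize the average of (x_i . w - y_i)^2 / 2.
n, d = 5000, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
lr = 0.01
for _ in range(20000):
    i = rng.integers(n)                  # one uniformly sampled example per step
    grad = (X[i] @ w - y[i]) * X[i]      # unbiased estimate of the full gradient
    w -= lr * grad

print(np.linalg.norm(w - w_star))        # small: SGD hovers near the optimum
```

With a constant step size, the iterates contract toward the optimum and then fluctuate in a neighborhood whose radius scales with the step size, which is one of the basic trade-offs the review quantifies.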
- Duchi: Introductory Lectures on Stochastic Optimization
- Boucheron, Lugosi, Massart: Concentration Inequalities: A Nonasymptotic Theory of Independence - Standard reference on concentration inequalities, used often in proofs in stats/ML.
- Vershynin: High-Dimensional Probability: An Introduction with Applications in Data Science - Another standard reference in the area, with citable and usable results. Also has some example applications to covariance estimation, graph estimation, etc.
- Talagrand: Upper and Lower Bounds for Stochastic Processes - Chaining is a theoretical tool developed by Talagrand, which can often give optimal bounds on the tail behavior of stochastic processes (even when standard concentration inequalities fail to do so). This is a readable, yet rigorous and complete, reference by the inventor of the theory.