Book Review: Computer Age Statistical Inference

2016-10-28

by Joseph Rickert

Computer Age Statistical Inference: Algorithms, Evidence and Data Science by Bradley Efron and Trevor Hastie is a brilliant read. If you are only ever going to buy one statistics book, or if you are thinking of updating your library and retiring a dozen or so dusty stats texts, this book would be an excellent choice. In 475 carefully crafted pages, Efron and Hastie examine the last 100 years or so of statistical thinking from multiple viewpoints. Following the path implied by the book’s title, they describe the impact of computing on statistics and point out where powerful computers opened up new territory. On the first page of the preface they write:

… the role of electronic computation is central to our story. This doesn’t mean that every advance was computer-related. A land bridge had opened up to a new continent but not all were eager to cross.

Empirical Bayes and James-Stein estimation, they claim, could have been discovered under the constraints of mid-twentieth-century mechanical computation. However, discovering the bootstrap, proportional hazards models, large-scale hypothesis testing, and the machine learning algorithms underlying much of data science required crossing the bridge.

A second path opened up in this text stops just short of the high ground of philosophy. Efron and Hastie blow by the great divide of the Bayesian versus Frequentist controversy to carefully consider the strengths and weaknesses of the three main systems of statistical inference: Frequentist, Bayesian and Fisherian Inference. You may have thought that Sir Ronald Fisher was a frequentist, but the inspired thoughts of a man of Fisher’s intellect are not so easily categorized. Efron and Hastie write:

Sir Ronald Fisher was arguably the most influential anti-Bayesian of all time, but that did not make him a conventional frequentist. His key data analytic methods … were almost always applied frequentistically. Their Fisherian rationale, however, often drew on ideas neither Bayesian nor frequentist in nature, or sometimes the two in combination.

Above all, in this text, Efron and Hastie are concerned with the clarity of statistical inference. They take special care to explain the nuances of Frequentist, Bayesian and Fisherian thinking, devoting early chapters to each of these conceptual frameworks. In these chapters, they invite the reader to consider a familiar technique from a Bayesian, Frequentist or Fisherian point of view. Then they raise issues, comparing and contrasting the merits of each approach. Unstated, but nagging in the back of my mind while reading these chapters, was the implication that there may, indeed, be other paths to the “science of learning from experience” (the authors’ definition of statistics) that have yet to be discovered.

But don’t let me mislead you into thinking that Computer Age Statistical Inference is mere philosophical fluff that doesn’t really matter day-to-day. Have a look at the table of contents. The book is organized into three parts. “Part I: Classic Statistical Inference” contains five chapters on classical statistical inference: a gentle introduction to algorithms and inference, three chapters on the inference systems mentioned above, and a chapter on parametric models and exponential families. “Part II: Early Computer-Age Methods” has nine chapters: Empirical Bayes; James-Stein Estimation and Ridge Regression; Generalized Linear Models and Regression Trees; Survival Analysis and the EM Algorithm; The Jackknife and the Bootstrap; Bootstrap Confidence Intervals; Cross-Validation and Cp Estimates of Prediction Error; Objective Bayes Inference and MCMC; and Postwar Statistical Inference and Methodology. “Part III: Twenty-First-Century Topics” dives into the details of large-scale inference and data science, with seven chapters: Large-Scale Hypothesis Testing; Sparse Modeling and the Lasso; Random Forests and Boosting; Neural Networks and Deep Learning; Support Vector Machines and Kernel Methods; Inference After Model Selection; and Empirical Bayes Estimation Strategies.

Efron and Hastie will keep your feet firmly on the ground while they walk you slowly through the details, pointing out what is important, and providing the guidance necessary to keep the whole forest in mind while studying the trees. From the first page, they maintain a unified exposition of their material by presenting statistics as a tension between algorithms and inference. With this in mind, it seems plausible that there really isn’t any big disconnect between the strict logic required to think your way through the pitfalls of large-scale hypothesis testing, and the almost cavalier application of machine learning models.

Nothing Efron and Hastie do throughout this entire trip is pedestrian. For example, their approach to the exponential family of distributions underlying generalized linear models doesn’t begin with the usual explanation of how link functions fit into the standard exponential family formula. Instead, they start with a Poisson family example. They derive a two-parameter general expression for the family and show how “tilting” the distribution, by multiplying it by an exponential factor, produces the other members of the family. The example is interesting in its own right, but the payoff comes a couple of pages later. There, an argument shows how a generalization of the technique keeps the number of summary statistics needed for inference under repeated sampling (the sufficient statistics) from growing without bound as the sample size increases.
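The tilting argument is easy to check numerically. The book contains no code, so the following minimal Python sketch is our own, and the function names `poisson_pmf`, `psi` and `tilted_pmf` are hypothetical. The sketch assumes the tilted family is written as f_μ(x) = e^{αx − ψ(α)} f_{μ0}(x), with α = log(μ/μ0) and ψ(α) = μ0(e^α − 1). It first confirms that tilting a Poisson(μ0) carrier reproduces a Poisson(μ) distribution. It then checks that two samples with the same size and the same total give the same log-likelihood ratio, which illustrates the fixed-dimension sufficiency payoff.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability e^{-mu} mu^x / x!, computed in log space for stability."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))


def psi(alpha: float, mu0: float) -> float:
    """Normalizer psi(alpha) = mu0 * (e^alpha - 1) for the tilted Poisson family (assumed form)."""
    return mu0 * (math.exp(alpha) - 1.0)


def tilted_pmf(x: int, alpha: float, mu0: float) -> float:
    """Tilt the carrier Poisson(mu0) by e^{alpha x}, then renormalize with psi(alpha)."""
    return math.exp(alpha * x - psi(alpha, mu0)) * poisson_pmf(x, mu0)


def loglik_ratio(sample: list[int], alpha: float, mu0: float) -> float:
    """Log-likelihood of an i.i.d. sample under the tilted family, relative to the carrier."""
    return sum(math.log(tilted_pmf(x, alpha, mu0) / poisson_pmf(x, mu0)) for x in sample)


mu0 = 3.0                    # carrier mean (arbitrary illustrative value)
mu = 7.5                     # target mean (arbitrary illustrative value)
alpha = math.log(mu / mu0)   # natural parameter that moves the carrier from mu0 to mu

# 1) Tilting Poisson(mu0) gives exactly Poisson(mu0 * e^alpha) = Poisson(mu).
max_gap = max(abs(tilted_pmf(x, alpha, mu0) - poisson_pmf(x, mu)) for x in range(60))
total = sum(tilted_pmf(x, alpha, mu0) for x in range(200))

# 2) Under repeated sampling, the log-likelihood ratio is alpha * sum(x) - n * psi(alpha),
#    so samples of equal size and equal total carry the same information about alpha.
sample_a = [2, 5, 1, 4]
sample_b = [3, 3, 3, 3]
lr_a = loglik_ratio(sample_a, alpha, mu0)
lr_b = loglik_ratio(sample_b, alpha, mu0)

print(f"max |tilted - Poisson(mu)| = {max_gap:.2e}")
print(f"tilted pmf sums to {total:.6f}")
print(f"log-likelihood ratios: {lr_a:.6f} vs {lr_b:.6f}")
```

The means 3.0 and 7.5 and the two toy samples are illustrative choices only. Any positive means and any pair of equal-size, equal-total samples should behave the same way, because the data enter only through n and the sum of the observations.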

A great pedagogical strength of the book is the “Notes and Details” section concluding each chapter. Here you will find derivation details, explanations of Frequentist, Bayesian and Fisherian inference, and remarks of historical significance. The Epilogue ties everything together with a historical perspective that outlines how the focus of statistical progress has shifted between Applications, Mathematics and Computation throughout the twentieth century and the early part of this century.

Computer Age Statistical Inference contains no code, but it is clearly an R-informed text, with many plots and illustrations. The data sets provided on Efron’s website and the pseudo-code placed throughout the text are helpful for replicating much of what is described. The website points to the boot and bootstrap packages, and provides the code for a function used in the notes to the chapter on bootstrap confidence intervals.

My take on Computer Age Statistical Inference is that experienced statisticians will find it helpful to have such a compact summary of twentieth-century statistics, even if they occasionally disagree with the book’s emphasis; students beginning the study of statistics will value the book as a guide to statistical inference that may offset the dangerously mind-numbing experience offered by most introductory statistics textbooks; and the rest of us non-experts interested in the details will enjoy hundreds of hours of pleasurable reading.

Finally, for those of you who won’t buy a book without thumbing through it, PD Dr. Pablo Emilio Verde has you covered.