The two cultures of statistical modeling

Peter Norvig, “On Chomsky and the Two Cultures of Statistical Learning” (2011):

At the Brains, Minds, and Machines symposium held during MIT’s 150th birthday party, Technology Review reports that Prof. Noam Chomsky

derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior.

The transcript is now available, so let’s quote Chomsky himself:

It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success … which I think is novel in the history of science. It interprets success as approximating unanalyzed data.

This essay discusses what Chomsky said, speculates on what he might have meant, and tries to determine the truth and importance of his claims. Chomsky’s remarks were in response to Steven Pinker’s question about the success of probabilistic models trained with statistical methods.

  1. What did Chomsky mean, and is he right?
  2. What is a statistical model?
  3. How successful are statistical language models?
  4. Is there anything like their notion of success in the history of science?
  5. What doesn’t Chomsky like about statistical models?

The abstract of Leo Breiman, “Statistical Modeling: The Two Cultures” in Statistical Science (2001):

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
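A toy illustration of the contrast Breiman draws, not taken from his paper. It is a minimal sketch under stated assumptions: the synthetic data, the quadratic "true" mechanism, and the choice of k-nearest-neighbor averaging as the algorithmic model are all mine, picked for brevity. The data-model analyst assumes a straight line plus noise; the algorithmic modeler assumes nothing about the mechanism and is judged only on held-out predictive error.

```python
import random


def make_data(n, seed):
    """Synthetic data (an assumption for illustration): y = x^2 + small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Data-model culture: assume y = a + b*x + noise and estimate a, b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """Algorithmic culture: treat the mechanism as unknown and
    predict by averaging the responses of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared prediction error on a held-out sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(200, seed=1)
test_x, test_y = make_data(200, seed=2)

line_mse = mse(fit_line(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)

print(f"assumed linear data model, test MSE: {line_mse:.3f}")
print(f"k-nearest-neighbor model,  test MSE: {knn_mse:.3f}")
```

Because the assumed linear form is wrong for this data, the nearest-neighbor predictor has much lower held-out error. The sketch shows only that narrow point. It does not address Breiman's further claim that algorithmic models can also be the more informative ones, and a correctly specified data model would do well on this data.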

“White box” machine learning

From “A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action,” Cell 177, 1–13 (2019):

Data-driven machine learning activities are poised to transform biological discovery and the treatment of human disease (Camacho et al., 2018, Wainberg et al., 2018, Webb, 2018, Yu et al., 2018a); however, existing techniques for extracting biological information from large datasets frequently encode relationships between perturbation and phenotype in opaque “black-boxes” that are mechanistically uninterpretable and, consequently, can only identify correlative as opposed to causal relationships (Ching et al., 2018). In natural systems, biological molecules are biochemically organized in networks of complex interactions underlying observable phenotypes; biological network models may therefore harbor the potential to provide mechanistic structure to machine learning activities, yielding transparent “white-box” causal insights (Camacho et al., 2018, Yu et al., 2018b).

Chemical and genetic screens are workhorses in modern drug discovery but frequently suffer from poor (1%–3%) hit rates (Roses, 2008). Such low hit rates often underpower the bioinformatic analyses used for causal inference because of limitations in biological information content. Experimentally validated network models possess the potential to expand the biological information content of sparse screening data; however, biological screening experiments are typically performed independently from network modeling activities, limiting subsequent analyses to either post hoc bioinformatic enrichment from screening hits or experimental validation of existing models. Therefore, there is a need to develop biological discovery approaches that integrate biochemical screens with network modeling and advanced data analysis techniques to enhance our understanding of complex drug mechanisms (Camacho et al., 2018, Wainberg et al., 2018, Xie et al., 2017). Here we develop one such approach and apply it to understanding antibiotic mechanisms of action.


Machine learning aims to generate predictive models from sets of training data; such activities are typically comprised of three parts: input data, output data, and the predictive model trained to compute output data from input data (Figure 1A; Camacho et al., 2018). Although modern machine learning methods can assemble high-fidelity input-output associations from training data, the functions comprising the resulting trained models often do not possess tangible biochemical analogs, rendering them mechanistically uninterpretable. Consequently, predictive models generated by such (black-box) machine learning activities are unable to provide direct mechanistic insights into how biological molecules are interacting to give rise to observed phenomena. To address this limitation, we developed a white-box machine learning approach, leveraging carefully curated biological network models to mechanistically link input and output data (Yu et al., 2018b).

h/t Anne Trafton of MIT News, “Painting a Fuller Picture of How Antibiotics Act”:

Markus Covert, an associate professor of bioengineering at Stanford University, says the study is an important step toward showing that machine learning can be used to uncover the biological mechanisms that link inputs and outputs.

“Biology, especially for medical applications, is all about mechanism,” says Covert, who was not involved in the research. “You want to find something that is druggable. For the typical biologist, it hasn’t been meaningful to find these kinds of links without knowing why the inputs and outputs are linked.”

The most vigorous exercise

C. S. Peirce, “§10. Kinds of Reasoning,” in Chapter 2, “Lessons from the History of Science,” of Principles of Philosophy:

The methods of reasoning of science have been studied in various ways and with results which disagree in important particulars. The followers of Laplace treat the subject from the point of view of the theory of probabilities. After corrections due to Boole and others, that method yields substantially the results stated above. Whewell described the reasoning just as it appeared to a man deeply conversant with several branches of science as only a genuine researcher can know them, and adding to that knowledge a full acquaintance with the history of science. These results, as might be expected, are of the highest value, although there are important distinctions and reasons which he overlooked. John Stuart Mill endeavored to explain the reasonings of science by the nominalistic metaphysics of his father. The superficial perspicuity of that kind of metaphysics rendered his logic extremely popular with those who think, but do not think profoundly; who know something of science, but more from the outside than the inside, and who for one reason or another delight in the simplest theories even if they fail to cover the facts.

Mill denies that there was any reasoning in Kepler’s procedure. He says it is merely a description of the facts. He seems to imagine that Kepler had all the places of Mars in space given him by Tycho’s observations; and that all he did was to generalize and so obtain a general expression for them. Even had that been all, it would certainly have been inference. Had Mill had even so much practical acquaintance with astronomy as to have practised discussions of the motions of double stars, he would have seen that. But so to characterize Kepler’s work is to betray total ignorance of it. Mill certainly never read the De Motu [Motibus] Stellae Martis, which is not easy reading. The reason it is not easy is that it calls for the most vigorous exercise of all the powers of reasoning from beginning to end.