Peter Norvig, “On Chomsky and the Two Cultures of Statistical Learning” (2011):
At the Brains, Minds, and Machines symposium held during MIT’s 150th birthday party, Technology Review reports that Prof. Noam Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior.
The transcript is now available, so let’s quote Chomsky himself:
It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success … which I think is novel in the history of science. It interprets success as approximating unanalyzed data.
This essay discusses what Chomsky said, speculates on what he might have meant, and tries to determine the truth and importance of his claims. Chomsky’s remarks were in response to Steven Pinker’s question about the success of probabilistic models trained with statistical methods.
- What did Chomsky mean, and is he right?
- What is a statistical model?
- How successful are statistical language models?
- Is there anything like their notion of success in the history of science?
- What doesn’t Chomsky like about statistical models?
The abstract of Leo Breiman, “Statistical Modeling: The Two Cultures” in Statistical Science (2001):
There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.