Data is eating the world and winning Nobel Prizes, hiding behind better marketing terms such as “artificial intelligence” or “AI.” The 2024 Nobel Prize in Physics was awarded to Geoffrey Hinton (the “godfather of AI”) and John Hopfield for their work on “machine learning with artificial neural networks.” Half of the Chemistry prize went to DeepMind’s Demis Hassabis and John Jumper for AlphaFold, a machine that learned to correctly predict the structure of proteins, also using artificial neural networks.
Both prizes were awarded for what used to be called “computational statistics”: complex statistical methods that identify patterns in large amounts of data, plus a process for developing a computer program that can find the same patterns or abstractions in a new data set. Both prizes celebrate the recent success of “connectionism” as an approach to developing “artificial intelligence” computer programs.
Born in the mid-1950s, artificial neural network-based “connectionism” was overshadowed until about a decade ago by “symbolic AI,” another approach born at the same time. The proponents of symbolic AI regarded the statistical analysis favored by the connectionists as “alchemy,” believing in the power of (their) human intelligence and its ability to come up with “rules” for logical thinking that could then be programmed to make computers think, reason, and plan. This was the dominant ideology in computer science in general, the fervent belief behind “expert systems” (as one branch of symbolic AI was labeled): that it is possible for experts (computer scientists, AI researchers and developers) to distill human knowledge into computer code.
When Jonathan Rosenfeld, Somite.ai co-founder and CTO, recently explained to me his AI Scaling Laws, he mentioned Rich Sutton’s “The Bitter Lesson” in the context of why he (Rosenfeld) wants to “do better than experts.” Reviewing AI breakthroughs in chess, Go, speech recognition and natural language processing, Sutton concluded that “the only thing that matters in the long run is the leveraging of computation.” The falling cost of a unit of computation, or “Moore’s Law,” is all that matters.
Sutton drew two lessons from computing’s bitter lesson. One insight is that what experts think about thinking and the rules they come up with don’t matter much because it’s futile to “find simple ways to think about the contents of minds.” The other lesson is that crunching numbers (eventually) triumphs every time because, unlike experts, it scales: “the power of general-purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great.”
Sutton noted that “the two methods that seem to scale arbitrarily in this way are search and learning,” methods that have served as the foundation of recent AI breakthroughs such as image classification, AlphaFold, and LLMs. But while the cost of computation has been rapidly and steadily decreasing for decades, these breakthroughs happened only over the last decade. Why?
No doubt Sutton has highlighted an important factor—the declining cost of computation—in the recent triumph of artificial neural networks (or deep learning or computational statistics). Writing in 2019, however, he should have acknowledged one other important contributor to the sudden triumph of connectionism: the availability of lots and lots of data.
When Tim Berners-Lee invented the World Wide Web thirty-five years ago, he (and the many other inventors who followed him) created a giant data repository accessible to billions of internet users worldwide. Coupled with new tools (primarily the smartphone) for creating and sharing data in multiple forms (text, images, video), the Web supplied the crucial factor enabling the recent success of the old-new approach to “AI.”
The declining cost of computation and the discovery that GPUs were the most efficient way to perform the calculations needed to find patterns in lots of data did not, on their own, enable the 2012 breakthrough in image classification. The prime contributor to that breakthrough was the availability of labeled images scraped from the Web and assembled in 2009 into ImageNet, an organized database. Similarly, the invention in 2017 of a new type of statistical model for processing and analyzing text was an important contributor to today’s ChatGPT bubble, but “generative AI” could not have happened without the massive amount of text (and images, and videos) available (with or without permission) on the Web.
Why is the declining cost of computation, or “Moore’s Law,” so central to descriptions and explanations of the trajectory of computing in general and of “AI” advances in particular? Why have computing industry observers missed the explosion of data, the most important trend driving technological innovation since at least the 1990s?
The term “data processing” was coined in 1954. “It’s not that all industry participants ignored data,” I wrote in 2019. “But data was perceived as the effect and not the cause, the result of having faster and faster devices processing more data and larger and larger containers (also enabled by Moore’s Law) to store it.”
“Data” is an ephemeral concept, difficult to define and quantify, unlike processing power, whose advance we could see with our own eyes in the rapid shrinking of computers. Also helping fuel the focus on processing rather than on data was Intel, a very successful marketing powerhouse.
“Data” had a brief PR success between about 2005 and 2015, with terms such as “Big Data” and “Data Science” becoming the talk of the day. But these were quickly eclipsed by the most successful marketing and branding campaign ever, that of “artificial intelligence.” Still, data continues to eat the world, for better and for worse. Finally, it even got, without explicit acknowledgment, two Nobel Prizes.