Big-data and scientific paradigms

Sudeep Adhikari


Jim Gray, a renowned American computer scientist, divided the history of scientific exploration into four distinct paradigms: Empirical (for instance, ancient Greek and Indian astronomy), Theoretical (Newtonian mechanics, quantum and relativistic mechanics, etc.), Computational (simulation of complex phenomena not amenable to analytical tools) and Data Exploration (unification of experiment, theory and simulation). According to Gray, data-intensive science based on data exploration characterizes the current scientific paradigm. Given the vast amount of data now at our disposal from past experiments, simulations and analytical studies, Gray argued that the "information velocity" of scientific research could be significantly increased if these data were made available online, allowing them to interoperate.

It would not be wrong to admit that complex systems such as the biosphere, the economy, society and consciousness may not be comprehensible in their mind-boggling totality within the confines of our current scientific methodology, which is further shaped by our intrinsically linear, causal and reductionistic outlook. So is the fourth paradigm, data exploration, our foray into a genuinely new scientific paradigm? I do not find this picture very encouraging, because data-intensive studies are quite obviously guided by brute-force statistical approaches. I am also not sure whether this approach is scientific in any definitive sense.

Chris Anderson, however, in his thought-provoking article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" in Wired (2008), argued that the conventional scientific method, which is based on building "testable hypotheses which can be further experimentally confirmed or falsified", is fast becoming obsolete. According to Anderson, big data is not just a lot of data; it is a different ontology in itself. Data without a model may be just noise, but with big data at our disposal, he stated rather arrogantly, correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. I still think this is very much debatable, but if we set aside for a while the deeper scientific value of "understanding how nature works", maybe this is the only practical way to make meaningful and pragmatic inferences about complex systems. As Peter Norvig, Google's research director, put it: all models are wrong, and increasingly you can succeed without them. What a time to be alive!

A whopping one-third of Amazon's purchases come from algorithmically generated recommendations on its webpage. Wal-Mart's algorithms similarly found that people preparing for massive storms tend to stock up on strawberry Pop-Tarts. This is not because Amazon's and Wal-Mart's algorithms have a better understanding of human psychology that lets them nudge people into buying odd things; it is simply the shape of the data discovered by their statistical algorithms. While I personally have strong reservations about treating such studies as belonging to the realm of science, their immense practical value in dealing with complex systems cannot be underestimated.

As discussed in the essay "The Emerging Science of Environmental Applications" by Jeff Dozier (University of California, Santa Barbara) and William B. Gail (Microsoft), the data-intensive fourth paradigm is not a science in the traditional sense, that is, one that is question-driven and guided by first principles, but a discipline aimed more at enabling us to identify practical courses of action. As the authors further elaborate, being "useful even when incomplete" is another primary signature of such an approach. This seems especially relevant to a complex issue such as climate change, where, although our current knowledge is fragmentary, there is an urgent need to make quick but useful inferences for the good of the Earth and all of humanity. Seen this way, the fourth scientific paradigm is not based on the way we do science in the traditional sense; its relevance lies in applying the vast amount of data at our disposal to make practical, actionable plans.

In a similar vein, John R. Delaney (University of Washington) and Roger S. Barga (Microsoft) describe Earth's ocean system as the largest and most complex biome, whose dynamics are still poorly understood. The fourth paradigm is most relevant here: we may never know the ocean system in its totality, but massive data storage, cloud computing, scientific workflows, advanced graphics and handheld supercomputing can ensure that the information needed to make informed decisions of an immediate, practical nature is available.

One striking example of the big-data paradigm applied in the realm of pure science is the discovery of the Higgs boson in 2012. The amount of data collected from collisions in the massive detectors of the Large Hadron Collider (LHC) was so large that it took two full years (2010 to 2012) to process it before the historic announcement could be made. In contrast to the traditional scientific paradigm, the existence of the Higgs boson was established not as a directly observed fact but solely on the basis of the shape of the data that flowed profusely from the LHC. Big data may therefore have a role to play not only in the pragmatic comprehension of complex systems but also in dealing with vastly complicated experimental events. How is that for the fourth paradigm of scientific exploration?

I am not going to discount the immense practicality of the fourth paradigm and its pivotal role in enabling us to make informed decisions, particularly when dealing with complex systems such as the environment and the economy. At the same time, it is an urgent duty of the philosophy of science to determine whether this paradigm is really a science in the proper sense, before we regress helplessly into a newer form of reductionistic dogmatism, with atoms replaced by bits. Is there not an eerily similar undercurrent of arrogance in Laplace's "I had no need of that hypothesis", his reply when Napoleon asked about the role of God in his celestial mechanics, and in Peter Norvig's conviction that Google doesn't need models to succeed?

The writer is a freelancer and writes regularly on science, philosophy, religion and pop-culture.

Published on 10 May 2020