We are pleased to post the following abstracts of forthcoming contributions:
What Difference Does Quantity Make? On the Epistemology of Big Data in Biology
Sabina Leonelli, University of Exeter, UK
This paper addresses the epistemological significance of big data within biology: is big data science a whole new way of doing research? Or, in other words: what difference does data quantity make to knowledge production strategies and their outputs? I argue that the novelty of big data science does not lie in the sheer quantity of data involved, though this certainly makes a difference to research methods and results. Rather, the novelty of big data science lies in (1) the prominence and status acquired by data as scientific commodity and recognised output; and (2) the methods, infrastructures, technologies, skills and knowledge developed to handle (format, disseminate, retrieve, model and interpret) data. These developments generate the impression that data-intensive research is a new mode of doing science, with its own epistemology and norms. I contend that in order to understand and critically discuss this claim, we need to analyze the ways in which data are actually disseminated and used to generate knowledge, and use such empirical study to question what counts as data in the first place. Accordingly, the bulk of this paper reviews the development of sophisticated ways to disseminate, integrate and re-use data acquired on model organisms over the last three decades of work in experimental biology. I focus on online databases as a key example of infrastructures set up to organise and interpret such data; and on the wealth and diversity of expertise, resources and conceptual scaffolding that such databases draw upon in order to function well, including the ‘Open Data’ movement which is currently playing an important role in articulating the reasons and incentives for sharing scientific data. This case study illuminates some of the conditions under which the evidential value of data posted online is assessed and interpreted by researchers wishing to use those data to foster discovery, which in turn informs a philosophical analysis of what counts as data in the first place, and how data relate to knowledge production. In my conclusions, I reflect on the difference that data quantity is making to contemporary biological research, the methodological and epistemic challenges of identifying and analyzing data given these developments, and the opportunities and worries associated with big data discourse and methods.
Big Data, social physics and spatial analysis: the early years
Trevor J Barnes, University of British Columbia and Matthew W. Wilson, Harvard University and University of Kentucky
This paper examines one of the historical antecedents of Big Data, the social physics movement. Its origins are in the scientific revolution of the seventeenth century in Western Europe. But it is not named as such until the middle of the nineteenth century, and not formally institutionalized until another hundred years later, when it is associated with work by George Zipf and John Stewart. Social physics is marked by the belief that large-scale statistical measurement of social variables reveals underlying relational patterns that can be explained by theories and laws found in natural science, and physics in particular. This larger epistemological position is known as monism, the idea that there is only one set of principles that applies to the explanation of both natural and social worlds. Social physics entered geography through the work of the mid-twentieth century geographer William Warntz, who developed his own spatial version called “macrogeography.” It joined the computation of large data sets, made ever easier by the contemporaneous development of the computer, with the gravitational potential model. Our argument is that Warntz’s concerns with numeracy, large data sets, machine-based computing power, relatively simple mathematical formulas drawn from natural science, and an isomorphism between natural and social worlds became grounds on which Big Data later staked its claim to knowledge; it is a past that has not yet passed.