Friday, 30 May 2014

Forthcoming in the Early Career Researcher Forum: Heather Ford and Brooke Foucault Welles

Big Data and Small: Collaborations Between Ethnographers and Data Scientists
Heather Ford, Oxford Internet Institute, UK

In the past three years, the ethnographer and now PhD student Heather Ford has worked on ad hoc collaborative projects around Wikipedia sources with two data scientists from Minnesota, Dave Musicant and Shilad Sen. In this essay, she talks about how the three met, how they worked together and what they gained from the experience. Three themes became apparent through their collaboration: that data scientists and ethnographers have much in common, that their skills are complementary, and that discovering the data together rather than compartmentalising research activities was key to their success.

On Minorities and Outliers: The Case for Making Big Data Small
Brooke Foucault Welles, Northeastern University, US

In this essay, I make the case for choosing to examine small subsets of Big Data datasets – making big data small. Big Data allow us to produce summaries of human behaviour at a scale never before possible. But in the push to produce these summaries, we risk losing sight of a secondary but equally important advantage of Big Data – the plentiful representation of minorities. Women, minorities and statistical outliers have historically been omitted from the scientific record, with problematic consequences. Big Data afford the opportunity to remedy those omissions; however, to do so, Big Data researchers must choose to examine very small subsets of otherwise large datasets. I encourage researchers to embrace an ethical, empirical and epistemological stance on Big Data that treats minorities and outliers as reference categories, rather than as exceptions to statistical norms.

Monday, 28 April 2014

Forthcoming from Leonelli on Big Data in Biology, and Barnes & Wilson on Big Data, Social Physics and Spatial Analysis

We are pleased to post the following abstracts of forthcoming contributions:


What Difference Does Quantity Make? On the Epistemology of Big Data in Biology

Sabina Leonelli, University of Exeter, UK

This paper addresses the epistemological significance of big data within biology: is big data science a whole new way of doing research? Or, in other words: what difference does data quantity make to knowledge production strategies and their outputs? I argue that the novelty of big data science does not lie in the sheer quantity of data involved, though this certainly makes a difference to research methods and results. Rather, the novelty of big data science lies in (1) the prominence and status acquired by data as scientific commodity and recognised output; and (2) the methods, infrastructures, technologies, skills and knowledge developed to handle (format, disseminate, retrieve, model and interpret) data. These developments generate the impression that data-intensive research is a new mode of doing science, with its own epistemology and norms. In order to understand and critically discuss this claim, we need to analyse the ways in which data are actually disseminated and used to generate knowledge, and use such empirical study to question what counts as data in the first place. Accordingly, the bulk of this paper reviews the development of sophisticated ways to disseminate, integrate and re-use data acquired on model organisms over the last three decades of work in experimental biology. I focus on online databases as a key example of infrastructures set up to organise and interpret such data; and on the wealth and diversity of expertise, resources and conceptual scaffolding that such databases draw upon in order to function well, including the ‘Open Data’ movement, which is currently playing an important role in articulating the reasons and incentives for sharing scientific data. This case study illuminates some of the conditions under which the evidential value of data posted online is assessed and interpreted by researchers wishing to use those data to foster discovery, which in turn informs a philosophical analysis of what counts as data and how data relate to knowledge production. In my conclusions, I reflect on the difference that data quantity is making to contemporary biological research, the methodological and epistemic challenges of identifying and analysing data given these developments, and the opportunities and worries associated with big data discourse and methods.


Big Data, social physics and spatial analysis: the early years

Trevor J. Barnes, University of British Columbia, and Matthew W. Wilson, Harvard University and University of Kentucky

This paper examines one of the historical antecedents of Big Data, the social physics movement. Its origins lie in the scientific revolution of the seventeenth century in Western Europe, but it was not named as such until the middle of the nineteenth century, and not formally institutionalized until another hundred years later, when it became associated with work by George Zipf and John Stewart. Social physics is marked by the belief that large-scale statistical measurement of social variables reveals underlying relational patterns that can be explained by theories and laws found in natural science, and physics in particular. This larger epistemological position is known as monism, the idea that there is only one set of principles that applies to the explanation of both natural and social worlds. Social physics entered geography through the work of the mid-twentieth-century geographer William Warntz, who developed his own spatial version called “macrogeography.” It involved the computation of large data sets, made ever easier by the contemporaneous development of the computer, joined with the gravitational potential model. Our argument is that Warntz’s concerns with numeracy, large data sets, machine-based computing power, relatively simple mathematical formulas drawn from natural science, and an isomorphism between natural and social worlds became grounds on which Big Data later staked its claim to knowledge; it is a past that has not yet passed.


Wednesday, 19 March 2014

Forthcoming from Burrows & Savage, ‘After the Crisis?’, and Lyon, ‘Surveillance in the Era of Big Data’

We are pleased to post the following draft abstracts of forthcoming contributions:

COMMENTARY

After the Crisis? Big Data and the Methodological Challenges of Empirical Sociology
Roger Burrows, Goldsmiths, University of London, and Mike Savage, LSE, UK

Google Trends reveals that at the time we were writing our article on ‘The Coming Crisis of Empirical Sociology’ in 2007, almost nobody was searching the Internet for ‘big data’. It was only towards the very end of 2010 that the term began to register, just ahead of an explosion of interest from 2011 onwards. In this commentary we take the opportunity to reflect back on the claims we made in that original paper in light of more recent discussions about the social scientific implications of the inundation of digital data. Did our paper, with its emphasis on the emergence of what we termed ‘social transactional data’ and ‘digital byproduct data’, prefigure contemporary debates that now form the basis and rationale for this excellent new journal? Or was the paper more concerned with broader methodological, theoretical and political debates that have somehow been lost in all of the loud babble that has come to surround BIG DATA? This brief paper thus offers a reflexive and critical reflection on what has become – much to the surprise of its authors – one of the most cited papers in the discipline of sociology in the last decade.

RESEARCH ARTICLES

Surveillance in the era of big data: capacities, consequences, critique
David Lyon, Queen’s University, CA

The Snowden revelations about NSA surveillance, starting in 2013, along with the ambiguous complicity of internet companies and the international controversies that followed, provide a perfect segue into contemporary conundrums of surveillance and big data. Attention has shifted from late twentieth-century information technologies (IT) and networks to a twenty-first-century focus on data, currently crystallized in ‘big data’. Affecting not only national security and marketing but also health care, finance, policing, democratic elections, education and research, among others, big data is seen as a political and business priority. Big data intensifies certain surveillance trends associated with IT and networks, and is thus implicated in fresh but fluid configurations. This is considered in three main ways. One, the capacities of big data (including metadata) intensify surveillance by expanding interconnected datasets and analytical tools. Existing dynamics of influence, risk-management and control increase their speed and scope through new techniques, especially predictive analytics. Two, while big data appears to be about size, qualitative change in surveillance practices is also perceptible, accenting consequences. Important trends persist – the control motif, faith in technology, public-private synergies and user-involvement – but the future-orientation increasingly severs surveillance from history and memory, and the quest for pattern-discovery is used to justify unprecedented access to data. Three, the ethical turn becomes more urgent as a mode of critique. Modernity’s predilection for privacy betrays the subjects of surveillance who, so far from conforming to the abstract, disembodied image of both computing and legal practices, are engaged and embodied users-in-relation whose activities both fuel and foreclose surveillance. Rights to freedom and dignity appear more relevant than appeals to order and privacy.



Sunday, 9 February 2014

First Volume Contents III: Forthcoming contributions from researchers at Oxford Internet Institute and Microsoft Research

We are pleased to post the following draft abstracts of forthcoming contributions:


RESEARCH ARTICLES

Constructing meaning through big data: Reflexive triangulation and the problem of ground truth in user-generated content

Bernie Hogan, Mark Graham, Ahmed Medhat, David Palfrey, and Ralph Straumann, Oxford Internet Institute

Much has been said about the power of ‘big data’ to reveal social patterns and processes that were previously impossible to map and measure. A central part of these assertions has been the idea that with sufficient volumes, velocities or variety of data, such data become an increasingly accurate representation of the social world. Embedded in this claim are both a notion of the social world as stable and coherent, and the notion that meaning can emerge from data of a sufficient size.

Through an empirical analysis of spatial meaning in user-generated big data in Wikipedia, we demonstrate key considerations that must be internalized in the research process rather than discovered in the data. We detail how the triangulation of methods via visual inspection, natural language processing, statistical analysis and crowdsourcing identifies six interpretive challenges for research: presence, correctness, resolution, uniqueness, stability, and authenticity.

For instance, location is often considered to be an objective piece of information in Wikipedia and is most usually articulated through points of latitude and longitude stored in a structured database. However, when analyzing the location embedded in articles and user pages, it becomes apparent that there is nothing neutral about the creation, analysis, filtering, or usage of this information. An article may not have a geotag despite being about a spatial entity (presence); users are continuously adding information to the Wikipedia database (stability); the act of embedding information is sometimes fraught with errors (correctness); its accuracy can be truncated (resolution); an article can include more than one point or a user may be ‘from’ multiple places (uniqueness); and the georeferencing of a place is not sufficient to associate a user with that place (authenticity). Very little of this complexity is apparent from just looking at labels of geographic coordinates.

This work ultimately demonstrates that data can never speak for themselves and points to a need for a reflexive consideration of the correspondence between research goals and empirical data. At every step of the process, different meanings and intentions have shaped the geographic information stored in Wikipedia’s databases. In other words, when trying to extract meaning from the entirety of Wikipedia’s store of information, we can demonstrate that there is never any kernel of ground truth behind layers of user-generated content. Focusing on the codified ‘big data’ rather than the processes through which the codification happened runs the risk of concealing the technologies, motivations and cultures that come together to produce this content.

Although we focus on geographic content, these claims are not limited to geography. The act of creating these informational spaces is an inherently socially constructed process, regardless of the external referent. The same can be said for forging relationships on social network sites, where mixed methods uncover different notions of “friend”, and on voting sites, where mixed methods uncover different notions of “approval” or “upvoting”. In all cases, a reflexive triangulation can aid researchers in uncovering workable analyses rather than assuming the data will do it for them. Our archaeology of data ultimately demonstrates that large-scale statistical analysis and small-scale interpretive work are not mutually exclusive. Rather, they are mutually constitutive in the act of doing research. A recognition of this interplay across scales is necessary for work that preserves the reflexive nature of qualitative work and the scalability of big data analysis.


COMMENTARY

Data and its street life 

Alex Taylor, Microsoft Research

Is it just me, or does the talk about Big Data seem to be everywhere? Turning the (virtual) printed pages in the popular press, listening to the radio, and indeed perusing academic databases, so much of what I see and hear points towards Big Data and its somewhat unnerving potential to reveal hitherto unexpected things about humanity (and yes, the claims are often that grand). Census results; local statistics on crime, education, property prices and the like; our shopping habits; our tweets and Facebook likes. All of it, it seems, is contributing to a veritable deluge of data to be aggregated and analysed. If we’re to believe the hype, altogether these data are enabling unprecedented access to who we are as individuals and citizens, and what we want.

What I find troubling, however, is that little of the rhetoric behind this big data appears to be about what ordinary folk – the person on the street, if you will – might get from or do with their own data. Governments, corporate organisations, ad agencies, health providers and the like all seem keen to amass data and, to be somewhat cynical about it, ‘monetise’ their ‘data assets’. Yes, some innovative and progressive attempts to redistribute agency are being developed here. Along with others, the UK government, for instance, is actively involved in programmes to enable public access to its full range of data. Yet thus far it’s not clear to me who the intended audience is. A fair bit of know-how and expertise is needed to begin making sense of the award-winning data.gov.uk site, let alone to actually analyse the data and put it to use.

So why, precisely, does this deluge of data matter, and for whom does it matter? In what form are the data being materialised, and are these (new) forms enacting (to use the sociological parlance) anything different, anything that might just make a material difference to those of us on the street who are arguably producing much of the data? Oh, and amongst all of this, we must also ask: is there the possibility of something else? Do the innovations in Big Data and big data analytics allow us to imagine something different, something better?

It’s with these questions as an impetus that a mixture of designers, makers, social scientists and I have decided to quite literally take things back to the street. For a year, seven or eight of us from my organisation, along with some academic collaborators, are hoping to develop a sense of why data might matter for people – people bound not in some seemingly arbitrary way through nodes in a dataset, but by the common place they live in, namely a street in Cambridge, Tenison Road. Working with the residents of Tenison Road, our aim is to discover when data comes to matter on a street and how such a community might make it matter. Data is to be cast as coming from somewhere and intended for someone, not, as it would seem Big Data has it, from nowhere and for no one. Ultimately, we see this as a project in the making, already and always about a street and its emerging data. Our wish is to help make that data something relevant and purposeful.

In this commentary, I will describe some of the engagements we have had thus far with the street and our plans for the coming year. I will also touch on a few of the systems and services we are putting into place to enable Tenison Road’s residents to collect, aggregate and use their data. The work will be couched in a materialist understanding of data and will describe how we are drawing on a feminist epistemology as a means to (re-)imagine alternative social and techno-scientific possibilities.


DEMONSTRATION

Public displays of street data
 
Alex Taylor, Microsoft Research

As part of the engagement with Tenison Road, we are developing a range of ideas for monitoring movement along the street, e.g., automotive traffic, cyclists, pedestrians, etc. This demonstration will detail the building of these systems and how they have been designed to visualise the street’s data publicly. We’ll explain how these systems have emerged through meetings we’ve had with Tenison Road’s residents, in which they’ve voiced worries about new building developments in the area. The concern is for the traffic, pollution, loss of heritage and, overall, the changing quality of life on the street. We’ll describe how the attempts to track movement are intended to sit alongside parallel data-centred efforts to catalogue the changing nature of Tenison Road, and to remain situated in the practical concerns of the street’s residents.

Thursday, 16 January 2014

First Volume Contents II: Forthcoming Research Articles by Kitchin, Couldry and Powell, and Turnbull

We are pleased to announce the following additional contributions to be published in 2014:

DRAFT ABSTRACTS: RESEARCH ARTICLES

Big Data, New Epistemologies and Paradigm Shifts

Rob Kitchin, National Institute for Regional and Spatial Analysis, National University of Ireland

This paper examines how the availability of big data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that big data and new data analytics are disruptive innovations that are in many instances reconfiguring how research is conducted, and that there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place.

Big Data from the Bottom Up

Nick Couldry and Alison Powell, London School of Economics

Contemporary societies are characterised by the way that all large-scale processes of strategic management (whether by corporations, governments or any other entity) increasingly rely on total surveillance: that is, on the continuous gathering and analysis of, and actions adjusted for, dynamically collected, individual-level data about what people are, do and say (big data). Compared to representative sampling, this approach to data collection is totalizing; it is also characterized by the aggregation of multiple data sets through the use of calculation algorithms. This seemingly greater role for algorithms has led some commentators to focus on the alienating quality of the resulting ‘algorithmic power’ (Lash), an approach which leaves no room for agency or reflexivity on the part of ‘smaller’ actors. We posit that the emerging culture of data collection deserves to be examined in a way that foregrounds the agency and reflexivity of individual actors as well as the variable ways in which power and participation can be constructed. This article will offer a ‘social’ approach to the construction and use of data and analytics. The approach improves on current understandings of big data’s theoretical and empirical qualities by thinking through how the new aspects of this type of data are relevant for people, their practices and their politics.

More specifically, the paper will explore, schematically, some key sites of fieldwork intervention that emerge within such an approach. First, work within a ‘social analytics’ approach (Couldry and Fotopoulou, forthcoming), which studies the uses by social actors of data analytics of various sorts (whether generally available or customised) to sustain the sort of digital presence they need to have and, more generally, to achieve their broader social and civic purposes. Second, the study of ‘data as media’, which connects with how people are talking about emerging internet sensor-networks, and the questions about relative structural operations of power (and indeed participation, when data is media) that can be empirically investigated around ‘smart’ cities and emerging ‘internet of things’ applications. Third, turning to the wider issues for power and its resistance, we will discuss the increasing importance not just of voice (Couldry 2010) but also of ‘visibility’ or ‘transparency’ when we think about the particular qualities of data as ‘media’.

Changing Topographies and In/vulnerabilities of Techno-scientific Knowledge Production in ‘The Big Data Revolution’

David Turnbull, Victorian Eco-Innovation Lab (VEIL), Faculty of Architecture, University of Melbourne

The idea of accumulating everything, of establishing a sort of general archive, the will to enclose in one place all times, all epochs, all forms, all tastes, the idea of constituting a place of all times that is itself outside of time and inaccessible to its ravages, the project of organizing in this way a sort of perpetual and indefinite accumulation of time in an immobile place, this whole idea belongs to our modernity. (Of Other Spaces, Foucault, 1967)

Contemporary databases, seen as the culmination of a long line of various information technologies, might now be recognized as “perhaps the most powerful technology in our control of the world and each other.” (Memory Practices in the Sciences, Bowker, 2005)

There is no political power without control of the archive, if not of memory. Effective democratisation can always be measured by…access to the archive, its constitution and its interpretation. (Archive Fever, Derrida, 1995)

In a period of extraordinarily rapid transformation since around 2010, the advanced economies of the world have moved from the relatively immature stages of what was once celebrated as ‘the information age’, ‘the digital era’ and ‘the knowledge economy’ to what the IT trade boosters promote as ‘Analytics 3.0’,[1] the ‘Fourth Paradigm’[2] and the ‘Big Data Revolution’.[3] Now it is claimed that the entirety of events in the universe can be digitally recorded, quantified, assembled, mapped and analysed algorithmically, enabling data mining and pattern recognition on the most massive of scales, which in turn will reveal the answers to many questions, including ones not yet asked.[4]

In addition to heralding a profound epistemic transformation in the effort to create a panoptic archive, the ‘Big Data revolution’ also seems set to introduce massive economic, social and political transformations in knowledge production, privacy and identity, most visibly in the massive efflorescence of global surveillance revealed by Edward Snowden. Economically, the Big Data revolution can be seen as analogous to the industrial revolution, where labour and land were accumulated through dispossession and enclosure of the commons. This time the object of accumulation is another common good, the previously unrecognised and unvalued asset class: data.[5] The world is undergoing the latest phase of the socio-technical change that Heidegger described as ‘enframing’ – the continual setting up, ordering, transformation and revealing of everything in the world as ‘standing reserve’, as something ready to be used, and to be transformed and used again. But now not only are ‘our everyday lives turned into data, a resource to be used by others, usually for profit’, they are also directly accessed by the world’s security agencies in pursuit of total surveillance.[6] As Bruce Rich points out, enclosure and framing have reciprocal meanings: ‘enframing has its historical counterpart in the interpretation of western economic development as a process of enclosure’, and according to the OED one definition of enclosure is to insert in a frame or a setting.[7] As ever, the terms and conditions of what it means to know and to interact are being shaped by the technologies that provide communication and information.

Big data does not, of course, exist in isolation; it has come into being in part through the development of the interconnective network made possible by the internet, and through the ubiquitous invisible web of digitisation, software, code and algorithms that are the calculative framework and infrastructure of modern society.[8] Big Data in combination with the internet has generated a new knowledge space, a commons which initially held great promise as a knowledge democracy in which everyone would have equal, unfettered rights, along with open access and participation in all there was to know. But paradoxically, the interconnectivity and the generation of massive volumes of data that make the augmented knowledge commons possible also create the conditions for ‘digital enclosure’.[9] The forms of enclosure of the knowledge commons are multiple, complex and emergent, but all serve to restrict access, freedom and innovation through surveillance, territorial control and commodification, or through standardization and ontological constriction.

Many would argue that there is nothing new or revolutionary about Big Data; science and the state have always aimed to accumulate as much data as possible. What makes it possible to argue that big data is now revolutionising knowledge production and surveillance is that the development of new modes of data assemblage and analysis makes it seem not just plausible but essential to establish a regime of recording everything, and hence to know everything. The evanescent dream of knowing everything and everyone is reborn as a combination of the panoptic archive and the ‘surveillant assemblage’.[10] Radically powerful and insightful new ways of understanding the world are indeed being opened up, but unsurprisingly the new world of Big Data is attended by paradoxes, contradictions, exclusions, restrictions, discriminations and occlusions.[11]




[1] Davenport, T.H. (2013) Preparing for Analytics 3.0. Retrieved 28 February 2013 from http://blogs.wsj.com/cio/2013/02/20/preparing-for-analytics-3-0/.
[2] Hey, T., Tansley, S., et al. (eds) (2009) The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond: Microsoft Research.
[3] Mayer-Schönberger, V. and Cukier, K. (2013) Big Data: A Revolution That Will Transform How We Live, Work and Think. Boston: Houghton Mifflin.
[4] Steiner, C. (2012) Automate This: How Algorithms Came to Rule the World. New York: Portfolio/Penguin.
[5] World Economic Forum (2011) Personal Data: The Emergence of a New Asset Class. http://www3.weforum.org/docs/WEF_ITTC_PersonalDataNewAsset_Report_2011.pdf
[6] Berry, D. (2011) The Philosophy of Software: Code and Mediation in the Digital Age. New York: Palgrave Macmillan, p. 2, citing Heidegger, M. (1993) ‘The Question Concerning Technology’, p. 32.
[7] Rich, B. (1995) Mortgaging the Earth: The World Bank, Environmental Impoverishment, and the Crisis of Development. Boston: Beacon Press, p. 238.
[8] Graham, S.D.N. (2005) ‘Software-sorted Geographies’, Progress in Human Geography 29: 562-80; Kitchin, R. and Dodge, M. (2011) Code/Space: Software and Everyday Life. Cambridge, MA: MIT Press.
[9] Andrejevic, M. (2007) iSpy: Surveillance and Power in the Interactive Era. Lawrence: University Press of Kansas; Andrejevic, M. (2007) ‘Surveillance in the Digital Enclosure’, The Communication Review 10: 295-317; Schiller, D. (2007) How to Think About Information. Urbana: University of Illinois Press.
[10] Haggerty, K. and Ericson, R. (2000) ‘The Surveillant Assemblage’, British Journal of Sociology 51(4): 605-22.
[11] Foucault, M. (1986) ‘Of Other Spaces’, Diacritics 16(Spring): 22-27; Bowker, G. (2013) ‘The Theory/Data Thing’, International Journal of Communication 7: 1-20.

Tuesday, 17 December 2013

First Volume Contents I: Forthcoming Research Articles and Commentaries

Over the coming months we will periodically post draft titles and abstracts of forthcoming contributions to the First Volume of the Journal. We are pleased to announce the following:

DRAFT ABSTRACTS: RESEARCH ARTICLES

Mapping the Dynamics of Climate Negotiations

Tommaso Venturini, Jean-Philippe Cointet, Nicolas Baya-Laffite, Ian Gray, and Vinciane Zabban, médialab, SciencesPo, FR

The advent of digital traces and tools capable of mapping these traces is bringing about a profound renewal in the social sciences. Thanks to these tools' abilities to handle vast quantities of data without reducing their quality through aggregation, emerging digital methods promise to overcome the classic gap between qualitative and quantitative methods.

For this promise to be fulfilled, however, three major misunderstandings about the nature of digital methods should be resolved: 1) a conception of digital traces by many social scientists that is both too narrow (in terms of their sources) and too ambitious (in terms of their representativeness); 2) an alternation between disregard and paranoia about the conditions of production of digital traces; 3) the tendency to confuse the digital and the automatic.

In this article the advantages and the misunderstandings of the methodological shift triggered by digital mapping will be discussed through the case study of international negotiations about climate change. Though these negotiations are crucially important for the future of our societies, and though they are thoroughly transcribed and made available on the Web, traceability does not necessarily guarantee legibility. Despite the availability of data, following the dynamics of the climate debate remains a challenge. How can one make sense of 20 years of negotiations (and thousands of lines of transcriptions) while remaining sensitive to the individual diplomatic moves that might have steered the negotiations?

In this article, we discuss the efforts we deployed to 1) extract the dynamic network of actors and arguments mobilized in the negotiations convened under the United Nations Framework Convention on Climate Change (UNFCCC), and 2) provide a qualitative-quantitative map of the debate, capable of zooming out to the major trends and coalitions over 20 years of negotiations and of zooming in on actor-specific statements made during particular sessions of the UNFCCC Conference of the Parties (COP).

Bigger and better, or more of the same? Emerging practices and perspectives on big data analysis in economics

Linnet Taylor, University of Amsterdam, NL and Ralph Schroeder and Eric Meyer, Oxford Internet Institute (OII), UK

Although the terminology of big data has so far gained little traction in economics, the availability of unprecedentedly rich datasets and the need for new approaches – both epistemological and computational – to deal with them are emerging issues for the discipline. Using interviews conducted with a cross-section of economists, this paper examines perspectives on ‘big data’ across the discipline, the new types of data being used by researchers on economic issues, and the range of responses to this opportunity amongst economists. First, we offer an overview of the geographic, professional and disciplinary networks that govern how big data is accessed and used by economists. Next, we outline the areas in which it is being used, including the prediction and ‘nowcasting’ of economic trends; mapping and predicting influence in the context of marketing; and acting as a cheaper or more accurate substitute for existing types of data such as censuses or labour market data. We then analyse the broader current and potential contributions of big data to economics, such as the ways in which econometric methodology is being used to shed light on questions beyond economics, how big data is improving or changing economic models, and the kinds of collaborations arising around big data between economists and other disciplines.

DRAFT ABSTRACTS: COMMENTARY

Official Statistics and Big Data

Peter Struijs, Barteld Braaksma and Piet Daas, Statistics Netherlands, NL

The advent of Big Data is expected to have a big impact on organisations for which the production and analysis of information is core business. National Statistical Institutes (NSIs) are such organisations. They are responsible for official statistics, which are heavily used by policy makers and other key players in society. The way they take up Big Data will eventually influence all of society. This contribution describes the changing position of NSIs in the age of Big Data. Up to, say, 1980, data was a scarce commodity. Official statistical information based on survey data had a unique value: there simply was no alternative. In the last couple of decades, more and more public administrations have started to systematically collect data. Statistical data collection by means of questionnaires was supplemented and partially replaced by the use of administrative data sources. The information provided by NSIs remained unique. In particular, the possibility to combine data from different sources made official statistics even more valuable, since in many countries no other organisation was positioned to do so. However, Big Data is going to change that, as both individuals and businesses start to create huge amounts of data.

Big Data sources create a number of opportunities for NSIs. There is a huge potential for new statistics. Location data from mobile phones could be used for almost instantaneous daytime population and tourism statistics. Social media messages could be used for several types of indicators, such as an early indicator of consumer confidence. Inflation figures could be derived from price information on the web. And so on. In addition, Big Data sources may be used to substitute or supplement more traditional data sources, such as questionnaires and administrative data sources, for already existing statistics.

The use of Big Data for official statistics, however, poses a number of challenges to NSIs. An obvious one is handling unusually large data sets. This may induce new forms of cooperation with data providers, IT vendors and academia, and change training needs. Changes in methods may require different skills and a different mindset. Other issues concern privacy, confidentiality and data security, which have recently gained weight with the increased public awareness that intelligence agencies are among the most active Big Data users.

Even more important for NSIs is the change in their position. Businesses have already started to produce information similar to what is traditionally provided by official statistical institutes, and they may become able to address new and existing statistical users’ needs. However, the institutional position of NSIs and their traditional values can still be a strength. NSIs are trusted parties that put a lot of effort into sound methodologies, quality control and privacy protection. By exploiting these values they can give an independent, professional judgement on the quality of information provided by other parties. Moreover, they have unique data collection opportunities, including special access to administrative data sources. This could be extended to Big Data sources.

The future of official statistics in the age of Big Data is still hazy, but it is clear that the international statistical community needs to adapt to the new reality and react to the opportunities and challenges it provides. More collaboration will be needed among players inside and outside that community. As this contribution will show, there are already many activities going on and many national and international initiatives have been launched. For official statistics, the Big Data era is a most exciting time.