Sunday, 9 February 2014

First Volume Contents III: Forthcoming contributions from researchers at Oxford Internet Institute and Microsoft Research

We are pleased to post the following draft abstracts of forthcoming contributions:


Constructing meaning through big data: Reflexive triangulation and the problem of ground truth in user-generated content

Bernie Hogan, Mark Graham, Ahmed Medhat, David Palfrey, and Ralph Straumann, Oxford Internet Institute

Much has been said about the power of ‘big data’ to reveal social patterns and processes that were previously impossible to map and measure. A central part of these assertions has been the idea that with sufficient volumes, velocities or variety of data, such data becomes an increasingly accurate representation of the social world. Embedded in this claim is both a notion of the social world as stable and coherent, and the notion that meaning can emerge from data that are of a sufficient size.

Through an empirical analysis of spatial meaning in user-generated big data in Wikipedia, we demonstrate key considerations that must be internalized in the research process rather than discovered in the data. We detail how triangulating methods (visual inspection, natural language processing, statistical analysis and crowdsourcing) identifies six interpretive challenges for research: presence, correctness, resolution, uniqueness, stability, and authenticity.

For instance, location is often considered to be an objective piece of information in Wikipedia and is most usually articulated through points of latitude and longitude stored in a structured database. However, when analyzing the location embedded in articles and user pages, it becomes apparent that there is nothing neutral about the creation, analysis, filtering, or usage of this information. An article may not have a geotag despite being about a spatial entity (presence); users are continuously adding information to the Wikipedia database (stability); the act of embedding information is sometimes fraught with errors (correctness); its accuracy can be truncated (resolution); an article can include more than one point or a user may be ‘from’ multiple places (uniqueness); and the georeferencing of a place is not sufficient to associate a user with that place (authenticity). Very little of this complexity is apparent from just looking at labels of geographic coordinates.

This work ultimately demonstrates how data can never speak for themselves and points to a need for a reflexive consideration of the correspondence between research goals and empirical data. At every step of the process, different meanings and intentions have shaped the geographic information stored in Wikipedia’s databases. In other words, when trying to extract meaning from the entirety of Wikipedia’s store of information, we can demonstrate how there is never any kernel of ground truth behind layers of user-generated content. Focusing on the codified ‘big data’ rather than the processes through which the codification happened runs the risk of concealing the technologies, motivations and cultures that come together to produce this content.

Although we focus on geographic content, these claims are not limited to geography. The act of creating these informational spaces is an inherently socially constructed process regardless of the external referent. The same can be said for forging relationships on social network sites where mixed methods uncover different notions of “friend” and on voting sites where mixed methods uncover different notions of “approval” or “upvoting”. In all cases, a reflexive triangulation can aid researchers in uncovering workable analysis rather than assuming the data will do it for them. Our archaeology of data ultimately demonstrates that large-scale statistical analysis and small-scale interpretive work are not mutually exclusive. Rather, they are mutually constitutive in the act of doing research. A recognition of this interplay across scales is necessary for work that preserves the reflexive nature of qualitative work and the scalability of big data analysis.


Data and its street life 

Alex Taylor, Microsoft Research

Is it just me, or does the talk about Big Data seem to be everywhere? Turning the (virtual) printed pages in the popular press, listening to the radio, and indeed perusing academic databases, so much of what I see and hear points towards Big Data and its somewhat unnerving potential to reveal hitherto unexpected things about humanity (and yes, the claims are often that grand). Census results; local statistics on crime, education, property prices, etc.; our shopping habits; our tweets and Facebook likes, etc. etc. All of it, it seems, is contributing to a veritable deluge of data to be aggregated and analysed. If we’re to believe the hype, these data are altogether enabling unprecedented access to who we are as individuals and citizens, and what we want.

What I find troubling, however, is that little of the rhetoric behind this big data appears to be about what ordinary folk, the person on the street if you will, might get from or do with their own data. Governments, corporate organisations, ad agencies, health providers, etc. all seem keen to amass data and, to be somewhat cynical about it, ‘monetise’ their ‘data assets’. Yes, some innovative and progressive attempts to redistribute agency are being developed here. Along with others, the UK government, for instance, is actively involved in programmes to enable public access to their full range of data. Yet thus far it’s not clear to me who the intended audience is. A fair bit of know-how and expertise is needed to begin making sense of the award-winning site, let alone to work out how one might actually analyse the data and put it to use.

So why, precisely, does this deluge of data matter, and for whom does it matter? In what form are the data being materialised, and are these (new) forms enacting (to use the sociological parlance) anything different, anything that might just make a material difference to those of us on the street who are arguably producing much of the data? Oh, and amongst all of this, we must also ask: is there the possibility of something else? Do the innovations in Big Data and big data analytics allow us to imagine something different, something better?

It’s with these questions as an impetus that a mixture of designers, makers and social scientists and I have decided to quite literally take things back to the street. For a year, seven or eight of us from my organisation, together with some academic collaborators, are hoping to develop a sense of why data might matter for people: people bound not in some seemingly arbitrary way through nodes in a dataset, but by the common place they live in, namely a street in Cambridge, Tenison Road. Working with the residents on Tenison Road, our aim is to discover when data comes to matter on a street and how such a community might make it matter. Data is to be cast as coming from somewhere and intended for someone, not, as Big Data would seem to have it, from nowhere and for no one. Ultimately, we see this as a project in the making, already and always about a street and its emerging data. Our wish is to help make that data something relevant and purposeful.

In this commentary, I will describe some of the engagements we have had thus far with the street and our plans for the coming year. I will also touch on a few of the systems and services we are putting into place to enable Tenison Road’s residents to collect, aggregate and use their data. The work will be couched in a materialist understanding of data and describe how we are drawing on a feminist epistemology as a means to (re-)imagine alternative social and techno-scientific possibilities.


Public displays of street data
Alex Taylor, Microsoft Research

As part of the engagement with Tenison Road, we are developing a range of ideas for monitoring movement along the street, e.g., automotive traffic, cyclists, pedestrians, etc. This demonstration will detail the building of these systems and how they have been designed to visualise the street’s data publicly. We’ll explain how these systems have emerged through meetings with Tenison Road’s residents in which they’ve voiced worries about new building developments in the area. Their concerns centre on traffic, pollution, loss of heritage and, overall, the changing quality of life on the street. We’ll describe how the attempts to track movement are intended to sit alongside parallel data-centred efforts to catalogue the changing nature of Tenison Road, and to remain grounded in the practical concerns of the street’s residents.