Big Data & Society: Essays and Provocations

The Journal has suspended publishing new website essays but will continue to host the previously posted material.
_________________________________________________________________

Let’s change the way Big Data present the places we live

Dr. Yanni Alexander Loukissas, Assistant Professor of Digital Media, School of Literature, Media and Communication, Georgia Institute of Technology
Email: yanni.loukissas@lmc.gatech.edu
Website: http://loukissas.lmc.gatech.edu

Published: Aug 15 2019

Soon after moving to Atlanta, Georgia in 2014, my partner and I decided it was time for us to buy a home. Like many buyers today, we started by looking online. The real estate website Zillow seemed to offer the most data: current listings for sale or rent, as well as past and predicted property values on every home in the United States, even those not currently on the market. Zillow is the big data of real estate.

We also worked with a realtor, who did not conceal his skepticism about the site, particularly its “Zestimates,” Zillow’s term for its proprietary value estimates, automatically recalculated every day on more than one-hundred million properties. According to the company, Zestimates are within 4.5% of the sale value fifty percent of the time, a statistic that realtors can dismiss as “close to a coin toss.” For a buyer, however, such predictions are tantalizing; like so much information on the internet, they do not seem to cost anything. Yet harnessing the predictive power of big data can have unseen societal costs. Zillow’s estimates do not just affect users. They are shaping a new culture of real estate that can amplify market fluctuations, with devastating effects on low-income homeowners and renters, even if they, themselves, are not users of the site.

In my job at Georgia Tech, I help creative people to think critically about emerging technologies, but also to imagine how those technologies might be made differently. I have come to see Zillow as a useful object lesson about the limits of big data.

Zillow is part of an interface economy, in which companies aggregate data from a variety of local sources—in Zillow’s case: county records, multiple listing services, independent realtors, even homeowners (none of which can be taken at face value)—and make those data accessible through a combination of interactive visualizations and handy algorithmic tools. And there you have the Zestimate.

“Data want to be free,” argue Zillow’s economists. Setting data free might sound like a laudable goal. The reality, as we know, is more complex. Interfaces do not liberate. They only reframe. They ask us to see data within new virtual and interpretive settings, which make some things easier and others harder. In particular, they make it difficult to maintain what we know and value personally, if that conflicts with what we see on screen.

Upon visiting Zillow for the first time, I quickly found myself immersed in its interactive map. Properties for sale or rent appear as colored dots on a neutral background of streets, parks, and place-names. You can click each dot to bring up its details. Initially, the map made me feel more confident about our purchasing options. Only with experience did I learn that in the real world, property values are neither rational nor premised on a set of stable, mappable conditions; they are hashed out between buyer and seller in an emotional process, sometimes resulting in heated bidding wars. Zillow’s map suggests a simplicity that is not rooted in reality. Nevertheless, the site has holding power. Zestimates are continually updated. I could not help but anxiously check in, to see how individual property values might be changing.

As a home buyer, I found that Zillow supported certain kinds of desires: the right neighborhood, the right price, the right school zone, and the expectation of a stable investment, or even a profit. But it did not support other things my partner and I cared about. We were wary of contributing to Atlanta’s latest wave of gentrification. There are few policies in Georgia to protect low-income homeowners and renters from the increasing costs of staying in their own neighborhoods. In fact, we were ambivalent about becoming homeowners at all, if it meant participating in an inherently unjust system. We compromised by avoiding areas of the city where we thought our purchase might contribute to rapidly inflating home values. Our concerns did not disappear, but they became more abstract, for they were not supported by Zillow’s interface.

When we eventually closed on a house, we paid more than the asking price in order to compete with other offers; after losing two previous bids, this seemed unavoidable. In the four years since, prices in our new neighborhood have increased markedly. Zillow did not help us predict that our purchase would contribute to this aggregate effect. While some in the area are elated by the anticipated return on their investment, we are concerned about the longer term. Couldn’t our home’s value just as suddenly fall? The Great Recession was proof enough of that. And there are larger effects to consider.

Not far from where we live, exactly these changes are having historical significance: long-term residents are being priced out of the most culturally and politically significant neighborhoods in the country. The Old Fourth Ward—where Dr. Martin Luther King Jr. once lived and is memorialized today, where his Ebenezer Baptist Church is located—is on Zillow, visible only as a collection of properties with Zestimates on the rise. Its history, if anything, is a selling point.

Zillow’s interface, with its rational map of real estate investments and ready predictions for the future of each, is a vehicle for a turbo-charged form of real estate consumerism. “You are in the driver's seat,” a headline on the site promises. Designers sometimes call this a “frictionless” interface, because it streamlines a previously difficult task. In this case, the task of navigating the data-saturated housing marketplace is made to seem easy. But in order to do this, an interface must “deracinate” data: obscuring their roots, their sources, as well as the consequences of their use.

Today, what started as a personal experience has become a professional challenge. My students and I are exploring how interfaces to data might connect more effectively to the knowledge and values of their users. In one line of research—inspired by the St. Louis Map Room, “a community space for exploring and creating original, interpretive maps of the city”—we are investigating how data can inform difficult and highly personal conversations, ones that challenge the rampant culture of real estate, for instance, rather than easy predictions.

This past summer, we built the Atlanta Map Room. In its first use, we asked a group of undergraduates to map the Atlanta BeltLine, one of the most ambitious urban redevelopment projects under construction in the United States. As its name suggests, the BeltLine is a circular system of parks, trails, and perhaps someday light transit, intended to stitch together 45 “Intown” neighborhoods along 22 miles of disused rail lines.

From the perspective of some residents, the BeltLine is a direly needed form of green infrastructure for an otherwise traffic-choked city. It promises to make Atlanta healthier and more beautiful. For others, the BeltLine is an engine of gentrification. Indeed, property values have risen disproportionately along the project’s path, putting further economic pressure on low-income residents in adjacent neighborhoods, including the Old Fourth Ward.

Students Working in the Atlanta Map Room

Image credit: GVUCenter - Joshua Preston

Representing the BeltLine in all its complexity is not easy. Nevertheless, the undergrads eagerly took to the task. The output of a session in the Map Room is a large-scale, four-by-sixteen-foot paper map suitable for public display. The students made one for each of the BeltLine’s completed Eastside and Westside trails. Guided by an overhead projector running our custom mapping software, they used markers, paint pens, stickers and photographs to synthesize their own personal experiences of visiting, or even living near, the BeltLine with publicly available data on changing property values, demographics, and transit routes.

Their maps serve as an early indication of the Atlanta Map Room’s capacity as a space for critical reflection on data and the way they represent the places we live. The Atlanta Map Room has since attracted the attention of local community organizers, planners, historians, and data scientists, all of whom are seeking more accessible ways of creating dialogue around publicly available data. Over the next few years, we will be working with the St. Louis Map Room’s lead artist, Jer Thorp, to support the creation of many more “map rooms” around the country: in schools, museums, and civic centers. The project is not an alternative to the interface economy, but an antidote. It teaches that data should never appear to speak for themselves.

_________________________________________________________________

Datafying the Globe: Critical Insights into the Global Politics of Big Data Governance

Elke Schwarz (Queen Mary University), Aaron McKeil (London School of Economics and Political Science), Mitchell Dean (Copenhagen Business School), Mark Duffield (University of Bristol), David Chandler (University of Westminster)

Published: 26 Jan 2019

Introduction

The United Nations considers the development of Big Data analytics to be key to addressing a wide range of problems from sustainable development to disaster risk reduction and conflict management.¹ The Economist magazine argues that Big Data is now the ‘world’s most valuable resource’ - the new oil: transforming the coming era to the same extent as oil drove development in the last century.² Although there is no agreed definition of Big Data, it is often framed in terms of a qualitative transformation in the volume, variety and velocity of data with the development of new digital technologies of sensing and computation, thus enabling policy-makers to see ‘reality’ rather than relying on conceptual imaginaries or wishful thinking. Finally, or so we are told, the long history of international policy failure - in areas as diverse as tackling conflict management, sustainable development or global health - could be coming to a close.

Leading analysts, Viktor Mayer-Schönberger and Kenneth Cukier, understand the transformative power of Big Data as a product of new technologies linked to ubiquitous computing, algorithmic learning and the Internet of Things which enables ‘datafication’ – the process of turning life into fungible information: ‘Datafication is not just about rendering attitudes and sentiments into an analysable form, but human behaviour as well.’³ The problems of the world, once datafied, would no longer be seen ‘as a string of happenings that we explain as natural or social phenomena, but as a universe comprised essentially of information’.⁴ The former editor of Wired magazine Chris Anderson therefore claims that Big Data brings about the end of the need for theory, causal modelling and hypotheses: “With enough data, the numbers speak for themselves... faced with massive data, this [traditional] approach to science – hypothesize, model, test – is becoming obsolete”.⁵

Big Data thus promises a revolution in our approach to the understanding of problems and processes, existing frameworks of policy-making and traditional forms of governance. Evgeny Morozov argues that Big Data approaches aspire to remove the need for governance on the basis of rules and laws, displacing this with real-time feedback mechanisms based on new forms of (datafied) self-awareness: ‘suddenly, there’s no need to develop procedures for governing every contingency, for – or so one hopes – algorithms and real-time, immediate feedback can do a better job than inflexible rules out of touch with reality’.⁶ The stakes for politics, democratic forms of rule, transparency and critique are high, and yet there has been little discussion of Big Data in the field of international relations. This collective discussion brings together academics concerned with the impact of Big Data practices, ideas and technologies. It builds a dialogue of cautionary insights into the promise of Big Data for international relations, through discussing different ways of understanding the impact and stakes of governing through Big Data. This collective discussion advances the argument that the promise of Big Data for international relations and global governance needs to be critically understood as containing inherent political problems and limitations.

Elke Schwarz argues that the power to extract and analyse massive amounts of data is to be welcomed but cautions that the problem of Big Data lies in attempts to extrapolate from the past to make futures actionable. Big Data as a discourse of governance is thereby problematic in its attempts to veil the interpretations of the past upon which futures are predicated, as data is not the world and no algorithmic interpretations come without their own in-built biases and exclusions. Big Data is thereby doubly dumb, as data cannot speak for itself and to mistake data for the world would inevitably lead to the occlusion of the interpretative power that brings data into being.

Aaron McKeil posits Big Data as the promise of a world yet to come: a datatopian promise of a world in which each actor and agency engages in real time adaptation to ensure ongoing stability on the basis of the power of the algorithmic governance of the self. Despite the fact that this world can never actually come into being and that it is no less subject to the limitations inherent to the modern international system, Big Data discourse and practice has potentially important effects on the modern mode of power nevertheless.

Mitchell Dean warns that we should not rush to see Big Data as necessarily new. Data is what is understood as given to experience and is multiplied through being given to digital technologies enabling a new form of algorithmic governmentality through the processing and repackaging of data. In this respect Big Data is as much political as governmental in being part and parcel of the circulation of ideas and decision-making. In fact, in the digital circulation of appearances and effects as givens there is an almost religious eschatological dimension to the revelatory discourse of Big Data finality.

Mark Duffield analyses how Big Data appears not so much as the opening to a ‘Brave New World’ but rather as the end point of socio-economic and political processes of remote management, automated welfare and network capitalism trialled and tested in the humanitarian disaster zones of the Global South. Using the figure of the refugee, Big Data governance, satellite sensing, biometrics and block-chain technologies have already tested and developed unique ways of tracking dividuated mobile identities.

David Chandler analyses Big Data governance as one based upon the increase in the power of correlational insight, fed by new technological developments and algorithmic computation. He argues that the problem is not so much in the knowledge generated by this shift, which is only as good as the correlations it relies upon for processes of ‘datafication’, but rather that correlational knowledge closes the possibilities of futures which differ from the present, making the actual the horizon of policy-thinking.

Elke Schwarz: No Matter How Big, Data is Dumb

Data is great. Data can serve as an insightful foundation for determining ‘what happened’. It can provide an element of certainty about observed events, which is especially important in the current political environment of misinformation campaigns through ‘fake news’ and ‘influencing operations’. Descriptive analytics of data can serve as a corrective to historical accounts and may help produce knowledge and insights about social and political life. Where data is available in greater volume, with greater variety and with greater velocity, the granularity of descriptive analytics can become extremely sophisticated. And that is a good thing. Consider, for example, the good work done by the International Consortium of Investigative Journalists (ICIJ) in putting together the Panama Papers, which shed light on the offshore tax practices of the rich and famous. By using and customising largely open source data processing software, the ICIJ was able to sift through all of the 2.6 terabytes of data that had been sent anonymously to Süddeutsche Zeitung, and produce an unprecedented overview of how exactly the wealthy are able to benefit from offshore tax schemes.⁷ The Panama Papers data leak is among the largest in history, and the ICIJ is rightly proud to have been able to uncover such an extensive network of international tax avoidance by drawing on the computational ecology of Big Data.

The current enthusiasm for Big Data, however, is marked by its orientation toward the future. In business as in governance, the prospect of anticipating the future ahead of its time is what drives ever-greater investment in data systems as ways of producing actionable knowledge.⁸ Predictive analyses of Big Data promise to mitigate risk and exploit potential opportunities. Prescriptive analytics draw on intelligent algorithms to help decide on the optimal future path. Google’s driverless car is an example of both predictive and prescriptive Big Data technology, aspiring to change the world as we know it by mitigating driver error and saving innumerable lives.⁹ Similarly, the NSA’s ability to capture ever-greater amounts of meta-data, paired with the right algorithms, is hailed as a means to eventually eradicate the problem of global terrorism.¹⁰ If only we had enough data-points, we could know the past, pre-empt the present and secure the future. Once the world is rendered in data-form, a mathematical problem-solving approach could be applied not only to maximally optimise Return on Investment (ROI) for businesses, but ideally also to pesky policy problems in population administration. Or so the story goes.

Unfettered optimism about the life-improving powers of Big Data is often expressed in hyperbolic tropes and metaphors. The frames we use impact the way we think through and act on new issues, so I want to suggest that a closer look at how we frame Big Data can help us identify what Big Data renders visible, as well as what becomes invisible. In many contemporary narratives, the seemingly natural and limitless supply of data is foregrounded in ways that obscure the technological architecture – the Cathedrals of Computation, as Ed Finn calls them – on which the translation of “theoretical ideas in[to] pragmatic instructions” relies.¹¹ It is within these algorithmic architectures, however, that political-economic power and control are located.

‘Big Data is the new oil’, for example, is a persistent trope which serves to reinforce the intrinsic value of massive numbers of data points which can be fashioned into wealth with big enough servers and processing powers.¹² In one sense, the trope persists because it is embedded in much of the terminology used to describe how data is produced (data mining, data spills, data pollution, and so on). On a deeper level, however, it persists because it feeds into a desire for the possibility of unlimited wealth. Data is of course nothing like oil. Not in the way it occurs in the tangible world, not in how it is harvested, and not in how it circulates as a commodity or vector of power. It is fully renewable, which is, indeed, among its most attractive qualities. And it can be perpetually produced and reused over and over again (or at least for as long as its material substrates are in plentiful supply). But no matter how it is produced, to draw value from data it needs to be “cultivated” not extracted.¹³

Another prominent metaphor is that of data as some form of being in need of taming. Here, data is posited as a beast of raw power, which becomes harnessed through training. As Tim Negris suggested in 2013: “Data is like horses; it can be untamed and unmanageable or it can be trained and useful”.¹⁴ For this, a so-called ‘data whisperer’ is required. A data whisperer is an analyst who is able to interpret data and render it intelligible to its audience. In doing so, the data whisperer helps “close the gap between data models and reality”.¹⁵ In other words, a data whisperer is a translator of data, an interpreter of data’s hidden message, a refiner of its raw power. For Negris and others, being a data whisperer requires a finesse and intuitive skill that only a human has. However, as more and more data is produced, circulates, and is processed through algorithmic architectures and artificial intelligence systems, the human whisperer is gradually replaced by artificial intelligence to curate, cultivate, and make data ‘talk’.

Yet another trope was spotlighted in the introduction to this collective discussion: that ‘the numbers speak for themselves’, that they have some intrinsic capacity to reveal reality. But as most statisticians know, data cannot speak for itself. Data is not an independent entity. It is neither comparable to the qualities of oil, nor is it a monster or wild horse in need of taming. And neither can it speak for itself. Data is mute. Data is points of information capture, about one or multiple occurrences that took place in time. Future data is a potentiality based on historic data. Predictive and prescriptive Big Data processes rely on event data, real-time or historical. But outside of analytical frameworks and architectures of interpretation, data is dumb. We must realise, then, that it is not in data as such that social, economic or political power resides, but rather in the proprietary interfaces through which data circulates, which make data speak, and give data value – through the proliferating forms of technological ventriloquism that mark contemporary economy and society. And it is precisely here that the question as to who has control over data as a commodity or asset becomes salient. What Big Data expresses, and to what ends it is ultimately put, will depend on who it is that builds the architectures used to ‘cultivate’ and render data intelligible. For businesses, the end is usually related to better ROI and greater optimisation of their processes. For the government of people and populations, this becomes a bit more complex, particularly when Big Data produces actionable futures.

Conceptual focus on Big Data as a valuable asset, as a resource, or as a commodity that can be harnessed frequently omits the technological architecture mentioned above, which bestows meaning and content on data points. But these architectures of code, these algorithmic structures of applied information, are as ubiquitous as they are obscure to most non-experts. The utopian ideal of omniscience through potentially unlimited data-points is seductive. Such a project meshes well with the technological ecology we are building for ourselves, holding out the promise of a “unified vision of the world through clean interfaces and carefully curated data”.¹⁶ However, between the ideal vision of unlimited knowledge through data and the algorithmic application of data in the real world, there always remains a gap. And it is in this gap that Big Data wields its insidious power of sorting the world into its visible and invisible, its actionable and in-actionable futures. As Cathy O’Neil has shown in her book, Weapons of Math Destruction, algorithmic architectures shape how Big Data circulates in ways that privilege those who have been counted as privileged, and disadvantage those who are already marked as disadvantaged.¹⁷ “Big Data”, O’Neil notes, “codifies the past” and in doing so regulates the future. Predictive and prescriptive architecture is not value free.¹⁸ As I highlight elsewhere, algorithms and data technologies in general are rarely neutral.¹⁹ They typically reflect the goals and intentions of their makers, and they have the unsettling capacity to normalise their position and priorities. They also relieve decision-makers of moral responsibility by offering scientific-technological authority for judgment, wherein erroneous or incomplete information can be sanitised through an “invisible machinery of faith”.²⁰ And that which cannot be counted and therefore processed, for whichever reason, becomes invisible and thus economically, politically or socially irrelevant.

To harness the positive power of Big Data, we should be attentive to its shortcomings and its locales of power. Many political and social problems cannot be reliably put in numerical terms, and thus cannot be adequately encountered or addressed in a purely technological universe. To these issues, data cannot speak or be made to speak without producing structures of inclusion and exclusion that are sure to foster new socio-political problems. We should be aware, as Finn notes, that the story of the algorithm is the story of a new myth of omniscience, which inevitably has produced a series of gaps between information and its meanings. In our narratives of Big Data, we should perhaps be more attentive to what goes on in these spaces.

Aaron McKeil: “Datatopian” Global Governance?

Shifts in global governance enabled by Big Data tools are significant. Yet, their use and promise are no less subject to the same forces of global politics inherent to the modern international system that raises the challenges global governance seeks to address. Big Data involves a new outlook on global governance problems, which enables new “datatopian” visions of their improvement.²¹ This involves a new phenomenology of the “immense” in a Bachelardian sense, as vastness beyond intelligibility, which enables a “let the data reveal itself” outlook coupled with a sense that Big Data may illuminate new solutions.²² With the ability to analyze immense data sets comes the attempt to illuminate unknown correlations, rather than causal patterns. Like the behaviouralist movement of the 1960s, it is a “data first” approach, but Big Data places a distinct emphasis on correlation and probability over causation and predictable outcomes.²³ The ability to generate these correlations, based on the aspiration of the verisimilitude of immense data to practice, is much of what is distinct about a “Big” Data outlook and approach.²⁴

In global governance, these tools are increasingly applied to global security and development.²⁵ Automated, algorithm-driven analysis of Big Data, through state and non-state security apparatuses, promises to illuminate unknown security threats and risks. Surveillance reaches beyond cyberspace to all realms of security governance, domestic, international, and global. The US military, for instance, is enabling drones with facial recognition, connected to data streams, for deployment in battlefields and zones of instability.²⁶ Sensors for gathering all manner of data are also distributed in zones of conflict and instability, whereby the data is analyzed for the movement and capabilities of security risks. In global development, the use of Big Data applied to the UN Global Goals for Sustainable Development is connected to the promise of adjusting flows of capital investment, global and local policy, with instantly and minutely measurable and expansive data. This Big “Datatopian” promise for international development has created vast new blank spaces on the world map, where data is unavailable, forming the impetus to generate sufficient data in developing countries today.

By “datatopian”, I mean the implicit ideal of a continuously adjusting global security and development governance system, enabled by “live” data flows and analysis that enable continuous adjustments to governance in practice. Notions of a “smart planet”, where all systems are data producing and data governed, fit into the datatopian global governance vision. This vision is one where the inefficiencies of contemporary global governance are eliminated and new solutions are revealed through an ongoing, unending process of analysis and adjustment. With unlimited and “live” data analysis, Big Data promises a security and development apparatus continually adjusting to risks. As such, the world before Big Data-driven global governance recedes further into the past as ever greater volumes of data are produced and ever more Big Data tools are applied in global governance. Yet, the implicit vision of “datatopian” global governance is a project in the making.

What are the limits of data-driven global governance? First, while great advancements are likely to be achieved, the datatopian promise of Big Data algorithmic governance is, to an extent, always a fantasy. As Beckian risk society analysis tells us, every adjustment to practice, based on probability analysis of correlations, virtually always creates new risks from new correlations.²⁷ Second, and more problematically, the application of Big Data tools to global governance problems is no less subject to the problems inherent to the modern and state-centric global order than other global governance tools. The systemic insecurity of a decentralized international security system, for instance, cannot be ameliorated by Big Data driven governance and will itself subject the use and form of Big Data tools to the dictates of great power politics.²⁸ The great-power-centric sovereign states system, conditioned by the deep-structure of the modern international imaginary, produces inherent world order deficiencies that raise the challenges which global governance struggles to ameliorate. The illumination of correlations, in the realms of global health, trade, security, and climate governance, does not necessarily entail an appropriate adjustment by states, which are themselves both produced by and embedded in a modern international system that is the source of those governance problems. Without global order reform coupled with a deep structure transformation, the application of Big Data tools will be no less subject to the pervasive and deep-seated security and uneven development problems inherent to the modern international system.²⁹

Still, even if the application of Big Data tools to global governance problems has identifiable limits, the question remains of the extent to which Big Data tools are producing a “new” kind of global governance. In this sense, Big Data in global governance is as much a “game shifter” as it is a “game changer”, so to speak. To employ Deleuze’s terms, it is still a “difference” in global governance, although it promises a transformative “event”.³⁰ First, it is important to recognize that Big Data tools potentially qualify as a technology of sufficiently intense change in capacities to transform the base material structure of the international system. Like the invention of writing, there is a sense that Big Data tools significantly amplify state power to a qualitatively transformative extent. Certain technologies have a transformative impact on the material structure of international systems. The steam engine and telegraph cable, for instance, changed interaction capacities to such an extent that they produced changes in the types of interaction, beyond a change in degree. In respect to Big Data, the degree of gatherable and analyzable data, combined with a qualitative shift in outlook on the uses of data, is producing a qualitative shift in how global governance problems are approached and conducted. The ability of the state to monitor all data for “unknown unknown” risks is constitutive of a qualitative change from the use of data to analyse the causes of known risks.

However, second, it is equally important to clarify how continuities and repetitions shape and limit new governance possibilities. Big Data tools enable changes in how global governance is conducted, as well as changes in the goals of global governance, but their applications are shaped by the horizons of their enabling and constraining preconditions. Shifts in global security and development are important, yet, as shifts, their preconditions form continuities. For instance, a major condition for the possibility of Big Data’s aspired-to “datatopian” global governance is the modern conception of order. The modern conception of order as a collaborative activity, amongst collaborating participants, is a precondition for the ambition of collectively adjusting practice via live data analysis.³¹ The drive or dynamic behind the rise of the digital in global governance is connected to this background modern conception of what it means to have and maintain social and political order. That is, the ambitions of Big Data claims based on new technological capacity are conceptually enabled and constrained by this background assumption. A further and connected enabling and constraining precondition of these claims is the modern conception of simultaneous time, where all actors exist and act simultaneously, in profane time. With Ulrich Beck’s “risk society” in mind, this is also future-oriented time, the sense of a future-oriented time in which all parties to this order are simultaneously collaborating, and simultaneously adjusting for probable risks. As such, the “datatopian” ambitions of Big Data applications constitute a significant shift in the “how” and “why” of global governance, but not a transformative change of global governance ambitions as such.

While shifts in global governance enabled by Big Data tools are significant, they are ultimately subject not only to the global forces and limitations inherent to the great-power-centric modern international system, but also to the sociological and cultural sources of the modern international system.

Mitchell Dean: Undoing the Big Narrative of Big Data

The primary requirement to understand Big Data is that we avoid the hubris associated with its knowledge claims and narrative. This means that we should respond to the volume, variety and velocity attributed to Big Data in a measured and considered fashion. We should slow down, address the granularity of Big Data, and not be afraid of finding continuities where others see only a radical break with the past, a disruption, or a massive acceleration.

The first way to do this would be to consider etymology. ‘Big’ seems obvious but it could also be rendered ‘grand’ or ‘great’ if we want to emphasize the movement from quantity to quality. At least initially ‘data’ is the plural of ‘datum’. It is now, perhaps, a collective singular. According to the Oxford English Dictionary, the latter only enters English and German from classical Latin around 1630 and 1631, meaning a piece of information. The Latin ‘datum’ is that which is given or present, and related to the verb dare, to give. Thus there is already a semantic oscillation in the extant use of the term data between a perhaps dominant meaning as information, chiefly numerical, or pieces of such, and several subordinate ones as something given or taken for granted, or something immediately apprehended by the mind or senses, or as a basis or reference point. In this sense, Big Data could be that grand information which can be assumed, taken-for-granted, or form the base, the foundation, or the archè. The study of Big Data is thus a component of what might be called an ‘archaeology’ conceived as a problematisation of the given.

A second way of adopting a more measured approach would entail a reduction of scale. With Big Data, then, concentrate on the small, on the datum, otherwise you will be overwhelmed. Many instances of datum are produced without intent or conscious action, e.g., the movements tracked by a location-enabled smartphone, or are the unintended consequence of another action, e.g., a tap or a swipe of a card, a passing through an e-toll detector, undertaking an ATM transaction, the rate of a pulse or a heartbeat. These all allow an extravagant, digital, ‘post-panoptic surveillant assemblage’, if we are to follow a Foucauldian line.³² Here ‘digital traces’ and ‘digital trails’ become the stuff of Big Data. Some other bits of datum are conscious but simple acts, hardly requiring reflection: a ‘like’ or a ‘follow’. Some are richer from a performative, semantic, and semiotic perspective – the menu of emojis, the character-limited tweet. The datum is then aggregated into Big Data, where claims are made for it by the companies that record it. Twitter, for instance, claims to be the new ‘digital agora’.³³ This is to say that social media and the companies that develop these applications are of course only one part of the question of Big Data; but because they rely on the co-production of users, they have a particular character and are important in the shaping of political identity and the attempt to consciously shape and reshape political identities. Moreover, these companies have found a way to convert political acclamation into capital accumulation.³⁴ They generate profit from the actions of their users and their aggregation; academics, marketers, mass media, public relations specialists and political campaigns can interpret and use this aggregated information. A new domain of governability emerges, as demonstrated by the Trump campaign’s use of digital marketing techniques and Facebook profiles in the 2016 presidential election.
More broadly, individual actions and preferences initiate an ‘algorithmic governmentality’³⁵ exemplified in an early form by Amazon’s famed recommendation system. Other forms of algorithmic governmentality entail a care of the self and forms of self-governing and self-monitoring such as the use of the international platform, the Quantified Self, the motto of which is ‘self-knowledge through numbers’. ‘Self-trackers’ record and monitor their own behaviour, health, sleep habits or other uncertainty and share and compare their information with others in search of the meaning of their digital data and the best ways to self-manage.

A third way of deflating the hubristic narrative is contesting the claim that Big Data solutions are all technical – or informational. This is the claim of the end of theory, of hypothesis and causality, and perhaps even the political. But as the example of the use of Facebook profiles to identify ‘clusters of persuadable voters’ during the 2016 presidential campaign indicates, we are on the threshold of a new form of political identity formation, new ways of forming publics and new possibilities of political manipulation.³⁶ With the formation of a ‘public mood’ or social sentiment produced through social media acclamations, the digital marketer might replace, or at least shadow, the public opinion pollster and spin doctor as the essential backroom political figure. It is therefore good to keep in mind that Big Data might have directly political ramifications as much as policy ones. If Big Data allows us to govern effects rather than causes, and can be harnessed in ‘nudge’ programmes, it can also treat political choice as an effect of a menu of expressed preferences, events and attributes of the individual, most of which are not directly political (e.g., who one follows or ‘friends’). It is as much political as governmental, both a domain of the struggle over sovereignty and its high offices (a question of ‘who decides?’ or ‘who judges?’) and a tool for national and international governmentality (the identification of ‘what is a problem?’). The earlier dispositive of politics formed through public opinion, the objective narrative of mass media, and public opinion polls (which the Clinton campaign so excelled in) may have been rendered inoperative by the privatized swings of public mood, the multiplicity of voices, and the real-time registration and aggregation of data made possible by social media.
It is noteworthy that while media commentators and national opinion polls almost uniformly failed to anticipate the possibility of, let alone predict, the result of the 2016 US presidential election, an Artificial Intelligence program, MogIA, did so some ten days before.³⁷

Finally, we can place the story of Big Data in a much longer and deeper frame of political and economic theology. The eschatological dimension of such a narrative is certainly worth noting. There are obvious parallels with eschatological schemata in which the world is no longer governed by the Law (the Old Testament or Age of the Father), or by rules and discipline (the New Testament or Age of the Son), but by the knowledge of God revealed directly in the hearts of all humans (the Age of the Spirit), discovered by the twelfth-century monk, Joachim di Fiori.³⁸ A form of governing that will henceforth be based not on law or discipline but on the ‘rationality of the governed themselves’ also brings us close to the telos of Michel Foucault’s genealogy of governmentality, itself not completely free of eschatology.³⁹ This idea of a comprehensive, direct and immediate knowledge given by the things themselves is found in the basic idea of the all-seeing and all-knowing God of the Bible. Big Data as the Great Given is also Providence, and the effects we seek to govern are its ‘collateral effects’ or Fate; a figure that reproduces liberalism’s relationship between the Invisible Hand of the market and our individual freedom. In this sense, the theological genealogy simply cautions against the claims to novelty of the Big Data narrative, and requires us to analyse that narrative as part and parcel of those historical teleologies that were given something like a ‘secular’ form only with German Idealism, Marxism and Comte’s positivism in the nineteenth century. If, on the other hand, we wanted to give a name for the kind of study that can address the liturgical character of the tweets and posts that form an ekklèsia in the digital agora, we could speak of a ‘political archaeology of glory’.⁴⁰

Mark Duffield: The Boomerang Effect

Big Data and the computational turn have the appearance of having just arrived. The Internet, screen interfaces, software platforms and data-parks linking people and things in remarkable ways have become ubiquitous across the economy and society in less than a generation. When set against an international scene that is more polarised, fragmented and angry than it has been for decades,⁴¹ for its advocates the arrival of Big Data borders on the providential. It promises to take the complexity out of a world that we suddenly no longer appear to understand. At a time of deepening capitalist instability, however, we need to question whether Big Data is, indeed, part of the solution, or part of the problem.

Computational technologies have been around since the mid Twentieth Century. Helped by the absence of any serious political debate, that they appear ‘new’ has more to do with their relatively quick and seamless integration into the fabric of capitalism. In fact, the apparent naturalness of the computational turn tends to hide the decades of prior political, epistemological and ideological preparatory work necessary to turn the causal registers of knowledge and history into those of data, behaviour and resilience. Since the 1970s, theory and critique have been disparaged and displaced by the rise of behaviourism and complexity thinking within the academy.⁴² By the 1990s, across a range of practices this essentially cybernetic episteme had, using labour-intensive means, already transformed politically recalcitrant knowledge into more mathematically pliable signals and alerts.⁴³ Such preparatory work ahead of the computational turn is intimately interwoven with the conservative revolution and the transition from Fordism to post-Fordism. Rather than representing a new point of departure, however, from this perspective, the era of Big Data feels more like a point of arrival; an ending rather than a beginning.

To situate the international within this process of arriving and ending, Hannah Arendt’s concept of the ‘boomerang effect’ is useful.⁴⁴ In suggesting that a feedback loop interconnects social change in the global North and South, it unsettles conventional spatial and temporal boundaries. Rather than the global South seeing its future in the North, it's the North that reads its destiny in the South. For Arendt, the violent excess of Nineteenth Century imperialism in the colonies proved a test bed for Twentieth Century totalitarianism. The boomerang, however, is in constant motion. Today, the humanitarian disaster zone is the site of exception where its current effects are being realised.

Recognising no borders, the NGO-led fantastic invasion of the global South in the 1980s anticipated the post-social relations and work ethic of the emerging new economy.⁴⁵ Hierarchically flat NGOs, for example, valorised and individuated community mutuality and gender while projectised forms of self-reliance pioneered a post-social spirit of capitalism. Building on the initial attempts to establish famine early warning systems during the 1970s, the fantastic invasion also experimented with analogue means of turning disasters into the art of pattern recognition. The abnormal behaviour of farmers, for example, became signals of impending disaster.⁴⁶ NGOs likewise pioneered complexity thinking. Whereas disasters once had causes and perpetrators, they now have multiple origins where everybody and nobody are responsible. The fantastic invasion explored post-social forms of existence and projectised employment at the same time as appropriating and transforming fractious knowledge and history into the smooth flows of information and data. From the end of the 1990s, this preparatory work allowed the seamless adoption of computational technologies and, with the spread of mobile telephony, the repurposing of the global South as a test bed for remote management and welfare automation befitting the precarious post-social world now in formation.

Unlike the 1980s when the new economy was emerging, the boomerang effect today is inseparable from the declining profit-based crisis of post-Fordism.⁴⁷ The new economy has produced growing inequality, the decline of living standards and the inexorable growth of a global ‘service sector’ precariat. Whereas automation in the past displaced workers into new and dynamic sectors of the economy, today they have nowhere to go but a technologically stagnant, insecure and low wage service sector.⁴⁸ As the necessary topographical hinterland of the ‘smart city’, the wired slum is ideally placed to anticipate new ways of governing and exploiting the global precariat through the spatial logics and asymmetries of information common to networks. Moreover, due to decay and, especially, deliberate acts of urbicide, the wired slum is ideal for exploring nomadic forms of ‘off-grid’ existence thus addressing post-Fordism’s crisis of infrastructural renewal.

Embracing what is commonly called humanitarian innovation, the current boomerang effect can be read as science-fiction.⁴⁹ That is, as a future-imaginary that never escapes the potential of the present. Beyond the anti-migrant ramparts of the West, Big Data’s usefulness for remote management and welfare automation is widely realised within the ruined and dilapidated landscapes of the post-social wild. Far above the periodic violence and unrest on the ground, corporate Internet connectivity is supplied by high endurance drones operating from the stratosphere. Through subsidies, software for data and the increasing effectiveness of low bandwidth apps, mobile telephony is ubiquitous throughout the wired slum. In the absence of conventional social statistics, behavioural change and population movements are constantly mapped through combinations of data informatics, sensor feedback and remote satellite imaging. For estimating humanitarian need, these methodologies are occasionally supplemented by targeting automated telephone questionnaires.

The distribution of food aid has been phased out due to costs and fears of criminal and patriarchal abuse. It has been replaced by the personal transfer of small amounts of cash or tokens through local kiosks that, due to solar power solutions, are found even in the most remote regions. Identification is biometric and, depending on early warning data, amounts and transfer timings can vary according to circumstances. Often precipitated by protests or empathetic celebrities, sometimes drones deliver emergency supplies. Using green-finance, maintaining social reproduction within the post-social wild is facilitated by commercial humanitarian objects that allow individuals to survive off-grid. These personalised solutions include portable solar powered devices, individual water filtration, therapeutic foods, emergency shelter and individual waste disposal systems. Regarding health and education, e-medicine and e-education apps empower self-diagnosis and self-learning respectively. As part of an Internet of Things, these objects provide user feedback allowing their use and impacts to be adjusted. In this way, the communities of yesterday have been transformed into today's 'communities of users' permanently enrolled in the constant prototyping of the attentive tools of their own social reproduction.

To incorporate the global precariat within the commodity and value-chains of global capitalism, block-chain technology is building on satellite sensing and biometrics to give each migrant and slum resident a unique and encrypted identity. Apart from monitoring movements and accessing entitlements, this accounting technology underpins the global gig economy that zombie capitalism now depends upon to maintain shareholder dividends. The massive amount of data generated by these technologies doubles as a security apparatus. Whether a beneficiary gets a few tokens, access to an e-medicine portal or is automatically terminated by a self-acting weapon depends upon the computer.

David Chandler: Big Data: Governance through Correlation rather than Causation

Big Data approaches can be usefully understood on the basis of governance through correlation rather than causation. Big Data governance deploys high tech algorithmic programmes and ubiquitous sensing technology and media streams in order to detect previously unseen processes of change through ‘datafication’ or correlational insight. These approaches operate on the depoliticised surface of appearances or ‘effects’:⁵⁰ on the ‘actualist’ notion that ‘only the actual is real’.⁵¹ As Roy Bhaskar, the originator of the philosophy of critical realism, has argued, ‘actualism’ can be seen to be problematic in that hierarchies of structures and assemblages disappear and the scientific search for ‘essences’ under the appearance of things loses its value.⁵² This form of governance pragmatically accepts that little can be done to prevent problems (understood as emergent or interactive effects) or to generalise ‘lessons learned’ and that aspirations of transformation are much more likely to exacerbate problems than to solve them. Rather than attempt to ‘solve’ a problem or adapt societies, entities or ecosystems, in the hope that they will be better able to cope with problems and shocks, Big Data approaches work on how relational understandings can help in the present; in sensing and responding to processes of emergence.

This mode of governance shares the ontopolitical assumptions of actor network theory (ANT) and can be informed by a consideration of the long-running engagement between Bruno Latour (the leading proponent of ANT) and Graham Harman (a leading speculative realist) over the conceptualization of this approach.⁵³ The focus on relations in the actual, in the present rather than on the potential, or possibilities, which may lie latent or virtual in entities, ecosystems or assemblages, is crucial to the distinction with a causal ontology. For Harman, ANT falls down for its lack of distinction between objects and their relations, which he argues acts by ‘flattening everything out too much, so that everything is just on the level of its manifestation’, and therefore, the approach ‘can’t explain the change of the things’ or the hidden potential of alternative outcomes.⁵⁴ In ANT approaches, modernist understandings of the world, whether those of natural or of social science, give too much credence to entities as if they have fixed essences (allowing causal relations) rather than shifting relations to other actants:

The world is not a solid continent of facts sprinkled by a few lakes of uncertainties, but a vast ocean of uncertainties speckled by a few islands of calibrated and stabilized forms… Do we really know that little? We know even less. Paradoxically, this ‘astronomical’ ignorance explains a lot of things. Why do fierce armies disappear in a week? Why do whole empires like the Soviet one vanish in a few months? Why do companies who cover the world go bankrupt after their quarterly report?⁵⁵

Noortje Marres has argued for the importance of ANT approaches as a new way of seeing agency in the world on the pragmatic basis of ‘effect’ rather than a concern for emergent causation: ‘because pragmatists are not contemplative metaphysicians, because they say “we will not decide in advance what the world is made up of”, this is why they go with this weak signal of the effect. Because that is the only way to get to a new object, an object that is not yet met nor defined.’⁵⁶ Marres argues that taking ‘as our starting point stuff that is happening’ is a way of ‘suspending’ or of ‘undoing’ ontology, in order to study change.⁵⁷ This aspect is vital to Big Data and ‘datafication’ as a mode of governance, as this enables a focus upon the surface appearances of change, which are not considered so important in an ontology of causality.

Surface appearances of things are continually changing as their relationships do, not through an ontology of causal depth but through networks and interactions on the surface: in plain sight. Thus new opportunities arise to see with and through these relations and co-dependencies: whether it is the co-relation of pines and matsutake mushrooms (mobilized by Anna Tsing⁵⁸) or the co-relation between sunny weather and purchases of barbecue equipment or the co-relation between Google search terms and flu outbreaks.⁵⁹ These are relations of ‘effects’ rather than of causation; when some entities or processes have an effect on others they can be seen as ‘networked’ or ‘assembled’, but they have no relation of immanent or linear causation which can be mapped and reproduced or intervened in.

New actors or agencies are those brought into being or into relation to explain ‘effects’ and to see processes of emergence through ‘co-relation’. In this respect, new technological advances driving algorithmic machine learning, Big Data capabilities and the Internet of Things enable correlational modes of governance. Big Data aims not at instrumental or causal knowledge but at the revealing of co-relational or emergent effects in real-time, enabling unexpected effects to be better and more reflexively managed. Work on Big Data in relation to conflict risk reduction provides a good example of this shift from causal to correlational concerns. This was highlighted in the US journal Foreign Policy, which drew attention to the establishment of a web of Kenyan NGOs proactively engaging to prevent violence breaking out in the 2013 Kenyan elections (after over one thousand people died in conflict during the 2007 elections). Here the real-time monitoring and responsiveness essentially involved open-source data collection and the mapping of social media with donor-sponsoring of text messaging, calling for peace to be maintained when tensions arose.⁶⁰ Big Data - in terms of social media monitoring and responsiveness - was being used to ensure that communities policed themselves through real-time feedbacks. In fact, as John Karlsrud notes, this redistribution of agency in knowledge production is well captured in the term ‘crowd-sourcing’, literally out-sourcing agency and responsibility to the many rather than an individual or expert.⁶¹

The logic of this understanding of conflict is important in highlighting Big Data approaches as the modulation of effects on the basis of correlational mechanisms. Conflict is ‘sensed’ through picking up the earliest signs or signals from social media search or from direct notifications, with the aim of responding ‘pre-event’ to its escalation. It should be noted that seeing conflict through ‘datafying’ it - seeing it through correlation, i.e. through ‘effects’ in social media - has fundamental implications for the discipline of conflict and peace studies. This is because this form of governing is as distinct from conflict prevention as it is from post-conflict reactive intervention. For Big Data policy approaches, it does not make sense to think in traditional disciplinary terms of pre-conflict or post-conflict, as conflict is a state of the world, ever present but needing to be seen and recognised as such to enable responsivity and care towards its management. This is what it means to see conflict as a relational process rather than as an entity or a state of being which then either exists or does not. Conflict is no longer excluded as somehow an aberration or an exception; the modernist binary of peace and conflict thus no longer appears useful for seeing the world; in fact, it becomes a barrier to seeing the world as it actually is. Through this process of grasping conflict through its datafied effects, conflict becomes normalised as an aspect of life that requires modulation, preferably through the development of community self-responsivity. Governance through modulation and responsivity then becomes not the ‘solution’ to conflict but the way of managing it in this new mode of governance. In this sense, we should be cautious about Big Data: rather than enabling transformative solutions and new possibilities, it risks normalizing or naturalizing the status quo.

Conclusion

The promise of Big Data for international relations and global governance needs to be critically understood as containing inherent problems and limitations. Two broad cautionary themes integrate the insights raised in this collective discussion into the problems and limitations of Big Data for international relations and global governance.

First, this collective discussion highlights the difficulties involved in the notion of using Big Data for predicting future trends and for policy guidance in the sphere of international relations and global governance. Schwarz analyses how the promise of using Big Data for improving the future necessarily entails the difficulties of interpreting data from the past for future policy and practice. McKeil argues that the promise of Big Data for international relations and global governance implies a “datatopian” future global order of continuously adjusting global security and development systems that is ultimately an unobtainable global order, because every adjustment for risks creates new risk correlations. Dean draws out how the narrative of Big Data’s promise for improving future international relations and global governance through algorithmic governance is shaped by eschatological preconditions. Duffield argues that through the application of Big Data to development and security global governance, the global North is testing its future in the post-colonies of the global South. Chandler suggests that the use of correlational knowledge for international relations and global governance necessarily limits possible futures.

Second, this collective discussion puts a clear focus upon the inherently political nature of the application of Big Data to international relations and global governance. Schwarz argues that the interpretation and coding of data is an inherently political activity. McKeil highlights that the ambition of a “datatopian” global order, even if ultimately impractical, enables political shifts in the modern mode of global political culture and practice. Dean spotlights that it is the hubris of the promissory narrative of Big Data which enables a new algorithmic governmentality. Duffield’s analysis points to the emergence of Big Data governance as not a beginning but rather the end-point of international retreat and a politics of remote governance. Lastly, Chandler argues that the “datafication” of governance entails an ontopolitical normalization of political problems as phases in a modular process, rather than as aberrations of practice.

It seems unlikely that the drive towards Big Data-driven understandings of international relations and global governance is going to abate anytime soon. This collective discussion cautions that the dynamic behind this appears to have less to do with Big Data’s success in traditional problem-solving terms and more to do with an underlying transformation of both the approach and the aspirations of international actors. These cautionary insights problematize the promise of applying Big Data tools to the problems of international relations and global governance. The application of Big Data tools to global problems such as sustainable development, disaster risk reduction, and conflict management necessarily involves confronting the political continuities, contentions, and limitations that have been raised in this collective discussion.

Endnotes

¹ ‘Data Revolution Report: A World That Counts’, United Nations, Available at: http://www.undatarevolution.org/report/
² ‘The world’s most valuable resource’, The Economist, 6 May 2017, 9.
³ Mayer-Schönberger and Cukier, Big Data: A Revolution that Will Transform How We Live, Work and Think (London: John Murray, 2013), 93.
⁴ Ibid, 96.
⁵ Chris Anderson, ‘The End of Theory: The Data Deluge Makes the Scientific Method Obsolete’, Wired Magazine 16(7), 23 June 2008. Available at: http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory.
⁶ ‘The Rise of Data and the Death of Politics’, The Observer, 30 July 2014. Available at: http://www.theguardian.com/technology/2014/jul/20/rise-of-data-death-of-politics-evgeny-morozov-algorithmic-regulation.
⁷ Luke Harding, ‘What are the Panama Papers? A Guide to the Biggest Data Leak’, The Guardian, 5 April 2016, Available at https://www.theguardian.com/news/2016/apr/03/what-you-need-to-know-about-the-panama-papers
⁸ Louise Amoore, ‘Algorithmic War: Everyday Geographies of the War on Terror’, Antipode 41, no. 1 (2009): 49-69
⁹ Mark van Rijmenam, ‘The Future of Big Data: Prescriptive Analytics Changes the Game’, DataInformed, 10 June, 2014, Available at: http://data-informed.com/future-big-data-prescriptive-analytics-changes-game/
¹⁰ Michael Hayden, quoted in David Cole, ‘We Kill People Based on Metadata’, The New York Review of Books, 10 May 2014. Available at: http://www.nybooks.com/daily/2014/05/10/we-kill-people-based-metadata/
¹¹ Ed Finn, What Algorithms Want: Imagination in the Age of Computing (Cambridge: MIT Press, 2017).
¹² The Economist pronounces big data now to be the world’s most valuable resource, putting oil in second place; Joris Tooners proclaims proudly in WIRED that data is the “new oil of the digital economy” – “an untapped, valuable asset”; Jonathan Vanian quotes Shivon Zilis, partner at Bloomberg Beta, in cementing this trope and explaining how the more sophisticated Artificial Intelligence becomes, the more valuable data is; Canada’s CBS also uses the phrase in its discussion of the mounting wealth of the Big Five technology companies: Amazon, Apple, Facebook, Google and Microsoft. These are just a few examples.
¹³ Jer Thorp, ‘Big Data is Not the New Oil’, Harvard Business Review, 30 November 2012. Available at: https://hbr.org/2012/11/data-humans-and-the-new-oil
¹⁴ Tim Negris, ‘Are You a Data Whisperer?’, Data Science Central, 13 February 2013. Available at: https://www.datasciencecentral.com/profiles/blogs/are-you-a-data-whisperer
¹⁵ Alex Woodie, ‘Secrets of the Data Whisperer’, Datanami, 13 July 2016.
¹⁶ Finn, What Algorithms Want, 8
¹⁷ Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York: Crown, 2016).
¹⁸ Ibid, 204.
¹⁹ Elke Schwarz, ‘Technology and Moral Vacuums’, Journal of International Political Theory, forthcoming 2018.
²⁰ Finn, What Algorithms Want, 7
²¹ Kenneth Cukier, Viktor Mayer-Schoenberger, ‘The Rise of Big Data: How it’s Changing the Way We Think about the World’, Foreign Affairs, 92, no. 28 (2013), 28-40.
²² Gaston Bachelard, ‘Intimate Immensity’ in Maria Jolas trans. The Poetics of Space (Boston: Beacon, 1994), 183-210.
²³ Hedley Bull, ‘International Theory: The Case for a Classical Approach’, World Politics, 18, no. 3 (1966), 361-377; David Chandler, ‘A World Without Causation: Big Data and the Coming Age of Posthumanism’, Millennium: Journal of International Studies, 43, no. 3 (2015), 833-851.
²⁴ Mark Coté, Paolo Gerbaudo, and Jennifer Pybus, ‘Introduction: Politics of Big Data’, Digital Culture and Society, 2, no. 2 (2016), 5-15.
²⁵ Hans Krause Hansen and Tony Porter, ‘What do Big Data do in Global Governance?’, Global Governance, 23 (2017), 31.
²⁶ Andrej Zwitter, ‘Big Data and International Relations’, Ethics & International Affairs, 29, no. 4 (2015), 377.
²⁷ Ulrich Beck, ‘World Risk Society as Cosmopolitan Society? Ecological Questions in a Framework of Manufactured Uncertainties’, Theory, Culture & Society 13, no. 4 (1996), 1-32.
²⁸ Daniel Deudney, ‘The Great Descent: ‘Global Governance’ in Historical and Theoretical Perspective’, ed. Amitav Acharya, Why Govern? Rethinking Demand and Progress in Global Governance (Cambridge: Cambridge University Press, 2016), 31-54.
²⁹ Richard Falk, Power Shift: On the New Global Order (London: Zed, 2016).
³⁰ Gilles Deleuze, Difference and Repetition trans. Paul Patton, (New York: Columbia University Press, 1994).
³¹ Charles Taylor, Modern Social Imaginaries (Durham: Duke University Press, 2004).
³² Mark Andrejevic and Kelly Gates, ‘Big Data Surveillance: Introduction’, Surveillance and Society 12, no. 2 (2014): 185-196.
³³ Jose van Dijck, The Culture of Connectivity: a Critical History of Social Media (Oxford: Oxford University Press, 2013), 69.
³⁴ I have presented the neglected concept of political acclamation and its relationship to social media in Mitchell Dean, ‘Political acclamation, social media and the public mood’, European Journal of Social Theory 20, no. 3 (2017): 417-434.
³⁵ Tyler Reigeluth, ‘Why data is not enough: digital traces as control of self and self-control’, Surveillance and Society 12, no. 2 (2014): 243–254.
³⁶ See the analysis of the forms of acclamation in the 2016 presidential campaign in Mitchell Dean, ‘Three forms of democratic political acclamation’, Telos 179 (Summer 2017), 28-29.
³⁷ http://www.cnbc.com/2016/10/28/donald-trump-will-win-the-election-and-is-more-popular-than-obama-in-2008-ai-system-finds.html (accessed 5 November 2016).
³⁸ Norman Cohn, The Pursuit of the Millennium: Revolutionary Millenarians and Mystical Anarchists of the Middle Ages (London: Pimlico, 2004/1957), 108-110; Karl Löwith, Meaning in History (Chicago: University of Chicago Press, 1949), Ch. VIII.
³⁹ Michel Foucault, The Birth of Biopolitics (London: Palgrave Macmillan, 2008), 313. On the eschatological elements of Foucault’s genealogical narrative, see Mitchell Dean and Kaspar Villadsen, State Phobia and Civil Society: The Political Legacy of Michel Foucault (Stanford: Stanford University Press, 2016), 126-131.
⁴⁰ See Giorgio Agamben, The Kingdom and the Glory: for a Theological Genealogy of Economy and Government, trans. Lorenzo Chiesa with Matteo Mandarini (Stanford: Stanford University Press, 2011).
⁴¹ Pankaj Mishra, Age of Anger: A History of the Present (UK: Allen Lane, Penguin Random House, 2017).
⁴² Bernard Stiegler, The Lost Spirit of Capitalism: Disbelief and Discredit, Vol 3, trans. Daniel Ross (Cambridge: Polity Press, 2014).
⁴³ Jennifer S. Light, From Warfare to Welfare: Defense Intellectuals and Urban Problems in Cold War America (Baltimore: Johns Hopkins University Press, 2003).
⁴⁴ Hannah Arendt, The Origins of Totalitarianism (New York: Harcourt, Inc, 1994).
⁴⁵ Luc Boltanski and Eve Chiapello, The New Spirit of Capitalism, trans. Gregory Elliott (London & New York: Verso, 2005).
⁴⁶ Nick Cater, Sudan the Roots of Famine: A Report for Oxfam (Oxford: Oxfam, 1986).
⁴⁷ Wolfgang Streeck, 'The Crises of Democratic Capitalism,' New Left Review, no. 71 (2011), 5-29.
⁴⁸ Jason E Smith, 'Nowhere to Go: Automation, Then and Now Part Two,' The Brooklyn Rail Available at: http://brooklynrail.org/2017/04/field-notes/Nowhere-to-Go-Automation-Then-and-Now-Part-Two.
⁴⁹ What follows is a digest of what is currently happening or being developed in different parts of the global South. Drawing these together gives a feel for the totalising cybernetic logic at work. Some indicative references include: Patrick Meier, Digital Humanitarians: How Big Data Is Changing the Face of Humanitarian Response (Boca Raton, London & New York: CRC Press, 2015); Mark Zuckerberg, 'Connecting the World from the Sky,' Facebook (internet.org - Everyone of us. Everywhere. Connected), Available at: http://internet.org/press/connecting-the-world-from-the-sky; Mat Honan, 'Facebook's Plan to Conquer the World - with Crappy Phones and Bad Networks,' no. 26 February (2014); WFP & Nielsen, 'Revolutionizing Data Collection: World Food Programme and Nielsen Use Scalable Mobile Surveys in Today's Changing Technological Landscape,' (2015); Philippe Bally et al., 'Remote Sensing and Humanitarian Aid: A Life-Saving Combination,' esa bulletin 122 (2005); Kevin P Donovan, 'Infrastructuring Aid: Materializing Social Protection in Northern Kenya,' in CSSR Working Paper No 333 (University of Cape Town: Centre for Social Science Research 2013); Simon Denny, 'Blockchain Future States: What Would a Future Geopolitics Powered by the Technology Behind Bitcoin Look Like?,' e-flux architecture, Available at: http://www.e-flux.com/architecture/superhumanity/68703/blockchain-future-states/; Hugo Slim, 'Eye Scan Therefore I Am: The Individualization of Humanitarian Aid,' European University Institute, Available at: http://iow.eui.eu/2015/03/eye-scan-therefore-i-am-the-individualization-of-humanitarian-aid/; Tom Scott-Smith, 'Humanitarian Neophilia: The 'Innovation Turn' and Its Implications,' Third World Quarterly 37, no. 12 (2016), 2229-2251.
⁵⁰ See Giorgio Agamben, ‘For a theory of destituent power’, Chronos 10, February 2014. Available at: http://www.chronosmag.eu/index.php/g-agamben-for-a-theory-of-destituent-power.html.
⁵¹ Graham Harman, Towards Speculative Realism: Essays and Lectures (Winchester: Zero Books, 2010), 180; see also Harman, Prince of Networks: Bruno Latour and Metaphysics (Melbourne, Australia: re:press, 2009), 127.
⁵² Roy Bhaskar, The Possibility of Naturalism: A Philosophical Critique of the Contemporary Human Sciences, 3rd ed. (Abingdon: Routledge, 1998), 7-8.
⁵³ See Bruno Latour, Graham Harman and Peter Erdélyi, The Prince and the Wolf: Latour and Harman at the LSE (Winchester: Zero Books, 2011).
⁵⁴ Ibid., 95.
⁵⁵ Latour, Reassembling the Social: An Introduction to Actor-Network-Theory (Oxford: Oxford University Press, 2005), 245.
⁵⁶ Latour et al, The Prince and the Wolf, 62.
⁵⁷ Ibid., 89.
⁵⁸ Anna L Tsing, The Mushroom at the End of the World: On the Possibility of Life in Capitalist Ruins (Princeton: Princeton University Press, 2015), 176.
⁵⁹ Alexis C Madrigal, ‘In Defense of Google Flu Trends’, The Atlantic, 27 March 2014. Available at: https://www.theatlantic.com/technology/archive/2014/03/in-defense-of-google-flu-trends/359688/.
⁶⁰ Sheldon Himelfarb, ‘Can Big Data Stop Wars Before They Happen?’, Foreign Policy, 25 April 2014.
⁶¹ John Karlsrud, ‘Peacekeeping 4.0: Harnessing the Potential of Big Data, Social Media, and Cyber Technologies’, in J-F Kremer and B Müller (eds) Cyberspace and International Relations: Theory, Prospects and Challenges (London: Springer, 2014), 141-160.

_________________________________________________________________

New big data narratives emerging from society

Miren Gutiérrez, Department of Communication, University of Deusto, Mundaiz Kalea, 50, 20012 San Sebastián, Spain, m.gutierrez@deusto.es

Published 23 Oct 2018

While working on a big data-based project on illegal fishing in western Africa in 2014, my colleagues and I at the Overseas Development Institute started putting together the rudiments of a data activist project that would include alliances, data-based research, cartography, lobbying, media outreach and an opportunistic report launch. Two years later, the maps that we published showed, for the first time, fishing vessels transferring fish at sea in areas where this operation is illegal.

The impact of these revelations was immediate: within days more than 150 media outlets, including The Guardian, CNN and BBC, had reported on the story, the Gambia prohibited foreign operations in its waters, and Namibia signed a Food and Agriculture Organisation agreement to prevent illegal fishing, as the report was recommending. The focus on visualising fish transhipments as a way to spot irregularities was replicated by other organisations (Global Fishing Watch, 2017). The evidence was sufficient to begin shifting the inaction on illegal fishing, and also got me thinking about the promise of data activism.

Our partner, FishSpektrum, had independently collected the data on which the illegal fishing project was based, despite the secrecy that conceals many fishing operations and agreements. Some of these datasets were public but not open, as they were buried under layers of red tape.

While I was working on this project, Stefania Milan invited me to lecture on data journalism at the University of Utrecht. In our conversations, we coined the term ‘data activism’ to refer to activism enabled (and constrained) by data. In an article we published in 2015, ‘Citizens’ Media Meets Big Data: The Emergence of Data Activism’, we linked data activism to citizens’ media and outlined the questions surrounding it (Milan and Gutiérrez 2015). This was the point of departure for data activism as a theory.

Based on these early ruminations, my new book Data activism and social change offers analysis that can be used to create new data endeavours, as well as to reinforce the theory spawned around the politics of big data from a bottom-up perspective. The difficulties in obtaining data for the illegal fishing project made me realise that data mining could be employed as a lens to generate one of the taxonomies of data activists offered in the book.

Theoretically, the manuscript is based on critical data studies (Baack, 2016; Braman, 2009; Cukier and Mayer-Schoenberger, 2013; Kennedy et al., 2016; Kitchin, 2014; Milan and Gutiérrez, 2015; Milan and van der Velden, 2016; Tufekci, 2014; van Dijck, 2014) as well as social movement, communication and media theory (Boyd and Crawford, 2012; Calhoun, 1992; Castells, 2009; della Porta, 2013; Downing, 2011; Goodwin et al., 2004; Habermas, 1996, 1984; Melucci, 1996; Rojas, 2015; van de Donk et al., 2004), among many others. Data for the analysis were obtained by blending empirical observation, semi-structured interviews and case studies of the manner in which people employ data in their daily practices to cooperate, engender datasets, create radical cartography, defy top-down narratives and analyses, join forces and act.

Types of data activism

Based on dozens of interviews and cases, the book offers a classification of data activists into skills transferrers, catalysts, data journalism producers, and full-blown data activists (see Figure 1). The forty-plus organisations and initiatives examined fall mainly under one of these four categories, although there is a great degree of hybridisation.

Figure 1: Classification of activists. Elaboration by the author.

Skills transferrers make data activism possible by responding to diverse challenges, building networks and bridging the gap between the skills-holders and the unskilled. They typically transmit data or social science skills (e.g. School of Data), create data platforms, and visualisation and analysis tools (e.g. Vizzuality), and trigger collaborative opportunities (e.g. Medialab-Prado). Meanwhile, catalysts usually provide the funds and resources to sustain data projects (e.g. Open Knowledge Foundation).

Data journalism producers often fill gaps left by journalistic organisations and pose opportunities for data projects. Journalism merits extra attention in the book since it has pioneered the communication of data-based information via visualisations and it is an entry point into activism for some reporters.

For example, in Spain, the economic crisis hitting media outlets, the low level of transparency and open data, and the discredit of Spanish journalism seem to have inspired some NGOs, such as Civio, to fill a gap in journalism. While remaining an advocacy group, Civio generates journalistic content supported by data, such as ‘España en llamas’ (Spain on fire), established by this organisation in collaboration with Goteo.org. This project shows where and when fires happen, quantifies the loss of life and forest, estimates the economic losses and the resources employed to put the fires out, and tells journalistic stories about whether the conflagrations were deliberate, showing patterns and connections.

In Latin America, where journalism is very prestigious, InfoAmazonia provides news and reports about the endangered Amazon region, based on the work of a network of organisations, journalists and citizens delivering updates from the nine countries of this forest. Latin America fosters some of the most innovative examples of data activism and journalism. The gradual access to the data infrastructure, the establishment of transparency laws in some countries, the availability of funds for data projects, and the prestige of journalism have fostered the emergence of organisations that depict themselves as journalistic endeavours, even if they offer not just journalistic outputs but also training and advocacy, as InfoAmazonia does.

Data activists are innovating with a wide range of action repertoires, giving birth to original projects, forging novel alliances, creating new datasets, and generating unconventional narratives and solutions.

Attributes of data activism

The way activists generate data is employed as a lens to discover action patterns and catalogue different cases. The study identifies five main ways in which activists generate datasets. Data activist projects can rely on whistle-blowers for data (e.g. International Consortium of Investigative Journalists) or resort to public and open datasets (e.g. the ‘Western Africa’s Missing Fish’ project). When the data are not accessible, they use crowdsourcing tools to collect citizen data (e.g. Ushahidi’s deployments), turn to appropriating data (e.g. via MobileMiner), or obtain data from primary research or from data-capturing devices, such as drones (e.g. WeRobotics).

Three of the most frequent qualities of data activism are the inclination to collaborate and generate alliances to tackle big datasets and large-scale causes, to make maps when visualising data, and to hybridise. Data activists have no qualms about crossing the lines separating campaigning, funding, research, training, journalism, media work and humanitarianism. ‘Vagabundos de la chatarra’ (Scrap drifters) (Carrion and Sagar, 2015), for example, mixes data-driven maps, videos, comics journalism and advocacy to tell the story of the hundreds of people who survived the economic crisis in Barcelona in 2013 by collecting and selling metal scraps.

Maps shape a speciality of data activism which I have dubbed geoactivism. Geoactivists typically employ critical cartography to produce analysis and communication tools to engage people, denounce abuse, produce counter-maps and coordinate action. The book explores some of the platforms that offer cartographic services to data projects with a social goal, such as CARTO, Kiln Data Visualisation, Populate and OpenStreetMap.

The text also examines other attributes of data activism. For example, a data activist organisation that is dedicated to transferring data skills, provides resources to support data projects, occasionally generates data visualisations, works in alliances and often provides match-making opportunities to deliver data projects can be categorised as a skills transferrer that also has something of the catalyst. This is the case of DataKind, which specialises in deploying data scientists within NGOs to work alongside advocates on social causes.

The birth of digital humanitarianism

The leading case study is Ushahidi (‘testimony’ in Swahili), created in 2008 amid an information shutdown in Kenya to map the post-electoral violence reported via email and text messages by surviving victims and relatives (Keim, 2012). The Haiti deployment in 2010 to map the earthquake in quasi-real-time was a turning point in humanitarianism, originating digital humanitarianism (Meier, 2015). Today, Ushahidi’s platform allows interactive mapping and is widely used in emergency situations to support humanitarian assistance based on citizen data. The Ushahidi platform facilitates the crowdsourcing, verification and visualisation of data, which are transformed into actionable information to be used by humanitarian agencies and people for decision-making.

Ushahidi is employed in this study to illustrate data activism in action, as a geoactivist organisation with some elements of the skills transferrer, which obtains data mainly via its crowdsourcing platform. The book explores the action repertoires characteristic of several Ushahidi deployments (i.e. crisis mapping, crowdsourced data, geospatial platforms, integrated mobile applications, aerial and satellite imagery, and computational and statistical models for data verification), as well as their networking strategies, controversies and criticism, and lessons learnt. The study reveals the asymmetries that exist among actors: deployers (i.e. the digital humanitarians) launch the application from remote locations and become gatekeepers of the map, the reporters (i.e. the citizens) provide the data and use the map, while the humanitarian workers assist victims.

Ushahidi deployments have disrupted orthodox humanitarianism when they incorporate citizens in data-generation processes, offer unconventional narratives around crises and propose alternatives to conventional humanitarianism (Bernholz et al., 2010; Meier, 2016, 2015). By doing so, digital humanitarianism explicitly addresses the politics of data, questioning data availability and agency, and the associated top-down narratives, inviting people to produce their datasets and to shape issues, ultimately empowering so-called victims, who become data-generators and decision-makers.

So, what about data activism?

A model for effective data activism is offered as a theoretical tool to examine other cases of data activism beyond this study or to help design other initiatives. Data activist endeavours hybridise business models, contents, repertoires of action, organisational structures, activities and objectives; their proactivity facilitates collaboration, which also allows them to tackle big issues and datasets; and although they are not confrontational, they can look like a social movement when they employ unorthodox methods, foster adaptable network structures and are based on shared values.

Colossal amounts of data are generated every day, and entire professions, companies and industries are devoted to gathering, hoarding, visualising and transforming data into value. Vast amounts of words are uttered to praise the beauty of data-driven decisions; data activists are concerned, instead, about making impact-driven decisions. For most of them, the data infrastructure is a great tool to achieve their goals.

References - in preparation

Baack S (2016) Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism. Big Data & Society 2(2): 1–11.

Bauman Z and Lyon D (2012) Liquid Surveillance. Cambridge, Malden: Polity Press.

Bernholz L, Skloot E and Varela B (2010) Disrupting Philanthropy: Technology and the Future of the Social Sector. Center for Strategic Philanthropy and Civil Society Sanford School of Public Policy Duke University. Available at: http://cspcs.sanford.duke.edu/sites/default/files/DisruptingPhil_online_FINAL.pdf (accessed 14 August 2018).

Berry DM (2011) The computational turn: Thinking about the digital humanities. Culture machine 12: 1-22. Available at: www.culturemachine.net/index.php/cm/article/download/440/470 (accessed 14 August 2018).

Boyd D and Crawford K (2012) Critical questions for big data. Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679.

Braman S (2009) Change of state: Information, policy, and power. Cambridge, MA: MIT Press.

Brevini B, Hintz A and McCurdy C (2013) Beyond WikiLeaks: implications for the future of communications, journalism and society. Basingstoke: Palgrave Macmillan.

Calhoun C (1992) Habermas and the Public Sphere. Cambridge, MA: MIT Press.

Carrion J and Sagar (2015) Los Vagabundos de la Chatarra. Barcelona: Norma.

Castells M (2009) Communication Power. Oxford: Oxford University Press.

Chandler I (2013). Advocacy and campaigning. How to Guide. The Pressure Group Bond for International Development. Available at: www.bond.org.uk/data/files/resources/45/Advocacy-and-campaigning-How-To-guide-December-2013.pdf (accessed 14 August 2018).

Cukier K and Mayer-Schoenberger V (2013) The Rise of Big Data: How It's Changing the Way We Think about the World. Foreign Affairs 92(3): 28–40.

Deibert R (2010) After WikiLeaks, a New Era. The New York Times. Available at: www.nytimes.com/roomfordebate/2010/12/09/what-has-wikileaks-started/after-wikileaks-a-new-era (accessed 14 August 2018).

della Porta D (2013) Can democracy be saved? Cambridge, Malden: Polity Press.

Downing JDH (ed) (2011) Encyclopedia of Social Movement Media. Thousand Oaks: Sage Publications Inc.

Froomkin M (2003) Habermas@Discourse.net: Toward a Critical Theory of Cyberspace. Harvard Law Review 116(3).

Gangadharan SP (2012) Digital inclusion and data profiling. First Monday 17(5). Available at: http://firstmonday.org/article/view/3821/3199 (accessed 14 August 2018).

Global Fishing Watch (2017) The Global View of Transshipment. Skytruth. Available at: http://globalfishingwatch.org/wp-content/uploads/GlobalViewOfTransshipment_Aug2017.pdf (accessed 14 August 2018).

Goodwin J, Jasper JM and Polletta F (2004). Emotional Dimensions of Social Movements. In: Snow DA, Soule SA and Kriesi H (eds). The Blackwell Companion to Social Movements. Malden, Oxford, Carlton: Wiley-Blackwell.

Habermas J (1984) The theory of communicative action. Boston: Beacon Press.

---. (1991) The Structural Transformation of the Public Sphere: An Inquiry Into a Category of Bourgeois Society. Cambridge MA: MIT Press.

---. (1996) Between Facts and Norms: Contributions to a Discourse Theory of Law and Democracy. Cambridge MA: MIT Press.

Hellerstein J (2008) The Commoditization of Massive Data Analysis. Radar O'reilly. Available at: http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html (accessed 14 August 2018).

Keim B (2012) Open Source for Humanitarian Action. Stanford Social Innovation Review. Available at: https://ssir.org/articles/entry/open_source_for_humanitarian_action (accessed 14 August 2018).

Kennedy H, Poell T and van Dijck J (eds) (2016) Data and agency. Big Data & Society Available at: http://journals.sagepub.com/doi/10.1177/2053951715621569 (accessed 14 August 2018).

Kitchin R (2014) Big Data, new epistemologies and paradigm shifts. Big Data & Society Available at: https://doi.org/10.1177/2053951714528481 (accessed 14 August 2018).

Mann S, Nolan J and Wellman B (2002) Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments. Surveillance & Society 1(3): 331–355.

Mayer-Schönberger V and Cukier K (2013) Big data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt.

Meier P (2015) Digital humanitarians: how big data is changing the face of humanitarian response. Boca Raton, London, New York: CRC Press/Taylor & Francis Group.

---. (2016) Crisis Maps: Harnessing the Power of Big Data to Deliver Humanitarian Assistance. Forbes. Available at: www.forbes.com/sites/skollworldforum/2013/05/02/crisis-maps-harnessing-the-power-of-big-data-to-deliver-humanitarian-assistance/#7f4e729115c7 (accessed 14 August 2018).

Melucci A (1996) Challenging Codes: Collective Action in the Information Age. Cambridge: Cambridge University Press.

Milan S and Gutiérrez M (2015) Citizens' media meets Big Data: The emergence of data activism. Mediaciones 11(14): 120-133. Available at: http://revistas.uniminuto.edu/index.php/med/article/view/1086 (accessed 14 August 2018).

Milan S and Hintz A (2013) Networked Collective Action and the Institutionalized Policy Debate: Bringing Cyberactivism to the Policy Arena? Policy & Internet 5: 7-26.

Milan S and van der Velden L (2016) The alternative epistemologies of data activism. Digital Culture & Society 2(2): 57-74.

Naughton J (2018) Magical thinking about machine learning won’t bring the reality of AI any closer. The Guardian. Available at: www.theguardian.com/commentisfree/2018/aug/05/magical-thinking-about-machine-learning-will-not-bring-artificial-intelligence-any-closer (accessed 14 August 2018).

O’Neil C (2016) Weapons of Math Destruction: How big data increases inequality and threatens democracy. New York: Crown Publishers.

Rojas F (2015) Big Data and Social Movement Research. Mobilizing Ideas. Available at: https://mobilizingideas.wordpress.com/2015/04/02/big-data-and-social-movement-research/ (accessed 14 August 2018).

Smith A (2018) Franken-algorithms: the deadly consequences of unpredictable code. The Guardian. Available at: www.theguardian.com/technology/2018/aug/29/coding-algorithms-frankenalgos-program-danger (accessed 14 August 2018).

Tufekci Z (2014) Engineering the public: Internet, surveillance and computational politics. First Monday 19(7). Available at: http://firstmonday.org/article/view/4901/4097 (accessed 14 August 2018).

van de Donk W, Loader BD, Nixon PG and Rucht D (2004) Cyberprotest: New Media, Citizens and Social Movements. London, New York: Routledge.

van Dijck J (2014) Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society 12(2): 197–208.

_________________________________________________________________

Hallam Stevens (Nanyang Technological University), Lyle Fearnley (Singapore University of Technology), Shirley Sun (Nanyang Technological University) and Sara Watson (Harvard University) reflect on a workshop, Big Data in Asian Society, held at the Nanyang Technological University, Singapore, from 27-28 October 2016.

Published: 6 June 2017

Ground/Cloud: co-presence of paper and digital data systems in rural India.

Source/Credit: Sandeep Mertia

Big Data in Asia: Provocations and Potentials

Social, cultural, and critical studies of “big data” have now firmly established themselves as vital fields of scholarship. Despite this proliferation of work, relatively little attention has been given to understanding differential effects of big data on various regions, nations, or groups. For the most part, studies of the “effects” of big data have either explicitly or implicitly attended to the Global North, or treated the effects of data as more or less homogeneous across place and space.

As big data spread outwards from metropoleis, however, there is an increasing need to attend to how its effects manifest across different cultures, different linguistic communities, different political systems, different socio-economic groups, and different geographic configurations. Payal Arora’s (2016) analysis of what she calls the “bottom of the big data pyramid” shows that many Western-driven big data initiatives directed at the Global South make the world’s poorer communities more vulnerable to regimes of surveillance and more subject to “marketization” and other forms of capitalist exploitation. Although big data initiatives for the Global South are often framed in terms of empowerment, Arora calls for greater skepticism towards how these data regimes actually play out in these contexts.

The sparse attention to the increasingly diverse effects of big data motivated us to organize a workshop under the title “Big Data in Asian Society” (Nanyang Technological University, 27-28 October 2016). In Asia, big data has begun to be recognized as a significant economic and political force. The Singapore government appointed its first “chief data scientist” in 2014, promising to develop the nation’s capacity for data analysis to improve service delivery in fields such as health care and transport. In China, Web businesses such as Baidu, Alibaba, and Tencent are already massive data-owners and are expanding globally while investing heavily in big data mining and analysis research (Swanson 2015; Marr 2017). Many cities across Asia (such as Songdo, South Korea) hope to draw on the power of big data to become “smart cities” (Halpern 2015).

“Asia” is a particularly good site within which to examine the diversity of big data as object and practice. The geographic, linguistic, political, socio-economic, and cultural heterogeneity of Asia poses an immediate challenge to the notion of big data as a global-universal currency. Nevertheless, Asia may be cohesive enough, as a region, to support useful generalizations as well as comparative work. Here, we draw on notions of “Asia as method” (Chen 2010) to suggest that studying Asia requires new frames of reference that take account of the region’s unique, yet interconnected, languages, histories, cultures, and politics.

Some of the questions animating our workshop included: What kinds of uses does data find in the various social and political contexts of Asia? Do the risks and potentials of big data look the same in these different contexts? What happens when structures for organizing and analyzing data get imported into different social and cultural contexts? What might we gain from a comparative approach to studying big data? Where and how does data flow across and between various regions? Who are the generators and users of big data in and from Asia?

Our workshop involved only some first steps in the investigation of these questions. Nevertheless, our discussions generated six provocations that we believe will be critical for further work on this topic. The remainder of this commentary describes these provocations, suggesting how they might be useful for expanding the global reach of studies of big data and society.

1. Pay Attention to Who is Represented in/by Big Data

As big data expand their reach, the representativeness of those data becomes increasingly important. Data have the potential to define the range of the “normal” in a variety of contexts; if particular populations, groups, or regions, are left out of data sets, individuals and groups may be cast as “outsiders” and “outliers.” Such “outsider” status could have several kinds of effects: it might render some groups unable to reap the benefits of big data, thereby entrenching new kinds of inequalities; it might render some groups increasingly socially and politically “invisible” or “illegible.” Increasingly, representativeness is not merely a matter of collecting more data in different places in the same way. In many cases, especially in the Global South, it will mean finding new ways to collect and analyze data too.

This need for representativeness has become most pressing in the context of biomedicine. Shirley Sun (Nanyang Technological University) spoke at the workshop about the efforts of the Pan-Asian SNP Consortium to expand the diversity of genomic data beyond narrow (largely western and largely metropolitan) populations. Biologists and medical practitioners, especially those working in non-western contexts, have pointed out that the findings of genomic medicine (based on non-representative datasets) may be irrelevant or even harmful when applied to non-western patients. Efforts such as the Pan-Asian SNP Consortium are attempting to redress this non-representativeness and ensure that non-western populations are not left out of genomic medicine. At the same time, Sun warned that such efforts also inadvertently contribute to the racialization of medicine by suggesting the need for a separate or unique medicine for Asians (Sun 2017).

2. Pay Attention to Who is Benefitting From Big Data (and who is most risked)

Sara Watson (Digital Asia Hub) reminded the workshop participants about the underlying corporate dynamics of big data. “Big data” is a term driven by business world hype and tech industry marketing even to the extent that the language of big data (“mining,” “refining”) reflects industrial value-extraction processes (Watson 2016). Through big data, companies hope to forge new kinds of resources, markets and public-private partnerships. This poses a particular set of challenges in non-western contexts. This goes beyond the problem of new “digital divides” (boyd and Crawford 2012) in a number of ways.

First, there is the danger that the financial benefits of big data flow disproportionately towards the Global North. Increasingly, businesses looking to gather more and wider data (such as Facebook and Google) are looking towards Asia (Dalton 2016; Russell 2015). Data is already becoming a resource that is increasingly aggregated and monetized within a few global centers.

Second, the benefits of big data come with substantial risks. Particularly salient are the risks of breaches of privacy and anonymity. Such risks are not likely to be understood or appreciated in the same ways everywhere. Attitudes towards privacy are far from globally uniform; nor are the stakes of privacy equal for everyone. Given that, how can such risks be assessed adequately and distributed equally?

Third, the corporate dynamics of big data also risk concentrating skills and expertise associated with them outside Asia. Those who do the work of building systems and infrastructures establish enduring categories, standards, and practices. It is critical that representativeness in big data extend not only to consideration of who is represented, but also to who is doing the work of big data (building data infrastructures, data analysis, building apps, and so on). Making data work wholly representative requires building inclusivity into the front end of making and working with data.

Kaushik Sunder Rajan has written about the global expansion of pharmaceutical trials to India. He argues that a fundamental injustice arises when those whose bodies are risked in trials are not the same individuals who stand to gain from the benefits of new treatments (Rajan 2010). The potential with data may be an equivalent one: those who are placed at risk through data collection are not necessarily the same persons who stand to gain from the aggregations of data.

3. History Matters

As we examine big data in different spaces, it is not only social, cultural, linguistic, political, and economic contexts that matter. History plays a critical role too. Institutions for collecting, storing, managing, processing, analyzing, and distributing data do not emerge from thin air. Rather, such institutions have histories which are going to affect data practices as well as attitudes towards data collection and data use. Particularly in postcolonial contexts, the histories of colonial data collection have critical implications for how local populations understand and respond to big data.

At the workshop, Arunabh Ghosh (Harvard University) gave us a glimpse of the history of statistics in twentieth-century China. Under the Chinese Communist Party, the statistical bureau aimed to count every aspect of the Chinese economy and society, attempting to mobilize this “complete” account for the purposes of centralized planning. Such methods, eschewing sampling and probability, were based on a purported one-to-one correspondence between the statistics and the reality on the ground. Such pre-big data big data meant that very different amounts and kinds of social and economic data were available in and about Communist China. More importantly, however, it suggests how those data belong to specific regimes of data practice – they were collected, aggregated, and used in particular ways for particular political purposes. Big data practices in present-day China necessarily sit against the background of a longer history of data positivism, data for state planning, and notions of “statistics-as-reality.” Such legacies are not easily shed.

Historians have long been sensitive to the fact that the stories they can tell are directly dependent on their data (usually in the form of archives). In Asia, particularly, the colonial and wartime legacies of archives form an important baseline for historical interpretation. Ann Stoler argues that archives are not collections merely to be “mined,” but rather they are “cultural artifacts of fact production, of taxonomies in the making, and of disparate notions of what made up colonial authority” (Stoler 2002). In other words, the “data” in archives can never be divorced from the social and political conditions of its production; such conditions will always influence possible narratives, especially when it comes to colonial regimes. The same applies to other forms of data, whether collected recently or in the past – provenance matters in what we can do and make with data.

4. (Infra)Structures Matter

We already know that big data are not raw or neutral, that they are always collected for various purposes, and that these purposes affect downstream uses (Gitelman ed. 2013). Attending to this “situatedness” of big data is even more important in a global context. The structures and institutions through which big data is collected, stored, managed, and analyzed encode particular kinds of values into that data. These values are not global or universal values. This becomes important especially when data is moved around, imported, exported, shared, and used in different contexts.

Sandeep Mertia (Sarai-CSDS) gave us an account of his ethnographic work in rural India, where numerous government and non-government agencies are attempting to collect data about local populations. Here, collecting data runs up against the practical difficulties not only of translation into local languages, but also such concerns as keeping data-collection tablets charged in areas with scarce electricity. As data gets “mined” from local registries and recorded onto paper, then into tablet-based forms that are uploaded to centralized databases, data collectors need to find ways of making local data into globally mobile data. This relies on communication infrastructures as well as the ability to mesh local categories and systems into standardized forms. Making data travel depends on local customs, facilities, and infrastructure. Understanding the possible meanings of data collected in this way will require this kind of detailed attention to the effects of these local structures and infrastructures (including Internet and other communications infrastructures, transport infrastructures, power and electrical infrastructures, and so on).

Data here might better be thought of as something produced through processes of negotiation and translation, rather than something liquid. The “meaning” of data collected on paper in local Indian villages is not the same as the “meaning” of data entered into a tablet or in a spreadsheet representing hundreds of villages. What “data” are varies with place, time, and purpose, and the structures and media in which they exist and move always set important limits on what can be done with them. From this point of view, analysis of big data must draw not only on the literature on the ethnography of infrastructure (Star 1999) but also on scholarship on the dissemination of facts and knowledge (Howlett and Morgan, eds. 2011).

5. Big Data is Not Necessarily the Best Use of Resources

Investments in big data, and the infrastructure for collecting and using it, are often justified on the grounds that it will provide efficiencies and save money (Mayer-Schönberger and Cukier 2013). But again, attention to varied contexts suggests that investments in big data may not always be the best way to solve local problems. In many cases, simpler, smaller-scale, less costly solutions may be far more effective. For the Global South, it may be particularly important to argue against “big data” discourses of efficiency and cheapness. Advocates of big data suggest that “geo-locating a rural African farmer working in his farm with the help of an app installed in his cellphone, identifying the soil type and needs of the field, and offering advice regarding appropriate seeds, where they can be purchased, and how they can be planted and harvested is not far in the future” (Kshetri 2014). However, it is far from clear who would pay for the infrastructure to implement such schemes. Moreover, it is unclear whether such scenarios would benefit farmers equally or how data privacy would be respected.

At the workshop, we learned about the massive growth in air travel within Asia as low-cost carriers appeal to lower-income customers. Max Hirsh (Hong Kong University) explained how airport authorities and urban planners – faced with overwhelming growth – have looked toward high-tech, big data solutions. However, in many cases this has produced a large amount of “data we don’t need” (such as data about restroom cleanliness). On the other hand, more straightforwardly useful data about passengers that is collected by airlines is routinely discarded or ignored since it does not emerge from high-tech, automated systems. In what Hirsh labels “middle-tech solutions,” data that already exist can be combined with existing infrastructures to find far more effective and efficient solutions to local problems (Hirsh 2016).

6. More Sharing Does Not Necessarily Mean More Openness

Big data is also often sold as the key to openness and transparency. Sharing data will also generate efficiencies, we are told, by increasing governmental and bureaucratic openness in particular (for example see Open Data Government 2017). This narrative resonates with western liberal ideas of democracy and free markets undergirded by free press and freedom of information. In autocratic or quasi-democratic contexts, however, the connection between sharing data and openness is not so straightforward. In fact, the rhetoric of transparency around data may create the appearance of openness in ways that actually foreclose further debate.

Hallam Stevens (Nanyang Technological University) offered an analysis of the Singapore government website data.gov.sg. Although data from many government agencies are shared via data.gov.sg, there is no guarantee that such data is complete and little information about how it was collected. Moreover, although the website encourages citizens to utilize the data, legitimate and illegitimate uses are carefully prescribed. As such, many of the “apps” developed via data.gov.sg are directed towards surveillance, citizen self-policing, and consumerism. Rather than challenging government aims and ideologies, the ways in which government data is actually deployed and used reinforce already dominant narratives within Singaporean society.

This is consonant with the findings of Levy and Johns (2016), who argue that, in certain contexts, transparency can be “weaponized” to hamstring democratic governance. In biology, too, regimes of open data that emerged in genomics in the 1990s are being rethought in light of concerns about privacy, ethics, and justice within the health care system (Reardon et al. 2016). The fact that data is open does not guarantee that the practices attached to it will be democratic or free or just (see Ruppert 2015).

Future Work: Models of Data

Big data is a space in which new kinds of expertise and new claims of authority are rapidly emerging. As we analyze these new forms of power, we must pay critical attention to their differential effects across space and place. One way of doing this is to consider different kinds of metaphors or models for understanding data. Thinking of data as a manufactured product (something actively produced by work) rather than a resource (to be mined or exploited) could lead to different possibilities for their circulation and use. Thinking of data not as free-flowing or liquid, but as negotiated and translated, changes the value and meaning of moving data around. Thinking of data in terms of rights or responsibilities (to privacy and anonymity) rather than in terms of markets challenges the presumptions of “open data.” Thinking of data as situated and contextual (in space and in history) rather than universal can help to suggest the limitations of particular data regimes.

Many of the practices and structures of big data are now being imported from the Global North into Asia. But as we have suggested here, many of the potentials and risks of big data are quite different in these different contexts. Understanding data (and its consequences) in these diverse contexts requires that we develop and apply different models and metaphors. Since big data remains an emerging phenomenon in Asia, scholars have an opportunity to make important critical interventions. Both the diversity and interconnectedness of Asia make it useful as a method for thinking with different models of what data might be and what it might be for. Comparisons both within and beyond Asia can illuminate the need for attention to diverse views, contexts, and histories in big data. This will require expansive thinking that brings social scientists, humanists, and artists together with data scientists and policy makers to address some of the questions and challenges raised here.

References

Arora, Payal (2016) “Bottom of the Big Data Pyramid: Big Data and the Global South” International Journal of Communications 10: 1681-1699.

boyd, danah and Kate Crawford (2012) “Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon” Information, Communication & Society 15, no. 5: 662-679.

Chen, Kuan-Hsing (2010) Asia as Method: Toward Deimperialization. Durham, NC: Duke University Press.

Dalton, Andrew (2016) “Google and Facebook Team Up on a Direct Connection to Asia” Engadget, 12th October. https://www.engadget.com/2016/10/12/google-facebook-direct-connection-to-asia/

Gitelman, Lisa, ed. (2013) Raw Data is an Oxymoron. MIT Press.

Halpern, Orit (2015) Beautiful Data: A History of Vision and Reason since 1945. Duke University Press.

Hirsh, Max (2016) Airport Urbanism: Infrastructure and Mobility in Asia. University of Minnesota Press.

Howlett, Peter and Mary S. Morgan, eds. (2011) How Well Do Facts Travel? The Dissemination of Reliable Knowledge. Cambridge University Press.

Kshetri, Nir (2014) “The emerging role of Big Data in key development issues: Opportunities, Challenges and Concerns” Big Data & Society (18 December). http://journals.sagepub.com/doi/full/10.1177/2053951714564227

Levy, Karen E.C. and David M. Johns (2016) “When Open Data is a Trojan Horse: The Weaponization of Transparency in Science and Government” Big Data & Society. DOI: 10.1177/2053951715621568.

Marr, Bernard (2017) “How Chinese Internet Giant Baidu Uses AI and Machine Learning” Forbes, 13th February. https://www.forbes.com/sites/bernardmarr/2017/02/13/how-chinese-internet-giant-baidu-uses-ai-and-machine-learning/#7456b471776f

Mayer-Schönberger, Viktor and Kenneth Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt.

Open Data Government (2017) Website. https://opengovernmentdata.org/

Rajan, Kaushik S. (2010) “The Experimental Machinery of Global Clinical Trials: Case Studies From India” In: Asian Biotech: Ethics and Communities of Fate. Aihwa Ong and Nancy N. Chen, eds. Duke University Press: pp. 55-80.

Reardon, J. et al. (2016) “Bermuda 2.0: Reflections from Santa Cruz” Gigascience 5, no. 1: 1-4.

Ruppert, Evelyn (2015) “Doing the Transparent State: Open Government Data as Performance Indicators” In: A World of Indicators: The Making of Governmental Knowledge Through Quantification. R. Rottenberg, S. E. Merry, S-J. Park, and J. Mugler, eds. Cambridge University Press: pp. 127-150.

Russell, Jon (2015) “Google Expands Its Data Centers in Asia as Millions Come Online for First Time” TechCrunch, 2nd June. https://techcrunch.com/2015/06/02/google-expands-its-data-centers-in-asia-as-millions-come-online-for-first-time/

Star, Susan L. (1999) “The Ethnography of Infrastructure” American Behavioral Scientist 43, no. 3: 377-391.

Stoler, Ann L. (2002) “Colonial archives and the arts of governance” Archival Science 2: 87-109.

Sun, Shirley H. (2017) Socio-Economics of Personalized Medicine in Asia. New York: Routledge.

Swanson, Ana (2015) “How Baidu, Tencent, and Alibaba are leading the way in China’s Big Data Revolution” South China Morning Post, 25th August. http://www.scmp.com/tech/innovation/article/1852141/how-baidu-tencent-and-alibaba-are-leading-way-chinas-big-data

Watson, Sara M. (2016) “Data is the new ‘___’” Dis magazine. http://dismagazine.com/discussion/73298/sara-m-watson-metaphors-of-big-data/