Essays and Provocations

The Journal invites short essays, provocations and blogs on topics relevant to the study of Big Data practices.
_________________________________________________________________

New big data narratives emerging from society

Miren Gutiérrez, Department of Communication, University of Deusto, Mundaiz Kalea, 50, 20012 San Sebastián, Spain, m.gutierrez@deusto.es

Published 23 Oct 2018

While working on a big data-based project on illegal fishing in western Africa in 2014, my colleagues and I at the Overseas Development Institute started putting together the rudiments of a data activist project that would include alliances, data-based research, cartography, lobbying, media reach work and an opportunistic report launch. Two years later, the maps that we published presented for the first time fishing vessels transferring fish at sea in areas where this operation is illegal.

The impact of these revelations was immediate: within days more than 150 media outlets, including The Guardian, CNN and BBC, had reported on the story, the Gambia prohibited foreign operations in its waters, and Namibia signed a Food and Agriculture Organisation agreement to prevent illegal fishing, as the report recommended. The focus on visualising fish transhipments as a way to spot irregularities was replicated by other organisations (Global Fishing Watch, 2017). The evidence was sufficient to begin shifting the inaction on illegal fishing, and also got me thinking about the promise of data activism.

Our partner, FishSpektrum, had independently collected the data on which the illegal fishing project was based despite the secrecy that conceals many fishing operations and agreements. Some of these datasets were public, but not open as they were buried under layers of red tape.

While I was working on this project, Stefania Milan invited me to lecture on data journalism at the University of Utrecht. In our conversations, we coined the term ‘data activism’ to refer to activism enabled (and constrained) by data. In an article we published in 2015, ‘Citizens’ Media Meets Big Data: The Emergence of Data Activism’, we linked data activism to citizens’ media and outlined the questions surrounding it (Milan and Gutiérrez 2015). This was the point of departure for data activism as a theory.

Based on these early ruminations, my new book Data activism and social change offers analysis that can be used to create new data endeavours, as well as to reinforce the theory spawned around the politics of big data from a bottom-up perspective. The difficulties in obtaining data for the illegal fishing project made me realise that data mining could be employed as a lens to generate one of the taxonomies of data activists offered in the book.

Theoretically, the manuscript is based on critical data studies (Baack, 2016; Braman, 2009; Cukier and Mayer-Schoenberger, 2013; Kennedy et al., 2016; Kitchin, 2014; Milan and Gutiérrez, 2015; Milan and van der Velden, 2016; Tufekci, 2014; van Dijck, 2014) as well as social movement, communication and media theory (Boyd and Crawford, 2012; Calhoun, 1992; Castells, 2009; della Porta, 2013; Downing, 2011; Goodwin et al., 2004; Habermas, 1984, 1996; Melucci, 1996; Rojas, 2015; van de Donk et al., 2004), among many others. Data for the analysis were obtained by blending empirical observation, semi-structured interviews and case studies of the manner in which people employ data in their daily practices to cooperate, engender datasets, create radical cartography, defy top-down narratives and analyses, join forces and act.

Types of data activism

Based on dozens of interviews and cases, the book offers a classification of data activists into skills transferrers, catalysts, data journalism producers, and full-blown data activists (see Figure 1). The forty-plus organisations and initiatives examined fall mainly under one of these four categories, although there is a great degree of hybridisation.


Figure 1: Classification of activists. Elaboration by the author.

Skills transferrers make data activism possible by responding to diverse challenges, building networks and bridging the gap between the skills-holders and the unskilled. They typically transmit data or social science skills (e.g. School of Data), create data platforms, and visualisation and analysis tools (e.g. Vizzuality), and trigger collaborative opportunities (e.g. Medialab-Prado). Meanwhile, catalysts usually provide the funds and resources to sustain data projects (e.g. Open Knowledge Foundation).

Data journalism producers often fill gaps left by journalistic organisations and create opportunities for data projects. Journalism merits extra attention in the book since it has pioneered the communication of data-based information via visualisations and is an entry point into activism for some reporters.

For example, in Spain, the economic crisis hitting media outlets, the low level of transparency and open data, and the discredit of Spanish journalism seem to have inspired some NGOs, such as Civio, to fill a gap in journalism. While remaining an advocacy group, Civio generates journalistic content supported by data, such as ‘España en llamas’ (Spain on fire), established by this organisation in collaboration with Goteo.org. This project shows where and when fires happen, quantifies the loss of life and forest, estimates the economic losses and the resources employed to put the fires out, and tells journalistic stories about whether the conflagrations were deliberate, showing patterns and connections.

In Latin America, where journalism is very prestigious, InfoAmazonia provides news and reports about the endangered Amazon region, based on the work of a network of organisations, journalists and citizens delivering updates from the nine countries of this forest. Latin America fosters some of the most innovative examples of data activism and journalism. The gradual access to the data infrastructure, the establishment of transparency laws in some countries, the availability of funds for data projects, and the prestige of journalism have fostered the emergence of organisations that depict themselves as journalistic endeavours, even if they offer not just journalistic outputs but also training and advocacy, as InfoAmazonia does.

Data activists are experimenting with a wide range of action repertoires, giving birth to innovative projects, forging novel alliances, generating new datasets, and producing unconventional narratives and solutions.

Attributes of data activism

The way activists generate data is employed as a lens to discover action patterns and catalogue different cases. The study identifies five main ways in which activists generate datasets. Data activist projects can rely on whistle-blowers for data (e.g. International Consortium of Investigative Journalists) or resort to public and open datasets (e.g. the ‘Western Africa’s Missing Fish’ project). When the data are not accessible, they use crowdsourcing tools to collect citizen data (e.g. Ushahidi’s deployments), turn to appropriating data (e.g. via MobileMiner), or obtain data from primary research or from data-capturing devices, such as drones (e.g. WeRobotics).

Three qualities are especially frequent in data activism: the inclination to collaborate and generate alliances to tackle big datasets and large-scale causes, the tendency to make maps when visualising data, and the propensity to hybridise. Data activists have no qualms about crossing the lines separating campaigning, funding, research, training, journalism, media work and humanitarianism. ‘Vagabundos de la chatarra’ (Scrap drifters) (Carrion and Sagar, 2015), for example, mixes data-driven maps, videos, comics journalism and advocacy to tell the story of the hundreds of people who survived the economic crisis in Barcelona in 2013 by collecting and selling metal scraps.

Maps shape a speciality of data activism which I have dubbed geoactivism. Geoactivists typically employ critical cartography to produce analysis and communication tools to engage people, denounce abuse, produce counter-maps and coordinate action. The book explores some of the platforms that offer cartographic services to data projects with a social goal, such as CARTO, Kiln Data Visualisation, Populate and OpenStreetMap, among others.

The text also examines other attributes of data activism. For example, a data activist organisation that is dedicated to transferring data skills, provides resources to support data projects, occasionally generates data visualisations, works in alliances and often provides match-making opportunities to deliver data projects can be categorised as a skills transferrer that also has something of a catalyst. This is the case of DataKind, which specialises in deploying data scientists within NGOs to work alongside advocates on social causes.

The birth of digital humanitarianism

The leading case study is Ushahidi (‘testimony’ in Swahili), created in 2008 amid an information shutdown in Kenya to map the post-electoral violence reported via email and text messages by surviving victims and relatives (Keim, 2012). The Haiti deployment in 2010 to map the earthquake in quasi-real-time was a turning point in humanitarianism, originating digital humanitarianism (Meier, 2015). Today, Ushahidi’s platform allows interactive mapping that is widely used in emergency situations to support humanitarian assistance based on citizen data. The Ushahidi platform facilitates the crowdsourcing, verification and visualisation of data, which are transformed into actionable information to be used by humanitarian agencies and people for decision-making.

Ushahidi is employed in this study to illustrate data activism in action, as a geoactivist organisation with some elements of the skills transferrer, which obtains data mainly via its crowdsourcing platform. The book explores the characteristic action repertoires of several Ushahidi deployments (i.e. crisis mapping, crowdsourced data, geospatial platforms, integrated mobile applications, aerial and satellite imagery, and computational and statistical models for data verification), as well as their networking strategies, controversies and criticism, and lessons learnt. The study reveals the asymmetries that exist among actors: deployers (i.e. the digital humanitarians) launch the application from remote locations and become gatekeepers of the map, the reporters (i.e. the citizens) provide the data and use the map, while the humanitarian workers assist victims.

Ushahidi deployments have disrupted orthodox humanitarianism when they incorporate citizens in data-generation processes, offer unconventional narratives around crises and propose alternatives to conventional humanitarianism (Bernholz et al., 2010; Meier, 2016, 2015). By doing so, digital humanitarianism explicitly addresses the politics of data, questioning data availability and agency, and the associated top-down narratives, inviting people to produce their datasets and to shape issues, ultimately empowering so-called victims, who become data-generators and decision-makers.

So, what about data activism?

A model for effective data activism is offered as a theoretical tool to examine other cases of data activism beyond this study or to help design other initiatives. Data activist endeavours hybridise business models, contents, repertoires of action, organisational structures, activities and objectives; their proactivity facilitates collaboration, which also allows them to tackle big issues and datasets; and although they are not confrontational, they can look like a social movement when they employ unorthodox methods, foster adaptable network structures and are based on shared values.

Colossal amounts of data are generated every day, and entire professions, companies and industries are devoted to gathering, hoarding, visualising and transforming data into value. Vast amounts of words are uttered to praise the beauty of data-driven decisions; data activists are concerned, instead, about making impact-driven decisions. For most of them, the data infrastructure is a great tool to achieve their goals.

References

Baack S (2016) Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism. Big Data & Society 2(2): 1–11.

Bauman Z and Lyon D (2012) Liquid Surveillance. Cambridge, Malden: Polity Press.

Bernholz L, Skloot E and Varela B (2010) Disrupting Philanthropy: Technology and the Future of the Social Sector. Center for Strategic Philanthropy and Civil Society Sanford School of Public Policy Duke University. Available at: http://cspcs.sanford.duke.edu/sites/default/files/DisruptingPhil_online_FINAL.pdf (accessed 14 August 2018).

Berry DM (2011) The computational turn: Thinking about the digital humanities. Culture machine 12: 1-22. Available at: www.culturemachine.net/index.php/cm/article/download/440/470 (accessed 14 August 2018).

Boyd D and Crawford K (2012) Critical questions for big data. Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679.

Braman S (2009) Change of state: Information, policy, and power. Cambridge, MA: MIT Press.

Brevini B, Hintz A and McCurdy C (2013) Beyond WikiLeaks: implications for the future of communications, journalism and society. Basingstoke: Palgrave Macmillan.

Calhoun C (1992) Habermas and the Public Sphere. Cambridge, MA: MIT Press.

Carrion J and Sagar (2015) Los Vagabundos de la Chatarra. Barcelona: Norma.

Castells M (2009) Communication Power. Oxford: Oxford University Press.

Chandler I (2013) Advocacy and campaigning. How to Guide. The Pressure Group Bond for International Development. Available at: www.bond.org.uk/data/files/resources/45/Advocacy-and-campaigning-How-To-guide-December-2013.pdf (accessed 14 August 2018).

Cukier K and Mayer-Schoenberger V (2013) The Rise of Big Data: How It's Changing the Way We Think about the World. Foreign Affairs 92(3): 28–40.

Deibert R (2010) After WikiLeaks, a New Era. The New York Times. Available at: www.nytimes.com/roomfordebate/2010/12/09/what-has-wikileaks-started/after-wikileaks-a-new-era (accessed 14 August 2018).

della Porta D (2013) Can democracy be saved? Cambridge, Malden: Polity Press.

Downing JDH (ed) (2011) Encyclopedia of Social Movement Media. Thousand Oaks: Sage Publications Inc.

Froomkin M (2003) Habermas@Discourse.net: Toward a Critical Theory of Cyberspace. Harvard Law Review 116(3).

Gangadharan SP (2012) Digital inclusion and data profiling. First Monday 17(5). Available at: http://firstmonday.org/article/view/3821/3199 (accessed 14 August 2018).

Global Fishing Watch (2017) The Global View of Transshipment. Skytruth. Available at: http://globalfishingwatch.org/wp-content/uploads/GlobalViewOfTransshipment_Aug2017.pdf (accessed 14 August 2018).

Goodwin J, Jasper JM and Polletta F (2004) Emotional Dimensions of Social Movements. In: Snow DA, Soule SA and Kriesi H (eds) The Blackwell Companion to Social Movements. Malden, Oxford, Carlton: Wiley-Blackwell.

Habermas J (1984) The theory of communicative action. Boston: Beacon Press.

---. (1991) The Structural Transformation of the Public Sphere: An Inquiry Into a Category of Bourgeois Society. Cambridge MA: MIT Press.

---. (1996) Between Facts and Norms: Contributions to a Discourse Theory of Law and Democracy. Cambridge MA: MIT Press.

Hellerstein J (2008) The Commoditization of Massive Data Analysis. Radar O'reilly. Available at: http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html (accessed 14 August 2018).

Keim B (2012) Open Source for Humanitarian Action. Stanford Social Innovation Review. Available at: https://ssir.org/articles/entry/open_source_for_humanitarian_action (accessed 14 August 2018).

Kennedy H, Poell T and van Dijck J (eds) (2016) Data and agency. Big Data & Society. Available at: http://journals.sagepub.com/doi/10.1177/2053951715621569 (accessed 14 August 2018).

Kitchin R (2014) Big Data, new epistemologies and paradigm shifts. Big Data & Society. Available at: https://doi.org/10.1177/2053951714528481 (accessed 14 August 2018).

Mann S, Nolan J and Wellman B (2002) Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments. Surveillance & Society 1(3): 331–355.

Mayer-Schönberger V and Cukier K (2013) Big data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt.

Meier P (2015) Digital humanitarians: how big data is changing the face of humanitarian response. Boca Raton, London, New York: CRC Press/Taylor & Francis Group.

---. (2016) Crisis Maps: Harnessing the Power of Big Data to Deliver Humanitarian Assistance. Forbes. Available at: www.forbes.com/sites/skollworldforum/2013/05/02/crisis-maps-harnessing-the-power-of-big-data-to-deliver-humanitarian-assistance/#7f4e729115c7 (accessed 14 August 2018).

Melucci A (1996) Challenging Codes: Collective Action in the Information Age. Cambridge: Cambridge University Press.

Milan S and Gutiérrez M (2015) Citizens' media meets Big Data: The emergence of data activism. Mediaciones 11(14): 120-133. Available at: http://revistas.uniminuto.edu/index.php/med/article/view/1086 (accessed 14 August 2018).

Milan S and Hintz A (2013) Networked Collective Action and the Institutionalized Policy Debate: Bringing Cyberactivism to the Policy Arena? Policy & Internet 5: 7-26.

Milan S and van der Velden L (2016) The alternative epistemologies of data activism. Digital Culture & Society 2(2): 57-74.

Naughton J (2018) Magical thinking about machine learning won’t bring the reality of AI any closer. The Guardian. Available at: www.theguardian.com/commentisfree/2018/aug/05/magical-thinking-about-machine-learning-will-not-bring-artificial-intelligence-any-closer (accessed 14 August 2018).

O’Neil C (2016) Weapons of Math Destruction: How big data increases inequality and threatens democracy. New York: Crown Publishers.

Rojas F (2015) Big Data and Social Movement Research. Mobilizing Ideas. Available at: https://mobilizingideas.wordpress.com/2015/04/02/big-data-and-social-movement-research/ (accessed 14 August 2018).

Smith A (2018) Franken-algorithms: the deadly consequences of unpredictable code. The Guardian. Available at: www.theguardian.com/technology/2018/aug/29/coding-algorithms-frankenalgos-program-danger (accessed 14 August 2018).

Tufekci Z (2014) Engineering the public: Internet, surveillance and computational politics. First Monday 19(7). Available at: http://firstmonday.org/article/view/4901/4097 (accessed 14 August 2018).

van de Donk W, Wim BDL, Nixon PG and Rucht D (2004) Cyberprotest: New Media, Citizens and Social Movements. London, New York: Routledge.

van Dijck J (2014) Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society 12(2): 197–208.

_________________________________________________________________

Hallam Stevens (Nanyang Technological University), Lyle Fearnley (Singapore University of Technology), Shirley Sun (Nanyang Technological University) and Sara Watson (Harvard University) reflect on a workshop, Big Data in Asian Society, held at the Nanyang Technological University, Singapore, on 27-28 October 2016.

Published: 6 June 2017

Ground/Cloud: co-presence of paper and digital data systems in rural India.
Source/Credit: Sandeep Mertia

Big Data in Asia: Provocations and Potentials


Social, cultural, and critical studies of “big data” have now firmly established themselves as vital fields of scholarship. Despite this proliferation of work, relatively little attention has been given to understanding differential effects of big data on various regions, nations, or groups. For the most part, studies of the “effects” of big data have either explicitly or implicitly attended to the Global North, or treated the effects of data as more or less homogeneous across place and space.

As big data spread outwards from metropoleis, however, there is an increasing need to attend to how its effects manifest across different cultures, different linguistic communities, different political systems, different socio-economic groups, and different geographic configurations. Payal Arora’s (2016) analysis of what she calls the “bottom of the big data pyramid” shows that many Western-driven big data initiatives directed at the Global South make the world’s poorer communities more vulnerable to regimes of surveillance and more subject to “marketization” and other forms of capitalist exploitation. Although big data initiatives for the Global South are often framed in terms of empowerment, Arora calls for greater skepticism towards how these data regimes actually play out in these contexts. 

The sparse attention to the increasingly diverse effects of big data motivated us to organize a workshop under the title “Big Data in Asian Society” (Nanyang Technological University, 27-28 October 2016). In Asia, big data has begun to be recognized as a significant economic and political force. The Singapore government appointed its first “chief data scientist” in 2014, promising to develop the nation’s capacity for data analysis to improve service delivery in fields such as health care and transport. In China, Web businesses such as Baidu, Alibaba, and Tencent are already massive data-owners and are expanding globally while investing heavily in big data mining and analysis research (Swanson 2015; Marr 2017). Many cities across Asia (such as Songdo, South Korea) hope to draw on the power of big data to become “smart cities” (Halpern 2015).

“Asia” is a particularly good site within which to examine the diversity of big data as object and practice. The geographic, linguistic, political, socio-economic, and cultural heterogeneity of Asia poses an immediate challenge to the notion of big data as a global-universal currency. Nevertheless, Asia may be cohesive enough, as a region, to support useful generalizations as well as comparative work. Here, we draw on notions of “Asia as method” (Chen 2010) to suggest that studying Asia requires new frames of reference that take account of the region’s unique, yet interconnected, languages, histories, cultures, and politics.

Some of the questions animating our workshop included: What kinds of uses does data find in the various social and political contexts of Asia? Do the risks and potentials of big data look the same in these different contexts? What happens when structures for organizing and analyzing data get imported into different social and cultural contexts? What might we gain from a comparative approach to studying big data? Where and how does data flow across and between various regions? Who are the generators and users of big data in and from Asia?

Our workshop involved only some first steps in the investigation of these questions. Nevertheless, our discussions generated six provocations that we believe will be critical for further work on this topic. The remainder of this commentary describes these provocations, suggesting how they might be useful for expanding the global reach of studies of big data and society.

1. Pay Attention to Who is Represented in/by Big Data

As big data expand their reach, the representativeness of those data becomes increasingly important. Data have the potential to define the range of the “normal” in a variety of contexts; if particular populations, groups, or regions are left out of data sets, individuals and groups may be cast as “outsiders” and “outliers.” Such “outsider” status could have several kinds of effects: it might render some groups unable to reap the benefits of big data, thereby entrenching new kinds of inequalities; it might render some groups increasingly socially and politically “invisible” or “illegible.” Increasingly, representativeness is not merely a matter of collecting more data in different places in the same way. In many cases, especially in the Global South, it will mean finding new ways to collect and analyze data too.

This need for representativeness has become most pressing in the context of biomedicine. Shirley Sun (Nanyang Technological University) spoke at the workshop about the efforts of the Pan-Asian SNP Consortium to expand the diversity of genomic data beyond narrow (largely western and largely metropolitan) populations. Biologists and medical practitioners, especially those working in non-western contexts, have pointed out that the findings of genomic medicine (based on non-representative datasets) may be irrelevant or even harmful when applied to non-western patients. Efforts such as the Pan-Asian SNP Consortium are attempting to redress this non-representativeness and ensure that non-western populations are not left out of genomic medicine. At the same time, Sun warned that such efforts also inadvertently contribute to the racialization of medicine by suggesting the need for a separate or unique medicine for Asians (Sun 2017).

2. Pay Attention to Who is Benefitting From Big Data (and Who is Most at Risk)

Sara Watson (Digital Asia Hub) reminded the workshop participants about the underlying corporate dynamics of big data. “Big data” is a term driven by business world hype and tech industry marketing even to the extent that the language of big data (“mining,” “refining”) reflects industrial value-extraction processes (Watson 2016). Through big data, companies hope to forge new kinds of resources, markets and public-private partnerships. This poses a particular set of challenges in non-western contexts. This goes beyond the problem of new “digital divides” (boyd and Crawford 2012) in a number of ways.

First, there is the danger that the financial benefits of big data flow disproportionately towards the Global North. Increasingly, businesses looking to gather more and wider data (such as Facebook and Google) are looking towards Asia (Dalton 2016; Russell 2015). Data is already becoming a resource that is increasingly aggregated and monetized within a few global centers.

Second, the benefits of big data come with substantial risks. Particularly salient are the risks of breaches of privacy and anonymity. Such risks are not likely to be understood or appreciated in the same ways everywhere. Attitudes towards privacy are far from globally uniform; nor are the stakes of privacy equal for everyone. Given that, how can such risks be assessed adequately and distributed equally?

Third, the corporate dynamics of big data also risk concentrating skills and expertise associated with them outside Asia. Those who do the work of building systems and infrastructures establish enduring categories, standards, and practices. It is critical that representativeness in big data extend not only to consideration of who is represented, but also to who is doing the work of big data (building data infrastructures, data analysis, building apps, and so on). Making data work wholly representative requires building inclusivity into the front end of making and working with data.

Kaushik Sunder Rajan has written about the global expansion of pharmaceutical trials to India. He argues that a fundamental injustice arises when those whose bodies are risked in trials are not the same individuals who stand to gain from the benefits of new treatments (Rajan 2010). The potential injustice with data may be an equivalent one: those who are placed at risk through data collection are not necessarily the same persons who stand to gain from the aggregations of data.

3. History Matters

As we examine big data in different spaces, it is not only social, cultural, linguistic, political, and economic contexts that matter. History plays a critical role too. Institutions for collecting, storing, managing, processing, analyzing, and distributing data do not emerge from thin air. Rather, such institutions have histories that affect data practices as well as attitudes towards data collection and data use. Particularly in postcolonial contexts, the histories of colonial data collection have critical implications for how local populations understand and respond to big data.

At the workshop, Arunabh Ghosh (Harvard University) gave us a glimpse of the history of statistics in twentieth-century China. Under the Chinese Communist Party, the statistical bureau aimed to count every aspect of the Chinese economy and society, attempting to mobilize this “complete” account for the purposes of centralized planning. Such methods, eschewing sampling and probability, were based on a purported one-to-one correspondence between the statistics and the reality on the ground. Such “pre-big data” big data meant that very different amounts and kinds of social and economic data were available in and about Communist China. More importantly, however, it suggests how those data belong to specific regimes of data practice – they were collected, aggregated, and used in particular ways for particular political purposes. Big data practices in present-day China necessarily sit against the background of a longer history of data positivism, data for state planning, and notions of “statistics-as-reality.” Such legacies are not easily shed.

Historians have long been sensitive to the fact that the stories they can tell are directly dependent on their data (usually in the form of archives). In Asia, particularly, the colonial and wartime legacies of archives form an important baseline for historical interpretation. Ann Stoler argues that archives are not collections merely to be “mined,” but rather they are “cultural artifacts of fact production, of taxonomies, in the making, and of disparate notions of what made up colonial authority” (Stoler 2002). In other words, the “data” in archives can never be divorced from the social and political conditions of its production; such conditions will always influence possible narratives, especially when it comes to colonial regimes. The same applies to other forms of data, whether collected recently or in the past – provenance matters in what we can do and make with data.

4. (Infra)Structures Matter

We already know that big data are neither raw nor neutral, that they are always collected for particular purposes, and that these purposes affect downstream uses (Gitelman ed. 2013). Attending to this “situatedness” of big data is even more important in a global context. The structures and institutions through which big data is collected, stored, managed, and analyzed encode particular kinds of values into that data. These values are not global or universal values. This becomes important especially when data is moved around, imported, exported, shared, and used in different contexts.

Sandeep Mertia (Sarai-CSDS) gave us an account of his ethnographic work in rural India, where numerous government and non-government agencies are attempting to collect data about local populations. Here, collecting data runs up against the practical difficulties not only of translation into local languages, but also such concerns as keeping data-collection tablets charged in areas with scarce electricity. As data gets “mined” from local registries and recorded onto paper, then into tablet-based forms that are uploaded to centralized databases, data collectors need to find ways of making local data into globally mobile data. This relies on communication infrastructures as well as the ability to mesh local categories and systems into standardized forms. Making data travel depends on local customs, facilities, and infrastructure. Understanding the possible meanings of data collected in this way will require this kind of detailed attention to the effects of these local structures and infrastructures (including Internet and other communications infrastructures, transport infrastructures, power and electrical infrastructures, and so on).

Data here might better be thought of as something produced through processes of negotiation and translation, rather than something liquid. The “meaning” of data collected on paper in local Indian villages is not the same as the “meaning” of data entered into a tablet or in a spreadsheet representing hundreds of villages. What “data” are varies with place, time, and purpose, and the structures and media in which they exist and move always set important limits on what can be done with them. From this point of view, analysis of big data must draw not only on the literature on the ethnography of infrastructure (Star 1999) but also on scholarship on the dissemination of facts and knowledge (Howlett and Morgan, eds. 2011).

5.      Big Data is Not Necessarily the Best Use of Resources

Investments in big data, and the infrastructure for collecting and using it, are often justified on the grounds that they will provide efficiencies and save money (Mayer-Schönberger and Cukier 2013). But again, attention to varied contexts suggests that investments in big data may not always be the best way to solve local problems. In many cases, simpler, smaller-scale, less costly solutions may be far more effective. For the Global South, it may be particularly important to argue against “big data” discourses of efficiency and cheapness. Advocates of big data suggest that “geo-locating a rural African farmer working in his farm with the help of an app installed in his cellphone, identifying the soil type and needs of the field, and offering advice regarding appropriate seeds, where they can be purchased, and how they can be planted and harvested is not far in the future” (Kshetri 2014). However, it is far from clear who would pay for the infrastructure to implement such schemes. Moreover, it is unclear whether such scenarios would benefit farmers equally or how data privacy would be respected.

At the workshop, we learned about the massive growth in air travel within Asia as low-cost carriers appeal to lower-income customers. Max Hirsh (Hong Kong University) explained how airport authorities and urban planners, faced with overwhelming growth, have looked toward high-tech, big data solutions. However, in many cases this has produced a large amount of “data we don’t need” (such as data about restroom cleanliness). On the other hand, more straightforwardly useful data about passengers that is collected by airlines is routinely discarded or ignored since it does not emerge from high-tech, automated systems. In what Hirsh labels “middle-tech solutions,” data that already exist can be combined with existing infrastructures to find far more effective and efficient solutions to local problems (Hirsh 2016).

6.      More Sharing Does Not Necessarily Mean More Openness

Big data is also often sold as the key to openness and transparency. Sharing data will also generate efficiencies, we are told, by increasing governmental and bureaucratic openness in particular (for example see Open Data Government 2017). This narrative resonates with western liberal ideas of democracy and free markets undergirded by free press and freedom of information. In autocratic or quasi-democratic contexts, however, the connection between sharing data and openness is not so straightforward. In fact, the rhetoric of transparency around data may create the appearance of openness in ways that actually foreclose further debate.

Hallam Stevens (Nanyang Technological University) offered an analysis of the Singapore government website data.gov.sg. Although data from many government agencies are shared via data.gov.sg, there is no guarantee that such data is complete, and little information is given about how it was collected. Moreover, although the website encourages citizens to utilize the data, legitimate and illegitimate uses are carefully prescribed. As such, many of the “apps” developed via data.gov.sg are directed towards surveillance, citizen self-policing, and consumerism. Rather than challenging government aims and ideologies, the ways in which government data is actually deployed and used reinforce already dominant narratives within Singaporean society.

This is consonant with the findings of Levy and Johns (2016), who argue that, in certain contexts, transparency can be “weaponized” to hamstring democratic governance. In biology, too, regimes of open data that emerged in genomics in the 1990s are being rethought in light of concerns about privacy, ethics, and justice within the health care system (Reardon et al. 2016). The fact that data is open does not guarantee that the practices attached to it will be democratic, free, or just (see Ruppert 2015).

Future Work: Models of Data

Big data is a space in which new kinds of expertise and new claims of authority are rapidly emerging. As we analyze these new forms of power, we must pay critical attention to their differential effects across space and place. One way of doing this is to consider different kinds of metaphors or models for understanding data. Thinking of data as a manufactured product (something actively produced by work) rather than a resource (to be mined or exploited) could lead to different possibilities for their circulation and use. Thinking of data not as free-flowing or liquid, but as negotiated and translated, changes the value and meaning of moving data around. Thinking of data in terms of rights or responsibilities (to privacy and anonymity) rather than in terms of markets challenges the presumptions of “open data.” Thinking of data as situated and contextual (in space and in history) rather than universal can help to suggest the limitations of particular data regimes.  

Many of the practices and structures of big data are now being imported from the Global North into Asia. But as we have suggested here, many of the potentials and risks of big data are quite different in these different contexts. Understanding data (and its consequences) in these diverse contexts requires that we develop and apply different models and metaphors. Since big data remains an emerging phenomenon in Asia, scholars have an opportunity to make important critical interventions. Both the diversity and the interconnectedness of Asia make it useful as a method for thinking with different models of what data might be and what it might be for. Comparisons both within and beyond Asia can illuminate the need for attention to diverse views, contexts, and histories in big data. This will require expansive thinking that brings social scientists, humanists, and artists together with data scientists and policy makers to address some of the questions and challenges raised here.

References

Arora, Payal (2016) “Bottom of the Big Data Pyramid: Big Data and the Global South” International Journal of Communication 10: 1681-1699.

boyd, danah and Kate Crawford (2012) “Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon” Information, Communication & Society 15, no. 5: 662-679.  

Chen, Kuan-Hsing (2010) Asia as Method: Toward Deimperialization. Durham, NC: Duke University Press.

Dalton, Andrew (2016) “Google and Facebook Team Up on a Direct Connection to Asia” Engadget, 12th October. https://www.engadget.com/2016/10/12/google-facebook-direct-connection-to-asia/

Gitelman, Lisa, ed. (2013) Raw Data is an Oxymoron. MIT Press.

Halpern, Orit (2015) Beautiful Data: A History of Vision and Reason since 1945. Duke University Press.

Hirsh, Max (2016) Airport Urbanism: Infrastructure and Mobility in Asia. University of Minnesota Press.

Howlett, Peter and Mary S. Morgan, eds. (2011) How Well Do Facts Travel? The Dissemination of Reliable Knowledge. Cambridge University Press.

Kshetri, Nir (2014) “The emerging role of Big Data in key development issues: Opportunities, Challenges and Concerns” Big Data & Society (18 December). http://journals.sagepub.com/doi/full/10.1177/2053951714564227

Levy, Karen E.C. and David M. Johns (2016) “When Open Data is a Trojan Horse: The Weaponization of Transparency in Science and Government” Big Data & Society. DOI: 10.1177/2053951715621568.

Marr, Bernard (2017) “How Chinese Internet Giant Baidu Uses AI and Machine Learning” Forbes, 13th February. https://www.forbes.com/sites/bernardmarr/2017/02/13/how-chinese-internet-giant-baidu-uses-ai-and-machine-learning/#7456b471776f

Mayer-Schönberger, Viktor and Kenneth Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt.

Open Data Government (2017) Website. https://opengovernmentdata.org/

Rajan, Kaushik S. (2010) “The Experimental Machinery of Global Clinical Trials: Case Studies From India” In: Asian Biotech: Ethics and Communities of Fate. Aihwa Ong and Nancy N. Chen, eds. Duke University Press: pp. 55-80.

Reardon, J. et al. (2016) “Bermuda 2.0: Reflections from Santa Cruz” Gigascience 5, no. 1: 1-4.

Ruppert, Evelyn (2015) “Doing the Transparent State: Open Government Data as Performance Indicators” In: The World of Indicators: The Making of Governmental Knowledge Through Quantification. R. Rottenburg, S. E. Merry, S-J. Park, and J. Mugler, eds. Cambridge University Press: pp. 127-150.

Russell, Jon (2015) “Google Expands Its Data Centers in Asia as Millions Come Online for First Time” TechCrunch, 2nd June. https://techcrunch.com/2015/06/02/google-expands-its-data-centers-in-asia-as-millions-come-online-for-first-time/

Star, Susan L. (1999) “The Ethnography of Infrastructure” American Behavioral Scientist 43, no. 3: 377-391.

Stoler, Ann L. (2002) “Colonial archives and the arts of governance” Archival Science 2: 87-109.

Sun, Shirley H. (2017) Socio-Economics of Personalized Medicine in Asia. New York: Routledge. 

Swanson, Ana (2015) “How Baidu, Tencent, and Alibaba are leading the way in China’s Big Data Revolution” South China Morning Post, 25th August. http://www.scmp.com/tech/innovation/article/1852141/how-baidu-tencent-and-alibaba-are-leading-way-chinas-big-data

Watson, Sara M. (2016) “Data is the new ‘___’” Dis magazine. http://dismagazine.com/discussion/73298/sara-m-watson-metaphors-of-big-data/