Essays and Provocations

The Journal invites short essays, provocations and blogs on topics relevant to the study of Big Data practices.
______________________________________________________________________

This month's essay is by Hallam Stevens (Nanyang Technological University), Lyle Fearnley (Singapore University of Technology), Shirley Sun (Nanyang Technological University) and Sara Watson (Harvard University) who reflect on a workshop, Big Data in Asian Society, held at the Nanyang Technological University, Singapore, from 27-28 October 2016.

Published: 6 June 2017
______________________________________________________________________
Ground/Cloud: co-presence of paper and digital data systems in rural India.
Source/Credit: Sandeep Mertia

Big Data in Asia: Provocations and Potentials


Social, cultural, and critical studies of “big data” have now firmly established themselves as vital fields of scholarship. Despite this proliferation of work, relatively little attention has been given to understanding differential effects of big data on various regions, nations, or groups. For the most part, studies of the “effects” of big data have either explicitly or implicitly attended to the Global North, or treated the effects of data as more or less homogeneous across place and space.

As big data spread outwards from metropoles, however, there is an increasing need to attend to how its effects manifest across different cultures, different linguistic communities, different political systems, different socio-economic groups, and different geographic configurations. Payal Arora’s (2016) analysis of what she calls the “bottom of the big data pyramid” shows that many Western-driven big data initiatives directed at the Global South make the world’s poorer communities more vulnerable to regimes of surveillance and more subject to “marketization” and other forms of capitalist exploitation. Although big data initiatives for the Global South are often framed in terms of empowerment, Arora calls for greater skepticism towards how these data regimes actually play out in these contexts.

The sparse attention to the increasingly diverse effects of big data motivated us to organize a workshop under the title “Big Data in Asian Society” (Nanyang Technological University, 27-28th October 2016). In Asia, big data has begun to be recognized as a significant economic and political force. The Singapore government appointed its first “chief data scientist” in 2014, promising to develop the nation’s capacity for data analysis to improve service delivery in fields such as health care and transport. In China, Web businesses such as Baidu, Alibaba, and Tencent are already massive data-owners and are expanding globally while investing heavily in big data mining and analysis research (Swanson 2015; Marr 2017). Many cities across Asia (such as Songdo, South Korea) hope to draw on the power of big data to become “smart cities” (Halpern 2015).

“Asia” is a particularly good site within which to examine the diversity of big data as object and practice. The geographic, linguistic, political, socio-economic, and cultural heterogeneity of Asia poses an immediate challenge to the notion of big data as a global-universal currency. Nevertheless, Asia may be cohesive enough, as a region, to support useful generalizations as well as comparative work. Here, we draw on notions of “Asia as method” (Chen 2010) to suggest that studying Asia requires new frames of reference that take account of the region’s unique, yet interconnected, languages, histories, cultures, and politics.

Some of the questions animating our workshop included: What kinds of uses does data find in the various social and political contexts of Asia? Do the risks and potentials of big data look the same in these different contexts? What happens when structures for organizing and analyzing data get imported into different social and cultural contexts? What might we gain from a comparative approach to studying big data? Where and how does data flow across and between various regions? Who are the generators and users of big data in and from Asia?

Our workshop involved only some first steps in the investigation of these questions. Nevertheless, our discussions generated six provocations that we believe will be critical for further work on this topic. The remainder of this commentary describes these provocations, suggesting how they might be useful for expanding the global reach of studies of big data and society.

1.      Pay Attention to Who is Represented in/by Big Data

As big data expand their reach, the representativeness of those data becomes increasingly important. Data have the potential to define the range of the “normal” in a variety of contexts; if particular populations, groups, or regions are left out of data sets, individuals and groups may be cast as “outsiders” and “outliers.” Such “outsider” status could have several kinds of effects: it might render some groups unable to reap the benefits of big data, thereby entrenching new kinds of inequalities; it might render some groups increasingly socially and politically “invisible” or “illegible.” Increasingly, representativeness is not merely a matter of collecting more data in different places in the same way. In many cases, especially in the Global South, it will mean finding new ways to collect and analyze data too.

This need for representativeness has become most pressing in the context of biomedicine. Shirley Sun (Nanyang Technological University) spoke at the workshop about the efforts of the Pan-Asian SNP Consortium to expand the diversity of genomic data beyond narrow (largely western and largely metropolitan) populations. Biologists and medical practitioners, especially those working in non-western contexts, have pointed out that the findings of genomic medicine (based on non-representative datasets) may be irrelevant or even harmful when applied to non-western patients. Efforts such as the Pan-Asian SNP Consortium are attempting to redress this non-representativeness and ensure that non-western populations are not left out of genomic medicine. At the same time, Sun warned that such efforts also inadvertently contribute to the racialization of medicine by suggesting the need for a separate or unique medicine for Asians (Sun 2017).

2.      Pay Attention to Who is Benefitting From Big Data (and who is most risked)

Sara Watson (Digital Asia Hub) reminded the workshop participants about the underlying corporate dynamics of big data. “Big data” is a term driven by business-world hype and tech industry marketing, even to the extent that the language of big data (“mining,” “refining”) reflects industrial value-extraction processes (Watson 2016). Through big data, companies hope to forge new kinds of resources, markets and public-private partnerships. This poses a particular set of challenges in non-western contexts, challenges that go beyond the problem of new “digital divides” (boyd and Crawford 2012) in a number of ways.

First, there is the danger that the financial benefits of big data flow disproportionately towards the Global North. Increasingly, businesses looking to gather more and wider data (such as Facebook and Google) are looking towards Asia (Dalton 2016; Russell 2015). Data is already becoming a resource that is increasingly aggregated and monetized within a few global centers.

Second, the benefits of big data come with substantial risks. Particularly salient are the risks of breaches of privacy and anonymity. Such risks are not likely to be understood or appreciated in the same ways everywhere. Attitudes towards privacy are far from globally uniform; nor are the stakes of privacy equal for everyone. Given that, how can such risks be assessed adequately and distributed equally?

Third, the corporate dynamics of big data also risk concentrating skills and expertise associated with them outside Asia. Those who do the work of building systems and infrastructures establish enduring categories, standards, and practices. It is critical that representativeness in big data extend not only to consideration of who is represented, but also to who is doing the work of big data (building data infrastructures, data analysis, building apps, and so on). Making data work wholly representative requires building inclusivity into the front end of making and working with data.

Kaushik Sunder Rajan has written about the global expansion of pharmaceutical trials to India. He argues that a fundamental injustice arises when those whose bodies are risked in trials are not the same individuals who stand to gain from the benefits of new treatments (Rajan 2010). The potential with data may be an equivalent one: those who are placed at risk through data collection are not necessarily the same persons who stand to gain from the aggregations of data.

3.      History Matters

As we examine big data in different spaces, it is not only social, cultural, linguistic, political, and economic contexts that matter. History plays a critical role too. Institutions for collecting, storing, managing, processing, analyzing, and distributing data do not emerge from thin air. Rather, such institutions have histories which are going to affect data practices as well as attitudes towards data collection and data use. Particularly in postcolonial contexts, the histories of colonial data collection have critical implications for how local populations understand and respond to big data.

At the workshop, Arunabh Ghosh (Harvard University) gave us a glimpse of the history of statistics in twentieth-century China. Under the Chinese Communist Party, the statistical bureau aimed to count every aspect of the Chinese economy and society, attempting to mobilize this “complete” account for the purposes of centralized planning. Such methods, eschewing sampling and probability, were based on a purported one-to-one correspondence between the statistics and the reality on the ground. Such pre-big data big data meant that very different amounts and kinds of social and economic data were available in and about Communist China. More importantly, however, it suggests how those data belong to specific regimes of data practice – they were collected, aggregated, and used in particular ways for particular political purposes. Big data practices in present-day China necessarily sit against the background of a longer history of data positivism, data for state planning, and notions of “statistics-as-reality.” Such legacies are not easily shed.

Historians have long been sensitive to the fact that the stories they can tell are directly dependent on their data (usually in the form of archives). In Asia, particularly, the colonial and wartime legacies of archives form an important baseline for historical interpretation. Ann Stoler argues that archives are not collections merely to be “mined,” but rather they are “cultural artifacts of fact production, of taxonomies in the making, and of disparate notions of what made up colonial authority” (Stoler 2002). In other words, the “data” in archives can never be divorced from the social and political conditions of its production; such conditions will always influence possible narratives, especially when it comes to colonial regimes. The same applies to other forms of data, whether collected recently or in the past – provenance matters in what we can do and make with data.

4.      (Infra)Structures Matter

We already know that big data is not raw, not neutral, that it is always collected for various purposes, and that these purposes affect downstream uses (Gitelman ed. 2013). Attending to this “situatedness” of big data is even more important in a global context. The structures and institutions through which big data is collected, stored, managed, and analyzed encode particular kinds of values into that data. These values are not global or universal values. This becomes important especially when data is moved around, imported, exported, shared, and used in different contexts.

Sandeep Mertia (Sarai-CSDS) gave us an account of his ethnographic work in rural India, where numerous government and non-government agencies are attempting to collect data about local populations. Here, collecting data runs up against the practical difficulties not only of translation into local languages, but also such concerns as keeping data-collection tablets charged in areas with scarce electricity. As data gets “mined” from local registries and recorded onto paper, then into tablet-based forms that are uploaded to centralized databases, data collectors need to find ways of making local data into globally mobile data. This relies on communication infrastructures as well as the ability to mesh local categories and systems into standardized forms. Making data travel depends on local customs, facilities, and infrastructure. Understanding the possible meanings of data collected in this way will require this kind of detailed attention to the effects of these local structures and infrastructures (including Internet and other communications infrastructures, transport infrastructures, power and electrical infrastructures, and so on).

Data here might better be thought of as something produced through processes of negotiation and translation, rather than something liquid. The “meaning” of data collected on paper in local Indian villages is not the same as the “meaning” of data entered into a tablet or in a spreadsheet representing hundreds of villages. What “data” are varies with place, time, and purpose, and the structures and media in which they exist and move always set important limits on what can be done with them. From this point of view, analysis of big data must draw not only on the literature on the ethnography of infrastructure (Star 1999) but also on scholarship on the dissemination of facts and knowledge (Howlett and Morgan, eds. 2011).

5.      Big Data is Not Necessarily the Best Use of Resources

Investments in big data, and the infrastructure for collecting and using it, are often justified on the grounds that it will provide efficiencies and save money (Mayer-Schönberger and Cukier 2013). But again, attention to varied contexts suggests that investments in big data may not always be the best way to solve local problems. In many cases, simpler, smaller-scale, less costly solutions may be far more effective. For the Global South, it may be particularly important to argue against “big data” discourses of efficiency and cheapness. Advocates of big data suggest that “geo-locating a rural African farmer working in his farm with the help of an app installed in his cellphone, identifying the soil type and needs of the field, and offering advice regarding appropriate seeds, where they can be purchased, and how they can be planted and harvested is not far in the future” (Kshetri 2014). However, it is far from clear who would pay for the infrastructure to implement such schemes. Moreover, it is unclear whether such scenarios would benefit farmers equally or how data privacy would be respected.

At the workshop, we learned about the massive growth in air travel within Asia as low-cost carriers appeal to lower-income customers. Max Hirsh (Hong Kong University) explained how airport authorities and urban planners – faced with overwhelming growth – have looked toward high-tech, big data solutions. However, in many cases this has produced a large amount of “data we don’t need” (such as data about restroom cleanliness). On the other hand, more straightforwardly useful data about passengers that is collected by airlines is routinely discarded or ignored since it does not emerge from high-tech, automated systems. In what Hirsh labels “middle-tech solutions,” data that already exist can be combined with existing infrastructures to find far more effective and efficient solutions to local problems (Hirsh 2016).

6.      More Sharing Does Not Necessarily Mean More Openness

Big data is also often sold as the key to openness and transparency. Sharing data will also generate efficiencies, we are told, by increasing governmental and bureaucratic openness in particular (for example see Open Data Government 2017). This narrative resonates with western liberal ideas of democracy and free markets undergirded by free press and freedom of information. In autocratic or quasi-democratic contexts, however, the connection between sharing data and openness is not so straightforward. In fact, the rhetoric of transparency around data may create the appearance of openness in ways that actually foreclose further debate.

Hallam Stevens (Nanyang Technological University) offered an analysis of the Singapore government website data.gov.sg. Although data from many government agencies are shared via data.gov.sg, there is no guarantee that such data is complete and little information about how it was collected. Moreover, although the website encourages citizens to utilize the data, legitimate and illegitimate uses are carefully prescribed. As such, many of the “apps” developed via data.gov.sg are directed towards surveillance, citizen self-policing, and consumerism. Rather than challenging government aims and ideologies, the ways in which government data is actually deployed and used reinforce already dominant narratives within Singaporean society.

This is consonant with the findings of Levy and Johns (2016), who argue that, in certain contexts, transparency can be “weaponized” to hamstring democratic governance. In biology, too, regimes of open data that emerged in genomics in the 1990s are being rethought in light of concerns about privacy, ethics, and justice within the health care system (Reardon et al. 2016). The fact that data is open does not necessarily guarantee that the practices attached to it will be democratic or free or just (see Ruppert 2015).

Future Work: Models of Data

Big data is a space in which new kinds of expertise and new claims of authority are rapidly emerging. As we analyze these new forms of power, we must pay critical attention to their differential effects across space and place. One way of doing this is to consider different kinds of metaphors or models for understanding data. Thinking of data as a manufactured product (something actively produced by work) rather than a resource (to be mined or exploited) could lead to different possibilities for their circulation and use. Thinking of data not as free-flowing or liquid, but as negotiated and translated, changes the value and meaning of moving data around. Thinking of data in terms of rights or responsibilities (to privacy and anonymity) rather than in terms of markets challenges the presumptions of “open data.” Thinking of data as situated and contextual (in space and in history) rather than universal can help to suggest the limitations of particular data regimes.  

Many of the practices and structures of big data are now being imported from the Global North into Asia. But as we have suggested here, many of the potentials and risks of big data are quite different in these different contexts. Understanding data (and its consequences) in these diverse contexts requires that we develop and apply different models and metaphors. Since big data remains an emerging phenomenon in Asia, scholars have an opportunity to make important critical interventions. Both the diversity and interconnectedness of Asia make it useful as a method for thinking with different models of what data might be and what it might be for. Comparisons both within and beyond Asia can illuminate the need for attention to diverse views, contexts, and histories in big data. This will require expansive thinking that brings social scientists, humanists, and artists together with data scientists and policy makers to address some of the questions and challenges raised here.

References

Arora, Payal (2016) “Bottom of the Big Data Pyramid: Big Data and the Global South” International Journal of Communication 10: 1681-1699.

boyd, danah and Kate Crawford (2012) “Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon” Information, Communication & Society 15, no. 5: 662-679.  

Chen, Kuan-Hsing (2010) Asia as Method: Toward Deimperialization. Durham, NC: Duke University Press.

Dalton, Andrew (2016) “Google and Facebook Team Up on a Direct Connection to Asia” Engadget, 12th October. https://www.engadget.com/2016/10/12/google-facebook-direct-connection-to-asia/

Gitelman, Lisa, ed. (2013) Raw Data is an Oxymoron. MIT Press.

Halpern, Orit (2015) Beautiful Data: A History of Vision and Reason since 1945. Duke University Press.

Hirsh, Max (2016) Airport Urbanism: Infrastructure and Mobility in Asia. University of Minnesota Press.

Howlett, Peter and Mary S. Morgan, eds. (2011) How Well Do Facts Travel? The Dissemination of Reliable Knowledge. Cambridge University Press.

Kshetri, Nir (2014) “The emerging role of Big Data in key development issues: Opportunities, Challenges and Concerns” Big Data & Society (18 December). http://journals.sagepub.com/doi/full/10.1177/2053951714564227

Levy, Karen E.C. and David M. Johns (2016) “When Open Data is a Trojan Horse: The Weaponization of Transparency in Science and Government” Big Data & Society. DOI: 10.1177/2053951715621568.

Marr, Bernard (2017) “How Chinese Internet Giant Baidu Uses AI and Machine Learning” Forbes, 13th February. https://www.forbes.com/sites/bernardmarr/2017/02/13/how-chinese-internet-giant-baidu-uses-ai-and-machine-learning/#7456b471776f

Mayer-Schönberger, Viktor and Kenneth Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt.

Open Data Government (2017) Website. https://opengovernmentdata.org/

Rajan, Kaushik S. (2010) “The Experimental Machinery of Global Clinical Trials: Case Studies From India” In: Asian Biotech: Ethics and Communities of Fate. Aihwa Ong and Nancy N. Chen, eds. Duke University Press: pp. 55-80.

Reardon, J. et al. (2016) “Bermuda 2.0: Reflections from Santa Cruz” GigaScience 5, no. 1: 1-4.

Ruppert, Evelyn (2015) “Doing the Transparent State: Open Government Data as Performance Indicators” In: A World of Indicators: The Making of Governmental Knowledge Through Quantification. R. Rottenburg, S. E. Merry, S-J. Park, and J. Mugler, eds. Cambridge University Press: pp. 127-150.

Russell, Jon (2015) “Google Expands Its Data Centers in Asia as Millions Come Online for First Time” TechCrunch, 2nd June. https://techcrunch.com/2015/06/02/google-expands-its-data-centers-in-asia-as-millions-come-online-for-first-time/

Star, Susan L. (1999) “The Ethnography of Infrastructure” American Behavioral Scientist 43, no. 3: 377-391.

Stoler, Ann L. (2002) “Colonial archives and the arts of governance” Archival Science 2: 87-109.

Sun, Shirley H. (2017) Socio-Economics of Personalized Medicine in Asia. New York: Routledge. 

Swanson, Ana (2015) “How Baidu, Tencent, and Alibaba are leading the way in China’s Big Data Revolution” South China Morning Post, 25th August. http://www.scmp.com/tech/innovation/article/1852141/how-baidu-tencent-and-alibaba-are-leading-way-chinas-big-data

Watson, Sara M. (2016) “Data is the new ‘___’” Dis magazine. http://dismagazine.com/discussion/73298/sara-m-watson-metaphors-of-big-data/