Big Data & Society: What’s the harm in categorisation? Reflections on the categorisation work of Tech 4 Good

by Kate Sim and Margie Cheesman, Oxford Internet Institute

The UN Special Rapporteur Tendayi Achiume will soon release a report for the UN Human Rights Council on the role of digital technologies in facilitating contemporary forms of discrimination and inequality [FN1]. This report will be informed by an expert workshop which took place at UCLA in October last year and brought together academics and activists at the forefront of analysing the social, political, and ethical implications of data-driven technologies. The workshop aimed to assess the extent to which digital technologies not only reproduce, but uniquely exacerbate existing power structures. Particularly important points identified by the workshop participants were (1) the process of categorisation that underpins the design, implementation, and internal logics of digital technologies as a distinctive element, and (2) how the speed and scale of technological categorisation work reproduces and compounds social stratifications.

With this post we reflect on the workshop discussions and seek to complement the forthcoming report by examining the role of categorisation based on our respective field research on ‘Tech 4 Good’. Under the banner of ‘Tech 4 Good’, technology companies, policymakers, and civil society organizations come together in their desire to use data-driven technologies as efficient, apposite, and meaningful interventions in social concerns. But how exactly and for whom is technology ‘good’? Investigating the design and deployment of digital technologies can illuminate how these systems codify and stabilize power relations. Technological categorisation underpins this interpretive process through which affected individuals and society are codified. For example, our respective fieldwork on sexual misconduct reporting and humanitarian aid shows how the category-making work of data-driven systems maps onto and extends power structures. This is also evident in the contemporary vocabularies and frameworks for assessing the impact of digital technologies. We highlight how key stakeholders centre critiques around privacy-related harms, reducing the insidious problems of tracking, profiling and categorisation to a matter of individual choice rather than a more systemic issue. In short, this post shows why digital categorisation practices, nowhere more so than in Tech 4 Good, demand greater public critique, accountability and reparative justice.

Notes from the field

Our respective work at the Oxford Internet Institute demonstrates how categorisation is a key part of marshalling digital technologies for ‘social good.’ For example, Sim’s research highlights how categorising violence is integral to the production and application of sexual misconduct reporting systems. Case management systems like Callisto, LiveSafe, and Vault Platform facilitate the creation of sexual misconduct reports that trigger institutional responses. The perception that sexual assault data is fixed and objective helps make these reporting tools seem accurate and trustworthy. However, gender assumptions are constructed and reinforced in the way that data is encoded in the system design. For example, users are asked to define their experience by selecting a category of violence from a prefigured list that includes categories like ‘sexual harassment’ and ‘verbal abuse.’ The implicit assumption here is that these categories are self-evident and well-defined. Yet, analysing users’ experiences searching for the ‘right’ category demonstrates how these reporting tools’ categories, as a constructed system of defining and sorting, fall short in capturing the range of users’ experiences of sexual misconduct. “The body,” as Bowker and Star (2000) write, “cannot be aligned with the classification system” (p.190).

The impact of these categories extends beyond moments of disconnect on the user’s end. The system vendors’ assumptions that construct and guide these categories lend greater credibility to those users who can successfully complete the form. The system vendors appeal to data’s perceived objectivity and incorruptibility, which sociologist David Beer terms the ‘data imaginary’ (2017), to ascribe greater credibility to those users’ claims. Those who are able to conform to the system’s classificatory logic are rewarded as credible reporters, while the experiences of those who cannot are marginalized.

Cheesman’s research in the aid industry demonstrates how categorisation is the basis of resource allocation. Aid is distributed according to metrics of refugee vulnerability, including bodily abilities, sexual habits, food consumption and credit scores. New data technologies used in humanitarian categorisation practices structure the conditions of recognition and support for refugees. Yet, the same technologies are routinely depoliticised as neutral means of better targeting and delivering services. For example, blockchain’s automated consensus algorithms are said to facilitate ‘neutral’ information-sharing between aid organisations, and biometric identity checks are understood as an efficient, anti-fraud advancement in humanitarian bureaucracy. Veracity is ascribed to both these technologies: biological information is seen as accurate and non-contentious, and blockchain as constructing unbiased, incorruptible, real-time records.

As a result, these de-politicised approaches to humanitarian digitalisation stabilise categorisation practices as objective, eclipsing intersectional issues about category-making, discrimination and mobility control. Digital identification can adversely affect certain groups by fixing them to a unitary set of categories: infrastructures designed to provide legal identity and protection are used to denationalise and repatriate stateless people, as with the Rohingya in Myanmar and Bangladesh (Taylor & Mukiri-Smith 2019; Madianou 2019). Moreover, categories are not just political, but gendered and racialised - for example in ethnic groupings and identifications of ‘head of household’ in refugee camps. In all these cases, there is little capacity for refugee populations to resist, contest, or define their own subjectivities.

Critiques of digital humanitarianism often centre on individual technologies and individual privacy rather than systemic issues of categorisation. But evidence of protest against biometrics or smartcards (Al Jazeera 2018), of privacy breaches (Cornish 2017) or physically dangerous system avoidance (Latonero & Kift 2018:7) should not be the only point at which computational logics are questioned. These logics normalise power relations, and maintain and extend structures of North-South domination by the aid industry. Paternalistic approaches both produce and are produced by the exclusion of refugees from humanitarian categorisation and decision-making.

What’s the harm in categorisation?

The speed, scale, and scope of technological categorisation not only reproduce social stratifications and injustices; they also make those asymmetries less visible. Workshop attendees highlighted how the approach favoured by technology companies, journalists and policymakers, which primarily recognises the most tangible or immediate harms of digital technologies, overlooks and obfuscates the insidious practices and impacts of categorisation. The focus on harm is rooted in current debates that individualise digital inequality in terms of personal choice and ethics. Violations of privacy continue to be the dominant metric through which stakeholders conceptualize the erosion of bodily and digital rights. Increasingly, the institutionalization of ethics as a strategy of self-regulation rather than public accountability illustrates the technology sector’s continuing hold over our collective language and imagined interventions (Metcalf, Moss & boyd 2019).

In response to these conceptual deficits, workshop attendees called for a reorientation of critical inquiry and intervention towards reparative justice. Beyond demanding more systematic mechanisms of transparency and accountability around digital categorisation practices, they asked: what are data subjects owed, who should be held responsible, and how? Proposed solutions to digital inequality include new models of ‘self-sovereign’ data ownership, control and monetisation for individuals (Wang & De Filippi 2020), as well as more collectivist ‘digital commons’ approaches (Prainsack 2019). However, if and how these could remedy asymmetries in categorisation power without producing exclusionary effects remains to be seen (ibid). As the UN 2020 report will also highlight, researchers, technology actors, activists, and policymakers must work in concert to develop more robust heuristics and frameworks for apprehending and addressing the structural nature of digital inequality.

----
[FN1] This report will be titled: 2020 Human Rights Council Report of the United Nations Special Rapporteur on Contemporary Forms of Racism, Racial Discrimination, Xenophobia and Related Intolerance.

Works Cited

Al Jazeera (2018) Violence stalks UN’s identity card scheme in Rohingya camps. https://www.aljazeera.com/news/2018/11/violence-stalks-identity-card-scheme-rohingya-camps-181122075307535.html

Beer, David. Metric Power. 1st ed. Palgrave Macmillan, 2016.

———. “The Data Analytics Industry and the Promises of Real-Time Knowing: Perpetuating and Deploying a Rationality of Speed.” Journal of Cultural Economy 10, no. 1 (January 2, 2017): 21–33. https://doi.org/10.1080/17530350.2016.1230771.

Bowker, Geoffrey C. and Susan Leigh Star. Sorting Things Out: Classification and Its Consequences. New Ed edition. Cambridge, Mass.: MIT Press, 2000.

Cornish, L. (2017). New security concerns raised for RedRose digital payment systems. Devex, 1–6. Retrieved from https://www.devex.com/news/new-security-concerns-raised-for-redrose-digital-payment-systems-91619.

Latonero, M., & Kift, P. (2018). On Digital Passages and Borders: Refugees and the New Infrastructure for Movement and Control. Social Media + Society, 4(1).

Madianou, M. (2019). Technocolonialism: Theorizing Digital Innovation and Data Practices in Humanitarian Response. Social Media + Society, 1–33. https://doi.org/10.1177/2056305119863146

Metcalf, Jacob, Emanuel Moss, and danah boyd. “Owning Ethics: Corporate Logics, Silicon Valley, and the Institutionalization of Ethics.” Social Research: An International Quarterly 86, no. 2 (2019): 449–76.

Prainsack, B. (2019). Logged out: Ownership, exclusion and public value in the digital data and information commons. Big Data & Society, 6(1), 1–15. https://doi.org/10.1177/2053951719829773

Taylor, Linnet, and Hellen Mukiri-Smith. 2019. “Global Data Justice: Framing the (Mis)Fit between Statelessness and Technology.” European Network on Statelessness. 2019. https://www.statelessness.eu/blog/global-data-justice-framing-misfit-between-statelessness-and-technology.

Wang, F., & De Filippi, P. (2020). Self-Sovereign Identity in a Globalized World: Credentials-Based Identity Systems as a Driver for Economic Inclusion. Frontiers in Blockchain, 2(January), 1–22. https://doi.org/10.3389/fbloc.2019.00028

Monday, 23 March 2020
