Wednesday, 25 March 2020

Video abstract: Beyond algorithmic reformism

Peter Polack introduces his paper "Beyond algorithmic reformism: Forward engineering the designs of algorithmic systems" in Big Data & Society 7(1), First published: March 20, 2020.

Text abstract
This article develops a method for investigating the consequences of algorithmic systems according to the documents that specify their design constraints. As opposed to reverse engineering algorithms to identify how their logic operates, the article proposes to design or "forward engineer" algorithmic systems in order to theorize how their consequences are informed by design constraints: the specific problems, use cases, and presuppositions that they respond to. This demands a departure from algorithmic reformism, which responds to concerns about the consequences of algorithmic systems by proposing to make algorithms more transparent or less biased. Instead, by investigating algorithmic systems according to the documents that specify their design constraints, we identify how the consequences of algorithms are presupposed by the problems that they propose to solve, the types of solutions that they enlist to solve these problems, and the systems of authority that these solutions depend on. To this end, the article develops a methodological framework for researching the process of designing algorithmic systems. In doing so, it proposes to move beyond reforming the technical implementation details of algorithms in order to address the design problems and constraints that underlie them.

Keywords: Critical algorithm studies, predictive policing, design studies, algorithmic bias, algorithmic opacity, algorithmic accountability

Tuesday, 24 March 2020

Changes to Big Data and Society’s Review Policy due to COVID-19.

Hello to all readers, authors and reviewers. Given the challenges around COVID-19 facing many of us, it makes little sense to continue with our normal review process. So, beginning today, Big Data and Society is starting a review hiatus for the next four weeks.

This means that (1) we will pause asking for any new reviews until April 19th. (2) For papers currently in review, we will extend the time for currently assigned referee reports by 4 weeks. (3) we will extend the submission deadline of papers in revision by 4 weeks. (4) We’ll still be accepting papers during this time but will not start the review process until April 19th. (5) Decisions on our recent call for special themes will also be pushed back until the end of April.

We recognize that these changes are toughest on authors (particularly those facing tenure and promotion decisions) and are prepared to address the specific circumstances they face. We will re-assess things shortly before April 19th and adjust plans accordingly. If you have questions, please contact Matthew Zook (Managing Editor).

We wish everyone and their loved ones health and safety in the upcoming weeks. Stay well. Editorial Team of Big Data and Society journal

Monday, 23 March 2020

What’s the harm in categorisation? Reflections on the categorisation work of Tech 4 Good

by Kate Sim and Margie Cheesman, Oxford Internet Institute

The UN Special Rapporteur Tendayi Achiume will soon release a report for the UN Human Rights Council on the role of digital technologies in facilitating contemporary forms of discrimination and inequality [FN1]. This report will be informed by an expert workshop which took place at UCLA in October last year and brought together academics and activists at the forefront of analysing the social, political, and ethical implications of data-driven technologies. The workshop aimed to assess the extent to which digital technologies not only reproduce, but uniquely exacerbate, existing power structures. Workshop participants identified two particularly important points: (1) the process of categorisation that underpins the design, implementation, and internal logics of digital technologies is a distinctive element of these systems; and (2) the speed and scale of technological categorisation work reproduces and compounds social stratifications.

With this post we reflect on the workshop discussions and seek to complement the forthcoming report by examining the role of categorisation based on our respective field research on ‘Tech 4 Good’. Under the banner of ‘Tech 4 Good’, technology companies, policymakers, and civil society organizations come together in their desire to use data-driven technologies as efficient, apposite, and meaningful interventions into social concerns. But how exactly, and for whom, is technology ‘good’? Investigating the design and deployment of digital technologies can illuminate how these systems codify and stabilize power relations. Technological categorisation underpins the interpretive process through which affected individuals and society are codified. For example, our respective fieldwork on sexual misconduct reporting and humanitarian aid shows how the category-making work of data-driven systems maps onto and extends power structures. This is also evident in the contemporary vocabularies and frameworks for assessing the impact of digital technologies: we highlight how key stakeholders centre critiques around privacy-related harms, reducing the insidious problems of tracking, profiling and categorisation to a matter of individual choice rather than a systemic issue. In short, this post shows why digital categorisation practices - nowhere more so than in Tech 4 Good - demand greater public critique, accountability and reparative justice.

Notes from the field

Our respective work at the Oxford Internet Institute demonstrates how categorisation is a key part of marshalling digital technologies for ‘social good.’ For example, Sim’s research highlights how categorising violence is integral to the production and application of sexual misconduct reporting systems. Case management systems like Callisto, LiveSafe, and Vault Platform facilitate the creation of sexual misconduct reports that trigger institutional responses. The perception that sexual assault data is fixed and objective helps make these reporting tools seem accurate and trustworthy. However, gender assumptions are constructed and reinforced in the way that data is encoded in the system design. For example, users are asked to define their experience by selecting a category of violence from a prefigured list that includes categories like ‘sexual harassment’ and ‘verbal abuse.’ The implicit assumption here is that these categories are self-evident and well-defined. Yet, analysing users’ experiences searching for the ‘right’ category demonstrates how these reporting tools’ categories - as a constructed system of defining and sorting - fall short of capturing the range of users’ experiences of sexual misconduct. “The body,” as Bowker and Star (2000) write, “cannot be aligned with the classification system” (p.190).

The impact of these categories extends beyond moments of disconnect on the user’s end. The assumptions with which system vendors construct and guide these categories lend greater credibility to those users who can successfully complete the form. The vendors appeal to data’s perceived objectivity and incorruptibility - what sociologist David Beer (2017) terms the ‘data imaginary’ - to ascribe greater credibility to those users’ claims. Those who are able to conform to the system’s classificatory logic are rewarded as credible reporters, while the experiences of those who cannot are marginalized.

Cheesman’s research in the aid industry demonstrates how categorisation is the basis of resource allocation. Aid is distributed according to metrics of refugee vulnerability, including bodily abilities, sexual habits, food consumption and credit scores. New data technologies used in humanitarian categorisation practices structure the conditions of recognition and support for refugees. Yet the same technologies are routinely depoliticised as neutral means of better targeting and delivering services: blockchain’s automated consensus algorithms are framed as facilitating ‘neutral’ information-sharing between aid organisations, while biometric identity checks are understood as an efficient, anti-fraud advancement in humanitarian bureaucracy. Veracity is ascribed to both technologies: biological information is seen as accurate and non-contentious, and blockchain as constructing unbiased, incorruptible, real-time records.

As a result, these de-politicised approaches to humanitarian digitalisation stabilise categorisation practices as objective, eclipsing intersectional issues about category-making, discrimination and mobility control. Digital identification can adversely affect certain groups by fixing them to a unitary set of categories: infrastructures designed to provide legal identity and protection are used to denationalise and repatriate stateless people, as with the Rohingya in Myanmar and Bangladesh (Taylor & Mukiri-Smith 2019; Madianou 2019). Moreover, categories are not just political, but gendered and racialised - for example in ethnic groupings and identifications of ‘head of household’ in refugee camps. In all these cases, there is little capacity for refugee populations to resist, contest, or define their own subjectivities.

Critiques of digital humanitarianism often centre on individual technologies and individual privacy rather than systemic issues of categorisation. But evidence of protest against biometrics or smartcards (Al Jazeera 2018), of privacy breaches (Cornish 2017) or of physically dangerous system avoidance (Latonero & Kift 2018: 7) should not be the only point at which computational logics are questioned. These logics normalise power relations, and maintain and extend structures of North-South domination by the aid industry. Paternalistic approaches are both a product and a cause of the exclusion of refugees from humanitarian categorisation and decision-making.

What’s the harm in categorisation?

The speed, scale, and scope of technological categorisation not only reproduces social stratifications and injustices; it also makes those asymmetries less visible. Workshop attendees highlighted how the approach favoured by technology companies, journalists and policymakers - recognising primarily the most tangible or immediate harms of digital technologies - overlooks and obfuscates the insidious practices and impacts of categorisation. The focus on harm is rooted in current debates that individualise digital inequality in terms of personal choice and ethics. Violations of privacy continue to be the dominant metric through which stakeholders conceptualize the erosion of bodily and digital rights. Increasingly, the institutionalization of ethics as a strategy of self-regulation rather than public accountability illustrates the technology sector’s continuing hold over our collective language and imagined interventions (Metcalf, Moss & boyd 2019).

In response to these conceptual deficits, workshop attendees called for a reorientation of critical inquiry and intervention towards reparative justice. Beyond demanding more systematic mechanisms of transparency and accountability around digital categorisation practices, they asked: what are data subjects owed, who should be held responsible, and how? Proposed solutions to digital inequality include new models of ‘self-sovereign’ data ownership, control and monetisation for individuals (Wang & De Filippi 2020), as well as more collectivist ‘digital commons’ approaches (Prainsack 2019). However, if and how these could remedy asymmetries in categorisation power without producing exclusionary effects remains to be seen (ibid). As the UN 2020 report will also highlight, researchers, technology actors, activists, and policymakers must work in concert to develop more robust heuristics and frameworks for apprehending and addressing the structural nature of digital inequality.

[FN1] This report will be titled: 2020 Human Rights Council Report of the United Nations Special Rapporteur on Contemporary Forms of Racism, Racial Discrimination, Xenophobia and Related Intolerance.

Works Cited

Al Jazeera (2018) Violence stalks UN’s identity card scheme in Rohingya camps.

Beer, D. (2016) Metric Power. Basingstoke: Palgrave Macmillan.

Beer, D. (2017) The data analytics industry and the promises of real-time knowing: Perpetuating and deploying a rationality of speed. Journal of Cultural Economy 10(1): 21–33.

Bowker, G.C. and Star, S.L. (2000) Sorting Things Out: Classification and Its Consequences. Cambridge, MA: MIT Press.

Cornish, L. (2017) New security concerns raised for RedRose digital payment systems. Devex.

Latonero, M. and Kift, P. (2018) On digital passages and borders: Refugees and the new infrastructure for movement and control. Social Media + Society 4(1).

Madianou, M. (2019) Technocolonialism: Theorizing digital innovation and data practices in humanitarian response. Social Media + Society: 1–33.

Metcalf, J., Moss, E. and boyd, d. (2019) Owning ethics: Corporate logics, Silicon Valley, and the institutionalization of ethics. Social Research: An International Quarterly 86(2): 449–476.

Prainsack, B. (2019) Logged out: Ownership, exclusion and public value in the digital data and information commons. Big Data & Society 6(1): 1–15.

Taylor, L. and Mukiri-Smith, H. (2019) Global data justice: Framing the (mis)fit between statelessness and technology. European Network on Statelessness.

Wang, F. and De Filippi, P. (2020) Self-sovereign identity in a globalized world: Credentials-based identity systems as a driver for economic inclusion. 2(January): 1–22.

Saturday, 7 March 2020

How biased is the sample? Reverse engineering the ranking algorithm of Facebook’s Graph application programming interface

Justin Chun-ting Ho
Big Data & Society 7(1), First published February 17, 2020.
Keywords: Bias detection, data mining, Facebook pages, application programming interface, social media research

In November 2017, Facebook introduced a new limit on the maximum number of page posts retrievable through its Graph application programming interface (API). However, there is limited documentation on how these posts are selected (Facebook 2017). In this article, I assess the bias caused by the new limitation by comparing two datasets of the same Facebook page: a full dataset obtained before the introduction of the limitation and a partial dataset obtained after. To establish generalisability, I also replicate the findings with data from another Facebook page.
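The full-versus-partial comparison amounts to testing whether the post-type distribution returned by the new API matches the distribution in the complete dataset. A minimal sketch of that test follows; the category names and counts are invented for illustration and are not the paper's actual data:

```python
# Post-type counts: a full dataset collected before the limit was
# introduced, and a partial dataset returned by the new API afterwards.
# These figures are hypothetical, chosen only to illustrate the test.
full = {"Link": 400, "Photo": 300, "Video": 150, "Status": 150}
partial = {"Link": 120, "Photo": 250, "Video": 120, "Status": 60}

n_full = sum(full.values())        # 1000 posts in the full dataset
n_partial = sum(partial.values())  # 550 posts returned by the new API

# Chi-square goodness-of-fit: if the API sampled post types at random,
# the partial counts should follow the full dataset's proportions.
chi2 = 0.0
for post_type, observed in partial.items():
    expected = n_partial * full[post_type] / n_full
    chi2 += (observed - expected) ** 2 / expected

# 3 degrees of freedom; the critical value at p = 0.05 is about 7.81
print(round(chi2, 1), "biased" if chi2 > 7.81 else "consistent with random sampling")
# prints: 112.4 biased
```

With these invented counts, Link posts make up 40% of the full dataset but only about 22% of the partial one, and the test statistic far exceeds the critical value, which is the shape of under-representation the paper reports.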

This paper demonstrates that posts with high user engagement, Photo posts and Video posts, are over-represented, while Link posts are under-represented. Top-term analysis reveals significant differences in the most prominent terms between the full and partial datasets. The paper also reverse-engineers the new API’s ranking algorithm to identify the features of a post that affect its odds of being selected. The estimated model posits that post type, Likes, Angry, Shares, and Likes on Comments are significant predictors. Sentiment analysis reveals significant differences in sentiment word usage between the selected and non-selected posts.
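This kind of reverse engineering can be framed as fitting a model that predicts, from a post's features, whether the API returns it. The sketch below simulates the idea with a logistic regression fitted by Newton's method; the feature names follow the paper's candidate predictors, but the data, selection rule, and coefficients are all invented for illustration and are not the paper's estimated model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic page data: one row per post, with the engagement features
# the paper names as candidate predictors (post type, Likes, Angry,
# Shares, Likes on Comments). All values here are simulated.
n = 2000
likes = rng.poisson(50, n).astype(float)
angry = rng.poisson(2, n).astype(float)
shares = rng.poisson(10, n).astype(float)
comment_likes = rng.poisson(5, n).astype(float)
is_link = rng.integers(0, 2, n).astype(float)  # 1 = Link post

# Assumed selection rule: engagement raises the odds of being returned
# and Link posts are penalised (mirroring the direction of the paper's
# finding; these coefficients are invented).
true_logit = (0.03 * likes + 0.2 * angry + 0.05 * shares
              + 0.05 * comment_likes - 1.0 * is_link - 2.0)
selected = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit a logistic regression by Newton's method (IRLS) to recover which
# features predict a post's odds of being selected by the API.
X = np.column_stack([np.ones(n), likes, angry, shares, comment_likes, is_link])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
    grad = X.T @ (selected - p)                  # score vector
    hess = X.T @ (X * (p * (1 - p))[:, None])    # observed information
    beta += np.linalg.solve(hess, grad)

names = ["intercept", "likes", "angry", "shares", "comment_likes", "is_link"]
print({k: round(v, 3) for k, v in zip(names, beta)})
```

On this simulated page the fitted coefficients recover the assumed pattern: a positive coefficient on Likes and a negative one on the Link-post indicator, i.e. Link posts have lower odds of being returned even at equal engagement.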

These findings have significant implications for research that use Facebook page data collected after the introduction of the limitation:
  • The under-representation of Link posts means that a significant amount of link-sharing activities would become invisible from the API.
  • There is no evidence to support the common expectation that the API ranks posts by the number of Likes and Comments. While the selected posts tend to have more Likes and Comments, other features also affect the odds of being selected.
  • It is questionable to assume that the new API would return all the posts with the highest user engagement. Even though it is observed that the selected posts on average have higher user engagement, some highly commented and liked posts might not be selected due to the effect of other features.
  • Posts of certain linguistic styles can be filtered out as the new API tends to return posts with more emotional texts.
  • Non-random factors might be influencing the representation of most prominent terms in the selected posts, which could lead to bias in text models.
However, it is important to note that the data retrieved from the Graph API remains a useful resource that enables a wide range of research methods. We should not stop using the data because of the above-mentioned issues. Echoing Rieder et al. (2015), the potential bias calls for caution, prudence, and critical attention when using and interpreting the data. Uncovering the bias of the ranking algorithm helps researchers better qualify and support their results.


Facebook (2017) /page-id/feed. Available at: feed (accessed 31 January 2020).

Rieder B, Abdulla R, Poell T, et al. (2015) Data critique and analytical opportunities for very large Facebook pages: Lessons learned from exploring “We are all Khaled Said”. Big Data & Society 2(2): 1–22.

Friday, 6 March 2020

Establishing a Social Licence for FinTech: Reflections on the role of the private sector in pursuing ethical data practices

Mhairi Aitken, Ehsan Toreini, Peter Carmichael, Kovila Coopamootoo, Karen Elliott, Aad van Moorsel
Big Data & Society 7(1), First published: March 4, 2020
Keywords: Financial Technology, data, social licence, ethics, responsible artificial intelligence, trust

Recent years have witnessed a dramatic increase in attention directed at the ethical dimensions of data practices and Artificial Intelligence (AI). Increasingly, momentum for innovation is being met with interest in related ethical considerations, and a number of high-profile institutes and bodies have been established to focus on this area. The substantial investment in this field has to date largely resulted in a proliferation of guidance and sets of principles relating to ethical AI, but important questions remain as to how such principles can be put into practice, and to what extent commitments to ethical AI go beyond rhetoric.

These are questions we engage with in our paper “Establishing a Social Licence for FinTech” and which also underpin our ongoing programme of research through our EPSRC-funded project “FinTrust” which examines the role of AI in finance.

We focus on FinTech (financial technology) as it represents a fast-moving industry that is attracting substantial investment. Within FinTech there is industrial advocacy surrounding the potential benefits of data science and AI in banking; however, to date there has been little consideration of the ethical dimensions of these practices or the extent to which they align with public values and expectations. Our research therefore focusses on FinTech in order to examine opportunities and potential approaches for developing ethical data practices which go beyond compliance with regulation.

In our paper we consider the importance of establishing and maintaining a Social Licence for data practices. The notion of a Social Licence recognises that there can be meaningful differences between what is legally permissible and what is socially acceptable. A Social Licence is granted by a community of stakeholders and is intangible and unwritten but may be essential for the sustainability and legitimacy of particular practices or industries.

With attention being directed at digital ethics, there is emerging interest in pursuing a Social Licence for data practices. However, it is interesting that while the notion of a Social Licence emerged in the 1990s in relation to private sector extractive industries (e.g. mining and forestry), to date, where it has been discussed with regard to data practices, this has largely been in relation to public sector activities (e.g. healthcare and health research). In our paper we therefore consider what this means for private sector data-intensive industries, such as FinTech.

In discussing what would be required to establish a Social Licence for FinTech, we consider three main points:
  1. A Social Licence is underpinned by relationships of trust which need to be sustained over time. We consider how trust is established and what it might mean for a FinTech to be considered trustworthy.
  2. Establishing trust requires both technical and social approaches. We discuss the current technical approaches advocated in ethical AI (relating to Robustness, Fairness, Explainability and Lineage), the extent to which they may be conceived to demonstrate trustworthiness, and the importance of combining these with social approaches.
  3. Establishing and maintaining a Social Licence requires engagement with diverse stakeholders. Given that data practices are having far-reaching – and often unpredicted – impacts across society, a broad conception of stakeholders acknowledges the importance of public engagement beyond potential service-users. We suggest that engagement with broad publics is vital to ensure that current and future practices reflect public values and interests. Our paper then considers the extent to which it is reasonable to expect such broad approaches to be adopted by individual FinTechs or the wider industry.

The paper poses a number of questions to which we do not yet have the answers. For example, the paper does not aim to identify public interests or concerns relating to data practices in FinTech, or to set out what is required for FinTech to align with public values. Since there is a paucity of public engagement or deliberation examining public values around FinTech practices, further research (including through public engagement methods) is needed to examine what this means in practice.

Combining our interdisciplinary perspectives from Computer Science, Sociology, Human Computer Interaction and Organisational Science, our FinTrust project is continuing to build on the work presented in this paper to address these tricky questions. We aim to develop a toolkit which will set out a combination of technical and social approaches to underpin a future Social Licence for FinTech practices.

We posit that such approaches are needed across all areas and industries whose operations are dependent on data. Pursuing a Social Licence will complement regulation and build on ethical codes of practice. This is important to underpin culture change and to move beyond rhetorical commitments to develop best practice, meaningfully putting ethics at the heart of innovation.

Thursday, 20 February 2020

Reconfiguring National Data Infrastructures through the Nordic Data Imaginary

Aaro Tupasela, Karoliina Snell, Heta Tarkkala
Big Data & Society 7(1), First published: February 20th, 2020
Keywords: big data, health data, health policy, platform economy, Nordic data gold mine, data imaginary, sustainability

Data has become a central feature of economic development and systems of production. The range of companies that source data from their users on a daily basis has made the relations between the sources, collectors and users of that data a mainstay of ethical and legal discussion. In our article just published in Big Data & Society, we focus on the development and implementation of national data infrastructures within the Nordic countries.

The Nordic countries aim to establish a unique place within the European and global health data economy. They have extensive nationally maintained and centralized health data records, as well as numerous biobanks where data from individuals can be connected based on personal identification numbers. Much of this phenomenon is the result of the emergence and development of the Nordic welfare state, where Nordic countries sought to systematically collect large amounts of population data to guide decision-making and improve the health and living conditions of the population. These massive collections of data have remained somewhat separate and connecting data between different sources has taken time and effort for researchers to accomplish due to ethical and legal constraints associated with such practices. With the explosive growth in utilizing big data in research and development, however, these data infrastructures are being re-purposed within the Nordic countries.

Recently, the so-called Nordic data gold mine has been re-imagined in a wholly different context, in which welfare state data and its ever-increasing logic of accumulation are seen as a driver of economic growth and private business development. This logic, originating in the private sector, has become a model for national projects to capitalize on population data. Our article explores the development of policies and strategies for the health data economy in Denmark and Finland. We ask how nation states try to adjust to and benefit from new pressures and opportunities to utilize their data resources in data markets. This, we argue, raises questions of social sustainability in terms of states being producers, providers, and consumers of data. The data imaginaries related to emerging health data markets also provide insight into how a broad range of different data sources, ranging from hospital records and pharmacy prescriptions to biobank sample data, are brought together to enable ‘full-scale utilization’ of health and welfare data.

Wednesday, 12 February 2020

Call for Special Theme Proposals for Big Data & Society, Due March 9

Call for Special Theme Proposals for Big Data & Society

The SAGE open access journal Big Data & Society (BD&S) is soliciting proposals for a Special Theme to be published in 2021. BD&S is a peer-reviewed, interdisciplinary, scholarly journal that publishes research about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business and government relations, expertise, methods, concepts and knowledge. BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practices that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government and crowd-sourced data. Critically, rather than settling on a definition the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes.

Special Themes can consist of a combination of Original Research Articles (10,000 words; maximum 6), Commentaries (3,000 words; maximum 4) and one Editorial (3,000 words). Article Processing Charges will be waived for all Special Theme content. All submissions will go through the Journal’s standard peer review process.

Past special themes of the journal include: Knowledge Production; Algorithms in Culture; Data Associations in Global Law and Policy; The Cloud, the Crowd, and the City; Veillance and Transparency; Environmental Data; Spatial Big Data; Critical Data Studies; Social Media & Society; Assumptions of Sociality; Health Data Ecosystems; Data & Agency; and Big Data and Surveillance. See the journal’s website to access these special themes.

Format of Special Theme Proposals
Researchers interested in proposing a Special Theme should submit an outline with the following information.

  • An overview of the proposed theme, how it relates to existing research and the aims and scope of the Journal, and the ways it seeks to expand critical scholarly research on Big Data.
  • A list of titles, abstracts, authors and brief biographies. For each contribution, the type of submission (Original Research Article or Commentary) should be indicated, as should whether the proposal is the result of a workshop or conference.
  • Short bios of the Guest Editors, including affiliations and previous work in the field of Big Data studies. Links to homepages, Google Scholar profiles or CVs are welcome, although we don’t require CV submissions.
  • A proposed timing for submission to Manuscript Central, in line with the timeline outlined below.

Information on the types of submissions published by the Journal and other guidelines is available on the Journal’s website.

Timeline for Proposals
Please submit proposals by March 9, 2020 to the Managing Editor of the Journal, Prof. Matthew Zook. The Editorial Team of BD&S will review proposals and make a decision by April 2020. Manuscripts would be submitted to the journal (via Manuscript Central) by or before September 2020. For further information or to discuss potential themes, please contact Matthew Zook.