Wednesday, 21 October 2020

Revisiting the Black Box Society by Rethinking the Political Economy of Big Data

Special Theme Issue

Guest lead editors: Benedetta Brevini* and Frank Pasquale**

* University of Sydney
** University of Maryland

Throughout the 2010s, scholars explored the politics and sociology of data, its regulation, and its role in informing and guiding policymakers; one example is the importance of quality health data in efforts to “flatten the curve” during the COVID-19 epidemic. However, all too much of this work is being done in “black box societies”: jurisdictions where the analysis and use of data is opaque, unverifiable, and unchallengeable. As a result, far too often data are used as a tool for social, political, and economic control, with biases often distorting decision making, while narratives of tech solutionism and even salvationism abound.

The Black Box Society was one of the first scholarly accounts of algorithmic decision making to synthesize empirical research, normative frameworks, and legal argument, and this symposium of commentaries reflects on what has happened since its publication. Much has happened since 2015 that both vindicates and challenges the book’s main themes. Yet recurring examples of algorithmically driven injustices raise the question of whether transparency, the foundational normative value in The Black Box Society, is a first step toward a more emancipatory deployment of algorithms and AI, is an easily deflected demand, or actually worsens matters by rationalizing the algorithmic ordering of human affairs.

To address these issues, this symposium features the work of leading thinkers who have explored the interplay of politics, economics, and culture in domains ordered algorithmically by managers, bureaucrats, and technology workers. By bringing social scientists and legal experts into dialogue, we aim both to clarify the theoretical foundations of critical algorithm studies and to highlight the importance of engaged scholarship, which translates the insights of the academy into an emancipatory agenda for law and policy reform. While the contributions are diverse, a unifying theme animates them: each offers a sophisticated critique of the interplay between state and market forces in building or eroding the many layers of our common lives, as well as the kaleidoscopic privatization of spheres of reputation, search, and finance. Unsatisfied with narrow methodologies of economics or political science, they advance politico-economic analysis. They therefore succeed in unveiling the foundational role that the turn to big data has in organising economic and social relations. All the contributors help us imagine practical changes to prevailing structures that will advance social and economic justice, mutual understanding, and ecological sustainability. For this and much else, we are deeply grateful for their insightful work.

Editorial by Benedetta Brevini and Frank Pasquale, "Revisiting the Black Box Society by rethinking the political economy of big data"

Ifeoma Ajunwa, in “The Black Box at Work,” describes the data revolution of the workplace, which simultaneously demands workers surrender intimate data and then prevents them from reviewing how it is used.

Mark Andrejevic, in “Shareable and Un-Shareable Knowledge,” focuses on what it means to generate actionable but non-shareable information, reaffirming the urgency of intelligible evaluation as a form of dignity.

Margaret Hu’s article “Cambridge Analytica’s Black Box” surveys a range of legal and policy remedies that have been proposed to better protect consumer data and informational privacy.

Paul Prinsloo examines “Black Boxes and Algorithmic Decision-making in (Higher) Education” to show how the education sector is beginning to adopt technologies of monitoring and personalization that are similar to the way the automated public sphere serves political information to voters.

Benedetta Brevini, in “Black Boxes, not Green: Mythologizing AI and Omitting the Environment,” documents how AI runs on technology, machines and infrastructures that deplete scarce resources in their production, consumption and disposal, thus placing escalating demands on energy and accelerating the climate emergency.

Gavin Smith develops the concept of our “right to the face” in “The Face is the Message: Theorising the Politics of Algorithmic Governance in the Black Box City” as he explores how algorithms are now responsible for important surveillance of cities, constantly passing judgment on mundane activities.

Nicole Dewandre’s article, “Big Data: From Fears of the Modern to Wake-up Call for a New Beginning,” applies a deeply nuanced critique of modernity to algorithmic societies, arguing that Big Data may be hailed as the endpoint or materialisation of a Western modernity, or as a wake-up call for a new beginning.

Jonathan Obar confirms this problem empirically in “Sunlight Alone is Not a Disinfectant: Consent and the Futility of Opening Big Data Black Boxes,” and proposes solutions to more equitably share the burden of understanding.

Kamel Ajji, in “Cyborg Finance Mirrors Cyborg Social Media,” outlines how The Black Box Society inspired him to found “21 Mirrors, a nonprofit organization aimed at analyzing, rating and reporting to the public about the policies and practices of social media, web browsers and email services regarding their actual and potential consequences on freedom of expression, privacy, and due process.”

Tuesday, 8 September 2020

Emerging models of data governance in the age of datafication

by Marina Micheli, Marisa Ponti, Max Craglia and Anna Berti Suman 

Big Data & Society 7(2), First published: Sept 1, 2020.

The article synthesizes and critically examines a ‘moving target’: the various practices that are being advanced for the governance of personal data. In recent years, following scandals like Cambridge Analytica and new data protection regulations like the GDPR, attention has been mounting on how data collected by big tech corporations and business entities might be accessed, controlled, and used by other societal actors. Scholars, practitioners and policy makers have been exploring the opportunities of agency for data subjects, as well as the alternative data regimes that could allow public bodies to use such data for their public interest mission. Yet the current circumstances, which are the result of a tradition of ‘corporate self-regulation’ in the digital domain and an overall laissez-faire approach (albeit increasingly divergent by geopolitical context), see the hegemonic position of a few technology corporations that have de facto established ‘quasi-data monopolies’. This is reflected in the asymmetry of power between data corporations, which hold most of the decision-making power over data access and use, and the other stakeholders.

The article increases knowledge about the practices for data governance that are currently being developed by various societal actors beyond ‘big tech’. It does so by describing four data governance models, emphasizing the power of social actors to control how data is accessed and used to produce different kinds of value. A relevant outcome of the article lies in the heuristic tools it proposes, which could be useful to better understand and further examine the emerging models of data governance, looking in particular at the relations between stakeholders and the power (un)balances between them.

The idea for this study originates from a workshop that we organised in the context of the project Digitranscope at the Centre of Advanced Studies of the Joint Research Centre of the European Commission. Seventeen invited experts - from academia, public sector, policymaking, research and consultancy firms - took part in the event, back in October 2018, to discuss the policy implications of the governance of (and with) data. While preparing the workshop, we realised how the various labels that circulated in the policy arena to tackle data governance - such as data sovereignty, data commons, data trusts, etc. - tended to be used equivocally to refer to different concepts (technical solutions, legal frameworks, economic partnerships, etc.), with their meaning slightly shifting according to the context. Furthermore, during the workshop, participants highlighted the widespread lack of knowledge and practical understanding of possible alternatives to the ‘data extraction’ approach of big online platforms, as well as the need to find ways to use data collected by private companies for the public interest, and the urgency to consider data subjects as key stakeholders for the governance of data. With all these insights in mind, we decided to engage in the research that led to this article.

The key contributions of this publication, according to our view, are conceptual and empirical.

  • We developed a ‘social-science informed’ definition of data governance that draws from science and technology studies and critical data studies (hence, also from some key publications of this journal). We understood data governance as the power relations between all the actors affected by, or having an effect on, the way data is accessed, controlled, shared and used, the various socio-technical arrangements set in place to generate value from data, and how value is redistributed between actors. Such a definition allows moving beyond concerns of technical feasibility, efficiency discourses and ‘solutionist’ thinking. Instead, it points to the actual goals for which data is managed, emphasizing who benefits from it, the power (un)balances among stakeholders, the kind of value produced, and the mechanisms (including underlying principles and systems of thought) that sustain these approaches.

  • We conducted a review of relevant resources from the scientific and grey literature on the practices of data governance that led to the identification of four emerging models: data sharing pools, data cooperatives, public data trusts and personal data sovereignty. As this is a rapidly evolving field, we did not aim at offering an exhaustive picture of all possible models - hence these four should not be understood as comprehensive. They also have to be contextualised in our conceptual approach, in the time span in which the research has been conducted and in the European focus taken by the article. Yet, they provide a basis to understand how the emerging data governance models are (re)thinking and redressing power asymmetries between big data platforms and other actors. In particular, they show how both civic society and public bodies are key actors for democratising data governance and redistributing value produced through data.

A social science-informed conceptualisation of data governance allows seeing ‘through the infrastructure’ and encourages asking certain questions, such as: what principles guide data sharing and use? What is done with data and who can access and participate in its governance? What value is produced and how is it redistributed? Questions of this kind are particularly relevant now that the policy debate around data governance is very active (especially in Europe). The future of the data governance models examined in this article – and of any model that allows more actors to control data and use it for purposes beyond the generation of profit for big tech corporations – depends on the policy actions and the legal frameworks that will be developed to sustain them.

Vignette of the data governance models examined in the article.

Keywords: Data governance, Big Data, digital platforms, data infrastructure, data politics, data policy

Wednesday, 2 September 2020

COVID-19 is spatial: Ensuring that mobile Big Data is used for social good

by Age Poom, Olle Järv, Matthew Zook and Tuuli Toivonen

Big Data & Society 7(2), First published: August 28, 2020

The mobility restrictions related to the COVID-19 pandemic have resulted in the biggest disruption to individual mobilities in modern times. Hot spots, quarantine, closed borders, video-conferencing, social distancing and temporary closure of workplaces, schools, restaurants and recreational facilities are all profoundly about distance, separation, and space. Examining the geographical aspect of the pandemic is important in understanding its broad implications, including the broader societal impacts of containment policies.

The avalanche of mobile Big Data – location and time-stamp data from mobile phone call records, operating system, social media or apps – makes it possible to study the spatial effects of the crisis with spatiotemporal detail even at national and global scales. Beyond health care objectives such as understanding how virus transmission is mediated by human mobility or evaluating adherence to restrictions, mobile Big Data also allows us to understand the changes in people’s daily interactions, mobilities and socio-spatial responses across population groups.

Our advocacy for the use of these data, however, is tempered both by our experiences in recent months with the serious limitations of using mobile Big Data and our unease with the power of these same data to track, surveil and discipline social behaviour at the scale of entire populations. 

Thus, we pose the question: How can we use mobile Big Data for social good, while also protecting society from social harm? Drawing on the Estonian and Finnish experiences during the early phases of the COVID-19 pandemic, we highlight issues with quickly developed ad hoc data products as well as the “black box” solutions (Pasquale, 2015) offered by large platform companies that created “new digital divides” among researchers (boyd and Crawford, 2012).

We argue that these examples demonstrate a clear need to re-evaluate the public-private relationships with mobile Big Data and propose two strategic pathways forward.

First, we call for transparent and sound mobile Big Data products that provide relevant up-to-date longitudinal data on the mobility patterns of dynamic populations. To help increase their usefulness, data products should be transparent about their production methodology, and ensure easy access and stability. 

Second, there is also a need to develop trustworthy platforms for collaborative use of raw individual level data. Secured and privacy-respectful access to near real-time raw data is needed for developing and testing sound methodologies for the above-mentioned data products. This would help bridge the Big Data digital divide, enable scientific innovation, and offer needed flexibility in responding to unanticipated questions on changing locations and mobilities in case of crises. To be clear, we do not view this as simple to achieve, particularly as we weigh what kind of institution might best fill this role, or how “social good” is defined and operationalized in practice. But addressing these issues via public debates and academic discourses will leave us better prepared for the next crisis.

Summing up,
  • We need harmonized and representative data about human mobility for better crisis preparedness and social good in general;
  • Methodological transparency about mobile Big Data products is vital for open societies and capacity building;
  • Access to mobile Big Data to develop feasible methodologies and baseline knowledge for public decision-making is needed before the next crisis occurs;
  • Recognizing the fundamental spatiality of the current COVID-19 crisis and crises more generally is the most relevant of all.

Mobile Big Data can help us to better understand and address the important spatial dimensions of the COVID-19 pandemic and every other social phenomenon. The challenge is doing so responsibly (Zook et al., 2017) and not normalizing a lack of spatial privacy.


boyd, d, Crawford, K (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679.

Pasquale, F (2015) The Black Box Society. Cambridge: Harvard University Press.

Zook, M, Barocas, S, boyd, d, et al. (2017) Ten simple rules for responsible big data research. PLOS Computational Biology 13(3): e1005399.

Keywords: mobile Big Data, mobility, COVID-19, spatial data infrastructure, social good, mobile phone data, social media data, privacy

Tuesday, 1 September 2020

Designing for human rights in AI

Evgeni Aizenberg and Jeroen van den Hoven introduce their publication "Designing for human rights in AI" in Big Data & Society 7(2), First published: Aug 18, 2020.

Text abstract
In the age of Big Data, companies and governments are increasingly using algorithms to inform hiring decisions, employee management, policing, credit scoring, insurance pricing, and many more aspects of our lives. Artificial intelligence (AI) systems can help us make evidence-driven, efficient decisions, but can also confront us with unjustified, discriminatory decisions wrongly assumed to be accurate because they are made automatically and quantitatively. It is becoming evident that these technological developments are consequential to people’s fundamental human rights. Despite increasing attention to these urgent challenges in recent years, technical solutions to these complex socio-ethical problems are often developed without empirical study of societal context and the critical input of societal stakeholders who are impacted by the technology. On the other hand, calls for more ethically and socially aware AI often fail to provide answers for how to proceed beyond stressing the importance of transparency, explainability, and fairness. Bridging these socio-technical gaps and the deep divide between abstract value language and design requirements is essential to facilitate nuanced, context-dependent design choices that will support moral and social values. In this paper, we bridge this divide through the framework of Design for Values, drawing on methodologies of Value Sensitive Design and Participatory Design to present a roadmap for proactively engaging societal stakeholders to translate fundamental human rights into context-dependent design requirements through a structured, inclusive, and transparent process.

Keywords: Artificial intelligence, human rights, Design for Values, Value Sensitive Design, ethics, stakeholders

Monday, 31 August 2020

Epistemic clashes in network science: Mapping the tensions between idiographic and nomothetic subcultures

Mathieu Jacomy, Aalborg University

Big Data & Society 7(2), First published: Aug 30, 2020
Keywords: network science, controversy mapping, scale-freeness, complex network, network practices, nomothetic and idiographic

My interest in networks was passed to me by one of my teachers, Franck Ghitalla, who had just read Albert-László Barabási’s best-selling book Linked. Like many others, I was intrigued by the discovery of the scale-free network, a new and exotic structure that scientists started to find in every aspect of the world, from genetics to economy, from the power grid to terrorism, and to love. Or at least, that is what Barabási claimed.

The scale-free network is special because a few nodes get most of the links, while the rest are poorly connected. The number of links per node follows a power law, a distribution already known to be intriguingly pervasive in physics. From there, Barabási and other researchers went on a quest to theorize a universal law of complex networks, using the scale-free model as a foundation. But as the emergent field of network science consolidated, the apparent simplicity of the situation faded away. More accurate measurements challenged the pervasiveness of the power law. Models required more sophistication. Power laws were found in non-scale-free contexts. Scale-freeness became more difficult to assess in empirical situations. Yet the pervasiveness of more or less scale-free networks remained: Network scientists had found something, but what? The field adopted flexible umbrella terms such as “complex network” and “heavy-tailed distribution” to account for the diversity of empirical cases. Network science persisted as a field, but the prospect of theorizing a universal law had lost momentum. The scale-free model was productive despite constant criticism, but now, two decades later, some want to let it go in favor of a more experiment-driven approach. They meet fierce resistance.
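To make the phrase "a few nodes get most of the links" concrete, here is a minimal Python sketch. It assumes a toy growth rule in which each new node links preferentially to nodes that already have many links, a mechanism commonly associated with scale-free models but not described in this post. The function names, the network size and the number of links per new node are all illustrative assumptions.

```python
import random
from collections import Counter


def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a toy network in which new nodes prefer well-connected targets."""
    rng = random.Random(seed)
    # Start from a small, fully connected core (an assumption of this sketch).
    core = links_per_node + 1
    edges = [(i, j) for i in range(core) for j in range(i)]
    # Each node appears in this list once per link it has, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degree_counts(edges):
    """Return a Counter mapping each node to its number of links."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def top_share(degrees, fraction=0.01):
    """Share of all link endpoints held by the best-connected nodes."""
    ranked = sorted(degrees.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)
```

Calling `top_share(degree_counts(grow_network(2000)))` reports how much of the connectivity sits with the best-connected 1% of nodes. Whether a degree distribution like this one is well described by a power law is precisely the kind of empirical question the disputes discussed here turn on, so the sketch illustrates the concentration of links only, not a statistical test.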

Epistemic clashes in network science narrates and reflects on the two main disputes on scale-freeness. The first dispute started in 2005 with a pre-print contesting the characterization of the power law, discussed in blog posts first, and later in academic publications. A compromise position gradually emerged, giving the impression that a consensus had formed. Network scientists believed that the problem originated in a disciplinary divide between statisticians and physicists, and that a better mutual understanding had solved it. However, a second dispute started in 2018, once again with a pre-print, this time contesting the pervasiveness of scale-free networks, discussed on social media and then in academic publications. It came as a surprise to many network scientists, who considered the case closed.

This repeated failure of network scientists to agree on the facts established by their own field motivated me to analyze their exchanges as a controversy, drawing methodological inspiration from Bruno Latour. My material consists of a set of 40 academic and non-academic publications that I selected, coded and analyzed. I synthesize, document and illustrate the dynamic of the controversy by focusing on a reduced set of actors and claims. I then propose my own interpretation of the persistence of the disagreement, arguing against the commonly accepted idea that it is rooted in a disciplinary divide.

Publications analyzed in the study, and their authors. 

To social science scholars, I offer a guided tour of network science, a field that is probably less well known than social network analysis. My account should make clear that the field is not as unified as it may seem from a distance. I put a particular effort into providing a nuanced perspective on the epistemological commitments of physicists such as Barabási. Their nomothetic approach to knowledge postulates the existence of universal laws of nature – finding them being the purpose of science. Evelyn Fox Keller criticized Barabási’s “faith” in “the traditional holy grail of universal ‘laws,’” but I argue that researchers’ beliefs are of no relevance here. The postulated existence of laws acts as an epistemic device, driving methods and shaping scientific claims; laws, here, are not knowledge, but a way to know. Barabási’s papers, unlike his books for the general public, do not claim the existence of universal laws. Their scientific validity does not depend on the existence of such laws, because to postulate is not to believe; that is why the approach is scientifically effective – robust. Social science scholars may find it instructive to follow the resistance of network science’s nomothetic claims to constant criticism. The nomothetic approach might, however, ultimately be losing its dominance over the field.

To network scientists, beyond a recap of a controversy they already know about, I offer a better explanation of why the controversy reopened recently. The commonly accepted idea that it is rooted in a disciplinary divide is not satisfying. Indeed, bridging the disciplinary gap did not solve the problem; on the contrary, Aaron Clauset, a researcher who actively worked at it, co-authored the pre-print that reopened the controversy. I borrow Peter Galison’s concept of trading zones to develop a better diagnosis. I argue that since the inception of network science in the late nineties, theorists have been trading their models for the results of experimentalists. Theorists offered the scale-free model, postulating the pervasiveness of the power law; they needed experimental evidence of this pervasiveness. Experimentalists needed theoretical material to design their experiments; they employed the scale-free model and produced enough empirical evidence to ground the claim to pervasiveness of scale-free networks. This trade was beneficial to both sides, but it was not symmetrical. I argue that the controversy arose when experimentalists (e.g. Clauset) pushed for their own program of characterizing power laws independently of the scale-free model. This program is problematic because it breaks the exchange. Indeed, without the scale-free model, the experimental results do not benefit theorists such as Barabási; worse, they might challenge existing models and lead to new theories. I suggest that the situation is controversial because the balance of power is shifting in network science: from a dynamic where theory leads to experiments (a theory-driven program), the field is moving to experiments calling for new theories (experiment-driven program).

Monday, 24 August 2020

DEMOS: Intentionality and Design in Data Sonification of Social Issues

DEMOS are annual multi-media demonstrations focused on new methods, visualizations, experiments and approaches to the analysis of Big Data, curated by the Editorial Team of Big Data & Society. This year's DEMOS is by Sara Lenzi and Paolo Ciuccarelli and is titled "Intentionality and design in the data sonification of social issues". Please see the brief overview below or read the paper at the link above.

Data sonification is the practice of representing data through sound. From the sonar to the Geiger counter, from the beep of an approaching electric car to the buzz of a forgotten seatbelt, the use of sound to convey information is not new. An official, academic definition of data sonification dates back to the 1990s, when the International Community for Auditory Display was founded. From that point forward, sonification has been applied and debated as a possible means of representing numerical data in support of scientific analysis. Early fields of application include the auditory translation of seismic waves into audible frequencies, the use of sound to complement visualization of astronomical data, sonification of historical stock market data to predict markets’ behaviour, and the representation of ECG data as sound. As data have dramatically expanded into everyday life, questions about how data are represented and communicated have become a key issue, in particular in the case of data dealing with complex social issues.
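As a rough illustration of what "representing data through sound" can mean in practice, the Python sketch below maps each data value to the pitch of a short sine tone, an approach usually called parameter mapping. It is a generic example and not the method used in any of the projects discussed in this post; the function names, the pitch range, the note length and the sample rate are all assumptions chosen for illustration.

```python
import io
import math
import struct
import wave


def values_to_frequencies(values, low_hz=220.0, high_hz=880.0):
    """Map each data value linearly onto a pitch range (in hertz)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal: use the middle of the pitch range.
        return [(low_hz + high_hz) / 2 for _ in values]
    return [low_hz + (v - lo) / span * (high_hz - low_hz) for v in values]


def sonify(values, note_seconds=0.25, sample_rate=8000):
    """Render one sine tone per data value and return the audio as WAV bytes."""
    samples_per_note = int(note_seconds * sample_rate)
    frames = bytearray()
    for freq in values_to_frequencies(values):
        for n in range(samples_per_note):
            amplitude = math.sin(2 * math.pi * freq * n / sample_rate)
            # 16-bit signed little-endian sample at half of full volume.
            frames += struct.pack("<h", int(amplitude * 0.5 * 32767))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()
```

Writing the returned bytes to a file with a `.wav` extension gives a sequence of tones whose pitch rises and falls with the data. Even in a sketch this small, the choice of pitch range, tone length and timbre is a design decision made by the author of the sonification.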

Traditional visual representational methods struggle to support publics in gathering insights and engaging with complex, abstract, multi-dimensional sets of data so that awareness and knowledge are increased. In this context, the use of sound - a sensory modality often described as engaging listeners at a visceral, intuitive level - to represent data has, in recent years, gained momentum. But how does a practice born to support experts in performing task-specific, data-driven scientific analysis transition to a means of mass communication addressing relevant social issues for a non-expert audience? One might look to other sound-related disciplines that have proven extremely successful in complementing audiences’ visual experiences. For instance, sound design for film and video gaming, both well-established design practices with a solid theoretical framework and structured educational curricula, focuses attention both on the aesthetic component of the auditory experience and on the efficiency and efficacy of the informative content. As a design discipline, sound design intentionally addresses the needs and the goals of the listener to build a purposeful, contextualised experience that engages as well as informs a non-specialised audience. The data sonification research community is explicitly advocating for the inclusion of such design practices into the process of creating sonifications for non-expert listeners. Such an approach will inevitably lead authors/designers to make intentional, deliberate decisions that address the specific needs of a given context, and to do so with a purpose, when transforming data into sound.

In this article we introduce intentionality as a framing condition for sonification as a design process. Through the lens of intentionality, we analyse five recent sonification projects which aim to engage publics with social issues. Taking into consideration explicit statements of the authors/designers of these sonifications, we distribute the works on a scale from a higher to a lower degree of intentionality.

At the highest degree of intentionality sits Egypt Building Collapses by the activist group Tactical Technology Collective. In this work, data on one year of accidents involving the sudden collapse of residential buildings in Egypt are visualized on a dedicated website to raise awareness of a serious issue affecting Egyptian society, which resulted in 192 casualties and more than 800 homeless families in one year. The sonification takes the form of a 2.35-minute-long soundscape built on the occurrences of such accidents over one year and uses real sounds of collapsing buildings, an explicit choice made by the authors to use sound as a ‘connecting element’ between the real experience of the phenomenon (a building collapsing) and the abstract figures (the data) representing it. At the opposite pole of our intentionality scale sits Two Trains, by the sound artist Brian Foo, which uses music samples scoured from the Internet to represent income inequality in the areas crossed by Line Two of New York City’s subway. In the words of the author, the selection of ‘agnostic sound traits’ will allow data to ‘speak for themselves’.

But can data really speak for themselves? What is the role of the author in shaping the message conveyed to the listener through sound? Through the discussion of these cases and others sitting at different points on the scale of intentionality, we recognize the inevitability of a communicative relationship in every translation process - and the need to design it intentionally and responsibly for data sonification to become an important medium of communication for a wider audience.

Monday, 20 July 2020

Summer break

The journal Big Data & Society will be on summer break from July 22nd to August 21st. Please excuse any delays in processing and reviewing your submission, and in related correspondence, during that time.

Have a great summer!