Sunday, 1 December 2024

Guest Blog: Navigating Crisis with Joint-Sensemaking: Insights from China's Health Code


by Yang Chen


Qu, J., Chen, L., Zou, H., Hui, H., Zheng, W., Luo, J.-D., Gong, Q., Zhang, Y., Wen, T., & Chen, Y. (2024). Joint-sensemaking, innovation, and communication management during crisis: Evidence from the DCT applications in China. Big Data & Society, 11(3). https://doi.org/10.1177/20539517241270714

 

In times of crisis, technology plays a key role in shaping collective responses. The COVID-19 pandemic highlighted the potential of digital contact tracing (DCT) applications, such as China's Health Code, to manage public health challenges. However, these technologies are not just technical solutions; they are sociotechnical phenomena emerging from complex interactions among technology, society, and governance.

 

The study "Joint-sensemaking, innovation, and communication management during crisis" explores how stakeholders in China navigated these complexities. It challenges the conventional view of innovation diffusion as linear, proposing instead a model of joint-sensemaking—a collaborative process of meaning-making among diverse actors.

 

A significant aspect of this research is its application of structural hole theory, which examines how certain stakeholders, such as official media, act as connectors or "structural hole spanners" within the communication network. These entities bridge otherwise separate groups, facilitating information flow and influencing public sentiment more directly than traditional two-step flow models suggest. This highlights their critical role in shaping how innovations are perceived and adapted during crises.
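To make the role of a "structural hole spanner" concrete, here is a minimal, purely illustrative Python sketch using NetworkX. The toy network, node names, and edges are assumptions made for illustration; the study itself analyzes a large Weibo communication network, not anything like this example. Burt's constraint measure, where lower values indicate a node bridging otherwise disconnected groups, is one common way to operationalize spanning.

```python
# Illustrative sketch only: identifying potential structural hole spanners
# in a toy communication network with NetworkX. Node names and edges are
# hypothetical stand-ins, not data from the study.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    # cluster A: official actors
    ("official_media", "gov_account"),
    ("official_media", "health_agency"),
    ("gov_account", "health_agency"),
    # cluster B: ordinary users
    ("citizen_1", "citizen_2"),
    ("citizen_2", "citizen_3"),
    ("citizen_3", "citizen_1"),
    # ties bridging the two clusters
    ("official_media", "citizen_1"),
    ("official_media", "citizen_2"),
])

# Burt's constraint: lower values indicate a node that spans structural holes,
# i.e. bridges groups that would otherwise be only loosely connected.
constraint = nx.constraint(G)
for node, score in sorted(constraint.items(), key=lambda kv: kv[1]):
    print(f"{node:>15}  constraint = {score:.2f}")
```

In this toy graph, "official_media" receives the lowest constraint score because it alone maintains ties into both clusters, which is the intuition behind calling such actors spanners.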

 

The article provides empirical insights through the analysis of over 113,000 Weibo posts, revealing two pathways of sensemaking: the Patching and Add-in paths. These pathways illustrate how different interventions shape public sentiment and technology acceptance. By focusing on the dynamic interactions and the bridging roles played by key stakeholders, the study offers a nuanced understanding of innovation diffusion in crisis contexts.

 

We describe how these insights can inform strategies for policymakers and public health authorities. By leveraging the influence of structural hole spanners like official media, stakeholders can foster acceptance and trust, ensuring that DCT technologies are effectively integrated into society. This approach underscores the importance of continuous assessment and adaptation of communication strategies to meet the evolving needs of the public.

 

Overall, the study compels us to rethink crisis management as a transdisciplinary endeavor, involving ongoing dialogue and collaboration across different sectors and perspectives. It shifts the focus from problem-solving to problem-opening, encouraging a more comprehensive and inclusive approach to managing innovations during crises.

Tuesday, 26 November 2024

Bookcast: David Golumbia's Cyberlibertarianism

Bookcast details:

Featuring discussants: George Justice and Frank Pasquale
Book: Cyberlibertarianism: The Right-Wing Politics of Digital Technology (University of Minnesota Press, 2024)

This bookcast has been excerpted from a longer discussion: hear more on the University of Minnesota Press podcast here




Thursday, 7 November 2024

Paper Highlight: "Ethical scaling for content moderation: Extreme speech and the (in)significance of artificial intelligence"

Sahana Udupa discusses "Ethical scaling for content moderation: Extreme speech and the (in)significance of artificial intelligence", published with her colleagues Antonis Maronikolakis and Axel Wisiorek in Big Data & Society. Read it via this link.

The paper received the ICA Outstanding Article Award for 2024. Congratulations!

Click here to learn more about the Centre for Digital Dignity.

Friday, 25 October 2024

Guest Blog: Automated Informed Consent

by Adam John Andreotta and Björn Lundgren

Andreotta, A. J., & Lundgren, B. (2024). Automated informed consent. Big Data & Society, 11(4). https://doi.org/10.1177/20539517241289439

If you are an average internet user, you probably encounter prompts for consent daily. For example, you may have been asked, via a pop-up window, whether you would like to accept all cookies or only those necessary for the website you are visiting to function. While it is a good thing that companies actually seek your consent, repeated requests cause so much annoyance that many of us become complacent about the content of the agreement. No one really has the time to read and accept every privacy policy that applies to them online.

One solution to this problem that has emerged in the last few years is so-called "automated consent." The idea is that software could learn what your privacy preferences are and then do all the consenting for you. The idea is promising, but there are technical, legal, and ethical issues associated with it. In the paper, we deal with some of the ethical issues around automated consent. We start by articulating three different versions of automated consent: a reject-all option, a semi-automated option, and a fully automated option, which can feature AI and machine learning. Regardless of which version is used, we argue that a series of normative issues need to be wrestled with before wide-scale adoption of the technology. These include concerns about whether automated consent would impair people's ability to give informed consent; whether it might negatively impact people's online competencies; whether the personal data collection required for it to function raises new privacy concerns; whether it might undermine market models; and how responsibility and accountability might be affected by it. Of course, there is much to be said regarding the legal implications of the technology itself, and we invite legal scholars and computer scientists to explore the regulatory implications and technological options related to automated consent in our discussion.
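To give a concrete, if simplified, sense of what the semi-automated option could look like, here is a hypothetical Python sketch. The request format, the purpose categories, and the preference store are all illustrative assumptions; they are not drawn from the paper or from any real consent-management API.

```python
# Hypothetical sketch of a "semi-automated consent" agent: it applies stored
# privacy preferences to incoming consent requests and falls back to asking
# the user whenever a request contains an unfamiliar purpose. Everything here
# (categories, names, structure) is illustrative, not a real API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConsentRequest:
    site: str
    purposes: list[str]  # e.g. ["necessary", "analytics", "advertising"]

# Preferences the user has configured (or that software has learned) in advance
PREFERENCES = {
    "necessary": True,
    "analytics": False,
    "advertising": False,
}

def decide(request: ConsentRequest) -> Optional[dict[str, bool]]:
    """Return per-purpose consent decisions, or None to escalate to the user."""
    unknown = [p for p in request.purposes if p not in PREFERENCES]
    if unknown:
        return None  # semi-automated: unfamiliar purposes still need a human
    return {p: PREFERENCES[p] for p in request.purposes}

if __name__ == "__main__":
    print(decide(ConsentRequest("news.example", ["necessary", "analytics"])))
    print(decide(ConsentRequest("shop.example", ["necessary", "biometrics"])))
```

A reject-all option would simply decline every non-necessary purpose, while a fully automated option would replace the hand-written preference table with a model trained on the user's past choices, which is precisely where the privacy and competency concerns discussed above become pressing.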

Saturday, 5 October 2024

Guest Blog: Interoperable and Standardized Algorithmic Images: The Domestic War on Drugs and Mugshots Within Facial Recognition Technologies

by Aaron Tucker

Tucker, A. (2024). Interoperable and standardized algorithmic images: The domestic war on drugs and mugshots within facial recognition technologies. Big Data & Society, 11(3). https://doi.org/10.1177/20539517241274593

Generative AI (GenAI) systems, such as Midjourney, Stable Diffusion, and DALL-E, are data visualization systems. Such technologies are the result of their training data in combination with dense algorithmic mathematics: the images produced by such systems surface that original training data, pairings of text and image, for better and for worse.

This dynamic is especially problematic when image data that are laced with racialized vectors of power, such as mugshots, are freely available to those building GenAI models. From their inception in the late 19th century by scientists such as Francis Galton and Alphonse Bertillon, mugshots were always meant to be mobile and standardized: the accepted visuality of the mugshot, as a front-facing pose and a side profile taken in light designed to maximize visibility, ensured that the photograph could be "accurately" compared to any face in question across a variety of locations.

Such logics were adopted by computer scientists working on computational face recognition with mugshots. Reports such as the 1997 "Best Practice Recommendation for the Capture of Mugshots" stressed that mugshots needed to be "interoperable" so that they could move between various facial recognition technologies (FRTs) and applications.

Mugshots are intercut with socio-technical systems such as policing practices, mental health support, and addiction support; mugshots are not neutral images, but rather a composite of affect, lived narrative, social power structures, and, often, violence in many forms. Therefore, as Katherine Biber warned in her 2013 article, "In Crime's Archive," there are real dangers when the criminal archive slips uncritically into the cultural sphere. It is crucial that we pay attention to GenAI and the ways that it tells on itself and on the data it is visualizing through its creations. As my article describes, the ability to generate images with the prompt "a mugshot" that are defined by the same biases as mugshot databases is alarming.

The solution is not to ban such prompts or crack down on prompt engineering that surfaces such results, but rather to address the root issue: the uncritical use and re-use of problematic data in machine training, not just in computer vision systems, but in all AI systems. 



Monday, 23 September 2024

BD&S 2024 Online Colloquium: POLITICS, POWER AND DATA

This year’s Big Data & Society colloquium centers on the theme of "Politics, Power, and Data," exploring the complex intersections where data, algorithms, and socio-political forces converge. Mark your calendars and be sure to join this exciting set of four talks. Details below.


Sunday, 22 September 2024

Guest Blog: Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction

by Isak Engdahl

Engdahl, I. (2024). Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction. Big Data & Society, 11(2). https://doi.org/10.1177/20539517241242457

 

How do AI scientists create data? This ethnographic case study provides an empirical exploration of the making of a computer vision dataset intended by its creators to capture everyday human activities from first-person perspectives, arguably "in the wild". While this insider term suggests that the data is direct and untouched, the article reveals the extensive work that goes into shaping and curating it.

 

The dataset is shown to be the outcome of deliberate curation and alignment work, which requires careful coordination. Protocols such as data annotation guidelines, meant to standardize metadata inputs, help ensure consistency in the technical work and contribute to making the dataset usable for machine learning development.

 

A deeper intervention concerns the dataset creators' use of a standardized list of daily activities, based on an American government survey, to guide what the dataset should cover and what the data subjects should record. The list did not map without friction onto the lives of data subjects situated in different contexts.

 

To address this, dataset creators engaged in alignment work—negotiating with data subjects to ensure their recordings fit the project’s needs. A series of negotiations brought the diverse contributions into a shared direction, structuring the activities and the subsequent data. The careful structuring of activities, the instructions given to participants, and the subsequent annotation of the data all contribute to shaping the dataset into something that can be processed by computers.

 

The study uses these results to promote a reconsideration of how we view the "wildness" of data—a quality associated with objectivity and detachment in science—and to recognize the layers of human effort that go into creating datasets. Can we consider data 'wild' if it’s so carefully curated? It is perhaps not as "wild" as it might seem—the alignment work required to make the dataset usable for machine learning does seem to blur the line between intervention and detachment. A fuller understanding of scientific and engineering practices thus emerges when we consider the often unseen, labor-intensive work that coordination and agreement within research teams rely on.

 

The dataset creators moreover developed specific benchmarks and organized competitions to measure the performance of different models on tasks like action recognition and 3D scene understanding based on the dataset. Benchmark datasets are crucial for evaluating and comparing the performance of machine learning models. Benchmarks, following the Common Task Framework, help standardize the evaluation process across the AI community, enabling distributed groups to collaborate. As actors convene to size up different models, benchmarks become a social vehicle that engenders shared meanings about model performance. However, they also reinforce the limitations inherent in the data, as they become the yardstick by which new models are evaluated. This underscores the importance of closely examining how benchmark datasets are constructed in practice, and the qualities attributed to them.
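The Common Task Framework logic described above can be illustrated with a short, hypothetical Python sketch: competing models are evaluated on the same held-out benchmark split with one agreed metric. The models, clips, and labels here are toy stand-ins, not the dataset or benchmarks discussed in the article.

```python
# Illustrative sketch of the Common Task Framework idea: every team evaluates
# its model on the same fixed benchmark split with one agreed metric.
# The clips, labels, and "models" below are toy stand-ins.
from typing import Callable, Sequence

# Fixed benchmark split shared by all participants
BENCHMARK_CLIPS = ["clip_01", "clip_02", "clip_03", "clip_04"]
BENCHMARK_LABELS = ["cooking", "cleaning", "cooking", "reading"]

def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """The single agreed-upon metric."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Two competing "models": each maps a clip identifier to a predicted activity
def model_a(clip: str) -> str:
    return "cooking"  # naive majority-class baseline

def model_b(clip: str) -> str:
    guesses = {"clip_01": "cooking", "clip_02": "cleaning",
               "clip_03": "cooking", "clip_04": "writing"}
    return guesses[clip]

def evaluate(model: Callable[[str], str]) -> float:
    preds = [model(clip) for clip in BENCHMARK_CLIPS]
    return accuracy(preds, BENCHMARK_LABELS)

for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: accuracy = {evaluate(model):.2f}")
```

Because every result is produced against the same split and the same metric, scores become directly comparable across groups; by the same token, whatever biases or gaps sit in the benchmark data are inherited by every model judged with it.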