Monday 23 September 2024

BD&S 2024 Online Colloquium: POLITICS, POWER AND DATA

This year’s Big Data & Society colloquium centers on the theme of "Politics, Power, and Data," exploring the complex intersections where data, algorithms, and socio-political forces converge. Mark your calendars and be sure to join this exciting set of four talks. Details below.


Sunday 22 September 2024

Guest Blog: Agreements 'in the wild'

by Isak Engdahl

Engdahl, I. (2024). Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction. Big Data & Society, 11(2), 1–13. https://doi.org/10.1177/20539517241242457


How do AI scientists create data? This ethnographic case study offers an empirical exploration of the making of a computer vision dataset intended by its creators to capture everyday human activities from first-person perspectives, purportedly "in the wild". While this insider term suggests that the data are direct and untouched, the article reveals the extensive work that goes into shaping and curating them.


The dataset is shown to be the outcome of deliberate curation and alignment work that requires careful coordination. Protocols such as data annotation guidelines, which standardize the metadata that annotators enter, help ensure consistency in the technical work and contribute to making the dataset usable for machine learning development.
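
As a concrete illustration of the kind of standardization such guidelines perform, here is a minimal sketch in Python; the field names and activity labels are hypothetical, not those of the actual project. Every metadata entry is checked against a controlled vocabulary before it enters the dataset:

```python
# Hypothetical sketch: an annotation guideline enforced as a controlled vocabulary.
# Field names and activity labels are illustrative, not those of the actual project.

ALLOWED_ACTIVITIES = {"cooking", "cleaning", "commuting", "shopping"}
REQUIRED_FIELDS = {"clip_id", "activity", "start_sec", "end_sec"}

def validate_annotation(entry: dict) -> list[str]:
    """Return the list of guideline violations for one metadata entry."""
    errors = []
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if entry.get("activity") not in ALLOWED_ACTIVITIES:
        errors.append(f"activity {entry.get('activity')!r} is not in the controlled vocabulary")
    if {"start_sec", "end_sec"} <= entry.keys() and entry["start_sec"] >= entry["end_sec"]:
        errors.append("start_sec must precede end_sec")
    return errors

# An annotator's free-form label is rejected until it is aligned with the standard list.
print(validate_annotation(
    {"clip_id": "c01", "activity": "making dinner", "start_sec": 3.0, "end_sec": 12.5}
))
# -> ["activity 'making dinner' is not in the controlled vocabulary"]
```

The point of such a check is social as much as technical: it pushes every annotator's input into one shared format that downstream machine learning code can rely on.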


A deeper intervention concerns the dataset creators' use of a standardized list of daily activities, based on an American government survey, to guide what the dataset should cover and what the data subjects should record. The list did not map onto the lives of data subjects situated in different contexts without friction.


To address this, dataset creators engaged in alignment work—negotiating with data subjects to ensure their recordings fit the project’s needs. A series of negotiations brought the diverse contributions into a shared direction, structuring the activities and the subsequent data. The careful structuring of activities, the instructions given to participants, and the subsequent annotation of the data all contribute to shaping the dataset into something that can be processed by computers.


The study uses these results to prompt a reconsideration of how we view the "wildness" of data, a quality associated with objectivity and detachment in science, and to recognize the layers of human effort that go into creating datasets. Can we consider data "wild" if they are so carefully curated? The data are perhaps not as "wild" as they might seem: the alignment work required to make the dataset usable for machine learning blurs the line between intervention and detachment. A fuller understanding of scientific and engineering practices thus emerges when we consider the often unseen, labor-intensive coordination and agreement on which research teams rely.


The dataset creators moreover developed specific benchmarks and organized competitions based on the dataset to measure the performance of different models on tasks such as action recognition and 3D scene understanding. Benchmark datasets are crucial for evaluating and comparing the performance of machine learning models. Benchmarks, following the Common Task Framework, help standardize the evaluation process across the AI community, enabling distributed groups to collaborate. As actors convene to size up different models, benchmarks become a social vehicle that engenders shared meanings about model performance. However, they also reinforce the limitations inherent in the data, as they become the yardstick by which new models are evaluated. This underscores the importance of closely examining how benchmark datasets are constructed in practice, and the qualities attributed to them.
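
To make the Common Task Framework concrete, here is a minimal sketch in Python; the team names, labels, and predictions are hypothetical. What standardizes the evaluation is that every submission is scored with the same metric on the same held-out test split:

```python
# Minimal sketch of Common Task Framework-style evaluation (all data hypothetical):
# a fixed, held-out test set and a single shared metric make scores comparable.

test_labels = ["walk", "cook", "clean", "walk", "shop"]

# Predictions submitted by different (hypothetical) teams for the same test split.
submissions = {
    "team_a": ["walk", "cook", "clean", "cook", "shop"],
    "team_b": ["walk", "clean", "cook", "walk", "shop"],
}

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Shared metric: the fraction of correctly recognized actions."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# The leaderboard ranks every submission by the same yardstick.
leaderboard = sorted(
    ((team, accuracy(preds, test_labels)) for team, preds in submissions.items()),
    key=lambda item: item[1],
    reverse=True,
)
for team, score in leaderboard:
    print(f"{team}: {score:.2f}")   # team_a: 0.80, team_b: 0.60
```

Because the test split and the metric are fixed, the resulting numbers are comparable across distributed groups; by the same token, any blind spots in the test data become blind spots of every model ranked against it.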


Saturday 14 September 2024

Guest Blog: The 'doings' behind data

by Isabelle Donatz-Fest

Donatz-Fest, I. (2024). The ‘doings’ behind data: An ethnography of police data construction. Big Data & Society, 11(3). https://doi.org/10.1177/20539517241270695

Data are the lifeblood of algorithmic systems. But data are often taken for granted by public organizations, which see them as something just lying around, ready to use. Such is the case with police reports, which are increasingly used as data for algorithmic policing applications worldwide.

But there are 'doings' behind data. Data are created in unexpected places, like the front seat of a speeding police car or the desk of an overworked detective. Material factors and human actors interact behind the scenes, informing data creation and interpretation.

I spent roughly 200 hours ethnographically observing how street-level employees at the Netherlands Police translate events into police reports. What I found was that data work is deeply embedded in policing, shaped by personal values, organizational context, and practical considerations.

Structured data often clash with officers' understanding of a situation. Registration software demands that incidents fit into predefined categories, but the messy world we live in rarely fits neatly into such boxes. Unstructured data provide more flexibility and richness but introduce complexities for standardization and (algorithmic) interpretation. Open text fields open the door to linguistic nuances, inconsistencies, and what I term 'voice': the various identities present in the text.
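
A hypothetical sketch of such a registration record illustrates the tension; the schema and category names below are illustrative, not the Netherlands Police's actual system. The structured field forces a choice among fixed categories, while the free-text field admits the richness and 'voice' that later complicate algorithmic interpretation:

```python
# Hypothetical sketch of a registration record; the schema and category names are
# illustrative, not the Netherlands Police's actual system.
from dataclasses import dataclass
from enum import Enum

class IncidentCategory(Enum):       # structured: exactly one box must be ticked
    THEFT = "theft"
    ASSAULT = "assault"
    DISTURBANCE = "disturbance"
    OTHER = "other"                 # the catch-all where messy reality often ends up

@dataclass
class PoliceReport:
    category: IncidentCategory      # machine-readable and easy to aggregate
    narrative: str                  # free text: rich, but full of 'voice'

# A situation that spans several categories gets squeezed into one of them,
# while the narrative preserves what the form could not.
report = PoliceReport(
    category=IncidentCategory.DISTURBANCE,
    narrative=("Neighbours arguing over a fence; one pushed the other and "
               "items were thrown. No injuries; both parties calmed down."),
)
print(report.category.value)        # what the algorithm typically consumes
print(report.narrative)             # what the officer actually recorded
```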

I saw officers wrestle with these limitations, sometimes bending the rules, sometimes choosing the path of least resistance. These choices reflect officers' values and the pressures they face. Whether it is a commitment to justice, a desire to help a colleague, or the need to move quickly to the next call, the context shapes the data directly.

This work offers new empirical insight into the data underpinning public sector algorithms. By understanding the 'doings' behind data, we can begin to question how we use them in algorithmic systems, which is particularly relevant in fields as impactful and powerful as policing.

Wednesday 28 August 2024

Guest Blog: Problem-solving? No, problem-opening! A Method to Reframe and Teach Data Ethics as a Transdisciplinary Endeavour

by Stefano Calzati and Hendrik Ploeger

Calzati, S., & Ploeger, H. (2024). Problem-solving? No, problem-opening! A method to reframe and teach data ethics as a transdisciplinary endeavour. Big Data & Society, 11(3). https://doi.org/10.1177/20539517241270687

Technology can go a long way in "chopping up" reality and reifying resources – and data are no exception. The thingness of data – just think of the refrain "data as the new oil" – is often considered a given, i.e., a datum. Yet a growing body of research has shown that data are inherently sociotechnical, leading scholars to regard them as bundlings originating from processes at the coalescing point of technical and non-technical actors, factors, and values. So the questions become: How can this be operationalized? How, for instance, can new generations of undergraduates trained in computer science, data analytics, software engineering, and similar technical subjects be taught that data are sociotechnical bundlings? And how can such an understanding be incorporated into their practices?

In the article "Problem-solving? No, problem-opening! A Method to Reframe and Teach Data Ethics as a Transdisciplinary Endeavour" we set out to answer these questions. First, we reconceptualize data ethics not so much as a normative (dos vs. don'ts) and axiomatic (good vs. bad) toolbox, but as a critical compass for thinking about data as sociotechnical bundlings and for orienting their fair processing. This, in turn, entails that data technologies are always good and bad at once, insofar as they produce, at all times, value-laden entanglements and un/intended consequences that demand to be unpacked and assessed in context, i.e., from different perspectives, simultaneously, and over time. This is an inherently transdisciplinary endeavour that cuts across epistemological boundaries, resists any privileged point of reference, and configures an ongoing multidimensional analysis.

What we describe in detail in the article is the application of this view to an elective course titled "Ethics for the data-driven city", which we purposely designed and taught as part of the Geomatics master's program at Delft University of Technology. Notably, we developed a transdisciplinary method that is not problem-solving but problem-opening, that is, a method that helps students recognize and problematize the irreducibility of all ethical stances and the contingency of all technological "solutions", especially when these are situated in the city as a complex system that resists computation. Overall, the course compels students, on the one hand, to think critically about (the definition of) problems by shifting the ground on which engineering problem-solving rests and, on the other hand, to materialize that critical shift in their final assignments, conceived in the form of digital or physical artefacts.

Tuesday 18 June 2024

BD&S Journal will be on break from Aug 4th to Sept 4th, 2024

The editorial team of the journal Big Data & Society will be on break from August 4th to September 4th, 2024. Please expect delays in the processing and reviewing of submissions, and in related correspondence, during that time. Thank you!


Wednesday 29 May 2024

Guest Blog: Jascha Bareis on Trust and AI

by Jascha Bareis

Bareis, J. (2024). The trustification of AI. Disclosing the bridging pillars that tie trust and AI together. Big Data & Society, 11(2). https://doi.org/10.1177/20539517241249430

Does it make sense to tie trust to AI?

Everywhere we look – in companies, politics, and research – so many people are focusing on AI. As AI is approached as the core technological institution of our times, notions of trust are repeatedly mobilized. Policy makers especially seem to feel urged to highlight the need for trustworthiness in relation to AI. Take the current European AI Act, which proclaims the EU a "leader in the uptake of trustworthy AI" (AIA, Article 1a), or the 2023 US executive order on "Safe, Secure and Trustworthy AI".

I simply asked myself: despite all this attention, is it at all legitimate to marry a technology with an intersubjective phenomenon that used to be reserved for relations between humans? I can trust my neighbor next door to take care of my cat, but can I trust the TESLA Smart Cat Toilet's automated cleaning cycle to take care of its poo-poo, too (indeed, the process of gadget'ification and smartification does not spare cat toilets)?

Does it make sense at all to talk about trust in the latter case, or are we just dealing with a conceptual misfit? Doing more research, I realized that the way trust is handled in both the policy and the academic AI debate is very sloppy: it remains undertheorized and is somehow taken for granted.

I noticed that users approach trust in AI as something intersubjective, expecting great things from their new AI-powered gadget and then being utterly disappointed when it fails to deliver (because even if it is branded as "smart", there is actually no AI in the TESLA Smart Cat Toilet). Users also perceive AI as something highly mediated by powerful actors: when Elon Musk trusts that AI will be the cure for the world's problems, many people seem to follow blindly (but do they then trust AI, or Elon?). And AI can mobilize larger political dimensions and strong sentiments, as when my friend told me that she would certainly distrust AI because she distrusted the integrity of EU politicians who, instead of regulating it, let Big Tech get "greedy and rich".

Communication, mediation, sentiments, expectations, power, misconceptions – all of these seemed to have a say in the relationship between AI and trust. This created a messy picture, with AI and trust enmeshed in a complex social interplay of overlapping epistemic realms and actors.

As a consequence, I set out to problematize this relationship in the paper. I argue that trust is located in the social, and only if one acknowledges that AI is a deeply social phenomenon as well does this relationship make sense at all. AI produces notions of trust and distrust because it is woven into, and negotiated within, the everyday realities of users and society, with AI applications mediating human relationships and producing intimacies, social orders, and knowledge authorities. I came up with the following analytical scheme.

[Figure: analytical scheme of the bridging pillars that tie trust and AI together]

I walk through the scheme in the paper and describe its value and limitations, rendering trustworthy AI as a constant and complex dynamic between actual technological developments and the social realities, political power struggles, and futures that are associated with it.