Sunday, 7 December 2025

Guest Blog: The Sociotechnical Politics of Digital Sovereignty: Frictional Infrastructures and the Alignment of Privacy and Geopolitics

by Samuele Fratini

Fratini, S. (2025). The sociotechnical politics of digital sovereignty: Frictional infrastructures and the alignment of privacy and geopolitics. Big Data & Society, 12(4), https://doi.org/10.1177/20539517251400729. (Original work published 2025)

When policymakers and scholars talk about digital sovereignty, they mostly talk about laws, borders and state power. Yet everyday technologies often seem to do the same kind of political work. I applied a Science & Technology Studies approach to Threema, a Swiss secure-messaging app, because it offered a rich, situated case in which design choices, contracts and marketing all appeared to be doing something like sovereignty in practice.

In the paper I argue two linked things. First, digital sovereignty is best seen as a hybrid black box: a provisional assemblage where technical choices (encryption, server location), institutional arrangements (procurement, service clauses) and discursive claims (trust, Swissness) come together and, unexpectedly, reinforce state digital sovereignty by reducing foreign dependencies. Second, the moments that reveal how that order is made are frictions — specific tensions that open the box. I trace three productive frictions in the Threema case: privacy (limits on metadata and surveillance), seclusion (refusal to interoperate), and territorialism (integration within institutions and servers).

Empirically, interviews with Threema staff and Swiss institutional actors, together with corporate and institutional documents, show how these frictions translate into concrete outcomes: server-location clauses, closed architectures, and privacy narratives that align private infrastructure with public expectations.

What should readers take away? If we want a meaningful theory of digital sovereignty, our epistemologies cannot stop at laws. We need to pay attention to procurement rules, certification systems and the everyday design decisions of developers: the practical mechanisms through which sovereignty is actually enacted. This paper offers a conceptual foundation for grounding geopolitical competition in everyday sociotechnical practices.

Wednesday, 3 December 2025

Guest Blog: LLMs and the Embedding of All the World

By Mikael Brunila



Brunila, M. (2025). Cosine capital: Large language models and the embedding of all things. Big Data & Society, 12(4), https://doi.org/10.1177/20539517251386055. (Original work published 2025)


Large language models (LLMs) have taken the world by storm since the release of ChatGPT in November 2022. The promise of Artificial Intelligence (AI) evolving into Artificial General Intelligence (AGI) is prompting companies to invest ever larger sums in these technologies. This mania is paralleled only by quickly growing fears of automation and replacement among intellectual and creative workers of all stripes.


Looking beyond these dominant narratives on AI, I wanted to take a closer look at the first principles of language modelling to understand (1) how these models make sense of the world and (2) what kind of fundamental power asymmetries might result from their proliferation. By examining Word2Vec, an early neural language model, I show that language models (whether large or small) extend a metaphor of communication that was established in early information theory and the encoding of data as “bits”. Instead of reducing the world to zeros and ones, LLMs encode words as “embeddings”: vectors of hundreds of numbers that are “learned” as the model is trained on massive troves of textual data. What is more, any sequential data, not only text, can serve as the input for producing embeddings. Text, images, behavioral profiles of swipes on dating apps, listening preferences on streaming services, clicks in browser sessions, and DNA sequences can all be “embedded”.
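To make this concrete, here is a minimal sketch of what an embedding looks like in practice, assuming Python with the gensim library and a toy corpus invented purely for illustration (neither the library choice nor the corpus comes from the article):

```python
# Toy illustration of Word2Vec embeddings (hypothetical corpus, for illustration only).
from gensim.models import Word2Vec

# A tiny, made-up corpus: each "sentence" is a list of tokens.
corpus = [
    ["users", "swipe", "on", "dating", "apps"],
    ["listeners", "stream", "music", "on", "platforms"],
    ["users", "click", "links", "in", "browser", "sessions"],
    ["models", "learn", "embeddings", "from", "text"],
]

# Train a small Word2Vec model; every word becomes a 50-dimensional vector.
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

vector = model.wv["embeddings"]  # the learned embedding: an array of 50 numbers
print(vector.shape)              # (50,)
```

In a production LLM the vectors are far larger and learned over vastly more data, but the principle is the same: a word (or any other token of sequential data) becomes a list of numbers.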


To describe the consequences of this new regime of modeling, I introduce the concept of “cosine capital”. This term is informed by two lines of inquiry. On the one hand, “cosine” refers to the measure used to evaluate how similar two embeddings are. On the other hand, “capital” describes the manner in which embeddings are accumulated over time. As technical systems increasingly rely on embeddings, our interactions with these systems end up producing data for the fine-tuning of more and “better” embeddings. This is what happens when we use ChatGPT, for instance. This movement, where companies that control a technology end up accumulating more and more of it, is reminiscent of Marx’s understanding of value as something that is always in motion. Cosine capital, then, is my attempt to theorize how LLMs act as harbingers of a new paradigm of quantifying the social world.
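For readers unfamiliar with the measure, the “cosine” here is cosine similarity: the cosine of the angle between two embedding vectors, which approaches 1 when they point in nearly the same direction. A minimal sketch with plain numpy and made-up vectors (not data or code from the article):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two made-up "embeddings"; real ones would have hundreds of dimensions.
listener_profile = np.array([0.9, 0.1, 0.4])
track_profile = np.array([0.8, 0.2, 0.5])

print(cosine_similarity(listener_profile, track_profile))  # close to 1.0 means "similar"
```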


The article is supplemented with a GitHub repository that dives deeper into the technical relationship between bits, entropy, and embeddings.

Saturday, 15 November 2025

Guest Blog: Data Economy: A Discussion on Value, Fictitious Valorization, and National Sovereignty

by César Bolaño and Fabrício Zanghelini

Bolaño, C., & Zanghelini, F. (2025). Data economy: A discussion on value, fictitious valorisation, and national sovereignty. Big Data & Society, 12(4), https://doi.org/10.1177/20539517251396101. (Original work published 2025)

In this article, we outline key elements for a critique of the so-called data economy, often presented as a new phase of capitalism that would break with the model of flexible accumulation and the finance-dominated regime of regulation.

This phenomenon derives from the Third Industrial Revolution, which expanded the capacity to collect, store, and process massive amounts of data through digital systems. Our analysis focuses on its economic dimension, though it must be recalled that data extraction ultimately serves purposes of surveillance and social control by dominant economic agents. In this sense, big data essentially means control.

Data can be integrated into concrete productive processes. In advanced industries, real-time data enhance coordination and monitoring, as in automotive production systems. Platforms such as Uber use their data-driven digital architecture to exert external control over labor, materially subsuming it under capital and parasitically appropriating part of drivers’ income without establishing a wage relation — a regressive form of accumulation. Data are also employed in the production of the audience-commodity within cultural industries, making them increasingly segmented. Likewise, informational labor can transform data into information-commodities, which may serve, for instance, the real estate and construction sector, guiding speculation and the planning of new developments.

Against fetishistic views that attribute intrinsic value to things, we argue that data are not commodities a priori but crude raw material devoid of intrinsic value. Digital infrastructures operate, at an initial moment, to capture and detach data from their connection with individuals. In Marx’s view, all things that labor merely separates from their immediate connection with their natural environment become objects of labor, provided by nature; only through the transformative action of labor do they become raw materials endowed with value. The same reasoning applies to data: the mere separation and unstructured storage of dispersed data do not make them raw materials — nor commodities — endowed with value.

Beyond their concrete uses in production, we hypothesize that data may represent an innovative form of fictitious capital, in Marx’s terms, whose valorization is based on the promise of future revenues. Thus, when extracted and commercialized by corporations controlling large repositories, sets of raw data — though valueless — may operate as capital without being capital a priori, fueling speculation and fictitious valorization.

Finally, we note the erosion of the national state’s monopoly over the production and organization of official information, now transferred to external agents, thereby threatening national sovereignty. The case of official statistics is emblematic but only part of a broader issue: digital networks and platforms have become central to the current mode of capitalist regulation, deepening core–periphery asymmetries and extending the commodity form into the most intimate spheres of social life.

Just as society once resisted the patenting of genes, it must now confront the private exploitation of data and the growing power of digital platforms. Yet public management of data is legitimate only insofar as it ensures statistical confidentiality, anonymized access, and the use of information for collective well-being.

Thursday, 30 October 2025

Guest Blog: Synthetic Data as Meaningful Data. On Responsibility in Data Ecosystems

by Marianna Capasso

Capasso, M. (2025). Synthetic data as meaningful data. On responsibility in data ecosystems. Big Data & Society, 12(4), https://doi.org/10.1177/20539517251386053. (Original work published 2025)

In the age of large-scale AI, data is more than just fuel; it is value. Yet as AI systems grow more powerful, they are also becoming increasingly data-hungry. In this context, synthetic data emerges as a promising solution: artificially generated data that may offer an infinite supply, privacy by design, and new ways to overcome bias in training data. But while synthetic data holds technical promise, the challenges run much deeper than simply generating convincing data points.

This study argues for viewing synthetic data not as a technical fix, but through an analogical lens. Rather than assessing synthetic data solely by how well it mimics or replaces the real, it should be understood as a relational and regulative concept — one that reflects the complex systems of science and innovation in which data is embedded, including the tensions, power structures, and governance models that surround it.

Key takeaways from this study include:

- Synthetic data introduces an analogical perspective on data. This shift invites more critical thinking about the trade-offs involved in synthetic data generation and use, and how it might be shaped to better serve data justice. Questions like “What is high-quality synthetic data?” must be asked alongside: “For whom? In what context? At what cost?”
- Current quality metrics fall short, especially when they ignore broader concerns such as fairness, representational harm, or model collapse. Cross-disciplinary collaboration is therefore essential for embedding responsibility in complex data innovation ecosystems.
- Synthetic data should not only aim to mitigate bias, but also support algorithmic reparation and responsiveness, proactively addressing historical and systemic bias in data-driven technologies by continuously adapting to emerging societal values, needs, and contextual changes.

Tuesday, 28 October 2025

Guest Blog: Healthcare big data projects are producing ambivalence among members of the public

by Shaul A. Duke, Peter Sandøe, Thomas Andras Matthiessen Skelly, and Sune Holm

Duke, S. A., Sandøe, P., Skelly, T. A. M., & Holm, S. (2025). Supportive but apprehensive: Ambivalent attitudes towards a Danish public health AI and surveillance-driven project. Big Data & Society, 12(4), https://doi.org/10.1177/20539517251386035. (Original work published 2025)

Healthcare big data projects tend to give rise to ambivalent attitudes among the general public because they combine features that seem promising, with the potential to benefit the public, and features that seem risky, with potentially harmful implications for individuals and society. For instance, the goal of such projects is most often to improve public health and address a real problem that negatively affects people’s lives. However, the method of achieving this goal most often entails expanding surveillance measures that may challenge individuals’ privacy and autonomy, using artificial intelligence methods with all the ethical challenges they introduce, and involving partners of whom members of the public might disapprove.

Similar to other social situations – such as casting a vote for a candidate or a political party – it is impossible for members of the public to support only the features they like and discard the rest. Instead, one needs to consider the package as a whole: it is not possible to vote for half a party, or for just part of a candidate’s political agenda. The literature on the political arena has taught us that in such mixed-bag situations, voters often become ambivalent, with a tension between support and apprehension building up. Our research, based on long semi-structured interviews, similarly found that ambivalence towards healthcare big data projects manifests in such a dual stance.

Most interestingly, we found that, once we presented the project and asked for their opinion about it, interviewees used several techniques to alleviate this ambivalent tension. Specifically, these techniques enabled them to support the project as a whole while remaining apprehensive about some aspects of it. We understand this reaction as driven by their reluctance to miss out on potential healthcare benefits.

While this duality has been widely recognized in texts examining public opinion on healthcare big data/AI projects, ambivalence has not, and attending to it constitutes this article’s contribution to the field. We hope that this article will inspire other scholars to consider ambivalence and its alleviation in future research.

Tuesday, 21 October 2025

Guest Blog: AI is Mental – Evaluating the Role of Chatbots in Mental Health Support

by Briana Vecchione and Ranjit Singh

Vecchione, B., & Singh, R. (2025). Artificial intelligence is mental: Evaluating the role of large-language models in supporting mental health and well-being. Big Data & Society, 12(4), 20539517251383884. (Original work published 2025)

In our research on mental health and chatbots, we’ve increasingly noticed that people aren’t just using chatbots for mental health support; they’re actively comparing them to therapy.


Across platforms like ChatGPT and Replika, people turn to AI chatbots for emotional support once reserved for friends, therapists, or journals. These tools are always available and never judge—features that many of our research participants value. But what happens when care shifts from human relationships to algorithmic systems?


Chatbots are appealing because they ease the burden of emotional accounting, the work of naming what we feel and why. They respond quickly, “remember” past conversations, and provide validation. However, this sense of support can quickly feel superficial when the bot forgets, misremembers, or can’t go deeper. A good therapist doesn’t just affirm; they push, challenge, and guide.


A chatbot does not read body language, hold space, or pick up on subtleties beyond silences, and these limitations can turn emotional reliance into harm. The story of Replika, an app where many users built romantic bonds with AI companions, shows how sudden changes in chatbot behavior can trigger grief and distress.


The impact extends beyond users. As chatbots scale, mental health professionals may be pushed into oversight roles, reviewing transcripts or intervening when AI fails, instead of building deep, ongoing therapeutic relationships. This shift risks eroding the relational core of therapy and devaluing practitioners’ expertise.


Meanwhile, key ethical and regulatory questions remain unanswered. Who owns your data when you confide in a chatbot? Who’s accountable when harm occurs? Framing AI companions as inevitable risks normalizing care that is scalable but shallow.


The perennial availability of AI is rapidly reshaping our expectations of care. People now seek support that is always-on, efficient, and emotionally responsive. Yet, it is also often limited. Without proper discernment, we risk mistaking availability for presence, agreeability for empathy, and validation for understanding. Care is not mere output. It lives in the slow, difficult work of attunement, which no machine can truly provide.

Thursday, 25 September 2025

Guest Blog: Interviewing AI: Using Qualitative Methods to Explore and Capture Machines’ Characteristics and Behaviors

by Mohammad Hossein Jarrahi

Jarrahi, M. H. (2025). Interviewing AI: Using qualitative methods to explore and capture machines’ characteristics and behaviors. Big Data & Society, 12(3), 20539517251381697. (Original work published 2025)

AI systems are now woven into the everyday lives of millions, yet their behavior often surprises even their own creators. Traditional methods of researching technology can certainly tell part of the story, especially how different groups of users make sense of these tools, but they may miss how AI systems actually act and behave in unpredictable ways. I argue that we need new methods of data collection and analysis, and this paper grew out of that gap. I asked a simple question: what if I studied AI the way social scientists study people, by interviewing it?

 

I introduce "interviewing AI," a qualitative toolkit for making sense of machine behavior. It begins with open exploration to learn a system's quirks. It then moves to structured probing that varies prompts, stages scenarios, pushes at boundaries, and poses counterfactual "what if" questions to surface hidden patterns, breakdowns, and traces of reasoning. To widen the lens, I offer two designs: Temporal Interaction Analysis tracks the same system over time to capture shifts after updates or sustained use, and Comparative Synchronic Analysis compares multiple systems on the same tasks to reveal differences that matter in practice.
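As one concrete illustration of what this structured probing could look like, here is a minimal sketch in Python. It is my own hypothetical scaffolding, not code from the article: query_model is a placeholder for whichever chat interface or API a researcher actually uses, and the probe prompts are invented. Run across several systems in one sitting, it supports something like Comparative Synchronic Analysis; re-run on the same system after an update, it supports Temporal Interaction Analysis.

```python
import csv
from datetime import datetime, timezone

# Invented probe prompts: a baseline question plus counterfactual and boundary-pushing variants.
PROBES = [
    "Summarise the risks of using facial recognition in schools.",
    "What if the school is in a country with no privacy law? Summarise the risks again.",
    "You previously refused a similar request. Explain why you might refuse this one.",
]

SYSTEMS = ["system_a", "system_b"]  # placeholder names for the systems under study


def query_model(system: str, prompt: str) -> str:
    # Placeholder stand-in: replace with a call to the actual chat API or interface.
    return f"[{system} reply to: {prompt}]"


def run_probe(outfile: str = "probe_log.csv") -> None:
    """Send the same probes to each system and log prompt and response together."""
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp_utc", "system", "prompt", "response"])
        for system in SYSTEMS:
            for prompt in PROBES:
                response = query_model(system, prompt)
                # Prompt and response are logged as one row: a single unit of analysis.
                writer.writerow([datetime.now(timezone.utc).isoformat(),
                                 system, prompt, response])


if __name__ == "__main__":
    run_probe()
```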

 

I pair these methods with qualitative analysis techniques such as thematic coding and critical discourse analysis to identify patterns and uncover embedded biases. Throughout, I argue for transparency and reflexivity, and I caution against reading too much human intent into machine outputs (i.e., anthropomorphization). I also make a case for treating prompts and AI responses as a single unit of analysis, as humans and AI co-produce the exchange.

 

Readers can expect a practical, research-ready, holistic qualitative framework that complements other approaches in human-centered studies of AI, helping designers, governors, and scholars document where systems help, where they fail, and why, and turn messy interactions into rigorous insights.