Tuesday, 23 December 2025

Guest Blog: Cross-Cultural Challenges in Generative AI: Addressing Homophobia in Diverse Sociocultural Contexts

by Lilla Vicsek, Mike Zajko, Anna Vancsó, Judit Takacs, and Szabolcs Annus

Vicsek, L., Zajko, M., Vancsó, A., Takacs, J., & Annus, S. (2025). Cross-cultural challenges in generative AI: Addressing homophobia in diverse sociocultural contexts. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251396069 (Original work published 2025)

Calls for greater cultural sensitivity in artificial intelligence are increasingly shaping debates about how generative AI should interact with users across the world. Yet, when these technologies adapt too closely to local values, tensions can emerge between cultural sensitivity and universal human rights. Our research set out to explore what this tension looks like in practice.

We analyzed how two major generative AI systems—ChatGPT 3.5 and Google Bard—respond to homophobic statements. To test whether contextual information would affect their replies, we varied the prompts by including details about the user’s religion or country. 
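
The paper reports the study design rather than a reusable test harness, but a minimal sketch of this kind of contextual prompt variation, assuming the OpenAI Python client and an invented stimulus sentence and set of context cues, could look like this:

```python
# Hypothetical sketch (not the authors' instrument): hold the statement
# constant and vary only the user-context cue, then collect the model's
# replies for comparison. Model name, stimulus, and cues are
# illustrative assumptions.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STATEMENT = "I believe homosexuality is wrong."  # example stimulus
CONTEXTS = [
    "",                              # baseline: no background information
    "I am a devout Christian. ",     # religion cue
    "I am a devout Muslim. ",        # religion cue
    "I live in the Netherlands. ",   # country cue
    "I live in Nigeria. ",           # country cue
]

def collect_responses() -> dict[str, str]:
    responses = {}
    for context in CONTEXTS:
        prompt = context + STATEMENT
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # reduce sampling noise across conditions
        )
        responses[prompt] = reply.choices[0].message.content
    return responses
```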

The results revealed that ChatGPT’s responses often reflected cultural relativism, emphasizing that different cultures hold different viewpoints on this topic and that all perspectives should be respected. Bard’s responses, in contrast, tended to foreground human rights, providing stronger and more explicit support for LGBTQ+ people and equality. Both systems showed significant variation in their responses depending on the background information about the user, suggesting that AI systems may adjust the degree and form of support they express for LGBTQ+ people and issues according to the information they receive about a user.

By uncovering this dynamic, our study highlights an important ethical dilemma. While cultural awareness is important, AI’s efforts to align with diverse values must not come at the cost of endorsing discriminatory or exclusionary beliefs. We argue that generative AI design and governance should be firmly grounded in universal human rights principles to ensure protection for marginalized groups across cultures.

Wednesday, 17 December 2025

Guest Blog: Navigating AI–nature frictions: Autonomous vehicle testing and nature-based constraints

Das, P., Woods, O., & Kong, L. (2025). Navigating AI–nature frictions: Autonomous vehicle testing and nature-based constraints. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251406123 (Original work published 2025)

AI technologies are increasingly embedded in urban environments, with expanding applications across transportation, healthcare, disaster management, and public safety. Promoted by tech firms and policymakers as a transformative tool for more efficient urban management, AI is often imagined as operating smoothly across urban spaces. In practice, however, urban environments are inherently erratic, shaped by complex and unpredictable social, political, cultural, and ecological interactions. These interactions and conditions are not always compatible with AI systems. Rather than viewing such challenges as temporary obstacles, our work foregrounds the tensions and frictions that fundamentally shape how AI is integrated into urban life.

In our recent paper, we focus on one specific dimension of AI–urban friction: the unpredictability of nature-based factors such as weather patterns and vegetation growth. We explore this through the case of autonomous vehicle (AV) testing in Singapore. The study forms part of a larger research project on smart city knowledge transfer in Southeast Asia, spanning Singapore, Thailand, Indonesia, and Vietnam. As a leader in digital transformation, Singapore positions itself as a testbed for advanced technologies such as AI and digital twins. During our fieldwork, AV testing emerged as a key site where frictions between AI systems and urban nature became especially visible.

To make sense of these dynamics, we introduce the concept of frictional urbanisms in our paper. This framework captures how the smooth operational demands of AI come into conflict with the rough and unpredictable conditions of urban environments. Our findings show that such frictions are not exceptional circumstances, but everyday conditions that shape how AI systems are tested, adapted, and governed.

The paper also lays the foundation for our next research project, “Autonomizing environmental governance in Asian cities: AI, climate change and frictional urbanisms,” funded by the Ministry of Education, Singapore. Beginning in January 2026, the project will examine how AI is reshaping urban environmental governance across South and Southeast Asia, with a focus on the emerging challenges at the intersection of two rapidly evolving fields: AI and urban climate change governance.

Sunday, 7 December 2025

Guest Blog: The Sociotechnical Politics of Digital Sovereignty: Frictional Infrastructures and the Alignment of Privacy and Geopolitics

by Samuele Fratini

Fratini, S. (2025). The sociotechnical politics of digital sovereignty: Frictional infrastructures and the alignment of privacy and geopolitics. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251400729 (Original work published 2025)

When policymakers and scholars talk about digital sovereignty, they mostly talk about laws, borders and state power. Nevertheless, everyday technologies often seem to do the same kind of political work. I applied a Science & Technology Studies approach to Threema, a Swiss secure-messaging app, because it offered a rich, situated case where design choices, contracts and marketing all appeared to be doing something like sovereignty in practice.

In the paper I argue two linked things. First, digital sovereignty is best seen as a hybrid black box: a provisional assemblage where technical choices (encryption, server location), institutional arrangements (procurement, service clauses) and discursive claims (trust, Swissness) come together and, unexpectedly, reinforce state digital sovereignty by reducing foreign dependencies. Second, the moments that reveal how that order is made are frictions — specific tensions that open the box. I trace three productive frictions in the Threema case: privacy (limits on metadata and surveillance), seclusion (refusal to interoperate), and territorialism (integration within institutions and servers).

Empirically, interviews with Threema staff and Swiss institutional actors, together with corporate and institutional documents, show how these frictions translate into concrete outcomes: server-location clauses, closed architectures, and privacy narratives that align private infrastructure with public expectations.

What should readers take away? If we want a meaningful theory of digital sovereignty, our epistemologies cannot stop at laws. We need to pay attention to procurement rules, certification systems and the everyday design decisions of developers: the practical mechanisms through which sovereignty is actually enacted. This paper offers a conceptual foundation for grounding geopolitical competition in everyday sociotechnical practices.

Wednesday, 3 December 2025

Guest Blog: LLMs and the Embedding of All the World

by Mikael Brunila

Brunila, M. (2025). Cosine capital: Large language models and the embedding of all things. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251386055 (Original work published 2025)

Large language models (LLMs) have taken the world by storm since the release of ChatGPT in November 2022. The promise of Artificial Intelligence (AI) evolving into Artificial General Intelligence (AGI) is prompting companies to invest increasingly substantial sums in these technologies. This mania is paralleled only by rapidly growing fears of automation and replacement among intellectual and creative workers of all stripes.

Looking beyond these dominant narratives on AI, I wanted to take a closer look at the first principles of language modeling to understand (1) how these models make sense of the world and (2) what kind of fundamental power asymmetries might result from their proliferation. Through examining Word2Vec, an early neural language model, I show that language models (whether large or small) extend a metaphor of communication that was established in early information theory and the encoding of data as “bits”. Instead of reducing the world to zeros and ones, LLMs encode words as “embeddings”: lists of hundreds of numbers that are “learned” as the model is trained on massive troves of textual data. What is more, any sequential data, not only text, can serve as the input for producing embeddings. Text, images, behavioral profiles of swipes on dating apps, listening preferences on streaming services, clicks in browser sessions, and DNA sequences can all be “embedded”.
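
To make the mechanics concrete, here is a minimal sketch of learning such embeddings with gensim's Word2Vec implementation; the toy corpus and parameter values are invented for illustration and are far smaller than anything the article discusses:

```python
# Minimal sketch of training Word2Vec embeddings on a toy corpus.
# Corpus and parameters are purely illustrative assumptions.
from gensim.models import Word2Vec  # pip install gensim

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # each word becomes a list of 100 numbers
    window=2,         # context words considered on each side
    min_count=1,      # keep every word in this tiny corpus
    epochs=50,
)

print(model.wv["cat"].shape)              # (100,): the embedding for "cat"
print(model.wv.similarity("cat", "dog"))  # cosine similarity of two embeddings
```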

To describe the consequences of this new regime of modeling, I introduce the concept of “cosine capital”. This term is informed by two lines of inquiry. On the one hand, “cosine” refers to the measure used to evaluate how similar two embeddings are. On the other hand, “capital” describes the manner in which embeddings are accumulated over time. As technical systems increasingly rely on embeddings, our interactions with these systems end up producing data for the fine-tuning of more and “better” embeddings. This is what happens when we use ChatGPT, for instance. This movement, where companies that control a technology end up accumulating more and more of it, is reminiscent of Marx’s understanding of value as something that is always in motion. Cosine capital, then, is my attempt to theorize how LLMs act as harbingers of a new paradigm of quantifying the social world.
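
For readers who want the measure itself: the cosine similarity of two embeddings is the cosine of the angle between them, that is, their dot product divided by the product of their lengths. A minimal sketch with invented three-dimensional vectors (real embeddings run to hundreds of dimensions):

```python
# Toy illustration of the "cosine" in cosine capital. The vectors are
# invented for the example, not learned from data.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

king = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.2])
banana = np.array([0.1, 0.9, 0.6])

print(cosine_similarity(king, queen))   # high: related words point the same way
print(cosine_similarity(king, banana))  # lower: unrelated words diverge
```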

The article is supplemented with a GitHub repository that dives deeper into the technical relationship between bits, entropy, and embeddings.

Saturday, 15 November 2025

Guest Blog: Data Economy: A Discussion on Value, Fictitious Valorization, and National Sovereignty

by César Bolaño and Fabrício Zanghelini

Bolaño, C., & Zanghelini, F. (2025). Data economy: A discussion on value, fictitious valorisation, and national sovereignty. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251396101 (Original work published 2025)

In this article, we outline key elements for a critique of the so-called data economy, often presented as a new phase of capitalism that would break with the model of flexible accumulation and the finance-dominated regime of regulation.

This phenomenon derives from the Third Industrial Revolution, which expanded the capacity to collect, store, and process massive amounts of data through digital systems. Our analysis focuses on its economic dimension, though it must be recalled that data extraction ultimately serves purposes of surveillance and social control by dominant economic agents. In this sense, big data essentially means control.

Data can be integrated into concrete productive processes. In advanced industries, real-time data enhance coordination and monitoring, as in automotive production systems. Platforms such as Uber use their data-driven digital architecture to exert external control over labor, materially subsuming it under capital and parasitically appropriating part of drivers’ income without establishing a wage relation — a regressive form of accumulation. Data are also employed in the production of the audience-commodity within cultural industries, making them increasingly segmented. Likewise, informational labor can transform data into information-commodities, which may serve, for instance, the real estate and construction sector, guiding speculation and the planning of new developments.

Against fetishistic views that attribute intrinsic value to things, we argue that data are not commodities a priori but crude raw material devoid of intrinsic value. Digital infrastructures operate, at an initial moment, to capture and detach data from their connection with individuals. In Marx’s view, all things that labor merely separates from their immediate connection with their natural environment become objects of labor, provided by nature; only through the transformative action of labor do they become raw materials endowed with value. The same reasoning applies to data: the mere separation and unstructured storage of dispersed data do not make them raw materials — nor commodities — endowed with value.

Beyond their concrete uses in production, we hypothesize that data may represent an innovative form of fictitious capital, in Marx’s terms, whose valorization is based on the promise of future revenues. Thus, when extracted and commercialized by corporations controlling large repositories, sets of raw data — though valueless — may operate as capital without being capital a priori, fueling speculation and fictitious valorization.

Finally, we note the erosion of the national state’s monopoly over the production and organization of official information, now transferred to external agents, thereby threatening national sovereignty. The case of official statistics is emblematic but only part of a broader issue: digital networks and platforms have become central to the current mode of capitalist regulation, deepening core–periphery asymmetries and extending the commodity form into the most intimate spheres of social life.

Just as society once resisted the patenting of genes, it must now confront the private exploitation of data and the growing power of digital platforms. Yet public management of data is legitimate only insofar as it ensures statistical confidentiality, anonymized access, and the use of information for collective well-being.

Thursday, 30 October 2025

Guest Blog: Synthetic Data as Meaningful Data. On Responsibility in Data Ecosystems

by Marianna Capasso

Capasso, M. (2025). Synthetic data as meaningful data. On responsibility in data ecosystems. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251386053 (Original work published 2025)

In the age of large-scale AI, data is more than just fuel; it is value. Yet as AI systems grow more powerful, they are also becoming increasingly data-hungry. In this context, synthetic data emerges as a promising solution: artificially generated data that may offer an infinite supply, privacy by design, and new ways to overcome bias in training data. But while synthetic data holds technical promise, the challenges run much deeper than simply generating convincing data points.

This study argues for viewing synthetic data not as a technical fix, but through an analogical lens. Rather than assessing synthetic data solely by how well it mimics or replaces the real, it should be understood as a relational and regulative concept, one that reflects the complex systems of science and innovation in which data is embedded, including the tensions, power structures, and governance models that surround it.

Key takeaways from this study include:

- Synthetic data introduces an analogical perspective on data. This shift invites more critical thinking about the trade-offs involved in synthetic data generation and use, and how it might be shaped to better serve data justice. Questions like “What is high-quality synthetic data?” must be asked alongside: “For whom? In what context? At what cost?”

- Current quality metrics fall short, especially when they ignore broader concerns such as fairness, representational harm, or model collapse (see the sketch after this list). Cross-disciplinary collaboration is therefore essential for embedding responsibility in complex data innovation ecosystems.

- Synthetic data should not only aim to mitigate bias, but also support algorithmic reparation and responsiveness, proactively addressing historical and systemic bias in data-driven technologies by continuously adapting to emerging societal values, needs, and contextual changes.
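
As a hypothetical illustration of the second takeaway, an aggregate fidelity metric can look healthy while subgroup representation quietly collapses; the data and numbers below are invented for this sketch and do not come from the article:

```python
# Sketch of why aggregate quality metrics can mislead: a synthetic
# dataset can match the real data's overall mean while a minority
# subgroup is nearly dropped. All values are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: 80% majority group (mean 50), 20% minority group (mean 70).
real = np.concatenate([rng.normal(50, 5, 800), rng.normal(70, 5, 200)])
real_groups = np.array([0] * 800 + [1] * 200)

# "Synthetic" data that almost drops the minority group but compensates
# by shifting the majority, keeping the overall mean on target.
synth = np.concatenate([rng.normal(54, 5, 980), rng.normal(70, 5, 20)])
synth_groups = np.array([0] * 980 + [1] * 20)

# Aggregate fidelity looks fine...
print(abs(real.mean() - synth.mean()))          # small gap in overall mean

# ...but subgroup representation has collapsed.
print(real_groups.mean(), synth_groups.mean())  # 0.20 vs 0.02 minority share
```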

Tuesday, 28 October 2025

Guest Blog: Healthcare big data projects are producing ambivalence among members of the public

by Shaul A. Duke, Peter Sandøe, Thomas Andras Matthiessen Skelly, and Sune Holm

Duke, S. A., Sandøe, P., Skelly, T. A. M., & Holm, S. (2025). Supportive but apprehensive: Ambivalent attitudes towards a Danish public health AI and surveillance-driven project. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251386035 (Original work published 2025)

Healthcare big data projects are prone to give rise to ambivalent attitudes among the general public because they tend to combine features that seem promising, with the potential to benefit the public, and features that seem risky, with potentially harmful implications for individuals and society. For instance, the goal of such projects usually promises to improve public health and address a real problem that negatively affects people’s lives. However, the method of achieving this goal usually entails expanding surveillance measures that may challenge individuals’ privacy and autonomy, using artificial intelligence methods with all the ethical challenges they introduce, and involving partners that members of the public might disapprove of.

Similar to other social situations – such as having to cast a vote for a candidate or a political party – it is impossible for members of the public to support only the features they like and discard the rest; instead, one needs to consider the package as a whole. It is not possible to vote for half a party, or for just part of a candidate’s political agenda. The literature on the political arena has taught us that in such mixed-bag situations, voters often become ambivalent, with a tension between support and apprehension building up. Our research, based on long semi-structured interviews, similarly found that ambivalence towards healthcare big data projects manifests in a dual stance.

Most interestingly, we found that interviewees used several techniques to alleviate the ambivalent tension once we had presented the project and asked for their opinion of it. Specifically, these techniques enabled them to support the project as a whole while remaining apprehensive about some aspects of it. We understand this reaction as driven by their reluctance to miss out on potential healthcare benefits.

While this duality has been widely recognized in texts examining public opinion on healthcare big data and AI projects, ambivalence has not, and it constitutes this article’s contribution to the field. We hope that this article will inspire other scholars to consider ambivalence and its alleviation in future research.