Thursday, 21 May 2020

The virtue of simplicity: On machine learning models in algorithmic trading

Kristian Bondo Hansen, Copenhagen Business School
Big Data & Society 7(1), https://doi.org/10.1177/2053951720926558. First published: May 20, 2020.
Keywords: Ockham’s razor, machine learning models, algorithmic trading, distributed cognition, model overfitting, explainability

In the spring of 2018, as part of my fieldwork on the use of emerging technology in securities trading, I sat in on an industry conference on the employment of machine learning (ML) and artificial intelligence (AI) in finance. The conference took place in a rather lavish late-eighteenth-century building in central London, which had belonged to the freemasons before being turned into a hotel and then a conference venue. On the second day of the event, a labour union was having a gathering just one floor down from where I sat listening to old-school financiers and new-school data scientists talk about Markov chains, unstructured data, reinforcement learning, LSTMs, autoencoders, and a lot of other very technical stuff. I remember pondering what the union people might think about the “capitalists” upstairs and whether I was perceived as one of them. Thoughts about class affiliation and blending in aside, I was enjoying the conference, especially my coffee-break conversations with the new breed of tech-savvy financiers. In one of the more accessible and less hypothetical presentations (none of the participants were interested in sharing trade secrets, just glimpses of their potentially profitable ML and AI models), a young data wizard doing quantitative risk management at a Dutch clearing bank spoke about a late-stage ML model for anomaly detection in undisclosed financial data. Because it was the first presentation of a production-ready ML model and not just a thing on the drawing board, the room was buzzing with excitement. During the Q&A the presenter was queried about the number of tests and the type and scope of data the model had been trained on. Asked why the team from the clearing bank had performed only a limited number of tests of the model, the presenter replied: “because it works. And I have deadlines too!”

Besides being amusingly frank, the data scientist’s response captures the combination of, and tension between, pragmatism and tireless scientific rigour that characterises contemporary quantitative, model-driven trading and investment management. Though some algorithms are immensely sophisticated engineering marvels, it is, at the end of the day, their ability to consistently make money that counts. With data scientists rapidly replacing economists in the back, middle and front offices of trading firms, hedge funds and banks, finance seems more and more to be turning into an applied data science industry. While finding an edge in markets is now partly a scientific endeavour, the end goal remains the same: to make money. It is the challenge of devising robust, sophisticated and profitable yet understandable and thus manageable ML algorithms for trading and investment purposes that I explore in my paper ‘The virtue of simplicity: On machine learning models in algorithmic trading’. More specifically, I engage with the development of such models from the quants’ perspective and analyse their reflections on how to deal with ML techniques, vast datasets, and the dynamism of financial markets in an industry that is in many ways impatient. Drawing on distributed cognition theory, I argue that ML techniques enhance financiers’ ability to take advantage of opportunities, but at the same time carry a degree of unavoidable complexity that developers and users need to find ways to make sense of, manage and control.
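
To give a concrete, if stylised, sense of why that complexity needs managing, consider the overfitting problem that makes simplicity attractive in the first place: a needlessly flexible model fits the noise in its training data and tends to degrade on unseen data. The sketch below is purely illustrative and is not drawn from the paper or from any actual trading model.

```python
# Toy illustration of overfitting, the problem behind the 'virtue of simplicity'.
# Data, model choice and polynomial degrees are assumptions for demonstration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-1, 1, 40)).reshape(-1, 1)   # small, noisy sample
y = np.sin(3 * x).ravel() + rng.normal(0, 0.3, 40)   # 'signal' plus noise
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (3, 15):  # a simple model versus a needlessly flexible one
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree:>2}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# The high-degree fit typically shows a lower training error but a higher test
# error: the extra complexity buys memorised noise rather than predictive power.
```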

The paper shows how ML quants attempt to manage the complexity of their algorithms by resorting to simplicity as a virtuous and pragmatic rule of thumb in model development and implementation. Quants consider simplicity a heuristic that helps them manage and control machine learning model complexity; they are particularly fond of the Ockham’s razor principle, which says that things should not be multiplied without necessity. The argument for having simplicity as a rule of thumb in ML modelling is to ensure comprehensibility, interpretability and explainability. It helps frame the modelling process by making it more foreseeable and controllable, which fortifies accountability. Rather than being able to account for every little detail of a learning algorithm, what quants perceive as having an understanding or “feel” for a model is a matter of grasping the algorithm’s basic logic and being capable of interpreting its output. The study contributes to research on the relationship between, and interaction of, humans and algorithms in finance and beyond.

The research that went into this paper was carried out as part of the ERC-funded interdisciplinary research project ‘AlgoFinance’, which explores algorithm and model use in financial markets. Combining ethnographic field studies with large-scale agent-based simulations of securities markets, we try to understand how algorithms, machine learning and non-machine learning alike, construct and shape the interaction dynamics of market actors trading with one another. The research team consists of social scientists Christian Borch (PI), Daniel Souleles, Bo Hee Min and myself, and, from the hard sciences side of things, Zachery David, Nicholas Skar-Gislinge and Pankaj Kumar. In addition to the sociological network perspective on the interaction of trading algorithms, we examine the ways organisations and individuals (traders, portfolio managers, quants, etc.) are affected by, adapt to and try to stay on top of technological advances in the field. One of the things we hope will come of our efforts is a better understanding of the social dynamics underpinning and embedded in the thoroughly quantified and increasingly automated world of securities trading. A big part of shedding light on this social dimension of algorithmic finance is exploring the socio-material assemblages of humans and algorithms, which is exactly what I do in my paper.

Monday, 18 May 2020

Big Data and Surveillance: Hype, Commercial Logics and New Intimate Spheres

Editorial
Big Data & Society 7(1), https://doi.org/10.1177/2053951720925853. First published May 14, 2020.


Guest lead editors: Prof. Kirstie Ball*, Prof. William Webster**

* Centre for Research into Information, Surveillance and Privacy, University of St Andrews, School of Management
** Centre for Research into Information, Surveillance and Privacy, University of Stirling, School of Management

When viewed through a surveillance studies lens, Big Data is instantly problematic. In comparison with its predecessors, and by virtue of its pre-emptive impulses and intimate data flows, Big Data creates a more penetrating gaze into consumers’ and service users’ lives. Because Big Data draws on data streams from social and online media, as well as from personal devices designed to share data, consumers have limited opportunities to opt out of data sharing and difficulty finding out what happens to their data once it is shared. In the Big Data era, consumers and service users exert comparatively less control over their personal information flows, and their mundane consumption activities become highly significant and subject to scrutiny. Their subjection to the social sorting which results from the classification of those data is comparatively intensified and commercialised. Those companies which are in a position to exploit the value created by Big Data Analytics (BDA) enjoy powerful market positions.

Consequently, greater attention needs to be paid to the corporate and other actors which bring surveillance practices like BDA into being. BDA as practiced predominantly takes place in organizational settings. Addressing the mid-range of BDA - the mesh of organizations which mediate between the end consumer, the organisational and societal context, and the marketer of products - reveals how the influence and power of BDA are far from a done deal. The commercial logics which drive BDA implementation are seated in promises of seamless improvements in operational efficiency and more accurate decision-making arising directly from the use of analytics. As a marketing practice, for example, BDA seeks to create value from an extensive array of new data-generating sources used by consumers. The aim is to produce new insight into consumer behaviours so that consumers can be better targeted by marketers in real time and their intentions predicted with a greater degree of accuracy. However, the realisation of this ‘value’ is highly contingent. Personnel management, technology infrastructure, organizational culture, skills, and management capability are all identified as crucial components that shape the value generated. The sheer socio-technical range and interdependency of these internal variables highlight the two issues with which this special themed issue of Big Data and Society is concerned.

The first concerns the power relations and political dynamics of BDA implementation. Adopting, enacting and complying with the demands of BDA strategies involves a rethinking of roles, relationships and identities on the part of those involved in the transformation. Significant pressure and hype have been brought to bear on non-technical organizational constituencies, such as marketers, who have been challenged by the implications of BDA and are required to reconcile their creative, qualitative approaches with an analytical world-view. Similarly, in a public service context, managers are increasingly required to base their policy and operational decisions on new information flows embedded in BDA. They are finding that these novel, technologically intensive processes conflict with long-established norms and organisational decision-making structures.

The second concerns how practices associated with BDA extend surveillance into the intimate sphere. The surveillance practices embedded in Big Data are typically seen as commercial processes and another facet of operational efficiency and value creation. Such data can be subtle, highly nuanced and very personal. It can be collected from the home and can include information gathered within intimate, domestic spaces. Ethical concerns are recognised by practitioners, although they are still couched within a value discourse - and a robust ethics committee can ‘allow’ and ‘oversee’ the collection of such data.

Big Data succeeds in extending the scope of surveillance by co-opting individuals into the de facto surveillance of their own private lives. Through the increasingly embedded role of online social networks and location-sensitive mobile devices in social activities, the boundaries between surveillance and the surveilled subject become blurred. Big Data breaks down boundaries between different sources of data, thus allowing the combination of information from different social domains. In democracies, with clearer legal protections of the line between public and private, Big Data extends existing surveillance technologies in its ability to co-opt the key economic actors - the corporations - and thus gain a window into private lives. Big Data practices also give powerful commercial corporations greater access to the machinery of government and public services: they are increasingly influential in policy-making and service delivery, and they gain greater access to data deriving from these organisational entities. Levels of ubiquity in data collection previously available only in tightly controlled political environments are therefore now available universally.

A brief guide to the special theme
This theme features six articles, all of which contextualise Big Data hype within, and at times counter to, business and organisational logics. They explore how BDA extends surveillance across more intimate boundaries, highlighting the emotional registers of consumers; home automation and household surveillance; and the surveillance and commercialisation of children via ‘Hello Barbie’. They also examine how Big Data practices are produced, reflecting the argument that the enactment of surveillant power using BDA is not a certainty but a negotiated organisational process. This theme addresses a gap in critical scholarship on Big Data, as it explores the links between Big Data, its organisational and commercial contexts and increasing levels of intimate surveillance. The articles illustrate how business and organisational practices shape and are shaped by BDA, and how the producers and consumers of Big Data are forging new intimate and intensive surveillance relationships. BDA is not as revolutionary as its more vocal advocates sometimes suggest. Its implementation and use are embedded within, and shaped by, powerful institutional norms and processes - and, seen in retrospect, the development of BDA is clearly an incremental, path-dependent process.

Thursday, 14 May 2020

Playing with machines: Using machine learning to understand automated copyright enforcement at scale

Joanne Gray, Nicolas Suzor
Big Data & Society 7(1), https://doi.org/10.1177/2053951720919963. First published April 28, 2020.
Keywords: machine learning, copyright enforcement, YouTube, content moderation, automated decision-making, Content ID


How can we understand how massive content moderation systems work? Major social media platforms use a combination of human and automated processes to efficiently evaluate the content that their users post against the rules of the platform and applicable laws. These sociotechnical systems are notoriously difficult to understand and hold to account.

In this study, we use digital methods to try to make the content moderation system on YouTube—a system that relies on both automated and discretionary decision-making and that is applied to varying types of video content—more legible for research.

Starting with a random sample of 76.7 million YouTube videos, we used the BERT language model to train a machine learning classifier to identify videos in categories that reflect ongoing controversies in copyright takedowns. Our four categories were full movies, gameplay, sports content, and ‘hacks’ (tutorials on copy control circumvention). We used this classifier to isolate and categorise a sample of approximately 13 million unique videos for further analysis.
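
As a rough illustration, the sketch below shows how a BERT-based classifier of this kind could be set up with the Hugging Face transformers library. The model name, label set and metadata fields are assumptions for demonstration rather than our actual pipeline, and the classification head would first need to be fine-tuned on hand-labelled examples.

```python
# Illustrative sketch only: a BERT-based classifier for sorting video metadata
# into content categories. Model name, labels and input fields are assumptions;
# in practice the classification head is fine-tuned on hand-labelled examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["full_movie", "gameplay", "sports", "hacks", "other"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)
model.eval()

def classify_video(title: str, description: str) -> str:
    """Predict a content category from a video's title and description."""
    inputs = tokenizer(f"{title} {description}", truncation=True,
                       max_length=256, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify_video("FULL MOVIE (2018) HD", "Watch the complete film online"))
```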

For each of these videos, we were able to identify which videos had been removed, what reasons YouTube gave to explain their removal and, in some cases, who had requested the removal. We sought to compare trends in copyright takedowns, Content ID blocks, and terms of service removals across different categories of content. Across our entire dataset, videos were most frequently removed from YouTube by users themselves, followed by removals due to an account termination and then Content ID blocks. DMCA takedowns were the least common removal type.
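
A minimal sketch of that comparison, assuming one row per video and hypothetical ‘category’ and ‘removal_reason’ columns (not our actual data structure), might look like this:

```python
# Hypothetical sketch: cross-tabulate removal reasons by content category and
# express them as within-category proportions. File and column names are assumptions.
import pandas as pd

videos = pd.read_csv("video_sample.csv")  # one row per video (hypothetical file)

counts = pd.crosstab(videos["category"], videos["removal_reason"])
rates = counts.div(counts.sum(axis=1), axis=0)  # share of each removal type per category

print(rates.round(3))
```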

One of the most important findings of this study is that we can see, at a large scale, the rates at which Content ID is used to remove content from YouTube. Previous large-scale studies have provided important insights into rates of DMCA removals but information about Content ID removals has remained imprecise, provided at a high level of abstraction by YouTube. In this paper, we provide the first large-scale systematic analysis of Content ID and Terms of Service removal rates, including comparisons with other removal types and across different categories of content.

Our analysis provides a comparison of different types of automation in content moderation, and of how these different systems play out across different categories of content. We found very high rates of removals for videos associated with film piracy, and this category had the highest rate of removals for terms of service violations and account terminations. By contrast, we found low rates of removals for gameplay. It appears from our data that game publishers are largely not enforcing their rights against gameplay streams, and that when a gameplay video is removed it is usually due to a claim by a music rightsholder.

For sports content, we found very high rates of removals for both live sporting streams and sports highlights, including high rates of terms of service violations. For this category, it appears that both copyright owners and YouTube are highly active in policing content. For the ‘hacks’ category, we found high rates of removals, but mostly for terms of service violations, indicating that YouTube, rather than rightsholders, is more commonly taking action to remove content and terminate accounts that provide DRM circumvention information.

Overall, we found that there is substantial discretionary decision-making within YouTube’s heavily automated content moderation system. We welcome the capacity for the use of discretion in automated systems, but it matters who is making the decisions and why. As nations pressure platforms to take a more active role in moderating content across the internet, we must continue to advance our methods for holding decision-makers to account.

This experimental methodology has left us optimistic about the potential for researchers to use machine learning classifiers to better understand systems of algorithmic governance in operation at a large scale and over a sustained period.

Wednesday, 25 March 2020

Video abstract: Beyond algorithmic reformism

Peter Polack introduces his paper "Beyond algorithmic reformism: Forward engineering the designs of algorithmic systems" in Big Data & Society 7(1), https://doi.org/10.1177/2053951720913064. First published: March 20, 2020.

Video abstract

Text abstract
This article develops a method for investigating the consequences of algorithmic systems according to the documents that specify their design constraints. As opposed to reverse engineering algorithms to identify how their logic operates, the article proposes to design or "forward engineer" algorithmic systems in order to theorize how their consequences are informed by design constraints: the specific problems, use cases, and presuppositions that they respond to. This demands a departure from algorithmic reformism, which responds to concerns about the consequences of algorithmic systems by proposing to make algorithms more transparent or less biased. Instead, by investigating algorithmic systems according to documents that specify their design constraints, we identify how the consequences of algorithms are presupposed by the problems that they propose to solve, the types of solutions that they enlist to solve these problems, and the systems of authority that these solutions depend on. To accomplish this, the article develops a methodological framework for researching the process of designing algorithmic systems. In doing so, it proposes to move beyond reforming the technical implementation details of algorithms in order to address the design problems and constraints that underlie them.

Keywords: Critical algorithm studies, predictive policing, design studies, algorithmic bias, algorithmic opacity, algorithmic accountability

Tuesday, 24 March 2020

Changes to Big Data and Society’s Review Policy due to COVID-19.


Hello to all readers, authors and reviewers. Given the challenges around COVID-19 facing many of us, it makes little sense to continue with our normal review process. So, beginning today, Big Data and Society is starting a review hiatus for the next four weeks.

This means that: (1) we will pause asking for any new reviews until April 19th; (2) for papers currently in review, we will extend the deadline for currently assigned referee reports by four weeks; (3) we will extend the submission deadline of papers in revision by four weeks; (4) we will still accept papers during this time but will not start the review process until April 19th; and (5) decisions on our recent call for special themes will also be pushed back until the end of April.

We recognize that these changes are toughest on authors (particularly those facing tenure and promotion decisions) and are prepared to address the specific circumstances they face. We will re-assess things shortly before April 19th and adjust plans accordingly. If you have questions, please contact Matthew Zook (Managing Editor), zook@uky.edu.

We wish everyone and their loved ones health and safety in the upcoming weeks. Stay well.

Editorial Team of the Big Data and Society journal


Monday, 23 March 2020

What’s the harm in categorisation? Reflections on the categorisation work of Tech 4 Good


by Kate Sim and Margie Cheesman, Oxford Internet Institute

The UN Special Rapporteur Tendayi Achiume will soon release a report for the UN Human Rights Council on the role of digital technologies in facilitating contemporary forms of discrimination and inequality [FN1]. The report will be informed by an expert workshop which took place at UCLA in October last year and brought together academics and activists at the forefront of analysing the social, political, and ethical implications of data-driven technologies. The workshop aimed to assess the extent to which digital technologies not only reproduce but uniquely exacerbate existing power structures. Two points identified by the workshop participants stood out: (1) the process of categorisation that underpins the design, implementation, and internal logics of digital technologies is a distinctive element of these systems, and (2) the speed and scale of technological categorisation work reproduce and compound social stratifications.

With this post, we reflect on the workshop discussions and seek to complement the forthcoming report by examining the role of categorisation, drawing on our respective field research on ‘Tech 4 Good’. Under the banner of ‘Tech 4 Good’, technology companies, policymakers, and civil society organizations come together in their desire to use data-driven technologies as efficient, apposite, and meaningful interventions into social concerns. But how exactly, and for whom, is technology ‘good’? Investigating the design and deployment of digital technologies can illuminate how these systems codify and stabilize power relations. Technological categorisation underpins the interpretive process through which affected individuals and society are codified. For example, our respective fieldwork on sexual misconduct reporting and humanitarian aid shows how the category-making work of data-driven systems maps onto and extends power structures. This is also evident in the contemporary vocabularies and frameworks for assessing the impact of digital technologies. We highlight how key stakeholders centre critiques around privacy-related harms, reducing the insidious problems of tracking, profiling and categorisation to a matter of individual choice rather than a systemic issue. In short, this post shows why digital categorisation practices - nowhere more so than in Tech 4 Good - demand greater public critique, accountability and reparative justice.

Notes from the field

Our respective work at the Oxford Internet Institute demonstrates how categorisation is a key part of marshalling digital technologies for ‘social good.’ For example, Sim’s research highlights how categorising violence is integral to the production and application of sexual misconduct reporting systems. Case management systems like Callisto, LiveSafe, and Vault Platform facilitate the creation of sexual misconduct reports that trigger institutional responses. The perception that sexual assault data is fixed and objective helps make these reporting tools seem accurate and trustworthy. However, gender assumptions are constructed and reinforced in the way that data is encoded in the system design. For example, users are asked to define their experience by selecting a category of violence from a prefigured list that includes categories like ‘sexual harassment’ and ‘verbal abuse.’ The implicit assumption here is that these categories are self-evident and well-defined. Yet analysing users’ experiences of searching for the ‘right’ category demonstrates how these reporting tools’ categories, as a constructed system of defining and sorting, fall short of capturing the range of users’ experiences of sexual misconduct. “The body,” as Bowker and Star (2000) write, “cannot be aligned with the classification system” (p.190).

The impact of these categories extends beyond moments of disconnect on the user’s end. The assumptions with which system vendors construct and guide these categories lend greater credibility to those users who can successfully complete the form. The system vendors appeal to data’s perceived objectivity and incorruptibility, which sociologist David Beer terms the ‘data imaginary’ (2017), to ascribe greater credibility to those users’ claims. Those who are able to conform to the system’s classificatory logic are rewarded as credible reporters, while those who cannot conform see their experiences marginalised.

Cheesman’s research in the aid industry demonstrates how categorisation is the basis of resource allocation. Aid is distributed according to metrics of refugee vulnerability, including bodily abilities, sexual habits, food consumption and credit scores. New data technologies used in humanitarian categorisation practices structure the conditions of recognition and support for refugees. Yet the same technologies are routinely depoliticised as neutral means of better targeting and delivering services: blockchain’s automated consensus algorithms are presented as facilitating ‘neutral’ information-sharing between aid organisations, for example, and biometric identity checks are understood as an efficient, anti-fraud advancement in humanitarian bureaucracy. Veracity is ascribed to both of these technologies: biological information is seen as accurate and non-contentious, and blockchain as constructing unbiased, incorruptible, real-time records.

As a result, these de-politicised approaches to humanitarian digitalisation stabilise categorisation practices as objective, eclipsing intersectional issues about category-making, discrimination and mobility control. Digital identification can adversely affect certain groups by fixing them to a unitary set of categories: infrastructures designed to provide legal identity and protection are used to denationalise and repatriate stateless people, as with the Rohingya in Myanmar and Bangladesh (Taylor & Mukiri-Smith 2019; Madianou 2019). Moreover, categories are not just political, but gendered and racialised - for example in ethnic groupings and identifications of ‘head of household’ in refugee camps. In all these cases, there is little capacity for refugee populations to resist, contest, or define their own subjectivities.

Critiques of digital humanitarianism often centre on individual technologies and individual privacy rather than on systemic issues of categorisation. But evidence of protest against biometrics or smartcards (Al Jazeera 2018), of privacy breaches (Cornish 2017) or of physically dangerous system avoidance (Latonero & Kift 2018: 7) should not be the only point at which computational logics are questioned. These logics normalise power relations, and maintain and extend structures of North-South domination by the aid industry. Paternalistic approaches are both produced and reinforced by the exclusion of refugees from humanitarian categorisation and decision-making.

What’s the harm in categorisation?

The speed, scale, and scope of technological categorisation not only reproduces social stratifications and injustices; it also makes those asymmetries less visible. Workshop attendees highlighted how the approach favoured by technology companies, journalists and policymakers, which primarily recognises the most tangible or immediate harms of digital technologies, overlooks and obfuscates the more insidious practices and impacts of categorisation. The focus on harm is rooted in current debates that individualise digital inequality in terms of personal choice and ethics. Violations of privacy continue to be the dominant metric through which stakeholders conceptualize the erosion of bodily and digital rights. Increasingly, the institutionalization of ethics as a strategy of self-regulation rather than public accountability illustrates the technology sector’s continuing hold over our collective language and imagined interventions (Metcalf, Moss & boyd 2019).

In response to these conceptual deficits, workshop attendees called for a reorientation of critical inquiry and intervention towards reparative justice. Beyond demanding more systematic mechanisms of transparency and accountability around digital categorisation practices, they asked: what are data subjects owed, who should be held responsible, and how? Proposed solutions to digital inequality include new models of ‘self-sovereign’ data ownership, control and monetisation for individuals (Wang & De Filippi 2020), as well as more collectivist ‘digital commons’ approaches (Prainsack 2019). However, if and how these could remedy asymmetries in categorisation power without producing exclusionary effects remains to be seen (ibid). As the UN 2020 report will also highlight, researchers, technology actors, activists, and policymakers must work in concert to develop more robust heuristics and frameworks for apprehending and addressing the structural nature of digital inequality.

----
[FN1] This report will be titled: 2020 Human Rights Council Report of the United Nations Special Rapporteur on Contemporary Forms of Racism, Racial Discrimination, Xenophobia and Related Intolerance.



Works Cited

Al Jazeera (2018) Violence stalks UN’s identity card scheme in Rohingya camps. Available at: https://www.aljazeera.com/news/2018/11/violence-stalks-identity-card-scheme-rohingya-camps-181122075307535.html

Beer, D. (2016) Metric Power. Palgrave Macmillan.

Beer, D. (2017) The data analytics industry and the promises of real-time knowing: Perpetuating and deploying a rationality of speed. Journal of Cultural Economy 10(1): 21–33. https://doi.org/10.1080/17530350.2016.1230771

Bowker, G.C. and Star, S.L. (2000) Sorting Things Out: Classification and Its Consequences. Cambridge, MA: MIT Press.

Cornish, L. (2017) New security concerns raised for RedRose digital payment systems. Devex. Available at: https://www.devex.com/news/new-security-concerns-raised-for-redrose-digital-payment-systems-91619

Latonero, M. and Kift, P. (2018) On digital passages and borders: Refugees and the new infrastructure for movement and control. Social Media + Society 4(1).

Madianou, M. (2019) Technocolonialism: Theorizing digital innovation and data practices in humanitarian response. Social Media + Society: 1–33. https://doi.org/10.1177/2056305119863146

Metcalf, J., Moss, E. and boyd, d. (2019) Owning ethics: Corporate logics, Silicon Valley, and the institutionalization of ethics. Social Research: An International Quarterly 86(2): 449–476.

Prainsack, B. (2019) Logged out: Ownership, exclusion and public value in the digital data and information commons. Big Data & Society 6(1): 1–15. https://doi.org/10.1177/2053951719829773

Taylor, L. and Mukiri-Smith, H. (2019) Global data justice: Framing the (mis)fit between statelessness and technology. European Network on Statelessness. Available at: https://www.statelessness.eu/blog/global-data-justice-framing-misfit-between-statelessness-and-technology

Wang, F. and De Filippi, P. (2020) Self-sovereign identity in a globalized world: Credentials-based identity systems as a driver for economic inclusion. Frontiers in Blockchain 2: 1–22. https://doi.org/10.3389/fbloc.2019.00028

Saturday, 7 March 2020

How biased is the sample? Reverse engineering the ranking algorithm of Facebook’s Graph application programming interface

Justin Chun-ting Ho
Big Data & Society 7(1), https://doi.org/10.1177/2053951720905874. First published February 17, 2020.
Keywords: Bias detection, data mining, Facebook pages, application programming interface, social media research

In November 2017, Facebook introduced a new limitation on the maximum number of page posts retrievable through its Graph application programming interface (API). However, there is limited documentation on how these posts are selected (Facebook 2017). In this article, I assess the bias caused by the new limitation by comparing two datasets from the same Facebook page: a full dataset obtained before the introduction of the limitation and a partial dataset obtained after it. To establish generalisability, I also replicate the findings with data from another Facebook page.

This paper demonstrates that posts with high user engagement are over-represented, as are Photo and Video posts, while Link posts are under-represented. Top-term analysis reveals significant differences in the most prominent terms between the full and partial datasets. The paper also reverse-engineers the new API’s ranking algorithm to identify the features of a post that affect its odds of being selected by the new API. The estimated model posits that post type, Likes, Angry reactions, Shares, and Likes on Comments are significant predictors. Sentiment analysis reveals significant differences in sentiment word usage between the selected and non-selected posts.
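
As a rough sketch of what this reverse-engineering looks like in practice, the snippet below fits a logistic regression on a hypothetical merged table in which each post from the full dataset is flagged as selected or not by the new API; the file and column names are illustrative rather than the paper’s actual data.

```python
# Illustrative sketch of reverse-engineering the API's selection behaviour with a
# logistic regression. File and column names are assumptions; 'selected' marks
# whether a post from the full dataset also appears in the partial dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

posts = pd.read_csv("page_posts_merged.csv")  # hypothetical merged dataset

model = smf.logit(
    "selected ~ C(post_type) + likes + angry + shares + comment_likes",
    data=posts,
).fit()

print(model.summary())       # coefficients and significance
print(np.exp(model.params))  # odds ratios: each feature's effect on the odds of selection
```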

These findings have significant implications for research that use Facebook page data collected after the introduction of the limitation:
  • The under-representation of Link posts means that a significant amount of link-sharing activity becomes invisible through the API.
  • There is no evidence to support the common expectation that the API ranks posts based on the number of Likes and Comments. While the selected posts tend to have more Likes and Comments, other features also affect the odds of selection.
  • It is questionable to assume that the new API returns all the posts with the highest user engagement. Even though the selected posts have higher user engagement on average, some highly commented and liked posts might not be selected due to the effect of other features.
  • Posts of certain linguistic styles can be filtered out, as the new API tends to return posts with more emotional text.
  • Non-random factors might influence the representation of the most prominent terms in the selected posts, which could lead to bias in text models.
However, it is important to note that the data retrieved from the Graph API is still a useful resource that enables a wide range of research methods. We should not stop using the data because of the above-mentioned issues. Echoing Rieder et al. (2015), the potential bias calls for caution, prudence, and critical attention when using and interpreting the data. Uncovering the bias of the ranking algorithm will help researchers to better support their research results.

References:

Facebook (2017) /page-id/feed. Available at: https://developers.facebook.com/docs/graph-api/reference/v2.11/page/feed (accessed 31 January 2020).

Rieder B, Abdulla R, Poell T, et al. (2015) Data critique and analytical opportunities for very large Facebook pages: Lessons learned from exploring “We are all Khaled Said”. Big Data & Society 2(2): 1–22. https://doi.org/10.1177/2053951715614980