Sunday, 28 July 2019

The expansion of the health data ecosystem – Rethinking data ethics and governance

Special Theme Issue
Guest Editors: Tamar Sharon* and Federica Lucivero**

* Interdisciplinary Hub for Security, Privacy and Data Governance, Radboud University, NL
** Ethox and Wellcome Centre for Ethics and Humanities, University of Oxford, UK

As in other domains, digital data are taking on an ever more central role in health and medicine today. And as it has in other domains, datafication is contributing to a re-configuration of health and medicine, prompting its expansion to include new spaces, new practices, new techniques and new actors. Indeed, possibilities to quantify areas of life that have not traditionally been considered the remit of biomedicine – such as a person’s consumption patterns, her social media activity or her dietary habits – have contributed to a redefinition of almost any data as health-related data (Lucivero and Prainsack, 2015; Weber et al., 2014). Increasingly, these data are being generated outside the traditional spaces of medicine, as people go about their daily lives interacting with consumer mobile devices. Similarly, the technological tools needed to capture, store, analyze and manage the flow of these data, from wearables and smart phones to cloud platforms and machine learning, increasingly rely on infrastructure and know-how that lie beyond the scope of traditional medical systems and scientists, amongst data scientists and ICT specialists. Moreover, new stakeholders are cropping up in these quasi-medical yet still undomesticated territories. On one end of the spectrum, individuals who generate health data as they track and monitor their health are both solicited as research participants and are making demands on researchers to utilize their personal health data (HDE, 2014). On the other end of the spectrum, consumer technology corporations such as Apple and Google are reinventing themselves as obligatory passage points for data-intensive precision medicine (Sharon, 2016). And somewhere in between, not-for-profit organizations, such as Sage Bionetworks and, are positioning themselves as mediators in this ecosystem in formation, between the medical research community, individual and collective generators of data, and technology developers.

As proponents uphold, this expansion and decentralization of the health data ecosystem is promising: it may advance data-driven research and healthcare, and it may render research more inclusive (Shen, 2015; Topol, 2015). But, as critical scholars of science and technology have consistently shown, a fuller grasp of our technological present must always include the far-reaching, unexpected and sometimes deleterious social, political and cultural effects of discourses of scientific progress and technologically-enabled democratization and participation. In recent years, such critical scholarship has been particularly wary of the new power asymmetries that datafication contributes to. Rather than levelling power relations, critics observe, these are being redrawn along new digital divides based on data ownership or access, control over digital infrastructures, and new types of computational expertise, where those who generate data, especially citizens, patients, and consumers, are positioned on the losing side of the on-going extraction and scramble for the world’s data driven by state and corporate actors (Andrejevic, 2014; boyd and Crawford, 2012; Taylor, 2017; Zuboff, 2015).

In the context of the data economy, the dominant response to these growing power differentials has been to ensure that individual data subjects acquire more control over the data they produce – what Prainsack calls the “Individual Control” approach in her contribution to this special theme. Examples include the EU’s General Data Protection Regulation or initiatives that allow individuals to monetize their personal data (Lanier, 2013). In the context of data-driven medicine, this emphasis on increasing individual control over data has translated into attempts to develop better anonymization techniques and more fine-grained informed consent (Kaye et al., 2015), as well as the configuration of patients as the rightful “owners” of their own medical data (Kish and Topol, 2015).

However, scholars from different disciplines have begun questioning whether enhancing individual control over data is the most effective or desirable means of addressing the new power differentials of digital society. While some scholars emphasize the relational and social nature of persons and data (Taylor, 2012), others question the legal feasibility of individual ownership of data (Evans, 2016), and others highlight the futility of monetization schemes as a means of redressing inequalities (Casilli, 2019). Most importantly, the emphasis on individual rights and values may result in a reframing of societal concerns as individual ones all the while undermining the political power of collectives.

Each of the contributions that make up this special theme addresses the reconfiguration of existing relationships and the emergence of new power differentials that result from the expansion of the health data ecosystem. While they do this from different disciplinary perspectives, they all share the same starting point: the understanding that increased individual control of data subjects is insufficient for anticipating the far-reaching risks and preventing the societal, if not individual, harms associated with this expansion. In light of this, they argue for new governance frameworks, technological infrastructures and narratives that are predicated on the shared responsibility of multiple stakeholders and collective decision-making and control.

The commentaries by Brian Bot, Lara Mangravite and John Wilbanks, and by Bart Jacobs and Jean Popma both discuss the types of technical methods and arrangements that need to be developed to enable secure, responsible and equitable data sharing in the context of decentralized medical research. Both groups of authors are involved in the design and implementation of novel data management infrastructures.

The workings of data management infrastructures are explored further in a filmed interview with José van Dijck about her recent book, co-authored with Thomas Poell and Martijn de Waal, The Platform Society: Public Values in a Connective World (2018). Van Dijck and Sharon discuss the importance of grasping how the material functioning of internet platforms contributes to shaping a new political and social reality.

In their commentary, Alessandro Blasimme, Effy Vayena and Ine Van Hoyweghen scrutinize how the proliferation of citizen generation of medical data, in initiatives like the American “All of Us” program, is unsettling the position of a less commonly studied stakeholder: the private insurance sector. Such initiatives, they argue, create a new “information asymmetry” between private insurers and those of their policy-holders who enroll in such research, which will likely make people more reluctant to donate personal health data for precision medicine research.

Tuukka Lehtiniemi and Minna Ruckenstein focus on data activism as a means of challenging the power asymmetries of datafied societies. Based on their engagement as social scientists with MyData, a data activism initiative originating in Finland, they identify and disentangle two parallel social imaginaries, a “technological” and a “socio-cultural imaginary”. They discuss the benefits and disadvantages of each and call for a greater role for the latter, while acknowledging its weaknesses.

The contributions by Barbara Prainsack and Linnet Taylor & Nadezhda Purtova both address the limitations of the framework of the commons – today’s preferred site of theoretical and practical resistance for those scholars and activists seeking to counter digital power asymmetries by foregrounding collective, rather than individual, control over data. While Prainsack argues that a more systematic discussion of processes of inclusion and exclusion in commons is required, Taylor and Purtova call for more attention to which stakeholders are affected by data practices. Both agree that in light of the multiple nature of data, the original commons framework cannot be easily transposed from physical to data commons.

In her article, Tamar Sharon calls for a closer examination of the different conceptualizations of the common good that are at work in one specific area of the expanding health data ecosystem, what she calls the “Googlization of health research”, or the recent entrance of large consumer tech corporations into the medical domain. Using the framework of justification analysis (Boltanski and Thévenot, 2006), she identifies a plurality of conceptualizations of the common good that different actors mobilize to justify collaborating within these new multi-stakeholder research projects.

We hope that this special theme offers a productive – albeit far from comprehensive – overview of arguments for and examples of infrastructure, governance and ethics that are collective-centric in addressing the challenges posed by the datafication and expansion of the health ecosystem.


Andrejevic M (2014) The Big Data Divide. International Journal of Communication 8: 1673–89.

Boltanski L and Thévenot L (2006) On Justification: Economies of Worth. Princeton: Princeton University Press.

boyd d and Crawford K (2012) Critical Questions for Big Data. Information Communication & Society 15(5): 662–79.

Casilli A (2019) En Attendant les Robots: Enquête sur le Travail du Clic. Paris: Seuil.

Evans B (2016) Barbarians at the Gate: Consumer-Driven Health Data Commons and the Transformation of Citizen Science. American Journal of Law & Medicine (4): 1–34.

Health Data Exploration Project (HDE) (2014) Personal Data for the Public Good: New Opportunities to Enrich Understanding of Individual and Population Health. Calit2, UC Irvine and UC San Diego.

Kaye J, Whitley E, Lund D, Morrison M, Teare H and Melham K (2015) Dynamic consent: a patient interface for twenty-first century research networks. European Journal of Human Genetics 23(2): 141–146.

Kish L and Topol E (2015) Unpatients – why patients should own their medical data. Nature Biotechnology 33(9): 921–924.

Lanier J (2013) Who Owns the Future? London: Penguin Books.

Lucivero F and Prainsack B (2015) The lifestylisation of healthcare? “Consumer genomics” and mobile health as technologies for healthy lifestyle. Applied and Translational Genomics 4.

Sharon T (2016) The Googlization of health research: from disruptive innovation to disruptive ethics. Personalized Medicine. DOI: 10.2217/pme-2016-0057.

Shen H (2015) Smartphones set to boost large-scale health studies. Nature. DOI: 10.1038/nature.2015.17083.

Taylor L (2017) What is data justice? Big Data & Society 4(2): 1–14.

Taylor M (2012) Genetic Data and the Law: A Critical Perspective on Privacy Protection. Cambridge: Cambridge University Press.

Topol E (2015) The Patient Will See You Now: The Future of Medicine Is in Your Hands. New York: Basic Books.

van Dijck J, Poell T and de Waal M (2018) The Platform Society: Public Values in a Connective World. Oxford: Oxford University Press.

Weber GM, Mandl KD and Kohane IS (2014) Finding the missing link for big biomedical data. JAMA 311(24): 2479–2480. DOI: 10.1001/jama.2014.4228.

Zuboff S (2015) Big other: surveillance capitalism and the prospects of an information civilization. Journal of Information Technology 30(1): 75–89.

Monday, 8 July 2019

Summer break

The Big Data and Society Editorial Team will be on summer break from July 15th until August 15th. Please accept delays in processing and reviewing your submission during that time.

Many thanks for your understanding.

Friday, 14 June 2019

Call for Special Theme Proposals for Big Data & Society

The SAGE open access journal Big Data & Society (BD&S) is soliciting proposals for a Special Theme to be published in late 2020 or early 2021. BD&S is a peer-reviewed, interdisciplinary, scholarly journal that publishes research about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business and government relations, expertise, methods, concepts and knowledge. BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practices that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government and crowd-sourced data. Critically, rather than settling on a definition the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes.

Special Themes can consist of a combination of Original Research Articles (8000 words; maximum 6), Commentaries (3000 words; maximum 4) and one Editorial (3000 words). Article Processing Charges will be waived for all Special Theme content. All submissions will go through the Journal’s standard peer review process.

Past special themes for the journal have included: Knowledge Production, Algorithms in Culture, Data Associations in Global Law and Policy, The Cloud, the Crowd, and the City, Veillance and Transparency, Environmental Data, Spatial Big Data, Critical Data Studies, Social Media & Society, Assumptions of Sociality, Health Data Ecosystems and Data & Agency. See the Journal’s website to access these special themes.

Format of Special Theme Proposals
Researchers interested in proposing a Special Theme should submit an outline with the following information.

- An overview of the proposed theme, how it relates to existing research and the aims and scope of the Journal, and the ways it seeks to expand critical scholarly research on Big Data.

- A list of titles, abstracts, authors and brief biographies. For each, the type of submission (ORA, Commentary) should also be indicated. If the proposal is the result of a workshop or conference that should also be indicated.

- Short Bios of the Guest Editors including affiliations and previous work in the field of Big Data studies. Links to homepages, Google Scholar profiles or CVs are welcome, although we don’t require CV submissions.

- A proposed timing for submission to Manuscript Central. This should be in line with the timeline outlined below.

Information on the types of submissions published by the Journal and other guidelines is available on the Journal’s website.

Timeline for Proposals
Please submit proposals by September 1, 2019 to the Managing Editor of the Journal, Prof. Matthew Zook. The Editorial Team of BD&S will review proposals and make a decision by November 2019. Manuscripts would be submitted to the journal (via Manuscript Central) by or before March 2020. For further information or to discuss potential themes, please contact Matthew Zook.

Saturday, 25 May 2019

Video abstract: Experiments with a data-public

Anders Koed Madsen and Anders Kristian Munk discuss their paper "Experiments with a data-public: Moving digital methods into critical proximity with political practice" in Big Data & Society 6(1), First Published February 15, 2019.

Video Abstract

Text Abstract
Making publics visible through digital traces has recently generated interest among practitioners of public engagement and scholars within the field of digital methods. This paper presents an experiment in moving such methods into critical proximity with political practice and discusses how digital visualizations of topical debates become appropriated by actors and hardwired into existing ecologies of publics and politics. Through an experiment in rendering a specific data-public visible, it shows how the interplay between diverse conceptions of the public, as well as the specific platforms and data invoked, resulted in a situated affordance-space that allowed specific renderings to take shape, while disadvantaging others. Furthermore, it argues that several accepted tropes in the literatures of digital methods ended up being problematic guidelines in this space. Among these is the prescription to show heterogeneity by pushing back at established media logics.

Keywords: Digital methods, public engagement, pragmatism, controversy-mapping, critical proximity, multiplicity

Thursday, 23 May 2019

Video Abstract: Big data and quality data for fake news and misinformation detection

Fatemeh Torabi Asr and Maite Taboada
Big Data & Society 6(1), First published May 23, 2019.

Fake news is a problem. It is a big data problem. We are trying to solve it with small amounts of data. Those are, in a nutshell, the three main points of our paper. We review available datasets and introduce the MisInfoText repository as a contribution of our lab to the community. We make available the full text of the news articles, together with veracity labels previously assigned based on manual assessment of the articles’ truth content by fact-checkers. We also perform a topic modeling experiment to elaborate on the gaps and sources of imbalance in currently available datasets to guide future efforts. We appeal to the community to collect more data and to make it available for research purposes.

Video Abstract

This video was taken during Innovations in Research, an event at Simon Fraser University in Vancouver, as part of the Community Summit “Confronting the Disinformation Age”.

Credit: Simon Fraser University.

Keywords: Fake news, misinformation, labelled datasets, text classification, machine learning, topic modelling

Thursday, 9 May 2019

Are we outsourcing the curation of history to Facebook?

Carl J Öhman, David Watson

Online death has recently become a hot topic both in academia and the popular press. However, much of the current debate around the phenomenon has focused on its significance for individual users. For example, most discussions in media pertain to planning one’s own digital estate and/or how best to cope with the digital remains of a loved one. (This has been reaffirmed by the innumerable interview questions we have received on these topics since the recent publication of our article.) In view of this, we wanted to bring in a more societal perspective and ask what people dying on the internet means for us on a collective level. We also wanted to highlight the fact that death on social media is not just a Western, high-tech phenomenon. The so-called “digital afterlife” is often associated with futuristic scenarios and AI, but the reality is that people all around the world pass away every day, leaving behind enormous volumes of data. Many of them are using no more sophisticated technologies than smart phones and social media apps.

Against this background, we collected data from the UN and Facebook’s audience insights feature, from which we built a model that projects the future accumulation of profiles belonging to deceased Facebook users. Our analysis suggests that a minimum of 1.4 billion users will pass away between now and 2100 if Facebook ceases to attract new users as of 2018. If the network continues expanding at current rates, however, this number will exceed 4.9 billion. In both cases, a majority of the profiles will belong to non-Western users. In the former scenario, we find that the dead may outnumber the living on the network as soon as 2070.

In discussing our findings, we draw on the emerging scholarship on digital preservation and stress the challenges arising from curating the profiles of the deceased. We argue that an exclusively commercial approach to data preservation poses important ethical and political risks that demand urgent consideration. We want to be careful to state that our paper is not a critique of Facebook’s current policies on this matter. In fact, we would argue they’re actually doing a pretty good job, all things considered. We doubt that user death was high on Mark Zuckerberg’s list of priorities when he created the network, yet Facebook has devoted considerable resources to handling these sensitive matters in recent years. Hence, we would like to direct attention not so much to Facebook itself, but to the question of how we as a society, as a civilization, should go about dealing with the fact that Facebook will host billions of records of deceased users. Eventually these profiles will lose their commercial value – then what? Can we expect Facebook to keep hosting the data? Will it simply be deleted? Sold off? We need to build the proper institutions and infrastructure to tackle these questions now, because in only a few decades, these challenges will already be at our doorstep.

In particular, we wish to draw attention to the political aspect of our work. In George Orwell’s 1984, the past is controlled exclusively by the Party. They own all historical records and are not above modifying them to serve their own interests. The Party can do this because they have a monopoly on historical data. Although extreme, this scenario illustrates the risks involved in concentrating power over the past among a limited set of actors. And to some extent, that is exactly what we do today … only in our case, it is not a state or a party that controls that data, but a small number of tech empires. In pre-digital society, data about significant historical events and persons have generally been distributed across numerous institutions (national archives, museums, etc). Now, as political and social movements are largely mediated by online platforms, the narrative is increasingly owned by just a handful of firms. Today’s Martin Luther Kings, Winston Churchills, and Napoleons all probably use social media. Their lives and deeds are recorded in timeline posts and tweets. As researchers, we don’t want to be alarmist about this, but we do argue that there is good reason to be cautious about how we proceed. What kind of digital society do we want to build?

Wednesday, 1 May 2019

Data Politics: The Birth of Sensory Power

by Engin Isin and Evelyn Ruppert

Didier Bigo, Engin Isin, and Evelyn Ruppert recently published an edited collection, Data Politics: Worlds, Subjects, Rights (2019, Routledge). Building on a commentary first published in Big Data & Society, the book explores how data has acquired the capacity to reconfigure relations between states, subjects, and citizens. Fourteen chapters consider how data and politics are now inseparable as data is not only shaping social relations, preferences and life chances but our very democracies. Concerned with the things (infrastructures of servers, devices and cables) and language (code, programming, and algorithms) that make up cyberspace, the book argues that understanding the conditions of possibility of data is necessary in order to intervene in and shape data politics.

We concluded our chapter entitled ‘Data’s empire: postcolonial data politics’ with the suggestion that Michel Foucault’s trilogy of regimes of power ‒ sovereign, disciplinary, and regulatory ‒ is now joined by a fourth regime in the history of the present. We note that Foucault did not understand these regimes of power as supplanting but as augmenting each other. That’s why he designated rather broad and shifting historical periods to identify their origins or birth: sovereign power roughly in the 16th and 18th centuries, disciplinary power in the 17th to 18th centuries, and regulatory power (or biopower) in the 19th century.

The birth of regulatory power is of greatest interest to us as it relates to the development of knowledge about the species-body through the statistical sciences. Ian Hacking more precisely identified the 1820s and 1840s as the period when the idea of population was invented and statistical sciences were born as a regime of knowledge-power that regulated the relationship between the species-body (population) and individual body. Foucault broadly called this ‘biopolitics’ and inspired an important body of thought and work. His influence on the specific development of the history of statistics has been crucial and we have learned much from a pioneering body of subsequent scholarship.

Our starting point for the volume and the chapter is the need to place recent developments in data politics in relation to Foucault’s trilogy of regimes of knowledge-power. Gilles Deleuze already gestured toward this in his much-discussed ‘Postscript on the Societies of Control’ (1990) but it remained a suggestive if not early proposition.

We argue that to develop that proposition requires understanding the emergence of new data gathering, mining and analytic technologies. From web platforms, mobile phones, sensors, drones, satellites and wearables to devices that make up the Internet of Things, digital technologies and the data they generate are connected to the emergence of new regimes of knowledge-power especially during the last forty years. We provide a preliminary version of this proposition and conclude that perhaps the period between the 1980s and 2020s constitutes the birth of a new knowledge-power regime. We state that although we are confident about our claim, we are yet unable to name this regime.

With work we have done since writing the chapter we are now tempted to name the new knowledge-power regime as the birth of sensory power. The reasons for this are given in the chapter. We know this is an ambitious claim that will require further work on our part. But we hope it will also inspire readers to respond to both the chapter and our subsequent proposition that sensory power is a fourth regime in the history of the present.