Tuesday, 15 November 2022

Learning accountable governance: Challenges and perspectives for data-intensive health research networks

by Sam Muller

Muller, S. H. A., Mostert, M., van Delden, J. J. M., Schillemans, T., & van Thiel, G. J. M. W. (2022). Learning accountable governance: Challenges and perspectives for data-intensive health research networks. Big Data & Society, 9(2). https://doi.org/10.1177/20539517221136078

In our article, we address the accountability of large-scale health data research. Accountability is crucial to ensure democratic control and to steer health data research to contribute public value. Yet whereas previous research about health data paid much attention to accountability as a norm for doing and organising health data research, it did not specify what accountability processes should look like in practice. Specifically, previous research did not take into account that much health data research takes place in international networks, in which public and private organisations collaborate internationally and in a relatively horizontal way.
In our analysis of the current state of accountability, we found that governing such networks to foster accountability faces several challenges. Because health data research takes place in complex networks, it is hard to establish clear and stable accountability relationships. Moreover, smooth cooperation is hampered by a lack of clarity about the norms and principles that could guide accountability processes. Lastly, information provision and debate are not yet designed effectively.
To address the shortcomings of current accountability in health data research networks, we propose focusing on accountability as a means of learning from insights and feedback about how good governance can be achieved. We suggest two pathways for pursuing learning accountability. First, an integrated governance structure that allows learning to occur needs to be developed. Provisional goals need to be established by building on overlapping consensus. This is crucial to develop mutual understanding, shared motivation and common commitment between organisations engaged in health data research. Second, ongoing deliberation and open communication about collaboration are required for reflexive dialogue. Stakeholders (publics and communities affected by health data research) should be represented and enabled to participate. Empowering them in the form of a collective forum enables learning from their experiences and holding health data research to account.

Thursday, 20 October 2022

Jill Rettberg introduces a new paper on "Algorithmic failure as a humanities methodology: Machine learning's mispredictions identify rich cases for qualitative analysis"

Jill Rettberg introduces a new paper on "Algorithmic failure as a humanities methodology: Machine learning's mispredictions identify rich cases for qualitative analysis", out in Big Data & Society, doi:10.1177/20539517221131290. First published October 18, 2022.



This commentary tests a methodology proposed by Munk et al. (2022) for using failed predictions in machine learning as a method to identify ambiguous and rich cases for qualitative analysis. Using a dataset describing actions performed by fictional characters interacting with machine vision technologies in 500 artworks, movies, novels and videogames, I trained a simple machine learning algorithm (using the kNN algorithm in R) to predict whether an action was active or passive using only information about the fictional characters. Predictable actions were generally unemotional and unambiguous activities where machine vision technologies were treated as simple tools. Unpredictable actions, that is, actions that the algorithm could not correctly predict, were more ambivalent and emotionally loaded, with more complex power relationships between characters and technologies. The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis. This test goes beyond simply replicating Munk et al.'s results by demonstrating that the method can be applied to a broader humanities domain, and that it does not require complex neural networks but can also work with a simpler machine learning algorithm. Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative. To support this, the R code required to produce the results is included so the test can be replicated. The code can also be reused or adapted to test the method on other datasets.


Keywords: machine vision, machine learning, qualitative methodology, machine anthropology, digital humanities, algorithmic failure
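
To make the idea in the abstract above concrete, here is a minimal sketch of how mispredictions from a k-nearest-neighbours classifier could be collected as candidates for close reading. It is not the R code that accompanies the paper: it is written in Python, the function names (`knn_predict`, `mispredicted_cases`) and the toy data are hypothetical, and the leave-one-out evaluation is an assumption, since the abstract does not say how predictions were evaluated.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to `query`.

    `train` is a list of (features, label) pairs; distance is squared Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def mispredicted_cases(rows, k=3):
    """Leave-one-out: predict each row from all the others and
    return the indices of rows whose label was predicted wrongly."""
    failures = []
    for i, (features, label) in enumerate(rows):
        others = rows[:i] + rows[i + 1:]
        if knn_predict(others, features, k) != label:
            failures.append(i)
    return failures


# Hypothetical toy data: two numeric character traits per action,
# labelled "active" or "passive". Not taken from the paper's dataset.
rows = [
    ((0.0, 0.0), "passive"),
    ((0.1, 0.2), "passive"),
    ((0.2, 0.1), "passive"),
    ((1.0, 1.0), "active"),
    ((0.9, 1.1), "active"),
    ((1.1, 0.9), "active"),
    ((0.15, 0.15), "active"),  # an "active" case sitting among passive ones
]

# Indices of the cases the classifier gets wrong: the ones worth reading closely.
print(mispredicted_cases(rows))
```

In this sketch the classifier's accuracy is beside the point. The list of failures is the output of interest, because those are the cases a researcher would pull out for qualitative analysis.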

Monday, 26 September 2022

Comparing algorithmic platforms through differences

Tseng, Y.-S. (2022). Algorithmic empowerment: A comparative ethnography of two open-source algorithmic platforms – Decide Madrid and vTaiwan. Big Data & Society, 9(2). https://doi.org/10.1177/20539517221123505

What can critical algorithms/data studies learn from comparative urbanism? 


In my latest paper, I repurpose the idea of comparative urbanism to offer an alternative epistemology of algorithmic decision-making based on two open-source platforms in Taipei and Madrid. In the book Ordinary Cities, Robinson (2006) develops a cosmopolitan approach for comparing cities across axes of ‘difference’ within/beyond the dichotomy of the Global North/South in order to destabilise the conventionally Anglo-Saxon understanding of urban modernisation and world cities and the Northern reduction of the cultural and social richness of the Southern cities to the improvised counterparts. For Robinson, the ‘incommensurable difference’ (Robinson, 2006, p. 41) between the Global South and North, between the wealthy and poor, ‘needs to be viewed less as a problem to be avoided and more as a productive means for conceptualising contemporary urbanism’ (McFarlane and Robinson, 2010, p. 767; Robinson, 2011). Since then, there has been much discussion on how critical scholars can retheorise the urban by learning from cities in the Global South and East (Robinson, 2011; 2015; McFarlane, 2011).  


Just as urban scholars have charted out grounds for comparing cities that are deemed different in terms of socio-economics and politics (see Robinson, 2015, 2022), critical algorithm/data studies has begun to identify the need to go beyond the Northern concept of data universalism as well as the dichotomy of Global North/South (Milan & Treré, 2019). For Milan & Treré (2019, p. 319), there is a pressing need to undo the ‘cognitive injustice that fails to recognise non-mainstream ways of knowing the world’ in data universalism. Against this backdrop, this paper demonstrates how comparative urbanism can provide a new epistemology and theorisation of otherwise unknown algorithmic decision-making for citizen empowerment through the differences and similarities of the vTaiwan and Decide Madrid platforms.

Urban comparison offers two methodological tactics to bring the two algorithmic platforms – vTaiwan and Decide Madrid – into a dialogue with the current conceptualisation of algorithmic devices and platforms in human geography and critical algorithmic studies.


Firstly, the genetic comparative tactic allows us to start not with the technicality of the two platforms but by tracing their shared trajectories. Both platforms were developed by activists and civic hackers involved in or inspired by the Occupy Movements in Madrid (the 15M) and Taipei (the Sunflower Movement) and were situated within wider on-going democratisations in Spain and Taiwan. It is through the genetic trajectories that the comparative study showcases how algorithmic platforms can be imbued with democratic claims and promises by politicians, activists and civic hackers. Crucially, situating algorithmic platforms in their shared trajectories of urban social movements carves out an important conceptual position which does not assume algorithmic platforms to be fundamentally anti-democratic but allows each to develop its own pathway towards becoming more (or less) democratic.


Secondly, the generative comparative tactic conceptualises how algorithmic systems empower citizens across different geolocations, governmental institutions and algorithm-human relationships. The notion of algorithmic empowerment is generated by examining how the two algorithmic systems actualise democratic possibilities and practices across different sets of algorithmic orderings and human actions. Such algorithmic differences, no matter how subtle, matter for opening up or closing down political possibilities (Amoore, 2013; 2019; 2020), in this case, for empowering citizens in political decisions.  


Looking beyond this paper, comparative urbanism has much more to offer critical data/algorithm studies. Considering the cultural richness and diversity of the highly digitalised urban societies in the Global East, comparative studies of algorithmic platforms or devices through differences are urgently needed to broaden (or provincialize) the Northern theorisation of algorithms and data practices. As digital geographers have already noted, ‘there is a pressing need to destabilize the dominance of the Global North as a universal placeholder and de facto field site for geographical research about the digital’ (Ash et al., 2018, p. 37).

BD&S Webinar on Decolonizing Data Universalism, September 29th 2022 9:00-11:00 (GMT+1)

Come join the BD&S Webinar on Decolonizing Data Universalism, September 29th 2022, 9:00-11:00 (GMT+1).

Monday, 19 September 2022

BD&S Webinar on Big Data and the Environment, September 22nd 2022 15:30-17:00 (GMT+1)

Come join the BD&S Webinar on Big Data and the Environment, September 22nd 2022, 15:30-17:00 (GMT+1).

Wednesday, 7 September 2022

A citizens’ jury in Singapore to explore public views about sharing precision medicine data with private industry

Read the article: Ballantyne A, Lysaght T, Toh HJ, et al. Sharing precision medicine data with private industry: Outcomes of a citizens’ jury in Singapore. Big Data & Society. January 2022. doi:10.1177/20539517221108988.

Precision medicine and data sharing 

Precision medicine (PM) is an emerging field that uses biological (including genomic), medical, behavioural and environmental information about a population to provide more specific and tailored healthcare (sometimes called personalised medicine). Precision medicine is expected to deliver significant gains in health treatments and outcomes and be a powerful tool for doctors and their patients. PM research requires large amounts of genomic and clinical data from diverse populations in order to make robust and reliable observations about health and disease. Broad public participation is essential to the generation of these large datasets. Community buy-in will depend on widespread trust and social license, which means that data sharing practices must broadly align with public expectations. Most PM programs will find it impracticable to ask for individuals’ consent each time data is used or shared. While studies suggest broad support for sharing PM data with researchers at publicly funded institutions, there is reluctance to share health information with private industry for research and development. As the private sector is likely to play an important role in generating public benefits from PM initiatives, it is important to understand what the concerns are and how they might be mitigated.

What is social license?

It is important to understand the public’s expectations, goals and concerns with respect to large population-level health datasets. Social license refers to public acceptance of a particular activity or practice; it is not an explicit consent or legal permit, but instead a type of implicit social permission that is based on community approval and trust. Prior research has shown that people are often unaware of the extent to which their health and other administrative data is shared, linked and used, or of the governance and regulatory measures that are put in place to facilitate sharing (Ballantyne, 2018; Ipsos MORI, 2016). Data sharing with diverse parties can be challenging because it requires balancing different, and sometimes competing, ethical considerations, including privacy, individual control and consent, trust, accountability and public benefit. It is therefore important to try to understand the limits of the social license in specific communities for certain data sharing activities. Studies conducted internationally have shown general support for sharing health data with researchers at publicly-funded institutions, provided that certain conditions are met, most notably high levels of data security and a demonstration that the public interest is served by the data sharing arrangement (Hill et al., 2013; Kalkman et al., 2019; Stockdale et al., 2019; Garrison et al., 2015).

What is a citizens’ jury?

A citizens’ jury is a democratic process that supports citizens to understand the range of issues and different perspectives associated with a contentious topic. Citizens’ juries aim to develop recommendations for government, government agencies, and public and private organisations. The deliberation process requires the jury to help define the issue or problem, understand the context, generate ideas, analyse options and offer advice and recommendations. This is a well-established method of qualitative research and deliberative democracy, with clear processes and procedures to ensure robust outputs.

What did we do? 

We recruited 19 jurors and asked them to consider the question: Under what circumstances, if any, is it permissible for a national precision medicine program to share data with private industry for research and development? We focused on data sharing with private industry because this issue was identified by key stakeholders in Singapore, and in prior empirical research, as one of the most contentious elements of data-sharing for PM programs. The jurors met over 4 days between December 2020 and April 2021. 15 jurors were present for the final day of deliberations. The citizens’ jury was supported by an advisory panel consisting of key stakeholders, including the Singapore Ministry of Health and others involved in the National Precision Medicine Program. The jury heard from, and were able to engage with, academic and industry experts from Singapore and overseas.

What did we find?

The jury expressed conditional support for sharing Singaporean precision medicine data with private industry for research and development. Overall, the jury agreed that PM data could be shared with pharmaceutical, biotechnology and technology industries, but not the private life/health insurance industry. The jury took this position based on their assessment of the balance between potential benefits to Singaporeans and potential harms. The benefits they found particularly compelling were new medical knowledge and interventions (especially for Asian populations), benefits to future generations, economic development and strengthening of the R&D sector in Singapore, while the relevant harms were unfair profiteering from Singaporean data and data misuse leading to stigma or harm, particularly in relation to overseas companies operating outside Singapore’s legal jurisdiction. Despite existing regulatory and policy protections, the jury remained concerned about the potential harms arising from sharing PM data with private industry, and in particular with private insurers, especially the risk of discrimination. The jury specified three assumptions and nine recommendations. The assumptions were taken to reflect existing consensus about how PM will be conducted in Singapore.


A1. Data shared with private companies should be de-identified. 

A2. People need to opt-in to PM and have the right to withdraw at any time. 

A3. When people consent to the PM program, information should be comprehensible. For example, they should have someone to talk to, rather than relying on long written consent forms with lots of terms and conditions that people do not read.

The jury produced the following 9 recommendations for data sharing with private companies. All of these were unanimously endorsed by the 15 jurors present on the final day. 


1.     All data sharing with private industry should be in the public interests of Singaporeans. This means there is public benefit for Singapore. Private companies should not be able to access PM data solely for commercial or business purposes.

2.     We should not share data with insurance companies before there is anti-discrimination law in Singapore to prevent genetic discrimination for life and health insurance.

3.     We should establish an inter-agency committee to approve applications of PM data sharing with private industry; and this should include broad representation from agencies, for example: MOH, HSA, EDB, MOL; and community representatives.

4.     The oversight committee should consider the consequences of data sharing with respect to fairness and relative financial disadvantage between Singaporeans, e.g. the cost of medicine in the future, and people not being unfairly discriminated against in insurance.

5.     There must be an accreditation process to ensure that private companies receiving the data are competent, reputable and trustworthy, and there should be enforceable contractual mechanisms in place to ensure overseas companies can be held accountable for any breach or data misuse.

6.     If companies breach the terms of the data access contract, they should be held accountable based on the severity of the breach; and the penalties should include the loss of access to the PM data for a number of years, fines, and criminal charges.

7.     Organizations, teams and individuals should be held accountable for data misuse at both the companies receiving the data and the public agency(s) responsible for releasing the data.

8.     The following should be made publicly transparent: the process of decision-making about sharing PM data with private industry; the companies who have accessed the data and their purpose in accessing the data; a summary of the research outputs so the public can judge the benefit of data sharing; and when there is a data breach and/or what penalties are issued.

9.     There should be higher levels of restrictions for sharing more sensitive data.


Are these findings different from previous research?

Our results align with prior international studies which found conditional acceptance for data sharing with private industry, a public benefit requirement, specific reluctance to share with insurance companies, and an emphasis on accountability and transparency in order that data holders and users can demonstrate trustworthiness. However, our results differ from prior studies in that individual consent did not dominate the deliberations; jurors were able to set it aside as an assumed prerequisite for participation in a precision medicine program as a whole, and subsequently were not specifically concerned about individual consent each time data was shared with a new user.



Ballantyne A (2018) Where is the human in the data? A guide to ethical data use. GigaScience 7(7): 1-3.

Garrison NA, Sathe NA, Antommaria AH, et al. (2015) A systematic literature review of individuals' perspectives on broad consent and data sharing in the United States. Genetics in Medicine 18(7): 663-671.

Hill EM, Turner EL, Martin RM, et al. (2013) "Let's get the best quality research we can": public awareness and acceptance of consent to use existing data in health research: a systematic and qualitative study. BMC Medical Research Methodology 13: 72. DOI: 10.1186/1471-2288-13-72.

Ipsos MORI (2016) The One‐Way Mirror: public attitudes to commercial access to health data. Available at: https://www.ipsos.com/sites/default/files/publication/5200-03/sri-wellcome-trust-commercial-access-to-health-data.pdf (accessed November 11, 2021).

Kalkman S, van Delden J, Banerjee A, et al. (2019) Patients’ and public views and attitudes towards the sharing of health data for research: a narrative review of the empirical evidence. Journal of Medical Ethics: 3-13. DOI: 10.1136/medethics-2019-105651.

Stockdale J, Cassell J and Ford E (2019) "Giving something back": A systematic review and ethical enquiry into public views on the use of patient data for research in the United Kingdom and the Republic of Ireland. Wellcome Open Research 3: 6.


Developing data capability with non-profit organisations using participatory methods

by Anthony McCosker, Swinburne University of Technology, Melbourne, Australia

Read the article: McCosker A, Yao X, Albury K, Maddox A, Farmer J, Stoyanovich J. Developing data capability with non-profit organisations using participatory methods. Big Data & Society. January 2022. doi:10.1177/20539517221099882


In a small food charity warehouse on the outskirts of a large Australian city, volunteer workers try to tally recipients of food packages. Some of the volunteers don't like using the computer system, or don't have the skills, and every now and then the paper forms run out. Reflecting on the issue and her drive to improve data systems and practices, the manager told us “If you really need to have your data to communicate impact and it's buried in five stacks of paper with a scratched-on-tally, it's not great” (Workshop participant, 5 May 2021).   


The data divide is growing. Uneven access to data is paired with deep disparities in knowledge and expertise, creating inequality not only between corporate data-haves and citizen have-nots, but also across industry and organisational 'data settings'. This scenario affects who produces data in the digital society, who has control over or access to it and who has the skills to use it to derive some sort of benefit. It also affects the way data is valued. The benefit of data is usually weighed as commercial advantage. And yet it is civil society, charities, social services and community organisations that are most likely to find data’s social value.


Our wager is that supporting community sector and non-profit organisations to develop their data capability, with attention to data's social value, can help address the data divide and build data equity. As big data assets are increasingly used not only to inform decisions but also to automate them – through emerging AI and automated decision-making systems – this goal is becoming urgent. 


A collaborative data capability approach


How then can data equity be improved within the non-profit sector? What can be done to build data capability both within community sector organisations and between them, increasing collaboration, community buy-in and social benefit?


In our Big Data & Society paper ‘Developing data capability with non-profit organisations using participatory methods’, we explore the concept of data capability, building on approaches to data literacy and expertise, and design new participatory data methods. We hope these methods can help other researchers engage and work with non-profit organisations to build data capability, improve data equity and move toward more effective collective data action.


Data capability has both material and formal components – think hardware, storage capacity, software tools, as well as cultural, interpersonal and communicative dimensions – as in the ways we talk about or share ideas regarding the value of data and its uses, and the skills we build together in teams and sometimes in movements. 


The participatory data capability methodology presented in our paper draws lessons from a range of data literacy and participatory design projects. Participatory design and co-design approaches champion inclusivity and take stock of multiple perspectives. Above all, these practices foreground collaboration as well as self-enablement.


Where we landed and next steps


Detailing the methods and results of two collaborative data projects, we describe successful processes in building data capability in non-profit contexts. This involved:

a) creating spaces and techniques for iterative data elicitation to explore data pain points or challenges and identify relevant datasets,

b) data discovery to gain responses to data visualisation using data walks, and 

c) helping the groups work towards context-aware data storytelling and building that into organisational mission, strategy and outcomes. 


Building on these methods, we also saw the need to: 

  • contextualise non-profit data practices, through engagement with a broader set of organisations and situations, and
  • foster a community of practice, using co-design to embed a data capability framework.


Working with community sector workers over several years has shown us the benefit of participatory methods in building data capability. It has also shown us the deep willingness across the sector to step up to the challenge of creating a culture of responsible data production and use, one that works outside of the commercial asset paradigm that has been ascribed to data and has limited its social impact and outcomes.


The more we can articulate the pathways toward sector-wide collaborative capability, the greater the potential for creating the conditions for data-for-social-good outcomes.