Wednesday, 16 June 2021

Mapping business and data partnerships in the social media ecosystem

Fernando van der Vlist (@fvandervlist) and Anne Helmond (@silvertje)

Social media platforms are among the world’s most profitable businesses, with business models that rely on digital advertising revenue. The current global digital advertising market comprises thousands of interconnected platforms and (platform) businesses and is projected to be worth $333 billion, with programmatic advertising accounting for the vast majority (84.5% or more) of total revenue. Despite its significance, not enough is known about the structure of this global digital advertising market, how exactly it relates to social media, or the importance of partnerships and partner integrations in connecting them.

In an empirical study, we consider the significance of business and data partnerships in the social media ecosystem to understand how partners mediate and shape the governance and power of the world’s largest digital platforms. We present an empirical method for tracing their partnerships and partner integrations (the software integrations built through partnerships) – inspired by a prior empirical study of how partnerships figured in Facebook’s platform evolution. We then apply this method to map the thousands of business(-to-business) partnership relations that comprise the social media business ecosystem to learn more about the different types of partnerships and their role in mediating and shaping the governance and power of social media platforms in particular.

The empirical maps show which relationships are involved, which are exclusive or shared, and help us to identify key sources and locations, or ‘nodes’, of power in this ecosystem. Importantly, they spotlight the central role of partnerships and partner integrations in connecting social media platforms with what we call the audience economy – the complex global and interconnected marketplace of business(-to-business) intermediaries involved in the creation, commodification, analysis, and circulation of data audiences for purposes including but not limited to digital advertising and marketing. That is, we present the relationship networks that make up the ecosystem of social media, search engines, and other large digital platforms and which also interconnects the players of this global ecosystem, including leading data intermediaries, cloud service providers, and digital advertising and marketing technology providers.

Social media and audience intermediary partner ecosystems, with highlighted social media platforms (light blue) and audience data intermediaries (orange).
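To give a concrete sense of how such a partnership map can be built and queried, here is a minimal sketch in Python. The platform and partner names and the edges between them are purely illustrative placeholders, not the study's dataset:

```python
from collections import defaultdict

# Hypothetical platform–partner relations (illustrative names only; the
# study's actual dataset contains thousands of such edges).
partnerships = [
    ("Facebook", "LiveRamp"), ("Facebook", "Oracle"), ("Facebook", "Nielsen"),
    ("Google", "LiveRamp"), ("Google", "Nielsen"),
    ("Twitter", "Sprinklr"), ("Facebook", "Sprinklr"),
]

# Index each partner by the platforms it integrates with.
partner_platforms = defaultdict(set)
for platform, partner in partnerships:
    partner_platforms[partner].add(platform)

# 'Shared' partners connect several platforms; 'exclusive' ones connect one.
shared = sorted(p for p, ps in partner_platforms.items() if len(ps) > 1)
exclusive = sorted(p for p, ps in partner_platforms.items() if len(ps) == 1)

# Partners with the most platform connections are candidate 'nodes' of power.
ranked = sorted(partner_platforms, key=lambda p: len(partner_platforms[p]),
                reverse=True)

print("shared:", shared)        # ['LiveRamp', 'Nielsen', 'Sprinklr']
print("exclusive:", exclusive)  # ['Oracle']
```

At the scale of the real ecosystem, the toy edge list would be replaced by data traced from partner directories, and graph-analytic centrality measures would refine the simple connection count used here.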

Within this global and interconnected audience economy, business(-to-business) partners play a pivotal role through the creation of software tools, products, and services for shaping the creation, buying, modelling, measurement, and targeting of data audiences. In fact, we suggest that partnerships have been endemic and essential to the burgeoning business of digital platforms, particularly to their ‘programmatic’ (automated, data-driven) advertising and marketing businesses. Through intermediary partnerships and infrastructures, these data and advertising-related practices often extend far beyond any single digital platform environment or geographic territory. Consequently, partners contribute significantly to the ongoing process of ‘platformisation’ through their collective development of integrated software infrastructures between diverse economic sectors and spheres of life.

Most of the partnerships we found when conducting the empirical research in 2018 involved large advertising agencies (e.g. Dentsu and WPP), advertising and marketing clouds (e.g. Adobe Marketing Cloud, Oracle Marketing Cloud, and Salesforce Marketing Cloud), audience data aggregators such as data management and customer data platforms (‘DMPs’ and ‘CDPs’, e.g. eXelate, LiveRamp (formerly Acxiom), Oracle DMP (formerly BlueKai), and Salesforce DMP (formerly Krux)), data analytics and measurement firms (e.g. 4C Insights, Nielsen, and SocialCode), ‘multichannel’ advertising and marketing solutions (e.g. Adobe, AdParlor, Brand Networks, Oracle, Percolate, Salesforce, Spredfast, and Sprinklr), and customer relation management (‘CRM’) solutions (e.g. Adobe, Salesforce, Spredfast, and Sprinklr).

These examples are just the tip of the iceberg. It is clear that there are many different types of partnerships and players that may inform our understanding of the nature and structure of the audience economy – including the global digital advertising market, which is exceptionally dynamic – and of where the vast stores of digital data held by social media and other types of digital platforms derive their value and worth. That is, how disparate data sources are aggregated, linked, and made valuable through diverse practical applications, including but not limited to targeted advertising and data analytics. Data aggregation and identity resolution have as a result become central processes in this audience economy – and we find these processes offered as solutions by virtually all the leading platform businesses.

Based on the empirical findings, we suggest that power is not only held by the world’s largest platforms (e.g. those referred to as ‘GAFAM’: Google (Alphabet), Apple, Facebook, Amazon, and Microsoft) but also mediated by their partners and dispersed within the integrated platform ecosystem. Google and Facebook’s digital advertising ‘duopoly’, for instance, depends to a certain extent on their strategic position within the partner ecosystem, while strategic partners such as Acxiom, Oracle, and Experian benefit from partnerships with Google and Facebook through being among the few with privileged access to their closed platforms (referred to as ‘walled gardens’ or ‘data silos’). Within this ecosystem, governance and control are exercised through partnership agreements and software infrastructure for the sourcing of data from disparate sources and their distribution across many media channels – all automated and occurring in the blink of an eye.

While there are many important implications to consider, as some already have, the global and interconnected structure of this audience economy raises geopolitical concerns about how these intermediary partnerships enable or cause data to move across (international and intercontinental) borders. The prevalence of partnerships between and among audience data intermediaries means that it is exceptionally difficult, sometimes impossible, to trace the origins and flow of audience data throughout the integrated platform ecosystem (or to understand where data originates, is stored, and moves – a requirement under the EU GDPR). For instance, Wodinsky from Gizmodo raised concerns about the role of partners mediating, through an unknown number of intermediary partnerships, between Western and Chinese firms and advertisers. Additionally, The Intercept reported how a network of local Chinese partners offered Oracle’s technology and services to Chinese police and defense entities. Given these implications, we hope that our research methodology – and an openly available dataset of the partnerships we found – provide useful starting points for additional research to further improve critical understanding of this audience economy and the players within it.

Monday, 14 June 2021

Introducing the Special Theme Issue on “Studying the COVID-19 Infodemic at Scale”

Guest editors:

·     Anatoliy Gruzd, Ted Rogers School of Information Technology Management, Ryerson University, Toronto, Canada

·     Manlio De Domenico, Center for Information Technology of Fondazione Bruno Kessler, Italy

·     Pier Luigi Sacco, Department of Humanities, IULM University, Milan, Italy; metaLAB (at) Harvard, USA

·     Sylvie Briand, World Health Organization, Switzerland

As manufacturing and distribution of the vaccines ramp up, false and misleading information about vaccine efficacy, safety, and side-effects has also increased on social media. This is reflected in the increasing number of vaccine-related claims being debunked by international fact-checking organizations. But false and misleading COVID-19 claims, as tracked by the COVID19Misinfo portal from Ryerson University Social Media Lab, are not limited to vaccine-related content. In fact, since the onset of the COVID-19 pandemic in early 2020, social media has been a key vector in the spread of various types of misinformation about the virus, including how it is transmitted and how to treat it. The prevalence of COVID-19-related misinformation on social media contributes to the phenomenon known as an “infodemic”, in which people are exposed to large quantities of both accurate and misleading information related to a health topic. An infodemic makes it challenging for people to know what or whom to trust, especially when faced with conflicting claims or information.

To address the challenges of detecting and combating the spread of COVID-19 misinformation on social media and to contribute to the rapidly growing area of infodemiology, we are pleased to present the special theme on “Studying the COVID-19 Infodemic at Scale”. This special theme in Big Data & Society provides a space for original research articles and commentaries at the intersection of infodemiology, Big Data, and COVID-related dis/misinformation studies that explore questions such as: What are key terminologies and different computational approaches currently used to study and combat the spread of misinformation on social media? How can we use social media data to estimate the effects of the infodemic on individuals and society in general? And more specifically, how can we assess and mitigate the infodemic risks and consequences using Big Data?

The special theme issue builds on a successful series of public events and consultations organized by the World Health Organization (WHO) Information Network for Epidemics (EPI-WIN) Infodemic Management team in 2020. We are also building on the Big Data & Society symposium called “Viral Data” edited by Leszczynski and Zook (2020) which examined Big Data practices and specifically the notion of data virality as related to the pandemic at the midpoint of 2020. 

Altogether, the special theme features the following six research articles and four commentaries by 57 authors from 23 institutions in six countries:

Monday, 7 June 2021

Machine Learning in Tutorials - Universal Applicability, Underinformed Application, and Other Misconceptions

Hendrik Heuer introduces a new paper, "Machine learning in tutorials – Universal applicability, underinformed application, and other misconceptions", out in Big Data & Society, doi:10.1177/20539517211017593. First published May 21, 2021.

Video abstract


Machine learning has become a key component of contemporary information systems. Unlike prior information systems explicitly programmed in formal languages, ML systems infer rules from data. This paper shows what this difference means for the critical analysis of socio-technical systems based on machine learning. To provide a foundation for future critical analysis of machine learning-based systems, we engage with how the term is framed and constructed in self-education resources. For this, we analyze machine learning tutorials, an important information source for self-learners and a key tool for the formation of the practices of the machine learning community. Our analysis identifies canonical examples of machine learning as well as important misconceptions and problematic framings. Our results show that machine learning is presented as being universally applicable and that the application of machine learning without special expertise is actively encouraged. Explanations of machine learning algorithms are missing or strongly limited. Meanwhile, the importance of data is vastly understated. This has implications for the manifestation of (new) social inequalities through machine learning-based systems.

Keywords: Machine learning, artificial intelligence, algorithms, data science, critical data studies, tutorials

Wednesday, 2 June 2021

Call for Special Theme Proposals for Big Data & Society


The SAGE open access journal Big Data & Society (BD&S) is soliciting proposals for a Special Theme to be published in 2022/23. BD&S is indexed by Clarivate Analytics with a 2019 journal impact factor of 4.577. BD&S is a peer-reviewed, interdisciplinary, scholarly journal that publishes research about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business and government relations, expertise, methods, concepts and knowledge. BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practices that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government and crowd-sourced data. Critically, rather than settling on a definition the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes.

Special Themes can consist of a combination of Original Research Articles (10,000 words; maximum 6), Commentaries (3,000 words; maximum 4), and one Editorial (3,000 words). Article Processing Charges will be waived for all Special Theme content. All submissions will go through the Journal’s standard peer review process.

Past special themes for the journal have included: Knowledge Production; Algorithms in Culture; Data Associations in Global Law and Policy; The Cloud, the Crowd, and the City; Veillance and Transparency; Environmental Data; Spatial Big Data; Critical Data Studies; Social Media & Society; Assumptions of Sociality; Health Data Ecosystems; Data & Agency; Big Data and Surveillance; The Turn to AI in Governing Communication Online; The Personalization of Insurance; Heritage in a World of Big Data; and Studying the COVID-19 Infodemic at Scale. These special themes are available via the journal's website.


While open to submissions on any theme related to Big Data we particularly welcome proposals related to racialisation, indigenous data, health and education.

Format of Special Theme Proposals

Researchers interested in proposing a Special Theme should submit an outline with the following information.


  • An overview of the proposed theme, how it relates to existing research and the aims and scope of the Journal, and the ways it seeks to expand critical scholarly research on Big Data.

  • A list of titles, abstracts, authors and brief biographies. For each, the type of submission (ORA, Commentary) should also be indicated. If the proposal is the result of a workshop or conference, that should also be indicated.

  • Short bios of the Guest Editors, including affiliations and previous work in the field of Big Data studies. Links to homepages, Google Scholar profiles or CVs are welcome, although we don’t require CV submissions.

  • A proposed timing for submission to Manuscript Central. This should be in line with the timeline outlined below.


Information on the types of submissions published by the Journal and other guidelines is available via the journal's website.


Timeline for Proposals

Please submit proposals by Sept 1, 2021 to the Managing Editor of the Journal, Prof. Matthew Zook. The Editorial Team of BD&S will review proposals and make a decision by October 2021. Manuscripts would be submitted to the journal (via Manuscript Central) by or before January/February 2022. For further information or to discuss potential themes, please contact Matthew Zook.

Wednesday, 12 May 2021

Prebunking against COVID-19 misinformation

Basol M, Roozenbeek J, Berriche M, Uenal F, McClanahan WP, Linden S van der. Towards psychological herd immunity: Cross-cultural evidence for two prebunking interventions against COVID-19 misinformation. Big Data & Society. May 2021. doi:10.1177/20539517211013868

To fight misinformation about COVID-19, we developed and tested a 5-minute browser game as well as a series of infographics, designed to work as a psychological “vaccine” against manipulative online content.

Misinformation about COVID-19 is widespread and can range from messages about fake remedies (such as drinking bleach or eating garlic) to elaborate conspiracy theories about microchips and a “new world order”. Belief in such misinformation has been associated with reduced willingness to get vaccinated against the disease, and has been linked to acts of vandalism such as the destruction of mobile phone towers out of a mistaken belief that 5G radiation exacerbates the symptoms of COVID-19. Unfortunately, any effort to reduce the spread of COVID-19 misinformation runs into a number of problems. First, as our understanding of the virus develops, it can be hard to determine what does and doesn’t count as reliable information. Second, it’s difficult to undo the damage caused by misinformation once it’s already spread. Unverified (false) information travels faster and can spread deeper into social networks than information that turns out to be true, making it difficult for fact-checkers to keep up. Research also shows that viral information is sticky, that repeated exposure to misinformation increases the chances of it being perceived as reliable, and that even after a falsehood has been debunked, people often continue to rely on the misinformation to some extent.

Faced with these difficulties, we instead chose to focus on how to prevent COVID-19 misinformation from being effective in the first place, using an approach known as “prebunking”. Prebunking is grounded in inoculation theory, and is based on the biological analogy of an immunisation process. Much like exposure to a weakened pathogen triggers the production of antibodies, inoculation theory posits that preemptively exposing people to a weakened persuasive argument builds people’s resistance against future deception. In addition, by “inoculating” people against the techniques that are commonly used in misinformation (as opposed to against individual examples), the scalability of inoculation interventions is significantly enhanced.

In collaboration with the UK Cabinet Office, DROG, and the WHO’s “stop the spread” campaign, we developed a psychological “inoculation” intervention against COVID-19 misinformation in the form of a short, free online browser game called Go Viral! In this 5-minute game, players learn about and are exposed to weakened doses of three manipulation techniques commonly used in COVID-19 misinformation (using excessively emotional language, using testimony from fake experts, and spreading conspiracy theories).

Go Viral! screenshots. The game can be played in English, French, German, Italian, Spanish, and Ukrainian.

We tested the efficacy of this game, as now published in Big Data & Society (Basol, Roozenbeek et al., 2021), in two large-sample studies. In study 1 (n = 1,771), we collected data from a pre-post survey implemented in the game itself. Players were asked to rate the manipulativeness of tweets about COVID-19 (half of which contained misinformation and half of which did not) on a 1–7 scale. We found that players rated misinformation tweets as significantly more manipulative after playing than before, whereas their ratings of “real” (non-misinformation) tweets did not change.
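As a hedged illustration of the pre-post logic only (not the study's actual data or analysis code), a paired comparison over invented 1–7 ratings might look like this in Python:

```python
from scipy import stats

# Invented manipulativeness ratings for ten hypothetical players; the actual
# study analysed n = 1,771 in-game responses on the same 1-7 scale.
pre  = [3, 4, 2, 3, 5, 3, 4, 2, 3, 4]   # before playing Go Viral!
post = [5, 6, 4, 4, 6, 5, 6, 4, 5, 6]   # after playing

# Paired test: each player serves as their own control.
t, p = stats.ttest_rel(post, pre)
mean_shift = sum(post) / len(post) - sum(pre) / len(pre)

print(f"mean shift = {mean_shift:.2f}, t = {t:.2f}, p = {p:.4f}")
```

A positive mean shift with a small p-value is the pattern the study reports for misinformation tweets; for "real" tweets, the same comparison would show no significant shift.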

In study 2 (n = 1,777), we tested the efficacy of Go Viral! in a randomised trial against a control group as well as a separate treatment group that read a series of infographics about COVID-19 developed by UNESCO in collaboration with inoculation researchers. We ran this study in three languages (English, French, and German). For the UK participants, we also conducted a one-week follow-up to see whether the “inoculation” effect persisted over time. We found that playing Go Viral! and reading the infographics significantly improved people’s ability to detect manipulative content about COVID-19, as well as their confidence in doing so. For the Go Viral! game (but not for the infographics), these effects persisted for at least a week. Playing Go Viral! also reduced players’ self-reported willingness to share COVID-19 misinformation with people in their network. These effects were consistent across the three languages.

Overall, our findings show robust support for the use of prebunking and psychological “vaccines” against COVID-19 misinformation. Both the Go Viral! game and the UNESCO infographics are demonstrably effective, widely accessible, and easily scalable. As the success of COVID-19 vaccination programmes worldwide depends in part on minimising the amount of unreliable information that surrounds them, our study adds to the emerging insight that behavioural science is a crucial tool to help mitigate the spread of misinformation.

Saturday, 8 May 2021

Identifying and Characterizing Scientific Authority-related Misinformation Discourse about Hydroxychloroquine on Twitter using Unsupervised Machine Learning

Michael Haupt introduces a video abstract of his new paper with Jiawei Li and Timothy Mackey, "Identifying and Characterizing Scientific Authority-related Misinformation Discourse about Hydroxychloroquine on Twitter using Unsupervised Machine Learning", out in Big Data & Society, doi:10.1177/20539517211013843. First published May 6, 2021.

Video abstract


This study investigates the types of misinformation spread on Twitter that evoke scientific authority or evidence when making false claims about the antimalarial drug hydroxychloroquine as a treatment for COVID-19. Specifically, we examined tweets generated after former U.S. President Donald Trump retweeted misinformation about the drug, using an unsupervised machine learning approach called the biterm topic model, which clusters tweets into misinformation topics based on textual similarity. The top 10 tweets from each topic cluster were content coded for three types of misinformation categories related to scientific authority: medical endorsements of hydroxychloroquine, scientific information used to support hydroxychloroquine’s use, and a comparison group that included scientific evidence opposing hydroxychloroquine’s use. Results show a much higher volume of tweets featuring medical endorsements and supportive scientific information compared to accurate and updated scientific evidence, that misinformation-related tweets propagated over a longer time frame, and that the majority of hydroxychloroquine Twitter discourse expressed positive views about the drug. Analysis of metadata from Twitter accounts found that prominent users within the misinformation discourse were more likely to have media or political affiliations and to explicitly express support for President Trump. Conversely, prominent accounts within the scientific opposition discourse primarily consisted of medical doctors or scientists but had far less influence in the Twitter discourse. Implications of these findings and connections to related social media research are discussed, as well as cognitive mechanisms for understanding susceptibility to misinformation and strategies to combat misinformation spread via online platforms.

Keywords: Misinformation, scientific evidence, Twitter, machine learning, computational social science

Tuesday, 20 April 2021

One size does not fit all: Constructing complementary digital reskilling strategies using online labour market data

Fabian Stephany introduces his new article, 'One size does not fit all: Constructing complementary digital reskilling strategies using online labour market data', in Big Data & Society, doi:10.1177/20539517211003120. First published April 14, 2021.

Video abstract

Digital technologies are radically transforming our work environments and demand for skills, with certain jobs being automated away and others demanding mastery of new digital techniques. This global challenge of rapidly changing skill requirements due to task automation overwhelms workers. The digital skill gap widens further as technological and social transformation outpaces national education systems and precise skill requirements for mastering emerging technologies, such as Artificial Intelligence, remain opaque. Online labour platforms could help us to understand this grand challenge of reskilling en masse. Online labour platforms build a globally integrated market that mediates between millions of buyers and sellers of remotely deliverable cognitive work. This commentary argues that, over the last decade, online labour platforms have become the ‘laboratories’ of skill rebundling; the combination of skills from different occupational domains. Online labour platform data allows us to establish a new taxonomy on the individual complementarity of skills. For policy makers, education providers and recruiters, a continuous analysis of complementary reskilling trajectories enables automated, individual and far-sighted suggestions on the value of learning a new skill in a future of technological disruption.
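One way to operationalise skill complementarity is to measure how much more often two skills co-occur in worker profiles than their individual frequencies would predict. The sketch below uses invented profiles and skill names (not the study's data) and pointwise mutual information as one plausible co-occurrence measure:

```python
from collections import Counter
from itertools import combinations
from math import log

# Invented worker profiles standing in for online labour platform data;
# skill names are illustrative, not taken from the study.
profiles = [
    {"python", "machine learning", "statistics"},
    {"python", "machine learning", "data visualisation"},
    {"python", "statistics", "data visualisation"},
    {"graphic design", "illustration"},
    {"graphic design", "illustration", "data visualisation"},
]

n = len(profiles)
skill_counts = Counter(s for p in profiles for s in p)
pair_counts = Counter(frozenset(pair) for p in profiles
                      for pair in combinations(sorted(p), 2))

def pmi(a, b):
    """Pointwise mutual information: positive when two skills appear together
    in profiles more often than their separate frequencies would predict."""
    joint = pair_counts[frozenset((a, b))] / n
    return log(joint / ((skill_counts[a] / n) * (skill_counts[b] / n)))

print(round(pmi("python", "machine learning"), 2))      # 0.51
print(round(pmi("graphic design", "illustration"), 2))  # 0.92
```

High-PMI skill pairs suggest reskilling steps that complement what a worker already offers, which is the kind of signal a taxonomy of skill complementarity could surface for education providers and recruiters.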

Keywords: Artificial intelligence, automation, Big Data, networks, online labour platforms, skills