Monday, 20 July 2020

Summer break

The journal Big Data & Society will be on summer break from July 22nd to August 21st. Please expect delays in the processing and review of your submission, and in related correspondence, during that time.

Have a great summer!

Thursday, 9 July 2020

BD&S accepted into the Social Sciences Citation Index with an Impact Factor of 4.577

We are pleased to announce that the journal Big Data & Society has been accepted into the multidisciplinary Social Sciences Citation Index (SSCI) by Clarivate Analytics. The journal's Impact Factor of 4.577 ranks it as the second highest journal (out of 108) in the Social Sciences Interdisciplinary domain of the SSCI. More information on SSCI can be found here:

We are proud of this accomplishment and thank our authors, reviewers and editors for all their hard work. The journal could not have achieved this without you. We look forward to continuing to offer feedback and space for thoughtful and innovative research on big data practices.

BD&S is a SAGE open access, double-blind peer-reviewed scholarly journal. It publishes interdisciplinary research, principally in the social sciences, humanities and computing and their intersections with the arts and natural sciences, about the implications of Big Data for societies. Since launching in 2014, it has published over 300 original research articles, commentaries, demonstrations, and editorials in a series of special theme collections. Visit the journal web site at

Monday, 15 June 2020

Call for Special Theme Proposals for Big Data & Society

The SAGE open access journal Big Data & Society (BD&S) is soliciting proposals for a Special Theme to be published in 2021. BD&S is a peer-reviewed, interdisciplinary, scholarly journal that publishes research about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business and government relations, expertise, methods, concepts and knowledge. BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practices that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government and crowd-sourced data. Critically, rather than settling on a definition the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes.

Special Themes can consist of a combination of Original Research Articles (10000 words; maximum 6), Commentaries (3000 words; maximum 4) and one Editorial (3000 words). Article Processing Charges will be waived for all Special Theme content. All submissions will go through the Journal’s standard peer review process.

Past special themes for the journal have included: Knowledge Production, Algorithms in Culture, Data Associations in Global Law and Policy, The Cloud, the Crowd, and the City, Veillance and Transparency, Environmental Data, Spatial Big Data, Critical Data Studies, Social Media & Society, Assumptions of Sociality, Health Data Ecosystems, Data & Agency, Big Data and Surveillance, The Personalization of Insurance, The Turn to AI in Governing Communication Online.  See to access these special themes.

Format of Special Theme Proposals
Researchers interested in proposing a Special Theme should submit an outline with the following information.
  • An overview of the proposed theme, how it relates to existing research and the aims and scope of the Journal, and the ways it seeks to expand critical scholarly research on Big Data.
  • A list of titles, abstracts, authors and brief biographies. For each, the type of submission (ORA, Commentary) should also be indicated. If the proposal is the result of a workshop or conference that should also be indicated.
  • Short Bios of the Guest Editors including affiliations and previous work in the field of Big Data studies. Links to homepages, Google Scholar profiles or CVs are welcome, although we don’t require CV submissions.
  • A proposed timing for submission to Manuscript Central. This should be in line with the timeline outlined below.
Information on the types of submissions published by the Journal and other guidelines is available at .

Timeline for Proposals
Please submit proposals by September 15, 2020 to the Managing Editor of the Journal, Prof. Matthew Zook at The Editorial Team of BD&S will review proposals and make a decision by October 2020. Manuscripts should be submitted to the Journal (via Manuscript Central) by January/February 2021. For further information or to discuss potential themes, please contact Matthew Zook at

Thursday, 11 June 2020

Malte Ziewitz receives 2020 Best Paper Award

We are very happy to share the news that an article by Malte Ziewitz published in Big Data & Society has just received the 2020 Best Paper Award in Ethnomethodology and Conversation Analysis, awarded by the EMCA Committee of the American Sociological Association. In its announcement, the Committee noted that the article is ‘timely, clever, and delicately formulated’ and ‘a pleasure to read as well as a spur to further research on the often opaque rules that govern expert systems.’

The award-winning article is:
Ziewitz, M. (2017) A not quite random walk: Experimenting with the ethnomethods of the algorithm, Big Data & Society, 4(2), 1-13,

Algorithms have become a widespread trope for making sense of social life. Science, finance, journalism, warfare, and policing—there is hardly anything these days that has not been specified as “algorithmic.” Yet, although the trope has brought together a variety of audiences, it is not quite clear what kind of work it does. Often portrayed as powerful yet inscrutable entities, algorithms maintain an air of mystery that makes them both interesting and difficult to understand. This article takes on this problem and examines the role of algorithms not as techno-scientific objects to be known, but as a figure that is used for making sense of observations. Following in the footsteps of Harold Garfinkel’s tutorial cases, I shall illustrate the implications of this view through an experiment with algorithmic navigation. Challenging participants to go on a walk, guided not by maps or GPS but by an algorithm developed on the spot, I highlight a number of dynamics typical of reasoning with running code, including the ongoing respecification of rules and observations, the stickiness of the procedure, and the selective invocation of the algorithm as an intelligible object. The materials thus provide an opportunity to rethink key issues at the intersection of the social sciences and the computational, including popular concerns with transparency, accountability, and ethics.

Malte Ziewitz is an Assistant Professor and Director of Undergraduate Studies in the Department of Science & Technology Studies at Cornell University.

Thursday, 21 May 2020

The virtue of simplicity: On machine learning models in algorithmic trading

Kristian Bondo Hansen, Copenhagen Business School
Big Data & Society 7(1), First published: May 20, 2020.
Keywords: Ockham’s razor, machine learning models, algorithmic trading, distributed cognition, model overfitting, explainability

In the spring of 2018, I was—as part of my fieldwork into the use of emerging technology in securities trading in financial markets—sitting in on an industry conference on the employment of machine learning (ML) and artificial intelligence (AI) in finance. The conference took place in a rather lavish late eighteenth century building in central London, which had belonged to the freemasons before being turned into a hotel and then conference venue. On the second day of the event a labour union was having a gathering just one floor down from where I sat listening to old school financiers and new school data scientists talk about Markov Chains, unstructured data, reinforcement learning, LSTMs, autoencoders, and a lot of other very technical stuff. I remember pondering what the union people might think about the “capitalists” upstairs and whether I was perceived as one of them. Thoughts about class affiliation and whether or not I was blending in aside, I was enjoying the conference, especially my coffee break conversations with the new breed of tech savvy financiers. In one of the more accessible and less hypothetical presentations—none of the participants were interested in sharing trade secrets, just glimpses of their potentially profitable ML and AI models—a young data wizard doing quantitative risk management in a Dutch clearing bank spoke on a late-stage ML model for anomaly detection in undisclosed financial data. Because it was the first presentation of a production-ready ML model and not just a thing on the drawing board, the room was buzzing with excitement. During the Q&A the presenter was queried about the number of tests and the type and scope of data the model had been trained on. Asked why the team from the clearing bank had only performed a limited number of tests of the model, the presenter replied “because it works. And I have deadlines too!”

Besides being amusingly frank, the data scientist’s response is telling of the combination of, and tension between, pragmatics and tireless scientific rigour that characterises contemporary quantitative, model-driven trading and investment management. Though some algorithms are immensely sophisticated engineering marvels, it is, at the end of the day, their ability to consistently make money that counts. With data scientists increasingly and rapidly replacing economists in the back, middle and front offices of trading firms, hedge funds and banks, finance seems more and more to be turning into an applied data science industry. While finding an edge in markets is now partly a scientific endeavour, the end goal remains the same: to make money. It is the challenge of devising robust, sophisticated, profitable yet understandable and thus manageable ML algorithms for trading and investment purposes that I explore in my paper ‘The virtue of simplicity: On machine learning models in algorithmic trading’. More specifically, I engage with the development of such models from the quants’ perspective and analyse their reflections on how to deal with ML techniques, vast datasets, and the dynamism of financial markets in an industry that is in many ways impatient. Drawing on distributed cognition theory, I argue that ML techniques enhance financiers’ ability to take advantage of opportunities, but at the same time possess a degree of unavoidable complexity that developers and users need to find ways to make sense of, manage and control.

The paper shows how ML quants attempt to manage the complexity of their algorithms by resorting to simplicity as a virtuous and pragmatic rule of thumb in model development and model implementation processes. Quants consider simplicity a heuristic that helps them manage and control machine learning model complexity (they are particularly fond of the principle of Ockham’s razor, which says that things should not be multiplied without necessity). The argument for having simplicity as a rule of thumb in ML modelling is to ensure comprehensibility, interpretability and explainability. It helps frame the modelling process by making it more foreseeable and controllable, which fortifies accountability. Rather than being able to account for every little detail in learning algorithms, what quants perceive as having an understanding or “feel” for a model is a matter of grasping the algorithm’s basic logic and being capable of interpreting output. The study contributes to research on the relationship between and interaction of humans and algorithms in finance and beyond.

The research that went into this paper was carried out as part of the ERC-funded interdisciplinary research project ‘AlgoFinance’, which explores algorithm and model use in financial markets. Combining ethnographic field studies with large-scale agent-based simulations of securities markets, we try to understand how algorithms (machine learning and non-machine learning) construct and shape the interaction dynamics of market actors trading with one another. The research team consists of social scientists Christian Borch (PI), Daniel Souleles, Bo Hee Min and myself, and, from the hard sciences side of things, Zachery David, Nicholas Skar-Gislinge, and Pankaj Kumar. In addition to the sociological network perspective on the interaction of trading algorithms, we examine ways organisations and individuals (traders, portfolio managers, quants, etc.) are affected by, adapt to and try to stay on top of technological advances in the field. One of the things we hope will come of our efforts is a better understanding of the social dynamics underpinning and embedded in the thoroughly quantified and increasingly automated world of securities trading. A big part of shedding light on this social dimension of algorithmic finance is to explore the socio-material assemblages of humans and algorithms, which is exactly what I do in my paper.

Monday, 18 May 2020

Big Data and Surveillance: Hype, Commercial Logics and New Intimate Spheres

Big Data & Society 7(1), First published May 14, 2020.

Guest lead editors: Prof. Kirstie Ball*, Prof. William Webster**

* Centre for Research into Information, Surveillance and Privacy, University of St Andrews, School of Management
** Centre for Research into Information, Surveillance and Privacy, University of Stirling, School of Management

When viewed through a surveillance studies lens Big Data is instantly problematic. In comparison with its predecessors, and by virtue of its pre-emptive impulses and intimate data flows, Big Data creates a more penetrating gaze into consumers’ and service users’ lives. As Big Data draws on data streams from social and online media, as well as personal devices designed to share data, consumers have limited opportunities to opt out of data sharing, as well as difficulty in finding out what happens to their data once it is shared. In the Big Data era, consumers and service users exert comparatively less control over their personal information flows and their mundane consumption activities become highly significant and subject to scrutiny. Their subjection to the social sorting which results from the classification of those data is comparatively intensified and commercialised. Those companies who are in a position to exploit the value created by Big Data Analytics (BDA) enjoy powerful market positions.

Consequently, greater attention needs to be placed on corporate and other actors which bring surveillance practices like BDA into being. BDA as practiced predominantly takes place in organizational settings. Addressing the mid-range of BDA - the mesh of organizations which mediate between the end consumer, the organisational and societal context, and the marketer of products - reveals how the influence and power of BDA is far from a done deal. The commercial logics which drive BDA implementation are seated in promises of seamless improvements in operational efficiency and more accurate decision-making arising directly from the use of analytics. As a marketing practice, for example, BDA seeks to create value from an extensive array of new data-generating sources used by consumers. The aim is to produce new insight into consumer behaviours so that they can be better targeted by marketers in real time and so that their intentions can be predicted with a greater degree of accuracy. However, the realisation of this ‘value’ is highly contingent. Personnel management, technology infrastructure, organizational culture, skills, and management capability are all identified as crucial components that affect the value generated. The sheer socio-technical range and interdependency of these internal variables highlight the two issues with which this special themed issue of Big Data & Society is concerned.

The first concerns the power relations and political dynamics of BDA implementation. Adopting, enacting and complying with the demands of BDA strategies involves a rethinking of roles, relationships and identities on the part of those involved in the transformation. Significant pressure and hype have been brought to bear on non-technical organizational constituencies, such as marketers, who have been challenged by the implications of BDA and are required to reconcile their creative, qualitative approaches with an analytical worldview. Similarly, in a public service context, managers are increasingly being required to base their policy and operational decisions on new information flows embedded in BDA. They are finding that these novel, technologically intensive processes conflict with long-established norms and organisational decision-making structures.

The second concerns how practices associated with BDA extend surveillance into the intimate sphere. The surveillance practices embedded in Big Data are typically seen as commercial processes and another facet of operational efficiency and value creation. Such data can be subtle, highly nuanced and very personal. It can be collected from the home and can include information gathered within intimate, domestic spaces. Ethical concerns are recognised by practitioners, although they are still couched within a value discourse - and a robust ethics committee can ‘allow’ and ‘oversee’ the collection of such data.

Big Data succeeds in extending the scope of surveillance by co-opting individuals into the de facto surveillance of their own private lives. Through the increasingly embedded role of online social networks and location sensitive mobile devices in social activities, the boundaries between surveillance and the surveilled subject become blurred. Big Data breaks down boundaries between different sources of data, thus allowing the combination of information from different social domains. In democracies, with clearer legal protections of the line between public and private, Big Data extends existing surveillance technologies in its ability to co-opt the key economic actors - the corporations - and thus gain a window into private lives. Big Data practices are also allowing powerful commercial corporations greater access to the machinery of government and public services, in that they are increasingly influential in policy-making and service delivery and are gaining greater access to data deriving from these organisational entities. The levels of ubiquity in terms of data collection, previously only available in tightly controlled political environments, are therefore now available universally.

A brief guide to the special theme
This theme features six articles, all of which contextualise Big Data hype within and at times counter to business and organisational logics. They explore how BDA extends surveillance across more intimate boundaries, highlighting: the emotional registers of consumers; home automation and household surveillance; and the surveillance and commercialisation of children via ‘Hello Barbie’. They also examine how Big Data practices are produced, reflecting the argument that the enactment of surveillant power using BDA is not a certainty but a negotiated organisational process. This theme addresses a gap in critical scholarship on Big Data, as it explores the links between Big Data, its organisational and commercial contexts and increasing levels of intimate surveillance. The articles illustrate how business and organisational practices shape and are shaped by BDA and how the producers and consumers of Big Data are forging new intimate and intensive surveillance relationships. BDA is not as revolutionary as sometimes suggested by vocal advocates. Its implementation and use are embedded within, and shaped by, powerful institutional norms and processes, and when seen in retrospect the development of BDA is clearly an incremental, path-dependent process.

Thursday, 14 May 2020

Playing with machines: Using machine learning to understand automated copyright enforcement at scale

Joanne Gray, Nicolas Suzor
Big Data & Society 7(1), First published April 28, 2020.
Keywords: machine learning, copyright enforcement, YouTube, content moderation, automated decision-making, Content ID

How can we understand how massive content moderation systems work? Major social media platforms use a combination of human and automated processes to efficiently evaluate the content that their users post against the rules of the platform and applicable laws. These sociotechnical systems are notoriously difficult to understand and hold to account.

In this study, we use digital methods to try to make the content moderation system on YouTube—a system that relies on both automated and discretionary decision-making and that is applied to varying types of video content—more legible for research.

Starting with a random sample of 76.7 million YouTube videos, we used the BERT language model to train a machine learning classifier to identify videos in categories that reflect ongoing controversies in copyright takedowns. Our four categories were full movies, gameplay, sports content, and ‘hacks’ (tutorials on copy control circumvention). We used this classifier to isolate and categorise a sample of approximately 13 million unique videos for further analysis.

For each of these videos, we were able to identify which videos had been removed, what reasons YouTube gave to explain their removal and, in some cases, who had requested the removal. We sought to compare trends in copyright takedowns, Content ID blocks, and terms of service removals across different categories of content. Across our entire dataset, videos were most frequently removed from YouTube by users themselves, followed by removals due to an account termination and then Content ID blocks. DMCA takedowns were the least common removal type.
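The comparison described above amounts to counting removal reasons within each content category and dividing by the number of videos in that category. The following is only a minimal sketch of that tabulation, not the authors' pipeline: the record layout, the reason labels and the sample rows are all hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (category, removal_reason). A reason of None means
# the video was still available when checked. These rows are invented for
# illustration and are NOT data from the study.
videos = [
    ("full_movies", "tos_removal"),
    ("full_movies", "account_terminated"),
    ("full_movies", None),
    ("gameplay", None),
    ("gameplay", None),
    ("gameplay", "content_id_block"),
    ("sports", "dmca_takedown"),
    ("sports", "tos_removal"),
    ("hacks", "tos_removal"),
    ("hacks", None),
]


def removal_rates(records):
    """Return {category: {reason: share of that category's videos}}."""
    totals = Counter()              # videos seen per category
    removed = defaultdict(Counter)  # removal reasons per category
    for category, reason in records:
        totals[category] += 1
        if reason is not None:
            removed[category][reason] += 1
    return {
        category: {
            reason: count / totals[category]
            for reason, count in removed[category].items()
        }
        for category in totals
    }


rates = removal_rates(videos)
for category, by_reason in rates.items():
    print(category, by_reason)
```

Keeping the denominator as all videos in a category (removed or not) makes the rates comparable across categories of very different sizes.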

One of the most important findings of this study is that we can see, at a large scale, the rates at which Content ID is used to remove content from YouTube. Previous large-scale studies have provided important insights into rates of DMCA removals but information about Content ID removals has remained imprecise, provided at a high level of abstraction by YouTube. In this paper, we provide the first large-scale systematic analysis of Content ID and Terms of Service removal rates, including comparisons with other removal types and across different categories of content.

Our analysis provides a comparison of different types of automation in content moderation, and how these different systems play out across different categories of content. We found very high rates of removals for videos associated with film piracy, and this category had the highest rate of removals for terms of service violations and account terminations. In contrast, we found low rates of removals for gameplay. It appears from our data that game publishers are largely not enforcing their rights against gameplay streams, and when a gameplay video is removed it is usually due to a claim by a music rightsholder.

For sports content, we found very high rates of removals for both live sporting streams and sports highlights, including high rates of terms of service violations. For this category, it appears that both copyright owners and YouTube are highly active in policing content. For the ‘hacks’ category, we found high rates of removals, but mostly for terms of service violations, indicating that YouTube, rather than rightsholders, is more commonly taking action to remove content and terminate accounts that provide DRM circumvention information.

Overall, we found in YouTube’s heavily automated content moderation system there is substantial discretionary decision-making. We welcome the capacity for the use of discretion in automated systems. But it matters who is making the decisions and why. As nations pressure platforms to take a more active role in moderating content across the internet we must continue to advance our methods for holding decision-makers to account.

This experimental methodology has left us optimistic about the potential for researchers to use machine learning classifiers to better understand systems of algorithmic governance in operation at a large scale and over a sustained period.