Showing posts with label big data. Show all posts

Thursday, 11 March 2021

“Reach the Right People”: The Politics of “Interests” in Facebook’s Classification System for Ad Targeting

by Kelley Cotter, Mel Medeiros, Chankyung Pak, Kjerstin Thorson

In recent years, targeted advertising has gained a prominent place in American politics. In particular, political campaigns, candidates, and advocacy organizations have turned to Facebook for its voluminous array of options for targeting users according to “interests” inferred by machine learning algorithms. In this study, we explored for whom this (algorithmic) classification system for ad targeting works in order to contribute to conversations about the ways such systems produce knowledge about the world rooted in power structures. To do this, we critically analyzed the classification system from a variety of vantage points, particularly focusing on the representation of people of color (POC), women, and LGBTQ+ people. First, we drew on donated user data, which included a list of political ad categories people had been sorted into on Facebook. We also examined Facebook's documentation, training materials, and patents for insight into the inner workings of the system. Finally, we entered into the system via Facebook’s tools for advertisers to explore its contents.
 
Through this investigation, we catalogued a series of cases that reveal the political order enacted via Facebook’s classification system for ad targeting. We particularly highlight four themes. First, we demonstrate how certain ad categories reflect what Joy Buolamwini calls a "coded gaze," or the “embedded views that are propagated by those who have the power to code systems” (2016: n.p.).  Second, we highlight how a disproportionate number of ad categories for women and people of color hint at an unmarked user and what Tressie McMillan Cottom (2020) calls "predatory inclusion." Third, we describe cases of ad categories that flatten dimensions of identity and suggest Kimberlé Crenshaw’s (1989) notion of a "single-axis framework" of identity, which fails to capture the intersectionality of identity. Fourth, we illustrate how Facebook's classification system exhibits something akin to what Whitney Phillips (2018) refers to as "both-sides-ism" by allowing for ad categories that could either represent an interest in civil rights or the endorsement of hateful ideologies.
 
Through these cases, we argue that Facebook's classification system for ad targeting is necessarily political as a result of its underlying technical and commercial logics and the human choices embedded in datafication processes. The system prioritizes the interests of the socially and economically powerful and represents those who have been historically marginalized not on their own terms, but on the terms of those occupying more privileged positions. We suggest that, as a tool for political communication, Facebook’s classification system may have downstream implications for the political voice and representation of marginalized communities to the extent that political campaigns, advocacy groups, and activists increasingly rely on it for cultivating and mobilizing supporters. As Facebook weighs the decision of if/when to reinstate political advertising, our study urges continued critical reflection on whose “interests” are served by Facebook’s classification system (and others like it). 

References

Buolamwini J (2016) InCoding — In the Beginning Was the Coded Gaze. Available at: https://medium.com/mit-media-lab/incoding-in-the-beginning-4e2a5c51a45d  

Cottom TM (2020) Where platform capitalism and racial capitalism meet: The sociology of race and racism in the digital society. Sociology of Race and Ethnicity 6(4): 441–449. DOI: 10.1177/2332649220949473

Crenshaw K (1989) Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum 1989(1), Article 8: 139–167.

Phillips W (2018) The oxygen of amplification: Better practices for reporting on extremists, antagonists, and manipulators online. Data & Society. https://datasociety.net/library/oxygen-of-amplification/

Tuesday, 8 September 2020

Emerging models of data governance in the age of datafication

by Marina Micheli, Marisa Ponti, Max Craglia and Anna Berti Suman 

Big Data & Society 7(2), https://doi.org/10.1177/2053951720948087. First published: Sept 1, 2020.

The article synthesizes and critically examines a ‘moving target’: the various practices being advanced for the governance of personal data. In recent years, following scandals like Cambridge Analytica and new data protection regulations like the GDPR, there has been mounting attention to how data collected by big tech corporations and business entities might be accessed, controlled, and used by other societal actors. Scholars, practitioners and policy makers have been exploring the opportunities for agency of data subjects, as well as alternative data regimes that could allow public bodies to use such data for their public interest mission. Yet the current circumstances, the result of a tradition of ‘corporate self-regulation’ in the digital domain and an overall laissez-faire approach (albeit increasingly divergent by geopolitical context), see a few technology corporations in a hegemonic position that has de facto established ‘quasi-data monopolies’. This is reflected in the asymmetry of power between data corporations, which hold most of the decision-making power over data access and use, and the other stakeholders.

The article extends knowledge of the data governance practices currently being developed by various societal actors beyond ‘big tech’. It does so by describing four data governance models, emphasizing the power of social actors to control how data is accessed and used to produce different kinds of value. A relevant outcome of the article lies in the heuristic tools it proposes, which could help readers better understand and further examine the emerging models of data governance, looking in particular at the relations between stakeholders and the power (un)balances between them.

The idea for this study originated from a workshop that we organised in the context of the project Digitranscope at the Centre for Advanced Studies of the Joint Research Centre of the European Commission. Seventeen invited experts - from academia, the public sector, policymaking, research and consultancy firms - took part in the event, back in October 2018, to discuss the policy implications of the governance of (and with) data. While preparing the workshop, we realised how the various labels circulating in the policy arena to address data governance - such as data sovereignty, data commons, data trusts, etc. - tended to be used equivocally to refer to different concepts (technical solutions, legal frameworks, economic partnerships, etc.), with their meaning shifting slightly according to the context. Furthermore, during the workshop, participants highlighted the widespread lack of knowledge and practical understanding of possible alternatives to the ‘data extraction’ approach of big online platforms, the need to find ways to use data collected by private companies for the public interest, and the urgency of treating data subjects as key stakeholders in the governance of data. With these insights in mind, we decided to engage in the research that led to this article.

The key contributions of this publication, according to our view, are conceptual and empirical.

  • We developed a ‘social-science informed’ definition of data governance that draws on science and technology studies and critical data studies (hence, also on some key publications of this journal). We understand data governance as the power relations between all the actors affected by, or having an effect on, the way data is accessed, controlled, shared and used; the various socio-technical arrangements set in place to generate value from data; and how that value is redistributed between actors. Such a definition allows moving beyond concerns of technical feasibility, efficiency discourses and ‘solutionist’ thinking. Instead, it points to the actual goals for which data is managed, emphasizing who benefits from it, the power (un)balances among stakeholders, the kind of value produced, and the mechanisms (including underlying principles and systems of thought) that sustain these approaches.

  • We conducted a review of relevant resources from the scientific and grey literature on data governance practices, which led to the identification of four emerging models: data sharing pools, data cooperatives, public data trusts and personal data sovereignty. As this is a rapidly evolving field, we did not aim to offer an exhaustive picture of all possible models - hence these four should not be understood as comprehensive. They also have to be contextualised in our conceptual approach, in the time span in which the research was conducted and in the article’s European focus. Yet they provide a basis for understanding how the emerging data governance models are (re)thinking and redressing power asymmetries between big data platforms and other actors. In particular, they show how both civil society and public bodies are key actors for democratising data governance and redistributing the value produced through data.
A social science-informed conceptualisation of data governance allows seeing ‘through the infrastructure’ and encourages asking certain questions, such as: What principles guide data sharing and use? What is done with data, and who can access and participate in its governance? What value is produced and how is it redistributed? These questions are particularly relevant today, given that the policy debate around data governance is very active at the moment (especially in Europe). The future of the data governance models examined in this article – and of any model that allows more actors to control data and use it for purposes beyond the generation of profit for big tech corporations – depends on the policy actions and legal frameworks that will be developed to sustain them.

Vignette of the data governance models examined in the article.

Keywords: Data governance, Big Data, digital platforms, data infrastructure, data politics, data policy

Wednesday, 2 September 2020

COVID-19 is spatial: Ensuring that mobile Big Data is used for social good

by Age Poom, Olle Järv, Matthew Zook and Tuuli Toivonen

Big Data & Society 7(2), https://doi.org/10.1177/2053951720952088. First published: August 28, 2020

The mobility restrictions related to the COVID-19 pandemic have resulted in the biggest disruption to individual mobilities in modern times. Hot spots, quarantine, closed borders, video-conferencing, social distancing and the temporary closure of workplaces, schools, restaurants and recreational facilities are all profoundly about distance, separation, and space. Examining the geographical aspect of the pandemic is important for understanding its broad implications, including the broader societal impacts of containment policies.

The avalanche of mobile Big Data – location and time-stamp data from mobile phone call records, operating systems, social media or apps – makes it possible to study the spatial effects of the crisis in spatiotemporal detail, even at national and global scales. Beyond health care objectives such as understanding how virus transmission is mediated by human mobility or evaluating adherence to restrictions, mobile Big Data also allows us to understand changes in people’s daily interactions, mobilities and socio-spatial responses across population groups.

Our advocacy for the use of these data, however, is tempered both by our experiences in recent months with the serious limitations of using mobile Big Data and our unease with the power of these same data to track, surveil and discipline social behaviour at the scale of entire populations. 

Thus, we pose the question: How can we use mobile Big Data for social good, while also protecting society from social harm? Drawing on the Estonian and Finnish experiences during the early phases of the COVID-19 pandemic, we highlight issues with quickly developed ad hoc data products as well as the “black box” solutions (Pasquale, 2015) offered by large platform companies that created “new digital divides” among researchers (boyd and Crawford, 2012).

We argue that these examples demonstrate a clear need to re-evaluate the public-private relationships with mobile Big Data and propose two strategic pathways forward.

First, we call for transparent and sound mobile Big Data products that provide relevant up-to-date longitudinal data on the mobility patterns of dynamic populations. To help increase their usefulness, data products should be transparent about their production methodology, and ensure easy access and stability. 

Second, there is also a need to develop trustworthy platforms for the collaborative use of raw individual-level data. Secure and privacy-respecting access to near real-time raw data is needed for developing and testing sound methodologies for the above-mentioned data products. This would help bridge the Big Data digital divide, enable scientific innovation, and offer needed flexibility in responding to unanticipated questions about changing locations and mobilities in times of crisis. To be clear, we do not view this as simple to achieve, particularly as we weigh what kind of institution might best fill this role, or how “social good” is defined and operationalized in practice. But addressing these issues through public debate and academic discourse will leave us better prepared for the next crisis.

Summing up,
  • We need harmonized and representative data about human mobility for better crisis preparedness and social good in general;
  • Methodological transparency about mobile Big Data products is vital for open societies and capacity building;
  • Access to mobile Big Data to develop feasible methodologies and baseline knowledge for public decision-making is needed before the next crisis occurs;
  • Recognizing the fundamental spatiality of the current COVID-19 crisis and crises more generally is the most relevant of all.

Mobile Big Data can help us better understand and address the important spatial dimensions of the COVID-19 pandemic and many other social phenomena. The challenge is doing so responsibly (Zook et al., 2017) and not normalizing a lack of spatial privacy.

References

boyd, d, Crawford, K (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679. https://doi.org/10.1080/1369118X.2012.678878

Pasquale, F (2015) The Black Box Society. Cambridge: Harvard University Press.

Zook, M, Barocas, S, boyd, d, et al. (2017) Ten simple rules for responsible big data research. PLOS Computational Biology 13(3): e1005399. https://doi.org/10.1371/journal.pcbi.1005399

Keywords: mobile Big Data, mobility, COVID-19, spatial data infrastructure, social good, mobile phone data, social media data, privacy


Monday, 18 May 2020

Big Data and Surveillance: Hype, Commercial Logics and New Intimate Spheres

Editorial
Big Data & Society 7(1), https://doi.org/10.1177/2053951720925853. First published May 14, 2020.


Guest lead editors: Prof. Kirstie Ball*, Prof. William Webster**

* Centre for Research into Information, Surveillance and Privacy, University of St Andrews, School of Management
** Centre for Research into Information, Surveillance and Privacy, University of Stirling, School of Management

When viewed through a surveillance studies lens, Big Data is instantly problematic. In comparison with its predecessors, and by virtue of its pre-emptive impulses and intimate data flows, Big Data creates a more penetrating gaze into consumers’ and service users’ lives. Because Big Data draws on data streams from social and online media, as well as personal devices designed to share data, consumers have limited opportunities to opt out of data sharing and difficulty finding out what happens to their data once it is shared. In the Big Data era, consumers and service users exert comparatively less control over their personal information flows, and their mundane consumption activities become highly significant and subject to scrutiny. Their subjection to the social sorting that results from the classification of those data is comparatively intensified and commercialised. Those companies that are in a position to exploit the value created by Big Data Analytics (BDA) enjoy powerful market positions.

Consequently, greater attention needs to be paid to the corporate and other actors who bring surveillance practices like BDA into being. BDA as practiced predominantly takes place in organizational settings. Addressing the mid-range of BDA - the mesh of organizations that mediate between the end consumer, the organisational and societal context, and the marketer of products - reveals how the influence and power of BDA is far from a done deal. The commercial logics that drive BDA implementation rest on promises of seamless improvements in operational efficiency and more accurate decision-making arising directly from the use of analytics. As a marketing practice, for example, BDA seeks to create value from an extensive array of new data-generating sources used by consumers. The aim is to produce new insight into consumer behaviours so that marketers can target them in real time and predict their intentions with a greater degree of accuracy. However, the realisation of this ‘value’ is highly contingent. Personnel management, technology infrastructure, organizational culture, skills, and management capability are all identified as crucial components that impact the value generated. The sheer socio-technical range and interdependency of these internal variables highlight the two issues with which this special themed issue of Big Data and Society is concerned.

The first concerns the power relations and political dynamics of BDA implementation. Adopting, enacting and complying with the demands of BDA strategies involves a rethinking of roles, relationships and identities on the part of those involved in the transformation. Significant pressure and hype have been brought to bear on non-technical organizational constituencies, such as marketers, who have been challenged by the implications of BDA and are required to reconcile their creative, qualitative approaches with an analytical worldview. Similarly, in a public service context, managers are increasingly required to base their policy and operational decisions on new information flows embedded in BDA. They are finding that these novel, technologically intensive processes conflict with long-established norms and organisational decision-making structures.

The second concerns how practices associated with BDA extend surveillance into the intimate sphere. The surveillance practices embedded in Big Data are typically seen as commercial processes and another facet of operational efficiency and value creation. Such data can be subtle, highly nuanced and very personal. It can be collected from the home and can include information gathered within intimate, domestic spaces. Ethical concerns are recognised by practitioners, although they are still couched within a value discourse - and a robust ethics committee can ‘allow’ and ‘oversee’ the collection of such data.

Big Data succeeds in extending the scope of surveillance by co-opting individuals into the de facto surveillance of their own private lives. Through the increasingly embedded role of online social networks and location-sensitive mobile devices in social activities, the boundaries between the surveillant and the surveilled subject become blurred. Big Data breaks down boundaries between different sources of data, allowing the combination of information from different social domains. In democracies, with clearer legal protections of the line between public and private, Big Data extends existing surveillance technologies through its ability to co-opt the key economic actors - the corporations - and thus gain a window into private lives. Big Data practices also give powerful commercial corporations greater access to the machinery of government and public services: they are increasingly influential in policy-making and service delivery, and gain greater access to data deriving from these organisational entities. Levels of ubiquity in data collection previously available only in tightly controlled political environments are therefore now available universally.

A brief guide to the special theme
This theme features six articles, all of which contextualise Big Data hype within, and at times counter to, business and organisational logics. They explore how BDA extends surveillance across ever more intimate boundaries, highlighting: the emotional registers of consumers; home automation and household surveillance; and the surveillance and commercialisation of children via ‘Hello Barbie’. They also examine how Big Data practices are produced, reflecting the argument that the enactment of surveillant power using BDA is not a certainty but a negotiated organisational process. This theme addresses a gap in critical scholarship on Big Data, as it explores the links between Big Data, its organisational and commercial contexts and increasing levels of intimate surveillance. The articles illustrate how business and organisational practices shape and are shaped by BDA, and how the producers and consumers of Big Data are forging new intimate and intensive surveillance relationships. BDA is not as revolutionary as its most vocal advocates sometimes suggest. Its implementation and use are embedded within, and shaped by, powerful institutional norms and processes, and seen in retrospect the development of BDA is clearly an incremental, path-dependent process.

Thursday, 20 February 2020

Reconfiguring National Data Infrastructures through the Nordic Data Imaginary

Aaro Tupasela, Karoliina Snell, Heta Tarkkala
Big Data & Society 7(1), https://doi.org/10.1177/2053951720907107. First published: February 20th, 2020
Keywords: big data, health data, health policy, platform economy, Nordic data gold mine, data imaginary, sustainability

Data has become a central feature of economic development and systems of production. The range of companies that source data from their users on a daily basis has made the data relations between the sources, collectors and users of that data a mainstay of ethical and legal discussion. In our article just published in Big Data & Society, we focus on the development and implementation of national data infrastructures within the Nordic countries.

The Nordic countries aim to establish a unique place within the European and global health data economy. They have extensive nationally maintained and centralized health data records, as well as numerous biobanks where data from individuals can be connected based on personal identification numbers. Much of this phenomenon is the result of the emergence and development of the Nordic welfare state, where Nordic countries sought to systematically collect large amounts of population data to guide decision-making and improve the health and living conditions of the population. These massive collections of data have remained somewhat separate and connecting data between different sources has taken time and effort for researchers to accomplish due to ethical and legal constraints associated with such practices. With the explosive growth in utilizing big data in research and development, however, these data infrastructures are being re-purposed within the Nordic countries.

Recently, the so-called Nordic gold mine of data has been re-imagined in a wholly different context, in which welfare state data and its ever-increasing logic of accumulation are seen as a driver of economic growth and private business development. This model, born in the private sector, has become a template for national projects to capitalize on population data. Our article explores the development of policies and strategies for the health data economy in Denmark and Finland. We ask how nation states try to adjust to and benefit from new pressures and opportunities to utilize their data resources in data markets. This, we argue, raises questions of social sustainability in terms of states being producers, providers, and consumers of data. The data imaginaries related to emerging health data markets also provide insight into how a broad range of different data sources, from hospital records and pharmacy prescriptions to biobank sample data, are brought together to enable ‘full-scale utilization’ of health and welfare data.

Sunday, 16 December 2018

Illustrating Big Data discourses in the healthcare field

Marthe Stevens, Rik Wehrens and Antoinette de Bont

Over the last few years, there has been a growing critical scholarly discourse that reflects on how Big Data shape our knowledge and our understanding. Primarily the fields of Science and Technology Studies and Critical Data Studies have been instrumental in elaborating the neglected and problematic dimensions of Big Data. However, it is unclear how and to what extent such insights become embedded in the healthcare field.

At the same time, we notice that the healthcare field welcomes initiatives that aim to improve healthcare through Big Data. This development is interesting, as the healthcare field is characterized by a strongly institutionalized set of epistemological principles and generally accepted methodologies. The field favors, for example, high-quality evidence from randomized controlled trials and observational studies to guide treatment decisions. Big Data challenge these principles and methodologies as they promise faster and more representative knowledge on the basis of large-scale data analyses.

In our recent article in Big Data & Society, “Conceptualizations of Big Data and their epistemological claims: a discourse analysis”, we studied the various ways in which Big Data is conceptualized in the healthcare field and assessed the consequences of these different conceptualizations. We constructed five ideal-typical discourses that each frame Big Data in specific ways and use distinct metaphors to describe it. Three of the discourses (the modernist, instrumentalist and pragmatist) frame Big Data in positive terms and disseminate a compelling rhetoric. Metaphors of capturing, illuminating and harnessing data presume that Big Data are benign and lead to valid knowledge. The scientist and critical-interpretive discourses question the objectivity and effectiveness claims of Big Data. Their metaphors of selecting and constructing data carry a different political message, framing Big Data as limited.

The modernist discourse: capturing data
Illustration by: Sue Doeksen

During our analysis, it became apparent that especially the critical-interpretive discourse has not broadly infiltrated the healthcare domain, despite the attention that is given to the problematic assumptions and epistemological difficulties of Big Data in fields such as Science and Technology Studies and Critical Data Studies. We argue that the healthcare field would benefit from a more prominent critical-interpretive discourse, as the other discourses do not address important reflections on the normativity and situatedness of Big Data as well as the social and political processes that create Big Data.

For the article, we worked together with an illustrator to visualize the discourses, as we believed that illustrations could help deepen our own and the reader’s understanding of them. We contacted Sue Doeksen (www.suedoeksen.nl), who was very willing to help us and think along. What followed was an exciting process in which we and Sue inspired each other: she wanted a clear message to present in a simple illustration, and we had to make sure that the essence of each discourse was captured in the images.

This paper is part of a broader research project that focuses on the expectations and imaginaries associated with Big Data in healthcare. In the project, we conceptualize Big Data as a collection of practices and aim to study what sorts of meaning it is given and how it changes practices. Throughout the study, we focus specifically on the epistemological claims of Big Data.

About the authors:

Marthe Stevens is a PhD candidate at the department of Healthcare Governance at the Erasmus School of Health Policy and Management (Erasmus University Rotterdam, the Netherlands) and WTMC. She studies the use of Big Data and Artificial Intelligence in hospital settings in the Netherlands and in Europe. Her work focuses on the expectations and imaginaries associated with these new (data-driven) technologies. For more information see www.marthestevens.com

Rik Wehrens is an assistant professor at the department of Healthcare Governance at the Erasmus School of Health Policy & Management. His (STS) research work focuses on issues of knowledge translation and ‘epistemological politics’, such as the coordination work between public health researchers and practitioners in negotiating the meaning of ‘practice-based health research’, and ‘valuation work’ in healthcare improvement programs. His current work explores the roles and expectations of Big Data in healthcare through ethnographic and discursive research ‘lenses’. As a part of the EU-funded project Big Medilytics (https://www.bigmedilytics.eu/), he is involved in an international comparison of formal and informal rules for Big Data in various European countries.

Antoinette de Bont is an endowed professor at the Erasmus School of Health Policy and Management. Her research agenda addresses national and international policy priorities, like the diversification of the healthcare workforce or the use of Big Data to increase efficiency in healthcare. The research question that defines her agenda is: how do interdependencies between people and technology explain innovation in healthcare?

Monday, 23 May 2016

Dave Beer introduces his new article "How should we do the history of Big Data?"

In this video abstract, Dave Beer, Reader in Sociology at the University of York, introduces his new article in Big Data & Society, "How should we do the history of Big Data?"

Tuesday, 26 May 2015

Richard Webber Discusses the Adoption of Geodemographic and Ethno-cultural Taxonomies for Analysing Big Data

by Richard Webber

During the 2015 General Election, the Conservatives, Labour and the Liberal Democrats will have targeted millions, if not tens of millions, of political communications using the information they hold about Britain’s 50 million or so electors.  The make-up of Britain’s new administration could easily depend on the skill with which their data management teams select the right electors to target, the right message to target them with and the right people to deliver each message.  Winning general elections is perhaps one of the most important purposes to which the principles underlying Big Data are put.

The mass of data built up from decades of political doorstep and telephone canvassing is a critical input to the databases used to generate these communications.  But so too are the measures of status and cultural background with which each of these parties populates the data record it holds on each voter.

The measures they use do not correspond to the traditional definitions of social status and ethnic origin with which social scientists are familiar.  After all, there is no way the parties can establish the occupation of each individual elector, nor can they ask each elector which ethnic group they identify with.  Instead, the elements of electors’ names and addresses are used to make the inferences that populate each data record.  Electors’ status – and quite a lot more about them, such as whether they live in a gentrified neighbourhood of a commuter village – will be inferred from the categorisation that optimally fits their postcode. Their cultural backgrounds, including ethnic origin, religion and language, will be inferred from their personal and family names.

It is not because theories of voting preference have been developed using these categories that they are used for targeted communications.  It is because they are the only practical way to code the parties’ databases.  The classifications do, of course, need to be able to predict differences in electoral attitudes, behaviour and responses – and this they demonstrably can.

Political parties are not alone in using these tools.  Postcode classifications such as Mosaic and Acorn are routinely appended to the customer files of most large consumer-facing companies, whether in the US, the UK or a number of other European countries.  As tools for the analysis of customer behaviour they have many advantages over conventional questionnaires: they are non-intrusive, can be applied retrospectively, eliminate non-response and its consequent bias, are inexpensive to obtain, easy to append and compliant with data protection legislation.  Most important of all, they consistently provide high levels of discriminatory power, virtually irrespective of the behaviour analysed and typically as high as conventional measures of status and ethnicity.

Given the ubiquity of these systems in the commercial sector, we three co-authors found it curious that they should be used so seldom by researchers working in university social science departments.  This was particularly puzzling given the important influence neighbourhood characteristics are held to play in both geographic and sociological theory.  Unlike other “big data” sets, many of which are commercially confidential and/or require high levels of computational competence to use, these generic segmentations based on big data are not difficult to acquire.  Neither the task of appending these codes to research datasets nor the cost of obtaining them would seem to present an obstacle to their use.

Given our backgrounds in marketing and market research, in government and in qualitative social sciences, we thought it would be useful to pool a hundred years of experience so as to better understand the factors contributing to the uneven level of adoption of these systems between commercially based and university based researchers.

We felt the starting point for such an investigation should be an audit of the features of these new classification systems which variously attract or repel different groups of researchers.  For example, the use of labels such as “Liberal Opinion”, which many commercially employed researchers find a helpful summarisation of an important segment of the population, is evidently off-putting to many university-employed researchers on account of its imprecision and lack of theoretical justification.  From our respective experiences it was also our belief that institutional factors were particularly important in explaining differences in adoption levels.  Data managers working for political parties, for example, have good opportunities to evaluate the discriminatory power of such systems against behavioural data; they are given considerable autonomy; they do not have to justify their statistical methods to a wider reference group; the use of these systems supports a long-term operational requirement; and “what delivers” is ultimately a more important consideration than whether or not these systems embody particular theories.

By detailing many different considerations of this sort we concluded that it would be possible to develop a more general typology of research environments according to the pressures either favourable or hostile to the adoption of these systems.  We were able to identify seven quite distinct research environments that we could characterise in this way. Each one differed in the extent to which it was open to the adoption of these new classifications.

The theoretical conclusion we reached as a result of our investigation was as follows.  The categories on which much social scientific theory is developed necessarily depend on the categorisations which it is practical for researchers to link to behavioural data.  So long as the conventionally structured survey questionnaire was the exclusive source of quantitative data, these would be social class and ethnic origin.  But these forms of categorisation easily become institutionalised as a result of the inter-dependence between theory and the categories on the basis of which theory is developed.  When alternative sources of evidence such as big data become available, it is no longer desirable to base analysis exclusively on the categorisation systems designed for collection via survey questionnaires.  Other generic, synthetic representations of demographic constructs become more practical for the analysis of big data, not least because, as in the example of “Liberal Opinion”, they incorporate multiple dimensions into a single classification system.

Perhaps most important of all, new evidence of this sort fosters an entirely new set of hypotheses and, as a result, opens up the academic establishment to fresh lines of research enquiry.

About the author

Richard Webber is the originator of the UK-based geodemographic classifications Acorn and Mosaic, and for many years managed the micromarketing divisions of first CACI and then Experian.  In recent years he has championed the use of personal and family names as a means of inferring people’s cultural background, a tool used by the political parties to support their election campaigns.  His colleague in this venture is Trevor Phillips.  Richard Webber is a fellow of the Institute of Direct Marketing and of the Market Research Society, and a visiting Professor in the Department of Geography, King’s College London.

Tuesday, 28 April 2015

Twitter and Censorship: Can Images and Big Data Help Hack the Silence?

by Paolo Cardullo

In this post, I would like to reflect further on the process of gathering and analysing digital data from Twitter. In particular, I want to think about imaginative ways of working with the digital images that circulate so widely on social media.

When I first started observing the Turkish censorship of Twitter in March 2014, I immediately thought that the sole use of visually powerful infographics and traffic analysis did not effectively explain what was going on, both online and behind the scenes of the 'digital coup'. I have long been an Open Source software and digital rights activist, as well as a trainee hacker. I therefore found it hard to believe that data was seamlessly flowing despite attempts by the Turkish government to block the popular social platform. I soon understood that graphics and tables were hiding a lot more than they were saying. I therefore started a process of intense following, reaching out to a few active participants for online interviews, as well as collecting the most reflexive tweets and images. The full story is narrated in my article for Big Data & Society. Here, I would like to show how images, triangulated with other sets of data, reinforce the theoretical arguments I make in the paper: namely, that during systemic choke points, such as censorship of the Internet, ordinary users enact a series of unpredictable tactics for circumventing censorship, aided by more traditional hackers. Drawing on Italian Autonomists' ideas around digital labour and acknowledging the learning process attached to the use of ubiquitous digital devices, I put together the provocative concept of a 'hacking multitude'. The paper suggests that a generalisation of hacking/multitude can be very problematic for Big Data analysis, because of the unacknowledged potential that a multitude has for modifying, hiding, or pushing through the flow of digital data.

In this ensemble of different data points, a special place belongs to images. They circulate widely on digital platforms – they are increasingly 'poor images' (Hito Steyerl, 2009), highly compressed in order to circulate faster. Despite this, visual material is hard to take into account in traffic and metric analyses. Images are Big Data too, of course: a combination of bits and pixels that escapes algorithmic identification and categorisation. Images in fact need to be contextualised, explained and analysed (also in relation to other visual discourses). In the article, I discuss at length one particular image (see image above) which triggered my intuitions on what a 'hacking multitude' might be. It is this image that somehow produced in me, the researcher, a Barthesian 'punctum' around transformative moments. I would contend that more traditional Big Data representations (metrics, infographics, tags, etc.) can hardly reveal such a textured understanding of the social world.

About the author

Paolo Cardullo is an Associate Lecturer in Sociology at Goldsmiths College, University of London, and a visiting research fellow at the Centre for Urban and Community Research (CUCR). He completed his PhD in Visual Sociology at Goldsmiths (UoL), with a thesis on the affective geographies of gentrification in East Greenwich (2012). ‘Walking on the Rim: Towards a Geography of Resentment’ was examined by Prof. Douglas Harper (Duquesne University, Pittsburgh, and IVSA President) and Dr. Alison Rooke (Goldsmiths and CUCR Director). The thesis was supervised by Prof. Caroline Knowles and Prof. David Oswell (both at Goldsmiths).

Tuesday, 14 April 2015

David Beer: On 'Productive Measures' And Understanding Culture Through Data Circulation

by David Beer

After I completed work on my book Popular Culture and New Media: The Politics of Circulation I kept an eye out for interesting examples of data circulations in culture. The book had argued that digital forms of by-product data fold back into culture with transformative effects. It suggested that there was a politics to these data circulations and that they were far from neutral. The central claim was that in order to understand contemporary culture we need conceptual and methodological means for understanding these data circulations. In that book I argued that we need to look at the devices and infrastructures through which data are generated, at the ordering powers of the archives within which these data accumulate, then to think about how algorithms filter and sort these data circulations, and finally to consider how these data are played with and become part of bodily routines. The book also argued that before we rush into analysing and using big data to understand the social world, we first need to develop a greater understanding of the data themselves. Following the publication of the book I noticed that I was gathering a lot of examples of data circulation in the sport of association football. There seemed to be a powerful politics of data circulations in football that was reshaping its production and consumption. This observation became the starting point for the article that was recently published in Big Data & Society.

The article tries to reflect on the growing data assemblage in the cultural sector, with football as the case study. As well as thinking about the constitutive power of data, which I use the concept of 'productive measure' to open up, the article also considers the way that data are involved in decision-making and organisational ordering. As I worked on the piece it increasingly came to focus on the emergent industry of data analytics. As a result, the piece shows how embedded data analytics are in the cultural sector and hints at the implications of the intensification of metrics in the coming years. The industry of data analytics and data solutions is really important in understanding how data becomes a part of culture and everyday life. In football there is already an established and sophisticated industry of analytics, with data analysis experts and packages that have the capacity to reconfigure the game itself as well as how it is understood. This industry of analytics is illustrative of a growing data assemblage, but it is also interesting to see how these analytics are described, marketed and presented. Here we might detect a discourse that promotes a ‘faith in numbers’, as Theodore Porter has put it, and possibly even a version of what Philip Mirowski has described as an embedded ‘everyday neoliberalism’. The article argues that it is important to understand the materialities of the data infrastructure but that we should also think about the way this industry imagines the power of data and data analytics. These imagined powers are also a part of the data infrastructure and a part of how data processes become embedded in our lives.

Hopefully the article, and the concept of ‘productive measures’ it develops, will open up questions about the integration of big data into culture. This is an area that I think still needs quite a bit of development in terms of our understanding. The article focuses on the analytics industry but it also offers some illustrations of the way that the presence of data changes the very making of culture – from where to pass the ball, to decisions about players’ worth, to the assessment of valued skills, and so on. The concept of productive measures aims to extend the existing literature on the way that data enact the social world; in particular, it aims to think through the productive power of measures as they come to shape and reshape the production, consumption and value of culture. Football is the case study in this instance, but it is used as an illustration of a broader set of processes. The article is intended to continue the project of rethinking culture as it comes to be implicated in circulations of big data. The objective of this particular article is to think about how the analysis of the metrics that are generated might shape the cultural forms we consume, but my hope is that it develops a concept that might be used to think more broadly about the productive power of metrics as both a material presence and a set of imagined possibilities.

About the author

David Beer is Senior Lecturer in Sociology at the University of York. His publications include Punk Sociology (2014), Popular Culture and New Media: The Politics of Circulation (2013) and New Media: The Key Concepts (2008, with Nick Gane). He is currently writing a book called Metric Power and co-edits the Theory, Culture & Society open access site theoryculturesociety.org.