Big Data and Small: Collaborations Between Ethnographers and Data Scientists
Heather Ford, Oxford Internet Institute, UK
In the past three years, ethnographer and now-PhD student Heather Ford has worked on ad hoc collaborative projects around Wikipedia sources with two data scientists from Minnesota, Dave Musicant and Shilad Sen. In this essay, she talks about how the three met, how they worked together and what they gained from the experience. Three themes became apparent through their collaboration: that data scientists and ethnographers have much in common, that their skills are complementary, and that discovering the data together rather than compartmentalising research activities was key to their success.
On Minorities and Outliers: The Case for Making Big Data Small
Brooke Foucault Welles, Northeastern University, US
In this essay, I make the case for choosing to examine small subsets of Big Data datasets – making big data small. Big Data allow us to produce summaries of human behaviour at a scale never before possible. But in the push to produce these summaries, we risk losing sight of a secondary but equally important advantage of Big Data – the plentiful representation of minorities. Women, minorities and statistical outliers have historically been omitted from the scientific record, with problematic consequences. Big Data afford the opportunity to remedy those omissions. However, to do so, Big Data researchers must choose to examine very small subsets of otherwise large datasets. I encourage researchers to embrace an ethical, empirical and epistemological stance on Big Data that includes minorities and outliers as reference categories, rather than as exceptions to statistical norms.